Discovery of Prostate Cancer Biomarkers by Microarray Gene Expression Profiling

Karina Dalsgaard Sørensen; Torben Falck Ørntoft


Expert Rev Mol Diagn. 2010;10(1):49-64. 


Microarray Gene Expression Profiling

Microarray expression profiling of tumor samples for biomarker discovery is based on the assumption that gene expression patterns are major determinants of cancer cell behavior. Using microarray technologies, it is possible to map, on a genome-wide scale, the complex molecular aberrations associated with cancer development. Findings can be correlated with clinical data in order to identify molecules, or molecular signatures, associated with a certain tumor stage, grade or clinical outcome.

A microarray is an ordered array of thousands of well-defined single-stranded DNA molecules (probes) immobilized on a glass slide. The following protocol refers to mRNA expression analyses, while profiling of noncoding miRNAs is described in a separate section. Typically, total RNA extracted from a biological sample is reverse transcribed into cDNA, labeled and hybridized to the microarray. The readout is either transcript abundance (single-color systems) or expression relative to a reference sample carrying a different label (two-color systems). Several microarray platforms using synthetic oligonucleotide probes are commercially available, and generally provide high-quality data, with superior reproducibility compared with earlier-generation custom-spotted cDNA arrays.
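The relative readout of a two-color system is commonly expressed as a log2 ratio of sample to reference intensity for each probe. The following is a minimal sketch of that arithmetic only; the probe names, the intensity values and the `floor` parameter are hypothetical, and real pipelines also apply background correction and normalization, which are not shown here.

```python
import math


def log2_ratio(sample_intensity, reference_intensity, floor=1.0):
    """Return log2(sample / reference) for one probe.

    `floor` is an assumed lower bound that keeps very low or zero
    intensities from producing an undefined logarithm.
    """
    s = max(sample_intensity, floor)
    r = max(reference_intensity, floor)
    return math.log2(s / r)


# Hypothetical (sample, reference) intensities for three probes.
probes = {
    "probe_A": (2000.0, 500.0),   # higher in the sample
    "probe_B": (300.0, 300.0),    # unchanged
    "probe_C": (125.0, 1000.0),   # lower in the sample
}

ratios = {name: log2_ratio(s, r) for name, (s, r) in probes.items()}

for name, value in ratios.items():
    print(f"{name}: log2 ratio = {value:+.2f}")
```

On this scale a value of 0 means equal signal in both channels, and positive and negative values of the same size represent fold changes of the same magnitude in opposite directions.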

The feasibility of measuring the expression of tens of thousands of genes in a single experiment has shifted the bottleneck from data generation to data analysis. There are two fundamental strategies for analyzing microarray datasets. The first approach, known as unsupervised classification (for example, hierarchical clustering), aims to identify subgroups of samples with similar gene expression profiles and is used to discover previously unknown relationships between samples (or genes).[13,14] An advantage of this method is that additional clinical data are not required. The second approach, termed supervised classification, is used to identify genes that are differentially expressed between predefined groups of samples; for example, recurrent versus nonrecurrent tumors. One set of samples (the training set) is used to identify the expression signature that best discriminates between the groups, and another set (the validation or test set) is used to evaluate how accurately the signature classifies samples that were not used to derive it.[15]

Due to the complexity of microarray datasets, there is an inherent risk of identifying irrelevant genes or clusters. Independent validation is therefore essential for assessment of data reliability. If an independent sample set is not available, internal validation methods based on resampling statistics are often used (e.g., cross-validation, bootstrapping and jackknifing). A common approach is leave-one-out cross-validation (LOOCV), in which each sample in turn is held out for validation while the remaining samples serve as the training set, until every sample has been used once for validation.
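The LOOCV procedure can be sketched in a few lines. The example below is illustrative only: the nearest-centroid classifier is an assumption chosen for simplicity rather than a method named in the article, and the expression values, labels and function names are hypothetical.

```python
def centroid(rows):
    """Mean expression per gene across a list of samples."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class with the closest centroid
    (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        center = centroid(rows)
        dist = sum((a - b) ** 2 for a, b in zip(sample, center))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(x, y):
    """Hold out each sample once, train on the rest, and return the
    fraction of held-out samples that were classified correctly."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
expression = [
    [8.1, 2.0, 5.0],
    [7.9, 2.2, 5.1],
    [8.3, 1.9, 4.9],
    [3.0, 6.1, 5.0],
    [2.8, 5.9, 5.2],
    [3.2, 6.0, 4.8],
]
labels = ["recurrent"] * 3 + ["nonrecurrent"] * 3

accuracy = loocv_accuracy(expression, labels)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

If genes are selected before classification, that selection would normally be repeated inside each iteration using only the training samples; choosing genes on the full dataset first tends to make the cross-validated accuracy look better than it is.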
