Editor’s Choice: Validation of Gene Expression Profiles in Genomic Data through Complementary Use of Cluster Analysis and PCA-Related Biplots


Pages 162-173
Niccolò Bassani, Federico Ambrogi, Danila Coradini, Patrizia Boracchi and Elia Biganzoli
DOI: http://dx.doi.org/10.6000/1929-6029.2012.01.02.09
Published: 21 December 2012


Abstract: High-throughput genomic assays are used in molecular biology to explore patterns of joint expression of thousands of genes.

These methodologies have developed considerably over the last decade, creating a concurrent need for appropriate methods to analyze the massive data they generate.

Identifying sets of genes and samples characterized by similar expression values, and validating these results, are two critical issues in these investigations because of their clinical implications. From a statistical perspective, unsupervised class discovery methods like Cluster Analysis are generally adopted.

However, Cluster Analysis in this field relies mainly on hierarchical techniques, and other methods are rarely considered. This is partly due to software availability and to the ease of representing results through a heatmap, which allows the clustering of genes and samples to be visualized simultaneously in the same graphic. One drawback of this strategy is that cluster stability is often neglected, leading to over-interpretation of results.
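Stability is commonly assessed by checking whether two partitions of the same samples agree, for example a partition from the full data and one from perturbed or resampled data. The abstract does not say which stability indices the authors used, so the sketch below is only an illustration under that assumption: it computes the adjusted Rand index, a widely used agreement measure, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are required")

    # Pairs of items placed together in both partitions, and in each one alone.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case: both partitions are trivial (one cluster or all singletons).
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


# Toy labels (invented for illustration): cluster names differ, grouping is identical.
full_data = ["A", "A", "B", "B", "C", "C"]
resampled = [2, 2, 0, 0, 1, 1]
agreement = adjusted_rand_index(full_data, resampled)
```

Values near 1 indicate that the grouping is reproduced regardless of how clusters are labeled, while values near 0 indicate agreement no better than chance.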

Moreover, validation of results using external datasets is still a subject of discussion, since it is well known that batch effects may affect gene expression results even after normalization.

In this paper we compare several clustering algorithms (hierarchical, k-means, model-based, Affinity Propagation) and stability indices to discover common patterns of expression and to assess clustering reliability, and we propose a rank-based passive projection of Principal Components for validation purposes.
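The abstract does not detail how the rank-based passive projection is computed. One plausible reading, offered here only as a sketch under stated assumptions, is to replace expression values by their within-sample ranks, fit the principal components on the reference samples alone, and then project external samples onto those fixed components without refitting. All function names and data values below are hypothetical, and only the first component is computed (by power iteration) to keep the example short.

```python
import math


def rank_within_sample(values):
    """Replace each gene's value by its rank within the sample (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = float(rank)
    return ranks


def first_pc(rows, iterations=200):
    """Return (gene means, unit loading vector) of the first principal component."""
    n_genes = len(rows[0])
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    loading = [1.0 / (j + 1) for j in range(n_genes)]  # deterministic starting vector
    for _ in range(iterations):
        # Power iteration on X^T X without forming the covariance matrix.
        scores = [sum(x * l for x, l in zip(row, loading)) for row in centered]
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(n_genes)]
        norm = math.sqrt(sum(u * u for u in updated))
        loading = [u / norm for u in updated]
    return means, loading


def passive_projection(ranked_sample, means, loading):
    """Score a supplementary sample on a component fitted elsewhere (no refit)."""
    return sum((x - m) * l for x, m, l in zip(ranked_sample, means, loading))


# Toy reference data (invented): 4 samples x 4 genes.
reference = [[2.1, 5.0, 3.3, 8.2], [7.5, 1.2, 6.1, 3.0],
             [1.0, 4.4, 2.0, 9.1], [6.8, 2.2, 7.9, 1.5]]
means, loading = first_pc([rank_within_sample(s) for s in reference])

# An external sample and the same sample under a monotone, batch-like distortion.
external = [120.0, 340.0, 200.0, 900.0]
distorted = [3.0 * v + 50.0 for v in external]
score_external = passive_projection(rank_within_sample(external), means, loading)
score_distorted = passive_projection(rank_within_sample(distorted), means, loading)
```

Because ranks are unchanged by any monotone increasing transformation of a sample, the two scores coincide, which is one reason a rank-based projection may be less sensitive to scale differences between datasets. Whether this matches the authors' procedure in detail cannot be confirmed from the abstract alone.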

Results are presented from a study, derived from a publicly available dataset, involving 23 tumor cell lines and 76 genes related to a specific biological pathway.

Keywords: Microarrays, cluster stability, multivariate visualization, Principal Components Analysis, cell polarity.