independent component analysis: idenfying data‐driven...

Independentcomponentanalysis:iden1fyingdata‐drivenhumangenemodulesJesseEngreitz1,BernieDaigle2,YoniMarshall1,andRussB.Altman1,2

1DepartmentofBioengineering2DepartmentofGene@csStanfordUniversity

mo1va1on

Cellularphysiology,includingdiseasestatesanddrugresponses,results from the combined influences of many genes.Experimentalists have now sampled many condi@ons and celltypes, contribu@ng vast amounts of geneexpressiondata thatrepresentmanybiologicalprocesses. Theexpansionofpublicmicroarray databases such as the Gene Expression Omnibus(GEO) allows the use of intelligent datamining approaches toextract informa@onabout thesebiologicalprocesses inadata‐driven manner. Using 9,395 human microarrays measuringover20,000genes,weuseindependentcomponentanalysistoiden1fy func1onal gene modules, or sets of genes, thatdescribeawiderangeofbiologicalcondi1ons.

methods

9,395humanmicroarraysAffymetrixHGU133Plus2.0Diverseexperimentalcondi@ons

PreprocessingWe used hierarchical clustering to reduce thecontribu@ons from highly replicated experimentalsystems. We consolidated clusters of over‐represented condi@ons to create a meta‐compendiumof423meta‐arrays.

IndependentcomponentanalysisIndependent component analysis (ICA) modelsgene expression data as a linear combina@on oftranscrip@onal paZerns, termed independentcomponents.1 Given a set of microarrays, ICAiden@fies components so that sta@s@calindependence is maximized. Since ICA is astochas@cmethod,werunthealgorithm20@mesandclusterthecomponentes@mates.

GenemodulesEach independent component has a weight foreachgenethatquan@[email protected] each component, we use a weight cut‐off togenerate gene modules of over‐expressed andunder‐expressedgenes.2

GenemoduleanalysisWe can calculate the expression of eachindependent component in a new microarrayexperiment. We predict that gene modulesassociatedwithhighly‐expressedcomponentsplayanimportantroleintheexperiment.

Componentsrankedbyvarianceexplained

results

applica1on:parthenolide

Weiden@fied423 independentcomponentsanddefined846genemodules. Annota@onusingthe Gene Ontology (GO) suggests that while some of the gene modules represent knownbiologicalprocesses,somemaydescribetranscrip@onalprogramsnotwellcharacterizedbyGO.ICA gives reproducible component es@mates when applied to a large compendium of geneexpressiondata,andperformsbeZerthanPCAindescribingthedata.

Rela1onshipbetweenvarianceexplainedandnumberofenrichedGOterms Clusteredcomponentes1mateswithtopGOannota1ons

GSE7538:TreatmentofprimaryAMLspecimenswithparthenolide

12treated‐untreatedpairsofmicroarrays

Differen1allyexpressedcomponentsinparthenolideresponse

CREM

We inves@gated the mechanism for parthenolide, apreclinicaldrugwithan@‐prolifera@veeffects. Firstweiden@fieddifferen@allyexpressedcomponentsbetweentreated and untreated samples. These componentsrepresent known parthenolide‐modulated pathwaysincludingNF‐κ[email protected] iden@fy key genes in these pathways, we used theup‐regulated genes in component 373 to generate agenenetwork.3 Werankedthesegenesbasedontheirconnec@vity and iden@fied CREM as a poten@alregulatorinparthenolideresponse.

Networkofup‐regulatedgenesincomponent373createdusingIngenuitysoTware.3Greenindicatesa‘known’parthenolidegene.Blueindicatesanovelpredic1on.

conclusionGSE1060: TherecurrentSET‐NUP214fusionasanewHOXAac@va@onmechanisminpediatricT‐ALLGSE7440 EarlyResponseandOutcomeinHigh‐RiskChildhoodAcuteLymphoblas@cLeukemiaGSE10358 Discoveryandvalida@onofexpressiondatafortheGenomicsofAcuteMyeloidLeukemia…GSE10792 GenomewidegenotypingandgeneexpressiondataofchildhoodB‐cellprecursorALL…GSE7757 Robustnessofgeneexpressionsignaturesinleukemia:comparisonofthreedis@nct…GSE11190 Interferonsignalingandtreatmentoutcomeinchronichepa@@sC

Topexperimentsassociatedwithcomponent373

We extracted gene modules from alargecorpusofexpressiondatausingdata‐driven means, providing a newmethod for predic@ng func@onalrela@onships between genes. Thesemodules are useful for differen@alexpression analysis and may beappliedinanumberofothersehngs,including Gene Set EnrichmentAnalysis, phenotype classifica@on,drug discovery, and content‐basedmicroarraysearch.

SupportBio‐XUndergraduateResearchAwardBio‐X2Supercompu@ngCluster

References1.LiebermeisterW.Bioinforma)cs2002,18(1):51‐60.2.LeeS,BatzoglouS.GenomeBiology2003,24(11):R76.3.IngenuitySystems,www.ingenuity.com

independent component analysis: idenfying data‐driven...

Documents