independent component analysis: idenfying data‐driven...
TRANSCRIPT
Independentcomponentanalysis:iden1fyingdata‐drivenhumangenemodulesJesseEngreitz1,BernieDaigle2,YoniMarshall1,andRussB.Altman1,2
1DepartmentofBioengineering2DepartmentofGene@csStanfordUniversity
mo1va1on
Cellularphysiology,includingdiseasestatesanddrugresponses,results from the combined influences of many genes.Experimentalists have now sampled many condi@ons and celltypes, contribu@ng vast amounts of geneexpressiondata thatrepresentmanybiologicalprocesses. Theexpansionofpublicmicroarray databases such as the Gene Expression Omnibus(GEO) allows the use of intelligent datamining approaches toextract informa@onabout thesebiologicalprocesses inadata‐driven manner. Using 9,395 human microarrays measuringover20,000genes,weuseindependentcomponentanalysistoiden1fy func1onal gene modules, or sets of genes, thatdescribeawiderangeofbiologicalcondi1ons.
methods
9,395humanmicroarraysAffymetrixHGU133Plus2.0Diverseexperimentalcondi@ons
PreprocessingWe used hierarchical clustering to reduce thecontribu@ons from highly replicated experimentalsystems. We consolidated clusters of over‐represented condi@ons to create a meta‐compendiumof423meta‐arrays.
IndependentcomponentanalysisIndependent component analysis (ICA) modelsgene expression data as a linear combina@on oftranscrip@onal paZerns, termed independentcomponents.1 Given a set of microarrays, ICAiden@fies components so that sta@s@calindependence is maximized. Since ICA is astochas@cmethod,werunthealgorithm20@mesandclusterthecomponentes@mates.
GenemodulesEach independent component has a weight foreachgenethatquan@[email protected] each component, we use a weight cut‐off togenerate gene modules of over‐expressed andunder‐expressedgenes.2
GenemoduleanalysisWe can calculate the expression of eachindependent component in a new microarrayexperiment. We predict that gene modulesassociatedwithhighly‐expressedcomponentsplayanimportantroleintheexperiment.
Componentsrankedbyvarianceexplained
results
applica1on:parthenolide
Weiden@fied423 independentcomponentsanddefined846genemodules. Annota@onusingthe Gene Ontology (GO) suggests that while some of the gene modules represent knownbiologicalprocesses,somemaydescribetranscrip@onalprogramsnotwellcharacterizedbyGO.ICA gives reproducible component es@mates when applied to a large compendium of geneexpressiondata,andperformsbeZerthanPCAindescribingthedata.
Rela1onshipbetweenvarianceexplainedandnumberofenrichedGOterms Clusteredcomponentes1mateswithtopGOannota1ons
GSE7538:TreatmentofprimaryAMLspecimenswithparthenolide
12treated‐untreatedpairsofmicroarrays
Differen1allyexpressedcomponentsinparthenolideresponse
CREM
We inves@gated the mechanism for parthenolide, apreclinicaldrugwithan@‐prolifera@veeffects. Firstweiden@fieddifferen@allyexpressedcomponentsbetweentreated and untreated samples. These componentsrepresent known parthenolide‐modulated pathwaysincludingNF‐κ[email protected] iden@fy key genes in these pathways, we used theup‐regulated genes in component 373 to generate agenenetwork.3 Werankedthesegenesbasedontheirconnec@vity and iden@fied CREM as a poten@alregulatorinparthenolideresponse.
Networkofup‐regulatedgenesincomponent373createdusingIngenuitysoTware.3Greenindicatesa‘known’parthenolidegene.Blueindicatesanovelpredic1on.
conclusionGSE1060: TherecurrentSET‐NUP214fusionasanewHOXAac@va@onmechanisminpediatricT‐ALLGSE7440 EarlyResponseandOutcomeinHigh‐RiskChildhoodAcuteLymphoblas@cLeukemiaGSE10358 Discoveryandvalida@onofexpressiondatafortheGenomicsofAcuteMyeloidLeukemia…GSE10792 GenomewidegenotypingandgeneexpressiondataofchildhoodB‐cellprecursorALL…GSE7757 Robustnessofgeneexpressionsignaturesinleukemia:comparisonofthreedis@nct…GSE11190 Interferonsignalingandtreatmentoutcomeinchronichepa@@sC
Topexperimentsassociatedwithcomponent373
We extracted gene modules from alargecorpusofexpressiondatausingdata‐driven means, providing a newmethod for predic@ng func@onalrela@onships between genes. Thesemodules are useful for differen@alexpression analysis and may beappliedinanumberofothersehngs,including Gene Set EnrichmentAnalysis, phenotype classifica@on,drug discovery, and content‐basedmicroarraysearch.
SupportBio‐XUndergraduateResearchAwardBio‐X2Supercompu@ngCluster
References1.LiebermeisterW.Bioinforma)cs2002,18(1):51‐60.2.LeeS,BatzoglouS.GenomeBiology2003,24(11):R76.3.IngenuitySystems,www.ingenuity.com