Post on 13-Jun-2019
Pathway/Gene Set Analysis in Genome-Wide Association Studies
Alison Motsinger-Reif, PhD, Associate Professor
Bioinformatics Research Center, Department of Statistics
North Carolina State University
Many Shared Issues
• Many of the issues, choices, and methodological approaches discussed for microarray data hold across all “-omics”
• Many methods have been readily extended to other omic data
• Several biological and technological issues may make “off the shelf” use of pathway analysis tools inappropriate
Genome-Wide Association Studies
Population resources
• trios
• case-control samples
Whole-genome genotyping
• hundreds of thousands or million(s) of markers, typically SNPs
Genome-wide association
• single SNP alleles
• genotypes
• multimarker haplotypes
Advantages of GWAS
• Compared to candidate gene studies
  – unbiased scan of the genome
  – potential to identify totally novel susceptibility factors
• Compared to linkage-based approaches
  – capitalizes on all meiotic recombination events in a population
    • localizes small regions of the chromosome
    • enables rapid detection of the causal gene
  – identifies genes with smaller relative risks
Concerns with GWAS
• Assumes the common disease, common variant (CDCV) hypothesis
• Expense
• Power dependent on:
  – Allele frequency
  – Relative risk
  – Sample size
  – LD between the genotyped marker and the risk allele
  – Disease prevalence
  – Multiple testing
  – …
• Study design
  – Replication
  – Choice of SNPs
• Analysis methods
  – IT support, data management
  – Variable selection
  – Multiple testing
Successes in GWAS Studies
• Over 400 GWAS papers published to date
• Big finds:
  – In 2005, GWAS showed that age-related macular degeneration is associated with variation in the gene for complement factor H, which produces a protein that regulates inflammation (Klein et al. (2005) Science, 308, 385–389)
  – In 2007, the Wellcome Trust Case-Control Consortium (WTCCC) carried out GWAS for coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder and hypertension. This study uncovered many new genes underlying these diseases.
More Successes
• Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet. 2007
• Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Wellcome Trust Case Control Consortium. Nature. 2007;447;661-78
• Genomewide association analysis of coronary artery disease. Samani et al. N Engl J Med. 2007;357;443-53
• Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Parkes et al. Nat Genet. 2007;39;830-2
• Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Todd et al. Nat Genet. 2007;39;857-64
• A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Frayling et al. Science. 2007;316;889-94
• Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Zeggini et al. Science. 2007;316;1336-41
• A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Scott et al. Science. 2007;316;1341-5
• …
Limitations
• For many diseases, the amount of trait variation explained by even the successes is far below the estimated heritability.
• Recently, GWAS have come under heavy criticism for yielding relatively few translatable findings given the investment and hype.
• The assumptions underlying GWAS are not true for all diseases.
[Figure] Feasibility of identifying genetic variants by risk allele frequency and strength of genetic effect (odds ratio).
T.A. Manolio et al. Nature 461, 747-753 (2009) doi:10.1038/nature08494
Reasons GWAS Can Fail, even if well-powered and well-designed…
• Alleles with small effect sizes
• Rare variants
• Population differences
• Epistatic interactions
• Copy number variation
• Epigenetic inheritance
• Disease heterogeneity
• …
Possible Association Models
1. Each of several genes may have a variant that confers increased risk of disease independent of other genes
2. Several genes contribute additively to the malfunction of the pathway
3. There are several distinct combinations of gene variants that increase relative risk, but with only modest increases in risk for any single variant
Hypothetical Disease Mechanism
• For each gene, probability of knockout = 0.2^2 = 0.04
• Probability of disease:
  – Pathway knocked out = 0.4
  – Pathway intact = 0.2
• Sample size = 2000 cases, 2000 controls
• Power:
Enrichment Testing in GWAS
• Testing pathway enrichment is possible in GWAS data
  – Many of the same issues that exist in gene expression enrichment testing occur in GWAS enrichment testing (e.g. choice of statistics, competitive vs. self-contained)
• Primary difference:
  – In expression data the unit of testing is a gene
  – In GWAS data the unit of testing is a SNP
• Challenges:
  – Identifying the SNP (set) -> gene mapping
  – Summarizing across individual SNP statistics to compute a per-gene measure
Mapping SNPs to Genes
• All SNPs in physical proximity of each gene
  – Pros:
    • All/most genes represented
  – Cons:
    • Varying number of SNPs per gene
    • Many of the SNPs may dilute signal
    • Defining gene proximity can affect results
• eSNPs (expression-associated SNPs)
  – Pros:
    • 1 SNP per gene
    • SNPs functionally associated
  – Cons:
    • Assumes variants affect expression
    • Not all genes have eSNPs
    • eSNPs may be study and tissue dependent
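The proximity mapping can be sketched in a few lines. This is a minimal illustration, not a production annotator: the function name `map_snps_to_genes`, the tuple layouts, and the toy coordinates are all hypothetical, and the window sizes are arbitrary.

```python
from collections import defaultdict

def map_snps_to_genes(snps, genes, window=20_000):
    """Assign each SNP to every gene whose span, padded by `window` bp, contains it.

    snps:  list of (snp_id, chrom, position)      -- hypothetical layout
    genes: list of (gene_id, chrom, start, end)   -- hypothetical layout
    """
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        genes_by_chrom[chrom].append((gene_id, start, end))

    mapping = defaultdict(list)
    for snp_id, chrom, pos in snps:
        for gene_id, start, end in genes_by_chrom[chrom]:
            if start - window <= pos <= end + window:
                mapping[gene_id].append(snp_id)
    return dict(mapping)

# Toy data (made up for illustration).
snps = [("rs1", "1", 1_000), ("rs2", "1", 30_000),
        ("rs3", "1", 90_000), ("rs4", "2", 5_000)]
genes = [("GENE_A", "1", 10_000, 20_000), ("GENE_B", "1", 60_000, 80_000)]

narrow = map_snps_to_genes(snps, genes, window=10_000)
wide = map_snps_to_genes(snps, genes, window=50_000)
```

With the wider window, rs2 is assigned to both genes, which is one concrete way the definition of gene proximity can change downstream results.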
Gene summaries
• Initial studies proposed different statistics for summarizing the overall gene association prior to enrichment analysis
  – Number/proportion of SNPs with p-value < 0.05
  – Mean(-log10(p-value))
  – Min(p-value)
  – 1 - (1 - Min(p-value))^N
  – 1 - (1 - Min(p-value))^((N+1)/2)
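As a rough sketch, the per-gene summaries listed here can be computed from a gene's SNP p-values as follows. The function and key names are hypothetical; N is taken to be the number of SNPs mapped to the gene, and the last two entries are the minimum p-value corrected as if there were N, or (N+1)/2, independent tests.

```python
import math

def gene_summaries(pvalues, alpha=0.05):
    """Summarize one gene's SNP p-values (N = number of SNPs in the gene)."""
    n = len(pvalues)
    min_p = min(pvalues)
    return {
        "prop_sig": sum(p < alpha for p in pvalues) / n,
        "mean_neg_log10": sum(-math.log10(p) for p in pvalues) / n,
        "min_p": min_p,
        # Min p corrected for N independent tests.
        "min_p_adj_n": 1 - (1 - min_p) ** n,
        # Min p corrected for (N+1)/2 tests, a milder penalty for correlated SNPs.
        "min_p_adj_half": 1 - (1 - min_p) ** ((n + 1) / 2),
    }

summary = gene_summaries([0.01, 0.2, 0.5, 0.8])
```

The raw minimum is always the smallest of the three min-based values, so the choice of correction directly changes how strongly large genes are penalized.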
First approaches: combining p-values
• Compute gene-wise p-value:
  – Select most likely variant: ‘best’ p-value
  – Selected minimum p-value is biased downward
  – Assign ‘gene-wise’ p-value by permutations (Westfall-Young)
    • Permute samples and compute ‘best’ p-value for each permutation
    • Compare candidate SNP p-values to this null distribution of ‘best’ p-values
• Combine p-values by Fisher’s method, across SNPs (biased in the presence of correlation)

$V = -2 \sum_{g_i \in G} \log(p_i)$

$p = P(\chi^2_{2k} > V)$
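Fisher's method sums $-2\log(p_i)$ over the k p-values being combined and refers the total to a chi-square distribution with 2k degrees of freedom. A small standard-library sketch follows; the function names are hypothetical, and it relies on the closed-form chi-square tail that exists when the degrees of freedom are even.

```python
import math

def chi2_sf_even_df(x, k):
    """P(X > x) for X ~ chi-square with 2k degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def fisher_combined(pvalues):
    """Return (V, combined p) for independent p-values."""
    k = len(pvalues)
    v = -2.0 * sum(math.log(p) for p in pvalues)
    return v, chi2_sf_even_df(v, k)

v1, p1 = fisher_combined([0.05])        # a single p-value is returned unchanged
v2, p2 = fisher_combined([0.05, 0.05])  # two modest signals reinforce each other
```

The calculation assumes independent p-values. SNPs in linkage disequilibrium violate that assumption, which is the source of the bias noted on the slide.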
Next approaches
• Additive model:

$\log\left(\frac{p}{1-p}\right) = \sum_{g_i \in G} \beta_i n_i$

  – where $n_i$ is the number of B alleles of a SNP in gene $i$ in the gene set $G$
  – Select subset of most likely SNPs
  – Fit by logistic regression (glm() in R)
• Significance by permutations
  – Permute sample outcomes
  – Select genes and fit logistic regression again
    • Assess goodness of fit each time
  – Compare observed goodness of fit
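The permutation scheme can be sketched without a regression library. Note the substitution: instead of refitting a logistic regression and comparing goodness of fit, this toy version uses a much simpler statistic, the absolute case-control difference in total B-allele count across the selected SNPs. It also omits the per-permutation SNP selection step. All names and data are hypothetical.

```python
import random

def burden_stat(genotypes, outcomes):
    """|mean allele burden in cases - mean allele burden in controls|."""
    scores = [sum(row) for row in genotypes]   # sum of n_i per sample
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_pvalue(genotypes, outcomes, n_perm=2000, seed=0):
    """Permute sample outcomes and compare the observed statistic to the null."""
    rng = random.Random(seed)
    observed = burden_stat(genotypes, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if burden_stat(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one keeps the estimate above zero

# Toy data: rows are samples, columns are B-allele counts (0/1/2) at three SNPs.
genotypes = [[2, 1, 2], [1, 2, 2], [2, 2, 1], [2, 1, 1], [1, 1, 2], [2, 2, 2],
             [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 1]]
outcomes = [1] * 6 + [0] * 6
p_perm = permutation_pvalue(genotypes, outcomes)
```

In a faithful version, the SNP selection and model fit would be repeated inside the loop, so that the selection step is also reflected in the null distribution.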
Competitive vs. Self-Contained Tests
• Competitive cutoff tests
  – Require only permuting SNP or gene labels
  – May only allow assessing relative significance
• Self-contained distribution tests
  – Require permuting phenotype-genotype relationships
  – Resource intensive, may be difficult for large meta-analyses
  – Allow assessing overall significance
Competitive vs. Self-Contained Tests
• Self-contained null hypothesis
  – no genes in gene set are differentially expressed
• Competitive null hypothesis
  – genes in gene set are at most as often differentially expressed as genes not in gene set
What does this mean for SNP data?
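For SNP data, one way to picture a competitive test is to permute gene labels: compare the mean gene-level score inside the set with the means of random gene sets of the same size. The sketch below assumes a per-gene score has already been computed (for example -log10 of a gene-wise p-value); the function name and toy scores are hypothetical.

```python
import random

def competitive_pvalue(gene_scores, gene_set, n_perm=5000, seed=0):
    """Fraction of random same-size gene sets scoring at least as high as gene_set."""
    rng = random.Random(seed)
    genes = sorted(gene_scores)
    members = [g for g in genes if g in gene_set]
    k = len(members)
    observed = sum(gene_scores[g] for g in members) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)          # random gene labels, same set size
        if sum(gene_scores[g] for g in sample) / k >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy scores: gene g0 has the weakest signal, g19 the strongest.
gene_scores = {f"g{i}": float(i) for i in range(20)}
p_top = competitive_pvalue(gene_scores, {"g17", "g18", "g19"})
p_mixed = competitive_pvalue(gene_scores, {"g1", "g10", "g19"})
```

Because only labels are shuffled, the result says how the set ranks relative to the rest of the genome, not whether any gene in it is associated with the phenotype. A self-contained test would instead permute phenotypes and recompute the SNP statistics each time.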
Choice of Pathways/Gene Sets
• Relatively less “signal” in GWAS than in gene expression (GE)
  – GE enrichment typically tests which gene sets/pathways show enrichment
  – GWAS enrichment typically tests if there is enrichment
• Typically want to be conservative about the number of pathways selected for testing, otherwise it will be difficult to overcome multiple testing
• Prioritized approach:
  – Limited number of specific hypotheses (e.g. gene sets from experiment, co-expression modules, disease-specific pathways/ontologies)
  – Exploratory analyses such as all KEGG/GO sets
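The cost of testing many pathways shows up directly in a false discovery rate adjustment. The slides do not say which FDR procedure is intended; the sketch below assumes Benjamini-Hochberg, a common choice, and the function name and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# A focused analysis: five pre-specified pathways.
focused = benjamini_hochberg([0.001, 0.01, 0.03, 0.04, 0.5])

# The same best pathway among 200 exploratory sets with no other signal.
exploratory = benjamini_hochberg([0.001] + [0.5] * 199)
```

The pathway with p = 0.001 keeps a small adjusted value in the focused analysis but is heavily diluted among 200 mostly null sets, which is the argument for prioritizing a limited number of hypotheses.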
Some Specific Methods
• SSEA
  – SNP Set Enrichment Analysis
• i-GSEA4GWAS
• MAGENTA
  – Meta-Analysis Gene-set Enrichment of variant Associations
SSEA
• Zhong et al. AJHG (2010)
• eSNP analysis to map SNPs to genes
  – More on this later…
• Pathway statistic = one-sided Kolmogorov-Smirnov test statistic
• Pathway p-value assessed by permuting genotype-phenotype relationship
• FDR used to control error due to the number of pathways tested
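A one-sided Kolmogorov-Smirnov statistic asks whether the p-values in a pathway pile up near zero faster than a reference distribution does. The sketch below compares pathway p-values with background p-values. Whether SSEA uses exactly this two-sample form is an assumption, and the function name and data are hypothetical.

```python
def ks_one_sided(set_p, background_p):
    """D+ = max over x of [F_set(x) - F_background(x)].

    Large when the set's p-values are shifted toward zero relative to background.
    """
    a, b = sorted(set_p), sorted(background_p)
    points = sorted(set(a) | set(b))
    d_plus, i, j = 0.0, 0, 0
    for x in points:
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_plus = max(d_plus, i / len(a) - j / len(b))
    return d_plus

pathway_p = [0.001, 0.01, 0.02]
background_p = [0.1, 0.3, 0.5, 0.7, 0.9]
d_enriched = ks_one_sided(pathway_p, background_p)
d_reverse = ks_one_sided(background_p, pathway_p)
```

Only the statistic is shown here. Its p-value would still come from permuting the genotype-phenotype relationship and recomputing the statistic each time, as the slide describes.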
i-GSEA4GWAS
• Zhang et al. Nucl Acids Res (2010)
• http://gsea4gwas.psych.ac.cn/
• Categorizes genes as significant or not significant
  – Significant: at least 1 SNP in the top 5% of SNPs
  – Does not adjust for gene size
• Pathway score: k/K
  – k = proportion of significant genes in the gene set
  – K = proportion of significant genes in the GWAS
• FDR assessed by permuting SNP labels
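The k/K score is simple enough to sketch directly. This follows only the description on the slide, not the published software; the function names and the toy p-values are hypothetical.

```python
def significant_genes(gene_to_snp_p, all_snp_p, top_fraction=0.05):
    """Genes with at least one SNP among the top `top_fraction` of all SNPs."""
    ranked = sorted(all_snp_p)
    n_top = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]
    return {g for g, ps in gene_to_snp_p.items() if min(ps) <= cutoff}

def pathway_score(pathway_genes, sig_genes, all_genes):
    """k/K: share of significant genes in the pathway over the genome-wide share."""
    k = len(pathway_genes & sig_genes) / len(pathway_genes)
    big_k = len(sig_genes) / len(all_genes)
    return k / big_k

# Toy GWAS: 40 SNP p-values spread over 10 genes, 4 SNPs each.
all_snp_p = [i / 40 for i in range(1, 41)]
gene_to_snp_p = {f"g{j}": all_snp_p[j::10] for j in range(10)}

sig = significant_genes(gene_to_snp_p, all_snp_p)
score = pathway_score({"g0", "g1", "g5", "g6"}, sig, set(gene_to_snp_p))
```

A score above 1 means the pathway holds proportionally more significant genes than the genome as a whole. Since a gene needs only one top SNP to count, genes with many SNPs are more likely to be called significant, which is the gene-size issue noted on the slide.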
MAGENTA
• Segre et al. PLoS Genetics (2010)
• Software download:
  – http://www.broadinstitute.org/mpg/magenta/
  – Requires MATLAB!!
  – Less convenient, but more customizable than i-GSEA4GWAS
• Customizable proportion of “significant” genes
• Customizable gene window (upstream & downstream)
• Option for rank-sum test
• Gene summary = min(p)
  – Uses stepwise regression to adjust for multiple possible factors, e.g. gene size, SNP density
Adaptations of GSEA
• Order log-odds ratios or linkage p-values for all SNPs
• Map SNPs to genes, and genes to groups
• Use linkage p-values in place of t-scores in GSEA
  – Compare distribution of log-odds ratios for SNPs in group to randomly selected SNPs from the chip
Summary Points for GWAS
• In GWAS, few SNPs typically reach genome-wide significance
• Biological function of those that do can take years of work to unravel
• Incorporating biological information (expression, pathways, etc.) can help interpret and further explore GWAS results
• Enrichment tests can be used to explore biological pathway enrichment
  – Different tests tell you different things
• Annotation choices are very different than in gene expression data, though they still rely on the same resources... not necessarily so for other “-omics”
Adding in Gene Expression Data
• Many motivating reasons to combine/integrate data from multiple “-omes”
• Combining expression and SNP data is most commonly done
  – Though methods could be applied to combine other “-omics”
• Generally make assumptions about central dogma
Genetics of Gene Expression
• Schadt, Monks, et al. (Nature 2003) & Morley, Molony, et al. (Nature 2004) showed that gene expression is a heritable trait under genetic control
• Identifying expression-associated SNPs (eSNPs) can identify SNPs which are associated with biological function
• For significant GWAS “hits”, eSNPs can suggest candidate genes and possibly information about direction of association
Motivation for Integrated Analysis
• Newer approaches avoid partitioned/filtered analysis and leverage information across data types
• New technologies allow for more ready integration
  – Ex. RNA-Seq
  – Dropping costs allow for more data types to be collected simultaneously
  – Biobanking efforts are storing more tissues
Motivation for Integrated Analysis
• Naturally allow Bayesian approaches for identifying priors or jointly modeling data
• Several new approaches proposed
  – Methods that were developed for eSNPs are readily extended across data types
  – Other approaches take into account similarities between/within phenotypes
    • An ontology jointly representing disease risk factors and causal mechanisms based on GWAS results
    • Proposed ontology is disease-specific (nicotine addiction and treatment) and only applicable to very specific research questions
  – More later on “different issues for -omics”
Motivation for Integrated Analysis
• Methods largely rely on central dogma assumptions that do not always hold
Summary
• Pathway and gene set analysis has been extended to SNP and SNV data
• Some annotation resources are readily adapted, but a new series of choices is available
• Software packages for GWAS pathway analysis are maturing
• Advances in approximation for permutation testing will make these tools more computationally tractable
• Many of the same issues with missing annotation, etc. are still a concern
Summary
• Integration of SNP-level and eSNP data has been highly successful, and helps motivate the integration of other “-omes” in analysis
• Such integration will be dependent on the quality of the annotation that it relies on
• Next, we will talk about specific concerns for different data types
• Issues will compound in integrated analysis…