Functional genomics - University of Michigan
TRANSCRIPT
Functional Genomics
Nov 29, 2017 Sarah Gagliano
Biostat666: Statistical methods in human genetics
Goals for this lecture
• Provide an overview of functional genomics
• Understand its importance/uses in the context of GWAS
DNA → mRNA → amino acid chain (protein)
phenotype
The “Central Dogma” of Biology
Transcription, Translation
Coding variation
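To make the transcription and translation steps concrete, here is a minimal sketch of how a single coding variant can change the resulting amino acid chain. The sequences are toy examples and the codon table is only the small subset needed here; both are assumptions for illustration, not material from the slides.

```python
# Partial codon table (illustrative subset of the standard genetic code).
CODON_TABLE = {"AUG": "M", "CCU": "P", "GAG": "E", "GUG": "V", "UAA": "*"}


def transcribe(coding_dna):
    """Transcription: coding-strand DNA to mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons in frame until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy reference sequence and a single-base coding variant (GAG -> GTG).
reference = "ATGCCTGAGTAA"
variant = "ATGCCTGTGTAA"

ref_protein = translate(transcribe(reference))
alt_protein = translate(transcribe(variant))
print(ref_protein, alt_protein)
```

The variant swaps glutamate (E) for valine (V) in the toy protein, which is the kind of change "coding variation" refers to.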
Figure from Ecker et al. (2012) Nature, 489:52-55
ENCODE Project creates a map of functional regions
Non-coding regions are not “junk”!
ChIP-seq to map out histone marks
http://mmg-233-2013-genetics-genomics.wikia.com/wiki/Chromatin_Immunoprecipitation_(ChIP)
Cross-link cells. Isolate genomic DNA and sonicate to shear.
Add antibody specific to mark of interest.
Isolate bound DNA. Purify and sequence.
Newer: Chromatin conformation capture techniques (e.g. Hi-C)
Access to Histone Modifications
• Roadmap Epigenomics Project (lots of tissues)
• ENCODE (lots of cell lines)
• PsychENCODE (brain from control and/or psychiatric samples)
• UCSC Genome Browser to visualize
What is an expression quantitative trait locus?
Adapted from: Wolen AR, Miles MF. Identifying gene networks underlying the neurobiology of ethanol and alcoholism. Alcohol Res. 2012;34(3):306-17.
Expression quantitative trait loci (eQTLs) = DNA loci that regulate expression levels of RNAs
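One common way to test whether a locus behaves as an eQTL is to regress expression level on genotype dosage (0, 1, or 2 copies of an allele). The sketch below does this with ordinary least squares on made-up data; the genotype and expression values are hypothetical, and real eQTL pipelines also adjust for covariates and multiple testing.

```python
import math


def linear_regression(x, y):
    """Ordinary least squares of y on x; returns slope, intercept, Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data: allele dosage per individual and expression of one gene.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.9, 5.0, 6.0, 6.2, 5.8, 7.1, 6.9, 7.0]

slope, intercept, r = linear_regression(genotypes, expression)
print(f"expression change per allele copy: {slope:.2f} (r = {r:.2f})")
```

A slope clearly different from zero suggests that expression changes with each extra copy of the allele, which is what the eQTL definition above describes.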
Access to eQTLs
• GTEx (44 tissues with eQTLs) http://www.gtexportal.org/
• UKBEC (10 brain tissues with eQTLs) http://braineac.org/
• Individual studies
Arrays to assess DNA CH3
Illumina HumanMethylation450K
Illumina EPIC (>850K CH3 sites)
• >90% content on 450K
• New: CH3 sites in ENCODE open chromatin & enhancers
Key Points
• Most variants identified through GWAS are non-coding
• Non-coding DNA helps regulate transcription (e.g. histone marks, eQTLs, DNA methylation)
• Functional annotations are dynamic
Figures from Maurano et al. (2012) Science, 337:1190-1195
(Franke et al. 2010) N SNPs = 938,703; N inds = 6,333 cases & 15,056 controls
(Sotoodehnia et al. 2010) N SNPs ~2.5M; N inds ~40K
GWAS hits overlap with functional regions
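One simple way to quantify this overlap is a fold-enrichment ratio with a hypergeometric tail probability: are GWAS hits found in annotated regions more often than background SNPs are? The sketch below uses invented coordinates on a single toy chromosome; the region boundaries, SNP positions, and helper names are all assumptions, and this is not the method used by Maurano et al.

```python
from math import comb


def in_any_region(position, regions):
    """True if position falls in any half-open [start, end) interval."""
    return any(start <= position < end for start, end in regions)


def fold_enrichment(hits, background, regions):
    """Fraction of hits in regions divided by fraction of background in regions."""
    hit_frac = sum(in_any_region(p, regions) for p in hits) / len(hits)
    bg_frac = sum(in_any_region(p, regions) for p in background) / len(background)
    return hit_frac / bg_frac


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n SNPs from N, of which K lie in regions."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


# Toy data: functional regions, all tested SNPs, and the subset that are GWAS hits.
regions = [(100, 200), (500, 600)]
background = list(range(0, 1000, 10))   # 100 tested SNPs
gwas_hits = [110, 150, 520, 580, 700]   # 5 hits, 4 inside regions

fold = fold_enrichment(gwas_hits, background, regions)
k = sum(in_any_region(p, regions) for p in gwas_hits)
K = sum(in_any_region(p, regions) for p in background)
p_value = hypergeom_upper_tail(k, len(background), K, len(gwas_hits))
print(f"fold enrichment = {fold:.1f}, p = {p_value:.4f}")
```

Here 80% of hits fall in regions that contain only 20% of tested SNPs, so the fold enrichment is about 4.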
Variants (annotated + class label)
Variable importance measure for each annotation
Prediction score for each variant
Algorithm
Use functional info to identify hits
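The workflow above (annotated, labelled variants go into an algorithm; variable importances and per-variant prediction scores come out) can be sketched with a small logistic regression. This is only an illustration: the three annotation names and the toy labels are hypothetical, and using the absolute weight as the "variable importance" is one simple choice among many.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.5, n_iter=500):
    """Batch gradient descent for logistic regression; returns weights and bias."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_score(x, weights, bias):
    """Prediction score in (0, 1) for one annotated variant."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical annotations per variant: [open_chromatin, eqtl, other_mark].
annotations = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
               [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]   # class label: 1 = hit, 0 = non-hit

weights, bias = train_logistic(annotations, labels)
importance = [abs(w) for w in weights]
scores = [predict_score(x, weights, bias) for x in annotations]
print(importance, scores)
```

In this toy set only the first annotation tracks the class label, so it receives the largest weight, and the labelled hits receive higher scores than the non-hits.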
Quiz! Supervised or unsupervised?
• Find novel TFBS based on annotated TFBS in the JASPAR database vs. non-TFBS sequences
• Identify clusters of similar tumors in cancer patients
• Identify pathogenic variants from variants with high derived allele frequency using genomic characteristics (CADD, Kircher et al. 2014)
• Identify risk variants from other variants, trained on genomic characteristics for Human Gene Mutation Database SNPs vs. 1KG (GWAVA, Ritchie et al. 2014)
• Segment the genome into X chromatin states using chromatin mark patterns, based on ENCODE data (Segway, Hoffman et al. 2012)
Linearly nonseparable? Kernel Trick
1. Kernel function to project the data in 2-D to 4-D space
2. Project SVM hyperplane in 4-D to original 2-D space
Figure 1, Noble 2016 Nat. Biotech
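The two steps can be illustrated with an explicit feature map. The sketch below assumes the degree-2 polynomial kernel k(x, y) = (x · y)^2, whose feature map sends a 2-D point to 4-D; that kernel, the toy data, and the hand-picked hyperplane are assumptions for illustration and may differ from what Noble's figure uses. No SVM is trained here; the point is only that a flat boundary in 4-D becomes a curved one in 2-D.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit 2-D to 4-D feature map for the kernel k(x, y) = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, x1 * x2, x2 * x1, x2 * x2)


def poly2_kernel(x, y):
    """The same inner product, computed without ever leaving 2-D."""
    return dot(x, y) ** 2


def ring_point(rng, r_min, r_max):
    radius = rng.uniform(r_min, r_max)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (radius * math.cos(angle), radius * math.sin(angle))


rng = random.Random(0)
inner = [ring_point(rng, 0.0, 1.0) for _ in range(50)]   # class -1
outer = [ring_point(rng, 2.0, 3.0) for _ in range(50)]   # class +1

# Hand-picked hyperplane in 4-D: w . phi(x) + b = 0.
# Projected back to 2-D it is the circle x1^2 + x2^2 = 2.25.
w = (1.0, 0.0, 0.0, 1.0)
b = -2.25


def classify(x):
    return 1 if dot(w, phi(x)) + b > 0 else -1


kernel_matches = all(
    math.isclose(poly2_kernel(x, y), dot(phi(x), phi(y)), abs_tol=1e-9)
    for x in inner[:5] for y in outer[:5]
)
all_correct = all(classify(x) == -1 for x in inner) and all(classify(x) == 1 for x in outer)
print(kernel_matches, all_correct)
```

No straight line in 2-D separates the inner cluster from the surrounding ring, but a single hyperplane in the 4-D space does.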
Simulated vs. Observed variants
Simulated (14.7 million variants):
- Empirical model of sequence evolution with local adjustment of mutation rates
Observed (14.7 million variants):
- Human derived allele >95% (exposed to many generations of natural selection)
1% reserved for testing
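A minimal sketch of how such a labelled set could be assembled and split, assuming the simulated variants form one class, the observed variants form the other, and the 1% test set is drawn at random from the combined pool. The variant identifiers, class encoding, seed, and the choice to split the pooled set rather than each class separately are all assumptions; the slides do not specify these details.

```python
import random


def split_holdout(items, test_fraction, seed):
    """Shuffle a copy of items and reserve a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-ins for the two equally sized variant sets (1,000 each here).
simulated = [(f"sim_{i}", 1) for i in range(1000)]   # label 1: simulated
observed = [(f"obs_{i}", 0) for i in range(1000)]    # label 0: observed

train, test = split_holdout(simulated + observed, test_fraction=0.01, seed=42)
print(len(train), len(test))
```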
John Long (Kellis Lab) “Evidence of Purifying Selection in Humans” https://math.mit.edu/research/highschool/primes/materials/2013/conf/9-2-Long.pdf
Peppered Moth Evolution
Pollution / No pollution
Ancestral
Derived
Photo from: http://www.truthinscience.org.uk/tis2/index.php/evidence-for-evolution-mainmenu-65/127-the-peppered-moth.html
Kircher’s use of SVM
• Linear kernel
• Prior feature selection (univariate analysis)
• Imputed missing values
• Boolean variables for categorical variables
• Interaction terms (include the few that improve the two-feature linear regression models)
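Three of the preprocessing steps listed above (imputing missing values, Boolean variables for categorical variables, and interaction terms) can be sketched as follows. The feature names and values are hypothetical, and mean imputation is an assumption; the slide does not say which imputation method was used. Feature selection and the linear-kernel SVM itself are not shown.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical variable as one Boolean (0/1) column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical annotations for three variants.
conservation = [2.0, None, 4.0]                          # numeric, one missing
consequence = ["missense", "intergenic", "missense"]     # categorical

imputed = impute_mean(conservation)
categories, booleans = one_hot(consequence)

# Interaction term: conservation score multiplied by the "missense" indicator.
missense_col = categories.index("missense")
features = [
    [score] + flags + [score * flags[missense_col]]
    for score, flags in zip(imputed, booleans)
]
print(categories)
print(features)
```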
Cell-type-specific annotations for training cell-type-specific models
Figure 1 from Lee et al. 2015 Nat. Genetics
Compare frequency of k-mers
Hepatocytes (main cell in liver)
deltaSVM values for 3 experimentally validated SNPs
Human prostate adenocarcinoma
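The idea behind a deltaSVM-style score can be sketched as the change in summed k-mer weights between the reference and alternate allele, taken over the k-mers that span the variant. The k-mer weights, the sequence, and k = 3 below are toy assumptions; in practice the weights come from a model trained on cell-type-specific regulatory sequences and k is larger.

```python
from collections import Counter


def count_kmers(seq, k):
    """Frequency of every k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmers_spanning(seq, pos, k):
    """All k-mers of seq that include the base at index pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def delta_score(ref_seq, pos, alt_base, weights, k):
    """Sum of k-mer weights for the alternate allele minus the reference allele."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_sum = sum(weights.get(kmer, 0.0) for kmer in kmers_spanning(ref_seq, pos, k))
    alt_sum = sum(weights.get(kmer, 0.0) for kmer in kmers_spanning(alt_seq, pos, k))
    return alt_sum - ref_sum


# Hypothetical k-mer weights (positive = associated with regulatory activity).
weights = {"GTA": 1.5, "TAC": 0.5, "GGA": -1.0}

ref_seq = "ACGTACG"
delta = delta_score(ref_seq, pos=3, alt_base="G", weights=weights, k=3)
print(count_kmers(ref_seq, 3))
print(delta)
```

A negative score here means the alternate allele removes k-mers the toy model associates with activity and introduces one it penalizes.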
Key Points
• GWAS variants are enriched for functional information in a tissue-specific manner
• Machine learning can be used to find patterns in functional data to identify novel variants associated with disease