Functional Genomics - University of Michigan

Nov 29, 2017 Sarah Gagliano [email protected] Biostat666: Statistical methods in human genetics



Goals for this lecture
• Provide an overview of functional genomics
• Understand its importance/uses in the context of GWAS

Outline
• Functional Genomics 101
• Using functional genomics to prioritize risk variants

The "Central Dogma" of Biology

DNA → mRNA → amino acid chain (protein) → phenotype

Transcription (DNA → mRNA), Translation (mRNA → protein)

Coding variation

Genetic variation → amino acid → protein → phenotype
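The chain from genetic variation to amino acid can be sketched in a few lines. This is a minimal illustration, assuming a toy coding-strand sequence and a deliberately partial codon table (only the codons used here, taken from the standard genetic code); the function names are hypothetical.

```python
# Partial codon table from the standard genetic code (only codons used below).
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "GAA": "Glu", "GAG": "Glu",
    "GUA": "Val", "UAA": "STOP",
}

def transcribe(coding_dna):
    """Transcription: coding-strand DNA to mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Translation: read codons in frame until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

reference = translate(transcribe("ATGGCCGAATAA"))   # toy reference allele
synonymous = translate(transcribe("ATGGCCGAGTAA"))  # GAA -> GAG, same amino acid
missense = translate(transcribe("ATGGCCGTATAA"))    # GAA -> GTA, Glu -> Val
```

Only the missense change alters the protein, which is the sense in which coding variation acts through the amino acid sequence.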

Most GWAS hits are non-coding

Hindorff et al. 2009

Non-coding (intergenic/intronic): 88%
Remaining: 12%

ENCODE Project creates a map of functional regions

Non-coding regions are not "junk"!

Figure from Ecker et al. (2012) Nature, 489:52-55

→ Chromatin Conformation

Marks associated with closed vs. open chromatin
- Developmental time-point
- Tissue
- Sex

Histone Modifications

Histone marks define functional regions

UCSC Genome Browser: mark of an active enhancer

Enhancer-Promoter Looping

Fig 2b from Cavalli & Misteli (2012) Nature Structural & Molec Bio

Detecting Histone Modifications

1. Wet lab
2. Impute

ChIP-seq to map out histone marks

http://mmg-233-2013-genetics-genomics.wikia.com/wiki/Chromatin_Immunoprecipitation_(ChIP)

- Crosslink cells. Isolate genomic DNA and sonicate to shear.
- Add antibody specific to mark of interest.
- Isolate bound DNA. Purify and sequence.

Newer: Chromatin conformation capture techniques (e.g. Hi-C)

Correlations across marks per sample (E017 = fetal lung fibroblast)

Correlations across samples per mark (E017 = fetal lung fibroblast)

Use correlations between marks & between samples

Ensemble of regression trees

Ernst & Kellis (2015) Nat Biotech

Access to Histone Modifications

- Roadmap Epigenomics Project (lots of tissues)
- ENCODE (lots of cell lines)
- PsychENCODE (brain from control and/or psychiatric samples)
- UCSC Genome Browser to visualize

eQTLs

What is an expression quantitative trait locus?

Adapted from: Wolen AR, Miles MF. Identifying gene networks underlying the neurobiology of ethanol and alcoholism. Alcohol Res. 2012;34(3):306-17.

Expression quantitative trait loci (eQTLs) = DNA loci that regulate expression levels of RNAs

Lundberg

RNA-seq to identify eQTLs
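One common way to test a single variant-gene pair for an eQTL is to regress expression on allele dosage. The sketch below assumes that approach (the slides do not specify the test), uses made-up toy data, and the function name is hypothetical.

```python
def eqtl_fit(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage (0, 1, 2)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    slope = sxy / sxx
    return slope, mean_e - slope * mean_g

# Hypothetical data: six individuals, expression as normalised RNA-seq values.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = eqtl_fit(dosages, expression)
```

A non-zero slope means each extra copy of the allele shifts expression; a real analysis would also need a significance test and covariates.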

Access to eQTLs

- GTEx (44 tissues with eQTLs) http://www.gtexportal.org/
- UKBEC (10 brain tissues with eQTLs) http://braineac.org/
- Individual studies

DNA methylation

5-methyl-Cytosine

Robertson (2005) Nature Reviews Genetics

TSG = tumor suppressor gene

DNA methylation changes over time

3-year-old twins vs. 50-year-old twins

Fraga et al. (2005) PNAS

Arrays to assess DNA CH3

Illumina HumanMethylation450K
Illumina EPIC (>850K CH3 sites)
• >90% of the content on the 450K
• New: CH3 sites in ENCODE open chromatin & enhancers

(Gold Standard) Bisulfite Conversion of gDNA

NEB.com
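Bisulfite treatment converts unmethylated cytosine to uracil (read as T after amplification) while 5-methyl-cytosine is protected, so methylation can be read off by comparing a converted read with the reference. A minimal sketch, assuming a single-strand toy sequence and hypothetical function names:

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C reads as T after conversion; methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, converted_read):
    """Positions where a reference C is still read as C, i.e. was protected."""
    return {
        i for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    }

reference = "ACGTCCGA"
methylated = {1, 5}                      # hypothetical methylated cytosines
read = bisulfite_convert(reference, methylated)
called = call_methylation(reference, read)
```

The cytosine at index 4 is unmethylated, so it appears as T in the read and is not called.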

Non-coding variation

Genetic variation → gene regulation → phenotype

Key Points
• Most variants identified through GWAS are non-coding
• Non-coding DNA helps regulate transcription (e.g. histone marks, eQTLs, DNA methylation)
• Functional annotations are dynamic

Outline
• Functional Genomics 101
• Using functional genomics to prioritize risk variants

GWAS hits overlap with functional regions

(Franke et al. 2010) N SNPs = 938,703; N inds = 6,333 cases & 15,056 controls

(Sotoodehnia et al. 2010) N SNPs ~2.5M; N inds ~40K

Figures from Maurano et al. (2012) Science, 337:1190-1195
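Overlap of this kind is usually summarised as an enrichment: the fraction of GWAS hits inside annotated regions relative to the fraction of background SNPs. The sketch below is an illustration only, with made-up coordinates on one chromosome and hypothetical function names; it is not the procedure used by Maurano et al.

```python
def in_any_region(pos, regions):
    """True if pos lies in any half-open interval [start, end)."""
    return any(start <= pos < end for start, end in regions)

def overlap_fraction(positions, regions):
    """Fraction of positions that fall inside at least one region."""
    return sum(in_any_region(p, regions) for p in positions) / len(positions)

regions = [(100, 200), (500, 650)]                  # hypothetical functional regions
gwas_hits = [120, 150, 510, 900]                    # hypothetical associated SNPs
background = [10, 130, 300, 400, 600, 700, 800, 950]

hit_fraction = overlap_fraction(gwas_hits, regions)
background_fraction = overlap_fraction(background, regions)
fold_enrichment = hit_fraction / background_fraction
```

A fold enrichment above 1 says hits land in functional regions more often than background SNPs do; a real analysis would match the background on allele frequency and LD and attach a significance test.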

Use functional info to identify hits

Variants (annotated + class label) → Algorithm →
- Variable importance measure for each annotation
- Prediction score for each variant

Machine Learning Steps

Algorithm: learn from data

Apply on test data
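The two steps can be sketched with a very simple classifier. The nearest-centroid method below is a stand-in chosen for brevity, not a method from the lecture; the annotation values and labels are invented.

```python
def fit_centroids(features, labels):
    """Learn step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Apply step: pick the class with the closest centroid (squared distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

# Hypothetical variants described by two annotation scores each.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = ["functional", "functional", "background", "background"]
test_x = [[0.7, 0.95], [0.15, 0.05]]
test_y = ["functional", "background"]

model = fit_centroids(train_x, train_y)             # learn from data
predictions = [predict(model, x) for x in test_x]   # apply on test data
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
```

Keeping the test variants out of the learning step is what makes the accuracy an honest estimate.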

Supervised vs. Unsupervised

Supervised: class assignment (data are labelled)

Unsupervised: pattern discovery (data aren't labelled)

Figure 1 from Libbrecht and Noble 2015

A simple ML application

Quiz! Supervised or unsupervised?
• Find novel TFBS based on annotated TFBS in the JASPAR database vs. non-TFBS sequences
• Identify clusters of similar tumors in cancer patients
• Identify pathogenic variants from variants with high derived allele frequency using genomic characteristics (CADD, Kircher et al. 2014)
• Identify risk variants from other variants, trained on genomic characteristics for Human Gene Mutation Database SNPs vs. 1KG (GWAVA, Ritchie et al. 2014)
• Segment the genome into X chromatin states using chromatin mark patterns from ENCODE data (Segway, Hoffman et al. 2012)

Support Vector Machines

Maximum-margin separating hyperplane in multi-dimensional space

Slide from Reena Ravji

Which line provides the best classifier?

The "Soft Margin"

Figure 1 Noble 2016 Nat. Biotech
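A soft-margin linear SVM trades margin width against margin violations through a penalty C. The sketch below minimises 0.5*||w||^2 + C * sum(hinge loss) by plain subgradient descent; the optimiser, learning rate, and toy points are assumptions made for illustration, and production SVM solvers work differently.

```python
def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=5000):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)              # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:            # inside the margin or misclassified
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 that x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data with labels -1 / +1.
X = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

A smaller C tolerates more points inside the margin, which is the "soft" part; a large C approaches the hard maximum-margin line.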

Linearly nonseparable? Kernel Trick

1. Kernel function to project the data in 2-D to 4-D space
2. Project SVM hyperplane in 4-D to original 2-D space

Figure 1 Noble 2016 Nat. Biotech
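The two steps can be illustrated with an explicit feature map. This sketch assumes a hypothetical map (x1, x2) → (x1, x2, x1², x2²) and a hand-picked 4-D hyperplane rather than a learned one; a real kernel SVM never builds the 4-D vectors and only evaluates inner products through the kernel function.

```python
def feature_map(point):
    """Project a 2-D point into 4-D: (x1, x2) -> (x1, x2, x1^2, x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 * x1, x2 * x2)

# Hand-picked hyperplane in 4-D: w . phi(x) + b = 0 with w = (0, 0, 1, 1), b = -1.
# Projected back to 2-D this is the unit circle x1^2 + x2^2 = 1.
W = (0.0, 0.0, 1.0, 1.0)
B = -1.0

def classify(point):
    """Linear decision in 4-D, which is a circular boundary in 2-D."""
    score = sum(w * f for w, f in zip(W, feature_map(point))) + B
    return 1 if score > 0 else -1

inner = [(0.0, 0.0), (0.5, 0.0), (-0.3, 0.4)]   # toy points inside the circle
outer = [(1.5, 0.0), (0.0, -2.0), (1.2, 1.2)]   # toy points outside the circle
inner_labels = [classify(p) for p in inner]
outer_labels = [classify(p) for p in outer]
```

No straight line in 2-D separates the inner points from the outer ones, but a flat hyperplane in the 4-D space does.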

1. Classifier 2. SVM protocol 3. Some results

1. Classifier

Simulated vs. Observed variants

Simulated (14.7 million variants):
- Empirical model of sequence evolution with local adjustment of mutation rates

Observed (14.7 million variants):
- Human derived allele frequency >95% (exposed to many generations of natural selection)

1% reserved for testing

John Long (Kellis Lab) "Evidence of Purifying Selection in Humans" https://math.mit.edu/research/highschool/primes/materials/2013/conf/9-2-Long.pdf

Peppered Moth Evolution

Figure labels: Pollution / No pollution; Ancestral / Derived

Photo from: http://www.truthinscience.org.uk/tis2/index.php/evidence-for-evolution-mainmenu-65/127-the-peppered-moth.html

2. SVM Protocol

Kircher's use of SVM

• Linear kernel
• Prior feature selection (univariate analysis)
• Imputed missing values
• Boolean variables for categorical variables
• Interaction terms (include the few that improve the two-feature linear regression models)
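Two of the preprocessing steps listed above, imputing missing values and turning categorical annotations into Boolean variables, can be sketched as follows. Mean imputation is an assumption made here for illustration (the slides do not say which values were imputed), and the annotation names and values are invented.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def one_hot(values):
    """Encode a categorical column as one Boolean column per category."""
    categories = sorted(set(values))
    return categories, [[v == c for c in categories] for v in values]

# Hypothetical annotations for four variants.
conservation = [1.0, None, 0.5, 0.0]
consequence = ["intronic", "missense", "intergenic", "intronic"]

imputed = impute_mean(conservation)
categories, encoded = one_hot(consequence)
```

After these steps every variant is a fully numeric vector, which is what a linear-kernel SVM requires.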

3. Some results

Performance in GWAS

Figure 5 from Kircher et al. 2014 Nat. Genetics

deltaSVM

• Takes into account tissue-specificity

Cell-type-specific annotations for training cell-type-specific models

Compare frequency of k-mers

Figure 1 from Lee et al. 2015 Nat. Genetics
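A k-mer based variant score can be sketched as the change in summed k-mer weights between the alternate and reference alleles, over the k-mers that span the variant. This is a simplified reading of the idea, with k = 3 and invented weights; real weights would come from a cell-type-specific trained model, and the function names are hypothetical.

```python
def kmers_spanning(seq, pos, k):
    """All k-mers of seq that include the base at index pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]

def delta_score(ref_seq, pos, alt_base, weights, k):
    """Summed k-mer weight of the alternate allele minus that of the reference."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(weights.get(km, 0.0) for km in kmers_spanning(ref_seq, pos, k))
    alt_total = sum(weights.get(km, 0.0) for km in kmers_spanning(alt_seq, pos, k))
    return alt_total - ref_total

# Hypothetical k-mer weights; k-mers not listed count as 0.
weights = {"ACG": 0.5, "CGT": 1.0, "GTA": -0.25, "CAT": -1.0}
score = delta_score("ACGTA", 2, "A", weights, 3)   # G -> A at index 2
```

A strongly negative score suggests the alternate allele removes sequence features the model associates with regulatory activity in that cell type.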

Hepatocytes (main cell in liver)

deltaSVM values for 3 experimentally validated SNPs

human prostate adenocarcinoma

Key Points
• GWAS variants are enriched for functional information in a tissue-specific manner
• Machine learning can be used to find patterns in functional data to identify novel variants associated with disease