Genomics of Gene Regulation 2: Conservation, Integration of features, Assays, Issues in interpretation
CSHL Course in Computational and Comparative Genomics 2015
Ross Hardison
10/29/15
Features of cis-regulatory modules (CRMs)
Hardison & Taylor (2012) Nature Reviews Genetics 13: 469-483
a. Bound and unbound motif instances
b. Transcription factors and histone modifications characteristic of different CRMs
CONSERVATION OF SEQUENCE AND EPIGENETIC FEATURES OF CRMS
Methods for predicting CRMs
Hardison & Taylor (2012) Nature Reviews Genetics 13: 469-483
Erythroid enhancer, HS2 of HBB locus control region
[Figure: UCSC Genome Browser view of HS2, Human Feb. 2009 (GRCh37/hg19) chr11:5,301,795-5,302,089 (295 bp). Tracks: HS2_pos, K562 signal tracks, PBDE GATA1 (UCD), Mammal Cons, repeat annotations. Annotated features: TFs bound (NFE2, KLF1, TAL1, GATA); DNase footprints; mammalian constraint; ChIP-seq GATA1 in PBDE; DNase HS; matches to WGATAR.]
But not all CRMs are that obvious…
Evolutionary constraint on SOME enhancers
• Occupancy of transcription factors is conserved in mouse and humans
• Strong evidence for evolutionary constraint on the DNA sequence
• Preservation of the TF binding site motifs across mammals
Hardison & Taylor (2012) Nature Reviews Genetics 13: 469-483
Motif turnover at SOME enhancers
• Occupancy of transcription factors is conserved in mouse and humans
• More localized evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif across mammals, but the second motif is in a different location in rodents compared to other mammals (lineage-specific)
Hardison & Taylor (2012) Nature Reviews Genetics 13: 469-483
Lineage specific evolution of SOME enhancers
• Occupancy of transcription factors only in mouse, not human
• No evidence for evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif in rodents and laurasiatherians (dog, horse, cow), but not in humans (lineage-specific loss of binding?)
Hardison & Taylor (2012) Nature Reviews Genetics 13: 469-483
Different approaches to finding function
ENCODE Project Consortium "Defining functional elements in the human genome" (2014) PNAS
How similar are patterns of gene expression between human and mouse?
Mouse ENCODE Project Consortium (2014) Integrated Encyclopedia of mouse DNA elements. Nature
Distinctly different expression patterns
Mouse ENCODE Project Consortium (2014) Integrated Encyclopedia of mouse DNA elements. Nature
Genes with high variance between 0ssues
Genes with high variance between species
Conservation: Sequence-level and activity-level
About 40% of regulatory DNA (TFBS, DHS) in mouse maps to aligning DNA in human.
About 10% of TF-bound DNA in mouse is also bound by the same TF in human.
Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor (2015) Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution. BMC Genomics.
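The two percentages on this slide measure different things: the first asks whether a mouse regulatory segment has aligning DNA in human at all (sequence level), the second whether that aligned DNA is also occupied in human (activity level). The following is a minimal sketch of that bookkeeping, not the pipeline used in the cited paper. The function names, the interval representation, and the toy coordinates are all hypothetical; in practice the `mapped` list would come from a whole-genome alignment or a liftover step.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def conservation_fractions(
    mapped: List[Optional[Interval]], human_peaks: List[Interval]
) -> Tuple[float, float]:
    """Return (fraction aligned, fraction aligned and bound in human).

    mapped[i] is the human interval that mouse peak i aligns to,
    or None when the mouse peak has no aligning human DNA.
    """
    total = len(mapped)
    aligned = [m for m in mapped if m is not None]
    shared = [m for m in aligned if any(overlaps(m, p) for p in human_peaks)]
    return len(aligned) / total, len(shared) / total


# Hypothetical toy data: 10 mouse peaks, 4 of which align to human.
mapped = [
    ("chr1", 100, 300), None, None, ("chr1", 5000, 5200), None,
    None, ("chr2", 800, 1000), None, ("chr3", 40, 240), None,
]
human_peaks = [("chr1", 250, 450), ("chr2", 2000, 2200)]

seq_level, activity_level = conservation_fractions(mapped, human_peaks)
print(seq_level, activity_level)  # 0.4 0.1
```

Both fractions use the full set of mouse peaks as the denominator, which matches how the slide phrases the 40% and 10% figures; the toy numbers were chosen only to echo them.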
Genomic DNA segments occupied by orthologous pairs of TFs
Conserved locations relative to TSS
Conserved binding site motifs
Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014) Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature
Conserved and divergent occupancy of orthologous DNA segments
Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014) Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature
Conservation of GATA1-occupancy between mouse and human
[Figure: UCSC Genome Browser views at hs1862. Mouse July 2007 (NCBI37/mm9) chr1:156,885,743-156,887,787 (2,045 bp); tracks: hs1862_heart, G1E-ER GATA1 24hr, Erythrobl GATA1, MEL GATA-1, Mammal Cons. Human Feb. 2009 (GRCh37/hg19) chr1:181,121,049-181,123,654 (2,606 bp); tracks: hs1862, K562 GATA1 Sg, PBDE GATA1 Sg, Mammal Cons. Multiple alignment at human chr1:181,122,256-181,122,304 (49 bp) for Human, Orangutan, Rhesus, Marmoset, Mouse lemur, Mouse, Rat, Guinea pig, Cow, Horse, Dog, Elephant, Tenrec, Armadillo, Sloth, Opossum, Platypus, Chicken, Lizard. Human row: CAGAACGTTCCTTATCTCTCTGCAGCAGGACGCTGATAATCTGCCCAGC]
Motifs for GATA factor binding preserved across mammals
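The WGATAR consensus used on the HS2 slide can be checked against the 49-bp human sequence from the alignment on this slide. The sketch below scans both strands with regular expressions; the function name `scan_wgatar` is hypothetical, the sequence is transcribed from the human row of the alignment, and positions are 0-based offsets within that 49-bp window rather than genome coordinates.

```python
import re

# IUPAC codes in the consensus: W = A or T, R = A or G.
# Lookahead groups allow overlapping matches to be reported.
FORWARD = re.compile(r"(?=([AT]GATA[AG]))")
# The reverse complement of WGATAR is YTATCW (Y = C or T).
REVERSE = re.compile(r"(?=([CT]TATC[AT]))")


def scan_wgatar(seq):
    """Return sorted (offset, strand, matched text on the given strand) tuples."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)


HUMAN_49BP = "CAGAACGTTCCTTATCTCTCTGCAGCAGGACGCTGATAATCTGCCCAGC"
print(scan_wgatar(HUMAN_49BP))
# [(11, '-', 'TTATCT'), (33, '+', 'TGATAA')]
```

The two hits are in opposite orientations, TTATCT (AGATAA on the other strand) and TGATAA. A consensus match like this says nothing about occupancy by itself, which is why the slides pair motif matches with ChIP-seq tracks.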
Conservation of TF occupancy predicts enhancers active in multiple tissues
Model: Pleiotropic functions (multiple tissues, multiple TFs binding) are subject to stronger constraint, leading to preservation of occupancy despite tendency of regulatory regions to “turn over”
Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014) Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature
Enhancers predicted by conserved GATA1 binding are active in tissues with paralogous GATA factors
[Table: presence ("+") of GATA factors and cofactors by tissue. Rows: GATA1, GATA2, GATA3, GATA4, GATA5, GATA6, FOG1, FOG2. Columns: Erythroid, Megakaryocyte; T-lymphocytes; Heart; Brain; Vasculature; Liver; Pancreas; Lung; Intestine; Ovary; Testis. Number of "+" entries per row: GATA1 1, GATA2 3, GATA3 3, GATA4 5, GATA5 1, GATA6 7, FOG1 1, FOG2 4 (column assignments not shown).]
Hypothesis: The same GATA factor-dependent enhancer is used in erythroid (GATA1), heart (GATA4) and brain (GATA3) for different targets.
Non-erythroid function of GATA1-bound sites could result from binding of paralogs (e.g. GATA4) to same site in other tissues
[Figure: browser view, Mouse Dec. 2011 (GRCm38/mm10) chr3:84,438,567-84,482,797 (44,231 bp), at Fhdc1. Tracks: ERY GATA1 (GSM746581_2_Gata1.bw, scale 0-226), ERY GATA1 (GSM1151146_Gata1.bw, scale 0-74), Heart GATA4 (GSM558904_Gata4.bw, scale 0-181), Heart EP300 (GSM558909_Ep300.bw, scale 0-49).]
Gottgens, CODEX http://codex.stemcells.cam.ac.uk
GAIN-OF-FUNCTION ENHANCER ASSAYS
TF OCCUPANCY IS A GOOD PREDICTOR OF ERYTHROID ENHANCERS: TAL1
Nergiz Dogan
TAL1 + GATA1 = induction
Tripic et al (2009) Blood 113: 2191
Cheng et al (2009) Genome Res. 19: 2172
Wu et al (2011) Genome Res. 21: 1659
Epigenetic signatures can predict enhancers with high accuracy
Dogan et al (2015) Epigenetics & Chromatin 8: 16
What distinguishes enhancer active vs inactive TAL1 OSs?
[Figure: browser views of TAL1 ChIP-seq signal (scale 1 to 150) at two TAL1 peaks on mm9: TAL1_2105 (chr18, axis ticks 32,701,500 to 32,703,000) and TAL1_201 (chr1, axis ticks 135,722,000 to 135,723,000). Panel labels: "Fold change in activity" (values as listed: 6.27, 0.39) and "ChIP-seq signals of TAL1 peaks".]
Dogan et al. (2015) Epigenetics & Chromatin 8: 16.
Clusters of feature combinations contribute differentially to measured enhancement
Dogan et al. (2015) Epigenetics & Chromatin 8: 16.
TF occupancy: frequently active as enhancers
HMs without TFs: rarely active as enhancers
n = 273
Dogan et al. (2015) Epigenetics & Chromatin 8: 16.
INTEGRATIVE ANALYSES
ENCODE consortium
“Paint” genomic regions by dominant histone modifications: multivariate HMM
Ernst and Kellis (2010) Nature Biotechnology 28: 817…
Wu et al. 2011 Genome Res 21: 1659-1671
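To make the "painting" idea concrete, here is a minimal sketch of decoding a multivariate HMM over binarized histone-mark calls. It is an illustration under stated assumptions, not the ChromHMM implementation: the three states, the mark order (H3K4me1, H3K4me3, H3K27ac), and every probability below are made up, emissions are modeled as independent Bernoulli variables per mark, and Viterbi decoding is used for brevity. In real use the parameters are learned from data rather than set by hand.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most likely state path for a series of binarized mark vectors.

    obs:   list of 0/1 tuples, one tuple per genomic bin, one entry per mark
    start: start[s] = P(first bin is in state s)
    trans: trans[r][s] = P(next state is s | current state is r)
    emit:  emit[s][m] = P(mark m present | state s); marks independent given state
    """
    n_states = len(start)

    def log_emit(s, o):
        return sum(
            math.log(emit[s][m] if present else 1.0 - emit[s][m])
            for m, present in enumerate(o)
        )

    score = [math.log(start[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor for each state, then the updated log score.
        ptr = [
            max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            for s in range(n_states)
        ]
        score = [
            prev[ptr[s]] + math.log(trans[ptr[s]][s]) + log_emit(s, o)
            for s in range(n_states)
        ]
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: state 0 = quiescent, 1 = enhancer-like, 2 = promoter-like.
start = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
emit = [[0.05, 0.05, 0.05], [0.9, 0.1, 0.8], [0.2, 0.9, 0.8]]
obs = [(0, 0, 0), (0, 0, 0), (1, 0, 1), (1, 0, 1),
       (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1)]

print(viterbi(obs, start, trans, emit))  # [0, 0, 1, 1, 0, 0, 2, 2]
```

Each bin is assigned one state from the joint pattern of marks, so a bin with H3K4me1 and H3K27ac is painted differently from one with H3K4me3 and H3K27ac even though both carry H3K27ac.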
Integrate features using multivariate segmentations
M. Hoffman, J. Ernst et al. 2013. Integrative segmentations of function-associated marks. Nucleic Acids Res.
Enhancer predicted from HMM segmentation based on histone modifications
ENCODE Project Consortium: Combined segmentation results from chromHMM (Ernst et al) and Segway (Hoffman et al) to generate 25-state models from histone modifications and other epigenetic features. Use the enhancer-associated states to predict CRMs. Test in mice and fish.
ENCODE Project Consortium (2012) Nature
DNase HSs
Accurate predictions of enhancers
Transient transgenic mouse embryo
Circulating erythrocytes with GFP in transgenic Medaka fish
ENCODE Project Consortium (2012) Nature
MASSIVELY PARALLEL ENHANCER ASSAYS
Still use the genomics and epigenomics, but ramp up the scale of assays
CRE-seq
Kwasnieski, J.C. et al. (B.A. Cohen) 2012. PNAS USA 109:19498–19503
Similar assays from Shendure and Mikkelsen groups
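In barcoded reporter assays of this kind, each candidate element is tagged with sequence barcodes, and activity is read out as barcode abundance in RNA relative to barcode abundance in the plasmid DNA. The sketch below shows one simple way to turn counts into per-element activity. It is an assumption-laden illustration, not the analysis from the cited papers: the function name, the `min_dna` filter, the normalization to a basal-promoter construct labeled `"basal"`, and all counts are hypothetical.

```python
from collections import defaultdict


def cre_activity(barcode_to_element, rna_counts, dna_counts, basal="basal", min_dna=10):
    """Mean RNA/DNA barcode ratio per element, relative to the basal construct."""
    ratios = defaultdict(list)
    for barcode, element in barcode_to_element.items():
        dna = dna_counts.get(barcode, 0)
        if dna < min_dna:
            continue  # skip barcodes that are poorly represented in the library
        ratios[element].append(rna_counts.get(barcode, 0) / dna)
    mean = {element: sum(v) / len(v) for element, v in ratios.items()}
    return {element: m / mean[basal] for element, m in mean.items() if element != basal}


# Hypothetical counts: two barcodes per construct.
barcode_to_element = {
    "AAA": "basal", "AAC": "basal",
    "ACG": "crm1", "ACT": "crm1",
    "AGG": "crm2", "AGT": "crm2",
}
dna_counts = {"AAA": 100, "AAC": 200, "ACG": 100, "ACT": 50, "AGG": 100, "AGT": 5}
rna_counts = {"AAA": 50, "AAC": 100, "ACG": 200, "ACT": 100, "AGG": 25, "AGT": 40}

activity = cre_activity(barcode_to_element, rna_counts, dna_counts)
print(activity)  # {'crm1': 4.0, 'crm2': 0.5}
```

Dividing by DNA counts corrects for uneven representation of barcodes in the library, and using several barcodes per element averages out barcode-specific effects. Barcode AGT is dropped here because its DNA count falls below the filter.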
CRE-seq: For DNA segments with CRX binding site motifs, occupancy by CRX correlates with enhancer activity
White, Myers, Corbo, Cohen (2013). PNAS USA 110:11952-11957
Reproducible expression measurements show differences in expression by segmentation class.
Kwasnieski J C et al. Genome Res. 2014;24:1595-1602
© 2014 Kwasnieski et al.; Published by Cold Spring Harbor Laboratory Press
Fraction of “predictions” that are active in CRE-seq
Kwasnieski J C et al. Genome Res. 2014;24:1595-1602
Combination: STARR-seq of predicted CRMs
Feng Yue with Mouse ENCODE (2014). Nature
MACHINE LEARNING APPROACHES TO FINDING CRMS
EnhancerFinder integrates diverse datasets to predict developmental enhancers
Erwin et al (K. Pollard) 2014. PLoS Comp Biol 10: e1003677
[Figure labels: motifs, constraint, epigenetic 1, epigenetic 2, epigenetic 3]
EnhancerFinder: Two stage predictions
Erwin et al (K. Pollard) 2014. PLoS Comp Biol 10: e1003677
Single marks work, but integration improves prediction
Erwin et al (K. Pollard) 2014. PLoS Comp Biol 10: e1003677
Examples of successful enhancer predictions
Erwin et al (K. Pollard) 2014. PLoS Comp Biol 10: e1003677
DIRECT SELECTION FOR ACTIVITY TO FIND ENHANCERS
Forget the genomics and bioinformatics …
STARR-seq: Self-transcribing active regulatory region sequencing
Arnold et al (A. Stark) 2013. Science 339: 1074-1077
Other methods of screening for enhancer activity on a large scale
Murtha, M. et al. Nat. Methods 11, (2014).
Dickel, D.E. et al. Nat. Methods 11, (2014).
Hardison, R.C. Nat. Methods 11, News & Views (2014).
LOSS-OF-FUNCTION ENHANCER ASSAYS
Exploiting nature’s variants for novel avenues to therapy
Hardison and Blobel (2013) Science 342: 206; commenting on the article by Bauer et al. in the same issue
CRISPR-Cas9 to engineer close to saturation mutagenesis
Canver et al. … Zhang, Orkin, Bauer (2015) BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature, published online Sept 16
Find the region of a candidate enhancer that is needed for activity
Canver et al. … Zhang, Orkin, Bauer (2015) Nature, published online Sept 16
Fine mapping of “Achilles heel” of enhancer
Canver et al. … Zhang, Orkin, Bauer (2015) Nature, published online Sept 16
Highly constrained noncoding sequences are frequently tissue-specific enhancers
Pennacchio et al. (2006) Nature 444: 499-502
Implications of enhancer activities in highly constrained sequences
• Non-coding regions that are conserved at an unusually high level are highly enriched in enhancers located near developmental genes
• We should be able to predict many enhancer regions based on strong sequence constraint
Assumption: These regions are sequence constrained because of functional constraint
Slides edited from: Jonathan McGovern
BMMB 541 4/30/09
Deletion of Ultraconserved Elements Yields Viable Mice
Nadav Ahituv, Yiwen Zhu, Amy Holt, Veena Afzal, Len A. Pennacchio and Edward Rubin
PLoS Biology, September 2007
But…
Selection of Ultraconserved Enhancers
KO PHENOTYPE
• Dmrt1: Male sexual development abnormalities
• Pax6: Eye defects, lethality, CNS, craniofacial, pituitary and pancreatic abnormalities
• DNA polymerase: Assumed Lethal
• ATP11C: Assumed Lethal
• Dmrt3: Male sexual development abnormalities, lethal due to dental malformation
• WT1: Wilms tumor, kidney defects, lethality, mesothelium defects, heart/lung malformation
• ARX: Lethality, male sexual development abnormalities, small brain
• Sox3: Abnormal sexual development and pituitary function, mental retardation in humans
Successful Knock-Out
KO PHENOTYPE
• Dmrt1: Male sexual development abnormalities
• Pax6: Eye defects, lethality, CNS, craniofacial, pituitary and pancreatic abnormalities
• DNA polymerase: Assumed Lethal
• ATP11C: Assumed Lethal
• Dmrt3: Male sexual development abnormalities, lethal due to dental malformation
• WT1: Wilms tumor, kidney defects, lethality, mesothelium defects, heart/lung malformation
• ARX: Lethality, male sexual development abnormalities, small brain
• Sox3: Abnormal sexual development and pituitary function, mental retardation in humans
Results
• uc248 knockout:
  - Expected Phenotype: Male reproductive abnormality, severe dental malformation
  - Observed Phenotype: Normal reproductive capability (Table 4), normal dentition
• uc467 knockout:
  - Expected Phenotype: Perinatal mortality, small brain, male sexual reproductive abnormality
  - Observed Phenotype: Normal brain, normal reproductive capability (Table 4), normal survival (Table 2)
• uc329 knockout:
  - Expected Phenotype: Wilms tumor, WAGR syndrome, other kidney abnormality, eye abnormalities
  - Observed Phenotype: ~2% unilateral renal agenesis (compare to 0.5% in wt), normal eyes
Table 2 (Expect 1:2:1 Ratio)
Table 4
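The "Expect 1:2:1 Ratio" note for Table 2 refers to the Mendelian genotype ratio (wild type : heterozygous : homozygous deletion) expected among offspring of heterozygous crosses when the deletion does not affect survival. One standard way to check observed counts against that expectation is a chi-square goodness-of-fit test, sketched below. The genotype counts are hypothetical and are not taken from Ahituv et al.; only the 1:2:1 expectation comes from the slide.

```python
def chi_square_1_2_1(wt, het, hom):
    """Chi-square statistic for observed genotype counts against a 1:2:1 ratio."""
    total = wt + het + hom
    expected = (total * 0.25, total * 0.5, total * 0.25)
    observed = (wt, het, hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_DF2_ALPHA_05 = 5.991

# Hypothetical litter genotype counts (not from the paper).
stat = chi_square_1_2_1(wt=26, het=50, hom=24)
print(round(stat, 2), stat > CRITICAL_DF2_ALPHA_05)  # 0.08 False
```

A statistic below the critical value means the counts are consistent with 1:2:1, which is the pattern described on the slide as normal survival. A deficit of homozygous-deletion animals would instead inflate the statistic.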
Summary #2
• Loss of highly conserved regulatory regions leads to insignificant (if any) phenotype
• Maybe there is a phenotype outside the lab or over time
• Regulatory element redundancy probable
• The idea that highly conserved non-coding regions are hotspots for developmental enhancers is not being challenged, but it raises the question…
What prevents sequence change?
Deep phenotyping can reveal defects from enhancer deletions
Attanasio et al…Pennacchio, Visel (2013) Science 342:1241006
Predict cranio-facial enhancers by P300 ChIP-seq. Delete candidate enhancers. Look for phenotypes by morphometric analysis.
Genomics of Gene Regulation: Predicting and testing CRMs
• Conservation and patterns of alignments in noncoding regions can be used to predict CRMs
  – Miss lineage specific functions, turnover
• Biochemical features associated with cis-regulatory modules can be used to predict CRMs
  – May over-call CRMs
  – TF occupancy alone does not necessarily mean that the DNA is actively involved in regulation.
• Start with epigenetic features, and use evolutionary patterns to discern history and predict functions
  – Some genes have conserved expression patterns, others differ between species
  – Conservation of TF occupancy: Pleiotropic functions, core functions
  – Lineage-specific TF occupancy: Adaptive functions
  – Sequence conserved but function co-opted (exapted) to different function in one species
Genomics of Gene Regulation: Integration and Assays
• Individual features can be good predictors, but they tend to point to similar regions
  – EP300, tissue-specific TFs, H3K4me1, H3K27ac
  – Simple intersections do not increase power very much
• Integration of features by unsupervised machine-learning can reveal frequently occurring (and important) states
• Supervised machine-learning (e.g. EnhancerFinder) does a good job of finding the enhancers it was trained to find: developmental enhancers
• Significant progress in ramping up the scale of gain-of-function assays for enhancers
• Loss-of-function assays can be definitive
  – Facilitated by new genome editing technology (CRISPR-Cas9)
  – Sometimes show no phenotype for strong gain-of-function enhancers