a brief introduction to chip-seq - bioshare...a brief introduction to chip-seq monica britton, ph.d....
TRANSCRIPT
A Brief Introduction to ChIP-Seq
Monica Britton, Ph.D. Sr. Bioinformatics Analyst
March 2015 Workshop
A Word of Apology
This lecture would normally be given by one of the UC Davis ChIP-Seq experts. Unfortunately, this week they are traveling. So, this lecture will mostly cover the basics. But, we will keep track of all questions asked and post the answers as soon as possible!
Functional Elements in the Genome
www.encodeproject.org
Chromatin Structure Determines Gene Status
Transcription factors cannot gain access
gene A gene B
Closed Chromatin
gene A gene B
Promoter and enhancer binding sites accessible
Open chromatin
DNA and Histone Modifications Create an Epigenome
MBD
protein
MBD
protein
meC
protein that creates
and/or binds to meC modified
histone
protein that modifies and/or
binds to a modified histone
Coordinated Efforts to Decipher Epigenomes
• There is a wealth of publicly available data. Don’t be afraid to dig!
• NIH Roadmap Epigenomics Mapping Consortium http://www.roadmapepigenomics.org/
• Encyclopedia of DNA Elements Consortium (ENCODE)
• ENCODE data limited to cell-types at http://genome.ucsc.edu/index.html
ChIP (Before the Seq)
www.whatisepigenetics.com
Comparison of Experimental Protocols
Furey, 2012 Nat. Rev. Gen. 13:840
Key Steps in a ChIP Assay
ChIP DNA Input DNA
Cross-linking
Fragmentation
Immunoprecipitation
DNA purification
Henny’s Parameters for a Successful ChIP
• Cross-linking
• Antibody, antibody, antibody – ChIP-grade antibodies commercially available
– Previously published
– Ideally antibody works in IP
• Amount of starting material
• Chromatin fragmentation – Size does matter
– Just right: not too big and not too small
• IP-matrix – Magnetic beads are easy to handle
• Stringency of washes
Sonication Conditions Vary Between Different Cell Types
Fragments too big:
Reduced signal to noise ratio
in ChIP-seq
Oversonication:
Fragmentation biased towards
promoter regions leading to
ChIP-seq enrichments at
promoters in ChIP AND
control (input) sample
ChIP-Seq QC Check Points
O’Geen et al., 2011, Meth. in Mol. Biol , 791:265
ENCODE Recommendations for Antibody Characterization
Landt, et al., 2012 Genome Res. 22:1813
ENCODE Guidelines For Controls and Replicates
• Controls are Important! – Necessary to avoid non-uniform background (sonication, etc.)
– Many cell lines have aneuploidy (genome size, copy number)
– “Input” controls – similar prep, but no ChIP.
– “IgG” – IP without the specific antibody
– If amplification is done, must be done on all samples, including controls (and complexity needs to be evaluated after sequencing to ensure peaks aren’t due to PCR artifacts)
• Replicates – Minimum of two biological replicates.
– The number of mapped reads and identified targets should be within 2 fold between replicates
– 80% of the top 40% of targets from one replicate should overlap the list of targets from the other replicate. OR
– More than 75% of targets should be in common between each replicate
ENCODE Guidelines for Sequencing Depth
• The number of targets that can be identified varies substantially between cell types and experiments
• Depends on the TF, antibody, and peak-calling algorithm.
• Mammalian cells: – 10M uniquely mapped reads per replicate for point-source peaks
(increased from previous requirement of 3M reads)
– 20M uniquely mapped reads per replicate for broad-source peaks
• Other organisms need fewer reads (insects, yeast, etc.)
• Each replicate should be sequenced to similar depth. Controls to similar or greater depth.
• Complexity is important – low complexity libraries indicate PCR over-amplification, resulting in high false-positive rate (and failed experiment).
• FRiP (Fraction of Reads in Peaks) should be >1% of reads
Called Peaks Increase With Sequencing Depth
Landt, et al., 2012 Genome Res. 22:1813
From Binding Site to Sequence to Peak
Short sequences are generated from each DNA molecule.
When mapped, a tag distribution is seen around a stable binding site.
Cross-correlation is calculated for the distance between positive- and negative-strand peaks
Kharchenko et al., 2008 Nat. Biotech. 26:1351
Library Complexity and Cross-Correlation
Landt, et al., 2012 Genome Res. 22:1813
Characteristics of Good ChIP-Seq Experiments
Successful experiments will identify many thousands of peaks for most TFs. These peaks will also have good cross-correlation
ChIP-Seq Peak Characteristics
Two cross-correlation peaks are usually observed: one corresponding to the read length (‘‘phantom’’ peak) and the other correspoding to the average fragment length of the library.
The absolute and relative height of the two peaks can help determine the success of a ChIP-seq experiment.
In a high-quality IP, the ChIP peak is much higher than the ‘‘phantom’’ peak, while very small or no ChIP peaks are seen in failed experiments.
Comparison of Good vs. Poor ChIP-Seq Experiments
A (Non-Exhaustive) List of Useful References
ENCODE and modENCODE Guidelines For Experiments Generating ChIP, DNase, FAIRE, and DNA Methylation Genome Wide Location Data Version 2.0, July 20, 2011 (www.encodeproject.org)
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Landt et al., Genome Research, 2012, 22:1813.
ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Furey, Nat. Rev. Genetics 2012, 13:840
Using ChIP-Seq Technology to Generate High-Resolution Profiles of Histone Modifications. O’Geen et al., 2011, Methods in Molecular Biology , 791:265
Design and analysis of ChIP-seq experiments for DNA-binding proteins. Kharchenko et al., 2008 Nature Biotechnology 26:1351
Global analysis of ZNF217 chromatin
occupancy in the breast cancer cell genome
reveals an association with ERalpha
Frietze S, O'Geen H, Littlepage LE, Simion C,
Sweeney CA, Farnham PJ, Krig SR.
BMC Genomics. 2014 Jun 24;15:520.
The following slides are from Heny O’Geen’s lecture
at our December Workshop …..
• The 20q13 locus is one of twenty known “hot
spots” found in breast cancer.
• 20q13 is clinically associated with
aggressive disease and poor prognosis.
• ZNF217: multiple zinc finger motifs,
suggested a DNA-binding protein and putative
oncogene
• Hypothesis: ZNF217 overexpression gives a
selective advantage by interfering with
regulation of : cell growth, cell death,
differentiation or DNA repair.
The 20q13.2 locus is amplified
in 20-30% of breast tumors
Collins et al., PNAS 1998
Genome-wide Mapping of ZNF217 in
MCF7 cells to elucidate function
ENCODE data
0
10
20
30
40
50 T
IFF
1
CC
ND
1
Axin
1
TC
FL1
SL
C2
2A
5
CA
RM
1a
CA
RM
1b
erb
b3
TIF
F1
Lrig1
PG
R
Lrig3
TG
FB
2
ZN
F10
Pro
mote
r
Enhancer Co
ntr
ol
Rela
tiv
e C
hIP
en
rich
men
t ZNF217
IgG
0
20
40
60
80
100
120
ERBB3 TIFF1 LRIG1
ZNF217
ERa
FOXA1
IgG
Rela
tive C
hIP
en
rich
men
t
A
B
ZNF217 binding (33%) at intronic
enhancer regions
Motif analysis: ZNF217 binding localizes to GATA, FOXA1
and ER (endocrine response) motifs
GREAT: Pathway Commons Category
Gene Ontology supports ZNF217
co-binding with ER and FOXA1
Heatmap clustering of ENCODE
ChIP-seq datasets
ZNF217 co-binding with ER, GATA3
and FOXA1
0
10
20
30
40
50
TIF
F1
CC
ND
1
Axin
1
TC
FL
1
SL
C2
2A
5
CA
RM
1a
CA
RM
1b
erb
b3
TIF
F1
Lri
g1
PG
R
Lri
g3
TG
FB
2
ZN
F1
0
Pro
mo
ter
Enhancer Co
ntr
ol
Rela
tiv
e C
hIP
en
ric
hm
en
t ZNF217
IgG
0
20
40
60
80
100
120
ERBB3 TIFF1 LRIG1
ZNF217
ERa
FOXA1
IgG
Rela
tive C
hIP
en
ric
hm
en
t
A
B
Mohammed et al., Cell Rep 2013
ER co-immunopreciptation identifies
108 ER-associated proteins
ZNF217
ERα
Actin
ZNF and ER protein correlate in
breast tumors
ERE
ER
CtBP
TSS
ZNF217
RNA Pol II
FoxA1
Chromatin Looping
>5 kb
Model for chromatin looping. Change in the three dimensional
structure between the distal ER-bound enhancer region and the proximal
promoter allows protein interactions with RNA pol II at the TSS.
Hypothesis: ZNF217 regulates chromatin
looping to assist ER gene regulation
Gene expression changes in
ZNF217 knock-down cells
Gene expression changes in ZNF217
knock-down cells
Gene Set Enrichment Analysis (GSEA) compares differences between two biological states
Gene Ontology on differentially expressed
genes co-bound by ZNF217 and ER
Clinical Conclusions:
ZNF217 overexpression is a biomarker for worse prognosis in
breast cancer patients:
- Prognostic: Identify a subset of ZNF217-overexpressing tumors with
worse prognosis in the ER+ Luminal A subtype
- Predictive: Identify tumors with de novo anti-hormone resistance or
more likely to relapse with anti-hormone treatment
Hypothesis: ZNF217 overexpression in ER+ breast cancer leads to aberrant
ER gene regulation
- Knowledge of the ZNF217 role in ER biology will have clinical impact
- ZNF217 overexpression alters ER co-regulator activity at the chromatin level