Chromatin Structure & Dynamics
Victor Jin
Department of Biomedical Informatics
The Ohio State University
Chromatin
Walther Flemming first used the term chromatin in 1882. At that time, Flemming assumed that there was some kind of nuclear scaffold within the nucleus.
Chromatin is the complex of DNA and protein that makes up chromosomes.
Chromatin structure: DNA wrapped around histone proteins to form nucleosomes, giving a “beads on a string” structure.
In non-dividing cells there are two types of chromatin: euchromatin and heterochromatin.
Chromatin Fibers
30 nm chromatin fiber
11 nm fiber (beads)
Chromatin as seen in the electron microscope. (source: Alberts et al., Molecular Biology of The Cell, 3rd Edition)
Nucleosome
The basic repeating unit of chromatin.
It is made up of five histone proteins: H2A, H2B, H3 and H4 as core histones, and H1 as a linker histone.
It provides the lowest level of compaction of double-stranded DNA in the cell nucleus.
It is often associated with the regulation of transcription.
(Figure labels: H2A, H2B, H3, H4)
1974: Roger Kornberg, who won the Nobel Prize in 2006, discovers the nucleosome.
Core histones are highly conserved proteins. They share a structural motif called the histone fold, which consists of three α helices connected by two loops, and each has an N-terminal tail.
Histone Octamer
The core histones pair as dimers (H2A-H2B and H3-H4), and each dimer contains 3 regions of interaction with dsDNA; the H3-H4 dimers further assemble into a tetramer. The histone octamer organizes 146 bp of DNA in 1.65 superhelical turns: 48 nm of DNA packaged in a disc of 6 x 11 nm.
(Figure: nucleosome disc, 6 nm x 11 nm)
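As a rough check of the numbers on this slide, the contour length of the wrapped DNA can be estimated from the base-pair count. The sketch below assumes a rise of 0.33 nm per base pair, a value chosen here because it reproduces the 48 nm quoted on the slide (B-form DNA is usually quoted at roughly 0.34 nm per bp). The function name is illustrative only.

```python
# Assumed rise per base pair in nm; not stated on the slide.
NM_PER_BP = 0.33


def dna_length_nm(n_bp, nm_per_bp=NM_PER_BP):
    """Contour length of a stretch of double-stranded DNA in nanometres."""
    return n_bp * nm_per_bp


core_dna_nm = dna_length_nm(146)   # DNA wrapped around one histone octamer
disc_diameter_nm = 11.0            # nucleosome disc diameter from the slide
linear_compaction = core_dna_nm / disc_diameter_nm

print(f"{core_dna_nm:.1f} nm of DNA per nucleosome core")
print(f"roughly {linear_compaction:.1f}-fold linear compaction")
```

Under these assumptions the wrapped DNA is a little over four times longer than the disc it is packed into, which is why the nucleosome is only the first level of compaction.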
Nucleosome Assembly In Vitro
4 core histones + 1 naked DNA template at 4 °C and 2 M salt concentration. Source: Dyer et al., Methods in Enzymology (2004), 375:23-44.
DNA compaction in a human cell nucleus
(Figure scale labels: 1 bp (0.3 nm); 11 nm; 30 nm; 10,000 nm)
The N-terminal tails protrude from the core
Histone Modifications
Ac: Acetylation
Me: Methylation
Ub: Ubiquitination
Su: Sumoylation
P: Phosphorylation
‘Histone Code’
Acetylation of Lysines
Acetylation of the lysines at the N terminus of histones removes positive charges, thereby reducing the affinity between histones and DNA.
This makes it easier for RNA polymerase and transcription factors to access the promoter region.
Histone acetylation enhances transcription while histone deacetylation represses transcription.
Methylation of Arginines and Lysines
Arginine can be methylated to form mono-methyl, symmetrical di-methyl and asymmetrical di-methylarginine.
Lysine can be methylated to form mono-methyl, di-methyl and tri-methyllysine.
Methylation of Histone H3-K27
(Figure labels: K27, PC, DNMT, SUZ12, HDAC, EED, EZH2)
Functional Consequences of Histone Modification
Establishing global chromatin environment, such as Euchromatin, Heterochromatin and Bivalent domains in embryonic stem cells (ESCs).
Orchestration of DNA-based processes such as transcription.
Euchromatin
A lightly packed form of chromatin; Gene-rich; At chromosome arms; Associated with active transcription.
Heterochromatin
A tightly packed form of chromatin; At centromeres and telomeres; Contains repetitious sequences; Gene-poor; Associated with repressed transcription.
Bivalent Domains
Poised state. The chromatin of embryonic stem cells has “bivalent” domains with marks of both gene activation and repression. In these domains, the tail of histone protein H3 has a methyl group attached to lysine 4 (K4) that is activating and a methyl group at lysine 27 (K27) that is repressive. This contradictory state may keep the genes silenced but poised to activate if needed. When the cell differentiates, only one tag or the other remains, depending on whether the gene is expressed or not.
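The logic of the bivalent state can be written down as a small decision rule. This is only an illustrative sketch: the function name, the boolean inputs and the "unmarked" label are assumptions made here, and real data would need a threshold on ChIP signal to decide whether a mark is present.

```python
def classify_domain(has_h3k4_methyl, has_h3k27_methyl):
    """Label a region by which of the two H3 methyl marks it carries."""
    if has_h3k4_methyl and has_h3k27_methyl:
        return "bivalent (poised)"
    if has_h3k4_methyl:
        return "active"
    if has_h3k27_methyl:
        return "repressed"
    return "unmarked"  # label assumed here; not discussed on the slide


# Hypothetical regions: (K4 methyl present, K27 methyl present)
regions = {
    "gene_A": (True, True),
    "gene_B": (True, False),
    "gene_C": (False, True),
}
labels = {name: classify_domain(k4, k27) for name, (k4, k27) in regions.items()}
print(labels)
```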
DNA Methylation
(Figure: DNA methyltransferase transfers a methyl group from S-adenosylmethionine to deoxycytosine, producing 5-methylcytosine.)
CpG Islands
CpG island: a cluster of CpG residues often found near gene promoters. It is at least 200 bp long, with a GC percentage greater than 50% and an observed/expected CpG ratio greater than 0.6.
~29,000 CpG islands in the human genome (~60% of all genes are associated with CpG islands).
Most CpG islands are unmethylated in normal cells.
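The three CpG island criteria above can be checked directly on a sequence. The sketch below is a minimal illustration, not a full island finder: it tests a single candidate sequence rather than scanning a genome, and it assumes the commonly used definition of the expected CpG count, (number of C x number of G) / length, which the slide does not spell out. Function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (length, GC percentage, observed/expected CpG ratio)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # assumed definition of "expected"
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_percent, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=50.0, min_obs_exp=0.6):
    """Apply the three thresholds quoted on the slide to one sequence."""
    n, gc_percent, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_percent > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # CpG-dense, 200 bp
print(looks_like_cpg_island("AT" * 100))  # no C or G at all
```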
Chromatin modifications
Mark | Transcriptionally relevant sites | Biological role
Methylated cytosine (meC) | CpG islands | Transcriptional repression
Acetylated lysine (Kac) | H3 (9, 14, 18, 56), H4 (5, 8, 13, 16), H2A, H2B | Transcriptional activation
Phosphorylated serine/threonine (S/Tph) | H3 (3, 10, 28), H2A, H2B | Transcriptional activation
Methylated arginine (Rme) | H3 (17, 23), H4 (3) | Transcriptional activation
Methylated lysine (Kme) | H3 (4, 36, 79) | Transcriptional activation
Methylated lysine (Kme) | H3 (9, 27), H4 (20) | Transcriptional repression
Ubiquitylated lysine (Kub) | H2B (123/120) | Transcriptional activation
Ubiquitylated lysine (Kub) | H2A (119) | Transcriptional repression
Sumoylated lysine (Ksu) | H2B (6/7), H2A (126) | Transcriptional repression
Genome-wide Distribution Pattern of Histone Modification Associated with Transcription
Source: Li et al., Cell (Review, 2007), 128:707-719
ChIP-chip
Step 1: Rapid fixation of cells chemically cross-links DNA binding proteins to their genomic targets in vivo.
Step 2: Cell lysis releases the DNA-protein complexes, and sonication fragments the DNA.
Step 3: Immunoprecipitation (IP) purifies the protein-DNA fragments, with specificity dictated by antibody choice.
Step 4: Hydrolysis reverses the cross-links within the released DNA fragments.
Step 5: PCR amplification of ChIP DNA
Step 6: Enrichment is confirmed by PCR on a known binding-site region for that protein, using either conventional PCR followed by agarose gel electrophoresis or quantitative PCR.
Step 7: Labeling of the pool of immunoprecipitated DNA fragments.
Step 8: Hybridization of DNA onto microarrays featuring 60-mer oligonucleotide probes.
Major types of array platforms
NimbleGen Arrays: tiling arrays, promoter arrays, whole genome arrays.
(http://www.nimblegen.com/products/chip/index.html)
Agilent Arrays: promoter arrays, whole genome arrays.
(http://www.chem.agilent.com/Scripts/Phome.asp)
Affymetrix Arrays: tiling arrays, Chr21,22 arrays, whole genome arrays.
(http://www.affymetrix.com/index.affx)
Measurement of intensity of probes on the array
The hybridized arrays were scanned on an Axon GenePix 4000B scanner (Axon Instruments Inc.) at wavelengths of 532 nm for the control (Cy3) and 635 nm for each experimental sample (Cy5). Data points were extracted from the scanned images using the NimbleScan 2.0 program (NimbleGen Systems, Inc.). Each of the N pairs of probe signals was normalized by converting it into a scaled log ratio using the following formula:
• $S_i = \log_2\left(\mathrm{Cy5}(i) / \mathrm{Cy3}(i)\right)$
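The per-probe log ratio can be sketched in a few lines. This is a minimal illustration with made-up intensities. It computes only the log2 ratio shown in the formula and does not reproduce whatever scaling step NimbleScan applies, since that step is not described on the slide. The function name is hypothetical.

```python
import math


def log2_ratios(cy5, cy3):
    """Return log2(Cy5 / Cy3) for each probe; intensities must be positive."""
    if len(cy5) != len(cy3):
        raise ValueError("Cy5 and Cy3 must have one value per probe")
    return [math.log2(ip / ctrl) for ip, ctrl in zip(cy5, cy3)]


# Made-up intensities for three probes (experimental sample, control).
cy5 = [800.0, 200.0, 100.0]
cy3 = [200.0, 200.0, 400.0]
ratios = log2_ratios(cy5, cy3)
print(ratios)  # [2.0, 0.0, -2.0]
```

A positive value means the probe is enriched in the experimental (Cy5) channel, zero means no difference, and a negative value means the control (Cy3) channel is stronger.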
Antibody Validation
Confirming on a known target
Different antibodies to same factor
Antibodies to different family members
siRNA-ChIP
Antibodies to two components of a complex
Antibodies to an enzyme/modification pair
Confirming on a known target
Comparison of biological replicates and antibodies to different E2Fs
Loss of E2F6 ChIP signal after siRNA knockdown of E2F6
(Figure panels: Promoter 1, Promoter 2)
Reproducibility of promoter arrays using biological replicates
(Figure panels: top 1000 overlap (two panels); H3me3K27, Ntera2 cells; 500 kb region of chromosome 6; 500 kb region of chromosome 1)
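A "top 1000 overlap" between two replicates can be computed by ranking promoters in each replicate and intersecting the two top lists. The sketch below is one plausible way to do this, not necessarily the procedure behind the figure. The promoter IDs and scores are invented, and the function names are hypothetical.

```python
def top_n(scores, n):
    """Return the IDs of the n highest-scoring promoters as a set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def top_n_overlap(rep1, rep2, n=1000):
    """Fraction of the top-n promoters shared by two replicates."""
    n = min(n, len(rep1), len(rep2))  # guard against short inputs
    if n == 0:
        return 0.0
    shared = top_n(rep1, n) & top_n(rep2, n)
    return len(shared) / n


# Invented enrichment scores (for example, log2 ratios) for four promoters.
rep1 = {"p1": 3.0, "p2": 2.5, "p3": 0.1, "p4": 1.9}
rep2 = {"p1": 2.8, "p2": 0.2, "p3": 0.3, "p4": 2.2}
overlap = top_n_overlap(rep1, rep2, n=2)
print(overlap)  # 0.5: only p1 is in both top-2 lists
```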
Amount of Sample Per ChIP
Number of cells | Chromatin input | ChIP output
1 x 10^7 | 200 µg | 150 ng
1 x 10^6 | 20 µg | 10 ng
5 x 10^5 | 10 µg | 1.3 ng
1 x 10^5 | 2 µg | 300 pg
1 x 10^4 | 200 ng | 30 pg
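One way to read the sample table is to convert every row to the same unit and compute the fraction of input chromatin recovered by the ChIP. The sketch below does that arithmetic on the table's values. The "percent recovered" measure and the function name are introduced here for illustration and are not from the slides.

```python
# (number of cells, chromatin input in ng, ChIP output in ng), from the table.
SAMPLES = [
    (1e7, 200_000.0, 150.0),
    (1e6, 20_000.0, 10.0),
    (5e5, 10_000.0, 1.3),
    (1e5, 2_000.0, 0.3),
    (1e4, 200.0, 0.03),
]


def percent_recovered(input_ng, output_ng):
    """ChIP output as a percentage of the chromatin input."""
    return 100.0 * output_ng / input_ng


for cells, input_ng, output_ng in SAMPLES:
    pct = percent_recovered(input_ng, output_ng)
    pg_per_cell = 1000.0 * input_ng / cells  # chromatin input per cell, in pg
    print(f"{cells:.0e} cells: {pg_per_cell:.0f} pg input per cell, "
          f"{pct:.3f}% recovered")
```

In every row the input works out to about 20 pg of chromatin per cell, so the input scales linearly with cell number, while the recovered fraction is a small fraction of a percent throughout.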
Miniaturization
• Standard ChIP Protocol (1 x 10^7 cells; WGA2)
  • Promoter Arrays
  • Genome Tiling Arrays
• MicroChIP Protocol (10,000-100,000 cells; WGA4)
  • Promoter Arrays
  • Genome Tiling Arrays
Reproducibility of MicroChIP Protocol
Peak calling programs
• Moving average method by Keles et al. (2004)
• A Hidden Markov Model (HMM) approach by Li et al. (2005)
• TileMap by Ji and Wong (2005), which uses moving averages or an HMM to account for information from adjacent probes
• PMT by Chung et al. (2007), which integrates a physical model to correct for probe-specific behavior
• ChIPmix by Martin-Magniette et al. (2008), based on a linear regression mixture model
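To make the idea of using adjacent probes concrete, here is a minimal moving-average peak caller. It is a generic sketch and does not reproduce the statistics of any of the published programs listed above. The window size, the threshold and the function names are arbitrary choices made for illustration.

```python
def moving_average(values, window):
    """Centered moving average; windows are truncated at the array edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def call_peaks(log_ratios, window=5, threshold=1.0):
    """Return (start, end) probe-index ranges, end inclusive, where the
    smoothed log ratio stays above the threshold."""
    smoothed = moving_average(log_ratios, window)
    peaks, start = [], None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i                      # a peak opens at this probe
        elif value <= threshold and start is not None:
            peaks.append((start, i - 1))   # the peak closed at the previous probe
            start = None
    if start is not None:                  # a peak runs to the end of the array
        peaks.append((start, len(smoothed) - 1))
    return peaks


# Made-up log2 ratios for nine consecutive probes.
signal = [0.1, 0.0, 0.2, 2.0, 2.4, 2.2, 0.1, 0.0, 0.1]
peaks = call_peaks(signal, window=3, threshold=1.0)
print(peaks)  # [(3, 5)]
```

Averaging over neighbouring probes suppresses isolated noisy probes, at the cost of blurring peak boundaries by up to half a window.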
Spike-ins comparison
• Mixtures of human genomic DNA and “spike-ins” comprised of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups.
• Ref: Johnson et al., “Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets”, Genome Research, 18: 393-403, 2008.
Programs in Spike-ins comparison
• MAT: Model-based Analysis of Tiling arrays first standardizes each individual Affymetrix tiling array by modeling the effect of each probe’s 25-mer sequence and genome copy number on its signal.
• TAS: Affymetrix Tiling Array Software first uses quantile normalization to normalize probes on all the arrays. Then a Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is applied across 500 bp sliding windows to identify windows where the spike-in probes have higher signals than the control probes.
• Weighted Average (WA): enriched regions are detected by judging the significance of the ratios of a contiguous set of probes defining a region. A score based on their weighted average is compared to the distribution of scores of all sets of probes taken in windows of the same predefined size (500 bp in this case).
• TAMAL: the algorithm proceeds in two basic steps. First, peaks are found using TAMALPAIS. Then, the enrichment within each peak is estimated using the maxfour approach described in Krig et al. (2007, J Biol Chem 282:9703). Bieda et al. (2006) describe four levels of stringency, called L1, L2, L3 and L4, with L1 being the most stringent set of detection parameters and L4 the least stringent.
• Mpeak: the model-based Mpeak method is used to identify peaks in ChIP-on-chip data.
• Wavelet: the algorithm uses a wavelet transform of the signals from the red and green channels of the tiling array. The approximation coefficients of the wavelet transform give a clear intensity and length-scale separation between the background signal and the signal coming from regions of biochemical activity.
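The sliding-window Mann-Whitney U test described for TAS can be sketched with the standard library alone. This is an illustration of the general technique and not the TAS implementation: it omits quantile normalization, uses a normal approximation without tie correction for the p-value, and scans windows in a simple quadratic loop. The window size follows the 500 bp quoted above; the significance cutoff, data and function names are assumptions.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x and a one-sided p-value (normal approximation,
    no tie correction) for 'x tends to be larger than y'."""
    combined = sorted((v, src) for src, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):          # tied values share the average rank
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, src) in zip(ranks, combined) if src == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, 0.5 * math.erfc(z / math.sqrt(2))


def scan_windows(positions, chip, control, window_bp=500, alpha=0.01):
    """Return (center, p) for windows where chip signal exceeds control."""
    hits, half = [], window_bp // 2
    for center in positions:
        idx = [i for i, pos in enumerate(positions) if abs(pos - center) <= half]
        _, p = mann_whitney_u([chip[i] for i in idx], [control[i] for i in idx])
        if p < alpha:
            hits.append((center, p))
    return hits


# Made-up data: 20 probes spaced 50 bp apart, enriched between 300 and 700 bp.
positions = list(range(0, 1000, 50))
control = [0.01 * i for i in range(20)]
chip = [c + 3.0 if 300 <= pos <= 700 else c for pos, c in zip(positions, control)]
hits = scan_windows(positions, chip, control)
print([center for center, _ in hits])
```

Because the test is rank-based, it responds to a consistent shift across the probes in a window rather than to a single extreme probe, which is the usual motivation for choosing it over a mean-based test.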