Genome-wide DNA methylation analysis
Bi-Qing Li
Key Laboratory of Systems Biology,
Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Outline
Background
Methods to distinguish 5mC
Array-based genome-wide DNA methylation analysis
NGS-based genome-wide DNA methylation analysis
Third-generation sequencing-based genome-wide DNA methylation analysis
Illumina BS-seq data manipulation
Background
DNA methylation is the main covalent chemical modification of DNA and is involved in a variety of biological processes, including embryogenesis and development, silencing of transposable elements, regulation of gene transcription, and tumorigenesis and tumor progression.
The methylation pattern of DNA is highly variable among cell types and developmental stages, and it is influenced by disease processes and genetic factors, which poses considerable theoretical and technological challenges for its comprehensive analysis.
Recently, various high-throughput approaches have been developed and applied to genome-wide analysis of DNA methylation, providing quantitative DNA methylation data at single-base-pair resolution with genome-wide coverage.
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
Methods to distinguish 5mC
Biotechniques. 2010 Oct;49(4):iii-xi
Restriction endonuclease-based analysis
Enzyme classes by methylation sensitivity:
Isoschizomers: one enzyme cuts only unmethylated DNA, the other cuts regardless of methylation
Neoschizomers: one enzyme cuts only unmethylated DNA, the other is only partially affected by CpG methylation
Methylation-dependent enzymes (e.g., McrBC) cut methylated DNA
Pu: A or G; mC: 5-methylcytosine, 5-hydroxymethylcytosine or N4-methylcytosine. The PumC half-sites can be separated by up to 3 kb, but the optimal separation is 55-103 bp.
Biotechniques. 2010 Oct;49(4):iii-xi
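The isoschizomer logic can be illustrated with HpaII and MspI, which both recognize CCGG: HpaII is blocked when the internal CpG is methylated, whereas MspI cuts regardless of it. The sketch below is a simplification under stated assumptions: the methylated positions are a hypothetical input set, and methylation of the outer C (which can block MspI) is ignored.

```python
import re


def ccgg_sites(seq):
    """Return 0-based start positions of every CCGG site (HpaII/MspI recognition sequence)."""
    # A zero-width lookahead also finds overlapping matches.
    return [m.start() for m in re.finditer(r"(?=CCGG)", seq.upper())]


def predicted_cuts(seq, methylated_c, enzyme):
    """Return the CCGG sites predicted to be cut by `enzyme`.

    methylated_c: hypothetical set of 0-based positions of methylated cytosines.
    Simplification: only the internal CpG (position site + 1) is considered;
    methylation of the outer C, which can block MspI, is ignored here.
    """
    cuts = []
    for site in ccgg_sites(seq):
        internal_cpg_c = site + 1
        if enzyme == "MspI":
            cuts.append(site)  # insensitive to internal CpG methylation
        elif enzyme == "HpaII":
            if internal_cpg_c not in methylated_c:
                cuts.append(site)  # blocked when the internal CpG is methylated
        else:
            raise ValueError(f"unsupported enzyme: {enzyme}")
    return cuts


seq = "AACCGGTTTCCGGAA"
print(predicted_cuts(seq, {10}, "MspI"))   # [2, 9]
print(predicted_cuts(seq, {10}, "HpaII"))  # [2]
```

Sites cut by MspI but not by HpaII point to methylated CpGs; this comparison underlies the HELP and Methyl-seq approaches listed later.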
Restriction endonuclease-based analysis
Methylation-sensitive restriction digestion followed by PCR across the restriction site is a very sensitive technique that is still used in some applications today.
This method remains applicable to locus-specific studies that require linking DNA methylation information across multiple kilobases, either between CpGs or between a CpG and a genetic polymorphism.
Limitations:
It provides methylation data only at the restriction enzyme recognition sites or adjacent regions.
It is extremely prone to false-positive results, because incomplete digestion can occur for reasons other than DNA methylation.
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Bisulfite conversion of DNA
Proc Natl Acad Sci U S A. 1992 Mar 1;89(5):1827-31.
[Figure: bisulfite conversion followed by PCR]
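The figure's principle can be simulated in a few lines: after bisulfite treatment and PCR, an unmethylated C reads as T while 5mC still reads as C, so comparing a read with the reference reveals the methylation state of each C. This is a minimal sketch assuming complete conversion of the top strand only; the function names and the set of methylated positions are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate complete bisulfite conversion followed by PCR (top strand only).

    Unmethylated C is deaminated to U and read as T after PCR; 5mC is protected
    and still reads as C. methylated_positions are hypothetical 0-based indices.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference, converted_read):
    """Infer methylation at each reference C by comparing it with the converted read."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"      # C survived conversion
        elif read_base == "T":
            calls[i] = "unmethylated"    # C was converted
    return calls


read = bisulfite_convert("ACGTCCGA", {1})
print(read)                                # ACGTTTGA
print(call_methylation("ACGTCCGA", read))  # {1: 'methylated', 4: 'unmethylated', 5: 'unmethylated'}
```

An unmethylated C that escapes conversion would also read as C and be called methylated, which is why incomplete conversion produces false positives.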
Bisulfite conversion of DNA
Advantages: single-base-pair resolution, no bias
Disadvantages:
DNA degradation caused by high temperature and low pH
Incomplete conversion of unmethylated cytosines, for example in:
  high-GC-density regions
  regions protected by histones
  stable secondary structure elements
Reduced genome complexity: greater sequence redundancy and decreased hybridization specificity
Difficult read mapping (repetitive regions)
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
Immunoprecipitation-based methods
Methylated DNA immunoprecipitation (MeDIP-seq)
An antibody against 5mC pulls down the methylated fraction of the genome
More sensitive to highly methylated, intermediate-CpG-density regions
Methyl-binding domain protein (MBD-seq)
Uses the affinity of the methyl-binding proteins MeCP2 or MBD2 for methylated CpGs
More sensitive to highly methylated, high-CpG-density regions
Methods. 2010 Nov;52(3):203-12
Immunoprecipitation-based methods
Advantages: straightforward; data are relatively easy to analyze
Disadvantages:
Bias associated with CpG density, which needs adjustment: high-CpG-density (MBD) or intermediate-CpG-density (MeDIP) regions will be interpreted as “more methylated” than equally methylated low-CpG-density regions
Low resolution: does not yield information on individual CpG dinucleotides
Methods. 2010 Nov;52(3):203-12
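Adjusting for the CpG-density bias starts with knowing the CpG density of each region. The sketch below only computes that covariate in fixed, non-overlapping windows; the window size is an arbitrary assumption, and the adjustment step itself (as described in the cited review) is not shown.

```python
def cpg_density(seq, window=100):
    """Count CpG dinucleotides in consecutive, non-overlapping windows.

    Returns (window_start, cpg_count, cpgs_per_bp) tuples. CpGs that straddle a
    window boundary are not counted (a simplification).
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        n = chunk.count("CG")  # "CG" cannot overlap itself, so str.count is exact
        rows.append((start, n, n / len(chunk)))
    return rows


print(cpg_density("CGCGAAAA", window=4))  # [(0, 2, 0.5), (4, 0, 0.0)]
```

Two regions with the same enrichment signal but different values in this table should not be read as equally methylated.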
Array-based genome-wide DNA methylation analysis & restriction endonuclease
One pool of genomic DNA is digested with a methylation-sensitive restriction enzyme and another pool is mock-digested (alternatively, the two pools are digested with two different enzymes)
The two DNA pools are amplified and labelled with different fluorescent dyes for two-color array hybridization
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome-wide DNA methylation analysis & restriction endonuclease
Comprehensive high-throughput arrays for relative methylation (CHARM)
McrBC (which cuts methylated DNA) is used to fractionate out the unmethylated DNA
Label the methyl-depleted DNA with Cy5 and total DNA with Cy3
Hybridize to high-density arrays
Genome Res. 2008 May;18(5):780-90
Array-based genome-wide DNA methylation analysis & restriction endonuclease
Digest genomic DNA with HpaII and MspI
Amplify the HpaII or MspI genomic restriction fragments by ligation-mediated PCR
Label the HpaII-amplified fraction with Cy5 and the MspI-amplified fraction with Cy3
Array hybridization
Genome Res. 2006 Aug;16(8):1046-55
HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)
HpaII: cuts unmethylated DNA only; MspI: cuts regardless of methylation
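Because MspI represents every CCGG fragment regardless of methylation, the HpaII signal relative to the MspI signal can serve as a per-fragment methylation measure: a high ratio means HpaII could cut, so the site is unmethylated. A minimal sketch, assuming the inputs are hypothetical Cy5 (HpaII) and Cy3 (MspI) intensities; the pseudocount and threshold are assumptions, not values from the cited HELP paper.

```python
import math


def help_log_ratio(hpaii_signal, mspi_signal, pseudocount=1.0):
    """log2(HpaII / MspI) for one fragment; the pseudocount avoids division by zero."""
    return math.log2((hpaii_signal + pseudocount) / (mspi_signal + pseudocount))


def classify_fragment(log_ratio, threshold=0.0):
    """High HpaII/MspI ratio: HpaII could cut, so the site is unmethylated.

    The threshold is a hypothetical value; it would need to be chosen for the data set.
    """
    return "unmethylated" if log_ratio > threshold else "methylated"


ratio = help_log_ratio(hpaii_signal=3.0, mspi_signal=1.0)
print(ratio, classify_fragment(ratio))  # 1.0 unmethylated
```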
Array-based genome-wide DNA methylation analysis & methylation immunoprecipitation
Enrichment of methylated fragments using 5mC antibody or the affinity of methyl-binding proteins
Input DNA and enriched DNA are labeled with different fluorescent dyes
Array hybridization
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome-wide DNA methylation analysis & methylation immunoprecipitation
Methylated DNA immunoprecipitation (figure from Wikipedia, the free encyclopedia)
Array-based genome-wide DNA methylation analysis & bisulfite conversion
ILLUMINA® EPIGENETIC ANALYSIS
Array-based genome-wide DNA methylation analysis & bisulfite conversion
Illumina Infinium HumanMethylation27 BeadChip:
27,578 CpG sites
14,495 protein-coding gene promoters
110 microRNA gene promoters
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Array-based genome-wide DNA methylation analysis & bisulfite conversion
Genome Res. 2006 Mar;16(3):383-93
Array-based genome-wide DNA methylation analysis & bisulfite conversion
GoldenGate BeadArray: 1,536 specific CpG sites in 371 genes
GoldenGate Methylation Cancer Panel I: 1,505 CpG sites selected from 807 genes
Nat Rev Genet. 2010 Feb 2;11(3):191-203
Illumina® Epigenetics Analysis
Array-based genome-wide DNA methylation analysis
Advantages:
Experiments are easy to perform
Data are easy to interpret with many well-characterized software programs
Disadvantages:
Low resolution
Difficult to distinguish one repetitive element from another in a hybridization-based method
Not truly genome-wide
NGS-based genome-wide DNA methylation analysis
Biotechniques. 2010 Oct;49(4):iii-xi
NGS-based genome-wide DNA methylation analysis: Roche 454
Massively parallel bisulfite pyrosequencing on the Roche/454 platform
Advantages:
Longer reads include more CpG sites, facilitating research on complex methylation patterns
Reads are aligned to the reference more easily and accurately, especially in repetitive regions
Greater chance of covering genotype information (SNPs) adjacent to cytosines
Disadvantages:
Relatively high sequencing cost
Higher error rates when calling runs of identical bases (homopolymers)
Genes 2010, 1(1), 85-101; doi:10.3390/genes1010085
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Methyl-seq
Fragment size: ~100-350 bp
Platform: Illumina Genome Analyzer II
Genome Res. 2009 Jun;19(6):1044-56
HpaII: cuts unmethylated DNA only; MspI: cuts regardless of methylation
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Methyl-sensitive cut counting (MSCC)
Nat Biotechnol. 2009 Apr;27(4):361-8
The method is similar to Methyl-seq; however, sequencing of the MspI libraries was reported to have little effect on the methylation measurement and was dropped to reduce costs.
Genome Med. 2009 Nov 16;1(11):106
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Methyl-DNA immunoprecipitation sequencing (MeDIP-seq)
Methods. 2009 Mar;47(3):142-50
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Reduced representation bisulfite sequencing (RRBS)
Nucleic Acids Research, 2005, Vol. 33, No. 18
Nature. 2008 Aug 7;454(7205):766-70
Nat Methods. 2010 Feb;7(2):133-6
Platform: Illumina Genome Analyzer
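RRBS reduces the genome by MspI digestion (C^CGG, insensitive to CpG methylation) followed by size selection; because fragments are bounded by CCGG sites, the retained fraction is enriched for CpGs. The sketch below performs the in silico digestion; the size window is a parameter, and the toy values in the usage lines are assumptions, not the fragment range of the cited protocols.

```python
import re


def mspi_fragments(seq):
    """In silico MspI digestion: cut between the two Cs of every CCGG (C^CGG)."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_select(seq, min_len, max_len):
    """Keep fragments within a size-selection window (min_len/max_len are assumptions)."""
    return [f for f in mspi_fragments(seq) if min_len <= len(f) <= max_len]


toy = "AACCGGTTTTCCGGA"
print(mspi_fragments(toy))     # ['AAC', 'CGGTTTTC', 'CGGA']
print(rrbs_select(toy, 4, 8))  # ['CGGTTTTC', 'CGGA']
```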
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Bisulfite padlock probes (BSPPs)
Nat Biotechnol. 2009 Apr;27(4):353-60
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Bisulfite sequencing (BS-seq)
Nature. 2008 Mar 13;452(7184):215-9
NGS-based genome-wide DNA methylation analysis: Illumina/Solexa
Cytosine methylome sequencing (MethylC-seq)
Cell. 2008 May 2;133(3):523-36
Nature. 2009 Nov 19;462(7271):315-22
Nature. 2011 Mar 3;471(7336):68-73
Third-generation sequencing-based genome-wide DNA methylation analysis: PacBio
Single-molecule real-time (SMRT) sequencing
ZMW: zero-mode waveguide
Nat Biotechnol. 2010 May;28(5):426-8
Third-generation sequencing-based genome-wide DNA methylation analysis: PacBio
Single-molecule real-time (SMRT) sequencing
Nat Methods. 2010 Jun;7(6):461-5
Nat Methods. 2010 Jun;7(6):435-7
Third-generation sequencing-based genome-wide DNA methylation analysis: Oxford Nanopore
Oxford Nanopore Technologies
Nat Biotechnol. 2010 May;28(5):426-8
Illumina BS-seq data manipulation
FASTQ file format and PHRED score
Adaptor trimming with FASTX
Quality control with FastQC
Read filtering and trimming with FASTX
Read mapping with Bismark
Basic analysis
Advanced analysis and application
Illumina BS-seq data manipulation: FASTQ file format
FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score.
Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
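A FASTQ record has four parts: an `@` header line, the sequence, a `+` separator line, and a quality string of the same length as the sequence. A minimal reader sketch, assuming the common four-line layout without wrapped lines; the class and function names are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (header, sequence, '+', quality).

    Assumes the common 4-line layout (no wrapped sequence or quality lines).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


example = io.StringIO("@read1\nACGTTTGA\n+\nIIIIIII5\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.quality)  # read1 ACGTTTGA IIIIIII5
```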
Illumina BS-seq data manipulation: PHRED score
Nucleic Acids Research, 2010, Vol. 38, No. 6 1767–1771
Nature. 2009 Nov 19;462(7271):315-22
http://en.wikipedia.org/wiki/FASTQ_format#cite_note-Illumina_User_Guide_1.5-2
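Each quality character encodes a PHRED score Q = -10 log10(P), where P is the probability that the base call is wrong, stored as ASCII code Q + offset. The offset is 33 for Sanger and Illumina 1.8+ files and 64 for Illumina 1.3-1.7 files (such as the Illumina 1.5 format referenced above); this is what the `-Q 33` option in the FASTX commands below specifies. A small decoding sketch with hypothetical function names:

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string: Q = ord(character) - offset.

    offset=33 for Sanger / Illumina 1.8+ files, offset=64 for Illumina 1.3-1.7 files.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert the PHRED definition Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


print(phred_scores("II5+"))           # [40, 40, 20, 10]
print(phred_scores("hh", offset=64))  # [40, 40]
print(error_probability(20))          # about 0.01, i.e. 1 error in 100 calls
```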
Illumina BS-seq data manipulation: Adaptor trimming with FASTX
Nature. 2009 Nov 19;462(7271):315-22
http://hannonlab.cshl.edu/fastx_toolkit/index.html
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_clipper_usage
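When the DNA fragment is shorter than the read length, the read runs into the adapter, and those adapter bases would otherwise be aligned as if they were genomic sequence. The sketch below shows the idea of 3' adapter clipping with exact matching only; it is not fastx_clipper's algorithm, and the adapter string is a hypothetical placeholder.

```python
def clip_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter by exact matching (a simplified sketch).

    1. If the whole adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends with a prefix of the adapter of at least
       min_overlap bases (adapter running off the end of the read), cut that prefix.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):  # try the longest overlap first
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAG"  # hypothetical adapter sequence, for illustration only
print(clip_adapter("ACGTACGTAGATCGGAAGAG", ADAPTER))  # ACGTACGT
print(clip_adapter("ACGTACGTAGATC", ADAPTER))         # ACGTACGT
print(clip_adapter("ACGTACGTAC", ADAPTER))            # ACGTACGTAC
```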
Illumina BS-seq data manipulation: Quality control with FastQC
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Illumina BS-seq data manipulation: Read filtering and trimming with FASTX
FASTQ quality filter
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage
e.g.1 fastq_quality_filter -Q 33 -q 20 -p 100 -v -i input -o output
e.g.2 fastq_quality_filter -q 10 -p 100 -i /usr/local/data/GBS/OWB-RAD1.fastq -Q 33 | fastq_quality_filter -Q 33 -q 20 -p 80 -o OWB1-filt.fastq
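In these commands, `-Q 33` sets the quality offset, `-q` the minimum quality score, and `-p` the minimum percentage of bases that must reach that score for the read to be kept. The sketch below mirrors that documented rule so the two examples can be checked by hand; it is an illustration, not the tool's source code, and the function name is hypothetical.

```python
def passes_quality_filter(quality, min_quality=20, min_percent=100, offset=33):
    """Return True if at least min_percent % of bases have quality >= min_quality.

    Mirrors the documented meaning of fastq_quality_filter -q, -p and -Q;
    a sketch, not the tool's source code.
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_quality)
    # Integer comparison avoids floating-point rounding at the boundary.
    return 100 * good >= min_percent * len(quality)


qual = "IIII+"  # four Q40 bases and one Q10 base (Phred+33)
# e.g.1: -q 20 -p 100 rejects the read because one base is below Q20
print(passes_quality_filter(qual, min_quality=20, min_percent=100))  # False
# e.g.2: -q 10 -p 100 followed by -q 20 -p 80 keeps it (4/5 = 80% of bases >= Q20)
print(passes_quality_filter(qual, 10, 100) and passes_quality_filter(qual, 20, 80))  # True
```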
FASTQ quality trimmer
e.g.1 fastq_quality_trimmer -t 20 -l 35 -v -i input -o output
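Here `-t 20` is the quality threshold (bases below it are trimmed from the 3' end) and `-l 35` the minimum length a read must keep after trimming. A sketch of that documented behaviour, not the tool's source code; the function name is hypothetical.

```python
def quality_trim(sequence, quality, threshold=20, min_length=35, offset=33):
    """Trim bases below `threshold` from the 3' end; return None if the read becomes too short.

    Mirrors the documented meaning of fastq_quality_trimmer -t (threshold) and
    -l (minimum length); a sketch, not the tool's source code.
    """
    end = len(quality)
    while end > 0 and ord(quality[end - 1]) - offset < threshold:
        end -= 1
    if end < min_length:
        return None  # read discarded
    return sequence[:end], quality[:end]


print(quality_trim("ACGTAC", "IIII++", threshold=20, min_length=3))  # ('ACGT', 'IIII')
print(quality_trim("ACGTAC", "IIII++", threshold=20, min_length=5))  # None
```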
Illumina BS-seq data manipulation: Read mapping with Bismark
Bioinformatics. 2011 Jun 1;27(11):1571-2.
Two computationally converted references (C-to-T and G-to-A converted genomes)
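Because a methylated C still reads as C while an unmethylated C reads as T, bisulfite reads cannot be aligned directly to the normal genome; converting both the reference and the reads in silico removes that difference. The sketch builds the two converted references and locates a C-to-T converted read by exact matching. It is a toy stand-in: Bismark delegates alignment to Bowtie and also aligns G-to-A converted reads, and the function names here are hypothetical.

```python
def convert_reference(genome):
    """Build the two in silico bisulfite-converted references used for alignment.

    C->T converted copy: models reads from the original top strand.
    G->A converted copy: models reads from the original bottom strand,
    written in top-strand coordinates.
    """
    genome = genome.upper()
    return genome.replace("C", "T"), genome.replace("G", "A")


def locate_ct_read(read, ct_reference):
    """Exact-match a C->T converted read against the C->T reference (toy aligner).

    Returns the first match position or -1; no mismatches or uniqueness checks.
    """
    return ct_reference.find(read.upper().replace("C", "T"))


ct_ref, ga_ref = convert_reference("ACGTCCGATC")
print(ct_ref, ga_ref)                      # ATGTTTGATT ACATCCAATC
print(locate_ct_read("ACGTTTGA", ct_ref))  # 0
```

After a read is placed, its methylation calls come from comparing the original (unconverted) read bases with the unconverted genome at each C.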
[Figures: Bismark read mapping and methylation calling by sequence context; H = A, C or T]
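Methylation calls are reported by sequence context: CG, CHG or CHH, where H = A, C or T. For a C on the + strand the context comes from the next two bases; for a C on the - strand it comes from the reverse complement of the two preceding top-strand bases. A minimal sketch with a hypothetical function name:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cytosine_context(genome, pos, strand):
    """Classify the cytosine at 0-based `pos` as CG, CHG or CHH (H = A, C or T).

    strand '+': the C is on the given (top) strand.
    strand '-': the top strand has a G at `pos`, i.e. the C is on the bottom strand.
    Returns None when the context runs off the end of the sequence.
    """
    g = genome.upper()
    if strand == "+":
        if g[pos] != "C":
            raise ValueError("no C on the + strand at this position")
        following = g[pos + 1:pos + 3]
    elif strand == "-":
        if g[pos] != "G":
            raise ValueError("no C on the - strand at this position")
        # The bottom strand runs right-to-left in top-strand coordinates,
        # so its next two bases are the complements of g[pos-1] and g[pos-2].
        preceding = g[max(0, pos - 2):pos]
        following = "".join(COMPLEMENT.get(b, "N") for b in reversed(preceding))
    else:
        raise ValueError("strand must be '+' or '-'")
    if following[:1] == "G":
        return "CG"
    if len(following) < 2:
        return None
    return "CHG" if following[1] == "G" else "CHH"


genome = "ACGTCAGTCCAT"
print([cytosine_context(genome, p, "+") for p in (1, 4, 8)])  # ['CG', 'CHG', 'CHH']
print([cytosine_context(genome, p, "-") for p in (2, 6)])     # ['CG', 'CHG']
```

A CpG is symmetric, so its C on the + strand and the C opposite its G on the - strand are both CG calls at adjacent positions.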
Illumina BS-seq data manipulation: Read mapping with Bismark
chromosome position strand context mC All C
1 468 + CG 4 4
1 469 - CG 5 6
1 470 + CG 5 5
1 471 - CG 7 7
1 7384 - CHG 6 9
1 225896 - CHH 4 16
1 771455 + CHH 5 22
1 702235 + CHG 2 12
Illumina BS-seq data manipulation: Basic analysis (read coverage)
Illumina BS-seq data manipulation: Basic analysis (read depth)
Illumina BS-seq data manipulation: Basic analysis (read depth percentage)
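The slides give only the titles of these statistics, so the definitions in this sketch are assumptions: coverage is the fraction of cytosine sites covered by at least one read, and depth percentage is the share of covered sites reaching each depth threshold. The depths are the "All C" values of the eight example sites in the Bismark output table; the total site count is hypothetical.

```python
def coverage_fraction(depths, total_sites):
    """Fraction of all cytosine sites of interest covered by at least one read."""
    covered = sum(1 for d in depths if d > 0)
    return covered / total_sites


def depth_percentage(depths, thresholds):
    """Percentage of covered sites whose read depth is >= each threshold."""
    n = len(depths)
    return {t: 100 * sum(1 for d in depths if d >= t) / n for t in thresholds}


# Read depths ("All C" column) of the eight example sites in the Bismark output table
depths = [4, 6, 5, 7, 9, 16, 22, 12]
print(depth_percentage(depths, [5, 10, 20]))      # {5: 87.5, 10: 37.5, 20: 12.5}
print(coverage_fraction(depths, total_sites=10))  # 0.8 (10 is a hypothetical site total)
```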
Illumina BS-seq data manipulation: Basic analysis (methylation level)
$$\text{methylation level} = \frac{\text{number of methylated reads}}{\text{number of methylated reads} + \text{number of unmethylated reads}}$$
chromosome position strand context mC All C Methylation level
1 468 + CG 4 4 100%
1 469 - CG 5 6 83.3%
1 470 + CG 5 5 100%
1 471 - CG 7 7 100%
1 7384 - CHG 6 9 66.7%
1 225896 - CHH 4 16 25%
1 771455 + CHH 5 22 22.7%
1 702235 + CHG 2 12 16.7%
H=A, C or T
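The methylation-level formula can be applied directly to the Bismark table, where "All C" is the number of methylated plus unmethylated calls, so the unmethylated count is All C minus mC. The sketch reproduces the percentages in the table above (printed with one decimal place); the function name is hypothetical.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Methylated reads / (methylated + unmethylated reads); None if uncovered."""
    total = methylated_reads + unmethylated_reads
    return methylated_reads / total if total else None


TABLE = """\
1 468 + CG 4 4
1 469 - CG 5 6
1 470 + CG 5 5
1 471 - CG 7 7
1 7384 - CHG 6 9
1 225896 - CHH 4 16
1 771455 + CHH 5 22
1 702235 + CHG 2 12"""

for line in TABLE.splitlines():
    chrom, pos, strand, context, mc, all_c = line.split()
    mc, all_c = int(mc), int(all_c)
    level = methylation_level(mc, all_c - mc)  # "All C" = methylated + unmethylated calls
    print(chrom, pos, strand, context, mc, all_c, f"{100 * level:.1f}%")
```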
Illumina BS-seq data manipulation: Basic analysis (methylation density)
$$\text{Absolute mC}(\text{mCG}, \text{mCHG}, \text{mCHH}) = \frac{\text{number of calls of a given methylation type}}{\text{bin size}}$$
$$\text{Relative methylation}(\text{mCG}, \text{mCHG}, \text{mCHH}) = \frac{\text{mC}}{\text{C}}$$
where mC is the number of calls of a given methylation type and C is the total number of sites of the same type.
H = A, C or T
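Both density measures can be computed per genomic bin and context. This sketch assumes each site arrives with a hypothetical binary methylation call, and it takes "total number of sites of the same type" to mean the sites of that context present in the input for that bin; both are assumptions, not definitions from the source.

```python
from collections import defaultdict


def methylation_density(sites, bin_size):
    """Per-bin absolute and relative methylation for each context (CG, CHG, CHH).

    sites: iterable of (position, context, is_methylated) for one chromosome,
           where is_methylated is the (hypothetical) call for that cytosine.
    absolute = mC calls of the context in the bin / bin size
    relative = mC calls of the context in the bin / sites of that context in the bin
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for pos, context, is_methylated in sites:
        key = (pos // bin_size, context)
        total[key] += 1
        if is_methylated:
            methylated[key] += 1
    return {
        key: {"absolute": methylated[key] / bin_size, "relative": methylated[key] / n}
        for key, n in total.items()
    }


sites = [(10, "CG", True), (20, "CG", False), (30, "CHH", True), (150, "CG", True)]
for (bin_index, context), values in sorted(methylation_density(sites, 100).items()):
    print(bin_index, context, values)
```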
Illumina BS-seq data manipulation: Advanced analysis and application
DNA methylation and gene expression
DNA methylation is linked to gene silencing and is considered an important mechanism in the regulation of gene expression.
Gene expression data: gene expression microarray or RNA-seq
Illumina BS-seq data manipulation: Advanced analysis and application
DNA methylation and gene expression
Proximal TSS: -150 bp to +150 bp around the TSS
Promoter: 1.5 kb upstream of the TSS
Nature. 2009 Nov 19;462(7271):315-22
Genome Res. 2010 Mar;20(3):320-31.
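The two regions above can be derived from a gene's TSS and strand ("upstream" is to the left for + strand genes and to the right for - strand genes), and a pooled methylation level can then be computed over the sites in each region. A sketch under stated assumptions: coordinates are inclusive integers, the function names are hypothetical, and the site values in the usage lines are invented for illustration.

```python
def tss_regions(tss, strand, promoter_len=1500, flank=150):
    """Inclusive coordinates of the proximal TSS region and the upstream promoter.

    proximal TSS: tss - flank .. tss + flank
    promoter:     promoter_len bp upstream of the TSS (upstream depends on strand)
    """
    proximal = (tss - flank, tss + flank)
    if strand == "+":
        promoter = (tss - promoter_len, tss - 1)
    elif strand == "-":
        promoter = (tss + 1, tss + promoter_len)
    else:
        raise ValueError("strand must be '+' or '-'")
    return {"proximal_tss": proximal, "promoter": promoter}


def region_methylation(sites, start, end):
    """Pooled level: sum(mC calls) / sum(all C calls) for sites with start <= pos <= end."""
    mc_sum = total_sum = 0
    for pos, mc, total in sites:
        if start <= pos <= end:
            mc_sum += mc
            total_sum += total
    return mc_sum / total_sum if total_sum else None


regions = tss_regions(10000, "+")
print(regions)  # {'proximal_tss': (9850, 10150), 'promoter': (8500, 9999)}
sites = [(9000, 3, 4), (9500, 1, 4), (10050, 0, 5)]  # hypothetical (position, mC, all C)
print(region_methylation(sites, *regions["promoter"]))  # 0.5
```

Pooling the counts weights each site by its depth; averaging per-site levels instead is an equally valid choice. The per-gene values can then be compared with expression levels from microarray or RNA-seq data.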
Illumina BS-seq data manipulation: Advanced analysis and application
DNA methylation and gene expression
Illumina BS-seq data manipulation: Advanced analysis and application
Differentially methylated regions (DMRs) and gene expression
DNA methylation at DNA–protein interaction sites
DNA methylation, miRNA, and histone modification...
Nature. 2009 Nov 19;462(7271):315-22
Genome Res. 2010 Mar;20(3):320-31.
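To make the DMR idea concrete, the sketch below pools methylated and total calls in fixed windows for two samples and reports windows whose methylation levels differ by at least a threshold. This is a naive screen, not the method of the cited studies: the window size, minimum call count and difference threshold are hypothetical, and no statistical test or multiple-testing correction is applied.

```python
from collections import defaultdict


def call_dmrs(sample_a, sample_b, window=1000, min_calls=10, min_diff=0.3):
    """Naive window-based DMR screen between two samples (illustrative thresholds).

    sample_a, sample_b: dict position -> (methylated_calls, total_calls).
    Returns (start, end, level_a - level_b) for windows where both samples have
    at least min_calls calls and the pooled levels differ by >= min_diff.
    """
    def pooled(sample):
        bins = defaultdict(lambda: [0, 0])
        for pos, (mc, total) in sample.items():
            counts = bins[pos // window]
            counts[0] += mc
            counts[1] += total
        return bins

    a, b = pooled(sample_a), pooled(sample_b)
    dmrs = []
    for w in sorted(set(a) & set(b)):
        (ma, ta), (mb, tb) = a[w], b[w]
        if ta < min_calls or tb < min_calls:
            continue  # too few calls to compare this window
        diff = ma / ta - mb / tb
        if abs(diff) >= min_diff:
            dmrs.append((w * window, (w + 1) * window - 1, diff))
    return dmrs


sample_1 = {100: (9, 10), 1500: (5, 10)}  # hypothetical counts
sample_2 = {120: (1, 10), 1600: (5, 10)}
print(call_dmrs(sample_1, sample_2))  # one DMR: window 0-999, difference of about 0.8
```

Regions flagged this way could then be intersected with promoters or gene bodies and compared with expression differences.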
Thank you!