cs-e5875 high-throughput bioinformatics dna methylation
TRANSCRIPT
1/ 29
CS-E5875 High-Throughput Bioinformatics
DNA methylation analysis
Harri Lahdesmaki
Department of Computer Science
Aalto University
November 13, 2020
2/ 29
Contents
I DNA methylation
I Bisulfite sequencing (BS-seq) protocol
I Alignment and quantification of BS-seq data
I Statistical analysis of BS-seq data
3/ 29
DNA methylation
I Epigenetic changes are reversible modifications on DNA, or "on top of DNA", which do not change the DNA sequence itself
I DNA methylation is an epigenetic modification in which a methyl group is added to the 5 position of a cytosine in DNA
I The methyl group is added enzymatically by DNA methyltransferases (DNMTs)
I By far the most extensively studied epigenetic modification on DNA
Figure from http://www.ks.uiuc.edu/Research/methylation/
4/ 29
DNA methylation
I In mammalian genomes, DNA methylation primarily occurs in the context of CpG dinucleotides
I Non-CpG methylation is found e.g. in stem cells and brain
I CpGs occur at a lower frequency than expected
I Human genome GC content is 42%
I CpGs are expected to occur 4.41% of the time (0.21 × 0.21, assuming independent bases)
I The observed frequency of CpG dinucleotides is 1%
I Methylated CpGs are prone to spontaneous deamination to thymines
Figure from (Schubeler, 2009)
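The 4.41% figure above follows from treating neighbouring bases as independent: with 42% GC content, C and G each have frequency 0.21, so a CpG is expected with probability 0.21 × 0.21. The following is a minimal Python sketch of that arithmetic, not part of the original slides; the function names are hypothetical and the equal split of GC content between C and G is an assumption.

```python
def expected_cpg_fraction(gc_content: float) -> float:
    """Expected CpG dinucleotide frequency, assuming independent bases
    and that C and G each make up half of the GC content (assumption)."""
    p_c = p_g = gc_content / 2.0
    return p_c * p_g


def observed_cpg_fraction(seq: str) -> float:
    """Fraction of overlapping dinucleotides in seq that are exactly 'CG'."""
    seq = seq.upper()
    n_dinucs = len(seq) - 1
    if n_dinucs < 1:
        return 0.0
    n_cpg = sum(1 for i in range(n_dinucs) if seq[i:i + 2] == "CG")
    return n_cpg / n_dinucs


expected = expected_cpg_fraction(0.42)  # 0.21 * 0.21
print(f"expected CpG frequency: {expected:.4f}")
# Ratio of the observed genome-wide frequency (1%, from the slide) to the expectation
print(f"observed/expected ratio: {0.01 / expected:.2f}")
```

The observed/expected ratio is well below 1, which is consistent with the depletion of CpGs through deamination of methylated cytosines mentioned above.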
5/ 29
DNA methylation
I Two general classes of enzymatic methylation activities
I De novo methylation
I Maintenance methylation
Figure from http://2014.igem.org/Team:Heidelberg/Project/PCR_2.0
6/ 29
DNA methylation in gene regulation and various traits
I CpG islands (C+G-dense regions ≳500 bp long) are present in the 5' regulatory regions of many genes
I Hypermethylation (= overmethylation) of CpG islands near gene promoters contributes to transcriptional silencing by
I Affecting binding of transcription factors (DNA-binding proteins that regulate gene transcription)
I Binding proteins with methyl-CpG-binding domains (MBDs), and recruiting e.g. histone deacetylases and other chromatin remodellers
I DNA methylation differences are associated with many diseases
I DNA methylation is also known to be associated with e.g. the age of an individual and smoking
7/ 29
DNA methylation
Figure from (Spruijt & Vermeulen, 2014)
8/ 29
DNA demethylation
I Until recently, it was believed that methylated DNA can become unmethylated only by dilution during cell differentiation/DNA replication
I Recently, TET family proteins were shown to be dioxygenases that convert 5mC to 5hmC, 5fC and 5caC, which can be further converted back to unmethylated C
I TETs thus contribute to active demethylation, but 5hmC, 5fC and 5caC can also have multiple functions of their own
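The "dilution" route in the first bullet can be illustrated with a toy calculation: if the newly synthesized strand starts unmethylated and maintenance methylation fails, the methylated fraction of strands roughly halves at every replication. The sketch below is an illustrative assumption, not a model from the slides; the function name and the `maintenance_efficiency` parameter are hypothetical.

```python
def methylated_fraction_after_replications(
    initial_fraction: float,
    n_replications: int,
    maintenance_efficiency: float = 0.0,
) -> float:
    """Toy model of passive demethylation (assumption, not from the slides).

    At each replication the parental strand keeps its methylation and the
    new strand is methylated only with probability maintenance_efficiency,
    so the per-strand methylated fraction is multiplied by (1 + e) / 2.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be in [0, 1]")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be in [0, 1]")
    if n_replications < 0:
        raise ValueError("n_replications must be non-negative")
    factor = (1.0 + maintenance_efficiency) / 2.0
    return initial_fraction * factor ** n_replications


# No maintenance: methylation halves with every round of replication
for n in range(4):
    print(n, methylated_fraction_after_replications(1.0, n))
```

With `maintenance_efficiency=1.0` the fraction stays constant, which corresponds to fully working maintenance methylation; active, TET-mediated demethylation is not covered by this sketch.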
9/ 29
DNA demethylation
Figure 1 | Mechanisms of TET-mediated demethylation. (a) Known and putative pathways of DNA demethylation that involve oxidized methylcytosine intermediates: cytosine, 5mC, 5hmC, 5fC, 5caC and 5hmU, with DNMT, TET proteins, AID or APOBEC, TDG or SMUG1 and BER. (b) The mechanism of base J (β-glucosylhydroxymethyluracil) biosynthesis in Trypanosoma brucei and other kinetoplastid protozoa. (c) Mechanism by which 5hmC could facilitate replication-dependent DNA demethylation by impairing UHRF1/DNMT1 maintenance methylation.
Figure from Nature Reviews Molecular Cell Biology, volume 14, June 2013, p. 342
BER := base excision repair
TDG := thymine DNA glycosylase
AID := activation-induced deaminase
APOBEC := apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like
Figure: first page of Tahiliani et al., "Conversion of 5-Methylcytosine to 5-Hydroxymethylcytosine in Mammalian DNA by MLL Partner TET1", Science 324, 930 (2009)
Figure: first page of Ito et al., "Tet Proteins Can Convert 5-Methylcytosine to 5-Formylcytosine and 5-Carboxylcytosine", Science 333, 1300 (2011)
10/ 29
Contents
I DNA methylation
I Bisulfite sequencing (BS-seq) protocol
I Alignment and quantification of BS-seq data
I Statistical analysis of BS-seq data
11/ 29
Bisulfite sequencing (BS-seq) protocol
I Bisulfite treatment of genomic DNA converts unmethylated cytosines to uracils, which are read as thymines during sequencing
I Methylated (and hydroxymethylated) cytosines are resistant to the conversion and are read as cytosines
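The conversion rule in the two bullets above can be mimicked in silico. The following Python sketch is a simplified illustration under stated assumptions: it handles a single strand only, assumes complete conversion and error-free sequencing, and treats 5mC and 5hmC identically (both are "protected"). The function names `bisulfite_convert` and `call_methylation` are hypothetical and do not refer to any existing tool.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, protected_positions: Set[int]) -> str:
    """Return the sequence as it would be read after bisulfite treatment.

    Unprotected C -> U, read as T; cytosines at protected_positions
    (methylated or hydroxymethylated) are read as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """Compare a read with the reference at every reference C.

    C in the read -> protected (True); T in the read -> converted (False).
    Other read bases at a reference C are ignored in this sketch.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, protected_positions={1})
print(read)
print(call_methylation(reference, read))
```

Because a T in the read can come from either a true T or a converted C, the comparison always has to be made against the reference sequence, which is why bisulfite reads need dedicated alignment.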
Figure: first page of Booth et al., "Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution", Science 336, 934 (2012)
Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution
Michael J. Booth,1* Miguel R. Branco,2,3* Gabriella Ficz,2 David Oxley,4 Felix Krueger,5 Wolf Reik,2,3† Shankar Balasubramanian1,6,7†
5-Methylcytosine can be converted to 5-hydroxymethylcytosine (5hmC) in mammalian DNA by the ten-eleven translocation (TET) enzymes. We introduce oxidative bisulfite sequencing (oxBS-Seq), the first method for quantitative mapping of 5hmC in genomic DNA at single-nucleotide resolution. Selective chemical oxidation of 5hmC to 5-formylcytosine (5fC) enables bisulfite conversion of 5fC to uracil. We demonstrate the utility of oxBS-Seq to map and quantify 5hmC at CpG islands (CGIs) in mouse embryonic stem (ES) cells and identify 800 5hmC-containing CGIs that have on average 3.3% hydroxymethylation. High levels of 5hmC were found in CGIs associated with transcriptional regulators and in long interspersed nuclear elements, suggesting that these regions might undergo epigenetic reprogramming in ES cells. Our results open new questions on 5hmC dynamics and sequence-specific targeting by TETs.
5-Methylcytosine (5mC) is an epigenetic DNA mark that plays important roles in gene silencing and genome stability and is found enriched at CpG dinucleotides (1). In metazoa, 5mC can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) enzyme family (2, 3). 5hmC may be an intermediate in active DNA demethylation but could also constitute an epigenetic mark per se (4). Levels of 5hmC in genomic DNA can be quantified with analytical methods (2, 5, 6) and mapped through the enrichment of 5hmC-containing DNA fragments that are then sequenced (7–13). Such approaches have relatively poor resolution and give only relative quantitative information. Single-nucleotide sequencing of 5mC has been performed by using bisulfite sequencing (BS-Seq), but this method cannot discriminate 5mC from 5hmC (14, 15). Single-molecule real-time sequencing (SMRT) can detect derivatized 5hmC in genomic DNA (16). However, enrichment of 5hmC-containing DNA fragments is required, which causes loss of quantitative information (16). Furthermore, SMRT has a relatively high rate of sequencing errors (17), and the peak calling of modifications is imprecise (16). Protein and solid-state nanopores can resolve 5mC from 5hmC and have the potential to sequence unamplified DNA (18, 19).
We observed the decarbonylation and deamination of 5-formylcytosine (5fC) to uracil (U) under bisulfite conditions that would leave 5mC unchanged (Fig. 1A and supplementary text). Thus, 5hmC sequencing would be possible if 5hmC could be selectively oxidized to 5fC and then converted to U in a two-step procedure (Fig. 1B). Whereas BS-Seq leads to both 5mC and 5hmC being detected as Cs, this "oxidative bisulfite" sequencing (oxBS-Seq) approach would yield Cs only at 5mC sites and therefore allow us to determine the amount of 5hmC at a particular nucleotide position by subtraction of this readout from a BS-Seq one (Fig. 1C).
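The subtraction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names and the read counts are hypothetical, and the naive difference ignores sampling noise and incomplete conversion, so it can come out negative at low-coverage positions.

```python
def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads reporting C at one cytosine position."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this position")
    return c_reads / total


def estimate_5hmc(bs_c: int, bs_t: int, oxbs_c: int, oxbs_t: int):
    """Return naive (5mC, 5hmC) level estimates for one position.

    BS-Seq reads C for 5mC + 5hmC; oxBS-Seq reads C for 5mC only,
    so the 5hmC level is the difference between the two read-outs.
    """
    bs_level = methylation_level(bs_c, bs_t)        # 5mC + 5hmC
    oxbs_level = methylation_level(oxbs_c, oxbs_t)  # 5mC only
    return oxbs_level, bs_level - oxbs_level


# Hypothetical counts: 60/100 C reads in BS-Seq, 50/100 in oxBS-Seq.
mc, hmc = estimate_5hmc(bs_c=60, bs_t=40, oxbs_c=50, oxbs_t=50)
print(f"5mC = {mc:.2f}, 5hmC = {hmc:.2f}")
```

In practice the two libraries are sequenced to different depths, so a statistical test or model is typically needed before calling a position hydroxymethylated.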
Specific oxidation of 5hmC to 5fC (table S1) was achieved with potassium perruthenate (KRuO4). In our reactivity studies on a synthetic 15-nucleotide oligomer single-stranded DNA (ssDNA) containing 5hmC, we established conditions under which KRuO4 reacted specifically with the primary alcohol of 5hmC (Fig. 2A). Fifteen-nucleotide oligomer ssDNA that contained C or 5mC did not show any base-specific reactions with KRuO4 (fig. S1, A and B). For 5hmC in DNA, we only observed the aldehyde (5fC) and not the carboxylic acid (20), even with a moderate excess of oxidant. The KRuO4 oxidation can oxidize 5hmC in samples presented as double-stranded DNA (dsDNA), with an initial denaturing step before addition of the oxidant; this results in a quantitative conversion of 5hmC to 5fC (Fig. 2B).
To test the efficiency and selectivity of the oxidative bisulfite method, three synthetic dsDNAs containing either C, 5mC, or 5hmC were each oxidized with KRuO4 and then subjected to a conventional bisulfite conversion protocol. Sanger sequencing revealed that 5mC residues did not convert to U, whereas both C and 5hmC residues did convert to U (fig. S2). Because Sanger sequencing is not quantitative, to gain a more accurate measure of the efficiency of transforming 5hmC to U, Illumina (San Diego, California) sequencing was carried out on the synthetic DNA containing 5hmC (122-nucleotide oligomer) after oxidative bisulfite treatment. An overall 5hmC-to-U conversion level of 94.5% was observed (Fig. 2C and fig. S14). The oxidative bisulfite protocol was also applied to a synthetic dsDNA that contained multiple 5hmC residues (135-nucleotide oligomer) in a range of different contexts that showed a similarly high conversion efficiency (94.7%) of 5hmC to U (Fig. 2C and fig. S14). Last, the KRuO4 oxidation was carried out on genomic DNA and showed through mass spectrometry a quantitative conversion of 5hmC to
1Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK. 2Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, UK. 3Centre for Trophoblast Research, University of Cambridge, Cambridge CB2 3EG, UK. 4Proteomics Research Group, Babraham Institute, Cambridge CB22 3AT, UK. 5Bioinformatics Group, Babraham Institute, Cambridge CB22 3AT, UK. 6School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, UK. 7Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge CB2 0RE, UK.
*These authors contributed equally to this work. †To whom correspondence should be addressed. E-mail: [email protected] (W.R.); [email protected] (S.B.)
18 MAY 2012 VOL 336 SCIENCE www.sciencemag.org 934
REPORTS
Bisulphite sequencing (BS-seq)
Oxidative bisulphite sequencing (oxBS-seq)
(He et al., 2011; Ito et al., 2011; Tahiliani et al., 2009). These oxidized methylcytosines (oxi-mC) have been proposed to play a role in active DNA demethylation through 5mC oxidation and DNA repair, and in chromatin regulation (Pastor et al., 2013). 5mC and all the oxi-mC species are of great interest due to the alleged role of DNA methylation in diseases, such as different cancers (Baylin, 2005), Alzheimer (De Jager et al., 2014), asthma (Rastogi et al., 2013), autism (Nardone et al., 2014) and type 2 diabetes (Dayeh et al., 2014). However, studies of primary human clinical samples are complicated by many factors; for instance, greater biological variation compared with more controlled molecular biology studies, possible confounding factors and case-control matching.
Bisulphite sequencing (BS-seq) has become the gold standard technique for profiling methylation at single nucleotide resolution (Lister et al., 2009, 2013; Rein et al., 1998). In BS-seq, genomic DNA is treated with sodium bisulphite, which will rapidly deaminate unmodified cytosine (and 5fC and 5caC) to uracil, while deamination of 5mC and 5hmC is much slower (Frommer et al., 1992). Next, after PCR amplification, uracil and cytosine are read as thymine and cytosine, respectively. Importantly, 5fC and 5caC will have the same read-out as unmodified cytosine and, similarly, 5hmC and 5mC share the same read-out in BS-seq (Huang et al., 2010). This observation drove the development of various modified bisulphite sequencing protocols (reviewed in Plongthongkum et al., 2014). For instance, oxidative bisulphite sequencing (oxBS-seq) (Booth et al., 2012) and Tet-assisted bisulphite sequencing (TAB-seq) (Yu et al., 2012) were developed for distinguishing 5hmC from 5mC. Both methods, oxBS-seq and TAB-seq, are based on oxidation; 5hmC is oxidised into 5fC by KRuO4 in oxBS-seq, whereas in TAB-seq 5mC is oxidised into 5caC by recombinant mouse Tet1. To gain information on 5fC, 5fC chemical modification-assisted bisulphite sequencing (fCAB-seq) (Lu et al., 2013) and reduced bisulphite sequencing (redBS-seq) (Booth et al., 2014) have been proposed. Chemical modification-assisted bisulphite sequencing (CAB-seq) together with BS-seq allows the quantification of 5caC by protecting 5caC from deamination by sodium bisulphite with 1-ethyl-3-[3-dimethylaminopropyl]-carbodiimide hydrochloride (Lu et al., 2013). CpG methyltransferase (M.SssI) assisted bisulphite sequencing (MAB-seq) when combined with BS-seq distinguishes 5fC/5caC from C (Wu et al., 2014). A summary of the read-outs of the described bisulphite sequencing approaches is listed in Figure 1A.
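The BS-seq and oxBS-seq read-outs described in this paragraph can be written down as lookup tables. The following Python sketch is an idealised illustration that assumes every chemical step succeeds completely; the dictionary and function names are hypothetical, and only the two protocols whose read-outs are fully stated in the text are encoded.

```python
# Read-out after bisulphite treatment and PCR, as stated in the text:
# "T" = deaminated (read as thymine), "C" = protected (read as cytosine).
BS_READOUT = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"}

# oxBS-seq first oxidises 5hmC to 5fC, so 5hmC then reads like 5fC.
OXBS_READOUT = {**BS_READOUT, "5hmC": BS_READOUT["5fC"]}


def read_out(bases, table):
    """Translate a list of template bases into sequenced bases."""
    # Bases without an entry (A, G, T) pass through unchanged.
    return [table.get(base, base) for base in bases]


# Hypothetical template containing one C, one 5mC and one 5hmC.
template = ["A", "G", "T", "C", "5mC", "5hmC"]
bs = read_out(template, BS_READOUT)
oxbs = read_out(template, OXBS_READOUT)
print("BS:  ", " ".join(bs))
print("oxBS:", " ".join(oxbs))

# In this idealised setting, positions where the read-outs differ carry 5hmC.
hmc_positions = [i for i, (a, b) in enumerate(zip(bs, oxbs)) if a != b]
```

With real data each position is covered by many reads from a mixture of molecules, so the comparison is made on proportions of C reads and not on single bases.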
In order to estimate proportions of multiple methylation modifications, one has to deconvolute and integrate data from multiple bisulphite based measurements (Fig. 1A) which often have biases due to imperfect experimental steps (Plongthongkum et al., 2014). Many computational methods have been developed for analysing the standard bisulphite sequencing data (here we will describe only the most relevant methodologies, for a more comprehensive list of different methods see Äijö et al., 2016). Methods based on beta-binomial models have been proposed allowing modeling of sampling and biological variation. For instance, MOABS uses a hierarchical beta-binomial model with an empirical Bayesian approach (Sun et al., 2014). To assess differential methylation, MOABS uses credible methylation difference metric for summarizing statistical and biological significance (Sun et al., 2014). Another method, RADMeth, takes into account covariates under the beta-binomial model using a generalised linear model approach with the logit link function (Dolzhenko and Smith, 2014). RADMeth detects differential methylation by using the log-likelihood ratio test and the evidence for differential methylation across neighbouring cytosines is shared using the Stouffer-Liptak weighted Z test. Recently, the MACAU method was proposed, which combines a binomial mixed model with a sampling-based inference algorithm to model various genetic relatedness/population structures (Lea et al., 2015). MACAU uses Wald test statistics on the posterior samples to call whether a covariate has an effect on methylation (Lea et al., 2015).
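To make concrete why beta-binomial models capture biological variation on top of sampling variation, the sketch below evaluates a beta-binomial distribution for methylated read counts and compares its variance with the binomial one. It is not the MOABS or RADMeth implementation; the mean/dispersion parameterisation (mu, phi) and all numbers are assumptions chosen for illustration.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k: int, n: int, mu: float, phi: float) -> float:
    """log P(k methylated reads out of n) under a beta-binomial model.

    mu  : mean methylation level, 0 < mu < 1
    phi : dispersion, 0 < phi < 1 (phi -> 0 approaches the binomial)
    """
    # Convert (mu, phi) to the usual Beta shape parameters.
    a = mu * (1 - phi) / phi
    b = (1 - mu) * (1 - phi) / phi
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


# Hypothetical site: 20 reads, mean methylation 0.3, dispersion 0.1.
n, mu, phi = 20, 0.3, 0.1
pmf = [exp(betabinom_logpmf(k, n, mu, phi)) for k in range(n + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
binom_var = n * mu * (1 - mu)
print(f"mean = {mean:.2f}, variance = {var:.2f}, binomial variance = {binom_var:.2f}")
```

Under this parameterisation the variance is inflated by the factor 1 + (n - 1) * phi relative to the binomial, which is the extra between-sample variability the models above are designed to absorb.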
Fig. 1. (A) The conversion chart of C, 5mC, 5hmC, 5fC and 5caC in BS-seq, oxBS-seq, TAB-seq, CAB-seq, fCAB-seq, redBS-seq and MAB-seq experiments. (B) The experimental steps of BS- and oxBS-seq experiments are represented in terms of experimental parameters. Green and red arrows depict successful and unsuccessful steps, respectively. (C) The proposed hierarchical model for modeling methylation modification proportions for BS-seq and oxBS-seq data and parts of the original Lux model represented in the plate notation. The grey and white circles are used to represent observed variables and latent variables, respectively. The grey squares represent fixed hyperparameters. The components, which model the experimental parameters and control cytosines are the same as in the Lux model (Äijö et al., 2016)
i2 T. Äijö et al.
potassium perruthenate (KRuO4)
5fC (Fig. 2D), with no detectable degradation of C (fig. S1C). Thus, the oxidative bisulfite protocol specifically converts 5hmC to U in DNA, leaving C and 5mC unchanged, enabling quantitative, single-nucleotide-resolution sequencing on widely available platforms.
We then used oxBS-Seq to quantitatively map 5hmC at high resolution in the genomic DNA of mouse embryonic stem (ES) cells. We chose to combine oxidative bisulfite with reduced representation bisulfite sequencing (RRBS) (21), which allows deep, selective sequencing of a fraction of the genome that is highly enriched for CpG islands (CGIs). We generated RRBS and oxidative RRBS (oxRRBS) data sets, achieving an average sequencing depth of ~120 reads per CpG, which when pooled yielded an average of ~3300 methylation calls per CGI (fig. S3). After applying depth and breadth cutoffs (supplementary materials, materials and methods), 55% (12,660) of all CGIs (22) were covered in our data sets.
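The pooling of per-CpG calls into per-CGI totals, followed by depth and breadth cutoffs, can be sketched as below. This is a hedged illustration only: the actual cutoffs are given in the paper's supplementary materials and are not reproduced here, so the thresholds, record layout and CGI identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-CpG records: (CGI id, methylated calls, total calls).
cpg_calls = [
    ("cgi_1", 30, 120), ("cgi_1", 10, 110), ("cgi_1", 5, 130),
    ("cgi_2", 2, 8),
]

MIN_DEPTH = 10  # assumed minimum calls for a CpG to count as covered
MIN_CPGS = 2    # assumed minimum covered CpGs per CGI (breadth)


def pool_by_cgi(records, min_depth=MIN_DEPTH, min_cpgs=MIN_CPGS):
    """Pool CpG-level calls per CGI and apply depth/breadth cutoffs."""
    pooled = defaultdict(lambda: [0, 0, 0])  # methylated, total, covered CpGs
    for cgi, meth, total in records:
        if total < min_depth:
            continue  # depth cutoff: skip poorly covered CpGs
        entry = pooled[cgi]
        entry[0] += meth
        entry[1] += total
        entry[2] += 1
    return {
        cgi: (meth, total, meth / total)
        for cgi, (meth, total, n_cpgs) in pooled.items()
        if n_cpgs >= min_cpgs  # breadth cutoff: enough covered CpGs
    }


result = pool_by_cgi(cpg_calls)
for cgi, (meth, total, level) in result.items():
    print(f"{cgi}: {meth}/{total} calls, level = {level:.3f}")
```

Running the same pooling on the RRBS and oxRRBS data sets gives, per CGI, the two sets of counts that a differential test would then compare.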
To identify 5hmC-containing CGIs, we tested for differences between the RRBS and oxRRBS data sets using stringent criteria, yielding a false discovery rate of 3.7% (supplementary materials, materials and methods). We identified 800 5hmC-containing CGIs, which had an average of 3.3% (range of 0.2 to 18.5%) CpG hydroxymethylation (Fig. 3, A and B). We also identified 4577 5mC-containing CGIs averaging 8.1% CpG methylation (Fig. 3B). We carried out sequencing on an independent biological duplicate sample of the same ES cell line but at a different passage
[Figure 1 panels. (A) Plot of normalised concentration of d5fC and dU against time (minutes). (B) Reaction scheme: 5hmC, oxidation, 5fC, then 1) NaHSO3 and 2) NaOH, giving U. (C) Workflow: input DNA (A G T C 5mC 5hmC) is sequenced after BS and amplification, or after oxidation (A G T C 5mC 5fC) followed by BS and amplification, and the sequences are compared.]

Base    Sequence    BS sequence    oxBS sequence
C       C           T              T
5mC     C           C              C
5hmC    C           C              T
Fig. 1. A method for single-base resolution sequencing of 5hmC. (A) Reaction of 2ʹ-deoxy-5-formylcytidine (d5fC) with NaHSO3 (bisulfite) quenched by NaOH at different time points and then analyzed with high-performance liquid chromatography (HPLC). Data are mean ± SD of three replicates. (B) Oxidative bisulfite reaction scheme: oxidation of 5hmC to 5fC followed by bisulfite treatment and NaOH to convert 5fC to U. The R group is DNA. (C) Diagram and table outlining the BS-Seq and oxBS-Seq techniques.
[Figure 2 panels. (A) Single Stranded DNA Oxidation: normalised concentration of 5hmC and 5fC, input vs oxidised. (B) Double Stranded DNA Oxidation: normalised concentration of 5hmC and 5fC, input vs oxidised. (C) Single 5hmCpG and Multiple 5hmCpGs: % C-T conversion for 5mC and 5hmC, with labelled bar values 94.5 and 2.1 (single) and 94.7 and 2.1 (multiple). (D) Genomic DNA Oxidation: 5hmC and 5fC (normalised peak areas), input vs oxidised.]
Fig. 2. Quantification of 5hmC oxidation. (A) Levels of 5hmC and 5fC (normalized to T) in a 15-nucleotide oligomer ssDNA oligonucleotide before and after KRuO4 oxidation, measured with mass spectrometry. (B) Levels of 5hmC and 5fC (normalized to 5mC) in a 135-nucleotide oligomer dsDNA fragment before and after KRuO4 oxidation. (C) C-to-T conversion levels as determined by means of Illumina sequencing of two dsDNA fragments containing either a single 5hmCpG (122-nucleotide oligomer) or multiple 5hmCpGs (135-nucleotide oligomer) after oxidative bisulfite treatment. 5mC was also present in these strands. (D) Levels of 5hmC and 5fC (normalized to 5mC in primer sequence) in ES cell DNA measured before and after oxidation. Data are mean ± SD.
BS-seq, oxBS-seq, etc.
3. W. Kim, S. Kook, D. J. Kim, C. Teodorof, W. K. Song,J. Biol. Chem. 279, 8333 (2004).
4. V. Giambra et al., Mol. Cell. Biol. 28, 6123 (2008).5. F. E. Garrett et al., Mol. Cell. Biol. 25, 1511
(2005).6. W. A. Dunnick et al., J. Exp. Med. 206, 2613
(2009).7. M. Cogné et al., Cell 77, 737 (1994).8. J. P. Manis et al., J. Exp. Med. 188, 1421 (1998).9. A. G. Bébin et al., J. Immunol. 184, 3710 (2010).
10. E. Pinaud et al., Immunity 15, 187 (2001).11. C. Vincent-Fabert et al., Blood 116, 1895 (2010).12. R. Wuerffel et al., Immunity 27, 711 (2007).13. Z. Ju et al., J. Biol. Chem. 282, 35169 (2007).14. H. Duan, H. Xiang, L. Ma, L. M. Boxer, Oncogene 27,
6720 (2008).15. M. Gostissa et al., Nature 462, 803 (2009).16. C. Chauveau, M. Cogné, Nat. Genet. 14, 15 (1996).
17. C. Chauveau, E. Pinaud, M. Cogne, Eur. J. Immunol. 28,3048 (1998).
18. M. A. Sepulveda, F. E. Garrett, A. Price-Whelan,B. K. Birshtein, Mol. Immunol. 42, 605 (2005).
19. E. Pinaud, C. Aupetit, C. Chauveau, M. Cogné,Eur. J. Immunol. 27, 2981 (1997).
20. A. A. Khamlichi et al., Blood 103, 3828 (2004).21. R. Shinkura et al., Nat. Immunol. 4, 435 (2003).22. A. Yamane et al., Nat. Immunol. 12, 62 (2011).23. M. Liu et al., Nature 451, 841 (2008).24. J. Stavnezer, J. E. Guikema, C. E. Schrader, Annu. Rev.
Immunol. 26, 261 (2008).25. S. Duchez et al., Proc. Natl. Acad. Sci. U.S.A. 107, 3064
(2010).26. T. K. Kim et al., Nature 465, 182 (2010).
Acknowledgments: We thank T. Honjo for providingAID−/− mice and F. Lechouane for sorted B cells DNA samples.
We are indebted to the cell sorting facility of LimogesUniversity for excellent technical assistance in cell sorting. Thiswork was supported by grants from Association pour laRecherche sur le Cancer, Ligue Nationale contre le Cancer,Cancéropôle Grand Sud-Ouest, Institut National du Cancer,and Région Limousin. The data presented in this paper aretabulated here and in the supplementary materials.
Supplementary Materialswww.sciencemag.org/cgi/content/full/science.1218692/DC1Materials and MethodsFigs. S1 to S4Tables S1 and S2References (27–30)
4 January 2012; accepted 27 March 2012Published online 26 April 2012;10.1126/science.1218692
Quantitative Sequencing of5-Methylcytosine and5-Hydroxymethylcytosine atSingle-Base ResolutionMichael J. Booth,1* Miguel R. Branco,2,3* Gabriella Ficz,2 David Oxley,4 Felix Krueger,5
Wolf Reik,2,3† Shankar Balasubramanian1,6,7†
5-Methylcytosine can be converted to 5-hydroxymethylcytosine (5hmC) in mammalian DNA by theten-eleven translocation (TET) enzymes. We introduce oxidative bisulfite sequencing (oxBS-Seq),the first method for quantitative mapping of 5hmC in genomic DNA at single-nucleotide resolution.Selective chemical oxidation of 5hmC to 5-formylcytosine (5fC) enables bisulfite conversion of5fC to uracil. We demonstrate the utility of oxBS-Seq to map and quantify 5hmC at CpG islands(CGIs) in mouse embryonic stem (ES) cells and identify 800 5hmC-containing CGIs that haveon average 3.3% hydroxymethylation. High levels of 5hmC were found in CGIs associated withtranscriptional regulators and in long interspersed nuclear elements, suggesting that theseregions might undergo epigenetic reprogramming in ES cells. Our results open new questionson 5hmC dynamics and sequence-specific targeting by TETs.
5-Methylcytosine (5mC) is an epigenetic DNAmark that plays important roles in genesilencing and genome stability and is found
enriched at CpG dinucleotides (1). In metazoa,5mC can be oxidized to 5-hydroxymethylcytosine(5hmC) by the ten-eleven translocation (TET) en-zyme family (2, 3). 5hmCmay be an intermediatein active DNA demethylation but could also con-stitute an epigenetic mark per se (4). Levels of5hmC in genomic DNA can be quantified withanalytical methods (2, 5, 6) and mapped throughthe enrichment of 5hmC-containing DNA frag-
ments that are then sequenced (7–13). Such ap-proaches have relatively poor resolution and giveonly relative quantitative information. Single-nucleotide sequencing of 5mC has been per-formed by using bisulfite sequencing (BS-Seq),but this method cannot discriminate 5mC from5hmC (14, 15). Single-molecule real-time se-quencing (SMRT) can detect derivatized 5hmCin genomic DNA (16). However, enrichment of5hmC-containing DNA fragments is required,which causes loss of quantitative information(16). Furthermore, SMRT has a relatively highrate of sequencing errors (17), and the peak call-ing of modifications is imprecise (16). Proteinand solid-state nanopores can resolve 5mC from5hmC and have the potential to sequence unam-plified DNA (18, 19).
We observed the decarbonylation and deami-nation of 5-formylcytosine (5fC) to uracil (U)under bisulfite conditions that would leave 5mCunchanged (Fig. 1A and supplementary text).Thus, 5hmC sequencing would be possible if5hmC could be selectively oxidized to 5fC andthen converted to U in a two-step procedure (Fig.
1B). Whereas BS-Seq leads to both 5mC and5hmC being detected as Cs, this “oxidativebisulfite” sequencing (oxBS-Seq) approach wouldyield Cs only at 5mC sites and therefore allowus to determine the amount of 5hmC at a partic-ular nucleotide position by subtraction of thisreadout from a BS-Seq one (Fig. 1C).
Specific oxidation of 5hmC to 5fC (table S1)was achieved with potassium perruthenate (KRuO4).In our reactivity studies on a synthetic 15-nucleotideoligomer single-stranded DNA (ssDNA) contain-ing 5hmC, we established conditions under whichKRuO4 reacted specifically with the primary al-cohol of 5hmC (Fig. 2A). Fifteen-nucleotide oligo-mer ssDNA that contained C or 5mC did notshow any base-specific reactions with KRuO4 (fig.S1, A and B). For 5hmC in DNA, we only ob-served the aldehyde (5fC) and not the carboxylicacid (20), even with a moderate excess of oxidant.The KRuO4 oxidation can oxidize 5hmC in sam-ples presented as double-stranded DNA (dsDNA),with an initial denaturing step before addition ofthe oxidant; this results in a quantitative conver-sion of 5hmC to 5fC (Fig. 2B).
To test the efficiency and selectivity of the oxi-dative bisulfite method, three synthetic dsDNAscontaining either C, 5mC, or 5hmC were eachoxidized with KRuO4 and then subjected to aconventional bisulfite conversion protocol. Sangersequencing revealed that 5mC residues did notconvert to U, whereas both C and 5hmC resi-dues did convert to U (fig. S2). Because Sangersequencing is not quantitative, to gain a moreaccurate measure of the efficiency of transforming5hmC to U, Illumina (San Diego, California) se-quencing was carried out on the synthetic DNAcontaining 5hmC (122-nucleotide oligomer) afteroxidative bisulfite treatment. An overall 5hmC-to-U conversion level of 94.5% was observed (Fig.2C and fig. S14). The oxidative bisulfite proto-col was also applied to a synthetic dsDNA thatcontained multiple 5hmC residues (135-nucleotideoligomer) in a range of different contexts thatshowed a similarly high conversion efficiency(94.7%) of 5hmC to U (Fig. 2C and fig. S14).Last, the KRuO4 oxidation was carried out ongenomic DNA and showed through mass spec-trometry a quantitative conversion of 5hmC to
18 MAY 2012 VOL 336 SCIENCE www.sciencemag.org 934
REPORTS
Bisulphite sequencing (BS-seq)
Oxidative bisulphite sequencing (oxBS-seq)
(He et al., 2011; Ito et al., 2011; Tahiliani et al., 2009). These oxidized methylcytosines (oxi-mC) have been proposed to play a role in active DNA demethylation through 5mC oxidation and DNA repair, and in chromatin regulation (Pastor et al., 2013). 5mC and all the oxi-mC species are of great interest due to the alleged role of DNA methylation in diseases, such as different cancers (Baylin, 2005), Alzheimer (De Jager et al., 2014), asthma (Rastogi et al., 2013), autism (Nardone et al., 2014) and type 2 diabetes (Dayeh et al., 2014). However, studies of primary human clinical samples are complicated by many factors; for instance, greater biological variation compared with more controlled molecular biology studies, possible confounding factors and case-control matching.
Bisulphite sequencing (BS-seq) has become the gold standard technique for profiling methylation at single nucleotide resolution (Lister et al., 2009, 2013; Rein et al., 1998). In BS-seq, genomic DNA is treated with sodium bisulphite, which will rapidly deaminate unmodified cytosine (and 5fC and 5caC) to uracil, while deamination of 5mC and 5hmC is much slower (Frommer et al., 1992). Next, after PCR amplification, uracil and cytosine are read as thymine and cytosine, respectively. Importantly, 5fC and 5caC will have the same read-out as unmodified cytosine and, similarly, 5hmC and 5mC share the same read-out in BS-seq (Huang et al., 2010). This observation drove the development of various modified bisulphite sequencing protocols (reviewed in Plongthongkum et al., 2014). For instance, oxidative bisulphite sequencing (oxBS-seq) (Booth et al., 2012) and Tet-assisted bisulphite sequencing (TAB-seq) (Yu et al., 2012) were developed for distinguishing 5hmC from 5mC. Both methods, oxBS-seq and TAB-seq, are based on oxidation; 5hmC is oxidised into 5fC by KRuO4 in oxBS-seq, whereas in TAB-seq 5mC is oxidised into 5caC by recombinant mouse Tet1. To gain information on 5fC, 5fC chemical modification-assisted bisulphite sequencing (fCAB-seq) (Lu et al., 2013) and reduced bisulphite sequencing (redBS-seq) (Booth et al., 2014) have been proposed. Chemical modification-assisted bisulphite sequencing (CAB-seq) together with BS-seq allows the quantification of 5caC by protecting 5caC from deamination by sodium bisulphite with 1-ethyl-3-[3-dimethylaminopropyl]-carbodiimide hydrochloride (Lu et al., 2013). CpG methyltransferase (M.SssI) assisted bisulphite sequencing (MAB-seq) when combined with BS-seq distinguishes 5fC/5caC from C (Wu et al., 2014). A summary of the read-outs of the described bisulphite sequencing approaches is listed in Figure 1A.
In order to estimate proportions of multiple methylation modifications, one has to deconvolute and integrate data from multiple bisulphite based measurements (Fig. 1A) which often have biases due to imperfect experimental steps (Plongthongkum et al., 2014). Many computational methods have been developed for analysing the standard bisulphite sequencing data (here we will describe only the most relevant methodologies, for a more comprehensive list of different methods see Äijö et al., 2016). Methods based on beta-binomial models have been proposed allowing modeling of sampling and biological variation. For instance, MOABS uses a hierarchical beta-binomial model with an empirical Bayesian approach (Sun et al., 2014). To assess differential methylation, MOABS uses credible methylation difference metric for summarizing statistical and biological significance (Sun et al., 2014). Another method, RADMeth, takes into account covariates under the beta-binomial model using a generalised linear model approach with the logit link function (Dolzhenko and Smith, 2014). RADMeth detects differential methylation by using the log-likelihood ratio test and the evidence for differential methylation across neighbouring cytosines is shared using the Stouffer-Liptak weighted Z test. Recently, the MACAU method was proposed, which combines a binomial mixed model with a sampling-based inference algorithm to model various genetic relatedness/population structures (Lea et al., 2015). MACAU uses Wald test statistics on the posterior samples to call whether a covariate has an effect on methylation (Lea et al., 2015).
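The read-outs described in the text can be written down as a small lookup table, which makes it easy to see which modifications an experiment (or a combination of experiments) can tell apart. The sketch below covers only BS-seq, oxBS-seq and TAB-seq and assumes idealised, error-free conversion. The TAB-seq entry for 5hmC (read as C because it is protected before Tet oxidation) is an assumption not spelled out in the excerpt, and the names `READOUT` and `indistinguishable_groups` are illustrative.

```python
# Idealised read-out of each cytosine species after conversion and PCR.
READOUT = {
    "BS-seq":   {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "oxBS-seq": {"C": "T", "5mC": "C", "5hmC": "T", "5fC": "T", "5caC": "T"},
    # Assumption: 5hmC is protected in TAB-seq and therefore reads as C.
    "TAB-seq":  {"C": "T", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"},
}


def signature(mod, experiments):
    """Read-outs of one modification across the chosen experiments."""
    return tuple(READOUT[e][mod] for e in experiments)


def indistinguishable_groups(experiments):
    """Group modifications that give identical read-outs in all experiments."""
    groups = {}
    for mod in READOUT[experiments[0]]:
        groups.setdefault(signature(mod, experiments), []).append(mod)
    return sorted(groups.values())


bs_only = indistinguishable_groups(["BS-seq"])
bs_oxbs = indistinguishable_groups(["BS-seq", "oxBS-seq"])
print(bs_only)
print(bs_oxbs)
```

With BS-seq alone, 5mC and 5hmC fall into one group; adding oxBS-seq separates them, while C, 5fC and 5caC remain grouped together.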
Fig. 1. (A) The conversion chart of C, 5mC, 5hmC, 5fC and 5caC in BS-seq, oxBS-seq, TAB-seq, CAB-seq, fCAB-seq, redBS-seq and MAB-seq experiments. (B) The experimental steps of BS- and oxBS-seq experiments are represented in terms of experimental parameters. Green and red arrows depict successful and unsuccessful steps, respectively. (C) The proposed hierarchical model for modeling methylation modification proportions for BS-seq and oxBS-seq data and parts of the original Lux model represented in the plate notation. The grey and white circles are used to represent observed variables and latent variables, respectively. The grey squares represent fixed hyperparameters. The components, which model the experimental parameters and control cytosines are the same as in the Lux model (Äijö et al., 2016)
i2 T. Äijö et al.
potassium perruthenate (KRuO4)
5fC (Fig. 2D), with no detectable degradation of C (fig. S1C). Thus, the oxidative bisulfite protocol specifically converts 5hmC to U in DNA, leaving C and 5mC unchanged, enabling quantitative, single-nucleotide-resolution sequencing on widely available platforms.
We then used oxBS-Seq to quantitatively map 5hmC at high resolution in the genomic DNA of mouse embryonic stem (ES) cells. We chose to combine oxidative bisulfite with reduced representation bisulfite sequencing (RRBS) (21), which allows deep, selective sequencing of a fraction of the genome that is highly enriched for CpG islands (CGIs). We generated RRBS and oxidative RRBS (oxRRBS) data sets, achieving an average sequencing depth of ~120 reads per CpG, which when pooled yielded an average of ~3300 methylation calls per CGI (fig. S3). After applying depth and breadth cutoffs (supplementary materials, materials and methods), 55% (12,660) of all CGIs (22) were covered in our data sets.
To identify 5hmC-containing CGIs, we tested for differences between the RRBS and oxRRBS data sets using stringent criteria, yielding a false discovery rate of 3.7% (supplementary materials, materials and methods). We identified 800 5hmC-containing CGIs, which had an average of 3.3% (range of 0.2 to 18.5%) CpG hydroxymethylation (Fig. 3, A and B). We also identified 4577 5mC-containing CGIs averaging 8.1% CpG methylation (Fig. 3B). We carried out sequencing on an independent biological duplicate sample of the same ES cell line but at a different passage
[Figure panels: (A) normalised concentrations of d5fC and dU over time (0 to 400 minutes); (B) reaction scheme 5hmC → 5fC (oxidation) → U (1. NaHSO3, 2. NaOH); (C) workflow: input DNA, oxidation, BS and amplification, compare sequences.]
Base    Sequence    BS Sequence    oxBS Sequence
C       C           T              T
5mC     C           C              C
5hmC    C           C              T
Fig. 1. A method for single-base resolution sequencing of 5hmC. (A) Reaction of 2ʹ-deoxy-5-formylcytidine (d5fC) with NaHSO3 (bisulfite) quenched by NaOH at different time points and then analyzed with high-performance liquid chromatography (HPLC). Data are mean ± SD of three replicates. (B) Oxidative bisulfite reaction scheme: oxidation of 5hmC to 5fC followed by bisulfite treatment and NaOH to convert 5fC to U. The R group is DNA. (C) Diagram and table outlining the BS-Seq and oxBS-Seq techniques.
[Figure panels: (A) Single Stranded DNA Oxidation; (B) Double Stranded DNA Oxidation; (C) Single 5hmCpG and Multiple 5hmCpGs, % C-T conversion for 5mC and 5hmC; (D) Genomic DNA Oxidation. Panels A, B and D compare Input with Oxidised samples.]
Fig. 2. Quantification of 5hmC oxidation. (A) Levels of 5hmC and 5fC (normalized to T) in a 15-nucleotide oligomer ssDNA oligonucleotide before and after KRuO4 oxidation, measured with mass spectrometry. (B) Levels of 5hmC and 5fC (normalized to 5mC) in a 135-nucleotide oligomer dsDNA fragment before and after KRuO4 oxidation. (C) C-to-T conversion levels as determined by means of Illumina sequencing of two dsDNA fragments containing either a single 5hmCpG (122-nucleotide oligomer) or multiple 5hmCpGs (135-nucleotide oligomer) after oxidative bisulfite treatment. 5mC was also present in these strands. (D) Levels of 5hmC and 5fC (normalized to 5mC in primer sequence) in ES cell DNA measured before and after oxidation. Data are mean ± SD.
BS-seq, oxBS-seq, etc.
Figure from (Booth et al, 2012)
12/ 29
Bisulfite sequencing (BS-seq) protocol
I Bisulfite treatment of genomic DNA converts unmethylated cytosines to uracils, which are read as thymines during sequencing
I Methylated (and hydroxymethylated) cytosines are resistant to the conversion and are read as cytosines
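The two bullets above can be mimicked with a toy simulation. This is a minimal sketch assuming perfect conversion on a single strand; the function name `bisulfite_convert` and the example sequence are made up for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Read-out of one DNA strand after bisulfite treatment and PCR.

    Unmethylated C is read as T; C at a (hydroxy)methylated position
    (0-based index in `methylated_positions`) stays C.
    """
    protected = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


genomic = "ACGTCCGATCG"
# Hypothetical case: only the cytosine at index 1 is methylated
read = bisulfite_convert(genomic, methylated_positions={1})
print(read)  # ACGTTTGATTG
```

Comparing `read` with `genomic` position by position recovers the methylation state: a retained C means methylated, a C-to-T change means unmethylated.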
146 | VOL.9 NO.2 | FEBRUARY 2012 | NATURE METHODS
REVIEW
amplifying bisulfite-treated DNA by PCR yields products in which unmethylated cytosines appear as thymines. By comparing the modified DNA with the original sequence, the methylation state of the original DNA can therefore be inferred. Bisulfite treatment of 5-hydroxymethylcytosine (5hmC) yields a similar intermediate to 5mC, meaning that BS-seq can be used to detect whether a position is (hydroxy-) methylated but not to determine the exact type of modification21,25 (Fig. 1). This limitation does not apply to antibody-based techniques, which can be used to specifically enrich 5hmC26–28.
Capillary electrophoresis-based bisulfite sequencing was considered the gold standard for methylation analysis because of its clear readout and single-base resolution22, but it could only be applied to relatively small regions. New sequencing technologies mean that BS-seq is now a viable option for the sequencing of entire mammalian methylomes6–8,29–32 (Supplementary Table 1).
For researchers primarily interested in CpG island methylation, the cost of bisulfite sequencing can be reduced by enriching CpG-dense regions by digesting genomic DNA with a methylation-insensitive restriction enzyme containing a C-G as part of its recognition site and selecting short fragments6,30,33. Even though the selected fragments are used to interrogate only a few percent of the genome, these data are informative for the majority of CpG islands. This approach, termed reduced representation BS-seq (RRBS), has been extensively described and compared to other techniques23,33–35, and several genome-wide methylation maps based on RRBS have been reported6,30.
In this Review we provide an overview of the computational analysis of bisulfite sequencing data. We highlight points to consider when designing a BS-seq experiment and point out pitfalls that can occur during the initial analysis. We also discuss different alignment strategies and their implementation by current bioinformatic tools. In particular, we present the main differences between the analysis of base space (Illumina) and color space (SOLiD, Applied Biosystems) BS-seq data.
Challenges of BS-seq data mapping
As the methylation state of bisulfite-treated DNA must be inferred by comparison to an unmodified reference sequence, a correct alignment is of critical importance. This is challenging because the aligned sequences do not exactly match the reference, and the complexity of the libraries is reduced. Also, as cytosine methylation is not symmetrical, the two strands of DNA in the reference genome must be considered separately. A single site can have a different methylation state in different cells. Thus, when sequencing cell mixtures or tissue fractions, the percentage of methylation at each site needs to be determined36.
When performing an alignment one must discriminate between different types of bisulfite-treated DNA libraries (for a schematic drawing, see ref. 16). In the first, termed directional libraries, adapters are attached to the DNA fragments such that only the original top or bottom strands will be sequenced7,30. Alternatively, all four DNA strands that arise through bisulfite treatment and subsequent PCR amplification can be sequenced with the same frequency in nondirectional libraries32,37,38. BS-seq mapping may therefore require up to four different strand alignments to be analyzed for each sequence. Because of the complexity of BS-seq alignments, standard sequence alignment software cannot be used. However, several different tools for BS-seq analysis have been developed.
Base-space BS-seq data alignments
Methylation-‘aware’ alignment tools consider both cytosine and thymine as potential matches to a genomic cytosine. This strategy provides the highest possible mapping efficiency (high sensitivity) because it makes optimal use of the information present in the reads. However, a drawback of this technique is that methylated sequences will be aligned with greater efficiency because they carry more information than their unmethylated counterparts, leading this type of aligner to overestimate methylation levels.
Alternatively, in unbiased approaches usually any residual cytosines in the BS-seq read and all cytosines in the reference genome are converted into thymines before the alignment is performed7,30. This means that the read sequence to be aligned is unaffected by its methylation state. It also means that there will be an exact match between the converted read and converted genome sequence so that standard sequence alignment tools can be used to perform the mapping39,40. This approach, however, comes at the cost of slightly reduced mapping efficiencies (Fig. 2a).
BS-seq in color space
In contrast to the intuitive base-space sequence generated by Illumina sequencers, SOLiD sequencing (Applied Biosystems) encodes its reads in color space such that each color resembles the transition from one base to the next41. Single-nucleotide polymorphisms (SNPs) can be called with high confidence because they will result in two adjacent color changes, whereas technical errors are indicated by a single color change (Supplementary Fig. 1a,b). Owing to the way color-space encoding works, residual cytosines are correctly converted into thymines in the bisulfite reads in silico before the mapping only if the reads are completely error-free. A single measurement error in the read would lead to incorrect conversions throughout the rest of the read (Supplementary Fig. 1c). As a consequence, the in silico cytosine to thymine conversion, which guarantees unbiased alignments, should not be performed on color-space datasets.
Current tools to align color space BS-seq data to a reference genome either use methylation-aware alignments (SOCS-B42), which can be computationally intensive for complex genomes,
[Figure: top and bottom strands carrying mC marks undergo bisulfite conversion and PCR amplification, giving strands OT, CTOT, OB and CTOB.]
Figure 1 | Effect of bisulfite treatment of DNA. Bisulfite conversion of genomic DNA and subsequent PCR amplification gives rise to two PCR products and up to four potentially different DNA fragments for any given locus. (Hydroxy)methylated cytosine residues are resistant to bisulfite conversion and can be used as a readout of the DNA methylation state. mC, 5-methylcytosine; hmC, 5-hydroxymethylcytosine; OT, original top strand; CTOT, strand complementary to the original top strand; OB, original bottom strand; and CTOB, strand complementary to the original bottom strand.
Figure from (Krueger et al, 2012)
13/ 29
Reduced representation BS-seq (RRBS-seq)
I BS-seq provides an accurate map of methylation state at single nucleotide resolution
I Whole genome analysis is expensive because only about 1% of the human genome contains CpGs
  → Experimental techniques to enrich for the areas of the genome that have a high CpG content
I Reduced representation BS-seq (RRBS-seq) uses restriction enzymes prior to bisulfite sequencing
  I MspI digests genomic DNA in a methylation-insensitive manner
  I MspI targets 5’CCGG3’ sequences and cleaves the phosphodiester bonds upstream of the CpG dinucleotide
  → Each fragment will have a CpG at each end
I RRBS-seq will cover the majority of promoters and GC-rich regions
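The MspI step can be illustrated with an in silico digestion. This is a rough sketch on a made-up sequence: it cuts the top strand between the two cytosines of every CCGG site (C^CGG) and then applies a size selection. The function names and the size window used in the example are illustrative assumptions, not the parameters of a real RRBS protocol.

```python
def mspi_digest(seq):
    """Cut the top strand at every CCGG site, between the two Cs (C^CGG)."""
    seq = seq.upper()
    cuts = []
    start = seq.find("CCGG")
    while start != -1:
        cuts.append(start + 1)              # cut after the first C
        start = seq.find("CCGG", start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy = "AAACCGGTTTTTCCGGAACCGGA"             # made-up sequence with three CCGG sites
fragments = mspi_digest(toy)
selected = size_select(fragments, min_len=5, max_len=10)
print(fragments)  # ['AAAC', 'CGGTTTTTC', 'CGGAAC', 'CGGA']
print(selected)   # ['CGGTTTTTC', 'CGGAAC']
```

Every internal fragment starts with CGG on the top strand, which is why reads from such libraries begin at a CpG.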
14/ 29
Reduced representation BS-seq (RRBS-seq)
Figure from (Liang et al., 2014)
15/ 29
Contents
I DNA methylation
I Bisulfite sequencing (BS-seq) protocol
I Alignment and quantification of BS-seq data
I Statistical analysis of BS-seq data
16/ 29
Aligning BS-seq reads
I Bisulfite treatment introduces mutations into genomic DNA in a methylation-dependent manner
I Alignment of BS-seq reads is more challenging
  I Standard alignment methods cannot be used directly
I The Bismark tool uses the following approach to map BS-seq reads
  I Reads from a BS-seq experiment are converted into a C-to-T version and a G-to-A version
  I The same conversions are applied to the genome
  I Bowtie aligns the converted reads to the converted genomes, which have reduced complexity
  I A unique best alignment is determined from four parallel alignment processes (see next page)
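The conversion-based mapping idea can be sketched as follows. This is not Bismark itself: it handles only one of the four alignment processes (a C-to-T converted read against the C-to-T converted forward genome), uses exact string search as a stand-in for Bowtie, and all function names are illustrative.

```python
def c_to_t(seq):
    """In silico bisulfite conversion: replace every C with T."""
    return seq.upper().replace("C", "T")


def find_all(haystack, needle):
    """All start positions of needle in haystack (overlaps allowed)."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def align_and_call(read, genome):
    """Map a read in converted space, then call methylation in original space.

    Returns (position, {genome_index: is_methylated}) or None if the read
    does not map uniquely.
    """
    hits = find_all(c_to_t(genome), c_to_t(read))  # stand-in for Bowtie
    if len(hits) != 1:
        return None
    pos = hits[0]
    window = genome.upper()[pos:pos + len(read)]
    calls = {}
    for offset, (g, r) in enumerate(zip(window, read.upper())):
        if g == "C" and r in "CT":
            calls[pos + offset] = (r == "C")       # C kept means methylated
    return pos, calls


genome = "TTACGTCCGATCGAA"
read = "ACGTTTGATTG"                               # toy bisulfite read
result = align_and_call(read, genome)
print(result)  # (2, {3: True, 6: False, 7: False, 11: False})
```

Because both read and genome are fully converted before matching, the mapping position does not depend on the methylation state of the read; methylation is only inferred afterwards from the unconverted sequences.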
17/ 29
Bismark tool
Figure from (Krueger & Andrews, 2011)
18/ 29
Quantifying BS-seq data
I Bismark outputs, among others, one line per read containing useful information
  I Mapping position, alignment strand, the bisulfite read sequence, its equivalent genomic sequence and a methylation call string
I Bismark automatically extracts the methylation information at individual cytosine positions
  I For different sequence contexts (CpG, CHG, CHH; where H can be either A, T or C)
  I Strand-specific or strands merged
I That is, for each cytosine Bismark outputs
  I $n_i$: the number of reads covering the cytosine in sample $i$
  I $m_i$: the number of methylated readouts (i.e., “C”) for the cytosine in sample $i$
I One way to quantify methylation proportion is
$$ p_i = \frac{m_i}{n_i} = \frac{\text{the number of C reads overlapping the cytosine}}{\text{the number of C or T reads overlapping the cytosine}} $$
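As a small illustration of this ratio, the sketch below tallies the base calls observed at one cytosine and computes the proportion. The input format (a string of base calls, one per covering read) and the function names are assumptions made for the example; this is not Bismark's output format.

```python
from collections import Counter


def count_readouts(bases):
    """Return (m, n): number of C calls and number of C or T calls."""
    counts = Counter(b.upper() for b in bases)
    m = counts["C"]
    n = counts["C"] + counts["T"]    # other calls (A, G, N) are ignored
    return m, n


def methylation_proportion(m, n):
    """p = m / n, or None when no informative read covers the cytosine."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    if n == 0:
        return None
    return m / n


# Hypothetical base calls from ten reads covering one cytosine
m, n = count_readouts("CCTCTTCCAC")
p = methylation_proportion(m, n)
print(m, n, p)
```

Returning `None` for uncovered sites keeps them distinct from sites with an estimated proportion of zero.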
19/ 29
Contents
I DNA methylation
I Bisulfite sequencing (BS-seq) protocol
I Alignment and quantification of BS-seq data
I Statistical analysis of BS-seq data
20/ 29
Beta-binomial model
I In the end, one is typically interested in testing a hypothesis, e.g. whether there is a statistically significant difference in methylation levels between group A and group B
I Some early methods applied e.g. the t-test on the estimated methylation fractions $p_i$ (or their logit transformations)
I We will look at the RadMeth tool (Dolzhenko and Smith, 2014)
I RadMeth uses the beta-binomial regression model, where beta-binomial is a compound distribution obtained from the binomial by assuming that its probability of success parameter follows a beta distribution
21/ 29
Beta-binomial model
I $i = 1, \ldots, s$, where $s$ is the number of samples
I For each cytosine in the genome we have the following model
  I $n_i$: the number of reads covering the cytosine in sample $i$
  I $m_i$: the number of reads that contain the “C” readout (i.e. methylated) at the cytosine in sample $i$ ($0 \le m_i \le n_i$)
  I $p_i$: the unknown methylation level of the cytosine in sample $i$
  I If we knew the underlying methylation level $p_i$, then $M_i \sim \mathrm{Binom}(p_i, n_i)$
  I Instead of assuming a fixed (unknown) methylation level, assume $p_i$ has a compounding distribution $p_i \sim \mathrm{Beta}(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$
  I The probability of observing $M_i = m_i$ methylated reads at coverage $n_i$ then follows the so-called beta-binomial model
$$ P(M_i = m_i \mid n_i, \alpha, \beta) = \int_0^1 \mathrm{Binom}(m_i \mid p_i, n_i)\,\mathrm{Beta}(p_i \mid \alpha, \beta)\,dp_i = \binom{n_i}{m_i} \frac{B(m_i + \alpha,\; n_i - m_i + \beta)}{B(\alpha, \beta)}, $$
where $B$ is the beta function
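The closed form can be evaluated directly with the standard library. The sketch below computes the beta function on the log scale via `math.lgamma` for numerical stability and checks two basic properties: the probabilities sum to one, and the mean equals $n\alpha/(\alpha+\beta)$. Function names and parameter values are illustrative.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(m, n, alpha, beta):
    """P(M = m | n, alpha, beta) for the beta-binomial model."""
    if not 0 <= m <= n:
        return 0.0
    log_ratio = log_beta(m + alpha, n - m + beta) - log_beta(alpha, beta)
    return comb(n, m) * exp(log_ratio)


n = 20
pmf = [beta_binomial_pmf(m, n, alpha=8.0, beta=2.0) for m in range(n + 1)]
total = sum(pmf)                                   # should be close to 1
mean = sum(m * p for m, p in enumerate(pmf))       # should be close to 16
print(round(total, 6), round(mean, 6))
```

With $\alpha = \beta = 1$ the beta prior is uniform and the same function returns $1/(n+1)$ for every $m$, which is a convenient sanity check.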
22/ 29
Beta-binomial model
I An illustration of binomial / beta / beta-binomial densities
[Figure: six density plots. On the count scale (x-axis 0 to 20): binomial with p = 0.8; beta-binomial with a = 80, b = 20; beta-binomial with a = 8, b = 2. On the probability scale (x-axis 0 to 1): fixed p = 0.8; beta with a = 80, b = 20; beta with a = 8, b = 2.]
Binomial and beta-binomial densities
23/ 29
Beta-binomial model
I Mean and variance of the beta-binomial model are
$$ \mu = \frac{n_i \alpha}{\alpha + \beta} \quad\text{and}\quad \sigma^2 = \frac{n_i \alpha \beta (\alpha + \beta + n_i)}{(\alpha + \beta)^2 (\alpha + \beta + 1)} $$
I Reparameterization
  I $\pi = \frac{\alpha}{\alpha+\beta}$ is the average methylation level of a set of replicate samples
  I $\gamma = \frac{1}{\alpha+\beta+1}$ is the common dispersion parameter
allows us to write the same model as
$$ M_i \sim \mathrm{BetaBinomial}(n_i, \pi, \gamma) $$
where the mean and the variance are now defined as
  I $E(M_i) = n_i \pi$
  I $\mathrm{Var}(M_i) = n_i \pi (1-\pi)(1 + (n_i - 1)\gamma)$
I Recall that the variance of the binomial distribution is $n_i\pi(1-\pi)$, which is smaller than $\mathrm{Var}(M_i)$ for $n_i \ge 2$ (since $\gamma > 0$)
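The step from the $(\alpha, \beta)$ form of the variance to the $(\pi, \gamma)$ form is short enough to write out. A sketch using only the definitions of $\pi$ and $\gamma$ given on this slide:

```latex
\begin{align*}
\sigma^2
  &= n_i \cdot \frac{\alpha}{\alpha+\beta} \cdot \frac{\beta}{\alpha+\beta}
     \cdot \frac{\alpha+\beta+n_i}{\alpha+\beta+1} \\
  &= n_i \,\pi (1-\pi) \cdot \frac{(\alpha+\beta+1) + (n_i - 1)}{\alpha+\beta+1} \\
  &= n_i \,\pi (1-\pi) \left( 1 + \frac{n_i - 1}{\alpha+\beta+1} \right) \\
  &= n_i \,\pi (1-\pi) \bigl( 1 + (n_i - 1)\gamma \bigr).
\end{align*}
```

The factor $1 + (n_i - 1)\gamma$ is the overdispersion relative to the binomial; it equals 1 when $n_i = 1$ and grows with coverage.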
24/ 29
Generalized beta-binomial model
I In most real-world applications, methylation levels can be confounded by one or more factors (e.g. age and smoking)
I The generalized linear model (GLM) generalizes the ordinary linear regression to allow for response variables that have likelihood models other than a normal distribution
25/ 29
Generalized beta-binomial model
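I For each sample $i$ (and for each cytosine), the mean methylation level $\pi_i$ depends on covariates $x_i = (x_{i1}, x_{i2}, \ldots, x_{it})^T$
$$ g(\pi_i) = \sum_{j=1}^{t} x_{ij}\eta_j = x_i^T \eta $$
where $\eta$ is a $t \times 1$ parameter vector and
$$ g(\pi) = \mathrm{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right) $$
$$ \pi_i = \mathrm{logit}^{-1}(x_i^T\eta) = \mathrm{logistic}(x_i^T\eta) = \frac{\exp(x_i^T\eta)}{\exp(x_i^T\eta)+1} $$
I $\mathrm{logit}(\cdot) : \,]0,1[\, \to \mathbb{R}$, thus $\mathrm{logit}^{-1}(\cdot) : \mathbb{R} \to \,]0,1[$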
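The link function and its inverse can be tried out numerically. In the sketch below the covariate vectors include a leading 1 for an intercept, and the coefficient values in `eta` are hypothetical; both are assumptions made for the example.

```python
from math import exp, log


def logit(p):
    """logit(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return log(p / (1.0 - p))


def logistic(z):
    """Inverse of logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def mean_methylation(x, eta):
    """pi_i = logistic(x_i^T eta) for one sample."""
    return logistic(sum(xj * ej for xj, ej in zip(x, eta)))


eta = [-1.0, 2.0]                          # hypothetical: intercept, group effect
pi_a = mean_methylation([1.0, 0.0], eta)   # group A: logistic(-1)
pi_b = mean_methylation([1.0, 1.0], eta)   # group B: logistic(+1)
print(round(pi_a, 4), round(pi_b, 4))
```

Whatever the value of the linear predictor, the resulting mean methylation level stays strictly between 0 and 1, which is the reason for using the logit link.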
26/ 29
Model fitting and inference
I The beta-binomial regression is fit separately for each CpG site
I The parameters $\eta$ and $\gamma$ are estimated using maximum likelihood
  I Iteratively reweighted least squares algorithm using a Newton-Raphson method
I Test the differential methylation w.r.t. a test factor $\eta_j$:
  I Learn the full model and the reduced model without the test factor
  I Compare the models using the log-likelihood ratio test
$$ D = -2 \ln\left(\frac{\text{likelihood of the reduced model}}{\text{likelihood of the full model}}\right) $$
  I p-value from the chi-square test with $d_\text{full} - d_\text{reduced}$ degrees of freedom, where $d_\text{full}$ denotes the number of free parameters in the full model and $d_\text{reduced}$ that of the reduced model
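The test itself is easy to compute once both models have been fitted. The sketch below assumes the two maximised log-likelihoods are already available (the numbers are hypothetical) and covers only the common case of a single test factor, i.e. one degree of freedom, where the chi-square tail probability reduces to `erfc(sqrt(D / 2))`.

```python
from math import erfc, sqrt


def lrt_statistic(loglik_reduced, loglik_full):
    """D = -2 * ln(L_reduced / L_full), from log-likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_sf_1df(d):
    """P(X >= d) for a chi-square variable with 1 degree of freedom.

    X is the square of a standard normal Z, so P(X >= d) = P(|Z| >= sqrt(d)).
    """
    if d <= 0:
        return 1.0
    return erfc(sqrt(d / 2.0))


# Hypothetical maximised log-likelihoods for one CpG site
D = lrt_statistic(loglik_reduced=-52.30, loglik_full=-49.10)
p_value = chi2_sf_1df(D)
print(D, p_value)
```

For more than one degree of freedom a general chi-square survival function is needed, for example from a statistics library.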
27/ 29
RadMeth application
I Neuron and non-neuron RRBS-seq samples from mouse frontal cortex: $x_{i1} \in \{0, 1\}$
  I 6 samples: $s = 6$
I Two additional factors: age ($x_{i2} \in \mathbb{R}_+$), sex ($x_{i3} \in \{0, 1\}$)
I 72 000 differentially methylated (DM) regions between neuron and non-neuron samples that contain at least 10 CpGs
I DM regions with minimum methylation difference above 0.55
  I 1708 lowly methylated (active) regions in neurons
  I These regions are associated with (located close to) 1089 genes
  I GO enrichment analysis by DAVID found a strong association of these genes with various aspects of neuronal development and function
28/ 29
RadMeth application
Dolzhenko and Smith BMC Bioinformatics 2014, 15:215 Page 6 of 8 http://www.biomedcentral.com/1471-2105/15/215
[Figure: methylation tracks at the Eno2/Lrrc23 locus (5 kb scale bar, DMR and HMR annotations) for neuron and non-neuron samples (female 12mo, female 6wk, male 7wk), with histograms of log odds ratio (neuron low, non-neuron low) and of minimum methylation difference.]
Figure 2 DM regions between neuron and non-neuron samples. (Top left) Methylation profile of the neuron specific enolase (Eno2), a marker of neuron cells, across frontal cortex samples. (Right) Histogram of log-odds-ratios of DM regions containing at least 10 CpGs. (Bottom left) Histogram of minimum methylation differences of DM regions containing at least 10 CpGs.
file 1). Although predominantly glial, non-neuron samples consisted of multiple cell types. Hence the majority of DM regions, especially the ones corresponding to modest methylation changes, are likely to indicate difference between individual cell types and neurons. To obtain DM regions with consistent methylation changes between neurons and non-neurons in the majority of molecules comprising the samples, we selected DM regions with minimum methylation difference above 0.55. The 1,708 of these regions were lowly methylated in neurons and were associated with 1,089 genes. The GO term enrichment analysis, performed using DAVID [30], revealed a strong association of these genes with various aspects of neuronal development and function (see Additional file 2).
Large-scale dataset
The second dataset [31] consisted of 152 MethylC-seq libraries. The methylome samples obtained from these libraries with MethPipe [14] had mean coverage 11.2 (s.d. 2.7); 54 of these samples came from inflorescence (flower cluster) and the remaining 98 from the leaf of Arabidopsis thaliana. RADMeth identified 13,576 DM regions between the two groups of samples (see Additional file 1). Out of these, 5,049 DM regions containing at least 10 CpG sites were retained for downstream analysis.
It is well known that methylation in Arabidopsis plays an important role in silencing of transposable elements (e.g. [32]), which are usually heavily methylated. Interestingly, most of the DM regions we found overlapped transposons (1.781 observed over expected ratio; see also Figure 3). The methylation differences between inflorescence and leaf samples were modest: above 0.1 for 1,271 DM regions and above 0.2 for just 129 regions, indicating relative loss of methylation within transposons in a relatively small fraction of sequenced molecules. Promoter and gene bound DM regions were underrepresented, with 0.19 and 0.28 observed over expected ratios respectively.
Figure from (Dolzhenko and Smith, 2014)
29/ 29
References
I Michael J. Booth et al., Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution, Science, 336(6063):934-937, 2012
I Jeremy J Day & J David Sweatt, DNA methylation and memory formation, Nature Neuroscience, 13:1319-1323, 2010
I Egor Dolzhenko and Andrew D Smith, Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments, BMC Bioinformatics, 15:215, 2014
I Eckhardt F et al., DNA methylation profiling of human chromosomes 6, 20 and 22, Nature Genetics, 38(12):1378-85, 2006
I Felix Krueger and Simon R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications, Bioinformatics, 27(11):1571-1572, 2011
I Felix Krueger et al., DNA methylome analysis using short bisulfite sequencing data, Nature Methods, 9:145-151, 2012
I Jialong Liang et al., Single-Cell Sequencing Technologies: Current and Future, Journal of Genetics and Genomics, 41(10):513-528, 2014
I Alexander Meissner et al., Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis, Nucleic Acids Res., 33(18):5868-77, 2005
I Christoph Plass et al., Mutations in regulators of the epigenome and their connections to global chromatin patterns in cancer, Nature Reviews Genetics, 14:765-780, 2013
I Dirk Schubeler, Epigenomics: Methylation matters, Nature, 462:296-297, 2009
I Cornelia G Spruijt & Michiel Vermeulen, DNA methylation: old dog, new tricks?, Nature Structural & Molecular Biology, 21:949-954, 2014