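[Figure: conversion chemistry of EM-seq and WGBS. Unmodified cytosines in the example template CCGTCGGACCGC are converted to uracil (UUGTCGGAUUGC) and sequenced as thymine (TTGTCGGATTGC), while protected cytosines are read as C. EM-seq uses Tet2 and APOBEC. PCR cycles: 6 (EM-seq), 8 (WGBS).]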
EM-seq enables the first in-depth analysis of the Cannabis sativa methylome, and differential methylation analysis identified genes involved in seed production and THCA production.
EM-seq libraries compared to WGBS libraries had:
[Figure: Percent Duplication and Yield (nM) for the Female Leaf EM-seq-1, EM-seq-2, WGBS-1, and WGBS-2 libraries.]
Brittany S. Sexton¹, Louise Williams¹, V.K. Chaithanya Ponnaluri¹, Yvonne Helbert², Lei Zhang², Christine J. Sumner¹, Fiona J. Stewart¹, Bradley W. Langhorst¹, Timothy Harkins², Kevin J. McKernan², Eileen T. Dimalanta¹, Theodore B. Davis¹
¹ New England Biolabs, Inc., Ipswich, MA 01938
² Medicinal Genomics Corp., Woburn, MA 01801
Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq
Introduction
www.NEBNext.com
Methods
Results
Conclusions
Higher quality sequencing metrics with EM-seq compared to WGBS
Cannabis sativa female leaf EM-seq libraries are superior to WGBS
EM-seq libraries outperform WGBS for Cannabis sativa input DNA, as shown in Figures A-E: (A) Fewer PCR cycles are required for EM-seq (6 vs. 8 cycles for WGBS). (B) EM-seq libraries have larger library insert sizes than WGBS. Additionally, the EM-seq protocol has been optimized for both standard and large insert libraries. (C) EM-seq libraries have lower duplication percentages. (D) GC distribution is more even for EM-seq than for WGBS. (E) Base coverage is more even for EM-seq than for WGBS. 50 million 2 x 76 base reads were used for analysis in Figures A-D, and 130 million 2 x 76 base reads were used for Figure E.
Cannabis sativa is an industrial crop producing more economic value than the top five crops in California combined. In the medical field, cannabinoids are used to treat and alleviate symptoms for a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sex determination in cannabis is an important economic factor. Pollination dramatically reduces cannabinoid expression in female plants. To express high levels of cannabinoids, female plants are grown in isolation from male pollen. However, female plants are capable of stress-induced hermaphroditism, which can lead to acres of pollination and low cannabinoid yields.
5-methylcytosine (5mC) is an important epigenetic mark that regulates many cellular processes, including sex determination. To better understand the hermaphroditic process in cannabis, more robust 5mC surveying tools are required. The current gold standard for determining 5mC, whole genome bisulfite sequencing (WGBS), introduces DNA damage and GC-biased sequences, resulting in skewed methylation profiles. To further complicate matters, the cannabis genome is 66% AT, rising to 83% AT after bisulfite conversion. Very few technologies currently address the complexity of plant methylation signals, and to date no methylome exists for cannabis.
NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC. EM-seq successfully identified the 5mCs within the cannabis genome. Libraries were generated using genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into the methylation-based regulation of the hermaphroditic process, EM-seq libraries were also prepared using genomic DNA from female flowers, female seeded flowers, and male flowers. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage, increased sequencing length, and decreased GC bias. EM-seq enables the study of the previously unknown cannabis methylome and opens up the possibility of discovering critical and exciting regulatory functions for the hermaphroditic process.
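The jump from 66% to 83% AT follows from base-composition arithmetic. The sketch below is illustrative only: it assumes that C and G each make up half of the GC fraction and takes the limiting case in which every cytosine on the read strand is unmethylated and therefore reads as T. The function name is hypothetical.

```python
def at_fraction_after_conversion(at_fraction: float) -> float:
    """AT fraction of a read strand after full C-to-T conversion.

    Assumptions (not from the poster): C and G are equally frequent,
    and every cytosine is unmethylated, so all C are read as T.
    """
    gc_fraction = 1.0 - at_fraction
    c_fraction = gc_fraction / 2.0  # only C is converted; G is untouched
    return at_fraction + c_fraction


converted = at_fraction_after_conversion(0.66)
print(f"{converted:.0%}")  # 83%
```

Because methylated cytosines are protected from conversion, 83% is the upper bound under these assumptions.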
Acknowledgements
Thank you to Laurie Mazzola and Danielle Fuchs for their technical support.
Sample preparation
[Workflow: Sheared Genomic DNA → Ultra II End Prep/Ligation → Tet2 → APOBEC → PCR Amplification → Illumina Sequencing]
• Jamaican Lion genomic DNA was extracted from female clones (leaf, seeded and unseeded flowers) and male sibling (flowers) plants.
• 50 ng of genomic DNA, spiked with controls (unmethylated lambda DNA and CpG-methylated pUC19), was sheared using the Covaris® S2 instrument.
• Sheared DNA was end-repaired, then ligated to modified sequencing adaptors.
• Tet2 oxidizes 5mC to 5caC.
• APOBEC deaminates cytosines, but not 5caC, to uracils.
• Libraries were amplified using NEBNext® Q5U™ Master Mix and NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs, E6440S).
• Libraries were sequenced on an Illumina NextSeq 500 using 2 x 76 base paired-end reads. 5caC is sequenced as C and deaminated C as T.
• Bisulfite conversion was performed using the Zymo Research EZ DNA Methylation-Gold™ kit.
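The conversion logic described in the sample preparation steps can be sketched in a few lines. This is a toy model, not the library chemistry: it treats protection as all-or-nothing, and the protected positions (4 and 11) are inferred from the poster's example sequences rather than stated by the authors. Function names are illustrative.

```python
def em_seq_read(template: str, protected: set) -> str:
    """Return the base calls expected after enzymatic conversion and PCR.

    `protected` holds 0-based positions of methylated cytosines, which
    Tet2 oxidation shields from APOBEC deamination (toy assumption:
    protection is complete and deamination is complete).
    """
    read = []
    for i, base in enumerate(template.upper()):
        if base == "C" and i not in protected:
            read.append("T")   # C -> U by APOBEC, amplified and read as T
        else:
            read.append(base)  # protected C (and A, G, T) read unchanged
    return "".join(read)


def call_methylation(reference: str, read: str) -> dict:
    """For each reference C, report True if the read still shows C."""
    return {
        i: read[i] == "C"
        for i, base in enumerate(reference.upper())
        if base == "C"
    }


template = "CCGTCGGACCGC"
read = em_seq_read(template, protected={4, 11})
print(read)  # TTGTCGGATTGC
calls = call_methylation(template, read)
print([pos for pos, is_methylated in calls.items() if is_methylated])  # [4, 11]
```

Comparing the read back to the reference is what makes the method quantitative: a C that survives is called methylated, and a C read as T is called unmethylated.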
Data analysis
[Flowchart: Align (BWAmeth); Deduplicate (Samblaster); Picard; Methyl Extraction (MethylDackel); MethylKit]
• Reads were aligned to the Jamaican Lion reference genome (August 2018 assembly) and analyzed using the tools in the flowchart above.
• Four contigs from this assembly showed anomalous methylation patterns and were excluded.
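As a rough illustration of the filtering described above, the sketch below drops calls on excluded contigs and applies a minimum coverage threshold before computing per-site methylation. The record layout, the contig names, and the counts are hypothetical; the poster does not name the four excluded contigs or specify a file format.

```python
from typing import Iterable, List, NamedTuple, Set


class CpGCall(NamedTuple):
    contig: str
    position: int
    methylated: int    # reads supporting C at this site
    unmethylated: int  # reads supporting T at this site


def filter_calls(
    calls: Iterable[CpGCall],
    excluded_contigs: Set[str],
    min_coverage: int = 1,
) -> List[CpGCall]:
    """Keep calls outside the excluded contigs with enough read coverage."""
    kept = []
    for call in calls:
        if call.contig in excluded_contigs:
            continue
        if call.methylated + call.unmethylated < min_coverage:
            continue
        kept.append(call)
    return kept


def percent_methylation(call: CpGCall) -> float:
    """Percent of reads at this site that support a methylated C."""
    return 100.0 * call.methylated / (call.methylated + call.unmethylated)


# Hypothetical example data, not taken from the poster.
example_calls = [
    CpGCall("contig_1", 100, 8, 2),
    CpGCall("contig_anomalous", 50, 1, 9),
    CpGCall("contig_2", 7, 0, 0),
]
kept = filter_calls(example_calls, excluded_contigs={"contig_anomalous"})
print([(c.contig, percent_methylation(c)) for c in kept])  # [('contig_1', 80.0)]
```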
Differential CpG methylation identified between Cannabis sativa flower tissues
EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: female flower and female seeded flower (clones) and the male flower (sibling). (A) % 5mC levels in the CpG, CHG, and CHH contexts for all flower EM-seq libraries. Methylation levels in the female flower and female seeded flower were higher than in the male flower, indicating that methylation patterns are potentially determined by sex. The unmethylated lambda control was <0.5% methylated and pUC19 was >97.5% CpG methylated (data not shown). (B) Volcano plots of the significant (q<0.01) differential methylation calls in the CpG context between (1) female flower and female seeded flower and (2) female flower and male flower. The female flower versus female seeded flower comparison identified 30 hypomethylated CpGs and no hypermethylated CpGs. The female flower versus male flower comparison identified >50,000 hypomethylated and >11,000 hypermethylated CpGs. (C) Comparison of the hypomethylated CpGs identified in the two comparisons (female flower versus female seeded flower, and female flower versus male flower).
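The hypo/hyper counts in panel (B) come from thresholding per-site results on q-value. The sketch below shows that bookkeeping on toy values. It assumes a negative methylation difference means hypomethylation and a positive one means hypermethylation; the poster does not state which sample is the baseline, and the numbers below are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_sites(
    sites: Iterable[Tuple[float, float]], q_cutoff: float = 0.01
) -> Counter:
    """Count significant hypo- and hypermethylated sites.

    Each site is (methylation_difference_percent, q_value). The sign
    convention (negative = hypo) is an assumption for this sketch.
    """
    counts = Counter()
    for meth_diff, q_value in sites:
        if q_value >= q_cutoff:
            continue  # not significant at the chosen cutoff
        if meth_diff < 0:
            counts["hypo"] += 1
        elif meth_diff > 0:
            counts["hyper"] += 1
    return counts


# Toy values, not data from the poster.
toy_sites = [(-62.5, 0.0004), (-35.0, 0.02), (48.0, 0.003), (-80.0, 0.009), (12.0, 0.5)]
counts = classify_sites(toy_sites)
print(counts["hypo"], counts["hyper"])  # 2 1
```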
[Figure panels: (E) Coverage Depth (1-30) vs. % of Bases at this Depth; (B) Insert Size (bp); PCR Yield (nM); % Duplication; Total % 5mC for LC-MS, EM-seq, and WGBS female leaf; % 5mC in the CpG, CHG, and CHH contexts for EM-seq and WGBS female leaf; Methylated/Unmethylated tracks over Genomic Location (504 bp) for Female Flower-1/2, Female Seeded Flower-1/2, and Male Flower-1/2.]
• Higher library yields with fewer PCR cycles
• Lower percent duplication
• More even base coverage
• Larger library insert sizes
• Less GC bias
• Similar percentage methylation to LC-MS
References
1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant Read Extraction.”
2. Pedersen et al. (2014). “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.”
3. Ryan, D., and B. Pedersen. “MethylDackel.”
4. Broad Institute (2016). Picard tools.
Differential methylation analysis comparing the female flower and female seeded flower (clones) and the male flower (sibling) was performed. Significant differential methylation calls (q<0.01) in the CpG context (1x coverage minimum) between the female flower and the female seeded flower and/or male flower were identified. (A) Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked to the positive regulation of seed production. The female flower is CpG methylated in this region, suggesting that expression of these genes is turned off, while the female seeded flower is unmethylated at this CpG. (B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome, which is linked to the positive regulation of THCA production. Interestingly, the female flower is CpG unmethylated in this region, suggesting that expression of the gene is turned on, while the male flower is methylated at this CpG, suggesting that expression of the gene is turned off. Male flowers produce log-scale lower amounts of THCA.
(A) Comparison of total % 5mC determined using LC-MS, EM-seq, and WGBS for female leaf. The % 5mC for LC-MS is determined as a percentage of total cytosines ((5mC/(total C)) x 100). 5mC percentages for EM-seq and WGBS were determined by combining 5mC in the three methylated cytosine contexts (CpG, CHH, CHG). EM-seq cytosine methylation values are closer to the LC-MS values than WGBS values are. (B) % 5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. The unmethylated lambda control was <0.5% methylated for EM-seq and <2% for WGBS, and pUC19 was >97.5% CpG methylated for both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates and methods (CHG and CHH context data not shown). 50 million 2 x 76 base reads were used for methylation analysis.
[Figure: volcano plots of q-value vs. Methylation Difference (Hypo, Hyper) for Female & Female Seeded Flower and Female & Male Flower; panels A, B, C.]
Methylation profiles identified genes involved in seed & cannabinoid production
[Figure panels: insert size distribution (Insert Size, 50-800 bp, vs. % of Reads) for EM-seq (standard), EM-seq (large insert), and WGBS; Normalized Coverage vs. Percent GC (0-100) for EM-seq and WGBS; methylation tracks over Genomic Location (992 bp) for Female Flower-1/2, Female Seeded Flower-1/2, and Male Flower-1/2; volcano plots of q-value (0.0000-0.0100) vs. Methylation Difference (-100 to 100), hypo and hyper, for Female & Female Seeded Flower and Female & Male Flower; % 5mC per C context for EM-seq-1, EM-seq-2, WGBS-1, and WGBS-2 and for Female Flower, Female Seeded Flower, and Male Flower; total % 5mC for LC-MS, EM-seq, and WGBS.]