uncovering the cannabis sativa methylome through enzymatic ... · 1 1.5 2 2.5 3 3.5 female leaf...

1
EM-seq WGBS UUGTCGGAUUGC TTGTCGGATTGC Converted: Sequenced: CCGTCGGACCGC m hm CCGTCGGACCGC ca UUGTCGGAUUGC Tet2 APOBEC TTGTCGGATTGC ca CCGTCGGACCGC m hm WGBS 8 PCR cycles 6 PCR cycles EM-seq enables the first in depth analysis of the Cannabis sativa methylome and differential methylation identified genes involved in seed production and THCA production. EM-seq libraries compared to WGBS libraries had: Brittany S. Sexton 1 , Louise Williams 1 , V.K. Chaithanya Ponnaluri 1 , Yvonne Helbert 2 , Lei Zhang 2 , Christine J. Sumner 1 , Fiona J. Stewart 1 , Bradley W. Langhorst 1 , Timothy Harkins 2 , Kevin J. McKernan 2 , Eileen T. Dimalanta 1 , Theodore B. Davis 1 1 New England Biolabs, Inc., Ipswich, MA 01938 2 Medicinal Genomics Corp., Woburn, MA 01801 Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq Introduction Introduction www.NEBNext.com Methods Results Conclusions Higher quality sequencing metrics with EM-seq compared to WGBS Results Cannabis sativa female leaf EM-seq libraries are superior to WGBS EM-seq libraries outperform WGBS for Cannabis sativa input DNA as shown in figures A-E: (A) Less PCR cycles are required (6 vs 8 cycles respectively) for EM-seq. (B) EM-seq libraries have larger library insert sizes than WGBS. Additionally, the EM-seq protocol has been optimized for both standard and large insert libraries. (C) EM-seq libraries have lower duplication percentages. (D) GC distribution is more even for EM-seq than WGBS. (E) More even base coverage for EM-seq than WGBS. 50 million, 2 x 76 base reads were used for analysis (Figures A- D) and 130 million 2 x 76 base reads were used for Figure E. C D Cannabis sativa is an industrial crop producing more economic value than the top five crops in California combined. In the medical field, cannabinoids are used to treat and alleviate symptoms for a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sex determination in cannabis is an important economic factor. Pollination dramatically reduces cannabinoid expression in female plants. To express high levels of cannabinoids, female plants are grown in isolation of male pollen. However, female plants are capable of stress induced hermaphroditism that can lead to acres of pollination and low cannabinoid yields. 5-methylcytosine (5mC) is an important epigenetic mechanism that regulates many cellular processes, including sex determination. To better understand the hermaphroditic process in cannabi s, more robust 5mC surveying tools are required. The current gold standard for determining 5mC, whole genome bisulfite sequencing (WGBS), introduces DNA damage and GC-biased sequences resulting in skewed methylation profiles. To further complicate matters, the cannabis genome is 66% AT-rich and 83% AT-rich after bisulfite conversion. Very few technologies currently address the complexity of plant methylation signals, and to date no methylome exists for cannabis. NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC. EM-seq successfully identified the 5mCs within the cannabis genome. Libraries were generated using genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into the methylation-based regulation of the hermaphroditic process, EM-seq libraries were also prepared using genomic DNA from female flowers, female seeded flowers, and male flowers. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage, increased sequencing length, and decreased GC-bias. EM-seq enables the study of the previously unknown cannabis methylome and opens up the possibility to discover critical and exciting regulatory functions for the hermaphroditic process. Acknowledgements Thank you to Laurie Mazzola and Danielle Fuchs for their technical support. PCR Amplification APOBEC Ultra II End Prep/ Ligation Sheared Genomic DNA Tet2 Illumina Sequencing Jamaican Lion genomic DNA was extracted from female clones (leaf, seeded and unseeded flowers) and male sibling (flowers) plants. 50 ng of genomic DNA, spiked with control (unmethylated lambda DNA and CpG methylated pUC19) were sheared using the Covaris ® S2 instrument Sheared DNA was end-repaired then ligated to modified sequencing adaptors Tet2 oxidizes 5mC to 5caC APOBEC deaminates cytosines but not 5caC to uracils Libraries were amplified using NEBNext ® Q5U TM Master Mix and NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs, E6440S) Libraries were sequenced using an Illumina Nextseq 500, 2x76 base paired reads. 5caC are sequenced as C and deaminated Cs as T. Bisulfite conversion was performed using Zymo Research EZ DNA Methylation-Gold TM kit Sample preparation Data analysis Methyl Kit MethylDackel Methyl Extraction Align (BWAmeth) Samblaster Deduplicate Picard Reads were aligned to Jamaican Lion reference genome (August 2018 assembly) and analyzed using the tools in above flowchart. Four contigs from this assembly showed anomalous methylation patterns and were excluded. A Differential CpG methylation identified between Cannabis sativa flower tissues A A EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: female flower and female seeded flower (clones) and the male flower (sibling). (A) % 5mC levels in the CpG, CHG, and CHH contexts for all flower EM-seq libraries. The female methylation levels for flower and seeded flower were higher than the male flower indicating methylation patterns are potentially determined by sex. Unmethylated Lambda control was <0.5% methylated and pUC19 was >97.5% CpG methylated (data not shown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpG context between (1) female flower and female seeded flower (2) female flower and male flower. The differential methylation calls for female flower and female seeded flower identified 30 hypomethylated CpGs but no hypermethylated CpGs. The differential methylation calls for female and male flower identified >50,000 hypomethylated CpGs and >11,000 hypermethylated. (C) Comparison of the hypomethylated CpGs with differential methylation between the female flower samples & female and male flowers. E Coverage Depth % of Bases at this Depth Insert Size (bp) B PCR Yield (nM) % Duplication 5 0 5 CpG CHG CHH CpG CHG CHH CpG CHG CHH B CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH Total % 5mC EM-seq Female Leaf WGBS Female Leaf Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 Methylated Unmethylated Genomic Location (504 bp) Methylated Unmethylated Higher Library Yields with less PCR cycles Lower Percent Duplication More Even Base Coverage Larger Library insert Sizes Less GC Bias Similar Percentage Methylation as LC-MS References: 1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant Read Extraction.” 2. Pedersen et al. (2014) “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.” 3. Ryan D, Pederson, B “MethylDackel” 4. Broad Institute. (2016). Picard tools. Differential methylation analysis comparing female flower and seeded flower (clones) and the male flower (sibling) were performed. Significant differential methylation calls (q<0.01) for CpG context (1x coverage minimum) between female flower with female seeded flower and/or male flower were identified. (A) Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked to the positive regulation of seed production. The female flower is CpG methylated in this region, suggesting the expression of the genes are turned off, while the female seeded flower is unmethylated at this CpG. (B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and this gene is linked to the positive regulation of THCA production. Interestingly, the female flower is CpG unmethylated in this region, suggesting the expression of the gene is turned on, while the male flower is methylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce log scale less THCA. (A) Comparison of total % 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The % 5mC for LC-MS is determined as a percentage of total cytosines ((5mC/(total C))x100). 5mC percentages for EM-seq and WGBS were determined by combining 5mC in the 3 methylated cytosine contexts (CpG. CHH, CHG). EM-seq cytosine methylation numbers are closer to LCMS methylation values than WGBS. (B) % 5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBS for female leaf. Unmethylated Lambda control was <0.5% methylated for EM-seq and <2% for WGBS and was >97.5% CpG methylated for both EM-seq and WGBS (data not shown). (C) CpG context correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq and WGBS libraries were highly correlated between replicates and methods (CHG and CHH context data not shown). 50 million, 2 x76 base reads were used for methylation analysis. q-value Methylation Difference Hypo Hyper Female & Female Seeded Flower Female & Male Flower A B C B C Methylation profiles identified genes involved in seed & cannabinoid production 20% 16% 12% 8% 4% 0% 50 100 150 200 250 300 350 400 450 500 550 600 650 700 % of Reads EM-seq (standard) EM-seq (large insert) WGBS 50 100 150 200 250 300 350 400 450 500 550 600 650 700 Normalized Coverage Percent GC 0 10 20 30 40 50 60 70 80 90 100 EM-seq WGBS 5 10 15 20 25 30 Genomic Location (992 bp) Female Flower-1 Female Flower-2 Female Seeded Flower-1 Female Seeded Flower-2 Male Flower-1 Male Flower-2 100 50 0 50 100 0 5 0 5 0 100 50 0 50 100 0.0100 0.0075 0.0050 0.0025 0.0000 -100 0 100 Methylation Difference -100 0 100 0.0100 0.0075 0.0050 0.0025 0.0000 100 75 50 25 0 % 5mC per C context 40 30 20 10 0 EM-seq LC-MS EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 EM-seq-1 EM-seq-2 WGBS-1 WGBS-2 100 75 50 25 0 % 5mC per C context Female Flower Female Seeded Flower Male Flower EM-seq WGBS 10 0 4 2 6 8 1 0 1.0 0.2 0.4 0.6 0.8 1.2 1.4 1.6 1.8 2.0 2.2 q-value 25 0 5 10 15 20 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Female & Female Seeded Flower Female & Male Flower

Upload: others

Post on 21-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncovering the Cannabis sativa methylome through Enzymatic ... · 1 1.5 2 2.5 3 3.5 Female Leaf EM-seq-1Female Leaf EM-seq-2Female Leaf WGBS-1 Female Leaf WGBS-2 n 0 5 10 15 20 25

EM-seq WGBS

UUGTCGGAUUGC

TTGTCGGATTGC

Converted:

Sequenced:

CCGTCGGACCGC

mhm

CCGTCGGACCGC

ca

UUGTCGGAUUGC

Tet2

APOBEC

TTGTCGGATTGC

ca

CCGTCGGACCGC

mhm

WGBS

8 PCR cycles6 PCR cycles

EM-seq enables the first in depth analysis of the Cannabis sativa methylome and differential methylation identified genes involved in seed production and THCA production.

EM-seq libraries compared to WGBS libraries had:

0

0.5

1

1.5

2

2.5

3

3.5

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Perc

ent D

uplic

atio

n

0

5

10

15

20

25

Female Leaf EM-seq-1 Female Leaf EM-seq-2 Female Leaf WGBS-1 Female Leaf WGBS-2

Yiel

d (n

M)

Brittany S. Sexton1, Louise Williams1, V.K. Chaithanya Ponnaluri1, Yvonne Helbert2, Lei Zhang2, Christine J. Sumner1, Fiona J. Stewart1, Bradley W. Langhorst1, Timothy Harkins2, Kevin J. McKernan2, Eileen T. Dimalanta1, Theodore B. Davis1

1New England Biolabs, Inc., Ipswich, MA 01938 2Medicinal Genomics Corp., Woburn, MA 01801

Uncovering the Cannabis sativa methylome through Enzymatic Methyl-seq

Introduction

Introduction

www.NEBNext.com

Methods

Results

Conclusions

Higher quality sequencing metrics with EM-seq compared to WGBS

ResultsCannabis sativa female leaf EM-seq libraries are superior to WGBS

EM-seq libraries outperform WGBS for Cannabis sativa inputDNA as shown in figures A-E: (A) Less PCR cycles are required(6 vs 8 cycles respectively) for EM-seq. (B) EM-seq librarieshave larger library insert sizes than WGBS. Additionally, theEM-seq protocol has been optimized for both standard andlarge insert libraries. (C) EM-seq libraries have lower duplicationpercentages. (D) GC distribution is more even for EM-seq thanWGBS. (E) More even base coverage for EM-seq than WGBS.50 million, 2 x 76 base reads were used for analysis (Figures A-D) and 130 million 2 x 76 base reads were used for Figure E.

C D

Cannabis sativa is an industrial crop producing more economic value than the top five crops in California combined. In the medical field, cannabinoids areused to treat and alleviate symptoms for a wide range of diseases, including nausea in cancer patients undergoing chemotherapy. Sex determination incannabis is an important economic factor. Pollination dramatically reduces cannabinoid expression in female plants. To express high levels of cannabinoids,female plants are grown in isolation of male pollen. However, female plants are capable of stress induced hermaphroditism that can lead to acres ofpollination and low cannabinoid yields.

5-methylcytosine (5mC) is an important epigenetic mechanism that regulates many cellular processes, including sex determination. To better understandthe hermaphroditic process in cannabis, more robust 5mC surveying tools are required. The current gold standard for determining 5mC, whole genomebisulfite sequencing (WGBS), introduces DNA damage and GC-biased sequences resulting in skewed methylation profiles. To further complicate matters,the cannabis genome is 66% AT-rich and 83% AT-rich after bisulfite conversion. Very few technologies currently address the complexity of plant methylationsignals, and to date no methylome exists for cannabis.

NEBNext Enzymatic Methyl-seq (EM-seq™) is a novel enzymatic method used to detect 5mC. EM-seq successfully identi fied the 5mCs within the cannabisgenome. Libraries were generated using genomic DNA extracted from female leaf for EM-seq and WGBS. To give insight into the methylation-basedregulation of the hermaphroditic process, EM-seq libraries were also prepared using genomic DNA from female flowers, female seeded flowers, and maleflowers. We conclude that EM-seq is superior to WGBS and delivers higher library yields, more accurate methylation information, reduced DNA damage,increased sequencing length, and decreased GC-bias. EM-seq enables the study of the previously unknown cannabis methylome and opens up thepossibility to discover critical and exciting regulatory functions for the hermaphroditic process.

Acknowledgements Thank you to Laurie Mazzola and Danielle Fuchs for their technical support.

PCRAmplificationAPOBEC

Ultra II End Prep/

Ligation

ShearedGenomic

DNATet2 Illumina

Sequencing

• Jamaican Lion genomic DNA was extracted from female clones (leaf, seeded and unseededflowers) and male sibling (flowers) plants.

• 50 ng of genomic DNA, spiked with control (unmethylated lambda DNA and CpG methylatedpUC19) were sheared using the Covaris® S2 instrument

• Sheared DNA was end-repaired then ligated to modified sequencing adaptors• Tet2 oxidizes 5mC to 5caC• APOBEC deaminates cytosines but not 5caC to uracils• Libraries were amplified using NEBNext® Q5UTM Master Mix and NEBNext Multiplex Oligos

for Illumina (96 Unique Dual Index Primer Pairs, E6440S)• Libraries were sequenced using an Illumina Nextseq 500, 2x76 base paired reads. 5caC are

sequenced as C and deaminated Cs as T.• Bisulfite conversion was performed using Zymo Research EZ DNA Methylation-GoldTM kit

Sample preparation

Data analysisMethyl Kit

MethylDackelMethyl

ExtractionAlign

(BWAmeth)SamblasterDeduplicate Picard

• Reads were aligned to Jamaican Lion reference genome (August 2018 assembly) andanalyzed using the tools in above flowchart.

• Four contigs from this assembly showed anomalous methylation patterns and were excluded.

A

Differential CpG methylation identified between Cannabis sativa flower tissuesA

A

EM-seq enables the identification of differential methylation across Cannabis sativa flower tissues: female flower and female seeded flower (clones) and themale flower (sibling). (A) % 5mC levels in the CpG, CHG, and CHH contexts for all flower EM-seq libraries. The female methylation levels for flower and seededflower were higher than the male flower indicating methylation patterns are potentially determined by sex. Unmethylated Lambda control was <0.5%methylated and pUC19 was >97.5% CpG methylated (data not shown). (B) Volcano plot of the significant (q<0.01) differential methylation calls for the CpGcontext between (1) female flower and female seeded flower (2) female flower and male flower. The differential methylation calls for female flower andfemale seeded flower identified 30 hypomethylated CpGs but no hypermethylated CpGs. The differential methylation calls for female and male floweridentified >50,000 hypomethylated CpGs and >11,000 hypermethylated. (C) Comparison of the hypomethylated CpGs with differential methylation betweenthe female flower samples & female and male flowers.

E

Coverage Depth

% o

f Bas

es a

t thi

s De

pth

Insert Size (bp)

B

PCR

Yiel

d (n

M)

% D

uplic

atio

n

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH

0

10

20

30

40

LC-MS Female Leaf EM-seq Female Leaf WGBS Female Leaf

B

0

25

50

75

100

CpG CHG CHH CpG CHG CHH CpG CHG CHH CpG CHG CHH

Tota

l % 5

mC

EM-seq Female Leaf WGBS Female Leaf

Female Flower-1

Female Flower-2

Female Seeded Flower-1

Female Seeded Flower-2

Male Flower-1

Male Flower-2

MethylatedUnmethylated

Genomic Location (504 bp)

MethylatedUnmethylated

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Coverage Depth

Proje

ct

Medic

inalG

enom

ics_Leaf_

WG

BS

_E

Mseq_C

om

bin

edB

am

s

0.0%

1.0%

2.0%

3.0%

4.0%

5.0%

6.0%

7.0%

8.0%

9.0%

10.0%

Avg. F

rac B

ases

• Higher Library Yields with less PCR cycles• Lower Percent Duplication• More Even Base Coverage

• Larger Library insert Sizes• Less GC Bias• Similar Percentage Methylation as LC-MS

References: 1. Faust, G. G., and I. M. Hall. “SAMBLASTER: Fast Duplicate Marking and Structural Variant Read Extraction.”2. Pedersen et al. (2014) “Fast and Accurate Alignment of Long Bisulfite-Seq Reads.”3. Ryan D, Pederson, B “MethylDackel”4. Broad Institute. (2016). Picard tools.

Differential methylation analysis comparing female flower and seeded flower (clones) and the male flower (sibling) were performed. Significant differentialmethylation calls (q<0.01) for CpG context (1x coverage minimum) between female flower with female seeded flower and/or male flower were identi fied. (A)Region 1 BLASTs to the edestin gene family of the Cannabis sativa genome. These genes are linked to the positive regulation of seed production. Thefemale flower is CpG methylated in this region, suggesting the expression of the genes are turned off, while the female seeded flower is unmethylated atthis CpG. (B) Region 2 BLASTs to the THC acid synthase (THCA) gene of the Cannabis sativa genome and this gene is linked to the positive regulation ofTHCA production. Interestingly, the female flower is CpG unmethylated in this region, suggesting the expression of the gene is turned on, while the maleflower is methylated at this CpG, suggesting the expression of the gene is turned off. Male flowers produce log scale less THCA.

(A) Comparison of total % 5mC determined using LC-MS, EM-seq and WGBS for female leaf. The %5mC for LC-MS is determined as a percentage of total cytosines ((5mC/(total C))x100). 5mCpercentages for EM-seq and WGBS were determined by combining 5mC in the 3 methylated cytosinecontexts (CpG. CHH, CHG). EM-seq cytosine methylation numbers are closer to LCMS methylationvalues than WGBS. (B) % 5mC levels in the CpG, CHG, and CHH contexts for EM-seq and WGBSfor female leaf. Unmethylated Lambda control was <0.5% methylated for EM-seq and <2% forWGBS and was >97.5% CpG methylated for both EM-seq and WGBS (data not shown). (C) CpGcontext correlations of EM-seq and WGBS libraries (1x minimum coverage). Both EM-seq andWGBS libraries were highly correlated between replicates and methods (CHG and CHH context datanot shown). 50 million, 2 x76 base reads were used for methylation analysis.

q-va

lue

Methylation Difference

Hypo Hyper

Female & Female Seeded Flower Female & Male Flower

A B C

B

C

Methylation profiles identified genes involved in seed & cannabinoid production

20%

16%

12%

8%

4%

0%50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800

Insert Size

JL

0.00%

0.02%

0.04%

0.06%

0.08%

0.10%

0.12%

0.14%

0.16%

0.18%

0.20%

0.22%

Avg. %

of in

sert

s

% o

f Rea

ds

EM-seq (standard)EM-seq (large insert)WGBS

50 100 150 200 250 300 350 400 450 500 550 600 650 700

Norm

aliz

ed C

over

age

Percent GC0 10 20 30 40 50 60 70 80 90 100

EM-seq WGBS

5 10 15 20 25 30

Genomic Location (992 bp)

Female Flower-1

Female Flower-2

Female Seeded Flower-1

Female Seeded Flower-2

Male Flower-1

Male Flower-2

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue factor(factor)

hypo

0.0000

0.0025

0.0050

0.0075

0.0100

−100 −50 0 50 100meth.diff

qvalue

factor(factor)hypo

hyper

0.0100

0.0075

0.0050

0.0025

0.0000

-100 0 100Methylation Difference

-100 0 100

0.0100

0.0075

0.0050

0.0025

0.0000

100

75

50

25

0

% 5

mC

per C

con

text

40

30

20

10

0EM-seqLC-MS

EM-seq-1 EM-seq-2 WGBS-1 WGBS-2

EM-seq-1 EM-seq-2 WGBS-1 WGBS-2

100

75

50

25

0

% 5

mC

per C

con

text

Female Flower

Female Seeded Flower

Male Flower

EM-seq WGBS10

0

4

2

6

8

1

0

1.0

0.2

0.4

0.6

0.8

1.2

1.4

1.6

1.8

2.0

2.2

q-va

lue

25

0

5

10

15

20

0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Female & Female Seeded Flower Female & Male Flower