Application of Epigenetic Techniques to Study Lactase Expression

By Andrea Constantinof

A thesis submitted in conformity with the requirements for the degree of Master of Science

Pharmacology and Toxicology University of Toronto

© Copyright by Andrea Constantinof 2014

ABSTRACT

DNA methylation is a highly studied epigenetic modification. Numerous techniques have

been developed to investigate DNA methylation and its role in gene regulation. Lactose

intolerance has been identified as a simple phenotype, regulated by a single gene. Due to its

simple phenotype, lactose intolerance represents an ideal model to investigate the

epigenetic mechanisms that govern gene expression. Here, we investigated lactase

expression in aging mice, and used molecular inversion probe enrichment followed by DNA

sequencing to study epigenetic modifications in segmental duplications and unique regions

of the genome associated with lactose intolerance in humans. Lactase was expressed in

a mosaic pattern in the mouse small intestine, and expression significantly decreased in

older mice. Molecular inversion probe enrichment of unique genomic regions proved

challenging, and did not provide informative data. No association between lactase

expression levels and the DNA methylation profile of the lactase gene in segmental

duplications was found.

Table of Contents

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
DNA METHYLATION AND DNA METHYLTRANSFERASES
DEMETHYLATION
CPG ISLANDS
HISTONE MODIFICATIONS
EPIGENETICS AND DISEASE
THE STUDY OF GENOME-WIDE DNA METHYLATION
SEQUENCING PLATFORMS
HYBRIDIZATION SEQUENCING
PYROSEQUENCING
NEXT GENERATION SEQUENCING
ENRICHMENT TECHNIQUES FOR NEXT GENERATION SEQUENCING
MTAG
POLYMERASE CHAIN REACTION
MOLECULAR INVERSION PROBES
EPIGENETIC MECHANISMS: LACTASE EXPRESSION IN MICE
AIM OF THE STUDY- EPIGENETIC MECHANISMS IN LACTASE EXPRESSION IN MICE
AIM OF THIS STUDY- EPIGENETIC TECHNIQUES IN LACTOSE INTOLERANCE
HYPOTHESES AND RATIONALE
HYPOTHESIS 1
RATIONALE 1
HYPOTHESIS 2
RATIONALE 2
METHODS
RNA EXTRACTION FROM TISSUE
CDNA SYNTHESIS
REAL-TIME PCR
HUMAN MOLECULAR INVERSION PROBE SAMPLE SELECTION
MOLECULAR INVERSION PROBES
SEQUENCING
STATISTICAL ANALYSIS
MOLECULAR INVERSION PROBE ANALYSIS
CORRELATION OF METHYLATION LEVELS BETWEEN TECHNICAL REPLICATES
ANALYSIS OF METHYLATION AND LACTASE EXPRESSION
RESULTS
LACTASE MRNA EXPRESSION IN MICE
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
DISCUSSION
LACTASE EXPRESSION IN MICE
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
SPECIFICITY
SENSITIVITY
REPRODUCIBILITY
AMOUNT OF DNA REQUIRED
METHYLATION AND LACTASE EXPRESSION
FUTURE DIRECTIONS
REFERENCES

LIST OF TABLES

Table 1
Table 2
Table 3
Table 4
Table 5

LIST OF FIGURES

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7

INTRODUCTION

The term ‘epigenetics’ was first coined by Conrad Hal Waddington in 1942 to

describe mechanisms which link the gene to phenotype (Waddington, 2012). Since then,

the definition of epigenetics has been refined to the study of reversible changes to

chromatin structure which alter gene function without altering DNA sequences, as well as

being mitotically and/or meiotically heritable (Weichenhan & Plass, 2013). An example of

epigenetic influence on phenotypes can be seen in the study of agouti mice; epigenetic

modifications (or lack thereof) in the Intra-cisternal A Particle (IAP) transposon cause

variation in the fur coat colour of otherwise genetically identical mice (Argeson, Nelson, &

Siracusa, 1996; Michaud et al., 1994; Morgan, Sutherland, Martin, & Whitelaw, 1999).

The gradient of fur colours produced corresponds to the amount of epigenetic modification in

the IAP transposon. A solid yellow coat is produced when the IAP region lacks epigenetic

modifications, varying levels of modification produce coats with both yellow and black fur,

and a black coat is produced when the IAP region is heavily modified. The modification

responsible for this effect is the methylation of cytosine residues to create

5-methylcytosine (Michaud et al., 1994).

Not only are epigenetic modifications essential for the regulation of tissue specific,

housekeeping, and imprinted gene expression, but they also maintain genomic stability

through silencing transposable elements of the genome (Romanish, Lock, van de Lagemaat,

Dunn, & Mager, 2005). For instance, changes in epigenetic ‘code’ within the vicinity of

specific gene clusters can regulate cell differentiation throughout development. Once cells

have completed the differentiation process, epigenetic modifications have been observed to

silence the expression of genes specific to other cell types, as in X-chromosome inactivation

(Bird, 2002). Females silence one of their two X-chromosomes, so that X-linked genes from

only one are expressed (Inbar-Feigenberg, Choufani, Butcher, Roifman, & Weksberg, 2013).

This process occurs to compensate for the extra set of X-linked genes present in females,

and the X-chromosome selected for inactivation is randomly chosen during embryogenesis

in the female blastocyst. The long non-coding RNA X-inactive specific transcript

(XIST) coats the X chromosome to be inactivated and recruits repressing proteins. In

addition, epigenetic modifications of histones and DNA, such as methylation, occur in

promoter regions to ensure the silencing of the X-chromosome (Morey & Avner, 2010).

Although the cells in our bodies carry the same genome from inception throughout

development, each is fated to perform distinct tasks through the process of differentiation,

dictated by the controlled expression of specific genes (Millan, 2013). Understanding the

mechanisms through which epigenetic factors act may not only lead to the development

and discovery of new therapies for cancer and other diseases in which epigenetic defects

are an associated factor, but may also yield important insight into how these

biologically complex processes are capable of governing themselves.

Epigenetic modifications are regulated by various mechanisms. This thesis primarily

focuses on the mechanisms by which epigenetic modifications are made to DNA, and how

these modifications regulate development and disease. In addition, multiple techniques

developed to interrogate epigenetic DNA modifications are examined. Here, we examine

lactase expression in mice, and use the lactose intolerance phenotype in humans as a model

for the application and analysis of a novel DNA enrichment technique.

DNA METHYLATION AND DNA METHYLTRANSFERASES

In humans, the most-investigated epigenetic modification is DNA methylation; that

is, the addition of a methyl group to the cytosine pyrimidine ring within a cytosine-

phosphate-guanine (CpG) dinucleotide, at the carbon 5 position (Bird, 2002). There are

three different types of DNA methyltransferases (DNMTs), each with distinct roles for

generating methylated residues: DNMT1, DNMT3a and DNMT3b. While DNMT1 is

responsible for copying DNA methylation patterns onto newly replicated DNA, DNMT3a

and DNMT3b generate de novo DNA methylation in response to different stimuli. For

example, diet (Ge et al., 2014), exercise (Ling & Ronn, 2014), and upbringing environment

(Weaver et al., 2004) have all been shown to affect DNA methylation.

After fertilization, a major reorganization of zygotic DNA methylation, or

‘reprogramming’, occurs in which the majority of DNA methylation marks are

removed, and a novel set of epigenetic modifications is established as necessary

blueprints for the different tissues of the developing embryo (Choi et al., 2011; Morgan,

Santos, Green, Dean, & Reik, 2005). DNMT3a and DNMT3b re-establish a novel set of

epigenetic modifications during gametogenesis, and again after implantation (that is,

adherence of the blastocyst to the uterine wall). Furthermore, the novel set of epigenetic

modifications established during gametogenesis gives certain genes parent-of-origin

specific expression (Okano, Bell, Haber, & Li, 1999). Thus, each resulting cell from

fertilization has its own epigenetic signature, which must be carefully maintained by

DNMT1 in order to properly regulate gene expression for cells of any given lineage

(Morgan et al., 2005). During cell division, the protein UHRF1 (ubiquitin-like, containing

PHD and Ring finger domain 1) recognizes and binds to hemi-methylated DNA, that is, DNA

in which the parent strand is methylated but the daughter strand is not (Hashimoto,

Vertino, & Cheng, 2010). Using UHRF1 as a guide, DNMT1 transfers parental strand

methylation marks to the daughter strand. To do so, DNMT1 flips cytosine rings out from

newly synthesized DNA to form an intermediate complex before incorporating a methyl

group from S-adenosyl-L-methionine (SAM) onto the carbon 5 position of the cytosine

(Hashimoto et al., 2010).

While DNMT3a and DNMT3b are highly expressed in undifferentiated embryonic

stem cells, their expression decreases to low levels after eventual differentiation to adult

somatic tissues (Okano et al., 1999). Unlike DNMT1, DNMT3a and DNMT3b show no

preference for hemi-methylated DNA and are not essential to maintaining DNA

methylation. In vitro studies have demonstrated that cell lines lacking both DNMT3a and

DNMT3b are incapable of de novo methylation, yet cell lines expressing either of the two

alone remain able to undergo de novo methylation (Okano et al., 1999; Okashita et al.,

2014). Although this finding suggests possible redundancy in the functionality of these

enzymes, studies in mice deficient for either DNMT3a or DNMT3b have revealed distinct

requirements for each methyltransferase during development. Whereas mice lacking DNMT3a

developed to term but became runted and died within 4 weeks after birth, lack of

DNMT3b results in embryonic lethality (Okano et al., 1999).

DNMT1 expression is similarly essential during embryonic development. As it is

critical for maintaining methylation that silences transposable and repetitive elements,

deletion of DNMT1 during gestation results in a 60% loss of global methylation in the

blastocyst after implantation and is consequently lethal to the embryo (Carlone et al., 2005;

E. Li, Bestor, & Jaenisch, 1992). In addition, DNMT1 is required for both proper

differentiation of embryonic stem cells, and optimal self-renewal capacity in adult stem

cells (Broske et al., 2009; Lee et al., 2001; Sen, Reuter, Webster, Zhu, & Khavari, 2010;

Sheaffer et al., 2014; Trowbridge, Snow, Kim, & Orkin, 2009).

DEMETHYLATION

Genome-wide loss of cytosine methylation, termed DNA demethylation, occurs at

specific developmental stages within the cells of preimplantation embryos and in

primordial germ cells (PGCs). This global DNA demethylation removes methyl groups

epigenetically inherited from both parental genomes, and is required for establishing

pluripotency in the cells of early embryos (Wu & Zhang, 2014). DNA methylation can also

be progressively lost under circumstances such as repeated cell divisions, where

methylation is not maintained by DNMT1; however, this passive demethylation occurs too

slowly to fully explain the global demethylation observed in preimplantation embryos and

PGCs (Wu & Zhang, 2014).
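
As a rough illustration of passive demethylation, the sketch below assumes a deliberately simplified dilution model: no maintenance methylation occurs at all, so the methylated fraction of DNA strands halves with every round of replication. The function name and the halving assumption are illustrative only and are not taken from the studies cited above.

```python
def remaining_methylation(initial_fraction: float, divisions: int) -> float:
    """Fraction of methylated strands left after replication without maintenance.

    Assumption (simplified model): each division pairs every parental strand
    with an unmethylated daughter strand, so the methylated fraction halves.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return initial_fraction / (2 ** divisions)
```

Under this assumption, a fully methylated locus still retains one eighth of its methylated strands after three divisions, which gives a sense of why passive dilution alone is regarded as too slow to account for rapid global demethylation.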

Active demethylation can better account for the rapid global demethylation which

occurs during embryonic development. In mammals, active DNA demethylation begins with

oxidation of 5-methylcytosine and is completed by the base excision repair (BER) pathway.

Ten-eleven translocation (TET) proteins act as methylcytosine dioxygenases to hydroxylate

5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) (Hackett, Zylicz, & Surani, 2012; Okashita et al., 2014).

Until recently, it was thought that the protein activation-induced cytidine deaminase (AID)

would next remove the amine group from 5hmC to produce 5-hydroxymethyluracil

(5hmU), a substrate for DNA glycosylases and the BER pathway, which regenerate unmodified

cytosine (Junjie U. Guo, Su, Zhong, Ming, & Song, 2011). However, Nabel et al.

(2012) showed that 5hmC is not a substrate for AID, suggesting that

deamination may not be the main pathway for active DNA demethylation.

Instead, 5hmC can be further oxidized to 5-formylcytosine (5fC) and

5-carboxylcytosine (5caC), both of which are excised by thymine DNA glycosylase (TDG) and

replaced with unmodified cytosine through BER (He et al., 2011; Ito et al., 2011). TET proteins are responsible

for the oxidation of 5mC to 5hmC as well as from 5hmC to 5fC/5caC; however, the genomic

content of 5hmC is much higher in embryonic stem cells than that of 5fC/5caC. Accordingly,

the conversion of 5hmC to 5fC/5caC appears to be tightly regulated by TET protein activity,

yet the mechanisms regulating TET proteins in active DNA demethylation remain

unclear (Pastor, Aravind, & Rao, 2013).

CPG ISLANDS

Methylated cytosine residues are most often found next to a guanine residue.

Cytosine nucleotides followed by a guanine nucleotide are known as CpG dinucleotides, in

which the ‘p’ represents the phosphate bond linking the two nucleotides. CpG

dinucleotides are highly mutable, and occur with an incidence five times lower in the

genome than would be expected by chance. The low frequency of this dinucleotide is due to

the elevated rates of transition mutations that occur through spontaneous deamination of

5mC to thymine (J. B. Li et al., 2009).
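
The depletion of CpG dinucleotides described above can be quantified as a ratio of observed to expected CpG counts. The sketch below uses one commonly used definition, in which the expected count is (number of C × number of G) / sequence length; the function name and the short test sequences are illustrative only and do not come from this study.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return the observed/expected CpG ratio for a DNA sequence.

    Expected CpG count is taken as (#C * #G) / length, i.e. the count
    anticipated if C and G were arranged independently of one another.
    """
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if length == 0 or c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the true CpG count.
    observed = seq.count("CG")
    expected = c_count * g_count / length
    return observed / expected
```

A ratio well below 1 indicates CpG depletion, as described for bulk genomic DNA, whereas CpG-rich regions give values closer to or above 1.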

CpG dinucleotides are found throughout the genome within genes, intergenic

regions, repetitive elements, and finally in clusters called CpG islands. CpG islands are

typically located at the promoter regions of genes, and tend to be protected from

methylation (Carninci et al., 2006) despite the fact that 70-90% of CpG dinucleotides are

methylated in healthy somatic cells (Miranda & Jones, 2007). When DNA methylation in

CpG islands does occur, it can inhibit the binding of transcription factors and in turn

suppress gene expression (Ball et al., 2009). In particular, this can occur through the

recruitment of a family of methyl-CpG-binding proteins able to recognize methylated

cytosines in mammals. These proteins (MBD1, MBD2, MBD3, MBD4, and MeCP2) all contain a

homologous methyl-CpG-binding domain, and cooperate with a non-homologous methyl-

binding protein named KAISO. Together they repress transcription both by preventing

activating transcription factors from binding to their target sequences and by

recruiting enzymes which catalyze histone posttranslational modifications (in turn

mediating structural changes in chromatin that repress gene expression) (Miranda & Jones,

2007; Thurman et al., 2012). In addition, 5mC can silence genes since transcription factors

do not bind efficiently to methylated DNA (Ball et al., 2009). CpGs found in intergenic

regions and repetitive elements are essential for maintaining genomic integrity. Extensive

CpG methylation in these regions protects the genome from transposition, transcriptional

interference from strong promoters (Romanish et al., 2005), and illegitimate

recombination during cell division (Gonzalo et al., 2006).

HISTONE MODIFICATIONS

As mentioned above, CpG methylation regulates gene expression in conjunction

with other epigenetic marks such as histone modification. N-terminal tails of histones can

undergo various posttranslational chemical modifications, including acetylation,

methylation, phosphorylation, sumoylation, and ubiquitination. For example, histone

methylation at two distinct sites, H3K9me3 and H3K27me3, is required alongside

hypermethylation of CpG islands for successful X-chromosome inactivation (Reik & Lewis,

2005).

In contrast, lysine residues on the N-terminal histone tails can be acetylated to

promote an open chromatin structure - consequently increasing gene expression.

Acetylation of lysine neutralizes its positive charge, and thereby weakens the affinity of the

histone for negatively charged DNA. This weaker affinity leads to a looser interaction

between the two, allowing for DNA to be more accessible to transcriptional machinery

(Hashimoto et al., 2010).

EPIGENETICS AND DISEASE

Abnormal regulation of epigenetic modifications to DNA or histones may disrupt

vital metabolic pathways and trigger pathological conditions. For example, genome-

wide hypomethylation was the earliest epigenetic aberration found in various cancers

(Berdasco & Esteller, 2010). In general, cancer cells consistently present with

hypomethylated intergenic intervals and repetitive elements, along with hypermethylated

CpG islands (Berdasco & Esteller, 2010).

More recently, epigenetic mechanisms have been studied in the context of complex,

non-Mendelian diseases. The hallmark of a non-Mendelian disease is discordant

inheritance between monozygotic (MZ) twins, or those twins colloquially known as

identical. To illustrate, consider that concordance in MZ twins is only ~15% for

breast cancer, ~20% for ulcerative colitis, 25-30% for multiple sclerosis, 25-45% for

diabetes, 50% for schizophrenia, and 40-70% for Alzheimer’s disease (Petronis, 2001,

2010), all of which are examples of diseases displaying non-Mendelian inheritance. In

addition, males and females are differentially susceptible to non-Mendelian diseases;

multiple sclerosis, rheumatoid arthritis, Crohn’s disease, panic disorders, structural heart

disease, and hyperthyroidism are all more common among females, whilst males are more

often affected by autism, Hirschsprung’s disease, ulcerative colitis, Parkinson’s disease,

alcoholism, and allergies (Kaminsky, Wang, & Petronis, 2006).

Typically, the discordance between MZ twins and the gender specificity observed

within complex diseases have been attributed to differential environmental factors and

sex-linked genes, respectively. Although few complex-disease-causing

environmental factors (e.g. smoking in lung cancer, or diet in cardiovascular disease) have

been identified, MZ twin discordance may be better explained by the partial

stability of contributing epigenetic factors. These factors could allow for a substantial

degree of disease-relevant epigenetic dissimilarities to accumulate in one MZ twin or the

other (Petronis, 2010). Similarly, while sex-linked genes cannot fully explain the gender-

specific epidemiology of the relevant complex diseases, it is possible that epigenetic

mechanisms are differentially regulated by sex hormones (Gabory, Attig, & Junien, 2009).

Strong evidence supports the notion that epigenetic mechanisms may play a causal

role in complex, non-Mendelian disease. Considering the tremendous cost in both

resources and suffering that these afflictions collectively exact, the appeal of a potentially

novel class of therapeutics exploiting epigenetic mechanisms becomes obvious. Through

the years, many tools and techniques have been developed to investigate the components,

capabilities, and roles of epigenetic processes, a select few of which are discussed below.

THE STUDY OF GENOME-WIDE DNA METHYLATION

DNA methylation is studied through different combinations of enrichment and

analytical techniques. Understanding how genomic methylation profiles vary in disease

states is paramount to developing therapeutic treatments. Current methods used to study

the genome-wide methylation levels of CpG sites involve the capture and enrichment of

DNA, and can be divided into three categories: (1) restriction endonuclease-based

methods; (2) affinity capture-based techniques; and (3) bisulfite conversion-based

methods.

Restriction enzymes whose cleavage depends on the methylation state of their recognition

sequence, referred to as methylation-sensitive restriction enzymes, are often used to study

the methylation profile of DNA. Methylation analysis can be done using isoschizomer pairs

of enzymes (which recognize a single sequence and cut it identically), in which one

enzyme, such as HpaII, is sensitive to methylation (cleaving only when the DNA is

unmethylated) and the other, such as MspI, is methylation insensitive. Neoschizomer pairs

of enzymes are also used to evaluate DNA methylation. Both enzymes of a neoschizomer

pair bind to a single recognition sequence but cleave it at different sites. One enzyme of

the pair, such as SmaI, cleaves only when the DNA is unmethylated and is paired with the

methylation-insensitive XmaI. With either isoschizomer or neoschizomer pairs, the DNA

fragments generated by the methylation-sensitive enzyme differ in size from those

generated by the methylation-insensitive enzyme. Cytosine methylation can then be

estimated by calculating the ratio of the different DNA fragments (Brunner et al., 2009).

Although patterns of cutting can provide a read-out of DNA methylation, this approach is

limited in resolution and coverage by the sequence and modification-type specificity of the

enzymes available (Kriukiene et al., 2013).
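
The ratio-based estimate described above can be sketched as follows. This is a minimal illustration that assumes the methylation-sensitive digest (e.g. HpaII) yields signal only from unmethylated sites while the insensitive digest (e.g. MspI) yields signal from all sites; the function name, the input values and the clamping step are hypothetical and are not part of any published protocol.

```python
def estimate_methylated_fraction(sensitive_signal: float,
                                 insensitive_signal: float) -> float:
    """Estimate the methylated fraction at a site from a paired digest.

    sensitive_signal:   fragment signal from the methylation-sensitive enzyme
                        (cuts unmethylated sites only).
    insensitive_signal: fragment signal from the methylation-insensitive
                        partner (cuts every site).
    """
    if insensitive_signal <= 0:
        raise ValueError("insensitive_signal must be positive")
    unmethylated = sensitive_signal / insensitive_signal
    # Clamp to [0, 1] because measurement noise can push the ratio outside it.
    unmethylated = min(max(unmethylated, 0.0), 1.0)
    return 1.0 - unmethylated
```

For example, a site giving a signal of 25 with the sensitive enzyme and 100 with the insensitive enzyme would be estimated as 75% methylated under these assumptions.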

In affinity-based methods such as MeDIP (Methylated DNA Immunoprecipitation), an

antibody capable of recognizing 5mC is used to

immunoprecipitate the methylated fraction of the genome (Weber et al., 2005). MeDIP

enriches for the methylated regions of the DNA. This technique is limited by its inability to

accurately resolve regions with low to medium CpG density in the genome

(Laird, 2010). Since the majority of highly methylated regions are repetitive

elements, the majority of what is enriched will be repetitive elements.

Lastly, bisulfite conversion coupled with sequencing is presently the gold standard

for measuring genome-wide methylation levels at CpG dinucleotides due to its ability to map

5-modified cytosines with single-base resolution (Laird, 2010). Genomic DNA is denatured

and treated with sodium bisulfite, leading to the conversion of unmethylated cytosines to

uracils through a sulphonation reaction (Ball et al., 2009), with uracils copied as thymines

by DNA polymerase during subsequent PCR. Essentially, bisulfite conversion changes an

epigenetic difference into one of sequence, as it allows for the methylation status

information to remain, even after amplification (Laird, 2010). For example, a luciferase-

based sequencing approach known as pyrosequencing is often used to sequence bisulfite

converted DNA. The drawbacks of bisulfite conversion are the high potential for DNA

degradation and the risk of incomplete conversion. Bisulfite conversion is also limited by the

fact that it cannot distinguish between 5mC and other modifications such as 5hmC (Khare

et al., 2012).
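
The way bisulfite conversion turns a methylation difference into a sequence difference can be illustrated with a small simulation. The sketch below assumes complete conversion, a single strand, and an ungapped read aligned base-for-base to the reference; the function names and the toy sequence are illustrative only.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Simulate the sequence read after bisulfite treatment and PCR.

    Unmethylated C is converted to U and read as T after amplification;
    methylated C (0-based positions in methylated_positions) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, converted_read: str) -> dict:
    """Compare a converted read with the reference at every reference C.

    Returns {position: True} where the read still shows C (protected,
    hence modified) and {position: False} where it shows T (converted).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls
```

With the reference `ACGTCGA` and a methylated cytosine at position 1, the simulated read is `ACGTTGA`, and comparing it with the reference recovers position 1 as modified and position 4 as unmodified. As noted above, a protected cytosine in real data could be 5mC or 5hmC, which this comparison cannot tell apart.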

SEQUENCING PLATFORMS

DNA sequencing, in the context of DNA methylation analysis, provides sequence

information about methylation profiles. With the advent of sequencing, it is now possible to

identify the exact genes and genomic regions affected by methylation, providing increased

insight into the role of methylation in genomic regulation.

HYBRIDIZATION SEQUENCING

Hybridization sequencing refers to the use of microarrays to quantify mRNA or DNA

levels. At present, our laboratory uses Human Tiling Arrays 2.0R from Affymetrix to

investigate epigenetic variation between disease and control populations. The

oligonucleotides used in Human Tiling Arrays 2.0R span the genome, and are 25

nucleotides in length. Hundreds of thousands to millions of copies of a single

oligonucleotide are grouped together in a specific area on the array, in what is called a

probe cell. Each array contains over 6.5 million probe cells fixed to the array surface with

each designed to perfectly match specific genomic regions. A major advantage of

microarray technology is that thousands of genes can be interrogated simultaneously, by a

single array. There is a gap of 10 base pairs between the oligonucleotide sequences of each

probe, offering an average resolution of 35 base pairs as measured from the central

position of adjacent probes. The probes do not include sequences which are identified as

repetitive regions or low-complexity DNA sequences by the software program

RepeatMasker (Tarailo-Graovac & Chen, 2009). Currently, over 56% of human genomic

sequences are identified and masked by RepeatMasker. It is essential to remove repetitive

elements from the microarray probes, as their repetitive nature produces a noisy and

uninterpretable signal.
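
The 35 base pair resolution quoted above follows from the 25-nucleotide probe length plus the 10 base pair gap. The sketch below lays out idealized probe positions over an arbitrary region to make that arithmetic explicit; it ignores repeat masking, and the function names and coordinates are hypothetical rather than taken from the array design files.

```python
PROBE_LENGTH = 25  # nucleotides per probe, as stated in the text
GAP = 10           # base pairs between the end of one probe and the next


def probe_starts(region_start: int, region_end: int,
                 probe_length: int = PROBE_LENGTH, gap: int = GAP) -> list:
    """Start coordinates of evenly tiled probes that fit inside the region.

    region_end is exclusive; each probe covers [start, start + probe_length).
    """
    step = probe_length + gap
    return list(range(region_start, region_end - probe_length + 1, step))


def probe_centre(start: int, probe_length: int = PROBE_LENGTH) -> float:
    """Central position of a probe beginning at 'start'."""
    return start + (probe_length - 1) / 2
```

For a 100 bp region starting at coordinate 0 this gives probes starting at 0, 35 and 70, whose centres (12, 47 and 82) are each 35 bp apart.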

Before microarray analysis can begin, target DNA must first be enriched so that enough

DNA is available for microarray probe binding. To study epigenetic variation, our lab has

employed restriction enzyme or immunoprecipitation-based enrichment techniques. The

target DNA is first amplified using PCR, with uracil nucleotides incorporated into the

resulting amplicons. These uracil nucleotides can then be recognized by uracil DNA

glycosylase (UDG). UDG cleaves the N-glycosidic bond between the uracil base and the

sugar backbone, creating an apyrimidinic site that blocks DNA polymerase from continuing

the chain reaction and adding further nucleotides (Barzilay, Walker, Robson, & Hickson,

1995). An apyrimidinic site has a terminal 5'-phosphate, recognized and cleaved by

apurinic/apyrimidinic endonuclease (APE1) activity in the BER pathway (Barzilay et al.,

1995; Marenstein, Wilson, & Teebor, 2004). APE1 cleavage generates a single-strand DNA

break, fragmenting the enriched amplicons to an average length of 25-100 nucleotides; this

fragmentation improves both the efficiency and specificity of target binding (Dalma‐

Weiszhausz, Warrington, Tanimoto, & Miyada, 2006). The resultant fragments are labeled

by terminal deoxynucleotidyl transferase (TdT) and a biotinylated nucleotide analogue.

This label is the binding site for the following fluorescent label.

Subsequently, bound DNA fragments are labelled with a fluorescent streptavidin-

phycoerythrin conjugate (SAPE), which binds to the biotin tag added during terminal

labelling. Once labelled, the array is ready to be scanned. The scanner is able to

distinguish 65,000 distinct fluorescence intensities, and converts fluorescence measurements

into an electrical signal expressed as a corresponding numerical value. These values

are used to quantify DNA levels (Dalma‐Weiszhausz et al., 2006). Data from

the microarrays is analyzed by comparing the relative fluorescent signal intensity from

fragment-bound probes. If, for example, DNA was enriched for unmethylated regions, then

high levels of fluorescence emitted from a particular probe cell would indicate a lesser

degree of methylation from the region covered by the oligonucleotides, relative to other

probe cells.

PYROSEQUENCING

Pyrosequencing was the first sequencing platform capable of parallelizing the

sequencing process to be made available as a commercial product (Margulies et al., 2005). This

luciferase-based technique, traditionally used to investigate single nucleotide

polymorphisms (SNPs) (Fakhrai-Rad, Pourmand, & Ronaghi, 2002), can be used to

determine the relative extent of CpG methylation when combined with bisulfite conversion.

Biotinylated PCR primers are used to amplify the CpG sites of interest from a bisulfite-converted

template. Since bisulfite-converted DNA is more fragile than genomic DNA, amplicon

lengths are usually kept to 200 bp or less; the biotinylated amplicons are separated from solution using

streptavidin beads. This short amplicon length limits the size of the genomic region that

can be efficiently evaluated using pyrosequencing. In this technique, single-stranded DNA is

hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP

sulfurylase, luciferase, and apyrase, along with the substrates adenosine 5’ phosphosulfate

(APS), luciferin and the four deoxynucleotides: dCTP, dGTP, dTTP, and dATPαS (dATPαS

replaces dATP, as the latter is a substrate for luciferase and would generate non-specific

signals). The immobilized single stranded DNA is sequenced through the synthesis of the

complementary strand, one base at a time. When a nucleotide is incorporated by DNA

polymerase, a pyrophosphate (PPi) is released at 1:1 stoichiometry and subsequently

converted to ATP by ATP sulfurylase. This reaction provides the energy for luciferase to

oxidize luciferin and generate light. This visible light, proportional to the amount of ATP

produced, is detected by a camera and recorded as the measurement. The final step of the

reaction is the degradation of unincorporated nucleotides by apyrase. This methodology

allows determination of whether or not CpG sites within the amplicon are methylated

(Fakhrai-Rad et al., 2002).

The DNA methylation level reported for a CpG site is an

average across all templates present in a single PCR reaction. As a result,

pyrosequencing can accurately determine the amount of methylation above a threshold

level of approximately 5% at each CpG position (Metzker, 2010). When the reverse PCR

primer is biotinylated, the reverse strand is used as the template strand and the

pyrosequencing primer will be extended according to the base composition of the forward

strand. In this case, the relative amount of methylation at the CpG position is determined by

the ratio of incorporated dCTP and dTTP, for methylated and unmethylated CpG sites

respectively. Similarly, if the forward PCR primer is biotinylated the forward strand is used

as a template and the sequencing primer is extended according to the base composition of

the reverse strand. The amount of DNA methylation at an individual CpG site is then

calculated from the ratio of incorporated dGTP, for methylated CpG sites, to dATPαS, for unmethylated

CpG sites (Fakhrai-Rad et al., 2002). Bisulfite conversion paired with pyrosequencing is an

efficient method to interrogate small genomic regions in a small group of samples.

However, this methodology is limited by its ineffective reading of sequences within

homopolymers (consecutive instances of the same base, such as AAA or GGG). As light

emission is non-linear following the incorporation of more than 5-6 identical nucleotides

(Ronaghi, 2001), the length of homopolymers is difficult to infer from signal intensity using

this approach (Shendure & Ji, 2008).
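
The ratio described above can be written out as a small calculation. This is a minimal sketch under stated assumptions: the two inputs are hypothetical light-signal values for the nucleotide reporting a methylated CpG (dCTP or dGTP, depending on the template strand) and for the nucleotide reporting an unmethylated CpG (dTTP or dATPαS). It is not the algorithm of any particular pyrosequencing instrument.

```python
def methylation_percent(methylated_signal, unmethylated_signal):
    """Percent methylation at one CpG from the two nucleotide-incorporation signals.

    methylated_signal:   light signal for the nucleotide reporting a methylated CpG
    unmethylated_signal: light signal for the nucleotide reporting an unmethylated CpG
    """
    if methylated_signal < 0 or unmethylated_signal < 0:
        raise ValueError("signals must be non-negative")
    total = methylated_signal + unmethylated_signal
    if total == 0:
        raise ValueError("no signal recorded at this CpG position")
    return 100.0 * methylated_signal / total


# Hypothetical peak heights for three CpG sites in one amplicon.
example_sites = [(30.0, 70.0), (5.0, 95.0), (80.0, 20.0)]
example_levels = [methylation_percent(m, u) for m, u in example_sites]
print(example_levels)
```

Because the signal is an average over all templates in the reaction, a value of 30% would mean that roughly 30% of the template molecules were methylated at that site, not that a single molecule was partially methylated.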

NEXT GENERATION SEQUENCING

Sequencing platforms capable of sequencing multiple samples in parallel are

referred to as ‘next-generation sequencing platforms’. Cyclic reversible termination is one

such method, sequencing DNA with single nucleotide resolution. In this

reaction, DNA polymerase incorporates a fluorescently modified nucleotide which is

complementary to the template base (Metzker, 2010). After the nucleotide is incorporated,

the polymerase activity is terminated and the remaining unincorporated nucleotides are

washed away. An image is then obtained to determine the identity of the incorporated

nucleotide. The fluorescent dye is then cleaved along with the terminating/inhibiting group

of the nucleotide, and an additional wash is performed before the next incorporation step

begins (Metzker, 2010). Reversible termination is achieved by using deoxyribonucleoside

triphosphates (dNTPs), which have their 3'-OH group blocked to make continuation of the

reaction impossible (J. Guo et al., 2008).

Next generation sequencing produces output sequences referred to as “reads”.

Using data analysis these reads are mapped back to a reference genome. Bioinformatic

programs such as Maq and Bowtie align reads to regions of the genome from which they

most likely originated (Trapnell & Salzberg, 2009). As with affinity-based methods, however, analysis

becomes more challenging when reads are generated from sequences containing one or more

repetitive elements of the reference genome,

as the mapping program cannot accurately determine which copy of the repeat the sequence

originated from (sometimes reporting multiple locations, or choosing one at random)

(Trapnell & Salzberg, 2009). The program bisReadMapper is a new tool used for the

analysis of bisulfite-converted sequencing data. This program maps bisulfite-converted

reads to a fully bisulfite-converted genome sequence, allowing for reads to be mapped not

only to their correct region in the genome, but to their strand of origin as well (Diep et al.,

2012) – a feature not offered by hybridization sequencing at present.
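
The idea of mapping reads against a fully bisulfite-converted reference can be illustrated with a toy example. The sketch below is not bisReadMapper's code and makes simplifying assumptions: exact substring matching stands in for a real aligner, and all function names are hypothetical. It shows why converting both the reads and each reference strand in silico (every C to T) allows a read to be assigned to its strand of origin regardless of its methylation state.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_fully(seq):
    """In silico bisulfite conversion: every C becomes T, as if nothing were methylated."""
    return seq.upper().replace("C", "T")


def converted_references(reference):
    """Build the two converted references, one for each strand of the genome."""
    reference = reference.upper()
    return {
        "forward": convert_fully(reference),
        "reverse": convert_fully(reverse_complement(reference)),
    }


def strand_of_origin(read, reference):
    """Return the strands whose converted reference contains the converted read."""
    converted_read = convert_fully(read)
    return [
        strand
        for strand, converted in converted_references(reference).items()
        if converted_read in converted
    ]


reference = "ACGTTCGGA"
# Forward-strand read: the first CpG cytosine is methylated (kept as C),
# the second cytosine is unmethylated (read as T).
forward_read = "ACGTTT"
# Reverse-strand read: derived from the reverse complement "TCCGAAC" with the
# CpG cytosine methylated and the other cytosines converted to T.
reverse_read = "TTCGAAT"
print(strand_of_origin(forward_read, reference), strand_of_origin(reverse_read, reference))
```

Methylation is then called after mapping by comparing the original, unconverted read with the unconverted reference at each cytosine position.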

Next generation sequencing allows for millions of DNA templates to be sequenced in

parallel, producing gigabytes of data; however, large scale whole genome sequencing is still

cost and time prohibitive (Metzker, 2010). In order to minimize cost and maximize

efficiency, 'target-enrichment' methods have been developed in which targeted genomic

regions of interest are selectively captured from the DNA sample before sequencing.

Sequencing targeted regions of the genome as opposed to the entire genome is not only

more time and cost effective, but produces data which can be more easily analyzed

(Metzker, 2010). Table 1 outlines the parameters, in addition to cost and ease of use,

which provide measures of the performance of target enrichment techniques (Metzker,

2010).

Table 1. Parameters used to measure the performance of the target enrichment techniques.

Parameter         Definition
Sensitivity       The percentage of target bases that are covered by one or more sequence reads
Specificity       The percentage of sequences that map to the targeted regions
Uniformity        Variability in sequence coverage across target regions
Reproducibility   How closely results from replicate experiments correlate
DNA Required      The minimum amount of DNA required per experiment
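
As an illustration of the first two parameters in Table 1, the following minimal sketch computes sensitivity and specificity from mapped read coordinates. It assumes reads and targets are given as half-open (start, end) intervals on a single chromosome; the function name and data layout are hypothetical and chosen only for this example.

```python
def enrichment_metrics(targets, reads):
    """Return (sensitivity, specificity) in percent, as defined in Table 1.

    targets, reads: lists of (start, end) half-open intervals on one chromosome.
    """
    if not targets or not reads:
        raise ValueError("at least one target and one read are required")

    # Every base position that belongs to a targeted region.
    target_bases = set()
    for start, end in targets:
        target_bases.update(range(start, end))

    covered = set()       # target bases covered by one or more reads
    on_target_reads = 0   # reads overlapping a targeted region
    for start, end in reads:
        hit = target_bases.intersection(range(start, end))
        if hit:
            on_target_reads += 1
            covered |= hit

    sensitivity = 100.0 * len(covered) / len(target_bases)
    specificity = 100.0 * on_target_reads / len(reads)
    return sensitivity, specificity


example_targets = [(0, 100)]
example_reads = [(0, 50), (25, 75), (200, 250), (300, 350)]
sensitivity, specificity = enrichment_metrics(example_targets, example_reads)
print(sensitivity, specificity)
```

In this toy example the two on-target reads cover 75 of the 100 target bases, and two of the four reads map to the target.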

ENRICHMENT TECHNIQUES FOR NEXT GENERATION

SEQUENCING

MTAG

Recently, a novel strategy termed ‘methyltransferase-directed transfer of activated

groups’ (mTAG) has been developed. This approach takes advantage of the covalent

chemical labeling and enrichment of unmethylated DNA (Kriukiene et al., 2013;

Lukinavicius et al., 2007), occurring in five steps as outlined in Table 2 (Kriukiene et al.,

2013).

Table 2. Major steps involved in the mTAG technique.

Step 1   Mechanical shearing of genomic DNA to 200 bp fragments
Step 2   Methyltransferase-directed azide labeling of unmethylated CpG dinucleotides
Step 3   Appending biotin reporters to azide group
Step 4   Affinity capture and recovery of biotin-labeled fragments on streptavidin-coated beads
Step 5   PCR amplification of the recovered fraction for microarray analysis

The advantage of using mTAG is that it enables the enrichment of unmethylated CpG

dinucleotides. In this way, the enriched portion provides information on moderately

methylated regions such as CpG islands in gene promoter regions. In addition, mTAG allows

for genome-wide interrogation of methylation levels, as opposed to the specifically targeted

regions interrogated with PCR or molecular inversion probes. Though mTAG-enriched

DNA can be sequenced alone, our lab uses mTAG coupled with hybridization sequencing

(microarrays) in order to obtain a general pattern of areas in which methylation profiles

may vary between subjects and controls. If significant variation exists in a few genomic

regions between the disease samples compared with controls, PCR coupled with

pyrosequencing is used to validate the microarray results with single base resolution. If the

microarray data demonstrate that there are multiple areas within the genome with high

variance, then molecular inversion probes coupled with next generation sequencing are

used to validate the microarray data with single base pair resolution.

POLYMERASE CHAIN REACTION

PCR can be used to amplify a specific target amplicon for subsequent sequencing.

The amount of DNA required for this technique is proportional to the desired number of

target amplicons, since each amplicon must be amplified individually (Cho et al., 1999). The

sensitivity of PCR amplification can be increased by using overlapping PCRs for a target

region. Increased sensitivity comes at the cost of higher DNA input, which is not

always available in clinical studies. Multiplexing, a technique in which multiple

primer pairs are added to the same PCR reaction, is typically avoided when amplifying

genomic DNA in order to prevent amplification failure, as well as non-specific

amplification caused by primer-primer interactions (Cho et al., 1999; Wang et al., 1998).

Furthermore, multiplexing is impossible when using bisulfite-treated DNA, since the

lowered complexity of the sequence enhances primer-primer interactions. Each PCR must

therefore be validated and optimized individually, making PCR an impractical enrichment

method for a high-throughput next generation sequencing platform like Illumina MiSeq

(Mamanova et al., 2010). In our lab, PCR enrichment is coupled with pyrosequencing for

analyzing only small regions of the genome with high uniformity and reproducibility.

MOLECULAR INVERSION PROBES

Bisulfite-converted DNA can also be selectively enriched through the use of

molecular inversion probes. Molecular inversion probes are an improved version of the

padlock probe first designed in 1994 (Nilsson et al., 1994). Padlock probes were originally

designed to identify SNPs through a ligase reaction, and have now been improved to allow

for the interrogation of longer sequences (Porreca et al., 2007). The probe is made up of

single stranded DNA, containing two sequences, separated by a common linker region,

which are complementary to the target genomic DNA. Molecular inversion probes anneal to

and capture target DNA through circularization for subsequent enrichment. The term

‘circularization’ refers to what happens after the probe hybridizes to the target. The two arms of the

probe hybridize to the targeted DNA sequence, and DNA polymerase extends the sequence

from the first arm to the second, after which the extended sequence is ligated to the

second hybridized arm of the probe, creating a closed DNA circle, as outlined in Figure 1

(Hardenbol et al., 2003; Mamanova et al., 2010; Metzker, 2010; Porreca et al., 2007).

Remaining probes that did not circularize are digested by exonucleases in the subsequent

step. Since each probe has an identical linker sequence, only one pair of primers is required

to amplify all the circularized probes in solution (Akhras et al., 2007). The forward or

reverse primer for each sample can be fitted with a unique nucleotide sequence six

nucleotides in length, which is used to differentiate the samples. In this way, resultant

amplicons from each sample will have a unique six nucleotide sequence for identification,

allowing amplicons from many samples to be pooled for sequencing. Molecular inversion

probe enrichment is a cost-effective alternative to whole genome sequencing, and is

optimal for high-throughput analyses using minimal amounts of DNA (J. B. Li et al., 2009).
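
The sample barcoding described above can be sketched as follows. This is an illustration only: it assumes, for simplicity, that the six-nucleotide barcode sits at the start of each read and that barcodes must match exactly, which may differ from how a given sequencing platform reports index sequences. The sample names and function name are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 6  # nucleotides, as described in the text

# Number of distinct six-nucleotide sequences available as barcodes.
POSSIBLE_BARCODES = 4 ** BARCODE_LENGTH


def demultiplex(reads, barcode_to_sample):
    """Group pooled reads by sample using an exact match on the leading barcode.

    Reads whose barcode is not in barcode_to_sample are collected under 'unassigned'.
    The barcode is removed from each read before it is stored.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGTAC": "sample_1", "GGGGGG": "sample_2"}
pooled = ["ACGTACTTTT", "GGGGGGCCCC", "ACGTACAAAA", "TTTTTTGG"]
assigned = demultiplex(pooled, barcodes)
print(POSSIBLE_BARCODES, assigned)
```

In practice, barcode sets are usually chosen so that any two barcodes differ at several positions, which keeps a single sequencing error from assigning a read to the wrong sample.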

Figure 1 Schematic of library-free BSPP protocol. Each padlock probe has a common linker

sequence flanked by two target-specific capturing arms (red) that anneal to bisulfite converted

genomic DNA (black). The 3′ end is extended and ligated with the 5′ end to form circularized

DNA. After removal of linear DNA, all circularized captured targets are PCR-amplified with

barcoded primers and directly sequenced with an Illumina sequencing platform (GA II(x) or

HiSeq). Amplicon size is 363 bp, which includes captured target (180 bp), capturing arms (55

bp), and amplification primers and adapters (128 bp). The inserts can be read through with

paired-end 120 bp sequencing reads. Reprinted with copyright license (Diep et al., 2012).

In addition, molecular inversion probes can be multiplexed; that is, tens of

thousands of genomic loci can be enriched simultaneously (Diep et al., 2012; Hardenbol et

al., 2003; Porreca et al., 2007; Shen et al., 2013). A major challenge with molecular

inversion probes has been the reproducibility of probe hybridization. When molecular inversion

probes were used to enrich for exon regions of genomic DNA, they

performed with low sensitivity and low reproducibility. The authors of that study reported that in

duplicate experiments, only 20% of targets were captured in both data sets, and only 11%

were exon regions. Probe design has since been refined to improve target

amplification. Diep et al. (2012) were able to sequence 330,000 probes that covered

140,749 non-overlapping regions with a total size of 34 megabases. Due to the large

genomic areas which can be targeted and amplified using padlock probes, this technique

couples well with a next generation sequencing platform such as Illumina MiSeq.

EPIGENETIC MECHANISMS: LACTASE EXPRESSION IN MICE

The lactose provided by a mother’s milk is a major dietary source of carbohydrates

for mammals in the first few weeks of life. Lactose is digested by the enzyme lactase, which

is produced exclusively by the enterocytes of the small intestinal epithelium. As

offspring age, the small intestine develops in preparation for the consumption of solid food,

characterized by changes in enzymatic expression. Generally, lactase expression decreases

as the expression of digestive enzymes such as sucrase-isomaltase (sucrase) increases (Jost,

Duluc, Vilotte, & Freund, 1998). Initially, lactase is expressed in highest levels in the jejunal

segment of the small intestine, but expression rapidly decreases just prior to weaning in

most mammals. Sucrase has a contrasting temporal pattern of expression compared with

lactase, in that it is expressed minimally at birth, and then drastically increases around

weaning. Coincidentally, the expression of sucrase is also highest in the jejunal segment of

the small intestine (L. Fang, Ahn, Wodziak, & Sibley, 2012).

The onset of decreases in lactase expression and the coinciding increase in sucrase

expression can be influenced by nutritional and hormonal changes at the time of weaning.

For example, the decrease in lactase expression can be delayed or accelerated by the

delayed or premature introduction of solid food, respectively (Jost et al., 1998). In addition,

the enzymatic changes of the small intestine can be induced by stress, which also

accelerates the decrease in lactase expression (Jost et al., 1998). Previously, it has been

investigated whether the temporal patterns of lactase and sucrase expression could be

regulated by transcription factors capable of regulating the expression of intestinal lactase

and sucrase genes (R. Fang, Olds, & Sibley, 2006). A number of transcription factors which

activate lactase and sucrase, including Cdx-2 (R. Fang et al., 2006; Suh, Chen, Taylor, &

Traber, 1994), GATA-4, GATA-5, and GATA-6 (R. Fang, Olds, Santiago, & Sibley, 2001), along

with HNF1α and HNF1β (R. Fang et al., 2006), have been examined in the small intestine of

young and old mice, and compared with the expression of the repressing transcription factors

PDX-1 and CDP. No clear pattern of transcription factor expression has yet been identified

to account for the contrasting temporal patterns of lactase and sucrase expression.

However, spatial and temporal expression patterns may be due to a distinct combination of

regulatory transcription factors acting in concert (R. Fang et al., 2006).

Intestinal epithelial cells are essential for food absorption and digestion.

Enterocytes account for 90% of all epithelial cells and are responsible for the production of

lactase (Sheaffer et al., 2014). The intestinal epithelium is constantly maintained through

the division of stem cells, which differentiate as they migrate from the intestinal crypt

region to the villus. This process takes 3-5 days in both mice and humans, and allows for

function in the intestinal epithelium to be continually maintained (Sancho, Batlle, &

Clevers, 2004; Sheaffer & Kaestner, 2012).

Epigenetic mechanisms, specifically DNA methylation, have been shown to be

critical for proper gene expression during stem cell differentiation in self-renewing tissues,

such as the small intestine (Sen et al., 2010; Smith & Meissner, 2013), and the functional

role of DNA methylation in intestinal epithelial differentiation has been recently evaluated

(Sheaffer et al., 2014). Interestingly, global interruption of DNMT1 activity in the small

intestine resulted in a decrease in differentiated enterocytes (Sheaffer et al., 2014). As a

result, expression of enterocyte-specific genes, such as lactase, was decreased. Thus, DNA

methylation plays a regulatory role not only in the differentiation of stem cells into

enterocytes, but also in the expression of the gene products (such as lactase) for which they are responsible

(Sheaffer et al., 2014).

AIM OF THE STUDY- EPIGENETIC MECHANISMS IN LACTASE

EXPRESSION IN MICE

As noted above, the temporal pattern of lactase expression is affected by

environmental stimuli, such as the introduction of complex adult food or stress. In addition,

the post-weaning decrease in lactase expression coincides with an increase in sucrase

expression and the mechanism of this counter-expression has yet to be fully understood.

Since lactase expression is influenced by environmental factors and a significant decrease

occurs at a specific developmental stage, it is possible that regulation of lactase expression

is mediated by an epigenetic mechanism. As such, the overall aim of this study was to

determine whether DNA methylation levels vary between pre-weaned and post-weaned

mice and whether methylation levels are associated with levels of lactase expression. The

first step in determining how methylation levels correlate with lactase expression was to

examine the change in lactase expression in young and old mice.

EPIGENETIC MECHANISMS: LACTOSE INTOLERANCE IN HUMANS

Lactose is hydrolyzed by enterocytic lactase into two absorbable sugars, glucose and

galactose, within the small intestine. Lactase activity is highest during the perinatal period,

but significantly decreases in some individuals after 2-12 years of age. Those who

experience this decrease in lactase expression with age lose their ability to digest lactose,

developing the condition called lactose intolerance or hypolactasia (Troelsen, 2005). In

contrast, lactase-persistent individuals retain neonatal levels of lactase activity throughout

adulthood, allowing these adults to consume lactose without negative effects (Rasinpera et

al., 2005; Troelsen, 2005). Lactase persistence shows dominant inheritance and complete

phenotype concordance in monozygotic twins (Metneki, Czeizel, Flatz, & Flatz, 1984). Yet in

those who are lactose intolerant, lactose is fermented by the colonic microflora, creating

short-chain fatty acids, hydrogen, carbon dioxide and methane. These byproducts

unfortunately cause bloating, flatulence, diarrhea and abdominal pain (Lomer, Parkes, &

Sanderson, 2008).

Numerous tests may be used to diagnose lactose intolerance. The Quick Lactase Test

(QLT) is a biochemical assay following a duodenal biopsy, capable of diagnosing the

intolerance. A glucose oxidase reagent is used in the assay to measure the amount of

glucose released after the hydrolysis of lactose (Kuokkanen et al., 2006). Lactase activity

can then be assessed by the color change of the reaction: a no-color reaction indicates

severe hypolactasia, with lactase activity of <10 U/g; light blue reactions indicate mild

hypolactasia corresponding to lactase activity of 10-30 U/g; and finally, normal lactase

activity of >30 U/g corresponds to a deep blue reaction color (Furnari et al., 2013). Invasive

jejunal biopsies were replaced with the less-invasive endoscopic duodenal biopsies for the

purposes of the QLT, which facilitated patient diagnosis (Furnari et al., 2013; Kuokkanen et

al., 2006). Although the mean lactase activity in the duodenum is 40% lower than in the

jejunum, this test was still effective at identifying patients with severe hypolactasia with

100% accuracy (Kuokkanen et al., 2006). Alternatively, tests that indirectly measure

lactase function have been developed to avoid the need for intestinal biopsies. In one such

test, blood glucose levels were measured before and after oral lactose ingestion, as well as

at specific time intervals. The individual is considered lactose tolerant if there is a

minimum blood glucose rise of 20 mg/dL (Law, Conklin, & Pimentel, 2010). As another

example, the hydrogen breath test was considered to be the most suitable test for

population screening of lactose intolerance (Mattar, de Campos Mazo, & Carrilho, 2012;

Newcomer, McGill, Thomas, & Hofmann, 1975). For this test, an individual orally consumes

50 g of lactose, which is the equivalent of 4-5 cups of milk. If the individual is lactose

intolerant, the undigested lactose ferments, releasing hydrogen, carbon dioxide, and

methane which are absorbed and eliminated by the lungs. High levels of hydrogen

exhalation after lactose consumption therefore indicate that an individual is lactose

intolerant (Law et al., 2010; Mattar et al., 2012; Newcomer et al., 1975). Genetic testing for

the lactase expressing phenotype is now used to screen populations for lactose intolerance

(Mattar et al., 2012).
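
The QLT categories described earlier in this section can be summarized as a simple lookup. The following is a minimal sketch rather than a clinical tool: the function name is hypothetical, and the handling of a value of exactly 30 U/g (assigned here to mild hypolactasia) is an assumption, since the source ranges do not state which category the boundary belongs to.

```python
def classify_qlt(activity_u_per_g):
    """Map duodenal lactase activity (U/g) to the QLT category and reaction colour.

    Thresholds follow the ranges quoted in the text; the treatment of exactly
    30 U/g as 'mild' is an assumption made for this illustration.
    """
    if activity_u_per_g < 0:
        raise ValueError("lactase activity cannot be negative")
    if activity_u_per_g < 10:
        return ("severe hypolactasia", "no colour")
    if activity_u_per_g <= 30:
        return ("mild hypolactasia", "light blue")
    return ("normal lactase activity", "deep blue")


example_results = [classify_qlt(value) for value in (4.0, 18.0, 42.0)]
print(example_results)
```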

The lactase gene in humans, LCT, is 49.3 kb in length and is located on the long (q) arm

of chromosome 2 at position 21. LCT is made up of 17 exons, which are transcribed into a 6 kb

transcript (Boll, Wagner, & Mantei, 1991). The RNA sequence in this 6 kb transcript is the

same (barring some silent mutations) in individuals with either hypolactasia or lactase-

persistence (Boll et al., 1991). However, two DNA variants have been identified upstream of LCT,

in introns of the MCM6 gene, which have been highly associated with lactase-persistence in subjects of European

descent (Enattah et al., 2002; Troelsen, 2005). The first variant, LCT-13910*C/T, is located

in intron 13 of the MCM6 gene, 13,910 bp upstream from the initiation codon for LCT. In

subjects of European descent, simply having a single LCT-13910*T allele allowed for the

lactase persistent phenotype (Mattar et al., 2012; Rasinpera et al., 2005; Troelsen, 2005).

This indicates that the lactase persistence allelic variant behaves in a dominant fashion, and

thus the LCT-13910*CC genotype, in which no lactase persistent allele is present, is

consistent with the inability to digest lactose (Mattar et al., 2012). The second variant, LCT-

22018*G/A, occurs at intron 9 of the MCM6 gene located 22,018 bp upstream of the LCT

start codon. This second variant was very strongly, but not completely, associated with the

lactase persistence phenotype. Although the second variant’s association with the lactase

persistence phenotype is incomplete, there is almost full agreement in the genotyping of

the two variants. For example, individuals with a cytosine on both copies of chromosome

two at the first SNP site (LCT-13910*CC) also had a guanine on both copies at the second SNP site (LCT-

22018*GG), and individuals who were heterozygous at the first SNP site (LCT-13910*CT)

were also heterozygous at the second SNP site (LCT-22018*GA). Individuals with the

lactose tolerance phenotype who had a thymine nucleotide at the first SNP site on both copies of

chromosome two (LCT-13910*TT) also had an adenine at the second SNP site on both copies

of chromosome two (LCT-22018*AA) (Rasinpera et al., 2005). These two

genetic variants are only associated with lactase persistence in individuals of Northern-

European descent. In contrast, LCT-13907*G, LCT-13915*G, and LCT-14010*C are SNPs

which are associated with lactase persistence in African and Arabian populations (L. Fang

et al., 2012; Ingram et al., 2007).

Although SNPs may be correlated with lactase non-persistence, there is substantial

variability in lactase expression between those who are homozygous non-persistent (LCT-

13910*CC), and those who are heterozygous (LCT-13910*CT) or homozygous persistent (LCT-13910*TT)

(Troelsen, 2005). Previous research indicates that the LCT gene resides in a region with

high ‘linkage disequilibrium’ which extends over several hundred kilobases (Poulter et al.,

2003), suggesting that the key SNPs are merely highly associated markers rather than

causal agents (Swallow, 2003). Linkage disequilibrium refers to a combination of alleles or

genetic markers that do not undergo random recombination and are present at a higher

frequency in the population than would be expected by chance. In vitro studies have

demonstrated that lactase persistence variants may act as enhancers of the LCT promoter

by improving the binding of Oct-1 and HNF1α transcription factors. Transgenic mice have

previously been used to determine how these lactase persistence-associated SNPs affect

lactase expression in vivo (L. Fang et al., 2012). These mice carried a luciferase reporter gene driven

by a rat 2 kb lactase gene promoter fused with portions of human DNA sequence

corresponding to either the lactase persistence-associated LCT-13910*T, or the non-

persistence-associated LCT-13910*C SNP. It was found that the transgene

followed the same expression pattern as endogenous lactase, being expressed only in the

enterocytes of the small intestine with highest expression occurring in newborn pups,

followed by a sharp post-weaning decline (Lee et al., 2002). In this study, there was a 16-fold

decrease in luciferase activity in adult mice with the LCT-13910*C SNP compared with

pups, whereas adult mice with the LCT-13910*T SNP experienced a ~1.6-fold increase in

luciferase expression compared with pups. These results support the causal role of the

LCT-13910*C/T SNPs in determining the lactase persistence/non-persistence phenotype,

although the mechanism by which these SNPs control lactase expression has yet to be fully

characterized (L. Fang et al., 2012).

AIM OF THIS STUDY- EPIGENETIC TECHNIQUES IN LACTOSE

INTOLERANCE

The theory that lactase persistence variants improve transcription factor binding

does not fully explain hypolactasia. For instance, genetic variants cannot account for the

ontogenic delayed-age-of-onset associated with the condition. In other words, no

explanation is offered as to why lactase expression only decreases after a certain age. The

age of onset of the condition and the variability in lactase down regulation between the

three genotypes support the hypothesis that additional factors, likely epigenetic,

contribute to the specifics of lactase expression (Swallow, 2003).

The aim of the current

study was therefore to evaluate the use of the molecular inversion probe technique to enrich unique

and segmentally duplicating regions of DNA, with the goal of validating previous data

obtained through microarray analysis.

HYPOTHESES AND RATIONALE

HYPOTHESIS 1

Lactase expression decreases in 60 day old mice, as compared with 6 day old mice.

RATIONALE 1

Developmental changes in rodent lactase expression have been well investigated.

Our objective was to demonstrate that the developmental decrease in lactase expression is

governed in part by an epigenetic mechanism. To that end, we first performed a gene

expression assay on the small intestinal biopsy of both 6 and 60 day old mice to

demonstrate that lactase expression decreases with age. Relative lactase gene expression in

the proximal jejunum, distal jejunum and ileum segments of the small intestine was

compared in 6 and 60 day old mice.

HYPOTHESIS 2

Molecular inversion probes were used to interrogate the methylation status of

unique and segmentally duplicating regions in the genome of experimental subjects, to

provide high resolution validation of previous microarray data.

RATIONALE 2

Dr. Viviane Labrie, of our laboratory, previously performed expression assays along

with genotyping experiments, to determine whether lactase expression was correlated

with genotype. An mTAG enrichment followed by microarray hybridization was performed,

revealing four unique regions and four segmentally duplicating regions in chromosome 2

for which the methylation profile strongly correlated with lactase expression. The

gastrokine 2 (GKN2) gene and the first LCT exon were among the unique regions in which

methylation levels correlated with lactase expression. Segmental duplications are blocks

of genomic DNA ranging from 1-200 kb in length that contain sequence features including

high-copy repeats and gene sequences with intron-exon structure. These blocks of DNA can

repeat hundreds of times throughout the genome, and can repeat inter- or intrachromosomally,

or both (Bailey et al., 2001).

Regions of segmental duplication hence pose complications for interrogation. For

instance, when designing primers for the lactose intolerance regions, it must be kept in

mind that primers will easily anneal to the segmental duplication as it appears on several

different chromosomes in the genome. However, since the regions are only 90% identical,

amplicon sequences from each segmental duplicate will have variation. This amplicon

variation will cause the pyrosequencing to fail, since ~100% homology between predicted

sequence and amplicon is required.

Since we were attempting to interrogate the methylation levels of regions within

segmental duplications in this experiment, next generation sequencing was necessary as it

does not require 100% sequence homology of amplicons (since each amplicon is

sequenced individually). Due to the repetitive nature of the regions of interest and limited

DNA available, molecular inversion probes were selected as the enrichment method. In

sum, this experiment evaluated the efficacy of molecular inversion probe enrichment

followed by next-generation sequencing to interrogate the methylation profiles of unique

and segmentally duplicating regions within the genome, with lactase expression serving as

our biological context.


METHODS

RNA EXTRACTION FROM TISSUE

The proximal and distal jejunum, along with the ileum of the small intestine from

C57BL/6 strain mice (fifteen 6 day old and fifteen 60 day old) were harvested and

immediately frozen in liquid nitrogen. Samples were then stored at -80°C. RNA was

extracted from mouse small intestinal tissue with an ‘RNeasy Mini Kit’ (Qiagen). 20-30mg

of tissue was placed in a homogenization tube (Precellys 2mm beads) containing 600µl

RNeasy lysis buffer. Tissue was immediately homogenized at 5000 rpm in two 15 second

periods, spaced 10 seconds apart. The lysis solution was then transferred to the RNeasy

column, with the following steps performed according to the manufacturer’s instructions.

RNA was eluted in RNase-free water, prior to treatment with DNase I (Qiagen) for one hour

to remove DNA from the samples. Treated RNA samples were then re-purified through a

fresh RNeasy column, and again eluted in RNase-free water. RNA sample quantity was

verified using a NanoDrop 2000 spectrophotometer. The quality of the RNA was

determined using the Agilent 2100 Bioanalyzer, which uses microcapillary

electrophoretic RNA separation and an algorithm to allow for the calculation of an RNA

integrity number (RIN). The RIN score ranges from 10 (completely intact) to 1 (completely

degraded). All samples had a RIN score >8. Samples were stored at -80°C until used

further.

CDNA SYNTHESIS


cDNA synthesis was performed using a High Capacity RNA-cDNA kit (Applied

Biosystems). This kit uses MultiScribe™ MuLV reverse transcriptase, which optimally

works at a temperature of 37°C. Included in the reaction are deoxynucleotide triphosphates

(dNTPs), random octamer primers, and oligo dT-16.

For each reaction, 1-2µg RNA in a total volume of 20µl was used. To each RNA sample,

10µl of the 2x reaction buffer (Applied Biosystems) was added along with 1µl of the MuLV

reverse transcriptase. Samples were gently mixed and briefly centrifuged before being

incubated at 37°C for 60 minutes, followed by heating at 95°C for 5 minutes. After the

completion of the reaction, samples were diluted to 10ng/µl and stored at -80°C until

further use.

REAL-TIME PCR

Real-time PCR allows for the amplification of products to be detected as the reaction

progresses, with measures taken at the end of each cycle. Fluorophores emit a fluorescent

signal proportionate to the amount of DNA produced, permitting data to be collected in

'real time' rather than at the end of PCR.

For our real-time PCR analysis, we used Taqman™ Probes. In addition to two

primers which are designed to amplify a specific region, there is a third oligo called a probe,

which is target specific and sits on the region to be amplified between the primers. Two

molecules, the reporter dye and the quencher, are covalently bound to the probe. The

reporter sits on the 5' end of the probe, and is what produces the fluorescent signal as more

DNA product is produced. The quencher sits on the 3' end of the probe and inhibits

(“quenches”) the signal of the reporter dye whenever they are in close proximity. During

the PCR reaction, the 5' nuclease activity of the polymerase hydrolyzes the probe as it replicates

the amplicon, releasing the reporter dye from the quencher. This permanent separation

allows the reporter dye to emit its signal unimpeded. Since the probe is target specific, the

detected fluorescent signal comes only from amplification of targets and not from anything

non-specifically amplified.

‘TaqMan Gene Expression Mastermix’ (Applied Biosystems) was used to evaluate

transcript levels of target genes. 40ng of cDNA template was used in each PCR reaction.

Measurements were performed in triplicate, and controls included both no-template

controls and samples which had not been treated with reverse transcriptase.

Amplification curves and gene expression were normalized to the housekeeping gene

GAPDH. The expression of each specific gene was determined using the ‘Assay-on-Demand’

gene expression products (Applied Biosystems) Lactase (Mm01285112_m1) and GAPDH

(Mm99999915_g1). Each reaction had a final volume of 20 µl, and all reactions were

performed in 384-well microtiter plates. PCR amplification and fluorescence data

collections were performed using the ViiA™ 7 Real-Time PCR System (Applied Biosystems).

PCR program parameters included a hold stage of 2 min at 50°C, denaturation for 5 min at

95°C, and 40 cycles of amplification. Amplification cycles consisted of 15 sec at 95°C,

followed by 1 min at 60°C.

HUMAN MOLECULAR INVERSION PROBE SAMPLE SELECTION

Samples to be interrogated by molecular inversion probe enrichment were selected

based on several factors. Whole blood samples were selected based on lactase activity as

measured by the QLT, to include equal numbers of human individuals with low and high

lactase activity. Age and gender were normalized across groups. Enterocyte and jejunum


samples were selected based on LCT mRNA levels as measured using quantitative PCR.

Samples were selected to include equal numbers of samples with high and low LCT mRNA.

Age and gender were normalized across groups. Samples selected are summarized in

Table 3 and Table 4.

Table 3. Summary of blood sample selection. No significant differences in age or gender existed between samples with high vs. low lactase activity as determined by Student’s T-Test (p>0.05).

Blood             High Lactase Activity    Low Lactase Activity
Mean Age ± SEM    39.5 ± 3.79              39.5 ± 3.29
Males             8                        8
Females           4                        4

Table 4. Summary of jejunum and enterocyte sample properties used. * indicates significantly lower lactase mRNA as indicated by Student’s T-Test (p<0.05) in jejunum samples categorized as ‘Low LCT mRNA’, as compared with samples categorized as ‘High LCT mRNA’. No significant differences in age or gender existed between samples with high vs. low lactase activity as determined by Student’s T-Test (p>0.05).

                                Jejunum                            Enterocyte
                                High LCT mRNA    Low LCT mRNA      High LCT mRNA    Low LCT mRNA
Mean Relative Quantity ± SEM    0.9 ± 0.112      0.206* ± 0.038    1.25 ± 0.51      0.18 ± 0.68
Mean Age (yrs) ± SEM            39 ± 2.12        39.92 ± 2.59      35.5 ± 6.5       31.5 ± 10.5
Males                           2                2                 0                0
Females                         10               10                2                2


MOLECULAR INVERSION PROBES

Probes were designed using ppDesigner (Diep et al., 2012). Probes were designed to

span the forward and reverse strands of specific genomic regions not masked by

RepeatMasker. Ligation and extension arm lengths varied from 15 nt to 25 nt, with

the total length of both arms always equaling 40 nt. Coordinates of interrogated regions

from Genome Browser Human Genome Assembly NCBI36/hg18 are outlined in Table 5.

Table 5. Genome Browser Human Genome Assembly NCBI36/hg18 coordinates of target regions interrogated by molecular inversion probes.

Region                     Chromosome    Starting Coordinate    Ending Coordinate
Unique 1                   Chr2          69030436               69032400
Unique 2                   Chr2          136260090              136260554
Unique 3                   Chr2          136309688              136311228
Unique 4                   Chr2          141643329              141644763
Segmental Duplication 1    Chr2          130533089              130540336
                           Chr2          130923685              130931342
                           Chr2          131747376              131752487
                           Chr2          131757424              131763913


A total of 60 probes were synthesized by Integrated DNA

Technologies (IDT), diluted in water to a concentration of 0.5pM, and stored at -20°C. Samples

which had undergone whole genome amplification were used as controls for bisulfite

conversion. Whole genome amplification (WGA) was performed using phi29 DNA

polymerase (Thermo Scientific). Forty ng of DNA per sample was amplified using the

following procedure: DNA, exonuclease resistant primers (Thermo Scientific), and phi29

reaction buffer (Thermo Scientific) were mixed for a final sample volume of 17µl. The

reaction mix was slowly heated from a starting temperature of 35°C, increasing

by 1 degree per minute for 60 minutes, reaching a final temperature of 95°C. After this step,

deoxynucleotide triphosphates (dNTPs), bovine serum albumin (Thermo Scientific),

phi29 polymerase (Thermo Scientific), and pyrophosphatase (Thermo Scientific) were

added (reaching a final volume of 20µl) and left to amplify at 30°C for 6 hours. Twenty-four

blood samples, twenty-four jejunum samples, and four samples of enterocyte DNA were

interrogated in this experiment, including four blood and two jejunum samples which were

run with a single replicate. Two blood samples were used as the WGA control. 1µg of DNA

from each sample (as well as WGA controls) was treated with sodium bisulfite using an ‘EZ

DNA Methylation-Lightning Kit’ (Zymo Research). Bisulfite conversion was

performed according to the manufacturer’s instructions. Samples were diluted in RNase-free

water to a concentration of 40ng/µl.

6µl of 0.5pM probes (IDT) were added to 10µl of bisulfite converted DNA (400ng),

along with 1x Ampligase buffer (Mandel) in a 96-well plate. Probes covering segmentally

duplicating regions were hybridized and amplified separately from probes covering unique

regions of the DNA. The reaction mix was denatured at 95°C for 10 minutes, followed by a


decrease in temperature of 1°C/minute until 55°C was reached. Samples were incubated at 55°C

for 5 hours, before the temperature again decreased by 1°C/minute until 50°C was reached.

Samples were again incubated for 5 hours, followed by a decrease in temperature of 1°C/minute

to 45°C, and were incubated at this temperature for 20 hours. 2.5µl of HLN mix (18mM

dNTP, 0.5U/µl Ampligase (Epicentre) in 1x Ampligase buffer, and ~2.5U/µl of Hemo

KlenTaq (New England Biolabs)) was added to the reaction for gap-filling. For

circularization, the reactions were incubated at 45°C for 5 hours. The temperature was then

increased by 1°C/minute to 50°C and held for 5 hours, then increased again by

1°C/minute to 55°C and held for another 5 hours. This was immediately

followed by enzyme inactivation at 94°C for 2 minutes. To digest linear DNA after

circularization, 3µl of exonuclease mix (20U/µl of exonuclease I and 200U/µl of

exonuclease III; New England Biolabs) was added to the reactions, and the reactions were

incubated at 37°C for 1 hour followed by enzyme inactivation at 94°C for 2 minutes. Linear

DNA digestion was then repeated once. 2µl of circularized DNA was amplified

and barcoded in 50µl

reactions using 25µl ‘Phusion High-Fidelity 2x Master Mix’ (NEB) and 4mM of forward and

reverse indexing primers. Reverse primers carried unique barcodes, 6

nucleotides in length, by which individual samples could be identified. The forward primer

sequence was 5’-CAGATGTTATCGAGGTCCGAC-3’, and the reverse primer sequence was

5’-GGAACGATGAGCCTCCAAC-3’.

Circularized DNA was amplified using the following temperature sequence: 98°C for

30 seconds, followed by 35 cycles of 98°C for 10 seconds, 58°C for 30 seconds, and 72°C for


30 seconds. The program terminated following 5 minutes at 72°C. Five µl of each product

was run on an agarose gel to confirm that the product was the expected 258 base pairs in length.

Amplicons were purified using 0.7x Ampure beads, to size select for the 258 base pair

amplicon.

SEQUENCING

Once purified, the enriched amplicons were sequenced on the Illumina MiSeq platform,

which uses cyclic reversible termination to produce sequencing reads. In this method, a fluorescently

modified nucleotide complementary to the template is incorporated by a DNA polymerase. After

the nucleotide is incorporated, polymerase activity is terminated and the remaining

unincorporated nucleotides are washed away. An image is then captured to determine

the identity of the incorporated nucleotide. The fluorescent dye and

the terminating/inhibiting group of the nucleotide are then cleaved and removed by an additional wash

before the next incorporation step begins (Metzker, 2010). The Illumina MiSeq platform

provides 1.5-2 Gb of sequence per run, with an observed raw error rate of 0.80%. It

provides reads which are up to 150 bases in length from both sides of the amplicon,

producing what is known as a paired read (Quail et al., 2012).

STATISTICAL ANALYSIS

The following formulas were used to compare the relative expression of the target

gene in the samples.


ΔCT (Normalized expression) = CT (target gene) – CT (GAPDH)

ΔΔCT (Relative expression) = ΔCT (Target) – ΔCT (Reference)

Fold Difference (Relative Abundance) = 2^(-ΔΔCT)

Differences in lactase expression were analyzed using two-tailed Student’s T-Tests.

All analyses and figures were produced using Microsoft Excel 2013.
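
As an illustration of the relative expression calculation, the following Python sketch computes a fold difference from CT values. It assumes the standard comparative CT convention, in which ΔCT is the target-gene CT minus the housekeeping-gene CT. The function names and the example CT values are hypothetical and are not data from this study.

```python
def delta_ct(ct_target: float, ct_housekeeping: float) -> float:
    """Normalize the target-gene CT to the housekeeping-gene CT (e.g. GAPDH)."""
    return ct_target - ct_housekeeping


def fold_difference(ct_target_sample: float, ct_housekeeping_sample: float,
                    ct_target_reference: float, ct_housekeeping_reference: float) -> float:
    """Relative abundance of the target in a sample versus a reference sample."""
    dd_ct = (delta_ct(ct_target_sample, ct_housekeeping_sample)
             - delta_ct(ct_target_reference, ct_housekeeping_reference))
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: the target crosses threshold one cycle later in the
# sample than in the reference, while the housekeeping gene is unchanged.
print(fold_difference(25.0, 18.0, 24.0, 18.0))  # 0.5
```

A one-cycle delay in the target CT corresponds to half the relative abundance, which is why the hypothetical example yields 0.5.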

MOLECULAR INVERSION PROBE ANALYSIS

Molecular inversion probe analysis was performed as previously described (Diep et

al., 2012). Initially, all C’s were converted to T’s in the reference genome to create a

reference ‘bisulfite’ genome. This was done separately for the Watson and Crick

strands. The sequencing reads were in FASTQ format and were encoded by predicting

the mapping orientation of each read. Once the orientation was predicted, the reads were

‘bisulfite converted’: all C’s were converted to T’s in forward mapping reads, and all G’s

were converted to A’s in predicted reverse complementary reads. SOAP2Align was used to

map the bisulfite reads to the converted genome. Once one alignment per read was

selected, the original cytosine calls were placed back into the alignment information.

Alignments were then converted to pileup format using SAMtools. bisReadMapper was

then used to call methylation frequencies.
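
The in silico conversion and the subsequent methylation call can be sketched in Python as follows. This is a simplified illustration and not the bisReadMapper implementation: the function names are hypothetical, and the methylation call assumes a gap-free forward alignment that begins at the first reference base.

```python
from typing import Dict


def convert_forward(seq: str) -> str:
    """In silico bisulfite conversion of a forward-mapping read: every C becomes T."""
    return seq.upper().replace("C", "T")


def convert_reverse(seq: str) -> str:
    """Conversion of a read predicted to map as reverse complement: every G becomes A."""
    return seq.upper().replace("G", "A")


def methylation_calls(reference: str, original_read: str) -> Dict[int, bool]:
    """Compare the original (unconverted) read with the reference.

    At each reference cytosine, a C in the read indicates a methylated base
    (protected from bisulfite), and a T indicates an unmethylated base.
    Positions are 0-based; any other read base is skipped.
    """
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), original_read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[pos] = read_base == "C"
    return calls


reference = "ACGTCGA"
read = "ACGTTGA"  # hypothetical read: first C retained, second C converted to T

# After conversion the read and the reference are identical, so the read can be aligned.
print(convert_forward(reference) == convert_forward(read))  # True
print(methylation_calls(reference, read))  # {1: True, 4: False}
```

Converting both the reads and the reference removes the C/T mismatches that would otherwise prevent alignment, and restoring the original cytosine calls afterwards is what allows the methylation state to be read off.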

CORRELATION OF METHYLATION LEVELS BETWEEN TECHNICAL REPLICATES

Correlation between technical replicates was calculated using Pearson’s correlation

coefficient on all CpG sites identified in both replicates. The methylation frequencies

obtained from bisReadMapper for CpG sites with at least a read depth of 10 in both samples


were input into the statistical package R. Pearson’s correlation for the two replicates was

computed using the cor.test function.
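
A minimal Python sketch of the same filtering and correlation steps is given here for illustration. The analysis itself was run in R with cor.test; the data layout used below (a mapping from CpG site to a methylation frequency and a read depth), the function names, and the example values are assumptions, and only the coefficient is computed, not the associated p-value.

```python
import math
from typing import Dict, List, Tuple

Call = Tuple[float, int]  # (methylation frequency, read depth)


def shared_sites(rep1: Dict[str, Call], rep2: Dict[str, Call],
                 min_depth: int = 10) -> Tuple[List[float], List[float]]:
    """Paired methylation frequencies for sites meeting the depth cutoff in both replicates."""
    sites = sorted(s for s in rep1
                   if s in rep2 and rep1[s][1] >= min_depth and rep2[s][1] >= min_depth)
    return [rep1[s][0] for s in sites], [rep2[s][0] for s in sites]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (undefined if either series is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical replicate data keyed by CpG coordinate.
replicate_1 = {"chr2:100": (0.80, 25), "chr2:200": (0.10, 12),
               "chr2:300": (0.55, 4), "chr2:400": (0.40, 18)}
replicate_2 = {"chr2:100": (0.75, 30), "chr2:200": (0.20, 15),
               "chr2:300": (0.50, 22), "chr2:400": (0.45, 11)}

x, y = shared_sites(replicate_1, replicate_2)  # chr2:300 is dropped (depth 4 in replicate 1)
print(len(x), round(pearson_r(x, y), 3))
```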

ANALYSIS OF METHYLATION AND LACTASE EXPRESSION

Two-tailed Spearman correlation test with Bonferroni correction for multiple

comparisons was performed on each CpG site with a minimum of 10x depth coverage in a

minimum of six samples to determine whether methylation status was correlated with

lactase expression.
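
The coverage filter and the Bonferroni adjustment described above can be sketched in Python as follows. The function names and example values are hypothetical; the sketch takes p-values as given and does not perform the Spearman test itself, and the use of 439 as the number of tests is an assumption made only for the example.

```python
from typing import List


def has_adequate_coverage(depths: List[int], min_depth: int = 10, min_samples: int = 6) -> bool:
    """True if at least `min_samples` samples cover the site at `min_depth` or more."""
    return sum(1 for d in depths if d >= min_depth) >= min_samples


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical read depths for one CpG site across seven samples.
print(has_adequate_coverage([12, 15, 10, 30, 11, 10, 2]))  # True

# With a hypothetical 439 tests, a nominal p of 0.04 no longer passes.
print(0.04 < bonferroni_threshold(0.05, 439))  # False
```

This illustrates how a site that is nominally significant at p<0.05 can lose significance once the number of CpG sites tested is taken into account.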


RESULTS

LACTASE MRNA EXPRESSION IN MICE

The relative expression of lactase in the proximal jejunum of 6 day old mice (n=15)

and 60 day old mice (n=15) was compared. For each condition, measurements are

reported as mean ± SEM. The mean relative lactase

mRNA expression level observed in 6 day old mice (1.56 ± 0.104) was significantly higher

than that of 60 day old mice (1.18 ± 0.032) (p<0.05), as determined using Student’s T-Test

(Figure 2).

We then compared the relative lactase expression levels in the distal jejunum (n=3)

and ileum (n=3) of 6 day old mice vs. 60 day old mice (jejunum n=3; ileum n=3). Lactase

expression was significantly higher in the 6 day old mice (distal jejunum=

1.03±0.18, ileum= 1.04±0.21) than in 60 day old mice

(distal jejunum= 0.24±0.11, ileum= 0.37±0.02), as determined

by Student’s T-Test (p<0.05) (Figure 3).


Figure 2. Relative mRNA expression levels of lactase in the proximal jejunum of 6 day old mice (n=15) vs. 60 day old mice (n=15), represented as group means ± SEM. GAPDH and Actin were used as endogenous controls. The star indicates a significant difference in lactase expression (p<0.05) in 60 day old mice as compared with 6 day old mice as determined by Student’s T-Test.

[Figure 2: bar chart. X-axis: Age of Mice (6 Day, 60 Day). Y-axis: Relative mRNA Expression.]

Figure 3. Relative mRNA expression of lactase in the distal jejunum of 6 day old mice (n=3) and 60 day old mice (n=3), as well as the ileum of 6 day old mice (n=3) and 60 day old mice (n=3) represented as mean ± SEM. GAPDH and Actin were used as endogenous controls. The star indicates significant difference in lactase expression as observed using Student’s T-Test (p<0.05) in both segments of the small intestine of 60 day old mice as compared with 6 day old mice.

[Figure 3: bar chart. X-axis: Intestinal Segment (Distal Jejunum, Ileum). Y-axis: Relative mRNA Expression. Legend: 6 Day, 60 Day.]

MOLECULAR INVERSION PROBES: LACTOSE INTOLERANCE IN HUMANS

There were 405 CpGs across the targeted regions, corresponding to 810

interrogated cytosines after bisulfite conversion. Of these 810 cytosines, 439 had adequate

data to be analyzed, which was defined to be at least 10x coverage in at least 6 out of a total

of 54 samples. On-target probe binding and sequencing were low across all unique regions.

On-target probe binding refers to the percentage of total sequencing reads for a target

region that mapped back to the region. Figure 4 shows the low specificity of the

sequencing reads produced using molecular inversion probe enrichment followed by

sequencing. Unique regions refer to regions of the DNA that occur once on each copy of a

single chromosome. For example, the sequence defining the lactase gene is only found once

on each copy of chromosome 2. Regions are unique relative to the segmental duplications,

which can be found to repeat on multiple chromosomes. The highest on-target activity was

observed in unique region 4, with 31% of total sequence reads originating from that area.
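
For illustration, the on-target percentage as defined above can be computed as in the following Python sketch. The function name and the read counts are hypothetical and were chosen only to reproduce a value of 31%; they are not the read counts from this experiment.

```python
def on_target_percent(reads_mapped_to_region: int, total_reads_for_region: int) -> float:
    """Percentage of the reads produced for a target region that map back to that region."""
    if total_reads_for_region <= 0:
        raise ValueError("total_reads_for_region must be positive")
    return 100.0 * reads_mapped_to_region / total_reads_for_region


# Hypothetical counts: 310 of 1000 reads map back to the targeted region.
print(on_target_percent(310, 1000))  # 31.0
```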

Figure 5 shows the standard deviation among methylation levels at individual CpG

sites across samples that have been separated by tissue: jejunum, enterocyte and whole

blood. High variation was observed in methylation levels at individual CpG sites, with high

standard deviations (0-0.8) seen in the majority of regions and tissues. As shown in Figure

6, methylation levels in technical replicate samples showed very little correlation when

compared (Pearson correlation coefficient = 0.33). Methylation levels of CpG dinucleotides

from enterocyte and jejunum samples were compared with relative lactase expression

using two-tailed Spearman correlation test. The significance of these correlations is plotted

in Figure 7. Methylation at the cytosine with the coordinate “chr2:130926695” was found


to be significantly correlated (p<0.05) with lactase expression; however, the significance of

this trend disappeared after Bonferroni correction for multiple statistical comparisons.

Figure 4. Percentage of on-target sequence reads. Highest on-target activity occurred in Unique Region 4, with just under a third (31%) of on-target sequence reads.

[Figure 4: bar chart. X-axis: Unique Region (1, 2, 3, 4). Y-axis: On-Target Sequencing (%).]

Figure 5. a) Standard deviations in recorded methylation level (Y-axis) at individual CpG sites (X-axis) compiled across all samples and separated by tissue. Arrows indicate unique regions in the DNA. Image adapted from original created by Orion Buske. Image used with permission. b) Hypothetical image of the methylation standard deviation to be expected if gene expression were governed by the methylation status of a specific region.


Figure 6. Correlation plot of methylation levels at individual CpG sites averaged across replicate samples. Pearson correlation coefficient = 0.33. Image created by Orion Buske and used with permission.


Figure 7. Manhattan plot of the signed log10 (p) values, for the correlation between CpG-wise enterocyte and jejunum methylation vs. lactase expression as measured by Spearman correlation test. Methylation at a single CpG site at chr2:130926695 was significantly (p<0.05) correlated with lactase expression, but only before corrections for multiple comparisons (Bonferroni Correction). Arrows indicate unique DNA regions. Image adapted from original created by Orion Buske. Image used with permission.


DISCUSSION

LACTASE EXPRESSION IN MICE

The 1.32-fold decrease in lactase expression observed (Figure 2) in the proximal

jejunum of the 60 day old mice was not as large as the decrease reported in previous

studies. These studies observed a minimum 2.3-fold decrease in lactase expression in

old vs. young mice, using ages of 50 and 7 days for those conditions, respectively (R. Fang

et al., 2006). However, that work used a different strain of mouse, and only one biological sample

represented the lactase expression in each age group. Although there is little

indication to suggest that lactase expression varies by strain of mouse, this may be a factor

adding to the variation observed between the present results and previously published

data. In addition, our results suggest that the large fold decreases previously reported by

others may diminish with increased sample size. Indeed, studies using a greater sample

size (Duluc, Jost, & Freund, 1993) have noted that lactase expression decreased

significantly in all segments of the small intestine, except in the proximal jejunum of 90 day

old rats when compared with the lactase expression of 4 day old rats. The present study

comparing the lactase expression in the distal jejunum and ileum of young and old mice

demonstrated large fold decreases (distal jejunum= 4.32, ileum = 2.81) in lactase

expression across these regions (Figure 3), comparable to the fold changes observed

previously (distal jejunum= 2.5, ileum= 2.8) (R. Fang et al., 2006).
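
The fold decreases quoted above can be checked against the group means reported in the Results with a short Python calculation. This is a sketch using the rounded means as printed; the distal jejunum value obtained this way (about 4.29) differs slightly from the 4.32 quoted, which may reflect the use of unrounded means in the original calculation (an assumption).

```python
def fold_decrease(young_mean: float, old_mean: float) -> float:
    """Ratio of mean relative expression in young mice to that in old mice."""
    return young_mean / old_mean


# Group means (6 day old, 60 day old) as reported in the Results.
reported_means = {
    "proximal jejunum": (1.56, 1.18),
    "distal jejunum": (1.03, 0.24),
    "ileum": (1.04, 0.37),
}

for segment, (young, old) in reported_means.items():
    print(f"{segment}: {fold_decrease(young, old):.2f}-fold")
# proximal jejunum: 1.32-fold
# distal jejunum: 4.29-fold
# ileum: 2.81-fold
```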

The data obtained in this experiment demonstrate a mosaic pattern of lactase

expression within the small intestine, alongside enterocytic enzyme expression variation

throughout development depending on small intestinal segment (Figures 2 and 3). Other


enterocyte-specific genes, such as the sucrase-isomaltase gene and intestinal carbamoyl

phosphate synthase I, have also been previously shown to exhibit varying expression

throughout the length of the small intestine (Rings et al., 1994; Van Beers et al., 1998).

Region-specific epigenetic regulation may be a source of the high degree of variation

identified in enterocyte gene expression within different segments of the small intestine;

however, further studies are required.

MOLECULAR INVERSION PROBES: LACTOSE INTOLERANCE IN HUMANS

This study used molecular inversion probe enrichment followed by sequencing to interrogate

four unique and four segmentally duplicating regions of the DNA in which

methylation profiles had previously been shown to be significantly correlated with

enterocyte lactase expression (Dr. Viviane Labrie, Postdoctoral Fellow, Petronis

Laboratory, 2012, unpublished findings). The efficacy of these techniques was evaluated

using the performance parameters of specificity, sensitivity, uniformity, reproducibility,

and the amount of DNA required.

SPECIFICITY

It was found that the molecular inversion probe technique led to enrichment with

low specificity. Specificity, defined as the percentage of on-target enrichment, is essential

for obtaining informative data. As seen in Figure 4, only a small percentage of sequence

reads originated from the proper target location, indicating that the molecular inversion

probes did not bind specifically to the targeted regions. The low specificity observed in our

results is unusual for the molecular inversion probe technique. In fact, the technique had

been designed for probe binding in a highly specific manner, since the 3’ and 5’ binding


sites are both bound together and restricted locally (Hardenbol et al., 2003). Furthermore,

molecular inversion probe enrichment has previously been shown to be highly specific

when applied to genomic and bisulfite treated DNA (Diep et al., 2012; Porreca et al., 2007;

Shen et al., 2013). However, the reduced complexity of bisulfite treated DNA enables off-

target probe binding, and on-target binding values as low as 56% have previously been

reported (Diep et al., 2012). Probe design has been identified as a key factor in probe

binding specificity (Diep et al., 2012), so future experiments utilizing this technique may

begin with improving probe design. Probe design can be improved through the use of

ppDesigner, a program which has been developed to aid in the design of efficient molecular

inversion probes (Diep et al., 2012). This program was utilized in the current experiment;

however, there are various parameters that can be modified to improve molecular

inversion probe selectivity. Previous research indicates that increasing the melting

temperature of the ligation arm by increasing its guanine content and length can improve

probe capturing efficiency. In addition, shorter target sequences are captured with higher

efficiency than long target sequences (Deng et al., 2009). As such, future attempts at

molecular inversion probe design may include increasing the melting temperature of the

ligation arm, and decreasing the target sequence length to produce probes with higher

capturing efficiency.

In addition to improved probe design, increasing the stability of the probes may also

improve target capture. Betaine, a chemical that reduces the formation of GC-rich

secondary structures, has been included during the extension and ligation reactions to

minimize probe-to-probe interactions. The addition of betaine along with decreased

annealing time improved capture yields in similar techniques applied to genomic DNA


(Shen et al., 2011), and may also lead to similar improvements when applied to bisulfite

treated DNA.

SENSITIVITY

The measures of coverage, defined as the number of target regions captured, as well

as depth, defined as the number of reads representing each target, were used to assess the

sensitivity of the experimental techniques. Of the 810 interrogated cytosines, just over half (439)

were covered at least 10 times in at least six samples. This indicates not only that nearly half the

cytosines of interest failed to reach adequate coverage, but also that those which were covered were

sequenced at low depth (i.e., as little as 10x). As displayed in Figure 5, there was a high

standard deviation in the methylation measures of individual CpG sites across all samples

and tissues. In addition, Figure 5 highlights the scarcity of coverage of the molecular

inversion probe technique as portrayed by the lack of data points within the unique

regions. Although the coverage in the segmentally duplicated regions was high, so was the

standard deviation of methylation across samples. Segmentally duplicated regions and

unique regions were enriched separately and sequenced together. Due to the high

frequency of segmentally duplicating sequences in the genome relative to unique regions, a

higher percentage of sequence reads originated from segmentally duplicating regions. It is

possible that these regions unevenly occupied sequencing reactions, causing

disproportionately low sequencing of the unique regions. Sequencing the unique regions

separately from the segmentally duplicating regions may result in better coverage of the

unique regions. Overall, the enrichment and sequencing techniques did not perform with

high sensitivity.


A lack of sequencing depth was also observed in the unique regions of the data,

indicating that the selection of sequencing platform could also stand to be improved. While

the Illumina MiSeq platform used in this experiment produces 1.5-2 Gb of sequence reads

(Quail et al., 2012), other platforms such as Illumina HiSeq produce 30 Gb of sequence

reads, allowing for greater coverage and sequencing depth.

REPRODUCIBILITY

The high standard deviation of methylation levels and low correlation among samples characterize

the low uniformity and reproducibility of the techniques employed. Reproducibility of the

technique is low, as the Pearson correlation coefficient between technical replicates is 0.33

(Figure 6). Previous studies using this technique to enrich genomic DNA have found a

Pearson correlation coefficient of 0.54 between technical replicates (Porreca et al., 2007).

More recently, a Pearson correlation coefficient of 0.97- 0.98 has been achieved between

technical replicates through improved probe design (Diep et al., 2012). The low

correlation coefficients in the present study and in earlier work speak to the low reproducibility and

uneven coverage that can occur with the molecular inversion probe technique, as well as its

sensitivity to probe design.

The low reproducibility observed in the present study may be due to the nature of

molecular inversion probe enrichment in segmentally duplicated regions since they are

located in numerous parts of the genome. Due to the high homology of these sequences, the

molecular inversion probes designed to bind to regions within the second chromosome are

also theoretically capable of binding to the segmentally duplicating regions present on

different chromosomes. It therefore remains possible that the methylation statuses of CpG

sites originating from other chromosomes, outside the chromosome of interest


(chromosome 2) were inadvertently compared; this scenario would result in a low

correlation between replicates, just as observed.

In addition, sequencing reads from these regions are impossible to accurately map to

the correct point of origin due to their repetitive nature (Horvath, Schwartz, & Eichler,

2000). It is also a possibility that the high standard deviation observed in the segmentally

duplicated regions may reflect technical difficulties with properly mapping segmentally

duplicating regions, rather than an issue of biological relevance.

AMOUNT OF DNA REQUIRED

Although refinements to the molecular inversion probe and sequencing protocol are

required, a major benefit of the employed techniques was the ability to interrogate large

target regions using minimal amounts of DNA (400ng). Interrogating the same regions with

the traditional PCR technique would require several micrograms of each sample.

METHYLATION AND LACTASE EXPRESSION

Despite the high standard deviation and low correlation observed between technical

replicates, methylation level was compared with relative lactase expression. Figure 7

indicates that no correlations were significant, except for methylation at a single cytosine

within a segmentally duplicated region, but this significance did not withstand correction

for multiple comparisons. Due to the high standard deviation and low replication

correlation, post hoc analyses were not performed.
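
The loss of significance after correction for multiple comparisons can be sketched as follows. The correction method applied in this study is not restated here, so both Bonferroni and Benjamini-Hochberg adjustments are shown as assumptions; the raw p-values are invented and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented raw p-values for ten CpG sites; only the first is below 0.05.
raw = [0.03, 0.20, 0.35, 0.50, 0.61, 0.74, 0.80, 0.88, 0.93, 0.97]

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
print(f"raw = {raw[0]:.2f}, Bonferroni = {bonf[0]:.2f}, BH = {bh[0]:.2f}")
```

With ten tests, a single nominal p-value of 0.03 is adjusted to 0.3 under either method, illustrating how one isolated nominal result can fail to withstand correction.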

The results of the molecular inversion probe experiment were expected to

demonstrate a correlation between methylation profiles and lactase expression, as one of

the unique regions interrogated was the first exon of the LCT gene. Although the current

experiment did not provide conclusive evidence in this respect, recent research has

demonstrated that methylation in the first exon is tightly linked to transcriptional silencing

(Brenet et al., 2011). In fact, it has been shown that methylation in the first exon has a

higher correlation to transcriptional silencing than methylation in the promoter region, or

other intragenic regions downstream of the first exon (Brenet et al., 2011). The possibility

remains open that hypomethylation within the first LCT exon correlates with lactase

expression, which in turn is correlated with genotype.

The LCT-13910*C/T SNP is located ~14-kb upstream of the first LCT exon, and any

mechanism by which a SNP could influence methylation in an exon at such a distance has

not been fully elucidated. DNA sequence variants, however, have been shown to influence

DNA methylation and gene expression (Gertz et al., 2011). For example, autosomal genes

with SNP-induced allele specific methylation within 5-kb of a gene transcription start site

also show allele specific expression (Gertz et al., 2011). Although the LCT-13910*C/T SNP

is located significantly further than 5-kb from the lactase transcription start site, it is

possible that this SNP may influence methylation and subsequent lactase gene expression.

Although the influence of the LCT-13910*C/T SNP on DNA methylation remains

unclear, this SNP has been shown to influence lactase gene expression in vivo. The

post-weaning decline in lactase expression was impeded in transgenic mice carrying human

DNA fragments in which the -13910*T SNP was cloned upstream of a 2-kb rat lactase

promoter, but not in mice carrying the -13910*C SNP (L. Fang et al., 2012). These data suggest that despite

being 14-kb upstream of the lactase gene, the LCT-13910*C/T SNP may play a causal role in

lactase expression.

A second region investigated in the molecular inversion probe experiment is located

in the second exon of the gastrokine 2 (GKN2) gene. GKN2 expression occurs in epithelial

cells of the small intestine and stomach, and is responsible for the maintenance and repair

of the surface mucosa (Kim et al., 2014). Regions also interrogated include a unique region

500 base pairs downstream of the LCT gene, along with a region within the low density

lipoprotein receptor gene family; yet how methylation in these regions influences lactase

expression remains incompletely understood. It is possible that interrogation of these

regions with higher coverage may provide greater insight into the roles of these regions in

mediating the developmental decline in lactase expression.

As epigenetic modifications are typically tissue specific, jejunum, enterocyte and

whole blood DNA were all investigated in this experiment. Epigenetic signatures in

peripheral blood have previously been shown to be useful detection markers for certain

cancers (Teschendorff et al., 2009). The present study investigated the epigenetic profiles

of blood DNA to determine whether the lactase phenotype also displays an epigenetic

detection signature. Although no such signature was observed in the segmentally duplicating

regions of the blood, it is possible that more refined enrichment and sequencing techniques

could identify a signature in the unique regions. Jejunum tissue stripped of the epithelial

cell layer (and enterocytes) was interrogated to identify potential epigenetic vestiges.

Previous research has shown that environmental factors, such as stress, can leave

functional epigenetic marks in the brain. These marks not only alter gene expression and

affect neuronal function, but also leave epigenetic vestiges in buccal epithelial cells (Essex

et al., 2013). In this experiment, we investigated whether such vestiges existed in jejunum

tissue devoid of enterocytes. In addition, four samples of DNA from the jejunal epithelial

layer including enterocytes were interrogated to determine the role of DNA

methylation in lactase expression. Unfortunately, the data obtained from this experiment were

unable to provide conclusive results within the target unique regions, and no signature was

identified in the segmentally duplicating regions.

Segmentally duplicated regions were also interrogated using molecular inversion

probe enrichment. These regions were previously found to correlate with lactase

expression in humans (Labrie, data not published). In addition, segmentally duplicating

regions are linked to disease-causing rearrangements, and may play a role in adaptive

evolution and gene innovation (Bailey et al., 2002; Eichler, 2001; Horvath et al., 2000; Zhao,

Zhu, Kasahara, Morishita, & Zhang, 2013). These highly homologous regions contain both

exonic and intronic sequences, and are found interspersed within the genome (Bailey et al.,

2002; Eichler, 2001). Inheritance of specific duplications has been shown to be one of the

major genetic causes of neurological diseases (Marshall et al., 2008; Miller et al., 2009;

Sharp et al., 2008). Similarly, hypomethylation in segmentally duplicating regions has been

observed in breast and lung cancer (Novak et al., 2008).

The molecular inversion probe enrichment coupled with sequencing used in this

study revealed high variability in the methylation profile of the examined segmentally

duplicated regions. When methylation in these regions was correlated to lactase

expression, no correlations were found to be significant after correction for multiple

comparisons. Although the lack of significant correlation may stem from the high variation

in output values, it is unlikely that these regions influence lactase expression due to their

extreme distance from the lactase gene (3 Mb). Segmentally duplicating regions, however,

may still be more immediately interesting areas to interrogate further due to their

influence on genomic re-arrangement.

FUTURE DIRECTIONS

Future directions of this project include determining whether there are epigenetic

modifications that correlate with changes in lactase gene expression levels. Results from

the present study demonstrate a mosaic pattern of gene expression within the small

intestine in mice. Future studies may investigate epigenetic regulation of enterocyte

specific genes along the horizontal axis of the small intestine, to further understand the

mosaic pattern of gene expression. In addition, it is also of interest to investigate the

epigenetic mechanisms that may be responsible for developmental regulation of gene

expression.

The data obtained from the molecular inversion probe enrichment demonstrated

that optimization of the chosen protocol is required in order for this technique to provide

viable information. A major challenge to overcome will be to optimize probe design and

improve on-target probe activity (Diep et al., 2012). While traditional techniques such as

luciferase-based pyrosequencing would provide informative methylation data for the

unique regions, they require an inconveniently large amount of DNA and would still be

unable to sequence the segmentally duplicating regions. Although the molecular inversion

probe technique was capable of interrogating segmentally duplicating regions, the high

homology of these regions presents a challenge in mapping the resultant sequence reads

with precision.

The uneven and sparse coverage observed using molecular inversion probes in the

unique regions may have been a result of poor target capture during enrichment, or poor

coverage during sequencing. Determining which of these factors contributed most to the

issues encountered in the current study would be informative from a methodological

perspective. Similarly, the means by which target capture

can be enhanced may be a productive avenue toward optimizing molecular inversion probe

enrichment.
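
A first step toward separating poor target capture from poor sequencing coverage could be to summarise the on-target read fraction alongside per-target depth. The sketch below is a rough heuristic under stated assumptions rather than an established protocol: the function names, the `min_depth` threshold of 10 reads, the uniformity cut-off of 0.2 times the mean depth, and all numbers are hypothetical.

```python
from statistics import mean

def on_target_fraction(on_target_reads, total_reads):
    """Fraction of all sequenced reads that fall within the targeted regions."""
    if total_reads == 0:
        return 0.0
    return on_target_reads / total_reads

def coverage_summary(depths, min_depth=10):
    """Summarise per-target read depth: mean, breadth and a simple uniformity measure."""
    avg = mean(depths)
    covered = sum(d >= min_depth for d in depths) / len(depths)
    uniform = sum(d >= 0.2 * avg for d in depths) / len(depths)
    return {
        "mean_depth": avg,
        "fraction_covered": covered,   # targets with at least min_depth reads
        "fraction_uniform": uniform,   # targets with at least 0.2 x mean depth
    }

# Invented read depths for eight targeted unique regions.
depths = [0, 2, 150, 4, 0, 90, 1, 3]
summary = coverage_summary(depths)
fraction = on_target_fraction(on_target_reads=sum(depths), total_reads=1000)
print(summary, fraction)
```

Under these assumptions, a low on-target fraction with adequate total read count would point toward capture during enrichment, whereas a high on-target fraction with low overall depth would point toward sequencing output.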

The importance of this study comes not entirely from the study of lactose

intolerance, but rather from the implications of the epigenetic influence on lactase

expression. The lactose intolerance phenotype presents a unique opportunity to study

potential epigenetic mechanisms in a simple phenotype, controlled by the expression of a

single gene. Using this phenotype as a model, we can study how epigenetic modifications

influence the onset of the decrease in lactase expression through development and as a

response to environmental stimuli. These mechanisms can then be generalized to

phenotypes regulated by the expression of multiple genes. Understanding how epigenetic

mechanisms regulate simple phenotypes is the first step in understanding epigenetic

mechanisms of complex, multi-gene disease phenotypes such as schizophrenia and bipolar

disorder.

Studying the epigenetic influence on lactase expression is a ‘proof of concept’ of

sorts. If epigenetic mechanisms can be identified that are responsible for the lactose

intolerance phenotype, then it stands to reason that similar mechanisms may influence

gene expression in more complex diseases. In order to understand even simple

phenotypes, however, the techniques used must be sensitive enough to detect epigenetic

differences, while maintaining a high level of reproducibility. Studying techniques with

a simple and well understood phenotypic model facilitates the detection of noise

inherent to the technique, independent of biological relevance. By understanding the

specifications unique to individual techniques, definitive results can be obtained to provide

an accurate reflection of the biological environment. As enrichment and sequencing

techniques evolve and develop, we will gain greater insight into the epigenetic

relevance in simple and complex phenotypes. Understanding epigenetic techniques and the

epigenetics of simple phenotypes is the foundation for understanding complex disease

phenotypes, which, in the future, will hopefully lead to the development of effective

therapeutic agents.

REFERENCES

Akhras, M. S., Unemo, M., Thiyagarajan, S., Nyren, P., Davis, R. W., Fire, A. Z., & Pourmand,

N. (2007). Connector inversion probe technology: a powerful one-primer multiplex DNA

amplification system for numerous scientific applications. PLoS One, 2(9), e915. doi:

10.1371/journal.pone.0000915

Argeson, A. C., Nelson, K. K., & Siracusa, L. D. (1996). Molecular basis of the pleiotropic

phenotype of mice carrying the hypervariable yellow (Ahvy) mutation at the agouti locus.

Genetics, 142(2), 557-567.

Bailey, J. A., Gu, Z., Clark, R. A., Reinert, K., Samonte, R. V., Schwartz, S., . . . Eichler, E. E.

(2002). Recent segmental duplications in the human genome. Science, 297(5583), 1003-

1007. doi: 10.1126/science.1072047

Ball, M. P., Li, J. B., Gao, Y., Lee, J. H., LeProust, E. M., Park, I. H., . . . Church, G. M. (2009).

Targeted and genome-scale strategies reveal gene-body methylation signatures in human

cells. Nat Biotechnol, 27(4), 361-368. doi: 10.1038/nbt.1533

Barzilay, G., Walker, L. J., Robson, C. N., & Hickson, I. D. (1995). Site-directed mutagenesis of

the human DNA repair enzyme HAP1: identification of residues important for AP

endonuclease and RNase H activity. Nucleic Acids Res, 23(9), 1544-1550.

Berdasco, M., & Esteller, M. (2010). Aberrant epigenetic landscape in cancer: how cellular

identity goes awry. Dev Cell, 19(5), 698-711. doi: 10.1016/j.devcel.2010.10.005

Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev, 16(1), 6-21. doi:

10.1101/gad.947102

Boll, W., Wagner, P., & Mantei, N. (1991). Structure of the chromosomal gene and cDNAs

coding for lactase-phlorizin hydrolase in humans with adult-type hypolactasia or

persistence of lactase. Am J Hum Genet, 48(5), 889-902.

Brenet, F., Moh, M., Funk, P., Feierstein, E., Viale, A. J., Socci, N. D., & Scandura, J. M.

(2011). DNA methylation of the first exon is tightly linked to transcriptional silencing.

PLoS One, 6(1), e14524. doi: 10.1371/journal.pone.0014524

Broske, A. M., Vockentanz, L., Kharazi, S., Huska, M. R., Mancini, E., Scheller, M., . . .

Rosenbauer, F. (2009). DNA methylation protects hematopoietic stem cell multipotency

from myeloerythroid restriction. Nat Genet, 41(11), 1207-1215. doi: 10.1038/ng.463

Carlone, D. L., Lee, J. H., Young, S. R., Dobrota, E., Butler, J. S., Ruiz, J., & Skalnik, D. G.

(2005). Reduced genomic cytosine methylation and defective cellular differentiation in

embryonic stem cells lacking CpG binding protein. Mol Cell Biol, 25(12), 4881-4891.

doi: 10.1128/mcb.25.12.4881-4891.2005

Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., . . .

Hayashizaki, Y. (2006). Genome-wide analysis of mammalian promoter architecture and

evolution. Nat Genet, 38(6), 626-635. doi: 10.1038/ng1789

Cho, R. J., Mindrinos, M., Richards, D. R., Sapolsky, R. J., Anderson, M., Drenkard, E., . . .

Oefner, P. J. (1999). Genome-wide mapping with biallelic markers in Arabidopsis

thaliana. Nat Genet, 23(2), 203-207. doi: 10.1038/13833

Choi, S. H., Heo, K., Byun, H. M., An, W., Lu, W., & Yang, A. S. (2011). Identification of

preferential target sites for human DNA methyltransferases. Nucleic Acids Res, 39(1),

104-118. doi: 10.1093/nar/gkq774

Dalma‐Weiszhausz, D. D., Warrington, J., Tanimoto, E. Y., & Miyada, C. G. (2006). The

Affymetrix GeneChip® Platform: An Overview. In A. Kimmel & B. Oliver (Eds.), Methods

in Enzymology (Vol. 410, pp. 3-28): Academic Press.

Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E. M., Antosiewicz-Bourget, J., . . . Zhang,

K. (2009). Targeted bisulfite sequencing reveals changes in DNA methylation associated

with nuclear reprogramming. Nat Biotechnol, 27(4), 353-360. doi: 10.1038/nbt.1530

Diep, D., Plongthongkum, N., Gore, A., Fung, H. L., Shoemaker, R., & Zhang, K. (2012).

Library-free methylation sequencing with bisulfite padlock probes. Nat Methods, 9(3),

270-272. doi: 10.1038/nmeth.1871

Duluc, I., Jost, B., & Freund, J. N. (1993). Multiple levels of control of the stage- and region-

specific expression of rat intestinal lactase. J Cell Biol, 123(6 Pt 1), 1577-1586.

Eichler, E. E. (2001). Recent duplication, domain accretion and the dynamic mutation of the

human genome. Trends Genet, 17(11), 661-669.

Enattah, N. S., Sahi, T., Savilahti, E., Terwilliger, J. D., Peltonen, L., & Jarvela, I. (2002).

Identification of a variant associated with adult-type hypolactasia. Nat Genet, 30(2), 233-

237. doi: 10.1038/ng826

Essex, M. J., Boyce, W. T., Hertzman, C., Lam, L. L., Armstrong, J. M., Neumann, S. M., &

Kobor, M. S. (2013). Epigenetic vestiges of early developmental adversity: childhood

stress exposure and DNA methylation in adolescence. Child Dev, 84(1), 58-75. doi:

10.1111/j.1467-8624.2011.01641.x

Fakhrai-Rad, H., Pourmand, N., & Ronaghi, M. (2002). Pyrosequencing: an accurate detection

platform for single nucleotide polymorphisms. Hum Mutat, 19(5), 479-485. doi:

10.1002/humu.10078

Fang, L., Ahn, J. K., Wodziak, D., & Sibley, E. (2012). The human lactase persistence-

associated SNP -13910*T enables in vivo functional persistence of lactase promoter-

reporter transgene expression. Hum Genet, 131(7), 1153-1159. doi: 10.1007/s00439-012-

1140-z

Fang, R., Olds, L. C., Santiago, N. A., & Sibley, E. (2001). GATA family transcription factors

activate lactase gene promoter in intestinal Caco-2 cells. American Journal of Physiology

- Gastrointestinal and Liver Physiology, 280(1 43-1), G58-G67.

Fang, R., Olds, L. C., & Sibley, E. (2006). Spatio-temporal patterns of intestine-specific

transcription factor expression during postnatal mouse gut development. Gene Expr

Patterns, 6(4), 426-432. doi: 10.1016/j.modgep.2005.09.003

Furnari, M., Bonfanti, D., Parodi, A., Franze, J., Savarino, E., Bruzzone, L., . . . Savarino, V.

(2013). A comparison between lactose breath test and quick test on duodenal biopsies for

diagnosing lactase deficiency in patients with self-reported lactose intolerance. J Clin

Gastroenterol, 47(2), 148-152. doi: 10.1097/MCG.0b013e31824e9132

Gabory, A., Attig, L., & Junien, C. (2009). Sexual dimorphism in environmental epigenetic

programming. Mol Cell Endocrinol, 304(1-2), 8-18. doi: 10.1016/j.mce.2009.02.015

Ge, Z. J., Luo, S. M., Lin, F., Liang, Q. X., Huang, L., Wei, Y. C., . . . Sun, Q. Y. (2014). DNA

methylation in oocytes and liver of female mice and their offspring: effects of high-fat-

diet-induced obesity. Environ Health Perspect, 122(2), 159-164. doi:

10.1289/ehp.1307047

Gertz, J., Varley, K. E., Reddy, T. E., Bowling, K. M., Pauli, F., Parker, S. L., . . . Myers, R. M.

(2011). Analysis of DNA methylation in a three-generation family reveals widespread

genetic influence on epigenetic regulation. PLoS Genet, 7(8), e1002228. doi:

10.1371/journal.pgen.1002228

Gonzalo, S., Jaco, I., Fraga, M. F., Chen, T., Li, E., Esteller, M., & Blasco, M. A. (2006). DNA

methyltransferases control telomere length and telomere recombination in mammalian

cells. Nat Cell Biol, 8(4), 416-424. doi: 10.1038/ncb1386

Guo, J., Xu, N., Li, Z., Zhang, S., Wu, J., Kim, D. H., . . . Ju, J. (2008). Four-color DNA

sequencing with 3'-O-modified nucleotide reversible terminators and chemically

cleavable fluorescent dideoxynucleotides. Proc Natl Acad Sci U S A, 105(27), 9145-

9150. doi: 10.1073/pnas.0804023105

Guo, J. U., Su, Y., Zhong, C., Ming, G. L., & Song, H. (2011). Hydroxylation of 5-

Methylcytosine by TET1 Promotes Active DNA Demethylation in the Adult Brain. Cell,

145(3), 423-434. doi: 10.1016/j.cell.2011.03.022

Hackett, J. A., Zylicz, J. J., & Surani, M. A. (2012). Parallel mechanisms of epigenetic

reprogramming in the germline. Trends in Genetics, 28(4), 164-174. doi:

10.1016/j.tig.2012.01.005

Hardenbol, P., Baner, J., Jain, M., Nilsson, M., Namsaraev, E. A., Karlin-Neumann, G. A., . . .

Davis, R. W. (2003). Multiplexed genotyping with sequence-tagged molecular inversion

probes. Nat Biotechnol, 21(6), 673-678. doi: 10.1038/nbt821

Hashimoto, H., Vertino, P. M., & Cheng, X. (2010). Molecular coupling of DNA methylation

and histone methylation. Epigenomics, 2(5), 657-669. doi: 10.2217/epi.10.44

He, Y. F., Li, B. Z., Li, Z., Liu, P., Wang, Y., Tang, Q., . . . Xu, G. L. (2011). Tet-mediated

formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science,

333(6047), 1303-1307. doi: 10.1126/science.1210944

Horvath, J. E., Schwartz, S., & Eichler, E. E. (2000). The mosaic structure of human

pericentromeric DNA: a strategy for characterizing complex regions of the human

genome. Genome Res, 10(6), 839-852.

Inbar-Feigenberg, M., Choufani, S., Butcher, D. T., Roifman, M., & Weksberg, R. (2013). Basic

concepts of epigenetics. Fertility and Sterility, 99(3), 607-615. doi:

10.1016/j.fertnstert.2013.01.117

Ingram, C. J., Elamin, M. F., Mulcare, C. A., Weale, M. E., Tarekegn, A., Raga, T. O., . . .

Swallow, D. M. (2007). A novel polymorphism associated with lactose tolerance in

Africa: multiple causes for lactase persistence? Hum Genet, 120(6), 779-788. doi:

10.1007/s00439-006-0291-1

Ito, S., Shen, L., Dai, Q., Wu, S. C., Collins, L. B., Swenberg, J. A., . . . Zhang, Y. (2011). Tet

proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine.

Science, 333(6047), 1300-1303. doi: 10.1126/science.1210597

Jost, B., Duluc, I., Vilotte, J. L., & Freund, J. N. (1998). Lactase is unchanged in suckling mice

fed with lactose-free milk. Gastroenterol Clin Biol, 22(11), 863-867.

Kaminsky, Z., Wang, S. C., & Petronis, A. (2006). Complex disease, gender and epigenetics.

Annals of Medicine, 38(8), 530-544. doi: 10.1080/07853890600989211

Khare, T., Pai, S., Koncevicius, K., Pal, M., Kriukiene, E., Liutkeviciute, Z., . . . Petronis, A.

(2012). 5-hmC in the brain is abundant in synaptic genes and shows differences at the

exon-intron boundary. Nat Struct Mol Biol, 19(10), 1037-1043. doi: 10.1038/nsmb.2372

Kim, O., Yoon, J. H., Choi, W. S., Ashktorab, H., Smoot, D. T., Nam, S. W., . . . Park, W. S.

(2014). GKN2 contributes to the homeostasis of gastric mucosa by inhibiting GKN1

activity. J Cell Physiol, 229(6), 762-771. doi: 10.1002/jcp.24496

Kriukiene, E., Labrie, V., Khare, T., Urbanaviciute, G., Lapinaite, A., Koncevicius, K., . . .

Klimasauskas, S. (2013). DNA unmethylome profiling by covalent capture of CpG sites.

Nat Commun, 4, 2190. doi: 10.1038/ncomms3190

Kuokkanen, M., Myllyniemi, M., Vauhkonen, M., Helske, T., Kaariainen, I., Karesvuori, S., . . .

Sipponen, P. (2006). A biopsy-based quick test in the diagnosis of duodenal hypolactasia

in upper gastrointestinal endoscopy. Endoscopy, 38(7), 708-712. doi: 10.1055/s-2006-

925354

Laird, P. W. (2010). Principles and challenges of genomewide DNA methylation analysis. Nat

Rev Genet, 11(3), 191-203. doi: 10.1038/nrg2732

Law, D., Conklin, J., & Pimentel, M. (2010). Lactose intolerance and the role of the lactose

breath test. Am J Gastroenterol, 105(8), 1726-1728. doi: 10.1038/ajg.2010.146

Lee, P. P., Fitzpatrick, D. R., Beard, C., Jessup, H. K., Lehar, S., Makar, K. W., . . . Wilson, C.

B. (2001). A critical role for Dnmt1 and DNA methylation in T cell development,

function, and survival. Immunity, 15(5), 763-774.

Li, E., Bestor, T. H., & Jaenisch, R. (1992). Targeted mutation of the DNA methyltransferase

gene results in embryonic lethality. Cell, 69(6), 915-926. doi:

10.1016/0092-8674(92)90611-F

Li, J. B., Gao, Y., Aach, J., Zhang, K., Kryukov, G. V., Xie, B., . . . Church, G. M. (2009).

Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.

Genome Res, 19(9), 1606-1615. doi: 10.1101/gr.092213.109

Ling, C., & Ronn, T. (2014). Epigenetic adaptation to regular exercise in humans. Drug Discov

Today. doi: 10.1016/j.drudis.2014.03.006

Lomer, M. C., Parkes, G. C., & Sanderson, J. D. (2008). Review article: lactose intolerance in

clinical practice--myths and realities. Aliment Pharmacol Ther, 27(2), 93-103. doi:

10.1111/j.1365-2036.2007.03557.x

Lukinavicius, G., Lapiene, V., Stasevskij, Z., Dalhoff, C., Weinhold, E., & Klimasauskas, S.

(2007). Targeted labeling of DNA by methyltransferase-directed transfer of activated

groups (mTAG). J Am Chem Soc, 129(10), 2758-2759. doi: 10.1021/ja0691876

Mamanova, L., Coffey, A. J., Scott, C. E., Kozarewa, I., Turner, E. H., Kumar, A., . . . Turner, D.

J. (2010). Target-enrichment strategies for next-generation sequencing. Nat Methods,

7(2), 111-118. doi: 10.1038/nmeth.1419

Marenstein, D. R., Wilson, D. M., 3rd, & Teebor, G. W. (2004). Human AP endonuclease

(APE1) demonstrates endonucleolytic activity against AP sites in single-stranded DNA.

DNA Repair (Amst), 3(5), 527-533. doi: 10.1016/j.dnarep.2004.01.010

Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., . . .

Rothberg, J. M. (2005). Genome sequencing in microfabricated high-density picolitre

reactors. Nature, 437(7057), 376-380. doi: 10.1038/nature03959

Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., . . . Scherer, S. W.

(2008). Structural variation of chromosomes in autism spectrum disorder. Am J Hum

Genet, 82(2), 477-488. doi: 10.1016/j.ajhg.2007.12.009

Mattar, R., de Campos Mazo, D. F., & Carrilho, F. J. (2012). Lactose intolerance: diagnosis,

genetic, and clinical factors. Clin Exp Gastroenterol, 5, 113-121. doi:

10.2147/CEG.S32368

Metneki, J., Czeizel, A., Flatz, S. D., & Flatz, G. (1984). A study of lactose absorption capacity

in twins. Hum Genet, 67(3), 296-300.

Metzker, M. L. (2010). Sequencing technologies - the next generation. Nat Rev Genet, 11(1), 31-

46. doi: 10.1038/nrg2626

Michaud, E. J., van Vugt, M. J., Bultman, S. J., Sweet, H. O., Davisson, M. T., & Woychik, R. P.

(1994). Differential expression of a new dominant agouti allele (Aiapy) is correlated with

methylation state and is influenced by parental lineage. Genes Dev, 8(12), 1463-1472.

Millan, M. J. (2013). An epigenetic framework for neurodevelopmental disorders: from

pathogenesis to potential therapy. Neuropharmacology, 68, 2-82. doi:

10.1016/j.neuropharm.2012.11.015

Miller, D. T., Shen, Y., Weiss, L. A., Korn, J., Anselm, I., Bridgemohan, C., . . . Wu, B. L.

(2009). Microdeletion/duplication at 15q13.2q13.3 among individuals with features of

autism and other neuropsychiatric disorders. J Med Genet, 46(4), 242-248. doi:

10.1136/jmg.2008.059907

Miranda, T. B., & Jones, P. A. (2007). DNA methylation: the nuts and bolts of repression. J Cell

Physiol, 213(2), 384-390. doi: 10.1002/jcp.21224

Morey, C., & Avner, P. (2010). Genetics and epigenetics of the X chromosome. Ann N Y Acad

Sci, 1214, E18-33. doi: 10.1111/j.1749-6632.2010.05943.x

Morgan, H. D., Santos, F., Green, K., Dean, W., & Reik, W. (2005). Epigenetic reprogramming

in mammals. Hum Mol Genet, 14 Spec No 1, R47-58. doi: 10.1093/hmg/ddi114

Morgan, H. D., Sutherland, H. G., Martin, D. I., & Whitelaw, E. (1999). Epigenetic inheritance

at the agouti locus in the mouse. Nat Genet, 23(3), 314-318. doi: 10.1038/15490

Newcomer, A. D., McGill, D. B., Thomas, P. J., & Hofmann, A. F. (1975). Prospective

comparison of indirect methods for detecting lactase deficiency. N Engl J Med, 293(24),

1232-1236. doi: 10.1056/nejm197512112932405

Nilsson, M., Malmgren, H., Samiotaki, M., Kwiatkowski, M., Chowdhary, B. P., & Landegren,

U. (1994). Padlock probes: circularizing oligonucleotides for localized DNA detection.

Science, 265(5181), 2085-2088.

Novak, P., Jensen, T., Oshiro, M. M., Watts, G. S., Kim, C. J., & Futscher, B. W. (2008).

Agglomerative epigenetic aberrations are a common event in human breast cancer.

Cancer Res, 68(20), 8616-8625. doi: 10.1158/0008-5472.can-08-1419

Okano, M., Bell, D. W., Haber, D. A., & Li, E. (1999). DNA Methyltransferases Dnmt3a and

Dnmt3b Are Essential for De Novo Methylation and Mammalian Development. Cell,

99(3), 247-257. doi: 10.1016/S0092-8674(00)81656-6

Okashita, N., Kumaki, Y., Ebi, K., Nishi, M., Okamoto, Y., Nakayama, M., . . . Seki, Y. (2014).

PRDM14 promotes active DNA demethylation through the ten-eleven translocation

(TET)-mediated base excision repair pathway in embryonic stem cells. Development,

141(2), 269-280. doi: 10.1242/dev.099622

Pastor, W. A., Aravind, L., & Rao, A. (2013). TETonic shift: biological roles of TET proteins in

DNA demethylation and transcription. Nat Rev Mol Cell Biol, 14(6), 341-356. doi:

10.1038/nrm3589

Petronis, A. (2001). Human morbid genetics revisited: relevance of epigenetics. Trends Genet,

17(3), 142-146.

Petronis, A. (2010). Epigenetics as a unifying principle in the aetiology of complex traits and

diseases. Nature, 465(7299), 721-727. doi: 10.1038/nature09230

Porreca, G. J., Zhang, K., Li, J. B., Xie, B., Austin, D., Vassallo, S. L., . . . Shendure, J. (2007).

Multiplex amplification of large sets of human exons. Nat Methods, 4(11), 931-936. doi:

10.1038/nmeth1110

Poulter, M., Hollox, E., Harvey, C. B., Mulcare, C., Peuhkuri, K., Kajander, K., . . . Swallow, D.

M. (2003). The causal element for the lactase persistence/non-persistence polymorphism

is located in a 1 Mb region of linkage disequilibrium in Europeans. Ann Hum Genet,

67(Pt 4), 298-311.

Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R., . . . Gu, Y.

(2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent,

Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics, 13, 341. doi:

10.1186/1471-2164-13-341

Rasinpera, H., Kuokkanen, M., Kolho, K. L., Lindahl, H., Enattah, N. S., Savilahti, E., . . .

Jarvela, I. (2005). Transcriptional downregulation of the lactase (LCT) gene during

childhood. Gut, 54(11), 1660-1661. doi: 10.1136/gut.2005.077404

Reik, W., & Lewis, A. (2005). Co-evolution of X-chromosome inactivation and imprinting in

mammals. Nat Rev Genet, 6(5), 403-410. doi: 10.1038/nrg1602

Rings, E. H., Krasinski, S. D., van Beers, E. H., Moorman, A. F., Dekker, J., Montgomery, R. K.,

. . . Buller, H. A. (1994). Restriction of lactase gene expression along the proximal-to-

distal axis of rat small intestine occurs during postnatal development. Gastroenterology,

106(5), 1223-1232.

Romanish, M., Lock, W., van de Lagemaat, L. N., Dunn, C. A., & Mager, D. L. (2005).

Repeated Recruitment of LTR Retrotransposons as Promoters by the Anti-apoptotic

Locus NAIP During Mammalian Evolution. PLoS Genetics, preprint(2006), e10. doi:

10.1371/journal.pgen.0030010.eor

Ronaghi, M. (2001). Pyrosequencing sheds light on DNA sequencing. Genome Res, 11(1), 3-11.

Sancho, E., Batlle, E., & Clevers, H. (2004). Signaling pathways in intestinal development and

cancer. Annu Rev Cell Dev Biol, 20, 695-723. doi:

10.1146/annurev.cellbio.20.010403.092805

Sen, G. L., Reuter, J. A., Webster, D. E., Zhu, L., & Khavari, P. A. (2010). DNMT1 maintains

progenitor function in self-renewing somatic tissue. Nature, 463(7280), 563-567. doi:

10.1038/nature08683

Sharp, A. J., Mefford, H. C., Li, K., Baker, C., Skinner, C., Stevenson, R. E., . . . Eichler, E. E.

(2008). A recurrent 15q13.3 microdeletion syndrome associated with mental retardation

and seizures. Nat Genet, 40(3), 322-328. doi: 10.1038/ng.93

Sheaffer, K. L., & Kaestner, K. H. (2012). Transcriptional networks in liver and intestinal

development. Cold Spring Harb Perspect Biol, 4(9), a008284. doi:

10.1101/cshperspect.a008284

Sheaffer, K. L., Kim, R., Aoki, R., Elliott, E. N., Schug, J., Burger, L., . . . Kaestner, K. H.

(2014). DNA methylation is required for the control of stem cell differentiation in the

small intestine. Genes Dev, 28(6), 652-664. doi: 10.1101/gad.230318.113

Shen, P., Wang, W., Chi, A. K., Fan, Y., Davis, R. W., & Scharfe, C. (2013). Multiplex target

capture with double-stranded DNA probes. Genome Med, 5(5), 50. doi: 10.1186/gm454

Shen, P., Wang, W., Krishnakumar, S., Palm, C., Chi, A. K., Enns, G. M., . . . Scharfe, C.

(2011). High-quality DNA sequence capture of 524 disease candidate genes. Proc Natl

Acad Sci U S A, 108(16), 6549-6554. doi: 10.1073/pnas.1018981108

Shendure, J., & Ji, H. (2008). Next-generation DNA sequencing. Nat Biotechnol, 26(10), 1135-

1145. doi: 10.1038/nbt1486

Smith, Z. D., & Meissner, A. (2013). DNA methylation: roles in mammalian development. Nat

Rev Genet, 14(3), 204-220. doi: 10.1038/nrg3354

Suh, E., Chen, L., Taylor, J., & Traber, P. G. (1994). A homeodomain protein related to caudal

regulates intestine-specific gene transcription. Molecular and Cellular Biology, 14(11),

7340-7351.

Swallow, D. M. (2003). Genetics of lactase persistence and lactose intolerance. Annu Rev Genet,

37, 197-219. doi: 10.1146/annurev.genet.37.110801.143820

Tarailo-Graovac, M., & Chen, N. (2009). Using RepeatMasker to identify repetitive elements in

genomic sequences. Curr Protoc Bioinformatics, Chapter 4, Unit 4.10. doi:

10.1002/0471250953.bi0410s25

Teschendorff, A. E., Menon, U., Gentry-Maharaj, A., Ramus, S. J., Gayther, S. A., Apostolidou,

S., . . . Widschwendter, M. (2009). An epigenetic signature in peripheral blood predicts

active ovarian cancer. PLoS One, 4(12), e8274. doi: 10.1371/journal.pone.0008274

Thurman, R. E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M. T., Haugen, E., . . .

Stamatoyannopoulos, J. A. (2012). The accessible chromatin landscape of the human

genome. Nature, 489(7414), 75-82. doi: 10.1038/nature11232

Trapnell, C., & Salzberg, S. L. (2009). How to map billions of short reads onto genomes. Nat

Biotechnol, 27(5), 455-457. doi: 10.1038/nbt0509-455

Troelsen, J. T. (2005). Adult-type hypolactasia and regulation of lactase expression. Biochim

Biophys Acta, 1723(1-3), 19-32. doi: 10.1016/j.bbagen.2005.02.003

Trowbridge, J. J., Snow, J. W., Kim, J., & Orkin, S. H. (2009). DNA methyltransferase 1 is

essential for and uniquely regulates hematopoietic stem and progenitor cells. Cell Stem

Cell, 5(4), 442-449. doi: 10.1016/j.stem.2009.08.016

Van Beers, E. H., Rings, E. H., Posthuma, G., Dingemanse, M. A., Taminiau, J. A., Heymans, H.

S., . . . Dekker, J. (1998). Intestinal carbamoyl phosphate synthase I in human and rat.

Expression during development shows species differences and mosaic expression in

duodenum of both species. J Histochem Cytochem, 46(2), 231-240.

Waddington, C. H. (2012). The epigenotype. 1942. Int J Epidemiol, 41(1), 10-13. doi:

10.1093/ije/dyr184

Wang, D. G., Fan, J. B., Siao, C. J., Berno, A., Young, P., Sapolsky, R., . . . Lander, E. S. (1998).

Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms

in the human genome. Science, 280(5366), 1077-1082.

Weaver, I. C., Cervoni, N., Champagne, F. A., D'Alessio, A. C., Sharma, S., Seckl, J. R., . . .

Meaney, M. J. (2004). Epigenetic programming by maternal behavior. Nat Neurosci,

7(8), 847-854. doi: 10.1038/nn1276

Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., & Schubeler, D.

(2005). Chromosome-wide and promoter-specific analyses identify sites of differential

DNA methylation in normal and transformed human cells. Nat Genet, 37(8), 853-862.

doi: 10.1038/ng1598

Weichenhan, D., & Plass, C. (2013). The evolving epigenome. Hum Mol Genet, 22(R1), R1-6.

doi: 10.1093/hmg/ddt348

Wu, H., & Zhang, Y. (2014). Reversing DNA methylation: mechanisms, genomics, and

biological functions. Cell, 156(1-2), 45-68. doi: 10.1016/j.cell.2013.12.019

Zhao, Q., Zhu, Z., Kasahara, M., Morishita, S., & Zhang, Z. (2013). Segmental duplications in

the silkworm genome. BMC Genomics, 14(1), 521. doi: 10.1186/1471-2164-14-521
