Application of Epigenetic Techniques to Study Lactase Expression
By Andrea Constantinof
A thesis submitted in conformity with the requirements for the degree of Master of Science
Pharmacology and Toxicology University of Toronto
© Copyright by Andrea Constantinof 2014
Application of Epigenetic Techniques to Study Lactase Expression
Andrea Constantinof
Master of Science
Pharmacology and Toxicology
University of Toronto
2014
ABSTRACT
DNA methylation is a highly studied epigenetic modification. Numerous techniques have
been developed to investigate DNA methylation and its role in gene regulation. Lactose
intolerance has been identified as a simple phenotype, regulated by a single gene. Due to its
simple phenotype, lactose intolerance represents an ideal model to investigate the
epigenetic mechanisms that govern gene expression. Here, we investigated lactase
expression in aging mice, and used molecular inversion probe enrichment followed by DNA
sequencing to study epigenetic modifications in segmental duplications and unique regions
of the genome associated with lactose intolerance in humans. Lactase was expressed in
a mosaic pattern in the mouse small intestine, and expression significantly decreased in
older mice. Molecular inversion probe enrichment of unique genomic regions proved
challenging, and did not provide informative data. No association was found between lactase
expression levels and the DNA methylation profile of the lactase gene in segmental
duplications.
Table of Contents
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
DNA METHYLATION AND DNA METHYLTRANSFERASES
DEMETHYLATION
CPG ISLANDS
HISTONE MODIFICATIONS
EPIGENETICS AND DISEASE
THE STUDY OF GENOME-WIDE DNA METHYLATION
SEQUENCING PLATFORMS
HYBRIDIZATION SEQUENCING
PYROSEQUENCING
NEXT GENERATION SEQUENCING
ENRICHMENT TECHNIQUES FOR NEXT GENERATION SEQUENCING
MTAG
POLYMERASE CHAIN REACTION
MOLECULAR INVERSION PROBES
EPIGENETIC MECHANISMS: LACTASE EXPRESSION IN MICE
AIM OF THE STUDY- EPIGENETIC MECHANISMS IN LACTASE EXPRESSION IN MICE
AIM OF THIS STUDY- EPIGENETIC TECHNIQUES IN LACTOSE INTOLERANCE
HYPOTHESES AND RATIONALE
HYPOTHESIS 1
RATIONALE 1
HYPOTHESIS 2
RATIONALE 2
METHODS
RNA EXTRACTION FROM TISSUE
CDNA SYNTHESIS
REAL-TIME PCR
HUMAN MOLECULAR INVERSION PROBE SAMPLE SELECTION
MOLECULAR INVERSION PROBES
SEQUENCING
STATISTICAL ANALYSIS
MOLECULAR INVERSION PROBE ANALYSIS
CORRELATION OF METHYLATION LEVELS BETWEEN TECHNICAL REPLICATES
ANALYSIS OF METHYLATION AND LACTASE EXPRESSION
RESULTS
LACTASE MRNA EXPRESSION IN MICE
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
DISCUSSION
LACTASE EXPRESSION IN MICE
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
SPECIFICITY
SENSITIVITY
REPRODUCIBILITY
AMOUNT OF DNA REQUIRED
METHYLATION AND LACTASE EXPRESSION
FUTURE DIRECTIONS
REFERENCES
LIST OF TABLES
Table 1
Table 2
Table 3
Table 4
Table 5
LIST OF FIGURES
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
INTRODUCTION
The term ‘epigenetics’ was first coined by Conrad Hal Waddington in 1942 to
describe mechanisms which link the gene to phenotype (Waddington, 2012). Since then,
the definition of epigenetics has been refined to the study of reversible changes to
chromatin structure which alter gene function without altering DNA sequences, as well as
being mitotically and/or meiotically heritable (Weichenhan & Plass, 2013). An example of
epigenetic influence on phenotypes can be seen in the study of agouti mice; epigenetic
modifications (or lack thereof) in the intracisternal A particle (IAP) transposon cause
variation in the fur coat colour of otherwise genetically identical mice (Argeson, Nelson, &
Siracusa, 1996; Michaud et al., 1994; Morgan, Sutherland, Martin, & Whitelaw, 1999).
Gradients of fur colours produced correspond to the amount of epigenetic modification in
the IAP transposon. A solid yellow coat is produced when the IAP region lacks epigenetic
modifications, varying levels of modification produce coats with both yellow and black fur,
and a black coat is produced when the IAP region is heavily modified. The modification
responsible for this effect is the methylation of cytosine residues to create
5-methylcytosine (Michaud et al., 1994).
Not only are epigenetic modifications essential for the regulation of tissue specific,
housekeeping, and imprinted gene expression, but they also maintain genomic stability
through silencing transposable elements of the genome (Romanish, Lock, van de Lagemaat,
Dunn, & Mager, 2005). For instance, changes in epigenetic ‘code’ within the vicinity of
specific gene clusters can regulate cell differentiation throughout development. Once cells
have completed the differentiation process, epigenetic modifications have been observed to
silence the expression of genes specific to other cell types, as in X-chromosome inactivation
(Bird, 2002). Females silence one of their two X-chromosomes, so that X-linked genes from
only one are expressed (Inbar-Feigenberg, Choufani, Butcher, Roifman, & Weksberg, 2013).
This process occurs to compensate for the extra set of X-linked genes present in females,
and the X-chromosome selected for inactivation is randomly chosen during embryogenesis
in the female blastocyst. The long non-coding RNA X-inactive specific transcript
(XIST) coats the X chromosome to be inactivated and recruits repressive proteins. In
addition, epigenetic modifications of histones and DNA, such as methylation, occur in
promoter regions to ensure the silencing of the X-chromosome (Morey & Avner, 2010).
Although the cells in our bodies carry the same genome from inception throughout
development, each is fated to perform distinct tasks through the process of differentiation,
dictated by the controlled expression of specific genes (Millan, 2013). Understanding the
mechanisms through which epigenetic factors act may not only lead to the development
and discovery of new therapies for cancer and other diseases in which epigenetic defects
are an associated factor, but may also yield important insight into how these
biologically complex processes are capable of governing themselves.
Epigenetic modifications are regulated by various mechanisms. This thesis primarily
focuses on the mechanisms by which epigenetic modifications are made to DNA, and how
these modifications regulate development and disease. In addition, multiple techniques
developed to interrogate epigenetic DNA modifications are examined. Here, we examine
lactase expression in mice, and use the lactose intolerance phenotype in humans as a model
for the application and analysis of a novel DNA enrichment technique.
DNA METHYLATION AND DNA METHYLTRANSFERASES
In humans, the most-investigated epigenetic modification is DNA methylation; that
is, the addition of a methyl group to the cytosine pyrimidine ring within a cytosine-
phosphate-guanine (CpG) dinucleotide, at the carbon 5 position (Bird, 2002). There are
three different DNA methyltransferases (DNMTs), each with a distinct role in
generating methylated residues: DNMT1, DNMT3a and DNMT3b. While DNMT1 is
responsible for copying DNA methylation patterns onto newly replicated DNA, DNMT3a
and DNMT3b generate de novo DNA methylation in response to different stimuli. For
example, diet (Ge et al., 2014), exercise (Ling & Ronn, 2014), and rearing environment
(Weaver et al., 2004) have all been shown to affect DNA methylation.
After fertilization, a major reorganization of zygotic DNA methylation, or
‘reprogramming’, occurs in which the majority of DNA methylation marks are
removed, and a novel set of epigenetic modifications is established as the necessary
blueprint for the different tissues of the developing embryo (Choi et al., 2011; Morgan,
Santos, Green, Dean, & Reik, 2005). DNMT3a and DNMT3b re-establish a novel set of
epigenetic modifications during gametogenesis, and again after implantation (that is,
adherence of the blastocyst to the uterine wall). Furthermore, the set of epigenetic
modifications laid down during gametogenesis establishes genes with parent-of-origin
specific expression (Okano, Bell, Haber, & Li, 1999). Thus, each resulting cell from
fertilization has its own epigenetic signature, which must be carefully maintained by
DNMT1 in order to properly regulate gene expression for cells of any given lineage
(Morgan et al., 2005). During cell division, the protein UHRF1 (ubiquitin-like, containing
PHD and RING finger domains 1) recognizes and binds to hemi-methylated DNA, that is, DNA
in which the parent strand is methylated but the daughter strand is not (Hashimoto,
Vertino, & Cheng, 2010). Using UHRF1 as a guide, DNMT1 transfers parental strand
methylation marks to the daughter strand. To do so, DNMT1 flips cytosine rings out from
newly synthesized DNA to form an intermediate complex before incorporating a methyl
group from S-adenosyl-L-methionine (SAM) onto the carbon 5 position of the cytosine
(Hashimoto et al., 2010).
While DNMT3a and DNMT3b are highly expressed in undifferentiated embryonic
stem cells, their expression decreases to low levels after eventual differentiation to adult
somatic tissues (Okano et al., 1999). Unlike DNMT1, DNMT3a and DNMT3b show no
preference for hemi-methylated DNA and are not essential to maintaining DNA
methylation. In vitro studies have demonstrated that cell lines lacking both DNMT3a and
DNMT3b are incapable of de novo methylation, yet cell lines expressing either of the two
alone remain able to undergo de novo methylation (Okano et al., 1999; Okashita et al.,
2014). Although this finding suggests possible redundancy in the functionality of these
enzymes, studies in mice deficient for either DNMT3a or DNMT3b have revealed distinct
requirements for each methyltransferase during development. Whereas mice lacking DNMT3a
developed to term but became runted and died within 4 weeks after birth, lack of
DNMT3b resulted in embryonic lethality (Okano et al., 1999).
DNMT1 expression is similarly essential during embryonic development. As it is
critical for maintaining methylation that silences transposable and repetitive elements,
deletion of DNMT1 during gestation results in a 60% loss of global methylation in the
blastocyst after implantation and is consequently lethal to the embryo (Carlone et al., 2005;
E. Li, Bestor, & Jaenisch, 1992). In addition, DNMT1 is required for both proper
differentiation of embryonic stem cells, and optimal self-renewal capacity in adult stem
cells (Broske et al., 2009; Lee et al., 2001; Sen, Reuter, Webster, Zhu, & Khavari, 2010;
Sheaffer et al., 2014; Trowbridge, Snow, Kim, & Orkin, 2009).
DEMETHYLATION
Genome-wide loss of cytosine methylation, termed DNA demethylation, occurs at
specific developmental stages within the cells of preimplantation embryos and in
primordial germ cells (PGCs). This global DNA demethylation removes methyl groups
epigenetically inherited from both parental genomes, and is required for establishing
pluripotency in the cells of early embryos (Wu & Zhang, 2014). DNA methylation can also
be progressively lost under circumstances such as repeated cell divisions, where
methylation is not maintained by DNMT1; however, this passive demethylation occurs too
slowly to fully explain the global demethylation observed in preimplantation embryos and
PGCs (Wu & Zhang, 2014).
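The dilution argument behind passive demethylation can be made concrete with a minimal sketch. It assumes, purely for illustration, that DNMT1 maintenance is completely absent, so that each round of replication pairs every methylated parental strand with an unmethylated daughter strand and the fraction of methylated strands halves. The function names and the 5% threshold used in the usage note are hypothetical and do not come from the cited studies.

```python
import math


def methylated_strand_fraction(divisions: int, start: float = 1.0) -> float:
    """Fraction of DNA strands still methylated after `divisions` rounds of
    replication, assuming no maintenance methylation at all (illustrative)."""
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    # Each replication halves the methylated fraction under this assumption.
    return start * 0.5 ** divisions


def divisions_to_reach(threshold: float, start: float = 1.0) -> int:
    """Smallest number of divisions after which the methylated fraction
    is at or below `threshold` (0 < threshold <= start)."""
    if not 0 < threshold <= start:
        raise ValueError("threshold must be in (0, start]")
    return math.ceil(math.log2(start / threshold))
```

Under these assumptions, reducing methylation to 5% of its starting level would take `divisions_to_reach(0.05)` cell divisions even with no maintenance whatsoever, which illustrates why dilution alone is considered too slow to explain rapid global demethylation.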
Active demethylation can better account for the rapid global demethylation which
occurs during embryonic development. In mammals, active DNA demethylation begins with
oxidation of the methyl group and is completed by the base excision repair (BER) pathway.
Ten-eleven translocation (TET) proteins act as methylcytosine dioxygenases to hydroxylate
5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) (Hackett, Zylicz, & Surani,
2012; Okashita et al., 2014).
Until recently, it was thought that the protein activation-induced cytidine deaminase (AID)
would next remove the amine group from 5hmC to produce 5-hydroxymethyluracil
(5hmU), a substrate for DNA glycosylases of the BER pathway, which ultimately restore
unmodified cytosine (Guo, Su, Zhong, Ming, & Song, 2011). More recently, however, Nabel et al.
(2012) showed that 5hmC is not a substrate for AID, suggesting that deamination
may not be the main pathway for active DNA demethylation.
Instead, 5hmC can be further oxidized to 5-formylcytosine (5fC) and
5-carboxylcytosine (5caC), both of which are excised by thymine DNA glycosylase (TDG) and
replaced with unmodified cytosine through BER (He et al., 2011; Ito et al., 2011). TET
proteins are responsible for the oxidation of 5mC to 5hmC as well as from 5hmC to 5fC/5caC;
however, the genomic content of 5hmC is much higher in embryonic stem cells than that of
5fC/5caC. Accordingly, the conversion of 5hmC to 5fC/5caC appears to be tightly regulated,
yet the mechanisms regulating TET proteins in active DNA demethylation remain
unclear (Pastor, Aravind, & Rao, 2013).
CPG ISLANDS
Methylated cytosine residues are most often found next to a guanine residue.
A cytosine nucleotide followed by a guanine nucleotide is known as a CpG dinucleotide, in
which the ‘p’ represents the phosphodiester bond linking the two nucleotides. CpG
dinucleotides are highly mutable, and occur with an incidence five times lower in the
genome than would be expected by chance. The low frequency of this dinucleotide is due to
the elevated rate of transition mutations that occur through spontaneous deamination of
5mC to thymine (J. B. Li et al., 2009).
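The notion of CpG dinucleotides occurring less often than "expected by chance" can be illustrated with a commonly used observed/expected ratio, in which the expected CpG count is the product of the C and G counts divided by the sequence length. The sketch below is an illustration only; the function name is hypothetical, and a value near 0.2 would correspond to the five-fold depletion described above, while CpG-rich sequences score much higher.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio for one DNA strand:
    (number of CpG * length) / (number of C * number of G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)
```

For example, `cpg_observed_expected("CGCGCG")` is well above 1, whereas a sequence containing C and G but no adjacent CG pair, such as `"CATG"`, scores 0.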
CpG dinucleotides are found throughout the genome within genes, intergenic
regions, repetitive elements, and finally in clusters called CpG islands. CpG islands are
typically located at the promoter regions of genes, and tend to be protected from
methylation (Carninci et al., 2006) despite the fact that 70-90% of CpG dinucleotides are
methylated in healthy somatic cells (Miranda & Jones, 2007). When DNA methylation in
CpG islands does occur, it can inhibit the binding of transcription factors and in turn
suppress gene expression (Ball et al., 2009). In particular, this can occur through the
recruitment of a family of methyl-CpG-binding proteins able to recognize methylated
cytosines in mammals. The members of this family (MBD1, MBD2, MBD3, MBD4, and MeCP2)
all contain a homologous methyl-CpG-binding domain (MBD), and cooperate with a
non-homologous methyl-binding protein named KAISO. These proteins repress transcription
both by preventing activating transcription factors from binding to their target sequences,
and by recruiting enzymes which catalyze histone posttranslational modifications (in turn
mediating structural changes in chromatin that repress gene expression) (Miranda & Jones,
2007; Thurman et al., 2012). In addition, 5mC can silence genes since transcription factors
do not bind efficiently to methylated DNA (Ball et al., 2009). CpGs found in intergenic
regions and repetitive elements are essential for maintaining genomic integrity. Extensive
CpG methylation in these regions protects the genome from transposition, transcriptional
interference from strong promoters (Romanish et al., 2005), and illegitimate
recombination during cell division (Gonzalo et al., 2006).
HISTONE MODIFICATIONS
As mentioned above, CpG methylation regulates gene expression in conjunction
with other epigenetic marks such as histone modification. N-terminal tails of histones can
undergo various posttranslational chemical modifications, including acetylation,
methylation, phosphorylation, sumoylation, and ubiquitination. For example, histone
methylation at two distinct sites, H3K9me3 and H3K27me3, is required alongside
hypermethylation of CpG islands for successful X-chromosome inactivation (Reik & Lewis,
2005).
In contrast, lysine residues on the N-terminal histone tails can be acetylated to
promote an open chromatin structure, consequently increasing gene expression.
Acetylation of lysine neutralizes its positive charge, and thereby weakens the affinity of the
histone for negatively charged DNA. This weaker affinity leads to a looser interaction
between the two, making DNA more accessible to transcriptional machinery
(Hashimoto et al., 2010).
EPIGENETICS AND DISEASE
Abnormal regulation of epigenetic modifications to DNA or histones may disrupt
vital metabolic pathways and thereby trigger pathological conditions. For example,
genome-wide hypomethylation was the earliest epigenetic aberration found in various cancers
(Berdasco & Esteller, 2010). In general, cancer cells consistently present with
hypomethylated intergenic intervals and repetitive elements, along with hypermethylated
CpG islands (Berdasco & Esteller, 2010).
More recently, epigenetic mechanisms have been studied in the context of complex,
non-Mendelian diseases. The hallmark of a non-Mendelian disease is discordance
between monozygotic (MZ) twins, colloquially known as identical twins.
To illustrate, consider that concordance in MZ twins is only ~15% for
breast cancer, ~20% for ulcerative colitis, 25-30% for multiple sclerosis, 25-45% for
diabetes, 50% for schizophrenia, and 40-70% for Alzheimer’s disease (Petronis, 2001,
2010), all of which are examples of diseases displaying non-Mendelian inheritance. In
addition, males and females are differentially susceptible to non-Mendelian diseases;
multiple sclerosis, rheumatoid arthritis, Crohn’s disease, panic disorders, structural heart
disease, and hyperthyroidism are all more common among females, whilst males are more
often affected by autism, Hirschsprung’s disease, ulcerative colitis, Parkinson’s disease,
alcoholism, and allergies (Kaminsky, Wang, & Petronis, 2006).
Typically, the discordance in MZ twins and the gender specificity observed
within complex diseases have been explained by differential environmental factors and
sex-linked genes, respectively. Although few complex-disease-causing
environmental factors (e.g. smoking in lung cancer, or diet in cardiovascular disease) have
been identified, MZ twin discordance may be better explained by the partial
stability of contributing epigenetic factors. These factors could allow for a substantial
degree of disease-relevant epigenetic dissimilarities to accumulate in one MZ twin or the
other (Petronis, 2010). Similarly, while sex-linked genes cannot fully explain the gender-
specific epidemiology of the relevant complex diseases, it is possible that epigenetic
mechanisms are differentially regulated by sex hormones (Gabory, Attig, & Junien, 2009).
Strong evidence supports the notion that epigenetic mechanisms may play a causal
role in complex, non-Mendelian disease. Considering the tremendous toll in both
resources and suffering that these afflictions collectively take, the appeal of a potentially
novel class of therapeutics exploiting epigenetic mechanisms becomes obvious. Through
the years, many tools and techniques have been developed to investigate the components,
capabilities, and roles of epigenetic processes, a select few of which are discussed below.
THE STUDY OF GENOME-WIDE DNA METHYLATION
DNA methylation is studied through different combinations of enrichment and
analytical techniques. Understanding how genomic methylation profiles vary in disease
states is paramount to developing therapeutic treatments. Current methods used to study
the genome-wide methylation levels of CpG sites involve the capture and enrichment of
DNA, and can be divided into three categories: (1) restriction endonuclease-based
methods; (2) affinity capture-based techniques; and (3) bisulfite conversion-based
methods.
Enzymes whose cleavage depends on the methylation state of their recognition sequence
are often used to study the methylation profile of DNA. These enzymes are referred to as
methylation-sensitive restriction enzymes. Methylation analysis can be done using
isoschizomer pairs of enzymes (which recognize and cut a single recognition sequence
identically), in which one enzyme, such as HpaII, is sensitive to methylation (it cleaves
only when the DNA is unmethylated) and the other, such as MspI, is methylation insensitive.
Neoschizomer pairs of enzymes are also used to evaluate DNA
methylation. In a neoschizomer pair, both enzymes bind to a single recognition sequence
but cleave at different sites. One enzyme of the pair, such as SmaI, cleaves
only when the DNA is unmethylated and is paired with a methylation-insensitive enzyme,
such as XmaI. With either isoschizomer or neoschizomer pairs, the DNA fragments
generated by the methylation-sensitive enzyme differ in size from the
fragments generated by the methylation-insensitive enzyme. Cytosine methylation can then
be estimated by calculating the ratio of the different DNA fragments (Brunner et al., 2009).
Although patterns of cutting can provide a read-out of DNA methylation, this approach is
limited in resolution and coverage by the sequence and modification-type specificity of the
enzymes available (Kriukiene et al., 2013).
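One simple way to turn an isoschizomer digest into a methylation estimate is sketched below. It assumes, as an illustration rather than as the method of the cited studies, that the methylation-insensitive enzyme (for example MspI) cuts every copy of a site while the methylation-sensitive enzyme (for example HpaII) cuts only the unmethylated copies, so that the ratio of the two cut signals approximates the unmethylated fraction. The function name and the input values are hypothetical.

```python
def methylation_from_isoschizomers(sensitive_cut: float, insensitive_cut: float) -> float:
    """Estimate the methylated fraction at a restriction site.

    sensitive_cut   -- signal from fragments cut by the methylation-sensitive
                       enzyme (e.g. HpaII); assumed to reflect unmethylated copies
    insensitive_cut -- signal from fragments cut by the methylation-insensitive
                       enzyme (e.g. MspI); assumed to reflect all copies
    """
    if insensitive_cut <= 0:
        raise ValueError("insensitive_cut must be positive")
    if sensitive_cut < 0:
        raise ValueError("sensitive_cut must be non-negative")
    # Clamp to 1.0 so measurement noise cannot yield a negative estimate.
    unmethylated = min(sensitive_cut / insensitive_cut, 1.0)
    return 1.0 - unmethylated
```

With a hypothetical sensitive-enzyme signal of 25 and an insensitive-enzyme signal of 100, the estimate is 0.75, meaning roughly three quarters of the copies of that site would be read as methylated.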
In affinity-based methods such as MeDIP (Methylated DNA Immunoprecipitation),
an antibody capable of recognizing 5mC is used to
immunoprecipitate the methylated fraction of the genome (Weber et al., 2005). MeDIP
enriches for the methylated regions of the DNA. This technique is limited by its inability to
accurately resolve regions with low to medium CpG density in the genome, such as CpG
islands (Laird, 2010). Since the majority of highly methylated regions are repetitive
elements, the majority of what is enriched will be repetitive elements.
Lastly, bisulfite conversion coupled with sequencing is presently the gold standard
for measuring genome-wide methylation levels at CpG dinucleotides due to its ability to map
5-modified cytosines with single-base resolution (Laird, 2010). Genomic DNA is denatured
and treated with sodium bisulfite, leading to the conversion of unmethylated cytosines to
uracils through a sulphonation reaction (Ball et al., 2009), with uracils copied as thymines
by DNA polymerase during subsequent PCR. Essentially, bisulfite conversion changes an
epigenetic difference into one of sequence, allowing methylation status
information to be retained even after amplification (Laird, 2010). For example, a
luciferase-based sequencing approach known as pyrosequencing is often used to sequence
bisulfite-converted DNA. The drawbacks of bisulfite conversion are the high potential for
DNA degradation and incomplete conversion. Bisulfite conversion is also limited by the
fact that it cannot distinguish between 5mC and other modifications such as 5hmC (Khare
et al., 2012).
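The way bisulfite conversion turns a methylation difference into a sequence difference can be sketched in a few lines. The example below is a simplified illustration that models a single strand, assumes complete conversion and no DNA degradation, and treats every protected cytosine as 5mC (in practice 5hmC is also protected). The function names and the read counts in the usage note are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Sequence read after bisulfite treatment and PCR for one strand:
    unmethylated C is read as T, methylated C (0-based index listed in
    `methylated_positions`) is still read as C."""
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # C -> U by bisulfite, copied as T in PCR
        else:
            converted.append(base)
    return "".join(converted)


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Methylation level at one cytosine: reads showing C over all reads."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return c_reads / total
```

For instance, `bisulfite_convert("ACGTCG", {1})` keeps the methylated cytosine at index 1 and converts the unmethylated one at index 4, and a site covered by 30 C reads and 10 T reads would be scored as 75% methylated.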
SEQUENCING PLATFORMS
DNA sequencing, in the context of DNA methylation analysis, provides sequence
information about methylation profiles. With the advent of sequencing, it is now possible to
identify the exact genes and genomic regions affected by methylation, providing increased
insight into the role of methylation in genomic regulation.
HYBRIDIZATION SEQUENCING
Hybridization sequencing refers to the use of microarrays to quantify mRNA or DNA
levels. At present, our laboratory uses Human Tiling Arrays 2.0R from Affymetrix to
investigate epigenetic variation between disease and control populations. The
oligonucleotides used in Human Tiling Arrays 2.0R span the genome, and are 25
nucleotides in length. Hundreds of thousands to millions of copies of a single
oligonucleotide are grouped together in a specific area on the array, in what is called a
probe cell. Each array contains over 6.5 million probe cells fixed to the array surface with
each designed to perfectly match specific genomic regions. A major advantage of
microarray technology is that thousands of genes can be interrogated simultaneously, by a
single array. There is a gap of 10 base pairs between the oligonucleotide sequences of each
probe, offering an average resolution of 35 base pairs as measured from the central
position of adjacent probes. The probes do not include sequences identified as
repetitive regions or low-complexity DNA sequences by the software program
RepeatMasker (Tarailo-Graovac & Chen, 2009). Currently, over 56% of human genomic
sequences are identified and masked by RepeatMasker. It is essential to remove repetitive
elements from the microarray probes, as their repetitive nature produces a noisy and
uninterpretable signal.
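The 35 base pair resolution quoted above follows directly from the probe geometry: a 25-nucleotide probe plus a 10 base pair gap gives 35 base pairs between the centres of adjacent probes. The sketch below reproduces that arithmetic and lays out probe start positions across a region; the function names and the 100 base pair example region are hypothetical, and repeat masking is ignored.

```python
def centre_to_centre_spacing(probe_length: int = 25, gap: int = 10) -> int:
    """Distance between the central positions of adjacent probes."""
    return probe_length + gap


def probe_starts(region_start: int, region_end: int,
                 probe_length: int = 25, gap: int = 10) -> list:
    """0-based start positions of probes tiled across [region_start, region_end),
    keeping only probes that fit entirely inside the region."""
    step = centre_to_centre_spacing(probe_length, gap)
    last_valid_start = region_end - probe_length
    return list(range(region_start, last_valid_start + 1, step))
```

Tiling a hypothetical 100 base pair region with the default values places probes at positions 0, 35 and 70, each separated by the 35 base pair spacing.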
In order to begin microarray analysis, target DNA must first be enriched so enough
DNA is available for microarray probe binding. To study epigenetic variation, our lab has
employed restriction enzyme or immunoprecipitation-based enrichment techniques. The
target DNA is first amplified using PCR, with uracil nucleotides incorporated into the
resulting amplicons. These uracil nucleotides can then be recognized by uracil DNA
glycosylase (UDG). UDG cleaves the N-glycosidic bond between the uracil base and the
sugar backbone, creating an apyrimidinic site that blocks DNA polymerase from extending
the strand with additional nucleotides (Barzilay, Walker, Robson, & Hickson,
1995). An apyrimidinic site has a terminal 5'-phosphate, recognized and cleaved by
apurinic/apyrimidinic endonuclease (APE1) activity in the BER pathway (Barzilay et al.,
1995; Marenstein, Wilson, & Teebor, 2004). APE1 cleavage generates a single-strand DNA
break, fragmenting the enriched amplicons to an average length of 25-100 nucleotides; this
fragmentation improves both the efficiency and specificity of target binding (Dalma-Weiszhausz,
Warrington, Tanimoto, & Miyada, 2006). The resultant fragments are end-labeled
with a biotinylated nucleotide analogue by terminal deoxynucleotidyl transferase (TdT).
This biotin tag is the binding site for the fluorescent label applied after hybridization.
Subsequently, bound DNA fragments are labeled with a fluorescent streptavidin-phycoerythrin
conjugate (SAPE), which binds to the biotin tag added during TdT labeling.
Once labeled, the array is ready to be scanned. The scanner is able to
identify 65,000 distinct fluorescence intensities, and converts fluorescent measurements
into an electrical signal expressed as a corresponding numerical value. These values
are used to quantify DNA levels (Dalma-Weiszhausz et al., 2006). Data from
the microarrays are analyzed by comparing the relative fluorescent signal intensity from
fragment-bound probes. If, for example, DNA was enriched for unmethylated regions, then
high levels of fluorescence emitted from a particular probe cell would indicate a lesser
degree of methylation from the region covered by the oligonucleotides, relative to other
probe cells.
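As an illustration of comparing relative signal between probe cells, the following minimal sketch expresses each probe intensity relative to the array median on a log2 scale. The median-centring step and the probe names are assumptions made for the example and do not represent the analysis pipeline used in our laboratory.

```python
import math
from statistics import median


def log2_relative_signal(intensities):
    """Map probe-cell name -> log2(intensity / median intensity of the array).

    Assumption: median-centring is used as a simple reference point;
    real array analyses apply more elaborate normalization.
    """
    med = median(intensities.values())
    return {probe: math.log2(value / med) for probe, value in intensities.items()}


# Hypothetical intensities from an array hybridized with DNA enriched for
# unmethylated regions: a higher value suggests less methylation.
example = {"probe_A": 8000, "probe_B": 2000, "probe_C": 500}
relative = log2_relative_signal(example)
```

In this hypothetical case, probe_A has a positive value (relatively less methylated) and probe_C a negative value (relatively more methylated) compared with the array median.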
PYROSEQUENCING
Pyrosequencing was the first parallelized sequencing platform to be made
available as a commercial product (Margulies et al., 2005). This
luciferase-based technique, traditionally used to investigate single nucleotide
polymorphisms (SNPs) (Fakhrai-Rad, Pourmand, & Ronaghi, 2002), can be used to
determine the relative extent of CpG methylation when combined with bisulfite conversion.
Biotinylated primers are designed to interrogate CpG sites on a bisulfite-converted
template. Since bisulfite-converted DNA is more fragile than genomic DNA, amplicon
lengths are usually kept to 200 bp or less; the biotinylated amplicons are separated from
solution using streptavidin beads. This short amplicon length limits the size of the genomic region that
can be efficiently evaluated using pyrosequencing. In this technique, single-stranded DNA is
hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP
sulfurylase, luciferase, and apyrase, along with the substrates adenosine 5' phosphosulfate
(APS), luciferin and the four deoxynucleotides: dCTP, dGTP, dTTP, and dATPαS (dATPαS
replaces dATP, as the latter is a substrate for luciferase and would generate non-specific
signals). The immobilized single-stranded DNA is sequenced through the synthesis of the
complementary strand, one base at a time. When a nucleotide is incorporated by DNA
polymerase, a pyrophosphate (PPi) is released at 1:1 stoichiometry and subsequently
converted to ATP by ATP sulfurylase. This ATP provides the energy for luciferase to
oxidize luciferin and generate light. This visible light, proportional to the amount of ATP
produced, is detected by a camera and recorded as the measurement. The final step of the
reaction is the degradation of unincorporated nucleotides by apyrase. This methodology
allows determination of whether CpG sites within the amplicon are methylated
(Fakhrai-Rad et al., 2002).
The DNA methylation level determined for a CpG site is an
average across all templates present in a single PCR reaction. As a result,
pyrosequencing can accurately determine the amount of methylation above a threshold
level of approximately 5% at each CpG position (Metzker, 2010). When the reverse PCR
primer is biotinylated, the reverse strand is used as the template strand and the
pyrosequencing primer will be extended according to the base composition of the forward
strand. In this case, the relative amount of methylation at the CpG position is determined by
the ratio of incorporated dCTP and dTTP, for methylated and unmethylated CpG sites
respectively. Similarly, if the forward PCR primer is biotinylated, the forward strand is used
as a template and the sequencing primer is extended according to the base composition of
the reverse strand. The amount of DNA methylation at an individual CpG site is then
calculated from the ratio of incorporated dGTP, for methylated CpG sites, and dATPαS, for
unmethylated CpG sites (Fakhrai-Rad et al., 2002). Bisulfite conversion paired with pyrosequencing is an
efficient method to interrogate small genomic regions in a small group of samples.
However, this methodology is limited by its ineffective reading of sequences within
homopolymers (consecutive instances of the same base, such as AAA or GGG). As light
emission is non-linear following the incorporation of more than 5-6 identical nucleotides
(Ronaghi, 2001), the length of homopolymers is difficult to infer from signal intensity using
this approach (Shendure & Ji, 2008).
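The ratio described above can be written as a small calculation. This is a minimal sketch which assumes that the light signal for each nucleotide has already been converted to a comparable peak height; the function name and the example values are hypothetical.

```python
def methylation_percent(methylated_signal, unmethylated_signal):
    """Percent methylation at one CpG site from two incorporation signals.

    With a biotinylated reverse primer the signals are dCTP (methylated)
    and dTTP (unmethylated); with a biotinylated forward primer they are
    dGTP (methylated) and dATP-alpha-S (unmethylated).
    """
    total = methylated_signal + unmethylated_signal
    if total <= 0:
        raise ValueError("no signal recorded at this CpG position")
    return 100.0 * methylated_signal / total


# Hypothetical peak heights: 30 units for C and 70 units for T.
site_methylation = methylation_percent(30, 70)
```

Because the value is an average over all templates in the reaction, a result of 30% does not distinguish between 30% of cells being fully methylated and all cells being partially methylated at that site.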
NEXT GENERATION SEQUENCING
Sequencing platforms capable of sequencing multiple samples in parallel are
referred to as ‘next-generation sequencing platforms’. Cyclic reversible termination is one
such method, sequencing DNA with single-nucleotide resolution. In this
reaction, DNA polymerase incorporates a fluorescently modified nucleotide which is
complementary to the template base (Metzker, 2010). After the nucleotide is incorporated,
the polymerase activity is terminated and the remaining unincorporated nucleotides are
washed away. An image is then obtained to determine the identity of the incorporated
nucleotide. The fluorescent dye is then cleaved along with the terminating/inhibiting group
of the nucleotide, and an additional wash is performed before the next incorporation step
begins (Metzker, 2010). Reversible termination is achieved by using deoxyribonucleoside
triphosphates (dNTPs) whose 3'-OH group is blocked, preventing further extension
until the blocking group is removed (J. Guo et al., 2008).
Next generation sequencing produces output sequences referred to as “reads”.
Through data analysis, these reads are mapped back to a reference genome. Bioinformatic
programs such as Maq and Bowtie align reads to the regions of the genome from which they
most likely originated (Trapnell & Salzberg, 2009). As with affinity-based methods, however,
analysis becomes more challenging when reads are generated from sequences containing
one or more repetitive elements of the reference genome,
as the mapping program cannot accurately determine which copy of the repeat the sequence
originated from (it may report multiple locations, or choose one at random)
(Trapnell & Salzberg, 2009). The program bisReadMapper is a new tool used for the
analysis of bisulfite converted sequencing data. This program maps bisulfite-converted
reads to a fully bisulfite-converted genome sequence, allowing for reads to be mapped not
only to their correct region in the genome, but to their strand of origin as well (Diep et al.,
2012) – a feature not offered by hybridization sequencing at present.
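The idea of a fully bisulfite-converted reference can be illustrated with a short sketch. It shows only the general principle of in silico conversion and is not the bisReadMapper implementation; the function names and the example sequence are hypothetical.

```python
def convert_forward_strand(seq):
    """Fully converted forward strand: every C is read as T."""
    return seq.upper().replace("C", "T")


def convert_reverse_strand(seq):
    """Fully converted reverse strand, written in forward-strand coordinates.

    A C-to-T change on the reverse strand appears as G-to-A on the forward strand.
    """
    return seq.upper().replace("G", "A")


reference = "ACGTCG"
forward_converted = convert_forward_strand(reference)
reverse_converted = convert_reverse_strand(reference)
```

A read that matches the C-to-T version of a region is consistent with the forward strand, while a read that matches the G-to-A version is consistent with the reverse strand, which is how the strand of origin can be inferred under this simplified model.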
Next generation sequencing allows for millions of DNA templates to be sequenced in
parallel, producing gigabytes of data; however, large scale whole genome sequencing is still
cost and time prohibitive (Metzker, 2010). In order to minimize cost and maximize
efficiency, 'target-enrichment' methods have been developed in which targeted genomic
regions of interest are selectively captured from the DNA sample before sequencing.
Sequencing targeted regions of the genome as opposed to the entire genome is not only
more time and cost effective, but produces data which can be more easily analyzed
(Metzker, 2010). Table 1 outlines the parameters, in addition to cost and ease of use,
which provide measures of the performance of target enrichment techniques (Metzker,
2010).
Table 1. Parameters used to measure the performance of the target enrichment techniques.
Parameter        Definition
Sensitivity      The percentage of target bases that are covered by one or more sequence reads
Specificity      The percentage of sequences that map to the targeted regions
Uniformity       Variability in sequence coverage across target regions
Reproducibility  How closely results from replicate experiments correlate
DNA Required     The minimum amount of DNA required per experiment
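Three of the parameters in Table 1 can be computed directly from sequencing output, as sketched below. The use of the coefficient of variation as the uniformity measure is an assumption made for illustration, since Table 1 defines uniformity only as variability in coverage; the function names and example values are hypothetical.

```python
from statistics import mean, pstdev


def sensitivity(target_depths):
    """Percentage of target bases covered by one or more sequence reads."""
    covered = sum(1 for depth in target_depths if depth >= 1)
    return 100.0 * covered / len(target_depths)


def specificity(reads_on_target, total_mapped_reads):
    """Percentage of mapped reads that fall within the targeted regions."""
    return 100.0 * reads_on_target / total_mapped_reads


def coverage_cv(target_depths):
    """Coefficient of variation of depth (assumed uniformity measure).

    Lower values indicate more uniform coverage across the targets.
    """
    return pstdev(target_depths) / mean(target_depths)


# Hypothetical per-base read depths across four target bases.
depths = [0, 10, 20, 10]
example_sensitivity = sensitivity(depths)
example_specificity = specificity(800, 1000)
example_cv = coverage_cv(depths)
```

In this example three of the four target bases are covered, giving a sensitivity of 75%, and 800 of 1000 mapped reads are on target, giving a specificity of 80%.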
ENRICHMENT TECHNIQUES FOR NEXT GENERATION SEQUENCING
MTAG
Recently, a novel strategy termed ‘methyltransferase-directed transfer of activated
groups’ (mTAG) has been developed. This approach relies on the covalent
chemical labeling and enrichment of unmethylated DNA (Kriukiene et al., 2013;
Lukinavicius et al., 2007), and proceeds in five steps as outlined in Table 2 (Kriukiene et al.,
2013).
Table 2. Major steps involved in the mTAG technique.
Step 1  Mechanical shearing of genomic DNA to 200 bp fragments
Step 2  Methyltransferase-directed azide labeling of unmethylated CpG dinucleotides
Step 3  Appending biotin reporters to azide group
Step 4  Affinity capture and recovery of biotin-labeled fragments on streptavidin-coated beads
Step 5  PCR amplification of the recovered fraction for microarray analysis
The advantage of using mTAG is that it enables the enrichment of unmethylated CpG
dinucleotides. In this way, the enriched portion provides information on moderately
methylated regions such as CpG islands in gene promoter regions. In addition, mTAG allows
for genome-wide interrogation of methylation levels, as opposed to the specifically targeted
regions interrogated by PCR or molecular inversion probes. Though mTAG-enriched
DNA can be sequenced alone, our lab uses mTAG coupled with hybridization sequencing
(microarrays) in order to obtain a general pattern of areas in which methylation profiles
may vary between subjects and controls. If significant variation exists in a few genomic
regions between the disease samples compared with controls, PCR coupled with
pyrosequencing is used to validate the microarray results with single base resolution. If the
microarray data demonstrate that there are multiple areas within the genome with high
variance, then molecular inversion probes coupled with next generation sequencing are
used to validate the microarray data with single base pair resolution.
POLYMERASE CHAIN REACTION
PCR can be used to amplify a specific target amplicon for subsequent sequencing.
The amount of DNA required for this technique is proportional to the desired number of
target amplicons, since each amplicon must be amplified individually (Cho et al., 1999). The
sensitivity of PCR amplification can be increased by using overlapping PCRs for a target
region. Increased sensitivity comes at the cost of higher DNA input, which is not
always available in clinical studies. Multiplexing, a technique in which multiple
primer pairs are added to the same PCR reaction, is typically avoided with
genomic DNA to prevent amplification failure, as well as non-specific
amplification caused by primer-primer interactions (Cho et al., 1999; Wang et al., 1998).
Furthermore, multiplexing is impossible when using bisulfite-treated DNA, since the
lowered complexity of the sequence enhances primer-primer interactions. Each PCR must
therefore be validated and optimized individually, making PCR an impractical enrichment
method for a high-throughput next generation sequencing platform like Illumina MiSeq
(Mamanova et al., 2010). In our lab, PCR enrichment is coupled with pyrosequencing for
analyzing only small regions of the genome with high uniformity and reproducibility.
MOLECULAR INVERSION PROBES
Bisulfite-converted DNA can also be selectively enriched through the use of
molecular inversion probes. Molecular inversion probes are an improved version of the
padlock probe first designed in 1994 (Nilsson et al., 1994). Padlock probes were originally
designed to identify SNPs through a ligase reaction, and have now been improved to allow
for the interrogation of longer sequences (Porreca et al., 2007). The probe is made up of
single stranded DNA, containing two sequences, separated by a common linker region,
which are complementary to the target genomic DNA. Molecular inversion probes anneal to
and capture target DNA through circularization for subsequent enrichment. The term
‘circularization’ refers to what happens after the probe hybridizes to the target: the two arms of the
probe hybridize to the targeted DNA sequence, DNA polymerase extends the sequence
from the first arm to the second, and the extended sequence is then ligated to the
second hybridized arm of the probe, creating a closed DNA circle, as outlined in Figure 1
(Hardenbol et al., 2003; Mamanova et al., 2010; Metzker, 2010; Porreca et al., 2007).
Remaining probes that did not circularize are digested by exonucleases in the subsequent
step. Since each probe has an identical linker sequence, only one pair of primers is required
to amplify all the circularized probes in solution (Akhras et al., 2007). The forward or
reverse primer for each sample can be fitted with a unique nucleotide sequence six
nucleotides in length, which is used to differentiate the samples. In this way, resultant
amplicons from each sample will have a unique six nucleotide sequence for identification,
allowing amplicons from many samples to be pooled for sequencing. Molecular inversion
probe enrichment is a cost-effective alternative to whole genome sequencing, and is
optimal for high-throughput analyses using minimal amounts of DNA (J. B. Li et al., 2009).
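The sample barcoding described above can be sketched as a simple demultiplexing step. The example assumes that the six-nucleotide barcode sits at the start of each read and must match exactly; the barcode sequences, sample names and function name are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 6  # nucleotides, as described in the text


def demultiplex(reads, barcode_to_sample):
    """Group pooled reads by their leading six-nucleotide barcode.

    Assumptions: the barcode is the first six bases of the read and only
    exact matches are accepted. Reads with an unrecognized barcode are
    collected under the key None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        by_sample[barcode_to_sample.get(barcode)].append(insert)
    return dict(by_sample)


barcodes = {"ACGTAC": "sample_1", "TTGCAA": "sample_2"}
pooled_reads = ["ACGTACGGGTTT", "TTGCAACCCAAA", "ACGTACTTTT", "GGGGGGAAAA"]
assigned = demultiplex(pooled_reads, barcodes)

# Number of distinct six-nucleotide sequences available as barcodes.
possible_barcodes = 4 ** BARCODE_LENGTH
```

A six-nucleotide sequence allows up to 4096 distinct barcodes, although in practice fewer would be used so that a single sequencing error cannot convert one barcode into another.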
Figure 1 Schematic of library-free BSPP protocol. Each padlock probe has a common linker
sequence flanked by two target-specific capturing arms (red) that anneal to bisulfite converted
genomic DNA (black). The 3′ end is extended and ligated with the 5′ end to form circularized
DNA. After removal of linear DNA, all circularized captured targets are PCR-amplified with
barcoded primers and directly sequenced with an Illumina sequencing platform (GA II(x) or
HiSeq). Amplicon size is 363 bp, which includes captured target (180 bp), capturing arms (55
bp), and amplification primers and adapters (128 bp). The inserts can be read through with
paired-end 120 bp sequencing reads. Reprinted with copyright license (Diep et al., 2012).
In addition, molecular inversion probes can be multiplexed; that is, tens of
thousands of genomic loci can be enriched simultaneously (Diep et al., 2012; Hardenbol et
al., 2003; Porreca et al., 2007; Shen et al., 2013). A major challenge with molecular
inversion probes has been the reproducibility of probe hybridization. When molecular inversion
probes were used to enrich for exon regions of genomic DNA, they
performed with low sensitivity and low reproducibility. The authors reported that in
duplicate experiments, only 20% of targets were captured in both data sets, and only 11%
were exon regions. Probe design has since been improved to increase target
amplification. Diep et al. (2012) were able to sequence 330,000 probes that covered
140,749 non-overlapping regions with a total size of 34 megabases. Due to the large
genomic areas which can be targeted and amplified using padlock probes, this technique
couples well with a next generation sequencing platform such as Illumina MiSeq.
EPIGENETIC MECHANISMS: LACTASE EXPRESSION IN MICE
The lactose provided by a mother’s milk is a major dietary source of carbohydrates
for mammals in the first few weeks of life. Lactose is digested by the enzyme lactase, which
is produced exclusively by the enterocytes of the small intestinal epithelium. As
offspring age, the small intestine develops in preparation for the consumption of solid food,
a process characterized by changes in enzymatic expression. Generally, lactase expression decreases
as the expression of digestive enzymes such as sucrase-isomaltase (sucrase) increases (Jost,
Duluc, Vilotte, & Freund, 1998). Initially, lactase is expressed in highest levels in the jejunal
segment of the small intestine, but expression rapidly decreases just prior to weaning in
most mammals. Sucrase has a contrasting temporal pattern of expression compared with
lactase, in that it is expressed minimally at birth, and then drastically increases around
weaning. Coincidentally, the expression of sucrase is also highest in the jejunal segment of
the small intestine (L. Fang, Ahn, Wodziak, & Sibley, 2012).
The onset of decreases in lactase expression and the coinciding increase in sucrase
expression can be influenced by nutritional and hormonal changes at the time of weaning.
For example, the decrease in lactase expression can be delayed or accelerated by the
delayed or premature introduction of solid food, respectively (Jost et al., 1998). In addition,
the enzymatic changes of the small intestine can be induced by stress, which also
accelerates the decrease in lactase expression (Jost et al., 1998). Previously, it has been
investigated whether the temporal patterns of lactase and sucrase expression could be
regulated by transcription factors capable of regulating the expression of intestinal lactase
and sucrase genes (R. Fang, Olds, & Sibley, 2006). A number of transcription factors which
activate lactase and sucrase, including Cdx-2 (R. Fang et al., 2006; Suh, Chen, Taylor, &
Traber, 1994), GATA-4, GATA-5, and GATA-6 (R. Fang, Olds, Santiago, & Sibley, 2001), along
with HNF1α and HNF1β (R. Fang et al., 2006), have been examined in the small intestine of
young and old mice, and compared with the expression of the repressing transcription factors
PDX-1 and CDP. No clear pattern of transcription factor expression has yet been identified
to account for the contrasting temporal patterns of lactase and sucrase expression.
However, spatial and temporal expression patterns may be due to a distinct combination of
regulatory transcription factors acting in concert (R. Fang et al., 2006).
Intestinal epithelial cells are essential for food absorption and digestion.
Enterocytes account for 90% of all epithelial cells and are responsible for the production of
lactase (Sheaffer et al., 2014). The intestinal epithelium is constantly maintained through
the division of stem cells, which differentiate as they migrate from the intestinal crypt
region to the villus. This process takes 3-5 days in both mice and humans, and allows for
function in the intestinal epithelium to be continually maintained (Sancho, Batlle, &
Clevers, 2004; Sheaffer & Kaestner, 2012).
Epigenetic mechanisms, specifically DNA methylation, have been shown to be
critical for proper gene expression during stem cell differentiation in self-renewing tissues,
such as the small intestine (Sen et al., 2010; Smith & Meissner, 2013), and the functional
role of DNA methylation in intestinal epithelial differentiation has been recently evaluated
(Sheaffer et al., 2014). Interestingly, global interruption of DNMT1 activity in the small
intestine resulted in a decrease in differentiated enterocytes (Sheaffer et al., 2014). As a
result, expression of enterocyte-specific genes, such as lactase, was decreased. Thus, DNA
methylation plays a regulatory role not only in the differentiation of stem cells into
enterocytes, but also in the expression of the gene products (such as lactase) for which they are responsible
(Sheaffer et al., 2014).
AIM OF THE STUDY- EPIGENETIC MECHANISMS IN LACTASE EXPRESSION IN MICE
As described above, the temporal pattern of lactase expression is affected by
environmental stimuli, such as the introduction of complex adult food or stress. In addition,
the post-weaning decrease in lactase expression coincides with an increase in sucrase
expression, and the mechanism of this counter-expression has yet to be fully understood.
Since lactase expression is influenced by environmental factors and a significant decrease
occurs at a specific developmental stage, it is possible that regulation of lactase expression
is mediated by an epigenetic mechanism. As such, the overall aim of this study was to
determine whether DNA methylation levels vary between pre-weaned and post-weaned
mice and whether methylation levels are associated with levels of lactase expression. The
first step in determining how methylation levels correlate with lactase expression was to
examine the change in lactase expression in young and old mice.
EPIGENETIC MECHANISMS: LACTOSE INTOLERANCE IN HUMANS
Lactose is hydrolyzed by enterocytic lactase into two absorbable sugars, glucose and
galactose, within the small intestine. Lactase activity is highest during the perinatal period,
but significantly decreases in some individuals after 2-12 years of age. Those who
experience this decrease in lactase expression with age lose their ability to digest lactose,
developing the condition called lactose intolerance or hypolactasia (Troelsen, 2005). In
contrast, lactase-persistent individuals retain neonatal levels of lactase activity throughout
adulthood, allowing these adults to consume lactose without negative effects (Rasinpera et
al., 2005; Troelsen, 2005). Lactase persistence shows dominant inheritance and complete
phenotype concordance in monozygotic twins (Metneki, Czeizel, Flatz, & Flatz, 1984). Yet in
those who are lactose intolerant, lactose is fermented by the colonic microflora, creating
short-chain fatty acids, hydrogen, carbon dioxide and methane. These byproducts
unfortunately cause bloating, flatulence, diarrhea and abdominal pain (Lomer, Parkes, &
Sanderson, 2008).
Numerous tests may be used to diagnose lactose intolerance. The Quick Lactase Test
(QLT) is a biochemical assay performed on a duodenal biopsy sample to diagnose the
intolerance. A glucose oxidase reagent is used in the assay to measure the amount of
glucose released after the hydrolysis of lactose (Kuokkanen et al., 2006). Lactase activity
can then be assessed by the color change of the reaction: a no-color reaction indicates
severe hypolactasia, with lactase activity of <10 U/g; light blue reactions indicate mild
hypolactasia corresponding to lactase activity of 10-30 U/g; and finally, normal lactase
activity of >30 U/g corresponds to a deep blue reaction color (Furnari et al., 2013). Invasive
jejunal biopsies were replaced with the less-invasive endoscopic duodenal biopsies for the
purposes of the QLT, which facilitated patient diagnosis (Furnari et al., 2013; Kuokkanen et
al., 2006). Although the mean lactase activity in the duodenum is 40% lower than in the
jejunum, this test was still effective at identifying patients with severe hypolactasia with
100% accuracy (Kuokkanen et al., 2006). Alternatively, tests that indirectly measure
lactase function have been developed to avoid the need for intestinal biopsies. In one such
test, blood glucose levels are measured before oral lactose ingestion and at specific time
intervals afterwards. The individual is considered lactose tolerant if there is a
minimum blood glucose rise of 20 mg/dL (Law, Conklin, & Pimentel, 2010). As another
example, the hydrogen breath test was considered to be the most suitable test for
population screening of lactose intolerance (Mattar, de Campos Mazo, & Carrilho, 2012;
Newcomer, McGill, Thomas, & Hofmann, 1975). For this test, an individual orally consumes
50 g of lactose, which is the equivalent of 4-5 cups of milk. If the individual is lactose
intolerant, the undigested lactose ferments, releasing hydrogen, carbon dioxide, and
methane, which are absorbed and eliminated by the lungs. High levels of hydrogen
exhalation after lactose consumption therefore indicate that an individual is lactose
intolerant (Law et al., 2010; Mattar et al., 2012; Newcomer et al., 1975). Genetic testing for
the lactase expressing phenotype is now used to screen populations for lactose intolerance
(Mattar et al., 2012).
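The numerical cut-offs quoted above for the QLT and the blood glucose test can be expressed as simple decision rules. This sketch is illustrative only: the handling of values falling exactly on a boundary (10 or 30 U/g) is an assumption, the function names are hypothetical, and in practice the QLT result is read from the reaction color rather than from a measured activity.

```python
def classify_qlt(lactase_activity_u_per_g):
    """Map lactase activity (U/g) to the QLT categories quoted in the text.

    Assumption: values of exactly 10 or 30 U/g are placed in the mild category.
    """
    if lactase_activity_u_per_g < 10:
        return "severe hypolactasia"   # no color reaction
    if lactase_activity_u_per_g <= 30:
        return "mild hypolactasia"     # light blue reaction
    return "normal lactase activity"   # deep blue reaction


def lactose_tolerant_by_glucose(baseline_mg_dl, peak_mg_dl, min_rise=20):
    """Lactose tolerant if blood glucose rises by at least 20 mg/dL."""
    return (peak_mg_dl - baseline_mg_dl) >= min_rise


qlt_results = [classify_qlt(activity) for activity in (5, 20, 45)]
glucose_result = lactose_tolerant_by_glucose(90, 115)
```

For the hypothetical values shown, activities of 5, 20 and 45 U/g fall into the severe, mild and normal categories respectively, and a rise from 90 to 115 mg/dL meets the 20 mg/dL criterion.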
The lactase gene in humans, LCT, is 49.3 kb in length and is located on the long (q) arm
of chromosome 2 at position 21. LCT is made up of 17 exons, which are transcribed into a 6 kb
transcript (Boll, Wagner, & Mantei, 1991). The RNA sequence of this 6 kb transcript is the
same (barring some silent mutations) in individuals with either hypolactasia or lactase
persistence (Boll et al., 1991). However, two DNA variants have been identified upstream of LCT,
within introns of the MCM6 gene, which are highly associated with lactase persistence in subjects of European
descent (Enattah et al., 2002; Troelsen, 2005). The first variant, LCT-13910*C/T, is located
in intron 13 of the MCM6 gene, 13,910 bp upstream from the initiation codon for LCT. In
subjects of European descent, simply having a single LCT-13910*T allele allowed for the
lactase persistent phenotype (Mattar et al., 2012; Rasinpera et al., 2005; Troelsen, 2005).
This indicates that the lactase persistence allelic variant behaves in a dominant fashion, and
thus the LCT-13910*CC genotype, in which no lactase persistent allele is present, is
consistent with the inability to digest lactose (Mattar et al., 2012). The second variant,
LCT-22018*G/A, occurs in intron 9 of the MCM6 gene, 22,018 bp upstream of the LCT
start codon. This second variant was very strongly, but not completely, associated with the
lactase persistence phenotype. Although the second variant’s association with the lactase
persistence phenotype is incomplete, there is almost full agreement in the genotyping of
the two variants. For example, individuals with a cytosine on both copies of chromosome
two at the first SNP site (LCT-13910*CC) also had a guanine at the second SNP site (LCT-22018*GG),
and individuals who were heterozygous at the first SNP site (LCT-13910*CT)
were also heterozygous at the second SNP site (LCT-22018*GA). Individuals with the
lactose-tolerant phenotype who had a thymine nucleotide at the first SNP site on both copies of
chromosome two (LCT-13910*TT) also had an adenine at the second SNP site on both copies
of chromosome two (LCT-22018*AA) (Rasinpera et al., 2005). These two
genetic variants are only associated with lactase persistence in individuals of Northern
European descent. In contrast, LCT-13907*G, LCT-13915*G, and LCT-14010*C are SNPs
which are associated with lactase persistence in African and Arabian populations (L. Fang
et al., 2012; Ingram et al., 2007).
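The dominant behaviour of the LCT-13910*T allele and the near-complete agreement between the two SNP sites can be summarized in a short sketch. It applies only to the European-descent association described above, treats the agreement between sites as complete for simplicity, and uses hypothetical function names.

```python
def predicted_phenotype(genotype_13910):
    """Predict lactase phenotype from the LCT-13910 genotype (CC, CT or TT).

    A single T allele is taken to be sufficient for persistence (dominant model).
    """
    alleles = genotype_13910.upper()
    if len(alleles) != 2 or set(alleles) - {"C", "T"}:
        raise ValueError("genotype must be two alleles drawn from C and T")
    return "lactase persistent" if "T" in alleles else "lactase non-persistent"


def expected_22018_genotype(genotype_13910):
    """Expected LCT-22018 genotype, assuming full agreement between the sites.

    The C allele at -13910 travels with G at -22018, and T travels with A.
    """
    return genotype_13910.upper().translate(str.maketrans("CT", "GA"))


phenotypes = {g: predicted_phenotype(g) for g in ("CC", "CT", "TT")}
linked = {g: expected_22018_genotype(g) for g in ("CC", "CT", "TT")}
```

Under these assumptions only the CC genotype is predicted to be non-persistent, and the expected LCT-22018 genotypes are GG, GA and AA for CC, CT and TT respectively.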
Although SNPs may be correlated with lactase non-persistence, there is substantial
variability in lactase expression between those who are homozygous non-persistent
(LCT-13910*CC), heterozygous (LCT-13910*CT), and homozygous persistent (LCT-13910*TT)
(Troelsen, 2005). Previous research indicates that the LCT gene resides in a region with
high ‘linkage disequilibrium’ which extends over several hundred kilobases (Poulter et al.,
2003), suggesting that the key SNPs are merely highly associated markers rather than
causal agents (Swallow, 2003). Linkage disequilibrium refers to a combination of alleles or
genetic markers that do not undergo random recombination and are present at a higher
frequency in the population than would be expected by chance. In vitro studies have
demonstrated that lactase persistence variants may act as enhancers of the LCT promoter
by improving the binding of Oct-1 and HNF1α transcription factors. Transgenic mice have
previously been used to determine how these lactase persistence-associated SNPs affect
lactase expression in vivo (L. Fang et al., 2012). These mice carried a luciferase reporter gene driven
by a 2 kb rat lactase gene promoter fused with portions of human DNA sequence
corresponding to either the lactase persistence-associated LCT-13910*T SNP or the
non-persistence-associated LCT-13910*C SNP. It was found that the transgene expression
followed the same expression pattern as endogenous lactase, expressed only in the
enterocytes of the small intestine with highest expression occurring in newborn pups,
followed by a sharp post-weaning decline (Lee et al., 2002). In this study, there was a 16-fold
decrease in luciferase activity in adult mice with the LCT-13910*C SNP compared with
pups, whereas adult mice with the LCT-13910*T SNP experienced a ~1.6-fold increase in
luciferase expression compared with pups. These results support the causal role of the
LCT-13910*C/T SNPs in determining the lactase persistence/non-persistence phenotype,
although the mechanism by which these SNPs control lactase expression has yet to be fully
characterized (L. Fang et al., 2012).
AIM OF THIS STUDY- EPIGENETIC TECHNIQUES IN LACTOSE INTOLERANCE
The theory that lactase persistence variants improve transcription factor binding
does not fully explain hypolactasia. For instance, genetic variants cannot account for the
ontogenic delayed-age-of-onset associated with the condition. In other words, no
explanation is offered as to why lactase expression only decreases after a certain age. The
age of onset of the condition and the variability in lactase downregulation between the
three genotypes support the hypothesis that additional factors, likely epigenetic,
contribute to the specifics of lactase expression (Swallow, 2003).
The aim of the current
study was therefore to evaluate the use of the molecular inversion probe technique to enrich unique
and segmentally duplicating regions of DNA, with the goal of validating previous data
obtained through microarray analysis.
HYPOTHESES AND RATIONALE
HYPOTHESIS 1
Lactase expression decreases in 60 day old mice, as compared with 6 day old mice.
RATIONALE 1
Developmental changes in rodent lactase expression have been well investigated.
Our objective was to demonstrate that the developmental decrease in lactase expression is
governed in part by an epigenetic mechanism. To that end, we first performed a gene
expression assay on small intestinal tissue from both 6 and 60 day old mice to
demonstrate that lactase expression decreases with age. Relative lactase gene expression in
the proximal jejunum, distal jejunum and ileum segments of the small intestine was
compared in 6 and 60 day old mice.
HYPOTHESIS 2
Molecular inversion probes were used to interrogate the methylation status of
unique and segmentally duplicating regions in the genome of experimental subjects, to
provide high resolution validation of previous microarray data.
RATIONALE 2
Dr. Viviane Labrie, of our laboratory, previously performed expression assays along
with genotyping experiments, to determine whether lactase expression was correlated
with genotype. An mTAG enrichment followed by microarray hybridization was performed,
revealing four unique regions and four segmentally duplicating regions in chromosome 2
for which the methylation profile strongly correlated with lactase expression. The
gastrokine 2 (GKN2) gene and the first LCT exon were among the unique regions in which
methylation levels correlated with lactase expression. Segmental duplications are blocks
of genomic DNA ranging from 1-200 kb in length that contain sequence features including
high-copy repeats and gene sequences with intron-exon structure. These blocks of DNA can
repeat hundreds of times throughout the genome, and can repeat inter- or intrachromosomally,
or both (Bailey et al., 2001).
Regions of segmental duplication therefore pose complications for interrogation. For
instance, when designing primers for the lactose intolerance regions, it must be kept in
mind that primers will easily anneal to the segmental duplication as it appears on several
different chromosomes in the genome. However, since the regions are only 90% identical,
amplicon sequences from each segmental duplicate will differ. This amplicon
variation will cause pyrosequencing to fail, since ~100% homology between the predicted
sequence and the amplicon is required.
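The degree of identity between two copies of a segmental duplication can be illustrated with a simple calculation. The sketch assumes two already-aligned sequences of equal length with no gaps; the function name and the ten-base example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two aligned, equal-length sequences.

    Assumption: the sequences are pre-aligned and contain no gaps.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two hypothetical duplicate copies differing at one position in ten.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
```

At 90% identity, an amplicon of 200 bp would be expected to differ from its duplicate at roughly 20 positions, which is why a single predicted sequence cannot describe the mixed amplicon pool.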
Since we were attempting to interrogate the methylation levels of regions within
segmental duplications in this experiment, next generation sequencing was necessary as it
does not require 100% sequence homology of amplicons (since each amplicon is
sequenced individually). Due to the repetitive nature of the regions of interest and limited
DNA available, molecular inversion probes were selected as the enrichment method. In
sum, this experiment evaluated the efficacy of molecular inversion probe enrichment
followed by next-generation sequencing to interrogate the methylation profiles of unique
and segmentally duplicating regions within the genome, with lactase expression serving as
our biological context.
METHODS
RNA EXTRACTION FROM TISSUE
The proximal and distal jejunum, along with the ileum of the small intestine, from
C57BL/6 strain mice (fifteen 6 day old and fifteen 60 day old) were harvested and
immediately frozen in liquid nitrogen. Samples were then stored at -80°C. RNA was
extracted from mouse small intestinal tissue with an ‘RNeasy Mini Kit’ (Qiagen). 20-30mg
of tissue was placed in a homogenization tube (Precellys 2mm beads) containing 600µl
RNeasy lysis buffer. Tissue was immediately homogenized at 5000 rpm in two 15 second
periods, spaced 10 seconds apart. The lysis solution was then transferred to the RNeasy
column, with the following steps performed according to the manufacturer’s instructions.
RNA was eluted in RNase-free water, prior to treatment with DNase I (Qiagen) for one hour
to remove DNA from the samples. Treated RNA samples were then re-purified through a
fresh RNeasy column, and again eluted in RNase-free water. RNA sample quantity was
verified using a NanoDrop 2000 spectrophotometer. The quality of the RNA was
determined by using the Agilent 2100 bioanalyzer which uses microcapillary
electrophoretic RNA separation and an algorithm to allow for the calculation of an RNA
integrity number (RIN). The RIN score ranges from 10 (completely intact) to 1 (completely
degraded). All samples had a RIN score >8. Samples were stored at -80°C until used
further.
CDNA SYNTHESIS
cDNA synthesis was performed using a High Capacity RNA-cDNA kit (Applied
Biosystems). This kit uses MultiScribe™ MuLV reverse transcriptase, which optimally
works at a temperature of 37°C. Included in the reaction are deoxynucleotide triphosphates
(dNTPs), random octamer primers, and oligo dT-16.
For each reaction, 1-2µg RNA in a total volume of 20µl was used. To each RNA sample,
10µl of the 2x reaction buffer (Applied Biosystems) was added along with 1µl of the MuLV
reverse transcriptase. Samples were gently mixed and briefly centrifuged before being
incubated at 37°C for 60 minutes, followed by heating at 95°C for 5 minutes. After the
completion of the reaction, samples were diluted to 10ng/µl and stored at -80°C until
further use.
REAL-TIME PCR
Real-time PCR allows for the amplification of products to be detected as the reaction
progresses, with measures taken at the end of each cycle. Fluorophores emit a fluorescent
signal proportionate to the amount of DNA produced, permitting data to be collected in
'real time' rather than at the end of PCR.
For our real-time PCR analysis, we used Taqman™ Probes. In addition to two
primers which are designed to amplify a specific region, there is a third oligo called a probe,
which is target specific and sits on the region to be amplified between the primers. Two
molecules, the reporter dye and the quencher, are covalently bound to the probe. The
reporter sits on the 5' end of the probe, and is what produces the fluorescent signal as more
DNA product is produced. The quencher sits on the 3' end of the probe and inhibits
(“quenches”) the signal of the reporter dye whenever they are in close proximity. During
the PCR reaction, the 5' nuclease activity of the polymerase cleaves the probe as it
replicates the amplicon, releasing the reporter dye from the quencher. This permanent
separation allows the reporter dye to emit its signal unimpeded. Since the oligo is target specific, the
detected fluorescent signal comes only from amplification of targets and not from anything
non-specifically amplified.
‘Taqman Gene Expression Mastermix’ (Applied Biosystems) was used to evaluate
transcript levels of target genes. 40ng of cDNA template was used in each PCR reaction.
Measurements were performed in triplicate, and controls included both non-template
controls and samples which had not been treated with reverse transcriptase.
Amplification curves and gene expression were normalized to the housekeeping gene
GAPDH. The expression of each specific gene was determined using the ‘Assay-on-Demand’
gene expression products (Applied Biosystems) Lactase (Mm01285112_ml) and GAPDH
(Mm99999915_g1). Each reaction had a final volume of 20 µl, and all reactions were
performed in 384-well microtiter plates. PCR amplification and fluorescence data
collections were performed using the ViiA™ 7 Real-Time PCR System (Applied Biosystems).
PCR program parameters included a hold stage of 2 min at 50°C, denaturation for 5 min at
95°C, and 40 cycles of amplification. Amplification cycles consisted of 15 sec at 95°C,
followed by 1 min at 60°C.
HUMAN MOLECULAR INVERSION PROBE SAMPLE SELECTION
Samples to be interrogated by molecular inversion probe enrichment were selected
based on several factors. Whole blood samples were selected based on lactase activity as
measured by the QLT, to include equal numbers of human individuals with low and high
lactase activity. Age and gender were normalized across groups. Enterocyte and jejunum
samples were selected based on LCT mRNA levels as measured using quantitative PCR.
Samples were selected to include equal numbers of samples with high and low LCT mRNA.
Age and gender were normalized across groups. Samples selected are summarized in
Table 3 and Table 4.
Table 3. Summary of blood sample selection. No significant differences in age or gender existed
between samples with high vs. low lactase activity as determined by Student’s T-Test (p>0.05).

Blood             High Lactase Activity   Low Lactase Activity
Mean Age ± SEM    39.5 ± 3.79             39.5 ± 3.29
Males             8                       8
Females           4                       4
Table 4. Summary of jejunum and enterocyte sample properties used. * indicates significantly
lower lactase mRNA as indicated by Student’s T-Test (p <0.05) in jejunum samples categorized
as ‘Low LCT mRNA’, as compared with samples categorized as ‘High LCT mRNA’. No
significant differences in age or gender existed between samples with high vs. low lactase
activity as determined by Student’s T-Test (p>0.05).

                                Jejunum                            Enterocyte
                                High LCT mRNA    Low LCT mRNA      High LCT mRNA    Low LCT mRNA
Mean Relative Quantity ± SEM    0.9 ± 0.112      0.206* ± 0.038    1.25 ± 0.51      0.18 ± 0.68
Mean Age (yrs) ± SEM            39 ± 2.12        39.92 ± 2.59      35.5 ± 6.5       31.5 ± 10.5
Males                           2                2                 0                0
Females                         10               10                2                2
MOLECULAR INVERSION PROBES
Probes were designed using ppDesigner (Diep et al., 2012). Probes were designed to
span the forward and reverse strands of specific genomic regions not masked by
RepeatMasker. Ligation and extension arm lengths varied from 15 nt to 25 nt, with
the total length of both arms always equaling 40 nt. Coordinates of interrogated regions
from Genome Browser Human Genome Assembly NCBI36/hg18 are outlined in Table 5.
Table 5. Genome Browser Human Genome Assembly NCBI36/hg18 coordinates of target regions interrogated by molecular inversion probes.

Region                     Chromosome   Starting Coordinate   Ending Coordinate
Unique 1                   Chr2         69030436              69032400
Unique 2                   Chr2         136260090             136260554
Unique 3                   Chr2         136309688             136311228
Unique 4                   Chr2         141643329             141644763
Segmental Duplication 1    Chr2         130533089             130540336
Segmental Duplication 2    Chr2         130923685             130931342
Segmental Duplication 3    Chr2         131747376             131752487
Segmental Duplication 4    Chr2         131757424             131763913
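For downstream scripting, the target regions in Table 5 can be held in a simple lookup structure. The Python sketch below is illustrative only: the dictionary layout and the helper region_length are assumptions introduced here, and the length calculation assumes that both coordinates are inclusive.

```python
# Target regions from Table 5 (NCBI36/hg18 coordinates on chromosome 2).
# The dictionary layout is an illustrative assumption, not part of the original pipeline.
TARGET_REGIONS = {
    "Unique 1": ("chr2", 69030436, 69032400),
    "Unique 2": ("chr2", 136260090, 136260554),
    "Unique 3": ("chr2", 136309688, 136311228),
    "Unique 4": ("chr2", 141643329, 141644763),
    "Segmental Duplication 1": ("chr2", 130533089, 130540336),
    "Segmental Duplication 2": ("chr2", 130923685, 130931342),
    "Segmental Duplication 3": ("chr2", 131747376, 131752487),
    "Segmental Duplication 4": ("chr2", 131757424, 131763913),
}


def region_length(start: int, end: int) -> int:
    """Length in base pairs, assuming both coordinates are inclusive."""
    return end - start + 1


for name, (chrom, start, end) in TARGET_REGIONS.items():
    print(f"{name}: {chrom}:{start}-{end} ({region_length(start, end)} bp)")
```

Under the inclusive-coordinate assumption, the segmentally duplicated targets are each several kilobases long, whereas the unique targets range from a few hundred base pairs to about 2 kb.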
A total of 60 probes were synthesized by Integrated DNA Technologies,
diluted in water to a concentration of 0.5pM, and stored at -20°C. Samples
which had undergone whole genome amplification were used as controls for bisulfite
conversion. Whole genome amplification (WGA) was performed using phi29 DNA
polymerase (Thermo Scientific). Forty ng of DNA per sample was amplified using the
following procedure: DNA, exonuclease resistant primers (Thermo Scientific), and phi29
reaction buffer (Thermo Scientific) were mixed for a final sample volume of 17µl. The
reaction mix was slowly heated by incubation at starting temperature of 35°C, increasing
by 1 degree per minute for 60 minutes, reaching a final temperature of 95°C. After this step,
deoxynucleotide triphosphates (dNTPs), bovine serum albumin (Thermo Scientific),
phi29 polymerase (Thermo Scientific) and pyrophosphatase (Thermo Scientific) were
added (reaching a final volume of 20µl) and left to amplify at 30°C for 6 hours. Twenty-four
blood samples, twenty-four jejunum samples, and four samples of enterocyte DNA were
interrogated in this experiment, including four blood and two jejunum samples which were
run with a single replicate. Two blood samples were used as the WGA control. 1µg of DNA
from each sample (as well as WGA controls) was treated with sodium bisulfite using an ‘EZ
DNA Methylation-Lightning Kit’ (Zymo Research). The steps of bisulfite conversion were
performed according to the manufacturer’s instructions. Samples were then diluted in
RNase-free water to a concentration of 40ng/µl.
6µl of 0.5pM probes (IDT) were added to 10µl of bisulfite converted DNA (400ng),
along with 1x Ampligase buffer (Mandel) in a 96-well plate. Probes covering segmentally
duplicating regions were hybridized and amplified separately from probes covering unique
regions of the DNA. The reaction mix was denatured at 95°C for 10 minutes, followed by a
decrease in temperature of 1°C/minute until 55°C was reached. Samples incubated at 55°C
for 5 hours, before the temperature again decreased by 1°C/minute until 50°C was reached.
Samples again incubated for 5 hours followed by a decrease in temperature of 1°C/minute,
reaching 45°C, before incubating at this temperature for 20 hours. 2.5µl of HLN mix (18mM
dNTP, 0.5U/µl Ampligase (Epicentre) in 1x Ampligase buffer, and ~2.5U/µl of Hemo
KlenTaq (New England Biolabs)) was added to the reaction for gap-filling. For
circularization, the reactions were incubated at 45°C for 5 hours, followed by a gradual
increase in temperature of 1°C/minute until 50°C was reached. Reactions were incubated at
this temperature for 5 hours, the temperature was again increased by 1°C/minute until 55°C
was reached, and reactions were incubated at this temperature for another 5 hours. This was immediately
followed by enzyme inactivation at 94°C for 2 minutes. To digest linear DNA after
circularization, 3µl of exonuclease mix (20U/µl of exonuclease I and 200U/µl of
exonuclease III; New England Biolabs) was added to the reactions, and the reactions were
incubated at 37°C for 1 hour followed by enzyme inactivation at 94°C for 2 minutes. Linear
DNA digestion was then repeated a second time. 2µl of circularized DNA was amplified
and barcoded in 50µl reactions using 25µl ‘Phusion High-Fidelity 2x Master Mix’ (NEB)
and 4mM of forward and reverse indexing primers. Reverse primers carried barcodes six
nucleotides in length, by which individual samples could be identified. The forward primer
sequence was 5’-CAGATGTTATCGAGGTCCGAC-3’, and the reverse primer sequence was
5’-GGAACGATGAGCCTCCAAC-3’.
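As a rough check on the duration of the hybridization and circularization temperature programs described above, the hold times and 1°C/minute ramps can be summed. The Python sketch below is illustrative only: the step labels and the helpers ramp_minutes and total_minutes are assumptions introduced here, and instrument overhead and the ramp to the 94°C inactivation step are ignored.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float = 1.0) -> float:
    """Minutes needed to move between two temperatures at a constant rate."""
    return abs(end_c - start_c) / rate_c_per_min


# Hybridization program as (step label, minutes).
hybridization = [
    ("denature at 95C", 10.0),
    ("ramp 95C to 55C", ramp_minutes(95, 55)),
    ("hold at 55C", 5 * 60.0),
    ("ramp 55C to 50C", ramp_minutes(55, 50)),
    ("hold at 50C", 5 * 60.0),
    ("ramp 50C to 45C", ramp_minutes(50, 45)),
    ("hold at 45C", 20 * 60.0),
]

# Gap-filling and circularization program as (step label, minutes).
circularization = [
    ("hold at 45C", 5 * 60.0),
    ("ramp 45C to 50C", ramp_minutes(45, 50)),
    ("hold at 50C", 5 * 60.0),
    ("ramp 50C to 55C", ramp_minutes(50, 55)),
    ("hold at 55C", 5 * 60.0),
    ("inactivate at 94C", 2.0),
]


def total_minutes(program) -> float:
    """Sum the duration of every step in a program."""
    return sum(minutes for _, minutes in program)


print(f"hybridization: {total_minutes(hybridization) / 60:.1f} h")
print(f"circularization: {total_minutes(circularization) / 60:.1f} h")
```

Under these assumptions the hybridization program runs for about 31 hours and the circularization program for about 15 hours.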
Circularized DNA was amplified using the following temperature sequence: 98°C for
30 seconds, followed by 35 cycles of 98°C for 10 seconds, 58°C for 30 seconds, and 72°C for
30 seconds. The program terminated following 5 minutes at 72°C. Five µl of each product
was run on an agarose gel to ensure the product was the expected 258 base pairs in length.
Amplicons were purified using 0.7x Ampure beads to size-select for the 258 base pair
amplicon.
SEQUENCING
Once purified, the enriched amplicons were sequenced on the Illumina MiSeq platform,
which uses cyclic reversible termination to produce sequencing reads. In this method, a
fluorescently modified nucleotide complementary to the template is incorporated by a DNA
polymerase. After the nucleotide is incorporated, polymerase activity is terminated and the
remaining unincorporated nucleotides are washed away. An image is then captured to
determine the identity of the incorporated nucleotide. The fluorescent dye and the
terminating/inhibiting group of the nucleotide are then cleaved and removed by an additional
wash before the next incorporation step begins (Metzker, 2010). The Illumina MiSeq platform
provides 1.5-2 Gb of sequence per run, with an observed raw error rate of 0.80%. It
provides reads which are up to 150 bases in length from both ends of the amplicon,
producing what is known as a paired read (Quail et al., 2012).
STATISTICAL ANALYSIS
The following formulas were used to compare the relative expression of the target
gene in the samples.
ΔCT (Normalized expression) = CT (target gene) – CT (GAPDH)
ΔΔCT (Relative expression) = ΔCT (Target) – ΔCT (Reference)
Fold Difference (Relative Abundance) = 2^(-ΔΔCT)
Differences in lactase expression were analyzed using two-tailed Student’s T-Tests.
All analyses and figures were produced using Microsoft Excel 2013.
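The relative expression calculation can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet analysis used here. It assumes the common convention in which ΔCT is the target-gene CT minus the housekeeping-gene CT, so that 2^(-ΔΔCT) gives the fold difference relative to the reference group. The function names and CT values are hypothetical.

```python
def delta_ct(ct_target: float, ct_housekeeping: float) -> float:
    """Normalize the target-gene CT to the housekeeping gene (GAPDH in this study)."""
    return ct_target - ct_housekeeping


def fold_difference(delta_ct_sample: float, delta_ct_reference: float) -> float:
    """Relative abundance of a sample versus a reference group: 2^(-ddCT)."""
    delta_delta_ct = delta_ct_sample - delta_ct_reference
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values, chosen only to illustrate the arithmetic.
young = delta_ct(ct_target=24.0, ct_housekeeping=18.0)  # reference group
old = delta_ct(ct_target=25.0, ct_housekeeping=18.0)    # one cycle later than the reference

print(fold_difference(old, young))  # one extra cycle corresponds to half the transcript level
```

Because each PCR cycle ideally doubles the product, a ΔΔCT of one cycle corresponds to a two-fold difference in starting transcript abundance.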
MOLECULAR INVERSION PROBE ANALYSIS
Molecular inversion probe analysis was performed as previously described (Diep et
al., 2012). Initially, all C’s were converted to T’s in the reference genome to create a
reference ‘bisulfite’ genome. This was done separately for the Watson and Crick
strands. The sequencing reads were in FASTQ format and were encoded by predicting
the mapping orientation of each read. Once the orientation was predicted, the reads were
‘bisulfite converted’: all C’s were converted to T’s in forward mapping reads, and all G’s
were converted to A’s in predicted reverse complementary reads. SOAP2Align was used to
map the bisulfite reads to the converted genome. Once one alignment per read was
selected, the original cytosine calls were placed back into the alignment information.
Alignments were then converted to pileup format using SAMtools. bisReadMapper was
then used to call methylation frequencies.
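The in silico conversion and the restoration of cytosine calls can be illustrated with a minimal Python sketch. It is not the bisReadMapper implementation: the function names are assumptions introduced here, the example sequences are hypothetical, and the methylation call assumes an ungapped, forward-strand alignment of equal length.

```python
from typing import List, Tuple


def convert_forward(seq: str) -> str:
    """In silico bisulfite conversion of forward-mapping sequence: every C becomes T."""
    return seq.upper().replace("C", "T")


def convert_reverse(seq: str) -> str:
    """Conversion for reads predicted to be reverse complementary: every G becomes A."""
    return seq.upper().replace("G", "A")


def call_methylation(reference: str, original_read: str) -> List[Tuple[int, bool]]:
    """For each reference cytosine, report (position, methylated).

    A retained C in the original read is called methylated; a T is called unmethylated.
    """
    calls = []
    pairs = zip(reference.upper(), original_read.upper())
    for pos, (ref_base, read_base) in enumerate(pairs):
        if ref_base == "C" and read_base in ("C", "T"):
            calls.append((pos, read_base == "C"))
    return calls


reference = "ACGTCGA"  # hypothetical reference fragment
read = "ACGTTGA"       # hypothetical read: first C retained, second C converted

# After conversion the read matches the converted reference, so it can be mapped.
print(convert_forward(read) == convert_forward(reference))
# Restoring the original bases recovers the methylation state at each cytosine.
print(call_methylation(reference, read))
```

Mapping is performed on the fully converted sequences so that methylation state does not influence alignment; the original read bases are consulted only afterwards to call each site.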
CORRELATION OF METHYLATION LEVELS BETWEEN TECHNICAL REPLICATES
Correlation between technical replicates was calculated using Pearson’s correlation
coefficient on all CpG sites identified in both replicates. The methylation frequencies
obtained from bisReadMapper for CpG sites with at least a read depth of 10 in both samples
were input into the statistical package R. Pearson’s correlation for the two replicates was
computed using the cor.test function.
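The replicate comparison was computed in R; the Python sketch below is an illustrative equivalent of the depth filter and correlation step, not the analysis code used. The record layout (site mapped to methylation frequency and read depth), the function names, and the example values are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# site identifier -> (methylation frequency, read depth); layout is an assumption
Calls = Dict[str, Tuple[float, int]]


def shared_sites(rep1: Calls, rep2: Calls, min_depth: int = 10) -> Tuple[List[float], List[float]]:
    """Keep CpG sites present in both replicates with at least min_depth reads in each."""
    sites = sorted(
        s for s in rep1
        if s in rep2 and rep1[s][1] >= min_depth and rep2[s][1] >= min_depth
    )
    return [rep1[s][0] for s in sites], [rep2[s][0] for s in sites]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical replicate data; site "s4" is excluded because its depth is below 10 in replicate 1.
replicate_1 = {"s1": (0.1, 20), "s2": (0.5, 15), "s3": (0.9, 30), "s4": (0.3, 4)}
replicate_2 = {"s1": (0.2, 12), "s2": (0.6, 11), "s3": (1.0, 25), "s4": (0.8, 50)}

x, y = shared_sites(replicate_1, replicate_2)
print(len(x), round(pearson_r(x, y), 3))
```

Filtering on depth in both replicates before computing the coefficient prevents poorly covered sites, whose methylation frequencies are unstable, from dominating the comparison.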
ANALYSIS OF METHYLATION AND LACTASE EXPRESSION
Two-tailed Spearman correlation test with Bonferroni correction for multiple
comparisons was performed on each CpG site with a minimum of 10x depth coverage in a
minimum of six samples to determine whether methylation status was correlated with
lactase expression.
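The coverage filter, rank correlation, and Bonferroni threshold described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis code used: the function names and example values are hypothetical, and p-values for the Spearman coefficient are not computed here (a statistics package would be used for that step).

```python
from typing import List, Sequence


def passes_coverage(depths: Sequence[int], min_depth: int = 10, min_samples: int = 6) -> bool:
    """True if at least min_samples samples cover the site at min_depth or more."""
    return sum(d >= min_depth for d in depths) >= min_samples


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical values for one CpG site across six samples.
methylation = [0.1, 0.4, 0.2, 0.8, 0.6, 0.9]
expression = [1.2, 0.9, 1.1, 0.3, 0.5, 0.2]
depths = [12, 10, 9, 30, 11, 15, 10]

print(passes_coverage(depths))
print(round(spearman_rho(methylation, expression), 3))
print(bonferroni_threshold(0.05, 10))
```

Because the Bonferroni threshold shrinks in proportion to the number of CpG sites tested, a site that is nominally significant at p<0.05 can fail to remain significant after correction.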
RESULTS
LACTASE MRNA EXPRESSION IN MICE
The relative expression of lactase in the proximal jejunum of 6 day old mice (n=15)
and 60 day old mice (n=15) was compared. For each condition, measurements are
reported as mean ± SEM. The mean relative lactase mRNA expression level observed in
6 day old mice (1.56 ± 0.104) was significantly higher than that of 60 day old mice
(1.18 ± 0.032) (p<0.05), as determined using Student’s T-Test
(Figure 2).
We then compared the relative lactase expression levels in the distal jejunum (n=3)
and ileum (n=3) of 6 day old mice vs. 60 day old mice (jejunum n=3; ileum n=3). Lactase
expression was significantly higher in the 6 day old mice (distal jejunum=
1.03±0.18, ileum= 1.04±0.21) than in the 60 day old mice
(distal jejunum= 0.24±0.11, ileum= 0.37±0.02), as determined by Student’s T-Test
(p<0.05) (Figure 3).
Figure 2. Relative mRNA expression levels of lactase in the proximal jejunum of 6 day old mice (n=15) vs. 60 day old mice (n=15), represented as group means ± SEM. GAPDH and Actin were used as endogenous controls. The star indicates a significant difference in lactase expression (p<0.05) in 60 day old mice as compared with 6 day old mice as determined by Student’s T-Test.
[Figure 2 bar chart. X-axis: Age of Mice (6 Day, 60 Day). Y-axis: Relative mRNA Expression.]
Figure 3. Relative mRNA expression of lactase in the distal jejunum of 6 day old mice (n=3) and 60 day old mice (n=3), as well as the ileum of 6 day old mice (n=3) and 60 day old mice (n=3) represented as mean ± SEM. GAPDH and Actin were used as endogenous controls. The star indicates significant difference in lactase expression as observed using Student’s T-Test (p<0.05) in both segments of the small intestine of 60 day old mice as compared with 6 day old mice.
[Figure 3 bar chart. X-axis: Intestinal Segment (Distal Jejunum, Ileum). Y-axis: Relative mRNA Expression. Legend: 6 Day, 60 Day.]
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
There were 405 CpGs across the targeted regions, corresponding to 810
interrogated cytosines after bisulfite conversion. Of these 810 cytosines, 439 had adequate
data to be analyzed, which was defined to be at least 10x coverage in at least 6 out of a total
of 54 samples. On-target probe binding and sequencing were low across all unique regions.
On-target probe binding refers to the percentage of total sequencing reads for a target
region that mapped back to the region. Figure 4 shows the low specificity of the
sequencing reads produced using molecular inversion probe enrichment followed by
sequencing. Unique regions refer to regions of the DNA that occur once on each copy of a
single chromosome. For example, the sequence defining the lactase gene is only found once
on each copy of chromosome 2. Regions are unique relative to the segmental duplications,
which can be found to repeat on multiple chromosomes. The highest on-target activity was
observed in unique region 4, with 31% of total sequence reads originating from that area.
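The on-target percentage defined above can be illustrated with a short Python sketch. The read record (chromosome and mapped start position), the function name, and the example reads are hypothetical; the interval used is the Unique 4 region from Table 5, and a read is counted as on-target when its mapped start falls within the region.

```python
from typing import Iterable, Tuple

Read = Tuple[str, int]  # (chromosome, mapped start position); a minimal, assumed read record


def on_target_percentage(reads: Iterable[Read], chrom: str, start: int, end: int) -> float:
    """Percentage of reads whose mapped start lies within the target interval (inclusive)."""
    read_list = list(reads)
    if not read_list:
        return 0.0
    hits = sum(1 for c, pos in read_list if c == chrom and start <= pos <= end)
    return 100.0 * hits / len(read_list)


# Hypothetical reads from a library enriched for Unique 4 (chr2:141643329-141644763).
example_reads = [
    ("chr2", 141643400),  # on target
    ("chr2", 141644000),  # on target
    ("chr5", 1000),       # off target: different chromosome
    ("chr2", 130000000),  # off target: same chromosome, outside the interval
]

print(on_target_percentage(example_reads, "chr2", 141643329, 141644763))
```

In this toy example two of the four reads fall within the target, giving an on-target percentage of 50%.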
Figure 5 shows the standard deviation among methylation levels at individual CpG
sites across samples that have been separated by tissue: jejunum, enterocyte and whole
blood. High variation was observed in methylation levels at individual CpG sites, with high
standard deviations (0-0.8) seen in the majority of regions and tissues. As shown in Figure
6, methylation levels in technical replicate samples showed very little correlation when
compared (Pearson correlation coefficient = 0.33). Methylation levels of CpG dinucleotides
from enterocyte and jejunum samples were compared with relative lactase expression
using two-tailed Spearman correlation test. The significance of these correlations is plotted
in Figure 7. Methylation at the cytosine with the coordinate “chr2:130926695” was found
to be significantly correlated (p<0.05) with lactase expression; however, the significance of
this correlation disappeared after Bonferroni correction for multiple comparisons.
Figure 4. Percentage of on-target sequence reads. Highest on-target activity occurred in Unique Region 4, with just under a third (31%) of on-target sequence reads.
[Figure 4 bar chart. X-axis: Unique Region (1-4). Y-axis: On-Target Sequencing (%).]
Figure 5. a) Standard deviations in recorded methylation level (Y-axis) at individual CpG sites (X-axis) compiled across all samples and separated by tissue. Arrows indicate unique regions in the DNA. Image adapted from original created by Orion Buske. Image used with permission. b) Hypothetical image of the methylation standard deviation to be expected if gene expression were governed by the methylation status of a specific region.
Figure 6. Correlation plot of methylation levels at individual CpG sites averaged across replicate samples. Pearson correlation coefficient = 0.33. Image created by Orion Buske and used with permission.
Figure 7. Manhattan plot of the signed log10 (p) values, for the correlation between CpG-wise enterocyte and jejunum methylation vs. lactase expression as measured by Spearman correlation test. Methylation at a single CpG site at chr2:130926695 was significantly (p<0.05) correlated with lactase expression, but only before corrections for multiple comparisons (Bonferroni Correction). Arrows indicate unique DNA regions. Image adapted from original created by Orion Buske. Image used with permission.
DISCUSSION
LACTASE EXPRESSION IN MICE
The 1.32-fold decrease in lactase expression observed (Figure 2) in the proximal
jejunum of the 60 day old mice was not as large as the decrease reported in previous
studies. These studies observed a minimum 2.3-fold decrease in lactase expression in their
old vs. young mice, using ages of 7 and 50 days for those conditions, respectively (R. Fang
et al., 2006). However, a different strain of mouse was used, and only one biological sample
was used to represent the lactase expression in each age group. Although there is little
indication to suggest that lactase expression varies by strain of mouse, this may be a factor
adding to the variation observed between the present results and previously published
data. In addition, our results suggest that the large fold decreases previously reported by
others may diminish with increased sample size. Indeed, studies using a greater sample
size (Duluc, Jost, & Freund, 1993) have noted that lactase expression decreased
significantly in all segments of the small intestine, except in the proximal jejunum of 90 day
old rats when compared with the lactase expression of 4 day old rats. The present study
comparing the lactase expression in the distal jejunum and ileum of young and old mice
demonstrated large fold decreases (distal jejunum= 4.32, ileum = 2.81) in lactase
expression across these regions (Figure 3), comparable to the fold changes observed
previously (distal jejunum= 2.5, ileum= 2.8) (R. Fang et al., 2006).
The data obtained in this experiment demonstrate a mosaic pattern of lactase
expression within the small intestine, alongside enterocytic enzyme expression variation
throughout development depending on small intestinal segment (Figures 2 and 3). Other
enterocyte-specific genes, such as the sucrase-isomaltase gene and intestinal carbamoyl
phosphate synthase I, have also been previously shown to exhibit varying expression
throughout the length of the small intestine (Rings et al., 1994; Van Beers et al., 1998).
Region-specific epigenetic regulation may be a source of the high degree of variation
identified in enterocyte gene expression within different segments of the small intestine;
however, further studies are required.
MOLECULAR INVERSION PROBES: LACTASE INTOLERANCE IN HUMANS
This study used molecular inversion probe enrichment followed by sequencing to
interrogate four unique and four segmentally duplicating regions of the DNA in which
methylation profiles had previously been shown to be significantly correlated with
enterocyte lactase expression (Dr. Viviane Labrie, Postdoctoral Fellow, Petronis’
Laboratory, 2012, unpublished findings). The efficacy of these techniques was evaluated
using the performance parameters of specificity, sensitivity, uniformity, reproducibility,
and the amount of DNA required.
SPECIFICITY
It was found that the molecular inversion probe technique led to enrichment with
low specificity. Specificity, defined as the percentage of on-target enrichment, is essential
for obtaining informative data. As seen in Figure 4, only a small percentage of sequence
reads originated from the proper target location, indicating that the molecular inversion
probes did not bind specifically to the targeted regions. The low specificity observed in our
results is unusual for the molecular inversion probe technique. In fact, the technique had
been designed for probe binding in a highly specific manner, since the 3’ and 5’ binding
sites are both bound together and restricted locally (Hardenbol et al., 2003). Furthermore,
molecular inversion probe enrichment has previously been shown to be highly specific
when applied to genomic and bisulfite treated DNA (Diep et al., 2012; Porreca et al., 2007;
Shen et al., 2013). However, the reduced complexity of bisulfite treated DNA enables off-
target probe binding, and on-target binding values as low as 56% have previously been
reported (Diep et al., 2012). Probe design has been identified as a key factor in probe
binding specificity (Diep et al., 2012), so future experiments utilizing this technique may
begin with improving probe design. Probe design can be improved through the use of
ppDesigner, a program which has been developed to aid in the design of efficient molecular
inversion probes (Diep et al., 2012). This program was utilized in the current experiment;
however, there are various parameters that can be modified to improve molecular
inversion probe selectivity. Previous research indicates that increasing the melting
temperature of the ligation arm by increasing its guanine content and length can improve
probe capturing efficiency. In addition, shorter target sequences are captured with higher
efficiency than long target sequences (Deng et al., 2009). As such, future attempts at
molecular inversion probe design may include increasing the melting temperature of the
ligation arm, and decreasing the target sequence length to produce probes with higher
capturing efficiency.
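The effect of arm length and guanine content on melting temperature can be illustrated with a simple estimate. The Python sketch below uses a basic GC-content rule of thumb for oligonucleotides longer than about 13 nt, Tm = 64.9 + 41 x (G + C - 16.4) / N. This formula is an assumption chosen for illustration; it is not the model used by ppDesigner or by Deng et al. (2009), and the arm sequences are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Approximate melting temperature (degrees C) from GC count and oligo length.

    Rule-of-thumb estimate intended for oligonucleotides longer than about 13 nt;
    it ignores salt concentration and nearest-neighbour effects.
    """
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


# Hypothetical ligation arms: the second is longer and carries more guanines.
short_arm = "TTGATTAGTTTATGG"        # 15 nt, 4 G
longer_arm = "TTGATTAGTTTATGGGTGAG"  # 20 nt, 7 G

print(f"{len(short_arm)} nt arm: {basic_tm(short_arm):.1f} C")
print(f"{len(longer_arm)} nt arm: {basic_tm(longer_arm):.1f} C")
```

Under this approximation, lengthening an arm and adding guanines both raise the estimated melting temperature, which is the direction of change proposed above for improving capture efficiency.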
In addition to improved probe design, increasing the stability of the probes may also
improve target capture. Betaine, a chemical that reduces the formation of GC-rich
secondary structures, has been included during the extension and ligation reactions to
minimize probe-to-probe interactions. The addition of betaine along with decreased
annealing time improved capture yields in similar techniques applied to genomic DNA
(Shen et al., 2011), and may also lead to similar improvements when applied to bisulfite
treated DNA.
SENSITIVITY
The measures of coverage, defined as the number of target regions captured, as well
as depth, defined as the number of reads representing each target, were used to assess the
sensitivity of the experimental techniques. Of the 810 interrogated cytosines, just over half
were covered 10 times in at least six samples. This indicates that not only were half the
cytosines of interest missed completely, but also that the ones which were covered were
sequenced with low depth (i.e., only 10x). As displayed in Figure 5, there was a high
standard deviation in the methylation measures of individual CpG sites across all samples
and tissues. In addition, Figure 5 highlights the scarcity of coverage of the molecular
inversion probe technique as portrayed by the lack of data points within the unique
regions. Although the coverage in the segmentally duplicated regions was high, so was the
standard deviation of methylation across samples. Segmentally duplicated regions and
unique regions were enriched separately and sequenced together. Due to the high
frequency of segmentally duplicating sequences in the genome relative to unique regions, a
higher percentage of sequence reads originated from segmentally duplicating regions. It is
possible that these regions unevenly occupied sequencing reactions, causing
disproportionate sequencing of the unique regions. Sequencing the unique regions
separately from the segmentally duplicating regions may result in better coverage of the
unique regions. Overall, the enrichment and sequencing techniques did not perform with
high sensitivity.
A lack of sequencing depth was also observed in the unique regions of the data,
indicating that the selection of sequencing platform could also stand to be improved. While
the Illumina MiSeq platform used in this experiment produces 1.5-2 Gb of sequence reads
(Quail et al., 2012), other platforms such as Illumina HiSeq produce 30 Gb of sequence
reads, which allow for greater coverage and sequence depth.
REPRODUCIBILITY
The high standard deviation of, and low correlation among, samples characterize
the low uniformity and reproducibility of the techniques employed. Reproducibility of the
technique is low, as the Pearson correlation coefficient between technical replicates is 0.33
(Figure 6). Previous studies using this technique to enrich genomic DNA have found a
Pearson correlation coefficient of 0.54 between technical replicates (Porreca et al., 2007).
More recently, a Pearson correlation coefficient of 0.97- 0.98 has been achieved between
technical replicates through improved probe design (Diep et al., 2012). The low
correlation coefficients in the present study and in Porreca et al. (2007), set against the
high values achieved after improved probe design, speak to the low reproducibility and
uneven coverage that can occur with the molecular inversion probe technique, as well as its
sensitivity to probe design.
The low reproducibility observed in the present study may be due to the nature of
molecular inversion probe enrichment in segmentally duplicated regions since they are
located in numerous parts of the genome. Due to the high homology of these sequences, the
molecular inversion probes designed to bind to regions within the second chromosome are
also theoretically capable of binding to the segmentally duplicating regions present on
different chromosomes. It therefore remains possible that the methylation statuses of CpG
sites originating from other chromosomes, outside the chromosome of interest
(chromosome 2) were inadvertently compared; this scenario would result in a low
correlation between replicates, just as observed.
In addition, sequencing reads from these regions are impossible to accurately map to
the correct point of origin due to their repetitive nature (Horvath, Schwartz, & Eichler,
2000). It is also a possibility that the high standard deviation observed in the segmentally
duplicated regions may reflect technical difficulties with properly mapping segmentally
duplicating regions, rather than an issue of biological relevance.
AMOUNT OF DNA REQUIRED
Although refinements to the molecular inversion probe and sequencing protocol are
required, a major benefit of the employed techniques was the ability to interrogate large
target regions using minimal amounts of DNA (400ng). Interrogating the same regions with
the traditional PCR technique would require several micrograms of each sample.
METHYLATION AND LACTASE EXPRESSION
Despite the high standard deviation and low correlation observed between technical
replicates, methylation level was compared with relative lactase expression. Figure 7
indicates that no correlations were significant, except for methylation at a single cytosine
within a segmentally duplicated region, but this significance did not withstand correction
for multiple comparisons. Due to the high standard deviation and low correlation between
replicates, post hoc analyses were not performed.
The results of the molecular inversion probe experiment were expected to
demonstrate a correlation between methylation profiles and lactase expression, as one of
the unique regions interrogated was the first exon of the LCT gene. Although the current
experiment did not provide conclusive evidence in this respect, recent research has
demonstrated that methylation in the first exon is tightly linked to transcriptional silencing
(Brenet et al., 2011). In fact, it has been shown that methylation in the first exon has a
higher correlation to transcriptional silencing than methylation in the promoter region, or
other intergenic regions downstream of the first exon (Brenet et al., 2011). The possibility
remains open that hypomethylation within the first LCT exon correlates with lactase
expression, which in turn is correlated with genotype.
The LCT-13910*C/T SNP is located ~14-kb upstream of the first LCT exon, and any
mechanism by which a SNP could influence methylation in an exon at such a distance has
not been fully elucidated. DNA sequence variants, however, have been shown to influence
DNA methylation and gene expression (Gertz et al., 2011). For example, autosomal genes
with SNP-induced allele specific methylation within 5-kb of a gene transcription start site
also show allele specific expression (Gertz et al., 2011). Although the LCT-13910*C/T SNP
is located significantly further than 5-kb from the lactase transcription start site, it is
possible that this SNP may influence methylation and subsequent lactase gene expression.
Although the influence of the LCT-13910*C/T SNP on DNA methylation remains
unclear, this SNP has been shown to influence lactase gene expression in vivo. The
post-weaning decline in lactase expression was impeded in transgenic mice carrying a human
DNA fragment containing the -13910*T variant cloned upstream of a 2-kb rat lactase promoter,
but not in mice carrying the -13910*C variant (L. Fang et al., 2012). These data suggest that
despite being 14-kb upstream of the lactase gene, the LCT-13910*C/T SNP may play a causal
role in lactase expression.
A second region investigated in the molecular inversion probe experiment is located
in the second exon of the gastrokine 2 (GKN2) gene. GKN2 expression occurs in epithelial
cells of the small intestine and stomach, and is responsible for the maintenance and repair
of the surface mucosa (Kim et al., 2014). Other regions interrogated include a unique region
500 base pairs downstream of the LCT gene, along with a region within the low density
lipoprotein receptor gene family; yet how methylation in these regions influences lactase
expression remains incompletely understood. It is possible that interrogation of these
regions with higher coverage may provide greater insight into their roles in
mediating the developmental decline in lactase expression.
As epigenetic modifications are typically tissue specific, jejunum, enterocyte and
whole blood DNA were all investigated in this experiment. Epigenetic signatures in
peripheral blood have previously been shown to be useful detection markers for certain
cancers (Teschendorff et al., 2009). The present study investigated the epigenetic profiles
of blood DNA to determine whether the lactase phenotype also displays an epigenetic
detection signature. Although no such signature was observed in the segmentally duplicating
regions in blood DNA, it is possible that more refined enrichment and sequencing techniques
could identify a signature in the unique regions. Jejunum tissue stripped of the epithelial
cell layer (and therefore of enterocytes) was interrogated to identify potential epigenetic vestiges.
Previous research has shown that environmental factors, such as stress, can leave
functional epigenetic marks in the brain. These marks not only alter gene expression and
affect neuronal function, but also leave epigenetic vestiges in buccal epithelial cells (Essex
et al., 2013). In this experiment, we investigated whether such vestiges existed in jejunum
tissue absent of enterocytes. In addition, four samples of DNA from the jejunal epithelial
layer including enterocytes were also interrogated to determine the role of DNA
methylation in lactase expression. Unfortunately, the data obtained from this experiment were
unable to provide conclusive results within the target unique regions, and no signature was
identified in the segmentally duplicating regions.
Segmentally duplicated regions were also interrogated using molecular inversion
probe enrichment. These regions were previously found to correlate with lactase
expression in humans (Labrie, data not published). In addition, segmentally duplicating
regions are linked to disease-causing rearrangements, and may play a role in adaptive
evolution and gene innovation (Bailey et al., 2002; Eichler, 2001; Horvath et al., 2000; Zhao,
Zhu, Kasahara, Morishita, & Zhang, 2013). These highly homologous regions contain both
exonic and intronic sequences, and are found interspersed within the genome (Bailey et al.,
2002; Eichler, 2001). Inheritance of specific duplications has been shown to be one of the
major genetic causes of neurological diseases (Marshall et al., 2008; Miller et al., 2009;
Sharp et al., 2008). Similarly, hypomethylation in segmentally duplicating regions has been
observed in breast and lung cancer (Novak et al., 2008).
The molecular inversion probe enrichment coupled with sequencing used in this
study revealed high variability in the methylation profile of the examined segmentally
duplicated regions. When methylation in these regions was correlated to lactase
expression, no correlations were found to be significant after correction for multiple
comparisons. Although the lack of significant correlation may stem from the high variation
in output values, it is unlikely that these regions influence lactase expression due to their
extreme distance from the lactase gene (3 Mb). Segmentally duplicating regions, however,
may still be more immediately interesting areas to interrogate further due to their
influence on genomic re-arrangement.
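The variability between technical replicates noted above can be quantified with simple summary statistics. The following minimal sketch computes the per-site standard deviation across two replicates and the Pearson correlation between them. The methylation values are hypothetical, and the choice of metrics is an assumption made for illustration rather than a description of the analysis performed in this study.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical methylation estimates (0 to 1) at five cytosines, measured in
# two technical replicates of the same sample (NOT data from this study).
rep1 = [0.10, 0.80, 0.45, 0.30, 0.65]
rep2 = [0.40, 0.55, 0.20, 0.70, 0.35]

# Sample standard deviation at each site across the two replicates.
per_site_sd = [stdev(pair) for pair in zip(rep1, rep2)]
r = pearson(rep1, rep2)

print("per-site SD:", [round(s, 3) for s in per_site_sd])
print(f"replicate correlation r = {r:.3f}")
```

Large per-site standard deviations together with a low replicate correlation would indicate that technical noise, rather than biology, dominates the signal.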
FUTURE DIRECTIONS
Future directions of this project include determining whether there are epigenetic
modifications that correlate with changes in lactase gene expression levels. Results from
the present study demonstrate a mosaic pattern of gene expression within the small
intestine in mice. Future studies may investigate epigenetic regulation of enterocyte
specific genes along the horizontal axis of the small intestine, to further understand the
mosaic pattern of gene expression. In addition, it is also of interest to investigate the
epigenetic mechanisms that may be responsible for developmental regulation of gene
expression.
The data obtained from the molecular inversion probe enrichment demonstrated
that optimization of the chosen protocol is required in order for this technique to provide
viable information. A major challenge to overcome will be to optimize probe design and
improve on-target probe activity (Diep et al., 2012). While traditional techniques such as
luciferase-based pyrosequencing would provide informative methylation data for the
unique regions, they require an inconveniently large amount of DNA and would still be
unable to sequence the segmentally duplicating regions. Although the molecular inversion
probe technique was capable of interrogating segmentally duplicating regions, the high
homology of these regions presents a challenge in mapping the resultant sequence reads
with precision.
The uneven and sparse coverage observed using molecular inversion probes in the
unique regions may have been a result of poor target capture during enrichment, or poor
coverage during sequencing. Determining which of these two factors contributed
most to the issues encountered in the current study could prove fruitful
from a methodological perspective. Similarly, identifying the means by which target capture
can be enhanced may be a productive avenue toward optimizing molecular inversion probe
enrichment.
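One way to begin separating these two explanations is to summarize per-target read counts. The sketch below is illustrative only: the target labels, read counts, and the two summary metrics (on-target fraction, and the fraction of targets whose depth lies within a fold-range of the median) are assumptions chosen for demonstration, not the analysis performed in this study.

```python
from statistics import median


def on_target_fraction(target_counts, total_mapped_reads):
    """Fraction of all mapped reads that fall within the intended targets."""
    if total_mapped_reads == 0:
        return 0.0
    return sum(target_counts.values()) / total_mapped_reads


def uniformity(target_counts, fold=5.0):
    """Fraction of targets whose depth lies within `fold` of the median depth."""
    depths = list(target_counts.values())
    med = median(depths)
    if med == 0:
        return 0.0
    within = [d for d in depths if med / fold <= d <= med * fold]
    return len(within) / len(depths)


# Hypothetical read counts per target region (NOT data from this study).
counts = {"LCT_exon1": 12, "GKN2_exon2": 0, "LCT_downstream": 150, "LDLR_family": 40}
total_reads = 10_000  # hypothetical number of mapped reads in the library

print(f"on-target fraction: {on_target_fraction(counts, total_reads):.4f}")
print(f"uniformity (within 5x of median): {uniformity(counts):.2f}")
```

Under these assumptions, a low on-target fraction would point toward poor capture during enrichment, whereas an acceptable on-target fraction with uniformly low depth would point toward insufficient sequencing.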
The importance of this study lies not only in the study of lactose
intolerance, but also in the implications of the epigenetic influence on lactase
expression. The lactose intolerance phenotype presents a unique opportunity to study
potential epigenetic mechanisms in a simple phenotype, controlled by the expression of a
single gene. Using this phenotype as a model, we can study how epigenetic modifications
influence the onset of the decrease in lactase expression through development and as a
response to environmental stimuli. These mechanisms can then be generalized to
phenotypes regulated by the expression of multiple genes. Understanding how epigenetic
mechanisms regulate simple phenotypes is the first step in understanding epigenetic
mechanisms of complex, multi-gene disease phenotypes such as schizophrenia and bipolar
disorder.
Studying the epigenetic influence on lactase expression is a ‘proof of concept’ of
sorts. If epigenetic mechanisms can be identified that are responsible for the lactose
intolerance phenotype, then it stands to reason that similar mechanisms may influence
gene expression in more complex diseases. In order to understand even simple
phenotypes, however, the techniques used must be sensitive enough to detect epigenetic
differences, while maintaining a high level of reproducibility. Studying techniques with
a simple and well understood phenotypic model facilitates the detection of noise
inherent to the technique, independent of biological relevance. By understanding the
specifications unique to individual techniques, definitive results can be obtained to provide
an accurate reflection of the biological environment. As enrichment and sequencing
techniques evolve and develop, they will provide greater insight into the relevance of
epigenetics in simple and complex phenotypes. Understanding epigenetic techniques and the
epigenetics of simple phenotypes lays the groundwork for understanding complex disease
phenotypes, which will hopefully lead to the future development of effective
therapeutic agents.
REFERENCES
Akhras, M. S., Unemo, M., Thiyagarajan, S., Nyren, P., Davis, R. W., Fire, A. Z., & Pourmand,
N. (2007). Connector inversion probe technology: a powerful one-primer multiplex DNA
amplification system for numerous scientific applications. PLoS One, 2(9), e915. doi:
10.1371/journal.pone.0000915
Argeson, A. C., Nelson, K. K., & Siracusa, L. D. (1996). Molecular basis of the pleiotropic
phenotype of mice carrying the hypervariable yellow (Ahvy) mutation at the agouti locus.
Genetics, 142(2), 557-567.
Bailey, J. A., Gu, Z., Clark, R. A., Reinert, K., Samonte, R. V., Schwartz, S., . . . Eichler, E. E.
(2002). Recent segmental duplications in the human genome. Science, 297(5583), 1003-
1007. doi: 10.1126/science.1072047
Ball, M. P., Li, J. B., Gao, Y., Lee, J. H., LeProust, E. M., Park, I. H., . . . Church, G. M. (2009).
Targeted and genome-scale strategies reveal gene-body methylation signatures in human
cells. Nat Biotechnol, 27(4), 361-368. doi: 10.1038/nbt.1533
Barzilay, G., Walker, L. J., Robson, C. N., & Hickson, I. D. (1995). Site-directed mutagenesis of
the human DNA repair enzyme HAP1: identification of residues important for AP
endonuclease and RNase H activity. Nucleic Acids Res, 23(9), 1544-1550.
Berdasco, M., & Esteller, M. (2010). Aberrant epigenetic landscape in cancer: how cellular
identity goes awry. Dev Cell, 19(5), 698-711. doi: 10.1016/j.devcel.2010.10.005
Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev, 16(1), 6-21. doi:
10.1101/gad.947102
Boll, W., Wagner, P., & Mantei, N. (1991). Structure of the chromosomal gene and cDNAs
coding for lactase-phlorizin hydrolase in humans with adult-type hypolactasia or
persistence of lactase. Am J Hum Genet, 48(5), 889-902.
Brenet, F., Moh, M., Funk, P., Feierstein, E., Viale, A. J., Socci, N. D., & Scandura, J. M.
(2011). DNA methylation of the first exon is tightly linked to transcriptional silencing.
PLoS One, 6(1), e14524. doi: 10.1371/journal.pone.0014524
Broske, A. M., Vockentanz, L., Kharazi, S., Huska, M. R., Mancini, E., Scheller, M., . . .
Rosenbauer, F. (2009). DNA methylation protects hematopoietic stem cell multipotency
from myeloerythroid restriction. Nat Genet, 41(11), 1207-1215. doi: 10.1038/ng.463
Carlone, D. L., Lee, J. H., Young, S. R., Dobrota, E., Butler, J. S., Ruiz, J., & Skalnik, D. G.
(2005). Reduced genomic cytosine methylation and defective cellular differentiation in
embryonic stem cells lacking CpG binding protein. Mol Cell Biol, 25(12), 4881-4891.
doi: 10.1128/mcb.25.12.4881-4891.2005
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., . . .
Hayashizaki, Y. (2006). Genome-wide analysis of mammalian promoter architecture and
evolution. Nat Genet, 38(6), 626-635. doi: 10.1038/ng1789
Cho, R. J., Mindrinos, M., Richards, D. R., Sapolsky, R. J., Anderson, M., Drenkard, E., . . .
Oefner, P. J. (1999). Genome-wide mapping with biallelic markers in Arabidopsis
thaliana. Nat Genet, 23(2), 203-207. doi: 10.1038/13833
Choi, S. H., Heo, K., Byun, H. M., An, W., Lu, W., & Yang, A. S. (2011). Identification of
preferential target sites for human DNA methyltransferases. Nucleic Acids Res, 39(1),
104-118. doi: 10.1093/nar/gkq774
Dalma‐Weiszhausz, D. D., Warrington, J., Tanimoto, E. Y., & Miyada, C. G. (2006). [1] The
Affymetrix GeneChip® Platform: An Overview. In K. Alan & O. Brian (Eds.), Methods
in Enzymology (Vol. 410, pp. 3-28): Academic Press.
Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E. M., Antosiewicz-Bourget, J., . . . Zhang,
K. (2009). Targeted bisulfite sequencing reveals changes in DNA methylation associated
with nuclear reprogramming. Nat Biotechnol, 27(4), 353-360. doi: 10.1038/nbt.1530
Diep, D., Plongthongkum, N., Gore, A., Fung, H. L., Shoemaker, R., & Zhang, K. (2012).
Library-free methylation sequencing with bisulfite padlock probes. Nat Methods, 9(3),
270-272. doi: 10.1038/nmeth.1871
Duluc, I., Jost, B., & Freund, J. N. (1993). Multiple levels of control of the stage- and region-
specific expression of rat intestinal lactase. J Cell Biol, 123(6 Pt 1), 1577-1586.
Eichler, E. E. (2001). Recent duplication, domain accretion and the dynamic mutation of the
human genome. Trends Genet, 17(11), 661-669.
Enattah, N. S., Sahi, T., Savilahti, E., Terwilliger, J. D., Peltonen, L., & Jarvela, I. (2002).
Identification of a variant associated with adult-type hypolactasia. Nat Genet, 30(2), 233-
237. doi: 10.1038/ng826
Essex, M. J., Boyce, W. T., Hertzman, C., Lam, L. L., Armstrong, J. M., Neumann, S. M., &
Kobor, M. S. (2013). Epigenetic vestiges of early developmental adversity: childhood
stress exposure and DNA methylation in adolescence. Child Dev, 84(1), 58-75. doi:
10.1111/j.1467-8624.2011.01641.x
Fakhrai-Rad, H., Pourmand, N., & Ronaghi, M. (2002). Pyrosequencing: an accurate detection
platform for single nucleotide polymorphisms. Hum Mutat, 19(5), 479-485. doi:
10.1002/humu.10078
Fang, L., Ahn, J. K., Wodziak, D., & Sibley, E. (2012). The human lactase persistence-
associated SNP -13910*T enables in vivo functional persistence of lactase promoter-
reporter transgene expression. Hum Genet, 131(7), 1153-1159. doi: 10.1007/s00439-012-
1140-z
Fang, R., Olds, L. C., Santiago, N. A., & Sibley, E. (2001). GATA family transcription factors
activate lactase gene promoter in intestinal Caco-2 cells. American Journal of Physiology
- Gastrointestinal and Liver Physiology, 280(1 43-1), G58-G67.
Fang, R., Olds, L. C., & Sibley, E. (2006). Spatio-temporal patterns of intestine-specific
transcription factor expression during postnatal mouse gut development. Gene Expr
Patterns, 6(4), 426-432. doi: 10.1016/j.modgep.2005.09.003
Furnari, M., Bonfanti, D., Parodi, A., Franze, J., Savarino, E., Bruzzone, L., . . . Savarino, V.
(2013). A comparison between lactose breath test and quick test on duodenal biopsies for
diagnosing lactase deficiency in patients with self-reported lactose intolerance. J Clin
Gastroenterol, 47(2), 148-152. doi: 10.1097/MCG.0b013e31824e9132
Gabory, A., Attig, L., & Junien, C. (2009). Sexual dimorphism in environmental epigenetic
programming. Mol Cell Endocrinol, 304(1-2), 8-18. doi: 10.1016/j.mce.2009.02.015
Ge, Z. J., Luo, S. M., Lin, F., Liang, Q. X., Huang, L., Wei, Y. C., . . . Sun, Q. Y. (2014). DNA
methylation in oocytes and liver of female mice and their offspring: effects of high-fat-
diet-induced obesity. Environ Health Perspect, 122(2), 159-164. doi:
10.1289/ehp.1307047
Gertz, J., Varley, K. E., Reddy, T. E., Bowling, K. M., Pauli, F., Parker, S. L., . . . Myers, R. M.
(2011). Analysis of DNA methylation in a three-generation family reveals widespread
genetic influence on epigenetic regulation. PLoS Genet, 7(8), e1002228. doi:
10.1371/journal.pgen.1002228
Gonzalo, S., Jaco, I., Fraga, M. F., Chen, T., Li, E., Esteller, M., & Blasco, M. A. (2006). DNA
methyltransferases control telomere length and telomere recombination in mammalian
cells. Nat Cell Biol, 8(4), 416-424. doi: 10.1038/ncb1386
Guo, J., Xu, N., Li, Z., Zhang, S., Wu, J., Kim, D. H., . . . Ju, J. (2008). Four-color DNA
sequencing with 3'-O-modified nucleotide reversible terminators and chemically
cleavable fluorescent dideoxynucleotides. Proc Natl Acad Sci U S A, 105(27), 9145-
9150. doi: 10.1073/pnas.0804023105
Guo, Junjie U., Su, Y., Zhong, C., Ming, G.-l., & Song, H. (2011). Hydroxylation of 5-
Methylcytosine by TET1 Promotes Active DNA Demethylation in the Adult Brain. Cell,
145(3), 423-434. doi: http://dx.doi.org/10.1016/j.cell.2011.03.022
Hackett, J. A., Zylicz, J. J., & Surani, M. A. (2012). Parallel mechanisms of epigenetic
reprogramming in the germline. Trends in Genetics, 28(4), 164-174. doi:
http://dx.doi.org/10.1016/j.tig.2012.01.005
Hardenbol, P., Baner, J., Jain, M., Nilsson, M., Namsaraev, E. A., Karlin-Neumann, G. A., . . .
Davis, R. W. (2003). Multiplexed genotyping with sequence-tagged molecular inversion
probes. Nat Biotechnol, 21(6), 673-678. doi: 10.1038/nbt821
Hashimoto, H., Vertino, P. M., & Cheng, X. (2010). Molecular coupling of DNA methylation
and histone methylation. Epigenomics, 2(5), 657-669. doi: 10.2217/epi.10.44
He, Y. F., Li, B. Z., Li, Z., Liu, P., Wang, Y., Tang, Q., . . . Xu, G. L. (2011). Tet-mediated
formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science,
333(6047), 1303-1307. doi: 10.1126/science.1210944
Horvath, J. E., Schwartz, S., & Eichler, E. E. (2000). The mosaic structure of human
pericentromeric DNA: a strategy for characterizing complex regions of the human
genome. Genome Res, 10(6), 839-852.
Inbar-Feigenberg, M., Choufani, S., Butcher, D. T., Roifman, M., & Weksberg, R. (2013). Basic
concepts of epigenetics. Fertility and Sterility, 99(3), 607-615. doi:
http://dx.doi.org/10.1016/j.fertnstert.2013.01.117
Ingram, C. J., Elamin, M. F., Mulcare, C. A., Weale, M. E., Tarekegn, A., Raga, T. O., . . .
Swallow, D. M. (2007). A novel polymorphism associated with lactose tolerance in
Africa: multiple causes for lactase persistence? Hum Genet, 120(6), 779-788. doi:
10.1007/s00439-006-0291-1
Ito, S., Shen, L., Dai, Q., Wu, S. C., Collins, L. B., Swenberg, J. A., . . . Zhang, Y. (2011). Tet
proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine.
Science, 333(6047), 1300-1303. doi: 10.1126/science.1210597
Jost, B., Duluc, I., Vilotte, J. L., & Freund, J. N. (1998). Lactase is unchanged in suckling mice
fed with lactose-free milk. Gastroenterol Clin Biol, 22(11), 863-867.
Kaminsky, Z., Wang, S. C., & Petronis, A. (2006). Complex disease, gender and epigenetics.
Annals of Medicine, 38(8), 530-544. doi: 10.1080/07853890600989211
Khare, T., Pai, S., Koncevicius, K., Pal, M., Kriukiene, E., Liutkeviciute, Z., . . . Petronis, A.
(2012). 5-hmC in the brain is abundant in synaptic genes and shows differences at the
exon-intron boundary. Nat Struct Mol Biol, 19(10), 1037-1043. doi: 10.1038/nsmb.2372
Kim, O., Yoon, J. H., Choi, W. S., Ashktorab, H., Smoot, D. T., Nam, S. W., . . . Park, W. S.
(2014). GKN2 contributes to the homeostasis of gastric mucosa by inhibiting GKN1
activity. J Cell Physiol, 229(6), 762-771. doi: 10.1002/jcp.24496
Kriukiene, E., Labrie, V., Khare, T., Urbanaviciute, G., Lapinaite, A., Koncevicius, K., . . .
Klimasauskas, S. (2013). DNA unmethylome profiling by covalent capture of CpG sites.
Nat Commun, 4, 2190. doi: 10.1038/ncomms3190
Kuokkanen, M., Myllyniemi, M., Vauhkonen, M., Helske, T., Kaariainen, I., Karesvuori, S., . . .
Sipponen, P. (2006). A biopsy-based quick test in the diagnosis of duodenal hypolactasia
in upper gastrointestinal endoscopy. Endoscopy, 38(7), 708-712. doi: 10.1055/s-2006-
925354
Laird, P. W. (2010). Principles and challenges of genomewide DNA methylation analysis. Nat
Rev Genet, 11(3), 191-203. doi: 10.1038/nrg2732
Law, D., Conklin, J., & Pimentel, M. (2010). Lactose intolerance and the role of the lactose
breath test. Am J Gastroenterol, 105(8), 1726-1728. doi: 10.1038/ajg.2010.146
Lee, P. P., Fitzpatrick, D. R., Beard, C., Jessup, H. K., Lehar, S., Makar, K. W., . . . Wilson, C.
B. (2001). A critical role for Dnmt1 and DNA methylation in T cell development,
function, and survival. Immunity, 15(5), 763-774.
Li, E., Bestor, T. H., & Jaenisch, R. (1992). Targeted mutation of the DNA methyltransferase
gene results in embryonic lethality. Cell, 69(6), 915-926. doi:
http://dx.doi.org/10.1016/0092-8674(92)90611-F
Li, J. B., Gao, Y., Aach, J., Zhang, K., Kryukov, G. V., Xie, B., . . . Church, G. M. (2009).
Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.
Genome Res, 19(9), 1606-1615. doi: 10.1101/gr.092213.109
Ling, C., & Ronn, T. (2014). Epigenetic adaptation to regular exercise in humans. Drug Discov
Today. doi: 10.1016/j.drudis.2014.03.006
Lomer, M. C., Parkes, G. C., & Sanderson, J. D. (2008). Review article: lactose intolerance in
clinical practice--myths and realities. Aliment Pharmacol Ther, 27(2), 93-103. doi:
10.1111/j.1365-2036.2007.03557.x
Lukinavicius, G., Lapiene, V., Stasevskij, Z., Dalhoff, C., Weinhold, E., & Klimasauskas, S.
(2007). Targeted labeling of DNA by methyltransferase-directed transfer of activated
groups (mTAG). J Am Chem Soc, 129(10), 2758-2759. doi: 10.1021/ja0691876
Mamanova, L., Coffey, A. J., Scott, C. E., Kozarewa, I., Turner, E. H., Kumar, A., . . . Turner, D.
J. (2010). Target-enrichment strategies for next-generation sequencing. Nat Methods,
7(2), 111-118. doi: 10.1038/nmeth.1419
Marenstein, D. R., Wilson, D. M., 3rd, & Teebor, G. W. (2004). Human AP endonuclease
(APE1) demonstrates endonucleolytic activity against AP sites in single-stranded DNA.
DNA Repair (Amst), 3(5), 527-533. doi: 10.1016/j.dnarep.2004.01.010
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., . . .
Rothberg, J. M. (2005). Genome sequencing in microfabricated high-density picolitre
reactors. Nature, 437(7057), 376-380. doi: 10.1038/nature03959
Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., . . . Scherer, S. W.
(2008). Structural variation of chromosomes in autism spectrum disorder. Am J Hum
Genet, 82(2), 477-488. doi: 10.1016/j.ajhg.2007.12.009
Mattar, R., de Campos Mazo, D. F., & Carrilho, F. J. (2012). Lactose intolerance: diagnosis,
genetic, and clinical factors. Clin Exp Gastroenterol, 5, 113-121. doi:
10.2147/CEG.S32368
Metneki, J., Czeizel, A., Flatz, S. D., & Flatz, G. (1984). A study of lactose absorption capacity
in twins. Hum Genet, 67(3), 296-300.
Metzker, M. L. (2010). Sequencing technologies - the next generation. Nat Rev Genet, 11(1), 31-
46. doi: 10.1038/nrg2626
Michaud, E. J., van Vugt, M. J., Bultman, S. J., Sweet, H. O., Davisson, M. T., & Woychik, R. P.
(1994). Differential expression of a new dominant agouti allele (Aiapy) is correlated with
methylation state and is influenced by parental lineage. Genes Dev, 8(12), 1463-1472.
Millan, M. J. (2013). An epigenetic framework for neurodevelopmental disorders: from
pathogenesis to potential therapy. Neuropharmacology, 68, 2-82. doi:
10.1016/j.neuropharm.2012.11.015
Miller, D. T., Shen, Y., Weiss, L. A., Korn, J., Anselm, I., Bridgemohan, C., . . . Wu, B. L.
(2009). Microdeletion/duplication at 15q13.2q13.3 among individuals with features of
autism and other neuropsychiatric disorders. J Med Genet, 46(4), 242-248. doi:
10.1136/jmg.2008.059907
Miranda, T. B., & Jones, P. A. (2007). DNA methylation: the nuts and bolts of repression. J Cell
Physiol, 213(2), 384-390. doi: 10.1002/jcp.21224
Morey, C., & Avner, P. (2010). Genetics and epigenetics of the X chromosome. Ann N Y Acad
Sci, 1214, E18-33. doi: 10.1111/j.1749-6632.2010.05943.x
Morgan, H. D., Santos, F., Green, K., Dean, W., & Reik, W. (2005). Epigenetic reprogramming
in mammals. Hum Mol Genet, 14 Spec No 1, R47-58. doi: 10.1093/hmg/ddi114
Morgan, H. D., Sutherland, H. G., Martin, D. I., & Whitelaw, E. (1999). Epigenetic inheritance
at the agouti locus in the mouse. Nat Genet, 23(3), 314-318. doi: 10.1038/15490
Newcomer, A. D., McGill, D. B., Thomas, P. J., & Hofmann, A. F. (1975). Prospective
comparison of indirect methods for detecting lactase deficiency. N Engl J Med, 293(24),
1232-1236. doi: 10.1056/nejm197512112932405
Nilsson, M., Malmgren, H., Samiotaki, M., Kwiatkowski, M., Chowdhary, B. P., & Landegren,
U. (1994). Padlock probes: circularizing oligonucleotides for localized DNA detection.
Science, 265(5181), 2085-2088.
Novak, P., Jensen, T., Oshiro, M. M., Watts, G. S., Kim, C. J., & Futscher, B. W. (2008).
Agglomerative epigenetic aberrations are a common event in human breast cancer.
Cancer Res, 68(20), 8616-8625. doi: 10.1158/0008-5472.can-08-1419
Okano, M., Bell, D. W., Haber, D. A., & Li, E. (1999). DNA Methyltransferases Dnmt3a and
Dnmt3b Are Essential for De Novo Methylation and Mammalian Development. Cell,
99(3), 247-257. doi: http://dx.doi.org/10.1016/S0092-8674(00)81656-6
Okashita, N., Kumaki, Y., Ebi, K., Nishi, M., Okamoto, Y., Nakayama, M., . . . Seki, Y. (2014).
PRDM14 promotes active DNA demethylation through the ten-eleven translocation
(TET)-mediated base excision repair pathway in embryonic stem cells. Development,
141(2), 269-280. doi: 10.1242/dev.099622
Pastor, W. A., Aravind, L., & Rao, A. (2013). TETonic shift: biological roles of TET proteins in
DNA demethylation and transcription. Nat Rev Mol Cell Biol, 14(6), 341-356. doi:
10.1038/nrm3589
Petronis, A. (2001). Human morbid genetics revisited: relevance of epigenetics. Trends Genet,
17(3), 142-146.
Petronis, A. (2010). Epigenetics as a unifying principle in the aetiology of complex traits and
diseases. Nature, 465(7299), 721-727. doi: 10.1038/nature09230
Porreca, G. J., Zhang, K., Li, J. B., Xie, B., Austin, D., Vassallo, S. L., . . . Shendure, J. (2007).
Multiplex amplification of large sets of human exons. Nat Methods, 4(11), 931-936. doi:
10.1038/nmeth1110
Poulter, M., Hollox, E., Harvey, C. B., Mulcare, C., Peuhkuri, K., Kajander, K., . . . Swallow, D.
M. (2003). The causal element for the lactase persistence/non-persistence polymorphism
is located in a 1 Mb region of linkage disequilibrium in Europeans. Ann Hum Genet,
67(Pt 4), 298-311.
Quail, M. A., Smith, M., Coupland, P., Otto, T. D., Harris, S. R., Connor, T. R., . . . Gu, Y.
(2012). A tale of three next generation sequencing platforms: comparison of Ion Torrent,
Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics, 13, 341. doi:
10.1186/1471-2164-13-341
Rasinpera, H., Kuokkanen, M., Kolho, K. L., Lindahl, H., Enattah, N. S., Savilahti, E., . . .
Jarvela, I. (2005). Transcriptional downregulation of the lactase (LCT) gene during
childhood. Gut, 54(11), 1660-1661. doi: 10.1136/gut.2005.077404
Reik, W., & Lewis, A. (2005). Co-evolution of X-chromosome inactivation and imprinting in
mammals. Nat Rev Genet, 6(5), 403-410. doi: 10.1038/nrg1602
Rings, E. H., Krasinski, S. D., van Beers, E. H., Moorman, A. F., Dekker, J., Montgomery, R. K.,
. . . Buller, H. A. (1994). Restriction of lactase gene expression along the proximal-to-
distal axis of rat small intestine occurs during postnatal development. Gastroenterology,
106(5), 1223-1232.
Romanish, M., Lock, W., van de Lagemaat, L. N., Dunn, C. A., & Mager, D. L. (2005).
Repeated Recruitment of LTR Retrotransposons as Promoters by the Anti-apoptotic
Locus NAIP During Mammalian Evolution. PLoS Genetics, preprint(2006), e10. doi:
10.1371/journal.pgen.0030010.eor
Ronaghi, M. (2001). Pyrosequencing sheds light on DNA sequencing. Genome Res, 11(1), 3-11.
Sancho, E., Batlle, E., & Clevers, H. (2004). Signaling pathways in intestinal development and
cancer. Annu Rev Cell Dev Biol, 20, 695-723. doi:
10.1146/annurev.cellbio.20.010403.092805
Sen, G. L., Reuter, J. A., Webster, D. E., Zhu, L., & Khavari, P. A. (2010). DNMT1 maintains
progenitor function in self-renewing somatic tissue. Nature, 463(7280), 563-567. doi:
10.1038/nature08683
Sharp, A. J., Mefford, H. C., Li, K., Baker, C., Skinner, C., Stevenson, R. E., . . . Eichler, E. E.
(2008). A recurrent 15q13.3 microdeletion syndrome associated with mental retardation
and seizures. Nat Genet, 40(3), 322-328. doi: 10.1038/ng.93
Sheaffer, K. L., & Kaestner, K. H. (2012). Transcriptional networks in liver and intestinal
development. Cold Spring Harb Perspect Biol, 4(9), a008284. doi:
10.1101/cshperspect.a008284
Sheaffer, K. L., Kim, R., Aoki, R., Elliott, E. N., Schug, J., Burger, L., . . . Kaestner, K. H.
(2014). DNA methylation is required for the control of stem cell differentiation in the
small intestine. Genes Dev, 28(6), 652-664. doi: 10.1101/gad.230318.113
Shen, P., Wang, W., Chi, A. K., Fan, Y., Davis, R. W., & Scharfe, C. (2013). Multiplex target
capture with double-stranded DNA probes. Genome Med, 5(5), 50. doi: 10.1186/gm454
Shen, P., Wang, W., Krishnakumar, S., Palm, C., Chi, A. K., Enns, G. M., . . . Scharfe, C.
(2011). High-quality DNA sequence capture of 524 disease candidate genes. Proc Natl
Acad Sci U S A, 108(16), 6549-6554. doi: 10.1073/pnas.1018981108
Shendure, J., & Ji, H. (2008). Next-generation DNA sequencing. Nat Biotechnol, 26(10), 1135-
1145. doi: 10.1038/nbt1486
Smith, Z. D., & Meissner, A. (2013). DNA methylation: roles in mammalian development. Nat
Rev Genet, 14(3), 204-220. doi: 10.1038/nrg3354
Suh, E., Chen, L., Taylor, J., & Traber, P. G. (1994). A homeodomain protein related to caudal
regulates intestine-specific gene transcription. Molecular and Cellular Biology, 14(11),
7340-7351.
Swallow, D. M. (2003). Genetics of lactase persistence and lactose intolerance. Annu Rev Genet,
37, 197-219. doi: 10.1146/annurev.genet.37.110801.143820
Tarailo-Graovac, M., & Chen, N. (2009). Using RepeatMasker to identify repetitive elements in
genomic sequences. Curr Protoc Bioinformatics, Chapter 4, Unit 4.10. doi:
10.1002/0471250953.bi0410s25
Teschendorff, A. E., Menon, U., Gentry-Maharaj, A., Ramus, S. J., Gayther, S. A., Apostolidou,
S., . . . Widschwendter, M. (2009). An epigenetic signature in peripheral blood predicts
active ovarian cancer. PLoS One, 4(12), e8274. doi: 10.1371/journal.pone.0008274
Thurman, R. E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M. T., Haugen, E., . . .
Stamatoyannopoulos, J. A. (2012). The accessible chromatin landscape of the human
genome. Nature, 489(7414), 75-82. doi: 10.1038/nature11232
Trapnell, C., & Salzberg, S. L. (2009). How to map billions of short reads onto genomes. Nat
Biotechnol, 27(5), 455-457. doi: 10.1038/nbt0509-455
Troelsen, J. T. (2005). Adult-type hypolactasia and regulation of lactase expression. Biochim
Biophys Acta, 1723(1-3), 19-32. doi: 10.1016/j.bbagen.2005.02.003
Trowbridge, J. J., Snow, J. W., Kim, J., & Orkin, S. H. (2009). DNA methyltransferase 1 is
essential for and uniquely regulates hematopoietic stem and progenitor cells. Cell Stem
Cell, 5(4), 442-449. doi: 10.1016/j.stem.2009.08.016
Van Beers, E. H., Rings, E. H., Posthuma, G., Dingemanse, M. A., Taminiau, J. A., Heymans, H.
S., . . . Dekker, J. (1998). Intestinal carbamoyl phosphate synthase I in human and rat.
Expression during development shows species differences and mosaic expression in
duodenum of both species. J Histochem Cytochem, 46(2), 231-240.
Waddington, C. H. (2012). The epigenotype. 1942. Int J Epidemiol, 41(1), 10-13. doi:
10.1093/ije/dyr184
Wang, D. G., Fan, J. B., Siao, C. J., Berno, A., Young, P., Sapolsky, R., . . . Lander, E. S. (1998).
Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms
in the human genome. Science, 280(5366), 1077-1082.
Weaver, I. C., Cervoni, N., Champagne, F. A., D'Alessio, A. C., Sharma, S., Seckl, J. R., . . .
Meaney, M. J. (2004). Epigenetic programming by maternal behavior. Nat Neurosci,
7(8), 847-854. doi: 10.1038/nn1276
Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., & Schubeler, D.
(2005). Chromosome-wide and promoter-specific analyses identify sites of differential
DNA methylation in normal and transformed human cells. Nat Genet, 37(8), 853-862.
doi: 10.1038/ng1598
Weichenhan, D., & Plass, C. (2013). The evolving epigenome. Hum Mol Genet, 22(R1), R1-6.
doi: 10.1093/hmg/ddt348
Wu, H., & Zhang, Y. (2014). Reversing DNA methylation: mechanisms, genomics, and
biological functions. Cell, 156(1-2), 45-68. doi: 10.1016/j.cell.2013.12.019
Zhao, Q., Zhu, Z., Kasahara, M., Morishita, S., & Zhang, Z. (2013). Segmental duplications in
the silkworm genome. BMC Genomics, 14(1), 521. doi: 10.1186/1471-2164-14-521