functional genomics of the vnn1 gene - the uwa profiles ...€¦ · functional genomics of the vnn1...

Functional genomics of the VNN1 gene

Luke Aris Diepeveen

BSc, Hons (Genetics)

School of Chemistry and Biochemistry

University of Western Australia

Supervisors:

W/Prof Lawrence Abraham

A/Prof Daniela Ulgiati

This thesis is presented for the degree of Doctor of Philosophy at the University of

Western Australia

July 2013

2

Declaration I declare that this thesis describes my own research and work not previously submitted

at any other university.

Luke Aris Diepeveen

3

Abstract

Chromatin accessibility is considered an excellent measure of the functionality of

genetic variants (Mikkelson et al., 2007; Birney et al., 2010; Pang et al., 2013). It is

common to use reporter gene and gel shift assays for determining the functionality of

genetic variants, however, both approaches are limited by scalability and are lacking in

chromatin context. The present study uses a novel next-gen sequencing-based approach

(‘CHA-seq’) which translates Chromatin Accessibility by Real-Time PCR (CHART-

PCR) into a robust method in order to assess allele-specific differences in chromatin

arrangement across an extended genomic region. CHA-seq is a means of simultaneously

sequencing genetic variants, mapping nucleosomes and nuclease hypersensitivity sites,

and assessing SNP functionality.

Genetic variants within the VNN1 gene have been associated with cardiovascular

disease (Jacobo-Albavera et al., 2012; Fava et al., 2010; Zhu and Cooper, 2007), and as

such, VNN1 represents an ideal gene model for functional assessment. Göring et al.

(2007) identified two candidate functional SNPs, namely -137 (rs4897612) and -587

(rs2050153), in the VNN1 gene promoter that have been experimentally validated using

chromatin arrangement assays as part of this current research. Genotyped cell lines

derived from the San Antonio Family Heart Study (SAFHS) (Göring et al., 2007),

carrying different -137 and -587 alleles, were characterised for allele-dependent

differences in chromatin arrangement as an indicator of function. The CHART-PCR and

CHA-seq results showed allele-specific chromatin accessibility at the -137 and the -587

variants. Further CHA-seq analysis across an extended genomic interval revealed

several candidate functional genetic variants, including the rs2267951 SNP in intron 2

of VNN1. Experimental validation using EMSA showed allele-specific DNA-complexes

4

at the rs2267951 locus, therefore supporting the CHA-seq results. Of the genetic

variants selected for functional assessment, both CHA-seq and other experimental

validation assays have given an equivalent measure of function in VNN1.

Bioinformatic analysis of the VNN1 promoter showed the presence of a CpG island, of

which one CpG site was disrupted by the -587A allele. Bisulfite sequencing showed

allele-specific DNA methylation patterns at the -720 CpG site which were dependent on

the allele at the -587 locus. Therefore, this current study provides evidence that allele-

specific chromatin differences can be seen at sites remote from the actual genetic variation.

The development of enhanced genome-wide in vivo approaches such as CHA-seq

allows for a more cost and time-effective assessment of whether a variant does indeed

regulate chromatin arrangement and gene expression, yielding insights into the long

range allele-specific mechanisms of action.

5

List of Publications Curran, J., Jowett, J., Abraham, L., Diepeveen, L.A., Elliott, K., Dyer, T., Kerr- Bayles, L., Johnson, M., Comuzzie, A., Moses, E.K, et al. (2009). "Genetic variation in PARL influences mitochondrial content." Human Genetics 127(2): 183–190.

Kaskow, B.J., Diepeveen, L.A., Proffitt, J.M., Rea, A.J., Ulgiati, D., Blangero, J., Moses, E.K., Abraham, L.J. (2013). “Molecular prioritisation strategies to identify functional genetic variants in the cardiovascular disease-associated expression QTL Vanin-1.” European Journal of Human Genetics doi:10.1038/ejhg.2013.208

6

Table of Contents Declaration ........................................................................................................................ 2 Abstract ............................................................................................................................. 3 List of Publications ........................................................................................................... 5 Table of Contents .............................................................................................................. 6 List of Figures and Tables ................................................................................................. 9 Abbreviations .................................................................................................................. 11 Acknowledgements ......................................................................................................... 13 Chapter 1: Literature Review ...................................................................................... 15

1.1 Human genetic variation ....................................................................................... 15 1.2 Classes of human genetic variation ....................................................................... 18 1.3 Complex diseases and genetic association studies ................................................ 21

1.3.1 Candidate gene approach ............................................................................... 24 1.3.2 Genome-wide association studies (GWAS) ................................................... 27

1.4 Experimental validation of candidate genetic variation ........................................ 31 1.4.1 Reporter gene assays ...................................................................................... 34 1.4.2 Electrophoretic Mobility Shift Assay............................................................. 36 1.4.3 Chromatin Immunoprecipitation .................................................................... 38 1.4.4 CHART-PCR ................................................................................................. 40 1.4.5 Bisulfite sequencing ....................................................................................... 43 1.4.6 Genome-wide functional assays ..................................................................... 47 1.4.7 Bioinformatic analyses ................................................................................... 49

1.5 VNN1 – A gene model for novel experimental validation assays and cardiovascular disease ................................................................................................. 51

1.5.1 VNN1 – The novel cardiovascular risk-gene .................................................. 52 1.5.2 Gene models in cardiovascular disease – case studies ................................... 53

1.6 Research justification and PhD aims..................................................................... 57 Chapter 2: General methods and materials................................................................ 61

2.1 General reagents, biochemicals, buffers and solutions ......................................... 61 2.2 Bacterial Methods ................................................................................................. 62

2.2.1 Bacterial culture medium ............................................................................... 62 2.2.2 Bacterial culturing conditions ........................................................................ 63 2.2.3 TOPO cloning in bacteria............................................................................... 64 2.2.4 Transformation of Bacteria and blue-white colony selection ........................ 64

2.3 Eukaryotic Methods .............................................................................................. 65 2.3.1 Eukaryotic cells and culturing ........................................................................ 65 2.3.2 Eukaryotic cell counting with Trypan Blue staining ..................................... 65 2.3.3 Cyropreservation storage and revival of eukaryotic cell lines ....................... 66

2.4 Oligonucleotides ................................................................................................... 66 2.5 Isolation and purification of plasmid and eukaryotic DNA .................................. 69

2.5.1 Mini-prep purification of plasmid DNA ........................................................ 69 2.5.2 Isolation and purification of DNA from eukaryotic cells .............................. 69

2.6 Analysis of nucleic acids ....................................................................................... 70 2.6.1 Nano-drop analysis of nucleic acids .............................................................. 70 2.6.2 Gel Electrophoresis ........................................................................................ 70

2.7 Polymerase Chain Reaction (PCR) ....................................................................... 71 2.8 Sanger DNA Sequencing ...................................................................................... 71

7

2.9 Experimental validation methods for SNP functionality ...................................... 72 2.9.1 Chromatin Accessibility by Real-Time PCR (CHART-PCR) ....................... 72

2.9.1.1 Nuclei isolation ....................................................................................... 72 2.9.1.2 Nuclease digestion of chromatin ............................................................. 72 2.9.1.3 Collection of DNA fragments ................................................................. 73 2.9.1.4 qPCR and analysis .................................................................................. 73

2.9.2 Electrophoretic Mobility Shift Assay (EMSA) .............................................. 74 2.9.2.1 Nuclear protein extraction ....................................................................... 74 2.9.2.2 Binding reaction preparation ................................................................... 75 2.9.2.3 Gel electrophoresis and detection ........................................................... 76

2.9.3 Bisulfite Sequencing ...................................................................................... 77 2.9.3.1 Bisulfite treatment ................................................................................... 78 2.9.3.2 PCR amplification ................................................................................... 78 2.9.3.3 Cloning and transformation..................................................................... 78 2.9.3.4 DNA sequencing and analysis ................................................................ 79

2.10 Targeted-CHA-seq (T-CHA-seq)........................................................................ 79 2.10.1 T-CHA-seq library preparation .................................................................... 80

2.10.1.1 Isolation of nuclei from cell lines.......................................................... 80 2.10.1.2 Digesting DNA with chromatin accessibility agent .............................. 81 2.10.1.3 NEBNext DNA Sample Prep kit processing ......................................... 81

2.10.2 PCR amplification ........................................................................................ 82 2.10.3 TOPO cloning and transformation ............................................................... 84 2.10.4 Sanger DNA sequencing .............................................................................. 84 2.10.5 Analysis and bioinformatics ......................................................................... 84

2.11 Extended CHA-seq (CHA-seq) ........................................................................... 85 2.11.1 Library preparation....................................................................................... 86 2.11.2 Bead selection .............................................................................................. 86 2.11.3 Cluster generation and deep sequencing ...................................................... 86 2.11.4 Analysis with IGV 2.1.................................................................................. 86

2.12 Statistical Analyses ............................................................................................. 87 Chapter 3: In vivo assessment of VNN1 promoter SNPs using two chromatin-

sensitive nucleases in CHART-PCR ............................................................................ 88 3.1 Introduction ........................................................................................................... 88 3.2 Methods ................................................................................................................. 91 3.3 Results ................................................................................................................... 93

3.3.1 Assessment of relative chromatin accessibility within the VNN1 gene promoter using DNase I .......................................................................................... 93

3.3.1.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using DNase I................................................................................... 93 3.3.1.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using DNase I ...................................................................................................... 95

3.3.2 Assessment of relative chromatin accessibility within the VNN1 gene promoter using MNase ............................................................................................ 97

3.3.2.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using MNase .................................................................................... 97 3.3.2.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using MNase ....................................................................................................... 99

3.4 Discussion ........................................................................................................... 102 Chapter 4: CHA-seq - A novel functional variant screen ....................................... 106

4.1 Introduction ......................................................................................................... 106

8

4.2 CHA-seq method development ........................................................................... 110 4.2.1 Traditional chromatin arrangement assays – issues ..................................... 111 4.2.2 Targeted CHA-seq (T-CHA-seq) development ........................................... 113

4.2.2.1 T-CHA-seq library preparation – DNase I digestion optimisation ....... 113 4.2.2.2 T-CHA-seq library preparation – adapter-ligation optimisation ........... 115 4.2.2.3 PCR amplification - optimisation.......................................................... 117 4.2.2.4 T-CHA-seq methodology following PCR amplification ...................... 121

4.2.3 Extended CHA-seq method development .................................................... 121 4.2.3.1 Library preparation – MNase digestion optimization ........................... 122 4.2.3.2 CHA-seq methodology post-library preparation ................................... 124

4.3 Results ................................................................................................................... 124 4.3.1 Targeted CHA-seq (T-CHA-seq) ................................................................. 124 4.3.2 CHA-seq ....................................................................................................... 129

4.3.2.1 Allele-specific nuclease activity at the -587 SNP ................................. 134 4.3.2.2 Nucleosome mapping within the -587 VNN1 promoter region ............. 136

4.3.3 CHA-seq for the VNN2 and VNN3 genes ..................................................... 138 4.4 Discussion ........................................................................................................... 140

Chapter 5 – DNA methylation analysis of the -587 promoter region in the VNN1

gene ............................................................................................................................... 144 5.1 Introduction ......................................................................................................... 144 5.2 Materials and Methods ........................................................................................ 145 5.3 Results ................................................................................................................. 146

5.3.1 Predicted CpG sites in the -587 promoter region of the VNN1 gene ........... 146 5.3.2 Bisulfite sequencing of the -587 promoter region in the VNN1 gene and BISMA analysis .................................................................................................... 148 5.3.3 Bioinformatic analysis of the -720 CpG site in the VNN1 gene promoter ... 150

5.4 Discussion ........................................................................................................... 152 Chapter 6 – Experimental validation of candidate functional SNP using EMSA 155

6.1 Introduction ......................................................................................................... 155 6.2 Methods ............................................................................................................... 155 6.3 PARL Results ...................................................................................................... 158

6.3.1 EMSA and bioinformatic analysis of the -191 SNP in the PARL gene promoter ................................................................................................................ 158

6.4 VNN1 Results ...................................................................................................... 160 6.4.1 Bioinformatic analysis of the rs2267951 VNN1 SNP .................................. 160 6.4.2 EMSA analysis of the rs2267951 VNN1 SNP ............................................. 162

6.5 Discussion ........................................................................................................... 164 Chapter 7 – General discussion and future research directions ............................. 166

7.1 CHA-seq contribution to functional research ..................................................... 166 7.2 Contribution to VNN1 research ........................................................................... 168 7.3 Future research directions ................................................................................... 170

References ..................................................................................................................... 173 Appendix – Supplementary Tables and Figures ........................................................... 204

9

List of Figures and Tables Table 1.1: Characteristics of traditional experimental validation assays for the assessment of candidate functional genetic variants ....................................................... 33 Figure 1.1 Principles of CHART-PCR............................................................................ 41 Figure 1.2 Principles of methylation analysis using bisulfite genomic sequencing ....... 44 Table 2.1: Primer design guidelines ................................................................................ 68 Table 2.2: PCR primers used in T-CHA-seq .................................................................. 83 Table 3.1: CHART-PCR primers .................................................................................... 92 Figure 3.1: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions ................................................................. 94 Figure 3.2: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs ........ 96 Figure 3.3: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs relative to SPA-2 and β-actin control regions ................................................................. 98 Figure 3.4: Chromatin accessibility differences at the -137 VNN1 SNP ...................... 100 Figure 3.5: Chromatin accessibility differences at the -587 VNN1 SNP ...................... 101 Figure 4.1 Principles of CHA-seq and T-CHA-seq. ..................................................... 109 Figure 4.2: Chromatin accessibility differences between replicates at the β-actin promoter using CHART-PCR ....................................................................................... 112 Figure 4.3: Electrophoretic patterns of DNase 1-digested chromatin ........................... 114 Figure 4.4: Adapter ligation to DNase I-digested chromatin ........................................ 116 Figure 4.5: T-CHA-seq primer positions in the -137 VNN1 promoter region .............. 118 Figure 4.6: T-CHA-seq ‘forward’ PCR amplification of DNase I library fragments ... 119 Figure 4.7: T-CHA-seq ‘reverse’ PCR amplification of DNase I library fragments .... 120 Figure 4.8: Electrophoretic patterns of MNase-digested chromatin ............................. 123 Figure 4.9: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq .................................................................................................................... 126 Figure 4.10: Assessment of the chromatin arrangement across the VNN1 promoter using T-CHA-seq .................................................................................................................... 128 Table 4.1: Heterozygous VNN1 SNPs captured in CHA-seq. ....................................... 131 Figure 4.11: Distribution of heterozygous SNPs captured in CHA-seq across the VNN1 gene relative to allele skew ........................................................................................... 133 Figure 4.12: Assessment of the allele-specific chromatin accessibility across the -587 VNN1 promoter region using CHA-seq ........................................................................ 135 Figure 4.13: Assessment of the chromatin arrangement across the -587 VNN1 promoter region using CHA-seq ................................................................................................... 137 Table 4.2: Heterozygous VNN2 and VNN3 SNPs captured in CHA-seq ...................... 139 Figure 5.1: BISMA prediction of VNN1 CpG island length ......................................... 147 Figure 5.2: Methylation of the -587 SNP ...................................................................... 149 Figure 5.3: Bioinformatic analysis of -720 CpG site in the VNN1 gene promoter ....... 151 Table 6.1: EMSA oligonucleotides for PARL and VNN1 SNPs ................................... 157 Figure 6.1: EMSA for the -191 PARL SNP .................................................................. 159 Figure 6.2: Bioinformatic analysis of the rs2267951 SNP in the VNN1 gene .............. 161 Figure 6.3: EMSA for the rs2267951 VNN1 SNP relative to the -191 PARL SNP ...... 163

10

Table A.1: Description of general biochemicals and reagents ...................................... 204 Table A.2: Distributers for laboratory equipment used in research .............................. 207 Table A.3: Distributers for general laboratory disposables used in this research ......... 208 Table A.4: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA-

2 and β-actin control loci in SAFHS cell 0054 ............................................................. 209 Table A.5: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs in homozygous SAFHS cell lines ..................................................................................... 210 Table A.6: MNase cutting at the -137 and -587 VNN1 gene promoter SNPs, with SPA-2 and β-actin control loci in heterozygous Ramos cell line ............................................. 211 Table A.7: MNase cutting at the -137 VNN1 gene promoter SNP in different SAFHS cell lines ........................................................................................................................ 212 Table A.8: MNase cutting at the -587 VNN1 gene promoter SNP in different SAFHS cell lines ........................................................................................................................ 213

11

Abbreviations Abbreviation Description

º Degrees % Percentage ANOVA Analysis of Variance ATCC American Type Culture Collection BISMA Bisulfite Sequencing DNA Methylation Analysis bp Base pair BP Blood pressure BSA Bovine Serum Albumin C Celsius CD Crohn’s disease C/EBP CCAAT/enhancer binding protein CHART-PCR Chromatin accessibility by Real-Time PCR CHA-seq Chromatin accessibility by sequencing CHD Coronary heart disease ChIP Chromatin Immunoprecipitation cm Centimeters cM Centi-morgan CO2 Carbon dioxide CR Complement Receptor CVCD Common Variant, Common Disease CVD Cardiovascular disease ddH2O Double deionised water DMSO Dimethyl sulphoxide DNA Deoxyribonucleic acid DNase I Deoxyribonuclease I ds Double-stranded DTT Dithiothreitol EBV Epstein-Barr Virus E.coli Escherichia coli

EDTA Ethylenediaminetetraacetic acid EGTA Ethylene glycol tetraacetic acid EMSA Electrophoretic Mobility Shift Assay eQTL Expression Quantitative Trait Loci FBG Fasting blood glucose FBS Fetal Bovine Serum FH Familial hypercholesterolemia g Gram GWAS Genome-wide Association Study HCl Hydrochloric acid HDL-C High-density lipoprotein cholesterol HEPES Hydroxyethyl piperazineethanesulfonic acid Hg Mercury HGP Human Genome Project HLA Human leukocyte antigen IGV Integrative Genomics Viewer indel Insertion or deletion kb Kilobase pairs

12

L Litre(s) LB Luria-Bertani LD Linkage disequilibrium LDL Low-density lipoprotein m Milli (1 x 10-3) M Molar Mb Megabase pairs min Minute(s) MNase Micrococcal nuclease mRNA Messenger RNA NaOH Sodium Hydroxide nm Nanometers oligo Oligonucleotide PARL Presenilin Associated Rhomboid-Like PBS Phosphate Buffered Saline PCR Polymerase Chain Reaction PMSF Phenylmethylsulphonylfluoride Poly dI:dC Poly (deoxyinosinic-deoxycytidylic) acid PSI pounds per square inch qPCR Quantitative real-time PCR QTL Quantitative Trait Loci RA Rheumatoid arthritis RNA Ribonucleic acid rpm Revolutions per minute RVCD Rare Variant, Common Disease SAFHS San Antonio Family Heart Study SD Standard deviation SDS Sodium dodecyl sulphate seq Sequencing SNP Single Nucleotide Polymorphism Sp Specificity protein SPA-2 Surfactant protein 2 T1D Type 1 diabetes TAE Tris-acetate-EDTA T-CHA-seq Targeted chromatin accessibility by sequencing TE Tris-EDTA TF Transcription factor TNF Tumour Necrosis Factor Tris 2-amino-2-hydroxymethyl-1,3-propanediol U International enzyme units µ Micro (1 x 10-6) UV Ultraviolet V Volts VNN Vanin v/v Volume per volume w/w Weight per weight w/v Weight per volume X Times

13

Acknowledgements

First and foremost I would like to thank my supervisors W/Prof Lawrie Abraham and

A/Prof Daniela Ulgiati. It has been an honour to be one of your PhD students. You each

have taught me, both consciously and un-consciously, how to become a good scientist. I

appreciate all the contributions of time, ideas, and funding to make my PhD experience

productive and enriching.

Importantly, I wish to thank the University of Western Australia, the School of

Chemistry and Biochemistry and the Scholarships department for all of the funding,

education and opportunities I have been given. I must also recognise the amazing

opportunity I had to travel to the Texas Biomedical Research Institute. I am fortunate

for the time I was able to spend there, for the use of the deep sequencing facilities, and

for the cell-lines that were provided for this project. I must sincerely thank all concerned

for this opportunity, especially my supervisor Lawrie.

I must express my thanks the members of the Abraham/Ulgiati group have contributed

immensely to my personal and professional time at the University of Western Australia.

The group has been a source of friendships as well as good advice and collaboration. I

am grateful to all members including Des, Hui Min, Mahdad, Mark, Tim, Nick,

Michelle, Freya and others.

My time at the University of Western Australia was made enjoyable in large part due to

the many friends and groups that became a part of my life. I am appreciative of the

support from members of the School of Chemistry and Biochemistry including Amber,

Yat-zu, and Steve, Paul, Adam and Nehal. Thank you.

In particular, I would like to personally address the contributions of certain individuals.

Firstly Belinda, you have been a constant source of joy and inspiration. I am thankful

for all of your kind words and guidance. You are a true friend.

Alex, you have been both a friend and a role model. I could not have asked for a better

role model who is inspirational, supportive, and thoughtful.

14

Rhonda, your determination and passion was a source of inspiration for me. When times

were tough, you always managed to help me through.

Josh and Robert, you are both such amazing people. I know you both will achieve so

much in science, and life.

George, I have always enjoyed discussing our joint love of tennis and the West Coast

Eagles. More importantly, I am forever grateful for the opportunity you have given me

to work in your laboratory during the final months of writing my thesis. I also must

thank the other members of your group, Adam, Sarah, Robyn, Connie, Ken, Anne,

Aurelian, Megan, Ros, Karen and Bernard for making my time in the Yeoh laboratory

so fruitful and enriching.

Liz, you are an inspiration to me as both a scientist and person. You are both kind and

caring, and I am so grateful for your guidance. You have given me so much, and I am

truly grateful.

I would have never achieved what I have in life without the love and support of my

family. Mum, Dad, Lara, Pippa, you are my inspiration. Thank you for everything and

for always being there for me.

To my dearest wife, Becky, you are my world. I dedicate this PhD to you, because

without you I have nothing. I was so lucky to have found you during the course of this

PhD and I cannot wait for our future together.

15

Chapter 1: Literature Review

1.1 Human genetic variation

The last decade has seen significant effort committed to cataloguing human genetic

variants and their association with phenotypic differences or disease (Frazer et al., 2009;

Bodmer and Bonilla, 2008; Mackay et al., 2009). Since the completion of the Human

Genome project (HGP) in 2001, most common variants (minor allele frequency > 1%)

have been identified and assessed in genetic studies for statistical association with

complex traits and diseases (Collins et al., 2003; Bulik-Sullivan et al., 2012; Kathiresan

and Srivastava, 2012). Although genetic studies in humans have identified potential

disease-risks and novel biological insights, limited progress has been made to identify

the heritable component(s) of any complex human trait (Frazer et al., 2009).

Characterising and cataloguing genetic variation will assist in understanding the

functional mechanism behind which associated variants lead to certain phenotypic traits

or disease. Dissection of such mechanisms will enable the prioritization of candidate

genetic variants for further investigation (Stower, 2013).

One of the first large-scale studies to characterise human genetic variation was the

‘International HapMap Project’ (The International HapMap Consortium, 2003; The

International HapMap Consortium, 2004). This study aimed to characterize the

influence of human genetic variation on phenotype with a focus on understanding the

range of patterns and frequencies of genetic variants (Frazer et al., 2009). The major

objective of the HapMap project was to genotype at least one common variant every

five kilobases across the actively transcribed regions of the human genome (The

International HapMap Consortium, 2007). Between the HapMap project launch in 2002

16

and the phase II publication of results in 2007, more than 3.1 million single nucleotide

polymorphisms (SNPs) were catalogued in 270 individuals from four geographically

distinct populations (The International HapMap Consortium, 2007). More recently,

HapMap phase III results provide genotype data on approximately 1.5 million SNPs

from 1397 individuals in 11 populations (Pemberton et al., 2010). With the aid of

newer, high-throughput DNA sequencing technologies, over 10 million SNPs have now

been catalogued and are publically available online through the HapMap project

database (http://hapmap.ncbi.nlm.nih.gov/).

Ancestral and de novo mutations, in addition to the admixture generated by DNA

recombination, are the primary sources of genetic variation in a given population

(Jenkins et al., 2012; Peter et al., 2012). In humans, the contribution of de novo

mutation and recombination has been estimated on average to be 60 novel

mutations/variants per generation (Conrad et al., 2011), with an average rate of 0.5 ×

10−9 mutations base-pair−1 year−1 over the past million years (Scally and Durbin, 2012).

This estimate, however, suggests a mutation rate roughly half that of previously derived

measurements from fossil calibration and suggests a higher-than-previously-thought

contribution of DNA recombination to evolution (Scally and Durbin, 2012). Human de

novo mutation is not only an important substrate for selection and evolution (Lynch,

2010), but also to fitness through the generation of deleterious alleles as demonstrated

by Kong et al. (2012). Kong and co-workers determined that a father’s age at

conception was associated with an increase (two additional mutations per year) in de

novo mutation rate of SNPs, and subsequently increased disease-risk of schizophrenia

and autism in an Icelandic population.

17

Typically, DNA recombination occurs within 1-2 kb genomic ‘hotspots’. More than

30,000 hotspots have been identified in humans, suggesting that they are a ubiquitous

feature of the human genome (Myers et al., 2008; Myers et al., 2006). For instance,

known recombination hotspots have been reported within the human Prdm9 gene

(Baudat et al., 2010), the TAP2 gene (Jeffreys et al., 2000), the RNF212 gene (Kong et

al., 2002) and within the chromosome region 11p15.5 (Sandovici et al., 2006). The

average rate of recombination per generation in humans is 1.13 cM/Mb, however rates

vary significantly (0.1-3 cM/Mb) across the genome (Auton and McVean, 2012).

Variation in DNA recombination rate can be attributed to factors such as age and sex

(Hussin et al., 2011), genetic variation (Fledel-Alon et al., 2011), and chromosome size

(Farré et al., 2012). Wang et al. (2012) compared DNA recombination and de novo

mutation rates and found a recombination rate of approximately 23 events per cell,

whereas the aneuploidy rate ranged from 2-10% in sperm.

Gene expression is the mechanism by which genetic information is used in the synthesis

of a functional product such as RNAs and protein (Cambien and Tiret, 2007). The

functional impact of most genetic variants is hypothesised to be neutral (i.e. no

observable phenotypic impact), with many variants achieving high frequency in human

populations by chance (Powell, 2012; Cambien and Tiret, 2007). However, subsets of

genetic variants occurring within regulatory or coding regions can influence gene

expression levels or alter the amino acid sequence of proteins. This can result in

observable differences in phenotype (Cambien and Tiret, 2007). For example, Love-

Gregory et al. (2011) identified four metabolic syndrome risk-SNPs (rs1761667,

rs3211909, rs3211913, rs3211938) located within the CD36 gene which resulted in

reduced CD36 protein expression. These SNPs were also found to be associated with

levels of free fatty acids and high-density lipoprotein (HDL). More recently, Keenan et

18

al. (2012) reported that a coding SNP (rs4844609) in the CR1 gene, which correlated

with a Ser1610Thr amino acid change, was associated with episodic memory decline in

Alzheimer’s disease. Keenan and co-workers suggested that the rs4844609 SNP alters

the conformation of the functional domain of the CR1 protein, leading to the

development of Alzheimer’s symptoms. Given that ‘functional’ genetic variants can

interrupt or alter cellular mechanisms, understanding the role of genetic variation is

fundamental to disease studies in determining how a particular disease or phenotype

arises (Girirajan et al., 2011).

1.2 Classes of human genetic variation

Human genetic variants are termed as either common or rare based on the frequency of

the minor allele in a population, and is divided into two classes based on nucleotide

composition; single nucleotide variants and structural variants (Eichler et al., 2007;

Redon et al., 2006; Altshuler et al., 2010). Structural variations primarily consist of

insertions and deletions (‘indels’); in addition to copy number variations and

retrotransposons (Alkan et al., 2011; Muers, 2010; Mills et al., 2011). Single variants

account for approximately 90% of all genetic variation and occur on average every 100-

300 base pairs (Thomas et al., 2011).

The ‘1000 Genomes Project’ was established to catalogue all classes of human genetic

variation (Altshuler et al., 2010; Abecasis et al., 2012). This project consisted of

sequencing 1092 genomes using a low-coverage (2-6 fold) whole-genome

combinatorial approach. This approach consisted of targeted (50-100 fold) exonic and

dense SNP genotype data in conjunction with bioinformatic methods for selecting high

resolution variant calls. The 1000 genome project generated a haplotype map of 38

million SNPs, 1.4 million short insertions and deletions and more than 14,000 large

19

deletions (Altshuler et al., 2010; Abecasis et al., 2012). Ultimately, this project has

created the most detailed public catalogue of human genetic variation and is publically

available online (http://www.1000genomes.org/).

Common genetic variants in human populations are the focus of most current research

studies that target phenotypic traits (Marian and Belmont, 2011). Genetic variants

associated with a measurable allele-specific influence on the function of a gene product,

gene expression, or phenotype are considered to be functional (Wang et al., 2011;

Mottagui-Tabar et al., 2005; Knight, 2005; Jian et al., 2011). Functional genetic

variants that contribute to phenotypic differences amongst individuals have been

associated with cell function (Bonetti et al., 2012; Johnson et al., 2012; Collins et al.,

1997; Syvanen et al., 1999) and disease (Bartoszewski et al., 2010; Zhang et al., 2011;

Pratt and Dzau, 1999; Wang and Moult, 2001). Ultimately, the position of genetic

variants within a gene directly influences the potential functional impact on the gene

(Guo and Jamison, 2005). For example, SNPs occurring within regulatory regions,

especially near the transcription start site, can influence gene expression levels and

promote alternative splicing (Pickrell et al., 2010). Interestingly, genetic variants

located within regulatory regions that are distal to a particular gene, such as enhancer

elements, can also influence gene expression (Chavan et al., 2010; Steidl et al., 2007).

In contrast, SNPs within coding regions can alter the structure and activity of the

encoded functional products which may include both RNAs and proteins (Nackley et

al., 2006; Hoffmeyer et al., 2000). For instance, SNPs within exons of protein-coding

genes can alter the subsequent protein structure and activity as shown by Schaeffeler et

al. (2006) in the TPMT gene and Diogo et al. (2013) in the IL2RA, IL2RB and CD2

genes.

20

In contrast to single nucleotide variants, structural variants can have a more significant

influence on phenotype, often resulting in clinical manifestations such as Down and

Turner syndromes (Feuk and Carson, 2006; Lee and Scherer, 2010), and Angelman and

Prader–Willi syndromes (Sanders et al., 2011). The molecular consequences of

structural variants are locus-dependent, with variants occurring within coding regions

resulting in gene fusions or truncations. Alternatively, variants within regulatory regions

can lead to alterations in gene expression levels (Weischenfeldt et al., 2013). For

example, overlapping 1.5-3Mb deletions within the chromosome 22q11 band

containing genes such as TBX1, COMT, and CRKL, can lead to DiGeorge syndrome and

velocardio facial syndrome due to haploinsufficiency of a number of encoded proteins

(Maynard et al., 2002; Weischenfeldt et al., 2013).

In response to research evidence suggesting that most candidate functional variants are

located within regulatory rather than coding regions, a public research consortium

designated as the ‘ENCODE’ project was launched in 2003 to improve the functional

annotation of genetic variation in the human genome (Birney et al., 2007; Rosenbloom

et al., 2012). The objective of the ENCODE research was to identify and catalogue all

functional elements and variants in human and other mammalian genomes (Birney et

al., 2007; Rosenbloom et al., 2012). As of January 2012, 27 biochemical assays

(including RNA-seq, ChIP-seq, and DNA methylation assays) had been submitted. This

data covered 200 distinct human cell types and is available for public use

(http://genome.ucsc.edu). While the ENCODE project has generated significant

amounts of novel information, a detailed understanding of the default function and

interactions of regulatory variants in the phenotype of different cells and tissues remains

incomplete (Rosenbloom et al., 2012).

21

1.3 Complex diseases and genetic association studies

Complex diseases, as opposed to ‘simple’ single-gene Mendelian disorders, are

typically caused by combinations of genetic variants and/or mutations that

combinatorially result in dysregulation of components of a cellular system or pathway

(Cho et al., 2012). Complex diseases such as cancer, cardiovascular disease and

diabetes are associated with the effects of multiple genes in combination with

environmental and lifestyle factors (Erler and Linding, 2012; Kathiresan and Srivastava,

2012; Kodama et al., 2012). Consequently, most complex diseases are heterogeneous

given the unique genetic variation that exists between individuals and the vast array of

external environmental factors (Cho et al., 2012).

Fundamental to medical research is the identification of functional genetic variants and

the characterisation of the mechanisms through which they influence the phenotype of

complex diseases (Stranger et al., 2011). Thus far, genetic variation has been associated

with the disease phenotypes in terms of predisposition for disease initiation, progression

and severity (Duff, 2006; Knowles and Drumm, 2012). Two common hypotheses, the

‘common variant-common disease’ (CVCD) and the ‘rare-variant-common disease’

(RVCD) are used to describe the genetic contribution of individual susceptibility to

complex diseases (Schork et al., 2009). The CVCD hypothesis suggests that a collection

of common alleles (alleles with a population frequency of ≥0.05) increase the risk of

common disease onset and progression (Marian and Belmont, 2011). This hypothesis

suggests that common genetic variants confer a modest cumulative and interactive

effect in complex disease phenotype such as diabetes or cardiovascular disease (Liu and

Song, 2010). The RVCD hypothesis, on the other hand, suggests that variants with

lower-frequency (alleles with a population frequency of <0.005) and higher-penetrance

significantly contribute to disease phenotype (Bodmer and Bonilla 2008). For example,

22

a number of rare genetic variants have been demonstrated to contribute to diseases such

as hypertrophic cardiomyopathy (Marian, 2010) susceptibility and thus allow enhanced

detection of the diseases.

Coding variants were the initial targets of complex trait research (Bodmer and Bonilla,

2008; Majewski and Pastinen, 2011; Bostein and Risch, 2003), with casual rare variants

associated with common complex disorders often through amino acid substitution

(Ladouceur et al., 2012; Cohen et al., 2004). There is increasing evidence supporting the

RVCD hypothesis, with rare coding variants associated with common human disorders

including breast cancer (Schork et al., 2009), type I diabetes (Nejentsev et al., 2009),

hypertension (Ji et al., 2008), and cholesterol disorders associated with sterol absorption

and plasma LDL-cholesterol levels (Cohen et al., 2004; Cohen et al., 2006). For

example, Goldgar et al. (2011) identified a number of truncating variants, and a rare

SNP (‘c.7271T>G’) in the ATM gene that were associated with a significantly increased

risk of breast cancer. The penetrance of these variants was similar to that conferred by

mutations in the known breast cancer susceptibility gene BRCA2. While a focus of

complex disease research has been to identify susceptibility variants, recent studies have

identified rare genetic variants that have a ‘protective’ role in complex traits (Dai et al.,

2012). For instance, multiple rare variants have been shown to act protectively against

type I diabetes (Nejentsev et al., 2009) and psoriasis (Li et al., 2010).

The CVCD hypothesis proposes that common diseases are typically correlated with

multiple susceptibility alleles, with early supporting evidence from the ApoE ε4 allele

in Alzheimer's disease (Saunders et al. 1993; Genin et al., 2011) and a Factor V allele

(‘1691C�A’) in deep-vein thrombosis (Bertina et al. 1994; Knight, 2005). More

recently, common genetic variants have been identified in Crohn's disease (CD)

23

(Yamazaki et al., 2005; Mathew, 2008) and type 1 diabetes (Todd et al., 2007). In 2005,

Yamazakiet et al. identified a common intronic SNP (‘14,340T→C’) in the TNFSF15

gene, which encodes the tumour necrosis super-family member 15, which contributes to

CD susceptibility. A subsequent study by Duerr et al. (2006) screened a North-

American sample and reported strong association of multiple common SNPs

(rs2066843 and rs2076756) within a known CD susceptibility gene, CARD15. Common

variants associated with gene expression levels have also been linked to common

complex diseases such as breast cancer (Zhang et al., 2011). For instance, Zhang et al.

(2011) identified a common SNP (rs1044129) located in a micro-RNA binding site

(miR-367) in the 3’ untranslated region (3’UTR) of the RYR3 gene that was associated

with a higher risk of breast cancer. Zhang et al. (2011) reported that miR-367 had a

higher binding affinity (resulting in a decrease in gene expression) in the presence of the

major allele (A) compared to the minor allele (G). Genotypes in which the G allele was

present were associated with a higher risk for breast cancer development, calcification

and survival of the patient (Zhang et al., 2011).

In order to identify the genetic component of human diseases, early ‘family studies’

were performed to trace the distribution of traits from parental generations to progeny

and involved the determination of mathematical segregation ratios (Borecki and

Province, 2008). Family studies have been successful in the identification of genes

associated with monogenic diseases including Duchenne muscular dystrophy (Jones et

al., 2012) and Huntington disease (Anderson et al., 2004). However, this approach has

had limited success with complex traits and diseases. Complex diseases do not exhibit

patterns of Mendelian inheritance. Consequently, genetic association approaches are

used to identify genetic risk-variants in favour of family studies (Cambien and Tiret,

2007). Primarily through advancements in genotyping technologies, the focus of gene

24

mapping in humans has shifted towards the use of genetic association studies (Stranger

et al., 2011).

Genetic association studies incorporate large sets of SNPs and other markers located in

known candidate regions or genes to statistically determine possible causal variants that

show greater prevalence in case vs control samples (Doris, 2011). Genetic association is

dependent upon the existence of linkage disequilibrium (LD) between genetic variants

in close physical proximity within the genome Thus, the identification of possible

casual variants is possible due to the presence of LD with detected variants (Stranger et

al., 2011). Thus far, over 800 genotype/SNP-phenotype associations have been

identified from 600 published association studies covering 150 complex human diseases

(Bakir-Gungor and Sezerman, 2011). While association studies have provided great

insight into potential disease pathways, this success must be put into perspective by

considering inherent limitations (Hirschhorn et al., 2009; McCarthy et al., 2008). For

example, a review by Jakobsdottir et al. (2009) demonstrates that while genetic

associations are valuable for establishing research hypotheses, strong genetic

association does not guarantee the identification of causal genetic risk variation between

cases and controls. Ultimately, a more detailed background of the nature of the complex

disease and possible genomic regions of susceptibility and/or biochemical pathways is

required to further refine lists of potential disease-risk variants (Jostins and Barrett,

2011; Pociot et al., 2010).

1.3.1 Candidate gene approach

In order to identify and map novel genetic variation underlying human diseases or traits

in association studies, researchers utilise either candidate gene or genome-wide

approaches (Altshuler et al., 2008). Candidate gene studies target genetic variation

25

within specific genes of interest, with candidates generally selected based on prior

knowledge of the biological or functional effect of a gene in a particular phenotype or

disease (Zhu and Zhao, 2007). The key assumption in candidate gene approaches is that

a gene functionally associated with a disease-causing gene is likely to contribute to the

same or similar disease state (Ideker and Sharan, 2008; Goh et al. 2007).

Based on functional data in mice, Kurreeman et al. (2007) targeted complement

component 5 (C5) and TNF receptor-associated factor 1 (TRAF1) as candidate genes for

rheumatoid arthritis (RA). Kurreeman and co-workers identified the rs10818488 SNP to

be associated with a 6.1% increased risk of RA. The A allele of this SNP also found to

be correlated with increased RA disease progression. Candidate gene approaches have

also been used by Muise et al. (2012) to identify a novel missense variant (‘c.113

G→A’) in the neutrophil cytosolic factor 2 (NCF2) gene that increases the susceptibility

to and severity of inflammatory bowel disease. Muise and co-workers determined that

the c.113 G→A variant correlated with increased disease progression through reduced

binding of the NCF2 gene product, p67-phox, to the NADPH oxidase complex gene,

RAC2.

Typically, candidate gene studies have high statistical power, but the selection of

candidate genes is intrinsically subjective and often arbitrary (Moreau and Tranchevent,

2012). As such, traditional candidate gene studies are limited by a dependence on prior

knowledge of the biochemical, functional or physiological impact of possible

candidates, which is often incomplete or unknown (Zhu and Zhao, 2007). However,

recently computational strategies have been developed that integrate DNA sequence

data, gene expression information, functional characterisation and research literature to

26

allow improved candidate gene selection, as reviewed by Moreau and Tranchevent

(2012).

A review by Zhu and Zhao (2007) outlines three common approaches for improved

candidate gene selection in genetic association studies. The first approach, ‘position-

dependent’, integrates genome sequence information and candidate gene analyses to

identify candidate genes primarily based on the physical linkage within a chromosomal

locus previously associated with the trait of interest (Zhu and Zhao, 2007). The second

approach, outlined by Zhu and Zhao (2007), is ‘comparative genomics’. This approach

enables the identification and characterisation of putative candidates using a cross-

species approach in human-disease models. Thirdly, the ‘function-dependent’ strategy

incorporates analyses of expression of a gene associated with a particular trait at

different phenotypic stages, including signalling and regulatory pathways combined

with genome-wide transcriptional profiles, to prioritise candidate genes (Zhu and Zhao,

2007). Ultimately, combining these selection strategies together with the use of

bioinformatic tools can substantially reduces the number of candidate genes for further

analysis as reported by Moreau and Tranchevent (2012). For example, Rajab et al.

(2010) used a prioritisation tool, GeneDistiller, in combination with different candidate

gene approaches, to prioritize 74 genes from a 2 megabase region on chromosome 17

that was associated with cardiac arrythmias. Rajab and co-workers used this integrated

selection strategy to identify a mutation in the leading candidate gene PTRF, which

became the focus of subsequent research (Rajab et al., 2010).

Although the candidate gene approach is an effective and economical method for direct

gene discovery, the number of causative genetic faults that have been experimentally

validated is limited, thereby restricting the number candidate genes (Zhu and Zhao,

27

2007). Furthermore, the candidate gene approach has been criticized due to the non-

replication of results and the inability to include all potential causative genes and

polymorphisms, suggesting a large number of false-positive reports (Tabor et al., 2002;

Morgan et al., 2007; Pasche and Yi, 2011). Ultimately, the capabilities of traditional

candidate gene approaches are limited by the dependence on existing knowledge of the

underlying biology of a particular disease or phenotype (Zhu and Zhao, 2007). Given

that detailed characterisation of genes or variants underpinning most complex diseases

remains unknown, integrated selection strategies may represent the most effective

candidate gene approach in future research (Moreau and Tranchevent, 2012; Zhu and

Zhao, 2007).

1.3.2 Genome-wide association studies (GWAS)

With advances in high-throughput DNA sequencing technologies, genetic association

studies have extended trait mapping to a genome-wide scale, allowing the correlation of

allele frequencies of thousands of genetic variants with phenotypic variation in a

population (Stranger et al., 2011). Already with these advances, genome-wide

association studies (GWAS) have yielded evidence for the involvement of genetic

variants in complex human diseases including autism (Wang et al., 2009), asthma

(Sleiman et al., 2010), type 1-diabetes (Todd et al., 2007; Hakonarson et al., 2007), and

cancer (Tomlinson et al., 2012; Tao et al., 2012). GWA studies targeting human cancers

alone have yielded over 200 common low-penetrance susceptibility variants (Freedman

et al., 2011), including rs16892766 (Tomlinson et al., 2008) and rs3802842 (Tenesa et

al. 2008) in colorectal cancer, and rs1465618 and rs12621278 in prostate cancer (Eeles

et al., 2009).

28

One of the fundamental advantages of GWAS is that it does not require prior

knowledge of trait etiology and genomic structure, as opposed to the candidate gene

approach (Stranger et al., 2011). Consequently, the GWAS approach can identify causal

genes or variants that have not previously been associated with disease expression and

enable the characterisation of non-coding regions and pleiotropic genes in an unbiased

way (Gottesman et al., 2012). Essential to the GWAS approach is the necessity to

distinguish between the direct effects of functional variants themselves and indirect

associations due to LD in large sets of candidate variants generated in each study

(McCarthy et al., 2008; Teng et al., 2012). As such, the utilisation of bioinformatic and

computational approaches in GWAS are critical for the prioritisation of variants most

likely to contribute to the observed association and assists the selection of candidates for

subsequent experimental validation (Wang et al., 2011; Teng et al., 2012).

A number of bioinformatic tools such as TRAP (Thomas-Chollier et al., 2011),

BCRANK (Ameur et al., 2009), is-rSNP (Macintyre et al., 2010) and AASsites (Faber

et al., 2011), have been designed to prioritise candidate functional SNPs located within

regulatory regions such as transcription factor binding sites and exonic splice sites.

Furthermore, a number of systems for SNP prioritisation, as proposed by Lee and

Shatkay (2009) and Nica et al. (2010), combine multiple independent prediction tools to

enhance candidate SNP selection. For example, Nica et al. (2010) developed a novel

SNP prioritisation approach that accounts for local LD structure and integrates

quantitative trait loci (QTLs) and GWAS data to determine the specific association

indicators that arise from cis-acting QTLs. Nica et al. (2010) used their SNP

prioritisation approach to identify 35 candidate disease-associated SNPs from GWAS

and QTL datasets, of which the leading candidate disease-risk SNP, a non-synonymous

SNP (rs3764261) in the MT1H gene, was highly associated with high density

29

lipoprotein cholesterol (HDL-C). Previously, the MT1H gene has been associated with

inflammatory bowel diseases (Goyette et al., 2008).

In the last decade, GWAS have identified hundreds of genetic variants associated with

complex diseases such as systemic lupus erythematosus, multiple sclerosis, Crohn's

disease, rheumatoid arthritis and type 1 diabetes (T1D) (Hu and Daly, 2012; Hindorff et

al., 2009). In one such study, Shan et al. (2012) identified five out of nine previously

identified GWAS candidate loci to be significantly associated with breast cancer in a

Tunisian population. The five loci included the rs1219648 and rs2981582 SNPs from

the FGFR2 gene, rs8051542 in the TNRC9 gene, rs889312 of the MAP3K1 gene and the

rs13281615 SNP located on chromosome 8 (8q24). Shan and co-workers identified

homozygous rs2981582 genotypes to be strongly associated with lymph node negative

breast cancer, with the minor allele also linked to an increased risk of ER+ tumor

development and metastasis (Shan et al., 2012). GWA studies have also identified

candidate disease-risk SNPs in known T1D-susceptibility genes, such as HLA, INS,

PTPN22, IFIH1 and CTLA4 (Reddy et al., 2011). Moreover, genetic determinants

within the HLA class II genes are considered to account for almost 50% of T1D-risk

(Grant and Hakonarson, 2009).

GWAS have identified more than 2,000 complex trait-SNP associations, with numbers

increasing rapidly (Casto and Feldman, 2011). Regardless of the success as a screening

approach for identifying genetic variations involved in complex traits, GWAS research

based on SNP analysis are unable to identify causal genetic variants without

experimental validation (Li, M et al., 2011). Apart from the significant cost and

resources required of GWAS, a major criticism has been that the majority of the

candidate SNPs identified in such studies are only associated with minor increases in

30

disease susceptibility and therefore possess limited predictive value (Manolio, 2010). In

fact, the median odds ratio per candidate risk-SNP has been calculated as 1.33, which is

considered small given that it does not account for the significant proportion of the

heritable variation (Hindorff et al., 2009). Furthermore, without proper research design

it is possible for a number of variables to potentially confound GWAS results, including

sex, age and population stratification (Novembre et al., 2008). Confounding factors

from GWAS research design can also arise from lack of well-defined case and control

groups, controls for multiple testing and inadequate sample size (Pearson and Manolio,

2008). The resolution of GWAS is also the source of some debate, with typical screens

only able to identify partial chromosomal regions of QTLs at the mega-base level

through DNA markers which usually specify many candidate genes (Zhu and Zhao,

2007).

The results of GWAS can be considered as a source of preliminary genetic information

that serves as the basis for further analysis. Such further analysis is required to

accumulate evidence for subsequent prioritization and identification of strong candidate

variation for experimental validation (Cantor et al., 2010). Ultimately, the integration of

traditional and fine mapping data, cross-species resources, literature, bioinformatic

resources, and high-throughput sequence-based and gene expression data in GWA

studies will improve the validity of candidate variant lists (McCarthy et al., 2008; Li

and Patra, 2010). Although GWAS is an effective screening approach, experimental

validation is still required to determine actual functional variants within a prioritised

candidate gene or GWAS hits and reveal the functional mechanisms underlying disease

traits (Stranger et al., 2011)

31

1.4 Experimental validation of candidate genetic variation

Functional genomics, through experimental validation methods, attempts to elucidate

the molecular basis of biological functions by characterising the relative transcriptional

or functional potential of a genomic region (Werner, 2010). Reporter gene assays and

the Electrophoretic Mobility Shift Assay (EMSA) are benchmark tests for

experimentally validating candidate functional genetic variation within regulatory

regions as shown by Pang et al. (2013), Low et al. (2012), Knight (2005), Meckler et al.

(2013) and Lattka et al. (2010). Reporter gene assays measure transcript abundance and,

in turn, allow the elucidation of allele- or haplotype-specific effects on gene expression

levels (Cheng and Inglese, 2012; Liu et al., 2009; Seldon et al., 1986; Alam and Cook,

1990). Comparatively, EMSA can discriminate differences in binding of proteins such

as transcription factors at a particular locus, which may be caused by allelic differences

in the DNA (Albers et al., 2012; Hellman and Fried, 2007; Garner and Revzin, 1981).

Alternative experimental validation methods characterise genetic variants by assaying

for different mechanistic roles for functionality including chromatin arrangement

(Chromatin accessibility by Real-Time PCR) (Cruikshank et al., 2012; Rao et al., 2001)

and allele-specific FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements)

(Smith et al., 2012; Giresi et al., 2007) and chromatin conformation capture (Wang, S et

al., 2013), DNA methylation (bisulfite sequencing) (Shoemaker et al., 2010; Kagami et

al., 2010; Frommer et al., 1992), and in vivo DNA-protein interactions (Chromatin

Immunoprecipitation) (Johnson et al. 2007; Robertson et al., 2007; O’Neill and Turner,

1995). The results of experimental validation methods can also be complemented

through the use of bioinformatic analysis (Johnson, 2009; Schaefer et al., 2012) and

high-throughput DNA sequencing (Kingsley, 2011).

32

The development of post-association validation methods is required for characterizing

the function of trait- or disease-associated genetic variants which may influence disease

susceptibility (Gibson, 2012; Conde et al., 2013). Consequently, experimental

validation methodologies that integrate various biological data sets with genome-wide

and candidate gene results will help elucidate the mechanistic role of associated SNPs,

thus enabling the identification of causal disease variants for the development of

therapeutic interventions (Yanase et al., 2006; Kazani et al., 2010; Crews et al., 2012).

Several methodologies for experimental validation of genetic variant functionality are

currently available and assay for different mechanistic roles that lead to variant

functionality (Table 1.1). Combinations of various experimental validation methods

have been used to characterise the functional role of genetic variants in a range of

complex human diseases including schizophrenia (Li, M et al., 2011), breast cancer

(Wang et al., 2010), and type 2 diabetes (Pang et al., 2013).

33

Table 1.1: Characteristics of traditional experimental validation assays for the

assessment of candidate functional genetic variants

A summary of characteristics associated with current functional assays for assessing genetic variants, including: the functional mechanism assayed for (“Mechanism”), the type of assay (“Approach”), the issues and limitations associated with each assay (“Issues/Limitations”) and literature examples citing the use of the assay (“References”). Functional assay Mechanism Approach Issues/Limitations Reference(s)

Reporter gene assay

Relative transcript abundance

in vitro • Locus-specific

• Associated with minor fold changes in relative expression

• Confounding factors such as transfection efficiency and construct-concentration effects

Cheng and Inglese, (2012)

Liu et al. (2009)

Seldon et al. (1986)

Alam and Cook (1990)

Karimi et al. (2009)

EMSA DNA-protein binding interactions

in vitro • Locus-specific

• Target DNA sequence must be known prior to testing (oligo design)

• Inability to measure the co-operation of genomic sites that are essential for binding

Smith et al. (2010)

Carey et al. (2009)

Garner and Revzin (1981)

Knight (2005)

ChIP DNA-protein binding interactions

in vivo • Protein-specific

• Cumbersome (long)

• Requirement for significant numbers of cells

Wright et al. (2010)

O’Neill and Turner (1995)

Collas (2010)

CHART-PCR Chromatin accessibility

in vivo • Locus-specific

• Influenced by other SNPs and regulatory elements within the target region

• Target locus sequence complexity influences optimal amplification

• Methodological issues related to optimal nuclease digestion

Cruickshank et al. (2012)

Rao et al. (2001)

Cruickshank et al. (2009)

Bisulfite sequencing

Methylation patterns in vivo • Locus-specific

• Requirement for significant numbers of cells/chromatin due to harsh bisulfite treatment

• Target DNA sequence must be known prior to testing (oligo design)

Zhang, Y et al. (2009)

Shoemakers et al. (2010)

Frommer et al. (1992)

Grigg and Clark (1994)

Allele-specific FAIRE

Allelic effects on chromatin accessibility and regulatory potential

in vivo • Expensive

• Relies on the existence of a ChIP-grade antibody to recognise each DNA-bound transcription factor

Smith et al. (2012)

Giresi et al. (2007)

Chromatin conformation capture

Haplotype effects on chromatin conformation

in vivo • Detection of a product does not always mean a specific interaction has occurred between two regions

• Requirement of a great number of cells

Wang, S et al. (2013)

34

Published genetic association studies can identify candidate functional genetic variants

within gene promoters. Further studies have identified allele-specific transcription factor

binding patterns that lead to altered gene expression. This represents one possible

mechanism for genetic variants contributing to human disease (Butter et al., 2012). For

example, Li, M et al. (2011) identified a schizophrenia risk SNP (rs3755557) in the

GSK3β gene promoter that alters transcription factor binding affinities, resulting in a

higher promoter activity when the risk allele (A) is present as determined by reporter

gene and gel shift assays in multiple cell lines. More recently, McManus et al. (2012)

found that a hypertension risk-SNP (-1651T/C) in the CYP11B2 gene promoter

influenced binding of the transcriptional repressor APEX1, with the -1651T allele

associated with increased transcriptional activity in comparison to the -1651C allele,

where the repressor binds with decreased affinity.

1.4.1 Reporter gene assays

Reporter gene assays use allelic regulatory regions, such as gene promoters, to drive the

expression of a reporter gene (Karimi et al., 2009). Reporter gene expression can be

assayed so as to detect relative differences in transcript abundance as a result of

particular promoter variants in a cell line following transfection (Cheng and Inglese,

2012; Liu et al., 2009; Seldon et al., 1986; Alam and Cook, 1990). Allelic reporter

constructs that significantly differ in relative transcript abundance relative to controls

are considered to contain functional genetic variants (Karimi et al., 2009).

The reporter gene assay was originally designed to define the major transcriptional

elements that have an effect on basal and/or inducible gene expression, but has been

refined for various applications (Karimi et al., 2009). Further adjustments to the initial

35

methodology (Seldon et al., 1986; Alam and Cook, 1990) have been made, including

alternate methods of transfection and transformation, and the use of inducible

promoters. For example, Karimi et al. (2009) reports the use of a stable transfection

method that utilises FLP-recombinase to functionally assess the G-308A polymorphism

in the TNF gene promoter. Furthermore, the integration of a droplet-based micro-fluidic

system has been used to perform a quantitative reporter gene assay, and has increased

the resolution to the single cell level, allowing accurate measurement of the expression-

response profile for a nuclear receptor agonist; hormone 20E (Baret et al., 2010).

Reporter gene assays have been used to characterise the role of disease-risk variants in a

variety of human diseases, including cardiovascular diseases (Huang et al., 2007), acute

lung injury (Marzec et al., 2007), and viral infections such as hepatitis C (He et al.,

2009). For instance, Huang et al. (2007) identified a functional SNP (-764G/C) in the

IFN-γ promoter region that conferred a two- to three-fold higher promoter activity in the

presence of the G allele relative to C using reporter gene assays. Huang and co-workers

suggested that this SNP is functionally important in determining viral clearance and

treatment response in hepatitis C virus-infected patients. Alternatively, Marzec et al.

(2007) also used reporter gene assays to identify a promoter SNP (-617C/A) in the

NRF2 gene that is associated with reduced luciferase activity and is suggested to be

associated with acute lung injury. In cardiovascular disease research, He et al. (2009)

observed a 14-45% reduction in luciferase reporter expression in the presence of the

+190C coronary heart disease risk-allele (rs1043618) in the HSPA1A gene, relative to

the +190G allele in cell culture.

Although reporter gene assays are considered benchmark experimental tests for

function, typical reporter gene approaches are associated with inherent limitations that

36

confound functional assessment (Karimi et al., 2009). For instance, Karimi et al. (2009)

reviewed factors associated with transient reporter gene assays that confound the

assessment of allelic differences by testing a SNP (-308G/A) in the TNF gene promoter

identified variables that influenced allele-specific TNF transcript abundance included:

the method of transfection, host cell growth history, the amount (and quality) of DNA

and control vector used in transfection for normalisation purposes. An early study by

Smith and Hager (1997) also reported that aberrant transcriptional activity in reporter

gene assays can be influenced by the lack of appropriate chromatin arrangement.

Furthermore, in vitro gene expression, as measured by reporter gene assays, does not

necessarily correlate with in vivo gene expression (Makar et al., 2001). Makar et al.

(2001) determined that an intronic silencing region (CRS) in the human CR2 gene did

not silence transcription using in vitro reporter gene assays, but was functional upon in

vivo testing. Makar and co-workers found that the CRS region was functional only in in

vivo assays where the construct was stably integrated into cells to provide the

appropriate chromatin context.

1.4.2 Electrophoretic Mobility Shift Assay

Electrophoretic Mobility Shift Assay (EMSA) is an in vitro gel separation technique

based on electrophoretic migration of DNA-protein complexes (Hellman and Fried,

2007; Garner and Revzin, 1981). EMSA can differentiate between differences in

binding of proteins to regions of DNA which differ by as little as a single nucleotide

(SNPs). Furthermore, EMSA can be used to determine whether more than one protein

can bind to the region, as demonstrated in the KRT1 (Tao et al., 2006), ADRB2 (Naka et

al., 2012) and TNFSF4 gene promoters (Ria et al., 2011). Although variations in EMSA

methodology have been described in the literature, typically functional genetic variants

are identified by comparing migratory differences of a control binding reaction (with no

37

protein added) relative to allele-specific (or, gene-specific) DNA-protein reactions

(Hellman and Fried, 2007).

EMSA has been used to characterise the role of functional variants in human disease

studies including variants associated with increased risk of cardiovascular disease

(Göring et al., 2007; Musunuru et al., 2010). For example, Göring et al. (2007) used

EMSA to identify an additional DNA-protein binding complex in the presence of the -

137T allele within the promoter region of the cardiovascular risk-gene VNN1 that was

not present with the -137G allele. Bioinformatic analysis using the online software P-

Match (Chekmenev et al., 2005) supported the EMSA results and predicted binding of

the transcription factor (TF) Sp1 in the presence of the -137T allele (Göring et al.,

2007). Another study by Curran et al. (2009) used EMSA to identify allele-specific

binding of proteins at the -191T/C SNP in the promoter of the type 2 diabetes risk-gene

PARL. Bioinformatic analysis using P-Match (Chekmenev et al., 2005) supported the

EMSA findings and revealed that the -191T allele is embedded in a consensus Sp1

binding site whereas the -191C SNP ablated the core sequence. More recently,

Musunuru et al. (2010) used EMSA to identify an allele-specific CCAAT/enhancer

binding protein (C/EBP) transcription factor binding site, localized around the

rs12740374 SNP, which influences the hepatic expression of the SORT1 gene.

Supershift EMSA extends the capabilities of traditional EMSA methodology by using

antibodies specific to target proteins to confirm the identity of DNA-binding proteins

(Carey et al., 2009; Leong et al., 2003). For instance, Neumann et al. (2000) used

supershift EMSA to confirm the identity of 2 major DNA-protein binding complexes

observed in the κB gene within mature dendritic cells. Neumann and co-workers

identified p65/RelA-containing heterodimers (complex 1) and RelB-containing

38

heterodimers (complex 2) binding to oligonucleotides specific for the κB gene in mature

dendritic cell nuclear extract, indicating Rel-directed regulation of the gene encoding

the κB transcription factor. More recently, Smith et al. (2010) identified a novel

candidate functional SNP (rs327) in the LPL gene using supershift EMSA, which

confirmed the in vitro binding of the FOXA2 transcription factor to the T allele-

containing locus.

A recent review by Holden and Tacon (2011) cites a number of confounding factors in

EMSA, such as nuclear extract purity and the requirement of post-translational

modifications that influence in vitro DNA-protein interactions. Furthermore, Dekker

and Haisma (2009) have identified that histone acetyltransferase activation may be

required for the binding of particular proteins to genomic DNA, thereby limiting the

potential pool of candidate regulatory proteins assessed. A study by Hager et al. (2009)

describes a further limitation associated with EMSA whereby structural changes in

chromatin, which facilitate DNA-protein binding, are not taken into account in EMSA.

1.4.3 Chromatin Immunoprecipitation

Chromatin profiling represents an effective means of identifying functional disease-risk

genetic variants and for the detection of regulatory activity associated with chromatin

context (Ernst et al., 2011; Coetzee et al., 2010). The annotation of chromatin provides

a reflection of the transcriptional potential of a genomic region or of a genetic variant,

as chromatin plays a central role in mediating regulatory signals and controlling access

of transcription factors and other elements to the DNA (Ernst et al., 2011). The

characterisation of regulatory regions, in terms of histone modifications and

transcription factor-occupied regions, can be assessed through the direct (DNA-protein

binding) interaction of selected transcription factors with candidate genetic variants or

39

haplotypes (Freedman et al., 2011; Hon et al., 2009). For example, chromatin

immunoprecipitation (ChIP) is an in vivo experimental technique consisting of the

selective immunoprecipitation of a protein of interest from a chromatin preparation to

determine the associated DNA sequences (Collas, 2010; O’Neill and Turner, 1995).

Since initial publications in 1995 (O’Neill and Turner, 1995), the ChIP methodology

has been refined in order to target genetic variants for functional analysis, through the

comparison of chromatin derived from cell lines containing genetic variants of interest

(Wright et al., 2010). For example, Wright et al. (2010) used ChIP to identify allele-

specific enrichment of the TCF4 and beta-catenin proteins to a 300 bp region centred

around the rs6983267 SNP in familial colorectal cancer (CRC) cell lines. Wright and

co-workers found the ChIP DNA samples analysed using the TCF4 and beta-catenin

antibodies showed enrichment with the cancer-associated rs6983267G allele in contrast

to the T allele. ChIP has also been used by Menendez et al. (2006) to identify allele-

specific enrichment of the p53 transcription factor in the FLT-1 promoter encompassing

a SNP (‘C519T’). Menendez and co-workers identified p53 enrichment in the presence

of the T allele and determined an association between the observed allele-specific p53

interaction and angiogenesis-associated diseases, including cancer (Menendez et al.,

2006).

Despite proving useful in disease research, ChIP is associated with a number of inherent

limitations, relating to low signal strength (relative to controls) and resolution, that may

lead to inconclusive results (Collas, 2010; Carey et al., 2009). Carey and co-workers

also report that the use of ChIP cannot solely demonstrate the functionality of a protein

at a genomic region of interest (Carey et al., 2009). Experimentally, targeted ChIP

approaches can also be cumbersome and require significant numbers of cells (Collas,

40

2010), especially when targeting SNPs rather than the traditional target of haplotypes

(i.e. haploChIP') (Ameur et al., 2009; Hultman et al., 2010). Adding to the complexity,

the functional assessment of SNPs using ChIP requires the assessment of multiple

regions around and including a SNP in order to validate the influence of specific SNPs

(Charles, 2005).

1.4.4 CHART-PCR

Chromatin accessibility by Real-Time PCR (CHART-PCR) is an in vivo assay (Figure

1.1) that is able to map chromatin accessibility across a gene and regulatory sequences

(Rao et al., 2001; Cruickshank et al., 2012). CHART-PCR assigns a relative value of

accessibility of DNA binding proteins to a specific region of DNA and is an indication

of the chromatin arrangement and transcriptional activity at that site (Rao et al., 2001;

Hager et al., 1993; Poke et al., 2012). CHART-PCR is based on the concept that

localized ‘open’ chromatin conformation confers greater sensitivity to nuclease

digestion. The degree of sensitivity (or ‘chromatin accessibility’) can be amplified and

measured by quantitative real-time PCR (qPCR) (Mummidi et al., 2007).

41

Figure 1.1 Principles of CHART-PCR. CHART-PCR consists of firstly the isolation of intact nuclei. Following isolation and washing, nuclei are incubated with or without nuclease (such as micrococcal nuclease or deoxyribonuclease 1) giving “undigested” and “digested” samples. Accessibility reactions are subsequently terminated and genomic DNA recovered and prepared for analysis by quantitative real-time PCR in order to quantify target regions. Comparing the quantity of targeted genomic sequences in digested relative to undigested samples provides a measure of chromatin accessibility and expressed as a percentage of nuclease cutting across the region specified by the primers. Figure derived from www.epigentek.com

42

Mikkelson et al. (2007) and Birney et al. (2010) found that chromatin state can be read

in an allele-specific manner using SNPs, which can influence the localised chromatin

context and accessibility within genes. Moreover, Pang et al. (2013) reported that allele-

specific chromatin accessibility differences can influence gene expression, phenotype,

and functionality. CHART-PCR has been used to characterise functional chromatin

accessibility in genes including CYP19 (Fürbass et al., 2008), IL-2 (Rao et al., 2001),

and E-selectin (Edelstein et al., 2005). More recently, CHART-PCR has been refined in

order to differentiate between regions of DNA which differ by as little as a single SNP,

as shown in the CR2 gene (Cruickshank et al., 2012). Cruickshank and co-workers used

CHART-PCR to identify a systemic lupus erythematosus (SLE) risk-SNP (rs3813946)

in the 5′ untranslated region (UTR) of the CR2 gene that resulted in allele-specific

MNase digestion and transcription factor binding within the SNP-containing region.

These data confirmed the functional effects of rs3813946 on CR2 transcription

(Cruickshank et al., 2012), which had been previously found a decrease in

transcriptional activity of CR2 in the presence of the C allele relative to the T allele (Wu

et al., 2007).

While chromatin accessibility plays an essential role in transcriptional regulation,

CHART-PCR as an assay has limited potential due to locus-specific effects, with

confounding variability resulting from other SNPs and regulatory elements within the

target region (Cruickshank et al., 2009). CHART-PCR can also be considered as a low

throughput assay, with inconsistencies arising when comparing multiple cell lines due

to fluctuations in the degree of nuclease digestion (Karimi, 2007). Other sources of

potential variation can be attributed to quantitative real-time PCR (qPCR)

inconsistencies from: improper primer design (in low complexity regions), DNA hetero-

43

duplex formation, mis-priming and primer dimerisation (Freeman et al., 1999; Bustin et

al., 2005; and Git et al., 2010).

1.4.5 Bisulfite sequencing

DNA methylation is an epigenetic mechanism that involves the addition of a methyl

group to particular cytosine residues in the DNA. This process can drastically affect

transcriptional regulation during development, with dysregulation of this mechanism

associated with diseases such as type 2 diabetes (Volkmar et al., 2012) and cancer

(Deng et al., 2009; Suzuki et al., 2012). Typically, DNA methylation is associated with

reduced gene expression either directly, through reducing the accessibility of

transcription factors, or indirectly, by enhancing the formation of heterochromatin (Caro

et al., 2012; Watt and Molloy, 1988; Boyes et al., 1991). The process is reversible by

de-methylation of methylated residues (Holliday et al., 1991). Bisulfite sequencing

enables the identification of the methylation status of cystosines in targeted genomic

regions and is dependent on the action of bisulfite (Figure 1.2). Bisulfite preferentially

deaminates unmethylated cytosines, which are converted to uracil upon desulfonation,

while methylated cytosines are unaffected due to the presence of the methyl

modification (Benoukraf et al., 2013; Frommer et al., 1992). The amplification and

subsequent sequencing of target regions from bisulfite-treated DNA permits the

characterisation of the methylation status at single-nucleotide resolution (Deng et al.,

2009).

44

Figure 1.2 Principles of methylation analysis using bisulfite genomic sequencing. After treatment with sodium bisulfite, unmethylated cytosine residues are converted to uracil whereas 5-methylcytosine (5mC) remains unaffected. After PCR amplification, uracil residues are converted to thymine. DNA methylation status can be determined by direct PCR sequencing or cloning sequencing. Figure and legend obtained form Li and Tollefsbol (2011).

45

Bisulfite sequencing is particularly useful for the characterisation of CpG islands, as

increased numbers of CG dinucleotides in associated primers increase the specificity of

targeted amplification when performing PCR as they allow a tighter interaction with the

target region (Rauch et al., 2005). CpG islands are found within, or in close proximity

to, more than 70% of mammalian gene promoters (Hodges et al., 2009; Saxonov et al.,

2006). They can influence the chromatin conformation and subsequently affect the

transcriptional potential of a gene, depending on the methylation status of CG

dinucleotides within the island (Bird, 2002; Goll and Bestor, 2005). Robertson (2005)

reviewed the effects of irregular DNA methylation and aberrant regulation of CpG

island methylation status and noted associations with human diseases, including cancers

(van Rijnsoever et al., 2002; Dong et al., 2005) and cardiovascular disease (Dong et al.,

2002). Genetic variants located in CpG sites can greatly influence DNA methylation

and therefore affect chromatin conformation and accessibility. As such, allele-specific

differences in chromatin accessibility can often be detected and indicate SNP

functionality (Mikkelsen et al., 2007; Shoemakers et al., 2010). For example, Zhang, Y

et al. (2009) used bisulfite sequencing to assess DNA methylation of CpG islands on

chromosome 21 in leukocytes and identified significant allele-specific methylation due

to candidate functional SNPs in three chromosomal regions, including the promoter of

the CBR1 gene (dbSNPs rs41563015 and rs25678). Zhang and co-workers found that

DNA methylation varied by as much as 85% between alleles and was associated with

mono-allelic gene silencing of the methylated allele (Zhang, Y et al., 2009).

A review by Tycko (2010) describes two mechanistic models describing the role of

genetic variants situated within CpG islands. The first model suggests that variants can

either create or delete CpG sites, which in turn influence the propensity of adjacent non-

polymorphic CpG sites to become cytosine-methylated (Tycko, 2010). The alternative

46

model describes that a portion of CpG-associated genetic variants could modulate the

binding of transcription factors in either a positive or negative manner (Tycko, 2010).

For example, Boumber et al. (2008) found a 12 bp indel polymorphism in a Sp1 GTF-

binding site in a CpG island within the RIL gene promoter that influences the propensity

of the gene to become methylated in leukaemias.

Zhang and Jeltsch (2010) review sources of potential variability associated with the

bisulfite sequencing approach, including: clonal DNA amplification, incomplete

bisulfite conversion, non-CpG methylation and insufficient sequence depth. Recent

advances in high-throughput sequencing technologies, however, may overcome the lack

of sequencing depth required to generate conclusive evidence and also allows the

assessment of multiple genomic regions (Statham et al., 2012). For instance,

Shoemakers et al. (2010) used high-throughput bisulfite sequencing to identify

heterozygous SNPs in CpG sites (‘CpG-SNPs’) that disrupt DNA methylation.

Shoemakers and co-workers identified that allele-specific methylation (ASM) was

present on 23-37% of heterozygous SNPs in cell culture, with a significant proportion

of ASM (38-88%) influenced by SNPs disrupting potential CpG sites. In a related study,

Gertz et al. (2011) also used high-throughput bisulfite sequencing to identify allele-

specific DNA methylation on the X-chromosome, and found allele-specific methylation

at SNPs across six individuals of a three-generation family. Gertz and co-workers found

that 80% of the variation in DNA methylation was dependent on sequence. Historically,

technology has limited genome-wide scale approaches in epigenomics, but the

emergence of high-throughput sequencing technologies will allow the determination of

the global allele-specific methylation status (Callinan and Feinberg, 2006).

47

1.4.6 Genome-wide functional assays

The last decade has seen significant advances over the traditional Sanger et al. (1977)

chain-termination DNA sequencing approach, with current high-throughput (‘deep’ or

‘next-gen’) technologies capable of re-sequencing the human genome in approximately

a week using a single instrument (Ståhl and Lundeberg, 2012; Metzker, 2010; Shendure

and Ji 2008). Since the completion of the HGP, research has focussed on identifying the

functions of particular DNA sequences at a genome-wide level. Combining new high-

throughput sequencing with other techniques provides a more complete picture of how

the genome and various regulatory mechanisms result in particular biological functions

(Wu et al., 2006). Recent genome-wide functional assays that utilise high-throughput

sequencing technologies have enabled the direct evaluation of the role of genetic

variants in gene expression and function (Pastinen, 2010).

Deep sequencing technologies have enabled functional assays such as ChIP, to be

extended to a genome-wide scale (Collas, 2010; Barski et al., 2007; Statham et al.,

2012). A review by Collas (2010) provides a detailed evaluation of the existing

integrated approaches that combine deep sequencing and ChIP methodologies (‘ChIP-

seq’) for the profiling of histone modifications, histone variants, and transcription factor

occupancy of the DNA. For example, ChIP-seq was used initially by Barski et al.

(2007) to characterise histone modifications associated with H2A.Z, RNA polymerase

II, and the insulator binding protein CTCF in the human genome. More recently,

Statham et al. (2012) developed an enhanced ChIP-seq methodology for directly

determining the methylation status of histone-modified DNA through the combination

of high-throughput bisulfite sequencing and ChIP (‘BisChIP-seq’). Statham and co-

workers used BisChIP-seq to identify allele-specific methylation associated with

H3K27me3 histones in normal and prostate cancer cells (Statham et al., 2012).

48

Allele-specific gene expression has been widely studied, with an estimated 20% of

human polymorphic sites showing 1.5-fold expression differences (Zhang, K et al.,

2009), and 30% of human transcripts showing a 1.2-fold difference between alleles (Ge

et al., 2009). Thus far, the annotation of human genetic variation associated with gene

expression has centred on expression quantitative trait loci (eQTL) mapping. However,

next-gen sequencing technologies have enabled the development of genome-wide

functional assays targeting allele-specific gene expression (or ‘cis-regulatory

variation’), DNA–protein interactions and chromatin configuration (Pastinen, 2010;

Cookson et al., 2009). For example, early studies characterizing allelic-specific gene

expression utilised deep sequencing to target the human transcriptome (‘RNA-seq’), as

shown by Montgomery et al. (2010) and Pickrell et al. (2010). RNA-seq is dependent

upon the correlation between sequence reads associated with polymorphic sites and

RNA abundance, which enables the detection of genetic variants that show potentially

skewed expression (Pastinen, 2010). For instance, Pickrell et al. (2010) used RNA-seq

to identify 138 SNPs that were associated with eQTLs, including one SNP (rs7639979)

showing allele-specific expression (GG>GA>AA) in the TSP50 gene in Nigerian

lymphoblastoid cell lines.

While genome-wide functional assays have overcome some limitations related to

traditional assays, such as locus-specificity, large-scale approaches involving

comprehensive genome coverage are associated with both false positives and false

negatives (Mardis and Wilson, 2009; Grunenfelder and Winzler, 2002; Huttlin et al.,

2007). Furthermore, the quality and resolution of datasets from seemingly related

studies can vary considerably (von Mering et al., 2002). In order to resolve the effects

of confounding factors, the results from such genome-wide functional experiments must

49

be validated by performing further experiments (Martin and Drubin, 2003; Mattocks et

al., 2010).

A current focus in functional genomics is the elucidation of causal genetic variants that

are essential for cis-regulatory control of gene expression, with association studies

implicating a functional role of these genetic variants in complex traits and disease

(Pastinen, 2010). Given that the genes and variants associated with complex traits can

be diverse, the integration of multiple whole-genome approaches including GWAS and

genome-wide functional assays such as ChIP-seq and RNA-seq, are essential for

characterising the genetic basis of complex traits (Hawkins et al., 2010; Cowie et al.,

2006). This integrated approach enables the functional characterisation of genetic

variants with single nucleotide resolution, and overcomes limitations associated with

benchmark tests such as reporter gene and gel shifts assays (Pastinen, 2010; Karimi et

al., 2009). Genome-wide approaches generate large numbers of experimental data

points, with difficulties arising when attempting to discern functionally relevant results

from large-scale datasets (Salmon-Divon et al., 2010). As such, bioinformatic tools

represent an ideal approach for the prioritisation of candidate genetic variants from

genome-wide datasets for further investigation (Kann, 2010; Sun et al., 2009; Salmon-

Divon et al., 2010).

1.4.7 Bioinformatic analyses

Bioinformatic approaches are important for the prediction, prioritisation and verification

of functional genetic variants in the human genome, as reviewed by Huang et al. (2009)

and Fernald et al. (2011). Many bioinformatic approaches exist for the prioritisation of

functional genetic variation by identifying transcription factor binding motifs that

contain potential functional variants. Such tools include JASPAR (Sandelin et al.,

50

2004); P-Match (Chekmenev et al., 2005); TRANSFAC (Wingender et al., 1996; Matys

et al., 2006); and AliBaba2 (Grabe, 2002) and often utilise various methods to improve

the validity of results. For example, P-Match integrates known transcription factor

binding sites from the TRANSFAC database (Wingender et al., 1996) with pattern

matching and weight matrix algorithms to increase recognition accuracy and reduce

error rate (Chekmenev et al., 2005). P-Match has been used to predict transcription

factor binding sites in fish (Sun et al., 2006), mice (Jiang et al., 2008) and humans

(Göring et al., 2007). Similarly, AliBaba2 utilises the TRANSFAC database

(Wingender et al., 1996) to identify transcription factor binding sites by generating

predictive matrices based on input DNA sequences (Grabe, 2002), as demonstrated in

the MTA1 gene (Reddy et al., 2009) and the SVEP1 gene (Glait-Santar and Benayahu,

2011). AliBaba2 has been used to predict transcription factor binding sites in different

organisms including insects (Ursic-Bedoya et al., 2008), fish (Yeh and Klesius, 2008)

and humans (Moosa et al., 2011).

The structural influence of genetic variants at the protein level can be predicted using

bioinformatic tools as reported by Lin et al. (2005), Cuff et al. (1998) and Faraggi et al.

(2012). For example, YASPIN is an online database that utilises 7-state local structure

combined with Markov chain models to predict protein secondary structures (Lin et al.,

2005). YASPIN has been used by Pool et al. (2012) and Fallahi-Sichani et al. (2011) to

determine protein secondary structures. Other tools that are being developed to better

predict secondary and tertiary structures include SPINE X (Faraggi et al., 2012) and

FPGA accelerator (Xia et al., 2011).

With recent advancements in high-throughput sequencing technologies, new challenges

have arisen that require further bioinformatic power. Such challenges include complex

51

multiple sequence alignments from large datasets and the non-reproducibility present

amongst various current tools, as reviewed by Aniba et al. (2010). However, these

limitations are, in part, overcome by the integration of multiple bioinformatic tools.

Such a high-throughput approach is likely to increase the likelihood of identifying

casual genetic variants in human traits and disease (Huang et al., 2009). For example,

Lee and Shatkay (2008) developed a functional SNP (F-SNP) application that

incorporates 16 different bioinformatic tools to predict the splicing, transcriptional,

translational and post-translation effects of SNPs. Multiple bioinformatic tools, via the

F-SNP database, have been used to predict functional SNPs in the FOXE1 gene (Landa

et al., 2009), and genes encoding for proline metabolic enzymes (Hu et al., 2008). Xu

and Taylor (2009) use an equivalent approach called ‘SNPinfo’ for candidate SNP

prioritisation that integrates published GWAS results with LD data and in silico

predictions of functional characteristics. The SNPinfo database has been used in studies

to predict functional SNPs in the LIN28B gene (Permuth-Wey et al., 2011) and the

TNFSF8 gene (Wei et al., 2011). Typically, most genetic variants identified in GWAS

have minor effects on disease susceptibility. Thus, the integration of bioinformatic

approaches and functional data is essential for the prioritisation and identification of

candidate functional genetic variants (Moore et al. 2010).

1.5 VNN1 – A gene model for novel experimental validation assays and cardiovascular disease

The VNN1 gene has been associated with a number of human diseases (Kaskow et al.,

2012) including inflammation (Martin et al., 2004; Berruyer et al., 2006; Jansen et al.,

2009), malaria (Min-Oo et al., 2007), type 1 diabetes (Roisin-Bouffay et al., 2008) and

cardiovascular disease (Göring et al., 2007; Zhu and Cooper, 2007). Initial

cardiovascular studies concerning the VNN1 gene targeted transcript levels as an

52

endophenotype to identify the VNN1 gene as a quantitative trait locus associated with

high density lipoprotein–cholesterol levels and cardiovascular phenotypes (Göring et

al., 2007). More recently, genetic variants within the VNN1 gene have been found to be

associated with cardiovascular disease as shown by Jacobo-Albavera et al. (2012), Fava

et al. (2010) and Zhu and Cooper (2007). Furthermore, statistical and functional genetic

assays have experimentally assessed a number of functional genetic variants within the

VNN1 gene including the ‘-708G/A’, ‘-667G/A’, ‘-612C/A’, ‘-587G/A’, ‘-137T/G’

SNPs in the gene promoter (Göring et al., 2007) and the exonic rs2294757 SNP (Kerkel

et al., 2008). Reviews exemplify VNN1 as a novel cardiovascular risk-gene and an ideal

gene model for further functional analysis given the experimentally validated

associations of VNN1 genetic variation with cardiovascular phenotypes (Göring et al.,

2007; Kerkel et al., 2008).

1.5.1 VNN1 – The novel cardiovascular risk-gene

In humans, the VNN1 gene is located on the antisense strand of chromosome 6 (Entrez

Gene cytogenetic band: 6q23-24) within a cluster of three genes (VNN1, VNN2, and

VNN3) which occupy a 80 kb region and comprise the Vanin gene family (Martin et

al., 2001; Galland et al., 1998). The VNN1 gene is 33,198 bp in length and consists of

seven exons (Martin et al. 2001; http://www.genecards.org/cgi-

bin/carddisp.pl?gene=VNN1). The dbSNP database includes over 40 million validated

SNPs in the most recent human genome build (dbSNP build 135), with over seven-

hundred SNPs (779 as of July 2012) located within the VNN1 gene (Sherry et al., 2001;

http://www.ncbi.nlm.nih.gov/projects/SNP/). Orthologs for the VNN1 gene have been

identified in 15 species including mice (Martin et al., 2001) and rats (Fugmann et al.,

2011). VNN1 mRNA expression has been reported in numerous human cells and tissues

including leukocytes, kidney, liver, and the intestine (Galland et al., 1998; Jansen et al.

53

2009). A genome-wide profiling study by Yanai et al. (2005) identified VNN1

expression levels to be highest within whole blood and liver.

The VNN1 gene encodes for a 513 amino acid protein, namely Vanin 1, which was first

reported to have pantetheinase activity by Aurrand-Lions et al. (1996). Aurrand-Lions

et al. (1996) identified similar pantetheinase activity in Vanin family members VNN2

and VNN3, who also found that Vanin 1 played a role in regulating late adhesion steps

of thymus homing under physiological (non-inflammatory) conditions. Aurrands-Lion

et al. (1996) also reported the Vanin 1 protein to have a sub-cellular location on the cell

membrane, linked to a lycosylphosphatidylinositol (GPI)-anchor. In a subsequent study,

Marras et al. (1999) defined Vanin 1 pantetheinase activity to involve the hydrolysis of

one carbo-amide linkage in d-pantetheine to form d-pantothenate (vitamin

B5/pantothenic acid) and cysteamine (2-aminoethanethiol). A recent review by Kaskow

et al. (2012) reported that Vanin 1-directed hydrolysis of pantetheine impacts on

cellular processes including fatty acid metabolism and oxidative stress response.

1.5.2 Gene models in cardiovascular disease – case studies

Cardiovascular diseases (CVDs) represent the leading cause of death worldwide (Moran

et al., 2012), and affected more than 80 million people in the United States in 2012

alone (Kathiresan and Srivastava, 2012). To elucidate the genetic risk factors of CVD,

early research focused on the APOE gene, which encodes the lipid-transport protein

apolipoprotein E (Rall and Mahley, 1992; Visvikis-Siest and Marteau, 2006; Willer et

al., 2008). The APOE gene is polymorphic with two non-synonymous SNPs (rs429358

and rs7412) which correspond to three haplotypes (ε2, ε3, ε4) with varying population

frequencies (Bekris et al., 2008). The polymorphisms alter the amino acid sequence at

positions 112 and 158 and result in three apolipoprotein E isoforms which differ in

54

structure and function (Kuhlmann et al., 2010; McIntosh et al., 2012). A report by

Cambien and Tiret (2007) determined that ε4 carriers (allele frequency >20% in the

population) had a 40% higher risk of CHD than ε3 homozygotes, and shows the link

between genetic variation and susceptibility risk for cardiovascular disease. More

recently, the APOE gene has also been associated with Alzheimer’s disease (Genin et

al., 2011).

At the molecular level, combinations of genetic variants are assumed to dysregulate

particular cellular pathways and contribute to human diseases such as CVD (Kim et al.,

2011). Although few causal genetic risk factors have been associated with complex

CVDs, the genetic mechanisms underlying monogenic cardiovascular traits have been

more extensively characterised (Nabel, 2003; Glazier et al., 2002; Timpson et al.,

2012). For example, low-density lipoprotein (LDL) is a risk factor for monogenic forms

of CVD through the inactivation of hepatic LDL receptors, which modulates LDL levels

in plasma (Timpson et al., 2012). Familial hypercholesterolemia (FH) is a monogenic

disorder associated with mutations within the LDLR gene, leading to a deficit in LDL

receptors and subsequently elevated plasma cholesterol levels (Nabel, 2003; Kassim et

al., 2010). Published estimates indicate in excess of 600 mutations and variants in the

LDLR gene have been identified in patients with FH, with heritability of variants

estimated to exceed 50% (Goldstein et al., 2001; Cambien and Tiret, 2007; Browning

and Browning, 2013).

Hypertension is a complex cardiovascular condition associated with elevated blood

pressure in arteries and confers an increased susceptibility to heart and renal failure,

stroke, and myocardial infarction (Messerli et al., 2007). Many genetic risk factors

associated with increased blood pressure have been identified (Ehret et al., 2011; Lifton

55

et al., 2001). However, the genetic basis of primary hypertension remains poorly

understood; in part due to confounding environmental and lifestyle factors such as

dietary salt consumption (He and MacGregor, 2009), physical activity (Dickinson et al.,

2006), and obesity (Haslam and James, 2005) which influence the CVD phenotype

(Taylor et al., 2009). Thus far, hypertension research has focused on the identification

of genetic risk variants in the nuclear genome; however, a recent study by Liu, C et al.

(2012) identified a number of SNPs within the mitochondrial genome that contribute to

increased susceptibility to hypertension. Liu and co-workers identified a non-

synonymous mitochondrial SNP 5913G>A (Asp>Asn) in the cytochrome C oxidase

subunit 1 of respiratory complex IV that was found to have a significant association

with blood pressure (BP) and fasting blood glucose (FBG) levels (Liu, C et al., 2012).

On average, carriers of the rare 5913A allele were at greater risk of developing

hypertension, with 7mm Hg higher systolic BP than observed at baseline levels and

17mg/dL higher mean FBG (Liu, C et al., 2012). Lui and co-workers identified a

second mitochondrial SNP, 3316G>A (Ala>Thr) in the NADH dehydrogenase subunit 1

of complex I that was also highly associated with BP and FBG levels (Liu, C et al.,

2012). The two individuals with the rare 3316A allele had 17 and 25mg/dL higher FBG

than observed at baseline levels (Liu, C et al., 2012). Estimates of the contribution of

genetic variants to hypertension range from 30 to 50% (Tanira and Balushi, 2005),

which reflects the importance of identifying and characterising the role of functional

genetic variants in human diseases.

To elucidate the role of VNN1 in CVD, Göring et al. (2007) used genome-wide

transcriptional profiles (of lymphocytes) and bioinformatic tools to identify genetic

variants in the VNN1 gene promoter that were highly associated with transcript

abundance and high-density lipoprotein cholesterol (HDL-C) concentrations. Göring

56

and co-workers reported five SNPs (-708G/A, -667G/A, -612C/A, -587G/A, -137T/G)

in the VNN1 gene promoter that were identified to be putatively functional based on the

genetic association with transcript abundance and HLD-C concentration (Göring et al.,

2007). The -137T/G SNP had the highest association with HDL-C concentrations and

was functionally assessed using EMSA, which indicated an allele-specific binding

profile of DNA-protein complexes (Göring et al., 2007). Bioinformatic analysis of the -

137T/G SNP using P-Match identified allele-specific stimulating protein 1 (Sp1)

binding in the presence of the -137T allele (Göring et al., 2007). The transcription factor

Sp1 has previously been associated with high gene expression and contains a zinc finger

protein motif which directly binds to DNA and allows the TF to enhance transcriptional

activity (Jiang et al., 2005). More recently, Jacobo-Albavera et al. (2012) also found

VNN1 gene expression and the -137T/G SNP to be associated with HDL-C levels in

Mexican children.

To determine the role of DNA methylation in VNN1 gene expression, Kerkel et al.

(2008) used a methylation-sensitive restriction endonuclease (HpaII) to assess

heterozygous VNN1 genetic variants their effect on gene expression. Kerkel and co-

workers identified the A allele of an exonic SNP, rs2294757, to have preferential

upregulation of gene expression, indicating allele-specific methylation and the presence

of functional SNPs in the VNN1 gene (Kerkel et al., 2008). Subsequently, Fava et al.

(2010) found an association between the rs2294757 SNP and cardiovascular disease

symptoms such as hypertension.

The identification of causal genetic risk factors represents the primary focus of

cardiovascular research, with genetic variants in the VNN1 gene representing novel

targets for analysis based on published studies (Zhu and Cooper, 2007; Fava et al.,

57

2010; Jacobo-Albavera et al., 2012; Göring et al., 2007; Kerkel et al., 2008). Clearly,

the VNN1 gene is involved with cardiovascular phenotypes, however, the specific roles

of these genetic variants in disease is not obvious (Kaskow et al., 2012). While

predictive bioinformatic tools and genetic association studies have been used to identify

candidate functional variants in the VNN1 gene (Zhu and Cooper, 2007; Fava et al.,

2010), further experimental research is required to elucidate the role of VNN1 genetic

variants in CVD in order to complement functional studies, such as those done by

Göring et al. (2007) and Kerkel et al. (2008).

1.6 Research justification and PhD aims

Recent advances in bioinformatics and functional genomics have allowed greater

precision in predicting the function of human genetic variants in complex traits and

disease (Collas, 2010; Statham et al., 2012). However, a review of the literature

indicates there is a still a clear requirement for robust, rapid and reproducible

experimental validation methods for assessing candidate functional genetic variants,

such as those within the VNN1 gene. The VNN1 gene is clearly involved in

inflammation, but specific roles in other pathways are unclear, with dysregulation of the

VNN1 gene also resulting in the manifestation of other diseases (Kaskow et al., 2012).

Vanin 1 pantetheinase activity is critical in maintenance of normal cellular processes.

Abnormal gene activity, and consequent dysregulation of such processes, can lead to

disease and indicates the importance of characterising the effect of genetic variants on

gene expression (Kaskow et al., 2012). In fact, recently published studies show that

VNN1 genetic variants have a role in cardiovascular disease (Jacobo-Albavera et al.,

2012; Zhu and Cooper, 2007; Fava et al., 2010; Göring et al., 2007; Kerkel et al.,

2008), which promotes VNN1 as the ideal gene model for undertaking functional

analyses using novel experimental validation approaches. Since there is limited

58

information regarding the functional mechanism by which genetic variants influence

human disease, this thesis will significantly contribute to the understanding of the role

of functional genetic variants in the cardiovascular risk-gene VNN1.

The current study will include the development of a novel in vivo assay for the

identification and characterisation of functional genetic variants within the VNN1 gene,

which overcomes limitations associated with functional assays available at the

commencement of this study (2009). This novel assay, namely ‘CHA-seq’, is based on

the measurement of chromatin accessibility across regions containing genetic variants to

account for patterns of transcription factor binding, nucleosomal/histone positioning and

DNA methylation, which are directly associated with functional genetic variation. The

VNN1 gene was used as a model for the development of CHA-seq, with the aim of

extending this approach to a genome-wide scale using novel high-throughput

sequencing technologies.

The aims of this PhD research were to:

1. Use existing chromatin arrangement methods (i.e. CHART-PCR) to assess the

functionality of candidate genetic variants in the VNN1 gene promoter.

2. Develop a novel experimental validation assay (‘CHA-seq’) to evaluate the

functionality of candidate genetic variants in the VNN1 gene promoter.

3. Experimentally validate candidate functional genetic variants in the VNN1

gene promoter derived from the CHA-seq approach.

59

To address the first aim of this study, CHART-PCR was used to characterise chromatin

accessibility in the VNN1 gene promoter using cultured lymphocytic cell lines derived

from the San Antonio Family Heart Study (SAFHS) (Göring et al., 2007), in addition to

the Ramos lymphocytic cell line obtained from the American Type Culture Collection

(ATCC). Cell lines were selected for functional analysis based on the particular

genotypes present at the -137T/G and -587G/A SNP loci in the VNN1 gene promoter.

The -137T/G and -587G/A VNN1 SNPs were selected for further functional analysis

based on preliminary findings in our laboratory and the results of published studies such

as Göring et al. (2007). The -137T/G and -587G/A SNPs were assessed using two

chromatin-sensitive nucleases (micrococcal nuclease and deoxyribonuclease 1) in

CHART-PCR to determine the presence or absence of allele-specific chromatin

accessibility, which is indicative of functionality.

To address the second aim, chromatin derived from the SAFHS cell lines with different

allelic combinations were used to develop the novel ‘CHA-seq’ approach to

functionally characterise genetic variants in the VNN1 gene. The CHA-seq approach

incorporated chromatin-sensitive nucleases and high-throughput DNA sequencing in

order to simultaneously identify and functionally assess candidate SNPs based on

chromatin accessibility. CHA-seq was used to validate the findings of CHART-PCR of

the candidate functional -137T/G and -587G/A SNPs and to screen for other novel

candidate functional SNPs in the VNN1 gene.

To address the third research aim, novel candidate functional VNN1 SNPs identified

from CHA-seq were selected for further functional assessment in order to characterise

the role of the candidate genetic variants in the VNN1 gene. A combined bioinformatic

and experimental validation approach was used to confirm the CHA-seq results which

60

incorporated bioinformatic or in silico predictive methods such as P-Match and

AliBaba, in addition to ‘traditional’ experimental methods such as EMSA and bisulfite

sequencing.

61

Chapter 2: General methods and materials

2.1 General reagents, biochemicals, buffers and solutions

Biochemical reagents used in the current study were of the highest grade available and

purchased from primarily Sigma Chemical Company (Sigma, Australia), Roche

Diagnostics Corporation (Roche Diagnostics Australia Pty Ltd, Australia), Invitrogen

Corporation, Australia Corporation (Invitrogen Corporation, Australia), QIAGEN

(QIAGEN, Australia) and BD Biosciences (Becton, Dickinson and Co, USA). All

disposable plastic (polypropylene and polystyrene) lab-ware was purchased sterile from

Sarstedt (SARSTEDT AG and Co, Germany), Nalgene (Nalge Nunc International,

USA) or Corning Life Sciences (Corning Incorporated, USA), with a detailed list of lab-

ware shown in Table A.3 (Appendix – Supplementary Tables and Figures).

Biochemicals and buffers were prepared in accord with the methods described by

Sambrook and Russell (2001) using sterile non-pyrogenic (Baxter Internation Inc.,

USA), double deionised Millipore-filtered water (ddH2O), Dulbecco’s phosphate

buffered saline (Invitrogen Corporation, Australia) or molecular biology-grade organic

solvents as required. Solutions were prepared in sterile plastic tubes or glass vessels that

had been previously washed several times with ddH2O and autoclaved (30min at 121˚C,

15 PSI). Solutions were sterilised, depending on the contents, by autoclaving (30min at

121˚C, 15 PSI) or using a 0.22µm Acrodisc (Pall Corp, Australia) syringe filter units.

Commonly used 1X Tris-EDTA (TE) and 50X Tris-Acetate-EDTA (TAE) buffers were

made up routinely for general laboratory use. To make a 100 mL solution of TE Buffer,

1 mL of 1 M Tris-HCl (pH 8.0) and 0.2 mL EDTA (0.5 M) was added together in a

glass beaker and made up to volume with double distilled water. pH was adjusted to 8.0

62

using 1 M HCl and 1 M NaOH. TAE buffer was prepared as a 50X stock solution by

dissolving 242 g Tris Base in sterile double distilled water, adding 57.1 mL Glacial

Acetic Acid, and 100 mL of 0.5M EDTA (pH 8.0) solution, and maded up to 1 L with

water. This stock solution was diluted 1/50 with water to make a 1X working solution.

Where necessary, reagents were weighed using a model 310 electronic balance (MG

Scientific, USA) and heated and/or stirred with a Thermolyne Cimarec 2 magnetic

stirrer (Thermo Fisher Scientific Inc, USA). The pH of solutions was measured using a

PerpHecT pH/temperature meter (Orion Research, USA).

General biochemicals and reagents utilised in the current study were sourced from

various distributers listed in detail in Table A.1 (Appendix – Supplementary Tables and

Figures), with a detailed list of the corresponding equipment used in this research

shown in Table A.2 (Appendix – Supplementary Tables and Figures).

2.2 Bacterial Methods

2.2.1 Bacterial culture medium

Microbiological reagents were obtained from Difco Laboratories (Becton, Dickinson

and Company, USA), BDH Laboratories (Merck, Germany) and Sigma Chemical

Company (Sigma, Australia). Bacterial cell culture media and ampicillin antibiotics

were prepared in accord with the methods described by Sambrook and Russell (2001).

All media and solutions were prepared to the prescribed volumes with double-deionised

water and balanced to pH 7 with 1 M HCl and 1 M NaOH. To facilitate positive

bacterial selection, the antibiotic ampicillin is added to bacterial culture media prior to

experimentation at a concentration of 100 µg/µL. Aliquots of ampicillin were stored at -

20oC for no more than 1 week.

63

1 L of LB (Luria-Bertaini) broth was prepared by dissolving 10 g Bacto Tryptone, 5 g

Yeast Extract and 10 g NaCl into sterile double distilled water and made up to 1 L with

water. pH was adjusted to 8.0 using 1 M HCl and 1 M NaOH, followed by autoclaving

at 121°C for 30 minutes. LB broth was supplemented with 100 mg/mL ampicillin

immediately prior to use. Luria-Bertani (LB) agar was prepared using the same

methodology as LB broth, with the addition of 15 g of Bacto Agar. Following

autoclaving, LB agar was cooled to ~60°C and supplemented with 100 mg/mL

ampicillin, and poured (~25 mL) into perti dishes and left to cool at room temperature

until set. Once the agar was set, plates were sealed in parafilm and stored at 4oC until

required, however, LB agar dishes were stored for no longer than two weeks. Prior to

use, plates were warmed in a 37oC incubator for 30 minutes and supplemented with 20

µL X-gal (Promega, USA) for subsequent selection of white recombinant colonies using

traditional blue-white colony screening (Section 2.2.4).

2.2.2 Bacterial culturing conditions

All recombinant cloning and bacterial procedures were carried out using the One Shot

MAX efficiency DH5α-T1R strain of Escherichia Coli (Life Technologies, Australia),

stored at -80oC, with the following genotype: F- φ80lacZ∆M15 ∆(lacZYA-argF) U169

deoR recA1 endA1 hsdR17 (rk-, mk+) phoA supE44 λ- thi-1 gyrA96 relA1 tonA

Bacterial cells were grown at 37oC with 5% CO2 in a CO2 Water Jacketed Incubator

(Forma Scientific, USA).

64

2.2.3 TOPO cloning in bacteria

The TOPO TA Cloning kit (Invitrogen Corporation, Australia) was used in this project

to incorporate Taq-polymerase-amplified PCR products into the pCR2.1-TOPO plasmid

vector and conducted according to the manufacturer’s instructions, with the following

modifications to the reaction set-up. In a 1.5 mL micro-centrifuge tube, 1 µL of pCR2.1

TOPO vector was added to 1 µL of ‘salt solution’ and 1-4 µL of fresh PCR product.

Reaction volume was made up to 6 µL with sterile double distilled water and mixed

gently by pipette. TOPO Reaction was left to incubate for five minutes at room

temperature, followed immediately by transformation (Section 2.2.4) or stored for later

use at -20°C. TOPO reactions were subsequently transformed into competent E.Coli

cells (refer to section 2.2.4) or stored at -80°C for use at a later date.

2.2.4 Transformation of Bacteria and blue-white colony selection

Bacterial transformations were performed using One Shot MAX Efficiency DH5α-T1R

Competent Cells (Invitrogen Corporation, Australia). In brief, 2 µL of pCR2.1-TOPO

plasmid vector containing sequence of interest was added to one 50 µL vial of

competent cells and stored on ice for 30 minutes. Cells were then heat shocked at 42°C

for 45 seconds, followed by two minutes on ice, and subsequently the addition of 500

µL of pre-warmed (37°C) LB broth (See Section 2.2.1). The transformation reaction

was then incubated in 37°C water bath for one hour, and then transferred onto LB agar

plates (See Section 2.2.1) and grown overnight at 37°C with 5% CO2. Blue-white

colony selection was used in this current study for the selection of recombinant bacterial

colonies and performed using the method as described by Sambrook and Russell (2001).

65

2.3 Eukaryotic Methods

2.3.1 Eukaryotic cells and culturing

All mammalian cell lines were cultured at 37°C with 5% CO2 in a humidified incubator

(Forma Scientific, USA). Human haematopoietic suspension cell lines including Ramos

(RA 1) were obtained from the American Tissue Culture Collection (ATCC, USA), and

Epstein Bar Virus (EBV) transformed lymphoblastoid cell lines were obtained from the

SAFHS cohort (Göring et al., 2007). SAFHS cell lines included 0054, 0194, 0306, 0525,

0029, 0188. Cells were propagated in RPMI 1640 cell culture media (Invitrogen

Corporation, Australia) supplemented with 10% (v/v) heat inactivated foetal bovine

serum (FBS; Invitrogen Corporation, Australia), antibiotics (100 U/mL penicillin and

100 µg/mL streptomycin, Invitrogen Corporation, Australia) and 4 mM L-glutamine

(Invitrogen Corporation, Australia). FBS was heat inactivated at 56˚C for three hours,

dispensed into 50 mL aliquots and stored at -20˚C until used. Suspension cell cultures

were grown in filter-top tissue culture flasks (Nalge Nunc International, USA) and

maintained between 0.5-1.5 x 106 viable cells/mL by subculturing every 2 days.

2.3.2 Eukaryotic cell counting with Trypan Blue staining

To calculate cell density, a 10 µL aliquot of cells were transferred to a 1.5 mL

Eppendorf tube and mixed with 20 µL of Trypan Blue Stain (Sigma, Australia). Stained

cells were loaded into a Neubauer Counting Chamber (ProSciTech, Australia) and

counted at 20x magnification under a Wild M40 light microscope (Wild Heerbrugg,

Gais, Switzerland). Cells appearing white in colour and uniformly spherical were

considered viable and were counted. Cells appearing dark blue in colour were deemed

non-viable and were not considered. Cell density was calculated by counting the

number viable cells from the four quadrants of the Counting Chamber and then dividing

66

by the number of quadrants considered (4), and then multiplying by the dilution factor

(3) to give cell density in terms of x 104 cells/mL.

2.3.3 Cyropreservation storage and revival of eukaryotic cell lines

All mammalian cell lines were maintained in storage by cryopreservation. For long-

term storage, cells were grown in complete media for approximately five days post-

revival from cryopreservation storage (or until cells are in mid-late log growth phase)

and were harvested through centrifugation at 1000 rpm (Jouan C312 swinging rotor

centrifuge) for five minutes at room temperature in 50 mL Falcon tubes (Sarstedt,

Germany). Following the removal of the supernatant, pellet cells were resuspended in

complete media to a density of 2-3 x 106 cells/mL. 700 µL Aliquots of resuspended

cells were added to 200 µL FBS and 100 µL Dimethyl sulfoxide (DMSO) and stored in

1.5 mL cryovials (Nalge Nunc International, USA) at -80°C for 1-2 days before being

transferred to liquid nitrogen for long term storage.

To revive the cells from cryopreservation, vials were initially thawed at 37°C and then

transferred to 9 mL complete-media in a 15 mL Falcon tube, and harvested by

centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at

room temperature. Following the removal of the supernatant, pelleted cells were

resuspended in 2-5 mL of fresh complete growth media and transferred to a sterile 25

cm2 filter-top culture flask (Stratagene, USA) and grown at 37°C with 5% CO2.

2.4 Oligonucleotides

All oligonucleotide primers were obtained from Sigma-Aldrich Pty Ltd, Castle Hill

NSW, Australia (Sigma, Australia) purified by desalting for use in Polymerase Chain

Reactions (PCR) (Section 2.8) and DNA sequencing (Section 3.9), whereas

67

oligonucleotides used for EMSA were obtained cleaned by High-performance liquid

chromatography. Oligonucleotides, supplied desiccated, were resuspended in an

appropriate volume of sterile water to a final concentration of 100 µM, vortexed, and

centrifuged. Aliquots of 10 µM were prepared and stored at -20oC until required.

Oligonucleotides for use in PCR or qPCR were designed in accord with the commonly

used guidelines outlined in Table 2.1. Oligonucleotides required for use in the

assessment of SNPs in EMSA (Section 2.10.2) were designed to span fifteen

nucleotides either side (both 5’ and 3’) of a SNP (total length of 31 bp including SNP)

with the forward (5’ to 3’) oligonucleotide of each pair possessing a 5’ biotin

modification. To simulate double stranded DNA (dsDNA), 10 µL of 100 µM forward

and reverse oligos in TE buffer (See Section 2.1) were added together into the same 1.5

mL Eppendorf tube and denatured at 95˚C for ten minutes. The tube was left to cool for

two hours at room temperature to facilitate ligation of the single stranded oligos to form

a double stranded nucleic acid complex.

68

Table 2.1: Primer design guidelines

Shown are the primer design parameters, derived from those specified by Sigma-Aldrich (http://www.sigmaaldrich.com/etc/medialib/docs/Sigma-Aldrich/General_Information/1/oligo_architect_glossary.Par.0001.File.tmp/oligo_architect_glossary.pdf), that were followed in this current research.

Characteristic Parameters

Length 20-25 bp

Secondary structure None to weak

Repetitive sequences None preferably, but no more than 3 identical nucleotides in a row.

GC cap (3’ G or C nucleotide, or ideally GC di-nucleotide)

Present

GC content 40-60%

Predicted primer dimer (using Sigma Genosys online primer design software)

None

Melting temperature ~65˚C, with a difference of less than 1˚C between forward and reverse primers

69

2.5 Isolation and purification of plasmid and eukaryotic DNA

2.5.1 Mini-prep purification of plasmid DNA

Plasmids carrying DNA sequence of interest were isolated and purified using the

QIAGEN Spin Mini-prep kit (QIAGEN, Australia) in accord with the manufacturer’s

instructions. Prior to purification, transformed bacteria were spread onto LB agar plates

(See Section 2.2.1) supplemented with 100 µg/mL ampicillin and incubated at 37˚C

overnight. The following day individual bacterial colonies were picked and inoculated

in 5 mL of LB broth (See Section 2.2.1) containing ampicillin (100 µg/mL) and

incubated overnight at 37˚C with vigorous shaking (200-250 rpm in the Jouan C312

swinging rotor centrifuge). Bacterial cells were then harvested by centrifugation at 3000

rpm (Jouan C312 swinging rotor centrifuge) for ten minutes. DNA was subsequently

isolated from the cell pellet according to protocol supplied with the QIAGEN Plasmid

Purification kits. The purified DNA was resuspended in 100 µL endotoxin-free Buffer

TE (QIAGEN, Australia) and stored at -20˚C.

2.5.2 Isolation and purification of DNA from eukaryotic cells

Eukaryotic cell lines carrying genomic DNA of interest was isolated and purified using

the QIAamp DNA blood mini kit (QIAGEN, Australia) in accord with the

manufacturer’s instructions. Prior to purification, cell lines of interest in culture media

were harvested by centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge)

for five minutes. Following the removal of the supernatant, genomic DNA was

extracted and purified using the QIAamp DNA blood mini kit (QIAGEN, Australia)

according to the manufacturer’s protocol outlined in the QIAamp DNA Mini and Blood

Mini Handbook that was supplied with the kit. The purified DNA was resuspended in

150 µL endotoxin-free Buffer TE (QIAGEN, Australia) and stored at -20˚C.

70

2.6 Analysis of nucleic acids

2.6.1 Nano-drop analysis of nucleic acids

To determine the concentration and relative purity of nucleic acid samples, 1 µL

aliquots were loaded onto a Nanodrop ND-100 spectrophotometer (NanoDrop, USA)

for analysis. Absorbance is measured at two wavelengths: 260 nm for nucleic acids, and

280 nm for peptide bonds. The 260 nm measurement was used to calculate the

concentration of the nucleic acid sample. The relative purity of the sample was

determined from the A260/280 ratio. Samples relatively free from protein contamination

return an A260/280 ratio between 1.6 and 2.0.

2.6.2 Gel Electrophoresis

1-2% (w/v) agarose (Promega Corporation, USA) gels containing 0.5 µg/mL ethidium

bromide (Sigma, Australia) were prepared using 1x TAE Buffer (See Section 2.1). DNA

samples were adjusted to approximately 18 µL with ddH2O and combined with 2 µL of

Gel Loading Dye Blue (6X) (New England Biolabs, USA) immediately prior to loading

the gel. DNA samples were run alongside maker standards such as the NEB PCR

Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) and the NEB 1 kb DNA ladder

(sizes: 500 bp, 1000 bp, 1500 bp, 2000 bp, 3000 bp, 4000 bp, 5-10 kb bands

indistinguishable) from New England Biolabs (New England Biolabs, USA).

Electrophoresis was performed using a BIO-RAD Electrophoresis tank (BIO-RAD,

USA) filled with TAE buffer to cover the gel and with a constant voltage of 90V for 30-

60 minutes. Following gel electrophoresis, separated DNA fragments were visualised

by fluorescence of ethidium bromide using the Spectroline UV trans-illuminator

(Spectronics, USA) and images captured using the FinePix S1 digital camera (Fujifilm

Corporation, Japan).

71

2.7 Polymerase Chain Reaction (PCR)

In general, conventional PCR amplifications were carried out using the PTC-

Thermocycler (MJ Research Inc) and Platinum Taq polymerase (Invitrogen

Corporation, Australia) according to the manufacturer’s instructions. Briefly, PCR was

performed in sterile 0.2 mL Eppendorf tubes with 50 µL reactions including 5 µL DNA

template (~50-100 ng), 5 µL 10x PCR reaction buffer, 2 µL MgCl2 (50 mM), 2 µL

dNTPs (dATP, dGTP, dCTP, and dTTP at 10 mM each, Roche Diagnostics) 2 µL

primer mix (10 µM each forward and reverse sequence specific primers), 0.5 µL

Platinum Taq polymerase and 33.5 µL sterile water. Typical thermo-cycling conditions

were as follows: initial denaturation of 95˚C for 5 minutes, followed by 10-40 cycles of

95˚C denaturation for 30 seconds, 58-68˚C (5°C less than primer melting temperature)

annealing for 30 seconds and 72˚C extension for 1 minute; followed by extension at

72˚C for 5 minutes; finally cooled to 4˚C until required.

2.8 Sanger DNA Sequencing

Samples to be sequenced by traditional Sanger-based chemistry

(http://www.agrf.org.au/index.php?id=72#Purified DNA Process) were sent to

Australian Genome Research Facility (AGRF) on Murray St, Perth, Western Australia.

For AGRF sequencing of purified plasmid DNA, 1 µg of DNA was adjusted to a

volume of 12 µL in sterile water and was added to 1 µL of 10 mM of the appropriate

primer in a 1.5 mL Eppendorf tube. Samples were then sent to AGRF for sequencing.

For those sequences integrated into the pCR2.1 TOPO vector, M13F primer (5´-

GTAAAACGACGGCCAG-3´) was used.

72

2.9 Experimental validation methods for SNP functionality

2.9.1 Chromatin Accessibility by Real-Time PCR (CHART-PCR)

The CHART-PCR protocol was originally developed by Frances Shannon and co-

worker at the Australian National University (Rao et al., 2001). In brief, CHART-PCR

consists of firstly the isolation of intact nuclei. Following isolation and washing, nuclei

were incubated with or without nuclease (micrococcal nuclease or deoxyribonuclease 1)

giving “undigested” and “digested” samples. Accessibility reactions were subsequently

terminated and genomic DNA recovered and prepared for analysis by quantitative real-

time PCR in order to quantify target regions. Comparing the quantity of targeted

genomic sequences in digested relative to undigested samples provides a measure of

chromatin accessibility and expressed as a percentage of nuclease cutting across the

region specified by the primers.

2.9.1.1 Nuclei isolation

Approximately 1x107 cultured cells were harvested by centrifugation in 50 mL Falcon

tubes at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at 4°C. The

cell pellet was washed once with ice-cold PBS before re-suspension in 1 mL ice-cold

Igepal Lysis buffer (10 mM Tris.HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.5%

Igepal CA-630, 0.15 mM spermine and 0.5 mM spermidine) and five minutes

incubation on ice. Nuclei were collected by centrifugation at 3000 rpm (Jouan C312

swinging rotor centrifuge) (4°C) for five minutes.

2.9.1.2 Nuclease digestion of chromatin

Isolated nuclei were washed with 1 mL of Digestion buffer without CaCl2 (10 mM

Tris.HCl [pH 7.4], 15 mM NaCl, 60 mM KCl, 0.15 mM spermine and 0.5 mM

spermidine) and centrifuged at 3000 rpm (Jouan C312 swinging rotor centrifuge) (4°C)

for five minutes. Washed nuclei were then resuspended in 350 µL of Digestion buffer

73

with CaCl2 (prepared by the addition of 25 µL of 1 M CaCl2 and 150 µL of 1 M MgCl2

to 25 mL of Digestion buffer without CaCl2). This cell suspension was divided into six

50 µL aliquots in sterile, labelled 1.5 mL Eppendorf tubes with each containing

approximately 1.25 x 106 nuclei. Triplicate samples were used for both MNase/DNase I

accessibility reactions (“digested”) and for controls incubated without nuclease

(“undigested”). 50 µL of pre-warmed (five minutes at 37°C) Digestion buffer (with

CaCl2) without nuclease was added to “undigested” samples, while 50 µL of pre-

warmed (five minutes at 37°C) Digestion buffer (with CaCl2 and MgCl2) supplemented

with 20 U/mL MNase (New England Biolabs, USA) or DNase I (Invitrogen

Corporation, Australia) was added to the “digested” samples. All samples were then

incubated in a 37°C water bath for ten minutes. To stop digestion, 30 µL stop solution

(3.3% SDS (w/v), 66 mM EDTA [pH 8.1], 6.6 mM EGTA [pH 8.1]) was added

followed by the addition of 20 µL of proteinase K (New England Biolabs, USA).

2.9.1.3 Collection of DNA fragments

Genomic DNA was isolated using the QIAamp DNA blood mini kit (QIAGEN,

Australia) using reagents and centrifuges at room temperature according to the

manufacturer’s instructions. DNA was recovered in 150 µL 1x TE buffer (See Section

2.1). Eluates were analysed using the Nanodrop ND-100 spectrophotometer (NanoDrop,

USA) routinely to ensure the intergrity of individual assays (Section 2.6.1).

2.9.1.4 qPCR and analysis

Quantitative real time PCR (qPCR) was performed using the SensiMix SYBR No-ROX

Kit (Bioline Pty Ltd, Australia) prepared in a 96-well plate (Bio-Rad Laboratories, USA)

using 5 µL DNA (approximately 100 ng), 2 µL of primer mix (forward and reverse

specific primers at 10 mM each), 10 µL of 2X SensiMix SYBR - No-ROX, and made up

74

to a final reaction volume of 20 µL with sterile ddH2O. Thermal cycling conditions for the

CFX96 Real-Time Detection System (Biorad Laboratories Inc., USA) were as follows:

95°C for 10 minutes; followed by 40 cycles of 95°C for 15 seconds, 64°C for 15 seconds

and 72°C for 15 seconds. Percentage chromatin accessibility was calculated as a ratio of

relative abundance of digested to undigested amplicon, with the measurement calculated

using the CFX Manager software (Biorad Laboratories Inc., USA). Statistics were

performed using GraphPad Prism v4 (GraphPad Software, USA).

2.9.2 Electrophoretic Mobility Shift Assay (EMSA)

The majority of EMSA protocols are based on those initially described by Garner and

Revzin (1981). In short, the EMSA approach consists of firstly the isolation of nuclear

protein extract from a cultured cell line of interest. Following isolation and washing,

allelic double-stranded oligonucleotides specifying a target genomic region were

incubated with or without nuclear protein extract. Binding reactions were terminated

and prepared for gel electrophoresis, and subsequently quantified using the

‘Chemilluminescent nucleic acid detection module’ (Thermo-Scientific, Australia) to

identify the presence of allele-specific DNA-protein binding complexes indicative of

functionality. For the functional assessment of genetic variants using EMSA, migratory

differences of allelic samples incubated in nuclear extract are compared to relative to a

control binding reaction absent of protein relative to determine the presence or absence

of allele-specific DNA-protein complexes.

2.9.2.1 Nuclear protein extraction

Nuclei were collected from 5x107 cells from culture cell lines by firstly harvesting by

centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge). Following the

removal of supernatant, the cell pellet was resuspended in 5 mL ice-cold PBS. Cells

were collected by centrifugation again, re-suspended in 1 mL of ice-cold PBS and

75

transferred to a clean 1.5 mL Eppendorf tube. Cells were centrifuged down at 14000

rpm (Jouan C312 swinging rotor centrifuge), 4C for one minute, with the subsequent

supernatant discarded. The cell pellet was resuspended in 200 µL ice cold Buffer A (10

mM HEPES [pH 7.9], 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, supplemented prior

to use with 2 µL of 0.5 mM PMSF and 1 µL of 1 µg/mL pepstatin for every 1 mL used)

using a 28 gauge needle through which cells pass 6-8 times on ice. The cell suspension

was then collected by centrifugation at 14000 rpm (Jouan C312 swinging rotor

centrifuge), 4˚C, for ten seconds, with the resulting supernatant discarded. The crude

nuclei were extracted by re-suspending the cell pellet in 30 µL ice cold Buffer C (20

mM HEPES [pH 7.9], 1.5 mM MgCl2, 420 mM KCl, 0.5 mM DTT, 0.2 mM EDTA [pH

8.0], 25% [v/v] Glycerol, supplemented with 2 µL of 0.5 mM PMSF, 1 µL of 1 µg/mL

pepstatin and 1 µL of 1ug/mL leupeptin prior to use for every 1 mL used), followed by

mixing and incubation on ice for 20 minutes. An equal volume of Buffer D (20 mM

HEPES [pH 7.9], 0.5 mM DTT, 0.2 mM EDTA [pH 8.0], 20% [v/v] Glycerol,

supplemented with 2 µL of 0.5 mM PMSF, 1 µL of 1 µg/mL pepstatin and 1 µL of 1

µg/mL leupeptin prior to use for every 1 mL used) was added to the nuclei and mixed.

Finally, the nuclei were collected by centrifugation at 14000 rpm (Jouan C312 swinging

rotor centrifuge), 4˚C, for ten minutes. The purified nuclei were subsequently divided

into 20 µL aliquots, snap frozen in liquid nitrogen and stored at -80˚C until used.

2.9.2.2 Binding reaction preparation

To prepare the DNA retardation gel (Invitrogen Corporation, Australia) for

electrophoresis, the gel wells were washed with 0.5X running buffer (diluted from 5X

stock of 54 g trimza base, 27.5 g boric acid, added to 20 mL 0.5 M EDTA, and made-up

to 1 L with sterile water) with a Pasteur pipette to remove bubbles and contaminants.

The gel was assembled into the electrophoresis tank (Invitrogen Corporation, Australia),

76

and filled with 0.5X running buffer. The gel was pre-run at 100V for 30 minutes.

During the incubation the DNA binding reaction was prepared in 1.5 mL Eppendorf

tubes on ice with nuclear extract (4 µL of 20% ficoll, 2 µL of 200 mM HEPES [pH 7.9],

1 µL of 20 mM EDTA [pH 8.0], 2 µL of 10 mM DTT, 1 µL of Poly dI:dC, 6 µL of

sterile water and 1.5 µg of nuclear extract) and without (4 µL of 20% ficoll, 2 µL of 200

mM HEPES [pH 7.9], 1 µL of 1 M KCl, 1 µL of 20 mM EDTA [pH 8.0], 2 µL of 10

mM DTT, 1 µL of Poly dI:dC and 6.5 µL of sterile water) followed by 10 minutes

incubation on ice. Following incubation, labeled sequence-specific oligonucleotides

were added to the binding reactions (2.5 µL in the nuclear protein reaction and 8 µL in

the reaction without extract) and left to incubate on ice for 30 minutes.

2.9.2.3 Gel electrophoresis and detection

Samples were mixed with 2 µL of bromophenol blue dye (Merck, Australia) and loaded

quickly onto the pre-run retardation gel, with an additional lane of dye to visually

monitor the migration progress of the samples. The gel was run for one hour at 100V.

Following the electrophoresis, the gel tank was reassembled with the addition of a nylon

membrane (Invitrogen Corporation, Australia) placed in direct contact with the DNA

retardation gel, and run at 30V for 30 minutes to facilitate the transfer of the DNA-

protein complexes formed in the binding reaction. The membrane was then removed

from the gel tank, covered with a sheet of plastic wrap, and exposed on a UV trans-

illuminator for ten minutes. The detection of immobilized nucleic acids was carried out

using the ‘Chemilluminescent nucleic acid detection module’ (Thermo-Scientific,

Australia) according to the manufacturer’s protocol. The membrane was then exposed

onto chemiluminescence film (GE Healthcare Limited, Australia) by firmly pressing the

membrane onto a sheet of film in a film cassette (Amersham, UK) for up to five

minutes, followed by the submersion of the exposed film in developer solution (diluted

77

1:5 with sterile water prior to use) (Kodak, Australia) until bands became visible. The

film was then immediately placed into fixer solution (diluted 1:5 with sterile water prior

to use) (Kodak, Australia) to terminate the reaction. Excess fixer solution was removed

by rinsing the film in water. All exposure steps took place in a dark room with minimal

illumination. Genetic variants associated with allele-specific DNA-protein complexes in

EMSA were considered to be functional.

2.9.3 Bisulfite Sequencing

The majority of current bisulfite sequencing methodologies are based on those initially

described by Frommer et al. (1992). In brief, bisulfite sequencing consists of firstly the

bisulfite treatment of genomic DNA isolated from a cultured cell line of interest using

the EpiTect Bisulfite Kit (QIAGEN, Australia). Following isolation and washing,

genomic DNA was enriched for a genomic region of interest using sequence-specific

primers in PCR. Bisulfite-enriched target regions were subsequently cloned into TOPO

vectors using the TOPO TA Cloning kit (Invitrogen Corporation, Australia), and

transformed into competent E.coli. Following blue-white colony selection, positive

recombinant colonies were picked and grown overnight at 37˚C in LB broth

supplemented with ampicillin and subsequently purified using mini-prep spin columns

(QIAGEN, Australia). Finally, purified DNA was sequenced and analysed using

bioinformatic tools (BISMA) to determine the presence of allele-specific DNA

methylation patterns. Genetic variants that were associated with allele-specific DNA

methylation following bisulfite sequencing were considered to be functional.

78

2.9.3.1 Bisulfite treatment

Genomic DNA was isolated from seven cultured human leukocyte cell lines from the

SAFHS cohort carrying different -587 genotypes (two -587G/G homozygotes, three -

587G/A heterozygotes and two -587A/A homozygotes) and purified using DNA Blood

mini columns (QIAGEN, Australia). Isolated DNA was treated with bisulfite to convert

unmethylated cytosine bases into uracils, and subsequently purified using the EpiTect

Bisulfite Kit (QIAGEN, Australia).

2.9.3.2 PCR amplification

The converted DNA was enriched for the VNN1 gene promoter by PCR using primers

(5’-CAGTGAGCCTGTACCCCTGG-3’ and 5’-TATTCTCCCCATGTGTGTGCC-3’)

flanking the predicted CpG island (between -758 to -496 upstream of the VNN1

transcriptional start site) and PCR amplified using 40 cycles with the protocol outlined

in Section 2.7.

2.9.3.3 Cloning and transformation

The amplification products were cloned (Section 2.2.3) into the pCR2.1-TOPO vector

(Invitrogen Corporation, Australia) and transformed (Section 2.2.4) into competent

E.coli (Invitrogen Corporation, Australia) for subsequent Sanger-based DNA

sequencing (Australian Genome Research Facility, Perth, Western Australia). The

transformation mix was plated onto LB-agar plates (Section 2.2.1) supplemented with

amplicillin and X-gal for blue-white colony selection (Section 2.2.4). White

recombinant colonies were picked the following day and grown overnight at 37˚C in LB

broth (Section 2.2.1) supplemented with ampicillin and subsequently purified (Section

2.5.1) using mini-prep spin columns (QIAGEN, Australia).

79

2.9.3.4 DNA sequencing and analysis

Isolated clones were then sent for Sanger-based sequencing (Australian Genome

Research Facility, Perth, Western Australia) using the endogenous M13F primer (5´-

GTAAAACGACGGCCAG-3´). Sequences were analysed using BISMA

(http://biochem.jacobs-university.de/BDPC/BISMA/manual_unique.php) to identify the

methylation states of predicted CG di-nucleotides in CpG islands. Allele-specific

differences in methylation patterns, localised near a particular SNP, suggest the

potential for differential gene expression and functionality. Further bioinformatic

verification was performed using AliBaba 2.1 (http://www.gene-

regulation.com/pub/programs/alibaba2/index.html).

2.10 Targeted-CHA-seq (T-CHA-seq) In the present study, two novel in vivo approaches (‘targeted’ CHA-seq [T-CHA-seq] and

CHA-seq) were developed in order to characterise allele-specific differences in chromatin

arrangement within the VNN1 gene. Firstly in T-CHA-seq, a DNase I-digested genomic

library was prepared from a -137T/G heterozygous cell line from the SAFHS cohort and

prepared using the NEBNext DNA Sample Prep kit (New England Biolabs, USA).

Following library preparation DNA fragments were enriched for the -137 promoter

region of the VNN1 gene using two rounds of sequence-specific primers in PCR. VNN1-

enriched fragments were subsequently cloned into TOPO vectors using the TOPO TA

Cloning kit (Invitrogen Corporation, Australia), and transformed into competent E.coli.

Following blue-white colony selection, positive recombinant colonies were picked and

grown overnight at 37˚C in LB broth supplemented with ampicillin and subsequently

purified using mini-prep spin columns (QIAGEN, Australia). This T-CHA-seq library

was subsequently sequenced using traditional Sanger sequencing in order to map nuclease

digestion sites around the -137 SNP. Finally, purified DNA was sequenced and analysed

using bioinformatic tools (BLAST) to align allelic sequence reads to the -137 promoter

80

region and determine the presence of an allele-specific bias (“allele-skew”) in sequence

reads as a direct reflection of localised chromatin accessibility and functionality. To

facilitate a direct comparison between different alleles at a given SNP, SAFHS cell lines

containing heterozygous SNPs were used and loci correlating with high allele-skews

were considered to be functional genetic variants.

2.10.1 T-CHA-seq library preparation

2.10.1.1 Isolation of nuclei from cell lines

Nuclei from 1 x 108 cells (~100 mL) of a eukaryotic cell line were harvested by

centrifugation at 1000 rpm (Jouan C312 swinging rotor centrifuge) for five minutes at

4OC (using two 50 mL Falcon tubes). Following the removal of supernatant, the cell

pellet was washed in 50 mL ice-cold PBS and collected again by centrifugation. The

pellet was subsequently resuspended in 4.5 mL Igepal Lysis buffer (10 mM Tris.HCl

[pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.5% Igepal CA-630, 0.15 mM spermine and 0.5

mM spermidine), and incubated on ice for five minutes while swirling occasionally.

Cells were transferred to six 1.5 mL Eppendorf tubes and centrifuged at 3000 rpm

(Jouan C312 swinging rotor centrifuge) for five minutes at 4ºC. Resulting supernatant

was then removed and the remaining crude nuclei were washed with 1 mL Digestion

buffer without CaCl2 (10 mM Tris.HCl [pH 7.4], 15 mM NaCl, 60 mM KCl, 0.15 mM

spermine and 0.5 mM spermidine), and centrifuged at 3000 rpm (Jouan C312 swinging

rotor centrifuge) for five minutes at 4OC. The nuclei were then resuspended in 300 µL

warmed Digestion buffer with CaCl2 (prepared by the addition of 25 µL of 1 M CaCl2

and 150 µL of 1 M MgCl2 to 25 mL of Digestion buffer without CaCl2).

81

2.10.1.2 Digesting DNA with chromatin accessibility agent

The nuclei from all six 1.5 mL Eppendorf tubes were transferred to a 50 mL Falcon tube

(~1.8 mL) and warmed at 37°C for two minutes. Following the incubation, 1.8 mL

DNase I (T-CHA-seq) (2 µL of 251 U/µL DNase I [Invitrogen Corporation, Australia]

was diluted into 51 µL Digestion buffer with CaCl2 [Section 2.9.1.2], with 36 µL of this

dilution added to 1.8 mL of Digestion buffer with CaCl2 to generate a 1X working

solution) dilution was added to the nuclei and digested for six minutes at 37°C. A 2 mL

aliquot of buffer AL (QiaAmp DNA blood mini kit, QIAGEN, Australia) was then

added to the digested nuclei and mixed by inversion. An aliquot of 2.2 mL PBS and 200

µL Qiagen Protease (QiaAmp DNA blood mini kit, QIAGEN, Australia) was combined

with digested nuclei. The samples were then transferred to four 1.5 mL Eppendorf tubes

and incubated at 55ºC overnight to terminate digestion. Digested nuclei were diluted

with 7 mL sterile water and 10 mL buffer AL, and were subsequently mixed by

inversion. Samples were purified to isolate digested DNA fragments using the QiaAmp

DNA blood mini kit (QIAGEN, Australia) according to the manufacturer’s protocol.

2.10.1.3 NEBNext DNA Sample Prep kit processing

A T-CHA-seq library of compatible DNA fragments was prepared for deep sequencing

using the NEBNext DNA Sample Prep kit (New England Biolabs, USA) in accord with

the manufacturer’s instructions provided with the kit. The NEBNext DNA Sample Prep

kit (New England Biolabs, USA) consists of several workflow steps including end-

repair and adapter sequence annealing and adapter-specific amplification using PCR.

Size selection was required to generate a library with an average size of 330 bp for

subsequent deep sequencing using the Illumina/Solexa Genome Analyser II (Illumina,

USA). Consequently, the prepared T-CHA-seq library was sized using Agarose Gel

82

Electrophoresis (Section 2.6.2) by relative comparison to a control (100 bp DNA ladder,

New England Biolabs, USA) on a 2% agarose gel (and run for 30 min at 90V). Gel

pieces containing library fragments of sizes between 150-400 bp (mono- and di-

nucleosomal sizes) were cut out while visualising with a dark reader - blue light

transilluminator (Clare Chemical Research, USA). Library fragments were purified

using the QIAquick Gel Extraction kit (QIAGEN, Australia).

2.10.2 PCR amplification

Two rounds of PCR amplification (Section 2.7) using nested sets of sequence-specific

primers (Table 2.2) were used to increase target yield of the -137 VNN1 locus from both

the forward (5’ to 3’) and reverse (3’ to 5’) direction using the protocol outlined in

Section 2.7, with 20 cycles for each round. In each round of amplification, one of the

target specific primers was paired with an adapter-specific primer to ensure only library

fragments are amplified. After each round of amplification, products were purified

using the QIAquick PCR purification Kit (QIAGEN, Australia) to ensure no primer

sequence were present in subsequent rounds of PCR. Following two rounds of

amplification for each of the ‘forward’ and ‘reverse’ libraries consisting of enriched -

137 VNN1 sequence, libraries were combined to ensure the entire -137 promoter region

was mapped for DNase I cut sites.

83

Table 2.2: PCR primers used in T-CHA-seq. Two rounds of PCR amplification were performed using nested sets of sequence-specific primers of the sequence shown (“DNA sequence”), in order to increase target yield of the -137 VNN1 locus from both the forward (5’ to 3’) and reverse (3’ to 5’) direction. In each round of amplification, one corresponding VNN1-specific primer was paired with an adapter-specific primer to ensure only library fragments are amplified. After each round of amplification, products are purified using the QIAquick PCR purification Kit (QIAGEN, Australia) to ensure no primer sequence is present in subsequent rounds of PCR.

Primer

orientation

Round of

nested PCR

DNA sequence (5’ to 3’)

Novel forward primers

First round 5’-GCAGCCCTAGCGAACTAATG-3’

Second round

5’-TCTTGAGTTAATCTTCACAAATTACTGC-3’

Novel reverse primers

First round 5’-GTCCAGAACTTAGTACCTTAAACTACGGAAG 3’

Second round

5’-CGCTATGATTTCATAGCATTGTTACATAAC 3’

Adapter-specific primer

First and second round

5’-ACTCTTTCCCTACACGACGC-3’

84

2.10.3 TOPO cloning and transformation

The PCR-enriched amplification products were cloned (Section 2.2.3) into the pCR2.1-

TOPO vector (Invitrogen Corporation, Australia) and transformed (Section 3.2.4) into

One-Shot chemically competent E.coli (Invitrogen Corporation, Australia). The

transformation mix was plated onto LB-agar plates (Section 2.2.1) supplemented with

amplicillin and X-gal for blue-white colony selection (Section 2.2.4). Positive white

recombinant colonies were picked the following day and grown overnight at 37˚C in LB

broth (Section 2.2.1) supplemented with ampicillin and subsequently purified (Section

2.5.1) using mini-prep spin columns (QIAGEN, Australia).

2.10.4 Sanger DNA sequencing

Prepared T-CHA-seq library fragments were sequenced by traditional Sanger-based

chemistry (http://www.agrf.org.au/index.php?id=72#Purified DNA Process) at the

Australian Genome Research Facility (AGRF) on Murray St, Perth, Western Australia

using 1 µg of DNA in 12 µL and 1 µL of 10 mM M13F primer (5´-

GTAAAACGACGGCCAG-3´).

2.10.5 Analysis and bioinformatics

In order to map DNase I digestion sites around the -137 SNP and determine the presence or

absence of an allele-specific bias in read number (or ‘allele-skew’), T-CHA-seq

sequence reads were mapped to the -137 promoter region using bioinformatic alignment

tools such as BLAST, which are available online at the National Center for

Biotechnology Information (NCBI) website (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

Significant allele-skews were considered to be functional.

85

2.11 Extended CHA-seq (CHA-seq) A MNase-digested genomic library was prepared from a -587G/A heterozygous cell line

from the SAFHS cohort and subjected to high-throughput Illumina sequencing in order to

map allelic-nuclease digestion sites across the entire VNN1 gene. This CHA-seq approach

was developed as a refinement in methodology over the targeted T-CHA-seq approach,

in order to screen larger genomic regions for functional genetic variants. As in T-CHA-

seq, significant allele-skews were taken as a direct reflection of allele-specific nuclease

accessibility and functionality at heterozygous loci.

In summary, CHA-seq consisted of firstly the isolation of intact chromatin from a -

587G/A heterozygous cell line from the SAFHS of interest, followed by the nuclease

digestion with micrococcal nuclease and preparation of a DNA fragment library

compatible for next-gen sequencing using the NEBNext DNA Sample Prep kit (New

England Biolabs, USA). Following library preparation DNA fragments were enriched

for the Vanin gene family using sequence-specific biotinylated probes specific for the

SureSelect DNA Target Enrichment system. Bead-enriched fragments were

subsequently processed for cluster generation on the Illumina cluster station (Agilent,

USA) and sequenced using Illumina next-gen sequencing technology on the Illumina

Genome Analyzer II platform (Agilent, USA) in accord with the manufacturer’s

instructions. Finally, sequence reads were analysed using the Integrated Genome

Viewer (IGV) 2.1 (Broad Institute) software and aligned to the hg19 reference

sequence. Raw sequence data was used to interpret chromatin accessibility by

comparing numbers of allelic sequence reads at heterozygous SNPs. Genetic variants

with significant allele-specific bias in sequence read number (‘allele-skew”) were

considered to be functional.

86

2.11.1 Library preparation

CHA-seq library preparation was equivalent to that of T-CHA-seq, as described in

Section 2.10.1, with the only modification being the substitution of MNase for DNase I

in order to map nucleosome positions, rather than mapping transcription factor binding

sites.

2.11.2 Bead selection

In order to increase sequencing coverage of the Vanin gene family, a bead selection step

was included in the CHA-seq methodology. Following preparation of the MNase-

digested library using NEBNext DNA Sample Prep kit (New England Biolabs, USA),

target sequences containing the entire Vanin gene family were supplied to Agilent to

synthesise pools of biotinylated probes specific for the SureSelect DNA Target

Enrichment system. Libraries were purified using a bead-based selection protocol with

the SureSelect Target Enrichment System - Illumina Single-End Sequencing Platform

Library Prep Version 1.2 (Agilent, USA) according to the manufacturer’s instructions.

2.11.3 Cluster generation and deep sequencing

Prepared libraries were processed for cluster generation on the Illumina cluster station

(Agilent, USA) and subsequently sequenced using Illumina next-gen sequencing

technology on the Illumina Genome Analyzer II platform (Agilent, USA) in accord with

the manufacturer’s instructions.

2.11.4 Analysis with IGV 2.1

DNA sequence reads were then analysed using the Integrated Genome Viewer (IGV)

2.1 (Broad Institute) software and aligned to the hg19 reference sequence. The IGV

software allowed the visualisation of base sequence, read lengths and polymorphisms.

87

Raw sequence data was used to interpret chromatin accessibility by comparing numbers

of allelic sequence reads at heterozygous SNPs. Genetic variants with significant allele-

skews were considered to be functional.

2.12 Statistical Analyses

Statistical analyses (t-test, ANOVA, distribution test) using Graphpad Prism 5 software

(http://www.graphpad.com/prism/prism.htm) were used to determine statistical

significance in mean chromatin accessibilities between sample sets for each target locus

in CHART-PCR. For this research, CHART-PCR data was obtained from a minimum

of three independent experiments performed in triplicate. Statistical significance (*) as

shown was determined using student’s unpaired t-test using a confidence interval of

95% (P<0.05).

Bioinformatic analysis was performed by submitting a sequence of interest to publicly

available online databases which match to large collections of documented regulators,

enabling the prediction of transcription factor binding sites occurring in the submitted

sequence. Based on the predictions from multiple databases, a model can be developed

as to the role the submitted sequence has in a particular function in a biochemical

pathway. For example, EMSA (section 2.10.2) was used to validate allele-specific

DNA-protein binding differences predicted from online databases such as AliBaba2.1

(Grabe, 2002), and P-Match (Chekmenev et al., 2005).

88

Chapter 3: In vivo assessment of VNN1 promoter SNPs using two chromatin-sensitive nucleases in CHART-PCR

3.1 Introduction

Chromatin arrangement is central to the regulation of cellular signals, controlling access

of transcription factors to the DNA, and consequently, gene expression (Ernst et al.,

2011). Furthermore, genetic variants may influence chromatin arrangement and

accessibility within genes (Mikkelson et al., 2007; Birney et al., 2010), potentially

affecting gene expression and phenotype (Pang et al., 2013). CHART-PCR was initially

designed to characterise the chromatin arrangement and in vivo transcription factor

binding differences within the IL-2 proximal gene promoter (Rao et al., 2001). Since

2001, the CHART-PCR methodology has been refined to enable cell-type and allele-

specific comparisons of chromatin accessibility associated with genetic variants, as

demonstrated for the CR2 gene (Cruikshank et al., 2012).

In CHART-PCR, chromatin accessibility can be assessed using nucleases which digest

DNA depending on the chromatin arrangement, including restriction enzymes Dra I and

HinfI (Rao et al., 2001; Bulanenkova et al., 2011). A key assumption of CHART-PCR

is that the degree of chromatin accessibility in a particular genomic region is directly

related to the level of physical protection to nuclease digestion conferred by the

occupancy of histones and bound proteins (Bell et al., 2011; An et al., 1998; Grayling et

al., 1997). Micrococcal Nuclease (MNase) and Deoxyribonuclease I (DNase I) are

commonly used nucleases in chromatin studies. They differ in terms of recognition sites

and the mode of nucleolytic digestion (Xiao et al., 2007). Complementary structural

chromatin information can be derived from the results of nuclease digestion including

89

the positions of nucleosomes and hypersensitive sites (Sajan et al., 2012; Mannherz et

al., 1995; Smith et al., 1983; Heins et al., 1967).

DNase I is an endonuclease which preferentially cleaves DNA at phosphodiester bonds

adjacent to a pyrimidine nucleotide, producing 5'-phosphate-terminated

oligonucleotides and 3’-hydroxylated ends (Liu, Z et al., 2012). DNase I footprinting is

an approach commonly used to map the chromatin arrangement of gene promoter

regions, potential transcription factor binding sites and areas of hypersensitivity within

the genome (Boyle et al., 2011; Drouin et al., 1997; Decker et al., 2009). The

integration of a deep sequencing approach has extended DNase I hypersensitive site and

chromatin mapping to a genome-wide scale (‘DNase-seq’) (John et al., 2011; Thurman

et al., 2012).

MNase (or S7 nuclease) is an endo-exonuclease, typically isolated from Staphylococcus

aureus, that preferentially digests naked DNA and linker DNA between adjacent

nucleosomes, thereby enriching for nucleosome-associated DNA which is partially or

completely protected from MNase cutting (Chung et al., 2010; Axel, 1975; Rizzo et al.,

2012). The occupancy and position of nucleosomes can therefore be assayed using

MNase (Rizzo et al., 2012; Li, Z et al., 2011; Yigit et al., 2013). Typically, the

quantification of MNase-digested DNA fragments has been performed using indirect-

end-label analysis (Kent and Mellor, 1995; Wu and Winston, 1997), or by qPCR as in

CHART-PCR (Cruikshank et al., 2008; Cruikshank et al., 2012). More recently, MNase

nucleosome mapping has shifted away from the quantification of reduced cutting

frequency, to a genome-wide approach that targets the abundance of recovered

nucleosome-sized DNA fragments by microarrays (Lantermann et al., 2010) or high-

90

throughput sequencing technologies (‘MNase-seq’) (Wei et al., 2012; Gaffney et al.

2012, Li et al., 2012).

To interpret the level of nuclease digestion in CHART-PCR, control genomic regions

are fundamental to establishing benchmark chromatin accessibility in order to compare

samples between different cell lines (Rao et al., 2001; Cruikshank et al., 2012). For

example, Surfactant protein 2 (SPA-2) and Beta-actin (β-actin) have been used in

CHART-PCR as chromatin accessibility controls to characterise the chromatin state of

the human CR2 gene with MNase in a number of immortalised human cell lines (U937,

REH, Ramos and Raji) (Cruikshank et al., 2008). Cruikshank et al. (2008) determined

an SPA-2 regulatory region to be representative of transcriptionally silent

heterochromatin, as this region has not been found to be expressed in hematopoietic

cells (Li et al., 1998). Alternatively, β-actin has been demonstrated to have

constitutively high gene expression in most eukaryotic non-muscle cells and therefore is

commonly used as an internal control for the quantification of mRNA levels (Zhang et

al., 2005; Danilition et al., 1991). For example, Cruikshank et al. (2012) used sequence

near the transcriptional start site of the β-actin gene as a control in order to characterise

the chromatin accessibility of a functional SNP (rs3813946) in the CR2 gene using

CHART-PCR (Cruikshank et al., 2012). Consequently, SPA-2 and β-actin were used as

control genomic regions in the current study to interpret the relative level of chromatin

accessibility in CHART-PCR and to facilitate comparisons between multiple cell lines.

In this chapter, the relative level of chromatin accessibility was elucidated at candidate

functional SNPs in the VNN1 gene model. The degree of chromatin accessibility in

DNA heterozygous for both -137 (rs4897612) and -587 (rs2050153) promoter variants,

which are 450bp apart, was assessed by comparing digestion patterns to SPA-2 and β-

91

actin controls. Furthermore, both nucleases were used to assess the VNN1 promoter for

allele-specific chromatin accessibility by examining the level of DNase I

hypersensitivity sites and the degree of nucleosome protection from MNase in cell lines

homozygous for both the -137 and -587 SNPs.

3.2 Methods

A detailed description of the CHART-PCR methodology is shown in Section 2.9.1

(Chapter 2), with a brief summary included below.

Nuclei were isolated and digested with 20 U/mL nuclease using approximately 1 x 107

genotyped Ramos (ATCC, USA) and SAFHS cohort cells. Nuclease-digested DNA

fragments were purified using the QIAamp DNA blood mini kit (QIAGEN, Australia),

followed by quantitative real time PCR using approximately 100 ng DNA, 0.5µM of

primers (See Table 3.1) and SensiMix SYBR No Rox (Bioline Pty Ltd, Australia) in a

final reaction volume of 20 µL. Thermal cycling conditions for the CFX96 Real-Time

Detection System (Biorad Laboratories Inc., USA) were as follows: 95°C for 10 min;

followed by 40 cycles of 95°C for 15 seconds, 64°C for 15 seconds and 72°C for 15

seconds. Percentage chromatin accessibility was determined as a ratio of relative

abundance of digested to undigested amplicon, calculated using the CFX Manager

software (Biorad Laboratories Inc., USA). Statistics were performed using GraphPad

Prism v4 (GraphPad Software, USA).

92

Table 3.1: CHART-PCR primers

Sequence-specific forward and reverse (FWD and REV) primers were used for quantitative real-time PCR analysis of the -587 and -137 VNN1 SNPs and the SPA-2 and β-actin promoter sequences. Specific sequences are shown (“Primer sequence”) and were derived from published reports of promoter sequences regulating SPA-2 (Cruikshank et al., 2008; Li et al., 1998) and β-actin (Cruikshank et al., 2008; Danilition et al., 1991) or designed based on the sequence surrounding the VNN1 SNPs. Forward (FWD) and reverse (REV) primers are indicated by suffix. Pairs used in qPCR assays are grouped together and share the same prefix along with the predicted amplicon size (“Expected amplicon size”).

Primer designation Primer sequence (5’ to 3’) Expected

amplicon size (bp)

-587 CHART-PCR FWD GAGAATCACATGAACCCGAGAGG 129

-587 CHART-PCR REV GTGAGGACACCACTCTAATCTTTAACACAG

-137 CHART-PCR FWD CCTAGCGAA CTAATGTACATAGAGTTCTTGAG 104

-137 CHART-PCR REV AACGCTATGATTTCATAGCATTGTTACATAAC

SPA-2 CHART-PCR FWD

CTAAGTATTCCTCCAGCCTGAGTGTTC 145

SPA-2 CHART-PCR REV

GGTGGACAACAGCATTTATAGCATG

β-actin CHART-PCR FWD

CTGCACTGTGCGGCGAAGCCGGTG 177

β-actin CHART-PCR REV

CGGACGCGGTCTCGGCGGTGGTG

93

3.3 Results

3.3.1 Assessment of relative chromatin accessibility within the VNN1 gene promoter using DNase I

3.3.1.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using DNase I

To determine the relative level of DNase I digestion within the -137 and -587 VNN1

promoter regions, the CHART-PCR methodology was applied on DNase I-digested

chromatin from heterozygous -137T/G and -587G/A cell lines and compared to control

sequences near the transcriptional start site of SPA-2 and β-actin. The results showed

that the -137 SNP had an equivalent DNase I cutting (23.0%) to that of β-actin (20.5%),

indicating high chromatin accessibility (Figure 3.1). In contrast, the level of DNase I

cutting observed at the -587 locus was 17.1%, implying a moderate-to-high accessibility

by comparison to controls (β-actin = 20.5% and SPA-2 = 7.5%). These results provide

evidence that the chromatin states of the -137 and -587 promoter regions are

transcriptionally active, at a level equivalent to that of the β-actin gene promoter.

94

-137

T/G

-587

G/A

SPA-2

B-A

ctin

0

5

10

15

20

25

30

35

*P=0.0048

*P=0.005

*P=0.024

*P=0.010

Genotype

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

Figure 3.1: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs

relative to SPA-2 and β-actin control regions. Chromatin accessibility at each VNN1 SNP, expressed as a ratio of undigested to DNase I-digested chromatin (% cutting). The data were obtained from the Ramos cell line which is heterozygous at each SNP (genotype) and assayed in triplicate from independent cultures (n=3). DNase I cutting values for each replicate are shown in full in Table A.4 (Appendix – Supplementary Tables and Figures). Significant difference between genotypes (P<0.05) was determined by an unpaired Student’s t-test and indicated by an asterisk (*). Error bars represent standard error of the mean across replicates of each sample.

95

3.3.1.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using DNase I

To evaluate allele-specific chromatin accessibility, DNase I was used to characterise

chromatin derived from cell lines, homozygous at the -137 (-137T/T and -137G/G) and

-587 (-587G/G and -587A/A) loci. The CHART-PCR results showed significant

(P<0.05) allele-specific DNase I digestion at both loci (Figure 3.2). The -137T/T

homozygote (30.4%) had a 1.2-fold higher degree of DNase I cutting than the -137G/G

homozygote (25.5%). Similarly, the -587A/A homozygote (22.6%) possessed a 1.2-fold

higher degree of DNase I cutting than -587G/G (18.2%). On average, 1.3-fold higher

frequency of DNase I cutting was observed in the -137 homozygotes compared to -587

homozygotes (Figure 3.2), which supports the results from chromatin comparison in

heterozygous cell lines (Figure 3.1), which demonstrated the -137 locus has a higher

level of chromatin accessibility and transcriptional potential

96

-137

T/T

-137

G/G

-587

G/G

-587

A/A

0

5

10

15

20

25

30

35 *P=0.003

*P=0.001

*P=0.004

*P=0.003

*P=0.047

Genotype

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

Figure 3.2: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs. Chromatin accessibility at each VNN1 SNP, expressed as a ratio of undigested to DNase I-digested chromatin (% cutting). The data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in triplicate from independent cultures (n=3). DNase I cutting values for each replicate are shown in full in Table A.5 (Appendix – Supplementary Tables and Figures). Significant difference between genotypes (P<0.05) was determined by an unpaired Student’s t-test and indicated by an asterisk (*). Error bars represent standard error of the mean across replicates of each sample.

97

3.3.2 Assessment of relative chromatin accessibility within the VNN1 gene promoter using MNase

3.3.2.1 Chromatin accessibility of VNN1 SNPs relative to SPA-2 and β-actin control loci using MNase

MNase was used to evaluate the relative level of chromatin accessibility in the VNN1

promoter region surrounding the heterozygous -137T/G and -587G/A loci by relative

comparison to SPA-2 and β-actin controls as per the CHART-PCR method. A mean

MNase cutting frequency of 30.8% was observed at the -137 locus, 25.9% at -587,

20.4% at SPA-2 and 29.9% for the β-actin locus (Figure 3.3). These results are

suggestive that the -587 SNP had low-to-moderate chromatin accessibility, whilst the -

137 SNP had moderate-high chromatin accessibility, similar to that of the β-actin

control. The observed levels of chromatin accessibility at this locus using MNase

support the data obtained using DNase I (Figure 3.1). Results obtained using MNase

and DNase I (Figure 3.3) provide evidence that the -137 and -587 promoter reigons are

transcriptionally active regions with moderate-high chromatin accessibility

98

-137

T/G

-587

G/A

SPA-2

B-A

ctin

0

5

10

15

20

25

30

35

*P=0.027

*P=0.008

*P=0.027

*P=0.015

*P=0.007

Genotype

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

Figure 3.3: Chromatin accessibility differences at the -137 and -587 VNN1 SNPs

relative to SPA-2 and β-actin control regions. MNase-treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase digested DNA (% cutting). The data were obtained from the Ramos cell line and assayed in triplicate from independent cultures (n=3). MNase cutting values for each replicate are shown in full in Table A.6 (Appendix – Supplementary Tables and Figures). Significance (P<0.05) was determined between genotypes by an unpaired Student’s t-test and shown by an asterix (*). Error bars show the standard error of the mean between replicates of each sample.

99

3.3.2.2 Assessment of allele-specific chromatin accessibility at VNN1 SNPs using MNase

To differentiate allele-specific chromatin accessibility differences at SNPs within the

VNN1 gene promoter, the chromatin accessibility of DNA from cells homozygous and

heterozygous for the -137 and -587 SNPs were compared using CHART-PCR (MNase

digestion). The results showed a MNase cutting percentage in -137G/G homozygotes of

13.6% (cell line 0306) and 13.6% (cell line 0525), which was significantly lower than

that of the -137T/T homozygotes with 24.8% (cell line 0188) and 29.1% (cell line 0194)

(Figure 3.4A). The -137T/G heterozygotes had a MNase cutting level intermediate to

that of the -137 homozygotes with 18.8% (cell line 0029) and 16.6% (cell line 0054)

(Figure 3.4A). The average nuclease cutting for each genotype were 13.6% for the -

137G/G homozygotes, 26.9% for the -137T/T homozygotes, and 17.7% for the

heterozygotes (Figure 3.4B). The -137T/T homozygote had a 2-fold higher nuclease

cutting level than that of -137G/G, indicating the presence of allele-specific chromatin

accessibility at the -137 locus.

At the -587 locus, the -587A/A homozygotes had a MNase cutting level of 26.6% (cell

line 0306) and 27.6% (cell line 0525), which was higher than that of the -587G/G

homozygotes with 7.2% (cell line 0188) and 13.9% (cell line 0194) (Figure 3.5A). The

degree of MNase cutting in -587G/A heterozygotes was intermediate compared to the -

587 homozygotes with 15.0% (cell line 0029) and 16.3% (cell line 0054) (Figure 3.5A).

The average MNase cutting of the genotypes were 10.6% for the -587G homozygotes,

27.1% for the -587A homozygotes, and 15.6% for the heterozygotes (Figure 3.5B). The

-587A homozygote had a 2.6-fold higher degree of MNase cutting than that of -587G/G,

indicating the presence of allele-specific chromatin accessibility at the -587 locus.

100

A

0306

(G/G

)

0525

(G/G

)

0029

(T/G

)

0054

(T/G

)

0188

(T/T

)

0194

(T/T

)

0

5

10

15

20

25

30

35

SAFHS cohort ID (-137 genotype)

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

B

-137

G/G

-137

T/G

-137

T/T

0

5

10

15

20

25

30

35

*P=0.0179

*P<0.0001

*P<0.0003

Genotype

Ch

rom

ati

n A

ccessib

ilit

y(%

cu

ttin

g)

Figure 3.4: Chromatin accessibility differences at the -137 VNN1 SNP. MNase-

treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested DNA (% cutting). Figure 3.4A shows the data obtained from cultured cell lines derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater from independent cultures (n≥4). MNase cutting values for each replicate are shown in full in Table A.7 (Appendix – Supplementary Tables and Figures). Averages of the genotypes are shown in Figure 3.4B. Significance (P<0.05) between means was established by One-way ANOVA, with Turkey's Multiple Comparison Test used to determine significance between cell lines (genotypes) as shown in Figure 3.4B as an asterix (*). Error bars show the standard error of the mean between replicates of each sample.

101

A

0306

(A/A

)

0525

(A/A

)

0029

(G/A

)

0054

(G/A

)

0188

(G/G

)

0194

(G/G

)

0

5

10

15

20

25

30

35

SAFHS cohort ID (-587 genotype)

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

B

-587

A/A

-587

G/A

-587

G/G

0

5

10

15

20

25

30

35*P<0.0001

*P<0.0001

*P=0.0048

Genotype

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

Figure 3.5: Chromatin accessibility differences at the -587 VNN1 SNP. MNase-treated cell lines of the genotypes shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase digested DNA (% cutting). Figure 3.5A shows the data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in quadruplicate or greater from independent cultures (n≥4). MNase cutting values for each replicate are shown in full in Table A.8 (Appendix – Supplementary Tables and Figures). Averages of the genotypes are shown in Figure 3.5B. Significance (P<0.05) between means was established by One-way ANOVA, with Turkey's Multiple Comparison Test used to determine significance between cell lines (genotypes) as shown in Figure 3.4B as an asterix (*). Error bars show the standard error of the mean between replicates of each sample.

102

3.4 Discussion

CHART-PCR results in the present study are suggestive that the -137 and -587 SNPs

have an allele-specific effect on chromatin accessibility in VNN1, which supports the

findings of Göring et al. (2007) which identified these SNPs to be candidate functional

genetic variants. The overall level of nucleosome protection within the VNN1 gene

promoter localised to the -137 (30.8% for -137T/G) and -587 (25.9% for -587G/A) loci,

was found to be low relative to control genomic regions (20.4% for SPA-2 and 29.9%

for β-actin). This observation supports research that indicates that the VNN1 gene is

highly expressed within leukocytes (Galland et al., 1998; Jansen et al. 2009; Yanai et

al., 2005). Furthermore, the level of DNA-binding protein occupancy at the -137 and -

587 VNN1 SNPs, as a measure of DNase I hypersensitivity using CHART-PCR, was

shown to be approximately equivalent (23.0% for -137T/G, and 17.1% for -587G/A) to

that of the transcriptionally-active β-actin locus (20.5%). Consequently, chromatin

within the VNN1 promoter is likely to confer accessibility to the binding of transcription

factors and other potential regulators. In fact, allele-specific DNA-protein binding

complexes have been experimentally demonstrated at the -137 VNN1 SNP using

EMSA, with Sp1 being the likely candidate in the presence of the T allele (Göring et al.,

2007). Sp1 has been associated with high gene expression, and contains a zinc finger

protein motif which directly binds DNA and enhances transcription (Jiang et al., 2005).

Since high chromatin accessibility is observed in the presence of the -137T allele,

increased VNN1 expression due to greater Sp1 accessibility would be expected.

Alternatively, the 2-fold difference in the level of nucleosome protection in the presence

of the -137 alleles may also be sufficient to inhibit the binding of the Sp1 transcription

factor reported by Göring et al. (2007). Consequently, the allele-specific effects in

chromatin arrangement at the -137 and -587 variants are likely to influence VNN1 gene

expression.

103

At the -587 VNN1 SNP, the CHART-PCR results showed allele-specific chromatin

accessibility with DNase I (1.2-fold higher at -587A/A) and MNase (2.6-fold higher at -

587A/A), demonstrating that this genetic variant has a functional role at the chromatin

level. While Göring et al. (2007) identified the -587 SNP to be a functional candidate

based on statistical approaches, no allele-specific DNA-protein binding complexes were

found with EMSA (Göring et al., 2007). Therefore, both findings indicate the functional

mechanism at the -587 SNP is not due to allele-specific chromatin accessibility

contributing to differential transcription factor binding, as seen at the -137 SNP. In

Chapter 5 of the present study, the VNN1 gene promoter was assessed for allele-specific

DNA methylation associated with the -587 SNP using bisulfite sequencing.

Two control genomic regions, namely SPA-2 and β-actin, were assessed using CHART-

PCR to demonstrate the relative level of chromatin accessibility to DNase I and MNase

within the VNN1 gene promoter. SPA-2 and β-actin were selected to represent

transcriptionally-silent heterochromatin and transcriptionally-active euchromatin

controls respectively (Li et al., 1998; Zhang et al., 2005; Danilition et al., 1991;

Cruikshank et al., 2012). Importantly, the chromatin accessibilities of the two control

regions were found to be significantly different using both DNase I and MNase, which

matched the findings of Cruikshank et al (2008). For instance, the present study

reported 1.5-fold higher MNase digestion at the β-actin locus relative to SPA-2 in

Ramos cells, with Cruikshank et al (2008) reporting a similar fold difference (2.0) using

an equivalent methodology in Ramos cells. Therefore, both SPA-2 and β-actin are

effective as control genomic regions for chromatin accessibility within VNN1.

104

While the mode of nucleolytic digestion and the recognition sites differed for DNase I

and MNase, both nucleases were able to identify allele-specific chromatin accessibility

at the -137 and -587 VNN1 SNPs. Functionality, in terms of allele-specific chromatin

accessibility, was assigned to SNPs that showed a significant difference (P<0.05)

between alleles as implemented by Cruikshank et al. (2012) at the rs3813946 SNP in

the CR2 gene. The approach used in the present study, extended upon that of

Cruikshank et al. (2012) to include DNase I, which gave equivalent results to that of

MNase. While the digestion level of MNase and DNase I differed acrossed cell lines,

both nucleases corresponded with equivalent fold differences when comparing the

heterozygous -137T/G and -587G/A (1.3-fold with DNase I and 1.2-fold with MNase)

in Ramos cells. Therefore, the use both DNase I and MNase was adequate for the

identification of functional genetic variants based on chromatin accessibility and

potentially could be applied to other nucleases and genetic variants.

The selection of both DNase I and MNase for use in chromatin arrangement studies

provided a greater level of confidence in assigning function to genetic variants. Taken

together, the results of CHART-PCR using DNase I and MNase indicate

complementary structural chromatin information regarding the VNN1 gene promoter,

including the relative positions of nucleosomes and hypersensitive sites.

Although CHART-PCR studies have enabled the in vivo characterisation of chromatin

arrangement and transcriptional potential in multiple genes and organisms (Fürbass et

al., 2008; Su et al., 2004; Goriely et al., 2004; Pompei et al., 2007), variability can

arise when targeting the alleles of individual genetic variants. Given the requirement for

the comparison of chromatin between different homozygotic cell lines, chromatin

accessibility values derived from CHART-PCR remain a relative measures approach

105

(Cruikshank et al., 2008), rather than an ‘absolute’ figure that accounts for all biological

variation when comparing multiple cell lines. For example, cell line-specific MNase

digestion was observed at both the -137 and -587 VNN1 SNPs in the present study.

MNase digestion levels at the -137T/G locus in the Ramos cell line was 30.8%, as

compared to 18.8% (‘0029’) and 16.6% (‘0054) in the SAFHS cell lines. In this case,

the characteristics of the cell line such as genotype and culture conditions also need to

be considered when interpreting CHART-PCR measurements (Baral et al., 2012).

Whole-genome comparisons of heterozygotic cell lines overcome biological variation

associated with assessment of multiple cell lines by simultaneously assessing both

alleles at a given genetic variant in the context of a genome (Pastinen, 2010).

Consequently, the development of novel genome-wide approaches for targeting

chromatin accessibility at genetic variants may overcome these inherent limitations

associated with current assays such as CHART-PCR.

106

Chapter 4: CHA-seq - A novel functional variant screen

4.1 Introduction

Genetic variants correlating with allele-specific effects on gene expression or phenotype

are considered to be functional (Wang et al., 2011; Jian et al., 2011). Functional genetic

variants have been associated with cell function (Bonetti et al., 2012; Johnson et al.,

2012) and disease (Bartoszewski et al., 2010; Zhang et al., 2011). Functional genetic

variation can be attributed to allelic sequence changes in regulatory elements, which

may alter gene expression, often without an obvious effect on phenotype (Kleinjan and

van Heyningen, 2005; Pastinen et al., 2004). For example, Pastinen et al. (2004)

surveyed allelic imbalances in gene expression in heterozygous human cells, and

reported that 23 of the 126 genes tested showed a bias in gene expression toward one

allele. This allelic imbalance approach involved quantitative genotyping of

heterozygous individuals for genetic variants within RNA transcripts and compared

observed allele ratios to assumed equimolar (50:50) allele ratios. Pastinen et al. (2004)

determined an average observed allele ratio of 37:63 as the cutoff for determining the

presence of a significant allelic imbalance. As such, an allelic imbalance of greater than

37:63 is a likely indicator of functional genetic variants.

Traditionally, reporter gene and gel shift assays have been considered sufficient to

assess the in vitro functionality of genetic variants; however there are many variables

that may confound results (Karimi et al., 2009; Holden and Tacon, 2011). Furthermore,

these ‘gold standard’ approaches have inherent limitations including locus-specificity

and the lack of endogenous chromatin context (Hager et al., 2009). While more recent

in vivo approaches such as CHART-PCR can be used to functionally interrogate

107

endogenous chromatin contexts, the applicability of using existing locus-specific assays

to interpret genetic variants in a genome-wide context remains time and cost-inefficient

(Barski et al., 2007).

With advances in DNA sequencing technologies, genome-wide approaches have

become efficient alternatives for identifying functional genetic variants (Hunt et al.,

2008; Satake et al., 2009; Collas, 2010; Statham et al., 2012). The use of sequencing to

characterise chromatin arrangement, as opposed to quantitative real-time PCR in the

CHART-PCR approach, not only allows scalability but also eliminates PCR-based

issues, including primer design, hetero-duplex formation, mis-priming and primer

dimerisation (Freeman et al., 1999; Bustin et al., 2005; and Git et al., 2010). The

current study demonstrates a novel sequencing-based approach for assessing

functionality of non-coding sequence variation, by developing strategies that assess

function in an in vivo context.

The development of strategies that assess differences in chromatin arrangement as a

measure of functional difference in an in vivo context represents an ideal approach to

greatly increase time and cost-efficiency and to assign function to a particular genetic

variant within an extended genomic region. The current study translates traditional

chromatin arrangement assays such as the CHART-PCR method into a novel, robust

approach (CHA-seq) in order to assess allele-specific differences in chromatin arrangement

across an extended genomic region, based on next generation sequencing. The CHA-seq

approach represents a means of simultaneously sequencing genetic variants, mapping

nucleosomes and nuclease hypersensitivity sites, and assessing SNP functionality. The

development of enhanced genome-wide in vivo approaches allows for a more time and

108

cost-effective assessment of whether a variant does indeed regulate gene expression and

yields insights into the long range allele-specific mechanisms of action.

In this study, two novel in vivo approaches (‘targeted CHA-seq’ [T-CHA-seq] and

‘extended CHA-seq’ [CHA-seq]) were developed in order to characterise allele-specific

differences in chromatin arrangement within the VNN1 gene (Figure 4.1). For the

development of T-CHA-seq, a DNase I-digested genomic library was prepared from a -

137T/G heterozygous cell line from the SAFHS cohort and amplified for the -137 SNP

using two rounds of ‘nested’ PCR. This T-CHA-seq library was subsequently sequenced

using traditional Sanger sequencing in order to map nuclease digestion sites around the -137

SNP. For CHA-seq, a MNase-digested genomic library was prepared from a -587G/A

heterozygous cell line from the SAFHS cohort, and subjected to high-throughput Illumina

sequencing in order to map allelic-nuclease digestion sites across the entire VNN1, VNN2

and VNN3 genes. Subsequently, library sequence reads for both approaches were

compared to establish the presence of an allele-specific bias in read number (or ‘allele-

skew’) as a direct reflection of allele-specific nuclease digestion and functionality. To

facilitate a direct comparison between different alleles at a given SNP, SAFHS cell lines

containing heterozygous SNPs were used in the two novel approaches and loci

correlating with high allele-skews were considered to be functional genetic variants.

109

Figure 4.1 Principles of CHA-seq and T-CHA-seq. A: CHA-seq consists of firstly the preparation of a MNase-digested genomic library using the NEBNext DNA Sample Prep kit (New England Biolabs) on a -587G/A heterozygous cell line from the SAFHS cohort. After bead selection to enrich for genes of interest, high-throughput Illumina sequencing is performed and reads are mapped to the human genome using the Integrated Genome Viewer (IGV)(Broad Institute) in order to map allelic-nuclease digestion sites across the

VNN1, VNN2 and VNN3 genes. B: T-CHA-seq consists of firstly the preparation of a DNase I-digested genomic library using the NEBNext DNA Sample Prep kit (New England Biolabs) on a -137T/G heterozygous cell line from the SAFHS cohort. After amplification of the -137 SNP using two rounds of ‘nested’ PCR, the library is cloned in competent E.coli and subsequently sequenced using Sanger sequencing in order to map nuclease digestion sites around the -137 SNP.

CHA-seq T-CHA-seq

A. B.

110

4.2 CHA-seq method development

The two novel CHA-seq methods required optimisation at multiple key steps in library

preparation (nuclease digestion and adapter-ligation) and PCR amplification

(optimisation of the PCR conditions in T-CHA-seq) as shown in Sections 4.2.2 (T-

CHA-seq) and 4.2.3 (CHA-seq), with the optimised CHA-seq methodologies described

in detail in Chapter 2 (Section 2.10 for T-CHA-seq and Section 2.11 for CHA-seq).

In order to determine the presence or absence of an allele-specific bias in chromatin

accessibility via CHA-seq, a proportion is calculated based on the sequence reads of

MNase or DNase-digested fragments containing each allele at a given heterozygous

SNP locus relative to the total number of reads. This allelic proportion relative to a

random distribution (50%), or ‘allele-skew’, is a direct reflection of chromatin

arrangement and nuclease-sensitivity. For example, if 60% of sequence reads contained

the A allele at a given heterozygous SNP locus (and consequently 40% the alternate

allele), then the allele-skew (from a random distribution) is 10%. A two-tailed Student’s

t-test was then applied to determine statistical significance (P<0.05). Those SNPs that

recorded a CHA-seq allele ratio of greater than 37:63, or 13% allele skew, were

identified as candidate functional variants based on the allele bias cutoff set by Pastinen

et al. (2004).

T-CHA-seq is a site-specific and more cost-effective approach for functional assessment

of genetic variants, in comparison to CHA-seq, as it does not require deep sequencing.

The T-CHA-seq method is of particular use when candidate SNPs or preliminary data is

available regarding functionality. Alternatively, CHA-seq is a large-scale functional

screen providing the DNA base sequence, novel SNP identification, chromatin

arrangement assessment, nucleosomal positioning information, and the ability to predict

111

areas of nuclease hypersensitivity (and consequently transcription factor binding sites).

A limitation of CHA-seq, when using Illumina sequencing technology, is the inherent

error rate of 0.99% for comparisons of sequence reads (Illumina Inc, 2009). Therefore,

when interpreting CHA-seq results, this error rate was considered when determining the

viability of further functional assays for candidate SNPs.

4.2.1 Traditional chromatin arrangement assays – issues

Although the traditional CHART-PCR approach has been sufficient for characterising

the chromatin arrangement at specific genomic loci as evident in the literature

(Cruikshank et al., 2012; Rao et al., 2001), this approach requires extensive

optimisation to obtain consistent results. Of particular importance to the reproducibility

of CHART-PCR, is the normalisation to chromatin accessibility controls. Variation in

the chromatin accessibility of control loci such as β-actin (Figure 4.2), illustrates that

there are inconsistencies arising from the chromatin preparation which can significantly

influence chromatin accessibility values when comparing replicates and multiple cell

lines. In order to overcome these issues, extensive optimisation was required in order to

target VNN1 SNPs for the establishment of consistent chromatin accessibility

measurements using CHART-PCR (Chapter 3). Essential to this optimisation was the

establishment of an optimal nuclease concentration to reliably discriminate chromatin

accessibility. For example, the β-actin locus shown in Figure 4.2 had a higher MNase

concentration (4 enzyme units) and hence higher nuclease digestion to that used in the

optimised CHART-PCR methodology, shown in Figure 3.3 (1 enzyme unit). Partial

nuclease digestion proved better at discriminating chromatin accessibility and was used

in CHART-PCR (Chapter 3) and CHA-seq in the current study.

112

0029

(1)

0029

(2)

0029

(3)

0054

(1)

0054

(2)

0054

(3)

0331

(1)

0331

(2)

0331

(3)

0

80

90

100

SAFHS chohort ID (replicate number)

Ch

rom

ati

n A

cc

es

sib

ilit

y

(% c

utt

ing

)

Figure 4.2: Chromatin accessibility differences between replicates at the β-actin

promoter using CHART-PCR. MNase-treated cell lines from the SAFHS cohort (‘0029’, ‘0054’ and ‘0331’) were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested DNA (% cutting) at sequence near the β-actin

promoter in CHART-PCR. The data were obtained from SAFHS cell lines shown and assayed in triplicate (n=3). No significance (P<0.05) was observed between samples as determined by one-way ANOVA (P value = 0.98).

113

4.2.2 Targeted CHA-seq (T-CHA-seq) development

To demonstrate the utility of CHA-seq as a functional assay, a targeted CHA-seq (T-

CHA-seq) approach was developed to verify the results of CHART-PCR (Chapter 3)

which compared allele-specific chromatin accessibility at the -137 VNN1 SNP. T-CHA-

seq was applied to SAFHS cell line ‘0331’, which was heterozygous at the -137T/G

locus, and assessed for chromatin accessibility using DNase I. In this case, DNase I was

selected as the chromatin-sensitive nuclease based on its ability to discriminate genomic

regions that are under protein occupancy, given that the -137 locus is likely to be

regulated by Sp1 (Göring et al., 2007).

4.2.2.1 T-CHA-seq library preparation – DNase I digestion optimisation

Chromatin was isolated from a SAFHS cell line (‘0331’) which was heterozygous for

multiple VNN1 SNPs (including -137 and -587) and treated with DNase I. Various

reaction conditions and enzyme concentrations were trialled. In order to determine

optimal digestion, prepared chromatin was exposed to different DNase I concentrations

(1 U, 2 U, 3 U, 4 U and 6 U) and incubation lengths at 37˚C (2, 3, 4, 5 and 6 minutes) as

shown in Figure 4.3. The 6 minute incubation using 6 U DNase I (lane 7) was selected

based on the presence of a higher proportion of fragments located within the mono- and

di-nucleosomal size range (~100-400 bp) which was optimal for subsequent library

preparation steps using the NEBNext DNA Sample Prep kit (New England Biolabs,

USA). Additional experimentation was conducted to assess longer nuclease incubation

periods which showed that digestion was complete after six minutes. Optimal DNase I

digestion (6 minutes incubation using 6 U) was used for all subsequent preparations.

114

Figure 4.3: Electrophoretic patterns of DNase 1-digested chromatin. Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker was loaded in lane 1 along with the NEB 100 bp DNA ladder in lane 14, undigested chromatin derived from the SAFHS cohort in lane 2 (3 µg chromatin) and lane 8 (1 µg chromatin, note faint band due to loading spillage). 3 µg of chromatin derived from the SAFHS cohort was incubated in 6 U of DNase I with different incubation times and loaded in lane 3 (2 minutes), lane 4 (3 minutes), lane 5 (4 minutes), lane 6 (5 minutes) and lane 7 (6 minutes). 1 µg of SAFHS chromatin incubated for 6 minutes incubated in different DNase I concentrations was loaded in lane 9 (1 U), lane 10 (2 U), lane 11 (3 U), lane 12 (4 U) and lane 13 (6 U). Optimal digestion of the SAFHS chromatin was established using 6 minute incubation with 6 U DNase I.

115

4.2.2.2 T-CHA-seq library preparation – adapter-ligation optimisation

DNase 1-digested fragments within the 100-400 bp range were selected following

agarose gel electrophoresis, cut out using a scalpel, and purified using the QIAquick Gel

Extraction kit (QIAGEN, Australia). The fragments were subsequently prepared

according to the protocol outlined in the NEBNext DNA Sample Prep kit (New England

Biolabs, USA). It was of particular importance to ensure the optimal size of DNA

fragments following the ligation of adapters using agarose gel electrophoresis. As

shown in Figure 4.4, the highest proportion of fragments were approximately 300 bp

(di-nucleosomal) as estimated by comparison to the NEB PCR marker (New England

Biolabs, USA). An additional 50 bp was also observed, which matches the size of the

adapters used in the library preparation and was subsequently removed using the

QIAquick PCR purification kit (QIAGEN, Australia).

116

Figure 4.4: Adapter ligation to DNase I-digested chromatin. Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker was loaded in lane 1 along with DNase I-digested fragments that had been prepared using the NEBNext DNA Sample Prep kit (New England Biolabs, USA) following the ligation of adapter sequence in lane 2.

50bp

150bp

300bp

500bp

766bp

1 2

117

4.2.2.3 PCR amplification - optimisation

In order to target specific genomic regions, namely the -137 VNN1 SNP, PCR

amplification (Section 2.7) was used to enrich prepared library fragments following

preparation using the NEBNext DNA Sample Prep kit (New England Biolabs, USA).

Two rounds of PCR amplification using ‘nested’ sets (primer sets positioned internally

to the previous set) of VNN1-specific primers were designed to increase the yield of

fragments containing the -137 locus from the forward (first round primer: 5’-

GCAGCCCTAGCGAACTAATG-3’ and second round primer: 5’-

TCTTGAGTTAATCTTCACAAATTACTGC-3’) and reverse (first round primer: 5’-

GTCCAGAACTTAGTACCTTAAACTACGGAAG 3’ and second round primer: 5’-

CGCTATGATTTCATAGCATTGTTACATAAC 3’) direction (Figure 4.5). In each

round of ‘nested’ PCR amplification, a VNN1-specific primer was paired with an

adapter-specific primer (5’-ACTCTTTCCCTACACGACGC-3’) to ensure only library

fragments were amplified. The results following PCR amplification showed the

presence of a ‘smear’ which represented a range of different sized fragments. The size

of the first and second round of ‘forward’ amplicons were within a range of

approximately 100-350 bp, indicating the presence of different VNN1-enriched

fragments (Figure 4.6), with the reverse library resulting in equivalent sized amplicons

(Figure 4.7). The forward and reverse -137T/G amplicon libraries were subsequently

pooled and sequenced using Sanger-based chemistry (Section 4.3.1).

118

5’-CTATGGGTCAAGGAGAGAGGCCTGGAACACATCTTTCCTTCACCGAAA

TCAGGAGGAACCAACTCTGCTGACACCTTCATCTGGGACTCCCACCCTCCAG

AACTGCAAAGCAATAAATTTTTTATTTTTTACACCACCCAGTTTATTGTATTT

TGTTAGGCAGCCCTAGCGAACTAATGTACATAGAGTTCTTGAGTTAATCTTCA

CAAATTACTGCAATAAGGKAGGGTCTTTTGTTATGTAACAATGCTATGAAAT

CATAGCGTTTTCTTAATTAACTTCCGTAGTTTAAGGTACTAAGTTCTGGACACC

ACGTGTCTTCTTTCTATAAATACCAGGACATGCTCTGTTTTTCAGCACT-3’

Figure 4.5: T-CHA-seq primer positions in the -137 VNN1 promoter region. Two rounds of PCR amplification were performed using nested sets of sequence-specific primers in order to increase target yield of the -137 VNN1 locus from both the forward (5’ to 3’) and reverse (3’ to 5’) direction. Shown are the consensus sequences of the first (italics) and second-round ‘forward’ (black boxes) and ‘reverse’ (grey boxes) T-CHA-seq PCR primers within the VNN1 promoter region relative to the -137 SNP (K). In each round of amplification, one corresponding VNN1-specific primer was paired with an adapter-specific primer to ensure only library fragments were amplified.

119

I-digested library

1 2 3 4 5 6

Figure 4.6: T-CHA-seq ‘forward’ PCR amplification of DNase I library fragments. Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) was loaded in lane 1 and 6 along with the ‘forward’ (5’ to 3’) PCR amplification products of the T-CHA-seq library in lane 2 (first round amplicons) and lane 4 (second round amplicons). Controls (no template) were run in lane 3 (first round) and 5 (second round). *Note: A 1% agarose gel was selected to confirm whether amplification was successful rather than to establish the size range which was determined in Figure 4.4.

50bp

150bp

300bp

500bp

766bp

120

1 2 3 4 5

Figure 4.7: T-CHA-seq ‘reverse’ PCR amplification of DNase I library fragments. Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB PCR Marker (sizes: 50 bp, 150 bp, 300 bp, 500 bp, 766 bp) was loaded in lane 1 along with the ‘reverse’ (3’ to 5’) PCR amplification products of the T-CHA-seq library in lane 2 (first round amplicons) and lane 4 (second round amplicons). Controls (no template) were run in lane 3 (first round) and 5 (second round). *Note: A 1% agarose gel was selected to confirm whether amplification was successful rather than to establish the size range which was determined in Figure 4.4.

50bp

150bp

300bp

500bp

766bp

121

4.2.2.4 T-CHA-seq methodology following PCR amplification

Following PCR enrichment of the DNase I-digested libraries for the -137 VNN1 SNP,

TOPO TA cloning (Invitrogen Corporation, Australia) and bacterial transformation

steps using One Shot MAX Efficiency DH5α-T1R Competent Cells (Invitrogen

Corporation, Australia) were performed (See Sections 2.2.3 and 2.2.4) in order to isolate

and amplify single library fragments within plasmid vectors, which were subsequently

purified and sequenced using Sanger-based chemistry. Following blue-white colony

selection (See Section 2.2.4), positive bacterial clones carrying enriched library

fragments were purified using the QIAGEN Spin Mini-prep kit (QIAGEN, Australia)

(See Section 2.5.1), and subsequently sequenced using Sanger-based chemistry at the

Australian Genome Research Facility (AGRF) (See Section 2.8). The DNA sequence

reads were downloaded (http://www.agrf.org.au/) and then processed using

bioinformatic tools (BLAST). Results are shown in Figures 4.10 and 4.11.

4.2.3 Extended CHA-seq method development

The CHA-seq approach was modified from the targeted T-CHA-seq approach in order

to screen larger genomic regions for functional genetic variants (Section 2.11). CHA-

seq required a bead selection step (as opposed to locus-specific primers in T-CHA-seq)

and deep-sequencing in order to extend the target assay region. CHA-seq was applied to

a SAFHS cell line (‘0331’) which was heterozygous for multiple VNN1 SNPs

(including -137 and -587) and assessed for chromatin accessibility using MNase. The

use of MNase generated detailed information regarding nucleosome positions and

relative levels of nucleosome protection across the VNN1 gene that extends upon the

CHART-PCR results (Chapter 3).

122

4.2.3.1 Library preparation – MNase digestion optimization

Chromatin was isolated from a SAFHS cell line (‘0331’) and treated with MNase under

various reaction conditions to determine optimal nuclease digestion to generate a higher

distribution of fragments within the mono- and di-nucleosomal size range (Figure 4.8).

Prepared chromatin was exposed to different MNase concentrations (0.5, 1, and 2

enzyme units) and incubation times (5, 10 and 15 minutes) (Figure 4.8). As shown in

Figure 4.8, a ten minute incubation using one enzyme unit of MNase was selected as

optimal and thus utilised for subsequent library preparation using the NEBNext DNA

Sample Prep kit (New England Biolabs, USA).

123

1 2 3 4 5 6 7

Figure 4.8: Electrophoretic patterns of MNase-digested chromatin. Electrophoresis was performed using a 1% agarose gel stained with ethidium bromide and run for 45 minutes at 90V. The NEB 1 kb DNA ladder (5-10 kb bands indistinguishable) was loaded in lane 1 along with micrococcal nuclease-digested chromatin isolated from a SAFHS cohort cell line (‘0331’) using different incubation lengths and enzyme concentrations in lanes 2 (10 minute digestion, 0.5 U MNase), 3 (10 minute digestion, 1 U MNase), 4 (10 minute digestion, 2 U MNase), 5 (5 minute digestion, 1 U MNase), 6 (10 minute digestion, 1 U MNase) and 7 (15 minute digestion, 1 U MNase).

500bp

1000bp

1500bp

2000bp

3000bp 4000bp

124

4.2.3.2 CHA-seq methodology post-library preparation

Following preparation of the MNase-digested library using the NEBNext DNA Sample

Prep kit (New England Biolabs, USA), target sequences containing the entire Vanin

gene family were supplied to Agilent to synthesise pools of biotinylated probes specific

for the SureSelect DNA Target Enrichment system. Libraries were purified using a

bead-based selection protocol with the SureSelect Target Enrichment System - Illumina

Single-End Sequencing Platform Library Prep Version 1.2 (Agilent, USA) in

accordance with the manufacturer’s instructions.

Prepared libraries were processed for cluster generation on the Illumina cluster station

(Agilent, USA) and subsequently sequenced using Illumina next-gen sequencing

technology on the Illumina Genome Analyzer II platform (Agilent, USA). DNA

sequence reads were analysed using the IGV 2.1 software and aligned to the hg19

reference sequence. Raw sequence data was used to interpret chromatin accessibility by

comparing numbers of allelic sequence reads at heterozygous SNPs (Table 4.1 and 4.3).

The optimised protocol for CHA-seq is described in detail in Chapter 2 (Section 2.11).

4.3 Results

4.3.1 Targeted CHA-seq (T-CHA-seq)

To assess allele-specific chromatin accessibility, a DNase I-treated genomic library was

prepared from a -137G/A heterozygous cell line from the SAFHS cohort, enriched for the -

137 locus using two rounds of PCR amplification, and subjected to Sanger sequencing to

map the nuclease digestion sites. The results showed the presence of an area of high

nuclease accessibility directly adjacent (5’ upstream) to the -137 SNP, suggesting the

influence of histone or bound protein protection localised to the -137 locus (Figure 4.9).

Comparison of the number of sequenced library fragments ending at a particular nucleotide

(and hence mapping the sites of digestion by the nuclease) for each -137 variant, indicated

125

an allele-specific bias (allele skew) in the digestion pattern (Figure 4.9). Only 11.4% of the

total number of fragments carried -137G, while the majority carried the T allele (89.6%).

This is consistent with the results of the CHART-PCR (Chapter 3) that showed the -137G

allele to be correlated with greater chromatin condensation than the -137T allele.

126

Figure 4.9: Assessment of the chromatin arrangement across the VNN1 promoter

using T-CHA-seq. Shown is the total coverage across the VNN1 promoter sequence, the number of reads (vertical axis) carrying the -137 variant (K), and the position of the 3’ terminal nucleotide located in the region derived from T-CHA-seq results. Sequence reads containing the -137T allele are shown as black columns and those with the -137G allele in grey. Sites of digestion by DNase I are plotted across (horizontal axis) the -137 VNN1 promoter region, with the number of sequenced fragments ending at each nucleotide indicated by the vertical axis for each -137 allele (T or G). The areas of highest sequence reads (15) are 15, 16, and 17 base-pairs upstream of the -137 SNP and contain the T allele. The number of reads containing the -137T allele (132) were higher than that of the -137G allele (17) indicating an allelic bias towards the T allele.

127

To further delineate exact positions of DNase I hypersensitive sites within the -137

promoter region, the total number of T-CHA-seq sequence reads were mapped to the -

137 VNN1 promoter region and included those reads terminating prior to the -137 locus

(Figure 4.10). There appears to be two regions of higher DNase I accessibility (≥13

reads) adjacent to the -137 SNP (1-5 bp 3’ downstream, and 11-15 bp 5’ upstream of the

SNP) (Figure 4.12).

128

Analysis of hypersensitivity about the -137 VNN1 gene promoter SNP using sequenced

library fragment ends

Figure 4.10: Assessment of the chromatin arrangement across the VNN1 promoter

using T-CHA-seq. Shown is the total sequence read coverage across the VNN1 gene promoter (horizontal axis) with the position of the 3’ terminal nucleotide located in the -137 (K) promoter region derived from T-CHA-seq results. Numbers of sequence reads are shown (vertical axis) as black columns, with the areas of highest sequence reads (16) occurring at positions -136 and -135.

129

4.3.2 CHA-seq

To map nuclease digestion sites across the entire Vanin gene family, including the VNN1

gene, a MNase-digested genomic library was prepared from a -587 G/A heterozygous cell

line from the SAFHS cohort and subjected to deep sequencing. Using a prepared library of

DNA fragments covering the VNN1 gene, 41 heterozygous SNPs were captured

following sequencing and analysed using the Integrated Genome Viewer (IGV) 2.1

software (Broad Institute) to determine the location and allele skew for each VNN1 SNP

(Table 4.1).

To assign rs numbers, the chromosome position (based on the hg19 human reference

genome) of captured CHA-seq SNPs were matched to the dbSNP database

(http://www.ncbi.nlm.nih.gov/snp) and verified by comparing CHA-seq reads to the

SNP sequence in dbSNP. At the time of analysis (29-6-13), there was 1042 SNPs in

dbSNP that matched to the VNN1 gene. A total of 12 heterozygous SNPs captured in

CHA-seq corresponded with rs numbers. The remaining SNPs may be either novel

genetic variants within the SAFHS cohort, or base substitutions introduced as part of

Illumina sequencing error. For example, the SNP at chromosome position 133,035,606

in the VNN1 promoter had not been assigned an rs number at the time of

experimentation (2010), but has since been designated rs45513194. Therefore, the novel

SNPs identified in the current study may be verified in future studies. To reduce the

likelihood of error introduced by the Illumina sequencing, only those captured SNPs

with rs numbers were considered for further functional analysis.

A summary of the distribution of captured heterozygous SNPs across the VNN1 gene

indicates a number of densely populated genomic regions carrying captured

heterozygous SNPs (Figure 4.11). Of interest is the presence of five intronic SNPs that

130

are co-located between chromosomal positions 133,023,083 to 133,023,087 which

clearly show an active region that affects the conformation of VNN1 chromatin (Figure

4.11). Of the 41 captured SNPs in CHA-seq, 37 were found to be significantly different

from random distribution (50:50) between alleles (P<0.005) (Table 4.1). Noticeably, the

-587 SNP (88% A and 12% G) and the rs2267951 SNP (85% T, and 15% C) have an

allele-specific influence on VNN1 chromatin, with a further 21 SNPs that are also likely

to contribute (greater than 13% + 1% Illumina sequencing error rate) based on the allele

ratio cutoff set by Pastinen et al. (2004).

131

Table 4.1: Heterozygous VNN1 SNPs captured in CHA-seq. Shown is a summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. Shown for each captured VNN1 SNP are the genomic positions relative to the transcriptional start site (TSS), position within the VNN1 gene (promoter/5’upstream, intronic or exonic), alleles, total CHA-seq sequence reads carrying the SNP, rs SNP numbers (N/A = no dbSNP match), allele frequencies (allele 1 and 2), and allele skew as a direct reflection of chromatin accessibility and functionality (percentage). Allele-skews that were statistically significant using the critical t-value (2.97% at P=0.005, df=41, 2-tailed) are bolded as shown. The range of nucleosomal protection at genetic variants in the VNN1 gene was between 1-38%. The average allele skew across all captured VNN1 SNPs was 13.6%, with the rs2050153 (-587) variant having the highest recorded allele-skew (38%).

Chr 6

position

Genomic location Alleles Total

reads

rs SNP

number

Allele

1 %

Allele

2 %

Allele

skew

133,035,897 promoter/5'upstream T-C 2817 rs7760367 56 44 6

133,035,856 promoter/5'upstream C-T 3250 rs13204527 53 46 3.5

133,035,776 promoter/5'upstream A-G 5253 rs2050153

(-587)

88 12 38

133,035,606 promoter/5'upstream G-C 4955 rs45513194 61 39 11

133,035,099 exon 1 A-G 1432 rs2294757 65 34 15.5

133,034,654 intron 1 A-T 1069 N/A 55 45 5

133,034,653 intron 1 G-A 1070 N/A 54 46 4

133,034,652 intron 1 G-C 1070 N/A 54 45 4.5

133,034,650 intron 1 T-C 1070 N/A 54 46 4

133,034,647 intron 1 A-T 1070 N/A 55 45 5

133,034,549 intron 1 C-T 1099 N/A 56 44 5.5

133,034,550 intron 1 C-T 1099 N/A 55 45 6

133,034,553 intron 1 C-T 1099 N/A 56 44 6

133,030,761 intron 2 G-A 758 rs148984574 56 44 6

133,023,821 intron 2 C-T 1439 rs113832566 66 34 16

133,023,820 intron 2 A-G 1439 N/A 66 34 16

133,023,174 intron 2 A-C 1175 N/A 61 39 11

133,023,167 intron 2 A-C 1175 N/A 61 39 11

133,023,087 intron 2 G-C 1951 N/A 71 26 22.5

133,023,086 intron 2 A-T 1951 N/A 73 26 23.5

133,023,085 intron 2 G-A 1951 N/A 73 27 23

133,023,084 intron 2 A-G 1951 N/A 74 26 24

133,023,083 intron 2 C-A 1913 rs78040505 73 27 23

133,020,445 intron 2 C-T 1806 N/A 66 34 16

133,020,383 intron 2 A-T 1913 N/A 68 32 18

133,020,380 intron 2 T-A 1913 N/A 68 32 18

133,020,379 intron 2 T-G 1913 N/A 68 32 18

133,020,376 intron 2 G-T 1912 rs2267949 68 32 18

133,019,859 intron 2 T-C 1522 rs2267951 85 15 35

133,019,482 intron 2 T-C 1041 N/A 51 49 1

133,013,475 exon 5 T-G 2044 N/A 73 27 23

133,013,474 exon 5 T-C 2043 N/A 73 27 23

132

133,013,374 exon 5 A-C 1524 N/A 69 31 19

Chr 6

position

Genomic location Alleles Total

reads

rs SNP

number

Allele

1 %

Allele

2 %

Allele

skew

133,013,369 exon 5 A-C 1520 N/A 69 31 19

133,012,903 intron 5 A-G 846 N/A 51 49 1

133,012,888 intron 5 G-A 846 N/A 51 49 1

133,012,885 intron 5 A-T 846 N/A 51 49 1

133,012,858 intron 5 A-T 1254 N/A 65 35 15

133,010,071 intron 5 T-C 854 N/A 72 28 22

133,005,239 intron 6 C-T 1656 rs45525739 70 30 20

133,004,992 intron 6 A-T 3015 rs9389025 65 35 15

Average 63.5 36.2 13.6

133

Figure 4.11: Distribution of heterozygous SNPs captured in CHA-seq across the

VNN1 gene relative to allele skew. Shown is a graphical summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. Shown are the positions of captured VNN1 SNPs relative to the transcriptional start site (x-axis) and the associated allele-skew as a percentage (y-axis). Exonic (black rectangles), intronic (black bar between exons) and promoter (points with negative x-axis values) regions are shown in the VNN1 gene. The highest recorded allele-skews were observed at the -587 SNP (38%) in the promoter region, and at the rs2267951 SNP (35%) within intron 2 (red arrows).

134

4.3.2.1 Allele-specific nuclease activity at the -587 SNP

To interpret allele-specific areas of MNase sensitivity within the -587 VNN1 promoter

region derived from CHA-seq, the number of sequenced library fragments ending at a

particular nucleotide (and hence mapping the sites of digestion by the nuclease) for each -

587 variant were compared (Figure 4.12). An allele-specific bias in the digestion pattern

was shown for the -587 locus. In fact, only 12% of the total number of fragments carried -

587G, with the majority carrying the A allele (88%).

In the presence of the -587A allele, there were two areas (20-30 bp upstream from the -

587 locus and 70-80 bp downstream) that correlated with a high number of terminating

sequenced reads (Figure 4.12). Therefore, it is likely the -587A locus is located with an

active chromatin region. These findings are consistent with the CHART-PCR results

which showed that the presence of the -587G allele correlates with greater chromatin

condensation than when the A allele is observed.

135

Figure 4.12: Assessment of the allele-specific chromatin accessibility across the -

587 VNN1 promoter region using CHA-seq. Shown is a graphical summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq that detected 41 heterozygous SNPs in the VNN1 gene by comparison to the hg19 human reference sequence. The CHA-seq results shown represent the total sequence coverage (ticks every 10 base-pairs) across the -587 (red) promoter region (horizontal axis) in the VNN1 gene along with the positions and numbers of the 3’ terminal nucleotide following micrococcal nuclease digestion (vertical axis). Sequence reads containing the -587A allele are shown above the horizontal axis and reads containing the -587G allele are shown below the horizontal axis. The numbers shown on these columns represent the number of sequenced fragments whose 3’ nucleotide terminates at that position. Of

note were two areas (indicated by black arrows) containing the -587A allele that were densely populated and possessed the highest number of terminating sequenced reads, suggesting the presence of active chromatin regions.

136

4.3.2.2 Nucleosome mapping within the -587 VNN1 promoter region

To map nucleosome positions within the -587 VNN1 promoter region (and determine

the relative level of nucleosome protection), sequenced CHA-seq library fragments

containing the -587 locus (G or A) were analysed using the Internal Genome Viewer

(IGV) 2.1 software from the Broad Institute (Figure 4.13). The region surrounding the -

587 SNP displayed a periodicity in the number of MNase cut sites that was consistent

with the positioning of the protective nucleosomes in the region (Figure 4.13). The

‘novel’ SNP (newly identified at the time of this research, but since has been designated

rs45513194) shows partial protection and is positioned at the edge of a potential

nucleosome, whereas the -587 SNP (rs2050153) is located in an area of high sequence

read frequency, indicating the presence of little or no influence from nucleosomes. The

remaining two SNPs (rs7760367 and rs3204527) within the -587 VNN1 promoter region

are likely to be non-functional SNPs due to nucleosome protection.

137

Figure 4.13: Assessment of the chromatin arrangement across the -587 VNN1

promoter region using CHA-seq. Shown is an adapted summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq showing the -587 (rs2050153) promoter region in the VNN1 gene correlated with the total sequence coverage across the region. Sequence reads (grey bars) corresponding to particular loci are shown along the horizontal axis (genomic position), with read number on the vertical axis. Also shown are the putative positions of the nucleosomes (dashed ovals) based on sequence coverage, with the -587 locus predicted to lie within a region with no nucleosome protection. The labelled red arrows indicate the positions of genetic variants located within the -587 promoter region, with the percentage of reads carrying each allele proportionally represented as red and blue bars underneath each arrow. It is of note is that the -587 SNP is predicted to occur within an area with little to no nucleosome protection.

VNN1 gene promoter (chromosome position)

CH

A-s

eq r

ead

s (n

um

ber

)

138

4.3.3 CHA-seq for the VNN2 and VNN3 genes

The VNN2 and VNN3 genes were subjected to CHA-seq using a prepared MNase-

digested library isolated from a -587G/A heterozygous cell line (SAFHS cohort) in order

to map nuclease digestion sites and illustrate the scalability of the CHA-seq approach.

As shown in Table 4.2, 24 heterozygous SNPs were captured across both genes and

analysed using IGV to determine the location and allele skew of SNPs. All of the 24

VNN2 and VNN3 SNPs captured in CHA-seq were significantly skewed (P<0.005), of

which 11 were in the VNN2 gene and 13 located in the VNN3 gene. Of the 11 VNN2

SNPs, 4 were located in the promoter region and the remaining 7 positioned within

intronic regions. In the VNN3 gene, 3 were located in the promoter and 10 located

within intronic sequences. The SNP located within intron 3 of the VNN3 gene

(chromosome 6 position 133,048,287) had the highest recorded allele skew (28%) and

is a likely candidate for further functional assessment (Table 4.2).

There were 24 SNPs in the dbSNP database (http://www.ncbi.nlm.nih.gov/snp) that

matched to the VNN2 gene and 389 SNPs to the VNN3 gene at the time of analysis (29-

6-13). In CHA-seq, one captured heterozygous SNP at chromosome position

133,048,214 in VNN3 matched to the rs9493417 in dbSNP.

139

Table 4.2: Heterozygous VNN2 and VNN3 SNPs captured in CHA-seq. Shown is a summary of the Integrated Genome Viewer (IGV) software (Broad Institute) output from CHA-seq which detected 11 heterozygous SNPs in the VNN2 gene and 13 in the VNN3 gene by comparison to the hg19 human reference sequence. Shown for each captured VNN1 SNP are the genomic positions (chromosome position), position within the VNN2 and VNN3 genes (promoter/5’upstream, intronic or exonic), alleles, total CHA-seq sequence reads carrying the SNP, rs SNP numbers (N/A = no dbSNP match), allele frequencies (allele 1 and 2), and allele skew as a direct reflection of chromatin accessibility and functionality (percentage). Allele skews that were statistically significant as determined by their critical t-value (3.09% at P=0.005, df=24, 2-tailed) are bolded. The range of nucleosomal protection at genetic variants in the VNN2 and VNN3 genes was between 5-28%. The average allele skew across all captured VNN2 and VNN3 SNPs was 14.6%, with the VNN3 variant at chromosome position 133,048,287 possessing the highest recorded allele skew (28%) (red).

SNP location

(chromosome

position) Gene/genomic location Alleles

Total

reads

rs SNP

number

Allele

1 %

Allele 2

%

Allele

skew

133,085,576 VNN2 promoter/5'upstream A-T 1445 N/A 39 61 11

133,085,564 VNN2 promoter/5'upstream C-T 1448 N/A 61 39 11

133,085,553 VNN2 promoter/5'upstream C-T 1451 N/A 61 39 11

133,085,552 VNN2 promoter/5'upstream A-T 1451 N/A 60 40 10

133,079,482 VNN2 intron 1 C-T 803 N/A 33 67 17

133,076,897 VNN2 intron 4 A-G 711 N/A 29 71 21

133,073,728 VNN2 intron 4 A-C 1382 N/A 63 37 13

133,073,726 VNN2 intron 4 A-T 1385 N/A 37 63 13

133,073,721 VNN2 intron 4 C-T 1384 N/A 63 36 13

133,072,720 VNN2 intron 4 C-T 1518 N/A 61 39 11

133,070,042 VNN2 intron 6 A-G 1033 N/A 45 55 5

133,056,346 VNN3 promoter/5’upstream G-T 1227 N/A 31 69 19

133,056,344 VNN3 promoter/5’upstream A-T 1227 N/A 69 31 19

133,056,340 VNN3 promoter/5’upstream A-T 1227 N/A 31 69 19

133,053,371 VNN3 intron 2 C-T 859 N/A 66 34 16

133,053,369 VNN3 intron 2 A-G 859 N/A 34 66 16

133,053,367 VNN3 intron 2 C-T 859 N/A 34 66 16

133,048,325 VNN3 intron 3 A-G 1669 N/A 37 63 13

133,048,324 VNN3 intron 3 A-G 1669 N/A 62 38 12

133,048,321 VNN3 intron 3 G-T 1669 N/A 62 38 12

133,048,316 VNN3 intron 3 A-T 2268 N/A 72 28 22

133,048,287 VNN3 intron 3 C-T 2875 N/A 78 22 28

133,048,214 VNN3 intron 3 A-G 1689 rs9493417 37 63 13

133,045, 554 VNN3 intron 5 A-C 634 N/A 40 60 10

Average 14.6

140

4.4 Discussion

Genetic variants that alter chromatin accessibility and transcription factor binding

comprise a central regulatory mechanism by which sequence variation leads to changes

in gene expression in humans (Degner et al., 2012). A key result from this present study

is the utility of a novel sequencing approach (CHA-seq) to characterise functional

genetic variants on a larger scale and more time and cost-effectively than existing

chromatin arrangement assays (CHART-PCR).

T-CHA-seq was applied to the -137 promoter region in VNN1, which showed a

significant allele-bias in sequence reads (38.5% for the -137T allele). Taken together,

CHART-PCR and T-CHA-seq results are suggestive that the -137 variant confers allele-

specific chromatin accessibility differences in the VNN1 gene promoter. Göring et al.

(2007) identified allele-specific binding at the -137 variant using EMSA, supporting the

findings of both T-CHA-seq and CHART-PCR approaches from the present study.

Furthermore, the mapping of nuclease cut sites identified that the -137 locus is present

within a region of high DNase I hypersensitivity. Consequently, this result and the

findings of Göring et al. (2007) support the presence of bound proteins at the -137 locus

that confer allele-specific nuclease protection.

In order to assess allele-specific differences in chromatin arrangement across an extended

genomic region, the CHART-PCR assay was translated into a genome-wide format (CHA-

seq). CHA-seq involved the use of a prepared library of DNA fragments covering the

VNN1 gene and which captured 41 heterozygous SNPs. Of these, 37 showed a

significant allele skew (P<0.005), including the SNP captured at position 133,035,099

(rs2294757) which has been reported to have an allele-specific influence in DNA

methylation (Kerkel et al., 2008). Previously, Pastinen et al. (2004) determined an

average observed allele ratio of 37:63 as the cutoff for determining the presence of a

141

significant allelic imbalance. Taking this allele skew as a baseline for functional

candidates and incorporating the 1% Illumina sequencing error rate (i.e. 14%), CHA-seq

has identified 23 novel candidate SNPs that have an allele-specific role in VNN1

chromatin organisation. Of the 23 candidate functional SNPs, 17 occur in intronic

regions, five in exonic regions and one in the promoter (-587). The majority (17/23) of

candidate SNPs, however, are localised to the promoter and intronic regions of the

VNN1 gene, and are suggestive of a region under significant transcriptional regulation

through allele-specific effects on chromatin (Tsukada et al., 2006). Interestingly, the

rs2267951 SNP in intron 2 recorded the second highest allele skew in CHA-seq and was

selected for further in vitro characterisation by EMSA in order to verify SNP

functionality and the veracity of the CHA-seq results (Chapter 6). The remaining

genetic variants with significant allele skews represent ideal candidates for further

functional validation.

Comparison of the number of sequenced library fragments ending at a particular nucleotide

(and hence mapping the sites of digestion by the nuclease) for each -587 variant indicated

an allele-specific bias in nuclease digestion, with the majority of sequence reads terminating

within the promoter when carrying the -587A allele (88%). Furthermore, the -587

promoter region displayed a periodicity in the number of MNase cut sites that correlated

with nucleosome positions within the region. The nucleosome periodicity suggests that the -

587A locus is located within an active euchromatic region, which therefore indicates

greater transcriptional potential. Furthermore, the level of predicted protection of SNPs

within the -587 promoter region was inversely correlated to the allele skew at these

SNPs. Consequently, this may indicate that candidate functional variants with

significant allele-specific chromatin accessibility or CHA-seq allele-skews, such as the

142

-587 variant, have reduced nucleosome protection against MNase digestion due to

histone proteins.

To assess the scalability of the CHA-seq approach, this method was applied to the

VNN2 and VNN3 genes. The results of CHA-seq showed the presence of two candidate

functional SNPs within the VNN2 gene and eight in the VNN3 gene, based on a

comparison to the 14% baseline allele skew. All candidate SNPs in the VNN2 and VNN3

genes were present within non-coding regions, suggesting that the role of these genetic

variants is likely to be associated with chromatin structure and/or transcriptional

regulation. Further experimental validation would assist in the elucidation of the

functional mechanisms associated with these VNN2 and VNN3 candidate variants.

It was expected that CHA-seq would capture all heterozygous SNPs across the entire

VNN1 sequence, including the -137 SNP, using a prepared MNase-digested library from

the SAFHS cohort (note that T-CHA-seq used DNase I). However, a periodicity was

observed in densely populated and under-populated regions with respect to SNP-

containing sequence reads over extended genomic regions. The cause of this unexpected

periodicity is unclear and may be due to inconsistent capture of MNase-digested library

fragments during bead selection (see Section 4.2). Consequently, further optimisation of

the CHA-seq methodology is required to ensure capture of all genetic variants in other

CHA-seq libraries. Since the completion of the current study, a CHA-seq library using

DNase I was prepared using an optimised methodology developed as part of this

research, however, the -137 locus was again not captured. This research area is subject

to further experimentation. An additional limitation of CHA-seq, is the inherent

Illumina sequencing error rate (1%), which may introduce false polymorphisms with

DNA sequence reads.

143

In the present study, CHA-seq has identified multiple candidate functional genetic

variants across the VNN1, VNN2 and VNN3 genes. The CHA-seq results are suggestive

that this approach may represent a time- and cost-effective alternative for identifying

and characterising functional genetic variants. These candidate functional variants in the

Vanin gene family will provide targets for verification and future research. This

potential application however, requires knowledge of the heritable component of allele-

specific chromatin accessibility. Consequently, CHA-seq would represent an ideal

approach for rapidly determining chromatin accessibility patterns within familial

groups.

144

Chapter 5 – DNA methylation analysis of the -587 promoter region in the VNN1 gene

5.1 Introduction

DNA methylation plays a crucial role in transcriptional regulation and is typically

associated with gene silencing either directly, by reducing the accessibility of

transcription factors to the DNA, or indirectly, by enhancing the formation of

heterochromatin (Caro et al., 2012; Watt and Molloy, 1988; Boyes et al., 1991). These

changes are reversible upon de-methylation (Holliday et al., 1991). Bisulfite sequencing

is the standard approach for characterising DNA methylation patterns and is dependent

upon the treatment of genomic DNA with bisulfite prior to sequencing, converting

unmethylated cytosines to uracils and leaving methylated cytosines in their original

state (Rohde et al., 2010; Frommer et al., 1992). Bisulfite sequencing is particularly

useful for the characterisation of genetic variants located within CpG islands, which can

influence chromatin and gene expression depending upon the methylation state of

associated CpG sites (Bird, 2002; Goll and Bestor, 2005); as seen in the CBR1 gene

(Zhang, Y et al., 2009) and the RIL gene (Boumber et al., 2008).

Göring et al. (2007) identified the -587 VNN1 variant to be putatively functional based

on the genetic association with transcript abundance and HLD-C concentration in the

SAFHS cohort. Functional analysis identified no allele-specific DNA-protein

complexes at the -587 locus using EMSA, indicating that differential transcription

factor regulation was not the functional mechanism at this locus (Göring et al., 2007).

Taken together, the CHART-PCR (Chapter 3) and CHA-seq (Chapter 4) results

indicated that the -587 VNN1 variant was in a region of condensed chromatin, hence it

was important to determine whether the methylation status of the -587 locus was

145

directly responsible for the difference. Previously, Kerkel et al. (2008) reported

differential CpG methylation in the VNN1 gene using three haematopoietic cell lines,

and identified a functional exonic SNP (rs2294757) that was associated with

preferential gene expression through allele-specific methylation in the VNN1 gene. In

this current study, bisulfite sequencing was performed on genotyped lymphoblastoid cell

lines from the SAHFS population to assess the DNA methylation status of the -587

promoter region in the VNN1 gene.

5.2 Materials and Methods

A detailed description of the bisulfite sequencing methodology is shown in Section

2.9.3 (Chapter 2), with a brief summary included below.

Genomic DNA was isolated from seven cultured human leukocyte cell lines from the

SAFHS cohort carrying different -587 genotypes (two -587G/G homozygotes, three -

587G/A heterozygotes and two -587A/A homozygotes) and purified using DNA Blood

mini columns (QIAGEN, Australia). Isolated DNA was treated with bisulfite to convert

unmethylated cytosines bases into uracils, and subsequently purified using EpiTect spin

columns (QIAGEN, Australia). The converted DNA was enriched for the VNN1 gene

promoter by PCR using primers (5’-CAGTGAGCCTGTACCCCTGG-3’ and 5’-

TATTCTCCCCATGTGTGTGCC-3’) flanking the predicted CpG island (between -758

to -496 upstream of the VNN1 transcriptional start site). The amplification products

were cloned into the pCR2.1-TOPO vector (Life Technologies Co., USA) for

sequencing (Australian Genome Research Facility, Perth, Western Australia).

Sequences were analysed using the online analysis program BISMA (Bisulfite

Sequencing DNA Methylation Analysis) (http://biochem.jacobs-

university.de/BDPC/BISMA/) using the default analysis parameters to identify the

146

methylation state of predicted CG di-nucleotides in CpG islands. Further bioinformatic

verification was performed using AliBaba 2.1 (http://www.gene-

regulation.com/pub/programs/alibaba2/index.html) in order to determine whether

transcription factor binding sites occured within CpG sites.

5.3 Results

5.3.1 Predicted CpG sites in the -587 promoter region of the VNN1 gene

To investigate DNA methylation in the VNN1 gene promoter, bioinformatic analysis

was performed in the region surrounding the -587 SNP, indicating the presence of a

CpG island. A 2 kb region upstream of the transcriptional start site of the VNN1 gene

was analysed using BISMA, which identified the presence of a CpG island composed of

10 CG di-nucleotides (Figure 5.1). Interestingly, the -587G allele was imbedded within

a CG di-nucleotide. However, the -587A allele ablates the CG-dinucleotide and reduces

the length of the CpG island (Figure 5.1). In fact, the predicted CpG island length was

264 bp in the presence of the -587G allele, but only 204 bp when the A allele was

present (Figure 5.1).

147

Figure 5.1: BISMA prediction of VNN1 CpG island length. Shown are bioinformatic predictions from Bisulfite Sequencing DNA Methylation Analysis (BISMA) (http://biochem.jacobs-university.de/BDPC/BISMA/) which interrogated the VNN1 gene promoter for potential CpG islands in the presence of different -587 alleles (R= G or A, in red). BISMA predicted the presence of 10 methylated CG di-nucleotides (grey boxes) in the VNN1 gene promoter. Black dots illustrate the predicted CpG island length of 264 bp when the -587G allele is present, with grey dots correlating with a CpG island length of 204 bp when the -587A allele is present. Of interest is that the presence of the -587A allele disrupts the formation of a CpG site which occurs in the presence of the -587G allele and therefore correlates with an allele-specific difference in the length (60 bp) of the CpG island.

148

5.3.2 Bisulfite sequencing of the -587 promoter region in the VNN1 gene and BISMA analysis

Seven genotyped cell lines from the SAFHS cohort, differing in their -587 genotype,

were assessed using bisulfite sequencing to determine the DNA methylation status of

the predicted VNN1 promoter CpG island (Figure 5.2). As shown in Figure 5.2, the -

587G site in both homozygotes and heterozygotes shows near complete cytosine

methylation (on the opposite strand). In fact, all CpG di-nucleotides displayed heavy

methylation, except at the -720 CpG site. For this site, the methylation pattern

correlated with the genotype at the -587 locus. In haplotypes carrying the -587G allele,

the -720 CpG site was generally un-methylated (38 out of 48 sequences). Conversely,

when the -587A allele was present, the -720 CpG site was almost exclusively cytosine

methylated (41 out of 42 sequences). This pattern was present even in the heterozygous

lines, indicating that the allele-specific pattern is not cell line specific but rather is

determined by genotype. The biological consequence of this is uncertain, but is likely

responsible for the -587 allelic differences apparent in chromatin arrangement. The

observed variability in unmethylated CpG sites (apart from the -720 site) may be

attributed to factors such as: clonal DNA amplification, incomplete bisulfite conversion

or non-CpG methylation (Zhang and Jeltsch, 2010).

149

Figure 5.2: Methylation of the -587 SNP. Bisulfite sequencing analysis of genotyped lymphoblastoid lines from the SAFHS population. Cell line designations and -587 genotype are shown on the left. Black boxes indicate DNA methylation, whereas white boxes indicate no methylation. “A” represents the alternate allele at -587 which is not a substrate for CpG methylation. The grey shading indicates heterozygous cell lines. Near complete methylation (41 out of 42 sequences) was seen at the -587G allele. Additionally, the methylation status at -587 appears to affect methylation at -720, a non-variant site.

150

5.3.3 Bioinformatic analysis of the -720 CpG site in the VNN1 gene promoter

The bisulfite sequencing results and bioinformatic analysis showed an association

between the methylation status of the -720 CpG site and the allele present at the -587

locus (Figure 5.2). To determine a potential functional mechanism of allele-specific

DNA methylation at the -720 CpG site, bioinformatic analysis was conducted using

AliBaba 2.1 (http://www.gene-regulation.com/pub/programs/alibaba2/index.html).

Analysis revealed a CCAAT/enhancer-binding protein beta (C/EBPβ) transcription

factor binding site at the -720 locus (Figure 5.3). Furthermore, the GATA-binding factor

1 (GATA-1) transcription factor was predicted to bind downstream of the -720 locus

(Figure 5.3).

151

Figure 5.3: Bioinformatic analysis of -720 CpG site in the VNN1 gene promoter. Shown is an adapted output from the online bioinformatic tool AliBaba 2.1 (Grabe, 2002), which interrogated the -720 promoter region in the VNN1 gene (red) for potential transcription factor binding sites. A CCAAT/enhancer-binding protein beta (C/EBPbeta) binding site (AliBaba score: 20) encompasses the -720 CpG site as shown, while a GATA-binding factor 1 (GATA -1) binding site (AliBaba score: 29) is predicted further downstream (3’ direction).

152

5.4 Discussion

Chromatin arrangement is dependent upon DNA sequence and methylation status

(Statham et al., 2012). A genetic variant may alter chromatin arrangement over a large

genomic region whereby alternate alleles affect chromatin arrangement and hence

differentially regulate gene expression. In the current study, the chromatin architecture

was assessed in the -587 promoter region of VNN1. The CHART-PCR and CHA-seq

results showed that the -587G variant resides within a region of condensed chromatin.

Conversely, when the A allele is present at the -587 locus, chromatin is less condensed.

Furthermore, methylation at the -587G allele was supported through BISMA analysis.

Additionally, the length of the CpG island surrounding the -587G allele is greater than

that surrounding the -587A allele. The results of the current study suggest that the

methylation status of the VNN1 gene promoter at the -587 SNP directly affect chromatin

structure and the access of the transcriptional machinery to this gene. Thus, the allele

present at the -587 SNP is likely to be functional and influence gene expression.

Interestingly, DNA methylation at a site 133 bp upstream (-720) of the -587 SNP

appears to be inversely correlated with the -587 genotype, suggesting that allele-

dependant chromatin changes can be seen at sites remote from the actual genetic

variation. Typically, the -587G allele was methylated whereas the -720 CpG site was

unmethylated. In contrast, the -587A allele was not methylated and was seemingly

associated with a methylated -720 CpG site. Kerkel et al. (2008) reported a similar

pattern of differential CpG methylation at the VNN1 -720 site in three haematopoietic

cell lines derived from the perifieral blood of non-specified adults. Using methylation-

sensitive SNP analysis (MSNP), VNN1 was identified as a gene regulated by allele-

specific DNA methylation (Kerkel et al., 2008). However, in contrast to the current

study, Kerkel et al. (2008) suggest that the genotype at -587 does not correlate with -

153

720 methylation status which was a novel finding in the current study. The absence of a

correlation in the Kerkel et al. (2008) study may be attributed to the difference in the

selection of experimental validation method, population-specific differences, or

unknown -137 genotype. In fact, the CHART-PCR and CHA-seq results are suggestive

that the -137 variant contributes to the differences in chromatin seen in the VNN1

promoter. However, given the almost complete linkage disequilibrium that exists

between the -587 and -137 loci in the SAFHS population (Göring et al., 2007), it is

unclear which genetic variant has a functional role in affecting VNN1 chromatin

arrangement. A possible solution may be to re-analyse the results of the Kerkel et al.

(2008) study by accounting for the -137 genotype, or to functionally assess a cell line

from the SAFHS cohort carrying -137 and -587 loci that are not in linkage

disequilibrium.

Besides the -720 CpG site, the bisulfite sequencing data showed overall methylation of

the VNN1 CpG island regardless of -587 genotype. Given the linkage disequilibrium

between -587 and -137 in the SAFHS population (Göring et al., 2007), it is possible that

the effects seen at the -720 site are due to allele-specific events at the -137 site. A

feasible mechanism is that allele-specific Sp1 transcription factor binding leads to

allele-specific changes to chromatin arrangement of an extended promoter region

upstream of the VNN1 gene, and consequently, differential gene expression.

The bisulfite sequencing and bioinformatic analysis showed that allele-specific changes

to chromatin arrangement of an extended VNN1 promoter region may correlate with

differential C/EBPβ binding at the -720 CpG site. C/EBPβ is a transcriptional activator

of genes involved in immune and inflammatory responses (Akira et al., 1990).

Previously, C/EBPβ binding has been experimentally demonstrated to be influenced by

154

DNA methylation in mammalian cell lines (Ikeda et al., 2008). Given that VNN1 is a

key regulator of responses to inflammatory stimuli (Min-Oo et al., 2007), C/EBPβ

represents a candidate activator of the VNN1 gene under the influence of allele-specific

DNA methylation. Experimental verification using ChIP-seq or EMSA could be

conducted at a later date to confirm the C/EBPβ interaction in the VNN1 promoter.

155

Chapter 6 – Experimental validation of candidate functional SNP using EMSA

6.1 Introduction

A number of candidate functional genetic variants were identified in VNN1 using CHA-

seq (Chapter 4), including the rs2267951 SNP which recorded the second-highest allele-

skew (35%), which could influence VNN1 expression. Previously, Göring et al. (2007)

used genome-wide transcriptional profiles and bioinformatic analyses to identify cis-

regulatory genetic variants in VNN1. However, there have been no reports that the

rs2267951 variant is a functional candidate. In the current study, EMSA was used to

experimentally validate the functionality of the rs2267951 SNP and to substantiate the

veracity of the CHA-seq approach.

In order to provide a positive control for allele-specific nuclear factor binding in EMSA,

a candidate functional SNP (‘-191T/C’) in the Presenilin Associated Rhomboid-Like

(PARL, formally PSARL) gene promoter was selected based on strong genetic

association with mitochondrial content levels (Curran et al., 2009). Following

functional analysis of the -191T/C SNP, EMSA and bioinformatic analysis was applied

to experimentally validate the CHA-seq results for the rs2267951 VNN1 SNP.

6.2 Methods

A detailed description of the EMSA methodology is shown in Section 2.9.2 (Chapter 2).

Of note, complementary 5' biotinylated oligonucleotides representing both allelic forms

of the promoter variants were obtained commercially from Sigma-Aldrich Pty Ltd

(Sigma, Australia). The sequences of the EMSA oligonucleotides are shown in Table

156

6.1. Nuclear extract preparation, binding reactions and gel analysis and detection were

performed as described previously (Section 2.9.2).

For EMSA assessment at the -191T/C SNP, nuclear extract was isolated from the Jurkat

E6-1 cell line (ATCC #TIB-152) (Curran et al., 2009), whereas nuclear extract was

isolated from the ‘0331’ SAFHS cell line for the rs2267951 SNP. The 0331 cell line

was selected as it was previously used for CHA-seq experiments (Chapter 4).

157

Table 6.1: EMSA oligonucleotides for PARL and VNN1 SNPs

Electrophoretic Mobility Shift Assays (EMSA) were performed using sets of sequence-specific oligonucleotides obtained from Sigma-Aldrich Pty Ltd (Sigma, Australia) of the sequence shown (“DNA SEQUENCE/MODIFICATION”). Oligonucleotides were designed to target the -191 PARL SNP (rows 1-4) and the rs2267951 VNN1 SNP (rows 5-8). Two pairs of allelic oligonucleotides were required for each DNA-protein binding reaction targeting a particular SNP. Each pair consisted of two complementary base sequences (forward from 5’ to 3’ and reverse from 3’ to 5’) that differ only by the SNP allele (as shown in bold). The forward oligonucleotide from each complementary pair also possessed a 5’ biotin modification to facilitate visualisation of DNA-protein binding complexes. Forward (FWD) and reverse (REV) primers are indicated by their respective suffix. Pairs used in EMSA assays are grouped together and share the same prefix. OLIGONUCLEOTIDE

DESIGNATION

DNA Sequence/Modification (5’-3’)

-191 EMSA (RARE ALLELE) FWD

5’ Biotin – GAGCCAGCTGAGCAGCGGGAGGGGAAGCGGT 3’

-191 EMSA (RARE ALLELE) REV

5’ ACCGCTTCCCCTCCCGCTGCTCAGCTGGCTC 3’

-191 EMSA (COMMON ALLELE) FWD

5’ Biotin – GAGCCAGCTGAGCAGTGGGAGGGGAAGCGGT 3’

-191 EMSA (COMMON ALLELE) REV

5’ ACCGCTTCCCCTCCCACTGCTCAGCTGGCTC 3’

RS2267951 EMSA (T

ALLELE) FWD

5’ – Biotin – CTGAGATTGGATTTTGTGGGGTGAGGTGAGT- 3’

RS2267951 EMSA (T

ALLELE) REV

5’- ACTCACCTCACCCCACAAAATCCAATCTCAG -3’

RS2267951 EMSA (C

ALLELE) FWD

5’ – Biotin – CTGAGATTGGATTTCGTGGGGTGAGGTGAGT- 3’

RS2267951 EMSA (C

ALLELE) REV

5’ – ACTCACCTCACCCCACGAAATCCAATCTCAG -3’

158

6.3 PARL Results

6.3.1 EMSA and bioinformatic analysis of the -191 SNP in the PARL gene promoter

To provide a positive control for allele-specific nuclear factor binding, EMSA was

performed by comparing sequences that differed only at the -191T/C nucleotide. The

results shown in Figure 6.1 suggests that the -191 SNP did affect binding of nuclear

factors, with a major complex preferentially binding to the -191T allele in nuclear

extract conditions (lane 4). This complex is absent in the -191C sample (lane 3). In

order identify transcription factor binding sites within the PARL gene promoter,

bioinformatic analysis using P-Match was performed, and revealed that the -191T allele

is embedded in a Sp1 binding site (TRANSFAC 6.0 Core similarity = 0.957, matrix

similarity = 0.967). In contrast, the -191C SNP disrupts the core sequence. These results

suggest that the -191 SNP contributes to allele-specific differences in PARL expression,

possibly via Sp1-mediated regulation.

159

Figure 6.1: EMSA for the -191 PARL SNP. Shown are the results of the ESMA that compares the gel-migratory patterns of -191T/C specific oligos in the absence of nuclear extract (lane 1 and 2), and in the presence of Jurkat lymphocyte nuclear extract (lanes 3 and 4) and analysed on a 6% non-denaturing polyacrylamide gel. A difference in banding patterns is observed between lanes 3 and 4 which differ in -191 alleles indicate the presence of an additional complex (indicated by arrowhead) in the presence of T allele (lane 4) under nuclear extract conditions. In contrast, this band was not seen in the presence of the C allele (lane 3). No complex was observed without the presence of nuclear extract (lanes 1 and 2).

160

6.4 VNN1 Results

6.4.1 Bioinformatic analysis of the rs2267951 VNN1 SNP

To determine whether the rs2267951 SNP is embedded within a transcription factor

binding site, bioinformatic analysis was conducted using AliBaba 2.1 (TRANSFAC 7.0)

(http://www.gene-regulation.com/pub/programs/alibaba2/index.html) which predicted

C/EBPα and (yeast-specific) MIG1 binding sites specifically at the T allele (Figure 6.2).

Conversely, no transcription factor binding sites were predicted at the rs2267951 locus

in the presence of the C allele. However, Sp1 was predicted to bind downstream (3’) of

both alleles (Figure 6.2).

161

A

B

Figure 6.2: Bioinformatic analysis of the rs2267951 SNP in the VNN1 gene. Shown is the output from the online bioinformatic tool AliBaba 2.1 (Grabe, 2002), which interrogated the VNN1 gene region (red text) surrounding the rs2267951 SNP (labelled) for potential transcription factor binding sites. As shown in Figure 6.2A, three transcription factor binding sites were predicted within VNN1 in the presence of the T allele of the rs2267951, including CCAAT/enhancer-binding protein alpha (C/EBPalp)(AliBaba score: 23), MIG1 (AliBaba score: 27) and Specificity Protein 1 (Sp1) (AliBaba score: 34). As shown in Figure 6.2B is the AliBaba 2.1 output which identified a Sp1 binding site (AliBaba score: 34) in the VNN1 gene region (red text) in which the C allele of the rs2267951 is embedded. Note that the T allele introduces two transcription factor binding sites (C/EBPalpa, MIG1) which were not observed in the presence of the C allele.

162

6.4.2 EMSA analysis of the rs2267951 VNN1 SNP

To determine whether the rs2267951 SNP was associated with allele-specific DNA-

protein binding in vitro, as observed in the positive control (-191 PARL SNP), EMSA

was undertaken on sequences that differed by only the single rs2267951 nucleotide. The

results shown in Figure 6.3B indicated that the rs2267951 SNP does affect binding of

nuclear factors, with a major complex preferentially binding to the T allele in nuclear

extract derived from the SAFHS cohort (lane 4). This complex was absent in the

presence of the C allele (lane 3). A similar EMSA result was observed at the -191T/C

nucleotide (Figure 6.3A).

163

A B

Figure 6.3: EMSA for the rs2267951 VNN1 SNP relative to the -191 PARL SNP. Shown are the results of the ESMA that compares the gel-migratory patterns of rs2267951-specific oligonucleotides (Figure 6.3B on right) in the absence of nuclear extract (lane 1 and 2) and in the presence of SAFHS (cell line ‘0331’) nuclear extract (lane 3 and 4) relative to the positive control at the -191 PARL SNP (Figure 6.3A on left). Figure 6.3A shows a difference in banding patterns between lanes 3 and 4 which differ in -191 alleles. An additional complex (indicated by arrowhead) is seen the presence of T allele (lane 4) under nuclear extract conditions in contrast to the C allele (lane 3), where no band is present. No complex was observed without the presence of nuclear extract (lane 1 and 2). Figure 6.3B shows a difference in banding patterns between lanes 3 and 4 which differ in rs2267951 alleles. An additional complex (indicated by arrowhead) is seen in the presence of T allele (lane 4) under nuclear extract conditions, in contrast to the C allele (lane 3), where this band is not present. No complex was observed without nuclear extract (lane 1 and 2).

164

6.5 Discussion In the current study, two genetic variants (-191T/C PARL SNP and rs2267951 VNN1

SNP) were validated for functionality at the molecular level using EMSA and

bioinformatic analysis. The purpose of these analyses was to verify the veracity of the

CHA-seq results for the rs2267951 SNP in the VNN1 gene.

EMSA results and bioinformatic analyses where suggestive that the presence of the

common -191T SNP introduces a Sp1 transcription factor binding site into the PARL

promoter. This binding site creates a transcription domain that could regulate PARL

expression when the T allele is present. This is verified by the statistical analysis which

indentifies an association of the -191T/C SNP with mitochondrial content levels (Curran

et al., 2009). Taken together, the -191T/C SNP represents a functional genetic variant

and has been used in this current study as a positive control for allele-specific nuclear

factor binding.

For VNN1, the EMSA results show the presence of an additional DNA-protein complex

in the presence of the T allele at the rs2267951 locus. This result confirms that the

rs2267951 variant would be functional in CHA-seq. Bioinformatic analysis suggested a

CCAAT/enhancer-binding protein alpha (C/EBPα) transcription factor binding site in

the VNN1 gene in the presence of the T allele. C/EBPα is a transcription factor that is

encoded by the CEBPA gene in humans (Dufour et al., 2009; Szpirer et al., 1992; Cao

et al., 1991). The C/EBPα protein contains the basic Leucine Zipper Domain (bZIP

domain), which permits the binding to target promoters and enhancers (Dufour et al.,

2009; Ellenberger, 1994). For example, C/EBPα has been associated with body weight

homeostasis as a transcription enhancer (Miller et al., 1996). A potential consequence of

allele-specific C/EBPα regulation would be increased VNN1 expression in the presence

of the T allele. Given the association between body weight and cardiovascular disease

165

(Lavie et al., 2009; Berrington de Gonzalez et al., 2010), the rs2267951 SNP may

confer allele-specific functional differences in cardiovascular phenotypes.

The results of EMSA provide further experimental evidence of function for the

rs2267951 variant, which is supported by the bioinformatic analysis. This analysis

revealed an allele-specific C/EBPα binding site in the presence of the T allele in VNN1.

Therefore, the EMSA and bioinformatic results are suggestive that CHA-seq is a

reasonable measure of function for genetic variants.

166

Chapter 7 – General discussion and future research directions

7.1 CHA-seq contribution to functional research Reporter gene and gel shift assays have been used successfully over the last two decades to

determine the functionality of genetic variants. However, both approaches possess

limitations that affect their utility. For example, these assays lack scalability and do not

provide the proper chromatin context and hence are not always representative of function in

an in vivo context. In order to overcome limitations associated with current functional

assays, two novel in vivo assays (T-CHA-seq and CHA-seq) were developed as part of

the current study to provide a reasonable alternative for functional assessment of genetic

variants. The CHA-seq approach was developed in order to quantitate differences in

chromatin arrangement as a measure of functional differences at genetic variants by

extending CHART-PCR into a genome-wide format using next-gen DNA sequencing.

The CHART-PCR method has been refined to enable cell-type and allele-specific

comparisons of chromatin accessibility associated with genetic variants (Cruikshank et

al., 2012), however, the assay is not efficiently scalable in the current format.

Traditionally, CHART-PCR consists of the targeted amplification of specific genomic

regions of interest. However, this process requires significant time and resources to

characterise entire genes or genomes and thus is limited by its ability to analyse short

target sequences. To overcome this limitation, next-gen sequencing was incorporated as

part of the CHA-seq approach in order to provide a cost- and time-efficient alternative

for the quantification of chromatin accessibility at genetic variants across extended

genomic regions. The CHA-seq approach has been used successfully to analyse the

Vanin gene family in the current study. Other CHA-seq features of this approach

include the ability to: generate a high-resolution DNA sequence map specifying the

167

alleles of genetic variants within a region or genome of interest, identify novel SNPs,

map nucleosome positions relative to genetic variants, and to functionally assess genetic

variants through interpretation of allele-specific biases in sequence read numbers

(‘allele skew’).

Chromatin accessibility is considered a reliable measure of the functionality of genetic

variants (Mikkelson et al., 2007; Birney et al., 2010; Pang et al., 2013) and has been

used extensively in the current research to identify and characterise candidate functional

genetic variants in the VNN1 gene. Traditionally, chromatin arrangement, as measured

by CHART-PCR, is calculated as a ratio of relative abundance of digested to undigested

amplicon and expressed as a percentage of cutting within a genomic region. In contrast,

chromatin arrangement, as measured by the CHA-seq approach, is calculated as an

allele-specific bias in sequence read numbers at genetic variants. At the -137 and -587

VNN1 SNPs, both CHA-seq and CHART-PCR results showed matching allele-specific

chromatin arrangement, which is indicative that DNA sequencing is a viable alternative

for the quantification of functional genetic variants.

The selection of chromatin-sensitive nuclease is an important feature in the

interpretation of CHA-seq results, since nucleases may differ in recognition sites and

mode of nucleolytic digestion. While MNase has been used to map nucleosome

occupancies and positions (Rizzo et al., 2012), DNase I is commonly used to map gene

promoter regions for chromatin arrangement, potential transcription factor binding sites

and areas of hypersensitivity (Boyle et al., 2011; Drouin et al., 1997). Consequently, the

selection of nuclease for the CHA-seq approach was a means for studying specific

functional properties of genetic variants. For example, DNase I was specifically selected

in T-CHA-seq to assess chromatin accessibility associated with the allele-specific Sp1

168

binding at the -137 locus (Göring et al., 2007). As such, the use of multiple nucleases

can provide complementary information regarding the chromatin structure in a

particular region. This allows a more conclusive interpretation of the specific functional

mechanism of genetic variants.

7.2 Contribution to VNN1 research A significant output of the current study was the characterisation of three experimentally

verified functional genetic variants, namely rs4897612 (-137), rs2050153 (-587) and

rs2267951 in the VNN1 gene. Taken together, the results from CHART-PCR (Chapter 3)

and CHA-seq (Chapter 4) showed that the -137 and -587 SNPs within the VNN1 promoter,

previously predicted to be functional through extensive statistical interrogation of transcript

levels from the SAFHS cohort (Göring et al., 2007), were actually functional due to allele-

specific chromatin accessibility. Previously, Göring et al. (2007) used EMSA to show the

presence of allele-specific DNA-protein complexes in the presence of the -137T allele that

are consistent with the ability to preferentially bind the activating transcription factor Sp1

and the allele-specific chromatin accessibility observed in CHART-PCR (Chapter 3). In

contrast, Göring et al. (2007) reported that the -587 alleles displayed no differences in

transcription factor binding using EMSA. However, the application of the CHART-PCR

and CHA-seq methods showed significant allelic differences in chromatin accessibility

surrounding the -587 SNP and consequently indicated the functional potential at this variant

in addition to the -137 SNP. As the -587G allele is contained within a CG di-nucleotide and

hence is a potential site of cytosine methylation, BISMA analysis of the region surrounding

the -587 SNP revealed allele-specific differences in the length of the CpG island predicted

in the VNN1 promoter. Assessment of the methylation status of the region surrounding the -

587 SNP, using bisulfite sequencing, showed an allele-specific methylation pattern at a

single CpG site (-720 locus) located upstream of the -587 SNP. This indicated that allele-

specific chromatin differences can be seen at sites remote from the causal genetic variant(s).

169

Similar results have been previously reported in a study by Kerkel et al. (2008) in which

three haematopoietic cell samples showed a similar pattern of differential CpG methylation

at the -720 site to that in the SAFHS cell lines used in the current study. However, in

contrast to the results of the current research, the results of Kerkel et al. (2008) suggest that

the -587 genotype does not correlate with the -720 methylation status. Given the strong

linkage disequilibrium between the -587 and -137 SNPs in the SAFHS cohort, as reported

by Göring et al. (2007), it is possible that the effects seen at the -720 site are due to allele-

specific events at the -137 locus. Potentially, the allele-specific DNA methylation may be

caused by allele-specific Sp1 transcription factor binding, resulting in allele-specific

chromatin arrangement of an extended promoter region upstream of the VNN1 gene. Such

an occurrence could lead to differential gene expression. Allele-specific changes to

chromatin arrangement within an extended genomic region have been demonstrated

previously by Shoemakers et al. (2010) and Zhang, Y et al. (2009).

Given the nature of the functional effects of the -137 and -587 variants, it is possible that

the effects of these variants are independent. However, given the linkage disequilibrium

observed between them, it is more likely that the two functional variants are part of a

haplotype that regulates genotype-specific VNN1 expression. This interaction may be

complicated by the presence of additional functional genetic variants (CHA-seq candidates

in Chapter 4) located in regions distinct from the VNN1 gene promoter. For instance, the

CHA-seq results were suggestive that the rs2267951 VNN1 SNP to be functional based on

a significant allele-skew, which was subsequently verified using EMSA and

bioinformatic analysis supporting the presence of allele-specific C/EBPα binding at this

site. Potentially, functional haplotypes containing multiple functional genetic variants

may exist in genes other than VNN1.

170

Given that VNN1 appears to play a significant role in determining levels of ‘protective’

HDL-cholesterol (Göring et al., 2007; Jacobo-Albavera et al., 2012), functional genetic

variants associated with high levels of VNN1 gene expression, such as the -137 and -587

SNPs, may lead to reduced HDL-C levels which is known to contribute to an increased

risk of CVD.

7.3 Future research directions

The elucidation of functional genetic variants that alter human phenotypes, such as

disease susceptibility, is essential to characterising the relationship between genotype

and phenotype (Stranger et al., 2011; Cooper and Shendure, 2011). Although

comprehensive catalogues of human genetic variation exist, the identification of causal

functional variants in particular phenotypes or disease has proven difficult (Cooper and

Shendure, 2011). Perhaps the most comprehensive catalogue of functional genetic

variants is the ENCODE database which was developed in order to improve the

annotation of functional genetic variation in the human genome (Birney et al., 2007;

Rosenbloom et al., 2012). While ENCODE predictions are useful in facilitating

hypothesis generation and improving the prioritisation of variants, they are insufficient

for verifying the biological function and interactions of regulatory variants in the

phenotype of different cells and tissues when not supported by other data (Rosenbloom

et al., 2012). For example, a candidate functional variant may be predicted by the

ENCODE database to disrupt a consensus transcription factor binding motif, however,

these predictions are limited without experimental validation to verify allele-specific

binding and the associated biological consequence. Consequently, functional assays,

such as CHA-seq and T-CHA-seq that incorporate prioritisation and experimental

validation may represent a more time and cost-efficient alternative to greatly improve

the rate of discovery of functional genetic variants in phenotypes of interest.

171

The results of the current study are suggestive that CHA-seq is a time and cost-effective

in vivo approach for the identification of functional variation in extended genomic

regions. A key recommendation for the application of CHA-seq to cell lines possessing

genetic variants of interest is to incorporate one or more chromatin-sensitive nucleases

in the CHA-seq approach, followed by the application of T-CHA-seq to interrogate the

effects of individual genetic variants. The incorporation of additional experimental

validation tests, such as EMSA and bisulfite sequencing, will enhance the positive

identification of functional variants by providing further supporting evidence. Further

CHA-seq experiments will ultimately involve the generation of whole genome maps

across different populations. The application of CHA-seq to populations comprised of

familial groups would enable confirmation that observed allele-specific patterns are

heritable. The identification of heritable functional variants, in contrast to cell or

individual-specific functional variants, would further establish the veracity of the CHA-

seq approach to identify true functional variants.

Previous Vanin research used transcript levels as an endophenotype to identify VNN1 as

a quantitative trait locus associated with high density lipoprotein–cholesterol levels and

cardiovascular phenotypes (Göring et al., 2007). More recently, genetic variants within

VNN1 have been found to be associated with cardiovascular disease (Jacobo-Albavera

et al., 2012; Fava et al., 2010; Zhu and Cooper, 2007). However, limited experimental

validation has been performed to determine specific functional variants in VNN1

(Göring et al., 2007; Kerkel et al., 2008). This current research, has contributed to the

number of experimentally validated functional genetic variants in VNN1 through the

application of the novel CHA-seq assay.

172

The research described in this thesis was able to establish that allele-specific chromatin

differences can be seen at sites distinct from the actual genetic variant, and suggested

the presence of co-operative functional genetic variants within the VNN1 gene. As

functional variants may operate in a cell or tissue-dependent manner (Dimas et al.,

2009), the findings of the current study would require further interrogation using

different cell types, to elucidate whether the mechanisms of action associated with

functional variants are ubiquitous or specific to certain cells.

Of interest in the current study was the presence of a periodicity in densely populated

and under-populated regions with respect to SNP-containing CHA-seq reads over

extended genomic regions. It is not known whether this periodicity is a direct reflection

of large-scale chromatin arrangement across the VNN1 gene or a consequence of library

preparation. In either case, this observation requires further testing in order to optimise

CHA-seq for entire genomes.

The importance of characterising genetic variation is becoming apparent due to the

increasing number of functional variants associated with a wide array of diseases

(Stranger et al., 2011). Consequently, the knowledge of an individual’s SNP genotype

may provide a basis for assessing susceptibility to disease and the subsequent selection

of optimal treatments (Shastry, 2007). Alternatively, these functional genetic variants

may facilitate the development of customised genetic therapies. Functional VNN1

variants identified in the present study may represent potential targets for therapeutic

intervention, or represent genetic markers for cardiovascular disease susceptibility.

Ultimately, the long term goal in this field of research is to be able to predict disease

susceptibility based on genotype and knowledge of which genetic variants are

functional (Cooper and Shendure, 2011).

173

References

Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., McVean, G.A., Altshuler, D.M., et al. (2012). "An integrated map of genetic variation from 1,092 human genomes." Nature 491(7422): 56–65. Akira, S., Isshiki, H., Sugita, T., Tanabe, O., Kinoshita, S., Nishio, Y., Nakajima, T., Hirano, T. and Kishimoto, T. (1990). "A nuclear factor for IL-6 expression (NF-IL6) is a member of a C/EBP family." The EMBO Journal 9(6): 1897-1906. Alam, J. and Cook, J.L. (1990). "Reporter genes: Application to the study of mammalian gene transcription." Analytical Biochemistry 188(2): 245-254. Albers, C.A., Paul, D.S., Schulze, H., Freson, K., Stephens, J.C., Smethurst, P.A., Jolley, J.D., Cvejic, A., Kostadima, M., Bertone, P., et al. (2012). "Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome." Nature Genetics 44(4): 435-439. Alkan, C., Coe, B.P. and Eichler, E.E. (2011). "Genome structural variation discovery and genotyping." Nature Reviews Genetics 12(5): 363-376. Altshuler, D., Daly, M.J. and Lander, E.S. (2008). "Genetic mapping in human disease." Science 322(5903): 881–888. Altshuler, D.M., Lander, E.S., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T.J., Gabriel, S.B., Jaffe, D.B., Shefler, E., Sougnez, C.L., et al. (2010). "A map of human genome variation from population-scale sequencing." Nature 467(7319): 1061–1073. Ameur, A., Rada-Iglesias, A., Komorowski, J. and Wadelius, C. (2009). "Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP." Nucleic Acids Research 37(12): e85-e85. An, W., Leuba, S., Van Holde, K. and Zlatanova, J. (1998). "Biochemistry Linker histone protects linker DNA on only one side of the core particle and in a sequence-dependent manner." Proceedings of the National Academy of Sciences 95(7): 3396-3401. Anderson, M.A., Tanzi, R.E., Watkins, P.C., Ottina, K., Wallace, M.R., Sakaguchi, A.Y., Youngh, A.B., Shoulsonh, I. and Bonillah, E. (2004). "A polymorphic DNA marker genetically linked to Huntington's disease." Landmarks in Medical Genetics:

Classic Papers with Commentaries 306(51): 153. Aniba, M.R., Poch, O. and Thompson, J.D. (2010). "Issues in bioinformatics benchmarking: the case study of multiple sequence alignment." Nucleic Acids Research 38(21): 7353-7363. Aurrand-Lions, M., Galland, F., Bazin, H., Zakharyev, V., Imhof, B. and Naquet, P. (1996). "Vanin 1, a novel GPI-linked perivascular molecule involved in thymus homing." Immunity 5(5): 391–405.

174

Auton, A. and McVean, G. (2012). "Estimating Recombination Rates from Genetic Variation in Humans." Evolutionary Genomics: Methods in Molecular Biology 856: 217-237. Axel, R. (1975). "Cleavage of DNA in nuclei and chromatin with staphylococcal nuclease." Biochemistry 14(13): 2921–2925. Bakir-Gungor, B. and Sezerman, O.U. (2011). "A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context." PLoS ONE 6(10): e26277. Baral, A., Kumar, P., Halder, R., Mani, P., Yadav, V.K., Singh, A., Das, S.K. and Chowdhury, S. (2012). "Quadruplex-single nucleotide polymorphisms (Quad-SNP) influence gene expression difference among individuals." Nucleic Acids Research 40(9): 3800–3811. Baret, J.-C., Beck, Y., Billas-Massobrio, I., Moras, D. and Griffiths, A.D. (2010). "Quantitative Cell-Based Reporter Gene Assays Using Droplet-Based Microfluidics." Chemistry & Biology 17(5): 528-536. Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I. and Zhao, K. (2007). "High-resolution profiling of histone methylations in the human genome." Cell 129(4): 823-837. Bartoszewski, R.A., Jablonsky, M., Bartoszewska, S., Stevenson, L., Dai, Q., Kappes, J., Collawn, J.F. and Bebok, Z. (2010). "A synonymous single nucleotide polymorphism in ∆F508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein." The Journal of Biological Chemistry 285(37): 28741–28748. Baudat, F., Buard, J., Grey, C., Fledel-Alon, A., Ober, C., Przeworski, M., Coop, G. and De Massy, B. (2010). "PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice." Science 327(5967): 836-840. Bekris, L.M., Millard, S.P., Galloway, N.M., Vuletic, S., Albers, J.J., Li, G., Galasko, D.R., DeCarli, C., Farlow, M.R., Clark, C.M., et al. (2008). "Multiple SNPs Within and Surrounding the Apolipoprotein E Gene Influence Cerebrospinal Fluid Apolipoprotein E Protein Levels." Journal of Alzheimer's Disease 13(3): 255–266. Bell, O., Tiwari, V.K., Thomä, N.H. and Schübeler, D. (2011). "Determinants and dynamics of genome accessibility." Nature Reviews Genetics 12(8): 554-564. Benoukraf, T., Wongphayak, S., Hadi, L.H.A., Wu, M. and Soong, R. (2013). "GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data." Nucleic Acids Research 41(4): e55. Berrington de Gonzalez, A., Hartge, P., Cerhan, J., Flint, A., Hannan, L., MacInnis, R., Moore, S., Tobias, G., Anton-Culver, H., Freeman, L., et al. (2010). "Body mass index and mortality among 1.46 million white adults." The New England Journal of Medicine 363(23): 2211–2219. Berruyer, C., Pouyet, L., Millet, V., Martin, F., LeGoffic, A., Canonici, A., Garcia, S., Bagnis, C., Naquet, P. and Galland, F. (2006). "Vanin-1 licenses inflammatory mediator

175

production by gut epithelial cells and controls colitis by antagonizing peroxisome proliferator-activated receptor gamma activity." The Journal of experimental medicine 203(13): 2817–2827. Bertina, R.M., Koeleman, B.P.C., Koster, T., Rosendaal, F.R., Dirven, R.J., de Ronde, H., van der Velden, P.A. and Reitsma, P.H. (1994). "Mutation in blood coagulation factor V associated with resistance to activated protein C." Nature 369(6475): 64 - 67. Bird, A. (2002). "DNA methylation patterns and epigenetic memory." Genes and

Development 16(1): 6-21. Birney, E., John A. Stamatoyannopoulos, Anindya Dutta, Roderic Guigó, Thomas R. Gingeras, Elliott H. Margulies, Weng, Z. and al., e. (2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project." Nature 447(7146): 799-816. Birney, E., Lieb, J., Furey, T., Crawford, G. and Iyer, V. (2010). "Allele-specific and heritable chromatin signatures in humans." Human Molecular Genetics 19(2): 204–209. Bodmer, W. and Bonilla, C. (2008). "Common and rare variants in multifactorial susceptibility to common diseases." Nature Genetics 40(6): 695–701. Bonetti, S., Trombetta, M., Linda, B., Dauriz, M., Malerba, G., Trabetti, E., Pignatti, P., Bonora, E. and Bonadonna, R. (2012). "Common variants of monogenic diabetes genes may affect [beta] cell functional mass in patients with newly diagnosed type 2 diabetes." Endocrine Abstracts 29: OC3.1. Borecki, I.B. and Province, M.A. (2008). "Genetic and Genomic Discovery Using Family Studies." Circulation 118(10): 1057-1063. Botstein, D. and Risch, N. (2003). "Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease." Nature Genetics 33: 228–237. Boumber, Y.A., Kondo, Y., Chen, X., Shen, L., Guo, Y., Tellez, C., Estecio, M.R., Ahmed, S. and Issa, J.P. (2008). "An Sp1/Sp3 binding polymorphism confers methylation protection." PLoS Genetics 4(8): e1000162. Boyes, J. and Bird, A. (1991). "DNA methylation inhibits transcription indirectly via a methyl CpG binding protein." Cell 64(6): 1123-1134. Boyle, A., Song, L., Lee, B.-K., London, D., Keefe, D., Birney, E., Iyer, V., Crawford, G. and Furey, T. (2011). "High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells." Genome Research 21(3): 456-464. Browning, S.R. and Browning, B.L. (2013). "Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort." Human Genetics 132(2): 129-138. Bulanenkova, S., Kozlova, A., Kotova, E., Snezhkov, E., Azhikina, T., Akopov, S., Nikolaev, L. and Sverdlov, E. (2011). "Dam methylase accessibility as an instrument for analysis of mammalian chromatin structure." Epigenetics 6(9): 1078 – 1084.

176

Bulik-Sullivan, B.K. and Sullivan, P.F. (2012). "The authorship network of genome-wide association studies." Nature Genetics 44(2): 113. Bustin, S.A., Benes, V., Nolan, T. and Pfaffl, M.W. (2005). "Quantitative real-time RT-PCR – a perspective." Journal of Molecular Endocrinology 34(3): 597–601. Butter, F., Davison, L., Viturawong, T., Scheibe, M., Vermeulen, M., Todd, J.A. and Mann, M. (2012). "Proteome-Wide Analysis of Disease-Associated SNPs That Show Allele-Specific Transcription Factor Binding." PLoS Genetics 8(9): e1002982. Callinan, P.A. and Feinberg, A.P. (2006). "The emerging science of epigenomics." Human Molecular Genetics 15(suppl 1): R95-R101. Cambien, F. and Tiret, L. (2007). "Genetics of Cardiovascular Diseases From Single Mutations to the Whole Genome." Circulation 116(15): 1714-1724. Cantor, R.M., Lange, K. and Sinsheimer, J.S. (2010). "Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application." The

American Journal of Human Genetics 86(1): 6–22. Cao, Z., Umek, R. and McKnight, S. (1991). "Regulated expression of three C/EBP isoforms during adipose conversion of 3T3-L1 cells." Genes and Development 5(9): 1538–1552. Carey, M., Peterson, C. and Smale, S. (2009). Chromatin Immunoprecipitation (ChIP). Cold Spring Harbor, New York. Carey, M., Peterson, C. and Smale, S. (2009). Transcriptional Regulation in

Eukaryotes: Concepts, Strategies, and Techniques. Cold Spring Harbor, New York. Caro, E., Stroud, H., Greenberg, M.V.C., Bernatavichute, Y.V., Feng, S., Groth, M., Vashisht, A.A., Wohlschlegel, J. and Jacobsen, S.E. (2012). "The SET-Domain Protein SUVR5 Mediates H3K9me2 Deposition and Silencing at Stimulus Response Genes in a DNA Methylation–Independent Manner." PLoS Genetics 8(10): e1002995. Casto, A.M. and Feldman, M.W. (2011). "Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations?" PLoS Genetics 7(1): e1001266. Chavan, S.V., Maitra, A., Roy, N., Patwardhan, S., Chavan, P.R..(2010). “Genetic variants in the distal enhancer region of the PSA gene and their implication in the occurrence of advanced prostate cancer” Molecular Medicine Reports 3(5): 837-843. Chekmenev, D.S., Haid, C. and Kel, A.E. (2005). "P-Match: transcription factor binding site search by combining patterns and weight matrices." Nucleic Acids Research 33(suppl 2): W432–W437. Cheng, K.C.-C. and Inglese, J. (2012). "A coincidence reporter-gene system for high-throughput screening." Nature Methods 9(10): 937-937. Cho, D.-Y., Kim, Y.-A. and Przytycka, T.M. (2012). "Network Biology Approach to Complex Diseases." PLoS Computational Biology 8(12): e1002820.

177

Chung, H.-R., Dunkel, I., Heise, F., Linke, C., Krobitsch, S., Ehrenhofer-Murray, A.E., Sperling, S.R. and Vingron, M. (2010). "The Effect of Micrococcal Nuclease Digestion on Nucleosome Positioning Data." PLoS ONE 5(12): e15754. Coetzee, G.A., Jia, L., Frenkel, B., Henderson, B.E., Tanay, A., Haiman, C.A. and Freedman, M.L. (2010). "A systematic approach to understand the functional consequences of non-protein coding risk regions." Cell Cycle 9(2): 47–51. Cohen, J.C., Kiss, R.S., Pertsemlidis, A., Marcel, Y.L., McPherson, R. and Hobbs, H.H. (2004). "Multiple rare alleles contribute to low plasma levels of HDL cholesterol." Science 305(5685): 869–872. Cohen, J.C., Pertsemlidis, A., Fahmi, S., Esmail, S., Vega, G.L., Grundy, S.M. and Hobbs, H.H. (2006). "Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels." Proceedings of the National

Academy of Sciences 103(6): 1810-1815. Collas, P. (2010). "The Current State of Chromatin Immunoprecipitation." Molecular

Biotechnology 45(1): 87-100. Collins, F., Guyer, M. and Charkravarti, A. (1997). "Variations on a theme: cataloging human DNA sequence variation." Science 278(5343): 1580–1581. Collins, F.S., Morgan, M. and Patrinos, A. (2003). "The Human Genome Project: lessons from large-scale biology." Science 300(5617): 286-290. Conde, L., Bracci, P.M., Richardson, R., Montgomery, S.B. and Skibola, C.F. (2013). "Integrating GWAS and expression data for functional characterization of disease-associated SNPs: an application to follicular lymphoma." The American Journal of

Human Genetics 92(1): 126-130. Conrad, D.F., Keebler, J.E., DePristo, M.A., Lindsay, S.J., Zhang, Y., Casals, F., Idaghdour, Y., Hartl, C.L., Torroja, C., Garimella, K.V., et al. (2011). "Variation in genome-wide mutation rates within and between human families." Nature Genetics 43(7): 712–714. Cookson, W., Liang, L., Abecasis, G., Moffatt, M. and Lathrop, M. (2009). "Mapping complex disease traits with global gene expression." Nature Reviews Genetics 10(3): 184–194. Cooper, G.M. and Shendure, J. (2011). “Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data.” Nature Reviews Genetics 12(9): 628-640. Cowie, A., Cheng, J., Sibley, C.D., Fong, Y., Zaheer, R., Patten, C.L., Morton, R.M., Golding, G.B. and Finan, T.M. (2006). "An Integrated Approach to Functional Genomics: Construction of a Novel Reporter Gene Fusion Library for Sinorhizobium meliloti." Applied and Environmental Microbiology 72(11): 7156–7167. Crews, K.R., Hicks, J.K., Pui, C.-H., Relling, M.V. and Evans, W.E. (2012). "Pharmacogenomics and Individualized Medicine: Translating Science Into Practice." Clinical Pharmacology & Therapeutics 92(4): 467–475.

178

Cruickshank, M., Fenwick, E., Abraham, L. and Ulgiati, D. (2008). "Quantitative differences in chromatin accessibility across regulatory regions can be directly compared in distinct cell-types." Biochemical and Biophysical Research

Communications 367(2): 349-355. Cruickshank, M., Karimi, M., Mason, R., Fenwick, E., Mercer, T., Tsao, B., Boackle, S. and Ulgiati, D. (2012). "Transcriptional effects of a lupus-associated polymorphism in the 5′ untranslated region (UTR) of human complement receptor 2 (CR2/CD21)." Molecular Immunology 52(3): 165–173. Cruickshank, M.N., Fenwick, E., Karimi, M., Abraham, L.J. and Ulgiati, D. (2009). "Cell- and stage-specific chromatin structure across the Complement receptor 2 (CR2/CD21) promoter coincide with CBF1 and C/EBP-beta binding in B cells." Molecular Immunology 46(13): 2613-2622. Cuff, J., Clamp, M., Siddiqui, A., Finlay, M. and Barton, G. (1998). "JPred: a consensus secondary structure prediction server." Bioinformatics 14(10): 892-893. Curran, J., Jowett, J., Abraham, L.J, Diepeveen, L.A, Elliott, K., Dyer, T., Kerr- Bayles, L., Johnson, M., Comuzzie, A., Moses, E., et al. (2009). "Genetic variation in PARL influences mitochondrial content." Human Genetics 127(2): 183–190. Dai, Y., Jiang, R. and Dong, J. (2012). "Weighted selective collapsing strategy for detecting rare and common variants in genetic association study." BMC Genetics 13(1): 7. Danilition, S., Fredrickson, R., Taylor, C. and Miyamoto, N. (1991). "Transcription factor binding and spacing constraints in the human beta-actin proximal promoter." Nucleic Acids Research 19(24): 6913-6922. Decker, T., di Magliano, M.P., McManus, S., Sun, Q., Bonifer, C., Tagoh, H. and Busslinger, M. (2009). "Stepwise Activation of Enhancer and Promoter Regions of the B Cell Commitment Gene Pax5 in Early Lymphopoiesis." Immunity 30(4): 508-520. Degner, J.F., Pai, A.A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D.J., Pickrell, J.K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G.E., et al. (2012). "DNase I sensitivity QTLs are a major determinant of human expression variation." Nature 482(7385): 390-394. Dekker, F. and Haisma, H. (2009). "Histone acetyl transferases as emerging drug targets." Drug Discovery Today 14(19): 942−948. Deng, J., Shoemaker, R., Xie, B., Gore, A., LeProust, E.M., Antosiewicz-Bourget, J., Egli, D., Maherali, N., Park, I.-H., Yu, J., et al. (2009). "Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming." Nature

Biotechnology 27(4): 353-360. Dickinson, H.O., Mason, J.M., Nicolson, D.J., Campbell, F., Beyer, F.R., Cook, J.V., Williams, B. and Ford, G.A. (2006). "Lifestyle interventions to reduce raised blood pressure: a systematic review of randomized controlled trials." Journal of Hypertension 24(2): 215–233.

179

Dimas, A.S., Deutsch, S., Stranger, B.E., Montgomery, S.B., Borel, C., Attar-Cohen, H., Ingle, C., Beazley, C., Arcelus, M.G., Sekowska, M., et al. (2009). “Common Regulatory Variation Impacts Gene Expression in a Cell Type–Dependent Manner.” Science. 325(5945): 1246-1250. Diogo, D., Kurreeman, F., Stahl, E.A., Liao, K.P., Gupta, N., Greenberg, J.D., Rivas, M.A., Hickey, B., Flannick, J., Thomson, B., et al. (2012). "Rare, Low-Frequency, and Common Variants in the Protein-Coding Sequence of Biological Candidate Genes from GWASs Contribute to Risk of Rheumatoid Arthritis." The American Journal of Human

Genetics 92(1): 15-27. Dong, C., Yoon, W. and Goldschmidt-Clermont, P. (2002). "DNA Methylation and Atherosclerosis." Journal of Nutrition 132(8): 2406-2409. Dong, K., Li, B., Qing, Y., Liu, J., Li, C. and Sun, Z. (2005). "Aberrant Methylation in CpG Islands of p15 and p16 Tumor Suppressor Genes in Pancreatic Cancer Tissue." The Chinese-German Journal of Clinical Oncology 4(4): 213-217. Doris, P.A. (2011). "The Genetics of Blood Pressure and Hypertension: The Role of Rare Variation." Cardiovascular Therapeutics 29(1): 37–45. Drouin, R., Angers, M., Dallaire, N., Rose, T.M., Khandjian, E.W. and Rousseau, F. (1997). "Structural and functional characterization of the human FMR1 promoter reveals similarities with the hnRNP-A2 promoter region." Human Molecular Genetics 6(12): 2051–2060. Duerr, R.H., Taylor, K.D., Brant, S.R., Rioux, J.D., Silverberg, M.S., Daly, M.J., Steinhart, A.H., Abraham, C., Regueiro, M., Griffiths, A., et al. (2006). "A genome-wide association study identifies IL23R as an inflammatory bowel disease gene." Science 314(5804): 1461–1463. Duff, G.W. (2006). "Evidence for genetic variation as a factor in maintaining health." The American Journal of Clinical Nutrition 83(2): 431S-435S. Dufour, A., Schneider, F., Metzeler, K.H., Hoster, E., Schneider, S., Zellmeier, E., Benthaus, T., Sauerland, M.-C., Berdel, W.E., Büchner, T., et al. (2010). "Acute Myeloid Leukemia With Biallelic CEBPA Gene Mutations and Normal Karyotype Represents a Distinct Genetic Entity Associated With a Favorable Clinical Outcome." Journal of Clinical Oncology 28(4): 570-577. Edelstein, L., Pan, A. and Collins, T. (2005). "Chromatin Modification and the Endothelial-specific Activation of the E-selectin Gene." The Journal of Biological

Chemistry 280(12): 11192-11202. Eeles, R.A., Kote-Jarai, Z., Al Olama, A.A., Giles, G.G., Guy, M., Severi, G., Muir, K., Hopper, J.L., Henderson, B.E., Haiman, C.A., et al. (2009). "Identification of seven new prostate cancer susceptibility loci through a genome-wide association study." Nature

Genetics 41(10): 1116–1121. Ehret, G.B., Munroe, P.B., Rice, K.M., Bochud, M., Johnson, A.D., Chasman, D.I., Smith, A.V., Psaty, B.M., Abecasis, G.R., Chakravarti, A., et al. (2011). "Genetic

180

variants in novel pathways influence blood pressure and cardiovascular disease risk." Nature 478(7367): 103–109. Eichler, E.E., Nickerson, D.A., Altshuler, D., Bowcock, A.M., Brooks, L.D., Carter, N.P., Church, D.M., Felsenfeld, A., Guyer, M., Lee, C., et al. (2007). "Completing the map of human genetic variation." Nature 447(7141): 161–165. Ellenberger, T. (1994). "Getting a grip in DNA recognition: structures of the basic region leucine zipper, and the basic region helix-loop-helix DNA-binding domains." Current Opinion in Structural Biology 4(1): 12–21. Erler, J.T. and Linding, R. (2012). "Network Medicine Strikes a Blow against Breast Cancer." Cell 149(4): 731–733. Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., et al. (2011). "Mapping and analysis of chromatin state dynamics in nine human cell types." Nature 473(7345): 43–49. Faber, K., Glatting, K.-H., Mueller, P.J., Risch, A. and Hotz-Wagenblatt, A. (2011). "Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites." BMC Bioinformatics 12(Suppl 4): S2. Fallahi-Sichani, M., El-Kebir, M., Marino, S., Kirschner, D. and Linderman, J. (2011). "Multiscale computational modeling reveals a critical role for TNF-α receptor 1 dynamics in tuberculosis granuloma formation." The Journal of Immunology 186(6): 3472-3483. Faraggi, E., Zhang, T., Yang, Y., Kurgan, L. and Zhou1, Y. (2012). "SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles." Journal of

Computational Chemistry 33(3): 259–267. Farré, M., Micheletti, D. and Ruiz-Herrera, A. (2012). "Recombination Rates and Genomic Shuffling in Human and Chimpanzee—A New Twist in the Chromosomal Speciation Theory." Molecular Biology and Evolution 6(6): e20321. Fava, C., Montagnana, M., Danese, E., Sjogren, M., Hedblad, B. and Melander, O. (2010). "Vanin-1 I26t Polymorphism and Hypertension in Two Large Urban-Based Prospective Studies in Swedes." Journal of Hypertension 28: e341. Fernald, G., Capriotti, E., Daneshjou, R., Karczewski, K. and Altman, R. (2011). "Bioinformatics challenges for personalized medicine." Bioinformatics 27(13): 1741-1748. Feuk, L., Carson, A.R. and Scherer, S.W. (2006). "Structural variation in the human genome." Nature Reviews Genetics 7(2): 85–97. Fledel-Alon, A., Leffler, E.M., Guan, Y., Stephens, M., Coop, G. and Przeworski, M. (2011). "Variation in Human Recombination Rates and Its Genetic Determinants." PLoS

ONE 6(6): e20321. Frazer, K.A., Murray, S.S., Schork, N.J. and Topol, E.J. (2009). "Human genetic

181

variation and its contribution to complex traits." Nature Reviews Genetics 10(4): 241-251. Freedman, M.L., Monteiro, A.N.A., Gayther, S.A., Coetzee, G.A., Risch, A., Plass, C., Casey, G., De Biasi, M., Carlson, C., Duggan, D., et al. (2011). "Principles for the post-GWAS functional characterization of cancer risk loci." Nature Genetics 43(6): 513–518. Freeman, W., Walker, S. and Vrana, K. (1999). "Quantitative RT-PCR: Pitfalls and Potential." BioTechniques 26: 112-125. Frommer, M., McDonald, L., Millar, D., Coiiis, C., Watt, F., Grigg, G., Molloy, P. and Paul, C. (1992). "A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands." Proceedings of the National

Academy of Sciences 89(5): 1827-1831. Fugmann, T., Borgia, B., Révész, C., Godó, M., Forsblom, C., Hamar, P., Holthöfer, H., Neri, D. and Roesli, C. (2011). "Proteomic identification of vanin-1 as a marker of kidney damage in a rat model of type 1 diabetic nephropathy." Kidney International 80(3): 272-281. Fürbass, R., Selimyan, R. and Vanselow, J. (2008). "DNA methylation and chromatin accessibility of the proximal Cyp19 promoter region 1.5/2 correlate with expression levels in sheep placentomes." Journal of Molecular Reproduction and Development 75(1): 1–7. Gaffney, D.J., McVicker, G., Pai, A.A., Fondufe-Mittendorf, Y.N., Lewellen, N., Michelini, K., Widom, J., Gilad, Y. and Pritchard, J.K. (2012). "Controls of nucleosome positioning in the human genome." PLoS Genetics 8(11): e1003036. Galland, F., Malergue, F., Bazin, H., Mattei, M., Aurrand-Lions, M., Theillet, C. and Naquet, P. (1998). "Two human genes related to murine Vanin-1 are located on the long arm of human chromosome 6." Genomics 53(2): 203-213. Garner, M. and Revzin, A. (1981). "A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system." Nucleic Acids Research 9(13): 3047-3060. Ge, B., lok, D.K., Kwan, T., Grundberg, E., Morcos, L., Verlaan, D.J., Le, J., Koka, V., Lam, K.C., Gagné, V., et al. (2009). "Global patterns of cis variation in human cells revealed by high-density allelic expression analysis." Nature Genetics 41(11): 1216–1222. Genin, E., Hannequin, D., Wallon, D., Sleegers, K., Hiltunen, M., Combarros, O., Bullido, M.J., Engelborghs, S., De Deyn, P., Berr, C., et al. (2011). "APOE and Alzheimer disease: a major gene with semi-dominant inheritance." Molecular

Psychiatry 16(9): 903–907. Gertz, J., Varley, K., Reddy, T., Bowling, K., Pauli, F., Parker, S., Kucera, K., Willard, H. and Myers, R. (2011). "Analysis of DNA Methylation in a Three-Generation Family Reveals Widespread Genetic Influence on Epigenetic Regulation." PLoS Genetics 7(8):

182

e1002228. Gibson, G. (2012). "Rare and common variants: twenty arguments." Nature Reviews

Genetics 13(2): 135-145. Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R. and Lieb, J.D. (2007) "FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin." Genome Research 17(6): 877-885. Girirajan, S., Brkanac, Z., Coe, B.P., Baker, C., Vives, L., Vu, T.H., Shafer, N., Bernier, R., Ferrero, G.B., Silengo, M., et al. (2011). "Relative Burden of Large CNVs on a Range of Neurodevelopmental Phenotypes." PLoS Genetics 7(11): e1002334. Git, A., Dvinge, H., Salmon-Divon, M., Osborne, M., Kutter, C., Hadfield, J., Bertone, P. and Caldas, C. (2010). "Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression." RNA 16(5): 991-1006. Glait-Santar, C. and Benayahu, D. (2011). "SVEP1 promoter regulation by methylation of CpG sites." Gene 490(1): 6–14. Glazier, A.M., Nadeau, J.H. and Aitman, T.J. (2002). "Finding genes that underlie complex traits." Science 298(5602): 2345-2349. Goh, K.I., Cusick, M.E., Valle, D., Childs, B., Vidal, M. and Barabasi, A.L. (2007). "The human disease network." Proceedings of the National Academy of Sciences 104(21): 8685–8690. Goldgar, D.E., Healey, S., Dowty, J.G., Da Silva, L., Chen, X., Spurdle, A.B., Terry, M.B., Daly, M.J., Buys, S.M., Southey, M.C., et al. (2011). "Rare variants in the ATM gene and risk of breast cancer." Breast Cancer Research 13(4): R73-R73. Goldstein, J.L., Hobbs, H.H. and Brown, M.S. (2001). Familial hypercholesterolemia. McGraw-Hill Book Co., New York. Goll, M.G. and Bestor, T.H. (2005). "Eukaryotic cytosine methyltransferases." Annual

Review of Biochemistry 74: 481–514. Goriely, S., Van Lint, C., Dadkhah, R., Libin, M., De Wit, D., Demonté, D., Willems, F. and Goldman, M. (2004). "A defect in nucleosome remodeling prevents IL-12 (p35) gene transcription in neonatal dendritic cells." The Journal of Experimental Medicine 199(7): 1011-1016. Göring, H., Curran, J., Johnson, M., Dyer, T., Charlesworth, J., Cole, S., Jowett, J., Abraham, L., Rainwater, D., Comuzzie, A., et al. (2007). "Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes." Nature

Genetics 39(10): 1208 - 1216. Gottesman, O., Drill, E., Lotay, V., Bottinger, E. and Peter, I. (2012). "Can Genetic Pleiotropy Replicate Common Clinical Constellations of Cardiovascular Disease and Risk?" PLoS ONE 7(9): e46419.

183

Goyette, P., Lefebvre, C., Ng, A., Brant, S.R., Cho, J.H., Duerr, R.H., Silverberg, M.S., Taylor, K.D., Latiano, A., Aumais, G., et al. (2008). "Gene-centric association mapping of chromosome 3p implicates MST1 in IBD pathogenesis." Mucosal Immunology 1(2): 131–138. Grabe, N. (2002). "AliBaba2: Context specific identification of transcription factor binding sites." In Silico Biology 2(1): 1-15. Grant, S.F. and Hakonarson, H. (2009). "Genome-wide association studies in type 1 diabetes." Current Diabetes Reports 9(2): 157-163. Grayling, R., Bailey, K. and Reeve, J. (1997). "DNA binding and nuclease protection by the HMf histones from the hyperthermophilic archaeon Methanothermus fervidus." Extremophiles 1(2): 79-88. Grunenfelder, B. and Winzler, E.A. (2002). "Treasures and traps in genome-wide data sets: case examples from yeast." Nature Reviews Genetics 3(9): 653-661. Guo, Y. and Jamison, D. (2005). "The distribution of SNPs in human gene regulatory regions." BMC Genomics 6(1): 140. Hager, G., Archer, T., Fragoso, G., Bresnick, E., Tsukagoshi, Y., John, S. and Smith, C. (1993). "Influence of Chromatin Structure on the Binding of Transcription Factors to DNA." Cold Spring Harbor Symposia on Quantitative Biology 58: 63-71. Hager, G., McNally, J. and Misteli, T. (2009). "Transcription dynamics." Molecular

Cell 35(6): 741-753. Hakonarson, H., Grant, S.F., Bradfield, J.P., Marchand, L., Kim, C.E., Glessner, J.T., Grabs, R., Casalunovo, T., Taback, S.P., Frackelton, E.C., et al. (2007). "A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene." Nature 448(7153): 591-594. Haslam, D.W. and James, W.P. (2005). "Obesity." Lancet 366(9492): 1197–209. Hawkins, R.D., Hon, G.C. and Ren, B. (2010). "Next-generation genomics: an integrative approach." Nature Reviews Genetics 11(7): 476-486. He, F.J. and MacGregor, G.A. (2009). "A comprehensive review on salt and health and current experience of worldwide salt reduction programmes." Journal of Human

Hypertension 23(6): 363–384. He, M., Guo, H., Yang, X., Zhang, X., Zhou, L., Cheng, L., Zeng, H., Hu, F.B., Tanguay, R.M. and Wu, T. (2009). "Functional SNPs in HSPA1A Gene Predict Risk of Coronary Heart Disease." PLoS ONE 4(3): e4851. Heins, J., Suriano, J., Taniuchi, H. and Anfinsen, C. (1967). "Characterization of a nuclease produced by Staphylococcus aureus." The Journal of Biological Chemistry 242(5): 1016-1020. Hellman, L.M. and Fried, M.G. (2007). "Electrophoretic Mobility Shift Assay (EMSA) for Detecting Protein-Nucleic Acid Interactions." Nature Protocols 2(8): 1849–1861.

184

Hirschhorn, J.N. (2009). "Genomewide association studies—illuminating biologic pathways." The New England Journal of Medicine 360(17): 1699-1701. Hodges, E., Smith, A.D., Kendall, J., Xuan, Z., Ravi, K., Rooks, M., Zhang, M.Q., Ye, K., Bhattacharjee, A., Brizuela, L., et al. (2009). "High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing." Genome Research 19(9): 1593-1605. Hoffmeyer, S., Burk, O., von Richter, O., Arnold, H., Brockmöller, J., Johne, A., Cascorbi, I., Gerloff, T., Roots, I., Eichelbaum, M., et al. (2000). "Functional polymorphisms of the human multidrug-resistance gene: Multiple sequence variations and correlation of one allele with P-glycoprotein expression and activity in vivo." Proceedings of the National Academy of Sciences 97(7): 3473-3478. Holden, N. and Tacon, C. (2011). "Principles and problems of the electrophoretic mobility shift assay." Journal of Pharmacological and Toxicological Methods 63(1): 7-14. Holliday, R. and Ho, T. (1991). "Gene silencing in mammalian cells by uptake of 5-methyl deoxycytidine-5'-triphosphate." Somatic Cell and Molecular Genetics 17(6): 537-542. Hon, G., Wang, W. and Ren, B. (2009). "Discovery and Annotation of Functional Chromatin Signatures in the Human Genome." PLoS Computational Biology 5(11): e1000566. Hu, C., Bart Williams, D., Zhaorigetu, S., Khalil, S., Wan, G. and Valle, D. (2008). "Functional genomics and SNP analysis of human genes encoding proline metabolic enzymes." Amino Acids 35(4): 655-664. Hu, X. and Daly, M. (2012). "What have we learned from six years of GWAS in autoimmune diseases, and what is next?" Current Opinion in Immunology 24(5): 571–575. Huang, D.W., Sherman, B.T. and Lempicki, R.A. (2009). "Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists." Nucleic

Acids Research 37(1): 1–13. Huang, Y., Yang, H., Borg, B., Su, X., Rhodes, S., Yang, K., Tong, X., Tang, G., Howell, C., Rosen, H., et al. (2007). "A functional SNP of interferon-γ gene is important for interferon-α-induced and spontaneous recovery from hepatitis C virus infection." Proceedings of the National Academy of Sciences 104(3): 985-990. Hultman, K., Tjarnlund-Wolf, A., Odeberg, J., Eriksson, P. and Jern, C. (2010). "Allele-specific transcription of the PAI-1 gene in human astrocytes." Thrombosis &

Haemostasis 104(5): 998-998. Hunt, K., Zhernakova, A., Turner, G., Heap, G., Franke, L., Bruinenberg, M., Romanos, J., Dinesen, L., Ryan, A., Panesar, D., et al. (2008). "Newly identified genetic risk variants for celiac disease related to the immune response." Nature Genetics 40(4): 395-402.

185

Hussin, J., Roy-Gagnon, M.-H., Gendron, R., Andelfinger, G. and Awadalla, P. (2011). "Age-Dependent Recombination Rates in Human Pedigrees." PLoS Genetics 7(9): e1002251. Huttlin, E.L., Hegeman, A.D., Harms, A.C. and Sussman, M.R. (2007). "Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy." Journal of Proteome Research 6(1): 392-398. Ideker, T. and Sharan, R. (2008). "Protein networks in disease." Genome Research 18(4): 644-652. Ikeda, R., Nishida, T., Watanabe, F., Shimizu-Saito, K., Asahina, K., Horikawa, S. and Teraoka, H. (2008). "Involvement of CCAAT/enhancer binding protein-beta (C/EBPbeta) in epigenetic regulation of mouse methionine adenosyltransferase 1A gene expression." The International Journal of Biochemistry & Cell Biology 40(9): 1956-1969. Illumina. (2009). "Genomic Sequencing." Retrieved 20/04/2013, from http://www.illumina.com/Documents/products/datasheets/datasheet_genomic_sequence.pdf. Jacobo-Albavera, L., Aguayo-de la Rosa, P.I., Villarreal-Molina, T., Villamil-Ramírez, H., León-Mimila, P., Romero-Hidalgo, S., López-Contreras, B.E., Sánchez-Muñoz, F., Bojalil, R., González-Barrios, J.A., et al. (2012). "VNN1 Gene Expression Levels and the G-137T Polymorphism Are Associated with HDL-C Levels in Mexican Prepubertal Children." PLoS ONE 7(11): e49818. Jakobsdottir, J., Gorin, M.B., Conley, Y.P., Ferrell, R.E. and Weeks, D.E. (2009). "Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers." PLoS Genetics 5(2): e1000337. Jansen, P., Kamsteeg, M., Rodijk-Olthuis, D., van Vlijmen-Willems, I., de Jongh, G., Bergers, M., Tjabringa, G., Zeeuwen, P. and Schalkwijk, J. (2009). "Expression of the vanin gene family in normal and inflamed human skin: induction by proinflammatory cytokines." Journal of Investigative Dermatology 129(9): 2167–2174. Jeffreys, A.J., Ritchie, A. and Neumann, R. (2000). "High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot." Human Molecular Genetics 9(5): 725-733. Jenkins, P.A., Song, Y.S. and Brem, R.B. (2012). "Genealogy-Based Methods for Inference of Historical Recombination and Gene Flow and Their Application in Saccharomyces cerevisiae." PLoS ONE 7(11): e46947. Ji, W.Z., Foo, J.N., O’Roak, B.J., Zhao, H., Larson, M.G., Simon, D.B., Newton-Cheh, C., State, M.W., Levy, D. and Lifton, R.P. (2008). "Rare independent mutations in renal salt handling genes contribute to blood pressure variation." Nature Genetics 40(5): 592-599. Jian, Y., Manolio, T.A., Pasquale, L.R., Boerwinkle, E., Caporaso, N., Cunningham,

186

J.M., de Andrade, M., Feenstra, B., Feingold, E., Hayes, M.G., et al. (2011). "Genome partitioning of genetic variation for complex traits using common SNPs." Nature

Genetics 43(6): 519-525. Jiang, L., Yao, M., Shi, J., Shen, P., Niu, G. and Fei, J. (2008). "Yin yang 1 directly regulates the transcription of RE-1 silencing transcription factor." Journal of

Neuroscience Research 86(6): 1209–1216. Jiang, Y., Wang, L., Gong, W., Wei, D., Le, X., Yao, J., Ajani, J., Abbruzzese, J., Huang, S. and Xie, K. (2005). "A high expression level of insulin-like growth factor I receptor is associated with increased expression of transcription factor Sp1 and regional lymph node metastasis of human gastric cancer." Clinical and Experimental Metastasis 21(8): 755-764. John, S., Sabo, P.J., R.E., T., Sung, M.H., Biddie, S.C., Johnson, T.A., Hager, G.L. and Stamatoyannopoulos, J.A. (2011). "Chromatin accessibility pre-determines glucocorticoid receptor binding patterns." Nature Genetics 43(3): 264–268. Johnson, A.D. (2009). "Single-Nucleotide Polymorphism Bioinformatics. A Comprehensive Review of Resources." Circulation: Cardiovascular Genetics 2(5): 530-536. Johnson, A.R., Lao, S., Wang, T., Galanko, J.A. and Zeisel, S.H. (2012). "Choline Dehydrogenase Polymorphism rs12676 Is a Functional Variation and Is Associated with Changes in Human Sperm Cell Function." PLoS ONE 7(4): e36047. Johnson, D.S., Mortazavi, A., Myers, R.M. and Wold, B. (2007). "Genome-wide mapping of in vivo protein-DNA interactions." Science 316(5830): 1497–1502. Jones, T.I., Chen, J.C., Rahimov, F., Homma, S., Arashiro, P., Beermann, M.L., King, O.D., Miller, J.B., Kunkel, L.M., Emerson Jr, C.P., et al. (2012). "Facioscapulohumeral muscular dystrophy family studies of DUX4 expression: evidence for disease modifiers and a quantitative model of pathogenesis." Human Molecular Genetics 21(20): 4419-4430. Jostins, L. and Barrett, J.C. (2011). "Genetic risk prediction in complex disease." Human Molecular Genetics 20(R2): R182-R188. Kagami, M., O'Sullivan, M.J., Green, A.J., Watabe, Y., Arisaka, O., Masawa, N., Matsuoka, K., Fukami, M., Matsubara, K., Kato, F., et al. (2010). "The IG-DMR and the MEG3-DMR at Human Chromosome 14q32.2: Hierarchical Interaction and Distinct Functional Properties as Imprinting Control Centers." PLoS Genetics 6(6): e1000992. Kann, M.G. (2010). "Advances in translational bioinformatics: computational approaches for the hunting of disease genes." Briefings in bioinformatics 11(1): 96–110. Karimi, M. (2007). Functional analysis of the -308G/A polymorphism in the tumour necrosis factor promoter. School of Biomedical, Biomolecular and Chemical Sciences. Perth, The University of Western Australia. PhD Thesis. Karimi, M., LC, G., Cruickshank, M., Moses, E. and Abraham, L. (2009). "A critical assessment of the factors affecting reporter gene assays for promoter SNP function: a

187

reassessment of −308 TNF polymorphism function using a novel integrated reporter system." European Journal of Human Genetics 17(11): 1454–1462. Kaskow, B., Proffit, J., Blangero, J., Moses, E. and Abraham, L. (2012). "Diverse biological activities of the vascular non-inflammatory molecules – The Vanin pantetheinases." Biochemical and Biophysical Research Communications 417(2): 653–658. Kassim, S.H., Wilson, J.M. and Rader, D.J. (2010). "Gene therapy for dyslipidemia: a review of gene replacement and gene inhibition strategies." Journal of Clinical

Lipidology 5(6): 793–809. Kathiresan, S. and Srivastava, D. (2012). "Genetics of Human Cardiovascular Disease." Cell 148(6): 1242–1257. Kazani, S., Wechsler, M.E. and Israel, E. (2010). "The role of pharmacogenomics in improving the management of asthma." Journal of Allergy and Clinical Immunology 125(2): 295–302. Keenan, B.T., Shulman, J.M., Chibnik, L.B., Raj, T., Tran, D., Sabuncu, M.R., Allen, A.N., Corneveaux, J.J., Hardy, J.A., Huentelman, M.J., et al. (2012). "A coding variant

in CR1 interacts with APOE-ε4 to influence cognitive decline." Human Molecular

Genetics 21(10): 2377-2388. Kent, N.A. and Mellor, J. (1995). "Chromatin structure snap-shots: Rapid nuclease digestion of chromatin in yeast." Nucleic Acids Research 23(18): 3786–3787. Kerkel, K., Spadola, A., Yuan, E., Kosek, J., Jiang, L., Hod, E., Li, K., Murty, V., Schupf, N., Vilain, E., et al. (2008). "Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation." Nature

Genetics 40(7): 904-908. Kim, Y.-A., Wuchty, S. and Przytycka, T.M. (2011). " Identifying Causal Genes and Dysregulated Pathways in Complex Diseases." PLoS Computational Biology 7(3): e1001095. Kingsley, C.B. (2011). "Identification of causal sequence variants of disease in the next generation sequencing era." Disease Gene Identification 700: 37-46. Kleinjan, D.A. and van Heyningen, V. (2005). “Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease.” The American Journal

of Human Genetics 76(1): 8-32. Knight, J. (2005). "Regulatory polymorphisms underlying complex disease traits." Journal of Molecular Medicine 83(2): 97-109. Charles, K.J. (2005). "HaploChIP: an in vivo assay." Methods in Molecular Biology 311: 49-60. Knowles, M.R. and Drumm, M. (2012). "The Influence of Genetics on Cystic Fibrosis Phenotypes." Cold Spring Harbor Perspectives in Medicine 2(12): a009548.

188

Kodama, K., Horikoshic, M., Toda, K., Yamada, S., Hara, K., Irie, J., Sirota, M., Morgan, A.A., Chen, R., Ohtsu, H., et al. (2012). "Expression-based genome-wide association study links the receptor CD44 in adipose tissue with type 2 diabetes." Proceedings of the National Academy of Sciences 109(18): 7049–7054. Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.A., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A., et al. (2012). "Rate of de novo mutations and the importance of father's age to disease risk." Nature 488(7412): 471-475. Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. (2002). "A high-resolution recombination map of the human genome." Nature Genetics 31(3): 241-247. Kuhlmann, I., Minihane, A.M., Huebbe, P., Nebel, A. and Rimbach, G. (2010). "Apolipoprotein E genotype and hepatitis C, HIV and herpes simplex disease risk: a literature review." Lipids in Health and Disease 9(1): 8. Kurreeman, F.A.S., Padyukov, L., Marques, R.B., Schrodi, S.J., Seddighzadeh, M.S.-R., G, van der Helm-van Mil, A.H.M., Allaart, C.F., Verduyn, W., Houwing-Duistermaat, J., Alfredsson, L., et al. (2007). "A Candidate Gene Approach Identifies the TRAF1/C5 Region as a Risk Factor for Rheumatoid Arthritis." PLoS Medicine 4(9): e278. Ladouceur, M., Dastani, Z., Aulchenko, Y.S., Greenwood, C.M.T. and Richards, J.B. (2012). "The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals." PLoS Genetics 8(2): e1002496. Landa, I., Ruiz-Llorente, S., Montero-Conde, C., Inglada-Pérez, L., Schiavi, F., Leskelä, S., Pita, G., Milne, R., Maravall, J., Ramos, I., et al. (2009). "The Variant rs1867277 in FOXE1 Gene Confers Thyroid Cancer Susceptibility through the Recruitment of USF1/USF2 Transcription Factors." PLoS Genetics 5(9): e1000637. Lantermann, A.B., Straub, T., Stralfors, A., Yuan, G.C., Ekwall, K. and Korber, P. (2010). "Schizosaccharomyces pombe genome-wide nucleosome mapping reveals positioning mechanisms distinct from those of Saccharomyces cerevisiae." Nature

Structural & Molecular Biology 17(2): 251–257. Lattka, E., Eggers, S., Moeller, G., Heim, K., Weber, M., Mehta, D., Prokisch, H., Illig, T. and Adamski, J. (2010). "A common FADS2 promoter polymorphism increases promoter activity and facilitates binding of transcription factor ELK1." Journal of Lipid

Research 51(1): 182-191. Lavie, C., Milani, R. and Ventura, H. (2009). "Obesity and Cardiovascular Disease Risk Factor, Paradox, and Impact of Weight Loss." Journal of the American College of

Cardiology 53: 1925-1932. Lee, C. and Scherer, S.W. (2010). "The clinical context of copy number variation in the human genome." Expert Reviews in Molecular Medicine 12: e8. Lee, P. and Shatkay, H. (2008). "F-SNP: computationally predicted functional SNPs for

189

disease association studies." Nucleic Acids Research 36(suppl 1): D820-D824. Lee, P.H. and Shatkay, H. (2009). "An integrative scoring system for ranking SNPs by their potential deleterious effects." Bioinformatics 25(8): 1048-1055. Leong, P., Andrews, G., Johnson, D., Dyer, K., Xi, S., Mai, J., Robbins, P., Gadiparthi, S., Burke, N., Watkins, S., et al. (2003). "Targeted inhibition of Stat3 with a decoy oligonucleotide abrogates head and neck cancer cell growth." Proceedings of the

National Academy of Sciences 100(7): 4138-4143. Li, J., Gao, E., Seidner, S. and Mendelson, C. (1998). "Differential regulation of baboon SP-A1 and SP-A2 genes: structural and functional analysis of 5’-flanking DNA." American Journal of Physiology 275(6): L1078-L1088. Li, M., Mo, Y., Luo, X.J., Xiao, X., Shi, L., Peng, Y.M., Qi, X.B., Liu, X.Y., Yin, L.D., Diao, H.B., et al. (2011). "Genetic association and identification of a functional SNP at GSK3β for schizophrenia susceptibility." Schizophrenia Research 133(1-3): 165-171. Li, Y and Tollefsbol, T.O. (2011). "DNA methylation detection: Bisulfite genomic sequencing analysis." Epigenetics Protocols. Humana Press: p11-21 Li, Y.H., Liao, W., Cargill, M., Chang, M., Matsunami, N., Feng, B.J., Poon, A., Callis-Duffin, K.P., Catanese, J.J., Bowcock, A.M., et al. (2010). "Carriers of Rare Missense Variants in IFIH1 Are Protected from Psoriasis." Journal of Investigative Dermatology 130(12): 2768-2772. Li, Z., Gadue, P., Chen, K., Jiao, Y., Tuteja, G., Schug, J., Li, W. and Kaestner, K.H. (2012). "Foxa2 and H2A.Z mediate nucleosome depletion during embryonic stem cell differentiation." Cell 151(7): 1608–1616. Li, Z., Schug, J., Tuteja, G., White, P. and Kaestner, K.H. (2011). "The nucleosome map of the mammalian liver." Nature structural & molecular biology 18(6): 742-746. Lifton, R.P., Gharavi, A.G. and Geller, D.S. (2001). "Molecular mechanisms of human hypertension." Cell 104(4): 545-556. Lin, K., Simossis, V., Taylor, W. and Heringa, J. (2005). "A simple and fast secondary structure prediction method using hidden neural networks." Bioinformatics 21(2): 152-159. Liu, A.M.F., New, D.C., Lo, R.K.H. and Wong, Y.H. (2009). Reporter Gene Assays. Cell-Based Assays for High-Throughput Screening, Humana Press. 486: 109-123. Liu, C., Yang, Q., Hwang, S.-J., Sun, F., Johnson, A.D., Shirihai, O.S., Vasan, R.S., Levy, D. and Schwartz, F. (2012). "Association of Genetic Variation in the Mitochondrial Genome With Blood Pressure and Metabolic Traits." Hypertension 60(4): 949-956. Liu, S. and Song, Y. (2010). "Building Genetic Scores to Predict Risk of Complex Diseases in Humans: Is It Possible?" Diabetes 59(11): 2729-2731. Liu, Z., Zhan, Y., Bai, Y. and Sun, J. (2012). "AND logic gate application based on

190

colorimetric screening of enzyme activity." Solid State Sciences 14(7): 870–873. Love-Gregory, L., Sherva, R., Schappe, T., Qi, J.-S., McCrea, J., Klein, S., Connelly, M.A. and Abumrad, N.A. (2011). "Common CD36 SNPs reduce protein expression and may contribute to a protective atherogenic profile." Human Molecular Genetics 20(1): 193-201. Low, S.-K., Takahashi, A., Cha, P.-C., Zembutsu, H., Kamatani, N., Kubo, M. and Nakamura, Y. (2012). "Genome-wide association study for intracranial aneurysm in the Japanese population identifies three candidate susceptible loci and a functional genetic variant at EDNRA." Human Molecular Genetics 21(9): 2102-2110. Lynch, M. (2010). "Rate, molecular spectrum, and consequences of human mutation." Proceedings of the National Academy of Sciences 107(3): 961-968. Macintyre, G., Bailey, J., Haviv, I. and Kowalczyk, A. (2010). "is-rSNP: a novel technique for in silico regulatory SNP detection." Bioinformatics 26(18): i524-i530. Mackay, T.F.C., Stone, E.A. and Ayroles, J.F. (2009). "The genetics of quantitative traits: challenges and prospects." Nature Reviews Genetics 10(8): 565-577. Majewski, J. and Pastinen, T. (2011). "The study of eQTL variations by RNA-seq: from SNPs to phenotypes." Trends in Genetics 27(2): 72–79. Makar, K., Ulgiati, D., Hagman, J. and Holers, V. (2001). "A site in the complement receptor 2 (CR2/CD21) silencer is necessary for lineage specific transcriptional regulation." International Immunology 13(5): 657-664. Mannherz, H., Peitsch, M., Zanotti, S., Paddenberg, R. and Polzar, B. (1995). "A new function for an old enzyme: the role of DNase I in apoptosis." Current Topics in

Microbiology and Immunology 198: 161-174. Manolio, T.A. (2010). "Genomewide association studies and assessment of the risk of disease." The New England Journal of Medicine 363(2): 166–176. Maras, B., Barra, D., Duprè, S. and Pitari, G. (1999). "Is pantetheinase the actual identity of mouse and human vanin-1 proteins?" Journal of Clinical Investigation 461(3): 149-152. Mardis, E.R. and Wilson, R.K. (2009). "Cancer genome sequencing: a review." Human

Molecular Genetics 18(R2): R163-R168. Marian, A.J. (2010). "Hypertrophic cardiomyopathy: from genetics to treatment." European Journal of Clinical Investigation 40(4): 360–369. Marian, A.J. and Belmont, J. (2011). "Strategic Approaches to Unraveling Genetic Causes of Cardiovascular Diseases." Circulation Research 108(10): 1252-1269. Martin, A.C. and Drubin, D.G. (2003). "Impact of genome-wide functional analyses on cell biology research." Current Opinion in Cell Biology 15(1): 6-13. Martin, F., Malergue, F., Pitari, G., Philippe, J., Philips, S., Chabret, C., Granjeaud, S.,

191

Mattei, M., Mungall, A., Naquet, P., et al. (2001). "Vanin genes are clustered (human 6q22-24 and mouse 10A2B1) and encode isoforms of pantetheinase ectoenzymes." Immunogenetics 53(4): 296-306. Martin, F., Penet, M.-F., Malergue, F., Lepidi, H., Dessein, A., Galland, F., de Reggi, M., Naquet, P. and Gharib, B. (2004). "Vanin-1(-/-) mice show decreased NSAID- and Schistosoma-induced intestinal inflammation associated with higher glutathione stores." Journal of Clinical Investigation 113(4): 591-597. Marzec, J., Christie, J., Reddy, S., Jedlicka, A., Vuong, H., Lanken, P., Aplenc, R., Yamamoto, T., Yamamoto, M., Cho, H.-T., et al. (2007). "Functional polymorphisms in the transcription factor NRF2 in humans increase the risk of acute lung injury." The

FASEB Journal 21(9): 2237-2246. Mathew, C.G. (2008). "New links to the pathogenesis of Crohn disease provided by genome-wide association scans." Nature Reviews Genetics 9(1): 9-14. Mattocks, C.J., Morris, M.A., Matthijs, G., Swinnen, E., Corveleyn, A., Dequeker, E., Müller, C.R., Pratt, V. and Wallace, A. (2010). "A standardized framework for the validation and verification of clinical molecular genetic tests." European Journal of

Human Genetics 18(12): 1276–1288. Matys, V., Kel-Margoulis, O., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., et al. (2006). "TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes." Nucleic Acids

Research 34(suppl 1): D108-D110. Maynard, T.M., Haskell, G.T., Lieberman, J.A. and LaMantia, A.S. (2002). "22q11 DS: genomic mechanisms and gene function in DiGeorge/velocardiofacial syndrome." International Journal of Developmental Neuroscience 20(3): 407–419. McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A. and Hirschhorn, J.N. (2008). "Genome-wide association studies for complex traits: consensus, uncertainty and challenges." Nature Reviews Genetics 9(5): 356-369. McIntosh, A.M., Bennett, C., Dickson, D., Anestis, S.F., Watts, D.P., Webster, T.H., Fontenot, M.B. and Bradley, B.J. (2012). "The Apolipoprotein E (APOE) Gene Appears Functionally Monomorphic in Chimpanzees (Pan troglodytes)." PLoS ONE 7(10): e47760. McManus, F., Sands, W., Diver, L., MacKenzie, S.M., Fraser, R., Davies, E. and Connell, J.M. (2012). "APEX1 Regulation of Aldosterone Synthase Gene Transcription Is Disrupted by a Common Polymorphism in Humans." Circulation Research 111(2): 212-219. Meckler, J.F., Bhakta, M.S., Kim, M.S., Ovadia, R., Habrian, C.H., Zykovich, A., Yu, A., Lockwood, S.H., Morbitzer, R., Elsäesser, J., et al. (2013). "Quantitative analysis of TALE–DNA interactions suggests polarity effects." Nucleic Acids Research 41(7): 4118-4128. Menendez, D., Krysiak, O., Inga, A., Krysiak, B., Resnick, M. and Schönfelder, G. (2006). "A SNP in the flt-1 promoter integrates the VEGF system into the p53

192

transcriptional network." Proceedings of the National Academy of Sciences 103(5): 1406-1411. Messerli, F.H., Williams, B. and Ritz, E. (2007). "Essential hypertension." Lancet 370(9587): 591-603. Metzker, M. (2010). "Sequencing technologies - the next generation." Nature Reviews

Genetics 11(1): 31-46. Mikkelsen, T., Ku, M., Jaffe, D., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.-K., Koche, R., et al. (2007). "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells." Nature 448(7153): 553-560. Miller, S., De Vos, P., Guerre-Millo, M., Wong, K., Hermann, T., Staels, B., Briggs, M. and Auwerx, J. (1996). "The adipocyte specific transcription factor C/EBPalpha modulates human ob gene expression." Proceedings of the National Academy of

Sciences 93(11): 5507-5511. Mills, R.E., Pittard, W.S., Mullaney, J.M., Farooq, U., Creasy, T.H., Mahurkar, A.A., Kemeza, D.M., Strassler, D.S., Ponting, C.P., Webber, C., et al. (2011). "Natural genetic variation caused by small insertions and deletions in the human genome." Genome Research 21(6): 830-839. Min-Oo, G., Fortin, A., Pitari, G., Tam, M., Stevenson, M. and Gros, P. (2007). "Complex genetic control of susceptibility to malaria: positional cloning of the Char9 locus." The Journal of Experimental Medicine 204(3): 511–524. Montgomery, S.B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R.P., Ingle, C., Nisbett, J., Guigo, R. and Dermitzakis, E.T. (2010). "Transcriptome genetics using second generation sequencing in a Caucasian population." Nature 464(7289): 773–777. Moore, J.H., Asselbergs, F.W. and Williams, S.M. (2010). "Bioinformatics challenges for genome-wide association studies." Bioinformatics 26(4): 445-455. Moosa, M., Ayub, M., Bashar, A., Sarwardi, G., Khan, W., Khan, H. and Yeasmin, S. (2011). "Combination of two rare mutations causes B-thalassaemia in a Bangladeshi patient." Genetics and Molecular Biology 34(3): 406-409. Moran, A.E., Oliver, J.T., Mirzaie, M., Forouzanfar, M.H., Chilov, M., Anderson, L., Morrison, J.L., Khan, A., Zhang, N., Haynes, N., et al. (2012). "Assessing the Global Burden of Ischemic Heart Disease : Part 1: Methods for a Systematic Review of the Global Epidemiology of Ischemic Heart Disease in 1990 and 2010." Global Heart 7(4): 315–329. Moreau, Y. and Tranchevent, L.C. (2012). "Computational tools for prioritizing candidate genes: boosting disease gene discovery." Nature Reviews Genetics 13: 523–536. Morgan, T.M., Krumholz, H.M., Lifton, R.P. and Spertus, J.A. (2007). "Non-validation of reported genetic risk factors for acute coronary syndrome in a large-scale replication study." JAMA: The Journal of the American Medical Association 297(14): 1551-1561.

193

Mottagui-Tabar, S., Faghihi, M., Mizuno, Y., Engström, P., Lenhard, B., Wasserman, W. and Wahlestedt, C. (2005). "Identification of functional SNPs in the 5-prime flanking sequences of human genes." BMC Genomics 6(1): 18. Muers, M. (2010). "Genomic variation: Human retrotransposons keep it active." Nature

Reviews Genetics 11(8): 527-527. Muise, A.M., Xu, W., Guo, C.-H., Walters, T.D., Wolters, V.M., Fattouh, R., Lam, G.Y., Hu, P., Murchie, R., Sherlock, M., et al. (2012). "NADPH oxidase complex and IBD candidate gene studies: identification of a rare variant in NCF2 that results in reduced binding to RAC2." Gut 61(7): 1028-1035. Mummidi, S., Adams, L.M., Van Compernolle, S.E., Kalkonde, M., Camargo, J.F., Kulkarni, H., Bellinger, A.S., Bonello, G., Tagoh, H., Ahuja, S.S., et al. (2007). "Production of Specific mRNA Transcripts, Usage of an Alternate Promoter, and Octamer-Binding Transcription Factors Influence the Surface Expression Levels of the HIV Coreceptor CCR5 on Primary T Cells." The Journal of Immunology 178(9): 5668-5681. Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K.V., Li, X., Li, H., Kuperwasser, N., Ruda, V.M., et al. (2010). "From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus." Nature 466(7307): 714-719. Myers, S., Freeman, C., Auton, A., Donnelly, P. and McVean, G. (2008). "A common sequence motif associated with recombination hot spots and genome instability in humans." Nature Genetics 40(9): 1124–1129. Myers, S., Spencer, C.C.A., Auton, A., Bottolo, L., Freeman, C., Donnelly, P. and McVean, G. (2006). "The distribution and causes of meiotic recombination in the human genome." Biochemical Society Transactions 34(4): 526–530. Nabel, E.G. (2003). "Cardiovascular disease." The New England Journal of Medicine 349(1): 60-72. Nackley, A., Shabalina, S., Tchivileva, I., Satterfield, K., Korchynskyi, O., Makarov, S., Maixner, W. and Diatchenko, L. (2006). "Human Catechol-O-Methyltransferase Haplotypes Modulate Protein Expression by Altering mRNA Secondary Structure." Science 314(5807): 1930-1933. Naka, I., Hikami, K., Nakayama, K., Koga, M., Nishida, N., Kimura, R., Furusawa, T., Natsuhara, K., Yamauchi, T., Nakazawa, M., et al. (2012). "A functional SNP upstream of the beta-2 adrenergic receptor gene (ADRB2) is associated with obesity in Oceanic populations." International Journal of Obesity: 1-7. Nejentsev, S., Walker, N., Riches, D., Egholm, M. and Todd, J.A. (2009). "Rare Variants of IFIH1, a Gene Implicated in Antiviral Responses, Protect Against Type 1 Diabetes." Science 324(5925): 387-389. Neumann, M., Fries, H.-W., Scheicher, C., Keikavoussi, P., Kolb-Mäurer, A., Bröcker, E.-B., Serfling, E. and Kämpgen, E. (2000). "Differential expression of Rel/NF-κB and octamer factors is a hallmark of the generation and maturation of dendritic cells." Blood 95(1): 277-285.

194

Nica, A.C., Montgomery, S.B., Dimas, A.S., Stranger, B.E., Beazley, C., Barroso, I. and Dermitzakis, E.T. (2010). "Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations." PLoS Genetics 6(4): e1000895. Novembre, J., Johnson, T., Bryc, K., Kutalik, Z., Boyko, A.R., Auton, A., Indap, A., King, K.S., Bergmann, S., Nelson, M.R., et al. (2008). "Genes mirror geography within Europe." Nature 456(7218): 98–101. O’Neill, L. and Turner, B. (1995). "Histone H4 acetylation distinguishes coding regions of the human genome from heterochromatin in a differentiation-independent manner." The EMBO Journal 14(16): 3946–3957. Pang, D.X., Smith, A.J. and Humphries, S.E. (2013). "Functional analysis of TCF7L2 genetic variants associated with type 2 diabetes." Nutrition, Metabolism &

Cardiovascular Diseases 23(6): 550-556. Pasche, B. and Yi, N. (2010). "Candidate gene association studies: Successes and failures." Current Opinion in Genetics & Development 20(3): 257–261. Pastinen, T. (2010). "Genome-wide allele-specific analysis: insights into regulatory variation." Nature Reviews Genetics 11(8): 533-538. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge,B., Lepage, P., Lavergne., K., Villeneuve, A., Gaudin, T., Brändström., et al. (2004). “A survey of genetic and epigenetic variation affecting human gene expression.” Physiological Genomics 16(2): 184-193. Pearson, T.A. and Manolio, T.A. (2008). "How to interpret a genome-wide association study." JAMA: The Journal of the American Medical Association 299(11): 1335–1344. Pemberton, T.J., Wang, C., Li, J.Z. and Rosenberg, N.A. (2010). "Inference of unexpected genetic relatedness among individuals in HapMap Phase III." The American

Journal of Human Genetics 87(4): 457-464. Permuth-Wey, J., Kim, D., Tsai, Y.-Y., Lin, H.-Y., Chen, Y.A., Barnholtz-Sloan, J., Birrer, M.J., Bloom, G., Chanock, S.J., Chen, Z., et al. (2011). "LIN28B polymorphisms influence susceptibility to epithelial ovarian cancer." Cancer Research 71(11): 3896-3903. Peter, B.M., Huerta-Sanchez, E. and Nielsen, R. (2012). "Distinguishing between Selective Sweeps from Standing Variation and from a De novo Mutation." PLoS

Genetics 8(10): e1003011. Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E., Nkadori, E., Veyrieras, J.-B., Stephens, M., Gilad, Y. and Pritchard, J.K. (2010). "Understanding mechanisms underlying human gene expression variation with RNA sequencing." Nature 464(7289): 768-772. Pociot, F., Akolkar, B., Concannon, P., Erlich, H.A., Julier, C., Morahan, G., Nierras, C.R., Todd, J.A., Rich, S.S. and Nerup, J. (2010). "Genetics of Type 1 Diabetes: What's

195

Next?" Diabetes 59(7): 1561–1571. Poke, F.S., Upcher, W.R., Sprod, O.R., Young, A., Brettingham-Moore, K.H. and Holloway, A.F. (2012). "Depletion of c-Rel from Cytokine Gene Promoters Is Required for Chromatin Reassembly and Termination of Gene Responses to T Cell Activation." PLoS ONE 7(7): e41734. Pompei, L., Jang, S., Zamlynny, B., Ravikumar, S., McBride, A., Hickman, S.P. and Salgame, P. (2007). "Disparity in IL-12 release in dendritic cells and macrophages in response to Mycobacterium tuberculosis is due to use of distinct TLRs." The Journal of

Immunology 178(8): 5192-5199. Pool, R., Heringa, J., Hoefling, M., Schulz, R., Smith, J.C. and Feenstra, K. (2012). "Enabling grand-canonical Monte Carlo: Extending the flexibility of GROMACS through the GromPy python interface module." Journal of Computational Chemistry 33(12): 1207-1214. Powell, R. (2012). "The Evolutionary Biological Implications of Human Genetic Engineering." Journal of Medicine and Philosophy 37(3): 204-225. Pratt, R. and Dzau, V. (1999). "Genomics and hypertension. Concepts, potentials and opportunities." Journal of Hypertension 33(1): 238–245. Rajab, A., Straub, V., McCann, L.J., Seelow, D., Varon, R., Barresi, R., Schulze, A., Lucke, B., Lützkendorf, S., Karbasiyan, M., et al. (2010). "Fatal cardiac arrhythmia and long-QT syndrome in a new form of congenital generalized lipodystrophy with muscle rippling (CGL4) due to PTRF-CAVIN mutations." PLoS Genetics 6(3): e1000874. Rall, S.C. and Mahley, R.W. (1992). "The role of apolipoprotein E genetic variants in lipoprotein disorders." Journal of Internal Medicine 231(6): 653–659. Rao, S., Procko, E. and Shannon, M. (2001). "Chromatin Remodeling, Measured by a Novel Real-Time Polymerase Chain Reaction Assay, Across the Proximal Promoter Region of the IL-2 Gene." The Journal of Immunology 167(8): 4494-4503. Raunch, T.A. and Pfeifer, G.P. (2005). "Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer." Laboratory

Investigation 85(9): 1172-1180. Reddy, M.V.P.L., Wang, H., Liu, S., Bode, B., Reed, J.C., Steed, R.D., Anderson, S.W., Steed, L., Hopkins, D. and She, J.-X. (2011). "Association between type 1 diabetes and GWAS SNPs in the southeast US Caucasian population." Genes and Immunity 12(3): 208–212. Reddy, S.D.N., Pakala, S.B., Ohshiro, K., Rayala, S.K. and Kumar, R. (2009). "MicroRNA-661, a c/EBPα Target, Inhibits Metastatic Tumor Antigen 1 and Regulates Its Functions." Cancer Research 69(14): 5639-5642. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., et al. (2006). "Global variation in copy number in the human genome." Nature 444(7118): 444-454.

196

Ria, M., Lagercrantz, J., Samnegård, A., Boquist, S., Hamsten, A. and Eriksson, P. (2011). "A Common Polymorphism in the Promoter Region of the TNFSF4 Gene Is Associated with Lower Allele-Specific Expression and Risk of Myocardial Infarction." PLoS ONE 6(3): e17652. Rizzo, J.M., Bard, J.E. and Buck, M.J. (2012). "Standardized collection of MNase-seq experiments enables unbiased dataset comparisons." BMC Molecular Biology 13(1): 15. Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., et al. (2007). "Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing." Nature Methods 4(8): 651–657. Robertson, K. (2005). "DNA methylation and human disease." Nature Reviews Genetics 6(8): 597-610. Rohde, C., Zhang, Y., Reinhardt, R. and Jeltsch, A. (2010). "BISMA - Fast and accurate bisulfite sequencing data analysis of individual clones from unique and repetitive sequences." BMC Bioinformatics 11(1): 230. Roisin-Bouffay, C., Castellano, R., Valéro, R., Chasson, L., Galland, F. and Naquet, P. (2008). "Mouse vanin-1 is cytoprotective for islet beta cells and regulates the development of type 1 diabetes." Diabetologia 51(7): 1192–1201. Rosenbloom, K.R., Dreszer, T.R., Long, J.C., Malladi, V.S., Sloan, C.A., Raney, B.J., Cline, M.S., Karolchik, D., Barber, G.P., Clawson, H., et al. (2012). "ENCODE whole-genome data in the UCSC Genome Browser: update 2012." Nucleic Acids Research 40(D1): D912-D917. Sajan, S.A. and Hawkins, R.D. (2012). "Methods for identifying higher-order chromatin structure." Annual Review of Genomics and Human Genetics 13: 59-82. Salmon-Divon, M., Dvinge, H., Tammoja, K. and Bertone, P. (2010). "PeakAnalyzer: Genome-wide annotation of chromatin binding and modification loci." BMC

Bioinformatics 11(1): 415. Sambrook, J. and Russell, D.W. (2001). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, New York. Sandelin, A., Alkema, W., Engström, P., Wasserman, W. and Lenhard, B. (2004). "JASPAR: an open access database for eukaryotic transcription factor binding profiles." Nucleic Acids Research 32(suppl 1): D91-D94. Sanders, S.J., Ercan-Sencicek, A.G., Hus, V., Luo, R., Murtha, M.T., Moreno-De-Luca, D., Chu, S.H., Moreau, M.P., Gupta, A.R., Thomson, S.A., et al. (2011). "Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism." Neuron 70(5): 863–885. Sandovici, I., Kassovska-Bratinova, S., Vaughan, J.E., Stewart, R., Leppert, M. and Sapienza, C. (2006). "Human Imprinted Chromosomal Regions Are Historical Hot-Spots of Recombination." PLoS Genetics 2(7): e101.

197

Satake, W., Nakabayashi, Y., Mizuta, I., Hirota, Y., Ito, C., Kubo, M., Kawaguchi, T., Tsunoda, T., Watanabe, M., Takeda, A., et al. (2009). "Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease." Nature Genetics 41(12): 1303 - 1307. Saunders, A.M., Strittmatter, W.J., Schmechel, D., George-Hyslop, P.H., Pericak-Vance, M.A., Joo, S.H., Rosi, B.L., Gusella, J.F., Crapper-MacLachlan, D.R., Alberts, M.J., et al. (1993). "Association of apolipoprotein E allele epsilon 4 with late-onset familial and sporadic Alzheimer's disease." Neurology 43(8): 1467–1472. Saxonov, S., Berg, P. and Brutlag, D.L. (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters." Proceedings of the National Academy of Sciences 103(5): 1412–1417. Scally, A. and Durbin, R. (2012). "Revising the human mutation rate: implications for understanding human evolution." Nature Reviews Genetics 13(10): 745-753. Schaefer, C., Meier, A., Rost, B. and Bromberg, Y. (2012). "SNPdbe: constructing an nsSNP functional impacts database." Bioinformatics 28(4): 601-602. Schaeffeler, E., Eichelbaum, M., Reinisch, W., Zanger, U. and Schwab, M. (2006). "Three Novel Thiopurine S-Methyltransferase Allelic Variants (TPMT*20, *21, *22) – Association With Decreased Enzyme Function." Human Mutation 27(9): 976. Schork, N.J., Murray, S.S., Frazer, K.A. and Topol, E.J. (2009). "Common vs. Rare Allele Hypotheses for Complex Diseases." Current Opinion in Genetics & Development 19(3): 212–219. Selden, R.F., Howie, K.B., Rowe, M.E., Goodman, H.M. and Moore, D.D. (1986). "Human Growth Hormone as a Reporter Gene in Regulation Studies Employing Transient Gene Expression." Molecular and Cellular Biology 6(9): 3173-3179. Shan, J., Mahfoudh, W., Dsouza, S.P., Hassen, E., Bouaouina, N., Abdelhak, S., Benhadjayed, A., Memmi, H., Mathew, R.A., Aigha, I.I., et al. (2012). "Genome-Wide Association Studies (GWAS) breast cancer susceptibility loci in Arabs: susceptibility and prognostic implications in Tunisians." Breast Cancer Research and Treatment 135(3): 715–724. Shastry, B.S. (2007). “SNPs in disease gene mapping, medicinal drug development and evolution.” Journal of Human Genetics 52(11): 871-880. Shendure, J. and Ji, H. (2008). "Next-generation DNA sequencing." Nature

Biotechnology 26(10): 1135-1145. Sherry, S., Ward, M., Kholodov, M., Baker, J., Phan, L., Smigielski, E. and Sirotkin, K. (2001). "dbSNP: the NCBI database of genetic variation." Nucleic Acids Research 29(1): 308-311. Shoemaker, R., Deng, J., Wang, W. and Zhang, K. (2010). "Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome." Genome Research 20(7): 883–889.

198

Sleiman, P., Flory, J., Imielinski, M., Bradfield, J., Annaiah, K., Willis-Owen, S.A., Wang, K., Rafaels, N.M., Michel, S., Bonnelykke, K., et al. (2010). "Variants of DENND1B associated with asthma in children." The New England Journal of Medicine 362(1): 36-44. Smith, A.J., Palmen, J., Putt, W., Talmud, P.J., Humphries, S.E. and Drenos, F. (2010). "Application of statistical and functional methodologies for the investigation of genetic determinants of coronary heart disease biomarkers: lipoprotein lipase genotype and plasma triglycerides as an exemplar." Human Molecular Genetics 19(20): 3936-3947. Smith AJP, Howard P, Shah S, Eriksson P, Stender S, et al. (2012) “Use of Allele-Specific FAIRE to Determine Functional Regulatory Polymorphism Using Large-Scale Genotyping Arrays” . PLoS Genetics 8(8): e1002908 Smith, C. and Hager, G. (1997). "Transcriptional regulation of mammalian genes in vivo. A tale of two templates." The Journal of Biological Chemistry 272(44): 27493-27496. Smith, R., Seale, R. and Yu, J. (1983). "Transcribed chromatin exhibits an altered nucleosomal spacing." Proceedings of the National Academy of Sciences 80(18): 5505-5509. Ståhl, P.L. and Lundeberg, J. (2012). "Toward the Single-Hour High-Quality Genome." Annual Review of Biochemistry 81: 359-378. Statham, A., Robinson, M., Song, J., Coolen, M., Stirzaker, C. and Clark, S. (2012). "Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA." Genome Research 22(6): 1120-1127. Steidl, U., Steidl, C., Ebralidze, A., Chapuy, B., Han, H-J., Will, B., Rosenbauer, F., et

al. (2007) "A distal single nucleotide polymorphism alters long-range regulation of the PU. 1 gene in acute myeloid leukemia." Journal of Clinical Investigation 117(9): 2611-2620. Stower, H. (2013). "Evolution: Explosive human genetic variation." Nature Reviews

Genetics 14(1): 5. Stranger, B.E., Stahl, E.A. and Raj, T. (2011). "Progress and Promise of Genome-Wide Association Studies for Human Complex Trait Genetics." Genetics 187(2): 367-383. Su, L., Creusot, R.J., Gallo, E.M., Chan, S.M., Utz, P.J., Fathman, C.G. and Ermann, J. (2004). "Murine CD4+ CD25+ regulatory T cells fail to undergo chromatin remodeling across the proximal promoter region of the IL-2 gene." The Journal of Immunology 173(8): 4994-5001. Sun, B., Chang, M., Chen, D. and Nie, P. (2006). "Gene structure and transcription of IRF-2 in the mandarin fish Siniperca chuatsi with the finding of alternative transcripts and microsatellite in the coding region." Immunogenetics 58(9): 774-784. Sun, J., Jia, P., Fanous, A.H., Webb, B.T., van den Oord, E.J.C.G., Chen, X., Bukszar, J., Kendler, K.S. and Zhao, Z. (2009). "A multi-dimensional evidence-based candidate

199

gene prioritization approach for complex diseases–schizophrenia as a case." Bioinformatics 25(19): 2595-6602. Suzuki, H., Maruyama, R., Yamamoto, E. and Kai, M. (2012). "DNA methylation and microRNA dysregulation in cancer." Molecular Oncology 6(6): 567-578. Syvanen, A., Landegren, U., Isaksson, A., Gyllensten, U. and Brookes, A. (1999). "Enthusiasm mixed with skepticism about singlenucleotide polymorphism markers for dissecting complex disorders." European Journal of Human Genetics 7: 98-101. Szpirer, C., Riviere, M., Cortese, R., Nakamura, T., Islam, M., Levan, G. and Szpirer, J. (1992). "Chromosomal localization in man and rat of the genes encoding the liver-enriched transcription factors C/EBP, DBP, and HNF1/LFB-1 (CEBP, DBP, and transcription factor 1, TCF1, respectively) and of the hepatocyte growth factor/scatter factor gene (HGF)." Genomics 13(2): 293–300. Tabor, H.K., Risch, N.J. and Myers, R.M. (2002). "Candidate-gene approaches for studying complex genetic traits: practical considerations." Nature Reviews Genetics 3(5): 391-397. Tanira, M.O. and Al Balushi, K.A. (2005). "Genetic variations related to hypertension: a review." Journal of Human Hypertension 19(1): 7-19. Tao, H., Cox, D.R. and Frazer, K.A. (2006). "Allele-Specific KRT1 Expression Is a Complex Trait." PLoS Genetics 2(6): e93. Tao, S., Wang, Z., Feng, J., Hsu, F.-C., Jin, G., Kim, S.-T., Zhang, Z., Gronberg, H., Zheng, L.S., Isaacs, W.B., et al. (2012). "A genome-wide search for loci interacting with known prostate cancer risk-associated genetic variants." Carcinogenesis 33(3): 598-603. Taylor, J.Y., Maddox, R. and Wu, C.Y. (2009). "Genetic and Environmental Risks for High Blood Pressure Among African American Mothers and Daughters." Biological

Research for Nursing 11(1): 53–65. Tenesa, A., Farrington, S.M., Prendergast, J.G., Porteous, M.E., Walker, M., Haq, N., Barnetson, R.A., Theodoratou, E., Cetnarskyj, R., Cartwright, N., et al. (2008). "Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21." Nature Genetics 40(5): 631–637. Teng, M., Ichikawa, S., Padgett, L.R., Wang, Y., Mort, M., Cooper, D.N., Koller, D.L., Foroud, T., Edenberg, H.J., Econs, M.J., et al. (2012). "regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions." Bioinformatics 28(14): 1879-1886. The International HapMap Consortium (2003). "The International HapMap Project." Nature 426(6968): 789–796. The International HapMap Consortium (2004). "Integrating ethics and science in the International HapMap Project." Nature Reviews Genetics 5(6): 467–475. The International HapMap Consortium (2007). "A second generation human haplotype

200

map of over 3.1 million SNPs." Nature 449(7164): 851-861. Thomas, P.E., Klinger, R., Furlong, L.I., Hofmann-Apitius, M. and Friedrich, C.M. (2011). "Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers." BMC Bioinformatics 12(suppl 4): S4. Thomas-Chollier, M., Hufton, A., Heinig, M., O'Keeffe, S., El Masri, N., Roider, H.G., Manke, T. and Vingron, M. (2011). "Transcription factor binding predictions using TRAP for the analysis of ChIP-seq data and regulatory SNPs." Nature Protocols 6(12): 1860–1869. Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. (2012). "The accessible chromatin landscape of the human genome." Nature 489(7414): 75-82. Timpson, N.J., Wade, K.H. and Smith, G.D. (2012). "Mendelian Randomization: Application to Cardiovascular Disease." Current Hypertension Reports 14(1): 29-37. Todd, J.A., Walker, N.M., Cooper, J.D., Smyth, D.J., Downes, K., Plagnol, V., Bailey, R., Nejentsev, S., Field, S.F., Payne, F., et al. (2007). "Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes." Nature Genetics 39(7): 857-864. Tomlinson, I.P., Webb, E., Carvajal-Carmona, L., Broderick, P., Howarth, K., Pittman, A.M., Spain, S., Lubbe, S., Walther, A., Sullivan, K., et al. (2008). "A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3." Nature Genetics 40(5): 623–630. Tomlinson, I.P.M., Houlston, R.S., Montgomery, G.W., Sieber, O.M. and Dunlop, M.G. (2012). "Investigation of the effects of DNA repair gene polymorphisms on the risk of colorectal cancer." Mutagenesis 27(2): 219-223. Tsukada, S., Tanaka, Y., Maegawa, H., Kashiwagi, A., Kawamori, R. and Maeda, S. (2006). "Intronic Polymorphisms within TFAP2B Regulate Transcriptional Activity and Affect Adipocytokine Gene Expression in Differentiated Adipocytes." Molecular

Endocrinology 20(5): 1104-1111. Tycko, B. (2010). "Allele-specific DNA methylation: beyond imprinting." Human

Molecular Genetics 19(R2): R210-R220. Ursic-Bedoya, R., Nazzari, H., Cooper, D., Triana, O., Wolff, M. and Lowenberger, C. (2008). "Identification and characterization of two novel lysozymes from Rhodnius prolixus, a vector of Chagas disease." Journal of Insect Physiology 54(3): 593–603. van Rijnsoever, M., Grieu, F., Elsaleh, H., Joseph, D. and Iacopetta, B. (2002). "Cancer Characterisation of colorectal cancers showing hypermethylation at multiple CpG islands." Gut 51(6): 797-802. Visvikis-Siest, S. and Marteau, J.-B. (2006). "Genetic variants predisposing to cardiovascular disease." Current Opinion in Lipidology 17(2): 139-151. Volkmar, M., Dedeurwaerder, S., Cunha, D.A., Ndlovu, M.N., Defrance, M., Deplus,

201

R., Calonne, E., Volkmar, U., Igoillo-Esteve, M., Naamane, N., et al. (2012). "DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients." The EMBO Journal 31(6): 1405–1426. von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S. and Bork, P. (2002). "Comparative assessment of large-scale data sets of protein-protein interactions." Nature 417(6887): 399-403. Wang, J., Fan, H.C., Behr, B. and Quake, S.R. (2012). "Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm." Cell 150(2): 402-412. Wang, K., Zhang, H., Ma, D., Bucan, M., Glessner, J.T., Abrahams, B.S., Salyakina, D., Imielinski, M., Bradfield, J.P., Sleiman, P.M., et al. (2009). "Common genetic variants on 5p14.1 associate with autism spectrum disorders." Nature 459(7246): 528-533. Wang, L., Ellsworth, K.A., Moon, I., Pelleymounter, L.L., Eckloff, B.W., Martin, Y.N., Fridley, B.L., Jenkins, G.D., Batzler, A., Suman, V.J., et al. (2010). "Therapeutics, Targets, and Chemical Biology: Functional Genetic Polymorphisms in the Aromatase Gene CYP19 Vary the Response of Breast Cancer Patients to Neoadjuvant Therapy with Aromatase Inhibitors." Cancer Research 70(1): 319-328. Wang, S., Wen, F., Wiley, G.B., Kinter, M.T., Gaffney, P.M. (2013) “An Enhancer Element Harboring Variants Associated with Systemic Lupus Erythematosus Engages the TNFAIP3Promoter to Influence A20 Expression” PLoS Genetics 9(9): e1003750. Wang, W., Ronaghi, M., Chong, S.S. and Lee, C.G.L. (2011). "pfSNP: An integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses." Human Mutation 32(1): 19-24. Wang, Z. and Moult, J. (2001). "SNPs, Protein Structure, and Disease." Human

Mutation 17(4): 263-270. Watt, F. and Molloy, P. (1988). "Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter." Genes and Development 2(9): 1136-1143. Wei, G., Hu, G., Cui, K. and Zhao, K. (2012). "Genome-wide mapping of nucleosome occupancy, histone modifications, and gene expression using next-generation sequencing technology." Methods in Enzymology 513: 297-313. Wei, S., Niu, J., Zhao, H., Liu, Z., Wang, L.-E., Han, Y., Chen, W., Amos, C., Rafnar, T., Sulem, P., et al. (2011). "Association of a novel functional promoter variant (rs2075533 C>T) in the apoptosis gene TNFSF8 with risk of lung cancer—a finding from Texas lung cancer genome-wide association study." Carcinogenesis 32(4): 507-515. Weischenfeldt, J., Symmons, O., Spitz, F. and Korbel, J.O. (2013). "Phenotypic impact of genomic structural variation: insights from and for human disease." Nature Reviews

Genetics 14(2): 125-138. Werner, T. (2010). "Next generation sequencing in functional genomics." Briefings in

202

Bioinformatics 11(5): 499-511. Willer, C.J., Sanna, S., Jackson, A.U., Scuteri, A., Bonnycastle, L.L., Clarke, R., Heath, S.C., Timpson, N.J., Najjar, S.S., Stringham, H.M., et al. (2008). "Newly identified loci that influence lipid concentrations and risk of coronary artery disease." Nature Genetics 40(2): 161 - 169. Wingender, E., Dietze, P., Karas, H. and Knüppel, R. (1996). "TRANSFAC: a database on transcription factors and their DNA binding sites." Nucleic Acids Research 24(1): 238-241. Wright, J., Brown, S. and Cole, M. (2010). "Upregulation of c-MYC in cis through a Large Chromatin Loop Linked to a Cancer Risk-Associated Single-Nucleotide Polymorphism in Colorectal Cancer Cells." Molecular Cell Biology 20(6): 1411-1420. Wu, H., Boackle, S.A., Hanvivadhanakul, P., Ulgiati, D., Grossman, J.M., Lee, Y., Shen, N., Abraham, L.J., Mercer, T.R., Park, E., et al. (2007). "Association of a common complement receptor 2 haplotype with increased risk of systemic lupus erythematosus." Proceedings of the National Academy of Sciences of the United States

of America 104(10): 3961–3966. Wu, J., Smith, L.T., Plass, C.P. and Huang, T.H.M. (2006). "ChIP-chip comes of age for genome-wide functional analysis." Cancer Research 66(14): 6899-6902. Wu, L. and Winston, F. (1997). "Evidence that Snf–Swi controls chromatin structure over both the TATA and UAS regions of the SUC2 promoter in Saccharomyces cerevisiae." Nucleic Acids Research 25(21): 4230–4234. Xia, F., Dou, Y., Lei, G. and Tan, Y. (2011). "FPGA accelerator for protein secondary structure prediction based on the GOR algorithm." BMC Bioinformatics 12(suppl 1): S5. Xiao, F., Widlak, P. and Garrard, W.T. (2007). "Engineered apoptotic nucleases for chromatin research." Nucleic Acids Research 35(13): e93. Xu, Z. and Taylor, J. (2009). "SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies." Nucleic Acids

Research 37(suppl 2): W600-W605. Yamazaki, K., McGovern, D., Ragoussis, J., Paolucci, M., Butler, H., Jewell, D., Cardon, L., Takazoe, M., Tanaka, T., Ichimori, T., et al. (2005). "Single nucleotide polymorphisms in TNFSF15 confer susceptibility to Crohn's disease." Human

Molecular Genetics 14(22): 3499–3506. Yanai, I., Benjamin, H., Shmoish, M., Chalifa-Caspi, V., Shklar, M., Ophir, R., Bar-Even, A., Horn-Saban, S., Safran, M., Domany, E., et al. (2005). "Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification." Bioinformatics 21(5): 650-659. Yanase, K., Tsukahara, S., Mitsuhashi, J. and Sugimoto, Y. (2006). "Functional SNPs of the breast cancer resistance protein â€ therapeutic effects and inhibitor development." Cancer letters 234(1): 73-80.

203

Yeh, H.-Y. and Klesius, P. (2008). "Molecular cloning, sequencing and characterization of channel catfish (Ictalurus punctatus, Rafinesque 1818) cathepsin S gene." Veterinary

Immunology and Immunopathology 126(3): 382–387. Yigit, E., Bischof, J.M., Zhang, Z., Ott, C.J., Kerschner, J.L., Leir, S., Buitrago-Delgado, E., Zhang, Q., Wang, J.P., Widom, J., et al. (2013). "Nucleosome mapping across the CFTR locus identifies novel regulatory factors." Nucleic Acids Research 41(5): 2857-2868. Zhang, K., Li, J.B., Gao, Y., Egli, D., Xie, B., Deng, J., Li, Z., Lee, J.H., Aach, J., Leproust, E.M., et al. (2009). "Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human." Nature Methods 6(8): 613–618. Zhang, L., Liu, Y., Song, F., Zheng, H., Hu, L., Lu, H., Liu, P., Hao, X., Zhang, W. and Chen, K. (2011). "Functional SNP in the microRNA-367 binding site in the 3'UTR of the calcium channel ryanodine receptor gene 3 (RYR3) affects breast cancer risk and calcification." Proceedings of the National Academy of Sciences 108(33): 13653-13658. Zhang, W.M., Zhang, Y., Zhang, L.H., Wang, S.G., Zhu, T.Y., Lin, D. and Ma, G.Z. (2005). "Nucleotide sequence and mRNA expression analysis of β-actin gene in the orange-spotted grouper Epinephelus coioides." Fish Physiology and Biochemistry 31(4): 373-383. Zhang, Y. and Jeltsch, A. (2010). "The Application of Next Generation Sequencing in DNA Methylation Analysis." Genes 1(1): 85-101. Zhang, Y., Rohde, C., Reinhardt, R., Voelcker-Rehage, C. and Jeltsch, A. (2009). "Non-imprinted allele-specific DNA methylation on human autosomes." Genome Biology 10(12): R138 Zhu, M. and Zhao, S. (2007). "Candidate Gene Identification Approach: Progress and Challenges." International Journal of Biological Sciences 3(7): 420-427. Zhu, X. and Cooper, R. (2007). "Admixture Mapping Provides Evidence of Association of the VNN1 Gene with Hypertension." PLoS ONE 2(11): e1244.

204

Appendix – Supplementary Tables and Figures

Table A.1: Description of general biochemicals and reagents

As shown are the biochemical reagents (“Biochemical/reagent”) that were used in the current study. Biochemicals were of the highest grade available and purchased from a variety of distributers (“Distributer/Supplier”).

Biochemical/reagent Distributer/Supplier

1X Dulbecco’s Phosphate Buffered Saline (PBS)

Life Technologies Australia Pty Ltd, Mulgrave, VIC, Australia (Life Technologies, Australia)

100 base pair DNA Ladder (500 µg/mL) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA)

1 kilobase DNA Ladder (500 µg/mL) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA)

Agarose, analytical grade Promega Corporation, Madison WI, USA (Promega, USA)

Ampicillin sodium salt Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Bacto-tryptone Difco Laboratories, Detroit MI, USA (Difco, USA)

Bacto-yeast extract Difco Laboratories, Detroit MI, USA (Difco, USA)

Boric acid Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Bovine Serum Albumin (BSA) New England Biolab, Ipswich, MA, USA (New England Biolabs, USA)

Bromophenol blue Merck Pty Limited, Kilsyth, Victoria, Austraila (Merck, Australia)

DH5α-T1R Chemically Competent Escherichia Coli


Difco Agar Difco Laboratories, Detroit, MI, USA (Difco, USA)

Dimethyl sulfoxide (DMSO) Merck Pty. Ltd, Kilsyth VIC, Australia (Merch, Australia)

Deoxyribonuclease I (DNase I) Invitrogen Corporation, Mount Waverley, VIC, Australia (Invitrogen Corporation, Australia)

Dithiothreitol (DTT) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Ethylenediaminetetraacetic acid (EDTA) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Ethylene glycol tetraacetic acid (EGTA) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Ethanol, absolute APS Ajax Finechem, Auburn NSW, Australia (APS Ajax Finechem, Australia)

205


Ethidium Bromide Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Fetal Bovine Serum (FBS) Life Technologies, Gaitherburg MD, USA (Life Technologies, USA)

Ficoll GE Healthcare Technologies, Sydney, Australia (GE Healthcare Limited, Australia)

Gel Loading Dye Blue (6X) New England Biolabs, Ipswich, MA, USA (New England Biolabs, USA)

Glycerol Merck Pty Limited, Kilsyth, Victoria, Austraila (Merck, Australia)

HEPES MP Biomedicals, LLC, Ohio, USA (MP Biomedicals, USA)

Hi-Di Formamide Applied Biosystems Pty Ltd, Scoresby Victoria, Australia (Applied Biosystems, Australia)

Hydrogen Chloride School of Biochemistry Chemical Store

Isopropanol (propan-2-ol) Merck Pty. Limited, Kilsyth VIC, Australia (Merch, Australia)

Liquid Nitrogen School of Biochemistry Chemical Store, Crawley, WA, Australia

Leupeptin Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Magnesium Chloride Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Micrococcal Nuclease (MNase) New England Biolab, Ipswich, MA, USA (New England Biolabs, USA)

Penicillin-Streptomycin 10 000 Units/mL Penicillin 10 000 µg/mL Streptomycin


Pepstatin Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Phosphate Buffered Saline (PBS) Life Technologies, Gaitherburg MD, USA (Life Technologies, USA)

Phenylmethylsulfonyl fluoride (PMSF) Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Poly-deoxy-inosinic-deoxy-cytidylic (Poly[d(I-C)])

Roche Diagnostics Corporation, Indianapolis IN, USA (Roche Diagnostics, USA)

Potassium Chloride CHEM-SUPPLY, Gillman, SA, Australia

Proteinase K New England Biolab, Ipswich, MA, USA (New England Biolabs, USA)

RPMI-1640 cell culture medium Life Technologies, Gaitherburg MD, USA (Life Technologies, USA)

Sodium Acetate Laboratory Supply, Osborn Park, WA, Australia (Laboratory Supply, Australia)

Sodium Chloride Merck Pty. Limited, Kilsyth VIC, Australia (Merch, Australia)

Sodium dodecyl sulphate (SDS) Sigma-Aldrich Pty Ltd, Castle Hill, NSW

206

Australia (Sigma, Australia)


Sodium Hydroxide School of Biochemistry Chemical Store, Crawley, WA, Australia

Trimza base Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Trypan Blue Stain Sigma-Aldrich Pty Ltd, Castle Hill, NSW Australia (Sigma, Australia)

Water for Irrigation Baxter Healthcare Pty Ltd, Old Toongabbie, NSW, Australia

X-Gal (5-bromo-4-chloro-3-indolyl-beta-D-galacto-pyranoside)

Promega Corporation, Madison WI, USA (Promega, USA)

207

Table A.2: Distributers for laboratory equipment used in research

As shown is the equipment (“Equipment”) that was used in the current study. Equipment was of the highest standard available and was previously purchased from a variety of distributers (“Distributer/Supplier”).

Equipment Distributer/Supplier

Beckman J2-MC Centrifuge Beckman Coulter Inc., Fullerton CA, USA (Beckman Coulter, USA)

Bioline Shaking Incubator Edwards Instrument Company, Narellan NSW, Australia (Edwards Instrument Company, Australia)

CFX96 Real-Time PCR Detection System Bio-Rad Laboratories, CA, USA (Bio-Rad Laboratories, USA)

C02 Water Jacketed Incubator Forma Scientific Inc., Marietta OH, USA (Forma Scientific, USA)

Dark reader - blue light transilluminator Clare Chemical Research, CO, USA (Clare Chemical Research, USA)

Electrophoresis Tank BIO-RAD Laboratories, Hercules, CA, USA (BIO-RAD, USA)

Eppendorf Centrifuge 5415C Eppendorf AG, Hamberg, Germany (Eppendorf, Germany)

Eppendorf Refridgerative Microcentrifuge 5417R

Eppendorf AG, Hamberg, Germany (Eppendorf, Germany)

Hypercassette Amersham International plc, Buckinghamshire, England, UK (Amersham, UK)

Jouan C312 swinging rotor centrifuge Jouan, St-Herblain, France (Jouan, France)

Nanodrop ND-100 NanoDrop Technologies, DE, USA (NanoDrop, USA)

Neubauer Counting Chamber ProSciTech, Kirwan, QLD, Australia

PTC-100 Programmable Thermal Controller

MJ Research Inc., San Francisco, CA, USA (MJ Research, USA)

Spectroline UV Transilluminator Spectronics Corporation, Westbury, NY, USA (Spectronics, USA)

Wild M40 light microscope Wild Heerbrugg, Gais, Switzerland

XCell SureLock Electrophoresis Cell Invitrogen Corporation, Mount Waverley, VIC, Australia (Invitrogen Corporation, Australia)

208

Table A.3: Distributers for general laboratory disposables used in this research

As shown are the general laboratory disposables (“Disposables”) that were used in the current study. Disposables were of the highest standard available and were purchased from a variety of distributers (“Distributer/Supplier”).

Disposables Distributer/Supplier

1.5mL Cryovials Nalge Nunc International, NY, USA (Nalge Nuc International, USA)

96-well plates Bio-Rad Laboratories, CA, USA (Bio-Rad Laboratories, USA)

15ml and 50mL Falcon Tubes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany)

0.2mL and 1.5ml Eppendorf micro-centrifuge tubes

Eppendorf AG, Hamberg, Germany (Eppendorf, Germany)

25 cm2 and 75 cm2 filter-top culture flask Stratagene, La Jolla, CA, USA

Petri dishes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany)

Pasteur pipettes Sarstedt Inc., Nürnberg, Germany (Sarstedt, Germany)

209

Table A.4: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs, with

SPA-2 and β-actin control loci in SAFHS cell 0054

DNase I-treated cultured cell line of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to DNase I-digested (DNase I cutting [%]). The data were obtained from the Ramos cell line and assayed in triplicate (n=3). The average DNase I cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype -137T/G -587G/A SPA-2 β-actin

DNase I cutting

(%)

19.995 15.991 3.754 17.761

24.990 15.310 7.218 20.387

23.940 19.949 11.591 23.417

Average 23.0 17.1 7.5 20.5

210

Table A.5: DNase I cutting at the -137 and -587 VNN1 gene promoter SNPs in

homozygous SAFHS cell lines

DNase I-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to DNase I-digested (DNase I cutting [%]). The data were obtained from cultured human cell lines derived from the SAFHS cohort for each genotype and assayed in triplicate (n=3). The average DNase I cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype 137TT 137GG 587GG 587AA

DNase I cutting (%) 31.815 25.357 20.203 24.558

29.767 26.024 17.761 22.562

29.604 24.974 16.576 20.650

Average 30.4 25.5 18.2 22.6

211

Table A.6: MNase cutting at the -137 and -587 VNN1 gene promoter SNPs, with

SPA-2 and β-actin control loci in heterozygous Ramos cell line

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines derived from the

SAFHS cohort for each genotype and assayed in triplicate (n=3). The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Genotype -137T/G -587G/A SPA-2 β-actin

MNase cutting (%) 28.327 25.346 17.166 28.105

30.898 26.273 21.606 30.466

33.118 26.193 22.299 31.191

Average 30.8 25.9 20.4 29.9

212

Table A.7: MNase cutting at the -137 VNN1 gene promoter SNP in different

SAFHS cell lines

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines (“Cell line designation") derived from the SAFHS cohort for each genotype and assayed in

quadruplicate or greater (n≥4) based on availability of frozen stocks. The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Cell line

designation 0306 0525 0029 0054 0188 0194

Genotype -137G/G -137G/G -137T/G -137T/G -137T/T -137T/T

MNase

cutting (%)

11.747 12.340 21.510 12.653 22.957 22.727

12.433 21.123 19.013 13.713 21.399 38.598

10.698 11.564 19.319 16.756 29.553 19.720

12.974 10.188 25.238 23.097 26.713 28.998

19.938 12.554 16.266 24.455 29.462

16.548 23.647 35.055

13.903

Average 13.6 13.6 18.8 16.6 24.8 29.1

213

Table A.8: MNase cutting at the -587 VNN1 gene promoter SNP in different

SAFHS cell lines

MNase-treated cell lines of the genotypes (“Genotype”) shown were used to calculate chromatin accessibility expressed as a ratio of undigested to MNase-digested (MNase cutting [%]). As shown are the data obtained from cultured cell lines (“Cell line designation") derived from the SAFHS cohort for each genotype and assayed in

quadruplicate or greater (n≥4) based on availability of frozen stocks. The average MNase cutting values for each replicate are shown in addition to the overall average (“Average”) of each genotype.

Cell line

designation 0306 0525 0029 0054 0188 0194

Genotype -587A/A -587A/A -587G/A -587G/A -587G/G -587G/G

MNase

cutting (%)

32.842 33.869 9.057 12.544 7.509 16.278

18.842 31.503 18.037 16.039 9.276 12.497

19.361 27.52 17.154 20.028 2.247 16.353

38.251 20.535 15.615 16.427 7.442 10.318

29.265 24.748 13.712 9.32

21.325 15.773

15.645

Average 26.6 27.6 15.0 16.3 7.2 13.9