Assessment of Genetic Risk Factors for
Cardiovascular Diseases in Pakistani Population
A thesis submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy
By
MUHAMMAD SHAKEEL
Dr. Panjwani Center for Molecular Medicine and Drug Research,
International Center for Chemical and Biological Sciences,
University of Karachi, Karachi-75270, Pakistan
January 2018
CERTIFICATE
TO WHOM IT MAY CONCERN
It is certified that the thesis entitled, “Assessment of Genetic Risk Factors for
Cardiovascular Diseases in Pakistani Population”, submitted to the Board of Advanced
Studies and Research (BASR), University of Karachi, by Mr. Muhammad Shakeel, fulfills the
requirements for awarding the degree of Doctor of Philosophy (Ph.D.) in Molecular
Medicine.
___________________
Dr. Ishtiaq Ahmad Khan
(Research Supervisor)
Assistant Professor
PCMD, ICCBS
University of Karachi, Karachi-75270
Pakistan.

___________________
Prof. Dr. M. Iqbal Choudhary
(H.I., S.I., T.I.)
Director ICCBS
University of Karachi, Karachi-75270
Pakistan
Dedication
To
my
Loving Parents
and
Affectionate Siblings
Acknowledgements
First of all, I bow my head before Almighty Allah for His mercy and blessings. All
love, respect and reverence to the Holy Prophet (Sallallaho Alaihe Wasallam) for
enlightening souls with the light of knowledge.
I express my gratitude to Ms. Nadira Panjwani, H.I., S.I. (Chairperson, Dr. Panjwani
Memorial Trust) for establishing the Dr. Panjwani Center for Molecular Medicine and Drug
Research (PCMD) at the International Center for Chemical and Biological Sciences
(ICCBS), University of Karachi. I am highly grateful to Prof. Dr. Atta-ur-Rahman,
F.R.S., N.I., H.I., S.I., T.I. (Patron-in-Chief ICCBS) for establishing the Jamil-ur-Rahman
Center for Genome Research, and to Prof. Dr. M. Iqbal Choudhary, H.I., S.I., T.I.
(Director ICCBS) for leading this world class institution to greater heights. I am deeply
indebted to my research supervisor Dr. Ishtiaq Ahmad Khan for his stimulating
personality, skillful guidance, keen interest, sincere advice and inspiration during the
course of my work.
I am thankful to the Higher Education Commission, Pakistan for awarding me the
Indigenous PhD Fellowship. I am greatly indebted to Prof. Dr. M. Kamran Azim for
his help, expert opinion and guidance during my research work. I am highly grateful to
Dr. Qasim Ayub of the Wellcome Trust Sanger Institute, Cambridge, UK, for
providing guidance on the analysis. I would also like to thank Dr. Waqasuddin Khan for
helping with part of my data analysis. I would also like to convey my deep
gratitude to all my teachers at the International Center for Chemical and Biological
Sciences (ICCBS), from whom I learnt a lot during my stay at ICCBS.
I am pleased to convey my thanks to my colleagues Muhammad Irfan and Atia Gohar
for their suggestions and help whenever I needed it.
I am highly grateful for the prayers of my mother and siblings, especially my eldest
brother, who not only encouraged but also supported me throughout this work.
Muhammad Shakeel
Karachi, January 2018
Table of Contents
Acknowledgements ........................................................................................................ I
Table of Contents ......................................................................................................... II
List of Figures .............................................................................................................. VI
List of Tables ............................................................................................................... IX
Abbreviations ............................................................................................................... XI
Summary ................................................................................................................... XIII
Summary (Urdu) ......................................................................................................... XVI
1.0 Introduction ................................................................................................................ 1
1.1 Cardiovascular Diseases .........................................................................................2
1.2 Prevalence of Cardiovascular Diseases...................................................................2
1.3 Risk Factors of Cardiovascular Diseases.................................................................3
1.4 Genetic Risk Factors for Cardiovascular Diseases ..................................................5
1.4.1 Genetics of Coronary Heart Disease and Myocardial Infarction .................................... 6
1.4.2 Genetics of Hypertension ............................................................................................. 7
1.4.3 Genetics of Congenital Heart Diseases ........................................................................ 9
1.4.4 Genetics of Cardiomyopathies .................................................................................... 11
1.5 Genetics of Obesity ...............................................................................................13
1.6 Mutational Load for Cardiovascular Diseases ........................................................15
1.7 Genetic Research on Cardiovascular Diseases in Pakistan...................................17
1.8 Objectives of the Study ..........................................................................................19
2.0 Materials and Methods .................................................................................................. 20
2.0 Scheme of Study ............................................................................................... 21
2.1 Estimating the Mutational Load for Cardiovascular Diseases in Pakistani
Population and its Comparison with Global Populations ................................... 22
2.1.1 Genes Involved in Cardiovascular Diseases................................................................ 22
2.1.2 Genomic/Exomic Datasets Used .................................................................................. 23
2.1.3 The Analysis Pipeline .................................................................................................. 24
2.1.4 Filtration of Variants by ClinVar Database ................................................................... 27
2.1.5 Comparison of Allele Frequencies of Deleterious Variants of CVDs with Global
Populations ............................................................................................................ 27
2.1.6 Genetic Differentiation of Deleterious Variants ....................................................... 28
2.1.7 Linkage Analysis of Deleterious Variants................................................................ 29
2.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia
and Coronary Artery Disease ....................................................................... 30
2.2.1 Samples Collection and DNA Isolation ................................................................... 30
2.2.2 DNA Quality Assessment and Quantification .......................................................... 31
2.2.3 Library Preparation and DNA Sequencing .............................................................. 32
2.2.3.1 Fragmentation of Genomic DNA ............................................................................ 32
2.2.3.2 Mate-paired Library Preparation ............................................................................. 32
2.2.3.3 Evaluation of the Library with Bioanalyzer .............................................................. 34
2.2.3.4 Preparation of Emulsion, Emulsion-PCR, and Beads Enrichment .......................... 34
2.2.3.5 3′-Modification of Template Beads ......................................................................... 35
2.2.3.6 Loading the Flow Chip with Template Beads for Sequencing Reactions ................ 36
2.2.4 Analysis of the Genomic Data ................................................................................ 38
2.2.4.1 Filtration of Poor Quality Short Reads .................................................................... 38
2.2.4.2 Alignment of Short Reads with the Reference Human Genome .............................. 39
2.2.4.3 Post Alignment Processing and Variants Calling .................................................... 39
2.2.5 Assessing the Genetic Variants related to Hyperlipidemia, and related Cardiac
Disorders ................................................................................................................ 42
2.3 Whole Exome Sequencing of Patients with Cardiomyopathy ....................... 43
2.3.1 Selection of Cardiomyopathy Patients .................................................................... 43
2.3.2 Collection of Blood Samples, and DNA Isolation and Quantification ....................... 44
2.3.3 Library Preparation and Exome Enrichment for Whole Exome Sequencing ........... 44
2.3.3.1 Fragmentation of Genomic DNA ............................................................................ 44
2.3.3.2 End-repair of the Fragmented DNA ........................................................................ 46
2.3.3.3 Purification and Adenylation of End-repaired DNA ................................................. 46
2.3.3.4 Ligation of Paired-end Adaptors ............................................................................. 47
2.3.3.5 Amplification of Adaptors-ligated Library ................................................................ 48
2.3.3.6 Assessment of Quality and Quantity of the Amplified Library .................................. 49
2.3.3.7 Hybridization and Exome Capturing ...................................................................... 49
2.3.3.8 Capturing the Hybridized DNA using Streptavidin-coated Beads ............................ 51
2.3.3.9 Amplification of Captured Library with Indexing Primers ......................................... 51
2.3.3.10 Sequencing by Synthesis on Illumina Platform ........................................................ 52
2.3.4 Analysis of Whole Exome Sequencing Raw Data .................................................... 53
2.3.5 Analysis of Variants for Cardiomyopathy ................................................................. 56
3.0 Results and Discussion ................................................................................................ 57
3.1 Mutational Load of Cardiovascular Diseases in Pakistani Population and its
Comparison with Global Populations .............................................................. 58
3.1.1 Gene Ontology .......................................................................................................... 58
3.1.2 Mutational Load of CVDs in Pakistani Population using 1000 Genomes PJL,
ExAC SAS, and British Pakistanis Datasets .............................................................. 58
3.1.3 Filtration of Variants from ClinVar Database .............................................................. 69
3.1.4 Comparative Analysis of Allele Frequencies of Predicted Deleterious Variants ......... 83
3.1.5 Functional Annotation of Deleterious Variants ........................................................... 89
3.1.6 Differentiation of Deleterious Variants in Pakistani Population ................................... 92
3.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia
and Coronary Artery Disease ........................................................................ 100
3.2.1 Quality Assessment of Genomic DNA ..................................................................... 100
3.2.2 Fragmentation of Genomic DNA and Size Selection ................................................ 100
3.2.3 Mate-Paired Library Preparation .............................................................................. 101
3.2.4 Evaluation of the Mate-Paired Library ...................................................................... 102
3.2.6 Analysis of Whole Genome Sequencing Data ......................................................... 104
3.2.7 Analysis for Deleterious Mutations Related to Hyperlipidemia and Related Cardiac
Diseases.................................................................................................................. 106
3.2.8 Filtration for Disease Mutations Related to Hyperlipidemia and Related Cardiac
Diseases.................................................................................................................. 111
3.3 Whole Exome Sequencing and Analysis of Pakistani Patients with
Cardiomyopathy ............................................................................................ 115
3.3.1 Sequencing Reads ................................................................................................... 115
3.3.2 Quality Assessment of Raw Short Reads ................................................................. 115
3.3.3 Alignment with the Reference Genome and Variants Calling .................................... 117
3.3.4 Annotation of Single Nucleotide Variants (SNVs) and Analysis ................................. 121
3.3.4.1 Annotation with ANNOVAR, and CADD .................................................................. 121
3.3.4.2 Annotation with Variant Effect Predictor (VEP) ........................................................ 132
3.3.5 Annotation of Small Indels and Analysis .................................................................. 136
3.3.5.1 Annotation with CADD ............................................................................................. 136
3.3.5.2 Annotation with VEP ................................................................................................ 136
3.3.6 Filtration of Variants of ClinVar, OMIM, and GWAS databases ................................ 137
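4.0 Conclusion ................................................................................................................... 138
5.0 Publications ................................................................................................................. 140
6.0 References ................................................................................................................... 141
7.0 Appendix Table 1 .......................................................................................................... 162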
List of Figures
Figure 1.1 Classical and new risk factors of CVDs. ................................................................ 4
Figure 1.2 Nephron and genes in the collecting duct and distal tubule involved in
reabsorption of Na+ ions and resulting in hypertension. ........................................ 8
Figure 1.3 Various forms of congenital heart diseases ........................................................... 9
Figure 1.4 A schematic short axis cross-sectional view of heart representing various
forms of cardiomyopathies. ................................................................................. 11
Figure 2.1 The outline of methodology for determining the genetic risk factors for
CVDs in Pakistani population. ............................................................................. 21
Figure 2.2 Number of genes analyzed for common, Mendelian and congenital CVDs
in this study. ........................................................................................................ 23
Figure 2.3 The pipeline to find and analyze the deleterious variants related to cardiac
diseases in Pakistani population. ......................................................................... 25
Figure 2.4 The reactions of sequencing by oligomer ligation and detection (SOLiD)
technology. .......................................................................................................... 37
Figure 2.5 NGS workflow for fragment library preparation and paired-end sequencing
on Illumina. .......................................................................................................... 45
Figure 3.1 Functional categorization of genes involved in cardiovascular diseases. ............. 59
Figure 3.2 The proportions of nonsynonymous, synonymous, and deleterious SNVs
in three datasets. ................................................................................................. 61
Figure 3.3 The number of SNVs predicted as deleterious by CADD, Polyphen2, and
SIFT in genes of cardiovascular diseases. .......................................................... 62
Figure 3.4 Chromosomal positions of deleterious variants in TTN. The deleterious
variants are bunched in initial exons of the gene. ............................................. 64
Figure 3.5 ClinVar's pathogenic and likely pathogenic variants from ExAC SAS having
significantly higher allele frequency in SAS than in other populations. .............. 71
Figure 3.6 Mutational load of different cardiovascular disorders in terms of allele
counts of ClinVar's pathogenic and likely pathogenic variants. ........................ 72
Figure 3.7 Chromosomal positions of genes harboring ClinVar's pathogenic and
likely pathogenic variants associated with cardiovascular diseases. .................. 73
Figure 3.8 Allele frequency spectrum (AFS) of deleterious SNVs in three datasets:
(A) 1000 Genomes PJL, (B) ExAC South Asians, and (C) British Pakistanis. ........ 83
Figure 3.9 Allele frequency spectrum using the common deleterious SNVs of DAF≥10%
of three datasets. .............................................................................................. 85
Figure 3.10 Comparative distribution of allele frequencies of shared deleterious SNVs
in PJL versus all continental groups of 1000 Genomes Project. ....................... 88
Figure 3.11 Manhattan plot for FST values between PJL and SAS populations of the
1000 Genomes Project. ..................................................................................... 94
Figure 3.12 Comparison of the proportions of moderately, greatly, and severely
differentiated deleterious SNVs and all SNVs in genes harboring deleterious SNVs. ... 95
Figure 3.13 Principal Components Analysis (PCA) using the genes-set of CVDs.
A. PCA using all low and rare allele frequency (AF≤5.0%) SNVs, B. PCA
using all common allele frequency (AF>5.0%) SNVs. C. PCA using
deleterious low and rare allele frequency (AF≤5.0%) SNVs, D. PCA
using deleterious common allele frequency (AF>5.0%) SNVs. ......................... 98
Figure 3.14 Site frequency spectra for PJL, 5 other populations of the 1000 Genomes
Project, and one Southeast Asian population, 'Malay', using data from the same
number of individuals (n=96) of each population for normalization.
A. Comparison of low frequency deleterious SNVs in the gene set of CVDs.
B. Percent homozygous deleterious SNVs in each population. .......................... 99
Figure 3.15 Agarose gel electrophoresis of genomic DNA isolated from the obese individual. 100
Figure 3.16 A. Fragmentation of genomic DNA using the Covaris S220 system.
B. Size selection by slicing the most intense part of fragmented DNA. ........... 101
Figure 3.17 A schematic illustration of one fragment of the mate-paired library. ................. 101
Figure 3.18 A 2% E-Gel showing the position of mate-paired library in lane no. 2. ............ 102
Figure 3.19 Evaluation of the mate-paired library by Bioanalyzer 2100. ............................ 103
Figure 3.20 Distribution of the depth (DP) of variants. ....................................................... 105
Figure 3.21 The predicted deleterious variants with SIFT, Polyphen2, and CADD. ........... 106
Figure 3.22 Validated deleterious SNVs having higher allele frequency in SAS
populations than in global populations. ........................................................... 109
Figure 3.23 Comparison of Global and South Asian allele frequencies for variants of
hyperlipidemia (blue) and ischemic heart diseases (red). ............................... 113
Figure 3.24 Phred quality score distribution of forward and reverse 'fastq' files. ................ 116
Figure 3.25 Insert sizes for all five BAM files. ........................................................... 118
Figure 3.26 Histogram for the depth of coverage for SNPs (A) and indels (B). .................. 120
Figure 3.27 Venn diagram showing the number of SNVs predicted as deleterious by
SIFT, Polyphen2, and with CADD_phred score ≥ 15. ..................................... 125
Figure 3.28 The SNVs predicted as deleterious by SIFT. .................................................. 126
Figure 3.29 The SNVs predicted as deleterious by Polyphen2. ......................................... 126
Figure 3.30 The SNVs with CADD_phred score ≥ 15. ....................................................... 127
Figure 3.31 The jointly predicted deleterious SNVs with CADD (phred score ≥ 15),
SIFT, and Polyphen2 tools. ............................................................................ 127
Figure 3.32 Site Frequency Spectrum of all SNVs (A), and deleterious SNVs (B). ........... 129
Figure 3.33 Scatter plot of 350 deleterious SNVs for comparison of derived allele
frequencies in South Asia and in Global populations. .................................... 131
Figure 3.34 Numbers of Loss of Function SNPs according to functional consequences. .. 132
Figure 3.35 Loss of Function (LoF) SNVs. (A) Allele frequency spectrum of all LoF SNVs
in South Asia. (B) Genomic evolutionary rate profiling (GERP++) scores for
LoF SNVs. ...................................................................................................... 134
Figure 3.36 Functional consequences of indels with CADD_phred ≥ 15. ............................ 136
Figure 3.37 Loss of Function indels according to functional consequences. ....................... 137
List of Tables
Table 1.1 Estimated disability adjusted life years (DALYs) due to CVDs in Pakistan
during the period of 2000-2015. ........................................................................... 3
Table 2.1 Populations of 1000 Genomes Project used for principal components
analysis (PCA). .................................................................................................. 29
Table 2.2 Covaris protocol for fragmenting genomic DNA. ................................................ 32
Table 2.3 PCR conditions for the amplification of mate-paired library. ............................... 33
Table 2.4 Components for preparing the emulsion for ePCR. ............................................ 34
Table 2.5 Determining the amount of template to be used in emulsion preparation,
using the e-calculator-Life Technologies. ........................................................... 35
Table 2.6 Settings on the Covaris instrument for gDNA fragmentation .............................. 46
Table 2.7 Components of End Repair master mix .............................................................. 46
Table 2.8 Components of Adenylation master mix ............................................................. 47
Table 2.9 Components for ligation of paired-end adaptors ................................................. 48
Table 2.10 Components for amplifying the library ................................................................ 48
Table 2.11 PCR program for amplification of adaptor ligated library .................................... 49
Table 2.12 Components of Block Mix .................................................................................. 50
Table 2.13 Components of Hybridization Buffer ................................................................... 50
Table 2.14 Components of Capture Library Hybridization Mix for capture size ≥3 Mb ........ 50
Table 2.15 Components of PCR for indexing ....................................................................... 52
Table 2.16 PCR program for indexing the library ................................................................. 52
Table 3.1 The subset of variants within the coordinates of the gene set of CVDs. ............ 60
Table 3.2 Genes of Mendelian and congenital CVDs containing high number of
predicted deleterious variants in ExAC SAS....................................................... 65
Table 3.3 Genes of common, Mendelian and congenital CVDs containing high
number of predicted deleterious variants in British Pakistanis. ........................... 67
Table 3.4 ClinVar's pathogenic and likely pathogenic variants filtered from 1000
Genomes PJL dataset. ...................................................................................... 74
Table 3.5 ClinVar's pathogenic and likely pathogenic variants filtered from ExAC
SAS dataset. ...................................................................................................... 74
Table 3.6 ClinVar's pathogenic and likely pathogenic variants filtered from
British Pakistanis dataset. .................................................................................. 80
Table 3.7 The proportion of shared deleterious SNVs (sdSNVs) with other
populations of 1000 Genomes Project and ExAC. ............................................. 86
Table 3.8 Deleterious LoF SNVs filtered from ExAC SAS dataset in genes of
Mendelian and congenital CVDs. ....................................................................... 90
Table 3.9 Novel deleterious SNVs filtered from British Pakistanis dataset in genes
of CVDs. ............................................................................................................ 91
Table 3.10 Deleterious SNVs greatly and severely differentiated in PJL than in global
populations of the 1000 Genomes Project. ............................................................ 96
Table 3.11 The number of variants in different genomic regions as calculated from
ANNOVAR annotation. .................................................................................... 106
Table 3.12 27 predicted deleterious non-synonymous SNVs in hyperlipidemia
proband in genes of CVDs. .............................................................................. 108
Table 3.13 Common variants associated with hyperlipidemia and CAD filtered from
GWAS-Catalogue and having 1.5 fold or higher allele frequency in SAS
than in Global populations. ............................................................................... 114
Table 3.14 Quality assessment of raw reads in CMP patients' fastq files ........................... 115
Table 3.15 Mapped reads and raw depth of coverage for BAM files .................................. 117
Table 3.16 Numbers of variants after applying different filters ........................................... 119
Table 3.17 Numbers of variants after applying different filters ........................................... 121
Table 3.18 The number of SNVs pertaining to different genomic regions and
functions after annotation with ANNOVAR. ...................................................... 122
Table 3.19 The top 1% genes containing nonsynonymous mutations. ............................... 123
Table 3.20 The homozygous deleterious SNVs present in all five patients of this study. ... 128
Table 3.21 The homozygous deleterious SNVs with Global MAF < 1%. ............................ 130
Table 3.22 The LoF SNVs affecting all transcripts of their genes. ...................................... 135
Abbreviations
Abbreviations Description
AFR African
AFS Allele Frequency Spectrum
AMR American
ANNOVAR Annotation of Variants
BAM Binary Alignment Map
BMI Body Mass Index
BWA Burrows-Wheeler Aligner
CAD Coronary Artery Disease
CADD Combined Annotation Dependent Depletion
CHD Congenital Heart Disease
ClinVar Clinical Variation
CTAB Cetyltrimethylammonium Bromide
CVDs Cardiovascular Diseases
DAF Derived Allele Frequency
DALYs Disability Adjusted Life Years
DCM Dilated Cardiomyopathy
DNA Deoxyribonucleic Acid
DOAF Disease Ontology Annotation Framework
DP Depth
EAS East Asian
EDTA Ethylenediaminetetraacetic Acid
ePCR Emulsion Polymerase Chain Reaction
EUR European
ExAC Exome Aggregation Consortium
FHS Framingham Heart Study
FIN Finnish European
FST Fixation Index
GATK Genome Analysis Tool Kit
GERP Genome Evolutionary Rate Profiling
GQ Genotype Quality
GWAS Genome Wide Association Studies
HapMap Haplotype Map
HCM Hypertrophic Cardiomyopathy
HDL High Density Lipoprotein
HPO Human Phenotype Ontology
ICBP International Consortium for Blood Pressure
ICD-10 International Classification of Diseases, 10th Revision
LDL Low Density Lipoprotein
LoF Loss of Function
Mb Megabases
NFE Non-Finnish European
NGS Next Generation DNA Sequencing
Nonsyn Nonsynonymous
OMIM Online Mendelian Inheritance in Man
PCA Principal Components Analysis
PCR Polymerase Chain Reaction
PJL Punjabi Lahore, Pakistan
Polyphen2 Polymorphism Phenotyping v2
PROMIS Pakistan Risk of Myocardial Infarction Study
QUAL Quality Score
SAM Sequence Alignment Map
SAS South Asian
SFS Site Frequency Spectrum
SIFT Sorting Intolerant from Tolerant
SNPs Single Nucleotide Polymorphisms
SNV Single Nucleotide Variation
SOLiD Sequencing by Oligomer Ligation and Detection
Syn Synonymous
T2D Type 2 Diabetes Mellitus
TAE buffer Tris-Acetate EDTA Buffer
TE buffer Tris-EDTA Buffer
TG Triglycerides
Ti/Tv Transitions/Transversions
ToF Tetralogy of Fallot
UCSC University of California, Santa Cruz
VEP Variant Effect Predictor
VSD Ventricular Septal Defects
WHO World Health Organization
Summary
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for
17.7 million deaths every year. In Pakistan, the prevalence of CVDs is also considerably
high. CVDs are multifactorial, with many risk factors, including genetic predisposition,
involved in their pathophysiology. Genetically, CVDs may be monogenic or polygenic.
Moreover, the genetic predisposition to cardiac disorders is heterogeneous across
different populations of the world. This study aims to investigate the genetic risk
factors for CVDs in the Pakistani population.
In this study, whole genome sequencing data of Pakistani individuals (PJL) from the
1000 Genomes Project (n=96), whole exome sequencing data of South Asians (SAS) from the
Exome Aggregation Consortium (ExAC; predominantly individuals from Pakistan; n=8256),
and whole exome sequencing data of British Pakistanis (n=3222) were analyzed with
different bioinformatics tools against a manually curated list of 1187 genes associated
with major CVDs. Annotation of variants in protein coding regions with ANNOVAR and CADD
identified 561 deleterious variants in the 1000 Genomes PJL dataset, 7374 in the ExAC
SAS dataset, and 6028 in the British Pakistanis dataset. Analysis with VEP identified 3,
30, and 29 loss of function variants in these three datasets, respectively. Further
filtration against the ClinVar database revealed 3 pathogenic and 2 likely pathogenic
variants in 1000 Genomes PJL, 112 pathogenic and 42 likely pathogenic variants in ExAC
SAS, and 42 pathogenic and 16 likely pathogenic variants in the British Pakistanis
dataset.
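The filtering steps summarized above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `Variant` record and its field names, the consensus rule (CADD PHRED ≥ 15 plus damaging calls from both SIFT and Polyphen2, borrowing the threshold named in Figures 3.27 to 3.31), and the set of VEP consequence terms counted as loss of function are all illustrative assumptions, and the toy input is invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical, simplified record; real ANNOVAR/CADD/VEP outputs carry many
# more columns under tool-specific names.
@dataclass
class Variant:
    gene: str
    cadd_phred: Optional[float]  # CADD PHRED-scaled score (None if unscored)
    sift: Optional[str]          # "D" = deleterious, "T" = tolerated
    polyphen2: Optional[str]     # "D" = probably, "P" = possibly damaging, "B" = benign
    consequence: str             # Sequence Ontology term as reported by VEP
    clinvar: Optional[str]       # ClinVar clinical significance, if any

CADD_PHRED_MIN = 15.0  # assumed cut-off (cf. "CADD_phred score >= 15")
# Assumed set of consequences treated as loss of function (LoF)
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}

def is_deleterious(v: Variant) -> bool:
    """Assumed consensus rule: high CADD score and damaging by SIFT and Polyphen2."""
    return (
        v.cadd_phred is not None
        and v.cadd_phred >= CADD_PHRED_MIN
        and v.sift == "D"
        and v.polyphen2 in {"D", "P"}
    )

def summarize(variants: Iterable[Variant], cvd_genes: set) -> dict:
    """Count deleterious, LoF and ClinVar-flagged variants inside the CVD gene set."""
    in_set = [v for v in variants if v.gene in cvd_genes]
    return {
        "in_gene_set": len(in_set),
        "deleterious": sum(is_deleterious(v) for v in in_set),
        "lof": sum(v.consequence in LOF_CONSEQUENCES for v in in_set),
        "pathogenic": sum(v.clinvar == "Pathogenic" for v in in_set),
        "likely_pathogenic": sum(v.clinvar == "Likely pathogenic" for v in in_set),
    }

# Toy input (invented values, not data from this study)
cvd_genes = {"TTN", "MYH7", "LDLR"}
variants = [
    Variant("TTN", 25.1, "D", "D", "missense_variant", None),
    Variant("MYH7", 12.0, "D", "P", "missense_variant", "Likely pathogenic"),
    Variant("LDLR", 35.0, None, None, "stop_gained", "Pathogenic"),
    Variant("OCA2", 30.0, "D", "D", "missense_variant", None),  # outside gene set
]
counts = summarize(variants, cvd_genes)
print(counts)
```

Requiring agreement between predictors is a conservative choice: the nonsense variant in the toy LDLR record is counted as LoF and pathogenic but not as "deleterious", because SIFT and Polyphen2 do not score stop-gained changes.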
Comparative analysis of the prioritized deleterious variants showed that many of them have a twofold or higher allele frequency in the Pakistani population than in other populations of the world. Likewise, population differentiation analysis highlighted 10 deleterious SNVs greatly differentiated from world populations and 2 deleterious SNVs moderately differentiated from other South Asian populations. For deleterious CVD mutations, principal component analysis grouped the Pakistani and other South Asian populations with European and American populations.
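The terms "greatly" and "moderately" differentiated correspond to Wright's qualitative guidelines for FST (0.05-0.15 moderate, 0.15-0.25 great, above 0.25 very great). The sketch below computes a per-SNV FST with Hudson's estimator and applies these categories. The choice of estimator, the allele frequencies, and the sample sizes are assumptions for illustration, not the method or data of this study.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's FST for one biallelic SNV.

    p1, p2: alternate-allele frequencies; n1, n2: numbers of sampled chromosomes.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

def wright_category(fst):
    """Wright's qualitative guide to the degree of differentiation."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"

# Hypothetical: 96 diploid individuals (192 chromosomes) vs. a reference panel.
fst = hudson_fst(0.30, 192, 0.05, 1006)
print(round(fst, 3), wright_category(fst))
```

The sample-size terms in the numerator correct for sampling noise, so identical frequencies give a slightly negative value that is conventionally read as no differentiation.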
To further analyze the filtered data for CVDs, whole genome sequencing of an individual with hyperlipidemia, obesity, and coronary artery disease was carried out on the SOLiD 5500xl NGS system, and whole exome sequencing of 5 patients with dilated cardiomyopathy was carried out on an Illumina NGS system. After variant calling and application of the same analysis pipeline, 27 deleterious SNVs were observed in 25 genes associated with hyperlipidemia and risk of coronary artery disease. Two genes, MTRR (methionine synthase reductase) and PLB1 (phospholipase B1), contained two deleterious variants each and are associated with low levels of low density lipoprotein cholesterol (LDL-C) and risk of coronary artery disease. Furthermore, 11 deleterious variants, also filtered from the healthy dataset, had significantly higher allele frequencies in SAS populations than in other populations of the world. In addition, two genes, KCNJ12 (potassium voltage-gated channel subfamily J member 12) and CDC27 (cell division cycle 27 protein), carried the highest number of deleterious nonsynonymous and non-coding variants.
Whole exome analysis of the 5 dilated cardiomyopathy patients identified 54 variants in genes associated with dilated cardiomyopathy, which were also prioritized in the mutational load analysis. The highest number of deleterious variants was observed in the TTN (titin) and MUC19 (mucin 19) genes. There were also 19 deleterious SNVs in the homozygous state with a global minor allele frequency < 1.0%. Overall, 278 deleterious SNVs had higher allele frequencies in SAS than in other populations of the world. Further, three rare (AF < 1%) loss-of-function SNVs in the C2orf40, MYOM3, and TMED4 genes, a homozygous frameshift insertion in RTKN2, and a homozygous splice site deletion in SLC6A6 were found in at least one of the patients.
To conclude, this study presents a comprehensive picture of deleterious mutations for cardiac disorders in the Pakistani population. In descending order, the mutational load for major CVDs was highest for hypertension, followed by atherosclerosis, coronary aneurysm, heart failure, coronary artery disease, cardiomyopathies, cardiac arrhythmias, and congenital heart defects. The effect of this genetic predisposition, which is a non-modifiable risk factor, can be reduced by minimizing modifiable risk factors, for example through a healthy lifestyle.
Summary
Heart diseases are the diseases that cause the most deaths in the world. According to the World Health Organization, they cause a very large number of deaths worldwide every year. In Pakistan too, the rate of heart disease is quite high, and these diseases are responsible for a large number of deaths. Heart diseases are caused by several factors, including hereditary factors, in which the genetic material (genes) present in human cells plays a role. In some heart diseases a change in one or a few genes causes the disease, whereas in others many genes are involved. This research studies the gene changes that cause heart diseases and are found in Pakistani people.
To analyze these gene changes, we obtained three datasets containing the complete genomes and complete exomes of Pakistani people and analyzed the genes linked with heart diseases: complete genome data of Pakistani individuals from the 1000 Genomes Project, complete exome data of individuals in the ExAC project, which includes Pakistanis, and complete exome data of Pakistanis living in Britain. These data were analyzed with computer programs such as ANNOVAR, CADD, and Variant Effect Predictor. The analysis found deleterious changes in genes related to heart diseases in the Pakistani individuals of the 1000 Genomes Project, in the ExAC data, and in British Pakistanis. Many of these deleterious changes occur at a considerably higher rate in Pakistani people than in other populations of the world, and for some of them the difference is very large. In addition, changes that completely stop protein production were found in heart disease genes in the Pakistani individuals of the 1000 Genomes Project, in the ExAC data, and in British Pakistanis.
To verify the genetic changes identified as linked with heart diseases, we used Next Generation DNA Sequencing to determine the complete genome of one obese Pakistani individual and the complete exomes of patients with cardiomyopathy. This analysis confirmed deleterious genetic changes in a number of different genes in the obese individual; two deleterious changes each were found in the MTRR and PLB1 genes. Both genes increase the amount of fat in the body and cause coronary artery disease. Similarly, deleterious genetic changes were confirmed in the cardiomyopathy patients, with the largest number in the TTN and MUC19 genes. In addition, two changes that disrupt protein function were found in the LPL and CLDN5 genes of the obese individual; these genes raise cholesterol levels in the body.
In summary, the largest number of genetic changes in Pakistani people were associated with high blood pressure, followed by fat deposition in the blood vessels; genetic changes related to heart failure, cardiomyopathy, long QT syndrome, coronary artery disease, and congenital heart diseases were also found.
Chapter 1.0
Introduction
1.1 Cardiovascular Diseases
Cardiovascular disease (CVD) is any disorder of the heart and the blood vessels. CVDs form a group of disorders that includes coronary artery disease (coronary heart disease), cerebrovascular disease (stroke), peripheral arterial disease, hypertension, rheumatic heart disease, congenital heart disease, cardiomyopathies, cardiac arrhythmias, deep vein thrombosis, and pulmonary embolism (World Health Organization, 2017a). CVDs are multifactorial in nature: several environmental and genetic factors are involved in the pathophysiology of these disorders (O'donnell, and Nabel, 2011). Conditions such as coronary heart disease, stroke, and peripheral arterial disease involve restricted blood flow through the arteries supplying the heart, the brain, and the peripheral organs, respectively (British Heart Foundation, 2017). Hypertension is a condition in which circulating blood exerts a greater than normal force against the walls of the blood vessels. In rheumatic heart disease, the heart muscle or valves are damaged by rheumatic fever, which is caused by streptococcal bacteria. Congenital heart diseases are malformations of the heart or related vessels present at birth. In deep vein thrombosis, a blood clot in a peripheral vein, e.g. in the leg, can hamper normal blood flow to the heart, or it can dislodge and travel to the heart and lungs, causing pulmonary embolism (World Health Organization, 2017a).
1.2 Prevalence of Cardiovascular Diseases
Cardiovascular diseases are the leading cause of death globally. Approximately 17.7 million deaths occur due to CVDs every year, accounting for 31% of all global deaths. More than 75% of CVD deaths occur in low- and middle-income countries (World Health Organization, 2017a). East Asia, Southeast Asia, and South Asia, where Pakistan is located, showed the largest increase in premature mortality due to CVDs over the past 20 years (Roth et al., 2015). Although data on the prevalence of CVDs in Pakistan remain sparse (Aziz, Faruqui, Patel, and Jaffery, 2012), Pakistan faces a considerable burden of CVDs in terms of morbidity and mortality. The World Health Organization reported 11.473 million disability adjusted life years (DALYs) lost to CVDs in Pakistan during 2000-2015 (Table 1.1), which was 30.84% of the burden of non-communicable diseases in the country (World Health Organization, 2016).
Table 1.1: Estimated disability adjusted life years (DALYs) due to CVDs in Pakistan during 2000-2015.
S.No. | Disease | Estimated DALYs (thousands)
1 | Ischemic heart disease | 6178.1
2 | Stroke | 2729.6
3 | Congenital heart anomalies | 1198.5
4 | Rheumatic heart disease | 677.7
5 | Hypertensive heart disease | 422.3
6 | Cardiomyopathy, myocarditis, endocarditis | 102.4
7 | Other circulatory diseases | 164.1
  | Total | 11472.7
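The figures in Table 1.1 can be checked with simple arithmetic: the rows sum to the 11.473 million DALYs quoted in the text, and ischemic heart disease and stroke together account for about 77.6% of the CVD burden. The short script below uses only the values from the table to reproduce the total and each category's share.

```python
# DALYs (thousands) from Table 1.1, Pakistan, 2000-2015.
dalys = {
    "Ischemic heart disease": 6178.1,
    "Stroke": 2729.6,
    "Congenital heart anomalies": 1198.5,
    "Rheumatic heart disease": 677.7,
    "Hypertensive heart disease": 422.3,
    "Cardiomyopathy, myocarditis, endocarditis": 102.4,
    "Other circulatory diseases": 164.1,
}

total = sum(dalys.values())
print(f"Total: {total:.1f} thousand DALYs")
# Each category's percentage share of the total, largest first.
for disease, value in sorted(dalys.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{disease:<45}{100 * value / total:6.2f}%")
```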
1.3 Risk Factors for Cardiovascular Diseases
A risk factor can be defined as a variable that shows a statistically significant association with a clinical condition (Brotman, Walker, Lauer, and O'Brien, 2005). Risk factors are important for assessing predisposition to disease, enabling better prevention and control. The risk factors for CVDs were initially identified through epidemiological approaches. For example, the prospective Framingham Heart Study (FHS) found that male sex, age, smoking, hypertension, and diabetes mellitus are related to the risk of developing CVDs (Dawber et al., 1959). Later, risk factors for CVDs were investigated in empirical studies with a case-control approach, in which circulating lipids, especially low density lipoprotein (LDL) cholesterol, were found to be associated with the development of coronary heart disease (Kannel, Dawber, Friedman, Glennon, and Mcnamara, 1964; Kannel, Dawber, Kagan, Revotskie, and Stokes, 1961). In recent years, large studies such as the INTERHEART case-control study have identified nine risk factors for myocardial infarction: smoking, a raised ApoB/ApoA1 (atherogenic/atheroprotective apolipoprotein) ratio, hypertension, diabetes, abdominal obesity, psychosocial factors, low daily intake of fruits and vegetables, alcohol consumption, and low physical activity (Yusuf et al., 2004). At present, more than 100 risk factors have been linked with various cardiac diseases (Brotman, Walker, Lauer, and O'Brien, 2005). CVD risk factors have been divided into two broad categories, classical and new risk factors, and the classical risk factors are further divided into modifiable and non-modifiable risk factors (Figure 1.1).
Figure 1.1: Classical and new risk factors of CVDs (Badimon, and Vilahur, 2012).
1.4 Genetic Risk Factors for Cardiovascular Diseases
Multiple factors, both environmental and genetic, are involved in the pathogenesis of cardiovascular diseases. The interplay between these two types of risk factors is complex, and their contribution to disease onset differs between CVDs and between individual patients (Delles, McBride, Padmanabhan, and Dominiczak, 2008). Most CVDs result from complex interactions among many genes at diverse loci, in addition to gene-environment interactions (Kelly, and Fuster, 2010). When combined with modifiable risk factors such as smoking, alcoholism, and lack of physical activity, hereditary risk factors increase susceptibility to heart disease (Centers for Disease Control and Prevention, 2017). Studies of the genetic predisposition to CVDs began about 30 years ago with the expectation of identifying genetic variants that could be incorporated into risk assessment models alongside modifiable risk factors. This extensive research showed that CVDs are genetically quite heterogeneous (Cambien, and Tiret, 2007). Based on these findings, CVDs have been divided into two groups: monogenic and polygenic. The monogenic forms of CVDs are rare and are caused by mutations in a single gene, e.g. hypertrophic and dilated cardiomyopathy, long QT syndrome, and other channelopathies. Certain Mendelian disorders also contribute to the onset of CVDs; for example, familial hypercholesterolemia leads to coronary heart disease, peripheral artery disease, and stroke. The incidence of such cardiac disorders is also higher in people with a family history of the disease. Polygenic CVDs, on the other hand, are complex and multifactorial, e.g. hypertension, myocardial infarction, coronary artery disease, and aortic aneurysm. These common forms of CVDs are caused by genetic variation in multiple genes, each of which has little effect alone but which produce disease when acting in combination with causal or modifier genes. Some rare variants also confer risk for such common CVDs (Arnett et al., 2007; Faita, Vecoli, Foffa, and Andreassi, 2012; O'donnell, and Nabel, 2011). A brief review of the genetic basis of some highly prevalent CVDs is given here.
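Under the polygenic model described above, many small-effect variants act jointly. A common way to summarize this is an additive polygenic risk score: the sum over variants of the number of risk alleles carried multiplied by the log of the per-allele odds ratio. The sketch below is a minimal illustration of that idea; the variant identifiers and odds ratios are hypothetical, and this is not a method used in this thesis.

```python
import math

# Hypothetical effect alleles and per-allele odds ratios (illustrative only).
PER_ALLELE_OR = {"snp_a": 1.20, "snp_b": 1.10, "snp_c": 0.90}

def polygenic_score(dosages):
    """Additive score: sum of (effect-allele count) x ln(per-allele odds ratio)."""
    score = 0.0
    for snp, odds_ratio in PER_ALLELE_OR.items():
        dosage = dosages.get(snp, 0)  # 0, 1 or 2 copies; missing treated as 0
        score += dosage * math.log(odds_ratio)
    return score

person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = polygenic_score(person)
# exp(score) is the odds relative to someone carrying no effect alleles,
# assuming the variants act multiplicatively on the odds scale.
print(round(score, 3), round(math.exp(score), 3))
```

Each variant contributes only a small amount, but the contributions add up on the log-odds scale, which is the sense in which individually weak variants can jointly shift risk.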
1.4.1 Genetics of Coronary Heart Disease and Myocardial Infarction
Family and epidemiological studies have demonstrated for decades that 40%-60% of the risk of coronary heart disease is hereditary. Follow-up studies of the Framingham Study also showed that susceptibility to coronary heart disease was increased 2.4-fold in men and 2.2-fold in women with a family history of the disease (Ozaki, and Tanaka, 2016). The first genetic risk locus for myocardial infarction and early-onset coronary artery disease was identified at band 21.3 on the short arm (p) of chromosome 9 (9p21). Common variants adjacent to CDKN2A and CDKN2B at this locus were found to confer a 2.02-fold higher risk of early-onset disease (Helgadottir et al., 2007). Genome-scale studies of the genetic risk factors for CVDs using large cohorts of cases and controls, such as the Coronary Artery Disease Genome-Wide Replication and Meta-Analysis (CARDIoGRAM) study, identified 13 novel loci associated with coronary artery disease, in addition to confirming 10 previously identified loci (Schunkert et al., 2011). Likewise, the Coronary Artery Disease (C4D) Genetics Consortium identified 5 new loci through genome wide association studies (GWAS) of 21,408 CAD cases and 19,185 controls (Coronary Artery Disease (C4D) Genetic Consortium, 2011). Merging these two large studies led to the formation of a new consortium, CARDIoGRAMplusC4D, which identified 15 novel risk loci for coronary artery disease (Deloukas et al., 2013). In addition to these consortia, many independent genetic studies in specific populations identified further loci associated with coronary artery disease, bringing the total to 51 risk loci. Many of these risk variants are involved in lipid metabolism, including LDL and cholesterol metabolism, while others are involved in inflammation, cell proliferation and differentiation, and vasoconstriction. However, the mechanisms by which some of these variants confer risk of coronary artery disease are still unknown (Ozaki, and Tanaka, 2016). Sequencing of the protein coding regions (whole exome) of large cohorts has also identified many genes carrying substantially more deleterious variants in CAD cases than in controls. Whole exome sequencing of families with myocardial infarction highlighted the role of the GUCY1A3 and CCT7 genes, which are involved in nitric oxide signaling (Erdmann et al., 2013). The APOC3 gene, which encodes apolipoprotein C-III, harbors several loss-of-function mutations that are associated with lower triglyceride levels and a reduced risk of CAD (TG and HDL Working Group of Exome Sequencing Project, 2014). Whole exome sequencing of about 5000 cases of early-onset myocardial infarction revealed the role of detrimental mutations in the APOA5 and LDLR genes (Do et al., 2015).
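Case-control comparisons such as those cited above are commonly summarized as odds ratios. The sketch below computes the odds ratio for carrying a variant, with a Woolf (log-based) 95% confidence interval, from a 2x2 table of carrier counts; the counts are hypothetical, the function name is illustrative, and the 0.5 zero-cell correction is one common convention rather than a universal rule.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell when any cell is zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts: 60 of 1000 cases and 30 of 1000 controls carry the variant.
or_, lower, upper = odds_ratio_ci(60, 940, 30, 970)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A confidence interval that excludes 1 indicates that carriers are significantly over-represented among cases at the chosen level.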
1.4.2 Genetics of Hypertension
High blood pressure is a major contributor to cardiovascular diseases and can lead to ischemic heart disease or stroke. Studies have shown that genetic factors contribute to about 50% of hypertension cases (Jeanemaitre, Gimenez-Roqueplo, Disse-Nicodeme, and Corvol, 2007). Like the atherosclerotic CVDs, hypertension is a complex genetic trait caused by variation in multiple genes, because blood pressure is maintained by a complex network of physiological systems, including vascular, renal, endocrine, and neuronal mechanisms (Doris, 2002). Monogenic forms of hypertension were found to be caused mostly by mutations in genes affecting renal salt handling and the renin-angiotensin-aldosterone system, which controls salt-water homeostasis in the body and maintains normal blood pressure (Lifton, Gharavi, and Geller, 2001). Mutations in SCNN1B, which encodes the beta subunit of the epithelial sodium channel, increase the number of sodium channels in the apical membrane. Because this channel mediates sodium reabsorption in the renal tubule, more channels mean more sodium reabsorption across the apical membrane, which raises blood pressure (Hansson et al., 1995). Mutations in the gene encoding 11-beta-hydroxysteroid dehydrogenase type 2 (HSD11B2) result in excess mineralocorticoid activity, which also increases renal sodium reabsorption. Mutations in the serine-threonine kinases encoded by the WNK1 and WNK4 genes have also been linked with hypertension (Wilson et al., 2001). Other studies identified mutations in genes of ion transporters and channels that alter salt-water homeostasis and thereby blood pressure; loss-of-function mutations in these genes cause inherited salt-wasting disorders with low rather than high blood pressure. They include the solute carrier family 12, member 3 gene (SLC12A3) (Simon, Nelson-Williams, et al., 1996), the solute carrier family 12, member 1 gene (SLC12A1) (Simon, Karet, et al., 1996), the inwardly rectifying potassium channel, subfamily J, member 1 gene (KCNJ1) (DiPietro, Trachtman, Sanjad, and Liftonl, 1996), the chloride voltage-gated channel Kb gene (CLCNKB) (Stonez et al., 1997), and the genes for the beta (SCNN1B) and gamma (SCNN1G) subunits of the non-voltage-gated epithelial sodium channel (Chang et al., 1996).
Figure 1.2: Nephron and genes in the collecting duct and distal tubule involved in reabsorption of Na+ ions and resulting in hypertension (Luft, 2017).
Recent large scale genome wide association studies (GWAS) and their meta-analyses have identified many risk loci linked with primary hypertension. The International Consortium for Blood Pressure Genome-Wide Association Studies (ICBP-GWAS) reported 28 loci associated with systolic and diastolic blood pressure (International Consortium for Blood Pressure Genome-Wide Association Studies, 2011). Further large independent studies on the genetics of hypertension have identified other loci linked with hypertension. To date, 185 single nucleotide polymorphisms (SNPs) at various loci have been catalogued as associated with hypertension (Hindorff, Junkins, Mehta, and Manolio, 2011).
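Each SNP in a GWAS is tested for association with the trait. For a binary outcome such as hypertension status, the simplest single-SNP test is a chi-square test on allele counts in cases and controls; large consortia typically use regression models with covariates instead, and blood pressure itself is often analyzed as a quantitative trait. The sketch below is therefore only a simplified illustration with hypothetical counts, compared against the conventional genome-wide significance threshold of 5 × 10^-8.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: 1000 cases and 1000 controls (2000 alleles each).
chi2, p = allelic_chi2(700, 1300, 550, 1450)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this example the association is nominally strong but does not reach genome-wide significance, which illustrates why very large samples are needed to detect small-effect loci.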
1.4.3 Genetics of Congenital Heart Diseases
Congenital Heart Disease (CHD) is a malformation of the heart present at birth. CHDs are the most common type of birth defect, accounting for up to one third of all major birth defects (van der Bom, Bouma, Meijboom, Zwinderman, and Mulder, 2012). This group of CVDs comprises structural abnormalities of the heart, such as abnormalities of the cardiac valves and cardiac septum and lesions of the outflow tracts. It includes simple heart defects such as atrial septal defects (ASD), ventricular septal defects (VSD), patent ductus arteriosus (PDA), and pulmonary valve stenosis, as well as complex defects such as tetralogy of Fallot (TOF), a combination of four heart defects: a VSD, pulmonary valve stenosis, right ventricular hypertrophy, and an overriding aorta (National Heart Lung and Blood Institute, 2017).
Figure 1.3: Various forms of congenital heart diseases
Genetically, congenital heart diseases are also heterogeneous. Genetic evidence for CHD began with the finding of de novo deletions at the chromosome 22q11 locus and of trisomy 21 (Antonarakis, Lyle, Dermitzakis, Reymond, and Deutsch, 2004; Goldmuntz, 2005). Several studies showed that mutations in genes involved in cardiac development, such as NKX2-5, which encodes a homeodomain protein (Schott et al., 1998), GATA4, which encodes GATA binding protein 4 (Garg, Kathiriya, Barnes, and Schluterman, 2003), and NOTCH1, which encodes a transmembrane protein of the NOTCH family (Garg, Muth, Ransom, and Schluterman, 2005), lead to various forms of CHD. Further studies identified many chromosomal structural variations associated with highly penetrant CHDs, such as trisomy 13, trisomy 18, and deletions at the 22q11.2, 7q11.23, and 5p15.2 loci (Fahed, Gelb, Seidman, and Seidman, 2013). Mutations in key cardiac transcription factors that result in haploinsufficiency are responsible for inherited and sporadic congenital heart diseases (Pulignani, Cresci, and Andreassi, 2013). These include de novo substitutions in NR2F2, which encodes a pleiotropic developmental transcription factor, causing atrioventricular septal defect (Al Turki et al., 2014), and mutations in T-box transcription factors such as TBX3 and TBX5, which play roles in the development and maintenance of the cardiac conduction system (Postma, Bezzina, and Christoffels, 2016). Many mutations in regulatory regions, such as the promoters and enhancers of some genes, have also been linked with CHDs. Variations in regulatory regions predispose to or cause disease by altering transcription factor binding and thereby gene expression. To date, more than 50 human genes have been implicated in different congenital heart abnormalities (Postma, Bezzina, and Christoffels, 2016).
1.4.4 Genetics of Cardiomyopathies
Cardiomyopathies are a group of cardiac disorders involving structural and functional abnormalities of the heart muscle. By definition, heart muscle damage caused by hypertension, coronary artery disease, congenital heart disease, or valvular heart disease is excluded from the cardiomyopathies (Elliott, 2000). Cardiomyopathies are classified by the type of abnormality and its localization in the heart muscle into hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), restrictive cardiomyopathy (RCM), and arrhythmogenic right ventricular dysplasia (ARVD) (Figure 1.4).
Figure 1.4: A schematic short axis cross-sectional view of heart representing various forms of cardiomyopathies (Davies, 2000).
Hypertrophic cardiomyopathy (HCM), in which the ventricular walls become thickened, is the most common inherited cardiovascular disorder (Jacoby, and McKenna, 2012). Genetic studies have shown HCM to be genetically heterogeneous, following autosomal dominant as well as autosomal recessive patterns of inheritance, with age- and sex-dependent incomplete penetrance (Sabater‐Molina, Pérez‐Sánchez, Hernández del Rincón, and Gimeno, 2017). The majority of mutations have been identified in genes encoding sarcomeric proteins. About 70% of HCM-related mutations have been identified in the genes encoding cardiac myosin binding protein C (MYBPC3) and β-myosin heavy chain (MYH7). Other genes harboring pathogenic variants for HCM, each at a frequency of 1-5%, include TPM1, TNNT2, TNNI3, ACTC1, MYL2, and MYL3 (Lopes et al., 2015). High throughput sequencing has identified new genes contributing to the pathophysiology of HCM, extending the list to dozens of genes, including genes encoding non-sarcomeric proteins such as Z-disc and Ca2+-handling proteins. Variants in genes of desmosomal ion channels and of titin have been found in up to 43% and 64% of cases, along with variants in (Sabater‐Molina, Pérez‐Sánchez, Hernández del Rincón, and Gimeno, 2017).
Dilated cardiomyopathy (DCM) is the most common cause of cardiac death in young adults. In DCM, the left ventricle is dilated, with thinning of the ventricular walls and systolic dysfunction (Hershberger, Hedges, and Morales, 2013). DCM may be idiopathic or have a hereditary cause (25-30%). Studies have also determined that 50% of idiopathic DCM cases were genetic (Mahon et al., 2005). Like HCM, DCM is genetically heterogeneous, showing autosomal dominant, autosomal recessive, X-linked, and mitochondrial patterns of inheritance. Genetic studies have identified a number of genes contributing to the pathophysiology of DCM, among which titin (TTN), lamin A/C (LMNA), cardiac troponin T (TNNT2), β-myosin heavy chain (MYH7), and BCL2-associated athanogene 3 (BAG3) are major players. To date, over 40 genes have been associated with DCM, many of which encode sarcomeric and cytoskeletal proteins. Many genes responsible for DCM also overlap with those responsible for HCM (Park, 2017). In restrictive cardiomyopathy, mutations in cardiac troponin I (cTnI) have been found to increase myofibril sensitivity to calcium, which impairs ventricular relaxation (Liu et al., 2016).
1.5 Genetics of Obesity
Obesity is the excessive accumulation of fat in the body that impairs health. It is usually defined using the body mass index (BMI), a person's weight in kilograms divided by the square of their height in meters (kg/m2). An adult with a BMI ≥ 30 is considered obese (World Health Organization, 2017b). The prevalence of obesity is high in high-income as well as in middle- and low-income countries (Ng et al., 2014). Globally, over 600 million adults aged over 18 years have been reported to be obese (World Health Organization, 2017b).
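The BMI definition above translates directly into code. The sketch below computes BMI from weight in kilograms and height in meters and applies the WHO adult cut-offs (underweight below 18.5, normal 18.5-24.9, overweight 25-29.9, obese 30 or above); the example weight and height are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by the square of height."""
    return weight_kg / height_m ** 2

def who_adult_class(value):
    """WHO adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"

value = bmi(95, 1.75)  # hypothetical adult: 95 kg, 1.75 m
print(f"BMI = {value:.1f} kg/m^2 -> {who_adult_class(value)}")
```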
Obesity is a complex metabolic disorder associated with other pathophysiological conditions such as dyslipidaemia, atherosclerosis, hypertension, coronary heart disease, type 2 diabetes mellitus (T2D), and certain types of cancer (Poirier et al., 2006; Switzer, Mangat, and Karmali, 2013). Obesity is one of the prime risk factors underlying the elevated prevalence of CVDs; it is strongly associated with hypertension, which leads to coronary heart disease and heart failure (Akil, and Ahmad, 2011; Artham, Lavie, Milani, and Ventura, 2009). The genetic factors for obesity are therefore also risk factors for cardiovascular diseases. Genetically, obesity has been classified into monogenic and polygenic obesity. Monogenic forms of obesity may be syndromic or non-syndromic and follow autosomal or X-linked patterns of Mendelian inheritance, e.g. abdominal obesity-metabolic syndrome 3 (OMIM # 615812) and body mass index quantitative trait locus 9 (OMIM # 602025). Variations in genes regulating appetite and related metabolism have been found to cause these types of obesity (Waalen, 2014). Bardet-Biedl syndrome, a major form of syndromic obesity, is caused by variations in a class of 19 genes named BBS1 to BBS19 (Pigeyre, Yazdi, Kaur, and Meyre, 2016). The products of these genes affect signaling through the leptin receptor (LEPR) (Seo et al., 2009). Prader-Willi syndrome, another syndromic form of obesity, was found to be caused by deletions at chromosome 15q11.2-q13 and by variations in genes such as MAGEL2, MKRN3, NPAP1, and SNURF-SNRPN (Angulo, Butler, and Cataletto, 2015; Pigeyre, Yazdi, Kaur, and Meyre, 2016). In Cohen syndrome, variations in COH1 (VPS13B) at chromosome 8q22 have been found responsible for the pathophysiology (Kolehmainen et al., 2003), while Alström syndrome is caused by variations in ALMS1 (Collin et al., 2002). For non-syndromic obesity, a number of heterozygous and homozygous loss-of-function mutations have been identified in genes such as LEP (leptin), LEPR (leptin receptor), MC4R (melanocortin 4 receptor), POMC (proopiomelanocortin), SH2B1 (SH2B adaptor protein 1), and NTRK2 (neurotrophic tyrosine kinase receptor type 2), with varying degrees of penetrance (Pigeyre, Yazdi, Kaur, and Meyre, 2016). The list of associated genes has grown through genome wide association studies (GWAS), in which genes such as FTO and MC4R emerged as strong candidate genes for obesity (Srivastava, Srivastava, and Mittal, 2016). For polygenic obesity, the predictive risk conferred by genetic variants is still poorly understood, possibly because many variants of small effect size act together to produce the phenotype (Yeo, 2017). Recently, whole genome sequencing of TALLYHO/Jng (TH), a mouse model of polygenic obesity, revealed 1601 deleterious non-synonymous mutations in 1148 genes. The same study noted that 99.83% of the 1.21 million indels were in non-coding regions, including intronic, intergenic, and 5 kb upstream or downstream regions (Denvir et al., 2016). To date, more than 100 loci have been associated with obesity (Yeo, 2017).
1.6 Mutational Load for Cardiovascular Diseases
Mutational load, or burden, is a population genetics concept describing the harmful effect of the deleterious variants in an individual's genome on that individual's fitness, through which it contributes to susceptibility to complex disorders (Howrigan et al., 2011). The overall fitness of a population is reduced by the emergence of detrimental genetic variants. Mutational load is one of the components of genetic load, which determines the genetic make-up of populations; the other components are inbreeding load, segregation load, and transitory load (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). The reasons for the emergence of detrimental variants in populations have remained contentious among biologists. Studies have suggested that deleterious variants arose in populations during range expansions during or after the Out-of-Africa event. During the expansion of populations into new territories, many neutral variants rose to high frequencies at the expanding front through genetic drift, a phenomenon termed 'gene surfing' (Edmonds, Lillie, and Cavalli-Sforza, 2004; Klopfstein, Currat, and Excoffier, 2005). The surfing effect can also carry detrimental mutations to high frequencies at the expanding front, and it also affects variants that influence reproductive rate (Travis et al., 2007). Recent empirical studies based on whole genome/whole exome sequencing of large cohorts from human populations have revealed that populations differ in neutral and deleterious variants depending on their evolutionary background. On average, non-African populations bear more deleterious variants than African populations (Lohmueller et al., 2008). This has been attributed to a severe bottleneck experienced by the ancestral non-African population after the Out-of-Africa event (Keinan, Mullikin, Patterson, and Reich, 2007; H. Li, and Durbin, 2011). It has been estimated that non-African populations carry, on average, a slightly but significantly larger number of predicted deleterious mutations than African populations (Fu, Gittelman, Bamshad, and Akey, 2014). Large scale DNA sequencing data also indicate that, on average, a person carries 281-515 missense substitutions predicted to be highly damaging, 40-85 of which are in the homozygous state (Xue et al., 2012). Such detrimental variants may not cause apparent disease symptoms in healthy individuals, possibly because of low penetrance, because they are in the heterozygous state (particularly variants associated with autosomal recessive disorders), or because they are associated with late-onset diseases.
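At the level of a single person, mutational load can be summarized by counting the deleterious variants carried, split by zygosity, as in the per-individual estimates above. The sketch below assumes genotypes coded as the number of alternate alleles (0, 1, or 2) and a set of variant identifiers already classified as deleterious by an earlier annotation step; the identifiers and genotypes are hypothetical. The homozygous count reflects a recessive view of load and the allele count an additive view.

```python
from collections import Counter

def mutational_load(genotypes, deleterious_ids):
    """Count deleterious variants carried by one individual, split by zygosity.

    genotypes: mapping of variant id -> alternate-allele count (0, 1 or 2).
    deleterious_ids: ids already classified as deleterious by earlier annotation.
    """
    load = Counter()
    for variant_id, dosage in genotypes.items():
        if variant_id not in deleterious_ids or dosage == 0:
            continue
        load["heterozygous" if dosage == 1 else "homozygous"] += 1
        load["deleterious_alleles"] += dosage  # additive (allele-count) summary
    return load

# Hypothetical genotypes for one person and a hypothetical deleterious set.
genotypes = {"v1": 1, "v2": 2, "v3": 0, "v4": 1, "v5": 2}
deleterious = {"v1", "v2", "v3", "v5"}
print(dict(mutational_load(genotypes, deleterious)))
```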
Deleterious variants of different allele frequencies have different effects on the fitness of individuals and, consequently, on their susceptibility to diseases. It has been hypothesized that common variants have smaller effects on disease susceptibility, whereas rare variants have larger effects in monogenic and familial as well as complex genetic disorders (Lettre, 2014).
A comprehensive literature survey shows that continental populations have been evaluated for general deleterious mutational load and its history in the context of population demographics. To date, there are no reports of studies quantifying the mutational load for specific human diseases. This gap needs to be addressed rigorously with whole genome/whole exome sequencing data. Quantifying the mutational load for specific diseases can provide a framework for understanding how these diseases have evolved over human history despite the filter of purifying selection. Cardiovascular diseases, as described earlier, are a group of monogenic and polygenic disorders of the heart and blood vessels, in which a complex interplay of many genes leads to the appearance of cardiac disorders. A number of studies have elucidated the genetic basis of various common and Mendelian cardiac diseases using large cohorts of patients and controls. However, the evolution of deleterious and disease causing variants for CVDs has not been investigated so far. Estimating the mutational load from deleterious variants in cardiac disease genes will help to understand the pattern of their emergence in human populations. Comparing allele frequencies across populations would show how evolutionary forces have distributed these detrimental variants differentially among populations, thereby producing differences in their underlying mutational load.
17
1.7 Genetic Research on Cardiovascular Diseases in Pakistan
Pakistan is the fifth most populous country in the world, with a rapidly growing population,
and it faces serious health care challenges. Consanguineous marriages, which are common in
Pakistan, are a possible cause of genetic disorders, including cardiovascular diseases (Haq
et al., 2011). Estimates show that one in five middle-aged adults may have subclinical
coronary artery disease. In a study of the prevalence of coronary artery disease in rural
areas of Peshawar, the prevalence of myocardial infarction was reported to be 11.2%
(Mahmood-ul-Hassan, Awan, Gul, Sahibzada, and Hafizullah, 2005). The prevalence of various
forms of congenital heart defects was reported as 3.4 per 1,000 births in another study
(Rizvi, Mustafa, Kundi, and Khan, 2015). Despite the substantial burden of cardiovascular
diseases, little genomic research on CVDs has been carried out in Pakistan. The INTERHEART
study (15,152 cases and 14,820 controls), in which metabolic and socio-economic factors were
studied in relation to myocardial infarction, drew fewer than 5% of its cases from Pakistan
(Yusuf et al., 2004). More recently, the Pakistan Risk of Myocardial Infarction Study
(PROMIS) analyzed the whole exomes of 4,793 myocardial infarction cases and 5,710 controls,
and highlighted 49,138 rare (minor allele frequency <1%) predicted loss-of-function (pLoF)
mutations in 1,317 genes. In this study, pLoF mutations in genes such as PLA2G7, CYP2F1,
TREH, A3GALT2, NRG4, APOC3, and SLC9A3R1 were examined in relation to lipid metabolism and
susceptibility to myocardial infarction (Danish Saleheen et al., 2017). In collaboration
with other consortia, PROMIS has also identified, through genome-wide association studies,
variants in different genes associated with coronary heart disease and myocardial
infarction (Golbus et al., 2016; Webb et al., 2017). In addition, separate studies have
screened single genes, small sets of genes, or a few previously associated SNPs in major
cardiovascular diseases such as coronary artery disease (Hussain, Bibi, and Javed, 2011;
Iqbal et al., 2005; Shahid et al., 2017), myocardial infarction (Ahmed et al., 2011; Iqbal
et al., 2004; Perwaiz Iqbal et al., 2016; Saeed et al., 2007; Danish Saleheen et al., 2010),
hypertension (Alvi, and Hasnain, 2009; Nawaz, and Hasnain, 2011; Umedani, Chaudhry, Mehraj,
and Ishaq, 2013), hypercholesterolemia (Ahmed et al., 2013; Ajmal et al., 2011), and
cardiomyopathies (Abid, Akhtar, Khaliq, and Mehdi, 2011; Hussain, Haroon, Ejaz, and Javed,
2016; Liaquat, Asifa, Zeenat, and Javed, 2014; Rafiq et al., 2017).
1.8 Objectives of the Study
Genomic research on cardiovascular diseases in Pakistan is not commensurate with their
burden in the country. To fill this gap, extensive genetic research needs to be carried out
in the Pakistani population. Against this background, this study aims to assess and
estimate the underlying mutational burden of cardiovascular diseases in the Pakistani
population. The following objectives were set:
I. To analyze publicly available whole genome/exome data of the Pakistani population from
different studies/consortia using bioinformatics tools such as ANNOVAR (Yang, and Wang,
2015), Combined Annotation Dependent Depletion (CADD) (Kircher et al., 2014), and Variant
Effect Predictor (VEP) (McLaren et al., 2016), in order to quantify the mutational load for
common and Mendelian CVDs. These datasets include the 1000 Genomes Project (Punjabi from
Lahore, PJL) (1000 Genomes Project, 2015), the South Asian subset of the Exome Aggregation
Consortium (ExAC) (Lek et al., 2016), which predominantly contains Pakistani samples from
the Pakistan Risk of Myocardial Infarction Study (PROMIS) cohort (Danesh Saleheen et al.,
2015), and British Pakistanis (Narasimhan et al., 2016). In addition, the ClinVar and OMIM
databases will be filtered for pathogenic and likely pathogenic variants associated with
CVDs. The allele frequencies of prioritized variants will be compared with those of global
populations to relate the patterns of CVD genetic risk in the Pakistani population to those
in other populations of the world.
II. To sequence the complete genome of a Pakistani individual with hyperlipidemia, obesity,
and coronary artery disease using next generation sequencing (NGS) technology, and to
analyze it for the deleterious genetic variants related to hyperlipidemia and coronary
artery disease that were prioritized in the mutational load analysis.
III. To sequence the whole exomes of five patients with dilated cardiomyopathy, and to
analyze them for the deleterious genetic variants related to dilated cardiomyopathy that
were prioritized in the mutational load analysis.
Chapter 2.0
Materials and Methods
2.0 Scheme of Study
To determine the genetic risk factors possibly responsible for cardiovascular diseases in
the Pakistani population, a systematic empirical approach was adopted. The methodology
consisted of three phases (Figure 2.1):
2.1 Estimating the mutational burden for CVDs using publicly available whole genome/exome
sequencing data of the Pakistani population, and comparing it with other global populations.
2.2 Whole genome sequencing of a Pakistani individual with hyperlipidemia and
coronary artery disease through next generation sequencing (NGS) technology
and its analysis.
2.3 Whole exome sequencing of five Pakistani patients with dilated cardiomyopathy
to evaluate the genetic risk factors.
Figure 2.1: The outline of methodology for determining the genetic risk factors for CVDs in Pakistani population.
2.1 Estimating the Mutational Load for Cardiovascular Diseases in
Pakistani Population and its Comparison with Global
Populations
To determine the mutational load for cardiovascular diseases in the Pakistani population, a
pipeline (Figure 2.3) was established in which all genes previously reported to be involved
in CVDs were compiled by mining disease databases and surveying the literature. The
mutational load in these genes was then calculated using various bioinformatics tools. The
detailed methodology for estimating the CVD burden and comparing it with other populations
of the world is described below.
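The exact per-individual counting rule is not spelled out at this point. As a minimal sketch only, assuming that the ALT allele of each variant is the deleterious allele and that genotypes are complete (no missing calls), two common ways of summarizing load, an additive count of deleterious alleles and a recessive count of homozygous deleterious genotypes, could be computed as follows. The function and argument names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def per_individual_load(
    genotypes: Dict[str, List[int]], deleterious_ids: Set[str]
) -> Tuple[List[int], List[int]]:
    """Count deleterious alleles and homozygous deleterious genotypes per individual.

    genotypes: variant ID -> ALT-allele counts (0, 1 or 2), one per individual,
               in the same individual order for every variant (assumed complete).
    deleterious_ids: IDs of variants classed as deleterious.
    """
    if not genotypes:
        return [], []
    n = len(next(iter(genotypes.values())))
    allele_load = [0] * n  # additive count: heterozygote = 1, ALT homozygote = 2
    hom_load = [0] * n     # recessive count: only ALT homozygotes contribute
    for vid, calls in genotypes.items():
        if vid not in deleterious_ids:
            continue
        for i, g in enumerate(calls):
            allele_load[i] += g
            if g == 2:
                hom_load[i] += 1
    return allele_load, hom_load


# Toy example: three individuals, three variants (v3 is not deleterious).
toy = {"v1": [0, 1, 2], "v2": [2, 0, 1], "v3": [1, 1, 1]}
alleles, homs = per_individual_load(toy, {"v1", "v2"})
print(alleles, homs)  # [2, 1, 3] [1, 0, 1]
```

Keeping both counts side by side is a design choice: the additive count reflects total exposure, while the homozygous count is closer to what matters for recessive disorders.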
2.1.1 Genes Involved in Cardiovascular Diseases
To identify the genes reported to be associated with cardiac diseases, three databases,
i.e., Online Mendelian Inheritance in Man (OMIM), ClinVar, and Disease Ontology Annotation
Framework (DOAF) (Hamosh, Scott, Amberger, Bocchini, and McKusick, 2005; Landrum et al.,
2014; W. Xu et al., 2012), were searched. The genes were retrieved from these databases
using the search terms 'heart', 'cardio', 'cardiac', 'myocardial', 'coronary',
'cardiomyopathy', 'arteriopathy', 'aneurysm', 'atherosclerosis', 'septal defect',
'tetralogy of fallot', 'septal noncompaction', 'arterial', 'atrial', 'hypertension',
'hypercholesterolemia', 'hyper triglyceridemia', 'QT syndrome', and some manually selected
names of cardiac disorders. To validate these terms, they were compared against two further
resources, i.e., the Human Phenotype Ontology (Köhler et al., 2014) and the WHO
International Classification of Diseases (ICD-10) database. After manual curation based on
the literature, a final list of 1,187 genes was prepared and carried forward for the current
analysis (Appendix Table 1). Of these, 379 genes were involved in Mendelian and congenital
cardiac disorders such as cardiomyopathies, cardiac arrhythmias, and atrioventricular septal
defects, while the remaining 808 genes contributed to common CVDs such as hypertension,
hypercholesterolemia, myocardial infarction, and coronary artery disease (Figure 2.2). The
structural and functional roles of the products of these genes were determined from Gene
Ontology terms using the UniProt Gene Ontology Annotation database for human (version 2.0)
(Camon et al., 2004). The ontology terms were visualized using the online tool BGI WEGO
(http://wego.genomics.org.cn/cgi-bin/wego/index.pl) (Ye et al., 2006).
Figure 2.2: Number of genes analyzed for common,
Mendelian and congenital CVDs in this study.
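The term-based retrieval from the disease databases can be sketched as a case-insensitive substring search over disease names. This is an illustration under stated assumptions, not the procedure actually used. It assumes that each database has been exported to (gene symbol, disease name) pairs, which is a hypothetical input format, and it omits the manually selected disorder names. Substring matching is broad by design (for example, 'atrial' also matches inside longer disorder names), which is why the resulting candidates still need the manual curation described above.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Search terms listed in the text; the manually selected disorder names are not reproduced.
SEARCH_TERMS = [
    "heart", "cardio", "cardiac", "myocardial", "coronary", "cardiomyopathy",
    "arteriopathy", "aneurysm", "atherosclerosis", "septal defect",
    "tetralogy of fallot", "septal noncompaction", "arterial", "atrial",
    "hypertension", "hypercholesterolemia", "hyper triglyceridemia", "QT syndrome",
]
PATTERN = re.compile("|".join(re.escape(t) for t in SEARCH_TERMS), re.IGNORECASE)


def candidate_cardio_genes(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map gene symbol -> matching disease names, from (gene, disease_name) pairs."""
    hits: Dict[str, Set[str]] = {}
    for gene, disease in records:
        if PATTERN.search(disease):
            hits.setdefault(gene, set()).add(disease)
    return hits


# Toy records for illustration only (not a database export).
records = [
    ("MYH7", "Cardiomyopathy, hypertrophic, 1"),
    ("CFTR", "Cystic fibrosis"),
    ("KCNQ1", "Long QT syndrome 1"),
]
print(sorted(candidate_cardio_genes(records)))  # ['KCNQ1', 'MYH7']
```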
2.1.2 Genomic/Exomic Datasets used
To obtain the genetic variants in the selected genes, whole genome/exome data of the
Pakistani population were retrieved from the following publicly available resources:
i. Punjabi from Lahore (PJL) (n=96) in the 1000 Genomes Project phase 3 (1000
Genomes Project, 2015)
ii. The South Asian (SAS) dataset of the Exome Aggregation Consortium (ExAC) (n=8056),
which predominantly contains samples from Pakistan (n=7078) contributed by the Pakistan
Risk of Myocardial Infarction Study, one of the cohorts aggregated in ExAC (Lek et al.,
2016; Danish Saleheen et al., 2017)
iii. Whole exome sequencing data of 3222 British Pakistani individuals with high parental
relatedness (Narasimhan et al., 2016).
Data from the 1000 Genomes Project PJL and the British Pakistanis were analyzed for all
1,187 genes involved in common as well as Mendelian and congenital CVDs, whereas data from
ExAC SAS were analyzed for Mendelian and congenital CVDs only, because this dataset also
includes a case cohort of a common CVD (myocardial infarction cases from PROMIS) (Danish
Saleheen et al., 2017), which could bias estimates of the load for common CVDs.
2.1.3 The Analysis Pipeline
A pipeline was developed to identify and analyze genetic risk factors for cardiovascular
diseases using computational biology tools (Figure 2.3). The start and end positions of the
gene set were determined from the GENCODE gene set (gencode.v19.annotation.gtf), which is the
final version of the GENCODE database mapped to the GRCh37 human reference genome assembly
(Harrow et al., 2012). The shell command 'grep' was used to extract the genes under study
from the gencode.v19.annotation.gtf file.
To include variants in the immediate upstream and downstream regions and thus cover the gene
promoters, 2000 bp were subtracted from the start position of each gene (upstream region)
and 2000 bp were added to its end position (downstream region). The genetic variants within
these gene coordinates were extracted from the three datasets mentioned above using the
bcftools-1.2.1 package (http://www.htslib.org/download/) (Danecek et al., 2011); the
'bcftools view -R' option was used to extract variants falling within the gene coordinates.
The output VCF file contained both SNVs and indels, which were then separated using
bcftools.
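The coordinate step can be sketched in a few lines. This sketch assumes the standard nine-column GTF layout with a `gene_name "..."` attribute, as in GENCODE v19. It keeps only `gene` records for the listed genes, pads each side by 2000 bp (clamped at position 1), and writes a tab-delimited CHROM/BEG/END file of the kind `bcftools view -R` accepts. Both GTF and this regions-file form are 1-based and inclusive. The function names are hypothetical. Because both ends are padded, promoters of minus-strand genes (which lie beyond the annotated end coordinate) are covered as well. The optional `chr` stripping reflects the fact that GENCODE names chromosomes `chr1` whereas the 1000 Genomes phase 3 VCFs use `1`; the mitochondrial contig (`chrM` vs `MT`) is not handled here.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

FLANK_BP = 2000  # flank added on each side, as described in the text
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def padded_gene_regions(
    gtf_lines: Iterable[str], genes: Set[str], flank: int = FLANK_BP, strip_chr: bool = True
) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, gene) for GTF 'gene' records whose gene_name is in `genes`."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip transcripts, exons, etc.
        match = GENE_NAME.search(fields[8])
        if match is None or match.group(1) not in genes:
            continue
        chrom = fields[0]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]
        start = max(1, int(fields[3]) - flank)  # never go below position 1
        end = int(fields[4]) + flank
        yield chrom, start, end, match.group(1)


def write_regions(regions: Iterable[Tuple[str, int, int, str]], path: str) -> None:
    """Write CHROM<TAB>BEG<TAB>END lines for use as `bcftools view -R path`."""
    with open(path, "w") as out:
        for chrom, start, end, _gene in regions:
            out.write(f"{chrom}\t{start}\t{end}\n")
```

The resulting file would then be passed to `bcftools view -R` against a bgzip-compressed, indexed VCF. The file name should not end in `.bed`, because bcftools reads such files as 0-based.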
Figure 2.3: The pipeline to find and analyze the deleterious variants related to cardiac diseases in Pakistani population.
To determine the functional consequences of the subset variants, three annotation tools were
used, i.e., ANNOVAR (Yang, and Wang, 2015), Combined Annotation Dependent Depletion (CADD)
(Kircher et al., 2014), and Variant Effect Predictor (VEP) (McLaren et al., 2016).
Annotation with ANNOVAR was carried out using the standalone Perl application with the
gene-based refGene annotation, the region-based cytoBand and genomicSuperDups annotations,
and the filter-based ljb26_all, dbscsnv11, esp6500siv2_all, 1000g2015aug_all, and exac03
annotations. The gene-based refGene annotation provides information on all annotated
transcripts in the RefSeq Gene database. The region-based annotations cytoBand and
genomicSuperDups identify, respectively, the chromosomal band in which each variant lies and
whether it falls within a segmental duplication. Among the filter-based annotations,
ljb26_all provides SIFT, PolyPhen2, GERP++ and other scores; dbscsnv11 predicts whether a
variant in a splice-site consensus region is likely to affect splicing; and
1000g2015aug_all, esp6500siv2_all, and exac03 provide the allele frequencies of the variants
in the respective populations and databases
(http://annovar.openbioinformatics.org/en/latest/user-guide/download/). Annotation with CADD
was performed using a standalone Perl script, which provided the CADD PHRED-scaled C-score of
each single nucleotide variant (SNV). The scaled C-scores of small indels were determined
using the web-based CADD tool (http://cadd.gs.washington.edu/score), to which the gunzipped
VCF file was uploaded.
Different studies have used different criteria to determine the predicted deleteriousness of
genetic variants. Some used a single score, such as the Genomic Evolutionary Rate Profiling
(GERP) score (Henn, Botigué, Bustamante, Clark, and Gravel, 2015), PolyPhen2 (Y. Li et al.,
2016), or CADD (Richardson, Campbell, Timpson, and Gaunt, 2016), whereas many others employed
more than one tool (Ma et al., 2015; Xue et al., 2012). In this study, the scores of three
tools, CADD, SIFT, and PolyPhen2, were taken into account: variants with a CADD scaled
C-score '≥15', a SIFT score '<0.05', and a PolyPhen2_HDIV score '>0.957' were considered
deleterious.
These cut-off scores have been recommended by the respective authors of the tools. SIFT
predicts the effect of an amino acid substitution from sequence homology, i.e., from the
conservation of the affected position across aligned homologous sequences, whereas
PolyPhen2 uses a machine-learning classifier that combines sequence- and structure-based
features with multiple sequence alignments of proteins (Miosge et al., 2015). CADD integrates
63 diverse annotations from different databases into a single framework, including
conservation and the functional consequences of variants in coding as well as non-coding
regions, and is trained to distinguish variants that have survived natural selection from
simulated de novo variants (http://cadd.gs.washington.edu/). The scaled C-score correlates
with allelic diversity, with the pathogenicity of both coding and non-coding variants, and
with experimentally measured regulatory effects (Kircher et al., 2014).
2.1.4 Filtration of Variants by ClinVar Database
The subsets of variants in genes related to cardiovascular diseases from the three datasets
were also filtered against the ClinVar database (Landrum et al., 2014), a public archive of
reported relationships between genetic variants and medical phenotypes. ClinVar contains
variants of different clinical significance, including Benign, Likely benign,
Non-pathogenic, Probable-pathogenic, Pathogenic, Drug response, and Other. The 'Other'
category contains variants reported as risk factors, or as having sensitivity, association,
or protective roles in diseases (Landrum et al., 2014). For this analysis, variants with the
significance 'Pathogenic' or 'Likely_pathogenic' were extracted from ClinVar data release
20160104 (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/). The 'bcftools isec' command was used to
determine the intersection of the subset variants and the ClinVar variants. The allele
frequencies of the extracted variants were retrieved using the filter-based ANNOVAR
annotations for ExAC and the 1000 Genomes Project.
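What the ClinVar filtering and intersection achieve can be sketched as matching on an exact (chromosome, position, REF, ALT) key, which is also how `bcftools isec` pairs records by default. This sketch assumes that the significance labels are strings such as `Pathogenic`, `Likely_pathogenic` or `Pathogenic/Likely_pathogenic`, as in later ClinVar VCFs. Older releases may encode significance numerically, which would first need mapping. It also assumes that both call sets were normalized beforehand (for example, left-aligned with multi-allelic sites split), since otherwise identical variants can fail to match. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
KEEP = {"Pathogenic", "Likely_pathogenic"}


def pathogenic_keys(clinvar: Iterable[Tuple[str, int, str, str, str]]) -> Set[Key]:
    """Keys of ClinVar records carrying at least one Pathogenic/Likely_pathogenic label.

    Combined labels such as 'Pathogenic/Likely_pathogenic' are split on '/', '|' and ','.
    """
    keys: Set[Key] = set()
    for chrom, pos, ref, alt, clnsig in clinvar:
        if set(re.split(r"[/|,]", clnsig)) & KEEP:
            keys.add((chrom, int(pos), ref, alt))
    return keys


def intersect(subset: Iterable[Key], keys: Set[Key]) -> List[Key]:
    """Subset variants whose chrom/pos/ref/alt match a retained ClinVar record."""
    return [v for v in subset if (v[0], int(v[1]), v[2], v[3]) in keys]


clinvar = [
    ("1", 100, "A", "G", "Pathogenic"),
    ("1", 200, "C", "T", "Benign"),
    ("2", 50, "G", "A", "Pathogenic/Likely_pathogenic"),
    ("3", 10, "T", "C", "Likely_pathogenic"),
]
subset = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("3", 10, "T", "G")]
print(intersect(subset, pathogenic_keys(clinvar)))  # [('1', 100, 'A', 'G'), ('2', 50, 'G', 'A')]
```

The last subset variant is dropped because its ALT allele differs from the ClinVar record at the same position.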
2.1.5 Comparison of Allele Frequencies of Deleterious Variants of CVDs
with Global Populations
The derived allele frequency of a genetic variant represents its prevalence in a population
and provides insight into the population's evolutionary genetic background. Comparing the
allele frequencies of variants related to particular diseases/phenotypes across populations
is a useful approach for studying the prevalence of those diseases/phenotypes in different
populations. It also provides information on genetic diversity across populations with
respect to the disease/phenotype under study (1000 Genomes Project, 2010). Here, the derived
allele frequencies (DAF) of prioritized variants related to cardiovascular diseases from the
three datasets were compared. The allele frequencies of predicted deleterious derived alleles
in PJL individuals were compared with those of all the population groups of the 1000 Genomes
Project, i.e., South Asians (SAS), East Asians (EAS), Admixed Americans (AMR), Europeans
(EUR), and Africans (AFR), and with the Malay, a Southeast Asian population (Wong et al.,
2013). Similarly, the allele frequencies of deleterious derived alleles prioritized from the
ExAC SAS dataset were compared with those of the other five populations of this dataset,
i.e., East Asian (EAS), Latino (AMR), African/African American (AFR), Non-Finnish European
(NFE), and Finnish European (FIN).
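Turning a reported ALT-allele frequency into a derived allele frequency requires the ancestral state of the site. As a minimal sketch, assuming the ancestral base is available in the style of the 1000 Genomes `AA` INFO field (for example `C|||`, with lower case marking lower confidence, treated identically here), the derived frequency is the ALT frequency when REF is ancestral and one minus it when ALT is ancestral. Sites with an unknown or non-matching ancestral base are left undefined. The helper that compares a focal population with the others is hypothetical and is included only to show the comparison step.

```python
from typing import Dict, Optional


def derived_allele_frequency(
    ref: str, alt: str, alt_freq: float, ancestral: Optional[str]
) -> Optional[float]:
    """Frequency of the derived allele at a biallelic site, or None if undetermined."""
    if not ancestral:
        return None
    aa = ancestral.split("|")[0].upper()  # e.g. 'c|||' -> 'C'
    if aa == ref.upper():
        return alt_freq          # ALT is the derived allele
    if aa == alt.upper():
        return 1.0 - alt_freq    # REF is the derived allele
    return None                  # ancestral base unknown ('.', 'N', '-') or third allele


def daf_differences(dafs: Dict[str, float], focal: str) -> Dict[str, float]:
    """DAF of the focal population minus the DAF of every other population."""
    return {pop: round(dafs[focal] - f, 4) for pop, f in dafs.items() if pop != focal}


print(derived_allele_frequency("A", "G", 0.2, "A|||"))          # 0.2
print(daf_differences({"PJL": 0.3, "EUR": 0.1, "AFR": 0.5}, "PJL"))  # {'EUR': 0.2, 'AFR': -0.2}
```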
2.1.6 Genetic Differentiation of Deleterious Variants
The genetic differentiation of predicted deleterious variants for CVDs across populations was
determined by calculating Weir and Cockerham's estimator of the fixation index (FST) (Weir,
and Cockerham, 1984). Pairwise, unbiased multi-locus FST was calculated in two ways: for the
predicted deleterious SNVs only, and for the whole genes harboring those deleterious SNVs.
The genetic differentiation of the Pakistani population was estimated against the rest of
the South Asian populations, as well as against all 25 other global populations of the 1000
Genomes Project. SNVs with FST values of 0.05-0.15 were considered moderately differentiated,
those with FST values of 0.15-0.25 greatly differentiated, and those with FST values greater
than 0.25 severely differentiated (Jobling, Hurles, and Tyler-Smith, 2013). FST values were
calculated from the 1000 Genomes Project data with VCFtools v0.1.12, using the merged VCF file
of all populations as input.
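VCFtools computes Weir and Cockerham's estimator internally. To make explicit what is being estimated, the following independent sketch implements the 1984 variance components a, b and c for a biallelic SNP. It assumes diploid genotypes coded as ALT-allele counts with missing calls already removed. The multi-locus value is taken as the ratio of summed components, sum(a)/sum(a+b+c). The binning reproduces the categories stated above. Placing exact boundary values (0.15 in "great", 0.25 in "great") and labeling values below 0.05 as "little" are assumptions. Negative estimates are expected for undifferentiated samples and are a known property of this estimator.

```python
from typing import Sequence, Tuple

Genotypes = Sequence[int]  # ALT-allele counts (0, 1, 2) of the individuals in one population


def wc84_components(pops: Sequence[Genotypes]) -> Tuple[float, float, float]:
    """Weir & Cockerham (1984) variance components a, b, c for one biallelic SNP."""
    r = len(pops)
    n = [len(g) for g in pops]                                  # individuals per population
    p = [sum(g) / (2 * len(g)) for g in pops]                   # ALT-allele frequency
    h = [sum(1 for x in g if x == 1) / len(g) for g in pops]    # observed heterozygosity
    n_bar = sum(n) / r
    n_c = (r * n_bar - sum(ni * ni for ni in n) / (r * n_bar)) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / (r * n_bar)
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / (r * n_bar)
    pq = p_bar * (1 - p_bar)
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def multilocus_fst(loci: Sequence[Sequence[Genotypes]]) -> float:
    """Multi-locus FST as sum(a) / sum(a + b + c) over all loci."""
    comps = [wc84_components(pops) for pops in loci]
    numerator = sum(a for a, _, _ in comps)
    denominator = sum(a + b + c for a, b, c in comps)
    return numerator / denominator if denominator != 0 else float("nan")


def differentiation_class(fst: float) -> str:
    """Categories used in the text; handling of exact boundaries is assumed."""
    if fst > 0.25:
        return "severe"
    if fst >= 0.15:
        return "great"
    if fst >= 0.05:
        return "moderate"
    return "little"


# One SNP fixed for different alleles in two populations of ten individuals each.
print(multilocus_fst([[[0] * 10, [2] * 10]]))  # 1.0
```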
To determine whether the footprints of population migrations have affected the distribution
of genetic variants related to cardiovascular diseases, principal components analysis (PCA)
was performed. Two approaches were applied for this purpose. Principal components were
constructed with the PLINK 1.9 (Purcell et al., 2007) and EIGENSOFT smartpca (Patterson,
Price, and Reich, 2006) packages, using both the full subset of variants in genes related to
cardiovascular diseases and the predicted deleterious variants only. The 1000 Genomes Project
data of 96 individuals from each of 15 populations (Table 2.1) and of the 96 PJL individuals
were used.
Table 2.1: Populations of 1000 Genomes Project used for principal components analysis (PCA).
Population group      Populations used for PCA
South Asians          BEB, STU, ITU
Europeans             GBR, FIN, CEU
Americans             CLM, PEL, PUR
Africans              YRI, LWK, MSL
East Asians           CHB, JPT, KHV
For PLINK, a compatible '.ped' file was created from the VCF file using VCFtools v0.1.12. The
'.ped' file was converted into '.bed' format using the PLINK '--make-bed' option, and the
PLINK '--pca' option was then used to generate the principal components. For PCA with
EIGENSOFT, the required '.pedind' file was created manually from the '.ped' file by
extracting its first six columns. The PCA plots were constructed using base R (R Core Team,
2013).
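The core computation behind these PCA runs can be sketched with NumPy. This is not PLINK's or smartpca's code. It follows the normalization described by Patterson, Price and Reich (2006) in spirit: each SNP is centered on twice its allele frequency and scaled by sqrt(p(1-p)). smartpca's exact frequency estimate and its outlier removal are not reproduced. Monomorphic SNPs are dropped to avoid division by zero. The toy data are simulated, not taken from the 1000 Genomes Project.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs matrix of ALT-allele counts (0, 1, 2).

    Returns (scores, fraction of variance explained per returned component).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                   # ALT-allele frequency per SNP
    keep = (p > 0) & (p < 1)                   # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy data: two simulated populations with very different allele frequencies.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(20, 50))
pop_b = rng.binomial(2, 0.9, size=(20, 50))
scores, explained = genotype_pca(np.vstack([pop_a, pop_b]))
print(explained.round(3))
```

In this toy case the first component separates the two simulated populations, which is the kind of pattern the PCA plots are used to inspect.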
2.1.7 Linkage Analysis of Deleterious Variants
Linkage disequilibrium (LD) analysis of the observed deleterious variants from the 1000
Genomes Project was performed using VCFtools v0.1.12. LD is the non-random association of
alleles at different loci in a population, i.e., the deviation of haplotype frequencies from
the values expected if the alleles at those loci were combined independently (Slatkin, 2008).
LD was computed for pairs of variants up to 100,000 bp apart using the following command:
vcftools --vcf cardio_pjl_subset_100316_sort_SNP144_grep-deleterious-only.vcf \
    --hap-r2 --ld-window-bp 100000 \
    --out pjl_subset_dele_variants_ld_window_100000
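The statistic reported by the `--hap-r2` option is the squared correlation between alleles at two loci, computed from phased haplotypes. As a minimal sketch, assuming phased biallelic genotypes written as `0|1` and so on, r² can be computed from the disequilibrium coefficient D = p_AB - p_A p_B, and candidate pairs can be restricted to a maximum distance in the way `--ld-window-bp 100000` restricts them. The helper names are hypothetical, and the exact boundary handling of VCFtools is not reproduced.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple


def haplotypes_from_gt(gts: Sequence[str]) -> List[int]:
    """['0|1', '1|1', ...] -> flat allele list, two haplotypes per individual."""
    return [int(a) for gt in gts for a in gt.split("|")]


def hap_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if a locus is monomorphic


def pairs_within_window(positions: Sequence[int], window_bp: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Index pairs of variants no more than window_bp apart."""
    for i, j in combinations(range(len(positions)), 2):
        if abs(positions[j] - positions[i]) <= window_bp:
            yield i, j


print(hap_r2(haplotypes_from_gt(["0|0", "1|1"]), haplotypes_from_gt(["0|0", "1|1"])))  # 1.0
```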
2.2 Whole Genome Sequencing of a Pakistani Individual with
Hyperlipidemia and Coronary Artery Disease
Obesity, a complex metabolic disorder, is also a risk factor for other pathophysiological
conditions such as dyslipidaemia, atherosclerosis, hypertension, coronary heart disease,
type 2 diabetes mellitus (T2D), and certain types of cancer (Poirier et al., 2006; Switzer,
Mangat, and Karmali, 2013). The whole genome of a Pakistani individual with hyperlipidemia
and obesity was sequenced on an Applied Biosystems SOLiD® 5500xl next generation DNA
sequencing system. The detailed procedures and materials used are given below.
2.2.1 Samples Collection and DNA Isolation
Approximately 10 mL of blood was collected from this individual in a K2-EDTA tube after
informed consent was obtained. The individual was hyperlipidemic, with a body mass index
(BMI) > 30. DNA extraction was carried out immediately after blood collection to achieve the
highest integrity of genomic DNA. High molecular weight genomic DNA was isolated from the
blood by the CTAB method with minor modifications (Winnepenninckx, Backeljau, and DeWachter,
1993). The CTAB lysis buffer contained 2% w/v cetyltrimethylammonium bromide (CTAB), 100 mM
Tris-HCl, 20 mM EDTA, 1.4 M NaCl, 0.2% v/v β-mercaptoethanol, and 0.1 mg/mL proteinase K, at
pH 8.0. The following protocol was used to isolate the DNA.
i. To lyse the blood cells, 200 µL of whole blood was added to 1 mL of CTAB buffer
pre-warmed to 65 °C in a microcentrifuge tube. The mixture was incubated at 65 °C for one
hour, and the tube was gently inverted 2 to 3 times during the incubation.
ii. After incubation, an equal volume of chloroform/isoamyl alcohol (24:1) was added, and
the contents of the tube were mixed by gently inverting the tube several times.
iii. The tube was centrifuged at 12,000 rpm for 5 minutes, and the aqueous supernatant was
carefully transferred to a new microcentrifuge tube.
iv. Two-thirds volume of ice-cold isopropanol was added and mixed by gently inverting the
tube several times. A thread-like DNA precipitate became visible and was pelleted by
centrifugation at 12,000 rpm for 3 minutes.
v. The DNA pellet was washed twice with 70% ethanol to remove salts and other impurities.
The pellet was air-dried after the last wash and finally dissolved in 50 µL of TE buffer
(pH 7.5). The isolated DNA was stored at -20 °C.
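The buffer composition given above translates into amounts by simple arithmetic (w/v, v/v, molarity and C1V1 = C2V2 dilutions). The sketch below is an illustration only. The stock solutions it assumes (solid CTAB and NaCl, 1 M Tris-HCl pH 8.0, 0.5 M EDTA, neat β-mercaptoethanol, 20 mg/mL proteinase K) are hypothetical choices that the text does not state. The remainder of the volume would be made up with water.

```python
NACL_MW = 58.44  # g/mol


def ctab_buffer_recipe(volume_ml: float) -> dict:
    """Amounts needed for `volume_ml` of the CTAB lysis buffer (assumed stocks)."""
    litres = volume_ml / 1000
    return {
        "CTAB_g": 2.0 / 100 * volume_ml,           # 2% w/v = 2 g per 100 mL
        "NaCl_g": 1.4 * litres * NACL_MW,           # 1.4 M from solid NaCl
        "Tris_HCl_1M_mL": 0.100 / 1.0 * volume_ml,  # 100 mM from a 1 M stock (C1V1 = C2V2)
        "EDTA_0.5M_mL": 0.020 / 0.5 * volume_ml,    # 20 mM from a 0.5 M stock
        "beta_ME_mL": 0.2 / 100 * volume_ml,        # 0.2% v/v
        "proteinase_K_mL": 0.1 * volume_ml / 20.0,  # 0.1 mg/mL from a 20 mg/mL stock
    }


for name, amount in ctab_buffer_recipe(100).items():
    print(f"{name}: {amount:.4g}")
```

For 100 mL of buffer this gives 2 g CTAB, about 8.18 g NaCl, 10 mL Tris-HCl stock, 4 mL EDTA stock, 0.2 mL β-mercaptoethanol and 0.5 mL proteinase K stock.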
2.2.2 DNA Quality Assessment and Quantification
The quality of genomic DNA was assessed by 1% agarose gel electrophoresis. For this, 500 mg
of agarose powder was dissolved in 50 mL of 1x TAE buffer. The mixture was heated on a
heating block until it boiled and was then allowed to cool for a few minutes. It was then
poured into a gel casting tray, and 2.5 µL of the DNA staining dye ethidium bromide was added
to a final concentration of 0.5 µg/mL. The gel was allowed to solidify for about half an hour
at room temperature. For electrophoresis, 5 µL of each DNA sample was mixed with 1 µL of 6x
DNA loading dye and loaded into the wells of the agarose gel. A 1 kb DNA ladder was loaded
in one of the wells. Electrophoresis was carried out in 1x TAE buffer at 60 V for 90
minutes, after which the gel was visualized on a UV transilluminator.
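The gel recipe above can be checked with two lines of arithmetic. The agarose percentage is weight per volume. The ethidium bromide figure follows from C2 = C1·V1/V2, assuming a 10 mg/mL (10 µg/µL) stock. The stock concentration is not given in the text and is an assumption, chosen because it makes the stated 2.5 µL and 0.5 µg/mL consistent.

```python
def percent_w_v(grams: float, volume_ml: float) -> float:
    """Weight/volume percentage: grams of solute per 100 mL."""
    return grams / volume_ml * 100


def final_ug_per_ml(stock_ug_per_ul: float, added_ul: float, total_ml: float) -> float:
    """C2 = C1 * V1 / V2; the few microlitres added are ignored in V2."""
    return stock_ug_per_ul * added_ul / total_ml


print(percent_w_v(0.5, 50))          # 1.0 -> 500 mg agarose in 50 mL is a 1% gel
print(final_ug_per_ml(10, 2.5, 50))  # 0.5 -> ug/mL of ethidium bromide (assumed 10 mg/mL stock)
```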
The quantity of genomic DNA was estimated on a Qubit® 2.0 Fluorometer using the Qubit®
dsDNA HS Assay Kit (Thermo Fisher). For quantification, 1 µL of DNA sample was added to
199 µL of the fluorophore-containing kit working solution in a 500 µL Qubit assay tube. The
mixture was vortexed for 15 seconds and incubated for 2 minutes at room temperature. The DNA
concentration was then measured on the fluorometer as described in the kit protocol.
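Because 1 µL of sample is diluted into a 200 µL assay, the value read for the assay tube has to be scaled back to the original sample. As we understand the Qubit documentation, the stock concentration is the tube reading multiplied by 200/x, where x is the sample volume in µL. The result is in the same units as the reading. The fluorometer can also report this directly. The reading in the example is hypothetical.

```python
def qubit_sample_concentration(
    qf_value: float, sample_ul: float, assay_volume_ul: float = 200.0
) -> float:
    """Original sample concentration from the assay-tube reading (same units as the reading)."""
    return qf_value * assay_volume_ul / sample_ul


reading_ng_per_ml = 250.0  # hypothetical assay-tube reading
stock = qubit_sample_concentration(reading_ng_per_ml, sample_ul=1.0)
print(stock, "ng/mL =", stock / 1000, "ng/uL")  # 50000.0 ng/mL = 50.0 ng/uL
```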
2.2.3 Library Preparation and DNA Sequencing
2.2.3.1 Fragmentation of Genomic DNA
To prepare a mate-paired library for whole genome sequencing, the genomic DNA was fragmented
on a Covaris™ S220 Focused-ultrasonicator. About 5 µg of DNA was sheared to an average size
of 1,300 bp using the Covaris™ recommended protocol (Table 2.2). Fragments of the target size
were excised with a sharp, clean blade after 2% agarose gel electrophoresis and purified from
the gel using the PureLink® Quick Gel Extraction Kit.
Table 2.2: Covaris protocol for fragmenting genomic DNA.
Parameter                   Value
Peak Incident Power (W)     140
Duty Factor                 2%
Cycles per Burst            200
Treatment Time (s)          4 15 sec
Temperature                 4-8 °C
2.2.3.2 Mate-paired Library Preparation
The mate-paired library of the size-selected DNA was prepared according to the protocols
given in the SOLiD® Mate-Paired Library Manual. The following steps were carried out in
preparing the library:
i. The ends of the fragmented DNA were repaired with the End Polishing E1 and E2 enzymes.
ii. The mate-paired right (MPR) and mate-paired left (MPL) adaptors were ligated to the ends
of the DNA fragments using ligase.
iii. The adaptor-ligated DNA was circularized by incubating the reaction tube at 70 °C and
then immediately placing it on ice.
iv. The circularized DNA was purified using Agencourt AMPure XP beads and recovered from the
beads with elution buffer.
v. The nicks in the circularized DNA were translated using DNA Polymerase I at 5 °C for
exactly 13 minutes.
vi. The nick-translated DNA was digested with T7 Exonuclease and S1 Nuclease, and the
digested DNA was purified using Agencourt AMPure XP beads and DNA elution buffer.
vii. A dA-tail was added to both ends of the T7 Exonuclease- and S1 Nuclease-treated DNA
using A-Tailing Enzyme II, which increases the efficiency of ligating the P1 and P2
adaptors to the digested DNA.
viii. The library was bound to streptavidin beads in 1X BSA solution, and the P1 and P2
adaptors were then ligated using T4 DNA ligase.
ix. The library was nick-translated to fill in any gaps, and a trial amplification was
performed using Platinum® PCR Amplification Mix under the reaction conditions given in
Table 2.3.
Table 2.3: PCR conditions for the amplification of mate-paired library.
Stage      Step               Temp     Time
Holding    Nick Translation   72 °C    20 min
Holding    Denaturation       94 °C    3 min
Cycling    Denature           94 °C    15 sec
Cycling    Anneal             62 °C    15 sec
Cycling    Extend             70 °C    1 min
Holding    Extend             70 °C    5 min
Holding    ---                4 °C     ∞
x. The size of the trial-amplified library was evaluated on the E-Gel® Electrophoresis
System using a 2% agarose E-Gel.
xi. Finally, the library was fully amplified using Platinum® PCR Amplification Mix under the
same thermocycler conditions as in Table 2.3.
2.2.3.3 Evaluation of the Library with Bioanalyzer
To evaluate the size distribution of the mate-paired library precisely, it was assessed on
an Agilent 2100 Bioanalyzer using the Agilent DNA 1000 kit according to the manufacturer's
protocol. The DNA 1000 kit resolves DNA fragments of 25-1000 bp over a concentration range of
0.5-50 ng/µL. To remove the shoulders flanking the library peak, size selection was carried
out on a 2% agarose gel on the E-Gel® Electrophoresis System until a bell-shaped library size
distribution was obtained.
2.2.3.4 Preparation of Emulsion, Emulsion-PCR, and Beads Enrichment
For emulsion PCR, an emulsion of the template library and the PCR components was prepared on
the Applied Biosystems SOLiD® EZ Bead™ Emulsifier system using the Applied Biosystems SOLiD®
EZ Bead™ Emulsifier E80 reagent kit and related accessories. The Emulsifier mixes the oil
phase, P1 beads, aqueous master mix, emulsion PCR (ePCR) primers, and the template (library)
to prepare an emulsion in which, ideally, each aqueous droplet in the oil contains a single
template DNA fragment, a P1 bead, and all the components of the PCR reaction. The E80
emulsion scale gives a final yield of 1 billion beads after amplification and enrichment.
The P1 beads were declumped on the S220 Focused-ultrasonicator and resuspended in 1430 µL of
SOLiD EZ Bead Emulsifier reagent, 1x TEX buffer. The amounts of the other components used to
prepare the emulsion are given in Table 2.4. The amount of library to be used for preparing
the emulsion was calculated with the e-calculator provided by Life Technologies (Table 2.5).
Table 2.4: Components for preparing the emulsion for ePCR.
Component                                                   Amount
SOLiD EZ Bead Emulsifier E80-P1 Beads                       1430 µL
1x TEX buffer                                               1430 µL
Oil Master Mix                                              67.9 g
SOLiD EZ Bead Emulsifier E80-P1 Reagent (diluted 1:10)      200 µL
SOLiD EZ Bead Emulsifier E80-P2 Reagent                     300 µL
SOLiD EZ Bead Emulsifier E80-Aqueous Master Mix             47978 µL
Library Template                                            21.4 µL
Table 2.5: Determining the amount of template to be used in emulsion preparation, using the e-calculator-Life Technologies.
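The internal formula of the Life Technologies e-calculator is not described here. The sketch below shows only the standard conversion such calculators build on: from a library's mass concentration and mean fragment length to its molarity and number of molecules per µL, and from there to the volume that supplies a chosen number of template molecules. It assumes an average of 660 g/mol per base pair of double-stranded DNA. Any target molecule count passed to `template_volume_ul` is a hypothetical input.

```python
AVOGADRO = 6.022e23
BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_pM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Molar concentration (pM) of a dsDNA library from its mass concentration."""
    grams_per_litre = conc_ng_per_ul * 1e-3  # 1 ng/uL = 1 mg/L = 1e-3 g/L
    molar = grams_per_litre / (BP_MASS * mean_fragment_bp)
    return molar * 1e12


def molecules_per_ul(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Number of template molecules per microlitre."""
    molar = library_molarity_pM(conc_ng_per_ul, mean_fragment_bp) * 1e-12
    return molar * AVOGADRO * 1e-6  # molecules/L -> molecules/uL


def template_volume_ul(target_molecules: float, conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Volume of library supplying the requested number of molecules."""
    return target_molecules / molecules_per_ul(conc_ng_per_ul, mean_fragment_bp)


print(round(library_molarity_pM(1.0, 100), 1))  # 15151.5 pM for 1 ng/uL of 100 bp fragments
```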
The emulsion was subjected to ePCR on the Applied Biosystems SOLiD® EZ Bead™ Amplifier
system using the SOLiD® EZ Bead™ E80 emulsion kit (cat # 4452722), followed by bead
enrichment on the Applied Biosystems SOLiD® EZ Bead™ Enricher system using the SOLiD® EZ
Bead Enricher E80 Reagent Kit (cat # 4452725), SOLiD® EZ Bead Enricher Buffer Kit (cat #
4444140), and SOLiD® EZ Bead Enricher Accessories Kit (cat # 4453073). All kit reagents and
consumables were installed in the instrument according to the user manual before the
enrichment was run. At the end of the process, a sufficient quantity of amplified, enriched
beads was obtained.
2.2.3.5 3’-Modification of Template Beads
Before the template beads were loaded onto the flow chip, their 3' ends were modified using
the SOLiD® Pre Deposition Kit (cat # 4452805). First, the beads were sonicated using the
Covalent Declump 3 program on the Covaris™ S220 sonicator. The beads were washed with 1X
Terminal Transferase Reaction buffer and then resuspended in 160 µL of 1 mM Bead Linker
solution and 1424 µL of 1X Terminal Transferase Reaction buffer. Terminal Transferase enzyme
(20 U/µL) was then added at 8 µL per 792 µL of bead solution, and the mixture was incubated
at 37 °C for 2 hours. After incubation, the beads were washed once with 1X TEX Buffer and
finally resuspended in 400 µL of 1X TEX Buffer.
2.2.3.6 Loading the Flow Chip with Template Beads for Sequencing Reactions
To load the 3'-modified template beads onto the flow chip, the beads were washed three times
with FlowChip Deposition Buffer 1. After the final wash, they were suspended in 135 µL of
FlowChip Deposition Buffer 1 for loading into 5 lanes of the flow chip (27 µL per lane). The
beads were declumped using the Deposition_Declump program on the Covaris S220 sonicator, and
27 µL of beads was immediately loaded into each lane of the flow chip. The flow chip was
incubated at 37 °C for 1 hour in an incubator and finally installed in the Applied
Biosystems SOLiD® 5500xl Genetic Analysis system according to the instructions. Co-forward
sequencing of the F3 and R3 tags of the mate-paired library was carried out using the SOLiD®
FWD1 SP Kit (cat # 4463011), SOLiD® FWD2 SP Kit (cat # 4463012), SOLiD® FWD SR S75/S50 kit
and SOLiD® FWD Buffer (cat # 4459193), and other related buffers.
The Applied Biosystems SOLiD® 5500xl Genetic Analysis System sequences DNA by a ligation chemistry termed sequencing by oligomer ligation and detection (SOLiD). In this method, a universal primer anneals to the P1 adaptor of the template. A pool of fluorescently labelled octamer probes is then added to the reaction. Each octamer contains three modified bases at its 3' end, to which a fluorophore is attached. A complementary octamer anneals to the template and is ligated to the OH group of the preceding base on the newly growing DNA strand. A CCD camera detects and records the fluorescence of the ligated probe. The three 3'-terminal bases are then cleaved off together with the fluorophore, allowing the next ligation. Only the first two bases of each probe are read in each cycle. After several ligation cycles, the extended product is removed and the process is restarted with a new primer offset by one base. Over five such primer rounds, every base is called twice, which improves the accuracy of the sequencing reaction (Figure 2.4) (Mardis, 2008; Metzker, 2010).
Figure 2.4: The reactions of sequencing by oligomer ligation and detection (SOLiD) technology.
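The two-base encoding described above can be illustrated with a short Python sketch. It assumes the standard SOLiD colour code: with A=0, C=1, G=2 and T=3, the colour of a dinucleotide is the bitwise XOR of the two base codes. A csfasta read starts with a known base (the last base of the primer), followed by colour calls. The function names are hypothetical, and the sketch illustrates the encoding only; it is not part of the analysis pipeline.

```python
# Standard SOLiD colour code: with A=0, C=1, G=2, T=3 the colour of a
# dinucleotide is the XOR of the two base codes.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def encode(seq: str) -> str:
    """Encode a base-space sequence as its first base followed by colours."""
    colours = (str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colours)


def decode(csread: str) -> str:
    """Decode a csfasta-style read (known leading base, then colours 0-3).

    Decoding stops at the first missing call ('.') because every later
    base depends on the base before it.
    """
    prev = csread[0]
    bases = [prev]
    for colour in csread[1:]:
        if colour == ".":
            break
        prev = BASES[CODE[prev] ^ int(colour)]
        bases.append(prev)
    return "".join(bases)


print(encode("TACGGT"))  # T31301
print(decode("T31301"))  # TACGGT
```

Each base is decoded from the previous one, so a naive conversion lets a single colour-calling error corrupt every downstream base. This is one reason SOLiD reads are usually aligned in colour space rather than decoded first.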
2.2.4 Analysis of the Genomic Data
The analysis of the genomic data to identify the variants present in the subject under study comprised a number of steps using different scripts and bioinformatics tools. A standard pipeline for variant calling from raw sequencing reads, following the GATK best practices, was applied. The steps are described below:
2.2.4.1 Filtration of Poor Quality Short Reads
The raw sequencing data was obtained in 'XSQ' (eXtensible SeQuence) format. XSQ is a machine-readable binary format that must be converted into the human-readable 'csfasta' format for downstream data analysis. The csfasta format is the color-space version of the FASTA format and is specific to the SOLiD sequencing platform. The XSQ files were converted into csfasta format using the XSQ_Converter tool from Life Technologies (http://www.lifetechnologies.com/pk/en/home/technicalresources/softwaredownloads/xsq-software.html).
convertFromXSQ.sh -c -f dnaseqlab5500xl_2014_01_15_1_01.xsq -o /data/results/lane1/
Here, the -c and -f parameters specify that the input file is in XSQ format. The -o parameter defines the output directory, where the csfasta files are generated with default names derived from the input file.
The base-calling quality scores of the reads were evaluated with the Perl tool SOLiD_preprocess_filter_v2.pl (Sasson, and Michael, 2010). Default parameters were used for quality trimming. A baseline quality score of 10 was used to filter poor-quality reads, and up to 3 bases with a quality score <10 were allowed in each read. In addition, all reads containing any dot (missing base call) were removed. The reads that passed the quality filter and had matching mate-pairs proceeded to the alignment step.
perl SOLiD_preprocess_filter_v2.pl -i mp \
  -f dnaseqlab5500xl_2014_01_15_1_01_default_F3.csfasta \
  -g dnaseqlab5500xl_2014_01_15_1_01_default_F3.QV.qual \
  -r dnaseqlab5500xl_2014_01_15_1_01_default_R3.csfasta \
  -s dnaseqlab5500xl_2014_01_15_1_01_default_R3.QV.qual \
  -a y -n y -o out_file
Here, the -i option specifies that the input files come from a mate-paired library. The options -f, -g, -r, and -s specify the F3 reads, their quality scores, the R3 reads, and their quality scores, respectively. The -a option generates a text file containing statistics of the filtration process. The -n option removes any short read containing a dot (i.e., a missing base call), and the -o option sets the prefix for output files.
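The read-level criteria described above are no missing calls and at most three colours with QV < 10, with both mates required to pass. The Python sketch below shows one simplified reading of these thresholds. It is not a reimplementation of SOLiD_preprocess_filter_v2.pl, whose internal logic may differ, and the function names are hypothetical.

```python
from __future__ import annotations


def passes_filter(colours: str, qualities: list[int],
                  min_qv: int = 10, max_low: int = 3) -> bool:
    """Simplified read-level check following the criteria described above.

    colours   -- colour-space read as in a .csfasta file (e.g. "T0123.1")
    qualities -- per-colour quality values as in the matching .qual file
    A read fails if it contains a missing call ('.') or if more than
    `max_low` colours have a quality value below `min_qv`.
    """
    if "." in colours:
        return False
    low_quality = sum(1 for q in qualities if q < min_qv)
    return low_quality <= max_low


def keep_mate_pair(f3: tuple[str, list[int]], r3: tuple[str, list[int]]) -> bool:
    """Keep a mate pair only if both the F3 and the R3 tag pass the filter."""
    return passes_filter(*f3) and passes_filter(*r3)
```

Requiring both tags to pass mirrors the text: only reads that passed filtration and still had a matching mate proceeded to alignment.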
2.2.4.2 Alignment of Short Reads with the Reference Human Genome
This is a key step in genome/exome sequencing experiments, because false alignments to the reference genome lead to false-positive variant calls. The short reads were aligned to the reference human genome using the 'LifeScope™ Genomic Analysis Software' of Life Technologies. Human reference genome version 19 (hg19.fa) from the UCSC Genome Browser was used (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/).
2.2.4.3 Post Alignment Processing and Variants Calling
The alignments of short reads to the reference were obtained as sorted Sequence Alignment Map (SAM) files and their binary equivalent, Binary Alignment Map (BAM) files. Post-alignment processing and variant calling on the BAM files were carried out by applying the best practices with Picard-tools-1.109 (http://picard.sourceforge.net) and the Genome Analysis Tool Kit (GATK) (McKenna et al., 2010). The following steps were employed for this purpose.
i. @RG (read group) tags were assigned to each of the 5 BAM files using Picard's AddOrReplaceReadGroups tool, so that downstream processes could distinguish the files after merging.
java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit \
  -jar picard-tools-1.109/AddOrReplaceReadGroups.jar \
  I=dnaseqlab5500xl_2015_03_11_1_06_6-5-1.bam O=amz_1_RG.bam SO=coordinate \
  RGID=FLOWCHIP1_L1 RGLB=MP RGPL=SOLID RGPU=FLOWCHIP1_L1 RGSM=AMZ CREATE_INDEX=true
ii. The individual BAM files were merged with Picard's MergeSamFiles module. Duplicates were then removed from the merged BAM file using Picard's MarkDuplicates module.
java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar /picard-tools-1.109/MergeSamFiles.jar \
  I=amz_1_RG.bam I=amz_2_RG.bam I=amz_3_RG.bam I=amz_4_RG.bam I=amz_5_RG.bam \
  SO=coordinate ASSUME_SORTED=true O=amz_RG_merge.bam
java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar /picard-tools-1.109/MarkDuplicates.jar \
  I=amz_RG_merge.bam O=amz_RG_merge_dedup.bam REMOVE_DUPLICATES=true \
  ASSUME_SORTED=true M=amz_dedup_metrics CREATE_INDEX=true
iii. Local realignment was performed with the GATK RealignerTargetCreator and IndelRealigner walkers. Known indel sites from the Mills and 1000 Genomes Project resources were used to optimize the alignment near indels:
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa \
  -I amz_RG_merge_dedup.bam \
  --known Mills_and_1000G_gold_standard.indels.GR37.sites.vcf \
  --known 1000G_phase1.indels.hg19.vcf \
  -o amz_RG_merge_dedup_realign.intervals
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T IndelRealigner -R hg19.fa \
  -I amz_RG_merge_dedup.bam \
  -targetIntervals amz_RG_merge_dedup_realign.intervals \
  -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
  -known 1000G_phase1.indels.hg19.sites.vcf \
  -o amz_RG_merge_dedup_realignIndels.bam
iv. Next, base quality score recalibration was performed using the GATK BaseRecalibrator and PrintReads walkers. GATK applies a machine-learning approach to empirically reassess the errors of the sequencing platform and adjusts the base quality (Q) scores accordingly. This improves the accuracy of base calling.
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fa \
  -I amz_RG_merge_dedup_realignIndels.bam \
  -knownSites dbsnp37_chr_20151104.vcf \
  -knownSites Mills_and_1000G_gold_standard.indels.GR37.sites.vcf \
  -knownSites 1000G_phase1.indels.hg19.vcf \
  -o amz_RG_merge_dedup_realignIndels_BQRC.grp \
  --solid_nocall_strategy LEAVE_READ_UNRECALIBRATED \
  --solid_recal_mode SET_Q_ZERO_BASE_N
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T PrintReads -R hg19.fa \
  -I amz_RG_merge_dedup_realignIndels.bam \
  -BQSR amz_RG_merge_dedup_realignIndels_BQRC.grp \
  -o amz_RG_merge_dedup_realignIndels_BQRC.bam
v. Variants were called from the base-quality-recalibrated BAM file using GATK HaplotypeCaller, with a minimum phred-scaled calling confidence of 20 (-stand_call_conf 20).
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa \
  -I amz_RG_merge_dedup_realignIndels_BQRC.bam \
  --dbsnp dbsnp37_chr_20151104.vcf \
  -o amz_raw_q20.vcf -stand_call_conf 20
vi. Variants were retained only at sites covered by at least two reads. The raw VCF file was therefore filtered with bcftools using DP>=2:
bcftools filter -i 'DP>=2' -o amz_q20_DP2.vcf amz_raw_q20.vcf
vii. The tendency to discover false-positive variants was assessed by calculating the transition/transversion (Ti/Tv) ratio. The Ti/Tv ratio was evaluated with the GATK VariantEval walker:
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T VariantEval -R hg19.fa \
  -I amz_RG_merge_dedup_realignIndels_BQRC.bam \
  --eval:my_call amz_q20_DP2.vcf -o amz_q20_DP2.vcf.eval.gr
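Step vii relies on the Ti/Tv ratio. Transitions (A↔G, C↔T) occur more often than transversions in true variation, so an excess of transversions points to false-positive calls. A minimal Python sketch of the calculation follows. It is an illustration only, not how VariantEval computes its statistics internally, and the function names are hypothetical.

```python
from __future__ import annotations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) change."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snvs: list[tuple[str, str]]) -> float:
    """Ti/Tv over single-nucleotide (REF, ALT) pairs; other records are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
            continue  # skips indels/MNPs (multi-base strings) and non-variants
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


print(titv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("A", "AT")]))  # 3.0
```

Commonly quoted reference values are roughly 2.0 to 2.1 for whole-genome data and around 3.0 for exome data. A markedly lower ratio suggests a higher proportion of false-positive calls.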
2.2.5 Assessing the Genetic Variants related to Hyperlipidemia, and
related Cardiac Disorders
The variants were annotated with ANNOVAR using the gene-based, region-based, and filter-based annotations described in section 2.1.3. The aim was to evaluate genetic variants related to hyperlipidemia, obesity, and the risk of related cardiac disorders such as hypertension, myocardial infarction, and coronary artery disease. For this, the analysis pipeline described in section 2.1.3 was applied to identify predicted deleterious variants, and the variants filtered in the mutational load analysis (section 2.1) were re-assessed in the individual with these disorders. The variants related to these disorders were also filtered against ClinVar, OMIM, and the GWAS Catalog, which contains genetic variants reported to be associated with diseases through genome-wide association studies (GWAS) (Welter et al., 2013).
2.3 Whole Exome Sequencing of Patients with Cardiomyopathy
Whole exome sequencing, together with its analysis by bioinformatics tools, is becoming a standard approach for investigating genetic variations linked to diseases. In this method, the coding regions of all the genes in a genome are sequenced to study mutations that may affect protein structure and function (Hintzsche, Robinson, and Tan, 2016). Whole exome sequencing of five patients with dilated cardiomyopathy (DCM) was carried out to validate the predicted deleterious CVD variants identified in healthy persons through bioinformatics analysis (section 2.2). Dilated cardiomyopathy is characterized by dilation of the left ventricle and an impaired ability to pump blood to the body (systolic dysfunction). These occur in the absence of coronary artery disease and other abnormal loading conditions such as valvular disease or hypertension (Elliott, 2000). Ethical approval for this study was obtained from the institutional 'Internal Ethical Committee'. The patients with dilated cardiomyopathy were selected from the National Institute of Cardiovascular Diseases (NICVD), Karachi, Pakistan. The five patients belonged to five different ethnic backgrounds of Pakistan, i.e., one patient each from the Punjabi, Sindhi, Balochi, Kashmiri, and Urdu-speaking communities.
2.3.1 Selection of Cardiomyopathy Patients
Patients with dilated cardiomyopathy were selected on the basis of a diagnosis confirmed by a cardiologist. The inclusion criteria comprised the physical symptoms of the patients, echocardiography findings (left ventricular dilation and ejection fraction ≤ 30%), age < 55 years, and the absence of coronary artery disease, hypertension, and heart valve disease (Japp, Gulati, Cook, Cowie, and Prasad, 2016). Patients with other modifiable risk factors, such as smoking and alcoholism, were also excluded. To enhance the power of the study, patients whose parents had a cousin marriage or were otherwise related were preferentially selected. Informed consent was obtained from the patients prior to blood specimen collection.
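The selection rules above combine several inclusion and exclusion criteria. A minimal Python sketch encoding them as a single check is given below. The record fields are hypothetical. Physical symptoms and the cardiologist's confirmed diagnosis are assumed to have been assessed beforehand. The preference for consanguineous parentage is omitted because it guided selection rather than acting as a strict criterion.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record of the clinical fields used for selection."""
    age_years: int
    lv_dilated: bool              # left ventricular dilation on echocardiography
    ejection_fraction_pct: float
    coronary_artery_disease: bool
    hypertension: bool
    valve_disease: bool
    smoker: bool
    alcohol_use: bool


def eligible(p: Patient) -> bool:
    """Apply the DCM inclusion and exclusion criteria listed above."""
    return (
        p.lv_dilated
        and p.ejection_fraction_pct <= 30
        and p.age_years < 55
        and not (p.coronary_artery_disease or p.hypertension or p.valve_disease)
        and not (p.smoker or p.alcohol_use)
    )
```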
2.3.2 Collection of Blood Samples, and DNA Isolation and Quantification
The collection of blood samples from the selected patients, isolation of genomic DNA, and assessment of its quality were carried out according to the protocols described in section 2.1. Approximately 5 mL of venous blood was collected from each patient in an EDTA tube. Genomic DNA was isolated using CTAB buffer. Its quality was assessed by agarose gel electrophoresis on a 1% agarose gel prepared in 1X TAE buffer. The quantity of genomic DNA was estimated on a Qubit® 2.0 Fluorometer using the Qubit® dsDNA HS Assay Kit (Thermo Fisher). For quantification, 1 μL of DNA sample was added to 199 μL of kit buffer containing the fluorophore.
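Here 1 μL of DNA was assayed in a 200 μL final volume (1 μL sample + 199 μL buffer). The tube reading therefore has to be multiplied by a dilution factor of 200 to give the stock concentration. A short Python sketch of this arithmetic follows. The reading is a hypothetical value, and the sketch assumes the reading refers to the assay tube in ng/mL; the instrument may also be able to perform this conversion itself.

```python
def stock_concentration_ng_per_ul(assay_reading_ng_per_ml: float,
                                  sample_ul: float = 1.0,
                                  assay_total_ul: float = 200.0) -> float:
    """Back-calculate the stock DNA concentration from a Qubit tube reading.

    The reading refers to the diluted assay tube (ng/mL). Multiplying by the
    dilution factor (assay volume / sample volume) gives the stock in ng/mL;
    dividing by 1000 converts ng/mL to ng/uL.
    """
    dilution_factor = assay_total_ul / sample_ul
    return assay_reading_ng_per_ml * dilution_factor / 1000.0


# Hypothetical reading of 250 ng/mL for 1 uL of DNA in a 200 uL assay.
print(stock_concentration_ng_per_ul(250.0))  # 50.0
```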
2.3.3 Library Preparation and Exome Enrichment for Whole Exome
Sequencing
The library preparation and whole exome sequencing were carried out at Macrogen Inc., Seoul, South Korea. For whole exome sequencing, the fragment library for paired-end sequencing was prepared with the SureSelectXT Library Preparation Kit (Agilent Technologies, Santa Clara, CA). The SureSelectXT Target Enrichment System for Illumina protocol (Version B.2, April 2015) was followed. The whole exomes were enriched with the SureSelectXT Human All Exon v6 kit (Agilent Technologies, Santa Clara, CA), which captures 60 Mb of the human genome (Agilent Technologies, 2017). The workflow of NGS library preparation is outlined in Figure 2.5. The protocols are described in detail below:
2.3.3.1 Fragmentation of Genomic DNA
The genomic DNA (gDNA) was fragmented to an average size of 200 bp using the Covaris S220 system. About 200 ng of genomic DNA was diluted with 1X Low TE Buffer to a final volume of 50 μL in a 1.5-mL LoBind tube and then transferred to a Covaris MicroTube. The gDNA was sheared using the Covaris program given in Table 2.6.
Figure 2.5: NGS workflow for fragment library preparation and paired-end sequencing on Illumina.
Table 2.6: Settings on the Covaris instrument for gDNA fragmentation
Setting Value
Duty Factor 10%
Peak Incident Power (PIP) 175 W
Cycles per Burst 200
Treatment Time 6 min
Bath Temperature 5 °C
2.3.3.2 End-repair of the Fragmented DNA
The ends of the sheared DNA were repaired using the SureSelectXT Library Prep Kit. For each sample, 52 μL of End Repair master mix was prepared as given in Table 2.7. The entire sheared DNA sample was mixed with the End Repair master mix, incubated in a thermocycler at 20 °C for 30 minutes, and then held at 4 °C.
Table 2.7: Components of End Repair master mix
Components Volume (uL)
10× End Repair Buffer 10.0
dNTP Mix 1.6
T4 DNA Polymerase 1.0
Klenow DNA Polymerase 2.0
T4 Polynucleotide Kinase 2.2
Nuclease-free water 35.2
Total mixture 52.0
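Table 2.7 gives volumes for a single reaction. In practice, a master mix is usually prepared for all samples at once with a small excess. The Python sketch below scales the table for a hypothetical five samples with a 10% pipetting overage. The overage is an illustrative choice, not a value stated in the protocol, and the function name is hypothetical.

```python
from __future__ import annotations

# Per-reaction volumes (uL) from Table 2.7.
END_REPAIR_UL = {
    "10x End Repair Buffer": 10.0,
    "dNTP Mix": 1.6,
    "T4 DNA Polymerase": 1.0,
    "Klenow DNA Polymerase": 2.0,
    "T4 Polynucleotide Kinase": 2.2,
    "Nuclease-free water": 35.2,
}


def scale_master_mix(per_reaction: dict[str, float], n_samples: int,
                     overage: float = 0.10) -> dict[str, float]:
    """Scale per-reaction volumes to n samples plus a fractional pipetting overage."""
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


print(round(sum(END_REPAIR_UL.values()), 1))  # 52.0 uL per reaction
for name, vol in scale_master_mix(END_REPAIR_UL, n_samples=5).items():
    print(f"{name}: {vol} uL")
```

The same function applies unchanged to the other master-mix tables in this section (e.g., Tables 2.8 to 2.10).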
2.3.3.3 Purification and Adenylation of End-repaired DNA
The end-repaired DNA was purified using 180 μL of homogeneous AMPure XP beads per sample. The mixture was mixed well by pipetting up and down and then incubated at room temperature for 5 minutes. The tube was placed on a magnetic stand for 1 minute to let the solution clear. The clear supernatant was carefully discarded without disturbing the beads. The beads were washed twice with 70% ethanol without disturbing them. After the second wash, 32 μL of nuclease-free water was added to each sample tube. The tube was vortexed for 15 seconds and incubated at room temperature for 2 minutes. It was then placed on the magnetic stand again for 3 minutes. The clear supernatant, which contained the end-repaired DNA, was carefully transferred to a new PCR tube.
The end-repaired DNA was adenylated at its 3'-ends to enhance the ligation efficiency of the SureSelect Adaptors. For this reaction, 20 μL of adenylation master mix was added to 30 μL of end-repaired DNA. The components of the adenylation master mix are listed in Table 2.8. The mixture was incubated in a thermocycler at 37 °C for 30 minutes, followed by a hold at 4 °C. The 3'-adenylated DNA was purified using AMPure XP beads as described above and eluted in 13 μL of nuclease-free water.
Table 2.8: Components of Adenylation master mix
Components Volume (uL)
10× Klenow Polymerase Buffer 5.0
dATP 1.0
Exo(–) Klenow 3.0
Nuclease-free water 11.0
Total mixture 20.0
2.3.3.4 Ligation of Paired-end Adaptors
Paired-end adaptors were ligated to both ends of the 3'-adenylated DNA. The adaptors are complementary to the sequencing primers used during the sequencing reactions in the flow cell of the Illumina sequencer. A 1:10 dilution of the SureSelect Adaptor Oligo Mix was used, and the reaction mixture contained the following components:
Table 2.9: Components for ligation of paired-end adaptors
Components Volume (uL)
5× T4 DNA Ligase Buffer 10.0
1:10 diluted SureSelect Adaptors Oligo Mix 10.0
T4 DNA Ligase 1.5
Nuclease-free water 15.5
Total mixture 37.0
The paired-end adaptor reaction mixture was mixed with 13 μL of adenylated DNA from the previous step. The mixture was incubated in a thermocycler at 20 °C for 15 minutes, followed by a hold at 4 °C. The adaptor-ligated DNA was purified using AMPure XP beads as described in section 2.3.3.3 and eluted in 30 μL of nuclease-free water.
2.3.3.5 Amplification of Adaptor-ligated Library
Amplifying the library with a few cycles of PCR increases the number of DNA fragments to which both adaptors have been ligated. The library was amplified with Herculase II Fusion DNA Polymerase, a high-fidelity enzyme. The components of the amplification reaction mixture are given in Table 2.10.
Table 2.10: Components for amplifying the library
Components Volume (uL)
SureSelect Primer 1.25
SureSelect ILM Indexing Pre-Capture PCR Reverse Primer 1.25
5× Herculase II Reaction Buffer 10
100 mM dNTP Mix 0.5
Herculase II Fusion DNA Polymerase 1.0
Nuclease-free water 6.0
Total 20.0
This 20 μL of amplification reaction mixture was added to 30 μL of purified adaptor-ligated DNA from the previous step. The mixture was mixed well by pipetting and subjected to 10 cycles of PCR according to the program given in Table 2.11.
Table 2.11: PCR program for amplification of adaptor ligated library
Temperature Time Repeats
98°C 2 minutes 1
98°C 30 seconds 10
65°C 30 seconds
72°C 1 minute
72°C 10 minutes 1
4°C hold ∞
After PCR, the amplified library was purified again using AMPure XP beads as described in section 2.3.3.3 above. It was then eluted in 30 μL of nuclease-free water and stored at -20 °C.
2.3.3.6 Assessment of Quality and Quantity of the Amplified Library
The prepared libraries were analyzed and quantified using the Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA). The samples were prepared according to the user manual. For each analysis, 1 μL of library was mixed with 3 μL of D1000 sample buffer in sample tubes, vortexed for 5 seconds, and briefly centrifuged. The sample tubes, D1000 ScreenTape, and loading tips were placed in the instrument, and the run was started according to the User Manual.
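The TapeStation reports fragment size and concentration. For pooling and loading, a library concentration is usually converted to molarity using an average mass of about 660 g/mol per base pair. A minimal Python sketch of this standard conversion follows. The example values (10 ng/μL, 300 bp mean size) are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM)."""
    # ng/uL equals mg/L; dividing by the molar mass (g/mol) of the average
    # fragment gives mmol/L, and multiplying by 1e6 converts mmol/L to nmol/L.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_size_bp)


print(round(library_molarity_nm(10.0, 300.0), 2))  # 50.51
```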
2.3.3.7 Hybridization and Exome Capture
The exonic regions of the prepared library were captured using the SureSelectXT Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA). The libraries were concentrated in a vacuum concentrator to a final volume of 3.4 μL at a concentration of 221 ng/μL (750 ng DNA). To each library, 5.6 μL of Block Mix (Table 2.12) was added. The mixture was incubated at 95 °C in a thermocycler for 5 minutes, followed by a hold at 65 °C for another 5 minutes. To capture the exonic regions, the Hybridization Buffer was prepared according to Table 2.13, and the Capture Library Hybridization Mix was prepared according to Table 2.14.
Table 2.12: Components of Block Mix
Components Volume (uL)
SureSelect Indexing Block 1 2.5
SureSelect Block 2 2.5
SureSelect ILM Indexing Block 3 0.6
Total 5.6
Table 2.13: Components of Hybridization Buffer
Components Volume (uL)
SureSelect Hyb 1 6.63
SureSelect Hyb 2 0.27
SureSelect Hyb 3 2.65
SureSelect Hyb 4 3.45
Total 13.0
Table 2.14: Components of Capture Library Hybridization Mix for capture size ≥3 Mb
Components Volume (uL)
Hybridization Buffer mixture 13.0
25% RNase Block solution 2.0
SureSelect Capture Library 5.0
Total 20.0
The capture components were combined with the library at 65 °C in a thermocycler, and the mixture was incubated at the same temperature for 24 hours.
2.3.3.8 Capturing the Hybridized DNA using Streptavidin-coated Beads
Before the hybridized DNA was captured, magnetic streptavidin-coated beads were prepared. For this, 50 μL of streptavidin beads was washed 3 times by resuspending in 200 μL of SureSelect Binding Buffer. The hybridization mixture (containing the library) was added to 200 μL of washed streptavidin beads and mixed well on a stirrer for 30 minutes. The plate was then centrifuged briefly and placed on a magnetic rack until the beads settled completely. The supernatant was discarded, and the beads were washed with 200 μL of SureSelect Wash Buffer 1, with an incubation of 30 minutes at room temperature. The beads were then washed 3 times with pre-warmed Wash Buffer 2, with an incubation of 10 minutes at 65 °C. After the final wash, the beads were resuspended in 30 μL of nuclease-free water.
2.3.3.9 Amplification of Captured Library with Indexing Primers
To run multiple samples in one lane of the Illumina sequencer, each sample must be indexed for identification. Indexing was carried out using indexing primers in a PCR reaction. The PCR reaction mixture for each library was prepared as given in Table 2.15. For indexing, 14 μL of streptavidin-bound library, 1 μL of indexing primer, and 35 μL of PCR reaction mixture were mixed in a well of a PCR tube strip. The PCR was performed according to the conditions given in Table 2.16. After PCR, the amplified indexed library was purified using AMPure XP beads as described in section 2.3.3.3. The library was eluted in 30 μL of nuclease-free water. The libraries were analyzed and quantified with the Agilent 2200 TapeStation using 1 μL of each library, as described in section 2.3.3.6.
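Indexing allows reads from pooled libraries to be assigned back to their samples after sequencing (demultiplexing). The Python sketch below illustrates the idea by matching each read's index sequence against the sample indexes while tolerating one mismatch. The index sequences, sample names, and one-mismatch tolerance are illustrative assumptions. In practice, demultiplexing is typically performed with Illumina's own tools.

```python
from __future__ import annotations


def mismatches(a: str, b: str) -> int:
    """Hamming distance for equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, sample_indexes: dict[str, str],
                  max_mismatches: int = 1) -> str | None:
    """Return the single sample whose index lies within max_mismatches, else None."""
    hits = [sample for sample, index in sample_indexes.items()
            if mismatches(read_index, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6-bp indexes for three of the five libraries.
INDEXES = {"MS1": "ATCACG", "MS2": "CGATGT", "MS3": "TTAGGC"}
print(assign_sample("ATCACT", INDEXES))  # MS1 (one mismatch tolerated)
print(assign_sample("GGGGGG", INDEXES))  # None (unassigned)
```

Requiring a unique hit prevents a read from being assigned when its index is ambiguous between two samples.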
Table 2.15: Components of PCR for indexing
Components Volume (uL)
5× Herculase II Reaction Buffer 10.0
Herculase II Fusion DNA Polymerase 1.0
100 mM dNTP Mix 0.5
SureSelect ILM Indexing Post-Capture Forward PCR Primer 1.0
Nuclease-free water 22.5
Total 35.0
Table 2.16: PCR program for indexing the library
Temperature Time Repeats
98°C 2 minutes 1
98°C 30 seconds 10
57°C 30 seconds
72°C 1 minute
72°C 10 minutes 1
4°C hold ∞
2.3.3.10 Sequencing by Synthesis on Illumina Platform
The libraries were sequenced on an Illumina HiSeq 4000 system using TruSeq SBS v3 reagents. After cluster generation, paired-end sequencing of 2 × 100 bp reads was carried out using sequencing by synthesis (SBS) technology. In SBS, fluorescently labelled nucleotides are incorporated into the growing polynucleotide chain. Only one nucleotide is incorporated at a time, because each nucleotide carries a reversible terminator that blocks further extension. After the incorporated nucleotide is identified by its fluorescence, the label and the terminator are cleaved off chemically, and the next labelled nucleotide is incorporated.
2.3.4 Analysis of Whole Exome Sequencing Raw Data
The raw sequencing data was obtained in '.fastq' format. The secondary analysis that generates a standard variant call format (VCF) file requires a sophisticated pipeline of computational biology tools. The secondary analysis pipeline used by the ExAC consortium was applied with slight modifications. The analysis steps are given below:
1. The quality of the raw data was assessed with the 'FastQC' tool (Andrews, 2010). FastQC analyzes high-throughput short reads and graphically reports read length, per-base quality scores, GC content, and adaptor contamination.
perl fastqc -f fastq MS-1_1.fastq -o ./qc/
2. The good-quality short reads were aligned to human reference genome version 19 (hg19.fa) of the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). Alignment used the Burrows-Wheeler Aligner (BWA) tool (H. Li, and Durbin, 2009). The reference genome was indexed with 'bwa index' prior to alignment. The BWA-MEM algorithm was used because it can align DNA reads from 70 bp to 1 Mb in length. This algorithm seeds alignments with maximal exact matches (MEMs) and extends the seeds with the Smith-Waterman (SW) algorithm.
./bwa index hg19.fa
./bwa mem -M -t 8 hg19.fa MS-1_1.fastq MS-1_2.fastq > ms1.sam
3. The alignment result was obtained in Sequence Alignment Map (SAM) format. It was converted into the binary 'Binary Alignment Map' (BAM) format using samtools version 0.1.19 (H. Li et al., 2009).
samtools view -bS ms1.sam -o ms1.bam
4. The next five steps of the pipeline were performed as described in section 2.2.4.3: sorting of BAM files, adding RG tags, removing duplicates, indel realignment, and base quality score recalibration.
5. Genotypes were called for each base-quality-score-recalibrated BAM file using GATK's HaplotypeCaller walker, with a minimum calling confidence of 30. The -ERC GVCF option was used to output a per-sample genomic VCF (gVCF) for subsequent joint genotyping. A gVCF also records reference-confidence information for non-variant sites.
java -Xmx32g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit \
  -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
  --disable_auto_index_creation_and_locking_when_reading_rods \
  -R hg19.fa -I MS4_q30_sort_dedup_RG_ReAlignIndel_BQRC.bam \
  --dbsnp dbsnp37_chr_20151104.vcf \
  --minPruning 3 --maxNumHaplotypesInPopulation 200 \
  -ERC GVCF -o MS4_q30.g.vcf.gz -stand_call_conf 30
6. Joint variant calling from the gVCF files was carried out with GATK's GenotypeGVCFs walker:
java -Xms36g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
  -R /scratch/cmp_shakeel/tools/hg19.fa \
  -V MS1.g.vcf.gz -V MS2.g.vcf.gz -V MS3.g.vcf.gz -V MS4.g.vcf.gz -V MS5.g.vcf.gz \
  --dbsnp dbsnp37_chr_20151104.vcf -A GenotypeSummaries -o msall_GVCF.vcf
7. The raw VCF file 'msall_GVCF.vcf' was filtered for a depth of 20 (DP ≥ 20), a genotype quality of 20 (GQ ≥ 20), and a variant quality of 50 (QUAL ≥ 50). To further minimize the false-positive discovery rate, variant quality score recalibration was performed using GATK's variant discovery tool 'VariantRecalibrator'. The single nucleotide variants were trained with three high-confidence known SNP datasets: 'hapmap_3.3.GRCh37.vcf', '1000G_omni2.5.GRCh37.vcf', and '1000G_phase1.snps.high_confidence.GRCh37.vcf'. Likewise, the indels were trained using 'Mills_and_1000G_gold_standard.indels.GR37.sites.vcf' and '1000G_phase1.indels.hg19.vcf'. A truth-sensitivity threshold of 99.2% was applied for SNPs and of 95.0% for indels. Only the variants passing the filter proceeded to tertiary analysis.
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
  --disable_auto_index_creation_and_locking_when_reading_rods \
  -R hg19.fa -input msall_DP20gq20.vcf \
  -recalFile msall_DP20gq20_output-Tranche.snps.recal \
  -tranchesFile msall_DP20gq20_output-Tranche.snps.tranches -allPoly \
  -tranche 100.0 -tranche 99.8 -tranche 99.6 -tranche 99.4 -tranche 99.2 \
  -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 95.0 -tranche 90.0 \
  -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
  -resource:hapmap,known=false,training=true,truth=true,prior=15 hapmap_3.3.GRCh37.vcf \
  -resource:omni,known=false,training=true,truth=true,prior=12 1000G_omni2.5.GRCh37.vcf \
  -resource:1000G,known=false,training=true,truth=false,prior=10 1000G_phase1.snps.high_confidence.GRCh37.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=3 dbsnp37_chr_20151104.vcf \
  --maxGaussians 4 -mode SNP \
  -rscriptFile msall_output.snps.recalibration-Tranche_plots.rscript
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
  --disable_auto_index_creation_and_locking_when_reading_rods \
  -R hg19.fa -input msall_DP20gq20.vcf \
  -recalFile msall_DP20gq20_output-Tranche.snps.recal \
  -tranchesFile msall_DP20gq20_output-Tranche.snps.tranches \
  -ts_filter_level 99.2 -mode SNP -o msall_DP20gq20_SNP.vcf
java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
  --disable_auto_index_creation_and_locking_when_reading_rods \
  -R hg19.fa -input msall_DP20gq20_SNP.vcf \
  -recalFile msall_DP20gq20-output.indels.recal \
  -tranchesFile msall_DP20gq20-output.indels.tranches -allPoly \
  -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0 \
  -tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 -tranche 93.5 \
  -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 \
  -an QD -an ReadPosRankSum -an MQRankSum -an FS -an MQ \
  -resource:mills,known=false,training=true,truth=true,prior=12 Mills_and_1000G_gold_standard.indels.GR37.sites.vcf \
  -resource:1000G,known=false,training=true,truth=false,prior=10 1000G_phase1.indels.hg19.vcf \
  -resource:dbsnp137,known=true,training=false,truth=false,prior=2 dbsnp37_chr_20151104.vcf \
  --maxGaussians 4 -mode INDEL \
  -rscriptFile MSall_output.indels.recalibration_plots.rscript
java -Xms12g -Xmx24g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
  -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
  --disable_auto_index_creation_and_locking_when_reading_rods \
  -R hg19.fa -input msall_DP20gq20_SNP.vcf \
  -recalFile msall_DP20gq20-output.indels.recal \
  -tranchesFile msall_DP20gq20-output.indels.tranches \
  -ts_filter_level 95.0 -mode INDEL -o msall_DP20gq20_SNP-indel.vcf
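Step 7 begins with hard thresholds (DP ≥ 20, GQ ≥ 20, QUAL ≥ 50) before recalibration. The exact command used for this filtering is not given. The Python sketch below shows one way to express these thresholds on a multi-sample VCF record. It assumes that DP is the site-level INFO depth and that every called genotype must reach GQ ≥ 20; both are interpretive assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

MISSING_GT = {".", "./.", ".|."}


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to '')."""
    pairs = (item.partition("=") for item in info.split(";") if item)
    return {key: value for key, _, value in pairs}


def passes_hard_filter(vcf_line: str, min_dp: int = 20, min_gq: int = 20,
                       min_qual: float = 50.0) -> bool:
    """Apply QUAL, site-level DP and per-sample GQ thresholds to one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info, fmt, samples = fields[5], fields[7], fields[8], fields[9:]
    if qual == "." or float(qual) < min_qual:
        return False
    if int(parse_info(info).get("DP", "0")) < min_dp:
        return False
    keys = fmt.split(":")
    if "GQ" not in keys:
        return False
    gq_at = keys.index("GQ")
    for sample in samples:
        values = sample.split(":")
        if values[0] in MISSING_GT:
            continue  # no genotype called for this sample (GT is the first key)
        if gq_at >= len(values) or values[gq_at] == "." or int(values[gq_at]) < min_gq:
            return False
    return True
```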
2.3.5 Analysis of Variants for Cardiomyopathy
The analysis pipeline described in section 2.1.3 was employed to annotate the genetic variants and identify potentially detrimental variants. Annotation with ANNOVAR was carried out using the stand-alone ANNOVAR tool. CADD scoring was performed on the CADD online server (http://cadd.gs.washington.edu/score), and VEP was accessed on Ensembl's online server (http://grch37.ensembl.org/Homo_sapiens/Tools/VEP).
In addition to annotation with the aforementioned tools, the dataset was filtered for pathogenic and likely pathogenic variants of dilated cardiomyopathy in the ClinVar database (Landrum et al., 2014) and the OMIM database (Hamosh, Scott, Amberger, Bocchini, and McKusick, 2005). It was also filtered for variants associated with disease in genome-wide association studies (GWAS) (Welter et al., 2013). Furthermore, the deleterious variants prioritized from healthy individuals in the mutational load analysis (section 2.1) were also filtered for their validation.
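The ClinVar filtering described above can be sketched as a check on each variant's ClinVar annotation. The Python sketch below assumes the INFO keys CLNSIG (clinical significance) and CLNDN (disease name) used in current ClinVar VCF releases; older releases encoded these fields differently. The disease-substring rule and function names are hypothetical, and the OMIM and GWAS Catalog filtering are not shown.

```python
from __future__ import annotations

import re

PATHOGENIC_TERMS = {"pathogenic", "likely_pathogenic"}


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs."""
    pairs = (item.partition("=") for item in info.split(";") if item)
    return {key: value for key, _, value in pairs}


def is_pathogenic_for(info: str, disease: str = "dilated_cardiomyopathy") -> bool:
    """True if CLNSIG includes a (likely) pathogenic term and CLNDN names the disease."""
    fields = parse_info(info)
    # CLNSIG may combine terms, e.g. "Pathogenic/Likely_pathogenic".
    significance = {term.strip("_").lower()
                    for term in re.split(r"[/|,]", fields.get("CLNSIG", "")) if term}
    diseases = fields.get("CLNDN", "").lower().split("|")
    return bool(significance & PATHOGENIC_TERMS) and any(
        disease in name for name in diseases)
```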
Chapter 3.0
Results and Discussion
3.1 Mutational Load of Cardiovascular Diseases in Pakistani
Population and its Comparison with Global Populations
3.1.1 Gene Ontology
The genes under study were grouped by their cellular, molecular, and biological roles using the UniProt Gene Ontology Annotation database for human (version 2.0) (Camon et al., 2004). The results were visualized using the BGI WEGO online ontology tool (Ye et al., 2006). This analysis showed that most of the genes were involved in binding, catalysis, molecular transduction, and enzyme regulation (Figure 3.1). These functions operate in many biological processes, such as biological regulation, anatomical structure formation, cellular component organization and biogenesis, and developmental, metabolic, and organismal processes. The gene ontology profile suggests that genes related to the structural processes of organelles reflect the anatomical nature of cardiac diseases, whereas genes related to extracellular processes reflect their metabolic nature.
3.1.2 Mutational Load of CVDs in Pakistani Population using 1000
Genomes PJL, ExAC SAS, and British Pakistanis Datasets
To quantify the mutational load of cardiovascular diseases, the analysis pipeline was applied to all SNVs from the three datasets that fell within the gene set. This covered intronic, exonic, and untranslated regions as well as the flanking upstream/downstream regions of the genes. The numbers of variants found in the three datasets differed because of differences in data structure and sample size (Table 3.1).
Figure 3.1: Functional categorization of genes involved in cardiovascular diseases.
Table 3.1: The subset of variants within the coordinates of the CVD gene set. ExAC (SAS) data was excluded from the mutational load calculation for common CVDs because it contained samples from a common-CVD cohort.
Details | 1000 Genomes PJL | British Pakistanis | ExAC South Asian
Sample size | 96 | 3222 | 8256
Genes related to CVDs analyzed | 1187 | 1187 | 379
Subset of variants in these genes | 409102 | 93523 | 71816
Exonic variants | 6941 | 41155 | 44357
5'-UTR variants | 1573 | 1898 | 1075
3'-UTR variants | 7541 | 2632 | 1694
Upstream variants | 4668 | 256 | 80
Downstream variants | 4752 | 39 | 9
Predicted consequences of variants:
Non-synonymous SNVs | 3521 | 24901 | 28305
Synonymous SNVs | 4125 | 15624 | 15437
Non-syn/syn ratio | 0.85 | 1.59 | 1.83
'Combined predicted deleterious' SNV sites with SIFT, PolyPhen2, and CADD_phred score ≥ 15 (dSNVs) | 561 | 6028 | 7374
Homozygous dSNVs | 69 | -* | 306
Loss of Function (LoF) dSNVs | 5 | 9 | 142
Per person deleterious SNV sites | 5.84 | 1.87 | 0.89
* information not available
To normalize and evaluate the variant subsets, several measures were calculated for each dataset:
- the proportions of synonymous and nonsynonymous SNVs among exonic variants;
- the non-syn/syn ratio;
- the numbers of deleterious nonsynonymous SNVs and homozygous deleterious nSNVs.

The proportions of nonsynonymous exonic SNVs (nonsynonymous SNVs/exonic SNVs) and deleterious nonsynonymous SNVs (deleterious nSNVs/exonic SNVs) were higher in the British Pakistanis and ExAC SAS datasets. In contrast, the proportion of synonymous exonic SNVs (synonymous SNVs/exonic SNVs) was higher in the 1000 Genomes PJL dataset (Figure 3.2). The higher proportions in ExAC SAS and British Pakistanis may reflect data structure: both datasets were deeply sequenced (~100x) and therefore also captured variants with ultra-rare allele frequencies.

To check this, the average allele frequencies of nonsynonymous SNVs and deleterious nSNVs were calculated and compared across the three datasets. The average allele frequency of nonsynonymous SNVs was 0.117 in 1000 Genomes PJL, 0.01839 in British Pakistanis, and 0.00826 in ExAC SAS. Likewise, the average allele frequency of deleterious nSNVs was 0.028 in 1000 Genomes PJL, 0.00381 in British Pakistanis, and 0.00149 in ExAC SAS. These lower values in the two deeply sequenced datasets are consistent with a larger share of rare variants.
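The normalization above reduces to a few ratios over the Table 3.1 counts, and the sketch below recomputes them. The dictionary keys are illustrative names. The counts are copied from Table 3.1, and the combined dSNV count is assumed to stand in for "deleterious nSNVs".

```python
# Counts copied from Table 3.1: exonic SNVs, non-synonymous, synonymous, combined dSNVs.
TABLE_3_1 = {
    "1000 Genomes PJL":   {"exonic": 6941,  "nonsyn": 3521,  "syn": 4125,  "dsnv": 561},
    "British Pakistanis": {"exonic": 41155, "nonsyn": 24901, "syn": 15624, "dsnv": 6028},
    "ExAC SAS":           {"exonic": 44357, "nonsyn": 28305, "syn": 15437, "dsnv": 7374},
}

def exonic_proportions(c):
    """Normalize variant classes by the number of exonic SNVs in one dataset."""
    return {
        "nonsyn/exonic": c["nonsyn"] / c["exonic"],
        "syn/exonic": c["syn"] / c["exonic"],
        "dsnv/exonic": c["dsnv"] / c["exonic"],
        "nonsyn/syn": c["nonsyn"] / c["syn"],
    }

for name, counts in TABLE_3_1.items():
    props = exonic_proportions(counts)
    print(name, {k: round(v, 3) for k, v in props.items()})
```

Rounded to two decimals, the `nonsyn/syn` values reproduce the non-syn/syn row of Table 3.1 (0.85, 1.59, 1.83).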
Figure 3.2: The proportions of nonsynonymous, synonymous, and deleterious SNVs in three datasets.
The numbers of SNVs predicted as deleterious by CADD, SIFT, and Polyphen2 after applying the analysis pipeline are summarized in Figure 3.3. The per-person mutational load for cardiovascular diseases was calculated by dividing the number of combined predicted deleterious SNVs by the sample size of each dataset. This gave 5.84 deleterious sites per person in 1000 Genomes PJL, 1.87 in British Pakistanis, and 0.89 in ExAC SAS.

The low mutational load in ExAC SAS reflects the fact that these data were analyzed for Mendelian and congenital CVDs only. This is consistent with the generally low prevalence of Mendelian disorders, which are usually caused by rare variants in one or a few genes with a large impact on protein structure and/or function (O'Donnell, and Nabel, 2011).
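The combined deleterious call and the per-person load can be sketched as follows. The CADD_phred ≥ 15 cut-off comes from Table 3.1. Two things are assumptions, not the documented pipeline: the categorical labels follow dbNSFP-style output (as produced through ANNOVAR, for example), and SIFT "D" plus PolyPhen-2 "D" or "P" are counted as damaging. The field names in the variant dictionaries are hypothetical.

```python
CADD_PHRED_MIN = 15.0  # threshold stated in Table 3.1

def is_combined_deleterious(v):
    """True if SIFT, PolyPhen-2 and CADD all flag the SNV as damaging.

    Assumes dbNSFP-style labels: SIFT 'D' (deleterious) / 'T' (tolerated);
    PolyPhen-2 'D' (probably damaging), 'P' (possibly damaging), 'B' (benign).
    A missing CADD score is treated as not deleterious.
    """
    cadd = v.get("cadd_phred")
    return (
        v.get("sift_pred") == "D"
        and v.get("polyphen2_pred") in {"D", "P"}
        and cadd is not None
        and cadd >= CADD_PHRED_MIN
    )

def per_person_load(n_deleterious_sites, sample_size):
    """Deleterious SNV sites divided by the number of individuals."""
    return n_deleterious_sites / sample_size

# Per-person values from the dSNV counts and sample sizes in Table 3.1.
for name, dsnv, n in [("1000 Genomes PJL", 561, 96),
                      ("British Pakistanis", 6028, 3222),
                      ("ExAC SAS", 7374, 8256)]:
    print(f"{name}: {per_person_load(dsnv, n):.2f}")
```

The loop prints 5.84, 1.87 and 0.89, matching the values reported above.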
Figure 3.3: The number of SNVs predicted as deleterious by CADD, Polyphen2, and SIFT in genes of cardiovascular diseases.
The 1000 Genomes PJL and British Pakistanis datasets were analyzed with the same number of genes. To explore the apparent difference in their mutational load of CVDs, the additive mutational load was calculated. The additive mutational load is the cumulative effect on fitness of all detrimental alleles (Bergen, 2015; Henn et al., 2016). It was determined by dividing the total number of deleterious alleles in homozygous and heterozygous genotypes by the number of individuals in the dataset (Henn et al., 2016). The per-person additive mutational load (in the diploid genome) was 22.03 for British Pakistanis and 15.78 for 1000 Genomes PJL.

Although the British Pakistanis had fewer deleterious sites per person, these sites may have risen to higher frequencies because of the high rate of consanguineous unions. This would result in a higher additive mutational load for CVDs, a pattern associated with inbreeding depression. In inbreeding depression, increased mating among related individuals reduces biological fitness through the accumulation of recessive mutations of varying detrimental effect in a small population (Charlesworth, and Willis, 2009).

The mutational load was also higher for common CVDs than for Mendelian and congenital CVDs, which their genetic architecture can explain. Common CVDs are polygenic: many deleterious variants of modest-to-weak effect in multiple genes contribute cumulatively to disease susceptibility. Mendelian CVDs are monogenic or oligogenic, with a few rare variants having a large effect on the phenotype (Lettre, 2014).
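The additive load defined above (Henn et al., 2016) counts every deleterious allele, so a homozygous carrier contributes two and a heterozygous carrier one. The first function below is a minimal sketch over a hypothetical per-individual dosage matrix. The second function is not from the thesis; it gives the standard expected genotype frequencies under an inbreeding coefficient F (F = 1/16 for offspring of first cousins). It shows how consanguinity raises the share of homozygous genotypes for a recessive allele at frequency q. Neither function reproduces the thesis pipeline itself.

```python
def additive_load(dosages):
    """Mean number of deleterious alleles per individual.

    dosages: one list per individual of alt-allele counts (0, 1 or 2)
    at the deleterious sites, so homozygotes count twice.
    """
    return sum(sum(person) for person in dosages) / len(dosages)

def genotype_frequencies(q, F):
    """Expected (AA, Aa, aa) frequencies for allele frequency q and inbreeding coefficient F."""
    p = 1.0 - q
    hom_ref = p * p + F * p * q
    het = 2.0 * p * q * (1.0 - F)
    hom_alt = q * q + F * p * q
    return hom_ref, het, hom_alt

# Hypothetical dosages for three individuals at four deleterious sites.
cohort = [
    [0, 1, 2, 0],  # 1 heterozygous + 1 homozygous site -> 3 alleles
    [1, 0, 0, 0],  # 1 allele
    [2, 2, 0, 1],  # 5 alleles
]
print(additive_load(cohort))  # (3 + 1 + 5) / 3 = 3.0

q = 0.01
outbred = genotype_frequencies(q, 0.0)[2]
cousins = genotype_frequencies(q, 1 / 16)[2]
print(f"aa frequency: {outbred:.6f} outbred vs {cousins:.6f} with F = 1/16")
```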
In the 1000 Genomes PJL analysis, the highest number of deleterious variants (10) was found in PRRC2A. This gene encodes proline rich coiled-coil 2A and is involved in coronary artery aneurysm (Hsieh et al., 2010).

The second highest number (9) was found in SVEP1, which encodes sushi, von Willebrand factor type A, EGF and pentraxin domain containing 1. This protein is involved in calcium ion and chromatin binding, and the gene has been associated with coronary artery disease. Notably, the deleterious variant rs111245230 was also found in SVEP1. It causes a D2702G substitution in exon 38 and has been reported to be associated with coronary artery disease and with higher diastolic and systolic blood pressure (Stitziel et al., 2016). Its minor allele frequency was 5.20% in PJL individuals, 2.76% in South Asians, 3.18% in Europeans, and 2.74% in Americans.

The gene with the third highest number of deleterious variants (8) was SYNE1. It encodes spectrin repeat containing nuclear envelope protein 1, a structural protein of skeletal and smooth muscle that is associated with dilated cardiomyopathy. Mutations in this gene disrupt the nuclear envelope, leading to defects in myogenesis (Zhou et al., 2017).

APOB and MUC16 each carried seven deleterious variants. APOB is associated with hypercholesterolemia and coronary artery disease (Willer et al., 2008), while MUC16 has been reported to be associated with hypertrophic cardiomyopathy and heart failure (Varol et al., 2007). ACE carried six deleterious variants. This gene encodes angiotensin I converting enzyme and has been associated with risk of hypertensive heart disease and coronary artery disease (Dhar, Ray, Dutta, Sengupta, and Chakrabarti, 2012).
In the ExAC SAS data, the highest number of deleterious variants (1526) was found in TTN. This gene encodes titin, a sarcomeric protein of striated muscle that is associated with cardiomyopathies (Gerull et al., 2002; Matsumoto et al., 2005). To locate this large number of deleterious variants within TTN, a Manhattan-style plot of CADD_phred scores across the TTN region on chromosome 2 was constructed. It showed that most deleterious variants are clustered in the initial exons of the gene (Figure 3.4).
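The clustering seen in Figure 3.4 can also be checked without a plot, by binning the positions of variants above the CADD threshold along the TTN region. The minimal sketch below assumes (position, CADD_phred) pairs and a hypothetical 10 kb bin width; the example positions and scores are invented. TTN lies on the minus strand of chromosome 2, so its 5' (initial) exons sit at the higher genomic coordinates. Bins near the end of the region therefore correspond to the start of the gene.

```python
from collections import Counter

def bin_deleterious_positions(variants, region_start, bin_width=10_000, cadd_min=15.0):
    """Count variants with CADD_phred >= cadd_min in fixed-width bins.

    variants: iterable of (position, cadd_phred) tuples within one region.
    Returns {bin_start_coordinate: count}, sorted by coordinate.
    """
    counts = Counter(
        region_start + ((pos - region_start) // bin_width) * bin_width
        for pos, cadd in variants
        if cadd >= cadd_min
    )
    return dict(sorted(counts.items()))

# Invented positions and scores for illustration.
toy = [(179_400_100, 18.2), (179_401_900, 9.5), (179_405_000, 24.0),
       (179_612_000, 16.1), (179_615_500, 31.0), (179_618_000, 22.7)]
for start, n in bin_deleterious_positions(toy, region_start=179_390_000).items():
    print(start, n)
```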
Figure 3.4: Chromosomal positions of deleterious variants in TTN. The deleterious variants are bunched in initial exons of the gene.
In addition to TTN, many genes carried multiple deleterious variants for Mendelian and congenital diseases (Table 3.2). OBSCN, a paralogue of TTN, had the second highest number (233). This gene encodes obscurin, a cytoskeletal calmodulin and titin-interacting RhoGEF protein.
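Tables 3.2 and 3.3 rank genes by the number of predicted deleterious variants they carry. Given per-variant gene annotations, that ranking is a simple tally. The sketch assumes one gene symbol per deleterious variant; variants annotated to overlapping genes, such as FPGT-TNNI3K, would need a rule for splitting or duplicating counts. The input list is a toy example.

```python
from collections import Counter

def rank_genes(deleterious_gene_symbols, min_count=1):
    """Return (gene, count) pairs sorted by descending count, then by gene name."""
    counts = Counter(deleterious_gene_symbols)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n) for gene, n in ranked if n >= min_count]

toy = ["TTN", "OBSCN", "TTN", "SYNE1", "TTN", "OBSCN", "MYH7"]
print(rank_genes(toy, min_count=2))
```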
Table 3.2: Genes of Mendelian and congenital CVDs containing high numbers of predicted deleterious variants in ExAC SAS
Gene No. of deleterious variants Disorder
TTN 1526 Dilated cardiomyopathy
OBSCN 233 Dilated cardiomyopathy
SYNE1 144 Dilated cardiomyopathy
ALMS1 120 Alström syndrome and dilated cardiomyopathy
FLNC 111 Dilated cardiomyopathy
SYNE2 111 Dilated cardiomyopathy
MYH6 99 Dilated cardiomyopathy
LAMA3 98 Dilated cardiomyopathy
AKAP9 88 Cardiac arrhythmia
RYR2 84 Cardiac arrhythmia
SCN5A 84 Atrial fibrillation, long QT syndrome
VWF 78 Hypertrophic cardiomyopathy
MUC16 77 Hypertrophic cardiomyopathy
TNC 69 Dilated cardiomyopathy
LAMA2 67 Dilated cardiomyopathy
TRPM4 60 Ventricular fibrillation
DSP 57 Arrhythmogenic right ventricular dysplasia 8
POLG 55 Dilated cardiomyopathy
MADD 53 Familial hypertrophic cardiomyopathy 4
ANK2 52 Cardiac arrhythmia, ankyrin-B-related
FLNA 52 Cardiac valvular dysplasia, X-linked
MYBPC3 51 Cardiomyopathies
MYH11 51 Aortic aneurysm, familial thoracic 4
MYBPC1 49 Dilated cardiomyopathy
MYH7 48 Dilated cardiomyopathy
DMD 45 Cardiomyopathy, dilated, 3b
MYOM1 43 Primary dilated cardiomyopathy; primary familial hypertrophic cardiomyopathy
NOTCH1 41 Ventricular septal defect
LAMA4 40 Primary familial hypertrophic cardiomyopathy
NFATC1 39 Ventricular septal defect
PRDM16 38 Dilated cardiomyopathy 1LL
UNC5B 37 Ventricular tachycardia
PYGB 36 Hypertrophic cardiomyopathy
CTNNA3 35 Left ventricular noncompaction cardiomyopathy
JUP 34 Cardiac arrhythmia
CACNA1C 33 Cardiac arrhythmia
DSC3 33 Arrhythmogenic right ventricular dysplasia 11
ERBB2 32 Dilated cardiomyopathy
RTN4 32 Congenital heart defects, with multiple joint dislocations
LDB3 31 Familial hypertrophic cardiomyopathy 1
NDUFV1 30 Dilated cardiomyopathy
DTNA 29 Left ventricular noncompaction 1, with or without congenital heart defects
DMPK 28 Hypertrophic cardiomyopathy
TXNRD2 28 Primary familial hypertrophic cardiomyopathy
VCL 28 Hypertrophic cardiomyopathy
ACTN2 27 Dilated cardiomyopathy 1AA; primary familial hypertrophic cardiomyopathy
HLA-DRB1 27 Dilated cardiomyopathy
IGF1R 27 Hypertrophic cardiomyopathy
PGM1 27 Dilated cardiomyopathy
DSC2 25 Arrhythmogenic right ventricular dysplasia
MYBPC2 25 Hypertrophic cardiomyopathy
PKP2 25 Arrhythmogenic right ventricular dysplasia 9
MMP2 22 Hypertrophic cardiomyopathy
KCNQ1 21 Long QT syndrome 1
The British Pakistanis dataset, which was analyzed for all genes of common, Mendelian, and congenital CVDs, contained 52 deleterious variants in APOB. This gene is well known to be associated with hypercholesterolemia and coronary artery disease (Willer et al., 2008). Among common CVD genes, the second highest number of deleterious variants (45) was found in PRRC2A, which is associated with coronary artery aneurysm (Hsieh et al., 2010).

HSPG2 contained 44 deleterious variants. This gene encodes heparan sulfate proteoglycan 2, which is prominent in the arterial extracellular matrix. It has been reported to lower the risk of atherogenesis by inhibiting the retention of lipoproteins. Low expression of HSPG2 and decreased amounts of heparan sulfate proteoglycan have been shown to be associated with carotid atherosclerotic lesions (Tran et al., 2007).

Among genes associated with Mendelian CVDs, TTN was again the top gene, with 484 deleterious variants. The genes with multiple deleterious variants prioritized from this dataset are summarized in Table 3.3.
Table 3.3: Genes of common, Mendelian and congenital CVDs containing high numbers of predicted deleterious variants in British Pakistanis.
Gene No. of deleterious variants Disorder
Genes of Mendelian and congenital CVDs
TTN 484 Dilated cardiomyopathy
OBSCN 95 Dilated cardiomyopathy
SYNE2 68 Dilated cardiomyopathy
SYNE1 49 Dilated cardiomyopathy
RYR2 37 Cardiac arrhythmia
LAMA3 36 Dilated cardiomyopathy
MYH6 36 Dilated cardiomyopathy
FLNC 35 Dilated cardiomyopathy
MUC16 34 Hypertrophic cardiomyopathy
SCN5A 33 Atrial fibrillation, long QT syndrome
ALMS1 31 Alström syndrome and dilated cardiomyopathy
FPGT-TNNI3K 30 Cardiac conduction disease with or without dilated cardiomyopathy
ABCC6 28 Arterial calcification of infancy
AKAP9 28 Cardiac arrhythmia
TRPM4 24 Ventricular fibrillation
LAMA2 21 Dilated cardiomyopathy
VWF 21 Hypertrophic cardiomyopathy
MYH11 20 Aortic aneurysm, familial thoracic 4
Genes of Common CVDs
APOB 52 Hypercholesterolemia and coronary artery disease
PRRC2A 45 Coronary artery aneurysm
HSPG2 44 Carotid atherosclerotic lesions
SDK1 33 Hypertension
CSMD1 30 Hypertension
ABCC6 28 Dystrophic cardiac calcification
FN1 27 Aortic aneurysm
ACE 26 Hypertensive heart disease and coronary artery disease
ITPR3 21 Coronary artery aneurysm
SVEP1 21 Coronary artery disease
XDH 21 Atherosclerosis
Among common CVDs, the disorders with the highest numbers of predicted deleterious SNVs in the datasets under study were, in descending order, hypertension, atherosclerosis, heart failure, aneurysm, and coronary heart disease. Among Mendelian and congenital CVDs, they were cardiomyopathies (dilated and hypertrophic), cardiac arrhythmias, and atrioventricular septal defects.
3.1.3 Filtration of Variants from ClinVar Database
The variants of the three datasets under study were filtered for disease mutations catalogued in the ClinVar database. This filtration highlighted several variants associated with cardiovascular disorders and classed as pathogenic or likely pathogenic. Almost all filtered variants were related to Mendelian and congenital CVDs, reflecting the nature of submissions to the database.
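The ClinVar filtration can be sketched as a join between the study variants and the ClinVar VCF. The minimal sketch below assumes a ClinVar VCF release in which the CLNSIG INFO field holds text labels such as Pathogenic, Likely_pathogenic or Pathogenic/Likely_pathogenic, and CLNDN holds the condition name. Older ClinVar VCFs encoded CLNSIG numerically, with 5 for pathogenic and 4 for likely pathogenic. Variants are matched on chromosome, position, reference allele, and alternate allele. The records below are abbreviated toy lines.

```python
PATHOGENIC_LABELS = {"Pathogenic", "Likely_pathogenic", "Pathogenic/Likely_pathogenic"}

def parse_info(info_field):
    """Turn 'KEY=value;FLAG;...' into a dict (flags map to True)."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info

def load_pathogenic_clinvar(vcf_lines):
    """Map (chrom, pos, ref, alt) -> (CLNSIG, CLNDN) for P/LP ClinVar records.

    Exact label match only; compound labels (e.g. with '|risk_factor')
    would need extra handling.
    """
    keep = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts, _qual, _filt, info_field = line.rstrip("\n").split("\t")[:8]
        info = parse_info(info_field)
        if info.get("CLNSIG") in PATHOGENIC_LABELS:
            for alt in alts.split(","):
                keep[(chrom, int(pos), ref, alt)] = (info["CLNSIG"], info.get("CLNDN", ""))
    return keep

def filter_study_variants(study_variants, clinvar_hits):
    """Return study variants (chrom, pos, ref, alt) that have a P/LP ClinVar record."""
    return [(v, clinvar_hits[v]) for v in study_variants if v in clinvar_hits]

clinvar = [
    "##fileformat=VCFv4.1",
    "1\t3347452\t1\tG\tA\t.\t.\tCLNSIG=Pathogenic;CLNDN=Dilated_cardiomyopathy",
    "6\t7583050\t2\tG\tA\t.\t.\tCLNSIG=Likely_pathogenic;CLNDN=Arrhythmogenic_right_ventricular_cardiomyopathy",
    "3\t100\t3\tC\tT\t.\t.\tCLNSIG=Benign;CLNDN=not_provided",
]
study = [("1", 3347452, "G", "A"), ("3", 100, "C", "T"), ("2", 5, "A", "G")]
for variant, (sig, disease) in filter_study_variants(study, load_pathogenic_clinvar(clinvar)):
    print(variant, sig, disease)
```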
From 1000 Genomes PJL, three variants with ClinVar significance 'Pathogenic' and two with 'Likely Pathogenic' were identified (Table 3.4). The three pathogenic SNVs (rs201654872, rs115372595, and rs201680145) contribute to dilated cardiomyopathy, atrioventricular septal defect, and cerebral autosomal dominant arteriopathy, respectively.

Annotation with the online VEP tool showed that two of the pathogenic missense SNVs, rs201654872 [Val/Met] and rs201680145 [Arg/Cys], are linked with a CCCTC-binding factor site (CTCF_binding_site). CTCF binding sites are major determinants of long-range chromatin interactions (looping), which alter gene expression (Zlotorynski, 2015). The third pathogenic missense SNV, rs115372595 [Ala/Val], is also linked with a regulatory region (an open chromatin region). Open chromatin sites tend to lie near transcription start sites, coincide with CTCF binding sites, and play a role in gene expression (Song et al., 2011).

The two 'Likely Pathogenic' variants (rs193922669 and rs77613865) contribute to arrhythmogenic right ventricular cardiomyopathy and hypertrophic cardiomyopathy, respectively. The missense SNV rs193922669 causes an Arg/His substitution in desmoplakin. rs77613865 is a splice region variant that is also linked with an open chromatin region and may affect the expression of myomesin 1 (MYOM1).
From the ExAC SAS dataset, 153 SNVs were filtered: 111 with 'Pathogenic' and 42 with 'Likely Pathogenic' significance (Table 3.5). Nearly half of these SNVs (78, 47.40%) belonged to different forms of cardiomyopathy. A further 38 (26.68%) were associated with long QT syndrome, and 8 (5.19%) with various forms of atrioventricular septal defect.

Comparing the allele frequencies of the filtered variants among continental populations revealed 11 SNVs with a significantly higher allele frequency in SAS than in other populations of the world (Figure 3.5). All 153 SNVs were also annotated with the online VEP tool for their functional consequences. This showed that 13 SNVs cause complete loss of function (LoF) of the transcripts and 23 SNVs affect regulatory regions.
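The test used to call an allele frequency "significantly higher" in SAS is not restated here. One common choice, assumed in the sketch below, is a one-sided two-proportion z-test on allele counts (AC) and allele numbers (AN), such as the AC_SAS and AN_SAS fields of the ExAC VCF. For very rare variants, Fisher's exact test would be the safer alternative. The counts in the example are invented.

```python
import math

def sas_enrichment(ac_sas, an_sas, ac_other, an_other):
    """One-sided two-proportion z-test that the SAS allele frequency is higher.

    Returns (af_sas, af_other, z, p_value). Uses the normal approximation,
    which is unreliable when allele counts are very small.
    """
    af_sas = ac_sas / an_sas
    af_other = ac_other / an_other
    pooled = (ac_sas + ac_other) / (an_sas + an_other)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / an_sas + 1.0 / an_other))
    if se == 0.0:
        return af_sas, af_other, 0.0, 1.0  # variant absent (or fixed) in both groups
    z = (af_sas - af_other) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) under the null
    return af_sas, af_other, z, p_value

# Invented counts: 30 alleles in 1,000 SAS chromosomes vs 10 in 1,000 others.
af_s, af_o, z, p = sas_enrichment(30, 1000, 10, 1000)
print(f"AF SAS={af_s:.3f}, AF other={af_o:.3f}, z={z:.2f}, one-sided p={p:.2g}")
```

When testing many variants, as here, a multiple-testing correction such as Bonferroni would normally be applied to the p-values.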
Filtering the British Pakistanis dataset highlighted 58 SNVs: 42 with 'Pathogenic' and 16 with 'Likely Pathogenic' significance (Table 3.6). In this dataset, the SNVs were associated with the following conditions:
- different forms of cardiomyopathy: 23 SNVs;
- long QT syndrome: 20;
- atrioventricular septal defects: 4;
- familial hypercholesterolemia: 2;
- aortic aneurysm: 2;
- progressive familial heart block: 3.

Annotation with the online VEP tool revealed 2 stop-gained SNVs (rs786204338 and rs372827156) contributing to cardiomyopathies. It also revealed 10 SNVs predicted to have detrimental effects on regulatory regions.
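Counting consequence classes is straightforward when VEP writes its annotation into the VCF CSQ field. The names of its sub-fields are listed in the `##INFO=<ID=CSQ,...Format: ...>` header line. The minimal sketch below assumes that layout and uses a shortened toy header and toy records. A variant is counted as stop-gained or regulatory if any of its transcript annotations carries that term.

```python
from collections import Counter

def csq_fields(header_line):
    """Extract CSQ sub-field names from the VEP '##INFO=<ID=CSQ,...>' header line."""
    fmt = header_line.split("Format: ")[1]
    return fmt.rstrip('">\n').split("|")

def consequence_terms(csq_value, fields):
    """Collect all consequence terms across transcript annotations for one variant."""
    idx = fields.index("Consequence")
    terms = set()
    for transcript_annotation in csq_value.split(","):
        parts = transcript_annotation.split("|")
        terms.update(parts[idx].split("&"))  # several terms are joined by '&'
    return terms

REGULATORY = {"regulatory_region_variant", "TF_binding_site_variant"}

def summarize(csq_values, fields):
    """Count stop-gained variants and variants touching regulatory features."""
    summary = Counter()
    for value in csq_values:
        terms = consequence_terms(value, fields)
        if "stop_gained" in terms:
            summary["stop_gained"] += 1
        if terms & REGULATORY:
            summary["regulatory"] += 1
    return summary

header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
fields = csq_fields(header)
records = [
    "T|stop_gained|HIGH|GENE_A,T|regulatory_region_variant|MODIFIER|",
    "A|missense_variant&splice_region_variant|MODERATE|GENE_B",
    "G|TF_binding_site_variant|MODIFIER|",
]
print(summarize(records, fields))
```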
To obtain an overall view of the pathogenic and likely pathogenic variants filtered from the three datasets, the variants were combined and their allele frequencies were retrieved from their respective datasets. The allele frequencies of variants filtered from the 1000 Genomes Project were almost equal across world populations. In ExAC SAS, 11 SNVs had a higher allele frequency in SAS than in other populations of the world (Figure 3.5). In terms of the allele count of ClinVar pathogenic and likely pathogenic variants, the highest load was found for progressive familial heart block (OMIM # 113900 and 604559) (Figure 3.6).

The genomic locations of the genes harboring these variants associated with common, Mendelian, and congenital CVDs in the Pakistani population are shown in Figure 3.7. A few loci were rich in pathogenic and likely pathogenic variants for CVDs:
- SCN5A on chromosome 3;
- KCNH2 on chromosome 7;
- GATA4 on chromosome 8;
- KCNQ1 and MYBPC3 on chromosome 11;
- MYH7 on chromosome 14;
- KCNE1 on chromosome 21.

SCN5A encodes sodium voltage-gated channel alpha subunit 5, which is found primarily in cardiac muscle. It contributes to the upstroke of the action potential in cardiac cells. Mutations in this gene have been found to disturb cardiac rhythm, causing long QT syndrome (Qureshi et al., 2015; Schwartz et al., 1995).

KCNQ1 encodes potassium voltage-gated channel subfamily Q member 1, which is involved in the repolarization phase of the action potential in cardiac muscle. Mutations in this gene are also associated with long QT syndrome (Tester, and Ackerman, 2014).

MYBPC3 encodes cardiac myosin binding protein C, which contributes to cross-bridge formation in the A bands of cardiac striated muscle. MYH6 and MYH7 encode the alpha and beta myosin heavy chains, respectively, which constitute cardiac myosin. These three genes are well associated with various forms of cardiomyopathy leading to heart failure (Bezzina, 2008; Cahill, Ashrafian, and Watkins, 2013).

KCNE1 and KCNE2 encode regulatory subunits of potassium voltage-gated channels involved in cardiac conduction. Both genes are associated with long QT syndrome (Tester, and Ackerman, 2014).
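Figure 3.6 expresses the load of each disorder as the summed allele count of its ClinVar pathogenic and likely pathogenic variants. The minimal sketch below performs that aggregation under several assumptions. Each input record is a (ClinVar disease label, allele count) pair. ClinVar's '|'-joined multi-condition labels are split, and the parts are mapped to broader groups with hypothetical keyword rules, so a variant counts once toward each distinct group. Whether the thesis split or collapsed such labels is not stated, and the counts are invented.

```python
from collections import defaultdict

# Hypothetical keyword-to-group rules; the first match wins.
GROUPS = [
    ("long_qt", "Long_QT_syndrome"),
    ("hypertrophic", "Hypertrophic_cardiomyopathy"),
    ("dilated", "Dilated_cardiomyopathy"),
    ("heart_block", "Progressive_familial_heart_block"),
]

def disorder_group(label):
    """Map one ClinVar condition label to a broader disorder group."""
    lowered = label.lower()
    for keyword, group in GROUPS:
        if keyword in lowered:
            return group
    return "Other"

def allele_count_by_disorder(records):
    """Sum allele counts per disorder group; records are (clndn, allele_count)."""
    totals = defaultdict(int)
    for clndn, ac in records:
        # Count each variant once per distinct group, even if several labels map to it.
        groups = {disorder_group(label) for label in clndn.split("|")}
        for group in groups:
            totals[group] += ac
    return dict(totals)

toy = [
    ("Progressive_familial_heart_block_type_1A", 1200),
    ("Congenital_long_QT_syndrome|Cardiac_arrhythmia", 15),
    ("Long_QT_syndrome_2|Congenital_long_QT_syndrome", 4),
    ("Dilated_cardiomyopathy_1E", 3),
]
print(allele_count_by_disorder(toy))
```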
Figure 3.5: ClinVar's pathogenic and likely pathogenic variants from ExAC SAS having significantly higher allele frequency in SAS than in other populations.
[Chart: allele frequency (%), 0 to 3, for each variant in the SAS, EUR, AMR, AFR, and EAS populations.]
Figure 3.6: Mutational load of different cardiovascular disorders in terms of allele counts of ClinVar's pathogenic and likely pathogenic variants.
[Chart: allele count (log scale, 1 to 10000) in British Pakistanis and ExAC SAS for progressive familial heart block, long QT syndrome, left ventricular noncompaction, hypertrophic cardiomyopathy, hyperlipidemia/hypercholesterolemia, dilated cardiomyopathy, congenital heart diseases, and arrhythmogenic right ventricular cardiomyopathy.]
Figure 3.7: Chromosomal positions of genes harboring ClinVar's pathogenic and likely pathogenic variants associated with cardiovascular diseases. Each circle beside a chromosome denotes one variant, and its colour represents the gene.
Table 3.4: ClinVar's pathogenic and likely pathogenic variants filtered from the 1000 Genomes PJL dataset.
CHR POS ID REF ALT Gene Clinical significance Disease
1 3347452 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy
6 7583050 rs193922669 G A DSP Likely Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy
8 11614483 rs115372595 C T GATA4 Pathogenic Atrioventricular_septal_defect_4
18 3149140 rs77613865 T G MYOM1 Likely Pathogenic Hypertrophic_cardiomyopathy
19 15289863 rs201680145 G A NOTCH3 Pathogenic Cerebral_autosomal_dominant_arteriopathy
Table 3.5: ClinVar‘s pathogenic and likely pathogenic variants filtered
form ExAC SAS dataset.
CHR POS ID REF ALT Gene Clinical
Significance
Disease
1 3329208 rs397514743 A G PRDM16 Pathogenic Left_ventricular_noncompaction_8
1 3347452 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy_1LL
1 11907430 rs61757261 T G NPPA Pathogenic Atrial_fibrillation_familial_6
1 116275561 rs146664754 G C CASQ2 Likely
Pathogenic
Ventricular_tachycardia
1 156108298 rs60890628 C T LMNA Pathogenic Dilated_cardiomyopathy_1A
1 169524537 rs118203906 C G F5 Pathogenic Thrombophilia_due_to_activated_
protein_C_resistance
1 201328778 rs730881125 C T TNNT2 Likely
Pathogenic
Cardiomyopathy
1 201333455 rs483352832 G A TNNT2 Pathogenic Dilated_cardiomyopathy_1DD
1 227073271 rs63750197 C T PSEN2 Pathogenic Dilated_cardiomyopathy_1V
1 236925912 rs199920384 A G ACTN2 Likely
Pathogenic
Cardiomyopathy
2 179393524 rs565675340 G A TTN Likely
Pathogenic
Myopathy_with_fatal_cardiomyopathy
2 179430143 rs727505284 G A TTN Likely
Pathogenic
Dilated_cardiomyopathy_1G
2 179655434 rs397517497 C T TTN Likely
Pathogenic
Dilated_cardiomyopathy_1G
3 9979308 rs28941780 G A CRELD1 Pathogenic Atrioventricular_septal_defect_
partial_with_heterotaxy_syndrome
3 14183113 rs778127887 C T TMEM43 Likely
Pathogenic
Cardiomyopathy
75
3 20225453 rs199815268 T C SGOL1 Pathogenic Chronic_atrial_and_intestinal_
dysrhythmia
3 32200588 rs72552291 C T GPD1L Pathogenic Cardiomyopathy
3 33114105 rs72555392 C T GLB1 Pathogenic GM1-Gangliosidosis_Type_I_
with_Cardiac_involvement
3 38592356 rs45563942 A G SCN5A Pathogenic Dilated_cardiomyopathy_1E
3 38592408 rs137854619 C T SCN5A Pathogenic Long_QT_syndrome_2/3
3 38592534 rs199473314 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38595794 rs199473279 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38603958 rs199473603 G A SCN5A Pathogenic Congenital_long_QT_syndrome
3 38607905 rs199473341 C T SCN5A Pathogenic Dilated_cardiomyopathy
3 38622444 rs199473187 G A SCN5A Pathogenic Congenital_long_QT_syndrome
3 38622640 rs199473183 A G SCN5A Pathogenic Congenital_long_QT_syndrome
3 38629013 rs199473157 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38639417 rs199473580 G A SCN5A Pathogenic Congenital_long_QT_syndrome
3 38645249 rs12720452 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38645420 rs1805124 T C SCN5A Pathogenic Progressive_familial_heart_block_
type_1A
3 38647498 rs199473111 C T SCN5A Pathogenic Atrial_fibrillation|Atrial_fibrillation
3 38647543 rs199473110 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38671821 rs199473059 C G SCN5A Pathogenic Congenital_long_QT_syndrome
3 38674671 rs199473047 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 46899901 rs145520567 C T MYL3 Likely
Pathogenic
Cardiomyopathy
3 46899903 rs193922391 T C MYL3 Likely
Pathogenic
Cardiomyopathy
4 114284598 rs45570339 C G ANK2 Pathogenic Congenital_long_QT_syndrome
4 114286207 rs66785829 T A ANK2 Pathogenic Arrhythmia
4 114288907 rs35530544 C A ANK2 Pathogenic Cardiac_arrhythmia_ankyrin_B-related
4 114294462 rs121912706 C T ANK2 Pathogenic Long_QT_syndrome_4|Arrhythmia
4 114294537 rs45454496 G A ANK2 Pathogenic Arrhythmia
5 251453 rs137852768 G A SDHA Pathogenic Mitochondrial_complex_II_deficiency|
Dilated_cardiomyopathy_1GG
5 172662014 rs28936670 G A NKX2-5 Pathogenic Tetralogy_of_Fallot|
Interrupted_aortic_arch|
Truncus_arteriosus|
Hypoplastic_left_heart_syndrome_2|
Malformation_of_the_heart_and_
great_vessels
6 6152107 rs267606789 G A F13A1 Pathogenic Factor_xiii_a_subunit_deficiency_of
6 7542236 rs121912998 G A DSP Pathogenic Arrhythmogenic_right_ventricular_
cardiomyopathy_type_8
76
6 7583050 rs193922669 G A DSP Likely
Pathogenic
Arrhythmogenic_right_ventricular_
cardiomyopathy
6 121769078 rs2227885 G A GJA1 Pathogenic Hypoplastic_left_heart_syndrome|
Atrioventricular_septal_defect_and_
common_atrioventricular_junction
6 121769120 rs104893965 G A GJA1 Pathogenic Hypoplastic_left_heart_syndrome|
Atrioventricular_septal_defect_and_
common_atrioventricular_junction
6 129601217 rs117422805 C T LAMA2 Likely
Pathogenic
Congenital_muscular_dystrophy
6 129674430 rs121913575 C T LAMA2 Pathogenic Congenital_muscular_dystrophy_due_
to_partial_LAMA2_deficiency
6 149699739 rs267607100 C A TAB2 Pathogenic Congenital_heart_disease_multiple_
types_2
7 150644429 rs377095107 G A KCNH2 Likely
Pathogenic
Cardiac_arrhythmia
7 150644799 rs141401803 G A KCNH2 Pathogenic Sudden_infant_death_syndrome|
Cardiac_arrhythmia
7 150646083 rs121912510 G A KCNH2 Pathogenic Long_QT_syndrome_2|
Congenital_long_QT_syndrome|
Cardiac_arrhythmia
7 150647283 rs138498207 G A KCNH2 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia
7 150649763 rs199472901 G A KCNH2 Pathogenic Congenital_long_QT_syndrome
7 150649787 rs199472899 G A KCNH2 Pathogenic Congenital_long_QT_syndrome
7 150655288 rs199472876 C T KCNH2 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia
7 150655407 rs587777907 T A KCNH2 Pathogenic Long_QT_syndrome_2
8 11566308 rs387906769 C T GATA4 Pathogenic Atrioventricular_septal_defect_4|
Ventricular_septal_defect_1|
Tetralogy_of_Fallot
8 11614483 rs115372595 C T GATA4 Pathogenic Atrioventricular_septal_defect_4
8 11614521 rs368489876 G A GATA4 Pathogenic Ventricular_septal_defect_1
8 11615928 rs56208331 G A GATA4 Pathogenic Atrial_septal_defect_2|
Tetralogy_of_Fallot
8 19811733 rs118204057 G A LPL Pathogenic Hyperlipoproteinemia_type_I
8 19813384 rs118204077 C T LPL Pathogenic Hyperlipoproteinemia_type_I
8 19813529 rs268 A G LPL Pathogenic Hyperlipidemia_familial_combined
10 69881254 rs140148105 A G MYPN Pathogenic Familial_hypertrophic_cardiomyopathy_22|
Dilated_cardiomyopathy_1KK
10 69961675 rs71534280 G A MYPN Pathogenic Dilated_cardiomyopathy_1KK
10 75871844 rs121917776 C T VCL Pathogenic Dilated_cardiomyopathy_1W|
77
Familial_hypertrophic_cardiomyopathy_15
10 88441437 rs45487699 C T LDB3 Pathogenic Dilated_cardiomyopathy_1C|Familial_
hypertrophic_cardiomyopathy_24
10 88477867 rs145983824 C T LDB3 Pathogenic Familial_hypertrophic_cardiomyopathy_24
10 92678707 rs145387010 G A ANKRD1 Likely
Pathogenic
Primary_familial_hypertrophic_
cardiomyopathy
11 2549180 rs199473450 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia
11 2592624 rs199473456 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia
11 2594172 rs199472737 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia
11 2608824 rs199473473 G A KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2609956 rs199472778 A C KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2790090 rs199472785 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2790114 rs199472787 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2799220 rs17221854 C T KCNQ1 Pathogenic Long_QT_syndrome_1|
Congenital_long_QT_syndrome|
Cardiac_arrhythmia
11 19209758 rs137852764 T C CSRP3 Pathogenic Dilated_cardiomyopathy_1M|
Cardiomyopathy
11 47353637 rs730880142 C T MYBPC3 Likely
Pathogenic
Primary_familial_hypertrophic_
cardiomyopathy
11 47354163 rs730880594 G A MYBPC3 Likely
Pathogenic
Cardiomyopathy
11 47354209 rs199669878 C T MYBPC3 Likely
Pathogenic
Cardiomyopathy
11 47355559 rs730880138 G A MYBPC3 Likely
Pathogenic
Primary_familial_hypertrophic_
cardiomyopathy
11 47356671 rs387907267 G A MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4
11 47357493 rs727504378 C T MYBPC3 Likely
Pathogenic
Cardiomyopathy
11 47359273 rs730880565 G A MYBPC3 Likely
Pathogenic
Cardiomyopathy
11 47361267 rs730880561 G A MYBPC3 Pathogenic Cardiomyopathy
11 47362764 rs730880552 G A MYBPC3 Pathogenic Cardiomyopathy
11 47364285 rs200625851 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4|
Left_ventricular_noncompaction_10
11 47364602 rs193922377 C T MYBPC3 Pathogenic Cardiomyopathy
11 47364832 rs587776699 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4
11 47365041 rs730880641 A C MYBPC3 Pathogenic Cardiomyopathy
11 47371333 rs201098973 C T MYBPC3 Pathogenic Cardiomyopathy
78
11 47371339 rs769167548 C T MYBPC3 Likely
Pathogenic
Familial_hypertrophic_cardiomyopathy_4
11 47371575 rs730880619 C G MYBPC3 Likely
Pathogenic
Cardiomyopathy
11 118039455 rs17121819 G A SCN2B Pathogenic Atrial_fibrillation_familial_14
12 5154893 rs121908591 C T KCNA5 Pathogenic Atrial_fibrillation_familial_7
12 8800737 rs727502791 G A MFAP5 Pathogenic Aortic_aneurysm_familial_thoracic_9
12 33003841 rs372827156 G A PKP2 Likely
Pathogenic
Arrhythmogenic_right_ventricular_
cardiomyopathy
12 98928103 rs17028450 C T TMPO Pathogenic Dilated_cardiomyopathy_1T
12 111356964 rs104894363 C T MYL2 Pathogenic Familial_hypertrophic_cardiomyopathy_10|
Cardiomyopathy
14 23862177 rs267606904 C G MYH6 Pathogenic Familial_hypertrophic_cardiomyopathy_14|
Primary_familial_hypertrophic_
cardiomyopathy
14 23862646 rs143978652 C A MYH6 Pathogenic Dilated_cardiomyopathy_1EE|Familial_
hypertrophic_cardiomyopathy_14|
Sudden_cardiac_death
14 23866396 rs515726230 T C MYH6 Likely
Pathogenic
Malformation_of_the_heart
14 23884268 rs730880820 C T MYH7 Likely
Pathogenic
Cardiomyopathy
14 23884341 rs369940645 C T MYH7 Likely
Pathogenic
Familial_hypertrophic_cardiomyopathy_1
14 23886482 rs397516214 G C MYH7 Likely
Pathogenic
Cardiomyopathy
14 23886827 rs730880796 G A MYH7 Likely
Pathogenic
Cardiomyopathy
14 23887447 rs730880909 G C MYH7 Likely
Pathogenic
Cardiomyopathy
14 23890202 rs367546859 C T MYH7 Pathogenic Primary_familial_hypertrophic_
cardiomyopathy|Cardiomyopathy
14 23892910 rs145532615 A G MYH7 Likely
Pathogenic
Familial_hypertrophic_cardiomyopathy_1
14 23893148 rs45496496 C G MYH7 Likely
Pathogenic
Cardiomyopathy|Primary_dilated_
cardiomyopathy
14 23894525 rs3218716 C T MYH7 Pathogenic Cardiomyopathy
14 23894554 rs376754645 C T MYH7 Likely
Pathogenic
Familial_hypertrophic_cardiomyopathy_1
14 23895007 rs121913644 G A MYH7 Pathogenic Familial_hypertrophic_cardiomyopathy_1
14 23896982 rs377491278 C T MYH7 Likely
Pathogenic
Dilated_cardiomyopathy_1S
79
14 23902865 rs186964570 G A MYH7 Likely
Pathogenic
Familial_hypertrophic_cardiomyopathy_1
16 15814100 rs34321232 T G NDE1 Likely
Pathogenic
Familial_aortopathy
18 3149140 rs77613865 T G MYOM1 Likely
Pathogenic
Hypertrophic_cardiomyopathy
18 19395685 rs201850378 C T MIB1 Pathogenic Left_ventricular_noncompaction_7
18 19438554 rs200035428 G T MIB1 Pathogenic Left_ventricular_noncompaction_7
18 28666646 rs193922708 G A DSC2 Likely
Pathogenic
Cardiomyopathy
18 28672114 rs144799937 C T DSC2 Likely
Pathogenic
Cardiomyopathy
18 29099850 rs121913013 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_
cardiomyopathy_type_10
18 29104828 rs121913012 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_
cardiomyopathy_type_10
18 29121188 rs201564919 G A DSG2 Pathogenic Cardiomyopathy
18 29178565 rs121918095 G A TTR Pathogenic Amyloidogenic_transthyretin_amyloidosis
18 29178618 rs76992529 G A TTR Pathogenic Amyloid_Cardiomyopathy|Cardiomyopathy
19 45452024 rs120074114 A C APOC2 Pathogenic Apolipoprotein_c-ii_variant
19 49671558 rs387907216 C T TRPM4 Pathogenic Progressive_familial_heart_block_type_1B
19 49685865 rs201907325 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B
19 49691898 rs172149856 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B
19 55665519 rs397516348 G T TNNI3 Likely
Pathogenic
Cardiomyopathy
19 55668953 rs397516359 G A TNNI3 Likely
Pathogenic
Dilated_cardiomyopathy_1FF
20 30408136 rs121908107 C T MYLK2 Pathogenic Cardiomyopathy\x2c_hypertrophic_
midventricular_digenic
20 30408160 rs121908108 C A MYLK2 Pathogenic Cardiomyopathy\x2c_hypertrophic_
midventricular_digenic
20 42788855 rs554853074 G C JPH2 Likely
Pathogenic
Cardiomyopathy
21 35742806 rs199473648 C T KCNE2 Pathogenic Congenital_long_QT_syndrome|
Cardiac_arrhythmia|Long_QT_syndrome
21 35742947 rs74315448 T C KCNE2 Pathogenic Long_QT_syndrome_6
21 35821559 rs142511345 G A KCNE1 Pathogenic Congenital_long_QT_syndrome
21 35821680 rs1805128 C T KCNE1 Pathogenic Long_QT_syndrome_2/5
21 35821686 rs199473360 C T KCNE1 Pathogenic Congenital_long_QT_syndrome
21 35821770 rs199473644 C T KCNE1 Pathogenic Congenital_long_QT_syndrome
21 35821826 rs199473351 C T KCNE1 Pathogenic Congenital_long_QT_syndrome
21 35821838 rs17857111 C T KCNE1 Likely Pathogenic Congenital_long_QT_syndrome
21 35821850 rs199473350 G A KCNE1 Pathogenic Congenital_long_QT_syndrome
21 35821904 rs144917638 G A KCNE1 Pathogenic Congenital_long_QT_syndrome
22 50962330 rs28937598 G A SCO2 Pathogenic Cardioencephalomyopathy
22 50962573 rs74315512 G A NCAPH2 Pathogenic Cardioencephalomyopathy
X 32360240 rs128626249 G A DMD Pathogenic Dilated_cardiomyopathy_3B
X 153609167 rs370840449 G A EMD Likely Pathogenic Cardiomyopathy
Table 3.6: ClinVar's pathogenic and likely pathogenic variants filtered from the British Pakistanis dataset.
CHR POS ID REF ALT Gene Clinical Significance Disease
1 3430888 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy_1LL
2 178738114 rs147879266 C T TTN Pathogenic Dilated_cardiomyopathy_1G
3 38550895 rs137854610 C T SCN5A Pathogenic Long_QT_syndrome_3|Congenital_long_QT_syndrome|Sudden_infant_death_syndrome
3 38550917 rs137854619 C T SCN5A Pathogenic Long_QT_syndrome_2/3,_digenic|Congenital_long_QT_syndrome
3 38562467 rs199473603 G A SCN5A Pathogenic Congenital_long_QT_syndrome|Long_QT_syndrome|not_provided
3 38575385 rs41261344 C T SCN5A Pathogenic Brugada_syndrome_1|Long_QT_syndrome_3
3 38581149 rs199473183 A G SCN5A Pathogenic Congenital_long_QT_syndrome
3 38603929 rs1805124 T C SCN5A Pathogenic Progressive_familial_heart_block_type_1A
3 38606052 rs199473110 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38630330 rs199473059 C G SCN5A Pathogenic Congenital_long_QT_syndrome
4 113365051 rs66785829 T A ANK2 Pathogenic Arrhythmia|Long_QT_syndrome|Cardiac_arrhythmia
4 113373306 rs121912706 C T ANK2 Pathogenic Long_QT_syndrome_4|Arrhythmia|Cardiac_arrhythmia
5 173235011 rs28936670 G A NKX2-5 Pathogenic Tetralogy_of_Fallot|Interrupted_aortic_arch|Truncus_arteriosus|Hypoplastic_left_heart_syndrome_2|Malformation_of_the_heart_and_great_vessels
6 7542003 rs121912998 G A DSP Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy,_type_8|Arrhythmogenic_right_ventricular_cardiomyopathy|not_specified
6 7582817 rs193922669 G A DSP Likely Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy
7 150945498 rs199473032 G A KCNH2 Pathogenic Congenital_long_QT_syndrome
7 150958319 rs587777907 T A KCNH2 Pathogenic Long_QT_syndrome_2
8 11756974 rs115372595 C T GATA4 Pathogenic Atrioventricular_septal_defect_4
8 11757012 rs368489876 G A GATA4 Pathogenic Ventricular_septal_defect_1
8 11758419 rs56208331 G A GATA4 Pathogenic Atrial_septal_defect_2|Tetralogy_of_Fallot
8 19956018 rs268 A G LPL Pathogenic Hyperlipidemia,_familial_combined
10 86681680 rs45487699 C T LDB3 Pathogenic Dilated_cardiomyopathy_1C|Familial_hypertrophic_cardiomyopathy_24
10 90918950 rs145387010 G A ANKRD1 Likely Pathogenic Primary_familial_hypertrophic_cardiomyopathy|Cardiomyopathy
10 110810449 rs794729148 C T RBM20 Likely Pathogenic Cardiomyopathy
10 119672399 rs397514506 C T BAG3 Pathogenic Dilated_cardiomyopathy_1HH|Arrhythmogenic_right_ventricular_cardiomyopathy
11 2570664 rs199472694 G A KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2572942 rs199472737 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia
11 2587594 rs199473473 G A KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2777990 rs17221854 C T KCNQ1 Pathogenic Long_QT_syndrome_1|Acquired_susceptibility_to_long_QT_syndrome_1|Long_QT_syndrome_LQT1_subtype
11 47332579 rs730880596 C T MYBPC3 Likely Pathogenic Cardiomyopathy|Hypertrophic_cardiomyopathy
11 47333610 rs371061770 G A MYBPC3 Likely Pathogenic Hypertrophic_cardiomyopathy
11 47337722 rs730880565 G A MYBPC3 Likely Pathogenic Cardiomyopathy
11 47341213 rs730880552 G A MYBPC3 Pathogenic Cardiomyopathy
11 47343021 rs786204338 C A MYBPC3 Likely Pathogenic Familial_hypertrophic_cardiomyopathy_4
11 47343281 rs587776699 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4|Cardiomyopathy
11 47350024 rs730880619 C G MYBPC3 Likely Pathogenic Cardiomyopathy
11 70206161 rs387906839 T G FADD Pathogenic infections,_recurrent_encephalopathy_hepatic_dysfunction_and_cardiovascular_malformations
12 32850907 rs372827156 G A PKP2 Likely Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy
14 23427773 rs377491278 C T MYH7 Likely Pathogenic Dilated_cardiomyopathy_1S
15 48427699 rs140537304 C T FBN1 Likely Pathogenic Marfan_syndrome|Thoracic_aortic_aneurysms_and_aortic_dissections
15 99690352 rs121918530 A G MEF2A Pathogenic Coronary_artery_disease/myocardial_infarction
18 3149142 rs77613865 T G MYOM1 Likely Pathogenic Hypertrophic_cardiomyopathy
18 22171695 rs387906816 G A GATA6 Pathogenic Atrial_septal_defect_9|Tetralogy_of_Fallot
18 31519887 rs121913013 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy_type_10
18 31524754 rs752432726 A G DSG2 Pathogenic Cardiomyopathy
19 11105436 rs121908026 C T LDLR Pathogenic Familial_hypercholesterolemia
19 11120436 rs28942084 C T LDLR Pathogenic Familial_hypercholesterolemia
19 48965830 rs2230267 T C FTL Likely Pathogenic sporadic_abdominal_aortic_aneurysm
19 49182608 rs201907325 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B
19 49188641 rs172149856 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B
19 55157585 rs397516359 G A TNNI3 Likely Pathogenic Dilated_cardiomyopathy_1FF
20 31820333 rs121908107 C T MYLK2 Pathogenic Cardiomyopathy,_hypertrophic_midventricular_digenic
20 44160215 rs554853074 G C JPH2 Likely Pathogenic Cardiomyopathy
21 34370507 rs199473648 C T KCNE2 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia|Long_QT_syndrome
21 34370648 rs74315448 T C KCNE2 Pathogenic Long_QT_syndrome_6
21 34449472 rs199473644 C T KCNE1 Pathogenic Congenital_long_QT_syndrome
21 34449540 rs17857111 C T KCNE1 Likely Pathogenic Congenital_long_QT_syndrome
21 34449606 rs144917638 G A KCNE1 Pathogenic Congenital_long_QT_syndrome
3.1.4 Comparative Analysis of Allele Frequencies of Predicted Deleterious Variants
Assessing the distribution of allele frequencies of deleterious nonsynonymous SNVs across populations is key to understanding their genetic makeup and to estimating the underlying burden of various human diseases (1000 Genomes Project, 2012). The frequency of deleterious variants in different populations is an important indicator of disease prevalence (Dopazo et al., 2016). The derived allele frequency spectra of deleterious SNVs filtered from the three datasets under study revealed that the majority of the variants were rare (AF < 0.5%) (Figure 3.8). A large proportion of the deleterious sites were also singletons: 61.67% in 1000 Genomes PJL, 40.14% in ExAC SAS, and 50.86% in British Pakistanis. It has been hypothesized that such rare variants may have a larger effect on the pathophysiology of CVDs (Wain, 2014).
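The frequency classes and singleton proportions above can be derived from per-site allele counts. The sketch below is a minimal illustration, assuming each deleterious SNV is available as a pair of derived allele count (AC) and total allele number (AN), as in a VCF INFO field; the function names and toy counts are hypothetical, while the thresholds are those stated in the text (rare < 0.5%, low 0.5-5.0%, common > 5.0%).

```python
from collections import Counter

def frequency_class(ac, an):
    """Assign a derived allele count / allele number pair to a frequency class."""
    af = ac / an
    if af < 0.005:      # rare: AF < 0.5%
        return "rare"
    if af <= 0.05:      # low: 0.5% <= AF <= 5.0%
        return "low"
    return "common"     # common: AF > 5.0%

def summarize_afs(sites):
    """sites: list of (AC, AN) tuples for deleterious SNVs with AC >= 1 (assumed input layout).

    Returns counts per frequency class and the proportion of singletons (AC == 1).
    """
    classes = Counter(frequency_class(ac, an) for ac, an in sites)
    singletons = sum(1 for ac, _ in sites if ac == 1)
    return dict(classes), singletons / len(sites)

# Hypothetical counts for 500 diploid individuals (AN = 1000).
toy_sites = [(1, 1000), (1, 1000), (3, 1000), (10, 1000), (100, 1000)]
classes, singleton_fraction = summarize_afs(toy_sites)
print(classes, singleton_fraction)
```

Note that with small samples the class boundaries interact with sample size: in a sample of 96 individuals (192 alleles), a singleton already has AF of about 0.52%.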
Figure 3.8: Allele frequency spectrum (AFS) of deleterious SNVs in three datasets: (A) 1000 Genomes PJL, (B) ExAC South Asians, and (C) British Pakistanis. The sharp spikes in the AFS of all three datasets represent the large number of singletons.
In addition to rare variants, a small number of low-frequency (AF 0.5-5.0%) and common (AF > 5.0%) deleterious variants were also observed. Such variants have modest-to-weak effects on fitness and can rise to high allele frequencies, along with neutral variants, during rapid population expansion (Peischl, and Excoffier, 2015). The British Pakistanis showed comparatively more high-frequency deleterious variants in their allele frequency distribution. To explore this, an allele frequency spectrum of the higher-frequency deleterious variants (AF > 10%) was prepared in bins of 10. This showed that the British Pakistanis carried a larger number of high-frequency deleterious variants in the CVD gene set than the 1000 Genomes PJL and ExAC SAS datasets (Figure 3.9). This was attributed to a founder effect: this population has been extensively inbred over many past generations and carries longer runs of identity by descent than contemporary outbred populations (Narasimhan et al., 2016), which increases the number of deleterious variants, along with neutral variants, in populations during rapid expansion (Henn, Botigué, Bustamante, Clark, and Gravel, 2015).
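The binning of higher-frequency deleterious variants can be sketched as follows. This is an illustration only: the text does not define the bin edges, so the sketch assumes bins 10 percentage points wide starting at a DAF of 10% (the Figure 3.9 caption uses DAF ≥ 10%), and the function name and DAF values are hypothetical.

```python
from bisect import bisect_right

# Assumed bin edges 10%, 20%, ..., 100%, built from integers to avoid floating-point drift.
EDGES = [i / 10 for i in range(1, 11)]

def high_daf_histogram(dafs):
    """Count derived allele frequencies >= 10% in 10-percentage-point bins."""
    counts = [0] * (len(EDGES) - 1)
    for daf in dafs:
        if daf < EDGES[0]:
            continue  # below the 10% threshold
        # bisect_right finds the bin whose lower edge is <= daf; DAF = 1.0 joins the last bin.
        index = min(bisect_right(EDGES, daf) - 1, len(counts) - 1)
        counts[index] += 1
    return {f"{round(lo * 100)}-{round(hi * 100)}%": c
            for lo, hi, c in zip(EDGES, EDGES[1:], counts)}

# Hypothetical DAF values for one dataset.
hist = high_daf_histogram([0.05, 0.12, 0.18, 0.30, 0.55, 0.97, 1.0])
print(hist)
```

Running the same function on each dataset and plotting the counts side by side gives a comparison in the style of Figure 3.9.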
Figure 3.9: Allele frequency spectrum of common deleterious SNVs (DAF ≥ 10%) in the three datasets.
Previous studies have found that the allele frequencies of deleterious genetic variants associated with certain human diseases may vary among populations according to their historical modes of expansion, the role of evolutionary forces, and bottlenecks. Highly deleterious variants are purged from populations by purifying selection and therefore remain rare (Henn, Botigué, Bustamante, Clark, and Gravel, 2015; Lettre, 2014; Tennessen et al., 2012). The derived allele frequencies of predicted deleterious SNVs in CVD genes were compared with those of other major population groups within their respective datasets. This comparison revealed two important findings: (a) the extent of private and shared deleterious SNVs between the Pakistanis and other populations, and (b) the number of deleterious SNVs with higher derived allele frequency in the Pakistani population (or in South Asians, in the case of ExAC data) than in other populations. The extent of sharing of deleterious SNVs differed across population groups (Table 3.7).
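The columns of Table 3.7 follow from simple set and count operations on per-SNV derived allele frequencies. A minimal sketch, assuming each population's deleterious SNVs are held in a dictionary mapping SNV ID to DAF (only SNVs with DAF > 0); the function name and toy values are hypothetical. The text does not say how equal DAFs were handled, so ties are counted in neither the "higher" nor the "lower" column here, which is an assumption.

```python
def sharing_summary(focal, others):
    """Summarize sharing of deleterious SNVs between a focal population and others.

    focal:  dict SNV ID -> DAF in the focal population (e.g. PJL)
    others: dict population label -> (dict SNV ID -> DAF)
    Returns (number private, number shared with any population, per-population rows).
    """
    shared_any = set()
    for daf_map in others.values():
        shared_any |= focal.keys() & daf_map.keys()
    private = len(focal) - len(shared_any)
    rows = {}
    for pop, daf_map in others.items():
        shared = focal.keys() & daf_map.keys()
        # Assumption: ties (equal DAF) are counted in neither column.
        higher = sum(1 for snv in shared if focal[snv] > daf_map[snv])
        lower = sum(1 for snv in shared if focal[snv] < daf_map[snv])
        rows[pop] = {
            "shared": len(shared),
            "prop_of_all_shared": len(shared) / len(shared_any),
            "higher": higher,
            "lower": lower,
            "prop_higher": higher / len(shared) if shared else 0.0,
        }
    return private, len(shared_any), rows

focal = {"rs1": 0.02, "rs2": 0.10, "rs3": 0.30, "rs4": 0.01}       # hypothetical DAFs
others = {"SAS": {"rs2": 0.05, "rs3": 0.40}, "EUR": {"rs2": 0.01}}  # hypothetical DAFs
private, n_shared, rows = sharing_summary(focal, others)
print(private, n_shared, rows["SAS"], rows["EUR"])
```

For example, 282 of the 376 SNVs shared with SAS had a higher DAF in PJL, giving the proportion 282/376 = 0.750 reported in Table 3.7.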
Table 3.7: The proportion of shared deleterious SNVs (sdSNVs) with other populations of 1000 Genomes Project and ExAC.
1000 Genomes PJL: total dSNVs = 561; private dSNVs = 185 (33%)
Shared with | Shared dSNVs | Proportion (shared with pop / total shared dSNVs) | SNVs with higher DAF in PJL | SNVs with lower DAF in PJL | Proportion of SNVs with higher DAF
SAS | 376 | 1.000 | 282 | 94 | 0.750
EUR | 199 | 0.529 | 108 | 91 | 0.543
AMR | 171 | 0.455 | 99 | 72 | 0.579
AFR | 157 | 0.418 | 119 | 38 | 0.758
EAS | 127 | 0.338 | 84 | 43 | 0.661
ExAC SAS: total dSNVs = 7374; private dSNVs = 4170 (56%)
Shared with | Shared dSNVs | Proportion (shared with pop / total shared dSNVs) | SNVs with higher DAF in SAS | SNVs with lower DAF in SAS | Proportion of SNVs with higher DAF
NFE | 2480 | 0.774 | 1883 | 597 | 0.759
AMR | 1211 | 0.378 | 473 | 738 | 0.391
AFR | 1202 | 0.375 | 445 | 757 | 0.370
EAS | 893 | 0.279 | 268 | 625 | 0.300
FIN | 478 | 0.149 | 123 | 355 | 0.257
In the 1000 Genomes Project data, 33.16% of the predicted deleterious SNVs were private to the Pakistani population, with derived allele frequencies ranging from 0.0052 to 0.0260, while 66.84% were shared, with derived allele frequencies ranging from 0.0052 to 0.7968. Thus, the private deleterious SNVs were all rare (DAF < 0.05), whereas the shared ones included both rare (47.50%) and common (52.50%) variants. Among the SNVs shared with other populations, those with a higher allele frequency in the Pakistani population outnumbered those with a lower frequency in every comparison within the 1000 Genomes populations. The allele frequencies of most deleterious variants differed comparatively little between 1000 Genomes PJL and the other South Asian populations, although in some cases the difference was up to 5.2-fold. Likewise, the maximum DAF difference for variants shared with Americans was 22.32 times higher in PJL, and for Europeans 41.67 times higher in PJL, whereas the largest differences were observed with Africans and East Asians, where the maximum derived allele frequency was 72.19 times higher in PJL (Figure 3.10).
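The maximum fold differences quoted above correspond to the largest ratio of focal to comparison DAF among shared SNVs. A minimal sketch, assuming DAFs are stored as dictionaries mapping SNV ID to a nonzero DAF in each population; the function name, SNV labels and values are hypothetical.

```python
def max_fold_difference(focal, other):
    """Return (SNV ID, fold) for the shared SNV with the largest focal/other DAF ratio.

    focal, other: dicts mapping SNV ID -> derived allele frequency (> 0).
    Only SNVs with a higher DAF in the focal population can be returned.
    """
    best_id, best_fold = None, 1.0
    for snv in focal.keys() & other.keys():
        fold = focal[snv] / other[snv]
        if fold > best_fold:
            best_id, best_fold = snv, fold
    return best_id, best_fold

pjl = {"rs_a": 0.0781, "rs_b": 0.0260, "rs_c": 0.0104}  # hypothetical PJL DAFs
afr = {"rs_a": 0.0011, "rs_b": 0.0300, "rs_c": 0.0052}  # hypothetical comparison DAFs
snv, fold = max_fold_difference(pjl, afr)
print(snv, round(fold, 2))
```

Because the ratio is unbounded as the comparison DAF approaches zero, very large folds (such as those reported for ExAC below) typically arise from SNVs that are extremely rare in the comparison population.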
In the ExAC SAS dataset, the proportion of shared deleterious SNVs with a higher DAF in SAS exceeded 0.5 only in the comparison with NFE (Non-Finnish Europeans), and was below 0.5 in the comparisons with AFR (Africans), AMR (Americans), FIN (Finnish), and EAS (East Asians) (Table 3.7). Nevertheless, many SNVs had a higher derived allele frequency in SAS than in other populations of the ExAC dataset. The largest DAF difference in favour of SAS was observed against NFE, i.e., 1,098 times higher in SAS. For the other populations, the maximum difference was 858 times higher than in EAS, 290 times higher than in AMR, 347 times higher than in AFR, and 64 times higher than in FIN.
Figure 3.10: Distribution of allele frequencies of shared deleterious SNVs in PJL versus all continental groups of the 1000 Genomes Project. A. SNVs on the diagonal line have equal DAF in the compared populations, whereas those to the right have a higher DAF in PJL and those to the left have a higher DAF in the comparison population. B. Violin plots showing the median DAF in the compared populations.
3.1.5 Functional Annotation of Deleterious Variants
The predicted deleterious variants of the three datasets were stratified according to their functional consequences on transcripts, to highlight loss-of-function (LoF) variants, using the online Variant Effect Predictor tool (McLaren et al., 2016). LoF variants include 'stop-gained', 'stop-lost', 'start-lost', 'frameshift', and 'splice donor or acceptor' variants, which have the most damaging effects on protein structure and/or function (MacArthur et al., 2012). In this analysis, three LoF SNVs were found in 1000 Genomes PJL individuals: rs2228570 (start lost), rs371316552 (stop gained), and rs117054298 (splice acceptor variant). The derived allele frequency of the homozygous 'rs2228570' was high in all 1000 Genomes continental populations, ranging from 51.73% in Americans to 81.09% in Africans, and was 79.68% in PJL individuals. This variant lies within the vitamin D receptor gene (VDR), where it affects 7 of the 10 transcripts, and it is associated with many disease conditions, including hypertension (Santoro et al., 2015; Swapna, Vamsi, Usha, and Padma, 2011). The heterozygous 'rs371316552' SNP lies in the cathepsin B (CTSB) gene, whose increased expression has been reported to pose a risk for atherosclerosis and myocardial infarction in rat models (Jormsjö et al., 2002). The third LoF SNP, the homozygous 'rs117054298', lies in the insulin-like growth factor (IGF) binding protein-1 (IGFBP1) gene, where it disrupts the splice site of one transcript (ENST00000457280); IGFBP1 has been implicated in atherosclerosis (Rajwani et al., 2012).
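The LoF stratification can be sketched as a lookup over the Sequence Ontology consequence terms that VEP reports per transcript. This is a minimal sketch, assuming annotations have already been parsed into (SNV ID, transcript ID, consequence) records, with multiple terms joined by '&' as in VEP's CSQ output; the record layout, the function name and the transcript labels T1-T3 are hypothetical placeholders.

```python
from collections import defaultdict

# Sequence Ontology terms treated as loss of function in this section.
LOF_TERMS = {
    "stop_gained": "stop-gained",
    "stop_lost": "stop-lost",
    "start_lost": "start-lost",
    "frameshift_variant": "frameshift",
    "splice_donor_variant": "splice-donor",
    "splice_acceptor_variant": "splice-acceptor",
}

def lof_summary(records):
    """records: iterable of (snv_id, transcript_id, consequence) tuples (assumed layout).

    Returns {snv_id: (lof_class, number_of_affected_transcripts)} for LoF SNVs.
    """
    hits = defaultdict(lambda: defaultdict(set))
    for snv_id, transcript, consequence in records:
        for term in consequence.split("&"):
            if term in LOF_TERMS:
                hits[snv_id][LOF_TERMS[term]].add(transcript)
    summary = {}
    for snv_id, by_class in hits.items():
        # Report the LoF class that affects the most transcripts of each SNV.
        lof_class, transcripts = max(by_class.items(), key=lambda item: len(item[1]))
        summary[snv_id] = (lof_class, len(transcripts))
    return summary

example = [
    ("rs2228570", "T1", "start_lost"),
    ("rs2228570", "T2", "start_lost&missense_variant"),
    ("rs2228570", "T3", "missense_variant"),
    ("rs117054298", "ENST00000457280", "splice_acceptor_variant"),
]
summary = lof_summary(example)
print(summary)
```

Counting transcripts per SNV in this way mirrors the "Affected Transcripts" column of Tables 3.8 and 3.9.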
From the ExAC SAS dataset, 30 deleterious SNVs, including 2 in a homozygous state, were found to have an LoF effect on transcripts (Table 3.8). These included 6 stop-gained, 2 stop-lost, 14 start-lost, 5 splice-donor, and 3 splice-acceptor SNVs. All of these SNVs were rare except the splice-acceptor SNV rs117054298, which had an allele frequency of 1.5% in SAS. This variant lies in the Insulin Like Growth Factor Binding Protein 1 gene (IGFBP1). High and low levels of Insulin Like Growth Factor Binding Protein 1 have been reported to be associated with hypertrophic cardiomyopathy and congestive heart failure, respectively (Saeki, Hamada, and Hiwada, 2002).
Table 3.8: Deleterious LoF SNVs filtered from ExAC SAS dataset in genes of Mendelian and congenital CVDs. The two underlined SNVs (in chromosome 2 & 20) were in homozygous state.
CHR POS ID REF ALT AC Gene Affected Transcripts Effect
1 11907740 rs770346667 A G AC=1 CLCN6 1 start-lost
1 147231345 rs782228278 A G AC=2 GJA5 3 start-lost
1 156084858 rs60695352 G A AC=3 LMNA 1 splice-donor
1 159684275 rs749117623 T C AC=1 CRP 5 start-lost
1 227069610 rs762674312 T C AC=1 PSEN2 6 start-lost
2 63824637 rs750401274 C G AC=1 MDH1 1 stop-gained
2 73679956 rs28730854 C A AC=3 ALMS1 4 stop-gained
2 179447910 rs780643085 T A AC=1 TTN 1 splice-donor
3 32180202 rs763553263 G A AC=1 GPD1L 1 stop-gained
5 216962 rs755580860 C T AC=1 SDHA 2 start-lost
6 29795623 rs143732275 T C AC=25 HLA-G 5 start-lost
6 43752283 rs748984440 A C AC=1 VEGFA 2 stop-lost
6 76527266 rs773390519 T C AC=1 MYO6 4 start-lost
7 45932563 rs117054298 A T AC=247 IGFBP1 1 splice-acceptor
7 100771765 rs141347752 G A AC=71 SERPINE1 1 splice-donor
7 100771765 rs141347752 G T AC=1 SERPINE1 1 splice-donor
7 139719838 rs752538699 T C AC=2 PARP12 2 stop-lost
9 108337316 rs750082228 G C AC=1 FKTN 6 start-lost
11 111782447 rs577253222 A G AC=2 CRYAB 10 start-lost
11 111789698 rs782334737 T C AC=10 C11orf52 2 start-lost
14 64692155 rs371152824 G T AC=1 SYNE2 9 stop-gained
15 89864101 rs755510237 C A AC=1 POLG 1 splice-acceptor
16 3070400 rs763424248 T C AC=1 HCFC1R1 5 start-lost
17 7128033 rs754123613 G A AC=1 DVL2 2 splice-acceptor
17 19866257 rs745466247 G A AC=1 AKAP10 1 stop-gained
17 39881255 rs782157753 C T AC=1 HAP1 1 splice-donor
17 42148336 rs767856364 G A AC=1 G6PC3 5 start-lost
19 55652651 rs774939150 A G AC=1 TNNT1 2 start-lost
20 44637567 rs121434556 T A AC=121 MMP9 1 start-lost
22 30659995 rs140815202 C T AC=1 OSM 2 stop-gained
From the British Pakistanis dataset, 29 deleterious SNVs with an LoF effect on transcripts were identified. These included 9 stop-gained, 2 stop-lost, 18 start-lost, 1 splice-donor, and 2 splice-acceptor SNVs. All of the LoF SNVs were rare except the start-lost SNV 'rs2228570', whose allele frequency was 75.6% in British Pakistanis. This homozygous SNV of the vitamin D receptor gene (VDR) was also identified in the 1000 Genomes PJL dataset and is associated with many disease conditions, including hypertension. Furthermore, 10 novel LoF SNVs not previously reported in the dbSNP database were identified (Table 3.9).
Table 3.9: Novel deleterious SNVs filtered from British Pakistanis dataset in genes of CVDs.
CHR POS ID REF ALT AC Gene Affected Transcripts Effect
6 31543520 . T A AC=1 LTA 1 start-lost
6 32150436 . G A AC=3 AGPAT1 1 stop-gained
6 118887404 . A C AC=1 CEP85L 1 start-lost
6 160109220 . C T AC=1 SOD2 1 stop-gained
8 11606427 . G C AC=1 GATA4 1 splice-acceptor
14 95053702 . G A AC=2 SERPINA5 11 start-lost
17 4544946 . T C AC=2 ALOX15 3 start-lost
17 78188475 . C T AC=1 SGSH 2 stop-gained
18 77211708 . G A AC=1 NFATC1 1 stop-gained
19 39410426 . G C AC=1 SARS2 1 stop-gained
3.1.6 Differentiation of Deleterious Variants in Pakistani Population
Data from whole genome/exome sequencing projects can be used to determine the extent of differentiation among populations from differences in the allele frequencies of variants. Variants with highly differentiated frequencies among populations provide a direction for fine-mapping signals of local adaptation as well as susceptibility to diseases (1000 Genomes Project, 2010). In this study, the differentiation of genetic variants in CVD genes was evaluated in the Pakistani population using the phased data of the 1000 Genomes Project. Genetic differentiation was quantified with the Weir and Cockerham FST in two ways: (1) FST for PJL versus the rest of the South Asian (SAS) populations of the 1000 Genomes Project, using all SNVs of the genes harboring the prioritized deleterious SNVs for cardiovascular diseases, to determine differentiation from neighboring local populations; and (2) FST for PJL versus the other 25 global populations of the 1000 Genomes Project, using the same set of genes.
The FST calculated with all SNVs for PJL versus the rest of the SAS populations showed a large number of poorly differentiated SNVs (FST < 0.05) and many moderately differentiated SNVs (FST 0.05 - 0.15) (Figure 3.11). The mean FST of all SNVs was 0.00134, meaning that the genes harboring deleterious variants for CVDs are not well differentiated from neighboring South Asian populations. The mean FST for deleterious SNVs was 0.00638, which also indicates poor differentiation but is 4.76 times higher than the mean FST of all SNVs. Two deleterious SNVs (rs560826688 and rs563254260) were moderately differentiated (FST 0.05 - 0.15) from the rest of the South Asian populations. rs560826688 has a derived allele frequency of 3.1% and lies in the LDL Receptor Related Protein 5 gene (LRP5), which is reported to be involved in hypertension (Suwazono et al., 2006). rs563254260 has a derived allele frequency of 2.6% and lies in the Serpin Family F Member 1 gene (SERPINF1), which has been associated with obesity and hypertension (Chen et al., 2012). In addition, one greatly differentiated SNV (FST 0.15 - 0.25), rs539962979 (FST = 0.16597), was observed in DMPK (Dystrophia Myotonica Protein Kinase), which is reported to be involved in cardiomyopathy.
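The per-site Weir and Cockerham (1984) estimator behind these values can be written directly from genotype counts. The following is a minimal sketch for a biallelic SNV, assuming genotypes are supplied as (hom-ref, het, hom-alt) counts per population; the function names and genotype counts are hypothetical. The text does not state which software was used; tools such as VCFtools report both a mean of per-site estimates and a ratio-of-sums "weighted" estimate, and the sketch computes both.

```python
def wc_components(pops):
    """Weir & Cockerham (1984) variance components for one biallelic site.

    pops: list of (n_hom_ref, n_het, n_hom_alt) genotype counts, one per population.
    Returns (a, b, c); the per-site estimate theta (FST) is a / (a + b + c).
    """
    r = len(pops)
    n = [sum(g) for g in pops]                                     # individuals per population
    p = [(g[1] + 2 * g[2]) / (2 * ni) for g, ni in zip(pops, n)]   # alt allele frequency
    h = [g[1] / ni for g, ni in zip(pops, n)]                      # observed heterozygosity
    n_bar = sum(n) / r
    n_c = (r * n_bar - sum(ni * ni for ni in n) / (r * n_bar)) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / (r * n_bar)
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / (r * n_bar)
    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c

def wc_fst(sites):
    """Return (mean of per-site FST, weighted FST = sum(a) / sum(a + b + c))."""
    comps = [wc_components(pops) for pops in sites]
    per_site = [a / (a + b + c) for a, b, c in comps if a + b + c > 0]  # skip monomorphic sites
    mean_fst = sum(per_site) / len(per_site)
    weighted = sum(a for a, _, _ in comps) / sum(a + b + c for a, b, c in comps)
    return mean_fst, weighted

# Hypothetical genotype counts (PJL, pooled comparison group) at two sites.
fixed_difference = [(48, 0, 0), (0, 0, 48)]
no_difference = [(25, 50, 25), (25, 50, 25)]
print(wc_fst([fixed_difference, no_difference]))
```

A fixed difference gives FST = 1, while identical populations give a value slightly below zero, which is expected for this unbiased estimator and explains why very small mean FST values such as 0.00134 can occur.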
Likewise, FST for PJL versus the 25 global populations of the 1000 Genomes Project showed comparatively higher differentiation than for the SAS populations, with a mean FST of 0.0031 for all SNVs and 0.0392 for deleterious SNVs. Most of the differentiation among the predicted deleterious SNVs was accounted for by moderately differentiated SNVs (38.32%, 215 out of 561) (Figure 3.12). In addition, 8 greatly differentiated (FST 0.15 - 0.25) and 2 severely differentiated (FST > 0.25) deleterious SNVs were found (Table 3.10).
The current understanding of population genetics suggests that the genetic burden of common diseases may differ among populations under the influence of their demographic histories (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). It was hypothesized that the deleterious variants filtered in the Pakistani population for their association with cardiovascular diseases might be differentiated from other populations, but the results were consistent with earlier findings that genetic variants related to common human diseases are not well differentiated (Lohmueller, Mauney, Reich, and Braverman, 2006). The eight greatly and two severely differentiated deleterious SNVs may have diverged from world populations through random genetic drift.
Figure 3.11: Manhattan plot of FST values between PJL and the other SAS populations of the 1000 Genomes Project. The plot covers the selected genes that harbored the deleterious SNVs for cardiovascular diseases, as filtered in this analysis. Each dot represents one SNV. The two moderately differentiated SNVs are highlighted in red.
Figure 3.12: Comparison of the proportions of moderately, greatly, and severely differentiated deleterious SNVs and of all SNVs in genes harboring deleterious SNVs. The proportion of moderately differentiated SNVs (FST 0.05 - 0.15) is higher for deleterious SNVs when PJL is compared with all populations of the 1000 Genomes Project.
Table 3.10: Deleterious SNVs greatly and severely differentiated in PJL versus 25 global populations of the 1000 Genomes Project. It is noteworthy that the two severely differentiated SNVs (rs560826688 and rs563254260) are both related to hypertension.
CHR POS ID REF ALT Gene Global DAF PJL DAF FST Disease
6 44274073 rs151044424 C A AARS2 0.0005990 0.015625 0.171381 hypertrophic cardiomyopathy
7 92731317 rs577145375 A G SAMD9 0.0005990 0.015625 0.171381 atherosclerosis
11 35240875 rs376536014 G T CD44 0.0037939 0.041667 0.172107 aneurysm
11 68192737 rs560826688 G T LRP5 0.0011980 0.03125 0.297321 hypertension
14 64678793 rs532495528 G A SYNE2 0.0005990 0.015625 0.170082 dilated cardiomyopathy, Long QT syndrome
14 74974786 rs549001156 C A LTBP2 0.0005990 0.015625 0.171381 Ventricular septal defect
15 67008780 rs532621952 C G SMAD6 0.0005990 0.015625 0.171381 Aortic valve disease 2
17 1680660 rs563254260 C G SERPINF1 0.0009984 0.026042 0.259398 hypertension
19 39410407 rs555119979 G A SARS2 0.0007987 0.020833 0.217625 hypertension
21 38877614 rs575017348 T C DYRK1A 0.0007987 0.020833 0.217625 heart failure
The FST values, which reflect differences in the allele frequencies of predicted deleterious SNVs between PJL and global populations, suggested stratifying populations by their deleterious mutational load for cardiovascular diseases. For this, principal component analysis (PCA) was carried out using 1000 Genomes Project data for PJL and 15 other populations, 3 from each continental group. PCA was performed separately for common SNVs (AF > 5.0%) and for rare and low-frequency SNVs (AF ≤ 5.0%). PCA with all low-frequency and rare SNVs of the CVD gene set grouped all populations together except the Africans (Figure 3.13, A). PCA with all common SNVs suggested three distinct groups of world populations, with PJL and the other SAS populations grouped with Europeans and Americans (Figure 3.13, B). Likewise, PCA using deleterious low-frequency and rare SNVs grouped all populations together, but PJL appeared to diverge (Figure 3.13, C); this divergence was attributed to PJL-specific deleterious variants. PCA with deleterious common SNVs again suggested three groups of populations, but with reduced variance between them (Figure 3.13, D). This stratification based on CVD genes showed no remarkable differentiation of populations; rather, it followed the same pattern of grouping that has been described for populations following the route of expansion after the out-of-Africa event (Jobling, Hurles, and Tyler-Smith, 2013; McEvoy, Powell, Goddard, and Visscher, 2011).
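The PCA step can be illustrated with a dependency-free sketch. It assumes genotypes coded as derived-allele dosages (0, 1, 2) in an individuals-by-SNVs matrix, splits SNVs at the 5% allele-frequency cut-off used above, centres each SNV, and extracts the first principal component by power iteration on the individual-by-individual cross-product matrix. Real analyses would normally use a dedicated tool; the function names and toy genotypes here are hypothetical.

```python
import math

def split_by_frequency(genotypes, cutoff=0.05):
    """Return (common, rare_low) SNV column indices using the derived allele frequency."""
    n_ind = len(genotypes)
    common, rare_low = [], []
    for j in range(len(genotypes[0])):
        af = sum(row[j] for row in genotypes) / (2 * n_ind)
        (common if af > cutoff else rare_low).append(j)
    return common, rare_low

def first_pc(genotypes, columns, iterations=200):
    """PC1 scores (unit length, one per individual) from the selected SNV columns."""
    n = len(genotypes)
    sub = [[row[j] for j in columns] for row in genotypes]
    means = [sum(col) / n for col in zip(*sub)]
    centred = [[x - m for x, m in zip(row, means)] for row in sub]
    # Individual-by-individual cross-product matrix X X^T of the centred genotypes.
    gram = [[sum(a * b for a, b in zip(x, y)) for y in centred] for x in centred]
    # Power iteration; a non-constant start avoids the null direction (1, ..., 1).
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

toy = [
    [2, 0, 2, 0], [2, 0, 1, 0], [2, 1, 2, 0],   # hypothetical group 1
    [0, 2, 0, 1], [0, 2, 0, 0], [1, 2, 0, 0],   # hypothetical group 2
]
common, rare_low = split_by_frequency(toy)
scores = first_pc(toy, common)
print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so only the separation of individuals along PC1, not the sign itself, is meaningful.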
Figure 3.13: Principal Components Analysis (PCA) using the genes-set of CVDs. A. PCA using all low and rare allele frequency (AF≤5.0%) SNVs, B. PCA using all common allele frequency (AF>5.0%) SNVs. C. PCA using deleterious low and rare allele frequency (AF≤5.0%) SNVs, D. PCA using deleterious common allele frequency (AF>5.0%) SNVs.
(Here, PJL=Punjabi from Lahore, Pakistan; BEB=Bengali from Bangladesh;
ITU=Indian Telugu from the UK; STU=Sri Lankan Tamil from the UK;
FIN=Finnish in Finland; GBR=British in England and Scotland; CEU=Utah
Residents (CEPH) with Northern and Western European Ancestry;
CLM=Colombians from Medellin, Colombia; PEL=Peruvians from Lima, Peru;
PUR=Puerto Ricans from Puerto Rico; CHB=Han Chinese in Beijing, China;
JPT=Japanese in Tokyo, Japan; KHV=Kinh in Ho Chi Minh City, Vietnam;
LWK=Luhya in Webuye, Kenya; MSL=Mende in Sierra Leone; YRI=Yoruba in
Ibadan, Nigeria).
Using the same set of genes, the mutational load for cardiovascular diseases was also determined for one randomly selected population from each of the five continental groups of the 1000 Genomes Project, i.e., Gujarati Indians from Houston (GIH) in South Asia, Southern Han Chinese (CHS) in East Asia, Finnish (FIN) in Finland, Puerto Ricans (PUR) in the Americas, and Yoruba in Ibadan (YRI) in Africa, as well as for the Malay population of Southeast Asia, which is not part of the 1000 Genomes Project. This empirical estimation revealed an excess of deleterious derived rare variants (singletons) in the YRI and Malay populations, while the FIN and PJL populations had the fewest deleterious derived singletons (Figure 3.14 A). The small number of deleterious singletons in the Finnish population may reflect the founder effect in this population and its resulting high level of inbreeding. This pattern also suggested increased inbreeding among PJL individuals. Therefore, the proportion of homozygous deleterious SNVs was determined in all of these populations. The highest proportions were observed in the FIN and PJL populations (PJL 12.30%, Finnish 12.79%; Figure 3.14 B). The low number of derived singletons and the high proportion of homozygous deleterious SNVs in the Pakistani population may be due to an increased level of consanguinity.
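The normalization to an equal sample size (n = 96, Figure 3.14) and the two summary statistics can be sketched as below. The sketch assumes genotypes are derived-allele dosages (0, 1, 2) for deleterious SNVs, uses random subsampling without replacement, and defines the homozygous proportion as the fraction of non-reference genotype calls that are homozygous for the derived allele. The text does not give the exact definition, so this definition, the seed and the function name are assumptions.

```python
import random

def load_summary(genotypes, sample_size=96, seed=1):
    """genotypes: list of per-individual lists of derived-allele dosages (0/1/2).

    Randomly subsamples `sample_size` individuals (if more are available), then returns
    (number of derived singletons, percent homozygous among non-reference calls).
    """
    rng = random.Random(seed)
    if len(genotypes) > sample_size:
        subset = rng.sample(genotypes, sample_size)
    else:
        subset = genotypes
    n_snvs = len(subset[0])
    # A derived singleton: exactly one copy of the derived allele in the subsample.
    singletons = sum(1 for j in range(n_snvs) if sum(ind[j] for ind in subset) == 1)
    # Assumed definition: homozygous-derived calls / all non-reference calls.
    calls = [g for ind in subset for g in ind if g > 0]
    hom = sum(1 for g in calls if g == 2)
    return singletons, 100.0 * hom / len(calls)

toy = [[1, 0, 2], [0, 0, 2], [0, 1, 0], [0, 0, 1]]  # hypothetical dosages, 4 individuals
print(load_summary(toy, sample_size=4))
```

Subsampling every population to the same size matters because the expected number of singletons grows with sample size, so unequal samples would not be directly comparable.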
Figure 3.14: Site frequency spectra for PJL, 5 other populations of the 1000 Genomes Project, and one Southeast Asian population, 'Malay', using data from the same number of individuals (n = 96) in each population for normalization. A. Comparison of low-frequency deleterious SNVs in the CVD gene set. B. Percentage of homozygous deleterious SNVs in each population.
3.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia and Coronary Artery Disease
3.2.1 Quality Assessment of Genomic DNA
Genomic DNA was isolated from peripheral blood samples of a Pakistani individual using the CTAB method, as described by Winnepenninckx et al. (1993), with some modifications. The quality of the purified DNA was assessed on a 1% agarose gel, which showed intact bands of good-quality genomic DNA of >10 kb (Figure 3.15).
Figure 3.15: Agarose gel electrophoresis of genomic DNA isolated from Pakistani obese individual (L = ladder, 1 - 5 = genomic DNA samples).
3.2.2 Fragmentation of Genomic DNA and Size Selection
To prepare the mate-paired library, genomic DNA was fragmented to a target size of 1500 bp using the Covaris S220 sonication system. Sonication yielded good-quality fragmented genomic DNA distributed mainly between approximately 1000 and 2000 bp. The most intense part of the fragmented DNA, ranging from 1200 to 1800 bp, was excised from the agarose gel using a sterile sharp blade (Figure 3.16).
Figure 3.16: A. Fragmentation of genomic DNA using the Covaris S220 system. B. Size selection by slicing the most intense part of fragmented DNA.
3.2.3 Mate-Paired Library Preparation
To sequence the whole genome, a mate-paired library was constructed because it enables the detection of single nucleotide variations (SNVs), small insertions and deletions (indels), and structural variations (SVs) (Levy et al. 2007, Wheeler et al. 2008). In the SOLiD 5500xl workflow, a mate-paired library fragment consists of a central mate-pair adaptor of 36 bp, a 60 bp fragment of target DNA on either side of the central adaptor, a 41 bp P1-T adaptor at the 5' end of the target DNA, and a 24 bp P2-T adaptor at the 3' end of the target DNA. The total length of one fragment of an ideally prepared mate-paired library lies between 250 and 300 bp (Figure 3.17).
Figure 3.17: A schematic illustration of one fragment of mate-paired library. Tag1 and Tag2 represent the target DNAs to be sequenced.
3.2.4 Evaluation of the Mate-Paired Library
The library of DNA fragments was evaluated on the E-Gel® Electrophoresis System using a 2% ready-to-use agarose gel. The amplified library migrated between the 250 and 350 bp bands of the DNA ladder, representing the distribution of fragment sizes in the library (Figure 3.18).
Figure 3.18: A 2% E-Gel showing the position of mate-paired library in lane no. 2.
E-Gel electrophoresis gives a qualitative assessment of the library, while the Bioanalyzer gives a quantitative and very precise size distribution. From the electropherogram obtained with the Bioanalyzer 2100, the average size of the library was 269 bp, and the concentration calculated from the area under the curve was 3.3 ng/µL (Figure 3.19).
Figure 3.19: Evaluation of the mate-paired library by Bioanalyzer 2100.
3.2.6 Analysis of Whole Genome Sequencing Data
The final library was subjected to SOLiD sequencing, and the data were produced in XSQ file format (a colour-space format), which was converted into '.csfasta' format (a human-readable format representing DNA sequences as colour codes) using the XSQ_Converter tool. A total of 2.065 billion short reads were obtained after this conversion.
Filtering low-quality reads improves the alignment percentage and overall coverage, and makes variant calling more reliable. SOLiD_preprocess_filter_v2 is a Perl-based tool that efficiently filters out reads below a specified quality score (Sasson, and Michael, 2010). Here, short reads having three or more bases with a quality score below 10 were removed, leaving about 1.340 billion short reads with matching mate pairs. The reads were then aligned with the reference human genome, and duplicate reads (multiple reads aligned to the same position of the reference genome) were removed. After these steps, 312,849,478 short reads were properly aligned with the reference.
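The read filter described above can be sketched as follows, together with the requirement that both mates of a pair survive. This is not the code of SOLiD_preprocess_filter_v2. It assumes the quality values from the SOLiD quality files have already been parsed into integer lists, and the function names are hypothetical.

```python
def passes_quality_filter(qualities, min_qv=10, max_low_bases=2):
    """True if a read has at most `max_low_bases` calls below `min_qv`.

    Mirrors the criterion in the text: reads with three or more bases
    below QV 10 are discarded.
    """
    low = sum(1 for q in qualities if q < min_qv)
    return low <= max_low_bases


def filter_mate_pairs(pairs):
    """Keep only pairs in which both mates pass, so each kept read retains its mate."""
    return [(r1, r2) for r1, r2 in pairs
            if passes_quality_filter(r1) and passes_quality_filter(r2)]


good = [25, 30, 8, 22, 9, 31]  # two calls below QV 10 -> kept
bad = [25, 4, 8, 22, 9, 31]    # three calls below QV 10 -> removed
print(passes_quality_filter(good), passes_quality_filter(bad))  # True False
print(len(filter_mate_pairs([(good, good), (good, bad)])))       # 1
```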
After applying the GATK Best Practices variant-calling workflow, 2,568,249 variants were called. Variants with a variant-calling quality score (QV) below 20 were then filtered out, leaving 2,167,161 variants in the filtered vcf file. These included 2,055,524 SNVs and 111,664 short insertions and deletions (indels). The histogram of the depth of coverage (DP) of variants in the vcf file showed a median depth of 6 (Figure 3.20). The transition/transversion (Ti/Tv) ratio for whole-genome variants was 2.14. Overall, 41,088 (1.90%) variants were novel.
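The Ti/Tv ratio is a quick sanity check on a call set. Random sequencing errors are expected to produce about two transversions for every transition (Ti/Tv ≈ 0.5), whereas genuine human whole-genome variation gives a ratio of roughly 2.0-2.1. The sketch below computes the ratio while applying the QV ≥ 20 threshold used here. It assumes the variants have already been parsed from the vcf file into hypothetical (REF, ALT, QUAL) tuples.

```python
# Transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T)
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv_ratio(snvs, min_qual=20.0):
    """Ti/Tv over single-base (REF, ALT, QUAL) calls passing a QUAL threshold."""
    ti = tv = 0
    for ref, alt, qual in snvs:
        if qual < min_qual or len(ref) != 1 or len(alt) != 1:
            continue  # drop low-quality calls and indels
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


calls = [("A", "G", 50), ("C", "T", 35), ("G", "A", 60), ("A", "C", 40),
         ("T", "G", 22), ("C", "T", 12), ("AT", "A", 80)]
print(ti_tv_ratio(calls))  # 1.5
```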
Figure 3.20: Distribution of the depth (DP) of variants.
3.2.7 Analysis for Deleterious Mutations Related to Hyperlipidemia and
Related Cardiac Diseases
The filtered variants were annotated with ANNOVAR using gene-based, region-based, and filter-based annotation, which gave the number of variants in each genomic region (Table 3.11). Synonymous variants outnumbered non-synonymous variants, with a non-syn/syn ratio of 0.90. Based on annotation with CADD, SIFT, and Polyphen2, 425 SNVs in 385 different genes were prioritized as deleterious by all three tools combined (Figure 3.21).
Table 3.11: The number of variants in different genomic regions as calculated from ANNOVAR annotation.
Annotation No. of Variants
Exonic 14213
Intronic 771143
Intergenic 1190116
Upstream 12997
Downstream 15046
UTR5’ 2794
UTR3’ 17434
Synonymous 7143
Nonsynonymous 6434
Figure 3.21: The predicted deleterious variants with SIFT, Polyphen2, and CADD.
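The combined prioritization above amounts to a simple intersection of the three predictions. The sketch below is an illustration, not the exact pipeline of this study. The column names follow ANNOVAR's dbNSFP-style output ('SIFT_pred', 'Polyphen2_HDIV_pred', 'CADD_phred'). The HDIV PolyPhen2 model, the 'D' labels and the CADD_phred cutoff of 15 (the threshold used elsewhere in this thesis) are all assumptions. Gene names in the example are placeholders.

```python
def is_consensus_deleterious(row: dict, cadd_cutoff: float = 15.0) -> bool:
    """True only when SIFT, PolyPhen2 and CADD all flag the variant.

    Assumed ANNOVAR/dbNSFP-style columns; '.' or a missing value counts
    as not deleterious.
    """
    sift_d = row.get("SIFT_pred") == "D"
    pp2_d = row.get("Polyphen2_HDIV_pred") == "D"  # assumed PolyPhen2 model
    try:
        cadd_d = float(row.get("CADD_phred", ".")) >= cadd_cutoff
    except (TypeError, ValueError):
        cadd_d = False
    return sift_d and pp2_d and cadd_d


# Placeholder annotations, not data from this study
variants = [
    {"Gene": "GENE_A", "SIFT_pred": "D", "Polyphen2_HDIV_pred": "D", "CADD_phred": "24.1"},
    {"Gene": "GENE_B", "SIFT_pred": "T", "Polyphen2_HDIV_pred": "D", "CADD_phred": "18.0"},
    {"Gene": "GENE_C", "SIFT_pred": "D", "Polyphen2_HDIV_pred": "D", "CADD_phred": "."},
]
hits = [v for v in variants if is_consensus_deleterious(v)]
print(len(hits), sorted({v["Gene"] for v in hits}))  # 1 ['GENE_A']
```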
Among the 425 deleterious SNVs, 27 SNVs belonged to 25 genes from the list of CVD-associated genes (Table 3.12). These included 17 deleterious SNVs that had also been prioritized in the mutational load analysis. Two genes contained two predicted deleterious SNVs each:
- MTRR (methionine synthase reductase), which plays a role in DNA repair mechanisms.
- PLB1 (phospholipase B1), a membrane phospholipase involved in removing the sn-1 and sn-2 fatty acids from glycerophospholipids.
MTRR has been linked with an increased risk of coronary artery disease (Brown, McKinney, Kaufman, Gravel, and Rozen, 2000). PLB1 has been associated with low-density lipoprotein cholesterol (LDL-C) levels and with the risk of coronary artery disease (Lettre et al., 2011).

Further, three SNVs were found in the homozygous state: 'rs111896385' in the FMN2 (formin 2) gene, a novel SNV in the MTRR gene, and 'rs2108622' in the CYP4F2 (cytochrome P450 family 4 subfamily F member 2) gene. The nonsynonymous rs111896385 variant in FMN2 lies in exon 5 or exon 6, depending on the transcript, and causes a proline-to-leucine substitution in the protein product. This mutation has not been described earlier in cardiovascular genetics. However, FMN2 has been linked with coronary heart disease in a genome-wide association study of 14,000 cases (Wellcome Trust Case Control Consortium, 2007). The nonsynonymous homozygous SNV 'rs2108622' in CYP4F2 affects exon 11 and causes a valine-to-methionine substitution in the protein product. This SNV has previously been associated with coronary heart disease in the Chinese Han population (C. Yu et al., 2014). The novel nonsynonymous SNV in MTRR affects exon 14, leading to a glutamine-to-leucine substitution in the protein.
Some genes harboring predicted deleterious mutations were not in the gene list of the mutational load analysis. Among these, three genes, CDC27, KCNJ12, and HYDIN, each carried six deleterious mutations. None of these genes has been reported to be associated with cardiac disorders.
- KCNJ12 (potassium voltage-gated channel subfamily J member 12) encodes an inwardly rectifying K+ channel. It contributes to the inward rectifier current in cardiac cells and is involved in cardiac conduction.
- CDC27 encodes cell division cycle 27, a component of the anaphase-promoting complex during cell division.
- HYDIN encodes an axonemal central pair apparatus protein involved in ciliary motility.

The number of deleterious variants in these genes was compared with that in five randomly selected unrelated individuals from the 1000 Genomes Project PJL (Punjabi from Lahore, Pakistan) population. On average, these individuals carried two non-synonymous deleterious mutations each in KCNJ12 and CDC27, and eight in HYDIN. This suggests that deleterious non-synonymous mutations in KCNJ12 and CDC27 may be linked with hyperlipidemia and/or the risk of coronary heart disease. HYDIN, by contrast, normally carries a high number of deleterious mutations in the Pakistani population.

The allele frequencies of all prioritized SNVs were compared across the 1000 Genomes Project dataset. Twelve SNVs had a higher alternate allele frequency in SAS populations than in global populations (Figure 3.22): seven common (AF > 5.0%), three low-frequency (AF = 1.0-5.0%), and two rare (AF < 1.0%).
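The frequency classes and the SAS-versus-global comparison above can be expressed as a short sketch. The text does not say which frequency was binned, so binning by the SAS frequency is an assumption. The tuple layout, function names and example frequencies are hypothetical.

```python
def frequency_class(af: float) -> str:
    """Bin an allele frequency with the thresholds used in the text."""
    if af > 0.05:
        return "common"         # AF > 5.0%
    if af >= 0.01:
        return "low-frequency"  # AF = 1.0-5.0%
    return "rare"               # AF < 1.0%


def higher_in_sas(snvs):
    """(id, class) for SNVs whose SAS alternate AF exceeds the global AF.

    `snvs` holds (id, global_af, sas_af); binning by the SAS frequency
    is an assumption.
    """
    return [(snv_id, frequency_class(sas_af))
            for snv_id, global_af, sas_af in snvs if sas_af > global_af]


# Hypothetical frequencies for illustration only
example = [("snv1", 0.10, 0.21), ("snv2", 0.020, 0.031),
           ("snv3", 0.002, 0.006), ("snv4", 0.30, 0.12)]
print(higher_in_sas(example))
# [('snv1', 'common'), ('snv2', 'low-frequency'), ('snv3', 'rare')]
```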
Table 3.12: 27 predicted deleterious non-synonymous SNVs in genes associated with CVDs in the hyperlipidemia patient. The 17 underlined SNV IDs were prioritized in the mutational load analysis and found again in the patient by whole genome sequencing.
CHR POS ID REF ALT Homo/Hetero Gene Effect
chr1 47614434 rs4926600 C T heterozygous CYP4A22 exon12:p.L509F
chr1 115222237 rs34526199 T A heterozygous AMPD1 exon6:p.K316I,exon7:p.K320I
chr1 169701060 rs5361 T G heterozygous SELE exon4:p.S149R
chr1 240370985 rs111896385 C T homozygous FMN2 exon5:p.P958L,exon6:p.P962L
chr2 21231524 rs676210 G A heterozygous APOB exon26:p.P2739L
chr2 28761981 rs6753929 G C heterozygous PLB1 exon11:p.V223L,exon11:p.V212L
chr2 28854972 rs74701215 C G heterozygous PLB1 exon54:p.P1312A,exon55:p.P1323A
chr2 179398509 rs3731752 C A heterozygous TTN exon186:p.G25213V, exon187:p.G25338V,T
chr3 52522023 rs779852675 C T heterozygous NISCH exon16:p.L839F
chr5 7870973 rs1801394 A G heterozygous MTRR exon2:p.I22M,exon2:p.I49M
chr5 7897285 . A T homozygous MTRR exon14:p.Q626L,exon14:p.Q653L
chr5 52356790 rs377150294 C T heterozygous ITGA2 exon12:p.R458W
chr5 148206600 rs201257377 A G heterozygous ADRB2 exon1:p.N69S
chr6 33651929 rs763462799 G A heterozygous ITPR3 exon36:p.G1641R
chr7 3990657 rs34775958 C T heterozygous SDK1 exon6:p.A317V
chr7 92085763 rs34768413 C T heterozygous GATAD1 exon5:p.R233W
chr7 94946084 rs854560 A T heterozygous PON1 exon3:p.L55M
chr10 69881812 rs772265631 G A heterozygous MYPN exon2:p.R206Q,exon3:p.R206Q
chr11 27679916 rs6265 C T heterozygous BDNF exon1:p.V66M,exon2:p.V66M
chr11 102713476 . C T heterozygous MMP3 exon2:p.G93R
chr12 6061559 rs7962217 C T heterozygous VWF exon49:p.G2705R
chr15 30008977 rs2291166 T G heterozygous TJP1 exon22:p.D1267A,exon23:p.D1271A,T
chr17 26109102 rs3730017 G A heterozygous NOS2 exon7:p.R221W
chr19 15990431 rs2108622 C T homozygous CYP4F2 exon11:p.V433M
chr22 19766782 rs4819522 C T heterozygous TBX1 exon9:p.T350M
chr22 36661354 rs148296684 C T heterozygous APOL1 exon5:p.L140F,exon6:p.L158F
chrX 153008483 rs78993751 G A heterozygous ABCD1 exon8:p.G608D
Figure 3.22: Deleterious SNVs having higher allele frequency in SAS populations than in global populations (y-axis: alternate allele frequency, 0-0.6; series: Global_AF and SAS_AF).
To assess the potential deleterious effect of non-coding variants, variants in non-coding regions with a CADD_Phred score ≥ 15 were identified. These comprised 211 upstream variants, 150 downstream variants, 136 variants in the 5'-untranslated region (5'-UTR), 399 in the 3'-untranslated region (3'-UTR), and 19 splice-site variants. Together they were located in 831 different genes, including 44 genes from the mutational load gene list.
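The per-region count of non-coding variants with CADD_Phred ≥ 15 can be sketched as below. The region labels are assumed to follow ANNOVAR's Func.refGene categories (upstream, downstream, UTR5, UTR3, splicing). Combined labels such as "UTR5;UTR3" or "upstream;downstream" are not counted by this simplified sketch. The tuple layout and example values are hypothetical.

```python
from collections import Counter

# Assumed ANNOVAR Func.refGene-style labels for the non-coding categories counted here
NON_CODING = {"upstream", "downstream", "UTR5", "UTR3", "splicing"}


def noncoding_deleterious(variants, cadd_cutoff=15.0):
    """Count non-coding variants with CADD_phred >= cutoff per region.

    `variants` holds hypothetical (region, gene, cadd_phred) tuples.
    Returns the per-region counts and the set of genes involved.
    """
    per_region = Counter()
    genes = set()
    for region, gene, cadd in variants:
        if region in NON_CODING and cadd >= cadd_cutoff:
            per_region[region] += 1
            genes.add(gene)
    return per_region, genes


demo = [("UTR3", "GENE_A", 21.3), ("UTR3", "GENE_A", 15.0),
        ("intronic", "GENE_B", 30.2), ("upstream", "GENE_C", 14.9),
        ("splicing", "GENE_D", 22.0)]
counts, genes = noncoding_deleterious(demo)
print(dict(counts), sorted(genes))
# {'UTR3': 2, 'splicing': 1} ['GENE_A', 'GENE_D']
```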
KCNJ12 carried the largest number of non-coding deleterious variants, with six in its 3'-UTR. It was followed by CDC27, with five deleterious mutations in its 5'-UTR, and BCAT1, with four in its 3'-UTR. Again, the occurrence of deleterious mutations in these regions was checked in five randomly selected unrelated individuals from the 1000 Genomes Project PJL population. None of these individuals carried mutations in the 3'-UTR of KCNJ12 or the 5'-UTR of CDC27, whereas all five carried four deleterious mutations in the 3'-UTR of BCAT1. This suggests that the untranslated regions of KCNJ12 and CDC27 may also be involved in hyperlipidemia and the associated cardiac risk.

A homozygous SNV, rs2516839 (C>T), was found in the 5'-UTR of USF1. This SNV has been reported to confer a two-fold higher risk of sudden cardiac death (Kristiansson et al., 2008). USF1 encodes upstream transcription factor 1, a member of the leucine zipper family, and other studies have also associated it with hyperlipidemia and atherosclerosis (Laurila et al., 2010). The frequency of this variant is:
- 0.53 in 1000 Genomes PJL;
- 0.44 and 0.45 in the neighboring Indian Telugu (ITU) and Gujarati Indian (GIH) populations, respectively;
- 0.15 in African, 0.46 in American, and 0.63 in European populations.

Another homozygous SNV, rs71457130 (C>T), was observed in the 3'-UTR of LRP6. This gene encodes LDL receptor related protein 6, which is involved in receptor-mediated endocytosis of lipoproteins. The variant has not previously been associated with cardiac disease risk, but other mutations in LRP6 have been associated with coronary artery disease (Mani et al., 2007; Y. Xu et al., 2014).
Annotation with the Variant Effect Predictor (VEP) tool predicted the effect of each SNV at the transcript level. SNVs with a severe functional impact, i.e., a loss-of-function (LoF) effect on the transcript, were filtered through this analysis. A homozygous stop-gained mutation, 'rs885985 (G>A)', was found in the CLDN5 (claudin 5) gene. This gene encodes an integral membrane protein that forms the strands of tight junctions. It is highly expressed in adipose tissue (Fagerberg et al., 2014), and a reduced level of claudin-5 has been reported in human heart failure (Swager et al., 2015). Another stop-gained mutation, 'rs328 (C>G)', was found in the heterozygous state in the LPL (lipoprotein lipase) gene. LPL has been reported to be involved in hyperlipidemia (Shatwan et al., 2016) and in the risk of coronary artery disease (Xie, and Li, 2017).
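A LoF filter on VEP output can be sketched as a lookup of consequence terms. The set below is a subset of the Sequence Ontology terms that VEP rates as HIGH impact. Treating exactly this set as loss of function is an assumption, since dedicated tools such as the LOFTEE plugin apply further criteria. The record layout and names are hypothetical.

```python
# A subset of the Sequence Ontology terms VEP rates as HIGH impact;
# treating exactly this set as "loss of function" is an assumption.
LOF_TERMS = {
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
}


def is_lof(consequence: str) -> bool:
    """VEP joins several consequences for one transcript with '&'."""
    return any(term in LOF_TERMS for term in consequence.split("&"))


def lof_calls(records):
    """Yield (variant_id, gene, genotype) for records with any LoF consequence.

    `records` holds hypothetical (variant_id, gene, consequence, genotype) tuples.
    """
    for var_id, gene, consequence, genotype in records:
        if is_lof(consequence):
            yield var_id, gene, genotype


demo = [("var1", "GENE_A", "missense_variant", "0/1"),
        ("var2", "GENE_B", "stop_gained&splice_region_variant", "1/1")]
print(list(lof_calls(demo)))  # [('var2', 'GENE_B', '1/1')]
```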
3.2.8 Filtration for Disease Mutations Related to Hyperlipidemia and
Related Cardiac Diseases
In addition to the predicted deleterious SNVs and LoF SNVs, the variants were searched for disease mutations in ClinVar, OMIM, and the GWAS Catalog. ClinVar and OMIM are manually curated public archives of genetic variations associated with various diseases (Hamosh, Scott, Amberger, Bocchini, and McKusick, 2005; Landrum et al., 2014). The GWAS Catalog lists a large number of common variants associated with common diseases through genome-wide association studies (Welter et al., 2013).

OMIM filtration identified one heterozygous intronic SNV, rs2073658 (C>T), in USF1. This gene has itself been associated with hyperlipidemia and atherosclerosis (Laurila et al., 2010). The variant is also catalogued in ClinVar as a risk factor for familial combined hyperlipidemia. The global frequency of the alternate allele is 17%, but in Pakistan it is 21.9%. This is about 1.5 to 3 times higher than in the neighboring South Asian populations: 6.9% in Sri Lankans, 11.8% in Indian Telugu, 12.6% in Gujarati Indians, and 14.5% in Bengalis.
GWAS filtration identified 73 common variants (40 homozygous) associated with hyperlipidemia and atherosclerosis, and 147 variants (71 homozygous) associated with coronary heart diseases.

Among the variants for coronary heart diseases, the largest number, six SNVs, was found at the 9p21.3 locus. This is the non-coding region of CDKN2B, which encodes cyclin-dependent kinase inhibitor 2B. Variations at this locus alter the expression of this gene in cardiac tissue and have been associated with coronary heart diseases (Pilbrow et al., 2012). The second-largest number was found in the intergenic region between LPL and SLC18A1 at 8p21.3. This region has been associated with raised lipid levels and the risk of coronary artery disease (Aulchenko et al., 2009). Four further loci each contained three risk variants for coronary heart diseases:
- the intergenic region of TRIB1-LINC00861 at 8q24.13;
- the intronic region of PHACTR1 at 6p24.1;
- the intergenic region of LOC101929011-BUD13 at 11q23.3;
- the intergenic region of LDAH-APOB at 2p24.1.

Among the variants for hyperlipidemia and atherosclerosis, the largest number per locus (two risk variants) was found at three loci: intronic variants in ABCA1 at 9q31.1, intronic variants in CDKN2B at 9p21.3, and the intergenic region of TRIB1-LINC00861 at 8q24.13. ABCA1 encodes ATP-binding cassette transporter A1, which is involved in the transport and homeostasis of plasma high-density lipoprotein cholesterol (HDL-C). Malfunction of this transporter increases triglyceride levels and thereby the risk of CAD (Clee et al., 2001; Frikke-Schmidt, 2011). The role of CDKN2B has been described above. The intergenic region of TRIB1-LINC00861 has recently been reported to confer a moderate risk of increased cholesterol and of CAD (Paquette et al., 2017).
The alternate allele frequencies of the filtered variants were compared between the SAS and Global populations:
- Variants associated with hyperlipidemia and atherosclerosis had an average allele frequency of 53.20% in SAS and 52.02% in Global populations.
- Variants associated with coronary heart diseases had an average allele frequency of 52.76% in SAS and 51.58% in Global populations.

The xy scatter plots show which variants of these diseases have higher allele frequencies in SAS and which have higher frequencies in Global populations (Figure 3.23). This analysis highlighted four SNVs associated with hyperlipidemia and seven associated with CAD that were at least 1.5-fold more frequent in SAS than in Global populations (Table 3.13).
Figure 3.23: Comparison of Global and South Asian allele frequencies for variants of hyperlipidemia (blue) and ischemic heart diseases (red).
(Two scatter plots, one per disease group; x-axis: AF_Global, y-axis: AF_SAS, both ranging from 0 to 1.)
Table 3.13: Common variants associated with hyperlipidemia and CAD filtered from GWAS-Catalogue and having 1.5 fold or higher allele frequency in SAS than in Global populations.
CHR POS ID REF ALT State Location Gene/Locus AF_Global AF_SAS
2 21204025 rs6544366 G T Heterozygous intergenic LDAH-APOB 0.3716 0.5798
3 64029383 rs831574 A G Heterozygous intergenic PSMD6-PRICKLE2 0.2792 0.4325
6 46684222 rs1805017 C T Homozygous exonic PLA2G7 0.3175 0.5235
6 97080198 rs12200560 A G Heterozygous intergenic FHL5-GPR63 0.2949 0.4673
7 106409452 rs17398575 G A Homozygous intergenic CCDC71L-PIK3CG 0.1839 0.319
9 107665739 rs2575876 G A Heterozygous intronic ABCA1 0.2632 0.4008
11 92708710 rs10830963 C G Heterozygous intronic MTNR1B 0.2602 0.4264
12 51213433 rs17291650 A G Heterozygous exonic ATF1 0.0365 0.0624
16 57005479 rs1532624 C A Heterozygous intronic CETP 0.3131 0.4796
16 67902070 rs2271293 G A Heterozygous intronic NUTF2 0.0996 0.1667
19 11163601 rs1122608 G T Heterozygous intronic SMARCA4 0.1382 0.2495
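The 1.5-fold criterion used to select the variants in Table 3.13 can be sketched as a ratio filter on the two frequencies. Two rows below are taken from Table 3.13. The third row is a made-up variant included to show one that fails the threshold. The function name is hypothetical.

```python
def sas_enriched(variants, min_fold=1.5):
    """Return (rsid, fold) for variants with AF_SAS / AF_Global >= min_fold."""
    kept = []
    for rsid, af_global, af_sas in variants:
        if af_global > 0:
            fold = af_sas / af_global
            if fold >= min_fold:
                kept.append((rsid, round(fold, 2)))
    return kept


rows = [
    ("rs6544366", 0.3716, 0.5798),    # from Table 3.13
    ("rs1122608", 0.1382, 0.2495),    # from Table 3.13
    ("rs_hypothetical", 0.40, 0.45),  # made-up variant, fold 1.125
]
print(sas_enriched(rows))  # [('rs6544366', 1.56), ('rs1122608', 1.81)]
```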
3.3 Whole Exome Sequencing and Analysis of Pakistani Patients
with Cardiomyopathy
3.3.1 Sequencing Data
The exome sequencing short reads of the patients under study were obtained in 'fastq' format. This is the standard format for storing short DNA sequence reads together with their quality scores (Cock, Fields, Goto, Heuer, and Rice, 2010). A summary of the short reads in each fastq file is given in Table 3.14.
Table 3.14: Quality assessment of raw reads in CMP patients' fastq files
Sample | No. of raw reads | Read length (bases) | Sequence (Gb) | GC (%) | Average quality per read
MS-1_1.fastq | 36,726,002 | 101 | 3.709 | 49.396 | 40
MS-1_2.fastq | 36,726,002 | 101 | 3.709 | 49.396 | 40
MS-2_1.fastq | 30,251,621 | 101 | 3.055 | 49.323 | 40
MS-2_2.fastq | 30,251,621 | 101 | 3.055 | 49.323 | 40
MS-3_1.fastq | 30,334,907 | 101 | 3.064 | 49.505 | 40
MS-3_2.fastq | 30,334,907 | 101 | 3.064 | 49.505 | 40
MS-4_1.fastq | 33,594,314 | 101 | 3.393 | 49.392 | 40
MS-4_2.fastq | 33,594,314 | 101 | 3.393 | 49.392 | 40
MS-5_1.fastq | 37,471,150 | 101 | 3.785 | 49.130 | 40
MS-5_2.fastq | 37,471,150 | 101 | 3.785 | 49.130 | 40
3.3.2 Quality Assessment of Raw Short Reads
Quality assessment of short reads plays a key role in obtaining true-positive genetic variants from high-throughput DNA sequencing data. Filtering out low-quality reads is essential for determining meaningful genotypes, and it increases the accuracy of associating them with a disease or phenotype (Carson et al., 2014). Quality assessment with the FastQC tool showed that each fastq file had an average per-read Phred quality score of 40, with no read having a quality score below 20 (Figure 3.24). The GC content of each file was about 49%.
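FastQC reports these statistics directly. The sketch below only illustrates what they mean. The Phred scale relates a quality score Q to an error probability P = 10^(-Q/10), so the average score of 40 observed here corresponds to about one wrong base call in 10,000. Likewise, the Gb column of Table 3.14 is the number of reads multiplied by 101 bases (e.g., 36,726,002 × 101 ≈ 3.709 Gb). The function names are hypothetical, and Phred+33 quality encoding (the Illumina 1.8+ convention) is assumed.

```python
from statistics import mean


def phred_to_error(q: float) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def summarize_fastq(lines, offset=33):
    """Read count, total bases, GC % and mean per-read quality.

    `lines` can be an open text file or any iterable of lines; assumes
    4-line records and Phred+33 quality encoding.
    """
    it = iter(lines)
    n_reads = n_bases = n_gc = 0
    read_quals = []
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        qual = next(it).strip()
        n_reads += 1
        n_bases += len(seq)
        n_gc += seq.count("G") + seq.count("C")
        read_quals.append(mean(ord(c) - offset for c in qual))
    return {
        "reads": n_reads,
        "bases": n_bases,
        "gc_percent": 100 * n_gc / n_bases if n_bases else 0.0,
        "mean_read_quality": mean(read_quals) if read_quals else 0.0,
    }


demo = ["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "5555"]
print(summarize_fastq(demo))
print(phred_to_error(40))  # 0.0001
```

The function accepts an open file handle as well as a list, because it only iterates over lines.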
Figure 3.24: Phred quality score distribution of forward and reverse 'fastq' files.
3.3.3 Alignment with the Reference Genome and Variants Calling
Paired-end alignment of the short reads with the reference human genome produced Sequence Alignment/Map (SAM) files, which were converted into binary BAM files. After duplicate removal, the insert size of the aligned reads was determined with Picard (picard-tools-1.109) to check that the short reads had aligned correctly with the reference genome. The insert size was 100-300 bp for all five samples, with a peak at ~200 bp (Figure 3.25). The numbers of reads aligned with the reference genome were determined with 'samtools flagstat'. The depth of coverage ranged from 87.86x to 108.12x across samples. After duplicate removal, 84.33% to 87.18% of reads mapped to the reference. The results for each BAM file are summarized in Table 3.15.
Table 3.15: Mapped reads and raw depth of coverage for BAM files
Sample | Mapped reads (duplicates-removed BAM file) | Mapped reads (%) | Total bases in mapped reads (mapped reads × 101) | Size of target (bp) | Raw depth of coverage
MS1_AJK | 63,721,330 | 86.75 | 6,435,854,330 | 60,000,000 | 107.26x
MS2_SND | 52,749,145 | 87.18 | 5,327,663,645 | 60,000,000 | 88.79x
MS3_BLC | 52,196,433 | 86.03 | 5,271,839,733 | 60,000,000 | 87.86x
MS4_PNJ | 56,662,519 | 84.33 | 5,722,914,419 | 60,000,000 | 95.38x
MS5_URD | 64,229,947 | 85.71 | 6,487,224,647 | 60,000,000 | 108.12x
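The raw depth column of Table 3.15 follows from a simple formula: total bases in mapped reads divided by the target size. The sketch below reproduces it from the mapped read counts. This raw estimate also counts reads that fall in regions flanking the target, so it tends to overstate on-target depth. Per-base tools such as samtools depth give the actual coverage. The function name is hypothetical.

```python
def raw_depth(mapped_reads: int, read_length: int, target_bp: int) -> float:
    """Raw depth = total bases in mapped reads / size of the capture target."""
    return mapped_reads * read_length / target_bp


# Mapped reads after duplicate removal, from Table 3.15
mapped = {"MS1_AJK": 63_721_330, "MS2_SND": 52_749_145, "MS3_BLC": 52_196_433,
          "MS4_PNJ": 56_662_519, "MS5_URD": 64_229_947}
for sample, reads in mapped.items():
    print(f"{sample}: {raw_depth(reads, 101, 60_000_000):.2f}x")
# MS1_AJK: 107.26x ... MS5_URD: 108.12x
```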
Figure 3.25: Insert size for all the five bam files.
Following the GATK Best Practices pipeline for joint variant calling from the BAM files, 421,194 variants were initially called in the raw '.vcf' file. The minimum Phred-scaled confidence threshold for calling was set to 30. This means a variant was called only at sites where the Phred-scaled confidence of the call was at least 30, i.e., an estimated error probability of at most 0.001. Additional filters further improve the accuracy of the call set. The following filters were applied in turn:
- read depth (DP) ≥ 20;
- mean genotype quality (GQ_MEAN) ≥ 20;
- variant quality (QUAL) ≥ 50;
- variant quality score recalibration (VQSR);
- retention of bi-allelic sites only.
After these steps, 183,159 variants remained in the filtered 'vcf' file. During VQSR, the best results were obtained with truth-sensitivity filter levels of 99.3% for SNPs and 95% for indels, which are comparable to those used for the ExAC data. The numbers of variants at each filtration step are summarized in Table 3.16.
Table 3.16: Numbers of variants after applying different filters
Filtering step | No. of variants | Novel variants | Novel (%) | Ti/Tv
Raw vcf file | 421194 | 12769 | 3.03 | 2.21
DP ≥ 20 | 251478 | 10030 | 3.99 | 2.23
DP ≥ 20, GQ_MEAN ≥ 20 | 233325 | 8862 | 3.80 | 2.26
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50 | 229423 | 7829 | 3.41 | 2.27
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR | 184461 | 2775 | 1.50 | 2.43
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic variants only | 183159 | 2670 | 1.46 | 2.43
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, SNPs only | 168438 | 2221 | 1.32 | 2.43
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, exonic (SNPs) | 38853 | 304 | 0.78 | 3.19
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, indels only | 14721 | 449 | 3.05 | -
DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, exonic (indels) | 405 | 13 | 3.21 | -
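The hard-filter steps in Table 3.16 (everything except VQSR, which is a GATK statistical model not reproduced here) can be sketched as below. It assumes each site has already been parsed from the vcf into a dictionary holding REF, ALT, DP, GQ_MEAN and QUAL. That layout, the function names and the example sites are hypothetical.

```python
def passes_hard_filters(site, min_dp=20, min_gq_mean=20, min_qual=50):
    """Apply the DP, GQ_MEAN and QUAL thresholds listed in Table 3.16."""
    return (site["DP"] >= min_dp
            and site["GQ_MEAN"] >= min_gq_mean
            and site["QUAL"] >= min_qual)


def is_biallelic(site):
    """Exactly one ALT allele at the site."""
    return "," not in site["ALT"]


def is_snp(site):
    return len(site["REF"]) == 1 and len(site["ALT"]) == 1


# Hypothetical parsed sites
sites = [
    {"REF": "A", "ALT": "G", "DP": 45, "GQ_MEAN": 60.0, "QUAL": 350.2},
    {"REF": "C", "ALT": "T", "DP": 12, "GQ_MEAN": 35.0, "QUAL": 90.0},
    {"REF": "G", "ALT": "A,T", "DP": 80, "GQ_MEAN": 70.0, "QUAL": 900.0},
    {"REF": "AT", "ALT": "A", "DP": 33, "GQ_MEAN": 41.0, "QUAL": 120.5},
]
kept = [s for s in sites if passes_hard_filters(s) and is_biallelic(s)]
print(len(kept), sum(is_snp(s) for s in kept))  # 2 1
```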
Applying GATK's variant quality score recalibration (VQSR) model considerably decreased the proportion of novel variants (from 3.41% to 1.50%). It also improved the Ti/Tv ratio from 2.27 to 2.43. The VQSR model is built on the annotation values of standard variants from HapMap and from the 1000 Genomes Project Phase-1 SNPs and indels. The overall Ti/Tv ratio for exonic plus non-exonic variants (2.43) exceeds the expected value of more than 2.0. Likewise, the Ti/Tv ratio for exonic variants was 3.19, which is within the acceptable range of > 2.80 (Bainbridge et al., 2011). The histogram of the depth of coverage (DP) was bimodal for SNPs, with peaks at 41 and 200, and unimodal for indels, with a peak at 41 (Figure 3.26).
Figure 3.26: Histogram for the depth of coverage for SNPs (A) and indels (B).
The final filtered set contained 181,111 bi-allelic variants, of which 39,258 (21.68%) were exonic. The large proportion of non-exonic variants (78.32%) reflects capture of the regions flanking the 60-Mb target by the broad-range enrichment kit (SureSelectXT Human All Exon v6; Agilent Technologies, Santa Clara, CA). The number of exonic variants was determined for each patient to check whether every individual contributed uniformly to the merged variant file. On average, there were ~20,308 (SD ± 450) exonic variants per individual. This is comparable to the number per individual reported for European Americans in earlier large-scale whole exome sequencing studies (Bamshad et al., 2011).
Table 3.17: Numbers of variants per sample after filtering
Sample | Total variants | Total SNPs | Total indels | Exonic SNPs | Exonic indels | Novel in total (%)
MS1_AJK | 99701 | 91539 | 8162 | 20909 | 251 | 644 (0.65)
MS2_SND | 94406 | 86787 | 7619 | 19845 | 241 | 626 (0.66)
MS3_BLC | 95960 | 88273 | 7687 | 20652 | 262 | 572 (0.60)
MS4_PNJ | 95408 | 87698 | 7710 | 20058 | 251 | 573 (0.60)
MS5_URD | 94559 | 86709 | 7850 | 20077 | 265 | 685 (0.72)
Average | 96007 | 88201 | 7806 | 20308 | 254 | 620
SD (±) | 2160.24 | 1976.48 | 216.23 | 449.94 | 9.64 | 48.35
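The summary rows of Table 3.17 can be checked with Python's statistics module. The reported SD matches the sample standard deviation (n - 1 denominator) rather than the population SD. The sketch below uses the exonic SNP counts from the table.

```python
from statistics import mean, stdev

# Exonic SNPs per patient, from Table 3.17
exonic_snps = {"MS1_AJK": 20909, "MS2_SND": 19845, "MS3_BLC": 20652,
               "MS4_PNJ": 20058, "MS5_URD": 20077}
values = list(exonic_snps.values())

# statistics.stdev uses the n - 1 (sample) denominator
print(f"mean = {mean(values):.1f}, SD = {stdev(values):.2f}")  # mean = 20308.2, SD = 449.94
```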
3.3.4 Annotation of Single Nucleotide Variants (SNVs) and Analysis
SNV annotation was performed with three tools: ANNOVAR, CADD, and VEP. The numbers of variants and their corresponding genes/regions were also evaluated, as described in the following sections.
3.3.4.1 Annotation with ANNOVAR and CADD
ANNOVAR annotates variants with the following information:
- gene-based and region-based features;
- evolutionary conservation scores;
- prediction scores from SIFT, Polyphen2, and MutationAssessor;
- allele frequencies in different databases.
A summary of the SNVs from the ANNOVAR annotation is given in Table 3.18. The individual from Azad Jammu Kashmir (AJK) carried a slightly larger number of SNVs in each category.
Table 3.18: The number of SNVs pertaining to different genomic regions and functions after annotation with ANNOVAR.
Annotation AJK BLC PNJ SND URD Total
Exonic 20332 19265 20079 19496 19522 38203
Intronic 51295 49009 49076 49841 48562 94863
Intergenic 7707 6975 7310 6903 7244 13669
Upstream 1217 1151 1191 1135 1136 2222
Downstream 509 443 484 483 478 947
UTR5' 1822 1694 1772 1634 1753 3361
UTR3' 2918 2724 2851 2814 2724 5402
Synonymous 10729 10113 10634 10309 10302 19740
Nonsynonymous 9402 8963 9233 8984 9025 18049
Nonsyn/syn ratio 0.88 0.89 0.87 0.87 0.88 0.91
ANNOVAR annotation with RefSeq data showed that the SNVs correspond to 23,477 genes. On average, each person carried 9,121.4 (SD ± 190) nonsynonymous mutations. The nonsynonymous-to-synonymous (Nonsyn/syn) ratio was almost constant (~0.9) across individuals. This matches the ratio of 0.90 reported for South Asian populations in the 1000 Genomes Project (1000 Genomes Project, 2015).

The Nonsyn/syn ratio broadly indicates which genes have been mutated to encode proteins with altered amino acids. Genes harboring more nonsynonymous than synonymous variants may contribute more to disease susceptibility or onset. There were 8,416 genes containing nonsynonymous mutations. Their Nonsyn/syn ratio was considerably higher than the overall ratio (1.40 versus 0.91). Part of this difference is expected by construction, because genes carrying only synonymous variants are excluded from this subset. It is also consistent with reports that most rare missense mutations are deleterious, disrupting protein function and causing disease (Kathiresan, and Srivastava, 2012).
To identify the genes carrying the most amino acid-changing mutations, the genes were sorted in descending order of their number of nonsynonymous mutations. This showed that 4,041 genes contained more than one nonsynonymous mutation, and 116 genes contained ≥ 10.

MUC16 (mucin 16, cell surface associated) was the top candidate, with 74 missense mutations in total. This gene encodes carbohydrate antigen 125 (CA-125), a known tumor marker for ovarian cancer. CA-125 levels have also been found elevated in patients with hypertrophic cardiomyopathy leading to heart failure (Varol et al., 2007). In a longitudinal study, the CA-125 level was significantly increased in patients with major adverse cardiovascular events (MACE), including left ventricular dysfunction (Betti, Ballo, Barchielli, and Zuppiroli, 2010).

ZNF717 (zinc finger protein 717) had the second-highest number of missense mutations (68). As the name suggests, this gene encodes a zinc finger protein; such proteins regulate many genes involved in cell proliferation, differentiation, and apoptosis. A literature survey did not reveal any previous association of this gene with cardiomyopathies or any other cardiac disorder.

The third-ranked gene was MUC3A (mucin 3A, cell surface associated), which encodes membrane-bound and secreted epithelial glycoproteins. This gene has also not been reported to be associated with any cardiac disorder. TTN (titin, a structural protein of striated muscle) is a well-studied and well-characterized cardiomyopathy gene; it ranked fourth in this study with 49 missense mutations. The top 1% of genes carrying the highest number of missense mutations are listed in Table 3.19.
Table 3.19: The top 1% genes containing nonsynonymous mutations.
Gene Nonsyn mutations Gene Nonsyn mutations
MUC16 74 ALPK2 18
ZNF717 68 CFAP46 18
MUC3A 53 DNHD1 18
TTN 49 MALRD1 18
HYDIN 37 MUC22 18
OBSCN 35 PARP4 18
CMYA5 30 SVEP1 18
SSPO 29 XIRP2 18
CCDC168 28 DNAH14 17
FSIP2 26 RP1L1 17
PDE4DIP 26 ADGRV1 16
MKI67 23 DCHS2 16
AHNAK2 22 FAT1 16
OR4C3 22 NEB 16
SPTBN5 22 PKD1L2 16
SYNE2 21 FBN3 15
MUC4 20 LAMA5 15
C1orf167 19 RNF213 15
EYS 19 USH2A 15
KCNJ12 19 VWDE 15
ZAN 19 -- -
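The ranking behind Table 3.19 can be sketched with a counter over gene symbols. It assumes one gene symbol per nonsynonymous SNV, for example taken from an ANNOVAR gene column. The input and function name are hypothetical.

```python
from collections import Counter


def rank_genes(gene_per_nonsyn_snv):
    """Rank genes by their number of nonsynonymous SNVs.

    Input: one gene symbol per nonsynonymous SNV.
    Returns the ranked list plus the counts of genes with > 1 and >= 10 hits.
    """
    ranked = Counter(gene_per_nonsyn_snv).most_common()
    more_than_one = sum(1 for _, n in ranked if n > 1)
    ten_or_more = sum(1 for _, n in ranked if n >= 10)
    return ranked, more_than_one, ten_or_more


# Hypothetical input: GENE_A hit 12 times, GENE_B 3 times, GENE_C once
ranked, multi, ten_plus = rank_genes(["GENE_A"] * 12 + ["GENE_B"] * 3 + ["GENE_C"])
print(ranked[0], multi, ten_plus)  # ('GENE_A', 12) 2 1
```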
Overall, CADD predicted 11,246 variants as deleterious (CADD_phred score ≥ 15), SIFT predicted 4,424, and Polyphen2 predicted 3,058. The predictions of the three tools partially overlapped (Figure 3.27). A total of 1,469 SNVs were predicted deleterious by all three tools, of which 325 were homozygous. Of these, 41 were novel, all heterozygous. Novel variants thus made up 2.79% of these SNVs, higher than their overall proportion (Table 3.16). This is in line with previous reports that deleterious mutations that disrupt protein function and cause disease are usually rare in populations (Kathiresan, and Srivastava, 2012).

CADD predicted far more deleterious variants than SIFT and Polyphen2 because it also scores non-coding variants, whereas SIFT and Polyphen2 only predict the effect of missense coding variants. Polyphen2 predicted the fewest SNVs as deleterious.
Figure 3.27: Venn diagram showing the number of SNVs predicted as deleterious by SIFT, Polyphen2, and with CADD_phred score ≥ 15.
The deleterious variants were also determined for each patient in this study, with each of the three tools separately and in combination (Figures 3.28 to 3.31). Overall, the DCM patient from Punjab (PNJ) carried the largest number of deleterious variants, while the patient from Sindh (SND) carried the largest number of homozygous deleterious variants. There were 84 SNVs predicted as deleterious by all three tools and present in all five patients, of which 10 were homozygous (Figure 3.31). These 10 homozygous deleterious SNVs were each located in a different gene (Table 3.20). Notably, none of the genes carrying homozygous deleterious SNVs in all five patients was among the top 1% of genes with the most missense mutations. This suggests that the two criteria capture different aspects of the association between deleterious variants and disease.
Figure 3.28: The SNVs predicted as deleterious by SIFT.
Figure 3.29: The SNVs predicted as deleterious by Polyphen2.
Figure 3.30: The SNVs with CADD_phred score ≥ 15.
Figure 3.31: SNVs predicted as deleterious by all three tools: CADD (phred score ≥ 15), SIFT, and Polyphen2.
Table 3.20: The homozygous deleterious SNVs present in all five patients of this study.
CHR POS ID REF ALT GENE Substitution
chr1 248059476 rs12139390 A C OR2W3 exon1: A588C
chr3 44948479 rs3749195 C T TGM4 exon10: C1114T
chr3 50597092 rs1034405 G A C3orf18 exon5: C485T
chr4 17643848 rs2286771 G A FAM184B exon13: C2350T
chr9 6328947 rs3847262 T C TPD52L3 exon1: T352C
chr11 5758062 rs7397032 T C OR56B1 exon1: T316C
chr17 17896205 rs4584886 C T DRC3 exon6: C571T
chr17 39659913 rs9891361 G A KRT13 exon2: C560T
chr18 61654463 rs3826616 A G SERPINB8 exon6: A530G
To determine whether SNVs of all frequencies contribute equally to the deleterious pool, the site frequency spectrum (SFS) of all SNVs was compared with that of the deleterious SNVs. Here, SNVs with CADD_phred score ≥ 15 were treated as deleterious, because CADD predicts the effects of a larger number of both coding and non-coding SNVs. This analysis showed that singletons contributed most to the deleterious pool: their proportion increased from 0.318 (all SNVs) to 0.462 (deleterious SNVs). The proportion of doubletons increased slightly, from 0.157 to 0.164. In contrast, the proportions of all other allele-count classes (AC=3 to AC=10) decreased in the deleterious pool (Figure 3.32). The SFS is a simple yet powerful way to study population demography from the perspectives of medical genetics and evolutionary history. The greater proportion of individual-specific deleterious singletons observed here is consistent with the view that rare variants tend to have more severe deleterious effects and therefore contribute more to the susceptibility to, and onset of, complex diseases in humans (Bomba, Walter, and Soranzo, 2017; Kryukov, Pennacchio, and Sunyaev, 2007). These rare deleterious variants were also individual-specific, because such variants are under strong selective pressure and tend to remain individual- or population-specific (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). The few deleterious fixed or nearly fixed SNVs (AC=8 to AC=10) in these patients may have only moderate effects on fitness, yet may have increased in frequency because of less efficient purifying selection during population expansion, or because of some form of balancing selection, as demonstrated in larger datasets from different non-African populations (Henn, Botigué, Bustamante, Clark, and Gravel, 2015; Subramanian, 2016).
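The SFS comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: it assumes each SNV has already been reduced to its alternate-allele count across the five diploid patients (AC=1 to AC=10) and its CADD_phred score, and the toy records are hypothetical rather than study data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

N_CHROMOSOMES = 10      # five diploid patients, so allele counts run from 1 to 10
CADD_THRESHOLD = 15.0   # CADD_phred cut-off used to define the deleterious pool


def site_frequency_spectrum(allele_counts: Iterable[int], n: int = N_CHROMOSOMES) -> Dict[int, float]:
    """Proportion of SNVs falling in each allele-count class AC=1..n."""
    counts = Counter(allele_counts)
    total = sum(counts[ac] for ac in range(1, n + 1))
    return {ac: counts[ac] / total for ac in range(1, n + 1)}


def compare_sfs(snvs: Iterable[Tuple[int, float]], threshold: float = CADD_THRESHOLD):
    """Return (SFS of all SNVs, SFS of SNVs with CADD_phred >= threshold).

    Each SNV is given as an (allele_count, cadd_phred) tuple.
    """
    snvs = list(snvs)
    all_sfs = site_frequency_spectrum(ac for ac, _ in snvs)
    del_sfs = site_frequency_spectrum(ac for ac, phred in snvs if phred >= threshold)
    return all_sfs, del_sfs


# Hypothetical toy records: (allele count across the five patients, CADD_phred)
example = [(1, 22.0), (1, 16.3), (1, 3.1), (2, 18.0), (2, 5.0), (3, 1.2), (5, 0.8), (10, 15.5)]
all_sfs, del_sfs = compare_sfs(example)
print(f"singletons: {all_sfs[1]:.3f} of all SNVs vs {del_sfs[1]:.3f} of deleterious SNVs")
# singletons: 0.375 of all SNVs vs 0.500 of deleterious SNVs
```

An enrichment of the AC=1 class in the deleterious spectrum, as in the toy output, is the pattern reported for the patients here.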
Figure 3.32: Site frequency spectrum of all SNVs (A) and deleterious SNVs (B). The proportion of singletons is considerably higher among deleterious SNVs, whereas the proportions of higher allele-count classes are lower than in the pool of all SNVs.
[Figure 3.32: panels A and B; y-axis, Proportion (0.00 to 0.35)]
Cardiomyopathies are rare genetic disorders with a global prevalence of 1 in 2500 individuals. In light of this, it was hypothesized that rare variants would contribute more to susceptibility to this disease. The SNVs jointly predicted as deleterious by CADD, SIFT, and PolyPhen-2 were therefore filtered by their global allele frequency in the 1000 Genomes Project. This filtering highlighted 350 SNVs with a global minor allele frequency (MAF) < 0.01 (i.e., < 1%). Of these 350 SNVs, 19 were homozygous in at least one of the patients; 18 of them were nonsynonymous SNPs that change an amino acid in the encoded protein, while one was intergenic (Table 3.21).
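The frequency and genotype filter can be illustrated with a small Python sketch. It assumes the jointly predicted deleterious SNVs are already available as records carrying a 1000 Genomes global MAF and one VCF-style genotype per patient; the `Variant` class, the identifiers, and the choice to drop variants without a reported MAF are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RARE_MAF = 0.01  # global MAF threshold (< 1%) from the 1000 Genomes Project


@dataclass
class Variant:
    variant_id: str
    gene: str
    global_maf: Optional[float]  # None when absent from the reference panel (assumption: dropped)
    genotypes: List[str]         # one VCF-style genotype per patient, e.g. "0/1", "1/1"


def is_homozygous_alt(gt: str) -> bool:
    """True for a homozygous non-reference call such as '1/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", ".")


def rare_homozygous(variants: List[Variant], maf_cutoff: float = RARE_MAF) -> Tuple[List[Variant], List[Variant]]:
    """Return (all rare variants, rare variants homozygous in at least one patient)."""
    rare = [v for v in variants if v.global_maf is not None and v.global_maf < maf_cutoff]
    homozygous = [v for v in rare if any(is_homozygous_alt(gt) for gt in v.genotypes)]
    return rare, homozygous


# Hypothetical records for five patients
variants = [
    Variant("snv_a", "GENE_A", 0.002, ["0/0", "1/1", "0/1", "0/0", "0/0"]),
    Variant("snv_b", "GENE_B", 0.004, ["0/1", "0/0", "0/0", "0/1", "0/0"]),
    Variant("snv_c", "GENE_C", 0.25, ["1/1", "1/1", "0/1", "1/1", "0/0"]),
    Variant("snv_d", "GENE_D", None, ["1|1", "0/0", "0/0", "0/0", "0/0"]),
]
rare, hom = rare_homozygous(variants)
print(len(rare), [v.variant_id for v in hom])  # 2 ['snv_a']
```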
Table 3.21: The homozygous deleterious SNVs with Global MAF < 1%.
CHR POS ID REF ALT Gene Effect
4 184170008 rs574720107 G A WWC2 exon7:p.G292R
6 24520719 rs115784602 G A ALDH5A1 exon6:p.V321M
6 33132114 rs146555195 C T COL11A2 exon62:p.R1560H exon63:p.R1581H exon65:p.R1667H
6 160557271 rs4646278 C T SLC22A1 exon5:p.R287W
8 99135575 rs145484648 G A POP1 exon2:p.A4T
8 133984968 rs115436575 G A TG exon34:p.G2061R
9 95179071 rs34607425 G A OMD exon2:p.P257L
10 13237117 rs74881009 C T MCM10 exon14:p.R608W exon14:p.R609W
10 24762320 rs571833641 C G KIAA1217 exon2:p.S55C exon6:p.S337C exon7:p.S257C
10 71144670 rs554507867 C T HK1 exon12:p.A613V exon15:p.A617V exon16:p.A601V
11 107375422 rs117249984 C A ALKBH8 exon12:p.D656Y
12 30906393 rs573013586 C T CAPRIN2 exon1:p.S102N
12 129299599 rs144816528 G A SLC15A4 exon2:p.P188L
12 133202740 rs5745068 C T POLE exon46:p.R2165H
16 67235477 rs183146864 G A ELMO3 exon10:p.V337I
19 18502861 rs34666550 C T LRRC25 exon2:p.C285Y
19 33298507 rs550817829 C T TDRD12,SLC7A9 intergenic
19 46332369 rs141706016 C T SYMPK exon14:p.R615H
20 2796251 rs576728084 G A C20orf141 exon2:p.G110S exon3:p.G110S
Comparing the allele frequencies of deleterious variants helps in understanding the mutational load of diseases in different populations, because populations differ in their genetic makeup depending on their past evolutionary histories (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). Here, the derived states of the 350 rare SNVs were determined from the online CADD annotation, and their derived allele frequencies (DAF) in the Global and South Asian populations were retrieved and compared in a simple x-y scatter plot. This analysis showed that 278 of the 350 SNVs had a higher derived allele frequency in South Asia than in the Global populations (Figure 3.33).
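A sketch of the derived-allele comparison follows. It assumes the ancestral allele of each site is known (here the derived state was taken from the CADD annotation) and that alternate-allele frequencies for the Global and South Asian populations have already been retrieved; the records are hypothetical. The Pearson correlation is written out by hand to keep the example free of dependencies.

```python
import math
from typing import List, Tuple


def derived_allele_frequency(alt_freq: float, ancestral: str, ref: str, alt: str) -> float:
    """Convert an alternate-allele frequency into a derived-allele frequency.

    If the ancestral allele is the ALT allele, the REF allele is the derived one,
    so its frequency is 1 - alt_freq.
    """
    if ancestral == alt:
        return 1.0 - alt_freq
    if ancestral == ref:
        return alt_freq
    raise ValueError("ancestral allele matches neither REF nor ALT")


def pearson_r(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_populations(global_daf: List[float], sas_daf: List[float]) -> Tuple[int, float]:
    """Count SNVs with higher DAF in South Asia and return the correlation between the two."""
    higher = sum(1 for g, s in zip(global_daf, sas_daf) if s > g)
    return higher, pearson_r(global_daf, sas_daf)


# Hypothetical records: (global ALT freq, South Asian ALT freq, ancestral, REF, ALT)
records = [
    (0.004, 0.010, "G", "G", "A"),
    (0.006, 0.020, "C", "C", "T"),
    (0.995, 0.990, "T", "C", "T"),  # ALT is ancestral, so REF is the derived allele
    (0.008, 0.002, "A", "A", "G"),
]
global_daf = [derived_allele_frequency(g, anc, ref, alt) for g, _, anc, ref, alt in records]
sas_daf = [derived_allele_frequency(s, anc, ref, alt) for _, s, anc, ref, alt in records]
higher_in_sas, r = compare_populations(global_daf, sas_daf)
print(f"{higher_in_sas} of {len(records)} SNVs have higher DAF in South Asia; r = {r:.2f}")
```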
Figure 3.33: Scatter plot of the derived allele frequencies of the 350 deleterious SNVs in South Asian and Global populations. The correlation coefficient (r) was 0.72 (regression line, blue), indicating a slight skew of derived alleles towards South Asia. SNVs to the right of the diagonal (black line) have a higher allele frequency in South Asia than in Global populations.
3.3.4.2 Annotation with Variant Effect Predictor (VEP)
The genome of a healthy individual contains about 100 Loss of Function (LoF) variants, of which ~20 are homozygous. LoF variants include stop-gained, stop-lost, start-lost, splice-donor, splice-acceptor, and frameshift insertion/deletion variants. Genes completely knocked out by LoF variants help in understanding the function of those genes and thereby in predicting their relevance to particular diseases (Borger, 2017; Kaiser, 2014; MacArthur et al., 2012).
The variants under study were also annotated with Ensembl's Variant Effect Predictor (VEP), version 87. Overall, 696 LoF SNPs with a 'HIGH' predicted impact on the transcript were identified. The most frequent LoF SNPs were those that introduce a stop codon and cause premature termination of translation, termed protein-truncating variants (PTVs). The LoF SNPs are shown by functional consequence in Figure 3.34.
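The tally of LoF SNPs by consequence can be sketched as below. This assumes the VEP output has already been parsed into one dictionary per variant-transcript pair with `id`, `IMPACT`, and `Consequence` fields (VEP may report several comma-separated Sequence Ontology terms for one consequence). The severity order follows VEP's ranking of these terms; counting each variant once, under its most severe LoF term, is an assumption about how Figure 3.34 was built.

```python
from collections import Counter
from typing import Dict, Iterable

# LoF consequence terms, ordered from most to least severe as ranked by VEP
LOF_TERMS = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
]


def count_lof_consequences(records: Iterable[Dict[str, str]]) -> Counter:
    """Count HIGH-impact variants by their most severe LoF consequence (each variant once)."""
    best: Dict[str, int] = {}
    for rec in records:
        if rec["IMPACT"] != "HIGH":
            continue
        terms = rec["Consequence"].split(",")
        for rank, term in enumerate(LOF_TERMS):
            if term in terms:
                current = best.get(rec["id"])
                if current is None or rank < current:
                    best[rec["id"]] = rank
                break  # first match is the most severe term for this record
    return Counter(LOF_TERMS[rank] for rank in best.values())


# Hypothetical parsed VEP records (variant-transcript pairs)
records = [
    {"id": "v1", "IMPACT": "HIGH", "Consequence": "stop_gained"},
    {"id": "v1", "IMPACT": "MODERATE", "Consequence": "missense_variant"},
    {"id": "v2", "IMPACT": "HIGH", "Consequence": "splice_donor_variant,intron_variant"},
    {"id": "v3", "IMPACT": "HIGH", "Consequence": "start_lost"},
    {"id": "v4", "IMPACT": "LOW", "Consequence": "synonymous_variant"},
    {"id": "v5", "IMPACT": "HIGH", "Consequence": "stop_gained"},
]
counts = count_lof_consequences(records)
print(dict(counts))
```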
Figure 3.34: Numbers of Loss of Function SNPs according to functional consequences.
[Figure 3.34 values as extracted: 131, 127, 88, 81, and 269 (total 696); categories: splice_donor, splice_acceptor, stop_lost, start_lost, stop_gained]
These LoF SNPs mapped to 655 genes, 31 of which contained more than one LoF mutation. MUC19 was the top candidate gene, with 5 LoF mutations (3 stop-gained, 1 splice-acceptor, and 1 splice-donor). MUC19 belongs to the same family as MUC16, which was the top candidate gene with the highest number of nonsynonymous SNPs in this study (Table 3.20). MUC19 encodes a gel-forming mucin protein and has previously been associated with Sjögren syndrome (D. Yu et al., 2008). The genes with the second-highest number of LoF mutations (4) were ZNF717 and PKD1L2. Notably, ZNF717 was also the candidate gene with the highest number of nonsynonymous mutations in this analysis (Table 3.20). Further, there were 15 novel SNPs with a loss-of-function consequence, one of which was homozygous. This novel homozygous stop-gained SNP in FGD3 (chr9: 95778067_C/T) was found in one patient (Sindhi). FGD3 encodes the 'FYVE, RhoGEF and PH domain containing 3' protein, which is involved in neurotrophin receptor p75 (NTR) signaling pathways. Mutations in this gene have not previously been reported in association with any cardiac disorder. In total, 86 homozygous LoF SNP sites were found. On average, each person carried ~34.6 homozygous Loss of Function SNPs, which is slightly higher than the number reported for a healthy individual's genome (MacArthur and Tyler-Smith, 2010).
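The per-person homozygous count can be illustrated with a short sketch. A mean of ~34.6 over five patients corresponds to about 173 homozygous LoF genotype calls in total, distributed over 86 sites, because many sites are homozygous in more than one patient. The sketch assumes genotypes are available as a hypothetical mapping from variant ID to one VCF-style genotype per person, in a fixed patient order.

```python
from typing import Dict, List, Tuple


def is_hom_alt(gt: str) -> bool:
    """True for a homozygous non-reference call such as '1/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", ".")


def homozygous_summary(genotypes: Dict[str, List[str]]) -> Tuple[int, float, List[int]]:
    """Return (sites homozygous in >= 1 person, mean homozygous calls per person, per-person counts)."""
    n_people = len(next(iter(genotypes.values())))
    per_person = [0] * n_people
    sites = 0
    for gts in genotypes.values():
        hom = [is_hom_alt(gt) for gt in gts]
        if any(hom):
            sites += 1  # site counted once, however many people are homozygous
        for i, flag in enumerate(hom):
            per_person[i] += int(flag)
    return sites, sum(per_person) / n_people, per_person


# Hypothetical LoF genotypes for five patients
genotypes = {
    "lof_1": ["1/1", "1/1", "0/1", "1/1", "0/0"],
    "lof_2": ["0/1", "0/0", "0/0", "0/0", "1/1"],
    "lof_3": ["0/1", "0/1", "0/0", "0/0", "0/0"],
}
sites, mean_hom, per_person = homozygous_summary(genotypes)
print(sites, mean_hom, per_person)  # 2 0.8 [1, 1, 0, 1, 1]
```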
Since rare deleterious variants are thought to contribute more to complex, multigenic, and Mendelian disorders (Bomba, Walter, and Soranzo, 2017), the allele frequencies of the LoF SNVs were determined. For this, their global and South Asian derived allele frequencies (DAF) were retrieved. The allele frequency spectrum based on the South Asian DAF showed that only 45 SNVs were rare (DAF < 1%) and 65 had low frequency (1% ≤ DAF ≤ 5%), while the majority, 471 SNVs, were common (DAF > 5%). Surprisingly, the common class also included 44 fixed or nearly fixed SNVs with DAF > 80% (Figure 3.35 A). Given the large numbers of common, nearly fixed, and fixed LoF SNVs, these were characterized by their evolutionary conservation, as suggested earlier (MacArthur et al., 2012). Genomic evolutionary rate profiling (GERP++) scores were retrieved through ANNOVAR annotation. GERP++ estimates the evolutionary rate at every position by maximum likelihood and scores each position according to its selective constraint (Davydov et al., 2010). Here, the average GERP++ score of all LoF SNVs was quite low, i.e., 0.65 (Figure 3.35 B), implying that the LoF SNVs mostly lie in evolutionarily less conserved regions. In fact, most genes with more than one LoF SNV had negative GERP++ scores or scores < 2.00; e.g., the maximum GERP++ score among ZNF717 variants was 1.29. However, 84 LoF SNVs had very high GERP++ scores (> 4.00), corresponding to evolutionarily constrained regions. Variants at such constrained sites are expected to have greater deleterious effects on fitness. These included two LoF SNVs in MUC19, the gene with the highest overall number of LoF SNVs, with GERP++ scores of 5.69 and 6.16.
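The frequency binning and conservation summary above can be sketched together. The thresholds follow the text (rare DAF < 1%, low 1% ≤ DAF ≤ 5%, common DAF > 5%, nearly fixed DAF > 80%, constrained GERP++ > 4.00); GERP++ rejected-substitution (RS) scores are positive at constrained positions and negative where a position evolves faster than expected under neutrality. The input tuples are hypothetical, and skipping variants that lack a GERP++ score is an assumption.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Tuple


def daf_class(daf: float) -> str:
    """Bin a derived allele frequency using the thresholds in the text."""
    if daf < 0.01:
        return "rare"     # DAF < 1%
    if daf <= 0.05:
        return "low"      # 1% <= DAF <= 5%
    return "common"       # DAF > 5%


def summarise_lof(snvs: List[Tuple[float, Optional[float]]]):
    """snvs: (South Asian DAF, GERP++ RS score or None) for each LoF SNV."""
    classes = Counter(daf_class(daf) for daf, _ in snvs)
    nearly_fixed = sum(1 for daf, _ in snvs if daf > 0.80)
    scores = [g for _, g in snvs if g is not None]
    constrained = sum(1 for g in scores if g > 4.0)
    return classes, nearly_fixed, mean(scores), constrained


# Hypothetical LoF SNVs: (DAF in South Asia, GERP++ RS)
example = [(0.004, 6.0), (0.03, -1.0), (0.40, 0.5), (0.92, 1.5), (0.85, None)]
classes, nearly_fixed, gerp_mean, n_constrained = summarise_lof(example)
print(dict(classes), nearly_fixed, gerp_mean, n_constrained)
# {'rare': 1, 'low': 1, 'common': 3} 2 1.75 1
```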
Figure 3.35: Loss of Function (LoF) SNVs. (A) Allele frequency spectrum of all LoF SNVs in South Asia. (B) Genomic evolutionary rate profiling (GERP++) scores of LoF SNVs.
To identify the LoF SNVs most likely to disrupt protein structure and function, and hence fitness, the ratio of affected transcripts to total transcripts was determined for each gene carrying a LoF SNV. This is because a LoF SNV may affect only a small subset of the transcripts of a gene that undergoes extensive alternative splicing, which can mask the effect of the observed LoF variant. The positions of the LoF SNVs were also noted, because mutations that introduce a stop codon near the 3'-end of the mature transcript would not affect the protein as severely as those truncating it within the central domains of its structure. This analysis highlighted six genes, IFNE, MAGEE2, OR4P4, OR5AR1, PTCHD3P2, and SIX1, each of which has only one transcript, and that transcript is affected by the LoF SNV (Table 3.22). Of these, SIX1 has been reported to be associated with dilated cardiomyopathy. This gene encodes a homeobox protein involved in the transcriptional regulation of genes that take part in the development of several organs, including muscle, kidney, and inner ear (Tschirner et al., 2014; Williams et al., 2011). The other five genes have not previously been reported to be associated with cardiomyopathies.
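The transcript-ratio and position filter can be sketched as follows. No numeric cut-offs are stated for this step, so the requirement that every transcript be affected and the 95% relative-position limit below are hypothetical choices, as are the gene names and positions; the fields are assumed to come from a parsed VEP annotation.

```python
from dataclasses import dataclass
from typing import List

MIN_TRANSCRIPT_FRACTION = 1.0   # hypothetical: require every transcript of the gene to be hit
MAX_RELATIVE_POSITION = 0.95    # hypothetical: ignore truncations in the last 5% of the protein


@dataclass
class GeneLoF:
    gene: str
    affected_transcripts: int
    total_transcripts: int
    protein_position: int   # residue at which the LoF variant acts
    protein_length: int

    @property
    def transcript_fraction(self) -> float:
        return self.affected_transcripts / self.total_transcripts

    @property
    def relative_position(self) -> float:
        return self.protein_position / self.protein_length


def prioritise(hits: List[GeneLoF],
               min_fraction: float = MIN_TRANSCRIPT_FRACTION,
               max_position: float = MAX_RELATIVE_POSITION) -> List[str]:
    """Genes whose LoF variant hits enough transcripts and lies away from the C-terminal end."""
    return sorted(
        h.gene for h in hits
        if h.transcript_fraction >= min_fraction and h.relative_position <= max_position
    )


hits = [
    GeneLoF("GENE_A", 1, 1, 50, 300),    # single transcript, early truncation: kept
    GeneLoF("GENE_B", 1, 4, 40, 500),    # only 1 of 4 transcripts affected: dropped
    GeneLoF("GENE_C", 2, 2, 295, 300),   # all transcripts, but near the C-terminus: dropped
    GeneLoF("GENE_D", 3, 3, 10, 100),    # all transcripts, early truncation: kept
]
print(prioritise(hits))  # ['GENE_A', 'GENE_D']
```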
Table 3.22: The LoF SNVs affecting all transcripts of their genes.
CHR POS ID REF ALT Gene State Effect
chr2 170624221 rs2114646 C T PTCHD3P2 Heterozygous splice-donor
chr9 21481483 rs2039381 G A IFNE Homozygous stop-gained
chr11 55406022 rs76160133 C G OR4P4 Homozygous stop-gained
chr11 56431216 rs11228710 C T OR5AR1 Homozygous stop-gained
chr14 61124940 rs10143202 A G SIX1 Homozygous start-lost
chrX 75004529 rs1343879 C A MAGEE2 Heterozygous stop-gained
An x-y scatter plot of Global versus South Asian DAF for the globally rare LoF SNVs showed a skew of alleles towards South Asia, i.e., these alleles have higher frequencies in South Asia than in other populations of the world.
3.3.5 Annotation of Small Indels and Analysis
Currently, only a limited number of tools can predict the effects of indels. Here, two tools, CADD and Ensembl's VEP, were used to annotate the indels.
3.3.5.1 Annotation with CADD
The indels were annotated with the online CADD v1.3 web server using the full-annotation option. There were 935 indels with CADD_phred score ≥ 15. These deleterious indels mapped to 772 genes, 67 of which had more than one deleterious indel. ZNF717 was the top gene, with 14 indels with CADD_phred score ≥ 15, followed by SARM1 with 6 such indels. The consequences of all indels with CADD_phred score ≥ 15 are shown in Figure 3.36.
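CADD's phred-scaled score is defined from the rank of a variant's raw C-score among all possible single-nucleotide substitutions of the reference genome, as -10·log10(rank fraction), so a threshold of 15 corresponds to roughly the top 3.2% of that ranking (20 would be the top 1%). The sketch below converts a phred score back to that fraction and, on hypothetical (gene, CADD_phred) pairs, counts genes carrying more than one indel above the threshold, the same kind of tally that placed ZNF717 and SARM1 at the top.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phred_to_top_fraction(phred: float) -> float:
    """Fraction of the CADD ranking that scores at or above this phred value."""
    return 10 ** (-phred / 10)


def genes_with_multiple_hits(indels: Iterable[Tuple[str, float]], threshold: float = 15.0) -> Dict[str, int]:
    """Count indels with CADD_phred >= threshold per gene and keep genes with more than one."""
    counts = Counter(gene for gene, phred in indels if phred >= threshold)
    return {gene: n for gene, n in counts.items() if n > 1}


print(f"phred 15 -> top {phred_to_top_fraction(15):.2%}")  # phred 15 -> top 3.16%
example = [("GENE_A", 24.1), ("GENE_A", 15.0), ("GENE_B", 33.0),
           ("GENE_C", 12.0), ("GENE_C", 18.2), ("GENE_C", 9.9)]
print(genes_with_multiple_hits(example))  # {'GENE_A': 2}
```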
Figure 3.36: Functional consequences of indels with CADD_phred ≥ 15.
[Figure 3.36 values as extracted: 346, 20, 80, 1, 10, 8, 85, and 87; categories: frameshift, inframe_insertion, inframe_deletion, stop_gained, splice_donor, splice_acceptor, downstream, upstream]
3.3.5.2 Annotation with VEP
Annotation of the small indels (insertions/deletions) with the same version of Ensembl's VEP revealed 557 indels with a 'HIGH' impact on the transcript, 19 of which were novel. These LoF indels mapped to 488 genes, including 48 genes with more than one LoF indel. Again, ZNF717 was the top gene, with 14 LoF indels, followed by SARM1 with 6 LoF mutations. The majority of the indels were frameshift (92%), but there were also a few stop-gained, splice-donor, and splice-acceptor indels (Figure 3.37). In total, 270 homozygous LoF indels were found, and no novel homozygous LoF indel was observed in this analysis.
Figure 3.37: Loss of Function indels according to functional consequences.
[Figure 3.37 values as extracted: 513, 6, 25, and 27; categories: frame_shift, stop_gained, splice_donor, splice_acceptor]
3.3.6 Filtration of Variants Using ClinVar, OMIM, and GWAS Databases
Filtering the variants by pathogenicity in the ClinVar database identified one SNV, rs1008642, associated with cardiomyopathies and long QT syndrome. This missense SNV lies in the ssu-2 homolog (C. elegans) gene (SSUH2), which is involved in heat shock protein binding and unfolded protein binding. The SNV was found in two of the patients in heterozygous form. Its allele frequency in the Pakistani population was 17.8% according to the 1000 Genomes Project database, while its global allele frequency was 37.1%, with the largest contribution from African populations (62.1%). No variant from the GWAS Catalog or the OMIM database was found in the patients. This likely reflects the lack of genetic research on cardiomyopathies in the Pakistani population and its under-representation in medical genetics databases.
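The ClinVar screen can be sketched as a filter over ClinVar VCF records. This assumes the patients' variants were intersected with the ClinVar VCF, whose INFO field carries the clinical significance (`CLNSIG`) and disease names (`CLNDN`); the keyword list, the accepted significance classes, and the example records are hypothetical, since the exact criteria used in this study are not stated.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

CARDIAC_KEYWORDS = ("cardiomyopathy", "long_qt")                 # hypothetical disease filter
SIGNIFICANCE = frozenset({"pathogenic", "likely_pathogenic"})     # hypothetical significance filter


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string 'K1=V1;K2=V2;FLAG' into a dict (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def significance_terms(clnsig: str) -> Set[str]:
    """Break a CLNSIG value such as 'Pathogenic/Likely_pathogenic' into lower-case terms."""
    return {t.strip("_").lower() for t in re.split(r"[/,|]", clnsig) if t.strip("_")}


def clinvar_cardiac_hits(rows: Iterable[Tuple[str, str]],
                         keywords=CARDIAC_KEYWORDS,
                         significance=SIGNIFICANCE) -> List[str]:
    """IDs whose ClinVar significance and disease name both match the filters."""
    hits = []
    for variant_id, info in rows:
        fields = parse_info(info)
        terms = significance_terms(fields.get("CLNSIG", ""))
        diseases = fields.get("CLNDN", "").lower()
        if terms & significance and any(k in diseases for k in keywords):
            hits.append(variant_id)
    return hits


# Hypothetical (ID, INFO) pairs
rows = [
    ("snv_a", "CLNSIG=Pathogenic;CLNDN=Dilated_cardiomyopathy_1A"),
    ("snv_b", "CLNSIG=Conflicting_interpretations_of_pathogenicity;CLNDN=Long_QT_syndrome"),
    ("snv_c", "CLNSIG=Likely_pathogenic;CLNDN=Cystic_fibrosis"),
    ("snv_d", "CLNSIG=Pathogenic/Likely_pathogenic;CLNDN=Hypertrophic_cardiomyopathy|not_provided"),
]
print(clinvar_cardiac_hits(rows))  # ['snv_a', 'snv_d']
```

Splitting `CLNSIG` into whole terms, rather than substring matching, keeps values such as "Conflicting_interpretations_of_pathogenicity" from being mistaken for pathogenic calls.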
Chapter 4.0
Conclusion
The present study examined the genetic predisposition to cardiovascular diseases (CVDs) in the Pakistani population. The genetic variants prioritized here as deleterious using bioinformatics tools provide a framework for the early assessment of genetic risk factors for CVDs from whole genome or whole exome sequencing datasets. This study concludes that the underlying burden of detrimental mutations is higher for common, polygenic CVDs than for Mendelian CVDs in Pakistan. Among common CVDs, the number of harmful mutations was highest for hypertension, followed by atherosclerosis, coronary aneurysm, heart failure, and coronary artery disease. Likewise, among Mendelian CVDs, the number of harmful mutations was highest for cardiomyopathies, followed by cardiac arrhythmias and congenital heart defects. The identification of prioritized detrimental variants in patients with hyperlipidemia and cardiomyopathies highlighted genes potentially involved in the pathophysiology of these disorders.
This study also concludes that, although the majority of harmful mutations for CVDs prioritized in the Pakistani population cluster with those of neighboring South Asian populations and of Europeans and Americans, a few deleterious variants are moderately to greatly differentiated in this population, with considerably higher allele frequencies than in other populations of the world. Such differentiated deleterious mutations may play a greater role in the pathophysiology of CVDs in this region of the world.
In future, more patients with different CVDs could be recruited for whole genome and/or exome sequencing to validate these findings and to explore further genes potentially involved in the pathophysiology of various forms of CVDs. This approach could lead to a panel of Pakistani-population-specific genetic risk factors associated with CVDs for early assessment.
5.0 Publications
From the Thesis:
Shakeel, M., Irfan, M., and Khan, I.A. (2018). Estimating the mutational load for cardiovascular diseases in Pakistani population. PLoS One, 13(2): e0192446.
Shakeel, M., Irfan, M., and Khan, I.A. (2018). Rare genetic mutations in Pakistani patients with dilated cardiomyopathy. Gene, 673: 134-139.
Shakeel, M., Irfan, M., Khan, W., Azim, M.K., and Khan, I.A. (2018). Whole genome sequencing of a Pakistani obese person identifies rare mutations (Manuscript in preparation).
Other than Thesis:
Ali, A., Khan, W., Shakeel, M., and Khan, I.A. (2018). Distinct landscape of mutations in cervical cancer revealed by somatic mutation signatures - a study inferring mutational signatures from somatic genomic variants in different cancer types (submitted, under review).
Khan, I.A., Anwar, M., Shakeel, M., Bergström, A., Narasimhan, V., Xue, Y., Tyler-Smith, C., and Ayub, Q. (2018). Mutational load in genes associated with heritable blood disorders in Pakistan (Manuscript in preparation).
Chapter 6.0
References
1000 Genomes Project. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073.
1000 Genomes Project. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56-65.
1000 Genomes Project. (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74.
Abid, A., Akhtar, N., Khaliq, S., and Mehdi, S. Q. (2011). Genetic heterogeneity for autosomal dominant familial hypertrophic cardiomyopathy in a Pakistani family. Journal of the College of Physicians and Surgeons Pakistan, 21(4), 202-206.
Ahmed, W., Ali, I. S., Riaz, M., Younas, A., Sadeque, A., Niazi, A. K., Niazi, S. H., Ali, S. H. B., Azam, M., and Qamar, R. (2013). Association of ANRIL polymorphism (rs1333049: C>G) with myocardial infarction and its pharmacogenomic role in hypercholesterolemia. Gene, 515(2), 416-420.
Ahmed, W., Malik, M., Saeed, I., Khan, A. A., Sadeque, A., Kaleem, U., Ahmed, N., Ajmal, M., Azam, M., and Qamar, R. (2011). Role of tissue plasminogen activator and plasminogen activator inhibitor polymorphism in myocardial infarction. Molecular Biology Reports, 38(4), 2541-2548.
Ajmal, M., Ahmed, W., Akhtar, N., Sadeque, A., Khalid, A., Benish Ali, S. H., Ahmed, N., Azam, M., and Qamar, R. (2011). A novel pathogenic nonsense triple-nucleotide mutation in the low-density lipoprotein receptor gene and its clinical correlation with familial hypercholesterolemia. Genetic Testing and Molecular Biomarkers, 15(9), 601-606.
Akil, L., and Ahmad, H. A. (2011). Relationships between obesity and cardiovascular diseases in four southern states and Colorado. Journal of Health Care for the Poor and Underserved, 22(4 Suppl), 61.
Al Turki, S., Manickaraj, A. K., Mercer, C. L., Gerety, S. S., Hitz, M.-P., Lindsay, S., D'Alessandro, L. C., Swaminathan, G. J., Bentham, J., and Arndt, A.-K. (2014). Rare variants in NR2F2 cause congenital heart defects in humans. The American Journal of Human Genetics, 94(4), 574-585.
Alvi, F. M., and Hasnain, S. (2009). ACE I/D and G2350A polymorphisms in Pakistani hypertensive population of Punjab. Clinical and Experimental Hypertension, 31(5), 471-480.
Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, 175-176.
Angulo, M., Butler, M., and Cataletto, M. (2015). Prader-Willi syndrome: a review of clinical, genetic, and endocrine findings. Journal of Endocrinological Investigation, 38(12), 1249-1263.
Antonarakis, S. E., Lyle, R., Dermitzakis, E. T., Reymond, A., and Deutsch, S. (2004). Chromosome 21 and down syndrome: from genomics to pathophysiology. Nature Reviews. Genetics, 5(10), 725.
Arnett, D. K., Baird, A. E., Barkley, R. A., Basson, C. T., Boerwinkle, E., Ganesh, S. K., Herrington, D. M., Hong, Y., Jaquish, C., and McDermott, D. A. (2007). Relevance of genetics and genomics for prevention and treatment of cardiovascular disease. Circulation, 115(22), 2878-2901.
Artham, S. M., Lavie, C. J., Milani, R. V., and Ventura, H. O. (2009). Obesity and hypertension, heart failure, and coronary heart disease-risk factor, paradox, and recommendations for weight loss. The Ochsner Journal, 9(3), 124-132.
Aulchenko, Y. S., Ripatti, S., Lindqvist, I., Boomsma, D., Heid, I. M., Pramstaller, P. P., Penninx, B. W., Janssens, A. C. J., Wilson, J. F., and Spector, T. (2009). Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nature Genetics, 41(1), 47-55.
Aziz, K. U., Faruqui, A., Patel, N., and Jaffery, H. (2012). Prevalence and awareness of cardiovascular disease including life styles in a lower middle class urban community in an Asian country. Pakistan Heart Journal, 41(3-4), 11-20.
Badimon, L., and Vilahur, G. (2012). LDL‐cholesterol versus HDL‐cholesterol in the atherosclerotic plaque: inflammatory resolution versus thrombotic chaos. Annals of the New York Academy of Sciences, 1254(1), 18-32.
Bainbridge, M. N., Wang, M., Wu, Y., Newsham, I., Muzny, D. M., Jefferies, J. L., Albert, T. J., Burgess, D. L., and Gibbs, R. A. (2011). Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biology, 12(7), R68.
Bamshad, M. J., Ng, S. B., Bigham, A. W., Tabor, H. K., Emond, M. J., Nickerson, D. A., and Shendure, J. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews. Genetics, 12(11), 745-755.
Bergen, A. C. (2015). Mutation load under additive fitness effects. Genetics Research (Camb.), 97(e2), 1-10.
Betti, I., Ballo, P., Barchielli, A., and Zuppiroli, A. (2010). Prognostic role of CA-125 in a population at high risk for cardiovascular disease: results from the Probe-HF Study. Journal of the American College of Cardiology, 55(10), A62-E595.
Bezzina, C. R. (2008). Genetics of cardiomyopathy and channelopathy. Heart and Metabolism, 41, 5-10.
Bomba, L., Walter, K., and Soranzo, N. (2017). The impact of rare and low-frequency genetic variants in common disease. Genome Biology, 18(77), 1-17.
Borger, P. (2017). Natural Knockouts: Natural Selection Knocked Out. Biology (Basel), 6(43), 1-6. doi:10.3390/biology6040043
British Heart Foundation. (2017). Cardiovascular disease. Retrieved from the British Heart Foundation website: https://www.bhf.org.uk/heart-health/conditions/cardiovascular-disease.
Brotman, D. J., Walker, E., Lauer, M. S., and O'Brien, R. G. (2005). In search of fewer independent risk factors. Archives of Internal Medicine, 165(2), 138-145.
Brown, C. A., McKinney, K. Q., Kaufman, J. S., Gravel, R. A., and Rozen, R. (2000). A common polymorphism in methionine synthase reductase increases risk of premature coronary artery disease. Journal of Cardiovascular Risk, 7(3), 197-200.
Cahill, T. J., Ashrafian, H., and Watkins, H. (2013). Genetic cardiomyopathies causing heart failure. Circulation Research, 113(6), 660-675.
Cambien, F., and Tiret, L. (2007). Genetics of cardiovascular diseases: from single mutations to the whole genome. Circulation, 116(15), 1714-1724. doi:10.1161/circulationaha.106.661751
Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. (2004). The gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology. Nucleic Acids Research, 32(suppl_1), D262-D266.
Chang, S. S., Grunder, S., Hanukoglu, A., Rösler, A., Mathew, P., Hanukoglu, I., Schild, L., Lu, Y., Shimkets, R. A., and Nelson-Williams, C. (1996). Mutations in subunits of the epithelial sodium channel cause salt wasting with hyperkalaemic acidosis, pseudohypoaldosteronism type 1. Nature Genetics, 12(3), 248-253.
Charlesworth, D., and Willis, J. H. (2009). The genetics of inbreeding depression. Nature Reviews. Genetics, 10(11), 783-796.
Chen, C., Tso, A. W., Cheung, B. M., Law, L. S., Ong, K., Wat, N., Janus, E. D., Xu, A., and Lam, K. S. (2012). Plasma concentration of pigment epithelium‐derived factor is closely associated with blood pressure and predicts incident hypertension in Chinese: a 10‐year prospective study. Clinical Endocrinology, 76(4), 506-513.
Clee, S. M., Zwinderman, A. H., Engert, J. C., Zwarts, K. Y., Molhuizen, H. O., Roomp, K., Jukema, J. W., van Wijland, M., van Dam, M., and Hudson, T. J. (2001). Common genetic variation in ABCA1 is associated with altered lipoprotein levels and a modified risk for coronary artery disease. Circulation, 103(9), 1198-1205.
Collin, G. B., Marshall, J. D., Ikeda, A., So, W. V., Russell-Eggitt, I., Maffei, P., Beck, S., Boerkoel, C. F., Sicolo, N., and Martin, M. (2002). Mutations in ALMS1 cause obesity, type 2 diabetes and neurosensory degeneration in Alström syndrome. Nature Genetics, 31(1), 74-78.
Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., Handsaker, R. E., Lunter, G., Marth, G. T., and Sherry, S. T. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156-2158.
Davies, M. (2000). The cardiomyopathies: an overview. Heart, 83(4), 469-474.
Davydov, E. V., Goode, D. L., Sirota, M., Cooper, G. M., Sidow, A., and Batzoglou, S. (2010). Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Computational Biology, 6(12), e1001025.
Dawber, T. R., Kannel, W. B., Revotskie, N., Stokes III, J., Kagan, A., and Gordon, T. (1959). Some factors associated with the development of coronary heart disease-six years' follow-up experience in the Framingham Study. American Journal of Public Health and the Nations Health, 49(10), 1349-1356.
Delles, C., McBride, M. W., Padmanabhan, S., and Dominiczak, A. F. (2008). The genetics of cardiovascular disease. Trends in Endocrinology & Metabolism, 19(9), 309-316.
Deloukas, P., Kanoni, S., Willenborg, C., Farrall, M., Assimes, T. L., Thompson, J. R., Ingelsson, E., Saleheen, D., Erdmann, J., Goldstein, B. A., et al. (2013). Large-scale association analysis identifies new risk loci for coronary artery disease. Nature Genetics, 45(1), 25-33. doi:10.1038/ng.2480
Denvir, J., Boskovic, G., Fan, J., Primerano, D. A., Parkman, J. K., and Kim, J. H. (2016). Whole genome sequence analysis of the TALLYHO/Jng mouse. BMC Genomics, 17(907), 1-15.
Dhar, S., Ray, S., Dutta, A., Sengupta, B., and Chakrabarti, S. (2012). Polymorphism of ACE gene as the genetic predisposition of coronary artery disease in Eastern India. Indian Heart Journal, 64(6), 576-581.
DiPietro, A., Trachtman, H., Sanjad, S. A., and Lifton, R. P. (1996). Genetic heterogeneity of Bartter's syndrome revealed by mutations in the K+ channel, ROMK. Nature Genetics, 14.
Do, R., Stitziel, N. O., Won, H.-H., Jørgensen, A. B., Duga, S., Merlini, A., Kiezun, A., Farrall, M., Goel, A., and Zuk, O. (2015). Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction. Nature, 518(7537), 102-106.
Dopazo, J., Amadoz, A., Bleda, M., Garcia-Alonso, L., Alemán, A., García-García, F., Rodriguez, J. A., Daub, J. T., Muntané, G., and Rueda, A. (2016). 267 Spanish exomes reveal population-specific differences in disease-related genetic variation. Molecular Biology and Evolution, 33(5), 1205-1218.
Doris, P. A. (2002). Hypertension genetics, single nucleotide polymorphisms, and the common disease: common variant hypothesis. Hypertension, 39(2), 323-331.
Edmonds, C. A., Lillie, A. S., and Cavalli-Sforza, L. L. (2004). Mutations arising in the wave front of an expanding population. Proceedings of the National Academy of Sciences, 101(4), 975-979.
Elliott, P. (2000). Diagnosis and management of dilated cardiomyopathy. Heart, 84(1), 106-106.
Erdmann, J., Stark, K., Esslinger, U. B., Rumpf, P. M., Koesling, D., de Wit, C., Kaiser, F. J., Braunholz, D., Medack, A., and Fischer, M. (2013). Dysfunctional nitric oxide signalling increases risk of myocardial infarction. Nature, 504(7480), 432-436.
Fagerberg, L., Hallström, B. M., Oksvold, P., Kampf, C., Djureinovic, D., Odeberg, J., Habuka, M., Tahmasebpoor, S., Danielsson, A., and Edlund, K. (2014). Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Molecular and Cellular Proteomics, 13(2), 397-406.
Fahed, A., Gelb, B., Seidman, J., and Seidman, C. (2013). Genetics of congenital heart disease: the glass half empty. Circulation Research, 112(12), E182-E182.
Faita, F., Vecoli, C., Foffa, I., and Andreassi, M. G. (2012). Next generation sequencing in cardiovascular diseases. World Journal of Cardiology, 4(10), 288-295. doi:10.4330/wjc.v4.i10.288
Frikke-Schmidt, R. (2011). Genetic variation in ABCA1 and risk of cardiovascular disease. Atherosclerosis, 218(2), 281-282.
Fu, W., Gittelman, R. M., Bamshad, M. J., and Akey, J. M. (2014). Characteristics of neutral and deleterious protein-coding variation among individuals and populations. The American Journal of Human Genetics, 95(4), 421-436.
Garg, V., Kathiriya, I. S., Barnes, R., and Schluterman, M. K. (2003). GATA4 mutations cause human congenital heart defects and reveal an interaction with TBX5. Nature, 424(6947), 443.
Garg, V., Muth, A. N., Ransom, J. F., and Schluterman, M. K. (2005). Mutations in NOTCH1 cause aortic valve disease. Nature, 437(7056), 270.
Gerull, B., Gramlich, M., Atherton, J., McNabb, M., Trombitás, K., Sasse-Klaassen, S., Seidman, J., Seidman, C., Granzier, H., and Labeit, S. (2002). Mutations of TTN, encoding the giant muscle filament titin, cause familial dilated cardiomyopathy. Nature Genetics, 30(2), 201-204.
Golbus, J. R., Stitziel, N. O., Zhao, W., Xue, C., Farrall, M., McPherson, R., Erdmann, J., Deloukas, P., Watkins, H., and Schunkert, H. (2016). Common and rare genetic variation in CCR2, CCR5, or CX3CR1 and risk of atherosclerotic coronary heart disease and glucometabolic traits. Circulation: Cardiovascular Genetics, 9(3), 250-258.
Goldmuntz, E. (2005). DiGeorge syndrome: new insights. Clinics in Perinatology, 32(4), 963-978.
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., and McKusick, V. A. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33(suppl 1), D514-D517.
Hansson, J. H., Nelson-Williams, C., Suzuki, H., Schild, L., Shimkets, R., Lu, Y., Canessa, C., Iwasaki, T., Rossier, B., and Lifton, R. P. (1995). Hypertension caused by a truncated epithelial sodium channel gamma subunit: genetic heterogeneity of Liddle syndrome. Nature Genetics, 11(1), 76-82.
Haq, F. U., Jalil, F., Hashmi, S., Jumani, M. I., Imdad, A., Jabeen, M., Hashmi, J. T., Irfan, F. B., Imran, M., and Atiq, M. (2011). Risk factors predisposing to congenital heart defects. Annals of Pediatric Cardiology, 4(2), 117-121.
Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B. L., Barrell, D., Zadissa, A., and Searle, S. (2012). GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research, 22(9), 1760-1774.
Helgadottir, A., Thorleifsson, G., Manolescu, A., Gretarsdottir, S., Blondal, T.,
Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Baker, A., and Palsson, A. (2007). A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science, 316(5830), 1491-1493.
Henn, B. M., Botigué, L. R., Bustamante, C. D., Clark, A. G., and Gravel, S. (2015).
Estimating the mutation load in human genomes. Nature Reviews Genetics, 16(6), 333-343.
Henn, B. M., Botigué, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K.,
Martin, A. R., Musharoff, S., Cann, H., and Snyder, M. P. (2016). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proceedings of the National Academy of Sciences, 113(4), E440-E449.
Hershberger, R. E., Hedges, D. J., and Morales, A. (2013). Dilated cardiomyopathy: the
complexity of a diverse genetic architecture. Nature Reviews Cardiology, 10(9), 531-547.
Hindorff, L., Junkins, H., Mehta, J., and Manolio, T. (2011). A catalog of published
genome-wide association studies 2010. Available at: http://www.genome.gov/gwastudies/, (Accessed June 28, 2016).
Hintzsche, J. D., Robinson, W. A., and Tan, A. C. (2016). A Survey of Computational
Tools to Analyze and Interpret Whole Exome Sequencing Data. International Journal of Genomics, 2016, 1-17.
Howrigan, D. P., Simonson, M. A., Kamens, H. M., Stephens, S. H., Wills, A. G.,
Ehringer, M. A., Keller, M. C., and McQueen, M. B. (2011). Mutational load analysis of unrelated individuals. Paper presented at the BMC Proceedings.
Hsieh, Y. Y., Lin, Y. J., Chang, C. C., Chen, D. Y., Hsu, C. M., Lo, M. M., Hsu, K. H.,
and Tsai, F. J. (2010). Human lymphocyte antigen B‐associated transcript 2, 3, and 5 polymorphisms and haplotypes are associated with susceptibility of Kawasaki disease and coronary artery aneurysm. Journal of Clinical Laboratory Analysis, 24(4), 262-268.
Hussain, S., Bibi, S., and Javed, Q. (2011). Heritability of genetic variants of resistin
gene in patients with coronary artery disease: A family-based study. Clinical Biochemistry, 44(8), 618-622.
Hussain, S., Haroon, J., Ejaz, S., and Javed, Q. (2016). Variants of resistin gene and
the risk of idiopathic dilated cardiomyopathy in Pakistan. Meta Gene, 9, 37-41.
International Consortium for Blood Pressure Genome-Wide Association Studies. (2011). Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature, 478(7367), 103-109.
Iqbal, M. P., Fatima, T., Parveen, S., Yousuf, F. A., Shafiq, M., Mehboobali, N., Khan, A. H., Azam, I., and Frossard, P. M. (2005). Lack of association of methylenetetrahydrofolate reductase 677C>T mutation with coronary artery disease in a Pakistani population. Journal of Molecular and Genetic Medicine, 1(1), 26-32.
Iqbal, M. P., Mahmood, S., Mehboobali, N., Ishaq, M., Fatima, T., Parveen, S., and
Frossard, P. (2004). Association study of the angiotensin-converting enzyme (ACE) gene G2350A dimorphism with myocardial infarction. Experimental & Molecular Medicine, 36(2), 110.
Jacoby, D., and McKenna, W. J. (2012). Genetics of inherited cardiomyopathy. European Heart Journal, 33(3), 296-304. doi:10.1093/eurheartj/ehr260
Japp, A. G., Gulati, A., Cook, S. A., Cowie, M. R., and Prasad, S. K. (2016). The diagnosis and evaluation of dilated cardiomyopathy. Journal of the American College of Cardiology, 67(25), 2996-3010.
Jobling, M., Hurles, M., and Tyler-Smith, C. (2013). Human evolutionary genetics: origins, peoples & disease. Garland Science.
Jormsjö, S., Wuttge, D. M., Sirsjö, A., Whatling, C., Hamsten, A., Stemme, S., and Eriksson, P. (2002). Differential expression of cysteine and aspartic proteases during progression of atherosclerosis in apolipoprotein E-deficient mice. The American Journal of Pathology, 161(3), 939-945.
Kaiser, J. (2014). The hunt for missing genes. Science, 344(6185), 687-689.
Kannel, W. B., Dawber, T. R., Friedman, G. D., Glennon, W. E., and McNamara, P. M. (1964). Risk Factors in Coronary Heart Disease: An Evaluation of Several Serum Lipids as Predictors of Coronary Heart Disease. The Framingham Study. Annals of Internal Medicine, 61(5 Pt 1), 888-899.
Kannel, W. B., Dawber, T. R., Kagan, A., Revotskie, N., and Stokes, J. (1961). Factors of Risk in the Development of Coronary Heart Disease: Six-Year Follow-up Experience. The Framingham Study. Annals of Internal Medicine, 55(1), 33-50.
Kathiresan, S., and Srivastava, D. (2012). Genetics of human cardiovascular disease. Cell, 148(6), 1242-1257.
Keinan, A., Mullikin, J. C., Patterson, N., and Reich, D. (2007). Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nature Genetics, 39(10), 1251-1255.
Kelly, B. B., and Fuster, V. (2010). Promoting cardiovascular health in the developing world: a critical challenge to achieve global health. Washington, D.C.: National Academies Press.
Kircher, M., Witten, D. M., Jain, P., O'Roak, B. J., Cooper, G. M., and Shendure, J.
(2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics, 46(3), 310-315.
Klopfstein, S., Currat, M., and Excoffier, L. (2005). The fate of mutations surfing on the wave of a range expansion. Molecular Biology and Evolution, 23(3), 482-490.
Köhler, S., Doelken, S. C., Mungall, C. J., Bauer, S., Firth, H. V., Bailleul-Forestier, I.,
Black, G., Brown, D. L., Brudno, M., and Campbell, J. (2014). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Research, 42(D1), D966-D974.
Kolehmainen, J., Black, G. C., Saarinen, A., Chandler, K., Clayton-Smith, J., Träskelin,
A.-L., Perveen, R., Kivitie-Kallio, S., Norio, R., and Warburg, M. (2003). Cohen syndrome is caused by mutations in a novel gene, COH1, encoding a transmembrane protein with a presumed role in vesicle-mediated sorting and intracellular protein transport. The American Journal of Human Genetics, 72(6), 1359-1369.
Kristiansson, K., Ilveskoski, E., Lehtimäki, T., Peltonen, L., Perola, M., and Karhunen,
P. J. (2008). Association analysis of allelic variants of USF1 in coronary atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 28(5), 983-989.
Kryukov, G. V., Pennacchio, L. A., and Sunyaev, S. R. (2007). Most rare missense
alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics, 80(4), 727-739.
Landrum, M. J., Lee, J. M., Riley, G. R., Jang, W., Rubinstein, W. S., Church, D. M.,
and Maglott, D. R. (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research, 42(D1), D980-D985.
Laurila, P.-P., Naukkarinen, J., Kristiansson, K., Ripatti, S., Kauttu, T., Silander, K.,
Salomaa, V., Perola, M., Karhunen, P. J., and Barter, P. J. (2010). Genetic association and interaction analysis of USF1 and APOA5 on lipid levels and atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 30(2), 346-352.
Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T.,
O'Donnell-Luria, A. H., Ware, J. S., Hill, A. J., and Cummings, B. B. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), 285-291.
Lettre, G. (2014). Rare and low-frequency variants in human common diseases and other complex traits. Journal of Medical Genetics, 51(11), 705-714.
Lettre, G., Palmer, C. D., Young, T., Ejebe, K. G., Allayee, H., Benjamin, E. J., Bennett, F., Bowden, D. W., Chakravarti, A., and Dreisbach, A. (2011). Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: the NHLBI CARe Project. PLoS Genetics, 7(2), e1001300.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754-1760.
Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature, 475(7357), 493-496.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 2078-2079.
Li, Y., Wang, O., Quan, T., Xia, W., Jiang, Y., Li, M., Meng, X., and Xing, X. (2016). A
genomic study of adult-onset idiopathic hypoparathyroidism in Chinese by targeted next-generation sequencing. Zhonghua Nei Ke Za Zhi, 55(8), 604-608.
Liaquat, A., Asifa, G. Z., Zeenat, A., and Javed, Q. (2014). Polymorphisms of tumor
necrosis factor-alpha and interleukin-6 gene and C-reactive protein profiles in patients with idiopathic dilated cardiomyopathy. Annals of Saudi Medicine, 34(5), 407-414.
Lifton, R. P., Gharavi, A. G., and Geller, D. S. (2001). Molecular mechanisms of human hypertension. Cell, 104(4), 545-556.
Liu, X., Zhang, L., Pacciulli, D., Zhao, J., Nan, C., Shen, W., Quan, J., Tian, J., and Huang, X. (2016). Restrictive Cardiomyopathy Caused by Troponin Mutations: Application of Disease Animal Models in Translational Studies. Frontiers in Physiology, 7(Article 629), 1-6. doi:10.3389/fphys.2016.00629
Lohmueller, K. E., Indap, A. R., Schmidt, S., Boyko, A. R., Hernandez, R. D., Hubisz,
M. J., Sninsky, J. J., White, T. J., Sunyaev, S. R., and Nielsen, R. (2008). Proportionally more deleterious genetic variation in European than in African populations. Nature, 451(7181), 994-997.
Lohmueller, K. E., Mauney, M. M., Reich, D., and Braverman, J. M. (2006). Variants
associated with common disease are not unusually differentiated in frequency across populations. The American Journal of Human Genetics, 78(1), 130-136.
Lopes, L. R., Syrris, P., Guttmann, O. P., O'Mahony, C., Tang, H. C., Dalageorgou, C.,
Jenkins, S., Hubank, M., Monserrat, L., McKenna, W. J., et al. (2015). Novel genotype-phenotype associations demonstrated by high-throughput sequencing in patients with hypertrophic cardiomyopathy. Heart, 101(4), 294-301. doi:10.1136/heartjnl-2014-306387
Luft, F. C. (2017). What have we learned from the genetics of hypertension? Medical Clinics of North America, 101(1), 195-206.
Ma, M., Ru, Y., Chuang, L.-S., Hsu, N.-Y., Shi, L.-S., Hakenberg, J., Cheng, W.-Y., Uzilov, A., Ding, W., and Glicksberg, B. S. (2015). Disease-associated variants in different categories of disease located in distinct regulatory elements. BMC Genomics, 16(8), 1-13.
MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K.,
Jostins, L., Habegger, L., Pickrell, J. K., and Montgomery, S. B. (2012). A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335(6070), 823-828.
MacArthur, D. G., and Tyler-Smith, C. (2010). Loss-of-function variants in the genomes
of healthy humans. Human Molecular Genetics, 19(R2), R125-R130.
Mahmood-ul-Hassan, Awan, Z. A., Gul, A. M., Sahibzada, W. A., and Hafizullah, M. (2005). Prevalence of coronary artery disease in rural areas of Peshawar. Journal of Postgraduate Medical Institute, 19(1), 14-22.
Mahon, N. G., Murphy, R. T., MacRae, C. A., Caforio, A. L., Elliott, P. M., and
McKenna, W. J. (2005). Echocardiographic evaluation in asymptomatic relatives of patients with dilated cardiomyopathy reveals preclinical disease. Annals of Internal Medicine, 143(2), 108-115.
Mani, A., Radhakrishnan, J., Wang, H., Mani, A., Mani, M.-A., Nelson-Williams, C.,
Carew, K. S., Mane, S., Najmabadi, H., and Wu, D. (2007). LRP6 mutation in a family with early coronary disease and metabolic risk factors. Science, 315(5816), 1278-1282.
Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends in Genetics, 24(3), 133-141.
Matsumoto, Y., Hayashi, T., Inagaki, N., Takahashi, M., Hiroi, S., Nakamura, T., Arimura, T., Nakamura, K., Ashizawa, N., and Yasunami, M. (2005). Functional analysis of titin/connectin N2-B mutations found in cardiomyopathy. Journal of Muscle Research and Cell Motility, 26(6), 367-374.
McEvoy, B. P., Powell, J. E., Goddard, M. E., and Visscher, P. M. (2011). Human
population dispersal "Out of Africa" estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Research, 21(6), 821-829.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A.,
Garimella, K., Altshuler, D., Gabriel, S., and Daly, M. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297-1303.
McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R., Thormann, A., Flicek, P.,
and Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome Biology, 17(122), 1-14.
Metzker, M. L. (2010). Sequencing technologies: the next generation. Nature Reviews Genetics, 11(1), 31-46.
Miosge, L. A., Field, M. A., Sontani, Y., Cho, V., Johnson, S., Palkova, A., Balakishnan, B., Liang, R., Zhang, Y., and Lyon, S. (2015). Comparison of predicted and actual consequences of missense mutations. Proceedings of the National Academy of Sciences, 112(37), E5189-E5198.
Narasimhan, V. M., Hunt, K. A., Mason, D., Baker, C. L., Karczewski, K. J., Barnes, M.
R., Barnett, A. H., Bates, C., Bellary, S., and Bockett, N. A. (2016). Health and population effects of rare gene knockouts in adult humans with related parents. Science, 352(6284), 474-477.
National Heart Lung and Blood Institute. (2017). Types of Congenital Heart Defects.
Retrieved from the National Heart, Lung, and Blood Institute website: https://www.nhlbi.nih.gov/health/health-topics/topics/chd/types.
Nawaz, S. K., and Hasnain, S. (2011). Effect of ACE polymorphisms on the association between noise and hypertension in a Pakistani population. Journal of the Renin-Angiotensin-Aldosterone System, 12(4), 516-520.
Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., Margono, C., Mullany, E.
C., Biryukov, S., Abbafati, C., and Abera, S. F. (2014). Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet, 384(9945), 766-781.
O'Donnell, C. J., and Nabel, E. G. (2011). Genomics of cardiovascular disease. New England Journal of Medicine, 365(22), 2098-2109.
Ozaki, K., and Tanaka, T. (2016). Molecular genetics of coronary artery disease. Journal of Human Genetics, 61(1), 71-77.
Paquette, M., Chong, M., Thériault, S., Dufour, R., Paré, G., and Baass, A. (2017). Polygenic risk score predicts prevalence of cardiovascular disease in patients with familial hypercholesterolemia. Journal of Clinical Lipidology, 11(3), 725-732.
Park, H.-Y. (2017). Hereditary Dilated Cardiomyopathy: Recent Advances in Genetic Diagnostics. Korean Circulation Journal, 47(3), 291-298.
Patterson, N., Price, A. L., and Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2(12), e190.
Peischl, S., and Excoffier, L. (2015). Expansion load: recessive mutations and the role of standing genetic variation. Molecular Ecology, 24(9), 2084-2094.
Perwaiz Iqbal, M., Iqbal, K., Khan Tareen, A., Parveen, S., Mehboobali, N., Haider, G., and Perwaiz Iqbal, S. (2016). Polymorphisms in MTHFR, MS and CBS genes and premature acute myocardial infarction in a Pakistani population. Pakistan Journal of Pharmaceutical Sciences, 29(6), 1901-1906.
Pigeyre, M., Yazdi, F. T., Kaur, Y., and Meyre, D. (2016). Recent progress in genetics,
epigenetics and metagenomics unveils the pathophysiology of human obesity. Clinical Science, 130(12), 943-986.
Pilbrow, A. P., Folkersen, L., Pearson, J. F., Brown, C. M., McNoe, L., Wang, N. M.,
Sweet, W. E., Tang, W. W., Black, M. A., and Troughton, R. W. (2012). The chromosome 9p21.3 coronary heart disease risk allele is associated with altered gene expression in normal heart and vascular tissues. PLoS One, 7(6), e39574.
Poirier, P., Giles, T. D., Bray, G. A., Hong, Y., Stern, J. S., Pi-Sunyer, F. X., and Eckel,
R. H. (2006). Obesity and cardiovascular disease: pathophysiology, evaluation, and effect of weight loss. Circulation, 113(6), 898-918.
Postma, A. V., Bezzina, C. R., and Christoffels, V. M. (2016). Genetics of congenital
heart disease: the contribution of the noncoding regulatory genome. Journal of Human Genetics, 61(1), 13-19.
Pulignani, S., Cresci, M., and Andreassi, M. G. (2013). Genetics of congenital heart defects: is it not all in the DNA? Translational Research, 161(1), 59-61.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller,
J., Sklar, P., De Bakker, P. I., and Daly, M. J. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3), 559-575.
Qureshi, S. F., Ali, A., John, P., Jadhav, A. P., Venkateshwari, A., Rao, H.,
Jayakrishnan, M., Narasimhan, C., Shenthar, J., and Thangaraj, K. (2015). Mutational analysis of SCN5A gene in long QT syndrome. Meta Gene, 6, 26-35.
R Core Team. (2013). R: a language for data analysis and graphics. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Rafiq, M. A., Chaudhry, A., Care, M., Spears, D. A., Morel, C. F., and Hamilton, R. M. (2017). Whole exome sequencing identified 1 base pair novel deletion in BCL2-associated athanogene 3 (BAG3) gene associated with severe dilated cardiomyopathy (DCM) requiring heart transplant in multiple family members. American Journal of Medical Genetics Part A, 173(3), 699-705.
Rajwani, A., Ezzat, V., Smith, J., Yuldasheva, N. Y., Duncan, E. R., Gage, M., Cubbon,
R. M., Kahn, M. B., Imrie, H., and Abbas, A. (2012). Increasing circulating IGFBP1 levels improves insulin sensitivity, promotes nitric oxide production, lowers blood pressure, and protects against atherosclerosis. Diabetes, 61(4), 915-924.
Richardson, T. G., Campbell, C., Timpson, N. J., and Gaunt, T. R. (2016).
Incorporating Non-Coding Annotations into Rare Variant Analysis. PLoS One, 11(4), e0154181.
Rizvi, S. F.-u.-H., Mustafa, G., Kundi, A., and Khan, M. A. (2015). Prevalence of
congenital heart disease in rural communities of Pakistan. Journal of Ayub Medical College Abbottabad, 27(1), 124-127.
Roth, G. A., Huffman, M. D., Moran, A. E., Feigin, V., Mensah, G. A., Naghavi, M., and
Murray, C. J. (2015). Global and regional patterns in cardiovascular mortality from 1990 to 2013. Circulation, 132(17), 1667-1678.
Sabater‐Molina, M., Pérez‐Sánchez, I., Hernández del Rincón, J. P., and Gimeno, J. R.
(2017). Genetics of hypertrophic cardiomyopathy: a review of current state. Clinical Genetics, 2017, 1-12.
Saeed, M., Perwaiz Iqbal, M., Yousuf, F., Perveen, S., Shafiq, M., Sajid, J., and
Frossard, P. (2007). Interactions and associations of paraoxonase gene cluster polymorphisms with myocardial infarction in a Pakistani population. Clinical Genetics, 71(3), 238-244.
Saeki, H., Hamada, M., and Hiwada, K. (2002). Circulating levels of insulin-like growth
factor-1 and its binding proteins in patients with hypertrophic cardiomyopathy. Circulation Journal, 66(7), 639-644.
Saleheen, D., Alexander, M., Rasheed, A., Wormser, D., Soranzo, N., Hammond, N., Butterworth, A., Zaidi, M., Haycock, P., and Bumpstead, S. (2010). Association of the 9p21.3 locus with risk of first-ever myocardial infarction in Pakistanis. Arteriosclerosis, Thrombosis, and Vascular Biology, 30(7), 1467-1473.
Saleheen, D., Natarajan, P., Armean, I. M., Zhao, W., Rasheed, A., Khetarpal, S. A.,
Won, H.-H., Karczewski, K. J., O'Donnell-Luria, A. H., and Samocha, K. E. (2017). Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature, 544(7649), 235-239.
Saleheen, D., Natarajan, P., Zhao, W., Rasheed, A., Khetarpal, S., Won, H.-H.,
Karczewski, K. J., O'Donnell-Luria, A. H., Samocha, K. E., and Gupta, N. (2015). Human knockouts in a cohort with a high rate of consanguinity. bioRxiv, 031518.
Santoro, D., Buemi, M., Gagliostro, G., Vecchio, M., Currò, M., Ientile, R., and
Caccamo, D. (2015). Association of VDR gene polymorphisms with heart disease in chronic kidney disease patients. Clinical Biochemistry, 48(16), 1028-1032.
Sasson, A., and Michael, T. P. (2010). Filtering error from SOLiD output. Bioinformatics, 26(6), 849-850.
Schott, J.-J., Benson, D. W., Basson, C. T., Pease, W., Silberbach, G. M., Moak, J. P., Maron, B. J., Seidman, C. E., and Seidman, J. G. (1998). Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science, 281(5373), 108-111.
Schunkert, H., König, I. R., Kathiresan, S., Reilly, M. P., Assimes, T. L., Holm, H.,
Preuss, M., Stewart, A. F., Barbalic, M., and Gieger, C. (2011). Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature Genetics, 43(4), 333.
Schwartz, P. J., Priori, S. G., Locati, E. H., Napolitano, C., Cantù, F., Towbin, J. A.,
Keating, M. T., Hammoude, H., Brown, A. M., and Chen, L.-S. K. (1995). Long QT syndrome patients with mutations of the SCN5A and HERG genes have differential responses to Na+ channel blockade and to increases in heart rate. Circulation, 92(12), 3381-3386.
Seo, S., Guo, D.-F., Bugge, K., Morgan, D. A., Rahmouni, K., and Sheffield, V. C.
(2009). Requirement of Bardet-Biedl syndrome proteins for leptin receptor signaling. Human Molecular Genetics, 18(7), 1323-1331.
Shahid, S. U., Cooper, J. A., Beaney, K. E., Li, K., Rehman, A., and Humphries, S. E.
(2017). Genetic risk analysis of coronary artery disease in Pakistani subjects using a genetic risk score of 21 variants. Atherosclerosis, 258, 1-7.
Shatwan, I. M., Minihane, A.-M., Williams, C. M., Lovegrove, J. A., Jackson, K. G., and
Vimaleswaran, K. S. (2016). Impact of lipoprotein lipase gene polymorphism, S447X, on postprandial triacylglycerol and glucose response to sequential meal ingestion. International Journal of Molecular Sciences, 17(397), 1-9.
Simon, D. B., Karet, F. E., Hamdan, J. M., DiPietro, A., Sanjad, S. A., and Lifton, R. P. (1996). Bartter's syndrome, hypokalaemic alkalosis with hypercalciuria, is caused by mutations in the Na-K-2Cl cotransporter NKCC2. Nature Genetics, 13(2), 183-188.
Simon, D. B., Nelson-Williams, C., Bia, M. J., Ellison, D., Karet, F. E., Molina, A. M.,
Vaara, I., Iwata, F., Cushner, H. M., and Koolen, M. (1996). Gitelman's variant of Bartter's syndrome, inherited hypokalaemic alkalosis, is caused by mutations in the thiazide-sensitive Na-Cl cotransporter. Nature Genetics, 12(1), 24-30.
Slatkin, M. (2008). Linkage disequilibrium: understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics, 9(6), 477-485.
Song, L., Zhang, Z., Grasfeder, L. L., Boyle, A. P., Giresi, P. G., Lee, B.-K., Sheffield, N. C., Gräf, S., Huss, M., and Keefe, D. (2011). Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Research, 21(10), 1757-1767.
Srivastava, A., Srivastava, N., and Mittal, B. (2016). Genetics of Obesity. Indian Journal of Clinical Biochemistry, 31(4), 361-371.
Stitziel, N. O., Stirrups, K. E., Masca, N., Erdmann, J., Ferrario, P. G., König, I. R., Weeke, P. E., Webb, T. R., Auer, P. L., and Schick, U. M. (2016). Coding variation in ANGPTL4, LPL, and SVEP1 and the risk of coronary disease. The New England Journal of Medicine, 374(12), 1134-1144.
Stone, R., Schurman, S., Nayir, A., Alpay, H., Bakkaloglu, A., Rodriguez-Soriano, J., Griswold, W., Richard, G. A., John, E., and Lifton, R. P. (1997). Mutations in the chloride channel gene, CLCNKB, cause Bartter's syndrome type III. Nature Genetics, 17, 171.
Subramanian, S. (2016). Europeans have a higher proportion of high-frequency deleterious variants than Africans. Human Genetics, 135(1), 1-7.
Suwazono, Y., Kobayashi, E., Uetani, M., Miura, K., Morikawa, Y., Ishizaki, M., Kido, T., Nakagawa, H., and Nogawa, K. (2006). Low-density lipoprotein receptor-related protein 5 variant A1330V is a determinant of blood pressure in Japanese males. Life Sciences, 78(21), 2475-2479.
Swager, S. A., Delfín, D. A., Rastogi, N., Wang, H., Canan, B. D., Fedorov, V. V.,
Mohler, P. J., Kilic, A., Higgins, R. S., and Ziolo, M. T. (2015). Claudin-5 levels are reduced from multiple cell types in human failing hearts and are associated with mislocalization of ephrin-B1. Cardiovascular Pathology, 24(3), 160-167.
Swapna, N., Vamsi, U. M., Usha, G., and Padma, T. (2011). Risk conferred by FokI
polymorphism of vitamin D receptor (VDR) gene for essential hypertension. Indian Journal of Human Genetics, 17(3), 201-206.
Switzer, N. J., Mangat, H. S., and Karmali, S. (2013). Current trends in obesity: body
composition assessment, weight regulation, and emerging techniques in managing severe obesity. Journal of Interventional Gastroenterology, 3(1), 34.
Tennessen, J. A., Bigham, A. W., O'Connor, T. D., Fu, W., Kenny, E. E., Gravel, S., McGee, S., Do, R., Liu, X., and Jun, G. (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337(6090), 64-69.
Tester, D. J., and Ackerman, M. J. (2014). Genetics of long QT syndrome. Methodist DeBakey Cardiovascular Journal, 10(1), 29-33.
TG and HDL Working Group of the Exome Sequencing Project, National Heart, Lung, and Blood Institute. (2014). Loss-of-function mutations in APOC3, triglycerides, and coronary disease. The New England Journal of Medicine, 371(1), 22-31.
Tran, P.-K., Agardh, H. E., Tran-Lundmark, K., Ekstrand, J., Roy, J., Henderson, B.,
Gabrielsen, A., Hansson, G. K., Swedenborg, J., and Paulsson-Berne, G. (2007). Reduced perlecan expression and accumulation in human carotid atherosclerotic lesions. Atherosclerosis, 190(2), 264-270.
Travis, J. M., Münkemüller, T., Burton, O. J., Best, A., Dytham, C., and Johst, K.
(2007). Deleterious mutations can surf to high densities on the wave front of an expanding population. Molecular Biology and Evolution, 24(10), 2334-2343.
Tschirner, A., Palus, S., Hetzer, R., Meyer, R., Anker, S. D., and Springer, J. (2014).
Six1 is down‐regulated in end‐stage human dilated cardiomyopathy independently of Ezh2. ESC Heart Failure, 1(2), 154-159.
Umedani, L. V., Chaudhry, B., Mehraj, V., and Ishaq, M. (2013). Serine threonine kinase 39 gene single nucleotide A/G polymorphism rs35929607 is weakly associated with essential hypertension in population of Tharparkar, Pakistan. Journal of the Pakistan Medical Association, 63(2), 199-205.
van der Bom, T., Bouma, B. J., Meijboom, F. J., Zwinderman, A. H., and Mulder, B. J.
(2012). The prevalence of adult congenital heart disease, results from a systematic review and evidence based calculation. American Heart Journal, 164(4), 568-575.
Varol, E., Ozaydin, M., Altinbas, A., Aslan, S. M., Dogan, A., and Dede, O. (2007).
Elevated carbohydrate antigen 125 levels in hypertrophic cardiomyopathy patients with heart failure. Heart and Vessels, 22(1), 30-33.
Waalen, J. (2014). The genetics of human obesity. Translational Research, 164(4), 293-301.
Wain, L. V. (2014). Rare variants and cardiovascular disease. Briefings in Functional Genomics, 13(5), 384-391.
Webb, T. R., Erdmann, J., Stirrups, K. E., Stitziel, N. O., Masca, N. G., Jansen, H., Kanoni, S., Nelson, C. P., Ferrario, P. G., and König, I. R. (2017). Systematic evaluation of pleiotropy identifies 6 further loci associated with coronary artery disease. Journal of the American College of Cardiology, 69(7), 823-836.
Weir, B. S., and Cockerham, C. C. (1984). Estimating F-statistics for the analysis of
population structure. Evolution, 38(6), 1358-1370.
Wellcome Trust Case Control Consortium. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661-678.
Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A.,
Flicek, P., Manolio, T., and Hindorff, L. (2013). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research, 42(D1), D1001-D1006.
Willer, C. J., Sanna, S., Jackson, A. U., Scuteri, A., Bonnycastle, L. L., Clarke, R.,
Heath, S. C., Timpson, N. J., Najjar, S. S., and Stringham, H. M. (2008). Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nature Genetics, 40(2), 161-169.
Williams, T., Hundertmark, M., Kraemer, D., Schönberger, J., Czolbe, M., Panther, F.,
Pekarek, V., and Ritter, O. (2011). The Eya4/six1 Signalling Cascade is Crucial in the Development of Heart Disease. Circulation, 124(Suppl 21), A12702.
Wilson, F. H., Disse-Nicodeme, S., Choate, K. A., Ishikawa, K., Nelson-Williams, C.,
Desitter, I., Gunel, M., Milford, D. V., Lipkin, G. W., and Achard, J.-M. (2001). Human hypertension caused by mutations in WNK kinases. Science, 293(5532), 1107-1112.
Winnepenninckx, B., Backeljau, T., and DeWachter, R. (1993). Extraction of high-molecular-weight DNA from mollusks. Trends in Genetics, 9(12), 407.
World Health Organization. (2016). Global Health Estimates 2015: Disease burden by Cause, Age, Sex, by Country and by Region, 2000-2015. Retrieved from the World Health Organization website: http://www.who.int/healthinfo/global_burden_disease/estimates/en/index2.html
World Health Organization. (2017a). Cardiovascular diseases (CVDs), Fact sheet May
2017. Retrieved from the World Health Organization website: http://www.who.int/mediacentre/factsheets/fs317/en/.
World Health Organization. (2017b). Obesity and Overweight Fact Sheet October
2017. Retrieved from World Health Organization website: http://www.who.int/mediacentre/factsheets/fs311/en/.
Xie, L., and Li, Y.-M. (2017). Lipoprotein Lipase (LPL) Polymorphism and the Risk of
Coronary Artery Disease: A Meta-Analysis. International Journal of Environmental Research and Public Health, 14(84), 1-7.
Xu, W., Wang, H., Cheng, W., Fu, D., Xia, T., Kibbe, W. A., and Lin, S. M. (2012). A
framework for annotating human genome in disease context. PLoS One, 7(12), e49686.
Xu, Y., Gong, W., Peng, J., Wang, H., Huang, J., Ding, H., and Wang, D. W. (2014).
Functional analysis LRP6 novel mutations in patients with coronary artery disease. PloS One, 9(1), e84345.
Xue, Y., Chen, Y., Ayub, Q., Huang, N., Ball, E. V., Mort, M., Phillips, A. D., Shaw, K., Stenson, P. D., and Cooper, D. N. (2012). Deleterious-and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. The American Journal of Human Genetics, 91(6), 1022-1032.
Yang, H., and Wang, K. (2015). Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nature Protocols, 10(10), 1556-1566.
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., and Bolund, L. (2006). WEGO: a web tool for plotting GO annotations. Nucleic Acids Research, 34(suppl_2), W293-W297.
Yeo, G. S. (2017). Genetics of obesity: can an old dog teach us new tricks? Diabetologia, 60(5), 778-783.
Yu, C., Yan, Q., Fu, C., Shi, W., Wang, H., Zeng, C., and Wang, X. (2014). CYP4F2 genetic polymorphisms are associated with coronary heart disease in a Chinese population. Lipids in Health and Disease, 13(83), 1-5.
Yu, D., Chen, Y., Han, J., Zhang, H., Chen, X., Zou, W., Liang, L., Xu, C., and Liu, Z.
(2008). MUC19 expression in human ocular surface and lacrimal gland and its alteration in Sjögren syndrome patients. Experimental Eye Research, 86(2), 403-411.
Yusuf, S., Hawken, S., Ôunpuu, S., Dans, T., Avezum, A., Lanas, F., McQueen, M.,
Budaj, A., Pais, P., and Varigos, J. (2004). Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. The Lancet, 364(9438), 937-952.
Zhou, C., Li, C., Zhou, B., Sun, H., Koullourou, V., Holt, I., Puckelwartz, M. J., Warren,
D. T., Hayward, R., and Lin, Z. (2017). Novel nesprin-1 mutations associated with dilated cardiomyopathy cause nuclear envelope disruption and defects in myogenesis. Human Molecular Genetics, 26(12), 2258-2276.
Zlotorynski, E. (2015). Chromosome biology: CTCF-binding site orientation shapes the
genome. Nature Reviews Molecular Cell Biology, 16(10), 578-579. doi:10.1038/nrm4057
Chapter 7.0
Appendix Table 1: Cardiac diseases and their genes analyzed in this study
A. Common CVDs
Cardiac disease Gene Cardiac disease Gene
Aneurysm ABCC6 Coronary heart disease CYP2J2
Aneurysm ABHD16A Coronary heart disease CYP3A4
Aneurysm ACE Coronary heart disease DAB2IP
Aneurysm ACTA2 Coronary heart disease EPHX1
Aneurysm ADIPOQ Coronary heart disease FCAR
Aneurysm AGT Coronary heart disease FGF21
Aneurysm AGTR1 Coronary heart disease FTL
Aneurysm APOA1 Coronary heart disease GCG
Aneurysm APOB Coronary heart disease GP1BA
Aneurysm APOE Coronary heart disease GP6
Aneurysm APOM Coronary heart disease GSN
Aneurysm BAG6 Coronary heart disease HFE
Aneurysm CAPN2 Coronary heart disease HSPA8
Aneurysm CASP3 Coronary heart disease IFNG
Aneurysm CCL22 Coronary heart disease IFNWP5
Aneurysm CCR5 Coronary heart disease IGHE
Aneurysm CD44 Coronary heart disease IL15
Aneurysm CD59 Coronary heart disease IL18BP
Aneurysm CHI3L1 Coronary heart disease IL4
Aneurysm COL3A1 Coronary heart disease INSIG1
Aneurysm CRP Coronary heart disease KAT2B
Aneurysm CTSB Coronary heart disease KCNJ11
Aneurysm EGR1 Coronary heart disease KIF6
Aneurysm ENG Coronary heart disease KLK3
Aneurysm FBLN5 Coronary heart disease MEF2A
Aneurysm FBN2 Coronary heart disease MTHFD1L
Aneurysm FGF1 Coronary heart disease NOD1
Aneurysm FGF2 Coronary heart disease NOS2
Aneurysm FLT1 Coronary heart disease NPC1
Aneurysm FN1 Coronary heart disease NPC1L1
Aneurysm GZMB Coronary heart disease NPPC
Aneurysm HGF Coronary heart disease NQO1
Aneurysm HMGB1 Coronary heart disease PON3
Aneurysm HOXA4 Coronary heart disease PPARD
Aneurysm HPSE Coronary heart disease SLC2A9
Aneurysm HSPA4 Coronary heart disease SUMO4
Aneurysm HSPB1 Coronary heart disease TCF7L2
Aneurysm IL2RA Coronary heart disease TFAP2B
Aneurysm IL8 Coronary heart disease THBS2
Aneurysm ITPR3 Coronary heart disease THRA
Aneurysm JDP2 Coronary heart disease TNFRSF1B
Aneurysm KDR Coronary heart disease VAMP8
Aneurysm KLF15 Coronary heart disease, transient cerebral ischemia GCKR
Aneurysm KLK1 Coronary heart disease INS-IGF2
Aneurysm LEP Endocarditis ITGA2B
Aneurysm LIMK1 Endocarditis PLCB2
Aneurysm LOX Gestational hypertension MIR499A
Aneurysm LRP1 Heart failure ABCB1
Aneurysm LTBP4 Heart failure ACY1
Aneurysm MMP10 Heart failure ADCY6
Aneurysm MMP8 Heart failure ADRA1A
Aneurysm MTHFR Heart failure ADRA1B
Aneurysm PF4 Heart failure ADRB3
Aneurysm PLA2G10 Heart failure ADRBK1
Aneurysm PLA2G2A Heart failure APLN
Aneurysm PLAT Heart failure AQP2
Aneurysm PPARG Heart failure ATP2A3
Aneurysm PRDX1 Heart failure AVPR1A
Aneurysm PRKCB Heart failure BDKRB1
Aneurysm PRKCD Heart failure BVES
Aneurysm PROC Heart failure CALCA
Aneurysm PRRC2A Heart failure CALCRL
Aneurysm PTGS2 Heart failure CAMK2D
Aneurysm RETN Heart failure CASP1
Aneurysm RTN4 Heart failure CAV3
Aneurysm SELE Heart failure CD34
Aneurysm SERBP1 Heart failure CEBPA
Aneurysm SERPINA5 Heart failure CFLAR
Aneurysm SERPINE1 Heart failure CKMT1B
Aneurysm TGFBR1 Heart failure CLDN5
Aneurysm TGFBR2 Heart failure CNR2
Aneurysm TIMP1 Heart failure CORIN
Aneurysm TIMP2 Heart failure CSF3
Aneurysm TIMP3 Heart failure CTF1
Aneurysm TNF Heart failure CTGF
Aneurysm TNFRSF11B Heart failure CTSG
Aneurysm XYLT1 Heart failure CYP27B1
Aortic valve disease 2 SMAD6 Heart failure CYP2D6
Arteriopathy AHSG Heart failure DDAH1
Arteriopathy ALPL Heart failure DUSP1
Arteriopathy APOH Heart failure DYRK1A
Arteriopathy APOL1 Heart failure ESRRA
Arteriopathy CD163 Heart failure FCGR3B
Arteriopathy CX3CR1 Heart failure FKBP1B
Arteriopathy ENPP1 Heart failure FOXC1
Arteriopathy F12 Heart failure FOXC2
Arteriopathy F2 Heart failure FOXO1
Arteriopathy F2RL2 Heart failure FOXP1
Arteriopathy F5 Heart failure FOXP4
Arteriopathy F7 Heart failure FRMD4B
Arteriopathy FGG Heart failure FSTL1
Arteriopathy GGT1 Heart failure FSTL3
Arteriopathy HIF1A Heart failure GATM
Arteriopathy HTRA1 Heart failure GNAQ
Arteriopathy ICAM1 Heart failure GRK5
Arteriopathy IL6 Heart failure HAMP
Arteriopathy INS Heart failure HDAC4
Arteriopathy ITGAV Heart failure HLA-B
Arteriopathy ITGB3 Heart failure HSPA1B
Arteriopathy LIPC Heart failure HTR4
Arteriopathy LPA Heart failure IL1RL1
Arteriopathy MPO Heart failure JPH2
Arteriopathy MTTP Heart failure KCNE1
Arteriopathy NOTCH3 Heart failure KCNH2
Arteriopathy NPPB Heart failure KCNQ1
Arteriopathy OSBPL10 Heart failure LAMA4
Arteriopathy PCSK9 Heart failure LCN2
Arteriopathy PDGFD Heart failure LGALS3
Arteriopathy PLA2G7 Heart failure LRG1
Arteriopathy PLTP Heart failure MAP4
Arteriopathy SCARB1 Heart failure MAPK14
Arteriopathy SPP1 Heart failure MDK
Arteriopathy TF Heart failure MIR199B
Arteriopathy TNFSF12 Heart failure MIR423
Arteriopathy UGT1A1 Heart failure MMP13
Arteriopathy VCAM1 Heart failure MUC16
Arteriopathy VKORC1 Heart failure MYBPC3
Atherosclerosis ABCA1 Heart failure MYL2
Atherosclerosis ABCD1 Heart failure MYL9
Atherosclerosis ABCG1 Heart failure NISCH
Atherosclerosis ABCG5 Heart failure NOL3
Atherosclerosis ABCG8 Heart failure NOS1
Atherosclerosis ACE2 Heart failure NOX5
Atherosclerosis ADAM10 Heart failure NUPR1
Atherosclerosis ADAM15 Heart failure OPA1
Atherosclerosis ADAM17 Heart failure OXT
Atherosclerosis ADAM33 Heart failure PAK1
Atherosclerosis ADAM8 Heart failure PARP1
Atherosclerosis ADAM9 Heart failure PDE5A
Atherosclerosis ADAMTS4 Heart failure PDK1
Atherosclerosis ADAMTS5 Heart failure POMC
Atherosclerosis ADIPOR2 Heart failure POSTN
Atherosclerosis ADM Heart failure PPP1R1A
Atherosclerosis ADRB2 Heart failure PPP1R2
Atherosclerosis AGER Heart failure PPRC1
Atherosclerosis AGTR2 Heart failure PRKAA2
Atherosclerosis AHR Heart failure PROM1
Atherosclerosis AKR1B1 Heart failure PTHLH
Atherosclerosis AKR1B10 Heart failure PTK2B
Atherosclerosis AKT1 Heart failure RAMP1
Atherosclerosis ALB Heart failure RAMP2
Atherosclerosis ALDH2 Heart failure RAMP3
Atherosclerosis ALOX15 Heart failure RAPGEF3
Atherosclerosis ALOX5 Heart failure REN
Atherosclerosis ALOX5AP Heart failure S100A1
Atherosclerosis ANGPT2 Heart failure S100B
Atherosclerosis AOC3 Heart failure SFTPB
Atherosclerosis APCS Heart failure SLC2A4
Atherosclerosis APH1B Heart failure SLC7A1
Atherosclerosis APOA1BP Heart failure SLC9A1
Atherosclerosis APOA4 Heart failure SOD3
Atherosclerosis APOA5 Heart failure SPATA5L1
Atherosclerosis APOBR Heart failure SRF
Atherosclerosis APOC1 Heart failure STAT1
Atherosclerosis APOC3 Heart failure STC1
Atherosclerosis AR Heart failure TBX5
Atherosclerosis ARG2 Heart failure TEX40
Atherosclerosis B2M Heart failure THBS1
Atherosclerosis BGLAP Heart failure TIMP4
Atherosclerosis BMP4 Heart failure TJP1
Atherosclerosis BRAP Heart failure TNFSF12-TNFSF13
Atherosclerosis BSG Heart failure TNNI1
Atherosclerosis CA12 Heart failure TRPC3
Atherosclerosis CA2 Heart failure TRPC6
Atherosclerosis CACNA1C Heart failure TTR
Atherosclerosis CAMP Heart failure UCN
Atherosclerosis CAPG Heart failure UCN2
Atherosclerosis CAPN10 Heart failure UNC93B1
Atherosclerosis CAT Heart failure YY1
Atherosclerosis CCL2 Heart failure ZFPM2
Atherosclerosis CCL23 Hypertension ACAT1
Atherosclerosis CCL5 Hypertension ACSM3
Atherosclerosis CCR2 Hypertension ACVRL1
Atherosclerosis CD14 Hypertension ADD1
Atherosclerosis CD36 Hypertension ADD2
Atherosclerosis CD40 Hypertension ADORA2B
Atherosclerosis CD86 Hypertension ADRA2A
Atherosclerosis CDH1 Hypertension ADRA2B
Atherosclerosis CDH13 Hypertension ALAD
Atherosclerosis CDH5 Hypertension ALOX12
Atherosclerosis CDKN1B Hypertension ANGPT1
Atherosclerosis CDKN1C Hypertension ANPEP
Atherosclerosis CDKN2A Hypertension APEX1
Atherosclerosis CDKN2B Hypertension APLNR
Atherosclerosis CETP Hypertension AQP4
Atherosclerosis CFH Hypertension ARG1
Atherosclerosis CHIT1 Hypertension ARHGEF1
Atherosclerosis CNR1 Hypertension ARHGEF6
Atherosclerosis CPB2 Hypertension ARL6IP5
Atherosclerosis CPE Hypertension ARSG
Atherosclerosis CPT1A Hypertension ATP1A1
Atherosclerosis CSF1 Hypertension ATP1A2
Atherosclerosis CST3 Hypertension ATP1B1
Atherosclerosis CTSS Hypertension ATP2B1
Atherosclerosis CX3CL1 Hypertension ATP5B
Atherosclerosis CXCL1 Hypertension BDKRB2
Atherosclerosis CXCL12 Hypertension BDNF
Atherosclerosis CXCL16 Hypertension BGN
Atherosclerosis CXCL5 Hypertension BLVRA
Atherosclerosis CXCR3 Hypertension BMP10
Atherosclerosis CYBA Hypertension BMP2
Atherosclerosis CYP19A1 Hypertension BMP7
Atherosclerosis CYP27A1 Hypertension BMPR1B
Atherosclerosis CYP2C19 Hypertension BMPR2
Atherosclerosis CYP2C9 Hypertension BTN2A1
Atherosclerosis DKK1 Hypertension C1QTNF1
Atherosclerosis ECE1 Hypertension CACNA1D
Atherosclerosis EDN1 Hypertension CACNB2
Atherosclerosis EDNRB Hypertension CASP8
Atherosclerosis EGF Hypertension CAV1
Atherosclerosis ELANE Hypertension CHEK2
Atherosclerosis ELAVL1 Hypertension CHGB
Atherosclerosis EPHX2 Hypertension CLCNKA
Atherosclerosis ESAM Hypertension CLCNKB
Atherosclerosis ESR1 Hypertension CLU
Atherosclerosis ESR2 Hypertension COMT
Atherosclerosis ETS2 Hypertension CPS1
Atherosclerosis F3 Hypertension CRHR1
Atherosclerosis F8 Hypertension CSK
Atherosclerosis FABP3 Hypertension CSMD1
Atherosclerosis FABP4 Hypertension CTH
Atherosclerosis FABP5 Hypertension CTNNB1
Atherosclerosis FASLG Hypertension CYP11A1
Atherosclerosis FCGR2A Hypertension CYP11B1
Atherosclerosis FCGR3A Hypertension CYP17A1
Atherosclerosis FGF23 Hypertension CYP1A2
Atherosclerosis FOXO3 Hypertension CYP21A2
Atherosclerosis FOXP3 Hypertension CYP3A5
Atherosclerosis FPR1 Hypertension CYP4A11
Atherosclerosis FPR2 Hypertension CYP4A22
Atherosclerosis GAS6 Hypertension CYP4F2
Atherosclerosis GDF15 Hypertension DBH
Atherosclerosis GHRL Hypertension DDAH2
Atherosclerosis GJA1 Hypertension DIO2
Atherosclerosis GJA4 Hypertension DNM1L
Atherosclerosis GNB3 Hypertension DPP4
Atherosclerosis GPT Hypertension DRD1
Atherosclerosis GRN Hypertension EDN3
Atherosclerosis GSTM1 Hypertension EGFR
Atherosclerosis GSTO1 Hypertension EMILIN1
Atherosclerosis GSTP1 Hypertension ENPEP
Atherosclerosis GSTT1 Hypertension EPO
Atherosclerosis H6PD Hypertension ERAP1
Atherosclerosis HABP2 Hypertension F11R
Atherosclerosis HAS2 Hypertension F2RL1
Atherosclerosis HAVCR2 Hypertension FBN1
Atherosclerosis HBA1 Hypertension FGA
Atherosclerosis HBEGF Hypertension FGB
Atherosclerosis HDAC5 Hypertension FGF5
Atherosclerosis HMGCR Hypertension FGFBP1
Atherosclerosis HMOX1 Hypertension FH
Atherosclerosis HNF1A Hypertension FMO3
Atherosclerosis HNRNPC Hypertension FURIN
Atherosclerosis HP Hypertension GCGR
Atherosclerosis HSPD1 Hypertension GCK
Atherosclerosis HSPG2 Hypertension GDF2
Atherosclerosis ICOS Hypertension GH1
Atherosclerosis IGF1 Hypertension GHR
Atherosclerosis IGF1R Hypertension GNA12
Atherosclerosis IGFALS Hypertension GNAS
Atherosclerosis IGFBP1 Hypertension GOSR2
Atherosclerosis IGFBP3 Hypertension GPX3
Atherosclerosis IKBKB Hypertension GPX4
Atherosclerosis IL18 Hypertension GREM1
Atherosclerosis IL1A Hypertension GRK4
Atherosclerosis IL1B Hypertension GSTA1
Atherosclerosis IL1RN Hypertension GSTM3
Atherosclerosis IL20 Hypertension GUCA2B
Atherosclerosis IL32 Hypertension HEY1
Atherosclerosis IL6ST Hypertension HLA-A
Atherosclerosis IL7R Hypertension HSD11B1
Atherosclerosis IRS2 Hypertension HSD11B2
Atherosclerosis ITGA2 Hypertension HSD3B1
Atherosclerosis ITGB5 Hypertension HSD3B2
Atherosclerosis ITLN1 Hypertension HSPA1A
Atherosclerosis JAK2 Hypertension HSPA1L
Atherosclerosis JAM3 Hypertension HTR2A
Atherosclerosis JUN Hypertension IAPP
Atherosclerosis KL Hypertension ID1
Atherosclerosis KLF2 Hypertension ID2
Atherosclerosis KLRK1 Hypertension IER3
Atherosclerosis LCAT Hypertension IGF2
Atherosclerosis LDLR Hypertension IL12B
Atherosclerosis LEPR Hypertension IL23R
Atherosclerosis LGALS1 Hypertension ILF3
Atherosclerosis LIPG Hypertension INHA
Atherosclerosis LPL Hypertension INHBA
Atherosclerosis LRP6 Hypertension INPPL1
Atherosclerosis LTBR Hypertension INSR
Atherosclerosis LTC4S Hypertension IRS1
Atherosclerosis MAPK7 Hypertension KCNA5
Atherosclerosis MBL2 Hypertension KCNK3
Atherosclerosis MERTK Hypertension KCNMA1
Atherosclerosis MGP Hypertension KCNMB1
Atherosclerosis MIF Hypertension KLC1
Atherosclerosis MIR130A Hypertension KLF5
Atherosclerosis MIR146A Hypertension KLHL3
Atherosclerosis MIR150 Hypertension KLKB1
Atherosclerosis MIR210 Hypertension KNG1
Atherosclerosis MIR27B Hypertension KYNU
Atherosclerosis MMP1 Hypertension LIPE
Atherosclerosis MMP12 Hypertension LRP5
Atherosclerosis MMP3 Hypertension LYZ
Atherosclerosis MNDA Hypertension MAOA
Atherosclerosis NAMPT Hypertension MAP1LC3B
Atherosclerosis NAT2 Hypertension MAPK1
Atherosclerosis NCEH1 Hypertension MAPK8
Atherosclerosis NFATC2 Hypertension MAT1A
Atherosclerosis NFE2L2 Hypertension MC4R
Atherosclerosis NGB Hypertension MEX3C
Atherosclerosis NGF Hypertension MFN2
Atherosclerosis NOX1 Hypertension MIR204
Atherosclerosis NPPA Hypertension MIR21
Atherosclerosis NPY Hypertension MLYCD
Atherosclerosis NR1D1 Hypertension MYOC
Atherosclerosis NR1H3 Hypertension NCF1C
Atherosclerosis NR3C2 Hypertension NEDD4L
Atherosclerosis NRG1 Hypertension NFKBIL1
Atherosclerosis OSBPL8 Hypertension NOX3
Atherosclerosis P2RY12 Hypertension NOX4
Atherosclerosis P2RY2 Hypertension NPR1
Atherosclerosis PALLD Hypertension NR0B1
Atherosclerosis PAPPA Hypertension NR1H4
Atherosclerosis PDE1A Hypertension NR3C1
Atherosclerosis PDE4D Hypertension OPTN
Atherosclerosis PDGFB Hypertension OTC
Atherosclerosis PDGFC Hypertension PCNA
Atherosclerosis PDPN Hypertension PDC
Atherosclerosis PEPD Hypertension PDGFRB
Atherosclerosis PGF Hypertension PDHA1
Atherosclerosis PGLYRP1 Hypertension PHOX2A
Atherosclerosis PLA2G3 Hypertension PIK3R1
Atherosclerosis PLA2G6 Hypertension PIM1
Atherosclerosis PLAU Hypertension PLEKHA7
Atherosclerosis PLAUR Hypertension PNMT
Atherosclerosis PLIN2 Hypertension POU5F1
Atherosclerosis PON1 Hypertension PRCP
Atherosclerosis PON2 Hypertension PRKG1
Atherosclerosis PPARA Hypertension PRSS8
Atherosclerosis PPARGC1A Hypertension PSMB9
Atherosclerosis PPIA Hypertension PSMD9
Atherosclerosis PRKCZ Hypertension PTGIS
Atherosclerosis PTGDS Hypertension PTPN1
Atherosclerosis PTGES Hypertension RETNLB
Atherosclerosis PTH Hypertension RGS2
Atherosclerosis PTPN22 Hypertension RHOB
Atherosclerosis PTX3 Hypertension RLN1
Atherosclerosis QSOX1 Hypertension RLN2
Atherosclerosis RARRES2 Hypertension RNLS
Atherosclerosis RBP4 Hypertension ROBO4
Atherosclerosis RGS5 Hypertension ROCK2
Atherosclerosis RHOA Hypertension ROS1
Atherosclerosis RNASE3 Hypertension S100A4
Atherosclerosis RNASE4 Hypertension SARS
Atherosclerosis ROCK1 Hypertension SCG2
Atherosclerosis RORA Hypertension SCNN1B
Atherosclerosis RSAD2 Hypertension SCNN1G
Atherosclerosis RTN3 Hypertension SDK1
Atherosclerosis S100A12 Hypertension SERPINA1
Atherosclerosis S100A8 Hypertension SERPINC1
Atherosclerosis S100A9 Hypertension SERPINF1
Atherosclerosis SAA1 Hypertension SFMBT1
Atherosclerosis SAMD9 Hypertension SGK1
Atherosclerosis SCARB2 Hypertension SLC12A2
Atherosclerosis SCD Hypertension SLC12A3
Atherosclerosis SELL Hypertension SLC22A1
Atherosclerosis SELP Hypertension SLC22A2
Atherosclerosis SELPLG Hypertension SLC22A3
Atherosclerosis SEPP1 Hypertension SLC22A6
Atherosclerosis SERPINA12 Hypertension SLC22A8
Atherosclerosis SERPIND1 Hypertension SLC26A4
Atherosclerosis SHBG Hypertension SLC2A12
Atherosclerosis SIRT1 Hypertension SLC2A5
Atherosclerosis SLC5A7 Hypertension SLC4A4
Atherosclerosis SLC6A4 Hypertension SLC6A18
Atherosclerosis SOCS1 Hypertension SLC6A19
Atherosclerosis SOCS3 Hypertension SLC6A2
Atherosclerosis SOD1 Hypertension SLC6A9
Atherosclerosis SOX18 Hypertension SLC8A1
Atherosclerosis SREBF2 Hypertension SLCO1B1
Atherosclerosis ST8SIA1 Hypertension SLCO4C1
Atherosclerosis STAT3 Hypertension SMAD1
Atherosclerosis SVEP1 Hypertension SMAD4
Atherosclerosis TERT Hypertension SMAD5
Atherosclerosis TFPI Hypertension SORBS1
Atherosclerosis TGFB1 Hypertension SREBF1
Atherosclerosis THBD Hypertension SRY
Atherosclerosis TLR2 Hypertension STEAP4
Atherosclerosis TLR4 Hypertension STK39
Atherosclerosis TNFRSF12A Hypertension SUCNR1
Atherosclerosis TNFRSF14 Hypertension TAP1
Atherosclerosis TNFRSF1A Hypertension TBX4
Atherosclerosis TNFRSF25 Hypertension TGFA
Atherosclerosis TNFSF10 Hypertension TGFB3
Atherosclerosis TNFSF11 Hypertension TGFBR3
Atherosclerosis TNFSF15 Hypertension TH
Atherosclerosis TNFSF4 Hypertension THPO
Atherosclerosis TNNT2 Hypertension TLR9
Atherosclerosis TOR2A Hypertension TNFAIP3
Atherosclerosis TP53 Hypertension TPH1
Atherosclerosis TREM1 Hypertension TRPC4
Atherosclerosis TRIB3 Hypertension TRPM6
Atherosclerosis TSPO Hypertension TRPM7
Atherosclerosis TXN Hypertension TRPV5
Atherosclerosis TYRO3 Hypertension TSHR
Atherosclerosis UCP1 Hypertension TXNIP
Atherosclerosis UCP2 Hypertension UMOD
Atherosclerosis USF1 Hypertension VDR
Atherosclerosis UTS2 Hypertension VIP
Atherosclerosis UTS2D Hypertension VNN1
Atherosclerosis UTS2R Hypertension WISP1
Atherosclerosis VEGFA Hypertension WNK1
Atherosclerosis VEGFC Hypertension WNK4
Atherosclerosis WNT5A Hypertension WWOX
Atherosclerosis XBP1 Hypertension XPNPEP1
Atherosclerosis XDH Hypertension ZNF652
Atherosclerosis YWHAZ Hypertension PLEKHA7
Atherosclerosis ZNF202 Hypertension, myocardial infarction GCLC
Atherosclerosis, coronary artery disease PECAM1 Hypertension, myocardial infarction 1 GUCY1A3
Atherosclerosis, myocardial infarction FAM5C Hypertriglyceridemia CREB3L3
Central core disease, tachycardia RYR1 Hypertriglyceridemia LMF1
Coronary artery disease ACAD10 Intracerebral haemorrhage PMF1-BGLAP
Coronary artery disease FMN2 Myocardial infarction BRINP3
Coronary artery disease FNDC1 Myocardial infarction GCLM
Coronary artery disease SEZ6L Myocardial infarction LGALS2
Coronary artery spasm 3, susceptibility to ARHGAP9 Myocardial infarction LTA
Coronary heart disease ABO Myocardial infarction MIAT
Coronary heart disease ADAMTS13 Myocardial infarction OLR1
Coronary heart disease ADH1C Myocardial infarction PSMA6
Coronary heart disease ADORA3 Myocardial infarction 1 CCT7
Coronary heart disease AMPD1 Myocardial infarction 1 LRP8
Coronary heart disease ANG Myocardial infarction, protection against|venous thrombosis, protection against F13A1
Coronary heart disease ANGPTL4 Myocardial ischemia CDKN1A
Coronary heart disease APOC2 Myocardial ischemia COL1A1
Coronary heart disease APOC4 Myocardial ischemia LDHA
Coronary heart disease APOC4-APOC2 Myocardial ischemia NCAM1
Coronary heart disease AS3MT Myocardial ischemia PNOC
Coronary heart disease ATP5J Myocardial ischemia RUNX1
Coronary heart disease AVP Myocardial ischemia SHH
Coronary heart disease C3 Myocardial ischemia TYMP
Coronary heart disease CDKN2B-AS1 Noonan syndrome with asd CBL
Coronary heart disease CMA1 Peripheral vascular disease GCH1
Coronary heart disease CNDP1 Pulmonic stenosis|supravalvar aortic stenosis|rasopathy LRRC56
Coronary heart disease CTSL Rheumatic heart disease IL1R1
Coronary heart disease CXCL10 Rheumatic heart disease KCND3
Coronary heart disease CXCR6 Rheumatic heart disease TLR5
Coronary heart disease CYP1A1 Supravalvar aortic stenosis ELN
Coronary heart disease CYP2C8
B. Mendelian and Congenital CVDs:
Cardiac Disease Gene Cardiac Disease Gene
Aortic aneurysm, familial thoracic 4, thoracic aortic aneurysms and aortic dissections MYH11 Dilated cardiomyopathy NFKB1
Aortic aneurysm, familial thoracic 6 ACTA2 Dilated cardiomyopathy NPPB
Aortic aneurysm, familial thoracic 8 PRKG1 Dilated cardiomyopathy NPPC
Aortic aneurysm, familial thoracic 9 MFAP5 Dilated cardiomyopathy NPR2
Aortic valve disease 2 SMAD6 Dilated cardiomyopathy OSM
Arrhythmogenic right ventricular cardiomyopathy CTF1 Dilated cardiomyopathy PGM1
Arrhythmogenic right ventricular cardiomyopathy DMD Dilated cardiomyopathy POLG
Arrhythmogenic right ventricular cardiomyopathy, type 1 TGFB3 Dilated cardiomyopathy RBM20
Arrhythmogenic right ventricular dysplasia 11 DSC3 Dilated cardiomyopathy SDHA
Arrhythmogenic right ventricular dysplasia 5 TMEM43 Dilated cardiomyopathy SERPINE1
Arrhythmogenic right ventricular dysplasia 9 PKP2 Dilated cardiomyopathy SLC25A5P8
Arterial calcification of infancy ENPP1 Dilated cardiomyopathy SOD2
Atherosclerosis, susceptibility to VEGFA Dilated cardiomyopathy STAT3
Atrial fibrillation, familial, 11 GJA5 Dilated cardiomyopathy TAZ
Atrial fibrillation, familial, 12 ABCC9 Dilated cardiomyopathy TGFB1
Atrial fibrillation, familial, 12 SUR2 Dilated cardiomyopathy TNC
Atrial fibrillation, familial, 14 SCN2B Dilated cardiomyopathy TNFRSF11B
Atrial fibrillation, familial, 15 NUP155 Dilated cardiomyopathy TNFSF10
Atrial fibrillation, familial, 7 KCNA5 Dilated cardiomyopathy TNFSF12
Atrial myxoma, familial PRKAR1A Dilated cardiomyopathy TP53
Abdominal obesity-metabolic syndrome 3 DYRK1B Dilated cardiomyopathy HLA-DQB1
Adams-oliver syndrome 1 ARHGAP31 Dilated cardiomyopathy HLA-DRB4
Amyloidosis, cardiac and cutaneous APOA1 Dilated cardiomyopathy TNF
Atrial septal defect 2 GATA4 Dilated cardiomyopathy TNFRSF12A
Atrial septal defect 3 MYH6 Dilated cardiomyopathy 1AA, Primary familial hypertrophic cardiomyopathy ACTN2
Atrial septal defect 4 TBX20 Dilated cardiomyopathy 1F, Limb-girdle muscular dystrophy, type 1E DNAJB6
Atrial septal defect 5 ACTC1 Dilated cardiomyopathy 1II CRYAB
Atrial septal defect 6 TLL1 Dilated cardiomyopathy 1LL, Left ventricular noncompaction 8 PRDM16
Atrial septal defect 6 ENTPD5 Dilated cardiomyopathy 1P, Familial hypertrophic cardiomyopathy 18, Sudden cardiac death, Dilated cardiomyopathy, Cardiac arrest CEP85L
Atrial septal defect 6 LTBP2 Dilated cardiomyopathy 1P, Familial hypertrophic cardiomyopathy 18, Sudden cardiac death, Dilated cardiomyopathy, Cardiac arrest PLN
Atrial septal defect 6 AREL1 Dilated cardiomyopathy, dilated cardiomyopathy RYR2
Atrial septal defect 6 ACOT2 Double-outlet right ventricle CFC1
Atrial septal defect 6 TTLL5 Double-outlet right ventricle GDF1
Atrial septal defect 6 ANGEL1 Dursun syndrome G6PC3
Atrial septal defect 6 VRTN Ehlers-Danlos syndrome, autosomal recessive, cardiac valvular form COL1A2
Atrial septal defect 6 ZC2HC1C Essential hypertension PTGIS
Atrial septal defect 6 PAPLN Fabry disease, cardiac variant GLA
Atrial septal defect 6 ACOT4 Fabry disease, cardiac variant RPL36A-HNRNPH2
Atrial septal defect 6 PTGR2 Familial abdominal aortic aneurysm 1 MYLK
Atrial septal defect 6 PROX2 Familial hypertrophic cardiomyopathy 1 LDB3
Atrial septal defect 7, with or without av conduction defects NKX2-5 Familial hypertrophic cardiomyopathy 1, Myosin storage myopathy, Dilated cardiomyopathy 1S, Myopathy, distal, 1, Scapuloperoneal myopathy, MYH7-related MYH7
Atrial septal defect 8 CITED2 Familial hypertrophic cardiomyopathy 12, Dilated cardiomyopathy 1M CSRP3
Atrioventricular septal defect 3 GJA1 Familial hypertrophic cardiomyopathy 3, Primary familial hypertrophic cardiomyopathy, Sudden cardiac death, Cardiomyopathy TPM1
Axenfeld-rieger syndrome, type 3 FOXC1 Familial hypertrophic cardiomyopathy 4 MADD
Cafe au lait spots, multiple|atrial septal defect SOS1 Familial hypertrophic cardiomyopathy 7, Cardiomyopathy TNNI3
Cardiac arrest DSP Familial type 5 hyperlipoproteinemia APOA5
Cardiac arrhythmia AKAP9 Fatal infantile mitochondrial cardiomyopathy SDHD
Cardiac arrhythmia CACNA1C Fracture, hip, susceptibility to, Myocardial infarction, Thrombocytopenia, neonatal alloimmune, Posttransfusion purpura, PL(A1)/(A2) ALLOANTIGEN POLYMORPHISM ITGB3
Cardiac arrhythmia CACNA1C-AS1 Generalized arterial calcification of infancy 2 ABCC6
Cardiac arrhythmia HCN4 Glycogen storage disease II GAA
Cardiac arrhythmia JUP Gm1-gangliosidosis, type i, with cardiac involvement GLB1
Cardiac arrhythmia KCNE1 Hirschsprung disease, cardiac defects, and autonomic dysfunction ECE1
Cardiac arrhythmia KCNJ2 Histiocytosis-lymphadenopathy plus syndrome SLC29A3
Cardiac arrhythmia KCNJ5 Histiocytosis-lymphadenopathy plus syndrome PRF1
Cardiac arrhythmia KCNQ1 Human immunodeficiency virus type 1, rapid progression to AIDS, Coronary artery disease, resistance to, Age-related macular degeneration 12, MACULAR DEGENERATION, AGE-RELATED, 12, SUSCEPTIBILITY TO CX3CR1
Cardiac arrhythmia SCN3B Hyperlipidemia, familial combined, susceptibility to USF1
Cardiac arrhythmia SCN4B Hyperlipidemia, familial combined, Coronary heart disease LPL
Cardiac arrhythmia, atrial fibrillation, familial, 13 SCN1B Hyperlipoproteinemia, type Ib APOC2
Cardiac arrhythmia, atrial fibrillation, familial, 6 NPPA Hyperlipoproteinemia, type id GPIHBP1
Cardiac arrhythmia, dilated cardiomyopathy, atrial fibrillation, familial, 10 SCN5A Hypertension, early-onset, autosomal dominant, with exacerbation in pregnancy NR3C2
Cardiac arrhythmia, long qt syndrome 12 SNTA1 Hypertension, essential PTGIS
Cardiac arrhythmia, long qt syndrome 6 KCNE2 Hypertension, essential, susceptibility to AGTR1
Cardiac arrhythmia|arrhythmia ANK2 Hypertension, essential, susceptibility to, Crohn disease, association with AGT
Cardiac arrhythmia|not specified CASQ2 Hypertrophic cardiomyopathy ACE2
Cardiac arrhythmia|not specified DSG2 Hypertrophic cardiomyopathy AGTR2
Cardiac conduction defect, susceptibility to AKAP10 Hypertrophic cardiomyopathy AR
Cardiac conduction disease with or without dilated cardiomyopathy FPGT-TNNI3K Hypertrophic cardiomyopathy BIRC5
Cardiac conduction disease with or without dilated cardiomyopathy TNNI3K Hypertrophic cardiomyopathy COX15
Cardiac valvular dysplasia, x-linked FLNA Hypertrophic cardiomyopathy CYP11B2
Cardiac valvular dysplasia, x-linked FOS Hypertrophic cardiomyopathy DMPK
Cardiac conduction defect, susceptibility to AKAP10 Hypertrophic cardiomyopathy EDN2
Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency 3 COA5 Hypertrophic cardiomyopathy FHL1
Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency 4 COA6 Hypertrophic cardiomyopathy FXN
Cardiofaciocutaneous syndrome BRAF Hypertrophic cardiomyopathy HLA-DPB1
Cardiofaciocutaneous syndrome 3 MAP2K1 Hypertrophic cardiomyopathy IGF1R
Cardiofaciocutaneous syndrome 4 MAP2K2 Hypertrophic cardiomyopathy IGFBP1
Cardiomyopathy ALMS1 Hypertrophic cardiomyopathy IGFBP3
Cardiomyopathy ANKRD1 Hypertrophic cardiomyopathy MMP2
Cardiomyopathy CAV3 Hypertrophic cardiomyopathy MRPL3
Cardiomyopathy DTNA Hypertrophic cardiomyopathy MRPS22
Cardiomyopathy EMD Hypertrophic cardiomyopathy MTO1
Cardiomyopathy GATAD1 Hypertrophic cardiomyopathy MUC16
Cardiomyopathy HOPX Hypertrophic cardiomyopathy MYBPC2
Cardiomyopathy ILK Hypertrophic cardiomyopathy MYL3
Cardiomyopathy JPH2 Hypertrophic cardiomyopathy MYOZ2
Cardiomyopathy MYL2 Hypertrophic cardiomyopathy MYOZ2
Cardiomyopathy PDLIM3 Hypertrophic cardiomyopathy NDUFV2
Cardiomyopathy SSUH2 Hypertrophic cardiomyopathy OBSCN
Cardiomyopathy SYNE1 Hypertrophic cardiomyopathy PRKAG2
Cardiomyopathy TTR Hypertrophic cardiomyopathy PTPN11
Cardiomyopathy FLNC Hypertrophic cardiomyopathy PYGB
Cardiomyopathy, dilated, 1cc NEXN Hypertrophic cardiomyopathy RAF1
Cardiomyopathy, dilated, 1d TNNT2 Hypertrophic cardiomyopathy SCO2
Cardiomyopathy, dilated, 1dd RBM20 Hypertrophic cardiomyopathy SLC25A3
Cardiomyopathy, dilated, 1l SGCD Hypertrophic cardiomyopathy SMARCA4
Cardiomyopathy, dilated, 1nn RAF1 Hypertrophic cardiomyopathy TMEM70
Cardiomyopathy, dilated, 1nn RAF1 Hypertrophic cardiomyopathy TNNT1
Cardiomyopathy, dilated, 1v PSEN2 Hypertrophic cardiomyopathy VWF
Cardiomyopathy, dilated, 1w; cardiomyopathy, hypertrophic, 15 VCL Infections, recurrent, associated with encephalopathy, hepatic dysfunction, and cardiovascular malformations FADD
Cardiomyopathy, dilated, 1z TNNC1 LCHAD deficiency, dilated cardiomyopathy HADHA
Cardiomyopathy, hypertrophic, 1, digenic MYLK2 Left ventricular noncompaction cardiomyopathy CTNNA3
Cardiomyopathy, hypertrophic, 16 MYOZ2 LEOPARD syndrome 1 PTPN11
Cardiomyopathy, restrictive|long qt syndrome ESR2 Linear skin defects with multiple congenital anomalies 3 NDUFB11
Cardiomyopathy|not provided FKTN Long qt syndrome 13 C11orf45
Cardiomyopathy|not provided TCAP Long QT syndrome 15 CALM2
Cardiomyopathy|not specified DES Long QT syndrome 2, acquired, susceptibility to KCNH2
Cardiomyopathy|not specified EYA4 Long QT syndrome, acquired, reduced susceptibility to ALG10
Cataract and cardiomyopathy AGK Mckusick-Kaufman syndrome MKKS
Catecholaminergic polymorphic ventricular tachycardia|ventricular tachycardia, catecholaminergic polymorphic, 4 CALM1 Microvascular complications of diabetes 3, Ischemic stroke, susceptibility to, Myocardial infarction, Stroke, hemorrhagic, susceptibility to ACE
Charge syndrome SEMA3E Mitochondrial DNA depletion syndrome 12 (cardiomyopathic type) SLC25A4
Chime syndrome PIGL Craniofacial dysmorphism, and congenital heart defects B3GAT3
Chops syndrome AFF4 Mycobacterium tuberculosis, susceptibility to, Spina bifida, susceptibility to, Coronary artery disease, modifier of, Coronary artery disease, development of, in hiv CCL2
Chronic atrial and intestinal dysrhythmia SGOL1 Myocardial infarction, Atherosclerosis, susceptibility to, HDL cholesterol, augmented response of ESR1
Combined oxidative phosphorylation deficiency 8 AARS2 Myocardial infarction, susceptibility to PSMA6
Congenital aneurysm of ascending aorta|loeys-dietz syndrome|loeys-dietz syndrome TGFBR2 Myocardial infarction, decreased susceptibility to F7
Congenital heart defect CRELD1 Myocardial infarction, protection against F13A1
Congenital heart defect EDNRA Myocardial infarction, susceptibility to 1 LRP8
Congenital heart defect GATA4 Myocardial infarction, susceptibility to 10 LGALS2
Congenital heart defect GATA6 Myocardial infarction, susceptibility to 2 GCLM
Congenital heart defect JAG1 Myocardial infarction, susceptibility to 3 TNFSF4
Congenital heart defect MTHFD1 Myocardial infarction, susceptibility to 4 LTA
Congenital heart defect STRA6 Myocardial infarction, susceptibility to 5 GCLC
Congenital heart defect CBS Myocardial infarction, susceptibility to 7 OLR1
Congenital heart defect JARID2 Myocardial infarction, susceptibility to 9 MIAT
Congenital heart defect MTRR Myopathy, early-onset, with fatal cardiomyopathy TTN-AS1
Congenital heart defects, nonsyndromic, 1, x-linked ZIC3 Myopathy, spheroid body MYOT
Congenital heart defects, hamartomas of tongue, and polysyndactyly WDPCP not provided, Cardiac arrest DSC2
Congenital heart defects, multiple types, 4 NR2F2 Oculo-facial-cardiac-dental syndrome (OFCD) BCOR
Congenital heart defects, nonsyndromic, 2 TAB2 Orthostatic intolerance SLC6A2
Congenital heart defects, with multiple joint dislocations RTN4 Paroxysmal atrial fibrillation LMNA
Congestive heart failure and beta-blocker response, modifier of ADRA2C Pericardial constriction and growth failure TRIM37
Congestive heart failure and beta-blocker response, modifier of ADRB1 Primary dilated cardiomyopathy BAG3
Conotruncal anomaly face syndrome TBX1 Primary dilated cardiomyopathy FHL2
Conotruncal heart malformations NKX2-6 Primary dilated cardiomyopathy PSEN2
Conotruncal heart malformations, variable NKX2-5 Primary dilated cardiomyopathy SYNE2
Coronary artery spasm 1, susceptibility to NOS3 Primary dilated cardiomyopathy, Cardiomyopathy, dilated, 1u, Heart failure PSEN1
Coronary heart disease 2 IL1B Primary dilated cardiomyopathy, Combined oxidative phosphorylation deficiency 3 TSFM
Coronary heart disease 6 MMP3 Primary dilated cardiomyopathy, Familial hypertrophic cardiomyopathy 4, Primary familial hypertrophic cardiomyopathy, Left ventricular noncompaction 10, Cardiomyopathy, Paroxysmal atrial fibrillation MYBPC3
Dilated cardiomyopathy ADIPOQ Primary dilated cardiomyopathy, Primary familial hypertrophic cardiomyopathy MYOM1
Dilated cardiomyopathy ADORA1 Primary dilated cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Sudden cardiac death, Cardiomyopathy, Left ventricular noncompaction cardiomyopathy RBM20
Dilated cardiomyopathy ADRB2 Primary familial hypertrophic cardiomyopathy ACADVL
Dilated cardiomyopathy CD40LG Primary familial hypertrophic cardiomyopathy CALR3
Dilated cardiomyopathy CD46 Primary familial hypertrophic cardiomyopathy DLG4
Dilated cardiomyopathy CDH2 Primary familial hypertrophic cardiomyopathy DLST
Dilated cardiomyopathy CHGA Primary familial hypertrophic cardiomyopathy KRAS
Dilated cardiomyopathy CHRM2 Primary familial hypertrophic cardiomyopathy LAMA4
Dilated cardiomyopathy CRP Primary familial hypertrophic cardiomyopathy MIB1
Dilated cardiomyopathy CTLA4 Primary familial hypertrophic cardiomyopathy TMPO
Dilated cardiomyopathy CXADR Primary familial hypertrophic cardiomyopathy TMPO-AS1
Dilated cardiomyopathy DAG1 Primary familial hypertrophic cardiomyopathy TRIM63
Dilated cardiomyopathy DCAF4 Primary familial hypertrophic cardiomyopathy TXNRD2
Dilated cardiomyopathy DNAJA3 Primary familial hypertrophic cardiomyopathy, Left ventricular noncompaction cardiomyopathy, Dilated cardiomyopathy 1G TTN
Dilated cardiomyopathy DNAJC19 Primary pulmonary hypertension 2 SMAD9
Dilated cardiomyopathy DVL2 Progressive familial heart block, type IB TRPM4
Dilated cardiomyopathy EDNRB Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia ACVRL1
Dilated cardiomyopathy ELMSAN1 Pulmonary hypoplasia-diaphragmatic hernia-anophthalmia-cardiac defect (PDAC) syndrome RARB
Dilated cardiomyopathy ERBB2 Ritscher-Schinzel syndrome KIAA0196
Dilated cardiomyopathy FAS Sensorineural deafness with hypertrophic cardiomyopathy MYO6
Dilated cardiomyopathy FKRP Stroke, susceptibility to ALOX5AP
Dilated cardiomyopathy HFE Sudden cardiac death, Cardiomyopathy, not provided GPD1L
Dilated cardiomyopathy HLA-G TARP syndrome RBM10
Dilated cardiomyopathy IFNG T-cell immunodeficiency, recurrent infections, and autoimmunity with or without cardiac malformations STK4
Dilated cardiomyopathy IFT43 Tetralogy of Fallot, Atrial septal defect 9 GATA6
Dilated cardiomyopathy IGF1 Thoracic aortic aneurysms and aortic dissections COL5A1
Dilated cardiomyopathy IL10 Thoracic aortic aneurysms and aortic dissections TGFBR1
Dilated cardiomyopathy IL17A Thrombophilia due to factor V Leiden, Ischemic stroke, susceptibility to F5
Dilated cardiomyopathy IL6 Thrombophilia, Ischemic stroke, susceptibility to F2
Dilated cardiomyopathy ITPR2 Ventricular septal defect HEATR4
Dilated cardiomyopathy LAMA2 Ventricular septal defect MLH3
Dilated cardiomyopathy LAMA3 Ventricular septal defect NEK9
Dilated cardiomyopathy LAMP2 Ventricular septal defect NOTCH1
Dilated cardiomyopathy MIR208A Ventricular septal defect IRX4
Dilated cardiomyopathy MMP1 Ventricular septal defect NFATC1
Dilated cardiomyopathy MMP14 Ventricular septal defect SALL4
Dilated cardiomyopathy MMP9 Ventricular septal defect TBX5
Dilated cardiomyopathy MURC Ventricular septal defect 1 GATA4
Dilated cardiomyopathy MYBPC1 Ventricular tachycardia, catecholaminergic polymorphic, 3 UNC5B
Dilated cardiomyopathy MYPN Ventricular tachycardia, catecholaminergic polymorphic, 5, with or without muscle weakness TRDN
Dilated cardiomyopathy NDUFV1 Ventricular tachycardia, somatic GNAI2
Dilated cardiomyopathy NEBL Ventricular tachycardia, somatic GPATCH2L