Assessment of Genetic Risk Factors for

Cardiovascular Diseases in Pakistani Population

A thesis submitted for partial fulfillment of the

requirement for the degree of Doctor of Philosophy

By

MUHAMMAD SHAKEEL

Dr. Panjwani Center for Molecular Medicine and Drug Research,

International Center for Chemical and Biological Sciences,

University of Karachi, Karachi-75270, Pakistan

January 2018

CERTIFICATE

TO WHOM IT MAY CONCERN

It is certified that the thesis entitled, “Assessment of Genetic Risk Factors for Cardiovascular Diseases in Pakistani Population”, submitted to the Board of Advanced Studies and Research (BASR), University of Karachi, by Mr. Muhammad Shakeel, fulfills the requirements for awarding the degree of Doctor of Philosophy (Ph.D.) in Molecular Medicine.

___________________
Dr. Ishtiaq Ahmad Khan
(Research Supervisor)
Assistant Professor
PCMD, ICCBS
University of Karachi, Karachi-75270
Pakistan

___________________
Prof. Dr. M. Iqbal Choudhary
(H.I., S.I., T.I.)
Director ICCBS
University of Karachi, Karachi-75270
Pakistan

Dedication

To

my

Loving Parents

and

Affectionate Siblings

Acknowledgements

First of all, I bow my head before Almighty Allah for His mercy and blessings. All love, respect and reverence to the Holy Prophet (Sallallaho Alaihe Wasallam) for enlightening souls with the light of knowledge.

I express my gratitude to Ms. Nadira Panjwani, H.I., S.I. (Chairperson, Dr. Panjwani Memorial Trust) for establishing the Dr. Panjwani Center for Molecular Medicine and Drug Research (PCMD) at the International Center for Chemical and Biological Sciences (ICCBS), University of Karachi. I am highly grateful to Prof. Dr. Atta-ur-Rahman, F.R.S., N.I., H.I., S.I., T.I. (Patron-in-Chief, ICCBS) for establishing the Jamil-ur-Rahman Center for Genome Research, and to Prof. Dr. M. Iqbal Choudhary, H.I., S.I., T.I. (Director, ICCBS) for leading this world-class institution to greater heights. I am deeply indebted to my research supervisor, Dr. Ishtiaq Ahmad Khan, for his stimulating personality, skillful guidance, keen interest, sincere advice and inspiration during the course of my work.

I am thankful to the Higher Education Commission, Pakistan, for awarding the Indigenous PhD Fellowship. I am greatly indebted to Prof. Dr. M. Kamran Azim for his help, expert opinion and guidance during my research work. I am highly grateful to Dr. Qasim Ayub from the Wellcome Trust Sanger Institute, Cambridge, for providing guidelines for the analysis. I also thank Dr. Waqasuddin Khan for helping with some of my data analysis work. I would also like to convey my deep gratitude to all my teachers at the International Center for Chemical and Biological Sciences (ICCBS), from whom I learnt a lot during my stay at ICCBS.

I am pleased to convey my thanks to my colleagues Muhammad Irfan and Atia Gohar for their suggestions and for their help whenever I needed it.

I am highly grateful for the prayers of my mother and siblings, especially my eldest brother, who not only encouraged me but also supported me throughout this work.

Muhammad Shakeel

Karachi, January 2018

Table of Contents

Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Summary
Summary in Urdu (خلاصہ)

1.0 Introduction
1.1 Cardiovascular Diseases
1.2 Prevalence of Cardiovascular Diseases
1.3 Risk Factors of Cardiovascular Diseases
1.4 Genetic Risk Factors for Cardiovascular Diseases
1.4.1 Genetics of Coronary Heart Disease and Myocardial Infarction
1.4.2 Genetics of Hypertension
1.4.3 Genetics of Congenital Heart Diseases
1.4.4 Genetics of Cardiomyopathies
1.5 Genetics of Obesity
1.6 Mutational Load for Cardiovascular Diseases
1.7 Genetic Research on Cardiovascular Diseases in Pakistan
1.8 Objectives of the Study

2.0 Materials and Methods
2.0 Scheme of Study
2.1 Estimating the Mutational Load for Cardiovascular Diseases in Pakistani Population and its Comparison with Global Populations
2.1.1 Genes Involved in Cardiovascular Diseases
2.1.2 Genomic/Exomic Datasets Used
2.1.3 The Analysis Pipeline
2.1.4 Filtration of Variants by ClinVar Database
2.1.5 Comparison of Allele Frequencies of Deleterious Variants of CVDs with Global Populations
2.1.6 Genetic Differentiation of Deleterious Variants
2.1.7 Linkage Analysis of Deleterious Variants
2.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia and Coronary Artery Disease
2.2.1 Samples Collection and DNA Isolation
2.2.2 DNA Quality Assessment and Quantification
2.2.3 Library Preparation and DNA Sequencing
2.2.3.1 Fragmentation of Genomic DNA
2.2.3.2 Mate-paired Library Preparation
2.2.3.3 Evaluation of the Library with Bioanalyzer
2.2.3.4 Preparation of Emulsion, Emulsion-PCR, and Beads Enrichment
2.2.3.5 3'-Modification of Template Beads
2.2.3.6 Loading the Flow Chip with Template Beads for Sequencing Reactions
2.2.4 Analysis of the Genomic Data
2.2.4.1 Filtration of Poor Quality Short Reads
2.2.4.2 Alignment of Short Reads with the Reference Human Genome
2.2.4.3 Post Alignment Processing and Variants Calling
2.2.5 Assessing the Genetic Variants related to Hyperlipidemia, and related Cardiac Disorders
2.3 Whole Exome Sequencing of Patients with Cardiomyopathy
2.3.1 Selection of Cardiomyopathy Patients
2.3.2 Collection of Blood Samples, and DNA Isolation and Quantification
2.3.3 Library Preparation and Exome Enrichment for Whole Exome Sequencing
2.3.3.1 Fragmentation of Genomic DNA
2.3.3.2 End-repair of the Fragmented DNA
2.3.3.3 Purification and Adenylation of End-repaired DNA
2.3.3.4 Ligation of Paired-end Adaptors
2.3.3.5 Amplification of Adaptors-ligated Library
2.3.3.6 Assessment of Quality and Quantity of the Amplified Library
2.3.3.7 Hybridization and Exome Capturing
2.3.3.8 Capturing the Hybridized DNA using Streptavidin-coated Beads
2.3.3.9 Amplification of Captured Library with Indexing Primers
2.3.3.10 Sequencing by Synthesis on Illumina Platform
2.3.4 Analysis of Whole Exome Sequencing Raw Data
2.3.5 Analysis of Variants for Cardiomyopathy

3.0 Results and Discussion
3.1 Mutational Load of Cardiovascular Diseases in Pakistani Population and its Comparison with Global Populations
3.1.1 Gene Ontology
3.1.2 Mutational Load of CVDs in Pakistani Population using 1000 Genomes PJL, ExAC SAS, and British Pakistanis Datasets
3.1.3 Filtration of Variants from ClinVar Database
3.1.4 Comparative Analysis of Allele Frequencies of Predicted Deleterious Variants
3.1.5 Functional Annotation of Deleterious Variants
3.1.6 Differentiation of Deleterious Variants in Pakistani Population
3.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia and Coronary Artery Disease
3.2.1 Quality Assessment of Genomic DNA
3.2.2 Fragmentation of Genomic DNA and Size Selection
3.2.3 Mate-Paired Library Preparation
3.2.4 Evaluation of the Mate-Paired Library
3.2.6 Analysis of Whole Genome Sequencing Data
3.2.7 Analysis for Deleterious Mutations Related to Hyperlipidemia and Related Cardiac Diseases
3.2.8 Filtration for Disease Mutations Related to Hyperlipidemia and Related Cardiac Diseases
3.3 Whole Exome Sequencing and Analysis of Pakistani Patients with Cardiomyopathy
3.3.1 Sequencing Reads
3.3.2 Quality Assessment of Raw Short Reads
3.3.3 Alignment with the Reference Genome and Variants Calling
3.3.4 Annotation of Single Nucleotide Variants (SNVs) and Analysis
3.3.4.1 Annotation with ANNOVAR, and CADD
3.3.4.2 Annotation with Variant Effect Predictor (VEP)
3.3.5 Annotation of Small Indels and Analysis
3.3.5.1 Annotation with CADD
3.3.5.2 Annotation with VEP
3.3.6 Filtration of Variants of ClinVar, OMIM, and GWAS databases

4.0 Conclusion
5.0 Publications
6.0 References
7.0 Appendix Table 1

List of Figures

Figure 1.1 Classical and new risk factors of CVDs.
Figure 1.2 Nephron and genes in the collecting duct and distal tubule involved in reabsorption of Na+ ions and resulting in hypertension.
Figure 1.3 Various forms of congenital heart diseases.
Figure 1.4 A schematic short axis cross-sectional view of heart representing various forms of cardiomyopathies.
Figure 2.1 The outline of methodology for determining the genetic risk factors for CVDs in Pakistani population.
Figure 2.2 Number of genes analyzed for common, Mendelian and congenital CVDs in this study.
Figure 2.3 The pipeline to find and analyze the deleterious variants related to cardiac diseases in Pakistani population.
Figure 2.4 The reactions of sequencing by oligonucleotide ligation and detection (SOLiD) technology.
Figure 2.5 NGS workflow for fragment library preparation and paired-end sequencing on Illumina.
Figure 3.1 Functional categorization of genes involved in cardiovascular diseases.
Figure 3.2 The proportions of nonsynonymous, synonymous, and deleterious SNVs in three datasets.
Figure 3.3 The number of SNVs predicted as deleterious by CADD, Polyphen2, and SIFT in genes of cardiovascular diseases.
Figure 3.4 Chromosomal positions of deleterious variants in TTN. The deleterious variants are bunched in initial exons of the gene.
Figure 3.5 ClinVar's pathogenic and likely pathogenic variants from ExAC SAS having significantly higher allele frequency in SAS than in other populations.
Figure 3.6 Mutational load of different cardiovascular disorders in terms of allele counts of ClinVar's pathogenic and likely pathogenic variants.
Figure 3.7 Chromosomal positions of genes harboring the ClinVar's pathogenic and likely pathogenic variants associated with cardiovascular diseases.
Figure 3.8 Allele frequency spectrum (AFS) of deleterious SNVs in three datasets: (A) 1000 Genomes PJL, (B) ExAC South Asians, and (C) British Pakistanis.
Figure 3.9 Allele frequency spectrum using the common deleterious SNVs of DAF ≥ 10% of three datasets.
Figure 3.10 Comparative distribution of allele frequencies of shared deleterious SNVs in PJL versus all continental groups of 1000 Genomes Project.
Figure 3.11 Manhattan plot for FST values between the PJL versus SAS populations of 1000 Genomes Project.
Figure 3.12 Comparison of the proportions of moderately, greatly, and severely differentiated deleterious SNVs and all SNVs in genes harboring deleterious SNVs.
Figure 3.13 Principal Components Analysis (PCA) using the genes-set of CVDs. A. PCA using all low and rare allele frequency (AF ≤ 5.0%) SNVs. B. PCA using all common allele frequency (AF > 5.0%) SNVs. C. PCA using deleterious low and rare allele frequency (AF ≤ 5.0%) SNVs. D. PCA using deleterious common allele frequency (AF > 5.0%) SNVs.
Figure 3.14 Site frequency spectrums for PJL, 5 other populations of 1000 Genomes Project, and one Southeast Asian population 'Malay', using the data of same number of individuals (n=96) of each population for normalization. A. Comparison of low frequency deleterious SNVs in genes set of CVDs. B. Percent homozygous deleterious SNVs in each population.
Figure 3.15 Agarose gel electrophoresis of genomic DNA isolated from obese individual.
Figure 3.16 A. Fragmentation of genomic DNA using the Covaris S220 system. B. Size selection by slicing the most intense part of fragmented DNA.
Figure 3.17 A schematic illustration of one fragment of mate-paired library.
Figure 3.18 A 2% E-Gel showing the position of mate-paired library in lane no. 2.
Figure 3.19 Evaluation of the mate-paired library by Bioanalyzer 2100.
Figure 3.20 Distribution of the depth (DP) of variants.
Figure 3.21 The predicted deleterious variants with SIFT, Polyphen2, and CADD.
Figure 3.22 Validated deleterious SNVs having higher allele frequency in SAS populations than in global populations.
Figure 3.23 Comparison of Global and South Asian allele frequencies for variants of hyperlipidemia (blue) and ischemic heart diseases (red).
Figure 3.24 Phred quality score distribution of forward and reverse 'fastq' files.
Figure 3.25 Insert size for all the five bam files.
Figure 3.26 Histogram for the depth of coverage for SNPs (A) and indels (B).
Figure 3.27 Venn diagram showing the number of SNVs predicted as deleterious by SIFT, Polyphen2, and with CADD_phred score ≥ 15.
Figure 3.28 The SNVs predicted as deleterious by SIFT.
Figure 3.29 The SNVs predicted as deleterious by Polyphen2.
Figure 3.30 The SNVs with CADD_phred score ≥ 15.
Figure 3.31 The SNVs jointly predicted as deleterious by CADD (phred score ≥ 15), SIFT, and Polyphen2 tools.
Figure 3.32 Site Frequency Spectrum of all SNVs (A), and deleterious SNVs (B).
Figure 3.33 Scatter plot of 350 deleterious SNVs for comparison of derived allele frequencies in South Asia and in Global populations.
Figure 3.34 Numbers of Loss of Function SNPs according to functional consequences.
Figure 3.35 Loss of Function (LoF) SNVs. (A) Allele frequency spectrum of all LoF SNVs in South Asia. (B) Genomic evolutionary rate profiling (GERP++) scores for LoF SNVs.
Figure 3.36 Functional consequences of indels with CADD_phred ≥ 15.
Figure 3.37 Loss of Function indels according to functional consequences.

List of Tables

Table 1.1 Estimated disability adjusted life years (DALYs) due to CVDs in Pakistan during the period of 2000-2015.
Table 2.1 Populations of 1000 Genomes Project used for principal components analysis (PCA).
Table 2.2 Covaris protocol for fragmenting genomic DNA.
Table 2.3 PCR conditions for the amplification of mate-paired library.
Table 2.4 Components for preparing the emulsion for ePCR.
Table 2.5 Determining the amount of template to be used in emulsion preparation, using the e-calculator-Life Technologies.
Table 2.6 Settings on the Covaris instrument for gDNA fragmentation.
Table 2.7 Components of End Repair master mix.
Table 2.8 Components of Adenylation master mix.
Table 2.9 Components for ligation of paired-end adaptors.
Table 2.10 Components for amplifying the library.
Table 2.11 PCR program for amplification of adaptor ligated library.
Table 2.12 Components of Block Mix.
Table 2.13 Components of Hybridization Buffer.
Table 2.14 Components of Capture Library Hybridization Mix for capture size ≥ 3 Mb.
Table 2.15 Components of PCR for indexing.
Table 2.16 PCR program for indexing the library.
Table 3.1 The subset of variants within the coordinates of genes-set of CVDs.
Table 3.2 Genes of Mendelian and congenital CVDs containing high number of predicted deleterious variants in ExAC SAS.
Table 3.3 Genes of common, Mendelian and congenital CVDs containing high number of predicted deleterious variants in British Pakistanis.
Table 3.4 ClinVar's pathogenic and likely pathogenic variants filtered from 1000 Genomes PJL dataset.
Table 3.5 ClinVar's pathogenic and likely pathogenic variants filtered from ExAC SAS dataset.
Table 3.6 ClinVar's pathogenic and likely pathogenic variants filtered from British Pakistanis dataset.
Table 3.7 The proportion of shared deleterious SNVs (sdSNVs) with other populations of 1000 Genomes Project and ExAC.
Table 3.8 Deleterious LoF SNVs filtered from ExAC SAS dataset in genes of Mendelian and congenital CVDs.
Table 3.9 Novel deleterious SNVs filtered from British Pakistanis dataset in genes of CVDs.
Table 3.10 Deleterious SNVs greatly and severely differentiated in PJL than in global populations of 1000 Genomes Project.
Table 3.11 The number of variants in different genomic regions as calculated from ANNOVAR annotation.
Table 3.12 The 27 predicted deleterious non-synonymous SNVs in hyperlipidemia proband in genes of CVDs.
Table 3.13 Common variants associated with hyperlipidemia and CAD filtered from GWAS-Catalogue and having 1.5 fold or higher allele frequency in SAS than in Global populations.
Table 3.14 Quality assessment of raw reads in CMP patients' fastq files.
Table 3.15 Mapped reads and raw depth of coverage for BAM files.
Table 3.16 Numbers of variants after applying different filters.
Table 3.17 Numbers of variants after applying different filters.
Table 3.18 The number of SNVs pertaining to different genomic regions and functions after annotation with ANNOVAR.
Table 3.19 The top 1% genes containing nonsynonymous mutations.
Table 3.20 The homozygous deleterious SNVs present in all five patients of this study.
Table 3.21 The homozygous deleterious SNVs with Global MAF < 1%.
Table 3.22 The LoF SNVs affecting all transcripts of their genes.

Abbreviations

Abbreviations Description

AFR African

AFS Allele Frequency Spectrum

AMR American

ANNOVAR Annotation of Variants

BAM Binary Alignment Map

BMI Body Mass Index

BWA Burrows-Wheeler Aligner

CAD Coronary Artery Disease

CADD Combined Annotation Dependent Depletion

CHD Congenital Heart Disease

ClinVar Clinical Variation

CTAB Cetyltrimethylammonium Bromide

CVDs Cardiovascular Diseases

DAF Derived Allele Frequency

DALYs Disability Adjusted Life Years

DCM Dilated Cardiomyopathy

DNA Deoxyribonucleic Acid

DOAF Disease Ontology Annotation Framework

DP Depth

EAS East Asian

EDTA Ethylenediaminetetraacetic Acid

ePCR Emulsion Polymerase Chain Reaction

EUR European

ExAC Exome Aggregation Consortium

FHS Framingham Heart Study

FIN Finnish European

FST Fixation Index

GATK Genome Analysis Toolkit

GERP Genomic Evolutionary Rate Profiling

GQ Genotype Quality

GWAS Genome Wide Association Studies

HapMap Haplotype Map

HCM Hypertrophic Cardiomyopathy

HDL High Density Lipoprotein

HPO Human Phenotype Ontology

ICBP International Consortium For Blood Pressure

ICD-10 International Classification Of Diseases - 10

LDL Low Density Lipoprotein

LoF Loss of Function

Mb Mega Bases

NFE Non-Finnish European

NGS Next Generation DNA Sequencing

Nonsyn Nonsynonymous

OMIM Online Mendelian Inheritance in Man

PCA Principal Components Analysis

PCR Polymerase Chain Reaction

PJL Punjabi Lahore, Pakistan

Polyphen2 Polymorphism Phenotyping v2

PROMIS Pakistan Risk of Myocardial Infarction Study

QUAL Quality Score

SAM Sequence Alignment Map

SAS South Asian

SFS Site Frequency Spectrum

SIFT Sorting Intolerant from Tolerant

SNPs Single Nucleotide Polymorphisms

SNV Single Nucleotide Variation

SOLiD Sequencing by Oligonucleotide Ligation and Detection

Syn Synonymous

T2D Type 2 Diabetes Mellitus

TAE buffer Tris-Acetate EDTA Buffer

TE buffer Tris-EDTA Buffer

TG Triglycerides

Ti/Tv Transitions/Transversions

ToF Tetralogy of Fallot

UCSC University of California, Santa Cruz

VEP Variant Effect Predictor

VSD Ventricular Septal Defects

WHO World Health Organization

Summary

Cardiovascular diseases (CVDs) are the leading cause of death, accounting for 17.7 million deaths globally every year. The prevalence of CVDs in Pakistan is also considerably high. CVDs are multifactorial: many risk factors, including genetic predisposition, are involved in their pathophysiology. Genetically, CVDs may be monogenic or polygenic, and the genetic predisposition to cardiac disorders is heterogeneous across the populations of the world. This study aims to investigate the genetic risk factors related to CVDs in the Pakistani population.

In this study, whole genome sequencing data of Pakistani individuals (PJL) from the 1000 Genomes Project (n=96), whole exome sequencing data of South Asians (SAS) from the Exome Aggregation Consortium (ExAC), predominantly containing individuals from Pakistan (n=8256), and whole exome sequencing data of British Pakistanis (n=3222) were analyzed using different bioinformatics tools against a manually curated list of 1187 genes associated with major CVDs. The analysis of genetic variants with the ANNOVAR and CADD tools highlighted 561 deleterious variants from the 1000 Genomes PJL, 7374 deleterious variants from the ExAC (SAS), and 6028 deleterious variants from the British Pakistanis datasets in protein coding regions. The analysis with VEP showed 3 Loss of Function variants from the 1000 Genomes PJL, 30 from the ExAC (SAS), and 29 from the British Pakistanis datasets. Further, filtration against the ClinVar database revealed 3 pathogenic and 2 likely pathogenic variants from the 1000 Genomes PJL, 112 pathogenic and 42 likely pathogenic variants from the ExAC (SAS), and 42 pathogenic and 16 likely pathogenic variants from the British Pakistanis datasets.

The comparative analysis of prioritized deleterious variants showed many variants

having two fold or higher allele frequency in Pakistani population than in other

populations of the world. Likewise, the population differentiation analysis highlighted

10 deleterious SNVs greatly differentiated from world populations and 02 deleterious

SNVs moderately differentiated from other South Asian populations. The principal

components analysis showed the grouping of Pakistani and other South Asian

populations with Europeans and Americans for deleterious mutations of CVDs.

To further analyze the filtered data for CVDs, whole genome sequencing of an

individual with hyperlipidemia, obesity, and coronary artery disease was carried out

using the SOLiD 5500xl NGS system, and whole exome sequencing of 05 patients with

dilated cardiomyopathy was carried out using the Illumina NGS system. After variant

calling and application of the same analysis pipeline, 27 deleterious SNVs were observed in

25 genes associated with hyperlipidemia and risk of coronary artery disease. Two

genes, MTRR (methionine synthase reductase), and PLB1 (Phospholipase B1)

contained two deleterious variants each, and are associated with low levels of low

density lipoprotein-cholesterol (LDL-C) and risk of coronary artery disease.

Furthermore, 11 deleterious variants, also filtered from the healthy dataset, were

observed to have significantly higher allele frequency in SAS populations than in other

populations of the world. In addition, two genes, KCNJ12 (potassium voltage-gated

channel subfamily J member 12) and CDC27 (cell division cycle 27 protein), were

identified as having the highest number of deleterious nonsynonymous and non-coding

variants.

From whole exome analysis of 05 dilated cardiomyopathy patients, 54 variants were

identified in genes associated with dilated cardiomyopathy, which were prioritized in

mutational load analysis as well. Here, the highest number of deleterious variants was

observed in TTN (titin) and MUC19 (Mucin 19) genes. Also, there were 19 deleterious

SNVs in the homozygous state with global minor allele frequency < 1.0%. Overall, 278

deleterious SNVs had a higher allele frequency in SAS than in other populations

of the world. Further, three rare (allele frequency, AF < 1%) loss of function SNVs in the

C2orf40, MYOM3, and TMED4 genes, a homozygous frameshift insertion in RTKN2,

and a homozygous splice site deletion in SLC6A6 were found in at least one of the

patients.

To conclude, this study comprehensively presents a picture of deleterious mutations

for cardiac disorders in Pakistani population. The mutational load for major CVDs in a

descending order was for hypertension, atherosclerosis, coronary aneurysm, heart

failure, coronary artery disease, cardiomyopathies, cardiac arrhythmias, and

congenital heart defects. The effect of this genetic predisposition (which is a

non-modifiable risk factor) can be suppressed by minimizing the modifiable risk factors,

for example by adopting a healthy lifestyle.

Summary (translated from Urdu)

Heart diseases are the leading cause of death in the world. According to the World Health Organization, approximately 17.7 million deaths occur worldwide every year because of these diseases. The rate of heart diseases is also quite high in Pakistan. These diseases have many causes, including hereditary factors. Hereditary factors involve the genetic material (genes) present in human cells. In some heart diseases a change in one or a few genes causes the disease, whereas in other heart diseases many genes are responsible. This research studies the gene changes found in Pakistani people that cause heart diseases.

To analyze the gene changes of Pakistani people, we obtained whole genome and whole exome data from three databases and analyzed the genes linked to heart diseases. These comprised the whole genome data of 96 Pakistani individuals from the 1000 Genomes Project, the whole exome data of 8256 individuals (including Pakistanis) from the ExAC project, and the whole exome data of 3222 Pakistani individuals living in Britain. The data were analyzed with different computer programs such as ANNOVAR, CADD, and Variant Effect Predictor. As a result, 561 harmful changes were found in genes related to heart diseases in the Pakistani individuals of the 1000 Genomes Project, 7374 in the ExAC data, and 6028 in British Pakistanis. Many of these harmful changes were considerably more frequent in Pakistani people than in other populations of the world, including 12 harmful changes for which the difference in Pakistani people was very large. In addition, 3 changes that completely stop protein production were found in heart disease genes in the Pakistani individuals of the 1000 Genomes Project, along with 30 such changes in the ExAC data and 29 in British Pakistanis.

To validate the genetic changes identified as linked to heart diseases, we determined the whole genome sequence of one obese Pakistani individual and the whole exome sequences of 5 patients with heart muscle disease using next generation DNA sequencing. This analysis confirmed 27 harmful genetic changes in 25 different genes in the obese individual. Two harmful changes each were found in the MTRR and PLB1 genes; both genes are involved in raising the level of fat in the body and in coronary artery disease. Likewise, 54 harmful genetic changes were confirmed in the patients with heart muscle disease, with the highest numbers found in the TTN and MUC19 genes. In addition, two changes that disrupt protein function were found in the CLDN5 and LPL genes of the obese individual; these raise the level of cholesterol in the body.

In brief, the largest number of genetic changes in Pakistani people was found for high blood pressure, followed by genetic changes related to the accumulation of fat in the blood vessels, heart failure, coronary artery disease, heart muscle disease, long QT syndrome, and congenital heart diseases.

Chapter 1.0

Introduction

1.1 Cardiovascular Diseases

Cardiovascular disease (CVD) is any disorder of the heart and the blood vessels. It is

a group of disorders which includes coronary artery disease (coronary heart disease),

cerebrovascular disease (stroke), peripheral arterial disease, hypertension, rheumatic

heart disease, congenital heart disease, cardiomyopathies, cardiac arrhythmias, deep

vein thrombosis and pulmonary embolism (World Health Organization, 2017a). CVDs

are multifactorial in nature. Several environmental and genetic factors are involved in

the pathophysiology of these disorders (O'Donnell, and Nabel, 2011). Conditions

such as coronary heart disease, stroke, and peripheral arterial disease involve the

restriction of blood flow through the arteries of the heart, brain, and peripheral organs,

respectively (British Heart Foundation, 2017). Hypertension is a condition in which

blood flows through the blood vessels with a force greater than normal. In rheumatic

heart disease, the heart muscles or valves are damaged due to infection with

streptococcal bacteria. Congenital heart diseases are malformations of the heart or

related vessels present at birth. In deep vein thrombosis, a blood clot in a peripheral vein,

e.g. in the leg, can hamper the normal blood flow to the heart, or it can dislodge and travel

to the heart or lungs, causing pulmonary embolism (World Health Organization, 2017a).

1.2 Prevalence of Cardiovascular Diseases

Cardiovascular diseases are the leading cause of death globally. Approximately 17.7

million deaths occur due to CVDs every year, which accounts for 31% of all the global

deaths. More than 75% of deaths due to CVDs occur in low- and middle-income

countries (World Health Organization, 2017a). East Asia, Southeast Asia, and South

Asia, where Pakistan is located, have had the largest increase in premature mortality due

to CVDs over the past 20 years (Roth et al., 2015). Although data on the prevalence of

CVDs in Pakistan remain sparse (Aziz, Faruqui, Patel, and Jaffery, 2012),

Pakistan faces a considerable load of CVDs in terms of morbidity and mortality. The World

Health Organization reported 11.473 million disability adjusted life years (DALYs) due

to CVDs in Pakistan during 2000-2015 (Table 1.1), which was 30.84% of the burden

by non-communicable diseases in this country (World Health Organization, 2016).

Table 1.1: Estimated disability adjusted life years (DALYs) due to CVDs in Pakistan during the period of 2000-2015.

S.No   Diseases                                     Estimated DALYs (× 1000)
1.     Ischemic heart disease                       6178.1
2.     Stroke                                       2729.6
3.     Congenital heart anomalies                   1198.5
4.     Rheumatic heart disease                      677.7
5.     Hypertensive heart disease                   422.3
6.     Cardiomyopathy, myocarditis, endocarditis    102.4
7.     Other circulatory diseases                   164.1
       TOTAL                                        11472.7
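
The total in Table 1.1 and each condition's share of the cardiovascular burden can be checked with a short calculation. The following Python sketch is illustrative only: the figures are those listed in the table, while the names `DALYS_THOUSANDS`, `total_dalys`, and `shares` are assumptions introduced here.

```python
# Estimated DALYs (in thousands) due to CVDs in Pakistan, 2000-2015,
# copied from Table 1.1.
DALYS_THOUSANDS = {
    "Ischemic heart disease": 6178.1,
    "Stroke": 2729.6,
    "Congenital heart anomalies": 1198.5,
    "Rheumatic heart disease": 677.7,
    "Hypertensive heart disease": 422.3,
    "Cardiomyopathy, myocarditis, endocarditis": 102.4,
    "Other circulatory diseases": 164.1,
}


def total_dalys(table):
    """Sum the per-disease DALYs (in thousands), rounded to one decimal."""
    return round(sum(table.values()), 1)


def shares(table):
    """Return each disease's percentage share of the CVD total."""
    total = sum(table.values())
    return {name: 100 * value / total for name, value in table.items()}


if __name__ == "__main__":
    print(f"Total: {total_dalys(DALYS_THOUSANDS)} thousand DALYs")
    for name, pct in shares(DALYS_THOUSANDS).items():
        print(f"{name}: {pct:.1f}%")
```

Run as a script, this prints the total together with the percentage contributed by each row, which makes it easy to see that ischemic heart disease and stroke account for most of the listed burden.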

1.3 Risk Factors for Cardiovascular Diseases

A risk factor can be defined as a variable that shows a statistically significant association

with a clinical condition (Brotman, Walker, Lauer, and O'Brien,

2005). Risk factors are important for assessing predisposition to diseases, enabling

better prevention and control. The risk factors of CVDs were initially determined by

epidemiology-based approaches. For example, using a prospective design, the Framingham

Heart Study (FHS) identified that factors such as male sex, age, smoking,

hypertension, and diabetes mellitus are related to the risk of developing CVDs

(Dawber et al., 1959). Later, the risk factors for CVDs were investigated through

empirical studies using a case-control approach, whereby circulating lipids, especially low

density lipoprotein (LDL) cholesterol, were found to be associated with the development

of coronary heart disease (Kannel, Dawber, Friedman, Glennon, and Mcnamara,

1964; Kannel, Dawber, Kagan, Revotskie, and Stokes, 1961). In recent years, large

case-control studies such as the INTERHEART study have identified nine risk factors for

susceptibility to myocardial infarction: smoking, raised ApoB/ApoA1

(atherogenic/atheroprotective lipids) ratio, hypertension, diabetes, abdominal obesity,

psychosocial factors, decreased daily intake of fruits and vegetables, regular alcohol

consumption, and decreased physical activity (Yusuf et al., 2004). At present, more

than 100 risk factors have been identified to be linked with various cardiac diseases

(Brotman, Walker, Lauer, and O'Brien, 2005). CVD risk factors have been

divided into two broad categories, i.e., classical risk factors and new risk factors. The

classical risk factors are further divided into two classes i.e., modifiable risk factors and

non-modifiable risk factors (Figure 1.1).

Figure 1.1: Classical and new risk factors of CVDs (Badimon, and Vilahur, 2012).

1.4 Genetic Risk Factors for Cardiovascular Diseases

There are multiple factors involved in the pathogenesis of cardiovascular diseases

including both the environmental and genetic factors. The interplay between these two

types of risk factors is quite complex and their contribution to the onset of diseases

differs for different CVDs and the individual patients (Delles, McBride, Padmanabhan,

and Dominiczak, 2008). Most CVDs result from the complex interaction of

many genes on diverse loci, apart from the gene-environment interactions (Kelly, and

Fuster, 2010). Hereditary risk factors, when combined with modifiable risk

factors such as smoking, alcoholism, and lack of physical activity, increase

susceptibility to heart diseases (Centers for Disease Control and

Prevention, 2017). Studies on the genetic predisposition to CVDs started

about 30 years ago, with the expectation of identifying genetic variants that could be

incorporated into risk assessment models based on modifiable risk factors. This extensive

research showed that CVDs are quite heterogeneous genetically (Cambien, and Tiret,

2007). Based on these findings, CVDs have been divided into two groups, i.e.

monogenic and polygenic. The monogenic forms of CVDs are rare and are caused by

mutations in a single gene, e.g. hypertrophic and dilated cardiomyopathy, long-QT

syndrome, and other channelopathies. Certain Mendelian disorders also contribute to

the onset of CVDs, e.g. the familial form of hypercholesterolemia leads to the

manifestation of coronary heart disease, peripheral artery disease, and stroke. The

incidence of such cardiac disorders is also higher in individuals with a family history of the disease. On

the other hand, polygenic CVDs are complex and multifactorial, e.g.

hypertension, myocardial infarction, coronary artery disease, and aortic aneurysm.

These common forms of CVDs have been found to be caused by genetic variation in

multiple genes, each of which shows little effect alone but contributes to the symptoms of the

disorder when acting in combination with causal or modifier genes. Some rare variants

also pose risk for such common CVDs (Arnett et al., 2007; Faita, Vecoli, Foffa, and

Andreassi, 2012; O'Donnell, and Nabel, 2011). A brief review of the genetic basis of some

highly prevalent CVDs is given here.

1.4.1 Genetics of Coronary Heart Disease and Myocardial Infarction

Familial and epidemiological studies over several decades have demonstrated

that 40-60% of the risk of coronary heart disease is hereditary. The follow-up studies of

the Framingham Study also showed that the susceptibility to coronary heart disease

was increased 2.4-fold in men and 2.2-fold in women with a

family history of this disease (Ozaki, and Tanaka, 2016). The first genetic risk locus for

myocardial infarction and early onset of coronary artery disease was identified at band

21.3 of the short arm (p) of chromosome 9 (9p21). The common variants adjacent to

CDKN2A and CDKN2B on this locus were found to pose 2.02 fold higher risk of early

onset of the disease (Helgadottir et al., 2007). The genomic scale studies to decipher

the genetic risk factors for CVDs using large cohort of cases and controls such as

Coronary Artery Disease Genome-Wide Replication and Meta-Analysis

(CARDIoGRAM) identified 13 novel loci to be associated with coronary artery disease,

in addition to confirming 10 previously identified loci (Schunkert et al., 2011). Likewise,

the Coronary Artery Disease (C4D) Genetics Consortium identified 5 new loci by

genome wide association studies (GWAS) from 21,408 cases of CAD and 19,185

controls (Coronary Artery Disease (C4D) Genetic Consortium, 2011). Merging of

these two large studies led to the formation of a new consortium

CARDIoGRAMplusC4D, which identified 15 novel risk loci for coronary artery disease

(Deloukas et al., 2013). In addition to these consortia, many independent genetic

studies specific to certain populations identified more genetic loci associated with

coronary artery disease, making a total number of 51 risk loci. Many of these risk

variants are involved in lipid metabolism including LDL and cholesterol metabolism. In

addition, some identified variants are involved in inflammation, cell proliferation and

differentiation, and vasoconstriction. However, the underlying mechanism of some

variants by which these variants pose risk to coronary artery disease are still unknown

(Ozaki, and Tanaka, 2016). DNA sequencing of protein coding regions (whole exome)

of large cohorts has also led to the identification of many genes carrying a substantial number

of deleterious variants in CAD cases as compared to controls. Whole exome

sequencing of families with myocardial infarction highlighted the role of the GUCY1A3 and

CCT7 genes, which are involved in nitric oxide signaling pathways (Erdmann et al.,

2013). Loss of function mutations in the gene APOC3, which encodes apolipoprotein C-III,

have been associated with lower triglyceride levels and a reduced risk of CAD (TG and HDL Working Group of Exome

Sequencing Project, 2014). Whole exome sequencing of about 5000 cases of early

onset myocardial infarction revealed the role of detrimental mutations in APOA5 and

LDLR genes (Do et al., 2015).

1.4.2 Genetics of Hypertension

High blood pressure is a major contributor to cardiovascular diseases and can lead

to ischemic heart disease or stroke. Studies have shown the contribution of genetic

factors in about 50% of hypertension cases (Jeanemaitre, Gimenez-Roqueplo, Disse-

Nicodeme, and Corvol, 2007). Like the atherosclerotic CVDs, hypertension is also a

complex genetic trait, which is caused by the variations in multiple genes, because the

blood pressure in body is maintained by quite a complicated network of physiological

systems including vascular, renal, endocrine, and neuronal mechanisms (Doris, 2002).

It was noted that hypertension was caused mostly by the mutations in genes affecting

the renin-angiotensin-aldosterone system which controls salt-water homeostasis in the

body and maintains normal blood pressure (Lifton, Gharavi, and Geller, 2001).

Mutations in the gene SCNN1B, which encodes the sodium channel epithelial 1 beta

subunit, cause the number of sodium channels to be increased in the apical

membrane. This sodium channel is involved in the reabsorption of sodium in the renal

tubule. Increasing the number of such channels causes increased reabsorption of

sodium across the apical membrane, thus raising the blood pressure (Hansson et al.,

1995). Mutations in the gene encoding 11-beta-hydroxysteroid dehydrogenase, type II

(HSD11B2) result in an excess of mineralocorticoids, which also causes increased

renal absorption of sodium. Mutations in the serine-threonine kinases

encoded by the WNK1 and WNK4 genes have also been found to be linked with hypertension

(Wilson et al., 2001). Other studies have also identified mutations in genes encoding

sodium/chloride transporters and related channels that disturb salt-water homeostasis and

blood pressure regulation, such as the solute carrier family 12, member 3 gene (SLC12A3) (Simon,

Nelson-Williams, et al., 1996), the solute carrier family 12, member 1 gene (SLC12A1)

(Simon, Karet, et al., 1996), the inwardly rectifying potassium channel, subfamily J,

member 1 gene (KCNJ1) (DiPietro, Trachtman, Sanjad, and Lifton, 1996), the

chloride voltage-gated channel Kb gene (CLCNKB) (Stonez et al., 1997), and the genes encoding

the non-voltage gated sodium channel epithelial 1 beta subunit (SCNN1B)

and the non-voltage gated sodium channel epithelial 1 gamma subunit (SCNN1G)

(Chang et al., 1996).

Figure 1.2: Nephron and genes in the collecting duct and distal tubule involved in reabsorption of Na+ ions and resulting in hypertension (Luft, 2017).

Recent large scale genome wide association studies (GWAS) and their meta-analyses

have led to the identification of many risk loci linked with primary hypertension.

The International Consortium for Blood Pressure Genome-Wide Association Studies

(ICBP-GWAS) reported 28 loci for systolic and diastolic blood pressures (International

Consortium for Blood Pressure Genome-Wide Association Studies, 2011). Further,

large independent studies on the genetics of hypertension have elucidated other loci

linked with hypertension. To date, 185 single nucleotide polymorphisms (SNPs) at

various loci have been catalogued to be associated with hypertension (Hindorff,

Junkins, Mehta, and Manolio, 2011).

1.4.3 Genetics of Congenital Heart Diseases

Congenital heart disease (CHD) is a malformation of the heart present at birth. CHDs

are the most common form of birth abnormality, accounting for up to one third of all

major birth defects (van der Bom, Bouma, Meijboom, Zwinderman, and Mulder, 2012).

This group of CVDs comprises structural abnormalities of the heart such as

abnormalities of the cardiac valves and cardiac septum, and lesions of the blood

outflow tract. This includes simple heart defects such as atrial septal defects (ASD),

ventricular septal defects (VSD), patent ductus arteriosus (PDA), and pulmonary valve

stenosis, and complex defects such as Tetralogy of Fallot (ToF), which is

a combination of four defects of the heart, i.e., a VSD, pulmonary valve stenosis, right

ventricular hypertrophy, and overriding aorta (National Heart Lung and Blood Institute,

2017).

Figure 1.3: Various forms of congenital heart diseases

Genetically, congenital heart diseases are also heterogeneous. The genetic evidence

of CHD started with the finding of de novo deletions at chromosome 22q11 locus, and

chromosome 21 trisomy (Antonarakis, Lyle, Dermitzakis, Reymond, and Deutsch,

2004; Goldmuntz, 2005). Different studies showed that mutations in genes which are

involved in cardiac development such as NKX2-5 gene of homeodomain protein

(Schott et al., 1998), GATA4 which encodes GATA Binding Protein 4 (Garg, Kathiriya,

Barnes, and Schluterman, 2003), and NOTCH1 gene of a transmembrane protein of

NOTCH family (Garg, Muth, Ransom, and Schluterman, 2005) lead to the

manifestation of various forms of CHD. Further studies led to the identification of many

structural variations in different chromosomes associated with high penetrance of

CHDs, such as trisomy of chromosome 13, trisomy of chromosome 18, and deletions at the

22q11.2, 7q11.23, and 5p15.2 loci (Fahed, Gelb, Seidman, and Seidman, 2013).

The mutations in important cardiac transcription factors resulting in haploinsufficiency

are responsible for inherited and sporadic congenital heart diseases (Pulignani,

Cresci, and Andreassi, 2013). This includes de novo substitution in NR2F2 gene which

encodes a pleiotropic developmental transcription factor causing the atrioventricular

septal defect (Al Turki et al., 2014), and mutations in transcription factors belonging to

the T-box subfamily, such as TBX3 and TBX5, which play a role in developing and

maintaining the cardiac conduction system (Postma, Bezzina, and Christoffels, 2016).

Many mutations in regulatory regions such as promoters and enhancers of some

genes have also been identified to be linked with CHDs. The variations in regulatory

regions of genes predispose or cause the disease by altering the binding of

transcription factors and changing the gene expression. To date, more than 50 human

genes have been identified which are involved in different congenital heart

abnormalities (Postma, Bezzina, and Christoffels, 2016).

1.4.4 Genetics of Cardiomyopathies

Cardiomyopathies are a group of cardiac disorders which involve structural and

functional abnormalities of the heart muscle. The definition of cardiomyopathy excludes

muscle damage secondary to hypertension, coronary artery disease, congenital heart disease,

and valvular heart disease, because these conditions also damage the heart muscle (Elliott, 2000).

Cardiomyopathies have been classified based on the abnormalities and their

localization in the heart muscles which includes hypertrophic cardiomyopathy (HCM),

dilated cardiomyopathy (DCM), restrictive cardiomyopathy (RCM), and arrhythmogenic

right ventricular dysplasia (ARVD) (Figure 1.4).

Figure 1.4: A schematic short axis cross-sectional view of heart representing various forms of cardiomyopathies (Davies, 2000).

Hypertrophic cardiomyopathy (HCM) is the most common inherited disorder among

the cardiovascular diseases, in which the thickness of the walls of ventricles increases

(Jacoby, and McKenna, 2012). The genetic studies have shown HCM as a genetically

heterogeneous disorder following autosomal dominant as well as autosomal recessive

pattern of inheritance with an incomplete penetrance depending on age and gender

(Sabater‐Molina, Pérez‐Sánchez, Hernández del Rincón, and Gimeno, 2017). The majority

of the mutations have been identified in the genes of sarcomeric proteins. About 70%

of the mutations related to HCM have been identified in the genes encoding cardiac

myosin binding protein C (MYBPC3) and β-myosin heavy chain 7 (MYH7). Other

genes harboring the pathogenic variants for HCM with frequency ranging from 1–5%

include TPM1, TNNT2, TNNI3, ACTC1, MYL2, and MYL3 (Lopes et al., 2015). High

throughput sequencing technologies have identified new genes contributing to the

pathophysiology of HCM increasing the list to dozens of responsible genes, including

the genes encoding non-sarcomeric proteins such as Z-disc, and Ca2+-handling

proteins. The variants in genes of desmosomal ion channels, and titin protein have

been found in up to 43% and 64% of the cases along with variants in (Sabater‐Molina,

Pérez‐Sánchez, Hernández del Rincón, and Gimeno, 2017).

Dilated cardiomyopathy (DCM) is the most common cause of cardiac death in young

adults. In DCM, the left ventricle is enlarged due to the reduced thickness of

ventricular walls causing the systolic dysfunction (Hershberger, Hedges, and Morales,

2013). DCM may be idiopathic or with a hereditary cause (25-30%). Studies also

determined that 50% of the idiopathic DCM were genetic (Mahon et al., 2005). Like

HCM, DCM is also genetically heterogeneous showing patterns of autosomal

dominant, autosomal recessive, X-linked, and mitochondrial inheritance. Genetic

studies have identified a number of genes contributing to the pathophysiology of DCM, among

which titin (TTN), lamin A/C (LMNA), cardiac troponin T (TNNT2), β-myosin heavy

chain 7 (MYH7), and BCL2-associated athanogene 3 (BAG3) have been found to be the major

contributors. To date, over 40 genes have been identified to be

associated with DCM, many of which encode sarcomeric and cytoskeletal

elements. It has also been noted that many genes responsible for DCM

overlap with those responsible for HCM (Park, 2017). For restrictive

cardiomyopathy, mutations in cardiac troponin I (cTnI) have been found to increase

myofibril sensitivity to calcium which causes the impaired ventricular relaxation (Liu et

al., 2016).

1.5 Genetics of Obesity

The excessive accumulation of fat in the body leading to health impairment is termed

obesity. Usually, the body mass index (BMI: a person's weight in kilograms divided by the

square of their height in metres, kg/m²) is used to define obesity. For adults, a person with BMI ≥

30 is considered obese (World Health Organization, 2017b). The prevalence of

obesity is high in high-income as well as in middle- and low-income countries

(Ng et al., 2014). It has been reported that globally over 600 million adults aged 18 years and older

are obese (World Health Organization, 2017b).
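
The BMI definition and the adult obesity cut-off given above can be written as a small function. The following Python sketch is illustrative only; the function names and the example weight and height are assumptions introduced here, while the threshold of 30 is the one cited in the text.

```python
def body_mass_index(weight_kg, height_m):
    """Return BMI: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply the adult cut-off cited in the text (BMI >= 30)."""
    return body_mass_index(weight_kg, height_m) >= threshold


if __name__ == "__main__":
    # Hypothetical adult: 95 kg, 1.75 m tall.
    bmi = body_mass_index(95.0, 1.75)
    print(f"BMI = {bmi:.1f} kg/m^2, obese: {is_obese(95.0, 1.75)}")
```

Height must be given in metres rather than centimetres; passing 175 instead of 1.75 would give a BMI close to zero.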

Obesity is quite a complex metabolic disorder which is also associated with other

pathophysiological conditions such as dyslipidaemia, atherosclerosis, hypertension,

coronary heart disease, type 2 diabetes mellitus (T2D), and certain types of cancers

(Poirier et al., 2006; Switzer, Mangat, and Karmali, 2013). Obesity is one of the prime

risk factors for elevated prevalence of CVDs. A strong association of obesity has been

found with hypertension leading to coronary heart disease and heart failure (Akil, and

Ahmad, 2011; Artham, Lavie, Milani, and Ventura, 2009). So, the genetic factors of

obesity are also the risk factors for cardiovascular diseases. Genetically, obesity has

been classified into monogenic obesity and polygenic obesity. The monogenic forms of

obesity may be syndromic or non-syndromic which follow autosomal or X-linked

pattern of Mendelian inheritance, e.g., abdominal obesity-metabolic syndrome 3

(OMIM # 615812), body mass index quantitative trait locus 9 (OMIM # 602025).

Genetic variations in genes regulating the appetite and related metabolism have been

found to cause these types of obesity (Waalen, 2014). Bardet-Biedl syndrome, a major

form of syndromic obesity, has been found to be caused by variations in a class of 19

genes named BBS1 to BBS19 (Pigeyre, Yazdi, Kaur, and Meyre, 2016). The

products of this class of genes affect the signaling cascade through the leptin

receptors (LEPR) (Seo et al., 2009). Another syndromic form of obesity, Prader-Willi

syndrome, was found to be caused by deletions at the chromosome 15q11.2-q13 locus

and variations in genes such as MAGEL2, MKRN3, NPAP1, and SNURF-SNRPN

(Angulo, Butler, and Cataletto, 2015; Pigeyre, Yazdi, Kaur, and Meyre, 2016). In

Cohen syndrome, variations in COH1 (VPS13B) at the chromosome 8q22 locus have

been found responsible for the pathophysiology (Kolehmainen et al., 2003), while

Alström syndrome has been found to be caused by variations in ALMS1 (Collin et al.,

2002). For non-syndromic forms of obesity, a number of heterozygous/homozygous

loss-of-function mutations have been identified in some genes such as LEP (Leptin),

LEPR (Leptin Receptor), MC4R (Melanocortin 4 receptor), POMC

(Proopiomelanocortin), SH2B1 (SH2B adaptor protein 1), and NTRK2 (Neurotrophic

tyrosine kinase receptor type 2) with varying degree of penetrance (Pigeyre, Yazdi,

Kaur, and Meyre, 2016). The list of associated genes increased using the genome

wide association studies (GWAS) and genes such as FTO and MC4R emerged as

strong candidate genes linked with obesity (Srivastava, Srivastava, and Mittal, 2016).

For polygenic obesity, there is still poor understanding of the underlying predictive risk

due to genetic variants. This might be due to the fact that many variants of small effect

size act together to produce the phenotype (Yeo, 2017). Recently, complete genome

sequencing of a mouse model of polygenic obesity, TALLYHO/Jng (TH), revealed 1601

deleterious non-synonymous mutations in 1148 genes. It was also noted in this study

that 99.83% of the 1.21 million indels were found in non-coding regions including the

intronic, intergenic, and 5 kb upstream or downstream regions (Denvir et al., 2016). To

date, more than 100 loci have been identified to be associated with obesity (Yeo,

2017).

1.6 Mutational Load for Cardiovascular Diseases

Mutational load or burden is a phenomenon in population genetics implying that

several deleterious variants within the genome pose a harmful effect to the fitness of

an individual whereby it contributes to the susceptibility of complex disorders

(Howrigan et al., 2011). The overall fitness of a population is reduced by the

emergence of detrimental genetic variants. It is one of the components of genetic load

which determines the genetic make-up of populations. The other parameters of

genetic load are inbreeding load, segregation load, and transitory load (Henn,

Botigué, Bustamante, Clark, and Gravel, 2015). The reasons for the emergence of

detrimental variants in populations remain contentious among biologists. Studies

have suggested that deleterious variants arose in populations during the range

expansion that occurred during or after the Out-of-Africa event. During the expansion of populations

into new territories, many neutral variants rose to high frequencies in the

new habitats, a phenomenon termed 'gene surfing' (Edmonds, Lillie, and Cavalli-Sforza,

2004; Klopfstein, Currat, and Excoffier, 2005). The surfing effect can also lead

to detrimental mutations rising to high frequencies in the expanding front. This

phenomenon also affects the variants involved in reproduction rate (Travis et al.,

2007). Recent empirical studies based on whole genome/whole exome sequencing of

large cohort of human populations have revealed that populations differ in neutral and

deleterious variants subject to their evolutionary background. On average, the non-

African populations bear more deleterious variants than the African populations

(Lohmueller et al., 2008). This is attributed to a severe bottleneck faced by ancestral

non-African populations after the Out-of-Africa event (Keinan, Mullikin, Patterson, and Reich,

2007; H. Li, and Durbin, 2011). It has been estimated that non-African populations

carry, on average, slightly but significantly larger number of predicted deleterious

mutations than the African populations (Fu, Gittelman, Bamshad, and Akey, 2014). It

was also estimated from large scale DNA sequencing data that on average a person

carries 281-515 missense substitutions, of which 40-85 are in the homozygous state (Xue

et al., 2012). In healthy individuals, these detrimental variants may not produce apparent

disease symptoms because of their low penetrance, because they are heterozygous

(particularly those associated with autosomal recessive disorders), or because they are

associated with late-onset diseases.

Deleterious variants of different allele frequencies confer different effects on the fitness

of individuals and, consequently, on susceptibility to diseases. It has been hypothesized

that common variants have smaller effects on disease susceptibility, while rare

variants confer larger effects in monogenic, familial, and complex genetic

disorders (Lettre, 2014).

A comprehensive literature survey shows that continental populations have been

evaluated for general deleterious mutational load and its history in the context of

population demographics. To date, there are no reports of studies addressing and

quantifying the mutational load for specific human diseases. This gap needs

to be addressed using whole genome/whole exome sequencing

data. Quantifying the mutational load for specific diseases can provide a framework for

understanding how these diseases have evolved over human history despite the filter of

purifying selection. Cardiovascular diseases, as described earlier, are a group of

monogenic and polygenic disorders of the heart and blood vessels. A complex

interplay of many genes leads to the appearance of cardiac disorders. There are

a number of studies elucidating the genetic basis of various common and Mendelian

cardiac diseases using large cohorts of patients and controls. However, the

evolution of deleterious and disease-causing variants for CVDs has not been

investigated so far. Estimating the mutational load from the deleterious variants for

cardiac diseases will help to reveal the pattern of their emergence in human

populations. Comparing allele frequencies across populations would

help to reveal how evolutionary forces have distributed these detrimental

variants differentially among populations, thereby imposing a differential

underlying mutational load.

1.7 Genetic Research on Cardiovascular Diseases in Pakistan

Pakistan is the 5th largest country in the world by population and is

facing serious health care issues. Consanguineous marriages, which are common in Pakistan,

are a possible cause of genetic disorders including cardiovascular diseases (Haq

et al., 2011). Estimates show that one in five adults of middle age may have sub-

clinical coronary artery disease. Prevalence of myocardial infarction in our local

population has been reported to be 11.2% in one study of prevalence of coronary

artery disease in rural areas of Peshawar (Mahmood-ul-Hassan, Awan, Gul,

Sahibzada, and Hafizullah, 2005). The prevalence of various forms of congenital heart

defects has been reported to be 3.4/1000 births in one study (Rizvi, Mustafa, Kundi,

and Khan, 2015). Despite substantial load of cardiovascular diseases, little genomic

research has been carried out in Pakistan on CVDs. The INTERHEART study (15152

cases and 14820 controls), in which metabolic and socio-economic factors were

studied in relation to myocardial infarction, also comprised <5% cases from Pakistan

(Yusuf et al., 2004). Recently, the Pakistan Risk of Myocardial Infarction Study

(PROMIS) analyzed the whole exomes of 4,793 myocardial infarction cases and 5,710

controls, and highlighted 49,138 rare-frequency (minor allele frequency <1%) predicted

loss-of-function (pLoF) mutations in 1,317 genes. In this study, many mutations in lipid-

metabolizing genes such as PLA2G7, CYP2F1, TREH, A3GALT2, NRG4, APOC3, and

SLC9A3R1 were found to be key players in conferring susceptibility to myocardial

infarction (Danish Saleheen et al., 2017). PROMIS, in collaboration with other

consortia, also determined variants in different genes through genome wide

association studies to be associated with coronary heart disease and myocardial

infarction (Golbus et al., 2016; Webb et al., 2017). In addition, there are separate

reports screening a single gene, a few genes, or a few previously associated SNPs in

certain major cardiovascular diseases such as coronary artery disease (Hussain, Bibi,

and Javed, 2011; Iqbal et al., 2005; Shahid et al., 2017), myocardial infarction (Ahmed

et al., 2011; Iqbal et al., 2004; Perwaiz Iqbal et al., 2016; Saeed et al., 2007; Danish

Saleheen et al., 2010), hypertension (Alvi, and Hasnain, 2009; Nawaz, and Hasnain,

2011; Umedani, Chaudhry, Mehraj, and Ishaq, 2013), hypercholesterolemia (Ahmed et

al., 2013; Ajmal et al., 2011), and cardiomyopathies (Abid, Akhtar, Khaliq, and Mehdi,

2011; Hussain, Haroon, Ejaz, and Javed, 2016; Liaquat, Asifa, Zeenat, and Javed,

2014; Rafiq et al., 2017).

1.8 Objectives of the Study

Genomic research on cardiovascular diseases is not on par with their burden in the

country. This is a gap, and much research remains to be carried out at the genetic level in the

Pakistani population. Against this background, this study aims to assess and estimate the

underlying mutational burden of cardiovascular diseases in the Pakistani population.

The following tasks were set to meet this aim:

I. To analyze the publicly available whole genome/exome data of the

Pakistani population from different studies/consortia using different bioinformatics

tools such as ANNOVAR (Yang, and Wang, 2015), Combined Annotation

Dependent Depletion (CADD) (Kircher et al., 2014), and Variant Effect Predictor

(VEP) (McLaren et al., 2016) for quantifying the mutational load for common

and Mendelian CVDs. These datasets include 1000 Genomes Project (Punjabi

Lahori, PJL) (1000 Genomes Project, 2015), South Asian in Exome

Aggregation Consortium (ExAC) (Lek et al., 2016) which predominantly

contains samples from Pakistan as a cohort of Pakistan Risk of Myocardial

Infarction Study (PROMIS) (Danish Saleheen et al., 2015), and British

Pakistanis (Narasimhan et al., 2016). In addition, ClinVar and OMIM databases

will also be filtered for pathogenic and likely pathogenic variants associated with

CVDs. The allele frequencies of prioritized variants will be compared with global

populations to find the relevance of patterns of CVDs genetic risk in Pakistani

population with other populations of the world.

II. To sequence complete genome of a Pakistani individual with

hyperlipidemia, obesity and coronary artery disease using next generation DNA

sequencing (NGS) technology and analyze it for identifying the deleterious

genetic variants prioritized in mutational load analysis related to hyperlipidemia,

and coronary artery disease.

III. To sequence whole exomes of five patients with dilated cardiomyopathy

and analyze it for identifying the deleterious genetic variants prioritized in

mutational load analysis related to dilated cardiomyopathy.

Chapter 2.0

Materials and Methods

2.0 Scheme of Study

For determining the genetic risk factors possibly responsible for cardiovascular

diseases in Pakistani population, a schematic empirical approach was adopted.

The methodology consisted of three phases (Figure 2.1):

2.1 Estimating the mutational burden for CVDs using the publicly available whole

genome/exome sequencing data of Pakistani population, and its comparison

with other global populations.

2.2 Whole genome sequencing of a Pakistani individual with hyperlipidemia and

coronary artery disease through next generation sequencing (NGS) technology

and its analysis.

2.3 Whole exome sequencing of five Pakistani patients with dilated cardiomyopathy

to evaluate the genetic risk factors.

Figure 2.1: The outline of methodology for determining the genetic risk factors for CVDs in Pakistani population.

[Figure 2.1 panels: (2.1) Estimating the mutational load for CVDs using whole genome/exome sequencing data of Pakistani population and its comparison with other populations; (2.2) Whole genome sequencing and analysis of an individual with hyperlipidemia and coronary artery disease to evaluate the variants filtered in mutation load analysis; (2.3) Whole exome sequencing and analysis of 5 patients with dilated cardiomyopathy, and its comparative analysis.]

2.1 Estimating the Mutational Load for Cardiovascular Diseases in

Pakistani Population and its Comparison with Global

Populations

To determine the mutational load for cardiovascular diseases in Pakistani population,

a pipeline (Figure 2.3) was established in which all the genes previously reported to be

involved in CVDs were listed through the mining of disease databases and literature

survey. Mutational load was calculated in these genes using various bioinformatics

tools. The detailed methodology of estimating the CVDs burden and its comparison

with other populations of the world is described below.

2.1.1 Genes Involved in Cardiovascular Diseases

To determine the genes reported for their association with cardiac diseases, three

databases i.e., Online Mendelian Inheritance in Man (OMIM), ClinVar, and Disease

Ontology Annotation Framework (DOAF) (Hamosh, Scott, Amberger, Bocchini, and

McKusick, 2005; Landrum et al., 2014; W. Xu et al., 2012) were searched. The genes

were retrieved from these databases using the search terms 'heart', 'cardio', 'cardiac',

'myocardial', 'coronary', 'cardiomyopathy', 'arteriopathy', 'aneurysm', 'atherosclerosis',

'septal defect', 'tetralogy of fallot', 'septal noncompaction', 'arterial', 'atrial',

'hypertension', 'hypercholesterolemia', 'hyper triglyceridemia', 'QT syndrome' and

some manually selected cardiac disorder names. To validate these terms, two

databases i.e., Human Phenotype Ontology (Köhler et al., 2014) and WHO's

International Classification of Diseases (ICD-10) database were accessed and

comparison was performed. After going through the literature for manual curation, a

final list of (n=1187) genes was prepared, which was carried forward for current

analysis (Appendix Table 1). Out of these, 379 genes were involved in Mendelian and

congenital cardiac disorders such as cardiomyopathies, cardiac arrhythmias, and

atrioventricular septal defects, while rest contributed to common CVDs such as

hypertension, hypercholesterolemia, myocardial infarction, and coronary artery

disease (Figure 2.2). The structural and functional roles of these genes' products were

determined by gene ontology terms using the UniProt Gene Ontology Annotation

database for human (version 2.0) (Camon et al., 2004). For visualization of the

ontology terms, an online tool, BGI WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl),

was used (Ye et al., 2006).

Figure 2.2: Number of genes analyzed for common,

Mendelian and congenital CVDs in this study.

2.1.2 Genomic/Exomic Datasets used

To obtain the genetic variants in selected genes, the whole genomic/exomic data of

Pakistani population was retrieved from different publicly available data resources,

such as:

i. Punjabi from Lahore (PJL) (n=96) in 1000 Genomes Project phase 3 (1000

Genomes Project, 2015)

ii. The South Asian (SAS) dataset of the Exome Aggregation Consortium (ExAC) (n=8056), which

predominantly contains samples from Pakistan (n=7078) contributed by the Pakistan Risk of

Myocardial Infarction Study (PROMIS), a cohort included in ExAC (Lek et al., 2016;

Danish Saleheen et al., 2017)

iii. Whole exome sequencing data of 3222 British Pakistani individuals with high

relatedness (Narasimhan et al., 2016).

The data from 1000 Genomes Project PJL and British Pakistanis were analyzed for all

1187 genes involved in common as well as Mendelian and congenital CVDs, while

data from ExAC SAS were analyzed for Mendelian and congenital CVDs only, because

this dataset also includes a cohort ascertained for common CVDs (Danish Saleheen et al., 2017).

2.1.3 The Analysis Pipeline

A pipeline was developed to identify and analyze genetic risk factors of cardiovascular

diseases using computational biology tools (Figure 2.3). The start and end

positions of the gene set were determined from the GENCODE gene annotation

(gencode.v19.annotation.gtf), which is the final GENCODE release

mapped to the human reference genome assembly GRCh37 (Harrow et al., 2012). The

shell command 'grep' was used to extract the genes under study from the

gencode.v19.annotation.gtf data file.

In order to include the variants from immediate upstream and downstream regions to

cover the promoters of the genes, 2000 bp was subtracted from the start position of

each gene (upstream region), and 2000 bp was added to end position of genes

(downstream region). The genetic variants within these gene coordinates were extracted

from the three above-mentioned datasets using the bcftools-1.2.1 package

(http://www.htslib.org/download/) (Danecek et al., 2011). For this, the 'bcftools view -R'

option was used to extract the variants falling within the gene coordinates. The

output VCF file contained both SNVs and indels, which were separated using

bcftools.
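
The gene-region step described above can be sketched in Python as follows. This is a minimal illustration and not the pipeline actually used, which relied on 'grep' and bcftools. The function names `padded_gene_regions` and `to_regions_lines` are hypothetical, and stripping the 'chr' prefix is an assumption, made because GENCODE and 1000 Genomes Project files commonly name chromosomes differently. The output lines follow the tab-delimited CHROM, START, END layout (1-based, inclusive) that a regions file for 'bcftools view -R' accepts.

```python
from typing import Iterable, List, Set, Tuple

FLANK_BP = 2000  # flank added on each side of a gene, as stated in the text


def padded_gene_regions(gtf_lines: Iterable[str], genes: Set[str],
                        flank: int = FLANK_BP,
                        strip_chr: bool = True) -> List[Tuple[str, int, int, str]]:
    """Return (chrom, start, end, gene_name) for the requested genes.

    Only GTF records whose feature type is 'gene' are used; start and end
    are widened by `flank` bp to cover upstream and downstream regions.
    """
    regions = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Column 9 holds 'key "value";' pairs; collect them into a dict.
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        name = attrs.get("gene_name")
        if name not in genes:
            continue
        chrom = fields[0]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]  # assumption: the VCF uses '1', not 'chr1'
        start = max(1, int(fields[3]) - flank)  # never go below position 1
        end = int(fields[4]) + flank
        regions.append((chrom, start, end, name))
    return regions


def to_regions_lines(regions: List[Tuple[str, int, int, str]]) -> List[str]:
    """Format regions as tab-delimited CHROM, START, END lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end, _ in regions]
```

Writing the returned lines to a text file gives a regions file that could be passed to 'bcftools view -R'. Note that padding both sides symmetrically, as in the text, does not depend on the strand of the gene.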

Figure 2.3: The pipeline to find and analyze the deleterious variants related to cardiac diseases in Pakistani population.

For determining the functional consequences of the subset variants, three annotation

tools were utilized i.e., ANNOVAR (Yang, and Wang, 2015), Combined Annotation

Dependent Depletion (CADD) (Kircher et al., 2014), and Variant Effect Predictor

(VEP) (McLaren et al., 2016). Annotation with ANNOVAR was carried out using the

standalone perl application with gene based refGene annotation, region based

cytoBand and genomicSuperDups annotations, and filter based ljb26_all, dbscsnv11,

esp6500siv2_all, 1000g2015aug_all, and exac03 annotations. The gene based

refGene annotation provides information for all the annotated transcripts in the RefSeq

Gene database. The region based annotations cytoBand and genomicSuperDups

provide the identification of chromosomal bands of variants and duplication segments

respectively. Among the filter based annotations, ljb26_all provides the scores of SIFT,

Polyphen2, and GERP++ etc., dbscsnv11 predicts whether the variant is present in the

splice site, and 1000g2015aug_all, esp6500siv2_all, and exac03 provide allele

frequencies of the variants in respective populations and databases

(http://annovar.openbioinformatics.org/en/latest/user-guide/download/). The annotation

with CADD was performed using a standalone perl script. This annotation

provided CADD based Phred_score (scaled C-score) of the single nucleotide variants

(SNVs). The scaled C-scores of small indels were determined using the web-based

CADD tool (http://cadd.gs.washington.edu/score). For this, the gunzipped vcf file was

uploaded at the captioned server.

To determine the predicted deleteriousness of genetic variants, different criteria have

been used in different studies. Some used single score such as genome evolutionary

rate profiling (GERP) score (Henn, Botigué, Bustamante, Clark, and Gravel, 2015),

PolyPhen2 (Y. Li et al., 2016), and CADD (Richardson, Campbell, Timpson, and

Gaunt, 2016). Many studies employ more than one tool to predict

variants as deleterious (Ma et al., 2015; Xue et al., 2012). In this study, the scores of three tools,

CADD, SIFT, and PolyPhen2, were taken into account to classify variants as

deleterious. Variants for which the CADD scaled C-score was ≥15, the SIFT score was

<0.05, and the PolyPhen2_HDIV score was >0.957 were considered deleterious.

These cut-off scores have been recommended by the respective authors of the tools. The tools

SIFT and PolyPhen2 predict the effect of variants by employing a machine learning

approach which takes many factors into account such as sequence- and structure-

based features, multiple sequence alignment of proteins, and conservation of variants

across available homologous sequences etc. (Miosge et al., 2015). CADD is an

annotation tool which uses integrated information from 63 annotations from different

databases including the conservation, functional consequences of variants in coding

as well as non-coding regions, and escape from the natural selection

(http://cadd.gs.washington.edu/). This tool integrates information from diverse

annotations into one framework. The scaled C-score correlates with the pathogenicity

of coding as well as non-coding variants, considering the allelic diversity, and

regulatory effects measured by experiments (Kircher et al., 2014).
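
The three cut-offs stated above can be expressed as a single filtering rule. The sketch below is only an illustration of that rule, reading the criteria as jointly required. The names `parse_score` and `is_deleterious` are hypothetical, and two choices are assumptions rather than statements from the text: missing scores are assumed to appear as '.' in the annotation output, and a variant lacking any of the three scores is treated as not deleterious.

```python
from typing import Optional

CADD_MIN = 15.0        # CADD scaled C-score must be >= 15
SIFT_MAX = 0.05        # SIFT score must be < 0.05
PP2_HDIV_MIN = 0.957   # PolyPhen2 HDIV score must be > 0.957


def parse_score(text: str) -> Optional[float]:
    """Convert an annotation field to float; '.' or '' means missing."""
    text = text.strip()
    if text in ("", "."):
        return None
    return float(text)


def is_deleterious(cadd_phred: Optional[float],
                   sift: Optional[float],
                   pp2_hdiv: Optional[float]) -> bool:
    """Apply all three cut-offs jointly.

    Assumption: a variant with any missing score is not called deleterious.
    """
    if cadd_phred is None or sift is None or pp2_hdiv is None:
        return False
    return (cadd_phred >= CADD_MIN
            and sift < SIFT_MAX
            and pp2_hdiv > PP2_HDIV_MIN)
```

For example, a variant with CADD 23.1, SIFT 0.01, and PolyPhen2_HDIV 0.999 passes the rule, whereas the same variant with SIFT 0.20 does not.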

2.1.4 Filtration of Variants by ClinVar Database

The subsets of variants in genes related to cardiovascular diseases from the three

datasets were also filtered from the ClinVar database (Landrum et al., 2014). The

ClinVar database provides an archive of relationship of genetic variants with medical

phenotypes. This database contains variants of different significances including

Benign, Likely benign, Non-pathogenic, Probable-pathogenic, Pathogenic, Drug

response, and Others. The 'Other' category contains variants having risk factor,

sensitivity, association, or some protective role in diseases (Landrum et al., 2014). For

this analysis, the variants with significance 'Pathogenic' and 'Likely_pathogenic' were

extracted using the ClinVar data release 20160104 (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/).

The 'bcftools isec' command was used to determine the intersection of

the subset variants and the ClinVar variants. For retrieving the allele frequencies of

extracted variants, the filter based annotation of ANNOVAR for ExAC and 1000

Genomes Project was used.
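
The logic of this filtering step, selecting pathogenic and likely pathogenic ClinVar records and intersecting them with the subset variants, can be sketched as below. This is an illustration only and does not replace 'bcftools isec'. The record layout (dictionaries with `chrom`, `pos`, `ref`, `alt`, and `clnsig` keys) and the function names are hypothetical. The encoding of clinical significance differs between ClinVar releases, so the label strings used here are an assumption taken from the wording of the text.

```python
from typing import Dict, Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

# Labels taken from the text; the actual encoding depends on the release.
WANTED_SIGNIFICANCE = {"Pathogenic", "Likely_pathogenic"}


def pathogenic_keys(clinvar_records: Iterable[Dict]) -> Set[VariantKey]:
    """Collect keys of ClinVar records carrying a wanted significance label."""
    keys = set()
    for rec in clinvar_records:
        # A record may carry several labels separated by '|' or '/'.
        labels = set(rec["clnsig"].replace("/", "|").split("|"))
        if labels & WANTED_SIGNIFICANCE:
            keys.add((rec["chrom"], rec["pos"], rec["ref"], rec["alt"]))
    return keys


def intersect_with_clinvar(subset_variants: Iterable[VariantKey],
                           clinvar_records: Iterable[Dict]) -> List[VariantKey]:
    """Return subset variants that match a selected ClinVar record exactly."""
    keys = pathogenic_keys(clinvar_records)
    return [variant for variant in subset_variants if variant in keys]
```

Matching on chromosome, position, reference allele, and alternate allele together avoids counting a different substitution at the same position as a hit.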

2.1.5 Comparison of Allele Frequencies of Deleterious Variants of CVDs

with Global Populations

The derived allele frequencies of genetic variants represent their prevalence in a

population providing an insight into the evolutionary genetics background. The

comparison of allele frequencies of variants related to certain diseases/phenotypes

across the populations is a useful approach to study the prevalence of those

diseases/phenotypes in different populations. This also provides the information of

genetic diversity across the populations in terms of the disease/phenotype under study

(1000 Genomes Project 2010). Here the comparison of derived allele frequencies

(DAF) of prioritized variants related to cardiovascular diseases from the three

databases was carried out. The allele frequencies of predicted deleterious derived

alleles from PJL individuals were compared with all the population groups of 1000

Genomes Project i.e., South Asian (SAS), East Asians (EAS), Admixed American

(AMR), European (EUR), and African (AFR), as well as a Southeast Asian Malay

population (Wong et al., 2013). Similarly, the allele frequencies of deleterious

derived alleles prioritized from the ExAC SAS dataset were compared with the other five populations

of this dataset, i.e., East Asian (EAS), Latino (AMR), African/African American (AFR),

Non-Finnish European (NFE), and Finnish European (FIN).
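
As a small worked illustration of such a comparison, the derived allele frequency (DAF) of a variant in a population is the number of derived alleles divided by the total number of alleles sampled. The sketch below computes the DAF per population and the difference between a focal population and each of the others. The function names are hypothetical, the counts in the usage note are invented for illustration, and the allele counts are assumed to be already polarized (that is, counted for the derived allele).

```python
from typing import Dict, Tuple


def derived_allele_frequency(derived_count: int, total_alleles: int) -> float:
    """DAF = derived allele count / total number of sampled alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return derived_count / total_alleles


def daf_differences(counts_by_pop: Dict[str, Tuple[int, int]],
                    focal: str) -> Dict[str, float]:
    """Return focal DAF minus DAF of every other population.

    `counts_by_pop` maps a population label to
    (derived allele count, total allele count).
    """
    focal_daf = derived_allele_frequency(*counts_by_pop[focal])
    return {
        pop: focal_daf - derived_allele_frequency(*counts)
        for pop, counts in counts_by_pop.items()
        if pop != focal
    }
```

For instance, with made-up counts of 24 derived alleles among 192 in PJL and 10 among 1000 in EUR, the PJL frequency is 0.125, the EUR frequency is 0.01, and the difference is 0.115. A diploid sample of 96 individuals contributes 192 alleles at an autosomal site.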

2.1.6 Genetic Differentiation of Deleterious Variants

The genetic differentiation of predicted deleterious variants for CVDs was determined

across the populations by calculating the Weir and Cockerham Fixation Index (FST)

(Weir, and Cockerham, 1984). The pairwise unbiased FST was calculated for multiple

loci in two ways i.e., calculation of FST values for predicted deleterious SNVs only, and

calculation of FST values for whole genes harboring those deleterious SNVs. The

genetic differentiation of Pakistani population was estimated against rest of the South

Asian populations, as well as all 25 global populations. The SNVs with FST values in

the range of 0.05 – 0.15 were considered as moderately differentiated, those having

FST values between 0.15 – 0.25 were considered as greatly differentiated, and those

having FST values greater than 0.25 were taken as severely differentiated SNVs

(Jobling, Hurles, and Tyler-Smith, 2013). For calculating the FST values from 1000

Genomes Project data, VCFtools v0.1.12 was used. For this, the merged vcf file of all

the populations was used.
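
The FST categories stated above can be written as a simple classification rule. The sketch below is illustrative: the function name `classify_fst` and the label for values below 0.05 ('little') are assumptions, as is the handling of the boundary values, since the text gives the ranges without saying to which category 0.15 and 0.25 belong. Negative values, which the Weir and Cockerham estimator can produce, fall into the lowest category here.

```python
def classify_fst(fst: float) -> str:
    """Assign an FST value to a differentiation category.

    Ranges follow the text: 0.05-0.15 moderate, 0.15-0.25 great,
    > 0.25 severe. Assumptions: 0.15 is counted as 'great', 0.25 as
    'great' (severe is strictly greater than 0.25), and anything
    below 0.05, including negative estimates, as 'little'.
    """
    if fst > 0.25:
        return "severe"
    if fst >= 0.15:
        return "great"
    if fst >= 0.05:
        return "moderate"
    return "little"
```

Applied to a table of per-SNV FST values, this rule gives the counts of moderately, greatly, and severely differentiated variants for each pair of populations.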

In order to determine whether the footprints of population migrations affected the

distribution of genetic variants related to cardiovascular diseases, the principal

components analysis (PCA) was performed. For this purpose, two approaches were

applied. The principal components were constructed with the PLINK 1.9 (Purcell et al.,

2007) and EIGENSOFT smartpca (Patterson, Price, and Reich, 2006) packages using

the total subset variants in genes related to cardiovascular diseases and predicted

deleterious variants only. The 1000 Genomes Project data of 96 individuals from each

of 15 populations (Table 2.1) and 96 individuals of PJL population was used.

Table 2.1: Populations of 1000 Genomes Project used for principal components analysis (PCA).

Population Groups Populations used for PCA analysis

South Asians BEB, STU, ITU

Europeans GBR, FIN, CEU

Americans CLM, PEL, PUR

Africans YRI, LWK, MSL

East Asians CHB, JPT, KHV

For PLINK, the compatible '.ped' file was created from the vcf file using VCFtools

v0.1.12. The '.ped' file was converted into '.bed' format using the PLINK '--make-bed'

option. Then the PLINK '--pca' option was used to generate the principal components. For

PCA with EIGENSOFT, the required '.pedind' file was created manually

from the '.ped' file by cutting out the initial 6 columns. The PCA plot was constructed using the

R 'base' package (R Core Team, 2013).
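
The manual step of cutting the first six columns of the '.ped' file to obtain the individual file for EIGENSOFT (conventionally given the extension '.pedind') can be sketched as below. This is a minimal illustration, not the procedure actually used. The function name `ped_to_pedind` is hypothetical, and the optional replacement of the sixth column with a population label is an assumption about how the populations might be tagged for plotting.

```python
from typing import Dict, Iterable, List, Optional


def ped_to_pedind(ped_lines: Iterable[str],
                  pop_labels: Optional[Dict[str, str]] = None) -> List[str]:
    """Keep the first six whitespace-separated columns of each .ped line.

    The six columns are family ID, individual ID, paternal ID,
    maternal ID, sex, and phenotype. If `pop_labels` maps an individual
    ID to a population label, the sixth column is replaced by that label
    (an assumption made here for illustration).
    """
    out = []
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        first_six = fields[:6]
        if pop_labels and first_six[1] in pop_labels:
            first_six[5] = pop_labels[first_six[1]]
        out.append(" ".join(first_six))
    return out
```

The same result could be obtained on the command line by cutting the first six fields of the file; the Python form is shown only to make the column handling explicit.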

2.1.7 Linkage Analysis of Deleterious Variants

The linkage disequilibrium (LD) analysis of the observed deleterious variants from 1000

Genomes Project was performed using VCFtools v0.1.12. LD is the non-random

association of alleles at different loci in a population (Slatkin, 2008), so the analysis reveals

variants that tend to be inherited together. The

analysis was performed on pairs of variants separated by at most 100,000 bp, using the following command:

vcftools --vcf cardio_pjl_subset_100316_sort_SNP144_grep-deleterious-only.vcf --hap-r2 --ld-window-bp 100000 --out pjl_subset_dele_variants_ld_window_100000
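
For readers unfamiliar with the statistic, the squared correlation r² between two biallelic loci can be computed from phased haplotypes as r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below illustrates this formula together with the selection of variant pairs lying within a physical distance window. It is not the VCFtools implementation; the function names and the 0/1 allele coding are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def haplotype_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Compute r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is (allele at locus A, allele at locus B), coded 0/1.
    Returns NaN when either locus is monomorphic (r^2 is undefined).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


def pairs_within_window(positions: Sequence[int],
                        window_bp: int = 100000) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose positions differ by <= window_bp."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if abs(positions[j] - positions[i]) <= window_bp:
                pairs.append((i, j))
    return pairs
```

Two loci whose alleles always occur together give r² = 1, while loci whose alleles are combined at random give r² = 0.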

2.2 Whole Genome Sequencing of a Pakistani Individual with

Hyperlipidemia and Coronary Artery Disease

Obesity, a complex metabolic disorder, is also a risk factor for some other

pathophysiological conditions such as dyslipidaemia, atherosclerosis, hypertension,

coronary heart disease, type 2 diabetes mellitus (T2D), and certain types of cancers

(Poirier et al., 2006; Switzer, Mangat, and Karmali, 2013). The whole genome

sequencing of a Pakistani individual with hyperlipidemia and obesity was carried out

using Applied Biosystems SOLiD® 5500xl next generation DNA sequencing machine.

The detailed procedures and materials used are given below.

2.2.1 Samples Collection and DNA Isolation

Approximately 10 mL of blood from the above-mentioned individual was collected in a

K2-EDTA container after obtaining informed consent. The individual was hyperlipidemic with a

body mass index (BMI) > 30. DNA extraction was carried out immediately after the

collection of blood samples to achieve the highest integrity of genomic DNA. The high

molecular weight genomic DNA was isolated from the stored blood by CTAB isolation

method with small modifications (Winnepenninckx, Backeljau, and DeWachter, 1993).

The CTAB lysis buffer was prepared with 2% w/v cetyltrimethylammonium bromide

(CTAB), 100 mM Tris-HCl, 20 mM EDTA, 1.4 M NaCl, 0.2% v/v β-mercaptoethanol, and 0.1

mg/mL proteinase K, at pH 8.0. The following protocol was used for isolating the DNA.

i. For the lysis of blood cells, 200 μL of whole blood was added to 1 mL of CTAB

buffer pre-warmed to 65 °C in a micro-centrifuge tube. The mixture was incubated at

65 °C for one hour, with the tube gently inverted 2 to 3 times during the

incubation.

ii. After the incubation, equal volume of chloroform/isoamylalcohol (24:1) solution

was added to it and the contents of the tube were mixed by gently inverting the

tube several times.

iii. The tube was centrifuged at 12000 RPM for 05 minutes, and the aqueous

supernatant was transferred to a new micro-centrifuge tube very carefully.

iv. Two-thirds (2/3) volume of ice-chilled isopropanol was added and mixed by

inverting the tube gently several times. A thread-like precipitate of DNA became

visible and was pelleted by centrifugation at 12000 RPM for 03 minutes.

v. The pellet of DNA was washed twice with 70% ethanol to remove the salts and

other impurities from the DNA. The pellet was air dried after the last wash

and finally dissolved in 50 μL TE buffer (pH 7.5). The isolated DNA was stored

at -20 °C.

2.2.2 DNA Quality Assessment and Quantification

The quality of genomic DNA was assessed by 1% agarose gel electrophoresis. For

this, 500 mg of agarose powder was dissolved in 50 mL of 1x TAE buffer. The mixture

was heated on a heating block until it boiled and was then allowed to cool for a few minutes. It

was then poured into a gel casting tray, and 2.5 μL of the DNA staining dye, ethidium

bromide, was added to a final concentration of 0.5 μg/mL. The gel was allowed to

solidify for about half an hour at room temperature. For electrophoresis, 5 μL of each

DNA sample was mixed with 1 μL of 6x DNA loading dye and loaded into the wells of the

agarose gel. A 1 kb DNA ladder was also loaded in one of the wells.

Electrophoresis was carried out in 1x TAE buffer at 60 V for 90 minutes.

After the completion of electrophoresis, the gel was visualized on a UV trans-

illuminator.
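
The quantities in this gel recipe can be checked with two short calculations, sketched below. The function names are hypothetical. The ethidium bromide stock concentration is not given in the text; a stock of 10 mg/mL is assumed here because it is consistent with 2.5 μL in 50 mL giving the stated final concentration of 0.5 μg/mL.

```python
def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Weight/volume percentage: grams of solute per 100 mL of solution."""
    return 100.0 * mass_g / volume_ml


def final_conc_ug_per_ml(stock_mg_per_ml: float, added_ul: float,
                         total_ml: float) -> float:
    """Final concentration after adding a small stock volume to a gel.

    mg/mL is numerically equal to ug/uL, so stock (mg/mL) x volume (uL)
    gives the mass added in ug. The added volume is neglected in the total.
    """
    added_ug = stock_mg_per_ml * added_ul
    return added_ug / total_ml


# 500 mg (0.5 g) of agarose in 50 mL of buffer
gel_percent = percent_w_v(0.5, 50.0)

# 2.5 uL of an assumed 10 mg/mL ethidium bromide stock in a 50 mL gel
etbr_final = final_conc_ug_per_ml(10.0, 2.5, 50.0)
```

Under these assumptions the gel is 1% w/v and the dye concentration is 0.5 μg/mL, in agreement with the values stated in the text.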

The quantity of genomic DNA was estimated on a Qubit® 2.0 Fluorometer using the

Qubit® dsDNA HS Assay Kit (Thermo Fisher). For quantification, 1 μL of DNA sample

was added to 199 μL of fluorophore-containing kit buffer in a 500 μL Qubit assay tube.

The mixture was vortexed for 15 seconds, and incubated for 2 minutes at room

temperature. The quantity of DNA was determined on the Fluorometer as described in

the kit protocol.
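
Because 1 μL of sample is diluted into a 200 μL assay volume, the concentration of the original DNA stock is 200 times the concentration in the assay tube. The sketch below illustrates this back-calculation. The function name is hypothetical, and it is an assumption that the reading being converted is the assay-tube concentration expressed in ng/mL.

```python
def stock_concentration_ng_per_ul(tube_conc_ng_per_ml: float,
                                  sample_ul: float = 1.0,
                                  assay_volume_ul: float = 200.0) -> float:
    """Back-calculate the stock DNA concentration from the assay-tube value.

    The dilution factor is assay volume / sample volume (200 for 1 uL
    of sample in a 200 uL assay). Dividing by 1000 converts ng/mL to ng/uL.
    """
    if sample_ul <= 0:
        raise ValueError("sample_ul must be positive")
    dilution_factor = assay_volume_ul / sample_ul
    return tube_conc_ng_per_ml * dilution_factor / 1000.0
```

For example, an assay-tube concentration of 250 ng/mL would correspond to a stock of 50 ng/μL under these assumptions.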

2.2.3 Library Preparation and DNA Sequencing

2.2.3.1 Fragmentation of Genomic DNA

To prepare mate-paired library for whole genome sequencing, the genomic DNA was

fragmented on a Covaris™ S220 Focused-ultrasonicator system. About 5 μg of DNA was

sheared to an average size of 1300 bp using the Covaris™ recommended protocol

(Table 2.2). Fragments of the target size were resolved by 2% agarose gel electrophoresis and excised

with a sharp, clean blade. The size-selected DNA was purified from the gel using the

PureLink® Quick Gel Extraction kit.

Table 2.2: Covaris protocol for fragmenting genomic DNA.

Parameter Value

Peak Incident Power (W) 140

Duty Factor 2%

Cycles per Burst 200

Treatment Time (s) 4 15 sec

Temperature 4-8 °C

2.2.3.2 Mate-paired Library Preparation

The mate-paired library of size selected DNA was prepared as per the protocols given

in the SOLiD® Mate-Paired Library Manual. Following steps were carried out in

preparing the library:

i. The ends of the fragmented pieces of DNA were repaired with End Polishing E1

and E2 enzymes.

ii. The mate-paired right (MPR) and mate-paired left (MPL) adaptors were ligated

to the ends of DNA fragments with the ligase enzyme.

iii. The adaptor-ligated DNA was circularized by incubating the reaction tube at

70 °C and then placing it on ice immediately.

iv. The circularized DNA was purified using the Agencourt AMPure XP beads. The

DNA was recovered from the beads with the elution buffer.

v. The nick in the circularized DNA was translated using DNA Polymerase I at

5 °C for exactly 13 minutes.

vi. The nick translated DNA was digested with T7 Exonuclease and S1 Nuclease

enzymes. The digested DNA was purified using the Agencourt AMPure XP

beads and the DNA elution buffer.

vii. A dA-tail was added to both ends of T7 Exonuclease and S1 Nuclease treated

DNA using the A-Tailing Enzyme II. It increases the efficiency of ligating the P1

& P2 adaptors to the digested DNA.

viii. The library was bound to streptavidin beads in 1X BSA solution, and then the

P1 & P2 adaptors were ligated using the T4 DNA ligase enzyme.

ix. The library was nick-translated to fill in any gap, and trial amplification was

performed using the Platinum® PCR Amplification Mix with the reaction

conditions given in Table 2.3.

Table 2.3: PCR conditions for the amplification of mate-paired library.

Stage Step Temp Time

Holding Nick Translation 72 °C 20 min

Holding Denaturation 94 °C 03 min

Cycling Denature 94 °C 15 sec

Cycling Anneal 62 °C 15 sec

Cycling Extend 70 °C 01 min

Holding Extend 70 °C 05 min

Holding --- 4 °C ∞

x. The size of the trial amplified library was evaluated on E-Gel® Electrophoresis

System using a 2 % agarose E-gel.

xi. Finally, the library was fully amplified using the Platinum® PCR Amplification Mix

with the same thermocycler conditions as in Table 2.3.

2.2.3.3 Evaluation of the Library with Bioanalyzer

To evaluate the size distribution of the mate-paired library precisely, assessment was

performed on the Agilent Bioanalyzer 2100 instrument using the Agilent DNA 1000 kit as

per the manufacturer's protocol. The DNA 1000 kit can resolve DNA bands ranging in

size from 25 – 1000 bp over a concentration range of 0.5 – 50 ng/μL of DNA. To remove the

flags of library peak, size selection was carried out on 2 % agarose gel on E-Gel®

Electrophoresis System, until a bell shaped distribution of the library was obtained.

2.2.3.4 Preparation of Emulsion, Emulsion-PCR, and Beads Enrichment

For emulsion PCR (ePCR), the emulsion of the template library and the PCR components
was prepared on the Applied Biosystems SOLiD® EZ Bead™ Emulsifier system using the
Applied Biosystems SOLiD® EZ Bead™ Emulsifier E80 reagent kit and related accessories.
The Emulsifier mixes the oil phase, P1 beads, aqueous master mix, ePCR primers, and the
template (library) to prepare a water-in-oil emulsion, in which each tiny aqueous droplet
ideally contains one bead, one fragment of template DNA, and all the components of the
PCR. The E80 emulsion scale gives a final yield of 1 billion beads after amplification
and enrichment. The P1 beads were declumped on the Covaris S220 Focused-ultrasonicator
and resuspended in 1430 μL of 1X TEX buffer (a SOLiD EZ Bead Emulsifier reagent). The
amounts of the other components used for preparing the emulsion are given in Table 2.4.
The amount of library to be used for preparing the emulsion was calculated with the
e-calculator provided by Life Technologies (Table 2.5).

Table 2.4: Components for preparing the emulsion for ePCR.

Components                                                  Amount
SOLiD EZ Bead Emulsifier E80-P1 Beads                       1430 μL
1X TEX buffer                                               1430 μL
Oil Master Mix                                              67.9 g
SOLiD EZ Bead Emulsifier E80-P1 Reagent (diluted 1:10)      200 μL
SOLiD EZ Bead Emulsifier E80-P2 Reagent                     300 μL
SOLiD EZ Bead Emulsifier E80-Aqueous Master Mix             47978 μL
Library Template                                            21.4 μL




Table 2.5: Determining the amount of template to be used in emulsion preparation, using the e-calculator-Life Technologies.

The emulsion was subjected to ePCR on the Applied Biosystems SOLiD® EZ Bead™ Amplifier
system using the SOLiD® EZ Bead™ E80 emulsion kit (cat # 4452722), followed by bead
enrichment on the Applied Biosystems SOLiD® EZ Bead™ Enricher system using the SOLiD®
EZ Bead Enricher E80 Reagent Kit (cat # 4452725), SOLiD® EZ Bead Enricher Buffer Kit
(cat # 4444140), and SOLiD® EZ Bead Enricher Accessories Kit (cat # 4453073). All the kit
reagents and consumables were installed in the instrument according to the instructions
in the user manual to perform the enrichment. At the end of the process, a sufficient
amount of amplified, enriched beads was obtained.

2.2.3.5 3’-Modification of Template Beads

Before loading the template beads onto the flowchip, their 3'-end was modified using the
SOLiD® Pre Deposition Kit (cat # 4452805). First, the beads were sonicated using the
Covalent Declump 3 program on the Covaris™ S220 sonicator. The beads were washed with
1X Terminal Transferase Reaction buffer and then resuspended in 160 μL of 1 mM Bead
Linker solution and 1424 μL of 1X Terminal Transferase Reaction buffer. Then, 8 μL of
Terminal Transferase enzyme (20 U/μL) was added for every 792 μL of bead solution, and
the mixture was incubated at 37 °C for 2 hours. After incubation, the beads were washed
once with 1X TEX Buffer and finally resuspended in 400 μL of 1X TEX Buffer.
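The enzyme volume implied by the ratio above can be checked with a short calculation. The following is a minimal sketch only: the function name is hypothetical, and it assumes that the bead pellet adds negligible volume, so that the bead solution equals the sum of the Bead Linker solution and the reaction buffer.

```python
def terminal_transferase_volume_ul(bead_solution_ul,
                                   enzyme_ul_per_batch=8.0,
                                   batch_volume_ul=792.0):
    """Scale the enzyme volume linearly: 8 uL enzyme per 792 uL bead solution."""
    return bead_solution_ul * enzyme_ul_per_batch / batch_volume_ul


# Assumption: the bead pellet adds negligible volume, so the bead solution
# is the sum of the Bead Linker solution and the reaction buffer.
bead_solution_ul = 160.0 + 1424.0
enzyme_ul = terminal_transferase_volume_ul(bead_solution_ul)
enzyme_units = enzyme_ul * 20.0  # enzyme stock concentration is 20 U/uL

print(f"Bead solution: {bead_solution_ul:.0f} uL")
print(f"Terminal Transferase: {enzyme_ul:.1f} uL ({enzyme_units:.0f} U)")
```

Under these assumptions, 1584 μL of bead solution corresponds to 16 μL of enzyme (320 U).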

2.2.3.6 Loading the Flow Chip with Template Beads for Sequencing Reactions

To load the 3'-modified template beads onto the flowchip, the beads were washed three
times with FlowChip Deposition Buffer 1. After the final wash, the beads were suspended
in 135 μL of FlowChip Deposition Buffer 1 for loading into 5 lanes of the flowchip
(27 μL per lane). The beads were declumped using the Deposition_Declump program on the
Covaris S220 sonicator, and 27 μL of the beads was immediately loaded into each lane of
the flowchip. The flowchip was incubated at 37 °C for 1 hour in an incubator. Finally,
the flowchip was installed into the Applied Biosystems SOLiD® 5500xl Genetic Analysis
system as per the instructions. Co-forward sequencing of the F3 and R3 tags of the
mate-paired library was carried out using the SOLiD® FWD1 SP Kit (cat # 4463011), SOLiD®
FWD2 SP Kit (cat # 4463012), SOLiD® FWD SR S75/S50 kit and SOLiD® FWD Buffer
(cat # 4459193), and other related buffers.

The Applied Biosystems SOLiD® 5500xl Genetic Analysis system sequences DNA by ligation
chemistry, termed sequencing by oligonucleotide ligation and detection (SOLiD). In this
method, a universal primer anneals to the P1 adaptor of the template. Then, a pool of
fluorescently labelled octamer probes is added to the reaction. In each octamer, the
first two bases interrogate the template, and the last three bases carry the
fluorophore. The complementary octamer anneals to the template and is ligated to the
growing DNA strand. The fluorescence specific to the ligated probe is detected by a CCD
camera and recorded, after which the three fluorophore-bearing bases are cleaved off so
that the next probe can be ligated. Only the first two bases of each probe are
interrogated. The ligation and detection cycles are repeated with five primers, each
offset by one base, so that every base is interrogated twice, which improves the
accuracy of the sequencing reaction (Figure 2.4) (Mardis, 2008; Metzker, 2010).


Figure 2.4: The reactions of sequencing by oligonucleotide ligation and detection (SOLiD) technology.


2.2.4 Analysis of the Genomic Data

The analysis of the genomic data to determine the variants present in the subject under
study comprised a number of steps using different scripts and bioinformatics tools. For
this, a standard pipeline for calling variants from raw sequencing reads was applied,
following the GATK best practices. The steps are described below:

2.2.4.1 Filtration of Poor Quality Short Reads

The raw sequencing data was obtained in 'XSQ' (eXtensible SeQuence) format. XSQ is a
machine-readable binary format, which must be converted into the human-readable
'csfasta' format for downstream data analysis. The csfasta format is the color space
version of the fasta format and is specific to the SOLiD sequencing platform. The XSQ
files were converted into csfasta format using the XSQ_Converter tool by Life

Technologies. (http://www.lifetechnologies.com/pk/en/home/technicalresources/soft-

waredownloads/xsq-software.html).

convertFromXSQ.sh -c -f dnaseqlab5500xl_2014_01_15_1_01.xsq -o /data/results/lane1/

Here, the -c and -f parameters specify that the input file is in XSQ format, and the -o
parameter defines the output directory, where the csfasta files are generated with the
same default name as the input file.
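To illustrate what a color space read represents, the following minimal sketch decodes a csfasta-style read into bases. It assumes the standard SOLiD two-base encoding, in which each color (0 to 3) encodes the transition between two adjacent bases and the read starts with one known primer base. The function name and the example reads are hypothetical and are not taken from the data of this study.

```python
# With A, C, G, T numbered 0..3, the colour of a dinucleotide equals the
# bitwise XOR of its two base numbers (0 = same base, 1 = A<->C / G<->T,
# 2 = A<->G / C<->T, 3 = A<->T / C<->G).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def decode_color_space(read):
    """Translate a read (leading primer base + colour digits) into bases.

    The leading primer base belongs to the adaptor, so it is not returned.
    """
    previous = BASE_TO_BITS[read[0]]
    bases = []
    for color in read[1:]:
        if color == ".":
            raise ValueError("read contains a missing colour call")
        current = previous ^ int(color)
        bases.append(BITS_TO_BASE[current])
        previous = current
    return "".join(bases)


print(decode_color_space("T0123"))  # TGAT
print(decode_color_space("A1111"))  # CACA
```

Because every base is decoded relative to its predecessor, a single wrong color changes all downstream bases, which is why color space reads are normally aligned in color space rather than decoded first.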

The base calling quality scores of the reads were assessed with the Perl tool
SOLiD_preprocess_filter_v2.pl (Sasson and Michael, 2010). The default parameters were
used for quality filtering, i.e., a quality score of 10 was used as the baseline to
remove poor quality reads, and a maximum of 3 bases with a quality score <10 were
allowed in each read. In addition, all reads containing any dot (missing base call) were
removed. The reads that passed the quality filtration and had matching mate-pairs
proceeded to the alignment step.

perl SOLiD_preprocess_filter_v2.pl -i mp \
    -f dnaseqlab5500xl_2014_01_15_1_01_default_F3.csfasta \
    -g dnaseqlab5500xl_2014_01_15_1_01_default_F3.QV.qual \
    -r dnaseqlab5500xl_2014_01_15_1_01_default_R3.csfasta \
    -s dnaseqlab5500xl_2014_01_15_1_01_default_R3.QV.qual \
    -a y -n y -o out_file

Here, the -i option specifies that the input files are from a mate-paired library. The
options -f, -g, -r, and -s specify the first-tag (F3) reads, their quality scores, the
second-tag (R3) reads, and their quality scores, respectively. The -a option generates a
text file containing the statistics of the filtration process, the -n option removes any
short read containing a dot (i.e., a missing base call), and the -o option sets the
prefix for the output files.
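The filtering rule described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the logic of SOLiD_preprocess_filter_v2.pl itself: the function names and the toy reads are hypothetical, and a mate-pair is assumed to be kept only when both tags pass.

```python
def passes_quality_filter(colors, quals, min_qv=10, max_low_quality=3):
    """Return True if a colour-space read survives the rule described above."""
    if "." in colors:  # a dot marks a missing colour call
        return False
    low_quality = sum(1 for q in quals if q < min_qv)
    return low_quality <= max_low_quality


def filter_mate_pairs(pairs):
    """Keep a mate-pair only when both the F3 and the R3 tag pass."""
    kept = []
    for (f3_colors, f3_quals), (r3_colors, r3_quals) in pairs:
        f3_ok = passes_quality_filter(f3_colors, f3_quals)
        r3_ok = passes_quality_filter(r3_colors, r3_quals)
        if f3_ok and r3_ok:
            kept.append(((f3_colors, f3_quals), (r3_colors, r3_quals)))
    return kept


pairs = [
    (("T0123", [30, 28, 25, 31]), ("G3210", [27, 9, 8, 30])),   # both tags pass
    (("T01.3", [30, 28, 0, 31]), ("G3210", [27, 29, 28, 30])),  # dot in F3
    (("T0123", [5, 6, 7, 8]), ("G3210", [27, 29, 28, 30])),     # 4 calls below QV 10
]
print(len(filter_mate_pairs(pairs)))
```

In this toy input only the first pair is retained; the second is dropped because of the missing call, and the third because four calls fall below the quality baseline.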

2.2.4.2 Alignment of Short Reads with the Reference Human Genome

This is a key step in genome/exome sequencing experiments, because false alignments
with the reference genome lead to false positive variants. To align the short reads
with the reference human genome, the LifeScope™ Genomic Analysis Software of Life
Technologies was used. Here, the human reference genome version 19 (hg19.fa) of the
UCSC genome browser was used (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/).

2.2.4.3 Post Alignment Processing and Variants Calling

The alignment of the short reads with the reference was obtained in the form of sorted
Sequence Alignment Map (SAM) files and their binary counterpart, Binary Alignment Map
(BAM) files. The post-alignment processing and variant calling of the BAM files were
carried out by applying the best practices of Picard-tools-1.109
(http://picard.sourceforge.net) and the Genome Analysis Toolkit (GATK) (McKenna et al.,
2010). The following steps were employed for this purpose.

i. The @RG tags were assigned to each of the 5 BAM files using the Picard tool
AddOrReplaceReadGroups so that they could be recognized separately by the downstream
processes after merging.

java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit \
    -jar picard-tools-1.109/AddOrReplaceReadGroups.jar \
    I=dnaseqlab5500xl_2015_03_11_1_06_6-5-1.bam O=amz_1_RG.bam SO=coordinate \
    RGID=FLOWCHIP1_L1 RGLB=MP RGPL=SOLID RGPU=FLOWCHIP1_L1 RGSM=AMZ CREATE_INDEX=true

ii. The individual BAM files were merged with the Picard tool MergeSamFiles, and
duplicates were subsequently removed from the merged BAM file using the Picard tool
MarkDuplicates.

java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar /picard-tools-1.109/MergeSamFiles.jar \
    I=amz_1_RG.bam I=amz_2_RG.bam I=amz_3_RG.bam I=amz_4_RG.bam I=amz_5_RG.bam \
    SO=coordinate ASSUME_SORTED=true O=amz_RG_merge.bam

java -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar /picard-tools-1.109/MarkDuplicates.jar \
    I=amz_RG_merge.bam O=amz_RG_merge_dedup.bam REMOVE_DUPLICATES=true \
    ASSUME_SORTED=true M=amz_dedup_metrics CREATE_INDEX=true

iii. Local realignment was performed with the GATK RealignerTargetCreator and
IndelRealigner walkers, using the known indel sites of the 1000 Genomes Project to
optimize the alignment near indels, as:

java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa \
    -I amz_RG_merge_dedup.bam \
    --known Mills_and_1000G_gold_standard.indels.GR37.sites.vcf \
    --known 1000G_phase1.indels.hg19.vcf \
    -o amz_RG_merge_dedup_realign.intervals

java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T IndelRealigner -R hg19.fa \
    -I amz_RG_merge_dedup.bam \
    -targetIntervals amz_RG_merge_dedup_realign.intervals \
    -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    -known 1000G_phase1.indels.hg19.sites.vcf \
    -o amz_RG_merge_dedup_realignIndels.bam

iv. Next, base quality score recalibration was performed using the GATK
BaseRecalibrator and PrintReads walkers. GATK applies a machine learning approach to
empirically reassess the errors of the sequencing platform and adjusts the Q scores of
the bases accordingly. This improves the accuracy of the base quality scores.


java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fa \
    -I amz_RG_merge_dedup_realignIndels.bam \
    -knownSites dbsnp37_chr_20151104.vcf \
    -knownSites Mills_and_1000G_gold_standard.indels.GR37.sites.vcf \
    -knownSites 1000G_phase1.indels.hg19.vcf \
    -o amz_RG_merge_dedup_realignIndels_BQRC.grp \
    --solid_nocall_strategy LEAVE_READ_UNRECALIBRATED \
    --solid_recal_mode SET_Q_ZERO_BASE_N

java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T PrintReads -R hg19.fa \
    -I amz_RG_merge_dedup_realignIndels.bam \
    -BQSR amz_RG_merge_dedup_realignIndels_BQRC.grp \
    -o amz_RG_merge_dedup_realignIndels_BQRC.bam

v. Variant calling from the base quality recalibrated BAM file was carried out using
the GATK HaplotypeCaller. The variants were called with a minimum Phred-scaled calling
confidence of 20 (-stand_call_conf 20).


java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa \
    -I amz_RG_merge_dedup_realignIndels_BQRC.bam \
    --dbsnp snp37_20151104.vcf -o amz_raw_q20.vcf -stand_call_conf 20

vi. A variant was accepted only where at least two reads supported it. The raw VCF
file was therefore filtered with bcftools at DP>=2, as:

bcftools filter -i "DP>=2" -o amz_q20_DP2.vcf amz_raw_q20.vcf

vii. The tendency to discover false positive variants was assessed by calculating the
transition/transversion (Ti/Tv) ratio. The Ti/Tv ratio was evaluated with the GATK
VariantEval walker, as:


java -Xms24g -Xmx48g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T VariantEval -R hg19.fa \
    -I amz_RG_merge_dedup_realignIndels_BQRC.bam \
    --eval:my_call amz_q20_DP2.vcf -o amz_q20_DP2.vcf.eval.gr
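The Ti/Tv ratio itself is a simple count, which the following minimal sketch illustrates. It is not the VariantEval implementation; the function names and the toy variant list are hypothetical, and only biallelic single nucleotide variants given as (reference, alternate) pairs are assumed as input.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snvs):
    """Count transitions and transversions among (ref, alt) pairs."""
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in "ACGT" or alt not in "ACGT" \
                or len(ref) != 1 or len(alt) != 1:
            continue  # skip indels, no-calls, and non-variant records
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return float("inf")
    return transitions / transversions


toy_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(titv_ratio(toy_calls))  # 4 transitions / 2 transversions = 2.0
```

Random errors would produce a ratio near 0.5, since each base has one possible transition and two possible transversions. A ratio of roughly 2.0 to 2.1 is commonly cited for human whole genome call sets, so a markedly lower value points towards an excess of false positives.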


2.2.5 Assessing the Genetic Variants related to Hyperlipidemia, and

related Cardiac Disorders

The variants were annotated with ANNOVAR using the gene-based, region-based, and
filter-based annotations described in section 2.1.3. To evaluate the genetic variants
related to hyperlipidemia, obesity, and the risk of related cardiac disorders such as
hypertension, myocardial infarction, and coronary artery disease, the analysis pipeline
described in section 2.1.3 was applied to find the predicted deleterious variants, and
the variants filtered in the mutational load analysis (section 2.1) were re-assessed in
the individual with these disorders. The variants related to these disorders were also
filtered against ClinVar, OMIM, and the GWAS Catalog, which contains genetic variants
reported to be associated with diseases through genome-wide association studies (GWAS)
(Welter et al., 2013).


2.3 Whole Exome Sequencing of Patients with Cardiomyopathy

Whole exome sequencing and its analysis by bioinformatics tools is becoming a standard
approach for investigating the genetic variations linked to diseases. In this method,
the coding regions of all the genes of a genome are sequenced to study the mutations
that may affect the structure and function of proteins (Hintzsche, Robinson, and Tan,
2016). The whole exome sequencing of five patients with dilated cardiomyopathy (DCM)
was carried out in order to validate the predicted deleterious variants of CVDs
identified in healthy persons through bioinformatics analysis (section 2.2). Dilated
cardiomyopathy is characterized by dilation of the left ventricle and an impaired
ability to pump blood to the peripheral body (systolic dysfunction) in the absence of
coronary artery disease and other abnormal loading conditions such as valve disease or
hypertension (Elliott, 2000). For this study, ethical approval was obtained from the
institutional 'Internal Ethical Committee'. The patients with dilated cardiomyopathy
were selected from the National Institute of Cardiovascular Diseases (NICVD), Karachi,
Pakistan. These five patients belonged to five different ethnic backgrounds of
Pakistan, i.e., one patient each from the Punjabi, Sindhi, Balochi, Kashmiri, and
Urdu-speaking communities.

2.3.1 Selection of Cardiomyopathy Patients

The patients with dilated cardiomyopathy were selected on the basis of a diagnosis
confirmed by a cardiologist. The inclusion criteria comprised the physical symptoms of
the patients, echocardiography findings (left ventricular dilation, ejection fraction
≤ 30%), age < 55 years, and the absence of coronary artery disease, hypertension, and
heart valve disease (Japp, Gulati, Cook, Cowie, and Prasad, 2016). Patients with other
modifiable risk factors, such as smoking and alcoholism, were also excluded. In order
to enhance the power of the study, patients whose parents were cousins or otherwise
related by blood were preferably selected. Informed consent was obtained from the
patients prior to the collection of blood specimens.


2.3.2 Collection of Blood Samples, and DNA Isolation and Quantification

The collection of blood samples from the selected patients, the isolation of genomic
DNA, and the assessment of its quality were carried out according to the protocols
described in section 2.1. Approximately 5 mL of venous blood was collected from each
patient in an EDTA tube. The genomic DNA was isolated using the CTAB buffer. The quality
of the genomic DNA was assessed by agarose gel electrophoresis, for which a 1% agarose
gel was prepared in 1X TAE buffer. The quantity of genomic DNA was estimated on the
Qubit® 2.0 Fluorometer using the Qubit® dsDNA HS Assay Kit (Thermo Fisher Scientific).
For quantification, 1 μL of DNA sample was added to 199 μL of kit buffer containing the
fluorophore.
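Because the sample is diluted 200-fold in the assay tube, the concentration of the original DNA stock follows from the concentration measured in the tube. The following is a minimal sketch of this back-calculation; the function name and the example reading are hypothetical, and the reading is assumed to be the concentration in the assay tube rather than an instrument-corrected stock value.

```python
def stock_concentration(tube_concentration, sample_ul=1.0, buffer_ul=199.0):
    """Back-calculate the stock concentration from the assay-tube reading.

    The result has the same concentration unit as the input reading.
    """
    dilution_factor = (sample_ul + buffer_ul) / sample_ul  # 200-fold here
    return tube_concentration * dilution_factor


# Hypothetical reading of 0.25 ng/uL in the assay tube
print(stock_concentration(0.25))  # 50.0 ng/uL in the original sample
```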

2.3.3 Library Preparation and Exome Enrichment for Whole Exome

Sequencing

The library preparation and whole exome sequencing were carried out at Macrogen Inc.,
Seoul, South Korea. For whole exome sequencing, the fragment library for paired-end
sequencing was prepared with the SureSelectXT Library Preparation Kit (Agilent
Technologies, Santa Clara, CA) following the SureSelectXT Target Enrichment System for
Illumina protocol, Version B.2, April 2015. The whole exomes were enriched with the
SureSelectXT Human All Exon v6 kit (Agilent Technologies, Santa Clara, CA), which
captures 60 Mb of the human genome (Agilent Technologies, 2017). The workflow of NGS
library preparation is outlined in Figure 2.5. The details of the protocol are
described below:

2.3.3.1 Fragmentation of Genomic DNA

The genomic DNA (gDNA) was fragmented to an average size of 200 bp using the Covaris
S220 system. About 200 ng of genomic DNA was diluted with 1X Low TE Buffer to a final
volume of 50 μL in a 1.5-mL LoBind tube and then transferred to a Covaris MicroTube.
The gDNA was sheared using the Covaris program given in Table 2.6.


Figure 2.5: NGS workflow for fragment library preparation and paired-end sequencing on Illumina.


Table 2.6: Settings on the Covaris instrument for gDNA fragmentation

Settings                      Value
Duty Factor                   10%
Peak Incident Power (PIP)     175
Cycles per Burst              200
Treatment Time                6 min
Bath Temperature              5 °C

2.3.3.2 End-repair of the Fragmented DNA

The ends of the sheared DNA were repaired using the SureSelectXT Library Prep Kit. For
each sample, 52 μL of End Repair master mix was prepared as given in Table 2.7. The
whole of the sheared DNA was mixed with the End Repair master mix, incubated in a
thermocycler at 20 °C for 30 minutes, and then held at 4 °C.

Table 2.7: Components of End Repair master mix

Components                    Volume (μL)
10× End Repair Buffer         10.0
dNTP Mix                      1.6
T4 DNA Polymerase             1.0
Klenow DNA Polymerase         2.0
T4 Polynucleotide Kinase      2.2
Nuclease-free water           35.2
Total mixture                 52.0
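When several samples are processed together, the per-sample volumes of Table 2.7 are scaled up. The following minimal sketch checks the per-sample total and scales the mix for a batch. The function name and the 10% pipetting overage are assumptions made for illustration and are not stated in the protocol above.

```python
END_REPAIR_MIX_UL = {
    "10x End Repair Buffer": 10.0,
    "dNTP Mix": 1.6,
    "T4 DNA Polymerase": 1.0,
    "Klenow DNA Polymerase": 2.0,
    "T4 Polynucleotide Kinase": 2.2,
    "Nuclease-free water": 35.2,
}


def scale_master_mix(per_sample_ul, n_samples, overage=0.10):
    """Scale per-sample volumes to a batch, with a hypothetical overage."""
    factor = n_samples * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_sample_ul.items()}


per_sample_total = sum(END_REPAIR_MIX_UL.values())
print(f"Per-sample total: {per_sample_total:.1f} uL")

batch = scale_master_mix(END_REPAIR_MIX_UL, n_samples=5)
for name, volume in batch.items():
    print(f"{name}: {volume} uL")
```

The per-sample volumes add up to the 52 μL stated in the table; the same helper can be applied to the other master mixes of this section.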

2.3.3.3 Purification and Adenylation of End-repaired DNA

The end-repaired DNA was purified using 180 μL of homogeneous AMPure XP beads for each
sample. The mixture was mixed well by pipetting up and down and then incubated at room
temperature for 5 minutes. The tube was placed on a magnetic stand for 1 minute to let
the solution clear. The clear supernatant was discarded carefully without disturbing
the beads. The beads were washed twice with 70% ethanol without disturbing them. After
the second wash, 32 μL of nuclease-free water was added to each sample tube, which was
vortexed for 15 seconds and incubated at room temperature for 2 minutes. The tube was
again placed on the magnetic stand for 3 minutes. The clear supernatant, which contained
the end-repaired DNA, was carefully transferred to a new PCR tube.

The end-repaired DNA was adenylated at the 3'-ends to enhance the ligation efficiency
of the SureSelect Adaptors. For this reaction, 20 μL of adenylation master mix was used
with 30 μL of end-repaired DNA. The components of the adenylation master mix are given
in Table 2.8. The mixture was incubated in a thermocycler at 37 °C for 30 minutes,
followed by a hold at 4 °C. The 3'-adenylated DNA was purified using the AMPure XP beads
as described above. The DNA was eluted in 13 μL of nuclease-free water.

Table 2.8: Components of Adenylation master mix

Components                        Volume (μL)
10× Klenow Polymerase Buffer      5.0
dATP                              1.0
Exo(–) Klenow                     3.0
Nuclease-free water               11.0
Total mixture                     20.0

2.3.3.4 Ligation of Paired-end Adaptors

Paired-end adaptors were ligated to both ends of the 3'-adenylated DNA. The adaptors
are complementary to the sequencing primers used during the sequencing reactions in the
flow cell of the Illumina sequencer. For this, 1:10 diluted SureSelect Adaptor Oligo Mix
was used. The reaction mixture contained the components given in Table 2.9.

Table 2.9: Components for ligation of paired-end adaptors

Components                                       Volume (μL)
5× T4 DNA Ligase Buffer                          10.0
1:10 diluted SureSelect Adaptors Oligo Mix       10.0
T4 DNA Ligase                                    1.5
Nuclease-free water                              15.5
Total mixture                                    37.0

The paired-end adaptor reaction mixture was mixed with 13 μL of adenylated DNA from the
previous step. The mixture was incubated in a thermocycler at 20 °C for 15 minutes,
followed by a hold at 4 °C. The adaptor-ligated DNA was purified using the AMPure XP
beads as described in section 2.3.3.3. The DNA was eluted in 30 μL of nuclease-free
water.

2.3.3.5 Amplification of Adaptors-ligated Library

Amplification of the library with a few cycles of PCR increases the number of DNA
fragments to which both adaptors have been ligated. The library was amplified with
Herculase II Fusion DNA Polymerase, which has high fidelity. The components of the
amplification reaction mixture are given in Table 2.10.

Table 2.10: Components for amplifying the library

Components                                                  Volume (μL)
SureSelect Primer                                           1.25
SureSelect ILM Indexing Pre-Capture PCR Reverse Primer      1.25
5× Herculase II Reaction Buffer                             10.0
100 mM dNTP Mix                                             0.5
Herculase II Fusion DNA Polymerase                          1.0
Nuclease-free water                                         6.0
Total                                                       20.0

This 20 μL of amplification reaction mixture was added to 30 μL of the purified
adaptor-ligated DNA from the previous step and mixed well by pipetting, and 10 cycles of
PCR were performed according to the program given in Table 2.11.

Table 2.11: PCR program for amplification of adaptor ligated library

Temperature     Time           Repeats
98 °C           2 minutes      1
98 °C           30 seconds     10
65 °C           30 seconds
72 °C           1 minute
72 °C           10 minutes     1
4 °C            hold           ∞
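The program in Table 2.11 can be written down as data, which makes the minimum run time easy to check. The following is a minimal sketch; the variable and function names are hypothetical, and ramping time between temperatures and the final 4 °C hold are deliberately excluded.

```python
# Each step is (temperature in degrees C, hold time in seconds)
INITIAL_DENATURATION = (98, 120)
CYCLE_STEPS = [(98, 30), (65, 30), (72, 60)]  # denature, anneal, extend
N_CYCLES = 10
FINAL_EXTENSION = (72, 600)


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum the programmed hold times, ignoring ramping between steps."""
    cycle_time = sum(seconds for _, seconds in cycle_steps)
    return initial[1] + n_cycles * cycle_time + final[1]


seconds = total_hold_seconds(INITIAL_DENATURATION, CYCLE_STEPS,
                             N_CYCLES, FINAL_EXTENSION)
print(f"{seconds} s = {seconds / 60:.0f} min")  # 1920 s = 32 min
```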

After the PCR, the amplified library was purified again using the AMPure XP beads as
described in section 2.3.3.3 above. The amplified library was eluted in 30 μL of
nuclease-free water and stored at -20 °C.

2.3.3.6 Assessment of Quality and Quantity of the Amplified Library

The prepared libraries were analyzed and quantified using the Agilent 2200 TapeStation
(Agilent Technologies, Santa Clara, CA). The samples were prepared according to the
user manual. For the analysis, 1 μL of library was mixed with 3 μL of D1000 sample
buffer in a sample tube and vortexed for 5 seconds, followed by a brief centrifugation.
The sample tubes, D1000 ScreenTape, and loading tips were placed in the instrument, and
the run was started according to the user manual.

2.3.3.7 Hybridization and Exome Capturing

The exonic regions of the prepared library were captured using the SureSelectXT Human
All Exon V6 kit (Agilent Technologies, Santa Clara, CA). The libraries were
concentrated in a vacuum concentrator to a final volume of 3.4 μL at a concentration of
221 ng/μL (750 ng DNA). To each library, 5.6 μL of Block Mix (Table 2.12) was added, and
the mixture was incubated in a thermocycler at 95 °C for 5 minutes, followed by a hold
at 65 °C for another 5 minutes. For capturing the exonic regions, the Hybridization
Buffer was prepared according to Table 2.13, and the Capture Library Hybridization Mix
was prepared according to Table 2.14.

Table 2.12: Components of Block Mix

Components                            Volume (μL)
SureSelect Indexing Block 1           2.5
SureSelect Block 2                    2.5
SureSelect ILM Indexing Block 3       0.6
Total                                 5.6

Table 2.13: Components of Hybridization Buffer

Components              Volume (μL)
SureSelect Hyb 1        6.63
SureSelect Hyb 2        0.27
SureSelect Hyb 3        2.65
SureSelect Hyb 4        3.45
Total                   13.0

Table 2.14: Components of Capture Library Hybridization Mix for capture size ≥3 Mb

Components                        Volume (μL)
Hybridization Buffer mixture      13.0
25% RNase Block solution          2.0
Library to be captured            5.0
Total                             20.0

The capture components were mixed with the library at 65 °C in a thermocycler, and the
mixture was incubated at the same temperature for 24 hours.


2.3.3.8 Capturing the Hybridized DNA using Streptavidin-coated Beads

Before the capture of the hybridized DNA, the streptavidin-coated magnetic beads were
prepared. For this, 50 μL of streptavidin beads were washed 3 times by suspending them
in 200 μL of SureSelect Binding Buffer. The hybridization mixture (containing the
library) was added to 200 μL of washed streptavidin beads and mixed well on a stirrer
for 30 minutes. The plate was then centrifuged briefly and placed on a magnetic rack
until the beads had settled completely. The supernatant was discarded, and the beads
were washed with 200 μL of SureSelect Wash Buffer 1, with an incubation of 30 minutes at
room temperature. The beads were then washed 3 times with pre-warmed Wash Buffer 2, with
an incubation of 10 minutes at 65 °C. After the final wash, the beads were suspended in
30 μL of nuclease-free water.

2.3.3.9 Amplification of Captured Library with Indexing Primers

To run multiple samples in one lane of the Illumina sequencer, each sample needs to be
indexed for identification. The indexing is carried out using indexing primers in a
PCR. The PCR mixture for each library was prepared as given in Table 2.15. For
indexing, 14 μL of streptavidin-bound library, 1 μL of indexing primer, and 35 μL of PCR
mixture were mixed in a well of a PCR tube strip. The PCR was performed according to the
conditions given in Table 2.16. After the PCR, the amplified indexed library was
purified using the AMPure XP beads as described in section 2.3.3.3. The library was
eluted in 30 μL of nuclease-free water. The libraries were analyzed and quantified with
the Agilent 2200 TapeStation using a volume of 1 μL, as described in section 2.3.3.6.


Table 2.15: Components of PCR for indexing

Components                                                    Volume (μL)
5× Herculase II Reaction Buffer                               10.0
Herculase II Fusion DNA Polymerase                            1.0
100 mM dNTP Mix                                               0.5
SureSelect ILM Indexing Post-Capture Forward PCR Primer       1.0
Nuclease-free water                                           22.5
Total                                                         35.0

Table 2.16: PCR program for indexing the library

Temperature     Time           Repeats
98 °C           2 minutes      1
98 °C           30 seconds     10
57 °C           30 seconds
72 °C           1 minute
72 °C           10 minutes     1
4 °C            hold           ∞

2.3.3.10 Sequencing by Synthesis on Illumina Platform

The sequencing of the libraries was carried out on the Illumina HiSeq 4000 system using
the TruSeq SBS v3 reagents. After cluster generation, paired-end sequencing was carried
out with 2 × 100 bp read lengths using sequencing by synthesis (SBS) technology. In
SBS, fluorescently labelled nucleotides are incorporated into the growing polynucleotide
chain such that only one nucleotide is incorporated at a time, because each labelled
nucleotide acts as a reversible terminator. After the nucleotide is detected by its
fluorescence, the label and the terminator are cleaved chemically, and then the next
labelled nucleotide is incorporated.


2.3.4 Analysis of Whole Exome Sequencing Raw Data

The raw sequencing data was obtained in '.fastq' format. The secondary analysis that
generates a standard variant call format (VCF) file requires a sophisticated pipeline
of computational biology tools. The secondary analysis pipeline used by the ExAC
consortium was applied with slight modifications. The steps of the data analysis are
given below:

1. The quality of the raw data was assessed with the 'FastQC' tool (Andrews, 2010),
which analyzes high throughput short reads and graphically reports their read length,
per base quality scores, GC content, and adaptor contamination.

perl fastqc -f fastq MS-1_1.fastq -o ./qc/

2. The good quality short reads were aligned with the human reference genome version
19 (hg19.fa) of the UCSC genome browser
(http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/) using the Burrows-Wheeler
Aligner (BWA) tool (H. Li and Durbin, 2009). The reference genome was indexed with
'bwa index' prior to the alignment. For the alignment, the BWA-MEM algorithm was used
because it is designed for sequence reads of 70 bp to 1 Mbp. This algorithm seeds the
alignments with maximal exact matches (MEMs) and then extends the seeds with the
affine-gap Smith-Waterman (SW) algorithm.

./bwa index hg19.fa

./bwa mem -M -t 8 hg19.fa MS-1_1.fastq MS-1_2.fastq > ms1.sam

3. The alignment result was obtained in Sequence Alignment Map (SAM) format, which
was converted into the binary 'Binary Alignment Map' (BAM) format using samtools
version 0.0.19 (H. Li et al., 2009).

samtools view -bS ms1.sam -o ms1.bam


4. The next five steps of the pipeline, i.e., sorting of the BAM files, adding RG
tags, removing duplicates, indel realignment, and base quality score recalibration, were
performed as described in section 2.2.4.3.

5. Genotype calling on each base quality score recalibrated BAM file was carried out
using the GATK walker HaplotypeCaller. The minimum calling confidence was set to 30. To
produce a genomic VCF (gVCF) for each sample for subsequent joint genotyping, the
-ERC GVCF flag was used.

java -Xmx32g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit \
    -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    --disable_auto_index_creation_and_locking_when_reading_rods -R hg19.fa \
    -I MS4_q30_sort_dedup_RG_ReAlignIndel_BQRC.bam \
    --dbsnp dbsnp37_chr_20151104.vcf \
    --minPruning 3 --maxNumHaplotypesInPopulation 200 \
    -ERC GVCF -o MS4_q30.g.vcf.gz -stand_call_conf 30

6. Joint variant calling from the gVCF files was carried out with the GATK walker
GenotypeGVCFs, as:

java -Xmx32g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R /scratch/cmp_shakeel/tools/hg19.fa \
    -V MS1.g.vcf.gz -V MS2.g.vcf.gz -V MS3.g.vcf.gz -V MS4.g.vcf.gz -V MS5.g.vcf.gz \
    --dbsnp dbsnp37_chr_20151104.vcf -A GenotypeSummaries -o msall_GVCF.vcf

7. The raw VCF file 'msall_GVCF.vcf' was filtered for a depth of at least 20 (DP≥20),
a genotype quality of at least 20 (GQ≥20), and a variant quality of at least 50
(QUAL≥50). Further, to minimize the false positive discovery rate, variant quality score
recalibration was performed using the GATK variant discovery tool 'VariantRecalibrator'.
For single nucleotide variants, the model was trained with three high confidence
datasets of known SNPs, i.e., 'hapmap_3.3.GRCh37.vcf', '1000G_omni2.5.GRCh37.vcf', and
'1000G_phase1.snps.high_confidence.GRCh37.vcf'. Likewise, for indels, the model was
trained using 'Mills_and_1000G_gold_standard.indels.GR37.sites.vcf' and
'1000G_phase1.indels.hg19.vcf'. A truth sensitivity threshold of 99.2% was applied for
SNPs and of 95.0% for indels. Only the variants passing the filter proceeded to the
tertiary analysis.


java -Xmx32g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
    --disable_auto_index_creation_and_locking_when_reading_rods -R hg19.fa \
    -input msall_DP20gq20.vcf \
    -recalFile msall_DP20gq20_output-Tranche.snps.recal \
    -tranchesFile msall_DP20gq20_output-Tranche.snps.tranches -allPoly \
    -tranche 100.0 -tranche 99.8 -tranche 99.6 -tranche 99.4 -tranche 99.2 \
    -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 95.0 -tranche 90.0 \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -resource:hapmap,known=false,training=true,truth=true,prior=15 hapmap_3.3.GRCh37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12 1000G_omni2.5.GRCh37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10 1000G_phase1.snps.high_confidence.GRCh37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=3 dbsnp37_chr_20151104.vcf \
    --maxGaussians 4 -mode SNP \
    -rscriptFile msall_output.snps.recalibration-Tranche_plots.rscript


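java -Xmx32g -Djava.io.tmpdir=./tmp/ -XX:-UseGCOverheadLimit -XX:-DoEscapeAnalysis \
    -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
    --disable_auto_index_creation_and_locking_when_reading_rods -R hg19.fa \
    -input msall_DP20gq20.vcf \
    -recalFile msall_DP20gq20_output.snps.recal \
    -tranchesFile msall_DP20gq20_outputg4.snps.tranches \
    -ts_filter_level 99.2 -mode SNP -o msall_DP20gq20_SNP.vcf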


DoEscapeAnalysis -jar GenomeAnalysisTK.jar -T VariantRecalibrator --


msall_DP20gq20-SNP98.vcf
-recalFile msall_DP20gq20-output.indels.recal
-tranchesFile msall_DP20gq20-output.indels.tranches -allPoly
-tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0
-tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 -tranche 93.5
-tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0
-an QD -an ReadPosRankSum -an MQRankSum -an FS -an MQ
-resource:mills,known=false,training=true,truth=true,prior=12 Mills_and_1000G_gold_standard.indels.GR37.sites.vcf
-resource:1000G,known=false,training=true,truth=false,prior=10 1000G_phase1.indels.hg19.vcf
-resource:dbsnp137,known=true,training=false,truth=false,prior=2 dbsnp37_chr_20151104.vcf
--maxGaussians 4 -mode INDEL
-rscriptFile MSall_output.indels.recalibration_plots.rscript



DoEscapeAnalysis -jar GenomeAnalysisTK.jar -T ApplyRecalibration --


msall_DP20gq20-SNP.vcf -recalFile msall_DP20gq20-output.indels.recal -tranchesFile

msall_DP20gq20-output.indels.tranches -ts_filter_level 95.0 -mode INDEL -o

msall_DP20gq20_SNP-indel.vcf

2.3.5 Analysis of Variants for Cardiomyopathy

For annotating the genetic variants and determining the potentially detrimental variants,
the analysis pipeline described in section 2.1.3 was employed. The annotation with
ANNOVAR was carried out with the stand-alone ANNOVAR utility, CADD scoring was
performed at the CADD online server (http://cadd.gs.washington.edu/score), and
VEP was accessed at Ensembl’s online server
(http://grch37.ensembl.org/Homo_sapiens/Tools/VEP).

In addition to annotation with the aforementioned tools, the data set was also filtered for
pathogenic and likely pathogenic variants of dilated cardiomyopathy in the ClinVar
database (Landrum et al., 2014), the OMIM database (Hamosh, Scott, Amberger,
Bocchini, and McKusick, 2005), and variants reported in genome-wide association
studies (GWAS) (Welter et al., 2013). Furthermore, the deleterious variants
prioritized from healthy individuals in the mutational load analysis (section 2.1) were also
filtered for their validation.


Chapter 3.0

Results and Discussion


3.1 Mutational Load of Cardiovascular Diseases in Pakistani

Population and its Comparison with Global Populations

3.1.1 Gene Ontology

Grouping of the genes under study based on their cellular, molecular, and biological
roles was carried out using the UniProt Gene Ontology Annotation database for
human (version 2.0) (Camon et al., 2004), and visualized using the BGI WEGO online
ontology tool (Ye et al., 2006). This analysis showed that most of the genes were
involved in binding, catalysis, molecular transduction, and enzyme regulation in
many biological processes such as biological regulation, anatomical structure
formation, cellular compartment organization and genesis, and developmental, metabolic,
and organismal processes (Figure 3.1). The gene ontology suggests that genes related
to structural processes of the organelles represent the anatomical nature of cardiac
diseases, whereas genes related to extracellular processes represent their metabolic
nature.

3.1.2 Mutational Load of CVDs in Pakistani Population using 1000

Genomes PJL, ExAC SAS, and British Pakistanis Datasets

To quantify the mutational load of cardiovascular diseases, all the SNVs from the three
datasets corresponding to intronic, exonic, untranslated, and flanking
upstream/downstream regions of the gene set were analyzed by applying the
analysis pipeline. The numbers of variants found in the three data sets
differed because of differences in data structure and sample size (Table 3.1).


Figure 3.1: Functional categorization of genes involved in cardiovascular diseases.


Table 3.1: The subset of variants within the coordinates of genes-set of CVDs. Here, ExAC (SAS) data was excluded for common CVDs for calculating the mutational load because it contained samples of common CVDs cohort.

Details                                           1000 Genomes PJL   British Pakistanis   ExAC South Asian
Sample size                                       96                 3222                 8256
Genes related to CVDs analyzed                    1187               1187                 379
Subset of variants in these genes                 409102             93523                71816
Exonic variants                                   6941               41155                44357
5’-UTR variants                                   1573               1898                 1075
3’-UTR variants                                   7541               2632                 1694
Upstream variants                                 4668               256                  80
Downstream variants                               4752               39                   09
Predicted Consequences of Variants:
Non-synonymous SNVs                               3521               24901                28305
Synonymous SNVs                                   4125               15624                15437
Non-syn/syn ratio                                 0.85               1.59                 1.83
‘Combined predicted deleterious’ SNV sites with SIFT,
  Polyphen2, and CADD_phred score ≥ 15 (dSNVs)    561                6028                 7374
Homozygous dSNVs                                  69                 -*                   306
Loss of Function (LoF) dSNVs                      05                 09                   142
Per Person deleterious SNV sites                  5.84               1.87                 0.89

* information not available

In order to normalize and evaluate the subset variants, the proportions of synonymous
SNVs and nonsynonymous SNVs in exonic variants, the non-syn/syn ratio, deleterious
nonsynonymous SNVs, and homozygous deleterious nSNVs were calculated for these
data sets. This evaluation showed that the proportions of nonsynonymous
exonic SNVs (nonsynonymous SNVs/exonic SNVs) and deleterious nonsynonymous
SNVs (deleterious nSNVs/exonic SNVs) were higher in the British Pakistanis and ExAC
SAS datasets. On the other hand, the proportion of synonymous exonic SNVs
(synonymous SNVs/exonic SNVs) was higher in the 1000 Genomes Project PJL dataset
(Figure 3.2). The higher proportions of nonsynonymous and deleterious SNVs in ExAC
SAS and British Pakistanis may be due to the data structure, because both
these data sets were deeply sequenced (~100x), which also captured variants of
ultra-rare allele frequency. To check this, the average allele frequencies of nonsynonymous
and deleterious nSNVs were calculated and compared across the three datasets. The
average allele frequency of nonsynonymous SNVs was 0.117 in 1000 Genomes PJL,
0.01839 in British Pakistanis, and 0.00826 in ExAC SAS. Likewise, the average allele
frequency of deleterious nSNVs was 0.028 in 1000 Genomes PJL, 0.00381 in British
Pakistanis, and 0.00149 in ExAC SAS.
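The normalization described above can be reproduced from the counts in Table 3.1. The following is a minimal sketch, assuming that the "Exonic variants" row of Table 3.1 is the denominator for each proportion; the dictionary keys and the function name are illustrative only.

```python
# Counts copied from Table 3.1.
table_3_1 = {
    "1000 Genomes PJL":   {"exonic": 6941,  "nonsyn": 3521,  "syn": 4125,  "dsnv": 561},
    "British Pakistanis": {"exonic": 41155, "nonsyn": 24901, "syn": 15624, "dsnv": 6028},
    "ExAC SAS":           {"exonic": 44357, "nonsyn": 28305, "syn": 15437, "dsnv": 7374},
}


def proportions(counts):
    """Return the normalized quantities for one dataset."""
    return {
        "nonsyn/exonic": counts["nonsyn"] / counts["exonic"],
        "syn/exonic": counts["syn"] / counts["exonic"],
        "dSNV/exonic": counts["dsnv"] / counts["exonic"],
        "nonsyn/syn": counts["nonsyn"] / counts["syn"],
    }


summary = {name: proportions(counts) for name, counts in table_3_1.items()}
for name, values in summary.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
```

The non-syn/syn ratios obtained this way round to the values listed in Table 3.1 (0.85, 1.59, and 1.83).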

Figure 3.2: The proportions of nonsynonymous, synonymous, and deleterious SNVs in three datasets.


The numbers of SNVs predicted as deleterious by CADD, SIFT, and Polyphen2 after
applying the analysis pipeline are summarized in Figure 3.3. The per-person
mutational load for cardiovascular diseases was calculated by dividing the combined
predicted deleterious SNVs by the sample size in each dataset. This calculation showed
that there were 5.84 deleterious sites per person in 1000 Genomes PJL, 0.89 in ExAC
SAS, and 1.87 in the British Pakistanis dataset. The low mutational load in ExAC SAS
arises because this dataset was analyzed for Mendelian and congenital CVDs only. This is
consistent with the low prevalence of Mendelian disorders, which are usually caused by
mutations of rare allele frequency in a single gene or a few genes with large impact on
the structure and/or function of proteins (O'Donnell, and Nabel, 2011).
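As a worked check of the per-person figures quoted above, the division can be written out with the counts of combined predicted deleterious SNV sites and the sample sizes from Table 3.1. This is a small sketch; the variable names are illustrative.

```python
# Combined predicted deleterious SNV sites (dSNVs) and sample sizes from Table 3.1.
deleterious_sites = {"1000 Genomes PJL": 561, "British Pakistanis": 6028, "ExAC SAS": 7374}
sample_size = {"1000 Genomes PJL": 96, "British Pakistanis": 3222, "ExAC SAS": 8256}

# Per-person mutational load = deleterious sites / number of individuals.
per_person = {
    name: deleterious_sites[name] / sample_size[name] for name in deleterious_sites
}

for name, load in per_person.items():
    print(f"{name}: {load:.2f} deleterious sites per person")
```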

Figure 3.3: The number of SNVs predicted as deleterious by CADD, Polyphen2, and SIFT in genes of cardiovascular diseases.

To explore the apparent difference in the mutational load of CVDs between the 1000
Genomes PJL and British Pakistanis datasets, which were analyzed with the same
number of genes, the additive mutational load was calculated. Additive mutational load
is the cumulative effect on fitness, taking into account the effect of all detrimental
alleles (Bergen, 2015; Henn et al., 2016). It was determined by dividing the sum of
all homozygous and heterozygous deleterious alleles by the cohort size of that dataset
(Henn et al., 2016). The per-person additive mutational load (in the diploid genome) was
calculated to be 22.03 for British Pakistanis and 15.78 for 1000 Genomes PJL.
Although the British Pakistanis dataset contained fewer deleterious sites per person,
these sites might have risen to higher frequencies through inbreeding among related
individuals in consanguineous unions, resulting in a higher additive mutational load
for CVDs, a phenomenon termed inbreeding depression. In inbreeding depression,
increased breeding among related individuals reduces biological fitness owing to the
accumulation of recessive mutations of varying detrimental effect in a small
population (Charlesworth, and Willis, 2009). Further, the higher mutational load for
common CVDs than for Mendelian and congenital CVDs can be explained by the fact that
common CVDs are polygenic, with a large number of deleterious variants of
modest-to-weak effect in multiple genes acting cumulatively on disease susceptibility,
whereas Mendelian CVDs are monogenic or oligogenic, with a few rare variants exerting a
greater effect on the phenotype (Lettre, 2014).
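The additive load calculation can be illustrated with a small sketch. It assumes that genotypes are coded as the number of deleterious alleles carried at each site (0 = none, 1 = heterozygous, 2 = homozygous), so that a homozygous genotype contributes two alleles to the sum; this coding, the function name, and the toy genotype matrix are assumptions made for illustration and are not data from the thesis.

```python
def additive_load(genotypes):
    """Per-person additive load: total deleterious allele count / cohort size.

    `genotypes` is a list of sites; each site is a list with one entry per
    individual giving the number of deleterious alleles carried (0, 1, or 2).
    """
    n_individuals = len(genotypes[0])
    total_alleles = sum(sum(site) for site in genotypes)
    return total_alleles / n_individuals


# Toy cohort of 4 individuals genotyped at 3 deleterious sites (hypothetical).
toy_genotypes = [
    [0, 1, 2, 0],  # site 1: one heterozygote, one homozygote
    [1, 1, 0, 0],  # site 2: two heterozygotes
    [0, 0, 0, 2],  # site 3: one homozygote
]

load = additive_load(toy_genotypes)
print(f"Additive load per person: {load:.2f}")
```

In this toy cohort the seven deleterious alleles are shared among four individuals, which illustrates how homozygosity raises the additive load even when the number of distinct deleterious sites is small.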

From the 1000 Genomes PJL analysis, the highest number of deleterious variants (10
variants) was found in PRRC2A, which encodes proline rich coiled-coil 2A and is
involved in coronary artery aneurysm (Hsieh et al., 2010). The second highest number
of deleterious variants (9 variants) was found in SVEP1, which encodes Sushi, von
Willebrand factor type A, EGF and pentraxin domain containing 1 and is involved in
calcium ion and chromatin binding. This gene has been associated with coronary
artery disease. Notably, the deleterious variant rs111245230 found in SVEP1,
which causes a D2702G substitution in exon 38, has been reported to be
associated with coronary artery disease and higher diastolic and systolic blood
pressures (Stitziel et al., 2016). Its minor allele frequency was found to be 5.20% in
PJL individuals, 2.76% in South Asians, 3.18% in Europeans, and 2.74% in
Americans. The gene with the third highest number of deleterious variants (8 variants)
was SYNE1, which encodes spectrin repeat containing nuclear envelope protein 1,
a structural protein in skeletal and smooth muscles that is associated with
dilated cardiomyopathy. Mutations in this gene disrupt the nuclear envelope,
leading to defects in myogenesis (Zhou et al., 2017). Further, the genes APOB and
MUC16 were each found to have seven deleterious mutations. APOB is
associated with hypercholesterolemia and coronary artery disease (Willer et al., 2008),
while MUC16 has been reported to be associated with hypertrophic cardiomyopathy
and heart failure (Varol et al., 2007). Another gene, ACE, was found to have six
deleterious variants. This gene encodes angiotensin I converting enzyme and has been
associated with risk of hypertensive heart disease and coronary artery disease (Dhar,
Ray, Dutta, Sengupta, and Chakrabarti, 2012).

From the ExAC SAS data analysis, the highest number of deleterious variants (1526)
was found in TTN, which encodes titin, a component of sarcomeres in striated muscles,
and is associated with cardiomyopathies (Gerull et al., 2002; Matsumoto et al., 2005).
To determine the chromosomal locations of this large number of deleterious variants in
TTN, a Manhattan plot covering the TTN region on chromosome 2 was constructed using
the CADD_phred score. This showed that the majority of deleterious variants are
clustered in the initial exons of this gene (Figure 3.4).

Figure 3.4: Chromosomal positions of deleterious variants in TTN. The deleterious variants are bunched in initial exons of the gene.


In addition to TTN, many genes were found to have multiple deleterious variants for
Mendelian and congenital diseases (Table 3.2). OBSCN, a paralogue of TTN,
had the second highest number of deleterious variants (233). This gene
encodes obscurin, a cytoskeletal calmodulin and titin-interacting RhoGEF protein.

Table 3.2: Genes of Mendelian and congenital CVDs containing high number of predicted deleterious variants in ExAC SAS

Gene | No. of deleterious variants | Disorder

TTN 1526 Dilated cardiomyopathy

OBSCN 233 Dilated cardiomyopathy

SYNE1 144 Dilated cardiomyopathy

ALMS1 120 Alström syndrome and dilated cardiomyopathy

FLNC 111 Dilated cardiomyopathy


MYH6 99 Dilated cardiomyopathy

LAMA3 98 Dilated cardiomyopathy

AKAP9 88 Cardiac arrhythmia

RYR2 84 Cardiac arrhythmia

SCN5A 84 Atrial fibrillation, long_QT_syndrome

VWF 78 Hypertrophic cardiomyopathy

MUC16 77 Hypertrophic cardiomyopathy

TNC 69 Dilated cardiomyopathy


TRPM4 60 Ventricular fibrillation

DSP 57 Arrhythmogenic right ventricular dysplasia 8

POLG 55 Dilated cardiomyopathy

MADD 53 Familial hypertrophic cardiomyopathy 4

ANK2 52 Cardiac arrhythmia, ankyrin-b-related

FLNA 52 Cardiac valvular dysplasia, x-linked

MYBPC3 51 Cardiomyopathies

MYH11 51 Aortic aneurysm, familial thoracic 4

MYBPC1 49 Dilated cardiomyopathy


DMD 45 Cardiomyopathy, dilated, 3b

MYOM1 43 Primary dilated cardiomyopathy|primary familial hypertrophic cardiomyopathy

NOTCH1 41 Ventricular septal defect

LAMA4 40 Primary familial hypertrophic cardiomyopathy

NFATC1 39 Ventricular septal defect


PRDM16 38 Dilated cardiomyopathy 1LL

UNC5B 37 Ventricular tachycardia

PYGB 36 Hypertrophic cardiomyopathy

CTNNA3 35 Left ventricular noncompaction cardiomyopathy

JUP 34 Cardiac arrhythmia

CACNA1C 33 Cardiac arrhythmia

DSC3 33 Arrhythmogenic right ventricular dysplasia 11

ERBB2 32 Dilated cardiomyopathy

RTN4 32 Congenital heart defects, with multiple joint dislocations

LDB3 31 Familial hypertrophic cardiomyopathy 1

NDUFV1 30 Dilated cardiomyopathy

DTNA 29 Left ventricular noncompaction 1, with or without congenital heart defects

DMPK 28 Hypertrophic cardiomyopathy

TXNRD2 28 Primary familial hypertrophic cardiomyopathy

VCL 28 Hypertrophic cardiomyopathy

ACTN2 27 Dilated cardiomyopathy 1aa; primary familial hypertrophic cardiomyopathy

HLA-DRB1 27 Dilated cardiomyopathy

IGF1R 27 Hypertrophic cardiomyopathy

PGM1 27 Dilated cardiomyopathy

DSC2 25 Arrhythmogenic right ventricular dysplasia

MYBPC2 25 Hypertrophic cardiomyopathy

PKP2 25 Arrhythmogenic right ventricular dysplasia 9

MMP2 22 Hypertrophic cardiomyopathy

KCNQ1 21 Long qt syndrome 1


The British Pakistanis dataset, which was analyzed for all the genes of common,
Mendelian, and congenital CVDs, contained 52 deleterious variants in APOB, which is
well associated with hypercholesterolemia and coronary artery disease (Willer et al.,
2008). The second highest number of deleterious variants among common CVD genes was
found in PRRC2A, which is associated with coronary artery aneurysm (Hsieh et al.,
2010). The gene HSPG2 contained 44 deleterious variants. This gene encodes
heparan sulfate proteoglycan 2, which is prominent in atrial extracellular matrix and
has been reported to lower the risk of atherogenesis because it inhibits the retention of
lipoproteins. Low expression of HSPG2 and a decreased amount of heparan sulfate
proteoglycan have been shown to be associated with carotid atherosclerotic lesions
(Tran et al., 2007). Likewise, among the genes associated with Mendelian CVDs,
TTN was again the top gene, with 484 deleterious mutations. The genes prioritized
from this dataset as having multiple deleterious mutations are summarized in Table 3.3.

Table 3.3: Genes of common, Mendelian and congenital CVDs containing high number of predicted deleterious variants in British Pakistanis.

Gene | No. of deleterious variants | Disorder

Genes of Mendelian and congenital CVDs

TTN 484 Dilated cardiomyopathy

OBSCN 95 Dilated cardiomyopathy



RYR2 37 Cardiac arrhythmia



FLNC 35 Dilated cardiomyopathy

MUC16 34 Hypertrophic cardiomyopathy

SCN5A 33 Atrial fibrillation, long_QT_syndrome

ALMS1 31 Alström syndrome and dilated cardiomyopathy

FPGT-TNNI3K 30 Cardiac conduction disease with or without dilated cardiomyopathy

ABCC6 28 Arterial calcification of infancy

AKAP9 28 Cardiac arrhythmia


TRPM4 24 Ventricular fibrillation


VWF 21 Hypertrophic cardiomyopathy

MYH11 20 Aortic aneurysm, familial thoracic 4

Genes of Common CVDs

APOB 52 Hypercholesterolemia and coronary artery disease

PRRC2A 45 Coronary artery aneurysm

HSPG2 44 Carotid atherosclerotic lesions

SDK1 33 Hypertension

CSMD1 30 Hypertension

ABCC6 28 Dystrophic cardiac calcification

FN1 27 Aortic aneurysm

ACE 26 Hypertensive heart disease and coronary artery disease

ITPR3 21 Coronary artery aneurysm

SVEP1 21 Coronary artery disease

XDH 21 Atherosclerosis

In the datasets under study, the common CVDs with the highest numbers of predicted
deleterious SNVs were, in descending order, hypertension, atherosclerosis, heart
failure, aneurysm, and coronary heart disease. For Mendelian and congenital CVDs, they
were cardiomyopathies (dilated and hypertrophic), cardiac arrhythmias, and
atrioventricular septal defects.


3.1.3 Filtration of Variants from ClinVar Database

The variants of the three datasets under study were filtered for disease mutations
catalogued in the ClinVar database. This filtration highlighted several variants
associated with cardiovascular disorders with pathogenic or likely pathogenic
significance. Here, almost all filtered variants were related to Mendelian and congenital
CVDs, owing to the nature of submissions in the database.
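The filtration step can be sketched as a simple selection on the clinical significance label. The example below is a minimal illustration, assuming each annotated variant is available as a record with a significance field; the field names and the function name are hypothetical, the first two records are taken from Table 3.4, and the third is an invented benign record added only to show the exclusion.

```python
KEPT_SIGNIFICANCE = {"pathogenic", "likely pathogenic"}


def filter_clinvar(records):
    """Keep records whose ClinVar significance is Pathogenic or Likely Pathogenic."""
    return [
        record
        for record in records
        if record["clnsig"].strip().lower() in KEPT_SIGNIFICANCE
    ]


records = [
    {"id": "rs201654872", "gene": "PRDM16", "clnsig": "Pathogenic"},
    {"id": "rs193922669", "gene": "DSP", "clnsig": "Likely Pathogenic"},
    # Hypothetical record, included only to demonstrate that it is filtered out.
    {"id": "rs_hypothetical_1", "gene": "GENE_X", "clnsig": "Benign"},
]

selected = filter_clinvar(records)
for record in selected:
    print(record["id"], record["gene"], record["clnsig"])
```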

From 1000 Genomes PJL, three variants with ClinVar significance ‘Pathogenic’ and two
variants with ‘Likely Pathogenic’ were identified (Table 3.4). The three pathogenic
SNVs (rs201654872, rs115372595, and rs201680145) contribute to dilated
cardiomyopathy, atrioventricular septal defect, and cerebral autosomal dominant
arteriopathy, respectively. The annotation with the online VEP tool showed that two
pathogenic missense SNVs, rs201654872 [Val/Met] and rs201680145 [Arg/Cys], are
linked with a CCCTC-binding factor site (CTCF_binding_site). CTCF binding sites
are major determinants of long-range chromatin interactions (looping), which alter
gene expression (Zlotorynski, 2015). The third pathogenic missense SNV,
rs115372595 [Ala/Val], is also linked with a regulatory region (open chromatin region).
Open chromatin sites tend to be near the transcription start site and play a role in
gene expression, coincident with CTCF binding sites (Song et al., 2011). The two
‘Likely Pathogenic’ variants (rs193922669 and rs77613865) contribute to
arrhythmogenic right ventricular cardiomyopathy and hypertrophic cardiomyopathy,
respectively. The missense SNV rs193922669 causes an Arg/His substitution in the
desmoplakin protein, while rs77613865 is a splice region variant that is also linked
with an open chromatin region affecting the expression of myomesin 1 (MYOM1).

From the ExAC SAS dataset, 153 SNVs were filtered, comprising 111 with ‘Pathogenic’ and
42 with ‘Likely Pathogenic’ significance (Table 3.5). It was noted that nearly half of
these filtered SNVs (78, 47.40%) belonged to different forms of cardiomyopathies, 38
(26.68%) SNVs were associated with long QT syndrome, and 8 (5.19%)
SNVs were related to various forms of atrioventricular septal defects. The allele
frequencies of the filtered variants were compared among continental populations, which
revealed 11 SNVs with significantly higher allele frequency in SAS than in other
populations of the world (Figure 3.5). The 153 filtered SNVs were annotated with the
online VEP tool for their functional consequences, which showed that 13 SNVs impart a
complete loss-of-function (LoF) effect on the transcripts, and 23 SNVs affect regulatory
regions.

The filtration of the British Pakistani dataset highlighted 58 SNVs, comprising 42 with
‘Pathogenic’ and 16 with ‘Likely Pathogenic’ significance (Table 3.6). In this data set,
23 SNVs were found to be associated with different forms of cardiomyopathies, 20
SNVs with long QT syndrome, 4 SNVs with atrioventricular septal defects, 2 SNVs
with familial hypercholesterolemia, 2 SNVs with aortic aneurysm, and 3 SNVs with
progressive familial heart block. The annotation with the online VEP tool for functional
consequences revealed 2 SNVs with a stop-gained effect on the transcript (rs786204338
and rs372827156), contributing to cardiomyopathies, and 10 SNVs with a detrimental
effect on regulatory regions.

To obtain an overall view of all the pathogenic and likely pathogenic variants filtered
from the three datasets, these were combined and their allele frequencies were
retrieved within their respective datasets. The allele frequencies of variants filtered
from the 1000 Genomes Project were almost equal across world populations. From ExAC
SAS, 11 SNVs were highlighted as having higher allele frequency in SAS than in other
populations of the world (Figure 3.5). The highest load, in terms of allele count of
ClinVar pathogenic and likely pathogenic variants, was found for progressive familial
heart block (OMIM # 113900 and 604559) (Figure 3.6). The genomic locations of genes
harboring these variants associated with common, Mendelian, and congenital CVDs in the
Pakistani population were highlighted (Figure 3.7). A few loci were observed to be rich
in pathogenic and likely pathogenic variants for CVDs, including SCN5A on chromosome 3,
KCNH2 on chromosome 7, GATA4 on chromosome 8, KCNQ1 and MYBPC3 on chromosome
11, MYH7 on chromosome 14, and KCNE1 on chromosome 21. SCN5A encodes
sodium voltage-gated channel alpha subunit 5, which is found primarily in cardiac
muscle. It plays a role in the upstroke of the action potential in cardiac cells.
Mutations in this gene have been found to disturb cardiac rhythm, causing long QT
syndrome (Qureshi et al., 2015; Schwartz et al., 1995). KCNQ1 encodes
potassium voltage-gated channel subfamily Q member 1, which is involved in the
repolarization phase of the action potential in cardiac muscle. Mutations in this gene
are also associated with long QT syndrome (Tester, and Ackerman, 2014). MYBPC3
encodes cardiac myosin binding protein C, which plays a role in forming the cross-bridges
of the A bands of cardiac striated muscle. MYH6 and MYH7 encode the alpha and beta
myosin heavy chains, respectively, which constitute the cardiac myosin proteins. These
three genes are well associated with various forms of cardiomyopathies leading to
heart failure (Bezzina, 2008; Cahill, Ashrafian, and Watkins, 2013). The genes KCNE1
and KCNE2 encode potassium voltage-gated channels that play a role in cardiac
conduction. These genes are associated with long QT syndrome (Tester, and
Ackerman, 2014).

Figure 3.5: ClinVar’s pathogenic and likely pathogenic variants from ExAC SAS having significantly higher allele frequency in SAS than in other populations.

[Bar chart. Y-axis: Allele Frequency (%), 0 to 3. Series: SAS, EUR, AMR, AFR, EAS.]


Figure 3.6: Mutational load of different cardiovascular disorders in terms of allele counts of ClinVar’s pathogenic and likely pathogenic variants.

[Bar chart. X-axis: Allele Count, with ticks at 1, 10, 100, 1000, and 10000. Categories: Progressive_familial_heart_block, Long_QT_syndrome, Left_ventricular_noncompaction, Hypertrophic_cardiomyopathy, Hyperlipidemia/Hypercholesterolemia, Dilated_cardiomyopathy, Congenital_heart_diseases, Arrhythmogenic_right_ventricular_cardiomyopathy. Series: British Pakistanis, ExAC SAS.]


Figure 3.7: Chromosomal positions of genes harboring ClinVar’s pathogenic and likely pathogenic variants associated with cardiovascular diseases. Each circle beside a chromosome denotes one variant, and the colour represents the gene.


Table 3.4: ClinVar’s pathogenic and likely pathogenic variants filtered from 1000 Genomes PJL dataset.

CHR POS ID REF ALT Gene Clinical Significance Disease
1 3347452 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy
6 7583050 rs193922669 G A DSP Likely Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy
8 11614483 rs115372595 C T GATA4 Pathogenic Atrioventricular_septal_defect_4
18 3149140 rs77613865 T G MYOM1 Likely Pathogenic
19 15289863 rs201680145 G A NOTCH3 Pathogenic Cerebral_autosomal_dominant_arteriopathy

Table 3.5: ClinVar’s pathogenic and likely pathogenic variants filtered from ExAC SAS dataset.

CHR POS ID REF ALT Gene Clinical Significance Disease
1 3329208 rs397514743 A G PRDM16 Pathogenic Left_ventricular_noncompaction_8
1 3347452 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy_1LL
1 11907430 rs61757261 T G NPPA Pathogenic Atrial_fibrillation_familial_6
1 116275561 rs146664754 G C CASQ2 Likely Pathogenic Ventricular_tachycardia
1 156108298 rs60890628 C T LMNA Pathogenic Dilated_cardiomyopathy_1A
1 169524537 rs118203906 C G F5 Pathogenic Thrombophilia_due_to_activated_protein_C_resistance
1 201328778 rs730881125 C T TNNT2 Likely Pathogenic Cardiomyopathy
1 201333455 rs483352832 G A TNNT2 Pathogenic Dilated_cardiomyopathy_1DD
1 227073271 rs63750197 C T PSEN2 Pathogenic Dilated_cardiomyopathy_1V
1 236925912 rs199920384 A G ACTN2 Likely Pathogenic Cardiomyopathy
2 179393524 rs565675340 G A TTN Likely Pathogenic Myopathy_with_fatal_cardiomyopathy
2 179430143 rs727505284 G A TTN Likely Pathogenic Dilated_cardiomyopathy_1G
2 179655434 rs397517497 C T TTN Likely Pathogenic Dilated_cardiomyopathy_1G
3 9979308 rs28941780 G A CRELD1 Pathogenic Atrioventricular_septal_defect_partial_with_heterotaxy_syndrome
3 14183113 rs778127887 C T TMEM43 Likely Pathogenic Cardiomyopathy


3 20225453 rs199815268 T C SGOL1 Pathogenic Chronic_atrial_and_intestinal_dysrhythmia
3 32200588 rs72552291 C T GPD1L Pathogenic Cardiomyopathy
3 33114105 rs72555392 C T GLB1 Pathogenic GM1-Gangliosidosis_Type_I_with_Cardiac_involvement
3 38592356 rs45563942 A G SCN5A Pathogenic Dilated_cardiomyopathy_1E
3 38592408 rs137854619 C T SCN5A Pathogenic Long_QT_syndrome_2/3
3 38592534 rs199473314 C T SCN5A Pathogenic Congenital_long_QT_syndrome
3 38603958 rs199473603 G A SCN5A Pathogenic Congenital_long_QT_syndrome
3 38607905 rs199473341 C T SCN5A Pathogenic Dilated_cardiomyopathy
3 38622640 rs199473183 A G SCN5A Pathogenic Congenital_long_QT_syndrome
3 38645420 rs1805124 T C SCN5A Pathogenic Progressive_familial_heart_block_type_1A
3 38647498 rs199473111 C T SCN5A Pathogenic Atrial_fibrillation|Atrial_fibrillation
3 38671821 rs199473059 C G SCN5A Pathogenic Congenital_long_QT_syndrome
3 46899901 rs145520567 C T MYL3 Likely Pathogenic Cardiomyopathy
3 46899903 rs193922391 T C MYL3 Likely Pathogenic Cardiomyopathy
4 114284598 rs45570339 C G ANK2 Pathogenic Congenital_long_QT_syndrome
4 114286207 rs66785829 T A ANK2 Pathogenic Arrhythmia
4 114288907 rs35530544 C A ANK2 Pathogenic Cardiac_arrhythmia_ankyrin_B-related
4 114294462 rs121912706 C T ANK2 Pathogenic Long_QT_syndrome_4|Arrhythmia
4 114294537 rs45454496 G A ANK2 Pathogenic Arrhythmia
5 251453 rs137852768 G A SDHA Pathogenic Mitochondrial_complex_II_deficiency|Dilated_cardiomyopathy_1GG
5 172662014 rs28936670 G A NKX2-5 Pathogenic Tetralogy_of_Fallot|Interrupted_aortic_arch|Truncus_arteriosus|Hypoplastic_left_heart_syndrome_2|Malformation_of_the_heart_and_great_vessels
6 6152107 rs267606789 G A F13A1 Pathogenic Factor_xiii_a_subunit_deficiency_of
6 7542236 rs121912998 G A DSP Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy_type_8


6 7583050 rs193922669 G A DSP Likely

Pathogenic


cardiomyopathy

6 121769078 rs2227885 G A GJA1 Pathogenic Hypoplastic_left_heart_syndrome|Atrioventricular_septal_defect_and_common_atrioventricular_junction
6 121769120 rs104893965 G A GJA1 Pathogenic Hypoplastic_left_heart_syndrome|Atrioventricular_septal_defect_and_common_atrioventricular_junction
6 129601217 rs117422805 C T LAMA2 Likely Pathogenic Congenital_muscular_dystrophy
6 129674430 rs121913575 C T LAMA2 Pathogenic Congenital_muscular_dystrophy_due_to_partial_LAMA2_deficiency
6 149699739 rs267607100 C A TAB2 Pathogenic Congenital_heart_disease_multiple_types_2
7 150644429 rs377095107 G A KCNH2 Likely Pathogenic Cardiac_arrhythmia
7 150644799 rs141401803 G A KCNH2 Pathogenic Sudden_infant_death_syndrome|Cardiac_arrhythmia
7 150646083 rs121912510 G A KCNH2 Pathogenic Long_QT_syndrome_2|Congenital_long_QT_syndrome|Cardiac_arrhythmia
7 150647283 rs138498207 G A KCNH2 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia
7 150649763 rs199472901 G A KCNH2 Pathogenic Congenital_long_QT_syndrome
7 150655288 rs199472876 C T KCNH2 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia
7 150655407 rs587777907 T A KCNH2 Pathogenic Long_QT_syndrome_2
8 11566308 rs387906769 C T GATA4 Pathogenic Atrioventricular_septal_defect_4|Ventricular_septal_defect_1|Tetralogy_of_Fallot
8 11614521 rs368489876 G A GATA4 Pathogenic Ventricular_septal_defect_1
8 11615928 rs56208331 G A GATA4 Pathogenic Atrial_septal_defect_2|Tetralogy_of_Fallot
8 19811733 rs118204057 G A LPL Pathogenic Hyperlipoproteinemia_type_I
8 19813384 rs118204077 C T LPL Pathogenic Hyperlipoproteinemia_type_I
8 19813529 rs268 A G LPL Pathogenic Hyperlipidemia_familial_combined
10 69881254 rs140148105 A G MYPN Pathogenic Familial_hypertrophic_cardiomyopathy_22|Dilated_cardiomyopathy_1KK
10 69961675 rs71534280 G A MYPN Pathogenic Dilated_cardiomyopathy_1KK
10 75871844 rs121917776 C T VCL Pathogenic Dilated_cardiomyopathy_1W|Familial_hypertrophic_cardiomyopathy_15
10 88441437 rs45487699 C T LDB3 Pathogenic Dilated_cardiomyopathy_1C|Familial_hypertrophic_cardiomyopathy_24
10 88477867 rs145983824 C T LDB3 Pathogenic Familial_hypertrophic_cardiomyopathy_24
10 92678707 rs145387010 G A ANKRD1 Likely Pathogenic Primary_familial_hypertrophic_cardiomyopathy
11 2549180 rs199473450 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia


Cardiac_arrhythmia


Cardiac_arrhythmia

11 2608824 rs199473473 G A KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2609956 rs199472778 A C KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2790090 rs199472785 C T KCNQ1 Pathogenic Congenital_long_QT_syndrome
11 2799220 rs17221854 C T KCNQ1 Pathogenic Long_QT_syndrome_1|Congenital_long_QT_syndrome|Cardiac_arrhythmia
11 19209758 rs137852764 T C CSRP3 Pathogenic Dilated_cardiomyopathy_1M|Cardiomyopathy

11 47353637 rs730880142 C T MYBPC3 Likely

Pathogenic


cardiomyopathy

11 47354163 rs730880594 G A MYBPC3 Likely Pathogenic Cardiomyopathy


Pathogenic

Cardiomyopathy


Pathogenic


cardiomyopathy

11 47356671 rs387907267 G A MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4


Pathogenic

Cardiomyopathy


Pathogenic

Cardiomyopathy

11 47361267 rs730880561 G A MYBPC3 Pathogenic Cardiomyopathy
11 47364285 rs200625851 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4|Left_ventricular_noncompaction_10
11 47364602 rs193922377 C T MYBPC3 Pathogenic Cardiomyopathy
11 47364832 rs587776699 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4
11 47365041 rs730880641 A C MYBPC3 Pathogenic Cardiomyopathy
11 47371333 rs201098973 C T MYBPC3 Pathogenic Cardiomyopathy



Pathogenic


11 47371575 rs730880619 C G MYBPC3 Likely Pathogenic Cardiomyopathy
11 118039455 rs17121819 G A SCN2B Pathogenic Atrial_fibrillation_familial_14
12 5154893 rs121908591 C T KCNA5 Pathogenic Atrial_fibrillation_familial_7
12 8800737 rs727502791 G A MFAP5 Pathogenic Aortic_aneurysm_familial_thoracic_9

12 33003841 rs372827156 G A PKP2 Likely

Pathogenic


cardiomyopathy

12 98928103 rs17028450 C T TMPO Pathogenic Dilated_cardiomyopathy_1T
12 111356964 rs104894363 C T MYL2 Pathogenic Familial_hypertrophic_cardiomyopathy_10|Cardiomyopathy

14 23862177 rs267606904 C G MYH6 Pathogenic Familial_hypertrophic_cardiomyopathy_14|


cardiomyopathy

14 23862646 rs143978652 C A MYH6 Pathogenic Dilated_cardiomyopathy_1EE|Familial_hypertrophic_cardiomyopathy_14|Sudden_cardiac_death
14 23866396 rs515726230 T C MYH6 Likely Pathogenic Malformation_of_the_heart
14 23884268 rs730880820 C T MYH7 Likely Pathogenic Cardiomyopathy
14 23884341 rs369940645 C T MYH7 Likely Pathogenic
14 23886482 rs397516214 G C MYH7 Likely Pathogenic Cardiomyopathy
14 23886827 rs730880796 G A MYH7 Likely Pathogenic Cardiomyopathy
14 23887447 rs730880909 G C MYH7 Likely Pathogenic Cardiomyopathy
14 23890202 rs367546859 C T MYH7 Pathogenic Primary_familial_hypertrophic_cardiomyopathy|Cardiomyopathy
14 23892910 rs145532615 A G MYH7 Likely Pathogenic
14 23893148 rs45496496 C G MYH7 Likely Pathogenic Cardiomyopathy|Primary_dilated_cardiomyopathy
14 23894525 rs3218716 C T MYH7 Pathogenic Cardiomyopathy
14 23894554 rs376754645 C T MYH7 Likely Pathogenic
14 23895007 rs121913644 G A MYH7 Pathogenic Familial_hypertrophic_cardiomyopathy_1
14 23896982 rs377491278 C T MYH7 Likely Pathogenic Dilated_cardiomyopathy_1S

79

14 23902865 rs186964570 G A MYH7 Likely

Pathogenic


16 15814100 rs34321232 T G NDE1 Likely

Pathogenic

Familial_aortopathy

18 3149140 rs77613865 T G MYOM1 Likely

Pathogenic


18 19395685 rs201850378 C T MIB1 Pathogenic Left_ventricular_noncompaction_7

18 19438554 rs200035428 G T MIB1 Pathogenic Left_ventricular_noncompaction_7

18 28666646 rs193922708 G A DSC2 Likely Pathogenic Cardiomyopathy

18 28672114 rs144799937 C T DSC2 Likely Pathogenic Cardiomyopathy

18 29099850 rs121913013 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_


18 29104828 rs121913012 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_


18 29121188 rs201564919 G A DSG2 Pathogenic Cardiomyopathy

18 29178565 rs121918095 G A TTR Pathogenic Amyloidogenic_transthyretin_amyloidosis

18 29178618 rs76992529 G A TTR Pathogenic Amyloid_Cardiomyopathy|Cardiomyopathy

19 45452024 rs120074114 A C APOC2 Pathogenic Apolipoprotein_c-ii_variant

19 49671558 rs387907216 C T TRPM4 Pathogenic Progressive_familial_heart_block_type_1B

19 49685865 rs201907325 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B

19 49691898 rs172149856 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B

19 55665519 rs397516348 G T TNNI3 Likely Pathogenic Cardiomyopathy

19 55668953 rs397516359 G A TNNI3 Likely Pathogenic Dilated_cardiomyopathy_1FF

20 30408136 rs121908107 C T MYLK2 Pathogenic Cardiomyopathy,_hypertrophic_midventricular_digenic

20 30408160 rs121908108 C A MYLK2 Pathogenic Cardiomyopathy,_hypertrophic_midventricular_digenic

20 42788855 rs554853074 G C JPH2 Likely Pathogenic Cardiomyopathy

21 35742806 rs199473648 C T KCNE2 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia|Long_QT_syndrome

21 35742947 rs74315448 T C KCNE2 Pathogenic Long_QT_syndrome_6

21 35821559 rs142511345 G A KCNE1 Pathogenic Congenital_long_QT_syndrome

21 35821680 rs1805128 C T KCNE1 Pathogenic Long_QT_syndrome_2/5

21 35821686 rs199473360 C T KCNE1 Pathogenic Congenital_long_QT_syndrome



21 35821838 rs17857111 C T KCNE1 Likely Pathogenic Congenital_long_QT_syndrome



22 50962330 rs28937598 G A SCO2 Pathogenic Cardioencephalomyopathy

22 50962573 rs74315512 G A NCAPH2 Pathogenic Cardioencephalomyopathy

X 32360240 rs128626249 G A DMD Pathogenic Dilated_cardiomyopathy_3B

X 153609167 rs370840449 G A EMD Likely Pathogenic Cardiomyopathy

Table 3.6: ClinVar's pathogenic and likely pathogenic variants filtered from British Pakistanis dataset.

CHR POS ID REF ALT Gene Significance Disease

1 3430888 rs201654872 G A PRDM16 Pathogenic Dilated_cardiomyopathy_1LL

2 178738114 rs147879266 C T TTN Pathogenic Dilated_cardiomyopathy_1G

3 38550895 rs137854610 C T SCN5A Pathogenic Long_QT_syndrome_3|Congenital_long_QT_syndrome|Sudden_infant_death_syndrome

3 38550917 rs137854619 C T SCN5A Pathogenic Long_QT_syndrome_2/3_digenic|Congenital_long_QT_syndrome

3 38562467 rs199473603 G A SCN5A Pathogenic Congenital_long_QT_syndrome|Long_QT_syndrome|not_provided

3 38575385 rs41261344 C T SCN5A Pathogenic Brugada_syndrome_1|Long_QT_syndrome_3

3 38581149 rs199473183 A G SCN5A Pathogenic Congenital_long_QT_syndrome

3 38603929 rs1805124 T C SCN5A Pathogenic Progressive_familial_heart_block_type_1A

3 38630330 rs199473059 C G SCN5A Pathogenic Congenital_long_QT_syndrome

4 113365051 rs66785829 T A ANK2 Pathogenic Arrhythmia|Long_QT_syndrome|Cardiac_arrhythmia

4 113373306 rs121912706 C T ANK2 Pathogenic Long_QT_syndrome_4|Arrhythmia|Cardiac_arrhythmia

5 173235011 rs28936670 G A NKX2-5 Pathogenic Tetralogy_of_Fallot|Interrupted_aortic_arch|Truncus_arteriosus|Hypoplastic_left_heart_syndrome_2|Malformation_of_the_heart_and_great_vessels

6 7542003 rs121912998 G A DSP Pathogenic Arrhythmogenic_right_ventricular_

cardiomyopathy\x2c_type_8|



cardiomyopathy|not_specified

6 7582817 rs193922669 G A DSP Likely

Pathogenic


cardiomyopathy


7 150958319 rs587777907 T A KCNH2 Pathogenic Long_QT_syndrome_2


8 11757012 rs368489876 G A GATA4 Pathogenic Ventricular_septal_defect_1

8 11758419 rs56208331 G A GATA4 Pathogenic Atrial_septal_defect_2|Tetralogy_of_Fallot

8 19956018 rs268 A G LPL Pathogenic Hyperlipidemia_familial_combined

10 86681680 rs45487699 C T LDB3 Pathogenic Dilated_cardiomyopathy_1C|Familial_hypertrophic_cardiomyopathy_24

10 90918950 rs145387010 G A ANKRD1 Likely

Pathogenic


cardiomyopathy|Cardiomyopathy

10 110810449 rs794729148 C T RBM20 Likely Pathogenic Cardiomyopathy

10 119672399 rs397514506 C T BAG3 Pathogenic Dilated_cardiomyopathy_1HH


cardiomyopathy



Cardiac_arrhythmia


11 2777990 rs17221854 C T KCNQ1 Pathogenic Long_QT_syndrome_1|Acquired_susceptibility_to_long_QT_syndrome_1|Long_QT_syndrome_LQT1_subtype

11 47332579 rs730880596 C T MYBPC3 Likely Pathogenic Cardiomyopathy|Hypertrophic_cardiomyopathy

11 47333610 rs371061770 G A MYBPC3 Likely Pathogenic

11 47337722 rs730880565 G A MYBPC3 Likely Pathogenic Cardiomyopathy

11 47343021 rs786204338 C A MYBPC3 Likely Pathogenic Familial_hypertrophic_cardiomyopathy_4

11 47343281 rs587776699 C T MYBPC3 Pathogenic Familial_hypertrophic_cardiomyopathy_4|Cardiomyopathy

11 47350024 rs730880619 C G MYBPC3 Likely Pathogenic Cardiomyopathy

11 70206161 rs387906839 T G FADD Pathogenic infections_recurrent_encephalopathy_hepatic_dysfunction_and_cardiovascular_malformations

12 32850907 rs372827156 G A PKP2 Likely Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy

14 23427773 rs377491278 C T MYH7 Likely Pathogenic Dilated_cardiomyopathy_1S

15 48427699 rs140537304 C T FBN1 Likely Pathogenic Marfan_syndrome|Thoracic_aortic_aneurysms_and_aortic_dissections

15 99690352 rs121918530 A G MEF2A Pathogenic Coronary_artery_disease/myocardial_infarction

18 3149142 rs77613865 T G MYOM1 Likely Pathogenic

18 22171695 rs387906816 G A GATA6 Pathogenic Atrial_septal_defect_9|Tetralogy_of_Fallot

18 31519887 rs121913013 G A DSG2 Pathogenic Arrhythmogenic_right_ventricular_cardiomyopathy_type_10

18 31524754 rs752432726 A G DSG2 Pathogenic Cardiomyopathy

19 11105436 rs121908026 C T LDLR Pathogenic Familial_hypercholesterolemia

19 11120436 rs28942084 C T LDLR Pathogenic Familial_hypercholesterolemia

19 48965830 rs2230267 T C FTL Likely Pathogenic sporadic_abdominal_aortic_aneurysm

19 49182608 rs201907325 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B

19 49188641 rs172149856 G A TRPM4 Pathogenic Progressive_familial_heart_block_type_1B

19 55157585 rs397516359 G A TNNI3 Likely Pathogenic Dilated_cardiomyopathy_1FF

20 31820333 rs121908107 C T MYLK2 Pathogenic Cardiomyopathy_hypertrophic_midventricular_digenic

20 44160215 rs554853074 G C JPH2 Likely Pathogenic Cardiomyopathy

21 34370507 rs199473648 C T KCNE2 Pathogenic Congenital_long_QT_syndrome|Cardiac_arrhythmia|Long_QT_syndrome

21 34370648 rs74315448 T C KCNE2 Pathogenic Long_QT_syndrome_6

21 34449540 rs17857111 C T KCNE1 Likely Pathogenic Congenital_long_QT_syndrome

3.1.4 Comparative Analysis of Allele Frequencies of Predicted Deleterious Variants

The assessment of the distribution of allele frequencies of deleterious nonsynonymous SNVs across populations is a key factor in understanding the genetic makeup and estimating the underlying burden of various human diseases (1000 Genomes Project, 2012). The frequency of deleterious variants in different populations is an important indicator of disease prevalence (Dopazo et al., 2016). Derived allele frequency spectra of deleterious SNVs filtered from the three datasets under study revealed that the majority of the variants were of rare allele frequency (AF < 0.5%) (Figure 3.8). It was also observed that a large proportion of deleterious sites were singletons. The proportion of deleterious singletons was 61.67% in 1000 Genomes PJL, 40.14% in ExAC SAS, and 50.86% in British Pakistanis. Rare variants have been hypothesized to play a role in susceptibility to CVDs, as they may contribute larger effects to the pathophysiology of disease (Wain, 2014).
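The frequency classes and the singleton proportion described above can be illustrated with a minimal Python sketch. The thresholds are those quoted in the text (rare: AF < 0.5%; low: AF 0.5-5.0%; common: AF > 5.0%); the function names, the allele counts, and the chromosome number are hypothetical and are not taken from the datasets analysed here.

```python
from collections import Counter


def af_class(af):
    """Classify an allele frequency with the thresholds quoted in the text."""
    if af < 0.005:       # rare: AF < 0.5%
        return "rare"
    if af <= 0.05:       # low: AF 0.5-5.0%
        return "low"
    return "common"      # common: AF > 5.0%


def singleton_fraction(allele_counts):
    """Fraction of variant sites whose derived allele was observed exactly once."""
    sites = [ac for ac in allele_counts if ac > 0]
    if not sites:
        return 0.0
    return sum(1 for ac in sites if ac == 1) / len(sites)


# Hypothetical derived-allele counts and chromosome number (not thesis data).
toy_counts = [1, 1, 1, 2, 8, 40, 1, 3, 120]
toy_an = 1000

spectrum = Counter(af_class(ac / toy_an) for ac in toy_counts)
print(dict(spectrum))
print(round(singleton_fraction(toy_counts), 3))
```

Applied to real data, the counts would come from the AC and AN fields of each dataset, and the class boundaries would need to match whichever convention the analysis adopted at the 0.5% and 5.0% cut-offs.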

Figure 3.8: Allele frequency spectrum (AFS) of deleterious SNVs in three datasets: (A) 1000 Genomes PJL, (B) ExAC South Asians, and (C) British Pakistanis. The sharp spikes in AFS of all three datasets represent large number of singletons.


In addition to rare variants, a small number of low (AF 0.5-5.0%) and common (AF > 5.0%) allele frequency deleterious variants were also observed. These low and common deleterious variants have modest-to-weak effects on fitness, and they spread and rise to high allele frequencies in populations along with neutral variants during rapid population expansion (Peischl and Excoffier, 2015). In British Pakistanis, comparatively more high allele frequency deleterious variants were observed in the allele frequency distribution. To explore this, the allele frequency spectrum of the higher allele frequency deleterious variants (AF > 10%) was prepared in bins of 10. This showed that the British Pakistanis contained a larger number of high frequency deleterious variants in the gene set of CVDs than the 1000 Genomes PJL and ExAC SAS datasets (Figure 3.9). This was attributed to the founder effect, as this population has been extensively inbred over many generations and carries longer runs of identity by descent than contemporary outbred populations (Narasimhan et al., 2016), which increases the number of deleterious variants along with neutral variants in populations during rapid expansion (Henn, Botigué, Bustamante, Clark, and Gravel, 2015).


Figure 3.9: Allele frequency spectrum using the common deleterious SNVs of DAF≥10% of three datasets.


Previously, it has been found that the allele frequencies of deleterious genetic variants associated with certain human diseases may vary among populations according to their historical modes of expansion, the role of evolutionary forces, and bottlenecks. Highly deleterious variants are purged from the population by purifying selection and are therefore rare (Henn, Botigué, Bustamante, Clark, and Gravel, 2015; Lettre, 2014; Tennessen et al., 2012). The derived allele frequencies of predicted deleterious SNVs of cardiovascular diseases were compared with those of other major population groups within their respective datasets. This comparison revealed two important findings: (a) the extent of private and shared deleterious SNVs between the Pakistanis and other populations, and (b) the number of deleterious SNVs with higher derived allele frequency in the Pakistani population (or South Asian in the case of ExAC data) than in other populations. From this analysis, it was noted that the extent of sharing of deleterious SNVs differed among populations, and that the SNVs shared with different population groups were not the same (Table 3.7).

Table 3.7: The proportion of shared deleterious SNVs (sdSNVs) with other populations of 1000 Genomes Project and ExAC.

1000 Genomes_PJL: Total dSNVs 561; Private dSNVs 185 (33%)

Deleterious SNVs shared with different populations   Proportion (shared with pop/total shared dSNVs)   SNVs with higher DAF in PJL   SNVs with lower DAF in PJL   Proportion of SNVs with higher DAF

shared with SAS 376   1.000   282   94   0.750

shared with EUR 199   0.529   108   91   0.543

shared with AMR 171   0.455   99   72   0.579

shared with AFR 157   0.418   119   38   0.758

shared with EAS 127   0.338   84   43   0.661

ExAC_SAS: Total dSNVs 7374; Private dSNVs 4170 (56%)

Deleterious SNVs shared with different populations   Proportion (shared with pop/total shared dSNVs)   SNVs with higher DAF in SAS   SNVs with lower DAF in SAS   Proportion of SNVs with higher DAF

shared with NFE 2480   0.774   1883   597   0.759

shared with AMR 1211   0.378   473   738   0.391

shared with AFR 1202   0.375   445   757   0.370

shared with EAS 893   0.279   268   625   0.300

shared with FIN 478   0.149   123   355   0.257
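As a worked illustration of how the two proportion columns of Table 3.7 follow from the counts, the short Python sketch below recomputes them for two of the comparisons. The function name and dictionary keys are hypothetical; the counts are those listed in the table, and the total number of shared dSNVs is assumed to be the total minus the private dSNVs.

```python
def sharing_summary(total_dsnvs, private_dsnvs, shared_with_pop, higher_daf):
    """Recompute the two proportion columns of Table 3.7 for one comparison."""
    # Assumption: total shared dSNVs = total dSNVs - private dSNVs.
    total_shared = total_dsnvs - private_dsnvs
    return {
        "shared_proportion": shared_with_pop / total_shared,
        "higher_daf_proportion": higher_daf / shared_with_pop,
    }


# Counts taken from Table 3.7.
pjl_vs_eur = sharing_summary(561, 185, shared_with_pop=199, higher_daf=108)
sas_vs_nfe = sharing_summary(7374, 4170, shared_with_pop=2480, higher_daf=1883)

for name, row in [("PJL vs EUR", pjl_vs_eur), ("SAS vs NFE", sas_vs_nfe)]:
    print(name, {key: round(value, 3) for key, value in row.items()})
```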

From the 1000 Genomes Project, overall 33.16% of the predicted deleterious SNVs were private to the Pakistani population, with derived allele frequencies ranging from 0.0052 to 0.0260, while 66.84% were shared, with derived allele frequencies ranging from 0.0052 to 0.7968. It was therefore evident that, among the predicted deleterious SNVs, the private proportion contained only rare variants (DAF < 0.05), while the shared proportion contained both rare (47.50%) and common (52.50%) variants. Among the SNVs shared with other populations, those having higher allele frequencies in the Pakistani population formed the greater proportion in all the comparisons conducted within the 1000 Genomes populations. This comparison also revealed that most of the deleterious variants differed comparatively little in allele frequency between 1000 Genomes PJL and the rest of the South Asian populations; however, in some cases a difference of up to 5.2 times higher was observed. Likewise, the maximum difference in derived allele frequency of shared variants was 22.32 times higher in PJL than in Americans and 41.67 times higher in PJL than in Europeans, whereas the greatest frequency difference was observed with Africans and East Asians, where the maximum derived allele frequency difference was calculated to be 72.19 times higher in PJL (Figure 3.10).

From the ExAC SAS dataset, shared deleterious SNVs with a higher DAF in SAS outnumbered those with a lower DAF in the comparison with only one population, i.e., NFE (Non-Finnish Europeans), whereas they were in the minority in the comparisons with the AFR (Africans), AMR (Americans), FIN (Finnish), and EAS (East Asians) populations. Apart from this, many SNVs had a higher derived allele frequency in SAS than in other populations of the ExAC dataset. The largest difference was observed with NFE, where the DAF was 1098 times higher in SAS. For the other populations, the maximum difference was 858 times relative to EAS, 290 times relative to AMR, 347 times relative to AFR, and 64 times relative to FIN.
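The "times higher" figures quoted above are ratios of derived allele frequencies for variants shared between two populations. A minimal Python sketch of such a calculation is given below; the function name, the variant identifiers, and the frequencies are hypothetical and serve only to show the arithmetic, not the procedure actually used in this analysis.

```python
def max_fold_difference(daf_focal, daf_other):
    """Return (variant, ratio) for the largest focal/other DAF ratio among shared variants."""
    ratios = {
        variant: daf_focal[variant] / daf_other[variant]
        for variant in daf_focal.keys() & daf_other.keys()
        if daf_other[variant] > 0  # private variants have no defined ratio
    }
    if not ratios:
        return None
    top = max(ratios, key=ratios.get)
    return top, ratios[top]


# Hypothetical derived allele frequencies (not thesis data).
focal = {"var1": 0.0260, "var2": 0.0052, "var3": 0.10, "var4": 0.0104}
other = {"var1": 0.0010, "var2": 0.0052, "var3": 0.05}

top_variant, fold = max_fold_difference(focal, other)
print(top_variant, round(fold, 2))
```

Variants absent from the comparison population (such as the hypothetical var4) are skipped, since they are private rather than shared.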


Figure 3.10: Distribution of allele frequencies of shared deleterious SNVs in PJL versus all continental groups of 1000 Genomes Project. A. The SNVs at diagonal line have equal DAF in the comparing populations, whereas, those to the right have higher DAF in PJL, and those to the left have higher DAF in comparing population. B. Violin plots showing the median DAF in comparing populations.


3.1.5 Functional Annotation of Deleterious Variants

The predicted deleterious variants of the three datasets were stratified according to their functional consequences on transcripts to highlight loss-of-function (LoF) variants using the online Variant Effect Predictor tool (McLaren et al., 2016). The LoF variants include 'stop-gained', 'stop-lost', 'start-lost', 'frameshift change', and 'splice donor or acceptor' variants, which have the most damaging effects on protein structure and/or function (MacArthur et al., 2012). In this analysis, 03 LoF SNVs were found in 1000 Genomes PJL individuals, i.e., rs2228570 (start lost), rs371316552 (stop gained), and rs117054298 (splice acceptor variant). The derived allele frequency of the homozygous 'rs2228570' was found to be quite high in all 1000 Genomes continental populations, ranging from 51.73% in Americans to 81.09% in Africans, while in PJL individuals it was 79.68%. This variant lies within the vitamin D receptor gene (VDR), 7 of whose 10 transcripts were found to be affected by the LoF mutation, and is associated with many disease conditions including hypertension (Santoro et al., 2015; Swapna, Vamsi, Usha, and Padma, 2011). The heterozygous SNP 'rs371316552' belongs to the cathepsin B (CTSB) gene, whose increased expression has been reported to pose a risk for atherosclerosis and myocardial infarction in rat models (Jormsjö et al., 2002). The third LoF SNP, the homozygous 'rs117054298', belongs to the insulin-like growth factor (IGF) binding protein-1 (IGFBP1) gene, in which the splice site of one transcript, ENST00000457280, is disrupted, and which has been linked to atherosclerosis (Rajwani et al., 2012).
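The stratification step described above amounts to checking each annotated consequence against a fixed list of LoF categories. The Python sketch below shows one way this could be done, assuming the consequences are given as Sequence Ontology terms joined by '&', as in the consequence field of VEP's VCF output. The set and function names are hypothetical, and this is not the exact filter used in this study.

```python
# Sequence Ontology terms corresponding to the LoF categories named in the text.
LOF_TERMS = {
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def lof_terms(consequence_field):
    """Return the LoF terms present in an '&'-separated consequence string."""
    return sorted(set(consequence_field.split("&")) & LOF_TERMS)


def is_lof(consequence_field):
    """True if at least one consequence term falls in the LoF categories."""
    return bool(lof_terms(consequence_field))


print(lof_terms("start_lost"))
print(lof_terms("splice_acceptor_variant&intron_variant"))
print(is_lof("missense_variant&splice_region_variant"))
```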

From the ExAC SAS dataset, 30 deleterious SNVs, including 2 in the homozygous state, were found to have an LoF effect on transcripts (Table 3.8). These included 06 SNVs with stop-gained effect, 02 SNVs with stop-lost effect, 14 SNVs with start-lost effect, 05 SNVs as splice-donor, and 03 SNVs as splice-acceptor variants. All of these SNVs were of rare allele frequency except one splice-acceptor SNV, rs117054298, which had an allele frequency of 1.5% in SAS. This variant belongs to the Insulin Like Growth Factor Binding Protein 1 gene (IGFBP1). High and low levels of Insulin Like Growth Factor Binding Protein 1 have been reported to be associated with hypertrophic cardiomyopathy and congestive heart failure, respectively (Saeki, Hamada, and Hiwada, 2002).


Table 3.8: Deleterious LoF SNVs filtered from ExAC SAS dataset in genes of Mendelian and congenital CVDs. The two underlined SNVs (in chromosome 2 & 20) were in homozygous state.

CHR POS ID REF ALT AC Gene Affected Transcripts Effect

1 11907740 rs770346667 A G AC=1 CLCN6 1 start-lost

1 147231345 rs782228278 A G AC=2 GJA5 3 start-lost

1 156084858 rs60695352 G A AC=3 LMNA 1 splice-donor

1 159684275 rs749117623 T C AC=1 CRP 5 start-lost

1 227069610 rs762674312 T C AC=1 PSEN2 6 start-lost

2 63824637 rs750401274 C G AC=1 MDH1 1 stop-gained

2 73679956 rs28730854 C A AC=3 ALMS1 4 stop-gained

2 179447910 rs780643085 T A AC=1 TTN 1 splice-donor

3 32180202 rs763553263 G A AC=1 GPD1L 1 stop-gained

5 216962 rs755580860 C T AC=1 SDHA 2 start-lost

6 29795623 rs143732275 T C AC=25 HLA-G 5 start-lost

6 43752283 rs748984440 A C AC=1 VEGFA 2 stop-lost

6 76527266 rs773390519 T C AC=1 MYO6 4 start-lost

7 45932563 rs117054298 A T AC=247 IGFBP1 1 splice-acceptor

7 100771765 rs141347752 G A AC=71 SERPINE1 1 splice-donor

7 100771765 rs141347752 G T AC=1 SERPINE1 1 splice-donor

7 139719838 rs752538699 T C AC=2 PARP12 2 stop-lost

9 108337316 rs750082228 G C AC=1 FKTN 6 start-lost

11 111782447 rs577253222 A G AC=2 CRYAB 10 start-lost

11 111789698 rs782334737 T C AC=10 C11orf52 2 start-lost

14 64692155 rs371152824 G T AC=1 SYNE2 9 stop-gained

15 89864101 rs755510237 C A AC=1 POLG 1 splice-acceptor

16 3070400 rs763424248 T C AC=1 HCFC1R1 5 start-lost

17 7128033 rs754123613 G A AC=1 DVL2 2 splice-acceptor

17 19866257 rs745466247 G A AC=1 AKAP10 1 stop-gained

17 39881255 rs782157753 C T AC=1 HAP1 1 splice-donor

17 42148336 rs767856364 G A AC=1 G6PC3 5 start-lost

19 55652651 rs774939150 A G AC=1 TNNT1 2 start-lost

20 44637567 rs121434556 T A AC=121 MMP9 1 start-lost

22 30659995 rs140815202 C T AC=1 OSM 2 stop-gained


From the British Pakistanis dataset, 29 deleterious SNVs were identified as having an LoF effect on transcripts. These included 09 SNVs with stop-gained effect, 02 SNVs with stop-lost effect, 18 SNVs with start-lost effect, 01 SNV as splice-donor, and 02 SNVs as splice-acceptor variants. All the LoF SNVs were of rare allele frequency except one start-lost SNV, 'rs2228570', whose allele frequency was found to be 75.6% in British Pakistanis. This homozygous SNV of the vitamin D receptor gene (VDR) was also filtered in the 1000 Genomes PJL dataset and is associated with many disease conditions including hypertension. Further, 10 novel LoF SNVs, not previously reported in the dbSNP database, were also filtered (Table 3.9).

Table 3.9: Novel deleterious SNVs filtered from British Pakistanis dataset in genes of CVDs.

CHR POS ID REF ALT AC Gene Affected Transcripts Effect

6 31543520 . T A AC=1 LTA 1 start-lost

6 32150436 . G A AC=3 AGPAT1 1 stop-gained

6 118887404 . A C AC=1 CEP85L 1 start-lost

6 160109220 . C T AC=1 SOD2 1 stop-gained

8 11606427 . G C AC=1 GATA4 1 splice-acceptor

14 95053702 . G A AC=2 SERPINA5 11 start-lost

17 4544946 . T C AC=2 ALOX15 3 start-lost

17 78188475 . C T AC=1 SGSH 2 stop-gained

18 77211708 . G A AC=1 NFATC1 1 stop-gained

19 39410426 . G C AC=1 SARS2 1 stop-gained


3.1.6 Differentiation of Deleterious Variants in Pakistani Population

Data from whole genome/exome sequencing projects can be used to determine the extent of differentiation among populations based on differences in the allele frequencies of variants. The presence of variants with highly differentiated frequencies among populations provides a direction for fine-mapping signals of local adaptation as well as susceptibility to diseases (1000 Genomes Project, 2010). In this study, the differentiation of genetic variations in genes of CVDs was evaluated in the Pakistani population using the phased data of the 1000 Genomes Project. Genetic differentiation was determined with the F-statistics of population genetics by calculating the Weir and Cockerham FST in two ways: (1) FST for PJL versus the rest of the South Asian (SAS) populations in the 1000 Genomes Project, using all SNVs of the genes harboring the prioritized deleterious SNVs for cardiovascular diseases, to determine differentiation from neighboring local populations, and (2) FST for PJL versus 25 global populations in the 1000 Genomes Project, using the same set of genes.

The FST calculated with all SNVs for PJL versus the rest of the SAS populations showed a large number of poorly differentiated SNVs (FST < 0.05) and many moderately differentiated SNVs (FST 0.05-0.15) (Figure 3.11). The mean FST of all SNVs was 0.00134, meaning that the genes harboring deleterious variations for CVDs are not well differentiated from neighboring South Asian populations. The mean FST for deleterious SNVs was 0.00638, which also shows poor differentiation but is still 4.76 times higher than the mean FST of all SNVs. Two deleterious SNVs (rs560826688 and rs563254260) were found to be moderately differentiated (FST 0.05-0.15) from the rest of the South Asian populations. rs560826688 has a derived allele frequency of 3.1% and belongs to the LDL Receptor Related Protein 5 gene (LRP5), which is reported to be involved in hypertension (Suwazono et al., 2006). rs563254260 has a derived allele frequency of 2.6% and lies in the Serpin Family F Member 1 gene (SERPINF1), which has been associated with obesity and hypertension (Chen et al., 2012). In addition, one greatly differentiated SNV (FST 0.15-0.25), rs539962979, with an FST of 0.16597, was observed in DMPK (Dystrophia Myotonica Protein Kinase), which is reported to be involved in cardiomyopathy. Likewise, the F-statistics for PJL versus 25 global populations of the 1000 Genomes Project showed comparatively higher differentiation than for the SAS populations, with a mean FST of 0.0031 for all SNVs and 0.0392 for deleterious SNVs. The major proportion of differentiation in the predicted deleterious SNVs was accounted for by moderately differentiated SNVs (38.32%, 215 out of 561) (Figure 3.12). Besides this, 08 greatly differentiated deleterious SNVs (FST 0.15-0.25) and 02 severely differentiated deleterious SNVs (FST > 0.25) were also found (Table 3.10).

The current understanding of population genetics suggests that the genetic burden of common diseases may differ among populations under the influence of their demographic histories (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). It was hypothesized that deleterious variants filtered in the Pakistani population for their association with cardiovascular diseases may have differentiated from other populations, but the results were consistent with earlier findings that genetic variants related to common human diseases are not well differentiated (Lohmueller, Mauney, Reich, and Braverman, 2006). The eight greatly and two severely differentiated deleterious SNVs, relative to the world populations, may have evolved under random genetic drift.


Figure 3.11: Manhattan plot for FST values between the PJL versus SAS populations of 1000 Genomes Project. The plot is for selected genes which harbored the deleterious SNVs for cardiovascular diseases, as filtered in this analysis. Each dot in the plot represents one SNV. The two moderately differentiated SNVs are highlighted as red.


Figure 3.12: Comparison of the proportions of moderately, greatly, and severely differentiated deleterious SNVs and all SNVs in genes harboring deleterious SNVs. The proportion of moderately differentiated SNVs (FST 0.05 - 0.15) is higher for deleterious SNVs when PJL is compared with all populations of the 1000 Genomes Project.


Table 3.10: Deleterious SNVs greatly and severely differentiated in PJL versus 25 global populations of 1000 Genomes Project. It is noteworthy that two severely differentiated SNVs (rs560826688 and rs563254260) are both related to hypertension.

CHR POS ID REF ALT Gene Global DAF PJL DAF FST Disease

6 44274073 rs151044424 C A AARS2 0.0005990 0.015625 0.171381 hypertrophic cardiomyopathy

7 92731317 rs577145375 A G SAMD9 0.0005990 0.015625 0.171381 atherosclerosis

11 35240875 rs376536014 G T CD44 0.0037939 0.041667 0.172107 aneurysm

11 68192737 rs560826688 G T LRP5 0.0011980 0.03125 0.297321 hypertension

14 64678793 rs532495528 G A SYNE2 0.0005990 0.015625 0.170082 dilated cardiomyopathy, Long QT syndrome

14 74974786 rs549001156 C A LTBP2 0.0005990 0.015625 0.171381 Ventricular septal defect

15 67008780 rs532621952 C G SMAD6 0.0005990 0.015625 0.171381 Aortic valve disease 2

17 1680660 rs563254260 C G SERPINF1 0.0009984 0.026042 0.259398 hypertension

19 39410407 rs555119979 G A SARS2 0.0007987 0.020833 0.217625 hypertension

21 38877614 rs575017348 T C DYRK1A 0.0007987 0.020833 0.217625 heart failure
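The differentiation classes used in this section can be applied to the FST values of Table 3.10 with a short Python sketch. The thresholds are those quoted in the text (poor: FST < 0.05; moderate: 0.05-0.15; great: 0.15-0.25; severe: > 0.25); how the exact boundary values are assigned is an assumption, and the function and variable names are hypothetical.

```python
from collections import Counter


def fst_category(fst):
    """Map an FST value to the differentiation classes used in the text."""
    if fst < 0.05:
        return "poor"
    if fst < 0.15:        # assumption: 0.15 itself is counted as 'great'
        return "moderate"
    if fst <= 0.25:       # assumption: 0.25 itself is counted as 'great'
        return "great"
    return "severe"


# FST values for PJL versus global populations, copied from Table 3.10.
table_3_10 = {
    "rs151044424": 0.171381,
    "rs577145375": 0.171381,
    "rs376536014": 0.172107,
    "rs560826688": 0.297321,
    "rs532495528": 0.170082,
    "rs549001156": 0.171381,
    "rs532621952": 0.171381,
    "rs563254260": 0.259398,
    "rs555119979": 0.217625,
    "rs575017348": 0.217625,
}

counts = Counter(fst_category(value) for value in table_3_10.values())
print(dict(counts))
```

With the values listed in the table, the tally gives eight greatly and two severely differentiated SNVs, matching the counts reported in the text.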


The FST values calculated from the differences in allele frequency of predicted deleterious SNVs between PJL and global populations provided a direction for the stratification of other populations based on the detrimental mutational load for cardiovascular diseases. For this, principal component analysis (PCA) was carried out using 1000 Genomes Project data of PJL and 15 other populations, 3 from each continental group. The PCA was performed separately for common allele frequency SNVs (AF > 5.0%) and for rare and low allele frequency SNVs (AF ≤ 5.0%). The PCA with all low and rare allele frequency SNVs of the gene set of CVDs showed all the populations grouped together except for the Africans (Figure 3.13, A). The PCA with all common allele frequency SNVs suggested three distinct groups of world populations, in which PJL, together with other SAS populations, grouped with Europeans and Americans (Figure 3.13, B). Likewise, the PCA using deleterious low and rare allele frequency SNVs showed all populations grouped together, but PJL appeared to be diverging (Figure 3.13, C). This was due to PJL-specific deleterious variants. The PCA with deleterious common allele frequency SNVs suggested three distinct groups of populations that were more closely related to each other, with decreased mutual variance (Figure 3.13, D). This stratification based on genes involved in CVDs showed no remarkable differentiation among populations; rather, it follows a pattern of grouping similar to that suggested for populations following the route of expansion after the out-of-Africa event (Jobling, Hurles, and Tyler-Smith, 2013; McEvoy, Powell, Goddard, and Visscher, 2011).


Figure 3.13: Principal Components Analysis (PCA) using the genes-set of CVDs. A. PCA using all low and rare allele frequency (AF≤5.0%) SNVs, B. PCA using all common allele frequency (AF>5.0%) SNVs. C. PCA using deleterious low and rare allele frequency (AF≤5.0%) SNVs, D. PCA using deleterious common allele frequency (AF>5.0%) SNVs.

(Here, PJL=Punjabi from Lahore, Pakistan; BEB=Bengali from Bangladesh;

ITU=Indian Telugu from the UK; STU=Sri Lankan Tamil from the UK;

FIN=Finnish in Finland; GBR=British in England and Scotland; CEU=Utah

Residents (CEPH) with Northern and Western European Ancestry;

CLM=Colombians from Medellin, Colombia; PEL=Peruvians from Lima, Peru;

PUR=Puerto Ricans from Puerto Rico; CHB=Han Chinese in Beijing, China;

JPT=Japanese in Tokyo, Japan; KHV=Kinh in Ho Chi Minh City, Vietnam;

LWK=Luhya in Webuye, Kenya; MSL=Mende in Sierra Leone; YRI=Yoruba in

Ibadan, Nigeria).


Using the same set of genes, the mutational load for cardiovascular diseases was also determined for one randomly selected population from each of the five continental population groups of the 1000 Genomes Project, i.e., Gujarati Indian from Houston (GIH) from South Asia, Southern Han Chinese (CHS) from East Asia, Finnish (FIN) in Finland, Puerto Ricans (PUR) from America, and Yoruba in Ibadan (YRI) from Africa, as well as for Malay of East Asia, which is not part of the 1000 Genomes Project. This empirical estimation revealed an excess of deleterious derived rare variants (singletons) in the YRI and Malay populations, while the FIN and PJL populations had the fewest deleterious derived singletons (Figure 3.14 A). The low number of deleterious singletons in the Finnish may be because this population is highly inbred owing to a founder effect. This also hinted at increased inbreeding in PJL individuals. Therefore, the proportion of homozygous deleterious SNVs was determined in all six populations. The highest proportions were observed in the FIN and PJL populations (PJL 12.30%, Finnish 12.79%; Figure 3.14 B). The low number of derived singletons and the high proportion of homozygous deleterious SNVs in the Pakistani population may be due to an increased level of consanguinity.

Figure 3.14: Site frequency spectra for PJL, 5 other populations of 1000 Genomes Project, and one Southeast Asian population 'Malay', using the data of the same number of individuals (n=96) of each population for normalization. A. Comparison of low frequency deleterious SNVs in the gene set of CVDs. B. Percent homozygous deleterious SNVs in each population.


3.2 Whole Genome Sequencing of a Pakistani Individual with Hyperlipidemia and Coronary Artery Disease

3.2.1 Quality Assessment of Genomic DNA

Genomic DNA from peripheral blood samples of a Pakistani individual was isolated using the CTAB method. The method is the same as that described by Winnepenninckx et al. (1993), with some modifications. The quality of the purified DNA was assessed on a 1% agarose gel. The agarose gel showed intact bands of good quality genomic DNA of >10 kb size (Figure 3.15).

Figure 3.15: Agarose gel electrophoresis of genomic DNA isolated from Pakistani obese individual (L = ladder, 1 - 5 = genomic DNA samples).

3.2.2 Fragmentation of Genomic DNA and Size Selection

To prepare the mate-paired library, genomic DNA was fragmented to 1500 bp using the Covaris S220 sonication system. The sonication process yielded good quality fragmented genomic DNA, mainly distributed from approximately 1000 bp to 2000 bp. The most intense part of the fragmented DNA, ranging from 1200 bp to 1800 bp, was extracted from the agarose gel using a sterile sharp blade (Figure 3.16).


Figure 3.16: A. Fragmentation of genomic DNA using the Covaris S220 system. B. Size selection by slicing the most intense part of fragmented DNA.

3.2.3 Mate-Paired Library Preparation

To sequence the whole genome, a mate-paired library was constructed because it enables the detection of single nucleotide variations (SNVs), small insertions and deletions (indels), as well as structural variations (SVs) (Levy et al., 2007; Wheeler et al., 2008). In the SOLiD 5500xl workflow, the mate-paired library consists of a central mate-pair adaptor of 36 bp, one fragment of the target DNA of 60 bp on either side of the central mate-pair adaptor, a P1-T adaptor of 41 bp at the 5'-end of the target DNA, and a P2-T adaptor of 24 bp at the 3'-end of the target DNA. Hence the total length of one fragment of an ideally prepared mate-paired library lies between 250-300 bp (Figure 3.17).

Figure 3.17: A schematic illustration of one fragment of mate-paired library. Tag1 and Tag2 represent the target DNAs to be sequenced.


3.2.4 Evaluation of the Mate-Paired Library

The library of DNA fragments was evaluated on the E-Gel® Electrophoresis System using a 2% ready-to-use agarose gel. The amplified library migrated between the 250 bp and 350 bp bands of the DNA ladder, representing the distribution of fragment sizes in the library (Figure 3.18).

Figure 3.18: A 2% E-Gel showing the position of mate-paired library in lane no. 2.

E-Gel electrophoresis gives a qualitative assessment of the library, while the Bioanalyzer gives a quantitative and much more precise size distribution. From the electropherogram obtained from the Bioanalyzer 2100, the average size of the library was found to be 269 bp, and the concentration estimated from the area under the curve was 3.3 ng/µL (Figure 3.19).



Figure 3.19: Evaluation of the mate-paired library by Bioanalyzer 2100.


3.2.6 Analysis of Whole Genome Sequencing Data

The final library was subjected to SOLiD sequencing, and the data were produced in XSQ file format (a colour-space format), which was converted into '.csfasta' format (the human-readable format for DNA sequences in colour codes) using the XSQ_Converter tool. A total of 2.065 billion short reads were obtained after the conversion.

Filtering low-quality reads improves the alignment percentage and overall coverage, and makes variant calling more reliable. SOLiD_preprocess_filter_v2 is a Perl-based tool that efficiently filters reads below a specified quality score (Sasson, and Michael, 2010). Here, after removing the short reads having 3 or more bases with a quality score below 10, about 1.340 billion short reads with matching mate-pairs were retained. After alignment with the reference human genome and removal of reads aligned two or more times at the same position (duplicate removal), 312,849,478 short reads were found to be properly aligned with the reference.
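The read-level criterion described above can be sketched in a few lines. This is only a minimal Python illustration of the stated rule (a read is discarded if 3 or more bases have a quality score below 10, and a mate-pair is kept only if both tags survive); it is not the code of SOLiD_preprocess_filter_v2, and the function names and the representation of quality values as lists of integers are assumptions made for the example.

```python
def passes_quality_filter(quals, min_qual=10, max_low_bases=2):
    """Return True if at most `max_low_bases` bases fall below `min_qual`."""
    low_bases = sum(1 for q in quals if q < min_qual)
    return low_bases <= max_low_bases


def filter_mate_pairs(pairs, min_qual=10, max_low_bases=2):
    """Keep only mate-pairs in which both tags pass the quality filter."""
    return [
        (tag1, tag2)
        for tag1, tag2 in pairs
        if passes_quality_filter(tag1, min_qual, max_low_bases)
        and passes_quality_filter(tag2, min_qual, max_low_bases)
    ]


# Hypothetical per-base quality values for two mate-pairs
pairs = [
    ([30, 28, 9, 31, 8], [25, 27, 30, 29, 26]),  # two low bases in tag 1: kept
    ([30, 5, 9, 31, 8], [25, 27, 30, 29, 26]),   # three low bases in tag 1: dropped
]
kept = filter_mate_pairs(pairs)
print(len(kept))
```

Requiring both tags to pass mirrors the statement that only reads "with matching mate-pairs" were retained.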

After applying the GATK best-practices workflow for variant calling, 2,568,249 variants were called. After applying a variant calling quality score (QV) threshold of 20 (filtering out variants with QV < 20), 2,167,161 variants remained in the filtered vcf file, including 2,055,524 SNVs and 111,664 short insertions and deletions (indels). The histogram of the depth of variants (DP) from the vcf file showed the median depth of coverage to be 6 (Figure 3.20). The transition/transversion ratio (Ti/Tv) for whole-genome variants was found to be 2.14. Overall, 41,088 (1.90%) of the variants were novel.


Figure 3.20: Distribution of the depth (DP) of variants.


3.2.7 Analysis for Deleterious Mutations Related to Hyperlipidemia and

Related Cardiac Diseases

The filtered variants were subjected to ANNOVAR for gene-based, region-based, and filter-based annotations, which gave the number of variants in each genomic region (Table 3.11). Synonymous variants outnumbered non-synonymous variants, giving a non-syn/syn ratio of 0.90. From the annotation with the CADD, SIFT, and Polyphen2 tools, 425 SNVs in 385 different genes were prioritized as jointly predicted deleterious variants (Figure 3.21).

Table 3.11: The number of variants in different genomic regions as calculated from ANNOVAR annotation.

Annotation No. of Variants

Exonic 14213

Intronic 771143

Intergenic 1190116

Upstream 12997

Downstream 15046

UTR5’ 2794

UTR3’ 17434

Synonymous 7143

Nonsynonymous 6434
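The non-syn/syn ratio quoted in the text follows directly from the last two rows of Table 3.11. A minimal Python check is given below; the function name is chosen only for this illustration.

```python
def nonsyn_syn_ratio(nonsynonymous, synonymous):
    """Ratio of nonsynonymous to synonymous variant counts."""
    if synonymous <= 0:
        raise ValueError("synonymous count must be positive")
    return nonsynonymous / synonymous


# Counts taken from Table 3.11
ratio = nonsyn_syn_ratio(nonsynonymous=6434, synonymous=7143)
print(f"{ratio:.2f}")
```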

Figure 3.21: The predicted deleterious variants with SIFT, Polyphen2, and CADD.


Among the 425 deleterious SNVs, 27 SNVs belonged to 25 genes from the CVD gene list (Table 3.12). These included 17 deleterious SNVs that had been prioritized in the mutational load analysis. Two genes each contained two predicted deleterious SNVs: MTRR (methionine synthase reductase), which plays a role in the DNA repair mechanism, and PLB1 (phospholipase B1), a membrane phospholipase involved in removing the sn-1 and sn-2 fatty acids from glycerophospholipids. MTRR has been reported to be linked with an increased risk of coronary artery disease (Brown, McKinney, Kaufman, Gravel, and Rozen, 2000), while PLB1 has been found to be associated with the levels of low density lipoprotein-cholesterol (LDL-C) and the risk of coronary artery disease (Lettre et al., 2011). Further, three SNVs were found in the homozygous state: 'rs111896385' in the FMN2 (formin 2) gene, a novel SNV in the MTRR gene, and 'rs2108622' in the CYP4F2 (cytochrome P450 family 4 subfamily F member 2) gene. The nonsynonymous rs111896385 variant in FMN2 is annotated in exon 5 or exon 6 (Table 3.12) and causes a proline to leucine substitution in the protein product. This mutation has not been described earlier in cardiovascular genetics, but FMN2 has been shown to be linked with coronary heart disease through a genome wide association study of 14000 cases (Wellcome Trust Case Control Consortium, 2007). The nonsynonymous homozygous SNV 'rs2108622' in CYP4F2 affects exon 11 and causes a valine to methionine substitution in the protein product. This SNV has previously been reported to be associated with coronary heart disease in the Chinese Han population (C. Yu et al., 2014). The novel nonsynonymous SNV in MTRR affects exon 14, leading to a glutamine to leucine substitution in its protein.

Genes harboring predicted deleterious mutations but not included in the gene list of the mutational load analysis were also explored. Three such genes, CDC27, KCNJ12, and HYDIN, each carried six deleterious mutations. These genes have not been reported to be associated with cardiac disorders. KCNJ12 (potassium voltage-gated channel subfamily J member 12) encodes a K+ channel that carries inward-rectifying current in cardiac cells and is involved in cardiac conduction. CDC27 encodes the cell division cycle 27 protein, which is part of the anaphase-promoting complex during cell division. HYDIN encodes the axonemal central pair apparatus protein, which is involved in the motility of cilia. The number of deleterious variants in these genes was compared with that in 5 randomly selected unrelated individuals from the 1000 Genomes Project PJL population. On average, KCNJ12 and CDC27 contained two non-synonymous deleterious mutations in those individuals, while HYDIN contained eight. This suggests that deleterious non-synonymous mutations in KCNJ12 and CDC27 may be linked with hyperlipidemia and/or the risk of coronary heart disease, whereas HYDIN normally carries a high number of deleterious mutations in the Pakistani population. The allele frequencies of all the prioritized SNVs were compared within the 1000 Genomes Project dataset, which highlighted 12 SNVs with a higher alternate allele frequency in SAS populations than in global populations (Figure 3.22). These included seven common SNVs (AF > 5.0%), three low-frequency SNVs (AF = 1.0 to 5.0%), and two rare SNVs (AF < 1.0%).

Table 3.12: 27 predicted deleterious non-synonymous SNVs in hyperlipidemia patient in genes associated with CVDs. The 17 underlined SNV IDs were prioritized in mutational load analysis and re-found in the patient by whole genome sequencing.

CHR POS ID REF ALT Homo/Hetero Gene Effect

chr1 47614434 rs4926600 C T heterozygous CYP4A22 exon12:p.L509F

chr1 115222237 rs34526199 T A heterozygous AMPD1 exon6:p.K316I,exon7:p.K320I

chr1 169701060 rs5361 T G heterozygous SELE exon4:p.S149R

chr1 240370985 rs111896385 C T homozygous FMN2 exon5:p.P958L,exon6:p.P962L

chr2 21231524 rs676210 G A heterozygous APOB exon26:p.P2739L

chr2 28761981 rs6753929 G C heterozygous PLB1 exon11:p.V223L,exon11:p.V212L

chr2 28854972 rs74701215 C G heterozygous PLB1 exon54:p.P1312A,exon55:p.P1323A

chr2 179398509 rs3731752 C A heterozygous TTN exon186:p.G25213V, exon187:p.G25338V,T

chr3 52522023 rs779852675 C T heterozygous NISCH exon16:p.L839F

chr5 7870973 rs1801394 A G heterozygous MTRR exon2:p.I22M,exon2:p.I49M

chr5 7897285 . A T homozygous MTRR exon14:p.Q626L,exon14:p.Q653L

chr5 52356790 rs377150294 C T heterozygous ITGA2 exon12:p.R458W

chr5 148206600 rs201257377 A G heterozygous ADRB2 exon1:p.N69S

chr6 33651929 rs763462799 G A heterozygous ITPR3 exon36:p.G1641R

chr7 3990657 rs34775958 C T heterozygous SDK1 exon6:p.A317V

chr7 92085763 rs34768413 C T heterozygous GATAD1 exon5:p.R233W

chr7 94946084 rs854560 A T heterozygous PON1 exon3:p.L55M


Figure 3.22: Deleterious SNVs having higher allele frequency in SAS populations than in global populations.

To assess the deleterious role of non-coding variants, variants with a CADD_Phred score ≥ 15 were identified in non-coding regions. This yielded 211 upstream, 150 downstream, 136 5'-untranslated region, 399 3'-untranslated region, and 19 splice-site deleterious variants, related to 831 different genes including 44 genes from the mutational load gene list.

Table 3.12 (continued)

CHR POS ID REF ALT Homo/Hetero Gene Effect

chr10 69881812 rs772265631 G A heterozygous MYPN exon2:p.R206Q,exon3:p.R206Q

chr11 27679916 rs6265 C T heterozygous BDNF exon1:p.V66M,exon2:p.V66M

chr11 102713476 . C T heterozygous MMP3 exon2:p.G93R

chr12 6061559 rs7962217 C T heterozygous VWF exon49:p.G2705R

chr15 30008977 rs2291166 T G heterozygous TJP1 exon22:p.D1267A,exon23:p.D1271A,T

chr17 26109102 rs3730017 G A heterozygous NOS2 exon7:p.R221W

chr19 15990431 rs2108622 C T homozygous CYP4F2 exon11:p.V433M

chr22 19766782 rs4819522 C T heterozygous TBX1 exon9:p.T350M

chr22 36661354 rs148296684 C T heterozygous APOL1 exon5:p.L140F,exon6:p.L158F

chrX 153008483 rs78993751 G A heterozygous ABCD1 exon8:p.G608D

[Figure 3.22 plot: y-axis, alternate allele frequency (0 to 0.6); series, Global_AF and SAS_AF]


KCNJ12 carried six non-coding deleterious variants in its 3'-untranslated region, followed by CDC27 with five deleterious mutations in its 5'-untranslated region and BCAT1 with four mutations in its 3'-untranslated region. Again, the occurrence of deleterious mutations in these regions was checked in five randomly selected unrelated individuals from the 1000 Genomes Project PJL population. These individuals did not carry mutations in the 3'-UTR of KCNJ12 or the 5'-UTR of CDC27; however, all five individuals carried four deleterious mutations in the 3'-UTR of BCAT1. This suggests that the untranslated regions of KCNJ12 and CDC27 may also be involved in hyperlipidemia and the associated cardiac risk. A homozygous SNV, rs2516839 (C>T), was found in the 5'-untranslated region of USF1. This SNV has been reported to confer a two-fold higher risk of sudden cardiac death (Kristiansson et al., 2008). USF1 encodes upstream transcription factor 1 of the leucine zipper family, and has also been reported to be associated with hyperlipidemia and atherosclerosis in other studies (Laurila et al., 2010). The frequency of this variant is 0.53 in 1000 Genomes PJL, 0.44 and 0.45 in the two neighboring populations Indian Telugu (ITU) and Gujarati Indian (GIH), respectively, 0.15 in African, 0.46 in American, and 0.63 in European populations. Another homozygous SNV, rs71457130 (C>T), was observed in the 3'-untranslated region of LRP6, which encodes LDL receptor related protein 6 and is involved in receptor-mediated endocytosis of lipoproteins. This variant has not previously been described in relation to the risk of cardiac disease, but other mutations in LRP6 have been reported to be associated with coronary artery disease (Mani et al., 2007; Y. Xu et al., 2014).

Annotation of the variants with the Variant Effect Predictor (VEP) tool predicted the effect of SNVs at the transcript level. SNVs with a severe functional impact on the transcript, i.e., a loss of function (LoF) effect, were filtered through this analysis. A homozygous stop-gained mutation, rs885985 (G>A), was found in the CLDN5 (claudin 5) gene. This gene encodes an integral membrane protein that forms strands of tight junctions. It is highly expressed in fat tissues (Fagerberg et al., 2014), and a reduced level of claudin-5 has been reported to be involved in human heart failure (Swager et al., 2015). Another stop-gained mutation, rs328 (C>G), was found in the heterozygous state in the LPL (lipoprotein lipase) gene, which has been reported to be involved in hyperlipidemia (Shatwan et al., 2016) and the risk of coronary artery disease (Xie, and Li, 2017).

3.2.8 Filtration for Disease Mutations Related to Hyperlipidemia and

Related Cardiac Diseases

In addition to the validated deleterious SNVs and LoF SNVs, the variants were also searched for disease mutations in ClinVar, OMIM, and the GWAS Catalogue. ClinVar and OMIM are manually curated public archives of genetic variations associated with various diseases (Hamosh, Scott, Amberger, Bocchini, and McKusick, 2005; Landrum et al., 2014). The GWAS Catalogue lists a large number of common variants associated with common diseases through genome wide association studies (Welter et al., 2013). From the OMIM filtration, one heterozygous intronic SNV, rs2073658 (C>T), was found in USF1. This gene has itself been reported to be associated with hyperlipidemia and atherosclerosis (Laurila et al., 2010). The variant is also catalogued in ClinVar as a risk factor for familial combined hyperlipidemia. The global frequency of the alternate allele is 17%, but in Pakistan it is 21.9%, which is roughly 1.5 to 3 times its frequency in the neighboring South Asian populations: 6.9% in Sri Lankans, 11.8% in Indian Telugu, 12.6% in Gujarati Indians, and 14.5% in Bengalis.

From the GWAS filtration, 73 common variants, including 40 in the homozygous state, were found to be associated with hyperlipidemia and atherosclerosis, and 147 variants, including 71 in the homozygous state, with coronary heart diseases. Among the variants filtered for coronary heart diseases, the highest number, six SNVs, was found at the 9p21.3 locus, the non-coding region of CDKN2B, which encodes cyclin dependent kinase inhibitor 2B. Variations at this locus alter the expression of this gene in cardiac tissues and have been associated with coronary heart diseases (Pilbrow et al., 2012). The second highest number of risk variants for coronary heart diseases was found in the intergenic region of LPL and SLC18A1 at 8p21.3, which has been reported to be associated with raised lipid levels and the risk of coronary artery disease (Aulchenko et al., 2009). Further, four loci each contained three risk variants for coronary heart diseases: the intergenic region of TRIB1-LINC00861 at 8q24.13, the intronic region of PHACTR1 at 6p24.1, the intergenic region of LOC101929011-BUD13 at 11q23.3, and the intergenic region of LDAH-APOB at 2p24.1. Among the variants filtered for hyperlipidemia and atherosclerosis, the highest number, two risk variants, was found at each of three loci: intronic variants in ABCA1 at 9q31.1, intronic variants in CDKN2B at 9p21.3, and the intergenic region of TRIB1-LINC00861 at 8q24.13. ABCA1 encodes ATP-binding-cassette transporter A1, which is involved in the transport and homeostasis of plasma high density lipoprotein-cholesterol (HDL-C). Malfunction of this transporter protein has been found to increase triglycerides, raising the risk of CAD (Clee et al., 2001; Frikke-Schmidt, 2011). The role of CDKN2B has been described above. The intergenic region of TRIB1-LINC00861 has recently been reported to pose a moderate risk of increased cholesterol and of CAD (Paquette et al., 2017).

Alternate allele frequencies of the filtered variants were compared between the SAS and Global populations. The average allele frequency of variants associated with hyperlipidemia and atherosclerosis was 53.20% in SAS and 52.02% in Global populations. Likewise, the average allele frequency of variants associated with coronary heart diseases was 52.76% in SAS and 51.58% in Global populations. The scatter plot of SAS against Global allele frequencies showed which variants of these diseases were more frequent in SAS and which in Global populations (Figure 3.23). Through this analysis, four SNVs associated with hyperlipidemia and seven SNVs associated with CAD were highlighted as having 1.5-fold or higher allele frequencies in SAS than in Global populations (Table 3.13).


Figure 3.23: Comparison of Global and South Asian allele frequencies for variants of hyperlipidemia (blue) and ischemic heart diseases (red).

[Figure 3.23 plots: two scatter panels, each with x-axis AF_Global (0 to 1) and y-axis AF_SAS (0 to 1)]


Table 3.13: Common variants associated with hyperlipidemia and CAD filtered from GWAS-Catalogue and having 1.5 fold or higher allele frequency in SAS than in Global populations.

CHR POS ID REF ALT State Location Gene/Locus AF_Global AF_SAS

2 21204025 rs6544366 G T Heterozygous intergenic LDAH-APOB 0.3716 0.5798

3 64029383 rs831574 A G Heterozygous intergenic PSMD6-PRICKLE2 0.2792 0.4325

6 46684222 rs1805017 C T Homozygous exonic PLA2G7 0.3175 0.5235

6 97080198 rs12200560 A G Heterozygous intergenic FHL5-GPR63 0.2949 0.4673

7 106409452 rs17398575 G A Homozygous intergenic CCDC71L-PIK3CG 0.1839 0.319

9 107665739 rs2575876 G A Heterozygous intronic ABCA1 0.2632 0.4008

11 92708710 rs10830963 C G Heterozygous intronic MTNR1B 0.2602 0.4264

12 51213433 rs17291650 A G Heterozygous exonic ATF1 0.0365 0.0624

16 57005479 rs1532624 C A Heterozygous intronic CETP 0.3131 0.4796

16 67902070 rs2271293 G A Heterozygous intronic NUTF2 0.0996 0.1667

19 11163601 rs1122608 G T Heterozygous intronic SMARCA4 0.1382 0.2495
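The 1.5-fold criterion behind Table 3.13 can be illustrated with the allele frequencies listed in the table itself. The sketch below is a minimal Python example; the function name and the tuple layout are assumptions for illustration, and the thesis does not state which software was used for this step.

```python
# (rsID, AF_Global, AF_SAS) as listed in Table 3.13
table_3_13 = [
    ("rs6544366", 0.3716, 0.5798),
    ("rs831574", 0.2792, 0.4325),
    ("rs1805017", 0.3175, 0.5235),
    ("rs12200560", 0.2949, 0.4673),
    ("rs17398575", 0.1839, 0.319),
    ("rs2575876", 0.2632, 0.4008),
    ("rs10830963", 0.2602, 0.4264),
    ("rs17291650", 0.0365, 0.0624),
    ("rs1532624", 0.3131, 0.4796),
    ("rs2271293", 0.0996, 0.1667),
    ("rs1122608", 0.1382, 0.2495),
]


def enriched_in_sas(rows, min_fold=1.5):
    """Return (rsID, fold) for variants with AF_SAS / AF_Global >= min_fold."""
    enriched = []
    for rsid, af_global, af_sas in rows:
        if af_global > 0:  # skip variants absent from the global set
            fold = af_sas / af_global
            if fold >= min_fold:
                enriched.append((rsid, round(fold, 2)))
    return enriched


for rsid, fold in enriched_in_sas(table_3_13):
    print(rsid, fold)
```

All eleven variants in the table satisfy the threshold, with fold values between roughly 1.5 and 1.8.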


3.3 Whole Exome Sequencing and Analysis of Pakistani Patients

with Cardiomyopathy

3.3.1 Sequencing Data

The exome sequencing short reads of the patients under study were obtained in 'fastq' file format, a standard format for storing short DNA sequence reads along with their quality scores (Cock, Fields, Goto, Heuer, & Rice, 2010). A summary of the short sequence reads in each fastq file is given in Table 3.14.

Table 3.14: Quality assessment of raw reads in CMP patients' fastq files

Samples  No. of raw reads  Read length (bases)  Gb of sequences  GC (%)  Average quality per read

MS-1_1.fastq 36,726,002 101 3.709 49.396 40

MS-1_2.fastq 36,726,002 101 3.709 49.396 40

MS-2_1.fastq 30,251,621 101 3.055 49.323 40

MS-2_2.fastq 30,251,621 101 3.055 49.323 40

MS-3_1.fastq 30,334,907 101 3.064 49.505 40

MS-3_2.fastq 30,334,907 101 3.064 49.505 40

MS-4_1.fastq 33,594,314 101 3.393 49.392 40

MS-4_2.fastq 33,594,314 101 3.393 49.392 40

MS-5_1.fastq 37,471,150 101 3.785 49.130 40

MS-5_2.fastq 37,471,150 101 3.785 49.130 40
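The "Gb of sequences" column of Table 3.14 is consistent with the number of raw reads multiplied by the read length of 101 bases. A short Python check under that assumption (the dictionary keys and function name are illustrative only):

```python
READ_LENGTH = 101  # bases per read, from Table 3.14

# Raw read counts per fastq file (forward and reverse files have equal counts)
raw_reads = {
    "MS-1": 36_726_002,
    "MS-2": 30_251_621,
    "MS-3": 30_334_907,
    "MS-4": 33_594_314,
    "MS-5": 37_471_150,
}


def gigabases(n_reads, read_length=READ_LENGTH):
    """Total sequence yield of one fastq file in gigabases (1 Gb = 1e9 bases)."""
    return n_reads * read_length / 1e9


for sample, n_reads in raw_reads.items():
    print(sample, f"{gigabases(n_reads):.3f}")
```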

3.3.2 Quality Assessment of Raw Short Reads

Quality assessment of short reads plays a key role in obtaining true positive genetic variants from high-throughput DNA sequencing data. Filtering low-quality reads is essential for determining meaningful genotypes and hence increases the accuracy of associating them with a disease or phenotype (Carson et al. 2014). Quality assessment with the FastQC tool showed that each fastq file had an average Phred quality score of 40 per read, with no read having a quality score below 20 (Figure 3.24). The GC content of each file was found to be 49%.


Figure 3.24: Phred quality score distribution of forward and reverse 'fastq' files.


3.3.3 Alignment with the Reference Genome and Variants Calling

The results of the paired-end alignment of short reads with the reference human genome were obtained as Sequence Alignment Map (SAM) format files, which were converted into binary BAM files. After the removal of duplicates from the BAM files, the insert size of the aligned reads was determined with PICARD (picard-tools-1.109) to evaluate the correct alignment of short reads with the reference genome. The insert size was found to be 100-300 bp for all five samples, with a peak value of ~200 bp (Figure 3.25). The numbers of reads aligned with the reference genome were determined with 'samtools flagstat'. The raw depth of coverage was calculated to be 87.86x - 108.12x across the samples. The proportion of reads mapped to the reference after duplicate removal ranged from 84.33% to 87.18%. The results for each BAM file are summarized in Table 3.15.

Table 3.15: Mapped reads and raw depth of coverage for BAM files

Sample  Mapped reads from 'duplicates removed bam file'  Mapped reads (%)  Total bases in mapped reads (mapped reads * 101)  Size of target (bp)  Raw depth of coverage

MS1_AJK 63,721,330 86.75 6,435,854,330 60,000,000 107.26 x

MS2_SND 52,749,145 87.18 5,327,663,645 60,000,000 88.79 x

MS3_BLC 52,196,433 86.03 5,271,839,733 60,000,000 87.86 x

MS4_PNJ 56,662,519 84.33 5,722,914,419 60,000,000 95.38 x

MS5_URD 64,229,947 85.71 6,487,224,647 60,000,000 108.12 x
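The raw depth of coverage in Table 3.15 follows from the column definitions: total bases in mapped reads (mapped reads × 101) divided by the 60,000,000 bp target. A minimal Python sketch of that arithmetic is shown below; the function and variable names are chosen for this illustration.

```python
READ_LENGTH = 101            # bases per read
TARGET_SIZE_BP = 60_000_000  # size of the exome target, from Table 3.15

# Mapped reads after duplicate removal, from Table 3.15
mapped_reads = {
    "MS1_AJK": 63_721_330,
    "MS2_SND": 52_749_145,
    "MS3_BLC": 52_196_433,
    "MS4_PNJ": 56_662_519,
    "MS5_URD": 64_229_947,
}


def raw_depth(n_mapped, read_length=READ_LENGTH, target_bp=TARGET_SIZE_BP):
    """Raw depth of coverage = total mapped bases / target size."""
    return n_mapped * read_length / target_bp


for sample, n_mapped in mapped_reads.items():
    print(sample, f"{raw_depth(n_mapped):.2f}x")
```

This is a raw estimate: it assumes every mapped base falls on the target, so the on-target depth would be somewhat lower.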


Figure 3.25: Insert size for all the five bam files.


By applying the best practices of the GATK pipeline for joint variant calling from the BAM files, 421194 variants were initially called in the raw '.vcf' file, with the minimum Phred-scaled confidence threshold set to 30. This means that a variant was called only at sites where the Phred-scaled confidence of the call was at least 30. Filtering the variant call set with additional filters improves its accuracy. After applying the filters Read_Depth ≥ 20, Genotype_Quality ≥ 20, and variant Quality ≥ 50, followed by variant quality score recalibration (VQSR) and retention of bi-allelic variants only, 183159 variants were retained in the filtered 'vcf' file. During variant quality score recalibration, the best results were obtained with a truth sensitivity filter of 99.3% for SNPs and 95% for indels, which is comparable to the filters used for the ExAC data. The number of variants at each filtration step is summarized in Table 3.16.

Table 3.16: Numbers of variants after applying different filters

Filtering step  No. of variants  Novel variants  Novel (%)  Ti/Tv

Raw vcf file  421194  12769  3.03  2.21

DP ≥ 20  251478  10030  3.99  2.23

DP ≥ 20, GQ_MEAN ≥ 20  233325  8862  3.80  2.26

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50  229423  7829  3.41  2.27

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR  184461  2775  1.50  2.43

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic variants only  183159  2670  1.46  2.43

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, SNPs only  168438  2221  1.32  2.43

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, Exonic (SNPs)  38853  304  0.78  3.19

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, Indels only  14721  449  3.05  -

DP ≥ 20, GQ_MEAN ≥ 20, QUAL ≥ 50, VQSR, bi-allelic, Exonic (Indels)  405  13  3.21  -
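The "Novel (%)" column of Table 3.16 is the number of novel variants divided by the total number of variants at each step. The following Python sketch reproduces it for the first five filtering steps; the step labels are abbreviated and the function name is illustrative.

```python
# (filtering step, total variants, novel variants) from Table 3.16
steps = [
    ("Raw vcf file", 421194, 12769),
    ("+ DP >= 20", 251478, 10030),
    ("+ GQ_MEAN >= 20", 233325, 8862),
    ("+ QUAL >= 50", 229423, 7829),
    ("+ VQSR", 184461, 2775),
]


def novel_percent(n_variants, n_novel):
    """Percentage of variants that are novel at a given filtering step."""
    return 100.0 * n_novel / n_variants


for label, n_variants, n_novel in steps:
    print(f"{label}: {novel_percent(n_variants, n_novel):.2f}%")
```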

Application of GATK's variant quality score recalibration (VQSR) model considerably decreased the proportion of novel variants (from 3.41% to 1.50%), and the Ti/Tv ratio also improved from 2.27 to 2.43. The VQSR model is built on the annotation values of standard variants from HapMap and the 1000 Genomes Project Phase-1 SNPs and indels. The overall Ti/Tv ratio for all (exonic + non-exonic) variants was 2.43 (Table 3.16), above the expected value of more than 2.0. Likewise, the Ti/Tv ratio for exonic variants was calculated to be 3.19, which is also in the acceptable range of > 2.80 (Bainbridge et al., 2011). The histogram of the depth of coverage (DP) for SNPs showed a bimodal distribution with peak values of 41 and 200, and for indels a unimodal distribution with a peak value of 41 (Figure 3.26).
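For reference, the Ti/Tv ratio used as a quality metric above counts transitions (purine to purine, A<->G, or pyrimidine to pyrimidine, C<->T) against transversions (any purine to pyrimidine change or the reverse). The sketch below is a minimal Python illustration on a hypothetical list of (REF, ALT) pairs; it is not the tool used in this study to compute the reported values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for an iterable of (ref, alt) single-base substitutions."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions: ratio undefined")
    return ti / tv


# Hypothetical SNVs: three transitions and two transversions
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(snvs))
```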

Figure 3.26: Histogram for the depth of coverage for SNPs (A) and indels (B).

The total number of finally filtered bi-allelic variants was 181111, of which 39258 (21.68%) were exonic. The large number of non-exonic variants (78.32%) is due to the capture of flanking regions of the 60-Mb target by the broad-range enrichment kit (SureSelectXT Human All Exon v6 kit, Agilent Technologies, Santa Clara, CA). The number of exonic variants in each patient was determined to evaluate whether each individual contributed uniformly to the merged variants file. On average, there were ~20308 (SD ± 450) exonic variants per individual. This number is comparable with that reported for European-Americans in earlier large-scale whole exome sequencing studies (Bamshad et al., 2011).


Table 3.17: Numbers of variants after applying different filters

Sample  Total variants  Total SNPs  Total indels  Exonic SNPs  Exonic indels  Novel in Total (%)

MS1_AJK 99701 91539 8162 20909 251 644 (0.65)

MS2_SND 94406 86787 7619 19845 241 626 (0.66)

MS3_BLC 95960 88273 7687 20652 262 572 (0.60)

MS4_PNJ 95408 87698 7710 20058 251 573 (0.60)

MS5_URD 94559 86709 7850 20077 265 685 (0.72)

Average 96007 88201 7806 20308 254 620

SD (±) 2160.24 1976.48 216.23 449.94 9.64 48.35
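The "Average" and "SD (±)" rows of Table 3.17 correspond to the arithmetic mean and the sample standard deviation (n − 1 in the denominator) of the five per-patient counts. A short Python check for the exonic SNP column, using the standard library:

```python
from statistics import mean, stdev

# Exonic SNPs per patient, from Table 3.17
exonic_snps = [20909, 19845, 20652, 20058, 20077]

avg = mean(exonic_snps)
sd = stdev(exonic_snps)  # sample standard deviation (n - 1)

print(f"mean = {avg:.1f}, SD = {sd:.2f}")
```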

3.3.4 Annotation of Single Nucleotide Variants (SNVs) and Analysis

SNV annotation was performed with three tools, i.e., ANNOVAR, CADD, and VEP. The number of variants and their corresponding genes/regions were also evaluated, as described in the following sections.

3.3.4.1 Annotation with ANNOVAR, and CADD

The ANNOVAR tool annotates variants to determine their genomic and region-based features, evolutionary conservation scores, prediction scores from the SIFT, Polyphen2, and MutationAssessor tools, and allele frequencies in different databases. A summary of the SNVs as determined by the ANNOVAR annotation is given in Table 3.18. The individual from Azad Jammu Kashmir (AJK) carried a slightly larger number of SNVs in each category.


Table 3.18: The number of SNVs pertaining to different genomic regions and functions after annotation with ANNOVAR.

Annotation AJK BLC PUN SND URD Total

Exonic 20332 19265 20079 19496 19522 38203

Intronic 51295 49009 49076 49841 48562 94863

Intergenic 7707 6975 7310 6903 7244 13669

Upstream 1217 1151 1191 1135 1136 2222

Downstream 509 443 484 483 478 947

UTR5' 1822 1694 1772 1634 1753 3361

UTR3' 2918 2724 2851 2814 2724 5402

Synonymous 10729 10113 10634 10309 10302 19740

Nonsynonymous 9402 8963 9233 8984 9025 18049

Nonsyn/syn ratio 0.88 0.89 0.87 0.87 0.88 0.91

The ANNOVAR annotation with RefSeq data revealed that the SNVs correspond to 23477 genes. On average, each person carried 9121.4 (SD ± 190) nonsynonymous mutations. The nonsynonymous to synonymous (Nonsyn/syn) ratio was also calculated and was found to be almost constant (~0.9) across individuals, which matches the value of 0.90 reported for the South Asian populations of the 1000 Genomes Project (1000 Genomes Project, 2015). The Nonsyn/syn ratio gives a broad indication of genes that have been mutated to encode proteins with altered amino acids. Genes harboring more nonsynonymous than synonymous variants may contribute more to the susceptibility to, or onset of, disease. There were 8416 genes containing nonsynonymous mutations. The Nonsyn/syn ratio in these genes was considerably higher than the overall ratio (1.40 versus 0.91). This is because most rare missense mutations are deleterious in nature, disrupting the proteins' functions and causing disease (Kathiresan, and Srivastava, 2012).

To determine which genes were most heavily mutated to encode changed amino acids, the genes were sorted in descending order of the number of nonsynonymous mutations they contained. This revealed that 4041 genes contained more than one nonsynonymous mutation, while 116 genes contained ≥ 10 nonsynonymous mutations. MUC16 (mucin 16, cell surface associated) was the top candidate, containing 74 missense mutations in total. This gene encodes carbohydrate antigen 125 (CA-125). CA-125 is a known tumor marker for ovarian cancer; its level was also found to be elevated in patients with hypertrophic cardiomyopathy leading to heart failure (Varol et al., 2007). In another longitudinal study, the CA-125 level was significantly increased in patients with major adverse cardiovascular events (MACE), including left ventricular dysfunction (Betti, Ballo, Barchielli, and Zuppiroli, 2010). ZNF717 (zinc finger protein 717) contained the second highest number of missense mutations, i.e., 68. As the name suggests, this gene encodes a zinc finger protein, which is involved in the regulation of many genes involved in cell proliferation, differentiation and apoptosis. A literature survey did not show any previous association of this gene with cardiomyopathies or any other cardiac disorder. The third-ranked gene was MUC3A (mucin 3A, cell surface associated), which encodes membrane-bound and secretory epithelial glycoproteins. This gene has also not been reported to be associated with any cardiac disorder. TTN (titin, a striated muscle structural protein), a well-studied and well-characterized gene associated with cardiomyopathies, ranked fourth in this study with 49 missense mutations. The top 1% of genes carrying the highest number of missense mutations are listed in Table 3.19.

Table 3.19: The top 1% genes containing nonsynonymous mutations.

Gene  Nonsyn mutations  Gene  Nonsyn mutations

MUC16 74 ALPK2 18

ZNF717 68 CFAP46 18

MUC3A 53 DNHD1 18

TTN 49 MALRD1 18

HYDIN 37 MUC22 18

OBSCN 35 PARP4 18

CMYA5 30 SVEP1 18

SSPO 29 XIRP2 18

CCDC168 28 DNAH14 17

FSIP2 26 RP1L1 17


PDE4DIP 26 ADGRV1 16

MKI67 23 DCHS2 16

AHNAK2 22 FAT1 16

OR4C3 22 NEB 16

SPTBN5 22 PKD1L2 16

SYNE2 21 FBN3 15

MUC4 20 LAMA5 15

C1orf167 19 RNF213 15

EYS 19 USH2A 15

KCNJ12 19 VWDE 15

ZAN 19 -- -


Overall, CADD predicted 11246 variants as deleterious (based on CADD_phred score ≥ 15), SIFT predicted 4424 variants as deleterious, and Polyphen2 predicted 3058 variants as deleterious. The predicted deleterious variants overlapped among the three tools (Figure 3.27). The number of SNVs predicted as deleterious by all three tools was 1469, of which 325 were in the homozygous state. There were 41 predicted deleterious novel variants, all in the heterozygous state. Here, the proportion of novel variants was 2.79%, which is higher than the overall proportion of novel variants (Table 3.16). This is in line with previous reports that deleterious mutations that disrupt protein function and cause disease are usually rare in populations (Kathiresan, and Srivastava, 2012). CADD predicted considerably more deleterious variants than SIFT and Polyphen2 because it scores non-coding as well as coding variants, whereas SIFT and Polyphen2 predict the effect of missense coding variants only. Polyphen2 predicted the fewest SNVs as deleterious.
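The joint prediction described above amounts to intersecting the sets of variants flagged by each tool. The following is a minimal Python sketch under stated assumptions: the variant records are hypothetical, the field names are invented for the example, and the labels "D" (deleterious/damaging) and "T" (tolerated) are assumed prediction codes rather than values taken from this study's annotation files.

```python
CADD_PHRED_CUTOFF = 15  # threshold used in the text


def consensus_deleterious(variants, cadd_cutoff=CADD_PHRED_CUTOFF):
    """IDs of variants flagged as deleterious by CADD, SIFT and Polyphen2 together."""
    cadd = {v["id"] for v in variants if v["cadd_phred"] >= cadd_cutoff}
    sift = {v["id"] for v in variants if v["sift"] == "D"}
    polyphen2 = {v["id"] for v in variants if v["polyphen2"] == "D"}
    return cadd & sift & polyphen2


# Hypothetical annotated variants
variants = [
    {"id": "var1", "cadd_phred": 24.1, "sift": "D", "polyphen2": "D"},
    {"id": "var2", "cadd_phred": 16.0, "sift": "T", "polyphen2": "D"},
    {"id": "var3", "cadd_phred": 9.5, "sift": "D", "polyphen2": "D"},
]

print(sorted(consensus_deleterious(variants)))
```

Only var1 passes all three criteria: var2 is tolerated by SIFT and var3 falls below the CADD cutoff.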

Figure 3.27: Venn diagram showing the number of SNVs predicted as deleterious by SIFT, Polyphen2, and with CADD_phred score ≥ 15.

The deleterious variants were also determined in each patient of this study by the three tools separately and combined (Figures 3.28 to 3.31). Overall, the DCM patient from Punjab (PNJ) carried the largest number of deleterious variants, while the patient from Sindh (SND) carried the highest number of homozygous deleterious variants. There were 84 SNVs predicted as deleterious by all three tools and present in all five patients, of which 10 were in the homozygous state (Figure 3.31). These 10 homozygous deleterious SNVs all belonged to different genes (Table 3.20). It is noteworthy that there was no overlap between the genes carrying homozygous deleterious SNVs in all five patients and the top 1% of genes containing the highest number of missense mutations, indicating that these two criteria capture different aspects of the association of deleterious variants with disease.

Figure 3.28: The SNVs predicted as deleterious by SIFT.

Figure 3.29: The SNVs predicted as deleterious by Polyphen2.

Figure 3.30: The SNVs with CADD_phred score ≥ 15.

Figure 3.31: The SNVs predicted as deleterious by all three tools combined (CADD with phred score ≥ 15, SIFT, and Polyphen2).

Table 3.20: The homozygous deleterious SNVs present in all five patients of this study.

CHR    POS        ID          REF  ALT  GENE      Substitution
chr1   248059476  rs12139390  A    C    OR2W3     exon1: A588C
chr3   44948479   rs3749195   C    T    TGM4      exon10: C1114T
chr3   50597092   rs1034405   G    A    C3orf18   exon5: C485T
chr4   17643848   rs2286771   G    A    FAM184B   exon13: C2350T
chr9   6328947    rs3847262   T    C    TPD52L3   exon1: T352C
chr11  5758062    rs7397032   T    C    OR56B1    exon1: T316C
chr17  17896205   rs4584886   C    T    DRC3      exon6: C571T
chr17  39659913   rs9891361   G    A    KRT13     exon2: C560T
chr18  61654463   rs3826616   A    G    SERPINB8  exon6: A530G

To determine whether SNVs of all frequencies contributed equally to the deleterious pool, the site frequency spectra (SFS) of all SNVs and of deleterious SNVs were compared. Here, SNVs with CADD_phred score ≥ 15 were treated as deleterious, because CADD predicted the effect of a larger number of SNVs, coding as well as non-coding. This analysis showed that singletons contributed most to the deleterious pool: their proportion increased from 0.318 (in all SNVs) to 0.462 (in the deleterious pool). The proportion of doubletons increased slightly, from 0.157 to 0.164. The proportions of all other allele count classes (AC=3 to AC=10) were lower in the deleterious pool (Figure 3.32). The SFS is a simple yet powerful way to study population demography in terms of medical genetics and evolutionary history. The greater proportion of individual-specific deleterious singletons observed here is consistent with the view that variants of rare allele frequency have more severe deleterious effects and thereby contribute more to the susceptibility to, and onset of, complex diseases in humans (Bomba, Walter, and Soranzo, 2017; Kryukov, Pennacchio, and Sunyaev, 2007). These rare deleterious variants were also individual specific because such variants are under tight selective pressure and tend to remain individual or population specific (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). The few deleterious fixed or nearly fixed SNVs (AC=8 to AC=10) in these patients may have a moderate effect on fitness, yet may have increased in frequency owing to less efficient purifying selection during the expansion or to some bona fide balance, as demonstrated in larger datasets of different non-African populations (Henn, Botigué, Bustamante, Clark, and Gravel, 2015; Subramanian, 2016).
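The comparison of the two site frequency spectra can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure actually used here: the function names and the toy allele counts are hypothetical, and the only detail taken from the text is that five diploid patients give allele counts from 1 to 10.

```python
from collections import Counter
from typing import Dict, Iterable

N_CHROMOSOMES = 10  # five diploid patients, so allele counts run from 1 to 10


def site_frequency_spectrum(allele_counts: Iterable[int],
                            n_chrom: int = N_CHROMOSOMES) -> Dict[int, float]:
    """Proportion of variant sites observed at each allele count 1..n_chrom."""
    counts = Counter(ac for ac in allele_counts if 1 <= ac <= n_chrom)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no variant sites with a usable allele count")
    return {ac: counts[ac] / total for ac in range(1, n_chrom + 1)}


def sfs_shift(sfs_all: Dict[int, float], sfs_del: Dict[int, float]) -> Dict[int, float]:
    """Change in proportion per allele-count class (deleterious minus all)."""
    return {ac: sfs_del[ac] - sfs_all[ac] for ac in sfs_all}


# Toy allele counts, invented for illustration only.
all_acs = [1, 1, 1, 2, 2, 3, 5, 10]
del_acs = [1, 1, 1, 2]

sfs_all = site_frequency_spectrum(all_acs)
sfs_del = site_frequency_spectrum(del_acs)
shift = sfs_shift(sfs_all, sfs_del)
```

A positive value of `shift[1]` corresponds to an enrichment of singletons in the deleterious pool, which is the pattern reported above.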

Figure 3.32: Site Frequency Spectrum of all SNVs (A), and deleterious SNVs (B). The proportion of deleterious singletons is considerably increased, while the proportion of deleterious variants with higher allele counts is decreased relative to the pool of all SNVs.

[Figure 3.32: panels A and B; y-axis: Proportion.]

Cardiomyopathies are rare genetic disorders with a global prevalence of 1 in 2500 individuals. In light of this, it was hypothesized that variants of rare allele frequency would contribute more to the susceptibility to this disease. Therefore, the combined predicted deleterious SNVs were filtered on the basis of global allele frequency in the 1000 Genomes Project. This filtration highlighted 350 SNVs with global minor allele frequency < 0.01 (i.e. < 1%). Of these 350 SNVs, 19 were present in homozygous state in at least one of the patients under study; 18 of these were nonsynonymous SNPs that change an amino acid in the resulting protein, while 1 was an intergenic SNP (Table 3.21).
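The frequency and zygosity filter described above can be sketched in a few lines. The record layout, the VCF-style genotype strings, and the decision to leave out variants without a 1000 Genomes frequency are all assumptions of this sketch; the 1% cutoff and the patient labels PNJ and SND are taken from the text.

```python
from typing import Dict, List, Optional

RARE_MAF_CUTOFF = 0.01       # global minor allele frequency < 1%, as in the text
HOM_ALT = {"1/1", "1|1"}     # VCF-style homozygous alternate genotypes (assumed encoding)


def is_rare(global_maf: Optional[float]) -> bool:
    """True if the variant has a known global MAF below the cutoff.

    Assumption: variants with no reported frequency are not selected here.
    """
    return global_maf is not None and global_maf < RARE_MAF_CUTOFF


def homozygous_in_any(genotypes: Dict[str, str]) -> bool:
    """True if at least one patient is homozygous for the alternate allele."""
    return any(gt in HOM_ALT for gt in genotypes.values())


def rare_homozygous(variants: List[dict]) -> List[str]:
    """IDs of variants that are globally rare and homozygous in some patient."""
    return [v["id"] for v in variants
            if is_rare(v["global_maf"]) and homozygous_in_any(v["genotypes"])]


# Toy records, invented for illustration only.
toy = [
    {"id": "a", "global_maf": 0.004, "genotypes": {"PNJ": "0/1", "SND": "1/1"}},
    {"id": "b", "global_maf": 0.004, "genotypes": {"PNJ": "0/1", "SND": "0/0"}},
    {"id": "c", "global_maf": 0.120, "genotypes": {"PNJ": "1/1", "SND": "1/1"}},
    {"id": "d", "global_maf": None, "genotypes": {"PNJ": "1/1", "SND": "0/1"}},
]
selected = rare_homozygous(toy)
```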

Table 3.21: The homozygous deleterious SNVs with Global MAF < 1%.

CHR  POS        ID           REF  ALT  Gene           Effect
4    184170008  rs574720107  G    A    WWC2           exon7:p.G292R
6    24520719   rs115784602  G    A    ALDH5A1        exon6:p.V321M
6    33132114   rs146555195  C    T    COL11A2        exon62:p.R1560H, exon63:p.R1581H, exon65:p.R1667H
6    160557271  rs4646278    C    T    SLC22A1        exon5:p.R287W
8    99135575   rs145484648  G    A    POP1           exon2:p.A4T
8    133984968  rs115436575  G    A    TG             exon34:p.G2061R
9    95179071   rs34607425   G    A    OMD            exon2:p.P257L
10   13237117   rs74881009   C    T    MCM10          exon14:p.R608W, exon14:p.R609W
10   24762320   rs571833641  C    G    KIAA1217       exon2:p.S55C, exon6:p.S337C, exon7:p.S257C
10   71144670   rs554507867  C    T    HK1            exon12:p.A613V, exon15:p.A617V, exon16:p.A601V
11   107375422  rs117249984  C    A    ALKBH8         exon12:p.D656Y
12   30906393   rs573013586  C    T    CAPRIN2        exon1:p.S102N
12   129299599  rs144816528  G    A    SLC15A4        exon2:p.P188L
12   133202740  rs5745068    C    T    POLE           exon46:p.R2165H
16   67235477   rs183146864  G    A    ELMO3          exon10:p.V337I
19   18502861   rs34666550   C    T    LRRC25         exon2:p.C285Y
19   33298507   rs550817829  C    T    TDRD12,SLC7A9  intergenic
19   46332369   rs141706016  C    T    SYMPK          exon14:p.R615H
20   2796251    rs576728084  G    A    C20orf141      exon2:p.G110S, exon3:p.G110S

Comparing the allele frequencies of deleterious variants helps in understanding the mutational load of diseases in different populations, because populations differ in their genetic makeup depending on their past evolutionary histories (Henn, Botigué, Bustamante, Clark, and Gravel, 2015). Here, the derived states of the 350 rare SNVs were determined from the online CADD annotation, and their derived allele frequencies in Global and South Asian populations were retrieved and compared in a simple xy scatter plot. This analysis showed that 278 SNVs had a higher derived allele frequency in South Asia than in Global populations (Figure 3.33).

Figure 3.33: Scatter plot of 350 deleterious SNVs comparing derived allele frequencies in South Asia and in Global populations. Here, the correlation coefficient (r) was 0.72 (regression line, blue), representing a slight inclination of alleles towards South Asia. The SNVs to the right of the diagonal (black line) have a higher allele frequency in South Asia than in Global populations.
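The derived allele frequency comparison can be outlined as below. This is a hedged sketch: the function names and toy frequency pairs are hypothetical, and the conversion assumes that the ancestral allele is known and equals either the reference or the alternate allele. It is not a description of how the CADD annotation was processed in this study.

```python
import math
from typing import List, Optional, Sequence, Tuple


def derived_allele_frequency(ref: str, alt: str, ancestral: str,
                             alt_freq: float) -> Optional[float]:
    """Convert an alternate-allele frequency into a derived-allele frequency."""
    if ancestral == ref:
        return alt_freq          # the alternate allele is the derived one
    if ancestral == alt:
        return 1.0 - alt_freq    # the reference allele is the derived one
    return None                  # ancestral state unknown or a third allele


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_higher_in_sas(pairs: List[Tuple[float, float]]) -> int:
    """Number of (global DAF, South Asian DAF) pairs with the higher value in South Asia."""
    return sum(1 for global_daf, sas_daf in pairs if sas_daf > global_daf)


# Toy (global DAF, South Asian DAF) pairs, invented for illustration only.
toy_pairs = [(0.002, 0.010), (0.005, 0.004), (0.008, 0.030), (0.001, 0.001)]
n_higher = count_higher_in_sas(toy_pairs)
r = pearson_r([g for g, _ in toy_pairs], [s for _, s in toy_pairs])
```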

3.3.4.2 Annotation with Variant Effect Predictor (VEP)

The genome of a healthy individual contains about 100 Loss of Function (LoF) variants, of which ~20 are in homozygous state. The LoF variants include stop-gained, stop-lost, start-lost, splice-donor, splice-acceptor, and frameshift insertion/deletion variants. Complete knock-out of genes by LoF variants helps in understanding the function of those genes, and thereby in predicting their relevance to certain diseases (Borger, 2017; Kaiser, 2014; MacArthur et al., 2012).

The variants under study were also annotated using Ensembl's Variant Effect Predictor (VEP) tool, version 87. Overall, 696 LoF SNPs with 'HIGH' impact on the transcript were identified. The largest class of LoF SNPs comprised those that introduce a stop codon and cause premature termination of protein synthesis during translation, termed protein truncating variants (PTVs). The numbers of LoF SNPs for the different functional consequences are illustrated in Figure 3.34.

Figure 3.34: Numbers of Loss of Function SNPs according to functional consequences.

[Figure 3.34: chart of LoF SNP counts by consequence. Categories: splice_donor, splice_acceptor, stop_lost, start_lost, stop_gained. Data labels as extracted: 131, 127, 88, 81, 269.]

These LoF SNPs corresponded to 655 genes, of which 31 contained more than one LoF mutation. Here, MUC19 was the top candidate gene, containing 5 LoF mutations (3 stop-gained, 1 splice-acceptor, and 1 splice-donor). MUC19 belongs to the same family as MUC16, which was the top candidate gene containing the highest number of nonsynonymous SNPs in this study (Table 3.20). MUC19 encodes a gel-forming mucin protein, which has previously been associated with Sjögren syndrome (D. Yu et al., 2008). The genes containing the second highest number of LoF mutations (4) were ZNF717 and PKD1L2. It is noteworthy that ZNF717 was also the candidate gene containing the highest number of nonsynonymous mutations in this analysis (Table 3.20). Further, there were 15 novel SNPs with a loss of function consequence, including one in homozygous state. This novel homozygous stop-gained SNP in FGD3 (chr9: 95778067_C/T) was found in one of the patients (Sindhi). This gene encodes the 'FYVE, RhoGEF and PH domain containing 3' protein, which is involved in neurotrophin receptor p75 (NTR) signaling pathways. Mutations in this gene have not previously been reported to be associated with any cardiac disorder. In total, 86 homozygous LoF SNP sites were found. On average, each person carried ~34.6 homozygous LoF SNPs, which is slightly higher than the number reported for a healthy individual's genome (MacArthur, and Tyler-Smith, 2010).

As hypothesized earlier, rare deleterious variants contribute more to complex, multigenic and Mendelian disorders (Bomba, Walter, and Soranzo, 2017); the allele frequencies of the LoF SNVs were therefore determined. For this, the global and South Asian derived allele frequencies (DAF) were retrieved. The allele frequency spectrum based on the South Asian DAF showed that only 45 SNVs had rare allele frequency (DAF < 1%) and 65 SNVs had low allele frequency (1% ≤ DAF ≤ 5%), while the majority of SNVs, i.e. 471, had common allele frequency (DAF > 5%). Surprisingly, the latter included 44 fixed or nearly fixed SNVs with DAF > 80% (Figure 3.35 A). Owing to the large number of common, nearly fixed and fixed LoF SNVs, these were characterized on the basis of their evolutionary conservation, as suggested earlier (MacArthur et al., 2012). The genomic evolutionary rate profiling scores (GERP++ scores) were retrieved through ANNOVAR annotation. The GERP++ tool uses maximum likelihood estimation of the evolutionary rate at every position and assigns a score to each variant based on the selective constraint (Davydov et al., 2010). Here, the average GERP++ score of all LoF SNVs was quite low, i.e. 0.65 (Figure 3.35 B), implying that the genes harboring LoF SNVs mostly belonged to evolutionarily less conserved regions. In fact, most of the genes with more than one LoF SNV had GERP++ scores that were negative or < 2.00; e.g. the maximum GERP++ score for ZNF717 variants was 1.29. However, there were 84 LoF SNVs with very high GERP++ scores, i.e. > 4.00, corresponding to evolutionarily constrained regions. Variants at such constrained sites are expected to have a greater deleterious effect on fitness. These included variants in MUC19, which overall had the highest number of LoF SNVs, two of them with GERP++ scores of 5.69 and 6.16.
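The allele frequency classes used above can be written down explicitly. The sketch below assumes DAF values expressed as fractions between 0 and 1 and uses hypothetical function names; the class boundaries (1%, 5%) and the 80% threshold for fixed or nearly fixed variants are those given in the text.

```python
from collections import Counter
from typing import Sequence, Tuple


def daf_class(daf: float) -> str:
    """Assign a derived allele frequency to the rare, low, or common class."""
    if daf < 0.01:
        return "rare"      # DAF < 1%
    if daf <= 0.05:
        return "low"       # 1% <= DAF <= 5%
    return "common"        # DAF > 5%


def summarize_dafs(dafs: Sequence[float]) -> Tuple[Counter, int]:
    """Count variants per frequency class and those fixed or nearly fixed (DAF > 80%)."""
    classes = Counter(daf_class(d) for d in dafs)
    near_fixed = sum(1 for d in dafs if d > 0.80)
    return classes, near_fixed


# Toy South Asian DAF values, invented for illustration only.
toy_dafs = [0.002, 0.01, 0.05, 0.051, 0.30, 0.85, 1.0]
toy_classes, toy_near_fixed = summarize_dafs(toy_dafs)
```

The fixed or nearly fixed variants are a subset of the common class, so they are counted separately rather than as a fourth class.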

Figure 3.35: Loss of Function (LoF) SNVs. (A) Allele frequency spectrum of all LoF SNVs in South Asia. (B) Genomic evolutionary rate profiling (GERP++) scores for LoF SNVs.

To filter the LoF SNVs more precisely and highlight those with the most detrimental effect on protein structure and function, and hence on fitness, the ratio of affected transcripts to total transcripts was determined for each gene carrying a LoF SNV. This is because a LoF SNV may affect only a small subset of the transcripts of a gene that undergoes extensive alternative splicing post-transcriptionally, so that the unaffected transcripts mask the effect of the observed LoF variant. The positions of the LoF SNVs were also noted, because a mutation introducing a stop codon near the 3'-end of the mature transcript would not affect the protein as severely as one truncating the protein in the central domains of its structure. This analysis highlighted six genes, IFNE, MAGEE2, OR4P4, OR5AR1, PTCHD3P2, and SIX1, each of which has only one transcript, and that transcript is affected by a LoF SNV (Table 3.22). Of these, SIX1 has been reported to be associated with dilated cardiomyopathy. This gene encodes a homeobox protein which is involved in the transcriptional regulation of genes taking part in the development of several organs, including muscles, kidney, and inner ear (Tschirner et al., 2014; Williams et al., 2011). The other five genes have not previously been reported to be associated with cardiomyopathies.
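The ratio of affected to total transcripts can be computed as sketched below. The inputs are assumptions of this sketch: a mapping from each gene to its annotated transcript IDs and a list of (gene, transcript) pairs carrying a 'HIGH' impact LoF call. The gene and transcript names in the toy data are placeholders, not identifiers from this study.

```python
from typing import Dict, Iterable, List, Set, Tuple


def affected_fraction(gene_transcripts: Dict[str, Set[str]],
                      lof_hits: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of each gene's annotated transcripts hit by at least one LoF SNV."""
    affected: Dict[str, Set[str]] = {}
    for gene, transcript in lof_hits:
        affected.setdefault(gene, set()).add(transcript)
    fractions: Dict[str, float] = {}
    for gene, hit in affected.items():
        known = gene_transcripts.get(gene, set())
        if known:  # skip genes with no annotated transcripts
            fractions[gene] = len(hit & known) / len(known)
    return fractions


def fully_affected(fractions: Dict[str, float]) -> List[str]:
    """Genes in which every annotated transcript carries a LoF SNV."""
    return sorted(gene for gene, frac in fractions.items() if frac == 1.0)


# Toy annotation, invented for illustration only.
toy_transcripts = {
    "GENE_A": {"A-201"},
    "GENE_B": {"B-201", "B-202", "B-203", "B-204"},
}
toy_hits = [("GENE_A", "A-201"), ("GENE_B", "B-203"), ("GENE_B", "B-203")]

toy_fractions = affected_fraction(toy_transcripts, toy_hits)
toy_full = fully_affected(toy_fractions)
```

Collecting the hits into sets means that several LoF SNVs on the same transcript count once, so the fraction cannot exceed 1.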

Table 3.22: The LoF SNVs affecting all transcripts of their genes.

CHR    POS        ID          REF  ALT  Gene      State         Effect
chr2   170624221  rs2114646   C    T    PTCHD3P2  Heterozygous  splice-donor
chr9   21481483   rs2039381   G    A    IFNE      Homozygous    stop-gained
chr11  55406022   rs76160133  C    G    OR4P4     Homozygous    stop-gained
chr11  56431216   rs11228710  C    T    OR5AR1    Homozygous    stop-gained
chr14  61124940   rs10143202  A    G    SIX1      Homozygous    start-lost
chrX   75004529   rs1343879   C    A    MAGEE2    Heterozygous  stop-gained

An xy scatter plot of Global DAF against South Asian DAF for the globally rare LoF SNVs showed an inclination of alleles towards South Asia, i.e. higher allele frequencies in South Asia than in other populations of the world.

3.3.5 Annotation of Small Indels and Analysis

Currently, only a limited number of tools are available to predict the effect of indels. Here, two tools, CADD and Ensembl's VEP, were used to annotate the indels.

3.3.5.1 Annotation with CADD

The indels were annotated with the online webserver of CADD v1.3 using the full annotations option. There were 935 indels with CADD_phred score ≥ 15. These deleterious indels corresponded to 772 genes, of which 67 had more than one deleterious indel. Here, ZNF717 was the top gene, with 14 indels with CADD_phred score ≥ 15, while SARM1 was second, with 6 such indels. The consequences of all indels with CADD_phred score ≥ 15 are shown in Figure 3.36.

Figure 3.36: Functional consequences of indels with CADD_phred ≥ 15.

[Figure 3.36: chart of indel counts by consequence. Categories: frameshift, inframe_insertion, inframe_deletion, stop_gained, splice_donor, splice_acceptor, downstream, upstream. Data labels as extracted: 346, 20, 80, 1, 10, 8, 85, 87.]

3.3.5.2 Annotation with VEP

The annotation of small indels (insertions/deletions) with the same version of Ensembl's VEP tool revealed 557 indels with 'HIGH' impact on the transcript, of which 19 were novel. These LoF indels were attributed to 488 genes, including 48 genes with more than one LoF indel. Here again, ZNF717 was the top gene, with 14 LoF indels, while SARM1 was second, with 6 LoF mutations. The majority of the indels were frameshift (92%), yet there were also a few stop-gained, splice-donor, and splice-acceptor indels (Figure 3.37). The homozygous LoF indels numbered 270, and no novel homozygous LoF indel was observed in this analysis.

Figure 3.37: Loss of Function indels according to functional consequences.

[Figure 3.37: chart of LoF indel counts by consequence. Categories: frame_shift, stop_gained, splice_donor, splice_acceptor. Data labels as extracted: 513, 6, 25, 27.]

3.3.6 Filtration of Variants of ClinVar, OMIM, and GWAS databases

Filtration of the variants on the basis of pathogenicity in the ClinVar database identified one SNV, rs1008642, associated with cardiomyopathies and long QT syndrome. This missense SNV lies in the ssu-2 homolog (C. elegans) gene (SSUH2), which is involved in heat shock protein binding and unfolded protein binding. The SNV was found in two of the patients in heterozygous form. Its allele frequency in Pakistani populations was 17.8% according to the 1000 Genomes Project database, while its global allele frequency was 37.1%, with the major contribution from African populations, i.e. 62.1%. In contrast, no variant listed in the GWAS catalogue or the OMIM database was found in the patients. This reflects the lack of genetic research on cardiomyopathies in the Pakistani population and its limited representation in medical genetics databases.

Chapter 4.0

Conclusion

The present study examines the genetic predisposition to cardiovascular diseases (CVDs) in the Pakistani population. The genetic variants prioritized here as deleterious using bioinformatics tools provide a framework for the early assessment of genetic risk factors for CVDs from whole genome or whole exome sequencing datasets. This study concludes that the underlying detrimental mutational burden is higher for common and polygenic CVDs than for Mendelian CVDs in Pakistan. Among common CVDs, the numbers of harmful mutations are highest, in descending order, for hypertension, atherosclerosis, coronary aneurysm, heart failure, and coronary artery disease. Likewise, among Mendelian CVDs, the numbers of harmful mutations are highest, in descending order, for cardiomyopathies, cardiac arrhythmias, and congenital heart defects. The identification of prioritized detrimental variants in patients with hyperlipidemia and cardiomyopathies highlighted genes potentially involved in the pathophysiology of these disorders.

This study also concludes that, although the majority of the harmful mutations for CVDs prioritized in the Pakistani population group with those of neighboring South Asian populations, Europeans, and Americans, a few deleterious variants are moderately or greatly differentiated in this population, with considerably higher allele frequencies than in other populations of the world. Such differentiated deleterious mutations may play a greater role in the pathophysiology of CVDs in this region of the world.

In future, more patients with different CVDs can be recruited for whole genome and/or exome sequencing analysis to validate these findings and to explore more genes potentially involved in the pathophysiology of various forms of CVDs. This approach can lead to the formulation of a panel of Pakistani population-specific genetic risk factors associated with CVDs for early assessment.

5.0 Publications

From the Thesis:

Shakeel, M., Irfan, M., and Khan, I.A. (2018). Estimating the mutational load for cardiovascular diseases in Pakistani population. PloS One, 13(2):e0192446.

Shakeel, M., Irfan, M., and Khan, I.A. (2018). Rare genetic mutations in Pakistani patients with dilated cardiomyopathy. Gene, 673: 134-139.

Shakeel, M., Irfan, M., Khan, W., Azim, M.K., and Khan, I.A., (2018). Whole genome sequencing of a Pakistani obese person identifies rare mutations (Manuscript in preparation).

Other than Thesis:

Ali, A., Khan, W., Shakeel, M., and Khan, I.A. (2018). Distinct landscape of mutations in cervical cancer revealed by somatic mutation signatures - a study inferring mutational signatures from somatic genomic variants in different cancer types (submitted, under review).

Khan, I.A., Anwar, M., Shakeel, M., Bergström, A., Narasimhan, V., Xue, Y., Tyler-Smith, C., and Ayub Q (2018). Mutational load in genes associated with heritable blood disorders in Pakistan (Manuscript in preparation).

Chapter 6.0

References

1000 Genomes Project. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073.

1000 Genomes Project. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56-65.

1000 Genomes Project. (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74.

Abid, A., Akhtar, N., Khaliq, S., and Mehdi, S. Q. (2011). Genetic heterogeneity for autosomal dominant familial hypertrophic cardiomyopathy in a Pakistani family. Journal of the College of Physicians and Surgeons Pakistan, 21(4), 202-206.

Ahmed, W., Ali, I. S., Riaz, M., Younas, A., Sadeque, A., Niazi, A. K., Niazi, S. H., Ali, S. H. B., Azam, M., and Qamar, R. (2013). Association of ANRIL polymorphism (rs1333049: C>G) with myocardial infarction and its pharmacogenomic role in hypercholesterolemia. Gene, 515(2), 416-420.

Ahmed, W., Malik, M., Saeed, I., Khan, A. A., Sadeque, A., Kaleem, U., Ahmed, N., Ajmal, M., Azam, M., and Qamar, R. (2011). Role of tissue plasminogen activator and plasminogen activator inhibitor polymorphism in myocardial infarction. Molecular Biology Reports, 38(4), 2541-2548.

Ajmal, M., Ahmed, W., Akhtar, N., Sadeque, A., Khalid, A., Benish Ali, S. H., Ahmed, N., Azam, M., and Qamar, R. (2011). A novel pathogenic nonsense triple-nucleotide mutation in the low-density lipoprotein receptor gene and its clinical correlation with familial hypercholesterolemia. Genetic Testing and Molecular Biomarkers, 15(9), 601-606.

Akil, L., and Ahmad, H. A. (2011). Relationships between obesity and cardiovascular diseases in four southern states and Colorado. Journal of Health Care for the Poor and Underserved, 22(4 Suppl), 61.

Al Turki, S., Manickaraj, A. K., Mercer, C. L., Gerety, S. S., Hitz, M.-P., Lindsay, S., D'Alessandro, L. C., Swaminathan, G. J., Bentham, J., and Arndt, A.-K. (2014). Rare variants in NR2F2 cause congenital heart defects in humans. The American Journal of Human Genetics, 94(4), 574-585.

Alvi, F. M., and Hasnain, S. (2009). ACE I/D and G2350A polymorphisms in Pakistani hypertensive population of Punjab. Clinical and Experimental Hypertension, 31(5), 471-480.

Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, 175-176.

Angulo, M., Butler, M., and Cataletto, M. (2015). Prader-Willi syndrome: a review of clinical, genetic, and endocrine findings. Journal of Endocrinological Investigation, 38(12), 1249-1263.

Antonarakis, S. E., Lyle, R., Dermitzakis, E. T., Reymond, A., and Deutsch, S. (2004). Chromosome 21 and down syndrome: from genomics to pathophysiology. Nature Reviews. Genetics, 5(10), 725.

Arnett, D. K., Baird, A. E., Barkley, R. A., Basson, C. T., Boerwinkle, E., Ganesh, S. K., Herrington, D. M., Hong, Y., Jaquish, C., and McDermott, D. A. (2007). Relevance of genetics and genomics for prevention and treatment of cardiovascular disease. Circulation, 115(22), 2878-2901.

Artham, S. M., Lavie, C. J., Milani, R. V., and Ventura, H. O. (2009). Obesity and hypertension, heart failure, and coronary heart disease-risk factor, paradox, and recommendations for weight loss. The Ochsner Journal, 9(3), 124-132.

Aulchenko, Y. S., Ripatti, S., Lindqvist, I., Boomsma, D., Heid, I. M., Pramstaller, P. P., Penninx, B. W., Janssens, A. C. J., Wilson, J. F., and Spector, T. (2009). Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nature Genetics, 41(1), 47-55.

Aziz, K. U., Faruqui, A., Patel, N., and Jaffery, H. (2012). Prevalence and awareness of cardiovascular disease including life styles in a lower middle class urban community in an Asian country. Pakistan Heart Journal, 41(3-4), 11-20.

Badimon, L., and Vilahur, G. (2012). LDL-cholesterol versus HDL-cholesterol in the atherosclerotic plaque: inflammatory resolution versus thrombotic chaos. Annals of the New York Academy of Sciences, 1254(1), 18-32.

Bainbridge, M. N., Wang, M., Wu, Y., Newsham, I., Muzny, D. M., Jefferies, J. L., Albert, T. J., Burgess, D. L., and Gibbs, R. A. (2011). Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biology, 12(7), R68.

Bamshad, M. J., Ng, S. B., Bigham, A. W., Tabor, H. K., Emond, M. J., Nickerson, D. A., and Shendure, J. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews. Genetics, 12(11), 745-755.

Bergen, A. C. (2015). Mutation load under additive fitness effects. Genetics Research (Camb.), 97(e2), 1-10.

Betti, I., Ballo, P., Barchielli, A., and Zuppiroli, A. (2010). Prognostic role of CA-125 in a population at high risk for cardiovascular disease: results from the Probe-HF Study. Journal of the American College of Cardiology, 55(10), A62-E595.

Bezzina, C. R. (2008). Genetics of cardiomyopathy and channelopathy. Heart and Metabolism, 41, 5-10.

Bomba, L., Walter, K., and Soranzo, N. (2017). The impact of rare and low-frequency genetic variants in common disease. Genome Biology, 18(77), 1-17.

Borger, P. (2017). Natural Knockouts: Natural Selection Knocked Out. Biology (Basel), 6(43), 1-6. doi:10.3390/biology6040043

British Heart Foundation. (2017). Cardiovascular disease. Retrieved from the British Heart Foundation website: https://www.bhf.org.uk/heart-health/conditions/cardiovascular-disease.

Brotman, D. J., Walker, E., Lauer, M. S., and O'Brien, R. G. (2005). In search of fewer independent risk factors. Archives of Internal Medicine, 165(2), 138-145.

Brown, C. A., McKinney, K. Q., Kaufman, J. S., Gravel, R. A., and Rozen, R. (2000). A common polymorphism in methionine synthase reductase increases risk of premature coronary artery disease. Journal of Cardiovascular Risk, 7(3), 197-200.

Cahill, T. J., Ashrafian, H., and Watkins, H. (2013). Genetic cardiomyopathies causing heart failure. Circulation Research, 113(6), 660-675.

Cambien, F., and Tiret, L. (2007). Genetics of cardiovascular diseases: from single mutations to the whole genome. Circulation, 116(15), 1714-1724. doi:10.1161/circulationaha.106.661751

Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. (2004). The gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology. Nucleic Acids Research, 32(suppl_1), D262-D266.

Chang, S. S., Grunder, S., Hanukoglu, A., Rösler, A., Mathew, P., Hanukoglu, I., Schild, L., Lu, Y., Shimkets, R. A., and Nelson-Williams, C. (1996). Mutations in subunits of the epithelial sodium channel cause salt wasting with hyperkalaemic acidosis, pseudohypoaldosteronism type 1. Nature Genetics, 12(3), 248-253.

Charlesworth, D., and Willis, J. H. (2009). The genetics of inbreeding depression. Nature Reviews. Genetics, 10(11), 783-796.

Chen, C., Tso, A. W., Cheung, B. M., Law, L. S., Ong, K., Wat, N., Janus, E. D., Xu, A., and Lam, K. S. (2012). Plasma concentration of pigment epithelium-derived factor is closely associated with blood pressure and predicts incident hypertension in Chinese: a 10-year prospective study. Clinical Endocrinology, 76(4), 506-513.

Clee, S. M., Zwinderman, A. H., Engert, J. C., Zwarts, K. Y., Molhuizen, H. O., Roomp, K., Jukema, J. W., van Wijland, M., van Dam, M., and Hudson, T. J. (2001). Common genetic variation in ABCA1 is associated with altered lipoprotein levels and a modified risk for coronary artery disease. Circulation, 103(9), 1198-1205.

Collin, G. B., Marshall, J. D., Ikeda, A., So, W. V., Russell-Eggitt, I., Maffei, P., Beck, S., Boerkoel, C. F., Sicolo, N., and Martin, M. (2002). Mutations in ALMS1 cause obesity, type 2 diabetes and neurosensory degeneration in Alström syndrome. Nature Genetics, 31(1), 74-78.

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., Handsaker, R. E., Lunter, G., Marth, G. T., and Sherry, S. T. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156-2158.

Davies, M. (2000). The cardiomyopathies: an overview. Heart, 83(4), 469-474.

Davydov, E. V., Goode, D. L., Sirota, M., Cooper, G. M., Sidow, A., and Batzoglou, S. (2010). Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Computational Biology, 6(12), e1001025.

Dawber, T. R., Kannel, W. B., Revotskie, N., Stokes III, J., Kagan, A., and Gordon, T. (1959). Some factors associated with the development of coronary heart disease-six years' follow-up experience in the Framingham Study. American Journal of Public Health and the Nations Health, 49(10), 1349-1356.

Delles, C., McBride, M. W., Padmanabhan, S., and Dominiczak, A. F. (2008). The genetics of cardiovascular disease. Trends in Endocrinology & Metabolism, 19(9), 309-316.

Deloukas, P., Kanoni, S., Willenborg, C., Farrall, M., Assimes, T. L., Thompson, J. R., Ingelsson, E., Saleheen, D., Erdmann, J., Goldstein, B. A., et al. (2013). Large-scale association analysis identifies new risk loci for coronary artery disease. Nature Genetics, 45(1), 25-33. doi:10.1038/ng.2480

Denvir, J., Boskovic, G., Fan, J., Primerano, D. A., Parkman, J. K., and Kim, J. H. (2016). Whole genome sequence analysis of the TALLYHO/Jng mouse. BMC Genomics, 17(907), 1-15.

Dhar, S., Ray, S., Dutta, A., Sengupta, B., and Chakrabarti, S. (2012). Polymorphism of ACE gene as the genetic predisposition of coronary artery disease in Eastern India. Indian Heart Journal, 64(6), 576-581.

DiPietro, A., Trachtman, H., Sanjad, S. A., and Lifton, R. P. (1996). Genetic heterogeneity of Bartter's syndrome revealed by mutations in the K+ channel, ROMK. Nature Genetics, 14.

Do, R., Stitziel, N. O., Won, H.-H., Jørgensen, A. B., Duga, S., Merlini, A., Kiezun, A., Farrall, M., Goel, A., and Zuk, O. (2015). Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction. Nature, 518(7537), 102-106.

Dopazo, J., Amadoz, A., Bleda, M., Garcia-Alonso, L., Alemán, A., García-García, F., Rodriguez, J. A., Daub, J. T., Muntané, G., and Rueda, A. (2016). 267 Spanish exomes reveal population-specific differences in disease-related genetic variation. Molecular Biology and Evolution, 33(5), 1205-1218.

Doris, P. A. (2002). Hypertension genetics, single nucleotide polymorphisms, and the common disease: common variant hypothesis. Hypertension, 39(2), 323-331.

Edmonds, C. A., Lillie, A. S., and Cavalli-Sforza, L. L. (2004). Mutations arising in the wave front of an expanding population. Proceedings of the National Academy of Sciences, 101(4), 975-979.

Elliott, P. (2000). Diagnosis and management of dilated cardiomyopathy. Heart, 84(1), 106-106.

Erdmann, J., Stark, K., Esslinger, U. B., Rumpf, P. M., Koesling, D., de Wit, C., Kaiser, F. J., Braunholz, D., Medack, A., and Fischer, M. (2013). Dysfunctional nitric oxide signalling increases risk of myocardial infarction. Nature, 504(7480), 432-436.

Fagerberg, L., Hallström, B. M., Oksvold, P., Kampf, C., Djureinovic, D., Odeberg, J., Habuka, M., Tahmasebpoor, S., Danielsson, A., and Edlund, K. (2014). Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Molecular and Cellular Proteomics, 13(2), 397-406.

Fahed, A., Gelb, B., Seidman, J., and Seidman, C. (2013). Genetics of congenital heart

disease: the glass half empty. Circulation Research, 112(12), E182-E182. Faita, F., Vecoli, C., Foffa, I., and Andreassi, M. G. (2012). Next generation sequencing

in cardiovascular diseases. World Journal of Cardiology, 4(10), 288-295. doi:10.4330/wjc.v4.i10.288

Frikke-Schmidt, R. (2011). Genetic variation in ABCA1 and risk of cardiovascular

disease. Atherosclerosis, 218(2), 281-282. Fu, W., Gittelman, R. M., Bamshad, M. J., and Akey, J. M. (2014). Characteristics of

neutral and deleterious protein-coding variation among individuals and populations. The American Journal of Human Genetics, 95(4), 421-436.

Garg, V., Kathiriya, I. S., Barnes, R., and Schluterman, M. K. (2003). GATA4 mutations

cause human congenital heart defects and reveal an interaction with TBX5. Nature, 424(6947), 443.

Garg, V., Muth, A. N., Ransom, J. F., and Schluterman, M. K. (2005). Mutations in

NOTCH1 cause aortic valve disease. Nature, 437(7056), 270. Gerull, B., Gramlich, M., Atherton, J., McNabb, M., Trombitás, K., Sasse-Klaassen, S.,

Seidman, J., Seidman, C., Granzier, H., and Labeit, S. (2002). Mutations of TTN, encoding the giant muscle filament titin, cause familial dilated cardiomyopathy. Nature Genetics, 30(2), 201-204.

Golbus, J. R., Stitziel, N. O., Zhao, W., Xue, C., Farrall, M., McPherson, R., Erdmann,

J., Deloukas, P., Watkins, H., and Schunkert, H. (2016). Common and rare genetic variation in CCR2, CCR5, or CX3CR1 and risk of atherosclerotic coronary heart disease and glucometabolic traits. Circulation: Cardiovascular Genetics, 9(3), 250-258.

Goldmuntz, E. (2005). DiGeorge syndrome: new insights. Clinics in Perinatology, 32(4),

963-978. Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., and McKusick, V. A. (2005).

Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33(suppl 1), D514-D517.

Hansson, J. H., Nelson-Williams, C., Suzuki, H., Schild, L., Shimkets, R., Lu, Y.,

Canessa, C., Iwasaki, T., Rossier, B., and Lifton, R. P. (1995). Hypertension caused by a truncated epithelial sodium channel gamma subunit: genetic heterogeneity of Liddle syndrome. Nature Genetics, 11(1), 76-82.

Haq, F. U., Jalil, F., Hashmi, S., Jumani, M. I., Imdad, A., Jabeen, M., Hashmi, J. T.,

Irfan, F. B., Imran, M., and Atiq, M. (2011). Risk factors predisposing to congenital heart defects. Annals of Pediatric Cardiology, 4(2), 117-121.

Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B. L., Barrell, D., Zadissa, A., and Searle, S. (2012). GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research, 22(9), 1760-1774.

Helgadottir, A., Thorleifsson, G., Manolescu, A., Gretarsdottir, S., Blondal, T.,

Jonasdottir, A., Jonasdottir, A., Sigurdsson, A., Baker, A., and Palsson, A. (2007). A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science, 316(5830), 1491-1493.

Henn, B. M., Botigué, L. R., Bustamante, C. D., Clark, A. G., and Gravel, S. (2015).

Estimating the mutation load in human genomes. Nature Reviews Genetics, 16(6), 333-343.

Henn, B. M., Botigué, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K.,

Martin, A. R., Musharoff, S., Cann, H., and Snyder, M. P. (2016). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proceedings of the National Academy of Sciences, 113(4), E440-E449.

Hershberger, R. E., Hedges, D. J., and Morales, A. (2013). Dilated cardiomyopathy: the

complexity of a diverse genetic architecture. Nature Reviews Cardiology, 10(9), 531-547.

Hindorff, L., Junkins, H., Mehta, J., and Manolio, T. (2011). A catalog of published

genome-wide association studies 2010. Available at: http://www.genome.gov/gwastudies/ (Accessed June 28, 2016).

Hintzsche, J. D., Robinson, W. A., and Tan, A. C. (2016). A Survey of Computational

Tools to Analyze and Interpret Whole Exome Sequencing Data. International Journal of Genomics, 2016, 1-17.

Howrigan, D. P., Simonson, M. A., Kamens, H. M., Stephens, S. H., Wills, A. G.,

Ehringer, M. A., Keller, M. C., and McQueen, M. B. (2011). Mutational load analysis of unrelated individuals. Paper presented at the BMC Proceedings.

Hsieh, Y. Y., Lin, Y. J., Chang, C. C., Chen, D. Y., Hsu, C. M., Lo, M. M., Hsu, K. H.,

and Tsai, F. J. (2010). Human lymphocyte antigen B‐associated transcript 2, 3, and 5 polymorphisms and haplotypes are associated with susceptibility of Kawasaki disease and coronary artery aneurysm. Journal of Clinical Laboratory Analysis, 24(4), 262-268.

Hussain, S., Bibi, S., and Javed, Q. (2011). Heritability of genetic variants of resistin

gene in patients with coronary artery disease: A family-based study. Clinical Biochemistry, 44(8), 618-622.

Hussain, S., Haroon, J., Ejaz, S., and Javed, Q. (2016). Variants of resistin gene and

the risk of idiopathic dilated cardiomyopathy in Pakistan. Meta Gene, 9, 37-41.

International Consortium for Blood Pressure Genome-Wide Association Studies.

(2011). Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature, 478(7367), 103-109.

Iqbal, M. P., Fatima, T., Parveen, S., Yousuf, F. A., Shafiq, M., Mehboobali, N., Khan, A. H., Azam, I., and Frossard, P. M. (2005). Lack of association of methylenetetrahydrofolate reductase 677C>T mutation with coronary artery disease in a Pakistani population. Journal of Molecular and Genetic Medicine, 1(1), 26-32.

Iqbal, M. P., Mahmood, S., Mehboobali, N., Ishaq, M., Fatima, T., Parveen, S., and

Frossard, P. (2004). Association study of the angiotensin-converting enzyme (ACE) gene G2350A dimorphism with myocardial infarction. Experimental & Molecular Medicine, 36(2), 110.

Jacoby, D., and McKenna, W. J. (2012). Genetics of inherited cardiomyopathy.

European Heart Journal, 33(3), 296-304. doi:10.1093/eurheartj/ehr260

Japp, A. G., Gulati, A., Cook, S. A., Cowie, M. R., and Prasad, S. K. (2016). The

diagnosis and evaluation of dilated cardiomyopathy. Journal of the American College of Cardiology, 67(25), 2996-3010.

Jobling, M., Hurles, M., and Tyler-Smith, C. (2013). Human evolutionary genetics:

origins, peoples & disease: Garland Science.

Jormsjö, S., Wuttge, D. M., Sirsjö, A., Whatling, C., Hamsten, A., Stemme, S., and

Eriksson, P. (2002). Differential expression of cysteine and aspartic proteases during progression of atherosclerosis in apolipoprotein E-deficient mice. The American Journal of Pathology, 161(3), 939-945.

Kaiser, J. (2014). The hunt for missing genes. Science, 344(6185), 687-689.

Kannel, W. B., Dawber, T. R., Friedman, G. D., Glennon, W. E., and McNamara, P. M.

(1964). Risk Factors in Coronary Heart Disease: An Evaluation of Several Serum Lipids as Predictors of Coronary Heart Disease. The Framingham Study. Annals of Internal Medicine, 61(5_Part_1), 888-899.

Kannel, W. B., Dawber, T. R., Kagan, A., Revotskie, N., and Stokes, J. (1961). Factors

of Risk in the Development of Coronary Heart Disease: Six-Year Follow-up Experience. The Framingham Study. Annals of Internal Medicine, 55(1), 33-50.

Kathiresan, S., and Srivastava, D. (2012). Genetics of human cardiovascular disease.

Cell, 148(6), 1242-1257.

Keinan, A., Mullikin, J. C., Patterson, N., and Reich, D. (2007). Measurement of the

human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nature Genetics, 39(10), 1251-1255.

Kelly, B. B., and Fuster, V. (2010). Promoting cardiovascular health in the developing

world: a critical challenge to achieve global health: National Academies Press, Washington D.C.

Kircher, M., Witten, D. M., Jain, P., O'Roak, B. J., Cooper, G. M., and Shendure, J.

(2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics, 46(3), 310-315.

Klopfstein, S., Currat, M., and Excoffier, L. (2005). The fate of mutations surfing on the wave of a range expansion. Molecular Biology and Evolution, 23(3), 482-490.

Köhler, S., Doelken, S. C., Mungall, C. J., Bauer, S., Firth, H. V., Bailleul-Forestier, I.,

Black, G., Brown, D. L., Brudno, M., and Campbell, J. (2014). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Research, 42(D1), D966-D974.

Kolehmainen, J., Black, G. C., Saarinen, A., Chandler, K., Clayton-Smith, J., Träskelin,

A.-L., Perveen, R., Kivitie-Kallio, S., Norio, R., and Warburg, M. (2003). Cohen syndrome is caused by mutations in a novel gene, COH1, encoding a transmembrane protein with a presumed role in vesicle-mediated sorting and intracellular protein transport. The American Journal of Human Genetics, 72(6), 1359-1369.

Kristiansson, K., Ilveskoski, E., Lehtimäki, T., Peltonen, L., Perola, M., and Karhunen,

P. J. (2008). Association analysis of allelic variants of USF1 in coronary atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 28(5), 983-989.

Kryukov, G. V., Pennacchio, L. A., and Sunyaev, S. R. (2007). Most rare missense

alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics, 80(4), 727-739.

Landrum, M. J., Lee, J. M., Riley, G. R., Jang, W., Rubinstein, W. S., Church, D. M.,

and Maglott, D. R. (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research, 42(D1), D980-D985.

Laurila, P.-P., Naukkarinen, J., Kristiansson, K., Ripatti, S., Kauttu, T., Silander, K.,

Salomaa, V., Perola, M., Karhunen, P. J., and Barter, P. J. (2010). Genetic association and interaction analysis of USF1 and APOA5 on lipid levels and atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 30(2), 346-352.

Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T.,

O'Donnell-Luria, A. H., Ware, J. S., Hill, A. J., and Cummings, B. B. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), 285-291.

Lettre, G. (2014). Rare and low-frequency variants in human common diseases and

other complex traits. Journal of Medical Genetics, 51(11), 705-714.

Lettre, G., Palmer, C. D., Young, T., Ejebe, K. G., Allayee, H., Benjamin, E. J., Bennett,

F., Bowden, D. W., Chakravarti, A., and Dreisbach, A. (2011). Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: the NHLBI CARe Project. PLoS Genetics, 7(2), e1001300.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with

Burrows–Wheeler transform. Bioinformatics, 25(14), 1754-1760.

Li, H., and Durbin, R. (2011). Inference of human population history from individual

whole-genome sequences. Nature, 475(7357), 493-496.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 2078-2079.

Li, Y., Wang, O., Quan, T., Xia, W., Jiang, Y., Li, M., Meng, X., and Xing, X. (2016). A

genomic study of adult-onset idiopathic hypoparathyroidism in Chinese by targeted next-generation sequencing. Zhonghua Nei Ke Za Zhi, 55(8), 604-608.

Liaquat, A., Asifa, G. Z., Zeenat, A., and Javed, Q. (2014). Polymorphisms of tumor

necrosis factor-alpha and interleukin-6 gene and C-reactive protein profiles in patients with idiopathic dilated cardiomyopathy. Annals of Saudi Medicine, 34(5), 407-414.

Lifton, R. P., Gharavi, A. G., and Geller, D. S. (2001). Molecular mechanisms of human

hypertension. Cell, 104(4), 545-556.

Liu, X., Zhang, L., Pacciulli, D., Zhao, J., Nan, C., Shen, W., Quan, J., Tian, J., and

Huang, X. (2016). Restrictive Cardiomyopathy Caused by Troponin Mutations: Application of Disease Animal Models in Translational Studies. Frontiers in Physiology, 7(Article 629), 1-6. doi:10.3389/fphys.2016.00629

Lohmueller, K. E., Indap, A. R., Schmidt, S., Boyko, A. R., Hernandez, R. D., Hubisz,

M. J., Sninsky, J. J., White, T. J., Sunyaev, S. R., and Nielsen, R. (2008). Proportionally more deleterious genetic variation in European than in African populations. Nature, 451(7181), 994-997.

Lohmueller, K. E., Mauney, M. M., Reich, D., and Braverman, J. M. (2006). Variants

associated with common disease are not unusually differentiated in frequency across populations. The American Journal of Human Genetics, 78(1), 130-136.

Lopes, L. R., Syrris, P., Guttmann, O. P., O'Mahony, C., Tang, H. C., Dalageorgou, C.,

Jenkins, S., Hubank, M., Monserrat, L., McKenna, W. J., et al. (2015). Novel genotype-phenotype associations demonstrated by high-throughput sequencing in patients with hypertrophic cardiomyopathy. Heart, 101(4), 294-301. doi:10.1136/heartjnl-2014-306387

Luft, F. C. (2017). What have we learned from the genetics of hypertension? Medical

Clinics of North America, 101(1), 195-206.

Ma, M., Ru, Y., Chuang, L.-S., Hsu, N.-Y., Shi, L.-S., Hakenberg, J., Cheng, W.-Y.,

Uzilov, A., Ding, W., and Glicksberg, B. S. (2015). Disease-associated variants in different categories of disease located in distinct regulatory elements. BMC Genomics, 16(8), 1-13.

MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K.,

Jostins, L., Habegger, L., Pickrell, J. K., and Montgomery, S. B. (2012). A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335(6070), 823-828.

MacArthur, D. G., and Tyler-Smith, C. (2010). Loss-of-function variants in the genomes

of healthy humans. Human Molecular Genetics, 19(R2), R125-R130.

Mahmood-ul-Hassan, Awan, Z. A., Gul, A. M., Sahibzada, W. A., and Hafizullah, M. (2005). Prevalence of coronary artery disease in rural areas of Peshawar. Journal of Postgraduate Medical Institute, 19(1), 14-22.

Mahon, N. G., Murphy, R. T., MacRae, C. A., Caforio, A. L., Elliott, P. M., and

McKenna, W. J. (2005). Echocardiographic evaluation in asymptomatic relatives of patients with dilated cardiomyopathy reveals preclinical disease. Annals of Internal Medicine, 143(2), 108-115.

Mani, A., Radhakrishnan, J., Wang, H., Mani, A., Mani, M.-A., Nelson-Williams, C.,

Carew, K. S., Mane, S., Najmabadi, H., and Wu, D. (2007). LRP6 mutation in a family with early coronary disease and metabolic risk factors. Science, 315(5816), 1278-1282.

Mardis, E. R. (2008). The impact of next-generation sequencing technology on

genetics. Trends in Genetics, 24(3), 133-141.

Matsumoto, Y., Hayashi, T., Inagaki, N., Takahashi, M., Hiroi, S., Nakamura, T.,

Arimura, T., Nakamura, K., Ashizawa, N., and Yasunami, M. (2005). Functional analysis of titin/connectin N2-B mutations found in cardiomyopathy. Journal of Muscle Research and Cell Motility, 26(6), 367-374.

McEvoy, B. P., Powell, J. E., Goddard, M. E., and Visscher, P. M. (2011). Human

population dispersal “Out of Africa” estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Research, 21(6), 821-829.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A.,

Garimella, K., Altshuler, D., Gabriel, S., and Daly, M. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297-1303.

McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R., Thormann, A., Flicek, P.,

and Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome Biology, 17(122), 1-14.

Metzker, M. L. (2010). Sequencing technologies--the next generation. Nature Reviews.

Genetics, 11(1), 31-46.

Miosge, L. A., Field, M. A., Sontani, Y., Cho, V., Johnson, S., Palkova, A., Balakishnan,

B., Liang, R., Zhang, Y., and Lyon, S. (2015). Comparison of predicted and actual consequences of missense mutations. Proceedings of the National Academy of Sciences, 112(37), E5189-E5198.

Narasimhan, V. M., Hunt, K. A., Mason, D., Baker, C. L., Karczewski, K. J., Barnes, M.

R., Barnett, A. H., Bates, C., Bellary, S., and Bockett, N. A. (2016). Health and population effects of rare gene knockouts in adult humans with related parents. Science, 352(6284), 474-477.

National Heart Lung and Blood Institute. (2017). Types of Congenital Heart Defects.

Retrieved from the National Heart, Lung, and Blood Institute website: https://www.nhlbi.nih.gov/health/health-topics/topics/chd/types

Nawaz, S. K., and Hasnain, S. (2011). Effect of ACE polymorphisms on the association between noise and hypertension in a Pakistani population. Journal of the Renin-Angiotensin-Aldosterone System, 12(4), 516-520.

Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., Margono, C., Mullany, E.

C., Biryukov, S., Abbafati, C., and Abera, S. F. (2014). Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet, 384(9945), 766-781.

O'Donnell, C. J., and Nabel, E. G. (2011). Genomics of cardiovascular disease. New

England Journal of Medicine, 365(22), 2098-2109.

Ozaki, K., and Tanaka, T. (2016). Molecular genetics of coronary artery disease.

Journal of Human Genetics, 61(1), 71-77.

Paquette, M., Chong, M., Thériault, S., Dufour, R., Paré, G., and Baass, A. (2017).

Polygenic risk score predicts prevalence of cardiovascular disease in patients with familial hypercholesterolemia. Journal of Clinical Lipidology, 11(3), 725-732. e725.

Park, H.-Y. (2017). Hereditary Dilated Cardiomyopathy: Recent Advances in Genetic

Diagnostics. Korean Circulation Journal, 47(3), 291-298.

Patterson, N., Price, A. L., and Reich, D. (2006). Population structure and

eigenanalysis. PLoS Genetics, 2(12), e190.

Peischl, S., and Excoffier, L. (2015). Expansion load: recessive mutations and the role

of standing genetic variation. Molecular Ecology, 24(9), 2084-2094.

Perwaiz Iqbal, M., Iqbal, K., Khan Tareen, A., Parveen, S., Mehboobali, N., Haider, G.,

and Perwaiz Iqbal, S. (2016). Polymorphisms in MTHFR, MS and CBS genes and premature acute myocardial infarction in a Pakistani population. Pakistan Journal of Pharmaceutical Sciences, 29(6), 1901-1906.

Pigeyre, M., Yazdi, F. T., Kaur, Y., and Meyre, D. (2016). Recent progress in genetics,

epigenetics and metagenomics unveils the pathophysiology of human obesity. Clinical Science, 130(12), 943-986.

Pilbrow, A. P., Folkersen, L., Pearson, J. F., Brown, C. M., McNoe, L., Wang, N. M.,

Sweet, W. E., Tang, W. W., Black, M. A., and Troughton, R. W. (2012). The chromosome 9p21.3 coronary heart disease risk allele is associated with altered gene expression in normal heart and vascular tissues. PLoS One, 7(6), e39574.

Poirier, P., Giles, T. D., Bray, G. A., Hong, Y., Stern, J. S., Pi-Sunyer, F. X., and Eckel,

R. H. (2006). Obesity and cardiovascular disease: pathophysiology, evaluation, and effect of weight loss. Circulation, 113(6), 898-918.

Postma, A. V., Bezzina, C. R., and Christoffels, V. M. (2016). Genetics of congenital

heart disease: the contribution of the noncoding regulatory genome. Journal of Human Genetics, 61(1), 13-19.

Pulignani, S., Cresci, M., and Andreassi, M. G. (2013). Genetics of congenital heart defects: is it not all in the DNA? Translational Research, 161(1), 59-61.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., Maller,

J., Sklar, P., De Bakker, P. I., and Daly, M. J. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics, 81(3), 559-575.

Qureshi, S. F., Ali, A., John, P., Jadhav, A. P., Venkateshwari, A., Rao, H.,

Jayakrishnan, M., Narasimhan, C., Shenthar, J., and Thangaraj, K. (2015). Mutational analysis of SCN5A gene in long QT syndrome. Meta Gene, 6, 26-35.

R Core Team. (2013). R: a language for data analysis and graphics. R Foundation for

Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Rafiq, M. A., Chaudhry, A., Care, M., Spears, D. A., Morel, C. F., and Hamilton, R. M.

(2017). Whole exome sequencing identified 1 base pair novel deletion in BCL2‐associated athanogene 3 (BAG3) gene associated with severe dilated cardiomyopathy (DCM) requiring heart transplant in multiple family members. American Journal of Medical Genetics Part A, 173(3), 699-705.

Rajwani, A., Ezzat, V., Smith, J., Yuldasheva, N. Y., Duncan, E. R., Gage, M., Cubbon,

R. M., Kahn, M. B., Imrie, H., and Abbas, A. (2012). Increasing circulating IGFBP1 levels improves insulin sensitivity, promotes nitric oxide production, lowers blood pressure, and protects against atherosclerosis. Diabetes, 61(4), 915-924.

Richardson, T. G., Campbell, C., Timpson, N. J., and Gaunt, T. R. (2016).

Incorporating Non-Coding Annotations into Rare Variant Analysis. PLoS One, 11(4), e0154181.

Rizvi, S. F.-u.-H., Mustafa, G., Kundi, A., and Khan, M. A. (2015). Prevalence of

congenital heart disease in rural communities of Pakistan. Journal of Ayub Medical College Abbottabad, 27(1), 124-127.

Roth, G. A., Huffman, M. D., Moran, A. E., Feigin, V., Mensah, G. A., Naghavi, M., and

Murray, C. J. (2015). Global and regional patterns in cardiovascular mortality from 1990 to 2013. Circulation, 132(17), 1667-1678.

Sabater‐Molina, M., Pérez‐Sánchez, I., Hernández del Rincón, J. P., and Gimeno, J. R.

(2017). Genetics of hypertrophic cardiomyopathy: a review of current state. Clinical Genetics, 2017, 1-12.

Saeed, M., Perwaiz Iqbal, M., Yousuf, F., Perveen, S., Shafiq, M., Sajid, J., and

Frossard, P. (2007). Interactions and associations of paraoxonase gene cluster polymorphisms with myocardial infarction in a Pakistani population. Clinical Genetics, 71(3), 238-244.

Saeki, H., Hamada, M., and Hiwada, K. (2002). Circulating levels of insulin-like growth

factor-1 and its binding proteins in patients with hypertrophic cardiomyopathy. Circulation Journal, 66(7), 639-644.

Saleheen, D., Alexander, M., Rasheed, A., Wormser, D., Soranzo, N., Hammond, N., Butterworth, A., Zaidi, M., Haycock, P., and Bumpstead, S. (2010). Association of the 9p21.3 locus with risk of first-ever myocardial infarction in Pakistanis. Arteriosclerosis, Thrombosis, and Vascular Biology, 30(7), 1467-1473.

Saleheen, D., Natarajan, P., Armean, I. M., Zhao, W., Rasheed, A., Khetarpal, S. A.,

Won, H.-H., Karczewski, K. J., O'Donnell-Luria, A. H., and Samocha, K. E. (2017). Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature, 544(7649), 235-239.

Saleheen, D., Natarajan, P., Zhao, W., Rasheed, A., Khetarpal, S., Won, H.-H.,

Karczewski, K. J., O'Donnell-Luria, A. H., Samocha, K. E., and Gupta, N. (2015). Human knockouts in a cohort with a high rate of consanguinity. bioRxiv, 031518.

Santoro, D., Buemi, M., Gagliostro, G., Vecchio, M., Currò, M., Ientile, R., and

Caccamo, D. (2015). Association of VDR gene polymorphisms with heart disease in chronic kidney disease patients. Clinical Biochemistry, 48(16), 1028-1032.

Sasson, A., and Michael, T. P. (2010). Filtering error from SOLiD output.

Bioinformatics, 26(6), 849-850.

Schott, J.-J., Benson, D. W., Basson, C. T., Pease, W., Silberbach, G. M., Moak, J. P.,

Maron, B. J., Seidman, C. E., and Seidman, J. G. (1998). Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science, 281(5373), 108-111.

Schunkert, H., König, I. R., Kathiresan, S., Reilly, M. P., Assimes, T. L., Holm, H.,

Preuss, M., Stewart, A. F., Barbalic, M., and Gieger, C. (2011). Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature Genetics, 43(4), 333.

Schwartz, P. J., Priori, S. G., Locati, E. H., Napolitano, C., Cantù, F., Towbin, J. A.,

Keating, M. T., Hammoude, H., Brown, A. M., and Chen, L.-S. K. (1995). Long QT syndrome patients with mutations of the SCN5A and HERG genes have differential responses to Na+ channel blockade and to increases in heart rate. Circulation, 92(12), 3381-3386.

Seo, S., Guo, D.-F., Bugge, K., Morgan, D. A., Rahmouni, K., and Sheffield, V. C.

(2009). Requirement of Bardet-Biedl syndrome proteins for leptin receptor signaling. Human Molecular Genetics, 18(7), 1323-1331.

Shahid, S. U., Cooper, J. A., Beaney, K. E., Li, K., Rehman, A., and Humphries, S. E.

(2017). Genetic risk analysis of coronary artery disease in Pakistani subjects using a genetic risk score of 21 variants. Atherosclerosis, 258, 1-7.

Shatwan, I. M., Minihane, A.-M., Williams, C. M., Lovegrove, J. A., Jackson, K. G., and

Vimaleswaran, K. S. (2016). Impact of lipoprotein lipase gene polymorphism, S447X, on postprandial triacylglycerol and glucose response to sequential meal ingestion. International Journal of Molecular Sciences, 17(397), 1-9.

Simon, D. B., Karet, F. E., Hamdan, J. M., DiPietro, A., Sanjad, S. A., and Lifton, R. P.

(1996). Bartter's syndrome, hypokalaemic alkalosis with hypercalciuria, is caused by mutations in the Na-K-2Cl cotransporter NKCC2. Nature Genetics, 13(2), 183-188.

Simon, D. B., Nelson-Williams, C., Bia, M. J., Ellison, D., Karet, F. E., Molina, A. M.,

Vaara, I., Iwata, F., Cushner, H. M., and Koolen, M. (1996). Gitelman's variant of Bartter's syndrome, inherited hypokalaemic alkalosis, is caused by mutations in the thiazide-sensitive Na-Cl cotransporter. Nature Genetics, 12(1), 24-30.

Slatkin, M. (2008). Linkage disequilibrium-understanding the evolutionary past and

mapping the medical future. Nature Reviews. Genetics, 9(6), 477-485.

Song, L., Zhang, Z., Grasfeder, L. L., Boyle, A. P., Giresi, P. G., Lee, B.-K., Sheffield,

N. C., Gräf, S., Huss, M., and Keefe, D. (2011). Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Research, 21(10), 1757-1767.

Srivastava, A., Srivastava, N., and Mittal, B. (2016). Genetics of Obesity. Indian Journal

of Clinical Biochemistry, 31(4), 361-371.

Stitziel, N. O., Stirrups, K. E., Masca, N., Erdmann, J., Ferrario, P. G., König, I. R.,

Weeke, P. E., Webb, T. R., Auer, P. L., and Schick, U. M. (2016). Coding variation in ANGPTL4, LPL, and SVEP1 and the risk of coronary disease. The New England Journal of Medicine, 374(12), 1134-1144.

Stone, R., Schurman, S., Nayir, A., Alpay, H., Bakkaloglu, A., Rodriguez-Soriano,

I., Griswold, W., Richard, G. A., John, E., and Lifton, R. P. (1997). Mutations in the chloride channel gene, CLCNKB, cause Bartter's syndrome type III. Nature Genetics, 17, 171.

Subramanian, S. (2016). Europeans have a higher proportion of high-frequency

deleterious variants than Africans. Human Genetics, 135(1), 1-7.

Suwazono, Y., Kobayashi, E., Uetani, M., Miura, K., Morikawa, Y., Ishizaki, M., Kido,

T., Nakagawa, H., and Nogawa, K. (2006). Low-density lipoprotein receptor-related protein 5 variant A1330V is a determinant of blood pressure in Japanese males. Life Sciences, 78(21), 2475-2479.

Swager, S. A., Delfín, D. A., Rastogi, N., Wang, H., Canan, B. D., Fedorov, V. V.,

Mohler, P. J., Kilic, A., Higgins, R. S., and Ziolo, M. T. (2015). Claudin-5 levels are reduced from multiple cell types in human failing hearts and are associated with mislocalization of ephrin-B1. Cardiovascular Pathology, 24(3), 160-167.

Swapna, N., Vamsi, U. M., Usha, G., and Padma, T. (2011). Risk conferred by FokI

polymorphism of vitamin D receptor (VDR) gene for essential hypertension. Indian Journal of Human Genetics, 17(3), 201-206.

Switzer, N. J., Mangat, H. S., and Karmali, S. (2013). Current trends in obesity: body

composition assessment, weight regulation, and emerging techniques in managing severe obesity. Journal of Interventional Gastroenterology, 3(1), 34.

Tennessen, J. A., Bigham, A. W., O'Connor, T. D., Fu, W., Kenny, E. E., Gravel, S.,

McGee, S., Do, R., Liu, X., and Jun, G. (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337(6090), 64-69.

Tester, D. J., and Ackerman, M. J. (2014). Genetics of long QT syndrome. Methodist

DeBakey Cardiovascular Journal, 10(1), 29-33.

TG and HDL Working Group of the Exome Sequencing Project, National Heart, Lung, and Blood Institute. (2014).

Loss-of-function mutations in APOC3, triglycerides, and coronary disease. The New England Journal of Medicine, 371(1), 22-31.

Tran, P.-K., Agardh, H. E., Tran-Lundmark, K., Ekstrand, J., Roy, J., Henderson, B.,

Gabrielsen, A., Hansson, G. K., Swedenborg, J., and Paulsson-Berne, G. (2007). Reduced perlecan expression and accumulation in human carotid atherosclerotic lesions. Atherosclerosis, 190(2), 264-270.

Travis, J. M., Münkemüller, T., Burton, O. J., Best, A., Dytham, C., and Johst, K.

(2007). Deleterious mutations can surf to high densities on the wave front of an expanding population. Molecular Biology and Evolution, 24(10), 2334-2343.

Tschirner, A., Palus, S., Hetzer, R., Meyer, R., Anker, S. D., and Springer, J. (2014).

Six1 is down‐regulated in end‐stage human dilated cardiomyopathy independently of Ezh2. ESC Heart Failure, 1(2), 154-159.

Umedani, L. V., Chaudhry, B., Mehraj, V., and Ishaq, M. (2013). Serine threonine

kinase 39 gene single nucleotide AG polymorphism rs35929607 is weakly associated with essential hypertension in population of Tharparkar, Pakistan. Journal of the Pakistan Medical Association, 63(2), 199-205.

van der Bom, T., Bouma, B. J., Meijboom, F. J., Zwinderman, A. H., and Mulder, B. J.

(2012). The prevalence of adult congenital heart disease, results from a systematic review and evidence based calculation. American Heart Journal, 164(4), 568-575.

Varol, E., Ozaydin, M., Altinbas, A., Aslan, S. M., Dogan, A., and Dede, O. (2007).

Elevated carbohydrate antigen 125 levels in hypertrophic cardiomyopathy patients with heart failure. Heart and Vessels, 22(1), 30-33.

Waalen, J. (2014). The genetics of human obesity. Translational Research, 164(4),

293-301.

Wain, L. V. (2014). Rare variants and cardiovascular disease. Briefings in Functional

Genomics, 13(5), 384-391.

Webb, T. R., Erdmann, J., Stirrups, K. E., Stitziel, N. O., Masca, N. G., Jansen, H.,

Kanoni, S., Nelson, C. P., Ferrario, P. G., and König, I. R. (2017). Systematic evaluation of pleiotropy identifies 6 further loci associated with coronary artery disease. Journal of the American College of Cardiology, 69(7), 823-836.

Weir, B. S., and Cockerham, C. C. (1984). Estimating F-statistics for the analysis of

population structure. Evolution, 38(6), 1358-1370.

Wellcome Trust Case Control Consortium. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661-678.

Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A.,

Flicek, P., Manolio, T., and Hindorff, L. (2013). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research, 42(D1), D1001-D1006.

Willer, C. J., Sanna, S., Jackson, A. U., Scuteri, A., Bonnycastle, L. L., Clarke, R.,

Heath, S. C., Timpson, N. J., Najjar, S. S., and Stringham, H. M. (2008). Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nature Genetics, 40(2), 161-169.

Williams, T., Hundertmark, M., Kraemer, D., Schönberger, J., Czolbe, M., Panther, F.,

Pekarek, V., and Ritter, O. (2011). The Eya4/six1 Signalling Cascade is Crucial in the Development of Heart Disease. Circulation, 124(Suppl 21), A12702.

Wilson, F. H., Disse-Nicodeme, S., Choate, K. A., Ishikawa, K., Nelson-Williams, C.,

Desitter, I., Gunel, M., Milford, D. V., Lipkin, G. W., and Achard, J.-M. (2001). Human hypertension caused by mutations in WNK kinases. Science, 293(5532), 1107-1112.

Winnepenninckx, B., Backeljau, T., and DeWachter, R. (1993). Extraction of

high-molecular-weight DNA from mollusks. Trends in Genetics, 9(12), 407.

World Health Organization. (2016). Global Health Estimates 2015: Disease burden by

Cause, Age, Sex, by Country and by Region, 2000-2015. Retrieved from the World Health Organization website: http://www.who.int/healthinfo/global_burden_disease/estimates/en/index2.html

World Health Organization. (2017a). Cardiovascular diseases (CVDs), Fact sheet May

2017. Retrieved from the World Health Organization website: http://www.who.int/mediacentre/factsheets/fs317/en/.

World Health Organization. (2017b). Obesity and Overweight Fact Sheet October

2017. Retrieved from the World Health Organization website: http://www.who.int/mediacentre/factsheets/fs311/en/.

Xie, L., and Li, Y.-M. (2017). Lipoprotein Lipase (LPL) Polymorphism and the Risk of

Coronary Artery Disease: A Meta-Analysis. International Journal of Environmental Research and Public Health, 14(84), 1-7.

Xu, W., Wang, H., Cheng, W., Fu, D., Xia, T., Kibbe, W. A., and Lin, S. M. (2012). A

framework for annotating human genome in disease context. PLoS One, 7(12), e49686.

Xu, Y., Gong, W., Peng, J., Wang, H., Huang, J., Ding, H., and Wang, D. W. (2014).

Functional analysis LRP6 novel mutations in patients with coronary artery disease. PLoS One, 9(1), e84345.

Xue, Y., Chen, Y., Ayub, Q., Huang, N., Ball, E. V., Mort, M., Phillips, A. D., Shaw, K., Stenson, P. D., and Cooper, D. N. (2012). Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. The American Journal of Human Genetics, 91(6), 1022-1032.

Yang, H., and Wang, K. (2015). Genomic variant annotation and prioritization with

ANNOVAR and wANNOVAR. Nature Protocols, 10(10), 1556-1566.

Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., and

Bolund, L. (2006). WEGO: a web tool for plotting GO annotations. Nucleic Acids Research, 34(suppl_2), W293-W297.

Yeo, G. S. (2017). Genetics of obesity: can an old dog teach us new tricks?

Diabetologia, 60(5), 778-783.

Yu, C., Yan, Q., Fu, C., Shi, W., Wang, H., Zeng, C., and Wang, X. (2014). CYP4F2

genetic polymorphisms are associated with coronary heart disease in a Chinese population. Lipids in Health and Disease, 13(83), 1-5.

Yu, D., Chen, Y., Han, J., Zhang, H., Chen, X., Zou, W., Liang, L., Xu, C., and Liu, Z. (2008). MUC19 expression in human ocular surface and lacrimal gland and its alteration in Sjögren syndrome patients. Experimental Eye Research, 86(2), 403-411.

Yusuf, S., Hawken, S., Ôunpuu, S., Dans, T., Avezum, A., Lanas, F., McQueen, M., Budaj, A., Pais, P., and Varigos, J. (2004). Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. The Lancet, 364(9438), 937-952.

Zhou, C., Li, C., Zhou, B., Sun, H., Koullourou, V., Holt, I., Puckelwartz, M. J., Warren, D. T., Hayward, R., and Lin, Z. (2017). Novel nesprin-1 mutations associated with dilated cardiomyopathy cause nuclear envelope disruption and defects in myogenesis. Human Molecular Genetics, 26(12), 2258-2276.

Zlotorynski, E. (2015). Chromosome biology: CTCF-binding site orientation shapes the genome. Nature Reviews. Molecular Cell Biology, 16(10), 578-579. doi:10.1038/nrm4057

Chapter 7.0

Appendix Table 1: Cardiac diseases and their genes analyzed in this study

A. Common CVDs:

Cardiac Disease Gene Cardiac Disease Gene

Aneurysm ABCC6 Coronary heart disease CYP2J2

Aneurysm ABHD16A Coronary heart disease CYP3A4

Aneurysm ACE Coronary heart disease DAB2IP

Aneurysm ACTA2 Coronary heart disease EPHX1

Aneurysm ADIPOQ Coronary heart disease FCAR

Aneurysm AGT Coronary heart disease FGF21

Aneurysm AGTR1 Coronary heart disease FTL

Aneurysm APOA1 Coronary heart disease GCG

Aneurysm APOB Coronary heart disease GP1BA

Aneurysm APOE Coronary heart disease GP6

Aneurysm APOM Coronary heart disease GSN

Aneurysm BAG6 Coronary heart disease HFE

Aneurysm CAPN2 Coronary heart disease HSPA8

Aneurysm CASP3 Coronary heart disease IFNG

Aneurysm CCL22 Coronary heart disease IFNWP5

Aneurysm CCR5 Coronary heart disease IGHE

Aneurysm CD44 Coronary heart disease IL15

Aneurysm CD59 Coronary heart disease IL18BP

Aneurysm CHI3L1 Coronary heart disease IL4

Aneurysm COL3A1 Coronary heart disease INSIG1

Aneurysm CRP Coronary heart disease KAT2B

Aneurysm CTSB Coronary heart disease KCNJ11

Aneurysm EGR1 Coronary heart disease KIF6

Aneurysm ENG Coronary heart disease KLK3

Aneurysm FBLN5 Coronary heart disease MEF2A

Aneurysm FBN2 Coronary heart disease MTHFD1L

Aneurysm FGF1 Coronary heart disease NOD1

Aneurysm FGF2 Coronary heart disease NOS2

Aneurysm FLT1 Coronary heart disease NPC1

Aneurysm FN1 Coronary heart disease NPC1L1

Aneurysm GZMB Coronary heart disease NPPC

Aneurysm HGF Coronary heart disease NQO1

Aneurysm HMGB1 Coronary heart disease PON3

Aneurysm HOXA4 Coronary heart disease PPARD

Aneurysm HPSE Coronary heart disease SLC2A9

Aneurysm HSPA4 Coronary heart disease SUMO4

Aneurysm HSPB1 Coronary heart disease TCF7L2

Aneurysm IL2RA Coronary heart disease TFAP2B

Aneurysm IL8 Coronary heart disease THBS2

Aneurysm ITPR3 Coronary heart disease THRA

Aneurysm JDP2 Coronary heart disease TNFRSF1B

Aneurysm KDR Coronary heart disease VAMP8

Aneurysm KLF15 Coronary heart disease, transient cerebral ischemia GCKR

Aneurysm KLK1 Coronary heart disease INS-IGF2

Aneurysm LEP Endocarditis ITGA2B

Aneurysm LIMK1 Endocarditis PLCB2

Aneurysm LOX Gestational hypertension MIR499A

Aneurysm LRP1 Heart failure ABCB1

Aneurysm LTBP4 Heart failure ACY1

Aneurysm MMP10 Heart failure ADCY6

Aneurysm MMP8 Heart failure ADRA1A

Aneurysm MTHFR Heart failure ADRA1B

Aneurysm PF4 Heart failure ADRB3

Aneurysm PLA2G10 Heart failure ADRBK1

Aneurysm PLA2G2A Heart failure APLN

Aneurysm PLAT Heart failure AQP2

Aneurysm PPARG Heart failure ATP2A3

Aneurysm PRDX1 Heart failure AVPR1A

Aneurysm PRKCB Heart failure BDKRB1

Aneurysm PRKCD Heart failure BVES

Aneurysm PROC Heart failure CALCA

Aneurysm PRRC2A Heart failure CALCRL

Aneurysm PTGS2 Heart failure CAMK2D

Aneurysm RETN Heart failure CASP1

Aneurysm RTN4 Heart failure CAV3

Aneurysm SELE Heart failure CD34

Aneurysm SERBP1 Heart failure CEBPA

Aneurysm SERPINA5 Heart failure CFLAR

Aneurysm SERPINE1 Heart failure CKMT1B

Aneurysm TGFBR1 Heart failure CLDN5

Aneurysm TGFBR2 Heart failure CNR2

Aneurysm TIMP1 Heart failure CORIN

Aneurysm TIMP2 Heart failure CSF3

Aneurysm TIMP3 Heart failure CTF1

Aneurysm TNF Heart failure CTGF

Aneurysm TNFRSF11B Heart failure CTSG

Aneurysm XYLT1 Heart failure CYP27B1

Aortic valve disease 2 SMAD6 Heart failure CYP2D6

Arteriopathy AHSG Heart failure DDAH1

Arteriopathy ALPL Heart failure DUSP1

Arteriopathy APOH Heart failure DYRK1A

Arteriopathy APOL1 Heart failure ESRRA

Arteriopathy CD163 Heart failure FCGR3B

Arteriopathy CX3CR1 Heart failure FKBP1B

Arteriopathy ENPP1 Heart failure FOXC1

Arteriopathy F12 Heart failure FOXC2

Arteriopathy F2 Heart failure FOXO1

Arteriopathy F2RL2 Heart failure FOXP1

Arteriopathy F5 Heart failure FOXP4

Arteriopathy F7 Heart failure FRMD4B

Arteriopathy FGG Heart failure FSTL1

Arteriopathy GGT1 Heart failure FSTL3

Arteriopathy HIF1A Heart failure GATM

Arteriopathy HTRA1 Heart failure GNAQ

Arteriopathy ICAM1 Heart failure GRK5

Arteriopathy IL6 Heart failure HAMP

Arteriopathy INS Heart failure HDAC4

Arteriopathy ITGAV Heart failure HLA-B

Arteriopathy ITGB3 Heart failure HSPA1B

Arteriopathy LIPC Heart failure HTR4

Arteriopathy LPA Heart failure IL1RL1

Arteriopathy MPO Heart failure JPH2

Arteriopathy MTTP Heart failure KCNE1

Arteriopathy NOTCH3 Heart failure KCNH2

Arteriopathy NPPB Heart failure KCNQ1

Arteriopathy OSBPL10 Heart failure LAMA4

Arteriopathy PCSK9 Heart failure LCN2

Arteriopathy PDGFD Heart failure LGALS3

Arteriopathy PLA2G7 Heart failure LRG1

Arteriopathy PLTP Heart failure MAP4

Arteriopathy SCARB1 Heart failure MAPK14

Arteriopathy SPP1 Heart failure MDK

Arteriopathy TF Heart failure MIR199B

Arteriopathy TNFSF12 Heart failure MIR423

Arteriopathy UGT1A1 Heart failure MMP13

Arteriopathy VCAM1 Heart failure MUC16

Arteriopathy VKORC1 Heart failure MYBPC3

Atherosclerosis ABCA1 Heart failure MYL2

Atherosclerosis ABCD1 Heart failure MYL9

Atherosclerosis ABCG1 Heart failure NISCH

Atherosclerosis ABCG5 Heart failure NOL3

Atherosclerosis ABCG8 Heart failure NOS1

Atherosclerosis ACE2 Heart failure NOX5

Atherosclerosis ADAM10 Heart failure NUPR1

Atherosclerosis ADAM15 Heart failure OPA1

Atherosclerosis ADAM17 Heart failure OXT

Atherosclerosis ADAM33 Heart failure PAK1

Atherosclerosis ADAM8 Heart failure PARP1

Atherosclerosis ADAM9 Heart failure PDE5A

Atherosclerosis ADAMTS4 Heart failure PDK1

Atherosclerosis ADAMTS5 Heart failure POMC

Atherosclerosis ADIPOR2 Heart failure POSTN

Atherosclerosis ADM Heart failure PPP1R1A

Atherosclerosis ADRB2 Heart failure PPP1R2

Atherosclerosis AGER Heart failure PPRC1

Atherosclerosis AGTR2 Heart failure PRKAA2

Atherosclerosis AHR Heart failure PROM1

Atherosclerosis AKR1B1 Heart failure PTHLH

Atherosclerosis AKR1B10 Heart failure PTK2B

Atherosclerosis AKT1 Heart failure RAMP1

Atherosclerosis ALB Heart failure RAMP2

Atherosclerosis ALDH2 Heart failure RAMP3

Atherosclerosis ALOX15 Heart failure RAPGEF3

Atherosclerosis ALOX5 Heart failure REN

Atherosclerosis ALOX5AP Heart failure S100A1

Atherosclerosis ANGPT2 Heart failure S100B

Atherosclerosis AOC3 Heart failure SFTPB

Atherosclerosis APCS Heart failure SLC2A4

Atherosclerosis APH1B Heart failure SLC7A1

Atherosclerosis APOA1BP Heart failure SLC9A1

Atherosclerosis APOA4 Heart failure SOD3

Atherosclerosis APOA5 Heart failure SPATA5L1

Atherosclerosis APOBR Heart failure SRF

Atherosclerosis APOC1 Heart failure STAT1

Atherosclerosis APOC3 Heart failure STC1

Atherosclerosis AR Heart failure TBX5

Atherosclerosis ARG2 Heart failure TEX40

Atherosclerosis B2M Heart failure THBS1

Atherosclerosis BGLAP Heart failure TIMP4

Atherosclerosis BMP4 Heart failure TJP1

Atherosclerosis BRAP Heart failure TNFSF12-TNFSF13

Atherosclerosis BSG Heart failure TNNI1

Atherosclerosis CA12 Heart failure TRPC3

Atherosclerosis CA2 Heart failure TRPC6

Atherosclerosis CACNA1C Heart failure TTR

Atherosclerosis CAMP Heart failure UCN

Atherosclerosis CAPG Heart failure UCN2

Atherosclerosis CAPN10 Heart failure UNC93B1

Atherosclerosis CAT Heart failure YY1

Atherosclerosis CCL2 Heart failure ZFPM2

Atherosclerosis CCL23 Hypertension ACAT1

Atherosclerosis CCL5 Hypertension ACSM3

Atherosclerosis CCR2 Hypertension ACVRL1

Atherosclerosis CD14 Hypertension ADD1

Atherosclerosis CD36 Hypertension ADD2

Atherosclerosis CD40 Hypertension ADORA2B

Atherosclerosis CD86 Hypertension ADRA2A

Atherosclerosis CDH1 Hypertension ADRA2B

Atherosclerosis CDH13 Hypertension ALAD

Atherosclerosis CDH5 Hypertension ALOX12

Atherosclerosis CDKN1B Hypertension ANGPT1

Atherosclerosis CDKN1C Hypertension ANPEP

Atherosclerosis CDKN2A Hypertension APEX1

Atherosclerosis CDKN2B Hypertension APLNR

Atherosclerosis CETP Hypertension AQP4

Atherosclerosis CFH Hypertension ARG1

Atherosclerosis CHIT1 Hypertension ARHGEF1

Atherosclerosis CNR1 Hypertension ARHGEF6

Atherosclerosis CPB2 Hypertension ARL6IP5

Atherosclerosis CPE Hypertension ARSG

Atherosclerosis CPT1A Hypertension ATP1A1

Atherosclerosis CSF1 Hypertension ATP1A2

Atherosclerosis CST3 Hypertension ATP1B1

Atherosclerosis CTSS Hypertension ATP2B1

Atherosclerosis CX3CL1 Hypertension ATP5B

Atherosclerosis CXCL1 Hypertension BDKRB2

Atherosclerosis CXCL12 Hypertension BDNF

Atherosclerosis CXCL16 Hypertension BGN

Atherosclerosis CXCL5 Hypertension BLVRA

Atherosclerosis CXCR3 Hypertension BMP10

Atherosclerosis CYBA Hypertension BMP2

Atherosclerosis CYP19A1 Hypertension BMP7

Atherosclerosis CYP27A1 Hypertension BMPR1B

Atherosclerosis CYP2C19 Hypertension BMPR2

Atherosclerosis CYP2C9 Hypertension BTN2A1

Atherosclerosis DKK1 Hypertension C1QTNF1

Atherosclerosis ECE1 Hypertension CACNA1D

Atherosclerosis EDN1 Hypertension CACNB2

Atherosclerosis EDNRB Hypertension CASP8

Atherosclerosis EGF Hypertension CAV1

Atherosclerosis ELANE Hypertension CHEK2

Atherosclerosis ELAVL1 Hypertension CHGB

Atherosclerosis EPHX2 Hypertension CLCNKA

Atherosclerosis ESAM Hypertension CLCNKB

Atherosclerosis ESR1 Hypertension CLU

Atherosclerosis ESR2 Hypertension COMT

Atherosclerosis ETS2 Hypertension CPS1

Atherosclerosis F3 Hypertension CRHR1

Atherosclerosis F8 Hypertension CSK

Atherosclerosis FABP3 Hypertension CSMD1

Atherosclerosis FABP4 Hypertension CTH

Atherosclerosis FABP5 Hypertension CTNNB1

Atherosclerosis FASLG Hypertension CYP11A1

Atherosclerosis FCGR2A Hypertension CYP11B1

Atherosclerosis FCGR3A Hypertension CYP17A1

Atherosclerosis FGF23 Hypertension CYP1A2

Atherosclerosis FOXO3 Hypertension CYP21A2

Atherosclerosis FOXP3 Hypertension CYP3A5

Atherosclerosis FPR1 Hypertension CYP4A11

Atherosclerosis FPR2 Hypertension CYP4A22

Atherosclerosis GAS6 Hypertension CYP4F2

Atherosclerosis GDF15 Hypertension DBH

Atherosclerosis GHRL Hypertension DDAH2

Atherosclerosis GJA1 Hypertension DIO2

Atherosclerosis GJA4 Hypertension DNM1L

Atherosclerosis GNB3 Hypertension DPP4

Atherosclerosis GPT Hypertension DRD1

Atherosclerosis GRN Hypertension EDN3

Atherosclerosis GSTM1 Hypertension EGFR

Atherosclerosis GSTO1 Hypertension EMILIN1

Atherosclerosis GSTP1 Hypertension ENPEP

Atherosclerosis GSTT1 Hypertension EPO

Atherosclerosis H6PD Hypertension ERAP1

Atherosclerosis HABP2 Hypertension F11R

Atherosclerosis HAS2 Hypertension F2RL1

Atherosclerosis HAVCR2 Hypertension FBN1

Atherosclerosis HBA1 Hypertension FGA

Atherosclerosis HBEGF Hypertension FGB

Atherosclerosis HDAC5 Hypertension FGF5

Atherosclerosis HMGCR Hypertension FGFBP1

Atherosclerosis HMOX1 Hypertension FH

Atherosclerosis HNF1A Hypertension FMO3

Atherosclerosis HNRNPC Hypertension FURIN

Atherosclerosis HP Hypertension GCGR

Atherosclerosis HSPD1 Hypertension GCK

Atherosclerosis HSPG2 Hypertension GDF2

Atherosclerosis ICOS Hypertension GH1

Atherosclerosis IGF1 Hypertension GHR

Atherosclerosis IGF1R Hypertension GNA12

Atherosclerosis IGFALS Hypertension GNAS

Atherosclerosis IGFBP1 Hypertension GOSR2

Atherosclerosis IGFBP3 Hypertension GPX3

Atherosclerosis IKBKB Hypertension GPX4

Atherosclerosis IL18 Hypertension GREM1

Atherosclerosis IL1A Hypertension GRK4

Atherosclerosis IL1B Hypertension GSTA1

Atherosclerosis IL1RN Hypertension GSTM3

Atherosclerosis IL20 Hypertension GUCA2B

Atherosclerosis IL32 Hypertension HEY1

Atherosclerosis IL6ST Hypertension HLA-A

Atherosclerosis IL7R Hypertension HSD11B1

Atherosclerosis IRS2 Hypertension HSD11B2

Atherosclerosis ITGA2 Hypertension HSD3B1

Atherosclerosis ITGB5 Hypertension HSD3B2

Atherosclerosis ITLN1 Hypertension HSPA1A

Atherosclerosis JAK2 Hypertension HSPA1L

Atherosclerosis JAM3 Hypertension HTR2A

Atherosclerosis JUN Hypertension IAPP

Atherosclerosis KL Hypertension ID1

Atherosclerosis KLF2 Hypertension ID2

Atherosclerosis KLRK1 Hypertension IER3

Atherosclerosis LCAT Hypertension IGF2

Atherosclerosis LDLR Hypertension IL12B

Atherosclerosis LEPR Hypertension IL23R

Atherosclerosis LGALS1 Hypertension ILF3

Atherosclerosis LIPG Hypertension INHA

Atherosclerosis LPL Hypertension INHBA

Atherosclerosis LRP6 Hypertension INPPL1

Atherosclerosis LTBR Hypertension INSR

Atherosclerosis LTC4S Hypertension IRS1

Atherosclerosis MAPK7 Hypertension KCNA5

Atherosclerosis MBL2 Hypertension KCNK3

Atherosclerosis MERTK Hypertension KCNMA1

Atherosclerosis MGP Hypertension KCNMB1

Atherosclerosis MIF Hypertension KLC1

Atherosclerosis MIR130A Hypertension KLF5

Atherosclerosis MIR146A Hypertension KLHL3

Atherosclerosis MIR150 Hypertension KLKB1

Atherosclerosis MIR210 Hypertension KNG1

Atherosclerosis MIR27B Hypertension KYNU

Atherosclerosis MMP1 Hypertension LIPE

Atherosclerosis MMP12 Hypertension LRP5

Atherosclerosis MMP3 Hypertension LYZ

Atherosclerosis MNDA Hypertension MAOA

Atherosclerosis NAMPT Hypertension MAP1LC3B

Atherosclerosis NAT2 Hypertension MAPK1

Atherosclerosis NCEH1 Hypertension MAPK8

Atherosclerosis NFATC2 Hypertension MAT1A

Atherosclerosis NFE2L2 Hypertension MC4R

Atherosclerosis NGB Hypertension MEX3C

Atherosclerosis NGF Hypertension MFN2

Atherosclerosis NOX1 Hypertension MIR204

Atherosclerosis NPPA Hypertension MIR21

Atherosclerosis NPY Hypertension MLYCD

Atherosclerosis NR1D1 Hypertension MYOC

Atherosclerosis NR1H3 Hypertension NCF1C

Atherosclerosis NR3C2 Hypertension NEDD4L

Atherosclerosis NRG1 Hypertension NFKBIL1

Atherosclerosis OSBPL8 Hypertension NOX3

Atherosclerosis P2RY12 Hypertension NOX4

Atherosclerosis P2RY2 Hypertension NPR1

Atherosclerosis PALLD Hypertension NR0B1

Atherosclerosis PAPPA Hypertension NR1H4

Atherosclerosis PDE1A Hypertension NR3C1

Atherosclerosis PDE4D Hypertension OPTN

Atherosclerosis PDGFB Hypertension OTC

Atherosclerosis PDGFC Hypertension PCNA

Atherosclerosis PDPN Hypertension PDC

Atherosclerosis PEPD Hypertension PDGFRB

Atherosclerosis PGF Hypertension PDHA1

Atherosclerosis PGLYRP1 Hypertension PHOX2A

Atherosclerosis PLA2G3 Hypertension PIK3R1

Atherosclerosis PLA2G6 Hypertension PIM1

Atherosclerosis PLAU Hypertension PLEKHA7

Atherosclerosis PLAUR Hypertension PNMT

Atherosclerosis PLIN2 Hypertension POU5F1

Atherosclerosis PON1 Hypertension PRCP

Atherosclerosis PON2 Hypertension PRKG1

Atherosclerosis PPARA Hypertension PRSS8

Atherosclerosis PPARGC1A Hypertension PSMB9

Atherosclerosis PPIA Hypertension PSMD9

Atherosclerosis PRKCZ Hypertension PTGIS

Atherosclerosis PTGDS Hypertension PTPN1

Atherosclerosis PTGES Hypertension RETNLB

Atherosclerosis PTH Hypertension RGS2

Atherosclerosis PTPN22 Hypertension RHOB

Atherosclerosis PTX3 Hypertension RLN1

Atherosclerosis QSOX1 Hypertension RLN2

Atherosclerosis RARRES2 Hypertension RNLS

Atherosclerosis RBP4 Hypertension ROBO4

Atherosclerosis RGS5 Hypertension ROCK2

Atherosclerosis RHOA Hypertension ROS1

Atherosclerosis RNASE3 Hypertension S100A4

Atherosclerosis RNASE4 Hypertension SARS

Atherosclerosis ROCK1 Hypertension SCG2

Atherosclerosis RORA Hypertension SCNN1B

Atherosclerosis RSAD2 Hypertension SCNN1G

Atherosclerosis RTN3 Hypertension SDK1

Atherosclerosis S100A12 Hypertension SERPINA1

Atherosclerosis S100A8 Hypertension SERPINC1

Atherosclerosis S100A9 Hypertension SERPINF1

Atherosclerosis SAA1 Hypertension SFMBT1

Atherosclerosis SAMD9 Hypertension SGK1

Atherosclerosis SCARB2 Hypertension SLC12A2

Atherosclerosis SCD Hypertension SLC12A3

Atherosclerosis SELL Hypertension SLC22A1

Atherosclerosis SELP Hypertension SLC22A2

Atherosclerosis SELPLG Hypertension SLC22A3

Atherosclerosis SEPP1 Hypertension SLC22A6

Atherosclerosis SERPINA12 Hypertension SLC22A8

Atherosclerosis SERPIND1 Hypertension SLC26A4

Atherosclerosis SHBG Hypertension SLC2A12

Atherosclerosis SIRT1 Hypertension SLC2A5

Atherosclerosis SLC5A7 Hypertension SLC4A4

Atherosclerosis SLC6A4 Hypertension SLC6A18

Atherosclerosis SOCS1 Hypertension SLC6A19

Atherosclerosis SOCS3 Hypertension SLC6A2

Atherosclerosis SOD1 Hypertension SLC6A9

Atherosclerosis SOX18 Hypertension SLC8A1

Atherosclerosis SREBF2 Hypertension SLCO1B1

Atherosclerosis ST8SIA1 Hypertension SLCO4C1

Atherosclerosis STAT3 Hypertension SMAD1

Atherosclerosis SVEP1 Hypertension SMAD4

Atherosclerosis TERT Hypertension SMAD5

Atherosclerosis TFPI Hypertension SORBS1

Atherosclerosis TGFB1 Hypertension SREBF1

Atherosclerosis THBD Hypertension SRY

Atherosclerosis TLR2 Hypertension STEAP4

Atherosclerosis TLR4 Hypertension STK39

Atherosclerosis TNFRSF12A Hypertension SUCNR1

Atherosclerosis TNFRSF14 Hypertension TAP1

Atherosclerosis TNFRSF1A Hypertension TBX4

Atherosclerosis TNFRSF25 Hypertension TGFA

Atherosclerosis TNFSF10 Hypertension TGFB3

Atherosclerosis TNFSF11 Hypertension TGFBR3

Atherosclerosis TNFSF15 Hypertension TH

Atherosclerosis TNFSF4 Hypertension THPO

Atherosclerosis TNNT2 Hypertension TLR9

Atherosclerosis TOR2A Hypertension TNFAIP3

Atherosclerosis TP53 Hypertension TPH1

Atherosclerosis TREM1 Hypertension TRPC4

Atherosclerosis TRIB3 Hypertension TRPM6

Atherosclerosis TSPO Hypertension TRPM7

Atherosclerosis TXN Hypertension TRPV5

Atherosclerosis TYRO3 Hypertension TSHR

Atherosclerosis UCP1 Hypertension TXNIP

Atherosclerosis UCP2 Hypertension UMOD

Atherosclerosis USF1 Hypertension VDR

Atherosclerosis UTS2 Hypertension VIP

Atherosclerosis UTS2D Hypertension VNN1

Atherosclerosis UTS2R Hypertension WISP1

Atherosclerosis VEGFA Hypertension WNK1

Atherosclerosis VEGFC Hypertension WNK4

Atherosclerosis WNT5A Hypertension WWOX

Atherosclerosis XBP1 Hypertension XPNPEP1

Atherosclerosis XDH Hypertension ZNF652

Atherosclerosis YWHAZ Hypertension PLEKHA7

Atherosclerosis ZNF202 Hypertension, myocardial infarction GCLC

Atherosclerosis, coronary artery disease PECAM1 Hypertension, myocardial infarction 1 GUCY1A3

Atherosclerosis, myocardial infarction FAM5C Hypertriglyceridemia CREB3L3

Central core disease, tachycardia RYR1 Hypertriglyceridemia LMF1

Coronary artery disease ACAD10 Intracerebral haemorrhage PMF1-BGLAP

Coronary artery disease FMN2 Myocardial infarction BRINP3

Coronary artery disease FNDC1 Myocardial infarction GCLM

Coronary artery disease SEZ6L Myocardial infarction LGALS2

Coronary artery spasm 3, susceptibility to ARHGAP9 Myocardial infarction LTA

Coronary heart disease ABO Myocardial infarction MIAT

Coronary heart disease ADAMTS13 Myocardial infarction OLR1

Coronary heart disease ADH1C Myocardial infarction PSMA6

Coronary heart disease ADORA3 Myocardial infarction 1 CCT7

Coronary heart disease AMPD1 Myocardial infarction 1 LRP8

Coronary heart disease ANG Myocardial infarction, protection against|venous thrombosis, protection against F13A1

Coronary heart disease ANGPTL4 Myocardial ischemia CDKN1A

Coronary heart disease APOC2 Myocardial ischemia COL1A1

Coronary heart disease APOC4 Myocardial ischemia LDHA

Coronary heart disease APOC4-APOC2 Myocardial ischemia NCAM1

Coronary heart disease AS3MT Myocardial ischemia PNOC

Coronary heart disease ATP5J Myocardial ischemia RUNX1

Coronary heart disease AVP Myocardial ischemia SHH

Coronary heart disease C3 Myocardial ischemia TYMP

Coronary heart disease CDKN2B-AS1 Noonan syndrome with asd CBL

Coronary heart disease CMA1 Peripheral vascular disease GCH1

Coronary heart disease CNDP1 Pulmonic stenosis|supravalvar aortic stenosis|rasopathy LRRC56

Coronary heart disease CTSL Rheumatic heart disease IL1R1

Coronary heart disease CXCL10 Rheumatic heart disease KCND3

Coronary heart disease CXCR6 Rheumatic heart disease TLR5

Coronary heart disease CYP1A1 Supravalvar aortic stenosis ELN

Coronary heart disease CYP2C8

B. Mendelian and Congenital CVDs:

Cardiac Disease Gene Cardiac Disease Gene

Aortic aneurysm, familial thoracic 4, thoracic aortic aneurysms and aortic dissections MYH11 Dilated cardiomyopathy NFKB1

Aortic aneurysm, familial thoracic 6 ACTA2 Dilated cardiomyopathy NPPB

Aortic aneurysm, familial thoracic 8 PRKG1 Dilated cardiomyopathy NPPC

Aortic aneurysm, familial thoracic 9 MFAP5 Dilated cardiomyopathy NPR2

Aortic valve disease 2 SMAD6 Dilated cardiomyopathy OSM

Arrhythmogenic right ventricular cardiomyopathy CTF1 Dilated cardiomyopathy PGM1

Arrhythmogenic right ventricular cardiomyopathy DMD Dilated cardiomyopathy POLG

Arrhythmogenic right ventricular cardiomyopathy, type 1 TGFB3 Dilated cardiomyopathy RBM20

Arrhythmogenic right ventricular dysplasia 11 DSC3 Dilated cardiomyopathy SDHA

Arrhythmogenic right ventricular dysplasia 5 TMEM43 Dilated cardiomyopathy SERPINE1

Arrhythmogenic right ventricular dysplasia 9 PKP2 Dilated cardiomyopathy SLC25A5P8

Arterial calcification of infancy ENPP1 Dilated cardiomyopathy SOD2

Atherosclerosis, susceptibility to VEGFA Dilated cardiomyopathy STAT3

Atrial fibrillation, familial, 11 GJA5 Dilated cardiomyopathy TAZ

Atrial fibrillation, familial, 12 ABCC9 Dilated cardiomyopathy TGFB1

Atrial fibrillation, familial, 12 SUR2 Dilated cardiomyopathy TNC

Atrial fibrillation, familial, 14 SCN2B Dilated cardiomyopathy TNFRSF11B

Atrial fibrillation, familial, 15 NUP155 Dilated cardiomyopathy TNFSF10

Atrial fibrillation, familial, 7 KCNA5 Dilated cardiomyopathy TNFSF12

Atrial myxoma, familial PRKAR1A Dilated cardiomyopathy TP53

Abdominal obesity-metabolic syndrome 3 DYRK1B Dilated cardiomyopathy HLA-DQB1

Adams-oliver syndrome 1 ARHGAP31 Dilated cardiomyopathy HLA-DRB4

Amyloidosis, cardiac and cutaneous APOA1 Dilated cardiomyopathy TNF

Atrial septal defect 2 GATA4 Dilated cardiomyopathy TNFRSF12A

Atrial septal defect 3 MYH6 Dilated cardiomyopathy 1AA, Primary familial hypertrophic cardiomyopathy ACTN2

Atrial septal defect 4 TBX20 Dilated cardiomyopathy 1F, Limb-girdle muscular dystrophy, type 1E DNAJB6

Atrial septal defect 5 ACTC1 Dilated cardiomyopathy 1II CRYAB

Atrial septal defect 6 TLL1 Dilated cardiomyopathy 1LL, Left ventricular noncompaction 8 PRDM16

Atrial septal defect 6 ENTPD5 Dilated cardiomyopathy 1P, Familial hypertrophic cardiomyopathy 18, Sudden cardiac death, Dilated cardiomyopathy, Cardiac arrest CEP85L

Atrial septal defect 6 LTBP2 Dilated cardiomyopathy 1P, Familial hypertrophic cardiomyopathy 18, Sudden cardiac death, Dilated cardiomyopathy, Cardiac arrest PLN

Atrial septal defect 6 AREL1 Dilated cardiomyopathy, dilated cardiomyopathy RYR2

Atrial septal defect 6 ACOT2 Double-outlet right ventricle CFC1

Atrial septal defect 6 TTLL5 Double-outlet right ventricle- GDF1

Atrial septal defect 6 ANGEL1 Dursun syndrome G6PC3

Atrial septal defect 6 VRTN Ehlers-Danlos syndrome, autosomal recessive, cardiac valvular form COL1A2

Atrial septal defect 6 ZC2HC1C Essential hypertension PTGIS

Atrial septal defect 6 PAPLN Fabry disease, cardiac variant GLA

Atrial septal defect 6 ACOT4 Fabry disease, cardiac variant RPL36A-HNRNPH2

Atrial septal defect 6 PTGR2 Familial abdominal aortic aneurysm 1 MYLK

Atrial septal defect 6 PROX2 Familial hypertrophic cardiomyopathy 1 LDB3

Atrial septal defect 7, with or without av conduction defects NKX2-5 Familial hypertrophic cardiomyopathy 1, Myosin storage myopathy, Dilated cardiomyopathy 1S, Myopathy, distal, 1, Scapuloperoneal myopathy, MYH7-related MYH7

Atrial septal defect 8 CITED2 Familial hypertrophic cardiomyopathy 12, Dilated cardiomyopathy 1M CSRP3

Atrioventricular septal defect 3 GJA1 Familial hypertrophic cardiomyopathy 3, Primary familial hypertrophic cardiomyopathy, Sudden cardiac death, Cardiomyopathy TPM1

Axenfeld-rieger syndrome, type 3 FOXC1 Familial hypertrophic cardiomyopathy 4 MADD

Cafe au lait spots, multiple|atrial septal defect SOS1 Familial hypertrophic cardiomyopathy 7, Cardiomyopathy TNNI3

Cardiac arrest DSP Familial type 5 hyperlipoproteinemia APOA5

Cardiac arrhythmia AKAP9 Fatal infantile mitochondrial cardiomyopathy SDHD

Cardiac arrhythmia CACNA1C Fracture, hip, susceptibility to, Myocardial infarction, Thrombocytopenia, neonatal alloimmune, Posttransfusion purpura, PL(A1)/(A2) ALLOANTIGEN POLYMORPHISM ITGB3

Cardiac arrhythmia CACNA1C-AS1 Generalized arterial calcification of infancy 2 ABCC6

Cardiac arrhythmia HCN4 Glycogen storage disease II GAA

Cardiac arrhythmia JUP Gm1-gangliosidosis, type i, with cardiac involvement GLB1

Cardiac arrhythmia KCNE1 Hirschsprung disease, cardiac defects, and autonomic dysfunction ECE1

Cardiac arrhythmia KCNJ2 Histiocytosis-lymphadenopathy plus syndrome SLC29A3

Cardiac arrhythmia KCNJ5 Histiocytosis-lymphadenopathy plus syndrome PRF1

Cardiac arrhythmia KCNQ1 Human immunodeficiency virus type 1, rapid progression to AIDS, Coronary artery disease, resistance to, Age-related macular degeneration 12, MACULAR DEGENERATION, AGE-RELATED, 12, SUSCEPTIBILITY TO CX3CR1

Cardiac arrhythmia SCN3B Hyperlipidemia, familial combined, susceptibility to USF1

Cardiac arrhythmia SCN4B Hyperlipidemia, familial combined, Coronary heart disease LPL

Cardiac arrhythmia, atrial fibrillation, familial, 13 SCN1B Hyperlipoproteinemia, type Ib APOC2

Cardiac arrhythmia, atrial fibrillation, familial, 6 NPPA Hyperlipoproteinemia, type id GPIHBP1

Cardiac arrhythmia, dilated cardiomyopathy, atrial fibrillation, familial, 10 SCN5A Hypertension, early-onset, autosomal dominant, with exacerbation in pregnancy NR3C2

Cardiac arrhythmia, long qt syndrome 12 SNTA1 Hypertension, essential PTGIS

Cardiac arrhythmia, long qt syndrome 6 KCNE2 Hypertension, essential, susceptibility to AGTR1

Cardiac arrhythmia|arrhythmia ANK2 Hypertension, essential, susceptibility to, Crohn disease, association with AGT

Cardiac arrhythmia|not specified CASQ2 Hypertrophic cardiomyopathy ACE2

Cardiac arrhythmia|not specified DSG2 Hypertrophic cardiomyopathy AGTR2

Cardiac conduction defect, susceptibility to AKAP10 Hypertrophic cardiomyopathy AR

Cardiac conduction disease with or without dilated cardiomyopathy FPGT-TNNI3K Hypertrophic cardiomyopathy BIRC5

Cardiac conduction disease with or without dilated cardiomyopathy TNNI3K Hypertrophic cardiomyopathy COX15

Cardiac valvular dysplasia, x-linked FLNA Hypertrophic cardiomyopathy CYP11B2

Cardiac valvular dysplasia, x-linked FOS Hypertrophic cardiomyopathy DMPK

Cardiac conduction defect, susceptibility to AKAP10 Hypertrophic cardiomyopathy EDN2

Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency 3 COA5 Hypertrophic cardiomyopathy FHL1

Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency 4 COA6 Hypertrophic cardiomyopathy FXN

Cardiofaciocutaneous syndrome BRAF Hypertrophic cardiomyopathy HLA-DPB1

Cardiofaciocutaneous syndrome 3 MAP2K1 Hypertrophic cardiomyopathy IGF1R

Cardiofaciocutaneous syndrome 4 MAP2K2 Hypertrophic cardiomyopathy IGFBP1

Cardiomyopathy ALMS1 Hypertrophic cardiomyopathy IGFBP3

Cardiomyopathy ANKRD1 Hypertrophic cardiomyopathy MMP2

Cardiomyopathy CAV3 Hypertrophic cardiomyopathy MRPL3

Cardiomyopathy DTNA Hypertrophic cardiomyopathy MRPS22

Cardiomyopathy EMD Hypertrophic cardiomyopathy MTO1

Cardiomyopathy GATAD1 Hypertrophic cardiomyopathy MUC16

Cardiomyopathy HOPX Hypertrophic cardiomyopathy MYBPC2

Cardiomyopathy ILK Hypertrophic cardiomyopathy MYL3

Cardiomyopathy JPH2 Hypertrophic cardiomyopathy MYOZ2

Cardiomyopathy MYL2 Hypertrophic cardiomyopathy MYOZ2

Cardiomyopathy PDLIM3 Hypertrophic cardiomyopathy NDUFV2

Cardiomyopathy SSUH2 Hypertrophic cardiomyopathy OBSCN

Cardiomyopathy SYNE1 Hypertrophic cardiomyopathy PRKAG2

Cardiomyopathy TTR Hypertrophic cardiomyopathy PTPN11

Cardiomyopathy FLNC Hypertrophic cardiomyopathy PYGB

Cardiomyopathy, dilated, 1cc NEXN Hypertrophic cardiomyopathy RAF1

Cardiomyopathy, dilated, 1d TNNT2 Hypertrophic cardiomyopathy SCO2

Cardiomyopathy, dilated, 1dd RBM20 Hypertrophic cardiomyopathy SLC25A3

Cardiomyopathy, dilated, 1l SGCD Hypertrophic cardiomyopathy SMARCA4

Cardiomyopathy, dilated, 1nn RAF1 Hypertrophic cardiomyopathy TMEM70

Cardiomyopathy, dilated, 1nn RAF1 Hypertrophic cardiomyopathy TNNT1

Cardiomyopathy, dilated, 1v PSEN2 Hypertrophic cardiomyopathy VWF

Cardiomyopathy, dilated, 1w; cardiomyopathy, hypertrophic, 15 VCL Infections, recurrent, associated with encephalopathy, hepatic dysfunction, and cardiovascular malformations FADD

Cardiomyopathy, dilated, 1z TNNC1 Lchad Deficiency, dilated cardiomyopathy HADHA

Cardiomyopathy, hypertrophic, 1, digenic MYLK2 Left ventricular noncompaction cardiomyopathy CTNNA3

Cardiomyopathy, hypertrophic, 16 MYOZ2 LEOPARD syndrome 1 PTPN11

Cardiomyopathy, restrictive|long qt syndrome ESR2 Linear skin defects with multiple congenital anomalies 3 NDUFB11

Cardiomyopathy|not provided FKTN Long qt syndrome 13 C11orf45

Cardiomyopathy|not provided TCAP Long QT syndrome 15 CALM2

Cardiomyopathy|not specified DES Long QT syndrome 2, acquired, susceptibility to KCNH2

Cardiomyopathy|not specified EYA4 Long QT syndrome, acquired, reduced susceptibility to ALG10

Cataract and cardiomyopathy AGK Mckusick-Kaufman syndrome MKKS

Catecholaminergic polymorphic ventricular tachycardia|ventricular tachycardia, catecholaminergic polymorphic, 4 CALM1 Microvascular complications of diabetes 3, Ischemic stroke, susceptibility to, Myocardial infarction, Stroke, hemorrhagic, susceptibility to ACE

Charge syndrome SEMA3E Mitochondrial DNA depletion syndrome 12 (cardiomyopathic type) SLC25A4

Chime syndrome PIGL Craniofacial dysmorphism, and congenital heart defects B3GAT3

Chops syndrome AFF4 Mycobacterium tuberculosis, susceptibility to, Spina bifida, susceptibility to, Coronary artery disease, modifier of, Coronary artery disease, development of, in hiv CCL2

Chronic atrial and intestinal dysrhythmia SGOL1 Myocardial infarction, Atherosclerosis, susceptibility to, HDL cholesterol, augmented response of ESR1

Combined oxidative phosphorylation deficiency 8 AARS2 Myocardial infarction, susceptibility to PSMA6

Congenital aneurysm of ascending aorta|loeys-dietz syndrome|loeys-dietz syndrome TGFBR2 Myocardial infarction, decreased susceptibility to F7

Congenital heart defect CRELD1 Myocardial infarction, protection against F13A1

Congenital heart defect EDNRA Myocardial infarction, susceptibility to 1 LRP8

Congenital heart defect GATA4 Myocardial infarction, susceptibility to 10 LGALS2

Congenital heart defect GATA6 Myocardial infarction, susceptibility to 2 GCLM

Congenital heart defect JAG1 Myocardial infarction, susceptibility to 3 TNFSF4

Congenital heart defect MTHFD1 Myocardial infarction, susceptibility to 4 LTA

Congenital heart defect STRA6 Myocardial infarction, susceptibility to 5 GCLC

Congenital heart defect CBS Myocardial infarction, susceptibility to 7 OLR1

Congenital heart defect JARID2 Myocardial infarction, susceptibility to 9 MIAT

Congenital heart defect MTRR Myopathy, early-onset, with fatal cardiomyopathy TTN-AS1

Congenital heart defects, nonsyndromic, 1, x-linked ZIC3 Myopathy, spheroid body MYOT

Congenital heart defects, hamartomas of tongue, and polysyndactyly WDPCP not provided, Cardiac arrest DSC2

Congenital heart defects, multiple types, 4 NR2F2 Oculo-facial-cardiac-dental syndrome (OFCD) BCOR

Congenital heart defects, nonsyndromic, 2 TAB2 Orthostatic intolerance SLC6A2

Congenital heart defects, with multiple joint dislocations RTN4 Paroxysmal atrial fibrillation LMNA

Congestive heart failure and beta-blocker response, modifier of ADRA2C Pericardial constriction and growth failure TRIM37

Congestive heart failure and beta-blocker response, modifier of ADRB1 Primary dilated cardiomyopathy BAG3

Conotruncal anomaly face syndrome TBX1 Primary dilated cardiomyopathy FHL2

Conotruncal heart malformations NKX2-6 Primary dilated cardiomyopathy PSEN2

Conotruncal heart malformations, variable NKX2-5 Primary dilated cardiomyopathy SYNE2

Coronary artery spasm 1, susceptibility to NOS3 Primary dilated cardiomyopathy, Cardiomyopathy, dilated, 1U, Heart failure PSEN1

Coronary heart disease 2 IL1B Primary dilated cardiomyopathy, Combined oxidative phosphorylation deficiency 3 TSFM

Coronary heart disease 6 MMP3 Primary dilated cardiomyopathy, Familial hypertrophic cardiomyopathy 4, Primary familial hypertrophic cardiomyopathy, Left ventricular noncompaction 10, Cardiomyopathy, Paroxysmal atrial fibrillation MYBPC3

Dilated cardiomyopathy ADIPOQ Primary dilated cardiomyopathy, Primary familial hypertrophic cardiomyopathy MYOM1

Dilated cardiomyopathy ADORA1 Primary dilated cardiomyopathy, Primary familial hypertrophic cardiomyopathy, Sudden cardiac death, Cardiomyopathy, Left ventricular noncompaction cardiomyopathy RBM20

Dilated cardiomyopathy ADRB2 Primary familial hypertrophic cardiomyopathy ACADVL

Dilated cardiomyopathy CD40LG Primary familial hypertrophic cardiomyopathy CALR3

Dilated cardiomyopathy CD46 Primary familial hypertrophic cardiomyopathy DLG4

Dilated cardiomyopathy CDH2 Primary familial hypertrophic cardiomyopathy DLST

Dilated cardiomyopathy CHGA Primary familial hypertrophic cardiomyopathy KRAS

Dilated cardiomyopathy CHRM2 Primary familial hypertrophic cardiomyopathy LAMA4

Dilated cardiomyopathy CRP Primary familial hypertrophic cardiomyopathy MIB1

Dilated cardiomyopathy CTLA4 Primary familial hypertrophic cardiomyopathy TMPO

Dilated cardiomyopathy CXADR Primary familial hypertrophic cardiomyopathy TMPO-AS1

Dilated cardiomyopathy DAG1 Primary familial hypertrophic cardiomyopathy TRIM63

Dilated cardiomyopathy DCAF4 Primary familial hypertrophic cardiomyopathy TXNRD2

Dilated cardiomyopathy DNAJA3 Primary familial hypertrophic cardiomyopathy, Left ventricular noncompaction cardiomyopathy, Dilated cardiomyopathy 1G TTN

Dilated cardiomyopathy DNAJC19 Primary pulmonary hypertension 2 SMAD9

Dilated cardiomyopathy DVL2 Progressive familial heart block, type IB TRPM4

Dilated cardiomyopathy EDNRB Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia ACVRL1

Dilated cardiomyopathy ELMSAN1 Pulmonary hypoplasia-diaphragmatic hernia-anophthalmia-cardiac defect (PDAC) syndrome RARB

Dilated cardiomyopathy ERBB2 Ritscher-Schinzel syndrome KIAA0196

Dilated cardiomyopathy FAS Sensorineural deafness with hypertrophic cardiomyopathy MYO6

Dilated cardiomyopathy FKRP Stroke, susceptibility to ALOX5AP

Dilated cardiomyopathy HFE Sudden cardiac death, Cardiomyopathy, not provided GPD1L

Dilated cardiomyopathy HLA-G TARP syndrome RBM10

Dilated cardiomyopathy IFNG T-cell immunodeficiency, recurrent infections, and autoimmunity with or without cardiac malformations STK4

Dilated cardiomyopathy IFT43 Tetralogy of Fallot, Atrial septal defect 9 GATA6

Dilated cardiomyopathy IGF1 Thoracic aortic aneurysms and aortic dissections COL5A1

Dilated cardiomyopathy IL10 Thoracic aortic aneurysms and aortic dissections TGFBR1

Dilated cardiomyopathy IL17A Thrombophilia due to factor V Leiden, Ischemic stroke, susceptibility to F5

Dilated cardiomyopathy IL6 Thrombophilia, Ischemic stroke, susceptibility to F2

Dilated cardiomyopathy ITPR2 Ventricular septal defect HEATR4

Dilated cardiomyopathy LAMA2 Ventricular septal defect MLH3

Dilated cardiomyopathy LAMA3 Ventricular septal defect NEK9

Dilated cardiomyopathy LAMP2 Ventricular septal defect NOTCH1

Dilated cardiomyopathy MIR208A Ventricular septal defect IRX4

Dilated cardiomyopathy MMP1 Ventricular septal defect NFATC1

Dilated cardiomyopathy MMP14 Ventricular septal defect SALL4

Dilated cardiomyopathy MMP9 Ventricular septal defect TBX5

Dilated cardiomyopathy MURC Ventricular septal defect 1 GATA4

Dilated cardiomyopathy MYBPC1 Ventricular tachycardia, catecholaminergic polymorphic, 3 UNC5B

Dilated cardiomyopathy MYPN Ventricular tachycardia, catecholaminergic polymorphic, 5, with or without muscle weakness TRDN

Dilated cardiomyopathy NDUFV1 Ventricular tachycardia, somatic GNAI2

Dilated cardiomyopathy NEBL Ventricular tachycardia, somatic GPATCH2L
