Introduction to the concept of functional genomics
David Meyre, Associate Professor, McMaster University
HRM 728 Graduate Course: Genetic Epidemiology – October 24th, 2014
Population Genomics Program
What Is Functional Genomics?
The goal of functional genomics is to understand the relationship between an organism’s genome and its phenotype.
Functional genomics is a field of molecular biology that uses the vast amount of data produced by genome sequencing projects to describe genome function. It relies on high-throughput techniques such as DNA microarrays, proteomics, epigenomics, metagenomics, metabolomics and mutation analysis to describe the function and interactions of genes.
The genomic revolution
Figure: human genome sequence; high-throughput technologies; large human biobanks; biostatistics & bioinformatics; GENES; FUNCTIONAL GENOMICS
Gene identification approaches
Figure: genome-wide linkage; candidate gene; homozygosity mapping; genome-wide association; full exome / genome sequencing; GENES
Classification of human genetic diseases (example: obesity)
. Syndromic disease (< 0.004%)
. Monogenic disease (< 2%)
. Polygenic disease (~20%)
Genes and causality
SYNDROMIC / MONOGENIC DISEASE
Beyond co-segregation studies, additional arguments (functional genomics) are needed to demonstrate the causal role of a mutation in the disease
Genes and causality
POLYGENIC DISEASE (e.g. type 2 diabetes)
Beyond association studies, additional arguments (functional genomics) are needed to demonstrate the causal role of a variant / gene in the disease
Sladek et al., Nature 2007
TRANS-ETHNIC FINE MAPPING APPROACH
Roadmap: gene / locus identification (hypothesis-free approaches, candidate gene) → variant identification (fine mapping) → functional prediction (in silico) → functional validation (in vitro, in vivo)
We are here: fine mapping
Trans-ethnic fine mapping approach
. Linkage disequilibrium is the non-random association of alleles at two or more loci
. The human genome is composed of blocks of linkage disequilibrium
. The extent of linkage disequilibrium blocks varies with ethnic background
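To make "non-random association of alleles" concrete, the sketch below computes the three usual pairwise LD measures (D, Lewontin's D' and r²) for two biallelic loci. It assumes phased haplotype frequencies are already known; the function name and example frequencies are illustrative, not taken from the lecture.

```python
"""Pairwise linkage disequilibrium between two biallelic loci (sketch).

Assumes the frequency of haplotype A-B and the allele frequencies of
A (locus 1) and B (locus 2) are known, e.g. from phased data.
"""


def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for haplotype frequency p_ab and allele frequencies p_a, p_b."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: observed haplotype frequency minus the frequency expected
    # if alleles at the two loci associated at random (p_a * p_b).
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2: squared correlation between the allelic indicators at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Illustrative frequencies: alleles A and B each at 0.6, haplotype AB at 0.5.
print(ld_measures(0.5, 0.6, 0.6))
```

r² is the measure most often used to define LD blocks and tag SNPs, because it directly reflects how well one SNP predicts the other.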
Figure: extent of the disease-associated LD block (distance in kb) in Icelandic, French, Asian and African populations across SNP1, SNP2, SNP3, SNP4 and SNP5, with the causal SNP indicated
Trans-ethnic fine mapping approach
. Large-scale resequencing and case-control association studies in Icelandic, Danish, West African and African American subjects identified rs7903146 as the likely causal type 2 diabetes-associated SNP
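The logic of trans-ethnic fine mapping can be sketched as a set intersection: a causal SNP should remain in strong LD with the association signal in every population, and populations with shorter LD blocks (here African samples) remove most candidates. Everything below is hypothetical (SNP names, r² values, the 0.8 threshold), and real analyses combine per-population association statistics rather than LD alone.

```python
"""Trans-ethnic fine-mapping sketch: keep only SNPs that stay in strong LD
with the association signal in every population.

All SNP names, r^2 values and the 0.8 threshold are hypothetical.
"""

R2_THRESHOLD = 0.8  # assumed cut-off for "inside the disease-associated LD block"

# r^2 between each SNP and the association signal, per population
# (hypothetical values mimicking shorter LD blocks in African samples).
ld_with_signal = {
    "Icelandic": {"SNP1": 0.85, "SNP2": 0.90, "SNP3": 1.00, "SNP4": 0.88, "SNP5": 0.82},
    "French": {"SNP1": 0.70, "SNP2": 0.92, "SNP3": 1.00, "SNP4": 0.86, "SNP5": 0.81},
    "Asian": {"SNP1": 0.60, "SNP2": 0.85, "SNP3": 1.00, "SNP4": 0.83, "SNP5": 0.40},
    "African": {"SNP1": 0.20, "SNP2": 0.35, "SNP3": 1.00, "SNP4": 0.50, "SNP5": 0.10},
}


def candidate_causal_snps(ld_tables: dict[str, dict[str, float]], threshold: float) -> set[str]:
    """Intersect, across populations, the SNPs whose r^2 with the signal >= threshold."""
    per_population = [
        {snp for snp, r2 in table.items() if r2 >= threshold}
        for table in ld_tables.values()
    ]
    return set.intersection(*per_population)


print(sorted(candidate_causal_snps(ld_with_signal, R2_THRESHOLD)))
```

Dropping the African table from the input leaves three candidates instead of one, which is the intuition behind adding populations with different LD structure.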
Gene candidacy
. Are genes in the disease-associated LD block involved in syndromic / monogenic forms of the same disease?
-loci associated with polygenic obesity: MC4R, BDNF, POMC, PCSK1, SIM1
-GWAS for complex traits: 20% of GWAS loci include genes involved in Mendelian disorders of the same trait
. Are genes in the disease-associated LD block involved in a corresponding phenotype in animal models (KO, Tg, siRNA)?
-loci associated with polygenic obesity: MC4R, BDNF, POMC, PCSK1, SIM1, FTO, GIPR, NPC1, SH2B1, TBC1D1, NEGR1
-more than 170 genes induce a severe obesity phenotype in genetic mouse models
. Gene function and biology
-e.g. function related to energy metabolism
Gene candidacy
In order to find the causal gene in a disease-associated linkage disequilibrium block, mRNA expression studies can be useful (microarrays, RT-PCR):
1-Is the gene expressed in target tissues for the disease (obesity: brain, adipocytes; T2D: pancreas)?
2-Is the gene mRNA expression modulated by the disease status in a relevant tissue?
3-Is the gene mRNA expression modulated by the disease-associated SNP in a relevant tissue?
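Question 3 is a cis-eQTL test. A minimal version regresses mRNA expression on the number of copies (0, 1, 2) of the disease-associated allele. This sketch assumes an additive model, ordinary least squares and no covariates; the expression values are invented for illustration.

```python
"""Minimal cis-eQTL test: regress mRNA expression on genotype dosage
(0, 1 or 2 copies of the disease-associated allele).

Assumptions: additive model, ordinary least squares, no covariates;
the data below are hypothetical.
"""
import math


def eqtl_ols(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Return (slope, t statistic) for the model expression ~ dosage."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom gives the slope's standard error.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosage, expression))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical expression levels (arbitrary units) in a relevant tissue or cell line.
dosage = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.8, 5.3, 6.0, 6.2, 5.7, 6.1, 7.0, 6.8, 7.3]
slope, t = eqtl_ols(dosage, expression)
print(f"slope = {slope:.2f} per allele, t = {t:.1f}")
```

A positive slope means each copy of the risk allele raises expression; the t statistic would be compared with a t distribution on n - 2 degrees of freedom.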
Gene candidacy
. ORMDL3 is one of the 19 genes located in the asthma-associated LD block
. ORMDL3 is expressed in the lung
. ORMDL3 mRNA level is modulated by asthma disease status in lymphoblastoid cell lines
. ORMDL3 mRNA level is strongly modulated by the asthma-associated SNP in lymphoblastoid cell lines
ORMDL3 is a highly relevant candidate gene at this locus
Moffatt et al., Nature 2007
Gene candidacy
. Combination of mRNA expression and GWAS studies
. 27 genes differentially regulated in adipose tissue of monozygotic twins discordant for obesity
. ‘Hypothesis driven’ GWAS analysis for these 27 genes followed by a replication in a second independent sample identified a novel obesity gene: F13A1
Naukkarinen et al., PLOS Genet 2010
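One common rationale for a "hypothesis-driven" GWAS analysis is the lighter multiple-testing burden. The arithmetic below assumes a Bonferroni correction and a hypothetical average of 20 tested SNPs per candidate gene; neither value comes from the study, which may have used a different correction.

```python
"""Why restricting a GWAS to candidate genes can help: fewer tests means
a less stringent multiple-testing threshold.

Assumptions (not from the study): Bonferroni correction, alpha = 0.05,
and a hypothetical average of 20 tested SNPs per candidate gene.
"""
ALPHA = 0.05
GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level
N_CANDIDATE_GENES = 27  # genes differentially expressed in discordant twins
SNPS_PER_GENE = 20  # hypothetical


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests


candidate_threshold = bonferroni_threshold(ALPHA, N_CANDIDATE_GENES * SNPS_PER_GENE)
print(f"candidate-gene threshold: {candidate_threshold:.1e}")
print(f"genome-wide threshold:    {GENOME_WIDE_THRESHOLD:.1e}")
print(f"fold less stringent: {candidate_threshold / GENOME_WIDE_THRESHOLD:.0f}x")
```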
GENE VARIANT AND FUNCTION
Gene variant and function
Two major types of variants:
1- Variants that affect the structure and function of the protein encoded by the gene in which they occur:
Missense, nonsense, frameshift (indels) coding mutations: altered protein function
Mutations at intron / exon junctions and splicing branch points: exon skipping / addition
Gene variant and function
2- Variants that affect expression and regulation of the gene in which they occur or other distal genes (eQTLs)
gene variant in the promoter (transcription factor binding site): change in gene expression
gene variant in 3’UTR: altered mRNA stability
gene variant in microRNA binding sites: change in expression
gene variant in enhancers / silencers/ insulators: change in expression in a distal gene (or a group of genes)
gene variant in a CpG methylation site: change in DNA methylation pattern
Copy Number Variants (CNV): modulation of gene expression, haplo-insufficiency
How to prove causality between a genetic variant and a biological effect?
In silico prediction studies for coding variants
+: deleterious
-: neutral
Eight coding non-synonymous mutations in the PCSK1 gene have been identified in patients with extreme obesity: the PolyPhen-2 software (conservation of the amino acid across evolution + protein structure) is 100% concordant with in vitro studies
Mutation   PolyPhen-2   PANTHER   SIFT   SNAP   PMUT
K26E       -            -         -      -      -
M125I      +            -         -      -      -
T175M      +            +         +      +      +
N180S      +            +         +      +      -
Y181H      +            +         +      +      -
G226R      +            +         -      +      +
S325N      +            +         +      +      -
T558A      +            NA        -      -      -
G593R      +            NA        +      +      +
Creemers et al., Diabetes 2012
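Since PolyPhen-2 is reported to be 100% concordant with the in vitro studies, its calls can serve as the reference for scoring the other predictors in the table. The sketch below does exactly that, skipping "NA" entries (read here as "no prediction available"); the agreement counts follow directly from the table.

```python
"""Concordance of in silico predictors with PolyPhen-2 in the PCSK1 table.

PolyPhen-2 calls stand in for the in vitro results (reported 100% concordant).
'+' = deleterious, '-' = neutral, 'NA' = no prediction (excluded).
"""
TOOLS = ["PolyPhen-2", "PANTHER", "SIFT", "SNAP", "PMUT"]
CALLS = {
    "K26E": ["-", "-", "-", "-", "-"],
    "M125I": ["+", "-", "-", "-", "-"],
    "T175M": ["+", "+", "+", "+", "+"],
    "N180S": ["+", "+", "+", "+", "-"],
    "Y181H": ["+", "+", "+", "+", "-"],
    "G226R": ["+", "+", "-", "+", "+"],
    "S325N": ["+", "+", "+", "+", "-"],
    "T558A": ["+", "NA", "-", "-", "-"],
    "G593R": ["+", "NA", "+", "+", "+"],
}


def concordance(tool: str, reference: str = "PolyPhen-2") -> tuple[int, int]:
    """Return (agreeing calls, compared calls) between `tool` and `reference`."""
    i, j = TOOLS.index(tool), TOOLS.index(reference)
    compared = [(c[i], c[j]) for c in CALLS.values() if "NA" not in (c[i], c[j])]
    agree = sum(1 for a, b in compared if a == b)
    return agree, len(compared)


for tool in TOOLS[1:]:
    agree, total = concordance(tool)
    print(f"{tool}: {agree}/{total} calls match PolyPhen-2")
```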
In silico prediction studies for regulatory variants
• Combine in silico and indirect experimental data, e.g. ANNOVAR, FunciSNP, PMCA, GWAS3D
• These tools attribute a score to each variant in the LD block of the genomic region thought to cause the phenotype and predict its functionality based on its proximity to:
- transcription factor binding sites and cis-regulatory modules
- phylogenetically conserved sites
- specific epigenetic marks (e.g. enhancer / silencer / insulator-specific proteins, promoter proteins, DNA methylation, DNase hypersensitivity …)
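The scoring idea can be illustrated with a toy version: count how many annotation tracks overlap each variant in the LD block and rank variants by that count. This is not how ANNOVAR, FunciSNP, PMCA or GWAS3D actually score variants; the track names, intervals, positions and equal weighting are all hypothetical.

```python
"""Toy regulatory-variant score: number of annotation tracks overlapping
each variant of an LD block.

Hypothetical throughout: coordinates, track names and equal weights are
illustrative and do not reproduce any published tool.
"""
Interval = tuple[int, int]  # half-open [start, end) genomic coordinates

TRACKS: dict[str, list[Interval]] = {
    "tf_binding_site": [(1_000, 1_050), (2_300, 2_340)],
    "conserved_element": [(990, 1_200)],
    "dnase_hypersensitive": [(950, 1_100), (3_000, 3_400)],
}


def overlaps(pos: int, intervals: list[Interval]) -> bool:
    """True if the position falls inside any interval."""
    return any(start <= pos < end for start, end in intervals)


def regulatory_score(pos: int) -> int:
    """Number of annotation tracks that contain the variant position."""
    return sum(overlaps(pos, intervals) for intervals in TRACKS.values())


# Hypothetical variants in the disease-associated LD block (name -> position).
ld_block_variants = {"rsA": 1_020, "rsB": 2_310, "rsC": 3_100, "rsD": 4_000}
ranked = sorted(ld_block_variants, key=lambda v: regulatory_score(ld_block_variants[v]), reverse=True)
for name in ranked:
    print(name, regulatory_score(ld_block_variants[name]))
```

Real tools weight tracks differently and use cell-type-specific data, but the principle of prioritizing variants that sit in several functional annotations at once is the same.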
EVOLUTIONARY GENETICS
Evolutionary genetics
Natural selection is the gradual, non-random process by which biological traits become either more or less common in a population as a function of differential reproduction of their bearers. It is a key mechanism of evolution. The term "natural selection" was popularized by Charles Darwin.
Evolutionary genetics (Huxley 1942)
-advantageous mutations have been positively selected in human populations during recent evolution
-disadvantageous mutations have been negatively selected in human populations during recent evolution
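How positive (or negative) selection shifts allele frequency over generations can be sketched with the textbook deterministic recurrence for genic selection. This model is an illustrative assumption, not from the lecture: infinite population, no drift or mutation, relative fitness 1 + s for the selected allele and 1 for the alternative.

```python
"""Deterministic sketch of selection on a single allele.

Assumptions (illustrative): infinite population, no drift or mutation,
genic selection with relative fitness 1 + s for the selected allele and 1
for the alternative allele (s > 0: positive, s < 0: negative selection).
"""


def next_frequency(p: float, s: float) -> float:
    """Allele frequency after one generation: p' = p(1 + s) / (1 + s p)."""
    return p * (1 + s) / (1 + s * p)


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Allele frequencies from generation 0 up to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# A rare advantageous allele (1%) with a 5% fitness advantage.
traj = trajectory(0.01, 0.05, 200)
print(f"after 200 generations: {traj[-1]:.3f}")
```

Under this model the odds p / (1 - p) are multiplied by (1 + s) each generation, so even a modest advantage drives a rare allele to high frequency within a few hundred generations.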
Evolutionary genetics
THRIFTY GENOTYPE HYPOTHESIS: the 'thrifty' genotype would have been advantageous for hunter-gatherer populations, especially child-bearing women, because it would allow them to fatten more quickly during times of abundance. Fatter individuals carrying the thrifty genes would thus better survive times of food scarcity.
Obesity and type 2 diabetes predisposing mutations may therefore show signatures of positive selection
Evolutionary genetics
. The LCT rs4988235 T variant confers lactase persistence
. The LCT rs4988235 T variant is associated with higher milk / dairy product consumption and increased body mass index
. The LCT rs4988235 T variant is advantageous in milk-producing dairy farming populations and has undergone positive selection in relation to cattle domestication
. The LCT rs4988235 T allele is more frequent in Northern Europe (frequency: 0.7) than in Southern Europe (frequency: 0.1)
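The allele frequencies translate into a much larger contrast in phenotype frequency. This sketch assumes Hardy-Weinberg equilibrium and dominant inheritance of lactase persistence (one T allele suffices), so only non-T homozygotes are non-persistent; the 0.7 and 0.1 frequencies are those quoted above.

```python
"""Expected share of lactase-persistent individuals from the T allele frequency.

Assumptions: Hardy-Weinberg equilibrium and dominant inheritance of
lactase persistence (a single rs4988235 T allele suffices).
"""


def persistent_fraction(t_freq: float) -> float:
    """1 minus the frequency of non-T homozygotes, (1 - t)^2."""
    return 1 - (1 - t_freq) ** 2


for region, t_freq in [("Northern Europe", 0.7), ("Southern Europe", 0.1)]:
    print(f"{region}: T = {t_freq}, lactase persistent = {persistent_fraction(t_freq):.0%}")
```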
Evolutionary genetics
Figure: LCT rs4988235 T allele frequency in the UK (Davey-Smith et al., EJHG 2009)
Evolutionary genetics
. Genome-wide approaches in diverse ethnic backgrounds have identified several hundreds of regions showing recent positive natural selection
. New methods are able to identify causal variants in regions with positive natural selection signature
. The amino-acid change Lys109Arg in the LEPR gene has been identified as a causal variant under positive selection
. The Lys109Arg variant is associated with body mass index variation
Grossman et al., Science 2010
Sources of data for variant functionality prediction
ENCODE project (Encyclopedia of DNA Elements)
• https://www.encodeproject.org/
Integrative analysis of:
• 3545 biosamples (2441 in humans) from different cell lines / tissues
• 971 epigenetic marks
• 5194 assays (ChIP-seq, RNA-seq, IP, DNase-seq, transcription profiling …)
NIH Roadmap Epigenomics Mapping Consortium
• http://genomebrowser.wustl.edu/
Integrative analysis of:
• 111 reference human cells / tissues
• 40+ epigenetic marks
Genotype-Tissue Expression (GTEx) project
Correlations between genotype and tissue-specific gene expression levels in
• 42 cell lines / tissues
• 100-200 RNA-seq and genotyped samples
• http://www.gtexportal.org/home/
Examples of coding and regulatory variants
Gene variation in the promoter and gene expression
. The -11391 G>A variant in the promoter of the ACDC/adiponectin gene is associated with higher in vitro promoter activity and with higher plasma adiponectin levels in both lean and obese children
Bouatia-Naji et al., Diabetes 2006
Gene variation and long-range enhancer
Smemo et al., Nature 2014
. The obesity-associated FTO intron 1 region directly interacts with the promoter of the IRX3 gene (580 kb downstream of FTO)
. The intron 1 SNP in FTO modulates IRX3 (but not FTO) expression
. Irx3-deficient mice display a leanness phenotype
Gene variation at a CpG methylation site
. Gene variant rs1421085 in intron 1 of FTO is the main contributor to polygenic obesity (Dina et al., Nat Genet 2007)
. Gene variant rs7202116, in full linkage disequilibrium with rs1421085, creates a CpG methylation site and is associated with increased methylation of a 7.7 kb regulatory region within FTO
. The 7.7 kb regulatory region encompasses a highly conserved non-coding element that acts as a long-range enhancer of gene expression
Bell et al., PLOS One 2010
Intron / exon mutations and exon skipping
. Extreme obesity cosegregates with homozygosity for a G/A substitution in the splice donor site of exon 16 of the LEPR gene
. The intron / exon mutation induces skipping of exon 16, producing a truncated, inactive leptin receptor
Clement et al., Nature 1998
CNVs can be strongly causal variants in Mendelian diseases
. A 600 kb heterozygous deletion (~30 genes) on chromosome 16p11.2 accounts for 0.7% of cases of morbid hyperphagic obesity and is associated with developmental delay
. Duplications in the same chromosomal region are associated with underweight and restrictive eating disorders
. SH2B1, a key modulator of the response to the satiety hormone leptin and a Mendelian hyperphagic obesity gene, is located in the deleted interval
Walters et al., Nature 2010; Jacquemont et al., Nature 2012
Gene variation in 3’UTR and mRNA stability
. The A>G +1044 TGA SNP is part of the ENPP1 risk haplotype associated with higher ENPP1 plasma levels and risk of obesity / T2D
. The A>G +1044 TGA SNP forms a linkage disequilibrium block in the 3’UTR with A>C +1092 TGA and C>T +1157 TGA
. In HLA cells transfected with either 3’UTR variant or wild-type cDNA, ENPP1 mRNA half-life was longer with the 3’UTR variant cDNA (t1/2 = 4.35 vs. 2.55 h; p = 0.001)
Meyre et al., unpublished
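The half-life values can be related to mRNA decay curves. This sketch assumes first-order decay, N(t) = N0·exp(-kt), so ln N(t) is linear in time and t1/2 = ln(2)/k; it also assumes mRNA levels are followed over time after transcription is blocked, which the slide does not specify. The time course is synthetic, generated from the reported half-lives, not real measurements.

```python
"""Estimating mRNA half-life from a decay time course (sketch).

Assumptions: first-order decay N(t) = N0 * exp(-k t), so t1/2 = ln(2) / k;
the time points and levels below are synthetic.
"""
import math


def half_life(times_h: list[float], levels: list[float]) -> float:
    """Fit ln(level) = a - k * t by least squares and return ln(2) / k."""
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t, mean_l = sum(times_h) / n, sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
    var = sum((t - mean_t) ** 2 for t in times_h)
    k = -cov / var  # decay constant (per hour)
    return math.log(2) / k


def synthetic_course(t_half: float, times_h: list[float]) -> list[float]:
    """Remaining mRNA relative to time 0 for a given half-life."""
    return [0.5 ** (t / t_half) for t in times_h]


times = [0, 1, 2, 4, 6, 8]
for label, t_half in [("3'UTR variant", 4.35), ("wild type", 2.55)]:
    estimate = half_life(times, synthetic_course(t_half, times))
    remaining = 0.5 ** (6 / t_half)
    print(f"{label}: t1/2 = {estimate:.2f} h, {remaining:.0%} left after 6 h")
```

With real data the fit is noisy, so half-lives are usually estimated from replicate time courses and compared statistically.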
Cis versus trans eQTLs?
. The polymorphism rs9585056 is associated with T1D, modulates the expression of the GPR183 gene in cis and the expression of the IRF7 network genes in trans
Heinig et al., Nature 2010
In vitro functional studies
68% of non-synonymous mutations found in obese patients are deleterious (assessed by the response to alpha-MSH)
Stutzmann et al., Diabetes 2008
In vitro / in vivo functional studies
CRISPR/Cas9 system, a powerful genetic tool for genome editing and the study of functional variants
. The rs1421085 variant of FTO, identified by PMCA, has been shown to modulate the expression of the IRX3 and IRX5 genes using the CRISPR/Cas9 method
STUDY OF ENDOPHENOTYPES
. rs17782313 near MC4R has been associated with BMI by GWAS
. Deleterious coding mutations in MC4R are the commonest form of monogenic obesity, with hyperphagia and increased stature
. If the SNP modulates the expression / function of MC4R, we can predict associations with the same traits in the expected direction
. The obesity-predisposing allele of rs17782313 is associated with more snacking, overeating and increased stature
MC4R is a highly relevant candidate gene at this locus
Study of endophenotypes
Stutzmann et al., Int J Obes 2009
FTO, a good illustration of an integrative approach
. Novel variants identified in African populations
. FTO SNP shows evidence of positive natural selection
. The SNP is associated with different methylation patterns (FTO is a demethylase)
. FTO SNPs in intron 1 affect the expression of other genes (IRX3 and IRX5) implicated in fat storage and energy expenditure
. FTO complete deficiency leads to a polymalformative lethal syndrome in humans
. FTO partial deficiency is not associated with leanness / obesity in humans
. FTO knock-out mice are lean, FTO transgenic mice are obese
. FTO is highly expressed in hypothalamus and is regulated by fasting and feeding
. FTO SNP is associated with food intake in humans
Ichimura et al., Nature 2012
ANY QUESTIONS?
The French fair-play!