GWAS vs NGS
James McKay, Genetic Susceptibility Group
[Diagram: Genetics (heritable) and Environment (non-heritable) act on the Individual to give the Predicted Phenotype]
What we expect in terms of effects of genetic variants in cancer susceptibility
Population frequency seems to impact on disease
Severity of the consequence on the gene's function
Genome-wide association studies (GWAS)
Association in case-control groups
Agnostic approach: no knowledge about the gene is needed
Test all common genetic variation across the genome
Assays measure all common genetic variation in the human genome (770,000 variants)
Each variant is tested for differences between cases and controls
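The per-variant comparison of cases and controls can be sketched as a test on allele counts. This is a minimal sketch, not the method used in the studies presented here: it assumes a plain 2x2 allelic chi-square test with 1 degree of freedom, and the function name and counts are hypothetical.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """2x2 allelic chi-square test (1 df) for a single variant.

    Arguments are allele counts (two alleles per genotyped individual).
    Returns (odds_ratio, chi2, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square for a 2x2 table, no continuity correction
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = num / den
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value

# Hypothetical allele counts for one variant
odds_ratio, chi2, p_value = allelic_chi_square(
    case_alt=900, case_ref=1500, ctrl_alt=4000, ctrl_ref=9400)
print(f"OR={odds_ratio:.2f} chi2={chi2:.1f} p={p_value:.2e}")
```

In a GWAS the same test is repeated for every variant on the array, which is where the multiple testing burden comes from.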
Cancer types with successful GWAS
Prostate cancer, Breast cancer, Colorectal cancer, Lung cancer, Esophageal cancer, Ovarian cancer, Head and Neck, Testicular cancer, Bladder cancer, Thyroid cancer, Pancreatic cancer
Melanoma, Basal cell carcinoma, Glioma, Neuroblastoma, Kidney, Chronic lymphocytic leukemia, Acute lymphoblastic leukemia, Follicular lymphoma, Myeloproliferative disorders, Hodgkin’s Lymphoma
Blue = carried out at IARC
GWAS Results: Classical HL, 4 European studies
1,200 cases, 6,713 generic controls (K Urayama)
[Manhattan plot: -log10(p-value) by chromosome; labelled peaks at 6p21 (MHC Region) and 5q31 (IL13/IL4)]
MHC Region Associations
[Figure: -log10(P value) against position in the MHC region (27 to 33 Mb), with panels for all classical HL, EBV-positive HL and EBV-negative HL. Regions marked: Extended Class I, Class I, Class III, Class II. Labelled SNPs: HLA-DRA rs2395185, MICB rs2248462, HLA-DRA rs6903608, HLA-A rs2734986, HLA-A rs6904029]
Results for IL13 (Chr 5: IL13 rs20541)
[Forest plot: OR and 95% CI on an axis from 1.0 to 2.0; columns Ca, Co, OR, 95% CI; P=1.8x10^-9 and P=1.1x10^-8 marked on the plot]
Log-additive: OR 1.38 (1.24-1.54)
Heterozygous: OR 1.39 (1.22-1.58)
Homozygous: OR 1.84 (1.38-2.45)
Study (P homogeneity=0.137)
  EPILYMPH-GWAS: 1.73 (1.27-2.35)
  SCALE-GWAS: 1.77 (1.33-2.35)
  UK Studies-GWAS: 1.28 (1.06-1.55)
  Netherlands-GWAS: 1.31 (1.03-1.66)
  EPILYMPH-Replication: 1.22 (0.74-2.01)
  UK Studies-Replication: 1.08 (0.79-1.48)
Major subtypes (P homogeneity=0.144)
  NSHL: 1.45 (1.29-1.63)
  MCHL: 1.21 (0.98-1.49)
Tumor EBV status (P homogeneity=0.011)
  EBV- cHL: 1.47 (1.29-1.68)
  EBV+ cHL: 1.07 (0.87-1.32)
Age-specific cHL (P homogeneity=0.155)
  15-35 years: 1.46 (1.29-1.65)
  36-90 years: 1.27 (1.10-1.47)
Case-case analysis
  MCHL vs NSHL: 0.82 (0.66-1.02)
  EBV+ vs EBV- cHL: 0.81 (0.65-1.01)
Next step in GWAS
Very large sample sizes: meta-analysis, lung cancer 14K cases, 18K controls
Are all SNPs equal? Bayesian approach: weight SNPs based on different sources of prior information (eQTL, medical literature)
Many cancer loci are relevant to more than one cancer subtype: start with known loci to decrease the multiple testing burden
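The gain from starting with known loci can be illustrated with a Bonferroni correction. This is a sketch under stated assumptions: Bonferroni is only one way to control for multiple testing, the 770,000 figure is the array size quoted earlier in the talk, and the panel of 100 known loci is a hypothetical number chosen for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

genome_wide = bonferroni_threshold(770_000)  # every variant on the array
known_loci = bonferroni_threshold(100)       # hypothetical panel of known cancer loci
print(f"genome-wide: {genome_wide:.1e}, known loci only: {known_loci:.1e}")
```

Testing fewer variants relaxes the per-variant threshold by several orders of magnitude, so a smaller effect can reach significance with the same sample size.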
Limitations of GWAS
Small RR and many variants tested
Sample sizes in the thousands are needed
2nd cancers in Hodgkin’s: Best et al., Nat Med 2011
Only considers common genetic variants (and only ~80% of them)
Rare variants not assessed
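Why small relative risks call for thousands of samples can be seen from a rough sample-size calculation. The sketch below assumes a normal approximation for comparing allele frequencies between equal numbers of cases and controls; the significance level of 5e-8, the 80% power, the control allele frequency of 0.3 and the function name are all assumptions made for illustration, not values from the talk.

```python
from statistics import NormalDist

def cases_needed(maf_controls, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with as many controls) needed to detect
    an allelic odds ratio, using a two-proportion normal approximation.
    Each individual contributes two alleles."""
    p0 = maf_controls
    odds1 = odds_ratio * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)                 # allele frequency in cases
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    alleles = ((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                + z_beta * (p0 * (1 - p0) + p1 * (1 - p1)) ** 0.5) ** 2
               / (p1 - p0) ** 2)
    return alleles / 2                       # alleles per group -> individuals

for odds_ratio in (1.1, 1.2, 1.5):
    print(odds_ratio, round(cases_needed(0.3, odds_ratio)))
```

The required number of cases rises steeply as the odds ratio approaches 1, which is consistent with the move towards large meta-analyses.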
Next generation sequencing
Massive parallel sequencing
Now able to assay the entire sequence of an individual
The first genome sequenced: $3 billion, 14+ labs
Now: a single machine, $3,000
Many applications other than DNA resequencing
Review issue Exomes Genome Biology 2011
GWAS assays focus on common genetic variants; NGS gives the individual's sequence, hence information on common and rare variants
An example of an NGS workflow (Sboner et al., Genome Biology 2011)
Study design: families, trios, case control, tumour vs normal, pooled/individual
Target: whole genome, target capture (exome, specific regions?)
Platform: Illumina, SOLiD, PGM, 454…..
Seq: ACGTACGTACGAGCT……ACGTACGTACGTACGT, reads of 75, 150 or 250 bp
Mapping
Variant calling
Variant consequence
Variant calling, heterozygote calls: 50% of reads should carry the wild-type allele, C (i.e. the reference), and 50% of reads should carry the variant, i.e. T
30 reads per base seems to be the solution in terms of accuracy/cost effectiveness
NGS data, many many short sequence reads
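The link between read depth and heterozygote calling can be sketched with a binomial model. This assumes that each read samples the reference or variant allele independently with probability 0.5 at a true heterozygous site; the 20% minimum variant-read fraction and the function name are hypothetical choices for illustration, not a description of any particular variant caller.

```python
from math import comb

def prob_alt_reads_at_most(k, depth, alt_fraction=0.5):
    """P(number of variant reads <= k) at a site covered by `depth` reads,
    assuming each read carries the variant allele independently."""
    return sum(comb(depth, i) * alt_fraction ** i * (1 - alt_fraction) ** (depth - i)
               for i in range(k + 1))

# Chance that a true heterozygote shows fewer than 20% variant reads
for depth in (10, 30):
    cutoff = depth // 5  # hypothetical minimum number of variant reads
    missed = prob_alt_reads_at_most(cutoff - 1, depth)
    print(f"depth {depth}: P(fewer than {cutoff} variant reads) = {missed:.2e}")
```

Under these assumptions the chance of missing a heterozygote drops sharply between 10 and 30 reads per base, which fits the accuracy/cost trade-off mentioned on the slide.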
Variant filtering
~3 million SNPs
Target exomes -> 15,000 - 20,000 coding SNPs
Remove silent, synonymous -> 5,000 - 7,000 coding SNPs
Remove previously identified -> 200 - 500 nonsynonymous + truncating SNPs
Functional (truncating, in silico predictions) -> 50 - 100 functional SNPs
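The filtering funnel on this slide can be sketched as a chain of successive filters. Everything in this example is hypothetical: the `Variant` fields, the gene names and the toy data stand in for whatever an annotation pipeline would actually provide, and the order of the steps is one reading of the slide.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    in_exome: bool            # falls inside the captured exome
    effect: str               # "synonymous", "missense" or "truncating"
    known: bool               # already present in a catalogue of known variants
    predicted_damaging: bool  # outcome of an in silico prediction

def filter_variants(variants):
    """Apply the funnel step by step and record how many variants survive."""
    steps = [
        ("coding (exome)", lambda v: v.in_exome),
        ("not synonymous", lambda v: v.effect != "synonymous"),
        ("not previously identified", lambda v: not v.known),
        ("functional", lambda v: v.effect == "truncating" or v.predicted_damaging),
    ]
    counts = []
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        counts.append((name, len(variants)))
    return variants, counts

# Toy data, invented for illustration
variants = [
    Variant("GENE1", True, "truncating", False, False),
    Variant("GENE2", True, "missense", False, True),
    Variant("GENE3", True, "missense", False, False),
    Variant("GENE4", True, "missense", True, True),
    Variant("GENE5", True, "synonymous", False, False),
    Variant("GENE6", False, "missense", False, True),
]
kept, counts = filter_variants(variants)
for name, n in counts:
    print(f"{name}: {n}")
```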
Variant consequence: Ahhh, yes, tricky, we might have to form a working group and get back to you on that one
After QC filtering
50-100 variants per individual that are in genes and appear functional
How do we differentiate true from false?
Bin variants across genes? Test for association? (need at least 3K cases, 3K controls)
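Binning variants across genes and testing for association can be sketched as a simple collapsing test: count, per gene, how many cases and controls carry at least one rare functional variant, then compare the two proportions. This is an illustration only; it uses a chi-square approximation (an exact test would suit sparse counts better), and the gene names, sample identifiers and counts are invented.

```python
import math
from collections import defaultdict

def gene_burden(carriers, n_cases, n_controls):
    """Collapse variants by gene and compare carrier frequencies.

    `carriers` holds (gene, sample_id, is_case) tuples, one per rare
    functional variant seen in a sample.
    Returns {gene: (case_carriers, control_carriers, p_value)}.
    """
    case_ids, ctrl_ids = defaultdict(set), defaultdict(set)
    for gene, sample_id, is_case in carriers:
        (case_ids if is_case else ctrl_ids)[gene].add(sample_id)
    results = {}
    for gene in set(case_ids) | set(ctrl_ids):
        a = len(case_ids[gene])   # cases carrying at least one variant
        b = n_cases - a
        c = len(ctrl_ids[gene])   # controls carrying at least one variant
        d = n_controls - c
        den = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / den if den else 0.0
        # Upper tail of a chi-square with 1 df
        results[gene] = (a, c, math.erfc(math.sqrt(chi2 / 2)))
    return results

# Invented example: 3,000 cases and 3,000 controls
carriers = ([("GENE_A", f"case{i}", True) for i in range(30)]
            + [("GENE_A", f"ctrl{i}", False) for i in range(10)]
            + [("GENE_B", f"case{i}", True) for i in range(12)]
            + [("GENE_B", f"ctrl{i}", False) for i in range(12)])
results = gene_burden(carriers, n_cases=3000, n_controls=3000)
for gene in sorted(results):
    print(gene, results[gene])
```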
NPC pedigree Sarawak Malaysia
11 cases for which we have genomic DNA
Exome sequencing underway
Triage variants in the pedigree: an interesting variant should segregate in cases
Validation in remaining individuals + additional pedigrees (Allan Hildesheim, US NCI)
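Triage by segregation can be sketched as a set intersection over the sequenced cases. This is a minimal sketch with hypothetical sample and variant identifiers; excluding variants seen in unaffected relatives is optional here because it assumes complete penetrance, which may not hold.

```python
def segregating_variants(calls, affected, unaffected=()):
    """Variants carried by every affected relative and by none of the
    listed unaffected relatives.

    `calls` maps sample id -> set of variant identifiers found in that sample.
    `affected` must contain at least one sample id.
    """
    shared = set.intersection(*(calls[s] for s in affected))
    for s in unaffected:
        shared = shared - calls[s]
    return shared

# Invented calls for three cases and one unaffected relative
calls = {
    "case1": {"var_a", "var_b", "var_c"},
    "case2": {"var_a", "var_c", "var_d"},
    "case3": {"var_a", "var_c"},
    "relative1": {"var_c", "var_e"},
}
cases = ["case1", "case2", "case3"]
print(sorted(segregating_variants(calls, cases)))
print(sorted(segregating_variants(calls, cases, unaffected=["relative1"])))
```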
Genes following two hit models
(Knudson’s hypothesis)
NGS quite successful in recessive diseases (two mutations, a rare event)
Many inherited tumours have no normal allele: one mutation is inherited, then the second (wild-type) allele is deleted somatically, e.g. RB, TP53, VHL, BRCA1/2, APC, PTEN
[Diagram: BRCA1 on chrA (genomic DNA, exome seq) and on chrB (somatic tissue events, exome seq, CNV)]
Catalog mutation events in constitutional DNA and somatic events
Identify genes for which there is co-occurrence of events, consistent with the two-hit hypothesis
chrA (inherited events): 50 by chance?
chrB (somatic events): 500 by chance?
Co-occurrence by chance: 1.3 times per genome
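The chance co-occurrence figure can be reproduced approximately with a back-of-the-envelope calculation. The 50 inherited and 500 somatic events are the numbers on the slide; the total of 20,000 genes and the assumption that events fall on genes uniformly at random are mine, so the result is only expected to be close to the 1.3 quoted. The gene names in the second part are placeholders.

```python
def expected_chance_overlap(n_inherited, n_somatic, n_genes):
    """Expected number of genes hit by both an inherited and a somatic event
    if both sets of events were spread uniformly at random over n_genes genes."""
    return n_inherited * n_somatic / n_genes

def two_hit_candidates(inherited_genes, somatic_genes):
    """Genes with both an inherited and a somatic event in the same individual."""
    return set(inherited_genes) & set(somatic_genes)

# 20,000 genes is an assumed round number for the human genome
print(expected_chance_overlap(50, 500, 20_000))  # 1.25

# Placeholder gene lists for one individual
print(two_hit_candidates({"GENE1", "GENE2", "GENE3"}, {"GENE3", "GENE4"}))
```

With roughly one chance co-occurrence expected per individual, a single overlap is not evidence on its own; the signal would have to come from the same gene recurring across individuals.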
The IARC biorepository has close to 500 lung cancer cases with a blood sample and snap-frozen tumour; 30 of these lung cancer cases have a first-degree relative with lung cancer
Two-stage design
Stage 1: exome sequencing, normal/tumour, 30 FH+ (family history positive)
Stage 2: 470 for replication
I’ll stop there, Thanks
Next generation sequencing
Massive parallel sequencing
Assay single cell and single position
Say chr 3, positions 1 - 50 (from a single cell)
Diploid:
chr1 ACGTACGTACGAGACGTACGTACGTACGT
chr2 ACGTACGTACGAAACGTACGTACGTACGT
Not a single cell (although it's being worked on), but a sample from an individual
In parallel, massive: billions of reads
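The diploid example on this slide can be checked directly: comparing the two copies base by base shows the single heterozygous position. This is a small sketch that assumes the two sequences are already aligned and of equal length; the function name is mine.

```python
def heterozygous_positions(copy1, copy2):
    """1-based positions at which two aligned, equal-length copies differ,
    returned as (position, base_in_copy1, base_in_copy2)."""
    if len(copy1) != len(copy2):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(copy1, copy2), start=1)
            if a != b]

chr1 = "ACGTACGTACGAGACGTACGTACGTACGT"
chr2 = "ACGTACGTACGAAACGTACGTACGTACGT"
print(heterozygous_positions(chr1, chr2))  # [(13, 'G', 'A')]
```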