Annotation of variants
Variant Detection and Interpretation in a Diagnostic Context

K. Joeri van der Velde, PhD (bioinformatics, research, diagnostics)
Genomics Coordination Center, Dept. of Genetics, UMC Groningen, The Netherlands

VKGL / VKGN course: “NGS in DNA diagnostics”
Tuesday September 4th 2018
CQ-F1, Erasmus MC, Rotterdam

Image: http://rnasilencing.files.wordpress.com/2010/11/caduceus-with-dna-helix4.jpg

Upload: others

Post on 21-Feb-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Annotation of variants VKGL / VKGN course: CQ-F1, Erasmus ... van der J..pdfand genotypes Variant annotation and interpretation Counsel patient Counsel patient, consent Draw sample

Page 2

Genome diagnostics in a nutshell

1. Counsel patient, consent
2. Draw sample
3. Sample prep & track (WGS/WES/Targeted)
4. Quality control
5. Read mapping
6. Call variants and genotypes
7. Variant annotation and interpretation (this talk)
8. Counsel patient

File formats: FASTQ, BAM, VCF

Example variant calls: G>T, A>G, T>CA

Interpretation outcomes: Pathogenic (P), Benign (B), Variant of unknown significance (V)

Page 3

How many variants?

1 patient, WES

~100,000 variants

...which one is causal? (P)

Page 4

Some keywords

❖ Filtering: reduce all variants to candidates
❖ Prioritization: rank candidates from most likely to least likely
❖ Identification: find the pathogenic variant (P) among the benign ones (B)

Page 5

Filter & prioritize… on what?

We must first annotate the variants with additional context, then use this information to filter & prioritize.

CHR POS REF ALT + GoNL + 1000G + ClinVar
2 179575511 C T .23 .21
2 179577870 T C
2 179578704 G A .01 Pathogenic
2 179578730 G A Benign
2 179578891 T C .001
2 179579093 T C Pathogenic
2 179579212 T C .02
2 179579694 T A
2 179579822 T A 0.02 Benign
2 179579977 G A .25 .3
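A minimal sketch of this annotation step, assuming each resource has already been loaded into a dictionary keyed by (chromosome, position, ref, alt). The dictionary names are hypothetical and only a few values from the table are used; a real pipeline would read them from annotation files.

```python
from typing import Dict, Optional, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)

# Hypothetical lookup tables; in practice these come from annotation files.
gonl_af: Dict[VariantKey, float] = {("2", 179575511, "C", "T"): 0.23}
kg_af: Dict[VariantKey, float] = {("2", 179575511, "C", "T"): 0.21}
clinvar: Dict[VariantKey, str] = {("2", 179579093, "T", "C"): "Pathogenic"}


def annotate(variant: VariantKey) -> Dict[str, Optional[object]]:
    """Attach what each resource knows about the variant (None if unknown)."""
    return {
        "variant": variant,
        "GoNL": gonl_af.get(variant),
        "1000G": kg_af.get(variant),
        "ClinVar": clinvar.get(variant),
    }


variants = [
    ("2", 179575511, "C", "T"),
    ("2", 179577870, "T", "C"),
    ("2", 179579093, "T", "C"),
]
annotated = [annotate(v) for v in variants]
for row in annotated:
    print(row)
```

A missing value (None) is information too: a variant absent from all population databases is rare, not benign.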

Page 6

Many different types of annotations

1. Gene panels
2. Population alleles
3. Protein impact
4. Assessed variants
5. Inheritance match
6. Phenotype match
7. Deleteriousness

For each: What? How? + pitfalls

Page 7

1. Gene panels: in the genome

The human genome: 3,000,000,000 base pairs

❖ 50% repetitive / transposons
❖ 20-40% regulatory
❖ 10% intron & UTR
❖ 1% pseudogenes (~15,000) & ncRNA (~25,000)
❖ 1% exome (~20,000 genes)
❖ 0.2% ‘Clinome’ (~3,600 genes): known Mendelian disorder genes, may be clinically actionable
❖ 0.002% a ‘gene panel’ (1 to ~100 genes): genes specific for a disorder or spectrum

Page 8

1. Gene panels: in practice

Jongbloed (2011) Expert Opin Med Diagn 5:9-24

Clinicians: “Which genes am I allowed to look at?”

Researchers: “Which genes am I interested in?”

Pitfalls:

❖ Genes can be shared across panels
❖ Panels for same disorder differ per lab
❖ Tunnel vision: this must be it
❖ Panels often updated with latest genes
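As a sketch, a gene panel filter is a set membership test on the gene of each variant. The panel contents, record layout, and positions below are invented for illustration.

```python
# Hypothetical panel and variant records, for illustration only.
example_panel = {"TTN", "DSP", "MYH7"}

variants = [
    {"gene": "TTN", "pos": 1001},
    {"gene": "GALNS", "pos": 2002},
    {"gene": "DSP", "pos": 3003},
]


def in_panel(variant, panel):
    """True when the variant lies in a gene that is part of the panel."""
    return variant["gene"] in panel


kept = [v for v in variants if in_panel(v, example_panel)]
dropped = [v for v in variants if not in_panel(v, example_panel)]

print("kept:", [v["gene"] for v in kept])
print("dropped:", [v["gene"] for v in dropped])
```

Keeping the dropped list around, rather than discarding it, is one way to guard against the tunnel vision pitfall: it can be revisited when the panel is updated.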

Page 9

1. Gene panels: clinical “superpanel”

Special highlight:

>3,600 monogenic disease genes, actionable to a degree

Page 10

2. Population alleles: databases

Population variation reference databases: bigger = better

Database      Individuals        Sequencing type   Variant info
GoNL          ~500 Dutch         WGS               AF, GTC, genotypes (by request)
1000 Genomes  ~2500 various      WGS               AF, genotypes
ExAC          ~60,000 various    WES               AF, GTC
ESP           ~6,500 various     WES               AF, GTC
gnomAD        ~140,000 various   WES and WGS       AF, GTC

Page 11

2. Population alleles: practical example

❖ Check how often an alternative allele is observed in the general population

ExAC contains ~60,000 individuals, so ~120,000 alleles
With 10 heterozygous and 2 homozygous carriers: AC = 10 + 2 × 2 = 14
Therefore: AF = 14 / 120,000 = 0.000117 ≈ 0.01%

❖ Allele frequency filtering threshold? Depends...

Common or rare disorder? Dominant or recessive?
Founder mutation? Consanguinity?
Thresholds in use range from 0.01% to 5%

Pitfalls:

❖ Assumptions of disorder rarity and penetrance
❖ Population may not be representative of patient ethnicity
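The ExAC arithmetic above can be sketched as follows, assuming a diploid autosomal site at which every individual was genotyped. The function names are hypothetical.

```python
def allele_frequency(n_het, n_hom_alt, n_individuals):
    """Allele frequency at a diploid site: alternative alleles / all alleles."""
    allele_count = n_het + 2 * n_hom_alt   # a homozygote carries two copies
    allele_number = 2 * n_individuals      # two alleles per individual
    return allele_count / allele_number


def passes_af_filter(af, threshold):
    """Keep a variant only when it is at most as common as the threshold."""
    return af <= threshold


af = allele_frequency(n_het=10, n_hom_alt=2, n_individuals=60_000)
print(f"AF = {af:.6f} ({af:.4%})")  # prints: AF = 0.000117 (0.0117%)

print(passes_af_filter(af, 0.0001))  # strict 0.01% threshold
print(passes_af_filter(af, 0.05))    # lenient 5% threshold
```

The same variant is removed at the strict end of the range and kept at the lenient end, which is why the threshold has to follow from the disorder and inheritance mode.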

Page 12

2. Population alleles: false discovery

Special highlight:

Do patients have more “pathogenic variants” than healthy individuals?

Gene  Variant class  Cases with  Cases without  Controls with  Controls without  Odds ratio (All)
DSP   Truncating     12          415            42             59570             41.01

❖ Disease genes usually enriched with pathogenic variants
❖ Pitfall: Some (often newer) disease genes NOT enriched at all!
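The odds ratio in the DSP row can be reproduced from the four counts. This is a plain odds ratio without confidence interval or zero-count correction, as a sketch only.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds of carrying the variant class in cases, divided by the odds in controls."""
    return (cases_with / cases_without) / (controls_with / controls_without)


dsp_or = odds_ratio(12, 415, 42, 59570)
print(round(dsp_or, 2))  # 41.01
```

An odds ratio near 1 for a supposed disease gene means cases carry such variants no more often than controls, which is the pitfall flagged above.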

Page 13

3. Protein impact

Predict the effect of a variant on the transcription of a gene

Tools: SnpEff, VEP & others

Easy to use filter: each effect falls in 1 of 4 impact severity categories

Effect                  Impact
STOP_GAINED             HIGH
FRAME_SHIFT             HIGH
NON_SYNONYMOUS_CODING   MODERATE
CODON_INSERTION         MODERATE
SYNONYMOUS_CODING       LOW
NON_SYNONYMOUS_START    LOW
INTRON                  MODIFIER
MICRO_RNA               MODIFIER
+34 more effects

Pitfalls:

❖ Transcript definitions (there could be dozens!)
❖ Overlapping genes (need to deal with this)
❖ “Impact” very handy but may oversimplify
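A sketch of filtering on impact category, using only the eight effects listed on this slide. The mapping is deliberately partial and the function name is hypothetical; an effect that is not in the table raises a KeyError rather than being silently dropped.

```python
# Partial effect-to-impact table, copied from the slide.
IMPACT_BY_EFFECT = {
    "STOP_GAINED": "HIGH",
    "FRAME_SHIFT": "HIGH",
    "NON_SYNONYMOUS_CODING": "MODERATE",
    "CODON_INSERTION": "MODERATE",
    "SYNONYMOUS_CODING": "LOW",
    "NON_SYNONYMOUS_START": "LOW",
    "INTRON": "MODIFIER",
    "MICRO_RNA": "MODIFIER",
}

# Severity order of the four categories, lowest to highest.
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


def passes_impact(effect, minimum="MODERATE"):
    """True when the effect's impact is at least as severe as the minimum."""
    impact = IMPACT_BY_EFFECT[effect]  # KeyError for effects outside this table
    return IMPACT_RANK[impact] >= IMPACT_RANK[minimum]


effects = ["STOP_GAINED", "SYNONYMOUS_CODING", "NON_SYNONYMOUS_CODING", "INTRON"]
print([e for e in effects if passes_impact(e)])
```

With several transcripts per gene, one variant can have several effects; which of them to feed into such a filter is exactly the transcript pitfall above.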

Page 14

4. Assessed variants: databases

“Has my variant been interpreted before?”

❖ LOVD instances ( http://grenada.lumc.nl/LSDB_list/lsdbs )
❖ The Human Gene Mutation Database ( www.hgmd.cf.ac.uk )
❖ HGVS list of LSDBs ( http://www.hgvs.org/locus-specific-mutation-databases )
❖ VKGL data sharing of Dutch clinical variants
    ❖ Via Cartagenia bench lab ( http://cartagenia.com )
    ❖ Joint national database ( www.molgenis.org/vkgl )

VKGL database current release: 97,801 variants

Lead: Marielle van Gijn (UMCU)

Page 15

4. Assessed variants: ClinVar

❖ ClinVar ( www.ncbi.nlm.nih.gov/clinvar )
❖ Big & growing: 8,103 genes with 440,386 variants (previous years: 5,449 and 6,550 genes; 163,816 and 318,819 variants)
❖ ‘Star status’ based on submitter(s)
❖ Low confidence? OK, good to know!

Pitfalls:

❖ Many false positives in some databases (various papers on this)
❖ Legacy data (e.g. variants reported before 2000)

Page 16

5. Inheritance match: CGD

“Could this heterozygous variant cause the disorder?”

Disorder inheritance mode Number of genes

Autosomal recessive (AR) ~1650

Autosomal dominant (AD) ~1000

AD or AR ~350

X-linked ~200

Other (digenic, bloodgroup, Y-linked etc) ~100

Pitfalls:

❖ CGD data is brilliant and easy to get but not totally “clean”, beware when using these data in a script/program
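A simplified sketch of an inheritance match that also tolerates somewhat messy inheritance fields. It only handles AD and AR, ignores phasing, penetrance, de novo status and X-linked genes, and its function names and input strings are assumptions rather than the actual CGD format.

```python
def parse_modes(raw):
    """Normalise a free-text inheritance field such as 'AD/AR' or ' ar ' to a set."""
    tokens = raw.upper().replace("/", " ").replace(",", " ").split()
    return {token for token in tokens if token in {"AD", "AR"}}


def could_explain(raw_inheritance, zygosity, hets_in_gene=1):
    """Rough check whether a genotype fits the inheritance mode of its gene.

    Returns None when the mode could not be parsed, so that such genes go to
    manual review instead of being filtered out.
    """
    modes = parse_modes(raw_inheritance)
    if not modes:
        return None
    if zygosity == "hom":
        return True
    if "AD" in modes:
        return True
    # Recessive gene, heterozygous variant: only a candidate when a second
    # heterozygous variant in the same gene allows a compound heterozygote.
    return hets_in_gene >= 2


print(could_explain("AR", "het"))
print(could_explain("AD or AR", "het"))
print(could_explain("AR", "het", hets_in_gene=2))
print(could_explain("digenic", "het"))
```

So a single heterozygous variant in a purely recessive gene is not a match on its own, which answers the question at the top of this slide for the simple cases.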

Page 17

6. Phenotype match: when and why?

0.2% ‘Clinome’ (~3,500 genes)

TECPR2: Spastic paraplegia 49
Dysarthria, Brachycephaly, Seizures, Broad neck, Areflexia

GALNS: Mucopolysaccharidosis IVA
Cervical subluxation, Genu valgum, Hypoplasia of the odontoid process, Keratan sulfate excretion in urine, Metaphyseal widening

….

?
Microcephaly, Dental crowding, Hypotonia, Central apnea, Gastroesophageal reflux disease

❖ Filter the genome, based on patient symptoms
❖ How? We know many gene-disorder associations

Image: https://guideimg.alibaba.com/images/shop/88/11/20/2/close-up-of-a-baby-with-an-ice-pack-on-his-head-poster-print-18-x-24_1882252.jpg

Page 18

6. Phenotype match: not so easy

There are >100,000 gene-disease-symptom associations!

......

OMIM:615031 TECPR2 9895 HP:0001260 Dysarthria

OMIM:615031 TECPR2 9895 HP:0000248 Brachycephaly

OMIM:615031 TECPR2 9895 HP:0001250 Seizures

OMIM:615031 TECPR2 9895 HP:0001284 Areflexia

OMIM:615031 TECPR2 9895 HP:0000475 Broad neck

OMIM:253000 GALNS 2588 HP:0003308 Cervical subluxation

OMIM:253000 GALNS 2588 HP:0002857 Genu valgum

OMIM:253000 GALNS 2588 HP:0003311 Hypoplasia of the odontoid process

OMIM:253000 GALNS 2588 HP:0012069 Keratan sulfate excretion in urine

OMIM:253000 GALNS 2588 HP:0003016 Metaphyseal widening

…...

...

ALL_SOURCES_ALL_FREQUENCIES_diseases_to_genes_to_phenotypes.txt by HPO team
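A sketch of loading such rows into a gene-to-terms index. It assumes the file is tab-separated with the five columns shown above (disease ID, gene symbol, gene ID, HPO ID, HPO term name) and that comment lines start with '#'; check this against the release you download.

```python
from collections import defaultdict

# A few rows in the layout shown above; tab separation is an assumption.
SAMPLE = """\
OMIM:615031\tTECPR2\t9895\tHP:0001260\tDysarthria
OMIM:615031\tTECPR2\t9895\tHP:0001250\tSeizures
OMIM:253000\tGALNS\t2588\tHP:0003308\tCervical subluxation
OMIM:253000\tGALNS\t2588\tHP:0002857\tGenu valgum
"""


def index_by_gene(lines):
    """Map each gene symbol to the set of HPO term IDs associated with it."""
    genes = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        disease_id, gene_symbol, gene_id, hpo_id, hpo_name = line.split("\t")[:5]
        genes[gene_symbol].add(hpo_id)
    return genes


gene_to_terms = index_by_gene(SAMPLE.splitlines())
for gene, terms in sorted(gene_to_terms.items()):
    print(gene, sorted(terms))
```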

Page 19

6. Phenotype match: how?

❖ Naive
    Considers terms separately
    Terms directly linked to diseases/genes

❖ Smart
    Considers terms as one collection
    Terms algorithmically matched to diseases/genes

Page 20

6. Phenotype match: naive method

❖ Naive, using specific terms, example:

"Keratan sulfate excretion in urine"

OMIM:253000 GALNS 2588 HP:0012069 Keratan sulfate excretion in urine
OMIM:253010 GLB1 2720 HP:0012069 Keratan sulfate excretion in urine
OMIM:601492 HYAL1 3373 HP:0012069 Keratan sulfate excretion in urine

"Cervical subluxation"

OMIM:253000 GALNS 2588 HP:0003308 Cervical subluxation
OMIM:253010 GLB1 2720 HP:0003308 Cervical subluxation
OMIM:607095 RMRP 6023 HP:0003308 Cervical subluxation

Prime candidates (linked to both terms)? GALNS, GLB1
Also interesting (linked to one term)? HYAL1, RMRP

Pitfall: Chances of an exact match are very small
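The naive method on this slide amounts to a direct lookup per term followed by counting how many terms hit each gene. A sketch using the six associations shown above (the variable and function names are hypothetical):

```python
from collections import Counter

# Term-to-gene links copied from the rows on this slide.
term_to_genes = {
    "HP:0012069": {"GALNS", "GLB1", "HYAL1"},  # Keratan sulfate excretion in urine
    "HP:0003308": {"GALNS", "GLB1", "RMRP"},   # Cervical subluxation
}


def naive_match(patient_terms):
    """Count, per gene, how many of the patient's terms are directly linked to it."""
    hits = Counter()
    for term in patient_terms:
        for gene in term_to_genes.get(term, ()):
            hits[gene] += 1
    return hits


hits = naive_match(["HP:0012069", "HP:0003308"])
prime = sorted(gene for gene, n in hits.items() if n == 2)
also_interesting = sorted(gene for gene, n in hits.items() if n == 1)

print("prime candidates:", prime)
print("also interesting:", also_interesting)
```

A term that is not in the lookup contributes nothing at all, which is why an almost-right term fails completely with this method.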

Page 21

6. Phenotype match: naive method

❖ Naive, using broad terms

Seizures: 1181 genes
Microcephaly: 644 genes
Muscular hypotonia: 928 genes
Hearing impairment: 1231 genes
Developmental delay: 639 genes
Short stature: 994 genes
Nystagmus: 706 genes
Cognitive impairment: 894 genes

Pitfall: too many genes to investigate!

Counted by substring match on the July 2015 release of ALL_SOURCES_ALL_FREQUENCIES_diseases_to_genes_to_phenotypes.txt by HPO team

Page 22

6. Phenotype match: smart method

❖ Smart matching using any terms
❖ HPO symptoms = graph of terms
❖ “Semantic similarity” between collections of symptoms (Resnik, 1999)

Figure: input symptom set A compared with symptoms of disease B
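A toy sketch of Resnik-style semantic similarity: two terms are as similar as the information content of their most informative common ancestor. The ontology, term names and frequencies below are invented, and the best-match average used to compare two sets is just one common choice, not necessarily what any particular tool does.

```python
import math

# Toy ontology (child -> parents); not real HPO terms.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "skeletal": ["root"],
    "seizures": ["neuro"],
    "areflexia": ["neuro"],
    "genu_valgum": ["skeletal"],
}

# Invented fraction of diseases annotated with each term or a descendant of it.
FREQ = {
    "root": 1.0, "neuro": 0.4, "skeletal": 0.3,
    "seizures": 0.2, "areflexia": 0.05, "genu_valgum": 0.1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Rare terms are informative: IC = -log(frequency)."""
    return -math.log(FREQ[term])


def resnik(term_a, term_b):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


def set_similarity(patient_terms, disease_terms):
    """Average, over patient terms, of the best match among the disease terms."""
    best = [max(resnik(p, d) for d in disease_terms) for p in patient_terms]
    return sum(best) / len(best)


print(resnik("seizures", "areflexia"))    # share the 'neuro' ancestor
print(resnik("seizures", "genu_valgum"))  # only share the root
print(set_similarity(["seizures"], ["areflexia", "genu_valgum"]))
```

Unlike the naive lookup, two different but related terms still score above zero, so an imprecise symptom description degrades the match instead of breaking it.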

Page 23

6. Phenotype match: smart tools

Phenotips ( http://phenotips.org )

Comprehensive matching in nice GUI, including ‘negative’ symptoms and suggestions for differential diagnosis

Page 24

6. Phenotype match: smart tools

AMELIE ( https://amelie.stanford.edu/ )

Enter symptoms and candidate genes, uses automated literature search to prioritize candidate genes

Why: truly very good performance for finding known disease genes (unpublished benchmark)

Page 25

6. Phenotype match: smart tools

GADO ( https://genenetwork.nl/gado )

Enter symptoms and optional candidate genes, prioritizes any expressed gene based on massive RNA-seq gene network

Why: perfect for discovering new disease genes

Page 26

6. Phenotype match: wrap-up

Pitfall:

❖ Not every ‘phenotype matching’ software is smart! So be wary of how the matching is performed

For ‘smart’ software:
Enter as many symptoms as possible, as specific as possible; this will only improve the match and chances of success!

For ‘naive’ software:
Try a few specific terms first, but don’t get stuck. Broaden your terms when needed, and don’t enter too many at once; there is no point in getting thousands of hits.

Page 27

7. Deleteriousness: traditional tools

Many conservation-based tools that predict how damaging a variant will be:

SIFT, PolyPhen, PROVEAN, PON-P2, MutationAssessor, FATHMM-MKL, Condel, PhyloP, UMD-Pred, Grantham, DeepPVP, ENTPRISE, PHAST, FitCons, DANN, MutPred, GERP, Xvar, VAAST, AlignGVGD, MAPP, MutationTaster, Alamut, MVP, EIGEN, REVEL, DIVAN, DeepSEA, REMM, GWAVA, DeltaSVM, DANQ, and so on ...

Good tools! But there are some issues:

• Scope difference (types of variants, genomic regions)
• Results often correlated with each other
• Training bias, or even overfitting

So which to use?

Page 28

7. Deleteriousness: next gen tools

LINSIGHT (http://www.nature.com/ng/journal/v49/n4/full/ng.3810.html)
EIGEN (http://www.nature.com/ng/journal/v48/n2/full/ng.3477.html)
CADD (https://www.nature.com/ng/journal/v46/n3/full/ng.2892.html)

Machine-learned model using 60+ tools & sources, trained to find the difference between:
❖ variants stabilized in evolution since primates, and
❖ variants randomized across the genome

Low CADD score means: this variant looks like a stabilized variant, safe

High CADD score means: this variant looks like a non-stabilized variant, could be dangerous

Great for variant prioritization
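Prioritization in its simplest form is a sort on the score, highest first. A sketch with invented variant IDs and scores; variants without a score are placed last rather than dropped.

```python
# Invented candidates; 'cadd' is None when no score is available.
candidates = [
    {"id": "var_a", "cadd": 3.2},
    {"id": "var_b", "cadd": 27.5},
    {"id": "var_c", "cadd": None},
    {"id": "var_d", "cadd": 15.0},
]


def by_cadd(variant):
    """Sort key: scored variants first, then from high to low score."""
    score = variant["cadd"]
    return (score is None, -(score or 0.0))


ranked = sorted(candidates, key=by_cadd)
print([v["id"] for v in ranked])
```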

http://3.bp.blogspot.com/-yeR87n827N8/TkagxSc9OuI/AAAAAAAAAyI/bYwDJjym8lA/s1600/Rise-of-the-Planet-the-Apes-James-Franco.jpg

Page 29

7. Deleteriousness: CADD scores

COL3A1: Ehlers-Danlos Syndrome, data from Raymond Dalgleish, https://eds.gene.le.ac.uk/home.php?select_db=COL3A1

Figure legend: patient causal variant, GoNL variant, 1000G variant

Threshold between pathogenic & benign? ✔

Image: http://www.reluctantcontortionist.co.uk/wp-content/uploads/2014/06/James-Morris-The-Elastic-Skin-Man-2-264x300.png

Page 30

7. Deleteriousness: CADD scores

MEFV: Familial Mediterranean Fever, data from INSAID Consortium, http://molgenis.org/said

Figure legend: patient causal variant, GoNL variant, 1000G variant

Threshold between pathogenic & benign? ✘

Pitfall: does not work for every gene!

Image: https://i.pinimg.com/originals/4e/b2/e6/4eb2e6e6c3e42b80ac363b3bc7dce572.jpg

Page 31

7. Deleteriousness: GAVIN method

Special highlight:

GAVIN - Gene-Aware Variant INterpretation for medical sequencing
Van der Velde et al., Genome Biology (2017) 18 (1): 6.

❖ Automated variant classification
❖ Based on CADD calibrations
❖ Web app: molgenis.org/gavin
❖ UMC Groningen genome diagnostics: 2x faster

Figure: specificity vs. sensitivity plot by Kristin Abbott
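To illustrate the idea of gene-aware thresholds only: the sketch below is not the GAVIN algorithm and uses none of its calibrations. Gene names, threshold values and the fallback are all invented; it merely shows how the same CADD score can be classified differently per gene.

```python
# Invented per-gene CADD thresholds; real calibrations must come from curated data.
GENE_THRESHOLDS = {
    "GENE_A": {"benign_below": 15.0, "pathogenic_above": 25.0},
}

# Equally invented fallback for genes without a calibration of their own.
GENOME_WIDE = {"benign_below": 10.0, "pathogenic_above": 30.0}


def classify(gene, cadd):
    """Classify a variant using the thresholds of its gene, if there are any."""
    thresholds = GENE_THRESHOLDS.get(gene, GENOME_WIDE)
    if cadd > thresholds["pathogenic_above"]:
        return "likely pathogenic"
    if cadd < thresholds["benign_below"]:
        return "likely benign"
    return "VUS"


print(classify("GENE_A", 27.0))
print(classify("GENE_B", 27.0))
print(classify("GENE_A", 12.0))
```

For the published method and its actual calibrations, see the GAVIN paper and web app referenced on this slide.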

Page 32

Bonus: Excel gene names @$%&#-ups

Pitfall: Gene name errors caused by Excel

Zeeberg et al. 2004, Ziemann et al. 2016

http://www.winbeta.org/news/20-of-scientific-papers-on-genes-contain-gene-name-conversion-errors-caused-by-excel

This will happen to you
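A small sketch for spotting gene names that a spreadsheet has turned into dates, such as SEPT2 becoming "2-Sep" or MARCH1 becoming "1-Mar". It only catches this day-month pattern; other conversions (for example identifiers turned into floating-point numbers) would need their own checks.

```python
import re

# Matches strings like '2-Sep' or '1-Mar', the usual result of date conversion.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def looks_mangled(gene_name):
    """True when a supposed gene name looks like a spreadsheet date."""
    return bool(DATE_LIKE.match(gene_name.strip()))


names = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "DSP"]
suspicious = [name for name in names if looks_mangled(name)]
print(suspicious)
```

Running such a check on every gene list that has passed through a spreadsheet is cheap; recovering the original symbol afterwards is not always possible.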

Page 33

In summary...

sequence → filter → prioritize → identify

Figure: the pathogenic variant (P) is identified among many benign variants (B)

Page 34

PS: Solve-RD: huge patient re-analysis

Project highlight

❖ 21 international partners + ERNs
❖ 19,000 unsolved WES re-analysis
❖ 2,000 cases selected for WGS
❖ 800 ultra-RD patients WES/WGS
❖ 500 long-read genomes (PacBio)
❖ 120 ‘unsolvables’ into multi-omics

Testbeds (dept. of Genetics, diagnostics & research): Structural variation, GeneNetwork, A.S.E., Methylation network, Exon skipping, Literature match, GAVIN+, InterVar, SilVA, SPIDEX, Kumar NC, VIPUR, “UTR tool”, “TFBS tool”

Your help needed:

❖ Fresh ideas/methods/tools we can try out to diagnose more
❖ Samples of undiagnosed patients (i.e. FASTQ/BAM/VCF files)

Page 35

Some useful hyperlinks

1. Clinical Genomics Database http://research.nhgri.nih.gov/CGD

2. Genome of the Netherlands http://nlgenome.nl

2. 1000 Genomes Project http://1000genomes.org

2. gnomAD database http://gnomad.broadinstitute.org

3. Variant Effect Predictor http://www.ensembl.org/vep

3. SnpEff tool http://snpeff.sourceforge.net

4. MOLGENIS for scientific data http://molgenis.org

5. Online Mendelian Inheritance in Man http://omim.org

6. HPO terms http://human-phenotype-ontology.github.io

6. Phenomizer http://compbio.charite.de/phenomizer

6. PhenoTips diagnosis https://phenotips.org

7. CADD scores http://cadd.gs.washington.edu

7. GAVIN method http://molgenis.org/gavin

8. European Spreadsheet Risk Interest Group http://www.eusprig.org

9. Solving the Unsolved Rare Diseases http://solve-rd.eu/

Page 36

Thank you! Questions & acknowledgements

• Special thanks to: Morris Swertz, Richard Sinke, Kristin Abbott, Rolf Sijmons

• Collaborators from UMCG, RUG, (inter)national projects

• Colleagues & students:

contact: [email protected]