
Introduction to Next Generation Sequencing (NGS)

Andrew Parrish, Exeter, 2nd November 2017


Topics to cover today

• What is Next Generation Sequencing (NGS)?

• Why do we need NGS?

• Common approaches to NGS

• NGS Workflow


What is Next Generation Sequencing (NGS)?

• Historically we have used Sanger sequencing to investigate genetic diseases

• This looks at one stretch of DNA from one patient at a time (~600 base pairs in length)

• Measures the fluorescence given off when dye-labelled nucleotides are excited by a laser, to determine the order of bases


What is Next Generation Sequencing (NGS)?

• NGS is also referred to as high-throughput sequencing or massively parallel sequencing

• Generates hundreds of millions of overlapping short sequences (up to 300bp) in a single run

• These have to be computationally put back together

• Can look at multiple patients in one run


Why do we need Next Generation Sequencing (NGS)?

• The Human Genome Project took 13 years to complete (1990 to 2003) using Sanger-based technology, at an estimated cost of $3 billion

• Today, using NGS, this could be completed in a day or two for under $1000


Common approaches to NGS

• Targeted panels (tNGS)
  – Pull out specific genes from the patient’s DNA and only obtain the sequence data from these genes (up to about 150 genes)

• “Rare disease”/“Mendeliome”/“Clinical” exome
  – Essentially a very large (6,110 genes) panel that looks at the exons of genes known to cause human disease (at the time of design!)

• “Whole” exome
  – Looks at the exons of 23,244 expressed genes, which together make up 1-2% of the human genome

• Genome sequencing
  – Looks at the complete (ish) DNA sequence from a patient


Common approaches to NGS

• Single gene disease
  – Easily clinically recognisable disease
  – Single genetic aetiology (mutations in one gene cause this disorder)
  – Existing tests widely available in diagnostic laboratories

• Small number of genes for a disease
  – Clinically recognisable disease
  – Multiple sub-types caused by mutations in different genes
  – Highly developed clinical expertise and knowledge available in specialist centres

• Large number of possible causes (or no known cause)
  – Strong suggestion of monogenic disease, but no clear clue to which gene to test


Workflow for NGS

Patient

Extract DNA, prepare library and sequence

Raw reads (FASTQ)

Assess quality and process reads

Processed reads (FASTQ)

Map to reference genome

Aligned reads (SAM/BAM)

Call variants (VCF)

Variant and sample quality control

Annotate variants

Assess depth and breadth of coverage

Filter and prioritise variants

Integrate with clinical data

Shortlist of disease-related variants

Visualise data

Diagnostic report
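The "assess quality and process reads" step operates on FASTQ records, where each read carries one quality character per base. A minimal sketch of decoding those qualities and trimming a low-quality tail follows. It assumes Phred+33 encoding; the record, the function names and the cut-off of 20 are illustrative only and are not taken from the slides.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (read id, bases, Phred scores)."""
    header, bases, _plus, quality = (line.strip() for line in lines)
    # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], bases, scores


def trim_low_quality_tail(bases, scores, min_quality=20):
    """Drop bases from the 3' end while their score is below min_quality."""
    end = len(bases)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return bases[:end], scores[:end]


# Hypothetical record: five good bases followed by two poor ones.
record = ["@read1", "GATTACA", "+", "IIIII##"]
read_id, bases, scores = parse_fastq_record(record)
trimmed_bases, trimmed_scores = trim_low_quality_tail(bases, scores)
print(read_id, trimmed_bases, trimmed_scores)
```

Real pipelines use dedicated quality-control and trimming tools for this step; the sketch only shows what the quality line encodes.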


Workflow for NGS

Patient

Extract DNA, prepare library and sequence

Quality control

Map to reference genome

Call variants

Annotate variants

Shortlist of disease-related variants

Visualise data

Diagnostic report


DNA extraction and library preparation

Genomic DNA

Fragment

Target

Attach adaptors for paired-end sequencing


Sequencing


Read mapping

.. TAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGGTGTCTAGGGAAGAAGGTGTGTGACCAGGGAGGTCCC .. Reference Genome

ATCTTGTAGGGAAACACAAAGTG GTCTAGGGAAGAAGG

• After base calling, align/map sequences onto reference genome

• Determine coordinates (chromosomal position) and add basic annotations (coding, non-coding, etc.) if known
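The idea of placing reads on a reference can be illustrated with a brute-force sketch that slides each read along the reference and counts mismatches. The reference string is the one shown on the slide; the way the reads are split, the read carrying a single-base change, and the function name `map_read` are assumptions made for illustration. Real aligners use indexed data structures rather than this exhaustive scan.

```python
REFERENCE = (
    "TAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGG"
    "TGTCTAGGGAAGAAGGTGTGTGACCAGGGAGGTCCC"
)


def map_read(reference, read, max_mismatches=0):
    """Return (0-based offset, mismatches) of the best placement, or None."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches > max_mismatches:
            continue
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


# Assumed split of the reads drawn under the reference on the slide.
reads = ["ATCTTGTAGG", "GAAACACAAAGTG", "GTCTAGGGAAGAAGG"]
placements = {read: map_read(REFERENCE, read) for read in reads}

# Hypothetical read carrying one single-base change relative to the reference.
variant_read = "GTCTAGGCAAGAAGG"
strict = map_read(REFERENCE, variant_read)                    # exact match only
tolerant = map_read(REFERENCE, variant_read, max_mismatches=1)

for read, hit in placements.items():
    print(read, hit)
print(variant_read, strict, tolerant)
```

Allowing a small number of mismatches is what lets a read that contains a genuine variant still be placed at the correct coordinates.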


Read mapping


Coverage

• Vertical coverage – how many times a particular base has been sequenced (e.g. 20X, 30X etc.)
  – Greater depth of coverage means improved accuracy for variant detection (but is more expensive)

• Horizontal coverage – how much of the genome has been sequenced
  – Greater target size means more genome is sequenced (but is more expensive)
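Both measures can be computed from the positions of the aligned reads. The sketch below is a toy calculation on a 10-base target with three made-up reads; the function names and the interval convention (0-based start, exclusive end) are assumptions for illustration.

```python
def depth_per_base(target_length, read_intervals):
    """Count how many reads cover each base of the target (vertical coverage)."""
    depth = [0] * target_length
    for start, end in read_intervals:  # 0-based start, end exclusive
        for pos in range(max(start, 0), min(end, target_length)):
            depth[pos] += 1
    return depth


def mean_depth(depth):
    """Average depth across the target, e.g. 30 for a '30X' experiment."""
    return sum(depth) / len(depth)


def breadth(depth, min_depth=1):
    """Fraction of the target covered at min_depth or more (horizontal coverage)."""
    return sum(d >= min_depth for d in depth) / len(depth)


# Hypothetical alignments of three 6-base reads on a 10-base target.
reads = [(0, 6), (2, 8), (4, 10)]
depth = depth_per_base(10, reads)
print(depth)              # per-base depth
print(mean_depth(depth))  # mean depth
print(breadth(depth, 2))  # fraction of bases covered at least twice
```

Reporting breadth at a minimum depth (for example, the fraction of target bases at 20X or more) is a common way to combine the two measures.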


Coverage


Variant calling


Variant calling

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT germline
chr4 27668 . T C 8.65 . DP=2;AF1=1;AC1=4;… GT:PL:DP:SP:GQ 0/1:0,0,0:0:0:3
chr4 27669 . G T 4.77 . DP=2;AF1=1;AC1=4;… GT:PL:DP:SP:GQ 0/1:0,0,0:0:0:3
chr4 27712 . T C 44 . DP=2;AF1=1;AC1=4;… GT:PL:DP:SP:GQ 1/1:40,3,0:1:0:8
chr4 27774 . G A 5.47 . DP=2;AF1=0.5011;AC1=2;… GT:PL:DP:SP:GQ 0/1:34,0,23:2:0:28
chr4 36523 . A T 10.4 . DP=1;AF1=1;AC1=4;… GT:PL:DP:SP:GQ 0/1:0,0,0:0:0:3
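Each VCF data line is tab-separated, with the INFO column holding `key=value` pairs and the FORMAT column naming the colon-separated fields of each sample column. A minimal sketch of reading one such line follows. It uses the chr4:27774 record from the slide with the elided part of INFO left out; the function name `parse_vcf_line` is hypothetical, and a real pipeline would normally use an established VCF library instead.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line, sample_name="germline"):
    """Parse one single-sample VCF data line into a nested dictionary."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    # QUAL may be "." when the caller did not report a quality.
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    # INFO is a semicolon-separated list; flags without "=" are skipped here.
    record["INFO"] = dict(
        item.split("=", 1) for item in record["INFO"].split(";") if "=" in item
    )
    # FORMAT names the per-sample fields, in order.
    keys = record["FORMAT"].split(":")
    record[sample_name] = dict(zip(keys, fields[9].split(":")))
    return record


# The chr4:27774 record from the slide; the elided INFO entries are omitted.
line = "chr4\t27774\t.\tG\tA\t5.47\t.\tDP=2;AF1=0.5011;AC1=2\tGT:PL:DP:SP:GQ\t0/1:34,0,23:2:0:28"
record = parse_vcf_line(line)
print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
print(record["INFO"]["DP"], record["germline"]["GT"])
```

In the sample column, `GT` of `0/1` denotes a heterozygous call and `1/1` a homozygous alternate call.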


Variants

• A variant is a DNA sequence that is different to the “normal” sequence for a particular species.

• These should be named according to standardised nomenclature (HGVS)

• This allows consistent reporting and must include:
  – Reference sequence - e.g. NM_0000123.4
  – cDNA change - e.g. c.123A>G
  – Protein change - e.g. p.(V59M) or p.(Val59Met)
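To show how the cDNA and protein parts of such a description break down, here is a small sketch that parses simple substitutions only. The regular expressions and function names are illustrative assumptions; they cover a tiny subset of HGVS nomenclature (no indels, no intronic positions, and only three-letter amino acid codes) and are not a validator.

```python
import re

# Simple coding-DNA substitution, e.g. c.123A>G
CDNA_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Simple protein substitution with three-letter codes, e.g. p.(Val59Met)
PROTEIN_SUBSTITUTION = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")


def parse_cdna(description):
    """Return (position, reference base, alternate base), or None if unsupported."""
    match = CDNA_SUBSTITUTION.match(description)
    if match is None:
        return None
    position, ref, alt = match.groups()
    return int(position), ref, alt


def parse_protein(description):
    """Return (reference residue, position, new residue), or None if unsupported."""
    match = PROTEIN_SUBSTITUTION.match(description)
    if match is None:
        return None
    ref, position, alt = match.groups()
    return ref, int(position), alt


print(parse_cdna("c.123A>G"))
print(parse_protein("p.(Val59Met)"))
print(parse_cdna("c.123delA"))  # outside the subset handled here
```

The brackets around the protein change indicate that the consequence is predicted from the DNA change rather than observed experimentally.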



Variant types

The sun was hot but the man did not get his hat.

• SNV – a change to a single base pair
  The sun wos hot but the man did not get his cat.
  The sun was .ot but the man did not get his hat.

• Small insertion/deletion (InDel) – in frame
  The sun hot but the man did not get his hat.
  The sun was too hot but the man did not get his hat.

• Small insertion/deletion (InDel) – frameshift
  The sun wah otb utt hem and idn otg eth ish at
  The sun wwa sho tbu tth ema ndi dno tge thi sha t
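The sentence analogy works because every word has three letters, like a codon. The sketch below reproduces the in-frame and frameshift examples by editing the letter string and re-reading it in threes; the helper names are made up for illustration.

```python
SENTENCE = "Thesunwashotbutthemandidnotgethishat"  # the slide's sentence, spaces removed


def read_in_threes(letters):
    """Group letters into three-letter 'codons', as the ribosome reads mRNA."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def delete(letters, start, length=1):
    """Remove `length` letters beginning at 0-based index `start`."""
    return letters[:start] + letters[start + length:]


def insert(letters, start, new_letters):
    """Insert `new_letters` before 0-based index `start`."""
    return letters[:start] + new_letters + letters[start:]


original = read_in_threes(SENTENCE)
in_frame_deletion = read_in_threes(delete(SENTENCE, 6, 3))       # lose "was"
in_frame_insertion = read_in_threes(insert(SENTENCE, 9, "too"))  # gain "too"
frameshift_deletion = read_in_threes(delete(SENTENCE, 8, 1))     # lose one "s"
frameshift_insertion = read_in_threes(insert(SENTENCE, 6, "w"))  # gain one "w"

for text in (original, in_frame_deletion, in_frame_insertion,
             frameshift_deletion, frameshift_insertion):
    print(text)
```

Changes whose length is a multiple of three leave the downstream words intact, while any other length scrambles everything after the change.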



Variant pathogenicity

• A variant is pathogenic if it interferes with normal protein production.
  – There are many ways that this can happen!

[Figure: gene diagram with labels: Stop codon; Change amino acid; Change splice site, add intron; Change splice site, remove exon; New stop codon; Frameshift, causing stop codon later; Regulatory region]


Variant prioritisation

• Frameshift and stop gain (nonsense substitution) variants are highly likely to be pathogenic.

• Splicing variants are likely to be pathogenic, but need checking with a splicing predictor.

• Missense variants can be pathogenic, and there are in-silico tools to predict the effect. The effect depends on how the amino acids are changed.

• Synonymous substitutions are very unlikely to be pathogenic unless they affect splicing.
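These rules of thumb can be turned into a crude ordering of variants for review. The sketch below ranks variants by consequence type only; the consequence labels, the tier numbers and the example variants are assumptions for illustration, and the ordering is a review aid rather than a pathogenicity classification.

```python
# Lower tier = review first. Tiers follow the bullets above; labels are hypothetical.
CONSEQUENCE_TIER = {
    "frameshift": 0,
    "stop_gained": 0,
    "splice_site": 1,
    "missense": 2,
    "synonymous": 3,
}


def prioritise(variants):
    """Sort variants so the consequence types most likely to be pathogenic come first."""
    unknown_tier = len(CONSEQUENCE_TIER)  # unrecognised consequences go last
    return sorted(
        variants,
        key=lambda v: CONSEQUENCE_TIER.get(v["consequence"], unknown_tier),
    )


# Made-up variants for illustration.
variants = [
    {"name": "variant_a", "consequence": "synonymous"},
    {"name": "variant_b", "consequence": "missense"},
    {"name": "variant_c", "consequence": "stop_gained"},
    {"name": "variant_d", "consequence": "splice_site"},
]
ranked = [v["name"] for v in prioritise(variants)]
print(ranked)
```

Splicing and missense variants would still need the predictor checks mentioned above before any conclusion is drawn.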


Variant prioritisation

~30,000 variants

Exclude common variants

Identify potential pathogenic mutation(s)

Causal mutation(s)


Variant annotation and filtering

• We can pull information in from a variety of external sources, including:
  – “Population” databases, e.g. ExAC and dbSNP
    • These provide an approximation of the variants that are common in the population and may be excluded from consideration
  – Disease databases, e.g. HGMD
    • These provide a list of the known disease-causing mutations seen in a variety of settings and may be a flag for prioritisation
  – In silico analysis packages, e.g. SIFT, PolyPhen
  – Phenotypic terms provided by clinician using HPO
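A minimal sketch of the filtering logic follows: drop variants that are common in a population database and flag those already listed in a disease database. All data here are invented, the 1% frequency cut-off is an assumption rather than a value from the slides, and the dictionaries stand in for lookups against resources such as ExAC or HGMD without using their real interfaces.

```python
def filter_variants(variants, population_af, known_disease, max_af=0.01):
    """Exclude common variants and flag those seen in a disease database."""
    shortlist = []
    for variant in variants:
        # Variants absent from the population database are treated as frequency 0.
        allele_frequency = population_af.get(variant, 0.0)
        if allele_frequency > max_af:
            continue  # common in the population, so excluded from consideration
        shortlist.append({
            "variant": variant,
            "population_af": allele_frequency,
            "known_disease": variant in known_disease,
        })
    # Known disease variants first, then the rarest variants.
    shortlist.sort(key=lambda row: (not row["known_disease"], row["population_af"]))
    return shortlist


# Hypothetical variants as (chromosome, position, ref, alt).
variants = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
]
population_af = {("chr1", 1000, "A", "G"): 0.25, ("chr3", 3000, "G", "A"): 0.001}
known_disease = {("chr3", 3000, "G", "A")}

shortlist = filter_variants(variants, population_af, known_disease)
for row in shortlist:
    print(row)
```

In practice the cut-off depends on the disease and inheritance pattern, and the in silico predictions and HPO terms would be added as further annotation columns before review.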


Questions?