NGS II Illumina Sequencing

DepthOfCoverage Genetics for Dummies 2017 NGS II Illumina Sequencing Robert Kraaij Department of Internal Medicine [email protected]


Page 1: NGS II Illumina Sequencing

DepthOfCoverage Genetics for Dummies 2017

NGS II – Illumina Sequencing

Robert Kraaij

Department of Internal Medicine

[email protected]

Page 2: NGS II Illumina Sequencing

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 3: NGS II Illumina Sequencing

Things to be addressed

NGS: many short reads that might contain errors

data analysis will handle these reads and errors

Page 4: NGS II Illumina Sequencing

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 5: NGS II Illumina Sequencing

cBot

flowcell

bridgePCR

HiSeq2000

Illumina Sequencing

Page 6: NGS II Illumina Sequencing

Per Cycle Imaging

Page 7: NGS II Illumina Sequencing

G A T C

Per Cycle Imaging

Page 8: NGS II Illumina Sequencing

G

good quality

G

poor quality

Per Cycle Base Calling

Page 9: NGS II Illumina Sequencing

Phred Score    Incorrect base    Accuracy
10             1 in 10           90 %
20             1 in 100          99 %
30             1 in 1000         99.9 %
40             1 in 10000        99.99 %
50             1 in 100000       99.999 %

0 to 93 → ASCII 33 to 126 = single character

Quality Scoring
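The table follows the standard Phred definition, Q = -10 · log10(P), where P is the probability that the base call is wrong. A minimal Python sketch of that conversion and of the single-character encoding (score + 33 = ASCII code) is given here; the function names are illustrative and not part of any tool mentioned in these slides.

```python
import math

PHRED_OFFSET = 33  # Phred+33 encoding: score 0 maps to ASCII 33 ('!')


def phred_to_error_probability(q: int) -> float:
    """Probability that a base with Phred score q is called incorrectly."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> int:
    """Phred score (rounded to an integer) for an error probability p."""
    return round(-10 * math.log10(p))


def phred_to_char(q: int) -> str:
    """Encode a Phred score (0 to 93) as a single printable ASCII character."""
    if not 0 <= q <= 93:
        raise ValueError("Phred score must be between 0 and 93")
    return chr(q + PHRED_OFFSET)


def char_to_phred(c: str) -> int:
    """Decode a single quality character back to its Phred score."""
    return ord(c) - PHRED_OFFSET


# Reproduce the rows of the table above.
for q in (10, 20, 30, 40, 50):
    p = phred_to_error_probability(q)
    print(q, phred_to_char(q), f"1 in {round(1 / p)}", f"{(1 - p) * 100:.3f} %")
```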

Page 10: NGS II Illumina Sequencing

@SEQ_ID

GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC

+SEQ_ID

!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>

FASTQ File
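A FASTQ record is four lines: an identifier starting with "@", the sequence, a separator starting with "+", and one quality character per base. The sketch below parses the record shown on this slide and decodes its quality string, assuming Phred+33 encoding; `FastqRecord` and `parse_fastq_record` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]


def parse_fastq_record(lines: List[str]) -> FastqRecord:
    """Parse one four-line FASTQ record (Phred+33 quality encoding assumed)."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("separator line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    return FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


record = parse_fastq_record([
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC",
    "+SEQ_ID",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>",
])
print(record.read_id, len(record.sequence), record.qualities[:5])
```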

Page 11: NGS II Illumina Sequencing

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

Alignment or Mapping of Reads

REFERENCE GENOME (HG19)

chromosome + position + strand

sample.bam
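To make the idea of mapping concrete, here is a deliberately naive sketch that places a read on a reference by exact string search on both strands and reports chromosome + position + strand. It is not how a real aligner works: tools such as BWA use an index of the reference and tolerate mismatches and gaps. The chromosome name `chrDemo` is a made-up label for the toy reference on this slide.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_read(read: str, reference: str,
             chromosome: str = "chrDemo") -> Optional[Tuple[str, int, str]]:
    """Exact-match mapping; returns (chromosome, 1-based position, strand)."""
    pos = reference.find(read)
    if pos != -1:
        return (chromosome, pos + 1, "+")
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return (chromosome, pos + 1, "-")
    return None  # unmapped


reference = "GATTACGGTACTTGCATAGCT"
print(map_read("TACGGTACTTGCATA", reference))
print(map_read(reverse_complement("TACGGTACTTGCATA"), reference))
```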

Page 12: NGS II Illumina Sequencing

Run QC and filtering

sample.bam

Page 13: NGS II Illumina Sequencing

sample.bam

• both reads

• quality scores

• chromosome

• position

• quality flag

• duplicate flag

• off target flag

sorted BAM file
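In SAM/BAM files, several of the per-read flags listed above are stored together as bits of one integer FLAG field. The sketch below decodes the standard bits defined by the SAM format, including the quality-check and duplicate bits. How the "off target" status is recorded is not specified on the slide and is not a standard FLAG bit, so it is left out here.

```python
from typing import List

# Bit values of the FLAG field as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary_alignment",
    0x200: "failed_quality_checks",
    0x400: "duplicate",
    0x800: "supplementary_alignment",
}


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# 1107 = 1024 + 64 + 16 + 2 + 1
print(decode_flag(1107))
```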

Page 14: NGS II Illumina Sequencing

Coverage

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x coverage
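The coverage at each base is the number of reads overlapping it. As a small sketch, the code below counts per-base depth for the five reads and the toy reference shown on this slide; exact string search stands in for real alignment, which is a simplification.

```python
from typing import List

reference = "GATTACGGTACTTGCATAGCT"
reads = [
    "TACGGTACTTGCATA",
    "ACGGTACTTGCATAG",
    "GATTACGGTACTTGC",
    "GGTACTTGCATAGCT",
    "TTACGGTACTTGCAT",
]


def per_base_depth(reference: str, reads: List[str]) -> List[int]:
    """Count how many reads cover each reference position (exact matches only)."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:
            continue  # read does not match the reference exactly; skip it
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


depth = per_base_depth(reference, reads)
print(depth)
print("maximum depth:", max(depth))
```

The middle of the reference is covered by all five reads (5x), while the ends are covered by fewer.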

Page 15: NGS II Illumina Sequencing

Mean Coverage

mean coverage = bases on target / size of target
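As a worked example of this ratio: five 15-base reads on a 21-base target give 75 / 21, or about 3.6x. The second call combines two figures quoted later in these slides (~54x mean coverage, 44.1 Mb capture target) to show the number of on-target bases that this implies; the function name is illustrative.

```python
def mean_coverage(bases_on_target: int, target_size: int) -> float:
    """Mean coverage = sequenced bases that fall on target / size of the target."""
    if target_size <= 0:
        raise ValueError("target size must be positive")
    return bases_on_target / target_size


# Toy example: five reads of 15 bases on a 21-base target.
print(f"{mean_coverage(5 * 15, 21):.2f}x")

# ~54x over a 44.1 Mb target corresponds to roughly 2.38 Gb on target.
bases_needed = 54 * 44_100_000
print(bases_needed, f"{mean_coverage(bases_needed, 44_100_000):.0f}x")
```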

Page 16: NGS II Illumina Sequencing

% of Bases Above a Certain Threshold

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x 5x 4x 1x
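Mean coverage hides unevenness, so a second metric is the percentage of target bases covered at or above a chosen depth. A minimal sketch follows, using a made-up depth list for illustration and assuming the threshold is inclusive ("10x or better" style).

```python
from typing import Sequence


def percent_bases_at_or_above(depth: Sequence[int], threshold: int) -> float:
    """Percentage of positions whose depth is at least `threshold`."""
    if not depth:
        raise ValueError("depth list is empty")
    covered = sum(1 for d in depth if d >= threshold)
    return 100.0 * covered / len(depth)


# Hypothetical per-base depths for a 10-base target.
example_depth = [1, 1, 2, 3, 4, 4, 5, 5, 5, 5]
for threshold in (1, 4, 5):
    pct = percent_bases_at_or_above(example_depth, threshold)
    print(f">= {threshold}x: {pct:.0f} %")
```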

Page 17: NGS II Illumina Sequencing

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T G C T T G C

G G T G C T T G C A T A G C T

T T A C G G T G C T T G C A T

G = homozygous alternative

- G A T T A C G G T G C

C G G T G C T T G C A T A G C

T G C A T A G C T -

A T T A C G G T G C T T G C A

Page 18: NGS II Illumina Sequencing

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

T T A C G G T A C T T G C A T

A/G = heterozygous

- G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T -

A T T A C G G T G C T T G C A
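The two slides above can be summarised as a rule on the fraction of reads that show the alternative base. The sketch below is a deliberately naive caller: the 20 % and 80 % cut-offs are arbitrary assumptions chosen for illustration, and production variant callers use statistical models that also weigh base and mapping quality.

```python
from collections import Counter


def call_genotype(ref_base: str, observed_bases: str,
                  het_min: float = 0.2, het_max: float = 0.8) -> str:
    """Naive genotype call from the bases observed at one position."""
    counts = Counter(observed_bases)
    total = sum(counts.values())
    if total == 0:
        return "no call"
    non_ref = {base: n for base, n in counts.items() if base != ref_base}
    if not non_ref:
        return "homozygous reference"
    # Use the most frequently observed non-reference base as the alternative.
    alt_base, alt_count = max(non_ref.items(), key=lambda item: item[1])
    alt_fraction = alt_count / total
    if alt_fraction < het_min:
        return "homozygous reference"
    if alt_fraction > het_max:
        return f"{alt_base} = homozygous alternative"
    return f"{ref_base}/{alt_base} = heterozygous"


print(call_genotype("A", "GGGGGGGG"))  # every read shows G
print(call_genotype("A", "GGAGAAGG"))  # a mix of A and G reads
```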

Page 19: NGS II Illumina Sequencing

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

A/G = heterozygous?
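With only three reads (two G, one A) the call is uncertain: the single A could be a real allele or a sequencing error. A rough way to see this is to compare binomial likelihoods of the two explanations. The sketch below makes simplifying assumptions: reads sample the two alleles with equal probability, and the per-base error probability derived from the Phred score is treated as the chance of reading A instead of G.

```python
from math import comb


def binomial_likelihood(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent trials with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n_reads, n_ref_reads = 3, 1  # three reads cover the position, one shows A

# Heterozygous A/G: each read shows A with probability 0.5.
het = binomial_likelihood(n_ref_reads, n_reads, 0.5)
print(f"P(data | heterozygous) = {het:.3f}")

# Homozygous G/G: an A can only appear through a sequencing error.
for q in (10, 20, 30):
    error = 10 ** (-q / 10)
    hom_alt = binomial_likelihood(n_ref_reads, n_reads, error)
    print(f"Q{q}: P(data | homozygous G) = {hom_alt:.6f}")
```

The better the base quality of the A, the less plausible the error explanation becomes, which is why base quality matters for variant calling.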

Page 20: NGS II Illumina Sequencing

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G

sequencing quality

good / poor

Page 21: NGS II Illumina Sequencing

sample.vcf

• chromosome

• position

• quality

• annotations

VCF File
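Each variant in a VCF file is one tab-separated line whose first eight columns are fixed by the VCF format: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO. The sketch below parses one such line; the example line is invented for illustration and does not come from the study data.

```python
from typing import Any, Dict

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least eight columns")
    record: Dict[str, Any] = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    info: Dict[str, Any] = {}
    for entry in record["INFO"].split(";"):
        if entry == ".":
            continue  # "." means no annotations
        key, _, value = entry.partition("=")
        info[key] = value if value else True  # keys without a value are flags
    record["INFO"] = info
    return record


# Hypothetical variant line (made-up values).
variant = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=8;DB")
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"], variant["INFO"])
```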

Page 22: NGS II Illumina Sequencing

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

- G A T T A C G G T A C T T G C A T

deletion = heterozygous

- G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T -

- G A T T A C G G T G C T T G C A

Page 23: NGS II Illumina Sequencing

Paired-End Sequencing

2 x 100 bp

Page 24: NGS II Illumina Sequencing

Variant Calling: Mate Pairs

normal

400 bp

deletion

800 bp

insertion

200 bp
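The logic of this slide: the two reads of a pair come from one fragment of known size (here 400 bp). If they map farther apart on the reference than expected, the sample is missing sequence between them (deletion); if they map closer together, the sample carries extra sequence (insertion). A minimal sketch follows, with an arbitrary tolerance of 100 bp chosen only for illustration.

```python
def classify_pair_distance(observed_bp: int, expected_bp: int = 400,
                           tolerance_bp: int = 100) -> str:
    """Classify a read pair by the distance between its mates on the reference."""
    if observed_bp > expected_bp + tolerance_bp:
        return "deletion"   # mates map too far apart on the reference
    if observed_bp < expected_bp - tolerance_bp:
        return "insertion"  # mates map too close together on the reference
    return "normal"


for distance in (400, 800, 200):
    print(distance, "bp:", classify_pair_distance(distance))
```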

Page 25: NGS II Illumina Sequencing

Variant Calling: Mate Pairs

normal

400 bp

translocation

Page 26: NGS II Illumina Sequencing

Variant Calling: Split Reads

genome

800 bp

mRNA (cDNA)

Page 27: NGS II Illumina Sequencing

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 28: NGS II Illumina Sequencing

Applications

• Re-sequencing → full genome → SNPs and indels

• Re-sequencing → mate pairs → structural variations

• Re-sequencing → regional → SNPs and indels

• Sequencing → de novo assembly

• RNAseq

• ChIPseq

• …seq

Page 29: NGS II Illumina Sequencing

www.illumina.com

Page 30: NGS II Illumina Sequencing

Example:

Exome Sequencing

Page 31: NGS II Illumina Sequencing

funding by NGI-NCHA, NWO, BBMRI

n > 3,000 samples of random set from RS-I

start May 2011; Nimblegen

part of “CHARGE-S” effort:

>5,000 exomes across 4 cohorts

Framingham, CHS, ARIC, Rotterdam Study

Expand with exome variants array?

CHARGE

Exome Sequencing

Page 32: NGS II Illumina Sequencing

Exome vs Full Genome

exon exon exon

genome → 3 Gb

exome → ~30 Mb
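A quick calculation with the two sizes above shows why exome sequencing is attractive: the exome is about 1 % of the genome, so the same amount of sequence gives roughly 100 times more coverage. This assumes, unrealistically, that every read lands on target.

```python
GENOME_SIZE_BP = 3_000_000_000  # genome: 3 Gb
EXOME_SIZE_BP = 30_000_000      # exome: ~30 Mb

exome_fraction = EXOME_SIZE_BP / GENOME_SIZE_BP
coverage_gain = GENOME_SIZE_BP / EXOME_SIZE_BP

print(f"exome is {exome_fraction:.0%} of the genome")
print(f"same sequencing output gives about {coverage_gain:.0f}x more coverage")
```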

Page 33: NGS II Illumina Sequencing

Exome Sequencing Workflow

DNA isolation → Library preparation → Exome capture → Sequencing → Data analysis

Page 34: NGS II Illumina Sequencing

Exome capture

Page 35: NGS II Illumina Sequencing

Nimblegen SeqCap EZ v2 Capture

• CCDS (Sept 2009)

• miRBase (v14, Sept 2009)

• RefSeq (Jan 2010)

• 2,100,000 probes

• 30,246 coding genes

• 329,028 exons

• 710 miRNAs

• 36.5 Mb primary target

• 44.1 Mb capture target

Page 36: NGS II Illumina Sequencing

Illumina TruSeq V3 2x100 PE Sequencing

Page 37: NGS II Illumina Sequencing

Data analysis: BWA-GATK pipeline

Demultiplexing

• BclToFastQ (CASAVA)

• Chastity Filter

Alignment

• BWA (paired)

• SortSam, MarkDuplicates (picard)

Processing

• BaseQualityScore Recalibration, IndelRealignment (GATK)

Variant-Calling

• HaplotypeCaller

• VQSR

• VarEval

Analysis

• ANNOVAR, VCFtools

• PlinkSeq, SKAT, R

• Spotfire

Page 38: NGS II Illumina Sequencing

Sample QC and Variant QC

Page 39: NGS II Illumina Sequencing

RSX-2 Samples were sequenced to ~54x Mean Coverage

[Plot: Percentage of 44Mb covered 10x or better against Average Mean Depth of Coverage across the 44Mb SeqCap Exome]

Page 40: NGS II Illumina Sequencing

Mean Depth of Coverage by Flowcell

[Plot: Mean Depth of Coverage against Flowcell Number (Roughly Chronological Order)]

Page 41: NGS II Illumina Sequencing

Determining Heterozygous Concordance versus 550k genotyping arrays

[Plot: Heterozygous Concordance against Flowcell Number (Roughly Chronological Order)]

Page 42: NGS II Illumina Sequencing

Sample QC and Variant QC

Page 43: NGS II Illumina Sequencing

Number of Detected SNPs per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 44: NGS II Illumina Sequencing

Heterozygous to Homozygous Ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 45: NGS II Illumina Sequencing

Transition to Transversion Ratio

purines (A, G) and pyrimidines (C, T)

transition: purine ↔ purine or pyrimidine ↔ pyrimidine

transversion: purine ↔ pyrimidine
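A substitution is a transition when both bases belong to the same chemical class (A ↔ G or C ↔ T) and a transversion otherwise. The sketch below classifies single-base substitutions and computes the ratio for a list of variants; the example list is made up for illustration.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ti_tv_ratio(substitutions: Iterable[Tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions."""
    kinds = [substitution_type(ref, alt) for ref, alt in substitutions]
    transversions = kinds.count("transversion")
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return kinds.count("transition") / transversions


# Hypothetical SNP list: two transitions and one transversion.
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "T")]))
```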

Page 46: NGS II Illumina Sequencing

Transition to Transversion Ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 47: NGS II Illumina Sequencing

QC and filtering results

Page 48: NGS II Illumina Sequencing
Page 49: NGS II Illumina Sequencing

Things to Remember

NGS: many short reads that might contain errors

coverage indicates the number of independent reads that cover a base → needed to analyse a genome

FASTQ file → sequence + quality scores

BAM file → aligned reads

VCF file → called variants + annotation