Considerations for Analyzing Targeted NGS Data Introduction Tim Hague, CTO

Post on 25-Dec-2015

Considerations for Analyzing Targeted NGS Data

Introduction

Tim Hague, CTO

Introduction

Many mapping, alignment and variant calling algorithms

Most of these have been developed for whole genome sequencing and to some extent population genetic studies.

Premise

In contrast, NGS based diagnostics deals with particular genes or mutations of an individual.

Different diagnostic targets present specific challenges.

Goal

Present analysis issues related to differences in:

Sequencing technologies
Targeting technologies
Target specifics
Pseudogenes and segmental duplication

NGS Sequencers

Illumina
Ion Torrent
Roche 454
(SOLiD)

Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

Sequencing Technology

Differences:

Homopolymer error rates
G/C content errors
Read length
Sequencing protocols (single vs paired reads)
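
As a rough illustration of two of the differences listed above, the sketch below measures G/C content and locates homopolymer runs in a sequence. The function names and the `min_len` cutoff of 4 are assumptions made for this example, not values from the talk; real error profiles depend on the sequencer and chemistry.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def homopolymer_runs(seq: str, min_len: int = 4):
    """Return (start, base, length) for each run of one base at least min_len long.

    Positions are 0-based. min_len=4 is an arbitrary illustrative cutoff.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs
```

Regions flagged by either function are candidates for closer inspection when comparing calls across sequencing platforms.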

Targeting Methods

PCR primers (e.g. amplicons)
Hybridization probes (e.g. exome kits)

Targeting Technology

Differences:

Exact matching regions vs regions with SNPs.

Results in:

Need for mapping against whole chromosomes to avoid false positives.

Analysis Targets

Differences:

Rate of polymorphism
Repetitive structures
Mutation profiles
G/C content
Single genes vs multi gene complexes

BRCA1/2    HLA     CFTR
1/2000     1/29    1/2000

Distributions of insertions and deletions
Distribution of repeat elements

Segmental Duplications

Sometimes called Low Copy Repeats (LCRs)
Highly homologous, >95% sequence identity
Rare in most mammals
Comprise a large portion of the human genome (and other primate genomes)

Important for understanding HLA
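
To make the ">95% sequence identity" figure concrete, here is a minimal sketch that computes percent identity between two sequences that have already been aligned. It assumes equal-length aligned strings with `-` for gaps and counts a gap against a base as a mismatch; identity can be defined in other ways, so treat this as one possible convention rather than the definition used in the talk.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base.

    Assumes the inputs are already aligned (equal length, '-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```

Two copies scoring above 95 under this measure would be hard to tell apart with short reads, which is why such regions complicate mapping.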

Segmental Duplications

Many LCRs are concentrated in "hotspots"

Recombinations in these regions are responsible for a wide range of disorders, including:

Charcot-Marie-Tooth syndrome type 1A
Hereditary neuropathy with liability to pressure palsies
Smith-Magenis syndrome
Potocki-Lupski syndrome

Data Analysis Tools

Differences:

Detection rates of complex variants (sensitivity)
False positive rates (accuracy)
Speed
Ease of use

Data analysis shouldn’t be like this!

“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”

Mark Yandell in BioIT-World.com, June 8, 2011

Examples

Missing variants

SNPs, a DNP and deletions

Identify more valid variants

Find homopolymer indels

Examples

Coverage differences

Four times exon coverage

[Figure: two coverage tracks, data ranges 0-432 and 0-96]

Higher exome coverage

[Figure: two coverage tracks, data ranges 0-24 and 0-10]

First conclusion

Read accuracy is not the limiting factor in accurate variant analysis.

Example

Dense region of SNPs

www.omixon.com

Second conclusion

As variant density increases the performance of most tools goes down.
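
One way to find the regions this conclusion refers to is to count called variants per fixed-size window. The sketch below does that; the `window` size of 100 bp and the `threshold` of 3 variants are arbitrary values chosen for illustration, not thresholds from the talk.

```python
from collections import Counter


def variant_density(positions, window=100):
    """Count variants per fixed-size window; return {window_start: count}.

    positions are variant coordinates on a single chromosome.
    """
    counts = Counter((pos // window) * window for pos in positions)
    return dict(sorted(counts.items()))


def dense_windows(positions, window=100, threshold=3):
    """Return the start of every window holding at least threshold variants."""
    density = variant_density(positions, window)
    return [start for start, count in density.items() if count >= threshold]
```

Windows returned by `dense_windows` are where calls from different tools are most worth comparing by eye.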

Variant Calling

There are a few popular variant callers: GATK, SAMtools mpileup, VarScan

The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step

Running these recalibration and realignment steps before any variant calling is highly recommended

Deduplication and removing non-primary alignments may also be required
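
The last point can be sketched with the FLAG field of SAM records. The bit values below come from the SAM specification; the function names are hypothetical, and the sketch assumes duplicates have already been marked by an upstream tool, since it only filters on existing flags. Treating supplementary alignments as non-primary is also an assumption of this example.

```python
# FLAG bits as defined in the SAM specification.
FLAG_SECONDARY = 0x100      # secondary (non-primary) alignment
FLAG_DUPLICATE = 0x400      # PCR or optical duplicate (must be marked upstream)
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment (assumed unwanted here)

DROP_MASK = FLAG_SECONDARY | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY


def keep_for_calling(flag: int) -> bool:
    """True if none of the unwanted FLAG bits are set."""
    return (flag & DROP_MASK) == 0


def filter_sam_lines(lines):
    """Yield header lines unchanged and only those records that pass the filter."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        if keep_for_calling(int(fields[1])):
            yield line
```

In practice the same filtering is usually done on BAM files with existing tools; the point here is only which flag bits are involved.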

Indel realigner problem

Variants that can be hard to find

DNPs
TNPs
Small indels next to SNPs
30+ bp indels
Homopolymer indels
Homopolymer indel and SNP together
Indels in palindromes
Dense regions of variants
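
DNPs and TNPs are often reported as separate neighbouring SNPs. The sketch below shows one simple way to merge SNP calls at consecutive positions into a single multi-nucleotide variant. It assumes all calls are on one chromosome and on the same haplotype, which a real pipeline would have to verify from the reads; the `(pos, ref, alt)` tuple layout is a convention chosen for this example.

```python
def merge_adjacent_snps(snps):
    """Merge SNPs at consecutive positions into multi-nucleotide variants.

    snps: iterable of (pos, ref, alt) single-base substitutions.
    Assumes one chromosome and a shared haplotype (not checked here).
    Returns a position-sorted list of (pos, ref, alt) tuples.
    """
    merged = []
    for pos, ref, alt in sorted(snps):
        if merged:
            last_pos, last_ref, last_alt = merged[-1]
            # The new SNP directly follows the end of the previous variant.
            if last_pos + len(last_ref) == pos:
                merged[-1] = (last_pos, last_ref + ref, last_alt + alt)
                continue
        merged.append((pos, ref, alt))
    return merged
```

A merged entry with a two-base `ref` corresponds to a DNP and a three-base `ref` to a TNP.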