Post on 12-Aug-2020

Next generation sequencing (NGS): procedures and applications

Istituto di Tecnologie Biomediche Consiglio Nazionale delle Ricerche

ITB - CNR

Ermanno Rizzi, PhD

Ferrara 05/05/2014 Ermanno.rizzi@itb.cnr.it

NGS…what?

• High-performance sequencing: high-throughput sequencing (HTS)

Why?

• Demand for more sequencing data at lower cost (compared to Sanger)

When?

• In 2005, the first NGS platform: GS20, 454 Life Sciences

…something is changing

• High quantity and quality of available data
• New genomics applications

…what will change?

• New multidisciplinary approaches

Next Generation Sequencing NGS: Intro

Workflow

• Reference sequence
• Mapping reads
• Depth (“X” or fold)
• Coverage (% of reference sequence)

Next Generation Sequencing NGS: Intro

Keywords
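The two keywords "depth" and "coverage" are easy to confuse, so a small worked example may help. The sketch below assumes reads are already mapped and represented as 0-based, end-exclusive (start, end) intervals on the reference; the function name and the toy intervals are illustrative, not part of the original slides.

```python
def depth_and_coverage(reference_length, mapped_reads):
    """Return (mean depth, coverage %) for reads mapped to one reference.

    mapped_reads: iterable of (start, end) intervals, 0-based, end-exclusive.
    This interval representation is an assumption made for the example.
    """
    per_base = [0] * reference_length
    for start, end in mapped_reads:
        # Clip each read to the reference boundaries before counting.
        for pos in range(max(start, 0), min(end, reference_length)):
            per_base[pos] += 1

    sequenced_bases = sum(per_base)
    covered_positions = sum(1 for depth in per_base if depth > 0)

    mean_depth = sequenced_bases / reference_length          # the "X" or fold
    coverage_pct = 100.0 * covered_positions / reference_length
    return mean_depth, coverage_pct


if __name__ == "__main__":
    toy_reads = [(0, 50), (25, 75), (60, 100)]  # hypothetical mapped reads
    depth, coverage = depth_and_coverage(200, toy_reads)
    print(f"mean depth: {depth:.2f}X, coverage: {coverage:.1f}%")
```

In this toy case 140 bases are sequenced over a 200-base reference (0.7X), but only half of the reference positions are touched by at least one read (50% coverage), which is why the two numbers are reported separately.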

NGS Platforms

• Roche/454: Genome Sequencer FLX-Titanium, Junior
• Illumina: Genome Analyzer, HiSeq, MiSeq
• Life Technologies: SOLiD 5500 W Series Genetic Analysis Systems
• Life Technologies: Ion Personal Genome Machine (PGM) and Proton

Next Generation Sequencing NGS: Intro

NGS platform comparison

Platform | Sequencing chemistry | Signal detection
Roche/454 | Single-nucleotide addition (SNA) by pyrosequencing | Luminescence
Ion Torrent (PGM or Proton) | Single-nucleotide addition (SNA) by semiconductor sequencing | Chip pH
Illumina | Sequencing by synthesis (SBS) using cyclic reversible termination (CRT) | Fluorescence
SOLiD | Sequencing by ligation (SBL) | Fluorescence

NGS platform distribution

Platform | Run time | Read length (bp) | Gigabases/run | Reagent cost/run
Roche/454 Titanium FLX+ | 20 hrs | 800 | 0.8 | $6,200
Roche/454 Titanium Junior | 10 hrs | 400 | 0.04 | $977
Ion Torrent - Proton I | 4 hrs | 175 | 12.2 | $1,000
Illumina HiSeq 2500 - high output v3 | 2 days | 50 | 75 | $5,866
Illumina HiSeq 2500 - high output v3 | 11 days | 200 | 300 | $13,580
Life Technologies SOLiD – 5500xl | 8 days | 110 | 155 | $10,503

NGS cost comparison
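The run-level figures in the table become easier to compare once they are normalised to reagent cost per gigabase. The sketch below does that arithmetic with the values transcribed from the table; the function names are illustrative, and the result covers reagents only (instrument, labour and library preparation costs are not in the table).

```python
# (platform, run time, gigabases per run, reagent cost per run in USD),
# transcribed from the table above.
RUNS = [
    ("Roche/454 Titanium FLX+", "20 hrs", 0.8, 6200),
    ("Roche/454 Titanium Junior", "10 hrs", 0.04, 977),
    ("Ion Torrent - Proton I", "4 hrs", 12.2, 1000),
    ("Illumina HiSeq 2500 - high output v3", "2 days", 75, 5866),
    ("Illumina HiSeq 2500 - high output v3", "11 days", 300, 13580),
    ("Life Technologies SOLiD - 5500xl", "8 days", 155, 10503),
]


def cost_per_gigabase(gigabases, reagent_cost):
    """Reagent cost divided by yield; both values refer to a single run."""
    if gigabases <= 0:
        raise ValueError("gigabases per run must be positive")
    return reagent_cost / gigabases


def rank_by_cost(runs):
    """Return (platform, run time, USD per Gb) rows, cheapest first."""
    rows = [
        (name, run_time, cost_per_gigabase(gigabases, cost))
        for name, run_time, gigabases, cost in runs
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for name, run_time, usd_per_gb in rank_by_cost(RUNS):
        print(f"{name} ({run_time}): {usd_per_gb:,.2f} USD/Gb")
```

Dividing the same figure by 10^9 gives the cost per base, one of the criteria usually weighed when choosing a platform.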

Target sample -> Library preparation -> Library amplification -> NGS -> Signal detection and analysis -> Reads analysis -> Final data

NGS workflow: from sample prep to final data

Library preparation

DNA fragmentation

• Sonication
• Ultrasonication (Covaris)
• Nebulization
• Enzymatic

Adapter addition and multiplexing (MID or index)

• Ligation
• Tagmentation
• Paired ends or mate pair

Library quality control and quantitation

• Fluorometer
• qPCR
• Agilent Bioanalyzer

…to library amplification
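Multiplexing means that reads from several samples share one run and are separated afterwards by their MID or index. As a minimal sketch, assuming an inline barcode at the very start of each read and exact matching only, demultiplexing could look like the code below. The barcode sequences and sample names are hypothetical, and real pipelines usually tolerate mismatches and may read the index separately.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample using an exact match on the leading barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name (hypothetical values).
    Reads with no matching barcode are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the insert sequence is kept.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    example_barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
    example_reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "GGGGGGGGGGGG"]
    for sample, sample_reads in demultiplex(example_reads, example_barcodes).items():
        print(sample, sample_reads)
```

The sketch relies on no barcode being a prefix of another, which is one reason barcode sets are designed with a fixed length and a minimum distance between members.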

Library preparation procedure: adapter ligation

Roche/454 Rapid Protocol:

• For all Roche/454 applications
• Starting material: 500 ng of gDNA

Library preparation procedure: enzymatic “tagmentation”

Illumina Nextera “tagmentation”

• For small genomes
• Small amount of starting material: 1 or 50 ng of gDNA

Library preparation procedure: paired ends

Amplification and selection

• Coupling the ends of large fragments: 3, 8 or 20 kb

• Large contigs
• Enhanced scaffolding
• Complete genome sequencing
• Structural variants

Roche/454 Rapid Protocol:

• Library captured onto the bead surface

• Water-in-oil emulsion creates microreactors

• Millions of DNA fragments amplified on the bead surface

• Recovery of DNA beads by breaking the emulsion

• Enrichment to eliminate null beads and recover positive beads

Library amplification: Emulsion PCR (emPCR)

emPCR

Illumina bridge amplification

• High-density primers attached to the slide
• Solid-phase amplification
• 100–200 million spatially separated template clusters

Library amplification: Bridge Amplification

www.molecularecologist.com

Error Rates

Instrument | Primary errors | Single-pass error rate (%) | Final error rate (%)
3730xl (capillary) | Substitution | 0.1-1 | 0.1-1
454, all models | Indel | 1 | 1
Illumina, all models | Substitution | ~0.1 | ~0.1
Ion Torrent – all chips | Indel | ~1 | ~1
SOLiD – 5500xl | A-T bias | ~5 | ≤ 0.1

Platforms error rates
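A per-base error rate is easier to interpret once it is scaled to a whole read. The sketch below is a back-of-the-envelope calculation that assumes errors are independent and uniform along the read, which real platforms do not strictly satisfy; the pairing of error rates with read lengths is only an illustration.

```python
def expected_errors(read_length, error_rate_percent):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate_percent / 100.0


def prob_error_free(read_length, error_rate_percent):
    """Probability that a read contains no error at all.

    Assumes independent, uniformly distributed errors (a simplification).
    """
    return (1.0 - error_rate_percent / 100.0) ** read_length


if __name__ == "__main__":
    # (label, read length in bases, single-pass error rate in %), illustrative pairs
    examples = [
        ("1% error rate, 800-base read", 800, 1.0),
        ("0.1% error rate, 200-base read", 200, 0.1),
    ]
    for label, length, rate in examples:
        print(
            f"{label}: {expected_errors(length, rate):.1f} expected errors, "
            f"P(error-free) = {prob_error_free(length, rate):.4f}"
        )
```

Under these assumptions a long read at 1% carries several errors on average, while a short read at 0.1% is error-free most of the time, which is part of why depth is used to reach a lower consensus error rate.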

The best platform is…

Things to consider:

• Amount of data
• Read length
• Support from the company
• User community
• Post-sequencing requirements
• Platform cost
• Kit cost
• Cost per base

• De novo vs re-sequencing

• Target: DNA, RNA

• Single ends vs paired ends

• Sequencing approaches: shotgun vs enrichment
  • Enrichment by PCR
  • Enrichment by probe capture
  • ChIP-Seq for epigenetic studies

NGS applications

Target: DNA and RNA

DNA Seq = Genomics

• Genome sequencing (nuDNA or mtDNA)
• Exome
• Variant calling: mutations and SNPs
• Copy number variation (CNV)
• Epigenetics
  • ChIP-Seq: promoter methylation, histone modifications, transcription factors

RNA Seq = Transcriptomics

• Gene expression levels
• Variant calling
• Splicing variants
• Fusion transcripts
• Transcript discovery

Sample preparation for RNA Seq

RNA Seq: sequencing of

• mRNA
• ncRNA
• small RNA
• microRNA

Poly-T or random hexamers

Total genomic DNA

• Direct sequencing
• Capture -> Capture sequencing
• PCR -> Amplicon sequencing

NGS applications: DNA

• Capture probe design and synthesis
• Library preparation
• Pre-amplification
• Capture probe hybridization (“fishing”)
• Washing
• Target recovery
• NGS protocol

Enrichment by probe capture

Capture sequencing: the target

Exome

• Targets: exons
• ~1% of the human genome
• Size: ~30 Mb
• ~85% of disease-related mutations
• Multiple-sample variant calling (MSVC) for low-pass sequencing

Custom

• Diagnostic genome regions
• Chromosomes
• Specific regions (kinases, transcription factors…)
• Specific genes (HLA, MHC, etc.)

Aims

• Rare and common variant identification
  • Single nucleotide
  • Insertions and deletions (InDels)
• SNP analysis
• Copy number variations (CNV)
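The target size quoted for the exome (~30 Mb) is what drives the sequencing effort needed for a capture experiment. As a rough planning sketch, the number of reads can be estimated from target size, desired mean depth and read length. The depth, read length and on-target fraction used below are hypothetical inputs, not values from the slides, and the estimate ignores duplicates and uneven capture.

```python
import math


def reads_needed(target_size_bp, mean_depth, read_length_bp, on_target_fraction=1.0):
    """Estimate the reads required to reach a mean depth over a target.

    on_target_fraction: share of sequenced bases that fall on the target
    (an assumed capture efficiency; 1.0 means a perfect capture).
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    total_bases = target_size_bp * mean_depth / on_target_fraction
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    exome_size = 30_000_000  # ~30 Mb, as stated for the exome target
    # Hypothetical run design: 50X mean depth, 100-base reads, 70% on target.
    print(reads_needed(exome_size, mean_depth=50, read_length_bp=100,
                       on_target_fraction=0.7))
```

Because the exome is about 1% of the genome, the same mean depth over the whole genome would need roughly a hundred times more reads, which is the main economic argument for enrichment.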

Looking forward…

Sanger Sequencing

NGS

Third Generation Sequencing (TGS)

Third generation: single molecule sequencing

Company name | TGS principle
Helicos Genetic Analysis Platform | Virtual Terminator nucleotides
Pacific Biosciences | Anchored DNA polymerase + zero-mode waveguide (ZMW)
VisiGen Biotechnologies | Modified DNA polymerase + fluorescence resonance energy transfer (FRET)
Halcyon Molecular, ZS Genetics | Transmission electron microscopy (TEM)
Oxford Nanopore | Modified α-hemolysin pore + measurement of ionic current

TGS features

• No “wash-and-scan” technology
• “Real time”: really fast
• No synchronization required, so no dephasing problem
• Single-molecule sequencing

Applications @ ITB-CNR

• Shotgun: bacterial genome finishing
• PCR enrichment: integrome study in gene therapy
• Variant calling in ancient DNA
• RNA Seq: transcriptome of breast cancer
• Metagenomics

Shotgun: bacteria genome finishing

Fuel droplet; A. venetianus colonies

Circular representations of A. venetianus VE-C3 chromosome and plasmids.

Acinetobacter venetianus VE-C3 genome sequencing

• Roche/454 + Illumina sequencing
• 3,564,836 bp were assembled

Adhesion to oil fuel: wee cluster for n-alkane adhesion

Metabolism of n-alkanes: alk-like sequences, cytochrome P450

Resistance to heavy metals: As, Cd, Co, Cr, Cu, Hg, Pb, Zn found in the Venice Lagoon

Identification of bioremediation and resistance clusters

Phylogenetic analysis conducted using a set of conserved proteins: FusA, IleS, LepA, LeuS, PyrG, RecA, RecG, RplB, RpoB

Each genome is represented by an arc and the different genomes (arcs) are connected by vertices accounting for their shared sequence similarity

Phylogenetic analysis: Acinetobacter pangenome

BLAST comparisons of Acinetobacter species.

Integrome study in Gene Therapy

Proviral vector Integration

Integrome study in Gene Therapy

Human genome

Proviral vector

GATCCGTTTCAGTCGATCAGTGGGCATA

Integration site (IS) nucleotide sequence

Integrome: all detectable IS in the human genome

Recovery of vector-genome junctions: ligation-mediated PCR (LM-PCR)

Figure: integrated proviral vector (5’ LTR, 3’ LTR) within genomic DNA, with Pst I and Mse I restriction sites and the linker.

LM-PCR, then nested PCR
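After LM-PCR, each read spans a vector-genome junction: it starts inside the LTR, continues into genomic DNA and may end in the linker. A minimal sketch of how the genomic flank could be isolated before mapping is shown below. The LTR-end and linker sequences are hypothetical placeholders, matching is exact, and the 20-base minimum length is an assumed threshold rather than a value from the talk.

```python
def extract_genomic_flank(read, ltr_end, linker, min_length=20):
    """Return the genomic part of a junction read, or None if unusable.

    read: sequence starting at the LTR end (as produced by an LTR-specific primer).
    ltr_end, linker: known vector and linker sequences (hypothetical here).
    min_length: shortest flank considered mappable (assumed threshold).
    """
    if not read.startswith(ltr_end):
        return None  # no recognisable vector end: not a junction read
    flank = read[len(ltr_end):]
    linker_pos = flank.find(linker)
    if linker_pos != -1:
        flank = flank[:linker_pos]  # trim the linker and anything after it
    return flank if len(flank) >= min_length else None


if __name__ == "__main__":
    ltr_end = "TCTTTCA"            # hypothetical LTR terminal sequence
    linker = "AGTCCCTTAAGCGGAG"    # hypothetical linker sequence
    genomic = "GATCCGTTTCAGTCGATCAGTGGGCATA"  # IS sequence shown on the slide
    read = ltr_end + genomic + linker
    print(extract_genomic_flank(read, ltr_end, linker))
```

The extracted flank would then be aligned to the human genome, and the first mapped base next to the LTR taken as the integration site.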

Integrome study in Gene Therapy

Distribution of retroviral integrations around transcription start sites.

Distribution of the distance of MLV and HIV integration sites from the transcription start site (TSS) of targeted genes at 2500-bp (A), 50-bp (B), or 5-bp (C) resolution.
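The three panels differ only in bin width, so the same distances can be re-binned at each resolution. The sketch below shows one way to compute a signed distance from each integration site to the nearest TSS and count sites per bin. It assumes sites and TSSs lie on the same chromosome and that positive distances mean downstream in the direction of transcription; the coordinates are made up for illustration.

```python
from collections import Counter


def signed_distance_to_nearest_tss(site, tss_sites):
    """Distance from an integration site to the closest TSS.

    tss_sites: list of (position, strand) tuples, strand being "+" or "-".
    The sign is positive downstream of the TSS in transcription direction.
    """
    position, strand = min(tss_sites, key=lambda tss: abs(site - tss[0]))
    offset = site - position
    return offset if strand == "+" else -offset


def bin_distances(distances, resolution):
    """Count distances per bin; each bin is labelled by its lower edge."""
    return Counter((d // resolution) * resolution for d in distances)


if __name__ == "__main__":
    tss_sites = [(1000, "+"), (5000, "-")]       # hypothetical TSS annotation
    integration_sites = [1020, 980, 4990, 5100]  # hypothetical integration sites
    distances = [signed_distance_to_nearest_tss(s, tss_sites) for s in integration_sites]
    for resolution in (2500, 50, 5):
        print(resolution, sorted(bin_distances(distances, resolution).items()))
```

Floor division keeps negative distances in the bin below zero, so the bin labelled -50 at 50-bp resolution covers distances from -50 up to, but not including, 0.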

Integrome study in Gene Therapy

Results and applications

• Integration pattern of proviral vectors to be used in gene therapy
• Tool to study transcriptionally active regions -> applied in stem cell studies

Ancient DNA analysis by NGS

…to recover genetic information from the past.

• To determine phylogenetic relationships among extinct and extant animals
• For palaeogenetics and evolutionary biology studies

Why study ancient DNA (aDNA)?

“Homo” evolution

Common ancestor

Why study aDNA?

For anthropological applications and population genetics on modern humans and on Early Modern Humans (EMH).

Domestication process

Why study aDNA?

aDNA analysis challenge

Authenticity assessment: ancient vs modern

Features of aDNA

• Low amount
• High degradation
• Small fragment size: 70-120 bp
• Contamination
• Post-mortem damage

Ancient DNA analysis by NGS

Features of aDNA: Misincorporation pattern I

Patterns of damage in genomic DNA sequences from a Neandertal. Briggs AW, PNAS 2007

Features of aDNA: Misincorporation pattern II

Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. Sawyer S, PLoS One. 2012

C to T misincorporations at the first position of mtDNA fragments as a function of age.
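The C to T signal at fragment ends is what is typically used to tell authentic ancient molecules from modern contaminants, so it helps to see how such a frequency could be tallied. The sketch below assumes gap-free pairs of (reference segment, read) of equal length, oriented from the 5' end of the read; this input format and the toy sequences are assumptions made for the example, not the method of the cited papers.

```python
def c_to_t_frequency(alignments, max_positions=10):
    """Per-position C->T misincorporation frequency from the 5' end.

    alignments: list of (reference_segment, read) pairs, gap-free and
    position-matched (an assumed, simplified input format).
    Returns a list where entry i is (#C->T at i) / (#reference C at i),
    or None when no reference C was observed at that position.
    """
    c_counts = [0] * max_positions
    ct_counts = [0] * max_positions
    for reference, read in alignments:
        for i in range(min(max_positions, len(reference), len(read))):
            if reference[i] == "C":
                c_counts[i] += 1
                if read[i] == "T":
                    ct_counts[i] += 1
    return [ct / c if c else None for ct, c in zip(ct_counts, c_counts)]


if __name__ == "__main__":
    toy_alignments = [  # hypothetical (reference, read) pairs
        ("CACGT", "TACGT"),
        ("CCGTA", "CCGTA"),
        ("CGGTA", "TGGTA"),
        ("ACGTT", "ATGTT"),
    ]
    print(c_to_t_frequency(toy_alignments, max_positions=3))
```

In authentic aDNA the frequency is expected to be highest at the first position and to decay towards the read interior; a flat profile would point towards modern DNA.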

Ancient DNA and NGS

• Single-locus PCR
• Multiplex PCR
• Shotgun approach
• Custom capture: best approach in terms of:
  • Discrimination of endogenous vs exogenous DNA
  • Cost
  • Enrichment ratio

Forensic DNA

Features forensic DNA shares with aDNA

• Fragmentation
• Low amount
• High level of contamination

NGS applied to forensic DNA:

Short tandem repeat (STR) count and analysis

STR profiling
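Counting STRs from reads comes down to finding the longest uninterrupted run of a repeat motif. The sketch below does this with a regular expression. The motif and the read are hypothetical, and a real profiling pipeline would also anchor on the flanking sequence of each locus and handle sequencing errors and partial repeats.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Return the number of motif copies in the longest uninterrupted run."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    longest_run = max(
        (match.group(0) for match in pattern.finditer(sequence)),
        key=len,
        default="",
    )
    return len(longest_run) // len(motif)


if __name__ == "__main__":
    # Hypothetical read: 7 GATA repeats, an interruption, then 2 more repeats.
    read = "TTCC" + "GATA" * 7 + "CAGT" + "GATA" * 2 + "AA"
    print(longest_tandem_repeat(read, "GATA"))
```

Repeating the count over all reads covering a locus gives a distribution of repeat numbers, from which the alleles of the STR profile can be called.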

RNA seq: transcriptome of breast cancer

Rationale

• Primary human lobular breast cancer tissue
• 132,000 reads
• Validated by RT-PCR

Results:

• One deletion
• Two novel ncRNAs
• Ten unknown or rare transcript isoforms
• A novel gene fusion
• Thousands of novel non-coding transcripts
• More than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas

Intragenic deletion in WHSC1L1, identified by the 99-base-long read 1B.

The green and blue arrows represent the two halves of the fusion transcript, which map in the opposite order on the genome.

RNA seq: transcriptome of breast cancer

Metagenomics: the study of genetic material recovered directly from environmental samples.

• Human metagenomics
• Environmental metagenomics

Microbiome: "the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space."

Metagenomics vs 16S rRNA sequencing

Ingrid Cifola

Clarissa Consolandi

Clelia Peano

Roberta Bordoni

Alessandro Pietrelli

Marco Severgnini

Eleonora Mangano

Eva Pinatel

Luca Petiti

Simone Puccio

Santosh Anand

Gianluca De Bellis

Cristina Battaglia

Thanks to my colleagues

Thanks for your attention!

Lunch ?!

Ermanno.rizzi@itb.cnr.it
