LUGM-Update of the Illumina Analysis Pipeline


Upload: hai-wei-yen

Post on 14-Apr-2017


TRANSCRIPT

Page 1: LUGM-Update of the Illumina Analysis Pipeline

© 2011 Illumina, Inc. All rights reserved. Illumina, illuminaDx, BeadArray, BeadXpress, cBot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, Sentrix, Solexa, TruSeq, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Update of the Illumina Analysis Pipeline

Henry Yen (顏海威), Bioinformatics FAS

均泰生物科技有限公司 [email protected]

Slide generated from Henry Yen

Page 2: LUGM-Update of the Illumina Analysis Pipeline

Course Objectives

This course covers:

• Illumina data analysis overview
• The workflow in MiSeq Reporter
• VariantStudio, a powerful annotation tool
• BaseSpace, the Illumina cloud

Page 3: LUGM-Update of the Illumina Analysis Pipeline

Illumina Data Analysis Overview

Page 4: LUGM-Update of the Illumina Analysis Pipeline

Data Analysis Workflow

Primary Analysis -> Secondary Analysis -> Data Visualization

Page 5: LUGM-Update of the Illumina Analysis Pipeline

Primary and Secondary Analysis Overview

Analysis Type                          Outputs
Sequencing (MCS/NCS/HCS)               Images/TIFF files
Primary Analysis (RTA)                 Base calling, intensities
Secondary Analysis (MSR / BaseSpace)   Alignments and variant detection

Page 6: LUGM-Update of the Illumina Analysis Pipeline

MiSeq Analysis Workflow

Instrument Control Software (MCS): images and intensities

RTA: base calls and quality scores

MiSeq Reporter: alignment/FASTQ, variants, statistics; application-specific additional analysis; limited visualization via HTTP interface

MiSeq Reporter workflows: Resequencing, Amplicon, Enrichment, Small RNA, De novo Assembly, 16S Metagenomics

"I'm an all-in-one sequencer"

Page 7: LUGM-Update of the Illumina Analysis Pipeline

Why We Use MiSeq Reporter

Automatic – starts automatically after sequencing

Simple – start-to-end workflow

Powerful – supports the different analyses required

Friendly – graphical user interface

Page 8: LUGM-Update of the Illumina Analysis Pipeline

The Workflow in MiSeq Reporter

Page 9: LUGM-Update of the Illumina Analysis Pipeline

Workflows from MiSeq Reporter

Reference
  Whole genome: Resequencing, Library QC
  Targeted-Seq
    Capture-based: Enrichment
    PCR-based: Amplicon, Amplicon-DS, PCR-Amplicon, mtDNA
  RNA: Small RNA, Targeted-RNA

Non Reference
  Assembly: De novo Assembly
  Taxonomy: Metagenomics

Page 10: LUGM-Update of the Illumina Analysis Pipeline

Resequencing Workflows

Resequencing workflow:

• Reads are aligned to the reference genome.
• Variants are called.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports the on-target rate, coverage and variant summary.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling, Report

Output files: FASTQ, BAM, VCF, PDF

Page 11: LUGM-Update of the Illumina Analysis Pipeline

Library QC Workflows

Library QC workflow:

• Data are analyzed with BWA.
• Reads are aligned to the reference genome.
• No variant calling is performed.
• Outputs FASTQ and BAM files.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Alignment Statistics

Output files: FASTQ, BAM

Page 12: LUGM-Update of the Illumina Analysis Pipeline

Enrichment Workflows

Enrichment workflow:

• Analyzes data from probe-captured libraries.
• Reads are aligned to the targeted regions.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports the aligned rate, on-target rate, coverage and variant summary.

Input: Targeted Region

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling, Targeted Statistics

Output files: FASTQ, BAM, VCF, CSV

Page 13: LUGM-Update of the Illumina Analysis Pipeline

Amplicon Workflows

Amplicon workflow (TruSeq Amplicon):

• Analyzes data from short-range PCR.
• Reads are aligned to the targeted regions.
• Uses a custom target design from Illumina.
• Outputs FASTQ, BAM, VCF and gVCF files.

Input: Targeted Region

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Variant Calling, Amplicon Viewer

Output files: FASTQ, BAM, VCF, Excel

Page 14: LUGM-Update of the Illumina Analysis Pipeline

Amplicon-DS Workflows

Amplicon-DS workflow:

• Analyzes data from TruSight Tumor.
• Variants are checked on both strands.
• Filters false-positive variants from FFPE samples.
• Outputs FASTQ, BAM, VCF and gVCF files.

Input: Targeted Region

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Variant Calling (Somatic), Variant Filtering

Output files: FASTQ, BAM, VCF

Page 15: LUGM-Update of the Illumina Analysis Pipeline

Correcting DNA Deamination Bias

Two manifest files:
1. Downstream locus-specific oligos (DLSO)
2. Upstream locus-specific oligos (ULSO)

The Amplicon Double-Stranded workflow can remove the DNA deamination bias (C -> T) found in FFPE samples.
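As a rough illustration of the double-strand idea, the sketch below keeps only variants that were called in both strand-specific pools. A C -> T artifact caused by deamination damage on one strand would be expected to appear in only one pool. The function name, the tuple layout and the example calls are hypothetical and are not taken from MiSeq Reporter.

```python
def confirmed_in_both_pools(pool_a_calls, pool_b_calls):
    """Keep variants called in both strand-specific pools.

    Each call is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    return sorted(set(pool_a_calls) & set(pool_b_calls))


# Hypothetical calls: the C>T at position 200 appears in one pool only,
# as a single-strand deamination artifact would.
pool_a = [("chr7", 100, "A", "G"), ("chr7", 200, "C", "T")]
pool_b = [("chr7", 100, "A", "G")]
kept = confirmed_in_both_pools(pool_a, pool_b)
print(kept)
```

A real caller would also weigh depth and allele frequency in each pool; this sketch only shows the intersection step.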

Page 16: LUGM-Update of the Illumina Analysis Pipeline

PCR Amplicon Workflows

PCR Amplicon workflow:

• Analyzes data from long-range PCR.
• Reads are aligned to the targeted regions.
• The target design is supplied by the customer.
• Outputs FASTQ, BAM, VCF and gVCF files.

Input: Targeted Region

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Indel Realignment, Bin / Sort, Duplicate Flagging, Variant Calling

Output files: FASTQ, BAM, VCF

Page 17: LUGM-Update of the Illumina Analysis Pipeline

mtDNA Workflows

mtDNA workflow:

• Analyzes data for forensic applications.
• Reads are aligned to the rCRS (revised Cambridge Reference Sequence).
• Outputs FASTQ and BAM files, a viewer file and an Excel file.
• Can be used to trace maternal lineage.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment with rCRS, Bin / Sort, Viewer file generation, Display in mtDNA viewer

Output files: FASTQ, BAM, Excel, Viewer file

Page 18: LUGM-Update of the Illumina Analysis Pipeline

Small RNA Workflows

Small RNA workflow:

• Data are analyzed with Bowtie.
• Reads are aligned to miRBase.
• No variant calling is performed.
• Outputs FASTQ and BAM files, a pie chart and miRNA read counts.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Bin / Sort, Read Counting

Output files: FASTQ, BAM, TXT

Page 19: LUGM-Update of the Illumina Analysis Pipeline

Targeted RNA Workflows

Targeted RNA workflow:

• Reads are aligned against a custom manifest file (banded Smith-Waterman).
• Reports relative expression of genes and isoforms between several samples.
• Outputs FASTQ and BAM files and an HTML report.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Alignment, Bin / Sort, Differential Expression Analysis

Output files: FASTQ, BAM, HTML

Page 20: LUGM-Update of the Illumina Analysis Pipeline

De novo Assembly Workflows

De novo Assembly workflow:

• Data are assembled with Velvet.
• Assembles small (<20 Mb) genomes from reads, without the use of a genomic reference.
• Outputs FASTQ and FASTA files and a dot plot.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Assembly, Indel Realignment, Dot Plot

Output files: FASTQ, FASTA

Page 21: LUGM-Update of the Illumina Analysis Pipeline

Metagenomics Workflows

Metagenomics workflow:

• Bacterial population analysis based on 16S rRNA amplicons.
• Outputs FASTQ and FASTA files and a pie chart.

Pipeline stages: Adapter Masking, Reads Demultiplexing, Read Classification, Current Taxonomy

Output files: FASTQ, FASTA, pie chart

Page 22: LUGM-Update of the Illumina Analysis Pipeline

Greengenes database 13.5 (May 2013) is used to perform taxonomic classification
– http://greengenes.lbl.gov/
– Illumina-curated version
– Entries with 16S length <1250 bp are filtered out
– Entries with incomplete annotation are filtered out

Bayesian classification method to assign taxonomies
– RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
– Short sub-sequences are extracted from each read and compared to the database by the classifier

Uses full-length Illumina paired-end reads

Classification down to genus/species level

16S metagenomics in MiSeq Reporter 2.4
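The word-based classification described above can be sketched as follows. This is a toy naive Bayes classifier in the spirit of the RDP approach, not the RDP or MiSeq Reporter code: the function names, the tiny reference set and the word size used in the example are assumptions, and the bootstrap confidence step of the real classifier is omitted. The word-probability smoothing follows my reading of the published RDP description.

```python
import math
from collections import defaultdict


def words(seq, k=8):
    """Return the set of overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """Count word occurrences; references is a list of (taxon, sequence)."""
    n_total = len(references)
    seqs_with_word = defaultdict(int)                      # n(w) over all sequences
    taxon_word = defaultdict(lambda: defaultdict(int))     # m(w) per taxon
    taxon_size = defaultdict(int)                          # M per taxon
    for taxon, seq in references:
        taxon_size[taxon] += 1
        for w in words(seq, k):
            seqs_with_word[w] += 1
            taxon_word[taxon][w] += 1
    return n_total, seqs_with_word, taxon_word, taxon_size


def classify(read, model, k=8):
    """Return the taxon with the highest summed log word probability."""
    n_total, seqs_with_word, taxon_word, taxon_size = model
    best_taxon, best_score = None, -math.inf
    for taxon, m_total in taxon_size.items():
        score = 0.0
        for w in words(read, k):
            prior = (seqs_with_word.get(w, 0) + 0.5) / (n_total + 1)
            score += math.log((taxon_word[taxon].get(w, 0) + prior) / (m_total + 1))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical two-entry "database"; k=4 only because the toy sequences are short.
refs = [("Escherichia", "ACGTACGTGGCCTTAAGGCC"),
        ("Bacillus", "TTGACCATGATTCAGGTCAA")]
model = train(refs, k=4)
result = classify("GGCCTTAAGG", model, k=4)
print(result)
```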

Page 23: LUGM-Update of the Illumina Analysis Pipeline


Top 20 classification results

Ordered by Taxonomic level

New HTML Output in Metagenomics Workflow

Page 24: LUGM-Update of the Illumina Analysis Pipeline

Read Stitching in MiSeq Reporter

[Figure: Read 1 and Read 2 overlap by ≥ 10 bp and are merged into one stitched read]

MiSeq Reporter can stitch paired-end (PE) reads:

• Read 1 and Read 2 must overlap by at least 10 bp.
• The Bases Match Score must be ≥ 0.9, where Bases Match Score = 1 - [Base Mismatch Rate].
• Overlapping PE reads are stitched into one read.
• PE reads that cannot be stitched are written to the FASTQ file as two single reads.
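The stitching rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MiSeq Reporter's implementation: Read 2 is assumed to be the reverse complement of the far end of the fragment, the longest qualifying overlap is taken, and Read 1 bases are kept inside the overlap. The function names are hypothetical; only the two thresholds (10 bp, 0.9) come from the slide.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def stitch(read1, read2, min_overlap=10, min_score=0.9):
    """Merge a read pair into one read, or return None if it cannot be stitched."""
    r2 = reverse_complement(read2)
    # Try the longest candidate overlap first (an assumption of this sketch).
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        score = 1 - mismatches / olen   # Bases Match Score = 1 - mismatch rate
        if score >= min_score:
            return read1 + r2[olen:]
    return None


# Hypothetical 30 bp fragment sequenced from both ends with a 12 bp overlap.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[8:])
merged = stitch(read1, read2)
print(merged)
```

A pair for which `stitch` returns None would be kept as two single reads, as the slide describes.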

Page 25: LUGM-Update of the Illumina Analysis Pipeline

Powerful Annotation Tool: VariantStudio

Page 26: LUGM-Update of the Illumina Analysis Pipeline

Illumina VariantStudio
Intuitive analysis and interpretation

Import Data -> Annotate -> Filter -> Classify -> Report

• Intuitive user interface

• Rich annotations

• Flexible and comprehensive set of filters

• Streamlined variant classification

• Easy and customizable report generation

Insight

Page 27: LUGM-Update of the Illumina Analysis Pipeline

Illumina VariantStudio Workflow
Data in, biological knowledge out

Import VCF or gVCF files -> Illumina VariantStudio Desktop Client (connected to the VariantStudio Annotation Database) -> Export report of interpreted variants

Page 28: LUGM-Update of the Illumina Analysis Pipeline

Annotation & Filtering
Leveraging a broad range of annotation sources to enrich data with biological context

Annotation source shown: NHLBI Exome Variant Server

Filtering funnel, from big data to a set that is easy to validate:

1,000,000s of detected variants -> 10,000s of coding variants -> 100s of deleterious variants -> few causal variants

Page 29: LUGM-Update of the Illumina Analysis Pipeline

Clinical Panels and VariantStudio
Streamlined workflow from sample to report

Align + Call Variants -> Annotate -> Filter -> Classify -> Generate Report

Easy! Accurate! Rapid!

Page 30: LUGM-Update of the Illumina Analysis Pipeline

Illumina Cloud: BaseSpace

Page 31: LUGM-Update of the Illumina Analysis Pipeline

The Illumina Analysis Cloud: BaseSpace

Page 32: LUGM-Update of the Illumina Analysis Pipeline


BaseSpace Creates a Sequencing Ecosystem
Accelerates Analysis and Sharing of Genomic Data

[Figure: Electronic Medical Record linked to Medical History, Drugs & Immunization, Patient Schedule, Reference Content, Lab Data, Genomic Data, Diagnostic Images and Scanned Charts, alongside App Space and Public Databases]

Page 33: LUGM-Update of the Illumina Analysis Pipeline

Run data is automatically sent to Projects in BaseSpace

Runs and Projects have separate permissions

Core labs will be able to transfer ownership of a project

Runs and Projects

Page 34: LUGM-Update of the Illumina Analysis Pipeline

Enrichment Apps Now Released on BaseSpace
Push-Button, Step-by-Step App Analysis

BWA Enrichment (ILLUMINA, INC.)

The core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller.

Isaac Enrichment (ILLUMINA, INC.)

The core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller.

• Human hg19 only
• Read length of at least 32 bp
• Supports paired-end runs

Free

Page 35: LUGM-Update of the Illumina Analysis Pipeline

Resequencing Analysis Apps on BaseSpace
Push-Button, Step-by-Step App Analysis

BWA Whole Genome Sequencing (ILLUMINA, INC.), free

BWA/GATK Whole Genome Sequencing processes whole-genome sequencing data using BWA for alignment and GATK for variant detection.

Isaac Whole Genome Sequencing v2 (ILLUMINA, INC.), free

The Isaac Whole Genome Sequencing workflow performs read mapping using the Isaac Genome Alignment Software and variant detection using Isaac Variant Detection (SNVs, small indels, copy number anomalies and structural variations).

HiSeq Isaac Human WGS Workflow (ILLUMINA, INC.), free

Isaac Genome Alignment Software and Isaac Variant Caller for human samples.

Page 36: LUGM-Update of the Illumina Analysis Pipeline

Whole Genome Analysis Apps on BaseSpace
Push-Button, Step-by-Step App Analysis

Isaac & BWA Whole Genome Sequencing (ILLUMINA, INC.)

• Reference genomes for about 12 species are available for alignment
• Read length 21 to 150 bp (35 to 150 bp for Isaac)
• Supports paired-end runs
• Does not support mate-pair reads
• Reports detected CNVs and structural variants in a VCF file

Page 37: LUGM-Update of the Illumina Analysis Pipeline

Tumor/Normal Paired Analysis Apps on BaseSpace
Push-Button, Step-by-Step App Analysis

Tumor Normal (ILLUMINA, INC.), free

The Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair.

• Supports human hg19 only
• Read length 50 to 150 bp
• Supports paired-end runs
• Recommended coverage: 40X for the normal sample and 80X for the tumor sample
• Detects somatic mutations in the tumor

Page 38: LUGM-Update of the Illumina Analysis Pipeline

16S Metagenomics App Now Released on BaseSpace
Push-Button, Step-by-Step App Analysis

16S Metagenomics (ILLUMINA, INC.), free

The 16S Metagenomics app performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the GreenGenes taxonomic database.

Page 39: LUGM-Update of the Illumina Analysis Pipeline

De novo Assembly Apps in BaseSpace
Push-Button, Step-by-Step App Analysis

Align, assemble & analyze reads (DNASTAR, INC.)

DNASTAR software for comprehensive next-gen sequence assembly and analysis.

Assemble bacteria de novo - FREE (DNASTAR, INC.)

DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.

Page 40: LUGM-Update of the Illumina Analysis Pipeline

De novo Assembly Apps in BaseSpace
Push-Button, Step-by-Step App Analysis

SPAdes (ALGORITHMIC BIOLOGY LAB), free

SPAdes 3.0 (St. Petersburg Genome Assembler) is intended for both standard isolates and single-cell MDA bacterial assemblies.

BayesHammer + SPAdes

• BayesHammer: read error correction tool, which works well on both single-cell and standard data sets.
• SPAdes: iterative short-read genome assembly module; by default it iterates consecutively through a set of k-mer lengths chosen according to the read length.

• Supports MDA (multiple displacement amplification) single-cell bacterial assemblies.
• Supports paired-end reads, mate-pairs and unpaired reads.

Page 41: LUGM-Update of the Illumina Analysis Pipeline

The de Bruijn Graph Algorithm

You should set the k-mer length for your assemblies.
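To see why the k-mer length matters, here is a minimal sketch of how a de Bruijn graph is built from reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, so changing k changes the nodes and edges of the graph. The function name and the toy read are made up for illustration; real assemblers such as Velvet or SPAdes add error correction, coverage tracking and graph simplification on top of this.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: (k-1)-mer prefix -> list of (k-1)-mer suffixes."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# The same toy read gives different graphs for different k.
graph_k3 = de_bruijn(["ACGTC"], 3)
graph_k4 = de_bruijn(["ACGTC"], 4)
print(graph_k3)
print(graph_k4)
```

Smaller k values connect more reads but blur repeats; larger k values resolve repeats but need longer overlaps, which is the trade-off behind choosing k.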

Page 42: LUGM-Update of the Illumina Analysis Pipeline

New RNA-seq End-to-End Analysis Apps in “BaseSpace”

Software: TopHat2 v2.0.7
Aligner: Bowtie 0.12.9
Assembly & gene expression: Cufflinks 2.1.1
Variant caller: Isaac Variant Caller 2.0.5
Alignment statistics: Picard tools 1.72

What can the apps do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Splice junction and gene fusion detection
D. cSNP discovery
E. Differentially expressed gene discovery
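For readers unfamiliar with FPKM, the basic definition is fragments per kilobase of transcript per million mapped fragments. The sketch below computes it directly from counts; the function name and the numbers are illustrative only, and Cufflinks itself estimates abundances with a statistical model rather than this plain ratio.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Illustrative numbers: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
print(value)
```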

Apps: TopHat Alignment; Cufflinks Assembly & DE

Free

Page 43: LUGM-Update of the Illumina Analysis Pipeline

• Supports 3 species (human, mouse, rat)
• Can call gene fusions
• Can only trim TruSeq adapters

New RNA-seq End-to-End Analysis Apps in “BaseSpace”

Page 44: LUGM-Update of the Illumina Analysis Pipeline

Biological Interpretation for RNA-seq Data in BaseSpace

iPathwayGuide (ADVAITA BIO), free; supports human datasets only

An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide will perform the following analyses:

• DE Gene Analysis
• Gene Ontology Analysis for Biological Processes, Molecular Functions, and Cellular Components
• Pathway Analysis with Impact Analysis modeled on KEGG Pathways
• Coherent Cascade Analysis on Pathways
• Downstream Gene Perturbation Analysis
• Drug Interaction Analysis
• Disease Analysis based on enrichment

Page 45: LUGM-Update of the Illumina Analysis Pipeline

Overview of the Core Apps for BaseSpace

BWA Enrichment

BWA Whole Genome Sequencing

Tumor Normal Paired

TopHat Alignment

Cufflinks Assembly & DE

Page 46: LUGM-Update of the Illumina Analysis Pipeline

BaseSpace Onsite System

• Easy to use, from sample to answer
• Secure, safe and local environment
• Push-button data processing
• Two 6-core CPUs with 128 GB RAM
• Currently the LIMS supports only the NextSeq 500 (support for the HiSeq and MiSeq systems is planned)

Supported analyses: RNA-seq, Exome-seq, Whole Genome Analysis, Tumor & Normal Paired

Page 47: LUGM-Update of the Illumina Analysis Pipeline

Summary

Workflow          MSR Local Version   BaseSpace Version
Amplicon-DS       2.4                 N/A
Assembly          2.4                 2.2
Enrichment        2.4                 2.2
Generate FASTQ    2.4                 2.2
Library QC        2.4                 2.2
Metagenomics      2.4                 2.2
PCR Amplicon      2.4                 2.2
Resequencing      2.4                 2.2
Small RNA         2.4                 2.2
Targeted RNA      2.4                 N/A
TruSeq Amplicon   2.4                 2.2

BaseSpace Dual Mode Replicates Analysis Locally on MiSeq

• Selectable option in MCS

• Allows customers to compare and evaluate MSR Local vs. BaseSpace

• Retains a local copy of all files for customers reluctant to rely on 100% remote storage

Page 48: LUGM-Update of the Illumina Analysis Pipeline

Questions?

…..or Tired?
