high throughput sequencing

37
High Throughput Sequencing

Upload: stu

Post on 23-Feb-2016

98 views

Category:

Documents


0 download

DESCRIPTION

High Throughput Sequencing. Agenda. Introduction to sequencing Applications Bioinformatics analysis pipelines What should you ask yourself before planning the experiment. Introduction to sequencing. What is sequencing?. Finding the sequence of a DNA/ RNA molecule. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: High Throughput Sequencing

High Throughput Sequencing

Page 2: High Throughput Sequencing

Agenda

• Introduction to sequencing• Applications• Bioinformatics analysis pipelines• What should you ask yourself before

planning the experiment

Page 3: High Throughput Sequencing

Introduction to sequencing

Page 4: High Throughput Sequencing

What is sequencing?

Finding the sequence of a DNA/ RNA molecule

:// . . / / / /http cancergenome nih gov newsevents multimedialibrary images CancerBiology

What can we sequence?

Page 5: High Throughput Sequencing

Sanger sequencing

• Up to 1,000 bases molecule• One molecule at a time

• Widely used from 1970-2000• First human genome draft was

based on Sanger sequencing• Still in use for single molecules

:// . . / / / /http www genomebc ca education articles sequencing

Page 6: High Throughput Sequencing

High Throughput SequencingNext Generation Sequencing (NGS) / Massively parallel sequencing

• Sequencing millions of molecules in parallel• Do not need prior knowledge of what you’re

sequencing

Platform Read length No. of reads per run

454 sequencing Up to 1,000 bp ~1 M (1 million)SOLiD 50-75 bp ~1 G (1 billion)HiSeq 100-150 bp ~0.5 G

We will discuss Illumina’s platform only

Page 7: High Throughput Sequencing

Sequencing Workflow

1. Extract tissue cells2. Extract DNA/RNA from cells3. Sample preparation for sequencing4. Sequencing5. Bioinformatics analysis

Why is it important to understand the “wet lab” part?

Page 8: High Throughput Sequencing

Sample Prep

Random shearing of the DNA

Adding adaptors and barcodes

Size selection Amplification Sequencing

Page 9: High Throughput Sequencing

Sequencing process

Page 10: High Throughput Sequencing

Sequencing process

Leave only sequences from one direction

Page 11: High Throughput Sequencing

Sequencing process

Page 12: High Throughput Sequencing

Sequencing process

Page 13: High Throughput Sequencing

Sequencing process

Page 14: High Throughput Sequencing

Applications

Page 15: High Throughput Sequencing

• Resequencing – sequencing the genome of an

organism with a known genome

• Exome sequencing / Targeted sequencing –

sequencing only selected regions from the genome

• De-novo sequencing– sequencing the genome of

an organism with a unknown genome

DNA sequencing

Page 16: High Throughput Sequencing

Sequencing of mRNA extracted form the cell to get an estimate of expression levels of genes.

RNA-Seq

Page 17: High Throughput Sequencing

Counting vs. Reading

RNA-Seq vs. DNA-sequencing

Page 18: High Throughput Sequencing

Sequencing the regions in the genome to which a protein binds to.

ChIP-Seq

Page 19: High Throughput Sequencing

Basic concepts

Insert – the DNA fragment that is used for sequencing. Read – the part of the insert that is sequenced. Single Read (SR) – a sequencing procedure by which the insert is sequenced from one end only. Paired End (PE) – a sequencing procedure by which the insert is sequenced from both ends.

Page 20: High Throughput Sequencing

Bioinformatics Analysis Pipelines

Page 21: High Throughput Sequencing

Demultiplexing

LaneUnknown inserts

Page 22: High Throughput Sequencing

Reference Genome

SampleMapping

Demultiplexing

Example of mapping parameters:• Number of mismatches per read • Scores for mismatch or gaps

Mapping parameters affect the rest of the analysis

Page 23: High Throughput Sequencing

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Reference Genome

Reference Genome

?

Page 24: High Throughput Sequencing

Resequencing/ Exome Pipeline

Page 25: High Throughput Sequencing

Coverage profile and variant calling

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Reference Genome

…ACTTCGTCGAAAGG…

G

Page 26: High Throughput Sequencing

Coverage profile and variant calling

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Variant filtering

Reference Genome

…ACTTCGTCGAAAGG…

Reference Genome

…ACTTCGTCGAAAGG…

Frequency >= 20%

Coverage >= 5

Page 27: High Throughput Sequencing

Variant calling

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Variant filtering

Genes and known variants

Reference Genome

…ACTTCGTCGAAATG… …GTCCCGTGATACTCCGT…

GA

rs230985Gene X

Page 28: High Throughput Sequencing

Resequencing results

Page 29: High Throughput Sequencing

Coverage profile and variant calling

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Variant filtering

Genes and known variants

Finding suspicious variants

Recessive disease:1. Variant not in known databases2. Homozygous variant shared by all

affected individuals3. Same variant appears in healthy parents

at heterozygous state4. Healthy brothers can be heterozygous

to the same variant

Dominant disease:1. Variant not in known databases2. Heterozygous variant shared by all

affected individuals3. The variant doesn’t appear in healthy

individuals

1 Table per project

1 Table per sample

Example for further analysis

Page 30: High Throughput Sequencing

Coverage profile and variant calling

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Variant filtering

Genes and known variants

Finding suspicious variants

QC

QC

QC

QC

QC

Quality control steps in the pipeline

Page 31: High Throughput Sequencing

How is de-novo assembly different from resequencing analysis ?

Page 32: High Throughput Sequencing

RNA-Seq Pipeline

Page 33: High Throughput Sequencing

Gene expression levels

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

FPKM Normalization:

Unannotated genes

Page 34: High Throughput Sequencing

Gene expression levels

Removing duplicates and non-unique mappings

Mapping

Demultiplexing

Differential gene expression

Differential expression parameters:• Threshold - Minimum number of reads for pair

testing• Normalization• Replicates

Differential expression parameters affect the results

Page 35: High Throughput Sequencing

RNA-Seq results

Page 36: High Throughput Sequencing

CoverageCoverage – resequencingNumber of bases that cover each base in the genome in average.

“Coverage” – RNA-Seq• Depends on the expression profile of each

sample.• Highly expressed genes will be detected with less

“coverage” than lowly expressed genes.

Page 37: High Throughput Sequencing

What should you ask yourself before sequencing when planning the experiment

• Reference genome:– What is my reference genome? – Does it have updated annotations? – What annotations are known? – Are my samples closely related to the reference genome?

• Do I expect to have contaminations in my sample?• Do I have validations from other technologies? (RT-PCR,

SNPchip…)• Do I have controls and replicates?• RNA-Seq: am I interested in alternative splicing?• Resequencing: What kind of mutations do I expect to find?