Next Generation Sequencing Exome Sequencing
bio.lundberg.gu.se/courses/vt12/nextgen_marcela_i.pdf

Marcela Davila, Genomics Core Facility

Post on 18-Jun-2020

Page 1

Next Generation Sequencing Exome Sequencing

Marcela Davila

Genomics Core Facility

Page 2

NGS methods

First generation (great cost, intense human effort)
  1954 – Sequencing by degradation (Whitfeld PR)
  1975 – Chain termination method (Sanger & Coulson)
  1977 – Chemical modification (Maxam and Gilbert)
Second generation (synchronized washing/scanning)
  SBS – Illumina
  Pyrosequencing – Roche
  SBL – AB SOLiD
Third generation (increased sequencing speed, high throughput, no optics)
  Semiconductor: Ion Torrent
  SBS, single molecule: Helicos
  SBS, single molecule, real time: Pacific Biosciences
  SBH/SBL: Complete Genomics
  FRET: VisiGen
  Protein nanopores: Oxford Nanopore
  TEM: Halcyon Molecular and ZS Genetics
  Transistor mediated: IBM
  STM: Reveo

Page 3

Sanger method

Dye-labeled terminator

DNA template

Laser beam

Chromatogram

Capillary electrophoresis

Page 4

Next generation sequencing

For cyclic array sequencing

1. DNA library preparation (ligation of adapters) 2. Amplification (ePCR, bridge PCR) 3. Sequencing reaction 4. Imaging 5. Decoding

Page 5

1. DNA library preparation (ligation of adapters) 2. Amplification (ePCR, bridge PCR) 3. Sequencing reaction 4. Imaging 5. Decoding

Next generation sequencing

For cyclic array sequencing

Page 6

1. DNA library preparation (ligation of adapters) 2. Amplification (ePCR, bridge PCR) 3. Sequencing reaction 4. Imaging 5. Decoding

Next generation sequencing

For cyclic array sequencing

Page 7

SeqBySynthesis - Illumina

Page 8

Pyrosequencing - Roche

Pyrogram

Page 9

SeqByLigation – AB/SOLiD

First round

Page 10

Second round

SeqByLigation – AB/SOLiD

Page 11

SeqByLigation – AB/SOLiD

Page 12

IonSensitiveFieldEffectTransistors – Ion Torrent

Page 13

[Figure: cycles 1, 2, 3 with base labels A, C, G, T]

SeqBySynthesis - single molecule - Helicos

Page 14

Single Molecule Real Time – Pacific Biosciences

Page 15

combinatorialProbeAnchorLigation – Complete Genomics

Page 16

FluorescenceResonanceEnergyTransfer – VisiGen

Page 17

Protein nanopores – Oxford Nanopore Tech

Exonuclease

Page 18

Transmission Electron Microscopy – Halcyon Molecular/ZS Genetics

Electronic fingerprint

Page 19

STM - Transistor mediated – IBM

metal

Dielectric layers

Page 20

ScanningTunnelingMicroscope – Reveo

Page 21

Niedringhaus TP, et al (2011)
Metzker ML (2010)
Schadt EE, et al (2010)
Tanaka H and Kawai T (2009)
Drmanac R, et al (2009)
Mardis ER (2008)
http://www.illumina.com/Media/flash_player.ilmn?dirname=systems&swfname=GA_workflow_vid&width=780&height=485&iframe
http://my454.com/products/technology.asp
http://appliedbiosystems.cnpg.com/Video/flatFiles/699/index.aspx
http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tSMStradeHowItWorks/tabid/162/Default.aspx
http://www.pacificbiosciences.com/aboutus/video-gallery?videoImage=pac_bio_lg.jpg
http://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing
http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/Sequencing/Semiconductor-Sequencing/Semiconductor-Sequencing-Technology/Ion-Torrent-Technology-How-Does-It-Work.html
http://www.abrf.org/Other/ABRFMeetings/ABRF2005/Hardin.pdf
http://researcher.ibm.com/view_project.php?id=1120

References

Page 22

Quality check, Quality filter
Mapping to reference genome, Realignment and recalibration
Resequencing: SNV detection
ChIP-Seq: Peak detection
RNA-seq: Transcript abundance estimation

Different applications, different pipelines

Page 23

AAGCCTA

AAGCTTA

Human genome: 3 billion bp, 3 million differences (0.1%)

AAGCTA

AAG-TA

AAG-TA

AAGCTA

UAG GGU ACU

* G T

Splice sites/branch site UTRs Coding regions

SNPs
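As a rough illustration of the single-base difference shown on this slide (AAGCCTA versus AAGCTTA) and of the 0.1% figure, the following Python sketch compares two equal-length sequences position by position. The function name `snv_positions` is hypothetical, and treating the first sequence as the reference is an assumption made only for this example.

```python
def snv_positions(ref_seq, sample_seq):
    """Return (0-based position, reference base, sample base) for every mismatch.

    Only equal-length sequences are compared, so insertions and deletions
    (such as the AAGCTA / AAG-TA example) are out of scope for this sketch.
    """
    if len(ref_seq) != len(sample_seq):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(ref_seq, sample_seq))
        if r != s
    ]


# The two 7-mers from the slide; the first is assumed to be the reference.
differences = snv_positions("AAGCCTA", "AAGCTTA")

# Fraction of differing positions quoted on the slide:
# 3 million differences in a 3 billion bp genome.
fraction = 3_000_000 / 3_000_000_000
```

Here `differences` holds one entry, at 0-based position 4 (C in the first sequence, T in the second), and `fraction` is 0.001, that is 0.1%.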

Page 24

Biotin probes

Streptavidin beads

DNA library

Hybridization

Capture

Targeted resequencing

Page 25

Single end (SE): R1
Paired-end (PE): R1, R2 (200-500 bp)
Mate-pair (MP): R1, R2 (2-5 Kb)

Different recipes

Page 26

R1

@HWI-H200:53:D08U2ACXX:5:1101:1231:2012 1:N:0:
GCATTTTAGTAGAACCAGNCATTTCCCCCNACNTCNNTNCGNNANNNNTAA
+
@CCFFFFFHFFHHJJJJJ#3<FGIJJJJJ#1?###################
@HWI-H200:53:D08U2ACXX:5:1101:1184:2013 1:N:0:
TATATTTAATGTACTTTCNTATTTTATATNCANTATNTNATANANNNNTTG
+
CC@FFFFFHHHFFFFHIG#3AFGIIIHIJ#2A#1:C###############
@HWI-H200:53:D08U2ACXX:5:1101:1151:2035 1:N:0:
TTTTGCCTTGTTGCCCAGGTTGGTCTCGAACTCCTGGGCTCAAGGGATATG
+
@CCFFFFFHHHHHJJJGIGJJIIBHHHHIIGBGHGCHIIIHHGIGIJGHIF
@HWI-H200:53:D08U2ACXX:5:1101:1248:2055 1:N:0:
CAGGAACAGAATGAATGAGCGAAACAAATTCCCCTTGAGCTTCACTTGTTG
+
CCCFFFFFHHHHHIJJJJJJJJJJIJJIJJIJJJJJIJJIJJJJJJJIIIH
@HWI-H200:53:D08U2ACXX:5:1101:1235:2080 1:N:0:
ATGGTCTATTAAGTATGCAATAGTATTTTGTCTAAAACAATAATGTACATA
+
@@@FADDFHHHGHFHHGEIHIJGAIFHIIIIJIHIIJHIJIJJJHFHDHII
@HWI-H200:53:D08U2ACXX:5:1101:1165:2081 1:N:0:
ATAACAATGACAATAGAATTTGGGGACTCAGGAGGAAAGGGAGGGAAGCGG
+
CCCFFFFFGHHHHJGHIIIJJJJJJJJIIIJJIGGIJJJJGIIGIIIIIJJ

R2

@HWI-H200:53:D08U2ACXX:5:1101:1231:2012 2:N:0:
TACTNNTANNTNCAGANCAGTTTAAATAAATAAAACATNCACCAGTATGTA
+
@BCF##22##2#2<CG#2AEFGIHJIIJJJFIJJJJJJ#0?GGGBFHIJGH
@HWI-H200:53:D08U2ACXX:5:1101:1184:2013 2:N:0:
ACATNNAANNTNAAAGNTCACAAACTATATATTATATANTGTACATAAAAT
+
B@@F##22##2#3<CG#3AFHIJJJGJJJJJJJJIJJJ#0?FGHJJJJGJG
@HWI-H200:53:D08U2ACXX:5:1101:1151:2035 2:N:0:
CAAACTAACCANGCGGACTTCATTGCTTTTAGAGGACACAATTAATTCTCT
+
CCCFFFFFHHH#2<CGIJBHJJIJJGIGJIIFGGIJJJIIJHIJIGIJIJI
@HWI-H200:53:D08U2ACXX:5:1101:1248:2055 2:N:0:
TATACAATCAANGCACAATCTATTAGAATGGGAAGAGACCCTGGAGATAAT
+
CCCFFFFFHHH#2AFHIJIHHHJJJJJJIJJJJJJJJJJJJIJJHEGHGG<
@HWI-H200:53:D08U2ACXX:5:1101:1235:2080 2:N:0:
AATCCCAACACTTTGGGAGGCTGAGGTGGGTGGATCACTTGGGGTCAGGAG
+
B@?DFBFFHHHHHIJJIJIJJJJIGI:DGI?F@GBFGIIGAGIIBF>HGIH
@HWI-H200:53:D08U2ACXX:5:1101:1165:2081 2:N:0:
GCTGTGTTAGCTTCTTTGTCCTATTGAAATGCAAAGATAGGCTGACTAACT
+
CC@FFFFFHHHHHJJJJI?CHFHGJJJJJIIJJJJIIJJGFHIJJJJJJJE

@HWI-H200:53:D08U2ACXX:5:1101:1231:2012 1:N:0:
GCATTTTAGTAGAACCAGNCATTTCCCCCNACNTCNNTNCGNNANNNNTAA
+
@CCFFFFFHFFHHJJJJJ#3<FGIJJJJJ#1?###################

31 37 39 18 16 2

Fastq format
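The numbers 31 37 39 18 16 2 shown with the example record match the characters @, F, H, 3, 1 and # when each quality character is read as its ASCII code minus 33. The short Python sketch below decodes a quality line under that assumption (Phred+33 encoding); the function name `decode_phred33` is made up for this example.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality line to a list of Phred scores.

    Assumes the Phred+33 encoding: score = ASCII code of the character - 33.
    """
    return [ord(ch) - 33 for ch in quality_string]


# Quality line of the first R1 record on the slide.
quality = "@CCFFFFFHFFHHJJJJJ#3<FGIJJJJJ#1?###################"
scores = decode_phred33(quality)

# The six characters that correspond to the values printed on the slide.
highlighted = decode_phred33("@FH31#")
```

With this encoding `highlighted` is `[31, 37, 39, 18, 16, 2]`, and the run of `#` characters at the end of the read decodes to a score of 2 for every position.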

Page 27

LIMS

Phred = 50

Probability that the base has been erroneously called

Phred score   P(called wrong)   Accuracy of base call
10            1 in 10           90%
20            1 in 100          99%
30            1 in 1,000        99.9%
40            1 in 10,000       99.99%
50            1 in 100,000      99.999%

Phred = 10

Phred quality score
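The table follows the standard Phred relation Q = -10 log10(P), where P is the probability that the base was called wrongly. A small Python sketch of the conversion in both directions is given here; the function names are illustrative only.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base with Phred score q was called wrongly."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score that corresponds to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Rebuild the rows of the table: (score, error probability, accuracy).
rows = [
    (q, phred_to_error_probability(q), 1 - phred_to_error_probability(q))
    for q in (10, 20, 30, 40, 50)
]
```

For example, a score of 30 gives an error probability of 0.001 (1 in 1,000) and an accuracy of 99.9%.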

Page 28

Quality check, Quality filter
Mapping to reference genome, Realignment and recalibration
Resequencing: SNV detection (Variant calling, Annotation, Custom filtering of variants)
ChIP-Seq: Peak detection
RNA-seq: Transcript abundance estimation

Exome pipeline

Page 29

Quality check - FastQC

Page 30

Quality check - FastQC

Page 31

@HWI-H200:53:D08U2ACXX:5:1101:1231:2012 1:N:0:
GCATTTTAGTAGAACCAGNCATTTCCCCCNACNTCNNTNCGNNANNNNTAA
+
@CCFFFFFHFFHHJJJJJ#3<FGIJJJJJ#1?###################

X nts

Low quality

Ambiguous bases

Quality filter - FastX
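The slide names two reasons to filter a read: low-quality positions and ambiguous bases (N). The Python sketch below shows one simple way such a filter could work; it is not the FASTX-Toolkit algorithm. The function name and all thresholds (`min_quality`, `max_n_fraction`, `min_length`) are assumptions chosen for illustration, and Phred+33 quality encoding is assumed.

```python
def trim_and_filter(sequence, quality, min_quality=20,
                    max_n_fraction=0.1, min_length=20):
    """Trim the low-quality 3' tail, then reject short or N-rich reads.

    Returns (trimmed_sequence, trimmed_quality), or None if the read is
    discarded. All thresholds are illustrative defaults, not tool settings.
    """
    scores = [ord(ch) - 33 for ch in quality]  # Phred+33 assumed

    # Walk back from the 3' end while the base quality is below threshold.
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1

    trimmed_seq, trimmed_qual = sequence[:end], quality[:end]
    if len(trimmed_seq) < min_length:
        return None
    if trimmed_seq.count("N") / len(trimmed_seq) > max_n_fraction:
        return None
    return trimmed_seq, trimmed_qual


# The example read from the slide.
seq = "GCATTTTAGTAGAACCAGNCATTTCCCCCNACNTCNNTNCGNNANNNNTAA"
qual = "@CCFFFFFHFFHHJJJJJ#3<FGIJJJJJ#1?###################"
result = trim_and_filter(seq, qual)
```

With these example thresholds the trailing run of `#` qualities is trimmed away and the first 32 bases of the read are kept. Low-quality positions inside the read (such as the two remaining N calls) are not removed by this tail-only trimming.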

Page 32

CTACGATCGATCGA AGACGCAGCTACTACACG

CTACGATCGATCTACGCAGCTACTACACGTGCTGGGACGC REF

ACCACACGTGCAGG TCGATCGACG

CTACG ATCGACGCAGCTACCA AGGGACGT

READS

WHERE to place the reads?
a) Unique reads
b) Everywhere possible
c) Choose one randomly
d) Use paired-end data

HOW to place the reads?
a) Ungapped
b) Gapped

ACTACACGTGCAGGGACGT

Mapping
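To make the WHERE and HOW questions concrete, here is a minimal Python sketch of ungapped placement: the read is slid along the reference and every offset with at most a given number of mismatches is reported. Real mappers use indexed data structures and also handle gaps; the function name `place_read` and the `max_mismatches` parameter are hypothetical.

```python
def place_read(reference, read, max_mismatches=2):
    """Return (offset, mismatches) for every ungapped placement of the read
    on the reference that has at most max_mismatches mismatches."""
    hits = []
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


# Reference and one read taken from the slide.
REF = "CTACGATCGATCTACGCAGCTACTACACGTGCTGGGACGC"
hits = place_read(REF, "ACTACACGTGCAGGGACGT")
```

A read with a single hit corresponds to option a) on the slide (unique reads). When several offsets are returned, the caller has to choose between reporting all of them, picking one at random, or using the mate of a paired-end read to decide.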

Page 33

Local realignment around indels

Page 34

HWI-H200:53:D08U2ACXX:6:1108:18555:16623 99 chr1 10001 0 45M6S = 10174 224 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAAGATCG @?@DDDBDAH??FHDGFFFHIIIGDGEHHI<ABHICHIEHCDD3BDEDGEC MD:Z:45 RG:Z:1 XG:i:0 AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 XT:A:M

HWI-H200:53:D08U2ACXX:6:1101:9568:123823 99 chr1 10003 11 1S46M1S = 10204 252 GACCCTGACCCTGACCCTAACCCTAACCCTAACCCTAACCCCAAACCC @@CFBDFFDFHHFGIIEHGGGD@GGHDGGFHGGEHEGCGHGGHGEHGC MD:Z:5A5A28T2C2 RG:Z:1 XG:i:0 AM:i:11 NM:i:4 SM:i:11 XM:i:4 XO:i:0 XT:A:M

HWI-H200:53:D08U2ACXX:6:1302:17187:33007 97 chr1 10003 0 51M chrM 430 0 ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC CCCFFFFFHHHHHJJJJJJIIIIJJJJJIIJJJJJJJJJJJJJJJJJIIGI X0:i:513 MD:Z:51 RG:Z:1 XG:i:0 AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 XT:A:R

HWI-H200:53:D08U2ACXX:6:1104:2930:78353 177 chr1 10004 0 51M chr22 38431286 0 CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC IIGAF?JJIGADJIGGD?GHGEEEIHGCCGIIHIHHIHFDHDHDDDDB@@B X0:i:515 MD:Z:51 RG:Z:1 XG:i:0 AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 XT:A:R

HWI-H200:53:D08U2ACXX:6:1205:3665:10423 99 chr1 10054 0 51M = 10366 363 CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA CCCFFFFFDBHHHIGEGEHEHIJJIHIFGIIGEHIGH9FGHHIIJJGGI=C X0:i:502 MD:Z:51 RG:Z:1 XG:i:0 AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 XT:A:R

HWI-H200:53:D08U2ACXX:6:1101:4778:107011 163 chr1 10056 0 51M = 10355 350 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC CCCFFFFFHHHHGJJJJJJJJJJIJJIJJIIHGIJECEHIJ;FGEIIEHCA X0:i:508 MD:Z:51 RG:Z:1 XG:i:0 AM:i:0 NM:i:0 SM:i:0 XM:i:0 XO:i:0 XT:A:R

SAM (Sequence Alignment/Map) BAM

Query name HWI-H200:53:D08U2ACXX:6:1101:1233:2037

Flag 83

Reference name chr15

Leftmost position 47933389

Mapping quality 29

CIGAR string 51M

Mate reference =

Mate position 47933089

Insert size 351

Query sequence AATGAATGNCCATGGNCAGCAGCAGGACAGCAGGAACCACGTCT

Quality 00#9DG?:1#FB@>E@BGHHCGCFABIIHEIGFHFDC7;ADB@?@

Optional fields XT:A:U NM:i:2 SM:i:29 AM:i:29 X0:i:1 X1:i:0

BAM format
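Two fields of the record are compact encodings: the flag (83 in the example) is a bit field, and the CIGAR string (for instance 51M or 45M6S) is a run-length description of the alignment. The Python sketch below decodes both following the SAM specification; the short flag labels and the function names are chosen here for illustration only.

```python
import re

# Bit values from the SAM specification; the labels are informal.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """List the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def query_length(cigar):
    """Number of read bases described by the CIGAR (M, I, S, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


flag_83 = decode_flag(83)
cigar_ops = parse_cigar("45M6S")
```

Flag 83 decodes to a paired read in a proper pair, mapped to the reverse strand, and first in its pair. The CIGAR 45M6S describes 45 aligned bases followed by 6 soft-clipped bases, 51 read bases in total.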

Page 35

ACTACACGTGCAGGGACGT CTACGATCGATCGA AGACGCAGCTACTACACG

CTACGATCGATCTACGCAGCTACTACACGTGCTGGGACGC REF

ACCACACGTGCAGG TCGATCGACG

CTACG ATCGACGCAGCTACCA AGGGACGT

READS

Is it a variant allele?

P(CC|D) = 0.06
P(CT|D) = 0.94
P(TT|D) = 3 × 10^-11

What is the most likely genotype?

Variant calling - GATK
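The posterior probabilities on this slide can be understood through a simple Bayesian model. The Python sketch below is a textbook-style simplification and not GATK's implementation: it assumes a diploid site with two candidate alleles, independent reads, a base error probability derived from each Phred score, errors spread evenly over the three other bases, and uniform priors unless others are supplied. The function name and the example pileup are hypothetical and are not meant to reproduce the numbers on the slide.

```python
import math


def genotype_posteriors(bases, quals, ref="C", alt="T", priors=None):
    """Posterior P(genotype | data) for the genotypes ref/ref, ref/alt, alt/alt.

    bases: observed base in each read covering the site
    quals: Phred score of each observed base
    """
    genotypes = {ref + ref: (ref, ref), ref + alt: (ref, alt), alt + alt: (alt, alt)}
    if priors is None:
        priors = {g: 1.0 / len(genotypes) for g in genotypes}  # uniform prior

    log_post = {}
    for name, (a1, a2) in genotypes.items():
        total = math.log(priors[name])
        for base, q in zip(bases, quals):
            err = 10 ** (-q / 10)
            # Probability of seeing this base if it came from allele a1 or a2.
            p1 = 1 - err if base == a1 else err / 3
            p2 = 1 - err if base == a2 else err / 3
            # Each read comes from either chromosome with probability 1/2.
            total += math.log(0.5 * p1 + 0.5 * p2)
        log_post[name] = total

    # Normalise in log space to avoid underflow.
    peak = max(log_post.values())
    weights = {g: math.exp(v - peak) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# Hypothetical pileup: four reads show C, four show T, all with Phred 30.
posteriors = genotype_posteriors("CCCTTTCT", [30] * 8)
most_likely = max(posteriors, key=posteriors.get)
```

For this balanced pileup the heterozygous genotype CT receives almost all of the posterior probability, which is the kind of answer the question "What is the most likely genotype?" asks for.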

Page 36

VCF format

Page 37

In which gene is it located? Name, Description, OMIM, Pathway, GO, Expression profiles...

Where in the gene is it located? Intron, exon, UTR, intergenic region, splice site

Is there any AA change?
  GAA -> GAG = E->E
  GTT -> CTT = V->L
  TGG -> TGA = W->X
  TGA -> CGA = X->R

What impact does the AA change have? Damaging, benign

Is it a known SNP?

ACTACACGTGCAGGGACGT CTACGATCGATCGA AGACGCAGCTACTACACG

CTACGATCGATCTACGCAGCTACTACACGTGCTGGGACGC REF

ACCACACGTGCAGG TCGATCGACG

CTACG ATCGACGCAGCTACCA AGGGACGT

READS

Annotation – Annovar, SIFT, PERL
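The four codon examples on this slide can be reproduced by translating the reference and the variant codon with the standard genetic code and comparing the results. The Python sketch below does this; it is not how Annovar or SIFT work internally. The function name and the effect labels are illustrative, and X is used for a stop codon to match the slide.

```python
from itertools import product

# Standard genetic code in TCAG order; "X" marks a stop codon, as on the slide.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (reference AA, variant AA, effect label) for a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "X":
        effect = "stop_gained"
    elif ref_aa == "X":
        effect = "stop_lost"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


# The four examples from the slide.
examples = [
    classify_codon_change("GAA", "GAG"),
    classify_codon_change("GTT", "CTT"),
    classify_codon_change("TGG", "TGA"),
    classify_codon_change("TGA", "CGA"),
]
```

The four slide examples come out as synonymous (E to E), missense (V to L), stop gained (W to X) and stop lost (X to R). Whether a missense change is damaging or benign needs a separate predictor and is not addressed by this sketch.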

Page 38

300,000 SNPs - 10,000 Indels

Variants list

Page 39

Exome sequencing

cases

Coding variants

Controls Genetic variation DBs

Disease model Disease knowledge

Candidate genes

Family filters

The real work begins…

http://www.sciencedirect.com/science/article/pii/S0002929711003946

Variants filtering
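The filtering idea on this slide (keep coding variants seen in the cases, remove what also appears in controls or in databases of known genetic variation) can be expressed with set operations. The Python sketch below assumes a disease model in which every case carries the causal variant; that model, the function name and all variant records are hypothetical and only serve as an illustration.

```python
def filter_candidates(case_variant_sets, control_variants,
                      known_variants, coding_variants):
    """Keep coding variants shared by all cases and absent from controls
    and from databases of known variation.

    Variants are (chromosome, position, ref, alt) tuples.
    """
    shared_by_cases = set.intersection(*case_variant_sets)
    return (shared_by_cases & coding_variants) - control_variants - known_variants


# Hypothetical variant records, invented for this example.
v1 = ("chr1", 1000, "C", "T")
v2 = ("chr2", 2000, "G", "A")
v3 = ("chr3", 3000, "A", "G")
v4 = ("chr4", 4000, "T", "C")

candidates = filter_candidates(
    case_variant_sets=[{v1, v2, v3}, {v1, v3, v4}],
    control_variants={v3},
    known_variants={v2},
    coding_variants={v1, v2, v3},
)
```

Only `v1` survives in this toy example: `v2` and `v4` are not shared by both cases, and `v3` is also present in the controls. Family filters, such as requiring a variant to be absent from unaffected relatives, could be added as further set differences.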

Page 40

Data visualization