![Page 1: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/1.jpg)
CS 6293 Advanced Topics: Current Bioinformatics
Next-generation sequencing - technology
![Page 2: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/2.jpg)
Outline
• First generation sequencing
• Next generation sequencing (current)
  – AKA:
    • Second generation sequencing
    • Massively parallel sequencing
    • Ultra high-throughput sequencing
• Future generation sequencing
• Analysis challenges
![Page 3: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/3.jpg)
Sanger sequencing (1st generation)
• DNA is fragmented
• Cloned to a plasmid vector
• Cyclic sequencing reaction
• Separation by electrophoresis
• Readout with fluorescent tags
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
![Page 4: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/4.jpg)
Cyclic-array methods (next-generation)
• DNA is fragmented
• Adaptors ligated to fragments
• Several possible protocols yield an array of PCR colonies.
• Enzymatic extension with fluorescently tagged nucleotides.
• Cyclic readout by imaging the array.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
![Page 5: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/5.jpg)
Available next-generation sequencing platforms
• Illumina/Solexa
• ABI SOLiD
• Roche 454
• Polonator
• HeliScope
• …
![Page 6: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/6.jpg)
Emulsion PCR
Rothberg and Leamon Nat Biotechnol. 2008
Shendure and Ji Nat Biotechnol. 2008
• Fragments, with adaptors, are PCR amplified within a water drop in oil.
• One primer is attached to the surface of a bead.
• Used by 454, Polonator and SOLiD.
![Page 7: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/7.jpg)
454 Sequencing
Stats:
• read lengths 200-300 bp
• accuracy problem with homopolymers
• 400,000 reads per run
• costs $60 per megabase
Rothberg and Leamon Nat Biotechnol. 2008
![Page 8: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/8.jpg)
Bridge PCR
• DNA fragments are flanked with adaptors.
• A flat surface is coated with two types of primers, corresponding to the adaptors.
• Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
• Used by Illumina/Solexa.
![Page 9: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/9.jpg)
http://www.illumina.com/pages.ilmn?ID=203
![Page 10: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/10.jpg)
First round
• All 4 labeled nucleotides
• Primers
• Polymerase
![Page 11: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/11.jpg)
1. Take image of first cycle
2. Remove fluorophore
3. Remove block on 3’ terminus
![Page 12: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/12.jpg)
![Page 13: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/13.jpg)
http://seq.molbiol.ru/
Stats:
• read lengths up to 36 bp
• error rates 1-1.5%
• several million “spots” per lane (8 lanes)
• cost $2 per megabase
![Page 14: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/14.jpg)
Conventional sequencing
• Reads of up to 1,000 bp, with per-base 'raw' accuracies as high as 99.999%. In the context of high-throughput shotgun genomic sequencing, Sanger sequencing costs on the order of $0.50 per kilobase.
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
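The per-base costs quoted on these slides use different units ($ per kilobase for Sanger, $ per megabase for 454 and Illumina). A minimal sketch that puts them on one scale follows. The 3,000 Mb genome size and the function name `cost_to_sequence` are illustrative assumptions, not figures from the slides.

```python
# Costs as quoted in the slides, converted to US dollars per megabase.
# 1 megabase = 1,000 kilobases, so $0.50/kb becomes $500/Mb.
COST_PER_MB = {
    "Sanger": 0.50 * 1000,  # quoted as $0.50 per kilobase
    "454": 60.0,            # quoted as $60 per megabase
    "Illumina": 2.0,        # quoted as $2 per megabase
}


def cost_to_sequence(platform: str, genome_mb: float, coverage: float = 1.0) -> float:
    """Rough cost estimate: genome size (Mb) x fold coverage x cost per Mb."""
    return COST_PER_MB[platform] * genome_mb * coverage


# Assumed example: 1x coverage of a hypothetical 3,000 Mb genome.
for name, per_mb in COST_PER_MB.items():
    total = cost_to_sequence(name, genome_mb=3000)
    print(f"{name:8s} ${per_mb:7.2f}/Mb  ->  ${total:,.0f} for 1x of 3,000 Mb")
```

On these quoted figures, Sanger is roughly 8 times the cost of 454 and 250 times the cost of Illumina per base, before accounting for differences in read length and accuracy.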
![Page 15: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/15.jpg)
Sequence qualities
• In most cases, the quality is poorest toward the ends, with a region of high quality in the middle
• Uses of sequence qualities
  – ‘Trimming’ of reads
    • Removal of low quality ends
  – Consensus calling in sequence assembly
  – Confidence metric for variant discovery
• In general, newer approaches produce larger amounts of sequences that are shorter and of lower per-base quality
  – Next-generation sequencing has an error rate around 1% or higher
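The 'trimming' use of quality values can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the threshold of 20, the function name `trim_low_quality_ends`, and the example read are all assumptions made for the sketch.

```python
def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Remove bases from both ends until a base with quality >= min_q is reached."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start, end = 0, len(seq)
    # Walk inward from the 5' end past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk inward from the 3' end past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read: poor quality at both ends, high quality in the middle.
read = "ACGTTGCAAT"
quals = [8, 12, 30, 35, 38, 36, 33, 25, 15, 9]
trimmed_seq, trimmed_quals = trim_low_quality_ends(read, quals)
print(trimmed_seq, trimmed_quals)
```

Only the ends are removed here; a low-quality base in the middle of the read is kept, which matches the "removal of low quality ends" described above.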
![Page 16: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/16.jpg)
Phred Quality Score
$q = -10 \log_{10}(p)$
• p = error probability for the base
• if p = 0.01 (1% chance of error), then q = 20
• p = 0.00001 (99.999% accuracy), q = 50
• Phred quality values are rounded to the nearest integer
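The Phred relation between error probability p and quality q can be checked with a short sketch. The function names are illustrative, not taken from any library.

```python
import math


def phred_from_error_prob(p: float) -> int:
    """Phred quality q = -10 * log10(p), rounded to the nearest integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10.0 * math.log10(p))


def error_prob_from_phred(q: float) -> float:
    """Inverse relation: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


# The two worked values from the slide, plus the inverse direction.
for p in (0.01, 0.00001):
    print(f"p = {p:g}  ->  q = {phred_from_error_prob(p)}")
print(f"q = 30  ->  p = {error_prob_from_phred(30):g}")
```

Each step of 10 in q corresponds to a tenfold drop in error probability.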
![Page 17: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/17.jpg)
Main Illumina noise factors
Schematic representation of main Illumina noise factors. (a–d) A DNA cluster comprises identical DNA templates (colored boxes) that are attached to the flow cell. Nascent strands (black boxes) and DNA polymerase (black ovals) are depicted. (a) In the ideal situation, after several cycles the signal (green arrows) is strong, coherent and corresponds to the interrogated position. (b) Phasing noise introduces lagging (blue arrows) and leading (red arrow) nascent strands, which transmit a mixture of signals. (c) Fading is attributed to loss of material that reduces the signal intensity. (d) Changes in the fluorophore cross-talk cause misinterpretation of the received signal (blue arrows). For simplicity, the noise factors are presented separately from each other.
Erlich et al. Nature Methods 5: 679-682 (2008)
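To get a feel for why phasing noise grows with cycle number, here is a toy model, offered only as a sketch. It assumes that in every cycle each strand independently lags (fails to extend) or leads (extends by two bases) with a fixed probability. The 0.5% rates and the function name `in_phase_fraction` are assumptions for illustration, not values from Erlich et al.

```python
def in_phase_fraction(cycles: int, p_lag: float = 0.005, p_lead: float = 0.005) -> list[float]:
    """Fraction of strands in a cluster reading the interrogated position after each cycle.

    Toy model: per cycle a strand falls one position behind with probability
    p_lag, jumps one position ahead with probability p_lead, and otherwise
    stays in step with the rest of the cluster.
    """
    # offsets[d] = fraction of strands that are d positions away from the ideal position
    offsets = {0: 1.0}
    history = []
    for _ in range(cycles):
        nxt: dict[int, float] = {}
        for d, frac in offsets.items():
            for step, prob in ((-1, p_lag), (0, 1.0 - p_lag - p_lead), (1, p_lead)):
                nxt[d + step] = nxt.get(d + step, 0.0) + frac * prob
        offsets = nxt
        history.append(offsets[0])
    return history


# 36 cycles, matching the read length quoted for Illumina in these slides.
fractions = in_phase_fraction(36)
for cycle in (1, 12, 24, 36):
    print(f"cycle {cycle:2d}: {fractions[cycle - 1]:.3f} of strands in phase")
```

Under these assumed rates the in-phase fraction drops steadily with each cycle, which is one reason base quality tends to fall toward the end of a read.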
![Page 18: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/18.jpg)
Comparison of existing methods
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
![Page 19: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/19.jpg)
Read length and pairing
• Short reads are problematic, because short sequences do not map uniquely to the genome.
• Solution #1: Get longer reads.
• Solution #2: Get paired reads.
ACTTAAGGCTGACTAGC TCGTACCGATATGCTG
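How pairing resolves an ambiguous short read can be shown with a toy example. This is a sketch under simplifying assumptions: exact matching only, both mates taken from the same strand (real mates are usually reverse-complemented), and a made-up genome containing one 30 bp repeat. The names `find_all` and `paired_placements` are hypothetical.

```python
import random


def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where the read matches the genome exactly."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(read, pos + 1)
    return hits


def paired_placements(genome: str, left: str, right: str,
                      insert_size: int, tolerance: int = 0) -> list[tuple[int, int]]:
    """Keep (left_pos, right_pos) pairs whose start-to-start distance fits the insert size."""
    return [
        (i, j)
        for i in find_all(genome, left)
        for j in find_all(genome, right)
        if abs((j - i) - insert_size) <= tolerance
    ]


rng = random.Random(0)  # fixed seed so the toy genome is reproducible


def rand_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


# Toy genome: the same 30 bp repeat occurs at positions 200 and 430.
repeat = rand_dna(30)
genome = rand_dna(200) + repeat + rand_dna(200) + repeat + rand_dna(200)

left_read = repeat[:20]        # falls inside the repeat, so it maps twice
right_read = genome[300:320]   # mate taken 100 bp downstream of the first copy

print("left read alone:", find_all(genome, left_read))
print("as a pair      :", paired_placements(genome, left_read, right_read, insert_size=100))
```

The left read alone has two equally good placements; requiring its mate to sit at the expected distance leaves only one.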
![Page 20: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/20.jpg)
Third generation
• Single-molecule sequencing
  – no DNA amplification is involved
  – Helicos HeliScope
  – Pacific Biosciences SMRT
  – …
• Longer reads
  – Roche/454 > 400 bp
  – Illumina/Solexa > 100 bp
  – Pacific Biosciences > 1000 bp and single molecule
![Page 21: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/21.jpg)
Applications of next-generation sequencing
Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)
![Page 22: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/22.jpg)
Analysis tasks
• Base calling
• Mapping to a reference genome
• De novo or assisted genome assembly
![Page 23: CS 6293 Advanced Topics: Current Bioinformatics](https://reader035.vdocument.in/reader035/viewer/2022062502/56815649550346895dc3ea51/html5/thumbnails/23.jpg)
References
• Jay Shendure & Hanlee Ji, Next-generation DNA sequencing, Nature Biotechnology 26, 1135-1145 (2008)
• Elaine R. Mardis, Next-Generation DNA Sequencing Methods, Annu. Rev. Genomics Hum. Genet. 9, 387-402 (2008)