6. Ulrike Schoeck - GATC Biotech
DESCRIPTION
Provisioning bioinformatics: are we prepared?
TRANSCRIPT
Page 1
GATC Biotech confidential VII © 2007-2011
Eagle Genomics Symposium
"Provisioning bioinformatics - are we prepared?"
Ulrike Schoeck
GATC Biotech
April 5th, 2011
Page 2
Agenda
I. Introduction to GATC Biotech, a sequencing service provider
II. Presentation of in-house sequencing technologies
III. Bioinformatics - definition and history
IV. Evolution of sequencing
V. Sequence analysis - what do we face every day?
VI. Sequencing applications - what is possible?
VII. Conclusions - are we prepared?
Page 3
GATC Biotech - where we are
Page 4
GATC Biotech
• Leading European commercial sequencing service provider
• Over 20 years of experience and know-how
• ISO-certified since 1997
• 100% privately owned, self-financed & independent
• More than 125 employees in 5 subsidiaries, 22 sales offices
• 3-shift sequencing labs in Germany (Konstanz & Duesseldorf) and the UK
• Over 10,000 customers all over the world (industry & academia)
• Illumina Certified Service Provider
Complete and integrated sequencing & bioinformatics solutions:
from single sample to ultra-high throughput
Page 5
Sequencing technologies in-house
• Applied Biosystems ABI 3730xl: since 1996
• Roche / 454 Genome Sequencer FLX: since 2006
• Illumina / Solexa HiSeq 2000: since 2006
• Pacific Biosciences PacBio RS: May 2011
Page 6
Sequencing capacity
[Figure: yearly sequencing capacity in Tb (y-axis 0 to 70) over time (till 2006, then half-yearly from July 07 to July 11), by platform: Applied Biosystems ABI 3730xl, GS FLX, GA, HiSeq, PacBio RS]
Page 7
System comparison

| system | GS FLX | HiSeq 2000 | PacBio RS |
|---|---|---|---|
| available since | 2005 (GS 20 by 454 Life Sciences) | 2006 (Genetic Analyzer by Solexa) | 2010 |
| device | PicoTiterPlate with wells | flow cells with channels | SMRT cells with zero-mode waveguides |
| library | DNA fragmentation, adapter ligation | DNA fragmentation, adapter ligation | DNA fragmentation, adapter ligation |
| amplification | emulsion PCR | bridge PCR | none |
| sequencing | sequencing by synthesis: pyrosequencing | sequencing by synthesis: cyclic reversible termination | sequencing by synthesis: single molecule, real-time |
Page 8
Comparison

| | GS FLX | HiSeq 2000 | PacBio RS |
|---|---|---|---|
| Read length | avg. 400 bases | 50 or 100 bases | > 1,000 bases |
| Mate pairs / paired end | averaging 140-200+ bases; insert sizes ~3 kb & higher | 2 x 50 or 2 x 100 bases; insert sizes 300 b, ~3 kb | strobe reads |
| # of reads / run | > 1 million | > 800,000,000 (single reads); > 1,600,000,000 (paired end) | 75,000 ZMWs / SMRT cell |
| Base incorporation | same bases in one cycle (homopolymers) | base after base, cycle by cycle | base after base, continuously |
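Multiplying reads per run by read length gives a rough feel for how much raw sequence one run produces. The sketch below takes the table's read counts at face value and assumes every read reaches the listed length, so the numbers are illustrative estimates, not instrument specifications; the function name `raw_yield_gb` is hypothetical.

```python
def raw_yield_gb(reads_per_run: int, read_length: int) -> float:
    """Rough raw output of one run in gigabases (1 Gb = 1e9 bases)."""
    return reads_per_run * read_length / 1e9


# Read counts and lengths from the comparison table; every read is
# assumed to reach the full listed length.
PLATFORMS = {
    "GS FLX (1 million x 400 bases)": (1_000_000, 400),
    "HiSeq 2000, single reads (800 million x 100 bases)": (800_000_000, 100),
    "HiSeq 2000, paired end (1.6 billion x 100 bases)": (1_600_000_000, 100),
}

for name, (reads, length) in PLATFORMS.items():
    print(f"{name}: {raw_yield_gb(reads, length):.1f} Gb")
```

This prints 0.4 Gb, 80.0 Gb and 160.0 Gb respectively: under these assumptions a single paired-end HiSeq 2000 run yields about 400 times the raw sequence of a GS FLX run.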
Page 9
Bioinformatics
Definition:
• Science that explains biology using information technologies (computational biology)
• Provides algorithms, databases, user interfaces and statistical applications for assessing potential scientific significance
Page 10
Bioinformatics
Main subject:
• Representation of macromolecules as linear chains of defined components, i.e. as sequences of symbols
• Main application in bioinformatics: comparing sequences to detect homology (function, structure)
GCGTCCTCGGGCTTGGCGA
ACTGGGCGGCGGCGGTGGC
GGGCAGCAGCATGGGGGCG
GCA...
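Comparing sequences usually starts with an alignment. As a minimal sketch (not a method described in the talk), the classic Needleman-Wunsch global alignment below scores two DNA strings with an assumed toy scheme: +1 per match, -1 per mismatch, -1 per gap. Real tools use substitution matrices, affine gap penalties and, for database searches, heuristics such as BLAST. The second sequence is a hypothetical variant of the first line of the example sequence with one G deleted.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns the optimal score and one optimal alignment (gaps shown as '-').
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100 * same / len(aligned_a)


# First line of the example sequence vs. a hypothetical one-base deletion variant.
seq1 = "GCGTCCTCGGGCTTGGCGA"
seq2 = "GCGTCCTCGGCTTGGCGA"
result, top, bottom = needleman_wunsch(seq1, seq2)
print(result)
print(top)
print(bottom)
print(f"{percent_identity(top, bottom):.1f}% identity")
```

The two sequences share 18 bases and differ by a single gap, so the optimal score is 17 and the identity is 18/19, about 94.7%.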
Page 12
History
• Manual analysis of sequence homologies using standard word-processing programs
• Lasting change in molecular biology through the introduction of efficient computer algorithms:
  • Sequence alignment
  • Phylogenetics
  • Pattern matching
  • Web-based database searches
  • …
Page 13
Evolution of Sequencing
• Sanger sequencing (1 read per sequencing reaction)
• Roche (1,000,000 reads per sequencing run)
• Illumina (1,600,000,000 reads per sequencing run)
Page 14
Evolution of Sequencing
[Figure: yearly sequencing capacity in Tb (y-axis 0 to 70) over time, from till 2006 to July 11]
Page 15
Sequence analysis - today
• Massive amounts of sequence data produced by next-generation sequencing technologies
• Advantages
  • Applications, applications, applications…
  • Runtime
  • Costs
• Challenges
  • Data analysis and interpretation
  • Hardware infrastructure
  • Data storage
  • Software development
  • Error rates
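To put the data storage challenge into numbers, the size of an uncompressed FASTQ file (one byte per ASCII character) follows directly from the record layout. This is a back-of-the-envelope sketch under stated assumptions: one paired-end HiSeq 2000 run of 1.6 billion 100-base reads and an assumed 38-character Illumina-style header; the function name `fastq_bytes` is hypothetical.

```python
def fastq_bytes(n_reads: int, read_length: int, header_length: int) -> int:
    """Uncompressed size in bytes of a FASTQ file with fixed-length records.

    Each record is four newline-terminated lines: the header, the bases,
    a bare '+' separator, and one quality character per base.
    """
    per_record = (header_length + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * per_record


# Assumed scenario: one paired-end HiSeq 2000 run (1.6 billion reads of
# 100 bases) with 38-character headers, stored uncompressed.
size = fastq_bytes(1_600_000_000, 100, 38)
print(f"{size / 1e9:.1f} GB")  # 388.8 GB
```

Each record costs 243 bytes here, roughly 2.4 bytes per base, so a single run lands in the hundreds of gigabytes before any alignment or assembly output is written.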
Page 16
Sequence analysis - today
Example: de novo sequencing
@HWI-ST143_0345:7:1:1200:2150#CGATGT/1
TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
+
HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
@HWI-ST143_0345:7:1:1310:2072#CGATGT/1
CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
+
GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
...
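The reads above are in FASTQ format: a header line starting with '@', the bases, a '+' separator, and one quality character per base. The minimal reader below is a sketch that assumes unwrapped four-line records and Phred+33 quality encoding (the quality strings contain '#', ASCII 35, which would be a negative score under the older Phred+64 convention). It is not a substitute for a tested parser such as Biopython's `SeqIO`; the names `FastqRecord` and `read_fastq` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    offset=33 assumes Phred+33 ("Sanger") quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


# The two example reads shown above.
EXAMPLE = (
    "@HWI-ST143_0345:7:1:1200:2150#CGATGT/1\n"
    "TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG\n"
    "+\n"
    "HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF\n"
    "@HWI-ST143_0345:7:1:1310:2072#CGATGT/1\n"
    "CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA\n"
    "+\n"
    "GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:\n"
)

records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    mean_q = sum(rec.qualities) / len(rec.qualities)
    print(f"{rec.name}\t{len(rec.sequence)} bases\tmean Q {mean_q:.1f}")
```

In the second read the two uncalled bases (NN) carry the quality character '#', i.e. Phred score 2, which is what a downstream quality filter would key on.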
Page 17
Sequence analysis - today
Bioinformatics: Assembly → Scaffolding → Annotation → Finishing
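Assembly reconstructs longer sequences (contigs) from overlapping reads. Real assemblers build overlap or de Bruijn graphs and must cope with sequencing errors and repeats; the sketch below uses a much simpler greedy overlap merge purely to show the idea. It assumes error-free reads, no read contained within another, and a hypothetical minimum overlap of 3 bases; the reads are a made-up toy example, and `overlap` and `greedy_assemble` are hypothetical names.

```python
from typing import List


def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no such overlap of at least min_length bases exists.
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_length: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Assumes error-free reads and that no read is contained in another.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    olen = overlap(left, right, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining sequences are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads sampled from a made-up sequence, not real data.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can join the wrong reads when repeats are longer than the overlaps, which is one reason scaffolding and finishing remain separate steps.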
Page 18
Sequence analysis - today
Example: Quantitative transcriptomics
Page 19
Sequence analysis - today
Bioinformatics: Alignment → Quantification → Comparison
Flowchart:
data → Preanalysis (short and low-quality reads, low-complexity regions, cDNA adapters, sequencing primers) → cleaned sequences
• option 1: clustering: Clustering → cluster representatives → BLAST analysis → cluster hits
• option 2: assembly: Assembly → de novo contigs → Assembly validation, BLAST analysis → contig hits
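The preanalysis step can be sketched as a per-read filter: trim adapter sequence, then discard reads that are too short, of low quality, or of low complexity. Everything here is an assumption for illustration: the adapter is an example sequence (the start of a common Illumina adapter; substitute the adapters and primers actually used in the library), the thresholds are placeholders, and base-composition entropy is a crude stand-in for dedicated low-complexity maskers such as DUST. All function and constant names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Illustrative values only: substitute the adapters/primers actually used
# and thresholds suited to the platform and project.
ADAPTER = "AGATCGGAAGAGC"
MIN_LENGTH = 20
MIN_MEAN_QUALITY = 20.0
MIN_ENTROPY_BITS = 1.0


def adapter_start(seq: str, adapter: str, min_match: int = 5) -> int:
    """Index where the adapter starts in seq, or len(seq) if it is absent."""
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # A partial adapter can sit at the 3' end: check read suffixes that
    # equal an adapter prefix, longest first.
    for length in range(min(len(adapter) - 1, len(seq)), min_match - 1, -1):
        if seq.endswith(adapter[:length]):
            return len(seq) - length
    return len(seq)


def base_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition; 0 for a homopolymer."""
    counts = Counter(seq)
    return -sum(c / len(seq) * math.log2(c / len(seq)) for c in counts.values())


def preprocess(seq: str, qual: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Trim the adapter, then drop short, low-quality or low-complexity reads."""
    cut = adapter_start(seq, ADAPTER)
    seq, qual = seq[:cut], qual[:cut]
    if len(seq) < max(MIN_LENGTH, 1):
        return None
    if sum(qual) / len(qual) < MIN_MEAN_QUALITY:
        return None
    if base_entropy(seq) < MIN_ENTROPY_BITS:
        return None
    return seq, qual


insert = "ACGTTGCATGCCATAGGCTTACGATCGTAC"  # made-up 30-base insert
kept = preprocess(insert + ADAPTER, [30] * (len(insert) + len(ADAPTER)))
print(kept[0] if kept else "discarded")  # the 30-base insert, adapter removed
print(preprocess("A" * 40, [30] * 40))   # None: homopolymer (low complexity)
```

Trimming runs before the length and quality checks so that a read consisting mostly of adapter is judged on what remains of the insert.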
Page 20
Sequence analysis - today

| Query | QLength | %HitLength | HitLength | %Identity | e-value | GeneID | GeneLength |
|---|---|---|---|---|---|---|---|
| GD3X8YD02G3UTK | 473 | 105.07 | 497 | 88 | 1.00E-138 | 59783566 | 778 |
| GD3X8YD02HAS2D | 504 | 103.57 | 522 | 92 | 0 | 112201467 | 867 |
| GD3X8YD01EQVR9 | 438 | 103.42 | 453 | 89 | 1.00E-129 | 82985781 | 904 |
| GD3X8YD02F9LP1 | 372 | 103.23 | 384 | 92 | 1.00E-140 | 194673237 | 7376 |
| GD3X8YD01BBAL3 | 435 | 103.22 | 449 | 91 | 1.00E-165 | 112362035 | 3276 |
| GD3X8YD02IRTW2 | 413 | 103.15 | 426 | 87 | 1.00E-104 | 56145323 | 770 |
| GD3X8YD01C9534 | 418 | 103.11 | 431 | 93 | 1.00E-167 | 157279321 | 3170 |
| GD3X8YD01EJJ53 | 461 | 103.04 | 475 | 89 | 1.00E-137 | 73976208 | 2891 |

| Cluster ID | Length (bp) | %HitLength | e-value | UniGene ID | Gene Length | Contig ID | Contig Length (bp) | Hit Start | Hit End |
|---|---|---|---|---|---|---|---|---|---|
| GD3X8YD02G3UTK | 473 | 105.07 | 1.00E-138 | 59783566 | 778 | contig20575 | 2718 | 300 | 770 |
| GD3X8YD02HAS2D | 504 | 103.57 | 0 | 112201467 | 867 | contig01324 | 2816 | 1230 | 1804 |
| GD3X8YD01EQVR9 | 438 | 103.42 | 1.00E-129 | 82985781 | 904 | contig01325 | 2825 | 67 | 513 |
| GD3X8YD02F9LP1 | 372 | 103.23 | 1.00E-140 | 194673237 | 7376 | contig01323 | 2903 | 2005 | 2375 |
| GD3X8YD01BBAL3 | 435 | 103.22 | 1.00E-165 | 112362035 | 3276 | contig01321 | 2980 | 400 | 830 |
| GD3X8YD02IRTW2 | 413 | 103.15 | 1.00E-104 | 56145323 | 770 | contig01320 | 2977 | 2300 | 2710 |
| GD3X8YD01C9534 | 418 | 103.11 | 1.00E-167 | 157279321 | 3170 | contig01318 | 2894 | 34 | 460 |
| GD3X8YD01EJJ53 | 461 | 103.04 | 1.00E-137 | 73976208 | 2891 | contig01315 | 2971 | 56 | 510 |
| GD3X8YD02F06H3 | 427 | 103.04 | 0 | 194673243 | 2026 | contig01314 | 2968 | 124 | 552 |
| GD3X8YD02JQKXX | 463 | 103.02 | 1.00E-180 | 146186547 | 4283 | contig01322 | 2808 | 2287 | 2750 |
| GD3X8YD01C2RA0 | 438 | 102.97 | 0 | 167693932 | 567 | contig01319 | 2886 | 1456 | 1890 |
| GD3X8YD01BMILD | 439 | 102.96 | 1.00E-130 | 112362035 | 3276 | contig01317 | 2960 | 1098 | 1530 |
| GD3X8YD02IA76G | 478 | 102.93 | 1.00E-155 | 160333384 | 4068 | contig01316 | 2963 | 1004 | 1484 |
| GD3X8YD02IJUER | 443 | 102.93 | 0 | 114451341 | 864 | contig05489 | 2657 | 95 | 1438 |
| GD3X8YD01AUA6W | 484 | 102.89 | 1.00E-119 | 59782054 | 869 | contig22645 | 3358 | 3009 | 3486 |
| GD3X8YD01CKSI4 | 451 | 102.88 | 1.00E-140 | 74268339 | 2212 | contig20113 | 2734 | 1765 | 2109 |
| GD3X8YD02JPD23 | 492 | 102.85 | 1.00E-167 | 151556820 | 3665 | contig11912 | 2558 | 53 | 547 |
| GD3X8YD02GG9F3 | 390 | 102.82 | 1.00E-105 | 219283151 | 3782 | contig01371 | 2299 | 1453 | 1843 |
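The %HitLength column appears to be HitLength divided by QLength, times 100 (first row: 497 / 473 × 100 ≈ 105.07). The sketch below recomputes it for a few rows copied from the first table and filters hits by e-value and identity; the cut-offs `MAX_EVALUE` and `MIN_IDENTITY` are hypothetical and not taken from the slides, as are the type and function names.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    query: str
    query_length: int   # QLength
    hit_length: int     # HitLength
    identity: float     # %Identity
    evalue: float       # e-value
    gene_id: int        # GeneID


def percent_hit_length(hit: BlastHit) -> float:
    """HitLength as a percentage of QLength (the %HitLength column)."""
    return 100 * hit.hit_length / hit.query_length


# A few rows copied from the first table above.
HITS: List[BlastHit] = [
    BlastHit("GD3X8YD02G3UTK", 473, 497, 88.0, 1e-138, 59783566),
    BlastHit("GD3X8YD02HAS2D", 504, 522, 92.0, 0.0, 112201467),
    BlastHit("GD3X8YD01EQVR9", 438, 453, 89.0, 1e-129, 82985781),
    BlastHit("GD3X8YD02IRTW2", 413, 426, 87.0, 1e-104, 56145323),
]

# Hypothetical cut-offs for illustration, not values from the slides.
MAX_EVALUE = 1e-120
MIN_IDENTITY = 88.0

accepted = [h for h in HITS if h.evalue <= MAX_EVALUE and h.identity >= MIN_IDENTITY]
for h in sorted(accepted, key=percent_hit_length, reverse=True):
    print(f"{h.query}\t{percent_hit_length(h):.2f}\t{h.identity:.0f}\t{h.evalue:g}")
```

With these cut-offs GD3X8YD02IRTW2 is dropped (e-value 1e-104 and 87% identity both miss the thresholds), while the other three rows are kept and their recomputed %HitLength values match the table.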
Page 21
Sequencing applications
DNA
• single reads in tubes & plates (PCR products, plasmids)
• whole (meta)genome de novo sequencing
• whole-genome re-sequencing
• targeted re-sequencing (enrichment, amplicons, exons)
• methylome / epigenome studies
• ChIP-Seq
RNA
• eukaryotic / prokaryotic cDNA de novo sequencing
• eukaryotic / prokaryotic cDNA re-sequencing (3’ UTR / 5’ UTR)
• small RNA / microRNA
Page 22
Sequence analysis - today
• Massive amounts of sequence data produced by next-generation sequencing technologies
• Advantages
  • Applications, applications, applications…
  • Turnover
  • Costs
• Challenges
  • Data analysis and interpretation
  • Hardware infrastructure
  • Data storage
  • Software development
  • Error rates
Page 23
Conclusion
Provisioning bioinformatics: Are we prepared?
GAP between
• advancements in sequencing technologies (data quantity and application complexity) and
• advancements in information technologies (hardware and software)
SOLUTION: cloud computing, GPU usage, software development, parallelization...
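As one illustration of the parallelization point, independent per-read computations can be split into chunks, farmed out to worker processes, and the partial results summed. The sketch below uses Python's standard `concurrent.futures` module; GC content is only a stand-in workload chosen for brevity, and the chunk size, worker count and function names are arbitrary assumptions.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence, Tuple, Type

# ThreadPoolExecutor is imported as a drop-in alternative for testing or
# for workloads that release the GIL.


def gc_at_counts(chunk: Sequence[str]) -> Tuple[int, int]:
    """Count G+C and A+T bases in a chunk of reads (N and other symbols are ignored)."""
    gc = sum(read.count("G") + read.count("C") for read in chunk)
    at = sum(read.count("A") + read.count("T") for read in chunk)
    return gc, at


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_gc_content(reads: Sequence[str], chunk_size: int = 10_000,
                        executor_cls: Type[Executor] = ProcessPoolExecutor,
                        workers: int = 4) -> float:
    """GC fraction over all reads; chunks are counted in parallel, then summed."""
    with executor_cls(max_workers=workers) as pool:
        partial_counts = list(pool.map(gc_at_counts, chunked(reads, chunk_size)))
    gc = sum(p[0] for p in partial_counts)
    at = sum(p[1] for p in partial_counts)
    return gc / (gc + at) if gc + at else 0.0
```

With the default `ProcessPoolExecutor`, call `parallel_gc_content(reads)` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely; passing `executor_cls=ThreadPoolExecutor` swaps in threads. The same split, map and reduce pattern carries over to cluster or cloud schedulers, where each chunk becomes a separate job.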
Page 24
Thanks for your kind attention.
Open questions?
www.gatc-biotech.com