![Page 1: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/1.jpg)
The challenge of NGS data
Guy Cochrane
![Page 2: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/2.jpg)
Data intensity
Availability of technologies
Broadening of applications
![Page 3: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/3.jpg)
Context
![Page 4: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/4.jpg)
EMBL European Bioinformatics Institute
Genes, genomes & variation: European Nucleotide Archive, 1000 Genomes, Ensembl, Ensembl Genomes, European Genome-phenome Archive, Metagenomics portal
Gene, protein & metabolite expression: ArrayExpress, Expression Atlas, MetaboLights, PRIDE
Protein sequences, families & motifs: InterPro, Pfam, UniProt
Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
Chemical biology: ChEMBL, ChEBI
Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
Systems: BioModels, Enzyme Portal, BioSamples
Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
![Page 5: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/5.jpg)
European Nucleotide Archive (ENA)
http://www.ebi.ac.uk/ena/
• A broad platform for the management, sharing, integration and dissemination of sequence data
• Established in the early 1980s, extended for new technologies and applications
• Globally comprehensive scientific record and European node of INSDC
• Connectivity with broader EMBL-EBI resources
• Sequence data foundation
• Sustained within EMBL-EBI under EMBL funding with additional support from EC, UK Research councils, Wellcome Trust, etc.
• Substantial scale: 1.3 petabase pairs across >1 million taxa, 2,000-5,000 active data providers, global consumer userbase
• Rich submission, discovery and retrieval software, tools and services
![Page 6: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/6.jpg)
Submissions
![Page 7: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/7.jpg)
Data discovery
temperature>=10 AND temperature<=25 AND geo_box1(42, 17, 43, 18)
tax_tree(10090) AND library_source="GENOMIC" AND instrument_platform="ILLUMINA" AND library_strategy="ChIP-Seq"
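Queries of this kind are boolean combinations of field filters and functions. As a rough sketch, they could be assembled and URL-encoded in Python as follows. The helper names `and_query` and `search_url`, the `result` value and the endpoint URL are assumptions made for illustration and do not come from the slides; the current ENA documentation should be checked for the real interface. No request is sent.

```python
from urllib.parse import urlencode

# Endpoint assumed for illustration only; verify against the ENA documentation.
SEARCH_ENDPOINT = "https://www.ebi.ac.uk/ena/portal/api/search"


def and_query(*clauses: str) -> str:
    """Join individual filter clauses with AND, as in the slide examples."""
    return " AND ".join(clauses)


def search_url(query: str, result: str = "read_run") -> str:
    """Build a URL with the query percent-encoded (nothing is fetched)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"result": result, "query": query})


# Taxon 10090 is Mus musculus; the clauses are copied from the slide.
mouse_chip_seq = and_query(
    "tax_tree(10090)",
    'library_source="GENOMIC"',
    'instrument_platform="ILLUMINA"',
    'library_strategy="ChIP-Seq"',
)
print(mouse_chip_seq)
print(search_url(mouse_chip_seq))
```

Keeping each clause as a separate string makes it easy to add or drop a filter without re-quoting the whole expression.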
![Page 8: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/8.jpg)
Cross-references and tagging GUI: http://www.ebi.ac.uk/ena/data/xref/search
![Page 9: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/9.jpg)
COMPARE
COMPARE: the enabling system for rapid identification, containment and mitigation of emerging infectious diseases and foodborne outbreaks by generation and comparison of genomic information on samples and pathogens across sectors, time and locations, with additional contextual data.
A global platform for the sequence-based rapid identification of pathogens
![Page 10: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/10.jpg)
Ocean Sampling Day and Tara Oceans
![Page 11: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/11.jpg)
Technology
![Page 12: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/12.jpg)
Data accumulation
Cook CE, Bergman MT, Finn RD, Cochrane G, Birney E, Apweiler R. The European Bioinformatics Institute in 2016: Data growth and integration. Nucleic Acids Res. 2016 Jan;44(D1):D20-6. doi:10.1093/nar/gkv1352. PMID: 26673705; PMCID: PMC4702932.
![Page 13: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/13.jpg)
Starting point
TGAGCTCTAAGTACC
329183050298757
![Page 14: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/14.jpg)
Sequence compression
• Encoding of read starts and differences
• 3.5x–100x compression over existing formats
• Scales favourably with increasing read length and density
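The idea of encoding read starts and differences can be sketched in a few lines of Python. This is a simplified illustration, not the CRAM format itself: it handles substitutions only (no insertions, deletions or quality scores), and the function names, the reference string and the read are invented for the example.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (offset within the read, observed base)


def encode_read(reference: str, start: int, read: str) -> Tuple[int, int, List[Diff]]:
    """Store a read as its start position, its length, and its mismatches only."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return start, len(read), diffs


def decode_read(reference: str, start: int, length: int, diffs: List[Diff]) -> str:
    """Rebuild the read from the reference plus the stored differences."""
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTTGAGCTCTAAGTACCGGA"  # made-up reference
read = "TGAGCTGTAAGTACC"              # aligns at position 4 with one substitution
encoded = encode_read(reference, 4, read)
print(encoded)
assert decode_read(reference, *encoded) == read
```

A read that matches the reference exactly costs only a start and a length, which is why longer reads and deeper coverage compress better per base in this scheme.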
Figure: bits per base (0.025 to 0.7) against coverage (0 to 50) in six panels (0.01%, 0.1% and 1.0%, each unpaired and paired), with one series per read length (25, 50, 100, 200 and 400).
Fritz, M.H., Leinonen, R., et al. (2011) Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21 (5), 734-40.
![Page 15: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/15.jpg)
Quality compression
TGAGCTCTAAGTACC
329183050298757
Horizontal reduction: 002020010022212
Vertical reduction: -2---30---9---7
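The two reductions on this slide can be reproduced with a short Python sketch. Horizontal reduction keeps a score at every position but at coarser resolution; vertical reduction keeps full-resolution scores at only some positions. The bin edges and the set of kept positions below are assumptions read off the slide's example strings, and the function names are invented for illustration.

```python
QUALITIES = "329183050298757"  # per-base quality digits from the slide


def horizontal_reduction(qualities: str) -> str:
    """Keep every position but coarsen each score into one of three bins."""
    def bin_of(q: int) -> str:
        # Assumed bin edges: 0-3 -> 0, 4-6 -> 1, 7-9 -> 2.
        if q <= 3:
            return "0"
        if q <= 6:
            return "1"
        return "2"
    return "".join(bin_of(int(ch)) for ch in qualities)


def vertical_reduction(qualities: str, keep: set) -> str:
    """Keep full-resolution scores only at the selected positions."""
    return "".join(ch if i in keep else "-" for i, ch in enumerate(qualities))


KEPT_POSITIONS = {1, 5, 6, 10, 14}  # read off the slide's example
print(horizontal_reduction(QUALITIES))                # 002020010022212
print(vertical_reduction(QUALITIES, KEPT_POSITIONS))  # -2---30---9---7
```

In practice the kept positions would be chosen by some rule, for example positions that differ from the reference; the fixed set here only mirrors the example.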
![Page 16: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/16.jpg)
Quality compression: simple, horizontal reduction
Photograph from MichaelMaggs, http://en.wikipedia.org/wiki/File:Amanita_muscaria_(fly_agaric).JPG
![Page 17: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/17.jpg)
Models for data reduction
Jong-Seok Lee et al. (2009), http://mmspg.epfl.ch/files/content/sites/mmspl/files/shared/lee_icme.pdf
![Page 18: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/18.jpg)
CRAM performance
Bits/base
![Page 19: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/19.jpg)
CRAM performance
Bits/base
1000 Genomes
WT Sanger Institute data
![Page 20: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/20.jpg)
Human aspects
![Page 21: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/21.jpg)
Standards
Figure: metadata field labels grouped under "event" and "organism".
ten Hoopen et al. Standards in Genomic Sciences (2015) 10:20
![Page 22: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/22.jpg)
![Page 23: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/23.jpg)
Metagenomics
![Page 24: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/24.jpg)
https://www.ebi.ac.uk/metagenomics/
![Page 25: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/25.jpg)
![Page 26: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/26.jpg)
Scheuch et al. (2015) BMC Bioinformatics 16:69
![Page 27: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/27.jpg)
Identification data
Sequence Analysis Genes
Taxa Samples
![Page 28: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/28.jpg)
Identification data accumulation
• Identification against context is informative
• (Molecular observations may be tentative or high confidence)
• Coincidental observations (time, place, virulence, host phenotypes)
• ‘Capture’ identifications to ‘connect’ coincidental observations
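One way to picture "capturing" identifications so that coincidental observations become connected is a small record type plus a grouping step. The sketch below is purely illustrative: the `Identification` fields, the `connect_by_taxon` helper and all sample data are hypothetical and do not describe any existing ENA data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Identification:
    taxon: str       # what was identified
    sample: str      # sample label (hypothetical)
    confidence: str  # e.g. "tentative" or "high"
    place: str
    date: str        # ISO date string


def connect_by_taxon(records: List[Identification]) -> Dict[str, List[Identification]]:
    """Group captured identifications and keep taxa seen in more than one sample."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.taxon].append(rec)
    # A taxon found in several samples links otherwise separate observations.
    return {t: recs for t, recs in grouped.items()
            if len({r.sample for r in recs}) > 1}


# Invented example data.
records = [
    Identification("virus A", "sample-1", "tentative", "site X", "2016-03-01"),
    Identification("virus A", "sample-2", "high", "site Y", "2016-03-04"),
    Identification("virus B", "sample-2", "tentative", "site Y", "2016-03-04"),
]
for taxon, recs in connect_by_taxon(records).items():
    print(taxon, [(r.sample, r.place, r.date, r.confidence) for r in recs])
```

Storing the confidence alongside each record lets tentative and high-confidence observations sit side by side rather than being filtered out at capture time.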
![Page 29: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/29.jpg)
Identification data
Sequence Analysis Genes
Taxa Samples
User
![Page 30: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/30.jpg)
Identification data
Sequence Analysis Genes
Taxa Samples
User
User
![Page 31: The challenge of NGS data - International Committee on ...talk.ictvonline.org/cfs-file/__key/communityserver-wikis-components-files/00...Experimental Factor Ontology Molecular structures](https://reader033.vdocument.in/reader033/viewer/2022050200/5f53a62bd409cb192f792a99/html5/thumbnails/31.jpg)
Acknowledgements
Standards & support: Ana Cerdeño-Tárraga, Ana Luisa Toribio, Petra ten Hoopen, Marc Rosello, Richard Gibson, Jeena Rajan, Clara Amid
Compression technology: Vadim Zalunin, Ewan Birney, James Bonfield, Rasko Leinonen
ENA technical services: Blaise Alako, Simon Kay, Nima Pakseresht, Xin Liu, Suran Jayathilaka, Nicole Silvester, Daniel Vaughan, Neil Goodgame, Iain Cleland, Dmitriy Smirnov, Kethi Reddy, Vadim Zalunin, Rasko Leinonen
ENA - http://www.ebi.ac.uk/ena/