a peek inside the bioinformatics black box - dcamg symposium - mon 20 july 2015
A peek inside the bioinformatics black box
A/Prof. Torsten Seemann
Victorian Life Sciences Computation Initiative (VLSCI)
Doherty Centre for Applied Microbial Genomics (DCAMG)
The University of Melbourne
DCAMG Symposium - Melbourne, AU - Mon 20 July 2015
What data do we really have?
Isolate genome
Sequenced reads
Other isolates in sequencing run
Contamination
Unsequenced regions
What we want
Metadata
■ Genome data itself is of limited value
■ Needs “extra” information
□ location: Australia 37.8S,145.0E
□ date: 2015 2015-07-20
□ source: human 60yo male faecal swab
□ etc.
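As a rough illustration of keeping this "extra" information alongside a genome, the sketch below stores the slide's example values in a small structured record. The class name, field names and the `parse_coordinates` helper are assumptions made for this example, not an established metadata standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IsolateMetadata:
    """Hypothetical per-isolate record; field names are illustrative only."""
    country: str
    latitude: float      # decimal degrees, south is negative
    longitude: float     # decimal degrees, west is negative
    collected: date
    host: str
    host_age_years: int
    host_sex: str
    source: str


def parse_coordinates(text):
    """Convert a string such as '37.8S,145.0E' into signed decimal degrees."""
    lat_text, lon_text = (part.strip() for part in text.split(","))

    def signed(value, negative_hemisphere):
        magnitude = float(value[:-1])
        return -magnitude if value[-1].upper() == negative_hemisphere else magnitude

    return signed(lat_text, "S"), signed(lon_text, "W")


lat, lon = parse_coordinates("37.8S,145.0E")
record = IsolateMetadata(
    country="Australia",
    latitude=lat,
    longitude=lon,
    collected=date.fromisoformat("2015-07-20"),
    host="human",
    host_age_years=60,
    host_sex="male",
    source="faecal swab",
)
print(record)
```

Storing the date and coordinates as typed values rather than free text makes it easier to filter and compare isolates later.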
Compare to already assembled genomes
Reference
AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT
Reads
      ATTAGCTTAGATTGTAG
           CTTAGATTGTAGC-C
    TGATTAGCTTAGATTGTAGC-CTATAT
        TAGCTTAGATTGTAGC-CTATATT
             TAGATTGTAGC-CTATATTA
             TAGATTGTAGC-CTATATTAT
SNP: reads carry A where the reference has C
Deletion: reads lack a G present in the reference, shown as "-"
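The read pile-up on this slide can be turned into variant calls with a simple majority vote per reference position. The sketch below is a toy version of that idea, not what a production aligner and variant caller does. The 0-based offsets were inferred by matching each read against the reference string, and `call_variants` and `min_depth` are names invented for this example.

```python
from collections import Counter

REFERENCE = "AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT"

# (offset, read) pairs; offsets are 0-based and were inferred for this sketch.
# A "-" inside a read marks a base that is deleted relative to the reference.
ALIGNED_READS = [
    (6, "ATTAGCTTAGATTGTAG"),
    (11, "CTTAGATTGTAGC-C"),
    (4, "TGATTAGCTTAGATTGTAGC-CTATAT"),
    (8, "TAGCTTAGATTGTAGC-CTATATT"),
    (13, "TAGATTGTAGC-CTATATTA"),
    (13, "TAGATTGTAGC-CTATATTAT"),
]


def call_variants(reference, aligned_reads, min_depth=2):
    """Report positions where the majority read base differs from the reference."""
    # One counter of observed bases per reference position.
    pileup = [Counter() for _ in reference]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue  # too little evidence to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[pos]:
            kind = "deletion" if base == "-" else "SNP"
            # Positions are reported 1-based, as is usual for genome coordinates.
            variants.append((pos + 1, reference[pos], base, kind, support, depth))
    return variants


for variant in call_variants(REFERENCE, ALIGNED_READS):
    print(variant)
# (17, 'C', 'A', 'SNP', 6, 6)
# (25, 'G', '-', 'deletion', 5, 5)
```

Every read covering each site agrees here, which is why the calls look clean. Real data adds sequencing errors, ambiguous mappings and insertions, all of which this sketch ignores.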
Best practice
■ Use both approaches
□ reference-based + de novo
■ Best of both worlds
□ and worst of both worlds - interpretation is non-trivial
■ Still need
□ good epidemiology, metadata and domain knowledge!
Inferring transmission
■ Identical sequence does not imply transmission
■ Easier to rule transmission out than to rule it in
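One common way to make this asymmetry concrete is a pairwise SNP distance between aligned isolate sequences: a large distance argues against recent transmission, while a distance of zero is merely consistent with it. The sketch below assumes the sequences are already aligned to equal length. The isolate names and sequences are invented for illustration, and no distance threshold is implied by the talk.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"  # skip gaps and ambiguous calls such as N
    )


def pairwise_distances(isolates):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (a, b): snp_distance(isolates[a], isolates[b])
        for a, b in combinations(sorted(isolates), 2)
    }


# Hypothetical aligned core-genome fragments, made up for this example.
isolates = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAC",
    "isolate_C": "ACGAACGTTC",
}

for pair, distance in pairwise_distances(isolates).items():
    print(pair, distance)
# ('isolate_A', 'isolate_B') 0
# ('isolate_A', 'isolate_C') 2
# ('isolate_B', 'isolate_C') 2
```

The zero between isolate_A and isolate_B does not establish a link on its own; that still depends on the epidemiology and metadata.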
The future
■ Genomics is delivering on the promise
□ still not maximally exploited
■ Directions
□ more use of pan-genome
□ understanding recombination / horizontal transfer
□ dynamics of microevolution
□ useful visualization of large data sets
□ open science: data sharing, open source software