Pathema Website Functionality Enhancements:
Pathema gets its Sexy Back
Tanja Davidsen, Ph.D.
J. Craig Venter Institute
Improved Searches using Lucene
• Improved speed and functionality of the search queries on Pathema
• Lucene is an open-source information retrieval library developed by the Apache Software Foundation
• At the core of Lucene's logical architecture is a document containing fields of text, independent of file format
• Searches run against the Lucene index instead of hitting the database, which especially helps inexact searches
• Used by Wikipedia, Monster, SourceForge, UniProt and EBI
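To illustrate the "document of fields" idea, here is a toy in-memory inverted index in Python. It is not Lucene's API (Lucene is a Java library). The `Document` and `InvertedIndex` classes and the field names are hypothetical. The sketch only shows how a query can be answered from a prebuilt index of field terms rather than by querying the database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical searchable record: named text fields, independent of source format."""
    doc_id: str
    fields: dict[str, str] = field(default_factory=dict)


class InvertedIndex:
    """Toy analogue of a search index: maps (field, term) pairs to document IDs."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, doc: Document) -> None:
        # Tokenize each field on whitespace and index lowercased terms.
        for name, text in doc.fields.items():
            for term in text.lower().split():
                self._postings[(name, term)].add(doc.doc_id)

    def search(self, field_name: str, term: str) -> set[str]:
        # Answered from the in-memory postings; no database round trip.
        return set(self._postings.get((field_name, term.lower()), set()))
```

The index is built once, when the data are loaded. After that, each lookup is a dictionary access instead of a SQL query.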
Improved Searches using Lucene
• Improves our search speed from 30+ s to 1-3 s
• Filters will let users build even more complex queries, e.g. search for all genes in organism B. anthracis whose names start with “dna” and that are assigned GO ID GO:0003677
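The example query combines three filters with a logical AND. Below is a minimal Python sketch of that composition over plain records. The `Gene` record, its attribute names, and the filter helpers are hypothetical illustrations, not Pathema's schema. In Lucene itself, such a query would be expressed in Java as a query plus filters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Gene:
    # Hypothetical gene record; attribute names are illustrative only.
    locus: str
    organism: str
    name: str
    go_ids: frozenset[str]


Filter = Callable[[Gene], bool]


def organism_is(organism: str) -> Filter:
    return lambda g: g.organism == organism


def name_starts_with(prefix: str) -> Filter:
    # Case-insensitive prefix match on the gene name.
    return lambda g: g.name.lower().startswith(prefix.lower())


def has_go(go_id: str) -> Filter:
    return lambda g: go_id in g.go_ids


def apply_filters(genes: Iterable[Gene], *filters: Filter) -> list[Gene]:
    # A gene is kept only if it passes every filter (logical AND).
    return [g for g in genes if all(f(g) for f in filters)]
```

For the slide's query, the call would be `apply_filters(genes, organism_is("Bacillus anthracis"), name_starts_with("dna"), has_go("GO:0003677"))`. Users can add more filters without changing the search code.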
GBrowse (GMOD)
• The most popular GMOD viewer
• Used to replace and/or accompany our in-house genome viewers
• Order and appearance of tracks are customizable by administrator and end user
• Supports third-party annotation using GFF formats
• Third-party feature loading
• Customizable plug-in architecture (e.g. run BLAST, find oligonucleotides, design primers)
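Third-party annotation reaches GBrowse as GFF. The Python function below shows what a single GFF3 feature line carries. It splits the nine tab-separated columns and decodes the `key=value` attribute pairs. The parser is a deliberately minimal sketch, not GBrowse's loader. It skips comment and directive lines, does not assemble multi-line features, and percent-decodes values with `urllib.parse.unquote`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass
class GffFeature:
    seqid: str
    source: str
    type: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the column is "."
    strand: str             # "+", "-", "." or "?"
    phase: Optional[int]    # None when the column is "."
    attributes: dict[str, list[str]]


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF3 feature line; return None for blank, comment or directive lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attributes: dict[str, list[str]] = {}
    for pair in attr_text.split(";"):
        if not pair or pair == ".":
            continue
        key, _, value = pair.partition("=")
        # Multiple values are comma-separated; each may be percent-encoded.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return GffFeature(
        seqid=unquote(seqid),
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )
```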
ClosTox: The Clostridium Toxin DB
• The Clostridium community is primarily interested in the toxin genes
• We created a specialty database of toxins and neurotoxin-associated proteins (NAPs) for browsing on the Clostridium site
• Data for the database were provided by the Clostridium research community
• Very successful debut at the last Botulism meeting
Sybil: Comparative Genomic Region
• Compares a reference genome to selected comparison genomes by protein clusters
• Users specify how many clustered genes a non-reference sequence region must have in common with the reference
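The "how many clustered genes in common" setting works as a threshold. The minimal Python sketch below makes two assumptions: every gene already has a protein-cluster ID, and candidate regions on the comparison sequences are already delimited. Both inputs are hypothetical here, and the slide does not describe how Sybil actually finds regions. Under those assumptions, the sketch keeps only regions that share at least `min_shared` distinct clusters with the reference region.

```python
def shared_cluster_count(reference_clusters: set[str], region_clusters: list[str]) -> int:
    # Count distinct clusters in the region that also occur in the reference region.
    return len(reference_clusters.intersection(region_clusters))


def matching_regions(
    reference_clusters: list[str],
    candidate_regions: dict[str, list[str]],
    min_shared: int,
) -> dict[str, int]:
    """Return {region_name: shared_count} for regions meeting the threshold."""
    ref = set(reference_clusters)
    result: dict[str, int] = {}
    for name, clusters in candidate_regions.items():
        count = shared_cluster_count(ref, clusters)
        if count >= min_shared:
            result[name] = count
    return result
```

Raising `min_shared` drops regions that share only one or two genes with the reference by chance. Lowering it shows more distant, fragmentary similarities.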
Sybil: Synteny gradient display
• A color-coded display of conserved synteny between two or more sequences
• Select a reference sequence (bottom of the display) with the genes color-coded from the 5’ end to the 3’ end
• Each gene in the comparison genomes is shown in the color of its ortholog in the reference genome
• As a result, one can see large- and small-scale rearrangements at a glance, as well as regions that may be inserted in one sequence relative to another
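The coloring rule can be sketched in a few lines of Python. Reference genes get a hue that changes smoothly from the 5’ end to the 3’ end, and each comparison gene inherits the hue of its reference ortholog. The hue range (red toward purple), the grey used for genes without an ortholog, and the ortholog mapping are hypothetical choices, not Sybil's actual palette.

```python
import colorsys


def gradient_colors(reference_genes: list[str]) -> dict[str, str]:
    """Map each reference gene (ordered 5' to 3') to a hex color along a hue gradient."""
    n = len(reference_genes)
    colors: dict[str, str] = {}
    for i, gene in enumerate(reference_genes):
        # Position in [0, 1]; hue runs from red (0.0) toward purple (0.8).
        pos = i / (n - 1) if n > 1 else 0.0
        r, g, b = colorsys.hsv_to_rgb(0.8 * pos, 1.0, 1.0)
        colors[gene] = "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
    return colors


def color_comparison(
    comparison_genes: list[str],
    ortholog_of: dict[str, str],
    reference_colors: dict[str, str],
    no_ortholog: str = "#808080",
) -> list[str]:
    """Color each comparison gene by its reference ortholog; grey if it has none."""
    result = []
    for gene in comparison_genes:
        ref_gene = ortholog_of.get(gene)
        result.append(reference_colors.get(ref_gene, no_ortholog) if ref_gene else no_ortholog)
    return result
```

With this scheme, an inverted segment in a comparison genome shows up as a run of colors in reverse order. A run of grey genes marks a possible insertion relative to the reference.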
New Data Types
• Virulence factors
• Epitopes
• Experimentally characterized genes/proteins
• Multidrug transporters
• Genomic islands
• Community-requested databases (ClosTox)
Acknowledgements
• PI: Granger Sutton (JCVI)
• Subcontract: Owen White (University of Maryland, Baltimore, IGS)
• Project Manager: Lauren Brinkac
JCVI Informatics Engineers and Analysts: Tanja Davidsen (manager), Scott Durkin (manager), Erin Beck, Ramana Madupu, Alex Richter, Susmita Shrivastava, Kevin Galinsky, Bob Dodson, Jay Sundaram, Derek Harkins, Seth Schobel, Lis Caler
IGS Informatics Engineers: Anu Ganapathy, YongMei Zhao, Josh Orvis, Aaron Gussman, Kevin Galens, Jonathon Crabtree