![Page 1: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/1.jpg)
2009 GMOD Meeting
Dhileep Sivam & Isabelle Phan
Seattle Biomedical Research Institute
![Page 2: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/2.jpg)
Seattle Biomedical Research Institute (SBRI)
• Founded in 1976
• About 250 full-time staff
• Focus on infectious disease
• 13 Labs
• Strong ties to the University of Washington
• Bioinformatics Core
![Page 3: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/3.jpg)
How we first came to use Chado
[Diagram labels: LmjF Probe Set, LinJ Probe Set; genome versions LmjF V4.0, LmjF V5.2, LinJ V2.0, LinJ V3.0, LinJ V4.0; three Mapping steps, each producing a Result Set]
![Page 4: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/4.jpg)
Microarray Project
Chado
Nimblegen Data
Parsers
Analysis Tools
Normalization
Scaling
Feature-level aggregation
Remapping
Visualization
![Page 5: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/5.jpg)
Use Case: SSGCID
Seattle Structural Genomics Center for Infectious Disease
Vaccine Targets!
Gene Cloning & Expression
Protein Crystallization
Structure Determination
Bioinformatic Screening
Project Aim
3D Protein Structure
NIAID Emerging and re-emerging priority pathogens
Structures will serve as a starting point for drug development
Multi-center
![Page 6: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/6.jpg)
SSGCID
Vaccine Targets!
Gene Cloning & Expression
Protein Crystallization
Structure Determination
Bioinformatic Screening
![Page 7: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/7.jpg)
SSGCID
Chado
External Sequence Resources
BLAST Screening
Export
Parsers
Bulk Loader
![Page 8: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/8.jpg)
Things that have come up…
Complexity of querying BLAST results
Gene Models
Complexity of querying microarray data
Materialized Views
Simplest Possible Model
“Grouping of Genes” DBXrefs
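One way to read the "Materialized Views" item: a BLAST hit stored in a Chado-style layout is spread over several tables, so a precomputed flat table can make routine queries much simpler. The sketch below is illustrative only and is not the presenters' implementation. It uses SQLite from the Python standard library, a heavily simplified, hypothetical subset of Chado-like tables (`feature`, `featureloc`, `analysisfeature`, with a plain `type` text column instead of a controlled-vocabulary lookup), and emulates a materialized view with `CREATE TABLE ... AS SELECT`, since SQLite has no native materialized views. The table name `blast_hit_summary` and the sample feature names are made up.

```python
import sqlite3

# Toy schema: a simplified, hypothetical subset of Chado-like tables.
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER, rank INTEGER);
CREATE TABLE analysisfeature (feature_id INTEGER, significance REAL);
"""

# Emulated materialized view: one flat row per hit.
# In this sketch, rank 0 locates the hit on the query and rank 1 on the subject.
SUMMARY_SQL = """
DROP TABLE IF EXISTS blast_hit_summary;
CREATE TABLE blast_hit_summary AS
SELECT q.uniquename    AS query_name,
       s.uniquename    AS subject_name,
       af.significance AS evalue
FROM feature m
JOIN analysisfeature af ON af.feature_id = m.feature_id
JOIN featureloc ql ON ql.feature_id = m.feature_id AND ql.rank = 0
JOIN featureloc sl ON sl.feature_id = m.feature_id AND sl.rank = 1
JOIN feature q ON q.feature_id = ql.srcfeature_id
JOIN feature s ON s.feature_id = sl.srcfeature_id
WHERE m.type = 'match';
"""


def refresh_blast_summary(conn):
    """Rebuild the flat summary table from the underlying tables."""
    conn.executescript(SUMMARY_SQL)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [
            (1, "query_protein_A", "polypeptide"),
            (2, "subject_protein_B", "polypeptide"),
            (3, "hit_A_vs_B", "match"),
        ],
    )
    conn.executemany(
        "INSERT INTO featureloc VALUES (?, ?, ?)",
        [(3, 1, 0), (3, 2, 1)],
    )
    conn.execute("INSERT INTO analysisfeature VALUES (?, ?)", (3, 1e-30))
    refresh_blast_summary(conn)
    # Consumers now query one table instead of repeating the five-way join.
    return conn.execute(
        "SELECT query_name, subject_name, evalue FROM blast_hit_summary"
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

The trade-off is the usual one for materialized views: reads become simple, but the summary has to be rebuilt (here by calling `refresh_blast_summary` again) whenever new results are loaded.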
![Page 9: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/9.jpg)
warehouse
Proteomics
Microarray
Structural genomics
Data access
curation
Automated analysis pipeline
Sequence data management at SBRI
![Page 10: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/10.jpg)
Chado + GUS: why do we need both?
• Chado
  – Collaboration with IGS
  – Annotation tools: Manatee (Apollo), Ergatis
    • Internal data production
• GUS
  – Collaboration with UPenn
  – Web front end
    • External data access
![Page 11: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/11.jpg)
Chado
Proteomics
Microarray
Structural genomics
Manatee: manual annotation
Ergatis: analysis pipeline
Sequence data management at SBRI
GUS
GUS WDK
![Page 12: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/12.jpg)
Chado2GUS: Lost in translation
• Chado
  – Denormalized schema
    • Polymorphism
  – MySQL (IGS Chado)
• GUS
  – Normalized schema
    • Subclassing
  – Postgres port from Oracle
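The contrast between "polymorphism" and "subclassing" above can be sketched in a few lines. This is an illustration under assumptions, not the actual Chado2GUS code: it assumes that on the Chado side a feature's kind is a value stored with the row (in Chado, resolved through a controlled-vocabulary term), while on the GUS side each kind corresponds to its own subclass. The class names, the `SUBCLASS_FOR_TYPE` mapping, and the sample feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChadoFeatureRow:
    """Polymorphic style: one generic record, the kind is just data."""
    uniquename: str
    type_name: str  # stands in for a controlled-vocabulary term name


@dataclass
class GeneFeature:
    """Subclass style: one class (or view) per kind of feature."""
    name: str


@dataclass
class ExonFeature:
    name: str


# Hypothetical translation table; every supported type needs an explicit entry.
SUBCLASS_FOR_TYPE = {
    "gene": GeneFeature,
    "exon": ExonFeature,
}


def translate(row):
    """Map a generic, typed row onto the matching subclass instance."""
    cls = SUBCLASS_FOR_TYPE.get(row.type_name)
    if cls is None:
        # A type with no counterpart subclass cannot be carried over as-is.
        raise ValueError(f"no subclass for feature type {row.type_name!r}")
    return cls(name=row.uniquename)


if __name__ == "__main__":
    print(translate(ChadoFeatureRow("feature_1", "gene")))
    try:
        translate(ChadoFeatureRow("feature_2", "pseudogene"))
    except ValueError as err:
        print("lost in translation:", err)
```

In the polymorphic style a new feature type is only a new value, whereas the subclass style needs a new class and a new mapping entry first, which is one place a translation layer can lose data.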
![Page 13: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/13.jpg)
Picking the best of two worlds
• Chado
  – Biological data model
  – Flexibility
• GUS
  – Software engineering
  – Flexibility
![Page 14: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/14.jpg)
The future?
• SQL-free data production
  – Instead of custom wrappers over raw SQL:
    • ORMs: Chado Hibernate, ActiveRecord
    • Unified object model
• RDBMS-free data mining
  – Instead of GUS predefined query + set combination:
    • BioMart + Galaxy
    • RDF + triple store + SPARQL (object store + Lucene)
![Page 15: 2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute](https://reader035.vdocument.in/reader035/viewer/2022070401/56649f1b5503460f94c307e0/html5/thumbnails/15.jpg)