ensmart: a generic system for fast and flexible access to biological data arek kasprzyk et al (2004)...
![Page 1: EnsMart: A Generic System for Fast and Flexible Access to Biological Data Arek Kasprzyk et al (2004) 14:160-169, Genome research EBI, Wellcome Trust](https://reader035.vdocument.in/reader035/viewer/2022062722/56649f395503460f94c56aca/html5/thumbnails/1.jpg)
EnsMart: A Generic System for Fast and Flexible Access to Biological Data
Arek Kasprzyk et al. (2004), Genome Research 14:160-169
EBI, Wellcome Trust
Objectives
• Understand the idea of a “Data Mart”
• Understand why this idea is useful to biology
• Have an idea of how EnsMart works
• Assess the significance of the EnsMart system. Will it last?
Data Mart defined
• A database, potentially derived from many other databases, whose primary purpose is query processing and report generation for non-technical users.
• Similar to a “Data Warehouse”
• Marts and warehouses are important components of “decision support systems” in business.
Data Mart in EnsMart
• Data collected
• Standardized
• Query Optimized
• Presented to Users
Marts – benefits
• Allows good division of labor
– Computers for transactions separate from computers for queries
– Interface development separate from database development
– Biologists (can be) separated from computer scientists as a result of good interface design
– Produces a faster, more stable system for users
Costs
• Construction of the Mart is a challenging and continuous process.
• New sources of data need to be incorporated and validated constantly.
• Trust
The case for EnsMart, why now?
• Growing number of different databases and opportunities: genomes, expression, protein, disease…
• Assembled, high-quality genomes available.
– “Finished” genomes can be used as references to link data from different databases consistently.
• EnsMart built to take advantage of the opportunities for cross-database queries.
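The idea of using a finished genome as a shared reference can be illustrated with a small sketch. This is not EnsMart code: the `Gene` and `Snp` records, their field names, and `link_snps_to_genes` are hypothetical, and the sketch only assumes that two independent sources both report positions on the same assembly, so that records can be linked by coordinate overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record from one source database."""
    gene_id: str
    chrom: str
    start: int  # inclusive, in reference-assembly coordinates
    end: int    # inclusive


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record from a different source database."""
    snp_id: str
    chrom: str
    pos: int


def link_snps_to_genes(genes, snps):
    """Return (snp_id, gene_id) pairs where the SNP falls inside the gene.

    The only thing the two sources share is the reference coordinate
    system, which is what makes the link possible.
    """
    # Group genes by chromosome so each SNP is compared only with
    # genes on its own chromosome.
    genes_by_chrom = {}
    for gene in genes:
        genes_by_chrom.setdefault(gene.chrom, []).append(gene)

    links = []
    for snp in snps:
        for gene in genes_by_chrom.get(snp.chrom, []):
            if gene.start <= snp.pos <= gene.end:
                links.append((snp.snp_id, gene.gene_id))
    return links


if __name__ == "__main__":
    genes = [Gene("geneA", "1", 100, 200), Gene("geneB", "2", 100, 200)]
    snps = [Snp("snp1", "1", 150), Snp("snp2", "1", 250), Snp("snp3", "2", 100)]
    print(link_snps_to_genes(genes, snps))
```

A linear scan per chromosome is enough to show the principle; a real system would more likely use an indexed range query or an interval tree.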
Inside EnsMart
• 9 organisms
• At least 17 different primary sources of data, many with multiple databases.
• 2 kinds of “Foci”
– Genes: Ensembl, EST, Vega
– SNPs
EnsMart schema
[Figure: schema diagram centered on “Focus 1”, with links to surrounding tables labeled “One” or “Many”]
EnsMart schema: another focus
Schema -> Query Speed
• “Central” tables, or foci, contain a binary value for each satellite table indicating whether matching data exist there. The first step in query generation uses these values to limit the range of satellite tables accessed.
• These values are useful only in the query process (they take extra space and time for transactions).
• As a result, many queries may not require access to satellite tables at all.
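The existence-flag idea can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not the actual EnsMart schema: the table names `gene_main` and `snp_sat`, the column `has_snp`, and all rows are invented for the example.

```python
import sqlite3


def build_demo_mart():
    """Create a toy central table plus one satellite table (names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene_main (
            gene_id TEXT PRIMARY KEY,
            chrom   TEXT,
            has_snp INTEGER DEFAULT 0  -- 1 if snp_sat holds rows for this gene
        );
        CREATE TABLE snp_sat (gene_id TEXT, snp_id TEXT);
    """)
    conn.executemany(
        "INSERT INTO gene_main (gene_id, chrom) VALUES (?, ?)",
        [("G1", "1"), ("G2", "1"), ("G3", "2")],
    )
    conn.executemany(
        "INSERT INTO snp_sat VALUES (?, ?)",
        [("G1", "rs1"), ("G1", "rs2"), ("G3", "rs3")],
    )
    # Precompute the flag once, when the mart is built. This is the extra
    # space and build-time cost that pays off at query time.
    conn.execute("""
        UPDATE gene_main
        SET has_snp = EXISTS (
            SELECT 1 FROM snp_sat s WHERE s.gene_id = gene_main.gene_id
        )
    """)
    return conn


def genes_with_snps(conn, chrom):
    """Answer 'which genes have SNPs?' from the central table alone."""
    rows = conn.execute(
        "SELECT gene_id FROM gene_main "
        "WHERE chrom = ? AND has_snp = 1 ORDER BY gene_id",
        (chrom,),
    )
    return [row[0] for row in rows]


def snps_for_genes(conn, chrom):
    """Join to the satellite table only for genes the flag says have data."""
    rows = conn.execute(
        "SELECT g.gene_id, s.snp_id FROM gene_main g "
        "JOIN snp_sat s ON s.gene_id = g.gene_id "
        "WHERE g.chrom = ? AND g.has_snp = 1 "
        "ORDER BY g.gene_id, s.snp_id",
        (chrom,),
    )
    return list(rows)


if __name__ == "__main__":
    mart = build_demo_mart()
    print(genes_with_snps(mart, "1"))
    print(snps_for_genes(mart, "2"))
```

`genes_with_snps` never touches `snp_sat`, which mirrors the point that many queries can be answered from the central table alone.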
User Interfaces
• Supposedly Confucian quote
– "What I hear I forget.
– What I see I remember.
– What I do I understand."
User Interfaces
• MartView: website, “wizard” query construction.
• MartExplorer: stand-alone tool, tree-based query construction.
• MartShell: text-based application that uses an SQL-like query language. Can be used interactively or in batch processes.
• Write your own, using the MartLib Java library!
MartView 1: Choose organism and focus
MartView 2: Design query
MartView 3: Specify output
MartExplorer
MartShell
Conclusions
• Powerful query system for biologists.
• Useful framework for software engineers.
– All open source!
• What about other loci such as repetitive elements?
• Data validation?
• Annotation updates?
EnsMart Discussion
• What, if any, are the problems with the foci system?
• What alternatives to this system exist?
• Describe a task that EnsMart could be used to accomplish.
• Describe any personal experiences with EnsMart.