BioMart: A Federated Query Architecture. Arek Kasprzyk, European Bioinformatics Institute, 26 April 2004
BioMart
A Federated Query Architecture
Arek Kasprzyk, European Bioinformatics Institute, 26 April 2004
Changing Research Focus
• The increase in high-throughput technologies
• Growing sophistication of users
• Research questions involving big datasets
  – Multispecies
  – Multiexperiment
  – Multidataset
• Distributed data sources
Use cases
• Upstream sequences for all kinases upregulated in brain and associated with known diseases
• Name, chromosome position and description of all genes located on chromosome 1, expressed in lung, associated with mouse homologues, and carrying non-synonymous SNP changes
Solutions
• Bioinformatics support
  – Processing data files
  – Use third-party software
  – In-house processing
• No bioinformatics?
• One-stop shop for biological data
CORBA, SOAP
A Container ‘Revolution’
BIOMART
System Overview
Key features
• Generic
  – Universal BioMart data model
  – Query-based interface
  – No data-dependent abstractions
• Network scalability
  – Query-optimised schema
• Platform portability
  – Automatic, simple SQL
BioMart – a generic system
• Key abstractions
  – Dataset
  – Filter
  – Attribute
Use cases
• Upstream sequences for all kinases upregulated in brain and associated with known diseases
• Name, chromosome position and description of all genes located on chromosome 1, expressed in lung, associated with mouse homologues, and carrying non-synonymous SNP changes
Key Abstractions
GENE CENTRAL: gene_id (PK), gene_stable_id, gene_start, gene_chrom_end, chromosome, gene_display_id, description
(Diagram labels: Mart, Dataset, Attribute, Filter)
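To make the three abstractions concrete, here is a minimal sketch in Python. It assumes that an attribute is a retrievable column, a filter is a restrictable column plus an operator, and a dataset groups both. The class names, fields and the `hsapiens_gene` table names are hypothetical illustrations, not MartLib's actual API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Attribute:
    """Something a user can retrieve: a column in a mart table (hypothetical model)."""
    name: str
    table: str
    column: str

@dataclass(frozen=True)
class Filter:
    """Something a user can restrict on: a column plus a comparison operator."""
    name: str
    table: str
    column: str
    operator: str = "="

@dataclass
class Dataset:
    """A named collection of attributes and filters backed by mart tables."""
    name: str
    attributes: dict = field(default_factory=dict)
    filters: dict = field(default_factory=dict)

    def add_attribute(self, attr: Attribute) -> None:
        self.attributes[attr.name] = attr

    def add_filter(self, flt: Filter) -> None:
        self.filters[flt.name] = flt

# Hypothetical dataset built on a GENE CENTRAL-style table.
genes = Dataset("hsapiens_gene")
genes.add_attribute(Attribute("gene_stable_id", "hsapiens_gene__gene__main", "gene_stable_id"))
genes.add_attribute(Attribute("description", "hsapiens_gene__gene__main", "description"))
genes.add_filter(Filter("chromosome", "hsapiens_gene__gene__main", "chromosome"))
```

Because nothing in these classes refers to genes, the same three types can describe any dataset, which is the sense in which the model has no data-dependent abstractions.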
Mart Query Language (MQL)
Using = dataset
Get = attribute
Where = filter
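The using/get/where mapping can be shown with a small parser. This is a sketch under a simplified grammar that is our assumption, not the full MQL: `using <dataset> get <attr>, ... [where <cond> and ...] [as <alias>][;]`, where each condition is either `name<op>value` or `name in <alias>`. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Simplified, assumed grammar (not the complete MQL):
#   using <dataset> get <attr>[, <attr>...] [where <cond> [and <cond>...]] [as <alias>][;]
QUERY_RE = re.compile(
    r"^\s*using\s+(?P<dataset>\w+)\s+get\s+(?P<attrs>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?"
    r"(?:\s+as\s+(?P<alias>\w+))?\s*;?\s*$",
    re.IGNORECASE,
)
IN_RE = re.compile(r"^(\w+)\s+in\s+(\w+)$", re.IGNORECASE)   # filter in <named query>
CMP_RE = re.compile(r"^(\w+)\s*(<=|>=|!=|=|<|>)\s*(\S+)$")   # filter <op> value

@dataclass
class MQLQuery:
    dataset: str                                   # "using"
    attributes: List[str]                          # "get"
    filters: List[Tuple[str, str, str]] = field(default_factory=list)  # "where"
    alias: Optional[str] = None                    # "as", used for query chaining

def parse_condition(cond: str) -> Tuple[str, str, str]:
    m = IN_RE.match(cond)
    if m:
        return (m.group(1), "in", m.group(2))
    m = CMP_RE.match(cond)
    if m:
        return (m.group(1), m.group(2), m.group(3))
    raise ValueError(f"cannot parse condition: {cond!r}")

def parse_mql(text: str) -> MQLQuery:
    m = QUERY_RE.match(text)
    if m is None:
        raise ValueError(f"not a recognised MQL query: {text!r}")
    attributes = [a.strip() for a in m.group("attrs").split(",")]
    filters = []
    if m.group("where"):
        conds = re.split(r"\s+and\s+", m.group("where"), flags=re.IGNORECASE)
        filters = [parse_condition(c.strip()) for c in conds]
    return MQLQuery(m.group("dataset"), attributes, filters, m.group("alias"))

q1 = parse_mql("using Dataset1 get Attribute1 where Filter1=var1 as q;")
q2 = parse_mql("using Dataset2 get Attribute2 where Filter2=var2 and Filter3 in q")
```

The resulting `MQLQuery` holds only dataset, attribute and filter names; turning them into SQL is left to a separate compile step that consults the configuration.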
BioMart
• Schema specification
• XML-based configuration
• Admin tools
  – Configuration/building
• Data access
  – Libraries and interfaces (Perl, Java)
‘Reversed Star’ Schema
GENE CENTRAL: gene_id (PK), gene_stable_id, gene_chrom_start, gene_chrom_end, chromosome, gene_display_id, band, description, etc.
TRANSCRIPT CENTRAL: transcript_id (PK), gene_id, gene_stable_id, gene_chrom_start, gene_chrom_end, chromosome, gene_display_id, band, description, etc.
DISEASE SATELLITE: gene_id (FK), disease, omim_id, etc.
REFSEQ SATELLITE: gene_id (FK), transcript_id (FK), db_primary_id, display_id, etc.
PFAM SATELLITE: gene_id (FK), transcript_id (FK), translation_id, pfam_id, etc.
SNP SATELLITE: gene_id (FK), transcript_id (FK), snp_id, snp_external_id, snp_chrom_start, etc.
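The layout above can be tried out with Python's built-in `sqlite3` module. This minimal sketch assumes a cut-down GENE CENTRAL table and a single DISEASE SATELLITE keyed on `gene_id`. The rows are invented toy values, not real genes. It shows how a filter on the central table combined with a filter on a satellite reduces to one join on `gene_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- central table: one row per gene, denormalised for fast filtering
    CREATE TABLE gene_central (
        gene_id INTEGER PRIMARY KEY,
        gene_stable_id TEXT,
        chromosome TEXT,
        gene_chrom_start INTEGER,
        gene_chrom_end INTEGER,
        description TEXT
    );
    -- satellite table: zero or more rows per gene, pointing back via gene_id
    CREATE TABLE disease_satellite (
        gene_id INTEGER REFERENCES gene_central(gene_id),
        disease TEXT,
        omim_id INTEGER
    );
""")
# Toy rows invented for illustration only.
conn.executemany("INSERT INTO gene_central VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "GENE_A", "1", 1000, 5000, "toy kinase"),
    (2, "GENE_B", "1", 9000, 12000, "toy receptor"),
    (3, "GENE_C", "2", 3000, 4000, "toy kinase"),
])
conn.executemany("INSERT INTO disease_satellite VALUES (?, ?, ?)", [
    (1, "toy disease X", 100001),
    (3, "toy disease Y", 100002),
])

# "Genes on chromosome 1 associated with a disease": one join on gene_id,
# no chain of joins through normalised tables.
rows = conn.execute("""
    SELECT DISTINCT g.gene_stable_id, g.description
    FROM gene_central AS g
    JOIN disease_satellite AS d ON d.gene_id = g.gene_id
    WHERE g.chromosome = ?
    ORDER BY g.gene_stable_id
""", ("1",)).fetchall()
print(rows)  # [('GENE_A', 'toy kinase')]
```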
XML-based Configuration
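An XML-based configuration separates the interface (which attributes and filters are offered) from the code that builds queries. The sketch below reads such a file with the standard `xml.etree.ElementTree`. The element and attribute names (`dataset`, `attribute`, `filter`, `table`, `column`, `operator`) and the table names are hypothetical. They are not the real BioMart DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration format; element/attribute names are illustrative only.
CONFIG_XML = """
<dataset name="hsapiens_gene">
  <attribute name="gene_stable_id" table="hsapiens_gene__gene__main" column="gene_stable_id"/>
  <attribute name="description" table="hsapiens_gene__gene__main" column="description"/>
  <filter name="chromosome" table="hsapiens_gene__gene__main" column="chromosome" operator="="/>
  <filter name="disease" table="hsapiens_gene__disease__dm" column="disease"/>
</dataset>
"""

def load_config(xml_text):
    """Return (dataset name, attribute map, filter map) from the config text."""
    root = ET.fromstring(xml_text.strip())
    attributes = {
        a.get("name"): (a.get("table"), a.get("column"))
        for a in root.iter("attribute")
    }
    # A filter without an explicit operator defaults to equality.
    filters = {
        f.get("name"): (f.get("table"), f.get("column"), f.get("operator", "="))
        for f in root.iter("filter")
    }
    return root.get("name"), attributes, filters

name, attributes, filters = load_config(CONFIG_XML)
```

With this design, exposing a new filter in an interface means adding one XML element, not changing query code.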
Admin Tools
• MartEditor
  – XML editor with built-in system logic
  – Configure existing interfaces
  – Automatically create new, ‘naive’ configurations
• MartBuilder
  – Transforms source schema -> mart schema
  – A set of SQL commands (mart-build)
  – An automatic schema transformation
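The source-to-mart transformation can be illustrated as SQL run once at build time. This is a minimal sketch, assuming a toy normalised source with hypothetical `gene` and `seq_region` tables. A `CREATE TABLE ... AS SELECT` resolves the foreign key to produce a denormalised central table. It is not MartBuilder's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- toy normalised (3NF) source schema; table and column names are hypothetical
    CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        stable_id TEXT,
        seq_region_id INTEGER REFERENCES seq_region(seq_region_id),
        seq_start INTEGER,
        seq_end INTEGER
    );
    INSERT INTO seq_region VALUES (10, '1'), (20, '2');
    INSERT INTO gene VALUES (1, 'GENE_A', 10, 1000, 5000), (2, 'GENE_C', 20, 3000, 4000);

    -- transformation: resolve the foreign key once, at build time,
    -- so queries against the mart need no join to seq_region
    CREATE TABLE gene_central AS
    SELECT g.gene_id,
           g.stable_id AS gene_stable_id,
           r.name      AS chromosome,
           g.seq_start AS gene_chrom_start,
           g.seq_end   AS gene_chrom_end
    FROM gene AS g
    JOIN seq_region AS r ON r.seq_region_id = g.seq_region_id;
""")
rows = conn.execute("SELECT * FROM gene_central ORDER BY gene_id").fetchall()
print(rows)  # [(1, 'GENE_A', '1', 1000, 5000), (2, 'GENE_C', '2', 3000, 4000)]
```

Paying the join cost once when the mart is built is what lets later queries stay simple and fast.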
Deploying BioMart
Source databases -> Transformation (MartBuilder) -> Mart -> Configuration (MartEditor) -> XML
MartEditor
Data access
• Libraries and interfaces
  – MartLib (API)
  – MartView (Web)
  – MartShell (Text)
  – MartExplorer (GUI)
MartLib
(Diagram components: GUI, Engine, Filter Handler, Query Chaining, Look-up Tables, File, Query Runner (Compile, Execute), Results)
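The query runner's compile-then-execute step can be sketched as follows. The sketch assumes attribute and filter names map to `(table, column)` pairs, as an XML configuration might supply, and that satellites join to a central table on `gene_id`. All names (`ATTRIBUTES`, `FILTERS`, `compile_query`, the tables) are hypothetical, and the data is toy data.

```python
import sqlite3

# Hypothetical name -> (table, column) mappings standing in for the XML config.
ATTRIBUTES = {"gene_stable_id": ("gene_central", "gene_stable_id"),
              "disease": ("disease_satellite", "disease")}
FILTERS = {"chromosome": ("gene_central", "chromosome")}
CENTRAL = "gene_central"
JOIN_KEY = "gene_id"

def compile_query(attributes, filters):
    """Compile attribute names and {filter_name: value} into (sql, params)."""
    columns = [ATTRIBUTES[a] for a in attributes]
    where = [(FILTERS[f], v) for f, v in filters.items()]
    # Every non-central table touched by the query is joined to the central table.
    tables = {t for t, _ in columns} | {t for (t, _), _ in where}
    tables.discard(CENTRAL)
    select = ", ".join(f"{t}.{c}" for t, c in columns)
    joins = "".join(f" JOIN {t} ON {t}.{JOIN_KEY} = {CENTRAL}.{JOIN_KEY}"
                    for t in sorted(tables))
    sql = f"SELECT {select} FROM {CENTRAL}{joins}"
    if where:
        # Identifiers come from the config; user values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{t}.{c} = ?" for (t, c), _ in where)
    return sql, [v for _, v in where]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene_central (gene_id INTEGER PRIMARY KEY, gene_stable_id TEXT, chromosome TEXT);
    CREATE TABLE disease_satellite (gene_id INTEGER, disease TEXT);
    INSERT INTO gene_central VALUES (1, 'GENE_A', '1'), (2, 'GENE_B', '1'), (3, 'GENE_C', '2');
    INSERT INTO disease_satellite VALUES (1, 'toy disease X'), (3, 'toy disease Y');
""")
sql, params = compile_query(["gene_stable_id", "disease"], {"chromosome": "1"})
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('GENE_A', 'toy disease X')]
```

Because the generated SQL uses only plain SELECT, JOIN and WHERE, it stays close to what any of the supported RDBMSs will accept.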
MartView
MartShell
MartExplorer
Distributed Architecture
Query-chaining
(Diagram: Dataset 1, Dataset 2 and Dataset 3, each with its own filters (F) and attributes (A))
using Dataset1 get Attribute1 where Filter1=var1 as q;
using Dataset2 get Attribute2 where Filter2=var2 and Filter3 in q;
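The two chained queries above can be mimicked with two independent databases. In this sketch the results of the first query (`q`) become an IN-list filter on the second. It is one plausible implementation, and the real MartLib mechanism may differ, for instance for very large result sets. The table names, columns and rows are invented for illustration.

```python
import sqlite3

def make_source(script: str) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one mart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn

# Two separate toy data sources; identifiers and rows are invented.
expression_mart = make_source("""
    CREATE TABLE genes (gene_stable_id TEXT, tissue TEXT);
    INSERT INTO genes VALUES ('GENE_A', 'brain'), ('GENE_B', 'lung'), ('GENE_C', 'brain');
""")
snp_mart = make_source("""
    CREATE TABLE snps (snp_id TEXT, gene_stable_id TEXT, consequence TEXT);
    INSERT INTO snps VALUES ('SNP1', 'GENE_A', 'non-synonymous'),
                            ('SNP2', 'GENE_B', 'non-synonymous'),
                            ('SNP3', 'GENE_C', 'synonymous');
""")

# Query 1 (the "... as q" step): collect the linking attribute from source 1.
q = [row[0] for row in expression_mart.execute(
    "SELECT gene_stable_id FROM genes WHERE tissue = ?", ("brain",))]

# Query 2 (the "... and Filter3 in q" step): reuse q as an IN-list filter on source 2.
# Only the number of '?' placeholders is built dynamically; values stay bound.
if q:
    placeholders = ", ".join("?" for _ in q)
    result = snp_mart.execute(
        "SELECT snp_id FROM snps WHERE consequence = ? "
        f"AND gene_stable_id IN ({placeholders}) ORDER BY snp_id",
        ("non-synonymous", *q),
    ).fetchall()
else:
    result = []  # an empty first result means nothing can match
print(result)  # [('SNP1',)]
```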
BioMart – A Distributed Architecture
(Diagram: marts on MySQL, Oracle and PostgreSQL, XML configuration files, ANSI SQL)
BioMart – User Perspective
(Diagram: a WWW server running MartView on MartLib, and a standalone client running MartShell or MartExplorer on MartLib, both using XML configuration files)
Distributed Model Benefits
• Each group retains full control over its data source
  – Data content
  – Data updates
  – Data presentation (interface)
  – Deployment platform
  – Security
Requirements
• Mart-spec database
  – ‘Mart-compatible’ star schema
  – Table naming convention (dataset__content__type)
  – XML configuration file
• RDBMS server outside the firewall
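The dataset__content__type convention lets a tool discover a mart's structure from table names alone. Below is a minimal sketch that splits names on the double underscore and groups tables by dataset. The example names and the type labels `main` and `dm` are hypothetical assumptions, not values taken from the specification.

```python
from typing import Dict, List, NamedTuple

class MartTable(NamedTuple):
    dataset: str     # which dataset the table belongs to
    content: str     # what the table holds, e.g. gene, disease, snp
    table_type: str  # table role; "main"/"dm" below are hypothetical labels

def parse_table_name(name: str) -> MartTable:
    """Split a dataset__content__type table name into its three parts."""
    parts = name.split("__")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"{name!r} does not follow dataset__content__type")
    return MartTable(*parts)

def group_by_dataset(names: List[str]) -> Dict[str, List[MartTable]]:
    """Group a list of table names by their dataset prefix."""
    grouped: Dict[str, List[MartTable]] = {}
    for n in names:
        table = parse_table_name(n)
        grouped.setdefault(table.dataset, []).append(table)
    return grouped

tables = group_by_dataset([
    "hsapiens_gene__gene__main",
    "hsapiens_gene__disease__dm",
    "mmusculus_gene__gene__main",
])
```

Single underscores inside each part (as in `hsapiens_gene`) are unaffected, because only the double underscore acts as a separator.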
What Do You Get?
• Flexible interfaces, configurable according to your specification
• ‘Performance-assured’ data retrieval
• Query chaining across data sources
• Administrator tools for modifying and deploying the system
Future
July
• Alpha release of the BioMart suite
  – Specification
    • Schema naming convention
    • DTD for XML config
• Administration tools
  – Configure
• Data access (Perl/Java)
  – Lib
  – Interfaces
• Tested on a MySQL 4/Oracle 9i ‘mixture’
After July …
• MartBuilder
  – Automatically build marts from existing 3NF schemas with predefined PK/FK
  – Fixed-schema data transformation function
• SQL collection
  – Collaboration
    • Laboratory for Foundations of Computer Science
    • Bell Labs
BioMart – an Open Project
• All code and data freely available
  – Website
    • www.ebi.ac.uk/biomart
    • www.ebi.ac.uk/biomart/martview
  – Public MySQL server
    • martdb.ebi.ac.uk
  – FTP
    • ftp.ebi.ac.uk
• Mailing lists
  – mart-dev
  – mart-announce
Summary
• If you need …
  – Scalable and flexible search interfaces for an existing database
  – A single ‘integrated’ search interface to many in-house databases
  – To ‘connect’ your databases to other databases on the internet
• BioMart
BioMart and GMOD
• Points for discussion
  – Schema transformation for Chado
    • Populated and stable?
    • Schema transformation for current schemas of member databases?
  – Testing it in PostgreSQL?
Credits
• Damian Smedley
• Damian Keefe
• Andreas Kahari
• Craig Melsopp
• Will Spooner
• Darin London
• Katerina Tzouvara