![Page 1: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/1.jpg)
BioMart Query Network
Arek Kasprzyk
European Bioinformatics Institute
8 January 2005
![Page 2: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/2.jpg)
Biological databases
• Distributed
• Different format
• Different focus
• Different release schedule
• Scalability factor
![Page 3: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/3.jpg)
![Page 4: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/4.jpg)
BioMart
![Page 5: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/5.jpg)
[Diagram: BioMart architecture]
• Retrieval: MartExplorer, MartShell, MartView; BioMart API (JAVA, Perl)
• Configuration: MartEditor, XML
• Schema transformation: MartBuilder
• Databases: myDatabase, myMart
• Public data (local or remote): SNP, Vega, Ensembl, UniProt, MSD
![Page 6: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/6.jpg)
MartView
![Page 7: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/7.jpg)
BioMart@Ensembl
![Page 8: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/8.jpg)
MartShell
![Page 9: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/9.jpg)
MartExplorer
![Page 10: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/10.jpg)
Database
![Page 11: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/11.jpg)
Schema
[Diagram: tables connected through primary keys (PK) and foreign keys (FK)]
![Page 12: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/12.jpg)
Schema
[Diagram: tables connected through primary keys (PK) and foreign keys (FK)]
![Page 13: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/13.jpg)
Schema
[Diagram: tables connected through primary keys (PK) and foreign keys (FK)]
![Page 14: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/14.jpg)
Schema - ‘reversed star’
[Diagram: main tables (main1, main2) with keys PK1 and PK2, and dimension tables (dm) that reference them through FK1 and FK2]
![Page 15: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/15.jpg)
Fixed schema transformation
[Diagram labels: A, B, C, TA, TB]
![Page 16: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/16.jpg)
Schema transformation
• Central table
  – Longest n:1, 1:1 path
• Dimension table
  – Central transformation ‘around’ 1:n table.
  – Link tables are decomposed into a set of 1:n first
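The split between central and dimension tables can be pictured with a small Python sketch. It is a one-level simplification that only looks at tables directly related to the central one, not the full longest-path search named on the slide. The cardinality codes are the ones shown in the MartBuilder prompts (11, n1, 1n, 0n, S); treating 0n like 1n is an assumption, and the function name is hypothetical.

```python
# Cardinality codes as they appear in the MartBuilder prompts:
# "11" (1:1), "n1" (n:1), "1n" (1:n), "0n" (0:n), "S" (skip).
MERGE_INTO_MAIN = {"11", "n1"}
# Assumption: 0:n relations are handled like 1:n relations.
BECOMES_DIMENSION = {"1n", "0n"}


def plan_transformation(central, relations):
    """Sort the tables related to `central` into three groups.

    `relations` maps a related table name to its cardinality code as seen
    from the central table. Only direct relations are considered.
    """
    plan = {"main": [central], "dimensions": [], "skipped": []}
    for table, cardinality in relations.items():
        if cardinality in MERGE_INTO_MAIN:
            plan["main"].append(table)        # joined into the central table
        elif cardinality in BECOMES_DIMENSION:
            plan["dimensions"].append(table)  # gets its own dimension table
        else:
            plan["skipped"].append(table)
    return plan


plan = plan_transformation(
    "gene",
    {"gene_description": "11", "analysis": "n1", "transcript": "1n", "dna": "S"},
)
print(plan)
```

The cardinalities in the usage example are illustrative and are not taken from a real schema.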
![Page 17: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/17.jpg)
MartBuilder
• Input
  – central object
  – database meta data
  – cardinalities
• Output
  – Set of SQL statements:
    • “create table as select …”
• Transformations
  – represented as asymmetric tree
![Page 18: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/18.jpg)
MartBuilder
DATASET: hsapiens_gene_ensembl
TYPE MAIN [M] DIMENSION [D] EXIT [E]: M
TABLE NAME: gene
gene: alt_allele cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: gene cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: gene_description cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
gene: gene_stable_id cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
gene: kk__gene__main cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: transcript cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: analysis cardinality [11] [n1] [0n] [1n] [SKIP S]: n1
gene: dna cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: dnac cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: seq_region cardinality [11] [n1] [0n] [1n] [SKIP S]: S
TYPE MAIN [M] DIMENSION [D] EXIT [E]: E
ADD EXTENSION: hsapiens_gene_ensembl__gene__MAIN [Y|N]: N
CHANGE FINAL TABLE NAME: hsapiens_gene_ensembl__gene__MAIN TO:
CREATE TABLE TEMP0 as SELECT
  gene.gene_id, gene.type, gene.analysis_id, gene.seq_region_id,
  gene.seq_region_start, gene.seq_region_end, gene.seq_region_strand,
  gene.display_xref_id,
  gene_description.gene_id AS gene_id_TEMP0, gene_description.description
FROM gene, gene_description
WHERE gene_description.gene_id = gene.gene_id;

CREATE TABLE hsapiens_gene_ensembl__gene__MAIN as SELECT
  TEMP0.gene_id, TEMP0.type, TEMP0.analysis_id, TEMP0.seq_region_id,
  TEMP0.seq_region_start, TEMP0.seq_region_end, TEMP0.seq_region_strand,
  TEMP0.display_xref_id, TEMP0.gene_id_TEMP0, TEMP0.description,
  gene_stable_id.gene_id AS gene_id_TEMP1, gene_stable_id.stable_id,
  gene_stable_id.version
FROM TEMP0, gene_stable_id
WHERE gene_stable_id.gene_id = TEMP0.gene_id;

drop table TEMP0;
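The chained "create table as select" pattern in the output above can be tried on toy data. The Python sketch below is not MartBuilder itself: `merge_statements` is a hypothetical helper, the rows are invented, SQLite stands in for MySQL or Oracle, and the generated SQL uses `source.*` instead of listing and aliasing every column as the slide does.

```python
import sqlite3


def merge_statements(central, key, joined_tables, final_name):
    """Build a chain of CREATE TABLE ... AS SELECT statements.

    Each step joins one more table onto the previous result. Intermediate
    results are named TEMP0, TEMP1, ... and are dropped at the end.
    `joined_tables` is a list of (table, [columns to copy]) pairs.
    """
    statements, temps, source = [], [], central
    for i, (table, columns) in enumerate(joined_tables):
        last = i == len(joined_tables) - 1
        target = final_name if last else f"TEMP{i}"
        extra = ", ".join(f"{table}.{column}" for column in columns)
        statements.append(
            f"CREATE TABLE {target} AS SELECT {source}.*, {extra} "
            f"FROM {source}, {table} WHERE {table}.{key} = {source}.{key}"
        )
        if not last:
            temps.append(target)
        source = target
    statements += [f"DROP TABLE {temp}" for temp in temps]
    return statements


# Toy source tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER, type TEXT);
    CREATE TABLE gene_description (gene_id INTEGER, description TEXT);
    CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT, version INTEGER);
    INSERT INTO gene VALUES (1, 'protein_coding');
    INSERT INTO gene_description VALUES (1, 'toy description');
    INSERT INTO gene_stable_id VALUES (1, 'TOY0001', 1);
""")

statements = merge_statements(
    "gene",
    "gene_id",
    [("gene_description", ["description"]),
     ("gene_stable_id", ["stable_id", "version"])],
    "hsapiens_gene_ensembl__gene__MAIN",
)
for sql in statements:
    conn.execute(sql)

rows = conn.execute(
    "SELECT gene_id, description, stable_id "
    "FROM hsapiens_gene_ensembl__gene__MAIN"
).fetchall()
print(rows)
```

As in the SQL on the slide, these are inner joins, so a gene without a matching row in a joined table would drop out of the final table.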
![Page 19: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/19.jpg)
Transformation configuration
satellog_repeats M repeats disease n1
satellog_repeats M repeats gc 11
satellog_repeats M repeats linkage_depth S
satellog_repeats M repeats repeats S
satellog_repeats M repeats transcripts S
satellog_repeats M repeats ugcount S
satellog_repeats M repeats ugstats S
satellog_repeats M repeats rep_class n1
satellog_repeats D ugcount ugcount S
satellog_repeats D ugcount ugstats S
satellog_repeats D ugcount gc S
satellog_repeats D ugcount repeats n1r
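A configuration in this shape is easy to read back into records. The Python sketch below assumes one rule per line and guesses the meaning of the five fields from the MartBuilder prompts (dataset, table type M or D, central table, related table, cardinality). The field names and `parse_rules` are hypothetical and not part of BioMart.

```python
from collections import namedtuple

# Assumed meaning of the five whitespace-separated fields.
Rule = namedtuple(
    "Rule", "dataset table_type central_table related_table cardinality"
)


def parse_rules(text):
    """Parse transformation rules, one per line, skipping blank lines."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        rules.append(Rule(*fields))
    return rules


config = """
satellog_repeats M repeats disease n1
satellog_repeats M repeats gc 11
satellog_repeats D ugcount ugstats S
"""
rules = parse_rules(config)

# Tables that would be merged into the main table of the dataset.
merged = [r.related_table for r in rules
          if r.table_type == "M" and r.cardinality in ("11", "n1")]
print(merged)
```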
![Page 20: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/20.jpg)
Data access
![Page 21: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/21.jpg)
Dataset – Key Abstraction
• Dataset
  – Organised into a single schema
  – BioMart database contains one or more dataset(s)
  – Attribute
  – Filter
  – Exportable/Importable (Links)
• Dataset - an equivalent of relational table
  – Exportable/Importable = PK/FK
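One way to picture the dataset abstraction is as an object that maps attribute and filter names onto columns and turns a request into SQL. The Python sketch below is an illustration only and is not the BioMart API. The class layout, the filter names and the column names are all hypothetical.

```python
class Dataset:
    """A toy dataset: named attributes and filters over one main table."""

    def __init__(self, name, main_table, attributes, filters):
        self.name = name
        self.main_table = main_table
        self.attributes = attributes  # attribute name -> column
        self.filters = filters        # filter name -> (column, operator)

    def compile_query(self, attribute_names, filter_values):
        """Return (sql, params) for the requested attributes and filters."""
        columns = ", ".join(self.attributes[a] for a in attribute_names)
        sql = f"SELECT {columns} FROM {self.main_table}"
        clauses, params = [], []
        for name, value in filter_values.items():
            column, operator = self.filters[name]
            clauses.append(f"{column} {operator} ?")  # value is bound later
            params.append(value)
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params


genes = Dataset(
    "hsapiens_gene_ensembl",
    "hsapiens_gene_ensembl__gene__main",
    attributes={"stable_id": "stable_id", "description": "description"},
    filters={"chromosome": ("chromosome", "="),
             "start_after": ("gene_start", ">")},
)
sql, params = genes.compile_query(
    ["stable_id", "description"], {"chromosome": "7", "start_after": 1000}
)
print(sql, params)
```

Keeping values out of the SQL string and passing them as parameters is a design choice of this sketch, not something the slides specify.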
![Page 22: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/22.jpg)
Key Abstractions
[Diagram: the GENE CENTRAL table annotated with the labels Mart, Dataset, Attribute and Filter]
GENE CENTRAL columns: gene_id (PK), gene_stable_id, gene_start, gene_chrom_end, chromosome, gene_display_id, description
![Page 23: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/23.jpg)
Exportables, Importables and Links
• Exportable = ordered list of attributes
• Importable = ordered list of filters
  – WHERE filt1=value1
  – WHERE filt1=value1 or filt1=value2
  – WHERE filt1>value1 and filt2<value2
• Links = matching importable and exportable
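A link can be sketched as follows: values read through one dataset's exportable become filter values for the matching importable of another dataset. The Python sketch below assumes that exportables and importables are matched by name and by number of fields, which the slides do not state. All names in it are hypothetical.

```python
def find_links(exportables, importables):
    """Names defined on both sides with the same number of fields.

    Assumption: an exportable matches an importable of the same name.
    """
    return sorted(
        name for name, attributes in exportables.items()
        if name in importables and len(importables[name]) == len(attributes)
    )


def importable_where(filter_columns, exported_rows):
    """Turn exported rows into a WHERE clause over the importable's filters.

    Each exported row is an ordered tuple whose values line up with
    `filter_columns`. Rows are combined with OR, fields within a row with AND.
    """
    if not exported_rows:
        return "WHERE 1 = 0", []  # nothing exported, so nothing can match
    row_clause = "(" + " AND ".join(f"{c} = ?" for c in filter_columns) + ")"
    where = "WHERE " + " OR ".join(row_clause for _ in exported_rows)
    params = [value for row in exported_rows for value in row]
    return where, params


exportables = {"gene_link": ["gene_stable_id"]}
importables = {"gene_link": ["gene_stable_id"], "other_link": ["a", "b"]}
links = find_links(exportables, importables)

where, params = importable_where(["gene_stable_id"], [("G1",), ("G2",)])
print(links, where, params)
```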
![Page 24: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/24.jpg)
MartView
![Page 25: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/25.jpg)
Dataset Configuration
• Dataset configuration
• Attributes
• Filters
• Trees, Groups, Collections
• Links
• Semantics
• Relational mapping
• User interface
• Linking datasets
• XML-based
![Page 26: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/26.jpg)
Dataset Configuration
[Diagram: three elements, each labelled XML]
![Page 27: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/27.jpg)
Table naming convention
Naïve configuration
• Tables
  – Meta tables: meta_content
  – Data tables: dataset__content__type
• Data tables
  – Main: __main
  – Dimension: __dm
• Columns
  – Key: _key
  – Boolean filter: _bool
  – List filter: _list
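Because the convention is purely name-based, a first-pass ("naïve") configuration can in principle be derived from table and column names alone. The Python sketch below shows the idea. The function names are hypothetical, and treating every unsuffixed column as a plain attribute is an assumption.

```python
def parse_table_name(name):
    """Split 'dataset__content__type' into its parts.

    Returns None for names that do not follow the data-table pattern,
    such as meta tables.
    """
    parts = name.split("__")
    if len(parts) != 3:
        return None
    dataset, content, table_type = parts
    table_type = table_type.lower()
    if table_type not in ("main", "dm"):
        return None
    return {"dataset": dataset, "content": content, "type": table_type}


def classify_column(column):
    """Classify a column by its suffix."""
    if column.endswith("_key"):
        return "key"
    if column.endswith("_bool"):
        return "boolean filter"
    if column.endswith("_list"):
        return "list filter"
    return "attribute"  # assumption: anything else is a plain attribute


print(parse_table_name("hsapiens_gene_ensembl__gene__MAIN"))
print(parse_table_name("meta_content"))
print([classify_column(c) for c in ("gene_id_key", "disease_bool", "description")])
```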
![Page 28: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/28.jpg)
MartEditor
![Page 29: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/29.jpg)
MartEditor
• Naïve configuration
• Updates
• Links
• Automatic discovery of new tables
![Page 30: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/30.jpg)
Class diagram - configuration
![Page 31: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/31.jpg)
Class diagram - querying
![Page 32: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/32.jpg)
Information flow
• Read connections
• Register individual datasets and create linked datasets
• Get input from the user, split queries to individual datasets.
• Find the shortest path between datasets (Dijkstra)
• Compile SQL
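The shortest-path step can be illustrated with a compact version of Dijkstra's algorithm over a graph whose nodes are datasets and whose edges are links. In the Python sketch below the graph, the unit edge weights and the function name are assumptions made for illustration. The slides do not say which datasets are linked or how links are weighted.

```python
import heapq


def shortest_path(links, start, goal):
    """Dijkstra's algorithm over a graph of linked datasets.

    `links` maps a dataset name to {neighbour: cost}. Returns the list of
    datasets from `start` to `goal`, or None if they are not connected.
    """
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in links.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return None


# Hypothetical link graph with unit weights.
links = {
    "vega": {"ensembl": 1},
    "ensembl": {"vega": 1, "uniprot": 1},
    "uniprot": {"ensembl": 1, "msd": 1},
    "msd": {"uniprot": 1},
}
print(shortest_path(links, "vega", "msd"))
```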
![Page 33: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/33.jpg)
Summary
![Page 34: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/34.jpg)
BioMart
• Domain independent
• Platform independent
  – MySQL 4
  – Oracle 9i
• Plugin architecture
![Page 35: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/35.jpg)
BioMart model
• Already applied
  – Ensembl
  – Vega
  – dbSNP
  – Uniprot
  – MSD
  – Variety of small projects
• In development
  – ArrayExpress
  – Wormbase
  – RGD
![Page 36: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/36.jpg)
Future work
• BioMart v 0.2 to be released later in January
• Java library to be upgraded over coming months to the new architecture
• BioMart has been integrated with Taverna
• MartBuilder - to be properly implemented
![Page 37: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/37.jpg)
BioMart
• www.ebi.ac.uk/biomart
• Open source (LGPL)
• Public MySQL server
• ftp
• [email protected]
• [email protected]
![Page 38: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/38.jpg)
Acknowledgments
• BioMart
  – Damian Smedley
  – Darin London
• Contributors
  – Arne Stabenau (Ensembl)
  – Andreas Kahari (Ensembl)
  – Craig Melsopp (Ensembl)
  – Katerina Tzouvara (Uniprot)
  – Paul Donlon (Unilever)
  – Will Spooner (CSHL)
![Page 39: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/39.jpg)
![Page 40: BioMart Query Network](https://reader034.vdocument.in/reader034/viewer/2022051401/56814fbb550346895dbd7546/html5/thumbnails/40.jpg)