fabrizia bignami
TRANSCRIPT
Database harmonisation and IT infrastructure
Jan-Eric Litton, Karolinska Institutet, Stockholm, Sweden
16 September 2009
Preparatory Phase Grant Agreement 212111
Connecting Biobanks - leading design principles
The seven leading design principles behind the presented scenarios, use cases and software architecture are:
• Assuring confidentiality of donors
• Following a user-centered approach and providing maximum support for researchers
• Use of up-to-date Web technologies
• Flexibility in terms of biobank content and schema handling
• Extensibility in terms of additional participants and adding new data and information
• Efficiency in terms of query processing
• Low effort for biobanks to participate in the federation
We are using a hub-and-spoke structure with federated databases.
HUB - In general, a hub is the central part of a wheel where the spokes come together. The term is familiar to frequent fliers who travel through airport "hubs" to make connecting flights from one point to another. In data communications, a hub is a place of convergence where data arrives from one or more directions and is forwarded out in one or more other directions.
Hub and Spoke for European Biobank Network
• No need to connect all with all on “network” level
– Database federation system does the routing of traffic (queries)
– Hubs provide the database services
• Single access point (simpler to manage)
– Hubs can be federated as well
– We can have many Hubs in different geographical places, genotype Hub, phenotype Hub, Sample Hub, Meta Data Hub, National Hub
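The routing idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Hub` and `Spoke`, the record fields and the biobank names are hypothetical and are not taken from the BBMRI software. It shows a hub acting as a single access point that forwards a query to its members, where a member may be a biobank database or another hub.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: Hub, Spoke and the record fields are
# illustrative only and do not come from the BBMRI software.
Record = Dict[str, object]
Predicate = Callable[[Record], bool]


@dataclass
class Spoke:
    """A participating biobank database that answers queries locally."""
    name: str
    records: List[Record]

    def query(self, predicate: Predicate) -> List[Record]:
        # Tag each matching record with the spoke it came from.
        return [dict(r, source=self.name) for r in self.records if predicate(r)]


@dataclass
class Hub:
    """Single access point that forwards a query to every registered member."""
    name: str
    members: list = field(default_factory=list)  # Spokes or other Hubs

    def register(self, member) -> None:
        self.members.append(member)

    def query(self, predicate: Predicate) -> List[Record]:
        results: List[Record] = []
        for member in self.members:
            results.extend(member.query(predicate))
        return results


# Two national hubs federated under one top-level hub.
se = Hub("national-se")
se.register(Spoke("biobank-a", [{"sample": "tissue", "count": 800}]))
at = Hub("national-at")
at.register(Spoke("biobank-b", [{"sample": "tissue", "count": 400},
                                {"sample": "blood", "count": 50}]))
top = Hub("european")
top.register(se)
top.register(at)

tissue = top.query(lambda r: r["sample"] == "tissue")
```

Because a hub exposes the same `query` method as a spoke, hubs can be nested without the caller needing to know how many biobanks sit behind the access point.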
GenomEUtwin and TwinNET
[Figure: national hubs bbmri.no, bbmri.se, bbmri.au, bbmri.nl and bbmri.fi, each labelled id=1]
Common Schema for all national HUBS
A simple example
User scenario 1: Get attributes for biobanks
Peter logs in to the BBMRI portal and goes to the search function. He first needs to choose one of the ontologies included in the UMLS Metathesaurus:
a. Select ontology → ICD-10
He makes the following choices of input (by query statements or predefined scrollbars) for the set "Attributes for biobanks":
b. Type of diagnosis | Using ICD-10 → C50. //Note: non-specific code not used for diagnosis
c. Type of sample → tissue
d. Method of preservation → All
e. Data needed → Diagnosis information
f. Number of samples → Collections with at least 100 samples
A simple example
Biobank  Nationality  Source Vocabulary  Diagnosis (non-spec)  Sample  Preservation    Approximate number
UDN      Austria      ICD-10             C50.                  tissue  Formalin-fixed  1000
ReTRA    Austria      ICD-10             C50.                  tissue  Formalin-fixed  400
CAS      Germany      ICD-9-CM           174                   tissue  Fresh/frozen    1200
DREB     France       ICD-10             C50.                  tissue  Formalin-fixed  100
SMIN     Sweden       SNOMED CT          254837009             tissue  Formalin-fixed  800
UPPA     Sweden       SNOMED CT          254837009             tissue  Fresh/frozen    800
BReMa    Iceland      ICD-9-CM           174                   tissue  Fresh/frozen    400
...
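The scenario and its result rows suggest how a search phrased in one vocabulary can match collections coded in another. The following Python sketch is an assumption-laden illustration, not the BBMRI implementation: the mapping table `CODE_TO_CONCEPT` and the concept label `"breast-neoplasm"` are hypothetical stand-ins for a lookup in a terminology resource such as the UMLS Metathesaurus, and the equivalence of the three codes is inferred only from the fact that they appear together in the result rows above.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Collection(NamedTuple):
    biobank: str
    nationality: str
    vocabulary: str
    diagnosis: str
    sample: str
    preservation: str
    approx_number: int


# Hypothetical mapping: codes are those in the rows above; the shared
# concept label is an assumption standing in for a terminology lookup.
CODE_TO_CONCEPT: Dict[Tuple[str, str], str] = {
    ("ICD-10", "C50."): "breast-neoplasm",
    ("ICD-9-CM", "174"): "breast-neoplasm",
    ("SNOMED CT", "254837009"): "breast-neoplasm",
}

# The rows shown on the slide.
COLLECTIONS: List[Collection] = [
    Collection("UDN", "Austria", "ICD-10", "C50.", "tissue", "Formalin-fixed", 1000),
    Collection("ReTRA", "Austria", "ICD-10", "C50.", "tissue", "Formalin-fixed", 400),
    Collection("CAS", "Germany", "ICD-9-CM", "174", "tissue", "Fresh/frozen", 1200),
    Collection("DREB", "France", "ICD-10", "C50.", "tissue", "Formalin-fixed", 100),
    Collection("SMIN", "Sweden", "SNOMED CT", "254837009", "tissue", "Formalin-fixed", 800),
    Collection("UPPA", "Sweden", "SNOMED CT", "254837009", "tissue", "Fresh/frozen", 800),
    Collection("BReMa", "Iceland", "ICD-9-CM", "174", "tissue", "Fresh/frozen", 400),
]


def search(collections: List[Collection], vocabulary: str, code: str,
           sample: str, min_samples: int,
           preservation: Optional[str] = None) -> List[Collection]:
    """Return collections whose diagnosis maps to the same concept as the query.

    preservation=None corresponds to the "All" choice in the scenario.
    """
    concept = CODE_TO_CONCEPT[(vocabulary, code)]
    return [
        c for c in collections
        if CODE_TO_CONCEPT.get((c.vocabulary, c.diagnosis)) == concept
        and c.sample == sample
        and (preservation is None or c.preservation == preservation)
        and c.approx_number >= min_samples
    ]


# Peter's query: ICD-10 C50., tissue, any preservation, at least 100 samples.
hits = search(COLLECTIONS, "ICD-10", "C50.", "tissue", min_samples=100)
```

Translating every local code to a shared concept before comparing is one possible design; it keeps each biobank free to retain its own source vocabulary.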
WP5 - objectives
Task 1 – consensus on the requirements for a general information management system for biobanks in Europe
Task 2 – how to maintain unique and secure identities (object models) for specimens, subjects and biobanks
Task 3 – strategy for communication between biobanks including a common nomenclature, compatible software techniques and appropriate information transmission policies
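As an illustration of the identity problem in Task 2, the sketch below shows one possible approach, not the WP5 object model: a local identifier is made unique across the federation by qualifying it with the hub it belongs to, and a keyed hash yields a pseudonym that can be shared without exposing the original identifier. The function names, the identifier layout and the key are all hypothetical; the hub names are those of the national hubs mentioned earlier.

```python
import hashlib
import hmac


def global_id(hub: str, kind: str, local_id: int) -> str:
    """Qualify a local identifier with its hub and object kind (hypothetical layout)."""
    return f"{hub}/{kind}/{local_id}"


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with HMAC-SHA256; the key stays with the hub."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The same local id in two national hubs no longer collides.
se_subject = global_id("bbmri.se", "subject", 1)
fi_subject = global_id("bbmri.fi", "subject", 1)

# Illustrative key only; a real deployment would manage keys securely.
key = b"example-key-not-for-production"
se_pseudonym = pseudonym(se_subject, key)
```

The pseudonym is deterministic for a given key, so repeated queries about the same subject can be linked without revealing the local identifier.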
WP5 - lexicon
Swedish version, English version, Spanish version, Italian version, Estonian version
database federato (federated database)
un tipo di sistema di gestione di basi di dati che integra, in maniera trasparente ….. (a type of database management system that integrates, transparently …..)
WP5 - crew
Andrea Calabria, CNR-ITB (IT)
Mario Caccamo, EMBL-EBI (UK)
Claus Dabringer, UNI-KLU (AT)
Johann Eder, UNI-KLU (AT)
Paul Flicek, EMBL-EBI (UK)
Ruslan Fomkin, KI (SE)
Martin Fransson, KI (SE)
Hákon Gudbjartsson, deCODE (IS)
Andy Harris, UK Biobank (UK)
Hans Hillege, UMCG (NL)
Maria Krestyaninova, EMBL-EBI (UK)
Klaus Kuhn, TUM (DE)
Erkki Leego, UTARTU (EE)
Jan-Eric Litton, KI (SE)
Fernando López, VITRO, SA (ES)
Ioannis Michalopoulos, BRFAA (GR)
Luciano Milanesi, CNR-ITB (IT)
Juha Muilu, NPHI (FI)
Louis Rechaussat, INSERM (FR)
Blandine Rimbault, INSERM (FR)
Tore Risch, UU (SE)
Pedro Roiz, VITRO, SA (ES)
Paolo Romano, IST (IT)
Morris Swertz, UMCG (NL)
WP5
• Meetings arranged:
– 1st WP5 team meeting, March 26-27, 2008, Stockholm, Sweden
– 2nd WP5 team meeting, September 28-29, 2008, Reykjavik, Iceland
– 3rd WP5 team meeting, March 23-24, 2009, Brussels, Belgium
• Upcoming meetings:
– 4th WP5 team meeting, October 14-15, Milan, Italy
Challenges….
• Data definitions change over time
• How to connect to medical ontologies
• Integration of phenotype and genotype data
• The natural language issue
And of course ELSI
Jan-Eric Litton, Prof
Department of Medical Epidemiology and Biostatistics
Karolinska Institutet
Box 281
SE-17177 Stockholm
Sweden
Tel: +46 8 52487759
E-mail: [email protected]
Further information: www.bbmri.eu