an introduction to the biomedical informatics research network (birn) gully apc burns information...
DESCRIPTION
The Purpose of BIRN âto advance biomedical research through data sharing and online collaborationâTRANSCRIPT
An Introduction to the Biomedical Informatics
Research Network (BIRN)Gully APC Burns
Information Sciences InstituteUniversity of Southern California
This Presentation⢠The BIRN Vision⢠Project History⢠Communities ⢠Specific Capabilities
â Data Management â BIRN Pathologyâ Information Integration â BIRN Mediatorâ Knowledge Engineering â âKEfEDâ
⢠http://www.birncommunity.org
The Purpose of BIRNâto advance biomedical research
through data sharing and online collaborationâ
Challenges ⌠Opportunities
What is the main challenge we face?⢠Not just size of datasets âŚâ˘ Not just communication bandwidth âŚ
The challenge is heterogeneity.⢠In 2010, NIH funded 31,232 RO1 grants⢠How do we provide practical data sharing
and collaboration solutions at that scale?
Use Cases and Capabilities
Focus on the things our users wantDevelop Capabilities â i.e., our ability to:
â Publish data / Deploy toolsâ Rapidly develop new lightweight systems
tailored for specialized use casesâ Reconcile data across disparate sources â Provide security (credentials, encryption)â Develop and enable scientific reasoning
systems
Background and History⢠Funded by National Center for Research Resources in 2001.
⢠Three neuroimaging testbeds initially provided use cases for developed technology.
⢠Reorganized in 2008 to provide software-based solutions for data sharing in the biosciences. No longer focused on neuroimaging (or even imaging).
⢠One can no longer âlog into BIRNâ. New model of providing modular capabilities that users can fit into existing toolset.
Sites and Collaborators
Technical Approach
⢠Bottom up, not top down⢠Focus on user requirements - what
they want to do⢠Create solutions - factor out common
requirementsâ Capability model includes software and
processâ Avoid âBig Design up Frontâ (BDUF)
Dissemination Models⢠Different problems require different kinds of
solutions.â BIRN operates services for all users; e.g., user
registration serviceâ BIRN provides kits for project teams to deploy
services for their members; e.g., data sharingâ BIRN provides downloadable tools for individuals to
use on their own; e.g., pathology databases, integration systems, reasoning systems, workflows.
⢠Understanding deployment needs is part of defining the problem to be solved.
Working Groups⢠New capabilities are defined,
designed, and disseminated by BIRN Working Groupsâ Operationsâ Securityâ Data Management â Information Integration â Knowledge Engineeringâ Workflow Tools â Genomics
⢠Projects are vetted by BIRN Steering Committee for appropriateness and resource planning
⢠Collaborators are expected to be actively involved in development and deployment.
⢠Collaborators are not expected to use complete BIRN capability kit but should pick and choose to suit their own needs.
Working with BIRN
Some of our Communities âŚâ˘ FBIRN
â Schizophrenic neuroimaging, fMRI data⢠Nonhuman Primates Research
Consortium â Colony management, pathology, genetics
⢠Cardiovascular Research Grid⢠Kinetics Foundation
â Parkinsonâs disease therapeutics⌠and others
Data Management⢠BIRN supports data
intensive activities including:â Imaging, Microscopy,
Genomics, Time Series, Analytics and moreâŚ
⢠BIRN utilities scale:â TB data setsâ 100 MB â 2 GB Filesâ Millions of Files
Data Service Deployment
BIRN PATHOLOGYProject Overview
BIRN Pathology Technology âStackâ
LAMP
Database Development
Virtual Microscop
ySearch
Application (e.g., Pathology)Data Integration
Security
File TransferData Management
âCapabilitiesâOther BIRN
âCapabilitiesâ 3rd Party Tools Application Specific
key
Database Development
⢠In a nutshell:⢠Django + Extensions
⢠Features⢠Django Web Framework
⢠MVC, ORM, HTML Templates⢠Auto-generated data forms!
⢠RESTful interface routing⢠Spreadsheet batch importer⢠Crowd integration (users/groups/SSO)⢠Data object access control (ACLs)⢠Javascript UI widgets⢠Scheduled database backup
⢠System Requirements⢠Linux, Apache, MySQL or Postgres,
Python, Django
Use Cases
Scientists need to create custom web-accessible databases to share data within and between research groups.
Scientists store data on spreadsheets, need to manage this data (DBMS), and expose it programmatically (WS REST)
Virtual Microscopy
⢠Image Processing Tools⢠Accepts images in numerous
proprietary microscopy formats (Bio-Formats)
⢠Creates pyramidal, tiled image sets in open format (JPEG) used by Zoomify image viewer
⢠Extracts proprietary graphical annotation formats and converts them to Zoomify annotation format
⢠Image Server + Database⢠(Built on our Database Development
tools)⢠Monitors an image upload directory,
automatically processes images, and updates the image database
⢠Stores and serves annotations according to associated ACLs
Use Cases
Scientists need to share high resolution (100X) images over relatively low bandwidth networks.
Scientists need to annotate and share annotations on images.
Search⢠Features
⢠Single-site, full-text indexing of databases and text files.
⢠Search results returned as Python objects based on Django ORM mappings.
⢠âAdvanced searchâ; i.e., multiple fields, and logical expressions.
⢠Easy to synchronize index with changes to Django-based object model.
⢠Technology⢠Based on Xapian and Djapian
Use Case
Scientists need to search their databases using full-text queries and field-based âadvancedâ search queries.
Pathology Application
⢠Features⢠Pathology Data Model
⢠Subject, Case, Specimen, Tissue, Image entities⢠Species, Disease, Etiology domain tables
⢠Pathology data entry workflows⢠UI forms tailored to the data entry workflow of
pathologists.⢠Microscopy support for proprietary
formats:⢠Aperio SVS (supported)⢠Hammamatsu NDPI (partially supported)⢠Olympus VSI (under development)
⢠And, of course, all of the features from the underlying software and services⢠Database Development + Virtual Microscopy +
Search⢠Data Integration using BIRN Mediator
⢠Capability to run federated queries across sites
Use Cases
Pathologists create images in proprietary microscopy formats, annotate the images, and need to upload them to a web-accessible database for sharing among collaborating sites.
Pathologists associate metadata and supplementary files with their microscopy images.
Screenshots
Screenshots
Screenshots
Information Integration⢠BIRN supports integration
across complex data sourcesâ Can process wide variety of
structured & semi-structured sources (DBMS, XML, HTML, Excel, XML, SOAP
⢠Use Schema & Dataâ Source Modeling & Record
Linkage⢠Infrastructure capabilities
â Security, Efficient Query Execution, SQL-like syntax across multiple sources.
Decision Support
Application Programs
Mediator
KnowledgeBases
Databases Computer Programs
The Web
Information Mediator⢠Virtual Integration Architecture:
â Virtual organization: community of data providers and consumers that want to share data for a specific purpose
â Autonomous sources: data, control remains at sources; no change to access methods, schemas; data accessed real-time in response to user queries
â Mediator: integrator defines domain schema and describes source contents
⢠Domain schema: agreed upon view of the domain preferred by the virtual organization
⢠Source descriptions: logical formulas relating source and domain schemas
BIRN MEDIATORProject Overview
Information Mediator⢠Query Answering
â User writes query in domain schemaâ Mediator:
⢠Determines sources relevant to user query⢠Rewrites query in sources schemas⢠Breaks query into sub-queries for sources⢠Optimizes query evaluation plan⢠Combines answers from sources
â Efficient query evaluation⢠Streaming dataflow
The Information MediatorUser Queries / Web Portal / Services
Security
Information Integration
âCapabilitiesâOther BIRN âCapabilitiesâ Application Specific
key
Execution Engine
Optimizer
Reformulation
Wrapper WrapperWrapper
Logical SourceDescriptions
Data Sources
Database Consolidation
⢠Construct a âvirtual organizationâ ⢠A community of data providers and consumers
sharing data for common specific purpose
⢠All sources are autonomous ⢠data, control remains at sources⢠no change to access methods, schemas; ⢠data accessed real-time in response to user
queries
⢠Work consists of modeling domain schema and source contents⢠Domain schema = agreed upon view of the
domain preferred by the virtual organization⢠Source descriptions = logical formulas relating
source and domain schemas
⢠Implemented in multiple domains: ⢠fMRI : Ashish, et al. (2010) âNeuroscience Data
Integration through Mediation: An (F)BIRN Case Study.â Front. Neuroinf. 4:118
⢠Cardiovascular Research Grid⢠Non-Human Primate Research Consortium⢠Child Neurodevelopmental Disorders
Use Cases
Scientists from different groups want to query across two databases with different schema
Databases may be completely different (i.e., one group uses Excel spreadsheets, another uses Filemaker Pro and a third uses Oracle)
Database Extension
⢠Reapply the mediator technology to sources from different subjects⢠e.g., link genetics data to imaging data.
⢠Not dependent on a universal, global ontology, but a locally-defined model specific for the application
⢠Develop models using Datalog that are then processed by the BIRN Mediator, and OGSA-DAI / OGSA-DQP technology.
Use Cases
Scientists want to bring together data from different sources into a a single, common domain model
Sources will require linkage at the level of schema and data
EZ-Mediator
⢠This is not a particular difficult task to manage by hand but automating it is even simpler within our framework ⢠Small improvements can lead to
large gains!⢠Based on the same mediator technology as previously described
Use Cases
Can we, with the absolute minimum of effort, run queries across separate databases that have the same schema?
Screenshots
Screenshots
Knowledge Engineering
Scientific assertions as âComputable, citable
elementsâ⢠There are very large number of statements like âmice like
cheeseââ semantics at this level are complicated!
⢠For example:â âNovel neurotrophic factor CDNF protects midbrain dopamine
neurons in vivoâ [Lindholm et al 2007]â âHippocampo-hypothalamic connections: origin in subicular
cortex, not ammon's horn.â [Swanson & Cowan 1975]â âIntravenous 2-deoxy-D-glucose injection rapidly elevates levels
of the phosphorylated forms of p44/42 mitogen-activated protein kinases (extracellularly regulated kinases 1/2) in rat hypothalamic parvicellular paraventricular neurons.â [Khan & Watts 2004]
⢠Assertions vary in their levels of reliability, specificity.⢠Can we introduce a generalized formalism that could support
automated reasoning?
Cycles of Scientific Investigation (âCoSIâ)
e.g., âCDNF protects nigral dopaminergic neurons in-vivoâ
This statistically-significant effect is the experimental basis for the findings of this study.
Our ontology engineering approach is based on experimental variables
from Lindholm, P. et al. (2007), Nature, 448(7149): p. 73-7
Knowledge Engineering from
Experimental Design (âKEfEDâ)
Khan et al. (2007), J. Neurosci. 27:7344-60 [expt 2]
KNOWLEDGE ENGINEERING FROM EXPERIMENTAL DESIGNâKEfEDâProject Overview
BioScholarApplication
⢠Develop a knowledge base framework for observations and interpretations from experiments.
⢠Scientists manually curate data by hand from publications into generic database driven by KEfED model
⢠Can reuse designs for multiple experiments
⢠Design process is intuitive, can build a database without informatics training⢠Ideal for non-computational biologists.⢠Java / Flex Web application, one click
install
Use Cases
Scientists want to develop a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory
CruxApplication
⢠Scientists within a disease foundation must plot a whole research program
⢠How to keep track of hypotheses, experimental results and outcomes to plan the next phase of the project?
⢠System is just about to start year 2 of funding geared towards curation of raw data (not from publications).
⢠System can permit scientists with no computational expertise develop scientific data repositories.
⢠Project driven by the Kinetics Foundation (with initial funding from M.J. Fox Foundation).
Use Cases
Decision makers at a disease foundation want to store raw data generated a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory
Screenshots
Screenshots
Summary⢠BIRN provides capability kits and
consulting to facilitate internal and external data sharing and other aspects of collaboration.
⢠BIRN takes a bottom-up, modular, capability-based approach that is driven by end user needs.
⢠End users are viewed as collaborators and are actively involved in development process.
BIRN Program Announcements
⢠Provides funds to work with BIRN on projects relating to data sharing (PAR-07-426) and the associated ontology (PAR-07-425) for the data.
⢠Mechanism to fund the leveraging of BIRN capabilities
⢠BIRN provides assistance in framing proposals and developing projects.
Contact InformationGeneral:
â [email protected] Project Manager:
â Joe Ames: [email protected] Outreach:
â Karl Helmer: [email protected]:
â http://www.birncommunity.org â https://wiki.birncommunity.org:8443/
Acknowledgements⢠Executive Team
â Carl Kesselmanâ Joe Amesâ Karl Helmer
⢠Data Managementâ Ann Chervenakâ Rob Schulerâ David Smithâ Laura Pearlman
⢠Information Integrationâ Jose Luis Ambiteâ Gowri Gumaraguruparanâ Maria Musleaâ Naveen Ashish
⢠Knowledge Engineeringâ Jessica Turnerâ Tom Russ
⢠Securityâ Rachana Ananthakrishnan
⢠Communitiesâ NonHuman Primate
Research Consortiumâ FBIRN â CVRGâ Mouse BIRNâ Mouse Genome Informatics â Brain Architecture Group @
USC
And Many Many MoreâŚ