An Introduction to the Biomedical Informatics Research Network (BIRN) Gully APC Burns Information Sciences Institute University of Southern California

Post on 18-Jan-2018

Page 1: An Introduction to the Biomedical Informatics Research Network (BIRN)

Gully APC Burns
Information Sciences Institute
University of Southern California

Page 2: This Presentation

• The BIRN Vision
• Project History
• Communities
• Specific Capabilities
  – Data Management: BIRN Pathology
  – Information Integration: BIRN Mediator
  – Knowledge Engineering: ‘KEfED’
• http://www.birncommunity.org

Page 3: The Purpose of BIRN

“to advance biomedical research through data sharing and online collaboration”

Page 4: Challenges … Opportunities

What is the main challenge we face?
• Not just size of datasets …
• Not just communication bandwidth …

The challenge is heterogeneity.
• In 2010, NIH funded 31,232 R01 grants
• How do we provide practical data sharing and collaboration solutions at that scale?

Page 5: Use Cases and Capabilities

Focus on the things our users want.
Develop Capabilities, i.e., our ability to:
– Publish data / Deploy tools
– Rapidly develop new lightweight systems tailored for specialized use cases
– Reconcile data across disparate sources
– Provide security (credentials, encryption)
– Develop and enable scientific reasoning systems

Page 6: Background and History

• Funded by National Center for Research Resources in 2001.

• Three neuroimaging testbeds initially provided use cases for developed technology.

• Reorganized in 2008 to provide software-based solutions for data sharing in the biosciences. No longer focused on neuroimaging (or even imaging).

• One can no longer ‘log into BIRN’. New model of providing modular capabilities that users can fit into existing toolset.

Page 7: Sites and Collaborators

Page 8: Technical Approach

• Bottom up, not top down
• Focus on user requirements: what they want to do
• Create solutions: factor out common requirements
  – Capability model includes software and process
  – Avoid “Big Design up Front” (BDUF)

Page 9: Dissemination Models

• Different problems require different kinds of solutions.
  – BIRN operates services for all users; e.g., user registration service
  – BIRN provides kits for project teams to deploy services for their members; e.g., data sharing
  – BIRN provides downloadable tools for individuals to use on their own; e.g., pathology databases, integration systems, reasoning systems, workflows.
• Understanding deployment needs is part of defining the problem to be solved.

Page 10: Working Groups

• New capabilities are defined, designed, and disseminated by BIRN Working Groups
  – Operations
  – Security
  – Data Management
  – Information Integration
  – Knowledge Engineering
  – Workflow Tools
  – Genomics

Page 11: Working with BIRN

• Projects are vetted by the BIRN Steering Committee for appropriateness and resource planning.
• Collaborators are expected to be actively involved in development and deployment.
• Collaborators are not expected to use the complete BIRN capability kit but should pick and choose to suit their own needs.

Page 12: Some of our Communities …

• FBIRN
  – Schizophrenia neuroimaging, fMRI data
• Nonhuman Primates Research Consortium
  – Colony management, pathology, genetics
• Cardiovascular Research Grid
• Kinetics Foundation
  – Parkinson’s disease therapeutics
… and others

Page 13: Data Management

• BIRN supports data-intensive activities including:
  – Imaging, Microscopy, Genomics, Time Series, Analytics and more…
• BIRN utilities scale:
  – TB data sets
  – 100 MB – 2 GB files
  – Millions of files

Page 14: Data Service Deployment

Page 15: BIRN PATHOLOGY

Project Overview

Page 16: BIRN Pathology Technology ‘Stack’

[Diagram labels: LAMP; Database Development; Virtual Microscopy; Search; Application (e.g., Pathology); Data Integration; Security; File Transfer; Data Management. Key: “Capabilities”; Other BIRN “Capabilities”; 3rd Party Tools; Application Specific.]

Page 17: Database Development

• In a nutshell:
  • Django + Extensions
• Features
  • Django Web Framework
    • MVC, ORM, HTML Templates
    • Auto-generated data forms!
  • RESTful interface routing
  • Spreadsheet batch importer
  • Crowd integration (users/groups/SSO)
  • Data object access control (ACLs)
  • JavaScript UI widgets
  • Scheduled database backup
• System Requirements
  • Linux, Apache, MySQL or Postgres, Python, Django

Use Cases

Scientists need to create custom web-accessible databases to share data within and between research groups.

Scientists store data in spreadsheets, need to manage this data (DBMS), and expose it programmatically (REST web services).
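
As a rough illustration of the spreadsheet use case, the sketch below loads a CSV export into a database table and reads it back as records that a web service could return. It is not the BIRN importer: it uses only the Python standard library (`csv`, `sqlite3`) in place of Django and its ORM, and the function names, table name, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces in spreadsheet headers are accepted."""
    return '"' + name.replace('"', '""') + '"'


def import_spreadsheet(conn, table, csv_text):
    """Create `table` from the CSV header row and insert every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if not columns:
        raise ValueError("spreadsheet has no header row")
    col_sql = ", ".join(quote_identifier(c) for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} ({col_sql})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(
        f"INSERT INTO {quote_identifier(table)} ({col_sql}) VALUES ({placeholders})",
        rows,
    )
    conn.commit()
    return len(rows)


def list_records(conn, table):
    """Return all rows as dictionaries, the shape a REST endpoint might serialize."""
    cur = conn.execute(f"SELECT * FROM {quote_identifier(table)}")
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


# Hypothetical spreadsheet contents, for illustration only.
SHEET = (
    "subject_id,species,diagnosis\n"
    "S1,Macaca mulatta,colitis\n"
    "S2,Macaca mulatta,none\n"
)

conn = sqlite3.connect(":memory:")
count = import_spreadsheet(conn, "subjects", SHEET)
records = list_records(conn, "subjects")
```

In a Django deployment the table would be a model class and the records would come from the ORM; the sketch only shows the batch-import step in isolation.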

Page 18: Virtual Microscopy

• Image Processing Tools
  • Accepts images in numerous proprietary microscopy formats (Bio-Formats)
  • Creates pyramidal, tiled image sets in open format (JPEG) used by Zoomify image viewer
  • Extracts proprietary graphical annotation formats and converts them to Zoomify annotation format
• Image Server + Database
  • (Built on our Database Development tools)
  • Monitors an image upload directory, automatically processes images, and updates the image database
  • Stores and serves annotations according to associated ACLs

Use Cases

Scientists need to share high resolution (100X) images over relatively low bandwidth networks.

Scientists need to annotate and share annotations on images.
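
To make the idea of a pyramidal, tiled image set concrete, here is a minimal sketch of the arithmetic involved: the image is halved repeatedly until it fits in one tile, and a viewer requests only the tiles that overlap the current viewport. The 256-pixel tile size and the rounding rules are assumptions for illustration; the exact conventions used by Zoomify or by the BIRN tools may differ.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """Return (width, height, columns, rows) for each level, smallest level first."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # the whole image now fits in a single tile
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()
    return levels


def tiles_in_view(x, y, view_w, view_h, tile_size=256):
    """List (column, row) tiles overlapping a viewport given in level pixel coordinates."""
    first_col, last_col = x // tile_size, (x + view_w - 1) // tile_size
    first_row, last_row = y // tile_size, (y + view_h - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


levels = pyramid_levels(1000, 600)
total_tiles = sum(cols * rows for _, _, cols, rows in levels)
visible = tiles_in_view(300, 100, 400, 300)
```

For this small example image the pyramid has three levels and 17 tiles in total, while a 400 by 300 viewport at full resolution touches only four of them, which is why tiling suits low-bandwidth links.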

Page 19: Search

• Features
  • Single-site, full-text indexing of databases and text files.
  • Search results returned as Python objects based on Django ORM mappings.
  • ‘Advanced search’; i.e., multiple fields and logical expressions.
  • Easy to synchronize index with changes to Django-based object model.
• Technology
  • Based on Xapian and Djapian

Use Case

Scientists need to search their databases using full-text queries and field-based “advanced” search queries.
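
The difference between full-text and field-based "advanced" search can be sketched with a toy inverted index. This is a stand-in written in plain Python, not the Xapian or Djapian API; the class name `TinyIndex`, its methods, and the sample records are hypothetical.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, term) to the set of matching record ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        for field, text in fields.items():
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                self.postings[(field, term)].add(doc_id)
                # "*" is a catch-all pseudo-field used for full-text queries.
                self.postings[("*", term)].add(doc_id)

    def term(self, word, field="*"):
        """Ids of records containing `word`, optionally restricted to one field."""
        return set(self.postings.get((field, word.lower()), set()))


# Hypothetical records, for illustration only.
index = TinyIndex()
index.add(1, {"species": "Rhesus macaque", "notes": "Liver section, mild fibrosis"})
index.add(2, {"species": "Mouse", "notes": "Liver section, normal"})
index.add(3, {"species": "Rhesus macaque", "notes": "Kidney section"})

# Full-text query: any field may match.
full_text = index.term("liver")
# Advanced query: field restrictions combined with logical AND / OR via set operations.
advanced = index.term("liver", "notes") & index.term("macaque", "species")
either = index.term("kidney") | index.term("mouse")
```

A production index adds stemming, ranking, and incremental updates; the sketch only shows how field restrictions and logical expressions reduce to operations on posting sets.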

Page 20: Pathology Application

• Features
  • Pathology Data Model
    • Subject, Case, Specimen, Tissue, Image entities
    • Species, Disease, Etiology domain tables
  • Pathology data entry workflows
    • UI forms tailored to the data entry workflow of pathologists.
  • Microscopy support for proprietary formats:
    • Aperio SVS (supported)
    • Hamamatsu NDPI (partially supported)
    • Olympus VSI (under development)
  • And, of course, all of the features from the underlying software and services
    • Database Development + Virtual Microscopy + Search
  • Data Integration using BIRN Mediator
    • Capability to run federated queries across sites

Use Cases

Pathologists create images in proprietary microscopy formats, annotate the images, and need to upload them to a web-accessible database for sharing among collaborating sites.

Pathologists associate metadata and supplementary files with their microscopy images.

Page 21: Screenshots

Page 22: Screenshots

Page 23: Screenshots

Page 24: Information Integration

• BIRN supports integration across complex data sources
  – Can process a wide variety of structured & semi-structured sources (DBMS, XML, HTML, Excel, SOAP)
• Use Schema & Data
  – Source Modeling & Record Linkage
• Infrastructure capabilities
  – Security, Efficient Query Execution, SQL-like syntax across multiple sources.

[Diagram labels: Decision Support; Application Programs; Mediator; Knowledge Bases; Databases; Computer Programs; The Web.]

Page 25: Information Mediator

• Virtual Integration Architecture:
  – Virtual organization: community of data providers and consumers that want to share data for a specific purpose
  – Autonomous sources: data and control remain at sources; no change to access methods or schemas; data accessed in real time in response to user queries
  – Mediator: integrator defines domain schema and describes source contents
    • Domain schema: agreed-upon view of the domain preferred by the virtual organization
    • Source descriptions: logical formulas relating source and domain schemas

Page 26: BIRN MEDIATOR

Project Overview

Page 27: Information Mediator

• Query Answering
  – User writes query in domain schema
  – Mediator:
    • Determines sources relevant to user query
    • Rewrites query in source schemas
    • Breaks query into sub-queries for sources
    • Optimizes query evaluation plan
    • Combines answers from sources
  – Efficient query evaluation
    • Streaming dataflow
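
The query-answering steps listed above can be sketched in a few lines. This is a simplified illustration, not the BIRN Mediator: source descriptions are reduced to attribute renamings rather than logical formulas, the optimization step is omitted, and the source names, field names, and rows are hypothetical. Generators stand in for streaming dataflow.

```python
from itertools import chain

# Hypothetical sources; each "mapping" relates domain-schema fields to source fields.
SOURCES = {
    "site_a": {
        "mapping": {"subject_id": "subj", "age": "age_years", "diagnosis": "dx"},
        "rows": [
            {"subj": "A1", "age_years": 34, "dx": "schizophrenia"},
            {"subj": "A2", "age_years": 29, "dx": "control"},
        ],
    },
    "site_b": {
        "mapping": {"subject_id": "id", "diagnosis": "group"},
        "rows": [{"id": "B7", "group": "schizophrenia"}],
    },
}


def relevant_sources(fields, sources):
    """Step 1: keep only sources that can supply every field the query mentions."""
    return [name for name, src in sources.items()
            if all(f in src["mapping"] for f in fields)]


def sub_query(name, src, select, where):
    """Steps 2-3: rewrite the filter into the source schema and stream matching rows."""
    mapping = src["mapping"]
    source_where = {mapping[f]: value for f, value in where.items()}
    for row in src["rows"]:
        if all(row.get(k) == v for k, v in source_where.items()):
            answer = {f: row[mapping[f]] for f in select}
            answer["source"] = name
            yield answer


def answer_query(select, where, sources=SOURCES):
    """Final step: lazily combine the answers from every relevant source."""
    needed = set(select) | set(where)
    names = relevant_sources(needed, sources)
    return chain.from_iterable(
        sub_query(n, sources[n], select, where) for n in names
    )


patients = list(answer_query(["subject_id", "diagnosis"], {"diagnosis": "schizophrenia"}))
with_age = list(answer_query(["subject_id", "age"], {}))
```

The first query is answered by both sources; the second is routed only to `site_a`, because `site_b` has no description for the `age` field.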

Page 28: The Information Mediator

[Diagram labels: User Queries / Web Portal / Services; Security; Information Integration; Execution Engine; Optimizer; Reformulation; Logical Source Descriptions; Wrapper (three instances); Data Sources. Key: ‘Capabilities’; Other BIRN ‘Capabilities’; Application Specific.]

Page 29: Database Consolidation

• Construct a ‘virtual organization’
  • A community of data providers and consumers sharing data for a common, specific purpose
• All sources are autonomous
  • data and control remain at sources
  • no change to access methods or schemas
  • data accessed in real time in response to user queries
• Work consists of modeling domain schema and source contents
  • Domain schema = agreed-upon view of the domain preferred by the virtual organization
  • Source descriptions = logical formulas relating source and domain schemas
• Implemented in multiple domains:
  • fMRI: Ashish, et al. (2010) “Neuroscience Data Integration through Mediation: An (F)BIRN Case Study.” Front. Neuroinf. 4:118
  • Cardiovascular Research Grid
  • Non-Human Primate Research Consortium
  • Child Neurodevelopmental Disorders

Use Cases

Scientists from different groups want to query across two databases with different schemas.

Databases may be completely different (e.g., one group uses Excel spreadsheets, another uses FileMaker Pro and a third uses Oracle).

Page 30: Database Extension

• Reapply the mediator technology to sources from different subjects
  • e.g., link genetics data to imaging data.
• Not dependent on a universal, global ontology, but on a locally defined model specific to the application
• Develop models using Datalog that are then processed by the BIRN Mediator and OGSA-DAI / OGSA-DQP technology.

Use Cases

Scientists want to bring together data from different sources into a single, common domain model.

Sources will require linkage at the level of schema and data.
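
Linkage at the level of data means deciding when two records from different sources describe the same entity. The sketch below shows one simple way to do that, using string similarity from Python's standard `difflib` module. It is an assumption-laden illustration rather than the record linkage method used in BIRN; the identifiers, field names, and the 0.85 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse punctuation so 'Rh-1042' and 'RH 1042' compare equal."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", value.lower()).split())


def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, field, threshold=0.85):
    """Pair each left record with its most similar right record, if similar enough."""
    links = []
    for record in left:
        best = max(right, key=lambda r: similarity(record[field], r[field]), default=None)
        if best is not None and similarity(record[field], best[field]) >= threshold:
            links.append((record, best))
    return links


# Hypothetical genetics and imaging records, for illustration only.
genetics = [
    {"animal": "Rh-1042", "allele": "A/G"},
    {"animal": "Rh-2210", "allele": "G/G"},
]
imaging = [
    {"animal": "RH 1042", "scan": "T1"},
    {"animal": "rh_3307", "scan": "T2"},
]

links = link_records(genetics, imaging, "animal")
merged = [{**g, **i} for g, i in links]
```

Only the first genetics record finds a counterpart in the imaging source; the second stays unlinked because no imaging identifier is close enough.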

Page 31: EZ-Mediator

• This is not a particularly difficult task to manage by hand, but automating it is even simpler within our framework
  • Small improvements can lead to large gains!
• Based on the same mediator technology as previously described

Use Cases

Can we, with the absolute minimum of effort, run queries across separate databases that have the same schema?
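
When every site uses the same schema, a federated query reduces to sending one statement to each database and concatenating the answers. The sketch below shows that idea with in-memory SQLite databases from the Python standard library. It is not EZ-Mediator itself; the `specimen` table, site names, and rows are hypothetical.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory database with the shared (hypothetical) specimen schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (specimen_id TEXT, tissue TEXT, species TEXT)")
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", rows)
    return conn


def federated_query(sites, sql, params=()):
    """Run the same query at every site and tag each row with the site it came from."""
    for site_name, conn in sites.items():
        for row in conn.execute(sql, params):
            yield (site_name,) + tuple(row)


sites = {
    "site_a": make_site([("a1", "liver", "macaque"), ("a2", "kidney", "macaque")]),
    "site_b": make_site([("b1", "liver", "mouse")]),
}

liver = list(federated_query(
    sites,
    "SELECT specimen_id, species FROM specimen WHERE tissue = ?",
    ("liver",),
))
```

Tagging each row with its site keeps provenance visible after the results are merged, which matters when sites remain autonomous.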

Page 32: Screenshots

Page 33: Screenshots

Page 34: Knowledge Engineering

Page 35: Scientific assertions as ‘Computable, citable elements’

• There are a very large number of statements like ‘mice like cheese’
  – semantics at this level are complicated!
• For example:
  – “Novel neurotrophic factor CDNF protects midbrain dopamine neurons in vivo” [Lindholm et al 2007]
  – “Hippocampo-hypothalamic connections: origin in subicular cortex, not ammon's horn.” [Swanson & Cowan 1975]
  – “Intravenous 2-deoxy-D-glucose injection rapidly elevates levels of the phosphorylated forms of p44/42 mitogen-activated protein kinases (extracellularly regulated kinases 1/2) in rat hypothalamic parvicellular paraventricular neurons.” [Khan & Watts 2004]
• Assertions vary in their levels of reliability and specificity.
• Can we introduce a generalized formalism that could support automated reasoning?

Page 36: Cycles of Scientific Investigation (‘CoSI’)

Page 37: Our ontology engineering approach is based on experimental variables

e.g., ‘CDNF protects nigral dopaminergic neurons in-vivo’

This statistically-significant effect is the experimental basis for the findings of this study.

from Lindholm, P. et al. (2007), Nature, 448(7149): p. 73-7

Page 38: Knowledge Engineering from Experimental Design (‘KEfED’)

Khan et al. (2007), J. Neurosci. 27:7344-60 [expt 2]

Page 39: KNOWLEDGE ENGINEERING FROM EXPERIMENTAL DESIGN ‘KEfED’

Project Overview

Page 40: BioScholar Application

• Develop a knowledge base framework for observations and interpretations from experiments.
• Scientists manually curate data from publications into a generic database driven by the KEfED model
• Can reuse designs for multiple experiments
• Design process is intuitive; can build a database without informatics training
  • Ideal for non-computational biologists.
  • Java / Flex Web application, one-click install

Use Cases

Scientists want to develop a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory

Page 41: Crux Application

• Scientists within a disease foundation must plot a whole research program

• How to keep track of hypotheses, experimental results and outcomes to plan the next phase of the project?

• System is just about to start year 2 of funding geared towards curation of raw data (not from publications).

• System can permit scientists with no computational expertise to develop scientific data repositories.

• Project driven by the Kinetics Foundation (with initial funding from M.J. Fox Foundation).

Use Cases

Decision makers at a disease foundation want to store raw data generated a generic knowledge base driven from a corpus of PDF files stored locally within a specific laboratory

Page 42: Screenshots

Page 43: Screenshots

Page 44: Summary

• BIRN provides capability kits and consulting to facilitate internal and external data sharing and other aspects of collaboration.
• BIRN takes a bottom-up, modular, capability-based approach that is driven by end user needs.
• End users are viewed as collaborators and are actively involved in the development process.

Page 45: BIRN Program Announcements

• Provides funds to work with BIRN on projects relating to data sharing (PAR-07-426) and the associated ontology (PAR-07-425) for the data.

• Mechanism to fund the leveraging of BIRN capabilities

• BIRN provides assistance in framing proposals and developing projects.

Page 46: Contact Information

General:
– [email protected]
Project Manager:
– Joe Ames: [email protected]
Outreach:
– Karl Helmer: [email protected]
Web:
– http://www.birncommunity.org
– https://wiki.birncommunity.org:8443/

Page 47: Acknowledgements

• Executive Team
  – Carl Kesselman
  – Joe Ames
  – Karl Helmer
• Data Management
  – Ann Chervenak
  – Rob Schuler
  – David Smith
  – Laura Pearlman
• Information Integration
  – Jose Luis Ambite
  – Gowri Gumaraguruparan
  – Maria Muslea
  – Naveen Ashish
• Knowledge Engineering
  – Jessica Turner
  – Tom Russ
• Security
  – Rachana Ananthakrishnan
• Communities
  – NonHuman Primate Research Consortium
  – FBIRN
  – CVRG
  – Mouse BIRN
  – Mouse Genome Informatics
  – Brain Architecture Group @ USC

And Many Many More…