Page 1 Informatics Pilot Project EDRN Knowledge System Working Group San Antonio, Texas January 21, 2001 Steve Hughes Thuy Tran Dan Crichton Jet Propulsion Laboratory, California Institute of Technology, National Aeronautics and Space Administration


Page 1

Informatics Pilot Project

EDRN Knowledge System Working Group

San Antonio, Texas, January 21, 2001

Steve Hughes, Thuy Tran, Dan Crichton

Jet Propulsion Laboratory, California Institute of Technology, National Aeronautics and Space Administration


Page 2

Problem Definition and Proposal

Overview

Problem: Specimen data is geographically distributed across heterogeneous data systems, making the location, retrieval, and use of this data difficult.

Solution:
- Build a “data architecture” for the EDRN network
- Use “metadata” as a key to interoperability
- Provide services for data sharing, archiving, and distribution
- Provide a software framework that allows analysis tools to be plugged into the EDRN data enterprise

Benefit: Correlating data across multiple centers affords an opportunity to create new data sets and data awareness

Example: Find all prostate tissue samples for men ages 70 and older collected before 1980 from databases across the EDRN


Page 3

EDRN Data Architecture Evolution

Data System Evolution (from locally centralized data to interoperable and distributed databases)

Local Database: local tools; no data sharing between centers; no common data elements

Limited Data Sharing: manual data sharing; manual correlation; export/import of data; limited CDEs

Full Data Sharing: location independence; data interchange; data sharing; common CDEs between centers; heterogeneous systems


Page 4

Completed Steps for the Mockup Implementation

Extracted data from partner centers
- Moffitt and San Antonio provided sample data sets to the DMCC and JPL
- Used “synthesized” data in lieu of “sensitive” data
- Preserved the original data structures provided by the centers

Mapped data dictionary terms
- Mapped common models between the EDRN CDEs, Moffitt, and San Antonio for correlating data sets
- Developed “profiles” that represent data resources for San Antonio, Moffitt, DMCC, EDRN, and NCI

Hosted data and metadata “profiles” at JPL
- Integrated with an existing data-sharing software framework developed by JPL, the Object Oriented Data Technology (OODT) framework
- OODT was developed to share space science data sets across NASA’s distributed Planetary Data System

Built a user interface to demonstrate a use case scenario for interoperability and data sharing between the databases


Page 5

Goals for the Mockup Implementation

Demonstrate the Return on Investment (ROI) achieved in “federating” (or linking) laboratory data systems together

Identify a scenario that demonstrates usability, such as providing generic support for specimen data location and retrieval

Use metadata (or profiles) as “recipes” to describe what (specimen) data and resources are available
- Communicate across systems

Adopt EDRN CDEs
- Look for common models between systems
- Understand how to relate center-specific metadata models

Look for “low-hanging” fruit
- Centers with similar databases and data models


Page 6

EDRN Knowledge Architecture: Mockup Implementation at JPL

[Diagram: the EDRN “Mock” Query Interface passes queries to the Query Manager (OODT middleware, hosted at JPL). The Query Manager sends each query to the Metadata Profiles (in: query; out: identified resources) and to the San Antonio and Moffitt Product Exchange Servers, which serve the EDRN mock databases hosted at JPL (in: query; out: data products). Data products are returned to the query interface (in: query; out: data products).]
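The architecture slide describes a two-step flow: the Query Manager first asks the metadata profiles which resources can answer a query, then forwards the query only to those product servers and collects their data products. The Python sketch below models that routing in memory. `ResourceProfile`, `ProductServer`, and `QueryManager` are hypothetical names, not OODT classes; the server identifiers `edrn.sanantonio` and `edrn.moffitt` are borrowed from the data-flow slide, and the sample records and organ sites are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A specimen record whose keys are already mockup CDE names (hypothetical).
Record = Dict[str, object]


@dataclass
class ResourceProfile:
    """Hypothetical, simplified metadata profile for one product server."""
    resource_id: str
    organ_sites: Set[str]  # specimen sites this resource claims to hold


@dataclass
class ProductServer:
    """In-memory stand-in for a center's product exchange server."""
    resource_id: str
    records: List[Record] = field(default_factory=list)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Return only the rows that satisfy the query predicate.
        return [r for r in self.records if predicate(r)]


class QueryManager:
    def __init__(self, profiles: List[ResourceProfile], servers: List[ProductServer]):
        self.profiles = profiles
        self.servers = {s.resource_id: s for s in servers}

    def identify_resources(self, site: str) -> List[str]:
        # Step 1: consult the profiles (in: query; out: identified resources).
        return [p.resource_id for p in self.profiles if site in p.organ_sites]

    def run(self, site: str, predicate: Callable[[Record], bool]) -> List[Record]:
        # Step 2: send the query to each identified server (out: data products)
        # and tag every returned row with the server it came from.
        results: List[Record] = []
        for rid in self.identify_resources(site):
            for rec in self.servers[rid].query(predicate):
                results.append({**rec, "SOURCE": rid})
        return results


profiles = [
    ResourceProfile("edrn.sanantonio", {"PROSTATE"}),
    ResourceProfile("edrn.moffitt", {"PROSTATE", "BREAST", "LUNG"}),
]
servers = [
    ProductServer("edrn.sanantonio", [{"SAMPLE_ID": "SA-1", "SITE": "PROSTATE"}]),
    ProductServer("edrn.moffitt", [{"SAMPLE_ID": "M-1", "SITE": "LUNG"},
                                   {"SAMPLE_ID": "M-2", "SITE": "PROSTATE"}]),
]
qm = QueryManager(profiles, servers)
lung = qm.run("LUNG", lambda r: r["SITE"] == "LUNG")
print([r["SAMPLE_ID"] for r in lung])  # ['M-1']
```

Because the profile lookup happens first, the lung query in this sketch never reaches the San Antonio server, which holds only prostate specimens.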


Page 7

Profile CDE Integration

Describe specimen data, data servers, and other resources using metadata “profiles”

Use Common Data Element (CDE) set for specimen description and search attributes

Use industry standard metadata terminology such as Dublin Core

Example metadata profiles:
- Mockup EDRN H. Lee Moffitt Cancer Center Product Server
- Mockup EDRN University of Texas, San Antonio Product Server
- Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Query Interface
- Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Web Site
- Early Detection Research Network Web Site
- EDRN Data Management and Coordinating Center Data Dictionary
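This slide describes each resource with a metadata profile that combines standard Dublin Core terms with CDE-based search attributes. As a sketch, assuming a simple, hypothetical XML layout (the `profile`, `resource`, `searchElement`, and `value` element names and both functions are illustrative and are not the OODT profile schema), a profile for the mock Moffitt product server could be built and matched with Python's standard `xml.etree.ElementTree`. Only the Dublin Core namespace URI is a real standard; the listed organ sites are illustrative.

```python
import xml.etree.ElementTree as ET

# Dublin Core element-set namespace (real); the surrounding profile layout is hypothetical.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_profile(identifier, title, creator, description, cde_values):
    """Build a hypothetical resource profile: Dublin Core attributes plus CDE search values."""
    profile = ET.Element("profile")
    resource = ET.SubElement(profile, "resource")
    for tag, text in (("identifier", identifier), ("title", title),
                      ("creator", creator), ("description", description)):
        ET.SubElement(resource, f"{{{DC_NS}}}{tag}").text = text
    # One searchElement per CDE, listing the values this resource can answer for.
    for cde_name, values in cde_values.items():
        element = ET.SubElement(profile, "searchElement", name=cde_name)
        for value in values:
            ET.SubElement(element, "value").text = value
    return profile


def profile_matches(profile, cde_name, wanted):
    """True if the profile advertises `wanted` as a value of CDE `cde_name`."""
    for element in profile.iter("searchElement"):
        if element.get("name") == cde_name:
            if any(v.text == wanted for v in element.iter("value")):
                return True
    return False


moffitt = build_profile(
    identifier="edrn.moffitt",
    title="Mockup EDRN H. Lee Moffitt Cancer Center Product Server",
    creator="H. Lee Moffitt Cancer Center",
    description="Mock specimen database hosted at JPL",
    cde_values={"SITE": ["PROSTATE", "BREAST", "LUNG"]},
)
print(profile_matches(moffitt, "SITE", "LUNG"))  # True
print(ET.tostring(moffitt, encoding="unicode"))
```

Keeping the descriptive fields in Dublin Core and the searchable fields in CDE terms lets a generic client read any profile's title while the query manager matches on the shared data elements.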


Page 8

Data Element Comparison Chart

| Mockup CDE | DMCC CDE* | Moffitt | San Antonio |
|---|---|---|---|
| GRADE | - | GRADE | GLEASON |
| SITE | - | SITE | (Prostate implied) |
| DIAGNOSIS | - | DIAGNOSIS | - |
| COLLECTION_DATE | - | COLLECTION_DT | PROCEDURE_DATE |
| CANCER_DX_DATE | CANCER_DX_DATE | INITIAL_DX_DATM | - |
| RACE | RACE | RACE | - |
| GENDER | GENDER | SEX | GENDER |
| BIRTH_DATE | BIRTH_DATE | BIRTH_DT | BIRTH_DATE |
| ICDO_CD | - | ICDO_CD | (61.9 implied) |
| SPECIMEN_TYPE | - | - | - |
| SAMPLE_ID | - | - | - |

* As of 12/5/2000
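The comparison chart is effectively a crosswalk from each center's column names to the mockup CDE names. A minimal Python sketch of applying it, using the field names from the chart (the function name `to_cde` and the sample values are hypothetical): San Antonio's implied site and ICD-O code are filled in as constants, and columns with no CDE counterpart are dropped. The chart only aligns names; whether a San Antonio Gleason score and a Moffitt grade hold comparable values is a separate semantic question.

```python
# Column-name crosswalks taken from the Data Element Comparison Chart.
MOFFITT_TO_CDE = {
    "GRADE": "GRADE",
    "SITE": "SITE",
    "DIAGNOSIS": "DIAGNOSIS",
    "COLLECTION_DT": "COLLECTION_DATE",
    "INITIAL_DX_DATM": "CANCER_DX_DATE",
    "RACE": "RACE",
    "SEX": "GENDER",
    "BIRTH_DT": "BIRTH_DATE",
    "ICDO_CD": "ICDO_CD",
}
SAN_ANTONIO_TO_CDE = {
    "GLEASON": "GRADE",
    "PROCEDURE_DATE": "COLLECTION_DATE",
    "GENDER": "GENDER",
    "BIRTH_DATE": "BIRTH_DATE",
}
# Values the chart marks as implied for San Antonio (a prostate-only database).
SAN_ANTONIO_IMPLIED = {"SITE": "Prostate", "ICDO_CD": "61.9"}


def to_cde(record, mapping, implied=None):
    """Rename a center's columns to CDE names; unmapped columns are dropped."""
    out = dict(implied or {})  # start from any implied constants
    for local_name, value in record.items():
        cde_name = mapping.get(local_name)
        if cde_name is not None:
            out[cde_name] = value
    return out


# Hypothetical San Antonio row, for illustration only.
sa_row = {"GLEASON": "7", "PROCEDURE_DATE": "1978-03-02",
          "GENDER": "M", "BIRTH_DATE": "1905-06-01"}
print(to_cde(sa_row, SAN_ANTONIO_TO_CDE, SAN_ANTONIO_IMPLIED))
```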


Page 9

User Interface

Provide a user interface to support various queries related to cancer specimen data (http://oodt.jpl.nasa.gov/EDRN/search.jsp):

Find all prostate tissue samples for all men collected from San Antonio and Moffitt databases

Find all prostate tissue samples for men ages 70 and older collected before 1980 from San Antonio and Moffitt databases sorted by Grade, Age, and Site

Find all breast tissue samples from women ages 50 and older from San Antonio* and Moffitt databases

Find all lung tissue samples from San Antonio* and Moffitt databases

* The San Antonio database contains only prostate specimens.
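Once both centers' rows are expressed in CDE terms, each query on this slide reduces to a filter plus a sort over the combined records. A Python sketch of the second query, under stated assumptions: “ages 70 and older” is read as age at collection, computed from `BIRTH_DATE` and `COLLECTION_DATE`; `SITE`, `GENDER`, and `GRADE` are assumed already normalized to the hypothetical codes `"Prostate"`, `"M"`, and integer grades; the function name and records are made up.

```python
from datetime import date
from typing import Dict, List

Record = Dict[str, object]


def age_on(birth: date, when: date) -> int:
    """Whole years of age on `when` (subtract one if the birthday hasn't come yet)."""
    return when.year - birth.year - ((when.month, when.day) < (birth.month, birth.day))


def prostate_70_plus_before_1980(records: List[Record]) -> List[Record]:
    cutoff = date(1980, 1, 1)
    hits = [
        r for r in records
        if r["SITE"] == "Prostate" and r["GENDER"] == "M"
        and r["COLLECTION_DATE"] < cutoff
        and age_on(r["BIRTH_DATE"], r["COLLECTION_DATE"]) >= 70
    ]
    # Sort by Grade, then Age, then Site, as the query on this slide requests.
    return sorted(hits, key=lambda r: (r["GRADE"],
                                       age_on(r["BIRTH_DATE"], r["COLLECTION_DATE"]),
                                       r["SITE"]))


records = [
    {"SAMPLE_ID": "A", "SITE": "Prostate", "GENDER": "M", "GRADE": 3,
     "BIRTH_DATE": date(1900, 5, 1), "COLLECTION_DATE": date(1975, 1, 1)},
    {"SAMPLE_ID": "B", "SITE": "Prostate", "GENDER": "M", "GRADE": 2,
     "BIRTH_DATE": date(1905, 1, 1), "COLLECTION_DATE": date(1979, 12, 31)},
    {"SAMPLE_ID": "C", "SITE": "Prostate", "GENDER": "M", "GRADE": 1,
     "BIRTH_DATE": date(1920, 1, 1), "COLLECTION_DATE": date(1978, 1, 1)},
    {"SAMPLE_ID": "D", "SITE": "Prostate", "GENDER": "M", "GRADE": 1,
     "BIRTH_DATE": date(1900, 1, 1), "COLLECTION_DATE": date(1985, 1, 1)},
]
print([r["SAMPLE_ID"] for r in prostate_70_plus_before_1980(records)])  # ['B', 'A']
```

Sample C is excluded as too young at collection and D as collected after 1980.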


Page 10

Key Challenges

        

Local data dictionaries and associated data models
- Different terms, data types, enumerated values, etc.
- Different meanings and interpretations

Different database product implementations
- FileMaker Pro and Microsoft Access

Maintain the structural integrity of the data models

EDRN CDEs exist for demographic data but not for specimen data*
- JPL developed common CDEs between the two databases for the specimen data

* As of 12/5/2000
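The first challenge (different terms, data types, and enumerated values) usually shows up as a value-level crosswalk layered on top of the column mapping. A hedged Python sketch: the gender codes, date formats, and function names below are hypothetical examples, not the actual FileMaker Pro or Access encodings used by the centers. Unmapped values raise an error instead of passing through, so a difference in meaning is surfaced for review rather than silently merged.

```python
from datetime import date, datetime

# Hypothetical per-center crosswalks from local codes to one shared code.
GENDER_CROSSWALK = {
    "moffitt": {"M": "M", "F": "F", "1": "M", "2": "F"},
    "sanantonio": {"Male": "M", "Female": "F"},
}

# Hypothetical date formats found in each center's export.
DATE_FORMATS = {
    "moffitt": "%m/%d/%Y",
    "sanantonio": "%Y-%m-%d",
}


def normalize_gender(center: str, value: object) -> str:
    table = GENDER_CROSSWALK[center]  # KeyError for an unknown center
    code = table.get(str(value).strip())
    if code is None:
        # Fail loudly so an unexpected local code gets reviewed, not guessed.
        raise ValueError(f"unmapped gender value {value!r} from {center}")
    return code


def normalize_date(center: str, value: str) -> date:
    # Parse with the center's format and keep only the calendar date.
    return datetime.strptime(value.strip(), DATE_FORMATS[center]).date()


print(normalize_gender("moffitt", 2), normalize_date("moffitt", "03/02/1978"))  # F 1978-03-02
```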


Page 11

Next Steps

Focus the implementation of data sharing on defining a robust metadata infrastructure
- Complete the EDRN CDE effort and begin mapping the CDEs to the center databases
- Reuse this mockup experience as an example!

Incorporate feedback from the mockup presentation

Address IRB and security requirements related to data sharing
- Encrypted and de-identified keys
- Network and computer security access

Connect to databases physically located at the centers
- Implement data system interfaces to the remote databases
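The “encrypted and de-identified keys” item can be approached in several ways; one common option, shown here as an assumption rather than the project's design, is keyed hashing (HMAC-SHA256) with Python's standard `hmac` and `hashlib` modules. Each center would keep its own secret key and share only the derived pseudonym, so the same patient always maps to the same shared key while the local ID cannot be read back out. This is one-way hashing, not encryption; key management and IRB review are outside the sketch, and `pseudonymize` is a hypothetical name.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, center: str, secret_key: bytes) -> str:
    """Derive a stable, one-way shared key from a center's local patient ID."""
    # Including the center name keeps identical local IDs at different centers distinct.
    message = f"{center}:{patient_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


key = b"example-secret-held-by-the-center"  # hypothetical; never hard-code a real key
print(pseudonymize("12345", "moffitt", key))
```

A plain unkeyed hash would not be enough here: patient IDs come from a small space, so anyone could hash every candidate ID and compare. The secret key is what prevents that.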


Page 12

Acknowledgements

Lynn Anderson, H. Lee Moffitt Cancer Center

Betsy Higgins, University of Texas, San Antonio

Heather Kincaid, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center

Mark Thornquist, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center

Ziding Feng, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center

Greg Downing, Office of Science Policy, Office of the Director, National Institutes of Health

Sudhir Srivastava, National Cancer Institute


Page 13

Backup Slides


Page 14

EDRN Mockup Query Example


Page 15

EDRN Mockup Results – Query 1


Page 16

EDRN Mockup Results – Query 3


Page 17

EDRN Mockup Results – Query 4


Page 18

Detailed Search of Profiles


Page 19

Profiles of EDRN Resources

[Diagram: resource profiles describing EDRN resources: the EDRN Website, DMCC Website, DMCC Sample Interface, Moffitt Product Server, San Antonio Product Server, Moffitt Mockup DB, and San Antonio Mockup DB.]


Page 20

EDRN Mockup Data Flow

[Diagram components: Search Web Page (web server, search.jsp); Query Client; Query Server; Profile Server jpl.edrn with Profile DB; Product Server edrn.moffitt with Moffitt “Mock” Database; Product Server edrn.sanantonio with San Antonio “Mock” Database; Web EDRN/NCI Resources. Flows: user query; XMLQuery/IIOP (product search); XMLQuery/IIOP (profiles of resources to handle query); XMLQuery/IIOP (data results); XMLQuery/IIOP (profiles or data results as requested); XMLQuery/IIOP (no results); XSL (profiles or data products formatted).]
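On the data-flow slide, every hop between the Query Server and the profile or product servers is labeled “XMLQuery/IIOP”, and some arrows return “no results”. That suggests the same query message travels to each server and comes back either empty or carrying results. The Python sketch below models only the message handling with `xml.etree.ElementTree`; the element names (`query`, `keywordQuery`, `resultMode`, `results`, `result`) and all functions are hypothetical and do not reproduce the actual OODT XMLQuery schema, and the IIOP transport and XSL formatting steps are left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable

# A handler receives (keyword query, result mode) and yields result rows.
Handler = Callable[[str, str], Iterable[Dict[str, str]]]


def encode_query(keyword_query: str, result_mode: str) -> str:
    """Serialize a query in a hypothetical XML envelope with an empty results list."""
    root = ET.Element("query")
    ET.SubElement(root, "keywordQuery").text = keyword_query
    ET.SubElement(root, "resultMode").text = result_mode  # e.g. "profile" or "product"
    ET.SubElement(root, "results")
    return ET.tostring(root, encoding="unicode")


def answer(xml_text: str, handler: Handler) -> str:
    """Server side: parse the query, append any matching results, send it back."""
    root = ET.fromstring(xml_text)
    results = root.find("results")
    for row in handler(root.findtext("keywordQuery"), root.findtext("resultMode")):
        ET.SubElement(results, "result", id=row["id"]).text = row["value"]
    return ET.tostring(root, encoding="unicode")


def moffitt_handler(keywords: str, mode: str):
    # Hypothetical product server that holds one lung specimen.
    if mode == "product" and "LUNG" in keywords:
        yield {"id": "edrn.moffitt/M-1", "value": "LUNG"}


def sanantonio_handler(keywords: str, mode: str):
    return []  # prostate-only database: nothing to add for a lung query


query = encode_query("SITE = LUNG", "product")
for handler in (sanantonio_handler, moffitt_handler):
    reply = ET.fromstring(answer(query, handler))
    print(handler.__name__, len(reply.find("results")))
```

Returning the query itself with results attached keeps a single message type on every hop, which matches the uniform “XMLQuery/IIOP” labels in the diagram.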