TRANSCRIPT
Page 1
Informatics Pilot Project
EDRN Knowledge System Working Group
San Antonio, Texas
January 21, 2001
Steve Hughes
Thuy Tran
Dan Crichton
Jet Propulsion Laboratory, California Institute of Technology, National Aeronautics and Space Administration
Page 2
Problem Definition and Proposal
Overview
Problem: Specimen data is geographically distributed across heterogeneous data systems, making the location, retrieval, and use of this data difficult.
Solution: Build a “data architecture” for the EDRN network
- Use “metadata” as a key to interoperability
- Provide services for data sharing, archiving, and distribution
- Provide a software framework that allows analysis tools to be plugged into the EDRN data enterprise
Benefit: Correlating data across multiple centers affords an opportunity to create new data sets and data awareness
Example: Find all prostate tissue samples for men ages 70 and older collected before 1980 from databases across the EDRN
Page 3
EDRN Data Architecture Evolution
Data System Evolution (from locally centralized data to interoperable and distributed databases)
1. Local Database: local tools; no data sharing between centers; no common data elements
2. Limited Data Sharing: manual data sharing; manual correlation; export/import of data; limited CDEs
3. Full Data Sharing: location independence; data interchange; data sharing; common CDEs between centers; heterogeneous systems
Page 4
Completed Steps for the Mockup Implementation
Extracted Data from Partner Centers
- Moffitt and San Antonio provided sample data sets to the DMCC and JPL
- Used “synthesized” data in lieu of “sensitive” data
- Preserved the original data structures provided by the centers
Mapped Data Dictionary Terms
- Mapped common models between the EDRN CDEs and the Moffitt and San Antonio data dictionaries for correlating data sets
- Developed “Profiles” that represent data resources for San Antonio, Moffitt, DMCC, EDRN, and NCI
Hosted data and metadata “profiles” at JPL
- Integrated with an existing data sharing software framework developed by JPL called “OODT” (Object Oriented Data Technology)
- The framework was developed to share space science datasets across NASA’s distributed Planetary Data System
Built a user interface to demonstrate a use case scenario for interoperability and data sharing between the databases
Page 5
Goals for the Mockup Implementation
Demonstrate the Return on Investment (ROI) achieved in “federating” (or linking) laboratory data systems together
Identify a scenario that demonstrates usability, such as providing generic support for specimen data location and retrieval
Use metadata (or profiles)
- “Recipes” to describe what data (specimen) and resources are available
- Communicate across systems
Adoption of EDRN CDEs
- Look for common models between systems
- Understand how to relate center-specific metadata models
Look for “low-hanging” fruit
- Centers with similar databases and data models
Page 6
EDRN Knowledge Architecture: Mockup Implementation at JPL
Diagram components (In/Out as labeled):
- EDRN “Mock” Query Interface (In: Query; Out: Data Products)
- OODT Middleware, hosted at JPL, containing the Query Manager
- Metadata Profiles (In: Query; Out: Identified Resources)
- San Antonio Product Exchange Server (In: Query; Out: Data Products)
- Moffitt Product Exchange Server (In: Query; Out: Data Products)
- EDRN Mock Databases hosted at JPL: San Antonio, Moffitt
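The two-step routing in this architecture (ask the metadata profiles which resources can answer, then send the query only to those product servers and merge what comes back) can be sketched as below. This is a minimal illustration, not OODT’s API: the `Profile` and `QueryManager` classes, the per-server site lists, and the in-process functions standing in for remote product servers are all hypothetical. The resource IDs `edrn.moffitt` and `edrn.sanantonio` are taken from the data flow slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A specimen record expressed in CDE field names (hypothetical layout).
Record = Dict[str, str]


@dataclass
class Profile:
    """Hypothetical resource profile: which specimen sites a server can answer for."""
    resource_id: str
    sites: List[str]


@dataclass
class QueryManager:
    profiles: List[Profile]
    # resource_id -> function standing in for a remote product server
    product_servers: Dict[str, Callable[[str], List[Record]]] = field(default_factory=dict)

    def identify_resources(self, site: str) -> List[str]:
        """Step 1: consult the profiles to find resources that can handle the query."""
        return [p.resource_id for p in self.profiles if site in p.sites]

    def query(self, site: str) -> List[Record]:
        """Step 2: forward the query to each identified product server and merge results."""
        results: List[Record] = []
        for resource_id in self.identify_resources(site):
            server = self.product_servers.get(resource_id)
            if server is not None:
                results.extend(server(site))
        return results


# Illustrative stand-ins for the two mock product servers.
def moffitt_server(site: str) -> List[Record]:
    return [{"SAMPLE_ID": "m-001", "SITE": site}]


def san_antonio_server(site: str) -> List[Record]:
    return [{"SAMPLE_ID": "s-001", "SITE": site}]


qm = QueryManager(
    profiles=[
        Profile("edrn.moffitt", ["Prostate", "Breast", "Lung"]),  # assumed site list
        Profile("edrn.sanantonio", ["Prostate"]),  # San Antonio holds prostate data only
    ],
    product_servers={"edrn.moffitt": moffitt_server, "edrn.sanantonio": san_antonio_server},
)
print(qm.query("Breast"))  # only the Moffitt server is consulted
```

Consulting profiles first means a server that cannot answer a query (here, San Antonio for breast tissue) is never contacted.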
Page 7
Profile CDE Integration
Describe specimen data, data servers, and other resources using metadata “profiles”
Use Common Data Element (CDE) set for specimen description and search attributes
Use industry standard metadata terminology such as Dublin Core
Example Metadata Profiles:
- Mockup EDRN H. Lee Moffitt Cancer Center Product Server
- Mockup EDRN University of Texas, San Antonio Product Server
- Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Query Interface
- Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Web Site
- Early Detection Research Network Web Site
- EDRN Data Management and Coordinating Center Data Dictionary
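A profile that combines Dublin Core descriptive fields with the CDE attributes a resource can be searched on might look like the sketch below. The class layout, the `edrn.website` identifier, and the matching rule (a resource qualifies only if it supports every requested CDE) are assumptions for illustration. The per-server CDE sets are transcribed from the Data Element Comparison Chart, counting “implied” values as supported.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ResourceProfile:
    # Descriptive fields named after Dublin Core elements.
    dc_identifier: str
    dc_title: str
    dc_type: str
    # CDE names this resource can be searched on (hypothetical layout).
    cde_attributes: Set[str] = field(default_factory=set)


def find_profiles(profiles: List[ResourceProfile], required: Set[str]) -> List[str]:
    """Return identifiers of profiles that support every required CDE."""
    return [p.dc_identifier for p in profiles if required <= p.cde_attributes]


profiles = [
    ResourceProfile(
        "edrn.moffitt",
        "Mockup EDRN H. Lee Moffitt Cancer Center Product Server",
        "Product Server",
        {"GRADE", "SITE", "DIAGNOSIS", "COLLECTION_DATE", "CANCER_DX_DATE",
         "RACE", "GENDER", "BIRTH_DATE", "ICDO_CD"},
    ),
    ResourceProfile(
        "edrn.sanantonio",
        "Mockup EDRN University of Texas, San Antonio Product Server",
        "Product Server",
        {"GRADE", "SITE", "COLLECTION_DATE", "GENDER", "BIRTH_DATE", "ICDO_CD"},
    ),
    # Hypothetical identifier; a web site profile carries no searchable CDEs here.
    ResourceProfile("edrn.website", "Early Detection Research Network Web Site", "Web Site"),
]

print(find_profiles(profiles, {"GRADE", "BIRTH_DATE"}))  # both product servers
print(find_profiles(profiles, {"RACE"}))                # Moffitt only
```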
Page 8
Data Element Comparison Chart
MockUp CDE      | DMCC CDE *     | Moffitt         | San Antonio
GRADE           | -              | GRADE           | GLEASON
SITE            | -              | SITE            | (Prostate Implied)
DIAGNOSIS       | -              | DIAGNOSIS       | -
COLLECTION_DATE | -              | COLLECTION_DT   | PROCEDURE_DATE
CANCER_DX_DATE  | CANCER_DX_DATE | INITIAL_DX_DATM | -
RACE            | RACE           | RACE            | -
GENDER          | GENDER         | SEX             | GENDER
BIRTH_DATE      | BIRTH_DATE     | BIRTH_DT        | BIRTH_DATE
ICDO_CD         | -              | ICDO_CD         | (61.9 Implied)
SPECIMEN_TYPE   | -              | -               | -
SAMPLE_ID       | -              | -               | -
* As of 12/5/2000
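The chart amounts to a per-center rename table plus a few constants that apply to a whole database. A minimal sketch of applying it to one row follows; the `FIELD_MAP`/`IMPLIED` names, the choice to drop unmapped columns, and the sample row values are illustrative assumptions, while the column pairings come straight from the chart.

```python
from typing import Dict

# Local column name -> MockUp CDE name, transcribed from the comparison chart.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "moffitt": {
        "GRADE": "GRADE", "SITE": "SITE", "DIAGNOSIS": "DIAGNOSIS",
        "COLLECTION_DT": "COLLECTION_DATE", "INITIAL_DX_DATM": "CANCER_DX_DATE",
        "RACE": "RACE", "SEX": "GENDER", "BIRTH_DT": "BIRTH_DATE", "ICDO_CD": "ICDO_CD",
    },
    "san_antonio": {
        "GLEASON": "GRADE", "PROCEDURE_DATE": "COLLECTION_DATE",
        "GENDER": "GENDER", "BIRTH_DATE": "BIRTH_DATE",
    },
}

# Values the chart marks as "implied" for an entire center database.
IMPLIED: Dict[str, Dict[str, str]] = {
    "moffitt": {},
    "san_antonio": {"SITE": "Prostate", "ICDO_CD": "61.9"},
}


def to_cde(center: str, row: Dict[str, str]) -> Dict[str, str]:
    """Rename a center's columns to CDE names; columns without a mapping are dropped."""
    mapping = FIELD_MAP[center]
    record = dict(IMPLIED[center])  # start from the implied constants
    for column, value in row.items():
        if column in mapping:
            record[mapping[column]] = value
    return record


# Illustrative San Antonio row (values are made up).
print(to_cde("san_antonio", {"GLEASON": "7", "PROCEDURE_DATE": "1978-03-02", "GENDER": "M"}))
```

Keeping the mapping as data rather than code means a new center can be added by extending the two tables.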
Page 9
User Interface
Provide a user interface to support various queries related to cancer specimen data (http://oodt.jpl.nasa.gov/EDRN/search.jsp):
Find all prostate tissue samples for all men collected from San Antonio and Moffitt databases
Find all prostate tissue samples for men ages 70 and older collected before 1980 from San Antonio and Moffitt databases sorted by Grade, Age, and Site
Find all breast tissue samples from women ages 50 and older from San Antonio* and Moffitt databases
Find all lung tissue samples from San Antonio and Moffitt databases *
* San Antonio database contains just prostate
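Once records from both centers share CDE names, the second query reduces to a filter and a sort. The sketch below assumes records already normalized to CDE fields with `datetime.date` values, assumes gender is coded `"M"`, and assumes “ages 70 and older” means age at collection; the slides do not pin down any of these. The sample records are made up.

```python
from datetime import date
from typing import Any, Dict, List

Record = Dict[str, Any]


def age_at(birth: date, on: date) -> int:
    """Whole years between birth and the given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def query_2(records: List[Record]) -> List[Record]:
    """Prostate samples from men aged 70+ (at collection), collected before 1980."""
    hits = [
        r for r in records
        if r["SITE"] == "Prostate"
        and r["GENDER"] == "M"
        and r["COLLECTION_DATE"] < date(1980, 1, 1)
        and age_at(r["BIRTH_DATE"], r["COLLECTION_DATE"]) >= 70
    ]
    # Sort by Grade, then Age, then Site, as the query specifies.
    return sorted(
        hits,
        key=lambda r: (r["GRADE"], age_at(r["BIRTH_DATE"], r["COLLECTION_DATE"]), r["SITE"]),
    )


records = [
    {"SAMPLE_ID": "a", "SITE": "Prostate", "GENDER": "M", "GRADE": 7,
     "BIRTH_DATE": date(1905, 6, 1), "COLLECTION_DATE": date(1978, 5, 1)},  # age 72
    {"SAMPLE_ID": "b", "SITE": "Prostate", "GENDER": "M", "GRADE": 6,
     "BIRTH_DATE": date(1900, 1, 1), "COLLECTION_DATE": date(1975, 1, 1)},  # age 75
    {"SAMPLE_ID": "c", "SITE": "Prostate", "GENDER": "M", "GRADE": 8,
     "BIRTH_DATE": date(1920, 1, 1), "COLLECTION_DATE": date(1979, 1, 1)},  # too young
    {"SAMPLE_ID": "d", "SITE": "Prostate", "GENDER": "M", "GRADE": 5,
     "BIRTH_DATE": date(1900, 1, 1), "COLLECTION_DATE": date(1985, 1, 1)},  # after 1980
]
print([r["SAMPLE_ID"] for r in query_2(records)])
```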
Page 10
Key Challenges
Local data dictionaries and associated data models
- Different terms, data types, enumerated values, etc.
- Different meanings and interpretations
Different database product implementations
- FileMaker Pro and Microsoft Access
Maintain the structural integrity of the data models
EDRN CDEs exist for demographic data, but not for specimen data*
- JPL developed common CDEs between the two databases for the specimen data
* As of 12/5/2000
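Differences in enumerated values and data types can be handled with small per-center lookup tables, normalizing values after field names are mapped. The codings and date formats below are hypothetical, since the slides do not list the centers’ actual values. Returning `None` for an unrecognized code flags it for review instead of guessing.

```python
from datetime import date, datetime
from typing import Dict, Optional

# Hypothetical local codings; the real center values are not given in the slides.
GENDER_VALUES: Dict[str, Dict[str, str]] = {
    "moffitt": {"1": "M", "2": "F"},
    "san_antonio": {"Male": "M", "Female": "F"},
}

# Hypothetical date formats as exported from each center's database product.
DATE_FORMATS: Dict[str, str] = {
    "moffitt": "%m/%d/%Y",
    "san_antonio": "%Y-%m-%d",
}


def normalize_gender(center: str, raw: str) -> Optional[str]:
    """Map a center-specific gender code to the shared value, or None if unknown."""
    return GENDER_VALUES[center].get(raw.strip())


def normalize_date(center: str, raw: str) -> date:
    """Parse a center-specific date string into a datetime.date."""
    return datetime.strptime(raw.strip(), DATE_FORMATS[center]).date()


print(normalize_gender("san_antonio", "Female"), normalize_date("moffitt", "03/02/1978"))
```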
Page 11
Next Steps
Focus the implementation of data sharing on defining a robust metadata infrastructure
- Complete the EDRN CDE effort and begin a process of mapping the CDEs to the center databases
- Reuse this mockup experience as an example!
Incorporate feedback from the mockup presentation
Address IRB and security requirements related to data sharing
- Encrypted and de-identified keys
- Network and computer security access
Connect to databases physically located at the centers
- Implement data system interfaces to the remote databases
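One possible way to produce de-identified keys is keyed hashing (HMAC-SHA256): each center derives a stable pseudonym from its local sample ID using a secret it never shares, so the same sample always maps to the same key but the key cannot be turned back into the ID without the secret. This is a swapped-in technique offered for illustration, not the project’s chosen design; note that HMAC is one-way, so if centers need to recover the original ID, they would keep their own lookup table or use reversible encryption instead. The function name and inputs are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(sample_id: str, center: str, secret: bytes) -> str:
    """Derive a stable, non-reversible key for a local sample ID (HMAC-SHA256)."""
    # Including the center name keeps identical local IDs at different centers distinct.
    message = f"{center}:{sample_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# The secret would stay at the center; this value is purely illustrative.
key = pseudonymize("S-0001", "moffitt", b"center-held-secret")
print(key)
```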
Page 12
Acknowledgements
Lynn Anderson, H. Lee Moffitt Cancer Center
Betsy Higgins, University of Texas, San Antonio
Heather Kincaid, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
Mark Thornquist, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
Ziding Feng, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
Greg Downing, Office of Science Policy, Office of the Director, National Institutes of Health
Sudhir Srivastava, National Cancer Institute
Page 13
Backup Slides
Page 14
EDRN Mockup Query Example
Page 15
EDRN Mockup Results – Query 1
Page 16
EDRN Mockup Results – Query 3
Page 17
EDRN Mockup Results – Query 4
Page 18
Detailed Search of Profiles
Page 19
Profiles of EDRN Resources
Diagram: EDRN Resources linked to their Resource Profiles, covering the EDRN Website, DMCC Website, DMCC Sample Interface, Moffitt Product Server, San Antonio Product Server, Moffitt Mockup DB, and San Antonio Mockup DB.
Page 20
EDRN Mockup Data Flow
Diagram components:
- Search Web Page (Web server, search.jsp, Query Client)
- Query Server
- Profile Server jpl.edrn, with the Profile DB
- Product Server edrn.moffitt, with the Moffitt “Mock” Database
- Product Server edrn.sanantonio, with the San Antonio “Mock” Database
- Web EDRN/NCI Resources
Diagram message labels: User query; XSL (profiles or data products formatted); XMLQuery/IIOP (product search); XMLQuery/IIOP (profiles of resources to handle query); XMLQuery/IIOP (data results); XMLQuery/IIOP (profiles or data results as requested); XMLQuery/IIOP (no results), on two flows.