i2b2 National Center for Biomedical Computing
i2b2 Clinical Research Chart and Hive Architecture
Henry Chueh, Shawn Murphy
Isaac Kohane, PI
Summary
• Background
• Intro to the Clinical Research Chart (CRC)
• Hive / Cell Software Architecture
• More details on establishing and using the CRC
Background
• Clinical documentation is…clinical
• Lack of systematic approach for organizing clinical data for research
• Ownership issues are unique
• Consent issues are a challenge
Driving Biological Projects
• Asthma
• Hypertension
• Huntington’s Disease
• Diabetes
Clinical Research Chart (CRC)
• Organize and transform clinical data to maximize its utility for research
• Develop an Application and Database framework to serve this goal
• Establish an architecture that allows data from different studies done on this platform to be integrated
Design of Clinical Research Chart
[Diagram: data sources (HL7 messages, text files, XML documents, databases, clinical trials) feed the CRC DB, supported by the Ontology, Consent/Tracking, Application Pool, and Management services. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program.]
Design of Clinical Research Chart (continued)
[Diagram: data sources (HL7 messages, text files, XML documents, databases, clinical trials) pass through a data pipeline/workflow application into the CRC DB, which holds the Pheno/Genotype Database and supports visualization and analysis of database contents; the Ontology, Consent/Tracking, Application Pool, and Management services support the whole flow. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program.]
i2b2 Skeletal Data Flow
[Diagram: components of the skeletal data flow: enterprise systems (registration, ADT, labs, reports, clinical notes, etc.); the enterprise data source (RPDR); local systems not gathered into enterprise data warehouses; the i2b2 ETL workflow; two Clinical Research Charts, each holding shared data and study-specific data; an annotation UI and annotation service; EDC applications and an EDC service; and an analytic workflow.]
Overall Themes
• Framework to allow development of application services in a maximally decoupled fashion.
• Linux and Windows OS support
• Java and C++ programming languages
• Use cases for construction of the CRC come from the Driving Biological Projects and from experience with clients of the Partners Research Patient Data Registry
Focus on Workflow
• Necessary for both pre-CRC and post-CRC processes
• Needed for scientific flexibility
• Implies a consistent environment for data pipelining and flow control
i2b2 Hive
• Formed as a collection of interoperable Cells, or services
• Loosely coupled
• Makes no assumptions about proximity
• Connected by Web services
• Activated by a workflow engine that forms the basis of choreography among Cells for complex interactions
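The Hive idea (cells addressed only through a service interface, with a workflow engine deciding the order of calls) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not i2b2 code: `Hive`, `register`, and `run` are hypothetical names, each cell is a local function standing in for a Web service call, and the choreography is a plain dependency map run in topological order rather than a Kepler workflow.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

# Hypothetical: a cell is anything that takes a dict payload and returns a dict.
# In the Hive this would be a Web service call; a local function stands in here,
# so the engine makes no assumption about where a cell actually runs.
Cell = Callable[[dict], dict]


class Hive:
    """Toy registry of loosely coupled cells, addressed only by name."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def register(self, name: str, cell: Cell) -> None:
        self._cells[name] = cell

    def run(self, plan: Dict[str, List[str]], payload: dict) -> Dict[str, dict]:
        """Execute a choreography plan.

        `plan` maps each cell name to the cells whose outputs it consumes.
        Cells run in dependency order; each receives the initial payload
        merged with the outputs of its dependencies.
        """
        results: Dict[str, dict] = {}
        for name in TopologicalSorter(plan).static_order():
            inputs = dict(payload)
            for dep in plan.get(name, []):
                inputs.update(results[dep])
            results[name] = self._cells[name](inputs)
        return results


# Usage: a toy "concept extraction" cell (upper-case words only) feeding a counter.
hive = Hive()
hive.register("extract", lambda p: {"concepts": [w for w in p["note"].split() if w.isupper()]})
hive.register("count", lambda p: {"count": len(p["concepts"])})
results = hive.run({"extract": [], "count": ["extract"]}, {"note": "pt reports SOB and WHEEZE"})
print(results["count"])  # {'count': 2}
```

Because the engine knows only cell names and the plan, a cell could be swapped for a remote implementation without touching the plan, which is the loose coupling described above.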
Complex choreography
i2b2 Cell
• Behaves as a functional service
• Separates interactions conceptually into transactions and semantics
• Focuses on facilitating transactions with simple semantics (e.g., datatype)
• Leaves deep semantics to be defined by the services provided by a Cell
• Does not restrict language implementation
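The split between transactions with simple semantics (a datatype) and deep semantics left to the Cell can be illustrated with a small Python sketch. `Message`, `LineEndingCell`, and the datatype strings are hypothetical names chosen for the example; the only point is that the envelope carries a datatype the transaction layer can check, while the body's meaning is decided inside the Cell.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Message:
    """Hypothetical envelope: the transaction layer sees only these fields."""
    datatype: str  # simple semantics every cell understands, e.g. "text/plain"
    body: Any      # deep semantics, interpreted only by the receiving cell


class LineEndingCell:
    """Toy cell offering one simple transformation: CRLF -> LF text conversion."""

    def __init__(self) -> None:
        # Datatypes this cell accepts, each mapped to the code that interprets it.
        self._handlers: Dict[str, Callable[[Any], Message]] = {
            "text/plain": self._normalize_newlines,
        }

    def handle(self, msg: Message) -> Message:
        handler = self._handlers.get(msg.datatype)
        if handler is None:
            # Transaction-level rejection, decided from the datatype alone.
            return Message("error", f"unsupported datatype: {msg.datatype}")
        return handler(msg.body)

    @staticmethod
    def _normalize_newlines(body: str) -> Message:
        return Message("text/plain", body.replace("\r\n", "\n"))


cell = LineEndingCell()
print(cell.handle(Message("text/plain", "line1\r\nline2")))
print(cell.handle(Message("image/png", b"...")))
```

Since nothing outside `handle` depends on how the body is interpreted, the same envelope would work for a Cell written in another language, in line with the last bullet.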
Target layer for i2b2
[Diagram: layer stack, from the bottom up: TCP/IP, Web Services, i2b2 platform, Semantic Objects.]
Cell examples
• Concept extraction from clinical narratives
• Simple transformations; e.g., basic text format conversion
• Complex encoding; e.g., encoding MIAME in MAGE
• Microarray data normalization
• …
Exposing Cells
• Protocols layered on top of SOAP
• At the WSDL level for integrators, i.e., bioinformaticians and software engineers
• At a functional level for investigators
• i2b2 toolkits to allow integrators to expose controlled functionality to investigators (Automator)
Automator Approach
[Diagram: informaticians extend the Kepler workflow engine to build the i2b2 Automator, which investigators use.]
Bird’s eye view
[Diagram: Investigator Portal, workflow engine, and CRC Repository.]
Current Implementation
• Extending Kepler workflow engine for i2b2
• Data model for CRC repository
• Defining protocols necessary for interaction (in addition to SOAP)
• Created Cell for concept extraction from narratives
• Early designs for Automator toolkit
i2b2 Architecture Key Points
• Leverage existing workflow standards and software
• Use Web services as basic form of interaction
• Assume unlimited choreography, but…
• Provide tools to distill complexity into basic automation for clinical investigators
SW Licensing and Distribution
• Commit to Open Source software
• Use GNU Lesser General Public License
• Establish local i2b2 repository exposed through i2b2 website
• Contribute to a more global NCBC SourceForge-style repository if one emerges (NIH Forge?)
• Keep i2b2 protocols fully open
Interoperability across NCBC
• Strongly consider Web services as basic protocol for generic shared interactions
• Consider sharing datasets
• Promote diversity of approach and use of shared software (don’t impose uniformity)
• Facilitate/promote NCBC Open Source project teams
Pre-CRC Data Pipeline/Workflow
Populating the Clinical Research Chart (CRC)
Pre-CRC Data Pipeline/Workflow
• Use the workflow framework to choreograph application services in specific sequences
• Used to extract, transform, conform, and load data and metadata into the CRC
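The extract, transform, conform, and load sequence can be pictured as a chain of stages, each consuming the previous stage's output. The Python sketch below is only an illustration: the stage functions, the sample rows, and the code "LAB:FEV1" are hypothetical, and a plain in-memory list stands in for the CRC database.

```python
from functools import reduce
from typing import Callable, Iterable, List

Record = dict
Stage = Callable[[List[Record]], List[Record]]


def extract(_: List[Record]) -> List[Record]:
    # Hypothetical source rows; a real pipeline would pull from RPDR, HL7 feeds, files, ...
    return [
        {"mrn": "123", "test": "FEV1 ", "value": "2.1"},
        {"mrn": "123", "test": "fev1", "value": "n/a"},
    ]


def transform(rows: List[Record]) -> List[Record]:
    # Parse values into numbers; rows that cannot be parsed are dropped.
    out = []
    for r in rows:
        try:
            out.append({**r, "value": float(r["value"])})
        except ValueError:
            continue
    return out


def conform(rows: List[Record]) -> List[Record]:
    # Map local test names onto one shared code ("LAB:FEV1" is a made-up code).
    return [{**r, "test": "LAB:" + r["test"].strip().upper()} for r in rows]


def make_loader(target: List[Record]) -> Stage:
    def load(rows: List[Record]) -> List[Record]:
        target.extend(rows)  # stand-in for inserting into the CRC database
        return rows
    return load


def run_pipeline(stages: Iterable[Stage]) -> List[Record]:
    # Each stage consumes the previous stage's output, starting from an empty list.
    return reduce(lambda rows, stage: stage(rows), stages, [])


crc: List[Record] = []
run_pipeline([extract, transform, conform, make_loader(crc)])
print(crc)  # [{'mrn': '123', 'test': 'LAB:FEV1', 'value': 2.1}]
```

Keeping each stage a separate function mirrors the idea that the workflow framework, not the stages themselves, fixes the order in which services run.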
Pre-CRC Data Pipeline/Workflow
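[Diagram: input data pass through a series of programs, each run locally or through a SOAP service, becoming increasingly useful on the way to the output; the Ontology, Consent/Tracking, Application Pool, and Management services support the flow. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program.]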
Ontology Service
• Manages mappings of terms to common vocabularies
• Provides lists of acceptable (enumerated) values for various attribute and value slots
• Allows for management of hierarchies, groupings, and relationships between terms
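The three Ontology Service functions (term mapping, enumerated values, hierarchy) map naturally onto three small lookup structures. The Python sketch below is a toy in-memory version under stated assumptions: the class and method names, the local term "current smoker", the Sex_Cd value set, and the parent code "TOBACCO" are all hypothetical; only the concept codes CT-A-SMK and CT-A-NSK come from the smoking use case in this deck.

```python
from typing import Dict, List, Optional, Set


class OntologyService:
    """Toy in-memory version of the three ontology functions."""

    def __init__(self) -> None:
        self.mappings: Dict[str, str] = {}    # normalized local term -> common vocabulary code
        self.enums: Dict[str, Set[str]] = {}  # attribute slot -> acceptable values
        self.parent: Dict[str, str] = {}      # code -> parent code (hierarchy)

    def map_term(self, local_term: str) -> Optional[str]:
        return self.mappings.get(local_term.strip().lower())

    def is_allowed(self, slot: str, value: str) -> bool:
        return value in self.enums.get(slot, set())

    def descendants(self, code: str) -> List[str]:
        # Depth-first walk down the hierarchy.
        found: List[str] = []
        for child, parent in self.parent.items():
            if parent == code:
                found.append(child)
                found.extend(self.descendants(child))
        return found


onto = OntologyService()
onto.mappings["current smoker"] = "CT-A-SMK"
onto.enums["Sex_Cd"] = {"Female", "Male"}
onto.parent.update({"CT-A-SMK": "TOBACCO", "CT-A-NSK": "TOBACCO"})  # "TOBACCO" is made up

print(onto.map_term("Current Smoker "))      # CT-A-SMK
print(onto.is_allowed("Sex_Cd", "Unknown"))  # False
print(onto.descendants("TOBACCO"))           # ['CT-A-SMK', 'CT-A-NSK']
```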
Person Consent/Tracking Service
• Provides mappings between patient/subject identifiers
• Tracks patient/subject consent information
• Allows identification of the patient/subject based upon fuzzy demographic matches
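A minimal Python sketch of the Consent/Tracking functions follows. The matching rule is an assumption chosen for illustration (exact birth date plus `difflib.SequenceMatcher` name similarity above a threshold), not the service's actual algorithm; `Subject`, `ConsentTracker`, the source label "hospital", and the name "Jane Smith" are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


@dataclass
class Subject:
    subject_id: str
    name: str
    birth_date: str  # ISO "YYYY-MM-DD"
    consented: bool  # consent status tracked alongside the identity


class ConsentTracker:
    def __init__(self) -> None:
        self.subjects: List[Subject] = []
        # (source system, local identifier) -> research subject id
        self.id_map: Dict[Tuple[str, str], str] = {}

    def enroll(self, subject: Subject, source: str, local_id: str) -> None:
        self.subjects.append(subject)
        self.id_map[(source, local_id)] = subject.subject_id

    def subject_for(self, source: str, local_id: str) -> Optional[str]:
        return self.id_map.get((source, local_id))

    def fuzzy_match(self, name: str, birth_date: str,
                    threshold: float = 0.85) -> Optional[Subject]:
        """Exact birth date plus approximate name similarity (0..1)."""
        best, best_score = None, 0.0
        for s in self.subjects:
            if s.birth_date != birth_date:
                continue
            score = SequenceMatcher(None, name.lower(), s.name.lower()).ratio()
            if score > best_score:
                best, best_score = s, score
        return best if best_score >= threshold else None


tracker = ConsentTracker()
tracker.enroll(Subject("4", "Jane Smith", "1924-03-04", consented=True), "hospital", "0000004")
print(tracker.subject_for("hospital", "0000004"))  # 4
match = tracker.fuzzy_match("Jane Smyth", "1924-03-04")
print(match.subject_id if match else None)  # 4 (name similarity 0.9)
```

A production matcher would weigh more demographic fields and handle missing values; the threshold here is arbitrary.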
Application Pool (CVS) Service
• Stores programs/scripts used in the pipeline
• Provides applications to be downloaded when needed
• Manages versioning of software
• Provides documentation
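The Application Pool's core job (store programs by name and version, hand out the requested or newest version, keep documentation with each) can be sketched as a versioned dictionary. This Python toy is a stand-in for CVS, not a CVS client; `ApplicationPool`, `check_in`, `fetch`, and the program name "deidentify" are hypothetical.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    # "2.10" -> (2, 10), so versions compare numerically, not as strings.
    return tuple(int(part) for part in text.split("."))


class ApplicationPool:
    """Toy versioned store for pipeline programs (a stand-in for CVS)."""

    def __init__(self) -> None:
        # program name -> version -> (source text, documentation)
        self._store: Dict[str, Dict[Version, Tuple[str, str]]] = {}

    def check_in(self, name: str, version: str, source: str, doc: str = "") -> None:
        self._store.setdefault(name, {})[parse_version(version)] = (source, doc)

    def fetch(self, name: str, version: Optional[str] = None) -> Tuple[str, str]:
        """Return (version, source); the newest version if none is requested."""
        versions = self._store[name]
        key = max(versions) if version is None else parse_version(version)
        return ".".join(map(str, key)), versions[key][0]

    def documentation(self, name: str, version: str) -> str:
        return self._store[name][parse_version(version)][1]


pool = ApplicationPool()
pool.check_in("deidentify", "2.9", "echo 2.9", doc="Removes identifiers")
pool.check_in("deidentify", "2.10", "echo 2.10", doc="Also scrubs dates")
print(pool.fetch("deidentify"))         # ('2.10', 'echo 2.10')
print(pool.fetch("deidentify", "2.9"))  # ('2.9', 'echo 2.9')
```

Parsing versions into integer tuples avoids the string-comparison trap where "2.10" would sort before "2.9".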
Management Service
• Stores workflow execution plan
• Starts and controls workflow execution
• Schedules workflow execution
• Monitors workflow execution and data locations
• Controls permissions associated with workflow execution
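Several of the Management Service responsibilities (storing a plan, starting it, monitoring its status, and checking permissions) are sketched below in Python. Scheduling and data-location tracking are left out to keep it short. `ExecutionPlan`, `ManagementService`, the plan names, and the user "analyst" are hypothetical, and each step is a local callable standing in for a program run by the workflow engine.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Set


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class ExecutionPlan:
    name: str
    steps: List[Callable[[], None]]  # each step stands in for invoking one program
    allowed_users: Set[str]
    status: Status = Status.PENDING
    log: List[str] = field(default_factory=list)


class ManagementService:
    """Toy management service: store, start, monitor, and guard workflow runs."""

    def __init__(self) -> None:
        self._plans: Dict[str, ExecutionPlan] = {}

    def store(self, plan: ExecutionPlan) -> None:
        self._plans[plan.name] = plan

    def start(self, name: str, user: str) -> Status:
        plan = self._plans[name]
        if user not in plan.allowed_users:  # permission check before anything runs
            raise PermissionError(f"{user} may not run {name}")
        plan.status = Status.RUNNING
        for i, step in enumerate(plan.steps):
            try:
                step()
            except Exception as exc:  # stop at the first failing step
                plan.log.append(f"step {i} failed: {exc}")
                plan.status = Status.FAILED
                return plan.status
            plan.log.append(f"step {i} ok")
        plan.status = Status.DONE
        return plan.status

    def monitor(self, name: str) -> Status:
        return self._plans[name].status


def failing_step() -> None:
    raise RuntimeError("source unavailable")


svc = ManagementService()
svc.store(ExecutionPlan("asthma_load", [lambda: None, lambda: None], {"analyst"}))
svc.store(ExecutionPlan("broken", [failing_step], {"analyst"}))
print(svc.start("asthma_load", "analyst"))  # Status.DONE
print(svc.start("broken", "analyst"))       # Status.FAILED
```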
Data Pipeline/Workflow Application: Use Case for Asthma Data
[Diagram: RPDR → data retrieval → data de-identification → language processing → vocabulary matching → load data into mart (CRC DB, Asthma Mart); the Ontology, Consent/Tracking, Application Pool, and Management services support the flow. Legend: data flowing, custom interfaces, SOAP/HTTP interfaces, a program.]
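Three of the asthma pipeline steps (de-identification, a crude stand-in for language processing with vocabulary matching, and shaping rows for loading) can be sketched in Python. Everything here is an illustrative assumption rather than the pipeline actually used: pseudonyms come from an HMAC keyed hash, only m/d/yyyy dates are scrubbed, and "language processing" is reduced to phrase lookup, so a real system would need far more. The key and phrase list are hypothetical; the concept codes are reused from the smoking use case.

```python
import hashlib
import hmac
import re
from typing import Dict, List

# Hypothetical site secret; in practice it would be kept outside the code.
SECRET_KEY = b"replace-with-a-site-secret"

# Hypothetical phrase -> concept mapping, reusing codes from the smoking use case.
VOCABULARY: Dict[str, str] = {
    "never smoked": "CT-A-NSK",
    "current smoker": "CT-A-SMK",
}

DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def deidentify(record: Dict[str, str]) -> Dict[str, str]:
    # Keyed hash: the same MRN always yields the same pseudonym, but the MRN
    # cannot be recovered without the key.
    pseudonym = hmac.new(SECRET_KEY, record["mrn"].encode(), hashlib.sha256).hexdigest()[:8]
    # Very partial scrub: only m/d/yyyy dates are masked in the note text.
    return {"patient_id_e": pseudonym, "note": DATE.sub("[DATE]", record["note"])}


def match_vocabulary(note: str) -> List[str]:
    # Stand-in for language processing: plain case-insensitive phrase lookup.
    text = note.lower()
    return [code for phrase, code in VOCABULARY.items() if phrase in text]


def to_observations(record: Dict[str, str], start_date: str) -> List[Dict[str, str]]:
    # Shape matched concepts as observation rows ready to load into the mart.
    clean = deidentify(record)
    return [{"patient_id_e": clean["patient_id_e"], "concept_cd": code, "start_date": start_date}
            for code in match_vocabulary(clean["note"])]


rows = to_observations({"mrn": "0000004", "note": "Seen 1/1/2002; never smoked."}, "2002-01-01")
print(rows)
```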
Data Pipeline/Workflow: Implementation
• Define a standard XML representation for workflows (MoML)
• Define standards for SOAP services and resource discovery
• Adopt and extend open source workflow package (Kepler)
• Prototypes by July timeframe
• BIRN -> NAMIC and LONI collaboration
• Can follow construction details at http://diagon/i2b2
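MoML describes a workflow as XML entities (actors) whose ports are joined through shared relations. The Python sketch below reads a deliberately simplified MoML fragment with `xml.etree.ElementTree` and recovers which actor feeds which. The composite and relation classes are Ptolemy II class names of the kind MoML files normally reference; the `example.*` actor classes, the entity names, and the rule that ports are named "output" and "input" are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Simplified MoML for illustration; "example.*" actor classes are placeholders.
MOML = """
<entity name="AsthmaPipeline" class="ptolemy.actor.TypedCompositeActor">
  <entity name="Retrieve" class="example.RetrieveActor"/>
  <entity name="Deidentify" class="example.DeidentifyActor"/>
  <relation name="r1" class="ptolemy.actor.TypedIORelation"/>
  <link port="Retrieve.output" relation="r1"/>
  <link port="Deidentify.input" relation="r1"/>
</entity>
"""


def connections(moml: str) -> List[Tuple[str, str]]:
    """Return (from_actor, to_actor) pairs joined through a shared relation.

    Assumes output ports are named "output" and input ports "input",
    which holds for this example but not for MoML in general.
    """
    root = ET.fromstring(moml.strip())
    ends: Dict[str, List[Tuple[str, str]]] = {}
    for link in root.findall("link"):
        actor, port = link.get("port").split(".", 1)
        ends.setdefault(link.get("relation"), []).append((actor, port))
    edges: List[Tuple[str, str]] = []
    for linked in ends.values():
        sources = [a for a, p in linked if p == "output"]
        sinks = [a for a, p in linked if p == "input"]
        edges.extend((s, t) for s in sources for t in sinks)
    return edges


print(connections(MOML))  # [('Retrieve', 'Deidentify')]
```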
Phenotype/Genotype Database
Phenotype/Genotype Database: Principles
• Analytical database schema that does not need to change with new data types and concepts
• Defined the fundamental unit of data (atomic fact) as the observation
• Defined a metadata strategy
• Various levels of de-identification (reviewed and approved by IRB)
Phenotype/Genotype Database: Architecture
observation_fact
• Primary key: Encounter_Id_e (FK2), Patient_Id_e (FK1, FK2), Concept_Cd, Provider_Id, Start_Date, ValType_Cd, TVal_Char, NVal_Num
• Other columns: ValueFlag_Cd, Quantity_Num, Units_Cd, End_Date, Confidence_Num, Observation_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem
visit_dimension
• Primary key: Encounter_Id_e, Patient_Id_e
• Other columns: InOutpt_Cd, Location_Cd, Start_Date, End_Date, Visit_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd
patient_dimension
• Primary key: Patient_Id_e
• Other columns: Vital_Status_Cd, Birth_Date, Death_Date, Sex_Cd, Age_In_Years_Num, Language_Cd, Race_Cd, Marital_Status_Cd, Religion_Cd, Zip_Cd, StateCityZip_Path, Patient_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd
concept_dimension
• Primary key: Concept_Path
• Other columns: Concept_Cd, Name_Char, Concept_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd
provider_dimension
• Primary key: Provider_Path
• Other columns: Provider_Id, Name_Char, Provider_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem
(see preprint)
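The star schema above can be tried out in SQLite to see why the schema "does not need to change with new data types": a numeric lab value and a genotype call are both just rows in observation_fact. This Python sketch uses only a subset of the listed columns; the SQLite column types, the encounter ids, and the concept codes "LAB:FEV1" and "GENE:APOE" are assumptions for illustration.

```python
import sqlite3

# Subset of the columns above; SQLite types are assumptions for illustration.
DDL = """
CREATE TABLE patient_dimension (
    patient_id_e TEXT PRIMARY KEY,
    birth_date TEXT, sex_cd TEXT, race_cd TEXT, death_date TEXT
);
CREATE TABLE concept_dimension (
    concept_path TEXT PRIMARY KEY,
    concept_cd TEXT, name_char TEXT
);
CREATE TABLE observation_fact (
    encounter_id_e TEXT, patient_id_e TEXT, concept_cd TEXT,
    provider_id TEXT, start_date TEXT, valtype_cd TEXT,
    tval_char TEXT, nval_num REAL, units_cd TEXT, confidence_num REAL,
    PRIMARY KEY (encounter_id_e, patient_id_e, concept_cd, provider_id,
                 start_date, valtype_cd, tval_char, nval_num)
);
"""

con = sqlite3.connect(":memory:")
con.executescript(DDL)

# Two very different data types, no schema change: both are observation rows.
# SQLite tolerates NULL in composite primary keys; stricter databases would
# need a sentinel value instead of None for nval_num.
con.executemany(
    "INSERT INTO observation_fact (encounter_id_e, patient_id_e, concept_cd, provider_id,"
    " start_date, valtype_cd, tval_char, nval_num, units_cd, confidence_num)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("E1", "Z234", "LAB:FEV1", "M0022303", "2001-01-01", "N", "", 2.1, "L", None),
        ("E2", "Z234", "GENE:APOE", "M0022303", "2002-01-01", "T", "e4/e4", None, None, None),
    ],
)

count = con.execute("SELECT COUNT(*) FROM observation_fact WHERE patient_id_e = 'Z234'").fetchone()[0]
columns = [row[1] for row in con.execute("PRAGMA table_info(observation_fact)")]
print(count, len(columns))  # 2 10
```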
Phenotype/Genotype Database: Use Case
• Smoking observations represented in database
Observations (observation_fact):
| Patient_id_e | Concept_cd | Start_date | Provider_id | Confidence_num |
|---|---|---|---|---|
| Z234 | CT-A-SMK | 1/1/1997 | M0022303 | 3 |
| Z234 | CT-A-SMK | 1/1/1998 | M0034125 | 9 |
| Z234 | IC9-3051 | 1/1/2001 | M0022303 | 3 |
| Z234 | CT-A-NSK | 1/1/2002 | M0034125 | 9 |

Patient (patient_dimension):
| Patient_id_e | Birth_date | Sex_cd | Race_cd | Death_date |
|---|---|---|---|---|
| Z234 | 3/4/1924 | Female | Black | 4/5/2003 |

Provider (provider_dimension):
| Provider_id | Provider_path | Name_char |
|---|---|---|
| M0022303 | MGH\Neurology\M0022303 | M0022303 |

Concepts (concept_dimension):
| Concept_cd | Concept_path | Name_char |
|---|---|---|
| CT-A-SMK | AsthV1\DRptNLP\Tobacco Use\Smoker | Smoking |
| IC9-3051 | V2\Diagnosis\Mental Disorders (290-319)\Non-psychotic disorders (300-316)\(305) Nondependent abuse of drugs\(305-1) Tobacco use disorder\(305-11) Tobacco use disorder, co~ | Tobacco Use Disorder, continuous use |
| CT-A-NSK | AsthV1\DRptNLP\Tobacco Use\Non smoker | Never smoked |
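The smoking use case can be reproduced in SQLite to show how concept paths support hierarchical queries: all tobacco-related facts are found through the path text rather than a hard-coded list of codes. The rows come from the use-case tables, with dates rewritten in ISO form so they sort as text; the table definitions are trimmed to the shown columns. The second query, which picks the most confident and most recent statement to resolve the conflicting entries, is one possible policy assumed for illustration, not a rule from the deck.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE observation_fact (patient_id_e TEXT, concept_cd TEXT, start_date TEXT,
                               provider_id TEXT, confidence_num REAL);
CREATE TABLE concept_dimension (concept_cd TEXT, concept_path TEXT, name_char TEXT);
""")

# Rows from the use-case tables, with dates rewritten as ISO strings.
con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?)", [
    ("Z234", "CT-A-SMK", "1997-01-01", "M0022303", 3),
    ("Z234", "CT-A-SMK", "1998-01-01", "M0034125", 9),
    ("Z234", "IC9-3051", "2001-01-01", "M0022303", 3),
    ("Z234", "CT-A-NSK", "2002-01-01", "M0034125", 9),
])
con.executemany("INSERT INTO concept_dimension VALUES (?, ?, ?)", [
    ("CT-A-SMK", r"AsthV1\DRptNLP\Tobacco Use\Smoker", "Smoking"),
    ("IC9-3051", r"V2\Diagnosis\Mental Disorders (290-319)\Non-psychotic disorders (300-316)"
                 r"\(305) Nondependent abuse of drugs\(305-1) Tobacco use disorder"
                 r"\(305-11) Tobacco use disorder, co~", "Tobacco Use Disorder, continuous use"),
    ("CT-A-NSK", r"AsthV1\DRptNLP\Tobacco Use\Non smoker", "Never smoked"),
])

# All tobacco-related facts, found via the concept path (LIKE ignores ASCII case).
QUERY = """
    SELECT f.start_date, c.name_char, f.confidence_num
    FROM observation_fact AS f
    JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
    WHERE f.patient_id_e = 'Z234' AND c.concept_path LIKE '%tobacco use%'
"""
history = con.execute(QUERY + " ORDER BY f.start_date").fetchall()
for row in history:
    print(row)

# One possible policy for conflicting statements: highest confidence, then most recent.
best = con.execute(QUERY + " ORDER BY f.confidence_num DESC, f.start_date DESC LIMIT 1").fetchone()
print(best)  # ('2002-01-01', 'Never smoked', 9.0)
```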
Phenotype/Genotype Database: Implementation
• Asthma CRC DB “primed” with data from 90,000 patients from the Research Patient Data Registry
• Serves as fundamental data structure for i2b2 supported data Querying and Visualization Application Suite
• CRC DBs able to fuse seamlessly together
• Various levels of de-identification to be supported for data sharing and publication
Visualization and Analysis of CRC Database
Post-CRC workflow
Visualization and Analysis: Principles
• Supported application suite to query and view CRC database contents
• Outside applications for analysis and viewing able to plug in to the application suite
• Pipeline/Workflow framework may be used for analysis and re-entry of derived data into CRC database
Visualization and Analysis: Architecture
• Supported applications, querying and visualization:
  – Standard querying
  – Data exploration
Visualization and Analysis: Architecture
• Supported applications, ontology management:
  – Ontology management
• Integrate (outside?) population analysis applications
[Screenshot: “i2b2 ontology management” window (File and Edit menus; provenance, mapping, transform, and explain tabs) after the user has picked “seizure disorder”, showing mapping counts: total 10,124; 2004: 5,066; 2005: 5,058.]
Visualization and Analysis: Architecture
• Supported applications have a plug-in architecture for outside analytic tools:
  – Standard web-link support with GET- and POST-oriented data transfer
  – Support for transfer of specifically transformed data to outside applications
  – Complex analysis supported with the workflow application
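The two hand-off styles (a web link with GET parameters, and a POST carrying specifically transformed data) can be sketched with Python's standard `urllib`. The plug-in URL, the parameter names `patient_set` and `concept`, and the tab-separated payload format are all hypothetical; the request is only built, never sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical outside analysis tool; no such endpoint is defined by i2b2.
PLUGIN_URL = "https://analysis.example.org/plugin"


def get_link(patient_set: str, concept_cd: str) -> str:
    # Web-link style hand-off: small parameters travel in the query string.
    return PLUGIN_URL + "?" + urlencode({"patient_set": patient_set, "concept": concept_cd})


def post_request(rows) -> Request:
    # Larger, specifically transformed data travel in a POST body
    # (tab-separated text here). The request is built but not sent.
    body = "\n".join("\t".join(map(str, row)) for row in rows).encode("utf-8")
    return Request(PLUGIN_URL, data=body, method="POST",
                   headers={"Content-Type": "text/tab-separated-values"})


link = get_link("PS-17", "CT-A-SMK")
print(link)  # https://analysis.example.org/plugin?patient_set=PS-17&concept=CT-A-SMK
req = post_request([("Z234", "CT-A-SMK", "1997-01-01"), ("Z234", "CT-A-NSK", "2002-01-01")])
print(req.get_method(), len(req.data))
```

GET keeps the link bookmarkable but limits payload size; POST suits bulk transfer, which is presumably why both are listed.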
Visualization and Analysis: Architecture - Query
Visualization and Analysis: Architecture - Exploration
Visualization and Analysis: Architecture - Ontology mgmt
Visualization and Analysis: Use Case
Visualization and Analysis: Implementation of analysis tools
• Workflow framework to accommodate external analytic applications
[Diagram: workflow around the CRC DB chaining programs identified by ProgID (CA2.3, CX2.3, PN5.1, TH3.0, SN5.4, AA3.3, CN2.3, XN0.9); example values include codes labeled SNOMED CODE (SN8745, PA5683) and the identifiers patient id 0000004, account # 347, and subject id 4.]
Final Assembly
[Diagram: statistics application servers, an ownership manager with encryption, a population registry, and databases of observations (columns: person, concept, date, raw value; example persons Z5937X and Z5956X with dates 3/4, 3/9, 5/2, 4/6). Example content shown: gene expression in APOE 4 allele, Alzheimer's, seizures, diabetes, multiple sclerosis, trauma, surgery, ER visits, clinic visits, outcomes calculated every week, CT scan findings (hemorrhage, thalamus), and encrypted microarray/Gene-Chip data.]