Connecticut is Data Rich but Information Poor
Our Vision: Connecting the Silos
PATH Presentation, CT Data Collaborative, June 2014
• PATH Overview (Rob)
• How PATH Works (April)
• PATH Demo (Laurel and April)
• Why Integrate Data? The MAPS Study (Rob)
• PATH vs Data Ladder (Rob & April)
How PATH Works
Virtual Data Warehouse
Identity Resolution across multiple sources that don’t share a Gold Standard Identifier
HIPAA and FERPA Compliant
Always transfers Fact data separately from Demographic data or Personally Identifiable Information
Data Owners control which data is exported to a location outside of their data center
Data Owners approve all queries
PATH History
Completed Phases
2007 - Established in Statute - Public Act 07-02
2008 - Initial Development as CHIN, inclusion of 4 initial data sources
2009 - Implemented advanced record linkage in a virtual data warehouse
2011 - Scalability to 1M+ individuals, ability to add additional data sources and manage metadata without code modifications, unlimited data sources
2014 - Implemented for P20WIN: 40M Records, 1.6B Data Elements
Now Available to CT Agencies and Organizations as PATH
Data Categories
People Records
Demographic Information such as Name, Address, SSN, DOB, etc.
Also known as PII – Personally Identifiable Information
Fact Records
Education, Health, Labor, etc. Information about a person BUT without the PII
De-Identified or Anonymized Data
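The split between People Records and Fact Records can be pictured with a minimal sketch. Everything named here (the `PII_FIELDS` set, the `split_record` helper, the field names and the sample values) is an assumption made for illustration and is not taken from PATH itself.

```python
# Illustrative sketch only; field names and values are made up, not PATH's schema.
PII_FIELDS = {"last_name", "first_name", "dob", "ssn", "address"}


def split_record(record_locator, row):
    """Split one source row into a People Record (PII) and a Fact Record (no PII)."""
    people = {k: v for k, v in row.items() if k in PII_FIELDS}
    facts = {k: v for k, v in row.items() if k not in PII_FIELDS}
    # The record locator is the only value the two halves share.
    people["record_locator"] = record_locator
    facts["record_locator"] = record_locator
    return people, facts


row = {
    "last_name": "Doe", "first_name": "Jane", "dob": "01/01/1990",
    "ssn": "000-00-0000", "major": "Biology", "degree_code": "BS",
}
people_record, fact_record = split_record("000001", row)
print(fact_record)
```

Because the two halves share nothing but the locator, they can be stored and moved separately, which is the property the slides rely on.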
A Walk Through of How PATH Works
P20WIN Example
Step 1
• PATH Remote Software is installed at each Participating Agency
• The Agency Data Steward uses the PATH Metadata Editor to identify:
  • Table/Record Schema of Agency Data
  • Data at the Field or Table Level marked Available or Unavailable for Download
  • Common Data Element fields used for linking records, which provide Identity Resolution across the different sources
(Diagram: Agency Data at SDE, CCC, CSU and DOL, each with its own Metadata Editor & ETL.)
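The kind of metadata a Data Steward records in Step 1 might look like the sketch below. The slides do not show the Metadata Editor's actual format, so the `FieldMeta` and `TableMeta` classes, the table name and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldMeta:
    name: str
    available: bool = False  # assumed default: Unavailable for Download
    linking: bool = False    # True for Common Data Elements used to link records


@dataclass
class TableMeta:
    table: str
    fields: List[FieldMeta] = field(default_factory=list)

    def downloadable(self) -> List[str]:
        """Names of the fields marked Available for Download."""
        return [f.name for f in self.fields if f.available]

    def linking_fields(self) -> List[str]:
        """Names of the fields used for Identity Resolution."""
        return [f.name for f in self.fields if f.linking]


# Hypothetical table: PII fields are linking-only, fact fields are downloadable.
graduates = TableMeta("graduates", [
    FieldMeta("first_name", linking=True),
    FieldMeta("last_name", linking=True),
    FieldMeta("dob", linking=True),
    FieldMeta("major", available=True),
    FieldMeta("degree_code", available=True),
])
print(graduates.downloadable())
```

Defaulting `available` to `False` is a design choice of this sketch: a field is exported only when someone has explicitly marked it.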
Step 2
• During Remote Initialization the Extract/Transform/Load function of PATH builds a Record Index of the People Records from each Data Source
(Diagram: each agency's Metadata Editor & ETL now produces a Record Index alongside its Agency Data.)
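The slides do not describe how a Record Index is laid out. As a rough sketch under that caveat, it can be thought of as a map from an opaque index to the agency's own record key, from which only the linking fields are later read. The helper names, the `"DOL:0"` index format and the sample rows are assumptions.

```python
def build_record_index(source, people_rows, key_field):
    """Give every People Record an opaque index that points at the agency's own key."""
    return {f"{source}:{n}": row[key_field] for n, row in enumerate(people_rows)}


def common_elements(record_index, people_rows, key_field, linking_fields):
    """For each index, read only the fields used for linking."""
    by_key = {row[key_field]: row for row in people_rows}
    return {
        idx: {name: by_key[key][name] for name in linking_fields}
        for idx, key in record_index.items()
    }


# Made-up People Records for one agency.
dol_people = [
    {"id": "A17", "first_name": "Jane", "last_name": "Doe",
     "dob": "1990-01-01", "phone": "555-0100"},
    {"id": "B42", "first_name": "John", "last_name": "Roe",
     "dob": "1988-06-30", "phone": "555-0101"},
]
dol_index = build_record_index("DOL", dol_people, "id")
dol_elements = common_elements(dol_index, dol_people, "id",
                               ["first_name", "last_name", "dob"])
print(dol_index)
```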
Step 3
• PATH Software is installed at a Main Location; for P20WIN this location is DAS/BEST
(Diagram: Main @ DAS/BEST hosts the Probabilistic Integrator (Pi) and the UI, Security, Workflow and Query Engine, shown alongside the four agencies and their Record Indexes.)
Step 4
During Main Initialization, PATH:
• Extracts Common Data Elements from People Records using each Agency’s Record Index
• Sends them to Main and loads them into Memory ONLY
• Combines multiple records for individuals into Clusters via the Probabilistic Integration Utility
• Builds a Database of Clusters containing only Agency Record Indices
• Writes no Personally Identifiable Agency Data to disk outside of the Agency
(Diagram: Pi at Main @ DAS/BEST now holds a DB of Clustered Indices.)
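The clustering in Step 4 can be sketched as pairwise scoring followed by a union-find merge. This is a stand-in, not Pi: the slides do not describe Pi's algorithm, and the weights, the threshold, the use of `difflib.SequenceMatcher` for string similarity and the sample records are all assumptions. Note that the result contains record indices only.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical weights and threshold; a real linker would estimate these from data.
WEIGHTS = {"first": 0.25, "last": 0.35, "dob": 0.40}
THRESHOLD = 0.85


def similarity(a, b):
    """String similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1, r2):
    """Weighted agreement across the common data elements."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def cluster(records):
    """records: record index -> common data elements. Returns clusters of indices."""
    parent = {idx: idx for idx in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if match_score(records[a], records[b]) >= THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for idx in records:
        groups.setdefault(find(idx), []).append(idx)
    return sorted(sorted(g) for g in groups.values())


# Made-up common data elements, held in memory only.
in_memory = {
    "SDE:0": {"first": "Jon", "last": "Berry", "dob": "1997-02-02"},
    "DOL:7": {"first": "John", "last": "Berry", "dob": "1997-02-02"},
    "CSU:3": {"first": "Gloria", "last": "Gomez", "dob": "1995-05-05"},
}
clusters = cluster(in_memory)
print(clusters)  # [['CSU:3'], ['DOL:7', 'SDE:0']]
```

Comparing every pair is quadratic and is only workable for a toy example; a system handling tens of millions of records would need some way to limit the candidate pairs.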
Step 5
• Use UI features to establish user Roles, Logins, etc.
• Use UI features to:
  • Create a Query
  • Approve a Query
  • Schedule a Query
• Use the Query Engine to build Agency Query Requests; a Query Request uses ONLY Data marked Available for Download
Step 6
The Query Engine uses the Clusters of Indices to:
• Get the needed Agency Record Indices
• Query only Agency Data marked Available for Download
• Transfer only data marked Available for Download to the Main
• Download only Approved Queries
(Diagram: Main @ DAS/BEST now outputs De-identified Integrated Data.)
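The filtering described in Step 6 can be sketched as follows. The `run_query` function, the dictionary layout for agencies, clusters and queries, and the sample values are hypothetical; the sketch only illustrates the two rules from the slide, that unapproved queries do not run and that fields not marked Available for Download never leave the agency.

```python
def run_query(query, clusters, agencies):
    """Return one de-identified row per cluster, built only from permitted fields."""
    if not query["approved"]:
        raise PermissionError("query has not been approved")
    rows = []
    for locator, indices in clusters.items():
        row = {"record_locator": locator}
        for source, key in indices:
            agency = agencies[source]
            facts = agency["facts"].get(key, {})
            for name in query["fields"].get(source, []):
                # Skip anything not marked Available for Download.
                if name in agency["available"] and name in facts:
                    row[name] = facts[name]
        rows.append(row)
    return rows


# Made-up agencies: "gpa" exists at CSU but is not available for download.
agencies = {
    "CSU": {"available": {"major", "degree_code"},
            "facts": {"c1": {"major": "Biology", "degree_code": "BS", "gpa": 3.4}}},
    "DOL": {"available": {"naics_code", "wage"},
            "facts": {"d9": {"naics_code": "621210", "wage": 44852}}},
}
clustered_indices = {"000001": [("CSU", "c1"), ("DOL", "d9")]}
query = {"approved": True, "fields": {"CSU": ["major", "gpa"], "DOL": ["wage"]}}

result = run_query(query, clustered_indices, agencies)
print(result)  # [{'record_locator': '000001', 'major': 'Biology', 'wage': 44852}]
```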
PATH as P20WIN Demo
CCC & CSU
DOL
3 User Roles
• Data Consumer
  • Creates Query
  • Downloads Query Data
• System Admin
  • Approves Query Design
  • Schedules Execution
  • Approves Data Download
• Data Steward
  • Data Management
Query Workflow
1. Create Query (Data Consumer)
2. Approve Query (Agency Sys Admin)
3. Schedule Execution (Agency Sys Admin)
4. Approve Download (Agency Sys Admin)
5. Download Data (Data Consumer)
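The workflow above reads naturally as a small state machine in which each step may only be taken by one role. The sketch below is one way to express that; the state names, role strings, `QueryRequest` class and in-memory audit log are assumptions for illustration, not PATH's implementation.

```python
# current state -> (next state, role allowed to perform the step)
TRANSITIONS = {
    "created": ("approved", "sys_admin"),
    "approved": ("scheduled", "sys_admin"),
    "scheduled": ("download_approved", "sys_admin"),
    "download_approved": ("downloaded", "data_consumer"),
}


class QueryRequest:
    def __init__(self, role):
        if role != "data_consumer":
            raise PermissionError("only a Data Consumer creates queries")
        self.state = "created"
        self.audit_log = [("created", role)]

    def advance(self, role):
        """Move to the next workflow step if this role is allowed to perform it."""
        if self.state not in TRANSITIONS:
            raise ValueError("workflow already finished")
        next_state, required_role = TRANSITIONS[self.state]
        if role != required_role:
            raise PermissionError(f"{required_role} required, got {role}")
        self.state = next_state
        self.audit_log.append((next_state, role))
        return self.state


request = QueryRequest("data_consumer")
for role in ["sys_admin", "sys_admin", "sys_admin", "data_consumer"]:
    request.advance(role)
print(request.state)  # downloaded
```

Keeping the transitions in one table makes it easy to see that a Data Consumer can never approve a query or a download.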
What you don’t get - PII
Last Name    First Name   DOB          SSN
Appel        April        01/01/1999   016-000-9876
Berry        John         02/02/1997   216-000-4576
Carat        Colleen      03/03/1993   119-000-1234
Ernst        Max          04/04/1994   116-000-3456
Gomez        Gloria       05/05/1995   036-000-9999
Hurst        William      06/06/1996   016-000-5599
Keller       Helene       07/07/1997   017-000-2340
Martinez     Pedro        08/08/1998   018-000-9886
Rodriguez    Felix        09/09/1999   029-000-9111
Smith        Peggy        10/10/2000   016-000-8787
CCC & CSU
Record Locator   Gender   Major               Grad Stat   Degree Code
887962           F        English             Yes         BA
87562            M        Physical Ed         No          AA
074125           F        Computer Science    Yes         BS
658741           M        Political Science   Yes         BA
110034           M        Ecology             No          BS
265310           M        Psychology          Yes         BA
035890           F        Math                Yes         BS
010098           M        Biology             Yes         BS
235874           F        Anthropology        No          BA
458712           M        Agriculture         Yes         AS
DOL
Record Locator   NAICS Code   Qtr   Wage
016098           522110       1     $68,245
110012           611410       2     $50,129
778541           511210       2     $11,569
110034           924120       1     $37,250
030099           721211       2     $6,002
582310           443142       2     $12,558
010023           441120       3     $78,852
010098           621210       2     $44,852
556840           923140       2     $25,556
874512           531311       2     $21,215
Record Locator
016098, 210045, 110012, 110034, 030099, 010055, 010023, 010098, 020091, 010087
Data Output
Record Locator   Gender   Major     Grad Stat   Degree Code   NAICS Code   Qtr   Wage
110034           M        Ecology   No          BS            924120       1     $37,250
010098           M        Biology   Yes         BS            621210       2     $44,852
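The Data Output is what an inner join of the two fact tables on Record Locator produces. The sketch below reproduces that with a few rows copied from the CCC & CSU and DOL sample tables; the `join_on_locator` helper and the dictionary keys are assumptions, and the sketch assumes at most one DOL row per locator.

```python
# A few rows from the sample tables, keyed by Record Locator only (no PII).
ccc_csu = [
    {"record_locator": "110034", "gender": "M", "major": "Ecology",
     "grad_stat": "No", "degree_code": "BS"},
    {"record_locator": "265310", "gender": "M", "major": "Psychology",
     "grad_stat": "Yes", "degree_code": "BA"},
    {"record_locator": "010098", "gender": "M", "major": "Biology",
     "grad_stat": "Yes", "degree_code": "BS"},
]
dol = [
    {"record_locator": "016098", "naics_code": "522110", "qtr": 1, "wage": 68245},
    {"record_locator": "110034", "naics_code": "924120", "qtr": 1, "wage": 37250},
    {"record_locator": "010098", "naics_code": "621210", "qtr": 2, "wage": 44852},
]


def join_on_locator(left, right):
    """Inner join: keep only locators present in both tables."""
    # Assumes one row per locator on the right; a later duplicate would overwrite.
    right_by_locator = {r["record_locator"]: r for r in right}
    return [
        {**l, **right_by_locator[l["record_locator"]]}
        for l in left
        if l["record_locator"] in right_by_locator
    ]


output = join_on_locator(ccc_csu, dol)
print([row["record_locator"] for row in output])  # ['110034', '010098']
```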
The Mortality After Prison Study
• Objective: Link DOC’s OBIS system with DPH death records
• 2 specific questions motivating this project:
  • How many former prisoners are dead?
  • What do they die of, and when?
Methods
• Former DOC inmates linked to DPH death records using Pi:
  • DOC releases beginning April 1974
  • DPH deaths 1980 – 2010
  • Matching Fields: first name, last name, sex, date of birth
Causes of Death of Former Inmates
CT Male Population 1999-2012
%     Cause of Death (ICD 10)
9     Atherosclerotic heart disease
7     Malignant neoplasm of bronchus or lung
5.1   Acute myocardial infarction
3.6   Chronic obstructive pulmonary disease
3.4   Atherosclerotic cardiovascular disease
3     Malignant neoplasm of prostate
2.4   Stroke
2.3   Pneumonia
2     Unspecified dementia
1.8   Malignant neoplasm of colon
1.8   Cardiac arrest
1.8   Heart failure
1.7   Sepsis
1.6   Malignant neoplasm of pancreas
1.5   Unspecified diabetes mellitus
1.5   Alzheimer's disease
CT Former Inmates 1999-2012
%     Cause of Death (ICD 10)
8.1   Accidental poisoning by and exposure to narcotics
6.3   Malignant neoplasm of bronchus or lung
4.8   Atherosclerotic heart disease
4.4   Atherosclerotic cardiovascular disease
3.5   Assault by other and unspecified firearm discharge
2.8   Acute myocardial infarction
2.3   Accidental poisoning by drugs, medicaments and biological substances
2.2   Cirrhosis of liver
2.2   Chronic obstructive pulmonary disease
2.1   Intentional self-harm by hanging, strangulation and suffocation
1.9   Person injured in unspecified motor-vehicle accident, traffic
1.4   HIV disease resulting in other specified conditions
1.3   Unspecified diabetes mellitus
1.3   Sepsis
1.1   Liver cell carcinoma
1.1   Cardiomyopathy
1.1   Malignant neoplasm of pancreas
1.1   Hypertensive heart disease
1.1   Intentional self-harm by firearm discharge
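Highlighted causes among CT Former Inmates:
• Accidental poisoning by and exposure to narcotics (8.1%)
• Accidental poisoning by drugs, medicaments and biological substances (2.3%)
• Cirrhosis of liver (2.2%)
• Liver cell carcinoma (1.1%)
Combined: 8.1 + 2.3 + 2.2 + 1.1 = 13.7%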
Why Should the State Care?
• Former DOC inmates are enrolled in Medicaid upon release
• Alcohol and drug abuse/dependence is associated with HIGH rates of ER utilization
• Inpatient hospital costs for substance-dependent patients are significantly higher
• Deaths due to long-term alcohol/drug abuse are costly
PATH Components
• Remote Components
  • Metadata Editor
  • Extract, Transform and Load Module
• Main Components
  • Integration Engine
  • User Interface
  • Security
  • Workflow Module
  • Query Engine with Filtering
(Diagram: on the remote side, each agency's Agency Data, Metadata Editor & ETL and Record Index; at Main @ DAS/BEST, the Probabilistic Integrator (Pi) as Integration Engine, the UI, Security, Workflow and Query Engine, the DB of Clustered Indices and the De-identified Integrated Data.)
PATH Functionality
• Security
  • Personally Identifiable Information never written outside of the Agency Data Center
  • Encrypted transfer of all data
  • PII and Facts never transmitted together
  • Audit logs
  • No Query without Owner’s Approval
• Ease of Use
  • System Administration
  • Data Management
  • Query Filtering
  • Query results delivered as de-identified data
Competitor Components
• Desktop Integration Engine
Competitor Deficits
1. Minimal Security
  • No Encrypted Transfer of Data
  • No Audit Logs
  • Dump of Facts with PII
  • No Secure Logins
  • FTP or Thumb Drive Transfers
  • No Anonymized Data
2. No Access Control - No Approval Workflow
3. No Chain of Custody Assurance - Possibility for Cherry-Picked Data
(Diagram: Copies of Agency Data feed the Integration Engine, which produces PII Visible Integrated Data.)
Take a Test Drive
Get a Login & Password
Quick Start Guide
Test Report Summary
Full Documentation
Full Test Report