Connecticut is Data Rich but Information Poor

Upload: freddy-holter

Post on 01-Apr-2015


Page 1: Connecticut is Data Rich but Information Poor. Our Vision: Connecting the Silos

Connecticut is Data Rich but Information Poor

Page 2

Our Vision: Connecting the Silos

Page 3

PATH Presentation

CT Data Collaborative, June 2014

• PATH Overview (Rob)

• How PATH Works (April)

• PATH Demo (Laurel and April)

• Why Integrate Data? The MAPS Study (Rob)

• PATH vs Data Ladder (Rob & April)

Page 4

How PATH Works

Virtual Data Warehouse

Identity Resolution across multiple sources that don’t share a Gold Standard Identifier

HIPAA and FERPA Compliant

Always transfers Fact data separately from Demographic data or Personally Identifiable Information

Data Owners control which data is exported to a location outside of their data center

Data Owners approve all queries

Page 5

PATH History

Completed Phases

2007 - Established in Statute - Public Act 07-02

2008 - Initial Development as CHIN, inclusion of 4 initial data sources

2009 - Implemented advanced record linkage in a virtual data warehouse

2011 - Scalability to 1M+ individuals, ability to add additional data sources and manage metadata without code modifications, unlimited data sources

2014 - Implemented for P20WIN: 40M Records, 1.6B Data Elements

Now Available to CT Agencies and Organizations as PATH

Page 6

Data Categories

People Records

Demographic Information such as Name, Address, SSN, DOB, etc.

Also known as PII – Personally Identifiable Information

Fact Records

Education, Health, Labor, etc.: information about a person, BUT without the PII

De-Identified or Anonymized Data
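
A minimal sketch of this split, assuming hypothetical field names (`PII_FIELDS`, `split_record`, and the sample row are illustrative and not taken from PATH): each source row is divided into a People Record holding the PII and a Fact Record holding everything else, tied together only by a record locator.

```python
# Hypothetical PII field names; a real deployment would take these from agency metadata.
PII_FIELDS = {"last_name", "first_name", "ssn", "dob", "address"}


def split_record(record_locator, source_row):
    """Split one source row into a People Record (PII) and a Fact Record (no PII)."""
    people = {"record_locator": record_locator}
    facts = {"record_locator": record_locator}
    for field, value in source_row.items():
        target = people if field in PII_FIELDS else facts
        target[field] = value
    return people, facts


# Made-up sample row, for illustration only.
row = {"last_name": "Doe", "first_name": "Jane", "dob": "01/01/1990",
       "major": "English", "degree_code": "BA"}
people_record, fact_record = split_record("000001", row)
print(fact_record)  # holds major and degree_code, but no name or DOB
```

Only the Fact Record would ever be a candidate for de-identified output; the People Record stays with the data owner.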

Page 7

A Walk Through of How PATH Works

P20WIN Example

Page 8

Step 1

• PATH Remote Software installed at each Participating Agency

• Agency Data Steward uses the PATH Metadata Editor to Identify:
  • Table/Record Schema of Agency Data
  • Data at the Field or Table Level marked Available or Unavailable for Download

• Common Data Element fields used for linking records - provides Identity Resolution across the different sources

[Diagram: four Agency Data sources (SDE, CCC, CSU, DOL), each paired with a Metadata Editor & ETL component]
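
The kind of metadata a Data Steward records in Step 1 could be modeled roughly as below. This is only a sketch: `FieldMeta`, the table contents, and the helper names are assumptions made for illustration, since the slides do not show the Metadata Editor's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    name: str
    available_for_download: bool       # the Available / Unavailable flag
    common_data_element: bool = False  # used for linking records across sources


# Hypothetical metadata for one agency table.
STUDENT_TABLE = [
    FieldMeta("last_name", available_for_download=False, common_data_element=True),
    FieldMeta("dob", available_for_download=False, common_data_element=True),
    FieldMeta("major", available_for_download=True),
    FieldMeta("degree_code", available_for_download=True),
    FieldMeta("advisor_notes", available_for_download=False),
]


def downloadable_fields(table):
    """Fields the steward has marked Available for Download."""
    return [f.name for f in table if f.available_for_download]


def linking_fields(table):
    """Common Data Element fields used for identity resolution."""
    return [f.name for f in table if f.common_data_element]


print(downloadable_fields(STUDENT_TABLE))  # ['major', 'degree_code']
print(linking_fields(STUDENT_TABLE))       # ['last_name', 'dob']
```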

Page 9

Step 2

• During Remote Initialization the Extract/Transform/Load function of PATH builds a Record Index of the People Records from each Data Source

[Diagram: four Agency Data sources (SDE, CCC, CSU, DOL), each with a Metadata Editor & ETL component and a Record Index]
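
The slides do not describe the Record Index format. As one possible reading, the sketch below assigns each People Record an opaque index entry that points back to its row; the function name, key format, and sample records are all assumptions.

```python
def build_record_index(source_name, people_records):
    """Assign each People Record an opaque index entry that points back to its row.

    The index holds only the source name and row position, never the PII itself.
    """
    return {
        f"{source_name}-{position:06d}": position
        for position, _record in enumerate(people_records)
    }


# Made-up People Records for one data source.
dol_people = [
    {"last_name": "Doe", "first_name": "Jane"},
    {"last_name": "Roe", "first_name": "Richard"},
]
dol_index = build_record_index("DOL", dol_people)
print(dol_index)  # {'DOL-000000': 0, 'DOL-000001': 1}
```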

Page 10

Step 3

• PATH Software installed at a Main Location - for P20WIN this location is DAS/BEST

[Diagram: four Agency Data sources (SDE, CCC, CSU, DOL), each with a Metadata Editor & ETL component and a Record Index; Main @ DAS/BEST hosts the Probabilistic Integrator (Pi) and the User Interface (UI), Security, Workflow, and Query Engine components]

Page 11

Step 4

During Main Initialization
• Using each Agency’s Record Index, Extracts Common Data Elements from People Records

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi) and UI, Security, Workflow, Query Engine]

Page 12

Step 4

During Main Initialization
• Using each Agency’s Record Index, Extracts Common Data Elements from People Records
• Sends them to Main & Loads into Memory ONLY

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi) and UI, Security, Workflow, Query Engine]

Page 13

Step 4

During Main Initialization
• Extracts Common Data Elements from People Records using each Agency’s Record Index
• Sends them to Main & Loads into Memory ONLY
• Combines multiple records for individuals into Clusters via Probabilistic Integration Utility
• Builds a Database of Clusters containing only Agency Record Indices
• No Personally Identifiable Agency Data written to disk outside of Agency

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices]
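
The clustering idea in Step 4 can be illustrated with a deliberately simple stand-in. This is not Pi's algorithm, which the slides do not describe: the sketch swaps in a weighted string-similarity score with a fixed threshold and a union-find merge. The weights, threshold, index ids, and sample values are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed field weights and cut-off; a real probabilistic linker would estimate these.
WEIGHTS = {"first_name": 0.3, "last_name": 0.4, "dob": 0.3}
THRESHOLD = 0.85


def similarity(a, b):
    """Weighted string similarity over the common data elements (0.0 to 1.0)."""
    return sum(
        weight * SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field, weight in WEIGHTS.items()
    )


def cluster(records):
    """records maps a record index id to its common data elements.

    Returns clusters that contain index ids only, so no PII is persisted.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if similarity(records[a], records[b]) >= THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(ids) for ids in groups.values())


# Made-up common data elements held in memory only.
in_memory = {
    "CCC-1": {"first_name": "Jon", "last_name": "Berry", "dob": "1997-02-02"},
    "DOL-7": {"first_name": "John", "last_name": "Berry", "dob": "1997-02-02"},
    "SDE-3": {"first_name": "Gloria", "last_name": "Gomez", "dob": "1995-05-05"},
}
clusters = cluster(in_memory)
print(clusters)  # [['CCC-1', 'DOL-7'], ['SDE-3']]
```

Note that the returned clusters hold only index ids, which mirrors the point that the cluster database contains Agency Record Indices rather than personal data.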

Page 14

Step 5

• Use UI features to establish user Roles, Login, etc.

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices]

Page 15

Step 5

• Use UI features to establish user Roles, Login, etc.

• Use UI features to:
  • Create a Query
  • Approve a Query
  • Schedule a Query

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices]

Page 16

Step 5

• Use UI features to establish user Roles, Login, etc.

• Use UI features to:
  • Create a Query
  • Approve a Query
  • Schedule a Query

• Use Query Engine to:
  • Build Agency Query Requests
  • Use ONLY Data Available for Download in the Query Request

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices]

Page 17

Step 6

• Query Engine uses Clusters of Indices to get the needed Agency Record Indices
• Queries only Agency Data marked Available for Download
• Transfers only data marked Available for Download to the Main
• Downloads only Approved Queries

[Diagram: agency sources (SDE, CCC, CSU, DOL) with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Probabilistic Integrator (Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices; output is De-identified Integrated Data]
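
A rough sketch of the Step 6 flow, under assumed data shapes (the `run_query` function, the dictionaries, and the sample values are hypothetical): each cluster supplies the per-agency record indices, facts are fetched per agency, and only fields marked Available for Download survive into the result.

```python
def run_query(clusters, agency_data, available, approved):
    """Fetch facts for each cluster, keeping only fields marked available.

    clusters:    list of {agency: record index} mappings
    agency_data: {agency: {record index: fact fields}}
    available:   {agency: set of field names marked Available for Download}
    """
    if not approved:
        raise PermissionError("query has not been approved")
    results = []
    for cluster in clusters:
        row = {}
        for agency, index in cluster.items():
            facts = agency_data[agency][index]
            row.update({k: v for k, v in facts.items() if k in available[agency]})
        results.append(row)  # no index or PII in the output row
    return results


# Made-up example data.
clusters = [{"CCC": "c-17", "DOL": "d-42"}]
agency_data = {
    "CCC": {"c-17": {"major": "Ecology", "advisor_notes": "n/a"}},
    "DOL": {"d-42": {"wage": 37250}},
}
available = {"CCC": {"major"}, "DOL": {"wage"}}

result = run_query(clusters, agency_data, available, approved=True)
print(result)  # [{'major': 'Ecology', 'wage': 37250}]
```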

Page 18

PATH as P20WIN Demo

CCC & CSU

DOL

Page 19

3 User Roles

Data Consumer
• Creates Query
• Downloads Query Data

System Admin
• Approves Query Design
• Schedules Execution
• Approves Data Download

Data Steward
• Data Management

Page 20

Query Workflow

Create Query
• Data Consumer

Approve Query
• Agency Sys Admin

Schedule Execution
• Agency Sys Admin

Approve Download
• Agency Sys Admin

Download Data
• Data Consumer
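
The workflow above can be sketched as a small state machine in which each stage may only be performed by the role the slide assigns to it. The enum, function, and error handling here are illustrative assumptions, not PATH's implementation.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "Create Query"
    APPROVED = "Approve Query"
    SCHEDULED = "Schedule Execution"
    DOWNLOAD_APPROVED = "Approve Download"
    DOWNLOADED = "Download Data"


# Role responsible for each stage, as listed on the slide.
ROLE_FOR_STAGE = {
    Stage.CREATED: "Data Consumer",
    Stage.APPROVED: "Agency Sys Admin",
    Stage.SCHEDULED: "Agency Sys Admin",
    Stage.DOWNLOAD_APPROVED: "Agency Sys Admin",
    Stage.DOWNLOADED: "Data Consumer",
}
ORDER = list(Stage)  # Enum members iterate in definition order


def next_stage(current, role):
    """Return the stage after `current`, or raise if `role` may not perform it."""
    position = ORDER.index(current)
    if position + 1 >= len(ORDER):
        raise ValueError("workflow already complete")
    target = ORDER[position + 1]
    if ROLE_FOR_STAGE[target] != role:
        raise PermissionError(f"{target.value} requires {ROLE_FOR_STAGE[target]}")
    return target


# Walk one query through the full workflow.
stage = Stage.CREATED
for role in ["Agency Sys Admin", "Agency Sys Admin", "Agency Sys Admin", "Data Consumer"]:
    stage = next_stage(stage, role)
print(stage.value)  # Download Data
```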

Page 21

What you don’t get - PII

Last Name   First Name  DOB         SSN
Appel       April       01/01/1999  016-000-9876
Berry       John        02/02/1997  216-000-4576
Carat       Colleen     03/03/1993  119-000-1234
Ernst       Max         04/04/1994  116-000-3456
Gomez       Gloria      05/05/1995  036-000-9999
Hurst       William     06/06/1996  016-000-5599
Keller      Helene      07/07/1997  017-000-2340
Martinez    Pedro       08/08/1998  018-000-9886
Rodriguez   Felix       09/09/1999  029-000-9111
Smith       Peggy       10/10/2000  016-000-8787

CCC & CSU

Record Locator  Gender  Major              Grad Stat  Degree Code
887962          F       English            Yes        BA
87562           M       Physical Ed        No         AA
074125          F       Computer Science   Yes        BS
658741          M       Political Science  Yes        BA
110034          M       Ecology            No         BS
265310          M       Psychology         Yes        BA
035890          F       Math               Yes        BS
010098          M       Biology            Yes        BS
235874          F       Anthropology       No         BA
458712          M       Agriculture        Yes        AS

DOL

Record Locator  NAICS Code  Qtr  Wage
016098          522110      1    $68,245
110012          611410      2    $50,129
778541          511210      2    $11,569
110034          924120      1    $37,250
030099          721211      2    $6,002
582310          443142      2    $12,558
010023          441120      3    $78,852
010098          621210      2    $44,852
556840          923140      2    $25,556
874512          531311      2    $21,215

Record Locator

016098, 210045, 110012, 110034, 030099, 010055, 010023, 010098, 020091, 010087

Data Output

Record Locator  Gender  Major    Grad Stat  Degree Code  NAICS Code  Qtr  Wage
110034          M       Ecology  No         BS           924120      1    $37,250
010098          M       Biology  Yes        BS           621210      2    $44,852
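
The Data Output rows are what an inner join of the CCC & CSU and DOL fact tables on Record Locator would produce. A minimal sketch using a few of the sample rows from this slide (the dictionary layout and the `integrate` function are assumptions for illustration):

```python
# A subset of the sample fact rows from the slide, keyed by Record Locator.
education = {
    "887962": {"gender": "F", "major": "English", "grad_stat": "Yes", "degree_code": "BA"},
    "110034": {"gender": "M", "major": "Ecology", "grad_stat": "No", "degree_code": "BS"},
    "010098": {"gender": "M", "major": "Biology", "grad_stat": "Yes", "degree_code": "BS"},
}
wages = {
    "016098": {"naics_code": "522110", "qtr": 1, "wage": 68245},
    "110034": {"naics_code": "924120", "qtr": 1, "wage": 37250},
    "010098": {"naics_code": "621210", "qtr": 2, "wage": 44852},
}


def integrate(left, right):
    """Inner join: keep only locators present in both fact tables."""
    return [
        {"record_locator": locator, **left[locator], **right[locator]}
        for locator in left
        if locator in right
    ]


output = integrate(education, wages)
print([row["record_locator"] for row in output])  # ['110034', '010098']
```

No name, DOB, or SSN appears anywhere in the joined rows, which is the point of the slide.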

Page 22

The Mortality After Prison Study

• Objective: Link DOC’s OBIS system with DPH death records

• 2 Specific questions motivating this project:
  • How many former prisoners are dead?
  • What do they die of, and when?

Page 23

Methods

• Former DOC inmates linked to DPH death records using Pi
  • DOC releases beginning April 1974
  • DPH deaths 1980 – 2010
  • Matching Fields: first name, last name, sex, date of birth

Page 24

Causes of Death of Former Inmates

CT Male Population 1999-2012

%    Cause of Death (ICD 10)
9    Atherosclerotic heart disease
7    Malignant Neoplasm Of Bronchus Or Lung
5.1  Acute myocardial infarction
3.6  Chronic obstructive pulmonary disease
3.4  Atherosclerotic cardiovascular disease
3    Malignant neoplasm of prostate
2.4  Stroke
2.3  Pneumonia
2    Unspecified dementia
1.8  Malignant neoplasm of colon
1.8  Cardiac arrest
1.8  Heart failure
1.7  Sepsis
1.6  Malignant neoplasm of pancreas
1.5  Unspecified diabetes mellitus
1.5  Alzheimer's disease

CT Former Inmates 1999-2012

%    Cause of Death (ICD 10)
8.1  Accidental poisoning by and exposure to narcotics
6.3  Malignant Neoplasm Of Bronchus Or Lung
4.8  Atherosclerotic heart disease
4.4  Atherosclerotic cardiovascular disease
3.5  Assault by other and unspecified firearm discharge
2.8  Acute myocardial infarction
2.3  Accidental poisoning by drugs, medicaments and biological substances
2.2  Cirrhosis of liver
2.2  Chronic obstructive pulmonary disease
2.1  Intentional self-harm by hanging, strangulation and suffocation
1.9  Person injured in unspecified motor-vehicle accident, traffic
1.4  HIV disease resulting in other specified conditions
1.3  Unspecified diabetes mellitus
1.3  Sepsis
1.1  Liver cell carcinoma
1.1  Cardiomyopathy
1.1  Malignant neoplasm of pancreas
1.1  Hypertensive heart disease
1.1  Intentional self-harm by firearm discharge

Page 26

Causes of Death of Former Inmates (same table as on Page 24)

Causes called out on this slide, from the CT Former Inmates 1999-2012 column:

• Cirrhosis of liver
• Liver cell carcinoma
• Accidental poisoning by and exposure to narcotics
• Accidental poisoning by drugs, medicaments and biological substances

13.7%
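
The 13.7% figure appears to equal the sum of the Former Inmates percentages for the four causes called out above. This is an inference from the table values; the slide does not state how the figure was computed. A quick check:

```python
# Percentages taken from the CT Former Inmates 1999-2012 column.
called_out = {
    "Accidental poisoning by and exposure to narcotics": 8.1,
    "Accidental poisoning by drugs, medicaments and biological substances": 2.3,
    "Cirrhosis of liver": 2.2,
    "Liver cell carcinoma": 1.1,
}
# Round to one decimal place to avoid floating-point noise in the sum.
total = round(sum(called_out.values()), 1)
print(total)  # 13.7
```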

Page 27

Why Should the State Care?

• Former DOC inmates enrolled in Medicaid upon release

• Alcohol and drug abuse/dependence associated with HIGH rates of ER utilization

• Inpatient hospital costs for substance dependent patients significantly higher

• Deaths due to long-term alcohol/drug abuse are costly

Page 28

PATH Components

• Remote Components
  • Metadata Editor
  • Extract, Transform and Load Module

• Main Components
  • Integration Engine
  • User Interface
  • Security
  • Workflow Module
  • Query Engine with Filtering

[Diagram: agency sources, each with Agency Data, Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with the Integration Engine (Probabilistic Integrator - Pi), UI, Security, Workflow, Query Engine, and a DB of Clustered Indices; output is De-identified Integrated Data]

Page 29

PATH Functionality

• Security
  • Personally Identifiable Information never written outside of Agency Data Center
  • Encrypted transfer of all data
  • PII and Facts never transmitted together
  • Audit logs
  • No Query without Owner’s Approval

• Ease of Use
  • System Administration
  • Data Management
  • Query Filtering
  • Query results delivered as de-identified data

[Diagram: the PATH component layout (agency sources with Metadata Editor & ETL and Record Index; Main @ DAS/BEST with the Integration Engine (Probabilistic Integrator - Pi), UI, Security, Workflow, Query Engine; DB of Clustered Indices; De-identified Integrated Data), annotated with callouts: PII & Facts separate transfer; Encrypted transfer; Query Filtering; No PII; Sys Admin approval required; Audit logs; Data Mgmt]

Page 30

Competitor Components

• Remote Components
  • Metadata Editor
  • Extract, Transform and Load Module

• Main Components
  • Integration Engine
  • User Interface
  • Security
  • Workflow Module
  • Query Engine with Filtering

[Diagram: the component layout (agency sources with Metadata Editor & ETL and Record Index; Integration Engine; UI, Security, Workflow, Query Engine; DB of Clustered Indices; De-identified Integrated Data), annotated with callouts: PII & Facts separate transfer; Encrypted transfer; Query Filtering; No PII; Sys Admin approval required; Audit logs; Data Mgmt]

Page 31

Competitor Deficits

Desktop Integration Engine

1. Minimal Security
  • No Encrypted Transfer of Data
  • No Audit Logs
  • Dump of Facts with PII
  • No Secure Logins
  • FTP or Thumb Drive Transfers
  • No Anonymized Data

2. No Access Control - No Approval Workflow

3. No Chain of Custody Assurance – Possibility for Cherry-Picked Data

[Diagram: Copies of Agency Data from four agency sources feed a desktop Integration Engine, producing PII Visible Integrated Data]