www.privacyanalytics.ca | 855.686.4781 [email protected] 251 laurier avenue, suite 200...
TRANSCRIPT
www.privacyanalytics.ca | [email protected]
251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6
© 2014 Privacy Analytics, Inc.
Privacy Analytics
For organizations that want to safeguard and enable data for secondary purposes …
• Automates the masking and de-identification of data using a risk-based approach to anonymization
• Integrated capabilities to anonymize structured and unstructured data from multiple sources
• Peer-reviewed methodologies and value-added services that certify data as de-identified
Our Product and Services
PARAT CORE
PARAT TEXT
Certification Services
Risk Management
Training
Audit re-identification risk using threat models and scenarios
Develop internal expertise around managing re-identification risks
Anonymize unstructured data in text and XML documents.
Automate the measurement of re-identification risk and anonymize data
How PARAT CORE Works
Measure Risk
Select Data
Anonymize Data
Manage Releases
HIPAA De-identification Methods
Our software automates statistical de-identification methods in an integrated way to 1) ensure that we safeguard our customers’ data; and 2) maximize its analytic utility
How We Enable Analytic Utility
[Figure: a patient record assembled from primary structured and unstructured data, internal structured data and external structured data, shown before and after de-identification by the Safe Harbor Method (Data Masking) and Expert Determination (Statistical De-identification and Data Masking).]
Values shown before de-identification:
• Chris Wright • Born Jan 15, 1978 • Zip code: 12345
• MRN: 123 • cwright@
• Income = $82,000 • Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/15/2012 • Discharge date: 08/17/2012
Replacement values shown after de-identification:
• Robert Wong • Born Jan 29, 1978 • Zip code: 12346
• MRN: 589 • rwong@
• Income = $82,000 • Plan # 65123
• EMR data and notes at last PCP visit: Admission date: 08/18/2012 • Discharge date: 08/20/2012
Our Approach: De-identification Taking into Account the Risk of Disclosure
De-identification Process
Set Risk Threshold: Based on the characteristics of the data recipient, the data, and precedents, a quantitative risk threshold is set.
Measure Risk: Based on plausible attacks, appropriate metrics are selected and used to measure actual re-identification risk from the data.
Apply Transformations: If the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.
This is an iterative process. The mitigating controls in place can be strengthened to get a more forgiving threshold.
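The threshold, measure and transform cycle described on this slide can be sketched in a few lines of Python. This is an illustration only, not PARAT CORE's algorithm: the risk metric (1 divided by the size of the smallest group of records sharing the same quasi-identifier values), the 0.5 threshold, the year-of-birth band widths and every function name are assumptions made for the example.

```python
from collections import Counter

def max_risk(records):
    """Assumed metric: 1 / size of the smallest group of identical records."""
    sizes = Counter(records)
    return 1 / min(sizes.values())

def generalize(records, width):
    """Generalization: round year of birth down to a band of `width` years."""
    return [(sex, year - year % width) for sex, year in records]

def suppress(records, threshold):
    """Suppression: drop records whose group is still too small for the threshold."""
    sizes = Counter(records)
    return [r for r in records if 1 / sizes[r] <= threshold]

def deidentify(records, threshold, widths=(1, 5, 10)):
    """Try coarser generalizations in turn; suppress only as a last resort.

    `widths` must not be empty. Returns (records, band width used, rows suppressed).
    """
    for width in widths:
        candidate = generalize(records, width)
        if max_risk(candidate) <= threshold:
            return candidate, width, 0
    kept = suppress(candidate, threshold)
    return kept, widths[-1], len(records) - len(kept)

# Hypothetical input: (sex, year of birth) pairs for ten people.
records = [("M", 1959), ("M", 1969), ("F", 1955), ("M", 1959), ("F", 1942),
           ("F", 1975), ("F", 1966), ("F", 1987), ("M", 1959), ("M", 1967)]
released, width_used, suppressed = deidentify(records, threshold=0.5)
print(width_used, suppressed, max_risk(released))
```

Raising the threshold, which the slide ties to stronger mitigating controls, lets the loop stop at a finer band width or suppress fewer records.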
Re-identification Risk: Example
Direct identifiers: ID, Name, Telephone No. Indirect identifiers: Sex, Year of Birth. Sensitive variables: Lab Test, Lab Result. Other: Pay Delay.

ID  Name               Telephone No.   Sex  Year of Birth  Lab Test                    Lab Result  Pay Delay
1   John Smith         (412) 668-5468  M    1959           Albumin, Serum              4.8         37
2   Alan Smith         (413) 822-5074  M    1969           Creatine Kinase             86          36
3   Alice Brown        (416) 886-5314  F    1955           Alkaline Phosphatase        66          52
4   Hercules Green     (613) 763-5254  M    1959           Bilirubin                   <0          36
5   Alicia Freds       (613) 586-6222  F    1942           BUN/Creatinine Ratio        17          82
6   Gill Stringer      (954) 699-5423  F    1975           Calcium, Serum              9.2         34
7   Marie Kirkpatrick  (416) 786-6212  F    1966           Free Thyroxine Index        2.7         23
8   Leslie Hall        (905) 668-6581  F    1987           Globulin, Total             3.5         9
9   Douglas Henry      (416) 423-5965  M    1959           B-type Natriuretic peptide  134         38
10  Fred Thompson      (416) 421-7719  M    1967           Creatine Kinase             80          21

3: Two quasi-identifiers (Sex = M, Year of Birth = 1959) match in three records within the dataset (IDs 1, 4 and 9).
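The count of three in this example can be reproduced by grouping the table's rows on the two indirect identifiers. The sketch below copies Sex and Year of Birth from the table. The per-record risk of 1 divided by the group size is a common convention used here as an assumption; the slide itself does not state a formula.

```python
from collections import Counter

# (ID, sex, year of birth) copied from the example table; direct identifiers omitted.
rows = [
    (1, "M", 1959), (2, "M", 1969), (3, "F", 1955), (4, "M", 1959), (5, "F", 1942),
    (6, "F", 1975), (7, "F", 1966), (8, "F", 1987), (9, "M", 1959), (10, "M", 1967),
]

# Count how many records share each (sex, year of birth) combination.
class_sizes = Counter((sex, yob) for _, sex, yob in rows)

# The largest group, and the IDs of the records that fall into it.
largest_class, size = class_sizes.most_common(1)[0]
matching_ids = [rid for rid, sex, yob in rows if (sex, yob) == largest_class]

# Assumed convention: a record's risk is 1 / (number of records that look like it).
per_record_risk = {rid: 1 / class_sizes[(sex, yob)] for rid, sex, yob in rows}

print(largest_class, size, matching_ids)
```

Every other record in the table is unique on these two fields, so under this convention its risk is 1.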
Identifiability Spectrum
[Figure: a spectrum running from Little De-identification to Significant De-identification, marked with the values 5, 20, 3, 2, 10, 8, 11 and 16.]
A range of operational precedents exist based on the situational context of the data’s use and available mitigating controls that protect it.
Leading research organizations apply these precedents to data release for secondary purposes. We’ve embedded these precedents into PARAT CORE.
Balancing Privacy with Analytic Utility
PARAT
Why PARAT CORE?
A scalable set of capabilities that enables the release of anonymized data for analysis, while safeguarding personal information to:
Automate
Audit
Analyze
Why Privacy Analytics?
Half of Fortune 50 healthcare companies have engaged Privacy Analytics. And it’s because of our:
Software
• Software and professional services delivered to more than 100 customers
• Serves complex, large heterogeneous and homogeneous data environments
• Support for large structured and unstructured data sets
Methodology
• Defensible and auditable approaches to meet regulatory obligations for Canada and the U.S.
• Methodology, approach and algorithms peer reviewed in leading academic publications
Expertise
• Research into risk and statistical de-identification since 2004
• Recognized by Privacy by Design as an Ambassador
Thank You
Contact name: Title: Phone: Email:
Post-marketing and Public Health Surveillance
EMR Software Vendor
Wanted to anonymize data on 550,000 patients from general practices
Longitudinal data needed to be used for on-going and on-demand analytics
Challenges:
• Significant size of the data set. Held more than five years of clinical, prescription, laboratory, scheduling and billing data of patients
• Numerous release requests from more than 2500 clinics and 5000 physicians
Analytic Outcomes:
De-identified data to analyze:
• Post-marketing surveillance of adverse events
• Public health surveillance
• Prescription pattern analysis
• Health services analysis
Sharing Cancer Data for Health Services Research
Clinical Data
De-identified and linked clinical cancer data with administrative data
For the last few years this was the only mechanism to release microdata
Challenges:
• Highly sensitive data on individual interactions with health system
• Multiple data sources of individual health information
Analytic Outcomes:
• Reduced Ethics Review Board approval for data release from many months to two weeks
• Made linked cancer data available for health services research
• Provided richer levels of individual health information by linking multiple different data sets
Research on Public Health
Public Policy Data Registry
Large linked registry available to researchers and analysts in Canada and abroad
Data sharing needed to meet rigorous requirements of a prescribed registry
Challenges:
• Highly sensitive data on mother and child interactions with health system
• Required a defensible process to release high quality individual-level data
Analytic Outcomes:
• Faster release of data for analysis to researchers and public health with auditable, automated data sharing agreements
• Deeper, richer data sets from which to make public policy decisions
• Streamlined interactions with ethics review
Accelerate Research Using Unstructured Data
National Institutes of Health
Wants to anonymize unstructured text data from more than 400,000 patients
Seeks to augment currently available data in de-identified format
Challenges:
• Large volume of free-form text data on thousands of patients that was difficult to analyze because it could not be shared
• Limits utility of the clinical data
Analytic Outcomes:
De-identified data will allow researchers to:
• Test hypotheses for new research
• Confirm potential sample sizes for proposed research
• Find collaborators for cross-disciplinary research studies
Inspiring Innovation through Competition
Open Clinical Data Competition
6.7M claims from the State of Louisiana; anyone competing would have access
De-identified data that is realistic provides a compelling framework for innovation
Challenges:
• Anonymize a claims database of 200k patients for a competition aimed at improving healthcare
• The data needed to look real, with the same data formats used before anonymization
Analytic Outcomes:
• Enabled researchers to explore new analytic approaches for a large data set
• Established robust anonymization practices, mitigating controls and standard practices to ensure data was used properly
Customer’s Data Landscape
[Figure: customers plotted by data set size (Mid-sized to Large) against data type (Homogeneous to Heterogeneous). The only customer label recovered is State of Louisiana.]
Balancing Privacy with Data Utility
The Analytic Benefits of our Approach
1. Data Quality: Ensuring de-identified data has analytic usefulness by minimizing the amount of distortion while still ensuring that re-identification risk is very small
2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated
3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types
How We Enable Analytic Utility (Before)
[Figure: a patient record assembled from primary structured and unstructured data, internal structured data and external structured data, before de-identification by the Safe Harbor Method (Data Masking) or Expert Determination (Statistical De-identification and Data Masking).]
Values shown:
• Chris Wright • Born Jan 15, 1978 • Zip code: 12345
• MRN: 123 • cwright@
• Income = $82,000 • Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/15/2012 • Discharge date: 08/17/2012
How We Enable Analytic Utility (After)
[Figure: the same patient record after de-identification by the Safe Harbor Method (Data Masking) and Expert Determination (Statistical De-identification and Data Masking), alongside the original values.]
Values shown before de-identification:
• Chris Wright • Born Jan 15, 1978 • Zip code: 12345
• MRN: 123 • cwright@
• Income = $82,000 • Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/15/2012 • Discharge date: 08/17/2012
Replacement values shown after de-identification:
• Robert Wong • Born Jan 29, 1978 • Zip code: 12346
• MRN: 589 • rwong@
• Income = $82,000 • Plan # 65123
• EMR data and notes at last PCP visit: Admission date: 08/18/2012 • Discharge date: 08/20/2012
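In this example the admission and discharge dates move from 08/15/2012 and 08/17/2012 to 08/18/2012 and 08/20/2012. That is consistent with shifting both dates by the same three days, which keeps the two-day length of stay intact for analysis. Below is a minimal sketch of that kind of date shifting. The function name is hypothetical and the offset of 3 is simply read off the slide; the slides do not say how the offset is chosen.

```python
from datetime import date, timedelta

def shift_dates(dates, offset_days):
    """Shift every date by the same offset so intervals between them are preserved."""
    delta = timedelta(days=offset_days)
    return [d + delta for d in dates]

# Original values from the slide.
admission, discharge = date(2012, 8, 15), date(2012, 8, 17)

# Assumed offset of three days, inferred from the before and after values.
shifted_admission, shifted_discharge = shift_dates([admission, discharge], offset_days=3)

print(shifted_admission.strftime("%m/%d/%Y"), shifted_discharge.strftime("%m/%d/%Y"))
print((shifted_discharge - shifted_admission).days)  # length of stay in days
```

Using a single offset for all of a patient's dates keeps intervals such as length of stay unchanged, which is one way a de-identified record can stay useful for analysis.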