www.privacyanalytics.ca | 855.686.4781 info@privacyanalytics.ca 251 Laurier Avenue, Suite 200 Ottawa, Ontario, Canada K1P 5J6

Upload: noel-carr

Post on 17-Dec-2015


TRANSCRIPT

Page 1: Www.privacyanalytics.ca | 855.686.4781 info@privacyanalytics.ca 251 Laurier Avenue, Suite 200 Ottawa, Ontario, Canada K1P 5J6

www.privacyanalytics.ca | info@privacyanalytics.ca

251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6

Page 2:

© 2014 Privacy Analytics, Inc.

Privacy Analytics

For organizations that want to safeguard and enable data for secondary purposes …

• Automates the masking and de-identification of data using a risk-based approach to anonymization

• Integrated capabilities to anonymize structured and unstructured data from multiple sources

• Peer-reviewed methodologies and value-added services that certify data as de-identified

Page 3:

Our Product and Services

• PARAT CORE: Automate the measurement of re-identification risk and anonymize data

• PARAT TEXT: Anonymize unstructured data in text and XML documents

• Certification Services

• Risk Management: Audit re-identification risk using threat models and scenarios

• Training: Develop internal expertise around managing re-identification risks

Page 4:

How PARAT CORE Works

• Measure Risk

• Select Data

• Anonymize Data

• Manage Releases

Page 5:

HIPAA De-identification Methods

Our software automates statistical de-identification methods in an integrated way to 1) ensure that we safeguard our customers’ data; and 2) maximize its analytic utility

Page 6:

How We Enable Analytic Utility

[Figure: an example patient record (Primary Structured and Unstructured Data) assembled from External Structured Data, Internal Structured Data, and Structured & Unstructured Data, then processed with the Safe Harbor Method (Data Masking) and with Expert Determination (Statistical De-identification and Data Masking).]

Values shown for the example record, before and after de-identification:

• Name: Chris Wright, then Robert Wong
• Born: Jan 15, 1978, then Jan 29, 1978
• Zip code: 12345, then 12346
• MRN: 123, then 589
• Email: cwright@, then rwong@
• Income: $82,000 (unchanged)
• Plan #: 54678, then 65123
• EMR data and notes at last PCP visit: Admission date 08/15/2012 and Discharge date 08/17/2012, then Admission date 08/18/2012 and Discharge date 08/20/2012
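In the example on this slide, the admission and discharge dates move from 08/15/2012 and 08/17/2012 to 08/18/2012 and 08/20/2012. Both move by the same three days, so the two-day length of stay survives de-identification. The following is a minimal Python sketch of that idea, not a description of how PARAT does it: the function names, the keyed-hash offset scheme, the 30-day limit, and the `MRN-123` identifier format are all assumptions made for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset(patient_id: str, key: bytes, max_shift: int = 30) -> int:
    """Derive a stable per-patient offset in days, in the range 1..max_shift.

    Hypothetical scheme: a keyed hash makes the offset repeatable for one
    patient but hard to guess without the key.
    """
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % max_shift + 1


def shift_dates(dates, offset_days: int):
    """Move every date by the same number of days so intervals are unchanged."""
    return [d + timedelta(days=offset_days) for d in dates]


admission, discharge = date(2012, 8, 15), date(2012, 8, 17)

# Reproduce the slide's example, where both dates move by three days.
new_admission, new_discharge = shift_dates([admission, discharge], 3)

# In practice the offset might come from something like patient_offset().
offset = patient_offset("MRN-123", key=b"demo-key-not-for-production")
```

Applying one offset per patient, rather than one per date, is what keeps intervals such as length of stay usable for analysis.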

Page 7:

Our Approach: De-identification Taking into Account the Risk of Disclosure

De-identification Process

• Set Risk Threshold: Based on the characteristics of the data recipient, the data, and precedents, a quantitative risk threshold is set.

• Measure Risk: Based on plausible attacks, appropriate metrics are selected and used to measure actual re-identification risk from the data.

• Apply Transformations: If the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.

This is an iterative process. The mitigating controls in place can be strengthened to get a more forgiving threshold.
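The loop described on this slide (set a threshold, measure risk, transform, repeat) can be sketched in Python as below. This is an illustration under stated assumptions rather than the PARAT algorithm: risk is measured here as 1 divided by the size of the smallest equivalence class, generalization coarsens year of birth into wider bands, and suppression drops records in classes that are still too small. The field names (`sex`, `yob`), the band widths, and the 10% suppression cap are hypothetical; the sample values are the Sex and Year of Birth columns from the example table in this deck.

```python
import math
from collections import Counter


def class_sizes(records, qis):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qis) for r in records)


def max_risk(records, qis):
    """Assumed metric: 1 / size of the smallest equivalence class."""
    return 1.0 / min(class_sizes(records, qis).values())


def generalize(records, width):
    """Generalization: coarsen year of birth into bands of `width` years."""
    return [{**r, "yob": r["yob"] // width * width} for r in records]


def suppress(records, qis, k):
    """Suppression: drop records whose class has fewer than k members."""
    sizes = class_sizes(records, qis)
    return [r for r in records if sizes[tuple(r[q] for q in qis)] >= k]


def deidentify(records, qis, threshold, widths=(1, 5, 10, 20), max_suppressed=0.1):
    """Try wider bands until the measured risk meets the threshold."""
    k = math.ceil(1 / threshold)
    for width in widths:
        candidate = generalize(records, width)
        if max_risk(candidate, qis) > threshold:
            kept = suppress(candidate, qis, k)
            dropped = len(records) - len(kept)
            if not kept or dropped > max_suppressed * len(records):
                continue  # too much data lost; generalize further instead
            candidate = kept
        return candidate, width
    raise ValueError("threshold not met; consider stronger mitigating controls")


QIS = ("sex", "yob")
pairs = [("M", 1959), ("M", 1969), ("F", 1955), ("M", 1959), ("F", 1942),
         ("F", 1975), ("F", 1966), ("F", 1987), ("M", 1959), ("M", 1967)]
records = [{"sex": s, "yob": y} for s, y in pairs]

released, width = deidentify(records, QIS, threshold=0.5)
```

With a threshold of 0.5, this sample only passes once years are grouped into 20-year bands and one remaining unique record is suppressed. A more forgiving threshold would need less distortion, which is the trade-off the slide points to when it mentions strengthening mitigating controls.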

Page 8:

Re-identification Risk: Example

Column groups: DIRECT IDENTIFIERS (ID, Name, Telephone No.), INDIRECT IDENTIFIERS (Sex, Year of Birth), SENSITIVE VARIABLES (Lab Test, Lab Result), OTHER (Pay Delay)

ID | Name              | Telephone No.  | Sex | Year of Birth | Lab Test                   | Lab Result | Pay Delay
1  | John Smith        | (412) 668-5468 | M   | 1959          | Albumin, Serum             | 4.8        | 37
2  | Alan Smith        | (413) 822-5074 | M   | 1969          | Creatine Kinase            | 86         | 36
3  | Alice Brown       | (416) 886-5314 | F   | 1955          | Alkaline Phosphatase       | 66         | 52
4  | Hercules Green    | (613) 763-5254 | M   | 1959          | Bilirubin                  | <0         | 36
5  | Alicia Freds      | (613) 586-6222 | F   | 1942          | BUN/Creatinine Ratio       | 17         | 82
6  | Gill Stringer     | (954) 699-5423 | F   | 1975          | Calcium, Serum             | 9.2        | 34
7  | Marie Kirkpatrick | (416) 786-6212 | F   | 1966          | Free Thyroxine Index       | 2.7        | 23
8  | Leslie Hall       | (905) 668-6581 | F   | 1987          | Globulin, Total            | 3.5        | 9
9  | Douglas Henry     | (416) 423-5965 | M   | 1959          | B-type Natriuretic peptide | 134        | 38
10 | Fred Thompson     | (416) 421-7719 | M   | 1967          | Creatine Kinase            | 80         | 21

3: Two quasi-identifiers (Sex = M, Year of Birth = 1959) match in three records within the dataset (IDs 1, 4 and 9).
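The "3" on this slide is the size of one equivalence class: the records that share the same values on the two indirect identifiers. As a rough sketch, the Python below groups the ten example rows by Sex and Year of Birth and derives a per-record risk of 1 divided by class size. That risk formula is a common convention assumed here for illustration; the slide itself does not state which metric is used, and the variable names are hypothetical.

```python
from collections import Counter

# (ID, Sex, Year of Birth) taken from the example table on this slide.
rows = [
    (1, "M", 1959), (2, "M", 1969), (3, "F", 1955), (4, "M", 1959),
    (5, "F", 1942), (6, "F", 1975), (7, "F", 1966), (8, "F", 1987),
    (9, "M", 1959), (10, "M", 1967),
]

# Size of each equivalence class on the two quasi-identifiers.
sizes = Counter((sex, yob) for _, sex, yob in rows)

# Assumed per-record risk: 1 / size of the record's equivalence class.
risk = {rid: 1 / sizes[(sex, yob)] for rid, sex, yob in rows}

largest_class = max(sizes, key=sizes.get)       # the (M, 1959) group
unique_ids = [rid for rid, r in risk.items() if r == 1.0]
average_risk = sum(risk.values()) / len(rows)   # number of classes / number of records

for rid, sex, yob in rows:
    print(rid, sex, yob, f"class size {sizes[(sex, yob)]}")
```

Only IDs 1, 4 and 9 share a class; the other seven records are unique on Sex and Year of Birth, so they would be the first candidates for generalization or suppression.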

Page 9:

Identifiability Spectrum

[Figure: spectrum running from "Little De-identification" to "Significant De-identification", with numbered markers (extracted as 5, 20, 3, 2, 10, 811, 16).]

A range of operational precedents exist based on the situational context of the data’s use and available mitigating controls that protect it.

Page 10:

Identifiability Spectrum

[Figure: spectrum running from "Little De-identification" to "Significant De-identification", with numbered markers (extracted as 5, 20, 3, 2, 10, 811, 16).]

Leading research organizations apply these precedents to data release for secondary purposes. We’ve embedded these precedents into PARAT CORE.

Page 11:

Balancing Privacy with Analytic Utility

Page 12:

PARAT

Page 13:

Why PARAT CORE?

A scalable set of capabilities that enables the release of anonymized data for analysis, while safeguarding personal information to:

Automate

Audit

Analyze

Page 14:

Why Privacy Analytics?

Half of Fortune 50 healthcare companies have engaged Privacy Analytics. And it’s because of our:

Software
• Software and professional services delivered to more than 100 customers
• Serves complex, large heterogeneous and homogenous data environments
• Support for large structured and unstructured data sets

Methodology
• Defensible and auditable approaches to meet regulatory obligations for Canada and the U.S.
• Methodology, approach and algorithms peer reviewed in leading academic publications

Expertise
• Research into risk and statistical de-identification since 2004
• Recognized by Privacy by Design as an Ambassador

Page 15:

Thank You

Contact name:
Title:
Phone:
Email:

Page 16:

Post-marketing and Public Health Surveillance

EMR Software Vendor

Wanted to anonymize data on 550,000 patients from general practices

Longitudinal data needed to be used for on-going and on-demand analytics

Challenges:
• Significant size of the data set. Held more than five years of clinical, prescription, laboratory, scheduling and billing data of patients
• Numerous release requests from more than 2500 clinics and 5000 physicians

Analytic Outcomes:
De-identified data to analyze:
• Post-marketing surveillance of adverse events
• Public health surveillance
• Prescription pattern analysis
• Health services analysis

Page 17:

Sharing Cancer Data for Health Services Research

Clinical Data

De-identified and linked clinical cancer data with administrative data

For the last few years this was the only mechanism to release microdata

Challenges:
• Highly sensitive data on individual interactions with health system
• Multiple data sources of individual health information

Analytic Outcomes:
• Reduced Ethics Review Board approval for data release from many months to two weeks
• Made linked cancer data available for health services research
• Provided richer levels of individual health information by linking multiple different data sets

Page 18:

Research on Public Health

Public Policy Data Registry

Large linked registry available to researchers and analysts in Canada and abroad

Data sharing needed to meet rigorous requirements of a prescribed registry

Challenges:
• Highly sensitive data on mother and child interactions with health system
• Required a defensible process to release high quality individual-level data

Analytic Outcomes:
• Faster release of data for analysis to researchers and public health with auditable, automated data sharing agreements
• Deeper, richer data sets from which to make public policy decisions
• Streamlined interactions with ethics review

Page 19:

Accelerate Research Using Unstructured Data

National Institutes of Health

Wants to anonymize unstructured text data from more than 400,000 patients

Seeks to augment currently available data in de-identified format

Challenges:
• Large volume of free-form text data on thousands of patients that was difficult to analyze because it could not be shared
• Limits utility of the clinical data

Analytic Outcomes:
De-identified data will allow researchers to:
• Test hypotheses for new research
• Confirm potential sample sizes for proposed research
• Find collaborators for cross-disciplinary research studies

Page 20:

Inspiring Innovation through Competition

Open Clinical Data Competition

6.7M claims from the State of Louisiana; anyone competing would have access

De-identified data that is realistic provides a compelling framework for innovation

Challenges:
• Anonymize a claims database of 200k patients for a competition aimed at improving healthcare
• The data needed to look real, with the same data formats used before anonymization

Analytic Outcomes:
• Enabled researchers to explore new analytic approaches for a large data set
• Established robust anonymization practices, mitigating controls and standard practices to ensure data was used properly
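The requirement on this slide that anonymized data "look real, with the same data formats used before anonymization" can be illustrated with a small format-preserving substitution. The Python sketch below replaces each digit with a digit and each letter with a letter of the same case, leaving punctuation in place. It is an assumption-laden toy, not the method used for the competition data: the function name, the key, and the `CLM-2012-000123` claim identifier are hypothetical, and this simple scheme is neither format-preserving encryption nor guaranteed to avoid collisions.

```python
import hashlib
import hmac
import string


def pseudonymize(value: str, key: bytes) -> str:
    """Replace characters with same-class substitutes, keeping the layout.

    Digits stay digits, ASCII letters stay letters of the same case, and
    every other character (spaces, dashes, symbols) is copied unchanged.
    The substitution is driven by a keyed hash of the whole value, so the
    same input and key always give the same output.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)
    return "".join(out)


claim_id = "CLM-2012-000123"  # hypothetical identifier format
masked = pseudonymize(claim_id, key=b"demo-key-not-for-production")
print(claim_id, "becomes", masked)
```

Because the output keeps the original length and character classes, downstream parsers and validation rules written for the real format would still accept it. Whether such masking is enough on its own would still depend on the measured re-identification risk.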

Page 21:

Page 22:

Customer’s Data Landscape

[Figure: chart with one axis "Data set size" (Mid-sized, Large) and the other "Data type" (Homogenous, Heterogeneous); labelled entry: State of Louisiana.]

Page 23:

Balancing Privacy with Data Utility

The Analytic Benefits of our Approach

1. Data Quality: Ensuring de-identified data has analytic usefulness by minimizing the amount of distortion while still ensuring that re-identification risk is very small

2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated

3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types

Page 24:

How We Enable Analytic Utility (Before)

[Figure: an example patient record (Primary Structured and Unstructured Data) assembled from External Structured Data, Internal Structured Data, and Structured & Unstructured Data, shown before the Safe Harbor Method (Data Masking) and Expert Determination (Statistical De-identification and Data Masking) are applied.]

Values shown for the example record:

• Chris Wright
• Born Jan 15, 1978
• Zip code: 12345
• MRN: 123
• cwright@
• Income = $82,000
• Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/15/2012, Discharge date: 08/17/2012

Page 25:

How We Enable Analytic Utility (After)

[Figure: the same example patient record (Primary Structured and Unstructured Data) after the Safe Harbor Method (Data Masking) and Expert Determination (Statistical De-identification and Data Masking) are applied to the External Structured Data, Internal Structured Data, and Structured & Unstructured Data.]

Original values still shown for the example record:

• Chris Wright
• Born Jan 15, 1978
• Zip code: 12345
• MRN: 123
• cwright@
• Income = $82,000
• Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/15/2012, Discharge date: 08/17/2012

De-identified values shown:

• Robert Wong
• Born Jan 29, 1978
• Zip code: 12346
• MRN: 589
• rwong@
• Income = $82,000
• Plan # 65123
• EMR data and notes at last PCP visit: Admission date: 08/18/2012, Discharge date: 08/20/2012