"Future Analytics – Fabrication of Synthetic Data", Dr. Susan Wegner, VP Smart Data...
TRANSCRIPT
Dr. Susan Wegner, Telekom Innovation Laboratories
25 February 2015, BITKOM Big Data Summit, Hanau
Future analytics – Fabrication of Synthetic Data
DATA NATIVES 2015, Dr. Susan Wegner, Telekom Innovation Laboratories
www.laboratories.telekom.com @T_Labs
ACCESS TO DATA IS STILL AN ISSUE DUE TO DIFFERENT TECHNOLOGIES AND DATA SOURCES
Depersonalization approaches

Depersonalization falls into two families:

- Standard anonymization approaches: adaptation of real data using data-manipulation techniques to increase k-anonymity* (perturbation, regression, classification trees, generalization, suppression, replacement, Markov chains, and further methods).
- Synthesization: creation of new data with the same properties using machine learning methods (ongoing research).

*Each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. Source: http://whimsley.typepad.com/whimsley/2011/09/data-anonymization-and-re-identification-some-basics-of-data-privacy.html
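To make the k-anonymity footnote concrete, here is a minimal sketch of how generalization and suppression raise k. The table, quasi-identifiers, and values below are invented for illustration; the slide itself names the techniques but gives no data.

```python
# Hypothetical illustration: measuring the k-anonymity of a small table
# before and after generalization. All records are invented.
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k for which the table is k-anonymous: the size of the
    smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

raw = [
    {"age": 34, "zip": "10115", "diagnosis": "A"},
    {"age": 36, "zip": "10117", "diagnosis": "B"},
    {"age": 52, "zip": "10245", "diagnosis": "A"},
    {"age": 57, "zip": "10247", "diagnosis": "C"},
]

# Generalization: coarsen age to a decade and truncate the ZIP code --
# two of the data-manipulation techniques listed above.
generalized = [
    {"age": r["age"] // 10 * 10, "zip": r["zip"][:3], "diagnosis": r["diagnosis"]}
    for r in raw
]

print(k_anonymity(raw, ["age", "zip"]))          # 1: every record is unique
print(k_anonymity(generalized, ["age", "zip"]))  # 2: each group holds 2 records
```

The tradeoff discussed on the next slide is visible even here: coarsening the quasi-identifiers doubles k but destroys the exact ages and ZIP codes.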
Standard Anonymization Approaches: a tradeoff between anonymity and usefulness

It is not possible to create a perfectly anonymized dataset that is at the same time perfectly useful to researchers. With decreasing intensity of anonymization, the balance shifts between two poles:

Perfect anonymity
- Pro: 100% data privacy
- Con: data loss and distortions can compromise conclusions

Perfect usefulness
- Pro: a maximum of data-based insights is possible
- Con: disclosure of individuals is possible
Standard Anonymization Approaches: combining data sources endangers anonymity

Anonymized Netflix data (2007): 10 million movie ratings from 500,000 customers; personal details were removed and replaced by random numbers.
Public IMDb* data (2007): users who entered movie ratings under their real names.

Combining the sources: users on IMDb using their real names had similar rating patterns in the Netflix data, so it was possible to find all other preferences of those users in the Netflix data.

Danger and conclusion: you can never fully estimate the anonymity of your data using standard approaches.

*IMDb = Internet Movie Database. Source: Narayanan & Shmatikov (2008), Robust De-anonymization of Large Sparse Datasets.
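The linkage idea behind the attack can be sketched in a few lines: for each pseudonymous profile, find the named public profile whose rating pattern matches best. All names, titles, and ratings below are invented; the real attack in Narayanan & Shmatikov used far larger, sparser data and a weighted similarity score, so this is only a toy illustration of the principle.

```python
# Hypothetical sketch of a linkage attack: re-identify a pseudonymized
# user by matching rating patterns against a public, named dataset.

def overlap_score(a, b):
    """Count movies rated identically in both profiles."""
    return sum(1 for movie, rating in a.items() if b.get(movie) == rating)

# "Anonymized" dataset: customer IDs replaced by random numbers.
netflix = {
    8241: {"Heat": 5, "Alien": 4, "Brazil": 2},
    1173: {"Heat": 1, "Alien": 2, "Up": 5},
}

# Public dataset: real names attached to a few ratings.
imdb = {
    "Alice Example": {"Heat": 5, "Alien": 4},
    "Bob Example": {"Up": 5, "Heat": 1},
}

# Link each pseudonym to the best-matching public profile.
for pseudo_id, ratings in netflix.items():
    name = max(imdb, key=lambda n: overlap_score(ratings, imdb[n]))
    print(pseudo_id, "->", name)
```

Once the link is made, every remaining rating under the pseudonym (here, user 8241's rating of "Brazil") is exposed, even though it never appeared in the public dataset.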
ADVANTAGES & DISADVANTAGES OF THE TECHNIQUES

Anonymization: rendering data anonymous means modifying personal data so that information concerning personal or material circumstances can no longer be attributed to an identified or identifiable individual.

Synthetization: actual data is used to develop patterns in which the characteristics of that data are largely retained. These patterns are then used to generate new data that no longer has any reference to an individual in the actual data. Synthetic data makes it possible, for the first time, to use data that was previously unavailable.
Standard Approaches vs. Synthetic Data: synthetic data is not always superior

Main advantages of synthetic data over standard anonymization:
- Creation is fast and easy
- Suitable for real-time provision
- The individual record level is retained
- Unrestricted data transfer
- Unrestricted data storage
- 100% protection of individuals
- No data loss or distortion

Conclusion: no approach is completely superior to the other. Synthetic data beats standard anonymization when the latter leads to data loss, or when it does not allow unrestricted storage and transfer of data due to data-privacy issues or volume restrictions.
First results: comparison of distributions

[Figure: distribution of the variable 'Place' in the source and synthesized data sets, by event type (AIF/MOC, AIF/MTC, AIF/Update Location, IuCS MOC, IuCS MTC); x-axis 0–200,000 cases. Deviations are statistically not significant.]

[Figure: distribution of the variable 'activity' in the source and synthesized data sets; amount of cases for each activity (Activity 1 through Activity 5), source vs. synthesized.]
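The slide's claim that deviations between the source and synthesized distributions are "statistically not significant" suggests a goodness-of-fit check. Below is a minimal sketch of such a check with a Pearson chi-square statistic; the activity counts are invented (the slide does not publish its numbers), and using the source counts directly as the expected frequencies is a simplification of a proper two-sample test.

```python
# Hypothetical significance check: are the synthesized activity counts
# consistent with the source counts? All numbers here are invented.

source      = [150, 120, 90, 60, 80]   # cases per activity, source data
synthesized = [145, 126, 88, 63, 78]   # cases per activity, synthetic data

# Pearson chi-square statistic, using the source counts as the expectation.
chi2 = sum((o - e) ** 2 / e for o, e in zip(synthesized, source))

# Critical value for 4 degrees of freedom (5 categories) at alpha = 0.05.
CRITICAL_4DF_5PCT = 9.488

print(f"chi2 = {chi2:.3f}, significant = {chi2 > CRITICAL_4DF_5PCT}")
```

With these invented counts the statistic stays far below the critical value, which is the shape of result the slide reports for its real data.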
Why synthetic data? The solution/USP:

- Synthetic data has nearly the same quality as the original.
- It cannot be traced back to its origin.
- It is 100% compliant with data privacy.
- Patents pending (disruptive technology).
- It can be stored in any way and transferred to others.
- This makes new services, including individualized services, possible.
Data modelling

From real data to new data in five steps:
1. Collection of several events
2. Clustering
3. Formation of regional patterns (all possible events, regional distribution, local distribution)
4. Probability model
5. Fabrication of synthetic data
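The five steps above can be sketched end to end with the standard library. Everything here is invented for illustration (the regions, activities, and counts are not from the talk, and T-Labs' patented method is certainly more sophisticated); the point is only the shape of the pipeline: group events, estimate a probability model, sample fresh records from it.

```python
# Hypothetical sketch of the five-step pipeline: collect events, cluster
# them by region, fit a probability model, then fabricate synthetic data.
import random
from collections import Counter

random.seed(0)

# 1. Collection of several events: (region, activity) pairs, invented.
events = [("north", "call")] * 50 + [("north", "sms")] * 30 + \
         [("south", "call")] * 10 + [("south", "sms")] * 40

# 2./3. Clustering and regional patterns: group events by region.
by_region = Counter(events)
regions = Counter(region for region, _ in events)

# 4. Probability model: P(region) and P(activity | region).
def activity_dist(region):
    acts = {a: c for (r, a), c in by_region.items() if r == region}
    total = sum(acts.values())
    return {a: c / total for a, c in acts.items()}

# 5. Fabrication of synthetic data: sample a region, then an activity.
def sample_event():
    region = random.choices(list(regions), weights=regions.values())[0]
    dist = activity_dist(region)
    return region, random.choices(list(dist), weights=dist.values())[0]

synthetic = [sample_event() for _ in range(1000)]
# The synthetic events follow the same regional and activity distribution
# as the source, but no synthetic record maps back to an original one.
```

This is exactly the property the earlier slides claim: the distributions match (step 4 preserves them by construction), while the sampled records carry no reference to any individual in the real data.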