Time Series Compressibility and Privacy. VLDB 2007 : Time-Series Data Mining. Presented By Spiros Papadimitriou, Feifei Li, George Kollios, Philip S. Yu (IBM T.J. Watson Research Center, Boston University). 2008-01-18. Summarized By Jaeseok Myung



Time Series Compressibility and Privacy

VLDB 2007 : Time-Series Data Mining

Presented By Spiros Papadimitriou, Feifei Li, George Kollios, Philip S. Yu

IBM T.J. Watson Research Center, Boston University

2008-01-18

Summarized By Jaeseok Myung


Copyright 2008 by CEBT

Motivation

A driver installing a vehicle monitoring system

– May not wish to reveal his exact speed

– But may still allow mining of general driving patterns

Partial “information hiding” via data perturbation, for time series

Center for E-Business Technology IDS Lab. Seminar – 2/13


Perturbation

Introduce uncertainty about individual values by perturbing them

Published value = True value + Perturbation (at each time t)

Figure panels: Random, Deterministic
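As a minimal sketch of the published-value equation on this slide, the Python below adds independent Gaussian noise to each true value. The function name `perturb_iid`, the Gaussian noise model, the parameter `sigma` (noise standard deviation), and the made-up speed series are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def perturb_iid(true_values, sigma, seed=0):
    """Return true values plus i.i.d. Gaussian noise of standard deviation sigma.

    Hypothetical helper: the slides only state
    published value = true value + perturbation.
    """
    rng = np.random.default_rng(seed)
    true_values = np.asarray(true_values, dtype=float)
    perturbation = rng.normal(0.0, sigma, size=true_values.shape)
    return true_values + perturbation

# Made-up "vehicle speed" series: a slow oscillation around 60.
t = np.arange(512)
speed = 60 + 10 * np.sin(2 * np.pi * t / 128)
published = perturb_iid(speed, sigma=2.0)
```

Each published value is off by roughly `sigma` on average, so an individual reading is uncertain while the overall shape of the series remains visible.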


True Value Estimation

Case : Random Perturbation

Assume that an attacker knows the shape of the series with arbitrary accuracy

– Reconstruction via filtering

Case : Deterministic Perturbation

Assume that an attacker has direct access to an arbitrary number of true values

– Reconstruction from true value leaks (regression)
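The two reconstruction attacks on this slide can be illustrated with a rough Python sketch. Everything specific here is an assumption made for illustration: the synthetic series, the moving-average filter standing in for "filtering", the affine map standing in for a "deterministic" perturbation, and the leaked indices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512)
true_values = 60 + 10 * np.sin(2 * np.pi * t / 128)  # made-up smooth series

def rms(a, b):
    """Root-mean-square difference between two series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Case 1: random perturbation, attacked by filtering.
published_random = true_values + rng.normal(0.0, 2.0, size=true_values.size)

def moving_average(y, width=9):
    """Smooth y with a centered moving average (edge values are repeated)."""
    pad = width // 2
    padded = np.pad(y, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")

estimate_filter = moving_average(published_random)

# Case 2: deterministic perturbation (hypothetical affine map),
# attacked by regression on a few leaked true values.
published_det = 1.3 * true_values - 12.0
leak_idx = np.array([3, 40, 77, 200, 350])  # times whose true values leaked
slope, intercept = np.polyfit(published_det[leak_idx], true_values[leak_idx], deg=1)
estimate_regression = slope * published_det + intercept

print("random, before filtering:", rms(published_random, true_values))
print("random, after filtering: ", rms(estimate_filter, true_values))
print("deterministic, after regression:", rms(estimate_regression, true_values))
```

In this toy setting, filtering removes much of the white noise because the series is smooth, and a handful of leaked values is enough to invert the deterministic map. Neither pure strategy holds up against both attacks, which is the tension the slide points at.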


Goals

Partial “information hiding” via data perturbation, for time series

Perturbation adapts to data properties

– Automatically combines “random” and “deterministic” at appropriate scales

Evaluate against both reconstruction attacks

– Filtering

– True value leaks

Suitable for on-the-fly, streaming perturbation


Discord

User-defined utility


General algorithm

0 : Choose a “description” or basis (e.g., FFT, DWT)

1 : Perturb only those coefficients that are “important” in the chosen description

2 : Determine by how much to perturb them
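The three steps can be sketched in Python with the Fourier basis. This is only an illustration under stated assumptions: the slides do not give the selection or allocation rule, so picking the top fraction of coefficients by magnitude (`keep_fraction`), spreading noise evenly over them, and rescaling so that the RMS perturbation equals `sigma` are all stand-ins, and `perturb_fft` is a hypothetical name.

```python
import numpy as np

def perturb_fft(x, sigma, keep_fraction=0.1, seed=0):
    """Add noise only along the largest-magnitude Fourier coefficients.

    Hypothetical sketch: the noise is rescaled so that its RMS over the
    series equals sigma.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Step 0: describe the series in the chosen basis (here, the real FFT).
    coeffs = np.fft.rfft(x)

    # Step 1: pick the "important" coefficients (assumed rule: top-k magnitude).
    mags = np.abs(coeffs)
    mags[0] = 0.0  # give the DC term zero weight so the mean is normally left alone
    k = max(1, int(keep_fraction * len(coeffs)))
    important = np.argsort(mags)[-k:]

    # Step 2: decide how much to perturb them (assumed rule: equal shares).
    noise_coeffs = np.zeros_like(coeffs)
    noise_coeffs[important] = rng.normal(size=k) + 1j * rng.normal(size=k)
    noise = np.fft.irfft(noise_coeffs, n=n)
    noise *= sigma / np.sqrt(np.mean(noise ** 2))  # enforce RMS == sigma

    return x + noise

t = np.arange(512)
x = 60 + 10 * np.sin(2 * np.pi * t / 128)
y = perturb_fft(x, sigma=2.0)
```

Because the noise lives on the same frequencies that carry the signal, a smoothing filter cannot separate the two as easily as it separates white noise from a smooth series.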


Streaming Perturbation

We want to perturb values as they arrive, before seeing the entire series, which grows indefinitely

Step 1 : Coefficients

Step 2 : Noise Allocation
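One simplified way to picture the two steps on a stream is to buffer fixed-size blocks, compute Haar wavelet coefficients per block, and add noise only to the large ones. This block-wise scheme is a stand-in chosen for brevity and is not claimed to be the paper's incremental method; `perturb_stream`, `block`, `threshold`, and the equal noise split are all hypothetical.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of an array whose length is a power of two."""
    out = np.asarray(x, dtype=float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        avg = (out[0:n:2] + out[1:n:2]) / np.sqrt(2)
        diff = (out[0:n:2] - out[1:n:2]) / np.sqrt(2)
        out[:half], out[half:n] = avg, diff
        n = half
    return out

def haar_inverse(c):
    """Inverse of haar_forward."""
    out = np.asarray(c, dtype=float).copy()
    n = 1
    while n < len(out):
        avg, diff = out[:n].copy(), out[n:2 * n].copy()
        out[0:2 * n:2] = (avg + diff) / np.sqrt(2)
        out[1:2 * n:2] = (avg - diff) / np.sqrt(2)
        n *= 2
    return out

def perturb_stream(values, sigma, block=16, threshold=1.0, seed=0):
    """Yield perturbed values block by block (a trailing partial block is dropped)."""
    rng = np.random.default_rng(seed)
    buf = []
    for v in values:
        buf.append(v)
        if len(buf) == block:
            # Step 1: coefficients of the current block.
            c = haar_forward(buf)
            big = np.abs(c) >= threshold
            # Step 2: noise allocation. The transform is orthonormal, so the
            # expected mean squared perturbation per sample is sigma**2.
            if big.any():
                std = sigma * np.sqrt(block / big.sum())
                c[big] += rng.normal(0.0, std, size=big.sum())
            yield from haar_inverse(c)
            buf = []

x = 60 + 10 * np.sin(2 * np.pi * np.arange(512) / 128)
y = np.array(list(perturb_stream(x, sigma=2.0)))
```

Each value is published at most `block - 1` steps after it arrives, and nothing older than the current block needs to be stored. If no coefficient in a block reaches `threshold`, this sketch emits the block unperturbed, which a real scheme would need to handle.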


Experiments

Datasets

Perturbation Methods

– IID

– Fourier-based (FFT)

– Batch wavelet-based (DWT)

– Streaming wavelet-based (str. DWT)


Final Uncertainty


Uncertainty Reduction


True Uncertainty


Conclusion

Partial information hiding via data perturbation

User-defined discord (utility)

Adapts to data properties

– Automatically combines “random” and “deterministic” at appropriate scales

– Additionally preserves spectral properties

Evaluate against both

– Filtering

– True value leaks

Suitable for on-the-fly, streaming perturbation
