Time Series Compressibility and Privacy
VLDB 2007 : Time-Series Data Mining
Presented By Spiros Papadimitriou, Feifei Li, George Kollios, Philip S. Yu
IBM T.J. Watson Research Center, Boston University
2008-01-18
Summarized By Jaeseok Myung
Copyright 2008 by CEBT
Motivation
A driver installing a vehicle monitoring system
May not wish to reveal his exact speed
But still allow mining of general driving patterns
Partial “information hiding” via data perturbation, for time series
Center for E-Business Technology IDS Lab. Seminar – 2/13
Perturbation
Introduce uncertainty about individual values by perturbing them
(Published value = True value + Perturbation) at time t
[Figure: example perturbations, panels “Random” and “Deterministic”]
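As a minimal sketch of the relation (Published value = True value + Perturbation), the following Python adds independent zero-mean noise to each value. The slides do not say how a random perturbation is drawn, so the Gaussian distribution, the function name perturb_iid, and the parameter sigma are assumptions made for illustration only.

```python
import random

def perturb_iid(series, sigma, seed=0):
    """Return published values: each true value plus independent
    zero-mean Gaussian noise with standard deviation sigma (assumed)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]

# Hypothetical speed readings, one per time step t.
true_values = [50.0, 52.0, 55.0, 53.0, 51.0]
published = perturb_iid(true_values, sigma=2.0)

# The perturbation at time t is the gap between published and true value.
perturbation = [y - x for x, y in zip(true_values, published)]
```

A larger sigma hides individual values better but distorts the published series more.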
True Value Estimation
Case : Random Perturbation
Assume that an attacker knows the shape of the series with arbitrary accuracy
– Reconstruction via filtering
Case : Deterministic Perturbation
Assume that an attacker has direct access to an arbitrary number of true values
– Reconstruction from true value leaks (regression)
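The two attacks can be sketched on toy data as follows. This is an illustration under stated assumptions, not the evaluation used in the talk: the smooth sinusoidal series, the moving-average filter, the affine form of the deterministic perturbation, and all function names are hypothetical choices.

```python
import math
import random

def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(1)
n = 400
true_values = [10.0 * math.sin(2 * math.pi * t / 100) for t in range(n)]

# Attack 1: filtering. Independent noise on a smooth series is
# largely averaged away by a low-pass filter.
noisy = [x + rng.gauss(0.0, 2.0) for x in true_values]
filtered = moving_average(noisy, window=9)

# Attack 2: true value leaks. The perturbation here is a deterministic
# (affine) function of the true value, an assumption for illustration.
scaled = [x + (0.3 * x + 4.0) for x in true_values]
leaked = [0, 10, 20, 30, 40]  # time steps whose true values leaked
a, b = fit_line([scaled[i] for i in leaked], [true_values[i] for i in leaked])
recovered = [a * y + b for y in scaled]

print(rmse(noisy, true_values), rmse(filtered, true_values))
print(rmse(recovered, true_values))
```

In this toy setting the filtered estimate is closer to the true series than the published one, and a handful of leaked values is enough for regression to undo the deterministic perturbation.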
Goals
Partial “information hiding” via data perturbation, for time series
Perturbation adapts to data properties
Automatically combines “random” and “deterministic” at appropriate scales
Evaluate against both
Filtering
True value leaks
Suitable for on-the-fly, streaming perturbation
Discord
User-defined utility
General algorithm
0 : Choose a “description” or basis
1 : Perturb only those coefficients that are “important” in the chosen description
2 : Determine by how much to perturb them
[Figure: example bases, panels “FFT” and “DWT”]
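The three steps can be sketched with a Fourier basis as follows. The slides do not give the selection rule or the allocation rule, so several choices here are assumptions: “important” means the largest-energy frequencies, noise energy is split in proportion to coefficient energy, and the perturbation at each chosen frequency is a sinusoid with random phase. The names perturb_fourier, sigma, and keep_fraction are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for short series."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def perturb_fourier(series, sigma, keep_fraction=0.1, seed=0):
    rng = random.Random(seed)
    n = len(series)
    # Step 0: describe the series in the Fourier basis.
    coeffs = dft(series)
    # Positive frequencies only, excluding DC and (for even n) Nyquist.
    energy = {k: abs(coeffs[k]) ** 2 for k in range(1, (n - 1) // 2 + 1)}
    # Step 1: keep the largest-energy frequencies (assumed criterion).
    n_keep = max(1, int(keep_fraction * len(energy)))
    important = sorted(energy, key=energy.get, reverse=True)[:n_keep]
    total = sum(energy[k] for k in important)
    if total == 0.0:
        raise ValueError("series has no oscillating component")
    # Step 2: split the noise energy in proportion to coefficient energy.
    noise = [0.0] * n
    for k in important:
        share = energy[k] / total
        # A sinusoid of amplitude a has mean square a^2 / 2.
        amp = sigma * math.sqrt(2.0 * share)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for t in range(n):
            noise[t] += amp * math.cos(2.0 * math.pi * k * t / n + phase)
    return [x + e for x, e in zip(series, noise)]

n = 64
series = [5.0 * math.sin(2 * math.pi * 4 * t / n)
          + math.sin(2 * math.pi * 9 * t / n) for t in range(n)]
published = perturb_fourier(series, sigma=1.0)
noise = [y - x for x, y in zip(series, published)]
mean_square = sum(e * e for e in noise) / n
```

Because the perturbation sits at the same frequencies as the series itself, a low-pass filter cannot separate the two as easily as it separates a smooth series from independent noise. In this sketch the mean square of the perturbation equals sigma squared.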
Streaming Perturbation
We want to perturb values as they arrive, before seeing the entire series, which grows indefinitely
Step 1 : Coefficients
Step 2 : Noise Allocation
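Step 1 can be illustrated with an incremental wavelet transform that emits coefficients as values arrive, using memory proportional to the number of levels rather than the length of the series. The slides do not name the wavelet or the update scheme, so the orthonormal Haar wavelet and the class name StreamingHaar are assumptions. Step 2 (noise allocation) is not covered by this sketch.

```python
import math

class StreamingHaar:
    """Emit orthonormal Haar detail coefficients as values arrive."""

    def __init__(self):
        # pending[l] holds an unpaired scaling coefficient at level l, or None.
        self.pending = []

    def push(self, value):
        """Consume one value; return the (level, detail) pairs it completes."""
        emitted = []
        level, carry = 0, value
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                # No partner yet at this level: wait for the next arrival.
                self.pending[level] = carry
                break
            left = self.pending[level]
            self.pending[level] = None
            detail = (left - carry) / math.sqrt(2.0)
            carry = (left + carry) / math.sqrt(2.0)
            emitted.append((level + 1, detail))
            level += 1
        return emitted

stream = StreamingHaar()
details = []
for value in [4.0, 2.0, 6.0, 8.0]:
    details.extend(stream.push(value))
```

After the fourth value arrives, the transform has emitted two level-1 details and one level-2 detail, and the remaining scaling coefficient waits at level 2 for later data.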
Experiments
Datasets
Perturbation Methods
IID
Fourier-based (FFT)
Batch wavelet-based (DWT)
Streaming wavelet-based (str. DWT)
Final Uncertainty
Uncertainty Reduction
True Uncertainty
Conclusion
Partial information hiding via data perturbation
User-defined discord (utility)
Adapts to data properties
Automatically combines “random” and “deterministic” at appropriate scales
Additionally preserves spectral properties
Evaluate against both
Filtering
True value leaks
Suitable for on-the-fly, streaming perturbation