Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Spyros Zoumpoulis. Joint work with Michalis Vlachos, Nick Freris and Claudio Lucchese. Mathematical & Computational Sciences. August 18, 2011, IBM ZRL.


Page 1: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Spyros Zoumpoulis

Joint work with Michalis Vlachos, Nick Freris and Claudio Lucchese

Mathematical & Computational Sciences

August 18, 2011

IBM ZRL

Page 2: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Problem

• Want to distribute datasets, but maintain ownership rights

• Want to maintain ownership rights, but also maintain ability to distill useful knowledge out of data

Transformations

Rights Protection

How can we guarantee that the results on the modified and the original datasets are the same?

Spyros Zoumpoulis Watermarking with Preservation of Distance-based Mining 2

[Figure: original distance graph (distance D) → watermarking → new graph]

Page 3: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Problem (continued)

[Figure: original distance graph (distance D) → watermarking → new graph, with the annotation "Change intensity of transformation"]

Page 4: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Roadmap

• Trajectory Datasets

• Objective

• Watermarking Scheme

• Detection Process

• Theoretical Guarantees on Distance Distortion

• Preservation of Nearest Neighbors and Minimum Spanning Tree

• Algorithms for NN and MST Preservation

• Fast algorithms for NN and MST Preservation

• Experiments – Preservation, Speed-up and Resilience against Attacks

• Conclusion


Page 6: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Trajectory Datasets

Easily collected: smartphones, GPS-enabled devices, etc.

• Epidemiology

• Transportation

• Emergency situations

• …

Page 7: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Trajectory Datasets

[Figure: examples of data that can be treated as trajectories: images/shapes, medical, mobility, financial (Microsoft, Yahoo), astronomical, motion/video, handwriting; one panel has a time axis running from 1986 to 2006]


Page 9: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Objective

• Watermark dataset strongly enough so as to right-protect it, weakly enough so that spatial relations between objects are not distorted:

Maintain dataset’s mining utility via distance-based mining operations

• We focus on two topological properties: Nearest Neighbors and Minimum Spanning Tree

• Scheme should
– Provide an ownership determination mechanism for the dataset
– Introduce imperceptible visual distortions on objects
– Be robust to malicious data transformations
– Allow for appropriate tuning of watermarking power, so that distance relations are preserved

[Figure: objects O1, O2, O3, O4]


Page 11: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Watermarking Scheme

Oscars: Academy voting members get watermarked DVDs months before official release

[Figure: copies of the movie, each carrying a different mark: + watermark A, + watermark B, …, + watermark N]

If the movie is leaked on the internet, by examining the movie one can deduce the source of the leak

Page 12: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Watermarking Scheme

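• Object $x = \{x_1, \ldots, x_n\}$, where $x_k = a_k + i\,b_k$

• $X = \mathrm{DFT}(x) = \{X_1, \ldots, X_n\}$, $X_j = \rho_j e^{i\phi_j}$

• Multiplicative watermark embedding $(W, p)$: $\hat{\rho}_j = \rho_j (1 + p W_j)$, $\hat{\phi}_j = \phi_j$, with $0 \le p \le 1$

$$W_j = \begin{cases} 0 & \text{if } j = 1 \\ \in \{-1, 1\} & \text{if the magnitude of coefficient } j \text{ is among the } l \text{ largest} \\ 0 & \text{otherwise} \end{cases} \qquad \sum_{j=1}^{n} W_j = 0$$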
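The embedding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the DFT is taken to be unitary, the slide's index j = 1 is read as the DC coefficient (index 0 in the code), and the helper names `dft`, `idft`, `make_watermark` and `embed`, the sign-shuffling seed, and the sample trajectory are all hypothetical.

```python
import cmath
import random


def dft(x):
    """Unitary DFT of a sequence of complex samples (O(n^2), for illustration)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)) / n ** 0.5
        for j in range(n)
    ]


def idft(X):
    """Inverse of the unitary DFT above."""
    n = len(X)
    return [
        sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n ** 0.5
        for k in range(n)
    ]


def make_watermark(magnitudes, l, seed=0):
    """Balanced +/-1 watermark on the l largest non-DC magnitudes, 0 elsewhere."""
    if l % 2 or l > len(magnitudes) - 1:
        raise ValueError("l must be even and at most n - 1")
    ranked = sorted(range(1, len(magnitudes)), key=lambda j: magnitudes[j], reverse=True)
    signs = [1] * (l // 2) + [-1] * (l // 2)  # equal counts, so the entries sum to 0
    random.Random(seed).shuffle(signs)
    W = [0] * len(magnitudes)
    for j, s in zip(ranked[:l], signs):
        W[j] = s
    return W


def embed(x, W, p):
    """Scale each DFT magnitude by (1 + p * W_j); phases are left untouched."""
    return idft([(1 + p * w) * c for c, w in zip(dft(x), W)])


# Hypothetical 8-point trajectory, encoded as complex numbers a_k + i*b_k.
trajectory = [complex(k, (k * k) % 5) for k in range(8)]
W = make_watermark([abs(c) for c in dft(trajectory)], l=4)
watermarked = embed(trajectory, W, p=0.05)
```

Multiplying a complex coefficient by the positive real factor (1 + p W_j) changes only its magnitude, which is why the phases survive the embedding for p < 1.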

Page 13: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Watermarking Scheme

[Figure: the DFT maps the original trajectory to the frequency domain (magnitude and phase); the watermark is embedded with power p into the magnitudes (modified), while the phases stay the same; the IDFT of the watermarked magnitudes and the unchanged phases gives the watermarked trajectory]

By construction, the mechanism provides resilience to geometric data transformations (rotation, translation, scaling)

Page 14: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Watermarking Scheme


Page 16: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Detection Process

• Given a watermarked dataset and a watermark W, need a measure of "how likely" it is that the dataset was watermarked with W (and not another watermark)

• Detection correlation: $\chi(W, \hat{S}) := W^T \big(\mu(\hat{S}) - \mu(S)\big) / \mu(S)$, with $\mu(\cdot)$ the vector of coefficient magnitudes and the division taken element-wise

• Correlation between watermark W' and dataset watermarked with W is $\chi(W', \hat{S}) = p\, W'^T W \le p\, l$

• Collect correlation statistics, approximate distributions with normals

Threshold rule: decide S was watermarked with W if $\chi(W, \hat{S})$ exceeds a threshold
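A small sketch of a correlation-based detector follows. It assumes that the detection correlation is the inner product of the watermark with the relative change in coefficient magnitudes, and that the original magnitudes are available at detection time; both are assumptions made for illustration, and the slide's exact normalization may differ. The function names, the threshold value and the sample magnitudes are hypothetical.

```python
def detection_correlation(W, magnitudes, magnitudes_hat):
    """Inner product of W with the relative change in each coefficient magnitude."""
    return sum(
        w * (m_hat - m) / m
        for w, m, m_hat in zip(W, magnitudes, magnitudes_hat)
        if w != 0  # coefficients outside the watermark contribute nothing
    )


def is_watermarked_with(W, magnitudes, magnitudes_hat, threshold):
    """Threshold rule: accept W when its correlation reaches the threshold."""
    return detection_correlation(W, magnitudes, magnitudes_hat) >= threshold


# Hypothetical magnitudes, marked with W at embedding power p.
p = 0.1
magnitudes = [5.0, 4.0, 3.0, 2.0, 1.0]
W = [0, 1, -1, 1, -1]
W_other = [0, 1, 1, -1, -1]
magnitudes_hat = [(1 + p * w) * m for m, w in zip(magnitudes, W)]

chi_true = detection_correlation(W, magnitudes, magnitudes_hat)         # p * (W . W)
chi_other = detection_correlation(W_other, magnitudes, magnitudes_hat)  # p * (W_other . W)
```

Under this reading the correct watermark scores p times the number of marked coefficients, while a watermark that overlaps it only partially scores lower, which is what makes a threshold rule workable.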


Page 18: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Theoretical Guarantees on Distance Distortion

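• Goal: preservation of spatial relations between objects

• Distance before watermark, where $\rho_j^x$ and $\phi_j^x$ are the magnitude and phase of the j-th DFT coefficient of x:

$$D^2(x, y) = \sum_{j=1}^{n} \left[ (\rho_j^x)^2 + (\rho_j^y)^2 - 2 \rho_j^x \rho_j^y \cos(\phi_j^x - \phi_j^y) \right]$$

• Distance after watermark:

$$\hat{D}_p^2(x, y) = \sum_{j=1}^{n} (1 + p W_j)^2 \left[ (\rho_j^x)^2 + (\rho_j^y)^2 - 2 \rho_j^x \rho_j^y \cos(\phi_j^x - \phi_j^y) \right]$$

$$= D^2(x, y) + \sum_{j=1}^{n} \left( 2 p W_j + p^2 W_j^2 \right) \left[ (\rho_j^x)^2 + (\rho_j^y)^2 - 2 \rho_j^x \rho_j^y \cos(\phi_j^x - \phi_j^y) \right]$$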

Page 19: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Theoretical Guarantees on Distance Distortion

Theorem. Given $p_{\max} < 1$, for any dataset S and objects $x, y \in S$, we have

$$(1 - p_{\max})\, D(x, y) \le \hat{D}_p(x, y) \le (1 + p_{\max})\, D(x, y)$$

uniformly, for all watermarks consistent with S and embedding powers $0 \le p \le p_{\max}$

Sketch of proof. LB: $\min_{p} \min_{W} \min_{S,\,x,\,y} \hat{D}_p^2(x, y)$; UB: $\max_{p} \max_{W} \max_{S,\,x,\,y} \hat{D}_p^2(x, y)$; both subject to a fixed $D(x, y)$
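The bound is easy to check numerically. The sketch below works directly on DFT coefficient vectors, assuming a unitary transform so that Euclidean distance between coefficient vectors equals the distance between trajectories. The random data, the seed and the helper names are illustrative assumptions; the random watermarks are not forced to sum to zero, which the bound does not need.

```python
import math
import random


def distance(X, Y):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(X, Y)))


def watermark(X, W, p):
    """Multiplicative embedding: scale coefficient j by (1 + p * W_j)."""
    return [(1 + p * w) * c for c, w in zip(X, W)]


def distortion_ratio(X, Y, W, p):
    """Watermarked distance divided by original distance."""
    return distance(watermark(X, W, p), watermark(Y, W, p)) / distance(X, Y)


rng = random.Random(1)
n, p_max = 16, 0.1
ratios = []
for _ in range(200):
    X = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    Y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    W = [0] + [rng.choice((-1, 0, 1)) for _ in range(n - 1)]  # DC entry stays 0
    p = rng.uniform(0, p_max)
    ratios.append(distortion_ratio(X, Y, W, p))

# Every ratio should fall inside [1 - p_max, 1 + p_max].
within_bounds = all(1 - p_max <= r <= 1 + p_max for r in ratios)
```

The extremes are reached when the two objects differ only in coefficients that all carry the same sign of the watermark, which is why the bound can be tight.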


Page 21: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Preservation of Nearest Neighbors and Minimum Spanning Tree

• $\hat{D}_p(x, y)$ is continuous in p, $\hat{D}_0(x, y) = D(x, y)$: for p sufficiently small, any topological property will be preserved

• We focus on Nearest Neighbors and Minimum Spanning Tree because of importance in data analysis

• Given dataset S and object $x \in S$ with Nearest Neighbor $NN(x) \ne x$, x preserves its NN after the watermarking if

$$\hat{D}_p(x, NN(x)) \le \hat{D}_p(x, y), \quad \forall y \in S,\ y \ne x$$

• Given dataset S and objects $x, y \in S$ s.t. (x, y) is an edge of an MST T, (x, y) is preserved in the MST after the watermarking if

$$\hat{D}_p(x, y) \le \hat{D}_p(u, v), \quad \forall u \in U_{x,y},\ v \in V_{x,y}$$

where $U_{x,y}, V_{x,y}$ are the connected components T is split into after edge (x, y) has been removed

[Figure: object x, its nearest neighbor NN(x), and objects y, z]
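Both conditions reduce to comparisons on pairwise distance matrices, as the following sketch shows. The function names and the three-point example are hypothetical, and the "distorted" distances are simply points moved on a line rather than the output of the watermarking scheme.

```python
def nearest_neighbor(dist, i):
    """Index of the object closest to i under the distance matrix dist."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])


def nn_preserved(dist, dist_hat, i):
    """True if the original NN of i is still (one of) the closest after distortion."""
    nn = nearest_neighbor(dist, i)
    return all(dist_hat[i][nn] <= dist_hat[i][j] for j in range(len(dist)) if j != i)


def mst_edge_preserved(dist_hat, edge, U, V):
    """Cut condition: edge (x, y) is still a lightest edge between components U and V."""
    x, y = edge
    return all(dist_hat[x][y] <= dist_hat[u][v] for u in U for v in V)


def line_distances(positions):
    """Pairwise distances between points on a line."""
    return [[abs(a - b) for b in positions] for a in positions]


dist = line_distances([0.0, 1.0, 3.0])      # original; MST edges are (0, 1) and (1, 2)
dist_hat = line_distances([0.0, 1.8, 3.0])  # hypothetical distorted copy

nn_ok = [nn_preserved(dist, dist_hat, i) for i in range(3)]
mst_ok = [
    mst_edge_preserved(dist_hat, (0, 1), U=[0], V=[1, 2]),
    mst_edge_preserved(dist_hat, (1, 2), U=[0, 1], V=[2]),
]
```

In this toy case both MST edges survive while the middle object changes its nearest neighbor, so the two properties have to be checked separately.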


Page 23: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

NN-Preservation Algorithm

NN-P Watermarking Problem. Given dataset D and watermark W, find the largest $p \in [p_{\min}, p_{\max}]$ s.t. at least a fraction 1-τ of the objects in D preserve their NN after the watermarking with W
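One simple way to approximate a solution to the NN-P problem is to scan a grid of embedding powers from p_max downwards and stop at the first one that keeps enough nearest neighbors. The sketch below does exactly that; it is a stand-in for illustration, not the algorithm of the talk (the experiments mention solving quadratic inequalities). Objects are given as DFT coefficient vectors under an assumed unitary transform, and the function names, grid resolution and example data are hypothetical.

```python
import math


def wm_distance(X, Y, W, p):
    """Distance between X and Y after both are watermarked with (W, p)."""
    return math.sqrt(sum(((1 + p * w) * abs(a - b)) ** 2 for a, b, w in zip(X, Y, W)))


def nn_fraction_preserved(objs, W, p):
    """Fraction of objects whose original NN is still closest after watermarking."""
    n = len(objs)
    no_mark = [0] * len(W)
    kept = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nn = min(others, key=lambda j: wm_distance(objs[i], objs[j], no_mark, 0.0))
        d_nn = wm_distance(objs[i], objs[nn], W, p)
        if all(d_nn <= wm_distance(objs[i], objs[j], W, p) for j in others):
            kept += 1
    return kept / n


def largest_power(objs, W, p_min, p_max, tau, steps=100):
    """Largest grid value of p in [p_min, p_max] preserving a fraction >= 1 - tau."""
    for s in range(steps, -1, -1):  # scan from p_max down to p_min
        p = p_min + (p_max - p_min) * s / steps
        if nn_fraction_preserved(objs, W, p) >= 1 - tau:
            return p
    return None  # no grid point satisfies the constraint


# Hypothetical coefficient vectors: a and b differ from x in one coefficient each.
x, a, b = [0, 0, 0], [0, 1, 0], [0, 0, 1.1]
W = [0, 1, -1]
p_star = largest_power([x, a, b], W, p_min=0.0, p_max=0.1, tau=0.0)
```

Scanning from the top matters because preservation need not be monotone in p for a general dataset; a bisection would silently assume that it is.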

Page 24: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

MST-Preservation Algorithm

MST-P Watermarking Problem. Given dataset D and watermark W, find the largest $p \in [p_{\min}, p_{\max}]$ s.t. at least a fraction 1-τ of the edges of an MST of D are preserved in the MST after the watermarking with W


Page 26: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Fast NN-Preservation Algorithm

Corollary. Given $p_{\max} < 1$, for any dataset D and objects $x, y \in$ D, if

$$D(x, y) \ge \frac{1 + p_{\max}}{1 - p_{\max}}\, D(x, NN(x))$$

then y does not violate the NN of x after the watermarking, for all watermarks consistent with D and powers $0 \le p \le p_{\max}$

[Figure: object x, its nearest neighbor NN(x), and objects y1, y2, y3; radius $\rho = \frac{1 + p_{\max}}{1 - p_{\max}}\, D(x, NN(x))$]
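The corollary gives a cheap filter: only objects that fall inside the radius around x need a detailed check. A minimal sketch, assuming the original distances from x are already available as a list; the function name and the sample distances are hypothetical.

```python
def nn_violation_candidates(dist_row, i, p_max):
    """Indices j that the bound cannot rule out as violating the NN of object i.

    dist_row[j] is the original distance from object i to object j.
    """
    if not 0 <= p_max < 1:
        raise ValueError("p_max must lie in [0, 1)")
    others = [j for j in range(len(dist_row)) if j != i]
    nn = min(others, key=lambda j: dist_row[j])
    radius = (1 + p_max) / (1 - p_max) * dist_row[nn]
    # Objects at distance >= radius are safe for every p <= p_max and are skipped.
    return [j for j in others if j != nn and dist_row[j] < radius]


# Hypothetical distances from object 0 to objects 0..4.
row = [0.0, 1.0, 1.01, 1.5, 3.0]
close_calls = nn_violation_candidates(row, 0, p_max=0.01)
```

For small p_max the radius is only slightly larger than the distance to the nearest neighbor, so most objects are pruned before any per-pair computation is done.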

Page 27: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Fast MST-Preservation Algorithm

Corollary. Given $p_{\max} < 1$, for any dataset D and edge e in an MST of D, objects $u \in U_e$, $v \in V_e$, if

$$D(u, v) \ge \frac{1 + p_{\max}}{1 - p_{\max}}\, D(e)$$

then (u,v) does not violate the MST at edge e after the watermarking, for all watermarks consistent with D and powers $0 \le p \le p_{\max}$


Page 29: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Experimental Results - Preservation

Evaluate our technique using visualization. Example of MST preservation:

[Figure: two panels, "MST on original data" and "MST on watermarked data"; panel title: Spanning Tree After Rights Protection]

Page 30: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Experimental Results – Speed-up of 2-3 orders of magnitude

Compare number of operations and time for exhaustive vs. fast algorithms, for datasets of ~1000 objects

• Computations of coefficients of quadratics ($p_{\min} = 0$, $p_{\max} = 0.01$)
– Prune >99.9638% of operations for NN preservation
– Prune >99.9978% of operations for MST preservation

• Quadratic inequalities solved ($p_{\min} = 0$, $p_{\max} = 0.01$)
– Prune >99.9789% of operations for NN preservation
– Prune >99.9987% of operations for MST preservation

• Running time after pre-processing, fast vs. exhaustive ($p_{\min} = 0$, $p_{\max} = 0.01$, $\tau = 0$)
– NN Preservation: 0.5 s vs. 3.7 min
– MST Preservation: 2.8 min vs. 1.4 hrs

Page 31: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Experimental Results – Resilience against Attacks

• Recipient of data may transform data to obfuscate ownership

• Attacks considered
– Geometric transformations (global rotation, translation, scaling)
– Gaussian noise addition (space domain and frequency domain)
– Downsampling/Upsampling

→ Robustness


Page 33: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Conclusion

• Tradeoff: rights protection vs. preservation of mining utility

• Proved fundamental tight bounds on distance distortion due to watermarking

• Leveraged analysis to propose efficient algorithms for NN and MST preservation

• Presented algorithms that identify the max embedding power that preserves NN and MST

• Technique preserves distance properties, is resilient to malicious attacks

• Future work
– Other data transformations
– Provide a unified framework for preservation of general mining algorithms under general data transformations

[Figure: transformations (rights protection, anonymization, compression). How can we guarantee that the results on the modified and the original datasets are the same?]

Page 34: Right Protection via Watermarking with Provable Preservation of Distance-based Mining

Preservation of Nearest Neighbors and Minimum Spanning Tree

MST preservation does not imply NN preservation…

…and vice versa
