self-identifying sensor data - jeff vaughan · parameter check bit allows for parameter recovery....

62
Self-Identifying Sensor Data Jeffrey Vaughan Department of Computer Science Harvard University With Stephen Chong (Harvard) and Christian Skalka (UVM) IPSN April 13, 2010 1/23

Upload: others

Post on 01-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Self-Identifying Sensor Data

Jeffrey Vaughan

Department of Computer ScienceHarvard University

With Stephen Chong (Harvard) and Christian Skalka (UVM)

IPSNApril 13, 2010

1/23

Page 2: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Example: Hubbard Brook Ecosystem Study

2/23

Page 3: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Example: Hubbard Brook Ecosystem Study

2/23

Page 4: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Data sharing: a tension

Scientists want their work to be used and cited.But don’t want use without attribution.

GoalTrack data provenance in a light-weight, cooperative way.

3/23

Page 5: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

4/23

Page 6: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

4/23

Page 7: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

4/23

Page 8: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

4/23

Page 9: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

4/23

Page 10: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

4/23

Page 11: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

4/23

Page 12: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

4/23

Page 13: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

Round

90

3020

4/23

Page 14: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

Round

90

3020

Reorder

435220

4/23

Page 15: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

Round

90

3020

Reorder

435220

4/23

Page 16: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

Round

90

3020

Reorder

435220

ProvenaceMark: 54321

Recover

4/23

Page 17: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Key Idea: Encode provenance in insignificant bits.

Sensorw/ error±1mm

31234423818

123275435493

Snow (μm)

ProvenaceMark: 54321

185338425215

Publish 31218523338

123425435215

Snow (μm)

Sample

Round

90

3020

Reorder

435220

ProvenaceMark: 54321

Recover

Lookup

4/23

Page 18: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

The scientific/educational threat model is unusual.

Scientists. . .

are cooperative: they want to properly cite and attributedata.

use legacy workflows, and will be harmed byschema/format changes.

care a lot about data integrity, but understand the boundson measurement precision and accuracy.

5/23

Page 19: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Outline

1 Introduction

2 Encoding and Decoding

3 Evaluation

4 Conclusion

6/23

Page 20: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding and Decoding

7/23

Page 21: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Self-identifying data based on three functions

Encode : Dataset ×Mark → AnnotatedDataset

Check : Dataset ×Mark → bool

Recover : Dataset → Mark

Where

Mark , int32Datapoint , int

Dataset , AnnotatedDataset , List of Datapoint

8/23

Page 22: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Self-identifying data based on three functions

Encode : Dataset ×Mark → AnnotatedDataset

Check : Dataset ×Mark → bool (cf. One-bit watermarking)

Recover : Dataset → Mark (cf. Blind watermarking)

Where

Mark , int32Datapoint , int

Dataset , AnnotatedDataset , List of Datapoint

8/23

Page 23: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Data is collected for scientific analysis.

Sensors measure real-world phenomena.High-order significant bits contain meaningful data.Low-order insignificant bits contain noise.Acceptable encodings should be minimally perceptible.

Guiding principle

Never alter significant bits.

9/23

Page 24: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must be robust against transformation.

Sampling Store metadata in many points: Pointwiseredundancy.

Truncation Store key metadata in more significant bits.

Other metadata stored in several bit locations:Positional redundancy

Reordering Order independent encoding.

Insertions/Bit Flips Enable recovery even with “bad” datapoints.

10/23

Page 25: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp

11/23

Page 26: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

11/23

Page 27: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

11/23

Page 28: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 411/23

Page 29: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 411/23

Page 30: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 811/23

Page 31: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

11/23

Page 32: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp 11/23

Page 33: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

11/23

Page 34: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 0

11/23

Page 35: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

11/23

Page 36: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

11/23

Page 37: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

11/23

Page 38: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

1

11/23

Page 39: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

11/23

Page 40: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

Insig

11/23

Page 41: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

Insig

Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0

11/23

Page 42: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

Insig

Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0

0

11/23

Page 43: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

Insig

Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0

010

11/23

Page 44: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding must use limited supply of insignificant bits

Harvard Red

Publish

Datapoint0 1111 1 11 00 00 000 1c

mc

p

Mark0 1010 0 10 11 10 100 0

c

pp Sig. BitsSig. Bits Insig

L = 6md

0 1111 1 00 00

Number of Provenance Pieces: ppN = 4

0pp 2pp1pp 3pp

ppN = 4

0pp 2pp1pp 3pp4pp 6pp5pp 7pp

ppN = 8

ppN = 8

hash(1110001100) mod N = 4pp

01 1 0

4pp

01 1 001 1 0

01 1 0

Insig

Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp

101

Insig

Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0

010

Insig

11/23

Page 45: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Decoding uses all metadata from annotated datasets.

parameter check: recovery of encoding parameters

mark check: checking if a mark is consistent with dataset

provenance piece: recover an unknown mark from a dataset

12/23

Page 46: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Parameter check bit allows for parameter recovery.

GoalLet D be an annotated data set with k elements. Find Lmd andNpp used in encoding.

Algorithm sketch (FindParams)

Enumerate Lmd and Npp candidates, rejecting incorrectguesses.

Calculate pc for each guess and compare with data points in D.

Correct guess: Probability pc-consistent = 1.Incorrect guess:

Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .

13/23

Page 47: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Parameter check bit allows for parameter recovery.

GoalLet D be an annotated data set with k elements. Find Lmd andNpp used in encoding.

Algorithm sketch (FindParams)

Enumerate Lmd and Npp candidates, rejecting incorrectguesses.(200 or fewer candidates in the worst case.)Calculate pc for each guess and compare with data points in D.

Correct guess: Probability pc-consistent = 1.Incorrect guess:

Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .

13/23

Page 48: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Define Check function using mark check bit.

Goal

Check(D,m) = true

iff mark m is encoded into dataset D.

Algorithm sketch (Check )

Recover parameter Lmd (and Npp) using FindParams.

For each data point in D, recompute mc.

Correct guess: Probability mc-consistent = 1.Incorrect guess:

Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .

14/23

Page 49: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Define Check function using mark check bit.

Goal

Check(D,m) = true

iff mark m is encoded into dataset D.

Algorithm sketch (Check )

Recover parameter Lmd (and Npp) using FindParams.

For each data point in D, recompute mc.

Correct guess: Probability mc-consistent = 1.Incorrect guess:

Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .

14/23

Page 50: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Recover function based on Check .

Goal

Recover(D) = m

iff mark m is encoded into dataset D.

Algorithm sketch (Recover )

Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)

for each m ∈ candidatesif Check(D,m)

return m

Heuristics in Merge keep candidateslist short, and search tractable.

15/23

Page 51: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Recover function based on Check .

Goal

Recover(D) = m

iff mark m is encoded into dataset D.

Algorithm sketch (Recover )

Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)

for each m ∈ candidatesif Check(D,m)

return m

Heuristics in Merge keep candidateslist short, and search tractable.

15/23

Page 52: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Recover function based on Check .

Goal

Recover(D) = m

iff mark m is encoded into dataset D.

Algorithm sketch (Recover )

Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)

for each m ∈ candidatesif Check(D,m)

return m

Heuristics in Merge keep candidateslist short, and search tractable.

15/23

Page 53: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Probabilistic algorithms deal with corruption.

Key Ideas

Empirical cutoffs < 1 determine when mc- andpc-consistency scores indicate correct parameters ormarks.

Weighted voting selects likely candidates provenancemarks.

See paper for details.

16/23

Page 54: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Evaluation

17/23

Page 55: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Encoding preserves bulk statistical data properties.

Assumption: hash mod k samples uniform distributionover {0 . . . k − 1}.Let ε = 2Lmd , the largest value written with insignificant bits.

Change in mean Change invariance

worst case ε− 1 (ε− 1)2/4

insignificant bits ∼ 0 0uniformly dist.

18/23

Page 56: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Recovery is robust against sampling. (1/2)

1 10 100 1000 10000

Sample size

0.0

0.2

0.4

0.6

0.8

1.0

pc-c

onsi

sten

cy s

core

19/23

Page 57: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Recovery is robust against sampling. (2/2)

1 10 100 1000

Sample size

0.0

0.2

0.4

0.6

0.8

1.0

Probabilityofrecovery

Predicted, R=1Predicted, R=2Predicted, R=4Predicted, R=8Predicted, R=16Predicted, R=32Observed, R=2

20/23

Page 58: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Conclusion

21/23

Page 59: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Self-identifying data solves a new problem.

This work: a mechanism for embeddingprovenance information into sensor data.

Management of provenance metadata:Metadata generation [SensorBase]Republication and curation [Park and Heidermann; Ledlie,Ng, et al. ’05]

Structured data:Databases [Lee, Bressen, and Madnick ’95; Buneman,Chapmen and Cheney ’06]Video and images [Genani and Lindqvist, ’07]

Non-malleable data:Digital signatures [Goldwasser, Micali, and Rivest ’88]

22/23

Page 60: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Take home message

Self identifying data. . .

associates sensor data with provenance metadata.

maintains integrity of significant bits, means, and variance.

is robust against benign transformations.

23/23

Page 61: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Bonus Slides

Definition of GatherSuggestions

Page 62: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding

Definition of GatherSuggestions

Algorithm sketch(GatherSuggestions(D))

Recover Lmd and Npp using FindParams.Let suggestions := ∅

For each d ∈ DLet (S,pc,mc,pp) = split(d ,Lmd)Let offset = hash(S) mod NppLet

suggest(i) =

(pp[j], j) j = i − offset mod Npp

and pp[j] in bounds∗ otherwise

suggestions := suggestions ∪ {suggest}

return suggestions