self-identifying sensor data - jeff vaughan · parameter check bit allows for parameter recovery....
TRANSCRIPT
![Page 1: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/1.jpg)
Self-Identifying Sensor Data
Jeffrey Vaughan
Department of Computer ScienceHarvard University
With Stephen Chong (Harvard) and Christian Skalka (UVM)
IPSNApril 13, 2010
1/23
![Page 2: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/2.jpg)
Example: Hubbard Brook Ecosystem Study
2/23
![Page 3: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/3.jpg)
Example: Hubbard Brook Ecosystem Study
2/23
![Page 4: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/4.jpg)
Data sharing: a tension
Scientists want their work to be used and cited.But don’t want use without attribution.
GoalTrack data provenance in a light-weight, cooperative way.
3/23
![Page 5: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/5.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
4/23
![Page 6: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/6.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
4/23
![Page 7: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/7.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
4/23
![Page 8: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/8.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
4/23
![Page 9: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/9.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
4/23
![Page 10: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/10.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
4/23
![Page 11: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/11.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
4/23
![Page 12: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/12.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
4/23
![Page 13: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/13.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
Round
90
3020
4/23
![Page 14: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/14.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
Round
90
3020
Reorder
435220
4/23
![Page 15: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/15.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
Round
90
3020
Reorder
435220
4/23
![Page 16: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/16.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
Round
90
3020
Reorder
435220
ProvenaceMark: 54321
Recover
4/23
![Page 17: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/17.jpg)
Key Idea: Encode provenance in insignificant bits.
Sensorw/ error±1mm
31234423818
123275435493
Snow (μm)
ProvenaceMark: 54321
185338425215
Publish 31218523338
123425435215
Snow (μm)
Sample
Round
90
3020
Reorder
435220
ProvenaceMark: 54321
Recover
Lookup
4/23
![Page 18: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/18.jpg)
The scientific/educational threat model is unusual.
Scientists. . .
are cooperative: they want to properly cite and attributedata.
use legacy workflows, and will be harmed byschema/format changes.
care a lot about data integrity, but understand the boundson measurement precision and accuracy.
5/23
![Page 19: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/19.jpg)
Outline
1 Introduction
2 Encoding and Decoding
3 Evaluation
4 Conclusion
6/23
![Page 20: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/20.jpg)
Encoding and Decoding
7/23
![Page 21: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/21.jpg)
Self-identifying data based on three functions
Encode : Dataset ×Mark → AnnotatedDataset
Check : Dataset ×Mark → bool
Recover : Dataset → Mark
Where
Mark , int32Datapoint , int
Dataset , AnnotatedDataset , List of Datapoint
8/23
![Page 22: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/22.jpg)
Self-identifying data based on three functions
Encode : Dataset ×Mark → AnnotatedDataset
Check : Dataset ×Mark → bool (cf. One-bit watermarking)
Recover : Dataset → Mark (cf. Blind watermarking)
Where
Mark , int32Datapoint , int
Dataset , AnnotatedDataset , List of Datapoint
8/23
![Page 23: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/23.jpg)
Data is collected for scientific analysis.
Sensors measure real-world phenomena.High-order significant bits contain meaningful data.Low-order insignificant bits contain noise.Acceptable encodings should be minimally perceptible.
Guiding principle
Never alter significant bits.
9/23
![Page 24: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/24.jpg)
Encoding must be robust against transformation.
Sampling Store metadata in many points: Pointwiseredundancy.
Truncation Store key metadata in more significant bits.
Other metadata stored in several bit locations:Positional redundancy
Reordering Order independent encoding.
Insertions/Bit Flips Enable recovery even with “bad” datapoints.
10/23
![Page 25: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/25.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp
11/23
![Page 26: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/26.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
11/23
![Page 27: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/27.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
11/23
![Page 28: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/28.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 411/23
![Page 29: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/29.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 411/23
![Page 30: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/30.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 811/23
![Page 31: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/31.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
11/23
![Page 32: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/32.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp 11/23
![Page 33: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/33.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
11/23
![Page 34: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/34.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 0
11/23
![Page 35: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/35.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
11/23
![Page 36: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/36.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
11/23
![Page 37: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/37.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
11/23
![Page 38: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/38.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
1
11/23
![Page 39: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/39.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
11/23
![Page 40: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/40.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
Insig
11/23
![Page 41: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/41.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
Insig
Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0
11/23
![Page 42: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/42.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
Insig
Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0
0
11/23
![Page 43: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/43.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
Insig
Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0
010
11/23
![Page 44: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/44.jpg)
Encoding must use limited supply of insignificant bits
Harvard Red
Publish
Datapoint0 1111 1 11 00 00 000 1c
mc
p
Mark0 1010 0 10 11 10 100 0
c
pp Sig. BitsSig. Bits Insig
L = 6md
0 1111 1 00 00
Number of Provenance Pieces: ppN = 4
0pp 2pp1pp 3pp
ppN = 4
0pp 2pp1pp 3pp4pp 6pp5pp 7pp
ppN = 8
ppN = 8
hash(1110001100) mod N = 4pp
01 1 0
4pp
01 1 001 1 0
01 1 0
Insig
Parameter check bit: pc = hash(Sig. Bits@N ) mod 2 = 1pp
101
Insig
Mark check bit: mc = hash(Sig. Bits@Mark) mod 2 = 0
010
Insig
11/23
![Page 45: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/45.jpg)
Decoding uses all metadata from annotated datasets.
parameter check: recovery of encoding parameters
mark check: checking if a mark is consistent with dataset
provenance piece: recover an unknown mark from a dataset
12/23
![Page 46: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/46.jpg)
Parameter check bit allows for parameter recovery.
GoalLet D be an annotated data set with k elements. Find Lmd andNpp used in encoding.
Algorithm sketch (FindParams)
Enumerate Lmd and Npp candidates, rejecting incorrectguesses.
Calculate pc for each guess and compare with data points in D.
Correct guess: Probability pc-consistent = 1.Incorrect guess:
Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .
13/23
![Page 47: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/47.jpg)
Parameter check bit allows for parameter recovery.
GoalLet D be an annotated data set with k elements. Find Lmd andNpp used in encoding.
Algorithm sketch (FindParams)
Enumerate Lmd and Npp candidates, rejecting incorrectguesses.(200 or fewer candidates in the worst case.)Calculate pc for each guess and compare with data points in D.
Correct guess: Probability pc-consistent = 1.Incorrect guess:
Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .
13/23
![Page 48: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/48.jpg)
Define Check function using mark check bit.
Goal
Check(D,m) = true
iff mark m is encoded into dataset D.
Algorithm sketch (Check )
Recover parameter Lmd (and Npp) using FindParams.
For each data point in D, recompute mc.
Correct guess: Probability mc-consistent = 1.Incorrect guess:
Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .
14/23
![Page 49: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/49.jpg)
Define Check function using mark check bit.
Goal
Check(D,m) = true
iff mark m is encoded into dataset D.
Algorithm sketch (Check )
Recover parameter Lmd (and Npp) using FindParams.
For each data point in D, recompute mc.
Correct guess: Probability mc-consistent = 1.Incorrect guess:
Probability consistent with any data point ≈ 1/2.Probability consistent with all points ≈ (1/2)k .
14/23
![Page 50: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/50.jpg)
Recover function based on Check .
Goal
Recover(D) = m
iff mark m is encoded into dataset D.
Algorithm sketch (Recover )
Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)
for each m ∈ candidatesif Check(D,m)
return m
Heuristics in Merge keep candidateslist short, and search tractable.
15/23
![Page 51: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/51.jpg)
Recover function based on Check .
Goal
Recover(D) = m
iff mark m is encoded into dataset D.
Algorithm sketch (Recover )
Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)
for each m ∈ candidatesif Check(D,m)
return m
Heuristics in Merge keep candidateslist short, and search tractable.
15/23
![Page 52: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/52.jpg)
Recover function based on Check .
Goal
Recover(D) = m
iff mark m is encoded into dataset D.
Algorithm sketch (Recover )
Let suggestions = GatherSuggestions(D)Let candidates : List mark = Merge(suggestions)
for each m ∈ candidatesif Check(D,m)
return m
Heuristics in Merge keep candidateslist short, and search tractable.
15/23
![Page 53: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/53.jpg)
Probabilistic algorithms deal with corruption.
Key Ideas
Empirical cutoffs < 1 determine when mc- andpc-consistency scores indicate correct parameters ormarks.
Weighted voting selects likely candidates provenancemarks.
See paper for details.
16/23
![Page 54: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/54.jpg)
Evaluation
17/23
![Page 55: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/55.jpg)
Encoding preserves bulk statistical data properties.
Assumption: hash mod k samples uniform distributionover {0 . . . k − 1}.Let ε = 2Lmd , the largest value written with insignificant bits.
Change in mean Change invariance
worst case ε− 1 (ε− 1)2/4
insignificant bits ∼ 0 0uniformly dist.
18/23
![Page 56: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/56.jpg)
Recovery is robust against sampling. (1/2)
1 10 100 1000 10000
Sample size
0.0
0.2
0.4
0.6
0.8
1.0
pc-c
onsi
sten
cy s
core
19/23
![Page 57: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/57.jpg)
Recovery is robust against sampling. (2/2)
1 10 100 1000
Sample size
0.0
0.2
0.4
0.6
0.8
1.0
Probabilityofrecovery
Predicted, R=1Predicted, R=2Predicted, R=4Predicted, R=8Predicted, R=16Predicted, R=32Observed, R=2
20/23
![Page 58: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/58.jpg)
Conclusion
21/23
![Page 59: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/59.jpg)
Self-identifying data solves a new problem.
This work: a mechanism for embeddingprovenance information into sensor data.
Management of provenance metadata:Metadata generation [SensorBase]Republication and curation [Park and Heidermann; Ledlie,Ng, et al. ’05]
Structured data:Databases [Lee, Bressen, and Madnick ’95; Buneman,Chapmen and Cheney ’06]Video and images [Genani and Lindqvist, ’07]
Non-malleable data:Digital signatures [Goldwasser, Micali, and Rivest ’88]
22/23
![Page 60: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/60.jpg)
Take home message
Self identifying data. . .
associates sensor data with provenance metadata.
maintains integrity of significant bits, means, and variance.
is robust against benign transformations.
23/23
![Page 61: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/61.jpg)
Bonus Slides
Definition of GatherSuggestions
![Page 62: Self-Identifying Sensor Data - Jeff Vaughan · Parameter check bit allows for parameter recovery. Goal Let D be an annotated data set with k elements. Find Lmd and Npp used in encoding](https://reader034.vdocument.in/reader034/viewer/2022050409/5f85d2ae9e23e15efe49bb4c/html5/thumbnails/62.jpg)
Definition of GatherSuggestions
Algorithm sketch(GatherSuggestions(D))
Recover Lmd and Npp using FindParams.Let suggestions := ∅
For each d ∈ DLet (S,pc,mc,pp) = split(d ,Lmd)Let offset = hash(S) mod NppLet
suggest(i) =
(pp[j], j) j = i − offset mod Npp
and pp[j] in bounds∗ otherwise
suggestions := suggestions ∪ {suggest}
return suggestions