Post on 01-Aug-2020
Self-Identifying Sensor Data
Jeffrey Vaughan
Department of Computer Science, Harvard University
With Stephen Chong (Harvard) and Christian Skalka (UVM)
IPSN, April 13, 2010
1/23
Example: Hubbard Brook Ecosystem Study
2/23
Data sharing: a tension
Scientists want their work to be used and cited, but don’t want it used without attribution.
Goal: Track data provenance in a lightweight, cooperative way.
3/23
Key Idea: Encode provenance in insignificant bits.
[Figure: a sensor with error ±1 mm produces snow readings (μm). ProvenanceMark 54321 is encoded into the readings before they are published. The published values are then sampled, rounded, and reordered; ProvenanceMark 54321 is still recovered from the result and looked up.]
4/23
The scientific/educational threat model is unusual.
Scientists...
are cooperative: they want to properly cite and attribute data.
use legacy workflows, and will be harmed by schema/format changes.
care a lot about data integrity, but understand the bounds on measurement precision and accuracy.
5/23
Outline
1 Introduction
2 Encoding and Decoding
3 Evaluation
4 Conclusion
6/23
Encoding and Decoding
7/23
Self-identifying data based on three functions
Encode : Dataset × Mark → AnnotatedDataset
Check : Dataset × Mark → bool (cf. One-bit watermarking)
Recover : Dataset → Mark (cf. Blind watermarking)
Where
Mark ≜ int32
Datapoint ≜ int
Dataset, AnnotatedDataset ≜ List of Datapoint
8/23
Data is collected for scientific analysis.
Sensors measure real-world phenomena.
High-order significant bits contain meaningful data.
Low-order insignificant bits contain noise.
Acceptable encodings should be minimally perceptible.
Guiding principle
Never alter significant bits.
9/23
Encoding must be robust against transformation.
Sampling: Store metadata in many points (pointwise redundancy).
Truncation: Store key metadata in more significant bits. Other metadata is stored in several bit locations (positional redundancy).
Reordering: Order-independent encoding.
Insertions/Bit Flips: Enable recovery even with “bad” data points.
10/23
Encoding must use limited supply of insignificant bits
[Figure: bit-level layout of a Datapoint (0 1111 1 11 00 00 000 1) and a Mark (0 1010 0 10 11 10 100 0). The datapoint is divided into Sig. Bits and Insig bits, with L_md = 6. The fields labelled pp, pc, and mc are written into the insignificant bits.]
Number of Provenance Pieces: N_pp. The mark is divided into pieces pp0 ... pp3 when N_pp = 4, or pp0 ... pp7 when N_pp = 8.
Provenance piece: with N_pp = 8, hash(1110001100) mod N_pp = 4, so the datapoint carries pp4.
Parameter check bit: pc = hash(Sig. Bits @ N_pp) mod 2 = 1
Mark check bit: mc = hash(Sig. Bits @ Mark) mod 2 = 0
11/23
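The per-datapoint construction on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the order of the fields inside the insignificant bits (piece, then pc, then mc), the piece width of `l_md - 2` bits, SHA-256 as the hash, and the names `h`, `encode_point`, and `encode` are all hypothetical choices made for illustration.

```python
import hashlib

MARK_BITS = 32  # the talk defines Mark as an int32


def h(*parts) -> int:
    """Stand-in for the slide's hash (assumption: SHA-256 over the joined parts)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def encode_point(d: int, mark: int, l_md: int, n_pp: int) -> int:
    """Overwrite the low l_md bits of a non-negative datapoint d with metadata."""
    w = l_md - 2                       # assumed width of one provenance piece
    if w < 1 or n_pp * w < MARK_BITS:
        raise ValueError("pieces cannot cover the whole mark")
    sig = d >> l_md                    # significant bits, never altered
    j = h("pp", sig) % n_pp            # index of the piece this point carries
    piece = (mark >> (j * w)) & ((1 << w) - 1)
    pc = h("pc", sig, n_pp) % 2        # parameter check bit
    mc = h("mc", sig, mark) % 2        # mark check bit
    return (sig << l_md) | (piece << 2) | (pc << 1) | mc


def encode(dataset, mark, l_md=6, n_pp=8):
    """Encode : Dataset x Mark -> AnnotatedDataset, one point at a time."""
    return [encode_point(d, mark, l_md, n_pp) for d in dataset]


readings = [31234, 42381, 12327, 54354]   # made-up readings
annotated = encode(readings, mark=54321)
# Only the low 6 bits may differ from the original readings.
print([a - r for a, r in zip(annotated, readings)])
```

Because the piece index depends only on the significant bits, the encoding of each point is independent of its position in the dataset, which is what makes it insensitive to sampling and reordering.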
Decoding uses all metadata from annotated datasets.
parameter check: recovery of encoding parameters
mark check: checking if a mark is consistent with dataset
provenance piece: recover an unknown mark from a dataset
12/23
Parameter check bit allows for parameter recovery.
Goal: Let D be an annotated data set with k elements. Find L_md and N_pp used in encoding.
Algorithm sketch (FindParams)
Enumerate L_md and N_pp candidates, rejecting incorrect guesses. (200 or fewer candidates in the worst case.)
Calculate pc for each guess and compare with data points in D.
Correct guess: Probability pc-consistent = 1.
Incorrect guess:
Probability consistent with any data point ≈ 1/2.
Probability consistent with all points ≈ (1/2)^k.
13/23
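The enumeration above can be sketched as follows. This is an illustrative sketch only: it assumes the pc bit sits at bit 1 of each datapoint, that the hash is SHA-256, and that the candidate ranges for `l_md` and `n_pp` are the small hypothetical ones listed in `find_params`. None of these details come from the slides.

```python
import hashlib


def h(*parts) -> int:
    """Stand-in hash (assumption: SHA-256 over the joined parts)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def pc_score(dataset, l_md, n_pp) -> float:
    """Fraction of points whose stored pc bit matches the guess (l_md, n_pp)."""
    hits = 0
    for d in dataset:
        stored = (d >> 1) & 1                    # assumed position of pc
        expected = h("pc", d >> l_md, n_pp) % 2  # pc recomputed under the guess
        hits += stored == expected
    return hits / len(dataset)                   # dataset assumed non-empty


def find_params(dataset, cutoff=1.0):
    """Return the best-scoring (l_md, n_pp) guess, or None if below cutoff."""
    candidates = [(l, n) for l in range(3, 17) for n in (2, 4, 8, 16, 32)
                  if n * (l - 2) >= 32]          # hypothetical search space
    best = max(candidates, key=lambda c: pc_score(dataset, *c))
    return best if pc_score(dataset, *best) >= cutoff else None
```

A wrong guess changes the input to the hash, so its recomputed pc agrees with each stored bit only about half the time; with a few hundred points the correct guess is the only one that scores near 1.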
Define Check function using mark check bit.
Goal
Check(D,m) = true
iff mark m is encoded into dataset D.
Algorithm sketch (Check)
Recover parameter L_md (and N_pp) using FindParams.
For each data point in D, recompute mc.
Correct guess: Probability mc-consistent = 1.
Incorrect guess:
Probability consistent with any data point ≈ 1/2.
Probability consistent with all points ≈ (1/2)^k.
14/23
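A small Python sketch of this check follows. It is an illustration under assumptions, not the paper's code: the mc bit is assumed to be bit 0 of each datapoint, the hash is SHA-256, and `l_md` is passed in directly rather than recovered with FindParams, to keep the example short. The `cutoff` parameter is a hypothetical knob for tolerating corrupted points.

```python
import hashlib


def h(*parts) -> int:
    """Stand-in hash (assumption: SHA-256 over the joined parts)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def mc_score(dataset, mark, l_md) -> float:
    """Fraction of points whose stored mc bit is consistent with mark."""
    hits = 0
    for d in dataset:
        stored = d & 1                            # assumed position of mc
        expected = h("mc", d >> l_md, mark) % 2   # mc recomputed for this mark
        hits += stored == expected
    return hits / len(dataset)                    # dataset assumed non-empty


def check(dataset, mark, l_md, cutoff=1.0) -> bool:
    """Check : Dataset x Mark -> bool, via the mc-consistency score."""
    return mc_score(dataset, mark, l_md) >= cutoff
```

With `cutoff=1.0` every point must agree; a value below 1 accepts a mark even when some points were inserted or had bits flipped.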
Recover function based on Check.
Goal
Recover(D) = m
iff mark m is encoded into dataset D.
Algorithm sketch (Recover)
Let suggestions = GatherSuggestions(D)
Let candidates : List mark = Merge(suggestions)
for each m ∈ candidates
    if Check(D, m)
        return m
Heuristics in Merge keep the candidates list short and the search tractable.
15/23
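The control flow of Recover can be written as a short driver. In this sketch the three helpers are passed in as callables because their details are not given on this slide; the parameter names and the `None` return for "no mark found" are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional

Dataset = List[int]
Mark = int


def recover(
    dataset: Dataset,
    gather_suggestions: Callable[[Dataset], object],
    merge: Callable[[object], Iterable[Mark]],
    check: Callable[[Dataset, Mark], bool],
) -> Optional[Mark]:
    """Return the first candidate mark that passes Check, or None."""
    suggestions = gather_suggestions(dataset)
    for m in merge(suggestions):      # candidates in the order Merge proposes
        if check(dataset, m):
            return m
    return None                       # no candidate passed Check


# Toy usage with stub helpers: only 54321 passes the stub check.
found = recover(
    [1, 2, 3],
    gather_suggestions=lambda d: d,
    merge=lambda s: [7, 54321, 9],
    check=lambda d, m: m == 54321,
)
print(found)
```

Since every candidate is confirmed by Check before it is returned, Merge can afford to be heuristic: a poor candidate costs one extra pass over the data rather than a wrong answer.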
Probabilistic algorithms deal with corruption.
Key Ideas
Empirical cutoffs < 1 determine when mc- and pc-consistency scores indicate correct parameters or marks.
Weighted voting selects likely candidate provenance marks.
See paper for details.
16/23
Evaluation
17/23
Encoding preserves bulk statistical data properties.
Assumption: hash mod k samples a uniform distribution over {0 . . . k − 1}.
Let ε = 2^{L_md}, the largest value written with insignificant bits.

                                      Change in mean    Change in variance
worst case                            ε − 1             (ε − 1)^2/4
insignificant bits uniformly dist.    ∼ 0               0
18/23
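The two rows of the table can be illustrated with a quick simulation. This is only a sketch with made-up readings: the value range, the sample size, the seed, and the helper `overwrite` are hypothetical, and random bits stand in for the hash-derived metadata under the uniformity assumption stated on the slide.

```python
import random
from statistics import mean, pvariance

L_MD = 6
EPS = 2 ** L_MD                 # low bits range over 0 .. EPS - 1
rng = random.Random(0)          # fixed seed so the run is repeatable

# Hypothetical readings; their low bits are already noise-like.
original = [rng.randrange(10_000, 20_000) for _ in range(10_000)]


def overwrite(d: int, low: int) -> int:
    """Keep the significant bits of d and replace its low L_MD bits."""
    return ((d >> L_MD) << L_MD) | low


# Case 1: metadata bits behave like uniform noise (the hash assumption).
uniform = [overwrite(d, rng.randrange(EPS)) for d in original]
mean_shift = mean(uniform) - mean(original)
var_ratio = pvariance(uniform) / pvariance(original)

# Case 2: worst case for the mean, low bits go from all zeros to all ones.
zeroed = [overwrite(d, 0) for d in original]
maxed = [overwrite(d, EPS - 1) for d in original]
worst_shift = mean(m - z for m, z in zip(maxed, zeroed))

print(f"uniform: mean shift {mean_shift:+.3f}, variance ratio {var_ratio:.5f}")
print(f"worst case mean shift: {worst_shift}")
```

In the uniform case the shift in the mean is small compared with ε and the variance ratio stays close to 1, while the contrived worst case moves every point by exactly ε − 1.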
Recovery is robust against sampling. (1/2)
[Plot: pc-consistency score (0.0 to 1.0) versus sample size (1 to 10000, logarithmic axis).]
19/23
Recovery is robust against sampling. (2/2)
[Plot: probability of recovery (0.0 to 1.0) versus sample size (1 to 1000, logarithmic axis). Curves: Predicted for R = 1, 2, 4, 8, 16, and 32; Observed for R = 2.]
20/23
Conclusion
21/23
Self-identifying data solves a new problem.
This work: a mechanism for embedding provenance information into sensor data.
Management of provenance metadata:
Metadata generation [SensorBase]
Republication and curation [Park and Heidemann; Ledlie, Ng, et al. ’05]
Structured data:
Databases [Lee, Bressan, and Madnick ’95; Buneman, Chapman, and Cheney ’06]
Video and images [Gehani and Lindqvist ’07]
Non-malleable data:
Digital signatures [Goldwasser, Micali, and Rivest ’88]
22/23
Take home message
Self-identifying data...
associates sensor data with provenance metadata.
maintains integrity of significant bits, means, and variance.
is robust against benign transformations.
23/23
Bonus Slides
Definition of GatherSuggestions
Algorithm sketch (GatherSuggestions(D))
Recover L_md and N_pp using FindParams.
Let suggestions := ∅
For each d ∈ D
    Let (S, pc, mc, pp) = split(d, L_md)
    Let offset = hash(S) mod N_pp
    Let suggest(i) =
        (pp[j], j)   if j = (i − offset) mod N_pp and pp[j] in bounds
        ∗            otherwise
    suggestions := suggestions ∪ {suggest}
return suggestions
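A simplified Python sketch of gathering and merging suggestions is given below. It departs from the slide in labelled ways: each datapoint is assumed to carry a single provenance piece (rather than an array pp with an offset), a suggestion is an `(index, piece)` pair, and `merge` is a plain majority vote that yields one candidate mark. The bit layout (piece above the two check bits), the SHA-256 hash, and all function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Optional, Tuple


def h(*parts) -> int:
    """Stand-in hash (assumption: SHA-256 over the joined parts)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def gather_suggestions(dataset: List[int], l_md: int, n_pp: int) -> List[Tuple[int, int]]:
    """For each point, report which mark piece it claims to carry."""
    w = l_md - 2                               # assumed piece width
    out = []
    for d in dataset:
        j = h("pp", d >> l_md) % n_pp          # piece index from significant bits
        piece = (d >> 2) & ((1 << w) - 1)      # piece stored above pc and mc
        out.append((j, piece))
    return out


def merge(suggestions, l_md: int, n_pp: int) -> Optional[int]:
    """Majority-vote each piece and reassemble a 32-bit candidate mark."""
    w = l_md - 2
    votes = [Counter() for _ in range(n_pp)]
    for j, piece in suggestions:
        votes[j][piece] += 1
    mark = 0
    for j, counter in enumerate(votes):
        if not counter:
            return None                        # some piece was never observed
        mark |= counter.most_common(1)[0][0] << (j * w)
    return mark & 0xFFFFFFFF
```

Voting per piece means a few corrupted points are outvoted as long as each piece index is seen often enough, which is why recovery improves with sample size.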