camouflage: automated anonymization of field data (icse 2011)
TRANSCRIPT
CAMOUFLAGE: AUTOMATED ANONYMIZATION OF FIELD DATA
James Clause, University of Delaware
Alessandro Orso, Georgia Institute of Technology
THE BIG PICTURE
• Apple crash reporter
• Windows error reporting
• Ubuntu Apport
• Gnome BugBuddy
• Mozilla / Google Breakpad
• many others

• Chilimbi and colleagues ’09
• Elbaum and Diep ’05
• Hilbert and Redmiles ’00
• Liblit and colleagues ’05
• Pavlopoulou and Young ’99
• many others
PRIVACY CONCERNS
Handling concerns in practice
• Ignore them
• Privacy policies
• Collect limited amounts of information
  • less likely to be sensitive
  • can rely on user checking
PRIVACY CONCERNS
Unfortunately:
[Figure: kinds of field data (register values, stack dumps, branch profiles, path profiles, test cases) plotted by usefulness against privacy concerns]
GOAL: Enable the collection of detailed information while reducing or eliminating privacy concerns.
OUTLINE
• Intuition
• Castro and colleagues’ technique
• Our improvements
  • Path condition relaxation
  • Breakable input conditions
• Evaluation
• Related work
• Conclusions and future work
INTUITION
[Figure: nested regions of the input domain]
• Input domain
• Inputs that cause F
• Inputs that satisfy F’s path condition
• Sensitive input (I) that causes F
• Anonymized input (I’) that also causes F
CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)
Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.
    boolean foo(int x, int y, int z) {
        if (x <= 5) {
            int a = x * 2;
            if (y + a > 10) {
                if (z == 0) {
                    return true;
                }
            }
        }
        return false;
    }
Input (sensitive): 5 3 0
Symbolic inputs: x→i1, y→i2, z→i3
Symbolic State: a→i1*2
Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
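The walkthrough above can be approximated by hand. The following is a minimal sketch, not the authors' tooling: it assumes a hand-instrumented copy of foo that appends one textual constraint per branch outcome. The class name `PathConditionSketch` and the method `trace` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-instrumented copy of foo that records one constraint per branch outcome. */
final class PathConditionSketch {

    /** Runs foo's logic on concrete inputs and returns the path condition as text. */
    static List<String> trace(int x, int y, int z) {
        // Symbolic inputs, as on the slide: x -> i1, y -> i2, z -> i3.
        List<String> pathCondition = new ArrayList<>();
        if (x <= 5) {
            pathCondition.add("i1 <= 5");
            int a = x * 2; // symbolic state: a -> i1*2
            if (y + a > 10) {
                pathCondition.add("i2+i1*2 > 10");
                if (z == 0) {
                    pathCondition.add("i3 == 0");
                } else {
                    pathCondition.add("i3 != 0");
                }
            } else {
                pathCondition.add("i2+i1*2 <= 10");
            }
        } else {
            pathCondition.add("i1 > 5");
        }
        return pathCondition;
    }

    public static void main(String[] args) {
        // Sensitive input from the slide: 5 3 0.
        // Prints: i1 <= 5 && i2+i1*2 > 10 && i3 == 0
        System.out.println(String.join(" && ", trace(5, 3, 0)));
    }
}
```

A symbolic execution engine derives these constraints automatically; the manual version only makes the bookkeeping visible.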
CASTRO AND COLLEAGUES’ TECHNIQUE (CHOOSING ANONYMIZED INPUTS)
Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
→ Constraint Solver →
i1 == 5, i2 == 3, i3 == 0
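To illustrate the solving step, here is a minimal sketch that stands in for the constraint solver with an exhaustive search over a small range. The search range, the class name `BruteForceSolverSketch`, and its methods are assumptions made for illustration; a real solver does not enumerate inputs like this. Which satisfying assignment comes back depends on the solver, and any of them drives foo down the same path.

```java
import java.util.Arrays;

/** Toy stand-in for a constraint solver: exhaustive search over a small range. */
final class BruteForceSolverSketch {

    /** The path condition from the slide. */
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    /** foo from the slides, used to check that a solution replays the same path. */
    static boolean foo(int x, int y, int z) {
        if (x <= 5) {
            int a = x * 2;
            if (y + a > 10) {
                if (z == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the first satisfying assignment in [lo, hi]^3, or null if none exists. */
    static int[] solve(int lo, int hi) {
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (pathCondition(i1, i2, i3)) {
                        return new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] s = solve(-10, 10);
        System.out.println(Arrays.toString(s) + " -> foo = " + foo(s[0], s[1], s[2]));
    }
}
```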
OUR IMPROVEMENTS
• Increase the number of possible choices for I’
• Choose I’ such that it is as different as possible from I
PATH CONDITION RELAXATION
[Figure: the input domain, containing the sensitive input (I) that causes F]
PATH CONDITION RELAXATION
1. Array inequalities
2. Switch statements
3. Multi-clause conditionals
4. Array reads
PATH CONDITION RELAXATION (1. ARRAY INEQUALITIES)
x.equals(y);
Values compared: abc, abd
Traditional: x0 == y0 ∧ x1 == y1 ∧ x2 != y2
Relaxed: x0 != y0 ∨ x1 != y1 ∨ x2 != y2
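The difference between the two constraints can be sketched as two predicates over candidate strings. This is an illustration only, assuming equal-length strings and a known index for the first mismatch in the original run; `ArrayInequalitySketch`, `traditional`, and `relaxed` are hypothetical names.

```java
/** Compares the traditional and relaxed constraints for a failed x.equals(y). */
final class ArrayInequalitySketch {

    /**
     * Traditional: every position before the original first mismatch must stay
     * equal, and the mismatch must occur at exactly that position.
     */
    static boolean traditional(String x, String y, int firstMismatch) {
        for (int k = 0; k < firstMismatch; k++) {
            if (x.charAt(k) != y.charAt(k)) {
                return false;
            }
        }
        return x.charAt(firstMismatch) != y.charAt(firstMismatch);
    }

    /** Relaxed: a mismatch at any position is enough. */
    static boolean relaxed(String x, String y) {
        for (int k = 0; k < x.length(); k++) {
            if (x.charAt(k) != y.charAt(k)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "xyz" vs "abd" still makes equals() fail, but only the relaxed form accepts it.
        System.out.println(traditional("xyz", "abd", 2)); // false
        System.out.println(relaxed("xyz", "abd"));        // true
    }
}
```

Both predicates guarantee that equals() returns false; the relaxed one simply admits more candidate inputs.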
PATH CONDITION RELAXATION (2. SWITCH STATEMENTS)
    switch(x) {
        case 1: ... break;
        case 3:
        case 5: ... break;
        default: ...
    }
Input: x = 5
Traditional: x == 5
Relaxed: x == 5 ∨ x == 3
Input: x = 10 (default case)
Traditional: x == 10
Relaxed: x != 1 ∧ x != 3 ∧ x != 5
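One way to read the relaxed switch constraint is "any value that selects the same branch". The sketch below assumes that reading; `SwitchRelaxationSketch` and its methods are hypothetical names, and the branch bodies elided on the slide are replaced by branch indices.

```java
/** Relaxed switch constraint: accept any value that selects the same branch. */
final class SwitchRelaxationSketch {

    /** Index of the branch that the slide's switch takes for x. */
    static int branch(int x) {
        switch (x) {
            case 1:
                return 0;
            case 3:
            case 5:
                return 1;
            default:
                return 2;
        }
    }

    /** Traditional: the candidate must equal the original value. */
    static boolean traditional(int original, int candidate) {
        return candidate == original;
    }

    /** Relaxed: the candidate only has to reach the same branch. */
    static boolean relaxed(int original, int candidate) {
        return branch(candidate) == branch(original);
    }

    public static void main(String[] args) {
        System.out.println(relaxed(5, 3));   // true:  x == 5 or x == 3
        System.out.println(relaxed(10, 42)); // true:  x != 1 and x != 3 and x != 5
        System.out.println(relaxed(10, 3));  // false: 3 leaves the default branch
    }
}
```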
BREAKABLE INPUT CONDITIONS
Sensitive input: 5 3 0
Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
→ Constraint Solver →
i1 == 5, i2 == 3, i3 == 0 (identical to the sensitive input)
Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
Breakable Input Condition: i1 != 5 ∧ i2 != 3 ∧ i3 != 0
→ Constraint Solver →
i1 == 4, i2 == 10, i3 == 0
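The idea of constraints that should hold but may be broken can be sketched as a search that must satisfy the path condition and prefers assignments differing from the original in as many positions as possible. This is a hedged illustration, not the prototype's solver encoding: the exhaustive search, the range, and the names `BreakableConditionsSketch`, `differing`, and `anonymize` are all assumptions.

```java
import java.util.Arrays;

/** Path condition is mandatory; "differs from the original" is satisfied where possible. */
final class BreakableConditionsSketch {

    /** The (unbreakable) path condition from the slide. */
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    /** Number of breakable conditions (i_k != original_k) that the candidate satisfies. */
    static int differing(int[] original, int i1, int i2, int i3) {
        int count = 0;
        if (i1 != original[0]) count++;
        if (i2 != original[1]) count++;
        if (i3 != original[2]) count++;
        return count;
    }

    /** Best assignment in [lo, hi]^3: satisfies the path condition, breaks the fewest conditions. */
    static int[] anonymize(int[] original, int lo, int hi) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (!pathCondition(i1, i2, i3)) continue;
                    int score = differing(original, i1, i2, i3);
                    if (score > bestScore) {
                        bestScore = score;
                        best = new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] original = {5, 3, 0}; // sensitive input from the slide
        System.out.println(Arrays.toString(anonymize(original, -10, 10)));
    }
}
```

As on the slide, i3 != 0 has to be broken because the path condition forces i3 == 0, while the other two positions can change.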
ASSUMPTIONS
1. The failure f is observable and can be detected with an assertion.
   ‣ common to all debugging techniques; holds in most, if not all, cases.
2. Any input that satisfies the path condition results in f.
   • Non-determinism
     ‣ common to all debugging techniques; requires a deterministic replay mechanism
   • Implicit checks (e.g., division by zero)
     ‣ likely that they do not involve relevant inputs
     ‣ make implicit checks explicit (e.g., 100/x → assert x != 0)
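The last point can be sketched as follows. In this illustration the check is written as an explicit branch rather than Java's `assert` statement, since `assert` is disabled unless the JVM runs with `-ea`; the class and method names are hypothetical.

```java
/** Turning an implicit runtime check into an explicit branch on the input. */
final class ExplicitCheckSketch {

    /** Implicit check: the JVM throws ArithmeticException for x == 0, with no visible branch. */
    static int implicitCheck(int x) {
        return 100 / x;
    }

    /** Explicit check: x != 0 is now a branch, so it can appear in the path condition. */
    static int explicitCheck(int x) {
        if (x == 0) {
            throw new AssertionError("x != 0 violated");
        }
        return 100 / x;
    }

    public static void main(String[] args) {
        System.out.println(explicitCheck(4)); // 25
    }
}
```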
EVALUATION
1. Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce a failure?
2. Strength: How much information about the original inputs is revealed?
3. Effectiveness: Are the anonymized inputs safe to send to developers?
4. Improvement: Does the use of path condition relaxation and breakable input conditions provide any benefits over the basic approach?
PROTOTYPE IMPLEMENTATION
• Path Condition: Java Pathfinder
• Breakable Input Condition: Java Pathfinder
• Constraint Solver: Yices
• Executable inputs: Ruby scripts
SUBJECTS
• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults
(20 faults, total)
Select sensitive failure-inducing inputs
• 170 total inputs
• manually generated or included with subject
• 100 bytes to 5MB in size
(Assume all of each input is potentially sensitive)
RQ1: FEASIBILITY
[Figure: average execution time (s) and average solver time (s) for each fault: columba, htmlparser, printtokens 1-2, nanoxml 1-16]
Inputs can be anonymized in a reasonable amount of time (easily done overnight)
RQ2: STRENGTH
Average % Bits Revealed
• Measures how many inputs satisfy the path condition
• Many satisfying inputs: little information revealed
• Few satisfying inputs: lots of information revealed
Average % Residue
• Measures how much of the anonymized input is identical to the original input
• Example (I and I’): AAAAAAsecretAAAAAA... and BBBBBBsecretBBBBBB...; "secret" survives, so lots of information revealed
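The slides do not give a formula for residue. As a rough sketch only, the code below assumes a position-by-position byte comparison, with the longer input's length as the denominator; that definition, and the names `ResidueSketch` and `percentResidue`, are assumptions for illustration and may differ from the metric used in the evaluation.

```java
import java.nio.charset.StandardCharsets;

/** Assumed residue metric: percentage of byte positions left unchanged. */
final class ResidueSketch {

    static double percentResidue(String original, String anonymized) {
        byte[] a = original.getBytes(StandardCharsets.UTF_8);
        byte[] b = anonymized.getBytes(StandardCharsets.UTF_8);
        int longest = Math.max(a.length, b.length);
        if (longest == 0) {
            return 0.0;
        }
        int same = 0;
        for (int k = 0; k < Math.min(a.length, b.length); k++) {
            if (a[k] == b[k]) {
                same++;
            }
        }
        return 100.0 * same / longest;
    }

    public static void main(String[] args) {
        // The slide's example: only "secret" (6 of 18 bytes) is unchanged.
        System.out.println(percentResidue("AAAAAAsecretAAAAAA", "BBBBBBsecretBBBBBB"));
    }
}
```

Under this definition the slide's example has a low percentage even though the sensitive part survives intact, which is one reason a single number should be read alongside the inputs themselves.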
RQ2: STRENGTH
[Figure: average % bits revealed and average % residue for each fault: columba, htmlparser, printtokens 1-2, nanoxml 1-16]
Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the
information in the original inputs
RQ3: EFFECTIVENESS (NANOXML)
Original input:
<!DOCTYPE Foo [ <!ELEMENT Foo (ns:Bar)> <!ATTLIST Foo xmlns CDATA #FIXED 'http://nanoxml.n3.net/bar' a CDATA #REQUIRED>
<!ELEMENT ns:Bar (Blah)> <!ATTLIST ns:Bar xmlns:ns CDATA #FIXED 'http://nanoxml.n3.net/bar'>
<!ELEMENT Blah EMPTY> <!ATTLIST Blah x CDATA #REQUIRED ns:x CDATA #REQUIRED>]><!-- comment --><Foo a='very' b='secret' c='stuff'>vaz <ns:Bar> <Blah x="1" ns:x="2"/> </ns:Bar></Foo>
Anonymized input:
<!DOCTYPE [ <! > <!ATTLIST #FIXED ' ' >
<!E > <!ATTLIST #FIXED ' '>
<!E > <!ATTLIST # : # >]><!-- -->< =' ' =' ' =' '> < : > < =" " : =" "/> </ :
RQ3: EFFECTIVENESS (COLUMBA)
Original input:
Wayne,Bartley,Bartley,Wayne,[email protected],,Ronald,Kahle,Kahle,Ron,[email protected],,Wilma,Lavelle,Lavelle,Wilma,,[email protected],Jesse,Hammonds,Hammonds,Jesse,,[email protected],Amy,Uhl,Uhl,Amy,[email protected],[email protected],Hazel,Miracle,Miracle,Hazel,[email protected],,Roxanne,Nealy,Nealy,Roxie,,[email protected],Heather,Kane,Kane,Heather,[email protected],,Rosa,Stovall,Stovall,Rosa,,[email protected],Peter,Hyden,Hyden,Pete,,[email protected],Jeffrey,Wesson,Wesson,Jeff,[email protected],,Virginia,Mendoza,Mendoza,Ginny,[email protected],,Richard,Robledo,Robledo,Ralph,[email protected],,Edward,Blanding,Blanding,Ed,,[email protected],Sean,Pulliam,Pulliam,Sean,[email protected],,Steven,Kocher,Kocher,Steve,[email protected],,Tony,Whitlock,Whitlock,Tony,,[email protected],Frank,Earl,Earl,Frankie,,,Shelly,Riojas,Riojas,Shelly,[email protected],,
Anonymized input:
, , , , ,, , , , , ,, , , , ,, , , , , ,, , , , , , , , , , , , ,, , , , ,, , , , , , ,, , , , ,, , , , , ,, , , , , , ,, , , , , ,, , , , , ,, , , , ,, , , , , , ,, , , , , ,, , , , ,, , , , , ,,,
RQ3: EFFECTIVENESS (HTMLPARSER)
<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head><title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title=""><!--/*--><![CDATA[<!--*/
body { margin: 0px;...
/*]]>*/--></style></head><body> ...</body>
The portions of the inputs that remain after anonymization tend to be structural in nature and
therefore are safe to send to developers
RQ4: IMPROVEMENT
[Figure: % improvement in bits revealed and % improvement in residue for each fault: columba, htmlparser, printtokens 1-2, nanoxml 1-16]
Inputs anonymized using our improvements reveal, on average, 30% fewer bits of information and leave 40% less residue. (With only a marginal increase in time.)
RELATED WORK
• Castro and colleagues ’08
• Broadwell and colleagues ’03
  • Uses dynamic taint analysis to identify and remove sensitive information in crash dumps.
• Wang and colleagues ’08
  • Uses a combination of dynamic taint analysis, symbolic execution, and Q/A with a client machine to construct anonymized inputs
• Data set anonymization techniques (e.g., k-anonymization)
  • Budi and colleagues ’11
  • Grechanik and colleagues ’11
• Dynamic symbolic execution techniques
FUTURE WORK
• Additional quality metrics that:
  • consider additional aspects of privacy loss
  • consider the relative sensitivity of different inputs
  • are intuitive and easy to use
• Conduct additional (human) studies
  • additional (larger) subjects
• Investigate the combination of anonymization and minimization
SUMMARY
1. An approach for automatically anonymizing failure-inducing inputs
   • extends Castro and colleagues’ technique through the novel concepts of path condition relaxation and breakable input conditions
2. An empirical evaluation that demonstrates that, for the subjects considered, our approach is:
   • feasible: generates anonymized inputs in < 10 minutes
   • effective: anonymized inputs did not contain sensitive information
   • an improvement over the state-of-the-art
QUESTIONS?