camouflage: automated anonymization of field data (icse 2011)

CAMOUFLAGE: AUTOMATED ANONYMIZATION OF FIELD DATA. James Clause (University of Delaware) and Alessandro Orso (Georgia Institute of Technology)

Upload: james-clause

Post on 19-Jun-2015


Page 1: Camouflage: Automated Anonymization of Field Data (ICSE 2011)

CAMOUFLAGE: AUTOMATED ANONYMIZATION OF FIELD DATA

James Clause, University of Delaware

Alessandro Orso, Georgia Institute of Technology

THE BIG PICTURE

• Apple crash reporter
• Windows error reporting
• Ubuntu Apport
• Gnome BugBuddy
• Mozilla / Google Breakpad
• many others

• Chilimbi and colleagues ’09
• Elbaum and Diep ’05
• Hilbert and Redmiles ’00
• Liblit and colleagues ’05
• Pavlopoulou and Young ’99
• many others

PRIVACY CONCERNS

Handling concerns in practice

• Ignore them

• Privacy policies

• Collect limited amounts of information
  • less likely to be sensitive
  • can rely on user checking

PRIVACY CONCERNS

Unfortunately:

Register values
Stack dumps
Branch profiles
Path profiles
Test cases

Usefulness
Privacy concerns

GOAL: Enable the collection of detailed information while reducing or eliminating privacy concerns.

OUTLINE

• Intuition
• Castro and colleagues’ technique
• Our improvements
  • Path condition relaxation
  • Breakable input conditions
• Evaluation
• Related work
• Conclusions and future work

INTUITION

Input domain
• Sensitive input (I) that causes F
• Inputs that cause F
• Inputs that satisfy F’s path condition
• Anonymized input (I’) that also causes F

CASTRO AND COLLEAGUES’ TECHNIQUE (PATH CONDITION GENERATION)

Path condition: set of constraints on a program’s inputs that encode the conditions necessary for a specific path to be executed.

boolean foo(int x, int y, int z) {
  if (x <= 5) {
    int a = x * 2;
    if (y + a > 10) {
      if (z == 0) {
        return true;
      }
    }
  }
  return false;
}

Inputs (sensitive): 5 3 0

Symbolic State:
x→i1
y→i2
z→i3
a→i1*2

Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0
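The walkthrough above can be sketched in Java by instrumenting foo by hand. This is only an illustration under assumptions: the class name `PathConditionSketch` and the method `pathCondition` are hypothetical, the constraints are recorded as plain strings, and nothing here reflects how Castro and colleagues' tool is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-instrumented version of foo (not the authors' tool).
final class PathConditionSketch {

    // Follows the same branches as foo on concrete values and records, for each
    // branch taken, the constraint over the symbolic inputs i1, i2, i3.
    static List<String> pathCondition(int x, int y, int z) {
        List<String> pc = new ArrayList<>();
        if (x <= 5) {
            pc.add("i1 <= 5");
            int a = x * 2; // symbolic state: a -> i1*2
            if (y + a > 10) {
                pc.add("i2+i1*2 > 10");
                if (z == 0) {
                    pc.add("i3 == 0");
                } else {
                    pc.add("i3 != 0");
                }
            } else {
                pc.add("i2+i1*2 <= 10");
            }
        } else {
            pc.add("i1 > 5");
        }
        return pc;
    }

    public static void main(String[] args) {
        // The sensitive input from the slides: x=5, y=3, z=0.
        System.out.println(String.join(" AND ", pathCondition(5, 3, 0)));
    }
}
```

For the input 5 3 0 the recorded constraints match the three conjuncts of the path condition on the slide; an input that leaves the path early, such as x = 6, yields a shorter condition.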

CASTRO AND COLLEAGUES’ TECHNIQUE (CHOOSING ANONYMIZED INPUTS)

Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0

→ Constraint Solver →

i1 == 5
i2 == 3
i3 == 0

OUR IMPROVEMENTS

Increase the number of possible choices for I’

Choose I’ such that it is as different as possible from I

PATH CONDITION RELAXATION

Input domain
• Sensitive input (I) that causes F

PATH CONDITION RELAXATION

1. Array inequalities
2. Switch statements
3. Multi-clause conditionals
4. Array reads

1. Array inequalities

x.equals(y);

x = abc, y = abd

Traditional:
x0 == y0
∧ x1 == y1
∧ x2 != y2

Relaxed:
x0 != y0
∨ x1 != y1
∨ x2 != y2
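A small Java sketch of the difference between the two constraints, assuming three-character strings as in the abc / abd example. The class `ArrayInequalitySketch` and its method names are hypothetical and only encode the two formulas from the slide.

```java
// Hypothetical sketch of the array-inequality relaxation for 3-character strings.
final class ArrayInequalitySketch {

    // Traditional constraint: the strings must differ exactly where the
    // observed run first saw a difference (index 2), with an equal prefix.
    static boolean traditional(String x, String y) {
        return x.length() == 3 && y.length() == 3
                && x.charAt(0) == y.charAt(0)
                && x.charAt(1) == y.charAt(1)
                && x.charAt(2) != y.charAt(2);
    }

    // Relaxed constraint: a difference at any index is enough for
    // x.equals(y) to take the same (false) outcome.
    static boolean relaxed(String x, String y) {
        return x.length() == 3 && y.length() == 3
                && (x.charAt(0) != y.charAt(0)
                    || x.charAt(1) != y.charAt(1)
                    || x.charAt(2) != y.charAt(2));
    }

    public static void main(String[] args) {
        System.out.println(traditional("xyz", "abd")); // rejected by the traditional constraint
        System.out.println(relaxed("xyz", "abd"));     // accepted by the relaxed constraint
    }
}
```

Every pair accepted by the traditional constraint is also accepted by the relaxed one, so relaxation only enlarges the set of candidate anonymized inputs.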

PATH CONDITION RELAXATION

2. Switch statements

switch(x) {
  case 1: ... break;
  case 3:
  case 5: ... break;
  default: ...
}

Input: 5
Traditional: x == 5
Relaxed: x == 5 ∨ x == 3

Input: 10
Traditional: x == 10
Relaxed: x != 1 ∧ x != 3 ∧ x != 5
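The switch example can be expressed as a short Java sketch: the relaxed constraint accepts any value that reaches the same arm as the observed input. `SwitchRelaxationSketch`, `arm`, `traditional`, and `relaxed` are hypothetical names, and the integer arm labels are an assumption made for illustration.

```java
// Hypothetical sketch of switch-statement relaxation for the slide's switch.
final class SwitchRelaxationSketch {

    // Which arm a value reaches: 0 = case 1, 1 = cases 3 and 5, 2 = default.
    static int arm(int x) {
        switch (x) {
            case 1:
                return 0;
            case 3:
            case 5:
                return 1;
            default:
                return 2;
        }
    }

    // Traditional constraint: the new value must equal the observed one.
    static boolean traditional(int observed, int x) {
        return x == observed;
    }

    // Relaxed constraint: the new value only has to reach the same arm.
    static boolean relaxed(int observed, int x) {
        return arm(x) == arm(observed);
    }

    public static void main(String[] args) {
        System.out.println(relaxed(5, 3));   // 3 shares an arm with 5
        System.out.println(relaxed(10, 42)); // 42 also falls through to default
    }
}
```

For the observed input 5 this accepts exactly 3 and 5; for the observed input 10 it accepts every value other than 1, 3, and 5, matching the two relaxed formulas above.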


BREAKABLE INPUT CONDITIONS

Original input: 5 3 0

Path Condition:
i1 <= 5
∧ i2+i1*2 > 10
∧ i3 == 0

Without breakable input conditions, the constraint solver may return the original input:
i1 == 5
i2 == 3
i3 == 0

Breakable Input Condition:
i1 != 5
∧ i2 != 3
∧ i3 != 0

→ Constraint Solver →

i1 == 4
i2 == 10
i3 == 0

(i3 != 0 conflicts with the path condition, so that breakable condition is broken.)
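A minimal Java sketch of the idea, assuming an exhaustive search over a small integer range as a stand-in for a real constraint solver. The path condition is treated as a hard constraint, and each breakable condition (input differs from the original) is a soft constraint to satisfy when possible. `BreakableConditionsSketch`, `softSatisfied`, and `anonymize` are hypothetical names.

```java
// Hypothetical sketch: hard path condition plus soft "differ from original" conditions.
final class BreakableConditionsSketch {

    // Hard constraint: the path condition of foo from the slides.
    static boolean pathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Number of breakable conditions (candidate[k] != original[k]) that hold.
    static int softSatisfied(int[] candidate, int[] original) {
        int count = 0;
        for (int k = 0; k < original.length; k++) {
            if (candidate[k] != original[k]) {
                count++;
            }
        }
        return count;
    }

    // Brute-force stand-in for a solver: among all inputs in [lo, hi]^3 that
    // satisfy the path condition, return one that keeps the most breakable
    // conditions. Returns null if no input in the range satisfies the path condition.
    static int[] anonymize(int[] original, int lo, int hi) {
        int[] best = null;
        int bestScore = -1;
        for (int i1 = lo; i1 <= hi; i1++) {
            for (int i2 = lo; i2 <= hi; i2++) {
                for (int i3 = lo; i3 <= hi; i3++) {
                    if (!pathCondition(i1, i2, i3)) {
                        continue;
                    }
                    int[] candidate = {i1, i2, i3};
                    int score = softSatisfied(candidate, original);
                    if (score > bestScore) {
                        best = candidate;
                        bestScore = score;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] result = anonymize(new int[] {5, 3, 0}, -20, 20);
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```

The value found depends on the search order and range, so it need not equal the slide's 4 10 0; what matters is that it satisfies the path condition while differing from the original input in i1 and i2.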

ASSUMPTIONS

1. The failure f is observable and can be detected with an assertion.
   ‣ common to all debugging techniques; holds in most, if not all, cases.

2. Any input that satisfies the path condition results in f.
   • Non-determinism
     ‣ common to all debugging techniques; requires a deterministic replay mechanism
   • Implicit checks (e.g., division by zero)
     ‣ likely that they do not involve relevant inputs
     ‣ make implicit checks explicit (e.g., 100/x → assert x != 0)

EVALUATION

1. Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce a failure?

2. Strength: How much information about the original inputs is revealed?

3. Effectiveness: Are the anonymized inputs safe to send to developers?

4. Improvement: Does the use of path condition relaxation and breakable input conditions provide any benefits over the basic approach?

PROTOTYPE IMPLEMENTATION

• Path condition: Java Pathfinder
• Breakable input condition: Java Pathfinder
• Constraint solver: Yices
• Executable inputs: Ruby scripts

SUBJECTS

• Columba: 1 fault
• htmlparser: 1 fault
• Printtokens: 2 faults
• NanoXML: 16 faults

(20 faults, total)

Select sensitive failure-inducing inputs
• 170 total inputs
• manually generated or included with subject
• 100 bytes to 5MB in size

(Assume all of each input is potentially sensitive)

RQ1: FEASIBILITY

Chart: average execution time (s) and average solver time (s) for each fault (columba, htmlparser, printtokens 1 and 2, nanoxml 1 to 16)

Inputs can be anonymized in a reasonable amount of time (easily done overnight)

RQ2: STRENGTH

Average % Bits Revealed
Measures how many inputs satisfy the path condition
• many satisfying inputs: little information revealed
• few satisfying inputs: lots of information revealed

Average % Residue
Measures how much of the anonymized input is identical to the original input

I: AAAAAAsecretAAAAAA...AAAAAA
I’: BBBBBBsecretBBBBBB...BBBBBB
Lots of information revealed
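The residue idea can be sketched in Java as a position-by-position comparison. This is an assumption made for illustration: the slides only say that residue measures how much of the anonymized input is identical to the original, so the exact definition used in the evaluation may differ. `ResidueSketch` and `residuePercent` are hypothetical names.

```java
// Hypothetical sketch: residue as the share of positions left unchanged.
final class ResidueSketch {

    // Percentage of positions of the original input whose character is
    // identical in the anonymized input. Positions beyond the shorter
    // string count as changed. Returns 0 for an empty original.
    static double residuePercent(String original, String anonymized) {
        if (original.isEmpty()) {
            return 0.0;
        }
        int shared = Math.min(original.length(), anonymized.length());
        int same = 0;
        for (int k = 0; k < shared; k++) {
            if (original.charAt(k) == anonymized.charAt(k)) {
                same++;
            }
        }
        return 100.0 * same / original.length();
    }

    public static void main(String[] args) {
        // Shortened version of the slide's example: only "secret" survives.
        System.out.println(residuePercent("AAAAAAsecretAAAAAA", "BBBBBBsecretBBBBBB"));
    }
}
```

In the shortened example, 6 of 18 positions are unchanged, and those 6 positions are exactly the sensitive part, which is why a residue measure is useful alongside bits revealed.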

RQ2: STRENGTH

Chart: average % bits revealed and average % residue for each fault (columba, htmlparser, printtokens 1 and 2, nanoxml 1 to 16)

Anonymized inputs reveal, on average, between 60% (worst case) and 2% (best case) of the information in the original inputs

RQ3: EFFECTIVENESS
NANOXML

Original input:

<!DOCTYPE Foo [
   <!ELEMENT Foo (ns:Bar)>
   <!ATTLIST Foo
       xmlns CDATA #FIXED 'http://nanoxml.n3.net/bar'
       a     CDATA #REQUIRED>

   <!ELEMENT ns:Bar (Blah)>
   <!ATTLIST ns:Bar
       xmlns:ns CDATA #FIXED 'http://nanoxml.n3.net/bar'>

   <!ELEMENT Blah EMPTY>
   <!ATTLIST Blah
       x    CDATA #REQUIRED
       ns:x CDATA #REQUIRED>
]>
<!-- comment -->
<Foo a='very' b='secret' c='stuff'>vaz
   <ns:Bar>
       <Blah x="1" ns:x="2"/>
   </ns:Bar>
</Foo>

Anonymized input:

<!DOCTYPE [
   <! >
   <!ATTLIST
         #FIXED ' '
        >

   <!E >
   <!ATTLIST
        #FIXED ' '>

   <!E >
   <!ATTLIST
        #
        : # >
]>
<!-- -->
< =' ' =' ' =' '>
   < : >
       < =" " : =" "/>
   </ :

RQ3: EFFECTIVENESS
COLUMBA

Original input:

Wayne,Bartley,Bartley,Wayne,[email protected],,
Ronald,Kahle,Kahle,Ron,[email protected],,
Wilma,Lavelle,Lavelle,Wilma,,[email protected],
Jesse,Hammonds,Hammonds,Jesse,,[email protected],
Amy,Uhl,Uhl,Amy,[email protected],[email protected],
Hazel,Miracle,Miracle,Hazel,[email protected],,
Roxanne,Nealy,Nealy,Roxie,,[email protected],
Heather,Kane,Kane,Heather,[email protected],,
Rosa,Stovall,Stovall,Rosa,,[email protected],
Peter,Hyden,Hyden,Pete,,[email protected],
Jeffrey,Wesson,Wesson,Jeff,[email protected],,
Virginia,Mendoza,Mendoza,Ginny,[email protected],,
Richard,Robledo,Robledo,Ralph,[email protected],,
Edward,Blanding,Blanding,Ed,,[email protected],
Sean,Pulliam,Pulliam,Sean,[email protected],,
Steven,Kocher,Kocher,Steve,[email protected],,
Tony,Whitlock,Whitlock,Tony,,[email protected],
Frank,Earl,Earl,Frankie,,,
Shelly,Riojas,Riojas,Shelly,[email protected],,

Anonymized input:

, , , , ,, , , , , ,, , , , ,, , , , , ,, , , , , , , , , , , , ,, , , , ,, , , , , , ,, , , , ,, , , , , ,, , , , , , ,, , , , , ,, , , , , ,, , , , ,, , , , , , ,, , , , , ,, , , , ,, , , , , ,,,

RQ3: EFFECTIVENESS
HTMLPARSER

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>

<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/

body { margin: 0px;
...

/*]]>*/-->
</style>
</head>
<body> ...</body>

The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers

RQ4: IMPROVEMENT

Chart: % improvement in bits revealed and % improvement in residue for each fault (columba, htmlparser, printtokens 1 and 2, nanoxml 1 to 16)

Inputs anonymized using our improvements reveal an average of 30% fewer bits of information and 40% less residue. (With only a marginal increase in time.)

RELATED WORK

• Castro and colleagues ’08

• Broadwell and colleagues ’03
  • Uses dynamic taint analysis to identify and remove sensitive information in crash dumps.

• Wang and colleagues ’08
  • Uses a combination of dynamic taint analysis, symbolic execution and Q/A with a client machine to construct anonymized inputs

• Data set anonymization techniques (e.g., k-anonymization)
  • Budi and colleagues ’11
  • Grechanik and colleagues ’11

• Dynamic symbolic execution techniques

FUTURE WORK

• Additional quality metrics that:
  • consider additional aspects of privacy loss
  • consider the relative sensitivity of different inputs
  • are intuitive and easy to use

• Conduct additional (human) studies
  • additional (larger) subjects

• Investigate the combination of anonymization and minimization

SUMMARY

1. An approach for automatically anonymizing failure-inducing inputs
   • extends Castro and colleagues’ technique through the novel concepts of path condition relaxation and breakable input conditions

2. An empirical evaluation that demonstrates that, for the subjects considered, our approach is:
   • feasible: generates anonymized inputs in < 10 minutes
   • effective: anonymized inputs did not contain sensitive information
   • an improvement over the state of the art


QUESTIONS?