field studies and ecological validity - umiacsmmazurek/818d-s15/slides/06-field.pptx.pdf · why...

1

Field studies and ecological validity

Michelle Mazurek

2

Today’s class

•  Field studies (pluses and minuses)

•  Ecological validity

–  Ethics

•  Crowdsourced studies (MTurk and friends)

•  Project pitches: Next week

3

FIELD STUDIES

4

Why (not) a field study?

•  Better ecological validity–  Validate a lab study result

•  Because you can’t get the data any other way

•  Logistically difficult

•  Limited piloting / not easy to adjust–  One shot at your participant pool

•  Expensive (money and time)Plan extremely carefully!

5

PhishGuru in the real world

•  Anti-phishing training delivered when users follow a phishing link

•  Training, phishing, legitimate emails delivered to 300 employees in a Portuguese company

6

PhishGuru in the real world

•  Was a field study necessary here? Why?–  How could it have been designed differently?

•  What logistical problems were encountered?–  Design choices the authors later regretted?–  How did they threaten the study’s outcome?

7

ECOLOGICAL VALIDITY Case study: Measuring password strength

8

Comm ACM, 1979

Computers and Security,

1989

Passwords research is everywhere

ACIS 2004 (Campbell and Bryant)

CCS 2005 (Narayanan and Shmatikov)

WWW 2007 (Florencio and Herley)

CCS 2010 (Weir et al.)

CHI 2011 (Komanduri et al.)

NDSS 2012 (Castelluccia et al.)

IEEE S&P 2012 (Bonneau)

9

… but good data is hard to find

•  Small data sets

•  Experimental rather than field data

•  Self-reported surveys

•  Leaked data of questionable validity

•  Minimal-value accounts

•  No access to plaintext passwords

Are the results generalizable?

10

Fahl et al.: Password study validity

•  Goal: Compare lab study, online study, real passwords

•  Methods: –  Several thousand passwords (plaintext, anonymized)–  Invite same pool to online or lab study–  Security priming, or not–  Manual analysis for similarity

•  583 online, 63 lab participants

11

Results: Validity

% Online Lab Priming Non Total Highly valid 46 49 47 44 46 Somewhat valid 23 32 24 24 24 Invalid 31 18 29 32 30

•  Overall, experimental data can be useful –  Self-reporting of realistic behavior can help

•  No significant difference due to priming

•  Lab slightly but significantly better than online

12

Critique the study design

•  What was measured?–  Comments about the manual analysis approach?

•  Priming vs. non-priming

–  How are the instructions different in the 2 cases?–  Would you have given different instructions?–  Are there other conditions you would test?

13

Implications of the results

•  Do these findings apply to other studies in the security/privacy area? How?

14

Passwords for an entire university

•  25,000 real, high-value passwords from CMU

•  Contextual data – logs, demographics, survey

•  What factors correlate with password strength?–  New (to passwords) statistical methods–  Find new results, confirm prior results

•  What to do when you don’t have field data?

–  Comparison with leaked and study data

15

What are CMU passwords?

•  25,459 accounts for faculty, staff, and students–  Plus 17,104 deactivated accounts

•  Single-sign-on for email, financial, grades, registration, health, etc.

•  Password requirements:

–  Minimum 8 characters–  Upper, lower, digit, symbol–  Dictionary check (241,497 words)

16

Strength metric: Guessability

•  How many guesses to reach each password?–  Subject to guessing algorithm and training data

•  Result: guess number or beyond the cutoff

–  Cutoff = 380 trillion guesses (runs in about 1 day)

Password Guess number 12345678 4

Password178 1.4 x 106

jn%fKXsl!8@Df Beyond cutoff

Example:

17

Comparing password sets

•  Examining CMU password policy–  Use conforming subset for all leaked data

•  Online studies

–  MTsim: Closest match to real CMU experience–  MTcomp8: Similar password requirements

•  Leaked: plaintext

–  RockYou, Yahoo!, CSDN

•  Leaked: hashed and cracked–  Gawker, StratFor

18

Comparing sets – Guessability

Leaked hashed/cracked: Very easy to guess

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13

Gcomp8SFcomp8MTsimRYcomp8CMUactiveMTcomp8CSDNcomp8Ycomp8

Limited-knowledge Extensive-knowledge

Guess number

Perc

ent g

uess

ed

GawkerStratfor

RockYouMTsimCMUMTcomp8CSDNYahoo

Guess number

Perc

ent g

uess

ed

19


Leaked plaintext: RockYou close to CMU, others much tougher

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13



Guess number

Perc

ent g

uess

ed

Guess number

Perc

ent g

uess

ed

GawkerStratfor


20


100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13

100%

90%

80%

70%

60%

50%

40%

30%

20%

10%

0%1E4 1E7 1E10 1E13



Guess number

Perc

ent g

uess

ed

Online studies: Both close, MTcomp8 closer Guess number

Perc

ent g

uess

ed

GawkerStratfor


21

Other metrics for comparison

•  Composition: length, character classes

•  Structures

•  Entropy (Shay et al., SOUPS 2010)

•  Frequency distribution

22

Comparing sets – Length

9.5 10 10.5 11 11.5

CMUactiveMTsim

MTcomp8 RYcomp8

Ycomp8 CSDNcomp8

SFcomp8 Gcomp8

SVcomp8 MTbasic8

MTdictionary8 MTbasic16

2.3 2.475 2.65 2.825 3 1 1.125 1.25 1.375 1.5 1.1 1.2 1.3 1.4 1.5 1.6 1.7

Length Digits Symbols Uppercase

number of characters number of digits number of symbols number of uppercase lettersOverall: Online studies closest across metrics (Full results in the paper)

CMU MTSim

MTcomp8 RockYou

Yahoo CSDN

Stratfor Gawker Survey

MTbasic8 MTdictionary8

MTbasic16

12.6

8.0

17.9

23

Discussion

•  Critique the study design–  Challenges of field studies

•  Are there lessons for other HFSP studies?

24

Quick note on ethics

•  All three studies we discussed today have significant ethical implications

•  We’ll revisit this in a couple of weeks–  Any comments/questions in the meantime?

33

Homework 2

•  Suggesting study designs–  We’ll talk about more options Tuesday

•  Deploy and analyze an MTurk survey

–  Part, but only part, can be done with partners–  There are 11 of you … potentially one triple

•  Read directions carefully!

34

Project pitches

•  5 min each; slides optional–  What is the research question?–  Preliminary high-level methodology–  Ideally: Quick overview of related work / why it’s novel

•  We’ll vote to narrow down and then form teams

•  Final teams by 2/24

•  Proposals due 3/3–  Two pages; details posted on course website

field studies and ecological validity - umiacsmmazurek/818d-s15/slides/06-field.pptx.pdf · why...

Documents