field studies and ecological validity - umiacsmmazurek/818d-s15/slides/06-field.pptx.pdf · why...
TRANSCRIPT
2
Today’s class
• Field studies (pluses and minuses)
• Ecological validity
– Ethics
• Crowdsourced studies (MTurk and friends)
• Project pitches: Next week
4
Why (not) a field study?
• Better ecological validity– Validate a lab study result
• Because you can’t get the data any other way
• Logistically difficult
• Limited piloting / not easy to adjust– One shot at your participant pool
• Expensive (money and time)Plan extremely carefully!
5
PhishGuru in the real world
• Anti-phishing training delivered when users follow a phishing link
• Training, phishing, legitimate emails delivered to 300 employees in a Portuguese company
6
PhishGuru in the real world
• Was a field study necessary here? Why?– How could it have been designed differently?
• What logistical problems were encountered?– Design choices the authors later regretted?– How did they threaten the study’s outcome?
8
Comm ACM, 1979
Computers and Security,
1989
Passwords research is everywhere
ACIS 2004 (Campbell and Bryant)
CCS 2005 (Narayanan and Shmatikov)
WWW 2007 (Florencio and Herley)
CCS 2010 (Weir et al.)
CHI 2011 (Komanduri et al.)
NDSS 2012 (Castelluccia et al.)
IEEE S&P 2012 (Bonneau)
9
… but good data is hard to find
• Small data sets
• Experimental rather than field data
• Self-reported surveys
• Leaked data of questionable validity
• Minimal-value accounts
• No access to plaintext passwords
Are the results generalizable?
10
Fahl et al.: Password study validity
• Goal: Compare lab study, online study, real passwords
• Methods: – Several thousand passwords (plaintext, anonymized)– Invite same pool to online or lab study– Security priming, or not– Manual analysis for similarity
• 583 online, 63 lab participants
11
Results: Validity
% Online Lab Priming Non Total Highly valid 46 49 47 44 46 Somewhat valid 23 32 24 24 24 Invalid 31 18 29 32 30
• Overall, experimental data can be useful – Self-reporting of realistic behavior can help
• No significant difference due to priming
• Lab slightly but significantly better than online
12
Critique the study design
• What was measured?– Comments about the manual analysis approach?
• Priming vs. non-priming
– How are the instructions different in the 2 cases?– Would you have given different instructions?– Are there other conditions you would test?
13
Implications of the results
• Do these findings apply to other studies in the security/privacy area? How?
14
Passwords for an entire university
• 25,000 real, high-value passwords from CMU
• Contextual data – logs, demographics, survey
• What factors correlate with password strength?– New (to passwords) statistical methods– Find new results, confirm prior results
• What to do when you don’t have field data?
– Comparison with leaked and study data
15
What are CMU passwords?
• 25,459 accounts for faculty, staff, and students– Plus 17,104 deactivated accounts
• Single-sign-on for email, financial, grades, registration, health, etc.
• Password requirements:
– Minimum 8 characters– Upper, lower, digit, symbol– Dictionary check (241,497 words)
16
Strength metric: Guessability
• How many guesses to reach each password?– Subject to guessing algorithm and training data
• Result: guess number or beyond the cutoff
– Cutoff = 380 trillion guesses (runs in about 1 day)
Password Guess number 12345678 4
Password178 1.4 x 106
jn%fKXsl!8@Df Beyond cutoff
Example:
17
Comparing password sets
• Examining CMU password policy– Use conforming subset for all leaked data
• Online studies
– MTsim: Closest match to real CMU experience– MTcomp8: Similar password requirements
• Leaked: plaintext
– RockYou, Yahoo!, CSDN
• Leaked: hashed and cracked– Gawker, StratFor
18
Comparing sets – Guessability
Leaked hashed/cracked: Very easy to guess
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
Gcomp8SFcomp8MTsimRYcomp8CMUactiveMTcomp8CSDNcomp8Ycomp8
Limited-knowledge Extensive-knowledge
Guess number
Perc
ent g
uess
ed
GawkerStratfor
RockYouMTsimCMUMTcomp8CSDNYahoo
Guess number
Perc
ent g
uess
ed
19
Comparing sets – Guessability
Leaked plaintext: RockYou close to CMU, others much tougher
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
Gcomp8SFcomp8MTsimRYcomp8CMUactiveMTcomp8CSDNcomp8Ycomp8
Limited-knowledge Extensive-knowledge
Guess number
Perc
ent g
uess
ed
Guess number
Perc
ent g
uess
ed
GawkerStratfor
RockYouMTsimCMUMTcomp8CSDNYahoo
20
Comparing sets – Guessability
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%1E4 1E7 1E10 1E13
Gcomp8SFcomp8MTsimRYcomp8CMUactiveMTcomp8CSDNcomp8Ycomp8
Limited-knowledge Extensive-knowledge
Guess number
Perc
ent g
uess
ed
Online studies: Both close, MTcomp8 closer Guess number
Perc
ent g
uess
ed
GawkerStratfor
RockYouMTsimCMUMTcomp8CSDNYahoo
21
Other metrics for comparison
• Composition: length, character classes
• Structures
• Entropy (Shay et al., SOUPS 2010)
• Frequency distribution
22
Comparing sets – Length
9.5 10 10.5 11 11.5
CMUactiveMTsim
MTcomp8 RYcomp8
Ycomp8 CSDNcomp8
SFcomp8 Gcomp8
SVcomp8 MTbasic8
MTdictionary8 MTbasic16
2.3 2.475 2.65 2.825 3 1 1.125 1.25 1.375 1.5 1.1 1.2 1.3 1.4 1.5 1.6 1.7
Length Digits Symbols Uppercase
number of characters number of digits number of symbols number of uppercase lettersOverall: Online studies closest across metrics (Full results in the paper)
CMU MTSim
MTcomp8 RockYou
Yahoo CSDN
Stratfor Gawker Survey
MTbasic8 MTdictionary8
MTbasic16
12.6
8.0
17.9
23
Discussion
• Critique the study design– Challenges of field studies
• Are there lessons for other HFSP studies?
24
Quick note on ethics
• All three studies we discussed today have significant ethical implications
• We’ll revisit this in a couple of weeks– Any comments/questions in the meantime?
33
Homework 2
• Suggesting study designs– We’ll talk about more options Tuesday
• Deploy and analyze an MTurk survey
– Part, but only part, can be done with partners– There are 11 of you … potentially one triple
• Read directions carefully!
34
Project pitches
• 5 min each; slides optional– What is the research question?– Preliminary high-level methodology– Ideally: Quick overview of related work / why it’s novel
• We’ll vote to narrow down and then form teams
• Final teams by 2/24
• Proposals due 3/3– Two pages; details posted on course website