![Page 1: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/1.jpg)
On the Semantic Patterns of Passwords and their Security Impact
RAFAEL VERAS, CHRISTOPHER COLLINS, JULIE THORPE
UNIVERSITY OF ONTARIO INSTITUTE OF TECHNOLOGY
PRESENTER: KYLE WALLACE
![Page 2: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/2.jpg)
A Familiar Scenario…
User Name: CoolGuy90
Password:
“What should I pick as my new password?”
![Page 3: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/3.jpg)
A Familiar Scenario…
“Musical!Snowycat90”
![Page 4: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/4.jpg)
A Familiar Scenario…
But how secure is “Musical!Snowycat90” really? (18 characters)
◦ “Musical”: dictionary word, possibly related to a hobby
◦ “!”: filler character
◦ “Snowy”: dictionary word, an attribute of “cat”
◦ “cat”: dictionary word, an animal, possibly a pet
◦ “90”: number, possibly a truncated birth year
15 of the 18 characters belong to dictionary words! Why do we pick the passwords that we do?
![Page 5: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/5.jpg)
Password Patterns?
“Even after half a century of password use in computing, we still do not have a deep understanding of how people create their passwords” –Authors
Are there ‘meta-patterns’ or preferences that can be observed across how people choose their passwords?
Do these patterns/preferences have an impact on security?
![Page 6: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/6.jpg)
Contributions
◦ Use NLP to segment, classify, and generalize semantic categories
◦ Describe the most common semantic patterns in the RockYou database
◦ A PCFG that captures structural, semantic, and syntactic patterns
◦ Evaluation of security impact, comparison with previous studies
![Page 7: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/7.jpg)
![Page 8: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/8.jpg)
Segmentation
Decomposition of passwords into their constituent parts
◦ Passwords (usually) contain no whitespace characters
◦ Passwords contain filler characters (“gaps”) between segments
Ex: crazy2duck93^ → word segments {crazy, duck} and gap segments {2, 93^}
Issue: what about strings that can be parsed in more than one way?
![Page 9: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/9.jpg)
Coverage
Prefer segmentations that cover more of the password with words, leaving fewer and smaller gaps
Ex: Anyonebarks98 (13 characters long)
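The coverage preference can be sketched as a dynamic program over character positions. This is a minimal sketch, assuming a plain set of lowercase words as the vocabulary (a stand-in for the source corpora) and breaking ties in favor of fewer segments; the function name `segment` and the tie-break rule are illustrative assumptions, not the authors' exact splitting algorithm. The slides also list a reference corpus for splitting, which this sketch ignores.

```python
from __future__ import annotations

from functools import lru_cache


def segment(password: str, vocab: set[str]) -> list[tuple[str, bool]]:
    """Split `password` into (segment, is_word) pairs.

    Maximizes the number of characters covered by vocabulary words, so the
    gaps are as few and as small as possible; ties go to the split with
    fewer segments. Matching is case-insensitive.
    """
    text = password.lower()
    n = len(text)

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[int, int, tuple[tuple[int, int, bool], ...]]:
        # Score for text[i:] as (-covered_chars, segment_count, spans);
        # tuple ordering lets min() pick the preferred split.
        if i == n:
            return (0, 0, ())
        # Option A: treat text[i] as a one-character gap.
        neg_cov, count, spans = best(i + 1)
        options = [(neg_cov, count + 1, ((i, i + 1, False),) + spans)]
        # Option B: any vocabulary word that starts at position i.
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                neg_cov, count, spans = best(j)
                options.append((neg_cov - (j - i), count + 1, ((i, j, True),) + spans))
        return min(options)

    # Merge runs of one-character gaps into a single gap segment.
    result: list[tuple[str, bool]] = []
    for start, end, is_word in best(0)[2]:
        piece = password[start:end]
        if result and not is_word and not result[-1][1]:
            result[-1] = (result[-1][0] + piece, False)
        else:
            result.append((piece, is_word))
    return result


vocab = {"a", "any", "one", "anyone", "on", "ark", "bark", "barks"}
print(segment("Anyonebarks98", vocab))
```

Here "anyone" + "barks" and "any" + "one" + "barks" cover the same 11 characters, so the tie-break picks the two-word split and leaves "98" as the only gap.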
![Page 10: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/10.jpg)
Splitting Algorithm
Source corpora: raw word list
◦ Taken from COCA (Corpus of Contemporary American English)
Trimmed version of COCA:
◦ 3-letter words: frequency of 100+
◦ 2-letter words: top 37
◦ 1-letter words: a, I
Also collected lists of names, cities, surnames, months, and countries
![Page 11: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/11.jpg)
Splitting Algorithm
Reference corpus: collection of N-grams, where N=3 (full COCA)
◦ N-gram: a sequence of N tokens (words)
Ex: “I love my cats”
◦ Unigrams: I, love, my, cats (4)
◦ Bigrams: I love, love my, my cats (3)
◦ Trigrams: I love my, love my cats (2)
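The n-gram counts in the example can be reproduced directly, and a reference corpus can be stored as a table of n-gram frequencies. This is a minimal sketch; `ngrams` and `reference_counts` are hypothetical helpers, and the toy sentences stand in for the full COCA.

```python
from __future__ import annotations

from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Every run of n consecutive tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def reference_counts(sentences: list[str], max_n: int = 3) -> Counter:
    """Count all 1..max_n-grams in a toy corpus of whitespace-tokenized text."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


tokens = "I love my cats".split()
for n in (1, 2, 3):
    # 4 unigrams, 3 bigrams, 2 trigrams
    print(n, len(ngrams(tokens, n)), ngrams(tokens, n))
```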
![Page 12: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/12.jpg)
Common Words
![Page 13: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/13.jpg)
Part-of-Speech Tagging
Necessary step for semantic classification
◦ Ex: “love” is a noun (my true love) and a verb (I love cats)
Given a password's segments, returns a part-of-speech tag for each word segment; gap segments are not tagged
![Page 14: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/14.jpg)
Semantic Classification
Assigns a semantic class to each password segment
◦ Only assigned to nouns and verbs
WordNet: a graph of concepts, each expressed as a set of synonyms
◦ “Synsets” are arranged into hierarchies, with more general concepts at the top
Fall back to the source corpora for proper nouns
◦ Tag with female name, male name, surname, country, or city
![Page 15: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/15.jpg)
Semantic Classification
Tags are represented as word.pos.#, where # is the WordNet sense number (e.g., puppy.n.01)
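The classification rules above (nouns and verbs go to WordNet, proper nouns fall back to the source corpora) can be sketched as a lookup. The `SENSES` and `PROPER_NOUN_LISTS` tables are hypothetical stand-ins for WordNet and the collected name and place lists, and the simplified POS labels ("n", "v", "np") are assumptions; a real system would also have to choose the right sense for each word.

```python
from __future__ import annotations

# Hypothetical stand-ins: a real system would query WordNet and the
# collected name/place lists instead of these tiny tables.
SENSES = {
    ("love", "v"): "love.v.01",
    ("love", "n"): "love.n.01",
    ("cat", "n"): "cat.n.01",
    ("puppy", "n"): "puppy.n.01",
}
PROPER_NOUN_LISTS = {
    "female name": {"susie", "jessica"},
    "male name": {"bob", "michael"},
    "surname": {"smith"},
    "country": {"canada"},
    "city": {"toronto"},
}


def classify(segment: str, pos: str) -> str | None:
    """Semantic tag for one tagged word segment, or None if it gets none.

    `pos` is a simplified tag: "n" noun, "v" verb, "np" proper noun;
    anything else (pronouns, numbers, ...) stays unclassified.
    """
    word = segment.lower()
    if pos in ("n", "v"):
        # word.pos.# tag; the sense number is simply pre-chosen here.
        return SENSES.get((word, pos))
    if pos == "np":
        # Proper nouns: fall back to the source-corpus lists.
        for category, names in PROPER_NOUN_LISTS.items():
            if word in names:
                return category
    return None
```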
![Page 16: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/16.jpg)
Semantic Generalization
Where in the synset hierarchy should we represent a word?
Use a tree cut model on the synset tree
◦ Goal: balance parameter description length against data description length (minimize their sum)
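A tree cut is a set of nodes (classes) that covers every leaf exactly once. One common MDL formulation, assumed here, charges ½ · log2|S| bits per class in the cut (parameter description length) plus −Σ f(w) · log2 P(w) bits for the data, where |S| is the total count and each class's relative frequency is shared equally among its leaf words. Because both terms add up class by class, the cheapest cut can be found bottom-up. The tree, the counts, and the helper names are hypothetical, and the weight W from the figure is not modeled.

```python
from __future__ import annotations

import math


def leaves(tree: dict[str, list[str]], node: str) -> list[str]:
    """All leaf words below `node` (a node with no children is a leaf)."""
    children = tree.get(node, [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(tree, child)]


def tree_cut(tree: dict[str, list[str]], freq: dict[str, int], root: str) -> list[str]:
    """Choose the cut (list of classes) with minimum total description length.

    Assumes the total count is positive.
    """
    sample = sum(freq.values())
    param_cost = 0.5 * math.log2(sample)  # bits per class in the cut

    def class_cost(node: str) -> float:
        words = leaves(tree, node)
        class_freq = sum(freq.get(w, 0) for w in words)
        data_cost = 0.0
        if class_freq:
            # The class's probability is shared equally by its leaf words.
            p_word = class_freq / (sample * len(words))
            data_cost = -sum(freq.get(w, 0) * math.log2(p_word) for w in words)
        return param_cost + data_cost

    def best(node: str) -> tuple[float, list[str]]:
        keep = (class_cost(node), [node])
        children = tree.get(node, [])
        if not children:
            return keep
        split_cost, split_cut = 0.0, []
        for child in children:
            cost, cut = best(child)
            split_cost += cost
            split_cut += cut
        return keep if keep[0] <= split_cost else (split_cost, split_cut)

    return best(root)[1]


tree = {"entity": ["animal", "food"], "animal": ["cat", "dog"], "food": ["pizza", "cake"]}
freq = {"cat": 50, "dog": 50, "pizza": 99, "cake": 1}
print(tree_cut(tree, freq, "entity"))  # ['animal', 'pizza', 'cake']
```

On this toy tree, `animal` is kept as one class because cat and dog are equally frequent, while `food` is split because pizza dominates its counts.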
![Page 17: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/17.jpg)
W=1000 (gold), W=5000 (red), W=10000 (blue)
![Page 18: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/18.jpg)
![Page 19: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/19.jpg)
Classification
The RockYou leak (2009) contained over 32 million passwords
The effect of generalization can be seen in a few cases (in blue)
◦ Some generalizations are better than others (Ex: ‘looted’ vs ‘bravo100’)
Some synsets are not generalized (in red)
◦ Ex: puppy.n.01 -> puppy.n.01
![Page 20: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/20.jpg)
Summary of Categories (numbers are ranks in the Top 100 Semantic Categories)
◦ Love (6, 7)
◦ Places (3, 13)
◦ Sexual Terms (29, 34, 54, 69)
◦ Royalty (25, 59, 60)
◦ Profanity (40, 70, 72)
◦ Animals (33, 36, 37, 92, 96, 100)
◦ Food (61, 66, 76, 82, 93)
◦ Alcohol (39)
◦ Money (46, 74)
*Some categories expanded from two-letter acronyms
+Some categories contain noise from the names dictionary
![Page 21: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/21.jpg)
Top 100 Semantic Categories
![Page 22: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/22.jpg)
![Page 23: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/23.jpg)
Probabilistic Context-Free Grammar
A CFG whose productions have associated probabilities, defined by:
◦ A vocabulary set $\Sigma$ (terminals)
◦ A variable set $V$ (non-terminals)
◦ A start variable $S \in V$
◦ A set of rules $R$ (over terminals + non-terminals)
◦ A set of probabilities on rules, such that for every non-terminal $A$, $\sum_{\alpha} P(A \rightarrow \alpha) = 1$
![Page 24: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/24.jpg)
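Semantic PCFG
In the authors' PCFG:
◦ $\Sigma$ (terminals) consists of the source corpora and learned gap segments
◦ $V$ (non-terminals) is the set of all semantic and syntactic categories
◦ All rules either rewrite the start symbol as a sequence of non-terminals, $S \rightarrow N_1 N_2 \cdots N_k$ with $N_i \in V$, or rewrite a non-terminal as a terminal, $N \rightarrow \alpha$ with $\alpha \in \Sigma$
This grammar is regular (it can be described by a finite automaton)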
![Page 25: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/25.jpg)
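Sample PCFG
Training data:
◦ iloveyou2
◦ ihatedthem3
◦ football3
Rules of the form $S \rightarrow \dots$ (rewriting the start symbol) are base structures
The grammar can generate passwords that are not in the training data
The probability of a password is the product of the probabilities of all rules used to derive it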
Ex: P(youlovethem2) = 0.0103125
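The product rule can be checked on a toy grammar. This sketch assumes a simplified, hypothetical category set (PRONOUN, VERB, NOUN, NUMBER) with maximum-likelihood probabilities counted from the three training passwords above; the authors' semantic categories differ, so it does not reproduce the 0.0103125 on the slide.

```python
from __future__ import annotations

# Hypothetical grammar: maximum-likelihood estimates from the three training
# passwords, but with a simplified category set (not the authors').
BASE_STRUCTURES = {
    ("PRONOUN", "VERB", "PRONOUN", "NUMBER"): 2 / 3,  # iloveyou2, ihatedthem3
    ("NOUN", "NUMBER"): 1 / 3,                        # football3
}
TERMINALS = {
    "PRONOUN": {"i": 2 / 4, "you": 1 / 4, "them": 1 / 4},
    "VERB": {"love": 1 / 2, "hated": 1 / 2},
    "NOUN": {"football": 1.0},
    "NUMBER": {"2": 1 / 3, "3": 2 / 3},
}


def is_normalized(tol: float = 1e-9) -> bool:
    """Rule probabilities for each left-hand side must sum to 1."""
    tables = [BASE_STRUCTURES, *TERMINALS.values()]
    return all(abs(sum(t.values()) - 1.0) < tol for t in tables)


def password_probability(structure: tuple[str, ...], segments: list[str]) -> float:
    """P(S -> structure) times P(N -> segment) for each slot."""
    if len(structure) != len(segments):
        return 0.0
    p = BASE_STRUCTURES.get(structure, 0.0)
    for nonterminal, segment in zip(structure, segments):
        p *= TERMINALS.get(nonterminal, {}).get(segment, 0.0)
    return p


p = password_probability(("PRONOUN", "VERB", "PRONOUN", "NUMBER"), ["you", "love", "them", "2"])
print(p)  # 2/3 * 1/4 * 1/2 * 1/4 * 1/3 = 1/144, about 0.00694
```

youlovethem2 never appears in the training data, yet the grammar assigns it a nonzero probability.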
![Page 26: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/26.jpg)
RockYou Base Structures (Top 50)
![Page 27: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/27.jpg)
![Page 28: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/28.jpg)
Building a Guess Generator
Cracking attacks consist of three steps:
◦ Generate a guess
◦ Hash the guess using the same algorithm as the target
◦ Check for matches in the target database
Most popular methods (using the John the Ripper program):
◦ Word lists (from previous breaches)
◦ Brute force (usually after exhausting word lists)
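The generate/hash/check loop can be sketched as below. It assumes unsalted SHA-1 digests purely for illustration (the real hash depends on the target), and `crack` is a hypothetical helper, not John the Ripper's interface.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable


def crack(guesses: Iterable[str], target_hashes: set[str]) -> dict[str, str]:
    """Return {hash: plaintext} for every target hash matched by a guess."""
    cracked: dict[str, str] = {}
    for guess in guesses:                                         # 1. generate a guess
        digest = hashlib.sha1(guess.encode("utf-8")).hexdigest()  # 2. hash it
        if digest in target_hashes and digest not in cracked:     # 3. check for a match
            cracked[digest] = guess
        if len(cracked) == len(target_hashes):
            break  # every target has been cracked
    return cracked
```

Any iterable of guesses works here, so a word list, a brute-force enumerator, or a grammar-based generator can be plugged in unchanged.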
![Page 29: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/29.jpg)
Guess Generator
At a high level:
◦ Outputs guesses (combinations of terminals) in highest-probability order
◦ Iteratively replaces higher-probability terminals with lower-probability ones
◦ Uses a priority queue to maintain order
Will this produce the same list of guesses every time?
![Page 30: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/30.jpg)
Guess Generator Example
Suppose there is only one base structure:
Initialized with the most probable terminals: “I love Susie’s cat”
Pop the first guess off the queue (“IloveSusiescat”), then queue its successors:
◦ Replace first segment: “youloveSusiescat”
◦ Replace second segment: “IhateSusiescat”
◦ Replace third segment: “IloveBobscat”
◦ Replace fourth segment: “IloveSusiesdog”
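The priority-queue procedure in the example can be sketched as a best-first walk over terminal indices, one index per segment. The probabilities below are made up for illustration, `generate` and `min_prob` are hypothetical names, and the `min_prob` cutoff is one reading of the probability-threshold workaround listed under the approach's memory issues. A seen-set keeps the same combination from being queued twice.

```python
from __future__ import annotations

import heapq
from collections.abc import Iterator
from math import prod


def generate(slots: list[list[tuple[str, float]]],
             min_prob: float = 0.0) -> Iterator[tuple[str, float]]:
    """Yield (guess, probability) for one base structure, most probable first.

    Each slot lists (terminal, probability) pairs sorted by decreasing
    probability. Successors below min_prob are never queued, trading
    completeness for memory.
    """
    def prob(idx: tuple[int, ...]) -> float:
        return prod(slots[s][t][1] for s, t in enumerate(idx))

    start = (0,) * len(slots)          # most probable terminal in every slot
    queue = [(-prob(start), start)]    # heapq is a min-heap, so negate
    queued = {start}
    while queue:
        neg_p, idx = heapq.heappop(queue)
        yield "".join(slots[s][t][0] for s, t in enumerate(idx)), -neg_p
        # Successors: swap one segment for its next, less probable terminal.
        for s in range(len(idx)):
            if idx[s] + 1 < len(slots[s]):
                child = idx[:s] + (idx[s] + 1,) + idx[s + 1:]
                p = prob(child)
                if child not in queued and p >= min_prob:
                    queued.add(child)
                    heapq.heappush(queue, (-p, child))


slots = [
    [("I", 0.6), ("you", 0.4)],
    [("love", 0.7), ("hate", 0.3)],
    [("Susies", 0.55), ("Bobs", 0.45)],
    [("cat", 0.8), ("dog", 0.2)],
]
for guess, p in generate(slots):
    print(f"{p:.4f} {guess}")
```

Because heap ties fall back to comparing index tuples, this particular sketch emits the same order on every run.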
![Page 31: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/31.jpg)
Mangling Rules
Passwords aren’t always strictly lowercase
◦ Beardog123lol
◦ bearDOG123LoL
◦ BearDog123LoL
Three types of rules:
◦ Capitalize the first word segment
◦ Capitalize a whole word segment
◦ CamelCase on all segments
Any others?
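The three rule types can be sketched over an already-segmented guess. This assumes that alphabetic segments are word segments and reads "capitalize a whole word segment" as upper-casing one whole word segment at a time; `mangle` is a hypothetical helper, and its output is not meant to match the slide's examples character for character.

```python
from __future__ import annotations


def is_word(segment: str) -> bool:
    # Assumption: alphabetic segments are word segments, the rest are gaps.
    return segment.isalpha()


def mangle(segments: list[str]) -> list[str]:
    """Lower-case guess plus variants from the three rule types, no repeats."""
    words = [i for i, s in enumerate(segments) if is_word(s)]
    variants = ["".join(segments)]
    # Rule 1: capitalize the first word segment.
    if words:
        first = words[0]
        variants.append("".join(s.capitalize() if i == first else s
                                for i, s in enumerate(segments)))
    # Rule 2: upper-case one whole word segment (one variant per segment).
    for k in words:
        variants.append("".join(s.upper() if i == k else s
                                for i, s in enumerate(segments)))
    # Rule 3: CamelCase every word segment.
    variants.append("".join(s.capitalize() if is_word(s) else s for s in segments))
    return list(dict.fromkeys(variants))  # drop repeats, keep order


print(mangle(["bear", "dog", "123", "lol"]))
```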
![Page 32: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/32.jpg)
Comparison to Weir Approach
The authors' approach can be seen as an evolution of Weir's
◦ Weir contains far fewer non-terminals (less precise estimates)
◦ Weir does not learn semantic rules (fewer overall terminals)
◦ Weir treats grammar and dictionary input separately
◦ The authors' semantic classification must be re-run whenever the input changes
![Page 33: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/33.jpg)
Password Cracking Experiments
Considered 5 methods:
◦ Semantic approach w/o mangling rules
◦ Semantic approach w/ custom mangling rules
◦ Semantic approach w/ JtR’s mangling rules
◦ Weir approach
◦ Wordlist w/ JtR’s default rules + incremental brute force
Attempted to crack the LinkedIn and MySpace leaks
![Page 34: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/34.jpg)
Experiment 1: RockYou vs LinkedIn
5,787,239 unique passwords
Results:
◦ Semantic approaches outperform the non-semantic methods
◦ Weir approach performs worst (the semantic approach shows a 67% improvement over it)
◦ The authors' approach is more robust against differing demographics
![Page 35: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/35.jpg)
Experiment 2: RockYou vs MySpace
41,543 unique passwords
Results:
◦ Semantic approach outperforms all other methods
◦ The no-rules variant performs best
◦ Weir approach performs worst (the semantic approach shows a 32% improvement over it)
◦ Passwords were phished; was their quality lower?
![Page 36: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/36.jpg)
Experiment 3: Maximum Crack Rate
Since the method is grammar-based, a grammar recognizer can check which passwords the grammar could ever guess
Results:
◦ Semantic approach reaches a maximum crack rate equivalent to brute force, with fewer guesses
◦ Weir approach generates fewer guesses, but guesses 30% fewer passwords
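A recognizer answers "could this grammar ever produce this password?" without enumerating guesses, so a maximum crack rate can be read off as the fraction of target passwords it accepts. This minimal sketch assumes a grammar given as a finite list of base structures over finite terminal sets; `recognizes`, `max_crack_rate`, and the toy grammar are hypothetical.

```python
from __future__ import annotations

from functools import lru_cache


def recognizes(password: str, base_structures: list[tuple[str, ...]],
               terminals: dict[str, set[str]]) -> bool:
    """True if some base structure can derive exactly `password`."""

    @lru_cache(maxsize=None)
    def derives(structure: tuple[str, ...], pos: int) -> bool:
        # Can `structure` derive password[pos:]?
        if not structure:
            return pos == len(password)
        head, rest = structure[0], structure[1:]
        return any(password.startswith(t, pos) and derives(rest, pos + len(t))
                   for t in terminals.get(head, ()))

    return any(derives(s, 0) for s in base_structures)


def max_crack_rate(passwords: list[str], base_structures: list[tuple[str, ...]],
                   terminals: dict[str, set[str]]) -> float:
    """Fraction of target passwords the grammar could ever guess."""
    hits = sum(recognizes(p, base_structures, terminals) for p in passwords)
    return hits / len(passwords)


BASES = [("PRONOUN", "VERB", "PRONOUN", "NUMBER"), ("NOUN", "NUMBER")]
TERMS = {"PRONOUN": {"i", "you", "them"}, "VERB": {"love", "hated"},
         "NOUN": {"football"}, "NUMBER": {"2", "3"}}
print(max_crack_rate(["youlovethem2", "football3", "hunter2"], BASES, TERMS))  # 2 of 3
```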
![Page 37: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/37.jpg)
Experiment 3: Time to Maximum Crack
Fit a non-linear regression to a sample of guess probabilities
Results:
◦ Semantic method produces fewer guesses per second
◦ Its grammar is much larger than Weir's
![Page 38: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/38.jpg)
Issues with Semantic Approach
Further study is needed into performance bottlenecks
◦ Though the semantic method is more efficient (more hits per guess)
The approach requires a significant amount of memory
◦ Workaround: a probability threshold for adding guesses to the queue
Duplicates could be produced due to ambiguous splits
◦ Ex: (one, go) vs (on, ego)
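The (one, go) vs (on, ego) case can be reproduced by enumerating every split of a string over a small vocabulary: each split is a different derivation of the same guess. `all_splits` and `unique` are hypothetical helpers; deduplicating with a seen-set, as here, removes the repeats but adds to the memory pressure noted above.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def all_splits(text: str, vocab: set[str]) -> list[list[str]]:
    """Every way to write `text` as a sequence of vocabulary words."""
    if not text:
        return [[]]
    splits = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            splits.extend([head] + tail for tail in all_splits(text[end:], vocab))
    return splits


def unique(guesses: Iterable[str]) -> Iterator[str]:
    """Drop repeated guesses; the seen-set itself costs memory."""
    seen: set[str] = set()
    for guess in guesses:
        if guess not in seen:
            seen.add(guess)
            yield guess


vocab = {"one", "go", "on", "ego"}
splits = all_splits("onego", vocab)
print(splits)                                    # [['on', 'ego'], ['one', 'go']]
print(list(unique("".join(s) for s in splits)))  # ['onego']
```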
![Page 39: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/39.jpg)
Conclusions
There are underlying semantic patterns in password creation
These semantics can be captured in a probabilistic grammar
This grammar can be used to efficiently generate probable passwords
This generator shows (up to) a 67% improvement over previous efforts
![Page 40: On the Semantic Patterns of Passwords and their Security Impact](https://reader035.vdocument.in/reader035/viewer/2022081511/56813add550346895da327cf/html5/thumbnails/40.jpg)
Thank you!
QUESTIONS?