Passwords for Humans: A Cultural Approach to Passphrase Wordlist Generation

Skylar Nagao
August 03, 2016

Presented at PasswordsCon16 / BSides Las Vegas by Skylar Nagao and Florencia Herra-Vega.

Overview:

(1) An introduction to passphrases, entropy estimation, and Peerio's need for strong and memorable passwords

(2) A discussion of the problems with existing passphrase wordlists

(3) A proposal for a more human-centered approach to wordlist generation based on culture

(4) Discussion of recent research in the field and suggestions for future work

Transcript

  1. Who we are
    Skylar Nagao, Product Manager at Peerio. E: [email protected] T: @skygao_
    Florencia Herra-Vega, CTO at Peerio. E: fl[email protected] T: @flohdot
  2. The Plan
    The Basics: Peerio, passphrases, and setting standards
    The Problem: Is “hhhh” a word?
    A Cultural Approach: How Seinfeld and GoT can help
    The Future: Some ideas and an open forum
  3. Definitions
    Passphrase: Like a password, but usually longer and composed of words, e.g. “MyP@$$w0rD” vs. “Hi, this is my passphrase.”
  4. Definitions
    Entropy: A lack of predictability. To have more entropy is to have less predictability.
    [Figure labels: Less entropy, More entropy]
  5. Background
    Easy-to-use message and file sharing application
    Cloud-based, end-to-end encrypted
    Users need to be able to log in from any device
  6. Background
    • “Easy to use” = users have to think as little as possible
    • In security, “easy to use” should also mean “hard to screw up”
    • Secure defaults needed, e.g. minimum password strength
  7. Assumptions
    • People cannot, and will not, remember more than a few passwords
    • Password managers are great for most people
    • Critical services should have strong passwords
  8. Modeling threats
    • At what point does trying to crack (guess) a password become unreasonable?
    • How much time & money does your adversary have?
      ◦ $5?
      ◦ $1000?
  9. Attack budgets?
    • FBI paid $1+ million to crack an iPhone
    • In 2013, US intelligence agencies’ total budget was ~$52.6 billion
    • No single agency had more than $15 billion
  10. Attack costs?
    • In 2013, Bitcoin miners performed ~2^75 SHA-256 hashes
    • Collectively received a payout of ~$257 million
    • Researchers estimated a centralized actor would require: [1]
      ◦ $1 million to perform ~2^70 hashes / year
      ◦ $1 billion to perform ~2^80 hashes / year
    [1] Bonneau, J., & Schechter, S. Towards Reliable Storage of 56-bit Secrets in Human Memory.
  11. The $1 Billion Standard
    $1 billion and a full year is probably more money and time than most agencies are willing to spend trying to crack a password.
  12. Why not $1 trillion?
    • ~90 bits of entropy would require ~$1 trillion
    • ~100 bits of entropy would require ~$1 quadrillion
    • Increasing entropy requires longer passwords
    • Stretching keys further affects UX
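
    The dollar figures on these slides follow from one rule of thumb: each extra bit doubles the work. A minimal sketch of that extrapolation, assuming cost scales linearly with hash count and anchoring on the cited estimate of roughly $1 billion for 2^80 hashes per year (the function and constant names are hypothetical):

```python
# Rough extrapolation from the cited estimate of ~$1 billion for
# ~2^80 hashes per year. Assumes cost grows linearly with the number
# of hashes; real hardware and energy economics will differ.
ANCHOR_BITS = 80
ANCHOR_COST_USD = 1e9


def attack_cost_usd(bits: float) -> float:
    """Estimated one-year cost to perform 2**bits hashes."""
    return ANCHOR_COST_USD * 2 ** (bits - ANCHOR_BITS)


for bits in (70, 80, 90, 100):
    print(f"2^{bits} hashes: ~${attack_cost_usd(bits):,.0f}")
```

    Because 2^10 is about 1,000, every 10 bits multiplies the estimate by roughly a thousand, which is where the million, billion, trillion, quadrillion steps come from.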
  13. Peerio’s Entropy Requirement
    • $1B standard requires 81-bit passwords
      ◦ Performing 2^80 hash functions = ~80 bits of entropy
      ◦ 2^81 = a 50% chance to crack at the US$1B cost
    • Key stretching can effectively increase bit-strength in terms of cost
      ◦ Increase the computational cost of each hash function
      ◦ Estimated 13.5+ bits added due to key stretching
    81 - 13.5 = 67.5 bits
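
    The subtraction on this slide can be written out as a short calculation. This is only a sketch using the slide's own numbers; the stretch factor is derived here from the 13.5-bit figure and is not a value given in the deck, and the function names are hypothetical.

```python
TARGET_BITS = 81.0    # search space where 2^80 guesses give a 50% chance
STRETCH_BITS = 13.5   # estimated gain from key stretching, per the slide


def required_passphrase_bits(target_bits: float, stretch_bits: float) -> float:
    """Entropy the passphrase itself must supply once stretching is counted."""
    return target_bits - stretch_bits


def stretch_factor(stretch_bits: float) -> float:
    """How many times costlier each guess must become to add stretch_bits."""
    return 2 ** stretch_bits


print(required_passphrase_bits(TARGET_BITS, STRETCH_BITS))  # bits from the passphrase
print(round(stretch_factor(STRETCH_BITS)))                  # cost multiplier per guess
```

    In other words, adding 13.5 bits through stretching means making each guess on the order of ten thousand times more expensive than a single plain hash.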
  14. Problem: People suck at making passwords
    • One study found 30% of users were using one of the top 10,000 most common passwords [1]
    • Over 200,000 users from the Ashley Madison hack were using these passwords
      ◦ 123456
      ◦ 12345
      ◦ password
    [1] https://xato.net/10-000-top-passwords-6d6380716fe0#.a744cencn
  15. The Entropy in Entropy Estimation
    • Count characters? 111111111 = 26.4 bits?
    • Meet password rules? P@s$W0rD = 52.4 bits?
    • Make it a little longer? 123qweasdzxc = 62 bits?
  16. The Entropy in Entropy Estimation
    • zxcvbn estimate 111111111 = ~6.3 bits (instantly cracked)
    • zxcvbn estimate P@s$W0rD = ~6.3 bits (instantly cracked)
    • zxcvbn estimate 123qweasdzxc = ~20 bits (3 minutes to crack)
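
    The inflated figures come from estimators that only look at length and character classes. A minimal sketch of such a naive meter, assuming the common "length times log2 of alphabet size" rule (the function name is hypothetical). It reproduces the 52.4-bit and 62-bit figures; the deck does not say which meter produced the 26.4-bit figure, so that one is not expected to match.

```python
import math
import string


def naive_entropy_bits(password: str) -> float:
    """Length times log2 of the pooled alphabet size; ignores patterns entirely."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # the 32 ASCII symbols
    return len(password) * math.log2(pool) if pool else 0.0


for candidate in ("P@s$W0rD", "123qweasdzxc"):
    print(candidate, round(naive_entropy_bits(candidate), 1))
```

    A meter like this rewards keyboard walks and leetspeak substitutions, which is exactly what a pattern-aware estimator such as zxcvbn penalizes.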
  17. Our first solution
    “Okay, to compensate for people’s inability to come up with good passwords, let’s just require 100-bit passphrases, give detailed instructions about how to make passphrases, and give a strength meter to help users know when it’s good enough.”
  18. People are also bad at making passphrases
    1. People aren’t used to passphrases
    2. People are still predictable
  19. Your favorite band doesn’t make you unique Bon Jovi and

    Drake may be timeless, but these passphrases probably aren’t
  20. People’s memory is fallible
    This is my super sweet password.
    This is my super sweet password
    this is my super sweet password.
    this is my super sweet password
    This is my super sweet password!
    this is my super sweet password!
  21. Random Generation
    • Provides reliable standards for estimating password entropy
    • User doesn’t have to think
    • Research suggests that users are not significantly more or less likely to remember assigned passwords than chosen ones [1]
    [1] Stobert, E. A. Memorability of Assigned Random Graphical Passwords. 2011
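
    Random generation is what makes the entropy figure reliable: every word is an independent, uniform draw, so the bits can be computed rather than guessed. A minimal sketch, assuming a wordlist with no duplicate entries and using Python's standard `secrets` module as the random source (the function names and the tiny demo list are hypothetical):

```python
import math
import secrets


def generate_passphrase(wordlist: list[str], n_words: int) -> str:
    """Pick n_words uniformly at random, with replacement, using a CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))


def passphrase_bits(wordlist_size: int, n_words: int) -> float:
    """Exact entropy of n_words independent uniform draws from the list."""
    return n_words * math.log2(wordlist_size)


# Placeholder list for illustration only; a real list has thousands of words.
demo_words = ["muse", "aims", "with", "lasting", "liable", "apple", "river", "stone"]
print(generate_passphrase(demo_words, 5))
print(passphrase_bits(len(demo_words), 5))
```

    The entropy depends only on the list size and the number of words, not on which words happen to come out.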
  22. Length
    Longer than equivalent-strength passwords in terms of character length
    Password: j16mjkzsh96yc | 13.0 characters | ~67.2 bits | 5.2 bits/char
    Passphrase: muse aims with lasting liable | 29.0 characters | ~67.5 bits | 2.3 bits/char
  23. Length
    However, passphrases are shorter in terms of “chunks” of data to memorize
    Password: j16mjkzsh96yc | 13.0 chunks | ~67.2 bits | 5.2 bits/chunk
    Passphrase: muse aims with lasting liable | 5.0 chunks | ~67.5 bits | 13.5 bits/chunk
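
    The figures in the two length comparisons can be recomputed from the example strings. This sketch assumes the password draws from 36 symbols (a-z and 0-9) and the passphrase from a list of about 11,600 words, the list size quoted later in the deck; neither assumption is stated on these slides.

```python
import math

password = "j16mjkzsh96yc"
passphrase = "muse aims with lasting liable"

# Assumed alphabets: 36 symbols per password character,
# ~11,600 candidate words per passphrase word.
password_bits = len(password) * math.log2(36)
passphrase_bits = len(passphrase.split()) * math.log2(11600)

rows = [
    ("password", len(password), len(password), password_bits),
    ("passphrase", len(passphrase), len(passphrase.split()), passphrase_bits),
]
for name, chars, chunks, bits in rows:
    print(f"{name}: {chars} chars, {chunks} chunks, {bits:.1f} bits, "
          f"{bits / chars:.1f} bits/char, {bits / chunks:.1f} bits/chunk")
```

    Each password character is treated as its own chunk, while each passphrase word counts as one, which is why the passphrase is denser per chunk even though it is sparser per character.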
  24. Localization
    • Passphrases are language specific
    • Different dictionaries needed for each language
    • Average word length varies by language, YMMV (sorry, German)
  25. Evidence
    • Limited research on the usability of system-assigned passwords vs. passphrases.
    • One study found that 5-character passwords were significantly easier to remember and faster to type than 3-4 word passphrases after five days [1]
    [1] Shay et al. Correct horse battery staple: Exploring the usability of system-assigned passphrases. 2012
  26. Evidence
    However, the same study also found:
      ◦ 6-character passwords were significantly harder, more annoying, and less fun to learn than 5-character passwords
      ◦ Using larger dictionaries for passphrases didn’t affect memorability
      ◦ People found passphrases more fun to learn
  27. Evidence
    Another study found that users were more likely to memorize 6-word passphrases than equivalent-strength 12-character passwords when asked to log in to a service repeatedly over two weeks [1]
    [1] Bonneau, J., & Schechter, S. Towards Reliable Storage of 56-bit Secrets in Human Memory.
  28. Are Words The Problem With Phrases?
    • What makes a good passphrase dictionary?
    • Little research has been done.
    • So… How have we been doing it?
  29. Diceware
    • 7776 words
    • Words 3-5 characters in length
    • Supposedly constructed of common English words
    • Includes words like: zloty, wuhan, ncaa, boise
    • Includes “words” like: ! , !! , ? , $ , $$ , () , 5678 , 6789 , zg , zf , zh , etc.
  30. Diceware
    Don’t know a word? The FAQs for Diceware state: “There are some obscure words in both lists. If your passphrase includes a word you don't know, look it up in a good dictionary. Learning the word's meaning will aid your memory and your vocabulary.”
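
    The 7776 figure is 6^5, so five rolls of a six-sided die select exactly one entry. A minimal sketch of that mapping, assuming the list is ordered by roll label from 11111 to 66666 (the function name is hypothetical):

```python
def diceware_index(rolls: list[int]) -> int:
    """Map five d6 rolls to a zero-based index into a 7776-entry list."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("expected five rolls, each between 1 and 6")
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)  # read the rolls as a base-6 number
    return index


print(diceware_index([1, 1, 1, 1, 1]))  # first entry
print(diceware_index([6, 6, 6, 6, 6]))  # last entry
```

    Physical dice keep the randomness outside the computer, but the mapping itself is just base-6 arithmetic.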
  31. SecureDrop
    • 6800 words (a reduced Diceware list)
    • Removed “words” like numbers, symbols, and Americanisms like “Boise”
    • Still has words like “zloty” and “wuhan”
  32. PGP WordList
    • 256 words
    • Developed by Philip Zimmermann and Patrick Juola for PGPfone, a VoIP service where the words’ phonetic distinctiveness was of key value.
    • E.g. capricorn, norwegian, stethoscope
  33. miniLock
    • 58,110 most commonly used English words
    • Does not reference source
    • Includes many long and obscure words, e.g.
      ◦ axiomatising, expostulations, epicycloid, quincentenary, psoriasis, gastroenteritis, uptotheminute.
  34. Correct Horse Battery Staple: Exploring the usability of system assigned passphrases
      ◦ 1024 most common words from COCA word list
      ◦ Not sorted for length, includes words like: “environmental”, “international”, “experience”
  35. Towards reliable storage of 56-bit secrets in human memory
      ◦ 676 words
      ◦ Built from all 3-5 character nouns, adjectives, verbs
      ◦ Excluded vulgar words, plural nouns, slang
  36. Towards reliable storage of 56-bit secrets in human memory
      ◦ Chose most common words based on Google’s N-Gram corpus
      ◦ Eliminated words within an edit distance of two of each other, to allow for typo correction
      ◦ Eliminated words that are prefixes of other words, to allow for autocomplete
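
    The two elimination rules can be sketched as a filter over a frequency-ranked list. This is not the paper's code: it assumes Levenshtein distance as the edit-distance measure and a greedy pass that keeps the more common word whenever two words clash, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def filter_wordlist(ranked_words: list[str], min_distance: int = 3) -> list[str]:
    """Keep words, most common first, that stay typo-safe and prefix-free."""
    kept: list[str] = []
    for word in ranked_words:
        too_close = any(edit_distance(word, k) < min_distance for k in kept)
        prefix_clash = any(word.startswith(k) or k.startswith(word) for k in kept)
        if not too_close and not prefix_clash:
            kept.append(word)
    return kept


print(filter_wordlist(["able", "cable", "river", "rivers", "stone"]))
```

    With every pair of kept words at least three edits apart, an input with one typo is still closer to the intended word than to any other, and a prefix-free list lets autocomplete commit as soon as the typed letters match a single word.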
  37. Q#1: What Words?
    • What words is someone likely to know?
    • What words is someone likely to remember?
    • What words are easiest to type?
  38. Q#2: How Many Words?
    • These lists vary from 256 to 58,110 words
    • Larger list = more entropy per word
    • Larger list = larger avg. characters/word
    • Larger list = weirder words
  39. The Problem…
    Passphrase engines have been designed by developers and security professionals
    We need to start evaluating the usability of different passphrase formulas
    This is a UX problem.
  40. A Cultural Approach to Passphrases OR How Seinfeld and Game of Thrones can help make better passphrases
  41. Technical Approaches
    • Choose shorter words
    • Create “words” based on patterns (xyz)
    • Choose phonetically distinct words
    • Choose semantically distinct words
    • Choose auto-complete friendly words
    • Choose auto-correct friendly words
  42. Cultural Approaches
    • Choose words people know the meaning of
    • Choose words people know how to spell
    • Recognize not everyone speaks English
    • Recognize not everyone has the same education
    • Remove words that may offend
    • User review and feedback
  43. Q#1: What Words?
    COCA word list? Google N-Grams? Wiktionary?
    All help identify commonly used words, but should we include words from all sources?
  44. Q#1: What Words?
    Most common words in Harvard Law Review != Most common words in Dr. Seuss books
  45. Q#1: What Words?
    • Most commonly used words from accessible sources
    • Language has social biases
      ◦ Who and what gets published?
      ◦ Who and what gets aired?
      ◦ Who consumes this content?
  46. Q#1: What Words?
    Most common words in printed media != Most common words in online publications != Most common words in television and film
  47. Reflects real world usage
    Numerous studies have found that word frequency data from subtitles is a more accurate indicator of reading performance and word recognition. [1]
    [1] Dimitropoulou et al. Subtitle-Based Word Frequencies as the Best Estimate of Reading Behavior: The Case of Greek, 2010
  48. The Process
    • Find large corpus of subtitles for a language
    • Compare against native language dictionary to remove slang, sound cues, etc.
    • Organize remaining words by use frequency
  49. The Process
    • Isolate the 20,000-35,000 most common words
    • Remove “offensive” words from list
    • Remove longest words in list until list is ~12k
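
    The process steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not Peerio's actual tooling: the function name, the regex tokenizer, the `top_n` default (picked from within the 20,000-35,000 range), and the representation of the dictionary and blocklist as plain sets are all hypothetical.

```python
import re
from collections import Counter


def build_wordlist(subtitle_text: str, dictionary: set[str], blocklist: set[str],
                   top_n: int = 30_000, target_size: int = 12_000) -> list[str]:
    """Sketch of the subtitle-to-wordlist process described on the slides."""
    # 1. Count word frequencies in the subtitle corpus (runs of letters only).
    tokens = re.findall(r"[^\W\d_]+", subtitle_text.lower())
    counts = Counter(tokens)
    # 2. Keep only dictionary words, which drops slang, sound cues, and so on.
    ranked = [w for w, _ in counts.most_common() if w in dictionary]
    # 3. Isolate the most common words, then remove blocklisted ones.
    common = [w for w in ranked[:top_n] if w not in blocklist]
    # 4. Drop the longest words until the list reaches the target size.
    #    sorted() is stable, so words of equal length keep frequency order.
    return sorted(common, key=len)[:target_size]


corpus = "The cat sat. The cat ran! Uh uh the extraordinary dog sat"
words = {"the", "cat", "sat", "ran", "dog", "extraordinary"}
print(build_wordlist(corpus, words, blocklist={"dog"}, top_n=10, target_size=4))
```

    In practice the native-speaker review step mentioned elsewhere in the deck would sit after this, since a blocklist and a dictionary check cannot catch every word that is awkward or offensive in context.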
  50. The Result
    • ~12,000 word list of shortest and most common words in a language
    • Cultural basis for word familiarity
  51. The Result
    Larger dictionary = more entropy per word
    1024 words = 10.0 bits/word, so 7 words = 70.0 bits
    11600 words = ~13.5 bits/word, so 5 words = 67.5 bits
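
    The word counts on this slide follow from the bits each word contributes, log2 of the list size. A minimal sketch that recomputes them for the 67.5-bit target stated earlier in the deck (function names are hypothetical):

```python
import math


def bits_per_word(list_size: int) -> float:
    """Entropy of one uniformly chosen word from a list of this size."""
    return math.log2(list_size)


def words_needed(list_size: int, target_bits: float) -> int:
    """Smallest word count whose total entropy meets target_bits."""
    return math.ceil(target_bits / bits_per_word(list_size))


for size in (1024, 11600):
    n = words_needed(size, 67.5)
    print(f"{size} words: {bits_per_word(size):.1f} bits/word, "
          f"{n} words = {n * bits_per_word(size):.1f} bits")
```

    Growing the list roughly elevenfold saves two words per passphrase, which is the trade the slide is pointing at.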
  52. The Result
    Fewer longer words ~= more shorter words (overall character length)
    Six 4.5-character words + 5 spaces = 32 characters
    Five 5.8-character words + 4 spaces = 33 characters
  53. Testing the Results
    Tested as Public Key encoding against
    • Hexadecimal
    • Base32
    • Numeric
    • PGP Wordlist
    • Sentence format
    Dechand, et al., An Empirical Study of Textual Key-Fingerprint Representations. University of Bonn. 2015
  54. Testing the Results
    Tested as Public Key encoding
    • Fastest verification method
    • Second best in terms of error detection
    • Second best in avoiding false positives
    • Users preferred over non-language options
    Dechand, et al., An Empirical Study of Textual Key-Fingerprint Representations. University of Bonn. 2015
  55. Other “hard” words
    • Words that are hard to spell
    • Words with short edit distance
    • Homonyms
  56. Recent Work
    EFF & Joseph Bonneau’s diceware reboot
    • Removed most weird words from Diceware
    • Sourced familiar words from Ghent University Center for Reading Research
    • Removed difficult to spell words & homophones
    • Removed vulgar or offensive words
    • Identified “concrete” words (e.g. screwdriver vs. love)
  57. Recent Work
    Bonneau’s three new lists:
    (1) 7776 words, common words
    (2) 1296 words, most common words
    (3) 1296 words, autocorrect + autocomplete tuned
  58. Recent Work
    Bonneau’s Long: 7776 words | ~12.9 bits/word | 7.0 characters/word | 77-bit, 6 words | English only
    Peerio word lists: 11600+ words | ~13.5 bits/word | 5.8 characters/word | 81-bit, 6 words | 12+ languages
    Bonneau’s Short: 1296 words | ~10.3 bits/word | 4.5 characters/word | 82-bit, 8 words | English only
  59. UX improvements
    • Incentive for spaced repetition
    • Trim/ignore/insert whitespace
    • Autocorrect
    • Autocomplete?
    • Special characters?
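
    The whitespace item is the easiest of these to illustrate. A minimal sketch of input normalization covering only trimming and collapsing whitespace; the NFKC Unicode step is an added assumption rather than something from the deck, and the function name is hypothetical.

```python
import re
import unicodedata


def normalize_passphrase(raw: str) -> str:
    """Canonicalize typed input before it is hashed or compared."""
    # Assumption: fold compatibility characters (e.g. non-breaking spaces)
    # so the same phrase typed on different devices yields the same string.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse every run of whitespace to one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_passphrase("  muse   aims\twith lasting  liable \n"))
```

    The same normalization has to run when the passphrase is first set and on every later login, otherwise a stray space still changes the derived key.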
  60. Let users choose some things?
    • Re-roll individual words
    • Allow insertion of additional words (e.g. articles, prepositions)
    • Allow insertion or edits of suffixes (e.g. control -> controls)
  61. Upcoming Research
    Test various word lists for…
    • Memorability (short and long)
    • Speed of entry
    • Accuracy
    • User preference
  62. Help out!
    • Having native speakers review our dictionaries helps
    • Use our dictionaries in your projects
    • Show us your research or research we missed!
    • Let’s collaborate!
    @skygao_ @peerio @flohdot https://passphrases.peerio.com