
Breaking Monoalphabetic Substitution Ciphers using Genetic Algorithms

Presentation about using an Evolutionary Algorithm to perform automated cryptanalysis against arbitrary text encrypted with a monoalphabetic substitution cipher.

Rod Hilton

April 17, 2012

Transcript

  1. Outline
     - Intro to Substitution Ciphers
       - Monoalphabetic Substitution Ciphers
     - Intro to Metaheuristics
       - Genetic Algorithms
     - Applying Genetic Algorithms to Substitution Ciphers
       - Gene mutation
       - Fitness functions
     - Geneticrypt
     - DEMO!
  2. Substitution Ciphers - Examples
     ABCDEFGHIJKLMNOPQRSTUVWXYZ
     - ROT-13
       - “HELLO WORLD” -> “URYYB JBEYQ”
     - Caesar Shift
       - Shift of 3…
       - “HELLO WORLD” -> “KHOOR ZRUOG”
  3. Substitution Ciphers - Attacking
     - ROT-13
       - Trivial: apply ROT-13 again
     - Caesar Shift
       - Also trivial: only 26 possible keys
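The sketch below makes the two examples and the 26-key attack concrete. It is written in Python, which we use for all examples here; Geneticrypt itself is written in Java and Groovy. `caesar`, `rot13` and `brute_force_caesar` are illustrative names, and the sketch only shifts the uppercase letters A-Z.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, shift):
    """Shift each A-Z letter `shift` places along the alphabet, wrapping at Z.

    Characters outside A-Z (spaces, punctuation) are copied unchanged.
    """
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def rot13(text):
    # ROT-13 is a Caesar shift of 13; doing it twice shifts by 26, which undoes it.
    return caesar(text, 13)


def brute_force_caesar(ciphertext):
    """Return (shift, candidate plaintext) for all 26 possible keys."""
    return [(k, caesar(ciphertext, -k)) for k in range(26)]


print(caesar("HELLO WORLD", 3))  # KHOOR ZRUOG
print(rot13("HELLO WORLD"))      # URYYB JBEYQ
```

Reading down the 26 candidates from `brute_force_caesar("KHOOR ZRUOG")`, the readable one is `(3, "HELLO WORLD")`.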
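  4. Mono. Sub. Ciphers – Key Generation
     - Keyword: RODHILTON
     - Drop repeated letters: RODHILTN
     - Unused letters: ABCEFGJKMPQSUVWXYZ
     - Unused letters, reversed: ZYXWVUSQPMKJGFECBA
     - Key: RODHILTN + ZYXWVUSQPMKJGFECBA = RODHILTNZYXWVUSQPMKJGFECBA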
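The keyword scheme above can be sketched as follows. Filling in the unused letters in reverse order comes from this slide's example. The convention that plaintext letter i becomes `key[i]` is an assumption. `keyword_key`, `encrypt` and `decrypt` are hypothetical helper names, not Geneticrypt's API.

```python
import string

ALPHABET = string.ascii_uppercase


def keyword_key(keyword):
    """Keyword letters first (repeats dropped), then the unused letters, reversed."""
    head = []
    for ch in keyword.upper():
        if ch in ALPHABET and ch not in head:
            head.append(ch)
    unused = [ch for ch in ALPHABET if ch not in head]
    return "".join(head) + "".join(reversed(unused))


def encrypt(plaintext, key):
    # Assumed convention: plaintext letter ALPHABET[i] becomes key[i].
    return plaintext.upper().translate(str.maketrans(ALPHABET, key))


def decrypt(ciphertext, key):
    # Inverse mapping: ciphertext letter key[i] goes back to ALPHABET[i].
    return ciphertext.upper().translate(str.maketrans(key, ALPHABET))


key = keyword_key("RODHILTON")
print(key)                          # RODHILTNZYXWVUSQPMKJGFECBA
print(encrypt("HELLO WORLD", key))  # NIWWS ESMWH
```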
  5. Mono. Sub. Ciphers – Keyspace
     - Possible keys: 26! = 403,291,461,126,605,635,584,000,000
     - At 1,000,000 guesses per second:
       - 403,291,461,126,605,635,584 seconds
       - 6,721,524,352,110,093,926 minutes
       - 112,025,405,868,501,565 hours
       - 4,667,725,244,520,898 days
       - 12,788,288,341,153 years
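The keyspace arithmetic can be checked with Python's arbitrary-precision integers. Each step uses floor division, which is why the slide's figures are whole numbers. The last step assumes 365-day years, which reproduces the slide's final figure.

```python
import math

keys = math.factorial(26)          # every ordering of the 26 letters is a key
seconds = keys // 1_000_000        # at 1,000,000 guesses per second
minutes = seconds // 60
hours = minutes // 60
days = hours // 24
years = days // 365                # 365-day years, matching the slide's last step

print(f"{keys:,}")    # 403,291,461,126,605,635,584,000,000
print(f"{years:,}")   # 12,788,288,341,153
```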
  6. Mono. Sub. Ciphers – Attacks
     Ciphertext: y eyz nz yz nlfjj
     (Uppercase letters are recovered plaintext; "A->Y" means plaintext A encrypts to ciphertext y.)
     - A eAz nz Az nlfjj (A->Y)
     - A eAN nN AN nlfjj (N->Z)
     - A eAN EN AN Elfjj? (E->N?)
       - Searching a dictionary for the regex /e..(.)\1/ finds only “emcee”, and it's not that. Also, “EN” is not a word!
     - A eAN IN AN Ilfjj? (I->N?)
       - “igloo”!
     - A eAN IN AN IGLOO (IGLO->NLFJ)
     - A MAN IN AN IGLOO (M->E)
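The dictionary-search step works for any partially decrypted word:

- known letters become literals;
- unknown ciphertext letters become capture groups;
- repeated letters become backreferences.

This is a hypothetical sketch; `word_regex`, `candidates` and the five-word list are ours. With `n` guessed as E, it produces `^e(.)(.)(.)\3$`, an anchored form of the slide's `/e..(.)\1/`.

```python
import re


def word_regex(cipher_word, guesses):
    """Build a regex over plaintext words from a ciphertext word.

    guesses maps ciphertext letters to plaintext letters already worked out.
    Guessed letters become literals; each unknown ciphertext letter becomes a
    capture group, and a repeat of that letter becomes a backreference to it.
    """
    groups = {}
    parts = []
    for c in cipher_word:
        if c in guesses:
            parts.append(re.escape(guesses[c]))
        elif c in groups:
            parts.append("\\" + str(groups[c]))
        else:
            groups[c] = len(groups) + 1
            parts.append("(.)")
    return re.compile("^" + "".join(parts) + "$")


def candidates(cipher_word, guesses, dictionary):
    # Caveat: the regex does not force different ciphertext letters onto
    # different plaintext letters, so a real solver would filter matches further.
    pattern = word_regex(cipher_word, guesses)
    return [w for w in dictionary if pattern.match(w)]


WORDS = ["apple", "eerie", "emcee", "igloo", "sheep"]  # stand-in dictionary
print(word_regex("nlfjj", {"n": "e"}).pattern)  # ^e(.)(.)(.)\3$
print(candidates("nlfjj", {"n": "e"}, WORDS))   # ['emcee']
print(candidates("nlfjj", {"n": "i"}, WORDS))   # ['igloo']
```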
  7. Mono. Sub. Ciphers – Attacks (Cont’d)
     - Plaintext: “LET’S ATTACK AT 0700!”
     - Ciphertext: “FVP'H YPPYUA YP 0700!”
     - Don’t even need to crack this: “0700” is obvious.
  8. Mono. Sub. Ciphers – Stronger
     What if we remove punctuation and spaces, and convert numbers to words?
     - “LET’S ATTACK AT 0700!”
     - “LET’S ATTACK AT ZERO SEVEN ZERO ZERO!”
     - “LETSATTACKATZEROSEVENZEROZERO”
     - Ciphertext: FVPHYPPYUAYPGVKJHVWVZGVKJGVKJ
     How will we crack this?
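Here is one way to apply the slide's normalisation, sketched in Python. Spelling digits out one at a time (0700 as ZERO SEVEN ZERO ZERO) follows the slide's example. `normalize` and `DIGIT_WORDS` are illustrative names.

```python
DIGIT_WORDS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]


def normalize(text):
    """Spell out each digit, uppercase, and keep only the letters A-Z."""
    out = []
    for ch in text.upper():
        if ch in "0123456789":
            out.append(DIGIT_WORDS[int(ch)])
        elif "A" <= ch <= "Z":
            out.append(ch)
    return "".join(out)


print(normalize("LET’S ATTACK AT 0700!"))  # LETSATTACKATZEROSEVENZEROZERO
```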
  9. Metaheuristics
     - Metaheuristics search a problem space for solutions
     - You don’t know how to solve a problem, but if I gave you a solution, you could grade it.
       - Searches for solutions with better and better grades
  10. Metaheuristics
      Scatter Search, Bee colony, Intelligent water drops, Tabu search, Firefly, Genetic Algorithms, Ant colony, Harmony search, A*, Simulated annealing, Particle swarm, Monkey search, Memetic algorithm, Spiral optimization, Cuckoo search
  11. Genetic Algorithms
      - Based on biological evolution
      - Replaces natural selection with artificial selection
      - Think of our DNA as a solution to the problem of survival in our environment
  12. Genetic Algorithms - Method
      1. Generate a random population of n solutions (chromosomes)
      2. Evaluate fitness f(x) of each chromosome
         1. Select 1 ≤ i < n members of the population according to fitness (usually 2)
         2. Have the i chromosomes reproduce (crossover)
         3. Modify the offspring in some small way (mutation)
         4. Offspring become a new population of n solutions (a generation)
      3. Loop until an end condition is satisfied
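The method above can be sketched as a generic Python loop. It runs on a toy problem, "OneMax" (maximise the number of 1 bits in a 20-bit string), rather than on cipher keys. The slide leaves the details open, so the following are all our assumptions:

- fitness-proportional selection via `rng.choices`;
- the +1 weight offset;
- the per-bit mutation rate;
- the fixed number of generations.

```python
import random


def genetic_algorithm(fitness, random_solution, crossover, mutate,
                      n=50, generations=100, rng=random):
    """Generic GA loop; fitness must return a non-negative number (higher is better)."""
    population = [random_solution() for _ in range(n)]       # 1. random population
    for _ in range(generations):                              # 3. end condition: fixed count
        scores = [fitness(x) for x in population]             # 2. evaluate fitness
        offspring = []
        while len(offspring) < n:
            # 2.1 pick two parents, fitter ones more likely (+1 keeps all-zero scores legal)
            mom, dad = rng.choices(population, weights=[s + 1 for s in scores], k=2)
            child = crossover(mom, dad)                       # 2.2 crossover
            offspring.append(mutate(child))                   # 2.3 mutation
        population = offspring                                # 2.4 new generation
    return max(population, key=fitness)


# Toy usage on OneMax: a chromosome is a list of 20 bits, fitness is the number of 1s.
rng = random.Random(0)
LENGTH = 20


def random_bits():
    return [rng.randint(0, 1) for _ in range(LENGTH)]


def one_point(a, b):
    cut = rng.randrange(1, LENGTH)
    return a[:cut] + b[cut:]


def flip_bits(x, rate=1 / LENGTH):
    # Flip each bit independently with probability `rate`.
    return [bit ^ (rng.random() < rate) for bit in x]


best = genetic_algorithm(sum, random_bits, one_point, flip_bits, rng=rng)
print(best, sum(best))
```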
  13. Genetic Algorithms Usage
      - Genetic Algorithms can be used to “solve” a variety of problems
        - Traveling Salesman
        - Backing up a truck
        - Scheduling problems
        - Hamiltonian Circuit
        - Pac-Man
        - Spears et al. even claim GAs can solve Satisfiability problems
  14. Will Genetic Algorithms Work?
      - Can we crack monoalphabetic substitution ciphers with genetic algorithms?
      - First we must decide: are partial solutions useful?
  15. Partial Solutions
      - Imagine I’m eavesdropping on my enemy, who sends his conspirator the message “QRRJSJQZUPZTYJ”
      - I will try two different keys for decryption
  16. Partial Solutions
      - Ciphertext: QRRJSJQZUPZTYJ
      - Plaintext: MEETATMIDNIGHT
      - Worse key decrypts to: XPPIQIXYDVYGAI (2/14 letters correct)
      - Better key decrypts to: MEETSTMIDNIGFT (12/14 letters correct)
      - Though neither key is the correct key, HODURYTVZGCXQPLMAISJFNWEBK is a better key than YHEUNATLJICDKBMRSWOXVPFQZG
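The decryptions above can be reproduced in a few lines. This assumes that `key[i]` is the ciphertext letter for the i-th plaintext letter; under that convention, the two keys do yield exactly the decryptions shown. `letters_correct` needs the real plaintext, so it only illustrates what "partial" means. An attacker has to rely on a fitness function instead.

```python
import string

ALPHABET = string.ascii_uppercase


def decrypt(ciphertext, key):
    # Assumed convention: key[i] is the ciphertext letter that plaintext ALPHABET[i] becomes.
    return ciphertext.translate(str.maketrans(key, ALPHABET))


def letters_correct(candidate, plaintext):
    """Count positions where the candidate decryption matches the true plaintext."""
    return sum(c == p for c, p in zip(candidate, plaintext))


CIPHERTEXT = "QRRJSJQZUPZTYJ"
PLAINTEXT = "MEETATMIDNIGHT"

better = decrypt(CIPHERTEXT, "HODURYTVZGCXQPLMAISJFNWEBK")
worse = decrypt(CIPHERTEXT, "YHEUNATLJICDKBMRSWOXVPFQZG")
print(better, letters_correct(better, PLAINTEXT))  # MEETSTMIDNIGFT 12
print(worse, letters_correct(worse, PLAINTEXT))    # XPPIQIXYDVYGAI 2
```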
  17. Partial Solutions
      - Partial solutions are useless for lots of ciphers
      - If your guesses for my p and q in RSA are both off by a tiny amount, your attempted decryption will be as worthless as one with a p and q off by much more.
      - Partial solutions are not useful for RSA, but they are useful for Monoalphabetic Substitution Ciphers
  18. The Checklist
      - What do we want?
        1. A way to randomly generate solutions
        2. A way to do crossover
        3. A way to mutate solutions
        4. A way to evaluate fitness
  19. Generating Chromosomes
      A way to randomly generate chromosomes? Easy: take the alphabet, shuffle the letters, and use that as a key.
      - DXQIBGWJCNRAHLMOVSKPEUTFZY
      - MBLDEHNWKZORPXVUTCYISQFAGJ
      - TXHZJAFIYDRNULBWMSCOKQGPVE
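Generating a chromosome is a single shuffle. Python's `random.shuffle` produces a uniformly random permutation. `random_key` is a hypothetical helper name.

```python
import random
import string


def random_key(rng=random):
    """Shuffle A-Z into a random 26-letter key (a uniformly random permutation)."""
    letters = list(string.ascii_uppercase)
    rng.shuffle(letters)
    return "".join(letters)


population = [random_key() for _ in range(3)]
for key in population:
    print(key)
```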
  20. Combining Chromosomes
      A way to combine chromosomes? Typically GAs use crossover:
      - Chromosome 1: 001011|0101101101
      - Chromosome 2: 110101|1001101010
      - Offspring:    001011|1001101010
  21. Combining Chromosomes – Cont’d
      Will crossover work for our chromosomes?
      - Chromo 1:  GBWKUYJAN|LCVTXMHZISFRDQPOE
      - Chromo 2:  JTCPLUAQS|ZMYOBXIVHFWKNEGDR
      - Offspring: GBWKUYJAN|ZMYOBXIVHFWKNEGDR
      - This won’t work!
        - G, B, W, K, Y, N used twice
        - C, L, P, Q, S, T unused
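This sketch shows why crossover suits bit strings but not keys. The same one-point crossover is applied with cut positions 6 and 9, as in the two examples above. It yields a valid bit string but a "key" with repeated and missing letters. `one_point_crossover` and `key_problems` are illustrative names.

```python
import string
from collections import Counter


def one_point_crossover(a, b, cut):
    """Classic one-point crossover: head of parent a, tail of parent b."""
    return a[:cut] + b[cut:]


def key_problems(key):
    """Return (letters used more than once, letters never used), both sorted."""
    counts = Counter(key)
    repeated = sorted(ch for ch, n in counts.items() if n > 1)
    missing = sorted(set(string.ascii_uppercase) - set(key))
    return repeated, missing


# Bit strings: any mix of 0s and 1s is a valid chromosome.
print(one_point_crossover("0010110101101101", "1101011001101010", 6))  # 0010111001101010

# Keys: the same operation breaks the "each letter exactly once" rule.
child = one_point_crossover("GBWKUYJANLCVTXMHZISFRDQPOE",
                            "JTCPLUAQSZMYOBXIVHFWKNEGDR", 9)
print(child, key_problems(child))
```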
  22. Mutating Chromosomes
      Solution: skip crossover, and just mutate a single candidate instead.
      - Chromo 1:  ZVJSQBKPACNFLYUMHGWOETDXIR
      - Offspring: ZVJSQBXPACNFLYUMHGWOETDKIR
      - Offspring: ZVJMQBKPACNFLYUSHGWOETDXIR
      - Offspring: ZVJSQBKPNCAFLYUMHGWOETDXIR
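Each offspring above differs from the parent by exactly one swapped pair of letters. Counting from 0, the swapped positions are 6 and 23, 3 and 15, and 8 and 10. A swap mutation is a minimal sketch of that operator; `swap` and `mutate` are our own names.

```python
import random


def swap(key, i, j):
    """Return a copy of key with the letters at positions i and j exchanged."""
    chars = list(key)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def mutate(key, rng=random):
    # Two distinct positions, so the child always differs from the parent
    # and is still a valid permutation of the alphabet.
    i, j = rng.sample(range(len(key)), 2)
    return swap(key, i, j)


parent = "ZVJSQBKPACNFLYUMHGWOETDXIR"
print(swap(parent, 6, 23))   # ZVJSQBXPACNFLYUMHGWOETDKIR
print(swap(parent, 3, 15))   # ZVJMQBKPACNFLYUSHGWOETDXIR
print(swap(parent, 8, 10))   # ZVJSQBKPNCAFLYUMHGWOETDXIR
```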
  23. Fitness – The Tricky Bit
      - Designing a good fitness function is the hardest part of doing GA
      - Many different approaches have been proposed for this exact problem
  24. Fitness – Frequency Analysis
      Scan a corpus of English text, looking at letter frequencies. To score a chromosome:
      - Compare the letter frequencies of this reference text to the letter frequencies of the ciphertext decrypted with the chromosome key.
  25. Fitness – Bigrams and Trigrams
      - We can also keep track of the frequencies of pairs or even triples of letters.
        - “ON” occurs much more often than “ZJ”
        - “ING” occurs much more often than “QSR”
      - These are called bigrams and trigrams.
  26. Fitness – n-gram Notation
      - $K_u(i)$ – Standard English frequency for unigram $i$
      - $K_b(i,j)$ – Standard English frequency for bigram $ij$
      - $K_t(i,j,k)$ – Standard English frequency for trigram $ijk$
      - $D_u(i)$ – Partially decrypted frequency for unigram $i$
      - $D_b(i,j)$ – Partially decrypted frequency for bigram $ij$
      - $D_t(i,j,k)$ – Partially decrypted frequency for trigram $ijk$
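In code, each K or D table is just a map from n-gram to relative frequency. K comes from the reference corpus, and D from the ciphertext decrypted with a candidate key. This sketch (names ours) counts overlapping n-grams. It also computes the Σ|K − D| term that the fitness functions that follow are built from.

```python
from collections import Counter


def ngram_freqs(text, n):
    """Relative frequency of every length-n substring: n=1 unigrams, 2 bigrams, 3 trigrams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    return {g: count / len(grams) for g, count in Counter(grams).items()}


def freq_distance(K, D):
    """Sum of |K(x) - D(x)| over every n-gram x seen in either table (missing = 0)."""
    return sum(abs(K.get(g, 0.0) - D.get(g, 0.0)) for g in K.keys() | D.keys())


# K would come from the reference corpus, D from a candidate decryption;
# a bigram-only score is then freq_distance(K_b, D_b).
K_b = ngram_freqs("ABAB", 2)
print(K_b)   # {'AB': 0.6666666666666666, 'BA': 0.3333333333333333}
```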
  27. Fitness – Jakobsen et al.
      Ignores unigrams and trigrams, only considers bigrams:
      $$f = \sum_{i,j} \left| K_b(i,j) - D_b(i,j) \right|$$
  28. Fitness – Spillman et al.
      Spillman performed this attack using the fitness function
      $$f = \left( 1 - \frac{\sum_{i} \left| K_u(i) - D_u(i) \right| + \sum_{i,j} \left| K_b(i,j) - D_b(i,j) \right|}{4} \right)^8$$
  29. Fitness – Clark and Dawson
      Clark and Dawson used weights:
      $$f = \alpha \sum_{i} \left| K_u(i) - D_u(i) \right| + \beta \sum_{i,j} \left| K_b(i,j) - D_b(i,j) \right| + \gamma \sum_{i,j,k} \left| K_t(i,j,k) - D_t(i,j,k) \right|$$
      In other words, the sum of the differences in unigram, bigram, and trigram frequencies, each scaled according to a weight.
  30. Watch out! – Insufficient Lengths
      - Problem: short lengths
      - Text needs to be somewhat lengthy for frequency analysis to work.
      - Example:
        - E is about 13% of English letters, so out of 100 characters, only about 13 should be E
        - In the short message “MEET ME”, E is 3 of the 6 letters: 50%
        - Longer messages average out these outliers
  31. Watch out! – Local Maxima
      - Problem: local maxima
      - A GA can go down a path where small mutations can’t rescue it.
  32. Geneticrypt!
      - Implementation of automated cryptanalysis of substitution ciphers
      - Built as a metaheuristic framework, easy to plug in different algorithms and different chromosomes
  33. Geneticrypt – Simulator
      - Creates a population of keys
        - Evaluates fitness of each key by using it to decrypt the ciphertext
        - Selects the most-fit individual, then creates a new population by swapping two random elements of the selected key
      - Repeat until manual intervention
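Below is a minimal Python sketch of the simulator loop just described. It also includes the elitism decision discussed on the next slide. It is not Geneticrypt's code:

- `simulate` and `swap_two` are hypothetical names;
- `fitness` is a parameter you supply;
- a fixed generation count stands in for manual intervention.

```python
import random
import string


def swap_two(key, rng):
    """Copy of key with two distinct random positions exchanged."""
    i, j = rng.sample(range(len(key)), 2)
    chars = list(key)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def simulate(fitness, generations, population_size=75, rng=random):
    """Evolve substitution keys by mutation only, keeping the best key each round.

    fitness(key) scores a key, e.g. by decrypting the ciphertext with it and
    comparing n-gram frequencies. Returns the final best key and the best
    score of each generation.
    """
    population = ["".join(rng.sample(string.ascii_uppercase, 26))
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)          # select the most-fit key
        history.append(fitness(best))
        # Carry `best` over unchanged, fill the rest with single-swap mutants of it.
        population = [best] + [swap_two(best, rng)
                               for _ in range(population_size - 1)]
    return max(population, key=fitness), history


# Toy usage: a made-up target key stands in for a real, decryption-based fitness.
TARGET = "QWERTYUIOPASDFGHJKLZXCVBNM"
best, history = simulate(lambda k: sum(a == b for a, b in zip(k, TARGET)),
                         generations=300, rng=random.Random(1))
print(best, history[-1])
```

The best key is carried over unchanged, so the best score can never drop from one generation to the next. The flip side, as the next slide notes, is a higher chance of settling on a local maximum.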
  34. Geneticrypt – Decisions and Tradeoffs
      - Population size = 75
        - Smaller population size → faster iterations, but more of them
      - Includes the most-fit key in the next generation
        - Doing so avoids mutating away from the best solution found so far
        - But it increases the chance of getting stuck in a local maximum
  35. Geneticrypt – Fitness Function
      Geneticrypt uses a modified weighted value formula:
      $$f = \alpha \left( 1 - \sum_{i} \left| K_u(i) - D_u(i) \right| \right) + \beta \left( 1 - \sum_{i,j} \left| K_b(i,j) - D_b(i,j) \right| \right) + \gamma \left( 1 - \sum_{i,j,k} \left| K_t(i,j,k) - D_t(i,j,k) \right| \right)$$
      $\alpha = 0.2$, $\beta = 0.3$, $\gamma = 0.5$. We want to skew in favor of trigrams.
  36. Geneticrypt – Fitness Function
      $$f = 0.2 \left( 1 - \sum_{i} \left| K_u(i) - D_u(i) \right| \right) + 0.3 \left( 1 - \sum_{i,j} \left| K_b(i,j) - D_b(i,j) \right| \right) + 0.5 \left( 1 - \sum_{i,j,k} \left| K_t(i,j,k) - D_t(i,j,k) \right| \right)$$
      - Each similarity is between 0.0 and 1.0
      - Starting out, trigram similarity will be very low (0.5 × 0 = 0), so unigrams take over
      - After some time, trigram similarity is higher and dwarfs the unigram term
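The weighted formula can be sketched in Python as below. `weighted_fitness`, `similarity` and the one-sentence corpus are stand-ins. Clamping each similarity at 0 is our assumption: the slide says each similarity lies between 0.0 and 1.0, but Σ|K − D| between two frequency tables can reach 2.

```python
from collections import Counter


def ngram_freqs(text, n):
    """Relative frequency of every overlapping length-n substring of text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return {g: c / len(grams) for g, c in Counter(grams).items()} if grams else {}


def similarity(K, D):
    """1 - sum|K - D|, clamped to [0, 1].

    The clamp is our assumption: the summed difference between two frequency
    tables can reach 2, which would otherwise make the term negative.
    """
    diff = sum(abs(K.get(g, 0.0) - D.get(g, 0.0)) for g in K.keys() | D.keys())
    return max(0.0, 1.0 - diff)


def weighted_fitness(decrypted, reference, weights=(0.2, 0.3, 0.5)):
    """alpha * unigram + beta * bigram + gamma * trigram similarity.

    reference maps n (1, 2, 3) to the corpus frequency table K for that n.
    """
    return sum(w * similarity(reference[n], ngram_freqs(decrypted, n))
               for n, w in zip((1, 2, 3), weights))


corpus = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"   # stand-in for the real corpus
reference = {n: ngram_freqs(corpus, n) for n in (1, 2, 3)}
print(weighted_fitness(corpus, reference))    # 1.0
print(weighted_fitness("ZZZZZZ", reference))  # 0.0
```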
  37. Geneticrypt – Frequency Analysis
      - Text is pulled from Project Gutenberg*
        - Alice in Wonderland
        - Hamlet
        - Adventures of Huckleberry Finn
        - Pride and Prejudice
        - A Princess of Mars
      - 370,090 words
      - Bigrams: 534 out of 26² = 676
      - Trigrams: 4,847 out of 26³ = 17,576
      *http://www.gutenberg.org/
  38. Frequency Analysis (Cont’d)
      - Frequency analysis ignores spaces
        - “_in” doesn’t get a trigram entry
      - Removes punctuation
        - “won’t” becomes “wont”, with trigram entries for “won” and “ont”, and no entry for “n’t”
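One reading of these two rules, sketched in Python, works in three steps:

1. split on whitespace;
2. strip punctuation inside each word;
3. count n-grams within words only.

`corpus_ngrams` is a hypothetical name, and it is an assumption that Geneticrypt counts exactly this way. The length of `corpus_ngrams(corpus, 2)` would be the kind of distinct-bigram count reported on the previous slide.

```python
import re
from collections import Counter


def corpus_ngrams(text, n):
    """Count n-grams inside words only.

    Punctuation is deleted (so "won't" becomes "WONT") and whitespace separates
    words, so no n-gram ever contains a space or spans two words.
    """
    counts = Counter()
    for raw_word in text.split():
        word = re.sub(r"[^A-Z]", "", raw_word.upper())
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


print(corpus_ngrams("won't", 3))   # Counter({'WON': 1, 'ONT': 1})
```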
  39. Geneticrypt Project
      - Geneticrypt is open-source software licensed under GPLv3
        - GitHub page: github.com/rodhilton/Geneticrypt
        - Core library and Cryptools CLI written in Java and Groovy, built using Gradle
        - GUI written in Groovy using the Griffon MVC library
  40. Future Work
      - Formally benchmark population sizes and fitness functions
      - Use Spillman’s genetic crossover function
      - Try other metaheuristics to avoid local maxima
        - Tried on substitution ciphers:
          - Simulated Annealing (Forsyth)
          - Firefly algorithm (Luthra)
          - Particle swarm (Uddin and Youssef)
          - A* (Josh, remember him?)
        - Newer algorithms:
          - Cuckoo Search (2009)
          - Galaxy-based Search (2011)
          - Spiral Optimization (2011)
  41. References
      - Karel P. Bergmann, Renate Scheidler, and Christian Jacob, Cryptanalysis using genetic algorithms, Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (GECCO ’08), ACM, New York, NY, USA, 2008, pp. 1099–1100.
      - John M. Carroll and Steve Martin, The automated cryptanalysis of substitution ciphers, Cryptologia 10 (1986), no. 4, 193–209.
      - A. Dimovski and D. Gligoroski, Attack on the polyalphabetic substitution cipher using a parallel genetic algorithm, 38th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST), 2003, pp. 318–321.
      - L. D. Davis and Melanie Mitchell, Handbook of genetic algorithms.
      - W. S. Forsyth and R. Safavi-Naini, Automated cryptanalysis of substitution ciphers, Cryptologia 17 (1993), no. 4, 407–418.
  42. References
      - David E. Goldberg, Genetic algorithms in search, optimization, and machine learning, Addison-Wesley Professional, 1989.
      - Omran S.S. Hammood, D.A. and A.S. Al-Khalid, Using genetic algorithm to cryptanalyse a simple substitution cipher, Journal of Computer Sciences, vol. 3, 2007, pp. 134–137.
      - Kenneth A. De Jong and William M. Spears, Using genetic algorithms to solve NP-complete problems, 1989.
      - J. Luthra and S.K. Pal, A hybrid firefly algorithm using genetic operators for the cryptanalysis of a monoalphabetic substitution cipher, 2011 World Congress on Information and Communication Technologies (WICT), Nov. 2011, pp. 202–206.
      - Sean Luke, Essentials of metaheuristics, lulu.com, 2011.
      - Robert A. J. Matthews, The use of genetic algorithms in cryptanalysis, Cryptologia 17 (1993), no. 2, 187–201.
  43. References
      - Hasan Mohammed, Hasan Husei, Bayoumi I. Bayoumi, Fathy Saad Holail, Bahaa Eldin, M. Hasan, and Mohammed Z. Abd El-mageed, A genetic algorithm for cryptanalysis with application to DES-like systems, 2006.
      - Abdelwadood Mesleh, Bilal Zahran, Anwar Al-Abadi, Samer Hamed, Nawal Al-Zabin, Heba Bargouthi, and Iman Maharmeh, Genetic cryptanalysis, Networked Digital Technologies, Communications in Computer and Information Science, vol. 87, Springer Berlin Heidelberg, 2010, pp. 321–332.
      - S.S. Omran, A.S. Al-Khalid, and D.M. Al-Saady, Using genetic algorithm to break a mono-alphabetic substitution cipher, 2010 IEEE Conference on Open Systems (ICOS), Dec. 2010, pp. 63–67.
      - S.S. Omran, A.S. Al-Khalid, and D.M. Al-Saady, A cryptanalytic attack on Vigenère cipher using genetic algorithm, 2011 IEEE Conference on Open Systems (ICOS), Sept. 2011, pp. 59–64.
  44. References
      - Shmuel Peleg and Azriel Rosenfeld, Breaking substitution ciphers using a relaxation algorithm, Commun. ACM 22 (1979), no. 11, 598–605.
      - Hamed Shah-Hosseini, Principal components analysis by the galaxy-based search algorithm: a novel metaheuristic for continuous optimisation, International Journal of Computational Science and Engineering 6 (2011), no. 1, 132–140.
      - M.F. Uddin and A.M. Youssef, Cryptanalysis of simple substitution ciphers using particle swarm optimization, IEEE Congress on Evolutionary Computation (CEC 2006), 2006, pp. 677–680.
      - A. K. Verma, Mayank Dave, and R. C. Joshi, Genetic algorithm and tabu search attack on the mono-alphabetic substitution cipher in adhoc networks, Journal of Computer Science 3 (2007), 134–137.