Hashing 101

Kyrus
October 01, 2012

An overview of hash functions, their role in computer forensics, and related math-y stuff.

Transcript

  1. Outline
     • Introduction
     • What is a Hash Function
     • Integrity of Evidence
     • Identifying Known Files
     • Algorithms
     • Collisions
     • Existing Hash Sets
     • Hash Set Auditing
     • Fast Hash Matching
     • Questions
  2. What is a Hash Function
     • Reduces variable-sized input to a smaller, fixed-size output
     • Alice -> 1
     • Bob -> 2
     • Elizabeth -> 5
     • Frank -> 6
     • Zed -> 26
     • Esther -> 5
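The slide's examples look like a toy hash that maps each name to the alphabet position of its first letter. Below is a minimal Python sketch of that reading. The function name `first_letter_hash` is ours, and the sketch assumes each name starts with an ASCII letter.

```python
def first_letter_hash(name: str) -> int:
    """Toy hash: position of the name's first letter in the alphabet (A=1 ... Z=26).

    Inputs of any length map to one of only 26 outputs; assumes the name
    starts with an ASCII letter.
    """
    return ord(name[0].upper()) - ord("A") + 1


for name in ["Alice", "Bob", "Elizabeth", "Frank", "Zed", "Esther"]:
    print(name, "->", first_letter_hash(name))
```

Names of any length land on one of only 26 values. That is why Elizabeth and Esther both come out as 5.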
  3. Cryptographic Hashing
     • A specific kind of hash function
     • For a hash function H and a message m:
       1. Easy to compute H(m)
       2. Hard to compute m given H(m)
       3. Hard to change m without changing H(m)
       4. Given m1, hard to find m2 ≠ m1 such that H(m1) = H(m2)
  4. Integrity of Evidence
     • Useful for proving something hasn't changed
     • Compute the hash at acquisition time
       – H(data) = X
     • Compute the hash again at any later date
       – H(data) = X
     • If the hashes match, the data has not changed
       – Given the properties of cryptographic hashing
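Here is a minimal sketch of the acquire-then-verify workflow using Python's standard `hashlib`. The helper names `hash_file` and `verify`, the 1 MiB chunk size, and the SHA-256 default are our assumptions. Pass `"md5"` or `"sha1"` to check a digest that was recorded with one of those algorithms.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large images never need to fit in memory


def hash_file(path, algorithm="sha256"):
    """Return the hex digest of a file's contents, read in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, recorded_digest, algorithm="sha256"):
    """True if the file still hashes to the value recorded at acquisition time."""
    return hash_file(path, algorithm) == recorded_digest.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "evidence.dd"  # hypothetical evidence image
        image.write_bytes(b"raw disk image bytes")
        acquired = hash_file(image)  # H(data) = X, recorded at acquisition
        print("unchanged:", verify(image, acquired))  # True
        image.write_bytes(b"raw disk image bytes, altered")
        print("unchanged:", verify(image, acquired))  # False
```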
  5. Identifying Known Files
     • H(known) = X
     • Save the known hash
     • Compute hashes of unknown files
     • H(unknown) = X
     • Hash match!
     • The original files should match too
     Copyright © 2003 Jesse Kornblum. All Rights Reserved.
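The same idea can be applied to a directory tree, roughly what the md5deep `-r`, `-m` and `-w` options listed later in this deck do from the command line. This is a hypothetical Python sketch, not md5deep's implementation; `build_known_set` and `find_known` are illustrative names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path):
    """MD5 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_known_set(known_dir):
    """Map MD5 -> path for every file under known_dir (H(known) = X, saved)."""
    return {md5_of(p): p for p in Path(known_dir).rglob("*") if p.is_file()}


def find_known(unknown_dir, known):
    """Yield (unknown_path, known_path) for each file whose MD5 is in the known set.

    Only content matters: a renamed copy still matches.
    """
    for p in Path(unknown_dir).rglob("*"):
        if p.is_file():
            match = known.get(md5_of(p))
            if match is not None:
                yield p, match


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        known_dir, unknown_dir = Path(tmp, "known"), Path(tmp, "unknown")
        known_dir.mkdir()
        unknown_dir.mkdir()
        (known_dir / "picture.jpg").write_bytes(b"JPEG bytes")
        (unknown_dir / "innocent.txt").write_bytes(b"JPEG bytes")  # renamed copy
        (unknown_dir / "notes.txt").write_bytes(b"something else")
        for found, original in find_known(unknown_dir, build_known_set(known_dir)):
            print(found.name, "matches", original.name)  # innocent.txt matches picture.jpg
```

Because the lookup is keyed by digest, a renamed copy is still found. A dictionary also keeps each lookup constant-time however large the known set grows.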
  6. Hash Algorithms
     • MD5
       – 128-bit hash, 32 hex-digit output
       – Older, and the most commonly used in forensics
       – Published in 1992
       – Fast!
       – Weaknesses being exploited now
     • MD5("ABC") = 902fbdd2b1df0c4f70b4a5d23525e932
  7. Hash Algorithms
     • SHA-1
       – 160-bit hash, 40 hex-digit output
       – NIST Secure Hash Algorithm
       – Has potential weaknesses
     • SHA-2 family
       – NIST Secure Hash Algorithm family
       – Weaknesses are theoretical… for now
     • SHA-3 standard
       – Coming soon
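Digest sizes can be checked with Python's `hashlib`. Each hex digit encodes 4 bits, so 128 bits print as 32 hex digits and 160 bits as 40. The sketch also includes `sha3_256`, which `hashlib` has offered since Python 3.6 (NIST later standardized SHA-3 as FIPS 202). The `digest_summary` helper is our own illustration.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha256", "sha3_256"]


def digest_summary(data: bytes) -> dict:
    """Map algorithm name -> (digest size in bits, hex digest) for the same input."""
    summary = {}
    for name in ALGORITHMS:
        hex_digest = hashlib.new(name, data).hexdigest()
        summary[name] = (len(hex_digest) * 4, hex_digest)  # each hex digit encodes 4 bits
    return summary


for name, (bits, hex_digest) in digest_summary(b"ABC").items():
    print(f"{name:8} {bits:3} bits {len(hex_digest):2} hex digits {hex_digest}")
```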
  8. Hash Programs
     • There are many!
     • Built into commercial forensics tools
     • Available as stand-alone programs too
     • GUI and command-line tools
     • In this course we will be using the Hashdeep suite
       – Free and open source
       – Recursive processing
       – http://md5deep.sf.net
  9. Hashdeep Suite
     • Hashdeep program
       – Uses multiple algorithms simultaneously
     • Individual algorithms
       – md5deep
       – sha1deep
       – sha256deep
       – and more…
     • Started as md5deep
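Using multiple algorithms simultaneously means each file only has to be read once. Below is a sketch of that idea in Python. The function names are hypothetical and this is not hashdeep's code: every chunk read from the stream is fed to all of the hash objects.

```python
import hashlib
import io


def multihash_stream(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read a binary stream once, feeding each chunk to every hash object."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def multihash_file(path, **kwargs):
    """Convenience wrapper that opens a file and hashes it in one pass."""
    with open(path, "rb") as f:
        return multihash_stream(f, **kwargs)


print(multihash_stream(io.BytesIO(b"ABC")))
```

For large evidence files, reading from disk is often the slow part. One read pass that produces several digests can then be cheaper than running md5deep, sha1deep and sha256deep one after another.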
  10. md5deep in action
     • Computing a hash
       C:\> md5deep kittypr0n.jpg
       12b741af7c77f2fac58d62917f4a0291 C:\kittypr0n.jpg
     • Saving known hashes
       C:\> md5deep kittypr0n.jpg > known.txt
     • Matching known hashes
       C:\> md5deep -m known.txt -r unknown
       C:\unknown\innocent.txt
       C:\> md5deep -wm known.txt -r unknown
       C:\unknown\innocent.txt matches C:\kittypr0n.jpg
  11. Hash Collisions
     • When two different inputs hash to the same value
     • When m1 ≠ m2, H(m1) = H(m2)
     • A consequence of the pigeonhole principle
  12. Hash Collisions
     • Pigeonhole principle
     • 2^128 possible MD5 outputs
     • For a file of n bits, 2^n possible inputs
     • For a 128 KB file (131,072 bytes, or 1,048,576 bits), there are 2^1,048,576 possible inputs
     • 2^1,048,576 >> 2^128
     • Therefore hash collisions must exist
     Picture courtesy Flickr user addedentry and used under a Creative Commons license, http://www.flickr.com/photos/addedentry/3273096118/
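To see the pigeonhole argument at a size where it can actually run, the sketch below keeps only the top 16 bits of MD5. The truncation and the name `tiny_hash` are our own illustration. With 2^16 possible outputs, hashing 2^16 + 1 distinct inputs must produce a repeat.

```python
import hashlib

BITS = 16  # toy output size; real MD5 output is 128 bits


def tiny_hash(data: bytes) -> int:
    """Keep only the top BITS bits of MD5, so there are just 2**BITS possible outputs."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - BITS)


def find_collision():
    """Hash up to 2**BITS + 1 distinct inputs; the pigeonhole principle guarantees a repeat."""
    seen = {}
    for i in range(2**BITS + 1):
        m = b"message %d" % i  # distinct inputs
        h = tiny_hash(m)
        if h in seen:
            return seen[h], m, h
        seen[h] = m
    raise AssertionError("unreachable: more inputs than possible outputs")


m1, m2, h = find_collision()
print(m1, m2, hex(h))
```

In practice the loop stops far earlier, typically after a few hundred inputs. This is the birthday effect: for n-bit outputs, collisions are expected after roughly 2^(n/2) tries.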
  13. Types of Attacks
     • Collision attack
       – Find m1 ≠ m2 such that H(m1) = H(m2)
       – Can be used to create apps with different functionality but the same hash
         • But you can't choose the hash output
     • Chosen-prefix collision attack
       – Given p1 and p2, find m1 and m2 such that H(p1||m1) = H(p2||m2)
       – Can be used to forge code signatures
         • But you can't choose the hash output
  14. Types of Attacks
     • Preimage attack
       – Given hash output h, find m such that H(m) = h
       – Finds a new input that matches a chosen hash
         • That new input may not be meaningful
     • Second preimage attack
       – Given m1, find m2 ≠ m1 such that H(m1) = H(m2)
       – From an existing exe, generate a new file that has the same hash
         • That new input may not be meaningful
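These attacks differ in cost. A generic brute-force search for a preimage or second preimage of an n-bit hash needs about 2^n tries, while a generic collision search needs only about 2^(n/2). The sketch below brute-forces a second preimage against a 16-bit truncation of MD5. `tiny_hash` and `second_preimage` are our own illustrative names, and the truncation is deliberate so the search finishes quickly.

```python
import hashlib
from itertools import count

BITS = 16  # toy output size so brute force finishes quickly; MD5 itself has 128


def tiny_hash(data: bytes) -> int:
    """Top BITS bits of MD5: a deliberately weakened stand-in for a real hash."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - BITS)


def second_preimage(m1: bytes):
    """Given m1, try candidates until some m2 != m1 has the same tiny hash.

    Returns (m2, number_of_tries). Expect roughly 2**BITS tries.
    """
    target = tiny_hash(m1)
    for tries in count(1):
        m2 = b"candidate %d" % tries
        if m2 != m1 and tiny_hash(m2) == target:
            return m2, tries


m2, tries = second_preimage(b"contents of an existing exe")
print(m2, "found after", tries, "tries")
```

The same search against full 128-bit MD5 would need around 2^128 tries. That is why the practical MD5 attacks on the next slide are collision attacks rather than preimage attacks.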
  15. MD5 Attacks
     • Published in 1992
     • Cryptographically broken in 1996
       – Different from a practical break
     • Collision technique developed in 2004 by Wang et al.
     • Chosen-prefix attack published in 2007
  17. Types of Attacks
     • Collision attack
       – Find m1 ≠ m2 such that H(m1) = H(m2)
       – Takes seconds on a netbook (MD5)
     • Chosen-prefix collision attack
       – Given p1 and p2, find m1 and m2 such that H(p1||m1) = H(p2||m2)
       – Used to require a cluster of PlayStation 3s
       – Now??
     • Preimage attack
       – Given hash output h, find m such that H(m) = h
     • Second preimage attack
       – Given m1, find m2 ≠ m1 such that H(m1) = H(m2)
  18. The Future of MD5
     Picture courtesy Flickr user katerha and used under a Creative Commons license, http://www.flickr.com/photos/katerha/4526272937/
  19. Hash Algorithm Families
     • MD5 is based on the Merkle–Damgård construction
       – A method of building collision-resistant hash functions from collision-resistant one-way compression functions
       – So are SHA-1 and SHA-2
     • Weaknesses found in MD5 may be applicable to SHA-1 and SHA-2
       – They may not survive long!
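Below is a toy Merkle–Damgård construction in Python. It shows the structure shared by MD5, SHA-1 and SHA-2: pad the message with its length, split it into fixed-size blocks, and chain a compression function over them. The compression function is a hypothetical stand-in built from truncated SHA-256, not MD5's own. The length is appended big-endian, as SHA-1 and SHA-2 do; MD5 appends it little-endian.

```python
import hashlib
import struct

BLOCK_SIZE = 64  # bytes per message block, as in MD5, SHA-1 and SHA-256
IV = bytes(16)   # toy 128-bit initial chaining value (hypothetical)


def compress(chaining: bytes, block: bytes) -> bytes:
    """Toy one-way compression function: (16-byte state, 64-byte block) -> 16-byte state."""
    return hashlib.sha256(chaining + block).digest()[:16]


def md_pad(message_len: int) -> bytes:
    """Merkle-Damgard strengthening: 0x80, zero bytes, then the bit length as 64-bit big-endian."""
    zeros = (BLOCK_SIZE - (message_len + 1 + 8) % BLOCK_SIZE) % BLOCK_SIZE
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)


def md_hash(message: bytes, iv: bytes = IV, prior_len: int = 0) -> bytes:
    """Pad, split into blocks, and chain the compression function over them.

    iv and prior_len let the hash resume from an earlier chaining value.
    """
    padded = message + md_pad(prior_len + len(message))
    state = iv
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state


print(md_hash(b"ABC").hex())
```

The chaining value is the hash's entire internal state. As a result, `md_hash(suffix, iv=md_hash(m), prior_len=len(m) + len(md_pad(len(m))))` equals `md_hash(m + md_pad(len(m)) + suffix)`, and computing it does not require knowing `m`. This length-extension property comes from the construction itself, so every hash built on it shares it.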
  20. Hash Algorithm Families
     • There are many other families
     • Davies–Meyer
     • Matyas–Meyer–Oseas
     • Hirose
     • Miyaguchi–Preneel
       – Whirlpool algorithm
       – Part of the hashdeep suite
  21. SHA-3
     • National Institute of Standards and Technology (NIST)
     • Competition for the SHA-3 standard
       – Much like Rijndael became “AES”
     • Three-year process
     • Five finalists
       – BLAKE, Grøstl, JH, Keccak, Skein
     • Final decision will be made Real Soon Now™
       – http://csrc.nist.gov/groups/ST/hash/sha-3/
  22. NSRL
     • National Software Reference Library (NSRL)
       – Created by NIST in 2001
     • Known files
       – Not guaranteed to be known good
       – NIST will hash anything
       – Vendor contact for each file
  23. NSRL
     • The data set is HUGE
       – 78 million hashes in the set
       – 21 million unique hashes
       – About 1.6 GB of data
     • Best resource that nobody uses
     • Lots of programs can parse the file format
       – But they have trouble with the full data set
  24. NSRL
     • nsrlquery
       – Robert Hansen
       – https://nsrlquery.sf.net/
     • Client/server application
       – Gives thumbs up/down for presence in NSRL
       – Uses MD5 or SHA-1
       – Server for *nix
       – Clients for Windows and *nix
     • Can use md5deep output as input to nsrlquery
  25. nsrlquery
     • Displays unknown files by default
       $ md5deep -br * | nsrllookup
       305e40dee29d261d0a3dc466f2184e35 unknown.exe
       607e033a16006ed1e9987cfc62562f72 EVILEVIL.exe
     • Can also display known files
       $ md5deep -br * | nsrllookup -k
       e97295de2a9fde547feab4fe41df16ca mspaint.exe
       eee470f2a771fc0b543bdeef74fceca0 msiexec.exe
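nsrllookup asks a server whether each hash is in the NSRL. The same filtering can be sketched offline against a local set of hashes. This is a stand-in, not nsrllookup's code or protocol. The parsing assumes each md5deep line is a hash, whitespace, then the file name. `known` is a hypothetical local subset of NSRL hashes.

```python
def split_line(line: str):
    """Split an md5deep-style line '<md5>  <filename>' into (md5, filename)."""
    digest, _, name = line.rstrip("\n").partition(" ")
    return digest.lower(), name.lstrip()


def filter_lines(lines, known_hashes, show_known=False):
    """Yield input lines whose hash is unknown (default) or known (show_known=True)."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        digest, _ = split_line(line)
        if (digest in known_hashes) == show_known:
            yield line.rstrip("\n")


sample = [
    "305e40dee29d261d0a3dc466f2184e35  unknown.exe",
    "e97295de2a9fde547feab4fe41df16ca  mspaint.exe",
]
known = {"e97295de2a9fde547feab4fe41df16ca"}  # hypothetical local subset of NSRL hashes

print(list(filter_lines(sample, known)))                   # the unknown.exe line
print(list(filter_lines(sample, known, show_known=True)))  # the mspaint.exe line
```

Reading `sys.stdin` instead of `sample` would let the sketch sit at the end of a pipe such as `md5deep -br * | python filter.py`, where `filter.py` is a hypothetical script name.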
  26. Kyrus NSRL Server
     • Server requires about 1 GB of RAM
       – Takes a while to start
     • Kyrus is testing a public nsrlquery server with MD5 hashes
       – nsrl.kyr.us
       – Add the -s flag for a remote server
       C:\> md5deep * | nsrllookup -s nsrl.kyr.us
       305e40dee29d261d0a3dc466f2184e35 unknown.exe
       607e033a16006ed1e9987cfc62562f72 EVILEVIL.exe
  27. Hash Set Auditing
     • Hash sets are used to detect changes
       – Verifying the contents of a downloaded file
       – Determining if your forensics tool has made any changes
     • Hash set tools are great for detecting identical files
       – But they break down when asked to detect changes
     • Current approaches
       – Report known files found
       – Or report unknown files found
       – Or report known files not found
  28. Example - Bogocopy
       C:\> dir /b src
       foo.txt
       bar.txt
       C:\> md5deep src\* > known.txt
       C:\> bogocopy src dest
       C:\> md5deep -lm known.txt dest\*
       dest\foo.txt
       dest\bar.txt
       C:\> dir /b dest
       foo.txt
       bar.txt
       CONFESSION.DOCX
  29. Hash Set Auditing
     • Current approaches
       – Report known files found
       – Or report known files not found
       – Or report unknown files found
     • We want all three of these!
     • Along with
       – Report known files found in a new location
     • Determine what is there
     • Determine what's supposed to be there
     • Highlight any mismatches
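Below is a sketch of the audit described above. It reports exact matches, known content found in a new location, new content, and known files not found. Both sides are assumed to be lists of `(digest, path)` pairs. The function name and the pass/fail rule are ours, not hashdeep's, and partial matches (where only some of several algorithms agree) are left out.

```python
def audit(known, observed):
    """Compare a known hash set with what is actually present.

    known, observed: iterables of (digest, path) pairs.
    Returns lists of matched, moved, new and not-found paths plus a pass/fail flag.
    """
    known_pairs = set(known)
    known_digests = {digest for digest, _ in known_pairs}
    report = {"matched": [], "moved": [], "new": [], "not_found": []}
    seen_digests = set()
    for digest, path in observed:
        seen_digests.add(digest)
        if (digest, path) in known_pairs:
            report["matched"].append(path)  # right content, right place
        elif digest in known_digests:
            report["moved"].append(path)  # known content, new location
        else:
            report["new"].append(path)  # content not in the known set
    report["not_found"] = sorted(p for d, p in known_pairs if d not in seen_digests)
    # In this sketch, anything other than exact matches fails the audit.
    report["passed"] = not (report["moved"] or report["new"] or report["not_found"])
    return report


# The Bogocopy example with bare file names; digests are placeholder strings.
known_set = [("hash-of-foo", "foo.txt"), ("hash-of-bar", "bar.txt")]
dest = known_set + [("hash-of-confession", "CONFESSION.DOCX")]
print(audit(known_set, dest))  # matched: foo.txt, bar.txt; new: CONFESSION.DOCX; passed: False
```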
  30. Hash Set Auditing
     • Hashdeep
       – Part of the hashdeep suite
       – http://md5deep.sf.net/
     • Can do positive and negative matching
     • Multihashing
     • Hash set audits
       – Reports any mismatches
       – Finds new files, moved files, files not found
  31. Example – Bogocopy with Hashdeep
       C:\> dir /b src
       foo.txt
       bar.txt
       C:\> hashdeep -b src\* > known.txt
       C:\> bogocopy src dest
       C:\> hashdeep -bak known.txt dest\*
       hashdeep: Audit failed
       C:\> dir /b dest
       foo.txt
       bar.txt
       CONFESSION.DOCX
  32. Example – Bogocopy with Hashdeep
       C:\> hashdeep -vvbak known.txt dest\*
       CONFESSION.DOCX: No match
       hashdeep: Audit failed
          Files matched: 2
          Files partially matched: 0
          Files moved: 0
          New files found: 1
          Known files not found: 0
  33. WoW64 Gotcha
     • Windows on Windows64
       – x86 emulator for x64-based Windows systems
       – 32-bit view for 32-bit programs running on a 64-bit OS
       – http://msdn.microsoft.com/en-us/library/aa384249(v=vs.85).aspx
     • So what?
     • 32-bit programs have a different view
       – For example C:\Windows\System32
       – md5deep vs. md5deep64
  34. WoW64 Gotcha
     • On a 64-bit OS, 32-bit programs see a different file
       C:\> md5deep Windows\System32\ieapfltr.dll
       ee9d715af1b928982f417238b9914484 C:\Windows\System32\ieapfltr.dll
       (This is actually C:\Windows\SysWOW64\ieapfltr.dll)
       C:\> md5deep64 Windows\System32\ieapfltr.dll
       8eada158d964e3fd1999ad96c9c507ff C:\Windows\System32\ieapfltr.dll
  35. WoW64 Gotcha
     • You can detect a 64-bit platform with a batch script
       @if "%PROCESSOR_ARCHITECTURE%" == "x86" (set MD5DEEP=md5deep.exe) else (set MD5DEEP=md5deep64.exe)
       %MD5DEEP% -re C:\Windows\System32\*
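One caveat about this check: a 32-bit process on 64-bit Windows, such as the 32-bit cmd.exe, sees PROCESSOR_ARCHITECTURE as x86. The native architecture is reported in PROCESSOR_ARCHITEW6432 instead. Below is a minimal Python sketch that checks both. The function name `pick_md5deep` is hypothetical, and the sketch assumes Windows environment-variable semantics.

```python
import os


def pick_md5deep(environ=None):
    """Return the md5deep binary suited to the native Windows architecture.

    Assumes Windows semantics: inside a 32-bit (WoW64) process,
    PROCESSOR_ARCHITECTURE is "x86" and PROCESSOR_ARCHITEW6432 holds the
    native architecture (for example "AMD64").
    """
    env = os.environ if environ is None else environ
    native = env.get("PROCESSOR_ARCHITEW6432") or env.get("PROCESSOR_ARCHITECTURE", "")
    return "md5deep.exe" if native.upper() == "X86" else "md5deep64.exe"


print(pick_md5deep())
```

The batch file could consult PROCESSOR_ARCHITEW6432 in the same way before falling back to PROCESSOR_ARCHITECTURE.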
  36. Fast Hash Matching
     • I did not invent this
     • Several other names
       – Fibonacci hashing
       – AccessData Triage Hashing
       – Piecewise Hashing
       – Partial hashing
       – And many more
     Picture courtesy Flickr user nvarvel and used under a Creative Commons license, http://www.flickr.com/photos/nvarvel/6269179660/
  37. Fast Hash Matching
     • Traditional approach:
     • For each known file
       – Read and compute hash, H(known)
     • For each unknown file:
       – Read and compute hash, H(unknown)
       – For each known hash:
         • If H(unknown) == H(known)
           – Match!
  39. Assumptions
     • Searching for identical files
       – Based on content
     • If any part of the content is not the same, the files are not identical
     • Example:
       – Identical files are the same size
       – If two files are not the same size, they are not identical
     • Therefore we should compare file sizes first
       – Fast!
     • Then part of the file
     • Then the whole file
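This cheapest-test-first ordering can be sketched as follows. Index the known files by size, then by a hash of their first few kilobytes. Compute a full hash only for unknown files that pass both checks. The 4096-byte prefix and all function names are our assumptions, not a description of any particular tool.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

PREFIX_BYTES = 4096  # hypothetical size of the "part of the file" checked second


def full_md5(path):
    """MD5 of the whole file (the expensive test)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def prefix_md5(path):
    """MD5 of only the first PREFIX_BYTES bytes (the cheaper partial test)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(PREFIX_BYTES)).hexdigest()


def index_known(known_paths):
    """Index known files: size -> prefix hash -> set of full hashes."""
    index = defaultdict(lambda: defaultdict(set))
    for p in known_paths:
        index[os.path.getsize(p)][prefix_md5(p)].add(full_md5(p))
    return index


def is_known(path, index):
    """Cheapest test first: file size, then prefix hash, then full hash."""
    by_prefix = index.get(os.path.getsize(path))
    if not by_prefix:
        return False  # no known file has this size, so no identical file exists
    full_hashes = by_prefix.get(prefix_md5(path))
    if not full_hashes:
        return False  # same size, but the first PREFIX_BYTES differ
    return full_md5(path) in full_hashes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        def write(name, data):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            return path

        index = index_known([write("known.bin", b"A" * 10000)])
        print(is_known(write("copy.bin", b"A" * 10000), index))        # True
        print(is_known(write("tail.bin", b"A" * 9999 + b"B"), index))  # False (needs full hash)
        print(is_known(write("small.bin", b"A" * 10), index))          # False (size alone)
```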
  40. Fuzzy Hashing
     • Cryptographic hashing is great for identical matches
     • Need something else for similar files
       – Define “similar”
     • Fuzzy hashing
       – Context triggered piecewise hashing
     • Similar patterns of ones and zeros
     • File format agnostic
     • Implemented in the free ssdeep program
       – http://ssdeep.sf.net
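Below is a much-simplified sketch of context triggered piecewise hashing. It is not ssdeep's algorithm or signature format. The rolling hash is a plain sum over a 7-byte window, the block size is fixed at 64 instead of being chosen from the file size, and the similarity score uses Python's `difflib` rather than ssdeep's edit-distance comparison. It does show the core idea: piece boundaries are triggered by content, so an insertion only disturbs the signature characters near the edit.

```python
import difflib
import random

WINDOW = 7       # bytes in the rolling window (assumption for this sketch)
BLOCK_SIZE = 64  # a trigger fires on average about once per BLOCK_SIZE bytes
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FNV_OFFSET, FNV_PRIME = 0x811C9DC5, 0x01000193  # standard 32-bit FNV-1 constants


def ctph_signature(data: bytes) -> str:
    """Simplified context triggered piecewise hash (not ssdeep-compatible).

    A rolling hash over the last WINDOW bytes decides where pieces end, so the
    boundaries depend on content, not offsets. Each piece contributes one
    base64 character derived from an FNV-1 hash of that piece.
    """
    window = [0] * WINDOW
    rolling = 0
    piece = FNV_OFFSET
    out = []
    for i, byte in enumerate(data):
        rolling += byte - window[i % WINDOW]  # plain sum of the last WINDOW bytes
        window[i % WINDOW] = byte
        piece = ((piece * FNV_PRIME) & 0xFFFFFFFF) ^ byte  # FNV-1 step
        if rolling % BLOCK_SIZE == BLOCK_SIZE - 1:  # context trigger: end this piece
            out.append(B64[piece % 64])
            piece = FNV_OFFSET
    if data:
        out.append(B64[piece % 64])  # final, possibly partial, piece
    return "".join(out)


def similarity(sig_a: str, sig_b: str) -> float:
    """0-100 score; difflib stands in for ssdeep's own edit-distance scoring."""
    return 100 * difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(8192))
edited = original[:5000] + b"EDIT" + original[5000:]  # four bytes inserted mid-file
unrelated = bytes(rng.randrange(256) for _ in range(8192))

print(round(similarity(ctph_signature(original), ctph_signature(edited))))
print(round(similarity(ctph_signature(original), ctph_signature(unrelated))))
```

The edited copy should score far higher than the unrelated data, even though cryptographic hashes of the original and edited versions would differ completely.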
  42. Questions?
     • Jesse Kornblum
     Picture courtesy Flickr user Krysten Newby and used under a Creative Commons license, http://www.flickr.com/photos/krystenn/340848452/
  43. Hands-On Exercises
     A. Verify Image Integrity
     B. Searching for Known Files
     C. Eliminating Known Files
  44. Searching for Known Files
     • Useful md5deep flags
       -b         Omit file path information
       -r         Work recursively
       -m [FILE]  Load known hashes from FILE and match them
       -w         When used with -m, display which file matched
  45. Eliminating Known Files
     • Useful md5deep flags
       -b         Omit file path information
       -r         Work recursively
       -m [FILE]  Load known hashes from FILE and match them
       -w         When used with -m, display which file matched
     • Pipe the output of one program to another
       – Use the pipe symbol: |
       C:\> command1 | command2
     • Required nsrllookup flags
       -s         Set the server to use. Use nsrl.kyr.us