Given a hash function H and a message m:
1. H(m) is easy to compute
2. Hard to compute m given H(m)
3. Hard to change m without changing H(m)
4. Given m1, hard to find a different m2 such that H(m1) = H(m2)
• Compute hash at acquisition time
  – H(data) = X
• Compute hash at any later date
  – H(data) = X
• If the hashes match, the data has not changed
  – Given the principles of cryptographic hashing
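The acquisition-time check above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow of any particular forensics tool: the helper names `hash_bytes` and `is_unchanged`, the sample data, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

def hash_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

def is_unchanged(data: bytes, acquisition_hash: str, algorithm: str = "sha256") -> bool:
    """Recompute the hash and compare it with the value recorded at acquisition."""
    return hash_bytes(data, algorithm) == acquisition_hash

# Hypothetical evidence; in practice this would be a disk image read in chunks.
evidence = b"contents of the acquired image"
x = hash_bytes(evidence)                    # hash recorded at acquisition time

print(is_unchanged(evidence, x))            # same data at a later date
print(is_unchanged(evidence + b"!", x))     # data altered by one byte
```

For real images the data would be streamed through `update()` rather than held in memory, but the comparison step is the same.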
• MD5
  – 128-bit output
  – Older and most commonly used in forensics
  – Developed in 1992
  – Fast!
  – Weaknesses being exploited now
• MD5("ABC") = 902fbdd2b1df0c4f70b4a5d23525e932
• SHA-1
  – 160-bit output
  – NIST Secure Hash Algorithm
  – Has potential weaknesses
• SHA-2 Family
  – NIST Secure Hash Algorithm Family
  – Weaknesses are theoretical… for now
• SHA-3 Standard
  – Coming soon
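To compare the algorithms named on these slides, the sketch below uses Python's standard `hashlib` module to report each digest length in bits and to hash the same short input with each one. The helper name `digest_bits` and the choice of SHA-256 and SHA-512 as SHA-2 representatives are assumptions for the example.

```python
import hashlib

def digest_bits(algorithm: str) -> int:
    """Output length of the named hashlib algorithm, in bits."""
    return hashlib.new(algorithm).digest_size * 8

# MD5, SHA-1, and two members of the SHA-2 family
for name in ("md5", "sha1", "sha256", "sha512"):
    digest = hashlib.new(name, b"ABC").hexdigest()
    print(f"{name:7} {digest_bits(name):4} bits  {digest}")
```

Each hex character encodes 4 bits, so a 32-character MD5 value corresponds to a 128-bit output.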
forensics tools
• Available as stand-alone programs too
• GUI and command-line tools
• In this course we will be using the Hashdeep suite
  – Free and open source
  – Recursive processing
  – http://md5deep.sf.net
• For a file of length n bits, 2^n possible inputs
• For a 128 KB file (131,072 bytes = 1,048,576 bits), there are 2^1,048,576 possible inputs
• 2^1,048,576 >> 2^128
• Therefore hash collisions must exist
Picture courtesy Flickr user addedentry and used under a Creative Commons license, http://www.flickr.com/photos/addedentry/3273096118/
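The counting argument above (more possible inputs than possible outputs) can be watched in action with a deliberately weakened hash. The sketch below keeps only the first 16 bits of MD5, so there are just 2^16 possible outputs and a collision is guaranteed within 2^16 + 1 distinct inputs. The names `tiny_hash` and `find_collision` are hypothetical, and this illustrates the pigeonhole principle only; it is not an attack on full MD5.

```python
import hashlib

def tiny_hash(data: bytes, bits: int = 16) -> int:
    """First `bits` bits of MD5: a toy hash with a very small output space."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)

def find_collision(bits: int = 16):
    """Hash distinct inputs until two of them share the same toy hash value."""
    seen = {}
    for i in range(2**bits + 1):          # one more input than possible outputs
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    raise AssertionError("unreachable: the pigeonhole principle guarantees a collision")

m1, m2, h = find_collision()
print(m1, m2, hex(h))
```

In practice the loop stops long before 2^16 inputs, because random collisions become likely after a few hundred tries in a space this small.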
• Collision Attack
  – Find m1 and m2 such that H(m1) = H(m2)
  – Can be used to create apps with different functionality but the same hash
    • But you can't choose the hash output
• Chosen Prefix Collision Attack
  – Given p1 and p2, find m1 and m2 such that H(p1||m1) = H(p2||m2)
  – Can be used to forge code signatures
    • But you can't choose the hash output
• Preimage Attack
  – Given hash output h, find m such that H(m) = h
  – Find a new input which matches a chosen hash
    • That new input may not be meaningful
• Second Preimage Attack
  – Given m1, find m2 such that H(m1) = H(m2)
  – From an existing exe, generate a new file which has the same hash
    • That new input may not be meaningful
• Collision Attack
  – Find m1 and m2 such that H(m1) = H(m2)
  – Takes seconds on a netbook
• Chosen Prefix Collision Attack
  – Given p1 and p2, find m1 and m2 such that H(p1||m1) = H(p2||m2)
  – Used to require a cluster of PlayStation 3s
  – Now??
• Preimage Attack
  – Given hash output h, find m such that H(m) = h
• Second Preimage Attack
  – Given m1, find m2 such that H(m1) = H(m2)
  – Method to turn one-way compression functions into collision-resistant hash functions
  – So are SHA-1 and SHA-2
• Weaknesses found in MD5 may be applicable to SHA-1 and SHA-2
  – They may not survive long!
Competition for SHA-3 standard
  – Much like Rijndael became “AES”
• Three-year process
• Five Finalists
  – BLAKE, Grøstl, JH, Keccak, Skein
• Final decision will be made Real Soon Now™
  – http://csrc.nist.gov/groups/ST/hash/sha-3/
in the set
  – 21 million unique hashes
  – About 1.6 GB of data
• Best resource that nobody uses
• Lots of programs can parse the file format
  – But have trouble with the full data set
application
  – Gives thumbs up/down for presence in NSRL
  – Uses MD5 or SHA-1
  – Server for *nix
  – Clients for Windows and *nix
• Can use md5deep output as input to nsrlquery
  – Takes a while to start
• Kyrus is testing a public nsrlquery server with MD5 hashes
  – nsrl.kyr.us
  – Add -s flag for remote server

C:\> md5deep * | nsrllookup -s nsrl.kyr.us
305e40dee29d261d0a3dc466f2184e35 unknown.exe
607e033a16006ed1e9987cfc62562f72 EVILEVIL.exe
changes
  – Verifying the contents of a downloaded file
  – Determining if your forensics tool has made any changes
• Hash set tools are great for detecting identical files
  – Break down when asked to detect changes
• Current Approaches
  – Report known files found
  – Or report unknown files found
  – Or report known files not found
found
  – Or report known files not found
  – Or report unknown files found
• We want all three of these!
• Along with
  – Report known files found in new location
• Determine what is there
• Determine what's supposed to be there
• Highlight any mismatches
suite
  – http://md5deep.sf.net/
• Can do positive and negative matching
• Multihashing
• Hash set audits
  – Reports any mismatches
  – Finds new files, moved files, files not found
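The idea behind a hash set audit (compare what is there with what is supposed to be there, then report matches, moved files, new files, and files not found) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not hashdeep's code or output format: the function names, the report categories, and the in-memory `{path: bytes}` stand-in for a file system are all hypothetical.

```python
import hashlib

def hash_tree(files: dict) -> dict:
    """Map path -> MD5 hex digest for an in-memory {path: bytes} 'file system'."""
    return {path: hashlib.md5(data).hexdigest() for path, data in files.items()}

def audit(known: dict, found: dict) -> dict:
    """Compare two {path: digest} maps and classify every file."""
    known_by_hash = {}
    for path, digest in known.items():
        known_by_hash.setdefault(digest, set()).add(path)

    report = {"matched": [], "moved": [], "new": [], "missing": []}
    seen_hashes = set()
    for path, digest in sorted(found.items()):
        if digest not in known_by_hash:
            report["new"].append(path)        # content never seen before
            continue
        seen_hashes.add(digest)
        if path in known_by_hash[digest]:
            report["matched"].append(path)    # same content, same place
        else:
            report["moved"].append(path)      # same content, new location
    for path, digest in sorted(known.items()):
        if digest not in seen_hashes:
            report["missing"].append(path)    # known content not found anywhere
    return report

known = hash_tree({"a.txt": b"alpha", "b.txt": b"bravo", "c.txt": b"charlie"})
found = hash_tree({"a.txt": b"alpha", "docs/b.txt": b"bravo", "d.txt": b"delta"})
report = audit(known, found)
print(report)
```

A file whose content changed in place shows up twice in this sketch: its new content is reported as new and its old content as missing.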
x64-based Windows systems
  – 32-bit view for 32-bit programs running on a 64-bit OS
  – http://msdn.microsoft.com/en-us/library/aa384249(v=vs.85).aspx
• So what?
• 32-bit programs have a different view
  – For example C:\Windows\System32
  – md5deep vs. md5deep64
Several other names
  – Fibonacci hashing
  – AccessData Triage Hashing
  – Piecewise Hashing
  – Partial hashing
  – And many more
Picture courtesy Flickr user nvarvel and used under a Creative Commons license, http://www.flickr.com/photos/nvarvel/6269179660/
• For each known file:
  – Read and compute hash, H(known)
• For each unknown file:
  – Read and compute hash, H(unknown)
  – For each known hash:
    • If H(unknown) == H(known)
      – Match!
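The matching loop on this slide translates almost directly into Python. One deliberate change, an assumption of this sketch rather than something the slide prescribes, is that the inner "for each known hash" loop becomes a set membership test, which gives the same answer without rescanning the known list for every unknown file. The names `md5_hex` and `match_known` and the sample data are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Read and compute hash: MD5 hex digest of the given content."""
    return hashlib.md5(data).hexdigest()

def match_known(known_files: dict, unknown_files: dict) -> list:
    """Return the names of unknown files whose hash appears in the known set."""
    # For each known file: compute H(known) once and store it.
    known_hashes = {md5_hex(data) for data in known_files.values()}
    # For each unknown file: compute H(unknown) and look it up.
    return sorted(
        name for name, data in unknown_files.items()
        if md5_hex(data) in known_hashes
    )

# Hypothetical in-memory files standing in for real ones on disk.
known_files = {"notepad.exe": b"known program", "calc.exe": b"another known program"}
unknown_files = {"copy_of_notepad.exe": b"known program", "mystery.exe": b"something else"}

matches = match_known(known_files, unknown_files)
print(matches)
```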
• If any part of the content is not the same, the files are not identical
• Example:
  – Identical files are the same size
  – If two files are not the same size, they are not identical
• Therefore we should compare file sizes first
  – Fast!
• Then part of the file
• Then the whole file
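The size-first ordering above can be sketched as a three-stage comparison that stops at the first mismatch. The function names, the 4096-byte prefix, and the use of SHA-256 are assumptions chosen for illustration; the slide does not say how large "part of the file" should be.

```python
import hashlib
import os
import tempfile

PREFIX_BYTES = 4096  # hypothetical size for the "part of the file" stage

def sha256_of(path, limit=None):
    """Hash the whole file, or only its first `limit` bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        remaining = limit
        while remaining is None or remaining > 0:
            want = 65536 if remaining is None else min(65536, remaining)
            chunk = handle.read(want)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """Cheapest test first: size, then a prefix hash, then the full hash."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False                      # different sizes: cannot be identical
    if sha256_of(path_a, PREFIX_BYTES) != sha256_of(path_b, PREFIX_BYTES):
        return False                      # first block already differs
    return sha256_of(path_a) == sha256_of(path_b)

# Demonstration with throwaway files in a temporary directory.
samples = {
    "original": b"A" * 10000,
    "copy": b"A" * 10000,
    "shorter": b"A" * 9999,
    "tail_differs": b"A" * 9999 + b"B",
}
with tempfile.TemporaryDirectory() as folder:
    paths = {}
    for name, data in samples.items():
        paths[name] = os.path.join(folder, name)
        with open(paths[name], "wb") as handle:
            handle.write(data)
    results = {name: same_content(paths["original"], paths[name]) for name in samples}
print(results)
```

Only the `tail_differs` case reaches the full-file hash before being rejected; `shorter` is ruled out by the size check alone.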
• Need something else for similar files
  – Define “similar”
• Fuzzy Hashing
  – Context Triggered Piecewise Hashing
    • Similar patterns of ones and zeros
    • File format agnostic
    • Implemented in the free ssdeep program
      – http://ssdeep.sf.net
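To give a feel for context triggered piecewise hashing, here is a toy version: a rolling sum over the last few bytes decides where each piece ends, every piece is hashed separately, and two inputs are scored by how many piece hashes they share. This is not ssdeep's algorithm, signature format, or scoring. The window size, the trigger value, the truncated MD5 piece hashes, and the Jaccard-style score are all assumptions made to keep the sketch short.

```python
import hashlib

WINDOW = 7     # bytes in the rolling window (toy value)
TRIGGER = 16   # a piece ends when the rolling sum % TRIGGER == TRIGGER - 1

def pieces(data: bytes):
    """Split data at boundaries chosen by the content itself, not by fixed offsets."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # keep only the last WINDOW bytes in the sum
        if rolling % TRIGGER == TRIGGER - 1:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                # trailing piece with no trigger

def signature(data: bytes) -> set:
    """Set of short hashes, one per piece."""
    return {hashlib.md5(piece).hexdigest()[:8] for piece in pieces(data)}

def similarity(a: bytes, b: bytes) -> float:
    """Fraction of piece hashes shared by the two inputs (0.0 to 1.0)."""
    sig_a, sig_b = signature(a), signature(b)
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)

def pseudo_data(seed: str, blocks: int = 128) -> bytes:
    """Deterministic filler bytes for the demonstration."""
    return b"".join(hashlib.sha256(f"{seed}{i}".encode()).digest() for i in range(blocks))

doc = pseudo_data("doc")
edited = doc[:2000] + b"INSERTED!!" + doc[2000:]   # small insertion in the middle
unrelated = pseudo_data("other")

print(similarity(doc, doc), similarity(doc, edited), similarity(doc, unrelated))
```

Because piece boundaries depend only on nearby bytes, the insertion disturbs a few pieces around it and the rest line up again, which is why the edited file still scores close to the original while a cryptographic hash of the whole file would differ completely.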
path information
  -r         Work recursively
  -m [FILE]  Load known hashes from FILE and match them
  -w         When used with -m, display which file matched
• Pipe the output of one program to another
• Use the pipe symbol: |
  C:\> command1 | command2
• Required nsrllookup flags
  -s         Set server to use. Use nsrl.kyr.us