BLAKE Cryptographic Hash Function Paper

BLAKE Cryptographic Hash Function and FINAL BLAKE 256
4003482
Team Asteroid: Breandan Considine, Sean Pearl and Zhenhua “Java” Lu

TABLE OF CONTENTS

I. Introduction
II. BLAKE Hash Function
    i. Background
    ii. Overview
    iii. Example Inputs and Outputs
III. BLAKE256 Implementation
    i. Initial Design
    ii. Initial CPU Profiling
    iii. Design Revisions
    iv. Results
IV. Guides and Resources
    i. User
    ii. Developer
V. Project Results
    i. Learning Outcomes
    ii. Future Work
    iii. Roles and Responsibilities
References

I. Introduction

This document covers the whole of our work with the BLAKE hash function. Our implementation covers the BLAKE256 variant, but it should be simple for a developer working from our source to derive the other variants, especially BLAKE512. We chose BLAKE256 for two reasons: it is clearer for the purposes of an exercise and presentation, and its 32-bit words map directly onto the int type in Java. We chose to develop BLAKE in Java rather than Python, accepting the additional overhead of the language and the JVM.

II. BLAKE Hash Function

i. Background

The BLAKE cryptographic hash function was developed by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael Phan as a candidate in the NIST SHA-3 hash competition. Originally submitted in 2008, BLAKE has survived two rounds against 50 and 13 competing hash functions, respectively. It has been selected as a finalist alongside four other candidates and awaits NIST’s decision, which is scheduled for this year (2012). The final BLAKE family has been tweaked slightly since the original submission, as permitted by the competition rules. The changes affected the number of rounds of the core manipulations that run per message block, to be more conservative about security while remaining fast. You can read more about BLAKE at http://131002.net/blake/

ii. Overview

There are four variants of BLAKE as specified in the final BLAKE submission. Figure 1 shows the differences, the most striking being in word size and digest size; the digest size gives each variant its name.

Figure 1.

Thus, the core BLAKE256 compression function takes as input 512 bits (16 words, 64 bytes) of message data, 256 bits (8 words, 32 bytes) of chaining value, 128 bits (4 words, 16 bytes) of salt, and a counter of 64 bits (2 words, 8 bytes).

The initial 8 words of chaining values are the same initial values as in SHA-256, and BLAKE also specifies a table of 16 constants taken from pi. These were chosen, as in other hash functions, to prevent weakness with certain types of input. The iteration mode BLAKE uses is HAIFA, which allows for explicit handling of a salt to resist certain preimage attacks.

The compression function in BLAKE works in three stages, forming the local wide-pipe construction shown in Figure 2. The first stage, initialization, takes as input the 8 chaining words, 4 salt words, 8 constant words and 2 counter words. It combines these into an expanded state of 16 words that the second stage will use, as shown in Figure 3.

Figure 2.

Figure 3. h: chaining value, s: salt, c: constant, t: counter

The second stage of the BLAKE compression function is the round function, whose core, the G function, is taken from the ChaCha stream cipher. BLAKE256 iterates 14 times over the round function (whereas BLAKE512 iterates 16 times). Each round is made up of a column step followed by a diagonal step, each consisting of four quarter rounds (calls to G) that can be run in parallel. The words selected from the expanded chaining value for each individual column or diagonal step are shown in Figure 4.
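The initialization stage described above can be sketched in a few lines of Java. This is a minimal illustration, not the project's BLAKEHash code: the class name `BlakeInitSketch` and the method name `initialize` are hypothetical, and the sketch assumes the salt and counter are passed as arrays of 4 and 2 words. The constant values are the BLAKE-256 initial values and the first eight pi-derived constants.

```java
final class BlakeInitSketch {
    // Initial chaining values (the same words as the SHA-256 initial hash value).
    // Hex literals above 0x7FFFFFFF become negative ints with the same bit pattern.
    static final int[] IV = {
        0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
        0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19
    };

    // The first 8 of the 16 constants taken from pi; only these are used here.
    static final int[] C = {
        0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344,
        0xA4093822, 0x299F31D0, 0x082EFA98, 0xEC4E6C89
    };

    // Expands 8 chaining words, 4 salt words and 2 counter words into 16 state words.
    static int[] initialize(int[] h, int[] s, int[] t) {
        int[] v = new int[16];
        System.arraycopy(h, 0, v, 0, 8);       // v0..v7  = h0..h7
        for (int i = 0; i < 4; i++) {
            v[8 + i] = s[i] ^ C[i];            // v8..v11 = salt XOR constants
        }
        v[12] = t[0] ^ C[4];                   // low counter word, used twice
        v[13] = t[0] ^ C[5];
        v[14] = t[1] ^ C[6];                   // high counter word, used twice
        v[15] = t[1] ^ C[7];
        return v;
    }
}
```

With a zero salt and zero counter, the second half of the state is simply the eight constants.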

Figure 4.

These selected values are fed into the Gi function, along with two words of the message and two words of the constants, chosen by a permutation of the message-block and constant indices, shown in Figure 5.

Figure 5.

The permutation is selected by the round number (0 … 13) mod 10, which gives one of the ten permutations in Figure 5. The permutations are used in the Gi function, shown in Figure 6 and Figure 7, to determine which message word and constant word are used. The index “i” runs over 0 … 7, matching G0 … G7 in Figure 4. The rotation amounts differ in BLAKE512.
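The Gi function and the column and diagonal steps described above can be sketched as follows. This is a hedged illustration rather than our BLAKEHash source: the class and method names are hypothetical, the permutation row is passed in as the parameter `sigma` instead of being looked up in the ten-row table of Figure 5, and `k` is assumed to hold all 16 constants. The rotation amounts 16, 12, 8 and 7 are those of BLAKE-256.

```java
final class BlakeRoundSketch {
    // State indices for G0..G3 (columns) followed by G4..G7 (diagonals).
    static final int[][] IDX = {
        {0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15},
        {0, 5, 10, 15}, {1, 6, 11, 12}, {2, 7, 8, 13}, {3, 4, 9, 14}
    };

    // Applies Gi in place to the 16-word state v.
    // m: 16 message words, k: 16 constants, sigma: permutation row for this round.
    static void g(int[] v, int i, int[] m, int[] k, int[] sigma) {
        int ia = IDX[i][0], ib = IDX[i][1], ic = IDX[i][2], id = IDX[i][3];
        int x = sigma[2 * i], y = sigma[2 * i + 1];

        v[ia] += v[ib] + (m[x] ^ k[y]);
        v[id] = Integer.rotateRight(v[id] ^ v[ia], 16);
        v[ic] += v[id];
        v[ib] = Integer.rotateRight(v[ib] ^ v[ic], 12);
        v[ia] += v[ib] + (m[y] ^ k[x]);
        v[id] = Integer.rotateRight(v[id] ^ v[ia], 8);
        v[ic] += v[id];
        v[ib] = Integer.rotateRight(v[ib] ^ v[ic], 7);
    }

    // One full round: the column step (G0..G3), then the diagonal step (G4..G7).
    static void round(int[] v, int[] m, int[] k, int[] sigma) {
        for (int i = 0; i < 8; i++) {
            g(v, i, m, k, sigma);
        }
    }
}
```

Note that `Integer.rotateRight` is a true rotation; Java's `>>>` operator alone discards the bits shifted out, so a rotation written by hand needs both a right and a left shift.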

Figure 6. Figure 7.

The Gi function runs 8 times in each of the 14 rounds, so 112 times per message block. The permutations are chosen so that no constant is used more than once with the same word of the message. After all fourteen rounds are executed, the 16 words are finalized to give 8 chaining values, according to Figure 8. According to the specification, BLAKE256 can be used on messages of fewer than 2^64 bits, while BLAKE512 supports messages of fewer than 2^128 bits, to prevent certain collisions that could be obtained simply by extending a message that far.
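The finalization step can be sketched as a single loop: each new chaining word is the old chaining word XORed with one salt word and with two words of the state after the rounds. The class and method names below are hypothetical and the arrays are assumed to have 8, 4 and 16 words respectively; this is an illustration, not the finalize method from our BLAKEHash class.

```java
final class BlakeFinalizeSketch {
    // h: 8 old chaining words, s: 4 salt words, v: 16 state words after the rounds.
    // Returns the 8 new chaining words h'.
    static int[] finalizeState(int[] h, int[] s, int[] v) {
        int[] out = new int[8];
        for (int i = 0; i < 8; i++) {
            out[i] = h[i] ^ s[i % 4] ^ v[i] ^ v[i + 8];
        }
        return out;
    }
}
```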

Figure 8. h’: new chaining value, h: old chaining value, s: salt

iii. Example Inputs and Outputs

We hashed a number of inputs and found quite different results for even small changes. An empty document had a hash, as output in binary, of

1100001110100101110000101001000001100001110001001110000101011111011000
0111000110111000010101111011100001010101001110000111000010011000010101
0110101100000001001001100010110000111001101001010001010011010101000101
0010111000010101011000100001111000101000000010111010011011011100001110
1001111100001010110111001111000110100110101011000011101010010101101011
1000101000000010010000011100010100000001010000101000100

or roughly “}x ’Ý LÓg©åB‘"‰uPYD3šÖïÙÛË1[W|S” in ASCII. A document containing one space had an output of roughly “å É¾Í½©Ä ` bÚQMQ%¬!›mç∙jéZ†‡D” in ASCII. The test input and output files in the project archive contain the exact inputs and outputs for Alice in Wonderland and other documents.
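Digest bytes rendered as ASCII lose any non-printable values, which is why the strings above are only approximate. A lossless alternative is to print the digest as hexadecimal. The sketch below is an assumption about how that could be done, not part of our project code: the names `toBytes` and `toHex` are hypothetical, and the words are written most significant byte first, as BLAKE specifies.

```java
final class DigestHexSketch {
    // Converts 32-bit words to bytes, most significant byte first.
    static byte[] toBytes(int[] words) {
        byte[] out = new byte[words.length * 4];
        for (int i = 0; i < words.length; i++) {
            out[4 * i]     = (byte) (words[i] >>> 24);
            out[4 * i + 1] = (byte) (words[i] >>> 16);
            out[4 * i + 2] = (byte) (words[i] >>> 8);
            out[4 * i + 3] = (byte) words[i];
        }
        return out;
    }

    // Renders bytes as lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }
}
```

A 256-bit digest printed this way is always exactly 64 hex characters, which makes two outputs easy to compare by eye.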

III. BLAKE256 Implementation

i. Initial Design

The design of our BLAKE256 implementation can be described in two phases: data representation first, then data flow. Since words in BLAKE256 are 32 bits (4 bytes), we decided to store everything described as a word as an int in Java, which is itself a 32-bit datatype. We converted some of the larger constants and chaining values to their negative equivalent in binary, because Java integers are signed. We were careful to do this by casting (to long and then back to int), to ensure that the actual bits of the number remained unchanged. The arithmetic operations are defined in the documentation in terms of bit shifting, addition, and xor, so those are the operations we use on ints. For collections of words, such as the block of chaining values, a message block, or the constants, we used arrays of type int. We chose to do this so that we could use iteration in the spirit of good coding practice, and for ease of writing. There are some instances where ints are converted to bytes and vice versa, but only for the purposes of IO. We also defined a few counters later, while designing the data flow. Most of the non-iterated data is stored as private fields, for reasons besides efficiency.

For our data flow, we decided to take input as bytes, which is quite natural for IO. We defined the method bytePack to take bytes one by one in sequence until it has four, which it packs into an int and gives to the hash method, in essence acting as a buffer. The hash method does similarly, taking in message words until it has 16, at which point it calls the rounds method to perform the rounds. The rounds method first calls the initialize method in BLAKEHash, which returns an array containing the expanded chain values. We define a loop over the range 0 … 13 to perform the 14 rounds of the G function on this array; its body is itself two loops, one for the column steps followed by one for the diagonal steps.
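The byte-buffering idea behind bytePack can be sketched as below. This is a simplified stand-in rather than our actual method: the class name, the `IntConsumer` callback standing in for the hash method, and the `pending` accessor are all assumptions made for illustration. Bytes are packed most significant first, and each byte is masked with 0xFF because Java bytes are signed.

```java
import java.util.function.IntConsumer;

final class BytePackSketch {
    private final IntConsumer wordSink; // stands in for the hash method
    private int buffer;                 // bytes collected so far, packed big-endian
    private int count;                  // number of bytes currently in the buffer

    BytePackSketch(IntConsumer wordSink) {
        this.wordSink = wordSink;
    }

    // Accepts one byte; emits a 32-bit word once four bytes have arrived.
    void bytePack(byte b) {
        buffer = (buffer << 8) | (b & 0xFF); // mask so negative bytes do not sign-extend
        count++;
        if (count == 4) {
            wordSink.accept(buffer);
            buffer = 0;
            count = 0;
        }
    }

    // Number of bytes waiting for a complete word.
    int pending() {
        return count;
    }
}
```

The same pattern, one level up, collects 16 words before the rounds are run.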
Each of the two loops inside the rounds method contains, respectively, the code for a single Gi function for a column or a diagonal. Since the expanded array is stored locally and not as a field, we decided not to make a method call for each Gi function, which also made for better performance. We defined another method in BLAKEHash, finalize, which is called on the expanded array after the rounds and places the finalized values into the array of chain values. The digest method in BLAKEHash simply loops over the array of chain values, converting each word into four bytes, which it places into the array that it was passed. We elected not to implement a hard ceiling on the amount of input hashed, as hashing that much input within our lifetimes is far beyond the capabilities of modern machines.

ii. Initial CPU Profiling

Across the three tests we settled on for profiling, the rounds method in BLAKEHash itself takes up most (over half) of the runtime, with the rest spent in other methods such as initialization and finalization. Line by line, most of the time was spent in the two Gi function loops. The 1 million zeros test took 1.6 seconds, the 10 million zeros test took 16.76 seconds, and the 100 million zeros test took 156.24 seconds. All of the testing was run on glados for speed and consistent performance. The full results of the profiling can be found in the project archive.

iii. Design Revisions

Looking at the results of the profiling, we decided to take the recommendation of unrolling the loops in the G function in the rounds method of the BLAKEHash class. Our arithmetic was already largely as efficient as it could be, so we didn’t change that. Unrolling did, however, remove a few operations that we no longer had to perform, most of which computed array indices. For example, the line

    m[1][(g+1)%4] = (m[1][(g+1)%4] ^ m[2][(g+2)%4]) >>> 12;

became lines like

    m[1][3] = (m[1][3] ^ m[2][0]) >>> 12;

which should theoretically save us a few fractions of a second over a few hundred thousand calls to the G function. We attempted to remove the indexing overhead by replacing the arrays with many single variables, but that required too large a change to the rest of the code to be worthwhile.

iv. Results

The changes we made sliced a small percentage off the running time by eliminating some of the overhead that comes with looping. Over 100 million bytes hashed, we saved about eight seconds (148.34 seconds, down from 156.24, or about 5% of the total running time). Over ten million, we saved about a second and a half (15.14 seconds), and over a million we saved about a tenth of a second (1.51 seconds), so gains of between 5% and 10% in each category. The line profile of each run shows the lines in the rounds method taking up less time, though this is slightly less helpful because there are two to three times as many lines in rounds after optimization. The full results of the profiling can be found in the project archive.
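The percentage gains follow directly from the before and after timings reported above. The short sketch below recomputes them; the class and method names are hypothetical and the numbers are the ones quoted in this section.

```java
final class TimingGainSketch {
    // Percentage of running time saved when a run drops from `before` to `after` seconds.
    static double percentSaved(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // {bytes hashed, seconds before unrolling, seconds after unrolling}
        double[][] runs = {
            {1e6, 1.60, 1.51},
            {1e7, 16.76, 15.14},
            {1e8, 156.24, 148.34}
        };
        for (double[] r : runs) {
            System.out.printf("%.0e bytes: %.1f%% saved%n", r[0], percentSaved(r[1], r[2]));
        }
    }
}
```

All three results land between 5% and 10%, with the ten-million-byte run showing the largest relative gain.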

IV. Guides and Resources

i. User

Java programs run on a virtual machine that executes Java bytecode and has been ported to most major platforms. We have not created a graphical user interface for BLAKE256. In *nix and Windows shells, you can navigate to the bin directory and use the command “java BLAKE256 infile outfile”, where “infile” is the path to the input plaintext file, and “outfile” is the path to the output plaintext file.

ii. Developer

If you want to recompile the source, you can do so with any Java compiler. We give two examples below.

In *nix and Windows shells, you can use the javac compiler from the JDK. Navigate to the source directory and use the command “javac *.java”. This will create the .class files needed to run BLAKE256 with the java command in the source directory.

Also on *nix and Windows systems, you can download and install the Eclipse IDE to compile and run the source code. You can find more instructions on how to do so at http://www.eclipse.org/ .

V. Project Results

i. Learning Outcomes

At the end of the project, we saw that BLAKE does not need to reinvent the wheel to be a successful hash function. Our work implementing BLAKE showed us that a cryptographic algorithm can be assembled from preexisting components, using their individual strengths to form a strong and fast system. We also learned a number of implementation and optimization techniques for cryptographic algorithms in Java, including bit shifting, loop unrolling, and array overhead (which we did attempt to eliminate).

ii. Future Work

One of the things we would like to do in the future is to port BLAKE256 to C in order to gain performance advantages over the Java platform. Java just has too much overhead to be practical for small systems like a cryptographic algorithm. We might also consider designing a hardware implementation for an FPGA. Another thing we would be interested in doing is what the graduate students did: running an analysis of BLAKE256’s cryptographic strength for various numbers of rounds using a suite like the NIST test suite or TestU01.

iii. Roles and Responsibilities

We split the code and documentation responsibilities into three groups, each of which was assigned to two people from the team. Lu and Breandan were responsible for writing and testing the main class BLAKE256. Breandan and Sean were responsible for the design and implementation of the BLAKEHash class, including initialization, finalization, and rounds. Lu and Sean were responsible for the documents, including this document. Sean was responsible for running and recording the time analysis, and the entire team worked on optimization. Lu was responsible for setting up and managing the team website.

VI. References

◦ J.-P. Aumasson, L. Henzen, W. Meier, R. Phan. BLAKE cryptographic hash function, SHA-3 proposal finalist, 2010. http://www.131002.net/blake/blake.pdf accessed 16 Mar 2012.
◦ SHA-3 proposal BLAKE. Webpage. http://www.131002.net/blake/ accessed 16 Mar 2012.
◦ The cryptographic hash function BLAKE, 2011. http://www.youtube.com/watch?v=PgpJNRnx6eY accessed 16 Mar 2012.
