engine: • JVM • Strip away the fat: • Java • Add advanced composite materials: • DSL with inlined functions • Hold on to your socks! 2 • 6.5 Liter naturally aspirated V12 revs to over 11,000 RPM producing 1000 horsepower! • Plus an electric motor from Rimac for extra power & low-end torque
speedup of analytical queries • Sacrificed insert and update performance • Not suitable for general purpose storage • In-memory; Operates on compressed data • Compression metadata doubles as a form of index • Filtering and aggregates operates on compressed data without decompressing • Heavy reliance on number theory with millions of math operations • This section of code is very hot as it’s called in tight loops (profiling is important) • We’ll zoom in to a common operation in the hot portion of the code: • Computing the average of 2 integers 3
(one of which could be negative) always add up to a positive number mathematically regardless of user input • Delta compression of sorted numbers • Eg. -1000, -993, -991, -987, -981 has all positive deltas: 7, 2, 4, 6 • Values in modulo arithmetic • Indexes into an array • eg. binary search • Etc. 4
average(Int.MAX_VALUE, Int.MAX_VALUE) == -1 fun average(value1: Int, value2: Int): Int { return (value1 + value2) / 2 } // Fails for large differences // average (-1_000_000_000, 2_000_000_000) == negative # fun average(value1: Int, value2: Int): Int { val max = max(value1, value2) val min = min(value1, value2) return min + (max - min) / 2 }
where the left-most bit is the only negative value • Hypothetical 4 bit example: -8 = 1000 4 = 0100 2 = 0010 1 = 0001 Max value: 4 (0100) + 2 (0010) + 1 (0001) = 7 (0111) 6
over into the left-most bit • 4 bit example: 7 (0111) + 4 (0100) = -5 (1011) [-8 + 2 + 1] Note: The value would have been correct if the left-most bit was positive 8 7
is 2 ^ position. 4 bit example: 0101 = -0 x 23 + 1 x 22 + 0 x 21 + 1 x 20 = 5 • Dividing an integer by 2 is the same as dividing each component by 2: (component value / 2) = (2position / 2) = 2position – 1 • 2position – 1 is the same value as the component that’s 1 position to the right so we just need to shift all the bits to the right by 1 position. (7 + 4) / 2 = 5 example: 7 (0111) + 4 (0100) = 1011 Shifting right = 0101 = 5 Overflow averted! 8
Long values as I encounter the same problem there • Addressed overflow correctness issues and it’s also much more performant • Division can have latencies of several dozen cycles • Addition and shifting each complete in a single cycle • Additionally, throughput is even higher in superscalar CPU architectures • However, this only works when the values add up to a positive number so we need to make sure this utility is not used with unsupported values. 4 bit example: • -4 / 2 = -2 but 1100 shifted right = 0110 = 6 9 fun positiveAverage(value1: Int, value2: Int): Int { return (value1 + value2).ushr(1) // Unsigned shift right 1 position }
{ val max = max(value1, value2) val min = min(value1, value2) if (min == Int.MIN_VALUE) { throw IllegalStateException("Int.MIN_VALUE is not supported") } if (max < abs(min)) { // Cannot check sum in order to catch large negatives throw IllegalStateException(“Negative mathematical sum detected") } return (value1 + value2).ushr(1) } • Although cleaner than the Java equivalent, it’s still too verbose
using check is much cleaner fun positiveAverage(value1: Int, value2: Int): Int { val max = max(value1, value2) val min = min(value1, value2) check(min != MIN_VALUE) { “Int.MIN_VALUE is not supported" } check(max >= abs(min)) { "Negative mathematical sum detected" } return (value1 + value2).ushr(1) }
abs, & 2 check calls) hurts performance since we perform millions of these operations in tight loops. • Switching to assert statements still incurs a penalty since it needs to check if assertions are enabled for each assertion during runtime. • Other validations create temporary objects and these can place extra pressure on the garbage collector. • The validation results in a larger function (check calls are also inlined at compile time). • Functions that exceed a size limit are not considered for inlining by the JIT compiler. • Inlining has the single largest impact on performance as it enables follow-on optimizations (eg. escape analysis). Even if this function qualifies for inlining, the calling functions themselves might no longer qualify after this gets inlined into the call sites. • Larger functions also negatively affect the CPU instruction cache hit rate. • There are other smaller impacts of larger functions such as longer class loading time, longer JIT compile times etc. 13
Assumptions read naturally • Assumptions are clearly separated from regular code in order to distinguish conditions that should always be true from conditions that check for bad user input fun positiveAverage(value1: Int, value2: Int): Int { assume { val max = max(value1, value2) val min = min(value1, value2) min isNotEqualTo MIN_VALUE max isGreaterOrEqualTo abs(min) } return (value1 + value2).ushr(1) }
Strips out assumptions at compile time when false inline fun assume(assumption: AssumptionVerifier.() -> Unit) { @Suppress("ConstantConditionIf") if (VERIFY_ASSUMPTIONS) AssumptionVerifier.assumption() } object AssumptionVerifier { inline fun verify(value: Boolean, errorMessage: () -> Any): Boolean { check(value, errorMessage) return true } infix fun <T : Comparable<T>> T.isGreaterOrEqualTo(value: T): Boolean { return verify(this >= value) { "$this must be greater than or equal to $value" } } infix fun <T> T.isNotEqualTo(value: T): Boolean { return verify(this != value) { "$this should not be equal to $value" } } … }
production as assumptions are stripped out at compile time for production builds. • The generated class files are much smaller when assumption verification is disabled. • I liberally state all my assumptions at the beginning of most functions. • Reasoning about the code is much easier when you see the expectations clearly laid out. • 99% of defects are detected immediately when changing the code as an assumption will fail. • Bad assumptions don’t propagate. When a defect is found, it’s detected almost right at the root so I don’t have to analyze side effects. • Data integrity is crucial for a storage engine and contract-based programming enables much higher quality guarantees and confidence. 16
unit & integration tests. • I have a test with a false assumption that verifies that an exception is thrown to ensure that assumptions are validated. • User input is heavily validated with regular checks (not with assumptions). • Assumptions are deep in the engine many layers down. • They are there to catch programing defects and validate the correctness of my mental model when performing mathematical transformations of the data. • Enumerating all possible inputs for a storage engine is not feasible. • Test for boundary cases of each component • Create random input, insert it in each layer of the engine and pull it back out to ensure that it matches the original input. Repeat many times and capture the iteration number and seed of the random number generator when it fails in order to reproduce. • Advanced algorithms / data structures are also validated against simple but inefficient reference implementations with random data to ensure that they produce the same results 17