Data compression is so obviously useful that we take it for granted. From ‘Content-Encoding: gzip’ to video streaming to tarballs, compression has long been an important part of every platform. Still, it doesn’t have to be a black box – all it takes is a bit of information theory and some intuitions about patterns in data.
My presentation will cover the algorithms at the heart of most compression tools, as well as how to design protocols and data formats that go with their flow. I’ll start from the ground up (run-length, delta, and Huffman coding), pick apart some of the tools we use every day (gzip’s DEFLATE, bzip’s Burrows-Wheeler transform), and then show how I wrote a library that does decompression in under 50 bytes of RAM on a hard real-time embedded system.
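To give a flavor of the "ground up" techniques, here is a minimal Python sketch of run-length and delta coding. It is an illustration only, not code from the talk or from the embedded library; the function names are hypothetical, and real formats pack the counts and differences into bits rather than Python lists.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length coding: collapse each run of equal bytes into (count, value)."""
    return [(len(list(run)), value) for value, run in groupby(data)]


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in pairs)


def delta_encode(samples: list[int]) -> list[int]:
    """Delta coding: keep the first sample, then store successive differences."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the samples by keeping a running sum of the differences."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples
```

Both rely on patterns in the data: run-length coding pays off when values repeat, and delta coding pays off when neighboring values are close, since small differences are cheaper for a later stage such as Huffman coding to represent.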