Data Structures: The Code That Isn’t There

Atomic Object
September 24, 2012

Watch the presentation at: http://www.infoq.com/presentations/Data-Structures

Most programmers rely on a few core data structures, but they’re missing out on useful properties that more specialized data structures provide.

The wrong data structures can bog an implementation down in irrelevant detail or create behaviors that waste time and effort, but the right ones can give powerful guarantees for free. My talk will present lesser-known data structures and their unique advantages:

- Skiplists are simple data structures whose design leads to performance comparable to a balanced binary tree (expected O(log n) search and insertion), without any need for non-localized operations such as rebalancing. (Example use case: Demonstrating how simple invariants can lead to powerful emergent properties.)

- Difference lists provide a way to explicitly model temporary uncertainty. They are immutable, yet can still be refined as more information becomes available. They have much in common with lazy evaluation, but for data rather than control flow. (Example use case: Adding more flexibility to immutable languages by relaxing the flow of time.)

- Rolling hashes can find deterministic breaking points in buffers of binary data, enabling consistent chunking and re-use as data changes. (Example use case: rsync.)

- Jumpropes (a data structure of my own invention) automatically de-duplicate content stored in them, including data shared between multiple files. Modified content can be stored with very little additional overhead, allowing for cheap versioning. Finally, the next several fragments can always be retrieved in parallel, enabling simple buffering for streaming media. (Example use case: scatterbrain, a distributed filesystem to be released soon.)
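The rolling-hash item above can be made concrete with a short sketch. This is not rsync's algorithm or the talk's own code; it is a minimal polynomial rolling hash, and the names `chunk`, `WINDOW`, `MASK`, `BASE`, and `MOD`, along with their values, are assumptions chosen for illustration. A chunk boundary is declared wherever the hash of the last `WINDOW` bytes has its low bits equal to zero, so boundaries depend only on nearby content and not on absolute position.

```python
import random

WINDOW = 16          # bytes in the rolling window (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero (assumed; ~64-byte chunks)
BASE = 257           # polynomial base (assumed)
MOD = (1 << 31) - 1  # modulus (assumed)


def chunk(data):
    """Split data wherever the hash of the last WINDOW bytes matches MASK."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte that slid out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        # Shift the window and add the newest byte.
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


# Demo: prepend a few bytes and count how many chunks are unchanged.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"new header" + original
shared = set(chunk(original)) & set(chunk(edited))
print(f"{len(shared)} of {len(chunk(original))} chunks are reused after the edit")
```

Because the hash is never reset at a boundary, an insertion near the start only disturbs the chunks that overlap it; every later boundary falls on the same content as before, which is what makes the chunks reusable.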


Transcript

  1. "The cheapest, fastest, and most reliable components are those that aren't there." data structures - Gordon Bell
  2. Unification (difference lists)

    ?- [X, Y, X] = [1, 2, Z].
    X = 1
    Y = 2
    Z = 1
    yes
  3. Unification (difference lists)

    ?- [X, Y, H] = [1, 2, Z].
    X = 1
    Y = 2
    Z = H ???
    yes
  4. (difference lists)

    ?- A = [1,2|B], B = [3|C]
    A = [1, 2, 3|C]
    yes
  5. (difference lists)

    ?- A = [1,2|B], B = [3|C]
    A = [1, 2, 3|C]
    yes
    Yes!
  6. more fundamental than lazy evaluation or lazy streams (a bit closer to futures/promises) (difference lists)
  7. (for more details, check out Andrew Tridgell’s thesis: Efficient Algorithms for Sorting and Synchronization, pg. 64) (rolling hash)
  8. then, it just needs a key/value store (jumprope)

    get(hash) => jr_node
    set(hash, jr_node) => OK | ERROR
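The key/value interface on the last slide can be sketched as a small content-addressed store. The transcript does not show the jumprope node format or say when `set` returns ERROR, so everything here is an assumption for illustration: the `ChunkStore` class, the `put` helper, the use of SHA-256 as the hash, raw bytes standing in for `jr_node`, and a mismatch between key and content as the error case.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory key/value store keyed by content hash."""

    def __init__(self):
        self._nodes = {}

    def __len__(self):
        return len(self._nodes)

    def get(self, key):
        # get(hash) => node, or None when the key is unknown
        return self._nodes.get(key)

    def set(self, key, node):
        # set(hash, node) => "OK" | "ERROR"
        # Assumed error case: the key is not the hash of the node's content.
        if hashlib.sha256(node).hexdigest() != key:
            return "ERROR"
        self._nodes[key] = node  # identical content maps to the same key
        return "OK"


def put(store, node):
    """Store node under its own SHA-256 digest and return that key."""
    key = hashlib.sha256(node).hexdigest()
    if store.set(key, node) != "OK":
        raise ValueError("store rejected node")
    return key


# Two "files" that share one chunk: the shared chunk is stored only once.
store = ChunkStore()
file_a = [put(store, c) for c in (b"intro", b"shared body", b"ending A")]
file_b = [put(store, c) for c in (b"shared body", b"ending B")]
print(len(store), "nodes stored for", len(file_a) + len(file_b), "references")
# prints: 4 nodes stored for 5 references
```

Keying each node by the hash of its content is what gives de-duplication for free in this sketch: storing the same bytes twice simply overwrites an identical entry.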