
Yuki: Functional Data Structures for Riak

Riak is extremely fast as a key-value store, but querying on secondary indexes or running MapReduce jobs can result in unpredictable latency. In practice, developers often require richer means of querying data in real-time. Yuki is an OCaml library that implements various functional data structures in Riak, giving users the ability to interact with their data as if it were a queue, a heap, a random access list or a custom data structure. Yuki has been used in practice to achieve extremely low-latency random access, flexible paging, and conditional streaming of Riak data.

Ryland Degnan

October 29, 2013

Transcript

  1. Yuki: Functional Data Structures for Riak

    Ryland Degnan, Senior Software Engineer, Netflix

    Ryland Degnan Yuki: Functional Data Structures for Riak slide 1 /25
  2. Use case: Skydeck

    - Social application built around photo sharing
    - Users can create profiles, exchange messages, post photos and vote on the best photos
    - Backend written in OCaml, interacts with mobile application via REST API
    - All user data stored in Riak, using the excellent protobuf client for OCaml (Dave Parfitt)
    - Redis used for caching, ephemeral data
  3. Our experience with Riak

    In general:
    - Key-value lookup – fast!
    - Secondary index, range query, MapReduce – great for analytics or as part of an asynchronous job, but not to serve real-time APIs

    What if we need to run more complicated queries?
    - For example, “What photos have my friends liked lately?”
    - Paging by date/index, fast lookup based on attributes other than primary key

    Denormalization, intelligent caching and search go a long way, but we found ourselves wanting a more general solution.
  4. One possibility: a simple linked list

    - Store the head at some predefined key, use links to point to the next item in the list
    - What happens when we need to retrieve the last item?

    [Diagram: linked list a, c, d]

    | Operation      | List      |
    |----------------|-----------|
    | cons, snoc     | O(1)/O(n) |
    | measure/length | O(n)      |
    | insert/lookup  | O(n)      |

    Can we do better?
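The costs in the table above can be illustrated with a minimal in-memory sketch. This is not Yuki's code: the `cell` type and function names are assumptions made for illustration, and each link that this sketch follows in memory would correspond to one Riak fetch in the stored version.

```ocaml
(* In-memory stand-in for the Riak-backed list: each cell links to the next. *)
type 'a cell = Nil | Cons of 'a * 'a cell

(* cons: O(1), only a new head cell is created. *)
let cons x l = Cons (x, l)

(* snoc: O(n), every cell on the way to the end is visited and rebuilt. *)
let rec snoc l x =
  match l with
  | Nil -> Cons (x, Nil)
  | Cons (y, rest) -> Cons (y, snoc rest x)

(* last: O(n), walks every link before reaching the final item. *)
let rec last = function
  | Nil -> None
  | Cons (x, Nil) -> Some x
  | Cons (_, rest) -> last rest

(* length: O(n), nothing is cached so every cell is counted. *)
let rec length = function
  | Nil -> 0
  | Cons (_, rest) -> 1 + length rest

let example = cons 'a' (cons 'c' (cons 'd' Nil))
```

Retrieving the last item of `example` touches all three cells, which is the cost the slide asks about.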
  5. Yuki: Functional Data Structures for Riak

    A library built on top of the Riak client for OCaml that implements several approaches to structuring data and querying it in real time
    - Purely Functional Data Structures (Okasaki, 1996)
    - Lightweight Threads (Lwt) monadic concurrency library
    - Yojson/Biniou for fast serialization
  6. What is a functional data structure?

    “Functional programming’s stricture against destructive updates is a staggering handicap, tantamount to confiscating a master chef’s knives” –Chris Okasaki

    “Like knives, destructive updates can be dangerous when misused” –Chris Okasaki

    In a functional setting all data is immutable:
    - No destructive updates (i.e. read-modify-write)
    - No need for locks, which greatly simplifies concurrency
    - Persistence comes for free
  7. Linked list (imperative)

    [Diagram: nodes a, c, d (before); nodes a, c, d with b added (after)]
  8. Linked list (functional)

    [Diagram: nodes a, c, d (before); nodes a, c, d plus new nodes a, b (after)]
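The functional insert pictured above can be sketched with OCaml's built-in immutable lists. The helper names `insert_at` and `drop` are assumptions for illustration, not part of Yuki. Only the cells in front of the insertion point are copied. The old list stays valid, and the tail is shared between both versions.

```ocaml
(* Insert [x] after the first [n] elements, copying only that prefix. *)
let rec insert_at n x l =
  match n, l with
  | 0, _ -> x :: l
  | _, [] -> [x]
  | n, y :: rest -> y :: insert_at (n - 1) x rest

(* Drop the first [n] elements without copying anything. *)
let rec drop n l = if n = 0 then l else drop (n - 1) (List.tl l)

let before = ['a'; 'c'; 'd']
let after = insert_at 1 'b' before

(* Physical equality (==) checks that both versions point at the same tail. *)
let tail_is_shared = drop 2 after == drop 1 before
```

Because `before` is never modified, readers holding the old version need no lock, which is the persistence property listed on slide 6.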
  9. Binary tree

    [Diagram: binary tree with nodes a, b, c, d, e, f, h (before); the same tree plus new nodes d, f, h, g (after)]
  10. Binary tree

        type 'a tree =
          | Empty
          | Node of 'a * 'a tree * 'a tree

    [Diagram: binary tree with nodes a, b, c, d, e, f, h]

    An improvement?
    - O(log(n)) insert/lookup, if it’s balanced
    - But still O(n) if it’s not
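A minimal sketch of insertion into this tree type shows both points at once. It assumes the fields are ordered as value, left subtree, right subtree, and the function names are illustrative. Inserting copies only the path from the root to the new node, and the height depends entirely on insertion order.

```ocaml
type 'a tree =
  | Empty
  | Node of 'a * 'a tree * 'a tree

(* Insert by rebuilding only the nodes on the search path (path copying). *)
let rec insert x = function
  | Empty -> Node (x, Empty, Empty)
  | Node (y, l, r) as t ->
    if x < y then Node (y, insert x l, r)
    else if x > y then Node (y, l, insert x r)
    else t

let rec mem x = function
  | Empty -> false
  | Node (y, l, r) ->
    if x < y then mem x l else if x > y then mem x r else true

let rec height = function
  | Empty -> 0
  | Node (_, l, r) -> 1 + max (height l) (height r)

let of_list xs = List.fold_left (fun t x -> insert x t) Empty xs

(* Same seven keys, two insertion orders. *)
let balanced = of_list [4; 2; 6; 1; 3; 5; 7]
let degenerate = of_list [1; 2; 3; 4; 5; 6; 7]
```

The `degenerate` tree is effectively a linked list, so lookups in it are O(n) again.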
  11. Enforcing the invariant

    “Make illegal states unrepresentable” –Yaron Minsky
  12. 2-3 tree

        type 'a node =
          | Node2 of 'a * 'a
          | Node3 of 'a * 'a * 'a

        type 'a tree =
          | Zero of 'a
          | Succ of ('a node) tree

    [Diagram: 2-3 tree with leaves e e r t a t o n s i s i h t]
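The nested type above forces every leaf to sit at the same depth, so an unbalanced tree cannot be constructed at all. A small sketch of working with it follows. The functions `depth` and `to_list` are illustrative assumptions rather than Yuki's API, and they need explicitly polymorphic annotations because each recursive call happens at a different element type.

```ocaml
type 'a node =
  | Node2 of 'a * 'a
  | Node3 of 'a * 'a * 'a

type 'a tree =
  | Zero of 'a
  | Succ of 'a node tree

let node_to_list = function
  | Node2 (a, b) -> [a; b]
  | Node3 (a, b, c) -> [a; b; c]

(* Polymorphic recursion: the recursive call is at type ['a node tree]. *)
let rec depth : 'a. 'a tree -> int = function
  | Zero _ -> 0
  | Succ t -> 1 + depth t

(* Flatten one level of nodes per Succ constructor. *)
let rec to_list : 'a. 'a tree -> 'a list = function
  | Zero x -> [x]
  | Succ t -> List.concat (List.map node_to_list (to_list t))

(* Two Succ constructors, so the payload must be nested exactly two levels. *)
let example : int tree =
  Succ (Succ (Zero (Node2 (Node2 (1, 2), Node3 (3, 4, 5)))))
```

Trying to place a bare leaf next to a two-level node inside `example` is rejected by the type checker, which is the invariant being enforced.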
  13. 2-3 tree performance

    Getting better! Storing all the data in the leaves means that the internal nodes can be used to store information about the data for efficient lookups

    | Operation      | List      | 2-3 Tree  |
    |----------------|-----------|-----------|
    | cons, snoc     | O(1)/O(n) | O(log(n)) |
    | measure/length | O(n)      | O(1)      |
    | insert/lookup  | O(n)      | O(log(n)) |

    Could still be improved:
    - Adding an element to the head is O(log(n)), worse than a linked list!
    - We would like to be able to add and remove elements from either end in constant time
  14. Finger tree

    - A simple general-purpose data structure, introduced by Hinze/Paterson in 2004
    - Based on the 2-3 tree, provides efficient access to nodes at the left and right ends of the tree through the use of fingers
    - Can serve as a sequence, priority queue, search tree, priority search queue and more simply by modifying the measure

    [Image: Ralf Hinze]
  15. 2-3 tree vs. finger tree

    [Diagram: the same leaves (e e r t a t o n s i s i h t) drawn as a 2-3 tree and as a finger tree]
  16. Finger tree

        type 'a digit =
          | One of 'a
          | Two of 'a * 'a
          | Three of 'a * 'a * 'a
          | Four of 'a * 'a * 'a * 'a

        type 'a fingertree =
          | Nil
          | Single of 'a
          | Deep of 'a digit * ('a node) fingertree * 'a digit

    [Diagram: finger tree with leaves e e r t a t o n s i s i h t]
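To show why the digits give cheap access at the ends, here is a sketch of adding an element on the left. It follows the standard construction from the Hinze/Paterson paper and leaves out the measure annotations and Riak storage, so it should not be read as Yuki's actual implementation. `head` and `to_list` are illustrative helpers.

```ocaml
type 'a node = Node2 of 'a * 'a | Node3 of 'a * 'a * 'a

type 'a digit =
  | One of 'a
  | Two of 'a * 'a
  | Three of 'a * 'a * 'a
  | Four of 'a * 'a * 'a * 'a

type 'a fingertree =
  | Nil
  | Single of 'a
  | Deep of 'a digit * 'a node fingertree * 'a digit

(* Only a full left digit pushes a Node3 one level down the spine. *)
let rec cons : 'a. 'a -> 'a fingertree -> 'a fingertree = fun a t ->
  match t with
  | Nil -> Single a
  | Single b -> Deep (One a, Nil, One b)
  | Deep (One b, m, sf) -> Deep (Two (a, b), m, sf)
  | Deep (Two (b, c), m, sf) -> Deep (Three (a, b, c), m, sf)
  | Deep (Three (b, c, d), m, sf) -> Deep (Four (a, b, c, d), m, sf)
  | Deep (Four (b, c, d, e), m, sf) ->
    Deep (Two (a, b), cons (Node3 (c, d, e)) m, sf)

(* The first element is always in the top-level left digit: O(1). *)
let head = function
  | Nil -> None
  | Single a -> Some a
  | Deep (One a, _, _) | Deep (Two (a, _), _, _)
  | Deep (Three (a, _, _), _, _) | Deep (Four (a, _, _, _), _, _) -> Some a

let node_to_list = function
  | Node2 (a, b) -> [a; b]
  | Node3 (a, b, c) -> [a; b; c]

let digit_to_list = function
  | One a -> [a]
  | Two (a, b) -> [a; b]
  | Three (a, b, c) -> [a; b; c]
  | Four (a, b, c, d) -> [a; b; c; d]

let rec to_list : 'a. 'a fingertree -> 'a list = function
  | Nil -> []
  | Single a -> [a]
  | Deep (pr, m, sf) ->
    digit_to_list pr
    @ List.concat (List.map node_to_list (to_list m))
    @ digit_to_list sf

let example = List.fold_right cons [1; 2; 3; 4; 5; 6; 7; 8; 9; 10] Nil
```

Most calls to `cons` touch only the top-level digit. The recursive case fires only when that digit is full, which is where the amortized constant bound comes from.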
  17. Measurements

    In order to support efficient lookup/insert, the internal nodes of the finger tree are used to store an additional field that contains positional or ordering information or both

        module type Monoid = sig
          include Stringable
          val zero : t
          val combine : t -> t -> t
        end

        module type Measure = sig
          type t
          module Monoid : Monoid
          val measure : t -> Monoid.t
        end
  18. Application: random-access sequence

    Sequences should support fast positional operations such as accessing the nth element. To this end we annotate the finger tree with sizes:

        module Size(Elem:Elem) = struct
          type t = Elem.t
          module Monoid = struct
            type t = int
            let of_string = int_of_string
            let to_string = string_of_int
            let zero = 0
            let combine = (+)
          end
          let measure _ = 1
        end
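The signatures and the size measure can be exercised on their own with a standalone sketch. The `Stringable` signature below is an assumption inferred from the `of_string`/`to_string` functions that `Size` defines. `CharSize` and `Annotate` are hypothetical names introduced here, with the element type fixed to `char` so the example does not need Yuki's `Elem` signature.

```ocaml
(* Assumed shape of Stringable, inferred from how Size uses it. *)
module type Stringable = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
end

module type Monoid = sig
  include Stringable
  val zero : t
  val combine : t -> t -> t
end

module type Measure = sig
  type t
  module Monoid : Monoid
  val measure : t -> Monoid.t
end

(* The size measure from the slide, specialised to char elements. *)
module CharSize = struct
  type t = char
  module Monoid = struct
    type t = int
    let of_string = int_of_string
    let to_string = string_of_int
    let zero = 0
    let combine = ( + )
  end
  let measure (_ : char) = 1
end

(* A subtree's annotation is the monoidal sum of its elements' measures. *)
module Annotate (M : Measure) = struct
  let of_list xs =
    List.fold_left
      (fun acc x -> M.Monoid.combine acc (M.measure x))
      M.Monoid.zero xs
end

module A = Annotate (CharSize)

let root_annotation = A.of_list ['t'; 'h'; 'i'; 's']
```

Swapping `CharSize` for a different `Measure` changes what the annotations mean without touching `Annotate`, which is the point of parameterising the tree over a monoid.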
  19. Application: random-access sequence

    [Diagram: the finger tree annotated with subtree sizes, 14 at the root, over leaves e e r t a t o n s i s i h t]
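How size annotations turn into fast positional access can be sketched on a plain binary tree with cached sizes. This is a deliberately simpler structure than a finger tree, used only to show the descent rule, and the names `branch` and `nth` are assumptions for illustration.

```ocaml
(* Leaves hold elements; each branch caches the size of its subtree. *)
type 'a t =
  | Leaf of 'a
  | Branch of int * 'a t * 'a t

let size = function
  | Leaf _ -> 1
  | Branch (n, _, _) -> n

(* Smart constructor: the annotation is computed from the children. *)
let branch l r = Branch (size l + size r, l, r)

(* Skip a whole subtree whenever the index lies beyond its cached size. *)
let rec nth t i =
  match t with
  | Leaf x -> if i = 0 then Some x else None
  | Branch (n, l, r) ->
    if i < 0 || i >= n then None
    else if i < size l then nth l i
    else nth r (i - size l)

let example =
  branch (branch (Leaf 't') (Leaf 'h')) (branch (Leaf 'i') (Leaf 's'))
```

Each step of `nth` reads one cached size and discards one subtree, so the number of nodes visited is bounded by the depth of the tree rather than by its length.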
  20. Application: ordered sequence

    - If we maintain the sequence in key order, we have an implementation of ordered sequences, with the annotations serving as split or signpost keys
    - Measures can be maintained in parallel: by combining sizes with split keys we retain the properties of a random-access list

        module Product(M1:Measure)(M2:Measure) = struct
          type t = M1.t
          module Monoid = struct
            type t = M1.Monoid.t * M2.Monoid.t
            let of_string x =
              let (m1, m2) = pair_of_string read_string x in
              M1.Monoid.of_string m1, M2.Monoid.of_string m2
            let to_string (m1, m2) =
              string_of_pair write_string
                (M1.Monoid.to_string m1, M2.Monoid.to_string m2)
            let zero = M1.Monoid.zero, M2.Monoid.zero
            let combine (m1, m2) (m1', m2') =
              M1.Monoid.combine m1 m1', M2.Monoid.combine m2 m2'
          end
          let measure x = M1.measure x, M2.measure x
        end
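A stripped-down version of the product construction shows how two annotations travel together. It drops the string conversions and uses a simplified `Monoid` signature. The `MaxKey` monoid is an assumption about what a split key could look like (the largest key in a subtree), and `measure` and `annotate` are hypothetical helpers.

```ocaml
(* Simplified monoid signature, without the serialization functions. *)
module type Monoid = sig
  type t
  val zero : t
  val combine : t -> t -> t
end

module Size = struct
  type t = int
  let zero = 0
  let combine = ( + )
end

(* Split key: the largest key seen so far; None is the identity. *)
module MaxKey = struct
  type t = char option
  let zero = None
  let combine a b =
    match a, b with
    | None, x | x, None -> x
    | Some x, Some y -> Some (max x y)
end

(* Pairwise product: both measures are maintained side by side. *)
module Product (M1 : Monoid) (M2 : Monoid) = struct
  type t = M1.t * M2.t
  let zero = (M1.zero, M2.zero)
  let combine (a1, a2) (b1, b2) = (M1.combine a1 b1, M2.combine a2 b2)
end

module KeySize = Product (MaxKey) (Size)

(* Each element contributes its own key and a size of one. *)
let measure c = (Some c, 1)

let annotate xs =
  List.fold_left (fun acc c -> KeySize.combine acc (measure c)) KeySize.zero xs
```

With both components cached, one structure can answer "where does key k belong" by comparing keys and "what is the nth element" by comparing sizes.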
  21. Application: ordered sequence

    [Diagram: the finger tree in key order, annotated with (split key, size) pairs, (t, 14) at the root]
  22. Finger tree performance

    Very good!
    - Access to the ends in amortized constant time
    - Insert/lookup in time logarithmic in the size of the smaller piece

    | Operation      | List      | 2-3 Tree  | Finger Tree           |
    |----------------|-----------|-----------|-----------------------|
    | cons, snoc     | O(1)/O(n) | O(log(n)) | O(1)                  |
    | measure/length | O(n)      | O(1)      | O(1)                  |
    | insert/lookup  | O(n)      | O(log(n)) | O(log(min(i, n − i))) |
  23. Streaming data from Riak

    Given some function for iterating through the data, we would like to generate a lazy stream that can be composed with other streams

        module FingerTree (...) : sig
          val iter : string -> (Elem.t -> unit Lwt.t) -> unit Lwt.t
          val to_stream : string -> Elem.t Lwt_stream.t
          ...
        end

    An MVar (Peyton Jones, 1996) is a single-value “mailbox” variable, used for communication between concurrent threads in a synchronous way:

        module FingerTree (...) = struct
          let to_stream ts =
            let waiter, wakener = Lwt.task () in
            let mvar = Lwt_mvar.create_empty () in
            let thread =
              waiter >>= iter (fun elt -> Lwt_mvar.put mvar (Some elt))
              >> Lwt_mvar.put mvar None in
            wakeup wakener ts;
            Lwt_stream.from (fun () -> Lwt_mvar.take mvar)
          ...
        end
  24. Application: activity feed

    A combining process can pull from multiple streams in parallel, for example combining event streams from many friends into a single activity feed:

    [Diagram: Friend 1, Friend 2, Friend 3 → Combine streams → Activity feed!]
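The combining step can be sketched with the standard library's lazy `Seq` in place of `Lwt_stream`, so this version is sequential rather than concurrent and is not the talk's implementation. The `event` type, the newest-first ordering and the function names are all assumptions made for illustration.

```ocaml
(* Hypothetical event: (timestamp, description). *)
type event = int * string

(* Lazily merge two streams that are each sorted newest-first. *)
let rec merge (a : event Seq.t) (b : event Seq.t) : event Seq.t = fun () ->
  match a (), b () with
  | Seq.Nil, rest | rest, Seq.Nil -> rest
  | Seq.Cons (x, a'), Seq.Cons (y, b') ->
    (* Re-wrap the element that was not emitted so it is not lost. *)
    if fst x >= fst y then Seq.Cons (x, merge a' (fun () -> Seq.Cons (y, b')))
    else Seq.Cons (y, merge (fun () -> Seq.Cons (x, a')) b')

(* Combine any number of per-friend streams into one feed. *)
let combine (streams : event Seq.t list) : event Seq.t =
  List.fold_left merge Seq.empty streams

(* Pull only the first n events; the rest of each stream stays unread. *)
let rec take n (s : 'a Seq.t) =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest

let friend1 = List.to_seq [(9, "photo"); (4, "like")]
let friend2 = List.to_seq [(7, "message")]
let friend3 = List.to_seq [(8, "vote"); (1, "profile")]

let feed = combine [friend1; friend2; friend3]
```

Taking one page of `feed` forces only as many elements from each friend's stream as the merge needs, which is what makes paging over a combined feed cheap.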
  25. Future work

    - Support for more finger tree-based data structures: max-priority queues, interval trees, etc.
    - Garbage collection?
    - Translation into Scala
    - Contributions welcome! https://github.com/rdegnan/yuki