Riak Enterprise Revisited (RICON East 2013)

Presented by Chris Tilt at RICON East 2013.

Riak Enterprise has undergone an overhaul since its 1.2 days, mostly around Multi-Datacenter replication. We'll talk about the "Brave New World" of replication in depth: how it manages concurrent TCP/IP connections, Realtime Sync, and the technology preview of Active Anti-Entropy Fullsync. Finally, we'll peek over the horizon at new features such as chaining of Realtime sync messages across multiple clusters.

About Chris

Chris has 25 years in the high technology industry as a software developer, CTO, designer, and startup co-founder. He discovered Erlang indirectly through development of telecommunications test equipment at Tektronix, which launched a new passion in functional programming. During the Dot Com days, he co-founded a startup using OCaml as the core language for graph theoretic analysis of web sites. A fear of compilers led to an intense study and eventual job as the lead on a Java-to-native assembly language compiler for a Massively Parallel Processor Array at Ambric, also written in OCaml. Thinking about how to scale software concurrently led right back to Erlang and Basho, where he works on the Enterprise project team. Chris develops iPhone and Android applications as a hobby. He and his son, Geordie, co-designed a concurrent programming language called 'G' which compiles to C, and a LEGO-sized underwater ROV - both targeted for Arduino. When not programming, he enjoys rocket stoves, cob structures, remodeling, Minecraft, and Kendo.

Basho Technologies

May 13, 2013


Transcript

  1. Talk • Riak Enterprise Overview • Focus: the “Brave New World” of replication • New Features in 1.3 (Released) • What’s coming in 1.4 • Futures
  2. Riak Enterprise • Built upon open-source Riak • 24 x 7 Legendary Basho Technical Support • Closed-source • Extended Monitoring • Multi Data Center Replication*
  3. Use Cases • Primary cluster with hot failover • Availability Zones: active-active clusters create data locality and reduce latency • Reporting/Analytics
  4. Replication Protocols (diagram): Cluster A (source) → Cluster B (sink), carrying riak objects via Realtime, Fullsync, and Proxy GET (for Riak CS)
  5. The Old Way (diagram): Cluster A → Cluster B, with Realtime, Fullsync, and Proxy GET all sharing a single, multiplexed connection
  9. Lessons Learned • A single shared TCP/IP connection, for all replication, doesn’t scale well. • Make it easy to configure! • Dropping realtime objects during intermittent connectivity is not OK. • Networks are unreliable, even in high-end data centers. Connectivity is hard. • Load balancing is critical for fullsync.
  10. The Brave New World riak 1.3 • Ground-up rewrite of replication • Node-to-node connections • All connections start at a single IP:Port • Each protocol has its own channel
  11. The Brave New World riak 1.3 • Realtime queues separate from connections • Fullsync coordinator controls work load • Connection Manager (now moved to Core) handles backoff and retry • Much simpler command and configuration
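The connection manager's backoff-and-retry behavior can be sketched as follows. This is an illustrative Python sketch, not Basho's actual Erlang implementation; the function and argument names are hypothetical.

```python
import random
import time

def connect_with_backoff(dial, base=1.0, max_delay=64.0):
    """Retry dial() until it succeeds, with capped exponential backoff.

    Illustrative only: the real connection manager lives in riak_core
    and is written in Erlang; all names here are hypothetical.
    """
    delay = base
    while True:
        try:
            return dial()
        except ConnectionError:
            # Jittered sleep, then double the delay up to the cap.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
```

A caller would pass a `dial` function that raises `ConnectionError` on failure; the loop retries indefinitely, which suits a replication link that must eventually reconnect on an unreliable network.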
  12. Praise • Andy Gross (the epoch) • Andrew Thomson • Chris Tilt • Dave Parfitt • Jon Merideth • Micah Warren
  13. Realtime Queues • Hook on post-commit • Push objects to ETS queue • RT connection pulls from queue • Bounded, drop objects when full • On shutdown, proxy objects to peer node
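The bounded-queue behavior on this slide can be sketched roughly as below. This is an illustrative Python stand-in for the ETS-backed Erlang queue; the class and method names are hypothetical, and which end gets dropped when the bound is hit is an assumption (this sketch trims the oldest entry).

```python
from collections import deque

class RealtimeQueue:
    """Bounded realtime replication queue (illustrative sketch).

    A post-commit hook pushes objects, the realtime connection pulls
    them, and objects are dropped once the bound is reached. Toy
    stand-in for the ETS queue described on the slide.
    """
    def __init__(self, max_len):
        self.q = deque()
        self.max_len = max_len
        self.dropped = 0

    def push(self, obj):
        """Called from the post-commit hook."""
        if len(self.q) >= self.max_len:
            self.q.popleft()   # assumption: drop the oldest when full
            self.dropped += 1
        self.q.append(obj)

    def pull(self):
        """Called by the RT connection; returns None when empty."""
        return self.q.popleft() if self.q else None
```

Keeping the queue separate from the connection (per the previous slide) means intermittent connectivity only grows the queue up to its bound instead of silently losing every in-flight object.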
  14. Fullsync to the Max = N:M (diagram): Cluster A (source) → Cluster B (sink), e.g. Fullsync 2:2
  15. Fullsync Coordinator workload balancing • Schedules each partition on its vnode • Reservation system on each sink node • Respects max_fssource_node and a “busy” response from a sink node • Caps connections at max_fssource_cluster
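The two caps can be sketched as a simple admission check. Illustrative Python only: the setting names `max_fssource_node` and `max_fssource_cluster` come from the slide, but the function shape is hypothetical, and the sink-side reservation system and "busy" responses are omitted.

```python
def schedule_partitions(partitions, owner_of,
                        max_fssource_node, max_fssource_cluster):
    """Pick which partitions may start fullsync right now (sketch).

    Caps concurrent work per source node (max_fssource_node) and
    across the whole cluster (max_fssource_cluster). The real
    coordinator also honors sink-side reservations and "busy"
    responses, which this toy version leaves out.
    """
    per_node = {}
    running = []
    for p in partitions:
        if len(running) >= max_fssource_cluster:
            break                       # cluster-wide cap reached
        node = owner_of[p]
        if per_node.get(node, 0) >= max_fssource_node:
            continue                    # this node is saturated; retry later
        per_node[node] = per_node.get(node, 0) + 1
        running.append(p)
    return running
```

For example, with one-per-node and a cluster cap of three, a node owning several partitions only gets one of them scheduled in this round.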
  16. Fullsync to the Max = N:M (diagram): Cluster A (source) → Cluster B (sink), Fullsync 2:2, max_fssource_cluster = 10
  17. Cluster Manager For Easier Configuration • riak-repl clustername A • riak-repl connect 192.168.1.100 • riak-repl realtime enable B • riak-repl realtime start B
  18. Coming in 1.4 • Secure Sockets Layer • Network Address Translation • Proxied GET for Riak CS • Fine-tuned fullsync concurrency controls
  19. Coming in 1.4 • Better per-connection stats for RT and FS • Realtime chaining amongst multiple clusters • Technology preview of AAE Fullsync
  20. Faster Fullsync • Current Keylist method is slow • Riak 1.3 has Active Anti-Entropy • Technology Preview AAE Fullsync!
  21. Keylist Compare • Source and sink each create a keylist file • For all keys, write hash of key and object to file • Send the keylist file from sink to source • Source side compares its file to sink’s file
  22. Keylist Compare • Each cluster has to fold over the entire key space • Time is linear with number of keys, K • Network traffic also linear with K • Fullsync of a 0% update still costs K. Ouch. • Discourages frequent synchronizing. Boo.
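The cost asymmetry the slides describe can be put in rough numbers. Illustrative Python only: keylist cost linear in K comes from the slide, but the AAE model's fan-out and constant factor are hypothetical, not measured Riak figures.

```python
import math

def keylist_cost(num_keys):
    """Keylist fullsync: both sides fold the entire key space and ship
    full keylists, so work is proportional to K no matter how few keys
    actually differ."""
    return num_keys

def aae_cost(num_keys, pct_diff, fanout=32):
    """AAE fullsync (rough model): walking a hash tree costs roughly
    D * log_fanout(K) for D differing keys. The fan-out and constant
    factor here are hypothetical, not measured Riak numbers."""
    diffs = max(1, int(num_keys * pct_diff))
    return int(diffs * math.log(num_keys, fanout))
```

Under this model, a fullsync of 1 M keys with almost nothing missing still costs 1,000,000 units of keylist work but only a handful of AAE tree comparisons, which is exactly why the keylist method discourages frequent synchronizing.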
  23. Active Anti-Entropy • AAE maintains a hash tree • real-time updates • persistent • non-blocking
  24.–30. [Diagram slides — no transcript text]

  31. Fullsync AAE • Implement the AAE exchange over TCP/IP • Expect compare time to be linear with % differences • No additional read load from a fold :-)
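The hash-tree exchange idea can be sketched with a toy Merkle tree: compare root hashes and descend only into subtrees whose hashes differ, so the work tracks the differences rather than the key count. Illustrative Python only, not Riak's persistent AAE trees; it assumes both sides hold the same key set so the two trees have identical shape.

```python
import hashlib

def leaf_hash(key, value):
    # Hash of key + object, as in the keylist description earlier.
    return hashlib.sha1((key + value).encode()).hexdigest()

def build_tree(items):
    """Build a binary Merkle tree over sorted (key, value) pairs.

    Returns (hash, payload), where payload is the key for a leaf or
    a [left, right] pair of subtrees for an internal node. Toy
    stand-in for Riak's persistent AAE hash trees.
    """
    if len(items) == 1:
        k, v = items[0]
        return leaf_hash(k, v), k
    mid = len(items) // 2
    left, right = build_tree(items[:mid]), build_tree(items[mid:])
    combined = hashlib.sha1((left[0] + right[0]).encode()).hexdigest()
    return combined, [left, right]

def exchange(a, b, diffs):
    """Descend both trees only where hashes differ; collect differing keys."""
    if a[0] == b[0]:
        return                      # identical subtree: skip entirely
    if isinstance(a[1], str):
        diffs.append(a[1])          # differing leaf found
        return
    for child_a, child_b in zip(a[1], b[1]):
        exchange(child_a, child_b, diffs)
```

Because identical subtrees are pruned at the first matching hash, a 0% difference exchange touches only the roots: no fold over the key space, hence no additional read load.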
  32. Fullsync Benchmarks (diagram): measure key compare time; Cluster A (source) holds 1 M keys, Cluster B (sink) holds 1 M − (%missing × 1 M) keys
  33. Keylist vs AAE (chart): key compare time (secs) vs % missing, comparing the Keylist and AAE methods
  34. What’s Up? • Fast local cloning of a data center • Per-bucket replication between multiple data centers • Support AAE fullsync over clusters of differing ring sizes • Replication of CRDTs across clusters • Strong consistency