Suuchi - Distributed System Primitives

Ashwanth Kumar

April 20, 2017

Transcript

  1. ashwanth kumar @_ashwanthkumar software engineer suuchi - library of distributed

    systems primitives
  2. blank yep, it’s intentional

  3. from simple to unmanageable beasts how we scale systems? an

    opinionated view
  4. Service

  5. Service

  6. Service Service

  7. Service Service Service

  8. Service Service Service

  9. Service Service Service

  10. Service Service Service

  11. rise of KV stores: distributed, replicated, fault-tolerant (optionally sorted)

    2006 BigTable from Google, 2007 Dynamo from Amazon, 2008 Cassandra from Facebook, 2009 VoldemortDB from LinkedIn
  12. Distributed (NoSQL) Datastores Service Service Service

  13. Distributed (NoSQL) Datastores Service Service Service Network Latency

  14. data locality for low latency / data intensive applications

  15. Service Service Service Co-locate data to improve performance

  16. Sharded and replicated to improve throughput Service Service Service

  17. Service Service Service Deal with complex distributed system problems at

    the application layer
  18. Distributed stats aggregation system @indix: DNS based load balancing, A record stats.service.ix

    resolves to 1.1.1.1, 1.1.1.2, 1.1.1.3. Nodes receive updates such as Count("a", 1L), Unique("a", 1L), Count("c", 1L), Unique("a", 1L), Count("b", 1L) and merge them with monoid.plus
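The monoid.plus merging on this slide can be illustrated with a minimal sketch. It assumes Count merges by addition and Unique merges by set union; the exact set is only a stand-in, since the deck asks for approximate aggregates and the real system may use a sketch data structure. All class and method names here are hypothetical and are not taken from Suuchi or Indix code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names; not the Indix or Suuchi implementation.
final class StatsMonoids {
    // A monoid: an identity value plus an associative "plus".
    interface Monoid<T> {
        T zero();
        T plus(T a, T b);
    }

    // Count: partial counts are merged by addition.
    static final Monoid<Long> COUNT = new Monoid<Long>() {
        public Long zero() { return 0L; }
        public Long plus(Long a, Long b) { return a + b; }
    };

    // Unique: sets of seen ids are merged by union (assumed exact set).
    static final Monoid<Set<Long>> UNIQUE = new Monoid<Set<Long>>() {
        public Set<Long> zero() { return Collections.emptySet(); }
        public Set<Long> plus(Set<Long> a, Set<Long> b) {
            Set<Long> merged = new HashSet<>(a);
            merged.addAll(b);
            return merged;
        }
    };

    // Fold any number of partial aggregates into one value.
    static <T> T sum(Monoid<T> m, List<T> parts) {
        T acc = m.zero();
        for (T part : parts) acc = m.plus(acc, part);
        return acc;
    }

    public static void main(String[] args) {
        // Three Count("a", 1L) updates merge into a count of 3.
        System.out.println(sum(COUNT, Arrays.asList(1L, 1L, 1L)));
        // Ids 1, 1, 2 merge into two unique ids.
        Set<Long> one = Collections.singleton(1L);
        Set<Long> two = Collections.singleton(2L);
        System.out.println(sum(UNIQUE, Arrays.asList(one, one, two)).size());
    }
}
```

Because plus is associative, partial aggregates can be merged on whichever node the DNS round robin happens to pick and combined later in any grouping.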
  19. distributed system problems: Communication (पांग ப), Request Routing (key="foo", key="bar", key="baz"),

    Sync / Async Replication, Data Sharding, Cluster Membership
  20. suuchi github.com/ashwanthkumar/suuchi library of distributed systems primitives

  21. primitives - communication: + HandleOrForward - Broadcast - ScatterGather

    uses HTTP/2 with streaming
  22. primitives - sharding/routing: + Consistent Hash Ring - Your own sharding technique?

    Reference: "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web" (ring diagram: node 1, node 2, node 3, node 4)
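As a rough illustration of the consistent hash ring named on this slide, here is a minimal sketch. It assumes MD5-derived 64-bit positions and a configurable number of virtual nodes per physical node; the class name HashRing and its methods are hypothetical and do not reflect Suuchi's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent hash ring; not Suuchi's API.
final class HashRing {
    // Ring positions (sorted) mapped to the node that owns them.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodes;

    HashRing(int vnodes) { this.vnodes = vnodes; }

    // Place several virtual nodes per physical node to even out the load.
    void add(String node) {
        for (int i = 0; i < vnodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void remove(String node) {
        for (int i = 0; i < vnodes; i++) ring.remove(hash(node + "#" + i), node);
    }

    // A key belongs to the first node at or after its position, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("ring has no nodes");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, read as a 64-bit position.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xffL);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(64);
        for (String n : new String[] {"node1", "node2", "node3", "node4"}) ring.add(n);
        System.out.println("foo -> " + ring.nodeFor("foo"));
        ring.remove("node1"); // only keys owned by node1 move elsewhere
        System.out.println("foo -> " + ring.nodeFor("foo"));
    }
}
```

The property that matters is that removing a node only remaps the keys that node owned; every other key keeps its owner.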
  23. primitives - membership: static (scaling up/down needs downtime of the system),

    dynamic (fault tolerance in case of node/process failure)
  24. primitives - replication: sync (every request is successful only if all the replicas succeeded),

    async (provides very high availability for write systems at the cost of eventual consistency)
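The difference between the two replication modes can be sketched as follows. This is a minimal illustration under assumptions: the Replica interface, the choice of the first replica as the local one, and the use of an Executor for background writes are all hypothetical and not drawn from Suuchi.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical sketch of sync vs async replication; not Suuchi's API.
final class Replication {
    interface Replica {
        boolean put(String key, String value);
    }

    // Treat an exception from a replica as a failed write.
    static boolean tryPut(Replica r, String key, String value) {
        try {
            return r.put(key, value);
        } catch (RuntimeException e) {
            return false;
        }
    }

    // Sync: the request succeeds only if every replica succeeded.
    static boolean putSync(List<Replica> replicas, String key, String value) {
        boolean allOk = true;
        for (Replica r : replicas) allOk &= tryPut(r, key, value);
        return allOk;
    }

    // Async: acknowledge after the first (local) replica and let the rest
    // catch up in the background, so they are only eventually consistent.
    static boolean putAsync(List<Replica> replicas, Executor background,
                            String key, String value) {
        boolean localOk = tryPut(replicas.get(0), key, value);
        for (Replica r : replicas.subList(1, replicas.size())) {
            background.execute(() -> tryPut(r, key, value));
        }
        return localOk;
    }

    public static void main(String[] args) {
        Replica healthy = (k, v) -> true;
        Replica failing = (k, v) -> false;
        List<Replica> replicas = Arrays.asList(healthy, failing);
        System.out.println(putSync(replicas, "foo", "1"));                 // false
        System.out.println(putAsync(replicas, Runnable::run, "foo", "1")); // true
    }
}
```

With one failing replica, the sync write is rejected while the async write is still acknowledged, which is the availability versus consistency trade-off the slide describes.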
  25. primitives - storage: + KeyValue + RocksDB (embedded KV store from FB for server workloads)

    - Your Own Abstraction?
  26. getting started - define a gRPC service using proto2 (or proto3) -

    generate the stubs in java / scala - implement the services - connect them together using Suuchi - Server abstraction
  27. let’s see some code see reference for actual links

  28. suuchi @indix - HTML Archive System: handles 1000+ tps, write

    heavy system, stores ~120TB of url and timestamp indexed HTML pages - Stats (as Monoids) Storage System*: all we want is approximate aggregates in real time - Real-time scheduler for our crawlers*: finds out which of the 20 urls to crawl now out of 3+ billion urls, helps crawler crawl 20+ million urls every day
  29. idea behind suuchi: membership, request routing / sharding

    2011 Gizzard from Twitter, 2015 RingPop from Uber, 2016 Slicer from Google, 2016 Suuchi
  30. questions? references available at github.com/ashwanthkumar/suuchi-ds-primitives

  31. more on consistent hash ring

  32. primitives - request routing Consistent hashing and random trees: Distributed

    caching protocols for relieving hot spots on the World Wide Web node 2 node 1 node 3 node 4
  33. primitives - request routing Consistent hashing and random trees: Distributed

    caching protocols for relieving hot spots on the World Wide Web node 2 node 1 node 3 node 4
  34. primitives - request routing Consistent hashing and random trees: Distributed

    caching protocols for relieving hot spots on the World Wide Web node 2 node 1 node 3 node 4
  35. primitives - request routing Consistent hashing and random trees: Distributed

    caching protocols for relieving hot spots on the World Wide Web node 2 node 3 node 4
  36. primitives - request routing - Peer to Peer system - no single point of

    contact - Each node handles or forwards requests transparently - Uses pluggable partitioner scheme - Can be customized as weighted distribution / Rendezvous Hash etc.
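The handle-or-forward decision with a pluggable partitioner can be sketched as below, using Rendezvous (highest random weight) hashing as the example scheme. This is only an illustration under assumptions: the Partitioner interface, the FNV-1a scoring function, and the string results of route are hypothetical and not taken from Suuchi.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of handle-or-forward routing; not Suuchi's API.
final class HandleOrForward {
    // Pluggable scheme: given a key and the cluster members, pick the owner.
    interface Partitioner {
        String owner(String key, List<String> nodes);
    }

    // Rendezvous hashing: score every (node, key) pair, highest score wins.
    static final Partitioner RENDEZVOUS = (key, nodes) -> {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes");
        String best = null;
        long bestScore = 0;
        for (String node : nodes) {
            long s = score(node, key);
            // Break exact ties by node name so every peer picks the same owner.
            if (best == null || s > bestScore
                    || (s == bestScore && node.compareTo(best) < 0)) {
                best = node;
                bestScore = s;
            }
        }
        return best;
    };

    // 64-bit FNV-1a over "node|key" (an assumed, illustrative hash choice).
    static long score(String node, String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : (node + "|" + key).getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Any peer can receive a request: handle it if we own the key, else forward.
    static String route(String self, List<String> nodes, Partitioner p, String key) {
        String owner = p.owner(key, nodes);
        return owner.equals(self) ? "handle" : "forward:" + owner;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        for (String self : nodes) {
            System.out.println(self + " -> " + route(self, nodes, RENDEZVOUS, "foo"));
        }
    }
}
```

Since every peer computes the same owner from the same member list, a client can contact any node, and exactly one node ends up handling each key.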