suuchi - distributed data systems toolkit

Talk given at the Chennai Docker / Go meetup.
References available at https://github.com/ashwanthkumar/suuchi-talk

Ashwanth Kumar

November 19, 2016

Transcript

  1. suuchi - distributed data systems toolkit bit.ly/suuchi-toolkit

  2. blank yep, it’s intentional

  3. ashwanth kumar @_ashwanthkumar ashwanthkumar.in principal engineer

  4. data system paradigms things you always knew but never heard

  5. data shipping function shipping

  6. sum of numbers on data shipping paradigm

  7. data shipping paradigm select col from table;

  8. select col from table; data shipping paradigm <<results>>

  10. sum of numbers on function shipping paradigm

  11. function shipping paradigm select sum(col) from table;

  13. <<result>> function shipping paradigm select sum(col) from table;
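
The two paradigms on the slides above can be compared with a small sketch. It is only an illustration: the `Node` class and the `scan` / `apply` methods are hypothetical names, not part of Suuchi or any database, and "shipped" simply counts how many values would cross the network in each case.

```python
class Node:
    """A toy storage node holding one partition of the column (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def scan(self):
        # Data shipping: every stored value is sent to the caller.
        return list(self.values)

    def apply(self, fn):
        # Function shipping: fn runs next to the data, only its result is sent.
        return fn(self.values)


def sum_data_shipping(nodes):
    """select col from table; then sum on the client."""
    shipped = 0
    total = 0
    for node in nodes:
        rows = node.scan()
        shipped += len(rows)
        total += sum(rows)
    return total, shipped


def sum_function_shipping(nodes):
    """select sum(col) from table; each node returns one partial sum."""
    partials = [node.apply(sum) for node in nodes]
    return sum(partials), len(partials)


nodes = [Node([1, 2, 3]), Node([4, 5]), Node([6, 7, 8, 9])]
data_total, data_shipped = sum_data_shipping(nodes)
fn_total, fn_shipped = sum_function_shipping(nodes)
print(data_total, data_shipped)
print(fn_total, fn_shipped)
```

Both calls compute the same sum; the difference is that data shipping moves one value per row while function shipping moves one value per node.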

  14. data locality for low latency / data intensive applications

  15. data shipping
    - traditional 3 tier applications
    - state is maintained outside the app
    - usually the dbs become the bottleneck
    - resort to pre-computes for performance, increasing complexity

    function shipping
    - data locality
    - low latency
    - high performance
    - low network transfer
    - modern big-data compute systems: Hadoop MR, Spark, Storm
  16. recursive reduction aggregations in distributed systems

  17. recursive reduction select sum(col) from table;

  19. recursive reduction select sum(col) from table; <<result>>

  20. recursive reduction
    - sum / multiplication / custom aggregation
    - (sorted) top-K elements
    - operations on a graph, e.g. link reach on twitter graph
    - any operation that is both associative and commutative
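
A minimal sketch of recursive reduction, assuming each node has already produced a partial result. The helper names (`recursive_reduce`, `merge_top_k`) and the `fan_in` parameter are made up for illustration. Partial results are combined level by level, which is only safe because the combine step is associative and commutative.

```python
import heapq
from functools import reduce


def recursive_reduce(partials, combine, fan_in=2):
    """Combine partial results in groups of fan_in until one value remains."""
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        # Each group stands for one intermediate node merging its children.
        level = [
            reduce(combine, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def merge_top_k(k):
    """Return a combine function that keeps the k largest elements."""
    def combine(a, b):
        return heapq.nlargest(k, a + b)
    return combine


# Sum: per-node partial sums reduced to a single total.
partial_sums = [6, 9, 30, 5, 10]
total = recursive_reduce(partial_sums, lambda a, b: a + b)

# Sorted top-K: per-node top lists reduced to a global top 3.
partial_tops = [[9, 4, 1], [8, 7], [10, 2]]
top3 = recursive_reduce(partial_tops, merge_top_k(3))

print(total)
print(top3)
```

Because the order of merging does not change the answer, intermediate nodes can reduce whatever subset of partials reaches them first.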
  21. real world examples
    - modelled after Bigtable
    - built for key based lookup
    - later added CoProc
      - limited capability
      - external dependencies like Apache Phoenix
  22. real world aggregations
    - Hive / Hadoop MR / Spark / Storm
    - provide a generic computing framework with UDF support
    - datastores provide Hadoop integrations
    - optimized for batch processing but hardly for serving online content
    - lot of operational overhead
    - and still no data locality :(
  23. suuchi toolkit for building distributed function shipping applications github.com/ashwanthkumar/suuchi

  24. blocks

  25. components - transport uses http/2 with streaming

  26. components - membership
    - static vs dynamic
    - fault tolerance in case of node/process failure
    - scaling up/down needs downtime of the system
  27. components - dynamic membership: Raft, SWIM; consistency vs availability

  28. components - request routing
    - "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web"
    - nodes: node 2, node 1, node 3, node 4
  32. components - request routing
    - "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web"
    - nodes: node 2, node 3, node 4
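
The request routing slides show a node leaving a consistent hash ring. Below is a minimal sketch of such a ring, not Suuchi's implementation: the `HashRing` class, the use of SHA-256 and the `vnodes` (virtual nodes) parameter are all assumptions made for the example. It checks the property the slides illustrate: when node 1 leaves, only keys that node 1 owned move to another node.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring with virtual nodes (hypothetical API)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node 1", "node 2", "node 3", "node 4"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove("node 1")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

With plain modulo hashing (`hash(key) % node_count`), removing a node would instead remap most keys, which is why the ring is the usual choice for routing in this kind of system.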
  33. components - replication Res Req

  34. components - store
    - store agnostic
    - ops: Get, Put / Remove
    - WIP: Scan, Batch
  35. components - store
    - embedded, fast, persistent KV store optimized for SSDs
    - suuchi-rocksdb
    - VersionedStore
    - ShardedStore
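
To make "store agnostic" concrete, here is a minimal sketch of a key-value interface with the Get / Put / Remove operations from the slides. It is not Suuchi's API: `Store`, `InMemoryStore` and `HashShardedStore` are hypothetical names, and the hash-based sharding wrapper is just one plausible way to layer behaviour over any backing store.

```python
import zlib
from abc import ABC, abstractmethod


class Store(ABC):
    """Minimal store-agnostic interface (hypothetical)."""

    @abstractmethod
    def get(self, key: bytes):
        """Return the value for key, or None when absent."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Insert or overwrite the value for key."""

    @abstractmethod
    def remove(self, key: bytes) -> None:
        """Delete key if present."""


class InMemoryStore(Store):
    """Stand-in for an embedded store; a dict keeps the sketch runnable."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def remove(self, key):
        self._data.pop(key, None)


class HashShardedStore(Store):
    """Routes each key to one of several underlying stores by CRC32 hash."""

    def __init__(self, shards):
        if not shards:
            raise ValueError("need at least one shard")
        self._shards = list(shards)

    def _shard_for(self, key):
        return self._shards[zlib.crc32(key) % len(self._shards)]

    def get(self, key):
        return self._shard_for(key).get(key)

    def put(self, key, value):
        self._shard_for(key).put(key, value)

    def remove(self, key):
        self._shard_for(key).remove(key)


store = HashShardedStore([InMemoryStore() for _ in range(4)])
store.put(b"page:1", b"<html>...</html>")
print(store.get(b"page:1"))
store.remove(b"page:1")
print(store.get(b"page:1"))
```

Because the wrapper only depends on the `Store` interface, the in-memory shards could be swapped for any other backend without touching the calling code.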
  36. how?
    - define a gRPC service using proto2 (or proto3)
    - generate the stubs in java / scala
    - implement the services
    - connect them together using Suuchi Server
  37. see some code see notes section for links

  38. suuchi @indix
    - used to build Finder
    - internal HTML archive store
    - handles 1000+ rps
    - write heavy system
    - stores 120TB of data across 10 nodes
  39. questions? references available at github.com/ashwanthkumar/suuchi-talk