suuchi - distributed data systems toolkit

Talk given at the Chennai Docker / Go meetup
References are available at https://github.com/ashwanthkumar/suuchi-talk

Ashwanth Kumar

November 19, 2016

Transcript

  1. data shipping
     - traditional 3-tier applications
     - state is maintained outside the app
     - the DBs usually become the bottleneck
     - resort to pre-computes for performance, which increases complexity
     function shipping
     - data locality
     - low latency
     - high performance
     - low network transfer
     - modern big-data compute systems: Hadoop MR, Spark, Storm
  2. recursive reduction
     - sum / multiplication / custom aggregation
     - (sorted) top-K elements
     - operations on a graph, e.g. link reach on the Twitter graph
     - any operation that is both associative and commutative
  3. real-world examples
     - modelled after Bigtable
     - built for key-based lookup
     - later added CoProc
       - limited capability
       - external dependencies like Apache Phoenix
  4. real-world aggregations
     - Hive / Hadoop MR / Spark / Storm
     - provide a generic computing framework with UDF support
     - datastores provide Hadoop integrations
     - optimized for batch processing, but hardly for serving online content
     - a lot of operational overhead
     - and still no data locality :(
  5. components - membership
     - static: scaling up/down needs downtime of the system
     - dynamic: fault tolerance in case of node/process failure
  6. components - request routing
     "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web" (Karger et al., 1997)
     [diagram: node 1, node 2, node 3, node 4]
  10. components - request routing
      [diagram: node 2, node 3, node 4 (node 1 removed)]
  11. components - store
      - embedded, fast, persistent KV store optimized for SSDs
      - suuchi-rocksdb
      - VersionedStore
      - ShardedStore
  12. how?
      - define a gRPC service using proto2 (or proto3)
      - generate the stubs in Java / Scala
      - implement the services
      - connect them together using Suuchi's Server
  13. suuchi @indix
      - used to build Finder, an internal HTML archive store
      - handles 1000+ rps
      - write-heavy system
      - stores 120 TB of data across 10 nodes
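Slide 1 contrasts data shipping with function shipping. The Python sketch below is a minimal illustration under assumed conditions, not code from Suuchi: the partition layout and the helpers `data_shipping` and `function_shipping` are hypothetical, and "network cost" is simply counted as the number of values sent back to the caller.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: node name -> the (key, value) records stored on that node.
Partition = List[Tuple[str, int]]
Reducer = Callable[[List[int]], int]


def data_shipping(partitions: Dict[str, Partition], fn: Reducer) -> Tuple[int, int]:
    """Pull every value to the caller, then apply fn there.

    Returns (result, number of values sent over the network)."""
    shipped: List[int] = []
    for records in partitions.values():
        shipped.extend(value for _, value in records)  # every value travels
    return fn(shipped), len(shipped)


def function_shipping(partitions: Dict[str, Partition], fn: Reducer) -> Tuple[int, int]:
    """Apply fn next to each partition and send back only the partial results.

    Re-applying fn to the partials is valid only when fn is an associative,
    commutative reduction such as sum or max."""
    partials = [fn([value for _, value in records]) for records in partitions.values()]
    return fn(partials), len(partials)  # one value per node travels


partitions = {
    "node-1": [("a", 3), ("b", 4)],
    "node-2": [("c", 5)],
    "node-3": [("d", 1), ("e", 2), ("f", 6)],
}
print(data_shipping(partitions, sum))      # (21, 6)
print(function_shipping(partitions, sum))  # (21, 3)
```

Both strategies produce the same sum, but data shipping moves all six values while function shipping moves one partial result per node. Combining partials with the same function only works for reductions like those listed on slide 2.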
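Slide 2 lists reductions that can be computed per shard and then merged in any order, because the merge step is associative and commutative. This is a minimal sketch, assuming each shard has already produced a partial result; `reduce_partials` and `merge_top_k` are hypothetical helper names, not part of Suuchi.

```python
import heapq
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def reduce_partials(partials: Iterable[T], merge: Callable[[T, T], T], initial: T) -> T:
    """Fold per-shard partial results together.

    If merge is associative and commutative, shards can be combined in any
    order (or as a tree), so partials can be computed where the data lives."""
    return reduce(merge, partials, initial)


def merge_sum(a: int, b: int) -> int:
    return a + b


def merge_top_k(k: int) -> Callable[[List[int], List[int]], List[int]]:
    """Build a merge that keeps only the k largest values, sorted descending."""
    def merge(a: List[int], b: List[int]) -> List[int]:
        return heapq.nlargest(k, a + b)
    return merge


shards = [[5, 1, 9], [7, 3], [8, 2, 6]]
top3 = merge_top_k(3)

sums = [sum(s) for s in shards]       # computed on each shard
tops = [top3(s, []) for s in shards]  # local top-3 on each shard

print(reduce_partials(sums, merge_sum, 0))        # 41
print(reduce_partials(reversed(tops), top3, []))  # [9, 8, 7]
```

Reversing the shard order does not change the answer, which is exactly the property that lets each node pre-aggregate its own data.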
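Slide 5 contrasts static and dynamic membership. One common way to get dynamic membership is a heartbeat-based failure detector; this sketch assumes that approach (the slides do not specify one), takes an explicit `now` timestamp instead of reading a real clock so it stays deterministic, and uses hypothetical class names.

```python
from typing import Dict, List


class StaticMembership:
    """Fixed node list; adding or removing a node means restarting the cluster."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = tuple(nodes)  # immutable once the cluster starts

    def alive(self, now: float) -> List[str]:
        return list(self._nodes)  # now is ignored: failures are invisible here


class HeartbeatMembership:
    """Nodes join by sending heartbeats and drop out after `timeout` seconds of silence."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self._last_seen[node] = now  # the first heartbeat doubles as a join

    def leave(self, node: str) -> None:
        self._last_seen.pop(node, None)  # graceful scale-down

    def alive(self, now: float) -> List[str]:
        return sorted(n for n, t in self._last_seen.items() if now - t <= self.timeout)


m = HeartbeatMembership(timeout=5.0)
for node in ("node-1", "node-2", "node-3"):
    m.heartbeat(node, now=0.0)
m.heartbeat("node-1", now=6.0)
m.heartbeat("node-3", now=6.0)
print(m.alive(now=7.0))  # ['node-1', 'node-3'] (node-2 went silent)
```

The failed node disappears from the live set without any restart, and a new node can join simply by starting to send heartbeats.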
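Slides 6-10 describe request routing with consistent hashing, the technique from the paper cited on slide 6, and then show node 1 dropping out. The sketch below is a generic consistent-hash ring with virtual nodes, not Suuchi's router; using MD5 for placement and 100 virtual nodes per server are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def stable_hash(value: str) -> int:
    """64-bit hash that is identical in every process (Python's hash() of str is not)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # points per node on the ring; more points smooth the load
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def owner(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash, wrapping around past the end.
        idx = bisect.bisect_left(self._ring, (stable_hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["node-1", "node-2", "node-3", "node-4"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.remove("node-1")  # e.g. node 1 fails or is decommissioned
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved was owned by node-1; all other keys kept their owner.
print(all(before[k] == "node-1" for k in moved))  # True
```

Removing a node reassigns only the keys it owned, roughly a quarter of them here, whereas hash-mod-N placement would reshuffle most keys when N changes.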
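Slide 11 names a RocksDB-backed store (suuchi-rocksdb) along with `VersionedStore` and `ShardedStore`. The slides do not show their APIs, so the classes below are hypothetical in-memory analogues that only illustrate the two ideas: keeping several versions per key, and splitting keys across several underlying stores by a stable hash (CRC32 here, an assumption). The names `InMemoryVersionedStore` and `HashShardedStore` are deliberately different so they are not mistaken for Suuchi's own classes.

```python
import zlib
from typing import Dict, List, Optional, Tuple


class InMemoryVersionedStore:
    """Hypothetical stand-in: keeps the newest `max_versions` values per key."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._data: Dict[bytes, List[Tuple[int, bytes]]] = {}

    def put(self, key: bytes, value: bytes, version: int) -> None:
        versions = self._data.setdefault(key, [])
        versions.append((version, value))
        versions.sort(key=lambda vv: vv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # drop the oldest beyond the limit

    def get(self, key: bytes) -> Optional[bytes]:
        versions = self._data.get(key)
        return versions[0][1] if versions else None  # latest version wins

    def versions(self, key: bytes) -> List[Tuple[int, bytes]]:
        return list(self._data.get(key, []))


class HashShardedStore:
    """Hypothetical stand-in: spreads keys over several underlying stores."""

    def __init__(self, shards: List[InMemoryVersionedStore]) -> None:
        self.shards = shards

    def _shard_for(self, key: bytes) -> InMemoryVersionedStore:
        # crc32 gives the same shard for a key in every run and process.
        return self.shards[zlib.crc32(key) % len(self.shards)]

    def put(self, key: bytes, value: bytes, version: int) -> None:
        self._shard_for(key).put(key, value, version)

    def get(self, key: bytes) -> Optional[bytes]:
        return self._shard_for(key).get(key)


store = InMemoryVersionedStore(max_versions=2)
store.put(b"page", b"v1", version=1)
store.put(b"page", b"v3", version=3)
store.put(b"page", b"v2", version=2)  # arrives out of order
print(store.get(b"page"))       # b'v3'
print(store.versions(b"page"))  # [(3, b'v3'), (2, b'v2')]

sharded = HashShardedStore([InMemoryVersionedStore() for _ in range(4)])
for i in range(100):
    sharded.put(f"url-{i}".encode(), b"<html/>", version=1)
```

Ordering by an explicit version rather than arrival order means a late write of an older version cannot overwrite a newer one.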
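Slide 12 outlines wiring gRPC services together with Suuchi. One way to combine a service with the routing from slides 6-10 is to let any node accept a request and either handle it locally or forward it to the node that owns the key. The sketch below simulates that in a single process; the `Node` class, the in-memory `cluster` dict standing in for gRPC calls, and hash-mod-N ownership (a simplification of the consistent hashing on the slides) are all assumptions, not Suuchi's API.

```python
import zlib
from typing import Dict, Optional


class Node:
    """Hypothetical cluster member exposing a tiny put/get service."""

    def __init__(self, name: str, cluster: Dict[str, "Node"]) -> None:
        self.name = name
        self.cluster = cluster  # stands in for the network: name -> peer
        self.store: Dict[str, str] = {}

    def owner(self, key: str) -> "Node":
        # Simplification: hash-mod-N over sorted member names, not a consistent-hash ring.
        names = sorted(self.cluster)
        return self.cluster[names[zlib.crc32(key.encode("utf-8")) % len(names)]]

    def put(self, key: str, value: str) -> None:
        target = self.owner(key)
        if target is self:
            self.store[key] = value  # this node owns the key: handle locally
        else:
            target.put(key, value)  # forward; a real system would make an RPC here

    def get(self, key: str) -> Optional[str]:
        target = self.owner(key)
        if target is self:
            return self.store.get(key)
        return target.get(key)  # forward to the owner


cluster: Dict[str, Node] = {}
for name in ("node-1", "node-2", "node-3"):
    cluster[name] = Node(name, cluster)

cluster["node-1"].put("user:42", "hello")  # any node can take the write
print(cluster["node-3"].get("user:42"))    # hello
```

Clients can talk to any node; the key is stored exactly once, on its owner, and reads from other nodes are forwarded there.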