Suuchi - FifthElephant, Bengaluru 2017

Slides were designed by my wife - Swathi Ravichandran
www.swathiravichandran.com | @swathrav

Sriram

July 27, 2017

Transcript

  1. Desirable Properties
    • Handle scale - on the order of TBs
    • Fault tolerant
    • Ease of operations - fewer moving parts
  2. Rise of KV Stores
    BigTable (2006), Dynamo (2007), Cassandra (2008), Voldemort (2009)
    distributed, replicated, fault-tolerant, sorted*
  3. Boils down to...
    Distributed Data Store + CoProcessors (Bigtable / HBase)
    …run arbitrary code “next” to each shard
  4. Distributed Data Store + CoProcessors (Bigtable / HBase)
    - Business logic upgrades are painful
    - CoProcessors are not services, more an afterthought
    - Failure semantics are not well established
    - More applications mean multiple coprocs or a single bloated coproc
    - Noisy neighbours / Impedance due to a shared datastore
  5. In-house vs Off-the-shelf
                     In-house                      Off-the-shelf
    Features         Subset                        Superset
    Moving parts     Fully controllable            Community controlled
    Ownership        Implicit                      Acquired / Cultural
    Upfront cost     High                          Low
    Expertise        Hired / Retained / Nurtured   Community
  6. [Diagram with example keys key=”foo”, key=”bar”, key=”baz”]
    Labels: Communication, Request Routing, Sync / Async Replication, Data Sharding, Cluster Membership
  7. Suuchi
    Provides support for ...
    - underlying communication channel
    - routing queries to the appropriate member
    - detecting your cluster members
    - replicating your data based on your strategy
    - local state via embedded KV store per node (optionally)
    github.com/ashwanthkumar/suuchi
  8. Sharding / Routing
    + Consistent Hash Ring
    - Your own sharding technique?
    [Diagram: hash ring with node 1, node 2, node 3, node 4]
    Reference: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
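
Note on slide 8: the consistent hash ring idea can be sketched in a few lines. This is a minimal illustration, not Suuchi's implementation; the class name `ConsistentHashRing`, the `vnodes` parameter and the choice of SHA-256 are all assumptions made for the example. Each node is placed at several positions on a ring, and a key belongs to the first node found clockwise from the key's own hash.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical ring: maps a key to the next node clockwise on the ring."""

    def __init__(self, nodes, vnodes=64):
        # vnodes = virtual positions per physical node (illustrative default).
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Any well-distributed hash works; SHA-256 is an arbitrary choice here.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def find(self, key, replicas=1):
        """Return up to `replicas` distinct nodes responsible for `key`."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._positions, self._hash(key))
        owners = []
        for offset in range(len(self._ring)):
            # Wrap around the end of the sorted list to close the ring.
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = ConsistentHashRing(["node 1", "node 2", "node 3", "node 4"])
for key in ("foo", "bar", "baz"):
    print(key, "->", ring.find(key)[0])
```

The property that makes this attractive for sharding: when a node leaves, only the keys that node owned move to a new owner; every other key keeps its placement.
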
  9. Replication
    sync: every request is successful only if all the replicas succeeded
    async: provides very high availability for write systems at the cost of eventual consistency
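
Note on slide 9: the difference between the two modes can be sketched as follows. This is a toy model under stated assumptions, not Suuchi's API: `Replica`, `put_sync` and `put_async` are hypothetical names, replicas are in-memory dictionaries, and a node failure is simulated with a flag.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Replica:
    """Hypothetical in-memory replica; healthy=False simulates a dead node."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}

    def put(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.store[key] = value


def put_sync(replicas, key, value):
    """Sync: the request succeeds only if every replica accepted the write."""
    try:
        for replica in replicas:
            replica.put(key, value)
    except ConnectionError:
        # Replicas written before the failure keep the value; a real system
        # would need rollback or repair, which this sketch leaves out.
        return False
    return True


def put_async(pool, replicas, key, value):
    """Async: acknowledge after the first replica, copy to the rest later."""
    primary, *followers = replicas
    primary.put(key, value)  # a failure here still fails the request
    pending = [pool.submit(f.put, key, value) for f in followers]
    return True, pending


replicas = [Replica("node 1"), Replica("node 2"), Replica("node 3", healthy=False)]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(put_sync(replicas, "foo", "v1"))  # False: node 3 is down
    ok, pending = put_async(pool, replicas, "bar", "v2")
    wait(pending)  # follower errors stay inside the futures
    print(ok)  # True: the write was acknowledged despite node 3
```

With one replica down, the sync write is rejected while the async write is accepted and node 3 simply misses the update until it is repaired, which is the availability versus consistency trade-off the slide describes.
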
  10. Getting started
    • gRPC Service using Protobuf2
    • Generate stubs & implement them
    • Connect using Suuchi “Server” abstraction
  11. Suuchi @ Indix
    • HTML Archive
      ◦ Handles 1000+ tps - write heavy system
      ◦ Stores 120 TB of url & timestamp indexed HTML pages
    • Stats Aggregation System
      ◦ Approximate real-time aggregates
      ◦ Timeline & windowed queries
    • Real time scheduler for our Crawlers
      ◦ Prioritising which next batch of urls to crawl
      ◦ Helps crawl 20+ million urls per day