Distributed Data Store + CoProcessors (Bigtable / HBase)
- Business logic upgrades are painful
- CoProcessors are not services, more of an afterthought
- Failure semantics are not well established
- More applications means either multiple coprocessors or a single bloated coprocessor
- Noisy neighbours / interference due to a shared datastore
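In-house vs. Off-the-shelf

|              | In-house                    | Off-the-shelf        |
|--------------|-----------------------------|----------------------|
| Features     | Subset                      | Superset             |
| Moving parts | Fully controllable          | Community controlled |
| Ownership    | Implicit                    | Acquired / Cultural  |
| Upfront cost | High                        | Low                  |
| Expertise    | Hired / Retained / Nurtured | Community            |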
Suuchi provides support for:
- the underlying communication channel
- routing queries to the appropriate member
- detecting your cluster members
- replicating your data based on your strategy
- local state via an embedded KV store per node (optional)
github.com/ashwanthkumar/suuchi
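One common way to detect cluster members is a heartbeat table. Each node reports periodically, and a node that stays silent longer than a timeout is treated as gone. The sketch below illustrates only that idea. It is not how Suuchi detects members. The class `HeartbeatMembership`, its methods, and the `timeout_s` parameter are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMembership:
    """Hypothetical heartbeat-based membership table (illustration only).

    A member counts as alive if it sent a heartbeat within `timeout_s`
    seconds. Real clusters usually rely on gossip or a coordination
    service rather than a single in-memory table like this one.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the example can use a fake clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, member: str) -> None:
        # Record or refresh the time we last heard from this member.
        self.last_seen[member] = self.clock()

    def alive_members(self) -> List[str]:
        # Members whose last heartbeat falls within the timeout are alive.
        now = self.clock()
        return sorted(m for m, t in self.last_seen.items() if now - t <= self.timeout_s)

    def evict_dead(self) -> List[str]:
        # Remove members that missed the timeout and return their names.
        alive = set(self.alive_members())
        dead = sorted(m for m in self.last_seen if m not in alive)
        for m in dead:
            del self.last_seen[m]
        return dead


if __name__ == "__main__":
    now = [0.0]
    members = HeartbeatMembership(timeout_s=5.0, clock=lambda: now[0])
    members.heartbeat("node1")
    members.heartbeat("node2")
    now[0] = 3.0
    members.heartbeat("node1")   # node1 stays fresh, node2 goes quiet
    now[0] = 7.0
    print(members.alive_members())  # ['node1']
    print(members.evict_dead())     # ['node2']
```

The clock is passed in so the timeout logic can be checked deterministically. A production detector would also need to handle clock skew and flapping members.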
Sharding / Routing + Consistent Hash Ring
- Your own sharding technique?
[Figure: consistent hash ring with node 1, node 2, node 3 and node 4 placed around it]
Karger et al., "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web" (STOC 1997)
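The ring in the figure can be sketched as follows. Each node is hashed to several "virtual node" positions on a circle. A key belongs to the first position found by walking clockwise from the key's hash. This is a generic illustration of consistent hashing, not Suuchi's code. The class name, the `vnodes` default, and the use of MD5 are assumptions made for the example.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable 64-bit ring position taken from MD5 (used for spread, not security).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._positions: List[int] = []   # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Placing each node at many points evens out the key distribution.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # very unlikely collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Only this node's positions disappear; every other key keeps its owner.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._positions.pop(bisect.bisect_left(self._positions, pos))

    def node_for(self, key: str) -> str:
        # First position at or after the key's hash, wrapping past the end.
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node 1", "node 2", "node 3", "node 4"])
    print(ring.node_for("http://example.com/page"))
```

The property that matters for routing is that removing "node 3" moves only the keys that "node 3" owned. A modulo-based `hash(key) % n` scheme, by contrast, would reshuffle most keys whenever the node count changes.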
Replication
- Sync: a request succeeds only if all the replicas succeed
- Async: provides very high write availability at the cost of eventual consistency
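The trade-off between the two modes can be illustrated with in-memory replicas. In sync mode a write is acknowledged only if every replica applies it. In async mode the write is acknowledged once the first replica has it, and the remaining replicas catch up later. That gap is the window of eventual consistency. All names here (`Replica`, `sync_put`, `AsyncReplicator`, `drain`) are hypothetical and do not reflect Suuchi's API.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Replica:
    """In-memory replica that can be taken offline to simulate a failure."""

    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, str] = {}
        self.online = True

    def write(self, key: str, value: str) -> bool:
        if not self.online:
            return False
        self.data[key] = value
        return True


def sync_put(replicas: List[Replica], key: str, value: str) -> bool:
    # Sync: succeed only if every replica acknowledged the write.
    # Replicas that did apply a failed write would need repair or rollback
    # in a real system; this sketch leaves them as-is.
    acks = [r.write(key, value) for r in replicas]
    return all(acks)


class AsyncReplicator:
    """Async: acknowledge after the primary, ship to followers later."""

    def __init__(self, replicas: List[Replica]):
        self.primary, self.followers = replicas[0], replicas[1:]
        self.pending: Deque[Tuple[Replica, str, str]] = deque()

    def put(self, key: str, value: str) -> bool:
        if not self.primary.write(key, value):
            return False
        for follower in self.followers:
            self.pending.append((follower, key, value))  # replicate in background
        return True

    def drain(self) -> None:
        # One replication pass; writes to offline followers are queued again.
        for _ in range(len(self.pending)):
            follower, key, value = self.pending.popleft()
            if not follower.write(key, value):
                self.pending.append((follower, key, value))
```

With one replica down, `sync_put` returns `False`, while `AsyncReplicator.put` still succeeds and the lagging replica converges after a later `drain()`.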
Suuchi @ Indix
● HTML Archive
  ○ Handles 1000+ TPS (write-heavy system)
  ○ Stores 120 TB of URL & timestamp indexed HTML pages
● Stats Aggregation System
  ○ Approximate real-time aggregates
  ○ Timeline & windowed queries
● Real-time scheduler for our crawlers
  ○ Prioritising which batch of URLs to crawl next
  ○ Helps crawl 20+ million URLs per day
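A URL & timestamp index, like the one the HTML Archive describes, lets a caller fetch either the latest crawl of a page or the version that was current at a given time. The in-memory sketch below shows only that lookup pattern. It is not the Indix implementation. `HtmlArchive`, `get_as_of`, `latest`, and the integer timestamps are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional


class HtmlArchive:
    """Hypothetical stand-in for a URL & timestamp indexed page store."""

    def __init__(self) -> None:
        # url -> sorted crawl timestamps, with pages kept in the same order
        self._timestamps: Dict[str, List[int]] = defaultdict(list)
        self._pages: Dict[str, List[str]] = defaultdict(list)

    def put(self, url: str, timestamp: int, html: str) -> None:
        # Insert in timestamp order so lookups can binary-search.
        ts = self._timestamps[url]
        idx = bisect.bisect_right(ts, timestamp)
        ts.insert(idx, timestamp)
        self._pages[url].insert(idx, html)

    def get_as_of(self, url: str, timestamp: int) -> Optional[str]:
        # Latest version crawled at or before `timestamp`, if any.
        ts = self._timestamps.get(url)
        if not ts:
            return None
        idx = bisect.bisect_right(ts, timestamp) - 1
        return self._pages[url][idx] if idx >= 0 else None

    def latest(self, url: str) -> Optional[str]:
        pages = self._pages.get(url)
        return pages[-1] if pages else None
```

In a sharded deployment, routing by URL alone (an assumption here) would keep every version of a page on one node, so both lookups stay local to that node.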