Lessons Learned in Building an analytics stack

Lessons Learned the Hard Way: Building an Analytics Stack
Nishant, Druid Committer, Software Engineer @ Metamarkets

AGENDA ➡ Demo ➡ Motivations ➡ Technical Challenges ➡ Successes
and Failures ➡ Lessons Learned in the Journey

IN CASE THE INTERNET DIDN’T WORK, PRETEND YOU SAW SOMETHING COOL!!

WE HAVE A LOT OF EVENT DATA!

Motivations • Interactive data warehouses • Answer BI questions •
How many unique male visitors visited my website last month? • How much revenue was generated last quarter, broken down by a demographic? • Not dumping an entire data set • Not querying for an individual event • Cost effective (we are a startup after all)

Technical Challenges • Ad-hoc queries • Arbitrarily slice ’n dice,
and drill into data • Immediate insights • Scalability • Availability • Low operational overhead

WHERE WE STAND TODAY • Over 10 trillion events •
~40 PB of raw data • Over 200 TB of compressed, queryable data • Ingesting over 300,000 events/sec on average • Average query time 500 ms • 90% of queries under 1 second • 99% of queries under 10 seconds

HOW DID WE GET HERE?

WHAT WE TRIED • RDBMS - Relational Database (MySQL, Postgres)

RDBMS - SETUP • Common setup for data warehousing •
Star Schema • Aggregate Tables • Query Caching

RDBMS - Results • Naive benchmark scan rate: ~5.5M rows
/ second / core • 1 day of summarized aggregates: 60M+ rows • 1 query over 1 week, 16 cores: ~5 seconds • Page load with 20 queries over a week of data: a long time
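The ~5 second figure is roughly what the scan rate predicts. A back-of-the-envelope check, assuming each query scans the full week of aggregates, scanning scales linearly across cores, and the 20 page queries run one after another (all three are assumptions, not stated on the slide):

```python
# Back-of-the-envelope check of the RDBMS numbers from the slide.
SCAN_RATE_PER_CORE = 5.5e6   # rows / second / core (naive benchmark)
ROWS_PER_DAY = 60e6          # summarized aggregate rows per day
CORES = 16
DAYS = 7
QUERIES_PER_PAGE = 20


def full_scan_seconds(days: int = DAYS, cores: int = CORES) -> float:
    """Time to scan `days` of aggregates, assuming linear scaling across cores."""
    return (ROWS_PER_DAY * days) / (SCAN_RATE_PER_CORE * cores)


one_query = full_scan_seconds()
page_load = one_query * QUERIES_PER_PAGE  # assumes the queries run sequentially

print(f"one query: {one_query:.1f} s, page of 20: {page_load:.0f} s")
# prints "one query: 4.8 s, page of 20: 95 s"
```

Under these assumptions a dashboard page takes over a minute and a half, which is far from interactive.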

• NOSQL - Key/Value Store (HBase, Cassandra)

NOSQL - SETUP • Pre-aggregate all dimensional combinations • Store
results in a NOSQL store

NoSQL - Results • Queries were fast • range scan
on primary key • Inflexible • if a combination is not pre-aggregated, it is not available • Not continuously updated • Dimensional combinations & processing => scales exponentially • Example: ~500k records • 11 dimensions: 4.5 hours on a 15-node Hadoop cluster • 14 dimensions: 9 hours on a 25-node Hadoop cluster
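The exponential growth comes from needing one pre-aggregated result per subset of dimensions. A minimal sketch that counts those subsets (the dimension names are hypothetical, and this counts only the group-by combinations, not the distinct values inside each one):

```python
from itertools import combinations


def dimension_subsets(dimensions):
    """Yield every subset of dimensions that would need its own rollup."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def count_rollups(n_dimensions: int) -> int:
    """Number of dimension subsets: each dimension is either in or out."""
    return 2 ** n_dimensions


dims = ["country", "gender", "device"]  # hypothetical dimension names
print(len(list(dimension_subsets(dims))))      # prints 8
print(count_rollups(11), count_rollups(14))    # prints 2048 16384
```

Going from 11 to 14 dimensions multiplies the number of rollups by 8, which is in line with the slide's jump in processing time and cluster size.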

• NOSQL - Key/Value Store (HBase, Cassandra) • ???????

WHAT WE LEARNED • Problem with RDBMS: scans are
slow • Problem with NoSQL: computationally intractable • Tackling the RDBMS issue seems easier

What is Druid? • Open-Source • Column-oriented • Distributed
• Fast • Real-time • Approximate & Exact • Highly Available • Scalable to Petabytes • Deploy Anywhere • Data store

Early Druid Architecture [diagram: Batch Ingestion, Historical Nodes, Broker Nodes, Queries]

Historical Nodes • Main workhorses of a Druid cluster •
Shared-nothing architecture • Load Immutable read-optimized Segments • Respond to queries

Broker Nodes • Query Scatter/Gather • Maintain a timeline view
of the cluster • Send requests to multiple historical nodes and merge results • Supports Caching of query results
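The scatter/gather step can be sketched as follows. This is a toy model, not Druid's broker code: the node names, the rows, and the function names are all hypothetical, and a real broker would consult its timeline to pick only the nodes holding relevant segments.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for historical nodes: each holds rows for some segments.
NODES = {
    "historical-1": [{"country": "US", "revenue": 10}, {"country": "IN", "revenue": 5}],
    "historical-2": [{"country": "US", "revenue": 7}],
    "historical-3": [{"country": "DE", "revenue": 3}, {"country": "IN", "revenue": 1}],
}


def query_node(node: str, dimension: str, metric: str) -> Counter:
    """Partial aggregate computed locally on one node."""
    partial = Counter()
    for row in NODES[node]:
        partial[row[dimension]] += row[metric]
    return partial


def broker_query(dimension: str, metric: str) -> dict:
    """Scatter the query to every node in parallel, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda n: query_node(n, dimension, metric), NODES))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


print(broker_query("country", "revenue"))
# prints {'US': 17, 'IN': 6, 'DE': 3}
```

Merging works here because a sum of partial sums equals the total sum; aggregates that do not merge this simply need a mergeable intermediate form.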

Current Druid Architecture [diagram: Batch Ingestion, Realtime Ingestion, Realtime
Nodes, Handover, Historical Nodes, Broker Nodes, Queries]

Real-Time Nodes • Log-structured merge-tree • Ingest data and buffer
events in a write-optimized data structure • Periodically persist collected events to disk (converting to a read-optimized format) • Query data as soon as it is ingested • Merge all intermediate segments and hand them over to historical nodes
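The ingest, persist, query, and handover cycle can be sketched with a toy in-memory model. The class and method names are hypothetical and the "disk" is just a list of immutable tuples; it illustrates the flow, not Druid's actual data structures.

```python
from collections import defaultdict


class RealtimeNode:
    """Toy model: buffer in memory, persist to immutable spills, merge on handover."""

    def __init__(self, max_buffered_rows: int = 2):
        self.max_buffered_rows = max_buffered_rows
        self.buffer = defaultdict(int)  # write-optimized: (timestamp, dim) -> count
        self.persisted = []             # read-optimized: immutable sorted tuples

    def ingest(self, timestamp: int, dim: str) -> None:
        self.buffer[(timestamp, dim)] += 1
        if len(self.buffer) >= self.max_buffered_rows:
            self.persist()

    def persist(self) -> None:
        """Freeze the buffer into a sorted, immutable spill and start a new buffer."""
        if self.buffer:
            self.persisted.append(tuple(sorted(self.buffer.items())))
            self.buffer = defaultdict(int)

    def count(self, dim: str) -> int:
        """Queries see persisted spills and not-yet-persisted events alike."""
        total = sum(c for (_, d), c in self.buffer.items() if d == dim)
        for spill in self.persisted:
            total += sum(c for (_, d), c in spill if d == dim)
        return total

    def hand_over(self) -> tuple:
        """Merge all spills into one segment for a historical node to load."""
        self.persist()
        merged = defaultdict(int)
        for spill in self.persisted:
            for key, c in spill:
                merged[key] += c
        self.persisted = []
        return tuple(sorted(merged.items()))


node = RealtimeNode()
for ts, dim in [(1, "US"), (1, "US"), (2, "IN"), (3, "US")]:
    node.ingest(ts, dim)
print(node.count("US"))   # prints 3: two persisted events plus one still buffered
segment = node.hand_over()
print(segment)
# prints ((((1, 'US'), 2), ((2, 'IN'), 1), ((3, 'US'), 1)))
```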

Coordinator Nodes • Distributes data across historical nodes • Asks
historical nodes to drop/load data • Manages replication

More Problems as We Grew… • Monitoring • Scaling •
Efficiency • Cost • Software Updates • Multi-tenancy

Monitoring Druid Cluster • Emitting and collecting metrics data for
query performance • Data without tools to analyze it is useless. • WAIT… We have Druid. Use Druid to monitor Druid!! • > 10 TB of metrics data in Druid • Interactive exploration of performance metrics lets us pinpoint problems quickly • Narrow problems down to an individual query and server • Provides both the big picture and the detailed breakdown

Scaling is Hard • Data doubles every 2 months •
More Data -> More Nodes -> More Failures & More Cost!! • Throwing money at the problem is only a short-term solution • Some piece always fails to scale • Being a startup means daily operations are handled by the dev team

Understanding Users and their Queries • Analyze customer query data
from metrics cluster • Percentage of data queried at any given time is small • Query load across the nodes is NOT uniform • Users mostly look at recent data (last 3 months) interactively through dashboards • Users run quarterly reports (non-interactive scripts) • Large queries create bottlenecks and resource contention • 20% of users take 80% of resources

Use Memory Mapped Files • All in-memory - fast and
simple • Keeping all data in memory is expensive • Percentage of data queried at any given time is small • Memory management is hard, let the OS handle paging • Flexible configuration - control how much to page • Cost vs Performance becomes a simple dial • Use SSDs to mitigate performance impact (still cheaper than RAM)
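A minimal illustration of the memory-mapping idea, using Python's standard `mmap` module rather than anything Druid-specific. The file layout (a flat column of little-endian int32 values) is an assumption made for the example.

```python
import mmap
import os
import struct
import tempfile

# Write a hypothetical column of 100,000 little-endian int32 values to disk.
N = 100_000
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{N}i", *range(N)))


def read_value(mapped: mmap.mmap, row: int) -> int:
    """Read one int32; the OS pages in only the part of the file that is touched."""
    return struct.unpack_from("<i", mapped, row * 4)[0]


with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    print(read_value(mapped, 0), read_value(mapped, N - 1))
    # prints "0 99999"
```

The application never decides what stays in RAM. The OS keeps hot pages cached and evicts cold ones, so the ratio of RAM to mapped data becomes the cost versus performance dial.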

Caching to Improve Performance • Caching of by-segment query
results on Broker/Historical • Distributed Memcached cluster for storing by-segment query results • Observations • Improved user experience • Reduced load on historical nodes • Cache hit rate up to 60%
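Per-segment caching pays off because segments are immutable, so a cached result for a (segment, query) pair stays valid. A minimal sketch, with a plain dict standing in for the Memcached cluster; the key scheme, function names, and segment ids are hypothetical:

```python
import hashlib
import json

cache = {}   # stand-in for a distributed Memcached cluster
scans = []   # records which segments actually reached a historical node


def scan_segment(segment_id: str, query: dict) -> int:
    """Hypothetical expensive scan on a historical node."""
    scans.append(segment_id)
    return len(segment_id)  # placeholder result


def cache_key(segment_id: str, query: dict) -> str:
    """Key on the segment plus a stable hash of the query."""
    payload = json.dumps(query, sort_keys=True)
    return f"{segment_id}:{hashlib.sha1(payload.encode()).hexdigest()}"


def run_query(segment_ids: list, query: dict) -> int:
    total = 0
    for segment_id in segment_ids:
        key = cache_key(segment_id, query)
        if key not in cache:
            cache[key] = scan_segment(segment_id, query)
        total += cache[key]
    return total


q = {"metric": "revenue", "filter": {"country": "US"}}
run_query(["2015-06-01", "2015-06-02"], q)
run_query(["2015-06-02", "2015-06-03"], q)  # only 2015-06-03 needs a scan
print(scans)
# prints ['2015-06-01', '2015-06-02', '2015-06-03']
```

Two overlapping dashboard queries share the cached result for the segment they have in common, which is why overlapping time ranges produce a high hit rate.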

Move to Fast Approximate Answers • Prefer fast approximate answers
over slow exact ones • HyperLogLog sketches for unique counts • Approximate top-k • Approximate histograms
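To make the HyperLogLog bullet concrete, here is a textbook-style sketch of the algorithm. It is not Druid's implementation; the register count, the SHA-1 hash, and the class name are choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "HyperLogLog") -> None:
        """Union of two sketches: element-wise max of the registers."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant, valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"visitor-{i}")
print(round(hll.count()))  # close to 100000; expected error is about 1.6% for p=12
```

The sketch uses a fixed 4096 registers regardless of how many visitors it sees, and two sketches merge with an element-wise max, which is what lets per-segment results be combined by a broker.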

Compression: Reduce data size further • Paging out data
that isn’t required for queries saves cost • Memory is still critical for performance • Cost of decompressing data present in RAM << cost of paging data from disk • On-the-fly decompression is fast with recent algorithms (LZF, Snappy, LZ4)
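A small experiment showing the trade-off. It uses `zlib` only because it ships with Python; zlib is a stand-in here and is generally slower than the LZF, Snappy, and LZ4 codecs named on the slide. The column contents are made up.

```python
import time
import zlib

# Hypothetical low-cardinality column, as dimension columns often are.
column = ("US,IN,DE,US,US,FR," * 200_000).encode()

compressed = zlib.compress(column, 1)  # level 1 favors speed over ratio

start = time.perf_counter()
restored = zlib.decompress(compressed)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(column)} -> {len(compressed)} bytes, decompressed in {elapsed_ms:.2f} ms")
```

The smaller the compressed column, the more of it fits in RAM, and decompressing it in memory is usually cheaper than a page fault that goes to disk.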

Smarter distribution of data • Constantly rebalance to keep workload
uniform • Greedily rebalance based on cost heuristics • Avoid co-locating recent or overlapping data • Favor co-locating data for different customers • Distribute data likely to be queried together
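A toy version of greedy, cost-based placement. The cost function below is invented for illustration (it is not Druid's balancer): segments of the same customer cost more to co-locate the closer they are in time, while segments of different customers are cheap to co-locate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    customer: str
    day: int  # day index of the segment's interval (hypothetical granularity)


def pair_cost(a: Segment, b: Segment) -> float:
    """Higher when two segments are likely to be hit by the same query."""
    if a.customer != b.customer:
        return 0.1                          # different customers: cheap to co-locate
    return 1.0 / (1 + abs(a.day - b.day))   # same customer: costlier when close in time


def place(segments, node_names):
    """Greedily put each segment on the node where it adds the least cost."""
    nodes = {name: [] for name in node_names}
    for seg in segments:
        best = min(
            nodes,
            key=lambda n: (sum(pair_cost(seg, s) for s in nodes[n]), len(nodes[n])),
        )
        nodes[best].append(seg)
    return nodes


segments = [Segment("acme", d) for d in range(4)] + [Segment("globex", d) for d in range(4)]
layout = place(segments, ["node-1", "node-2"])
for name, segs in layout.items():
    print(name, [(s.customer, s.day) for s in segs])
# node-1 [('acme', 0), ('acme', 2), ('globex', 0), ('globex', 2)]
# node-2 [('acme', 1), ('acme', 3), ('globex', 1), ('globex', 3)]
```

Adjacent days of one customer end up on different nodes, so a query over a contiguous time range is spread across the cluster instead of hammering one machine.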

Creation of Data Tiers • Not all data is equally
important • Users mostly look at recent data (last 3 months) interactively through dashboards • Users run quarterly reports (non-interactive scripts)

Creation of Data Tiers • COLD - high disk-to-CPU
and disk-to-RAM ratios for old data • HOT - low disk-to-CPU and disk-to-RAM ratios for recent data

Creation of Query Tiers • Long-running COLD queries
take up all the resources on the broker, affecting HOT queries

Creation of Query Tiers • Separate broker nodes for
long- and short-running queries • Prioritize shorter queries
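Routing between query tiers can be sketched as below. The pool names, the 90-day threshold, and the round-robin choice are all assumptions; the slide only says that long and short queries get separate brokers.

```python
from datetime import timedelta

# Hypothetical broker pools and threshold.
BROKER_POOLS = {
    "short": ["broker-hot-1", "broker-hot-2"],
    "long": ["broker-cold-1"],
}
LONG_QUERY_THRESHOLD = timedelta(days=90)


def choose_pool(query_interval: timedelta) -> str:
    """Classify a query by the length of the time range it covers."""
    return "long" if query_interval > LONG_QUERY_THRESHOLD else "short"


def route(query_id: int, query_interval: timedelta) -> str:
    """Pick a broker from the matching pool, round-robin by query id."""
    pool = BROKER_POOLS[choose_pool(query_interval)]
    return pool[query_id % len(pool)]


print(route(0, timedelta(days=7)))    # prints broker-hot-1
print(route(1, timedelta(days=7)))    # prints broker-hot-2
print(route(5, timedelta(days=365)))  # prints broker-cold-1
```

A year-long report can then saturate its own broker without delaying dashboard queries on the other pool.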

Cost-Effective Cross-Tier Replication • It’s OK to be SLOW
sometimes (during failures) • Replication can become expensive • Availability is IMPORTANT • Trade off performance for cost during failures • Move replica to COLD tier • Keep a single replica in HOT tier
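One way to picture the replica layout and the fallback during failures. The tier names, the 90-day window, and the choice to keep both copies of old data in COLD are assumptions made for this sketch; the slide only specifies one HOT replica plus a COLD replica for recent data.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=90)  # "3 months" per the slides; exact cutoff is assumed


def replica_plan(segment_end: date, today: date) -> dict:
    """Replicas per tier for a segment, keeping two copies in total."""
    if today - segment_end <= HOT_WINDOW:
        return {"hot": 1, "cold": 1}  # recent: one fast copy, one cheap backup
    return {"hot": 0, "cold": 2}      # old: both copies on the cheap tier (assumption)


def serving_tier(plan: dict, unavailable=frozenset()) -> str:
    """Prefer HOT; fall back to the slower COLD replica when HOT is down."""
    for tier in ("hot", "cold"):
        if plan.get(tier, 0) > 0 and tier not in unavailable:
            return tier
    raise RuntimeError("no replica available")


today = date(2015, 6, 30)
recent = replica_plan(date(2015, 6, 1), today)
print(recent, serving_tier(recent), serving_tier(recent, {"hot"}))
# prints {'hot': 1, 'cold': 1} hot cold
```

During a HOT-tier failure the data stays available, only slower, and the cluster pays for one expensive replica instead of two.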

Rolling Upgrades [diagram: cluster nodes at versions 1, 2, and 3]
• Data Redundancy - Segment Replication across nodes • Shared Nothing Architecture • Maintain backwards compatibility • Allow upgrading components independently • Easy to run experiments • NO Downtime

Multi-tenancy is hard • 20% of customers take 80% of
resources • Bounded Resources • Keep units of computation small • Constantly yield resources • Query Prioritization • Query Cancellation • Query Timeouts • Query Rate Limiting
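Small units of computation, yielding, prioritization, and timeouts fit together as in this toy scheduler. It is a single-threaded model with a logical clock, and every name in it is hypothetical; cancellation and rate limiting are left out.

```python
import heapq
import itertools


class Scheduler:
    """Toy scheduler: one segment scan per step, highest priority first."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # FIFO order among equal priorities
        self.clock = 0                 # logical time: one tick per segment scan

    def submit(self, name: str, segments: int, priority: int = 0, timeout: int = 10):
        deadline = self.clock + timeout
        heapq.heappush(self._heap, (-priority, next(self._tie), name, segments, deadline))

    def run(self) -> list:
        log = []
        while self._heap:
            neg_prio, tie, name, remaining, deadline = heapq.heappop(self._heap)
            if self.clock >= deadline:
                log.append(f"{name}: timed out")
                continue
            self.clock += 1            # scan exactly one segment...
            if remaining > 1:          # ...then yield back to the queue
                heapq.heappush(self._heap, (neg_prio, tie, name, remaining - 1, deadline))
            else:
                log.append(f"{name}: done at t={self.clock}")
        return log


s = Scheduler()
s.submit("quarterly-report", segments=6, priority=0, timeout=5)
s.submit("dashboard", segments=2, priority=10, timeout=5)
print(s.run())
# prints ['dashboard: done at t=2', 'quarterly-report: timed out']
```

Because work is done one segment at a time, the dashboard query overtakes the report submitted before it, and the report is cut off once it exceeds its time budget instead of holding resources indefinitely.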

Druid as a Data Platform [diagram, Druid at the center] • Approximate Algorithms (HyperLogLog, Histogram, …) • Data
Visualizations (Panoramix, Grafana, Pivot) • Machine Learning (SciPy, R, ScalaNLP) • Streaming Ingestion (Storm, Samza, Spark Streaming) • Batch Ingestion (Hadoop, Spark)

Takeaways • Pick the right tool • Pick the
tool optimized for the type of queries you will make • If none of the existing tools solve your problem, build it. • Understand your USERS • Analyze query patterns • Use cases should define the product • Tradeoffs are everywhere • Performance vs Cost (in-memory, tiering, compression) • Latency vs throughput (streaming vs batch ingestion) • Monitor everything

Acknowledgements • Xavier Leaute • Fangjin Yang • Gian Merlino
