Performance Metrics and Cassandra

©2012 DataStax

Why store metrics in Cassandra?
• Problems with alternatives
• Particular advantages of Cassandra

RDBMS Issues
• Random writes, random reads
• Write amplification on SSDs
• Lock contention
• Availability
• Difficulty of horizontal scaling

RRDTool
• Same problems as RDBMS, lower throughput
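RRDTool stores each metric in fixed-size round-robin archives. Every time step maps to a slot, and the slots are reused in a circle, so each update is an in-place write at a computed offset. Across many metrics, that produces the scattered-write pattern listed for relational databases above. The sketch below models that layout in memory. The `RoundRobinArchive` class and its parameters are hypothetical illustrations, not RRDTool's file format or API.

```python
from __future__ import annotations


class RoundRobinArchive:
    """Hypothetical fixed-size circular buffer in the style of an RRD archive."""

    def __init__(self, step_seconds: int, slots: int):
        self.step = step_seconds
        self.slots = slots
        # Each slot holds (bucket_start, value), or None if never written.
        self.data: list[tuple[int, float] | None] = [None] * slots

    def _slot(self, bucket: int) -> int:
        # Fixed offset for a time bucket; wraps around after `slots` steps.
        return (bucket // self.step) % self.slots

    def update(self, ts: int, value: float) -> None:
        bucket = ts - ts % self.step
        # In-place overwrite: an older bucket that wrapped around is lost.
        self.data[self._slot(bucket)] = (bucket, value)

    def fetch(self, ts: int) -> float | None:
        bucket = ts - ts % self.step
        entry = self.data[self._slot(bucket)]
        # An entry left over from an earlier lap of the ring does not count.
        if entry is None or entry[0] != bucket:
            return None
        return entry[1]
```

With 60-second steps and 3 slots, a write at t=185 lands in the same slot as t=0 and replaces it. Storage is bounded, but old data disappears silently.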

OpenTSDB
• Same data model as Cassandra
• Master/slave architecture (it runs on HBase) vs. Cassandra's fully distributed design

About Apache Cassandra
• Open source
• Fully distributed
• Non-relational
• Log-structured merge-tree
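A log-structured merge-tree buffers writes in memory and periodically flushes them as immutable, sorted runs. Writes therefore never modify existing data in place. Reads check the memtable first and then the runs from newest to oldest, and compaction merges runs to keep reads cheap. The sketch below is heavily simplified: the `TinyLSM` class is hypothetical, and it leaves out the commit log, bloom filters, tombstones, and everything else a real storage engine such as Cassandra's needs.

```python
from __future__ import annotations

import bisect


class TinyLSM:
    """Hypothetical, heavily simplified log-structured merge-tree."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []  # oldest run first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; no existing run is modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Emit the memtable as one sorted, immutable run (a sequential write).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> str | None:
        if key in self.memtable:
            return self.memtable[key]
        # Newer runs shadow older ones, so search newest to oldest.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one, keeping the newest value for each key.
        merged: dict[str, str] = {}
        for run in self.runs:  # oldest first, so later runs overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```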

Cassandra does Time-Series Data Very Well
• Sequential writes
• Mostly sequential reads
• Supports high parallelism
• Partitions data automatically, no distributed joins
• Block-based compression
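Automatic partitioning works by hashing the partition key to a token and assigning it to the node that owns that range of the token ring. Every row of one metric therefore lands on the same replica set, and a single-metric query never needs a cross-node join. The sketch below is a toy version that gives each hypothetical node one token. It uses MD5, which resembles Cassandra's older RandomPartitioner rather than the Murmur3Partitioner default, and it ignores virtual nodes and replication.

```python
from __future__ import annotations

import bisect
import hashlib


def token(partition_key: str) -> int:
    # Hash a key to a position on the ring (MD5 chosen only for simplicity).
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


class TokenRing:
    """Hypothetical one-token-per-node ring."""

    def __init__(self, nodes: list[str]):
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        # The first node whose token is >= the key's token owns the key,
        # wrapping around to the start of the ring past the last token.
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.ring[i % len(self.ring)][1]
```

Only the partition key (`metric_id`) feeds the hash. Adding more points in time to a metric grows that one partition instead of spreading it across nodes.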

Cassandra does Time-Series Data Very Well (cont.)
• ~30k writes/sec per node
• Linear scalability (see Netflix's 1,000,000 writes/sec benchmark)
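The two bullets above combine into a back-of-the-envelope sizing rule. Each client write is applied on every replica, so the cluster must absorb client writes times the replication factor. Dividing by the per-node rate gives a node count. The sketch below assumes perfectly linear scaling, treats the ~30k figure as replica writes per node, and leaves no headroom. The function name and defaults are illustrative.

```python
import math


def nodes_needed(client_writes_per_sec: int,
                 writes_per_node: int = 30_000,
                 replication_factor: int = 1) -> int:
    """Rough node count under an assumed linear-scaling model."""
    # Every client write becomes `replication_factor` replica writes.
    replica_writes = client_writes_per_sec * replication_factor
    return math.ceil(replica_writes / writes_per_node)
```

Under those assumptions, 1,000,000 writes/sec needs 34 nodes without replication and 100 nodes at replication factor 3.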

Cassandra Schema

CREATE TABLE metrics (
    metric_id text,
    time timestamp,
    value float,
    PRIMARY KEY (metric_id, time)
) WITH CLUSTERING ORDER BY (time DESC);

In this schema, metric_id is the partition key and time is the clustering key. All rows for one metric live in a single partition, sorted newest-first.
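The schema's layout can be modelled as a map from partition key to a sorted list of rows. The sketch below is a hypothetical in-memory stand-in (`MetricsTable` is not a driver API). It stores rows ascending internally and reads them back newest-first to mirror `CLUSTERING ORDER BY (time DESC)`. A second write with the same `(metric_id, time)` replaces the first, because the primary key identifies exactly one row.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from datetime import datetime


class MetricsTable:
    """Hypothetical in-memory model of the metrics table's layout."""

    def __init__(self):
        # partition key (metric_id) -> rows sorted by clustering key (time)
        self.partitions: dict[str, list[tuple[datetime, float]]] = defaultdict(list)

    def insert(self, metric_id: str, time: datetime, value: float) -> None:
        rows = self.partitions[metric_id]
        i = bisect.bisect_left(rows, (time,))
        if i < len(rows) and rows[i][0] == time:
            rows[i] = (time, value)  # same primary key: replace the row
        else:
            rows.insert(i, (time, value))  # keep the partition sorted

    def rows(self, metric_id: str) -> list[tuple[datetime, float]]:
        # Newest first, as CLUSTERING ORDER BY (time DESC) would return them.
        return list(reversed(self.partitions.get(metric_id, [])))
```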

Inserting a data point:

-- now() returns a timeuuid; toTimestamp() (Cassandra 2.2+) converts it for a timestamp column
INSERT INTO metrics (metric_id, time, value)
VALUES ('node12load', toTimestamp(now()), 1.2);
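A CQL INSERT is an upsert. The write is appended without first reading the existing row, and when several versions of the same cell meet, the one with the highest write timestamp wins. This is one reason writes have a predictable cost. The toy sketch below appends blindly and reconciles on read. `Cell`, `write`, and `read` are hypothetical names. The sketch ignores tombstones, TTLs, and Cassandra's tie-break rule for equal timestamps (here the first version seen wins).

```python
from __future__ import annotations

from dataclasses import dataclass

Key = tuple[str, int]  # (metric_id, time); hypothetical simplification


@dataclass(frozen=True)
class Cell:
    value: float
    write_time_us: int  # write timestamp in microseconds


def reconcile(a: Cell, b: Cell) -> Cell:
    # Last write wins: keep the version with the higher write timestamp.
    return a if a.write_time_us >= b.write_time_us else b


def write(log: list[tuple[Key, Cell]], key: Key, cell: Cell) -> None:
    # Blind append: no read-before-write, so every write costs the same.
    log.append((key, cell))


def read(log: list[tuple[Key, Cell]], key: Key) -> Cell | None:
    # Versions are merged only when the key is read.
    winner = None
    for k, cell in log:
        if k == key:
            winner = cell if winner is None else reconcile(winner, cell)
    return winner
```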

Querying a time range:

-- Use an ISO 8601 date; an all-digit string such as '20121128' is read as milliseconds since the epoch
SELECT time, value FROM metrics
WHERE metric_id = 'node12load' AND time > '2012-11-28';
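All rows for 'node12load' sit in one partition, sorted by time descending. A `time >` predicate therefore selects a contiguous run at the head of that partition, and the scan can stop at the first row that is too old. The sketch below assumes the partition is already available as a Python list in clustering order. `slice_after` is a hypothetical helper, not part of any driver.

```python
from __future__ import annotations

from datetime import datetime


def slice_after(partition_desc: list[tuple[datetime, float]],
                cutoff: datetime) -> list[tuple[datetime, float]]:
    """Rows with time > cutoff from a partition sorted newest-first."""
    result = []
    for time, value in partition_desc:
        if time <= cutoff:
            # Descending order: every remaining row is older still, so stop.
            break
        result.append((time, value))
    return result
```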

Write-Time Aggregation
• Primarily useful for rolling up a single metric at different time granularities
• 1 min avg, 3 hour max, etc.
• Use a strategy similar to RRDTool in memory
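One way to roll up in memory is to keep an open bucket per granularity. Each incoming point is added to its bucket, and a bucket is emitted once a point arrives past its boundary. The finished buckets are the rows that would then be written, for example under a separate metric_id per rollup. The sketch below assumes points arrive in time order. `Rollup` and `Bucket` are hypothetical names, and the write to Cassandra is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bucket:
    start: int
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)


class Rollup:
    """Hypothetical in-memory rollup of one metric at one granularity."""

    def __init__(self, width_seconds: int):
        self.width = width_seconds
        self.current: Bucket | None = None
        self.finished: list[Bucket] = []  # completed buckets, ready to write

    def add(self, ts: int, value: float) -> None:
        start = ts - ts % self.width
        if self.current is not None and start != self.current.start:
            # Boundary crossed (points assumed in order): bucket is complete.
            self.finished.append(self.current)
            self.current = None
        if self.current is None:
            self.current = Bucket(start)
        self.current.add(value)
```

Feeding the same stream to `Rollup(60)` and `Rollup(3 * 3600)` gives the 1-minute averages (`total / count`) and the 3-hour maximum from a single pass.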

Read-Time Aggregation
• Complex analysis of data
• Read individual metrics, combine client-side
• Potentially store the results
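Combining client-side means issuing one partition read per metric and merging the results in application code. The sketch below averages several series point-by-point. It assumes the series share aligned timestamps (real data may need bucketing first). `combine_mean` and the metric name "node13load" are hypothetical. The result could be written back under its own metric_id if it is worth storing.

```python
from __future__ import annotations

from collections import defaultdict


def combine_mean(series: dict[str, list[tuple[int, float]]]) -> list[tuple[int, float]]:
    """Average several metrics point-by-point on the client."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for points in series.values():
        for ts, value in points:
            sums[ts] += value
            counts[ts] += 1
    # Newest first, matching the table's clustering order.
    return [(ts, sums[ts] / counts[ts]) for ts in sorted(sums, reverse=True)]
```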

Benchmarking Cassandra
• Simple API, few hidden costs
• Primarily benchmarking reads
• Writes always have the same cost
• Reads depend on data model quality, caching, and disk seek times
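Read cost varies with the data model, the cache, and disk seeks, so a single average can hide slow outliers. A minimal read-benchmark harness might report percentiles instead. In the sketch below, `read` stands for any hypothetical zero-argument callable that issues one query (for example, a driver call). Nearest-rank percentiles are one common choice among several.

```python
from __future__ import annotations

import math
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def benchmark_reads(read: Callable[[], object], iterations: int = 1000) -> dict[str, float]:
    # Time each call individually so tail latency stays visible.
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        read()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }
```

Running the same harness against a cold cache and again after warm-up shows how much of the read cost comes from caching rather than from the data model.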


Tyler Hobbs


Metric Aggregation
• Write time
• Read time

• Entirely optional
• Good for reducing …

• Primary benchmarking mistake?
• Not enough …

Questions?
Tyler Hobbs