Performance Metrics and Cassandra

Tyler Hobbs
November 29, 2012

This talk explains why Cassandra is a good fit for storing performance metrics. It briefly describes a suggested schema and gives advice on aggregating metrics.

This talk was presented for the Big Data Benchmarking Community (BDBC) on November 29th, 2012: http://clds.ucsd.edu/bdbc/community

Transcript

  1. RDBMS Issues (©2012 DataStax)

    • Random writes, random reads
    • Write amplification on SSDs
    • Lock contention
    • Availability
    • Difficulty of horizontal scaling
  2. OpenTSDB

    • Same data model as Cassandra
    • Master/Slave vs Fully Distributed
  3. About Apache Cassandra

    • Open Source
    • Fully Distributed
    • Non-Relational
    • Log-structured Merge-Tree
  4. Cassandra does Time-Series Data Very Well

    • Sequential writes
    • Mostly sequential reads
    • Supports high parallelism
    • Partitions data automatically, no distributed joins
    • Block-based compression
  5. Cassandra does Time-Series Data Very Well

    • ~30k writes/sec per node
    • Linear scalability
    • (see Netflix's 1,000,000 writes/sec benchmark)
  6. Cassandra Schema

    CREATE TABLE metrics (
        metric_id text,
        time timestamp,
        value float,
        PRIMARY KEY (metric_id, time)
    ) WITH CLUSTERING ORDER BY (time DESC);
  7. Cassandra Schema

    CREATE TABLE metrics (
        metric_id text,
        time timestamp,
        value float,
        PRIMARY KEY (metric_id, time)
    ) WITH CLUSTERING ORDER BY (time DESC);

    • Partition Key: metric_id
    • Clustering Key: time
  8. Cassandra Schema

    SELECT time, value FROM metrics
    WHERE metric_id = 'node12-load' AND time > '2012-11-28';
  9. Write-Time Aggregation

    • Entirely Optional
    • Good for reducing total volume of data stored
  10. Write-Time Aggregation

    • Primarily useful for rolling up a single metric at different time granularities
    • 1 min avg, 3 hour max, etc
    • Use a strategy similar to RRDTool in memory
  11. Read-Time Aggregation

    • Complex analysis of data
    • Read individual metrics, combine client-side
    • Potentially store the results
  12. Benchmarking Cassandra

    • Simple API, few hidden costs
    • Primarily benchmarking reads
    • Writes always have the same cost
    • Reads depend on data model quality, caching, and disk seek times
  13. Benchmarking Cassandra

    • Primary benchmarking mistake?
    • Not enough client-side threads/processes.
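
The Cassandra Schema slides label metric_id as the partition key and time as the clustering key, ordered newest first. As a rough illustration of what that layout means for the SELECT shown on the slides, here is a toy in-memory model in Python. It is not a Cassandra client; the class MetricsTable and its methods are hypothetical names invented for this sketch, and it only mimics the grouping and ordering, not storage or distribution.

```python
from collections import defaultdict
from datetime import datetime


class MetricsTable:
    """Toy in-memory stand-in for the `metrics` table (hypothetical, not a driver)."""

    def __init__(self):
        # partition key (metric_id) -> {clustering key (time): value}
        self._partitions = defaultdict(dict)

    def insert(self, metric_id, time, value):
        # Writing the same (metric_id, time) again overwrites the old value.
        self._partitions[metric_id][time] = value

    def select_since(self, metric_id, since):
        """Rows of one partition with time > since, newest first (time DESC)."""
        rows = self._partitions.get(metric_id, {})
        return sorted(((t, v) for t, v in rows.items() if t > since), reverse=True)


table = MetricsTable()
table.insert("node12-load", datetime(2012, 11, 27, 23, 0), 0.4)
table.insert("node12-load", datetime(2012, 11, 28, 9, 0), 0.7)
table.insert("node12-load", datetime(2012, 11, 28, 10, 0), 0.9)
table.insert("node13-load", datetime(2012, 11, 28, 10, 0), 0.2)

# Mirrors: WHERE metric_id = 'node12-load' AND time > '2012-11-28'
recent = table.select_since("node12-load", datetime(2012, 11, 28))
```

The query touches a single partition and returns a contiguous, already ordered slice of it, which is the access pattern the slides describe as mostly sequential reads.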
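
The Write-Time Aggregation slides suggest rolling up a single metric at several granularities (1 min avg, 3 hour max) using an in-memory strategy similar to RRDTool. One possible shape for that is sketched below in Python. The Rollup class and the average helper are hypothetical names, timestamps are assumed to be integer epoch seconds arriving in order, and the sketch borrows only the idea of consolidating samples per time bucket; it does not reproduce RRDTool's fixed-size archives.

```python
class Rollup:
    """Accumulates samples for one metric and emits one point per closed bucket."""

    def __init__(self, bucket_seconds, reducer):
        self.bucket_seconds = bucket_seconds
        self.reducer = reducer  # e.g. max, or the average function below
        self._bucket_start = None
        self._samples = []

    def add(self, timestamp, value):
        """Add a sample; return (bucket_start, aggregate) when a bucket closes, else None."""
        start = timestamp - (timestamp % self.bucket_seconds)
        finished = None
        if self._bucket_start is not None and start != self._bucket_start:
            # The sample belongs to a new bucket, so consolidate the previous one.
            finished = (self._bucket_start, self.reducer(self._samples))
            self._samples = []
        self._bucket_start = start
        self._samples.append(value)
        return finished


def average(values):
    return sum(values) / len(values)


one_minute_avg = Rollup(60, average)
three_hour_max = Rollup(3 * 60 * 60, max)

emitted = []
for ts, value in [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]:
    point = one_minute_avg.add(ts, value)
    if point is not None:
        emitted.append(point)  # in a real system this point would be written out
```

Each emitted point could be stored under its own metric_id per granularity, so the raw samples can be kept for a shorter time than the rollups.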
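
For the Read-Time Aggregation slide, "read individual metrics, combine client-side" might look like the following Python sketch. The combine function is a hypothetical helper, and the input is assumed to be rows already fetched per metric as (time, value) pairs with matching timestamps across metrics.

```python
from collections import defaultdict


def combine(series_by_metric, reducer=sum):
    """Combine several metrics' (time, value) rows into one series keyed by time."""
    values_at = defaultdict(list)
    for rows in series_by_metric.values():
        for time, value in rows:
            values_at[time].append(value)
    # One output point per timestamp, oldest first.
    return [(time, reducer(values)) for time, values in sorted(values_at.items())]


cluster_load = combine({
    "node12-load": [(0, 0.5), (60, 1.0)],
    "node13-load": [(0, 1.5), (60, 2.0)],
})
```

If the combined series is expensive to compute and read often, it could be written back as a metric of its own, which is the "potentially store the results" option on the slide.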
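
The last slide names too few client-side threads or processes as the primary benchmarking mistake. With a blocking client, each thread has at most one request in flight, so a single thread can never exceed one operation per round trip, and the benchmark ends up measuring the client rather than the cluster. The Python sketch below illustrates the effect with a stand-in operation; fake_write and run_benchmark are hypothetical names, the 5 ms latency is an arbitrary assumption, and no real driver is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_write(latency_seconds=0.005):
    """Hypothetical stand-in for one blocking write; it only sleeps."""
    time.sleep(latency_seconds)


def run_benchmark(num_threads, total_ops, op=fake_write):
    """Issue total_ops calls from num_threads threads; return ops per second."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(op) for _ in range(total_ops)]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    elapsed = time.perf_counter() - started
    return total_ops / elapsed


single_thread = run_benchmark(num_threads=1, total_ops=40)
ten_threads = run_benchmark(num_threads=10, total_ops=40)
```

In this sketch the ten-thread run reports much higher throughput even though the simulated server latency is unchanged. Against a real cluster, the usual approach is to keep adding client threads or processes until measured throughput stops rising.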