Performance Metrics and Cassandra

Tyler Hobbs
November 29, 2012

This deck explains why Cassandra is a great fit for storing performance metrics, briefly describes a suggested schema, and gives advice on aggregating metrics.

This talk was presented for the Big Data Benchmarking Community (BDBC) on November 29th, 2012: http://clds.ucsd.edu/bdbc/community

Transcript

  1. RDBMS Issues (©2012 DataStax)

    • Random writes, random reads
    • Write amplification on SSDs
    • Lock contention
    • Availability
    • Difficulty of horizontal scaling
  2. OpenTSDB

    • Same data model as Cassandra
    • Master/Slave vs Fully Distributed
  3. About Apache Cassandra

    • Open Source
    • Fully Distributed
    • Non-Relational
    • Log-structured Merge-Tree
  4. Cassandra does Time-Series Data Very Well

    • Sequential writes
    • Mostly sequential reads
    • Supports high parallelism
    • Partitions data automatically, no distributed joins
    • Block-based compression
  5. Cassandra does Time-Series Data Very Well

    • ~30k writes/sec per node
    • Linear scalability (see Netflix's 1,000,000 writes/sec benchmark)
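The two figures on this slide can be turned into a rough capacity estimate. The sketch below is only back-of-the-envelope arithmetic: the function name `nodes_needed` and the `replication_factor` parameter are hypothetical, and it assumes the ~30k writes/sec figure applies to every replica write and that scaling is perfectly linear. Real sizing depends on hardware, data model, and consistency settings.

```python
import math


def nodes_needed(target_writes_per_sec, per_node_writes_per_sec=30_000,
                 replication_factor=1):
    """Estimate cluster size, assuming perfectly linear write scaling.

    Assumption: each client write costs `replication_factor` node-level
    writes, and every node sustains `per_node_writes_per_sec` of them.
    """
    replica_writes = target_writes_per_sec * replication_factor
    return math.ceil(replica_writes / per_node_writes_per_sec)


if __name__ == "__main__":
    print(nodes_needed(1_000_000))                        # no replication
    print(nodes_needed(1_000_000, replication_factor=3))  # three replicas
```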
  6. Cassandra Schema

    CREATE TABLE metrics (
      metric_id text,
      time timestamp,
      value float,
      PRIMARY KEY (metric_id, time)
    ) WITH CLUSTERING ORDER BY (time DESC);
  7. Cassandra Schema

    CREATE TABLE metrics (
      metric_id text,
      time timestamp,
      value float,
      PRIMARY KEY (metric_id, time)
    ) WITH CLUSTERING ORDER BY (time DESC);

    • Partition Key: metric_id
    • Clustering Key: time
  8. Cassandra Schema

    SELECT time, value FROM metrics
    WHERE metric_id = 'node12-load' AND time > '2012-11-28';
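To make the roles of the partition key and clustering key concrete, here is a toy in-memory model of the table and the range query above. It is an analogy only, not how Cassandra is implemented: the class `MetricsTable` and its methods are hypothetical names. Each `metric_id` maps to its own partition, rows inside a partition are kept sorted by `time`, and a query touches exactly one partition and slices a contiguous range from it.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class MetricsTable:
    """Toy model: one partition per metric_id, rows sorted by time."""

    def __init__(self):
        # metric_id -> (sorted list of times, {time: value})
        self._partitions = defaultdict(lambda: ([], {}))

    def insert(self, metric_id, time, value):
        times, values = self._partitions[metric_id]
        if time not in values:
            bisect.insort(times, time)  # keep clustering order
        values[time] = value            # same key overwrites, like an upsert

    def select_since(self, metric_id, since):
        """Rows with time > since, newest first (mirrors time DESC)."""
        if metric_id not in self._partitions:
            return []
        times, values = self._partitions[metric_id]
        start = bisect.bisect_right(times, since)  # first time > since
        return [(t, values[t]) for t in reversed(times[start:])]


if __name__ == "__main__":
    table = MetricsTable()
    table.insert("node12-load", datetime(2012, 11, 27, 12), 0.5)
    table.insert("node12-load", datetime(2012, 11, 28, 6), 0.7)
    table.insert("node12-load", datetime(2012, 11, 29, 6), 0.9)
    for row in table.select_since("node12-load", datetime(2012, 11, 28)):
        print(row)
```

Because the matching rows sit next to each other in one sorted partition, the read is a single contiguous slice, which is the intuition behind "mostly sequential reads" for time-series data.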
  9. Write-Time Aggregation

    • Entirely Optional
    • Good for reducing total volume of data stored
  10. Write-Time Aggregation

    • Primarily useful for rolling up a single metric at different time granularities
    • 1 min avg, 3 hour max, etc
    • Use a strategy similar to RRDTool in memory
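A minimal sketch of write-time rollups for a single metric might look like the following. It covers only the consolidation step (collapsing raw points into one value per fixed window), not RRDTool's round-robin archives, and it assumes points arrive in time order with integer Unix timestamps. The class name `WindowRollup` is hypothetical; the aggregate it returns is what would be written to the rollup table.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple


class WindowRollup:
    """Accumulate points in memory and emit one aggregate per window."""

    def __init__(self, window_seconds: int,
                 reduce: Callable[[List[float]], float]):
        self.window_seconds = window_seconds
        self.reduce = reduce
        self._window_start: Optional[int] = None
        self._values: List[float] = []

    def add(self, timestamp: int, value: float) -> Optional[Tuple[int, float]]:
        """Feed one point; return (window_start, aggregate) when a window closes."""
        start = timestamp - timestamp % self.window_seconds
        finished = None
        if self._window_start is not None and start != self._window_start:
            # The point belongs to a new window: flush the previous one.
            finished = (self._window_start, self.reduce(self._values))
            self._values = []
        self._window_start = start
        self._values.append(value)
        return finished


if __name__ == "__main__":
    one_min_avg = WindowRollup(60, mean)
    three_hour_max = WindowRollup(3 * 3600, max)
    for ts, v in [(0, 1.0), (30, 3.0), (60, 5.0)]:
        for rollup in (one_min_avg, three_hour_max):
            done = rollup.add(ts, v)
            if done is not None:
                print(rollup.window_seconds, done)
```

Running one instance per granularity keeps each rollup independent, so adding another granularity does not change the existing ones.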
  11. Read-Time Aggregation

    • Complex analysis of data
    • Read individual metrics, combine client-side
    • Potentially store the results
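As an illustration of combining metrics client-side, the sketch below merges several series that have already been read (each as a list of `(time, value)` pairs, one per `metric_id`) into a single series. The function name `combine_series` is hypothetical, and it assumes the series share aligned timestamps; real data would usually need bucketing first.

```python
from collections import defaultdict


def combine_series(series_list, combine=sum):
    """Combine several [(time, value)] series into one, grouped by time."""
    grouped = defaultdict(list)
    for series in series_list:
        for time, value in series:
            grouped[time].append(value)
    # Apply the combining function to the values seen at each timestamp.
    return [(time, combine(values)) for time, values in sorted(grouped.items())]


if __name__ == "__main__":
    node12 = [(1, 0.5), (2, 1.0)]
    node13 = [(1, 1.5), (3, 2.0)]
    print(combine_series([node12, node13]))       # total load per timestamp
    print(combine_series([node12, node13], max))  # peak load per timestamp
```

The combined series could then be written back under its own `metric_id`, which is one way to "store the results".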
  12. Benchmarking Cassandra

    • Simple API, few hidden costs
    • Primarily benchmarking reads
    • Writes always have the same cost
    • Reads depend on data model quality, caching, and disk seek times
  13. Benchmarking Cassandra

    • Primary benchmarking mistake? Not enough client-side threads/processes.
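The effect of too little client-side concurrency can be demonstrated without a cluster. In the sketch below, `fake_write` is a hypothetical stand-in for a blocking driver call: it only sleeps to mimic a network round trip, so the numbers say nothing about Cassandra itself. What it shows is that a single-threaded client measures its own latency rather than the server's capacity.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_write(i):
    """Hypothetical stand-in for a blocking write; sleeps ~5 ms per call."""
    time.sleep(0.005)
    return 1


def run_benchmark(num_ops, num_threads, op=fake_write):
    """Run num_ops operations on num_threads threads; return (completed, ops/sec)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        completed = sum(pool.map(op, range(num_ops)))
    elapsed = time.perf_counter() - start
    return completed, completed / elapsed


if __name__ == "__main__":
    for threads in (1, 8, 32):
        done, rate = run_benchmark(200, threads)
        print(f"{threads:>2} threads: {done} ops, {rate:.0f} ops/sec")
```

In Python specifically, threads help while the client is waiting on I/O; if the client itself becomes CPU-bound, multiple processes are the usual next step, which matches the slide's "threads/processes".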