Time-Series Metrics with Cassandra

Librato's Metrics platform relies on Cassandra as its sole data storage platform for time-series data. This session will discuss how we have scaled from a single six-node Cassandra ring two years ago to the multiple storage rings that handle all time-series measurement data today.

Mike Heffner

June 11, 2013

Transcript

  1. #CASSANDRA13 Time-Series Metrics with Cassandra Mike Heffner

  2. #CASSANDRA13 What we do.

  3. #CASSANDRA13 October 2011
    • Decision: All measurements in Cassandra
    • Single EC2 Ring: 6 * m1.large
    • Cassandra 0.8.x
    • How does this work?

  4. #CASSANDRA13 Today
    • Multiple sharded rings
    • EC2: m1.xlarge and m2.4xlarge
    • Cassandra 1.1.x
    • Read load: < 1%

  5. #CASSANDRA13 Talk Highlights
    • Adapting Schema to Storage
    • Optimally Expiring Data
    • Monitor Everything

  6. #CASSANDRA13 Adapting Schema to Storage

  7. #CASSANDRA13 What is a Measurement?
    (Metric ID, Source)
    (X, Y) => (Epoch Timestamp, Value)

  8. #CASSANDRA13 Measurement CF

  9. #CASSANDRA13 Locating Measurement Rows
    Maximum row size math:
    • 1 minute records
    • 1 week TTL
    • 7 days * 24 hours * 60 minutes => ~10k
    • 4 Longs * 8 bytes * 10k => ~320KB (not bad)

  10. #CASSANDRA13 We have a problem

  11. #CASSANDRA13 Examining CF SSTables
    nodetool cfhistograms Metrics metric_id_epochs_60
    Metrics/metric_id_epochs_60 histograms
    Offset  SSTables
    1       28821
    2       58859
    3       201198
    4       178326
    5       223016
    6       154952
    7       83289
    8       21552
    10      81104

  12. #CASSANDRA13 Storage Over Time

  13. #CASSANDRA13

  14. #CASSANDRA13

  15. #CASSANDRA13 Rotating the Rows
    Retrieve Time Bases for Times 31->45 for metric ID 12:
    mget(Rows: [12, EBase_30], [12, EBase_40], Columns: {31->45})

  16. #CASSANDRA13 Examining CF SSTables
    nodetool cfhistograms Metrics metric_id_epochs_60
    Before: Metrics/metric_id_epochs_60
    Offset  SSTables
    1       28821
    2       58859
    3       201198
    4       178326
    5       223016
    6       154952
    7       83289
    8       21552
    10      81104
    After: Metrics/metric_id_epochs_60
    Offset  SSTables
    1       3491820
    2       5389762
    3       4095760
    4       1310741
    5       9976

  17. #CASSANDRA13 /graph me

  18. #CASSANDRA13 Optimally Expiring Data

  19. #CASSANDRA13 TTL Expiration
    • Churn of about 750GB / day
    • 12 TB total
    • 6% of data set
    • gc_grace = 0
    • STC

  20. #CASSANDRA13 Synchronized Compactions

  21. #CASSANDRA13 nodetool compact

  22. #CASSANDRA13 * http://hight3ch.com/garbage-truck-crushing-a-car/

  23. #CASSANDRA13 nodetool cleanup

  24. #CASSANDRA13 Cleanup
    • Not just for topology changes
    • Tombstoned rows (not referenced)
    • Rotated row keys decrease references
    • Cons: Must process every sstable.

  25. #CASSANDRA13 Immutable SStables

  26. #CASSANDRA13 Leverage SStable Mod Time
    • If now - mtime > TTL => all data is expired
    • We can quickly eliminate entire sstables: find -mtime +<TTL> -name '*.db' | xargs rm
    • Fast and low overhead
    • Cons: Rolling restart
    26G 2013-05-17 09:44 Metrics-metrics_60-hf-7209-Data.db

  27. #CASSANDRA13 nodetool setcompactionthreshold

  28. #CASSANDRA13 Increasing minor compactions
    • By default, STC requires a minimum of 4 ssts
    • Leads to large non-compacted sstables
    • Dropping to 2 can flatten the storage growth
      nodetool setcompactionthreshold <ks> <cf> 2
    • Cons: CPU/IO increase

  29. #CASSANDRA13 Result

  30. #CASSANDRA13 New in 1.2

  31. #CASSANDRA13 1.2 • Off-heap memory • TTL Histograms

  32. #CASSANDRA13 Effective Monitoring

  33. #CASSANDRA13 Ring Dashboards

  34. #CASSANDRA13 Disk Errors => Throw Away
    • If you ever see this, replace!
      end_request: I/O error, dev xvdb, sector 467940617
      end_request: I/O error, dev xvdb, sector 467940617
    • Mark node down, bootstrap new
    • No metric for this?

  35. #CASSANDRA13 Cassandra Log Volume
    • Count log lines seen every 10 minutes
    • Track over time
    • Can identify:
      – Unbalanced workloads
      – Schema disagreements
      – Phantom gossip nodes
      – GC activity
    • grep -v '.java' => exceptions

  36. #CASSANDRA13 Q & A Mike Heffner /mheffner /mheffner We're Hiring!
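
The row-size estimate on slide 9 can be checked with a few lines of arithmetic. This is only a sketch of the slide's own numbers (1-minute records, 1-week TTL, 4 longs of 8 bytes per record); the function and constant names are made up for illustration, and the figure ignores any per-column storage overhead Cassandra adds.

```python
# Worked version of the slide 9 estimate. Names are illustrative only.
MINUTES_PER_WEEK = 7 * 24 * 60  # one record per minute, kept for one week


def max_row_bytes(records, longs_per_record=4, bytes_per_long=8):
    """Payload bytes for a row holding `records` measurements."""
    return records * longs_per_record * bytes_per_long


if __name__ == "__main__":
    print(MINUTES_PER_WEEK)                 # records in a full row
    print(max_row_bytes(MINUTES_PER_WEEK))  # payload bytes in a full row
```

The exact count is 10,080 records and 322,560 bytes, which the slide rounds to ~10k and ~320KB.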
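
The rotated-row lookup on slide 15 reads times 31->45 for metric ID 12 from the rows with time bases 30 and 40. A minimal sketch of how such row keys might be derived follows. It assumes a fixed base width of 10, inferred only from the slide's example; the width actually used in production is not stated, and `row_keys` is a hypothetical helper, not part of any Cassandra client.

```python
# Hypothetical helper: which (metric ID, time base) rows cover a time range?
def row_keys(metric_id, start, end, base_width):
    """Return the (metric_id, time_base) keys whose rows cover start..end."""
    if base_width <= 0:
        raise ValueError("base_width must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    first_base = start - start % base_width  # round start down to its base
    last_base = end - end % base_width       # round end down to its base
    return [(metric_id, base)
            for base in range(first_base, last_base + 1, base_width)]


if __name__ == "__main__":
    # Slide 15 example: times 31->45 for metric ID 12, assumed width 10.
    print(row_keys(12, 31, 45, 10))  # [(12, 30), (12, 40)]
```

A multi-row read would then fetch those keys and slice the columns to the requested time range, as the `mget` on the slide does.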
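
The modification-time check on slide 26 (if now - mtime > TTL, everything in the sstable has expired) can be sketched as a dry run that only lists candidate files. This is an illustration, not the tooling used in the talk: `expired_sstable_files` is a hypothetical name, and the sketch assumes every column in the column family is written with the same TTL. It deliberately deletes nothing.

```python
# Dry-run sketch: list *.db files whose mtime is older than the TTL.
import time
from pathlib import Path


def expired_sstable_files(data_dir, ttl_seconds, now=None):
    """Return sorted paths of *.db files in data_dir older than ttl_seconds."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(data_dir).glob("*.db")
        if path.is_file() and now - path.stat().st_mtime > ttl_seconds
    )


if __name__ == "__main__":
    one_week = 7 * 24 * 60 * 60
    for path in expired_sstable_files(".", one_week):
        print(path)
```

Listing first and removing in a separate step keeps the check reviewable. The slide's caveat still applies: removing files this way calls for a rolling restart.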
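
Slide 35 suggests counting Cassandra log lines in 10-minute windows and tracking the count over time. One possible way to bucket lines is sketched below. It assumes each log line carries a `YYYY-MM-DD HH:MM:SS` timestamp and skips lines without one (such as stack-trace continuations); the sample lines are invented for illustration and are not taken from the talk.

```python
# Sketch: count log lines per fixed window (10 minutes by default).
import re
from collections import Counter
from datetime import datetime, timedelta

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def count_lines_per_bucket(lines, bucket_seconds=600):
    """Map each window start (datetime) to the number of lines in it."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # no timestamp on this line, so it cannot be bucketed
        ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        midnight = ts.replace(hour=0, minute=0, second=0)
        # Round the time of day down to the start of its window.
        bucket = midnight + timedelta(seconds=seconds - seconds % bucket_seconds)
        counts[bucket] += 1
    return counts


if __name__ == "__main__":
    sample = [  # invented lines, for illustration only
        " INFO [FlushWriter:1] 2013-05-17 09:44:12,345 Memtable.java flushing",
        " INFO [GossipStage:1] 2013-05-17 09:49:59,001 Gossiper.java node up",
        " WARN [ScheduledTasks:1] 2013-05-17 09:50:00,120 GCInspector.java gc",
    ]
    for bucket, count in sorted(count_lines_per_bucket(sample).items()):
        print(bucket, count)
```

The per-window counts could then be submitted as a metric, so a sudden change in log volume shows up on the same dashboards as everything else.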