Time-Series Metrics with Cassandra

Librato's Metrics platform relies on Cassandra as its sole storage system for time-series data. This session discusses how we have scaled from a single six-node Cassandra ring two years ago to the multiple storage rings that handle all time-series measurement data today.

Mike Heffner

June 11, 2013

Transcript

  1. #CASSANDRA13 October 2011
     • Decision: All measurements in Cassandra
     • Single EC2 Ring: 6 * m1.large
     • Cassandra 0.8.x
     • How does this work?
  2. #CASSANDRA13 Today
     • Multiple sharded rings
     • EC2: m1.xlarge and m2.4xlarge
     • Cassandra 1.1.x
     • Read load: < 1%
  3. #CASSANDRA13 Locating Measurement Rows
     Maximum row size math (see the row-size sketch after the transcript):
     • 1 minute records
     • 1 week TTL
     • 7 days * 24 hours * 60 minutes => ~10k columns
     • 4 longs * 8 bytes * 10k => ~320KB (not bad)
  4. #CASSANDRA13 Examining CF SSTables
     nodetool cfhistograms Metrics metric_id_epochs_60
     Metrics/metric_id_epochs_60:
     Offset  SSTables
     1        28821
     2        58859
     3       201198
     4       178326
     5       223016
     6       154952
     7        83289
     8        21552
     10       81104
  5. #CASSANDRA13 Rotating the Rows
     Retrieve Time Bases for Times 31->45 for metric ID 12 (see the multiget sketch after the transcript):
     mget(Rows: [12, EBase_30], [12, EBase_40], Columns: {31->45})
  6. #CASSANDRA13 Examining CF SSTables
     nodetool cfhistograms Metrics metric_id_epochs_60
     Before (Metrics/metric_id_epochs_60):
     Offset  SSTables
     1        28821
     2        58859
     3       201198
     4       178326
     5       223016
     6       154952
     7        83289
     8        21552
     10       81104
     After (Metrics/metric_id_epochs_60):
     Offset  SSTables
     1       3491820
     2       5389762
     3       4095760
     4       1310741
     5          9976
  7. #CASSANDRA13 TTL Expiration
     • Churn of about 750GB / day
     • 12 TB total
     • 6% of data set
     • gc_grace = 0
     • STC (size-tiered compaction)
  8. #CASSANDRA13 Cleanup
     • Not just for topology changes
     • Tombstoned rows (not referenced)
     • Rotated row keys decrease references
     • Cons: Must process every sstable.
  9. #CASSANDRA13 Leverage SSTable Mod Time
     • If now - mtime > TTL => all data in the sstable is expired
     • We can quickly eliminate entire sstables:
       find . -mtime +<TTL> -name '*.db' | xargs rm
     • Fast and low overhead
     • Cons: Rolling restart
     • Example: 26G 2013-05-17 09:44 Metrics-metrics_60-hf-7209-Data.db
  10. #CASSANDRA13 Increasing minor compactions
      • By default, STC (size-tiered compaction) requires a minimum of 4 sstables
      • Leads to large non-compacted sstables
      • Dropping to 2 can flatten the storage growth:
        nodetool setcompactionthreshold <ks> <cf> 2
      • Cons: CPU/IO increase
  11. #CASSANDRA13 Disk Errors => Throw Away
      • If you ever see this, replace!
        end_request: I/O error, dev xvdb, sector 467940617
      • Mark node down, bootstrap new
      • No metric for this? (see the I/O-error sketch after the transcript)
  12. #CASSANDRA13 Cassandra Log Volume
      • Count log lines seen every 10 minutes (see the log-volume sketch after the transcript)
      • Track over time
      • Can identify:
        - Unbalanced workloads
        - Schema disagreements
        - Phantom gossip nodes
        - GC activity
      • grep -v '.java' => exceptions
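
Sketches

Row-size math (slide 3). A minimal back-of-the-envelope sketch of the maximum row size: 1-minute records, a 1-week TTL, and four 8-byte longs per record. The variable names are illustrative, not Librato's schema.

    # Back-of-the-envelope maximum row size (slide 3): 1-minute records
    # kept for a 1-week TTL, each record storing four 8-byte longs.
    records_per_row = 7 * 24 * 60      # minutes in a week => 10080 (~10k columns)
    bytes_per_record = 4 * 8           # four longs, 8 bytes each
    row_bytes = records_per_row * bytes_per_record

    print(records_per_row)             # 10080
    print(row_bytes)                   # 322560 bytes, roughly 320KB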
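
Multiget over rotated rows (slide 5). A rough sketch of how rotated row keys of the form [metric_id, epoch_base] could be computed for a time range. The 10-unit row width is inferred from EBase_30/EBase_40 on the slide, and the mget client call is hypothetical, not a real driver API.

    ROW_WIDTH = 10  # time units per rotated row, inferred from EBase_30 / EBase_40

    def row_keys(metric_id, start, end):
        """Row keys (metric_id, epoch_base) covering the time range [start, end]."""
        first_base = (start // ROW_WIDTH) * ROW_WIDTH
        last_base = (end // ROW_WIDTH) * ROW_WIDTH
        return [(metric_id, base) for base in range(first_base, last_base + 1, ROW_WIDTH)]

    # Retrieve time bases for times 31->45 for metric ID 12 (slide 5):
    keys = row_keys(12, 31, 45)        # [(12, 30), (12, 40)]
    # results = client.mget(rows=keys, column_start=31, column_finish=45)  # hypothetical client
    print(keys)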
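
Kernel I/O errors (slide 11). The slide asks whether there is a metric for kernel I/O errors; here is a small illustrative sketch, not from the talk, that counts "I/O error" lines in a kernel log so the count could be emitted as a metric and alerted on. The log path is an assumption.

    # Count kernel I/O error lines, e.g.
    #   end_request: I/O error, dev xvdb, sector 467940617
    # so the count can be shipped as a metric and alerted on.
    def count_io_errors(path="/var/log/kern.log"):   # path is an assumption
        hits = 0
        with open(path) as log:
            for line in log:
                if "I/O error" in line:
                    hits += 1
        return hits

    if __name__ == "__main__":
        print(count_io_errors())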
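
Log volume (slide 12). A small sketch, not Librato's actual tooling, that buckets Cassandra log lines into 10-minute windows and counts them so spikes and imbalances show up over time. The log path and timestamp format are assumptions about a typical system.log.

    # Count Cassandra log lines per 10-minute bucket (slide 12).
    # The regex matches "YYYY-MM-DD HH:MM" timestamps in system.log;
    # the path and format are assumptions about the deployment.
    import re
    from collections import Counter

    STAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}):(\d{2})")

    def lines_per_10min(path="/var/log/cassandra/system.log"):
        buckets = Counter()
        with open(path) as log:
            for line in log:
                m = STAMP.search(line)
                if m:
                    hour, minute = m.group(1), int(m.group(2))
                    buckets["%s:%02d" % (hour, (minute // 10) * 10)] += 1
        return buckets

    if __name__ == "__main__":
        for bucket, count in sorted(lines_per_10min().items()):
            print(bucket, count)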