Better graphite storage with cyanite

From Cassandra Summit 2014. An overview of the graphite ecosystem and of cyanite, an alternative storage engine for graphite metrics.

Pierre-Yves Ritschard

September 11, 2014

Transcript

  1. BETTER GRAPHITE STORAGE WITH CYANITE PIERRE-YVES RITSCHARD @PYR #CASSANDRASUMMIT

  2. @PYR CTO at exoscale, the safe home for your cloud

    applications Open source developer: pithos, cyanite, riemann, collectd… Recovering Operations Engineer
  3. AIM OF THIS TALK Presenting graphite and its ecosystem Presenting

    cyanite Show-casing simplicity through cassandra
  4. OUTLINE Graphite overview The problem with graphite Cyanite solutions &

    internals Looking forward
  5. GRAPHITE OVERVIEW

  6. FROM THE SITE Graphite does two things: 1. Store numeric

    time-series data 2. Render graphs of this data on demand http://graphite.readthedocs.org
  7. SCOPE A metrics tool Not a complete monitoring solution Interacts

    with metric submission tools
  8. WHY ARE METRICS IMPORTANT Outside the scope of this talk

    Narrowing the gap between map and territory
  9. GRAPHITE COMPONENTS whisper carbon graphite-web

  10. WHISPER RRD like storage library Written in python Each file

    contains different roll-up periods and an aggregation method
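
The roll-up idea on the whisper slide can be illustrated with a minimal sketch. This is not whisper's code: the function name `roll_up` and the sample data are hypothetical, and the sketch only assumes that a roll-up groups points into fixed-width time buckets and applies one aggregation method per bucket.

```python
from statistics import mean

def roll_up(points, resolution, aggregate=mean):
    """Group (timestamp, value) pairs into buckets of `resolution` seconds
    and reduce each bucket with `aggregate` (hypothetical helper)."""
    buckets = {}
    for timestamp, value in points:
        # Align the timestamp to the start of its bucket.
        bucket_start = timestamp - (timestamp % resolution)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]

# Five raw points at a 10 second resolution, rolled up to 60 seconds.
raw = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 2.0), (70, 4.0)]
rolled_mean = roll_up(raw, 60)
rolled_max = roll_up(raw, 60, aggregate=max)
```

Swapping the `aggregate` argument mirrors the per-file aggregation method the slide mentions: the same raw points give different rolled-up series for an average and for a maximum.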
  11. CARBON Asynchronous (twisted) TCP and UDP service to input

    time-series data Simple storage rules Split across several daemons
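
As a rough illustration of what carbon accepts on its TCP listener, the plaintext protocol is one line per data point: metric path, value and Unix timestamp, separated by spaces. The helper names below are hypothetical, and the default host and port are assumptions (2003 is the port used in the cyanite configuration later in this talk).

```python
import socket
import time

def carbon_line(path, value, timestamp=None):
    """Format one data point for carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_to_carbon(lines, host="127.0.0.1", port=2003):
    """Send pre-formatted lines over TCP (host and port are assumptions)."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall("".join(lines).encode("ascii"))

line = carbon_line("collectd.web01.cpu-0", 0.42, 1410393600)
```

Calling `send_to_carbon([line])` would ship the point to a listening carbon-compatible daemon; the sketch does not call it so that it stays runnable without one.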
  12. CARBON-CACHE Main carbon daemon Temporarily caches values to RAM Writes

    out to whisper
  13. CARBON-AGGREGATOR Aggregates data and forwards to carbon-cache Less I/O strain

    on the filesystem At the expense of resolution
  14. CARBON-RELAY Provides sharding and replication Forwards to appropriate carbon-cache processes

    based on a provided hashing method
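
To make the sharding idea concrete, here is a deliberately simplified sketch that picks a destination by hashing the metric path modulo the number of nodes. It is a stand-in, not carbon-relay's actual hashing method, and the node names and `pick_node` helper are hypothetical.

```python
import hashlib

def pick_node(path, nodes):
    """Map a metric path to one node with a hash-modulo scheme (simplified)."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

three_nodes = ["cache-a", "cache-b", "cache-c"]
four_nodes = three_nodes + ["cache-d"]
paths = [f"collectd.web{i:02d}.cpu-0" for i in range(100)]

# Every sender with the same node list routes a path to the same node.
first = pick_node(paths[0], three_nodes)

# Count how many paths change owner when a node is added.
moved = sum(1 for p in paths if pick_node(p, three_nodes) != pick_node(p, four_nodes))
```

The `moved` count hints at why topology changes are painful with static sharding: every relay needs the same node list, and changing that list re-routes existing metrics.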
  15. GRAPHITE-WEB Simple Django-Based HTTP api Persists configuration to SQL Data

    query and manipulation through a very simple DSL Graph rendering Composer client interface to build graphs
        ## sum CPU values
        sumSeries("collectd.web01.cpu-*")
        ## provide memory percentage
        alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
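
Expressions in this DSL are passed to graphite-web's render endpoint as `target` parameters. The sketch below only builds such a URL; the base URL `http://graphite-host` and the helper name `render_url` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def render_url(base, target, since="-1h", fmt="json"):
    """Build a render API URL for one target expression."""
    query = urlencode({"target": target, "from": since, "format": fmt})
    return f"{base}/render?{query}"

target = "sumSeries(collectd.web01.cpu-*)"
url = render_url("http://graphite-host", target)

# Decoding the query string recovers the original expression.
decoded = parse_qs(urlsplit(url).query)
```

Requesting `format=json` returns the series as data rather than as a rendered graph, which is how dashboards built on top of graphite typically consume it.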
  16. SCREENSHOTS

  17. SCREENSHOTS

  18. ARCHITECTURE OVERVIEW

  19. MODULARITY IN GRAPHITE Recently improved A module can implement a

    storage strategy for graphite-web Carbon modularity is a bit harder
  20. THE GRAPHITE ECOSYSTEM A wealth of tools are now graphite

    compatible
  21. STATSD Very popular metric service to integrate within applications. Aggregates

    events in n second windows Ships off to graphite
        statsd.increment 'session.open'
        statsd.gauge 'session.active', 370
        statsd.timing 'pdf.convert', 320
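
Client calls like the ones on this slide end up as small text datagrams of the form `name:value|type`, with `c` for counters, `g` for gauges and `ms` for timers. A minimal sketch of that wire format follows; the helper names are hypothetical, and the default host and UDP port 8125 are assumptions about a conventional statsd setup.

```python
import socket

def statsd_packet(name, value, metric_type):
    """Format one statsd datagram: name:value|type."""
    return f"{name}:{value}|{metric_type}"

def send_statsd(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))

increment = statsd_packet("session.open", 1, "c")
gauge = statsd_packet("session.active", 370, "g")
timing = statsd_packet("pdf.convert", 320, "ms")
```

The statsd server aggregates these events per window and forwards the results to graphite, so the application never talks to carbon directly.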
  22. COLLECTD Very popular collection daemon with a graphite destination Every

    conceivable system metric A wealth of additional metric sources (such as a fast statsd server)
        <plugin write_graphite>
          <carbon>
            Host "graphite-host"
          </carbon>
        </plugin>
  23. GRAPHITE-API Alternative to graphite-web Shares data manipulation code No persistence

    of configuration
  24. GRAFANA Increasingly popular alternative to graphite-web, with graphite-api Inspired by

    the kibana project for logstash Optional persistence to elasticsearch for configuration
  25. RIEMANN Distributed system monitoring solution

        (def graph! (graphite {:host "graphite-server"}))
        (streams
          (where (service "http.404")
            (rate 5 graph!)))
  26. AND A LOT MORE syslog-ng logstash descartes tasseo jmxtrans

  27. HIGH VALUE PROJECT Active and friendly developer community Growing ecosystem

    Very few contenders
  28. THE PROBLEM WITH GRAPHITE

  29. ESSENTIALLY A SINGLE-HOST SOLUTION Built in a day when cacti

    reigned Innovative project at the time which decoupled collection from storage and display
  30. THE WHISPER FILE FORMAT One file per data point Optimized

    for space, not speed Plenty of seeks Only shared storage option is NFS… In many ways can be seen as RRD in python
  31. SCALING STRATEGIES Tacked on after the fact The decoupled architecture

    means that both graphite-web and carbon need upfront knowledge of the locations of shards
  32. SCALING OVERVIEW

  33. IT GETS A BIT HAIRY Cluster topology must be stored

    on all nodes Manual replication mechanism (through carbon-relay) Changing cluster topology means re-assigning shards by hand
  34. WHAT GRAPHITE CAN KEEP Persistence of configuration Local data manipulation

  35. WHAT GRAPHITE WOULD NEED Automatic shard assignment Replication Easy management

    Easy cluster topology changes (horizontal scalability)
  36. THE CYANITE APPROACH Leveraging Apache Cassandra to store time-series Leveraging

    Graphite for the interface
  37. A CASSANDRA-BACKED CARBON REPLACEMENT Written in clojure Async I/O No

    more whisper files Fast storage Horizontally scalable Interfaced with graphite-web through graphite-cyanite
  38. CYANITE DUTIES Providing graphite-compatible input methods (carbon listeners) Providing a

    way to retrieve metric names and metric time-series Implemented as two protocols A metric-store A path-store The rest is up to the graphite ecosystem, through graphite-cyanite The recommended companion is graphite-api
  39. GETTING UP AND RUNNING A simple configuration file

        carbon:
          host: "127.0.0.1"
          port: 2003
          readtimeout: 30
          rollups:
            - period: 60480
              rollup: 10
            - period: 105120
              rollup: 600
        http:
          host: "0.0.0.0"
          port: 8080
        logging:
          level: info
          files:
            - "/var/log/cyanite/cyanite.log"
        store:
          cluster: 'localhost'
          keyspace: 'metric'
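
The two `rollups` entries are easier to read as retention windows. The arithmetic below assumes, which the slide does not state, that `rollup` is a resolution in seconds and `period` is the number of points kept at that resolution; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86400

# Values copied from the configuration on the slide.
rollups = [
    {"period": 60480, "rollup": 10},
    {"period": 105120, "rollup": 600},
]

def retention_days(entry):
    """Retention in days, assuming period = point count and rollup = seconds."""
    return entry["period"] * entry["rollup"] / SECONDS_PER_DAY

fine_grained = retention_days(rollups[0])    # 10 second points
coarse_grained = retention_days(rollups[1])  # 10 minute points
```

Under that reading, the configuration keeps 10 second points for 7 days and 10 minute points for 730 days.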
  40. GRAPHITE-CYANITE with graphite-web:

        STORAGE_FINDERS = ('cyanite.CyaniteFinder',)
        CYANITE_URLS = ('http://host:port',)

    with graphite-api:

        cyanite:
          urls:
            - http://cyanite-host:port
        finders:
          - cyanite.CyaniteFinder
  41. LEADING ARCHITECTURE DRIVERS Simplicity Optimize for speed As few moving

    parts as possible Multi-tenancy Resource efficiency Remain compatible with the graphite ecosystem
  42. CYANITE INTERNALS

  43. CASSANDRA IS GREAT FOR TIME-SERIES It bears repeating High write

    to read ratio workload No manual shard allocation or reassignment Sorted wide columns mean efficient retrieval of data
  44. A NEW STACK

  45. SIMPLE SCHEMA

        CREATE TABLE "metric" (
          tenant text,
          period int,
          rollup int,
          path text,
          time bigint,
          data list<double>,
          PRIMARY KEY ((tenant, period, rollup, path), time)
        )
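
To show how this primary key shapes reads and writes, here is an in-memory sketch that mimics the layout with plain dictionaries: one partition per (tenant, period, rollup, path), with columns sorted by time. It is not cyanite's code. The time alignment, the append-to-list write, the averaging on read, the empty default tenant and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# (period, rollup) pairs taken from the configuration slide.
ROLLUPS = [(60480, 10), (105120, 600)]

# partition key -> {time bucket -> list of values}, mimicking list<double>.
store = defaultdict(lambda: defaultdict(list))

def insert(path, value, timestamp, tenant=""):
    """Write one point into every configured roll-up partition."""
    for period, rollup in ROLLUPS:
        bucket = (timestamp // rollup) * rollup  # align to the resolution
        store[(tenant, period, rollup, path)][bucket].append(value)

def fetch(key, start, end):
    """Range read inside one partition, averaging each bucket's values."""
    columns = store[key]
    return [(t, sum(v) / len(v)) for t, v in sorted(columns.items()) if start <= t <= end]

insert("web01.cpu", 1.0, 1200)
insert("web01.cpu", 3.0, 1205)
insert("web01.cpu", 5.0, 1210)

fine = fetch(("", 60480, 10, "web01.cpu"), 1200, 1210)
coarse = fetch(("", 105120, 600, "web01.cpu"), 0, 1800)
```

Because a query names the full partition key and a time range, it touches a single partition and reads a contiguous slice of it, which is the property the wide-column slides rely on.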
  46. TAKING ADVANTAGE OF WIDE COLUMNS

  47. LOOKING FORWARD

  48. REPLACING MORE GRAPHITE PARTS, EXTENDING FUNCTIONALITY Implement graphite's data manipulation

    functions Remove the need for graphite-api or graphite-web when using grafana Finish providing multi-tenancy options
  49. PICKLE SUPPORT Easier integration in existing architectures Would allow integration

    with carbon-relay
  50. ALTERNATIVE INPUT METHODS Support queue input of metrics Collectd already

    supports shipping graphite data to Apache Kafka Support the statsd protocol directly
  51. PROVIDE A CYANITE LIBRARY Easy, standard-compliant storage from JVM based

    applications
  52. BATCH OPERATIONS Compactions of rolled up series Dynamic thresholds Great

    opportunity to leverage the cassandra & spark interaction
  53. A FEW TAKE-AWAYS Cassandra enabled a quick-win in about 1100

    lines of clojure Greatly simplified scaling strategy Building block for a lot more Good way to reduce technology creep if you're already using cassandra
  54. THANKS ! Cyanite owes a lot to: Max Penet (@mpenet)

    for the great alia library Bruno Renie (@brutasse) for graphite-api, graphite-cyanite and the initial nudge Datastax for the awesome cassandra java-driver Its contributors Apache Cassandra obviously @pyr – #CassandraSummit