
Monitorama Cyanite Workshop


Pierre-Yves Ritschard

June 17, 2015



  1. FROM THE SITE Graphite does two things: 1. Store numeric

    time-series data 2. Render graphs of this data on demand http://graphite.readthedocs.org
  2. SCOPE A metrics tool Not a complete monitoring solution Interacts

    with metric submission tools Optional event storage
  3. WHISPER An RRD-like storage library Written in Python Each file
     contains several roll-up periods and an aggregation method
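The roll-up idea can be sketched in a few lines (illustrative Python, not Whisper's code; the averaging method and the bucket alignment are assumptions for the sketch):

```python
# Illustrative sketch of a roll-up: fine-grained (timestamp, value)
# points are folded into coarser buckets with an aggregation method.
from collections import defaultdict

def rollup(points, step, method=lambda vs: sum(vs) / len(vs)):
    """Aggregate (timestamp, value) pairs into buckets of `step` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)  # align to bucket start
    return sorted((ts, method(vs)) for ts, vs in buckets.items())

raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0)]
print(rollup(raw, 60))  # [(0, 2.0), (60, 6.0)]
```

Whisper stores one aggregation method per file and applies it whenever a fine archive is folded into a coarser one.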
  4. CARBON Asynchronous (Twisted) TCP and UDP service to ingest
     time-series data Simple storage rules Split across several daemons
  5. GRAPHITE-WEB Simple Django-based HTTP API Persists configuration to SQL
     Data query and manipulation through a very simple DSL Graph rendering
     Composer client interface to build graphs

       # sum CPU values
       sumSeries("collectd.web01.cpu-*")

       # provide memory percentage
       alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
  6. MODULARITY IN GRAPHITE Recently improved A module can implement a

    storage strategy for graphite-web Carbon modularity is a bit harder
  7. STATSD Very popular metric service to integrate within applications
     Aggregates events in n-second windows Ships them off to graphite

       statsd.increment 'session.open'
       statsd.gauge 'session.active', 370
       statsd.timing 'pdf.convert', 320
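A statsd-style window can be modelled as a tiny aggregator (hypothetical sketch, not the statsd implementation; the class and method names are invented):

```python
# Minimal model of statsd-style aggregation: counters and gauges
# accumulate during a window, then flush() emits one data point per
# metric and starts a fresh window.
from collections import defaultdict

class WindowAggregator:
    def __init__(self):
        self.counters = defaultdict(int)
        self.gauges = {}

    def increment(self, name, value=1):
        self.counters[name] += value

    def gauge(self, name, value):
        self.gauges[name] = value  # last write wins within the window

    def flush(self):
        """Return aggregated (counters, gauges) and reset the window."""
        snapshot = (dict(self.counters), dict(self.gauges))
        self.counters.clear()
        self.gauges.clear()
        return snapshot

agg = WindowAggregator()
agg.increment('session.open')
agg.increment('session.open')
agg.gauge('session.active', 370)
print(agg.flush())  # ({'session.open': 2}, {'session.active': 370})
```

In the real service a timer calls the flush every n seconds and ships the snapshot to graphite's carbon listener.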
  8. COLLECTD Very popular collection daemon with a graphite destination
     Every conceivable system metric A wealth of additional metric sources
     (such as a fast statsd server)

       <plugin write_graphite>
         <carbon>
           Host "graphite-host"
         </carbon>
       </plugin>
  9. GRAFANA Quickly becoming the default graphite visualization front-end Inspired by

    the kibana project for logstash Optional persistence to elasticsearch for configuration
  10. RIEMANN Distributed system monitoring solution

       (def graph!
         (graphite {:host "graphite-server"}))

       (streams
         (where (service "http.404")
           (rate 5 graph!)))
  11. ESSENTIALLY A SINGLE-HOST SOLUTION Built in a day when Cacti
      reigned An innovative project at the time, which decoupled collection from storage and display
  12. THE WHISPER FILE FORMAT One file per metric Optimized
      for space, not speed Plenty of seeks Only shared storage option is NFS… In many ways can be seen as RRD in Python
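The "plenty of seeks" point follows from the layout: each archive is a preallocated ring buffer, so locating a timestamp's slot is pure arithmetic plus one seek per archive (illustrative sketch; the header and point sizes here are assumptions, not the real on-disk layout):

```python
# Sketch of Whisper-style fixed-slot addressing. Each archive holds
# `points` slots of fixed size; a timestamp maps to a slot by modular
# arithmetic, and reading it means seeking to that byte offset.
POINT_SIZE = 12    # assumed: 4-byte timestamp + 8-byte double
HEADER_SIZE = 40   # assumed header length for this sketch

def slot_offset(timestamp, step, points, archive_offset):
    """Byte offset of the slot holding `timestamp` in a ring of `points` slots."""
    slot = (timestamp // step) % points
    return archive_offset + slot * POINT_SIZE

# locate a point in a one-week archive of 10-second slots
offset = slot_offset(timestamp=1434499200, step=10, points=60480,
                     archive_offset=HEADER_SIZE)
```

A multi-archive file repeats this once per roll-up period, which is why a single read can touch several distant file regions.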
  13. SCALING STRATEGIES Tacked on after the fact The decoupled architecture
      means that both graphite-web and carbon need upfront knowledge of the locations of shards
  14. IT GETS A BIT HAIRY Cluster topology must be stored

    on all nodes Manual replication mechanism (through carbon-relay) Changing cluster topology means re-assigning shards by hand
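Carbon-relay's consistent-hashing mode can be sketched roughly as follows (simplified; the real relay's ring construction differs in detail). It also illustrates the pain point: changing the node list remaps many metrics, whose whisper files must then be moved by hand:

```python
# Simplified consistent-hash ring in the spirit of carbon-relay:
# each node gets many virtual points on a ring, and a metric name is
# owned by the first node clockwise from its hash.
import bisect
import hashlib

def build_ring(nodes, replicas=100):
    ring = []
    for node in nodes:
        for i in range(replicas):
            digest = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
            ring.append((int(digest, 16), node))
    ring.sort()
    return ring

def node_for(ring, metric):
    digest = int(hashlib.md5(metric.encode()).hexdigest(), 16)
    keys = [k for k, _ in ring]
    idx = bisect.bisect(keys, digest) % len(ring)
    return ring[idx][1]

ring = build_ring(["carbon-a", "carbon-b", "carbon-c"])
owner = node_for(ring, "collectd.web01.cpu-0.idle")
```

Adding "carbon-d" rebuilds the ring and silently changes the owner of a fraction of all metric names, which in graphite's world means re-assigning shards and copying files manually.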
  15. WHAT GRAPHITE WOULD NEED Automatic shard assignment Replication Easy management

    Easy cluster topology changes (horizontal scalability)
  16. A CASSANDRA-BACKED CARBON REPLACEMENT Written in clojure Async I/O &

    Threads No more whisper files Horizontally scalable (stateless!) Interfaced with graphite-web through graphite-cyanite
  17. CYANITE DUTIES Providing graphite-compatible input methods (carbon listeners)
      Providing a way to retrieve metric names and metric time-series A metric-store
      A path-store Both pluggable The rest is up to the graphite ecosystem, through
      graphite-cyanite The recommended companion is graphite-api
  18. GETTING UP AND RUNNING A simple configuration file

       carbon:
         host: "127.0.0.1"
         port: 2003
         readtimeout: 30
         rollups:
           - period: 60480
             rollup: 10
           - period: 105120
             rollup: 600
       http:
         host: "0.0.0.0"
         port: 8080
       logging:
         level: info
         files:
           - "/var/log/cyanite/cyanite.log"
       store:
         cluster: 'localhost'
         keyspace: 'metric'
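Those rollup pairs encode retention: each keeps `period` points at `rollup`-second resolution, so retention is period × rollup seconds:

```python
# Retention implied by each (period, rollup) pair from the config:
# `period` points are kept at `rollup`-second resolution.
rollups = [(60480, 10), (105120, 600)]
for period, rollup in rollups:
    days = period * rollup / 86400
    print(f"{rollup}s resolution kept for {days:g} days")
# 10s resolution kept for 7 days
# 600s resolution kept for 730 days
```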

  19. Path index configuration (Elasticsearch):

       index:
         use: "io.cyanite.es_path/es-rest"
         index: "cyanite_paths"
         url: "http://search.internal.example.com:9200"
  20. GRAPHITE-CYANITE with graphite-web:

       STORAGE_FINDERS = (
           'cyanite.CyaniteFinder',
       )
       CYANITE_URLS = (
           'http://host:port',
       )

      with graphite-api:

       cyanite:
         urls:
           - http://cyanite-host:port
       finders:
         - cyanite.CyaniteFinder
  21. LEADING ARCHITECTURE DRIVERS Simplicity Optimize for speed As few moving

    parts as possible Multi-tenancy Resource efficiency Remain compatible with the graphite ecosystem
  22. CASSANDRA Good at high write-to-read ratio workloads No
      manual shard allocation or reassignment Wide columns

  23. CREATE TABLE "metric" (
        tenant text,
        period int,
        rollup int,
        path text,
        time bigint,
        data list<double>,
        PRIMARY KEY ((tenant, period, rollup, path), time)
      )
  24. WIDE COLUMNS Each row has a key, called the partitioning key
      Here a composite of tenant, period, rollup and path Each row has
      an arbitrary number of columns (not homogeneous) Columns are sorted
      by a clustering key Here, the timestamp Columns may have TTLs
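The wide-row layout can be mimicked with plain dictionaries to see the read path (illustrative Python only, no Cassandra involved):

```python
# Toy model of the wide-row layout: points sharing a partition key
# (tenant, period, rollup, path) land in one row, keyed and sorted by
# the clustering key (time).
from collections import defaultdict

rows = defaultdict(dict)

def insert(tenant, period, rollup, path, time, data):
    partition_key = (tenant, period, rollup, path)
    rows[partition_key][time] = data  # one "column" per timestamp

def read_range(partition_key, start, end):
    """Slice one partition by clustering key, as a CQL range query would."""
    row = rows[partition_key]
    return [(t, row[t]) for t in sorted(row) if start <= t < end]

insert("main", 60480, 10, "web01.cpu.idle", 1000, [98.0])
insert("main", 60480, 10, "web01.cpu.idle", 1010, [97.5])
print(read_range(("main", 60480, 10, "web01.cpu.idle"), 1000, 1020))
```

Because the partition key pins a whole series at one resolution to one replica set, a time-range read is a single contiguous slice of one row.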
  25. WORK IN PROGRESS ITEMS DSL support Pickle support Path storage

    Event storage Input methods Integrations Docs
  26. DSL SUPPORT Still ongoing Parser is finished Multi-method based implementation

       (defmethod apply-transform :absolute
         [_ series]
         (map-values (fn [point] (Math/abs point)) series))
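A rough Python analogue of that multi-method dispatch (hypothetical names, not Cyanite's API) is a registry keyed by transform name:

```python
# Hypothetical sketch of DSL transform dispatch: each transform
# registers under a name and is looked up when a query is evaluated.
TRANSFORMS = {}

def transform(name):
    def register(fn):
        TRANSFORMS[name] = fn
        return fn
    return register

@transform("absolute")
def absolute(series):
    # series maps a metric path to its list of values
    return {path: [abs(p) for p in points] for path, points in series.items()}

def apply_transform(name, series):
    return TRANSFORMS[name](series)

result = apply_transform("absolute", {"web01.temp.delta": [-3.0, 2.5]})
```

Adding a new DSL function then means registering one more implementation, mirroring what a new `defmethod` case does in the Clojure version.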
  27. PICKLE SUPPORT Pickle is a painful protocol 100% clojure implementation:

    Input is ready but not fully integrated. This is the easy way in for cyanite in your infra Via carbon-relay https://github.com/pyr/pickler
  28. PATH STORAGE Sub-optimal at the moment Having to go to
      ES is sad Leveraging user-provided secondary indexes in Cassandra would be great Won't work out of the box
  29. EVENT STORAGE Unplanned at the moment. Should it really be

    the graphite ecosystem's responsibility?
  30. ALTERNATIVE INPUT METHODS Support queue input of metrics Collectd &
      Logstash already support shipping graphite data to Apache Kafka & RabbitMQ. Support the statsd protocol directly.
  31. STANDARD BATCH OPERATION RECIPES Compactions of rolled up series Dynamic

    thresholds Great opportunity to leverage the cassandra & spark interaction
  32. A WORD OF WARNING "Breaking news, building a scalable OSS TSDB is not as
      easy as bolting on NoSQL." (Jason Dixon, @obfuscurity, 31 May 2015)
  33. WHAT YOU GET Trading off the complexity of dealing with
      whisper for the complexity of dealing with cassandra (and optionally ES).
  34. HOW WE USE IT Not 100% dogfood Still some metrics
      in carbon/whisper. Gradually moving all input to happen through Kafka. Cyanite used for lookup.
  35. SCHEMA HURDLES Partitioning for collection intervals < 10s. Cassandra CQL
      collection types have significant overhead. Cells should be more compact. It is a trade-off to avoid read-then-write. Kafka helps solve this elegantly. It's a big requirement list for "just" metrics though.
  36. MAINTENANCE (CYANITE) Prune old metrics with cyanite-utils: Whisper to Cyanite

    conversion: https://github.com/WrathOfChris/cyanite-utils https://gist.github.com/deniszh/7986974
  37. MAINTENANCE (CASSANDRA) The usual applies Schedule regular repairs of your
      clusters Follow releases Best supported version: 2.1.x Use DateTieredCompactionStrategy
  38. SCALING Cyanite is stateless Colocate cassandra and cyanite daemons Split

    Data/Proxy nodes for huge deployments Haproxy to distribute queries
  39. OVERALL SENTIMENT A few pending things to go in Not
      (yet) for the faint of heart Gets the job done, with a better maintenance story, especially if you are used to Cassandra & ES.
  40. THANKS! Cyanite owes a lot to: Max Penet (@mpenet)
      for the great alia & jet libraries Bruno Renie (@brutasse) for graphite-api,
      graphite-cyanite and the initial nudge Datastax for the awesome cassandra
      java-driver Its contributors Apache Cassandra obviously @pyr