Collecting Metrics with Snap - Open Telemetry Framework (SRECon 2017)

We choose solutions to advance our infrastructure every day, and collecting monitoring metrics should be no different. Snap is a next-generation open telemetry framework designed to simplify the collection, processing, and publishing of system data through a single API. It is written in Go and supports over 60 plugins.
This presentation was given at SRECon EMEA 2017.

Guy Fighel

August 31, 2017

Transcript

  1. Welcome to the Jungle
     • Many tools
     • Multiple different formats
     • Different collection intervals
     • Internal vs. external collection
  2. Challenges in Collection
     • Multiple metrics collection
     • Can we avoid customizations?
     • How can we scale the collection from a single instance?
     • Can we make it smart? (flexible decision in collection)
  3. Snap - Open Telemetry Framework
     • Easily collect, process, and publish telemetry data at scale
     • Define telemetry workflows and run them on a schedule
     • Provide an open plugin model decoupling actions in the workflow from running the workflow
     • Strong focus on exposing all state and commands through the API
     • Provide powerful clustered control of telemetry workflows across small or large clusters (Tribe)
     • Support filtering and decoration (with tags)
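The collect, process, publish workflow described on this slide can be sketched in a few lines of Python. This is only an illustration of the decoupling idea, not Snap's implementation: `Metric`, `run_workflow`, `collect_load`, `drop_zeros`, and `tag_with` are hypothetical names, and the metric namespaces are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    namespace: str                      # e.g. "/example/load/load1" (illustrative)
    value: float
    tags: Dict[str, str] = field(default_factory=dict)


def run_workflow(
    collect: Callable[[], List[Metric]],
    processors: List[Callable[[List[Metric]], List[Metric]]],
    publishers: List[Callable[[List[Metric]], None]],
) -> List[Metric]:
    """Run one pass: collect once, apply each processor in order, publish to all."""
    metrics = collect()
    for process in processors:
        metrics = process(metrics)
    for publish in publishers:
        publish(metrics)
    return metrics


def collect_load() -> List[Metric]:
    # Hypothetical collector returning two fixed samples.
    return [Metric("/example/load/load1", 0.42), Metric("/example/load/load5", 0.0)]


def drop_zeros(metrics: List[Metric]) -> List[Metric]:
    # Filtering step: discard metrics whose value is zero.
    return [m for m in metrics if m.value != 0.0]


def tag_with(**tags: str) -> Callable[[List[Metric]], List[Metric]]:
    # Decoration step: return a processor that adds the given tags to every metric.
    def decorate(metrics: List[Metric]) -> List[Metric]:
        return [Metric(m.namespace, m.value, {**m.tags, **tags}) for m in metrics]
    return decorate


published: List[Metric] = []            # stands in for a real publisher target
result = run_workflow(
    collect_load,
    [drop_zeros, tag_with(env="staging")],
    [published.extend],
)
```

Because the runner only sees callables, any collector, processor, or publisher can be swapped without touching `run_workflow`, which is the point of separating the actions in a workflow from running it.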
  4. Comprehensive Plugins Support
     • Applications and Services: Apache, Cassandra, CEPH, Etcd, HAProxy, InfluxDB, MySQL, NFS, RabbitMQ, …
     • OpenStack: Nova, Cinder, Glance, Keystone, Neutron...
     • Containers and VMs: Cgroups, Docker, Libvirt, Mesos, Perf events, Processes, …
     • Hardware: SNMP, CPU, Disk, NIC, Intel NodeManager, Intel PCM, SMART, …
     • Filter, alter or append metadata as many times as needed via plugins
       • Filtering
       • Anomaly Detection
       • Statistics and Normalization
       • Encryption for all or part of the data set
       • Injection of remote requires for tokens
     • Publish as many times as needed
       • Dashboard Tools: Grafana, Graphite, Riemann...
       • Queues and Logs: RabbitMQ, SQS, Kafka, File...
       • Databases: PostgreSQL, InfluxDB, OpenTSDB, MySQL, HANA, Etcd, KairosDB...
  5. Flexible Scheduling
     • A task describes the how, what, and when of a Snap job.
     • Collect telemetry data from different systems and sensors
     • Any time interval (all the time, on demand, within a window, or on a cron schedule)
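As a rough illustration of what "how, what, and when" looks like in practice, the sketch below builds a task manifest as a Python dictionary and serializes it to JSON. The field names (`version`, `schedule`, `workflow`, `collect`, `metrics`, `publish`, `plugin_name`, `config`) and the `simple` and `cron` schedule types follow the task manifest layout as recalled from the Snap documentation, so verify them against the docs before relying on them. `build_task`, the metric namespace, and the output path are assumptions made for the example.

```python
import json


def build_task(metrics, schedule, publisher, publisher_config):
    """Assemble a task manifest: what to collect, when to run, where to publish."""
    return {
        "version": 1,
        "schedule": schedule,                                  # the "when"
        "workflow": {
            "collect": {
                "metrics": {ns: {} for ns in metrics},         # the "what"
                "publish": [                                   # the "how"
                    {"plugin_name": publisher, "config": publisher_config}
                ],
            }
        },
    }


# Assumed schedule shapes: a fixed interval and a cron expression.
simple_schedule = {"type": "simple", "interval": "10s"}
cron_schedule = {"type": "cron", "interval": "0 30 * * * *"}

task = build_task(
    ["/intel/psutil/load/load1"],                # illustrative metric namespace
    simple_schedule,
    "file",                                      # illustrative publisher plugin
    {"file": "/tmp/snap_metrics.log"},           # illustrative output path
)
manifest_json = json.dumps(task, indent=2)
```

The resulting JSON could be saved to a file and handed to the Snap CLI when creating a task; swapping `simple_schedule` for `cron_schedule` changes only the "when" and leaves the rest of the manifest untouched.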
  6. Dynamic Control and Lifecycle
     • Loading, updating, and unloading plugins without restarting Snap or extra configuration management
     • Plugin load: Dynamic, does not require restart
     • Plugin unload: Removes metrics from catalog automatically
     • Plugin swap: Swaps in a newer plugin version for an old one in a safe transaction
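The load, unload, and swap behavior on this slide can be modeled with a small in-memory registry. This is a toy sketch, not how Snap manages plugins internally: `PluginRegistry` and its methods are hypothetical, versions are plain integers, and the "safe transaction" is approximated by validating under a lock before mutating any state.

```python
import threading


class PluginRegistry:
    """Toy registry: the metric catalog is derived from the loaded plugins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._plugins = {}  # plugin name -> (version, list of metric namespaces)

    def load(self, name, version, namespaces):
        # Loading happens at runtime; nothing is restarted.
        with self._lock:
            if name in self._plugins:
                raise ValueError(f"plugin {name!r} is already loaded")
            self._plugins[name] = (version, list(namespaces))

    def unload(self, name):
        # Removing the plugin also removes its metrics from the catalog,
        # because the catalog is computed from the loaded plugins.
        with self._lock:
            del self._plugins[name]

    def swap(self, name, new_version, namespaces):
        # Validate first, then replace in a single step under the lock,
        # so a failed swap leaves the old plugin in place.
        with self._lock:
            if name not in self._plugins:
                raise KeyError(name)
            old_version, _ = self._plugins[name]
            if new_version <= old_version:
                raise ValueError("swap requires a newer version")
            self._plugins[name] = (new_version, list(namespaces))

    def version(self, name):
        with self._lock:
            return self._plugins[name][0]

    def catalog(self):
        with self._lock:
            return sorted(ns for _, nss in self._plugins.values() for ns in nss)


registry = PluginRegistry()
registry.load("cpu", 1, ["/example/cpu/user"])
registry.load("disk", 1, ["/example/disk/reads"])
registry.swap("cpu", 2, ["/example/cpu/user", "/example/cpu/system"])
registry.unload("disk")
```

After these calls the catalog contains only the two `cpu` namespaces, and an attempted swap to an older version raises an error without disturbing the loaded plugin.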
  7. Nagios Plugin Collector
     • Collects state metrics from a Nagios installation by monitoring the Nagios status.dat file
     • github.com/SignifAi/snap-plugin-collector-nagios
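To make the idea of reading state from `status.dat` concrete, here is a minimal parser sketch in Python. It is not the code of the plugin linked above (which is a separate project); it only assumes the usual Nagios status file layout of `blocktype {` ... `}` sections containing `key=value` lines. `parse_status_dat`, `service_states`, and the sample data are hypothetical.

```python
import re

BLOCK_START = re.compile(r"^(\w+)\s*\{$")


def parse_status_dat(text):
    """Parse status.dat-style text into a list of dicts, one per block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blank lines and comments
        match = BLOCK_START.match(line)
        if match:
            current = {"type": match.group(1)}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def service_states(blocks):
    """Map (host, service) to the numeric current_state of each service block."""
    return {
        (b["host_name"], b["service_description"]): int(b["current_state"])
        for b in blocks
        if b["type"] == "servicestatus"
    }


# Made-up sample in the status.dat layout.
SAMPLE = """\
hoststatus {
\thost_name=web01
\tcurrent_state=0
\t}

servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcurrent_state=2
\tplugin_output=CRITICAL - Socket timeout
\t}
"""

states = service_states(parse_status_dat(SAMPLE))
```

A collector built on this idea would re-read the file on each collection interval and emit one metric per host or service state.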
  8. Scaling with Tribe
     • Tribe allows you to cluster a group of Snap nodes into a “tribe”.
     • Nodes in a Tribe “agreement” run the same plugins or set of tasks.
     $ snaptel agreement create all-nodes
     $ snaptel agreement join all-nodes `hostname`
     $ snapteld --tribe -t 0 --tribe-port 6001 --api-port 8182 --tribe-node-name secondnodename --tribe-seed 192.168.136.176:6000 --control-listen-port 8083
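The effect of an agreement, where every member ends up with the same plugins and tasks, can be illustrated with a toy model. This sketch deliberately ignores networking and the membership protocol Tribe actually uses; `Node`, `Agreement`, and the plugin and task names are hypothetical.

```python
class Node:
    """A Snap node, reduced to the plugins and tasks it currently runs."""

    def __init__(self, name):
        self.name = name
        self.plugins = set()
        self.tasks = set()


class Agreement:
    """Toy agreement: whatever is added here is applied to every member."""

    def __init__(self, name):
        self.name = name
        self.members = []
        self.plugins = set()
        self.tasks = set()

    def join(self, node):
        # A joining node catches up with everything already agreed on.
        self.members.append(node)
        node.plugins |= self.plugins
        node.tasks |= self.tasks

    def add_plugin(self, plugin):
        self.plugins.add(plugin)
        for node in self.members:
            node.plugins.add(plugin)

    def add_task(self, task):
        self.tasks.add(task)
        for node in self.members:
            node.tasks.add(task)


all_nodes = Agreement("all-nodes")
first, second = Node("firstnode"), Node("secondnodename")

all_nodes.join(first)
all_nodes.add_plugin("collector-example")     # illustrative plugin name
all_nodes.add_task("collect-load-every-10s")  # illustrative task name
all_nodes.join(second)                        # a late joiner receives the same state
```

In this model a node that joins late still converges to the same plugins and tasks as the earlier members, which is the property the agreement commands on the slide rely on.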
  9. Main Advantages
     • Written in Go - highly maintainable and optimized
     • Pipeline workflow, gRPC, plugin architecture
     • All functionality exposed over REST APIs
     • Fully decoupled architecture
     • Dynamic loading
     • Clustering support with Tribe (masterless gossip protocol)
     • Flexible scheduling
     • Secure communication by design (signed plugins, end-to-end encryption over TLS)
  10. Where do we go from here?
     • Resources: http://snap-telemetry.io/ and https://github.com/intelsdi-x/snap
     • Documentation: https://github.com/intelsdi-x/snap/tree/master/docs
     • Plugin catalog: https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_CATALOG.md
     • Getting started with writing your own plugin: https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_AUTHORING.md