The Telegraf Toolbelt (InfluxDays SF, 2019)

David McKay
October 02, 2019

Telegraf is an agent for collecting, processing, aggregating, and writing metrics. With over 200 plugins, Telegraf can fetch metrics from a variety of sources, allowing you to build aggregations and write those metrics to InfluxDB, Prometheus, Kafka, and more.

In this talk, we look at some of the lesser-known but awesome plugins that are often overlooked, as well as how to use Telegraf to monitor cloud native systems.

Transcript

  1. The Telegraf Toolbelt
    It can do that, really?

  2. David McKay
    Developer Advocate
    at InfluxData
    @rawkode
    Scottish
    Has 9 Pets
    Esoteric Programming Languages
    ☸ Kubernetes Release Team

  3. The Plugin Ecosystem
    Establishing a Base Line

  4. © 2019 InfluxData. All rights reserved. 4
    Over 200 Plugins
    ➔ 169 Inputs
    ➔ 35 Outputs
    ➔ 15 Processors
    ➔ 14 Parsers
    ➔ 9 Serializers
    ➔ 8 Aggregators

  5. V1.10 (May)
    ➔ Inputs
    ◆ Google Cloud PubSub
    ◆ Kinesis Consumer
    ◆ Kube Inventory
    ◆ Neptune Apex
    ◆ Nginx Upstream Checks
    ◆ Multifile
    ◆ Stackdriver
    ➔ Outputs
    ◆ Google Cloud PubSub
    ➔ Serializers
    ◆ Nowmetric
    ◆ Carbon2

  6. V1.11 (June)
    ➔ Inputs
    ◆ bind9
    ◆ Cisco gNMI
    ◆ Cisco MDT
    ◆ ECS & Fargate
    ◆ GitHub
    ◆ OpenWeatherMap
    ◆ PowerDNS
    ➔ Outputs
    ◆ Health
    ◆ Syslog
    ➔ Serializers
    ◆ Wavefront
    ➔ Aggregators
    ◆ Final

  7. V1.12 (September)
    ➔ Inputs
    ◆ apcupsd
    ◆ Docker Logs
    ◆ Fireboard
    ◆ Logstash
    ◆ MarkLogic
    ◆ OpenNTPD
    ◆ uWSGI
    ➔ Outputs
    ◆ Exec
    ➔ Parsers
    ◆ Form
    ➔ Processors
    ◆ Date
    ◆ Pivot
    ◆ Unpivot
    ◆ Tag Limit

  9. There are 3249
    telegraf.conf files on
    GitHub

  11. I could only get 1000
    telegraf.conf files from
    GitHub

  12. I used a sample of 1000
    telegraf.conf files from
    GitHub

  13. Interval
    73% use 10s (default)
    5.6% use 1s
    4% use 5s
    2% use 1m
    1% use 30s

  14. Round
    Interval
    90% True

  15. Jitter 90% None

  16. Omit
    Hostname
    90% False

  17. Output Plugins
    ➔ 71% 1 Output
    ➔ 5% 2 Outputs
    ➔ 2% 0 Outputs
    ➔ 0.6% 3 Outputs

  18. Output Plugins
    ➔ 72% InfluxDB
    ➔ 5% File
    ➔ 2% Prometheus Client
    ➔ 0.9% Graphite
    ➔ 0.6% Kafka

  19. Input Plugins
    ➔ 17% 1 Input
    ➔ 12% 9 Inputs
    ➔ 10% 8 Inputs
    ➔ 5% 10 Inputs
    ➔ 5% 11 Inputs
    ➔ 5% 6 Inputs
    ➔ 5% 7 Inputs
    ➔ 1 file: 56 Inputs ?!?!

  20. Input Plugins
    ➔ 58% CPU
    ➔ 53% Mem
    ➔ 52% Disk
    ➔ 51% System
    ➔ 47% DiskIO
    ➔ 47% Swap
    ➔ 40% Process
    ➔ 31% Kernel
    ➔ 28% Docker
    ➔ 23% Net

  21. 12% Invalid

  22. Dual Write to InfluxDB 1 and 2
    Multiple Outputs
    5.6%

  23. Multiple Outputs
    [[outputs.influxdb]]
      urls = ["http://influxdb:8086"]
    [[outputs.influxdb_v2]]
      urls = ["http://influxdb2:9999"]

  24. Fetching Configuration at Runtime
    v1.10
    Remote Configuration

  25. Remote Configuration
    telegraf --config

  26. Remote Configuration
    telegraf --config https://raw.githubusercontent.com/influxdata/telegraf/master/etc/telegraf.conf

  27. Remote Configuration
    SOME_VAR=abc123 telegraf --config

  28. Remote Configuration
    [agent]
      interval = "${INTERVAL}"
    [[outputs.influxdb_v2]]
      token = "${TOKEN}"
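
The idea on this slide, placeholders in the config that are filled from environment variables at startup, can be sketched in a few lines of Python. This is an illustration of the substitution concept only, not Telegraf's implementation: the `expand_env` helper, the regular expression, and the sample values `10s` and `abc123` are assumptions made for the example, and Telegraf's handling of unset variables may differ.

```python
import re

# Matches ${NAME}; only the braced form is handled in this sketch.
_VAR = re.compile(r"\$\{(\w+)\}")


def expand_env(text, env):
    """Replace each ${NAME} in text with env[NAME]; unknown names are left as-is."""
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), text)


config = '''[agent]
  interval = "${INTERVAL}"
[[outputs.influxdb_v2]]
  token = "${TOKEN}"
'''

# Hypothetical values; in practice these would come from os.environ.
rendered = expand_env(config, {"INTERVAL": "10s", "TOKEN": "abc123"})
print(rendered)
```

Keeping secrets such as the token out of the file means the same remote config can be shared across environments, with each host supplying its own values.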

  29. Handling failures without losing metrics
    v0.10 (2015!)
    Output Resiliency

  30. Output Resiliency
    Use metric_buffer_limit to handle downtime of your outputs
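
To make the trade-off behind `metric_buffer_limit` concrete, here is a minimal Python sketch of a bounded buffer. It assumes the behaviour described in Telegraf's configuration docs, where the oldest metrics are dropped first once the buffer is full. The `MetricBuffer` class, its method names, and the limit of 3 are hypothetical and exist only for this example.

```python
from collections import deque


class MetricBuffer:
    """Bounded buffer: when full, the oldest metric is dropped to make room."""

    def __init__(self, limit):
        self._items = deque(maxlen=limit)
        self.dropped = 0

    def add(self, metric):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # deque discards the oldest entry on append
        self._items.append(metric)

    def flush(self, write):
        """Try to write everything; keep the metrics if the write fails."""
        batch = list(self._items)
        if batch and write(batch):
            self._items.clear()
            return len(batch)
        return 0


buf = MetricBuffer(limit=3)
for i in range(5):  # the output is "down", so nothing is flushed yet
    buf.add(f"cpu value={i}")

sent = []


def write_ok(batch):
    """Stand-in for an output that has come back up."""
    sent.extend(batch)
    return True


flushed = buf.flush(write_ok)
```

With a limit of 3 and five metrics collected during the outage, the two oldest are lost. A larger limit survives longer downtime at the cost of more memory.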

  31. [Diagram labels: InfluxDB, Telegraf, Application, Software Dependencies, Bare Metal / VMs, Network, Buffer, Batch]

  32. [Diagram labels: Telegraf, Application, Software Dependencies, Bare Metal / VMs, Network, Buffer, Batch]

  33. [Diagram labels: InfluxDB, Telegraf, Application, Software Dependencies, Bare Metal / VMs, Network, Buffer, Batch]

  34. Output Resiliency
    influxdb_listener
    http_listener
    v1.9 / 0.01% / It was me!

  35. Output Resiliency
    influxdb_listener
    Allows Telegraf to serve as a proxy for the /write endpoint of the InfluxDB HTTP API.

  36. influxdb_listener
    [[inputs.influxdb_listener]]
    service_address = ":8086"
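
Because the listener speaks the InfluxDB `/write` API, an application can send line protocol to Telegraf instead of directly to InfluxDB. The sketch below shows one way that might look from Python. The helper names, the `localhost` URL, and the `db=telegraf` query value are placeholders chosen for the example, and escaping of special characters in line protocol is deliberately left out.

```python
import urllib.request


def _field(value):
    """Render a field value in line protocol (bool, int, float, or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point; escaping of spaces and commas is omitted in this sketch."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


def write(line, url="http://localhost:8086/write?db=telegraf"):
    """POST one line to the listener. Needs a running Telegraf, so it is not called here."""
    req = urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


line = to_line_protocol("requests", {"host": "web-1"}, {"count": 3, "latency": 0.25})
print(line)
```

Pointing existing clients at Telegraf this way lets its buffering and batching sit between the application and the database.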

  38. v1 Client Libraries

  39. Output Resiliency
    http_listener_v2
    Allows Telegraf to accept metrics over HTTP in any supported format

  40. http_listener_v2
    [[inputs.http_listener_v2]]
    service_address = ":8080"
    data_format = "json"

  41. JSON
    [[inputs.http_listener_v2]]
    data_format = "json"
    json_name_key = "name"
    tag_keys = ["go_version"]
    Docs
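
As a rough picture of what the `json_name_key` and `tag_keys` options on this slide do, the following Python sketch maps a flat JSON object to a measurement name, tags, and fields. It is a simplified approximation and not Telegraf's JSON parser, which has more rules (nested objects, string fields, timestamps). The `json_to_metric` function and the sample payload are assumptions made for illustration.

```python
import json


def json_to_metric(payload, name_key="name", tag_keys=("go_version",)):
    """Map a flat JSON object to (measurement, tags, fields)."""
    doc = json.loads(payload)
    name = doc.pop(name_key)
    tags = {k: str(doc.pop(k)) for k in tag_keys if k in doc}
    # Keep numeric values as fields; everything else is skipped in this sketch.
    fields = {
        k: float(v)
        for k, v in doc.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    return name, tags, fields


# Hypothetical payload an application might POST to the listener.
body = json.dumps(
    {"name": "app_stats", "go_version": "1.13", "goroutines": 42, "note": "ignored"}
)
metric = json_to_metric(body)
print(metric)
```

Choosing which keys become tags matters because tags are indexed and used for grouping, while fields carry the measured values.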

  42. Cloud Native Telegraf
    Bring Your Own Telegraf

  43. Plugins come at a cost

  44. ➜ docker image ls
    byot-sample 19.3MB
    telegraf 254MB

  45. Bring Your Own Telegraf
    FROM rawkode/telegraf:byo AS build
    FROM alpine:3.7 AS telegraf
    COPY --from=build /etc/telegraf /etc/telegraf
    COPY --from=build /binary /bin/telegraf
    ENTRYPOINT [ "/bin/telegraf" ]

  46. Bring Your Own Telegraf
    BYOT
    BYOT Example
    Click Me! Click Me!

  47. [Diagram labels: InfluxDB, Telegraf, Application, Software Dependencies, Bare Metal / VMs, Network, Buffer, Batch]

  48. [Diagram labels: InfluxDB, Application, Software Dependencies, Bare Metal / VMs, Network, Telegraf (×4)]

  49. Internal
    Telegraf Exposes Metrics Too!
    v1.2 / 0.1%

  50. Internal
    ➔ Keep an eye on buffer_size
    ➔ Make sure you alert on metrics_dropped

  51. Sophisticated Health Checks
    Health Output
    v1.11 / 0%

  52. @app.route("/health")
    def healthcheck():
        return "OK"

  53. @app.route("/health")
    def healthcheck():
        return "OK"

  54. Health Output
    [[outputs.health]]
    service_address = "http://:5559"
    namepass = ["web-metrics"]
    [[outputs.health.compares]]
    field = "response_time"
    lt = 0.300
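
Unlike the hard-coded "OK" endpoint shown earlier, this health output reports on the metrics themselves. The Python sketch below imitates the slide's `compares` check: the endpoint would answer 200 when every `response_time` is below 0.300 and 503 otherwise, which matches the status codes documented for the plugin. The function names are hypothetical, and skipping metrics that lack the field is an assumption of this sketch.

```python
import operator

# Comparison names modelled on the option names used in the config (lt, gt, ...).
OPS = {
    "gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
    "le": operator.le, "eq": operator.eq, "ne": operator.ne,
}


def compares_ok(metrics, field, **conditions):
    """True if every metric carrying `field` satisfies all conditions."""
    for fields in metrics:
        if field not in fields:
            continue  # assumption: metrics without the field do not fail the check
        for name, limit in conditions.items():
            if not OPS[name](fields[field], limit):
                return False
    return True


def status_code(metrics):
    """Mirror the slide's check: response_time must be below 0.300."""
    return 200 if compares_ok(metrics, "response_time", lt=0.300) else 503


healthy = status_code([{"response_time": 0.120}])
degraded = status_code([{"response_time": 0.120}, {"response_time": 0.450}])
```

A load balancer or orchestrator probing this endpoint can then take a slow service out of rotation, not only one that has stopped responding.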

  55. Controlling Ingress
    Kafka In, Kafka Out

  56. [Diagram labels: InfluxDB, Application, Software Dependencies, Bare Metal / VMs, Network, Telegraf (×4), Kafka, Telegraf (×2)]

  57. Demo

  58. We’ve covered less than
    10 of the 200 plugins

  59. @rawkode
    Thank you
