Rambling about InfluxDB, ntop and Kubernetes

InfluxDB and ntop on Kubernetes
Gianluca Arbezzano, SRE at InfluxData

Gianluca Arbezzano
Site Reliability Engineer @InfluxData
• https://gianarb.it
• @gianarb

What I like:
• I make dirty hacks that look awesome
• I grow my vegetables
• Travel for fun and work

@gianarb - gianluca@influxdb.com

What is going on right now in production?

1. You know! Your team knows and uses Docker for local development and testing.
2. Kubernetes! Everyone talks about Kubernetes.
3. Hire! You don't know why, but you hired a DevOps engineer who kind of knows k8s.
4. Excitement! You are moving everything and everyone to Kubernetes.

Inspired by a true story

You need a good book
1. Short
2. Driven by experience
3. Practical
4. Easy


Kubernetes Resources
• Pod
• DaemonSet
• StatefulSet
• Service
• Deployment
• Persistent Volumes
• ConfigMap
• Secret

InfluxDB StatefulSet

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
    spec:
      serviceName: influxdb
      selector:
        matchLabels:
          component: influxdb
      replicas: 1
      template:
        metadata:
          name: influxdb
          labels:
            app: influxdb
            # The pod labels must include the selector label above,
            # otherwise the API server rejects the StatefulSet.
            component: influxdb
        spec:
          containers:
          - name: influxdb
            image: quay.io/influxdb/influxdb:nightly
            volumeMounts:
            - name: data
              mountPath: /var/lib/influxdb
            ports:
            - containerPort: 8086
              name: server
      volumeClaimTemplates:
      - metadata:
          namespace: monitoring
          name: data
        spec:
          storageClassName: ebs-1
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: "250Gi"

InfluxDB Service

    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
        app: influxdb
    spec:
      clusterIP: None
      ports:
      - port: 8086
        name: server
      selector:
        component: influxdb
      publishNotReadyAddresses: true
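Because this Service is headless (clusterIP: None) and the StatefulSet points at it through serviceName, each replica gets a stable DNS name. The sketch below only shows how that name is composed. The function name is made up for illustration, and the cluster domain cluster.local is the Kubernetes default, assumed here because the slides do not state it.

```python
def statefulset_pod_dns(statefulset: str, ordinal: int, service: str,
                        namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the stable DNS name of one StatefulSet replica behind a headless Service."""
    # StatefulSet pods are named <statefulset-name>-<ordinal>, starting at 0.
    pod_name = f"{statefulset}-{ordinal}"
    # cluster_domain is an assumption: "cluster.local" is only the default value.
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    print(statefulset_pod_dns("influxdb", 0, "influxdb", "monitoring"))
```

With replicas: 1 there is a single pod, influxdb-0, so clients inside the cluster can address it by that name or through the Service name itself.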

Chronograf Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: monitoring
      name: chronograf
      labels:
        app: chronograf
    spec:
      selector:
        matchLabels:
          app: chronograf
      replicas: 1
      template:
        metadata:
          name: chronograf
          labels:
            app: chronograf
        spec:
          containers:
          - name: chronograf
            image: chronograf:1.7.4
            ports:
            - containerPort: 8888
              name: server

Chronograf Service

    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: chronograf
    spec:
      ports:
      - port: 80
        targetPort: 8888
        name: server
      selector:
        app: chronograf

Telegraf ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf-config-ntopng
      namespace: monitoring
    data:
      telegraf.conf: |
        [[outputs.influxdb]]
          urls = ["$MONITOR_HOST"]
          database = "ntopng"
          retention_policy = "weekly"
          write_consistency = "any"
          timeout = "5s"
          username = "$MONITOR_USERNAME"
          password = "$MONITOR_PASSWORD"

        [[inputs.influxdb_listener]]
          service_address = ":8086"

        [global_tags]
          env = "$ENV"
          hostname = "$HOSTNAME"

ntopng ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ntopng-runtimeprefs
      namespace: monitoring
    data:
      runtimeprefs.json: |+
        {
          "ntopng.prefs.influx_username": "00000800E4B6B0FFA66628E2",
          "ntopng.prefs.influx_dbname": "00066E746F706E670800C3BD27A9CC635D8C",
          "ntopng.prefs.influx_password": "00000800E4B6B0FFA66628E2",
          "ntopng.prefs.influx_retention": "00C0070800CDE0A41D2AB244AA",
          "ntopng.prefs.influx_auth_enabled": "00C0000800216D4BC751498930",
          "ntopng.prefs.ts_post_data_url": "0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
          "ntopng.prefs.timeseries_driver": "0008696E666C7578646208002B6EA2EF661124EC"
        }
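The hex values in runtimeprefs.json are not self-explanatory. Judging only from the values on the slide, they look like Redis DUMP-style payloads: a type byte, a one-byte length, the string itself, then a 2-byte version and an 8-byte checksum. That layout is an inference, not something the slides or ntopng documentation state, and the function name below is made up for illustration. Under that assumption, a minimal decoder for the plain string values could look like this:

```python
from typing import Optional


def decode_pref(hex_value: str) -> Optional[str]:
    """Decode one runtimeprefs.json value under an ASSUMED layout.

    Assumed layout (inferred from the slide, not from ntopng docs):
      byte 0       : type, 0x00 for a string
      byte 1       : payload length, only when below 0x40
      next N bytes : payload
      last 10 bytes: 2-byte version plus 8-byte checksum (ignored here)
    Returns None for anything that does not fit, for example the
    values that start with 00C0, which are not plain strings.
    """
    raw = bytes.fromhex(hex_value.replace(" ", ""))
    if len(raw) < 12 or raw[0] != 0x00 or raw[1] >= 0x40:
        return None
    length = raw[1]
    if len(raw) != 2 + length + 10:
        return None
    return raw[2:2 + length].decode("utf-8")


if __name__ == "__main__":
    print(decode_pref("0008696E666C7578646208002B6EA2EF661124EC"))
    print(decode_pref("0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346"))
```

Read this way, ntopng is configured to post its timeseries to the Telegraf sidecar listening on localhost, not to InfluxDB directly. The checksum is not verified, so treat the decoder as a reading aid only.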

ntopng DaemonSet

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ntopng
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          name: ntopng
      template:
        metadata:
          labels:
            name: ntopng
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: ntopng
          - name: telegraf

ntopng DaemonSet: containers

    - name: ntopng
      image: ntop/ntopng:stable
      args: ["-U", "root"]
      resources:
        limits:
          memory: 750Mi
        requests:
          memory: 250Mi
      env:
      - name: HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
      volumeMounts:
      - name: config
        mountPath: /var/tmp/ntopng/runtimeprefs.json
        subPath: runtimeprefs.json
    - name: telegraf
      image: docker.io/library/telegraf:1.9
      volumeMounts:
      - name: telegraf
        mountPath: /etc/telegraf
      env:
      - name: MONITOR_HOST
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_host
      - name: MONITOR_USERNAME
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_username
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName

ntopng DaemonSet

ntop identifies hosts by IP address; in our case, the container IPs. I also want to correlate by hostname and environment, so Telegraf sits in front of InfluxDB as a proxy and adds those tags to every point.

    [global_tags]
      env = "$ENV"
      hostname = "$HOSTNAME"
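To make the effect of the tags concrete, here is a rough sketch of what the proxy step does to a single point in InfluxDB line protocol. It is not Telegraf's implementation. The function names and the sample measurement are made up, and leaving an already present tag untouched is a choice of this sketch.

```python
import re
from typing import Dict


def escape_tag(text: str) -> str:
    """Backslash-escape commas, equals signs and spaces, which are special in tags."""
    return re.sub(r"([,= ])", r"\\\1", text)


def add_global_tags(line: str, tags: Dict[str, str]) -> str:
    """Append extra tags to one line-protocol point.

    Simplification: tag keys that contain an escaped '=' are not handled.
    """
    # The first unescaped space separates "measurement,tags" from "fields [timestamp]".
    head, fields_and_ts = re.split(r"(?<!\\) ", line, maxsplit=1)
    # Tag keys already on the point; the first element is the measurement name.
    existing = {part.split("=", 1)[0] for part in re.split(r"(?<!\\),", head)[1:]}
    extra = [
        f"{escape_tag(key)}={escape_tag(value)}"
        for key, value in tags.items()
        if escape_tag(key) not in existing  # sketch choice: keep the point's own tag
    ]
    return ",".join([head, *extra]) + " " + fields_and_ts


if __name__ == "__main__":
    point = "traffic,ifid=0 bytes=10i 1546300800000000000"  # made-up sample point
    print(add_global_tags(point, {"env": "prod", "hostname": "node-1"}))
```

Once every point carries env and hostname, a query can group or filter network metrics per node and per environment instead of per IP.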

Now you can query your data from Chronograf

You can correlate network metrics with other signals

github.com/kubernetes/kube-state-metrics

github.com/kubernetes-incubator/metrics-server

Through the Metrics API you can get the amount of resources currently used by a given node or pod. The API does not store metric values, so you cannot ask, for example, how many resources a node was using 10 minutes ago.
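As an illustration of "current usage only", the sketch below parses a node list in the shape served under /apis/metrics.k8s.io/v1beta1/nodes. The payload is a hand-written sample, not real cluster output, the helper names are made up, and only the quantity suffixes listed in the code are handled.

```python
import json
from typing import Dict, Tuple

# Hand-written sample in the shape of a NodeMetricsList; values are invented.
SAMPLE = """
{"kind": "NodeMetricsList", "apiVersion": "metrics.k8s.io/v1beta1",
 "items": [{"metadata": {"name": "node-1"},
            "timestamp": "2019-01-01T00:00:00Z", "window": "30s",
            "usage": {"cpu": "250m", "memory": "2048Ki"}}]}
"""

CPU_DIVISORS = {"n": 10**9, "u": 10**6, "m": 10**3}
MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' to cores (only n, u, m suffixes)."""
    if quantity[-1] in CPU_DIVISORS:
        return int(quantity[:-1]) / CPU_DIVISORS[quantity[-1]]
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '2048Ki' to bytes (only Ki, Mi, Gi suffixes)."""
    if quantity[-2:] in MEMORY_FACTORS:
        return int(quantity[:-2]) * MEMORY_FACTORS[quantity[-2:]]
    return int(quantity)


def node_usage(payload: str) -> Dict[str, Tuple[float, int]]:
    """Map node name to (cores in use, bytes in use) at the time of the request."""
    usage = {}
    for item in json.loads(payload)["items"]:
        name = item["metadata"]["name"]
        usage[name] = (parse_cpu(item["usage"]["cpu"]),
                       parse_memory(item["usage"]["memory"]))
    return usage


if __name__ == "__main__":
    print(node_usage(SAMPLE))
```

Each response carries one timestamp and one window per item. To see history you have to store the samples yourself, which is where InfluxDB comes in.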

    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"} 3 1395066363000

    # Escaping in label values:
    msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

    # Minimalistic line:
    metric_without_timestamp_and_labels 12.47

    # A weird metric from before the epoch:
    something_weird{problem="division by zero"} +Inf -3982045

    # A histogram, which has a pretty complex representation in the text format:
    # HELP http_request_duration_seconds A histogram of the request duration.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.05"} 24054
    http_request_duration_seconds_bucket{le="0.1"} 33444
    http_request_duration_seconds_bucket{le="0.2"} 100392
    http_request_duration_seconds_bucket{le="0.5"} 129389
    http_request_duration_seconds_bucket{le="1"} 133988
    http_request_duration_seconds_bucket{le="+Inf"} 144320
    http_request_duration_seconds_sum 53423
    http_request_duration_seconds_count 144320
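The text format above is simple enough to read by hand: a metric name, optional labels in braces, a value, and an optional timestamp in milliseconds. The sketch below parses one sample line under those rules. It is an illustration with made-up names, not a full parser, and it ignores HELP and TYPE metadata.

```python
import re
from typing import Dict, NamedTuple, Optional

SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(.*)\})?'                 # optional label set
    r'\s+(\S+)'                      # value
    r'(?:\s+(-?\d+))?$'              # optional timestamp (ms since epoch)
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: Optional[int]


def unescape(value: str) -> str:
    """Undo the \\\\, \\" and \\n escapes allowed in label values."""
    return re.sub(r'\\(.)', lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value, timestamp = match.groups()
    labels = {k: unescape(v) for k, v in LABEL_RE.findall(raw_labels or "")}
    return Sample(name, labels, float(value), int(timestamp) if timestamp else None)


if __name__ == "__main__":
    print(parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000'))
```

Python's float() accepts +Inf, so the "weird metric" line parses as well, including its negative timestamp.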

ntop can be the right tool, but we need:
• Better configuration via file and/or environment variables
• It should be "Kubernetes aware", because IPs are not the right way to identify containers: they change too often
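One possible reading of "Kubernetes aware" is to translate IPs into pod names at collection time. The sketch below builds such a lookup from a pod list in the JSON shape printed by kubectl get pods --all-namespaces -o json. It is not an ntopng feature. The sample data and the function name are invented.

```python
import json
from typing import Dict

# Invented sample in the shape of a Kubernetes PodList.
SAMPLE = """
{"kind": "List", "items": [
  {"metadata": {"namespace": "monitoring", "name": "chronograf-abc12"},
   "spec": {}, "status": {"podIP": "10.2.0.15"}},
  {"metadata": {"namespace": "monitoring", "name": "ntopng-x7k2p"},
   "spec": {"hostNetwork": true}, "status": {"podIP": "192.168.1.10"}},
  {"metadata": {"namespace": "default", "name": "pending-pod"},
   "spec": {}, "status": {}}
]}
"""


def index_pods_by_ip(pod_list_json: str) -> Dict[str, str]:
    """Map pod IP to 'namespace/name'.

    Pods without an IP yet are skipped. Pods on the host network are skipped
    too, because they share the node's IP and cannot be told apart by address.
    """
    index = {}
    for pod in json.loads(pod_list_json).get("items", []):
        ip = pod.get("status", {}).get("podIP")
        if not ip or pod.get("spec", {}).get("hostNetwork"):
            continue
        meta = pod["metadata"]
        index[ip] = f'{meta["namespace"]}/{meta["name"]}'
    return index


if __name__ == "__main__":
    print(index_pods_by_ip(SAMPLE))
```

The lookup goes stale as soon as a pod is rescheduled, so it would need to be refreshed continuously. That is the reason the mapping belongs inside the tool rather than in a one-off script.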

Monitor your monitoring infrastructure

The monitoring infrastructure should notify you when you are down.

The monitoring infrastructure should be as far as possible from the environment you are monitoring.

• Different cloud provider
• Different datacenter
• Different company
• Different region
• Different teams
• As different as you can handle

Reach out:
@gianarb
gianluca@influxdb.com
https://gianarb.it

Any questions?

© 2018 InfluxData. All rights reserved.
