Rambling about InfluxDB, ntop and Kubernetes

Gianluca spoke at the ntop conference in Padova (Italy). He is not a network engineer: he more or less started his journey with AWS, and it got even worse with containers. He shared his view on breaking silos between DevOps, developers, and sysadmins via observability. He also hacked ntopng into a Kubernetes deployment, even if it is not 100% ready yet!

Gianluca Arbezzano

May 08, 2019

Transcript

  1. 2.

    Gianluca Arbezzano Site Reliability Engineer @InfluxData • https://gianarb.it • @gianarb

    What I like: • I make dirty hacks that look awesome • I grow my vegetables • Travel for fun and work
  2. 6.

    1. You know! Your team knows and uses Docker for local development and testing.
    2. Kubernetes! Everyone speaks about Kubernetes.
    3. Hire! You don’t know why, but you hired a DevOps who kind of knows k8s.
    4. Excitement! You are moving everything and everyone to Kubernetes.
  3. 9.
  4. 10.

    You need a good book

    1. Short
    2. Driven by experiences
    3. Practical
    4. Easy
  5. 13.

    Kubernetes Resources

    • Pod
    • DaemonSet
    • StatefulSet
    • Service
    • Deployment
    • Persistent Volumes
    • ConfigMap
    • Secret
  6. 14.

    InfluxDB StatefulSet

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
    spec:
      serviceName: influxdb
      selector:
        matchLabels:
          component: influxdb
      replicas: 1
      template:
        metadata:
          name: influxdb
          labels:
            app: influxdb
            # Added: the pod template must carry the label used by
            # spec.selector (and by the Service selector).
            component: influxdb
        spec:
          containers:
          - name: influxdb
            image: quay.io/influxdb/influxdb:nightly
            volumeMounts:
            - name: data
              mountPath: /var/lib/influxdb
            ports:
            - containerPort: 8086
              name: server
      volumeClaimTemplates:
      - metadata:
          namespace: monitoring
          name: data
        spec:
          storageClassName: ebs-1
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: "250Gi"
  7. 15.

    InfluxDB Service

    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
        app: influxdb
    spec:
      clusterIP: None
      ports:
      - port: 8086
        name: server
      selector:
        component: influxdb
      publishNotReadyAddresses: true
  8. 16.

    Chronograf Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: monitoring
      name: chronograf
      labels:
        app: chronograf
    spec:
      selector:
        matchLabels:
          app: chronograf
      replicas: 1
      template:
        metadata:
          name: chronograf
          labels:
            app: chronograf
        spec:
          containers:
          - name: chronograf
            image: chronograf:1.7.4
            ports:
            - containerPort: 8888
              name: server
  9. 17.

    Chronograf Service

    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: chronograf
    spec:
      ports:
      - port: 80
        targetPort: 8888
        name: server
      selector:
        app: chronograf
  10. 18.

    Telegraf ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf-config-ntopng
      namespace: monitoring
    data:
      telegraf.conf: |
        [[outputs.influxdb]]
          urls = ["$MONITOR_HOST"]
          database = "ntopng"
          retention_policy = "weekly"
          write_consistency = "any"
          timeout = "5s"
          username = "$MONITOR_USERNAME"
          password = "$MONITOR_PASSWORD"
        [[inputs.influxdb_listener]]
          service_address = ":8086"
        [global_tags]
          env = "$ENV"
          hostname = "$HOSTNAME"
  11. 19.

    ntopng ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ntopng-runtimeprefs
      namespace: monitoring
    data:
      runtimeprefs.json: |+
        {"ntopng.prefs.influx_username":"00000800E4B6B0FFA66628E2",
         "ntopng.prefs.influx_dbname":"00066E746F706E670800C3BD27A9CC635D8C",
         "ntopng.prefs.influx_password":"00000800E4B6B0FFA66628E2",
         "ntopng.prefs.influx_retention":"00C0070800CDE0A41D2AB244AA",
         "ntopng.prefs.influx_auth_enabled":"00C0000800216D4BC751498930",
         "ntopng.prefs.ts_post_data_url":"0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
         "ntopng.prefs.timeseries_driver":"0008696E666C7578646208002B6EA2EF661124EC"}
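The hex strings in runtimeprefs.json look like Redis DUMP payloads: a type byte, a length-prefixed value, a two-byte RDB version, and an eight-byte checksum. That reading is an assumption inferred from the byte layout, not something the slides state. Under that assumption, a minimal Python sketch can decode them; `decode_pref` is a hypothetical helper that handles only the two encodings appearing on this slide and does not verify the checksum.

```python
def decode_pref(hex_value: str):
    """Decode one runtimeprefs.json value, ASSUMING a Redis DUMP layout.

    Assumed layout (inferred from the bytes, not confirmed by the slides):
    1 type byte (0x00 = string), the encoded value, then a 10-byte footer
    (2-byte RDB version + 8-byte CRC64). The footer is skipped, not checked.
    """
    raw = bytes.fromhex(hex_value.replace(" ", ""))
    if len(raw) < 12 or raw[0] != 0x00:
        raise ValueError("not a string-typed payload")
    body = raw[1:-10]  # drop the type byte and the footer
    first = body[0]
    if first >> 6 == 0b00:
        # Top two bits 00: the low six bits are the string length.
        length = first & 0x3F
        return body[1:1 + length].decode("utf-8")
    if first == 0xC0:
        # Special encoding: the value is an 8-bit signed integer.
        return int.from_bytes(body[1:2], "little", signed=True)
    raise ValueError(f"unsupported encoding byte 0x{first:02X}")


# Values copied from the ConfigMap on this slide.
prefs = {
    "ntopng.prefs.influx_dbname": "00066E746F706E670800C3BD27A9CC635D8C",
    "ntopng.prefs.influx_retention": "00C0070800CDE0A41D2AB244AA",
    "ntopng.prefs.ts_post_data_url":
        "0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
    "ntopng.prefs.timeseries_driver": "0008696E666C7578646208002B6EA2EF661124EC",
}

for key, value in prefs.items():
    print(key, "=", repr(decode_pref(value)))
```

Read this way, ntopng is configured to use the `influxdb` timeseries driver, to post to `http://localhost:8086` (the Telegraf sidecar listening in the same pod), and to write to the `ntopng` database with a retention of 7.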
  12. 20.

    ntopng daemonset

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ntopng
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          name: ntopng
      template:
        metadata:
          labels:
            name: ntopng
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: ntopng
          - name: telegraf
  13. 21.

    Ntopng daemonset

    - name: ntopng
      image: ntop/ntopng:stable
      args: ["-U", "root"]
      resources:
        limits:
          memory: 750Mi
        requests:
          memory: 250Mi
      env:
      - name: HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
      volumeMounts:
      - name: config
        mountPath: /var/tmp/ntopng/runtimeprefs.json
        subPath: runtimeprefs.json
    - name: telegraf
      image: docker.io/library/telegraf:1.9
      volumeMounts:
      - name: telegraf
        mountPath: /etc/telegraf
      env:
      - name: MONITOR_HOST
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_host
      - name: MONITOR_USERNAME
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_username
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
  14. 22.

    Ntopng daemonset

    ntop uses IPs as identifiers, in our case the container IPs. But I would like to correlate by hostname and environment as well, so Telegraf, acting as a proxy, adds those tags to every point.

    [global_tags]
      env = "$ENV"
      hostname = "$HOSTNAME"
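To make the effect of `[global_tags]` concrete, here is a minimal Python sketch of what the proxy step amounts to for a single point in InfluxDB line protocol. It is an illustration, not Telegraf's implementation: `add_global_tags` is a hypothetical helper, the measurement and tag values are made up, and it assumes the incoming point does not already carry the same tag keys.

```python
def _escape_tag(text: str) -> str:
    """Escape the characters that are special in line-protocol tags."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def add_global_tags(line: str, tags: dict) -> str:
    """Append tags to one line-protocol point.

    A point looks like: measurement[,tag=value...] field=value [timestamp]
    The tag set ends at the first space that is not backslash-escaped.
    """
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if line[i] == " ":
            break
        i += 1
    head, rest = line[:i], line[i:]
    extra = "".join(
        f",{_escape_tag(key)}={_escape_tag(value)}"
        for key, value in sorted(tags.items())
    )
    return head + extra + rest


# Hypothetical point as ntopng might post it, plus hypothetical tag values.
point = "interface,ifid=0 bytes=1024i 1557300000000000000"
print(add_global_tags(point, {"env": "staging", "hostname": "node-1"}))
```

With the tags in place, queries can group or filter by `hostname` and `env` instead of relying on IPs that change whenever a container is rescheduled.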
  15. 24.
  16. 28.

    © 2018 InfluxData. All rights reserved. 28 @gianarb - gianluca@influxdb.com

    github.com/kubernetes-incubator/metrics-server

    Through the Metrics API you can get the amount of resources currently used by a given node or a given pod. This API doesn’t store the metric values, so it is not possible, for example, to get the amount of resources a given node was using 10 minutes ago.
  17. 29.

    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"} 3 1395066363000

    # Escaping in label values:
    msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

    # Minimalistic line:
    metric_without_timestamp_and_labels 12.47

    # A weird metric from before the epoch:
    something_weird{problem="division by zero"} +Inf -3982045

    # A histogram, which has a pretty complex representation in the text format:
    # HELP http_request_duration_seconds A histogram of the request duration.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.05"} 24054
    http_request_duration_seconds_bucket{le="0.1"} 33444
    http_request_duration_seconds_bucket{le="0.2"} 100392
    http_request_duration_seconds_bucket{le="0.5"} 129389
    http_request_duration_seconds_bucket{le="1"} 133988
    http_request_duration_seconds_bucket{le="+Inf"} 144320
    http_request_duration_seconds_sum 53423
    http_request_duration_seconds_count 144320
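The histogram is the hardest part of this format to read because the `le` buckets are cumulative: each one counts every request at or below its bound. A minimal Python sketch, using the numbers from the slide, shows how to recover per-bucket counts and the mean duration; `per_bucket_counts` is a hypothetical helper name.

```python
# Cumulative bucket values copied from the slide: (upper bound, count).
buckets = [
    ("0.05", 24054),
    ("0.1", 33444),
    ("0.2", 100392),
    ("0.5", 129389),
    ("1", 133988),
    ("+Inf", 144320),
]
total_sum = 53423      # http_request_duration_seconds_sum
total_count = 144320   # http_request_duration_seconds_count


def per_bucket_counts(cumulative):
    """Turn cumulative `le` buckets into counts per individual bucket."""
    counts = []
    previous = 0
    for upper_bound, value in cumulative:
        counts.append((upper_bound, value - previous))
        previous = value
    return counts


counts = per_bucket_counts(buckets)
mean_seconds = total_sum / total_count

for upper_bound, count in counts:
    print(f"le={upper_bound}: {count} requests")
print(f"mean duration: {mean_seconds:.3f}s")
```

The `+Inf` bucket always equals `_count`, which is why the per-bucket counts add up to 144320.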
  18. 30.

    Ntop can be the right tool, but we need:

    • Better configuration via file and/or environment variables
    • It should be “Kubernetes aware”, because IPs are not the right way to identify containers: they change too often
  19. 33.

    The monitoring infrastructure should be as far as possible from the environment you are monitoring.
  20. 34.

    • Different cloud provider
    • Different datacenter
    • Different company
    • Different region
    • Different teams
    • As different as you can handle
  21. 35.

    Reach out: @gianarb • gianluca@influxdb.com • https://gianarb.it

    Any questions?