Rambling about InfluxDB, ntop and Kubernetes

Gianluca spoke at the ntop conference in Padova (Italy). He is not a network engineer: he started his journey pretty much with AWS and now, even worse, with containers. He shared his view on breaking down the silos between DevOps, developers, and sysadmins through observability. He hacked together an ntopng deployment on Kubernetes, even though ntopng is not 100% ready for it yet!

Gianluca Arbezzano

May 08, 2019

Transcript

  1. InfluxDB and ntop on Kubernetes
    Gianluca Arbezzano, SRE at InfluxData

  2. Gianluca Arbezzano
    Site Reliability Engineer @InfluxData
    ● https://gianarb.it
    ● @gianarb
    What I like:
    ● Making dirty hacks that look awesome
    ● Growing my own vegetables
    ● Traveling for fun and work

  3. @gianarb - [email protected]

  5. What is going on right now in
    production?

  6. 1. You Know!
    Your team knows and uses Docker for local development and testing.
    2. Kubernetes!
    Everyone talks about Kubernetes.
    3. Hire!
    You don’t know why, but you hired a DevOps engineer who kind of knows k8s.
    4. Excitement!
    You are moving everything and everyone to Kubernetes.

  7. © 2018 InfluxData. All rights reserved.
    @gianarb - [email protected]fluxdb.com

  8. Inspired by a true story

  10. You need a good book
    1. Short
    2. Driven by experiences
    3. Practical
    4. Easy

  13. Kubernetes Resources
    ● Pod
    ● DaemonSet
    ● StatefulSet
    ● Service
    ● Deployment
    ● Persistent Volumes
    ● ConfigMap
    ● Secret

  14. InfluxDB StatefulSet
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
    spec:
      serviceName: influxdb
      selector:
        matchLabels:
          component: influxdb
      replicas: 1
      template:
        metadata:
          name: influxdb
          labels:
            app: influxdb
            component: influxdb
        spec:
          containers:
          - name: influxdb
            image: quay.io/influxdb/influxdb:nightly
            volumeMounts:
            - name: data
              mountPath: /var/lib/influxdb
            ports:
            - containerPort: 8086
              name: server
      volumeClaimTemplates:
      - metadata:
          namespace: monitoring
          name: data
        spec:
          storageClassName: ebs-1
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: "250Gi"

  15. InfluxDB Service
    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
        app: influxdb
    spec:
      clusterIP: None
      ports:
      - port: 8086
        name: server
      selector:
        component: influxdb
      publishNotReadyAddresses: true
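
Because this Service is headless (`clusterIP: None`) and is the StatefulSet's `serviceName`, cluster DNS gives each InfluxDB pod a stable name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`, and `publishNotReadyAddresses: true` publishes that record even before the pod reports ready. The helper below is a hypothetical illustration that assumes the default `cluster.local` cluster domain:

```python
def statefulset_pod_fqdn(statefulset: str, ordinal: int, service: str,
                         namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the stable DNS name of a StatefulSet pod behind a headless Service.

    cluster_domain defaults to "cluster.local" (an assumption; clusters can
    be configured with a different domain).
    """
    pod_name = f"{statefulset}-{ordinal}"  # StatefulSet pods are named <name>-<ordinal>
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"


# The single InfluxDB replica from the manifests above, listening on port 8086:
url = f"http://{statefulset_pod_fqdn('influxdb', 0, 'influxdb', 'monitoring')}:8086"
print(url)  # http://influxdb-0.influxdb.monitoring.svc.cluster.local:8086
```

The name survives rescheduling: if the pod moves to another node and gets a new IP, clients resolving `influxdb-0.influxdb.monitoring...` still reach it.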

  16. Chronograf Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: monitoring
      name: chronograf
      labels:
        app: chronograf
    spec:
      selector:
        matchLabels:
          app: chronograf
      replicas: 1
      template:
        metadata:
          name: chronograf
          labels:
            app: chronograf
        spec:
          containers:
          - name: chronograf
            image: chronograf:1.7.4
            ports:
            - containerPort: 8888
              name: server

  17. Chronograf Service
    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: chronograf
    spec:
      ports:
      - port: 80
        targetPort: 8888
        name: server
      selector:
        app: chronograf

  18. Telegraf ConfigMap
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf-config-ntopng
      namespace: monitoring
    data:
      telegraf.conf: |
        [[outputs.influxdb]]
          urls = ["$MONITOR_HOST"]
          database = "ntopng"
          retention_policy = "weekly"
          write_consistency = "any"
          timeout = "5s"
          username = "$MONITOR_USERNAME"
          password = "$MONITOR_PASSWORD"
        [[inputs.influxdb_listener]]
          service_address = ":8086"
        [global_tags]
          env = "$ENV"
          hostname = "$HOSTNAME"
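
Telegraf expands environment variables written as `$VAR` inside its configuration file, which is how the Secret-backed `MONITOR_*` variables and the `HOSTNAME` variable set from `spec.nodeName` end up in this config. The sketch below only imitates that substitution step with Python's `string.Template`; it is not Telegraf's implementation, and the `MONITOR_HOST` and `HOSTNAME` values are made-up examples:

```python
from string import Template

# Trimmed copy of the telegraf.conf above.
TELEGRAF_CONF = '''
[[outputs.influxdb]]
  urls = ["$MONITOR_HOST"]
  database = "ntopng"
[global_tags]
  env = "$ENV"
  hostname = "$HOSTNAME"
'''


def render_config(text: str, env: dict) -> str:
    """Replace $VAR / ${VAR} placeholders with values from env.

    safe_substitute leaves unknown variables untouched instead of raising,
    so a missing variable stays visible in the rendered output.
    """
    return Template(text).safe_substitute(env)


# Hypothetical values; in the cluster they come from the Secret and the downward API.
rendered = render_config(TELEGRAF_CONF, {
    "MONITOR_HOST": "http://influxdb.monitoring:8086",
    "HOSTNAME": "node-1",
})
print(rendered)
```

In this example `ENV` is not defined in the sample environment, so the rendered config still contains `env = "$ENV"`.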

  19. ntopng ConfigMap
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ntopng-runtimeprefs
      namespace: monitoring
    data:
      runtimeprefs.json: |+
        {"ntopng.prefs.influx_username":"00000800E4B6B0FFA66628E2",
        "ntopng.prefs.influx_dbname":"00066E746F706E670800C3BD27A9CC635D8C",
        "ntopng.prefs.influx_password":"00000800E4B6B0FFA66628E2",
        "ntopng.prefs.influx_retention":"00C0070800CDE0A41D2AB244AA",
        "ntopng.prefs.influx_auth_enabled":"00C0000800216D4BC751498930",
        "ntopng.prefs.ts_post_data_url":"0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
        "ntopng.prefs.timeseries_driver":"0008696E666C7578646208002B6EA2EF661124EC"}
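
The `runtimeprefs.json` values are not plain strings. They appear to be hex-encoded Redis `DUMP` payloads (ntopng keeps its preferences in Redis): a type byte `00` (string), an RDB length or integer encoding followed by the data, then a 2-byte RDB version (`0800`, i.e. 8) and an 8-byte CRC64 checksum. The decoder below is a sketch under that assumption; it handles only the string encodings seen here and does not verify the checksum:

```python
import json


def decode_dump_string(hex_payload: str) -> str:
    """Decode a hex-encoded Redis DUMP payload that holds a string value."""
    buf = bytes.fromhex(hex_payload)
    if len(buf) < 12 or buf[0] != 0x00:
        raise ValueError("not a string-type DUMP payload")
    body = buf[1:-10]                 # drop type byte and version + CRC64 trailer
    first = body[0]
    kind = first >> 6                 # top two bits select the encoding
    if kind == 0:                     # 00xxxxxx: 6-bit length
        length, start = first & 0x3F, 1
    elif kind == 1:                   # 01xxxxxx xxxxxxxx: 14-bit length
        length, start = ((first & 0x3F) << 8) | body[1], 2
    elif kind == 3 and (first & 0x3F) <= 2:
        # 11xxxxxx with low bits 0/1/2: integer in 1, 2 or 4 little-endian bytes
        width = 1 << (first & 0x3F)
        return str(int.from_bytes(body[1:1 + width], "little", signed=True))
    else:
        raise ValueError(f"unsupported encoding byte {first:#04x}")
    return body[start:start + length].decode("utf-8")


def decode_runtimeprefs(text: str) -> dict:
    """Decode every value of an ntopng runtimeprefs.json document."""
    return {key: decode_dump_string(value) for key, value in json.loads(text).items()}


print(decode_dump_string("0008696E666C7578646208002B6EA2EF661124EC"))  # influxdb
```

Applied to the ConfigMap above, this yields an empty username and password, database `ntopng`, retention `7`, auth flag `0`, `http://localhost:8086` as the post URL and `influxdb` as the time series driver. That post URL is the Telegraf listener on port 8086 in the same pod.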

  20. ntopng DaemonSet
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ntopng
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          name: ntopng
      template:
        metadata:
          labels:
            name: ntopng
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: ntopng
          - name: telegraf

  21. ntopng DaemonSet
    - name: ntopng
      image: ntop/ntopng:stable
      args: ["-U", "root"]
      resources:
        limits:
          memory: 750Mi
        requests:
          memory: 250Mi
      env:
      - name: HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
      volumeMounts:
      - name: config
        mountPath: /var/tmp/ntopng/runtimeprefs.json
        subPath: runtimeprefs.json
    - name: telegraf
      image: docker.io/library/telegraf:1.9
      volumeMounts:
      - name: telegraf
        mountPath: /etc/telegraf
      env:
      - name: MONITOR_HOST
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_host
      - name: MONITOR_USERNAME
        valueFrom:
          secretKeyRef:
            name: telegraf
            key: monitor_username
      - name: HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
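
The memory `limits` and `requests` above (like the `250Gi` volume claim on the StatefulSet) use Kubernetes quantity notation, where `Ki`, `Mi`, `Gi`, `Ti` are powers of 1024 and `k`, `M`, `G`, `T` are powers of 1000. A small converter sketch that covers only these integer forms; the real quantity format also accepts milli-units, decimals, exponents and more suffixes:

```python
import re

# Subset of Kubernetes quantity suffixes: decimal (SI) and binary (IEC) multipliers.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity such as "750Mi" or "250Gi" into bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity)
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


limit = quantity_to_bytes("750Mi")
request = quantity_to_bytes("250Mi")
print(limit - request)  # 524288000 bytes (500 MiB) of headroom above the request
```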

  22. ntopng DaemonSet
    ntopng identifies hosts by IP address, which in
    our case means container IPs. I also want to
    correlate by hostname and environment, so Telegraf
    acts as a proxy and adds those tags to every point:
    [global_tags]
      env = "$ENV"
      hostname = "$HOSTNAME"
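
Conceptually, the Telegraf sidecar receives ntopng's line-protocol writes on `:8086` and forwards them with the global tags attached. The hypothetical function below only mimics that effect on a single line; it is not Telegraf's code, it assumes the incoming measurement and tags contain no escaped spaces or commas, and it keeps an existing tag when a key collides (a choice made for this sketch):

```python
def _escape_tag(text: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tags to be escaped.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def add_global_tags(line: str, global_tags: dict) -> str:
    """Return a line-protocol point with global_tags merged into its tag set."""
    series, rest = line.split(" ", 1)          # "measurement,tags" | "fields [timestamp]"
    measurement, *pairs = series.split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    for key, value in global_tags.items():
        tags.setdefault(_escape_tag(key), _escape_tag(value))  # existing tags win
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement}{tag_str} {rest}"


point = add_global_tags("cpu,host=a usage=1 1556000000000000000",
                        {"env": "prod", "hostname": "node-1"})
print(point)  # cpu,env=prod,host=a,hostname=node-1 usage=1 1556000000000000000
```

Sorting the tag keys is not required by line protocol, but InfluxDB documentation recommends it for write performance.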

  23. Now you can query your data
    from Chronograf

  25. You can correlate network
    metrics with other signals

  27. github.com/kubernetes/kube-state-metrics

  28. github.com/kubernetes-incubator/metrics-server
    Through the Metrics API you can get the amount of resources currently used by a given node
    or a given pod. The API doesn’t store metric values, so it is not possible, for example, to
    get the amount of resources a given node used 10 minutes ago.
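
Answering "how much did this node use 10 minutes ago?" therefore needs something that samples the Metrics API periodically and keeps the samples, which is the job a time series database such as InfluxDB does here. A toy in-memory version of that idea, purely illustrative (the `SampleHistory` class and the sample values are hypothetical):

```python
import bisect


class SampleHistory:
    """Keeps (timestamp, value) samples in time order for point-in-time queries."""

    def __init__(self) -> None:
        self._times: list = []
        self._values: list = []

    def record(self, ts: float, value: float) -> None:
        i = bisect.bisect_right(self._times, ts)  # insert while keeping time order
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def at(self, ts: float):
        """Most recent value recorded at or before ts, or None if there is none."""
        i = bisect.bisect_right(self._times, ts)
        return self._values[i - 1] if i else None


now = 1_000.0
history = SampleHistory()
for minutes_ago, memory_mib in [(15, 400), (10, 420), (5, 510), (0, 530)]:
    history.record(now - minutes_ago * 60, memory_mib)

print(history.at(now - 600))  # 420, the sample taken 10 minutes ago
```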

  29. Prometheus text exposition format
    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"} 3 1395066363000
    # Escaping in label values:
    msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
    # Minimalistic line:
    metric_without_timestamp_and_labels 12.47
    # A weird metric from before the epoch:
    something_weird{problem="division by zero"} +Inf -3982045
    # A histogram, which has a pretty complex representation in the text format:
    # HELP http_request_duration_seconds A histogram of the request duration.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.05"} 24054
    http_request_duration_seconds_bucket{le="0.1"} 33444
    http_request_duration_seconds_bucket{le="0.2"} 100392
    http_request_duration_seconds_bucket{le="0.5"} 129389
    http_request_duration_seconds_bucket{le="1"} 133988
    http_request_duration_seconds_bucket{le="+Inf"} 144320
    http_request_duration_seconds_sum 53423
    http_request_duration_seconds_count 144320
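
To show how a consumer reads one sample line of this format (metric name, an optional `{label="value"}` set with `\\`, `\"` and `\n` escapes, the value, and an optional millisecond timestamp), here is a minimal hand-written parser. It is a sketch, not the official Prometheus client parser, and it simply skips `# HELP` / `# TYPE` comment lines:

```python
import re

_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
_ESCAPES = {"\\": "\\", '"': '"', "n": "\n"}


def parse_sample(line: str):
    """Parse one sample line into (name, labels, value, timestamp_ms or None)."""
    match = _METRIC_NAME.match(line)
    if not match:
        raise ValueError(f"no metric name in {line!r}")
    name, pos = match.group(0), match.end()
    labels = {}
    if pos < len(line) and line[pos] == "{":
        pos += 1
        while line[pos] != "}":
            eq = line.index("=", pos)
            key = line[pos:eq].strip()
            pos = eq + 1
            if line[pos] != '"':
                raise ValueError(f"unquoted value for label {key!r}")
            pos += 1
            chars = []
            while line[pos] != '"':
                if line[pos] == "\\":  # escape sequence: \\, \" or \n
                    chars.append(_ESCAPES.get(line[pos + 1], "\\" + line[pos + 1]))
                    pos += 2
                else:
                    chars.append(line[pos])
                    pos += 1
            labels[key] = "".join(chars)
            pos += 1  # skip the closing quote
            if line[pos] == ",":
                pos += 1
        pos += 1  # skip the closing brace
    parts = line[pos:].split()
    value = float(parts[0])  # float() also accepts "+Inf", "-Inf" and "NaN"
    timestamp = int(parts[1]) if len(parts) > 1 else None
    return name, labels, value, timestamp


def parse_exposition(text: str) -> list:
    """Parse all sample lines, skipping blank lines and comment lines."""
    return [parse_sample(line.strip()) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


print(parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000'))
```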

  30. ntop can be the right tool, but we need:
    ● Better configuration via a file and/or environment variables
    ● It should be “Kubernetes aware”, because IPs are not the right way to identify
    containers: they change too often
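
One way to make IP-based flow data "Kubernetes aware" is to keep an index from pod IP to `namespace/pod`, rebuilt from the pod list (for example the JSON printed by `kubectl get pods --all-namespaces -o json`), and resolve flow endpoints through it. The sketch below works on that PodList shape (`items[].metadata`, `items[].spec.hostNetwork`, `items[].status.podIP`); it is an illustration, not an ntopng feature, and the pod names in the usage are made up. Pods with `hostNetwork: true` (such as the ntopng DaemonSet itself) report the node IP, so they are skipped rather than attributed to a single pod:

```python
def build_ip_index(pod_list: dict) -> dict:
    """Map pod IP -> "namespace/name" from a Kubernetes PodList JSON document."""
    index = {}
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("hostNetwork"):
            continue  # shares the node IP, so the IP does not identify this pod
        ip = pod.get("status", {}).get("podIP")
        if ip:  # pending pods may not have an IP yet
            meta = pod["metadata"]
            index[ip] = f"{meta['namespace']}/{meta['name']}"
    return index


def label_flow(src_ip: str, dst_ip: str, index: dict) -> tuple:
    """Replace flow endpoint IPs with pod identities when known."""
    return index.get(src_ip, src_ip), index.get(dst_ip, dst_ip)


pods = {"items": [
    {"metadata": {"name": "influxdb-0", "namespace": "monitoring"},
     "spec": {}, "status": {"podIP": "10.0.1.5"}},
    {"metadata": {"name": "chronograf-7d9f", "namespace": "monitoring"},
     "spec": {}, "status": {"podIP": "10.0.2.7"}},
    {"metadata": {"name": "ntopng-x1", "namespace": "monitoring"},
     "spec": {"hostNetwork": True}, "status": {"podIP": "192.168.0.10"}},
]}
index = build_ip_index(pods)
print(label_flow("10.0.2.7", "10.0.1.5", index))
```

Because pod IPs are reused after pods die, the index has to be refreshed (or driven by a watch on pods) for the labels to stay correct.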

  31. ~ @gianarb - https://gianarb.it ~
    Monitor your monitoring infrastructure

  32. The monitoring infrastructure should
    notify you when you are down.

  33. The monitoring infrastructure should be
    as far as possible from the environment
    you are monitoring.

  34. ● Different cloud provider
    ● Different datacenter
    ● Different company
    ● Different region
    ● Different teams
    ● As different as you can handle
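
A monitoring stack that is down cannot page you about itself, which is why these slides push it away from the environment it watches. A common pattern is a heartbeat ("dead man's switch") check running elsewhere: each monitored component reports in periodically, and the external checker alerts on the absence of a recent heartbeat. A minimal sketch of that check; the function name, source names and thresholds are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def silent_sources(last_seen: dict, now: datetime, max_silence: timedelta) -> list:
    """Return, sorted, the sources whose last heartbeat is older than max_silence.

    Meant to run outside the monitored environment, since the monitored
    stack cannot report its own outage.
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > max_silence)


now = datetime(2019, 5, 8, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "influxdb": now - timedelta(minutes=1),
    "telegraf-node-1": now - timedelta(minutes=7),
}
print(silent_sources(last_seen, now, timedelta(minutes=5)))  # ['telegraf-node-1']
```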

  35. Reach out:
    @gianarb
    [email protected]fluxdb.com
    https://gianarb.it
    Any questions?
