Rambling about InfluxDB, ntop and Kubernetes

Gianluca spoke at the ntop conference in Padova (Italy). He is not a network engineer: he started his journey with AWS and now, even worse, works with containers. He shared his view on breaking down the silos between DevOps, developers and sysadmins through observability, and he hacked together an ntopng deployment on Kubernetes, even though ntopng is not 100% ready for it yet!

Gianluca Arbezzano

May 08, 2019

Transcript

  1. InfluxDB and ntop on Kubernetes
    Gianluca Arbezzano, SRE at InfluxData

  2. Gianluca Arbezzano
    Site Reliability Engineer @InfluxData
    ● https://gianarb.it
    ● @gianarb
    What I like:
    ● I make dirty hacks that look awesome
    ● I grow my vegetables
    ● I travel for fun and work

  3. What is going on right now in
    production?

  4. 1. You know!
    Your team knows and uses Docker for local development and testing.
    2. Kubernetes!
    Everyone talks about Kubernetes.
    3. Hire!
    You don’t know why, but you hired a DevOps engineer who kind of knows k8s.
    4. Excitement!
    You are moving everything and everyone to Kubernetes.

  5. © 2018 InfluxData. All rights reserved.
    7 @gianarb - gianluca@influxdb.com

  6. Inspired by a true story

  7. You need a good book
    1. Short
    2. Driven by experiences
    3. Practical
    4. Easy

  8. Kubernetes Resources
    ● Pod
    ● DaemonSet
    ● StatefulSet
    ● Service
    ● Deployment
    ● Persistent Volumes
    ● ConfigMap
    ● Secret

  9. InfluxDB StatefulSet
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
    spec:
      serviceName: influxdb
      selector:
        matchLabels:
          component: influxdb
      replicas: 1
      template:
        metadata:
          name: influxdb
          labels:
            app: influxdb
            # Required: both the selector above and the Service select on this label
            component: influxdb
        spec:
          containers:
            - name: influxdb
              image: quay.io/influxdb/influxdb:nightly
              volumeMounts:
                - name: data
                  mountPath: /var/lib/influxdb
              ports:
                - containerPort: 8086
                  name: server
      volumeClaimTemplates:
        - metadata:
            namespace: monitoring
            name: data
          spec:
            storageClassName: ebs-1
            accessModes:
              - "ReadWriteOnce"
            resources:
              requests:
                storage: "250Gi"
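In apps/v1 the API server rejects a StatefulSet, Deployment or DaemonSet whose spec.selector does not match the labels of its pod template, so the two are worth checking against each other before applying a manifest. A minimal sketch of that check for matchLabels only (matchExpressions are ignored), assuming the manifest has already been parsed into plain dicts; the function name is hypothetical:

```python
def selector_matches_template(workload: dict) -> bool:
    """True if every spec.selector.matchLabels pair also appears in
    spec.template.metadata.labels. matchExpressions are not checked."""
    spec = workload["spec"]
    match_labels = spec["selector"].get("matchLabels", {})
    template_labels = spec["template"]["metadata"].get("labels", {})
    return all(template_labels.get(key) == value for key, value in match_labels.items())


# A StatefulSet whose pod template only carries app: influxdb,
# reduced to the fields the check reads.
statefulset = {
    "spec": {
        "selector": {"matchLabels": {"component": "influxdb"}},
        "template": {"metadata": {"labels": {"app": "influxdb"}}},
    }
}
print(selector_matches_template(statefulset))  # False: component is missing

statefulset["spec"]["template"]["metadata"]["labels"]["component"] = "influxdb"
print(selector_matches_template(statefulset))  # True
```

Extra template labels such as app: influxdb are fine; only labels required by the selector must be present.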

  10. InfluxDB Service
    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: influxdb
      labels:
        component: influxdb
        app: influxdb
    spec:
      clusterIP: None
      ports:
        - port: 8086
          name: server
      selector:
        component: influxdb
      publishNotReadyAddresses: true
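clusterIP: None makes this a headless Service: instead of a virtual IP, its DNS name resolves to the pod IPs, and each pod of a StatefulSet that names the Service in serviceName also gets a stable record of the form pod.service.namespace.svc.clusterdomain, where the pod name is the StatefulSet name plus an ordinal. With publishNotReadyAddresses: true these records are published even before the pod is Ready. A small helper that spells the names out, assuming the default cluster domain cluster.local (it is configurable per cluster; the function name is made up):

```python
def statefulset_pod_fqdns(statefulset: str, service: str, namespace: str,
                          replicas: int, cluster_domain: str = "cluster.local") -> list:
    """Stable DNS names of the pods of a StatefulSet governed by a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


print(statefulset_pod_fqdns("influxdb", "influxdb", "monitoring", replicas=1))
# ['influxdb-0.influxdb.monitoring.svc.cluster.local']
```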

  11. Chronograf Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: monitoring
      name: chronograf
      labels:
        app: chronograf
    spec:
      selector:
        matchLabels:
          app: chronograf
      replicas: 1
      template:
        metadata:
          name: chronograf
          labels:
            app: chronograf
        spec:
          containers:
            - name: chronograf
              image: chronograf:1.7.4
              ports:
                - containerPort: 8888
                  name: server

  12. Chronograf Service
    apiVersion: v1
    kind: Service
    metadata:
      namespace: monitoring
      name: chronograf
    spec:
      ports:
        - port: 80
          targetPort: 8888
          name: server
      selector:
        app: chronograf
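On a Service, port is what clients connect to and targetPort is where the traffic goes on the selected pods. targetPort defaults to port when it is omitted (as on the InfluxDB Service) and may also be the name of a containerPort. A sketch of that resolution over already-parsed manifests (the helper name is hypothetical):

```python
from typing import Union


def resolve_target_port(service_port: dict, container_ports: list) -> int:
    """Return the container port a single Service port entry forwards to."""
    # targetPort falls back to port when it is not set.
    target: Union[int, str] = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        return target
    # A string targetPort refers to a named containerPort.
    for container_port in container_ports:
        if container_port.get("name") == target:
            return container_port["containerPort"]
    raise LookupError(f"no container port named {target!r}")


chronograf_ports = [{"containerPort": 8888, "name": "server"}]
print(resolve_target_port({"port": 80, "targetPort": 8888, "name": "server"}, chronograf_ports))  # 8888
print(resolve_target_port({"port": 80, "targetPort": "server"}, chronograf_ports))  # 8888
```

Since the Deployment names its container port server, targetPort: server would be an alternative that keeps working if the container port number ever changes.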

  13. Telegraf ConfigMap
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: telegraf-config-ntopng
      namespace: monitoring
    data:
      telegraf.conf: |
        [[outputs.influxdb]]
          urls = ["$MONITOR_HOST"]
          database = "ntopng"
          retention_policy = "weekly"
          write_consistency = "any"
          timeout = "5s"
          username = "$MONITOR_USERNAME"
          password = "$MONITOR_PASSWORD"
        [[inputs.influxdb_listener]]
          service_address = ":8086"
        [global_tags]
          env = "$ENV"
          hostname = "$HOSTNAME"
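Telegraf expands the environment variables referenced in its configuration file, so every $NAME above has to be set in the Telegraf container. A quick sketch (hypothetical helpers, handling the $NAME and ${NAME} forms) that lists the names a config references and which of them a container does not define; the container env used below is the one of the telegraf container on the DaemonSet slide:

```python
import re

# Matches ${NAME} or $NAME.
_ENV_REF = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def referenced_env_vars(config_text: str) -> set:
    """Names of all environment variables referenced in a config file."""
    return {braced or bare for braced, bare in _ENV_REF.findall(config_text)}


def missing_env_vars(config_text: str, container_env: set) -> set:
    """Referenced variables that the container does not define."""
    return referenced_env_vars(config_text) - container_env


telegraf_conf = """
[[outputs.influxdb]]
  urls = ["$MONITOR_HOST"]
  username = "$MONITOR_USERNAME"
  password = "$MONITOR_PASSWORD"
[global_tags]
  env = "$ENV"
  hostname = "$HOSTNAME"
"""
telegraf_container_env = {"MONITOR_HOST", "MONITOR_USERNAME", "HOSTNAME"}
print(sorted(missing_env_vars(telegraf_conf, telegraf_container_env)))
# ['ENV', 'MONITOR_PASSWORD']
```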

  14. ntopng ConfigMap
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ntopng-runtimeprefs
      namespace: monitoring
    data:
      runtimeprefs.json: |+
        {"ntopng.prefs.influx_username":"00000800E4B6B0FFA66628E2",
         "ntopng.prefs.influx_dbname":"00066E746F706E670800C3BD27A9CC635D8C",
         "ntopng.prefs.influx_password":"00000800E4B6B0FFA66628E2",
         "ntopng.prefs.influx_retention":"00C0070800CDE0A41D2AB244AA",
         "ntopng.prefs.influx_auth_enabled":"00C0000800216D4BC751498930",
         "ntopng.prefs.ts_post_data_url":"0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
         "ntopng.prefs.timeseries_driver":"0008696E666C7578646208002B6EA2EF661124EC"}
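The values in runtimeprefs.json are not readable as they are. ntopng keeps its preferences in Redis, and these values appear to be hex-encoded Redis DUMP payloads: a type byte, the RDB-encoded value, a 2-byte RDB version and an 8-byte CRC64. A sketch that decodes the simple cases found here (short strings and small integers) under that assumption, without verifying the checksum; the function name is made up:

```python
def decode_dump(hex_payload: str):
    """Decode a hex-encoded Redis DUMP payload that holds a string value.

    Assumed layout: type byte | RDB-encoded value | 2-byte RDB version | 8-byte CRC64.
    The CRC64 is not verified. Only 6-/14-bit lengths and int8/16/32 encodings are handled.
    """
    data = bytes.fromhex(hex_payload)
    if data[0] != 0:  # RDB object type 0 is a string
        raise ValueError("not a string object")
    value = data[1:-10]  # drop the type byte, the RDB version and the CRC64
    first = value[0]
    kind = first >> 6  # the top two bits select the length encoding
    if kind == 0:  # 6-bit length
        return value[1:1 + (first & 0x3F)].decode()
    if kind == 1:  # 14-bit length, big-endian
        length = ((first & 0x3F) << 8) | value[1]
        return value[2:2 + length].decode()
    if kind == 3 and (first & 0x3F) in (0, 1, 2):  # int8 / int16 / int32
        size = 1 << (first & 0x3F)
        return int.from_bytes(value[1:1 + size], "little", signed=True)
    raise ValueError("encoding not handled by this sketch")


prefs = {
    "ntopng.prefs.influx_dbname": "00066E746F706E670800C3BD27A9CC635D8C",
    "ntopng.prefs.influx_retention": "00C0070800CDE0A41D2AB244AA",
    "ntopng.prefs.ts_post_data_url":
        "0015687474703A2F2F6C6F63616C686F73743A383038360800BC9FDD56721F5346",
    "ntopng.prefs.timeseries_driver": "0008696E666C7578646208002B6EA2EF661124EC",
}
for key, payload in prefs.items():
    print(key, repr(decode_dump(payload)))
```

Decoded this way, the file selects the influxdb timeseries driver, the database ntopng and http://localhost:8086 as the write URL (the Telegraf influxdb_listener in the same pod), a retention value of 7, empty credentials and authentication disabled (influx_auth_enabled decodes to 0).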

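  15. ntopng DaemonSet
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ntopng
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          name: ntopng
      template:
        metadata:
          labels:
            name: ntopng
        spec:
          tolerations:
            - key: node-role.kubernetes.io/master
              effect: NoSchedule
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
            - name: ntopng    # full container spec on the next slide
            - name: telegraf  # full container spec on the next slide
          # Volumes backing the "config" and "telegraf" mounts used by the
          # containers; assumed to come from the two ConfigMaps above.
          volumes:
            - name: config
              configMap:
                name: ntopng-runtimeprefs
            - name: telegraf
              configMap:
                name: telegraf-config-ntopng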

  16. ntopng DaemonSet
    - name: ntopng
      image: ntop/ntopng:stable
      args: ["-U", "root"]
      resources:
        limits:
          memory: 750Mi
        requests:
          memory: 250Mi
      env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      volumeMounts:
        - name: config
          mountPath: /var/tmp/ntopng/runtimeprefs.json
          subPath: runtimeprefs.json
    - name: telegraf
      image: docker.io/library/telegraf:1.9
      volumeMounts:
        - name: telegraf
          mountPath: /etc/telegraf
      env:
        - name: MONITOR_HOST
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_host
        - name: MONITOR_USERNAME
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_username
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # telegraf.conf also references $MONITOR_PASSWORD and $ENV,
        # so those need to be set here as well.

  17. ntopng DaemonSet
    ntopng identifies hosts by IP address, which in our case means container
    IPs. I would also like to correlate by hostname and environment, so
    Telegraf, acting as a proxy, adds those tags to every point:
    [global_tags]
      env = "$ENV"
      hostname = "$HOSTNAME"
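Conceptually, [global_tags] rewrites every point that passes through Telegraf. A rough sketch of that rewrite on a single InfluxDB line-protocol point; it is not Telegraf's code, the helper names are hypothetical, and the measurement, tag and field names in the example are made up. This sketch keeps a tag the point already carries instead of overwriting it, and appends the new tags sorted by key:

```python
def _first_unescaped(s: str, ch: str) -> int:
    """Index of the first `ch` not escaped with a backslash, or -1."""
    i = 0
    while i < len(s):
        if s[i] == "\\":
            i += 2  # skip the escaped character
        elif s[i] == ch:
            return i
        else:
            i += 1
    return -1


def _split_unescaped(s: str, ch: str) -> list:
    parts = []
    while (i := _first_unescaped(s, ch)) != -1:
        parts.append(s[:i])
        s = s[i + 1:]
    parts.append(s)
    return parts


def _escape_tag(text: str) -> str:
    # Line protocol escapes commas, equals signs and spaces in tag keys and values.
    for special in (",", "=", " "):
        text = text.replace(special, "\\" + special)
    return text


def add_global_tags(line: str, tags: dict) -> str:
    """Append `tags` to a line-protocol point, keeping tags it already has."""
    end = _first_unescaped(line, " ")  # the series key ends at the first unescaped space
    if end == -1:
        raise ValueError("point has no field set")
    series, rest = line[:end], line[end:]
    existing = {_split_unescaped(pair, "=")[0] for pair in _split_unescaped(series, ",")[1:]}
    extra = [
        f"{_escape_tag(key)}={_escape_tag(value)}"
        for key, value in sorted(tags.items())
        if _escape_tag(key) not in existing
    ]
    return ",".join([series, *extra]) + rest


print(add_global_tags("host_traffic,ip=10.0.0.3 bytes=123i 1557300000000000000",
                      {"hostname": "node-1", "env": "prod"}))
# host_traffic,ip=10.0.0.3,env=prod,hostname=node-1 bytes=123i 1557300000000000000
```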

  18. Now you can query your data
    from Chronograf

  19. You can correlate network
    metrics with other signals

  21. github.com/kubernetes/kube-state-metrics

  22. github.com/kubernetes-incubator/metrics-server
    Through the Metrics API you can get the amount of resources currently used by a given node
    or a given pod. This API doesn’t store the metric values, so it’s not possible, for example, to
    get the amount of resources used by a given node 10 minutes ago.
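Answering "how much did this node use 10 minutes ago?" therefore means polling the Metrics API periodically and keeping the samples yourself, which is the job of a time-series database such as InfluxDB. A toy in-memory sketch of the idea; the class name, the retention window and the sample values are made up, and polling the API itself is left out:

```python
import bisect
from collections import defaultdict


class UsageHistory:
    """Keeps (timestamp, value) samples per node for `retention` seconds."""

    def __init__(self, retention: float):
        self.retention = retention
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def record(self, node: str, ts: float, value: float) -> None:
        # Samples are assumed to arrive in timestamp order, one poll after another.
        self._times[node].append(ts)
        self._values[node].append(value)
        # Drop samples that fell out of the retention window.
        cutoff = bisect.bisect_left(self._times[node], ts - self.retention)
        del self._times[node][:cutoff]
        del self._values[node][:cutoff]

    def at(self, node: str, ts: float):
        """Latest sample taken at or before `ts`, or None if there is none."""
        idx = bisect.bisect_right(self._times[node], ts) - 1
        return self._values[node][idx] if idx >= 0 else None


history = UsageHistory(retention=3600)
history.record("node-1", 0, 250)    # e.g. MiB of memory, one poll per sample
history.record("node-1", 60, 300)
history.record("node-1", 600, 420)
print(history.at("node-1", 600 - 600))  # value 10 minutes before the last poll: 250
```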

  23. gianarb.it ~ @gianarb
    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"} 3 1395066363000
    # Escaping in label values:
    msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
    # Minimalistic line:
    metric_without_timestamp_and_labels 12.47
    # A weird metric from before the epoch:
    something_weird{problem="division by zero"} +Inf -3982045
    # A histogram, which has a pretty complex representation in the text format:
    # HELP http_request_duration_seconds A histogram of the request duration.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.05"} 24054
    http_request_duration_seconds_bucket{le="0.1"} 33444
    http_request_duration_seconds_bucket{le="0.2"} 100392
    http_request_duration_seconds_bucket{le="0.5"} 129389
    http_request_duration_seconds_bucket{le="1"} 133988
    http_request_duration_seconds_bucket{le="+Inf"} 144320
    http_request_duration_seconds_sum 53423
    http_request_duration_seconds_count 144320
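The slide shows the Prometheus text exposition format: one sample per line as name{labels} value, optionally followed by a timestamp in milliseconds, with # lines for comments and HELP/TYPE metadata, and label values in double quotes using the \\, \" and \n escapes. A minimal parser for the sample lines, for illustration only: it assumes well-formed input and skips comments and metadata; the function names are made up.

```python
import re

_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
_UNESCAPE = {"\\": "\\", '"': '"', "n": "\n"}


def parse_sample(line: str):
    """Parse one sample line into (name, labels, value, timestamp_ms or None)."""
    match = _METRIC_NAME.match(line)
    if match is None:
        raise ValueError(f"no metric name in {line!r}")
    name, i = match.group(0), match.end()
    labels = {}
    if line.startswith("{", i):
        i += 1
        while line[i] != "}":
            eq = line.index("=", i)
            key = line[i:eq].strip()
            j = eq + 2  # skip '=' and the opening quote
            chars = []
            while line[j] != '"':
                if line[j] == "\\":
                    chars.append(_UNESCAPE[line[j + 1]])
                    j += 2
                else:
                    chars.append(line[j])
                    j += 1
            labels[key] = "".join(chars)
            i = j + 1  # past the closing quote
            if line[i] == ",":
                i += 1
        i += 1  # past '}'
    fields = line[i:].split()
    value = float(fields[0])  # float() also accepts "+Inf", "-Inf" and "NaN"
    timestamp = int(fields[1]) if len(fields) > 1 else None
    return name, labels, value, timestamp


def parse_text(text: str):
    """Parse every sample line, skipping blank lines and # comments."""
    return [parse_sample(ln) for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]


print(parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000'))
```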

  24. Ntop can be the right tool, but we need:
    ● Better configuration via files and/or environment variables
    ● It should be “Kubernetes aware”, because IPs are not the right way to identify
    containers: they change too often.
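Because pod IPs are reassigned as pods come and go, mapping an IP seen in network traffic back to a pod only makes sense together with a time: the same address can belong to different pods at different moments. A hypothetical sketch of such a time-aware index; the class names, pod names, IPs and timestamps are invented, and collecting pod lifetimes (for example by watching the Kubernetes API) is left out:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodLease:
    pod: str               # "namespace/name"
    start: float           # when the pod got the IP
    end: Optional[float]   # when it released the IP; None while still running


class PodIPIndex:
    """Answers "which pod owned this IP at that time?"."""

    def __init__(self) -> None:
        self._leases = {}

    def add(self, ip: str, lease: PodLease) -> None:
        self._leases.setdefault(ip, []).append(lease)

    def pod_at(self, ip: str, ts: float) -> Optional[str]:
        for lease in self._leases.get(ip, []):
            if lease.start <= ts and (lease.end is None or ts < lease.end):
                return lease.pod
        return None


index = PodIPIndex()
index.add("10.1.2.3", PodLease("monitoring/chronograf-abc12", start=0, end=100))
index.add("10.1.2.3", PodLease("default/web-xyz98", start=150, end=None))
print(index.pod_at("10.1.2.3", 50), index.pod_at("10.1.2.3", 200))
```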

  25. ~ @gianarb - https://gianarb.it ~
    monitor your monitor infrastructure

  26. The monitor infrastructure should
    notify you when you are down.
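One common way to get this property is a "dead man's switch": the monitoring stack sends a heartbeat on a schedule, and a checker running somewhere else alerts when the heartbeats stop, so silence itself becomes the signal. A minimal sketch of the checker side; the class name and the 60-second timeout are illustrative, and sending heartbeats and alerts is left out:

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Reports the monitored system as down once heartbeats stop arriving."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last: Optional[float] = None

    def heartbeat(self) -> None:
        self._last = self._clock()

    def is_down(self) -> bool:
        # Down if no heartbeat was ever received or the last one is too old.
        return self._last is None or self._clock() - self._last > self.timeout


now = [0.0]  # a fake clock, so the example does not need to sleep
switch = DeadMansSwitch(timeout=60, clock=lambda: now[0])
switch.heartbeat()
now[0] = 61.0
print(switch.is_down())  # True: no heartbeat for more than 60 seconds
```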

  27. The monitor infrastructure should be
    as far as possible from the environment
    you are monitoring.

  28. ● Different cloud provider
    ● Different datacenter
    ● Different company
    ● Different region
    ● Different teams
    ● As different as you can handle

  29. Reach out:
    @gianarb
    gianluca@influxdb.com
    https://gianarb.it
    Any question?
