DevOpsFest 2019 - DevOps never sleeps. What we learned from InfluxDB v1 to v2

When a company with an open source project announces that it is working on a new major release, there is always a lot of chatter in the community, because you never know how much it is going to break your system. Gianluca Arbezzano, SRE at InfluxData, will speak about the journey the company is facing, from a DevOps perspective, to move from InfluxDB v1 to version 2, a fully integrated platform that builds on the strong background we gained running a database like InfluxDB at scale in our SaaS offering. This is not just a story about how a project evolved: it touches the whole company and, as far as DevOpsFest is concerned, everything around Kubernetes, containers and automation. It also covers how the SRE team managed the onboarding of 20 developers onto a cloud-based project, where operating and observing the system is key to learning how to build a more solid and sustainable product.

Gianluca Arbezzano

April 06, 2019

Transcript

  1. Gianluca Arbezzano
    Site Reliability Engineer @InfluxData
    ● https://gianarb.it
    ● @gianarb
    What I like:
    ● I make dirty hacks that look awesome
    ● I grow my vegetables
    ● Travel for fun and work

  2. © 2018 InfluxData. All rights reserved.
    6 @gianarb - gianluca@influxdb.com

  3. DevOps likes automation.
    Automation likes code.
    YAML is not code.

  4. Inspired by true events

  5. 1. You Know!
    Your team knows and uses Docker for local development and testing.
    2. Kubernetes!
    Everyone talks about Kubernetes.
    3. Hire!
    You don’t know why, but you hired a DevOps engineer who kind of knows k8s.
    4. Excitement!
    You are moving everything and everyone to Kubernetes.

  6. We need to get our
    hands dirty

  7. Spin up a cluster that you
    can break
    Bring developers in the loop

  8. Deploy CI on Kubernetes
    Bring developers in the loop

  9. Run your code in prod
    Bring developers in the loop

  10. K8s as code: From YAML to code (golang)
    1. You can use Golang autocomplete as documentation and as a reference for every
    Kubernetes resource
    2. You feel less like a YAML engineer (great feeling btw)
    3. Code is better than YAML! You can reuse it, compile it, embed it in other projects.
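
A minimal sketch of what "resources as Go code" can look like, using only the standard library. The `Deployment`, `ObjectMeta` and `DeploymentSpec` structs are hypothetical, heavily trimmed stand-ins for the real types in the Kubernetes Go packages, and `agentDeployment`, `render` and the name `drone-agent` are made up for illustration; this is not the tooling used at InfluxData.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down stand-ins for the real Kubernetes API types.
type ObjectMeta struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}

type DeploymentSpec struct {
	Replicas int `json:"replicas"`
}

type Deployment struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   ObjectMeta     `json:"metadata"`
	Spec       DeploymentSpec `json:"spec"`
}

// agentDeployment is an ordinary function, so it can be reused, compiled,
// unit-tested and embedded in other programs.
func agentDeployment(name string, replicas int) Deployment {
	return Deployment{
		APIVersion: "apps/v1",
		Kind:       "Deployment",
		Metadata: ObjectMeta{
			Name:   name,
			Labels: map[string]string{"component": "agent"},
		},
		Spec: DeploymentSpec{Replicas: replicas},
	}
}

// render serialises the resource as JSON, a format the Kubernetes API accepts
// alongside YAML.
func render(d Deployment) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	out, err := render(agentDeployment("drone-agent", 2))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the compiler knows the struct fields, a misspelled field name fails at build time, and editor autocomplete lists the fields that exist.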

  11. K8s as code: From YAML to code (golang)
    Tiny CLI to make the migration to golang
    Some manual refactoring
  12. K8s as code: From YAML to code (golang)
    Tiny CLI to make the migration to golang
    Some manual refactoring
    ● Continue to improve our CI to validate that the YAML and Go files are the same,
    and that the resources in Kubernetes match the Go files.
    ● Maybe we will be able to remove the YAML at some point.
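
The CI validation described above could be sketched as a structural comparison of two manifests. This example assumes an earlier pipeline step has already converted the YAML file to JSON (the Go standard library has no YAML parser); `sameResource` is a hypothetical helper, not part of any Kubernetes tooling.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameResource reports whether two JSON manifests describe the same object,
// ignoring key order and whitespace.
func sameResource(a, b []byte) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal(a, &x); err != nil {
		return false, fmt.Errorf("first manifest: %w", err)
	}
	if err := json.Unmarshal(b, &y); err != nil {
		return false, fmt.Errorf("second manifest: %w", err)
	}
	return reflect.DeepEqual(x, y), nil
}

func main() {
	// Assumed inputs: one manifest derived from YAML, one rendered from Go code.
	fromYAML := []byte(`{"kind": "Deployment", "spec": {"replicas": 2}}`)
	fromGo := []byte(`{"spec":{"replicas":2},"kind":"Deployment"}`)

	same, err := sameResource(fromYAML, fromGo)
	if err != nil {
		panic(err)
	}
	fmt.Println("in sync:", same)
}
```

A CI job could fail the build whenever the function returns false, which keeps the two representations from drifting apart while both still exist.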

  13. GitOps
    Your Git repository is the entrypoint for all your code changes.
    Infrastructure is ‘as code’, so the place where you make it happen should be Git.
    Read more on weave.works:
    https://www.weave.works/technologies/gitops/

  14. The secret of
    success

  15. Don’t be scared and write your
    own tools!

  16. Why is Kubernetes
    so powerful, complex
    and widely adopted?

  17. Why is AWS
    so expensive?

  18. What do you do
    to justify these costs?

  20. © 2018 InfluxData. All rights reserved.
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: {{ template "drone.fullname" . }}-agent
      labels:
        app: {{ template "drone.name" . }}
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
        release: "{{ .Release.Name }}"
        heritage: "{{ .Release.Service }}"
        component: agent
    spec:
      replicas: {{ .Values.agent.replicas }}
      template:
        metadata:
          annotations:
            checksum/secrets: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
    {{- if .Values.agent.annotations }}
    {{ toYaml .Values.agent.annotations | indent 8 }}
    {{- end }}
          labels:
            app: {{ template "drone.name" . }}
            release: "{{ .Release.Name }}"
            component: agent

  21. APIs are
    the key to
    your success!
    Image credit: Pixabay

  24. containerd.io

  30. We use Docker as a
    replacement for systemd
    for process management

  31. DIND - Docker in Docker
    $ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker sh
    $ docker info
    Containers: 48
    Running: 1
    Paused: 0
    Stopped: 47
    containerd version: 9f2e07b1fc1342d1c48fe4d7bbb94cb6d1bf278b.m
    runc version: 871ba2e58e24314d1fab4517a80410191ba5ad01
    init version: fec3683
    Kernel Version: 4.20.13-arch1-1-ARCH
    Operating System: Arch Linux
    OSType: linux
    Architecture: x86_64
    CPUs: 4
    Total Memory: 15.42GiB
    Name: gianarb
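
Mounting `/var/run/docker.sock` works because the Docker CLI is only a client of the daemon's HTTP API on that socket. As a rough sketch of using that API directly from Go with the standard library: `newSocketClient` and `parseInfo` are hypothetical helper names, `dockerInfo` decodes only a handful of the fields returned by the Engine API's `GET /info` endpoint, and the sample JSON is hand-written from the numbers on the slide.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
)

// dockerInfo holds a few fields of the Engine API's GET /info response.
type dockerInfo struct {
	Containers        int
	ContainersRunning int
	ContainersPaused  int
	ContainersStopped int
	Name              string
}

// parseInfo decodes a /info response body.
func parseInfo(body []byte) (dockerInfo, error) {
	var info dockerInfo
	err := json.Unmarshal(body, &info)
	return info, err
}

// newSocketClient returns an HTTP client that dials a Unix socket instead of
// TCP; the host part of the request URL is then ignored.
func newSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

func main() {
	// Offline sample shaped like a /info response, using the slide's numbers.
	sample := []byte(`{"Containers":48,"ContainersRunning":1,"ContainersPaused":0,"ContainersStopped":47,"Name":"gianarb"}`)
	info, err := parseInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d containers, %d running\n", info.Name, info.Containers, info.ContainersRunning)

	// Live call; it fails harmlessly when no daemon listens on the socket.
	resp, err := newSocketClient("/var/run/docker.sock").Get("http://docker/info")
	if err != nil {
		fmt.Println("docker daemon not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	live, err := parseInfo(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: %d containers, %d running\n", live.Name, live.Containers, live.ContainersRunning)
}
```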

  32. UX for OPS.
    Because everyone needs to feel
    at home...

  33. Instrumentation, observability
    and monitoring

  34. ~ @gianarb - https://gianarb.it ~
    The secret is all about how you
    combine things together

  35. Metrics

  36. Logs

  37. Traces

  38. Often our
    aggregations look
    a bit twisted...

  39. Distributed Tracing
    Tracing is a way to correlate
    logs using a set of IDs
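
To make "correlate logs using a set of IDs" concrete, here is a small standard-library sketch. It assumes a single trace ID carried through a `context.Context`; the helper names (`newTraceID`, `startTrace`, `logLine`) and the log layout are invented for this example and do not come from any tracing library.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// traceIDKey is an unexported key type, so other packages cannot collide with it.
type traceIDKey struct{}

// newTraceID returns 8 random bytes encoded as 16 hex characters.
func newTraceID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// startTrace creates a context that carries the given trace ID.
func startTrace(id string) context.Context {
	return context.WithValue(context.Background(), traceIDKey{}, id)
}

// traceID extracts the ID, or returns an empty string when none is set.
func traceID(ctx context.Context) string {
	id, _ := ctx.Value(traceIDKey{}).(string)
	return id
}

// logLine formats a log entry that includes the trace ID from the context.
func logLine(ctx context.Context, service, msg string) string {
	return fmt.Sprintf("trace_id=%s service=%s msg=%q", traceID(ctx), service, msg)
}

func main() {
	ctx := startTrace(newTraceID())
	// Both lines share one trace_id, so a log search on that ID returns the
	// whole request path across services.
	fmt.Println(logLine(ctx, "gateway", "request received"))
	fmt.Println(logLine(ctx, "storage", "write completed"))
}
```

In a real system the ID would also travel between processes, for example in a request header, so that every service logs the same value.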

  40. Normal state vs Current state

  41. Instrumentation code is a first-class citizen in your
    codebase: OpenCensus
    ● Open source project sponsored by Google
    ● It is a spec plus a set of libraries in different languages to instrument your
    application
    ● It collects metrics, traces and events.

  42. OpenCensus
    Common interface to collect stats and traces from your app
    Different exporters to persist your data

  43. gianarb.it ~ @gianarb
    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"} 3 1395066363000
    # Escaping in label values:
    msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
    # Minimalistic line:
    metric_without_timestamp_and_labels 12.47
    # A weird metric from before the epoch:
    something_weird{problem="division by zero"} +Inf -3982045
    # A histogram, which has a pretty complex representation in the text format:
    # HELP http_request_duration_seconds A histogram of the request duration.
    # TYPE http_request_duration_seconds histogram
    http_request_duration_seconds_bucket{le="0.05"} 24054
    http_request_duration_seconds_bucket{le="0.1"} 33444
    http_request_duration_seconds_bucket{le="0.2"} 100392
    http_request_duration_seconds_bucket{le="0.5"} 129389
    http_request_duration_seconds_bucket{le="1"} 133988
    http_request_duration_seconds_bucket{le="+Inf"} 144320
    http_request_duration_seconds_sum 53423
    http_request_duration_seconds_count 144320
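
The text format shown above is simple enough to produce by hand. A minimal sketch, assuming samples without timestamps: `sampleLine` and `escapeLabelValue` are hypothetical helpers written for this example, not functions from a Prometheus client library, and sorting the label names is only a choice made here to get deterministic output.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// escapeLabelValue escapes backslash, double quote and newline, the three
// characters the text format requires to be escaped inside label values.
func escapeLabelValue(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(v)
}

// sampleLine renders one sample as `name{label="value",...} value`.
func sampleLine(name string, labels map[string]string, value float64) string {
	if len(labels) == 0 {
		return fmt.Sprintf("%s %v", name, value)
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, escapeLabelValue(labels[k])))
	}
	return fmt.Sprintf("%s{%s} %v", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println("# HELP http_requests_total The total number of HTTP requests.")
	fmt.Println("# TYPE http_requests_total counter")
	fmt.Println(sampleLine("http_requests_total", map[string]string{"method": "post", "code": "200"}, 1027))
	fmt.Println(sampleLine("metric_without_timestamp_and_labels", nil, 12.47))
}
```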

  44. OpenMetrics
    v2 Prometheus exposition format

  45. gianarb.it ~ @gianarb
    func FetchMetricFamilies(url string, ch chan<- *dto.MetricFamily, certificate string, key string,
        skipServerCertCheck bool) error {
        defer close(ch)
        var transport *http.Transport
        if certificate != "" && key != "" {
            // A client certificate and key were supplied: use them for mutual TLS.
            cert, err := tls.LoadX509KeyPair(certificate, key)
            if err != nil {
                return err
            }
            tlsConfig := &tls.Config{
                Certificates:       []tls.Certificate{cert},
                InsecureSkipVerify: skipServerCertCheck,
            }
            tlsConfig.BuildNameToCertificate()
            transport = &http.Transport{TLSClientConfig: tlsConfig}
        } else {
            // No client certificate: only control server certificate verification.
            transport = &http.Transport{
                TLSClientConfig: &tls.Config{InsecureSkipVerify: skipServerCertCheck},
            }
        }
        // ... the rest of the function is not shown on the slide
    https://github.com/prometheus/prom2json/blob/master/prom2json.go#L123

  46. Summary:
    ★ Do not be scared to write your code!
    ★ Use the API
    ★ Instrumentation code is a first-class citizen
    ★ Keep calm and observe all together!

  47. Credits and Links
    ● https://www.weave.works/technologies/gitops/
    ● http://gianarb.it
    ● https://thenewstack.io/why-you-cant-afford-to-ignore-distributed-tracing-for-observability/
    ● https://www.honeycomb.io/blog/
    ● https://gianarb.it/blog/infra-as-code-short-long-ttl-resource
    ● https://gianarb.it/blog/kubernetes-shared-informer
    ● https://github.com/OpenObservability/OpenMetrics
    ● https://promcon.io/2018-munich/slides/openmetrics-transforming-the-prometheus-exposition-format-into-a-global-standard.pdf
    ● https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0

  48. Thanks
    @gianarb
