CfgMgmtCamp - Infrastructure as code should contain code

These days “infrastructure as code” means HCL, YAML, JSON.
I will never buy that JSON is a programming language.
CloudFormation tries so hard to mix JSON with keywords that become functions at runtime, while claiming that it is a maintainable approach.
Helm pushes hard on the idea that YAML with a few parameters, which get translated into variables at runtime, is a flexible and maintainable approach.
Infrastructure as code means that you are supposed to use a programming language, because otherwise it won’t work.
Some YAML evangelists will tell you that a “human-friendly data serialization standard” is better than code because it keeps you from writing weird and wrong code that you are not supposed to write when doing infrastructure.
The truth is that you need to code better! At InfluxData we know that, and this talk is about our journey moving from YAML to Go to manage our Kubernetes cluster.
What we faced and why we think it is way better!

Gianluca Arbezzano

February 05, 2019

Transcript

  Slide 2

    Gianluca Arbezzano Site Reliability Engineer @InfluxData • https://gianarb.it • @gianarb

    What I like:
    • I make dirty hacks that look awesome
    • I grow my vegetables
    • I travel for fun and work
  Slide 10

    © 2018 InfluxData. All rights reserved. 10 @gianarb - gianluca@influxdb.com

    The automation’s goals:
    • The ability to run repeatable tasks with a single click or continuously.
    • The possibility to build pipelines using products like Kubernetes, AWS, and Jenkins.
    • A friendly UX to manage “ops”.
    Image credit: Pixabay
  Slide 15

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: {{ template "drone.fullname" . }}-agent
      labels:
        app: {{ template "drone.name" . }}
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
        release: "{{ .Release.Name }}"
        heritage: "{{ .Release.Service }}"
        component: agent
    spec:
      replicas: {{ .Values.agent.replicas }}
      template:
        metadata:
          annotations:
            checksum/secrets: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
    {{- if .Values.agent.annotations }}
    {{ toYaml .Values.agent.annotations | indent 8 }}
    {{- end }}
          labels:
            app: {{ template "drone.name" . }}
            release: "{{ .Release.Name }}"
            component: agent
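For contrast, here is a minimal sketch of the same label blocks written as plain Go functions instead of template strings. `ChartRelease`, `deploymentLabels`, and `podLabels` are hypothetical names for this sketch, and the `appName` parameter stands in for the `drone.name` helper. A typo such as `cr.ChartVesion` is a compile error here, while a dropped brace in a template string is not caught until the chart is rendered and may even render silently into a wrong value.

```go
package main

import "fmt"

// ChartRelease collects the values Helm injects as .Chart and .Release.
// Hypothetical type, defined only for this sketch.
type ChartRelease struct {
	ChartName    string // .Chart.Name
	ChartVersion string // .Chart.Version
	ReleaseName  string // .Release.Name
	Service      string // .Release.Service
}

// deploymentLabels mirrors the template's metadata.labels block.
// appName plays the role of {{ template "drone.name" . }}.
func deploymentLabels(appName string, cr ChartRelease) map[string]string {
	return map[string]string{
		"app":       appName,
		"chart":     cr.ChartName + "-" + cr.ChartVersion,
		"release":   cr.ReleaseName,
		"heritage":  cr.Service,
		"component": "agent",
	}
}

// podLabels mirrors spec.template.metadata.labels, a subset of the above.
func podLabels(appName string, cr ChartRelease) map[string]string {
	return map[string]string{
		"app":       appName,
		"release":   cr.ReleaseName,
		"component": "agent",
	}
}

func main() {
	// Illustrative values only.
	cr := ChartRelease{ChartName: "drone", ChartVersion: "1.0.0", ReleaseName: "ci", Service: "Helm"}
	fmt.Println(deploymentLabels("drone", cr))
	fmt.Println(podLabels("drone", cr))
}
```

Because the labels now come from functions, a unit test can also check invariants, for example that every pod label is also set on the Deployment.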
  Slide 18

    Short TTL:
    • Change frequently
    • Have a state controlled from the outside
    • Depend on external factors
    Long TTL:
    • Don’t change very often; some never change
    • Have a controllable “state”
  Slide 21

    So you are not like me, always keeping the k8s or AWS docs open to look up YAML formats?
  Slide 22

    Code PRO: a Go struct brings compile-time validation and inline docs (which means no unknown fields).
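To make the "no unknown fields" point concrete, here is a minimal sketch using only the standard library. `AgentSpec` and `decode` are hypothetical names, and `AgentSpec` is a heavily simplified stand-in for a real Kubernetes type. In Go code, a misspelled field in a struct literal does not compile. When the same data arrives as JSON (or YAML converted to JSON), a typo is silently dropped unless the decoder is told to reject unknown fields, which `encoding/json` supports through `Decoder.DisallowUnknownFields`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AgentSpec is a hypothetical, simplified stand-in for a resource spec.
// Its doc comments travel with the type, so editors and `go doc` show them.
type AgentSpec struct {
	// Replicas is the number of agent pods to run.
	Replicas int `json:"replicas"`
	// Image is the container image to deploy.
	Image string `json:"image"`
}

// decode parses raw JSON into an AgentSpec. When strict is true, any
// field that does not exist on AgentSpec is reported as an error.
func decode(raw string, strict bool) (AgentSpec, error) {
	var spec AgentSpec
	dec := json.NewDecoder(strings.NewReader(raw))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&spec)
	return spec, err
}

func main() {
	// "replica" is a typo for "replicas".
	typo := `{"replica": 3, "image": "drone/agent"}`

	// Lenient decoding ignores the typo: Replicas silently stays 0.
	spec, err := decode(typo, false)
	fmt.Println(spec.Replicas, err)

	// Strict decoding surfaces the mistake as an error.
	_, err = decode(typo, true)
	fmt.Println(err)

	// In Go code the same typo does not even compile:
	// _ = AgentSpec{Replica: 3} // unknown field Replica in struct literal
}
```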
  Slide 23

    Write friendly utilities to manipulate the resources

    // Create the pool of services you need to deploy
    apps := []runtime.Object{}
    apps = append(apps, etcd.New()...)
    apps = append(apps, monitor.New()...)
    apps = append(apps, kafka.New()...)

    // Declare the transformations you need to apply to the declared resources
    var ops []Op
    ops = append(ops, WithNoLimits(),
        WithMaxReplicas(1),
        WithReplicas(map[string]int{
            "etcd": 3,
        }),
        WithNoAffinity(),
        WithNoNodePorts())

    // Apply the transformations
    apps = Transform(ops...)(apps)

    // Deploy the transformed resources using the client SDK
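The slide leaves `Op` and `Transform` undefined. Below is a minimal sketch, assuming `Op` has the shape of the function returned by `WithReplicas` (it takes the declared resources and returns them) and that `Transform` runs the operations in the order given. `Object` and `StatefulSet` are simplified stand-ins for `runtime.Object` and `appsv1.StatefulSet` so the example builds with the standard library only, and `WithNoLimits` (clearing a stand-in `CPULimit` field) is a hypothetical reading of the op named on the slide.

```go
package main

import "fmt"

// Object stands in for runtime.Object so the sketch needs no Kubernetes modules.
type Object interface{}

// StatefulSet is a hypothetical stand-in for appsv1.StatefulSet with only
// the fields this sketch touches.
type StatefulSet struct {
	ServiceName string
	Replicas    *int32
	CPULimit    string // stand-in for the container resource limits
}

// Op receives the declared resources and returns them, possibly modified.
type Op func(objs []Object) []Object

// Transform composes ops into a single Op that applies them in order,
// so each op sees the output of the previous one.
func Transform(ops ...Op) Op {
	return func(objs []Object) []Object {
		for _, op := range ops {
			objs = op(objs)
		}
		return objs
	}
}

// WithNoLimits (hypothetical) clears the CPU limit on every StatefulSet.
func WithNoLimits() Op {
	return func(objs []Object) []Object {
		for _, o := range objs {
			if s, ok := o.(*StatefulSet); ok {
				s.CPULimit = ""
			}
		}
		return objs
	}
}

// WithReplicas follows the slide's implementation on the stand-in type.
func WithReplicas(serviceReplicas map[string]int) Op {
	return func(objs []Object) []Object {
		for _, o := range objs {
			if s, ok := o.(*StatefulSet); ok {
				for name, replicas := range serviceReplicas {
					if s.ServiceName == name {
						r := int32(replicas)
						s.Replicas = &r
					}
				}
			}
		}
		return objs
	}
}

func main() {
	apps := []Object{
		&StatefulSet{ServiceName: "etcd", CPULimit: "2"},
		&StatefulSet{ServiceName: "kafka", CPULimit: "4"},
	}
	ops := []Op{WithNoLimits(), WithReplicas(map[string]int{"etcd": 3})}
	apps = Transform(ops...)(apps)
	for _, o := range apps {
		s := o.(*StatefulSet)
		fmt.Println(s.ServiceName, s.Replicas != nil, s.CPULimit == "")
	}
}
```

Keeping every op as a small function over the whole resource list means new environments are just different `ops` slices over the same declared resources.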
  Slide 24

    Test what you expect: it avoids mistakes

    prod := []runtime.Object{}
    prod = append(prod, service1.New()...)
    prod = append(prod, service2.New()...)
    prod = append(prod, mysql.New()...)
    prod = append(prod, kafka.New()...)

    var ops []Op
    ops = append(ops, WithMaxReplicas(1),
        WithReplicas(map[string]int{
            "service1": 3,
            "service2": 21,
        }))
    apps := Transform(ops...)(prod)

    // You can write a unit test to enforce what you need in the
    // prod environment, or to avoid changing something that
    // should not be changed.
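A minimal sketch of such a test, reusing simplified stand-ins for `runtime.Object` and `appsv1.StatefulSet`. `WithMaxReplicas` here is a hypothetical implementation (cap every StatefulSet at the given count), `Transform` is assumed to apply ops in order, and `checkReplicas`, `prodReplicas`, and `newProd` are illustrative names with illustrative values. It shows why the test pays off: swapping the two ops silently caps `service1` and `service2` at one replica, and the check catches it. In a real repository `checkReplicas` would be called from a `func TestProd(t *testing.T)` in a `_test.go` file.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for runtime.Object and appsv1.StatefulSet.
type Object interface{}

type StatefulSet struct {
	ServiceName string
	Replicas    *int32
}

// Op and Transform as assumed for the slides: ops run in the order given.
type Op func(objs []Object) []Object

func Transform(ops ...Op) Op {
	return func(objs []Object) []Object {
		for _, op := range ops {
			objs = op(objs)
		}
		return objs
	}
}

// WithMaxReplicas (hypothetical) caps every StatefulSet at limit replicas;
// an unset replica count is set to limit.
func WithMaxReplicas(limit int) Op {
	return func(objs []Object) []Object {
		for _, o := range objs {
			if s, ok := o.(*StatefulSet); ok && (s.Replicas == nil || int(*s.Replicas) > limit) {
				r := int32(limit)
				s.Replicas = &r
			}
		}
		return objs
	}
}

// WithReplicas sets the replica count of the named services.
func WithReplicas(serviceReplicas map[string]int) Op {
	return func(objs []Object) []Object {
		for _, o := range objs {
			if s, ok := o.(*StatefulSet); ok {
				if n, found := serviceReplicas[s.ServiceName]; found {
					r := int32(n)
					s.Replicas = &r
				}
			}
		}
		return objs
	}
}

// prodReplicas is what we expect in production (illustrative values).
var prodReplicas = map[string]int32{"service1": 3, "service2": 21, "mysql": 1, "kafka": 1}

// checkReplicas reports the first StatefulSet whose replica count differs from want.
func checkReplicas(objs []Object, want map[string]int32) error {
	for _, o := range objs {
		s, ok := o.(*StatefulSet)
		if !ok {
			continue
		}
		w, found := want[s.ServiceName]
		if !found {
			continue
		}
		if s.Replicas == nil {
			return fmt.Errorf("%s: replicas not set, want %d", s.ServiceName, w)
		}
		if *s.Replicas != w {
			return fmt.Errorf("%s: got %d replicas, want %d", s.ServiceName, *s.Replicas, w)
		}
	}
	return nil
}

func newProd() []Object {
	return []Object{
		&StatefulSet{ServiceName: "service1"},
		&StatefulSet{ServiceName: "service2"},
		&StatefulSet{ServiceName: "mysql"},
		&StatefulSet{ServiceName: "kafka"},
	}
}

func main() {
	want := map[string]int{"service1": 3, "service2": 21}
	// Order from the slide: cap everything at 1, then raise the named services.
	good := Transform(WithMaxReplicas(1), WithReplicas(want))(newProd())
	fmt.Println(checkReplicas(good, prodReplicas))
	// Swapped order: the cap runs last and the check reports the mismatch.
	bad := Transform(WithReplicas(want), WithMaxReplicas(1))(newProd())
	fmt.Println(checkReplicas(bad, prodReplicas))
}
```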
  Slide 25

    // WithReplicas matches the serviceReplicas key to the StatefulSet service name
    // and sets the number of replicas to the value.
    func WithReplicas(serviceReplicas map[string]int) Op {
        return func(objs []runtime.Object) []runtime.Object {
            for _, o := range objs {
                switch app := o.(type) {
                case *appsv1.StatefulSet:
                    for name, replicas := range serviceReplicas {
                        if app.Spec.ServiceName == name {
                            r := int32(replicas)
                            app.Spec.Replicas = &r
                        }
                    }
                }
            }
            return objs
        }
    }
  Slide 26

    Develop your pipeline to be:
    • Monitorable
    • Reproducible
    • Easy to expand
    • Self-documenting
  Slide 28

    Kubernetes cluster in AWS / Labels and taints from EC2 tags
    Every autoscaling group can pass its tags through to the underlying EC2 instances. We wrote a Go application that uses a shared informer to listen for newly registered nodes.
  Slide 29

    Kubernetes cluster in AWS / Labels and taints from EC2 tags
    When a new EC2 instance joins the cluster, we catch the event and, if the instance carries specific tags, we apply them as taints or labels on the corresponding node.
  Slide 30

    Kubernetes cluster in AWS / Labels and taints from EC2 tags
    EC2 tag: kubernetes/aws-labeler/label/role=ci → K8S label: awslabeler.influxdata.com/role=ci
    EC2 tag: kubernetes/aws-labeler/taint/role=ci:NoSchedule → K8S taint: awslabeler.influxdata.com/role=ci:NoSchedule
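The tag convention above boils down to a pure function, which is easy to unit test apart from the informer. A minimal sketch: the tag prefixes and the `awslabeler.influxdata.com/` domain come from the slide, while `fromTags`, the simplified `Taint` type, and the assumption that a taint tag's value has the form `<value>:<effect>` are illustrative, not the actual labeler code. In the real application the result would be applied to the Node object received from the shared informer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag prefixes and label domain as shown on the slide.
const (
	labelPrefix = "kubernetes/aws-labeler/label/"
	taintPrefix = "kubernetes/aws-labeler/taint/"
	k8sDomain   = "awslabeler.influxdata.com/"
)

// Taint is a simplified stand-in for corev1.Taint.
type Taint struct {
	Key, Value, Effect string
}

// The taint effects Kubernetes accepts.
var validEffects = map[string]bool{"NoSchedule": true, "PreferNoSchedule": true, "NoExecute": true}

// fromTags translates the EC2 tags of a new node into Kubernetes labels
// and taints. Tags without one of the two prefixes are ignored.
func fromTags(tags map[string]string) (map[string]string, []Taint, error) {
	labels := map[string]string{}
	var taints []Taint
	for k, v := range tags {
		switch {
		case strings.HasPrefix(k, labelPrefix):
			labels[k8sDomain+strings.TrimPrefix(k, labelPrefix)] = v
		case strings.HasPrefix(k, taintPrefix):
			// Assumed value format: "<value>:<effect>", e.g. "ci:NoSchedule".
			value, effect, ok := strings.Cut(v, ":")
			if !ok || !validEffects[effect] {
				return nil, nil, fmt.Errorf("tag %q: want <value>:<effect>, got %q", k, v)
			}
			taints = append(taints, Taint{Key: k8sDomain + strings.TrimPrefix(k, taintPrefix), Value: value, Effect: effect})
		}
	}
	// Map iteration order is random; sort the taints for deterministic output.
	sort.Slice(taints, func(i, j int) bool { return taints[i].Key < taints[j].Key })
	return labels, taints, nil
}

func main() {
	tags := map[string]string{
		"kubernetes/aws-labeler/label/role": "ci",
		"kubernetes/aws-labeler/taint/role": "ci:NoSchedule",
		"Name":                              "worker-1", // unrelated tag, ignored
	}
	labels, taints, err := fromTags(tags)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels)
	fmt.Println(taints)
}
```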
  Slide 31

    Kubernetes cluster in AWS / Long TTL
    1. Security groups
    2. VPC
    3. Subnets
    4. Route53 zones
    5. AMI
  Slide 32

    AWS Autoscaling Group / Short TTL
    1. AWS Autoscaling Groups guarantee the expected number of workers.
    2. Every Autoscaling Group can run a different Kubernetes version and can have different labels, instance types, security groups, AMIs, and so on.
    3. Autoscaling Groups are provided as a service by AWS: no maintenance at all.
  Slide 34

    Kubernetes cluster in AWS / Master
    1. The master is provisioned manually.
    2. We use a Kubernetes CronJob to back up the etcd cluster.
    3. We are practical folks. We know we can do better, but for now we are happy with what we have.
  Slide 36

    Split your provisioning into layers: use Packer or LinuxKit to build a base image, and UserData to specialize it.
  Slide 37

    CoreOS supports Ignition (OK, this is JSON... but it is not too much JSON!)

    {
      "ignition": {"version": "2.1.0"},
      "storage": {
        "directories": [
          {"filesystem": "root", "path": "/etc/kubernetes/manifests"}
        ],
        "files": [{
          "filesystem": "root",
          "path": "/etc/ssl/AmazonRootCA1.pem",
          "contents": {"source": "https://www.amazontrust.com/repository/AmazonRootCA1.pem"}
        }]
      },
      "systemd": { … }
    }
  Slide 39

    Wrap-up
    • JSON and YAML are not the problem
    • Long TTL vs Short TTL
    • Use the API!
    • Patterns and methodology to write good infra as code
    • Testing to enforce safety
    • Inline documentation
    • Code review
    • Immutable infrastructure
  Slide 40

    Credits and links
    • https://gianarb.it/blog/infrastructure-as-real-code
    • https://twitter.com/danveloper/status/1078870433246662656
    • https://blog.couchbase.com/kubernetes-operators-game-changer/
    • https://gianarb.it/blog/reactive-planning-is-a-cloud-native-pattern
    • https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html
  Slide 41

    Reach out: @gianarb • gianluca@influxdb.com • https://gianarb.it
    Any questions?