CfgMgmtCamp - Infrastructure as code should contain code

These days “infrastructure as code” means HCL, YAML, JSON.
I will never buy that JSON is a programming language.
CloudFormation tries hard, mixing JSON with keywords that become functions at runtime, and calls that a maintainable approach.
Helm pushes hard, claiming that YAML with a few parameters that are translated into variables at runtime is a flexible and maintainable approach.
Infrastructure as code means that you are supposed to use a programming language, because otherwise it won’t work.
Some YAML evangelists will tell you that a “human-friendly data serialization standard” is better than code because it keeps you from writing the weird and wrong code that you are not supposed to write when doing infrastructure.
The truth is that you need to code better! At InfluxData we know that, and this talk is about our journey moving from YAML to Go to manage our Kubernetes cluster.
What we faced and why we think it is way better!

Gianluca Arbezzano

February 05, 2019

Transcript

  1. Gianluca Arbezzano Site Reliability Engineer @InfluxData • https://gianarb.it • @gianarb

    What I like: • I make dirty hacks that look awesome • I grow my vegetables • Travel for fun and work
  2. © 2018 InfluxData. All rights reserved. 10 @gianarb - [email protected]

    Automation goals
    • The ability to run repeatable tasks with a single click or continuously.
    • The possibility to build pipelines using products like Kubernetes, AWS, Jenkins.
    • A friendly UX to manage “ops”.
    Image credit: Pixabay
  3.

        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: {{ template "drone.fullname" . }}-agent
          labels:
            app: {{ template "drone.name" . }}
            chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
            release: "{{ .Release.Name }}"
            heritage: "{{ .Release.Service }}"
            component: agent
        spec:
          replicas: {{ .Values.agent.replicas }}
          template:
            metadata:
              annotations:
                checksum/secrets: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
        {{- if .Values.agent.annotations }}
        {{ toYaml .Values.agent.annotations | indent 8 }}
        {{- end }}
              labels:
                app: {{ template "drone.name" . }}
                release: "{{ .Release.Name }}"
                component: agent
  4. Short TTL

    • Change frequently
    • They have a state controllable from outside
    • Depend on external factors

    Long TTL
    • Don’t change very often
    • They never change
    • They have a controllable “state”
  5. So you are not like me, always with the Kubernetes or AWS docs open to look up the YAML format?
  6. Code PRO: a Go struct brings compile-time validation and inline docs (which means no unknown fields)
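A minimal sketch of that claim, not taken from the talk. `Deployment` and `Spec` are simplified, hypothetical stand-ins for the real Kubernetes API structs, and `NewAgent` is an illustrative name. The field comments play the role of inline docs, and a misspelled field is rejected by the compiler rather than at deploy time.

```go
package main

import "fmt"

// Spec and Deployment are simplified, hypothetical stand-ins for the
// Kubernetes API types; they are assumptions made for this example.
type Spec struct {
	// Replicas is the desired number of pods.
	Replicas int32
}

type Deployment struct {
	// Name identifies the deployment.
	Name string
	// Spec holds the desired state.
	Spec Spec
}

// NewAgent returns the declaration of a hypothetical "agent" deployment.
func NewAgent() Deployment {
	return Deployment{
		Name: "drone-agent",
		Spec: Spec{Replicas: 2},
		// Uncommenting the next line stops the build with an
		// "unknown field" error instead of failing at deploy time:
		// Replica: 2,
	}
}

func main() {
	d := NewAgent()
	fmt.Printf("%s replicas=%d\n", d.Name, d.Spec.Replicas)
}
```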
  7. Write friendly utilities to manipulate the resources

        // Create the pool of services you need to deploy
        apps := []runtime.Object{}
        apps = append(apps, etcd.New()...)
        apps = append(apps, monitor.New()...)
        apps = append(apps, kafka.New()...)

        // Declare the transformations you need to apply to the declared resources
        ops := []Op{}
        ops = append(ops,
            WithNoLimits(),
            WithMaxReplicas(1),
            WithReplicas(map[string]int{
                "etcd": 3,
            }),
            WithNoAffinity(),
            WithNoNodePorts())

        // Apply the transformations
        apps = Transform(ops...)(apps)

        // Deploy the transformed resources using client-sdk
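The slide uses `Op` and `Transform` without defining them. One plausible shape, sketched here under assumptions: `Resource` is a simplified, hypothetical stand-in for `runtime.Object`, and the bodies of `Transform`, `WithMaxReplicas` and `WithReplicas` are guesses at the behavior their names suggest, not the InfluxData implementation.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for runtime.Object.
type Resource struct {
	Name     string
	Replicas int32
}

// Op transforms a list of resources and returns the result.
type Op func([]*Resource) []*Resource

// Transform chains the given ops into a single Op, applied left to right.
func Transform(ops ...Op) Op {
	return func(rs []*Resource) []*Resource {
		for _, op := range ops {
			rs = op(rs)
		}
		return rs
	}
}

// WithMaxReplicas caps every resource at limit replicas (assumed semantics).
func WithMaxReplicas(limit int32) Op {
	return func(rs []*Resource) []*Resource {
		for _, r := range rs {
			if r.Replicas > limit {
				r.Replicas = limit
			}
		}
		return rs
	}
}

// WithReplicas sets the replica count of the resources named in byName.
func WithReplicas(byName map[string]int32) Op {
	return func(rs []*Resource) []*Resource {
		for _, r := range rs {
			if n, ok := byName[r.Name]; ok {
				r.Replicas = n
			}
		}
		return rs
	}
}

func main() {
	apps := []*Resource{{Name: "etcd", Replicas: 5}, {Name: "kafka", Replicas: 4}}
	apps = Transform(
		WithMaxReplicas(1),
		WithReplicas(map[string]int32{"etcd": 3}),
	)(apps)
	for _, a := range apps {
		fmt.Println(a.Name, a.Replicas)
	}
}
```

Because the ops run in order, a later op can override an earlier one: here the cap of one replica is applied first, then etcd is explicitly raised to three.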
  8. Testing what you expect avoids mistakes

        prod := []runtime.Object{}
        prod = append(prod, service1.New()...)
        prod = append(prod, service2.New()...)
        prod = append(prod, mysql.New()...)
        prod = append(prod, kafka.New()...)

        ops := []Op{}
        ops = append(ops,
            WithMaxReplicas(1),
            WithReplicas(map[string]int{
                "service1": 3,
                "service2": 21,
            }))
        apps := Transform(ops...)(prod)

        // You can write a unit test to enforce what you need in the
        // prod environment, or to avoid changing something that
        // should not be changed.
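A sketch of the kind of check such a unit test could run. Everything here is an assumption made for illustration: `Resource` is a simplified, hypothetical stand-in for a Kubernetes object and `CheckReplicas` is an invented helper, not part of the talk's code.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for a Kubernetes object.
type Resource struct {
	Name     string
	Replicas int32
}

// CheckReplicas returns one error for every resource whose replica count
// differs from the expected value. Resources missing from want are ignored.
func CheckReplicas(rs []Resource, want map[string]int32) []error {
	var errs []error
	for _, r := range rs {
		if n, ok := want[r.Name]; ok && r.Replicas != n {
			errs = append(errs, fmt.Errorf("%s: got %d replicas, want %d", r.Name, r.Replicas, n))
		}
	}
	return errs
}

func main() {
	// Illustrative data: service2 was changed by mistake.
	prod := []Resource{{Name: "service1", Replicas: 3}, {Name: "service2", Replicas: 2}}
	want := map[string]int32{"service1": 3, "service2": 21}
	for _, err := range CheckReplicas(prod, want) {
		fmt.Println("invariant violated:", err)
	}
}
```

In a `_test.go` file, a test function could call `t.Error` for each returned error, so a change that breaks the production invariants fails in CI before it reaches the cluster.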
  9.

        // WithReplicas matches the serviceReplicas key to the StatefulSet service name
        // and sets the number of replicas to the value.
        func WithReplicas(serviceReplicas map[string]int) Op {
            return func(objs []runtime.Object) []runtime.Object {
                for _, o := range objs {
                    switch app := o.(type) {
                    case *appsv1.StatefulSet:
                        for name, replicas := range serviceReplicas {
                            if app.Spec.ServiceName == name {
                                r := int32(replicas)
                                app.Spec.Replicas = &r
                            }
                        }
                    }
                }
                return objs
            }
        }
  10. Develop your pipeline

    • Monitorable
    • Reproducible
    • Easy to expand
    • Self-documenting
  11. Kubernetes cluster in AWS / Labels and taints from EC2 tags

    Every autoscaling group can pass its tags through to the underlying EC2 instances. We wrote a Go application that uses a shared informer to listen for newly registered nodes.
  12. Kubernetes cluster in AWS / Labels and taints from EC2 tags

    When a new EC2 instance joins the cluster we catch the event and, if the instance carries specific tags, we apply them as taints or labels on the Kubernetes node.
  13. Kubernetes cluster in AWS / Labels and taints from EC2 tags

    EC2 tag: kubernetes/aws-labeler/label/role=ci
    K8s label: awslabeler.influxdata.com/role=ci

    EC2 tag: kubernetes/aws-labeler/taint/role=ci:NoSchedule
    K8s taint: awslabeler.influxdata.com/role=ci:NoSchedule
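The mapping on this slide can be sketched as a small pure function. The slide only shows input and output pairs, so the parsing rules below are assumptions: the tag key is taken to be everything before `=`, the value everything after it, and a taint effect is assumed to follow the last colon of the value. `Mapping` and `FromTag` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	tagPrefix  = "kubernetes/aws-labeler/"    // prefix of the EC2 tag keys shown on the slide
	nodePrefix = "awslabeler.influxdata.com/" // prefix of the resulting label or taint key
)

// Mapping is a hypothetical result type; Kind is "label" or "taint".
type Mapping struct {
	Kind   string
	Key    string
	Value  string
	Effect string // only set for taints, for example "NoSchedule"
}

// FromTag converts one EC2 tag into a label or taint mapping.
// The boolean is false when the tag does not follow the convention.
func FromTag(key, value string) (Mapping, bool) {
	if !strings.HasPrefix(key, tagPrefix) {
		return Mapping{}, false
	}
	parts := strings.SplitN(strings.TrimPrefix(key, tagPrefix), "/", 2)
	if len(parts) != 2 || parts[1] == "" {
		return Mapping{}, false
	}
	m := Mapping{Kind: parts[0], Key: nodePrefix + parts[1], Value: value}
	switch parts[0] {
	case "label":
		return m, true
	case "taint":
		// Assumed format: "<value>:<effect>", as in "ci:NoSchedule".
		i := strings.LastIndex(value, ":")
		if i < 0 {
			return Mapping{}, false
		}
		m.Value, m.Effect = value[:i], value[i+1:]
		return m, true
	}
	return Mapping{}, false
}

func main() {
	tags := [][2]string{
		{"kubernetes/aws-labeler/label/role", "ci"},
		{"kubernetes/aws-labeler/taint/role", "ci:NoSchedule"},
		{"Name", "worker-1"}, // unrelated, made-up tag that should be skipped
	}
	for _, t := range tags {
		if m, ok := FromTag(t[0], t[1]); ok {
			fmt.Printf("%s %s=%s %s\n", m.Kind, m.Key, m.Value, m.Effect)
		}
	}
}
```

Keeping the conversion free of AWS and Kubernetes clients makes it easy to unit test; the informer callback would only need to fetch the tags, call the function, and patch the node.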
  14. Kubernetes cluster in AWS / Long TTL

    1. Security groups
    2. VPC
    3. Subnets
    4. Route53 zones
    5. AMI
  15. AWS Autoscaling Group / Short TTL

    1. AWS Autoscaling Groups guarantee the expected number of workers.
    2. Every Autoscaling Group can run a different Kubernetes version and can have different labels, instance types, security groups, AMIs and so on.
    3. Autoscaling Groups are provided as a service by AWS. No maintenance at all.
  16. Kubernetes cluster in AWS / Master

    1. The master is manually provisioned.
    2. We use a Kubernetes CronJob to back up the etcd cluster.
    3. We are practical folks. We know we can do better, but for now we are happy with what we have.
  17. Split your provisioning into layers, using Packer or LinuxKit to build a base image and UserData to specialize it
  18. CoreOS supports Ignition (OK, this is JSON... but it is not too much JSON!)

        {
          "ignition": {"version": "2.1.0"},
          "storage": {
            "directories": [{"filesystem": "root", "path": "/etc/kubernetes/manifests"}],
            "files": [{
              "filesystem": "root",
              "path": "/etc/ssl/AmazonRootCA1.pem",
              "contents": {"source": "https://www.amazontrust.com/repository/AmazonRootCA1.pem"}
            }]
          },
          "systemd": { … }
        }
  19. Wrap up

    • JSON and YAML are not the problem
    • Long TTL vs Short TTL
    • Use the API!
    • Patterns and methodology to write good infra as code
    • Testing to enforce safety
    • Inline documentation
    • Code Review
    • Immutable Infrastructure
  20. Credits and links

    • https://gianarb.it/blog/infrastructure-as-real-code
    • https://twitter.com/danveloper/status/1078870433246662656
    • https://blog.couchbase.com/kubernetes-operators-game-changer/
    • https://gianarb.it/blog/reactive-planning-is-a-cloud-native-pattern
    • https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html