A Whirlwind Tour of Infra

In this talk, we cover how technology shapes architecture and how new developments in technology advance it.

Michael Hamrah

January 22, 2019

Transcript

  1. A Whirlwind Tour of Infra, or A Brief History of Cloud, or Why Everything Is Hard
  2. A Story of Binaries and Processes

  3. A binary is just data on disk: a set of instructions.

    A process is a binary loaded into memory with a PID (loaded by another process!). CPUs execute the instructions, which use memory and I/O via syscalls.
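To make "loaded by another process" concrete, here is a minimal Python sketch (Python is only the illustration language; `spawn_child` is a hypothetical helper). The running interpreter asks the OS to start a second copy of its own binary, and the child reports its own PID and its parent's PID.

```python
import os
import subprocess
import sys


def spawn_child():
    """Start a child Python process; return (our PID, child's PID, child's parent PID)."""
    # The child is just the same interpreter binary, loaded into memory again by us.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid(), os.getppid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    child_pid, child_ppid = map(int, result.stdout.split())
    return os.getpid(), child_pid, child_ppid


if __name__ == "__main__":
    parent, child, childs_parent = spawn_child()
    print(f"parent={parent} child={child} child's parent={childs_parent}")
```

Under the hood, `subprocess` relies on syscalls such as fork/exec or posix_spawn on Linux; the exact mechanism varies by platform and Python version.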
  4. ELF and PE (see https://news.ycombinator.com/item?id=8029564)

    ❯ ldd dotnet
        linux-vdso.so.1 (0x00007ffe2979b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5b5e4d0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5b5e4a
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5b5e
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5b5e198000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5b5e17e000
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b5df94000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5b5e4f5000)

    Linked libraries are just other files loaded into memory!
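ELF (Linux) and PE (Windows) binaries can be told apart by their first bytes. The sketch below only checks magic numbers, assuming the standard ELF `e_ident` layout and the PE `e_lfanew` pointer at offset 0x3C; `binary_format` and `file_format` are hypothetical names, and real tools such as `file` or `readelf` do far more.

```python
import struct


def binary_format(header: bytes) -> str:
    """Classify an executable from its first bytes (magic numbers only)."""
    if header[:4] == b"\x7fELF":
        # ELF: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit binaries
        ei_class = header[4] if len(header) > 4 else 0
        return {1: "ELF32", 2: "ELF64"}.get(ei_class, "ELF (unknown class)")
    if header[:2] == b"MZ":
        # PE: a DOS "MZ" stub whose 4-byte field at 0x3C holds the offset of "PE\0\0"
        if len(header) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
            if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return "PE"
        return "MZ (no PE header found)"
    return "unknown"


def file_format(path: str) -> str:
    # A binary is just data on disk: read the first few KiB and inspect them
    with open(path, "rb") as f:
        return binary_format(f.read(4096))


if __name__ == "__main__":
    import sys
    print(file_format(sys.executable))
```

On macOS the interpreter is a Mach-O binary, so this sketch would report "unknown" there.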
  5. A scripting language starts with a binary… that loads high-level instructions and interprets, optimizes, and executes them:

    myscript.sh
    node index.js
    ruby app.rb
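When a file like myscript.sh is executed directly, the kernel reads its `#!` line to decide which binary to start. A rough sketch of the Linux behaviour (other Unix-like systems split the line differently); `shebang_argv` is a hypothetical helper:

```python
from typing import List, Optional


def shebang_argv(script_path: str, first_line: str) -> Optional[List[str]]:
    """Approximate how Linux turns a '#!' line into the argv of the interpreter binary."""
    if not first_line.startswith("#!"):
        return None  # no shebang: the kernel will not pick an interpreter
    body = first_line[2:].strip()
    if not body:
        return None
    # Linux passes the interpreter path, then at most ONE optional argument
    # holding the rest of the line, then the path of the script itself.
    parts = body.split(None, 1)
    return parts + [script_path]


if __name__ == "__main__":
    print(shebang_argv("index.js", "#!/usr/bin/env node"))
```

This is why `#!/usr/bin/env node` works: the binary that actually starts is `env`, which then finds and runs `node`.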
  6. Sometimes scripts are compiled to an intermediate language (as a library) and then interpreted, optimized, and executed…

    dotnet myapp.dll
    java -jar myapp.jar
    wasm myapp.wasm

    Still requires linked libraries!
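CPython follows the same pattern: it compiles source into bytecode (cached in .pyc files) that the interpreter binary then executes. This sketch uses Python as a stand-in for the .NET/JVM case; `compile_and_run` is a hypothetical helper built on the standard `compile`, `dis`, and `exec`.

```python
import dis


def compile_and_run(source: str):
    """Compile source to CPython bytecode, list its top-level opcodes, then execute it."""
    code = compile(source, "<intermediate>", "exec")  # source text -> code object (bytecode)
    opnames = [ins.opname for ins in dis.get_instructions(code)]
    namespace = {}
    exec(code, namespace)  # the interpreter executes the bytecode, not the source
    return namespace, opnames


if __name__ == "__main__":
    ns, ops = compile_and_run("total = sum(x * 2 for x in range(5))")
    print(ns["total"], ops)
```

The exact opcode names differ between Python versions, just as JIT behaviour differs between runtime versions.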
  7. Sometimes you can have just a binary… statically compiled… with its dependencies built in:

    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -ldflags '-w -extldflags "-static"' -o mybin *.go
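One way to see the difference between the `ldd dotnet` listing above and a static Go binary: a dynamically linked ELF carries a PT_INTERP program header naming its loader (the /lib64/ld-linux-x86-64.so.2 entry in that listing), while a fully static one does not. A minimal sketch assuming a 64-bit little-endian ELF; `needs_dynamic_loader` and `is_static` are hypothetical names.

```python
import struct

PT_INTERP = 3  # program header type naming the dynamic loader


def needs_dynamic_loader(elf: bytes) -> bool:
    """True if a 64-bit little-endian ELF asks the kernel to start a dynamic loader first."""
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    (phoff,) = struct.unpack_from("<Q", elf, 0x20)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", elf, 0x36)  # e_phentsize, e_phnum
    for i in range(phnum):
        (p_type,) = struct.unpack_from("<I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def is_static(path: str) -> bool:
    with open(path, "rb") as f:
        return not needs_dynamic_loader(f.read())
```

On a real system, `file mybin` reports "statically linked" for the same kind of binary.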
  8. In The Beginning

    • Hardware Ruled The World
    • Infrastructure Was Static
    • Processes Were Bad
    • Vendors Were Dictators
    • Things Took Months/Years
  9. (Diagram) F5 • IIS • Apache • DB • SAN/RAID

    • Apache with: mod_perl, mod_php, Phusion Passenger
    • IIS
    • Java (+Apache) with: JBoss, WebSphere, Tomcat
    • VMware
    • Public IP, Private DNS
  10. Then Came Google (and AWS)

    And the “We’re not Google” arguments
  11. Hardware -> Software

    • HAProxy over F5
    • Hadoop: distributed data
    • Nginx over Apache (light over heavy)
  12. Servers Were Ephemeral

    Like clouds. Dotting a blue sky. Moving and morphing with the wind.
  13. But there were still problems

    • Did not solve shared libraries
    • Did not solve configuration
    • Routing had limitations
    • Did not solve deployment
  14. Immutable Infrastructure (my first public talk!)

    • Use Packer
    • Bake images
    • Launch with automation
    • CFN, ASGs, ELBs, R53, S3, EBS
    • Jenkins, a UI for bash
  15. And all was good in the world. But darkness lurked...

    • Things were slow
    • Hard to debug
    • Bespoke automation
  16. And Then There Was Docker

    Which is just a binary. With its linked libraries. Bundled in a tarball. With better automation. And easier process management.
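Image layers are, at bottom, tar archives of files. A minimal sketch with Python's `tarfile`, assuming uncompressed layers and ignoring image manifests, configs, and whiteout files; `make_layer` and `list_layer` are hypothetical helpers, and the file contents are placeholders rather than real binaries.

```python
import io
import tarfile


def make_layer(files):
    """Pack a {path: bytes} mapping into an uncompressed tar archive, like one image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path.lstrip("/"))  # layer paths are relative to /
            info.size = len(data)
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_layer(layer: bytes):
    """Return the file names stored in a layer, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        return tar.getnames()


if __name__ == "__main__":
    layer = make_layer({
        "/usr/local/bin/mybin": b"\x7fELF...",      # placeholder for the binary
        "/lib/x86_64-linux-gnu/libc.so.6": b"...",  # placeholder for a linked library
    })
    print(len(layer), list_layer(layer))
```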
  17. Containers Everywhere

    Architecture rethought: 3-tier, fat process -> microservices

    Back to square one… CoreOS, Mesosphere, Kubernetes, Docker Swarm, Jaeger, Istio...
    • Lots of processes
    • Lots of ad-hoc automation
  18. Requirements

    • Run code

    How how how?!?!?!!?

  19. Requirements

    • We need servers (maybe not?!?!)
      ◦ We need to run processes
    • We need to configure these things
    • We need to install binaries on these ‘things’
    • We need to configure these binaries
    • We need to run these binaries as processes
    • We need to secure and manage these things and processes
    • We need to route traffic to these processes
    • We need to update these things
    • We need to update these processes

    How how how?!?!?!!?
  20. (Diagram)

    • Observability
    • Continuous Integration
    • Continuous Delivery/Deployment
    • Runtime (Kubernetes/Docker/App Servers)
    • Server Infrastructure (AWS)
    • Operations
    • Configuration Management
    • Environment Management

    Concerns: Foundational Infrastructure • Ease of Development, Testing and Delivery • Meeting Production SLOs • Security
  21. There is no right answer, only various degrees of wrong.

    We experiment, learn, decide, act, rinse, repeat and improve!
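  22. (Diagram) The same layers, with our tools:

    • Observability: SignalFx, Logz.io, New Relic
    • Continuous Integration: Jenkins
    • Continuous Delivery/Deployment: Spinnaker/Octopus
    • Runtime: Kubernetes + Istio
    • Server Infrastructure: AWS
    • Operations: Spinnaker, kubectl, Kubernetes, ad-hoc
    • Configuration Management: Kubernetes/Octopus
    • Environment Management: Terraform
    • Security: IAM, RBAC, Networking, VPN, Secrets

    Concerns: Foundational Infrastructure • Ease of Development, Testing and Delivery • Meeting Production SLOs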
  23. Welcome to Namely Infra

    AWS (Virginia): Production 10.50.0.0/16 • Int 10.52.0.0/16 • Stage 10.51.0.0/16 • VendorX 10.53.0.0/16 • Ops 10.54.0.0/16 • Portal IT 172.16.0.0/16

    An environment is:
    • An AWS account and permissions
    • A VPC
    • Route tables
    • Everything required to run Namely
    • The ability to deploy components
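AWS VPC peering does not support VPCs with overlapping CIDR blocks, so environments that need to peer must get disjoint ranges. A quick check of the ranges on this slide using Python's `ipaddress` module (`overlapping_pairs` is a hypothetical helper):

```python
import ipaddress
from itertools import combinations

# The environment CIDR blocks listed on the slide
ENVIRONMENT_CIDRS = [
    "10.50.0.0/16", "10.51.0.0/16", "10.52.0.0/16",
    "10.53.0.0/16", "10.54.0.0/16", "172.16.0.0/16",
]


def overlapping_pairs(cidrs):
    """Return every pair of CIDR blocks that overlap (such VPCs could not be peered)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


if __name__ == "__main__":
    print(overlapping_pairs(ENVIRONMENT_CIDRS) or "no overlaps")
```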
  24. Easy CIDR: A little lesson in IPs

    10.0.0.0/8 and 172.16.0.0/12 (private address ranges)

    An IPv4 address is 32 bits: that’s 32 1’s and 0’s, e.g. 11110000 11110000 11110000 11110000.
    The /8 or /16 suffix denotes how many leading bits ‘to keep’ (counting from the left, most significant bits first) as the network prefix. This determines how IPs are allocated.
    A route table directs an IP range to a target.
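The same ideas in Python's `ipaddress` module: `mask_bits` shows which bits a prefix keeps, and `route` picks the most specific matching route, which is how AWS route tables resolve overlapping entries (longest prefix match). The route table and its targets are made up for illustration.

```python
import ipaddress

# Hypothetical route table: destination CIDR -> target (target names are illustrative)
ROUTES = {
    "10.50.0.0/16": "local",
    "10.54.0.0/16": "pcx-ops",
    "10.0.0.0/8": "vpn",
    "0.0.0.0/0": "igw",
}


def mask_bits(prefixlen: int) -> str:
    """Show which of the 32 bits a /prefixlen 'keeps' as the network part."""
    bits = "1" * prefixlen + "0" * (32 - prefixlen)
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


def route(ip: str, routes=ROUTES) -> str:
    """Return the target of the most specific (longest-prefix) route containing ip."""
    addr = ipaddress.ip_address(ip)
    matches = [ipaddress.ip_network(c) for c in routes if addr in ipaddress.ip_network(c)]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


if __name__ == "__main__":
    print(mask_bits(8))
    print(route("10.54.3.7"), route("8.8.8.8"))
```

A /8 keeps 8 bits, leaving 24 for hosts (2^24 addresses); a /12 leaves 20 (2^20 addresses).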
  25. Environment Basics

    (Diagram) A VPC with Peering, an Internet Gateway, public ELB(s), Server1, Server2, Server3, and Jumpboxes, plus:
    • A bunch of RDS
    • Alotta ElastiCache
    • Some Aurora
    • CloudFront, S3
    • Kubernetes: 15 workers, 3 masters, 5 etcd
  26. Not so simple, though

    (Diagram)
    • Private subnets: Zone 1a @ 10.100.12.0/22, Zone 1c @ 10.100.28.0/22, Zone 1d @ 10.100.44.0/22, with Nodes in each zone
    • Public subnets: Zone 1a @ 10.100.10.0/22, Zone 1c @ 10.100.32.0/22, Zone 1d @ 10.100.40.0/22
    • Public ELBs with public IPs; private ELBs with subnet IPs
    • ENIs and SGs (security groups)

    Try using the dig command to find out how DNS names are mapped to IPs.
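Once `dig` tells you which IP a name resolves to, a few lines of Python can tell you which zone's private subnet it lives in. The sketch also does the capacity arithmetic: AWS reserves the first four and the last address of every subnet, so a /22 (1,024 addresses) leaves 1,019 usable. `zone_for` and `usable_ips` are hypothetical helpers.

```python
import ipaddress

# Private subnets from the diagram, keyed by availability-zone suffix
PRIVATE_SUBNETS = {"1a": "10.100.12.0/22", "1c": "10.100.28.0/22", "1d": "10.100.44.0/22"}
AWS_RESERVED_PER_SUBNET = 5  # the first four addresses and the last one


def zone_for(ip: str):
    """Return the zone whose private subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for zone, cidr in PRIVATE_SUBNETS.items():
        if addr in ipaddress.ip_network(cidr):
            return zone
    return None


def usable_ips(cidr: str) -> int:
    """Addresses left for nodes, ENIs, and ELBs after AWS's reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(zone_for("10.100.13.40"), sum(usable_ips(c) for c in PRIVATE_SUBNETS.values()))
```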
  27. Automation #1: Terraform (https://github.com/namely/tf2)

  28. Kubernetes

    • Allows us to run containers
      ◦ A container is just a binary, with its dependencies
    • Abstracts node-management issues for processes
      ◦ IP addresses, ports, security, quotas
      ◦ Also in this space: ECS, Docker Swarm, GAE
    • Built-in config, secret, service discovery, and scaling management
    • Automates some cloud infra, like load balancers
    • Foundational-focused API (doesn’t solve some things)
      ◦ Allows for extensibility
  29. Kubernetes Cluster

    (Diagram)
    • Etcd0–Etcd4: state is stored here
    • Master0–Master2: does most of the k8s work
    • Worker0, Worker1, Worker2, Worker3, ... Worker15: where stuff runs
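Why five etcd members? etcd uses Raft, so writes need a majority (quorum) of members to agree. A tiny sketch of the arithmetic; `quorum` and `failures_tolerated` are hypothetical helper names.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster such as etcd."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a quorum still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} members: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Five members tolerate two failures; a fourth member adds no tolerance over three, which is why odd cluster sizes are the norm.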
  30. EKS Cluster

    (Diagram) Worker0, Worker1, Worker2, Worker3, ... Worker15: where stuff runs
    • Better networking
  31. Kubernetes Pods

    • The smallest unit that runs on a cluster
    • Has a name and labels
    • Has one unique private IP address
    • Has one or more containers
    • All containers in a pod share networking and storage
      ◦ They can “see” each other on localhost

    Pod
      Name: slug-bcddbcd8-1sa3a
      Labels:
      • app: slug
      Containers:
      • Slug
        ◦ image: namely/slug
        ◦ ports: 50051
        ◦ env
      • Istio

    A ‘sidecar’ is simply another container in a pod, usually auto-injected.
  32. Kubernetes Networking

    Kubernetes creates a private address space for pods and handles routing across nodes.

    (Diagram)
    • Worker0 (10.250.25.112): Kubelet, Pod0 10.2.112.8, Pod1 10.2.112.9
    • Worker1 (10.250.35.112): Kubelet, Pod0 10.2.180.8, Pod1 10.2.180.9
    • Worker2 (10.250.5.112): Kubelet, Pod0 10.2.98.8, Pod1 10.2.98.9
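One common way to handle routing across nodes is to give each node its own pod range (Kubernetes records it as the node's `podCIDR`) and route each range to the node that owns it, as flannel's host-gw backend does. A sketch assuming a /24 per node and pairing the node and pod IPs from the diagram in order; `POD_ROUTES` and `next_hop` are hypothetical.

```python
import ipaddress

# Assumed per-node pod ranges -> node IPs (pairing taken from the diagram, in order)
POD_ROUTES = {
    "10.2.112.0/24": "10.250.25.112",  # Worker0
    "10.2.180.0/24": "10.250.35.112",  # Worker1
    "10.2.98.0/24": "10.250.5.112",    # Worker2
}


def next_hop(pod_ip: str):
    """Return the node IP that traffic for pod_ip is sent to, or None if no range matches."""
    addr = ipaddress.ip_address(pod_ip)
    for pod_cidr, node_ip in POD_ROUTES.items():
        if addr in ipaddress.ip_network(pod_cidr):
            return node_ip
    return None


if __name__ == "__main__":
    print(next_hop("10.2.180.9"))
```

Other network plugins use overlays (VXLAN) or, on EKS, real VPC IPs for pods instead of per-node routes.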
  33. Kubernetes ReplicaSets

    • ReplicaSets create Pods
    • They specify how many instances of a pod should be running
    • They represent desired state
      ◦ Kubernetes will try to get the current state to the desired state
    • You usually don’t deal with these, though
    • But they are why Pods are re-created after you delete them!

    ReplicaSet
      Name: slug-bcddbcd8-1sa3a
      Labels:
      • app: slug
      Spec:
      • Replicas: 2
      • Selector
        ◦ Labels
      • Template
        ◦ Same as Pod!
    (Diagram: the ReplicaSet owns two Pods.)
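The "desired state" idea is a reconcile loop: count the pods that match the selector and create or delete the difference. A simplified sketch; `reconcile` and `matches` are hypothetical, and the real controller also tracks pod ownership and prefers deleting pods that are not yet ready.

```python
def matches(labels, selector):
    """A pod matches when every selector label is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(desired, pods, selector):
    """Compare desired replicas with the matching pods and decide what to do."""
    current = [pod for pod in pods if matches(pod["labels"], selector)]
    missing = desired - len(current)
    if missing > 0:
        return {"create": missing, "delete": []}
    # Exactly enough (missing == 0) or too many: delete the surplus.
    # This sketch simply takes the first ones.
    return {"create": 0, "delete": [pod["name"] for pod in current[:-missing]]}


if __name__ == "__main__":
    pods = [{"name": "slug-a", "labels": {"app": "slug"}}]
    print(reconcile(2, pods, {"app": "slug"}))  # one pod was deleted, so recreate it
```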
  34. Kubernetes Deployments

    • Deployments manage ReplicaSets
    • You usually deal with these
      ◦ Through Spinnaker!
    • Deployments wind down old ReplicaSets and scale up new ones
    • They support various strategies for how things are updated

    Deployment
      Name: slug
      Labels:
      • app: slug
      Spec:
      • Replicas: 4
      • Strategy
        ◦ Rolling-update
      • Selector
        ◦ Labels
      • Template
        ◦ Same as Pod!
    (Diagram: RS-old and RS-new.)
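A rolling update steps the new ReplicaSet up and the old one down within two limits: `maxSurge` (extra pods allowed above the desired count, 25% by default, rounded up) and `maxUnavailable` (pods allowed to be missing, 25% by default, rounded down). A simplified Python sketch, assuming new pods become ready the moment they are created; `rolling_update` is hypothetical, and the real controller is more conservative.

```python
import math


def rolling_update(replicas, max_surge=0.25, max_unavailable=0.25):
    """Return (old_pods, new_pods) after each step of a simplified rolling update."""
    surge = math.ceil(replicas * max_surge)               # maxSurge is rounded up
    unavailable = math.floor(replicas * max_unavailable)  # maxUnavailable is rounded down
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up RS-new without exceeding replicas + surge pods in total
        new = min(replicas, new + max(0, replicas + surge - (old + new)))
        # Scale down RS-old while keeping at least replicas - unavailable pods
        # (this sketch counts new pods as ready immediately)
        old = max(0, old - max(0, old + new - (replicas - unavailable)))
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    print(rolling_update(4))
```

With 4 replicas and the defaults, the sketch moves from (4 old, 0 new) to (0, 4) in three steps, never dropping below 3 or exceeding 5 pods.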
  35. Other ways to run Pods

    • CronJobs
    • DaemonSets
    • Jobs
    • kubectl run
    • kubectl exec lets you ‘get into’ a pod
  36. Kubernetes Services

    • The ‘default’ way pods discover pods (L4)
    • Uses an internal DNS service
      ◦ Currently CoreDNS
    • Also uses a private network (10.3.0.0/16)
    • Uses labels to match service names with pods
    • Has several types:
      ◦ ClusterIP (default)
      ◦ LoadBalancer
        ▪ Allows external traffic to flow to internal pods
        ▪ On AWS, creates an ELB
      ◦ Don’t worry about the others (NodePort, ExternalName)

    Service
      Name: slug
      Labels:
      • app: slug
      Spec:
      • Type
      • Ports
      • Selector
        ◦ Labels
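A Service's endpoints are the ready pods whose labels match its selector, and its in-cluster DNS name follows the `<service>.<namespace>.svc.<cluster-domain>` pattern. A sketch assuming the default `cluster.local` domain and the slug container's port 50051; `service_endpoints` and `service_dns_name` are hypothetical helpers.

```python
def service_endpoints(selector, pods, target_port):
    """IP:port pairs of ready pods whose labels match the Service selector."""
    return sorted(
        f"{pod['ip']}:{target_port}"
        for pod in pods
        if pod.get("ready", False)
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    )


def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """In-cluster DNS name of a Service, assuming the default cluster domain."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    pods = [{"ip": "10.2.112.8", "labels": {"app": "slug"}, "ready": True}]
    print(service_dns_name("slug"), service_endpoints({"app": "slug"}, pods, 50051))
```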
  37. Time for the OSI Stack!!!!!!

    Layer 7    Application   HTTP/HTTP2        <- This is important
    Layer 5/6  Blah
    Layer 4    Transport     TCP/UDP           <- This is important
    Layer 3    Network       IPv4/IPv6
    Layer 2    Data Link     Ethernet*
    Layer 1    Physical      Raw wires(less)
  38. Kubernetes Ingress

    • Services only speak L4 (TCP/UDP)
      ◦ ‘Limited’ routing (single service)
    • Ingress is L7: it knows about HTTP and HTTP/2
    • Not natively implemented by Kubernetes
      ◦ Only the schema is defined
      ◦ Third parties implement it
      ◦ We tried Contour, Nginx, and Istio
    • Allows us to compose, shape, and route traffic declaratively
      ◦ Used for gRPC traffic
      ◦ Used for our APIs
      ◦ Used to better handle AATE egress

    Ingress
      Name: slug
      Labels:
        app: slug
      Spec:
        rules:
        - host: '*.i.namely.com'
          http:
            paths:
            - backend:
                serviceName: slug
                servicePort: 80
              path: /api/slug
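How an L7 router might pick a backend for a rule like the one on this slide: per the Ingress spec, a wildcard host like `*.i.namely.com` matches exactly one extra DNS label, `Prefix` paths match whole path elements, and the longest matching path wins. Contour, Nginx, and Istio each implement this themselves, so treat this as a sketch of the rules rather than of any controller; `ingress_backend` and its helpers are hypothetical.

```python
def host_matches(rule_host: str, host: str) -> bool:
    """'*.i.namely.com' matches exactly one extra DNS label, e.g. 'api.i.namely.com'."""
    if rule_host.startswith("*."):
        suffix = rule_host[1:]  # ".i.namely.com"
        label = host[: -len(suffix)] if host.endswith(suffix) else ""
        return label != "" and "." not in label
    return rule_host == host


def path_matches(prefix: str, path: str) -> bool:
    """Prefix match on whole path elements: /api/slug matches /api/slug/x, not /api/slugger."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


def ingress_backend(host: str, path: str, rules):
    """Return the backend of the longest matching path, or None."""
    best = None
    for rule in rules:
        if not host_matches(rule["host"], host):
            continue
        for entry in rule["paths"]:
            if path_matches(entry["path"], path):
                if best is None or len(entry["path"]) > len(best["path"]):
                    best = entry
    return best["backend"] if best else None


if __name__ == "__main__":
    rules = [{"host": "*.i.namely.com", "paths": [{"path": "/api/slug", "backend": ("slug", 80)}]}]
    print(ingress_backend("api.i.namely.com", "/api/slug/v1", rules))
```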
  39. All Together Now

    ELB/ILB • Ingress • Service • Deployment • ReplicaSet • Pod

  40. Sadly not perfect

    • Load balancing for gRPC
    • How do you determine a ‘healthy’ pod?
    • No standard metrics, tracing
    • No way to test failures
    • Can’t optimize traffic (inter-zone)
    • Standard retries, fail-fast
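"Standard retries, fail-fast" usually means bounded retries with exponential backoff plus a deadline that gives up early instead of piling on. A generic sketch of that pattern (not how Istio implements it); `call_with_retries` and its parameters are hypothetical, and `sleep`/`clock` are injectable so it can be tested without waiting.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=0.1, deadline=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fn, retrying ConnectionError with exponential backoff until a deadline."""
    start = clock()
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts
            delay = base_delay * (2 ** attempt)
            if clock() - start + delay > deadline:
                raise  # fail fast: the next retry would blow the deadline
            sleep(delay)
```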
  41. Observability

    • Bypasses the K8S Service
      ◦ No DNS
    • Speaks L4 and L7
    • Emits stuff for us
    • Big plans for Namely
      ◦ A/B testing
      ◦ Failure testing
      ◦ Better traffic management

    (Diagram) Some Pod (10.2.123.12; App Container + Istio) talks straight to Other Pod’s IP (10.2.98.128; App Container + Istio), bypassing the Service network (10.3.0.0/16). Istio asks: “Hey K8S, tell me every service and every pod IP.”
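Why load balancing for gRPC is a problem, and why going straight to a pod's IP helps: a gRPC client usually keeps one long-lived HTTP/2 connection, and an L4 Service balances connections, not requests. This sketch contrasts the two with a simple round-robin picker (kube-proxy actually picks backends differently; the pod IPs come from the diagrams, and the function names are hypothetical).

```python
import itertools
from collections import Counter

PODS = ["10.2.123.12", "10.2.98.128", "10.2.112.8"]  # example pod IPs


def l4_balance(pods, connections, requests_per_connection):
    """Balance per connection: every request on a connection hits the same pod."""
    picker = itertools.cycle(pods)
    counts = Counter()
    for _ in range(connections):
        counts[next(picker)] += requests_per_connection
    return counts


def l7_balance(pods, connections, requests_per_connection):
    """Balance per request, as a sidecar proxy that sees individual HTTP/2 requests can."""
    picker = itertools.cycle(pods)
    counts = Counter()
    for _ in range(connections * requests_per_connection):
        counts[next(picker)] += 1
    return counts


if __name__ == "__main__":
    print("L4:", dict(l4_balance(PODS, 1, 300)))
    print("L7:", dict(l7_balance(PODS, 1, 300)))
```

With one connection carrying 300 requests, the L4 view sends all 300 to a single pod, while the L7 view spreads them 100 per pod.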
  42. Is My Service Working?

  43. Is My Service Working?

  44. Is My Service Working?

  45. We Must Automate

    If we are engineering processes and solutions that are not automatable, we will keep having to staff humans to maintain the system. If we have to staff humans to do the work, we are feeding the machines with the blood, sweat, and tears of human beings. Think The Matrix with fewer special effects and more pissed-off System Administrators (er, engineers).
  46. Automation is about finding the right level of abstraction

    • We want to do things easily
    • We want to standardize… yet allow for customization
    • Build knowledge
    • Find good tools and the happy path
  47. Kubernetes CRDs

    • Allow you to extend the native Kubernetes API
    • We will use this for declarative management of Namely-things (like estuary -> spinnaker)
      ◦ Service creation, secrets, PagerDuty, dashboards, databases, Redis
    • Apply YAML just like Deployments, Services, Ingress (Service Catalog)

    NamelyService
      Name: employee
      Spec:
        Team: hcm
        Requirements:
        - Redis
        - Postgres
        Observability:
        - Bugsnag
        - PagerDuty
        - SignalFx
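A hypothetical sketch of what a controller for the NamelyService resource might do: read the spec and turn it into a provisioning plan. The `apiVersion` group, the lowercase field names, and the plan wording are all assumptions for illustration; a real controller would watch the API server and reconcile continuously.

```python
# Hypothetical NamelyService object mirroring the fields on the slide
EMPLOYEE = {
    "apiVersion": "namely.example/v1alpha1",  # assumed group/version
    "kind": "NamelyService",
    "metadata": {"name": "employee"},
    "spec": {
        "team": "hcm",
        "requirements": ["Redis", "Postgres"],
        "observability": ["Bugsnag", "PagerDuty", "SignalFx"],
    },
}


def provisioning_plan(obj):
    """Turn a NamelyService object into an ordered list of (hypothetical) actions."""
    if obj.get("kind") != "NamelyService":
        raise ValueError("not a NamelyService")
    name, spec = obj["metadata"]["name"], obj["spec"]
    plan = [f"provision {req.lower()} for {name}" for req in spec.get("requirements", [])]
    plan += [f"register {name} (team {spec['team']}) with {tool}"
             for tool in spec.get("observability", [])]
    return plan


if __name__ == "__main__":
    for step in provisioning_plan(EMPLOYEE):
        print(step)
```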
  48. Everything Is an Investment

    We want a return. We must build on what we’ve done.