A Whirlwind Tour of Infra

A Whirlwind Tour of Infra
Or A Brief History of Cloud
Or Why Everything Is Hard

A Story of Binaries and Processes

A binary is just data on disk: a set of instructions.
A process is a binary loaded into memory with a PID (loaded from another process!).
CPUs execute instructions, which use memory and I/O via syscalls.
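The "loaded from another process" point can be seen directly. The following is a minimal sketch, assuming a Python interpreter is available; `spawn_child` is a name invented for this illustration. It starts a child process and has the child report its own PID and its parent's PID.

```python
import os
import subprocess
import sys
from typing import Tuple


def spawn_child() -> Tuple[int, int, int]:
    """Start a child process and return (our_pid, child_pid, child_parent_pid)."""
    # The child is the Python interpreter binary, loaded into memory as a new process.
    code = "import os; print(os.getpid(), os.getppid())"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    child_pid, child_parent_pid = (int(field) for field in result.stdout.split())
    return os.getpid(), child_pid, child_parent_pid


if __name__ == "__main__":
    parent, child, child_parent = spawn_child()
    print(f"parent pid={parent}, child pid={child}, child's parent pid={child_parent}")
```

The child gets its own PID, and the parent PID it reports should be the PID of the process that launched it.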

ELF and PE
See https://news.ycombinator.com/item?id=8029564

❯ ldd dotnet
    linux-vdso.so.1 (0x00007ffe2979b000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5b5e4d0000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5b5e4a
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5b5e
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5b5e198000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5b5e17e000
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b5df94000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5b5e4f5000)

Links are just other files loaded into memory!
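To make "links are just other files" concrete, here is a small sketch that turns `ldd`-style output into a mapping from library name to the file on disk that backs it. The function name `linked_libraries` and the `SAMPLE` text are illustrative only; the sample reuses a few of the complete lines from the slide.

```python
import re
from typing import Dict

# Matches lines such as "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)".
# Lines without "=> /path" (the vDSO, the dynamic loader, "not found") are skipped.
_LDD_LINE = re.compile(r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>/\S+)")


def linked_libraries(ldd_output: str) -> Dict[str, str]:
    """Map each linked library name to the file path it resolves to."""
    libraries = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libraries[match["name"]] = match["path"]
    return libraries


SAMPLE = """\
    linux-vdso.so.1 (0x00007ffe2979b000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5b5e4d0000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5b5e198000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5b5df94000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5b5e4f5000)
"""

if __name__ == "__main__":
    for name, path in linked_libraries(SAMPLE).items():
        print(f"{name} is the file {path}")
```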

A scripting language starts with a binary...
That loads instructions and interprets/optimizes/executes high-level instructions
    myscript.sh
    node index.js
    ruby app.rb
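One way to check that an interpreter is itself a binary is to look at the first bytes of its file. This sketch assumes only two well-known magic numbers (ELF files start with `\x7fELF`, PE files start with the DOS `MZ` header); `binary_format` and `interpreter_format` are hypothetical helper names.

```python
import sys

# Leading bytes ("magic numbers") for the two formats named on the ELF and PE slide.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF",
    b"MZ": "PE",
}


def binary_format(header: bytes) -> str:
    """Classify a file by the first few bytes of its contents."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"


def interpreter_format() -> str:
    """Classify the binary that is running this script right now."""
    with open(sys.executable, "rb") as interpreter:
        return binary_format(interpreter.read(4))


if __name__ == "__main__":
    print(f"{sys.executable} looks like: {interpreter_format()}")
```

A shell script, by contrast, starts with text such as `#!/bin/sh`, which names the binary that will interpret it.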

Sometimes scripts are compiled to an intermediate language (as a library) and interpreted/optimized/executed...
    dotnet myapp.dll
    java -jar myapp.jar
    wasm myapp.wasm
Still requires linked libraries!

Sometimes you can have just a binary...
Statically compiled...
With dependencies...
    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -ldflags '-w -extldflags "-static"' -o mybin *.go

In The Beginning
• Hardware Ruled The World
• Infrastructure Was Static
• Processes Were Bad
• Vendors Were Dictators
• Things Took Months/Years

[Diagram labels: F5, IIS, Apache, DB, SAN/RAID]
• Apache with: mod_perl, mod_php, Phusion Passenger
• IIS
• Java (+Apache) with: JBoss, WebSphere, Tomcat
• VMware
• Public IP
• Private DNS

Then Came Google (and AWS)
And the “We’re not Google” arguments

Hardware -> Software
• HAProxy over F5
• Hadoop Distributed Data
• Nginx over Apache (light over heavy)

Servers Were Ephemeral
Like clouds. Dotting a blue sky. Moving and morphing with the wind.

But there were still problems
• Did not solve shared libraries
• Did not solve configuration
• Routing had limitations
• Did not solve deploy

Immutable Infrastructure
My first public talk!
• Use Packer
• Bake Images
• Launch with automation
• CFN, ASGs, ELBs, R53, S3, EBS
• Jenkins, a UI for bash

And all was good in the world. But darkness lurked...
• Things were slow
• Hard to debug
• Bespoke automation

And Then There Was Docker
Which is just a binary.
With its linked libraries.
Bundled in a tarball.
With better automation.
And easier process management.

Containers Everywhere
Architecture rethought. 3 tier, fat process -> Microservices
Back to square one...
CoreOS, Mesosphere, Kubernetes, Docker Swarm, Jaeger, Istio...
Lots of processes
Lots of ad-hoc automation

Requirements
• Run Code
How how how?!?!?!!?

Requirements
• We need servers (maybe not?!?!)
  ◦ We need to run processes
• We need to configure these things
• We need to install binaries on these ‘things’
• We need to configure these binaries
• We need to run these binaries as processes
• We need to secure and manage these things and processes
• We need to route traffic to these processes
• We need to update these things
• We need to update these processes
How how how?!?!?!!?

[Diagram: layers of the stack]
• Observability
• Continuous Integration
• Continuous Delivery/Deployment
• Runtime (Kubernetes/Docker/App Servers)
• Server Infrastructure (AWS)
• Operations
• Configuration Management
• Environment Management
• Security
Goals: Foundational Infrastructure; Ease of Development, Testing and Delivery; Meeting Production SLOs

There is no right answer, only various degrees of wrong.
We experiment, learn, decide, act, rinse, repeat and improve!

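[Diagram: the same layers, with the tools used for each]
• Observability: SignalFx, Logz.io, New Relic
• Continuous Integration: Jenkins
• Continuous Delivery/Deployment: Spinnaker/Octopus
• Runtime: Kubernetes + Istio
• Server Infrastructure: AWS
• Operations: Spinnaker, kubectl, Kubernetes, ad-hoc
• Configuration Management: Kubernetes/Octopus
• Environment Management: Terraform
• Security: IAM, RBAC, Networking, VPN, Secrets
Goals: Foundational Infrastructure; Ease of Development, Testing and Delivery; Meeting Production SLOs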

Welcome to Namely Infra
AWS (Virginia)
• Production 10.50.0.0/16
• Int 10.52.0.0/16
• Stage 10.51.0.0/16
• VendorX 10.53.0.0/16
• Ops 10.54.0.0/16
• Portal IT 172.16.0.0/16
An environment is:
• An AWS account and permissions
• A VPC
• Route tables
• Everything required to run Namely
• The ability to deploy components

Easy CIDR: A little lesson in IPs
Private ranges: 10.0.0.0/8 and 172.16.0.0/12
• An IPv4 address is 32 bits
• That’s 32 1’s and 0’s: 11110000 11110000 11110000 11110000
• The /8 and /16 denote how many bits ‘to keep’, counting from the left (big-endian)
• This denotes how IPs are allocated
• A route table directs an IP range to a target
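The bit arithmetic can be checked with Python's standard `ipaddress` module. This is a sketch only: the `ENVIRONMENTS` table copies the five unambiguous ranges from the Namely Infra slide, and `to_bits` and `environment_for` are names made up for the example.

```python
import ipaddress
from typing import Optional

# Environment ranges as listed on the "Welcome to Namely Infra" slide.
ENVIRONMENTS = {
    "Production": "10.50.0.0/16",
    "Stage": "10.51.0.0/16",
    "Int": "10.52.0.0/16",
    "VendorX": "10.53.0.0/16",
    "Ops": "10.54.0.0/16",
}


def to_bits(address: str) -> str:
    """Render an IPv4 address as four groups of eight bits."""
    bits = f"{int(ipaddress.IPv4Address(address)):032b}"
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


def environment_for(address: str) -> Optional[str]:
    """Return the environment whose CIDR range contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, cidr in ENVIRONMENTS.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


if __name__ == "__main__":
    print(to_bits("10.50.0.0"))
    # A /16 keeps the first 16 bits fixed and leaves 16 bits for addresses.
    print(ipaddress.ip_network("10.50.0.0/16").num_addresses)
    print(environment_for("10.51.3.7"))
```

A route table lookup is the same membership test: find the range that contains the destination address, then send the packet to that range's target.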

Environment Basics
[Diagram labels]
• Peering
• Public ELB(s)
• Internet Gateway
• VPC: Server1, Server2, Server3, Jumpboxes
• A bunch of RDS
• A lotta ElastiCache
• Some Aurora
• CloudFront
• S3
• Kubernetes: 15 workers, 3 masters, 5 etcd

Not so simple, though
[Diagram labels]
• Zone 1a, private subnet, 10.100.12.0/22
• Zone 1c, private subnet, 10.100.28.0/22
• Zone 1d, private subnet, 10.100.44.0/22
• Zone 1a, public subnet, 10.100.10.0/22
• Zone 1c, public subnet, 10.100.32.0/22
• Zone 1d, public subnet, 10.100.40.0/22
• Nodes, each with security groups (SG)
• Public ELBs with public IPs
• Private ELBs with subnet IPs
• ENIs
Try using the dig command to find out how DNS names are mapped to IPs

Automation #1: Terraform https://github.com/namely/tf2

Kubernetes
• Allows us to run containers
  ◦ A container is just a binary, with its dependencies
• Abstracts node-management issues for processes
  ◦ IP addresses, ports, security, quotas
  ◦ Also in this space: ECS, Docker Swarm, GAE
• Built-in config, secret, service discovery, scaling management
• Automates some cloud infra, like load balancers
• Foundational-focused API (doesn’t solve some things)
  ◦ Allows for extensibility

Kubernetes Cluster
• Etcd0, Etcd1, Etcd2, Etcd3, Etcd4: State is stored here
• Master0, Master1, Master2: Does most of the k8s work
• Worker0, Worker1, Worker2, Worker3, Worker..., Worker15: Where stuff runs

EKS Cluster
• Worker0, Worker1, Worker2, Worker3, Worker..., Worker15: Where stuff runs
• Better networking

Kubernetes Pods
• The smallest unit that runs on a cluster
• Has a name and labels
• Has one, unique private IP address
• Has one or more containers
• All containers in a pod share networking, storage
  ◦ Can “see” each other on localhost
• A ‘Sidecar’ is simply another container in a pod, usually auto-injected

Pod
  Name: slug-bcddbcd8-1sa3a
  Labels:
  • app: slug
  Containers:
  • Slug
    • image: namely/slug
    • ports:
      ◦ 50051
    • env
  • Istio
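The example Pod can be written out as a manifest. The sketch below builds it as a Python dict and prints JSON, which `kubectl apply -f` accepts alongside YAML. The field names follow the core `v1` Pod schema; `slug_pod_manifest` is a hypothetical helper. The Istio sidecar is left out on purpose, since the slide describes it as auto-injected rather than hand-written.

```python
import json


def slug_pod_manifest() -> dict:
    """Build a Pod manifest matching the 'slug' example on the slide."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "slug-bcddbcd8-1sa3a",
            "labels": {"app": "slug"},
        },
        "spec": {
            "containers": [
                {
                    "name": "slug",
                    "image": "namely/slug",
                    "ports": [{"containerPort": 50051}],
                    # The slide lists "env" without values, so none are invented here.
                    "env": [],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(slug_pod_manifest(), indent=2))
```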

Kubernetes Networking
Kubernetes creates a private address space for pods, and handles routing across nodes
[Diagram labels]
• Nodes: Worker0 10.250.25.112, Worker1 10.250.35.112, Worker2 10.250.5.112
• Each node runs a Kubelet and two pods:
  ◦ Pod0 10.2.112.8, Pod1 10.2.112.9
  ◦ Pod0 10.2.180.8, Pod1 10.2.180.9
  ◦ Pod0 10.2.98.8, Pod1 10.2.98.9

Kubernetes ReplicaSets
• ReplicaSets create Pods
• They specify how many instances of a pod should be running
• Represents desired state
  ◦ Kubernetes will try to get current state to desired state
• You usually don’t deal with these though
• But they are why Pods are re-created after you delete them!

ReplicaSet (manages Pod, Pod)
  Name: slug-bcddbcd8-1sa3a
  Labels:
  • app: slug
  Spec:
  • Replicas: 2
  • Selector
    ◦ Labels
  • Template
    ◦ Same as Pod!
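The "desired state" idea can be sketched as a toy reconcile pass. This is not how the Kubernetes controller is implemented; `reconcile` and the generated pod names are invented for illustration. It only shows why a deleted pod comes back: the loop compares what is running to what was asked for and closes the gap.

```python
import itertools
from typing import List

# Source of unique suffixes for newly created pod names (illustrative only).
_suffix = itertools.count()


def reconcile(desired_replicas: int, running_pods: List[str], name: str = "slug") -> List[str]:
    """Return the pod list after one pass that moves current state toward desired state."""
    pods = list(running_pods)
    # Too few pods: create replacements until the count matches.
    while len(pods) < desired_replicas:
        pods.append(f"{name}-{next(_suffix)}")
    # Too many pods: remove the extras.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


if __name__ == "__main__":
    pods = reconcile(2, [])
    print("started:", pods)
    pods.remove(pods[0])  # someone deletes a pod
    print("after delete:", pods)
    print("after reconcile:", reconcile(2, pods))
```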

Kubernetes Deployments
• Deployments manage ReplicaSets
• You usually deal with these
  ◦ Through Spinnaker!
• Deployments wind down old ReplicaSets and scale up new ones.
• They support various strategies for how things are updated.

Deployment (manages RS-old, RS-new)
  Name: slug
  Labels:
  • app: slug
  Spec:
  • Replicas: 4
  • Strategy
    ◦ Rolling-update
  • Selector
    ◦ Labels
  • Template
    ◦ Same as Pod!
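A rolling update can be pictured as alternating small steps: scale the new ReplicaSet up a little, then the old one down. The sketch below is a simplification that assumes capacity never drops below the requested replica count and allows at most `max_surge` extra pods (named after the Deployment setting `maxSurge`; the function itself is hypothetical).

```python
from typing import Iterator, Tuple


def rolling_update(replicas: int, max_surge: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (old_count, new_count) after each step of a simplified rolling update."""
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1 in this sketch")
    old, new = replicas, 0
    yield old, new
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet without exceeding replicas + max_surge in total.
        room = replicas + max_surge - (old + new)
        added = min(room, replicas - new)
        if added:
            new += added
            yield old, new
        # Scale down the old ReplicaSet by however many pods are now surplus.
        removed = min(old, old + new - replicas)
        if removed:
            old -= removed
            yield old, new


if __name__ == "__main__":
    for old_count, new_count in rolling_update(4):
        print(f"RS-old={old_count} RS-new={new_count}")
```

With four replicas, the run starts at four old pods and no new ones, and ends with the counts reversed.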

Other ways to run Pods
• CronJobs
• DaemonSets
• Jobs
• kubectl run
• kubectl exec lets you ‘get into’ a pod

Kubernetes Services
• The ‘default’ way pods discover pods (L4)
• Uses an internal DNS service
  ◦ Currently CoreDNS
• Also uses a private network (10.3.0.0/16)
• Uses labels to match service names with pods
• Has three types.
  ◦ ClusterIP (default)
  ◦ LoadBalancer
    ▪ Allows external traffic to flow to internal pods
    ▪ On AWS creates an ELB
  ◦ Don’t worry about type three

Service
  Name: slug
  Labels:
  • app: slug
  Spec:
  • Type
  • Ports
  • Selector
    ◦ Labels
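Label matching is the core of how a Service finds its pods. A rough sketch follows, with a made-up pod list that borrows IPs from the networking slide; `matches` and `endpoints` are illustrative names, not Kubernetes APIs.

```python
from typing import Dict, List


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every key/value pair in the selector appears in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the IPs of the pods that a Service with this selector would target."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical pods; the label values are assumptions made for the example.
PODS = [
    {"ip": "10.2.112.8", "labels": {"app": "slug"}},
    {"ip": "10.2.180.8", "labels": {"app": "slug", "version": "v2"}},
    {"ip": "10.2.98.8", "labels": {"app": "other"}},
]

if __name__ == "__main__":
    print(endpoints({"app": "slug"}, PODS))
```

Extra labels on a pod do not prevent a match; only the keys named in the selector are compared.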

Time for the OSI Stack!!!!!!

Layer        Name          Examples
Layer 7      Application   HTTP/HTTP2
Layer 5/6    Blah
Layer 4      Transport     TCP/UDP
Layer 3      Network       IPv4/IPv6
Layer 2      Data Link     Ethernet*
Layer 1      Physical      Raw wires(less)

This is important (marked twice on the slide)

Kubernetes Ingress
• Services only speak L4-TCP/UDP
  ◦ ‘Limited’ routing (single service)
• Ingress is L7. It knows about http and http2
• Not natively implemented by Kubernetes
  ◦ Only schema is defined
  ◦ Third parties implement
  ◦ We tried Contour, Nginx, and Istio
• Allows us to compose, shape and route traffic declaratively
  ◦ Used for gRPC traffic
  ◦ Used for our APIs
  ◦ Used to better handle AATE egress

Ingress
  Name: slug
  Labels:
    app: slug
  Spec:
    rules:
    - host: '*.i.namely.com'
      http:
        paths:
        - backend:
            serviceName: slug
            servicePort: 80
          path: /api/slug
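What "L7 routing" means for the rule above can be sketched as a host and path lookup. This is a simplified model, not the behavior of any particular ingress controller: it assumes a `*.` wildcard matches exactly one leading DNS label and that paths match by prefix on `/` boundaries. `RULES`, `host_matches`, `path_matches` and `route` are names invented for the example.

```python
from typing import List, Optional, Tuple

Backend = Tuple[str, int]  # (service name, service port)

# The single rule from the slide: host pattern, path prefix, backend.
RULES: List[Tuple[str, str, Backend]] = [
    ("*.i.namely.com", "/api/slug", ("slug", 80)),
]


def host_matches(pattern: str, host: str) -> bool:
    """Match an exact host, or a '*.' wildcard covering one leading DNS label."""
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".i.namely.com"
        if not host.endswith(suffix):
            return False
        first_label = host[: -len(suffix)]
        return bool(first_label) and "." not in first_label
    return host == pattern


def path_matches(prefix: str, path: str) -> bool:
    """Match the prefix itself or anything below it on a '/' boundary."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def route(host: str, path: str) -> Optional[Backend]:
    """Return the backend for a request, or None when no rule applies."""
    for pattern, prefix, backend in RULES:
        if host_matches(pattern, host) and path_matches(prefix, path):
            return backend
    return None


if __name__ == "__main__":
    print(route("app.i.namely.com", "/api/slug/v1"))
    print(route("app.i.namely.com", "/api/other"))
```

An L4 Service cannot make this decision, because the host header and URL path only exist at the HTTP layer.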

All Together Now
ELB/ILB -> Ingress -> Service -> Deployment -> ReplicaSet -> Pod

Sadly not perfect
• Load Balancing for gRPC
• How do you determine a ‘healthy’ pod?
• No standard metrics, tracing
• No way to test failures
• Can’t optimize traffic (inter-zone)
• Standard retries, fail-fast

Observability
• Bypasses K8S Service
  ◦ No DNS
• Speaks L4 and L7
• Emits stuff for us
• Big plans for Namely
  ◦ A/B testing
  ◦ Failure testing
  ◦ Better traffic management

[Diagram labels]
• Some Pod 10.2.123.12: App Container, Istio
• Other Pod 10.2.98.128: App Container, Istio
• 10.3.0.0/16
• Straight to Pod’s IP
• Hey K8S, tell me every service and every pod IP

Is My Service Working?


We Must Automate
If we are engineering processes and solutions that are not automatable, we continue having to staff humans to maintain the system. If we have to staff humans to do the work, we are feeding the machines with the blood, sweat, and tears of human beings. Think The Matrix with less special effects and more pissed off System Administrators/engineers.

Automation is about finding the right level of abstraction
• Want to do things easily
• Want to standardize...
• Yet allow for customization
• Build knowledge
• Find good tools and happy path

Kubernetes CRDs
• Allows you to extend the native Kubernetes API
• We will use this for declarative management of Namely-things (like estuary -> spinnaker)
• Service creation, secrets, PagerDuty, Dashboards, Databases, Redis
• Apply yaml just like Deployments, Services, Ingress (Service Catalog)

NamelyService
  Name: employee
  Spec:
    Team: hcm
    Requirements:
    - Redis
    - Postgres
    Observability:
    - Bugsnag
    - PagerDuty
    - SignalFx

Everything is an Investment
We want a return. We must build upon what we’ve done.


More Decks by Michael Hamrah
