Slide 1

Slide 1 text

Stateful Applications on Autopilot Tim Gross @0x74696d (“tim”) github.com/autopilotpattern

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

github.com/autopilotpattern What if your containers were self-aware and self-operating?

Slide 5

Slide 5 text

github.com/autopilotpattern

Slide 6

Slide 6 text

github.com/autopilotpattern How do we get from dev to prod? • Service Discovery • Load balancing • Automated-failover • Config changes • Monitoring

Slide 7

Slide 7 text

github.com/autopilotpattern App

Slide 8

Slide 8 text

github.com/autopilotpattern Nginx Consul MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers

Slide 9

Slide 9 text

github.com/autopilotpattern Nginx Consul /customers /sales /sales/data /customers/data read/write read-only MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers Load Balancing

Slide 10

Slide 10 text

github.com/autopilotpattern Nginx Consul /customers /sales /sales/data /customers/data read/write read-only async replication MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers Replication & Fail-over

Slide 11

Slide 11 text

github.com/autopilotpattern Nginx Consul /customers /sales /sales/data /customers/data read/write read-only async replication MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers Service discovery

Slide 12

Slide 12 text

github.com/autopilotpattern Nginx Consul /customers /sales /sales/data /customers/data read/write read-only async replication MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers Logging

Slide 13

Slide 13 text

github.com/autopilotpattern Nginx Consul /customers /sales /sales/data /customers/data read/write read-only async replication MySQL Primary MySQL Replica ES Master ES Data Kibana Prometheus Sales Logstash Customers Monitoring

Slide 14

Slide 14 text

github.com/autopilotpattern Problem: Service Discovery

Slide 15

Slide 15 text

github.com/autopilotpattern App Application

Slide 16

Slide 16 text

github.com/autopilotpattern Database App Application w/ database

Slide 17

Slide 17 text

github.com/autopilotpattern Database App Application w/ database How does the app find the DB? Can we just use DNS?

Slide 18

Slide 18 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase

Slide 19

Slide 19 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase

Slide 20

Slide 20 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase

Slide 21

Slide 21 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase Need to know real IP address (not A-Record)

Slide 22

Slide 22 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase What happens when we lose a node?

Slide 23

Slide 23 text

github.com/autopilotpattern Couchbase App Couchbase Couchbase Couchbase Application w/ Couchbase Couchbase Couchbase Does client respect DNS TTL?
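
The two questions above (losing a node, and whether the client respects the DNS TTL) come down to how long a client keeps using a stale address list. A minimal sketch of a client-side cache that does respect the TTL, assuming a hypothetical `resolve` callable that returns `(addresses, ttl_seconds)`; it is an illustration, not the behavior of any particular Couchbase client:

```python
import time


class TtlResolverCache:
    """Caches lookups and re-resolves once the record's TTL has elapsed."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # hypothetical: name -> (addresses, ttl_seconds)
        self._clock = clock      # injectable so the behavior can be tested
        self._cache = {}         # name -> (addresses, expires_at)

    def lookup(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            # Still inside the TTL window: serve the cached answer.
            return cached[0]
        # TTL elapsed (or first lookup): ask the resolver again.
        addresses, ttl = self._resolve(name)
        self._cache[name] = (addresses, now + ttl)
        return addresses
```

A client that skips the expiry check keeps sending traffic to the lost node until it is restarted, which is the failure mode the slide is asking about.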

Slide 24

Slide 24 text

github.com/autopilotpattern Problem: Load Balancing

Slide 25

Slide 25 text

github.com/autopilotpattern Nginx Customers Sales Microservices application /sales/data /customers/data /sales /customers

Slide 26

Slide 26 text

github.com/autopilotpattern Nginx Sales Customers /sales/data /customers/data /sales /customers Microservices application

Slide 27

Slide 27 text

github.com/autopilotpattern Nginx Sales Customers /sales/data /customers/data /sales /customers Microservices application How do apps update peers when we scale out?

Slide 28

Slide 28 text

github.com/autopilotpattern Nginx Sales Customers /sales/data /customers/data Microservices application Route everything thru Nginx (or LB)?

Slide 29

Slide 29 text

github.com/autopilotpattern Nginx Sales Customers /sales/data /customers/data Microservices application How do we update Nginx backends? Adds network path length and SPoF

Slide 30

Slide 30 text

github.com/autopilotpattern Sales Sidecar/ Proxy Customers http://localhost http://192.168.1.1 ex. Bamboo Compute node

Slide 31

Slide 31 text

github.com/autopilotpattern Sales Sidecar/ Proxy Customers http://localhost http://192.168.1.1 How do we update proxy config? Adds network path length Compute node

Slide 32

Slide 32 text

github.com/autopilotpattern Nginx Sales Consul /customers /sales /sales/data /customers/data Customers Microservices application w/ discovery catalog

Slide 33

Slide 33 text

github.com/autopilotpattern Nginx Sales Consul /customers /sales /sales/data /customers/data Customers How do we make existing applications use it? Microservices application w/ discovery catalog

Slide 34

Slide 34 text

github.com/autopilotpattern Problem: Automated-Failover

Slide 35

Slide 35 text

github.com/autopilotpattern read/write read-only async replication App Primary Replica MySQL with replication

Slide 36

Slide 36 text

github.com/autopilotpattern read/write read-only async replication App Primary Replica MySQL with replication How does client find DB? How does replica find primary? How does primary tell replica where to start?

Slide 37

Slide 37 text

github.com/autopilotpattern read/write read-only async replication App Primary Replica MySQL with replication How do we update client on failover? How do we promote a replica? How do we orchestrate backups?

Slide 38

Slide 38 text

github.com/autopilotpattern Solutions that don’t work: Configuration Management (ex. Chef, Puppet, Ansible)

Slide 39

Slide 39 text

github.com/autopilotpattern • No CM server in local development • No service discovery on change • Agents are heavy

Slide 40

Slide 40 text

github.com/autopilotpattern Solutions that don’t work: *aaS (ex. PaaS, DBaaS)

Slide 41

Slide 41 text

github.com/autopilotpattern • Vendor lock-in • Poor performance • Very expensive

Slide 42

Slide 42 text

github.com/autopilotpattern Solutions that don’t work: Mega-orchestrator (ex. Kubernetes)

Slide 43

Slide 43 text

github.com/autopilotpattern Behavior split between orchestrator and application

Slide 44

Slide 44 text

github.com/autopilotpattern Tight coupling between orchestrator and application

Slide 45

Slide 45 text

github.com/autopilotpattern Developers don’t run the orchestrator in development

Slide 46

Slide 46 text

github.com/autopilotpattern Shifts responsibility for app behavior away from app developers

Slide 47

Slide 47 text

github.com/autopilotpattern Shifts responsibility for app behavior away from app developers

Slide 48

Slide 48 text

github.com/autopilotpattern What if your containers were self-aware and self-operating?

Slide 49

Slide 49 text

github.com/autopilotpattern

Slide 50

Slide 50 text

github.com/autopilotpattern “[a] pattern where containers autonomously adapt to changes in their environment and coordinate their actions thru a globally shared state” Lukasz Guminski, Container Solutions http://container-solutions.com/containerpilot-on-mantl/

Slide 51

Slide 51 text

github.com/autopilotpattern Make applications responsible for: Startup Shutdown Scaling Discovery Recovery Telemetry

Slide 52

Slide 52 text

github.com/autopilotpattern Empower development teams to operationalize their applications

Slide 53

Slide 53 text

github.com/autopilotpattern 3 requirements:

Slide 54

Slide 54 text

github.com/autopilotpattern #1: Ability to provision applications onto compute

Slide 55

Slide 55 text

github.com/autopilotpattern VM or physical hardware VM or physical hardware VM or physical hardware Nginx Consul MySQL Primary ES Master Prometheus Logstash Customers Nginx Consul MySQL Primary ES Master Prometheus Logstash Customers Customers Cluster management & provisioning

Slide 56

Slide 56 text

github.com/autopilotpattern Options for cluster management and container placement:

Slide 57

Slide 57 text

github.com/autopilotpattern #2: Network virtualization

Slide 58

Slide 58 text

github.com/autopilotpattern IP inside the container == IP outside the container

Slide 59

Slide 59 text

github.com/autopilotpattern NAT Sales Customers 192.168.1.101 Compute Node 172.17.0.2:80 192.168.1.100:32380 Docker bridge networking Consul

Slide 60

Slide 60 text

github.com/autopilotpattern NAT Sales Customers Consul Compute Node 192.168.1.100:32380 Docker bridge networking “I’m listening on 172.17.0.2:80” 172.17.0.2:80

Slide 61

Slide 61 text

github.com/autopilotpattern NAT Sales Customers Consul Compute Node 192.168.1.100:32380 Docker bridge networking “Where is Customers?” “172.17.0.2:80” 172.17.0.2:80

Slide 62

Slide 62 text

github.com/autopilotpattern NAT Sales Customers Consul Compute Node 192.168.1.100:32380 Docker bridge networking WTF???!!! 172.17.0.2:80 172.17.0.2:80 No route to host!
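
The "No route to host!" failure can be reduced to one check: the address a container advertises to the catalog is only useful if other hosts can route to it. A minimal sketch, assuming Docker's default bridge subnet of 172.17.0.0/16 and a hypothetical helper name; real reachability depends on the actual network setup:

```python
import ipaddress

# Assumption: the default docker0 bridge subnet. It is private to each compute node.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def advertised_address_is_reachable(advertised_ip, caller_on_same_host):
    """Return True if a caller could plausibly route to the advertised address."""
    ip = ipaddress.ip_address(advertised_ip)
    if ip in DOCKER_BRIDGE:
        # Bridge addresses sit behind NAT, so only the local node can use them.
        return caller_on_same_host
    return True
```

With bridge networking, Customers registers 172.17.0.2, so a Sales container on another compute node gets an address it cannot use. Registering an address that is the same inside and outside the container avoids the mismatch.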

Slide 63

Slide 63 text

github.com/autopilotpattern Sales Customers Compute Node Docker host networking Consul 192.168.1.101 192.168.1.100:80

Slide 64

Slide 64 text

github.com/autopilotpattern Sales Customers Compute Node Docker host networking Consul 192.168.1.101 192.168.1.100:80 “I’m listening on 192.168.1.100:80”

Slide 65

Slide 65 text

github.com/autopilotpattern Sales Customers Compute Node Docker host networking Consul 192.168.1.101 192.168.1.100:80 “I’m listening on 192.168.1.100:80” Customers 192.168.1.100:80

Slide 66

Slide 66 text

github.com/autopilotpattern Sales Customers Compute Node Docker host networking Consul 192.168.1.101 192.168.1.100:80 “I’m listening on 192.168.1.100:80” Customers 192.168.1.100:80 Port conflicts!

Slide 67

Slide 67 text

github.com/autopilotpattern Sales Customers Compute Node Overlay networking Consul 192.168.1.101 192.168.1.100:80 “I’m listening on 192.168.1.100:80” Customers 192.168.1.102:80

Slide 68

Slide 68 text

github.com/autopilotpattern Sales Customers Compute Node Overlay networking Consul 192.168.1.101 192.168.1.100:80 “I’m listening on 192.168.1.102:80” Customers 192.168.1.102:80

Slide 69

Slide 69 text

github.com/autopilotpattern Options for overlay networking:

Slide 70

Slide 70 text

github.com/autopilotpattern #3: Infrastructure-backed service discovery

Slide 71

Slide 71 text

github.com/autopilotpattern Nginx Sales Consul Customers Microservices app

Slide 72

Slide 72 text

github.com/autopilotpattern Nginx Sales Consul Customers Microservices app How do we bootstrap service catalog HA? How do services find service catalog?

Slide 73

Slide 73 text

github.com/autopilotpattern Options to bootstrap service catalog: infrastructure-backed DNS * run on each node * Container Name Service (CNS)

Slide 74

Slide 74 text

github.com/autopilotpattern #4: We might need some help

Slide 75

Slide 75 text

github.com/autopilotpattern App-centric micro-orchestrator that runs inside the container. User-defined behaviors: • Lifecycle hooks (preStart, preStop, postStop) • Health checks w/ heartbeats • Watch discovery catalog for changes • Update config on upstream changes • Gather performance metrics

Slide 76

Slide 76 text

github.com/autopilotpattern Sales Container Pilot Application Application container http://localhost http://192.168.1.1 Side-car proxy?

Slide 77

Slide 77 text

github.com/autopilotpattern Sales Container Pilot Application Application container http://localhost http://192.168.1.1 Not a proxy!

Slide 78

Slide 78 text

github.com/autopilotpattern Sales Container Pilot Application Consul Where is Sales? Application container

Slide 79

Slide 79 text

github.com/autopilotpattern Sales Container Pilot Application Consul Where is Sales? 192.168.1.100 192.168.1.101 192.168.1.102 Application container

Slide 80

Slide 80 text

github.com/autopilotpattern Sales Container Pilot Application Consul Where is Sales? 192.168.1.100 192.168.1.101 192.168.1.102 Application container onChange event

Slide 81

Slide 81 text

github.com/autopilotpattern Sales Container Pilot Application http://192.168.1.100 Consul Where is Sales? 192.168.1.100 192.168.1.101 192.168.1.102 Application container onChange event

Slide 82

Slide 82 text

github.com/autopilotpattern Application onChange event User-defined behavior hooks: • preStart • preStop • postStop • health • onChange • sensor • task • co-process Application container

Slide 83

Slide 83 text

github.com/autopilotpattern Microservices stack

Slide 84

Slide 84 text

github.com/autopilotpattern Nginx Sales Consul /customers /sales /sales/data /customers/data Customers

Slide 85

Slide 85 text

github.com/autopilotpattern
~ $ git clone git@github.com:autopilotpattern/workshop.git
~ $ cd workshop && git checkout workshop
~/workshop $ tree --dirsfirst
.
├── customers
│   ├── Dockerfile
│   ├── containerpilot.json
│   ├── customers.js
│   └── package.json
├── nginx
│   ├── Dockerfile
│   ├── containerpilot.json
│   ├── index.html
│   ├── index.js
│   ├── nginx.conf
│   └── nginx.conf.ctmpl
├── sales
│   ├── Dockerfile
│   ├── containerpilot.json
│   ├── package.json
│   └── sales.js
└── docker-compose.yml

Slide 86

Slide 86 text

github.com/autopilotpattern # a Node.js application container FROM gliderlabs/alpine:3.3 # dependencies RUN apk update && apk add nodejs curl COPY package.json /opt/customers/ RUN cd /opt/customers && npm install # add our application and configuration COPY customers.js /opt/customers/ EXPOSE 4000 CMD [ "node", "/opt/customers/customers.js" ] ~/workshop/customers/Dockerfile

Slide 87

Slide 87 text

github.com/autopilotpattern # a Node.js application container FROM gliderlabs/alpine:3.3 # dependencies RUN apk update && apk add nodejs curl COPY package.json /opt/customers/ RUN cd /opt/customers && npm install # get ContainerPilot release (please verify checksum in real life, but YOLO!) RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/… tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /bin/ # add our application and configuration COPY customers.js /opt/customers/ COPY containerpilot.json /etc/containerpilot.json ENV CONTAINERPILOT=file:///etc/containerpilot.json EXPOSE 4000 CMD [ "/bin/containerpilot", "node", "/opt/customers/customers.js" ] ~/workshop/customers/Dockerfile

Slide 88

Slide 88 text

github.com/autopilotpattern # a Node.js application container FROM gliderlabs/alpine:3.3 # dependencies RUN apk update && apk add nodejs curl COPY package.json /opt/customers/ RUN cd /opt/customers && npm install # get ContainerPilot release (please verify checksum in real life, but YOLO!) RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/… tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /bin/ # add our application and configuration COPY customers.js /opt/customers/ COPY containerpilot.json /etc/containerpilot.json ENV CONTAINERPILOT=file:///etc/containerpilot.json EXPOSE 4000 CMD [ "/bin/containerpilot", "node", "/opt/customers/customers.js" ] ~/workshop/customers/Dockerfile

Slide 89

Slide 89 text

github.com/autopilotpattern # a Node.js application container FROM gliderlabs/alpine:3.3 # dependencies RUN apk update && apk add nodejs curl COPY package.json /opt/customers/ RUN cd /opt/customers && npm install # get ContainerPilot release (please verify checksum in real life, but YOLO!) RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/… tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /bin/ # add our application and configuration COPY customers.js /opt/customers/ COPY containerpilot.json /etc/containerpilot.json ENV CONTAINERPILOT=file:///etc/containerpilot.json EXPOSE 4000 CMD [ "/bin/containerpilot", "node", "/opt/customers/customers.js" ] ~/workshop/customers/Dockerfile

Slide 90

Slide 90 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json

Slide 91

Slide 91 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json can be templatized w/ env var: {{ .CONSUL }}

Slide 92

Slide 92 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json

Slide 93

Slide 93 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json

Slide 94

Slide 94 text

github.com/autopilotpattern Container Pilot Application Application container

Slide 95

Slide 95 text

github.com/autopilotpattern Container Pilot Application Application container onChange event health check

Slide 96

Slide 96 text

github.com/autopilotpattern Container Pilot Application Application container onChange event health check root@993acf351cd9:/# ps axo uid,pid,ppid,cmd UID PID PPID CMD root 1 0 /bin/containerpilot root 94 1 ├─ node /opt/customers/customers.js root 107 1 ├─ /usr/bin/curl --fail localhost:4000 root 120 1 └─ pkill -SIGHUP node

Slide 97

Slide 97 text

github.com/autopilotpattern
{
  "consul": "consul:8500",
  "preStart": "/bin/reload-nginx.sh preStart",
  "services": [
    {
      "name": "nginx",
      "port": 80,
      "interfaces": ["eth1", "eth0"],
      "health": "/usr/bin/curl --fail -s http://localhost/health",
      "poll": 10,
      "ttl": 25
    }
  ],
  "backends": [
    {
      "name": "sales",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    },
    {
      "name": "customers",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    }
  ]...
~/workshop/nginx/etc/containerpilot.json

Slide 98

Slide 98 text

github.com/autopilotpattern
{
  "consul": "consul:8500",
  "preStart": "/bin/reload-nginx.sh preStart",
  "services": [
    {
      "name": "nginx",
      "port": 80,
      "interfaces": ["eth1", "eth0"],
      "health": "/usr/bin/curl --fail -s http://localhost/health",
      "poll": 10,
      "ttl": 25
    }
  ],
  "backends": [
    {
      "name": "sales",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    },
    {
      "name": "customers",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    }
  ]...
~/workshop/nginx/etc/containerpilot.json

Slide 99

Slide 99 text

github.com/autopilotpattern
{
  "consul": "consul:8500",
  "preStart": "/bin/reload-nginx.sh preStart",
  "services": [
    {
      "name": "nginx",
      "port": 80,
      "interfaces": ["eth1", "eth0"],
      "health": "/usr/bin/curl --fail -s http://localhost/health",
      "poll": 10,
      "ttl": 25
    }
  ],
  "backends": [
    {
      "name": "sales",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    },
    {
      "name": "customers",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    }
  ]...
~/workshop/nginx/etc/containerpilot.json

Slide 100

Slide 100 text

github.com/autopilotpattern
{
  "consul": "consul:8500",
  "preStart": "/bin/reload-nginx.sh preStart",
  "services": [
    {
      "name": "nginx",
      "port": 80,
      "interfaces": ["eth1", "eth0"],
      "health": "/usr/bin/curl --fail -s http://localhost/health",
      "poll": 10,
      "ttl": 25
    }
  ],
  "backends": [
    {
      "name": "sales",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    },
    {
      "name": "customers",
      "poll": 3,
      "onChange": "/bin/reload-nginx.sh onChange"
    }
  ]...
~/workshop/nginx/etc/containerpilot.json

Slide 101

Slide 101 text

github.com/autopilotpattern Container Pilot Application Application container onChange event health check root@993acf351cd9:/# ps axo uid,pid,ppid,cmd UID PID PPID CMD root 1 0 /bin/containerpilot root 94 1 ├─ nginx -g “daemon off;” root 107 1 ├─ /usr/bin/curl --fail localhost root 120 1 └─ consul-template -SIGHUP node root 128 120 └─ nginx -s reload

Slide 102

Slide 102 text

github.com/autopilotpattern Container Pilot Consul Lifecycle: preStart Application container

Slide 103

Slide 103 text

github.com/autopilotpattern Container Pilot Consul Lifecycle: preStart PID1 Separate container Application container

Slide 104

Slide 104 text

github.com/autopilotpattern Container Pilot Consul preStart Lifecycle: preStart Application container

Slide 105

Slide 105 text

github.com/autopilotpattern Container Pilot Consul preStart Lifecycle: preStart Note: no main application running yet! If exit code of preStart != 0, ContainerPilot exits Application container
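
The preStart rule can be sketched as a small supervisor: run the hook, stop if it fails, otherwise run the application and hand its exit code back to the runtime. This is a simplified illustration using Python's `subprocess`, not ContainerPilot's implementation; `run_with_prestart` and `python_exit` are hypothetical names:

```python
import subprocess
import sys


def python_exit(code):
    """Hypothetical stand-in command that just exits with the given code."""
    return [sys.executable, "-c", f"raise SystemExit({code})"]


def run_with_prestart(prestart_cmd, app_cmd):
    """Run preStart first; start the application only if preStart exits 0."""
    prestart = subprocess.run(prestart_cmd)
    if prestart.returncode != 0:
        # The main application never starts: propagate the failure instead.
        return prestart.returncode
    # preStart succeeded, so run the application and return its exit code.
    return subprocess.run(app_cmd).returncode
```

For example, `run_with_prestart(python_exit(3), python_exit(0))` returns 3 without ever launching the second command.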

Slide 106

Slide 106 text

github.com/autopilotpattern
#!/bin/sh

# Render Nginx configuration template using values from Consul,
# but do not reload because Nginx hasn't started yet
preStart() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf"
}

# Render Nginx configuration template using values from Consul,
# then gracefully reload Nginx
onChange() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload"
}

until
    cmd=$1
    if [ -z "$cmd" ]; then
        onChange
    fi
    shift 1
    $cmd "$@"
    [ "$?" -ne 127 ]
do
    onChange
    exit
done
~/workshop/nginx/reload-nginx.sh

Slide 107

Slide 107 text

github.com/autopilotpattern
#!/bin/sh

# Render Nginx configuration template using values from Consul,
# but do not reload because Nginx hasn't started yet
preStart() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf"
}

# Render Nginx configuration template using values from Consul,
# then gracefully reload Nginx
onChange() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload"
}

until
    cmd=$1
    if [ -z "$cmd" ]; then
        onChange
    fi
    shift 1
    $cmd "$@"
    [ "$?" -ne 127 ]
do
    onChange
    exit
done
can be templatized w/ env var: $CONTAINERPILOT__IP
~/workshop/nginx/reload-nginx.sh

Slide 108

Slide 108 text

github.com/autopilotpattern
#!/bin/sh

# Render Nginx configuration template using values from Consul,
# but do not reload because Nginx hasn't started yet
preStart() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf"
}

# Render Nginx configuration template using values from Consul,
# then gracefully reload Nginx
onChange() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload"
}

until
    cmd=$1
    if [ -z "$cmd" ]; then
        onChange
    fi
    shift 1
    $cmd "$@"
    [ "$?" -ne 127 ]
do
    onChange
    exit
done
~/workshop/nginx/reload-nginx.sh
/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload
render this file : to this file : and then do this
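
The three colon-separated parts of the `-template` argument can be pulled apart as follows. A minimal sketch of the format as the slide describes it, with a hypothetical function name; it is not consul-template's own parser:

```python
def parse_template_spec(spec):
    """Split "source:destination[:command]" into its three parts.

    The command is optional; splitting at most twice keeps any later
    colons inside the command itself.
    """
    parts = spec.split(":", 2)
    source = parts[0]
    destination = parts[1] if len(parts) > 1 else None
    command = parts[2] if len(parts) > 2 else None
    return source, destination, command
```

The preStart form has no third part, so nothing runs after rendering. The onChange form ends in `nginx -s reload`, which is what triggers the graceful reload.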

Slide 109

Slide 109 text

github.com/autopilotpattern Container Pilot Consul preStart Lifecycle: preStart Nginx container consul-template: get addresses for upstreams

Slide 110

Slide 110 text

github.com/autopilotpattern Container Pilot Consul preStart Lifecycle: preStart “None yet!” or “{ customers: [ 192.168.1.100:4000, 192.168.1.101:4000], sales: [ 192.168.1.102:3000, 192.168.1.103:3000] }” Nginx container consul-template: get addresses for upstreams

Slide 111

Slide 111 text

github.com/autopilotpattern Container Pilot Consul preStart Lifecycle: preStart “None yet!” or “{ customers: [ 192.168.1.100:4000, 192.168.1.101:4000], sales: [ 192.168.1.102:3000, 192.168.1.103:3000] }” Nginx container consul-template: get addresses for upstreams render virtualhost config

Slide 112

Slide 112 text

github.com/autopilotpattern user nginx; worker_processes 1; error_log /var/log/nginx/error.log warn; pid /var/run/nginx.pid; events { worker_connections 1024; } http { include /etc/nginx/mime.types; default_type application/octet-stream; access_log /var/log/nginx/access.log main; sendfile on; keepalive_timeout 65; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; server { listen 80; server_name _; root /usr/share/nginx/html; location /health { # requires http_stub_status_module stub_status; allow 127.0.0.1; deny all; } } }

Slide 113

Slide 113 text

github.com/autopilotpattern {{ if service "sales" }} upstream sales { # write the address:port pairs for each healthy Sales node {{range service "sales"}} server {{.Address}}:{{.Port}}; {{end}} least_conn; }{{ end }} server { listen 80; server_name _; root /usr/share/nginx/html; {{ if service "sales" }} location ^~ /sales { # strip '/sales' from the request before passing # it along to the Sales upstream rewrite ^/sales(/.*)$ $1 break; proxy_pass http://sales; proxy_redirect off; }{{end}} } }

Slide 114

Slide 114 text

github.com/autopilotpattern {{ if service "sales" }} upstream sales { # write the address:port pairs for each healthy Sales node {{range service "sales"}} server {{.Address}}:{{.Port}}; {{end}} least_conn; }{{ end }} server { listen 80; server_name _; root /usr/share/nginx/html; {{ if service "sales" }} location ^~ /sales { # strip '/sales' from the request before passing # it along to the Sales upstream rewrite ^/sales(/.*)$ $1 break; proxy_pass http://sales; proxy_redirect off; }{{end}} } }

Slide 115

Slide 115 text

github.com/autopilotpattern {{ if service "sales" }} upstream sales { # write the address:port pairs for each healthy Sales node {{range service "sales"}} server {{.Address}}:{{.Port}}; {{end}} least_conn; }{{ end }} server { listen 80; server_name _; root /usr/share/nginx/html; {{ if service "sales" }} location ^~ /sales { # strip '/sales' from the request before passing # it along to the Sales upstream rewrite ^/sales(/.*)$ $1 break; proxy_pass http://sales; proxy_redirect off; }{{end}} } }

Slide 116

Slide 116 text

github.com/autopilotpattern upstream sales { # write the address:port pairs for each healthy Sales node server 192.168.1.101:3000; server 192.168.1.102:3000; server 192.168.1.103:3000; least_conn; } server { listen 80; server_name _; root /usr/share/nginx/html; location ^~ /sales { # strip '/sales' from the request before passing # it along to the Sales upstream rewrite ^/sales(/.*)$ $1 break; proxy_pass http://sales; proxy_redirect off; } } }
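
What the template does for the `upstream` block can be mimicked in a few lines: emit one `server` line per healthy node, and emit nothing at all when the service has no nodes. A rough sketch with a hypothetical `render_upstream` helper; the real rendering is done by consul-template:

```python
def render_upstream(name, nodes):
    """Render an Nginx upstream block from (address, port) pairs.

    Returns an empty string when there are no nodes, mirroring the
    {{ if service ... }} guard in the template.
    """
    if not nodes:
        return ""
    lines = [f"upstream {name} {{"]
    for address, port in nodes:
        lines.append(f"    server {address}:{port};")
    lines.append("    least_conn;")
    lines.append("}")
    return "\n".join(lines)
```

Skipping the block entirely matters because Nginx rejects an `upstream` with no `server` entries, so an empty service must not leave an empty block behind.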

Slide 117

Slide 117 text

github.com/autopilotpattern Container Pilot Consul Lifecycle: run

Slide 118

Slide 118 text

github.com/autopilotpattern Container Pilot node Consul Lifecycle: run • Attach to stdout/stderr • Return exit code of application to Docker runtime application container

Slide 119

Slide 119 text

github.com/autopilotpattern Container Pilot Consul Lifecycle: health node health application container

Slide 120

Slide 120 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json

Slide 121

Slide 121 text

github.com/autopilotpattern User-defined health check inside the container. Runs every poll seconds. Container Pilot Consul Lifecycle: health nginx (or node) health application container

Slide 122

Slide 122 text

github.com/autopilotpattern Container Pilot Consul Lifecycle: health Exit! nginx (or node) health application container

Slide 123

Slide 123 text

github.com/autopilotpattern Container Pilot nginx (or node) Consul health Lifecycle: health Exit code is 0? “I am customers-12345. I am available at 192.168.100.2:4000. I am healthy for the next 10 seconds.” application container

Slide 124

Slide 124 text

github.com/autopilotpattern Container Pilot nginx (or node) Consul health Lifecycle: health If exit code != 0, do nothing (TTL expires) application container
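
The "do nothing, TTL expires" rule means a failing instance is never explicitly deregistered; it simply stops renewing its lease. A minimal simulation of that heartbeat logic, with hypothetical names and timestamps passed in explicitly; it models the idea rather than Consul's API:

```python
class TtlCheck:
    """A service is healthy only while its last heartbeat is newer than the TTL."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.last_pass = None  # time of the most recent successful health check

    def heartbeat(self, now):
        self.last_pass = now

    def is_healthy(self, now):
        return self.last_pass is not None and now - self.last_pass < self.ttl


def run_health_poll(check, health_exit_code, now):
    """One poll: renew the TTL on exit code 0, otherwise do nothing."""
    if health_exit_code == 0:
        check.heartbeat(now)
```

With `poll` of 3 and `ttl` of 10, as in the customers config, an instance whose last passing check was at t=3 is still listed at t=12 and drops out at t=13.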

Slide 125

Slide 125 text

github.com/autopilotpattern Container Pilot nginx (or node) Consul Where is customers? 192.168.1.101:3000 application container Lifecycle: onChange

Slide 126

Slide 126 text

github.com/autopilotpattern Container Pilot nginx (or node) Consul Where is customers? 192.168.1.101:3000 application container Lifecycle: onChange Check Consul for services listed in backends. Runs every poll seconds.
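
The backend poll can be sketched as "query, compare with last result, fire only on a difference". This is an illustration under stated assumptions: `BackendWatcher` is a hypothetical name, the catalog query is any callable returning address strings, and the watcher is assumed to start from an empty list:

```python
class BackendWatcher:
    """Polls a catalog query and fires a callback only when the result changes."""

    def __init__(self, query, on_change):
        self._query = query          # hypothetical: returns a list of "ip:port" strings
        self._on_change = on_change  # stand-in for the configured onChange command
        self._last = []              # assumption: start from an empty backend list

    def poll(self):
        # Sort so that a reordered but identical result is not treated as a change.
        current = sorted(self._query())
        if current != self._last:
            self._last = current
            self._on_change(current)
            return True
        return False
```

Calling `poll()` every `poll` seconds gives the behavior on the slide: repeated identical answers from Consul are silent, and a node joining or leaving triggers exactly one onChange.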

Slide 127

Slide 127 text

github.com/autopilotpattern
#!/bin/sh

# Render Nginx configuration template using values from Consul,
# but do not reload because Nginx hasn't started yet
preStart() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf"
}

# Render Nginx configuration template using values from Consul,
# then gracefully reload Nginx
onChange() {
    consul-template \
        -once \
        -consul consul:8500 \
        -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload"
}

until
    cmd=$1
    if [ -z "$cmd" ]; then
        onChange
    fi
    shift 1
    $cmd "$@"
    [ "$?" -ne 127 ]
do
    onChange
    exit
done
~/workshop/nginx/reload-nginx.sh

Slide 128

Slide 128 text

github.com/autopilotpattern { "consul": "consul:8500", "services": [ { "name": "customers", "port": 4000, "health": "/usr/bin/curl --fail -s http://localhost:4000/data", "poll": 3, "ttl": 10 } ], "backends": [ { "name": "sales", "poll": 3, "onChange": "pkill -SIGHUP node" } ] } ~/workshop/customers/etc/containerpilot.json

Slide 129

Slide 129 text

github.com/autopilotpattern
var upstreamHosts = [];

var getUpstreams = function(force, callback) {
    // get data from Consul
    // fill upstreamHosts
    // fire callback
}

process.on('SIGHUP', function () {
    console.log('Received SIGHUP');
    getUpstreams(true, function(hosts) {
        console.log('Updated upstreamHosts');
    });
});
~/workshop/sales/sales.js

Slide 130

Slide 130 text

github.com/autopilotpattern
...
  "telemetry": {
    "port": 9090,
    "sensors": [
      {
        "name": "tb_nginx_connections_unhandled_total",
        "help": "Number of accepted connections that were not handled",
        "type": "gauge",
        "poll": 5,
        "check": ["/opt/containerpilot/sensor.sh", "unhandled"]
      },
      {
        "name": "tb_nginx_connections_load",
        "help": "Ratio of active connections to max worker connections",
        "type": "gauge",
        "poll": 5,
        "check": ["/opt/containerpilot/sensor.sh", "connections_load"]
      }
    ]
  }
}
~/workshop/nginx/etc/containerpilot.json
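
Both sensors can be derived from Nginx's `stub_status` output, which the `/health` location shown earlier exposes. A minimal Python sketch of the arithmetic, assuming the standard `stub_status` text layout and `worker_connections 1024` from the earlier nginx.conf; the function names are hypothetical and this is not the workshop's `sensor.sh`:

```python
import re


def parse_stub_status(text):
    """Extract the counters from Nginx stub_status output."""
    active = int(re.search(r"Active connections:\s+(\d+)", text).group(1))
    # The accepts/handled/requests totals sit on their own all-numeric line.
    accepts, handled, requests = (
        int(n) for n in re.search(r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups()
    )
    return {"active": active, "accepts": accepts,
            "handled": handled, "requests": requests}


def unhandled(stats):
    """Accepted connections that were never handled."""
    return stats["accepts"] - stats["handled"]


def connections_load(stats, worker_connections=1024):
    """Ratio of active connections to the configured maximum."""
    return stats["active"] / worker_connections
```

Each `check` command only needs to print one such number. ContainerPilot then serves the gauges on the telemetry port for a collector such as Prometheus to scrape.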

Slide 131

Slide 131 text

github.com/autopilotpattern Stateful applications

Slide 132

Slide 132 text

github.com/autopilotpattern read/write read-only async replication App Primary Replica Consul MySQL with replication

Slide 133

Slide 133 text

github.com/autopilotpattern
~ $ git clone git@github.com:autopilotpattern/mysql.git
~ $ cd mysql
~/mysql $ tree --dirsfirst
.
├── bin
│   └── manage.py
├── etc
│   ├── containerpilot.json
│   └── my.cnf.tmpl
├── tests
├── _env
├── Dockerfile
├── docker-compose.yml
├── local-compose.yml
└── setup.sh

Slide 134

Slide 134 text

github.com/autopilotpattern ~/mysql/docker-compose.yml mysql: image: autopilotpattern/mysql:latest mem_limit: 4g restart: always # expose for linking, but each container gets a private IP for # internal use as well expose: - 3306 labels: - triton.cns.services=mysql env_file: _env environment: - CONTAINERPILOT=file:///etc/containerpilot.json

Slide 135

Slide 135 text

~/mysql/docker-compose.yml (same file as Slide 134), annotated: Infrastructure-backed service discovery requirement

Slide 136

Slide 136 text

~/mysql/docker-compose.yml (same file as Slide 134), annotated: Credentials from environment

Slide 137

Slide 137 text

    ~/workshop/mysql $ ./setup.sh /path/to/private/key.pem
    ~/workshop/mysql $ emacs _env

    MYSQL_USER=me
    MYSQL_PASSWORD=password1
    MYSQL_REPL_USER=repl
    MYSQL_REPL_PASSWORD=password2
    MYSQL_DATABASE=mydb
    MANTA_BUCKET=//stor/triton-mysql
    MANTA_USER=
    MANTA_SUBUSER=
    MANTA_ROLE=
    MANTA_URL=https://us-east.manta.joyent.com
    MANTA_KEY_ID=1a:b8:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
    MANTA_PRIVATE_KEY=-----BEGIN RSA PRIVATE KEY-----#…
    CONSUL=consul.svc.0f06a3e0-a0da-eb00-a7ae-989d4e44e2ad.us-east-1.cns.joyent.com

Slide 138

Slide 138 text

    ~/mysql $ docker-compose -p my up -d
    Creating my_consul_1
    Creating my_mysql_1
    ~/mysql $ docker-compose -p my ps
    Name          Command                         State   Ports
    ---------------------------------------------------------------------------------
    my_consul_1   /bin/start -server -bootst...   Up      53/tcp, 53/udp, 8300/tcp...
    my_mysql_1    containerpilot mysqld…          Up      0.0.0.0:3600

Slide 139

Slide 139 text

    ~/mysql $ docker-compose -p my scale mysql=2
    Creating my_mysql_2
    ~/mysql $ docker-compose -p my ps
    Name          Command                         State   Ports
    ---------------------------------------------------------------------------------
    my_consul_1   /bin/start -server -bootst...   Up      53/tcp, 53/udp, 8300/tcp...
    my_mysql_1    containerpilot mysqld…          Up      0.0.0.0:3600
    my_mysql_2    containerpilot mysqld…          Up      0.0.0.0:3600

Slide 140

Slide 140 text

~/mysql/Dockerfile

    FROM percona:5.6

    RUN apt-get update && apt-get install -y \
        python python-dev gcc curl percona-xtrabackup

    # get Python drivers for MySQL, Consul, and Manta
    RUN curl -Ls -o get-pip.py https://bootstrap.pypa.io/get-pip.py && \
        python get-pip.py && pip install \
        PyMySQL==0.6.7 python-Consul==0.4.7 manta==2.5.0 mock==2.0.0

    # get ContainerPilot release (see repo for checksum verification!)
    RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/…
        tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /usr/local/bin/

    # configure ContainerPilot and MySQL
    COPY etc/* /etc/
    COPY bin/* /usr/local/bin/

    # override the parent entrypoint
    ENTRYPOINT []

    # use --console to get error logs to stderr
    CMD [ "containerpilot", "mysqld", \
          "--console", \
          "--log-bin=mysql-bin", \
          "--log_slave_updates=ON", \
          "--gtid-mode=ON", \
          "--enforce-gtid-consistency=ON" \
    ]

Slide 144

Slide 144 text

~/mysql/etc/containerpilot.json

    {
      "consul": "{{ .CONSUL }}:8500",
      "preStart": "python /usr/local/bin/manage.py",
      "services": [
        {
          "name": "mysql",
          "port": 3306,
          "health": "python /usr/local/bin/manage.py health",
          "poll": 5,
          "ttl": 25
        }
      ],
      "backends": [
        {
          "name": "mysql-primary",
          "poll": 10,
          "onChange": "python /usr/local/bin/manage.py on_change"
        }
      ]
    }

Slide 145

Slide 145 text

~/mysql/etc/containerpilot.json (same file as Slide 144), annotated: Environment variable interpolation

Slide 146

Slide 146 text

~/mysql/etc/containerpilot.json (same file as Slide 144), annotated: Service definition

Slide 147

Slide 147 text

~/mysql/etc/containerpilot.json (same file as Slide 144), annotated: Backend definition

Slide 148

Slide 148 text

~/mysql/etc/containerpilot.json (same file as Slide 144), annotated: Huh? This isn’t in our docker-compose.yml

Slide 149

Slide 149 text

~/mysql/etc/containerpilot.json (same file as Slide 144), annotated: Logic lives in manage.py

Slide 150

Slide 150 text

Lifecycle: preStart
[Diagram: ContainerPilot inside the MySQL container; Consul]

Slide 151

Slide 151 text

Lifecycle: preStart
[Diagram: ContainerPilot (PID1) inside the MySQL container; Consul (separate container)]

Slide 152

Slide 152 text

Lifecycle: preStart
[Diagram: ContainerPilot inside the MySQL container; Consul; Manta object store (Store snapshots)]

Slide 153

Slide 153 text

Lifecycle: preStart
[Diagram: ContainerPilot and preStart inside the MySQL container; Consul; Manta object store]

Slide 154

Slide 154 text

Lifecycle: preStart
[Diagram: ContainerPilot and preStart inside the MySQL container; Consul; Manta object store]
Note: no main application running yet! If exit code of preStart != 0, ContainerPilot exits
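
The gating rule on this slide can be sketched in a few lines of Python. This is not ContainerPilot's own code, just an illustration of the behavior: run a preStart command to completion and only start the main application if it exits with 0. The function name and the three stand-in commands are hypothetical.

```python
import subprocess
import sys

def run_with_prestart(pre_start_cmd, main_cmd):
    """Run pre_start_cmd to completion; start main_cmd only if it exits 0."""
    pre = subprocess.run(pre_start_cmd)
    if pre.returncode != 0:
        # A failing preStart means the main application is never started.
        return pre.returncode
    return subprocess.run(main_cmd).returncode

# Stand-in commands so the sketch runs anywhere Python does.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
fail = [sys.executable, "-c", "raise SystemExit(3)"]
app = [sys.executable, "-c", "print('main application running')"]

print(run_with_prestart(ok, app))    # app runs, prints 0
print(run_with_prestart(fail, app))  # app never runs, prints 3
```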

Slide 155

Slide 155 text

Lifecycle: preStart
[Diagram: ContainerPilot and preStart inside the MySQL container; Consul; Manta object store]
“Has a snapshot been written to Manta?”

Slide 156

Slide 156 text

Lifecycle: preStart
[Diagram: ContainerPilot and preStart inside the MySQL container; Consul; Manta object store]
“Has a snapshot been written to Manta?” “Nope!”

Slide 157

Slide 157 text

Lifecycle: preStart
[Diagram: ContainerPilot and preStart inside the MySQL container; Consul; Manta object store]
“Has a snapshot been written to Manta?” “Nope!” initialize DB

Slide 158

Slide 158 text

~/mysql/bin/manage.py

    def pre_start():
        """
        MySQL must be running in order to execute most of our setup behavior
        so we're just going to make sure the directory structures are in
        place and then let the first health check handler take it from there
        """
        if not os.path.isdir(os.path.join(config.datadir, 'mysql')):
            last_backup = has_snapshot()
            if last_backup:
                get_snapshot(last_backup)
                restore_from_snapshot(last_backup)
            else:
                if not initialize_db():
                    log.info('Skipping database setup.')
        sys.exit(0)

Slide 159

Slide 159 text

~/mysql/bin/manage.py, pre_start() (same code as Slide 158), annotated: Check w/ Consul for snapshot

Slide 160

Slide 160 text

~/mysql/bin/manage.py, pre_start() (same code as Slide 158), annotated: calls /usr/bin/mysql_install_db

Slide 161

Slide 161 text

~/mysql/bin/manage.py, pre_start() (repeated from Slide 158)

Slide 162

Slide 162 text

Lifecycle: run
[Diagram: ContainerPilot inside the MySQL container; Consul; Manta object store]

Slide 163

Slide 163 text

Lifecycle: run
[Diagram: ContainerPilot and mysqld inside the MySQL container; Consul; Manta object store]
• Attach to stdout/stderr
• Return exit code of application to Docker runtime

Slide 164

Slide 164 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul; Manta object store]

Slide 165

Slide 165 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul; Manta object store]
User-defined health check inside the container. Runs every poll seconds.

Slide 166

Slide 166 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul; Manta object store]
first time? finish initialization

Slide 167

Slide 167 text

~/mysql/bin/manage.py

    def health():
        """
        Run a simple health check. Also acts as a check for whether the
        ContainerPilot configuration needs to be reloaded (if it's been
        changed externally), or if we need to make a backup because the
        backup TTL has expired.
        """
        node = MySQLNode()
        cp = ContainerPilot(node)
        if cp.update():
            cp.reload()
            return

        # Because we need MySQL up to finish initialization, we need to check
        # for each pass thru the health check that we've done so. The happy
        # path is to check a lock file against the node state (which has been
        # set above) and immediately return when we discover the lock exists.
        # Otherwise, we bootstrap the instance.
        was_ready = assert_initialized_for_state(node)

        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # Update our lock on being the primary/standby.
        if node.is_primary() or node.is_standby():
            update_session_ttl()

        # Create a snapshot and send it to the object store
        if all((node.is_snapshot_node(),
                (not is_backup_running()),
                (is_binlog_stale(node.conn) or is_time_for_snapshot()))):
            write_snapshot(node.conn)

        mysql_query(node.conn, 'SELECT 1', ())

Slide 168

Slide 168 text

~/mysql/bin/manage.py

    def run_as_primary(node):
        """
        The overall workflow here is ported and reworked from the
        Oracle-provided Docker image:
        https://github.com/mysql/mysql-docker/blob/mysql-server/5.7/docker-entrypoint.sh
        """
        node.state = PRIMARY
        mark_as_primary(node)
        node.conn = wait_for_connection()
        if node.conn:
            # if we can make a connection w/o a password then this is the
            # first pass
            set_timezone_info()
            setup_root_user(node.conn)
            create_db(node.conn)
            create_default_user(node.conn)
            create_repl_user(node.conn)
            run_external_scripts('/etc/initdb.d')
            expire_root_password(node.conn)
        else:
            ctx = dict(user=config.repl_user,
                       password=config.repl_password,
                       database=config.mysql_db)
            node.conn = wait_for_connection(**ctx)
            stop_replication(node.conn)  # in case this is a newly-promoted primary

        if USE_STANDBY:
            # if we're using a standby instance then we need to first
            # snapshot the primary so that we can bootstrap the standby.
            write_snapshot(node.conn)

Annotated: Set up DB, user, replication user, and expire password, etc.

Slide 169

Slide 169 text

~/mysql/bin/manage.py

    def run_as_replica(node):
        try:
            ctx = dict(user=config.repl_user,
                       password=config.repl_password,
                       database=config.mysql_db)
            node.conn = wait_for_connection(**ctx)
            set_primary_for_replica(node.conn)
        except Exception as ex:
            log.exception(ex)

    def set_primary_for_replica(conn):
        """
        Set up GTID-based replication to the primary; once this is set the
        replica will automatically try to catch up with the primary's last
        transactions.
        """
        primary = get_primary_host()
        sql = ('CHANGE MASTER TO '
               'MASTER_HOST = %s, '
               'MASTER_USER = %s, '
               'MASTER_PASSWORD = %s, '
               'MASTER_PORT = 3306, '
               'MASTER_CONNECT_RETRY = 60, '
               'MASTER_AUTO_POSITION = 1, '
               'MASTER_SSL = 0; '
               'START SLAVE;')
        mysql_exec(conn, sql, (primary, config.repl_user, config.repl_password,))

Slide 170

Slide 170 text

~/mysql/bin/manage.py, run_as_replica() and set_primary_for_replica() (same code as Slide 169), annotated: gets from Consul

Slide 171

Slide 171 text

~/mysql/bin/manage.py, run_as_replica() and set_primary_for_replica() (same code as Slide 169), annotated: Remember our preStart downloaded the snapshot

Slide 172

Slide 172 text

Wait a sec. How do we know which instance is primary!?

Slide 173

Slide 173 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul]
Exit!

Slide 174

Slide 174 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul]
Exit code is 0? “I am mysql-12345. I am available at 192.168.100.2:4000. I am healthy for the next 10 seconds.”

Slide 175

Slide 175 text

Lifecycle: health
[Diagram: ContainerPilot, health, and mysqld inside the MySQL container; Consul]
If exit code != 0, do nothing (TTL expires)
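
The two outcomes above amount to a TTL heartbeat, which can be sketched with an in-memory stand-in for Consul. The TTLRegistry class and poll_once function are hypothetical names for illustration; the poll and ttl values are the ones from containerpilot.json, and time is passed in explicitly so the sketch is deterministic.

```python
class TTLRegistry:
    """In-memory stand-in for a TTL-checked service catalog."""

    def __init__(self):
        self._expires = {}  # service id -> time at which the entry goes stale

    def heartbeat(self, service_id, ttl, now):
        self._expires[service_id] = now + ttl

    def healthy(self, now):
        return sorted(s for s, t in self._expires.items() if t > now)

def poll_once(registry, service_id, health_check, ttl, now):
    """One poll: run the health check and heartbeat only on exit code 0."""
    if health_check() == 0:
        registry.heartbeat(service_id, ttl, now)
    # On a non-zero exit code do nothing; the entry simply ages out.

registry = TTLRegistry()
poll_once(registry, "mysql-12345", lambda: 0, ttl=25, now=0)  # passing check
poll_once(registry, "mysql-12345", lambda: 1, ttl=25, now=5)  # failing check
print(registry.healthy(now=10))  # ['mysql-12345'], still within the TTL
print(registry.healthy(now=30))  # [], the TTL has expired
```

Because a failed check is a non-event, a single transient failure does not remove the instance; it only disappears once no check has passed for a full TTL.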

Slide 176

Slide 176 text

Ask Consul for Primary

Slide 177

Slide 177 text

Ask Consul for Primary
• I’m the primary!

Slide 178

Slide 178 text

Ask Consul for Primary
• I’m the primary! Update lock TTL w/ each health check

Slide 179

Slide 179 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica!

Slide 180

Slide 180 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica! Syncs up using snapshot and GTID

Slide 181

Slide 181 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica!
• No Primary? I’m the Primary!

Slide 182

Slide 182 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica!
• No Primary? I’m the Primary? Need to assert only 1 primary

Slide 183

Slide 183 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica!
• No Primary? I’m the Primary? Set lock in Consul w/ TTL: either I’m the primary! or Failed! Go back to start

Slide 184

Slide 184 text

Ask Consul for Primary
• I’m the primary!
• Someone else is the primary! I’m a replica!
• No Primary? I’m the Primary? Set lock in Consul w/ TTL: either I’m the primary! or Failed! Go back to start
Update lock TTL w/ each health check. Rewrite ContainerPilot config and SIGHUP
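
The decision flow built up over the last few slides can be sketched against an in-memory lock store. This is a simplified stand-in, not the Consul session API or the code in manage.py: LockStore and decide_role are hypothetical names, and time is passed in explicitly instead of being read from a clock.

```python
class LockStore:
    """In-memory stand-in for a key/value store with TTL'd locks."""

    def __init__(self):
        self._locks = {}  # key -> (owner, expiry time)

    def holder(self, key, now):
        owner, expires = self._locks.get(key, (None, 0))
        return owner if expires > now else None

    def acquire(self, key, owner, ttl, now):
        """Take the lock only if nobody currently holds it."""
        if self.holder(key, now) is not None:
            return False
        self._locks[key] = (owner, now + ttl)
        return True

    def renew(self, key, owner, ttl, now):
        """Extend the lock; only the current holder may do so."""
        if self.holder(key, now) != owner:
            return False
        self._locks[key] = (owner, now + ttl)
        return True

def decide_role(store, node, ttl, now, key="mysql-primary"):
    primary = store.holder(key, now)
    if primary == node:
        store.renew(key, node, ttl, now)  # update lock TTL on each check
        return "primary"
    if primary is not None:
        return "replica"
    # No primary: try to claim the lock; on failure, go back to the start.
    return "primary" if store.acquire(key, node, ttl, now) else "retry"

store = LockStore()
print(decide_role(store, "mysql-1", ttl=25, now=0))   # primary
print(decide_role(store, "mysql-2", ttl=25, now=1))   # replica
print(decide_role(store, "mysql-1", ttl=25, now=5))   # primary (renewed)
print(decide_role(store, "mysql-2", ttl=25, now=31))  # primary (lock expired)
```

The "retry" result only shows up when two nodes race for the lock at the same moment, which this single-threaded sketch cannot reproduce; a real store has to make the acquire step atomic.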

Slide 185

Slide 185 text

~/mysql/etc/containerpilot.json (repeated from Slide 144)

Slide 186

Slide 186 text

~/mysql/etc/containerpilot.json

    {
      "consul": "{{ .CONSUL }}:8500",
      "preStart": "python /usr/local/bin/manage.py",
      "services": [
        {
          "name": "mysql-primary",
          "port": 3306,
          "health": "python /usr/local/bin/manage.py health",
          "poll": 5,
          "ttl": 25
        }
      ],
      "backends": [
        {
          "name": "mysql-primary",
          "poll": 10,
          "onChange": "python /usr/local/bin/manage.py on_change"
        }
      ]
    }

Annotated: Rewrite & reload config

Slide 187

Slide 187 text

~/mysql/bin/manage.py

    def health():
        """
        Run a simple health check. Also acts as a check for whether the
        ContainerPilot configuration needs to be reloaded (if it's been
        changed externally), or if we need to make a backup because the
        backup TTL has expired.
        """
        node = MySQLNode()
        cp = ContainerPilot(node)
        if cp.update():
            cp.reload()
            return

        was_ready = assert_initialized_for_state(node)

        # cp.reload() will exit early so no need to setup
        # connection until this point
        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # Update our lock on being the primary/standby.
        # If this lock is allowed to expire and the health check for the primary
        # fails, the `onChange` handlers for the replicas will try to self-elect
        # as primary by obtaining the lock.
        # If this node can update the lock but the DB fails its health check,
        # then the operator will need to manually intervene if they want to
        # force a failover. This architecture is a result of Consul not
        # permitting us to acquire a new lock on a health-checked session if the
        # health check is *currently* failing, but has the happy side-effect of
        # reducing the risk of flapping on a transient health check failure.
        if node.is_primary() or node.is_standby():
            update_session_ttl()

        # Create a snapshot and send it to the object store.
        if all((node.is_snapshot_node(),
                (not is_backup_running()),
                (is_binlog_stale(node.conn) or is_time_for_snapshot()))):
            write_snapshot(node.conn)

        mysql_query(node.conn, 'SELECT 1', ())

Slide 188

Slide 188 text

Wait a sec. How do we fail-over?

Slide 189

Slide 189 text

~/mysql/etc/containerpilot.json (repeated from Slide 144)

Slide 190

Slide 190 text

Lifecycle: onChange
[Diagram: ContainerPilot and mysqld inside the MySQL container; Consul]
Where is mysql-primary? 192.168.1.100

Slide 191

Slide 191 text

Lifecycle: onChange
[Diagram: ContainerPilot and mysqld inside the MySQL container; Consul]
Where is mysql-primary? 192.168.1.100
Check Consul for services listed in backends. Runs every poll seconds.

Slide 192

Slide 192 text

[Timeline: primary and replica]
primary: Healthy! | Healthy! | Failed!
replica: Ask Consul for Primary (no change) | Ask Consul for Primary (no change) | Ask Consul for Primary (fire onChange handler)
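
The timeline above boils down to comparing each poll result with the previous one. A minimal sketch, assuming the first poll only establishes a baseline; watch_backend is a hypothetical helper, and each snapshot stands for the set of healthy addresses Consul returned on one poll.

```python
def watch_backend(snapshots, on_change):
    """Fire on_change whenever a poll result differs from the previous one."""
    previous = None
    for current in snapshots:
        current = frozenset(current)
        if previous is not None and current != previous:
            on_change(previous, current)
        previous = current

events = []
# Three polls: primary healthy, primary healthy, primary gone.
polls = [{"192.168.1.100"}, {"192.168.1.100"}, set()]
watch_backend(polls, lambda old, new: events.append((sorted(old), sorted(new))))
print(events)  # [(['192.168.1.100'], [])]
```

Only the third poll fires the handler, which matches the "no change, no change, fire onChange handler" sequence on the slide.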

Slide 193

Slide 193 text

~/mysql/bin/manage.py

    def on_change():
        node = MySQLNode()
        # cp is used below but was not defined on the slide; constructed
        # here the same way health() does it
        cp = ContainerPilot(node)
        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # need to stop replication whether we're the new primary or not
        stop_replication(node.conn)

        while True:
            try:
                # if there is no primary node, we'll try to obtain the lock.
                # if we get the lock we'll reload as the new primary, otherwise
                # someone else got the lock but we don't know who yet so loop
                primary = get_primary_node()
                if not primary:
                    session_id = get_session(no_cache=True)
                    if mark_with_session(PRIMARY_KEY, node.hostname, session_id):
                        node.state = PRIMARY
                        if cp.update():
                            cp.reload()
                        return
                    else:
                        # we lost the race to lock the session for ourselves
                        time.sleep(1)
                        continue

                # we know who the primary is but not whether they're healthy.
                # if it's not healthy, we'll throw an exception and start over.
                ip = get_primary_host(primary=primary)
                if ip == node.ip:
                    if cp.update():
                        cp.reload()
                    return

                set_primary_for_replica(node.conn)
                return

            except Exception as ex:
                # This exception gets thrown if the session lock for `mysql-primary`
                # key has not expired yet (but there's no healthy primary either),
                # or if the replica's target primary isn't ready yet.
                log.debug(ex)
                time.sleep(1)  # avoid hammering Consul
                continue

Slide 194

Slide 194 text

[Timeline: primary and replica]
primary: Healthy! | Healthy! | Failed!
replica: Ask Consul for Primary (no change) | Ask Consul for Primary (no change) | Ask Consul for Primary (fire onChange handler)

Slide 195

Slide 195 text

[Timeline: primary and replica]
primary: Healthy! | Healthy! | Failed!
replica: Ask Consul for Primary (no change) | Ask Consul for Primary (no change) | Ask Consul for Primary (fire onChange handler)
onChange handler on the replica: Ask Consul for Primary | Ok, I’m primary | Set lock in Consul | Success!
replica is now primary: Healthy!

Slide 196

Slide 196 text

Applications on Autopilot Tim Gross @0x74696d (“tim”) github.com/autopilotpattern