
Stateful Applications on Autopilot

Given at Velocity Santa Clara 2016

Tim Gross

June 21, 2016

Transcript

  1. github.com/autopilotpattern How do we get from dev to prod?

    • Service discovery
    • Load balancing
    • Automated failover
    • Config changes
    • Monitoring
  2.–6. github.com/autopilotpattern [Architecture diagram, repeated across five slides with a
    different caption each time: Load Balancing; Replication & Fail-over; Service Discovery;
    Logging; Monitoring. Nginx routes /customers, /sales, /customers/data, and /sales/data;
    Consul provides the catalog; the MySQL Primary (read/write) asynchronously replicates to a
    MySQL Replica (read-only); ES Master and ES Data back Kibana; Prometheus, Logstash, and the
    Customers and Sales services round out the stack.]
  7. github.com/autopilotpattern [Diagram: microservices application w/ discovery catalog — Nginx,
    Sales, Customers, Consul; routes /customers, /sales, /sales/data, /customers/data.] How do we
    make existing applications use it?
  8. github.com/autopilotpattern [Diagram: MySQL with replication — App, Primary, Replica;
    read/write vs read-only; async replication.] How does the client find the DB? How does the
    replica find the primary? How does the primary tell the replica where to start?
  9. github.com/autopilotpattern [Diagram: MySQL with replication, continued.] How do we update
    the client on failover? How do we promote a replica? How do we orchestrate backups?
  10. github.com/autopilotpattern

    • No CM server in local development
    • No service discovery on change
    • Agents are heavy
  11. github.com/autopilotpattern "[a] pattern where containers autonomously adapt to changes in
    their environment and coordinate their actions thru a globally shared state" — Lukasz
    Guminski, Container Solutions, http://container-solutions.com/containerpilot-on-mantl/
  12. github.com/autopilotpattern [Diagram: cluster management & provisioning — the full stack
    (Nginx, Consul, MySQL Primary, ES Master, Prometheus, Logstash, Customers) repeated across
    multiple VMs or physical hardware.]
  13. github.com/autopilotpattern [Diagram: Docker bridge networking — Sales, Customers, and
    Consul on a compute node behind NAT at 192.168.1.100:32380.] "Where is Customers?"
    "172.17.0.2:80"
  14. github.com/autopilotpattern [Diagram: Docker bridge networking, continued — Sales tries
    172.17.0.2:80.] No route to host! WTF???!!!
  15. github.com/autopilotpattern [Diagram: Docker host networking — Consul at 192.168.1.101.]
    Customers registers: "I'm listening on 192.168.1.100:80"
  16. github.com/autopilotpattern [Diagram: Docker host networking, continued — a second service
    on the same node wants the same 192.168.1.100:80.] Port conflicts!
  17. github.com/autopilotpattern [Diagram: microservices app — Nginx, Sales, Consul, Customers.]
    How do we bootstrap service catalog HA? How do services find the service catalog?
  18. github.com/autopilotpattern App-centric micro-orchestrator that runs inside the container.
    User-defined behaviors:

    • Lifecycle hooks (preStart, preStop, postStop)
    • Health checks w/ heartbeats
    • Watch discovery catalog for changes
    • Update config on upstream changes
    • Gather performance metrics
  19. github.com/autopilotpattern [Diagram: application container — ContainerPilot wraps the
    Application and asks Consul "Where is Sales?"; Consul answers 192.168.1.100, 192.168.1.101,
    192.168.1.102; ContainerPilot fires the onChange event.]
  20. github.com/autopilotpattern [Diagram: application container, continued — after the onChange
    event the Application talks to http://192.168.1.100.]
  21. github.com/autopilotpattern [Diagram: application container.] User-defined behavior hooks:

    • preStart
    • preStop
    • postStop
    • health
    • onChange
    • sensor
    • task
    • co-process
  22. github.com/autopilotpattern

    ~ $ git clone git@github.com:autopilotpattern/workshop.git
    ~ $ cd workshop && git checkout workshop
    ~/workshop $ tree --dirsfirst
    .
    ├── customers
    │   ├── Dockerfile
    │   ├── containerpilot.json
    │   ├── customers.js
    │   └── package.json
    ├── nginx
    │   ├── Dockerfile
    │   ├── containerpilot.json
    │   ├── index.html
    │   ├── index.js
    │   ├── nginx.conf
    │   └── nginx.conf.ctmpl
    ├── sales
    │   ├── Dockerfile
    │   ├── containerpilot.json
    │   ├── package.json
    │   └── sales.js
    └── docker-compose.yml
  23. github.com/autopilotpattern ~/workshop/customers/Dockerfile

    # a Node.js application container
    FROM gliderlabs/alpine:3.3

    # dependencies
    RUN apk update && apk add nodejs curl
    COPY package.json /opt/customers/
    RUN cd /opt/customers && npm install

    # add our application and configuration
    COPY customers.js /opt/customers/

    EXPOSE 4000
    CMD [ "node", "/opt/customers/customers.js" ]
  24. github.com/autopilotpattern ~/workshop/customers/Dockerfile

    # a Node.js application container
    FROM gliderlabs/alpine:3.3

    # dependencies
    RUN apk update && apk add nodejs curl
    COPY package.json /opt/customers/
    RUN cd /opt/customers && npm install

    # get ContainerPilot release (please verify checksum in real life, but YOLO!)
    RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/… && \
        tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /bin/

    # add our application and configuration
    COPY customers.js /opt/customers/
    COPY containerpilot.json /etc/containerpilot.json
    ENV CONTAINERPILOT=file:///etc/containerpilot.json

    EXPOSE 4000
    CMD [ "/bin/containerpilot", "node", "/opt/customers/customers.js" ]
  27. github.com/autopilotpattern ~/workshop/customers/etc/containerpilot.json

    {
      "consul": "consul:8500",
      "services": [
        {
          "name": "customers",
          "port": 4000,
          "health": "/usr/bin/curl --fail -s http://localhost:4000/data",
          "poll": 3,
          "ttl": 10
        }
      ],
      "backends": [
        {
          "name": "sales",
          "poll": 3,
          "onChange": "pkill -SIGHUP node"
        }
      ]
    }
  28. github.com/autopilotpattern ~/workshop/customers/etc/containerpilot.json (same config as
    above) — the "consul" value can be templatized w/ an env var: {{ .CONSUL }}
  31. github.com/autopilotpattern [Diagram: application container — ContainerPilot, application,
    health check, and onChange event, as processes:]

    root@993acf351cd9:/# ps axo uid,pid,ppid,cmd
    UID   PID  PPID  CMD
    root    1     0  /bin/containerpilot
    root   94     1  ├─ node /opt/customers/customers.js
    root  107     1  ├─ /usr/bin/curl --fail localhost:4000
    root  120     1  └─ pkill -SIGHUP node
  32. github.com/autopilotpattern ~/workshop/nginx/etc/containerpilot.json

    {
      "consul": "consul:8500",
      "preStart": "/bin/reload-nginx.sh preStart",
      "services": [
        {
          "name": "nginx",
          "port": 80,
          "interfaces": ["eth1", "eth0"],
          "health": "/usr/bin/curl --fail -s http://localhost/health",
          "poll": 10,
          "ttl": 25
        }
      ],
      "backends": [
        {
          "name": "sales",
          "poll": 3,
          "onChange": "/bin/reload-nginx.sh onChange"
        },
        {
          "name": "customers",
          "poll": 3,
          "onChange": "/bin/reload-nginx.sh onChange"
        }
      ]...
  36. github.com/autopilotpattern [Diagram: application container — ContainerPilot, application,
    health check, and onChange event, as processes:]

    root@993acf351cd9:/# ps axo uid,pid,ppid,cmd
    UID   PID  PPID  CMD
    root    1     0  /bin/containerpilot
    root   94     1  ├─ nginx -g "daemon off;"
    root  107     1  ├─ /usr/bin/curl --fail localhost
    root  120     1  └─ consul-template …
    root  128   120     └─ nginx -s reload
  37. github.com/autopilotpattern [Diagram: application container — ContainerPilot and Consul.]
    Lifecycle: preStart. Note: no main application running yet! If the exit code of preStart != 0,
    ContainerPilot exits.
  38. github.com/autopilotpattern ~/workshop/nginx/reload-nginx.sh

    #!/bin/sh

    # Render Nginx configuration template using values from Consul,
    # but do not reload because Nginx hasn't started yet
    preStart() {
        consul-template \
            -once \
            -consul consul:8500 \
            -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf"
    }

    # Render Nginx configuration template using values from Consul,
    # then gracefully reload Nginx
    onChange() {
        consul-template \
            -once \
            -consul consul:8500 \
            -template "/etc/containerpilot/nginx.conf.ctmpl:/etc/nginx/nginx.conf:nginx -s reload"
    }

    until
        cmd=$1
        if [ -z "$cmd" ]; then
            onChange
        fi
        shift 1
        $cmd "$@"
        [ "$?" -ne 127 ]
    do
        onChange
        exit
    done
  39. github.com/autopilotpattern ~/workshop/nginx/reload-nginx.sh (same script as above) — can be
    templatized w/ env var: $CONTAINERPILOT_<SERVICENAME>_IP
  40. github.com/autopilotpattern ~/workshop/nginx/reload-nginx.sh (same script as above) — anatomy
    of the -template argument:

    /etc/containerpilot/nginx.conf.ctmpl : /etc/nginx/nginx.conf : nginx -s reload
              render this file           :      to this file     : and then do this
  41. github.com/autopilotpattern [Diagram: Nginx container.] Lifecycle: preStart — consul-template
    gets addresses for upstreams from Consul: "None yet!" or "{ customers: [192.168.1.100:4000,
    192.168.1.101:4000], sales: [192.168.1.102:3000, 192.168.1.103:3000] }"
  42. github.com/autopilotpattern [Diagram: Nginx container, continued — consul-template then
    renders the virtualhost config.]
  43. github.com/autopilotpattern

    user nginx;
    worker_processes 1;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;

    events {
        worker_connections 1024;
    }

    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;
        sendfile on;
        keepalive_timeout 65;

        server {
            listen 80;
            server_name _;
            root /usr/share/nginx/html;

            location /health {
                # requires http_stub_status_module
                stub_status;
                allow 127.0.0.1;
                deny all;
            }
        }
    }
  44. github.com/autopilotpattern

    {{ if service "sales" }}
    upstream sales {
        # write the address:port pairs for each healthy Sales node
        {{ range service "sales" }}
        server {{.Address}}:{{.Port}};
        {{ end }}
        least_conn;
    }{{ end }}

    server {
        listen 80;
        server_name _;
        root /usr/share/nginx/html;

        {{ if service "sales" }}
        location ^~ /sales {
            # strip '/sales' from the request before passing
            # it along to the Sales upstream
            rewrite ^/sales(/.*)$ $1 break;
            proxy_pass http://sales;
            proxy_redirect off;
        }{{ end }}
    }
  47. github.com/autopilotpattern

    upstream sales {
        # write the address:port pairs for each healthy Sales node
        server 192.168.1.101:3000;
        server 192.168.1.102:3000;
        server 192.168.1.103:3000;
        least_conn;
    }

    server {
        listen 80;
        server_name _;
        root /usr/share/nginx/html;

        location ^~ /sales {
            # strip '/sales' from the request before passing
            # it along to the Sales upstream
            rewrite ^/sales(/.*)$ $1 break;
            proxy_pass http://sales;
            proxy_redirect off;
        }
    }
  48. github.com/autopilotpattern [Diagram: application container — ContainerPilot, node, Consul.]
    Lifecycle: run

    • Attach to stdout/stderr
    • Return exit code of application to Docker runtime
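
Attaching to stdout/stderr and propagating the exit code is what makes ContainerPilot behave like a well-mannered PID 1. A minimal sketch of that supervisor idea in Python — illustrative only, not ContainerPilot's actual implementation (which is a Go binary):

    # start the main app as a child, share our stdout/stderr, forward
    # signals, and exit with the child's exit code so Docker sees it
    import signal
    import subprocess
    import sys

    def run(cmd):
        proc = subprocess.Popen(cmd)  # child inherits our stdout/stderr
        for sig in (signal.SIGTERM, signal.SIGINT):
            # forward `docker stop` et al. to the application process
            signal.signal(sig, lambda signum, frame: proc.send_signal(signum))
        proc.wait()
        sys.exit(proc.returncode)     # the Docker runtime sees the app's exit code

    if __name__ == '__main__':
        run(sys.argv[1:])             # e.g. ['node', '/opt/customers/customers.js']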
  49. github.com/autopilotpattern ~/workshop/customers/etc/containerpilot.json (same config as
    slide 27)
  50. github.com/autopilotpattern [Diagram: application container — ContainerPilot, nginx (or
    node), Consul.] Lifecycle: health — a user-defined health check inside the container; runs
    every poll seconds.
  51. github.com/autopilotpattern [Diagram: application container.] Lifecycle: health — exit code
    is 0? Tell Consul: "I am customers-12345. I am available at 192.168.100.2:4000. I am healthy
    for the next 10 seconds."
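
That heartbeat is, in effect, a Consul TTL check being passed on every successful poll. A rough sketch of the equivalent calls using the python-consul client — the service name, ID, and address below are illustrative, and ContainerPilot derives all of this from containerpilot.json for you:

    import subprocess
    import time
    import consul  # pip install python-consul

    c = consul.Consul(host='consul')
    # register the service with a 10-second TTL check
    c.agent.service.register(
        name='customers', service_id='customers-12345',
        address='192.168.100.2', port=4000,
        check=consul.Check.ttl('10s'))

    while True:
        # the user-defined health check from containerpilot.json
        healthy = subprocess.call(
            ['curl', '--fail', '-s', 'http://localhost:4000/data']) == 0
        if healthy:
            # renew the TTL; if we stop passing, the check expires on its own
            c.agent.check.ttl_pass('service:customers-12345')
        time.sleep(3)  # "poll": 3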
  52. github.com/autopilotpattern [Diagram: application container.] Lifecycle: health — if the
    exit code != 0, do nothing (the TTL expires).
  53. github.com/autopilotpattern [Diagram: application container.] Lifecycle: onChange — "Where
    is customers?" "192.168.1.101:3000"
  54. github.com/autopilotpattern [Diagram: application container.] Lifecycle: onChange — check
    Consul for services listed in backends; runs every poll seconds.
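
The backends watch boils down to poll-and-compare: fetch the healthy instances of each backend from Consul and run the onChange hook only when the set has changed since the last poll. A sketch under those assumptions, again with python-consul (the pkill hook mirrors the customers config shown earlier):

    import subprocess
    import time
    import consul  # pip install python-consul

    c = consul.Consul(host='consul')
    last_seen = None

    while True:
        # only instances with passing health checks are returned
        _, nodes = c.health.service('sales', passing=True)
        current = sorted('%s:%d' % (n['Service']['Address'], n['Service']['Port'])
                         for n in nodes)
        if last_seen is not None and current != last_seen:
            subprocess.call(['pkill', '-SIGHUP', 'node'])  # the onChange hook
        last_seen = current
        time.sleep(3)  # "poll": 3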
  55. github.com/autopilotpattern ~/workshop/nginx/reload-nginx.sh (same script as slide 38)
  56. github.com/autopilotpattern ~/workshop/customers/etc/containerpilot.json (same config as
    slide 27 — note the onChange hook: "pkill -SIGHUP node")
  57. github.com/autopilotpattern ~/workshop/sales/sales.js

    var upstreamHosts = [];

    var getUpstreams = function(force, callback) {
        // get data from Consul
        // fill upstreamHosts
        // fire callback
    }

    process.on('SIGHUP', function () {
        console.log('Received SIGHUP');
        getUpstreams(true, function(hosts) {
            console.log('Updated upstreamHosts');
        });
    });
  58. github.com/autopilotpattern ~/workshop/nginx/etc/containerpilot.json

    ...
    "telemetry": {
      "port": 9090,
      "sensors": [
        {
          "name": "tb_nginx_connections_unhandled_total",
          "help": "Number of accepted connections that were not handled",
          "type": "gauge",
          "poll": 5,
          "check": ["/opt/containerpilot/sensor.sh", "unhandled"]
        },
        {
          "name": "tb_nginx_connections_load",
          "help": "Ratio of active connections to max worker connections",
          "type": "gauge",
          "poll": 5,
          "check": ["/opt/containerpilot/sensor.sh", "connections_load"]
        }
      ]
    }
    }
  59. github.com/autopilotpattern

    ~ $ git clone git@github.com:autopilotpattern/mysql.git
    ~ $ cd mysql
    ~/mysql $ tree --dirsfirst
    .
    ├── bin
    │   └── manage.py
    ├── etc
    │   ├── containerpilot.json
    │   └── my.cnf.tmpl
    ├── tests
    ├── _env
    ├── Dockerfile
    ├── docker-compose.yml
    ├── local-compose.yml
    └── setup.sh
  60. github.com/autopilotpattern ~/mysql/docker-compose.yml

    mysql:
      image: autopilotpattern/mysql:latest
      mem_limit: 4g
      restart: always
      # expose for linking, but each container gets a private IP for
      # internal use as well
      expose:
        - 3306
      labels:
        - triton.cns.services=mysql
      env_file: _env
      environment:
        - CONTAINERPILOT=file:///etc/containerpilot.json
  61. github.com/autopilotpattern ~/mysql/docker-compose.yml (same as above) — the
    triton.cns.services=mysql label is the infrastructure-backed service discovery requirement
  62. github.com/autopilotpattern ~/mysql/docker-compose.yml (same as above) — env_file: _env
    pulls credentials from the environment
  63. github.com/autopilotpattern

    ~/mysql $ ./setup.sh /path/to/private/key.pem
    ~/mysql $ emacs _env

    MYSQL_USER=me
    MYSQL_PASSWORD=password1
    MYSQL_REPL_USER=repl
    MYSQL_REPL_PASSWORD=password2
    MYSQL_DATABASE=mydb
    MANTA_BUCKET=/<username>/stor/triton-mysql
    MANTA_USER=<username>
    MANTA_SUBUSER=
    MANTA_ROLE=
    MANTA_URL=https://us-east.manta.joyent.com
    MANTA_KEY_ID=1a:b8:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
    MANTA_PRIVATE_KEY=-----BEGIN RSA PRIVATE KEY-----#…
    CONSUL=consul.svc.0f06a3e0-a0da-eb00-a7ae-989d4e44e2ad.us-east-1.cns.joyent.com
  64. github.com/autopilotpattern

    ~/mysql $ docker-compose -p my up -d
    Creating my_consul_1
    Creating my_mysql_1
    ~/mysql $ docker-compose -p my ps
    Name          Command                         State   Ports
    –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
    my_consul_1   /bin/start -server -bootst...   Up      53/tcp, 53/udp, 8300/tcp...
    my_mysql_1    containerpilot mysqld…          Up      0.0.0.0:3306

  65. github.com/autopilotpattern

    ~/mysql $ docker-compose -p my scale mysql=2
    Creating my_mysql_2
    ~/mysql $ docker-compose -p my ps
    Name          Command                         State   Ports
    –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
    my_consul_1   /bin/start -server -bootst...   Up      53/tcp, 53/udp, 8300/tcp...
    my_mysql_1    containerpilot mysqld…          Up      0.0.0.0:3306
    my_mysql_2    containerpilot mysqld…          Up      0.0.0.0:3306
  66. github.com/autopilotpattern ~/mysql/Dockerfile

    FROM percona:5.6

    RUN apt-get update && apt-get install -y \
        python python-dev gcc curl percona-xtrabackup

    # get Python drivers for MySQL, Consul, and Manta
    RUN curl -Ls -o get-pip.py https://bootstrap.pypa.io/get-pip.py && \
        python get-pip.py && pip install \
        PyMySQL==0.6.7 python-consul==0.4.7 manta==2.5.0 mock==2.0.0

    # get ContainerPilot release (see repo for checksum verification!)
    RUN curl -Lo /tmp/cp.tar.gz https://github.com/joyent/containerpilot/… && \
        tar -xz -f /tmp/cp.tar.gz && mv /containerpilot /usr/local/bin/

    # configure ContainerPilot and MySQL
    COPY etc/* /etc/
    COPY bin/* /usr/local/bin/

    # override the parent entrypoint
    ENTRYPOINT []

    # use --console to get error logs to stderr
    CMD [ "containerpilot", "mysqld", \
          "--console", \
          "--log-bin=mysql-bin", \
          "--log_slave_updates=ON", \
          "--gtid-mode=ON", \
          "--enforce-gtid-consistency=ON" \
        ]
  70. github.com/autopilotpattern ~/mysql/etc/containerpilot.json

    {
      "consul": "{{ .CONSUL }}:8500",
      "preStart": "python /usr/local/bin/manage.py",
      "services": [
        {
          "name": "mysql",
          "port": 3306,
          "health": "python /usr/local/bin/manage.py health",
          "poll": 5,
          "ttl": 25
        }
      ],
      "backends": [
        {
          "name": "mysql-primary",
          "poll": 10,
          "onChange": "python /usr/local/bin/manage.py on_change"
        }
      ]
    }
  71. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same as above) —
    "{{ .CONSUL }}": environment variable interpolation
  72. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same as above) — the "services"
    block is the service definition
  73. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same as above) — the "backends"
    block is the backend definition
  74. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same as above) — mysql-primary?
    Huh? This isn't in our docker-compose.yml
  75. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same as above) — the logic
    lives in manage.py
  76. github.com/autopilotpattern [Diagram: MySQL container — ContainerPilot, Consul, Manta object
    store.] Lifecycle: preStart. Note: no main application running yet! If the exit code of
    preStart != 0, ContainerPilot exits.
  77. github.com/autopilotpattern [Diagram: MySQL container.] Lifecycle: preStart — "Has a
    snapshot been written to Manta?" "Nope!"
  78. github.com/autopilotpattern [Diagram: MySQL container.] Lifecycle: preStart — "Has a
    snapshot been written to Manta?" "Nope!" → initialize the DB.
  79. github.com/autopilotpattern ~/mysql/bin/manage.py

    def pre_start():
        """
        MySQL must be running in order to execute most of our setup behavior
        so we're just going to make sure the directory structures are in
        place and then let the first health check handler take it from there
        """
        if not os.path.isdir(os.path.join(config.datadir, 'mysql')):
            last_backup = has_snapshot()
            if last_backup:
                get_snapshot(last_backup)
                restore_from_snapshot(last_backup)
            else:
                if not initialize_db():
                    log.info('Skipping database setup.')
        sys.exit(0)
  80. github.com/autopilotpattern ~/mysql/bin/manage.py (same as above) — has_snapshot() checks
    w/ Consul for a snapshot
  81. github.com/autopilotpattern ~/mysql/bin/manage.py (same as above) — initialize_db() calls
    /usr/bin/mysql_install_db
  83. github.com/autopilotpattern [Diagram: MySQL container — ContainerPilot, mysqld, Consul,
    Manta object store.] Lifecycle: run — attach to stdout/stderr; return exit code of application
    to the Docker runtime.
  84. github.com/autopilotpattern [Diagram: MySQL container.] Lifecycle: health — a user-defined
    health check inside the container; runs every poll seconds.
  85. github.com/autopilotpattern ~/mysql/bin/manage.py

    def health():
        """
        Run a simple health check. Also acts as a check for whether the
        ContainerPilot configuration needs to be reloaded (if it's been
        changed externally), or if we need to make a backup because the
        backup TTL has expired.
        """
        node = MySQLNode()
        cp = ContainerPilot(node)
        if cp.update():
            cp.reload()
            return

        # Because we need MySQL up to finish initialization, we need to check
        # for each pass thru the health check that we've done so. The happy
        # path is to check a lock file against the node state (which has been
        # set above) and immediately return when we discover the lock exists.
        # Otherwise, we bootstrap the instance.
        was_ready = assert_initialized_for_state(node)

        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # Update our lock on being the primary/standby.
        if node.is_primary() or node.is_standby():
            update_session_ttl()

        # Create a snapshot and send it to the object store
        if all((node.is_snapshot_node(),
                (not is_backup_running()),
                (is_binlog_stale(node.conn) or is_time_for_snapshot()))):
            write_snapshot(node.conn)

        mysql_query(node.conn, 'SELECT 1', ())
  86. github.com/autopilotpattern ~/mysql/bin/manage.py — set up the DB, default user, replication
    user, expire the root password, etc.

    def run_as_primary(node):
        """
        The overall workflow here is ported and reworked from the
        Oracle-provided Docker image:
        https://github.com/mysql/mysql-docker/blob/mysql-server/5.7/docker-entrypoint.sh
        """
        node.state = PRIMARY
        mark_as_primary(node)
        node.conn = wait_for_connection()
        if node.conn:
            # if we can make a connection w/o a password then this is the
            # first pass
            set_timezone_info()
            setup_root_user(node.conn)
            create_db(node.conn)
            create_default_user(node.conn)
            create_repl_user(node.conn)
            run_external_scripts('/etc/initdb.d')
            expire_root_password(node.conn)
        else:
            ctx = dict(user=config.repl_user,
                       password=config.repl_password,
                       database=config.mysql_db)
            node.conn = wait_for_connection(**ctx)
            stop_replication(node.conn)  # in case this is a newly-promoted primary
            if USE_STANDBY:
                # if we're using a standby instance then we need to first
                # snapshot the primary so that we can bootstrap the standby.
                write_snapshot(node.conn)
  87. github.com/autopilotpattern ~/mysql/bin/manage.py

    def run_as_replica(node):
        try:
            ctx = dict(user=config.repl_user,
                       password=config.repl_password,
                       database=config.mysql_db)
            node.conn = wait_for_connection(**ctx)
            set_primary_for_replica(node.conn)
        except Exception as ex:
            log.exception(ex)

    def set_primary_for_replica(conn):
        """
        Set up GTID-based replication to the primary; once this is set the
        replica will automatically try to catch up with the primary's last
        transactions.
        """
        primary = get_primary_host()
        sql = ('CHANGE MASTER TO '
               'MASTER_HOST = %s, '
               'MASTER_USER = %s, '
               'MASTER_PASSWORD = %s, '
               'MASTER_PORT = 3306, '
               'MASTER_CONNECT_RETRY = 60, '
               'MASTER_AUTO_POSITION = 1, '
               'MASTER_SSL = 0; '
               'START SLAVE;')
        mysql_exec(conn, sql, (primary, config.repl_user, config.repl_password,))
  88. github.com/autopilotpattern ~/mysql/bin/manage.py (same as above) — get_primary_host() gets
    the primary's address from Consul
  89. github.com/autopilotpattern ~/mysql/bin/manage.py (same as above) — remember, our preStart
    downloaded the snapshot, so replication can catch up from there
  90. github.com/autopilotpattern [Diagram: MySQL container.] Lifecycle: health — exit code is 0?
    Tell Consul: "I am mysql-12345. I am available at 192.168.100.2:3306. I am healthy for the
    next 10 seconds."
  91. github.com/autopilotpattern [Diagram: each instance asks Consul for the primary.] "I'm the
    primary!" or "Someone else is the primary — I'm a replica!" A replica syncs up using the
    snapshot and GTID.
  92. github.com/autopilotpattern [Diagram, continued.] No primary? "I'm the primary!"
  93. github.com/autopilotpattern [Diagram, continued.] No primary? "I'm the primary?" — we need
    to assert that there is only 1 primary.
  94. github.com/autopilotpattern [Diagram, continued.] No primary? Set a lock in Consul w/ a TTL.
    Got it: "I'm the primary!" Failed? Go back to the start: someone else is the primary, so I'm a
    replica.
  95. github.com/autopilotpattern [Diagram, continued.] The winner updates the lock TTL w/ each
    health check, rewrites its ContainerPilot config, and SIGHUPs.
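
"Set lock in Consul w/ TTL" maps onto a Consul session plus an acquire-write on a well-known key: of all the contenders, exactly one write succeeds. A hedged sketch with python-consul — the key and session names here are illustrative stand-ins for what manage.py does via get_session() and mark_with_session():

    import socket
    import consul  # pip install python-consul

    c = consul.Consul(host='consul')

    def try_to_become_primary():
        # the session (and with it the lock) expires unless it is renewed
        # within the TTL
        session_id = c.session.create(name='mysql-primary-lock', ttl=25)
        # exactly one contender's acquire-write returns True
        if c.kv.put('mysql-primary', socket.gethostname(), acquire=session_id):
            return session_id   # won: rewrite ContainerPilot config and SIGHUP
        return None             # lost: someone else is primary; be a replica

    def renew_lock(session_id):
        c.session.renew(session_id)  # called from each passing health check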
  96. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same config as slide 70)
  97. github.com/autopilotpattern ~/mysql/etc/containerpilot.json — rewrite & reload config: the
    service "name" flips from "mysql" to "mysql-primary"
  98. github.com/autopilotpattern ~/mysql/bin/manage.py

    def health():
        """
        Run a simple health check. Also acts as a check for whether the
        ContainerPilot configuration needs to be reloaded (if it's been
        changed externally), or if we need to make a backup because the
        backup TTL has expired.
        """
        node = MySQLNode()
        cp = ContainerPilot(node)
        if cp.update():
            cp.reload()
            return

        was_ready = assert_initialized_for_state(node)

        # cp.reload() will exit early so no need to setup
        # connection until this point
        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # Update our lock on being the primary/standby.
        # If this lock is allowed to expire and the health check for the
        # primary fails, the `onChange` handlers for the replicas will try
        # to self-elect as primary by obtaining the lock.
        # If this node can update the lock but the DB fails its health check,
        # then the operator will need to manually intervene if they want to
        # force a failover. This architecture is a result of Consul not
        # permitting us to acquire a new lock on a health-checked session if
        # the health check is *currently* failing, but has the happy
        # side-effect of reducing the risk of flapping on a transient health
        # check failure.
        if node.is_primary() or node.is_standby():
            update_session_ttl()

        # Create a snapshot and send it to the object store.
        if all((node.is_snapshot_node(),
                (not is_backup_running()),
                (is_binlog_stale(node.conn) or is_time_for_snapshot()))):
            write_snapshot(node.conn)

        mysql_query(node.conn, 'SELECT 1', ())
  99. github.com/autopilotpattern ~/mysql/etc/containerpilot.json (same config as slide 70 — note
    the mysql-primary backend and its on_change handler)
  100. github.com/autopilotpattern [Diagram: MySQL container — ContainerPilot, mysqld, Consul.]
    Lifecycle: onChange — "Where is mysql-primary?" "192.168.1.100". Check Consul for services
    listed in backends; runs every poll seconds.
  101. github.com/autopilotpattern [Diagram: replicas polling the primary's health.] Ask Consul
    for the primary: healthy → no change. Ask again: healthy → no change. Ask again: failed! →
    fire the onChange handler.
  102. github.com/autopilotpattern ~/mysql/bin/manage.py

    def on_change():
        node = MySQLNode()
        cp = ContainerPilot(node)  # (implied by the cp.* calls below)
        ctx = dict(user=config.repl_user,
                   password=config.repl_password,
                   timeout=cp.config['services'][0]['ttl'])
        node.conn = wait_for_connection(**ctx)

        # need to stop replication whether we're the new primary or not
        stop_replication(node.conn)

        while True:
            try:
                # if there is no primary node, we'll try to obtain the lock.
                # if we get the lock we'll reload as the new primary, otherwise
                # someone else got the lock but we don't know who yet so loop
                primary = get_primary_node()
                if not primary:
                    session_id = get_session(no_cache=True)
                    if mark_with_session(PRIMARY_KEY, node.hostname, session_id):
                        node.state = PRIMARY
                        if cp.update():
                            cp.reload()
                        return
                    else:
                        # we lost the race to lock the session for ourselves
                        time.sleep(1)
                        continue

                # we know who the primary is but not whether they're healthy.
                # if it's not healthy, we'll throw an exception and start over.
                ip = get_primary_host(primary=primary)
                if ip == node.ip:
                    if cp.update():
                        cp.reload()
                    return
                set_primary_for_replica(node.conn)
                return
            except Exception as ex:
                # This exception gets thrown if the session lock for the
                # `mysql-primary` key has not expired yet (but there's no
                # healthy primary either), or if the replica's target primary
                # isn't ready yet.
                log.debug(ex)
                time.sleep(1)  # avoid hammering Consul
                continue
  103. github.com/autopilotpattern [Diagram: the same polling sequence — healthy, healthy,
    failed! → fire the onChange handler.]
  104. github.com/autopilotpattern [Diagram, continued.] Ask Consul for the primary: none found.
    "Ok, I'm primary." Set the lock in Consul. Success! — the former replica is now the healthy
    primary.