Nomad the automated hard way (along with Consul, Vault and Terraform)

A talk about our experience setting up a container service with the HashiCorp stack and a few grains of SaltStack. We show how we secured the systems.
Everything is automated to deliver TLS certificates to all services and backends at each step of bootstrapping the service, with ACLs, roles and PKIs, plus Vault's auto-unseal service. A new environment can be bootstrapped from zero with only a few human interactions, which the ACL security requires.

Yoan Blanc - @greut
Marc-Aurèle Brothier - @marcaurele
Jean-Philippe Menil - @jpmenil

Marc-Aurèle Brothier

June 04, 2020

Transcript

  1. NOMAD THE AUTOMATED HARD WAY

    - Yoan Blanc - @greut
    - Marc-Aurèle Brothier - @marcaurele
    - Jean-Philippe Menil - @jpmenil
    - June 2020
  2. WE WILL TALK ABOUT

    - context & goal around our container journey
    - how we chose Nomad
    - how we built up our PoC
    - how the different environments are running
    - feedback on the journey
  8. CONTEXT

    - 3-4 environments
    - 3 PKIs (certificates done by hand)
    - Developers on Windows 10 (since last fall)
    - Releases to production done ~7 times a year (two major ones)
    - Mainly WebSphere and some Tomcat servers
    - No ops teams
    - No APIs for ordering VMs
  9. INITIAL GOAL

    1. Move away from J2EE server (WAS) ➜ Tomcat
    2. Automated deployments
    3. Containers maybe¹?

    ¹: no need for a long-term solution
  10. READ BETWEEN THE LINES

    People are unhappy with the process from creating a release to having it in production.
  11. INCEPTION OF DEVOPS

    - GitLab with its CI/CD
    - Remove WebSphere App Server ➜ Tomcat
  12. REQUIREMENTS

    - Need a reasonably complex solution
    - Need something deployable soon
    - Need to ship small apps to production (not clusters of µ-services)
    - Teams should be able to deploy
    - High availability
    - Central secrets management
    - Automated TLS certificates
    - gitops / API / Automation
  20. OUR PLAN

    1. Use Git for everything
    2. Build everything with containers
    3. Migrate applications to Tomcat
    4. Offer a solution with the lowest entry point
    5. …
    6. Profit!
  21. WHY NOMAD?

    circa July 2019

    1. Docker Swarm
    2. Mesos + Marathon
    3. Nomad
    4. Kubernetes
    5. OpenShift
  23. NOMAD / CONSUL

    - It’s very easy to set up a proof-of-concept¹
    - You can play with the components separately
    - Not a Docker-only solution

    ¹: open source
  24. TRADEOFFS W.R.T. KUBERNETES

    - No CNI - you’ll have to deal with IP and port
    - Small community - not everything is an operator away
    - Clusters are simpler - easier to manage
    - SELinux not superbly supported¹
    - Less is More or the Paradox of Choice

    ¹: Pull Request is pending #7624
  31. BACK TO OUR REQUIREMENTS

    - Need a reasonably complex solution ✔
    - Need something deployable soon ✔
    - Need to ship small apps to production
    - Teams should be able to deploy
    - High availability ✔
    - Central secrets management ✔ (Vault integration)
    - Automated TLS certificates ✔ (Vault)
    - gitops / API / Automation ✔
  32. NOMAD / CONSUL / TRÆFIK

    - No HTTPS
    - No Secrets
    - Load-Balancing
    - Rolling Update
    - Real applications talking to the actual backends
    - Demoing on a single computer
  33. VAULT ARCHITECTURE

    [Diagram. Labels: PER ENVIRONMENT, SECOPS, X nodes, Transit secrets engine, CA SecOps, Env ABC, Master key, CA Env ABC, Load Balancer L7, APP, Root CA ABC.]
  34. MEET THE PETS

    - A set of VMs we have to take care of
    - One Salt Master to rule them all
    - Everything is in a Git repository
  35. CaaS environment XXX for domain XXX.SUB.DOMAIN.COM

    [Network diagram. Labels: Salt master, Container registry, SA, CaaS Masters zone, VM - Master nodes, VM - Container nodes, APP, LB, Load Balancer L7, Main DNS Zone, Vault SecOps, Operations, Users.]

    Port legend:

    - TCP/8300: server RPC
    - TCP/UDP/8301: LAN Serf (gossip)
    - TCP/8501: HTTP API over TLS
    - TCP/8200: HTTP API
    - TCP/8201: Vault replication traffic, request forwarding
    - TCP/4646: HTTP API
    - TCP/4647: RPC for internal communication
    - TCP/UDP/4648: Serf LAN server gossip

    Other ports shown on the links: TCP/443, TCP/UDP/53, TCP/UDP/8600, TCP/4505, TCP/4506
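
The port legend above can be kept as data and used for quick reachability checks when opening firewall rules. This is a minimal sketch: grouping the ports by service follows each tool's default ports (an assumption, the slide does not label them), the descriptions for 8600, 4505 and 4506 are added from those defaults, and the helper names are hypothetical.

```python
import socket

# Ports from the slide, grouped by the service that uses them by default.
# The grouping and the 8600/4505/4506 descriptions are assumptions.
PORTS = {
    "consul": {
        8300: "server RPC",
        8301: "LAN Serf (gossip)",
        8501: "HTTP API over TLS",
        8600: "DNS",
    },
    "vault": {
        8200: "HTTP API",
        8201: "replication traffic, request forwarding",
    },
    "nomad": {
        4646: "HTTP API",
        4647: "RPC for internal communication",
        4648: "Serf LAN server gossip",
    },
    "salt": {4505: "publish", 4506: "return"},
}


def service_for(port):
    """Return (service, description) for a known port, else None."""
    for service, ports in PORTS.items():
        if port in ports:
            return service, ports[port]
    return None


def tcp_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_reachable("master01.example", 4647)` (a hypothetical host name) would tell whether a container node can reach the Nomad RPC port on a master node. UDP gossip traffic is not covered by this check.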
  36. SECURE THE CAAS

    - + ACL (Consul & Nomad)
    - Cert signing dance
    - GPG to store a small amount of data in SaltStack
  37. SALT

    - no grains, roles in pillars
    - saltenv
    - use of reactors
    - orchestration

    salt top.sls:

        {{ saltenv }}:
          '*':
            - certs
            - salt.minion
          {{ grains.id }}:
        {% for state in pillar.get('roles', {}) %}
            - {{ state | replace("-", ".") }}
        {% endfor %}
  38. SALT

    react on event:

        reactor:
          - 'salt/minion/*/start':
            - salt://reactor/startup_orch.sls
          - 'salt/fileserver/gitfs/update':
            - salt://reactor/update_fileserver.sls

        startup_orchestration:
          runner.state.orchestrate:
            - args:
              - mods: orch.startup
              - pillar:
                  minion: {{ data['id'] }}
  39. SALT

    orchestration:

        saltutil.sync_all:
          salt.function:
            - tgt: '{{ pillar['minion'] }}'
            - reload_modules: True

        saltutil.refresh_grains:
          salt.function:
            - tgt: '{{ pillar['minion'] }}'

        mine.update:
          salt.function:
            - tgt: '{{ pillar['minion'] }}'

        highstate_run:
          salt.state:
            - tgt: '{{ pillar['minion'] }}'
            - highstate: True
  40. DNS

    - CoreDNS on master nodes
    - SaltStack pushing configuration
    - Consul-template¹ pushing configuration
    - forward for <dc>.consul

    ¹: no official plugins to read Consul Catalog
  41. TEMPLATE IN TEMPLATE

        {%- if consul_template %}
        <- range services >
        <- if .Tags | contains "traefik.enable=true">
        {%- for ip in vip %}
        < .Name > IN A {{ ip }}
        {%- endfor %}
        <- end >
        <- end >
        <- range service "vault" >
        <- if .Tags | contains "active">
        vault IN A < .Address >
        <- end >
        <- end >
        {%- endif %}
  42. LOG AGGREGATION

    - A lot of investigations (fluentd, gelf, …)
    - Journalbeat on the host
    - Elasticsearch¹ running in Nomad
    - Kibana with the LDAP integration
    - Roles mapping pushed with Terraform

    ¹: OpenDistro
  43. MONITORING

    - Prometheus consumes the Consul Catalog
    - Grafana to display dashboards (using grafonnet!)
    - Ad hoc config using consul-template
  44. EXAMPLE

        - job_name: 'bcv'
          consul_sd_configs:
            - server: {{ (env "attr.unique.network.ip-address") }}:8501
              token: {{ .Data.token | toJSON }}
              ...
          relabel_configs:
            - source_labels: ['__meta_consul_service_metadata_metrics_path']
              target_label: '__metrics_path__'
              regex: '(.*)'
              replacement: $1
            - source_labels: ['__meta_consul_service_metadata_metrics_scheme']
              target_label: '__scheme__'
              regex: '(https?)'
              replacement: $1
            - source_labels: ['__meta_consul_node']
              target_label: 'instance'
              replacement: $1
            - source_labels: ['__meta_consul_service']
              target_label: 'job'
              replacement: $1
  45. THE GLUE

    [Diagram linking job "xyz" { ... } and its service ... (metrics.path, traefik...) to Prometheus, XYZ.CORP.LOCAL, https://xyz.corp.local/ and Træfik.]
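
The relabel rules of the Prometheus job shown earlier are what turn the Consul service metadata into scrape settings. The sketch below imitates the default `replace` action as I understand it (fully anchored regex, `$1` expansion, empty result removes the target label); it is a simplification, not Prometheus code, and the label values are hypothetical.

```python
import re


def relabel(labels, source_labels, target_label,
            regex="(.*)", replacement="$1", separator=";"):
    """Apply one 'replace' relabel rule and return the new label set."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex
    if match is None:
        return dict(labels)  # no match: labels are left untouched
    # Translate $1-style references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = match.expand(template)
    out = dict(labels)
    if result == "":
        out.pop(target_label, None)
    else:
        out[target_label] = result
    return out


# Hypothetical metadata as discovered from the Consul catalog.
labels = {
    "__meta_consul_service": "xyz",
    "__meta_consul_node": "node-1",
    "__meta_consul_service_metadata_metrics_path": "/metrics",
    "__meta_consul_service_metadata_metrics_scheme": "https",
}
rules = [
    (["__meta_consul_service_metadata_metrics_path"], "__metrics_path__", "(.*)"),
    (["__meta_consul_service_metadata_metrics_scheme"], "__scheme__", "(https?)"),
    (["__meta_consul_node"], "instance", "(.*)"),
    (["__meta_consul_service"], "job", "(.*)"),
]
for source, target, regex in rules:
    labels = relabel(labels, source, target, regex)
```

After the loop, `labels` carries `__metrics_path__`, `__scheme__`, `instance` and `job` derived from the service registration, so a job only has to declare its metadata once.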
  46. CONTINUOUS DEPLOYMENT ON NOMAD

    What does it look like from the developer’s point of view?
  47.

    1. Create a release from GitLab CI;
    2. Ask for the deployment of a new release via a Merge Request;
    3. Test it live!
  48. PIPELINE TRIGGER

        curl -X POST \
          -F "token=$TOKEN" \
          -F "ref=master" \
          -F "variables[artifactId]=myapp" \
          -F "variables[version]=$CI_COMMIT_TAG" \
          $CI_API_V4_URL/projects/1234/trigger/pipeline
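
The same trigger call can be sketched with the Python standard library, which is convenient when the release job needs more logic than a one-liner. The function name and the example URL are hypothetical. Note one difference from the curl command: this sends a URL-encoded form body rather than multipart, which I would expect the GitLab API to accept as well, but that should be verified against your instance.

```python
import urllib.parse
import urllib.request


def build_trigger_request(api_url, project_id, token, ref, variables):
    """Build (but do not send) the POST request for a pipeline trigger."""
    fields = {"token": token, "ref": ref}
    for key, value in variables.items():
        # Pipeline variables use the variables[NAME]=value form field syntax.
        fields[f"variables[{key}]"] = value
    body = urllib.parse.urlencode(fields).encode("utf-8")
    url = f"{api_url}/projects/{project_id}/trigger/pipeline"
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical values; in CI they would come from $CI_API_V4_URL,
# $TOKEN and $CI_COMMIT_TAG.
request = build_trigger_request(
    "https://gitlab.example.com/api/v4", 1234, "secret-token", "master",
    {"artifactId": "myapp", "version": "1.2.3"},
)
```

Sending it is then a matter of `urllib.request.urlopen(request)` from a job that has network access to GitLab.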
  49.

        version: 1.2.3
        logging: INFO

        task {
          driver = "docker"
          config {
            image = "hello/world:[[ .version ]]"
          }
          env {
            LOG = [[ .logging | toJSON ]]
          }
        }
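
The job template above is filled in from the variables file with `[[ ... ]]` placeholders (the delimiters resemble Levant-style templating, although the slide does not name the tool). The toy renderer below only illustrates the substitution for the two placeholder forms on the slide; it is not the actual tool, and its names are hypothetical.

```python
import json
import re

# Matches "[[ .name ]]" and "[[ .name | func ]]".
PLACEHOLDER = re.compile(r"\[\[\s*\.(\w+)\s*(?:\|\s*(\w+)\s*)?\]\]")


def render(template, variables):
    """Replace placeholders with values from the variables mapping."""
    def substitute(match):
        name, func = match.group(1), match.group(2)
        value = variables[name]
        if func == "toJSON":
            return json.dumps(value)  # quotes and escapes strings
        if func is None:
            return str(value)
        raise ValueError(f"unsupported function: {func}")
    return PLACEHOLDER.sub(substitute, template)


variables = {"version": "1.2.3", "logging": "INFO"}
image_line = render('image = "hello/world:[[ .version ]]"', variables)
log_line = render("LOG = [[ .logging | toJSON ]]", variables)
```

The `toJSON` filter matters for the `env` block: it produces a quoted string, so the rendered line is valid HCL, whereas the image tag is interpolated inside quotes that are already present.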
  50. KERNEL

    cgroups in kernel 3.x¹…

    The kernel memory controller is well known to be buggy in 3.x releases, so we disabled it completely:

        cgroup.memory=nokmem

    Runtime error when using namespaces to isolate containers:

        namespace.unpriv_enable=1 user_namespace.enable=1
        sysctl -w user.max_user_namespaces=15000

    ¹: latest release 3.10 powering RHEL 7
  51. ZOMBIE PROCESSES

    We blindly copy-pasted the systemd unit for consul¹ to consul-template.

        KillMode=process

    From man systemd.kill:

    If set to control-group, all remaining processes in the control group of this unit will be killed on unit stop. If set to process, only the main process itself is killed.

    ¹: https://learn.hashicorp.com/consul/datacenter-deploy/deployment-guide#configure-systemd
  52. CLUSTER UPGRADE

    Done manually:

        salt -E '^h(?:ma|st|wo)ad0[0-9]s' schedule.disable_job highstate
        salt -E '^h(?:ma|st|wo)ad0[0-9]s' state.apply consul.install pillar='{"download_only":1}'
        salt hmaad03s state.apply consul.install
        salt hmaad03s cmd.run 'systemctl restart consul'
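
Before running a cluster-wide command it can help to check offline which minion ids the `-E` expression selects. The sketch below uses the non-capturing group spelling `(?:...)`; apart from `hmaad03s`, the host names are hypothetical.

```python
import re

# Same expression as passed to `salt -E`, matched from the start of the id.
TARGET = re.compile(r"^h(?:ma|st|wo)ad0[0-9]s")


def targeted(minion_ids):
    """Return the minion ids the expression would select."""
    return [minion for minion in minion_ids if TARGET.match(minion)]


selected = targeted(["hmaad03s", "hstad01s", "hwoad09s", "hmaad13s", "hxxad01s"])
```

Here `selected` keeps the first three ids only. The expression has no end anchor, so an id with extra trailing characters would also match.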
  53. WHERE ARE WE NOW?

    - DEV environment is up
    - PREPROD environment is up
    - Containers are coming in
    - Complete benchmarking will be done in PREPROD env soon
    - PROD coming ??? Dunno
  54. WHAT WE’VE LEARNED

    - Vault: more challenging (& more confident)
    - Nomad: does the job as expected
    - Change of paradigm: not to be underestimated!
    - ACL everywhere: it’s no playground anymore