SysadminDays - Nestor

Saâd Dif
November 19, 2019


Deploying and managing the lifecycle of microservices on Kubernetes with an in-house tool

Kapten's tech team was looking for a tool to deploy and manage its microservices. At the time, the quest for a sharp tool looked gloomy: no existing tool answered all the pressing needs of our knights. So the knights of the round table gathered, took up the challenge and gave birth to Nestor. One tool to rule them all!

Nestor is our valiant tool that lets every soul in the tech kingdom interact with microservices without having to be a container or orchestration wizard.

Our fearless tech team uses it daily. Its goal is simple yet complex: manage, configure and deploy our audacious microservices stack, which is growing at the speed of light.

Transcript

  1. Nestor: a tool to rule ’em all

  2. About me_ Saâd Dif • SRE • saad.dif@kapten.com • eng.kapten.com • @Kapten_tech • @kapten-engineering • ChauffeurPrive

  3. Agenda_ Kingdom of Kapten_ Once upon a time_ Uh oh_ Nestor_ They lived happily_
  4. Kingdom of Kapten_

  5. Kapten_History Passenger_ Driver_

  6. Kapten_History

  7. Kapten_Growth [Chart: revenue for 2016, 2017 and 2018, with values 48.6, 100 and 160] • 3 million customers • 400 collaborators • 50,000 partner drivers • 3,000 client companies • French ride-hailing leader • +60%
  8. Kapten_Production • 1TB storage • 60k metrics • 15k inc / min • 100TB storage • 780k logs / min • 50B lines of log • 100+ nodes • 130+ µ-services • 20+ deploys / day • 1k+ processes
  9. Agenda_ Kingdom of Kapten_ Once upon a time_ Uh oh_ Nestor_ They lived happily_

  10. Heroku_
      • Why?
        - PaaS over IaaS
        - Easy for devs to handle and scale
        - Ready to use
        - AutoBuild for various languages
        - Everything as a Service
      • Configuration management?
        - Git repositories letsdeploy/letsdeploy-config
        - Hand-provisioned servers
      • Deployments?
        - Letsdeploy

  11. Letsdeploy_ • Python scripts • Release management • Configuration management • Scaling

  12. Letsdeploy-config_ • One file per microservice • Three sections in each configuration file

      {
        "source": "git@github.com:transcovo/nestor-charybde.git",
        "target": {
          "heroku_app_name": "prod-nestor-charybde-cp-eu",
          "type": "heroku"
        },
        "variables": {
          "AWS_ACCESS_KEY_ID": "*****",
          "AWS_SECRET_ACCESS_KEY": "*****",
          "LOGGER_LEVEL": "info",
          "LOGGER_NAME": "production.nestor-charybde",
          "NODE_ENV": "production",
          "NPM_TOKEN": "*****",
          "SLACK_TOKEN": "*****"
        }
      }

  13. Nestor_ Workflow management • Master • Greenlight • Non-prod environments • Production

  14. Nestor_ Workflow management

  15. Agenda_ Kingdom of Kapten_ Once upon a time_ Uh oh_ Nestor_ They lived happily_

  16. Get out of Heroku_ • Ending contract

  17. Get out of Heroku_ • Ending contract • Move to Kubernetes

  18. Get out of Heroku_ • Ending contract • Move to Kubernetes ◦ Scalability ◦ Costs ◦ Real infra ◦ Security

  19. What we need_ • Simple and easy deployment tool ◦ Command wrapper ◦ Same tool for every environment ◦ End users: our developers

  20. What we need_ 12 Factors (12Factor.net) • Codebase • Dependencies • Config • Backing services • Build, release, run • Processes • Port Binding • Concurrency • Disposability • Dev/Prod parity • Logs • Admin processes

  21. What we need_ 12Factor.net • Our microservices should respect those principles • The tool should help us do so

  22. What we need_ • Usable by everybody

  23. The clock is ticking_ • Time constraint • Need to scale fast • Ansible / Terraform

  24. Agenda_ Kingdom of Kapten_ Once upon a time_ Uh oh_ Nestor_ They lived happily_

  25. Nestor_ • Rewrite the existing tool • Knowledge already present in all teams • First use case: dev environments

  26. Nestor_ • Able to deploy on different platforms

  27. Nestor_ Three key components: • CLI • API • CRON

  28. Nestor_ CLI: • Workflow management • Release-related commands • Configuration • Datastore management

  29. Nestor_ Wrapper around kubectl: • Port forward • Switch environments • ...

  30. Nestor_ API: • Node.js • Manages the workflow • Triggered by CI to initiate builds • Pushes images to Docker Hub

  31. Nestor_ API: • Dockerfile • Procfile • CronFile

  32. Nestor_ API: • Called by Rundeck • One build per code delivery

  33. Nestor_ CRON: • All environments updated from “staging” app versions • Releases on Staging every 30 minutes • Datastore snapshot and reset of all dev environments • App configuration synced from staging to dev environments

  34. Nestor_ Configuration built dynamically: • Kubernetes templates • “project.yaml” file for global configuration • One file for each microservice • Child environments • Merge of files

  35. Nestor_ Kubernetes templates:

      $ tree nestor-config/templates
      nestor-config/templates
      ├── anti-affinity-node.yaml
      ├── anti-affinity-zone.yaml
      ├── config-map.yaml
      ├── cronjob.yaml
      ├── deployment.yaml
      ├── hpa.yaml
      ├── ingress-app.yaml
      ├── ingress-global.yaml
      ├── ingress-nginx-global.yaml
      ├── job.yaml
      ├── namespace.yaml
      ├── nginx.conf
      ├── secret-tls.yaml
      └── service.yaml

      0 directories, 14 files
      $ cat nestor-config/templates/service.yaml
      apiVersion: v1
      kind: Service
      metadata:
        name: '{{name}}'
        labels:
          app: '{{app}}'
      spec:
        ports:
          - port: 80
            targetPort: {{target_port}}
        selector:
          app: '{{app}}'
          process: web
        type: ClusterIP

  36. Nestor_ project.yaml file:

      $ cat nestor-config/project.yaml
      project: kapten
      env: production
      domain: production.kapten.com
      docker:
        build:
          variables:
            NPM_TOKEN: $NPM_TOKEN
        registries:
          docker.com:
            - id: docker
              organization: kapten
        ...
      ...
      deployments:
        kubernetes:
          - cluster_name: kapten_production_eu-w1
            hpa_replicas: true
            scales:
              web:
                minReplicas: 3
                maxReplicas: 3
            resources:
              web:
                limits:
                  memory: 256Mi
                  cpu: 0.2
            nodeSelector:
              default:
                tier: app
            ...
      ...
      variables:
        ope:
          NODE_ENV: 'production'
          METRICS_DESTINATION: metrics.kapten.com
      slack:
        token: '**********'
        channels:
          info: tech-release-prod
          error: tech-release-prod
      ...

  37. Nestor_ App configuration file:

      app: mary
      git:
        origin: git@github.com:transcovo/mary.git
      is_enabled: true
      resources:
        web:
          limits:
            cpu: 0.2
            memory: 250Mi
          requests:
            cpu: 0.2
            memory: 250Mi
      scales:
        web:
          maxReplicas: 9
          minReplicas: 3
          targetCPUUtilizationPercentage: 75
      teams:
        - security
      variables:
        app: {}
        ope:
          SENTRY_DSN: "********"

  38. Nestor_ Monitoring and alerting in configuration files: • Routing • Threshold

      Config validation:
      templateVars:
        tplCriticity: high
        tplTeam: security
        tplWeb2xxThreshold: "0"
        tplWeb50thLatencyThreshold: "0.30"
        tplWeb95thLatencyThreshold: "2"

  39. Nestor_ History management • Nestor history • History saved in a dedicated repository • Used for rollbacks

  40. Nestor_ Rollbacks: • Apply the previous YAML • Use a specific commit ID

  41. Nestor_ Features most used by developers: • Deploy a specific branch • Port forwarding • Switch between environments • Datastore management

  42. Workflow with Nestor_ From code... [Diagram stages: e2e tests, load tests, unit tests, testing env., monitoring] ...to prod in minutes with nestor_!

  43. Workflow_ [Diagram: branches dev 1, dev 2 … dev X, Master, Greenlight, Staging, Terminator, Shadow and Production, with peer review before Master]
      • CircleCI success (master branch): create a git tag • build Docker image • rebase Greenlight branch from Master
      • Rebase Terminator, Staging and Shadow from Greenlight
      • GL success: rebase Production from Shadow
      • Legend: Rundeck deploy • Nestor-api rebase and move git tag • call nestor-api

  44. Agenda_ Kingdom of Kapten_ Once upon a time_ Uh oh_ Nestor_ They lived happily_

  45. To sum up_ • From PaaS to IaaS • From deployment workflow to release and config management • Migration took 4 months

  46. Thoughts_ • We needed Nestor • Making your own tool / wrapper is nothing to be ashamed of • It answered a specific need at a specific time

  47. Next steps_ • Container build • CLI rewrite • ...

  48. Next steps_ Helm? • Community-based • No need to rewrite what already exists • New v3 • Stable and reliable

  49. Next steps_ To study: • Live pod debugging (Telepresence / Monday) • Make it available to everyone?

  50. Thank you_! eng.kapten.com • @Kapten_tech • @kapten-engineering • ChauffeurPrive • €15 of credit with the promo code SYSADMIN
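
Slide 12 describes the Letsdeploy-config layout: one JSON file per microservice, each with three sections (`source`, `target`, `variables`). The sketch below assumes that those three keys are mandatory and that every variable must be a string. It shows how such a file could be loaded and checked before a deployment. `load_service_config` is a hypothetical helper and is not part of Letsdeploy.

```python
import json

# Hypothetical loader for a Letsdeploy-style per-service file (slide 12).
# Assumption: the three sections shown on the slide are all mandatory.
REQUIRED_SECTIONS = ("source", "target", "variables")


def load_service_config(raw: str) -> dict:
    """Parse one per-service JSON file and check its three sections."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if missing:
        raise ValueError(f"missing section(s): {', '.join(missing)}")
    # "target" says where the app runs, e.g. {"type": "heroku", ...}.
    if not isinstance(config["target"], dict) or "type" not in config["target"]:
        raise ValueError("'target' must be an object with a 'type' key")
    # "variables" become environment variables, so values must be strings.
    for name, value in config["variables"].items():
        if not isinstance(value, str):
            raise ValueError(f"variable {name!r} must be a string")
    return config


EXAMPLE = """{
  "source": "git@github.com:transcovo/nestor-charybde.git",
  "target": {"heroku_app_name": "prod-nestor-charybde-cp-eu", "type": "heroku"},
  "variables": {"LOGGER_LEVEL": "info", "NODE_ENV": "production"}
}"""

config = load_service_config(EXAMPLE)
print(config["target"]["type"], sorted(config["variables"]))
```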
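
Slide 29 presents the Nestor CLI as a wrapper around kubectl that handles port forwarding and environment switching. The minimal sketch below illustrates that idea and only builds the argument lists. `kubectl config use-context` and `kubectl port-forward` are real kubectl subcommands. The mapping from environment names to kubeconfig contexts, and the assumption that each Service is named after its app, are hypothetical.

```python
import shlex

# Hypothetical environment -> kubeconfig context mapping (not Nestor's real one).
CONTEXTS = {
    "production": "kapten-production",
    "staging": "kapten-staging",
    "dev-1": "kapten-dev",
}


def switch_env_cmd(env: str) -> list[str]:
    """kubectl command that makes the context of `env` the current one."""
    if env not in CONTEXTS:
        raise KeyError(f"unknown environment {env!r}")
    return ["kubectl", "config", "use-context", CONTEXTS[env]]


def port_forward_cmd(app: str, local_port: int, namespace: str,
                     remote_port: int = 80) -> list[str]:
    """Forward a local port to the app's Service.

    Assumes the Service is named after the app (hypothetical); port 80 is the
    Service port declared in the service.yaml template (slide 35).
    """
    return [
        "kubectl", "port-forward",
        "--namespace", namespace,
        f"svc/{app}",
        f"{local_port}:{remote_port}",
    ]


print(shlex.join(switch_env_cmd("dev-1")))
print(shlex.join(port_forward_cmd("mary", 8080, namespace="dev-1")))
```

Running `subprocess.run(port_forward_cmd(...), check=True)` would hand control to kubectl. Building the commands separately from executing them keeps the wrapper easy to test.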
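
Slide 31 lists the files the Nestor API works from: Dockerfile, Procfile and CronFile. A Procfile declares one `<process type>: <command>` pair per line. The sketch below parses a Procfile and derives one workload per process type. It assumes that only the `web` process needs a Service, which is inferred from the `process: web` selector in the service template on slide 35. `parse_procfile`, `workloads` and the sample commands are hypothetical.

```python
# Hypothetical sketch: parse a Procfile ("<process type>: <command>" per line)
# and derive one workload per process type.


def parse_procfile(text: str) -> dict[str, str]:
    """Map each process type to its command, skipping blanks and comments."""
    processes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so commands may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"line {line_no}: expected '<process>: <command>'")
        processes[name.strip()] = command.strip()
    return processes


def workloads(processes: dict[str, str]) -> list[dict]:
    """One workload per process; assume only "web" is exposed via a Service."""
    return [
        {"process": name, "command": command, "needs_service": name == "web"}
        for name, command in processes.items()
    ]


PROCFILE = """
web: node server.js
worker: node worker.js
"""

for workload in workloads(parse_procfile(PROCFILE)):
    print(workload)
```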
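
Slide 32 states that the API, called by Rundeck, performs one build per code delivery. One way to get that property is to make the build step idempotent on a delivery key. The sketch below assumes that a delivery is identified by its app and git commit SHA. The in-memory registry and the `kapten/<app>:<short sha>` tag format are illustrative assumptions; the `kapten` prefix is borrowed from the registry organization on slide 36.

```python
class BuildRegistry:
    """In-memory stand-in for "one build per code delivery" (slide 32).

    Builds are keyed by (app, commit SHA). Triggering the same delivery twice
    returns the existing image tag instead of building again.
    """

    def __init__(self) -> None:
        self._images: dict[tuple[str, str], str] = {}
        self.builds_run = 0

    def ensure_image(self, app: str, commit_sha: str) -> str:
        key = (app, commit_sha)
        if key not in self._images:
            # Placeholder for the real docker build and push to the registry.
            self.builds_run += 1
            self._images[key] = f"kapten/{app}:{commit_sha[:12]}"
        return self._images[key]


registry = BuildRegistry()
first = registry.ensure_image("mary", "3f2a9c1d8e7b6a5f4e3d")
again = registry.ensure_image("mary", "3f2a9c1d8e7b6a5f4e3d")
print(first, again == first, registry.builds_run)
```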
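
Slide 33 describes CRON jobs that keep dev environments aligned with the app versions running on staging. The talk does not show how versions are stored or deployed, so the sketch below only computes a plan: the `(environment, app, version)` updates needed. Version strings and environment names are hypothetical. As a reference point, a Kubernetes CronJob that runs every 30 minutes would use the schedule `*/30 * * * *`.

```python
# Sketch of the slide 33 sync: which (env, app, version) updates would bring
# every dev environment in line with staging. Storage and deployment of
# versions are assumptions left out of this plan-only sketch.


def plan_sync(staging: dict[str, str],
              dev_envs: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return (env, app, version) updates so each dev env matches staging."""
    updates = []
    for env in sorted(dev_envs):
        deployed = dev_envs[env]
        for app in sorted(staging):
            # Missing apps and outdated versions both produce an update.
            if deployed.get(app) != staging[app]:
                updates.append((env, app, staging[app]))
    return updates


staging = {"mary": "v42", "nestor-charybde": "v7"}
dev_envs = {
    "dev-1": {"mary": "v41", "nestor-charybde": "v7"},
    "dev-2": {},
}
for update in plan_sync(staging, dev_envs):
    print(update)
```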
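
Slide 34 explains that the final configuration is built dynamically by merging several files: the global project.yaml, one file per microservice, and child environments. The minimal deep-merge sketch below makes three assumptions. Later layers win (project, then app, then child environment). Nested mappings merge key by key. Lists are replaced wholesale. The inputs are simplified excerpts of slides 36 and 37, and the dev override is hypothetical.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins over `base`.

    Nested dicts are merged key by key; any other value (including lists)
    from `override` replaces the one in `base`. Inputs are not modified.
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def build_config(*layers: dict) -> dict:
    """Merge layers in order, so later layers take precedence."""
    merged: dict = {}
    for layer in layers:
        merged = deep_merge(merged, layer)
    return merged


project = {
    "resources": {"web": {"limits": {"memory": "256Mi", "cpu": 0.2}}},
    "scales": {"web": {"minReplicas": 3, "maxReplicas": 3}},
}
app = {
    "app": "mary",
    "resources": {"web": {"limits": {"memory": "250Mi"}}},
    "scales": {"web": {"maxReplicas": 9, "targetCPUUtilizationPercentage": 75}},
}
dev_env = {"scales": {"web": {"minReplicas": 1, "maxReplicas": 1}}}

config = build_config(project, app, dev_env)
print(config["resources"]["web"]["limits"])  # {'memory': '250Mi', 'cpu': 0.2}
print(config["scales"]["web"])
```

Replacing lists rather than concatenating them keeps overrides predictable. For example, a child environment that sets `teams` gets exactly the list it declares.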
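
The templates on slide 35 use `{{name}}`-style placeholders. The talk does not say which engine renders them. The sketch below is a deliberately simplified stand-in: plain regex substitution with no escaping and no sections, which fails loudly when a value is missing. The values `mary-web` and `8080` are hypothetical.

```python
import re

# Matches {{name}}, {{ app }}, {{target_port}}, ...
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict[str, object]) -> str:
    """Replace every {{placeholder}}; raise KeyError if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


SERVICE_TEMPLATE = """apiVersion: v1
kind: Service
metadata:
  name: '{{name}}'
  labels:
    app: '{{app}}'
spec:
  ports:
    - port: 80
      targetPort: {{target_port}}
  selector:
    app: '{{app}}'
    process: web
  type: ClusterIP
"""

print(render(SERVICE_TEMPLATE, {"name": "mary-web", "app": "mary", "target_port": 8080}))
```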
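
The `scales.web` block on slide 37 (minReplicas 3, maxReplicas 9, targetCPUUtilizationPercentage 75) corresponds to a Kubernetes HorizontalPodAutoscaler. The HPA measures CPU utilization against the pods' CPU requests, which are 0.2 CPU for mary. The sketch below reproduces the core documented HPA rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), with the default 10% tolerance. It leaves out stabilization windows and pod-readiness handling, so it approximates the controller rather than reimplementing it.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule for a CPU utilization target.

    desired = ceil(current_replicas * current / target), skipped when the
    ratio is within `tolerance` of 1.0, then clamped to [min, max].
    """
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Values from the "mary" app configuration (slide 37).
MARY = {"min_replicas": 3, "max_replicas": 9, "target_cpu_pct": 75}

print(desired_replicas(3, 150, **MARY))  # 6
print(desired_replicas(6, 80, **MARY))   # 6 (within tolerance)
print(desired_replicas(4, 300, **MARY))  # 9 (clamped to maxReplicas)
print(desired_replicas(5, 20, **MARY))   # 3 (clamped to minReplicas)
```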
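
Slide 38 keeps monitoring routing and alert thresholds next to the app configuration, and includes a config validation step. The sketch below shows what such a check could look like. The set of allowed `tplCriticity` levels is hypothetical. The sketch also assumes that thresholds are non-negative numbers stored as strings, as they appear on the slide.

```python
# Hypothetical rules: the allowed criticity levels are an assumption.
ALLOWED_CRITICITY = {"low", "medium", "high"}
THRESHOLD_KEYS = (
    "tplWeb2xxThreshold",
    "tplWeb50thLatencyThreshold",
    "tplWeb95thLatencyThreshold",
)


def validate_template_vars(tv: dict) -> list[str]:
    """Return a list of human-readable errors (empty when valid)."""
    errors = []
    if tv.get("tplCriticity") not in ALLOWED_CRITICITY:
        errors.append(f"tplCriticity must be one of {sorted(ALLOWED_CRITICITY)}")
    if not tv.get("tplTeam"):
        errors.append("tplTeam is required to route alerts")
    values = {}
    for key in THRESHOLD_KEYS:
        raw = tv.get(key)
        try:
            values[key] = float(raw)
        except (TypeError, ValueError):
            errors.append(f"{key} must be a numeric string, got {raw!r}")
            continue
        if values[key] < 0:
            errors.append(f"{key} must not be negative")
    # A median latency threshold above the 95th percentile one is most likely a mistake.
    p50 = values.get("tplWeb50thLatencyThreshold")
    p95 = values.get("tplWeb95thLatencyThreshold")
    if p50 is not None and p95 is not None and p50 > p95:
        errors.append("50th percentile latency threshold exceeds the 95th")
    return errors


slide_vars = {
    "tplCriticity": "high",
    "tplTeam": "security",
    "tplWeb2xxThreshold": "0",
    "tplWeb50thLatencyThreshold": "0.30",
    "tplWeb95thLatencyThreshold": "2",
}
print(validate_template_vars(slide_vars))  # []
```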
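
Slides 39 and 40 describe rollbacks as re-applying a previous YAML taken from a history repository at a given commit. The sketch below assumes a hypothetical layout in which each rendered manifest is committed under a path such as `production/mary.yaml`. Under that assumption, a rollback boils down to `git show <commit>:<path>` piped into `kubectl apply -f -`. The repository name and commit ID in the example are placeholders.

```python
import subprocess

# Hypothetical layout: one rendered manifest per app and environment,
# committed to a history repository (slide 39).


def rollback_commands(history_repo: str, manifest_path: str,
                      commit_id: str) -> tuple[list[str], list[str]]:
    """Commands to read a manifest as of `commit_id` and re-apply it."""
    read_manifest = ["git", "-C", history_repo, "show", f"{commit_id}:{manifest_path}"]
    apply_manifest = ["kubectl", "apply", "-f", "-"]  # "-" reads the manifest from stdin
    return read_manifest, apply_manifest


def rollback(history_repo: str, manifest_path: str, commit_id: str) -> None:
    """Re-apply the manifest saved at `commit_id` (requires git and kubectl)."""
    read_manifest, apply_manifest = rollback_commands(history_repo, manifest_path, commit_id)
    manifest = subprocess.run(read_manifest, check=True,
                              capture_output=True, text=True).stdout
    subprocess.run(apply_manifest, input=manifest, check=True, text=True)


print(rollback_commands("nestor-history", "production/mary.yaml", "3f2a9c1"))
```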
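
Slide 43 chains environments through branch rebases: Greenlight is rebased from Master, then Terminator, Staging and Shadow are rebased from Greenlight, and finally Production is rebased from Shadow after a Greenlight success. The sketch below encodes only those rebase steps as data. The event names, the lowercase branch names and the `origin` remote are hypothetical. Which component (CircleCI, Nestor-api, Rundeck) runs each step is not modeled, and neither are tagging and image builds.

```python
# Hypothetical encoding of the slide 43 promotion steps: each event lists the
# (branch, base) pairs to rebase. Event names are invented for illustration.
PROMOTIONS = {
    "ci_success_master": [("greenlight", "master")],
    "greenlight_updated": [
        ("terminator", "greenlight"),
        ("staging", "greenlight"),
        ("shadow", "greenlight"),
    ],
    "greenlight_success": [("production", "shadow")],
}


def promotion_commands(event: str) -> list[list[str]]:
    """git commands that rebase each affected branch onto its base."""
    commands = []
    for branch, base in PROMOTIONS[event]:
        commands.append(["git", "checkout", branch])
        commands.append(["git", "rebase", base])
        # Assumption: branches live on a remote named "origin". A rebased
        # branch needs a force push; --force-with-lease refuses to overwrite
        # remote commits that were not seen locally.
        commands.append(["git", "push", "--force-with-lease", "origin", branch])
    return commands


for cmd in promotion_commands("greenlight_success"):
    print(" ".join(cmd))
```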