Slide 1

Slide 1 text

Nestor: A tool to rule ‘em all

Slide 2

Slide 2 text

About me_
Saâd Dif, SRE
[email protected]
eng.kapten.com
@Kapten_tech
@kapten-engineering
ChauffeurPrive

Slide 3

Slide 3 text

Agenda_
● Kingdom of Kapten_
● Once upon a time_
● Uh oh_
● Nestor_
● They lived happily_

Slide 4

Slide 4 text

Kingdom of Kapten_

Slide 5

Slide 5 text

Kapten_History
● Passenger_
● Driver_

Slide 6

Slide 6 text

Kapten_History

Slide 7

Slide 7 text

Kapten_Growth
[Chart: Revenue, 2016 to 2018: 48.6 (2016), 100 (2017), 160 (2018); +60%]
● 3 million customers
● 400 employees
● 50,000 partner drivers
● 3,000 client companies
● French ride-hailing leader

Slide 8

Slide 8 text

Kapten_Production
● 1TB storage
● 60k metrics
● 15k inc / min
● 100TB storage
● 780k logs / min
● 50B lines of log
● 100+ nodes
● 130+ µ-services
● 20+ deploys / day
● 1k+ processes

Slide 9

Slide 9 text

Agenda_
● Kingdom of Kapten_
● Once upon a time_
● Uh oh_
● Nestor_
● They lived happily_

Slide 10

Slide 10 text

Heroku_
● Why?
  - PaaS over IaaS
  - Easy for developers to handle and scale
  - Ready to use
  - AutoBuild for various languages
  - Everything as a Service
● Configuration management?
  - Git repositories: letsdeploy/letsdeploy-config
  - Hand-provisioned servers
● Deployments?
  - Letsdeploy

Slide 11

Slide 11 text

Letsdeploy_
● Python scripts
● Release management
● Configuration management
● Scaling

Slide 12

Slide 12 text

Letsdeploy-config_
● One file per microservice
● Three sections in each configuration file: source, target, variables

    {
      "source": "[email protected]:transcovo/nestor-charybde.git",
      "target": {
        "heroku_app_name": "prod-nestor-charybde-cp-eu",
        "type": "heroku"
      },
      "variables": {
        "AWS_ACCESS_KEY_ID": "*****",
        "AWS_SECRET_ACCESS_KEY": "*****",
        "LOGGER_LEVEL": "info",
        "LOGGER_NAME": "production.nestor-charybde",
        "NODE_ENV": "production",
        "NPM_TOKEN": "*****",
        "SLACK_TOKEN": "*****"
      }
    }

Slide 13

Slide 13 text

Nestor_
Workflow management
● Master
● Greenlight
● Non-prod environments
● Production

Slide 14

Slide 14 text

Nestor_
Workflow management

Slide 15

Slide 15 text

Agenda_
● Kingdom of Kapten_
● Once upon a time_
● Uh oh_
● Nestor_
● They lived happily_

Slide 16

Slide 16 text

Get out of Heroku_
● Ending contract

Slide 17

Slide 17 text

Get out of Heroku_
● Ending contract
● Move to Kubernetes

Slide 18

Slide 18 text

Get out of Heroku_
● Ending contract
● Move to Kubernetes
  ○ Scalability
  ○ Costs
  ○ Real infra
  ○ Security

Slide 19

Slide 19 text

What we need_
● Simple and easy deployment tool
  ○ Command wrapper
  ○ Same tool for every environment
  ○ End users: our developers

Slide 20

Slide 20 text

What we need_
12 Factors (12Factor.net)
● Codebase
● Dependencies
● Config
● Backing services
● Build, release, run
● Processes
● Port binding
● Concurrency
● Disposability
● Dev/prod parity
● Logs
● Admin processes

Slide 21

Slide 21 text

What we need_
12Factor.net
● Our microservices should respect those principles
● The tool should help us do so

Slide 22

Slide 22 text

What we need_
● Usable by everybody

Slide 23

Slide 23 text

The clock is ticking_
● Time constraint
● Need to scale fast
● Ansible / Terraform

Slide 24

Slide 24 text

Agenda_
● Kingdom of Kapten_
● Once upon a time_
● Uh oh_
● Nestor_
● They lived happily_

Slide 25

Slide 25 text

Nestor_
● Rewrite of the existing tool
● Knowledge already present in all teams
● First use case: dev environments

Slide 26

Slide 26 text

Nestor_
● Able to deploy on different platforms

Slide 27

Slide 27 text

Nestor_
Three key components:
● CLI
● API
● CRON

Slide 28

Slide 28 text

Nestor_
CLI:
● Workflow management
● Release-related commands
● Configuration
● Datastore management

Slide 29

Slide 29 text

Nestor_
Wrapper around kubectl:
● Port forward
● Switch environments
● ...
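
As an illustration of what a thin wrapper around kubectl can look like, here is a minimal Python sketch that only builds the command lines for the two features named above. The function names, and the idea that an environment maps to a kubeconfig context, are assumptions made for this example and not Nestor's actual interface; `kubectl port-forward` and `kubectl config use-context` are standard kubectl commands.

```python
from typing import List


def port_forward_cmd(namespace: str, target: str,
                     local_port: int, remote_port: int) -> List[str]:
    """Build (but do not run) a `kubectl port-forward` command line."""
    return [
        "kubectl", "--namespace", namespace,
        "port-forward", target, f"{local_port}:{remote_port}",
    ]


def switch_env_cmd(context: str) -> List[str]:
    """Build a command that switches the active kubeconfig context.

    Assumption: one kubeconfig context per environment.
    """
    return ["kubectl", "config", "use-context", context]


# Hypothetical namespace, service and context names.
forward = port_forward_cmd("dev-1", "svc/mary", 8080, 80)
switch = switch_env_cmd("staging")
```

A wrapper would typically hand these lists to `subprocess.run(cmd, check=True)`, which keeps the developer-facing command short while kubectl does the real work.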

Slide 30

Slide 30 text

Nestor_
API:
● Node.js
● Manages the workflow
● Triggered by CI to initiate builds
● Pushes images to Docker Hub

Slide 31

Slide 31 text

Nestor_
API:
● Dockerfile
● Procfile
● Cronfile
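
The slide only names the three files. As a hedged illustration of the second one, the sketch below parses the Heroku-style Procfile format (`<process type>: <command>` per line) into a dictionary. How Nestor itself reads a Procfile is not shown in the talk, so the function name and the error handling are assumptions.

```python
from typing import Dict


def parse_procfile(text: str) -> Dict[str, str]:
    """Parse Heroku-style Procfile text into {process_type: command}."""
    processes: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so commands may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"invalid Procfile line: {raw!r}")
        processes[name.strip()] = command.strip()
    return processes


# Hypothetical Procfile content.
procs = parse_procfile("web: node server.js\nworker: node worker.js\n")
```

Each process type (such as `web`) could then drive one deployment, which would match the `web` keys used in the configuration slides later in the deck, although that link is an inference.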

Slide 32

Slide 32 text

Nestor_
API:
● Called by Rundeck
● One build per code delivery

Slide 33

Slide 33 text

Nestor_
CRON:
● All environments updated from the app versions running on “staging”
● Releases on staging every 30 minutes
● Datastore snapshot and reset of all dev environments
● App configuration synced from staging to dev environments

Slide 34

Slide 34 text

Nestor_
Configuration built dynamically:
● Kubernetes templates
● “project.yaml” file for global configuration
● File for each microservice
● Children environments
● Merge of files
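
The "merge of files" step can be pictured as a recursive dictionary merge in which the per-service file overrides the global one. The sketch below is an assumption about how such a merge could work, not Nestor's actual code; the precedence rule (service file wins) and the sample values are illustrative.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative global defaults (project level).
project = {
    "scales": {"web": {"minReplicas": 3, "maxReplicas": 3}},
    "variables": {"ope": {"NODE_ENV": "production"}},
}
# Illustrative per-service overrides.
app = {
    "app": "mary",
    "scales": {"web": {"maxReplicas": 9}},
    "variables": {"ope": {"SENTRY_DSN": "********"}},
}

config = deep_merge(project, app)
```

With this rule a service file only needs to state what differs from the global defaults, and a child environment could be handled by applying one more `deep_merge` on top.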

Slide 35

Slide 35 text

Nestor_
Kubernetes templates:

    $ tree nestor-config/templates
    nestor-config/templates
    ├── anti-affinity-node.yaml
    ├── anti-affinity-zone.yaml
    ├── config-map.yaml
    ├── cronjob.yaml
    ├── deployment.yaml
    ├── hpa.yaml
    ├── ingress-app.yaml
    ├── ingress-global.yaml
    ├── ingress-nginx-global.yaml
    ├── job.yaml
    ├── namespace.yaml
    ├── nginx.conf
    ├── secret-tls.yaml
    └── service.yaml

    0 directories, 14 files

    $ cat nestor-config/templates/service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: '{{name}}'
      labels:
        app: '{{app}}'
    spec:
      ports:
        - port: 80
          targetPort: {{target_port}}
      selector:
        app: '{{app}}'
        process: web
      type: ClusterIP
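
To show how `{{name}}`-style placeholders in such a template can be filled in, here is a small Python sketch using a regular expression. The templating engine Nestor actually uses is not named in the talk, so `render` and its strict missing-key behaviour are assumptions; the template text is an excerpt of the service.yaml shown on the slide, and the values are hypothetical.

```python
import re
from typing import Mapping

# Matches {{ key }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: Mapping[str, object]) -> str:
    """Replace every {{key}} placeholder; fail loudly on a missing key."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template value: {key}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


SERVICE_TEMPLATE = """\
metadata:
  name: '{{name}}'
  labels:
    app: '{{app}}'
spec:
  ports:
    - port: 80
      targetPort: {{target_port}}
"""

# Hypothetical values for one service.
rendered = render(SERVICE_TEMPLATE, {"name": "mary-web", "app": "mary", "target_port": 3000})
```

Failing on a missing key, rather than leaving the placeholder in place, avoids sending a half-rendered manifest to the cluster.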

Slide 36

Slide 36 text

Nestor_
project.yaml file:

    $ cat nestor-config/project.yaml
    project: kapten
    env: production
    domain: production.kapten.com
    docker:
      build:
        variables:
          NPM_TOKEN: $NPM_TOKEN
      registries:
        docker.com:
          - id: docker
            organization: kapten
    ...
    ...
    deployments:
      kubernetes:
        - cluster_name: kapten_production_eu-w1
          hpa_replicas: true
    scales:
      web:
        minReplicas: 3
        maxReplicas: 3
    resources:
      web:
        limits:
          memory: 256Mi
          cpu: 0.2
    nodeSelector:
      default:
        tier: app
    ...
    ...
    variables:
      ope:
        NODE_ENV: 'production'
        METRICS_DESTINATION: metrics.kapten.com
    slack:
      token: '**********'
      channels:
        info: tech-release-prod
        error: tech-release-prod
    ...

Slide 37

Slide 37 text

Nestor_
App configuration file:

    app: mary
    git:
      origin: [email protected]:transcovo/mary.git
    is_enabled: true
    resources:
      web:
        limits:
          cpu: 0.2
          memory: 250Mi
        requests:
          cpu: 0.2
          memory: 250Mi
    scales:
      web:
        maxReplicas: 9
        minReplicas: 3
        targetCPUUtilizationPercentage: 75
    teams:
      - security
    variables:
      app: {}
      ope:
        SENTRY_DSN: "********"

Slide 38

Slide 38 text

Nestor_
Monitoring and alerting in configuration files:
● Routing
● Threshold
Config validation

    templateVars:
      tplCriticity: high
      tplTeam: security
      tplWeb2xxThreshold: "0"
      tplWeb50thLatencyThreshold: "0.30"
      tplWeb95thLatencyThreshold: "2"
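
A config validation step for these `templateVars` could look like the sketch below. The rules are assumptions made for illustration: the set of allowed criticities is hypothetical (only "high" appears on the slide), and treating every key ending in "Threshold" as a non-negative number is a guess based on the sample values.

```python
from typing import Dict, List

# Hypothetical allowed values; the slide only shows "high".
ALLOWED_CRITICITIES = {"low", "medium", "high"}


def validate_template_vars(template_vars: Dict[str, str]) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    if template_vars.get("tplCriticity") not in ALLOWED_CRITICITIES:
        errors.append("tplCriticity must be one of " + ", ".join(sorted(ALLOWED_CRITICITIES)))
    if not template_vars.get("tplTeam"):
        errors.append("tplTeam is required")
    for key, value in template_vars.items():
        if not key.endswith("Threshold"):
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            errors.append(f"{key} must be numeric, got {value!r}")
            continue
        if number < 0:
            errors.append(f"{key} must not be negative")
    return errors


# Values copied from the slide.
example = {
    "tplCriticity": "high",
    "tplTeam": "security",
    "tplWeb2xxThreshold": "0",
    "tplWeb50thLatencyThreshold": "0.30",
    "tplWeb95thLatencyThreshold": "2",
}
problems = validate_template_vars(example)
```

Returning all errors at once, instead of raising on the first, lets a developer fix a configuration file in one pass.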

Slide 39

Slide 39 text

Nestor_
History management
● Nestor history
● History saved in a dedicated repository
● Used for rollbacks

Slide 40

Slide 40 text

Nestor_
Rollbacks:
● Apply previous yaml
● Use a specific commit id
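
The two rollback modes above can be sketched as a lookup in an ordered release history. This is a simplified in-memory model under stated assumptions: the `Release` type and `pick_rollback_target` function are hypothetical, and in the real tool the history lives in a repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    commit_id: str
    manifest: str  # rendered YAML that was applied for this release


def pick_rollback_target(history: List[Release],
                         commit_id: Optional[str] = None) -> Release:
    """Choose the release to re-apply.

    `history` is ordered oldest to newest; the last entry is the live release.
    """
    if commit_id is not None:
        # Mode 2: roll back to a specific commit id (most recent match wins).
        for release in reversed(history):
            if release.commit_id == commit_id:
                return release
        raise LookupError(f"no release recorded for commit {commit_id}")
    # Mode 1: roll back to the release just before the live one.
    if len(history) < 2:
        raise LookupError("no previous release to roll back to")
    return history[-2]


# Hypothetical history with three releases.
history = [
    Release("a1b2c3", "replicas: 3"),
    Release("d4e5f6", "replicas: 5"),
    Release("0a9b8c", "replicas: 9"),
]
previous = pick_rollback_target(history)
pinned = pick_rollback_target(history, "a1b2c3")
```

The manifest of the chosen release is what would then be applied again to the cluster.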

Slide 41

Slide 41 text

Nestor_
Features most used by developers:
● Deploy a specific branch
● Port forwarding
● Switch between environments
● Datastore management

Slide 42

Slide 42 text

Workflow with Nestor_
From code...
● Unit tests
● e2e tests
● Load tests
● Testing env.
● Monitoring
To prod...
...in minutes with nestor_!

Slide 43

Slide 43 text

Workflow_
[Diagram: branches dev 1, dev 2, ... dev X → Peer Review → Master → GreenLight → Terminator, Staging, Shadow → Production]
CircleCI success (master branch):
● Create a git tag
● Build docker image
● Rebase Greenlight branch from Master
GL success:
● Rebase Terminator, Staging and Shadow from Greenlight
Rundeck deploy:
● Rebase Production from Shadow
Legend:
● Nestor-api: rebase and move git tag
● Call nestor-api
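
The rebase steps listed on this slide define which branch each environment branch is rebased from. The sketch below encodes that relation as a lookup table and derives the path a commit follows from master to a given branch. The table is read off the slide text, with lowercase branch names chosen for the example; the function name is an assumption.

```python
from typing import Dict, List

# For each branch, the branch it is rebased from (as listed on the slide).
REBASED_FROM: Dict[str, str] = {
    "greenlight": "master",
    "terminator": "greenlight",
    "staging": "greenlight",
    "shadow": "greenlight",
    "production": "shadow",
}


def promotion_path(branch: str) -> List[str]:
    """Return the chain of branches from master down to `branch`."""
    path = [branch]
    # Walk upstream until reaching a branch that is not rebased from another.
    while path[-1] in REBASED_FROM:
        path.append(REBASED_FROM[path[-1]])
    return list(reversed(path))


to_prod = promotion_path("production")
to_staging = promotion_path("staging")
```

Here `to_prod` is the chain master, greenlight, shadow, production, which makes explicit that production is only ever fed from shadow.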

Slide 44

Slide 44 text

Agenda_
● Kingdom of Kapten_
● Once upon a time_
● Uh oh_
● Nestor_
● They lived happily_

Slide 45

Slide 45 text

To sum up_
● From PaaS to IaaS
● From deployment workflow to release and config management
● Migration took 4 months

Slide 46

Slide 46 text

Thoughts_
● We needed Nestor
● Building your own tool or wrapper is nothing to be ashamed of
● It answered a specific need at a specific time

Slide 47

Slide 47 text

Next steps_
● Container build
● CLI rewrite
● ...

Slide 48

Slide 48 text

Next steps_
Helm?
● Community based
● No need to rewrite what already exists
● New V3
● Stable and reliable

Slide 49

Slide 49 text

Next steps_
To study:
● Live pod debugging (Telepresence / Monday)
● Make it available to everyone?

Slide 50

Slide 50 text

Thank you_!
eng.kapten.com
@Kapten_tech
@kapten-engineering
ChauffeurPrive
€15 of credit with the promo code SYSADMIN