Leboncoin: Bare metal provisioning with ACDC

Xavier Krantz
November 07, 2017

Bare metal and hardware management rethought at Leboncoin.


Transcript

  5. 1.2 - Technical stack

    • 2 datacenters
    • 600 physical servers (more than 1,000 including virtual machines)
    • 12 Gbit/s of outbound traffic
    • 6 TB of databases
    • 300M images
    • 15k req/s on leboncoin.fr
  7. 2.1 - Initial situation

    • 1 - Operator
      ◦ Find a free IP (welcome, ping!)
    • 2 - Puppet
      ◦ Reserve the IP / DNS name in DNS
      ◦ Commit + push
      ◦ Run the agent on every DHCP node
    • 3 - Foreman
      ◦ Go into Foreman and select a node
      ◦ Get the MAC address
      ◦ Create the node and put it in build mode
    • 4 - Puppet
      ◦ Reserve the MAC address / DNS name in DHCP
      ◦ Commit + push
      ◦ Run the agent on every DHCP node
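
Step 1 above ("find a free IP" by pinging) can be sketched roughly as follows. This is only an illustration of the manual approach, not the team's tooling: the function names `ping` and `find_free_ip` are hypothetical, and the `ping` flags assume a Linux `ping` binary. A host that does not answer is not necessarily unused, which is one reason this step is fragile.

```python
import ipaddress
import subprocess
from typing import Callable, Optional


def ping(ip: str) -> bool:
    """Return True if the host answers one ICMP echo (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_free_ip(cidr: str, is_alive: Callable[[str], bool] = ping) -> Optional[str]:
    """Return the first host address in `cidr` that does not answer, or None."""
    for host in ipaddress.ip_network(cidr).hosts():
        if not is_alive(str(host)):
            return str(host)
    return None
```

Passing `is_alive` as a parameter keeps the scan logic testable without touching the network; in real use, a silent address would still need to be checked against DNS and DHCP reservations.
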
  8. 2.1 - Initial situation

    • 5 - Foreman
      ◦ Reboot the node via the BMC plugin
    • 6 - Node installs
      ◦ Boot on network (PXE)
      ◦ DHCP redirects to TFTP
      ◦ TFTP serves the custom PXE config
      ◦ Preseed is rendered by Foreman
    • 7 - Operator
      ◦ Follows the installation with the Java console
  9. 2.1 - Initial situation

    Steps 5 to 7 as on the previous slide, with the resulting drawbacks:
    • 6 manual steps
    • Error-prone
    • Human conflicts
    • Time-consuming
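
Step 6 ends with a Preseed file being rendered per node. A minimal sketch of that idea, using Python's `string.Template` as a stand-in (Foreman has its own template engine, which is not reproduced here); `render_preseed` and the two-line template are hypothetical, and only the two `netcfg` Preseed keys are taken from the Debian installer:

```python
from string import Template

# Tiny illustrative fragment; a real Preseed file contains many more directives.
PRESEED_TEMPLATE = Template(
    "d-i netcfg/get_hostname string $hostname\n"
    "d-i netcfg/get_domain string $domain\n"
)


def render_preseed(hostname: str, domain: str) -> str:
    """Fill the per-node values into the Preseed fragment."""
    return PRESEED_TEMPLATE.substitute(hostname=hostname, domain=domain)
```
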
  11. 2.2 - Problem statement

    • Simplify bare metal provisioning
      ◦ Unattended provisioning / installation
      ◦ 1 manual step
  12. 2.3 - Attempt 1 - Foreman + SmartProxies

    Observation: Foreman is under-used.
    Solution: Smart proxies to automate:
    - IPAM + DHCP
    - DNS
  13. 2.3 - Attempt 1 - Foreman + SmartProxies

    Pain points: DNS
    • We
      ◦ 1 big zone file
    • Foreman Smart-proxy
      ◦ Dynamic updates = nsupdate
      ◦ Binary journal file + serial conflicts

    Pain points: DHCP
    • We
      ◦ Do NIC bonding
      ◦ Need to register n MAC addresses <> 1 IP
    • Foreman Smart-proxy
      ◦ Not supported
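
The "n MAC addresses for 1 IP" requirement that comes with NIC bonding could be expressed as one DHCP host entry per interface, all pointing at the same address. The sketch below assumes ISC dhcpd `host` declaration syntax, which the slides do not name; `dhcp_host_entries` and the `-nicN` naming scheme are hypothetical.

```python
from typing import Iterable


def dhcp_host_entries(hostname: str, ip: str, macs: Iterable[str]) -> str:
    """Render one host block per MAC address, all sharing the same fixed address."""
    blocks = []
    for index, mac in enumerate(macs):
        blocks.append(
            f"host {hostname}-nic{index} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(blocks)
```

Calling `dhcp_host_entries("web01", "10.0.0.5", ["AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02"])` yields two blocks, so whichever bonded interface sends the PXE request receives the same lease.
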
  14. 2.3 - Attempt 1 - Foreman + SmartProxies

    Pain points: Foreman
    • We
      ◦ Are not proficient in Ruby
      ◦ Are not "a tech company"
      ◦ Are not that big
    • Foreman & Smart-proxy
      ◦ Very complex code base
      ◦ Very complex UI
      ◦ Generic, with a lot of (too many) features
  15. 3.1 - Interface with the contractor

    Celeris: contractor for datacenter interventions
    • Spreadsheet
    • DCIM: NetBox
      ◦ Open source
      ◦ DigitalOcean
      ◦ Python + PostgreSQL
    Integration with Foreman?
  16. Problem statement 2

    • Automate the lifecycle management of physical machines
      ◦ Discovery / intake
      ◦ Unattended provisioning / installation
      ◦ Maintenance, decommissioning
  17. Collins

    • Open source project: https://github.com/tumblr/collins
    • Enforced state machine
    • Arbitrary hook / callback system on state transitions
    • Arbitrary key / value metadata attached to each asset
    • Web UI + HTTP API + firehose
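
The combination of key/value metadata and hooks fired on state transitions can be pictured with a small sketch. This is not Collins code or its API: `Asset`, `on_transition`, and `set_state` are hypothetical names, and the state labels used with them are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Asset:
    tag: str
    state: str
    metadata: Dict[str, str] = field(default_factory=dict)  # arbitrary key/value pairs


Hook = Callable[[Asset, str, str], None]

# Hooks keyed by (previous_state, new_state); None acts as a wildcard.
_hooks: Dict[Tuple[Optional[str], Optional[str]], List[Hook]] = {}


def on_transition(previous: Optional[str], new: Optional[str], hook: Hook) -> None:
    """Register `hook` for transitions matching the given states."""
    _hooks.setdefault((previous, new), []).append(hook)


def set_state(asset: Asset, new_state: str) -> None:
    """Change the asset state, then run every hook matching the transition."""
    previous = asset.state
    asset.state = new_state
    keys = [(previous, new_state), (None, new_state), (previous, None), (None, None)]
    for key in keys:
        for hook in _hooks.get(key, []):
            hook(asset, previous, new_state)
```

A hook registered with `on_transition("Provisioning", "Provisioned", notify)` would run only on that exact transition, while `on_transition(None, "Provisioned", notify)` would run whenever an asset reaches that state.
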
  18. Collins: Tooling

    API clients
    • Go-collins
    • pycollins
    • Ruby libs
      ◦ collins-auth
      ◦ collins-client
      ◦ collins-notify
      ◦ collins-state
      ◦ ...
    CLI
    • collins-shell
  19. Collins: Lifecycle

    Specified workflows:
    - Intake
    - Commissioning
    - Maintenance
    - Decommissioning
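
One way to read "specified workflows" is as a table of allowed transitions that rejects anything else. The sketch below is an assumption about how such a table might look: the state names, including `in_service`, and the permitted transitions are hypothetical and are not taken from Collins or from ACDC.

```python
from typing import Dict, Set

# Hypothetical lifecycle: which states may follow each state.
TRANSITIONS: Dict[str, Set[str]] = {
    "intake": {"commissioning"},
    "commissioning": {"in_service"},
    "in_service": {"maintenance", "decommissioning"},
    "maintenance": {"in_service", "decommissioning"},
    "decommissioning": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not part of the lifecycle."""


def advance(current: str, target: str) -> str:
    """Return `target` if the lifecycle allows moving there from `current`."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

With this table, `advance("intake", "commissioning")` succeeds, whereas `advance("intake", "maintenance")` raises `InvalidTransition`, so a machine cannot skip commissioning.
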
  20. 4.4 - Collins callbacks

    • nowProvisioned
      ◦ on = "asset_update"
      ◦ When
        ▪ previous.state = "isProvisioning"
        ▪ && current.state = "isProvisioned"
    • provisionEvent
      ◦ on = "asset_update"
      ◦ When
        ▪ current.state = "isNew"
    • unallocated
      ◦ on = "asset_update"
      ◦ When
        ▪ current.state = "isUnallocated"
  21. 4.6 - Tooling

    $ collins-shell
    INFO - ENV Variable COLLINS_CONFIG=/home/xkrantz/Sources/github.schibsted.io/leboncoin/acdc/conf/collins.yaml
    Tasks:
      collins-shell asset <command>                          # Asset related commands
      collins-shell asset_type <command>                     # Asset Type related commands
      collins-shell console                                  # drop into the interactive collins shell
      collins-shell help [TASK]                              # Describe available tasks or one specific task
      collins-shell ip_address <command>                     # IP address related commands
      collins-shell ipmi <command>                           # IPMI related commands
      collins-shell latest                                   # check if there is a newer version of collins-shell
      collins-shell log MESSAGE                              # log a message on an asset
      collins-shell logs TAG                                 # fetch logs for an asset specified by its tag. Use "all" for a...
      collins-shell power ACTION --reason=REASON --tag=TAG   # perform power action (off, on, rebootSoft, rebootHard, etc) o...
      collins-shell power_status                             # check power status on an asset
      collins-shell provision <command>                      # Provisioning related commands
      collins-shell search_logs QUERY                        # search for asset logs
      collins-shell state <command>                          # State management related commands - use with care
      collins-shell tag <command>                            # Tag related commands
      collins-shell version                                  # current version of collins-shell
  22. 5 - Next: ACDC v2

    Rework
    • Discovery
    • OS bootstrapping
    Add
    • Disk management
    • Firmware updates
    • Any maintenance tasks
  23. 5 - Next: ACDC v2

    Discovery
    • Currently:
      ◦ Genesis (Tumblr)
      ◦ Ruby DSL (Chef-like)
    • Next:
      ◦ CoreOS in memory + Ansible
  24. 5 - Next: ACDC v2

    OS bootstrapping
    • Currently:
      ◦ Preseed / Kickstart
      ◦ Shell scripts
    • Next:
      ◦ CoreOS in memory + Ansible