Leboncoin: Bare metal provisioning with ACDC

Xavier Krantz
November 07, 2017

Bare metal and hardware management rethought at Leboncoin.

Transcript

  5. 1.2 - Technical stack

    • 2 datacenters
    • 600 physical servers (more than 1,000 including virtual machines)
    • 12 Gbit/s of outbound traffic
    • 6 TB of databases
    • 300M images
    • 15k req/s on leboncoin.fr
  7. 2.1 - Initial situation

    • 1 - Operator
      ◦ Find a free IP (welcome, ping!)
    • 2 - Puppet
      ◦ Reserve IP / DNS name in DNS
      ◦ Commit + push
      ◦ Run the agent on every DHCP node
    • 3 - Foreman
      ◦ Go into Foreman and select a node
      ◦ Get the MAC address
      ◦ Create the node + put it in build mode
    • 4 - Puppet
      ◦ Reserve MAC address / DNS name in DHCP
      ◦ Commit + push
      ◦ Run the agent on every DHCP node
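Step 1 above relied on pinging addresses by hand. As a minimal sketch of what an IPAM does instead, the function below returns the first host address of a subnet that is not already recorded as allocated. The function name `first_free_ip`, the subnet and the reservations are illustrative assumptions, not taken from the talk.

```python
import ipaddress


def first_free_ip(subnet, allocated):
    """Return the first host address of `subnet` not present in `allocated`.

    `allocated` is an iterable of address strings. Returns None when every
    host address of the subnet is already taken.
    """
    taken = {ipaddress.ip_address(a) for a in allocated}
    for host in ipaddress.ip_network(subnet).hosts():
        if host not in taken:
            return str(host)
    return None


if __name__ == "__main__":
    # Hypothetical subnet and reservations, for illustration only.
    print(first_free_ip("192.0.2.0/29", ["192.0.2.1", "192.0.2.2"]))
```

Keeping the allocation record in one place, rather than probing the network, is what avoids two operators picking the same address at the same time.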
  8. 2.1 - Initial situation

    • 5 - Foreman
      ◦ Reboot the node via the BMC plugin
    • 6 - Node installs
      ◦ Boot on the network (PXE)
      ◦ DHCP redirects to TFTP
      ◦ TFTP serves the custom PXE config
      ◦ Preseed is rendered by Foreman
    • 7 - Operator
      ◦ Follows along with the Java console
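To illustrate how a TFTP server can serve a per-node PXE config, here is a small sketch that derives the per-host config file name from a MAC address. It assumes the PXELINUX bootloader, which the deck does not name explicitly; PXELINUX looks under `pxelinux.cfg/` for a file named `01-` followed by the MAC in lowercase, dash-separated hex. The function name is hypothetical.

```python
import re


def pxelinux_config_name(mac):
    """Return the per-host PXELINUX config file name for a MAC address.

    The "01-" prefix is the ARP hardware type for Ethernet; the rest is
    the MAC in lowercase hex with dashes as separators.
    """
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    pairs = [digits[i:i + 2] for i in range(0, 12, 2)]
    return "01-" + "-".join(pairs)


if __name__ == "__main__":
    # Hypothetical MAC address, for illustration only.
    print("pxelinux.cfg/" + pxelinux_config_name("AA:BB:CC:00:11:22"))
```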
  9. 2.1 - Initial situation

    • 6 manual steps
    • Error-prone
    • Human conflicts
    • Time-consuming
  11. 2.2 - Problem statement

    • Simplify bare metal provisioning
      ◦ Unattended provisioning / installation
      ◦ 1 manual step
  12. 2.3 - Attempt 1 - Foreman + SmartProxies

    Observation: Foreman is underused.
    Solution: Smart proxies to automate:
    - IPAM + DHCP
    - DNS
  13. 2.3 - Attempt 1 - Foreman + SmartProxies

    Pain points: DNS
    • We
      ◦ 1 big zone file
    • Foreman Smart-proxy
      ◦ Dynamic updates = nsupdate
      ◦ Binary journal file + serial conflicts
    Pain points: DHCP
    • We
      ◦ Do NIC bonding
      ◦ Need to register n MAC addresses <> 1 IP
    • Foreman Smart-proxy
      ◦ Not supported
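The DHCP pain point (several MAC addresses of a bonded node mapped to one IP) can be made concrete with a small sketch. It renders one host block per MAC, all sharing the same fixed address, in ISC dhcpd-style syntax. The deck does not say which DHCP server is used or how the reservations are actually written, so the syntax choice, the function name and the sample values are assumptions.

```python
def dhcp_host_entries(hostname, ip, macs):
    """Render one ISC dhcpd-style host block per MAC, all with the same IP."""
    blocks = []
    for index, mac in enumerate(macs):
        blocks.append(
            f"host {hostname}-nic{index} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical bonded node with two NICs, for illustration only.
    print(dhcp_host_entries(
        "node01", "192.0.2.10",
        ["AA:BB:CC:00:11:22", "AA:BB:CC:00:11:23"],
    ))
```

Whichever NIC of the bond sends the DHCP request, the node then receives the same address.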
  14. 2.3 - Attempt 1 - Foreman + SmartProxies

    Pain points: Foreman
    • We
      ◦ Do not master Ruby
      ◦ Are not “a Tech company”
      ◦ Are not that big
    • Foreman & Smart-proxy
      ◦ Very complex code base
      ◦ Very complex UI
      ◦ Generic, with lots of (too many) features
  15. 3.1 - Interface with the contractor

    Celeris: contractor for on-site datacenter interventions
    • Spreadsheet
    • DCIM: Netbox
      ◦ Open source
      ◦ DigitalOcean
      ◦ Python + PostgreSQL
    Integration with Foreman?
  16. Problem statement 2

    • Automate lifecycle management of physical machines
      ◦ Discovery / intake
      ◦ Unattended provisioning / installation
      ◦ Maintenance, decommissioning
  17. Collins

    • Open source project: https://github.com/tumblr/collins
    • Enforced state machine
    • Arbitrary hook / callback system on state transitions
    • Arbitrary key / value metadata attached to each asset
    • Web UI + HTTP API + firehose
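The idea of callbacks firing on state transitions can be sketched in a few lines. This is not Collins' implementation or configuration format: it is a simplified model in which each callback names an event and a condition on the previous and current state. The callback names mirror those shown later in the deck (4.4 - Collins callbacks); the plain state strings and the `matching_callbacks` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Callback:
    name: str
    on: str  # event type, e.g. "asset_update"
    # Condition evaluated on (previous_state, current_state).
    when: Callable[[Optional[str], str], bool]


CALLBACKS = [
    Callback("nowProvisioned", "asset_update",
             lambda prev, cur: prev == "Provisioning" and cur == "Provisioned"),
    Callback("provisionEvent", "asset_update",
             lambda prev, cur: cur == "New"),
    Callback("unallocated", "asset_update",
             lambda prev, cur: cur == "Unallocated"),
]


def matching_callbacks(event, previous_state, current_state):
    """Return the names of the callbacks whose event and condition both match."""
    return [cb.name for cb in CALLBACKS
            if cb.on == event and cb.when(previous_state, current_state)]


if __name__ == "__main__":
    print(matching_callbacks("asset_update", "Provisioning", "Provisioned"))
```

A real hook would run an action (a script, an API call) for each matching name; returning the names keeps the sketch side-effect free.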
  18. Collins: Tooling

    API clients
    • Go-collins
    • pycollins
    • Ruby libs
      ◦ collins-auth
      ◦ collins-client
      ◦ collins-notify
      ◦ collins-state
      ◦ ...
    CLI
    • collins-shell
  19. Collins: Lifecycle

    Specified workflows:
    - Intake
    - Commissioning
    - Maintenance
    - Decommissioning
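A lifecycle with specified workflows boils down to a state machine that refuses transitions outside an allowed set. The sketch below shows that mechanism only. The state names and the transition table are hypothetical, chosen to echo the four workflows above; they are not Collins' actual statuses or rules.

```python
# Hypothetical transition table: the states each state is allowed to move to.
ALLOWED = {
    "new": {"unallocated"},
    "unallocated": {"provisioning", "decommissioned"},
    "provisioning": {"provisioned"},
    "provisioned": {"maintenance", "unallocated"},
    "maintenance": {"provisioned", "unallocated", "decommissioned"},
    "decommissioned": set(),
}


class Asset:
    """A physical machine tracked through its lifecycle."""

    def __init__(self, tag, state="new"):
        self.tag = tag
        self.state = state

    def transition(self, target):
        """Move to `target`, refusing any transition not listed in ALLOWED."""
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.tag}: {self.state} -> {target} is not allowed")
        self.state = target


if __name__ == "__main__":
    asset = Asset("hypothetical-tag")
    for step in ("unallocated", "provisioning", "provisioned"):
        asset.transition(step)
    print(asset.state)
```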
  20. 4.4 - Collins callbacks

    • nowProvisioned
      ◦ on = "asset_update"
      ◦ When
        ▪ previous.state = "isProvisioning"
        ▪ && current.state = "isProvisioned"
    • provisionEvent
      ◦ on = "asset_update"
      ◦ When
        ▪ current.state = "isNew"
    • unallocated
      ◦ on = "asset_update"
      ◦ When
        ▪ current.state = "isUnallocated"
  21. 4.6 - Tooling

    $ collins-shell
    INFO - ENV Variable COLLINS_CONFIG=/home/xkrantz/Sources/github.schibsted.io/leboncoin/acdc/conf/collins.yaml
    Tasks:
      collins-shell asset <command>                          # Asset related commands
      collins-shell asset_type <command>                     # Asset Type related commands
      collins-shell console                                  # drop into the interactive collins shell
      collins-shell help [TASK]                              # Describe available tasks or one specific task
      collins-shell ip_address <command>                     # IP address related commands
      collins-shell ipmi <command>                           # IPMI related commands
      collins-shell latest                                   # check if there is a newer version of collins-shell
      collins-shell log MESSAGE                              # log a message on an asset
      collins-shell logs TAG                                 # fetch logs for an asset specified by its tag. Use "all" for a...
      collins-shell power ACTION --reason=REASON --tag=TAG   # perform power action (off, on, rebootSoft, rebootHard, etc) o...
      collins-shell power_status                             # check power status on an asset
      collins-shell provision <command>                      # Provisioning related commands
      collins-shell search_logs QUERY                        # search for asset logs
      collins-shell state <command>                          # State management related commands - use with care
      collins-shell tag <command>                            # Tag related commands
      collins-shell version                                  # current version of collins-shell
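When scripting around the CLI, it can help to build the command line in one place. The sketch below assembles the argument list for the `collins-shell power ACTION --reason=REASON --tag=TAG` task listed above and prints it for review rather than executing it. The helper name, the asset tag and the reason are hypothetical; no option beyond those shown in the help output is assumed.

```python
import shlex


def power_command(action, tag, reason):
    """Build the argv for `collins-shell power ACTION --reason=REASON --tag=TAG`."""
    return ["collins-shell", "power", action,
            f"--reason={reason}", f"--tag={tag}"]


if __name__ == "__main__":
    # Hypothetical tag and reason, for illustration only.
    argv = power_command("rebootSoft", "hypothetical-tag", "kernel upgrade")
    # Shell-quoted form for review; pass `argv` to subprocess.run to execute it.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run` without a shell avoids quoting problems when the reason contains spaces.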
  22. 5 - Next: ACDC v2

    Rework
    • Discovery
    • OS bootstrapping
    Add
    • Disk management
    • Firmware updates
    • Any maintenance tasks
  23. 5 - Next: ACDC v2 - Discovery

    • Currently:
      ◦ Genesis (Tumblr)
      ◦ Ruby DSL (Chef-like)
    • Next:
      ◦ CoreOS in memory + Ansible
  24. 5 - Next: ACDC v2 - OS bootstrapping

    • Currently:
      ◦ Preseed / Kickstart
      ◦ Shell scripts
    • Next:
      ◦ CoreOS in memory + Ansible