
8 Years of Config Management


Starting with a small Puppet deployment in 2009, followed by the spread of Bcfg2 and finally the development and full-scale adoption of BundleWrap, we explore how configuration management at //SEIBERT/MEDIA has changed over the years.
This is not a talk about what you should do, but one about what we did – and more importantly, where we went wrong. Learn from our mistakes as we did, both on a technical level and from a human perspective, as we scaled our config management from 1 to 30 contributors and 700 servers. Understand why we made BundleWrap and how some of its unique features help us every day.

Recording: https://media.ccc.de/v/froscon2017-1977-8_years_of_config_management

Torsten Rehn

August 20, 2017


Transcript

  1. 38 commits per day · 21 contributors last month · 10k+ services in Icinga · 727 managed nodes · WHY SHOULD YOU CARE?
  2. no iPad · Michael Jackson lives · Obama freshly in office · Deepwater Horizon a year away · still looking for bin Laden · H1N1
  3. [Architecture diagram: a sprawl of LDAP replicas (master, replicas, control, dev) feeding multiple BCFG2 servers across web, DB, Git, and miscellaneous hosts]
  4. <?python
     if 'debian' in metadata.groups or 'ubuntu' in metadata.groups:
         import ldap
         l = ldap.initialize('ldap://localhost:389')
         l.simple_bind("uid=bcfg2,ou=AutomationUsers,dc=smhss,dc=de", "32US5wa8jXTb")
         users = []
         g = l.search_s('dc=smhss,dc=de', ldap.SCOPE_SUBTREE,
             '(&(objectClass=posixGroup)(|(cn=' + metadata.hostname + ')(cn=systems)))',
             ['cn', 'memberUid'])
         for group in g:
             for uid in group[1].get("memberUid", ()):
                 u = l.search_s('dc=smhss,dc=de', ldap.SCOPE_SUBTREE,
                     '(&(objectClass=posixAccount)(uid=' + uid + '))',
                     ['homeDirectory', 'sshPublicKey'])
                 users.append({
                     "uid": uid,
                     "homedir": u[0][1]["homeDirectory"][0],
                     "pubkey": u[0][1]["sshPublicKey"][0],
                 })
     ?>
     <Bundle name="ssh_ldap_pubkey" xmlns:py="http://genshi.edgewall.org/"
             xmlns:xi="http://www.w3.org/2001/XInclude">
       <py:if test="'debian' in metadata.groups or 'ubuntu' in metadata.groups">
         <Package name="auth-client-config" />
         <BoundConfigFile py:for="user in users"
             name="${user['homedir'].rstrip('/')}/.ssh/authorized_keys"
             owner="${user['uid']}" group="root" perms="0600"
             >${user['pubkey']}</BoundConfigFile>
  5. git commit; git push
     wait for post-update hooks to rsync git repo to each bcfg2-server
     ssh into each node running slapd
     rm -rf /var/lib/ldap/* && /etc/init.d/slapd restart
     ssh into target node
     run bcfg2-apply
     wait up to 70 minutes
     hit N, ENTER a couple dozen times
     curse the idiot who came up with this crap
  6. "Instead of spending hours trying to get Chef installed, […] we recommend […] a free Enterprise Chef account and we'll take care of the Chef Server for you." (learnchef.opscode.com)
  7. NIH

  8. "FIXING" BCFG2: better template engine · no more XML · item-level parallelism · fast diffs · bundles + metadata = configuration · interactive mode · Python everywhere
  9. commit b2d3f78fcb8392864527721dde32f923d4c4e2ee
     Author: Torsten Rehn <[email protected]>
     Date:   Sun Jun 9 15:38:25 2013 +0200

         Damn the torpedoes. Four bells, Captain Drayton.
  10. $ bw test
      ✓ gce.smedia-1.netbox  has no metadata collisions
      ✓ gce.smedia-1.netbox  systemd  action:systemd-locale
      ✓ gce.smedia-1.netbox  netbox  symlink:/opt/netbox
      ✓ gce.smedia-1.netbox  ntp  file:/etc/ntp.conf
      ✓ gce.smedia-1.netbox  has no IP conflicts
      ✓ gce.smedia-1.netbox  has valid SLA metadata
  11. $ bw verify -a mynode
      ✓ mynode  hosts  file:/etc/hosts
      ✓ mynode  hosts  file:/etc/hostname
      i ╭────────┬───────┬──────┬─────┬────────┬──────────╮
      i │ node   │ items │ good │ bad │ health │ duration │
      i ├────────┼───────┼──────┼─────┼────────┼──────────┤
      i │ mynode │ 2     │ 2    │ 0   │ 100.0% │ 1s       │
      i ╰────────┴───────┴──────┴─────┴────────┴──────────╯
  12. groups = {
          'important_stuff': {
              'members': ['that-old-db-host'],
              'member_patterns': [r'^cluster-1\.'],
              'members_add': lambda node: node.metadata['is_production'],
              'members_remove': lambda node: node.os == 'debian',
          },
      }
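The membership rules on this slide combine static lists, name patterns, and per-node callables. A minimal sketch of how such rules could be resolved (illustrative only, not BundleWrap's internals; the node objects are made up):

```python
import re
from types import SimpleNamespace

# Hypothetical resolver for group membership rules like those on the
# slide: static 'members', regex 'member_patterns', and the callable
# 'members_add' / 'members_remove' hooks evaluated against each node.
def resolve_members(group, all_nodes):
    """Return the set of node names matching a group definition."""
    members = set(group.get('members', ()))
    for pattern in group.get('member_patterns', ()):
        members |= {name for name in all_nodes if re.search(pattern, name)}
    add = group.get('members_add')
    if add:
        members |= {name for name, node in all_nodes.items() if add(node)}
    remove = group.get('members_remove')
    if remove:
        members -= {name for name, node in all_nodes.items() if remove(node)}
    return members

# the group from the slide, applied to three made-up nodes
nodes = {
    'that-old-db-host': SimpleNamespace(metadata={'is_production': False}, os='centos'),
    'cluster-1.web': SimpleNamespace(metadata={'is_production': False}, os='debian'),
    'prod-app': SimpleNamespace(metadata={'is_production': True}, os='centos'),
}
group = {
    'members': ['that-old-db-host'],
    'member_patterns': [r'^cluster-1\.'],
    'members_add': lambda node: node.metadata['is_production'],
    'members_remove': lambda node: node.os == 'debian',
}
```

Here `resolve_members(group, nodes)` yields `{'that-old-db-host', 'prod-app'}`: the pattern pulls in `cluster-1.web`, but `members_remove` drops it again because it runs Debian.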
  13. groups = {
          'germany': {
              'metadata': {'nameservers': ['8.8.8.8']},
              'subgroups': ['frankfurt'],
          },
          'frankfurt': {
              'metadata': {'nameservers': ['8.8.4.4']},
              'members': ['node-1'],
          },
      }
      nodes = {
          'node-1': {
              'metadata': {'nameservers': atomic(['10.1.2.3'])},
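The slide shows layered metadata: `germany` feeds `frankfurt`, which feeds `node-1`, and `atomic()` stops the list from merging. A simplified sketch in the spirit of these semantics (not BundleWrap's actual implementation):

```python
# Simplified layered metadata merge: nested dicts merge, lists from
# group layers accumulate, and atomic() makes a value replace whatever
# it inherited. Illustrative only.
class _Atomic(list):
    """Marker type: merge() replaces instead of merging this value."""

def atomic(value):
    return _Atomic(value)

def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, _Atomic):
            result[key] = list(value)          # replace inherited value
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + value  # lists accumulate
        else:
            result[key] = value
    return result

# germany -> frankfurt -> node-1, as on the slide
layered = merge({'nameservers': ['8.8.8.8']}, {'nameservers': ['8.8.4.4']})
final = merge(layered, {'nameservers': atomic(['10.1.2.3'])})
```

Without `atomic()`, `node-1` would end up with all three nameservers; with it, the node keeps only `['10.1.2.3']`.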
  14. [Diagram: hash tree. Each item (file, dir, svc, pkg) has a hash; the item hashes roll up into a per-node hash, and the node hashes roll up into the repo hash.]
  15. $ bw hash -d gce.smedia-1.netbox file:/etc/hosts
      content_hash 4deb6fa4dfbfc49197d2010e8c3243734b4ec9d5
      group root
      mode 0644
      owner root
      type file
      $ bw hash -d gce.smedia-1.netbox
      d97ae6dbf84a38[…]c372febd90e0 file:/etc/crontab
      1744c47d6eade0[…]5a246abac701 file:/etc/hosts
      0b7f33337af160[…]e53aadbac864 pkg_apt:nginx
      $ bw hash gce.smedia-1.netbox file:/etc/hosts
      1744c47d6eade097d10e8c3243735a246abac701
  16. $ bw hash -d
      59881a3be29f[…]bee39bcf7 gce.smedia-1.gateway
      915f56ca0785[…]3aa9cf7d3 gce.smedia-1.hubot
      9ecef21615f0[…]e5a8635c4 gce.smedia-1.inforelay
      a6de0dd8a187[…]2f9426ac7 gce.smedia-1.netbox
      9e7468028d2a[…]e1f257155 gce.smedia-1.nexus
      $ bw hash
      a43042c583392f46dce1e25f0bef1949b7122934
      $ bw hash gce.smedia-1.netbox
      a6de0dd8a187446e308a989861369152f9426ac7
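The hash tree behind these commands can be sketched as a toy Merkle construction: item hashes feed node hashes, node hashes feed the repo hash, so any attribute change bubbles all the way up. (Real `bw hash` values are computed from canonical item attributes; the line format below is made up.)

```python
import hashlib

# Toy hash tree: item -> node -> repo, each level a SHA-1 over the
# sorted hashes of the level below. Illustrative, not BundleWrap's code.
def sha1(text):
    return hashlib.sha1(text.encode()).hexdigest()

def item_hash(attrs):
    # hash the sorted attribute pairs of one item (file, pkg, svc, ...)
    return sha1(repr(sorted(attrs.items())))

def node_hash(items):
    return sha1('\n'.join(
        f'{item_id} {item_hash(attrs)}' for item_id, attrs in sorted(items.items())))

def repo_hash(nodes):
    return sha1('\n'.join(
        f'{name} {node_hash(items)}' for name, items in sorted(nodes.items())))

repo = {'netbox': {'file:/etc/hosts': {'mode': '0644', 'owner': 'root'}}}
before = repo_hash(repo)
repo['netbox']['file:/etc/hosts']['mode'] = '0600'   # flip one attribute
after = repo_hash(repo)
```

Flipping a single file mode changes `before` into a different `after`, which is what makes a single repo hash useful as a fast diff of 700 nodes.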
  17. [Chart: commits per year, 2009–2017; y-axis 0–10,000]
  18. [Chart: number of bundles, 2009–2017, BCFG2 vs. BundleWrap; y-axis 0–200]
  19. [Chart: pull requests per month, Mar 2016 – Jul 2017; y-axis 0–300]
  20. $ bw lock add mynode -i file:/etc/hosts -e 1h
      ✓ mynode  locked with ID BNOA (expires in 1h)
      $ bw lock show mynode
      ╭────────┬──────┬─────────────────────┬─────────────────────┬───────┬─────────────────╮
      │ node   │ ID   │ created             │ expires             │ user  │ items           │
      ├────────┼──────┼─────────────────────┼─────────────────────┼───────┼─────────────────┤
      │ mynode │ BNOA │ 2017-08-17 11:03:53 │ 2017-08-17 12:03:53 │ trehn │ file:/etc/hosts │
      ╰────────┴──────┴─────────────────────┴─────────────────────┴───────┴─────────────────╯
      $ BW_IDENTITY=notme bw apply mynode
      […]
      » mynode  hosts  file:/etc/hosts  skipped (soft locked)
      […]
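The skip decision implied by this slide can be sketched in a few lines (illustrative, not BundleWrap's code): an item is skipped when another user holds an unexpired lock covering it.

```python
from datetime import datetime

# Hypothetical soft-lock check: skip an item if someone else's
# unexpired lock covers it; the lock holder themselves is not blocked.
def should_skip(item_id, locks, identity, now):
    return any(
        lock['user'] != identity
        and lock['expires'] > now
        and item_id in lock['items']
        for lock in locks
    )

# the lock from the slide's table
locks = [{'user': 'trehn', 'items': ['file:/etc/hosts'],
          'expires': datetime(2017, 8, 17, 12, 3, 53)}]
now = datetime(2017, 8, 17, 11, 30)
```

With this lock in place, `should_skip('file:/etc/hosts', locks, 'notme', now)` is true while `trehn` can still apply, matching the `BW_IDENTITY=notme` output above.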
  21. $ bw repo create
      $ cat .secrets.cfg
      # DO NOT COMMIT THIS FILE
      # share it with your team through a secure channel
      [generate]
      key = VzPbOWr-Oh65UUXgvN-b5WUdUBSu5gB0L8Fq-iUCkmo=
      [encrypt]
      key = m6Lvf37SHwTdQZpZ78eqGVoJLkoo5GAHJMAz5RhTOyU=
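The `[generate]` key makes deterministic secrets possible: the same key and identifier always yield the same password, so generated credentials never have to be committed to the repo, while the `[encrypt]` key is used for symmetric encryption of stored secrets. A sketch of the idea behind deterministic generation (this is NOT BundleWrap's actual derivation algorithm):

```python
import base64
import hashlib
import hmac

# Illustrative deterministic password derivation: HMAC the identifier
# with the secret key, then encode a fixed-length slice. Anyone with
# .secrets.cfg can re-derive every password; nobody else can.
def derive_password(key: bytes, identifier: str, length: int = 24) -> str:
    digest = hmac.new(key, identifier.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip('=')[:length]

key = b'not-a-real-key'  # stand-in for the [generate] key
pw = derive_password(key, 'postgres password on node-1')
```

Every call with the same key and identifier returns the same 24-character string, while a different identifier (or key) yields an unrelated one.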
  22. [Chart: contributors per month, 2009–2017; y-axis 0–25]
  23. [Chart: number of nodes, 2009–2017, BCFG2 vs. BundleWrap; y-axis 0–800]
  24. $ bw metadata --table gce is_production
      ╭─────────────────────────────┬───────────────╮
      │ node                        │ is_production │
      ├─────────────────────────────┼───────────────┤
      │ gce.smedia-1.bind-1         │ True          │
      │ gce.smedia-1.bitbucket      │ True          │
      │ gce.smedia-1.dashboard      │ True          │
      │ gce.smedia-1.firescope-test │ False         │
      │ gce.smedia-1.galaxy         │ True          │
      │ gce.smedia-1.gateway        │ False         │
      │ gce.smedia-1.hubot          │ True          │
      │ gce.smedia-1.inforelay      │ True          │
      │ gce.smedia-1.metrics        │ True          │
      │ gce.smedia-1.netbox         │ True          │
      │ gce.smedia-1.nexus          │ True          │
      │ gce.smedia-1.nipap          │ True          │
      │ gce.smedia-1.onetimesecret  │ True          │
  25. 'tunnels': [
          ('fra-1', 'ovh-1'),
          ('fra-1', 'pb-1'),
          ('fra-1', 'pb-2'),
          ('fra-1', 'pb-5'),
          ('haj1', 'haj2'),
          ('haj1', 'ovh-1'),
          ('haj1', 'pb-2'),
          ('haj2', 'fra-1'),
          ('haj2', 'pb-1'),
          ('haj2', 'pb-2'),
          ('haj2', 'pb-3'),
          ('lf', 'fra-1'),
          ('lf', 'haj1'),
  26. 'haj2': {
          'networks': [
              '10.3.0.0/16',
              '10.66.30.0/24',
              '10.68.3.0/24',
          ],
          'private_as_number': 64603,
      },
      'fra-1': {
          'networks': ['10.4.0.0/16'],
          'private_as_number': 64604,
      },
      'ovh-1': {
          'networks': ['10.20.0.0/24'],
  27. BundleWrap CIC: Who applied where, when? · show affected nodes for each commit · read-only API for metadata · auto-apply if a node goes stale · automated commits?