Database management with salt and other observations

This talk explores the Salt installation at Booking.com, where we run more than 15,000 minions, and touches on how we use Salt for database management.

Video: http://saltstack.com/large-scale-database-management-with-saltstack/

Pankaj Kaushal

March 10, 2015
Transcript

  1. me • 18 years of Linux • Systems Engineer, Booking • Flipkart - Yahoo - Monster • Not a DBA
  2. • Booking Infrastructure • History • How we Database • How we Salt • Reactors/Grains/API • DB Tool • Issues and Limitations
  3. Booking Infra • 5 Datacenters - 2 large - 3rd large on the way • Thousands of physical machines • Mostly Linux • Close to 50% MySQL databases
  4. Booking Infra • Replication chains with 100s of slaves • Large de-normalised datasets • SAN (explanation later…)
  5. History • Func • Git hooks / Puppet kick • Database automation • Ad-hoc task execution
  6. History • Func • Deployments - for pushing out software • Interaction between Configs & Services (git hooks) • Remote management of PCI resources
  7. Get the func out • Dead project • Has issues with long-running commands and at large scale • Replace func with something nice
  8. Why Salt • Asynchronous/Scalable • Not dependent on SSH • Salt has a rich task-execution and reporting arsenal
  9. Why Salt • The salt cp module easily handles file distribution tasks (see the sketch below) • Continue to use Puppet for config management for now • Use custom grains, reactors and the API to replace func use cases
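
    For illustration, the same file distribution can be driven from Python via salt's LocalClient. A minimal sketch, where the 'dbmisc' nodegroup and both file paths are made-up examples:

        # Push a script from the master fileserver to every minion in a nodegroup.
        import salt.client

        local = salt.client.LocalClient()
        # cp.get_file fetches salt://tools/check_db.sh from the master's
        # file_roots and writes it to the given destination on each minion.
        ret = local.cmd('dbmisc', 'cp.get_file',
                        ['salt://tools/check_db.sh', '/usr/local/bin/check_db.sh'],
                        expr_form='nodegroup')
        for minion, dest in sorted(ret.items()):
            print('{0}: {1}'.format(minion, dest))
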
  10. • Master of masters • Syndic ring • Redis as the returner • Standby masters • Failover master
  11. design • Independent Masters for PCI and remote Environments • Syndic helps balance the minion load • Simple failover means anyone can do it
  12. design • Using ZeroMQ as transport • Keys shared with puppet • Master and Syndic info is provided to minions by Hiera
  13. Salt Master • Really simple master config • 10K machines over 3 masters

    # Salt master conf file.
    order_masters: True
    max_open_files: 100000
    worker_threads: 24
    auto_accept: True
    syndic_wait: 30
    file_roots:
      base:
        - /srv/salt
        - /var/cache/salt/minion/files/base
    pillar_roots:
      base:
        - /srv/pillar
    pillar_opts: False
    show_timeout: True
    log_level: warning
    timeout: 90
    job_cache: False
    module_dirs:
      - /srv/salt/_modules
  14. NodeGroups • Simple script to generate nodegroups.conf from CMDB (a sketch follows below) • Lets us target similar machines together

    salt -N <nodegroup-name> cmd.run "ls /tmp/a"
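
    A minimal sketch of such a generator, assuming a hypothetical CMDB HTTP endpoint that returns a list of {'fqdn': ..., 'role': ...} records:

        # Generate /etc/salt/master.d/nodegroups.conf from a CMDB dump.
        import requests
        import yaml

        CMDB_URL = 'https://cmdb.example.booking.com/api/hosts'  # hypothetical

        hosts = requests.get(CMDB_URL).json()
        groups = {}
        for host in hosts:
            # Group minions by their CMDB role, e.g. 'mysql-slave'.
            groups.setdefault(host['role'], []).append(host['fqdn'])

        # A nodegroup is a compound target; L@ introduces an explicit minion list.
        nodegroups = {role: 'L@' + ','.join(sorted(fqdns))
                      for role, fqdns in groups.items()}

        with open('/etc/salt/master.d/nodegroups.conf', 'w') as fh:
            yaml.safe_dump({'nodegroups': nodegroups}, fh, default_flow_style=False)
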
  15. grains • Custom grains (sketch below) • Status: Live / Not live • DMI / RAID / storage info • Fibre Channel info
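
    A custom grain is a Python module dropped into /srv/salt/_grains (pushed out with saltutil.sync_grains); every public function returns a dict that is merged into the minion's grains. A minimal sketch of the live/not-live status grain; the module name and flag-file path are assumptions:

        # /srv/salt/_grains/status.py  (hypothetical module name)
        import os

        def status():
            '''Mark whether this minion is serving live traffic.'''
            if os.path.exists('/etc/booking/live'):  # assumed flag file
                return {'status': 'live'}
            return {'status': 'not_live'}
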
  16. custom grains

    storage:
        ----------
        host1:
            ----------
            port_id: 0x662300
            switch_name: fca-edge-102.XXX.XXX.booking.com
            switch_wwpn: 10:00:00:05:33:d3:15:6a
            wwnn: 50:01:43:80:24:22:03:d5
            wwpn: 50:01:43:80:24:22:03:d4
    ---- snip snip ----
    switches:
        - fca-edge-102.XXX.XXX.booking.com
        - fcb-edge-102.XXX.XXX.booking.com
    filer: filer-prod-201.XXX.XXX.booking.com
    wwpns:
        - 50:01:43:80:24:22:03:d4
        - 50:01:43:80:24:22:03:d6
        - 50:01:43:80:24:22:04:f8
        - 50:01:43:80:24:22:04:fa
    raid_controllers:
        |_
          ----------
          brand: Hewlett-Packard Company
          model: Smart Array P410i
    disks_raid:
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
    db_pools: None
    bookings_DATACENTER: XXX4
    bookings_DB_TYPE: cold_standby
    bookings_DJANGO_SETTINGS_MODULE: serverdb2.settings
    bookings_ENVIRON: production
    bookings_HTTP_PROXY: http://webproxy.XXX.XXX.booking.com:3128/
    bookings_LDAP: 1
    bookings_NODE: mc102bpimdb-01.XXX.XXX.booking.com
    bookings_PCI: 0
  17. Use case • Give me the names of all the edge switches for the hosts that are connected to filer-prod-204

    sudo salt -G 'stor_filers:filer-prod-204' grains.get stor_switches

        stor_switches:
            - fc-27.ams4.lom.booking.com
            - fc-28.ams4.lom.booking.com
        stor_wwpns:
            - 50:01:43:80:04:c5:0b:f4
            - 50:01:43:80:04:c5:0b:f6
            - 50:01:43:80:12:08:8b:e6
        stor_filers:
            - filer13
            - filer14
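
    The same query can be issued from Python as well. A minimal sketch using salt's LocalClient with grain targeting:

        # Target every minion whose stor_filers grain contains
        # filer-prod-204, then read its stor_switches grain.
        import salt.client

        local = salt.client.LocalClient()
        ret = local.cmd('stor_filers:filer-prod-204', 'grains.get',
                        ['stor_switches'], expr_form='grain')
        for minion, switches in sorted(ret.items()):
            print('{0}: {1}'.format(minion, switches))
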
  18. CMDB & Grains • Automatically reflect the production situation back into the CMDB (see the sketch below) • CMDB stays in sync with production realities
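
    One way to do the sync, as a minimal sketch: collect grains with LocalClient and push them to a hypothetical CMDB REST endpoint keyed by minion id:

        # Push storage-related grains back into the CMDB.
        import requests
        import salt.client

        CMDB_URL = 'https://cmdb.example.booking.com/api/hosts'  # hypothetical

        local = salt.client.LocalClient()
        # Collect the storage grains from every minion.
        grains = local.cmd('*', 'grains.item', ['stor_filers', 'stor_switches'])
        for minion, data in sorted(grains.items()):
            requests.put('{0}/{1}'.format(CMDB_URL, minion), json=data)
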
  19. salt reactors • We use salt reactors for • Git hooks • Puppet • DNS • Grains sync
  20. salt reactors

    reactor:
      - 'dba-tool/fc/create':
        - /srv/reactor/fc_create.sls
      - 'puppet/git/update':
        - /srv/reactor/puppet_git_update.sls
    …

    # /srv/reactor/fc_create.sls
    fc_create:
      local.cmd.run:
        - tgt: 'storagetools-201'
        - arg:
          - /usr/local/bin/fc_provision.sh
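
    The reactor fires when a matching event lands on the master's event bus. A minimal sketch of a DBA tool raising the dba-tool/fc/create event from a minion; the payload keys are illustrative:

        # Fire a custom event at the master; the reactor above maps the tag
        # 'dba-tool/fc/create' to /srv/reactor/fc_create.sls.
        import salt.client

        caller = salt.client.Caller()
        caller.cmd('event.fire_master',
                   {'host': 'db-123', 'lun_size_gb': 500},  # illustrative payload
                   'dba-tool/fc/create')
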
  21. DB • Masters • Local storage • SAN (more on this shortly) • Materialised datasets
  22. Salt API • Allows you to call salt commands and modules • We use this for inter-process communication for • DBA tools • Storage allocation
  23. Salt API

    rest_cherrypy:
      port: 8000
      ssl_crt: /etc/pki/tls/certs/localhost.crt
      ssl_key: /etc/pki/tls/certs/localhost.key
      webhook_url: /hook
      thread_pool: 100
      socket_queue_size: 40
      expire_responses: True
      collect_stats: True
    external_auth:
      pam:
        sdev:
          - .*
    client_acl:
      sdev:
        - .*
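
    With this config, any tool can drive salt over HTTPS: authenticate against /login with PAM, then post a lowstate call. A minimal sketch; the master host name and password are placeholders:

        # Call salt through rest_cherrypy: login, then run a command.
        import requests

        API = 'https://salt-master.example.booking.com:8000'  # placeholder

        session = requests.Session()
        # /login returns a token to pass in the X-Auth-Token header.
        login = session.post(API + '/login', verify=False, json={
            'username': 'sdev', 'password': 'secret', 'eauth': 'pam'})
        token = login.json()['return'][0]['token']

        # A lowstate dict mirrors the salt CLI: client / tgt / fun / arg.
        resp = session.post(API, verify=False,
                            headers={'X-Auth-Token': token},
                            json=[{'client': 'local',
                                   'tgt': 'storagetools-201',
                                   'fun': 'cmd.run',
                                   'arg': ['/usr/local/bin/fc_provision.sh']}])
        print(resp.json()['return'])
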
  24. Salt API • Use Salt API to • Create/modify LUNs on NetApp • Create and map zones on FC switches
  25. Salt API • Use Salt API to • Create/modify LUNs on NetApp • Using the WFA API • Create and map zones on FC switches • cmd.run
  27. DBA tools • Currently mostly use func • Being ported to salt using a combination of salt-api and salt reactors
  28. ISSUES • Very easy to break salt minions everywhere • e.g. push a bad custom grain • No easy way to fix salt with salt • Would be great to have a barebones or salt-minimal package
  29. ISSUES • Salt-api is slower than using func at the moment • Minions may crash if one of the masters is unavailable (issue 14128) • Reactors don't work with multi-master (issues 17033 and 13879) • Node-groups and grains are mutually exclusive