Database management with salt and other observations

This talk explores the Salt installation at booking.com, where we run more than 15,000 minions, and touches on how we use Salt for database management.

Video: http://saltstack.com/large-scale-database-management-with-saltstack/

Pankaj Kaushal

March 10, 2015

Transcript

  1. Database management with salt and other observations. Pankaj Kaushal, Systems Engineer, booking.com
  2. me • 18 years of Linux • Systems Engineer, Booking • Flipkart - Yahoo - Monster • Not a DBA
  3. you: Salt? Custom grains? Reactors? API?

  4. • How we provision databases • How we manage databases • How salt plays a role
  5. • Booking Infrastructure • History • How we Database • How we Salt • Reactors/Grains/API • DB Tool • Issues and Limitations
  6. booking.com offers online accommodation booking • Amsterdam • Seattle

  7. Booking Infra • 5 Datacenters: 2 large, a 3rd large one on the way • Thousands of physical machines • Mostly Linux • Close to 50% MySQL databases
  8. Booking Infra • Replication chains with 100s of slaves • Large de-normalised datasets • SAN (explanation later)
  9. History • Func • Git hooks / Puppet kick • Database automation • Ad-hoc task execution
  10. History • Func • Deployments - for pushing out software • Interaction between configs & services (git hooks) • Remote management of PCI resources
  11. GET THE FUNC OUT

  12. Get the func out • Dead project • Has issues with long-running commands and at large scale • Replace func with something nice
  13. SALT

  14. Why Salt • Asynchronous/Scalable • Not dependent on SSH • Salt has a rich task-execution and reporting arsenal
  15. Why Salt • Salt's cp module easily performs file distribution tasks • Continue to use Puppet for config management for now • Use custom grains, reactors and the API to replace func use cases
  16. our Design

  17. • Master of masters • Syndic ring • Redis for returner • Standby masters • Failover master
  18. design • Independent masters for PCI and remote environments • Syndic helps balance the minion load • Simple failover means anyone can do it
  19. design • Using ZeroMQ as transport • Keys shared with Puppet • Master and Syndic info is provided to minions by Hiera
  20. design • Multi-master provides failover for minions • PyPI releases
  21. design

  22. Salt Master
    # Salt master conf file.
    order_masters: True
    max_open_files: 100000
    worker_threads: 24
    auto_accept: True
    syndic_wait: 30
    file_roots:
      base:
        - /srv/salt
        - /var/cache/salt/minion/files/base
    pillar_roots:
      base:
        - /srv/pillar
    pillar_opts: False
    show_timeout: True
    log_level: warning
    timeout: 90
    job_cache: False
    module_dirs:
      - /srv/salt/_modules
    • Really simple master config • 10K machines over 3 masters
  23. Roles: birds of the same feather

  24. NodeGroups • Simple script to generate nodegroups.conf from the CMDB (sketched below) • Lets us target similar machines together
    salt -N <nodegroup-name> cmd.run "ls /tmp/a"
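
The generation script itself is not shown in the deck; a minimal sketch, assuming a hypothetical CMDB HTTP endpoint that returns JSON records with hostname and role fields (and the requests library), might look like this:

    #!/usr/bin/env python
    """Minimal sketch: render nodegroups.conf from a CMDB dump."""
    import requests  # assumed available; any HTTP client would do

    CMDB_URL = "http://cmdb.example.com/api/hosts"    # hypothetical endpoint
    OUTPUT = "/etc/salt/master.d/nodegroups.conf"

    def build_groups(hosts):
        """Group hostnames by their CMDB role."""
        groups = {}
        for host in hosts:
            groups.setdefault(host["role"], []).append(host["hostname"])
        return groups

    def render(groups):
        """Emit Salt nodegroup definitions; L@ is the explicit host-list matcher."""
        lines = ["nodegroups:"]
        for role, members in sorted(groups.items()):
            lines.append("  %s: 'L@%s'" % (role, ",".join(sorted(members))))
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        hosts = requests.get(CMDB_URL).json()
        with open(OUTPUT, "w") as conf:
            conf.write(render(build_groups(hosts)))

Once the master is restarted to pick up the new file, groups are targeted exactly as on the slide, e.g. salt -N mysql-master cmd.run 'uptime' (the group name here is illustrative).
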
  25. grains: we use grains to find out what's going on

  26. grains • Custom grains • Status: live / not live • DMI, RAID and storage info • Fiber Channel info
  27. grains • Database type (master/slave/intermediate master) • Puppet facts come for free • CMDB: kept in sync using grains (a minimal custom-grain sketch follows)
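
The grain code is not included in the deck. As a rough illustration: a custom grain is a Python module synced to minions (for example from a _grains directory under file_roots), where each function returns a dict that is merged into the minion's grains. Everything below (the file paths, grain names and live-marker convention) is a hypothetical sketch, not the actual booking.com implementation:

    # _grains/bookings_status.py -- synced to minions with saltutil.sync_grains
    """Hypothetical custom grains: host status and database role."""
    import os

    def host_status():
        """Report whether the host is taking live traffic (marker file is assumed)."""
        status = "live" if os.path.exists("/etc/host_is_live") else "not_live"
        return {"bookings_status": status}

    def db_type():
        """Report the MySQL role written out by provisioning (path is assumed)."""
        role_file = "/etc/mysql_role"
        if not os.path.isfile(role_file):
            return {}
        with open(role_file) as handle:
            role = handle.read().strip()  # e.g. master / slave / intermediate_master
        return {"bookings_DB_TYPE": role}

Once synced (salt '*' saltutil.sync_grains), such values can be targeted like any built-in grain, e.g. salt -G 'bookings_DB_TYPE:slave' test.ping.
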
  28. custom grains
    storage:
        ----------
        host1:
            ----------
            port_id: 0x662300
            switch_name: fca-edge-102.XXX.XXX.booking.com
            switch_wwpn: 10:00:00:05:33:d3:15:6a
            wwnn: 50:01:43:80:24:22:03:d5
            wwpn: 50:01:43:80:24:22:03:d4
    ---- snip snip ----
    switches:
        - fca-edge-102.XXX.XXX.booking.com
        - fcb-edge-102.XXX.XXX.booking.com
    filer: filer-prod-201.XXX.XXX.booking.com
    wwpns:
        - 50:01:43:80:24:22:03:d4
        - 50:01:43:80:24:22:03:d6
        - 50:01:43:80:24:22:04:f8
        - 50:01:43:80:24:22:04:fa
    raid_controllers:
        |_
          ----------
          brand: Hewlett-Packard Company
          model: Smart Array P410i
    disks_raid:
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
        |_
          ----------
          disk_type: HDD
          size_in_gb: 300
    db_pools: None
    bookings_DATACENTER: XXX4
    bookings_DB_TYPE: cold_standby
    bookings_DJANGO_SETTINGS_MODULE: serverdb2.settings
    bookings_ENVIRON: production
    bookings_HTTP_PROXY: http://webproxy.XXX.XXX.booking.com:3128/
    bookings_LDAP: 1
    bookings_NODE: mc102bpimdb-01.XXX.XXX.booking.com
    bookings_PCI: 0
  29. Use case • Give me the names of all edge switches for all the hosts connected to filer-prod-204
    sudo salt -G 'stor_filers:filer-prod-204' grains.get stor_switches
    stor_switches:
        - fc-27.ams4.lom.booking.com
        - fc-28.ams4.lom.booking.com
    stor_wwpns:
        - 50:01:43:80:04:c5:0b:f4
        - 50:01:43:80:04:c5:0b:f6
        - 50:01:43:80:12:08:8b:e6
    stor_filers:
        - filer13
        - filer14
  30. CMDB & Grains • Automatically reflect the production situation into the CMDB • The CMDB is in sync with production realities
  31. CMDB & Grains (flow diagram): scheduler on the minion → event with_grains = True → master → reactor → runner → diff grains → update CMDB (a hedged runner sketch follows)
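
The runner in that flow is not shown in the deck either; a sketch, assuming a hypothetical CMDB REST API and illustrative grain names, could look roughly like this (a reactor SLS would pass the minion id and grains from the event into sync_cmdb):

    # Hypothetical runner module (placed in a directory listed in runner_dirs).
    """Diff grains reported by a minion against the CMDB and push any drift."""
    import requests  # assumed available

    CMDB_API = "http://cmdb.example.com/api/hosts/%s"   # hypothetical endpoint

    # Grains mirrored into CMDB fields; the names are illustrative.
    TRACKED = ("bookings_DB_TYPE", "bookings_DATACENTER", "stor_filers")

    def sync_cmdb(minion_id, grains):
        """Update only the CMDB fields that no longer match the minion's grains."""
        url = CMDB_API % minion_id
        record = requests.get(url).json()

        changes = {}
        for key in TRACKED:
            if grains.get(key) != record.get(key):
                changes[key] = grains.get(key)

        if changes:
            requests.patch(url, json=changes)   # the CMDB follows production reality
        return {"minion": minion_id, "updated": sorted(changes)}
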
  32. Salt Reactors

  33. salt reactors • We use salt reactors for • Git hooks • Puppet • DNS • Grains sync
  34. salt reactors
    reactor:
      - 'dba-tool/fc/create':
        - /srv/reactor/fc_create.sls
      - 'puppet/git/update':
        - /srv/reactor/puppet_git_update.sls
    …
    fc_create:
      local.cmd.run:
        - tgt: 'storagetools-201'
        - arg:
          - /usr/local/bin/fc_provision.sh
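
For context, the 'dba-tool/fc/create' event that this reactor matches has to be fired from somewhere; one way, assuming a Salt release that ships the event.send execution module (2014.7 or later) and an illustrative payload, is:

    # Hypothetical trigger: fire a reactor event from a minion-side tool.
    import salt.client

    caller = salt.client.Caller()       # uses the local salt-minion configuration
    caller.cmd(
        "event.send",
        "dba-tool/fc/create",           # tag matched by the reactor config above
        {"host": "db-101.example.com",  # illustrative payload
         "lun_size_gb": 500},
    )
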
  35. Databases: how we do it

  36. DB • Masters • Local storage • SAN (more on this shortly) • Materialised datasets
  37. DB • Sharded datasets • 100s of slaves • Cross-DC replication
  38. FC-SAN 101 • Fiber Channel switches • NetApp appliances • LUNs as block storage • multipathd
  39. Salt API

  40. Salt API • Allows you to call salt commands and modules • We use this for inter-process communication for • DBA tools • Storage allocation
  41. Salt API
    rest_cherrypy:
      port: 8000
      ssl_crt: /etc/pki/tls/certs/localhost.crt
      ssl_key: /etc/pki/tls/certs/localhost.key
      webhook_url: /hook
      thread_pool: 100
      socket_queue_size: 40
      expire_responses: True
      collect_stats: True
    external_auth:
      pam:
        sdev:
          - .*
    client_acl:
      sdev:
        - .*
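
To show how a tool can drive this API (the deck does not include client code), here is a minimal sketch of a one-shot call to the rest_cherrypy /run endpoint using PAM credentials; the master hostname, password and target are placeholders:

    # Hypothetical salt-api client: run a command on a minion via /run.
    import requests  # assumed available

    API = "https://saltmaster.example.com:8000"   # placeholder master address

    payload = [{
        "client": "local",                  # same semantics as the `salt` CLI
        "tgt": "storagetools-201",
        "fun": "cmd.run",
        "arg": ["/usr/local/bin/fc_provision.sh"],
        "username": "sdev",
        "password": "********",             # placeholder credential
        "eauth": "pam",
    }]

    resp = requests.post(
        API + "/run",
        json=payload,
        headers={"Accept": "application/json"},
        verify=False,   # the example config uses a self-signed localhost cert
    )
    print(resp.json()["return"])
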
  42. Salt API • Use Salt API to • Create/modify LUNs on NetApp • Create and map zones on FC switches
  43. Salt API • Use Salt API to • Create/modify LUNs on NetApp, using the WFA API • Create and map zones on FC switches, via cmd.run
  44. Storage allocation (flow diagram): create request → salt api → create zone → create LUN → puppet

  45. Salt API • Use Salt API to • Create/modify LUNs on NetApp, using the WFA API • Create and map zones on FC switches, via cmd.run
  46. DBA tools • Currently mostly use func • Being ported to salt using a combination of salt-api and salt reactors
  47. ISSUES

  48. ISSUES • Very easy to break salt minions everywhere, e.g. by pushing a bad custom grain • No easy way to fix salt with salt • Would be great to have a barebones or salt-minimal
  49. ISSUES • Salt-api is slower than using func at the moment • Minions may crash if one of the masters is unavailable (14128) • Reactors don't work with multi-master (17033 and 13879) • Nodegroups and grains are mutually exclusive
  50. Thank you! • #salt on freenode • saltstack • booking.com • @spo0nman