Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL DevOps @ Outbrain

Shlomi Noach
September 27, 2015

MySQL DevOps @ Outbrain

This session exposes tools and techniques used at Outbrain to manage its MySQL servers, bridging the gaps between our Developers, DBAs and Ops.

Shlomi Noach

September 27, 2015
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. Tools bridging the gap between MySQL engineering, ops & DBAs

    Shlomi Noach MySQL DevOps @ Outbrain Shlomi Noach
  2. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain About me • Engineer, DBA • Working with MySQL since 2000 • Formerly consultant, instructor • Author of common_schema, openark-kit, propagator • Write at http://openark.org • Work at the infrastructure team, Outbrain
  3. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain About Outbrain • The leading content discovery platform on the web • Embedded in over 90,000 websites • Serves over 150 million unique US visitors, 15 billion pages and 100 billion recommendations per month • You may not be familiar with us by name, but have met us frequently. • We aim to provide with reliable content to our users.
  4. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain PAID DISCOVERY INTERNAL DISCOVERY
  5. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain About Outbrain • Managing total of over 2,000 servers (Hadoop, Cassandra, MySQL, web services, …) • Processing about 1 Petabyte of information • Over 70 engineers • Doing continuous deployments • Fans and supporters of open source • Have "Ownership" culture: "You build it, you run it!" ◦ Must be supported by technology
  6. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain What's DevOps? • Or, DevDbaOps? ?
  7. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain What's DevOps? • Often described as developers doing ops work, or ops doing engineering work • I see this more as the integration between the groups • Avoiding the scenario where parties have no control of parts of their domain. ◦ Tools ◦ Techniques ◦ Culture
  8. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain What's DevOps? • With good DevOps, you get: ◦ Ownership ◦ Visibility ◦ Action-ability (word has just been invented and will be used as axiom) • Allowing engineers own and be responsible for their apps. ◦ No need for ops telling them something is wrong ◦ No need to sit with ops to understand what is wrong ◦ No need to ask ops to deploy changes • All the while giving ops visibility into engineers actions
  9. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Tribute: automation • We use chef for automation • Some databags leftovers, changing to attributes • Everything is under version control ◦ Allows ops/DBAs easily add/remove packages ◦ Different treatment for masters ◦ Different my.cnf settings based on MySQL role ◦ Different my.cnf settings based on hardware ◦ Setting up backup servers ◦ More...
  10. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Automation: One Ring to Rule Them All • Outbrain's onering is an orchestration solution • Provisioning servers: from operating system through packages (via chef integration) to application deployment (via glu integration) ◦ Allows for a one click "I want a host with MyService tomcat service", or "I want a host with MySQL server" • Then acting as inventory service ◦ "give me all MySQL servers in the LA data center" ◦ "which disks do our OLAP servers use?" • https://github.com/outbrain/onering
  11. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Onering & pmysql: on-demand semi-automated actions • pmysql is a parallel MySQL client (originally developed by Domas Mituzas) • Using onering's API, we can: curl "https://my.onering.service/api/devices /list/name/where/chef.run_list/mysql/name/olap? format=txt" | pmysql -pmypass "stop slave" curl "https://my.onering.service/api/devices /list/name/where/chef.run_list/mysql/name/olap? format=txt" | pmysql -pmypass "select @@version" | grep tokudb | awk '{print $1}' | pmysql -pmypass "stop slave"
  12. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Visibility • A classic developers-ops collision: slow queries ◦ Ops notice increased I/O, slave lags ◦ What do they know of the domain of the problem? ◦ Developers see long response times ◦ What visibility do they get?
  13. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain What makes Anemometer such a good DevOps tool? • It provides visibility to everyone • The engineer doesn't need to know what slow logs are, where they are located, how to interpret them. • It promotes ownership in that it gets the drill down per query/per host/per service • The Permalink. How such a small thing can make all the difference • Ops can hand over what they think is the "guilty query"
  14. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemometer Host Anemometer @ Outbrain, behind the scenes MySQL Slow log MySQL Slow log MySQL Slow log Slow log Slow log Slow log logstash logstash logstash pt-query-digest Web interface
  15. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Multiple services, multiple MySQL hosts: who makes it slow? MySQL Slow log MySQL Slow log MySQL Slow log service service service • What is our analysis granulation? • Are slow logs caused by a query? • Affected by a loaded MySQL host? • By a loaded service?
  16. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemometer, collecting the slow logs input { tcp { port => 23306 type => "mysql-slow" mode => "server" } } filter { dns { reverse => [ "@source_host", "source_host_name" ] action => "replace" } } output { file { type => "mysql-slow" message_format => "%{@message}" path => "/path/to/slow_logs/logstash/%{@source_host}-mysql-slow.log" } }
  17. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemometer, rotating the slow logs /outbrain/slow_logs/logstash/*.log { daily nocompress size 1 missingok ifempty copytruncate prerotate /bin/bash /var/www/html/anemometer/outbrain/pre_rotate.sh $1 endscript nosharedscripts rotate 100 } • logstash streams logs onto the anemometer machine • We choose not to aggregate them into one; the target file name indicates the source host name
  18. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemometer, processing the slow logs #!/bin/bash rotated_slow_log_file=$1 rotated_slow_log_file_path=$(dirname $rotated_slow_log_file) rotated_slow_log_file_name=$(basename $rotated_slow_log_file) hostname=${rotated_slow_log_file_name%%-mysql-slow.log*} /bin/grep -v "^$" $rotated_slow_log_file | /usr/bin/pt-query-digest --user=... --password=... \ --review u=,p=,h=localhost,D=...,t=global_query_review \ --history u=,p=,h=localhost,D=...,t=global_query_review_history \ --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}=\"${hostname}\" and \$event->{clustername}=\"${clustername}\"" \ --no-report --group-by-extra=host • Reading files per mysql-host, adding host & cluster • Secondary grouping by client-host
  19. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemomaster: visibility into master DML • One of those "How did we ever live without it?" tools. • Provides near real time (10 minute granularity) visibility into queries issued on master. • Got an unexpected burst of INSERTs? Anemomaster provides a quick and accurate access into the specific "guilty" query. • And ops take a permalink to the owner. • "Anemomaster" is a nickname. This is Anemometer on top of binary log analysis instead of slow log, analyzing number of executions instead of total run time. • Also writing all DMLs to graphite.
  20. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemomaster • Pinpointing count executions of a specific UPDATE query • This query is owned by a known team.
  21. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemomaster Host Anemomaster @ Outbrain, behind the scenes MySQL Master Binary log MySQL Slave pt-query-digest Relay log MySQL Master Binary log MySQL Slave pt-query-digest Relay log Binary log Web interface
  22. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemomaster, processing the binary logs /usr/bin/mysql -umy_user -pmy_password -e 'flush relay logs\G;' sleep 1 binlog_file=$(ls -tr /path/to/mysql/mysqld-relay-bin.[0-9]* | tail -n 2 | head -n 1) mysqlbinlog $binlog_file | /usr/bin/pt-query-digest \ --type binlog --order-by Query_time:cnt --group-by fingerprint \ --limit 100 --review h=myhost,D=anemomaster,t=global_query_review \ --history h=myhost,D=anemomaster,t=global_query_review_history \ --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event- >{hostname}=\"$(hostname)\" and \$event->{clustername}=\"$ {clustername}\" and \$event->{host}=\"n/a\" " \ --no-report • Actually processing the relay logs on slaves • Assumes SBR, work in progress for RBR
  23. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Anemomaster, writing to graphite query=" select ... " mysql anemomaster --silent --silent --raw -e "$query" | while IFS=$'\t' read -r -a result_values do fingerprint_cluster=${result_values[0]} ; fingerprint_count=${result_values[1]} ; fingerprint_query=${result_values[2]} ; fingerprint_query=$(echo $fingerprint_query | sed -r -e "s/^(-- .*)]//g") fingerprint_query=$(echo $fingerprint_query | tr '\n' ' ' | tr '\r' ' ' | tr '\t' ' ') fingerprint_query=${fingerprint_query%%(*} fingerprint_query=${fingerprint_query%%,*} fingerprint_query=${fingerprint_query%% set *} fingerprint_query=${fingerprint_query%% SET *} fingerprint_query=${fingerprint_query%% where *} fingerprint_query=${fingerprint_query%% WHERE *} fingerprint_query=${fingerprint_query%% join *} fingerprint_query=${fingerprint_query%% JOIN *} fingerprint_query=${fingerprint_query%% using *} fingerprint_query=${fingerprint_query%% USING *} fingerprint_query=${fingerprint_query%% select *} fingerprint_query=${fingerprint_query%% SELECT *} fingerprint_query=$(echo $fingerprint_query | tr -d "\`") fingerprint_query=$(echo $fingerprint_query | tr -d "*") fingerprint_query=$(echo $fingerprint_query | tr " " "_") fingerprint_query=$(echo $fingerprint_query | tr "." "__") echo "data.mysql.${fingerprint_cluster}.mysql_dml.${fingerprint_query}.count \ ${fingerprint_count} $unixtime" | nc -w 1 graphite 3003 done
  24. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain audit_login: a login auditing plugin • Auditing every single login to our databases ◦ Keeping track of connects per minute, find problems ◦ Detecting unused accounts ◦ Detecting failed connects, taking action ◦ Detecting naughty scripts executed by developers (haha, got your IP!) ◦ And, well, auditing for the record
  25. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain audit_login, output {"ts":"2013-09-11 09:11:47","type":"successful_login","myhost":"gromit03","thread":"74153868", "user":"web_user","priv_user":"web_user","host":"web- 87.localdomain","ip":"10.0.0.87"} {"ts":"2013-09-11 09:11:55","type":"failed_login","myhost":"gromit03","thread":"74153869","use r":"backup_user","priv_user":"","host":"web-32","ip":"10.0.0.32"} {"ts":"2013-09-11 09:11:57","type":"failed_login","myhost":"gromit03","thread":"74153870","use r":"backup_user","priv_user":"","host":"web-32","ip":"10.0.0.32"} {"ts":"2013-09-11 09:12:48","type":"successful_login","myhost":"gromit03","thread":"74153871", "user":"root","priv_user":"root","host":"localhost","ip":"10.0.0.111"} {"ts":"2013-09-11 09:13:26","type":"successful_login","myhost":"gromit03","thread":"74153872", "user":"web_user","priv_user":"web_user","host":"web- 11.localdomain","ip":"10.0.0.11"}
  26. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain logstash read, transform, write Kibana Searchable via Lucene audit_login @ Outbrain, behind the scenes MySQL Master audit log MySQL Master audit log MySQL Master audit log audit meta log grep-able like mama used to make
  27. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain audit_login, logstash input { file { type => "mysql_audit_login" format => "json" sincedb_path => "/var/cache/logstash/.since_audit_login_log" sincedb_write_interval => 1 path => [ "/path/to/audit_login.log" ] } } filter { grep { type => "mysql_audit_login" match => [ "user", "monitoring_user" ] negate => true } grep { type => "mysql_audit_login" match => [ "user", "heartbeat_user" ] negate => true }
  28. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain audit_login, logstash output { rabbitmq { host => "my.rmq.host" user => "logstash_user" password => "logstash_password" exchange => "logstash.out" exchange_type => "fanout" type => "mysql_audit_login" } } output { tcp { type => "mysql_audit_login" mode => "client" host => "my.logstash.aggregator" port => "23307" message_format => "%{timestamp},%{type},%{myhost},%{thread},% {user},%{priv_user},%{host},%{ip}" } }
  29. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain audit_login Kibana @ Outbrain user:webapp AND myhost:east1 AND type:failed_login
  30. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Actionability • Can developers actually have controlled/automated actions on the database? • Such that everyone, including DBA/Ops, have visibility into? • Solving the above gives developers greater ownership over their domain, even within the database server.
  31. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Schema & data deployments • Who controls the database schema design? ◦ Ops? Is schema design within their domain? ◦ DBA? Expert about schema design, but is the DBA an expert about the business domain? ◦ Developers? Do they understand indexing? • With many dozens of engineers, we can't have the DBA be the single mutex for any schema change. • But the DBA must know what's going on.
  32. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Hive Meta Hive Meta MySQL & Hive servers @ Outbrain Slave MySQL Slave Slave DWH Slave Slave Meta Slave Hive Hive Hive Hive Hive Hive Hive Hive Hive Hive Hive Hive Hive Hive MySQL build server MySQL build server MySQL unit tests MySQL dev/sim MySQL dev/sim MySQL dev/sim MySQL dev/sim MySQL build server MySQL build server
  33. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Database servers @ Outbrain • Multiple servers • Multiple roles (OLTP, OLAP, Meta, Hive, others) • Multiple environments (dev, QA, Build, Production) • Multiple types (MySQL, Hive) • Multiple engineers who want to deploy to them all. How and where does a developer issue a CREATE TABLE?
  34. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Where & how to CREATE TABLE? • Not on all servers, since the table is irrelevant to some (e.g. relevant to OLTP, not to DWH). Who keeps track? • Shall the developer work them out one by one? Maybe a shell script? ◦ Does the developer know all the credentials on all the servers? • What if some deployment goes wrong? (Table already there; server cannot be accessed) ◦ Who keeps track and retries/fixes? ◦ Do you know who did what, when & where?
  35. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Are all databases equal? • We use different schema names on our test servers than we do on production ◦ Who keeps record? • We have services which use multiple schemas, all with exact same structure. Changes must apply on all schemas. ◦ We've just multiplied the number of deployments for our CREATE TABLE statement. • Different ports, different credentials, different FEDERATED/CONNECT targets...
  36. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Existing schema deployment tools • Some excellent open source solutions. Notable are Liquibase & flywaydb • However we found them to be unsuitable to our needs ◦ Both linear ◦ Multitenancy not easy to achieve ◦ Mathematically sound, but reality isn't mathematically sound. ◦ Require a lot of management to achieve visibility and ownership • Some Windows-Desktop apps around. Ahem.
  37. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator • Eventually we developed our own, "multi-everything" solution • Propagator provides ownership, action-ability and visibility • Developers specify what they want to execute, and for which database role • Propagator infers the hosts, the schema/query transformations and awaits your approval.
  38. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator: submit a script for deployment
  39. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator action-ability, visibility & ownership • Deployments are fully audited. Any failure is accounted for. • Propagator tells you who did what, when and on which host. Also encourages "why". • Engineers do most of the work with no intervention by DBA or ops • DBA has control over deployments. Can retry, restart, selectively skip or issue partial queries...
  40. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator: history visibility
  41. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator: visibility & ownership • The DBA may review deployments history • Has immediate feedback on anything that went wrong • Can most of the time figure out by herself why that went wrong and rerun the deployment • Otherwise knows who to contact • Commenting and tagging enhance visibility • Typical scenario: developer is new, unsure what went wrong (this can be considered as a bug, actually) • Next typical scenario: developer is experienced. Everything works.
  42. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Propagator: still much TODO • Propagator has been in production at Outbrain for a few months now, and it gets the job done. • But still TODO: ◦ More feedback automation ◦ Email alerts ◦ Two-phase approval ◦ Online schema changes integration ◦ SVN integration ◦ Maven integration ◦ Cassandra
  43. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Data retention • If disk space runs out, who gets the alert? ◦ Ops? Sure, they can add some disk space (volume group free space; spare disks on shelf). But only to up to some point. • Time for data retention. Ideally, we would store data forever. Reality is not ideal. ◦ Who is the owner of retention? If I want to drop a partition, who do I approve this with? ◦ Can this be more visible? • Are you doing data retention via shell/Perl scripts? Are these tested, audited, controlled?
  44. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Data retention automation: Gardien • An Outbrain internal service automating data retention ◦ Currently works on Hive/HDFS; MySQL in the works • Every partitioned table is owned by a person or group • Gardien knows the business demands: ◦ Rolls new partitions, knows partition scope ◦ Drops old partitions, has retention policy • Has a web interface, controlled by the business/engineers • Provides visibility to all, actionability to owners
  45. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Gardien dashboard • Create rules (partitions), edit, remove • Visible and audited
  46. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Being 'nice' to the database Work in progress • Are you happy now that you've made your engineers all-powerful? • Can you sleep well at night? • No, really. What haunts your dreams? • Darn. It's PagerDuty alert. Beep • Apparently all the slaves are lagging. • An engineer someone issued too many INSERTs
  47. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Being 'nice' to the database Work in progress • How do you protect your database against malfunctioning/abusing services? • How do you define/detect/respond to an event where your master is flooded with DMLs, and slaves just can't keep up?
  48. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain What's your slaves' serving capacity? Per DC? Per service? MySQL Master Slave Lagging Slave Slave Lagging Slave Lagging Slave Lagging Slave
  49. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Visibility: serving capacity • We measure current serving capacity and make this value visible • Not only to graphite/alerts. Also visible to any of our services. • Our services can be nice to the database by self-throttling access or postponing tasks.
  50. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Slave Visibility: serving capacity, Flow Slave Lagging Slave Slave Lagging Slave Outbrain service Zookeeper Zookeeper Zookeeper Slave Availability detector service Reads status Writes summary status Consults status, connects to DB
  51. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Forcing services to be 'nice', Work in Progress • A connection pool proxy • Proxy consults availability status • Throttles connections based on availability Outbrain service Zookeeper Zookeeper Zookeeper MySQL cluster Consults status, approves/throttles connection Proxy Attempts to get a connection
  52. MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright

    © 2014, Outbrain Contributions • We love open source and heavily rely on open source solutions. • We try to contribute back in form of patches, bug reports and subscribing for commercial support for open source projects. • Some code we have open sourced: ◦ Onering: https://github.com/outbrain/onering ◦ Graphitus: https://github.com/ezbz/graphitus ◦ Propagator: https://github.com/outbrain/propagator ◦ audit_login: https://github.com/outbrain/audit_login