Slide 1

Managing and Visualizing Your Replication Topologies with Orchestrator
Shlomi Noach
Percona Live, Sep 2015

Slide 2

Agenda:
● What? Why?
● The killer demo
● Breakdown: discovery, refactoring, recovery, interface
● Pseudo GTID, GTID, binlog servers
● Architecture & stack
● Deployment at scale @ Booking.com
● The CLI demo
● Supported/unsupported
● Contributing
Managing and Visualizing … yada yada … Orchestrator

Slide 3

Not a sales pitch
● orchestrator is free and open source
● Designed to be as generic as possible
● Some company-specific rules or processes are externalized via configuration
https://github.com/outbrain/orchestrator

Slide 4

What? Why?
● With so many replication topologies, with many servers per topology spanning multiple data centers, and with periodic server failures and movements:
  - Do you know what your topologies look like?
  - Does management know?
● With the complexity of moving slaves around the topology, the rules allowing/disallowing server X to replicate from Y, and the implications of cross-DC traffic on slave latency:
  - Who in your company can refactor your topologies other than yourself?
● In the event of server failure, master or intermediate master breakage:
  - Do you have a clear visual into what failed?
  - What kind of solutions do you use?
  - Who can execute a failover / override a failover / understand what’s going on?

Slide 5

The killer demo
or: Let’s break our production servers right now and deal with the consequences once this conference is over

Slide 6

No content

Slide 7

Orchestrator breakdown: Discovery
● Crawls through your topologies
● Automatically recognizes new servers
● Resolves IPs, CNAMEs
● Revisits your servers periodically
● Collects data (version, binlogs, replication, …)
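
A minimal sketch of how a topology is typically seeded for discovery, based on the CLI sample later in this deck; the web API path and the :3000 port are assumptions about orchestrator's HTTP interface and may differ by version:

orchestrator -c discover -i some.known.host:3306
# or via the web API (path and port assumed):
curl -s http://your.orchestrator.host:3000/api/discover/some.known.host/3306

From a single seed instance, orchestrator then crawls the rest of that topology on its own.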

Slide 8

Anonymous Booking.com topology (screenshot)

Slide 9

Anonymous Booking.com topology (screenshot)

Slide 10

Anonymous Booking.com topology (screenshot)

Slide 11

Orchestrator breakdown: Refactoring
● Understands:
  ● binlog file:pos
  ● Pseudo-GTID
  ● GTID (Oracle + MariaDB)
  ● binlog servers
● Knows the rules for replicating X from Y
● Will refactor your topology for you: safely redesign your topology
● Fine-grained control, or “just do it for me, I’m too tired to think”
● Can refactor via slick web UI
● Or via nerdy command line interface
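
The same refactoring primitives are available from the CLI; a sketch using the relocate commands that appear in the CLI samples later in this deck (host names are placeholders):

# move one replica under a different master, wherever it currently sits:
orchestrator -c relocate -i replica3.host:3306 -d replica2.host:3306
# move all replicas of an instance under another instance in one go:
orchestrator -c relocate-slaves -i replica1.host:3306 -d master.host:3306

In both cases orchestrator chooses whichever mechanism the servers involved support (plain file:pos, Pseudo-GTID, GTID or binlog servers).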

Slide 12

Orchestrator breakdown: Recovery
● Keeps a state of your topology
● Uses a holistic approach to detect failures
  http://code.openark.org/blog/mysql/what-makes-a-mysql-server-failurerecovery-case
● If replication breaks, orchestrator knows what the expected topology looked like
● And can recommend “the next best option”, based on state, not on configuration
● And, if you like, can execute an automated/manual failover that heals your topology and leaves no slave (or only those utterly incapable of restoring) behind
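
A sketch of invoking a recovery by hand; the recover command exists in orchestrator's CLI, but exact flags vary by version, and automated recoveries are driven by configuration rather than by this call:

# ask orchestrator to recover the topology whose master (or intermediate master) just died:
orchestrator -c recover -i failed.master.host:3306 --debug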

Slide 13

Orchestrator breakdown: Interface
● Command line
  ● Well-formed output
  ● Go-to if you like your --debug logs
● Web API
  ● Simple GET (not REST)
● Web UI
  ● Uses the Web API
  ● Designed to be friendly
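
A sketch of the simple-GET style of the web API; the endpoint paths and port are taken from orchestrator's documentation and are assumptions that may differ by version:

# list the clusters orchestrator knows about:
curl -s http://your.orchestrator.host:3000/api/clusters
# fetch everything orchestrator knows about a single instance:
curl -s http://your.orchestrator.host:3000/api/instance/replica1.host/3306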

Slide 14

Binary log coordinates: recap
● Replication based on file:pos
● Different file names on masters & slaves
● Different positions on masters & slaves
● Once the connection is broken, difficult to match up again
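
As a concrete illustration of these coordinates (standard MySQL statements; host names are placeholders):

# the master's current binary log coordinates:
mysql -h master.host -e "SHOW MASTER STATUS"
# the slave's view: which of the master's coordinates it has executed...
mysql -h replica1.host -e "SHOW SLAVE STATUS\G" | grep -E "Relay_Master_Log_File|Exec_Master_Log_Pos"
# ...versus the slave's own, differently named and positioned, binary logs:
mysql -h replica1.host -e "SHOW MASTER STATUS"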

Slide 15

MySQL binary & relay logs (diagram: Master, Slave)

Slide 16

MySQL binary & relay logs: different languages (diagram: Master, Slave)

Slide 17

MySQL binary & relay logs: even more languages (diagram: Master, Slave, Slave)

Slide 18

GTID
● Every transaction has a unique identifier
● When a slave connects to a master, it looks for the last GTID statement it already executed
● Available in Oracle MySQL 5.6, MariaDB 10.0
● Completely different implementations; may cause vendor lock-in
● 5.6 migration path is painful (alleviated in 5.7)
● 5.6 requires binary logs & log-slave-updates enabled on all slaves (alleviated in 5.7)
● 5.6 errant transactions, unexecuted sequences
● GTID will be a requirement for future Oracle features
● MariaDB GTID supports domains; easy to use
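
For reference, a sketch of switching a slave to GTID auto-positioning in each flavor (standard syntax for the versions above; prerequisites such as gtid_mode=ON on the Oracle side are omitted, host names are placeholders):

# Oracle MySQL 5.6:
mysql -h replica1.host -e "STOP SLAVE; CHANGE MASTER TO MASTER_AUTO_POSITION=1; START SLAVE;"
# MariaDB 10.0:
mysql -h replica1.host -e "STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=slave_pos; START SLAVE;"

orchestrator's enable-gtid / disable-gtid commands, shown in the CLI samples later in this deck, toggle this setting per instance.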

Slide 19

Pseudo GTID
● Pseudo GTID offers what GTID offers, without GTID. This includes:
  ● Slave repointing
  ● Failover schemes
  ● With fewer requirements
  ● Bulk operations
● Without upgrading your servers; without installing anything on them; in short: not touching your beloved existing setup
● No vendor lock-in; no migration paths

Slide 20

Pseudo GTID
● Application-side enhancement
● We inject a uniquely identified statement every X seconds. We call it Pseudo GTID.
● Pseudo GTID statements are searchable and identifiable in binary and relay logs
● They make for “markers” in the binary/relay logs
● Injection can be made via the MySQL event scheduler or externally
● Otherwise non-intrusive. No changes to topology/versions/methodologies

Slide 21

Injecting Pseudo GTID

create event if not exists
  create_pseudo_gtid_event
  on schedule every 5 second starts current_timestamp
  on completion preserve
  enable
  do
    begin
      set @pseudo_gtid_hint := uuid();
      set @_create_statement := concat('drop ', 'view if exists `meta`.`_pseudo_gtid_hint__', @pseudo_gtid_hint, '`');
      PREPARE st FROM @_create_statement;
      EXECUTE st;
      DEALLOCATE PREPARE st;
    end
$$
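
An operational note, assuming injection via the event scheduler as above: the scheduler must be running and the schema the event writes into (here `meta`) must exist. A sketch of the prerequisites:

mysql -h master.host -e "CREATE DATABASE IF NOT EXISTS meta"
mysql -h master.host -e "SET GLOBAL event_scheduler := 1"

The event only needs to run on the master; the injected statements replicate to all slaves as ordinary binlog entries.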

Slide 22

In the binary logs

mysql> show binlog events in 'mysql-bin.015631' \G
...
       Log_name: mysql-bin.015631
            Pos: 1632
     Event_type: Query
      Server_id: 1
    End_log_pos: 1799
           Info: use `meta`; drop view if exists `meta`.`_pseudo_gtid_hint__50731a22-9ca4-11e4-aec4-e25ec4bd144f`
...

Slide 23

Recap: MySQL binary & relay logs (diagram: Master, Slave)

Slide 24

MySQL binary & relay logs: a virtual contiguous log file (diagram: Master, Slave)

Slide 25

MySQL binary & relay logs: Pseudo GTID injection (diagram: Master and Slave log streams carrying the same entries, with Pseudo-GTID markers PGTID 17 / 82 / 56 interleaved among the ordinary insert/update/delete/create/drop statements)

Slide 26

Pseudo GTID: repoint, based on binary logs (diagram: Master and Slave log entries aligned via Pseudo-GTID markers PGTID 17 / 82 / 56)

Slide 27

Pseudo GTID: repoint, based on relay logs (diagram: the same alignment via Pseudo-GTID markers PGTID 17 / 82 / 56, performed against the Slave's relay logs)

Slide 28

More on Pseudo GTID
● Please see https://speakerdeck.com/shlominoach/pseudo-gtid-and-easy-mysql-replication-topology-management to learn about advantages, limitations and implementation.
● Pseudo-GTID deployed on all Booking.com chains

Slide 29

Binlog Servers
● A MySQL-server-like entity
● Which merely relays the master’s binary logs
● Under the same name and position
● Nested binlog servers allow for simplified refactoring and offer a simplified & faster master recovery mechanism
● See Binlog Servers @ Booking.com
  https://www.percona.com/live/europe-amsterdam-2015/sessions/binlog-servers-bookingcom
● Orchestrator supports:
  ● Hybrid standard + binlog-server replication topologies
  ● Pure binlog server topologies

Slide 30

Orchestrator architecture
● Can execute as a long-running service
  ● Provides HTTP UI, Web API
  ● Polls servers, checks for crashes, recovers, periodic operations
  ● Leader election
● Can run as a command line tool
  ● Issue a single command & exit
● Requires the (same, single) MySQL backend for any operation
  ● The backend database has the state of the topologies
  ● orchestrator itself mostly stateless (pending operation excluded, optimistic mode)
● Agent-less for most operations; communicates directly with MySQL instances
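
A deployment sketch; the configuration key names are taken from orchestrator's documented sample configuration and may differ by version, and all values are placeholders:

# /etc/orchestrator.conf.json (JSON excerpt):
#   "MySQLOrchestratorHost": "orc-backend.host",
#   "MySQLOrchestratorPort": 3306,
#   "MySQLOrchestratorDatabase": "orchestrator",
#   "MySQLOrchestratorUser": "orc_server_user",
#   "MySQLOrchestratorPassword": "...",
#   "MySQLTopologyUser": "orc_client_user",
#   "MySQLTopologyPassword": "..."

# run as the long-running service (HTTP UI + web API):
orchestrator --config=/etc/orchestrator.conf.json http
# or run a single CLI command and exit:
orchestrator --config=/etc/orchestrator.conf.json -c topology -i master.host:3306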

Slide 31

Orchestrator architecture
● HA: orchestrator leader election
● Self-healing backend data
● All locks auto-expiring
● Supports authentication (basic-auth, reverse proxy)
● Operations friendly, e.g.:
  ● Server maintenance flag
  ● Downtiming servers
  ● Marking as “best candidate”
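
A sketch of the operations-friendly bits from the command line; these commands exist in orchestrator's CLI, though the exact flag names follow its documentation and may differ by version:

# flag a server as under maintenance (refactoring operations against it are blocked):
orchestrator -c begin-maintenance -i replica2.host:3306 --owner=dba --reason="schema migration"
orchestrator -c end-maintenance -i replica2.host:3306
# downtime a server so its problems are not flagged or acted upon, then clear it:
orchestrator -c begin-downtime -i replica2.host:3306 --owner=dba --reason="planned work" --duration=30m
orchestrator -c end-downtime -i replica2.host:3306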

Slide 32

Orchestrator general architecture (diagram: orchestrator service, backend db, web API, web xface)

Slide 33

Orchestrator stack & development
● Stack:
  ● golang - in retrospect a very good choice: a lot of concurrency; easy deployment; rapid development
  ● MySQL as backend database (duh)
  ● go-martini web framework
  ● Page generation via dirty JavaScript/jQuery (sue me)
  ● Twitter Bootstrap
  ● Graphs via D3
● Development:
  ● GitHub, open source; accepting pull requests
    https://github.com/outbrain/orchestrator/

Slide 34

MySQL @ Booking.com 2015
● We are a big MySQL shop
● We have ALOT production servers on ALOT topologies (aka chains, aka clusters)
● As small as 1 server per topology, as large as hundreds of servers per topology
● Two major data centers, now populating our third
● Single master, plenty of slaves
● All chains are deployed with Pseudo-GTID and controlled by orchestrator
● Larger chains: hybrid, normal + binlog servers topologies (complex!)
● “Pure” binlog-server topologies experimental, non-production
● Some topologies sharded
● A little bit of active/passive master-master

Slide 35

Orchestrator architecture @ Booking.com (diagram: app nodes, one elected leader, behind an HTTP load balancer; orchestrator-cli on all MySQL nodes)

Slide 36

Orchestrator @ Booking.com 2015
● 5-6 hosts running the orchestrator service; one is elected as leader at any given time
● ALOT hosts with orchestrator as CLI
● The single elected service polls all our instances
● Each MySQL instance polled every 30s
● Pseudo-GTID deployed on all chains
● Orchestrator configured to auto-recover the death of any intermediate master
● Orchestrator configured to auto-recover from some master failures
● Both of the above happen
● Some checks & dashboards rely on orchestrator data (API / DB)
● Some operations rely on orchestrator logic
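
A sketch of the kind of configuration this describes; the key names are taken from orchestrator's documentation (they may not exist in all versions) and the cluster patterns are placeholders:

# /etc/orchestrator.conf.json (JSON excerpt):
#   "InstancePollSeconds": 30,
#   "RecoverIntermediateMasterClusterFilters": [ ".*" ],
#   "RecoverMasterClusterFilters": [ "some-cluster-pattern" ]

With filters along these lines, dead intermediate masters are auto-recovered everywhere, while dead masters are auto-recovered only in the clusters explicitly allowed.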

Slide 37

CLI demo
Where real stuff gets done, including pie charts
[ASCII-art illustration]

Slide 38

CLI samples

orchestrator -c discover -i replica1.host:3306
orchestrator -c topology -i master.host:3306
orchestrator -c relocate -i replica3.host:3306 -d replica2.host:3306
orchestrator -c relocate -i replica3.host:3306 -d master.host --debug
orchestrator -c which-slaves -i master.host | while read i ; do orchestrator -c disable-gtid -i $i ; done
orchestrator -c regroup-slaves -i master.host --debug
orchestrator -c relocate-slaves -i replica1.host -d master.host
orchestrator -c which-slaves -i master.host | while read i ; do orchestrator -c enable-gtid -i $i ; done
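
A few more commands in the same spirit that help when exploring a topology; they exist in orchestrator's CLI, though output formats may vary by version:

# list all clusters orchestrator knows about:
orchestrator -c clusters
# which cluster does this instance belong to, and who is its master?
orchestrator -c which-cluster -i replica1.host:3306
orchestrator -c which-master -i replica1.host:3306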

Slide 39

In-production experiments, trust
● Periodic crash experiments
● And periodically avoiding these experiments as well
● Getting more people involved (on-call sysadmins)
● ALOT of input is gained from people inexperienced with MySQL, leading to more visibility on orchestrator’s side
● And of course periodic real crash scenarios

Slide 40

Supported replication topologies & technologies
● Standard binlog file:pos replication
● GTID (Oracle & MariaDB)
● Pseudo GTID
● Binlog servers (MaxScale)
● Statement based, row based, semi-sync replication
● Single master replication
● Master-master (2 node circular) replication
● 5.7 parallel replication (in-order required for Pseudo-GTID)

Slide 41

Unsupported
● 5.6 per-schema parallel replication
  ● Discovery & visualization good, operations unsupported
● Master-master-master (#nodes > 2) replication
● Galera
  ● Unrecognized by orchestrator, identifies each co-master as its own head of topology
● Multi-master aka multi source (neither Oracle 5.7 nor MariaDB)
● Tungsten

Slide 42

Contributions & usage
● Known to be deployed by various companies
● Orchestrator accepts pull requests
● Please consider making your own PR
● Please submit bug reports
● Please assist in documentation

Slide 43

Links of interest
● Orchestrator manual
  https://github.com/outbrain/orchestrator/wiki/Orchestrator-Manual
● Orchestrator deployment
  https://github.com/outbrain/orchestrator/wiki/Orchestrator-deployment
● Orchestrator first steps
  https://github.com/outbrain/orchestrator/wiki/Orchestrator-first-steps
● Orchestrator for developers
  https://github.com/outbrain/orchestrator/wiki/Orchestrator-for-developers
● openark.org
  http://code.openark.org/blog/tag/orchestrator
  http://code.openark.org/blog/tag/pseudo-gtid
● Binlog servers master promotion
  http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html

Slide 44

Image, sources & other credits
● http://hyperboleandahalf.blogspot.nl/2010/04/alot-is-better-than-you-at-everything.html
● http://www.hbo.com/game-of-thrones
● https://imgflip.com/memegenerator/Brace-Yourselves-X-is-Coming
● http://www.glassgiant.com/ascii/
● https://www.keepcalm-o-matic.co.uk/
● @isamlambert for making a couple sparkles to ignite this
● Team @ Booking.com for ideas, input, time testing, time using
● Contributors!

Slide 45

Other Booking.com talks
● Binlog Servers at Booking.com
  https://www.percona.com/live/europe-amsterdam-2015/sessions/binlog-servers-bookingcom
● Combining Redis and MySQL to store HTTP cookie data
  https://www.percona.com/live/europe-amsterdam-2015/sessions/combining-redis-and-mysql-store-http-cookie-data
● Encrypted MySQL Backups and instant recoverability on large scale
  https://www.percona.com/live/europe-amsterdam-2015/sessions/encrypted-mysql-backups-and-instant-recoverability-large-scale
● Events storage and analysis with Riak at Booking.com
  https://www.percona.com/live/europe-amsterdam-2015/sessions/events-storage-and-analysis-riak-bookingcom
● Riding the Binlog: an in Deep Dissection of the Replication Stream
  https://www.percona.com/live/europe-amsterdam-2015/sessions/riding-binlog-deep-dissection-replication-stream
● Unicode and MySQL
  https://www.percona.com/live/europe-amsterdam-2015/sessions/unicode-and-mysql
● Your Clone Army: Better scalability through more database servers
  https://www.percona.com/live/europe-amsterdam-2015/sessions/your-clone-army-better-scalability-through-more-database-servers
● The CIS MySQL Security Benchmark (LT)
  https://www.percona.com/live/europe-amsterdam-2015/sessions/cis-mysql-security-benchmark
● The Virtues of Boring Technology (Keynote)
  https://www.percona.com/live/europe-amsterdam-2015/sessions/virtues-boring-technology

Slide 46

Questions?
@ShlomiNoach
http://openark.org
http://blog.booking.com
Thank you!