Slide 1

Reliable Crash Detection and Failover with Orchestrator
Shlomi Noach, PerconaLive 2016

Slide 2

Agenda
• Orchestrator
• Topologies, crash scenarios
• Crash detection methods
• Promotion complexity
• Limbo states, split brain
• Flapping & acknowledgement
• Visibility & control
• Configuration vs. State based analysis & recovery
• State of the orchestra

Slide 3

Orchestrator
• MySQL replication topology manager
• github.com/outbrain/orchestrator
• Free & open source

Slide 4


Slide 5

Simple replication
What could possibly go wrong?

Slide 6

Crash detection

Slide 7

Observe/monitor
How do you observe your database availability?

Slide 8

Monitor master only
Common: ping, check :3306, issue SELECT 1

Slide 9

Monitor master only
And if the response is bad?
- is this a false positive?
- try again
- and again?
- How many times until you're sure? How much time have you lost?
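To make the cost concrete, here is a minimal sketch of such a master-only check (hypothetical credentials and thresholds, using the pymysql client): every extra retry is time lost before failover can even begin, and the answer still does not tell you whether the problem is the master or your view of it.

import time
import pymysql  # assumed MySQL client library

def master_looks_dead(host, retries=3, interval=2):
    # Naive master-only check: declare the master dead only after several
    # consecutive failed probes. Every retry adds `interval` seconds to the
    # time lost before failover even begins.
    for _ in range(retries):
        try:
            conn = pymysql.connect(host=host, user="monitor", password="...",
                                   connect_timeout=2)
            conn.cursor().execute("SELECT 1")
            conn.close()
            return False  # master answered; or was that just a lucky probe?
        except pymysql.MySQLError:
            time.sleep(interval)  # maybe a network blip; try again
    return True  # still unreachable, but is it a crash or a partition?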

Slide 10

Orchestrator's observation
Continuously probes your MySQL servers
- Figuring out who replicates from whom
- Building the topology tree
- Understands replication rules
- At time of crash, knows what the setup should have been
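A rough sketch of that probing loop (illustrative Python; Orchestrator itself is written in Go and its logic is much richer): ask every known server who its master is and rebuild the tree from the answers.

import pymysql  # assumed MySQL client library

def probe(host):
    # Returns this server's master host (None if it is a topology root),
    # or raises if the server is unreachable.
    conn = pymysql.connect(host=host, user="orchestrator", password="...",
                           connect_timeout=2)
    cur = conn.cursor(pymysql.cursors.DictCursor)
    cur.execute("SHOW SLAVE STATUS")
    row = cur.fetchone()
    conn.close()
    return row["Master_Host"] if row else None

def build_topology(hosts):
    # Map every reachable server to its master; the resulting dict is the
    # replication tree as observed right now.
    topology = {}
    for host in hosts:
        try:
            topology[host] = probe(host)
        except pymysql.MySQLError:
            topology[host] = "unreachable"
    return topology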

Slide 11

Observe entire topology
Holistic approach, used by Orchestrator
MySQL monitoring calls for a MySQL-specific solution
- Monitor master and replicas (issue queries)
- Check replica status
- Make an analysis based on results from all servers involved.

Slide 12

Multi layered/multi DC replication
How do you check the availability of an intermediate master (IM)?

Slide 13

Multi layered/multi DC replication
Holistic approach, used by Orchestrator
Monitoring the IM and its replicas gives the bigger picture
- you may actually not care about the IM's availability as long as its replicas are happy

Slide 14

Dead intermediate master
Orchestrator's analysis
IM unreachable, its replicas are reachable, and all are in agreement that their master is unreachable.

Slide 15

Dead master
Orchestrator's analysis
Master unreachable, its replicas are reachable, and all are in agreement that their master is unreachable.
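The decision rule can be sketched like this (a simplified illustration of the holistic analysis, not Orchestrator's actual code; the returned names only loosely mirror its analysis codes): the master is declared dead only when Orchestrator cannot see it and every replica it can see agrees.

def analyze_master(master_reachable, replicas):
    # replicas: list of dicts, e.g. {"reachable": True, "io_thread_running": False}
    reachable = [r for r in replicas if r["reachable"]]
    all_agree_master_gone = bool(reachable) and all(
        not r["io_thread_running"] for r in reachable)

    if not master_reachable and all_agree_master_gone:
        if len(reachable) == len(replicas):
            return "DeadMaster"              # slide 15
        return "DeadMasterAndSomeReplicas"   # slide 16
    if not master_reachable:
        return "UnreachableMaster"           # possibly a false positive; do not fail over yet
    return "NoProblem"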

Slide 16

Dead master & some replicas
Orchestrator's analysis
Master unreachable, some of its replicas are reachable, and those are all in agreement that their master is unreachable. Other replicas are unreachable.

Slide 17

Locked master
Orchestrator's analysis (pending)
Master is reachable, but does not execute writes.
- all replicas are in agreement that the master is reachable
- no replica is making progress
Can be handled as a failed-master case.

Slide 18

Recovery & promotion constraints
• You've made the decision to promote a new master
• Which one?
• Are all options valid?
• Is the current state what you think the current state is?

Slide 19

Promotion constraints
(replica labels: most up to date, less up to date, delayed 24 hours)
You wish to promote the most up-to-date replica; otherwise you give up on any replica that is more advanced.

Slide 20

Promotion constraints
(replica labels: log_slave_updates, log_slave_updates, no binary logs)
You must not promote a replica that has no binary logs, or one that runs without log_slave_updates.

Slide 21

Promotion constraints
(diagram labels: DC1, DC1, DC2, DC1)
You prefer to promote a replica from the same DC as the failed master.

Slide 22

Promotion constraints
(diagram labels: SBR, SBR, SBR, RBR)
You must not promote a Row Based Replication server on top of Statement Based Replication servers.

Slide 23

Promotion constraints
(diagram labels: 5.6, 5.6, 5.6, 5.7)
Promoting the 5.7 replica means losing the 5.6 ones (replication is not forward compatible). So perhaps it is worth losing the 5.7 server instead?

Slide 24

Promotion constraints
(diagram labels: 5.6, 5.6, 5.7, 5.7)
But if most of your servers are 5.7, and a 5.7 turns out to be most up to date, better to promote the 5.7 and drop the 5.6. Orchestrator handles this logic and prioritizes promotion candidates by the overall count and state of their would-be replicas.
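The constraints from the last few slides can be sketched roughly as follows (illustrative only; Orchestrator's real rules and priorities are richer, and the field names here are made up): hard rules disqualify a candidate outright, soft rules only rank the survivors.

def eligible(candidate, others):
    # Hard constraints: violating any of these disqualifies the replica.
    if not candidate["has_binlogs"] or not candidate["log_slave_updates"]:
        return False   # its would-be replicas could not follow it
    if candidate["binlog_format"] == "ROW" and any(
            o["binlog_format"] == "STATEMENT" for o in others):
        return False   # SBR replicas break below an RBR master
    return True

def promotion_score(candidate, others, failed_master_dc):
    # Soft preferences: prefer the candidate that keeps the most servers
    # replicating (version compatibility), then the failed master's DC.
    # Versions are simplified strings like "5.6"/"5.7"; the "most up to date"
    # tiebreak from slide 19 is left out for brevity.
    keepable = sum(1 for o in others if o["version"] >= candidate["version"])
    score = keepable * 10
    if candidate["dc"] == failed_master_dc:
        score += 1
    return score

def pick_promotion_candidate(replicas, failed_master_dc):
    candidates = [r for r in replicas
                  if eligible(r, [o for o in replicas if o is not r])]
    return max(candidates, default=None,
               key=lambda r: promotion_score(
                   r, [o for o in replicas if o is not r], failed_master_dc))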

Slide 25

Promotion constraints, real life!
(replica labels: most up to date, DC2; less up to date, DC1; no binary logs, DC1; DC1)
Orchestrator can promote one, non-ideal replica, have the rest of the replicas converge, and then refactor again, promoting an ideal server.

Slide 26

Ways to avoid promotion constraints mess
Make sure the first replication tier is consistent; have variety on the 2nd tier.
(diagram labels: 5.6, 5.7, 5.7, 5.6, 5.6, 5.6)

Slide 27

Ways to avoid promotion constraints mess
(diagram labels: 5.6; 5.6, semi-sync; 5.7; 5.7)
Use semi-sync on designated servers. They will be most up-to-date upon failure.
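For example, semi-sync can be enabled on the master and only on the designated candidates (a sketch using the standard MySQL semisync plugin variables; assumes the plugins are installed and uses hypothetical admin credentials):

import pymysql  # assumed MySQL client library

def run(host, statements):
    conn = pymysql.connect(host=host, user="admin", password="...")
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    conn.close()

def enable_semisync(master, designated_replicas):
    # Master waits for a semi-sync ack before acknowledging the commit to the client.
    run(master, ["SET GLOBAL rpl_semi_sync_master_enabled = 1",
                 "SET GLOBAL rpl_semi_sync_master_timeout = 1000"])  # ms, then fall back to async
    # Only the designated candidates acknowledge; they are therefore the
    # most up-to-date replicas when the master crashes.
    for host in designated_replicas:
        run(host, ["SET GLOBAL rpl_semi_sync_slave_enabled = 1",
                   "STOP SLAVE IO_THREAD",   # restart the IO thread so the setting takes effect
                   "START SLAVE IO_THREAD"])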

Slide 28

Ways to avoid promotion constraints mess
(diagram labels: 5.6, 5.6, 5.7, 5.7)
Solve the problem by aligning relay logs on the replicas upon master failure.
• That's what MHA does
• Work In Progress: Orchestrator to support this! Will require passwordless SSH

Slide 29

Limbos
Master failed; one replica was lost along with it. Recovery went well. What happens when the master is back alive?

Slide 30

Limbos
What will the promoted master say? What will the lost replica say? What will the lost master say?
(speech bubbles): "OHAI! Give me traffic!" "VIP is mine!" "Also, good for traffic!"

Slide 31

Solving limbos
• Orchestrator forcibly breaks replication on the lost replica
• RESET SLAVE ALL or forced detach-master on the promoted replica
• read_only=1 on the old master, if possible
• iptables on the old master
(diagram annotations: Master_host: //old.master.com, "Can't find coordinates!", "Read only!")
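Roughly, the fencing above amounts to something like this sketch (hypothetical helper, not Orchestrator's own code): make sure neither the old master nor its lost replica can silently rejoin or take writes.

import pymysql  # assumed MySQL client library

def run(host, statements):
    conn = pymysql.connect(host=host, user="admin", password="...", connect_timeout=2)
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    conn.close()

def fence_after_failover(promoted_master, lost_replicas, old_master):
    # Promoted master: forget the old master's coordinates entirely.
    run(promoted_master, ["STOP SLAVE", "RESET SLAVE ALL"])
    # Replicas lost along with the old master: break replication so they
    # cannot resume from a master that is no longer authoritative.
    for host in lost_replicas:
        run(host, ["STOP SLAVE"])
    # Old master, if and when it answers again: refuse writes.
    try:
        run(old_master, ["SET GLOBAL read_only = 1"])
    except pymysql.MySQLError:
        pass  # still down; an iptables rule on the host is the fallback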

Slide 32

DC split brain
DC1 / DC2, on either side of the partition: "You're dead!" "I can't hear you!" "You're dead!" "They're dead!" "They're dead!"

Slide 33


Slide 34

Flapping & rolling failovers
• The master is diagnosed as being dead
• A new master is promoted
• Turns out some app client is killing it
• Rolling failover
• What happens to a dead master that comes back alive?

Slide 35

Flapping & rolling failovers
• Orchestrator sets a minimal interval between two automated failovers
• The first one is automated; an immediate one following it gets blocked
• A human acknowledging the first failover implicitly resets the interval. Good to go for the next automated failover.
• And a human can always command a failover.
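The anti-flapping rule boils down to something like this sketch (the interval value and field names here are made up; Orchestrator keeps this bookkeeping in its backend database):

import time

MIN_SECONDS_BETWEEN_AUTO_FAILOVERS = 600  # hypothetical block interval

def may_auto_failover(last_recovery, now=None):
    # last_recovery: {"timestamp": epoch_seconds, "acknowledged": bool}
    # for this cluster, or None if there was no recent recovery.
    # A human-initiated failover bypasses this check entirely.
    now = now if now is not None else time.time()
    if last_recovery is None:
        return True
    if last_recovery["acknowledged"]:
        return True  # a human reviewed the previous failover; good to go
    return now - last_recovery["timestamp"] > MIN_SECONDS_BETWEEN_AUTO_FAILOVERS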

Slide 36

Flapping & rolling failovers
• Orchestrator marks a failed master as downtimed
• Even if said server is back in the game (human intervention), this particular server will not be failed over for the duration of the downtime
• A human can terminate the downtime

Slide 37

Recap: how orchestrator performs master failover
• Detection: everyone agrees the master is dead
• Is this incident muted?
• Has this cluster just recently recovered from another failure without ack?

Slide 38

Recap: how orchestrator performs master failover
• Pick the most up to date replica, which will also make for the fewest lost servers (the two are not necessarily the same)

Slide 39

Recap: how orchestrator performs master failover
• Refactor topology
• Oh wait, actually, now that everything's connected, is there a better server to promote?
• Go for it, refactor again
• Mark old master as downtimed
• Detach promoted master from old master

Slide 40

Recap: how orchestrator performs master failover
• Invoke external hooks
• Orchestrator does not use nor imply a specific service discovery technique
• Your own app/scripts to change VIP/CNAME/Zk entries/Proxy/whatever
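A hook is just your own executable. A hypothetical example (the argument order, the DNS record and the update-dns command are all made up; check your Orchestrator configuration for how the failed and promoted hosts are passed to hooks):

#!/usr/bin/env python
# Hypothetical post-failover hook: argv[1] = failed master, argv[2] = promoted master.
# Orchestrator only invokes it; the service discovery change is entirely your code.
import subprocess
import sys

def main():
    failed_master, new_master = sys.argv[1], sys.argv[2]
    # Example: repoint the writer CNAME with some internal tooling (made-up command);
    # updating a VIP, a Zookeeper entry or a proxy config would look similar.
    subprocess.check_call(["update-dns", "--cname", "mysql-master.example.com",
                           "--target", new_master])
    print("failover hook: %s -> %s" % (failed_master, new_master))

if __name__ == "__main__":
    main()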

Slide 41

Visibility & control
• Flapping and rolling failovers are avoided by having memory of past/recent events
• Orchestrator audits:
  • Detection
  • Recoveries
  • Refactoring operations (alas, without context)
  • Owners, reasons, internal operations…
• To audit table; to orchestrator log; to syslog
• Audit log available via API
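For example, the audit trail can be pulled from the HTTP API and fed into your own alerting (the host, port and endpoint path below are assumptions; check your Orchestrator version's API documentation):

import json
import urllib.request  # standard library only

ORCHESTRATOR = "http://orchestrator.example.com:3000"  # hypothetical address

def recent_audit_entries():
    # Recent audit entries as JSON: detections, recoveries, refactoring operations, etc.
    with urllib.request.urlopen(ORCHESTRATOR + "/api/audit") as response:
        return json.loads(response.read().decode("utf-8"))

for entry in recent_audit_entries():
    print(entry)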

Slide 42

Visibility & control
• Control via:
  • Web interface
  • Web API
  • Command line interface
  • Hubot
.orc sup
> No incidents which require a failover to report.
.orc recover failed.server.com
.orc ack failed-cluster
.orc relocate this.replica below that.one
.orc graceful-takeover my-cluster

Slide 43

Configuration vs. State based recoveries
In configuration based recoveries:
• You designate specific roles to specific servers, i.e. this server will have to be promoted, or these are the relevant servers, these are not
• You must then match your operations to those dictated rules
• Any change you make (provision, deprovision, relocate, …) must be reflected in configuration
• Implies chef/puppet deploy; reload of services

Slide 44

Configuration vs. State based recoveries
In state based recoveries:
• You trust the tooling to make the best of a situation
• Basically do whatever a human would do
• You still want to have roles for your servers
• chef/puppet may still be involved
• But those can be added/removed dynamically, and the tooling adapts to change of state

Slide 45

Orchestrator's detection reliability
• There is no n-nines number
• Orchestrator has proven to be very accurate in production environments
• Depending on both orchestrator & MySQL configuration, detection may take ~5-10 seconds

Slide 46

Orchestrator HA
(diagram: Orchestrator services behind an HTTP proxy layer; backend DB behind a MySQL proxy layer; one service is Leader)
Orchestrator is highly available
• Supports multiple services competing for leadership
• Requires a highly available backend database. Supports master-master setup, and guarantees it to be collision free
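Leadership over a shared backend database commonly works as a lease that each service tries to grab or renew; the sketch below shows that general pattern only (the table, columns and timings are hypothetical, not Orchestrator's actual election schema):

def try_become_leader(conn, my_id, lease_seconds=10):
    # conn: connection to the shared backend DB (e.g. a pymysql connection).
    # Take over the lease only if we already hold it or it has expired.
    cur = conn.cursor()
    cur.execute(
        "UPDATE leader_election"
        "   SET leader_id = %s, expires_at = NOW() + INTERVAL %s SECOND"
        " WHERE leader_id = %s OR expires_at < NOW()",
        (my_id, lease_seconds, my_id))
    conn.commit()
    cur.execute("SELECT leader_id FROM leader_election")
    return cur.fetchone()[0] == my_id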

Slide 47

Recent developments
• Binary log indexing: makes for Pseudo-GTID matching within 1s-2s. Reduced recovery time
• Planned master takeover, forced master takeover
• Smarter promotion rules
• Fuzzy names (it's the simple stuff that makes life happier)
• SSL (Square contributions)
• Better master-master support
• Replication structure analysis
• MIT license! (thanks @Outbrain)

Slide 48

What's on the roadmap?
Ongoing, intended
• Relay log alignment
• Semi-sync (currently via contributions)
Likely
• Failure detection consensus / leadership handover
Maybe
• orchestrator-agent xtrabackup
Always
• Reliability, performance, simplification

Slide 49

What's on the roadmap?
GitHub commitment to Orchestrator
• We use it, we will make it better
• Currently merging changes upstream
• GitHub will become upstream
• Better documentation, tutorials, sample public AMI
• World domination
Open and grateful for contributions! Please discuss via Issues beforehand

Slide 50

Orchestrator/related talks
• Choosing a MySQL HA solution today
  Michael Patrick (Percona)
  Tuesday 19, 5:15pm
• Orchestrator at Square
  John Cesario, Grier Johnson, Brian Ip (Square)
  Thursday 21, 3:00pm

Slide 51

GitHub talks
• Tutorial: MySQL GTID Implementation, Maintenance, and Best Practices
  Gillian Gunson (GitHub), Brian Cain (Dropbox), Mark Filipi (SurveyMonkey)
  Monday 18, 9:30am
• Growing MySQL at GitHub
  Tom Krouper, Jonah Berquist
  Wednesday 20, 1:00pm
• Rookie DBA Mistakes: How I Screwed Up So You Don't Have To
  Gillian Gunson
  Thursday 21, 12:50pm
• Co-speaking: Dirty Little Secrets
  Jonah Berquist, Shlomi Noach
  Thursday 21, 3:00pm

Slide 52

Thank you! Questions?
github.com/shlomi-noach