Slide 1

Slide 1 text

Monitoring time in distributed databases: a play in three acts
Shlomi Noach, GitHub
StatsCraft 2019

Slide 2

Slide 2 text

Agenda
TL;DR: time adventures and mishaps
• Throttling
• Consistent reads
• And all that follows

Slide 3

Slide 3 text

About me
@github/database-infrastructure
Author of orchestrator, gh-ost, freno, ccql and other open source tools
Blog at http://openark.org
github.com/shlomi-noach
@ShlomiNoach

Slide 4

Slide 4 text

GitHub
Built for developers
Largest open source hosting
100M+ repositories
36M+ developers
1B+ contributions
Largest supplier of octocat T-shirts and stickers

Slide 5

Slide 5 text

Prelude

Slide 6

Slide 6 text

Asynchronous replication
Single writer node
Asynchronous replicas
Multi-layered
Scale reads across replicas

Slide 7

Slide 7 text

Replication lag
Desired behavior: smallest possible lag
• Consistent reads (aka read your own writes)
• Faster/lossless/less lossy failovers

Slide 8

Slide 8 text

Replication lag

Slide 9

Slide 9 text

Replication lag

Slide 10

Slide 10 text

Measuring lag via heartbeat
Inject heartbeat on master
Read replicated value on replica, compare with time now()

Slide 11

Slide 11 text

Inject and read
Heartbeat generated locally on writer node
(diagram: inject on the writer, read & compare on each replica)

Slide 12

Slide 12 text

Heartbeat

create table heartbeat (
  anchor int unsigned not null,
  ts timestamp(6),
  primary key (anchor)
);

Slide 13

Slide 13 text

Heartbeat: inject on master

create table heartbeat (
  anchor int unsigned not null,
  ts timestamp(6),
  primary key (anchor)
);

replace into heartbeat values (
  1, now(6)
);

Slide 14

Slide 14 text

Heartbeat: read on replica

create table heartbeat (
  anchor int unsigned not null,
  ts timestamp(6),
  primary key (anchor)
);

select
  unix_timestamp(now(6)) -
  unix_timestamp(ts) as lag
from
  heartbeat
where
  anchor = 1;

Slide 15

Slide 15 text

Replication lag: graphing

Slide 16

Slide 16 text

Act I

Slide 17

Slide 17 text

Objective: throttling

Slide 18

Slide 18 text

Throttling
Break large writes into small tasks
Allow writes to take place if lag is low
Hold off writes when lag is high
Threshold: 1sec
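A minimal Go sketch of the loop this slide describes: break the big write into small chunks and consult the lag measurement before every chunk. checkLag and processNextChunk are hypothetical placeholders (standing in for "current replication lag" and "apply one small write task"), not functions from any of the tools mentioned in the talk.

package main

import (
	"fmt"
	"time"
)

// Hold off writes while replication lag exceeds this threshold (1sec in the talk).
const lagThreshold = time.Second

// runLargeWrite breaks a large write into small chunks, consulting the lag
// check before every chunk. checkLag and processNextChunk are hypothetical
// stand-ins for "current replication lag" and "apply one small chunk".
func runLargeWrite(checkLag func() time.Duration, processNextChunk func() (done bool)) {
	for {
		if checkLag() > lagThreshold {
			time.Sleep(100 * time.Millisecond) // lag is high: back off, then re-check
			continue
		}
		if processNextChunk() { // lag is low: safe to apply one more chunk
			return
		}
	}
}

func main() {
	chunks := 5
	runLargeWrite(
		func() time.Duration { return 10 * time.Millisecond }, // pretend lag is low
		func() bool { chunks--; fmt.Println("chunk applied"); return chunks == 0 },
	)
}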

Slide 19

Slide 19 text

Heartbeat injection
Injected at 15:07:00.000 (timeline shown in 50ms ticks)

Slide 20

Slide 20 text

Heartbeat injection: applied on replica
Injected at 15:07:00.000, applied on the replica at 15:07:00.004

Slide 21

Slide 21 text

Heartbeat injection: read by app
Injected at 15:07:00.000, applied at 15:07:00.004, read by the app at 15:07:00.007; measured lag: 0.007

Slide 22

Slide 22 text

Heartbeat injection: delayed app read
Injected at 15:07:00.000, applied at 15:07:00.004, read by the app at 15:07:00.047; measured lag: 0.047

Slide 23

Slide 23 text

Heartbeat injection: delayed apply
Injected at 15:07:00.000, applied at 15:07:00.044, read by the app at 15:07:00.047; measured lag: 0.047

Slide 24

Slide 24 text

Heartbeat injection: granularity +50ms

Slide 25

Slide 25 text

Act II

Slide 26

Slide 26 text

Practical constraints

Slide 27

Slide 27 text

Lag monitor service
freno to monitor replication lag:
• Polls all replicas at 50ms interval
• Aggregates data per cluster at 25ms interval
• https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/
• https://github.com/github/freno
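For illustration, a hedged Go sketch of how an application can consult freno before writing. The path follows freno's GET /check/<app>/<store-type>/<store-name> convention; the host, port, app name ("archiver") and cluster name ("main") are made-up placeholders. HTTP 200 means the write may proceed; any other status (or an error) is treated here as "hold off".

package main

import (
	"fmt"
	"net/http"
	"time"
)

// frenoCheckURL is hypothetical: host/port, app name and cluster name are
// placeholders; the path follows freno's /check/<app>/<store-type>/<store-name> form.
const frenoCheckURL = "http://freno.example.com:9777/check/archiver/mysql/main"

// writeGranted asks freno whether lag is low enough to proceed with a write.
// HTTP 200 means "go ahead"; any other status (or an error) means hold off.
func writeGranted(client *http.Client) bool {
	resp, err := client.Get(frenoCheckURL)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	client := &http.Client{Timeout: 500 * time.Millisecond}
	if writeGranted(client) {
		fmt.Println("lag is low: apply next chunk")
	} else {
		fmt.Println("lag is high (or freno unreachable): hold off")
	}
}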

Slide 28

Slide 28 text

Heartbeat injection
Injected at 15:07:00.000 (timeline shown in 50ms ticks)

Slide 29

Slide 29 text

Heartbeat injection: applied on replica
Injected at 15:07:00.000, applied on the replica at 15:07:00.004

Slide 30

Slide 30 text

Heartbeat injection: read by freno
Injected at 15:07:00.000, applied at 15:07:00.004, read by freno at 15:07:00.007; measured lag: 0.007

Slide 31

Slide 31 text

Heartbeat injection: read by app
Injected at 15:07:00.000, applied at 15:07:00.004, freno measured lag 0.007 at 15:07:00.007; the app reads freno's value at 15:07:00.009

Slide 32

Slide 32 text

Heartbeat injection: delayed app read
Injected at 15:07:00.000, applied at 15:07:00.004, freno measured lag 0.007 at 15:07:00.007; the app reads freno's value at 15:07:00.048

Slide 33

Slide 33 text

Delayed app read, broken replica
Injected at 15:07:00.000, applied at 15:07:00.004, freno measured lag 0.007 at 15:07:00.007; the app reads that stale value at 15:07:00.048, by which time the replica has broken

Slide 34

Slide 34 text

Heartbeat injection with freno: granularity ±50ms

Slide 35

Slide 35 text

Actual safety margins:
• 50ms freno sampling interval
• 25ms freno aggregation interval
• Allow an additional 25ms for “extra complications”
Total: 100ms

Slide 36

Slide 36 text

Throttling: 
 granularity is not important

Slide 37

Slide 37 text

Granularity is important

Slide 38

Slide 38 text

Objective: consistent reads

Slide 39

Slide 39 text

Consistent reads, aka read-your-own-writes
A classic problem of distributed databases

Slide 40

Slide 40 text

Consistent read checks
App asks freno: “I made a write 350ms ago. Are all replicas up to date?”
Client auto-requires 100ms error margin
We compare replication lag with 250ms
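A small Go sketch of the comparison described on this slide: discount the error margin from the time elapsed since the write, and only trust a replica whose measured lag is below what remains. The 100ms margin and the sample numbers come straight from the slide; the function name is illustrative.

package main

import (
	"fmt"
	"time"
)

// errorMargin reflects the measurement granularity discussed above.
const errorMargin = 100 * time.Millisecond

// canServeConsistentRead applies the check from the slide: the write happened
// timeSinceWrite ago; after discounting the error margin, a replica whose
// measured lag is smaller than the remainder must already have that write.
func canServeConsistentRead(timeSinceWrite, measuredLag time.Duration) bool {
	effective := timeSinceWrite - errorMargin
	if effective <= 0 {
		return false // the write is too recent to reason about within our margin
	}
	return measuredLag < effective
}

func main() {
	// "I made a write 350ms ago": with a 100ms margin we compare lag against 250ms.
	fmt.Println(canServeConsistentRead(350*time.Millisecond, 200*time.Millisecond)) // true
	fmt.Println(canServeConsistentRead(350*time.Millisecond, 300*time.Millisecond)) // false
}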

Slide 41

Slide 41 text

Everything is terrible
100ms is where interesting stuff happens, and it’s within our error margin.

Slide 42

Slide 42 text

The metrics dilemma
Can’t we just reduce the interval?

Slide 43

Slide 43 text

Act III

Slide 44

Slide 44 text

Beyond our control

Slide 45

Slide 45 text

Latency

Slide 46

Slide 46 text

High latency networks
Minimal lag

Slide 47

Slide 47 text

Latency: consistent reads
App close to writer node, far from replica

Slide 48

Slide 48 text

Latency: consistent reads
App close to writer node, far from replica

Slide 49

Slide 49 text

Skewed clocks

Slide 50

Slide 50 text

Heartbeat injection
Injected at 15:07:00.000 (timeline shown in 50ms ticks)

Slide 51

Slide 51 text

Heartbeat injection: applied on skewed replica
Injected at 15:07:00.000, applied at 15:07:00.004, but the skewed replica clock reads 15:06:59.994

Slide 52

Slide 52 text

Heartbeat injection: read by app
Injected at 15:07:00.000, applied at 15:07:00.004 (replica clock reads 15:06:59.994), read by the app at 15:07:00.007; computed lag: -0.003

Slide 53

Slide 53 text

Heartbeat injection on skewed master
The skewed master injects heartbeat value 15:07:00.025 at the start of the timeline

Slide 54

Slide 54 text

Heartbeat injection: applied on skewed replica
Heartbeat value 15:07:00.025, applied on the replica at 15:07:00.004

Slide 55

Slide 55 text

Heartbeat injection: read by app
Heartbeat value 15:07:00.025, applied at 15:07:00.004, read by the app at 15:07:00.007; computed lag: -0.018

Slide 56

Slide 56 text

Timer skew

Slide 57

Slide 57 text

GC

Slide 58

Slide 58 text

VM

Slide 59

Slide 59 text

Granularity limitation

Slide 60

Slide 60 text

Everything is still terrible

Slide 61

Slide 61 text

Atomic clocks

Slide 62

Slide 62 text

Clock synchronization: verification

Slide 63

Slide 63 text

A late mitigation

Slide 64

Slide 64 text

An untimely postlude:
 
 Can we do without clocks?

Slide 65

Slide 65 text

Consensus protocols

Slide 66

Slide 66 text

Lamport timestamps
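The slide is title-only, so as a quick refresher here is a minimal Lamport clock in Go (an illustration, not something from the talk): each node keeps a counter, bumps it on every local event, and on receiving a remote timestamp jumps past it. That yields a causal ordering of events with no wall clocks involved.

package main

import "fmt"

// LamportClock is a minimal logical clock: it orders events causally
// without reading any wall clock.
type LamportClock struct {
	counter uint64
}

// Tick advances the clock for a local event and returns its timestamp.
func (c *LamportClock) Tick() uint64 {
	c.counter++
	return c.counter
}

// Observe merges a timestamp received from another node:
// the local clock jumps past it, preserving causality.
func (c *LamportClock) Observe(received uint64) uint64 {
	if received > c.counter {
		c.counter = received
	}
	c.counter++
	return c.counter
}

func main() {
	var writer, replica LamportClock
	t := writer.Tick()              // writer commits a transaction at logical time t
	fmt.Println(replica.Observe(t)) // replica applies it at a strictly later logical time
}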

Slide 67

Slide 67 text

MySQL: GTID
Each transaction generates a GTID:
  00020192-1111-1111-1111-111111111111:830541
Each server keeps track of gtid_executed: all transactions ever executed:
  00020192-1111-1111-1111-111111111111:1-830541

SELECT GTID_SUBSET(
  '00020192-1111-1111-1111-111111111111:830541',
  @@gtid_executed
);
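To make the clock-free idea concrete, a hedged Go sketch of a read-your-own-writes check built on GTID_SUBSET: after committing on the master, the application asks a replica whether the transaction's GTID is already contained in that replica's gtid_executed. The DSN, credentials and the go-sql-driver/mysql driver are illustrative assumptions, not part of the talk.

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql" // MySQL driver (assumed available)
)

func main() {
	// DSN and GTID value are placeholders for illustration.
	replica, err := sql.Open("mysql", "app:password@tcp(replica-host:3306)/")
	if err != nil {
		panic(err)
	}
	defer replica.Close()

	// The GTID of the transaction we just wrote on the master.
	writeGTID := "00020192-1111-1111-1111-111111111111:830541"

	// GTID_SUBSET returns 1 if our transaction is contained in everything
	// this replica has already executed, i.e. our write is visible here,
	// with no clocks involved.
	var applied int
	err = replica.QueryRow(
		"SELECT GTID_SUBSET(?, @@global.gtid_executed)", writeGTID,
	).Scan(&applied)
	if err != nil {
		panic(err)
	}
	if applied == 1 {
		fmt.Println("replica has our write: safe to read here")
	} else {
		fmt.Println("replica is behind: read elsewhere or wait")
	}
}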

Slide 68

Slide 68 text

And yet the search for time metrics endures…

Slide 69

Slide 69 text

Questions? github.com/shlomi-noach @ShlomiNoach Thank you!