Monitoring time in distributed
databases: a play in three acts
Shlomi Noach
GitHub
StatsCraft 2019
Slide 2
Slide 2 text
Agenda
TL;DR: time adventures and
mishaps
Throttling
Consistent reads
And all that follows
Slide 3
Slide 3 text
About me
@github/database-infrastructure
Author of orchestrator, gh-ost, freno, ccql
and other open source tools.
Blog at http://openark.org
github.com/shlomi-noach
@ShlomiNoach
Slide 4
Slide 4 text
GitHub
Built for developers
Largest open source hosting
100M+ repositories
36M+ developers
1B+ contributions
Largest supplier of octocat T-Shirts and stickers
Slide 5
Slide 5 text
Prelude
Slide 6
Slide 6 text
Asynchronous replication
Single writer node
Asynchronous replicas
Multi layered
Scale reads across replicas
Slide 7
Slide 7 text
Replication lag
Desired behavior: smallest possible lag
• Consistent reads (aka read your own writes)
• Faster/lossless/less lossy failovers
Slide 8
Slide 8 text
Replication lag
Slide 9
Slide 9 text
Replication lag
Slide 10
Slide 10 text
Measuring lag via heartbeat
Inject heartbeat on master
Read replicated value on replica, compare with time now()
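A minimal sketch of the comparison described above, assuming the heartbeat is a timestamp written on the master and read back on the replica. The function name and the clamp to zero are illustrative assumptions, not taken from the talk or from any specific tool.

```python
import time


def heartbeat_lag_seconds(replicated_heartbeat_ts, now=None):
    """Return lag as local time now minus the heartbeat value read on the replica.

    replicated_heartbeat_ts: timestamp (seconds) the master injected,
    as seen on the replica. Hypothetical helper for illustration.
    """
    if now is None:
        now = time.time()
    # Assumption: clamp at zero, since clock differences between hosts
    # could otherwise produce a negative lag.
    return max(0.0, now - replicated_heartbeat_ts)


# Master injected a heartbeat at t=1000.000; the replica is read at t=1000.120.
lag = heartbeat_lag_seconds(1000.000, now=1000.120)
print(f"lag is roughly {lag * 1000:.0f}ms")
```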
Lag monitor service
freno to monitor replication lag:
• Polls all replicas at 50ms interval
• Aggregates data per cluster at 25ms interval
• https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/
• https://github.com/github/freno
Actual safety margins:
50ms freno sampling interval
25ms freno aggregation interval
Allow additional 25ms for “extra complications”
Total 100ms
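The arithmetic behind that total, written out as a small sketch. The constant names are made up for illustration; the values are the ones listed above.

```python
# Values from the slide; names are hypothetical.
SAMPLING_INTERVAL_MS = 50      # freno polls replicas
AGGREGATION_INTERVAL_MS = 25   # freno aggregates per cluster
EXTRA_COMPLICATIONS_MS = 25    # additional allowance

ERROR_MARGIN_MS = (
    SAMPLING_INTERVAL_MS + AGGREGATION_INTERVAL_MS + EXTRA_COMPLICATIONS_MS
)
print(ERROR_MARGIN_MS)  # 100
```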
Slide 36
Slide 36 text
Throttling:
granularity is not important
Slide 37
Slide 37 text
Granularity is important
Slide 38
Slide 38 text
Objective: consistent reads
Slide 39
Slide 39 text
Consistent reads,
aka read-your-own-writes
A classic problem of distributed databases
[Diagram: write, expect data]
Slide 40
Slide 40 text
Consistent read checks
App asks freno:
“I made a write 350ms ago. Are all replicas up to date?”
Client auto-requires 100ms error margin
We compare replication lag with 250ms
[Diagram: write, read, check]
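The check on this slide can be sketched as follows. This is only an illustration of the arithmetic (350ms minus the 100ms margin gives the 250ms threshold); the function name and signature are hypothetical and are not freno's API.

```python
ERROR_MARGIN_MS = 100  # the margin the client requires automatically


def replicas_up_to_date(write_age_ms, replication_lag_ms,
                        margin_ms=ERROR_MARGIN_MS):
    """Return True if replicas can be assumed to contain a write of this age.

    The measured lag is compared against the write's age minus the error
    margin. Hypothetical helper for illustration.
    """
    threshold_ms = write_age_ms - margin_ms
    return replication_lag_ms < threshold_ms


# A write made 350ms ago is compared against a 250ms threshold.
print(replicas_up_to_date(350, replication_lag_ms=200))  # True
print(replicas_up_to_date(350, replication_lag_ms=300))  # False
```

Under this sketch, a write younger than the margin itself can never be confirmed, since its threshold is zero or negative.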
Slide 41
Slide 41 text
Everything is terrible
100ms is where interesting stuff happens, and it’s within our
error margin.
[Diagram: write, read, check]
Slide 42
Slide 42 text
The metrics dilemma
Can’t we just reduce the interval?
Slide 43
Slide 43 text
Act III
Slide 44
Slide 44 text
Beyond our
control
Slide 45
Slide 45 text
Latency
Slide 46
Slide 46 text
High latency networks
Minimal lag
Slide 47
Slide 47 text
Latency: consistent reads
App close to writer node, far from replica
[Diagram: write, check lag]
Slide 48
Slide 48 text
Latency: consistent reads
App close to writer node, far from replica
[Diagram: write, check lag]
MySQL: GTID
Each transaction generates a GTID:
00020192-1111-1111-1111-111111111111:830541
Each server keeps track of gtid_executed: all transactions ever
executed:
00020192-1111-1111-1111-111111111111:1-830541
SELECT GTID_SUBSET(
'00020192-1111-1111-1111-111111111111:830541',
@@gtid_executed
);
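To make the containment check concrete, here is a simplified model of what a GTID subset test computes. It is a sketch for illustration only, not MySQL's implementation: it assumes well-formed input and that each interval being tested fits inside a single interval of the superset. The helper names are hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]}, inclusive."""
    result = {}
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction number 'n' is treated as the range n-n.
            ranges.append((int(start), int(end or start)))
    return result


def gtid_subset(subset, superset):
    """Return True if every GTID in `subset` also appears in `superset`."""
    sup = parse_gtid_set(superset)
    for uuid, ranges in parse_gtid_set(subset).items():
        for start, end in ranges:
            # Simplification: the interval must fit inside one superset interval.
            if not any(s <= start and end <= e for s, e in sup.get(uuid, [])):
                return False
    return True


UUID = "00020192-1111-1111-1111-111111111111"
executed = f"{UUID}:1-830541"                      # the server's gtid_executed
print(gtid_subset(f"{UUID}:830541", executed))     # True
print(gtid_subset(f"{UUID}:830542", executed))     # False
```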