Monitoring time in a distributed database: a play in three acts

Monitoring time is tricky given its fluid nature, and doing so across distributed database hosts is trickier still. Latency, probe intervals, and clock synchronization all affect the metrics, and taking action based on those metrics makes matters even more complex. How does one measure time? What is the baseline? What accuracy and tradeoffs can we expect? Can we use time itself to affect the outcome? At GitHub, we monitor time in our database topologies for throttling and consistent-read purposes. We present our use case and our findings.

Shlomi Noach

May 14, 2019

Transcript

  1. Monitoring time in distributed databases: a play in three acts
     Shlomi Noach, GitHub, StatsCraft 2019
  2. Agenda
     TL;DR: time adventures and mishaps
     Throttling, consistent reads, and all that follows
  3. About me
     @github/database-infrastructure
     Author of orchestrator, gh-ost, freno, ccql and other open source tools.
     Blog at http://openark.org
     github.com/shlomi-noach
     @ShlomiNoach
  4. GitHub
     Built for developers
     Largest open source hosting: 100M+ repositories, 36M+ developers, 1B+ contributions
     Largest supplier of octocat T-shirts and stickers
  5. Prelude

  6. Asynchronous replication
     Single writer node, asynchronous replicas, multi-layered
     Scale reads across replicas
  7. Replication lag
     Desired behavior: smallest possible lag
     • Consistent reads (aka read your own writes)
     • Faster/lossless/less lossy failovers
  8. Replication lag (diagram)

  9. Replication lag (diagram)

  10. Measuring lag via heartbeat
     Inject heartbeat on master
     Read the replicated value on a replica, compare with time now()
  11. Inject and read
     Heartbeat generated locally on the writer node
     Inject on the writer; read & compare on each replica
  12. Heartbeat

     create table heartbeat (
       anchor int unsigned not null,
       ts timestamp(6),
       primary key (anchor)
     );
  13. Heartbeat: inject on master

     create table heartbeat (
       anchor int unsigned not null,
       ts timestamp(6),
       primary key (anchor)
     );

     replace into heartbeat values (
       1, now(6)
     );
  14. Heartbeat: read on replica

     create table heartbeat (
       anchor int unsigned not null,
       ts timestamp(6),
       primary key (anchor)
     );

     select
       unix_timestamp(now(6)) - unix_timestamp(ts) as lag
     from
       heartbeat
     where
       anchor = 1;
  15. Replication lag: graphing (graph)

  16. Act I

  17. Objective: throttling

  18. Throttling
     Break large writes into small tasks
     Allow writes to take place when lag is low; hold off writes when lag is high
     Threshold: 1sec
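The throttling scheme above can be sketched in a few lines. `get_lag` and `apply_chunk` are hypothetical caller-supplied hooks, not freno's or gh-ost's actual API: one stands in for "poll the cluster's current lag", the other for "execute one small write":

```python
import time

LAG_THRESHOLD = 1.0  # seconds: the 1sec threshold from the slide

def throttled_copy(chunks, get_lag, apply_chunk, backoff=0.1):
    """Break a large write into small tasks; before each task,
    hold off while replication lag exceeds the threshold."""
    applied = 0
    for chunk in chunks:
        while get_lag() > LAG_THRESHOLD:
            time.sleep(backoff)  # back off, then re-check the lag
        apply_chunk(chunk)
        applied += 1
    return applied
```

A production implementation would also bound the total wait and treat an unreachable replica as lagging rather than spinning forever.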
  19. Heartbeat injection (timeline): injected at 15:07:00.000

  20. Heartbeat injection: applied on replica at 15:07:00.004

  21. Heartbeat injection: read by app at 15:07:00.007 → measured lag 0.007

  22. Heartbeat injection: delayed app read at 15:07:00.047 → measured lag 0.047

  23. Heartbeat injection: delayed apply at 15:07:00.044, read at 15:07:00.047 → measured lag 0.047
  24. Heartbeat injection: granularity +50ms

  25. Act II

  26. Practical constraints

  27. Lag monitor service
     freno monitors replication lag:
     • Polls all replicas at 50ms intervals
     • Aggregates data per cluster at 25ms intervals
     • https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/
     • https://github.com/github/freno
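As a sketch of the aggregation step, assuming (for illustration only; this is not freno's documented behavior) that per-cluster aggregation takes the worst lag among the polled replicas:

```python
POLL_INTERVAL = 0.050       # freno polls every replica at 50ms intervals
AGGREGATE_INTERVAL = 0.025  # and aggregates per cluster at 25ms intervals

def aggregate_cluster_lag(replica_lags):
    """Collapse the latest per-replica lag samples (in seconds) into a
    single per-cluster value. max() is an illustrative choice: the
    cluster is only considered caught up when its slowest replica is."""
    return max(replica_lags)
```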
  28. Heartbeat injection (timeline): injected at 15:07:00.000

  29. Heartbeat injection: applied on replica at 15:07:00.004

  30. Heartbeat injection: read by freno at 15:07:00.007 → measured lag 0.007

  31. Heartbeat injection: read by app at 15:07:00.009 (freno's reading: lag 0.007)

  32. Heartbeat injection: delayed app read at 15:07:00.048 (freno's 0.007 reading is now stale)

  33. Delayed app read, broken replica: at 15:07:00.048 the replica has broken, but freno still reports the 0.007 reading from 15:07:00.007
  34. Heartbeat injection with freno: granularity ±50ms

  35. Actual safety margins:
     50ms freno sampling interval
     25ms freno aggregation interval
     Allow an additional 25ms for "extra complications"
     Total: 100ms
  36. Throttling: granularity is not important

  37. Granularity is important

  38. Objective: consistent reads

  39. Consistent reads, aka read-your-own-writes
     A classic problem of distributed databases: write on the master, expect the data on a replica read
  40. Consistent read checks
     App asks freno: "I made a write 350ms ago. Are all replicas up to date?"
     The client auto-requires a 100ms error margin, so we compare replication lag with 250ms
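The check on the slide reduces to a single comparison. The sketch below is a hypothetical helper shape, not freno's real API: the app reports how long ago it wrote, and the reported lag plus the client's error margin must fit inside that window (350ms − 100ms = 250ms in the slide's example):

```python
ERROR_MARGIN = 0.100  # seconds; the margin the client auto-requires

def replicas_caught_up(elapsed_since_write, current_lag, margin=ERROR_MARGIN):
    """A write made `elapsed_since_write` seconds ago should be visible
    on replicas whose lag, padded by the error margin, fits within
    that window."""
    return current_lag + margin <= elapsed_since_write
```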
  41. Everything is terrible
     100ms is where the interesting stuff happens, and it's within our error margin.
  42. The metrics dilemma
     Can't we just reduce the interval?
  43. Act III

  44. Beyond our control

  45. Latency

  46. High latency networks, minimal lag (diagram)
  47. Latency: consistent reads
     App close to the writer node, far from the replica: write, then check lag

  48. Latency: consistent reads
     App close to the writer node, far from the replica: write, then check lag
  49. Skewed clocks

  50. Heartbeat injection (timeline): injected at 15:07:00.000

  51. Heartbeat injection: applied on skewed replica at 15:07:00.004, where the replica's clock reads 15:06:59.994

  52. Heartbeat injection: read by app at 15:07:00.007 → measured lag -0.003

  53. Heartbeat injection on skewed master: the heartbeat is timestamped 15:07:00.025

  54. Heartbeat injection: applied on skewed replica at 15:07:00.004 (heartbeat carries ts 15:07:00.025)

  55. Heartbeat injection: read by app at 15:07:00.007 → measured lag -0.018
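The two skew scenarios above are pure arithmetic: the heartbeat query subtracts a master-written timestamp from the replica's wall clock, so any skew between the two clocks lands, sign and all, in the "lag". A minimal reconstruction using the slides' numbers (times as seconds past 15:07:00):

```python
def measured_lag(heartbeat_ts, replica_clock_now):
    """Lag exactly as the heartbeat SELECT computes it: the replica's
    idea of now() minus the ts value the master wrote."""
    return replica_clock_now - heartbeat_ts

# Scenario 1: replica clock 10ms behind; app reads 7ms after injection.
lag_skewed_replica = measured_lag(heartbeat_ts=0.000,
                                  replica_clock_now=0.007 - 0.010)

# Scenario 2: master clock 25ms ahead, so ts reads 0.025; replica clock accurate.
lag_skewed_master = measured_lag(heartbeat_ts=0.025,
                                 replica_clock_now=0.007)

# Both come out negative (-0.003 and -0.018), matching the slides.
```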
  56. Timer skew

  57. GC

  58. VM

  59. Granularity limitation

  60. Everything is still terrible

  61. Atomic clocks

  62. Clock synchronization: verification

  63. A late mitigation

  64. An untimely postlude: can we do without clocks?

  65. Consensus protocols

  66. Lamport timestamps

  67. MySQL: GTID
     Each transaction generates a GTID:
     00020192-1111-1111-1111-111111111111:830541

     Each server keeps track of gtid_executed, the set of all transactions ever executed:
     00020192-1111-1111-1111-111111111111:1-830541

     SELECT GTID_SUBSET(
       '00020192-1111-1111-1111-111111111111:830541',
       @@gtid_executed
     );
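To make the containment check concrete, here is a toy re-implementation of the single-UUID, single-interval case that GTID_SUBSET handles above (MySQL's real function accepts full GTID sets with multiple UUIDs and interval lists):

```python
def gtid_in_executed(gtid, gtid_executed):
    """Is transaction `uuid:N` contained in the executed set `uuid:LO-HI`?
    Toy version: one UUID, one interval."""
    uuid, _, txn = gtid.rpartition(":")
    exec_uuid, _, interval = gtid_executed.rpartition(":")
    if uuid != exec_uuid:
        return False  # transaction originated on a different server
    lo, _, hi = interval.partition("-")
    hi = hi or lo  # a bare "uuid:N" executed set means the interval N-N
    return int(lo) <= int(txn) <= int(hi)
```

On a replica, the app can run the slide's SELECT with the GTID of its own write: a result of 1 means the replica has applied that transaction, so a consistent read is safe, with no clock involved.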
  68. And yet the search for time metrics endures…

  69. Questions? github.com/shlomi-noach @ShlomiNoach Thank you!