
Monitoring time in a distributed database: a play in three acts

Monitoring time is tricky given its fluid nature, and doing so across distributed database hosts is trickier still. Latency, probe intervals, and clock synchronization all affect the metrics, and taking action based on those metrics makes matters even more complex. How does one measure time? What is the baseline? What accuracy and tradeoffs can we expect? Can we use time itself to affect the outcome? At GitHub, we monitor time in our database topologies for throttling and for consistent reads. We present our use case and our findings.

Shlomi Noach

May 14, 2019

Transcript

  1. Monitoring time in distributed databases: a play in three acts
    Shlomi Noach
    GitHub
    StatsCraft 2019


  2. Agenda
    TL;DR: time adventures and mishaps

    Throttling
    Consistent reads
    And all that follows


  3. About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql
    and other open source tools.
    Blog at http://openark.org

    github.com/shlomi-noach

    @ShlomiNoach


  4. GitHub

    Built for developers
    Largest open source hosting
    100M+ repositories

    36M+ developers

    1B+ contributions
    Largest supplier of octocat T-Shirts and stickers


  5. Asynchronous replication
    Single writer node
    Asynchronous replicas
    Multi layered
    Scale reads across replicas

  6. Replication lag
    Desired behavior: smallest possible lag
    • Consistent reads (aka read your own writes)
    • Faster/lossless/less lossy failovers

  7. Replication lag

  8. Replication lag

  9. Measuring lag via heartbeat
    Inject heartbeat on master
    Read replicated value on replica, compare with time now()

  10. Inject and read
    Heartbeat generated locally on writer node
    (diagram: heartbeat injected on the writer node; each replica reads the value and compares)

  11. Heartbeat
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );

  12. Heartbeat: inject on master
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );
    replace into heartbeat values (
      1, now(6)
    );

  13. Heartbeat: read on replica
    create table heartbeat (
      anchor int unsigned not null,
      ts timestamp(6),
      primary key (anchor)
    );
    select
      unix_timestamp(now(6)) - unix_timestamp(ts) as lag
    from
      heartbeat
    where
      anchor = 1;

  14. Replication lag: graphing

  15. Objective: throttling


  16. Throttling
    Break large writes into small tasks
    Allow writes to take place if lag is low
    Hold off writes when lag is high
    Threshold: 1sec
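
    A minimal Go sketch of this throttling loop; checkLag and copyNextChunk are hypothetical
    placeholders (at GitHub the lag check would be a call to freno), and the 1sec threshold is
    the one above:

    package throttle

    import (
        "context"
        "time"
    )

    // Hypothetical helpers, not a real API:
    //   checkLag asks the lag monitor for the cluster's current aggregated lag.
    //   copyNextChunk performs one small write task (e.g. copies one batch of
    //   rows) and reports whether the overall job is done.
    var (
        checkLag      func(ctx context.Context) (time.Duration, error)
        copyNextChunk func(ctx context.Context) (done bool, err error)
    )

    const lagThreshold = time.Second // the 1sec threshold above

    // copyTableInChunks breaks a large write into small tasks and only issues
    // the next task while replication lag is below the threshold.
    func copyTableInChunks(ctx context.Context) error {
        for {
            lag, err := checkLag(ctx)
            if err != nil || lag > lagThreshold {
                // Lag is high (or unknown): hold off writes and re-check shortly.
                time.Sleep(100 * time.Millisecond)
                continue
            }
            done, err := copyNextChunk(ctx)
            if err != nil {
                return err
            }
            if done {
                return nil
            }
        }
    }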


  17. Heartbeat injection
    (timeline: heartbeat injected on the master at 15:07:00.000)

  18. Heartbeat injection: applied on replica
    (timeline: injected at 15:07:00.000; applied on the replica at 15:07:00.004)

  19. Heartbeat injection: read by app
    (timeline: injected at 15:07:00.000; applied at 15:07:00.004; app reads at 15:07:00.007 and measures lag 0.007s)

  20. Heartbeat injection: delayed app read
    (timeline: injected at 15:07:00.000; applied at 15:07:00.004; app reads at 15:07:00.047 and measures lag 0.047s)

  21. Heartbeat injection: delayed apply
    (timeline: injected at 15:07:00.000; applied at 15:07:00.044; app reads at 15:07:00.047 and measures lag 0.047s)

  22. Heartbeat injection: granularity
    +50ms


  23. Practical constraints


  24. Lag monitor service
    We use freno to monitor replication lag:
    • Polls all replicas at a 50ms interval
    • Aggregates data per cluster at a 25ms interval
    • https://githubengineering.com/mitigating-replication-lag-and-reducing-read-load-with-freno/
    • https://github.com/github/freno
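
    A rough Go sketch of that poll-and-aggregate loop, using the 50ms and 25ms intervals above;
    probeReplica and the "report the worst replica" aggregation rule are illustrative assumptions,
    not freno's actual implementation:

    package lagmonitor

    import (
        "sync"
        "time"
    )

    // probeReplica is a hypothetical helper that runs the heartbeat SELECT shown
    // earlier against one replica and returns the measured lag.
    var probeReplica func(replica string) (time.Duration, error)

    // Monitor keeps the latest lag sample per replica plus an aggregated
    // per-cluster value that clients ask about.
    type Monitor struct {
        mu         sync.Mutex
        replicas   []string
        lags       map[string]time.Duration // latest sample per replica
        clusterLag time.Duration            // aggregated value served to clients
    }

    // Run polls every replica at a 50ms interval and aggregates at a 25ms
    // interval, mirroring the intervals listed above.
    func (m *Monitor) Run() {
        poll := time.NewTicker(50 * time.Millisecond)
        aggregate := time.NewTicker(25 * time.Millisecond)
        for {
            select {
            case <-poll.C:
                for _, r := range m.replicas {
                    if lag, err := probeReplica(r); err == nil {
                        m.mu.Lock()
                        if m.lags == nil {
                            m.lags = make(map[string]time.Duration)
                        }
                        m.lags[r] = lag
                        m.mu.Unlock()
                    }
                }
            case <-aggregate.C:
                m.mu.Lock()
                worst := time.Duration(0)
                for _, lag := range m.lags {
                    if lag > worst {
                        worst = lag // assumption: aggregate = worst replica lag
                    }
                }
                m.clusterLag = worst
                m.mu.Unlock()
            }
        }
    }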


  25. Heartbeat injection
    (timeline: heartbeat injected on the master at 15:07:00.000)

  26. Heartbeat injection: applied on replica
    (timeline: injected at 15:07:00.000; applied on the replica at 15:07:00.004)

  27. Heartbeat injection: read by freno
    (timeline: injected at 15:07:00.000; applied at 15:07:00.004; freno samples at 15:07:00.007 and records lag 0.007s)

  28. Heartbeat injection: read by app
    (timeline: freno recorded lag 0.007s at 15:07:00.007; the app asks freno at 15:07:00.009)

  29. Heartbeat injection: delayed app read
    (timeline: freno recorded lag 0.007s at 15:07:00.007; the app asks freno only at 15:07:00.048 and still gets that sample)

  30. Delayed app read, broken replica
    (timeline: freno recorded lag 0.007s at 15:07:00.007; the replica has since broken, yet the app asking at 15:07:00.048 still gets the stale 0.007s sample)

  31. Heartbeat injection with freno: granularity
    ±50ms


  32. Actual safety margins:
    50ms freno sampling interval
    25ms freno aggregation interval
    Allow additional 25ms for “extra complications”
    Total 100ms


  33. Throttling: granularity is not important


  34. Granularity is important


  35. Objective: consistent reads


  36. Consistent reads, aka read-your-own-writes
    A classic problem of distributed databases
    (diagram: the app writes to the master, then expects to read that data back from a replica)

  37. Consistent read checks
    App asks freno:
    “I made a write 350ms ago. Are all replicas up to date?”
    The client automatically adds a 100ms error margin,
    so we compare replication lag against 250ms
    (diagram: write to the master, check with freno, read from a replica)
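
    A Go sketch of that check from the client's side; clusterLag is a hypothetical stand-in for
    querying freno (whose real interface is an HTTP API), and the 100ms margin is the one
    derived earlier:

    package consistentreads

    import "time"

    // clusterLag is a hypothetical stand-in for asking freno for the cluster's
    // aggregated replication lag.
    var clusterLag func() (time.Duration, error)

    // errorMargin covers sampling and aggregation delays, per the margins above.
    const errorMargin = 100 * time.Millisecond

    // replicasCaughtUp answers: a write was made `elapsed` ago; have all
    // replicas caught up with it? The margin is deducted first, so a write made
    // 350ms ago is compared against a 250ms lag, as in the example above.
    func replicasCaughtUp(elapsed time.Duration) bool {
        lag, err := clusterLag()
        if err != nil {
            // Unknown lag: play it safe and read from the writer instead.
            return false
        }
        return lag <= elapsed - errorMargin
    }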


  38. Everything is terrible
    100ms is where interesting stuff happens, and it’s within our error margin.


  39. The metrics dilemma
    Can’t we just reduce the interval?


  40. Beyond our control


  41. High latency networks
    Minimal lag

  42. Latency: consistent reads
    App close to writer node, far from replica
    (diagram: the app writes to the nearby writer node and checks lag on the distant replica)

  43. Latency: consistent reads
    App close to writer node, far from replica

  44. Skewed clocks


  45. Heartbeat injection
    (timeline: heartbeat injected on the master at 15:07:00.000)

  46. Heartbeat injection: applied on skewed replica
    (timeline: injected at 15:07:00.000; applied on the replica at 15:07:00.004, but the replica's skewed clock reads 15:06:59.994)

  47. Heartbeat injection: read by app
    (timeline: injected at 15:07:00.000; applied at 15:07:00.004 with the replica clock reading 15:06:59.994; app reads at 15:07:00.007 and measures lag -0.003s)

  48. Heartbeat injection on skewed master
    (timeline: heartbeat injected at 15:07:00.000, but the skewed master clock stamps it 15:07:00.025)

  49. Heartbeat injection: applied on skewed replica
    (timeline: heartbeat stamped 15:07:00.025 by the skewed master; applied on the replica at 15:07:00.004)

  50. Heartbeat injection: read by app
    (timeline: heartbeat stamped 15:07:00.025; applied at 15:07:00.004; app reads at 15:07:00.007 and measures lag -0.018s)
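
    Both skewed readings follow the same identity:
    measured lag = (replica clock at read time) - (heartbeat timestamp written by the master)
                 = true elapsed time since injection + (replica clock offset - master clock offset)
    With the replica 10ms behind: 0.007 + (-0.010 - 0) = -0.003. With the master 25ms ahead:
    0.007 + (0 - 0.025) = -0.018. Clock skew shifts every measurement by the relative offset
    between the two clocks, regardless of how often we poll.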


  51. Granularity limitation


  52. Everything is still terrible


  53. Atomic clocks


  54. Clock synchronization: verification


  55. A late mitigation


  56. An untimely postlude: can we do without clocks?


  57. Consensus protocols


  58. Lamport timestamps


  59. MySQL: GTID
    Each transaction generates a GTID:
    00020192-1111-1111-1111-111111111111:830541
    Each server keeps track of gtid_executed: all transactions ever executed:
    00020192-1111-1111-1111-111111111111:1-830541
    SELECT GTID_SUBSET(
      '00020192-1111-1111-1111-111111111111:830541',
      @@gtid_executed
    );
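
    As a sketch of how this enables read-your-own-writes without relying on clocks: assume the
    app captured the GTID of its own write (for example via MySQL's session_track_gtids; the
    capture mechanism is outside this sketch), then ask a replica whether that GTID is already
    contained in its gtid_executed. A minimal Go helper:

    package gtidreads

    import (
        "context"
        "database/sql"
    )

    // writeHasReplicated reports whether the transaction identified by gtid
    // (e.g. "00020192-1111-1111-1111-111111111111:830541", captured right after
    // the write) is already contained in the replica's gtid_executed set.
    func writeHasReplicated(ctx context.Context, replica *sql.DB, gtid string) (bool, error) {
        var ok bool
        err := replica.QueryRowContext(ctx,
            "SELECT GTID_SUBSET(?, @@gtid_executed)", gtid,
        ).Scan(&ok)
        return ok, err
    }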


  60. And yet the search for time metrics endures…


  61. Questions?
    github.com/shlomi-noach
    @ShlomiNoach
    Thank you!
