Hedged requests in Go

Oleg Kovalov
September 20, 2023

Transcript

  1. Hedged requests in Go
    Golang Warsaw #53 (autumn) 2023
    Oleg Kovalov
    olegk.dev

  2. Me
    2

  3. - Open-source addicted gopher
    Me
    3

  4. - Open-source addicted gopher
    - Fan of linters (co-author of go-critic)
    Me
    4

  5. - Open-source addicted gopher
    - Fan of linters (co-author of go-critic)
    - Father of a labrador
    Me
    5

  6. - Open-source addicted gopher
    - Fan of linters (co-author of go-critic)
    - Father of a labrador
    - go-perf.dev
    - ...
    olegk.dev
    Me
    6

  7. - Intro
    - Magic
    - Result
    - Announcement
    - Conclusions
    Agenda
    7

  8. Intro
    8
    [Diagram: Client → Server]

  9. Intro
    9
    [Diagram: Client → Server, with latency between them]

  10. 10
    Latency
    Anyone can buy bandwidth.
    But latency is from the Gods.
    (c)

  11. Intro
    11
    [Diagram: multiple clients → LB → multiple servers]

  12. Intro
    12
    [Diagram: multiple clients → LB → multiple servers]
    *cough* service mesh *cough*

  13. Intro
    13
    [Diagram: multiple clients → LB → multiple servers]

  15. Intro
    15
    [Diagram: multiple clients → multiple LBs → multiple servers]

  16. Variability
    16

  17. - Deploy
    - Packet drop
    - Cache eviction
    - Queue overflow
    - Connection pool
    - Background jobs
    - Garbage collection
    - Windows Update
    - …
    Variability
    17

  18. 18
    Latency
    https://last9.io/blog/latency-slo/

  19. 19
    Latency
    https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/

  20. 20
    SLA

  21. Tail at scale
    21

  22. - Paper published by Googlers
    Tail at scale
    22

  23. - Paper published by Googlers
    - Jeffrey Dean, Luiz André Barroso
    - 2013
    Tail at scale
    23

  24. - Paper published by Googlers
    - Jeffrey Dean, Luiz André Barroso
    - 2013
    - More than just hedged requests
    - See: https://www.barroso.org/publications/TheTailAtScale.pdf
    Tail at scale
    24

  25. - Paper published by Googlers
    - Jeffrey Dean, Luiz André Barroso
    - 2013
    - More than just hedged requests
    - See: https://www.barroso.org/publications/TheTailAtScale.pdf
    You are not Google
    https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
    Tail at scale
    25

  26. - Paper published by Googlers
    - Jeffrey Dean, Luiz André Barroso
    - 2013
    - More than just hedged requests
    - See: https://www.barroso.org/publications/TheTailAtScale.pdf
    You are not Google
    https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
    Tail at scale != Tailscale
    Tail at scale
    26

  27. Idea
    27
    [Diagram: Client → LB → three servers]

  28. Idea
    28
    [Diagram: Client sends request 1 to the first of three servers]

  29. Idea
    29
    [Diagram: Client sends request 1, then request 2 to a second server]

  30. Idea
    30
    [Diagram: Client sends requests 1, 2 and 3, each to a different server]
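
    A minimal sketch of this idea in plain Go (the hedge helper below is illustrative, not the library code shown later): send the call, and if no reply arrives within a delay, send another copy to a different backend, up to a limit; the first reply wins and the remaining attempts are canceled.

    package main

    import (
        "context"
        "fmt"
        "math/rand"
        "time"
    )

    // hedge starts an extra attempt every `delay` until one of at most
    // `upto` attempts replies; the first reply wins, the rest are canceled.
    func hedge(ctx context.Context, upto int, delay time.Duration,
        call func(context.Context) (string, error)) (string, error) {

        ctx, cancel := context.WithCancel(ctx)
        defer cancel() // cancel the losing attempts once a winner returns

        results := make(chan string, upto)
        errs := make(chan error, upto)

        start := func() {
            go func() {
                if res, err := call(ctx); err != nil {
                    errs <- err
                } else {
                    results <- res
                }
            }()
        }

        start() // the primary attempt
        timer := time.NewTimer(delay)
        defer timer.Stop()

        sent, failed := 1, 0
        for {
            select {
            case res := <-results:
                return res, nil // first reply wins
            case err := <-errs:
                if failed++; failed == upto {
                    return "", err // every attempt failed
                }
                if sent < upto {
                    start() // an attempt failed fast: hedge immediately
                    sent++
                }
            case <-timer.C:
                if sent < upto {
                    start() // still no reply: send a hedged attempt
                    sent++
                    timer.Reset(delay)
                }
            case <-ctx.Done():
                return "", ctx.Err()
            }
        }
    }

    func main() {
        // a fake backend with highly variable latency
        slowCall := func(ctx context.Context) (string, error) {
            d := time.Duration(rand.Intn(200)) * time.Millisecond
            select {
            case <-time.After(d):
                return fmt.Sprintf("done in %v", d), nil
            case <-ctx.Done():
                return "", ctx.Err()
            }
        }

        fmt.Println(hedge(context.Background(), 3, 20*time.Millisecond, slowCall))
    }

    This first-response-wins loop is the same pattern that the hedgedhttp library discussed later packages up for net/http.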

  33. Existing implementations
    33

  34. - GitHub was empty!
    Existing implementations
    34

  35. - GitHub was empty!
    - Well, there were a few proofs of concept
    Existing implementations
    35

  36. - GitHub was empty!
    - Well, there were a few proofs of concept
    - But nothing production-ready
    - at least nothing I was ready to pick up
    Existing implementations
    36

  37. cristalhq/hedgedhttp
    37

  38. - Dependency-free
    cristalhq/hedgedhttp
    38

  39. - Dependency-free
    - Perfectly aligns with net/http
    cristalhq/hedgedhttp
    39

  40. - Dependency-free
    - Perfectly aligns with net/http
    - Optimized for speed
    cristalhq/hedgedhttp
    40

  41. - Dependency-free
    - Perfectly aligns with net/http
    - Optimized for speed
    - Battle-tested
    cristalhq/hedgedhttp
    41
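
    "Perfectly aligns with net/http" is meant literally: besides the client constructor shown on the next slide, the package can wrap a plain http.RoundTripper, so hedging drops into an existing client. A short sketch using the NewRoundTripper constructor (check the package docs for the exact signature); the URL and timeouts are only examples:

    package main

    import (
        "net/http"
        "time"

        "github.com/cristalhq/hedgedhttp"
    )

    func main() {
        // wrap any RoundTripper; hedging then happens transparently per request.
        rt, err := hedgedhttp.NewRoundTripper(10*time.Millisecond, 3, http.DefaultTransport)
        if err != nil {
            panic(err)
        }

        client := &http.Client{
            Transport: rt,
            Timeout:   time.Second,
        }

        resp, err := client.Get("https://example.com")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
    }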

  42. Example
    42
    delay := 10 * time.Millisecond
    upto := 7
    client := &http.Client{Timeout: time.Second}

    hedged, err := hedgedhttp.NewClient(delay, upto, client)
    if err != nil {
        panic(err)
    }

    // will take `upto` requests, with a `delay` between them
    resp, err := hedged.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
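
    For completeness, the same snippet as a full program; everything outside the hedgedhttp call (imports, URL, request construction, response handling) is illustrative and not from the slide:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "time"

        "github.com/cristalhq/hedgedhttp"
    )

    func main() {
        delay := 10 * time.Millisecond
        upto := 7
        client := &http.Client{Timeout: time.Second}

        hedged, err := hedgedhttp.NewClient(delay, upto, client)
        if err != nil {
            panic(err)
        }

        // hedging is safest for idempotent requests, such as a plain GET.
        req, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
        if err != nil {
            panic(err)
        }

        resp, err := hedged.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status, len(body), "bytes")
    }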

  46. Modern example
    46
    cfg := hedgedhttp.Config{
        Transport: http.DefaultTransport,
        Upto:      3,
        Delay:     10 * time.Millisecond,
        Next: func() (upto int, delay time.Duration) {
            return magic()
        },
    }

    client, err := hedgedhttp.New(cfg)
    if err != nil {
        panic(err)
    }

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
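
    One hypothetical way to implement the magic() callback above (the latencyWindow type and all numbers are made up for illustration): track recent request durations and hedge after roughly their 95th percentile, the kind of delay The Tail at Scale recommends.

    // latencyWindow keeps a bounded window of recent request durations.
    // (same package as the Config above; needs "sort", "sync" and "time")
    type latencyWindow struct {
        mu      sync.Mutex
        samples []time.Duration
    }

    func (w *latencyWindow) Observe(d time.Duration) {
        w.mu.Lock()
        defer w.mu.Unlock()
        w.samples = append(w.samples, d)
        if len(w.samples) > 1000 { // keep only the most recent samples
            w.samples = w.samples[len(w.samples)-1000:]
        }
    }

    func (w *latencyWindow) P95() time.Duration {
        w.mu.Lock()
        defer w.mu.Unlock()
        if len(w.samples) == 0 {
            return 10 * time.Millisecond // fallback until we have data
        }
        sorted := append([]time.Duration(nil), w.samples...)
        sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
        return sorted[len(sorted)*95/100]
    }

    var window latencyWindow

    func magic() (upto int, delay time.Duration) {
        // allow two hedged attempts, fired after the observed p95 latency
        return 3, window.P95()
    }

    Something still has to feed window.Observe with measured request durations; once it does, each new request picks up the current p95 as its hedging delay instead of a hard-coded value.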

  49. hedgedhttp internals (1)
    49
    func (ht *hedgedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
        mainCtx := req.Context()

        upto, timeout := ht.upto, ht.timeout
        if ht.next != nil {
            upto, timeout = ht.next()
        }

        // no hedged requests, just a regular one.
        if upto <= 0 {
            return ht.rt.RoundTrip(req)
        }

        // rollback to default timeout.
        if timeout < 0 {
            timeout = ht.timeout
        }

  50. hedgedhttp internals (2)
    50
    for sent := 0; len(errOverall.Errors) < upto; sent++ {
        if sent < upto {
            idx := sent
            subReq, cancel := reqWithCtx(req, mainCtx, idx != 0)
            cancels[idx] = cancel

            runInPool(func() {
                resp, err := ht.rt.RoundTrip(subReq)
                if err != nil {
                    ht.metrics.failedRoundTripsInc()
                    errorCh <- err
                } else {
                    resultCh <- indexedResp{idx, resp}
                }
            })
        }

  51. hedgedhttp internals (3)
    51
        // all requests sent - effectively disabling the timeout between requests
        if sent == upto {
            timeout = infiniteTimeout
        }

        resp, err := waitResult(mainCtx, resultCh, errorCh, timeout)

        switch {
        case resp.Resp != nil:
            resultIdx = resp.Index
            return resp.Resp, nil
        case mainCtx.Err() != nil:
            ht.metrics.canceledByUserRoundTripsInc()
            return nil, mainCtx.Err()
        case err != nil:
            errOverall.Errors = append(errOverall.Errors, err)
        }

  52. hedgedhttp internals (4)
    52
    func waitResult(ctx context.Context, resultCh <-chan indexedResp, errorCh <-chan error,
        timeout time.Duration) (indexedResp, error) {
        // ...
        select {
        case res := <-resultCh:
            return res, nil
        case reqErr := <-errorCh:
            return indexedResp{}, reqErr
        case <-ctx.Done():
            return indexedResp{}, ctx.Err()
        case <-timer.C:
            return indexedResp{}, nil // timeout BETWEEN consecutive requests
        }
    }

  53. Cute goroutine pool
    var taskQueue = make(chan func())

    func runInPool(task func()) {
        select {
        case taskQueue <- task: // submitted, everything is ok

        default:
            go func() {
                task()

                // stay alive for a while to serve more tasks;
                // cleanupDuration is defined elsewhere in the package.
                cleanupTicker := time.NewTicker(cleanupDuration)
                for {
                    select {
                    case t := <-taskQueue:
                        t()
                        cleanupTicker.Reset(cleanupDuration)
                    case <-cleanupTicker.C:
                        cleanupTicker.Stop()
                        return
                    }
                }
            }()
        }
    }
    53

  54. Bonus
    54

  55. - Same pattern works for gRPC
    Bonus
    55

  56. - Same pattern works for gRPC
    - and any other RPC (if your API is sane)
    Bonus
    56

  57. - Same pattern works for gRPC
    - and any other RPC (if your API is sane)
    https://github.com/cristalhq/hedgedgrpc (waits for your ⭐)
    Bonus
    57

  58. Results
    58

  59. Google BigTable bench
    59

  60. > For example, in a Google benchmark that reads the values for 1,000 keys
    stored in a BigTable table distributed across 100 different servers, sending a
    hedging request after a 10ms delay reduces the 99.9th-percentile latency for
    retrieving all 1,000 values from 1,800ms to 74ms while sending just 2% more
    requests.
    Google BigTable bench
    60
    https://www.barroso.org/publications/TheTailAtScale.pdf

  62. 62
    DoltHub

  63. See:
    https://www.dolthub.com/blog/2022-04-25-mitigating-3rd-party-variability/
    63
    DoltHub

  64. 64
    Grafana Tempo

  65. 65
    Grafana Tempo
    See:
    https://grafana.com/blog/2021/08/27/grafana-tempo-1.1-released-new-hedged-requests-reduce-latency-by-45/

  66. Silver bullet params
    66

  67. Announcement
    67

  68. No, two announcements.
    Announcement
    68

  69. go-distsys
    69

  70. - Reinventing the wheel
    go-distsys
    70

  71. - Reinventing the wheel
    - We should have better things
    go-distsys
    71

  72. - Reinventing the wheel
    - We should have better things
    - Everyone is welcome!
    https://github.com/go-distsys
    go-distsys
    72

  73. Go-perf meetup #1
    73

  74. - Go, performance, optimizations, concurrency
    - Have something to present?
    Go-perf meetup #1
    74

  75. - Go, performance, optimizations, concurrency
    - Have something to present?
    - See https://go-perf.dev/go-perf-meetup-1
    Go-perf meetup #1
    75

  76. Conclusions
    76

  77. - Latency is hard but manageable
    Conclusions
    77

  78. - Latency is hard but manageable
    - Read papers
    - Do good stuff
    Conclusions
    78

  79. - Latency is hard but manageable
    - Read papers
    - Do good stuff
    - No silver bullet (sad but true)
    - go-distsys & go-perf 🚀
    Conclusions
    79

  80. - The Tail at Scale
    - https://www.barroso.org/publications/TheTailAtScale.pdf
    - cristalhq/hedgedhttp (⭐ and “go get”)
    - https://github.com/cristalhq/hedgedhttp
    - go-distsys (click “Follow” & submit ideas!)
    - https://github.com/go-distsys
    References
    80

  81. Golang Warsaw team 👌
    Grafana 🚀
    Connectis_ 💕
    Thanks
    81

  82. Thank you
    Questions?
    Email: [email protected]
    Telegram: @olegkovalov
    That’s all folks
