Slide 1

Slide 1 text

Hedged requests in Go
Golang Warsaw #53 (autumn) 2023
Oleg Kovalov
olegk.dev

Slide 6

Slide 6 text

Me
- Open-source addicted gopher
- Fan of linters (co-author of go-critic)
- Father of a labrador
- go-perf.dev
- ...
olegk.dev

Slide 7

Slide 7 text

Agenda
- Intro
- Magic
- Result
- Announcement
- Conclusions

Slide 9

Slide 9 text

Intro
[Diagram: Client, Server, latency]

Slide 10

Slide 10 text

Latency
> Anyone can buy bandwidth. But latency is from the Gods. (c)

Slide 11

Slide 11 text

Intro
[Diagram: four Clients, one load balancer (LB), three Servers]

Slide 12

Slide 12 text

Intro
[Diagram: four Clients, one load balancer (LB), three Servers]
*cough* service mesh *cough*

Slide 13

Slide 13 text

Intro
[Diagram: four Clients, one load balancer (LB), four Servers]

Slide 15

Slide 15 text

Intro
[Diagram: four Clients, three load balancers (LB), four Servers]

Slide 17

Slide 17 text

Variability
- Deploy
- Packet drop
- Cache eviction
- Queue overflow
- Connection pool
- Background jobs
- Garbage collection
- Windows Update
- …

Slide 18

Slide 18 text

Latency
https://last9.io/blog/latency-slo/

Slide 19

Slide 19 text

Latency
https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/

Slide 20

Slide 20 text

SLA

Slide 26

Slide 26 text

Tail at scale
- Paper published by Googlers
- Jeffrey Dean, Luiz André Barroso
- 2013
- More than just hedged requests
- See: https://www.barroso.org/publications/TheTailAtScale.pdf
You are not Google: https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
Tail at scale != Tailscale
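One observation from the paper that motivates hedging is fan-out amplification: when a request has to wait for many backends, a rare slow response on any one of them slows the whole request. A minimal sketch of that arithmetic, assuming independent servers and a made-up 1% chance of a slow response per server (`probAnySlow` is a hypothetical helper, not from the talk):

```go
package main

import (
	"fmt"
	"math"
)

// probAnySlow returns the probability that at least one of n independent
// backend calls is slow, given that each call is slow with probability p.
func probAnySlow(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	// Assumed numbers: 1% slow calls per server, fan-out to 1, 10 and 100 servers.
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("fan-out %3d: %.1f%% of requests wait for a slow call\n",
			n, 100*probAnySlow(0.01, n))
	}
}
```

With a fan-out of 100 the majority of requests end up waiting for at least one slow server, even though each individual server is slow only rarely.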

Slide 27

Slide 27 text

Idea
[Diagram: Client, load balancer (LB), three Servers]

Slide 32

Slide 32 text

Idea
[Diagram: the Client sends requests 1, 2 and 3 to three Servers; one is marked ✅, the other two ❌]

Slide 36

Slide 36 text

Existing implementations
- GitHub was empty!
- Well, there were a few proofs of concept
- But nothing production-ready, at least nothing I was ready to pick up

Slide 41

Slide 41 text

cristalhq/hedgedhttp
- Dependency-free
- Perfectly aligns with net/http
- Optimized for speed
- Battle-tested

Slide 42

Slide 42 text

Example

    delay := 10 * time.Millisecond
    upto := 7
    client := &http.Client{Timeout: time.Second}

    hedged, err := hedgedhttp.NewClient(delay, upto, client)
    if err != nil {
        panic(err)
    }

    // sends up to `upto` requests, with a `delay` between them
    resp, err := hedged.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

Slide 46

Slide 46 text

Modern example

    cfg := hedgedhttp.Config{
        Transport: http.DefaultTransport,
        Upto:      3,
        Delay:     10 * time.Millisecond,
        Next: func() (upto int, delay time.Duration) {
            return magic()
        },
    }

    client, err := hedgedhttp.New(cfg)
    if err != nil {
        panic(err)
    }

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
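The slide leaves `magic()` unspecified. One possible choice, following the paper's suggestion to defer the hedge until the first request has been outstanding longer than the 95th-percentile expected latency, is to derive the delay from recently observed latencies. A minimal sketch; `latencyTracker` and everything on it are hypothetical and not part of hedgedhttp, and the fallback values simply mirror the config on the slide:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// latencyTracker is a hypothetical helper (not part of hedgedhttp) that
// remembers the most recent request latencies in a ring buffer.
type latencyTracker struct {
	mu      sync.Mutex
	samples []time.Duration
	next    int  // index of the slot to overwrite next
	full    bool // true once the buffer has wrapped around
}

func newLatencyTracker(size int) *latencyTracker {
	if size < 1 {
		size = 1
	}
	return &latencyTracker{samples: make([]time.Duration, size)}
}

// Observe records the latency of one finished request.
func (t *latencyTracker) Observe(d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.samples[t.next] = d
	t.next = (t.next + 1) % len(t.samples)
	if t.next == 0 {
		t.full = true
	}
}

// percentile returns the nearest-rank percentile p (1..100) of the stored
// samples, or false when nothing has been observed yet.
func (t *latencyTracker) percentile(p int) (time.Duration, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := t.next
	if t.full {
		n = len(t.samples)
	}
	if n == 0 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), t.samples[:n]...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*n + 99) / 100 // integer ceil(p*n/100)
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1], true
}

// Next has the same shape as the Next callback on the slide: it hedges
// after the observed p95 latency, or after a fixed delay with no data.
func (t *latencyTracker) Next() (upto int, delay time.Duration) {
	const defaultUpto, defaultDelay = 3, 10 * time.Millisecond
	d, ok := t.percentile(95)
	if !ok {
		return defaultUpto, defaultDelay
	}
	return defaultUpto, d
}

// demoDelayMillis feeds latencies of 1ms..100ms and returns the chosen delay.
func demoDelayMillis() int64 {
	t := newLatencyTracker(100)
	for ms := 1; ms <= 100; ms++ {
		t.Observe(time.Duration(ms) * time.Millisecond)
	}
	_, delay := t.Next()
	return delay.Milliseconds()
}

func main() {
	fmt.Println("hedge after", demoDelayMillis(), "ms")
}
```

Under this assumption, the tracker's `Next` method could be passed as the `Next` field, with `Observe` called from wherever request durations are measured.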

Slide 49

Slide 49 text

hedgedhttp internals (1)

    func (ht *hedgedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
        mainCtx := req.Context()

        upto, timeout := ht.upto, ht.timeout
        if ht.next != nil {
            upto, timeout = ht.next()
        }

        // no hedged requests, just a regular one.
        if upto <= 0 {
            return ht.rt.RoundTrip(req)
        }
        // fall back to the default timeout.
        if timeout < 0 {
            timeout = ht.timeout
        }
        // ...

Slide 50

Slide 50 text

hedgedhttp internals (2)

    for sent := 0; len(errOverall.Errors) < upto; sent++ {
        if sent < upto {
            idx := sent
            subReq, cancel := reqWithCtx(req, mainCtx, idx != 0)
            cancels[idx] = cancel

            runInPool(func() {
                resp, err := ht.rt.RoundTrip(subReq)
                if err != nil {
                    ht.metrics.failedRoundTripsInc()
                    errorCh <- err
                } else {
                    resultCh <- indexedResp{idx, resp}
                }
            })
        }
        // ...

Slide 51

Slide 51 text

hedgedhttp internals (3)

        // all requests sent - effectively disabling the timeout between requests
        if sent == upto {
            timeout = infiniteTimeout
        }
        resp, err := waitResult(mainCtx, resultCh, errorCh, timeout)

        switch {
        case resp.Resp != nil:
            resultIdx = resp.Index
            return resp.Resp, nil
        case mainCtx.Err() != nil:
            ht.metrics.canceledByUserRoundTripsInc()
            return nil, mainCtx.Err()
        case err != nil:
            errOverall.Errors = append(errOverall.Errors, err)
        }

Slide 52

Slide 52 text

hedgedhttp internals (4)

    func waitResult(ctx context.Context, resultCh <-chan indexedResp,
        errorCh <-chan error, timeout time.Duration) (indexedResp, error) {
        // ...
        select {
            select {
            case res := <-resultCh:
                return res, nil
            case reqErr := <-errorCh:
                return indexedResp{}, reqErr
            case <-ctx.Done():
                return indexedResp{}, ctx.Err()
            case <-timer.C:
                return indexedResp{}, nil // timeout BETWEEN consecutive requests
            }
        }
    }

Slide 53

Slide 53 text

Cute goroutine pool

    var taskQueue = make(chan func())

    func runInPool(task func()) {
        select {
        case taskQueue <- task:
            // submitted, everything is ok
        default:
            go func() {
                task()

                cleanupTicker := time.NewTicker(cleanupDuration)
                for {
                    select {
                    case t := <-taskQueue:
                        t()
                        cleanupTicker.Reset(cleanupDuration)
                    case <-cleanupTicker.C:
                        cleanupTicker.Stop()
                        return
                    }
                }
            }()
        }
    }

Slide 57

Slide 57 text

Bonus
- Same pattern works for gRPC
- and any other RPC (if your API is sane)
https://github.com/cristalhq/hedgedgrpc (waits for your ⭐)

Slide 58

Slide 58 text

Results

Slide 60

Slide 60 text

Google BigTable bench
> For example, in a Google benchmark that reads the values for 1,000 keys stored in a BigTable table distributed across 100 different servers, sending a hedging request after a 10ms delay reduces the 99.9th-percentile latency for retrieving all 1,000 values from 1,800ms to 74ms while sending just 2% more requests.
https://www.barroso.org/publications/TheTailAtScale.pdf
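The small overhead in the quote is plausible because a hedge is only sent for requests that are still outstanding when the delay expires, so the extra load is roughly the share of requests slower than the delay. A minimal sketch of that estimate; `hedgeRate` is a hypothetical helper and the latencies are made up, not taken from the benchmark:

```go
package main

import "fmt"

// hedgeRate returns the fraction of requests whose latency exceeds the hedge
// delay, i.e. the share of requests that would trigger a second request.
func hedgeRate(latenciesMs []float64, delayMs float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	slow := 0
	for _, l := range latenciesMs {
		if l > delayMs {
			slow++
		}
	}
	return float64(slow) / float64(len(latenciesMs))
}

func main() {
	// Made-up latencies in milliseconds: mostly fast, one straggler.
	latencies := []float64{2, 3, 3, 4, 4, 5, 5, 6, 7, 900}
	fmt.Printf("extra requests with a 10ms delay: %.0f%%\n",
		100*hedgeRate(latencies, 10))
}
```

Picking a longer delay lowers the extra load but also leaves more of the tail unhedged, which is the trade-off behind the tuning parameters.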

Slide 63

Slide 63 text

DoltHub
See: https://www.dolthub.com/blog/2022-04-25-mitigating-3rd-party-variability/

Slide 65

Slide 65 text

Grafana Tempo
See: https://grafana.com/blog/2021/08/27/grafana-tempo-1.1-released-new-hedged-requests-reduce-latency-by-45/

Slide 66

Slide 66 text

Silver bullet params

Slide 67

Slide 67 text

Announcement

Slide 68

Slide 68 text

Announcement
No, two announcements.

Slide 72

Slide 72 text

go-distsys
- Reinventing the wheel
- We should have better things
- Everyone is welcome!
https://github.com/go-distsys

Slide 75

Slide 75 text

Go-perf meetup #1
- Go, performance, optimizations, concurrency
- Have something to present?
- See https://go-perf.dev/go-perf-meetup-1

Slide 79

Slide 79 text

Conclusions
- Latency is hard but manageable
- Read papers
- Do good stuff
- No silver bullet (sad but true)
- go-distsys & go-perf 🚀

Slide 80

Slide 80 text

References
- The Tail at Scale
  - https://www.barroso.org/publications/TheTailAtScale.pdf
- cristalhq/hedgedhttp (⭐ and “go get”)
  - https://github.com/cristalhq/hedgedhttp
- go-distsys (click “Follow” & submit ideas!)
  - https://github.com/go-distsys

Slide 81

Slide 81 text

Thanks
Golang Warsaw team 👌
Grafana 🚀
Connectis_ 💕

Slide 82

Slide 82 text

Thank you
Questions?
Email: [email protected]
Telegram: @olegkovalov
That’s all folks