Redis Bedtime Stories

3 Redis short stories. 30000 requests per second. 3 lessons learned.

Igor Wiedler

August 02, 2023

Transcript

  1. Agenda
     • Refresher
     • Story 1: Microbursts
     • Story 2: Regression
     • Story 3: Kubernetes
     • Takeaways
  2. Numbers Everyone Should Know
     L1 cache reference                          0.5 ns
     Branch mispredict                             5 ns
     L2 cache reference                            7 ns
     Mutex lock/unlock                            25 ns
     Main memory reference                       100 ns
     Compress 1K bytes with Zippy              3,000 ns
     Send 2K bytes over 1 Gbps network        20,000 ns
     Read 1 MB sequentially from memory      250,000 ns
     Round trip within same datacenter       500,000 ns
     Disk seek                            10,000,000 ns
     Read 1 MB sequentially from disk     20,000,000 ns
     Send packet CA->Netherlands->CA     150,000,000 ns
     Jeff Dean, LADIS 2009 Keynote
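The title's figure of 30,000 requests per second can be set against this table. A minimal sketch, assuming the requests are processed one at a time by a single command-execution thread (Redis executes commands on its main thread); `budget_ns` is a hypothetical helper name:

```shell
# Average time available per request, in nanoseconds, at a given request rate.
# Assumes requests are processed one at a time by a single thread.
budget_ns() {
  local rps=$1
  if [ "$rps" -le 0 ]; then
    echo "rate must be positive" >&2
    return 1
  fi
  echo $(( 1000000000 / rps ))
}
```

`budget_ns 30000` prints 33333: roughly 33 µs per request, less than twice the table's 20 µs for sending 2K bytes over a 1 Gbps network and about one fifteenth of a same-datacenter round trip.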
  3. • Clients see periodic performance degradation.
     • Sometimes it breaches the alert threshold, sometimes it doesn't.
     • Nothing obvious.
     • Investigation begins.
  4. # Sample Redis throughput roughly once per second, 600 times (about 10 minutes)
     $ for i in {1..600}
       do
         sleep 1
         echo -n "$(date +'%Y-%m-%d %H:%M:%S.%N %Z') "
         redis-cli info stats \
           | grep -w 'instantaneous_ops_per_sec'
       done
     Matt Smiley
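Ten minutes of per-second samples are easier to judge as a distribution than by scrolling. A minimal sketch that summarizes the loop's output, assuming it was redirected to a file; `summarize_ops` and `ops.log` are hypothetical names. Note that `instantaneous_ops_per_sec` is itself a rate averaged over a short window, so sub-second bursts can stay hidden even here.

```shell
# Summarize samples captured by the polling loop (e.g. redirected to ops.log).
# Assumed input line shape (date prefix, then the INFO field):
#   2023-08-02 12:00:01.123456789 UTC instantaneous_ops_per_sec:31234
summarize_ops() {
  awk '
    /instantaneous_ops_per_sec:/ {
      v = $0
      sub(/.*instantaneous_ops_per_sec:/, "", v)  # keep only the value
      gsub(/\r/, "", v)                            # INFO replies end in CRLF
      v += 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "samples=%d min=%d max=%d mean=%.1f\n", n, min, max, sum / n
    }
  '
}
```

`summarize_ops < ops.log` prints a single `samples=... min=... max=... mean=...` line; a max far above the mean is the first hint that traffic is bursty.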
  5. # Capture inbound Redis traffic, starting a new pcap file every 60 seconds
     $ sudo tcpdump \
         -v \
         -G 60 \
         -w $(hostname -s).inbound.%Y%m%d_%H%M%S.pcap \
         'dst port 6379'
     Matt Smiley
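Packet captures make the microburst visible because they keep per-packet timestamps instead of per-second aggregates. A minimal sketch, assuming the pcap is read back as text with `tcpdump -tt -n -r <file>`, so the first field is an epoch timestamp with microseconds; `burstiness` is a hypothetical helper. It counts packets, not Redis commands (one packet can carry several pipelined commands or part of one).

```shell
# Compare the mean packet rate per second with the busiest 1 ms window.
# Input: text from `tcpdump -tt -n -r <file>.pcap`; field 1 looks like
# 1690977600.123456 (seconds.microseconds since the epoch).
burstiness() {
  awk '
    $1 ~ /^[0-9]+\.[0-9]+$/ {
      dot = index($1, ".")
      sec = substr($1, 1, dot - 1)   # whole-second bucket
      ms  = substr($1, 1, dot + 3)   # 1 ms bucket: keep 3 fractional digits
      if (!(sec in per_sec)) nsec++
      per_sec[sec]++
      per_ms[ms]++
      if (per_ms[ms] > peak) peak = per_ms[ms]
      n++
    }
    END {
      # the mean is taken over seconds that saw at least one packet
      if (n == 0) { print "no packets"; exit 1 }
      printf "packets=%d mean_per_sec=%.1f peak_per_ms=%d (%d/s if sustained)\n",
        n, n / nsec, peak, peak * 1000
    }
  '
}
```

For example, `tcpdump -tt -n -r myhost.inbound.20230802_120000.pcap | burstiness` (file name hypothetical). A peak 1 ms rate far above the per-second mean is exactly what a per-second metric averages away.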
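  6. # Run redis_trace_cmd over the tcpflow output for flows to port 6379,
     # 100 files per invocation, 8 invocations in parallel
     $ find tcpflow/ -name '*.06379.findx' \
         | xargs -P 8 -n 100 redis_trace_cmd \
         > trace_inbound_redis_commands.out
     Matt Smiley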
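The internals of `redis_trace_cmd` are not shown in the deck (the script is linked in the references). As a rough illustration of what decoding inbound traffic involves, here is a hypothetical, naive counter of command names in one reassembled client-to-server stream, such as a single tcpflow output file. It relies on clients sending each command as a RESP array of bulk strings and assumes no argument contains CR or LF bytes; inline commands and binary payloads are not handled.

```shell
# Count Redis command names in a reassembled client->server byte stream.
# Naive sketch: assumes RESP arrays of bulk strings with no CR/LF inside
# arguments; it skips input until it sees a "*<argc>" array header.
count_resp_commands() {
  tr -d '\r' | awk '
    remaining == 0 {                   # waiting for a "*<argc>" array header
      if ($0 ~ /^\*[0-9]+$/) {
        remaining = substr($0, 2) + 0
        first = 1                      # next payload is the command name
        want_len = 1
      }
      next
    }
    want_len { want_len = 0; next }    # skip the "$<len>" line of each argument
    {
      if (first) { counts[toupper($0)]++; first = 0 }
      remaining--
      want_len = 1
    }
    END { for (c in counts) print counts[c], c }
  ' | sort -rn
}
```

Feeding it a stream containing two `BRPOP` requests and one `SET` prints `2 BRPOP` followed by `1 SET`.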
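  7. # Sample all CPUs with call graphs at 497 Hz for 120 seconds
     $ sudo perf record -ag -F 497 -- sleep 120
     # Fold the stacks, keep redis-server, and render a flame graph
     $ sudo perf script --header \
         | stackcollapse-perf.pl --kernel \
         | grep redis-server \
         | flamegraph.pl --hash --colors=perl \
         > flamegraph.svg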
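A flame graph shows inclusive time per frame; a text ranking of self (leaf) time can complement it when comparing two profiles, for example before and after an upgrade. A minimal sketch, assuming the folded-stack format that `stackcollapse-perf.pl` emits (`frame1;frame2;...;leaf <count>` per line); `top_leaf_frames` is a hypothetical helper.

```shell
# Rank leaf (on-CPU) frames by sample count from folded stacks.
# Usage: top_leaf_frames [N]   (defaults to the top 10)
top_leaf_frames() {
  awk '
    NF >= 2 {
      count = $NF                      # sample count is the last field
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # drop the trailing " <count>"
      n = split(stack, frames, ";")
      self[frames[n]] += count         # attribute samples to the leaf frame
    }
    END { for (f in self) print self[f], f }
  ' | sort -rn | head -n "${1:-10}"
}
```

It slots into the pipeline above in place of `flamegraph.pl`, e.g. `sudo perf script --header | stackcollapse-perf.pl --kernel | grep redis-server | top_leaf_frames 20`.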
  8. • When traffic reaches a Kubernetes node, it is handled the same way, regardless of the type of load balancer.
     • The load balancer is not aware of which nodes in the cluster are running Pods for its Service.
     • Instead, it balances traffic across all nodes in the cluster, even those not running a relevant Pod.
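A back-of-the-envelope sketch of the last bullet, assuming the load balancer spreads traffic evenly across all nodes; `pct_on_nodes_without_pod` and the node counts are hypothetical. Traffic that lands on a node without a Pod has to be forwarded to one that has a Pod, costing an extra hop and extra packet processing. With `externalTrafficPolicy` left at its default (`Cluster`), kube-proxy may also forward traffic that lands on a node with a Pod to a Pod elsewhere, so this figure is a lower bound on forwarded traffic.

```shell
# Percentage of load-balanced traffic that first lands on a node with no Pod
# for the Service, assuming an even spread across all nodes.
pct_on_nodes_without_pod() {
  local nodes=$1 nodes_with_pod=$2
  if [ "$nodes" -le 0 ] || [ "$nodes_with_pod" -gt "$nodes" ]; then
    echo "need 0 <= nodes_with_pod <= nodes and nodes > 0" >&2
    return 1
  fi
  echo $(( 100 * (nodes - nodes_with_pod) / nodes ))
}
```

`pct_on_nodes_without_pod 10 3` prints 70: with Pods on 3 of 10 nodes, about 70% of incoming traffic takes at least one extra hop.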
  9. • Story 1: Microbursts
       • Aggregation intervals hide burstiness
     • Story 2: Regression
       • Sometimes upgrades make things worse
     • Story 3: Kubernetes
       • Production does not care about your readiness review
  10. References
      • Microburst analysis
        • gitlab.com/gitlab-com/gl-infra/reliability/-/issues/9420
      • Performance regression in BRPOP
        • github.com/redis/redis/issues/8668
        • github.com/redis/redis/pull/8689
      • Packet processing overhead on Kubernetes
        • gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1985
      • Further reading
        • github.com/redis/redis/issues/7071
        • about.gitlab.com/blog/2022/11/28/how-we-diagnosed-and-resolved-redis-latency-spikes
        • gitlab.com/gitlab-com/runbooks/-/blob/master/scripts/redis_trace_cmd.rb