
Twemproxy


twemproxy (pronounced "two-em-proxy"), aka nutcracker, is a fast and lightweight proxy for the memcached and redis protocols. It was primarily built to reduce the connection count on the backend caching servers. Twemproxy was created within Twitter, initially to support memcached, with Redis support being added 4 months ago.


Justin Mares

April 19, 2013

Transcript

  1. Twemproxy (nutcracker) Manju Rajashekhar @manju

  2. Twemproxy (nutcracker) Proxy for redis and memcache

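  3. Motivation: [diagram] m Unicorn (U) hosts connect directly to n cache servers (C); m >> n => 20 mn

  4. Deployed as Local Proxy: [diagram] nutcracker (N) alongside the m Unicorn (U) hosts, in front of n cache servers (C); m >> n => m

  5. Deployed as Remote Proxy: [diagram] nutcracker (N) between the m Unicorn (U) hosts and n cache servers (C); m >> n => n

  6. Fault Tolerance with Remote Proxy: [diagram] several nutcracker (N) instances between the m Unicorn (U) hosts and n cache servers (C); m >> n => n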
  7. Features

    Fast & lightweight
    Persistent server connections
    Protocol pipelining
    Shards data automatically across multiple servers
    Supports multiple cache pools simultaneously
    Supports ketama aka consistent hashing
    Configuration through YAML file
    Disable nodes on failures
  8. Non-pipelined req-rsp: [timeline diagram] get k1, get k2, delete k3

  9. Pipelined req-rsp: [timeline diagram] get k1, get k2, delete k3

  10. Pipelining: [diagram] NUTCRACKER receives get k1, get k2 and delete k3 and sends them on together as get k1, get k2, delete k3. Tradeoff latency for throughput
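
To make the pipelining slides concrete, here is a minimal sketch that encodes the three requests in the memcached text protocol and joins them into one buffer, so they can go out in a single write instead of three request/response round trips. The helper names `encode` and `pipeline` are hypothetical, and this is not how nutcracker itself is implemented.

```python
def encode(command, key):
    """Encode one request line in the memcached text protocol."""
    return f"{command} {key}\r\n".encode("ascii")


def pipeline(requests):
    """Join several encoded requests into one buffer (one write)."""
    return b"".join(encode(command, key) for command, key in requests)


requests = [("get", "k1"), ("get", "k2"), ("delete", "k3")]

# Non-pipelined: one buffer (and one round trip) per request.
separate_writes = [encode(command, key) for command, key in requests]

# Pipelined: a single buffer carrying all three requests back to back.
batch = pipeline(requests)
```

The responses still come back in request order, so the receiver can match them up by position.
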
  11. Fault Tolerance: [timeline diagram, t0 to t2] client (C) sends get key through the proxy (P) to the server (S); the client gets back -ERR Connection refused\r\n or -ERR Connection timed out\r\n => Client retries on -ERR response!
  12. Client retries on -ERR responses

    -ERR Connection refused\r\n (errno = ECONNREFUSED)
    -ERR Invalid argument\r\n (errno = EINVAL)
    -ERR Connection timed out\r\n (errno = ETIMEDOUT)
    -ERR Host is down\r\n (errno = EHOSTDOWN)
    -ERR Connection reset by peer\r\n (errno = ECONNRESET)
    ...
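
The retry behaviour on this slide can be sketched as a small client-side wrapper. Everything here is an assumption for illustration: `send` stands in for whatever transport the client uses, `get_with_retry` is a hypothetical name, and a real client would probably want to separate proxy errors like the ones listed above from ordinary command errors, which can also start with -ERR.

```python
import time


def get_with_retry(send, request, retries=3, backoff_s=0.0):
    """Resend the request while the reply is an -ERR line."""
    response = b""
    for attempt in range(retries + 1):
        response = send(request)
        if not response.startswith(b"-ERR"):
            return response
        if attempt < retries:
            time.sleep(backoff_s)  # optional pause before the next attempt
    return response  # still an error after all attempts


# Hypothetical transport: fails twice, then succeeds.
replies = iter([
    b"-ERR Connection refused\r\n",
    b"-ERR Connection timed out\r\n",
    b"$5\r\nvalue\r\n",
])
result = get_with_retry(lambda request: next(replies), b"get key\r\n")
```
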
  13. Retries with slow server: [timeline diagram, t0 to tn] stuck-up server :( with 99p > 30 ms and client-timeout = 30 ms; the client (C) keeps resending get key through the proxy (P) to the server (S) => Outstanding req Q growing
  14. Solution: Use “timeout:” config

    1. Proxy times out request
    2. Set timeout: to expected 99p or 999p latency
    3. Set client-side timeout > timeout:
    => Outstanding req Q bounded!
    [timeline diagram: stuck-up server :( ; the proxy (P) answers the client (C) with -ERR connection timed out]
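
Why the queue becomes bounded can be seen in a toy simulation: a server that never replies, requests arriving at a fixed interval, and a proxy that either keeps every request forever or drops those older than timeout:. The function name and the numbers (one request every 10 ms, timeout: 400, a 10 second run) are illustrative assumptions, not measurements.

```python
def max_outstanding(arrival_interval_ms, proxy_timeout_ms, duration_ms):
    """Peak number of queued requests against a server that never replies.

    proxy_timeout_ms=None models a proxy with no timeout: configured.
    """
    queue = []  # enqueue times of requests still waiting for a response
    peak = 0
    for now in range(0, duration_ms, arrival_interval_ms):
        if proxy_timeout_ms is not None:
            # The proxy answers -ERR for requests older than timeout: and drops them.
            queue = [t for t in queue if now - t < proxy_timeout_ms]
        queue.append(now)
        peak = max(peak, len(queue))
    return peak


unbounded = max_outstanding(10, None, 10_000)
bounded = max_outstanding(10, 400, 10_000)
```

With the timeout in place the peak settles at roughly timeout divided by arrival interval; without it the queue grows for as long as the server stays stuck.
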
  15. Re-routing on failures: [diagram] four proxies (P) in front of servers S1 to S6; ring-size = 6

  16. Re-routing on failures: [diagram] four proxies (P) in front of servers S1 to S6; labels ring-size = 5 and ring-size = 6. Compute global view from local information
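
The effect of shrinking the ring from 6 to 5 servers can be sketched with a simplified consistent-hash ring. This is not libketama's exact point layout or twemproxy's code; the class name, the 160 points per server and the md5-based hash are assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=160):
        self.points_per_server = points_per_server
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

    def add(self, server):
        for i in range(self.points_per_server):
            bisect.insort(self.ring, (self._hash(f"{server}-{i}"), server))

    def eject(self, server):
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def size(self):
        return len({server for _, server in self.ring})

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]


ring = HashRing(["S1", "S2", "S3", "S4", "S5", "S6"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.eject("S3")  # ring-size goes from 6 to 5
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that lived on the ejected server move; every other key keeps its server, which is what lets each proxy act on its own local view of failures.
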
  17. Rerouting on failures: [timeline diagram] proxy (P) and server (S)

    Back off by server_retry_timeout:
    Retry any new request on ejected server after server_retry_timeout: time

    redis-cache-pool:
      auto_eject_hosts: true
      server_retry_timeout: 30000
      server_failure_limit: 3
      timeout: 400
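
How server_failure_limit: and server_retry_timeout: interact can be sketched as a small per-server state machine. This is an assumption-laden illustration using the values from the slide, not twemproxy's implementation; the class and method names are hypothetical.

```python
class ServerState:
    """Tracks consecutive failures and the back-off window for one server."""

    def __init__(self, failure_limit=3, retry_timeout_ms=30000):
        self.failure_limit = failure_limit
        self.retry_timeout_ms = retry_timeout_ms
        self.failures = 0
        self.retry_at_ms = None  # None means the server is in the pool

    def is_available(self, now_ms):
        return self.retry_at_ms is None or now_ms >= self.retry_at_ms

    def record_failure(self, now_ms):
        self.failures += 1
        if self.failures >= self.failure_limit:
            # Eject: do not send requests here until the back-off has elapsed.
            self.retry_at_ms = now_ms + self.retry_timeout_ms
            self.failures = 0

    def record_success(self):
        self.failures = 0
        self.retry_at_ms = None


state = ServerState(failure_limit=3, retry_timeout_ms=30000)
for _ in range(3):
    state.record_failure(now_ms=1000)

ejected_now = not state.is_available(now_ms=1000)
still_ejected = not state.is_available(now_ms=30999)
retried_later = state.is_available(now_ms=31000)
```
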
  18. Succeeding on failures: [diagram] P, S, S’ and C, with server_failure_limit: 3 and retries: 3

    t0: Server S dies
    t1: Query is rerouted to S’ after server_failure_limit tries
    => client-side retries >= server_failure_limit for success
    => set TTL on items
  19. Simultaneous failures: [diagram] four proxies (P) and servers S1 to S6; cluster-size = 5? cluster-size = 4? cluster-size = 3?

    Global view is fragmented
    => Solution: large server_retry_timeout:
  20. Graphing ejected servers (server_err + server_timedout + server_eof) / server_failure_limit
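
The expression on this slide is simple enough to compute directly from a per-server stats record. In this sketch the sample counter values are made up, and the function name is hypothetical; only the three counter names and server_failure_limit come from the slide.

```python
def ejection_estimate(server_stats, server_failure_limit):
    """(server_err + server_timedout + server_eof) / server_failure_limit."""
    failures = (
        server_stats["server_err"]
        + server_stats["server_timedout"]
        + server_stats["server_eof"]
    )
    return failures / server_failure_limit


# Hypothetical counters for one server, as they might appear in the stats output.
sample = {"server_err": 3, "server_timedout": 6, "server_eof": 0}
estimate = ejection_estimate(sample, server_failure_limit=3)
```

If the counters are cumulative, graphing the change between two polls rather than the raw value is probably the more useful signal.
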

  21. hash_tag: “{}”

    Command with multiple keys?
    Eg: MGET foo bar => Solution: MGET foo, MGET bar
    Eg: SINTER foo bar => Solution: SINTER foo{tag} bar{tag}
    Read “notes/redis.md” for details on supported commands
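
The idea behind hash_tag is that only the part of the key between the two tag characters is hashed, so keys sharing a tag land on the same server. A minimal sketch follows; the function name is hypothetical, and falling back to the whole key when the tag is missing or empty is an assumption rather than confirmed twemproxy behaviour.

```python
def hash_key(key, hash_tag="{}"):
    """Return the portion of the key that would be fed to the hash function."""
    open_ch, close_ch = hash_tag
    start = key.find(open_ch)
    if start == -1:
        return key
    end = key.find(close_ch, start + 1)
    if end == -1 or end == start + 1:
        return key  # assumption: no closing character or empty tag means hash the whole key
    return key[start + 1:end]


same_shard = hash_key("foo{tag}") == hash_key("bar{tag}")
```
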
  22. Deployment Checklist ...

    Logging enabled to LOG_INFO / LOG_NOTICE for debugging?
    Are exposed stats being collected?
    Is timeout: set?
    Is redis pool used as a cache? -> set auto_eject_hosts: true
    Is redis pool used as a data-store? -> set auto_eject_hosts: false
    Is server_retry_timeout: value reasonable for your app domain?
    Is swap enabled on redis server? Are you ok with high latency variance?
  23. Deployment Checklist

    Have you tested your setup for resiliency to:
      Permanent failures: kill / reboot machines
      Transient failures: SIGSTOP redis / drop packets using iptables
    What’s your strategy for updating configuration?
    What is the value of -m (--mbuf-size)?
    What is your max key length? Is it less than --mbuf-size?
    How many connections is your proxy meant to handle? - file descriptor limit: “ulimit -n”
    Are you using commands with multiple keys? If so, is hash_tag configured?
  24. Why Twemproxy?

    Persistent server connections
      faster client restarts
      filter close from client
    Protocol pipelining
    Enables simple and dumb clients
    Hides semantics of underlying cache pool
    Easy configuration
    Automatic sharding and fault-tolerance capability
  25. Why not Proxy?

    Extra network hop
    Tradeoff latency for throughput
    Pipelining is your friend
  26. Open Problems

    Elasticity: Programmatically contract and expand caches
    Availability: Data replication as a first class primitive
  27. Questions? @manju

  28. Twemproxy Internals

  29. Core Event Loop

    main
      for (;;) { wait(&event); process(&event); }
    event : connection = 1:1
    I/O - non-blocking
    ET vs LT
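
The wait/process loop can be sketched with Python's standard `selectors` module. This is only an analogue under stated assumptions: twemproxy is written in C, `selectors` is level-triggered (the LT side of the slide's "ET vs LT"), and the demo stops after a fixed number of events instead of looping forever. `run_loop` and the uppercase "reply" are hypothetical.

```python
import selectors
import socket


def run_loop(sock, max_events):
    """Handle up to max_events readiness events for one non-blocking connection."""
    sel = selectors.DefaultSelector()
    sock.setblocking(False)                    # I/O is non-blocking
    sel.register(sock, selectors.EVENT_READ)   # one registered event source per connection
    handled = 0
    while handled < max_events:                # the slide's loop is for (;;)
        events = sel.select(timeout=1.0)       # wait(&event)
        if not events:
            break                              # nothing arrived; stop the demo
        for key, _mask in events:              # process(&event)
            payload = key.fileobj.recv(4096)
            if payload:
                key.fileobj.send(payload.upper())
            handled += 1
    sel.close()
    return handled


client, proxy_side = socket.socketpair()
client.sendall(b"get k1\r\n")
handled = run_loop(proxy_side, max_events=1)
reply = client.recv(4096)
client.close()
proxy_side.close()
```
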
  30. Modules: core engine, connection, server, proxy, client, mbuf, messages, parser, req, rsp
  31. mbuf

    /*
     * mbuf header is at the tail end of the mbuf. This enables us to catch
     * buffer overrun early by asserting on the magic value during get or
     * put operations
     *
     *   <------------- mbuf_chunk_size ------------->
     *   +-------------------------------------------+
     *   |       mbuf data          |  mbuf header   |
     *   |     (mbuf_offset)        | (struct mbuf)  |
     *   +-------------------------------------------+
     *   ^           ^        ^     ^^
     *   |           |        |     ||
     *   \           |        |     |\
     *   mbuf->start \        |     | mbuf->end (one byte past valid bound)
     *                mbuf->pos     \
     *                        \      mbuf
     *                        mbuf->last (one byte past valid byte)
     *
     */
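
The four markers in the comment (start, pos, last, end) can be modelled in a few lines. This sketch only mirrors the bookkeeping; Python cannot reproduce the tail-header overrun check of the C layout, so the magic assertion here is symbolic. The class, the method names, the header size and the magic constant's value are illustrative assumptions.

```python
MBUF_MAGIC = 0xDEADBEEF  # illustrative magic value


class Mbuf:
    """Fixed-size buffer with read (pos) and write (last) markers."""

    def __init__(self, chunk_size, header_size):
        self.data = bytearray(chunk_size - header_size)  # header sits after the data
        self.magic = MBUF_MAGIC
        self.start = 0
        self.pos = 0                # next byte to read
        self.last = 0               # one byte past the last valid byte
        self.end = len(self.data)   # one byte past the valid bound

    def length(self):
        return self.last - self.pos   # unread bytes

    def size(self):
        return self.end - self.last   # free space left for writing

    def copy_in(self, payload):
        assert self.magic == MBUF_MAGIC, "mbuf header corrupted"
        assert len(payload) <= self.size(), "mbuf overrun"
        self.data[self.last:self.last + len(payload)] = payload
        self.last += len(payload)

    def read(self, n):
        n = min(n, self.length())
        out = bytes(self.data[self.pos:self.pos + n])
        self.pos += n
        return out


mbuf = Mbuf(chunk_size=64, header_size=48)  # leaves 16 bytes for data
mbuf.copy_in(b"get k1\r\n")
first = mbuf.read(3)
```
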
  32. Chain of handlers

    Chain of processing handlers
      Takes an input message and produces output for the next handler in the chain
      Unix pipes: cat foo.txt | tr -s “ “ | cut -d “ “ -f 2
    Filter
      Manipulates output produced by a handler, possibly short circuiting the chain
    Forwarder
      Chooses one of the backend servers to send the request to
    Req | Filter-1 | Filter-2 | Forwarder
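
The Req | Filter-1 | Filter-2 | Forwarder pattern can be sketched as plain function composition, where a filter returning None short-circuits the chain. The two filters, the server names and the toy byte-sum hash in the forwarder are hypothetical stand-ins, not twemproxy's handlers.

```python
def run_chain(request, filters, forward):
    """Pass the request through each filter, then hand it to the forwarder."""
    message = request
    for handler in filters:
        message = handler(message)
        if message is None:  # a filter short-circuited the chain
            return None
    return forward(message)


def drop_empty(request):
    return request if request.strip() else None


def normalize(request):
    return " ".join(request.split())


SERVERS = ["S1", "S2", "S3"]


def forwarder(request):
    # Toy server selection: sum of the key's bytes modulo the pool size.
    key = request.split()[1]
    return f"{SERVERS[sum(key.encode()) % len(SERVERS)]} <- {request}"


routed = run_chain("get   k1", [drop_empty, normalize], forwarder)
dropped = run_chain("   ", [drop_empty, normalize], forwarder)
```
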
  33. Chain of handlers

    read_event: req_recv, req_filter*, req_forward
    read_event: rsp_recv, rsp_filter*, rsp_forward
    write_event: req_send
    write_event: rsp_send
    [diagram: req flows Client, Proxy, Server; rsp flows Server, Proxy, Client]
    Q: req_1 <- req_2 <- req_3 ...
  34. Stats Collection: [diagram, labels main and stats; connect localhost:22222 => <stats>] counters in slide order: a 0 b 0 | += | flag = 1 | a 0 b 0 | a 0 b 0

  35. Stats Collection: [same diagram] a 1 b 5 | += | flag = 1 | a 0 b 0 | a 0 b 0

  36. Stats Collection: [same diagram] a 0 b 0 | += | flag = 0 | a 1 b 5 | a 0 b 0

  37. Stats Collection: [same diagram] a 4 b 1 | += | flag = 1 | a 0 b 0 | a 1 b 5

  38. Stats Collection: [same diagram] a 0 b 1 | += | flag = 0 | a 4 b 1 | a 1 b 5
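
One way to read the stats slides is as a three-buffer scheme: the main thread increments a current buffer, the buffers are swapped when the flag is set, and the swapped-out buffer is added into a running sum that is served on connect. The sketch below follows that reading. The names `current`, `shadow`, `sum` and the second `aggregate` flag are my assumptions, and the demo is single-threaded, so it says nothing about the synchronization a real implementation needs.

```python
class Stats:
    """Three counter buffers: current (written), shadow (swapped out), sum (served)."""

    def __init__(self, names):
        self.current = dict.fromkeys(names, 0)
        self.shadow = dict.fromkeys(names, 0)
        self.sum = dict.fromkeys(names, 0)
        self.updated = False    # the "flag" on the slides, as read here
        self.aggregate = False  # assumption: shadow holds data not yet summed

    def incr(self, name, delta=1):
        self.current[name] += delta
        self.updated = True

    def swap(self):
        if self.aggregate or not self.updated:
            return  # shadow not consumed yet, or nothing new to hand over
        self.current, self.shadow = self.shadow, self.current
        self.updated = False
        self.aggregate = True

    def aggregate_shadow(self):
        if not self.aggregate:
            return
        for name, value in self.shadow.items():
            self.sum[name] += value   # the += on the slides
            self.shadow[name] = 0
        self.aggregate = False


stats = Stats(["a", "b"])
stats.incr("a", 1)
stats.incr("b", 5)
stats.swap()
shadow_after_swap = dict(stats.shadow)
stats.aggregate_shadow()
stats.incr("a", 4)
stats.incr("b", 1)
stats.swap()
stats.aggregate_shadow()
total = dict(stats.sum)
```
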