
Caching at Twitter with Twemcache (Twitter Open Source Summit)

manj
July 26, 2012


This talk is about Twemcache, Twitter's memcached. We briefly go over the architecture of the caching system at Twitter and then delve into the details of Twemcache, one of the fundamental building blocks of that system. We discuss the new features in Twemcache, namely a custom eviction algorithm, thread-local stats collection, and a command logger, along with the motivations behind them.



Transcript

  1. Twitter Inc. | @twemcache
    Twemcache
    Manju Rajashekhar (@manju), Yao Yue (@thinkingfish)
    github.com/twitter/twemcache
  2. Cache In Production
    ~30 TB of cache
    > 2000 instances of twemcache
    ~500 machines
    Avg cache instance is 15 GB
    ~2 trillion queries/day
    23 million queries/sec
  3. Cache Systems
    Cache is an optimization for
      - CPU
      - Disk
    (write through / write back)
  4. Caching System
    Simple Components
      - Client: finagle-memcached, finagle-redis
      - Proxy: twemproxy, cproxy
      - Server: twemcache, redis
    Complex System
  5. Client, Proxy & Server
    [Diagram: clients (C), proxies (P), and servers (S), labeled m, m’, and n, with m >> n and m’ < n]
  6. Eviction (3)
    [Diagram: B1, B2, B3]
    Per Slabclass LRU Eviction = calcification, OOM
  7. Slab Eviction
    [Diagram: B1, B2, B3]
    Slab Eviction = deterministic behavior
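
The contrast between slides 6 and 7 can be illustrated with a small sketch. With per-slabclass LRU, a slab stays with its class forever, so a class that arrives late finds no memory (calcification, then OOM). With slab eviction, a whole slab is emptied and handed to the class that needs it. The code below is a minimal illustration under stated assumptions, not Twemcache's implementation: the `heap`/`slab` structs and the function names are hypothetical, and picking the victim at random is an assumption, since the slides do not say how the victim is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NSLAB    3     /* slabs in the heap, like B1..B3 on the slide */
#define NO_CLASS (-1)  /* slab not yet assigned to a slab class */

/* Hypothetical, heavily simplified slab bookkeeping. */
struct slab {
    int class_id;  /* slab class that currently owns the slab */
    int nitems;    /* live items stored in the slab */
};

struct heap {
    struct slab slabs[NSLAB];
};

void heap_init(struct heap *h)
{
    for (size_t i = 0; i < NSLAB; i++) {
        h->slabs[i].class_id = NO_CLASS;
        h->slabs[i].nitems = 0;
    }
}

/* Drop every item in the victim slab and hand the slab to class_id. */
struct slab *slab_evict(struct heap *h, size_t victim, int class_id)
{
    assert(victim < NSLAB);
    struct slab *s = &h->slabs[victim];
    s->nitems = 0;          /* all items in the slab are evicted together */
    s->class_id = class_id; /* the memory moves to the class that needs it */
    return s;
}

/* Get a slab for class_id: use an unassigned slab if any, else evict one. */
struct slab *slab_acquire(struct heap *h, int class_id)
{
    for (size_t i = 0; i < NSLAB; i++) {
        if (h->slabs[i].class_id == NO_CLASS) {
            h->slabs[i].class_id = class_id;
            return &h->slabs[i];
        }
    }
    /* Heap is fully assigned: evict a whole slab (random victim is assumed). */
    return slab_evict(h, (size_t)rand() % NSLAB, class_id);
}
```

In this sketch a request for a new class always succeeds once the heap is full, at the cost of discarding every item in the victim slab, which is one way to read "deterministic behavior" on the slide.
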
  8. Thread-local stats collector
    Metrics Types: counter, gauge, timestamp
    Metrics Category: command related: 44, item_*: 9, slab_*: 10, conn_*: 5, data_*: 2, klog_*: 3
    Metrics Granularity: global, slab (auto aggregation of slab-level stats at global level)
    [Diagram: per-worker values are summed into the reported stats]
      worker1: a 2, b 3, ...
      worker2: a 4, b 5, ...
      stats:   a 6, b 8, ...
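
The aggregation shown on slide 8 (worker1 and worker2 values summing into the reported stats) can be sketched as follows. This is an illustration of the general thread-local pattern, not Twemcache's code: the metric names `METRIC_A`/`METRIC_B`, the `thread_stats` struct, and both functions are hypothetical. Each worker writes only its own copy on the hot path, and the copies are summed when stats are requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metric identifiers standing in for "a" and "b" on the slide. */
enum metric { METRIC_A, METRIC_B, METRIC_COUNT };

/* One private copy of every metric per worker thread. A worker only ever
 * writes its own copy, so the update path takes no lock. */
struct thread_stats {
    int64_t value[METRIC_COUNT];
};

void thread_stats_incr_by(struct thread_stats *ts, enum metric m, int64_t delta)
{
    assert(m < METRIC_COUNT);
    ts->value[m] += delta;
}

/* Slow path (for example, serving a stats request): sum the per-worker
 * copies. A real server must also decide how to read them safely while
 * workers are running; that is out of scope for this sketch. */
void stats_aggregate(const struct thread_stats *workers, size_t nworker,
                     int64_t out[METRIC_COUNT])
{
    for (size_t m = 0; m < METRIC_COUNT; m++) {
        out[m] = 0;
        for (size_t w = 0; w < nworker; w++) {
            out[m] += workers[w].value[m];
        }
    }
}
```

Feeding in the slide's numbers (worker1: a 2, b 3; worker2: a 4, b 5) yields a 6 and b 8, matching the aggregated row.
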
  9. Adding new stats
    src/mc_stats.h:
        #define STATS_THREAD_METRICS(ACTION)                                                     \
            ACTION( conn_disabled, STATS_COUNTER, "# times accepting connections was disabled")  \
            ACTION( conn_total,    STATS_COUNTER, "# connections created until now")             \
            ACTION( conn_struct,   STATS_COUNTER, "# new connection objects created")            \
            ACTION( conn_yield,    STATS_COUNTER, "# times we yielded from an active connection")\
            ACTION( conn_curr,     STATS_GAUGE,   "# active connections")                        \
            ACTION( conn_uds,      STATS_GAUGE,   "# active unix domain sockets")                \
            ...
    src/mc_core.c:
        stats_thread_incr(conn_disabled);
        stats_thread_decr(conn_curr);
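
The `STATS_THREAD_METRICS(ACTION)` list on slide 9 has the shape of the C "X-macro" idiom: one list of entries is expanded several times with different `ACTION` definitions, so adding a stat means adding one line. The sketch below shows how such a list could expand into an enum and a metadata table. Only the list shape and the names `conn_total`, `conn_curr`, `STATS_COUNTER`, `STATS_GAUGE`, `stats_thread_incr`, and `stats_thread_decr` come from the slide; the expansion macros, the `stats_desc` struct, `stats_value`, and `stats_lookup` are assumptions made for illustration and are not taken from Twemcache's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { STATS_COUNTER, STATS_GAUGE } stats_type_t; /* assumed type tags */

/* Metric list in the style of the slide: (name, type, description). */
#define STATS_THREAD_METRICS(ACTION)                                      \
    ACTION(conn_total, STATS_COUNTER, "# connections created until now")  \
    ACTION(conn_curr,  STATS_GAUGE,   "# active connections")

/* Expansion 1 (hypothetical): one enum constant per metric. */
#define DEFINE_ENUM(name, type, desc) STATS_##name,
typedef enum {
    STATS_THREAD_METRICS(DEFINE_ENUM)
    STATS_THREAD_COUNT
} stats_metric_t;

/* Expansion 2 (hypothetical): metadata table in the same order as the enum. */
struct stats_desc {
    const char  *name;
    stats_type_t type;
    const char  *desc;
};
#define DEFINE_DESC(name, type, desc) { #name, type, desc },
const struct stats_desc stats_table[] = {
    STATS_THREAD_METRICS(DEFINE_DESC)
};

/* Hypothetical storage; a real server would keep one array per thread. */
long stats_value[STATS_THREAD_COUNT];
#define stats_thread_incr(name) (stats_value[STATS_##name]++)
#define stats_thread_decr(name) (stats_value[STATS_##name]--)

/* Find a metric index by name, or return -1 if it is unknown. */
int stats_lookup(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < STATS_THREAD_COUNT; i++) {
        if (strcmp(stats_table[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Because the enum and the table are generated from the same list, they cannot drift out of order, which is the main appeal of the idiom.
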
  10. Command Logger
    Log Format
        172.25.135.205:55438 - [09/Jul/2012:18:15:45 -0700] "set foo 0 0 3" 1 6
        172.25.135.205:55438 - [09/Jul/2012:18:15:46 -0700] "get foo" 0 14
        172.25.135.205:55438 - [09/Jul/2012:18:15:57 -0700] "incr bar 1" 3 9
        172.25.135.205:55438 - [09/Jul/2012:18:16:05 -0700] "set bar 0 0 1" 1 6
        172.25.135.205:55438 - [09/Jul/2012:18:16:09 -0700] "incr bar 1" 0 1
        172.25.135.205:55438 - [09/Jul/2012:18:16:13 -0700] "get bar" 0 12
        ....
    Fields: Client IP, Timestamp, Type, Key, Status, Size
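
To make the log format on slide 10 concrete, here is a sketch of a parser for one log line, written against the sample lines only. The `klog_entry` struct and `klog_parse` function are hypothetical names, the buffer sizes are arbitrary, and the last two integers are mapped to Status and Size in the order the slide lists its field labels, which is an assumption about the format rather than something the slide spells out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one command-logger line. */
struct klog_entry {
    char client[64];     /* "ip:port" of the client */
    char timestamp[64];  /* text between the square brackets */
    char type[16];       /* command type, e.g. "set", "get", "incr" */
    char key[256];       /* key the command operates on */
    int  status;         /* first trailing integer (assumed: Status) */
    int  size;           /* second trailing integer (assumed: Size) */
};

/* Parse one line; return 0 on success, -1 if the line does not match. */
int klog_parse(const char *line, struct klog_entry *e)
{
    char request[256];

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof(*e));

    /* client - [timestamp] "request" status size */
    int n = sscanf(line, "%63s - [%63[^]]] \"%255[^\"]\" %d %d",
                   e->client, e->timestamp, request, &e->status, &e->size);
    if (n != 5) {
        return -1;
    }

    /* The request starts with the command type followed by the key. */
    if (sscanf(request, "%15s %255s", e->type, e->key) != 2) {
        return -1;
    }
    return 0;
}
```

Parsed records like these are the raw input for the kinds of questions the deck asks about the log, such as which keys were touched in a time window.
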
  11. Examples
    Keys accessed/updated/retrieved in the past 24 hrs
      - What data is hot and what is not?
      - What should the heap size be to cache 24 hours' worth of data?
    How many times and when is a key retrieved/updated after insertion?
      - Explains why the hit rate is what it is
      - Determine a reasonable TTL
      - Helps construct a heat map to decide the cache size / hit rate trade-off
    What are the stats per namespace? ("foo:" vs "bar:")
      - Does cohabitation make sense?
  12. Logging
    Twemcache running with log level 8 (verb):
    Req - "set mcp:00000001 0 0 1\r\nv\r\n"
        [Thu Jul 26 12:46:36 2012] mc_thread.c:449 accepted c 36 from '127.0.0.1:57036' on tid 0
        [Thu Jul 26 12:46:36 2012] mc_core.c:213 recv on c 36 28 of 2048
        [Thu Jul 26 12:46:36 2012] mc_ascii.c:1494 recv on c 36 req with 23 bytes
        00000000  73 65 74 20 6d 63 70 3a 30 30 30 30 30 30 30 31  |set mcp:00000001|
        00000010  20 30 20 30 20 31 20                             | 0 0 1 |
        [Thu Jul 26 12:46:36 2012] mc_slabs.c:308 new slab 0xb3c77008 allocated at pos 0
        [Thu Jul 26 12:46:36 2012] mc_slabs.c:638 get new it at offset 40 with id 1
        [Thu Jul 26 12:46:36 2012] mc_items.c:405 alloc it 'mcp:00000001' at offset 40 with id 1 expiry 0 refcount 1
        [Thu Jul 26 12:46:36 2012] mc_items.c:653 get it 'mcp:00000001' not found
        [Thu Jul 26 12:46:36 2012] mc_items.c:440 link it 'mcp:00000001' at offset 40 with flags 02 id 1
        [Thu Jul 26 12:46:36 2012] mc_items.c:504 remove it 'mcp:00000001' at offset 40 with flags 03 id 1 refcount 1
        [Thu Jul 26 12:46:36 2012] mc_core.c:213 recv on c 36 0 of 2048
        [Thu Jul 26 12:46:36 2012] mc_core.c:227 recv on c 36 eof
        [Thu Jul 26 12:46:36 2012] mc_core.c:434 close c 36