Slide 1

Slide 1 text

Twitter Inc. | @twemcache
Twemcache
Manju Rajashekhar (@manju)
Yao Yue (@thinkingfish)
github.com/twitter/twemcache

Slide 2

Slide 2 text

Cache In Production
- ~30 TB of cache
- > 2000 instances of twemcache
- ~500 machines
- Average cache instance is 15 GB
- ~2 trillion queries/day
- 23 million queries/sec

Slide 3

Slide 3 text

Cache Systems
Cache is an optimization for
- CPU
- Disk (write through / write back)

Slide 4

Slide 4 text

Caching System
Simple Components
- Client: finagle-memcached, finagle-redis
- Proxy: twemproxy, cproxy
- Server: twemcache, redis
Complex System

Slide 5

Slide 5 text

Client, Proxy & Server
[Diagram: clients (C), proxies (P) and servers (S), with labels m, m' and n]
m >> n
m' < n

Slide 6

Slide 6 text

Twemcache
- Based on memcached 1.4.4
- Running in production since Jan '11

Slide 7

Slide 7 text

Features
- Custom Eviction Algorithm
- Thread-local stats collector
- Command Logger

Slide 8

Slide 8 text

Eviction (1)
New Item
LRU Eviction

Slide 9

Slide 9 text

Eviction (2)
New Item
Items of different sizes

Slide 10

Slide 10 text

Eviction (3)
B1 B2 B3
Per Slabclass LRU Eviction = calcification, OOM

Slide 11

Slide 11 text

Slab Eviction
B1 B2 B3
Slab Eviction = deterministic behavior
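The idea of evicting a whole slab rather than a single item can be sketched in a few lines of C. This is a toy model under stated assumptions, not twemcache's code: `struct slab`, `SLAB_SIZE`, `slab_capacity` and `slab_evict` are hypothetical names, and the policy for picking the victim slab is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_SIZE 1024 /* bytes per slab; hypothetical value for this sketch */

/* A slab is owned by exactly one slab class (one item size) at a time. */
struct slab {
    size_t item_size; /* item size of the owning slab class */
    size_t nitem;     /* items currently stored in this slab */
};

/* Number of items of its current class that a slab can hold. */
static size_t slab_capacity(const struct slab *s)
{
    assert(s != NULL && s->item_size > 0 && s->item_size <= SLAB_SIZE);
    return SLAB_SIZE / s->item_size;
}

/*
 * Evict a whole slab: drop every item it holds and hand the slab to the
 * class that stores items of new_item_size. Returns the number of items
 * that were evicted.
 */
static size_t slab_evict(struct slab *victim, size_t new_item_size)
{
    size_t evicted;

    assert(victim != NULL);
    assert(new_item_size > 0 && new_item_size <= SLAB_SIZE);

    evicted = victim->nitem;
    victim->item_size = new_item_size;
    victim->nitem = 0;
    return evicted;
}
```

With per-slabclass LRU, a class that owns no slab cannot make room by evicting its own items. In this sketch any slab can be reassigned to the class that needs memory, which is why memory does not stay pinned to the item sizes that arrived first.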

Slide 12

Slide 12 text

Thread-local stats collector
Metrics Types
- counter, gauge, timestamp
Metrics Category
- command related: 44
- item_*: 9, slab_*: 10, conn_*: 5, data_*: 2, klog_*: 3
Metrics Granularity
- global, slab (auto aggregation of slab-level stats at global level)

metric   worker1   worker2   stats
a        2         4         6
b        3         5         8
...      ...       ...       ...
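The worker1 / worker2 / stats table on this slide amounts to a per-metric sum over thread-local rows. A minimal sketch, assuming each worker thread owns one row that only it updates; `struct thread_stats`, `stats_aggregate`, `METRIC_A` and `METRIC_B` are hypothetical names, not twemcache's API. The sketch covers summed metrics only; a timestamp metric would likely need a different merge rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metric ids standing in for "a" and "b" on the slide. */
enum metric { METRIC_A, METRIC_B, METRIC_COUNT };

/* One row per worker thread; only the owning thread writes to its row. */
struct thread_stats {
    int64_t value[METRIC_COUNT];
};

/* Sum every metric over all worker rows into out[]. */
static void stats_aggregate(const struct thread_stats *workers, int nworker,
                            int64_t out[METRIC_COUNT])
{
    assert(workers != NULL && out != NULL && nworker >= 0);

    for (int m = 0; m < METRIC_COUNT; m++) {
        out[m] = 0;
        for (int w = 0; w < nworker; w++) {
            out[m] += workers[w].value[m];
        }
    }
}
```

Called with the two rows from the slide (a=2, b=3 and a=4, b=5), `stats_aggregate` yields a=6 and b=8, matching the stats column.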

Slide 13

Slide 13 text

Adding new stats

src/mc_stats.h:
    #define STATS_THREAD_METRICS(ACTION)                                                        \
        ACTION( conn_disabled, STATS_COUNTER, "# times accepting connections was disabled")     \
        ACTION( conn_total,    STATS_COUNTER, "# connections created until now")                \
        ACTION( conn_struct,   STATS_COUNTER, "# new connection objects created")               \
        ACTION( conn_yield,    STATS_COUNTER, "# times we yielded from an active connection")   \
        ACTION( conn_curr,     STATS_GAUGE,   "# active connections")                           \
        ACTION( conn_uds,      STATS_GAUGE,   "# active unix domain sockets")                   \
        ...

src/mc_core.c:
    stats_thread_incr(conn_disabled);
    stats_thread_decr(conn_curr);
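The `ACTION(...)` list above follows the X-macro pattern: one list of metrics is expanded several times with different definitions of `ACTION`. The sketch below shows the pattern in isolation. Everything prefixed `DEMO_` or `demo_`, the `enum demo_type` definition and the two-entry list are hypothetical stand-ins, not the contents of `src/mc_stats.h`.

```c
#include <assert.h>
#include <string.h>

/* Stand-in metric types for this sketch. */
enum demo_type { STATS_COUNTER, STATS_GAUGE };

/* The single place where a metric is declared. */
#define DEMO_METRICS(ACTION)                                              \
    ACTION(conn_total, STATS_COUNTER, "# connections created until now")  \
    ACTION(conn_curr,  STATS_GAUGE,   "# active connections")

/* First expansion: one enum constant per metric. */
#define DEFINE_ENUM(name, type, desc) DEMO_##name,
enum demo_metric { DEMO_METRICS(DEFINE_ENUM) DEMO_METRIC_COUNT };
#undef DEFINE_ENUM

/* Second expansion: a description table indexed by the enum. */
struct demo_desc {
    const char     *name;
    enum demo_type  type;
    const char     *desc;
};

#define DEFINE_DESC(name, type, desc) { #name, type, desc },
static const struct demo_desc demo_table[] = { DEMO_METRICS(DEFINE_DESC) };
#undef DEFINE_DESC

/* Look up the printable name of a metric. */
static const char *demo_metric_name(enum demo_metric m)
{
    assert((int)m >= 0 && (int)m < DEMO_METRIC_COUNT);
    return demo_table[m].name;
}
```

Because the enum and the table come from the same list, adding a metric is a one-line change and the two can never fall out of step.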

Slide 14

Slide 14 text

Command Logger
Log Format
Fields, left to right: Client IP, Timestamp, Type, Key, Status, Size

172.25.135.205:55438 - [09/Jul/2012:18:15:45 -0700] "set foo 0 0 3" 1 6
172.25.135.205:55438 - [09/Jul/2012:18:15:46 -0700] "get foo" 0 14
172.25.135.205:55438 - [09/Jul/2012:18:15:57 -0700] "incr bar 1" 3 9
172.25.135.205:55438 - [09/Jul/2012:18:16:05 -0700] "set bar 0 0 1" 1 6
172.25.135.205:55438 - [09/Jul/2012:18:16:09 -0700] "incr bar 1" 0 1
172.25.135.205:55438 - [09/Jul/2012:18:16:13 -0700] "get bar" 0 12
....
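A log line in the format shown on this slide can be split into its six fields with `sscanf`. This is a minimal sketch for offline analysis; `struct klog_entry` and `klog_parse` are hypothetical names, the buffer sizes are assumptions, and the status and size columns are kept as plain integers because the slide does not define their values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed command-log line. */
struct klog_entry {
    char client[64];    /* "ip:port" of the client */
    char timestamp[64]; /* text between the square brackets */
    char type[16];      /* command type, e.g. "set" or "get" */
    char key[256];      /* key the command operated on */
    int  status;
    int  size;
};

/* Parse one log line into e. Returns 0 on success, -1 on a malformed line. */
static int klog_parse(const char *line, struct klog_entry *e)
{
    char cmd[512];
    int n;

    assert(line != NULL && e != NULL);

    /* "%63[^]]" reads everything up to the closing bracket of the timestamp;
     * "%511[^\"]" reads the whole quoted command. */
    n = sscanf(line, "%63s - [%63[^]]] \"%511[^\"]\" %d %d",
               e->client, e->timestamp, cmd, &e->status, &e->size);
    if (n != 5) {
        return -1;
    }

    /* The first two words of the command are the type and the key. */
    if (sscanf(cmd, "%15s %255s", e->type, e->key) != 2) {
        return -1;
    }
    return 0;
}
```

Parsing the command in a second step keeps lines such as `"get foo"`, which have nothing after the key, from tripping up the first format string.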

Slide 15

Slide 15 text

Examples
Keys accessed/updated/retrieved in the past 24hrs
- What data is hot and what is not?
- What should the heap size be to cache 24 hours' worth of data?
How many times and when is a key retrieved/updated after insertion?
- Explains the observed hit rate
- Determines a reasonable TTL
- Helps construct a heat map to decide the cache size / hit rate trade-off
What are the stats per namespace? ("foo:" vs "bar:")
- Does cohabitation make sense?
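Per-namespace statistics first require mapping each logged key to its namespace. A minimal sketch, assuming the namespace is everything up to and including the first ':' in the key, as the "foo:" and "bar:" examples suggest; `key_namespace` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the namespace prefix of key (up to and including the first ':')
 * into ns. Returns the prefix length, or 0 when the key has no ':' or
 * the prefix does not fit in ns.
 */
static size_t key_namespace(const char *key, char *ns, size_t nssize)
{
    const char *colon;
    size_t len;

    assert(key != NULL && ns != NULL && nssize > 0);

    ns[0] = '\0';
    colon = strchr(key, ':');
    if (colon == NULL) {
        return 0;
    }

    len = (size_t)(colon - key) + 1;
    if (len >= nssize) {
        return 0;
    }

    memcpy(ns, key, len);
    ns[len] = '\0';
    return len;
}
```

Grouping log entries by the returned prefix gives the per-namespace request counts and sizes that the last question on the slide asks about.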

Slide 16

Slide 16 text

Logging
Twemcache running with log level 8 (verb):
Req - "set mcp:000000001 0 0 1\r\nv\r\n"

[Thu Jul 26 12:46:36 2012] mc_thread.c:449 accepted c 36 from '127.0.0.1:57036' on tid 0
[Thu Jul 26 12:46:36 2012] mc_core.c:213 recv on c 36 28 of 2048
[Thu Jul 26 12:46:36 2012] mc_ascii.c:1494 recv on c 36 req with 23 bytes
00000000  73 65 74 20 6d 63 70 3a  30 30 30 30 30 30 30 31   |set mcp:00000001|
00000010  20 30 20 30 20 31 20                               | 0 0 1 |
[Thu Jul 26 12:46:36 2012] mc_slabs.c:308 new slab 0xb3c77008 allocated at pos 0
[Thu Jul 26 12:46:36 2012] mc_slabs.c:638 get new it at offset 40 with id 1
[Thu Jul 26 12:46:36 2012] mc_items.c:405 alloc it 'mcp:00000001' at offset 40 with id 1 expiry 0 refcount 1
[Thu Jul 26 12:46:36 2012] mc_items.c:653 get it 'mcp:00000001' not found
[Thu Jul 26 12:46:36 2012] mc_items.c:440 link it 'mcp:00000001' at offset 40 with flags 02 id 1
[Thu Jul 26 12:46:36 2012] mc_items.c:504 remove it 'mcp:00000001' at offset 40 with flags 03 id 1 refcount 1
[Thu Jul 26 12:46:36 2012] mc_core.c:213 recv on c 36 0 of 2048
[Thu Jul 26 12:46:36 2012] mc_core.c:227 recv on c 36 eof
[Thu Jul 26 12:46:36 2012] mc_core.c:434 close c 36

Slide 17

Slide 17 text

Future
- Bootstrapping twemcache from key-value pairs in a file
- Performance: Fine grained cache_lock

Slide 18

Slide 18 text

Twemcache
github.com/twitter/twemcache
@twemcache