
Smarter Caching

Neha
May 13, 2013

Application-level caches are incredibly useful, but difficult to use well on structured, changing data. In this talk I discuss why you might want to consider moving application computation into the cache in the form of cache joins, and I describe our system, Pequod, an in-memory key/value range store that implements these ideas.

Presented at RICON East, New York, NY.


Transcript

  1. Smarter Caching With Pequod

    Yandong Mao, Neha Narula, Robert Morris, Bryan Kate, Michael Kester, Eddie Kohler
    Neha Narula, May 13, 2013, @neha
  2. Application Computation

    show_timeline(user):
      get user-timeline
      if present
        return timeline
      else
        get user-following list
        if not present
          query db user-following list
          put user-following list in cache
        for each f in user-following list
          get f-posts
          if not present
            query db f-posts
            put f-posts in cache
          posts = posts + f-posts
        user-timeline = sort posts
        put user-timeline into cache
        return user-timeline

    post(poster, tweet):
      insert tweet into database
      append tweet to poster-posts
      get poster-followers
      if not present
        query db
        put followers in cache
      for each f in followers:
        get f-timeline
        if present
          update timeline
          lock
          put new timeline into cache
          unlock
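The client-managed logic on this slide can be sketched in Python roughly as follows. This is only an illustration: the dictionaries standing in for the cache and the database, the sample data, and the extra `ts` argument to `post` are assumptions, and locking is left out because the sketch is single-threaded.

```python
# Hypothetical in-process stand-ins for a cache server and a database.
cache = {}
db_posts = {"argv0": [(101, "first"), (104, "second")], "basho": [(102, "hello")]}
db_following = {"neha": ["argv0", "basho"]}
db_followers = {"argv0": ["neha"], "basho": ["neha"]}

def show_timeline(user):
    """Return the user's timeline, filling every cache level on a miss."""
    timeline = cache.get(("timeline", user))
    if timeline is not None:
        return timeline
    following = cache.get(("following", user))
    if following is None:
        following = db_following.get(user, [])
        cache[("following", user)] = following
    posts = []
    for f in following:
        f_posts = cache.get(("posts", f))
        if f_posts is None:
            f_posts = list(db_posts.get(f, []))  # copy so cache and db stay separate
            cache[("posts", f)] = f_posts
        posts.extend((ts, f, text) for ts, text in f_posts)
    timeline = sorted(posts, reverse=True)  # newest first
    cache[("timeline", user)] = timeline
    return timeline

def post(poster, ts, text):
    """Write a post and patch every cached structure that depends on it."""
    db_posts.setdefault(poster, []).append((ts, text))
    if ("posts", poster) in cache:
        cache[("posts", poster)].append((ts, text))
    followers = cache.get(("followers", poster))
    if followers is None:
        followers = db_followers.get(poster, [])
        cache[("followers", poster)] = followers
    for f in followers:
        timeline = cache.get(("timeline", f))
        if timeline is not None:
            cache[("timeline", f)] = sorted(timeline + [(ts, poster, text)], reverse=True)
```

Even in this small form, the application has to know which cached entries depend on which writes, which is the burden cache joins are meant to remove.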
  3. Pequod

    • In-memory key/value range store
        scan(k0, k1)   get(k)   put(k, v)   install_join(cache_join)
    • App developer specifies cache joins
    • Computation on demand (or not)
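To make the range-store interface concrete, here is a toy Python store with `get`, `put`, and `scan`. It is a sketch under assumptions, not Pequod's implementation: the class name `RangeStore`, the sorted-list representation, and the half-open `scan` interval are all choices made for illustration, and `install_join` is omitted.

```python
import bisect

class RangeStore:
    """Toy ordered key/value store: a sorted key list plus a dict of values."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, first, last):
        """Return (key, value) pairs with first <= key < last, in key order."""
        lo = bisect.bisect_left(self._keys, first)
        hi = bisect.bisect_left(self._keys, last)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

store = RangeStore()
store.put("p|basho|102", "tweet")
store.put("p|argv0|101", "tweet")
store.put("p|argv0|104", "tweet")
# "}" sorts immediately after "|" in ASCII, so this covers every key under "p|argv0|".
argv0_posts = store.scan("p|argv0|", "p|argv0}")
```

Because keys are kept in sorted order, all rows that share a key prefix sit next to each other and can be read with one range scan.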
  4. What Pequod Can Do With Cache Joins

    • compute results automatically
    • subscribe to updates in a range of data
    • keep cached results fresh
    • easily use different caching strategies
    • interleave different types of data
  5. Compute Twitter Timeline

    [Diagram labels: neha's Timeline, Time, Posts, Subscriptions; users neha, argv0, basho, justinbieber, ladygaga]

    SELECT * FROM posts, subscriptions
    WHERE posts.poster = subscriptions.poster
      AND subscriptions.user = 'neha'
      AND ts < posts.timestamp
    ORDER BY timestamp DESC
  6. All Timelines in Pequod

    [Diagram labels: olive's timeline, peter's timeline, max's timeline, neha's timeline]
  7. Pequod keys embed table names and column values

    [ table | column | column … ]
    posts | <poster> | <timestamp>
    subscriptions | <user> | <poster>
  8. Post Keys

    posts | <poster> | <timestamp>
    p | justinbieber | 207    Do you belieb?
  9. Subscription Key Range

    subscriptions | <user> | <poster>
    s | neha | basho
    s | neha | justinbieber
    s | neha | ladygaga
  10. Timeline Keys

    timeline | <user> | <timestamp> | <poster>
    t | neha | 207 | justinbieber    Do you belieb?
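The post, subscription, and timeline key layouts above can be built and taken apart with a few small helpers. The function names below are hypothetical, and the helpers assume column values never contain the `|` separator.

```python
def make_key(table, *columns):
    """Join a table name and column values into a Pequod-style key string."""
    return "|".join([table] + [str(c) for c in columns])

def parse_key(key):
    """Split a key back into (table, [column values])."""
    table, *columns = key.split("|")
    return table, columns

def prefix_range(table, *columns):
    """Return (first, last) bounds covering every key that starts with these columns."""
    prefix = make_key(table, *columns) + "|"
    # "}" is the character right after "|" in ASCII, so it closes the prefix range.
    return prefix, prefix[:-1] + "}"

post_key = make_key("p", "justinbieber", 207)
timeline_key = make_key("t", "neha", 207, "justinbieber")
neha_subscriptions = prefix_range("s", "neha")
```

Placing the user first in subscription and timeline keys is what lets one prefix range select everything belonging to that user.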
  11. Twitter Timeline Cache Join

    Pequod.install_join(
      "t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>")

    timelines = posts × subscriptions
  12. Twitter Timeline Cache Join

    t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>

    Sink: t|neha
    Primary source: p|argv0|101, p|basho|104, p|justinbieber|123, p|ladygaga|198
    Secondary source: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
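One way to picture how a join string breaks down into a sink, a primary source, and a secondary source is the following Python sketch. The parser, its regular expression, and the `match` helper are illustrative assumptions and say nothing about how Pequod itself parses joins.

```python
import re

def parse_join(spec):
    """Split '<sink> = copy <primary> using <secondary>' into three key patterns."""
    m = re.fullmatch(r"\s*(\S+)\s*=\s*copy\s+(\S+)\s+using\s+(\S+)\s*", spec)
    if m is None:
        raise ValueError("unrecognized join: " + spec)
    sink, primary, secondary = (p.split("|") for p in m.groups())
    return {"sink": sink, "primary": primary, "secondary": secondary}

def match(pattern, key):
    """Bind <slot> names in a pattern to the columns of a key, or return None."""
    columns = key.split("|")
    if len(columns) != len(pattern):
        return None
    bindings = {}
    for part, column in zip(pattern, columns):
        if part.startswith("<") and part.endswith(">"):
            bindings[part[1:-1]] = column   # a slot: remember its value
        elif part != column:
            return None                     # a literal (table name) that differs
    return bindings

join = parse_join("t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>")
subscription = match(join["secondary"], "s|neha|basho")
```

The slot names shared between patterns (`user`, `ts`, `poster`) are what tie a subscription row and a post row to the timeline key they produce.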
  13. Ask For A Range, Any Range

    scan("t|neha|100", "t|neha|+")

    Existing keys: t|max|201, t|olive|99
    [Diagram: the gap between them is marked "Where neha's timeline should be"]
  14. Use Lookup Source

    using s|neha|<poster>

    neha's subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
  15. Find Primary Source

    using s|neha|argv0

    Subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
    argv0's posts: p|argv0|101, p|argv0|104, p|argv0|115 (each a tweet)
  16. Copy Primary Source

    copy p|argv0|101

    neha's timeline: t|neha|101|argv0, t|neha|104|argv0, t|neha|115|argv0 (each a tweet)
  17. Use Lookup Source

    using s|neha|basho

    Subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
    basho's posts: p|basho|102 (tweet)
  18. Copy Primary Source

    copy p|basho|102

    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0 (each a tweet)
  19. Use Lookup Source

    using s|neha|justinbieber

    Subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
    justinbieber's posts: p|justinbieber|207 (tweet)
  20. Copy Primary Source

    copy p|justinbieber|207

    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber (each a tweet)
  21. Use Lookup Source

    using s|neha|ladygaga

    Subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
    ladygaga's posts: p|ladygaga|209 (tweet)
  22. Copy Primary Source

    copy p|ladygaga|209

    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga (each a tweet)
  23. Cached Timeline

    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga (each a tweet)
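The walkthrough on the preceding slides (look up each subscription, find that poster's posts, copy them into the timeline) can be sketched as a nested loop. This is a simplified illustration over a plain dict, using the keys from the slides; a real range store would use range scans rather than filtering every key.

```python
def compute_timeline(store, user):
    """Fill t|user|ts|poster keys by copying the posts of each subscribed poster."""
    sub_prefix = "s|" + user + "|"
    # sorted() builds a complete list first, so adding keys inside the loop is safe.
    for sub_key in sorted(k for k in store if k.startswith(sub_prefix)):
        poster = sub_key.split("|")[2]              # lookup (secondary) source
        post_prefix = "p|" + poster + "|"
        for post_key in sorted(k for k in store if k.startswith(post_prefix)):
            ts = post_key.split("|")[2]             # primary source
            store["t|" + user + "|" + ts + "|" + poster] = store[post_key]

store = {
    "s|neha|argv0": None, "s|neha|basho": None,
    "s|neha|justinbieber": None, "s|neha|ladygaga": None,
    "p|argv0|101": "tweet", "p|argv0|104": "tweet", "p|argv0|115": "tweet",
    "p|basho|102": "tweet", "p|justinbieber|207": "tweet", "p|ladygaga|209": "tweet",
}
compute_timeline(store, "neha")
timeline = sorted(k for k in store if k.startswith("t|neha|"))
```

Sorting the sink keys puts the timeline in timestamp order because the timestamp is the first column after the user. That relies on timestamps having a fixed width, as they do in the slides.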
  24. Update, New Post

    put("p|basho|215", "At RICON East WOOO!")

    basho's posts: p|basho|102 (tweet), p|basho|215 (At RICON East WOOO!)
    Update t|neha
  25. Copy New Post

    t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>

    Update t|neha
    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga (each a tweet), t|neha|215|basho (At RICON East WOOO!)
  26. Update, New Subscription

    put("s|neha|xexd", None)

    neha's subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga, s|neha|xexd
    Invalidate t|neha
  27. Invalidate Sink

    Invalidate t|neha
    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga (each a tweet), t|neha|215|basho (At RICON East WOOO!)
  28. Invalidate Sink

    New subscription: s|neha|xexd
    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga (each a tweet), t|neha|215|basho (At RICON East WOOO!)
  29. scan("t|neha|209", "t|neha|+")

    New subscription: s|neha|xexd
    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga, t|neha|213|xexd (each a tweet), t|neha|215|basho (At RICON East WOOO!)
  30. scan("t|neha|209", "t|neha|+")

    New subscription: s|neha|xexd
    Scan result: t|neha|209|ladygaga (tweet), t|neha|213|xexd (tweet), t|neha|215|basho (At RICON East WOOO!)
    Rest of neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber (each a tweet)
  31. Updaters

    • On put(), the updater gets the source key, new value, and old value
    • Updaters on primary sources immediately update the relevant sink keys
    • Updaters on secondary sources log the update on the sink and invalidate sink keys, to be fixed up on the next scan
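The two updater behaviors can be sketched for the timeline join as follows. The class and its methods are hypothetical, and the sketch simplifies one point: where the slide describes logging the update and fixing up the sink on the next scan, this version simply recomputes an invalidated timeline from scratch.

```python
class TimelineCache:
    """Toy model of one join: t|user|ts|poster = copy p|poster|ts using s|user|poster."""

    def __init__(self):
        self.data = {}       # all keys: p|..., s|..., t|...
        self.valid = set()   # users whose timeline range is computed and fresh

    def put(self, key, value):
        self.data[key] = value
        table, *cols = key.split("|")
        if table == "p":     # primary source: eagerly copy into valid sinks
            poster, ts = cols
            for user in self.valid:
                if "s|%s|%s" % (user, poster) in self.data:
                    self.data["t|%s|%s|%s" % (user, ts, poster)] = value
        elif table == "s":   # secondary source: invalidate, repair on next scan
            self.valid.discard(cols[0])

    def scan_timeline(self, user, since="0"):
        """Return the user's timeline keys at or after the given timestamp."""
        if user not in self.valid:
            self._recompute(user)
        prefix = "t|%s|" % user
        return sorted(k for k in self.data
                      if k.startswith(prefix) and k >= prefix + since)

    def _recompute(self, user):
        for k in [k for k in self.data if k.startswith("t|%s|" % user)]:
            del self.data[k]
        for s in [k for k in self.data if k.startswith("s|%s|" % user)]:
            poster = s.split("|")[2]
            for p in [k for k in self.data if k.startswith("p|%s|" % poster)]:
                ts = p.split("|")[2]
                self.data["t|%s|%s|%s" % (user, ts, poster)] = self.data[p]
        self.valid.add(user)
```

A new post is cheap to push because it maps to exactly one sink key per subscriber, while a new subscription can affect many sink keys, which is a plausible reason to defer that work until the timeline is next read.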
  32. Twitter Timeline Cache Join

    Pequod.install_join(
      "t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>")

    pull: for celebrity posts and timelines of users who aren't logged in
  33. Pull Celebrities Each Time

    neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0 (each a tweet)
    Other timelines: t|max|201, t|olive|99
  34. Pull Celebrities Each Time

    scan("t|neha|100", "t|neha|+")

    Cached: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0 (each a tweet)
    Pulled at scan time: p|justinbieber|207 (tweet), p|ladygaga|209 (tweet)
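A hybrid read of this kind might look like the sketch below: ordinary posts are already materialized under `t|neha|...`, and celebrity posts are merged in when the scan runs. The function name, the explicit `celebrities` set, and the string comparison of fixed-width timestamps are assumptions made for illustration.

```python
def hybrid_scan(store, user, celebrities, since):
    """Return (ts, poster, value) rows: cached sink keys plus celebrity posts pulled now."""
    rows = []
    for key, value in store.items():
        cols = key.split("|")
        if cols[0] == "t" and cols[1] == user:
            rows.append((cols[2], cols[3], value))      # already materialized (push)
        elif (cols[0] == "p" and cols[1] in celebrities
              and "s|%s|%s" % (user, cols[1]) in store):
            rows.append((cols[2], cols[1], value))      # computed at read time (pull)
    return sorted(r for r in rows if r[0] >= since)

store = {
    "s|neha|argv0": None, "s|neha|basho": None,
    "s|neha|justinbieber": None, "s|neha|ladygaga": None,
    "t|neha|101|argv0": "tweet", "t|neha|102|basho": "tweet",
    "t|neha|104|argv0": "tweet", "t|neha|115|argv0": "tweet",
    "p|justinbieber|207": "tweet", "p|ladygaga|209": "tweet",
}
timeline = hybrid_scan(store, "neha", {"justinbieber", "ladygaga"}, "100")
```

Pulling celebrity posts avoids writing one copy per follower on every celebrity post, at the cost of a little extra work on each read.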
  35. Pequod supports incremental operations to keep results so fresh (and so clean clean).
  36. Many Requests

    [Diagram labels: Articles, Comments, Votes, App, Entire Cached Page]
  37. Inline Cache Joins

    page|<article_id>|votes = count votes|<article_id>
    page|<article_id>|article = copy articles|<article_id>
    page|<article_id>|comments|<id> = copy comments|<article_id>|<id>
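The effect of these three joins on one article can be sketched as below. The function is hypothetical, it computes the page keys eagerly over a plain dict, and it assumes vote keys have the form `votes|<article_id>|<voter>`, which the slide does not spell out.

```python
def build_page(store, article_id):
    """Compute the page|<article_id>|... keys that the three joins would produce."""
    prefix = "page|%s|" % article_id
    page = {}
    # page|<article_id>|article = copy articles|<article_id>
    page[prefix + "article"] = store["articles|%s" % article_id]
    # page|<article_id>|votes = count votes|<article_id>
    page[prefix + "votes"] = sum(
        1 for k in store if k.startswith("votes|%s|" % article_id))
    # page|<article_id>|comments|<id> = copy comments|<article_id>|<id>
    for key, value in store.items():
        if key.startswith("comments|%s|" % article_id):
            page[prefix + "comments|" + key.split("|")[2]] = value
    return dict(sorted(page.items()))   # key order, as a range scan would return

store = {
    "articles|7": "Article text",
    "votes|7|max": None, "votes|7|neha": None, "votes|8|max": None,
    "comments|7|1": "First!", "comments|7|2": "Nice talk", "comments|8|1": "Other",
}
page = build_page(store, "7")
```

Because every output key shares the `page|7|` prefix, the article, its comments, and its vote count come back together from a single range scan.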
  38. Interleaved Data

    One scan!
    [Diagram labels: Articles, Comments, Votes, three Cache Joins, Entire Cached Page]
  39. Implementation

    • C++ single-threaded, event-driven server
    • Range store: trie of red-black trees
    [Diagram labels: posts, subscriptions]
  40. Optimization: Value Sharing

    t|argv0|215|basho → ptr
    t|ladygaga|215|basho → ptr
    t|neha|215|basho → ptr
    p|basho|215: At RICON East WOOO!
    17%
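Value sharing can be illustrated in Python with object references: each sink key refers to the same value object as the source key instead of holding its own copy. The helper name is hypothetical, and this only models the idea, not Pequod's C++ memory layout.

```python
def copy_shared(store, source_key, sink_keys):
    """Point every sink key at the source's value object instead of duplicating it."""
    value = store[source_key]
    for key in sink_keys:
        store[key] = value   # stores a reference to the same object, not a copy

store = {"p|basho|215": bytearray(b"At RICON East WOOO!")}
sinks = ["t|argv0|215|basho", "t|ladygaga|215|basho", "t|neha|215|basho"]
copy_shared(store, "p|basho|215", sinks)
shared = all(store[k] is store["p|basho|215"] for k in sinks)
```

With sharing, the tweet text is stored once no matter how many timelines it appears in.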
  41. Evaluation

    • QPS compared to other systems
    • Twitter caching strategies
    • Pequod compared to client-managed caching
    • Benefit of optimizations
  42. Setup

    • 12-core machine (two 3.47 GHz Intel Xeon X5960 chips with 2 hyperthreads per core)
    • Linux 3.2.0, 96 GB memory
    • Redis 2.4.14, PostgreSQL 9.1.8
  43. Workloads

    • Twitter
      – microbenchmark: 2K users, each timeline request returns ~20 new tweets. 1M posts, 1M reads
      – ~real: 1.8M users, 72M relationships. (see paper)
    • News Site
      – 100K articles, 50K users, 1M comments, 2M votes
      – 4M requests, 1% comment rate, varying vote rates
  44. QPS Comparison

    [Bar chart: queries per second (thousands, 0 to 120) for PostgreSQL, Redis, and Pequod on the News Site and Twitter workloads. Annotation: CPU Utilization at 19X]
  45. Twitter Hybrid Push/Pull

    [Chart: total runtime (seconds, 0 to 30) against percentage of active users (1 to 100) for Pull, Push, and Hybrid]
  46. Client Managed vs. Pequod

    [Stacked bar chart: total runtime (seconds, 0 to 70) for Client-managed and Pequod, split into Other, Post, and Timeline. Annotation: RPC Overheads]
  47. Client Managed vs. Pequod

    [Stacked bar chart: total runtime (seconds, 0 to 70) for Client-managed and Pequod, split into Other, Post, and Timeline. Annotation: Inserting New Posts]
  48. Benefit of Sink Hints and Value Sharing

    [Stacked bar chart: total runtime (seconds, 0 to 45) for Unoptimized Pequod and Pequod, split into Other, Post, and Timeline]
  49. Related Work

    • Invalidating the cache
      – DUP, TxCache, Scaling Memcached at Facebook
    • Materialized Views
      – Maintenance, Dynamic Materialized Views from Microsoft
    • Push/pull
      – Twitter timeline REST service
  50. Open Questions

    • Eviction
    • Cross-server cache joins
    • Create efficient cache joins automagically
    • Support more computation
    • Non-ordered store
    • Full-fledged datastore
  51. Pequod Cache Joins

    Features of a database,
    Performance of a cache

    @neha   [email protected]