
Smarter Caching

Neha
May 13, 2013

Application-level caches are incredibly useful, but difficult to use well on structured, changing data. In this talk I discuss why you might want to move application computation into the cache in the form of cache joins, and I describe our system, Pequod, an in-memory key/value range store that implements these ideas.

Presented at RICON East, New York, NY.

Transcript

  1. Smarter Caching With Pequod
      Yandong Mao, Neha Narula, Robert Morris, Bryan Kate, Michael Kester, Eddie Kohler
      Neha Narula, May 13, 2013, @neha
  2. academia
  3. Application-level Caching Is Useful
  4.
  5. [Diagram: App, Cache, DB; arrows labeled "Cache reads", "Reads", "Writes"]
  6. Application Computation
      show_timeline(user):
          get user-timeline
          if present: return timeline
          else:
              get user-following list
              if not present:
                  query db for user-following list
                  put user-following list in cache
              for each f in user-following list:
                  get f-posts
                  if not present:
                      query db for f-posts
                      put f-posts in cache
                  posts = posts + f-posts
              user-timeline = sort posts
              put user-timeline into cache
              return user-timeline

      post(poster, tweet):
          insert tweet into database
          append tweet to poster-posts
          get poster-followers
          if not present:
              query db
              put followers in cache
          for each f in followers:
              get f-timeline
              if present:
                  update timeline
                  lock
                  put new timeline into cache
                  unlock
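
A runnable Python version of this client-managed pattern shows how much bookkeeping the application carries. It is a sketch only: plain dicts stand in for the cache and the database, and the key names (`neha-timeline`, `basho-posts`, and so on) and sample rows are hypothetical.

```python
import threading

# Hypothetical stand-ins: a dict for the cache and a dict of "tables" for the database.
cache = {}
db = {
    "following": {"neha": ["basho", "ladygaga"]},
    "followers": {"basho": ["neha"], "ladygaga": ["neha"]},
    "posts": {"basho": [(102, "tweet")], "ladygaga": [(209, "tweet")]},
}
timeline_lock = threading.Lock()


def get_or_load(key, table, row):
    """Return the cached value for key, filling it from the database on a miss."""
    if key not in cache:
        cache[key] = list(db[table].get(row, []))   # query db, put result in cache
    return cache[key]


def show_timeline(user):
    timeline = cache.get(f"{user}-timeline")
    if timeline is not None:                         # cache hit
        return timeline
    posts = []
    for f in get_or_load(f"{user}-following", "following", user):
        posts += get_or_load(f"{f}-posts", "posts", f)
    timeline = sorted(posts)                         # (timestamp, text) sorts by time
    cache[f"{user}-timeline"] = timeline
    return timeline


def post(poster, ts, tweet):
    db["posts"].setdefault(poster, []).append((ts, tweet))    # insert into database
    if f"{poster}-posts" in cache:
        cache[f"{poster}-posts"].append((ts, tweet))           # keep cached posts in sync
    for f in get_or_load(f"{poster}-followers", "followers", poster):
        with timeline_lock:                                    # lock ... unlock
            timeline = cache.get(f"{f}-timeline")
            if timeline is not None:                           # only update if present
                cache[f"{f}-timeline"] = sorted(timeline + [(ts, tweet)])
```

Every new read path and write path needs its own fill and update logic, which is the burden the talk proposes moving into the cache.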
  7. Cache Joins
  8. [Diagram: App, Cache, DB; arrows labeled "Cache reads", "Reads", "Writes"]
  9. [Diagram: App, Pequod, DB; arrows labeled "Cache reads", "Writes", "Cache Joins"]
  10. Pequod
      • In-memory key/value range store: scan(k0, k1), get(k), put(k, v), install_join(cache_join)
      • App developer specifies cache joins
      • Computation on demand (or not)
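
A minimal sketch of the get/put/scan part of this interface, assuming half-open scans (k0 inclusive, k1 exclusive) and plain string keys. A sorted Python list searched with `bisect` stands in for Pequod's real range store, and `install_join` is left out.

```python
import bisect


class RangeStore:
    """Minimal in-memory ordered key/value store with get, put, and range scan.

    A sorted key list plus a dict; a sketch only, not Pequod's data structure.
    """

    def __init__(self):
        self._keys = []      # keys kept in sorted (lexicographic) order
        self._values = {}

    def get(self, k):
        return self._values.get(k)

    def put(self, k, v):
        if k not in self._values:
            bisect.insort(self._keys, k)   # insert new key at its sorted position
        self._values[k] = v

    def scan(self, k0, k1):
        """Return (key, value) pairs with k0 <= key < k1, in key order."""
        lo = bisect.bisect_left(self._keys, k0)
        hi = bisect.bisect_left(self._keys, k1)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

Because keys stay sorted, a scan costs a binary search plus the size of the result, which is what lets related keys be read together.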
  11. What Pequod Can Do With Cache Joins
      • Compute results automatically
      • Subscribe to updates in a range of data
      • Keep cached results fresh
      • Easily use different caching strategies
      • Interleave different types of data
  12. Compute Twitter Timeline
      [Diagram: neha's Timeline over Time, drawn from Posts (argv0, basho, justinbieber, ladygaga) and neha's Subscriptions]
      SELECT * FROM posts, subscriptions
      WHERE posts.poster = subscriptions.poster
        AND subscriptions.user = 'neha'
        AND ts < posts.timestamp
      ORDER BY timestamp DESC
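
The slide's query runs essentially unchanged against SQLite, with `ts` turned into a bound parameter. The sketch uses Python's built-in `sqlite3` module; the `body` column, the `timeline` helper, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (poster TEXT, timestamp INTEGER, body TEXT);
    CREATE TABLE subscriptions (user TEXT, poster TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    ("argv0", 101, "tweet"), ("basho", 102, "tweet"),
    ("justinbieber", 207, "Do you belieb?"), ("ladygaga", 209, "tweet"),
    ("olive", 150, "not followed by neha"),
])
conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", [
    ("neha", "argv0"), ("neha", "basho"), ("neha", "justinbieber"), ("neha", "ladygaga"),
])


def timeline(user, ts):
    """Posts by everyone `user` subscribes to, newer than `ts`, newest first."""
    return conn.execute(
        """SELECT posts.poster, posts.timestamp, posts.body
           FROM posts, subscriptions
           WHERE posts.poster = subscriptions.poster
             AND subscriptions.user = ?
             AND ? < posts.timestamp
           ORDER BY posts.timestamp DESC""",
        (user, ts),
    ).fetchall()
```

Run on every timeline read, this join is exactly the work an application-level cache tries to avoid repeating.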
  13. Tell Pequod: timelines = posts X subscriptions
  14. All Timelines in Pequod
      [Diagram: olive's timeline, peter's timeline, max's timeline, neha's timeline]
  15. Pequod keys embed table names and column values
      [ table | column | column … ]
      posts | <poster> | <timestamp>
      subscriptions | <user> | <poster>
  16. Post Keys
      posts | <poster> | <timestamp>
      p|justinbieber|207 → "Do you belieb?"
  17. Subscription Key Range
      subscriptions | <user> | <poster>
      s|neha|basho
      s|neha|justinbieber
      s|neha|ladygaga
  18. Timeline Keys
      timeline | <user> | <timestamp> | <poster>
      t|neha|207|justinbieber → "Do you belieb?"
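
Pequod keys are strings ordered lexicographically, so a numeric column only sorts correctly if every value has the same width (unpadded, "t|neha|99" sorts after "t|neha|100"). The slides' timestamps are all three digits; the hypothetical `make_key` helper below zero-pads integer columns to a fixed width, an assumption of this sketch rather than something the talk specifies, and joins columns with `|`.

```python
def make_key(table: str, *columns, ts_width: int = 10) -> str:
    """Build a Pequod-style key "table|col|col...".

    Integer columns (timestamps) are zero-padded so that lexicographic key order
    matches numeric order; the padding width is an assumption of this sketch.
    """
    parts = [table]
    for c in columns:
        parts.append(str(c).zfill(ts_width) if isinstance(c, int) else c)
    return "|".join(parts)


def split_key(key: str) -> list:
    """Recover the table name and column values (assumes no '|' inside values)."""
    return key.split("|")
```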
  19. Twitter Timeline Cache Join
      timelines = posts X subscriptions
      Pequod.install_join("t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>")
  20. Twitter Timeline Cache Join
      t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>
      • Secondary source: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
      • Primary source: p|argv0|101, p|basho|104, p|justinbieber|123, p|ladygaga|198
      • Sink: t|neha
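
The join names its slots (`<user>`, `<ts>`, `<poster>`) once and reuses them across the sink, the primary source, and the secondary source. The hypothetical `match` and `instantiate` helpers below sketch how binding those slots could work, assuming each slot matches exactly one `|`-free key component; this is not Pequod's parser.

```python
import re

_SLOT = re.compile(r"<(\w+)>")


def match(pattern: str, key: str):
    """Bind the <slots> in a pattern like "s|<user>|<poster>" against a concrete key.

    Returns a dict of bindings, or None if the key does not fit the pattern.
    Each slot matches one '|'-free component; literal text must match exactly.
    """
    regex = ""
    pos = 0
    for m in _SLOT.finditer(pattern):
        regex += re.escape(pattern[pos:m.start()]) + f"(?P<{m.group(1)}>[^|]*)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, key)
    return found.groupdict() if found else None


def instantiate(pattern: str, bindings: dict) -> str:
    """Fill a pattern's <slots> from bindings, e.g. to build a sink key."""
    return _SLOT.sub(lambda m: str(bindings[m.group(1)]), pattern)
```

Binding `<user>` and `<poster>` from a secondary-source key, then `<ts>` from a primary-source key, is enough to name the sink key.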
  21. Compute results automatically
  22. Ask For A Range, Any Range
      scan("t|neha|100", "t|neha|+")
      [Diagram: existing keys t|max|201 and t|olive|99, with the gap where neha's timeline should be]
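
scan("t|neha|100", "t|neha|+") asks for every key at or after t|neha|100 that still belongs to neha's timeline. One way to read the "+" end bound, assumed here rather than taken from the talk, is "the first key past every key with the prefix t|neha|", which in plain character order is the prefix with its last character incremented. The helper names are hypothetical.

```python
import bisect


def prefix_end(prefix: str) -> str:
    """Smallest string greater than every string that starts with `prefix`.

    Assumes a non-empty prefix whose last character is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def scan(sorted_keys: list, k0: str, k1: str) -> list:
    """Keys k with k0 <= k < k1 from an already-sorted list."""
    return sorted_keys[bisect.bisect_left(sorted_keys, k0):bisect.bisect_left(sorted_keys, k1)]


keys = sorted([
    "t|max|201", "t|neha|101|argv0", "t|neha|102|basho",
    "t|neha|207|justinbieber", "t|olive|99",
])
neha_recent = scan(keys, "t|neha|100", prefix_end("t|neha|"))
```

The neighbors t|max|201 and t|olive|99 fall outside the range on either side, so the scan touches only neha's keys.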
  23. Use Lookup Source
      using s|neha|<poster>
      neha's subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga
  24. Find Primary Source
      using s|neha|argv0
      argv0's posts: p|argv0|101, p|argv0|104, p|argv0|115
  25. Copy Primary Source
      copy p|argv0|101
      neha's timeline: t|neha|101|argv0, t|neha|104|argv0, t|neha|115|argv0
  26. Use Lookup Source
      using s|neha|basho
      basho's posts: p|basho|102
  27. Copy Primary Source
      copy p|basho|102
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0
  28. Use Lookup Source
      using s|neha|justinbieber
      justinbieber's posts: p|justinbieber|207
  29. Copy Primary Source
      copy p|justinbieber|207
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber
  30. Use Lookup Source
      using s|neha|ladygaga
      ladygaga's posts: p|ladygaga|209
  31. Copy Primary Source
      copy p|ladygaga|209
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga
  32. Pequod uses cache joins to automatically create cached objects.
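
Slides 23 to 31 can be condensed into a short sketch, assuming a dict-backed store and with the timeline join hard-coded rather than parsed from the install_join string. The first scan of a user's timeline walks the secondary-source range (prefix `s|<user>|`), scans each poster's primary-source range (prefix `p|<poster>|`), and copies every post into a sink key; later scans reuse the materialized keys. The `Store` class and its methods are hypothetical.

```python
import bisect


class Store:
    """Tiny ordered store used to sketch on-demand cache-join evaluation."""

    def __init__(self):
        self.data = {}
        self.computed = set()   # users whose timeline sink range is materialized

    def put(self, k, v):
        self.data[k] = v

    def _range(self, prefix):
        """Sorted keys that start with `prefix` (prefix ends in '|')."""
        keys = sorted(self.data)
        lo = bisect.bisect_left(keys, prefix)
        hi = bisect.bisect_left(keys, prefix[:-1] + chr(ord(prefix[-1]) + 1))
        return keys[lo:hi]

    def compute_timeline(self, user):
        """t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>"""
        for s_key in self._range(f"s|{user}|"):          # lookup (secondary) source
            poster = s_key.split("|")[2]
            for p_key in self._range(f"p|{poster}|"):     # primary source
                ts = p_key.split("|")[2]
                self.data[f"t|{user}|{ts}|{poster}"] = self.data[p_key]  # copy into sink
        self.computed.add(user)

    def scan_timeline(self, user, since):
        if user not in self.computed:                     # compute on demand
            self.compute_timeline(user)
        keys = self._range(f"t|{user}|")
        return [(k, self.data[k]) for k in keys if k >= f"t|{user}|{since}"]
```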
  33. New Tweets
  34. Subscribe to range updates
  35. Cached Timeline
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga
  36. Update, New Post
      put("p|basho|215", "At RICON East WOOO!")
      basho's posts: p|basho|102, p|basho|215 → "At RICON East WOOO!"
      Update → t|neha
  37. Copy New Post
      t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga, t|neha|215|basho → "At RICON East WOOO!"
      Update → t|neha
  38. Update, New Subscription
      put("s|neha|xexd", None)
      neha's subscriptions: s|neha|argv0, s|neha|basho, s|neha|justinbieber, s|neha|ladygaga, s|neha|xexd
      Invalidate → t|neha
  39. Invalidate Sink
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga, t|neha|215|basho
      Invalidate → t|neha
  40. Invalidate Sink
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga, t|neha|215|basho
      with s|neha|xexd
  41. s|neha|xexd, then scan("t|neha|209", "t|neha|+")
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber, t|neha|209|ladygaga, t|neha|213|xexd (new), t|neha|215|basho
  42. scan("t|neha|209", "t|neha|+") → t|neha|209|ladygaga, t|neha|213|xexd, t|neha|215|basho ("At RICON East WOOO!")
      rest of neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0, t|neha|207|justinbieber
  43. Updaters
      • On put(), the updater gets the source key, the new value, and the old value
      • Updaters on primary sources immediately update the relevant sink keys
      • Updaters on secondary sources log the update on the sink and invalidate sink keys, to be fixed up on the next scan
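
A sketch of the two updater behaviors for the timeline join, under simplifying assumptions: Python dicts replace key ranges, and a secondary-source update drops the user's whole cached timeline instead of logging the change and invalidating individual sink keys. The class and method names are hypothetical.

```python
class TimelineCache:
    """Sketch of the two updater behaviors, hard-coded for the timeline join.

    Primary-source puts (p|<poster>|<ts>) are copied into existing sinks at once;
    secondary-source puts (s|<user>|<poster>) mark the user's sink invalid, and the
    next scan recomputes it.
    """

    def __init__(self):
        self.subs = {}        # user -> set of posters   (s|<user>|<poster>)
        self.posts = {}       # poster -> {ts: text}      (p|<poster>|<ts>)
        self.timelines = {}   # user -> {(ts, poster): text} for valid sinks only

    def put_post(self, poster, ts, text):
        self.posts.setdefault(poster, {})[ts] = text
        for user, timeline in self.timelines.items():
            if poster in self.subs.get(user, ()):
                timeline[(ts, poster)] = text          # eager update of sink key

    def put_subscription(self, user, poster):
        self.subs.setdefault(user, set()).add(poster)
        self.timelines.pop(user, None)                 # invalidate the sink

    def scan(self, user, since):
        if user not in self.timelines:                 # fix up on next scan
            self.timelines[user] = {
                (ts, p): text
                for p in self.subs.get(user, ())
                for ts, text in self.posts.get(p, {}).items()
            }
        return sorted((k, v) for k, v in self.timelines[user].items() if k[0] >= since)
```

This mirrors slides 36 to 42: basho's new post lands in neha's cached timeline right away, while the new subscription to xexd only takes effect when neha's timeline is next scanned.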
  44. Clients can use Pequod to subscribe to updates in a range.
  45. 38 MILLION WRITES!
  46. Use different caching strategies
  47. Twitter Timeline Cache Join
      Pequod.install_join("t|<user>|<ts>|<poster> = copy p|<poster>|<ts> using s|<user>|<poster>")
      pull: for celebrity posts and timelines of users who aren't logged in
  48. Pull Celebrities Each Time
      neha's timeline: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0
      [other timelines: t|max|201, t|olive|99]
  49. Pull Celebrities Each Time
      scan("t|neha|100", "t|neha|+") also reads p|justinbieber|207 and p|ladygaga|209
      cached: t|neha|101|argv0, t|neha|102|basho, t|neha|104|argv0, t|neha|115|argv0
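
With a hybrid strategy, ordinary posts are pushed into cached timelines while celebrity posts are pulled at scan time and merged in. The function below sketches only that merge step; its name, arguments, and the fixed celebrity set are assumptions, and how Pequod decides which sources to pull is not shown.

```python
import heapq


def scan_hybrid(user, since, cached_timeline, subs, posts, celebrities):
    """Merge a user's pushed (cached) timeline with celebrity posts pulled at scan time.

    cached_timeline: sorted list of (ts, poster, text) for non-celebrity subscriptions
    subs: user -> set of posters; posts: poster -> list of (ts, text)
    celebrities: posters whose posts are never copied into timelines (pull)
    """
    pushed = [e for e in cached_timeline if e[0] >= since]
    pulled = sorted(
        (ts, p, text)
        for p in subs[user] & celebrities
        for ts, text in posts[p]
        if ts >= since
    )
    return list(heapq.merge(pushed, pulled))   # both inputs sorted by timestamp
```

Pulling avoids copying a celebrity's post into millions of timelines on every write, at the cost of extra work on each read.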
  50. Pequod makes it easy to switch between caching strategies.
  51. Keep cached results fresh
  52. User Karma
  53. User Karma
      karma|<user> = count votes|<user>|<submit>
      (other operators: sum, max, min)
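
For `count` (and likewise `sum`), each vote added or removed changes the aggregate by a fixed amount, so the cached value can be adjusted in place instead of recomputed. A minimal sketch with hypothetical names; `max` and `min` are harder, since removing the current extreme value requires rescanning the remaining votes.

```python
from collections import defaultdict


class KarmaCounter:
    """Incrementally maintain karma|<user> = count votes|<user>|<submit>.

    Each put/delete of a vote key adjusts the count by +1/-1 instead of rescanning
    all of the user's votes.
    """

    def __init__(self):
        self.votes = set()
        self.karma = defaultdict(int)

    def put_vote(self, user, submit):
        key = f"votes|{user}|{submit}"
        if key not in self.votes:          # only a new key changes the count
            self.votes.add(key)
            self.karma[user] += 1

    def remove_vote(self, user, submit):
        key = f"votes|{user}|{submit}"
        if key in self.votes:
            self.votes.discard(key)
            self.karma[user] -= 1

    def get_karma(self, user):
        return self.karma[user]
```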
  54. Automatically Update
      [Diagram: new vote → Votes → Cache Join → Karma]
  55. Pequod supports incremental operations to keep results so fresh (and so clean clean).
  56. Interleave different types of data
  57.

  58. Many Requests
      [Diagram: App makes separate requests for Articles, Comments, and Votes to build the Entire Cached Page]
  59. Inline Cache Joins
      page|<article_id>|votes = count votes|<article_id>
      page|<article_id>|article = copy articles|<article_id>
      page|<article_id>|comments|<id> = copy comments|<article_id>|<id>
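
Because every inline join writes under the same `page|<article_id>|` prefix, the article, its comments, and its vote count end up adjacent in key order, and one range scan returns the whole page. A sketch assuming a dict-backed store with the three joins hard-coded; `refresh_page` and the sample data are hypothetical.

```python
import bisect

store = {}


def put(k, v):
    store[k] = v


def scan(k0, k1):
    """(key, value) pairs with k0 <= key < k1, in key order."""
    keys = sorted(store)
    return [(k, store[k]) for k in keys[bisect.bisect_left(keys, k0):bisect.bisect_left(keys, k1)]]


def refresh_page(article_id, articles, comments, votes):
    """Materialize the three inline joins for one article (hard-coded, not parsed).

    page|<article_id>|votes          = count votes|<article_id>
    page|<article_id>|article        = copy articles|<article_id>
    page|<article_id>|comments|<id>  = copy comments|<article_id>|<id>
    """
    put(f"page|{article_id}|votes", len(votes.get(article_id, ())))
    put(f"page|{article_id}|article", articles[article_id])
    for cid, text in comments.get(article_id, {}).items():
        put(f"page|{article_id}|comments|{cid}", text)
```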
  60. Interleaved Data
      [Diagram: Articles, Comments, and Votes each feed a cache join into the Entire Cached Page, read with one scan!]
  61. Pequod makes it easy to interleave different types of data.
  62. Implementation
      • C++ single-threaded, event-driven server
      • Range store: trie of red-black trees
      [Diagram: posts, subscriptions]
  63. Optimization: Sink Hints
      [Diagram: a hint into timelines; 40%]
  64. Optimization: Value Sharing
      t|argv0|215|basho, t|ladygaga|215|basho, and t|neha|215|basho each hold a ptr to the value of p|basho|215 ("At RICON East WOOO!")
      17%
  65. Evaluation
      • QPS compared to other systems
      • Twitter caching strategies
      • Pequod compared to client-managed caching
      • Benefit of optimizations
  66. Setup
      • 12-core machine (two 3.47 GHz Intel Xeon X5690 chips, with 2 hyperthreads per core)
      • Linux 3.2.0, 96 GB memory
      • Redis 2.4.14, PostgreSQL 9.1.8
  67. Workloads
      • Twitter
        – microbenchmark: 2K users, each timeline request returns ~20 new tweets; 1M posts, 1M reads
        – ~real: 1.8M users, 72M relationships (see paper)
      • News Site
        – 100K articles, 50K users, 1M comments, 2M votes
        – 4M requests, 1% comment rate, varying vote rates
  68. QPS Comparison
      [Chart: queries per second (thousands, 0-120) for PostgreSQL, Redis, and Pequod on the News Site and Twitter workloads; annotation: "CPU Utilization at 19X"]
  69. Twitter Hybrid Push/Pull
      [Chart: total runtime (seconds, 0-30) vs. percentage of active users (1 to 100%) for Pull, Push, and Hybrid]
  70. Client Managed vs. Pequod
      [Chart: total runtime (seconds, 0-70) for Client-managed and Pequod, broken down into Other, Post, and Timeline; annotation: "RPC Overheads"]
  71. Client Managed vs. Pequod
      [Chart: total runtime (seconds, 0-70) for Client-managed and Pequod, broken down into Other, Post, and Timeline; annotation: "Inserting New Posts"]
  72. Benefit of Sink Hints and Value Sharing
      [Chart: total runtime (seconds, 0-45) for Unoptimized Pequod and Pequod, broken down into Other, Post, and Timeline]
  73. Related Work
      • Invalidating the cache
        – DUP, TxCache, Scaling Memcached at Facebook
      • Materialized views
        – Maintenance, Dynamic Materialized Views from Microsoft
      • Push/pull
        – Twitter timeline REST service
  74. Open Questions
      • Eviction
      • Cross-server cache joins
      • Create efficient cache joins automagically
      • Support more computation
      • Non-ordered store
      • Full-fledged datastore
  75. Pequod Cache Joins
      Features of a database, performance of a cache
      @neha, narula@mit.edu