
Smarter Caching

Neha
May 13, 2013

Application-level caches are incredibly useful, but they are difficult to use well on structured, changing data. In this talk I discuss why you might want to move application computation into the cache in the form of cache joins, and I describe our system, Pequod, an in-memory key/value range store that implements these ideas.

Presented at RICON East, New York, NY.

Transcript

  1. Smarter Caching With Pequod
    Yandong Mao
    Neha Narula
    Robert Morris
    Bryan Kate
    Michael Kester
    Eddie Kohler
    Neha Narula
    May 13, 2013
    @neha

  2. academia

  3. Application-level Caching Is Useful

  4. [Slide contains no text]

  5. Cache
    App
    DB
    Cache reads
    Reads
    Writes

  6. Application Computation
    show_timeline(user):
      get user-timeline
      if present
        return timeline
      else
        get user-following list
        if not present
          query db user-following list
          put user-following list in cache
        for each f in user-following list
          get f-posts
          if not present
            query db f-posts
            put f-posts in cache
          posts = posts + f-posts
        user-timeline = sort posts
        put user-timeline into cache
        return user-timeline
    post(poster, tweet):
      insert tweet into database
      append tweet to poster-posts
      get poster-followers
      if not present
        query db
        put followers in cache
      for each f in followers:
        get f-timeline
        if present
          update timeline
          lock
          put new timeline into cache
          unlock

  7. cache joins

  8. Cache
    App
    DB
    Cache reads
    Reads
    Writes

  9. Pequod
    App
    DB
    Cache reads
    Writes
    Cache Joins

  10. Pequod
    • In-memory key/value range store
      scan(k0, k1)
      get(k)
      put(k, v)
      install_join(cache_join)
    • App developer specifies cache joins
    • Computation on demand (or not)
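
The get/put/scan interface listed on this slide can be illustrated with a toy ordered store. This is a minimal sketch, not Pequod's implementation: the `RangeStore` class is hypothetical, it keeps keys in a sorted Python list, and it uses `}` (the ASCII character right after `|`) as an assumed end-of-prefix bound where the slides write `+`.

```python
import bisect


class RangeStore:
    """Toy in-memory key/value store with ordered range scans (hypothetical, not Pequod)."""

    def __init__(self):
        self._keys = []    # all keys, kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key into the sorted list only the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, first, last):
        """Return (key, value) pairs with first <= key < last, in key order."""
        lo = bisect.bisect_left(self._keys, first)
        hi = bisect.bisect_left(self._keys, last)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = RangeStore()
store.put("p|basho|102", "tweet")
store.put("p|basho|215", "At RICON East WOOO!")
store.put("p|argv0|101", "tweet")

# All of basho's posts: every key that starts with "p|basho|".
basho_posts = store.scan("p|basho|", "p|basho}")
```

Because table names and column values are embedded in the key, a single prefix scan retrieves one user's rows without touching anyone else's.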

  11. What Pequod Can Do With Cache Joins
    compute results automatically
    subscribe to updates in a range of data
    keep cached results fresh
    easily use different caching strategies
    interleave different types of data

  12. Compute Twitter Timeline
    neha
    argv0
    neha's Timeline
    Time
    basho
    justinbieber
    ladygaga
    Posts
    Subscriptions
    SELECT * FROM posts, subscriptions
      WHERE posts.poster = subscriptions.poster
      AND subscriptions.user = 'neha'
      AND ts < posts.timestamp
      ORDER BY timestamp DESC

  13. Tell Pequod
    timelines = posts X subscriptions

  14. All Timelines in Pequod
    olive's timeline
    peter's timeline
    max's timeline
    neha's timeline

  15. [ table | column | column … ]
    posts | <poster> | <time>
    subscriptions | <user> | <poster>
    Pequod keys embed table names and column values

  16. Post Keys
    posts | <poster> | <time>
    p | justinbieber | 207   Do you belieb?

  17. Subscription Key Range
    subscriptions | <user> | <poster>
    s | neha | basho
    s | neha | justinbieber
    s | neha | ladygaga

  18. Timeline Keys
    timeline | <user> | <time> | <poster>
    t | neha | 207 | justinbieber   Do you belieb?

  19. Twitter Timeline Cache Join
    timelines = posts X subscriptions
    Pequod.install_join(
      "t|<user>|<time>|<poster> =
         copy p|<poster>|<time>
         using s|<user>|<poster>")

  20. Twitter Timeline Cache Join
    t|<user>|<time>|<poster> =
      copy p|<poster>|<time>
      using s|<user>|<poster>
    Sink:
    t|neha
    Primary Source:
    p|argv0|101
    p|basho|104
    p|justinbieber|123
    p|ladygaga|198
    Secondary Source:
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga

  21. Compute results automatically

  22. Ask For A Range, Any Range
    scan("t|neha|100", "t|neha|+")
    t|max|201
    [Where neha's timeline should be]
    t|olive|99

  23. Use Lookup Source
    using s|neha|
    neha's subscriptions
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga

  24. Find Primary Source
    using s|neha|argv0
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga
    argv0's posts
    p|argv0|101   tweet
    p|argv0|104   tweet
    p|argv0|115   tweet

  25. Copy Primary Source
    copy p|argv0|101
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet

  26. Use Lookup Source
    using s|neha|basho
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga
    basho's posts
    p|basho|102   tweet

  27. Copy Primary Source
    copy p|basho|102
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet

  28. Use Lookup Source
    using s|neha|justinbieber
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga
    justinbieber's posts
    p|justinbieber|207   tweet

  29. Copy Primary Source
    copy p|justinbieber|207
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet

  30. Use Lookup Source
    using s|neha|ladygaga
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga
    ladygaga's posts
    p|ladygaga|209   tweet

  31. Copy Primary Source
    copy p|ladygaga|209
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet

  32. Pequod uses cache joins to automatically create cached objects.
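
The walkthrough on the preceding slides (look up each subscription, find that poster's posts, copy them into the timeline range) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the store is a plain dict, `compute_timeline` is a hypothetical helper, and the real system does this inside the cache with ordered range scans rather than full iteration.

```python
def compute_timeline(store, user):
    """Build t|user|time|poster entries from s| and p| keys (toy version)."""
    timeline = {}
    sub_prefix = f"s|{user}|"
    # Secondary (lookup) source: every subscription key for this user.
    for sub_key in sorted(k for k in store if k.startswith(sub_prefix)):
        poster = sub_key.split("|")[2]
        post_prefix = f"p|{poster}|"
        # Primary source: copy each of that poster's posts into the sink range.
        for post_key, tweet in store.items():
            if post_key.startswith(post_prefix):
                time = post_key.split("|")[2]
                timeline[f"t|{user}|{time}|{poster}"] = tweet
    return dict(sorted(timeline.items()))


store = {
    "s|neha|argv0": None,
    "s|neha|basho": None,
    "p|argv0|101": "tweet a",
    "p|argv0|104": "tweet b",
    "p|basho|102": "tweet c",
    "p|ladygaga|209": "tweet d",  # not copied: neha does not follow ladygaga here
}
timeline = compute_timeline(store, "neha")
```

The sink keys come out ordered by time because the timestamp sits directly after the user in the key.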

  33. New Tweets

  34. Subscribe to range updates

  35. Cached Timeline
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet

  36. Update, New Post
    put("p|basho|215", "At RICON East WOOO!")
    basho's posts
    p|basho|102   tweet
    p|basho|215   At RICON East WOOO!
    Update t|neha

  37. Copy New Post
    t|<user>|<time>|<poster> =
      copy p|<poster>|<time>
      using s|<user>|<poster>
    Update t|neha
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet
    t|neha|215|basho   At RICON East WOOO!

  38. Update, New Subscription
    put("s|neha|xexd", None)
    neha's subscriptions
    s|neha|argv0
    s|neha|basho
    s|neha|justinbieber
    s|neha|ladygaga
    s|neha|xexd
    Invalidate t|neha

  39. Invalidate Sink
    Invalidate t|neha
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet
    t|neha|215|basho   At RICON East WOOO!

  40. Invalidate Sink
    s|neha|xexd
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet
    t|neha|215|basho   At RICON East WOOO!

  41. scan("t|neha|209", "t|neha|+")
    s|neha|xexd
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet
    t|neha|209|ladygaga   tweet
    t|neha|213|xexd   tweet
    t|neha|215|basho   At RICON East WOOO!

  42. scan("t|neha|209", "t|neha|+")
    s|neha|xexd
    Scanned range:
    t|neha|209|ladygaga   tweet
    t|neha|213|xexd   tweet
    t|neha|215|basho   At RICON East WOOO!
    Outside the scanned range:
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|neha|207|justinbieber   tweet

  43. Updaters
    • On put(), the updater gets the source key, new value, and old value
    • Updaters on primary sources immediately update the relevant sink keys
    • Updaters on secondary sources log the update on the sink and invalidate sink keys, to be fixed up on the next scan
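
The two updater behaviors can be sketched as follows. This is an illustrative toy, not Pequod's code: the `TimelineCache` class and its method names are hypothetical, and where the slide describes logging the update and fixing up the sink on the next scan, this sketch simply marks the timeline stale and recomputes it in full.

```python
class TimelineCache:
    """Toy model of eager primary-source updates and lazy secondary-source invalidation."""

    def __init__(self):
        self.subs = {}       # user -> set of posters (secondary source)
        self.posts = {}      # (poster, time) -> tweet (primary source)
        self.timelines = {}  # user -> {(time, poster): tweet} (sink)
        self.stale = set()   # users whose sink range has been invalidated

    def put_post(self, poster, time, tweet):
        self.posts[(poster, time)] = tweet
        # Primary-source updater: push the new post into every valid cached timeline.
        for user, timeline in self.timelines.items():
            if user not in self.stale and poster in self.subs.get(user, ()):
                timeline[(time, poster)] = tweet

    def put_subscription(self, user, poster):
        self.subs.setdefault(user, set()).add(poster)
        # Secondary-source updater: do no copying now, just invalidate the sink.
        if user in self.timelines:
            self.stale.add(user)

    def scan_timeline(self, user, since):
        # Materialize on first use, or repair after an invalidation.
        if user not in self.timelines or user in self.stale:
            followed = self.subs.get(user, ())
            self.timelines[user] = {
                (time, poster): tweet
                for (poster, time), tweet in self.posts.items()
                if poster in followed
            }
            self.stale.discard(user)
        return sorted((k, v) for k, v in self.timelines[user].items() if k[0] >= since)


cache = TimelineCache()
cache.put_subscription("neha", "basho")
cache.put_post("basho", 102, "tweet")
cache.scan_timeline("neha", 0)                       # first scan materializes the sink
cache.put_post("basho", 215, "At RICON East WOOO!")  # pushed eagerly
pushed = (215, "basho") in cache.timelines["neha"]
cache.put_post("xexd", 213, "tweet")
cache.put_subscription("neha", "xexd")               # invalidates neha's timeline
was_stale = "neha" in cache.stale
recent = cache.scan_timeline("neha", 209)            # repaired on this scan
```

Deferring the subscription change keeps writes cheap: the cost of pulling in the new poster's history is paid only if the timeline is read again.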

  44. Clients can use Pequod to subscribe to updates in a range.

  45. 38 MILLION WRITES!

  46. Use different caching strategies

  47. Twitter Timeline Cache Join
    Pequod.install_join(
      "t|<user>|<time>|<poster> =
         copy p|<poster>|<time>
         using s|<user>|<poster>")
    pull
    For celebrity posts and timelines of users who aren't logged in

  48. Pull Celebrities Each Time
    t|max|201
    neha's timeline
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    t|olive|99

  49. Pull Celebrities Each Time
    scan("t|neha|100", "t|neha|+")
    t|neha|101|argv0   tweet
    t|neha|102|basho   tweet
    t|neha|104|argv0   tweet
    t|neha|115|argv0   tweet
    p|justinbieber|207   tweet
    p|ladygaga|209   tweet

  50. Pequod makes it easy to switch between caching strategies.
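
The pull strategy shown on the two preceding slides amounts to merging a stored timeline with celebrity posts fetched at read time. The sketch below is one way to picture that merge; `hybrid_timeline` and its argument names are hypothetical, and the data mirrors the keys shown on the slides.

```python
def hybrid_timeline(materialized, posts_by_poster, pulled_posters, since):
    """Merge pushed (stored) timeline entries with posts pulled at read time."""
    entries = [(t, poster, tweet) for (t, poster, tweet) in materialized if t >= since]
    # Pulled posters are never copied into the timeline; read their posts on demand.
    for poster in pulled_posters:
        for t, tweet in posts_by_poster.get(poster, []):
            if t >= since:
                entries.append((t, poster, tweet))
    return sorted(entries)


# Entries already copied into neha's timeline (pushed on write).
materialized = [
    (101, "argv0", "tweet"),
    (102, "basho", "tweet"),
    (104, "argv0", "tweet"),
    (115, "argv0", "tweet"),
]
# Celebrity posts stay only in the posts table.
posts_by_poster = {
    "justinbieber": [(207, "tweet")],
    "ladygaga": [(209, "tweet")],
}
result = hybrid_timeline(materialized, posts_by_poster, ["justinbieber", "ladygaga"], since=100)
```

A post by an account with millions of followers then costs one write instead of millions, at the price of extra work on each read.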

  51. Keep cached results fresh

  52. User Karma

  53. User Karma
    karma| =
      count votes||
    sum
    max
    min

  54. Automatically Update
    new vote
    Votes
    Cache Join
    Karma

  55. Pequod supports incremental operations to keep results so fresh (and so clean clean).
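
An incremental count can be maintained from the old and new values that an updater sees on each put, without rescanning the votes. The following is a rough sketch only: the `KarmaCounter` class is hypothetical, and the `votes|<author>|<article>|<voter>` key layout is an assumption, since the slide's column names did not survive extraction.

```python
class KarmaCounter:
    """Toy incremental count: karma per author, updated on each vote put."""

    def __init__(self):
        self.votes = {}  # vote key -> value
        self.karma = {}  # author -> number of distinct votes

    def put_vote(self, author, article_id, voter, value=1):
        # Assumed key layout; the slide's actual columns are not recoverable.
        key = f"votes|{author}|{article_id}|{voter}"
        old = self.votes.get(key)
        self.votes[key] = value
        # Only a brand-new key changes the count; an overwrite leaves it alone.
        if old is None:
            self.karma[author] = self.karma.get(author, 0) + 1


counter = KarmaCounter()
counter.put_vote("neha", "a1", "max")
counter.put_vote("neha", "a1", "olive")
counter.put_vote("neha", "a1", "max")  # repeat vote overwrites, count unchanged
```

Each put does constant work regardless of how many votes already exist, which is what keeps the cached aggregate fresh under a heavy write load.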

  56. Interleave different types of data

  57. [Slide contains no text]

  58. Many Requests
    Articles
    Comments
    Votes
    App
    Entire Cached Page

  59. Inline Cache Joins
    page||votes =
      count votes|
    page||article =
      copy articles|
    page||comments| =
      copy comments||

  60. Interleaved Data
    Articles
    Comments
    Votes
    Cache Join (one each for articles, comments, and votes)
    Entire Cached Page
    One scan!

  61. Pequod makes it easy to interleave different types of data.
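
Interleaving works because all the pieces of a page share a key prefix, so they sit next to each other in sorted order. Here is a small sketch of that idea; the key layout `page|<article id>|...` and the `scan` helper are assumptions made for illustration, with `}` again standing in as an end-of-prefix bound.

```python
import bisect

# Output keys of three hypothetical cache joins, stored in one sorted key space.
page_keys = sorted([
    "page|a1|article",
    "page|a1|comments|c1",
    "page|a1|comments|c2",
    "page|a1|votes",
    "page|a2|article",
    "page|a2|votes",
])


def scan(sorted_keys, first, last):
    """Return keys with first <= key < last from an already sorted list."""
    lo = bisect.bisect_left(sorted_keys, first)
    hi = bisect.bisect_left(sorted_keys, last)
    return sorted_keys[lo:hi]


# One scan returns the article, its comments, and its vote count together.
page_a1 = scan(page_keys, "page|a1|", "page|a1}")
```

The application issues a single range request per page instead of separate requests for articles, comments, and votes.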

  62. Implementation
    • C++ single-threaded, event-driven server
    • Range store: trie of red-black trees
    posts
    subscriptions

  63. Optimization: Sink Hints
    Hint
    40%
    timelines

  64. Optimization: Value Sharing
    p|basho|215   At RICON East WOOO!
    t|argv0|215|basho   ptr
    t|ladygaga|215|basho   ptr
    t|neha|215|basho   ptr
    17%
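
Value sharing means the copied timeline entries point at the original post's value instead of duplicating it. Python object references make a convenient stand-in for the `ptr` entries on the slide; the `SharedValue` wrapper below is hypothetical and only illustrates the idea.

```python
class SharedValue:
    """Wrapper so that several keys can reference one stored value object."""

    def __init__(self, data):
        self.data = data


store = {}
store["p|basho|215"] = SharedValue("At RICON East WOOO!")

# Each follower's timeline entry references the post's value rather than copying it.
for follower in ("argv0", "ladygaga", "neha"):
    store[f"t|{follower}|215|basho"] = store["p|basho|215"]

# Four keys, but only one underlying value object in memory.
distinct_values = {id(v) for v in store.values()}
```

With many followers per poster, storing one value and many pointers saves the memory that full copies of each tweet would otherwise consume.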

  65. Evaluation
    • QPS compared to other systems
    • Twitter caching strategies
    • Pequod compared to client-managed caching
    • Benefit of optimizations

  66. Setup
    • 12-core machine (two 3.47 GHz Intel Xeon X5690 chips with 2 hyperthreads per core)
    • Linux 3.2.0, 96 GB memory
    • Redis 2.4.14, PostgreSQL 9.1.8

  67. Workloads
    • Twitter
      – microbenchmark: 2K users, each timeline request returns ~20 new tweets. 1M posts, 1M reads
      – ~real: 1.8M users, 72M relationships. (see paper)
    • News Site
      – 100K articles, 50K users, 1M comments, 2M votes
      – 4M requests, 1% comment rate, varying vote rates

  68. QPS Comparison
    [Bar chart: queries per second (thousands, axis 0 to 120) for PostgreSQL, Redis, and Pequod on the News Site and Twitter workloads. Annotation: CPU Utilization at 19X]

  69. Twitter Hybrid Push/Pull
    [Line chart: total runtime (seconds, axis 0 to 30) versus percentage of active users (1 to 100) for Pull, Push, and Hybrid]

  70. Client Managed vs. Pequod
    [Stacked bar chart: total runtime (seconds, axis 0 to 70) for Client-managed and Pequod, broken down into Timeline, Post, and Other. Annotation: RPC Overheads]

  71. Client Managed vs. Pequod
    [Stacked bar chart: total runtime (seconds, axis 0 to 70) for Client-managed and Pequod, broken down into Timeline, Post, and Other. Annotation: Inserting New Posts]

  72. Benefit of Sink Hints and Value Sharing
    [Stacked bar chart: total runtime (seconds, axis 0 to 45) for Unoptimized Pequod and Pequod, broken down into Timeline, Post, and Other]

  73. Related Work
    • Invalidating the cache
      – DUP, TxCache, Scaling Memcached at Facebook
    • Materialized Views
      – Maintenance, Dynamic Materialized Views from Microsoft
    • Push/pull
      – Twitter timeline REST service

  74. Open Questions
    • Eviction
    • Cross-server cache joins
    • Create efficient cache joins automagically
    • Support more computation
    • Non-ordered store
    • Full-fledged datastore

  75. Pequod Cache Joins
    Features of a database,
    Performance of a cache
    @neha
    [email protected]