Scaling Memcache at Facebook

id:y_uuki Paper Reading Group #8

Yuuki Tsubouchi (yuuk1)

July 09, 2014

Transcript

  1. Scaling Memcache at Facebook
    Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li,
    Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung,
    Venkateshwaran Venkataramani
    id:y_uuki
    Paper Reading Group
    NSDI 13
    In Proceedings of the 10th USENIX conference on Networked Systems Design
    and Implementation

  2. Infrastructure Requirements
    • Near real-time communication
    • Aggregate content from multiple data sources
    • Read and update popular shared content
    • Scale to billions of requests per second

  3. Design Requirements
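    • Support a very heavy read load
    • Over 1 billion reads per second
    • Insulate backend services from the heavy read load
    • Geographic distribution
    • Continual creation of new products
    • Flexible across a variety of use cases
    • Support rapid development of new features
    • Separation of the persistence layer and the caching layer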

  4. memcached
    • The basic building block of Facebook's distributed key-value store
    • Trillions of items
    • Billions of requests / second
    • A network-accessible in-memory hash table
    • Evicts items from memory with LRU

  5. Roadmap
    1. Single front-end cluster
    - Read-heavy workload
    - Wide fanout
    - Handling request failures
    2. Multiple front-end clusters
    - Data replication
    - Data consistency
    3. Multiple regions
    - Data consistency
    Final form

  6. memcache scale steps
    No memcached servers
    A few memcached servers
    Single cluster
    Multiple clusters
    Geographically distributed clusters

  7. Query Cache (1)
    1. Send a get request to memcache
    2. On a cache miss, query the DB
    3. Cache the query result in memcache
    • This is the usual flow

  8. Query Cache (2)
    • When the DB is updated, the cached entry must be invalidated
    • Delete the cached entry rather than updating it
    • Deletes are idempotent
    • “demand-filled look-aside cache”
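
The read and write paths on the two Query Cache slides can be sketched as follows. This is a minimal illustration, not Facebook's client: the class name `LookAsideCache` and the use of plain dicts as stand-ins for memcache and the database are assumptions made for the example.

```python
class LookAsideCache:
    """Demand-filled look-aside cache in front of a slower store (sketch)."""

    def __init__(self, db):
        self.db = db        # stand-in for the database (a plain dict here)
        self.cache = {}     # stand-in for memcache
        self.db_reads = 0   # counts how often the read path fell through to the DB

    def read(self, key):
        # 1. Try the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, query the database.
        self.db_reads += 1
        value = self.db[key]
        # 3. Fill the cache on demand.
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Update the database, then delete (not update) the cached entry.
        # Deleting is idempotent, so repeating it is harmless.
        self.db[key] = value
        self.cache.pop(key, None)
```

After a `write`, the next `read` of the same key misses and refills the cache from the database.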

  9. The problem of look-aside cache
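    • Inconsistency arises when multiple web servers set the same key
    • They try to set different values concurrently
    • memcache protocol extension: “leases”
    • A token (lease) is issued on a cache miss
    • The client that missed performs the set
    • A delete invalidates the token
    • If the token fails verification, the set is rejected
    Stale sets
    ref: https://www.usenix.org/sites/default/files/conference/protected-files/nishtala_nsdi13_slides.pdf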
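
The lease mechanism for stale sets can be sketched as below. The `LeaseCache` class, its method names, and the integer tokens are hypothetical; the real extension lives in the memcached server and its wire protocol, which the slides do not show. As a simplification, each miss here issues a fresh token that replaces the previous one.

```python
import itertools


class LeaseCache:
    """Toy cache whose set() only succeeds with a still-valid lease token."""

    def __init__(self):
        self.data = {}
        self.leases = {}                  # key -> outstanding token
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return (value, None) on a hit, or (None, token) on a miss."""
        if key in self.data:
            return self.data[key], None
        token = next(self._tokens)
        self.leases[key] = token
        return None, token

    def set(self, key, value, token):
        # Reject the set if the lease was invalidated or superseded.
        if self.leases.get(key) != token:
            return False
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        # Invalidation drops the value and any outstanding lease.
        self.data.pop(key, None)
        self.leases.pop(key, None)
```

A client that missed before a delete holds a dead token, so its late set of an old value is refused.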

  10. The problem of look-aside cache
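    • When a particular key is invalidated, every client falls back to the DB at once
    • A small modification to “leases”
    • On a cache miss, either wait for the set or read the stale cached value
    • Stale cached values are kept in a dedicated data structure
    Thundering Herds
    ref: https://www.usenix.org/sites/default/files/conference/protected-files/nishtala_nsdi13_slides.pdf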
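
One way to picture the modified lease is a cache that hands out at most one token per key per interval and tells every other client to wait or use a stale value. Everything in this sketch is assumed for illustration: the `HerdAwareCache` name, the three result kinds, the injectable `clock`, and the 10-second default interval.

```python
import itertools
import time


class HerdAwareCache:
    """Toy cache that rate-limits leases and keeps deleted values as stale."""

    def __init__(self, lease_interval=10.0, clock=time.monotonic):
        self.data = {}     # live values
        self.stale = {}    # recently deleted values, held separately
        self.leases = {}   # key -> (token, time issued)
        self.lease_interval = lease_interval
        self.clock = clock
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return ("hit", value), ("lease", token) or ("wait", stale_or_None)."""
        if key in self.data:
            return "hit", self.data[key]
        now = self.clock()
        lease = self.leases.get(key)
        if lease is None or now - lease[1] >= self.lease_interval:
            token = next(self._tokens)
            self.leases[key] = (token, now)
            return "lease", token          # this client refills from the DB
        return "wait", self.stale.get(key)  # others retry or accept stale data

    def set(self, key, value, token):
        lease = self.leases.get(key)
        if lease is None or lease[0] != token:
            return False
        del self.leases[key]
        self.stale.pop(key, None)
        self.data[key] = value
        return True

    def delete(self, key):
        if key in self.data:
            self.stale[key] = self.data.pop(key)
        self.leases.pop(key, None)
```

Only the lease holder goes to the database; the rest of the herd gets `"wait"` and can retry shortly or serve the stale value if the application tolerates it.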

  11. memcache scaling steps
    No memcached servers
    A few memcached servers
    Single cluster
    Multiple clusters
    Geographically distributed clusters

  12. Many memcached servers
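    • Keys are distributed across servers with consistent hashing
    • Only a subset of keys is accessed very frequently
    • Replication would be memory-inefficient
    • All web servers talk to all memcached servers (many-to-many)
    • Hundreds of memcache gets per user request (e.g. avg 521 fetches)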
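
A minimal consistent-hashing ring, to show how a key is mapped to a server. The `HashRing` class, the use of MD5, and the 100 virtual nodes per server are assumptions for the sketch; the slides do not say which hash function or ring layout Facebook uses.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def server_for(self, key):
        # Walk clockwise to the first point after the key, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a server only moves the keys that now fall on the new server's points; all other keys keep their previous owner.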

  13. The problem of Consistent-Hashing
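    Packet Drop
    • Bursty access from all web servers to all memcached servers
    • Network switch buffers overflow, causing packet loss
    • Congestion control then lowers TCP connection throughput
    • A sliding window controls the number of concurrent requests
    • The next request is sent once a response comes back
    • The window grows slowly on successful requests and shrinks when a request goes unanswered
    TCP Incast Congestion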
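
A client-side request window of this kind can be sketched as below. The `RequestWindow` class and its constants are hypothetical, and the policy is an assumption chosen for the example: the window grows by `1/size` per successful response and is halved when a request goes unanswered, similar in spirit to TCP congestion control.

```python
class RequestWindow:
    """Caps outstanding requests; grows on success, shrinks on timeout."""

    def __init__(self, initial=2, minimum=1, maximum=64):
        self.size = float(initial)
        self.minimum = minimum
        self.maximum = maximum
        self.in_flight = 0

    def try_send(self):
        # Only send while there is room in the window.
        if self.in_flight >= int(self.size):
            return False
        self.in_flight += 1
        return True

    def on_success(self):
        # A response arrived: free a slot and grow the window slowly.
        self.in_flight -= 1
        self.size = min(self.maximum, self.size + 1.0 / self.size)

    def on_timeout(self):
        # A request went unanswered: free the slot and back off sharply.
        self.in_flight -= 1
        self.size = max(self.minimum, self.size / 2)
```

A small window serialises requests and adds latency, while a large one brings back the incast bursts, so the limits would need tuning against real traffic.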

  14. memcache scaling steps
    No memcached servers
    A few memcached servers
    Single cluster
    Multiple clusters
    Geographically distributed clusters

  15. Multiple Clusters
    • Scaling only by adding more servers becomes difficult
    • Multiple front-end clusters + a storage cluster
    • What is needed:
    • Maintaining consistency across clusters
    • Data replication

  16. DB invalidate caches
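    • After a DB update, the caches in every front-end cluster must be invalidated
    • A daemon (McSqueal) tails the MySQL commit log and invalidates the affected entries on all memcached servers
    • The keys to invalidate are embedded in the SQL statements in advance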
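
The slides do not show how keys are embedded in the SQL, so the following sketch invents a format purely for illustration: a trailing comment of the form `/* invalidate: key1, key2 */`. The function name `keys_to_invalidate` and the comment syntax are both hypothetical.

```python
import re

# Hypothetical convention: keys listed in a SQL comment, comma-separated.
INVALIDATE_COMMENT = re.compile(r"/\*\s*invalidate:\s*([^*]*?)\s*\*/")


def keys_to_invalidate(statement):
    """Extract cache keys from one committed SQL statement (sketch)."""
    match = INVALIDATE_COMMENT.search(statement)
    if match is None:
        return []
    return [key.strip() for key in match.group(1).split(",") if key.strip()]
```

A log-tailing daemon could call this for each committed statement and issue a delete for every returned key.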

  17. invalidate pipeline
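    • Many-to-many communication between memcached and McSqueal makes the packet rate a problem
    • A dedicated router (mcrouter) is placed in between
    • Delete requests are also batched
    • 18x more deletes per packet
    Too many packets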
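
Batching deletes per destination can be sketched like this. The function `batch_deletes`, the `route` callback, and the `max_batch` limit are assumptions for the example and do not describe mcrouter's actual interface.

```python
from collections import defaultdict


def batch_deletes(keys, route, max_batch=64):
    """Group delete keys by destination server, then cut into batches.

    `route` is a caller-supplied function mapping a key to a server name.
    Returns a list of (server, [keys]) tuples, one per packet to send.
    """
    per_server = defaultdict(list)
    for key in keys:
        per_server[route(key)].append(key)

    batches = []
    for server, server_keys in per_server.items():
        for start in range(0, len(server_keys), max_batch):
            batches.append((server, server_keys[start:start + max_batch]))
    return batches
```

Sending one packet per batch instead of one per key is what reduces the packet rate between the invalidation daemons and the memcached servers.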

  18. memcache scaling steps
    No memcached servers
    A few memcached servers
    Single cluster
    Multiple clusters
    Geographically distributed clusters

  19. Geographically distributed clusters
    • Replica = multiple front-end clusters + a storage cluster
    • A replica is placed in each region
    • Master replica
    • The replica that holds the master storage cluster
    • Writes go to the master replica

  20. Writes in non-master replica
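    • A slave replica writes to the master DB
    • While replication lags, another context hits a cache miss and sets a cached value
    • A value from the slave DB that has not yet caught up with the master DB ends up cached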

  21. Remote Markers
    • When a slave replica writes to the master, it sets a marker
    • If the marker is set, read from the master DB
    • If the marker is not set, read from the slave DB
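
The marker logic can be sketched as follows. The `Region` class, the dict stand-ins for the master and slave databases, and the `on_replicated` hook are hypothetical. Not caching values read from the master while the marker is set is a choice made for this sketch, not something the slides state.

```python
class Region:
    """Non-master region with a local cache and remote markers (sketch)."""

    def __init__(self, master_db, replica_db):
        self.master_db = master_db    # remote master database (dict stand-in)
        self.replica_db = replica_db  # local slave database, may lag behind
        self.cache = {}
        self.markers = set()          # keys whose local replica may be stale

    def write(self, key, value):
        self.markers.add(key)         # 1. set the marker
        self.master_db[key] = value   # 2. write to the master
        self.cache.pop(key, None)     # 3. delete the local cached entry

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        if key in self.markers:
            # Replica may be behind: pay the cross-region read instead.
            return self.master_db[key]
        value = self.replica_db[key]
        self.cache[key] = value
        return value

    def on_replicated(self, key):
        # Called when replication has delivered the write locally.
        self.replica_db[key] = self.master_db[key]
        self.markers.discard(key)
```

The cost is extra latency on a miss while the marker exists; in exchange, readers in the writing region do not see the lagging slave value.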

  22. Conclusion
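    • How to scale memcache to keep up with Facebook's growth
    • A pragmatic approach that balances engineering resources
    1. Separate the cache from persistent storage so that each can scale independently
    2. Features that improve monitoring, debugging, and operational efficiency are as important as performance
    3. Keeping logic in stateless clients causes less disruption
    4. The system must support gradual rollout and rollback of new features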
    5. Simplicity is vital.

  23. Questions
    • How is replication implemented?
    • memcached itself has no replication feature
    • How is the client library implemented?
    • How far is it decoupled from application logic?
