NSDI'16: Distributed Systems

Slides used to present the papers of the Distributed Systems session at the NSDI 2016 paper reading group (http://system-reading.connpass.com/event/31207/).

disktnk

May 29, 2016

Transcript

  1. CONTENTS: papers presented (https://www.usenix.org/conference/nsdi16/technical-sessions)
     • Consensus in a Box: Inexpensive Coordination in Hardware (ETH Zürich)
     • StreamScope: Continuous Reliable Distributed Processing of Big Data Streams (Microsoft, Microsoft Research)
     • Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks (Facebook)
     • The Design and Implementation of the Warp Transactional Filesystem (Cornell University)
     • BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores (University of California, Berkeley)
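  3. Consensus in a Box: Inexpensive Coordination in Hardware
     What problem is it trying to solve?
     • Consensus is expensive, both in implementation complexity and in processing time
     • A consensus protocol basically has to satisfy four properties
       • Termination (Liveness) / Validity / Integrity / Agreement
       • Examples: Paxos, Raft, etc.
     • Systems with strict workloads cannot do without consensus, but it often becomes the constraint on performance and scalability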
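  5. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     What problem is it trying to solve?
     • Stream processing is wanted on big data
     • This has to cope with complexity, scalability, and fault tolerance
     • Recovering the dataflow of a stream computation
       • Resending input events and restoring state
       → The goal is to prevent both lost events and duplicated events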
  6. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     How was it solved?
     • The dataflow is represented as a DAG, and the dependencies between nodes and edges are abstracted as rStream / rVertex
       • This is what makes recovery possible
     • It extends SCOPE, a parallel MapReduce implementation (http://www.vldb.org/pvldb/1/1454166.pdf)
  7. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     How was it solved?
     • Execution of StreamScope (StreamS)
  8. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     How was it solved?
     • Execution of StreamScope (StreamS), annotated with rStream and rVertex
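  9. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     How was it solved?
     • rStream
       • An asynchronous communication channel that decouples and abstracts the dependencies on vertices (the generic term for the nodes that process events)
       • Every event carries a seq. A read does not complete until the write of the event with the same seq has succeeded.
       • A naive implementation would turn this into a synchronous model, so a GC model is adopted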
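
The rStream behavior described on this slide can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not StreamScope's implementation: the class name `RStream` and the methods `write`, `read`, and `gc` are hypothetical, and an in-memory dictionary stands in for a durable, distributed channel. It shows three things: writes are keyed by `seq` (so a resent event is recognized as a duplicate), a read waits until the event with that `seq` has been written, and consumed events are released by an explicit garbage-collection call.

```python
import threading


class RStream:
    """Toy sequence-numbered channel (hypothetical API, in-memory only)."""

    def __init__(self):
        self._events = {}          # seq -> event
        self._gc_upto = -1         # highest seq already garbage-collected
        self._cond = threading.Condition()

    def write(self, seq, event):
        """Store an event once per seq; return False for a duplicate write."""
        with self._cond:
            if seq <= self._gc_upto or seq in self._events:
                return False       # resent event: ignore it
            self._events[seq] = event
            self._cond.notify_all()
            return True

    def read(self, seq, timeout=None):
        """Block until the event with this seq has been written."""
        with self._cond:
            if seq <= self._gc_upto:
                raise KeyError(f"seq {seq} was already garbage-collected")
            if not self._cond.wait_for(lambda: seq in self._events, timeout):
                raise TimeoutError(f"seq {seq} was not written in time")
            return self._events[seq]

    def gc(self, upto):
        """Drop every event with seq <= upto once consumers no longer need it."""
        with self._cond:
            for seq in [s for s in self._events if s <= upto]:
                del self._events[seq]
            self._gc_upto = max(self._gc_upto, upto)
```

Because the writer never waits for the reader, producer and consumer vertices stay decoupled; the price is that the channel must be told (via `gc`) when old events can be discarded.
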
  10. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     How was it solved?
     • rVertex
       • Takes simple snapshots of the computation running at a vertex
       • Implements restart from saved state as well as failure recovery
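
As a rough illustration of snapshot-based recovery at a vertex, the sketch below keeps a word count as vertex state, snapshots it together with the next expected `seq`, and rebuilds a failed vertex by restoring the snapshot and replaying the input log. The names `CountingVertex`, `snapshot`, and `restore` are assumptions made for this example and are not taken from the paper.

```python
import copy


class CountingVertex:
    """Toy stateful vertex: counts words and tracks the next expected seq."""

    def __init__(self):
        self.state = {}        # word -> count
        self.next_seq = 0      # seq of the next event to apply

    def process(self, seq, word):
        if seq < self.next_seq:
            return             # replayed event already reflected in the state
        if seq != self.next_seq:
            raise ValueError(f"gap in input: expected {self.next_seq}, got {seq}")
        self.state[word] = self.state.get(word, 0) + 1
        self.next_seq = seq + 1

    def snapshot(self):
        """Capture state and input position together so they stay consistent."""
        return copy.deepcopy((self.state, self.next_seq))

    @classmethod
    def restore(cls, snap):
        """Build a fresh vertex from a snapshot taken earlier."""
        vertex = cls()
        vertex.state, vertex.next_seq = copy.deepcopy(snap)
        return vertex


# Usage: snapshot after three events, "fail", then restore and replay the log.
log = ["a", "b", "a", "c", "a"]
vertex = CountingVertex()
for seq in range(3):
    vertex.process(seq, log[seq])
snap = vertex.snapshot()

recovered = CountingVertex.restore(snap)
for seq, word in enumerate(log):   # replaying from seq 0 is safe
    recovered.process(seq, word)
```

Storing the input position inside the snapshot is what lets replay skip events that the restored state already contains, so no event is counted twice.
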
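  11. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams
     Evaluation
     • Depending on the checkpoint interval and on when state is persisted, the cost of the several failure-recovery models can be kept to a minimum
       • strict model / relaxed model …
     • It is easy to debug where a failure occurred
     • Deployment, and everything in the implementation other than rStream / rVertex, can reuse existing assets (mainly SCOPE)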
  13. Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
     What problem is it trying to solve?
     • How to handle a very large volume of HTTP requests, and how to spread the cache across clusters; the aim is an efficient distribution
     • The assignment should be balanced / adaptive / stable / fast to decide
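  14. Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
     How was it solved?
     • The social graph is explicitly separated into “objects”, which can be placed statically, and “groups”, which are placed dynamically
     • Uses Facebook's TAO (a distributed database for the social graph)
     • static assignment
       • Gathers objects whose graphs are similar. The data access pattern is represented as a graph and then partitioned. This is slow when the data is large.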
  15. Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
     How was it solved?
     • dynamic assignment
       • Balances dynamically in response to changing information. The evaluation uses bipartite graph partitioning.
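
The two-level split between a static and a dynamic assignment can be sketched as follows. This is an illustration under stated assumptions rather than Facebook's implementation: a plain hash stands in for the graph-partitioning step that maps objects to groups, a greedy heaviest-first heuristic stands in for the dynamic balancing step, and `NUM_GROUPS`, `group_of`, `assign_groups`, and `route` are hypothetical names.

```python
import hashlib

NUM_GROUPS = 8  # assumed value; far more groups than clusters in practice


def group_of(object_id: str) -> int:
    """Static step: an object always lands in the same group.

    A hash is used here as a stand-in; the slides describe graph
    partitioning of the access pattern for this step.
    """
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_GROUPS


def assign_groups(group_load: dict, clusters: list) -> dict:
    """Dynamic step: place groups on clusters, heaviest group first."""
    load = {cluster: 0 for cluster in clusters}
    assignment = {}
    for group in sorted(group_load, key=lambda g: (-group_load[g], g)):
        target = min(clusters, key=lambda c: (load[c], c))
        assignment[group] = target
        load[target] += group_load[group]
    return assignment


def route(object_id: str, assignment: dict) -> str:
    """Look up the cluster that currently serves an object."""
    return assignment[group_of(object_id)]


# Usage: recompute only the group -> cluster table when load changes.
observed_load = {0: 5, 1: 4, 2: 3, 3: 3, 4: 2, 5: 1, 6: 0, 7: 0}
table = assign_groups(observed_load, ["A", "B"])
cluster = route("user:42", table)
```

Keeping the object-to-group mapping fixed means that rebalancing touches only the small group-to-cluster table, which is what makes the dynamic step cheap enough to rerun as conditions change.
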
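  17. The Design and Implementation of the Warp Transactional Filesystem
     What problem is it trying to solve?
     • Distributed filesystems so far
       • Insufficient guarantees / restrictive interfaces / do not scale
     • An extension of the distributed-filesystem protocol, abbreviated WTF
       • POSIX API + new zero-copy API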
  18. The Design and Implementation of the Warp Transactional Filesystem
     How was it solved?
     • file slicing API
  19. The Design and Implementation of the Warp Transactional Filesystem
     How was it solved?
     • file slicing API
       • The metadata storage holds pointers to slices
       • Slices are addressed by file offset. An overwrite triggers compaction (metadata compaction).
       • Parts that are no longer used are garbage-collected
       • Fragmentation occurs, so locality-aware slice placement is applied continuously
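
The slice-pointer idea can be made concrete with a small in-memory sketch. It is an assumption-laden toy, not WTF's design: `SlicePtr`, `SlicedFile`, `paste`, and `compact` are hypothetical names, a single `bytearray` stands in for the storage servers, and compaction is done byte by byte for clarity. The point it illustrates is that writes, overwrites, and copies only append metadata, and that compaction rewrites metadata without moving data bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlicePtr:
    offset: int      # where the slice appears in the file
    length: int
    store_pos: int   # where its bytes live in the backing store


class SlicedFile:
    def __init__(self, store: bytearray):
        self.store = store   # append-only backing store, shared between files
        self.meta = []       # slice pointers; later entries override earlier

    def write(self, offset: int, data: bytes) -> None:
        pos = len(self.store)
        self.store.extend(data)          # data is only ever appended
        self.meta.append(SlicePtr(offset, len(data), pos))

    def paste(self, offset: int, src: SlicePtr) -> None:
        """Zero-copy: reference existing bytes instead of copying them."""
        self.meta.append(SlicePtr(offset, src.length, src.store_pos))

    def size(self) -> int:
        return max((p.offset + p.length for p in self.meta), default=0)

    def read(self) -> bytes:
        out = bytearray(self.size())
        for p in self.meta:
            out[p.offset:p.offset + p.length] = \
                self.store[p.store_pos:p.store_pos + p.length]
        return bytes(out)

    def compact(self) -> None:
        """Replace overlapping pointers with the minimal visible set."""
        owner = [None] * self.size()     # file byte -> store position
        for p in self.meta:
            for i in range(p.length):
                owner[p.offset + i] = p.store_pos + i
        compacted, i = [], 0
        while i < len(owner):
            if owner[i] is None:         # hole: nothing written here
                i += 1
                continue
            start = i
            while i + 1 < len(owner) and owner[i + 1] == owner[i] + 1:
                i += 1
            compacted.append(SlicePtr(start, i - start + 1, owner[start]))
            i += 1
        self.meta = compacted


# Usage: an overwrite adds a pointer; compaction shrinks the pointer list.
store = bytearray()
f = SlicedFile(store)
f.write(0, b"aaaa")
f.write(0, b"bbbb")
f.write(4, b"cc")
f.compact()
```

After compaction the bytes of the overwritten `b"aaaa"` are no longer referenced by any pointer, which is the kind of region a garbage collector could later reclaim.
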
  20. The Design and Implementation of the Warp Transactional Filesystem
     Applications
     • Used as the store for a MapReduce sort → run time dropped from 70 min to 15 min
     • Video editing → sorting by time series became faster
     • Two other applications are also presented
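  22. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores
     What problem is it trying to solve?
     • In a data store, the two basic operations are random access and search
     • NoSQL systems put handling large data first, and only then ask how to make the store fast
     • Compression is used to cope with large data, but it trades off against throughput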
  23. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores
     How was it solved?
     • Succinct store (https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/agarwal)
  24. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores
     How was it solved?
     • Extends Succinct so that data can be stored at multiple sampling rates
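
The trade-off behind a tunable sampling rate can be shown with a simplified analogy. The sketch below is not Succinct's or BlowFish's data structure: `LayeredSampledArray` is a hypothetical class, and the function `step` is an arbitrary stand-in for the work needed to derive an unsampled value from a stored neighbor. Samples are kept in layers, so dropping a layer lowers storage and raises lookup cost without rebuilding the remaining layers.

```python
def step(value: int) -> int:
    """Stand-in for the per-hop cost of deriving the next value."""
    return (value * 1103515245 + 12345) % 2**31


class LayeredSampledArray:
    def __init__(self, seed: int, n: int, rates=(8, 4, 2)):
        values = [seed]
        for _ in range(n - 1):
            values.append(step(values[-1]))
        self.n = n
        self.coarsest = max(rates)
        self.layers = {}                 # rate -> {index: value}
        covered = set()
        for rate in sorted(rates, reverse=True):
            # Each layer stores only the indices no coarser layer has.
            layer = {i: values[i] for i in range(0, n, rate) if i not in covered}
            covered.update(layer)
            self.layers[rate] = layer

    def drop_layer(self, rate: int) -> None:
        """Free storage; later lookups walk further from a coarser sample."""
        if rate == self.coarsest:
            raise ValueError("the coarsest layer must stay")
        self.layers.pop(rate, None)

    def storage(self) -> int:
        return sum(len(layer) for layer in self.layers.values())

    def lookup(self, i: int):
        """Return (value, steps): walk forward from the nearest stored sample."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        j = i
        while True:
            for layer in self.layers.values():
                if j in layer:
                    value = layer[j]
                    for _ in range(i - j):
                        value = step(value)
                    return value, i - j
            j -= 1   # index 0 is always in the coarsest layer, so this ends


# Usage: the same lookup gets slower as layers are dropped.
arr = LayeredSampledArray(seed=1, n=32)
full_value, full_steps = arr.lookup(7)
arr.drop_layer(2)
mid_value, mid_steps = arr.lookup(7)
```

With all three layers the array stores 16 of 32 values; after dropping the rate-2 layer it stores 8, and the lookup of index 7 needs 3 derivation steps instead of 1 while returning the same value.
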
  25. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores
     How was it solved?
     • How should the sampling rate be adjusted when sharding?
       → Prior work: back-pressure style scheduling (http://dl.acm.org/citation.cfm?id=1285032)