What is HSE?

Fadis
June 06, 2020

This deck explains the Heterogeneous-Memory Storage Engine (HSE).
These are the slides from a talk given at Kernel/VM Tankentai online part1 on June 6, 2020.

References
Heterogeneous-Memory Storage Engine: https://www.micron.com/hse
Don't stack your log on my log: https://www.usenix.org/node/187064
Living with memory that keeps its contents when the power is cut (電源を切っても消えないメモリとの付き合い方): https://speakerdeck.com/fadis/dian-yuan-woqie-tutemoxiao-enaimemoritofalsefu-kihe-ifang
Sample code for this deck: https://github.com/Fadis/hse_demo
Kernel/VM Tankentai online part1: https://connpass.com/event/175388/

Transcript

  1. What is HSE? NAOMASA MATSUBAYASHI
  2. Heterogeneous-Memory Storage Engine https://www.micron.com/hse An open-source key-value store announced by Micron in April 2020

  3. Heterogeneous-Memory Storage Engine: an open-source key-value store https://www.micron.com/hse
    https://github.com/hse-project/hse/wiki "HSE is an embeddable key-value store built for SSDs that use NAND flash or non-volatile memory. HSE improves performance and endurance by carefully arranging where data is placed, from DRAM through multiple kinds of SSD or other solid-state storage."
  4. Heterogeneous-Memory Storage Engine
    Replacing MongoDB's WiredTiger with HSE reportedly raises YCSB benchmark throughput by 2x to 8x https://github.com/hse-project/hse/wiki/MongoDB
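  5. Storage engine
    [diagram: in user space, MySQL with InnoDB and MongoDB with WiredTiger, with the storage engines marked "this"; in kernel space, VFS, file system, page cache, bio, IO scheduler, device driver]
  6. Role of the storage engine: deciding where data is recorded on the storage device
  7. Role of the storage engine: caching frequently accessed data
  8. Role of the storage engine: transactions
    [diagram: timelines of BEGIN, PUT, COMMIT and ABORT; each GET is annotated "sees this" to mark the value visible to it]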
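  9. Role of the storage engine: data never ends up in an invalid state even if processing is interrupted
    [diagram: COMMIT PUT PUT PUT COMMIT; two crash points are marked "if it crashes here", each paired with "after restart it is in this state"]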
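
The crash behaviour on this slide can be illustrated with a small log replay routine. This is a minimal sketch and not the recovery code of WiredTiger or HSE: `Op`, `Record` and `recover` are hypothetical names, and it assumes a single session whose PUTs become visible only once a COMMIT record follows them in the log.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical log record, for illustration only.
enum class Op { Put, Commit };
struct Record {
  Op op;
  std::string key;    // unused for Commit
  std::string value;  // unused for Commit
};

// Rebuild the visible state after a restart. PUTs are staged and are
// applied only when a COMMIT record is found after them.
std::map<std::string, std::string> recover(const std::vector<Record>& log) {
  std::map<std::string, std::string> committed;
  std::map<std::string, std::string> pending;
  for (const auto& r : log) {
    if (r.op == Op::Put) {
      pending[r.key] = r.value;
    } else {
      for (const auto& kv : pending) committed[kv.first] = kv.second;
      pending.clear();
    }
  }
  // PUTs that were never committed are dropped here.
  return committed;
}
```

Replaying a log that ends with uncommitted PUTs returns the state as of the last COMMIT, which is the "after restart it is in this state" arrow in the diagram.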
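  10. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log]
  11. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log; session 0 GETs 1 and 2 through root -> 1, 2]
  12. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log; session 0: root -> 1, 2; session 1 PUTs 2', shown as root(1) -> 2']
  13. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log; session 0 GETs 2 with root -> 1, 2; session 1 GETs 2 with root(1) -> 2']
  14. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log; session 1 COMMITs with root(1) -> 2'; session 0: root -> 1; caption "... becomes ...", whose two symbols are images]
  15. How a storage engine is implemented (the WiredTiger case)
    [diagram: pages 0 1 2 4 5; log; root -> 1 and 2'; session 0 and session 1 both GET 2; caption "... becomes ...", whose two symbols are images]
  16. Storage engine garbage collector (the WiredTiger case)
    [diagram: pages 0 1 2' 4 5; log; root -> 1, 2'; caption "... becomes ...", whose two symbols are images]
  17. Storage engine garbage collector (the WiredTiger case)
    [diagram: pages 0 1 2' 4 5; log; root -> 1, 2']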
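  18. How a storage engine is implemented: storage engine compared with file system
    Aspect | Storage engine | File system
    How data is looked up | key | file path
    Decides where data is recorded | yes | yes
    Cache | yes | yes
    Transactions | controlled by the application | none
    State after an interruption | a state reflecting only completed transactions | some state in which the file system is not corrupted
    How the state after an interruption is recovered | structured log | structured log
    This resembles a file system journal, but a file system does not provide transactions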
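  19. The result is a log stacked on top of a log
    [diagram: in user space, MySQL with InnoDB and MongoDB with WiredTiger; in kernel space, VFS, file system, page cache, bio, IO scheduler, device driver; two layers are labelled "log"]
  20. Running a database on an SSD is no longer unusual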

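  21. NAND flash constraints
    [diagram: cell values 0 0 0 0 0, with 20V and 0V terminals]
    Charge is drained from every cell in the same block (= the whole block is zero-cleared). A block normally consists of multiple pages, so it is not possible to zero-clear a single page.
  22. NAND flash constraints
    [diagram: cell values 0 0 1 0 1, with 20V applied only at the cells that become 1]
    After that, charge is stored in the cells that should become 1. Every time charge moves in or out, the tunnel oxide film of the cell wears down.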
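
The two rules on these slides (erase works on a whole block, programming can only turn a 0 into a 1) can be captured in a toy model. This is a sketch under stated assumptions: `NandBlock`, `kPagesPerBlock` and the one-byte page are made up for illustration, and the 0-after-erase polarity simply follows the slides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPagesPerBlock = 4;  // assumption: tiny block for illustration

struct NandBlock {
  std::array<std::uint8_t, kPagesPerBlock> pages{};  // starts zero-cleared
  unsigned erase_count = 0;                          // each erase wears the cells

  // Erase is only possible for the whole block.
  void erase() {
    pages.fill(0);
    ++erase_count;
  }

  // Programming can only add charge: bits go 0 -> 1, never 1 -> 0.
  // Returns false when the request would need to clear a bit.
  bool program(std::size_t page, std::uint8_t value) {
    assert(page < kPagesPerBlock);
    if ((pages[page] & ~value) != 0) return false;
    pages[page] = value;
    return true;
  }
};
```

Overwriting a page with different contents fails until the whole block has been erased, which is why an in-place update costs a block erase plus a rewrite of every page in it.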
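  23. NAND flash constraints
    [diagram: block 0 holds pages 0 1 2 3, block 1 holds pages 4 5 6 7; "want to rewrite 2 as 2'"]
    Rewriting 2 in place, as on a hard disk, means: 1. zero-clear block 0 (0 1 2 3); 2. write 0 1 2' 3. Extremely slow.
  24. NAND flash constraints
    [diagram: block 0 holds pages 0 1 2 3, block 1 holds pages 4 5 6 7; 2' is written to the same page again and again]
    Rewriting 2 in place, as on a hard disk, wears that page out very quickly until it becomes unusable. Extremely fragile.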
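  25. Flash Translation Layer
    [diagram: block 0 holds 0 1 2 3, block 1 holds 4 5 6 7, block 2 is empty]
    The SSD controller keeps zero-clearing free space whenever it is idle
  26. Flash Translation Layer
    [diagram: block 0 holds 0 1 2 3, block 1 holds 4 5 6 7, block 2 holds 2' followed by three empty pages; "want to rewrite 2 as 2'"]
    When a write request arrives, it is written to a page that has already been zero-cleared
  27. Flash Translation Layer
    [diagram: block 0 holds 0 1 2 3, block 1 holds 4 5 6 7, block 2 holds 2' followed by three empty pages; translation table: 2->8]
    The SSD keeps a translation table that records which physical address holds the page of a given LBA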
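  28. Translation table: mapping LBAs to physical addresses
    Log: LBA2 moved to physical address 8; LBA5 moved to physical address 9; LBA1 moved to address 10; LBA1 was TRIMmed; LBA2 moved to physical address 11; LBA3 moved to physical address 12
    Table (LBA -> physical address): 2 -> 11, 3 -> 12, 5 -> 9
    The translation table is kept both in the device's RAM and in flash memory. Flash memory cannot be rewritten row by row, so every change is appended as a structured log.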
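
The relation between the appended log and the in-RAM table can be sketched as a replay. This is an illustration only, not how any particular SSD firmware is written; `MapEvent` and `replay` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One appended log entry: either "LBA now lives at phys" or "LBA was TRIMmed".
struct MapEvent {
  bool trim;
  std::uint32_t lba;
  std::uint32_t phys;  // ignored when trim is true
};

// Rebuild the LBA -> physical address table by replaying the log in order.
std::map<std::uint32_t, std::uint32_t> replay(const std::vector<MapEvent>& log) {
  std::map<std::uint32_t, std::uint32_t> table;
  for (const auto& e : log) {
    if (e.trim) {
      table.erase(e.lba);
    } else {
      table[e.lba] = e.phys;  // a later entry overrides an earlier one
    }
  }
  return table;
}
```

Feeding it the six log entries from the slide yields the three-row table shown there (2 -> 11, 3 -> 12, 5 -> 9).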
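  29. Translation table: mapping LBAs to physical addresses
    Log: LBA2 moved to physical address 8; LBA5 moved to physical address 9; LBA1 moved to address 10; LBA1 was TRIMmed; LBA2 moved to physical address 11; LBA3 moved to physical address 12
    Table (LBA -> physical address): 2 -> 11, 3 -> 12, 5 -> 9
    [diagram: block 2 holds 2 5 1 2, block 3 holds 3 followed by three empty pages; "want to read 2"]
  30. Translation table: mapping LBAs to physical addresses
    Log: as above, followed by: LBA2 moved to physical address 13
    Table (LBA -> physical address): 2 -> 13, 3 -> 12, 5 -> 9
    [diagram: block 2 holds 2 5 1 2, block 3 holds 3 2 followed by two empty pages; "want to write to 2"]
  31. Translation table: mapping LBAs to physical addresses
    Log: as above, followed by: LBA5 was TRIMmed
    Table (LBA -> physical address): 2 -> 13, 3 -> 12
    [diagram: block 2 holds 2 5 1 2, block 3 holds 3 2 followed by two empty pages; "TRIM 5"]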
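  32. FTL garbage collector
    Log: LBA2 moved to physical address 8; LBA5 moved to physical address 9; LBA1 moved to address 10; LBA1 was TRIMmed; LBA2 moved to physical address 11; LBA3 moved to physical address 12; LBA2 moved to physical address 13; LBA5 was TRIMmed
    Table (LBA -> physical address): 2 -> 13, 3 -> 12
    [diagram: block 3 holds 3 2 followed by two empty pages; block 2 is now entirely empty]
    When idle, the SSD controller zero-clears blocks in which no page is referenced from the translation table any more
  33. FTL garbage collector
    [diagram: block 3 holds 3 2 4 4; block 2 holds 1 2 1 2]
    The case where zero-cleared pages are running low but every block is only partly in use
  34. FTL garbage collector
    [diagram: block 3 holds 3 2 4 4; block 2 holds 1 2 1 2; block 4 holds 1 and three empty pages]
    Only the valid pages of the partly used blocks are written into a new block
  35. FTL garbage collector
    [diagram: block 3 holds 3 2 4 4; block 2 is entirely empty; block 4 holds 1 and three empty pages]
    The original block, now unneeded, is zero-cleared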
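
The copy-then-erase step of slides 33 to 35 can be sketched as follows. This is a toy model under stated assumptions, not firmware code: a block is four page slots, and `kEmpty`, `kStale`, `Block` and `collect` are hypothetical names. The return value is the number of extra page writes the collection itself caused.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int kEmpty = -1;  // zero-cleared page
constexpr int kStale = -2;  // written, but no longer referenced by the translation table
using Block = std::array<int, 4>;  // each slot: an LBA, kEmpty or kStale

// Copy the still-valid pages of `victim` into free slots of `fresh`,
// then erase `victim`. Returns how many pages had to be copied.
int collect(Block& victim, Block& fresh) {
  int copied = 0;
  std::size_t out = 0;
  for (int page : victim) {
    if (page < 0) continue;  // empty or stale: nothing to keep
    while (out < fresh.size() && fresh[out] != kEmpty) ++out;
    assert(out < fresh.size());  // the caller must supply enough free pages
    fresh[out++] = page;
    ++copied;
  }
  victim.fill(kEmpty);  // the whole block is zero-cleared at once
  return copied;
}
```

Every copied page is a NAND write that the host never asked for, which is the source of the write amplification discussed later in the deck.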
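  36. Flash Translation Layer compared with file system
    Aspect | Flash Translation Layer | File system
    How data is looked up | LBA | file path
    Decides where data is recorded | yes | yes
    Cache | yes | yes
    Transactions | none | none
    State after an interruption | some state in which the address translation table is not corrupted | some state in which the file system is not corrupted
    How the state after an interruption is recovered | structured log | structured log
    This resembles a file system journal, but the granularity of the log is one page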
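  37. The result is a log stacked on a log stacked on a log
    [diagram: in user space, MySQL with InnoDB and MongoDB with WiredTiger (log); in kernel space, VFS, file system (log), page cache, bio, device driver; in hardware, Flash Translation Layer (log) and NAND flash memory]
  38. [diagram, "want to write 1 2 3 4": blocks 0, 1 and 2 with empty pages, block 0 also holding 5 7 7 9; afterwards block 1 holds 1 2 and block 0 holds 3 4 5 7 7 9, block 2 stays empty; "want to TRIM 1 2 3 4"; one block is marked "can be zero-cleared right away"]
    It is nice when erasures arrive in units of blocks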
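  39. Log-structured file system
    Events that happen to the file system are recorded on storage in chronological order:
    created file hoge; wrote data to page 0 of file hoge; wrote data to page 1 of file hoge; created file fuga; wrote data to page 0 of file fuga; deleted file hoge; wrote data to page 0 of file fuga; ...
    New operations are always appended at the head of the log
  40. Garbage collector of a log-structured file system
    Original log: created file hoge; wrote data to page 0 of file hoge; wrote data to page 1 of file hoge; created file fuga; wrote data to page 0 of file fuga; deleted file hoge; wrote data to page 0 of file fuga
    If writing simply continues, the storage space runs out. The collector finds the events that do not affect the current state of the file system and builds a new log into which only the entries that still matter are copied: created file fuga; wrote data to page 0 of file fuga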
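
The selection rule on slide 40 (keep only the entries that still influence the current state) can be written down as a short routine. This is a hedged sketch, not the algorithm of any real file system: `Kind`, `Event` and `compact` are hypothetical names, and an entry is treated as live only if its file still exists and, for a write, no later write to the same page replaced it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Create, Write, Delete };
struct Event {
  Kind kind;
  std::string file;
  int page;  // only meaningful for Write
};

// Build a new log that contains only the events still affecting the current state.
std::vector<Event> compact(const std::vector<Event>& log) {
  std::map<std::string, std::size_t> born;  // live file -> index of its Create
  std::map<std::pair<std::string, int>, std::size_t> last_write;  // newest Write per page
  for (std::size_t i = 0; i < log.size(); ++i) {
    const Event& e = log[i];
    if (e.kind == Kind::Create) born[e.file] = i;
    else if (e.kind == Kind::Delete) born.erase(e.file);
    else last_write[std::make_pair(e.file, e.page)] = i;
  }
  std::vector<Event> out;
  for (std::size_t i = 0; i < log.size(); ++i) {
    const Event& e = log[i];
    auto b = born.find(e.file);
    if (b == born.end() || i < b->second) continue;  // file is gone or was re-created later
    if (e.kind == Kind::Write &&
        last_write[std::make_pair(e.file, e.page)] != i) continue;  // overwritten later
    out.push_back(e);
  }
  return out;
}
```

Applied to the seven events of the slide, this keeps exactly two entries: the creation of fuga and the last write to its page 0.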
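  41. In a log-structured file system, a contiguous region is TRIMmed whenever the garbage collector runs. This is friendly to the FTL's garbage collector: if the contiguous region is also contiguous in the SSD's physical address space, there is a good chance that blocks can be released immediately.

  42. Flash-Friendly File System (F2FS): a log-structured file system with multiple logs
    [diagram: on-disk areas SB, CP, SIT, NAT, SSA, Main; the Main area holds several logs, each made of sections]
    Space for the logs is allocated in units of sections. The section size probably matches the block size, so TRIMs at GC time conveniently come in units of blocks.
  43. Three independently running garbage collectors end up stacked
    [diagram: in user space, MySQL with InnoDB and MongoDB with WiredTiger (GC); in kernel space, VFS, file system (GC), page cache, bio, device driver; in hardware, Flash Translation Layer (GC) and NAND flash memory]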
  44. Don't Stack Your Log On My Log https://www.usenix.org/node/187064
    YANG, J., PLASSON, N., GILLIS, G., TALAGALA, N., AND SUNDARARAMAN, S. Don't stack your log on my log. In 2nd Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW) (2014).
  45. Don't Stack Your Log On My Log https://www.usenix.org/node/187064
    A paper showing that when structured logs are stacked in several layers, writes to NAND keep growing and performance collapses
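  46. Write Amplification
    [diagram: the storage engine wants to write "data"; the file system then wants to write "meta0 + data"]
    A structured log has to write metadata in addition to the data it was asked to write
  47. Write Amplification
    [diagram: storage engine, file system and Flash Translation Layer; at each layer the request grows: data, then meta0 + data, then further metadata (meta1 to meta5) wrapped around both]
    The metadata of an upper layer is plain data to the layer below it, so metadata gets its own metadata
  48. Write Amplification
    [diagram: the file system log spans sections 0 to 3; the Flash Translation Layer log spans blocks 0 to 2; the log written in section 1 is no longer needed and is TRIMmed]
    When the section size and the block size differ, blocks that are only partly TRIMmed appear
  49. Write Amplification
    [diagram: Flash Translation Layer log over blocks 0 to 4]
    To turn a partly TRIMmed block into zero-cleared space, the garbage collector copies the pages still in use to a new block
  50. Write Amplification
    A storage engine normally does not tell the lower layers that its log can be TRIMmed
    [diagram: the storage engine log occupies pages 0 to 3 and is already used up; the file system sees the same pages as "in use because the file exists"; the Flash Translation Layer reasons "the pages are in use, so copy them to another block"]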
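  51. Write Amplification
    [diagram: upper log with elements 0 to 6, of which only 2 and 6 are copied into a new log; lower log beneath it]
    Garbage collection of a structured log takes only the still-valid elements out of the existing log and copies them into a new log. This produces new writes in the log of the layer below.
  52. Write Amplification
    [diagram: upper and lower logs with elements 0 1 2 3 5 6]
    If the garbage collector of the upper layer runs right after the garbage collector of the lower layer has run, it not only causes a large amount of writes into the lower log that has just been freed up by garbage collection,
  53. Write Amplification
    [diagram: upper and lower logs with elements 0 1 2 3 5 6]
    it also creates a large number of elements waiting for garbage collection in that lower log. The lower layer's garbage collection that has just finished is ruined in an instant.
  54. Write Amplification
    As the combined result of these effects, when several structured logs are stacked, the amount of data actually written to NAND can, in bad cases, swell to more than twice the size of the data whose write was requested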
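
How the per-layer effects combine can be shown with simple arithmetic: if each layer writes some multiple of what it receives, the factors multiply. The sketch below assumes each layer's factor is defined as bytes handed to the layer below divided by bytes received; the function name and the example numbers are made up for illustration and are not measurements from the paper.

```cpp
#include <cassert>
#include <vector>

// Overall write amplification of stacked layers, given each layer's own
// factor (bytes written downward / bytes received from above).
double stacked_write_amplification(const std::vector<double>& per_layer) {
  double total = 1.0;
  for (double f : per_layer) {
    assert(f >= 1.0);  // a log layer never writes less than it was given
    total *= f;
  }
  return total;
}
```

For example, a storage engine with a hypothetical factor of 2.0 on top of a lower layer with a hypothetical factor of 1.5 results in 3 bytes reaching NAND for every byte the application asked to write.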

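  55. Write Amplification: how to avoid it
    1. Do not stack structured logs
    2. If stacking really is necessary, align the block sizes
    3. TRIM logs once they are no longer used

  56. Let's get rid of the file system
    [diagram: VFS, file system, page cache, bio, device driver; MySQL with InnoDB, MongoDB with WiredTiger; Flash Translation Layer, NAND flash memory; three logs annotated: the storage engine log is "needed to implement transactions", the FTL log is "a hardware function", the file system log is "want to throw it away"]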
  57. HSE uses the kernel module mpool
    [diagram: in user space, MySQL with InnoDB, MongoDB with WiredTiger, and HSE; in kernel space, VFS, file system, page cache, bio, device driver, and mpool; in hardware, Flash Translation Layer and NAND flash memory]
  58. mpool runs on top of a block device
    [diagram: same layers as the previous slide]
  59. Create an mpool device by specifying a block device
    root # modprobe mpool
    root # ls /dev/mpool*
    /dev/mpoolctl
    root # mpool create mp1 /dev/nvme0n1 uid=test gid=test mode=0600
    root # ls /dev/mpool*
    /dev/mpoolctl
    /dev/mpool:
    mp1
    root # mpool list
    MPOOL TOTAL USED AVAIL CAPACITY LABEL HEALTH
    mp1 466g 1.16g 441g 0.26% raw optimal
  60. mpool provides three features
    [diagram: HSE in user space calls mblock, mlog and mcache, each through ioctl, in the mpool kernel module; mdc lives in the mpool user-space library]
  61. mblock API: mpool_open opens the mpool device. An mblock stores a byte sequence whose size is an integer multiple of the page size in the mpool. An mblock can be written only once, at creation time; it cannot be modified or appended to, but it can be deleted.
    mpool *raw_pool = nullptr;
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    uint64_t block_id = 0u;
    mblock_props props;
  62. mblock API: mpool_mblock_alloc creates a new mblock. The 64-bit block id returned here is something like a file descriptor.
    mpool *raw_pool = nullptr;
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    uint64_t block_id = 0u;
    mblock_props props;
    size_t length = 0;
    if( !params.count( "object" ) ) {
      memset( reinterpret_cast< void* >( &props ), 0, sizeof( props ) );
      SAFE_CALL( mpool_mblock_alloc( pool.get(), MP_MED_CAPACITY, false, &block_id, &props ) )
      std::cout << "object id: " << props.mpr_objid << std::endl;
      std::string m = params[ "message" ].as< std::string >();
      size_t buf_size = ( m.size() / PAGE_SIZE + ( m.size() % PAGE_SIZE ? 1 : 0 ) ) * PAGE_SIZE;
  63. mblock API: the object id obtained at the same time, on the other hand, is something like a file name. Use the object id when looking this mblock up later.
    size_t length = 0;
    if( !params.count( "object" ) ) {
      memset( reinterpret_cast< void* >( &props ), 0, sizeof( props ) );
      SAFE_CALL( mpool_mblock_alloc( pool.get(), MP_MED_CAPACITY, false, &block_id, &props ) )
      std::cout << "object id: " << props.mpr_objid << std::endl;
      std::string m = params[ "message" ].as< std::string >();
      size_t buf_size = ( m.size() / PAGE_SIZE + ( m.size() % PAGE_SIZE ? 1 : 0 ) ) * PAGE_SIZE;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      std::copy( m.begin(), m.end(), buf.get() );
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
  64. mblock API: data written to an mblock must be aligned to a page boundary. mpool writes have no page cache; the kernel hands the memory allocated here directly to the device driver.
    mblock_props props;
    size_t length = 0;
    if( !params.count( "object" ) ) {
      memset( reinterpret_cast< void* >( &props ), 0, sizeof( props ) );
      SAFE_CALL( mpool_mblock_alloc( pool.get(), MP_MED_CAPACITY, false, &block_id, &props ) )
      std::cout << "object id: " << props.mpr_objid << std::endl;
      std::string m = params[ "message" ].as< std::string >();
      size_t buf_size = ( m.size() / PAGE_SIZE + ( m.size() % PAGE_SIZE ? 1 : 0 ) ) * PAGE_SIZE;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      std::copy( m.begin(), m.end(), buf.get() );
      iovec iov;
      iov.iov_base = buf.get();
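
The slide code relies on a `free_deleter` type and a page-size rounding expression that are not shown in the excerpt. The following is one plausible way to write them, offered as a sketch: `kPageSize` (4096 here), `round_up_to_page`, `page_buffer` and `make_page_buffer` are assumed names, `free_deleter` is a guess at what the sample defines, and `aligned_alloc` is assumed to be available from `<stdlib.h>` as it is on Linux with glibc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <stdlib.h>

constexpr std::size_t kPageSize = 4096;  // assumption; the slides use PAGE_SIZE

// Round a payload size up to the next multiple of the page size.
constexpr std::size_t round_up_to_page(std::size_t n) {
  return (n + kPageSize - 1) / kPageSize * kPageSize;
}

// Deleter for memory obtained from aligned_alloc.
struct free_deleter {
  void operator()(void* p) const { std::free(p); }
};
using page_buffer = std::unique_ptr<char, free_deleter>;

// Allocate a zero-filled, page-aligned buffer large enough for the payload.
page_buffer make_page_buffer(std::size_t payload_size) {
  const std::size_t size = round_up_to_page(payload_size);
  if (size == 0) return page_buffer();
  char* p = static_cast<char*>(aligned_alloc(kPageSize, size));
  if (!p) throw std::bad_alloc();
  std::memset(p, 0, size);
  return page_buffer(p);
}
```

The rounding gives the same result as the `m.size() / PAGE_SIZE + ( m.size() % PAGE_SIZE ? 1 : 0 )` expression in the slide; zero-filling the tail keeps stale memory out of the padding that is written along with the payload.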
  65. mblock API: mpool_mblock_write writes data to the mblock. By preparing several iovecs, data from several memory regions can be combined into one write.
      SAFE_CALL( mpool_mblock_alloc( pool.get(), MP_MED_CAPACITY, false, &block_id, &props ) )
      std::cout << "object id: " << props.mpr_objid << std::endl;
      std::string m = params[ "message" ].as< std::string >();
      size_t buf_size = ( m.size() / PAGE_SIZE + ( m.size() % PAGE_SIZE ? 1 : 0 ) ) * PAGE_SIZE;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      std::copy( m.begin(), m.end(), buf.get() );
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
      length = buf_size;
      SAFE_CALL( mpool_mblock_write( pool.get(), block_id, &iov, 1 ) )
      if( abort_transaction )
  66. mblock API: mpool_mblock_commit makes the changes final. If this function is never reached, the mpool_mblock_write calls up to that point are treated as if they never happened. mpool_mblock_abort explicitly discards the changes made so far.
      PAGE_SIZE ? 1 : 0 ) ) * PAGE_SIZE;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      std::copy( m.begin(), m.end(), buf.get() );
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
      length = buf_size;
      SAFE_CALL( mpool_mblock_write( pool.get(), block_id, &iov, 1 ) )
      if( abort_transaction )
        SAFE_CALL( mpool_mblock_abort( pool.get(), block_id ) )
      else
        SAFE_CALL( mpool_mblock_commit( pool.get(), block_id ) )
    }
    else {
      uint64_t object_id = params[ "object" ].as< uint64_t >();
  67. mblock API: to look up an mblock that has already been written, use mpool_mblock_find_get
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
      length = buf_size;
      SAFE_CALL( mpool_mblock_write( pool.get(), block_id, &iov, 1 ) )
      if( abort_transaction )
        SAFE_CALL( mpool_mblock_abort( pool.get(), block_id ) )
      else
        SAFE_CALL( mpool_mblock_commit( pool.get(), block_id ) )
    }
    else {
      uint64_t object_id = params[ "object" ].as< uint64_t >();
      SAFE_CALL( mpool_mblock_find_get( pool.get(), object_id, &block_id, &props ) )
      length = props.mpr_write_len;
      std::cout << "object id: " << object_id << std::endl;
    }
  68. mblock API: read with mpool_mblock_read. The buffer used for reading must also be aligned to a page boundary.
      SAFE_CALL( mpool_mblock_find_get( pool.get(), object_id, &block_id, &props ) )
      length = props.mpr_write_len;
      std::cout << "object id: " << object_id << std::endl;
    }
    {
      size_t buf_size = length;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
      SAFE_CALL( mpool_mblock_read( pool.get(), block_id, &iov, 1, 0 ) )
      std::cout << "length: " << length << std::endl;
      std::cout << "data: " << buf.get() << std::endl;
    }
  69. mblock API: mpool_mblock_delete deletes the specified mblock as a whole
    {
      size_t buf_size = length;
      std::unique_ptr< char, free_deleter > buf( reinterpret_cast< char* >( aligned_alloc( PAGE_SIZE, buf_size ) ) );
      if( !buf ) throw std::bad_alloc();
      memset( buf.get(), 0, buf_size );
      iovec iov;
      iov.iov_base = buf.get();
      iov.iov_len = buf_size;
      SAFE_CALL( mpool_mblock_read( pool.get(), block_id, &iov, 1, 0 ) )
      std::cout << "length: " << length << std::endl;
      std::cout << "data: " << buf.get() << std::endl;
    }
    if( delete_block )
      SAFE_CALL( mpool_mblock_delete( pool.get(), block_id ) )
  70. mlog API: opening the mpool device with mpool_open is the same as for mblock. An mlog stores a byte sequence in the mpool that can be appended to later. The maximum size of an mlog is fixed at creation time; once it has been appended to up to that size, nothing more can be written.
    mpool *raw_pool = nullptr;
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR|O_EXCL, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    mlog_capacity cap;
    memset( reinterpret_cast< void* >( &cap ), 0, sizeof( cap ) );
  71. mlog API: mpool_mlog_alloc creates a new mlog. Annotation on the capacity: the size of the region to use (an integer multiple of the page size).
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    mlog_capacity cap;
    memset( reinterpret_cast< void* >( &cap ), 0, sizeof( cap ) );
    std::shared_ptr< mpool_mlog > log;
    if( !params.count( "object" ) ) {
      cap.lcp_captgt = 4 * 1024 * 1024;
      mlog_props props;
      memset( reinterpret_cast< void* >( &props ), 0, sizeof( props ) );
      mpool_mlog *raw_log = nullptr;
      SAFE_CALL( mpool_mlog_alloc( pool.get(), &cap, MP_MED_CAPACITY, &props, &raw_log ) );
      log.reset( raw_log, [pool]( mpool_mlog *p ) { if( p ) mpool_mlog_close( pool.get(), p ); } );
      uint64_t object_id = props.lpr_objid;
      std::cout << "object id: " << object_id << std::endl;
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
  72. mlog API: to look up an existing mlog, use mpool_mlog_find_get. mpool_mlog_alloc and mpool_mlog_find_get return mlog_props.
      log.reset( raw_log, [pool]( mpool_mlog *p ) { if( p ) mpool_mlog_close( pool.get(), p ); } );
      uint64_t object_id = props.lpr_objid;
      std::cout << "object id: " << object_id << std::endl;
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    }
    else {
      mlog_props props;
      mpool_mlog *raw_log = nullptr;
      SAFE_CALL( mpool_mlog_find_get( pool.get(), params[ "object" ].as<uint64_t>(), &props, &raw_log ) )
      log.reset( raw_log, [pool]( mpool_mlog *p ) { if( p ) mpool_mlog_close( pool.get(), p ); } );
      uint64_t object_id = props.lpr_objid;
      std::cout << "object id: " << object_id << std::endl;
    }
    uint64_t gen = 0;
    SAFE_CALL( mpool_mlog_open( pool.get(), log.get(), 0, &gen ) )
  73. mlog API: use the mpool_mlog to open the log with mpool_mlog_open
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    }
    else {
      mlog_props props;
      mpool_mlog *raw_log = nullptr;
      SAFE_CALL( mpool_mlog_find_get( pool.get(), params[ "object" ].as<uint64_t>(), &props, &raw_log ) )
      log.reset( raw_log, [pool]( mpool_mlog *p ) { if( p ) mpool_mlog_close( pool.get(), p ); } );
      uint64_t object_id = props.lpr_objid;
      std::cout << "object id: " << object_id << std::endl;
    }
    uint64_t gen = 0;
    SAFE_CALL( mpool_mlog_open( pool.get(), log.get(), 0, &gen ) )
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mlog_append_data( pool.get(), log.get(),
  74. mlog API: mpool_mlog_append_data appends a byte sequence to the mlog. The byte sequence being written does not have to be aligned to a page boundary.
      mpool_mlog_close( pool.get(), p ); } );
      uint64_t object_id = props.lpr_objid;
      std::cout << "object id: " << object_id << std::endl;
    }
    uint64_t gen = 0;
    SAFE_CALL( mpool_mlog_open( pool.get(), log.get(), 0, &gen ) )
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mlog_append_data( pool.get(), log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( abort_transaction )
      SAFE_CALL( mpool_mlog_abort( pool.get(), log.get() ) )
    else
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    if( erase_log != std::numeric_limits< uint64_t >::max() )
      SAFE_CALL( mpool_mlog_erase( pool.get(), log.get(),
  75. mlog API: mpool_mlog_commit makes the changes final. If this function is never reached, the mpool_mlog_append_data calls up to that point are treated as if they never happened. mpool_mlog_abort explicitly discards the changes made so far.
    }
    uint64_t gen = 0;
    SAFE_CALL( mpool_mlog_open( pool.get(), log.get(), 0, &gen ) )
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mlog_append_data( pool.get(), log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( abort_transaction )
      SAFE_CALL( mpool_mlog_abort( pool.get(), log.get() ) )
    else
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    if( erase_log != std::numeric_limits< uint64_t >::max() )
      SAFE_CALL( mpool_mlog_erase( pool.get(), log.get(), erase_log ) )
    bool empty = false;
    SAFE_CALL( mpool_mlog_empty( pool.get(), log.get(), &empty ) )
    std::cout << "empty: " << empty << std::endl;
  76. mlog API: mpool_mlog_erase wipes the specified mlog in its entirety
    SAFE_CALL( mpool_mlog_open( pool.get(), log.get(), 0, &gen ) )
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mlog_append_data( pool.get(), log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( abort_transaction )
      SAFE_CALL( mpool_mlog_abort( pool.get(), log.get() ) )
    else
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    if( erase_log != std::numeric_limits< uint64_t >::max() )
      SAFE_CALL( mpool_mlog_erase( pool.get(), log.get(), erase_log ) )
    bool empty = false;
    SAFE_CALL( mpool_mlog_empty( pool.get(), log.get(), &empty ) )
    std::cout << "empty: " << empty << std::endl;
    size_t len = 0;
    SAFE_CALL( mpool_mlog_len( pool.get(), log.get(), &len ) )
  77. mlog API: mpool_mlog_empty tells whether the mlog is empty. mpool_mlog_len returns the size of the used part of the mlog.
    else
      SAFE_CALL( mpool_mlog_commit( pool.get(), log.get() ) )
    if( erase_log != std::numeric_limits< uint64_t >::max() )
      SAFE_CALL( mpool_mlog_erase( pool.get(), log.get(), erase_log ) )
    bool empty = false;
    SAFE_CALL( mpool_mlog_empty( pool.get(), log.get(), &empty ) )
    std::cout << "empty: " << empty << std::endl;
    size_t len = 0;
    SAFE_CALL( mpool_mlog_len( pool.get(), log.get(), &len ) )
    std::cout << "length: " << len << std::endl;
    SAFE_CALL( mpool_mlog_read_data_init( pool.get(), log.get() ) )
    while( 1 ) {
      std::array< char, 1024u > buf;
      size_t length = 0u;
      SAFE_CALL( mpool_mlog_read_data_next( pool.get(), log.get(), buf.data(), buf.size() - 1, &length ) );
      if( !length ) break;
      buf[ length ] = '\0';
  78. mlog API: mpool_mlog_read_data_init prepares for reading, and mpool_mlog_read_data_next then returns the written contents in order from the beginning
    std::cout << "empty: " << empty << std::endl;
    size_t len = 0;
    SAFE_CALL( mpool_mlog_len( pool.get(), log.get(), &len ) )
    std::cout << "length: " << len << std::endl;
    SAFE_CALL( mpool_mlog_read_data_init( pool.get(), log.get() ) )
    while( 1 ) {
      std::array< char, 1024u > buf;
      size_t length = 0u;
      SAFE_CALL( mpool_mlog_read_data_next( pool.get(), log.get(), buf.data(), buf.size() - 1, &length ) );
      if( !length ) break;
      buf[ length ] = '\0';
      std::cout << "data: " << buf.data() << std::endl;
    }
    SAFE_CALL( mpool_mlog_flush( pool.get(), log.get() ) )
    SAFE_CALL( mpool_mlog_close( pool.get(), log.get() ) )
    if( delete_log )
      SAFE_CALL( mpool_mlog_delete( pool.get(), log.get() ) )
  79. mlog API: to delete an mlog, use mpool_mlog_delete
      SAFE_CALL( mpool_mlog_read_data_next( pool.get(), log.get(), buf.data(), buf.size() - 1, &length ) );
      if( !length ) break;
      buf[ length ] = '\0';
      std::cout << "data: " << buf.data() << std::endl;
    }
    SAFE_CALL( mpool_mlog_flush( pool.get(), log.get() ) )
    SAFE_CALL( mpool_mlog_close( pool.get(), log.get() ) )
    if( delete_log )
      SAFE_CALL( mpool_mlog_delete( pool.get(), log.get() ) )
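
The mlog behaviour described on the slides above (a maximum size fixed at creation, append-only writes, reads in write order) can be modelled in memory. This is a simplified sketch that does not call mpool: `BoundedLog` is a hypothetical class, and it ignores the per-record overhead and the commit step a real mlog has.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// In-memory stand-in for an append-only log with a fixed capacity.
class BoundedLog {
 public:
  explicit BoundedLog(std::size_t capacity) : capacity_(capacity) {}

  // Append a record; fails once the capacity chosen at creation is exhausted.
  bool append(const std::string& record) {
    if (used_ + record.size() > capacity_) return false;
    records_.push_back(record);
    used_ += record.size();
    return true;
  }

  bool empty() const { return records_.empty(); }
  std::size_t length() const { return used_; }  // bytes used so far
  const std::vector<std::string>& records() const { return records_; }  // in write order

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::vector<std::string> records_;
};
```

Because a full log can only be erased or deleted as a whole, a long-lived store needs some way to move the live records elsewhere, which is what the mdc on the following slides adds.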
  80. mdc API: MetaData Container, MDC for short. It combines two mlogs so that garbage collection becomes possible. Opening the mpool device with mpool_open is the same as for mlog.
    mpool *raw_pool = nullptr;
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR|O_EXCL, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    uint64_t log1 = 0;
    uint64_t log2 = 0;
    if( !params.count( "object" ) ) {
      mdc_capacity cap;
  81. mdc API: mpool_mdc_alloc creates an mdc. Two mlogs are created and two object ids come back. The mdc_capacity argument specifies the size of each mlog.
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR|O_EXCL, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    uint64_t log1 = 0;
    uint64_t log2 = 0;
    if( !params.count( "object" ) ) {
      mdc_capacity cap;
      memset( reinterpret_cast< void* >( &cap ), 0, sizeof( cap ) );
      cap.mdt_captgt = 4 * 1024 * 1024;
      SAFE_CALL( mpool_mdc_alloc( pool.get(), &log1, &log2, MP_MED_CAPACITY, &cap, nullptr ) );
      std::cout << "object id: " << log1 << ":" << log2 << std::endl;
      SAFE_CALL( mpool_mdc_commit( pool.get(), log1, log2 ) )
    }
    else {
      auto v = params[ "object" ].as< std::string >();
      boost::fusion::vector< uint64_t, uint64_t > parsed;
      namespace qi = boost::spirit::qi;
      if( !qi::parse( v.begin(), v.end(), qi::ulong_long >> ':' >> qi::ulong_long, parsed ) ) {
  82. mdc API: mpool_mdc_open opens the mdc. An opened mdc is closed with mpool_mdc_close.
      boost::fusion::vector< uint64_t, uint64_t > parsed;
      namespace qi = boost::spirit::qi;
      if( !qi::parse( v.begin(), v.end(), qi::ulong_long >> ':' >> qi::ulong_long, parsed ) ) {
        std::cerr << "invalid object id" << std::endl;
        return 1;
      }
      log1 = boost::fusion::at_c< 0 >( parsed );
      log2 = boost::fusion::at_c< 1 >( parsed );
    }
    mpool_mdc *raw_log = nullptr;
    SAFE_CALL( mpool_mdc_open( pool.get(), log1, log2, 0, &raw_log ) );
    std::shared_ptr< mpool_mdc > log( raw_log, [pool]( mpool_mdc *p ) { if( p ) mpool_mdc_close( p ); } );
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mdc_append( log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( params.count( "compact" ) ) {
      auto v = params[ "compact" ].as< std::vector< std::string > >();
  83. mdc API: mpool_mdc_append appends a byte sequence to whichever mlog is active. The byte sequence being written does not have to be aligned to a page boundary. Only one of the two mlogs is active at a time.
    [diagram: an mdc made of two mlogs; one of them holds records 1 2 3 4 5]
        return 1;
      }
      log1 = boost::fusion::at_c< 0 >( parsed );
      log2 = boost::fusion::at_c< 1 >( parsed );
    }
    mpool_mdc *raw_log = nullptr;
    SAFE_CALL( mpool_mdc_open( pool.get(), log1, log2, 0, &raw_log ) );
    std::shared_ptr< mpool_mdc > log( raw_log, [pool]( mpool_mdc *p ) { if( p ) mpool_mdc_close( p ); } );
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mdc_append( log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( params.count( "compact" ) ) {
      auto v = params[ "compact" ].as< std::vector< std::string > >();
      std::sort( v.begin(), v.end() );
      std::vector< std::vector< char > > bufs;
      SAFE_CALL( mpool_mdc_rewind( log.get() ) )
      while( 1 ) {
        std::vector< char > buf( 4096, 0 );
        size_t size = 0;
  84. mdc API: mpool_mdc_rewind moves to the beginning of the active log. Each call to mpool_mdc_read then returns the log records in order.
      }
      SAFE_CALL( mpool_mdc_cend( log.get() ) )
    }
    SAFE_CALL( mpool_mdc_rewind( log.get() ) )
    while( 1 ) {
      std::vector< char > buf( 4096, 0 );
      size_t size = 0;
      auto e = mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size );
      if( mpool_errno( e ) == EOVERFLOW && size > buf.size() ) {
        buf.resize( size + 1, 0 );
        SAFE_CALL( mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size ) );
      }
      else SAFE_CALL( e )
      if( !size ) break;
      std::cout << "data: " << buf.data() << std::endl;
    }
    if( delete_log ) {
      log.reset();
      SAFE_CALL( mpool_mdc_destroy( pool.get(), log1, log2 ) )
    }
  85. mdc API: to perform garbage collection, first read out the valid log records
    [diagram: an mdc made of two mlogs; the active one holds 1 2 3 4 5, and records 1 and 3 are read out]
    if( params.count( "message" ) )
      for( const auto &a: params[ "message" ].as< std::vector< std::string > >() )
        SAFE_CALL( mpool_mdc_append( log.get(), const_cast< void* >( static_cast< const void* >( a.data() ) ), a.size(), 1 ) )
    if( params.count( "compact" ) ) {
      auto v = params[ "compact" ].as< std::vector< std::string > >();
      std::sort( v.begin(), v.end() );
      std::vector< std::vector< char > > bufs;
      SAFE_CALL( mpool_mdc_rewind( log.get() ) )
      while( 1 ) {
        std::vector< char > buf( 4096, 0 );
        size_t size = 0;
        auto e = mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size );
        if( mpool_errno( e ) == EOVERFLOW && size > buf.size() ) {
          buf.resize( size + 1, 0 );
          SAFE_CALL( mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size ) );
        }
        else SAFE_CALL( e )
  86. mdc API: mpool_mdc_cstart switches the active mlog; after that, the valid log records are written with mpool_mdc_append
    [diagram: an mdc made of two mlogs; the old one holds 1 2 3 4 5, the newly active one receives 1 3]
        if( mpool_errno( e ) == EOVERFLOW && size > buf.size() ) {
          buf.resize( size + 1, 0 );
          SAFE_CALL( mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size ) );
        }
        else SAFE_CALL( e )
        if( !size ) break;
        if( std::binary_search( v.begin(), v.end(), std::string( buf.data() ) ) ) {
          buf.resize( size );
          bufs.emplace_back( std::move( buf ) );
        }
      }
      SAFE_CALL( mpool_mdc_cstart( log.get() ) )
      for( const auto &buf: bufs ) {
        SAFE_CALL( mpool_mdc_append( log.get(), const_cast< void* >( static_cast< const void* >( buf.data() ) ), buf.size(), 0 ) )
      }
      SAFE_CALL( mpool_mdc_cend( log.get() ) )
    }
    SAFE_CALL( mpool_mdc_rewind( log.get() ) )
  87. mdc API: finally, mpool_mdc_cend TRIMs the inactive log
    [diagram: an mdc made of two mlogs; one is now empty, the active one holds 1 3]
        if( mpool_errno( e ) == EOVERFLOW && size > buf.size() ) {
          buf.resize( size + 1, 0 );
          SAFE_CALL( mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size ) );
        }
        else SAFE_CALL( e )
        if( !size ) break;
        if( std::binary_search( v.begin(), v.end(), std::string( buf.data() ) ) ) {
          buf.resize( size );
          bufs.emplace_back( std::move( buf ) );
        }
      }
      SAFE_CALL( mpool_mdc_cstart( log.get() ) )
      for( const auto &buf: bufs ) {
        SAFE_CALL( mpool_mdc_append( log.get(), const_cast< void* >( static_cast< const void* >( buf.data() ) ), buf.size(), 0 ) )
      }
      SAFE_CALL( mpool_mdc_cend( log.get() ) )
    }
    SAFE_CALL( mpool_mdc_rewind( log.get() ) )
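
The three compaction steps of slides 85 to 87 (read the valid records, switch the active log and rewrite them, drop the old log) can be modelled without mpool. The sketch below is an in-memory illustration only: `TwoLogContainer`, `compact_start`, `compact_end` and `compact` are hypothetical names loosely modelled on what the slides say about mpool_mdc_cstart and mpool_mdc_cend, not the library's implementation.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Two logs of which exactly one is active, as described for the mdc.
class TwoLogContainer {
 public:
  void append(std::string record) { logs_[active_].push_back(std::move(record)); }
  const std::vector<std::string>& records() const { return logs_[active_]; }

  // Switch the active log; the old one is kept until compact_end().
  void compact_start() {
    assert(!compacting_);
    active_ ^= 1;
    logs_[active_].clear();
    compacting_ = true;
  }

  // Discard the now inactive log.
  void compact_end() {
    assert(compacting_);
    logs_[active_ ^ 1].clear();
    compacting_ = false;
  }

 private:
  std::array<std::vector<std::string>, 2> logs_;
  int active_ = 0;
  bool compacting_ = false;
};

// Keep only the records for which keep(record) is true.
template <class Pred>
void compact(TwoLogContainer& mdc, Pred keep) {
  std::vector<std::string> survivors;
  for (const auto& r : mdc.records())
    if (keep(r)) survivors.push_back(r);  // 1. read out the valid records
  mdc.compact_start();                    // 2. switch logs and rewrite them
  for (auto& r : survivors) mdc.append(std::move(r));
  mdc.compact_end();                      // 3. drop the old log
}
```

Keeping the old log intact until the last step means an interruption in the middle of compaction never leaves the container without a complete copy of the valid records.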
  88. mdc API: mpool_mdc_destroy deletes the two mlogs together
      if( mpool_errno( e ) == EOVERFLOW && size > buf.size() ) {
        buf.resize( size + 1, 0 );
        SAFE_CALL( mpool_mdc_read( log.get(), buf.data(), buf.size() - 1, &size ) );
      }
      else SAFE_CALL( e )
      if( !size ) break;
      std::cout << "data: " << buf.data() << std::endl;
    }
    if( delete_log ) {
      log.reset();
      SAFE_CALL( mpool_mdc_destroy( pool.get(), log1, log2 ) )
    }
  89. mcache API: an mblock has no page cache. To keep data that is read many times in memory, create a page cache with mcache. As always, start by opening the mpool device with mpool_open.
    mpool *raw_pool = nullptr;
    SAFE_CALL( mpool_open( params[ "pool" ].as< std::string >().c_str(), O_RDWR, &raw_pool, nullptr ) );
    std::shared_ptr< mpool > pool( raw_pool, []( mpool *p ) { if( p ) mpool_close( p ); } );
    std::vector< uint64_t > object_ids = params[ "object" ].as< std::vector< uint64_t > >();
  90. mcache API: mpool_mcache_mmap takes the object ids of the mblocks to put into the mcache. To stop caching, call mpool_mcache_munmap.
      uint64_t block_id = 0;
      SAFE_CALL( mpool_mblock_find_get( pool.get(), object_id, &block_id, &props ) )
      return props;
    } );
    {
      mpool_mcache_map *raw_map;
      SAFE_CALL( mpool_mcache_mmap( pool.get(), object_ids.size(), object_ids.data(), MPC_VMA_WARM, &raw_map ) );
      std::shared_ptr< mpool_mcache_map > map( raw_map, [pool] ( mpool_mcache_map *p ) { if( p ) mpool_mcache_munmap( p ); } );
      for( uint64_t cache_id = 0; cache_id != object_ids.size(); ++cache_id ) {
        SAFE_CALL( mpool_mcache_madvise( map.get(), cache_id, 0, props[ cache_id ].mpr_write_len, MADV_WILLNEED ) )
        size_t offset = 0u;
  91. mcache API: mpool_mcache_madvise announces that the mblock at index cache id will be needed soon. mpool_mcache_getpages returns the address of the page cache.
      } );
      for( uint64_t cache_id = 0; cache_id != object_ids.size(); ++cache_id ) {
        SAFE_CALL( mpool_mcache_madvise( map.get(), cache_id, 0, props[ cache_id ].mpr_write_len, MADV_WILLNEED ) )
        size_t offset = 0u;
        void *page = nullptr;
        SAFE_CALL( mpool_mcache_getpages( map.get(), 1, cache_id, &offset, &page ) );
        char *data = reinterpret_cast< char* >( page );
        std::cout << "length: " << props[ cache_id ].mpr_write_len << std::endl;
        std::cout << "data: " << data << std::endl;
      }
    }
  92. mcache API, key point: the application controls when an mcache is created and destroyed, so this cache can be used as is as the cache of a storage engine
      if( p ) mpool_mcache_munmap( p ); } );
      for( uint64_t cache_id = 0; cache_id != object_ids.size(); ++cache_id ) {
        SAFE_CALL( mpool_mcache_madvise( map.get(), cache_id, 0, props[ cache_id ].mpr_write_len, MADV_WILLNEED ) )
        size_t offset = 0u;
        void *page = nullptr;
        SAFE_CALL( mpool_mcache_getpages( map.get(), 1, cache_id, &offset, &page ) );
        char *data = reinterpret_cast< char* >( page );
        std::cout << "length: " << props[ cache_id ].mpr_write_len << std::endl;
        std::cout << "data: " << data << std::endl;
      }
    }
  93. mpool-kmod/src/mpctl.c

    static long mpc_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)

    Apart from mdc, mpool operations map directly onto ioctl commands, each of which becomes a call to a kernel-space function.

        switch (cmd) {
        case MPIOC_MP_CREATE:
        case MPIOC_MP_ACTIVATE:
        case MPIOC_MP_DESTROY:
        case MPIOC_MP_RENAME:
            err = mpioc_mp_cmd(unit, cmd, argp);
            break;
        case MPIOC_MP_DEACTIVATE:
            err = mpioc_mp_deactivate(unit, cmd, argp);
            break;
        case MPIOC_DRV_ADD:
            err = mpioc_mp_add(unit, cmd, argp);
            break;
        case MPIOC_PARAMS_SET:
            err = mpioc_params_set(unit, cmd, argp);
            break;
        case MPIOC_PARAMS_GET:
            err = mpioc_params_get(unit, cmd, argp);
            break;
        case MPIOC_MP_MCLASS_GET:
            err = mpioc_mp_mclass_get(unit, cmd, argp);
            break;
        case MPIOC_PROP_GET:
            err = mpioc_proplist_get(unit, cmd, argp);
            break;
        case MPIOC_DEVPROPS_GET:
            err = mpioc_devprops_get(unit, argp);
            break;
        case MPIOC_MB_ALLOC:
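  94. mpool superblock

    The mapping from object ids to their placement on storage is kept in a kernel red-black tree (rbtree).

    [Figure: an rbtree with nodes 1, 2, and 3, and objects 1, 3, 2 laid out on storage]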
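  95. mpool superblock

    Changes to this red-black tree are recorded in the mdc placed at the head of the mpool (mdc0). When the mpool is activated, this log is scanned to rebuild the rbtree.

    [Figure: the rbtree with nodes 1, 2, and 3, and storage holding mdc0 followed by objects 1, 3, 2]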
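Rebuilding an index by replaying a log can be sketched as below. This is an illustration under assumptions: the record layout (log_record, extent) and the add/remove operations are invented here, since the slides do not show the mdc0 format, and std::map stands in for the kernel rbtree.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record layout; the real mdc0 format is not shown in the slides.
struct extent { std::uint64_t offset; std::uint64_t size; };
struct log_record {
  enum class op { add, remove } kind;
  std::uint64_t object_id;
  extent where;                  // ignored for remove records
};

// Rebuild the object id -> placement index by replaying the log in order.
// std::map is usually implemented as a red-black tree, like the kernel rbtree.
inline std::map< std::uint64_t, extent > replay( const std::vector< log_record > &log ) {
  std::map< std::uint64_t, extent > index;
  for( const auto &r: log ) {
    if( r.kind == log_record::op::add ) index[ r.object_id ] = r.where;
    else index.erase( r.object_id );
  }
  return index;
}
```

Because each record carries only an id, a location, and a size, replay cost depends on the number of objects created and deleted, not on how much data was written into them.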
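  96. mpool superblock

    Key point: unlike file system metadata, mdc0 holds little more than the object id, location, and size. As a result, no matter how heavily the mdcs other than mdc0 are modified, nothing is appended to the mdc0 log. This avoids stacking one log on top of another.

    [Figure: storage holding mdc0 followed by objects 1, 3, 2]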
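  97. Put together like this, it becomes a storage engine

    [Figure: the earlier storage engine diagram (nodes 0, 1, 2', 4, 5; root pointing at 1 and 2'; log) with its parts labeled mcache, mblock, and mdc]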
  98. HSE API

    hse_kvdb_init prepares HSE for use. To clean up, call hse_kvdb_fini.

        HSE_SAFE_CALL( hse_kvdb_init() );
        std::shared_ptr< void > context( nullptr, []( void* ) { hse_kvdb_fini(); } );
        const std::string pool_name = params[ "pool" ].as< std::string >();
        if( create_kvdb )
          HSE_SAFE_CALL( hse_kvdb_make( pool_name.c_str(), nullptr ) );
        hse_kvdb *raw_kvdb = nullptr;
        HSE_SAFE_CALL( hse_kvdb_open( pool_name.c_str(), nullptr, &raw_kvdb ) );
        std::shared_ptr< hse_kvdb > kvdb( raw_kvdb, [context]( hse_kvdb *p ) { if( p ) hse_kvdb_close( p ); } );
        const std::string kvs_name = params[ "kvs" ].as< std::string >();
        if( create_kvs )
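The sample code uses a std::shared_ptr< void > that owns nothing but carries a deleter, and captures it in the deleter of the kvdb handle so that library teardown runs last. A minimal sketch of that ordering trick follows. lib_init, lib_fini, db, db_close, and run are hypothetical names standing in for the HSE calls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records the order of calls so it can be inspected afterwards.
static std::vector< std::string > g_events;

// Hypothetical stand-ins for library-wide init/fini and a per-object close.
inline void lib_init() { g_events.push_back( "init" ); }
inline void lib_fini() { g_events.push_back( "fini" ); }
struct db {};
inline void db_close( db *p ) { g_events.push_back( "close" ); delete p; }

inline void run() {
  lib_init();
  // Owns no object; it exists only so that its deleter calls lib_fini() at the end.
  std::shared_ptr< void > context( nullptr, []( void* ) { lib_fini(); } );
  // The deleter holds a copy of `context`, so the context outlives this handle
  // and db_close always runs before lib_fini.
  std::shared_ptr< db > handle( new db, [context]( db *p ) { if( p ) db_close( p ); } );
}
```

A shared_ptr constructed from nullptr plus a deleter still invokes the deleter when its last owner is destroyed, which is what makes the guard work. The capture matters when handles are copied elsewhere and may outlive the scope that called the init function.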
  99. HSE API

    hse_kvdb_make creates a kvdb in the specified mpool, and hse_kvdb_open opens it. A kvdb can contain multiple kvs (tables).

    [Figure: an mpool containing a kvdb, which contains several kvs, each holding key/data pairs; the kvdb is highlighted]

        std::shared_ptr< void > context( nullptr, []( void* ) { hse_kvdb_fini(); } );
        const std::string pool_name = params[ "pool" ].as< std::string >();
        if( create_kvdb )
          HSE_SAFE_CALL( hse_kvdb_make( pool_name.c_str(), nullptr ) );
        hse_kvdb *raw_kvdb = nullptr;
        HSE_SAFE_CALL( hse_kvdb_open( pool_name.c_str(), nullptr, &raw_kvdb ) );
        std::shared_ptr< hse_kvdb > kvdb( raw_kvdb, [context]( hse_kvdb *p ) { if( p ) hse_kvdb_close( p ); } );
        const std::string kvs_name = params[ "kvs" ].as< std::string >();
        if( create_kvs )
          HSE_SAFE_CALL( hse_kvdb_kvs_make( kvdb.get(), kvs_name.c_str(), nullptr ) );
        hse_kvs *raw_kvs;
        HSE_SAFE_CALL( hse_kvdb_kvs_open( kvdb.get(), kvs_name.c_str(), nullptr, &raw_kvs ) );
  100. HSE API

    hse_kvdb_kvs_make creates a kvs in the specified kvdb, and hse_kvdb_kvs_open opens it.

    [Figure: an mpool containing a kvdb containing a kvs with key/data pairs; the kvs is highlighted]

        std::shared_ptr< hse_kvdb > kvdb( raw_kvdb, [context]( hse_kvdb *p ) { if( p ) hse_kvdb_close( p ); } );
        const std::string kvs_name = params[ "kvs" ].as< std::string >();
        if( create_kvs )
          HSE_SAFE_CALL( hse_kvdb_kvs_make( kvdb.get(), kvs_name.c_str(), nullptr ) );
        hse_kvs *raw_kvs;
        HSE_SAFE_CALL( hse_kvdb_kvs_open( kvdb.get(), kvs_name.c_str(), nullptr, &raw_kvs ) );
        std::shared_ptr< hse_kvs > kvs( raw_kvs, [kvdb]( hse_kvs *p ) { if( p ) hse_kvdb_kvs_close( p ); } );
        hse_kvdb_opspec os;
        HSE_KVDB_OPSPEC_INIT( &os );
        std::shared_ptr< hse_kvdb_txn > transaction(
          hse_kvdb_txn_alloc( kvdb.get() ),
          [kvdb]( hse_kvdb_txn *p ) { if( p ) hse_kvdb_txn_free( kvdb.get(), p ); }
        );
        os.kop_txn = transaction.get();
        HSE_SAFE_CALL( hse_kvdb_txn_begin( kvdb.get(), os.kop_txn ) );
  101. HSE API

    hse_kvdb_txn_alloc creates a new transaction. To discard it, call hse_kvdb_txn_free.

    [Figure: root, root(1), and the log; the new transaction is highlighted]

        std::shared_ptr< hse_kvs > kvs( raw_kvs, [kvdb]( hse_kvs *p ) { if( p ) hse_kvdb_kvs_close( p ); } );
        hse_kvdb_opspec os;
        HSE_KVDB_OPSPEC_INIT( &os );
        std::shared_ptr< hse_kvdb_txn > transaction(
          hse_kvdb_txn_alloc( kvdb.get() ),
          [kvdb]( hse_kvdb_txn *p ) { if( p ) hse_kvdb_txn_free( kvdb.get(), p ); }
        );
        os.kop_txn = transaction.get();
        HSE_SAFE_CALL( hse_kvdb_txn_begin( kvdb.get(), os.kop_txn ) );
        for( const auto &v: put_value ) {
          HSE_SAFE_CALL( hse_kvs_put( kvs.get(), &os, v.first.data(), v.first.size(), v.second.data(), v.second.size() ) );
        }
        for( const auto &v: get_value ) {
          std::array< char, 100 > data{ 0 };
          bool found = false;
          size_t length = 0;
          HSE_SAFE_CALL( hse_kvs_get( kvs.get(), &os, v.data(), v.size(), &found, data.data(), data.size(), &length ) );
  102. HSE API

    hse_kvdb_txn_begin starts the transaction. hse_kvs_put writes a key/value pair.

    [Figure: root, root(1) with key/data pairs attached, and the log; the put is highlighted]

        hse_kvdb_txn_free( kvdb.get(), p ); } );
        os.kop_txn = transaction.get();
        HSE_SAFE_CALL( hse_kvdb_txn_begin( kvdb.get(), os.kop_txn ) );
        for( const auto &v: put_value ) {
          HSE_SAFE_CALL( hse_kvs_put( kvs.get(), &os, v.first.data(), v.first.size(), v.second.data(), v.second.size() ) );
        }
        for( const auto &v: get_value ) {
          std::array< char, 100 > data{ 0 };
          bool found = false;
          size_t length = 0;
          HSE_SAFE_CALL( hse_kvs_get( kvs.get(), &os, v.data(), v.size(), &found, data.data(), data.size(), &length ) );
          if( found ) std::cout << v << "=" << data.data() << std::endl;
        }
        if( abort_transaction ) {
          HSE_SAFE_CALL( hse_kvdb_txn_abort( kvdb.get(), os.kop_txn ) );
  103. HSE API

    hse_kvs_get retrieves the value for a key.

    [Figure: root(1) with key/data pairs, root, and the log]

        v.first.size(), v.second.data(), v.second.size() ) );
        }
        for( const auto &v: get_value ) {
          std::array< char, 100 > data{ 0 };
          bool found = false;
          size_t length = 0;
          HSE_SAFE_CALL( hse_kvs_get( kvs.get(), &os, v.data(), v.size(), &found, data.data(), data.size(), &length ) );
          if( found ) std::cout << v << "=" << data.data() << std::endl;
        }
        if( abort_transaction ) {
          HSE_SAFE_CALL( hse_kvdb_txn_abort( kvdb.get(), os.kop_txn ) );
        }
        else {
          HSE_SAFE_CALL( hse_kvdb_txn_commit( kvdb.get(), os.kop_txn ) );
        }
  104. HSE API

    hse_kvdb_txn_commit makes the writes permanent. hse_kvdb_txn_abort cancels the writes made so far.

    [Figure: root(1) with key/data pairs, root, and a log containing two "insert value" records; the commit is highlighted]

        v.size(), &found, data.data(), data.size(), &length ) );
          if( found ) std::cout << v << "=" << data.data() << std::endl;
        }
        if( abort_transaction ) {
          HSE_SAFE_CALL( hse_kvdb_txn_abort( kvdb.get(), os.kop_txn ) );
        }
        else {
          HSE_SAFE_CALL( hse_kvdb_txn_commit( kvdb.get(), os.kop_txn ) );
        }
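The begin, put, get, commit-or-abort flow can be modeled with a toy in-memory store. This is only a sketch of the visible behavior, assuming a single writer with no concurrency and no durability. toy_kvs and its methods are invented for illustration and say nothing about how HSE implements transactions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: one writer, no concurrency, no durability.
class toy_kvs {
public:
  class txn {
  public:
    explicit txn( toy_kvs &store ) : store_( store ) {}
    // Writes are buffered in the transaction until commit.
    void put( const std::string &key, const std::string &value ) { pending_[ key ] = value; }
    // A transaction sees its own uncommitted writes first, then committed data.
    bool get( const std::string &key, std::string &out ) const {
      const auto p = pending_.find( key );
      if( p != pending_.end() ) { out = p->second; return true; }
      return store_.get( key, out );
    }
    void commit() {
      for( const auto &kv: pending_ ) store_.data_[ kv.first ] = kv.second;
      pending_.clear();
    }
    void abort() { pending_.clear(); }   // drop every buffered write
  private:
    toy_kvs &store_;
    std::map< std::string, std::string > pending_;
  };
  // Reads committed data only.
  bool get( const std::string &key, std::string &out ) const {
    const auto p = data_.find( key );
    if( p == data_.end() ) return false;
    out = p->second;
    return true;
  }
private:
  std::map< std::string, std::string > data_;
};
```

In this model a put followed by abort leaves the store unchanged, while a put followed by commit makes the value visible to readers outside the transaction.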
  105. Heterogeneous-Memory Storage Engine

    HSE aims to support several different kinds of storage device through a common interface:
    1. Conventional SSDs
    2. NVMe SSDs with Zoned Namespace
    3. Non-volatile memory devices
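  106. Heterogeneous-Memory Storage Engine

    HSE aims to support several different kinds of storage device through a common interface:
    1. Conventional SSDs: available as of version 1.7
    2. NVMe SSDs with Zoned Namespace: not yet implemented
    3. Non-volatile memory devices: not yet implemented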
  107. Heterogeneous-Memory Storage Engine

    Non-volatile memory devices are covered in a talk prepared for an earlier Kernel/VM meeting, so please refer to that: https://speakerdeck.com/fadis/dian-yuan-woqie-tutemoxiao-enaimemoritofalsefu-kihe-ifang
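  108. Zoned Namespace

    As SSD capacity grows, the address translation table grows with it. The table needs RAM amounting to roughly 1/1000 of the SSD's capacity, so the controller of a large SSD has to carry a large amount of RAM. That hurts.

    [Figure: blocks 0, 1, and 2 holding pages 0 to 7 plus empty pages, with a translation table entry 2->8]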
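The 1/1000 figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes one table entry per 4 KiB page and 4 bytes per entry. Both numbers are assumptions chosen for illustration and are not stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Assumed figures (not from the slides): one entry per 4 KiB page, 4 bytes per entry.
constexpr std::uint64_t page_size = 4096;
constexpr std::uint64_t entry_size = 4;

// RAM needed for a flat page-level translation table covering `capacity` bytes.
constexpr std::uint64_t table_bytes( std::uint64_t capacity ) {
  return capacity / page_size * entry_size;
}
```

Under these assumptions the ratio is 4 / 4096, or 1/1024, so a 1 TiB drive would need about 1 GiB of table, which is in line with the slide's estimate.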
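  109. Zoned Namespace

    Translating addresses in page-sized units makes the translation table too large and leaves blocks that are only partially TRIMmed. Instead, track in block-sized units only where each block was allocated and how far from its start it has been written. TRIM always applies to a whole block.

    [Figure: blocks 0, 1, and 2 holding pages 0 to 7 plus empty pages]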
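  110. Zoned Namespace

    Use huge blocks on the order of 100 MB. A block can be appended to as long as it has free space. To erase data that has been written, the whole block must be deleted. This reduces the work of the FTL and exposes some of NAND's constraints directly to the application.

    [Figure: blocks 0, 1, and 2, two of them marked as in use]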
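The append-only rule and whole-block erase can be captured in a small model. The class below is a toy sketch, not an NVMe interface. zone, append, and reset are hypothetical names, and a real zone would be on the order of 100 MB, not a few bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one zone: data can only be appended at the write pointer,
// and the only way to reclaim space is to reset the whole zone.
class zone {
public:
  explicit zone( std::size_t capacity ) : capacity_( capacity ) { data_.reserve( capacity ); }
  // Appends bytes and returns the offset they were written at,
  // or capacity() if the zone does not have enough room left.
  std::size_t append( const std::vector< char > &bytes ) {
    if( bytes.size() > capacity_ - data_.size() ) return capacity_;
    const std::size_t at = data_.size();
    data_.insert( data_.end(), bytes.begin(), bytes.end() );
    return at;
  }
  void reset() { data_.clear(); }                 // discard everything at once
  std::size_t write_pointer() const { return data_.size(); }
  std::size_t capacity() const { return capacity_; }
private:
  std::size_t capacity_;
  std::vector< char > data_;
};
```

The device-side state per zone is just the write pointer, which is why the translation table shrinks so much compared with per-page mapping.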
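  111. dm-zoned

    dm-zoned is Linux's support for Zoned Namespace. It makes the device look as if it had 4KiB pages.

    [Figure: I/O stack. User space: MySQL (InnoDB), MongoDB (WiredTiger). Kernel space: VFS, file system, device driver, page cache, bio, dm-zoned. Hardware: Flash Translation Layer, NAND flash memory. Page sizes are labeled in KiB above dm-zoned and in MiB below it.]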
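  112. One more log-structured layer

    Adding dm-zoned adds another log-structured layer to the stack.

    [Figure: the same I/O stack (MySQL/InnoDB, MongoDB/WiredTiger, VFS, file system, device driver, page cache, bio, dm-zoned, Flash Translation Layer, NAND flash memory) with four layers labeled "log"]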
  113. HSE's aim

    [Figure: the same I/O stack (MySQL/InnoDB, MongoDB/WiredTiger, VFS, file system, device driver, page cache, bio, dm-zoned, Flash Translation Layer, NAND flash memory) with HSE and mpool added alongside; page sizes are labeled in KiB, MiB, and MiB]
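  114. Summary

    SSDs are tsundere. To get good performance out of them, the way they are used has to be rethought starting from the kernel layers. HSE is a KVS that did exactly this and achieved high performance.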