How to Live with Memory That Does Not Vanish When the Power Is Cut

Fadis
October 19, 2019

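An explanation of how to work with NVDIMM, a next-generation storage device that can be written like memory and whose contents persist.
These are the slides for a talk that was scheduled for October 19, 2019 at Kernel/VM Tankentai@Hokuriku #5 (cancelled because of a typhoon).
Sample code: https://github.com/Fadis/kernelvm_20191019_samples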


Transcript

  1. How to Live with Memory That Does Not Vanish When the Power Is Cut. NAOMASA MATSUBAYASHI. Sample code: https://github.com/Fadis/kernelvm_20191019_samples

  2. Registers, L1 cache, L2 cache, L3 cache, DRAM, SSD, hard disk.
    Toward the register end: fast, with a high cost per byte. Toward the hard-disk end: slow, with a low cost per byte.
  3. [Figure: capacity [bytes] (10^0 to 10^13) plotted against write latency [seconds] (10^-10 to 10^-1) for registers, L1 cache, L2 cache, L3 cache, DRAM, SSD, and hard disk.]
    Registers through DRAM are not persistent; SSD and hard disk are persistent.
  4. [Figure: the same capacity versus write-latency plot.]
    Persistence = it takes an enormous amount of time.
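  5. How flash memory works
    [Figure: cell cross-section with two N regions in a P layer, a tunnel oxide, a floating gate, an insulating film, and a control gate.]
    Charge cannot pass through the insulating film; it does pass through the tunnel oxide when pushed hard enough.
    The structure is like a field-effect transistor with a floating gate sandwiched into it.
  6. Control gate at 20 V, everything else at 0 V: apply a high voltage to the control gate and ground the rest, and charge punches through the tunnel oxide and accumulates in the floating gate.
  7. Control gate at 5 V, charge stored in the floating gate: even with a modest voltage on the control gate, the electrons near the P layer are not depleted, so no channel forms.
    Let Vh be the control-gate voltage required for current to flow through the N-P-N path in this state.
  8. Control gate at 0 V, everything else at 20 V: ground the control gate and apply a high voltage to the rest, and the charge drains out of the floating gate.
  9. Control gate at 5 V, no charge stored in the floating gate: when a voltage is applied to the control gate, the electrons near the P layer are depleted, the electrons of the P layer are pulled over, and a channel forms.
    Let Vl be the control-gate voltage required for current to flow through the N-P-N path in this state.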
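  10. Connecting many cells in series
    Advantage: higher integration density than wiring each cell individually.
    Drawback: cells connected in series cannot be rewritten individually.
  11. Connecting many cells in series: apply Vl to the cell you want to read and Vh to all the others, and the value of the target cell can be read as a resistance.
    [Figure: five cells in series, with Vl on cell 3 and Vh on cells 1, 2, 4, and 5.] The cells at Vh conduct no matter what; the cell at Vl conducts depending on its state. The value of cell 3 is read.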
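The read rule on slide 11 can be sketched as a small simulation. This is only a toy model under stated assumptions: the voltage values and the function names `cell_conducts` and `string_conducts_when_reading` are hypothetical, and a cell is reduced to a single threshold (Vh when charged, Vl when not).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical voltage levels; only their ordering (Vl < Vh) matters here.
constexpr double Vl = 1.0;  // turns on a cell WITHOUT stored charge
constexpr double Vh = 5.0;  // turns on a cell WITH stored charge

// A cell conducts when its control-gate voltage reaches its threshold.
// Charge in the floating gate raises the threshold from Vl to Vh.
bool cell_conducts( bool charged, double gate_voltage ) {
  return gate_voltage >= ( charged ? Vh : Vl );
}

// Current flows through a series string only if every cell conducts.
// Vl goes to the target cell and Vh to all the others, so the result
// depends only on the target: true means "no charge stored there".
bool string_conducts_when_reading( const std::vector< bool > &charged, std::size_t target ) {
  assert( target < charged.size() );
  for( std::size_t i = 0; i != charged.size(); ++i ) {
    const double gate = ( i == target ) ? Vl : Vh;
    if( !cell_conducts( charged[ i ], gate ) ) return false;
  }
  return true;
}
```

Whatever the other cells hold, they always conduct at Vh, which is why one string can be read cell by cell even though it shares a single current path.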
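  12. The charge pump that generates the high voltage (20 V) for writing cannot, by its principle of operation, respond quickly.
    [Figure: charge-pump stages labelled V, V, 0, and 2V; a clock switches them between storing and releasing charge.] This is repeated until the required voltage is reached.
    Expecting low latency from flash memory is unreasonable.
  13. The charge pump that generates the high voltage (20 V) for writing cannot, by its principle of operation, respond quickly.
    To rewrite only some of the cells in a series string, the values of all the cells have to be read out and written back.
    Expecting low latency from flash memory is unreasonable.
  14. Synchronous I/O versus asynchronous I/O (from "How to Live with Extremely Fast Storage", Kernel/VM Tankentai@Kansai #9).
    Today's SSDs cover for flash latency by performing a large number of writes at the same time, so they do not perform well unless there is a lot to write.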

  15. [Figure: the capacity versus write-latency plot, with a "?" between DRAM and SSD.]
    We want something persistent around here. Research into non-volatile memories to replace flash memory has been under way on many fronts.
  16. [Figure: the same plot, with NVDIMM placed between DRAM and SSD.]
    Intel commercialized Optane DC Persistent Memory, which adopts a non-volatile memory that replaces flash memory.
  17. Comparison of NVMe SSD, Intel Optane DC, and DRAM
    Latency: NVMe SSD about 300 µs; Intel Optane DC about 500 ns; DRAM about 50 ns.
    Persistence: NVMe SSD persistent; Intel Optane DC persistent; DRAM not persistent.
    Cost per capacity: NVMe SSD about 6,000 yen per 128 GB; Intel Optane DC about 50,000 yen per 128 GB; DRAM about 400,000 yen per 128 GB.
    Write unit: NVMe SSD can be written only in pages; Intel Optane DC can be written in cache lines; DRAM can be written in cache lines.
  18. [Figure: cell materials labelled Ge1Sb2Te4 and SeAsGeSi.]
    Intel has not published the details of 3D XPoint, the non-volatile memory used in Optane DC, but survey results from a company specializing in semiconductor analysis have been presented:
    "3D XPoint: Current Implementations and Future Trends", https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2017/Proceedings_Chrono_2017.html
  19. SeAsGeSi: an ovonic threshold switch, a material that shows a high resistance only while the voltage is below a certain level. It keeps leakage current from making unintended cells respond.
  20. Ge1Sb2Te4: probably a superlattice phase-change memory made by stacking GeTe and Sb2Te3, a material that behaves in one of three ways depending on the applied voltage:
    • it does not change (the current state can be read out)
    • it changes to amorphous (high resistance)
    • it changes to crystalline (low resistance)
  21. Key points
    All operations are performed over two lines, so high density is possible without sacrificing individual rewrites the way flash memory does.
    Writing is possible at a low voltage of 1 V or less, so no slow mechanism for obtaining a high voltage is needed.
    As a result, flash-class capacity, low latency approaching that of DRAM, and persistence are all achieved.
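  22. The problem: how should the OS present this device to user space? As memory? As a block device?
  23. It is a device that plugs into a DIMM socket alongside DRAM, but because its data is persistent, existing applications want to put files on it.
        $ ls /dev/pmem0 -lha
        brw-rw---- 1 root disk 259, 1 10月  2 03:15 /dev/pmem0
    On Linux, when an NVDIMM is installed, a block device shows up for a start.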
  24. Create a file system, mount it, and read and write.
        $ mkfs.xfs /dev/pmem0
        meta-data=/dev/pmem0   isize=512    agcount=4, agsize=128896 blks
                 =             sectsz=4096  attr=2, projid32bit=1
                 =             crc=1        finobt=1, sparse=1, rmapbt=0
                 =             reflink=0
        data     =             bsize=4096   blocks=515584, imaxpct=25
                 =             sunit=0      swidth=0 blks
        naming   =version 2    bsize=4096   ascii-ci=0, ftype=1
        log      =internal log bsize=4096   blocks=2560, version=2
                 =             sectsz=4096  sunit=1 blks, lazy-count=1
        realtime =none         extsz=4096   blocks=0, rtextents=0
        $ mount -t xfs /dev/pmem0 /mnt/pmem/
        $ dmesg
        (snip)
        [1506131.089817] XFS (pmem0): Mounting V5 Filesystem
        [1506131.094488] XFS (pmem0): Ending clean mount
        $ cd /mnt/pmem/
        $ echo 'Hello, World' >hoge
        $ ls
        hoge
        $ mount|grep pmem0
        /dev/pmem0 on /mnt/pmem type xfs (rw,relatime,attr2,inode64,noquota)
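  25. On Linux: from requesting a file write until it is written to the hard disk
    User space
      Application: requests the write.
    Kernel space
      VFS: abstracts the differences between file systems.
      Page cache: holds the pages to be written until a certain amount has accumulated.
      File system: decides where on the storage to write.
      bio: abstracts the differences between hardware.
      I/O scheduler: reorders the requests so they can be written efficiently.
      Device driver: performs the write to the actual device.
  26. [Figure: the same stack of application, VFS, file system, I/O scheduler, device driver, page cache, and bio.]
    Write order does not affect write speed, so scheduling is unnecessary. This was already omitted for NVMe as well.
  27. Page cache
    The CPU cannot read or write the contents of a hard disk directly.
    The needed data is copied temporarily (a copy of part of the disk contents), the CPU reads the copied data and rewrites the data in DRAM, and the rewritten contents are synchronized to the disk (the persistent data).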
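  28. mmap: map the page cache into the virtual address space of a user-space process so that it can be read and written.
    [Figure: the persistent data, a copy of part of the disk contents, and the virtual address space of the process.]
        void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);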
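As a minimal sketch of how the call above is typically used on POSIX systems, the helpers below write a string through a shared mapping, flush it with msync, and read it back through a second mapping. The names `write_via_mmap` and `read_via_mmap` are hypothetical and are not taken from the sample repository; the file is assumed to be at least `file_size` bytes long when it is read.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Writes text (plus a terminating '\0') at the start of the file through
// a shared mapping, then asks the kernel to push it to storage.
bool write_via_mmap( const char *path, const std::string &text, std::size_t file_size ) {
  assert( text.size() < file_size );
  const int fd = open( path, O_RDWR | O_CREAT, 0644 );
  if( fd < 0 ) return false;
  if( ftruncate( fd, static_cast< off_t >( file_size ) ) != 0 ) {
    close( fd );
    return false;
  }
  void *raw = mmap( nullptr, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );
  close( fd );  // the mapping stays valid after the descriptor is closed
  if( raw == MAP_FAILED ) return false;  // failure is MAP_FAILED, not nullptr
  std::memcpy( raw, text.c_str(), text.size() + 1 );
  const bool synced = msync( raw, file_size, MS_SYNC ) == 0;
  munmap( raw, file_size );
  return synced;
}

// Maps the file read-only and returns the NUL-terminated string at its start.
std::string read_via_mmap( const char *path, std::size_t file_size ) {
  const int fd = open( path, O_RDONLY );
  if( fd < 0 ) return {};
  void *raw = mmap( nullptr, file_size, PROT_READ, MAP_SHARED, fd, 0 );
  close( fd );
  if( raw == MAP_FAILED ) return {};
  const char *chars = static_cast< const char* >( raw );
  std::string result( chars, strnlen( chars, file_size ) );
  munmap( raw, file_size );
  return result;
}
```

One detail worth noting: mmap signals failure with `MAP_FAILED`, so a comparison against `nullptr` does not detect errors.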
  29. Open the file, write the value at the address obtained from mmap, then msync.
        // Excerpt: filename, new_file, file_size, mapped_length, new_value,
        // new_value_size, vm, and the unmap deleter are defined outside the part shown.
        const auto fd = open( filename.c_str(), new_file ? O_RDWR|O_CREAT : O_RDWR, 0644 );
        if( fd < 0 ) {
          std::cerr << strerror( errno ) << std::endl;
          return 1;
        }
        if( new_file ) ftruncate( fd, file_size );
        const auto raw = mmap( nullptr, file_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0 );
        if( raw == MAP_FAILED ) { // mmap reports failure with MAP_FAILED, not nullptr
          std::cerr << strerror( errno ) << std::endl;
          return 1;
        }
        std::unique_ptr< char, unmap > mapped( reinterpret_cast< char* >( raw ), unmap( mapped_length ) );
        if( vm.count( "write" ) ) {
          std::copy( new_value.begin(), new_value.end(), mapped.get() );
          mapped.get()[ new_value_size ] = '\0';
          msync( mapped.get(), file_size, MS_SYNC ); // push the modified pages to storage
        }
        else std::cout << mapped.get() << std::endl;
  30. Part of the kernel log at boot. The whole NVDIMM region is laid out in the physical address space: [mem 0x0000000380000000-0x00000003ffffffff] persistent (type 12)
        [ 0.000000] reserve setup_data: [mem 0x0000000000058000-0x0000000000058fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x0000000000059000-0x000000000009efff] usable
        [ 0.000000] reserve setup_data: [mem 0x000000000009f000-0x000000000009ffff] reserved
        [ 0.000000] reserve setup_data: [mem 0x0000000000100000-0x000000009c4d6017] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c4d6018-0x000000009c4e6c57] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c4e6c58-0x000000009c4e7017] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c4e7018-0x000000009c4f7057] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c4f7058-0x000000009c4f8017] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c4f8018-0x000000009c518057] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009c518058-0x000000009dc65fff] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009dc66000-0x000000009dc92fff] ACPI data
        [ 0.000000] reserve setup_data: [mem 0x000000009dc93000-0x000000009f7f7fff] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009f7f8000-0x000000009f7f8fff] ACPI NVS
        [ 0.000000] reserve setup_data: [mem 0x000000009f7f9000-0x000000009f822fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x000000009f823000-0x000000009f8c7fff] usable
        [ 0.000000] reserve setup_data: [mem 0x000000009f8c8000-0x00000000a03d8fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000a03d9000-0x00000000a5952fff] usable
        [ 0.000000] reserve setup_data: [mem 0x00000000a5953000-0x00000000a705afff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000a705b000-0x00000000a707cfff] ACPI data
        [ 0.000000] reserve setup_data: [mem 0x00000000a707d000-0x00000000a7236fff] usable
        [ 0.000000] reserve setup_data: [mem 0x00000000a7237000-0x00000000a786ffff] ACPI NVS
        [ 0.000000] reserve setup_data: [mem 0x00000000a7870000-0x00000000a7ffefff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000a7fff000-0x00000000a7ffffff] usable
        [ 0.000000] reserve setup_data: [mem 0x00000000a8000000-0x00000000a80fffff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
        [ 0.000000] reserve setup_data: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
        [ 0.000000] reserve setup_data: [mem 0x0000000100000000-0x000000037fffffff] usable
        [ 0.000000] reserve setup_data: [mem 0x0000000380000000-0x00000003ffffffff] persistent (type 12)
        [ 0.000000] reserve setup_data: [mem 0x0000000400000000-0x000000044dffffff] usable
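To make the highlighted log line concrete, here is a small sketch that pulls the persistent range out of one such line and computes its size. The type `mem_range` and the functions `parse_persistent_range` and `range_size` are hypothetical names for illustration; the only assumption about the input is the line format shown in the log above.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>

struct mem_range {
  std::uint64_t first;  // first physical address of the range
  std::uint64_t last;   // last physical address of the range (inclusive)
};

// Extracts "[mem 0x...-0x...] persistent" from one kernel log line.
// Returns false when the line does not describe a persistent range.
bool parse_persistent_range( const char *line, mem_range &out ) {
  assert( line );
  const char *mem = std::strstr( line, "[mem " );
  if( !mem || !std::strstr( mem, "] persistent" ) ) return false;
  return std::sscanf( mem, "[mem 0x%" SCNx64 "-0x%" SCNx64 "]", &out.first, &out.last ) == 2;
}

// Both ends are inclusive, hence the + 1.
std::uint64_t range_size( const mem_range &r ) {
  return r.last - r.first + 1;
}
```

For the line in this log the range is 0x80000000 bytes, that is 2 GiB.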
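  31. The CPU can read and write the contents of an NVDIMM directly.
    When the latency of the NVDIMM is close to the latency of DRAM, copying the data on the NVDIMM into the page cache (a copy of part of the disk contents) is a waste.
  32. Filesystem DAX: at mmap time, map the physical addresses where the file resides directly into the virtual address space of the process.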

  33. Enabling Filesystem DAX: on a file system that supports Filesystem DAX, pass -o dax at mount time.
        $ mount -t xfs -o dax /dev/pmem0 /mnt/pmem/
        $ dmesg
        (snip)
        [1686537.353077] XFS (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
        [1686537.356044] XFS (pmem0): Mounting V5 Filesystem
        [1686537.361297] XFS (pmem0): Ending clean mount
        $ cd /mnt/pmem/
        $ ls
        hoge
        $ mount|grep pmem0
        /dev/pmem0 on /mnt/pmem type xfs (rw,relatime,attr2,dax,inode64,noquota)
  34. [Figure: the stack of application, VFS, file system, page cache, bio, and device driver.]
    Reads and writes to the mmap-ed region bypass the kernel's block layer and go directly to the device.
  35. Physical address of the NVDIMM: [mem 0x0000000380000000-0x00000003ffffffff] persistent (type 12)
    Without -o dax:
        $ ./00_get_physical_address -p `pidof 00_mmap` -f /mnt/pmem/fuga
        /mnt/pmem/fuga: VirtualAddress=0x7f9e0086b000 PhysicalAddress=0x41d1d4000
    The physical address that corresponds to the virtual address returned by mmap points somewhere outside the NVDIMM.
    With -o dax:
        $ ./00_get_physical_address -p `pidof 00_mmap` -f /mnt/pmem/fuga
        /mnt/pmem/fuga: VirtualAddress=0x7fae9df25000 PhysicalAddress=0x38220d000
    The physical address that corresponds to the virtual address returned by mmap points to a location 35,704,832 bytes from the start of the NVDIMM.
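The two results on slide 35 can be checked with plain address arithmetic. The sketch below hard-codes the persistent range from the kernel log; `in_nvdimm` and `nvdimm_offset` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Persistent range reported by the kernel log on this machine.
constexpr std::uint64_t nvdimm_first = 0x0000000380000000ull;
constexpr std::uint64_t nvdimm_last  = 0x00000003ffffffffull;

// True when the physical address lies inside the NVDIMM range.
constexpr bool in_nvdimm( std::uint64_t phys ) {
  return nvdimm_first <= phys && phys <= nvdimm_last;
}

// Byte offset of a physical address from the start of the NVDIMM.
std::uint64_t nvdimm_offset( std::uint64_t phys ) {
  assert( in_nvdimm( phys ) );
  return phys - nvdimm_first;
}
```

With these definitions, `in_nvdimm( 0x41d1d4000 )` is false (the page-cache copy lives in ordinary DRAM), while `nvdimm_offset( 0x38220d000 )` gives 0x220d000, which is the 35,704,832 bytes quoted on the slide.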
  36. Existing applications that use mmap bypass the kernel and become fast without any changes. Hooray! ...except that it does not work out that way.

  37. [The code from slide 29 again.] What is this msync call doing?
  38. msync
        #include <sys/mman.h>
        int msync(void *addr, size_t length, int flags);
    Reflects the modified parts of an mmap-ed region to the file system.
    [Figure: the stack of application, VFS, file system, I/O scheduler, device driver, page cache, and bio.] Data that has stopped at the page cache is carried through to storage.
    With Filesystem DAX, writes bypass the page cache and go directly to the device, so is msync not unnecessary?
  39. Registers, L1 cache, L2 cache, L3 cache: there is cache memory between the CPU and the NVDIMM.
    Even when a write looks complete from the CPU, it may have stopped somewhere around here.
    If power is lost in this state, the contents that were supposedly written are lost.
  40. CLFLUSH: Flush Cache Line
    "Invalidates from every level of the cache hierarchy in the cache coherence domain the cache line that contains the linear address specified with the memory operand. If that cache line contains modified data at any level of the cache hierarchy, that data is written back to memory. The source operand is a byte memory location." (from the Intel® 64 and IA-32 Architectures Software Developer's Manual)
    Removes the cache line containing the specified address from every cache. (Please don't!)
    If that cache line contains modified data, it is written to memory.
    The processor waits until this completes. (Please don't!)
  41. [Figure: with CLFLUSH, every "write this" is followed by waiting for its "written" before the next one starts.]
    What we want: issue several "write this" requests back to back and wait once for "everything written".
  42. CLFLUSHOPT: Flush Cache Line Optimized
    "(snip) to enforce ordering with such an operation, software can insert an SFENCE instruction between CFLUSHOPT and that operation." (from the Intel® 64 and IA-32 Architectures Software Developer's Manual)
    Removes the cache line containing the specified address from every cache. (Please don't!)
    If that cache line contains modified data, it is written to memory.
    To wait for completion, issue SFENCE.
  43. CLWB: Cache Line Write Back
    "Writes back to memory the cache line (if modified) that contains the linear address specified with the memory operand from any level of the cache hierarchy in the cache coherence domain. The line may be retained in the cache hierarchy in non-modified state." (from the Intel® 64 and IA-32 Architectures Software Developer's Manual)
    Even if a cache holds the cache line containing the specified address, the line is not removed.
    If that cache line contains modified data, it is written to memory.
    To wait for completion, issue SFENCE.
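  44. When msync is requested, the kernel must CLWB the modified parts of the mmap-ed region.
    [Figure: the stack of application, VFS, file system, page cache, bio, and device driver, with a "?".] The kernel therefore has to know which pages on the NVDIMM have been modified.
  45. Page cache case, first access: the process accesses a virtual address that has no page, which raises a page fault. The kernel allocates a new page-cache page, makes it read-only, and records the correspondence between the page and its location on the device in the address space.
  46. Page cache case, write: the process writes to the read-only page, which raises a page fault. The kernel sets the dirty bit on the entry in the address space and makes the page writable.
  47. Page cache case, msync: when msync is requested, the pages in the range whose dirty bit is set are written (copied) to the device.
  48. Filesystem DAX case, first access: the process accesses a virtual address that has no page, which raises a page fault. The kernel assigns the region on the NVDIMM read-only and records the correspondence between the page and its location on the device in the address space.
  49. Filesystem DAX case, write: the process writes to the read-only page, which raises a page fault. The kernel sets the dirty bit on the entry in the address space and makes the page writable.
  50. Filesystem DAX case, msync: when msync is requested, the pages in the range whose dirty bit is set are CLWB-ed.
  51. Filesystem DAX case, msync: when msync is requested, the pages in the range whose dirty bit is set are CLWB-ed.
    Minimum page size on x86_64: 4096 bytes. Cache line size on x86_64: 64 bytes.
    A kernel that detects modifications through page faults can only track what was modified at page granularity.
    Even if only a single cache line was modified, 64 CLWBs are needed.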
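The factor of 64 on slide 51 is just 4096 / 64. The sketch below compares the number of cache lines an actual write touches with the number that has to be written back when dirtiness is only known per page. Function names are hypothetical; the sizes are the x86_64 values from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t page_size = 4096;      // minimum page size on x86_64
constexpr std::size_t cache_line_size = 64;  // cache line size on x86_64

// Number of cache lines overlapped by the byte range [addr, addr + len).
std::size_t lines_covering( std::uintptr_t addr, std::size_t len ) {
  assert( len > 0 );
  const std::uintptr_t first = addr / cache_line_size;
  const std::uintptr_t last = ( addr + len - 1 ) / cache_line_size;
  return static_cast< std::size_t >( last - first + 1 );
}

// Number of cache lines to write back when only "this page is dirty" is
// known: every line of every page the range overlaps.
std::size_t lines_flushed_with_page_tracking( std::uintptr_t addr, std::size_t len ) {
  assert( len > 0 );
  const std::uintptr_t first_page = addr / page_size;
  const std::uintptr_t last_page = ( addr + len - 1 ) / page_size;
  return static_cast< std::size_t >( last_page - first_page + 1 ) * ( page_size / cache_line_size );
}
```

A 13-byte write at a page-aligned address touches one cache line but costs 64 write-backs under page tracking; an 8-byte write that straddles a page boundary touches two lines but costs 128.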
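  52. Filesystem DAX case, msync: when msync is requested, the pages in the range whose dirty bit is set are CLWB-ed.
    The user-space application that did the rewriting knows exactly which part it wrote.
    CLWB is not a privileged instruction, so it can be issued directly from user space.
    Instead of calling msync and the like, we want to issue CLWB from user space for just the part we wrote.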
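A minimal sketch of what "CLWB just the part we wrote" looks like as a loop: visit every cache line that overlaps the written range, then fence once. To keep the sketch portable, the flush and fence operations are passed in as callbacks, and `flush_range` is a hypothetical name. On a CPU and compiler with CLWB support one would presumably pass wrappers around the `_mm_clwb` and `_mm_sfence` intrinsics from `<immintrin.h>`; that wiring is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::uintptr_t cache_line = 64;  // x86_64 cache line size

// Calls flush_line once with the start address of each cache line that
// overlaps [addr, addr + len), then calls fence once to wait for all of them.
void flush_range( const void *addr, std::size_t len,
                  const std::function< void( const void* ) > &flush_line,
                  const std::function< void() > &fence ) {
  assert( flush_line && fence );
  if( len == 0 ) return;
  const auto begin = reinterpret_cast< std::uintptr_t >( addr );
  const auto end = begin + len;
  // Round the start down to a cache-line boundary, then step line by line.
  for( auto line = begin & ~( cache_line - 1 ); line < end; line += cache_line )
    flush_line( reinterpret_cast< const void* >( line ) );
  fence();
}
```

Issuing all the write-backs first and fencing once at the end is the "wait once for everything" pattern that slide 41 asks for.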
  53. Persistent Memory Development Kit: a collection of libraries that provides the features you want in order to make use of NVDIMM, such as flushing from user space. https://pmem.io/
  54. Persistent Memory Development Kit
    [Figure: the application on top of libpmemobj++, libpmemobj, libpmemblk, libpmemlog, libvmmalloc, and libpmem.]
  55. Persistent Memory Development Kit: libpmem
    Contains functions for basic operations, such as pmem_map_file, a platform-independent wrapper around mmap, and pmem_persist, which can flush at a finer granularity than msync.
  56. Call pmem_map_file, write the value, then call pmem_persist if this is non-volatile memory or pmem_msync if it is ordinary storage.
        const auto raw = pmem_map_file( filename.c_str(), file_size, device_dax ? 0 : PMEM_FILE_CREATE, 0644, &mapped_length, &is_pmem );
        if( raw == nullptr ) {
          std::cerr << strerror( errno ) << std::endl;
          return 1;
        }
        std::unique_ptr< char, unmap_pmem > mapped( reinterpret_cast< char* >( raw ), unmap_pmem( mapped_length ) );
        if( vm.count( "write" ) ) {
          std::copy( new_value.begin(), new_value.end(), mapped.get() );
          mapped.get()[ new_value_size ] = '\0';
          if( is_pmem ) pmem_persist( mapped.get(), new_value_size ); // non-volatile memory
          else {
            if( pmem_msync( mapped.get(), new_value_size ) ) { // ordinary storage
              std::cerr << strerror( errno ) << std::endl;
              return 1;
            }
          }
        }
        else std::cout << mapped.get() << std::endl;
  57. void pmem_persist(const void *addr, size_t len);
    Writes back the cache for the specified address range, using whatever method the CPU supports. With DAX, this operation alone makes the write persistent.
    int pmem_msync(const void *addr, size_t len);
    Calls msync on the pages that contain the specified address range.
    Unlike msync, neither function requires addr to be aligned to the start of a page.
  58. [The code from slide 56 again.] This part (the std::copy): what happens if power is lost in the middle of writing it?
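  59. [Figure: "Hello, W" has already reached the NVDIMM while "orld!" is still in the cache.]
    The cache has no free space, so it flushes older writes on its own, before the CLWB for "orld!" has been issued.
    If power is lost here, the data is corrupted.
  60. 64bit: in a typical x86_64 PC, the CPU and memory are connected by a 64-bit data bus.
    Data larger than 64 bits is sent in two or more transfers.
    After a power loss, data larger than 64 bits may have been written only partway.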
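The partial-write problem on slides 59 and 60 can be illustrated with a toy model in which an overwrite proceeds in 8-byte transfers and power is lost after some of them. The function name `torn_write` is hypothetical, and real hardware gives no guarantee about the order in which cache lines reach the medium, so this only shows the simplest failure shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t bus_width_bytes = 8;  // one 64-bit transfer

// Returns what the medium holds if power is lost after
// `transfers_completed` 8-byte transfers of overwriting old_data with new_data.
std::string torn_write( const std::string &old_data, const std::string &new_data,
                        std::size_t transfers_completed ) {
  assert( old_data.size() == new_data.size() );
  const std::size_t written = std::min( transfers_completed * bus_width_bytes, new_data.size() );
  // The first `written` bytes are new; the rest still hold the old contents.
  return new_data.substr( 0, written ) + old_data.substr( written );
}
```

With a 16-byte region, losing power after one transfer leaves the first 8 bytes of the new value followed by the last 8 bytes of the old one, which is neither the old state nor the new one.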

  61. Persistent Memory Development Kit: libpmemobj
    Creates a journal for writing large data transactionally.
  62. Writing a value into the root object with libpmemobj:
        PMEMobjpool *raw_pool = create ?
          pmemobj_create( filename.c_str(), layout, file_size, 0666 ) :
          pmemobj_open( filename.c_str(), layout );
        if( !raw_pool ) {
          std::cerr << filename << ':' << strerror( errno ) << std::endl;
          return 1;
        }
        std::unique_ptr< PMEMobjpool, close_pmemobj > pool( raw_pool );
        PMEMoid root = pmemobj_root( pool.get(), sizeof( data_t ) );
        auto root_raw = reinterpret_cast< data_t* >( pmemobj_direct( root ) );
        if( !new_value.empty() ) {
          new_value.resize( std::min( new_value.size(), size_t( 1023 ) ) );
          TX_BEGIN( pool.get() ) {
            pmemobj_tx_add_range( root, offsetof( data_t, message ), sizeof( char ) * ( new_value.size() + 1 ) );
            std::copy( new_value.begin(), new_value.end(), root_raw->message );
            root_raw->message[ new_value.size() ] = '\0';
          } TX_END
        }
        else std::cout << root_raw->message << std::endl;
  63. [The code from slide 62 again.] pmemobj_create creates the superblock of the pool.
    [Figure: a pool made of a header, a log, and data. The header says "this is pmemobj"; the log says "the writes to that data and this data were cut off partway".]
  64. [The code from slide 62 again.] pmemobj_root gets the root object of the pool.
    PMEMoid: a type that represents an offset from the start of the pool.
    pmemobj_direct converts the location a PMEMoid points to into a virtual address under the current page mapping.
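The reason for storing offsets instead of raw pointers is that a pool may be mapped at a different virtual address every time it is opened. The sketch below imitates that idea with hypothetical stand-ins: `offset_ref` plays the role of PMEMoid (the real type also carries a pool identifier) and `direct` plays the role of pmemobj_direct. Nothing here uses libpmemobj itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for PMEMoid: an offset from the start of the pool.
struct offset_ref { std::size_t off; };  // 0 means "null"

// Hypothetical stand-in for pmemobj_direct: offset -> address under the
// mapping that starts at pool_base.
const void *direct( const void *pool_base, offset_ref ref ) {
  return ref.off ? static_cast< const char* >( pool_base ) + ref.off : nullptr;
}

struct node {
  char message[ 16 ];
  offset_ref next;  // link stored as an offset, never as a raw pointer
};

// Copies the node that ref designates out of the pool.
node load( const void *pool_base, offset_ref ref ) {
  assert( ref.off != 0 );
  node result;
  std::memcpy( &result, direct( pool_base, ref ), sizeof( node ) );
  return result;
}

// Builds a two-node list inside the pool: offset 64 -> offset 128.
void build_list( std::vector< char > &pool ) {
  assert( pool.size() >= 128 + sizeof( node ) );
  node first{};
  std::strcpy( first.message, "abcde" );
  first.next = offset_ref{ 128 };
  node second{};
  std::strcpy( second.message, "fghij" );
  second.next = offset_ref{ 0 };
  std::memcpy( pool.data() + 64, &first, sizeof( node ) );
  std::memcpy( pool.data() + 128, &second, sizeof( node ) );
}
```

Copying the pool bytes to a different buffer stands in for "reopening the pool at another address": the list can still be followed, because every link is resolved relative to the new base.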
  65. [The code from slide 62 again.] If execution does not reach TX_END, a range registered with pmemobj_tx_add_range returns to its state from before TX_BEGIN.
    [Figure: change A and change B between TX_BEGIN and TX_END.] The pool never ends up in a state where only change A is applied and change B is not.
  66. [The code from slide 62 again.] Once execution has run all the way to TX_END, the ranges registered with pmemobj_tx_add_range are pmem_persist-ed.
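  67. The steps of a transaction: start copying the target range into the log; mark the log valid; finish rewriting the target range; mark the log invalid; delete the log.
  68. Power loss before the log is marked valid (log: invalid): at the next pmemobj_open, an invalid log is found and deleted. The data is in its state from before the rewrite.
  69. Power loss after the log is marked valid (log: valid): at the next pmemobj_open, a valid log is found and copied back to its original location. The data returns to its state from before the rewrite.
  70. Replaying the log (log: valid): if a crash happens again while the log is being replayed, the log is still valid, so it is replayed afresh at the next pmemobj_open.
  71. Power loss while the target range is being rewritten (log: valid): at the next pmemobj_open, a valid log is found and copied back to its original location. The data returns to its state from before the rewrite.
  72. Power loss after the log is marked invalid (log: invalid): at the next pmemobj_open, an invalid log is found and deleted. The data is in its state from after the rewrite.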
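The undo-log steps on slides 67 to 72 can be exercised with a small in-memory model. This is a sketch of the general technique and not libpmemobj's actual implementation: `pool_model`, `power_loss`, `transactional_write`, and `recover` are hypothetical names, each step is assumed to be durable before the next one starts, and flipping the valid flag is assumed to be atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical model of a pool: `data` is the target range, `log` the undo log.
struct pool_model {
  std::string data;
  std::string log;
  bool log_valid = false;
};

struct power_loss : std::runtime_error {
  power_loss() : std::runtime_error( "power loss" ) {}
};

// Overwrites pool.data with new_data using the five steps from slide 67.
// crash_at_step (1..5) simulates a power loss during that step; 0 = no crash.
void transactional_write( pool_model &pool, const std::string &new_data, int crash_at_step ) {
  assert( pool.data.size() == new_data.size() );
  const std::size_t half = new_data.size() / 2;
  if( crash_at_step == 1 ) {  // 1. copy the target range into the log (torn copy)
    pool.log = pool.data.substr( 0, half );
    throw power_loss();
  }
  pool.log = pool.data;
  if( crash_at_step == 2 ) throw power_loss();  // 2. mark the log valid
  pool.log_valid = true;
  if( crash_at_step == 3 ) {  // 3. rewrite the target range (torn write)
    pool.data = new_data.substr( 0, half ) + pool.data.substr( half );
    throw power_loss();
  }
  pool.data = new_data;
  if( crash_at_step == 4 ) throw power_loss();  // 4. mark the log invalid
  pool.log_valid = false;
  if( crash_at_step == 5 ) throw power_loss();  // 5. delete the log
  pool.log.clear();
}

// What the next open does: replay a valid log, discard an invalid one.
void recover( pool_model &pool ) {
  if( pool.log_valid ) pool.data = pool.log;  // roll back to the old contents
  pool.log_valid = false;
  pool.log.clear();
}
```

Whichever step the crash hits, recovery leaves either the complete old value or the complete new value, never a mixture; in this model the new value survives only when the crash comes after the log has been marked invalid.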
  73. pmemobj_tx_*alloc allocates a region in the pool for new data. If execution does not reach TX_END, this allocation is treated as if it never happened.
        TX_BEGIN( pool.get() ) {
          pmemobj_tx_add_range( head, offsetof( data_t, next ), sizeof( PMEMoid ) );
          PMEMoid next = pmemobj_tx_zalloc( sizeof( data_t ), 0 );
          auto next_raw = reinterpret_cast< data_t* >( pmemobj_direct( next ) );
          pmemobj_tx_add_range( next, 0, sizeof( data_t ) );
          new(next_raw) data_t();
          std::copy( new_value.begin(), new_value.end(), next_raw->message );
          next_raw->message[ new_value.size() ] = '\0';
          head_raw->next = next;
        } TX_END
  74. pmemobj_tx_free releases an allocated region. If execution does not reach TX_END, this release is treated as if it never happened.
        TX_BEGIN( pool.get() ) {
          pmemobj_tx_add_range( prev, offsetof( data_t, next ), sizeof( PMEMoid ) );
          prev_raw->next = cur_raw->next;
          cur_raw->~data_t();
          pmemobj_tx_free( cur );
        } TX_END
  75. 03_pmemobj_alloc: a singly linked list in a libpmemobj pool
        #include <cstring>
        #include <iostream>
        #include <cstdint>
        #include <boost/filesystem.hpp>
        #include <boost/program_options.hpp>
        #include <libpmemobj.h>
        class close_pmemobj {
        public:
          template< typename T >
          void operator()( T *p ) const {
            if( p ) pmemobj_close( p );
          }
        };
        namespace fs = boost::filesystem;
        bool is_special_file( const fs::path &p ) {
          return fs::status( p ).type() == fs::file_type::character_file || fs::status( p ).type() == fs::file_type::block_file;
        }
        struct data_t {
          char message[ 1024 ];
          PMEMoid next;
        };
        int main( int argc, const char *argv[] ) {
          namespace po = boost::program_options;
          po::options_description desc( "Options" );
  76. data_t is a node of a singly linked list that holds a string of at most 1024 bytes and the offset of the next element. The listing continues:
          std::string new_value;
          std::string remove_value;
          uint64_t pool_size;
          constexpr const char layout[] = "90d2827d-3742-4054-aea8-7a43068085ac";
          std::string filename;
          desc.add_options()
            ( "help,h", "show this message" )
            ( "create,c", "create" )
            ( "size,s", po::value< size_t >( &pool_size )->default_value( PMEMOBJ_MIN_POOL ), "pool size" )
            ( "filename,f", po::value< std::string >( &filename )->default_value( "/dev/dax0.0" ), "filename" )
            ( "append,a", po::value< std::string >( &new_value ), "append" )
            ( "delete,d", po::value< std::string >( &remove_value ), "delete" )
            ( "list,l", "list" );
          po::variables_map vm;
          po::store( po::parse_command_line( argc, argv, desc ), vm );
          po::notify( vm );
          if( vm.count( "help" ) ) {
            std::cout << desc << std::endl;
            return 0;
          }
          size_t mapped_length = 0u;
  77. Find the tail node. (The transaction that follows is the one from slide 73.)
            pmemobj_open( filename.c_str(), layout );
          if( !raw_pool ) {
            std::cerr << filename << ':' << strerror( errno ) << std::endl;
            return 1;
          }
          std::unique_ptr< PMEMobjpool, close_pmemobj > pool( raw_pool );
          PMEMoid root = pmemobj_root( pool.get(), sizeof( data_t ) );
          auto root_raw = reinterpret_cast< data_t* >( pmemobj_direct( root ) );
          if( !new_value.empty() ) {
            auto head = root;
            auto head_raw = root_raw;
            while( 1 ) { // walk the list until there is no next node
              auto next = reinterpret_cast< data_t* >( pmemobj_direct( head_raw->next ) );
              if( next ) {
                head = head_raw->next;
                head_raw = next;
              }
              else break;
            }
            new_value.resize( std::min( new_value.size(), size_t( 1023 ) ) );
  78. [The transaction from slide 73 again, annotated.] Add the next field of the tail node to the log as a range to be modified; create a new node; add the new node to the log as a range to be modified; write the value into the new node; link it to the next field of the tail node. All of these operations are performed between TX_BEGIN and TX_END.
  79. Removal starts by searching for the node whose message matches:
          if( !remove_value.empty() ) {
            auto prev = root;
            auto prev_raw = root_raw;
            auto cur = prev_raw->next;
            auto cur_raw = reinterpret_cast< data_t* >( pmemobj_direct( cur ) );
            while( cur_raw ) {
              if( strcmp( cur_raw->message, remove_value.data() ) == 0 ) {
                break;
              }
              auto next = reinterpret_cast< data_t* >( pmemobj_direct( cur_raw->next ) );
              if( next ) {
                prev = cur;
                cur = cur_raw->next;
                prev_raw = cur_raw;
                cur_raw = next;
              }
              else {
                std::cerr << "Not found." << std::endl;
                return 1;
              }
    Running it: create the pool, append three times, then delete.
        $ ./03_pmemobj_alloc -c -f test -s 67108864
        $ ./03_pmemobj_alloc -f test -a abcde -l
        abcde
        $ ./03_pmemobj_alloc -f test -a fghij -l
        abcde
        fghij
        $ ./03_pmemobj_alloc -f test -a klmno -l
        abcde
        fghij
        klmno
        $ ./03_pmemobj_alloc -f test -d fghij -l
        abcde
        klmno
  80. Persistent Memory Development Kit: libpmemobj++
    A C++ wrapper for libpmemobj.
  81. 04_pmemobj++: the same linked list with libpmemobj++
        #include <array>   // std::array, used by data_t
        #include <cstring> // strcmp and strlen, used when deleting
        #include <iostream>
        #include <cstdint>
        #include <libpmemobj++/p.hpp>
        #include <libpmemobj++/pool.hpp>
        #include <libpmemobj++/transaction.hpp>
        #include <libpmemobj++/persistent_ptr.hpp>
        #include <libpmemobj++/make_persistent.hpp>
        #include <libpmemobj++/make_persistent_array.hpp>
        #include <boost/filesystem.hpp>
        #include <boost/program_options.hpp>
        using pmem::obj::p;
        using pmem::obj::persistent_ptr;
        struct data_t {
          persistent_ptr< data_t > next;
          p< std::array< char, 1024 > > data;
        };
        namespace fs = boost::filesystem;
        bool is_special_file( const fs::path &p ) {
          return fs::status( p ).type() == fs::file_type::character_file || fs::status( p ).type() == fs::file_type::block_file;
        }
        int main( int argc, const char *argv[] ) {
          namespace po = boost::program_options;
          po::options_description desc( "Options" );
          std::string new_value = "";
          std::string remove_value = "";
          uint64_t pool_size;
  82. data_t is a node of a singly linked list that holds a string of at most 1024 bytes and the offset of the next element. The listing continues:
          constexpr const char layout[] = "dd58d49d-4be6-44e0-b160-37e79d94ecf8";
          std::string filename;
          desc.add_options()
            ( "help,h", "show this message" )
            ( "create,c", "create" )
            ( "size,s", po::value< size_t >( &pool_size )->default_value( PMEMOBJ_MIN_POOL ), "pool size" )
            ( "filename,f", po::value< std::string >( &filename )->default_value( "/dev/dax0.0" ), "filename" )
            ( "append,a", po::value< std::string >( &new_value ), "append" )
            ( "delete,d", po::value< std::string >( &remove_value ), "delete" )
  83. Appending: these operations are performed inside the lambda expression passed to exec_tx.
            file_size = pool_size;
            create = true;
          }
          namespace pobj = pmem::obj;
          auto pool = create ?
            pobj::pool< data_t >::create( filename.c_str(), layout, file_size, 0666 ) :
            pobj::pool< data_t >::open( filename.c_str(), layout );
          pobj::persistent_ptr< data_t > root = pool.get_root();
          if( !new_value.empty() ) {
            auto next = root->next;
            auto cur = root;
            while( next ) {
              cur = next;
              next = next->next;
            }
            new_value.resize( 1023 );
            std::array< char, 1024 > data;
            std::copy( new_value.begin(), new_value.end(), data.begin() );
            data[ 1023 ] = '\0';
            pmem::obj::transaction::exec_tx( pool, [&] {
              auto new_elem = pmem::obj::make_persistent< data_t >(); // create a new node
              new_elem->data = data;                                  // write the data into the new node
              cur->next = new_elem;                                   // link the new node to the next of the tail node
            } );
          }
  84. Deleting, and running the program:
          if( !remove_value.empty() ) {
            auto next = root->next;
            auto cur = root;
            while( next ) {
              if( strcmp( next->data.get_ro().data(), remove_value.data() ) == 0 ) {
                const auto data_size = strlen( next->data.get_ro().data() );
                pmem::obj::transaction::exec_tx( pool, [&] {
                  cur->next = next->next;
                  pmem::obj::delete_persistent< data_t >( next );
                } );
                break;
              }
              cur = next;
              next = next->next;
            }
    Running it: create the pool, append three times, then delete.
        $ ./04_pmemobj++ -c -f test -s 67108864
        $ ./04_pmemobj++ -f test -a abcde -l
        abcde
        $ ./04_pmemobj++ -f test -a fghij -l
        abcde
        fghij
        $ ./04_pmemobj++ -f test -a klmno -l
        abcde
        fghij
        klmno
        $ ./04_pmemobj++ -f test -d fghij -l
        abcde
        klmno
  85. libpmemlog

    [Diagram: Persistent Memory Development Kit: application, libpmemobj++, libpmemobj, libpmemblk, libpmemlog, libvmmalloc, libpmem]
    libpmemlog: append-only, but simpler to write to than libpmemobj.
  86. size_t mapped_length = 0u;
      int is_pmem = 0;
      fs::path path( filename );
      bool device_dax = false;
      size_t file_size = 0u;
      bool create = vm.count( "create" );
      if( fs::exists( path ) ) {
        device_dax = is_special_file( path );
        if( !device_dax ) file_size = fs::file_size( path );
        else file_size = 0;
      }
      else {
        file_size = pool_size;
        create = true;
      }
      // Open the pool
      PMEMlogpool *raw_pool = create ?
        pmemlog_create( filename.c_str(), file_size, 0666 ) :
        pmemlog_open( filename.c_str() );
      if( !raw_pool ) {
        std::cerr << filename << ':' << strerror( errno ) << std::endl;
        return 1;
      }
      std::unique_ptr< PMEMlogpool, close_pmemlog > pool( raw_pool );
      // Append to the log
      if( !new_value.empty() )
        pmemlog_append( pool.get(), new_value.data(), new_value.size() );
      // Walk over the log
      if( vm.count( "list" ) ) {
        pmemlog_walk( pool.get(), 0, []( const void *data, size_t length, void* ) -> int {
          std::cout << std::string( reinterpret_cast< const char* >( data ), length ) << std::endl;
          return 0;
        }, nullptr );
      }
      }
  87. $ ./05_pmemlog -c -f test              # create the pool
      $ ./05_pmemlog -f test -a abcde -l     # append
      abcde
      $ ./05_pmemlog -f test -a fghij -l     # append
      abcdefghij
      $ ./05_pmemlog -f test -a klmno -l     # append
      abcdefghijklmno
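The listing prints `abcdefghij` on one line rather than one entry per line because the sample calls `pmemlog_walk` with a chunk size of 0, which hands the whole log to the callback in a single call. A toy, volatile model of that behaviour might look like the sketch below. `ToyLog` is a hypothetical class written for illustration; it is not libpmemlog and persists nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Toy, volatile model of an append-only log (hypothetical; not libpmemlog).
class ToyLog {
 public:
  // Appending is the only way to modify the log.
  void append(const std::string &bytes) { data_ += bytes; }

  // chunk_size == 0 passes the whole log to the callback in one call;
  // otherwise the callback sees consecutive chunks of chunk_size bytes.
  void walk(std::size_t chunk_size,
            const std::function<void(const char *, std::size_t)> &cb) const {
    if (data_.empty()) return;
    if (chunk_size == 0) {
      cb(data_.data(), data_.size());
      return;
    }
    for (std::size_t off = 0; off < data_.size(); off += chunk_size)
      cb(data_.data() + off, std::min(chunk_size, data_.size() - off));
  }

 private:
  std::string data_;  // record boundaries are not stored anywhere
};
```

Because the log keeps no record boundaries, an application that wants to list entries separately has to use fixed-size records with a matching chunk size, or store its own delimiters.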
  88. libpmemblk

    [Diagram: Persistent Memory Development Kit: application, libpmemobj++, libpmemobj, libpmemblk, libpmemlog, libvmmalloc, libpmem]
    libpmemblk: writes are possible only in units of whole blocks, but it is simpler to write to than pmemobj.
  89. PMEMblkpool *raw_pool = create ?   // open the pool
        pmemblk_create( filename.c_str(), block_size, file_size, 0666 ) :
        pmemblk_open( filename.c_str(), block_size );
      if( !raw_pool ) {
        std::cerr << filename << ':' << strerror( errno ) << std::endl;
        return 1;
      }
      std::unique_ptr< PMEMblkpool, close_pmemblk > pool( raw_pool );
      const size_t block_count = pmemblk_nblock( pool.get() );
      if( create ) {
        const char buffer[ block_size ] = { 0 };
        for( size_t i = 0; i != block_count; ++i ) {
          pmemblk_write( pool.get(), buffer, i );
          if( i % ( block_count / 10 ) == 0 )
            std::cout << 100 * i / block_count << "%" << std::endl;
        }
      }
      if( !new_value.empty() ) {
        char buffer[ block_size ];
        for( size_t i = 0; i != block_count; ++i ) {
          pmemblk_read( pool.get(), buffer, i );       // read a block
          if( buffer[ 0 ] == '\0' ) {
            new_value.resize( block_size - 1 );
            std::copy( new_value.begin(), new_value.end(), buffer );
            buffer[ new_value.size() ] = '\0';
            pmemblk_write( pool.get(), buffer, i );    // write a block
            break;
          }
        }
      }
      if( vm.count( "list" ) ) {
        char buffer[ block_size ];
  90. $ ./06_pmemblk -c -f test              # create the pool
      0%
      9%
      19%
      29%
      39%
      49%
      59%
      69%
      79%
      89%
      99%
      $ ./06_pmemblk -f test -a abcde -l     # append
      abcde
      $ ./06_pmemblk -f test -a fghij -l     # append
      abcde
      fghij
      $ ./06_pmemblk -f test -a klmno -l     # append
      abcde
      fghij
      klmno
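The progress figures end in 9 rather than 0 because the sample computes both the reporting interval and the percentage with integer division, and the block count is evidently not a multiple of 10. The sketch below reproduces the arithmetic of the zero-fill loop. The real block count is not shown on the slide; 101 is an assumed value chosen only because it yields the same sequence, and `progress_values` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reproduces the progress values printed while zero-filling the pool:
// every (block_count / 10)-th block reports 100 * i / block_count,
// using integer division throughout.
inline std::vector<std::size_t> progress_values(std::size_t block_count) {
  std::vector<std::size_t> out;
  const std::size_t step = block_count / 10;
  if (step == 0) return out;  // fewer than 10 blocks: avoid modulo by zero
  for (std::size_t i = 0; i != block_count; ++i)
    if (i % step == 0) out.push_back(100 * i / block_count);
  return out;
}
```

With the assumed count of 101 the function returns 0, 9, 19, and so on up to 99, eleven values in total. The guard for fewer than ten blocks is an addition of this sketch; the loop on the slide would divide by zero in that case.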
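  91. libvmmalloc

    [Diagram: Persistent Memory Development Kit: application, libpmemobj++, libpmemobj, libpmemblk, libpmemlog, libvmmalloc, libpmem]
    libvmmalloc replaces the memory allocation functions (malloc and friends) with functions that allocate from NVDIMM.
    This lets you use NVDIMM as large-capacity volatile memory.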
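  92. Sparse File

    [Diagram: a file with pages labelled D and a block labelled M, mapped into the process's virtual address space]
    Only those pages of the file that have had non-zero values written to them are recorded on storage.

  93. Sparse File

    [Diagram: the same file; a write goes to a location that has no page, and a fourth page D appears]
    Only those pages of the file that have had non-zero values written to them are recorded on storage.
    Writing to a location that has no page allocates a new page.
    When the number of pages grows, the filesystem metadata (M) is modified.

  94. Sparse File

    [Diagram: the same file with the new page D and the metadata M]
    The application believes it is rewriting only the file's data, so it flushes only the data.
    Depending on when the system stops, the metadata is left stale and the contents of the new page are lost.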
  95. if (flags & PMEM_FILE_CREATE) {
          /*
           * Always set length of file to 'len'.
           * (May either extend or truncate existing file.)
           */
          if (os_ftruncate(fd, (os_off_t)len) != 0) {
              ERR("!ftruncate");
              goto err;
          }
          if ((flags & PMEM_FILE_SPARSE) == 0) {
              if ((errno = os_posix_fallocate(fd, 0, (os_off_t)len)) != 0) {
                  ERR("!posix_fallocate");
                  goto err;
              }
          }
      } else {
          ssize_t actual_size = util_file_get_size(path);
          if (actual_size < 0) {
              ERR("stat %s: negative size", path);
              errno = EINVAL;
              goto err;
          }
          len = (size_t)actual_size;
      }

    From pmdk-1.4.3/src/libpmem/pmem.c
    When a new file is created, the whole range from the start to the end of the file is fallocate-d.
    A file created with pmem_map_file therefore does not become sparse (unless PMEM_FILE_SPARSE is passed).
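The difference between setting a file's length and actually reserving its pages can be observed with plain POSIX calls. This is a minimal sketch, assuming a scratch file under /tmp on a filesystem that supports sparse files; `allocated_bytes` and `sparse_then_allocated` are hypothetical helper names, and the code does not use PMDK.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Bytes of storage actually allocated to the file (st_blocks counts 512-byte units).
inline long long allocated_bytes(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) return -1;
  return static_cast<long long>(st.st_blocks) * 512;
}

// Creates a temporary file of `len` bytes and reports its allocation before
// and after posix_fallocate. Returns false if any system call fails.
inline bool sparse_then_allocated(off_t len, long long &before, long long &after) {
  char path[] = "/tmp/sparse_demo_XXXXXX";  // assumed scratch location
  const int fd = mkstemp(path);
  if (fd < 0) return false;
  unlink(path);  // the file disappears once fd is closed
  bool ok = ftruncate(fd, len) == 0;  // sets the size; typically allocates no data pages
  before = ok ? allocated_bytes(fd) : -1;
  ok = ok && posix_fallocate(fd, 0, len) == 0;  // reserve every page up front
  after = ok ? allocated_bytes(fd) : -1;
  close(fd);
  return ok;
}
```

On a typical Linux filesystem `before` is close to zero and `after` is at least `len`, which is the state pmem_map_file aims for so that later stores never trigger a page allocation and the metadata change that comes with it.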
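  96. Copy on Write

    [Diagram: a file with pages labelled D and a block labelled M, mapped into the process's virtual address space; a write is in progress]
    Some filesystems always allocate a new region whenever a page is rewritten.
    On such filesystems a flush can never be completed in user space alone. This is bad.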
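  97. • We want to take care of flushes in user space
      • The filesystem is managed by the kernel
    Trying to satisfy both at once is the root of the trouble.
    So let's do away with the filesystem.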

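  98. Device DAX

    Instead of a file on a filesystem created on the NVDIMM, the NVDIMM device itself can be mmap-ed directly into the process's virtual address space.

  99. Device DAX

    The NVDIMM device itself can be mmap-ed directly into the process's virtual address space.
    Advantages:
    • The user-space process knows exactly which locations need a flush

  100. Device DAX

    The NVDIMM device itself can be mmap-ed directly into the process's virtual address space.
    Advantages:
    • The user-space process knows exactly which locations need a flush
    • The time a write takes is predictable

  101. Device DAX

    The NVDIMM device itself can be mmap-ed directly into the process's virtual address space.
    Advantages:
    • The user-space process knows exactly which locations need a flush
    • The time a write takes is predictable
    • 1 GB huge pages can be used to reduce TLB misses

  102. Device DAX

    The NVDIMM device itself can be mmap-ed directly into the process's virtual address space.
    Advantages:
    • The user-space process knows exactly which locations need a flush
    • The time a write takes is predictable
    • 1 GB huge pages can be used to reduce TLB misses
    Disadvantage:
    • No filesystem can be used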
  103. ndctl (https://github.com/pmem/ndctl): a command that tells the NVDIMM subsystem of the Linux kernel how the NVDIMM is to be used.

  104. # The whole device is assigned to a namespace that can use Filesystem DAX,
       # and it is usable as a block device through /dev/pmem0
       $ ndctl list
       [
         {
           "dev":"namespace0.0",
           "mode":"fsdax",
           "map":"dev",
           "size":2111832064,
           "uuid":"d8aeb862-2052-4d0e-af2b-4961dfaca8d3",
           "sector_size":512,
           "align":2097152,
           "blockdev":"pmem0"
         }
       ]
       # Unmount the filesystem and delete the namespace
       $ umount /mnt/pmem
       $ ndctl disable-namespace namespace0.0
       disabled 1 namespace
       $ ndctl destroy-namespace "namespace0.0"
       destroyed 0 namespaces
       $ ndctl list
       $ ls /dev/pmem0
       ls: cannot access '/dev/pmem0': No such file or directory
  105. # Create a new namespace with usage mode devdax and a 1 GB page size
       $ ndctl create-namespace -e "namespace0.0" -m devdax -a 1G
       {
         "dev":"namespace0.0",
         "mode":"devdax",
         "map":"dev",
         "size":"1024.00 MiB (1073.74 MB)",
         "uuid":"e307a092-8d2d-4d4c-a96e-2163c7d0b770",
         "daxregion":{
           "id":0,
           "size":"1024.00 MiB (1073.74 MB)",
           "align":1073741824,
           "devices":[
             {
               "chardev":"dax0.0",
               "size":"1024.00 MiB (1073.74 MB)",
               "target_node":0,
               "mode":"devdax"
             }
           ]
         },
         "align":1073741824
       }
  106. $ ls -lha /dev/dax0.0
       crw------- 1 root root 252, 6 10月 19 10:35 /dev/dax0.0

    It's a character device!
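  107. Device DAX

    $ ls -lha /dev/dax0.0
    crw------- 1 root root 252, 6 10月 19 10:35 /dev/dax0.0

    This device supports only the following operations:
    • open
    • close
    • mmap (map it into the virtual address space)
    • fallocate (strip away part of what has been mapped)
    fallocate is provided solely for stripping the allocation of specific pages.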
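The open, mmap, store, close sequence can be tried without NVDIMM hardware. The sketch below assumes a regular temporary file under /tmp as a stand-in for /dev/dax0.0, so it has to set the file size itself and flushes with `msync`; on a real Device DAX mapping the device already has a fixed size and PMDK would flush from user space instead (for example with `pmem_persist`). `mmap_roundtrip` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Stores `text` through a shared mapping and reads it back with pread.
// ASSUMPTION: a regular temporary file stands in for the DAX device.
inline std::string mmap_roundtrip(const std::string &text) {
  const std::size_t len = 4096;
  if (text.size() >= len) return {};
  char path[] = "/tmp/dax_demo_XXXXXX";  // assumed scratch location
  const int fd = mkstemp(path);
  if (fd < 0) return {};
  unlink(path);
  std::string result;
  if (ftruncate(fd, static_cast<off_t>(len)) == 0) {  // not needed on a real device
    void *addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr != MAP_FAILED) {
      std::memcpy(addr, text.data(), text.size());  // an ordinary store, no write()
      msync(addr, len, MS_SYNC);                    // regular-file flush only
      munmap(addr, len);
      std::string buf(text.size(), '\0');
      if (pread(fd, &buf[0], buf.size(), 0) == static_cast<ssize_t>(buf.size()))
        result = buf;
    }
  }
  close(fd);
  return result;
}
```

Calling `mmap_roundtrip("abcde")` returns the same string, showing that a plain memory store into the mapping reached the underlying file.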
  108. bool device_dax = false;
       size_t file_size = 0u;
       bool create = vm.count( "create" );
       if( fs::exists( path ) ) {
         device_dax = is_special_file( path );
         if( !device_dax ) file_size = fs::file_size( path );
         else file_size = 0;
       }
       else {
         file_size = pool_size;
         create = true;
       }
       PMEMobjpool *raw_pool = create ?
         pmemobj_create( filename.c_str(), layout, file_size, 0666 ) :
         pmemobj_open( filename.c_str(), layout );

    When creating a pmemobj pool on Device DAX, pass 0 as the file size to pmemobj_create.
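The slide does not show `is_special_file`. One plausible shape for it, assuming it simply checks for a character device with `stat`, is sketched below. Both function bodies are guesses written for illustration (the slide's version takes an `fs::path`, this one a `std::string`), and `pool_size_argument` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/stat.h>

// Hypothetical version of the helper used on the slide: true when `path`
// names a character device such as /dev/dax0.0.
inline bool is_special_file(const std::string &path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISCHR(st.st_mode);
}

// Size to pass when creating the pool: 0 for a device, whose size is fixed,
// and the requested size for a regular or not-yet-existing file.
inline std::size_t pool_size_argument(const std::string &path, std::size_t requested) {
  return is_special_file(path) ? 0u : requested;
}
```

For example, `pool_size_argument("/dev/null", 1024)` yields 0 because /dev/null is a character device, while a path that does not exist yields the requested size.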
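  109. Bonus

    Running Intel Optane DC Persistent Memory requires a Xeon Gold or higher processor of the Cascade Lake microarchitecture or later. Extremely expensive.
    memmap=2G!14G
    Booting Linux with this special form of the memmap kernel parameter makes the kernel treat part of DRAM as if it were NVDIMM.
    2G: the size to treat as NVDIMM. 14G: the size to treat as DRAM (in memmap=nn!ss, ss is the address at which the nn-sized region starts).
    Handy for testing applications that use NVDIMM.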
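To make the two numbers concrete: the kernel reads `memmap=nn!ss` as a region of nn bytes starting at address ss. The sketch below parses such a value; `PmemRegion`, `parse_size`, and `parse_memmap` are hypothetical names for this illustration, and only the K, M, and G suffixes are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

struct PmemRegion {
  std::uint64_t start;  // first byte treated as persistent memory
  std::uint64_t size;   // length of the region in bytes
};

// Parses one size token such as "2G" or "512M" (K, M, G suffixes only).
inline std::uint64_t parse_size(const std::string &s) {
  std::size_t pos = 0;
  std::uint64_t value = std::stoull(s, &pos);
  if (pos < s.size()) {
    switch (s[pos]) {
      case 'K': value <<= 10; break;
      case 'M': value <<= 20; break;
      case 'G': value <<= 30; break;
      default: throw std::invalid_argument("unknown suffix");
    }
  }
  return value;
}

// Parses the value of memmap=nn!ss: nn is the region size, ss its start.
inline PmemRegion parse_memmap(const std::string &arg) {
  const std::size_t bang = arg.find('!');
  if (bang == std::string::npos) throw std::invalid_argument("missing '!'");
  return {parse_size(arg.substr(bang + 1)), parse_size(arg.substr(0, bang))};
}
```

For `2G!14G` this gives a 2 GiB region starting at 14 GiB and ending at 16 GiB. On a machine with 16 GiB of RAM (an assumed figure; the slide does not state it) that leaves the first 14 GiB as ordinary DRAM.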
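  110. Summary / A new kind of storage that can be written to like memory: NVDIMM / Bypassing the kernel's block layer: Filesystem DAX and Device DAX / Instead of inefficient page-granularity flushes, in user space: PMDK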