General-purpose hybrid storage system

Presentation slides from the 4th meeting of the Web System Architecture Study Group (WSA研).
https://websystemarchitecture.hatenablog.jp/entry/2019/02/26/100725

Takamura Narimichi

April 13, 2019

Transcript

  1. Proposal of a General-Purpose Hybrid Storage System
    HEARTBEATS Corporation (株式会社ハートビーツ)
    Takamura Narimichi @nari_ex
    2019/04/13 4th Web System Architecture Study Group | @nari_ex

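  2. About the speaker
    • Takamura Narimichi (高村 成道)
    • @nari_ex
    • Director and VPoE, HEARTBEATS Corporation
    • The University of Electro-Communications
      • Bachelor's degree, Department of Communication Engineering and Informatics, Faculty of Informatics and Engineering
    • GLOBIS University, Graduate School of Management
      • Master's degree (MBA), Management program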

  3. Contents
    • Background and problems
    • Proposal
    • Implementation
    • Usage patterns of this mechanism
    • Summary

  4. Background and problems

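  5. Background and problems
    • The spread of web services has caused data volumes to grow explosively
    • The cost of the storage needed to hold this data keeps rising
    • Only a small fraction of the data is accessed frequently
    • To improve cost-effectiveness, we want to cut the cost of infrequently used data
    ※ In this work, "data" refers to content files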

  6. Supplement: Definitions of data terms
    This work classifies data into two types by usage frequency
    • Hot data: frequently used data
    • Cold data: infrequently used data

  7. Supplement: Examples of cold data among content data
    • Real estate listings that are in low demand
    • Product data for off-season apparel
    • Email that has been received and is no longer referenced
    • Old log data on a log delivery server
    ※ All examples assume cases where per-file access is required

  8. Basic strategy for the problem
    • Move cold data to inexpensive storage by some means

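  9. Conventional approach 1: Archiving
    • Safely storing data in a dedicated storage area for long-term retention
    • Costs can be reduced by moving data to inexpensive storage
    • Problem
      • Because the data moves to a different location, it is hard to use it the same way as before it was moved
    => Unsuitable for content data, whose chance of being used never drops to zero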

  10. Conventional approach 2: Enterprise storage products
    • Automatically move infrequently used data to inexpensive storage
    • Problems
      • Vendor lock-in
      • Large investment required
      • Hard to deploy in cloud environments

  11. Conventional approach 2: Enterprise storage products

  12. Supplement: Reduced flexibility in the storage layer as systems move to the cloud

  13. Conventional approach 3: Extending an existing file system
    • Extending Btrfs [1]
      • Research that takes advantage of Btrfs's multi-device support
      • Moves data at the generic block layer
    • Problems
      • Moving data puts load on shared resources (memory, CPU)
      • File systems other than Btrfs cannot be used
        • The widely used ext4 and xfs cannot be used
    [1] Hot Cold Data Tracking and Migration in btrfs.

  14. Supplement: Overview of Linux I/O

  15. Summary of problems
    • Archiving
      • Unsuitable for content data
    • Enterprise products
      • Expensive
      • Vendor lock-in
      • Unsuitable for the cloud
    • Extending an existing file system
      • The file system cannot be chosen freely
      • The load of moving data affects the main workload

  16. Summary of problems 2
    • The environments where these approaches can be deployed are limited
    • The load during data movement is a problem

  17. Proposal

  18. Proposal of a general-purpose hybrid storage system
    • A hybrid storage system that can be used in a wide variety of environments
    • Moving cold data does not interfere with the main workload
    ※ Assumes an implementation on Linux

  19. Goals
    • Highly general-purpose
      • Usable in cloud environments as well as on-premises
      • Built from OSS and runs on Linux
    • No dependence on a specific file system at deployment
      • A file system can be chosen separately for hot storage and cold storage
    • Sufficiently low load when moving cold data

  20. Implementation

  21. Implementation challenges and countermeasures
    • On Linux, resource control is basically applied per process
    • Bandwidth control should apply only during relocation
    • => Separate the access-routing process from the data-relocation process

  22. Implementation approach
    1. Access routing
    • Implemented as a user-space file system (FUSE)
    • Easy to deploy
    • Any file system can be used
    2. Data relocation
    • Implemented as a daemon process

  23. 1. Access routing

  24. Multi-Temperature FileSystem (MTFS)
    • User-space file system
    • User-space daemon process: mtfsd
    • Started with one partition specified for hot data and one for cold data
    • Controls the file operations requested by applications

  25. Overview of mtfsd processing
    1. The application requests file access
    2. mtfsd receives the system call through the FUSE library
    3. mtfsd queries hot storage
    • If the file does not exist there, it queries cold storage
    4. The retrieved data is returned to the application
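
The lookup order in the mtfsd processing overview (hot storage first, cold storage as a fallback) can be sketched as follows. This is a minimal illustration, not the author's implementation: it involves no FUSE at all, and the names `resolve`, `read_file`, `hot_root`, and `cold_root` are hypothetical. It assumes both tiers are ordinary directories that share the same relative layout.

```python
import os


def resolve(rel_path, hot_root, cold_root):
    """Return the backing path for rel_path, checking hot storage first.

    Assumption: hot_root and cold_root are plain directories that mirror
    the same relative layout (hypothetical names, not from the slides).
    """
    for root in (hot_root, cold_root):
        candidate = os.path.join(root, rel_path)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(rel_path)


def read_file(rel_path, hot_root, cold_root):
    """Read a file from whichever tier currently holds it."""
    with open(resolve(rel_path, hot_root, cold_root), "rb") as f:
        return f.read()
```

In an actual FUSE file system the same decision would sit inside the handlers for open, read, and getattr; here it is reduced to a plain function so the ordering is easy to see.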

  26. (figure only, no text)

  27. How reads are processed

  28. How writes are processed

  29. 2. Data relocation

  30. Data relocation
    • User-space daemon process: mt-relocatord
    • Runs single-threaded, since the work does not need to finish quickly

  31. Implementation challenges and countermeasures
    • Controlling I/O bandwidth and throughput
      • => Control block I/O with cgroup2
        • IOPS: limited with riops and wiops
        • Throughput: limited with rbps and wbps
      • => Control the I/O scheduler layer with ioprio_set()
        • Specify CLASS_IDLE (IOPRIO_CLASS_IDLE)
    • During relocation, cold data ends up consuming the disk cache
      • => Use posix_fadvise(POSIX_FADV_DONTNEED) to drop the cache for the cold data being moved
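
Two of the countermeasures above can be sketched in a few lines. This is an illustration under stated assumptions, not the code of mt-relocatord: the helper names `io_max_line` and `copy_without_caching` are hypothetical, and the device numbers and limit values are placeholders. The first helper only formats a line in the syntax that the cgroup v2 `io.max` file accepts; actually writing it into a cgroup needs root and is left out. The second copies one file and then advises the kernel to drop its cached pages. `ioprio_set()` is omitted because Python's standard library does not wrap it.

```python
import os


def io_max_line(major, minor, rbps=None, wbps=None, riops=None, wiops=None):
    """Format one cgroup v2 io.max entry, e.g. '8:16 rbps=2097152 wiops=120'.

    Only the limits that are given are emitted. The caller is assumed to
    write the result to <cgroup>/io.max (path and device are placeholders).
    """
    limits = {"rbps": rbps, "wbps": wbps, "riops": riops, "wiops": wiops}
    parts = [f"{key}={value}" for key, value in limits.items() if value is not None]
    return f"{major}:{minor} " + " ".join(parts)


def copy_without_caching(src, dst, chunk_size=1 << 20):
    """Copy src to dst, then ask the kernel to drop both files from the page cache."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
        # Dirty pages cannot be dropped, so flush them to disk first.
        fout.flush()
        os.fsync(fout.fileno())
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            for f in (fin, fout):
                try:
                    # offset=0, len=0 covers the whole file.
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # the advice is best-effort
```

For example, `io_max_line(8, 16, rbps=2097152, wiops=120)` yields the string `8:16 rbps=2097152 wiops=120`, which caps reads at 2 MiB/s and writes at 120 IOPS for that device.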

  32. Where resources are controlled in Linux I/O

  33. Overview of relocation processing

  34. mt-relocatord settings
    • Scheduling of relocation runs
      • Start time
      • Execution interval (in days)
    • Relocation thresholds
      • Number of accesses and updates per unit time
      • Time elapsed since the last access and the last update
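
One way to read the relocation thresholds above is as a per-file predicate. The sketch below is only an assumption about how they might combine: the slides list the criteria but not the rule, so requiring all of them at once is a choice made here for illustration. The names `Thresholds`, `FileStats`, and `is_cold`, and the per-day unit, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_accesses_per_day: float  # at or below this rate, access is "infrequent"
    max_updates_per_day: float   # same, for updates
    min_idle_seconds: float      # required time since last access and update


@dataclass
class FileStats:
    accesses_per_day: float
    updates_per_day: float
    last_access: float  # epoch seconds
    last_update: float  # epoch seconds


def is_cold(stats, thresholds, now):
    """Return True when every threshold marks the file as cold.

    Assumption: the criteria are AND-ed together; the source does not
    state how they are combined.
    """
    idle = now - max(stats.last_access, stats.last_update)
    return (
        stats.accesses_per_day <= thresholds.max_accesses_per_day
        and stats.updates_per_day <= thresholds.max_updates_per_day
        and idle >= thresholds.min_idle_seconds
    )
```

A daemon such as mt-relocatord could evaluate a predicate like this once per scheduled run and move only the files for which it returns True.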

  35. Usage example
    // Create partitions
    # mkfs.xfs /dev/vda1
    # mkfs.btrfs /dev/vdb1
    // Create MTFS managed Volumes
    # mtfsctl hot-volume create hv0 /export/sda1/www/
    # mtfsctl cold-volume create cv0 /export/sdb1/www/
    // Start mtfsd
    # mtfsctl volume start hv0 cv0
    # systemctl start mtfsd
    // Start mt-relocatord
    # systemctl start mt-relocatord

  36. Usage patterns of this mechanism
    Environment                  | Hot Data Storage             | Cold Data Storage
    On-premises (Storage Device) | SSD                          | HDD
    AWS (Block Storage)          | EBS Provisioned IOPS SSD     | Cold HDD
    AWS (Shared Storage)         | EFS (Provisioned Throughput) | EFS (Infrequent Access Storage Class)
    ※ Any file system can be used except in the Shared Storage case

  37. Open issues
    • Safety and performance of cold data movement
    • Runtime overhead of FUSE
    • Processing load on mt-relocatord as the number of files grows
    • Whether mt-relocatord needs its own scheduling function

  38. Summary
    • Proposed a general-purpose tiered storage system
    • Proposed an implementation that combines FUSE, cgroup2, ioprio, and posix_fadvise
    • We would like to apply this mechanism to container environments as well
    • ex. Docker Volume Plugins, Kubernetes Storage Interface