Mackerelにおける時系列データベースの性能改善 / Performance Improvement of TSDB in Mackerel

Pepabo-Hatena Tech Conference: Infrastructure Technology Foundations @ Fukuoka

Yuuki Tsubouchi (yuuk1)

July 09, 2016

Transcript

  1. Performance Improvement of the Time-Series Database in Mackerel
    Pepabo-Hatena Tech Conference: Infrastructure Technology Foundations @ Fukuoka
    Hatena id:y_uuki

  2. id:y_uuki
    yuuki
    Web operations engineer @ Hatena
    Roughly in my third year at the company

  3. 07/[email protected]Kyoto
    https://speakerdeck.com/yuukit/linux-network-performance-improvement-at-hatena

  4. Contents
    1. Mackerel and time-series data
    2. Graphite architecture and performance status
    3. The disk thrashing problem and its solution
    4. Summary

  5. Contents
    1. Mackerel and time-series data
    2. Graphite architecture and performance status
    3. The disk thrashing problem and its solution
    4. Summary

  6. https://mackerel.io

  7. Visualizing server metrics

  8. Mackerel's architecture

  9. Characteristics of Mackerel's time-series data
    • Agents post metrics from users' hosts every minute
    • Active agents as of 2016/01: 10,000+
    • Up to 200 metrics per agent
    • Assuming an average of 100 metrics/agent, the total is 1,000,000+ metrics/min sent
    • We need a database that can withstand a large volume of metric writes
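
The load estimate on this slide is simple arithmetic, and a rough sketch makes the per-second rate explicit. The agent count and the 100 metrics/agent average are the slide's own figures (the average is an assumption stated on the slide); the function name is only for illustration.

```python
# Figures taken from the slide; the average is the slide's own assumption.
ACTIVE_AGENTS = 10_000
AVG_METRICS_PER_AGENT = 100


def metrics_per_minute(agents: int, metrics_per_agent: int) -> int:
    """Each agent posts every one of its metrics once per minute."""
    return agents * metrics_per_agent


per_minute = metrics_per_minute(ACTIVE_AGENTS, AVG_METRICS_PER_AGENT)
per_second = per_minute / 60  # average write rate the database must sustain

print(per_minute)
print(round(per_second))
```

At this scale the database has to absorb on the order of tens of thousands of data points per second, continuously.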

  10. Graphite

  11. Contents
    1. Mackerel and time-series data
    2. Graphite architecture and performance status
    3. The disk thrashing problem and its solution
    4. Summary

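  12. What is Graphite?
    • Time-series database middleware written in Python
    • HTTP interface (writes use its own protocol)
    • Output format is a graph image or JSON
    Diagram: Graphite receives (timestamp, name, value) writes and graph requests, and returns an image or JSON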
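
As a rough illustration of the write path, the sketch below assumes carbon's plaintext line protocol, in which each data point is one line of the form `name value timestamp`. The slide only says that writes use Graphite's own protocol, so the choice of the plaintext variant, the metric name, and the helper names are assumptions made for this example.

```python
import socket
import time


def format_plaintext(name: str, value: float, timestamp: int) -> bytes:
    """Encode one data point as a carbon plaintext line: 'name value timestamp'."""
    return f"{name} {value} {timestamp}\n".encode("ascii")


def send_metric(host: str, port: int, name: str, value: float, timestamp=None) -> None:
    """Open a TCP connection to a carbon listener and send a single data point."""
    ts = int(time.time()) if timestamp is None else timestamp
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_plaintext(name, value, ts))


# Hypothetical metric name, shown only to illustrate the line format.
line = format_plaintext("servers.web01.loadavg5", 0.42, 1467990000)
print(line)
```

`send_metric` is not called here because it needs a running carbon listener; the host and port would come from your own deployment.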

  13. Graphite's architecture
    Diagram: carbon receives (timestamp, name, value) and writes to the filesystem through whisper; graphite-web receives a graph request, reads from the filesystem through whisper, and returns an image or JSON

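  14. Graphite's architecture (graphite-web)
    Diagram: carbon receives (timestamp, name, value) and writes to the filesystem through whisper; graphite-web receives a graph request, reads from the filesystem through whisper, and returns an image or JSON
    graphite-web: a web application that accepts read requests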

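  15. Graphite's architecture (carbon)
    Diagram: carbon receives (timestamp, name, value) and writes to the filesystem through whisper; graphite-web receives a graph request, reads from the filesystem through whisper, and returns an image or JSON
    carbon: a daemon that accepts write requests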

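  16. Graphite's architecture (whisper)
    Diagram: carbon receives (timestamp, name, value) and writes to the filesystem through whisper; graphite-web receives a graph request, reads from the filesystem through whisper, and returns an image or JSON
    whisper: a library for creating and updating time-series DB files
    One file is created per metric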

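  17. Whisper's data structure
    • Storing every data point makes disk usage balloon
    • With timestamp: 4 bytes and value: 8 bytes, i.e. 12 bytes/datapoint, one metric takes about 6 MB per year
    • Older data is rolled up over fixed intervals (keeping the average or the maximum) to save disk space
    • e.g. 1-minute-resolution data only needs to be kept for 1 day, while 5-minute-resolution data is kept for 1 week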
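
The sizing on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that counts only data points (12 bytes each, as on the slide) and ignores any file header or metadata; the retention schedule is the slide's own example, and the function name is made up for illustration.

```python
BYTES_PER_POINT = 12  # 4-byte timestamp + 8-byte value, as on the slide
DAY = 24 * 60 * 60    # seconds in one day


def archive_bytes(seconds_per_point: int, retention_seconds: int) -> int:
    """Bytes needed to keep one metric at a given resolution for a given period."""
    points = retention_seconds // seconds_per_point
    return points * BYTES_PER_POINT


# Keeping every 1-minute data point for a full year.
one_year_raw = archive_bytes(60, 365 * DAY)

# The slide's example schedule: 1-minute data for 1 day, 5-minute data for 1 week.
rolled_up = archive_bytes(60, 1 * DAY) + archive_bytes(300, 7 * DAY)

print(one_year_raw)
print(rolled_up)
```

Under these assumptions the raw figure comes out at roughly 6.3 MB per metric per year, matching the slide, while the rolled-up schedule needs only about 41 KB per metric.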

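  18. Graphite write performance characteristics (CPU utilization)
    • carbon runs as two cooperating threads
    • A network I/O thread that receives data
    • An I/O thread that writes to files
    • An event-driven network server
    • Data points are passed between the threads through a buffer
    • Problem: each thread is bottlenecked on a single core
    • Run multiple carbon processes to spread the load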
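
The two-thread structure described above can be sketched as a producer and a consumer sharing a buffer. This is a simplified stand-in written for illustration, not carbon's actual code: the receiver is fed from a list instead of a socket, and the writer appends to a list instead of updating whisper files.

```python
import queue
import threading


def receiver(points, buf: queue.Queue) -> None:
    """Stand-in for the network I/O thread: pushes incoming points into the buffer."""
    for point in points:
        buf.put(point)
    buf.put(None)  # sentinel telling the writer that no more data will arrive


def writer(buf: queue.Queue, written: list) -> None:
    """Stand-in for the file I/O thread: drains the buffer and 'writes' each point."""
    while True:
        point = buf.get()
        if point is None:
            break
        written.append(point)  # a real writer would update a whisper file here


def run(points) -> list:
    buf = queue.Queue()
    written = []
    threads = [
        threading.Thread(target=receiver, args=(points, buf)),
        threading.Thread(target=writer, args=(buf, written)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


result = run([(1467990000, "a.cpu", 0.5), (1467990000, "b.cpu", 0.7)])
print(result)
```

Because each role is a single thread, one process can keep at most about one core busy per role, which is why the slide spreads the load across several carbon processes.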

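  19. Graphite write performance characteristics (disk I/O)
    • A small amount of data (about 12 bytes) is written to each of a large number of files within 1 minute
    • Writes cannot be batched into neighboring blocks on the filesystem, so I/O efficiency is poor (writes scatter in every direction)
    • On the other hand, multiple threads never write to the same file at the same time, so I/O parallelism is easy to raise
    • Performance is the same even without a filesystem that excels at parallel I/O such as XFS (e.g. ext4)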

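  20. Hardware configuration and resource usage
    • CPU: Xeon E5-2697 v3 @ 2.60GHz, 2 sockets, 28 cores
    • Memory: 126GB
    • Disk: Fusion ioMemory ioDrive2 6.4TB
    • So-called flash storage. The manufacturer's nominal figure is 300k write IOPS
    • Effective I/O performance: 50k ~ 100k write IOPS
    • An ordinary SSD would do well to reach 1/10 of this performance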

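  21. Graphite tuning
    • CPU resources are exhausted before the ioDrive's IOPS are used up, so the approach is to save CPU and spend it on I/O
    • The disk is fast and strong at random writes, so in general carbon and the I/O scheduler are not allowed to do unnecessary optimization
    • carbon has parameters for improving I/O efficiency by sorting and for limits that keep it from using up I/O resources
    • echo noop > /sys/block/fioa/queue/scheduler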
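
To confirm that a scheduler change such as the `echo noop` above took effect, the same sysfs file can be read back: Linux lists the available schedulers and marks the active one with square brackets. The sketch below parses that format; the device name `fioa` comes from the slide, and the helper names are made up for illustration.

```python
def active_scheduler(text: str):
    """Return the bracketed (active) scheduler from a sysfs scheduler line."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device: str):
    """Read the active I/O scheduler for a block device, e.g. 'fioa'."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())


print(active_scheduler("[noop] deadline cfq\n"))
```

`read_scheduler("fioa")` only works on a host that actually has that device, so it is left uncalled here.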

  22. Graphite cluster configuration
    (timestamp, name, value)
    graphite-web
    carbon carbon …
    LB
    carbon carbon … …
    LB LB
    carbon carbon … …

  23. See the blog for details
    http://blog.yuuk.io/entry/high-performance-graphite

  24. Contents
    1. Mackerel and time-series data
    2. Graphite architecture and performance status
    3. The disk thrashing problem and its solution
    4. Summary

  25. Graph: write IOPS and read IOPS
    A sudden increase in read load

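  26. What happened
    • Read IOPS kept increasing while write IOPS kept decreasing
    • No swap usage due to memory shortage. OS memory usage was around 1/3
    • No sudden increase in accesses to the service
    • sar -B showed that the number of page-ins and page-outs per interval had increased abnormally
    • We will call this phenomenon disk thrashing
    • We inferred the cause from how the Linux page cache works and from Graphite's I/O pattern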

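  27. The Linux page cache
    • Memory breakdown = used + buffers/caches + free
    • When data is read from or written to the filesystem, the OS keeps the on-disk data in memory, page by page, so that later reads are fast
    • This is called the page cache
    • The page cache follows an LRU algorithm: recently referenced cache data is kept, and old cache data that is no longer referenced is dropped
    • The page cache is normally not counted as memory usage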
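
A toy model helps show why an LRU cache degrades so sharply once the working set no longer fits. This is a deliberately simplified sketch of a plain LRU cache; the kernel's real page reclaim policy is more elaborate, and the class and function names here are made up for illustration.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache: fixed capacity, least recently used page is evicted first."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.disk_reads = 0

    def access(self, page_id: int) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # cache hit: mark as most recently used
            return
        self.disk_reads += 1                 # cache miss: page must come from disk
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page


def reads_for(working_set: int, capacity: int, rounds: int = 3) -> int:
    """Touch every page of the working set in order, `rounds` times over."""
    cache = LRUPageCache(capacity)
    for _ in range(rounds):
        for page in range(working_set):
            cache.access(page)
    return cache.disk_reads


print(reads_for(100, 100))  # working set fits in the cache
print(reads_for(101, 100))  # working set is one page too large
```

When the working set fits, only the first pass touches the disk. One extra page is enough to make every access in this cyclic pattern a miss.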

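  28. Graphite's I/O pattern
    • Every active whisper file is written within 1 minute, so writes hit a wide range of the disk
    • A whisper metric write involves not only write(2) but also read(2) calls for reading metadata and calculating offsets
    • The page cache is effective for writes as well as reads (except with Direct I/O)
    • A Graphite host holds a large amount of page cache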

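  29. Cause of the read IOPS increase
    • A high number of page-ins and page-outs means that LRU is evicting old cache entries
    • The reads done during whisper writes stopped hitting the page cache, so read IOPS increased
    Diagram labels: Memory, used, page cache, page in, page out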

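  30. Saving page cache
    • Adding more memory would solve it for the time being, but the host already has 126GB RAM, so we want to cut wasted page cache instead
    • Written data is not necessarily read back right away, so do not put data into the cache on write => Direct I/O
    • However, Direct I/O requires memory buffers aligned to the block size => very tedious to do in Python (malloc => posix_memalign)
    • Solved with posix_fadvise(2)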

  31. posix_fadvise(2)
    • A process tells the kernel the access pattern it will use for file data
    • The kernel optimizes according to the declared access pattern so that I/O performance improves
    • Access patterns
    • POSIX_FADV_SEQUENTIAL: doubles readahead
    • POSIX_FADV_RANDOM: disables readahead
    • POSIX_FADV_DONTNEED: frees cached pages
    • etc
    int posix_fadvise(int fd, off_t offset, off_t len, int advice);
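
In Python 3 the same call is available as `os.posix_fadvise` on platforms that support it. The sketch below shows the two pieces of advice most relevant to this talk; the helper names are made up for illustration, and the `hasattr` guard is there because the function is not available everywhere.

```python
import os
import tempfile


def open_for_random_access(path: str):
    """Open a file and tell the kernel it will be accessed randomly (no readahead)."""
    fh = open(path, "r+b")
    if hasattr(os, "posix_fadvise"):
        # offset=0, len=0 applies the advice to the whole file.
        os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_RANDOM)
    return fh


def drop_cached_pages(fh) -> None:
    """Ask the kernel to release cached pages for this file."""
    if hasattr(os, "posix_fadvise"):
        fh.flush()
        os.fsync(fh.fileno())  # flush dirty pages first so they can be released
        os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 8192)

fh = open_for_random_access(tmp.name)
first_bytes = fh.read(4)
drop_cached_pages(fh)
fh.close()
os.unlink(tmp.name)
print(first_bytes)
```

The advice is a hint: the kernel is free to ignore it, and the calls do not change what the program reads or writes.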

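  32. Applying posix_fadvise(2) to Graphite
    • At first we focused on the option that drops page cache
    • whisper's write logic is fairly complex, so it is hard to drop only the page cache created by writes
    • With POSIX_FADV_RANDOM, readahead is disabled and only the pages actually needed are cached
    • whisper's write path has no step that scans a file sequentially
    • The wasted page cache created by readahead decreased
    /proc/meminfo before:
    Active(file): 5387160 kB
    Inactive(file): 37566804 kB
    /proc/meminfo after:
    Active(file): 32252136 kB
    Inactive(file): 7231020 kB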
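
The /proc/meminfo figures on this slide can be collected and compared with a small parser. This is a minimal sketch: the function names are made up for illustration, and treating the first pair of values on the slide as "before" and the second as "after" is an assumption based on the caption order.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB where a unit is given
    return fields


def file_cache_kb(fields: dict) -> int:
    """Total file-backed page cache on the active and inactive lists."""
    return fields["Active(file)"] + fields["Inactive(file)"]


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    with open(path) as f:
        return parse_meminfo(f.read())


# Values copied from the slide; the before/after assignment is assumed.
before = parse_meminfo("Active(file): 5387160 kB\nInactive(file): 37566804 kB\n")
after = parse_meminfo("Active(file): 32252136 kB\nInactive(file): 7231020 kB\n")
print(file_cache_kb(before), file_cache_kb(after))
```

Under that assumption, most of the file cache moved from the inactive list to the active list, which is consistent with fewer pages being cached and then never touched again.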

  33. Pull Request to Graphite

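  34. Contents of the Pull Request
    • The change itself is not very difficult
    • It uses the fadvise module
    • If posix_fadvise shows up under strace, it is working
    • It is unclear whether calling fadvise is always a good idea, so it can be switched on and off in the configuration file (off by default)
    • It took about a month to get merged
    with open(path, 'r+b') as fh:
        # Disable readahead only when fadvise is available and enabled in the config
        if CAN_FADVISE and FADVISE_RANDOM:
            posix_fadvise(fh.fileno(), 0, 0, POSIX_FADV_RANDOM)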

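  35. Verification with a test script
    https://gist.github.com/yuuki/8d5d386115b0f01b5371
    • Use whisper's write function to check whether the amount of page cache actually decreases
    • The script writes 100 data points to each of 100 whisper files
    • Look at read_bytes (the number of bytes actually read from disk) in /proc//io
    • With the POSIX_FADV_RANDOM option, the amount of page cache was halved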
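
The `read_bytes` counter used in this verification can be read programmatically. The sketch below assumes the file meant on the slide is the per-process `io` file under `/proc` (the path on the slide is incomplete), and the function names are made up for illustration.

```python
def parse_proc_io(text: str) -> dict:
    """Parse the 'key: value' lines of a per-process io file into integers."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def read_bytes_of(pid="self") -> int:
    """Bytes this process caused to be fetched from storage (assumed path layout)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())["read_bytes"]


# Sample text in the same 'key: value' layout, used so the parser can be
# exercised without depending on the host.
sample = "rchar: 100\nwchar: 200\nread_bytes: 4096\nwrite_bytes: 0\n"
print(parse_proc_io(sample)["read_bytes"])
```

Taking the difference of `read_bytes_of()` before and after a batch of whisper writes gives the disk reads caused by that batch, which is the comparison the test script makes.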

  36. Contents
    1. Mackerel and time-series data
    2. Graphite architecture and performance status
    3. The disk thrashing problem and its solution
    4. Summary

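  37. Summary
    • Mackerel has to handle 1,000,000+ metrics/min of metric writes
    • Graphite was chosen as the time-series database
    • Tuning assumes ioDrive and leaves the work to the OS
    • The disk thrashing problem was solved with a patch that uses posix_fadvise to disable the page cache produced by writeback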

  38. (slide without text)

  39. Metrics at sub-minute granularity
    Long-term retention without losing granularity
    Real-time anomaly detection

  40. We want to move to a next-generation time-series database

  41. http://hatenacorp.jp/recruit/fresh/operation-engineer
    For people who love technology

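  42. The Keynote template for these slides is Zebra (http://shoya.io/blog/zebra/) by shoya140, used with thanks
    Performance Improvement of the Time-Series Database in Mackerel
    Pepabo-Hatena Tech Conference: Infrastructure Technology Foundations @ Fukuoka
    Hatena id:y_uuki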