Performance Improvement of TSDB in Mackerel

Pepabo / Hatena Tech Conference: Infrastructure Technology Platform @ Fukuoka

Yuuki Tsubouchi (yuuk1)

July 09, 2016

Transcript

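  1. Characteristics of Mackerel's time-series data
    • Agents post metrics from users' hosts every minute
    • As of 2016/01, there were 10,000+ active agents
    • Each agent sends up to 200 metrics
    • Assuming an average of 100 metrics/agent, the total comes to 1,000,000+ metrics/min
    • A database that can withstand a large volume of metric writes is required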
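  2. Graphite architecture (graphite-web)
    [Diagram: (timestamp, name, value) -> carbon -> write -> whisper files on the filesystem; graph request -> graphite-web -> read -> whisper files; response: Image or JSON]
    • graphite-web: the web application that accepts read requests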
  3. Graphite architecture (carbon)
    [Diagram: (timestamp, name, value) -> carbon -> write -> whisper files on the filesystem; graph request -> graphite-web -> read -> whisper files; response: Image or JSON]
    • carbon: the daemon that accepts write requests
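  4. Graphite architecture (whisper)
    [Diagram: (timestamp, name, value) -> carbon -> write -> whisper files on the filesystem; graph request -> graphite-web -> read -> whisper files; response: Image or JSON]
    • whisper: the library that creates and updates the time-series DB files
    • One file is created per metric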
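  5. Whisper data structure
    • Storing every data point makes disk usage balloon
    • With timestamp: 4 bytes and value: 8 bytes, i.e. 12 bytes/datapoint, one metric posted every minute takes about 6MB per year
    • Older data is rolled up over a fixed interval, keeping the average or the maximum, to save disk space
    • e.g. 1-minute-resolution data is kept for only 1 day, while 5-minute-resolution data is kept for 1 week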
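The storage figures on this slide can be checked with a little arithmetic. The following is a minimal sketch, assuming one datapoint per minute and ignoring any per-file header overhead; the function name archive_bytes is hypothetical and is not part of Whisper.

```python
BYTES_PER_POINT = 4 + 8  # timestamp (4 bytes) + value (8 bytes), as on the slide

DAY = 24 * 60 * 60
YEAR = 365 * DAY


def archive_bytes(resolution_sec: int, retention_sec: int) -> int:
    """Size of one archive that keeps `retention_sec` of data at `resolution_sec`."""
    points = retention_sec // resolution_sec
    return points * BYTES_PER_POINT


# Keeping every 1-minute datapoint for a whole year.
full_year = archive_bytes(60, YEAR)

# The rollup example from the slide: 1 day at 1-minute plus 1 week at 5-minute resolution.
rolled_up = archive_bytes(60, DAY) + archive_bytes(300, 7 * DAY)

print(f"1 year at 1-min resolution: {full_year / 1e6:.1f} MB per metric")
print(f"1 day at 1-min + 1 week at 5-min: {rolled_up / 1e3:.1f} kB per metric")
```

The two retention periods are only the illustrative values from the slide, not Mackerel's actual retention settings.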
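  6. Hardware configuration and resource usage
    • CPU: Xeon E5-2697 v3 @ 2.60GHz, 2 sockets, 28 cores
    • Memory: 126GB
    • Disk: Fusion ioMemory ioDrive2 6.4TB
    • So-called flash storage; the vendor's nominal figure is 300k write IOPS
    • Effective I/O performance: 50k ~ 100k write IOPS
    • An ordinary SSD would be doing well to reach 1/10 of that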
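  7. What happened
    • read IOPS increased while write IOPS kept decreasing
    • No swap was used due to memory shortage; OS memory usage was only about 1/3
    • There was no sudden increase in accesses to the service
    • sar -B showed that the number of page-ins and page-outs per interval had grown abnormally
    • We call this phenomenon "disk thrashing"
    • The cause was inferred from how the Linux page cache works and from Graphite's I/O pattern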
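The paging activity that sar -B reports can also be approximated from two snapshots of kernel counters. The sketch below assumes the pgpgin and pgpgout counters found in /proc/vmstat on Linux; the function names and the sample values are made up for illustration and are not measurements from the talk.

```python
def parse_vmstat(text: str) -> dict:
    """Parse 'name value' lines in the style of /proc/vmstat into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def paging_rates(before: dict, after: dict, interval_sec: float) -> dict:
    """Per-second increase of the page-in / page-out counters between two samples."""
    return {
        key: (after[key] - before[key]) / interval_sec
        for key in ("pgpgin", "pgpgout")
    }


# Made-up sample values, 60 seconds apart (illustration only).
sample_t0 = "pgpgin 1000\npgpgout 5000\n"
sample_t1 = "pgpgin 4000\npgpgout 5600\n"

rates = paging_rates(parse_vmstat(sample_t0), parse_vmstat(sample_t1), 60)
print(rates)
```

On a real host the two samples would come from reading /proc/vmstat twice with a sleep in between.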
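  8. The Linux page cache
    • Memory breakdown = used + buffers/caches + free
    • When data is read from or written to the filesystem, the OS keeps the on-disk data in memory, page by page, so that later reads are fast
    • This is called the page cache
    • The page cache follows an LRU algorithm: recently referenced cache data is kept, and old cache data that is no longer referenced is evicted
    • The page cache is normally not counted as memory usage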
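  9. Saving page cache
    • Adding RAM would solve the problem for now, but the machine already has 126GB RAM, so we wanted to cut wasted page cache instead
    • Written data is not necessarily read back soon, so do not put data into the cache at write time => Direct I/O
    • However, Direct I/O requires memory buffers aligned to the block size => very cumbersome to do in Python (malloc => posix_memalign)
    • Solved with posix_fadvise(2) instead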
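To illustrate why the alignment requirement is awkward in Python, here is a small sketch that obtains an aligned buffer through an anonymous mmap, which starts on a page boundary. It only demonstrates the alignment check; the actual O_DIRECT open and write are left out, and BLOCK_SIZE is an assumed value, since the real requirement depends on the device and filesystem.

```python
import ctypes
import mmap

BLOCK_SIZE = 4096  # assumed block size; the real value depends on the device


def aligned_buffer(size: int) -> mmap.mmap:
    """Anonymous mmap regions start on a page boundary, so they can serve as aligned buffers."""
    return mmap.mmap(-1, size)


def address_of(buf) -> int:
    """Return the memory address at which a writable buffer starts."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


buf = aligned_buffer(BLOCK_SIZE)
buf_is_aligned = address_of(buf) % mmap.PAGESIZE == 0

# An ordinary Python buffer: its address is whatever the allocator returned,
# so block alignment is not guaranteed.
plain = bytearray(BLOCK_SIZE)

print("mmap buffer page-aligned:", buf_is_aligned)
print("bytearray address:", hex(address_of(plain)))
```

Every write through Direct I/O would have to go through a buffer like this, which is the kind of bookkeeping the slide calls cumbersome.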
  10. posix_fadvise(2)
    • A process tells the kernel its access pattern for file data
    • The kernel optimizes according to the declared access pattern so that I/O performance improves
    • Access patterns
      • POSIX_FADV_SEQUENTIAL: doubles readahead
      • POSIX_FADV_RANDOM: disables readahead
      • POSIX_FADV_DONTNEED: frees cached pages
      • etc.
    int posix_fadvise(int fd, off_t offset, off_t len, int advice);
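The same call is reachable from Python's standard library as os.posix_fadvise (Python 3.3+, on platforms that provide it, such as Linux). The following is a minimal sketch on a temporary file; the file size and offsets are arbitrary, and the advice is only a hint that the kernel may handle differently depending on the filesystem.

```python
import os
import tempfile

with tempfile.TemporaryFile() as fh:
    fh.write(b"\0" * 8192)
    fh.flush()
    fd = fh.fileno()

    # offset=0 and len=0 together mean "the whole file".
    # Hint that access will be random, so readahead is not useful.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

    fh.seek(4096)
    chunk = fh.read(16)

    # Dirty pages cannot be dropped, so flush them to disk first,
    # then ask the kernel to release the cached pages.
    os.fsync(fd)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

print(len(chunk), "bytes read after advising random access")
```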
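  11. Applying posix_fadvise(2) to Graphite
    • At first we looked at the option that drops page cache
    • whisper's write logic is fairly complex, so dropping only the page cache produced by writes is difficult
    • With POSIX_FADV_RANDOM, readahead is skipped and only the pages actually needed are cached
    • whisper's write path never scans a file sequentially
    • The wasted page cache that readahead had been filling decreased
    /proc/meminfo before & after:
      Active(file): 5387160 kB    Inactive(file): 37566804 kB
      Active(file): 32252136 kB   Inactive(file): 7231020 kB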
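The two /proc/meminfo samples quoted on this slide can be compared with a short parser. This is a sketch only: the helper name file_cache_kb is hypothetical, and the samples are labeled "first" and "second" in the order they appear on the slide, on the assumption that this order matches "before & after".

```python
def file_cache_kb(meminfo_text: str) -> dict:
    """Extract the Active(file) / Inactive(file) values (in kB) from /proc/meminfo-style text."""
    wanted = ("Active(file)", "Inactive(file)")
    result = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() in wanted:
            result[name.strip()] = int(rest.split()[0])
    return result


# The two samples quoted on the slide, in the order they appear there.
first = file_cache_kb("Active(file): 5387160 kB\nInactive(file): 37566804 kB\n")
second = file_cache_kb("Active(file): 32252136 kB\nInactive(file): 7231020 kB\n")

for label, sample in (("first", first), ("second", second)):
    total = sum(sample.values())
    share = sample["Inactive(file)"] / total
    print(f"{label}: file cache {total / 1024 / 1024:.1f} GiB, inactive share {share:.0%}")
```

On a live host the same function can be fed the contents of /proc/meminfo directly.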
  12. Pull Request contents
    • The change itself is not very difficult
    • Use the fadvise module
    • If posix_fadvise shows up under strace, it is OK
    • It was unclear whether always calling fadvise is a good idea, so it can be switched on or off in the config file (off by default)
    • It took about a month to get merged
    with open(path, 'r+b') as fh:
      if CAN_FADVISE and FADVISE_RANDOM:
        posix_fadvise(fh.fileno(), 0, 0, POSIX_FADV_RANDOM)
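The guarded call on this slide can be tried in isolation. The sketch below is not the code from the pull request: it swaps the fadvise module for the standard library's os.posix_fadvise, uses a module-level FADVISE_RANDOM variable as a stand-in for the config-file switch, and introduces a hypothetical update_point helper that simply overwrites a few bytes.

```python
import os
import tempfile

# True only where the platform exposes posix_fadvise (e.g. Linux).
CAN_FADVISE = hasattr(os, "posix_fadvise")
# Stand-in for the config-file switch; off by default, as on the slide.
FADVISE_RANDOM = False


def update_point(path: str, offset: int, payload: bytes) -> bool:
    """Overwrite bytes at `offset`; return whether the random-access hint was issued."""
    advised = False
    with open(path, "r+b") as fh:
        if CAN_FADVISE and FADVISE_RANDOM:
            os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_RANDOM)
            advised = True
        fh.seek(offset)
        fh.write(payload)
    return advised


# Create a small scratch file to update.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 4096)
demo_path = tmp.name

advised_default = update_point(demo_path, 12, b"\x01" * 12)  # switch off
FADVISE_RANDOM = True
advised_enabled = update_point(demo_path, 24, b"\x02" * 12)  # switch on

with open(demo_path, "rb") as fh:
    contents = fh.read()
os.remove(demo_path)

print("advised with default config:", advised_default)
print("advised after enabling:", advised_enabled)
```

Running the script under strace, as the slide suggests, is one way to confirm that the fadvise system call is actually issued once the switch is on.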
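  13. Summary
    • Mackerel has to handle 1,000,000+ metrics/min of metric writes
    • Graphite was chosen as the time-series database
    • Tuning was left to the OS, on the premise of ioDrive
    • The disk thrashing problem was solved with a patch that uses posix_fadvise to suppress the page cache caused by writeback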