Slide 1

Slide 1 text

Improving the Performance of the Time-Series Database in Mackerel
Pepabo & Hatena Tech Conference ~Infrastructure Technology Platform~ @ Fukuoka
Hatena id:y_uuki

Slide 2

Slide 2 text

id:y_uuki (yuuki)
Web operations engineer at Hatena, roughly in my third year at the company

Slide 3

Slide 3 text

07/02 @ Kyoto
https://speakerdeck.com/yuukit/linux-network-performance-improvement-at-hatena

Slide 4

Slide 4 text

Contents
1. Mackerel and time-series data
2. Graphite's architecture and performance
3. The disk thrashing problem and its solution
4. Summary

Slide 5

Slide 5 text

Contents
1. Mackerel and time-series data
2. Graphite's architecture and performance
3. The disk thrashing problem and its solution
4. Summary

Slide 6

Slide 6 text

https://mackerel.io

Slide 7

Slide 7 text

Visualizing server metrics

Slide 8

Slide 8 text

Mackerel architecture

Slide 9

Slide 9 text

Characteristics of Mackerel's time-series data
• Agents on users' hosts post metrics every minute
• Active agents as of 2016/01: 10,000+
• Up to 200 metrics per agent
• Assuming an average of 100 metrics/agent, the total is 1,000,000+ metrics/min
• We need a database that can withstand a massive volume of writes
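The 1,000,000 metrics/min figure comes from multiplying agents by metrics per agent. Below is a minimal Python sketch of that arithmetic. It assumes the 100 metrics/agent average stated above. The 200-metric "worst case" total is an extrapolation added here for comparison, not a figure from the talk.

```python
# Back-of-the-envelope write load, using the figures from the slide above.
ACTIVE_AGENTS = 10_000        # active agents as of 2016/01 (a lower bound)
AVG_METRICS_PER_AGENT = 100   # assumed average
MAX_METRICS_PER_AGENT = 200   # per-agent maximum

def metrics_per_minute(agents: int, metrics_per_agent: int) -> int:
    """Every agent posts each of its metrics once per minute."""
    return agents * metrics_per_agent

def per_second(per_minute: int) -> float:
    return per_minute / 60

avg = metrics_per_minute(ACTIVE_AGENTS, AVG_METRICS_PER_AGENT)
peak = metrics_per_minute(ACTIVE_AGENTS, MAX_METRICS_PER_AGENT)
print(f"average:    {avg:,} metrics/min ~ {per_second(avg):,.0f} datapoints/s")
print(f"worst case: {peak:,} metrics/min ~ {per_second(peak):,.0f} datapoints/s")
```

Even the average case means roughly 16,700 datapoint writes every second, sustained around the clock.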

Slide 10

Slide 10 text

Graphite

Slide 11

Slide 11 text

Contents
1. Mackerel and time-series data
2. Graphite's architecture and performance
3. The disk thrashing problem and its solution
4. Summary

Slide 12

Slide 12 text

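What is Graphite?
• Time-series database middleware written in Python
• HTTP interface (writes use Graphite's own protocol)
• Output is a graph image or JSON
[Diagram: (timestamp, name, value) datapoints flow into Graphite; a graph request returns an image or JSON]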
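Writes use carbon's line-oriented plaintext protocol, and reads go through graphite-web's HTTP render API. Below is a minimal Python sketch of both. It assumes carbon's default plaintext port 2003 and a graphite-web at a hypothetical base URL. Mackerel's own client code is not shown in the talk, and carbon also accepts other formats, such as its pickle protocol.

```python
import socket
from urllib.parse import urlencode

CARBON_PORT = 2003  # carbon's default port for the plaintext protocol

def plaintext_line(name: str, value: float, timestamp: int) -> bytes:
    """Encode one datapoint as 'name value timestamp' plus a newline."""
    if not name or any(c.isspace() for c in name):
        raise ValueError(f"invalid metric name: {name!r}")
    return f"{name} {value} {timestamp}\n".encode("utf-8")

def send(points, host: str, port: int = CARBON_PORT) -> None:
    """Write a batch of (name, value, timestamp) tuples over one TCP connection."""
    payload = b"".join(plaintext_line(n, v, t) for n, v, t in points)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

def render_url(base: str, target: str, fmt: str = "json", since: str = "-1h") -> str:
    """Build a graphite-web render request; fmt='png' asks for an image instead."""
    query = urlencode({"target": target, "format": fmt, "from": since})
    return f"{base}/render?{query}"
```

For example, `render_url("http://graphite.example", "host1.loadavg5")` produces a URL asking for the last hour of that metric as JSON.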

Slide 13

Slide 13 text

Graphite architecture
[Diagram: carbon writes (timestamp, name, value) datapoints to whisper files on the filesystem; graphite-web reads the whisper files to answer graph requests with an image or JSON]

Slide 14

Slide 14 text

Graphite architecture (graphite-web)
[Same diagram, highlighting graphite-web]
A web application that accepts read requests

Slide 15

Slide 15 text

Graphite architecture (carbon)
[Same diagram, highlighting carbon]
A daemon that accepts write requests

Slide 16

Slide 16 text

Graphite architecture (whisper)
[Same diagram, highlighting the whisper files]
A library for creating and updating time-series DB files; each metric gets its own file
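Because whisper keeps one file per metric, carbon turns each dotted metric name into a directory path ending in a .wsp file. Below is a sketch of that mapping. The root directory is an assumption, since carbon's storage directory is configurable. The validation rule is a simplification added for illustration.

```python
from pathlib import Path

WHISPER_ROOT = Path("/opt/graphite/storage/whisper")  # assumed data directory

def metric_to_path(name: str, root: Path = WHISPER_ROOT) -> Path:
    """Map 'a.b.c' to root/a/b/c.wsp: one whisper file per metric."""
    parts = name.split(".")
    # Reject empty components ("a..b") and anything that could escape the root.
    if any(not p or "/" in p for p in parts):
        raise ValueError(f"invalid metric name: {name!r}")
    return root.joinpath(*parts[:-1], parts[-1] + ".wsp")
```

At 1,000,000 metrics this mapping means about a million separate files, which matters for the disk I/O pattern discussed later.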

Slide 17

Slide 17 text

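Whisper data structure
• Storing every datapoint would bloat disk usage
• At 12 bytes/datapoint (timestamp: 4 bytes, value: 8 bytes), one year of 1-minute data is about 6 MB per metric
• Older data is rolled up over fixed intervals, keeping the average or the maximum, to save disk space
  • e.g., keep 1-minute-resolution data for only one day, but keep 5-minute-resolution data for a week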
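The 6 MB/metric figure and the savings from roll-ups can be checked with some arithmetic. Below is a minimal Python sketch. It assumes the 12-byte datapoint from the slide and ignores whisper's small per-file header. It also uses a simplified roll-up that drops a trailing partial group; whisper's actual aggregation has more options than shown here.

```python
POINT_SIZE = 4 + 8  # timestamp (4 bytes) + value (8 bytes), per the slide

MINUTE = 60
DAY = 24 * 60 * MINUTE
WEEK = 7 * DAY
YEAR = 365 * DAY

def archive_bytes(resolution_s: int, retention_s: int) -> int:
    """Bytes to keep retention_s seconds of points spaced resolution_s apart."""
    return (retention_s // resolution_s) * POINT_SIZE

# Every 1-minute point for a year vs. the tiered example on the slide.
full_year = archive_bytes(MINUTE, YEAR)                                 # 6,307,200 bytes (~6 MB)
tiered = archive_bytes(MINUTE, DAY) + archive_bytes(5 * MINUTE, WEEK)  # 41,472 bytes

def rollup(values, factor: int, how: str = "avg"):
    """Collapse each run of `factor` consecutive points into one (average or max).

    A trailing group shorter than `factor` is dropped in this simplified version.
    """
    out = []
    for i in range(0, len(values) - factor + 1, factor):
        chunk = values[i:i + factor]
        out.append(max(chunk) if how == "max" else sum(chunk) / factor)
    return out
```

For example, `rollup([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)` turns ten 1-minute points into two 5-minute averages, `[3.0, 8.0]`.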

Slide 18

Slide 18 text

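Graphite write performance characteristics (CPU usage)
• carbon runs two cooperating threads
  • A network I/O thread that receives data
  • An I/O thread that writes files
• The network side is an event-driven server
• Datapoints are handed between the threads through a buffer
• Problem: each thread is capped by the speed of a single core
• Workaround: run multiple carbon processes and spread the load across them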
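Below is a sketch of the two-thread hand-off described above, using Python's `queue.Queue` as the buffer. The datapoints are hypothetical. carbon's real cache groups points per metric rather than using a plain FIFO. In CPython, threads in one process share the global interpreter lock, so CPU-bound work in this design is limited to roughly one core. That is consistent with scaling out by running several carbon processes.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical input for the network thread, and a dict standing in for whisper files.
incoming = [("a.cpu", 1.0, 60), ("b.cpu", 2.0, 60), ("a.cpu", 3.0, 120)]
written = defaultdict(list)

buffer = queue.Queue(maxsize=10_000)  # hand-off point between the two threads
STOP = object()                       # sentinel telling the writer to exit

def receiver() -> None:
    """Stands in for the event-driven network I/O thread."""
    for point in incoming:
        buffer.put(point)
    buffer.put(STOP)

def writer() -> None:
    """Stands in for the file-writing I/O thread."""
    while True:
        item = buffer.get()
        if item is STOP:
            break
        name, value, ts = item
        written[name].append((ts, value))

threads = [threading.Thread(target=receiver), threading.Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```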

Slide 19

Slide 19 text

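Graphite write performance characteristics (disk I/O)
• Writes a small amount of data (about 12 bytes) to each of a huge number of files, all within one minute
• These writes cannot be grouped into neighboring blocks on the filesystem, so I/O efficiency is poor (writes land all over the disk)
• On the other hand, multiple threads never write to the same file at the same time, so I/O parallelism is easy to raise
• Performance is the same even without a filesystem that excels at parallel I/O the way XFS does (e.g., ext4)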

Slide 20

Slide 20 text

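Hardware configuration and resource usage
• CPU: Xeon E5-2697 v3 @ 2.60GHz, 2 sockets, 28 cores
• Memory: 126 GB
• Disk: Fusion ioMemory ioDrive2 6.4 TB
  • So-called flash storage; the vendor's rated figure is 300k write IOPS
  • Effective I/O performance: 50k to 100k write IOPS
  • An ordinary SSD does well to reach 1/10 of that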

Slide 21

Slide 21 text

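Graphite tuning
• The CPU runs out before the ioDrive's IOPS are used up, so the approach is to save CPU and spend it on I/O
• The disk is fast and strong at random writes, so we generally keep carbon and the I/O scheduler from doing extra optimization
  • There are parameters for making I/O more efficient by sorting, and for limits that keep I/O resources from being exhausted
  • Set the I/O scheduler to noop:

    echo noop > /sys/block/fioa/queue/scheduler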
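The scheduler file lists the available I/O schedulers and puts brackets around the active one. Below is a small Python helper for checking and setting it. The device name is only an example, and writing the file requires root. On newer multi-queue kernels, the equivalent of noop is called `none`.

```python
from pathlib import Path

def active_scheduler(text: str) -> str:
    """Return the bracketed entry, e.g. 'cfq' from 'noop deadline [cfq]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler in {text!r}")

def set_scheduler(device: str, name: str) -> None:
    """Python equivalent of `echo <name> > /sys/block/<device>/queue/scheduler`."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    available = path.read_text().replace("[", "").replace("]", "").split()
    if name not in available:
        raise ValueError(f"{name!r} not offered by {device}: {available}")
    path.write_text(name)
```

Usage would look like `set_scheduler("fioa", "noop")` followed by `active_scheduler(Path("/sys/block/fioa/queue/scheduler").read_text())`.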

Slide 22

Slide 22 text

Graphite cluster configuration
[Diagram: incoming (timestamp, name, value) datapoints, graphite-web, and several groups of carbon processes, each group behind a load balancer (LB)]

Slide 23

Slide 23 text

More details on my blog: http://blog.yuuk.io/entry/high-performance-graphite

Slide 24

Slide 24 text

Contents
1. Mackerel and time-series data
2. Graphite's architecture and performance
3. The disk thrashing problem and its solution
4. Summary

Slide 25

Slide 25 text

Sudden increase in read load
[Graph: write IOPS and read IOPS over time]

Slide 26

Slide 26 text

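What happened?
• read IOPS rose while write IOPS kept falling
• Swap was not in use, so this was not a memory shortage; the OS's memory usage was only about 1/3 of the total
• There was no sudden spike in access to the service
• sar -B showed that the number of page-ins and page-outs within a given interval had increased abnormally
• We call this phenomenon disk thrashing
• We inferred the cause from how Linux's page cache works and from Graphite's I/O pattern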
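`sar -B` reports paging activity as pgpgin/s and pgpgout/s, and counters with the same names can be sampled directly from /proc/vmstat. Below is a Linux-only sketch that computes per-second deltas from two snapshots. It approximates `sar`'s figures rather than reproducing its exact output, and the helper names are hypothetical.

```python
import time

def read_counters(text: str, keys=("pgpgin", "pgpgout")) -> dict:
    """Pick the named counters out of /proc/vmstat-style text ('name value' per line)."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in keys:
            counters[name] = int(value)
    return counters

def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second deltas between two snapshots, like sar's pgpgin/s and pgpgout/s."""
    return {k: (after[k] - before[k]) / seconds for k in before}

def sample(interval: float = 1.0) -> dict:
    """Linux only: take two /proc/vmstat snapshots `interval` seconds apart."""
    with open("/proc/vmstat") as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        second = read_counters(f.read())
    return rates(first, second, interval)
```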

Slide 27

Slide 27 text

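Linux page cache
• Memory breakdown = used + buffers/cache + free
• When data is read from or written to a filesystem, the OS keeps that on-disk data in memory, page by page, so later reads are fast
  • This is called the page cache
• The page cache follows an LRU algorithm: recently referenced cached data stays, and old data that is no longer referenced is evicted
• The page cache is normally not counted as used memory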
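Below is a toy model of the LRU policy described above that counts hits and misses. It is a simplification. Linux actually approximates LRU with separate active and inactive lists (the Active(file) and Inactive(file) lines in /proc/meminfo), and the class name is hypothetical.

```python
from collections import OrderedDict

class LRUPageCache:
    """Toy LRU cache keyed by page number."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, most recently used last
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> bool:
        """Touch a page; True on a cache hit, False if it had to come from disk."""
        if page in self.pages:
            self.pages.move_to_end(page)    # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False
```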

Slide 28

Slide 28 text

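Graphite's I/O pattern
• Every active whisper file is written within one minute, so writes span a wide range of the disk
• A whisper metric update issues not only write(2) but also read(2), to read metadata and compute offsets
• The page cache applies to writes as well as reads (except with Direct I/O)
• Graphite hosts hold a large amount of page cache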
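The reason a whisper update needs read(2) as well as write(2) can be shown with a fixed-size ring of 12-byte points. Before writing, the updater must read the header and the first slot's timestamp to find where the new point belongs. The file layout below (an 8-byte header followed by the slots) is hypothetical and much simpler than whisper's real format, which supports multiple archives.

```python
import os
import struct

HEADER = struct.Struct("!LL")  # hypothetical header: step (seconds), number of slots
POINT = struct.Struct("!Ld")   # 4-byte timestamp + 8-byte value = 12 bytes

def create(path: str, step: int, slots: int) -> None:
    """Preallocate a fixed-size file: header followed by zeroed slots."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(step, slots))
        f.write(b"\0" * POINT.size * slots)

def update(path: str, timestamp: int, value: float) -> None:
    fd = os.open(path, os.O_RDWR)
    try:
        # read(2) #1: the header tells us the step and the ring size
        step, slots = HEADER.unpack(os.pread(fd, HEADER.size, 0))
        interval = timestamp - timestamp % step
        # read(2) #2: the first slot's timestamp anchors the ring buffer
        base_ts, _ = POINT.unpack(os.pread(fd, POINT.size, HEADER.size))
        if base_ts == 0:
            slot = 0  # empty ring: this point becomes the anchor
        else:
            slot = ((interval - base_ts) // step) % slots
        # write(2): a single 12-byte point at the computed offset
        os.pwrite(fd, POINT.pack(interval, value), HEADER.size + slot * POINT.size)
    finally:
        os.close(fd)
```

Every update touches the start of the file and one slot somewhere else in it. Across a million files per minute, that adds up to a lot of small, scattered reads and writes, all passing through the page cache.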

Slide 29

Slide 29 text

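Why read IOPS increased
• Many page-ins and page-outs mean that LRU is evicting old cache
• The read(2) calls made during whisper writes stopped hitting the page cache, and read IOPS rose as a result
[Diagram: memory split into used and page cache, with page-in and page-out]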
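Graphite touches every file once per minute, and under that pattern LRU can collapse. The sketch below shows one way this happens: it replays a cyclic access sequence through an LRU cache that is just one entry too small. This is an idealized illustration with hypothetical sizes, not a measurement. Real kernels partly mitigate the effect with the active/inactive split.

```python
from collections import OrderedDict

def lru_hit_rate(accesses, capacity: int) -> float:
    """Replay accesses through an LRU cache of `capacity` entries; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)
            hits += 1
        else:
            cache[page] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(accesses)

def cyclic_writes(n_files: int, minutes: int):
    """Hypothetical workload: each minute, touch every file once in the same order."""
    return [f for _ in range(minutes) for f in range(n_files)]

workload = cyclic_writes(n_files=1000, minutes=10)
print(lru_hit_rate(workload, capacity=1000))  # 0.9: only the first minute misses
print(lru_hit_rate(workload, capacity=999))   # 0.0: each miss evicts the next file needed
```

A cache just 0.1% too small turns a 90% hit rate into a 0% hit rate. The slide's explanation makes the same point: once the working set outgrows the page cache, the reads inside whisper writes fall through to disk.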

Slide 30

Slide 30 text

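Reducing page cache
• Adding RAM would solve the problem for now, but the hosts already have 126 GB, so we would rather cut wasteful page cache
• Data that has just been written is not necessarily read soon, so avoid caching it at write time => Direct I/O
• However, Direct I/O requires memory buffers aligned to the block size => very cumbersome to do from Python (malloc => posix_memalign)
• Solved with posix_fadvise(2) instead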
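The sketch below shows why Direct I/O is awkward here: O_DIRECT requires the buffer, the file offset, and the length to be aligned to the device's block size. It is Linux-only Python and assumes a 4096-byte block size. It uses an anonymous mmap as a page-aligned buffer, standing in for posix_memalign. Rounding a 12-byte datapoint up to a full block overwrites neighboring data. A real whisper update would therefore also have to read and patch the surrounding block first.

```python
import mmap
import os

BLOCK = 4096  # assumed logical block size; check the actual device before relying on it

def align_up(n: int, block: int = BLOCK) -> int:
    """Round n up to the next multiple of block (block must be a power of two)."""
    return (n + block - 1) & ~(block - 1)

def direct_write_blocks(path: str, offset: int, data: bytes) -> None:
    """Write data with O_DIRECT (Linux only), zero-padding it to whole blocks.

    The padding overwrites whatever else lives in those blocks.
    """
    if not data:
        return
    if offset % BLOCK:
        raise ValueError("O_DIRECT needs a block-aligned file offset")
    buf = mmap.mmap(-1, align_up(len(data)))  # anonymous mmap memory is page-aligned
    try:
        buf.write(data)  # bytes beyond len(data) stay zero
        fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
        try:
            os.pwrite(fd, buf, offset)
        finally:
            os.close(fd)
    finally:
        buf.close()
```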

Slide 31

Slide 31 text

posix_fadvise(2)
• A process tells the kernel how it intends to access a file's data
• The kernel optimizes for the declared access pattern to improve I/O performance
• Access patterns
  • POSIX_FADV_SEQUENTIAL: doubles read-ahead
  • POSIX_FADV_RANDOM: disables read-ahead
  • POSIX_FADV_DONTNEED: releases cached pages
  • etc.

    int posix_fadvise(int fd, off_t offset, off_t len, int advice);
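Python exposes posix_fadvise(2) directly as `os.posix_fadvise` on Linux and other Unix systems (Python 3.3+). Below is a minimal sketch of the two kinds of advice discussed in this section. The helper names are hypothetical.

```python
import os

def open_for_random_access(path: str) -> int:
    """Open a file read/write and advise random access (on Linux, disables read-ahead)."""
    fd = os.open(path, os.O_RDWR)
    try:
        # offset=0 and len=0 apply the advice to the whole file
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    except OSError:
        os.close(fd)
        raise
    return fd

def drop_cached_pages(path: str) -> None:
    """Ask the kernel to release a file's cached pages.

    Dirty pages that have not been written back yet are not dropped.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```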

Slide 32

Slide 32 text

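Applying posix_fadvise(2) to Graphite
• At first we focused on the option that drops page cache (POSIX_FADV_DONTNEED)
  • whisper's write logic is fairly complex, so it was hard to drop only the page cache created by writes
• With POSIX_FADV_RANDOM, the kernel skips read-ahead and caches only the pages that are actually needed
  • whisper writes never involve sequential reads
• The wasteful page cache that read-ahead had been filling went down

/proc/meminfo before & after:
  Before: Active(file): 5387160 kB, Inactive(file): 37566804 kB
  After:  Active(file): 32252136 kB, Inactive(file): 7231020 kB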
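The before/after numbers are easier to compare in GiB. Below is a small parser for /proc/meminfo text, fed with the figures from the slide. It assumes the first pair is "before" and the second is "after", following the order on the slide.

```python
def file_cache_kb(meminfo: str) -> dict:
    """Extract Active(file) and Inactive(file), in kB, from /proc/meminfo text."""
    wanted = {"Active(file)", "Inactive(file)"}
    out = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            out[key] = int(rest.split()[0])
    return out

before = file_cache_kb("Active(file): 5387160 kB\nInactive(file): 37566804 kB\n")
after = file_cache_kb("Active(file): 32252136 kB\nInactive(file): 7231020 kB\n")
for key in ("Active(file)", "Inactive(file)"):
    print(f"{key}: {before[key] / 2**20:.1f} GiB -> {after[key] / 2**20:.1f} GiB")
```

On a live host, the same function can be applied to `open("/proc/meminfo").read()`.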

Slide 33

Slide 33 text

Pull request to Graphite

Slide 34

Slide 34 text

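Pull request contents
• The change itself is not very difficult
  • Use the fadvise module
  • If posix_fadvise shows up in strace output, it is working
• Since it is unclear whether always calling fadvise is a good idea, it can be turned on or off in the configuration file (off by default)
• It took about a month to get merged

    from fadvise import posix_fadvise, POSIX_FADV_RANDOM

    with open(path, 'r+b') as fh:
        # Disable read-ahead for this whisper file when the option is enabled
        if CAN_FADVISE and FADVISE_RANDOM:
            posix_fadvise(fh.fileno(), 0, 0, POSIX_FADV_RANDOM)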

Slide 35

Slide 35 text

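Verification with a test script: https://gist.github.com/yuuki/8d5d386115b0f01b5371
• Uses whisper's write function to check whether the amount of page cache actually goes down
  • A script that writes 100 datapoints to each of 100 whisper files
  • Check read_bytes in /proc//io (the number of bytes actually read from disk)
• With the POSIX_FADV_RANDOM option, the amount of page cache was halved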
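The gist itself is not reproduced here. The following is a hypothetical helper in the same spirit: it reads read_bytes from /proc/self/io before and after a workload. Per proc(5), read_bytes counts bytes that actually had to be fetched from storage, whereas rchar counts every byte passed to read(2). The first counter therefore reflects page cache misses.

```python
def io_counters(text: str) -> dict:
    """Parse /proc/<pid>/io-style 'name: value' lines into a dict of ints."""
    out = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            out[name.strip()] = int(value)
    return out

def read_bytes_during(fn, *args) -> int:
    """Bytes this process fetched from storage while running fn (Linux only)."""
    def snapshot() -> int:
        with open("/proc/self/io") as f:
            return io_counters(f.read())["read_bytes"]
    start = snapshot()
    fn(*args)
    return snapshot() - start
```

A comparison would call `read_bytes_during` on the same write workload twice, once with the fadvise option off and once with it on. The workload function, for example one that updates 100 whisper files, is left hypothetical here.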

Slide 36

Slide 36 text

Contents
1. Mackerel and time-series data
2. Graphite's architecture and performance
3. The disk thrashing problem and its solution
4. Summary

Slide 37

Slide 37 text

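Summary
• Mackerel needs to handle writes of 1,000,000+ metrics/min
• We chose Graphite as the time-series database
• Tuning assumes the ioDrive and leaves the rest to the OS
• Solved the disk thrashing problem with a patch that uses posix_fadvise (POSIX_FADV_RANDOM) so that whisper's write path no longer fills the page cache through read-ahead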

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Metrics at sub-minute granularity
Long-term retention without losing granularity
Real-time anomaly detection

Slide 40

Slide 40 text

We want to take on a next-generation time-series database

Slide 41

Slide 41 text

For people who love technology: http://hatenacorp.jp/recruit/fresh/operation-engineer

Slide 42

Slide 42 text

This deck uses shoya140's Zebra (http://shoya.io/blog/zebra/) as its Keynote template.