
What We Have Done in the Six Months Since the SRE Group Was Formed

Isao Shimizu
January 30, 2017


SRE Tech Talks #2: an introduction to SRE at XFLAG Studio, plus tuning of MySQL, InnoDB, and THP


Transcript

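  1. About me
    - Isao Shimizu (清水 勲), @isaoshimizu
    - 2011.8-2014.3: operations for the SNS mixi
    - XFLAG Studio
      - 2014.4-2016.6: server engineer
      - 2016.7-: SRE
      - Mainly supporting the domestic (Japanese) version of Monster Strike
      - Also Monst Stadium, ブラナイDASH, and others
    - Likes: Linux, MySQL, nginx, Memcached, and middleware in general; Golang; craft beer
    - 2016.3.1: presented "モンストを支えるインフラの今とこれから" (The present and future of the infrastructure behind Monster Strike) at dots. Conference Spring 2016
      - https://speakerdeck.com/isaoshimizu/monsutowozhi-eruinhurafalsejin-tokorekara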
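  2. SRE at XFLAG Studio
    - The SRE group was established in July 2016
    - Becoming SRE did not suddenly change much; many activities continue from before
    - The mission of the SRE team became easier to understand
      - An organizational structure that is easy to understand from both inside and outside the company
      - Separated from game feature development
      - Focused on load countermeasures, efficiency, automation, and so on
    - Members complement each other's strengths and weaknesses while getting the work done
    - An on-call rotation prepares us for unexpected situations
    - Core duties
      - Infrastructure and server operations, improving availability and performance, log collection, maintaining development environments, security measures, tool development
      - Deploying to production/staging environments and importing game data
      - For development environments before staging, developers can deploy and import by themselves via a Slack bot
      - Look at graphs, read logs, understand the code, share problems in GitHub Issues, and resolve them with Pull Requests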
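  3. On-call rotation
    - 11 people in total take on-call duty (the SRE group currently has 7 members)
    - Two people per shift, rotating weekly
    - During long holidays such as Golden Week and the year-end/New Year period, shifts change daily
    - As a rule, the on-call engineers provide first response outside regular hours and on days off
    - Nagios detects failures and, integrated with PagerDuty, notifies the on-call engineers
    - Not only for failures: so that urgent requests from non-engineers can also be received, there is a mechanism to notify the on-call engineers from Slack via a bot
    - In some cases we escalate and respond with all hands
    - The current on-call system was apparently set up more than five years ago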
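  4. Hybrid on-premises and cloud architecture
    - This is about Monster Strike (Japanese version)
    - The infrastructure is primarily on-premises (physical servers) across multiple data centers
      - The global version and other projects mostly use AWS
    - We also actively use the cloud
      - Mainly for production App servers: several hundred instances in total (varies by period)
        - AWS c4.4xlarge (16 cores, 30GB)
        - GMOアプリクラウド N-4096 (40 cores, 96GB)
      - Otherwise: Route 53, CloudFront, TURN servers, development environments, websites, and so on
    - Currently we operate around 1,300 servers in total, including non-production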
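  5. An example of year-end/New Year preparations
    - Reinforcing App servers
      - Temporarily increased the total core count (logical cores) from 10,000 to 15,000
      - Adjusted the number of workers
    - Databases
      - Sharded heavily loaded DBs
      - Reduced the number of queries issued
      - Scale-up
        - Replaced storage with ioDrive, ioMemory
        - Replaced old machines and instances
      - Tuning (described later)
        - Optimized MySQL (InnoDB) settings
        - OS update
    - Leap second handling
      - Use FM-radio-based NTP servers (appliances), with no dependency on external services
      - Configured to absorb the one second by gradually slowing the clock starting 6 hours before 9:00 (JST) on January 1, 2017 (the LI, Leap Indicator, is not inserted)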
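As a rough check on the leap second setting above, the clock rate change it implies can be worked out directly. This is a minimal sketch that assumes the one second is spread evenly (linearly) over the 6-hour window; the slide does not say which smoothing curve the appliance uses, and `slew_ppm` is a hypothetical helper name.

```shell
# Assumption: the 1 second is absorbed at a constant rate over the window.
# slew_ppm SECONDS HOURS -> required clock rate change in parts per million.
slew_ppm() {
    awk -v s="$1" -v h="$2" 'BEGIN { printf "%.1f\n", s / (h * 3600) * 1000000 }'
}

slew_ppm 1 6   # prints 46.3
```

Under that assumption the clock runs about 46 ppm slow for six hours, a small enough change that applications see no step in time.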
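  6. MySQL
    - Standardized on the MariaDB 5.5 series
    - Provisioned with Chef
    - Disks follow one of two patterns: SSD or ioDrive/ioMemory (no SAS)
    - Naturally, the usual InnoDB settings have all been applied (a my.cnf we have kept improving since the SNS days)
    - This time we mainly tuned two settings (next page)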
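  7. MySQL (InnoDB) tuning ①
    - innodb_io_capacity
      - We had been setting one fixed value for SSD and another for ioDrive/ioMemory, but graph observation showed room to vary it by use case
      - Decide the optimal value while watching the balance of dirty pages rate, Checkpoint Age, disk I/O, CPU usage, and so on
      - Try it on an unused Slave before applying it to the Master
      - In one case we changed it from 10,000 to 20,000 on ioDrive
      - As a result, replication delay was mitigated and the number of spin locks decreased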
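  8. MySQL (InnoDB) tuning ②
    - innodb_spin_wait_delay
      - Until now we had used the default value 6 (in other words, it was unset)
    - From here: synchronization in MySQL
      - Thread synchronization preferentially uses spin locks (also called busy waiting)
      - An ordinary mutex keeps waiting for a signal when it cannot acquire the lock, whereas a spin lock keeps trying to acquire the lock in a loop
      - The thread keeps looping to check whether the lock is available
      - However, the loop keeps consuming CPU while it waits for the lock to be released, so the technique is said to be effective in multi-core environments where locks are released quickly
      - When the number of lock-wait loops exceeds innodb_sync_spin_loops, the wait is handed over to the OS mutex (pthread_cond_wait)
      - Using the mutex provided by the OS is generally considered more costly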
  9. MySQL (InnoDB) tuning ③
    - MySQL's spin lock keeps trying to acquire the lock until innodb_sync_spin_loops iterations have run, inserting a random wait in the range 0 to innodb_spin_wait_delay (default 6)
    - MariaDB code example
      - https://github.com/MariaDB/server/blob/mariadb-5.5.54/storage/innobase/sync/sync0sync.c#L527
    - Example of an actual graph with innodb_spin_wait_delay=50 set: os_waits decreased after the change
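  10. OS update
    - When the service launched in 2013, we used Ubuntu Server 12.04 LTS
    - Currently we use Ubuntu Server 14.04 LTS
      - We are gradually replacing 12.04, which reaches EOL in April 2017, with 14.04
      - 16.04 was postponed for now because systemd support and the large kernel differences made the work look somewhat costly (we would like to update at an early stage)
    - What happened when we started using 14.04
      - The Transparent Huge Page problem
      - Performance changes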
  11. The Transparent Huge Page problem ②
    - We found that on Kernel 3.13 and later, in memory-heavy cases such as MySQL, the growth in memory usage is caused by Transparent Huge Pages (THP)
    - THP was merged in Linux 2.6.38 (January 2011)
      - http://kernelnewbies.org/Linux_2_6_38#head-f28790278bf537b4c4869456ad7b84425298708b
    - The value of /sys/kernel/mm/transparent_hugepage/enabled controls whether THP is enabled or disabled
      - On Ubuntu, the default is madvise on Kernel 3.8 and always on Kernel 3.13 and later, so THP is always enabled on 3.13 and later
      - With madvise, THP is not used unless it is explicitly requested
      - With never, THP is not used
    - RHEL Performance Tuning Guide https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html
      - "THP is not recommended for database workloads."
    - When THP is enabled, the AnonHugePages value in /proc/meminfo increases
    - In fact, this is a familiar setting for MongoDB users
      - https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
    - Brendan Gregg's slides helped a great deal
      - http://www.slideshare.net/brendangregg/linux-systems-performance-2016
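The two files mentioned on this slide can be inspected with a couple of small helpers. This is only a sketch: the function names `thp_mode` and `anon_huge_kb` are hypothetical, and the optional path argument is an assumption added so that the functions can be tried against sample content instead of the live system.

```shell
# Print the active THP mode, i.e. the bracketed token in the "enabled" file.
# $1 (optional, hypothetical) overrides the path for experiments.
thp_mode() {
    file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}

# Print the AnonHugePages value (in kB) from a meminfo-style file.
anon_huge_kb() {
    file="${1:-/proc/meminfo}"
    awk '$1 == "AnonHugePages:" { print $2 }' "$file"
}

# Try it on sample content rather than the real sysfs file:
sample=$(mktemp)
echo 'always [madvise] never' > "$sample"
thp_mode "$sample"    # prints: madvise
rm -f "$sample"
```

Called without arguments on a Linux host, `thp_mode` reports the current mode and `anon_huge_kb` shows how much anonymous memory is currently backed by huge pages.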
  12. The Transparent Huge Page problem ④
    - About the sysbench runs used for verification
      - sysbench --test=oltp.lua --oltp-table-size=300000000 --mysql-table-engine=innodb --num-threads=10000 --max-requests=10000000 --max-time=0 --oltp-test-mode=complex --mysql-host=<IP of the local machine> --mysql-user=benchuser --mysql-password=benchpass run
      - The SUM and DISTINCT queries were removed from oltp.lua (these functions are not used in production)
    - Elapsed time
      - Kernel 3.8: 362 minutes
      - Kernel 3.13: 361 minutes
      - Kernel 4.2: 313 minutes
      - Only 4.2 was remarkably fast
    - Thread-only benchmark (average of 5 runs of sysbench --test=threads --num-threads=$NUM_THREADS --max-time=0 run)
      - Kernel 3.8: 5.2s
      - Kernel 3.13: 6.0s
      - Kernel 3.16: 3.5s
      - Kernel 4.2: 4.0s
      - Thread handling may have improved from Kernel 3.16 onward (?)
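To put the benchmark figures above in relative terms, the reduction in elapsed time can be computed as follows. The helper name `pct_faster` is hypothetical; the inputs are the numbers reported on the slide.

```shell
# pct_faster OLD NEW -> percentage reduction in elapsed time from OLD to NEW.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

pct_faster 362 313   # OLTP run, Kernel 3.8 -> 4.2: prints 13.5
pct_faster 6.0 3.5   # threads test, Kernel 3.13 -> 3.16: prints 41.7
```

So the OLTP run finished roughly 13.5% sooner on Kernel 4.2 than on 3.8, while the thread-only test shows a much larger relative gap between 3.13 and 3.16.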
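  13. Supplement: memory problems related to NUMA and THP ①
    - In recent machines, memory is partitioned per CPU so that each CPU can access its nearby memory at high speed (NUMA = Non-Uniform Memory Access).
    - By default under NUMA, the memory node closest to the CPU a process runs on is used preferentially. In memory-heavy cases, once the near node is exhausted the process starts using the other, more distant memory node, but swap can occur at that point (a problem known as "swap insanity"). Swapping degrades performance.
    - By explicitly using the interleave policy, memory nodes can be selected round-robin instead of preferring the node near the CPU. Launching a process with the --interleave option of the numactl command applies the interleave policy to that process. This is recommended for memory-heavy cases because it can prevent swapping.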
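  14. Supplement: memory problems related to NUMA and THP ②
    - THP enlarges the page size from 4KB to 2MB, which makes more efficient use of the TLB (Translation Lookaside Buffer), the cache of PTEs (Page Table Entries). The downsides are that memory allocation takes longer because pages are larger, and that allocations of 2MB or less waste memory.
    - Under NUMA, allocating 2MB with a 4KB page size spreads the memory across nodes near the CPU, whereas allocating 2MB with a 2MB page size places it on one specific node only, which can add overhead when it is accessed from a distant CPU ( http://www.fabiengaud.net/resources/gaud14large-slides.pdf ).
    - A feature called NUMA Auto Balancing automatically migrates threads and processes closer to the memory they reference, or migrates the referenced data to nearby memory, but the migration can increase page faults in some cases.
    - When zone_reclaim_mode is enabled, memory near the CPU is used aggressively, even at the cost of discarding cache. It is troublesome because the kernel looks at the distance between memory nodes and automatically sets it to 1 when they are far apart, so for memory-hungry workloads the standard practice is to pin it to 0 ( zone_reclaim_mode = 0 ).
    - In summary, on NUMA systems, and especially for memory-heavy databases, it is often harmful not to disable THP, NUMA Auto Balancing, and zone_reclaim_mode.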
  15. Supplement: memory problems related to NUMA and THP ③
    - Actual settings
      - echo never > /sys/kernel/mm/transparent_hugepage/enabled
      - echo 0 > /proc/sys/kernel/numa_balancing
      - zone_reclaim_mode = 0
    - THP and NUMA Auto Balancing can be disabled through the GRUB configuration
      - Append transparent_hugepage=never numa_balancing=disable to /etc/default/grub and run update-grub
    - We apply these settings during initial OS setup
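After a reboot it can be useful to confirm that the two sysctl-backed values listed above really are 0. The following is a minimal sketch, not the author's tooling: `check_value`, `check_numa_settings`, and the `ROOT` override are hypothetical names introduced for illustration, and `/proc/sys/vm/zone_reclaim_mode` is the standard procfs location of the `zone_reclaim_mode` setting.

```shell
# Compare the first word of a file with an expected value.
# Prints "ok" or "mismatch" with the path; returns non-zero on mismatch.
check_value() {
    file="$1"; expected="$2"
    actual=$(awk 'NR == 1 { print $1 }' "$file" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
        echo "ok $file"
    else
        echo "mismatch $file (got '$actual', want '$expected')"
        return 1
    fi
}

# ROOT is a hypothetical prefix so the checks can run against a copy of the
# files (for example in a test directory) instead of the live /proc.
check_numa_settings() {
    root="${ROOT:-}"
    rc=0
    check_value "$root/proc/sys/kernel/numa_balancing" 0 || rc=1
    check_value "$root/proc/sys/vm/zone_reclaim_mode" 0 || rc=1
    return $rc
}

# Usage on a real host: check_numa_settings && echo "NUMA settings as expected"
```

Returning a non-zero status on any mismatch makes the function easy to call from a provisioning or monitoring script.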
  16. Supplement: memory problems related to NUMA and THP ④
    - Amazon Linux AMI 2016.09.1.20170119 x86_64 HVM (Kernel 4.4.41-36.55)
      - $ cat /sys/kernel/mm/transparent_hugepage/enabled
      - always [madvise] never
    - Red Hat Enterprise Linux 7.3 (HVM) (Kernel 3.10.0-514)
      - $ cat /sys/kernel/mm/transparent_hugepage/enabled
      - [always] madvise never
    - CentOS Linux 7 (Kernel 3.10.0-327.10.1)
      - $ cat /sys/kernel/mm/transparent_hugepage/enabled
      - [always] madvise never
    - Ubuntu Server 16.04 LTS (HVM) (Kernel 4.4.0.53)
      - $ cat /sys/kernel/mm/transparent_hugepage/enabled
      - [always] madvise never
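  17. A little promotion
    - [Hosted by DMM GAMES!] "複雑・大規模webサービスを支える技術勉強会" (a study session on the technology behind complex, large-scale web services)
    - Tuesday, February 7, 2017, 19:30-22:00, right here at dots.!
    - https://eventdots.jp/event/610781 (by lottery)
    - Ito, the leader of our SRE group, will speak under the title "大きくなったシステムを改善していくということ" (What it means to keep improving a system that has grown large). Please register.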
  18. References
    - Linux Systems Performance 2016 http://www.slideshare.net/brendangregg/linux-systems-performance-2016
    - Performance Analysis and Tuning - Part 1 http://www.slideshare.net/tommylee98229/shak-larryjederperfandtuningsummit14part1final
    - Optimizing Linux Memory Management for Low-latency / High-throughput Databases https://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases
    - zone_reclaim_modeのデフォルト値が変わります (The default value of zone_reclaim_mode is changing) http://mkosaki.blog46.fc2.com/blog-entry-936.html
    - MySQL Performance: InnoDB IO Capacity & Flushing http://dimitrik.free.fr/blog/archives/2010/07/mysql-performance-innodb-io-capacity-flushing.html
    - MySQLやSSDとかの話 後編 (On MySQL, SSDs, and related topics: part 2) http://www.slideshare.net/takanorisejima/mysqlssd-56045479
    - InnoDB の mutex の話(入門編) (On InnoDB mutexes: introduction) http://labs.gree.jp/blog/2015/12/15043/
    - show innodb status http://www.slideshare.net/myfinder/show-innodb-status-6345058
    - Dead Lock Analysis of spin_lock() in Linux Kernel (english) http://www.slideshare.net/SneekerYeh/dead-lock-analysis-of-spin-lock-in-linux-kernel-english
    - 仮想メモリーを支えるもうひとつのキャッシュ TLB (The TLB: another cache that supports virtual memory) http://ascii.jp/elem/000/000/567/567889/