SRE大全 メルカリ編 前半 #hbstudy 75 / SRE Taizen Mercari 1 hbstudy#75

kazeburo
August 21, 2017

Transcript

  1. Me
     • Masahiro Nagano / 長野雅広
     • @kazeburo
     • Mercari, Inc: Principal Engineer, Site Reliability Engineering (SRE) Team
     • BASE, Inc: Technical Advisor
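  2. Me
     • ~ 2006: Joined a startup in Kyoto
       • Looked after the infrastructure while doing development; also did data center (DC) work
       • Worked in a cycle of tuning the application and using the freed-up resources to add new features
       • mod_perl, reverse proxy with Squid
     • 2006 ~: mixi
       • "Application Operations Team" / stopped going to the DC
       • Large-scale image delivery / memcached / Q4M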
  3. Me
     • 2010 ~: livedoor (NHN Japan => LINE)
       • Improved infrastructure and performance across livedoor and LINE family services
       • MySQL tuning for livedoor blog
       • GrowthForecast / HRForecast / Plack Optimization / MHA
     • 2015/02: mercari
  4. Recent activities
     • Talks
       • AWS Dev Day Tokyo 2017
       • YAPC::Fukuoka 2017, YAPC::Hokkaido 2016
     • Articles
       • WEB+DB PRESS Vol.88, Vol.92-97 (series)
       • Nikkei SYSTEMS, July 2017 issue; ITPro
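  5. How I met SRE
     • 2012/7: It started with an IRC conversation with a friend
       • SRE is the team responsible for keeping infrastructure and services running and stable
       • https://research.googleblog.com/2012/07/site-reliability-engineers-solving-most.html
         (this was around the time that article was published)
     • Added "Site Reliability" to my Twitter bio and presentation slides to keep it in mind
       • https://www.slideshare.net/kazeburo/yapc2102mysql/2 (2012/9)
     • 2015/11: Adopted as the team name at Mercari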
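  6. Scope of Mercari SRE's work (spanning Operations to Software Eng.)
     • Infrastructure build-out
     • OnCall (incident response)
     • Automation
     • Scalability and availability improvements
     • DBA, middleware setup
     • Application design review
     • Building and operating the log collection and analysis platform
     • Server provisioning and deployment, preparing the microservices platform
     • Security / fraud detection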
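  7. SRE duty / OnCall
     • Rotates weekly, running from Sunday 0:00 through Saturday 24:00
     • Receives alerts and handles the first-line (stopgap) response
     • On weekdays, on standby at home until team members arrive at the office
     • On standby at home from 9:00, responding to requests from the US and similar
     • Handling requests from the UK can also mean night-time work
     • Even on days off, you should be able to start responding within 15-20 minutes, which restricts what you can do
     • Late-night and holiday responses also need your family's cooperation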
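The weekly rotation above (Sunday 0:00 through Saturday 24:00) can be sketched in Python as follows. This is only an illustration: `oncall_shift`, `oncall_member`, the member list, and the `epoch` argument are hypothetical names and are not taken from Mercari's tooling.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def oncall_shift(now: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) window of the weekly shift that contains `now`."""
    # datetime.weekday() gives Monday=0 ... Sunday=6, so Sunday is 0 days
    # after the shift start, Monday is 1 day after it, and so on.
    days_since_sunday = (now.weekday() + 1) % 7
    start = (now - timedelta(days=days_since_sunday)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return start, start + timedelta(days=7)


def oncall_member(now: datetime, members: List[str], epoch: datetime) -> str:
    """Pick who is on duty by counting whole shifts since the shift containing `epoch`.

    `members` and `epoch` are assumptions made for this sketch.
    """
    start, _ = oncall_shift(now)
    epoch_start, _ = oncall_shift(epoch)
    weeks = (start - epoch_start).days // 7
    return members[weeks % len(members)]
```

For example, `oncall_shift(datetime(2017, 8, 21, 10, 30))` returns the window from Sunday 2017-08-20 00:00 up to (but not including) Sunday 2017-08-27 00:00.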
  8. [Digression] A mackerel-plugin I made recently
     check-machine-exceptions: finds memory errors

         #!/usr/bin/perl
         use HTTP::Date;

         my $NUM_LOG_WATCH = 1000;   # number of trailing log lines to scan
         my $CHECK_RANGE   = 300;    # 5min
         my $exceptions    = 0;
         my $now           = time;

         open( my $messages_tail, "-|", "tail", "-$NUM_LOG_WATCH", "/var/log/messages" )
             or die $!;
         while (<$messages_tail>) {
             # skip lines that are not Machine Check Exceptions
             if ( $_ !~ m!Machine Check Exception! ) {
                 next;
             }
             # extract the leading syslog timestamp (Mon DD HH:MM:SS)
             if ( my ($time) = ( $_ =~ m!^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s! ) ) {
                 $time = str2time($time);
                 # count only entries from the last $CHECK_RANGE seconds
                 if ( $now - $time < $CHECK_RANGE ) {
                     $exceptions++;
                 }
             }
         }
         if ( $exceptions > 0 ) {
             print "CRITICAL: Machine Check Exception Found in this 5 minutes\n";
             exit 2;
         }
         print "OK: No Machine Check Exception found\n";
         exit 0;

     Example of the kernel messages it looks for:

         % dmesg | tail
         sbridge: HANDLING MCE MEMORY ERROR
         CPU 0: Machine Check Exception: 0 Bank 8: cc0427c000010090
         TSC 0 ADDR 37805ac0 MISC 45048ce86
         PROCESSOR 0:406f1 TIME 1495654896 SOCKET 0 APIC 0
         [Hardware Error]: Machine check events logged
         EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#0_Ha#0_Channel#0_DIMM": 4255 Unknown error(s): memory read on FATAL area OVERFLOW: cpu=0 Err=0001:0090 (ch=0), addr = 0x37805ac0 => socket=0, ha=1, Channel=0(mask=1), rank=0
  9. [Digression] A mackerel-plugin I made recently
     check-raid-disk: monitors RAID status using MegaCli

         #!/bin/sh
         set -e

         # nothing to check on hosts without MegaCli
         if [ ! -f /opt/MegaRAID/MegaCli/MegaCli64 ]; then
             exit
         fi

         # any physical disk whose state is not "Online, Spun Up" is critical
         if ( /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL|grep 'Firmware state'|grep -v "Online, Spun Up" > /dev/null 2>&1 ); then
             /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL|grep 'Firmware state'
             exit 2
         fi

         # all disks healthy: print the states and report OK
         /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL|grep 'Firmware state'
         exit 0
  10. The slacklog command

         $ slacklog -t alert-information --notify -- perl -e 'die "TEST!"'

      • Written with @kazuho's cronlog as a reference
      • Detects failures of batch and backup jobs
      • slackboard: aggregates notifications
        • https://github.com/cubicdaiya/slackboard
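  11. PagerDuty
      • Can notify through a variety of channels
        • mail
        • SMS
        • App
        • Phone call
      • Operated with a rule that places a phone call once every 12 minutes
      • The App notifications are convenient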
  12. Mercari as seen from the system
      [Diagram of users, DB and Search. Labels: "List an item!", "TL display", "reflected in search", "large volume of requests", "request / response", "Purchase!", "a few seconds to 30 seconds", "a few seconds or more"]
      • Handles a large volume of transactions at high speed
      • Images, payments, AI
  13. Infrastructure
      • Regions: JP, US, UK
      • DNS: Amazon Route53
      • CDN: Akamai, Fastly, ImageFlux
      • Storage: Amazon S3
      • Analysis: Google BigQuery
      • Payment / logistics services (shown three times in the diagram)
  14. Architecture
      • A three-tier "plus alpha" architecture
        • Reverse Proxy (nginx)
        • Application (Apache+mod_php)
        • Database (MySQL)
        • Cache (memcached)
        • Search (Solr)
      • Mostly built on "dedicated servers"
      • Diagonal Scale: scaling out and scaling up at the same time
      • Mainly uses servers with 24 to 56 cores
      • Database servers are equipped with ioMemory or NVMe
      [Diagram labels (JP): DNS-RR; nginx ×3; App ×6; MySQL ×2; memcached ×2; Solr ×2; util ×2; cloud ×2]
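  15. Timeline
      • 6/22
        • 9:41  Switched over to the new CDN (the problem begins)
        • 14:41  Customer support confirmed inquiries from customers and reported them internally
        • 15:05  Aborted the CDN switch and reverted to the previous CDN
        • 15:16  Put the web version of Mercari into maintenance mode
        • 15:38  Deactivated the configuration on the new CDN and blocked access
        • 15:47  Ended maintenance mode for the web version of Mercari
        • 17:55  Posted a notice on the corporate site
        • 20:45  Published the details on the Tech blog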
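  16. Cache behavior of the new CDN
      • Disabling the cache requires "Cache-Control: private" or "Set-Cookie"
      • "Cache-Control: no-cache" and "no-store" are ignored
      • The Expires header is also used, but if the date cannot be parsed or is in the past, it is treated as "0 seconds"
        • "A 0-second cache exists"
      • (The above can be customized through configuration)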
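As a rough model of the rules listed on this slide, the following Python sketch decides whether a response would be cached and for how long. It is not the CDN vendor's implementation: `cdn_cache_ttl` is a hypothetical function, and `default_ttl` (the TTL used when there is no Expires header) is an assumption, since the slide does not say what happens in that case.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def cdn_cache_ttl(headers: Mapping[str, str], now: datetime,
                  default_ttl: int = 60) -> Optional[int]:
    """Return None if the response is not cached, otherwise the TTL in seconds.

    `now` must be a timezone-aware datetime.
    """
    h = {k.lower(): v for k, v in headers.items()}
    directives = {d.strip().lower() for d in h.get("cache-control", "").split(",")}

    # Only these two signals disable caching according to the slide.
    if "private" in directives or "set-cookie" in h:
        return None

    # "no-cache" and "no-store" are deliberately not checked: they are ignored.
    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
        except (TypeError, ValueError):
            return 0  # unparsable date: treated as a 0-second cache
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        # A date in the past is also treated as a 0-second cache.
        return max(0, int((expires - now).total_seconds()))

    return default_ttl  # assumption: not stated on the slide
```

Note that a response carrying `Cache-Control: no-cache, no-store` together with an Expires date in the past yields `0` here, not `None`: under these rules it is still a cacheable response with a zero-second lifetime.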
  17. The 0-second cache
      • While a request from the CDN to the origin is being processed, if further requests arrive for the same URL, they wait for the first response, and that same response is returned to the second and later requests as well
      • This is very effective for static content
      • We had not understood this behavior
      [Diagram: users, CDN and origin, with steps (1) to (5)]
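The behavior described on this slide resembles what is commonly called request coalescing. The toy model below is a sketch under assumptions, not the CDN's code: `CoalescingProxy`, `fetch_origin`, the `/mypage` URL and the user names are all hypothetical. It shows how a second user who asks for the same URL while the first fetch is still in flight receives the first user's response.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class _InFlight:
    """State shared by every request waiting on one origin fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: Optional[str] = None
        self.waiters = 0


class CoalescingProxy:
    """Concurrent requests for one URL share a single origin fetch."""

    def __init__(self, fetch_origin: Callable[[str, str], str]) -> None:
        self._fetch_origin = fetch_origin
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _InFlight] = {}

    def waiters(self, url: str) -> int:
        with self._lock:
            entry = self._in_flight.get(url)
            return entry.waiters if entry else 0

    def get(self, url: str, user: str) -> Optional[str]:
        with self._lock:
            entry = self._in_flight.get(url)
            leader = entry is None
            if leader:
                entry = _InFlight()
                self._in_flight[url] = entry
            else:
                entry.waiters += 1
        if leader:
            try:
                entry.body = self._fetch_origin(url, user)
            finally:
                # "0-second" cache: the entry disappears as soon as the fetch ends.
                with self._lock:
                    del self._in_flight[url]
                entry.done.set()
        else:
            entry.done.wait()  # reuse the leader's response
        return entry.body


def demo() -> Tuple[Dict[str, Optional[str]], List[str]]:
    release = threading.Event()
    origin_calls: List[str] = []

    def fetch_origin(url: str, user: str) -> str:
        origin_calls.append(user)
        release.wait()  # keep the origin "slow" until both requests have arrived
        return f"mypage of {user}"

    proxy = CoalescingProxy(fetch_origin)
    results: Dict[str, Optional[str]] = {}

    def request(user: str) -> None:
        results[user] = proxy.get("/mypage", user)

    alice = threading.Thread(target=request, args=("alice",))
    alice.start()
    while not origin_calls:                 # wait until alice's fetch is in flight
        time.sleep(0.001)
    bob = threading.Thread(target=request, args=("bob",))
    bob.start()
    while proxy.waiters("/mypage") == 0:    # wait until bob has joined the fetch
        time.sleep(0.001)
    release.set()
    alice.join()
    bob.join()
    return results, origin_calls
```

Calling `demo()` returns `results` in which both "alice" and "bob" map to "mypage of alice", while `origin_calls` contains only "alice". That is harmless for static content but leaks one user's page to another when the response is personalized.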
  18. cache aware nginx configuration
      • Do not use the Expires header
      • Cover old browsers with the Pragma header
      • Besides Cache-Control, also send a Set-Cookie whose only purpose is to prevent caching

      nginx.conf:

          # more_clear_headers / more_set_headers are provided by the headers-more module
          more_clear_headers 'Expires';
          more_set_headers "Cache-Control: private, no-cache, no-store, must-revalidate" "Pragma: no-cache";
          add_header Set-Cookie "merCtx=\"\"; HttpOnly" always;
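A small check like the following could be run against origin responses to confirm they follow the rules on this slide. `cdn_policy_violations` is a hypothetical helper written for illustration; it is not part of nginx or of any CDN tooling.

```python
from typing import List, Mapping


def cdn_policy_violations(headers: Mapping[str, str]) -> List[str]:
    """List the ways a response deviates from the header policy on the slide."""
    h = {k.lower(): v for k, v in headers.items()}
    directives = {d.strip().lower() for d in h.get("cache-control", "").split(",")}

    problems: List[str] = []
    if "expires" in h:
        problems.append("Expires header is present")
    if "private" not in directives:
        problems.append("Cache-Control lacks 'private'")
    if h.get("pragma", "").strip().lower() != "no-cache":
        problems.append("Pragma: no-cache is missing")
    if "set-cookie" not in h:
        problems.append("Set-Cookie is missing")
    return problems
```

A response carrying the three headers set by the configuration above produces an empty list.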