SRE大全 メルカリ編 後半 #hbstudy 75 / SRE Taizen Mercari 2 hbstudy#75

kazeburo
August 02, 2017

Transcript

  1. SRE Taizen: Mercari Edition [Part 2]
    2017/08/17 hbstudy#75
    Masahiro Nagano @kazeburo

  2. AGENDA
    • Part 2
      • PHP application optimization case studies
      • Security efforts (a password list attack case study)
      • The future of Mercari SRE
        • The present and future of the SRE role, and microservices

  3. PHP application optimization case studies

  4. We migrated to PHP 7.1

  5. CPU Usage
    [Chart: CPU usage by PHP version]
    CPU usage cut in half!!

  6. Response time (95th percentile)
    [Chart legend: PHP 7.1.x, PHP 5.6.x]
    Response time improved by 20-30%!!!

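  7. Migrating to PHP 7.1
    • Upgraded from the 5.6 series
    • Fully migrated after tests in CI, automated client tests, and manual testing by QA
    • Other effects of the migration
      • CI run time also became shorter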

  8. And they all lived happily ever after

    ...or not.

  9. PHP application optimization case studies
    other than the PHP 7.1 migration

  10. JSON loading
    • As features were added, the file that stores the internationalization messages swelled to tens of KB
    • The message file is read on almost every request, so the load increased

  11. JSON loading
    • Logged the message keys used on one server and identified the frequently used messages
      • grep | sort | uniq -c | sort -n
    • Created essential.json, which contains only the frequently used messages
    • Benchmark: All: 9 msec => Essential: under 1 msec
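    The selection step on slide 11 can be sketched in a few lines. This is a minimal illustration, not Mercari's actual tooling: the function name `pick_essential`, the sample catalog, and the `top_n` cutoff are hypothetical, and the slides do not say how a key that is missing from essential.json is handled.

```python
import json
from collections import Counter


def pick_essential(all_messages, used_keys, top_n):
    """Return a smaller catalog holding the top_n most frequently used messages."""
    # Count only keys that exist in the full catalog (same idea as sort | uniq -c).
    counts = Counter(key for key in used_keys if key in all_messages)
    return {key: all_messages[key] for key, _ in counts.most_common(top_n)}


# Hypothetical data: the full message catalog and the keys logged on one server.
all_messages = {"greeting": "Hello", "sold": "Sold out", "rare": "Rarely shown"}
used_keys = ["greeting", "sold", "greeting", "unknown", "greeting", "sold"]

essential = pick_essential(all_messages, used_keys, top_n=2)
essential_json = json.dumps(essential, ensure_ascii=False)  # contents of essential.json
```

    Keeping the cutoff as a parameter makes it easy to rerun the tally as usage patterns change.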

  12. JSON loading: speedup
    An improvement of several msec!!

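  13. A large number of exception classes
    • 330 exception classes are defined in a single file, app_exception.php
    • As features are added, exception classes are added too
    • The file is loaded on every request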

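  14. A large number of exception classes
    • Tallied the exceptions that actually occur on the servers
    • Moved the most frequently raised ones into app_exception_base.php
    • Benchmark: 10 msec => 1 msec
    • Some exceptions were swallowed by try-catch and never appeared in the logs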

  15. A large number of exception classes: after the fix
    An improvement of several msec!!!

  16. How PHP (Opcache) works
    [Diagram labels: Request, start, php processes, OPCODE, Copy from shared mem, execute, DB, API, Response, end, memory, Destroy!!]
    Opcache caches the OPCODE rather than a fully compiled result, so hashes and classes are rebuilt every time = slow
    Caching across requests is difficult

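  17. How PHP spends its time
    Communication with external systems
    Processing inside PHP
    PHP's own run time, including building hashes and classes, is long
    It is a tuning target alongside queries to external systems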

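  18. How to find bottlenecks
    • NewRelic
      • Call counts and time spent on the database, memcached, and external APIs. Finds N+1 queries and missing preloads
      • Can also trace PHP internals
    • strace
      • Traces system calls
      • Cannot trace what happens inside PHP
      • Find the bottleneck by comparing system calls with the source code => but how?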

  19. strace
    $ sudo strace -tt -s 256 -p $(pgrep -n -f httpd) |& tee strace.txt
    15:29:31.122566 read(20, "GET /...)
    15:29:31.122844 stat(
    ...
    15:29:31.123914 getcwd("/", 4095) = 2
    15:29:31.123936 chdir("/var/www/vhosts/app/webroot") = 0
    15:29:31.123966 setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
    15:29:31.123987 fcntl(18, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1, len=1}) = 0
    15:29:31.125310 open("/var/www/vhosts/app/foo/bar.json") = 0
    ## Ctrl-C
    $ less strace.txt
    What happens inside PHP cannot be traced
    To match the trace against the source code, we want system calls that act as hints => look at the Opcache settings

  20. php.ini in Production
    Maximum performance (file updates are not checked):
    opcache.enable = 1
    opcache.revalidate_freq = any
    opcache.validate_timestamps = 0
    In use at Mercari (file updates are checked, once per second):
    opcache.enable = 1
    opcache.revalidate_freq = 1
    opcache.validate_timestamps = 1
    Using nginx dynamic upstream, the server is taken out of the balancer,
    then rsync => sleep 1 second => returned to the balancer
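    The sequence on slide 20 might be sketched as follows. The balancer and rsync steps are passed in as hypothetical callables because the slides do not show the actual nginx dynamic upstream calls. The one-second wait is presumably there so that the `opcache.revalidate_freq = 1` interval has elapsed before traffic returns.

```python
import time


def deploy_host(host, remove_from_balancer, sync_code, add_to_balancer,
                revalidate_freq=1.0, sleep=time.sleep):
    """Update one host while it receives no traffic (all callables are hypothetical)."""
    remove_from_balancer(host)   # stop sending requests to this host
    sync_code(host)              # e.g. run rsync against the host
    # Wait at least opcache.revalidate_freq seconds so the first request
    # after the sync re-checks file timestamps and picks up the new code.
    sleep(revalidate_freq)
    add_to_balancer(host)        # resume traffic


# Dry run that only records the order of the steps.
calls = []
deploy_host(
    "app01",
    remove_from_balancer=lambda h: calls.append(("remove", h)),
    sync_code=lambda h: calls.append(("sync", h)),
    add_to_balancer=lambda h: calls.append(("add", h)),
    sleep=lambda seconds: calls.append(("sleep", seconds)),
)
```

    If the sync step raises, the host deliberately stays out of the balancer so that half-updated code never serves requests.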

  21. php.ini for strace
    For running strace (timestamps are checked on every request):
    opcache.enable = 1
    opcache.revalidate_freq = 0
    opcache.validate_timestamps = 1
    18:21:01.253488 stat("/var/www/current/app/webroot/index.php", {st_mode=S_IFREG|0644, st_size=356, ...}) = 0
    18:21:01.253572 stat("/var/www/current/dietcake/dietcake.php", {st_mode=S_IFREG|0644, st_size=1013, ...}) = 0
    18:21:01.253697 stat("/var/www/current/dietcake/core/exception.php", {st_mode=S_IFREG|0644, st_size=46, ...}) = 0
    18:21:01.253775 stat("/var/www/current/dietcake/core/inflector.php", {st_mode=S_IFREG|0644, st_size=289, ...}) = 0
    ....
    18:21:01.254005 stat("/var/www/current/dietcake/core/dispatcher.php", {st_mode=S_IFREG|0644, st_size=1300, ...}) = 0
    18:21:01.254044 stat("/var/www/current/app/aaaaaaa.php", {st_mode=S_IFREG|0644, st_size=5457, ...}) = 0
    18:21:01.254135 stat("/var/www/current/app/app_xxx.php", {st_mode=S_IFREG|0644, st_size=10423, ...}) = 0
    18:21:01.254618 stat("/var/www/current/app/app_exception.php", {st_mode=S_IFREG|0644, st_size=45680, ...}) = 0
    18:21:01.257390 stat("/var/www/current/app/app_yyy.php", {st_mode=S_IFREG|0644, st_size=1133, ...}) = 0
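    One way to read a trace like the one on slide 21 is to measure the gap between consecutive stat() calls. The sketch below assumes the `strace -tt` line format shown on the slides; the helper name `stat_gaps` is hypothetical. The gap is only a rough hint, since it covers everything PHP did before the next stat(), and timestamps that cross midnight are not handled.

```python
import re
from datetime import datetime

# Matches lines such as: 18:21:01.254618 stat("/path/file.php", {...}) = 0
STAT_LINE = re.compile(r'^(\d{2}:\d{2}:\d{2}\.\d{6}) stat\("([^"]+)"')


def stat_gaps(lines):
    """Return (path, seconds until the next stat call) for consecutive stat lines."""
    events = []
    for line in lines:
        match = STAT_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return [(path, (nxt - stamp).total_seconds())
            for (stamp, path), (nxt, _) in zip(events, events[1:])]


# Three lines taken from the slide.
sample = [
    '18:21:01.254135 stat("/var/www/current/app/app_xxx.php", {st_mode=S_IFREG|0644, st_size=10423, ...}) = 0',
    '18:21:01.254618 stat("/var/www/current/app/app_exception.php", {st_mode=S_IFREG|0644, st_size=45680, ...}) = 0',
    '18:21:01.257390 stat("/var/www/current/app/app_yyy.php", {st_mode=S_IFREG|0644, st_size=1133, ...}) = 0',
]
gaps = stat_gaps(sample)
slowest = max(gaps, key=lambda item: item[1])
```

    On these three lines the largest gap follows app_exception.php, which is consistent with the exception-class file discussed on slides 13 to 15.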

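  22. Motivation and timing for app optimization
    • When the number of app servers grows
      • The failure rate rises and deploys take longer
      • Adding app servers keeps things from getting slower, but it does not improve response speed
    • At Mercari, we act once app-server CPU usage starts to exceed 50% regularly
      • Add servers, or optimize the application code
    • Traffic rises for seasonal reasons, so prepare while there is time and keep a backlog of optimization ideas
    • Even better if the work can be done together with server-side engineers
      • For example, as part of SRE training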

  23. View Slide

  24. Security efforts
    A password list attack case study

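  25. Password list attacks
    • Login attempts against customer accounts, using random passwords or password strings leaked elsewhere
    • Attacks of various scales are taking place
    • The first priority is to build a mechanism that lets you notice them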

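  26. Detecting password list attacks
    • Record login failures in the API log
      • No such account / wrong password
    • Aggregate the logs with norikra; visualize and monitor them with mackerel
    • Alerts sometimes fire when Mercari is featured on TV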
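    The aggregation on slide 26 is done with norikra and mackerel. The plain-Python sketch below only illustrates the idea of counting failures per minute and flagging unusual minutes. The event format, the reason names, and the threshold are assumptions, not values from the slides.

```python
from collections import Counter


def failures_per_minute(events):
    """Count login failures per (minute start, reason); events are (epoch_seconds, reason)."""
    counts = Counter()
    for epoch_seconds, reason in events:
        minute_start = epoch_seconds - epoch_seconds % 60
        counts[(minute_start, reason)] += 1
    return counts


def minutes_over(counts, threshold):
    """Return the minute starts whose total failures exceed the threshold."""
    totals = Counter()
    for (minute_start, _reason), n in counts.items():
        totals[minute_start] += n
    return sorted(minute for minute, n in totals.items() if n > threshold)


# Hypothetical failure events parsed from the API log.
events = [
    (0, "no_account"), (10, "bad_password"), (59, "bad_password"),
    (60, "bad_password"), (125, "no_account"),
]
alerts = minutes_over(failures_per_minute(events), threshold=2)
```

    A fixed threshold also fires on legitimate surges, which matches the remark on the slide about alerts after TV coverage.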

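  27. Defending against password list attacks
    • Relatively simple attacks are blocked automatically inside the application
      • Login attempts against the same email address
      • Login attempts from the same IP address
    • When activity is judged to be an attack, the IP address and account concerned are rejected for a fixed period
    • Customers are prompted to reset their passwords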
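    The automatic defense on slide 27 could look roughly like the sliding-window counter below. It is a sketch under assumptions: the class name `LoginThrottle` and the limits (3 failures in 60 seconds, 600-second block) are hypothetical, since the slides give no concrete numbers.

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Block a key (an IP address or an email address) after repeated login failures."""

    def __init__(self, max_failures=3, window=60, block_for=600):
        self.max_failures = max_failures
        self.window = window          # seconds over which failures are counted
        self.block_for = block_for    # seconds a key stays rejected
        self.failures = defaultdict(deque)
        self.blocked_until = {}

    def is_blocked(self, key, now):
        return self.blocked_until.get(key, 0) > now

    def record_failure(self, key, now):
        recent = self.failures[key]
        recent.append(now)
        # Drop failures that fell out of the counting window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked_until[key] = now + self.block_for
            recent.clear()


throttle = LoginThrottle()
attacker = ("ip", "203.0.113.7")
for second in (0, 10, 20):
    throttle.record_failure(attacker, now=second)
```

    Tracking `("ip", ...)` and `("email", ...)` keys in the same structure covers both patterns listed on the slide. It does not help when each IP address makes only a few attempts.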

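  28. Large-scale password list attacks
    • Attacks come from all over the world
      • Source countries of the requests in an attack that actually occurred in 2016
    • Each IP address makes only a few login attempts, so the attack cannot be blocked automatically
    [Pie chart: India 30%, Brazil 10%, Viet Nam 6%, Taiwan 6%, Thailand 5%, Pakistan 5%, Nepal 3%, Indonesia 3%, Russian 2%, Japan 2%, Georgia 2%, Bahrain 2%, Azerbaijan 2%, Armenia 2%, Other 18%]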

  29. Preparing for large-scale password list attacks
    • Introduced CAPTCHA on the web version, which is the easier target
    • Use of Client Reputation
      • GeoIP
      • Anonymous proxy detection
      • IP Reputation

  30. Client reputation
    http://www.cyren.com/security-center/ip-reputation-check
    https://www.ip2location.com/demo

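  31. IP reputation of the attack sources
    [Chart: attack-source IP addresses by reputation: Low, Middle, High]
    About 40% could possibly have been blocked*
    We combine various information sources to make the service safer
    * The reputation data is not from the time of the attack, so there is some discrepancy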

  32. The future of Mercari SRE
    The present and future of the SRE role, and Microservice

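  33. Mercari SRE (recap)
    • Deliver a "highly reliable" service that can be used comfortably and safely at any time
    • "We do all the engineering other than developing new services"
    • Separation of "development" and "operations"
      • SRE is responsible for all work in the Production environment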

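  34. Positioning of Mercari SRE
    [Diagram: Development team, SRE, BOT, Infrastructure, Log, Anonymized DB]
    Arrow labels: database work / investigation requests; schema / architecture consultation; build / operate; database work; investigation; build / use; deploy; use; apply / reference; cooperation on problem investigation, fix requests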

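  35. Challenges for SRE
    • SRE headcount cannot keep up with the growing number of software engineers, services, and features
    • A gap in awareness between the Production environment and software engineers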

  36. Challenges for SRE
    [Chart: SE and SRE at three points in time: … year(s) ago, half a year ago, now]

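  37. Moving to Microservices
    • Ownership for developers
    • Increase the scalability and flexibility of the development organization and expand the service further
      • New features such as "Mercari Channel"
      • Companion apps such as Atte and Kauru
      • Region-specific services => already under way in the US/UK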

  38. Microservice in US
    [Architecture diagram labels: backend for frontends, GKE, Core/existing API, protobuf, JSON over HTTPS, gRPC, Service A, Service B, Spanner, GKE, GKE, Deploy platform, SaaS, PubSub]

  39. Toward Microservices
    • Define the Requirements
    • Build the Microservice platform
      • Deploy, Log, Database, Monitoring...
    • Adopt it as the production architecture
    • Find issues as we move forward and solve them one by one

  40. Mercari SRE going forward
    [Diagram: the slide 34 diagram, with a Microservice platform and Microservice Infrastructure added. New labels: deploy, build, automated operations]

  41. Mercari SRE going forward
    [Diagram: the slide 40 diagram, with the Microservice platform highlighted]
    As the move to microservices progresses,
    will SRE be reduced to just this part?

  42. ...is what we are thinking about

  43. Mercari SRE going forward
    [Diagram: the slide 40 diagram, with the new box labeled Microservice platform / Requirement]
    We will get actively involved

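  44. Mercari SRE going forward
    • Build and operate the Microservice platform
    • Engage actively with services, drawing on SRE know-how
      • Improve scalability and availability
      • Promote automation and improve operations
      • Implement security
    • Operate the global network/infrastructure efficiently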

  45. SRE More!!!
    https://twitter.com/kazeburo/status/890131903529054210

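  46. Mercari SRE More!!!
    Microservice, Automation, Performance, Scalability
    Database, Network, Distributed System, OS,
    Cloud, Hardware, Security
    Drawing on SWE skills plus a wide range of knowledge and experience,
    we support the mission "Create a global marketplace that generates new value"
    with reliability
    We look forward to working with you on all fronts!!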

  47. That's all