
Ever-Growing Infrastructure and Mercari's Challenge / mercari infrastructure and software


2018/07/29 @ July Tech Festa 2018


kazeburo

July 30, 2018

Transcript

  1. Built with Software: Ever-Growing Infrastructure and Mercari's Challenge
     July Tech Festa 2018, 2018/07/29, Masahiro Nagano

  2. Me
     • Masahiro Nagano
     • twitter.com/kazeburo, github.com/kazeburo
     • Mercari, Inc.: Principal Engineer, Site Reliability Engineering (SRE) team
     • BASE, Inc: Technical Advisor
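  3. Me
     • Around 2000 - 2006: joined a startup in Kyoto
       • One or two engineers. Looked after the infrastructure while developing, and also did data center work
       • A cycle of tuning the application, then adding new features with the resources that freed up
     • 2006 -: mixi
       • Member of the "Application Operations Team": an operations engineer who does not go to the data center
       • Our (the team's) role: draw out the full capacity of the servers prepared by the data center team and run the code written by application engineers in the best possible form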
  4. Me
     • 2010 -: livedoor => NHN Japan => LINE
       • Improved infrastructure and performance across livedoor and LINE family services
       • MySQL tuning for livedoor Blog
       • Handling sudden traffic spikes on LINE family services
     • 2015/02 -: Mercari
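  5. How I met SRE
     • 2012/7: learned about it in an IRC conversation with a friend
       • SRE is the team responsible for the operation and stability of Google's huge infrastructure and services
       • Around when the article "Site Reliability Engineers: solving the most interesting problems" was published
         https://research.googleblog.com/2012/07/site-reliability-engineers-solving-most.html
     • Added "Site Reliability" to my twitter bio and presentation slides to keep it in mind
       https://www.slideshare.net/kazeburo/yapc2102mysql/2 (2012/9)
     • 2015/11: proposed it as a team name at Mercari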
  6. Recent activities
     • Talks
       • AWS Dev Day Tokyo 2017
       • YAPC::Fukuoka 2017, YAPC::Okinawa 2018
       • Manabiya Teratail Developer Days
     • Articles
       • WEB+DB PRESS Vol.88, Vol.92-97 (serial), Vol.100
  7. AGENDA
     • Self-introduction
     • About Mercari
     • Mercari's Infrastructure History #1 - Service growth
     • Two case studies of software that supports the infrastructure
     • Mercari's Infrastructure History #2 - Microservices
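  8. Mercari
     • One of Japan's largest flea-market apps
     • List an item easily in 3 minutes
       1) Take a photo 2) Enter the item information 3) Press the list button
     • Safe and secure payments and transactions
       • Escrow (the company sits between the parties for the exchange of money)
       • Anonymous shipping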
  9. Expansion to the US and UK
     • JP: launched July 2013
     • US: launched September 2014
     • UK: launched March 2017
  10. KPI
     • Downloads: over 108 million (JP+US+UK) *as of March 2018
     • Gross merchandise volume: over 93.8 billion yen (January-March 2018)
     • Monthly users: 10.5 million *as of March 2018

  11. KPI

  12. Peak Requests 3,400,000 req/min

  13. Infrastructure

  14. Infrastructure in 2017 (JP, US, UK)
     • DNS: Amazon Route53
     • CDN: Akamai, CloudFront
     • Storage: Amazon S3
     • Analysis: Google BigQuery / Monitoring: Mackerel
  15. Infrastructure in 2018 (JP, US, UK)
     • DNS: Amazon Route53
     • CDN: Akamai, Fastly, ImageFlux (JP)
     • Storage: Amazon S3
     • Analysis: Google BigQuery / Monitoring: Mackerel, DataDog
  16. Infrastructure History #1 2013 - 2017 / Responding to service growth

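  17. Infrastructure History (1)
     • 2013/07 JP release
       • Started on a single Sakura Internet "Sakura VPS" instance hosting both Web and DB
       • With no dedicated infrastructure engineer, chose a platform the developers were familiar with
       • Two months after release, migrated to "Sakura Cloud" and "Dedicated Server"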
  18. "Sakura Dedicated Server"
     • Metal as a Service
     • Performance that only physical servers can deliver
     • Sakura Internet handles network and hardware maintenance
     • Can be connected to "Sakura Cloud"
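  19. Infrastructure History (2)
     • 2014/09 US release
       • Built the service on AWS (Oregon)
       • Some time had passed since the JP release, and more developers had AWS experience
       • Dedicated infrastructure engineers were still few, so the service was built on managed services such as RDS and ElastiCache
       • Considered MaaS within the US, but growth of the US service was hard to predict, so cloud flexibility was weighted more heavily than in JP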
  20. Infrastructure History (3)
     • 2015/11 SRE team founded
       • Improve the JP/US architecture and work on service reliability and scalability
       • Effective use of memcached, N+1 query fixes, SQL tuning, proper use of Master/Slave
       • Move to HTTP/2, PHP upgrade and optimization
  21. Architecture
     [Diagram labels: DNS-RR; nginx ×3; App ×6; MySQL ×2; memcached ×2; util ×2; cloud ×2]
     • A simple three-tier configuration
     • Built around EC2/GCE (servers) even in the cloud
     • Adopt a common architecture wherever possible
     • Replace managed services with servers
  22. Managed to Server (US example)
     • ELB → DNS-RR, NGINX, OpenResty
     • Internal ELB → Internal DNS (BIND), Consul
     • ElastiCache → memcached on EC2
     • RDS → MySQL on EC2
     Gains:
     • Zero-downtime, fast deploys with ngx_dynamic_upstream
     • Tolerance of sudden traffic spikes, faster configuration rollout
     • Better scalability through OS/Kernel tuning
     • Standardized backup procedure, Rolling Schema Upgrade
  23. Aims of the common architecture (JP, US, UK)
     • Common operations allow a small team to run everything
     • A configuration proven at JP scale
     • Shared Ansible Playbooks and DB operations

  24. Infrastructure History (3)
     • 2017/03 UK release
       • Built the service on "GCP" as a new technology
       • Although the infrastructure was new, reusing the architecture and Ansible Playbooks let the service start on a proven foundation
  25. Case studies: software that supports the infrastructure

  26. None
  27. Consul
     • Service Discovery
       • Used as an internal load balancer through the DNS interface
       • Endpoints for internal services such as internal APIs and SMTP
     • Configuration Deployment
     • Distributed Lock
  28. keyword suggest example
     [Diagram labels: App ×3; two Index hosts, each with docker and Consul (check & register service)]
     Service Discovery with Consul
     http://keyword-suggest.service.dc.consul:8080/
     # service.json
     {
       "service": {
         "name": "keyword-suggest",
         "tags": ["production"],
         "port": 8080,
         "checks": [
           {
             "script": "/usr/local/bin/check-http -u 'http://127.0.0.1:8080/hc'",
             "interval": "10s",
             "timeout": "5s"
           }
         ]
       }
     }
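As a rough illustration of the naming scheme on this slide, the sketch below builds the DNS name that Consul's DNS interface serves for a registered service, plus a definition shaped like the slide's service.json. The helper names (`consul_dns_name`, `service_definition`) are hypothetical, and the datacenter label `dc` is simply taken from the URL on the slide; this is not Mercari's tooling.

```python
import json


def consul_dns_name(service, datacenter="dc", tag=None, domain="consul"):
    """Build a Consul DNS name: [tag.]<service>.service.<datacenter>.<domain>."""
    labels = [service, "service", datacenter, domain]
    if tag:
        labels.insert(0, tag)  # tag-filtered lookup, e.g. production.<service>...
    return ".".join(labels)


def service_definition(name, port, health_path, tags=()):
    """Return a dict shaped like the service.json shown on the slide."""
    check_cmd = f"/usr/local/bin/check-http -u 'http://127.0.0.1:{port}{health_path}'"
    return {
        "service": {
            "name": name,
            "tags": list(tags),
            "port": port,
            "checks": [{"script": check_cmd, "interval": "10s", "timeout": "5s"}],
        }
    }


name = consul_dns_name("keyword-suggest")
url = f"http://{name}:8080/"
definition = json.dumps(
    service_definition("keyword-suggest", 8080, "/hc", ["production"]), indent=2
)
print(url)
```

Clients only need the stable name; Consul answers with the hosts whose health check currently passes.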
  29. keyword suggest example
     [Diagram labels: docker, Consul, Index; cron, restart, maintenance mode]
     # cron
     consul lock -verbose keyword_suggest_reload /path/to/check_and_reload

     # /path/to/check_and_reload (runs while the lock is held)
     gcloud docker -- pull $IMAGE_PATH:$LATEST_TAG |& tee $PULL_LOG
     echo -n $LATEST_TAG > $CDIR/tag/latest
     if grep "Downloaded newer image" $PULL_LOG > /dev/null 2>&1; then
       echo "[[[FOUND NEW IMAGE]]]"
     else
       echo "NO UPDATE. exit.."
       exit
     fi
     consul maint -enable -service suggest
     sleep 2
     restart keyword-suggest
     sleep 5
     consul maint -disable -service suggest

     Distributed lock with Consul + temporary service stop via maintenance mode
     Index updates without downtime
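The flow on this slide (take a cluster-wide lock, leave rotation, restart, rejoin) can be sketched as below. `FakeConsul` is an in-memory stand-in invented for illustration, not the Consul API: a `threading.Lock` plays the role of `consul lock` and a flag plays the role of `consul maint`.

```python
import threading


class FakeConsul:
    """In-memory stand-in for Consul (hypothetical; not the real API)."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of `consul lock`
        self.in_maintenance = False
        self.events = []

    def lock(self):
        return self._lock

    def maint(self, enable):
        # Plays the role of `consul maint -enable/-disable`.
        self.in_maintenance = enable
        self.events.append("maint-on" if enable else "maint-off")


def check_and_reload(consul, host, current_tag, latest_tag):
    """Restart one host on a new image while it is out of rotation."""
    with consul.lock():  # only one host reloads at a time
        if latest_tag == current_tag:
            consul.events.append(f"{host}:no-update")
            return current_tag
        consul.maint(True)  # stop receiving traffic before restarting
        try:
            consul.events.append(f"{host}:restart")
        finally:
            consul.maint(False)  # always rejoin, even if the restart fails
        return latest_tag


consul = FakeConsul()
threads = [
    threading.Thread(target=check_and_reload, args=(consul, host, "v1", "v2"))
    for host in ("index1", "index2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consul.events)
```

Because every host's cron job competes for the same lock, at most one Index host is out of rotation at any moment, which is what keeps the update free of downtime.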
  30. OpenResty

  31. OpenResty
     • A Web Platform built on nginx
     • Composed of the nginx core, LuaJIT, Lua libraries, and various third-party modules written in C
     • Allows many kinds of extension while keeping nginx's scalability

  32. OpenResty in Mercari
     [Diagram labels: DNS-RR; nginx ×2; OpenResty + Apache mod_php ×2; keepalive]
     • OpenResty runs inside each Application server
     • HTTP KeepAlive is enabled between the global-facing nginx and OpenResty
       • Fewer TCP connections means better performance
     • Feature-rich L7 load balancing with embedded Lua
     • Softens sudden bursts of heavy traffic
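  33. Concurrency management with OpenResty
     [Diagram labels: nginx; OpenResty; Apache mod_php; SELECT FOR UPDATE]
     • Purchase requests concentrating on a specific item raise the DB load
       • Hundreds of SELECT FOR UPDATE statements pile up
       • CPU usage spikes because of busy locks
     • Adjust the degree of concurrency before queries reach the DB
       • However, adding waits in PHP may exhaust the Apache processes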
  34. Concurrency management with OpenResty
     [Diagram labels: nginx; OpenResty; Apache mod_php; memcached: add lock_key, retry a few times on failure]
     location = /buy {
       rewrite_by_lua_block {
         ngx.req.read_body()
         local args, err = ngx.req.get_post_args()
         local lock_key = "buy/" .. args.item_id
         -- add the key to memcached
         local locked = lock_with_memcached(lock_key)
         -- return 503 if the add failed
         if not locked then
           ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)
         end
         ngx.ctx.lock_key = lock_key
       }
       log_by_lua_block {
         local lock_key = ngx.ctx.lock_key
         -- delete the cache entry once the request has finished
         if lock_key then
           ngx.timer.at(0, unlock_with_memcached, lock_key)
         end
       }
       proxy_pass http://127.0.0.1;
     }
     Take the lock, then proxy to the upstream
     Delete the lock after the request completes
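The locking idea relies on memcached's `add`, which succeeds only when the key does not already exist. The sketch below mimics that behaviour with a dict-backed `FakeMemcached` (a hypothetical stand-in, as are `lock_with_memcached` and `handle_buy` here); the retry count and wait are assumed values, and a real deployment would presumably also set an expiry on the key.

```python
import time


class FakeMemcached:
    """Dict-backed stand-in that mimics only memcached's add/delete semantics."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Like memcached `add`: store only if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def lock_with_memcached(cache, key, retries=3, wait=0.01):
    """Try to take the per-item lock, retrying a few times before giving up."""
    for _ in range(retries):
        if cache.add(key, 1):
            return True
        time.sleep(wait)
    return False


def handle_buy(cache, item_id, upstream):
    """Serialize purchases per item; answer 503 when the lock cannot be taken."""
    lock_key = f"buy/{item_id}"
    if not lock_with_memcached(cache, lock_key):
        return 503
    try:
        return upstream(item_id)  # only one request per item reaches the DB
    finally:
        cache.delete(lock_key)  # release once the request is done


cache = FakeMemcached()
ok_status = handle_buy(cache, "m1", lambda item: 200)
cache.add("buy/m2", 1)  # simulate another request already buying m2
busy_status = handle_buy(cache, "m2", lambda item: 200)
print(ok_status, busy_status)
```

Rejecting early at the proxy layer keeps waiting requests from occupying Apache processes, which is the concern raised on the previous slide.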
  35. Infrastructure History #2 2018 - / Microservices

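  36. Microservices
     • Improve service Resilience
       • Fine-grained scaling, fault isolation
     • Increase the Scalability of teams and the organization
       • To further accelerate service development
       • Aiming for an engineering organization of more than 1,000 people
     • Build a Microservice Platform to move Microservices forward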
  37. Microservices
     [Diagram: Backend Teams A, B, and C each develop Service A, B, and C; QA, Deploy, and Operation are handled by the QA Team and SRE]
  38. Microservices
     [Diagram: Backend Teams A, B, and C each develop Service A, B, and C, and each team has its own QA, Deploy, and Operation]
  39. US Re-Architecture
     • Full Renewal of the Client to better fit the US market
     • API Gateway that routes to Microservices, implemented in Golang
     • Wraps the Monolith API on AWS
     • Enables a gradual migration
     [Diagram labels: API Gateway; search; personalization; offer; Monolith API; gRPC ×3; JSON over HTTPs; Protocol Buffers over HTTPs]
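  40. API Fork
     • Split the Monolith API code, previously shared by the 3 Regions, into US/UK and JP
     • Keeps changes in one region from affecting other regions; reduces coordination and QA costs
     • Lets each country do development that better fits its own circumstances
     • Local hiring in the US and UK is also progressing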
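  41. API Gateway in JP
     • Introduced an API Gateway to move Microservices forward in JP as well
     • Implemented in Golang, but separately from the US implementation
     • No Client changes; the Protocol is basically kept as is
     • Uses Golang's net/http, with additions such as DNS cache and Request buffering
     [Diagram labels: API Gateway; ServiceA; ServiceB; ServiceC; JSON over HTTPs ×2; ProtoBuf over HTTPs; gRPC]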
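To give a feel for the "DNS cache" addition mentioned on this slide, here is a minimal TTL cache in front of a resolver. It is a sketch under assumptions, not the gateway's implementation (which the slide says is written in Golang): the class name `DNSCache`, the 30-second TTL, the host name, and the injected `fake_resolver` and clock are all hypothetical.

```python
import time


class DNSCache:
    """Cache resolver answers for a fixed TTL to avoid a lookup per request."""

    def __init__(self, resolver, ttl=30.0, clock=time.monotonic):
        self._resolver = resolver  # callable: host -> list of addresses
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # host -> (expires_at, addresses)

    def lookup(self, host):
        now = self._clock()
        hit = self._entries.get(host)
        if hit and hit[0] > now:
            return hit[1]  # still fresh: skip the resolver
        addrs = self._resolver(host)
        self._entries[host] = (now + self._ttl, addrs)
        return addrs


calls = []


def fake_resolver(host):
    # Hypothetical resolver that records how often it is consulted.
    calls.append(host)
    return ["10.0.0.1"]


now = [0.0]
cache = DNSCache(fake_resolver, ttl=30.0, clock=lambda: now[0])
first = cache.lookup("service-a.internal")
second = cache.lookup("service-a.internal")  # served from the cache
now[0] = 31.0
third = cache.lookup("service-a.internal")  # TTL expired, resolved again
print(len(calls))
```

Injecting the clock and resolver keeps the cache easy to test without touching the network.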
  42. Infrastructure in 2018 (JP, US, UK)
     Microservices rollout and Infrastructure tailored to each Region

  43. Microservices Tech Stack
     • Container / Docker
     • Kubernetes
     • Spinnaker
     • Terraform
     Software is Infrastructure
  44. Container / Docker
     • Container
       • Resource isolation and control
     • Docker
       • Consistent image builds with a Dockerfile
       • Portability
  45. Kubernetes
     • Orchestration Platform for Containers
     • Automatic Scaling, automatic healing
     • Reduces Container operation costs
     • Mainly using GKE (Google Kubernetes Engine)
  46. Spinnaker
     • Continuous Delivery Platform
     • Developed by Netflix
       • Open-sourced with cooperation from Google and others
     • Define and automatically run Deploy pipelines
     • Continuous Delivery for Microservices with Spinnaker at Mercari
       https://speakerdeck.com/tcnksm/continuous-delivery-for-microservices-with-spinnaker-at-mercari
  47. Terraform
     • Infrastructure as Code / DevOps
     • Provides a Starter-Kit for quickly launching Microservices
       • GCP Project
       • Kubernetes Namespace
       • GitHub teams
  48. Challenges of moving to Microservices (network edition)

  49. Distance
     [Diagram: Ishikari DC (dedicated servers, Core (monolith) API) and Cloud Native Microservices, 1,000 km apart]
  50. ping (Ishikari → Tokyo)
     $ ping -c 5 example.jp
     PING example.jp (130.211.x.x) 56(84) bytes of data.
     64 bytes from 129.31.x.x.bc.googleusercontent.com (130.211.x.x): icmp_seq=1 ttl=51 time=16.9 ms
     64 bytes from 129.31.x.x.bc.googleusercontent.com (130.211.x.x): icmp_seq=2 ttl=51 time=16.9 ms
     64 bytes from 129.31.x.x.bc.googleusercontent.com (130.211.x.x): icmp_seq=3 ttl=51 time=16.8 ms
     64 bytes from 129.31.x.x.bc.googleusercontent.com (130.211.x.x): icmp_seq=4 ttl=51 time=16.9 ms
     64 bytes from 129.31.x.x.bc.googleusercontent.com (130.211.x.x): icmp_seq=5 ttl=51 time=16.8 ms
     --- example.jp ping statistics ---
     5 packets transmitted, 5 received, 0% packet loss, time 4020ms
     rtt min/avg/max/mdev = 16.846/16.918/16.966/0.150 ms
  51. HTTPs (Ishikari → Tokyo)
     $ ./httpstat.sh https://example.jp/hc
     HTTP/1.1 200 OK
     Server: nginx/1.13.3
     Date: Wed, 11 Oct 2017 01:59:15 GMT
     Content-Type: application/json; charset=utf-8
     Content-Length: 22
     Expires: Wed, 11 Oct 2017 02:59:15 GMT
     Cache-Control: max-age=3600
     Cache-Control: public
     Via: 1.1 google
     Alt-Svc: clear

       DNS Lookup   TCP Connection   SSL Handshake   Server Processing   Content Transfer
     [    1ms     |      19ms      |     165ms     |       20ms        |       0ms        ]
                  |                |               |                   |                  |
         namelookup:1ms            |               |                   |                  |
                             connect:20ms          |                   |                  |
                                         pretransfer:185ms             |                  |
                                                           starttransfer:205ms            |
                                                                                      total:205ms
  52. Physical distance cannot be shortened, but we still want to do something about it

  53. Improvement through Software
     • Avoid the 3-way handshake. Avoid the TLS handshake as well
       • Make use of KeepAlive in HTTP/1 and HTTP/2
     • PHP cannot reuse HTTP connections across requests
       • Connection Aggregation in middleware
  54. chocon GitHub.com/kazeburo/chocon

  55. chocon
     % curl -H 'Host: example.jp.ccnproxy-https' http://10.0.0.1/v1/foo
     [Diagram: Web/API Client → (http, Private Network) → chocon proxy → (http/https with keepAlive) → https://example.jp/]
  56. Chocon & Consul
     [Diagram labels: App ×3; DNS; two Chocon hosts, each with Consul (check & register service); https://example.jp/v1/foo]
     *.ccnproxy-https IN CNAME production.chocon.service.dc.consul.
     $ curl http://example.jp.ccnproxy-https/v1/foo
     • Combined with Consul and internal DNS to make it easy to use
     • Provides redundancy
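The Host-header convention shown on these two slides can be sketched as a small mapping function. This illustrates only the naming rule visible in the slides (`<origin>.ccnproxy-https` is forwarded to `https://<origin>`); the function name `upstream_url` is hypothetical and this is not chocon's actual code.

```python
SUFFIX = ".ccnproxy-https"  # suffix taken from the slide


def upstream_url(host_header, path):
    """Map a proxy-style Host header to the HTTPS URL it stands for."""
    if not host_header.endswith(SUFFIX):
        raise ValueError(f"not a proxy host: {host_header!r}")
    origin = host_header[: -len(SUFFIX)]
    if not origin:
        raise ValueError("empty origin host")
    return f"https://{origin}{path}"


target = upstream_url("example.jp.ccnproxy-https", "/v1/foo")
print(target)
```

Because the rule lives in the host name, a wildcard CNAME pointing at the proxy is enough for clients to opt in by changing only the URL they call.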
  57. https with chocon (Ishikari → Tokyo)
     $ ./httpstat.sh /dev/null https://example.jp.ccnproxy-https/hc
     HTTP/1.1 200 OK
     Cache-Control: max-age=3600,public
     Content-Length: 22
     Content-Type: application/json; charset=utf-8
     Date: Thu, 01 Jun 2017 00:43:49 GMT
     Expires: Thu, 01 Jun 2017 01:43:49 GMT
     Server: nginx/1.11.5
     X-Chocon-Req: bSCzJrCMZ9wbRN8TYhZ3wV

     Body stored in: /tmp/httpstat-body.390174181496278775

       DNS Lookup   TCP Connection   Server Processing   Content Transfer
     [    1ms     |      1ms       |       19ms        |       0ms        ]
                  |                |                   |                  |
         namelookup:1ms            |                   |                  |
                             connect:2ms               |                  |
                                           starttransfer:21ms             |
                                                                      total:21ms
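A quick worked comparison of the two httpstat measurements in this talk (direct HTTPS versus via chocon) shows where the time goes. The per-phase numbers are copied from the slides; the dictionary keys are labels chosen here for illustration.

```python
# Phase timings in milliseconds, copied from the two httpstat outputs.
direct = {"dns": 1, "tcp": 19, "tls": 165, "server": 20, "transfer": 0}
via_chocon = {"dns": 1, "tcp": 1, "server": 19, "transfer": 0}

total_direct = sum(direct.values())
total_proxy = sum(via_chocon.values())

# Cost of the long-distance TCP and TLS handshakes on a fresh connection.
handshake = direct["tcp"] + direct["tls"]
saved = total_direct - total_proxy
handshake_share = handshake / total_direct

print(f"direct={total_direct}ms proxy={total_proxy}ms saved={saved}ms")
print(f"handshakes were {handshake_share:.0%} of the direct request")
```

Nearly all of the direct request is spent on handshakes over the long-distance link, which is why reusing connections in a local proxy cuts the total to roughly a tenth.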
  58. Summary

  59. Mercari's Infrastructure
     • Software improves availability and performance, and defines the infrastructure
       • Consul, OpenResty
       • chocon
       • Microservices Stack
     • Much other Software has also come out of SRE
       • https://github.com/mercari/
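  60. Pride as an SRE
     • Our (the team's) role is to draw out the full capacity of the servers and run the code written by application engineers in the best possible form
     • Service availability is not only an infrastructure problem; the teams that handle the software share the responsibility
     • On top of that: how do we give customers the reliability of a service they can "use comfortably and safely at any time"?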

  61. (Cloud + Network + Hardware + N) × Software = Infrastructure
  62. Thank you for your attention