Mercari Item Search: 
Behind The Scenes (20min)

kazeburo
October 28, 2019

22nd Lucene/Solr Meetup (第22回 Lucene/Solr勉強会), 2019.10.28

Transcript

  1. Mercari Item Search: Behind The Scenes (20min) • Masahiro Nagano (kazeburo) • 22nd Lucene/Solr Meetup, 2019.10.28
  2. Me • Masahiro Nagano • @kazeburo • ISUCON9 problem setter • Mercari, Inc., operations engineer

  3. https://about.mercari.com/press/news/article/20180719_billionitems/ As of July 13, 2018, the cumulative number of items listed on the flea-market app Mercari passed 1 billion (*1).
 *1 Cumulative listings in Japan since the service launched (July 2, 2013)

  4. https://about.mercari.com/press/news/article/mercari_500million/ As of September 18, 2019, the cumulative number of transactions on the flea-market app Mercari passed 500 million (*1). *1 Cumulative transactions in Japan since the service launched (July 2, 2013)

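  5. • (From an operations point of view) Mercari is a service much like a certain social network • Listing and trading items triggers several hundred updates per second or more • To keep trading as smooth as a face-to-face deal, changes must also reach the search index as fast as possible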

  6. Software for Search • 2013.7 (?) ~ • Solr on BareMetal Servers • Nginx as LB • 2019.7 ~ New Architecture • Elasticsearch on GKE
  8. Jurassic Period (diagram): PHP sends both update and select requests to a single Solr

  10. • More items, more DAU

  11. Cretaceous period (diagram): PHP sends updates to the Master, and selects go through Nginx to three Slaves • Replication pollInterval 30s
  13. K-Pg boundary • Even more items, even more DAU • Tuning JVM/GC • Tried CMS, Parallel GC and G1GC. Parallel GC was better for this era • Tuning Query • Use filter query correctly • Split Index and Fallback in Nginx
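
To illustrate "use filter query correctly": in Solr, `q` carries the scored keyword search. Each `fq` only narrows the result set, does not affect scoring, and is cached as its own filterCache entry. The Python sketch below assumes the `price_t`, `item_condition_id` and `created_time_t` fields that appear in the Lua on slide 22. The function `build_select_params` and its default of 30 rows are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def build_select_params(keyword: str, filters: Dict[str, str], rows: int = 30) -> str:
    """Build a Solr /select query string: keyword in q, each restriction in its own fq.

    `filters` maps a field name to a Solr term/range expression (hypothetical helper).
    """
    # q is the scored, full-text part; "*:*" matches every document.
    params = [("q", keyword or "*:*")]
    # One fq per independent restriction. Filter queries do not affect scoring,
    # and each distinct fq string gets its own filterCache entry, so a common
    # price range can be reused across many different keywords.
    for field in sorted(filters):
        params.append(("fq", f"{field}:{filters[field]}"))
    # Newest first, matching Mercari's default sort order.
    params.append(("sort", "created_time_t desc"))
    params.append(("rows", str(rows)))
    return urlencode(params)
```

Because each restriction has its own `fq`, a common filter such as a price range stays cacheable whatever keyword comes with it. The encoded form `fq=price_t%3A%5B%2A+TO+5000%5D` is also the shape that the Lua rewrite on slide 22 matches.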
  14. Paleogene period (diagram): PHP sends updates to both the Recent Master and the All Master • Selects go through OpenResty to three Recent Slaves (replication pollInterval 30s) and two All Slaves (replication pollInterval 1min)
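  15. • Splitting the search index • The default sort order of Mercari search is "newest first" • The index is split into a Recent Index holding newly listed items and an All Index holding every item • The Recent Index holds fewer documents, which reduces its load • The All Index receives fewer queries, so its load is smaller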
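  16. • Index splitting: automatic fallback from Recent to All with OpenResty • OpenResty • A software distribution that bundles Nginx with various third-party modules written in C, such as ngx_lua • When a search request arrives, it first runs against the Recent Index. Lua inside Nginx processes the returned JSON, and if there are fewer results than the requested count (rows), the query is re-issued against the All Index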
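
The Recent → All fallback on slide 16 runs in Lua inside OpenResty. The same control flow is sketched here in Python, for illustration only. It assumes each index is exposed as a function that takes Solr-style parameters and returns the decoded JSON response in Solr's usual `response.docs` layout. The name `search_with_fallback` and the backend functions are hypothetical.

```python
from typing import Callable, Dict

# A backend takes Solr-style params and returns the decoded JSON response.
Backend = Callable[[Dict[str, str]], dict]


def search_with_fallback(params: Dict[str, str], recent: Backend, all_index: Backend) -> dict:
    """Query the Recent index first; re-run the same query on the All index
    when Recent returns fewer documents than the requested rows."""
    rows = int(params.get("rows", "10"))  # Solr's default rows is 10
    result = recent(params)
    docs = result.get("response", {}).get("docs", [])
    if len(docs) >= rows:
        return result  # the Recent index alone filled the page
    return all_index(params)  # not enough new items: fall back to the full index
```

The All Index contains every item in the Recent Index, and the identical query uses the same sort, so re-running it there returns a consistent page. The cost is a second round trip, paid only by queries that run out of new items.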
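  19. • Still more items, still more DAU • Bots and scrapers (including those of major search engines) arrive • New features such as new-item notification emails and price/category suggestions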

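  20. • Dedicated slaves • Identifying search requests that originate from bot/scraper access • Separating the search processing for new-item notification emails • Separating the heavy price/category suggestion queries that use the item description as the keyword • Query Rewriting by Lua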
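
The deck does not say how bot and scraper traffic was identified, or how requests were mapped to the dedicated slaves on slide 21. The sketch below only illustrates the routing idea. Everything in it is hypothetical:

- the User-Agent heuristic;
- the `purpose` values;
- the pool names, including a separate `bot` pool;
- the assumption that Condition slaves serve notification-mail searches.

```python
import re

# Hypothetical heuristic: many crawlers announce themselves in the User-Agent.
BOT_UA = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)

# Hypothetical mapping of request purposes to specific-purpose slave pools.
PURPOSE_POOLS = {
    "notification_mail": "condition",  # new-item notification mail searches
    "suggest": "suggest",              # heavy price/category suggestion queries
}


def choose_pool(user_agent: str, purpose: str = "") -> str:
    """Return the name of the slave pool that should serve a search request."""
    if purpose in PURPOSE_POOLS:
        return PURPOSE_POOLS[purpose]
    if BOT_UA.search(user_agent or ""):
        return "bot"  # keep crawler traffic off the general-purpose slaves
    return "general"  # regular user searches
```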
  21. Neogene period (diagram): PHP sends updates to the Recent Master and the All Master, and OpenResty routes selects to the slaves • Recent Slave ×2 and a Condition Slave (replication pollInterval 30s) • All Slave ×2 and a Condition Slave (replication pollInterval 1min) • Suggest Slave ×2 (replication pollInterval a bit longer) • The slaves are grouped into General and Specific purpose
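  22. Query rewriting in OpenResty (Lua excerpt):

```lua
-- `token` is one parameter of the incoming query string; it is set outside this excerpt.
local create_t = string.match(token, "^fq=created_time_t%%3A%%5B(%d%d%d%d%-%d%d%-%d%dT%d%d%%3A%d%d%%3A%d)%dZ%+TO%+%%2A%%5D$")
if create_t then
  -- Raise the filter query hit rate on the Condition slaves: force the last digit of
  -- created_time_t's seconds to 9, so requests in the same 10-second window share one filter
  create_t_filter = "created_time_t%3A%5B" .. create_t .. "9Z+TO+%2A%5D"
  -- Item condition: every condition is specified, so the filter is a no-op; drop it
  args = string.gsub(ngx.var.args, "&fq=item_condition_id%%3A%%281%+OR%+2%+OR%+3%+OR%+4%+OR%+5%+OR%+6%%29", "")
  -- The minimum price is 300 yen, so a lower bound of 300 can be dropped
  -- (`*` escaped as `%*` so the pattern matches a literal asterisk)
  args = string.gsub(args, "&fq=price_t%%3A%%5B300%%2BTO%%2B%*%%5D", "")
  -- Item price: rewrite the upper bound as an uncached frange
  args = string.gsub(args, "&fq=price_t%%3A%%5B%%2A%+TO%+(%d+)%%5D", "&fq=%%7B%%21frange%+cache%%3Dfalse%+cost%%3D150%+u%%3D%1%%7Dprice_t")
  -- Countermeasure for "ええええええええええ..." queries: a URL-encoded 3-byte
  -- character repeated 10 or more times is truncated to 10 repeats
  args, _, _ = ngx.re.gsub(args, "(%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f])\\1{9,}", "$1$1$1$1$1$1$1$1$1$1", "i")
  -- ... (the rest of the handler is not shown on the slide)
end
```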
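  25. • Achieving smooth Solr version upgrades (4 → 6 → 8) • Changing the index (moving to bigrams) • Keeping performance with bigrams would also require scaling up the bare-metal servers • Moving to microservices / re-architecting • Preparing for master failures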
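  26. • Intermediate queue service (solr-queue-app) • Update requests (JSON) for Solr are first stored in a queue, then re-POSTed to the specified servers • Masters can be added or changed without modifying the PHP code • Solr-DB (MySQL) • Update data is split per item ID and stored in MySQL • A full index can be rebuilt from the MySQL data in a few hours • Previously this took days to weeks • Sending to Cloud PubSub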
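
A toy, in-process model of the two components on slide 26, showing why a queue in the middle decouples PHP from the set of masters. It makes three assumptions:

- each master is a plain function that would POST the JSON, not a real HTTP client;
- `queue.Queue` stands in for the real queue;
- a dict keyed by item ID stands in for the Solr-DB MySQL table, whose schema the deck does not show.

The class and method names are hypothetical.

```python
import json
import queue
from typing import Callable, Dict


class SolrQueueApp:
    """Toy model: updates are stored once, then re-POSTed to every configured master."""

    def __init__(self, targets: Dict[str, Callable[[str], None]]):
        self.targets = targets  # master name -> function that would POST the JSON
        self.pending = queue.Queue()  # stand-in for the real queue
        self.latest_by_item: Dict[str, dict] = {}  # stand-in for the Solr-DB table

    def enqueue(self, payload: str) -> None:
        # Keep only the newest document per item ID, like the per-item MySQL rows.
        for doc in json.loads(payload):
            self.latest_by_item[doc["id"]] = doc
        self.pending.put(payload)

    def drain(self) -> int:
        """Send every queued payload to every master; return how many were sent."""
        sent = 0
        while not self.pending.empty():
            payload = self.pending.get()
            for post in self.targets.values():
                post(payload)
            sent += 1
        return sent

    def rebuild_payload(self) -> str:
        """Full re-index input: one latest document per item, ordered by ID."""
        docs = [self.latest_by_item[k] for k in sorted(self.latest_by_item)]
        return json.dumps(docs)
```

Adding a Solr8 master alongside Solr6 only changes `targets`, not the caller. Only the latest document per item is kept, so a full rebuild replays one document per item instead of the whole update history.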
  27. Quaternary period (diagram): PHP • solr-queue-app (API) • Q4M queues with API workers • MySQL • Solr6 Master, Solr8 Master and Solr6’ Master
  28. Current

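  31. • Regular new-arrivals and category timelines • New items are fetched from MySQL • Updates are nearly real-time • Recommended timeline • A Solr query is built from browsing history and similar signals, and Solr returns a different set of items for each user, shown newest first • Updates are nearly real-time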
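
Slide 31 says the recommended timeline builds a Solr query from browsing history, but does not describe how the history is weighted. This sketch assumes a hypothetical `category_id` field. It simply ORs together the user's most-viewed categories and sorts newest first on `created_time_t`. The function name and defaults are illustrative.

```python
from collections import Counter
from typing import Dict, List


def recommended_timeline_params(viewed_category_ids: List[int], top_n: int = 3, rows: int = 30) -> Dict[str, str]:
    """Turn a browsing history into Solr params for a per-user, newest-first timeline."""
    # Most frequently viewed categories first; ties keep first-seen order.
    top = [cid for cid, _ in Counter(viewed_category_ids).most_common(top_n)]
    # `category_id` is a hypothetical field name; with no history, match everything.
    q = "category_id:(" + " OR ".join(str(c) for c in top) + ")" if top else "*:*"
    return {"q": q, "sort": "created_time_t desc", "rows": str(rows)}
```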
  32. • Real-time index updates • Fast responses

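  33. • Real-time index updates • No replication: every server is a master • Soft commit set to 1 second or less • Fast responses • A distributed index reduces the number of documents per server • Keep only as many documents as the new-arrivals views need / keep the schema simple too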
  34. (diagram) PHP and the API write to MySQL through a blackhole table • MySQL replication is used as PubSub • On each of the four Solr servers: black hole table → trigger → Q4M queue → worker dequeues → Solr (master) • Each item is updated on the server selected by consistent hashing • Soft commit several times per second
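  35. (diagram: black hole, trigger, Q4M, worker dequeue, Solr (master), Consul) The worker decides whether an item belongs on its own Solr:

```perl
use strict;
use JSON::XS;
use Algorithm::ConsistentHash::Ketama;

# $ua is an LWP::UserAgent; $SRV, $timestamp, $my_ip and $item_id come from the caller.
sub is_my_item {
    my ($ua, $SRV, $timestamp, $my_ip, $item_id) = @_;
    # Get server list from Consul (only instances passing their health checks)
    my $res  = $ua->get('http://localhost/v1/health/service/' . $SRV . '?passing');
    my $ref  = JSON::XS::decode_json($res->content);
    my @list = sort { $a cmp $b } map { $_->{Node}{Address} } @$ref;
    # Make the consistent hash ring
    my $ketama = Algorithm::ConsistentHash::Ketama->new();
    $ketama->add_bucket($_ . '_' . $timestamp, 1) for @list;
    # Draw a server by consistent hashing; bucket labels carry the "_$timestamp" suffix
    my $s1 = $ketama->hash($item_id);
    return $s1 eq $my_ip . '_' . $timestamp;
}
```

If the consistent-hashing result is true, the worker updates Solr. If it is false, the worker deletes the item.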
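
The Perl on slide 35 uses Algorithm::ConsistentHash::Ketama. The same update-or-delete decision can be sketched in Python with a plain MD5 hash ring that uses virtual nodes. This ring is an assumed stand-in and does not reproduce Ketama's exact point placement, so its assignments would differ from the Perl module's. `HashRing`, `action_for` and the 160 virtual nodes per server are illustrative choices.

```python
import bisect
import hashlib
from typing import List


def _point(key: str) -> int:
    # First 4 bytes of the MD5 digest as an unsigned 32-bit ring position.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Consistent hash ring with `vnodes` virtual points per server."""

    def __init__(self, nodes: List[str], vnodes: int = 160):
        if not nodes:
            raise ValueError("HashRing needs at least one node")
        self._ring = sorted((_point(f"{node}-{i}"), node) for node in set(nodes) for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def node_for(self, item_id: str) -> str:
        # First virtual point clockwise from the item's position, wrapping around.
        idx = bisect.bisect_right(self._points, _point(item_id)) % len(self._ring)
        return self._ring[idx][1]


def action_for(item_id: str, ring: HashRing, my_node: str) -> str:
    """Worker decision: index the item here if this server owns it, else delete it."""
    return "update" if ring.node_for(item_id) == my_node else "delete"
```

Suppose every worker sees every change (MySQL replication acting as PubSub) and builds its ring from the same Consul server list. Then each item is indexed on exactly one server and deleted everywhere else. When a server joins, the only items that move are those now owned by the new server.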
  36. (diagram) Each of the four Solr servers runs the same black hole → trigger → Q4M → worker dequeue → Solr (master) pipeline, plus Consul • Components: API (Go), PHP • Distribute select requests to all Solr servers and merge their responses
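
Slide 36 distributes each select to all Solr servers and merges the responses. Below is a minimal merge step, assuming each server has already returned its own top `start + rows` documents, sorted newest-first by `created_time_t`. The deck does not show the actual merge code, and `merge_shard_results` is a hypothetical name.

```python
import heapq
from itertools import islice
from typing import Dict, List


def merge_shard_results(shard_docs: List[List[Dict]], start: int, rows: int,
                        sort_field: str = "created_time_t") -> List[Dict]:
    """K-way merge of per-server result lists (each sorted descending) into one page."""
    merged = heapq.merge(*shard_docs, key=lambda d: d[sort_field], reverse=True)
    # Skip the first `start` documents globally, then take one page of `rows`.
    return list(islice(merged, start, start + rows))
```

Each server must be asked for `start + rows` documents, not just `rows`, because any one of them could hold all of the globally newest items for the requested page.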
  38. Current

  39. The End