
Mercari Item Search: 
Behind The Scenes (20min)

kazeburo
October 28, 2019

22nd Lucene/Solr Study Group (第22回 Lucene/Solr勉強会), 2019.10.28



Transcript

  1. Mercari Item Search: 

    Behind The Scenes (20min)
    Masahiro Nagano (kazeburo)
    22nd Lucene/Solr Study Group (第22回 Lucene/Solr勉強会), 2019.10.28

  2. Me
    • Masahiro Nagano
    • @kazeburo
    • ISUCON9 problem setter
    • Mercari, Inc., self-styled operations nitpicker (運用系小姑)

  3. https://about.mercari.com/press/news/article/20180719_billionitems/
    As of July 13, 2018, the cumulative number of items listed on the flea-market app Mercari has surpassed 1 billion (*1).

    *1 Cumulative number of listings in Japan since the service launch date (July 2, 2013).

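  4. https://about.mercari.com/press/news/article/mercari_500million/
    As of September 18, 2019, the cumulative number of transactions on the flea-market app Mercari has surpassed 500 million (*1).
    *1 Cumulative number of transactions in Japan since the service launch date (July 2, 2013).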

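  5. • (From an operations point of view) Mercari resembles a certain social network
    • Listing and trading items generates several hundred or more updates per second
    • To feel as smooth as a face-to-face trade, changes must reach the search index as fast as possible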

  6. Software for Search
    • 2013.7 (?) ~
      • Solr on BareMetal Servers
      • Nginx as LB
    • 2019.7 ~ New Architecture
      • Elasticsearch on GKE


  8. Jurassic Period
    [Diagram: PHP → Solr, with arrows labeled update and select]


  10. • Growing item count, growing DAU

  11. Cretaceous period
    [Diagram: PHP sends update to a Solr Master and select through Nginx to three Slaves; Replication pollInterval 30s]


  13. K-Pg boundary
    • Even more items, even higher DAU
    • Tuning JVM/GC
      • Tried CMS, Parallel GC, and G1GC; Parallel GC worked best in this era
    • Tuning Query
      • Use filter query correctly
    • Split Index and Fallback in Nginx

  14. Paleogene period
    [Diagram: PHP sends update to a Recent Master and an All Master, and select through OpenResty. Recent Master with three Recent Slaves (Replication pollInterval 30s); All Master with two All Slaves (Replication pollInterval 1min)]

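  15. • Splitting the search index
    • Mercari's default search sort order is "newest first"
    • Split into a Recent Index that holds newly listed items and an All Index that holds every item
    • The Recent Index holds fewer documents, which reduces its load
    • The All Index is queried less often, so its load is lower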

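  16. • Index split, with automatic fallback from Recent to All by OpenResty
    • OpenResty
      • A software distribution that bundles Nginx with various third-party modules written in C, such as ngx_lua
    • On a search request, first run the search against the Recent Index, process the returned JSON with Lua inside Nginx, and if the results fall short of the requested count (rows), re-run the query against the All Index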
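The fallback described on slide 16 can be sketched roughly as follows. This is a Python illustration only: the production logic runs as Lua inside OpenResty, and `search_with_fallback`, `search_recent`, and `search_all` are hypothetical names. The slides do not say whether the All results replace or top up the Recent results; this sketch assumes the query is simply re-run against All and that response is returned.

```python
from typing import Callable, Dict, List

# Hypothetical type: a backend takes (query, rows) and returns a Solr-like
# response dict of the form {"response": {"docs": [...]}}.
Backend = Callable[[str, int], Dict]


def search_with_fallback(query: str, rows: int,
                         search_recent: Backend,
                         search_all: Backend) -> List[dict]:
    """Query the Recent index first; re-query the All index if too few hits."""
    docs = search_recent(query, rows)["response"]["docs"]
    if len(docs) >= rows:
        return docs
    # Recent did not return enough documents: run the same query against All.
    return search_all(query, rows)["response"]["docs"]


# Usage with in-memory stand-ins for the two indexes.
def _fake_backend(docs):
    return lambda query, rows: {"response": {"docs": docs[:rows]}}


recent = _fake_backend([{"id": 3}, {"id": 2}])
everything = _fake_backend([{"id": 3}, {"id": 2}, {"id": 1}])
```

With these stand-ins, a request for two rows is served by Recent alone, while a request for three rows falls through to All.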


  19. • Still more items, still more DAU
    • Bots (from major search engines) and scrapers arrive
    • New features such as new-listing notification emails and price/category suggestions

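  20. • Create dedicated Slaves
    • Identify search requests that originate from bot and scraper access
    • Separate the search processing for new-listing notification emails
    • Separate the heavy price/category suggestion queries that use the item description as the keyword
    • Query Rewriting by Lua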

  21. Neogene period
    [Diagram labels: PHP; OpenResty; update; select; Recent Master with two Recent Slaves and a Condition Slave (Replication pollInterval 30s); All Master with two All Slaves, a Condition Slave, and two Suggest Slaves (Replication pollInterval 1min); Replication pollInterval "bit long"; General purpose / Specific purpose]

  22. local create_t = string.match(token, "^fq=created_time_t%%3A%%5B(%d%d%d%d%-%d%d%-%d%dT%d%d%%3A%d%d%%3A%d)%dZ%+TO%+%%2A%%5D$")
    if create_t then
      -- To raise the filter query hit rate for conditions, force the last digit of created_time_t's seconds to 9
      create_t_filter = "created_time_t%3A%5B" .. create_t .. "9Z+TO+%2A%5D"
      -- Item condition: every value is specified, so drop the filter
      args = string.gsub(ngx.var.args,"&fq=item_condition_id%%3A%%281%+OR%+2%+OR%+3%+OR%+4%+OR%+5%+OR%+6%%29","")
      -- The minimum price is 300 yen, so this filter can be dropped (%* matches a literal asterisk)
      args = string.gsub(args,"&fq=price_t%%3A%%5B300%%2BTO%%2B%*%%5D", "")
      -- Item price: rewrite as frange
      args = string.gsub(args,"&fq=price_t%%3A%%5B%%2A%+TO%+(%d+)%%5D","&fq=%%7B%%21frange%+cache%%3Dfalse%+cost%%3D150%+u%%3D%1%%7Dprice_t")
      -- Countermeasure for inputs like "ええええええええええええええええええええ": a character repeated 10 or more times in a row is cut down to 10
      args, _, _ = ngx.re.gsub(args, "(%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f]%[0-9a-f][0-9a-f])\\1{9,}", "$1$1$1$1$1$1$1$1$1$1", "i")
    end
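As a rough illustration of two of the rewrites on slide 22, the Python sketch below works on already URL-decoded values rather than on the raw query string that the Lua code handles. The function names are hypothetical, and the decoded filter syntax shown in the comments is an assumption about what the encoded patterns represent.

```python
import re

# Matches a decoded filter such as created_time_t:[2019-10-28T12:34:56Z TO *]
# and captures everything up to the tens digit of the seconds.
_CREATED_RE = re.compile(
    r"^created_time_t:\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d)\dZ TO \*\]$"
)


def round_created_time(fq: str) -> str:
    """Force the last digit of the seconds to 9, so that requests arriving
    within the same 10-second window produce an identical filter query."""
    m = _CREATED_RE.match(fq)
    if not m:
        return fq
    return "created_time_t:[" + m.group(1) + "9Z TO *]"


def clamp_repeats(text: str, limit: int = 10) -> str:
    """Truncate any character repeated `limit` or more times in a row
    down to exactly `limit` occurrences."""
    pattern = r"(.)\1{" + str(limit - 1) + r",}"
    return re.sub(pattern, lambda m: m.group(1) * limit, text)
```

Identical filter strings are what make a cached filter result reusable, which is the stated goal of the seconds rounding in the Lua comments.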


  25. • Achieve smooth Solr version upgrades (4 → 6 → 8)
    • Want to change the index (switch to bigrams)
    • Keeping performance with bigrams also requires scaling up the BareMetal servers
    • Want to move to microservices / re-architect
    • Prepare for Master failures

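  26. • Intermediate queue service (solr-queue-app)
      • Stores the JSON of each Solr update request in a queue, then re-POSTs it to the designated servers
      • Masters can be added or changed without modifying the PHP code
    • Solr-DB (MySQL)
      • Update data is split by item ID and stored in MySQL
      • A full index can be built from the MySQL data in a few hours
        • Previously this took anywhere from days to weeks
    • Publishing to Cloud PubSub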
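A minimal sketch of the intermediate-queue idea from slide 26, assuming an in-memory queue in place of Q4M and a caller-supplied `post` function in place of a real HTTP client. `UpdateRelay`, its methods, and the URLs in the usage example are hypothetical and are not taken from solr-queue-app.

```python
from collections import deque
from typing import Callable, Deque, List


class UpdateRelay:
    """Accept Solr update JSON once, then re-POST it to every configured master."""

    def __init__(self, masters: List[str], post: Callable[[str, str], None]):
        self.masters = list(masters)  # destinations can change without touching the producer
        self.post = post              # post(url, json_body); stand-in for an HTTP client
        self.queue: Deque[str] = deque()

    def enqueue(self, json_body: str) -> None:
        # The producer (PHP in the talk) only ever talks to this entry point.
        self.queue.append(json_body)

    def drain(self) -> int:
        """Forward every queued update to all masters; return how many were sent."""
        sent = 0
        while self.queue:
            body = self.queue.popleft()
            for url in self.masters:
                self.post(url, body)
            sent += 1
        return sent


# Usage: record the POSTs instead of sending them.
log = []
relay = UpdateRelay(["http://solr6/update", "http://solr8/update"],
                    post=lambda url, body: log.append((url, body)))
relay.enqueue('{"id": "m1", "name": "item"}')
n = relay.drain()
```

Because the producer only knows the relay, adding a Solr 8 master next to a Solr 6 master is a change to the `masters` list rather than to the application code.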

  27. Quaternary period
    [Diagram labels: PHP; solr-queue-app (API, Q4M, Worker); MySQL; Solr6 Master; Solr8 Master; Solr6' Master]

  28. Current



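  31. • Regular newest / category timelines
      • Newest items are fetched from MySQL
      • Updates are near real-time
    • Recommended timeline
      • Builds a Solr query from browsing history and similar signals, and shows each user a different set of items from Solr, newest first
      • Updates are near real-time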

  32. • Real-time index updates
    • Fast responses

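  33. • Real-time index updates
      • No replication; every node is a Master
      • Soft commit interval set to 1 second or less
    • Fast responses
      • A distributed index reduces the number of documents per server
      • Keep only as many documents as the newest-first timeline needs / keep the schema simple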

  34. [Diagram: PHP → API → MySQL (blackhole). Four units, each made up of blackhole, Q4M, a worker (trigger, dequeue), and Solr (master). Annotations: Use MySQL replication as PubSub; Update item selected by consistent hashing; soft commit several times per second]

  35. [Diagram labels: blackhole, Q4M, worker (trigger, dequeue), Solr (master), consul]
    # Get server list from Consul
    my $res = $ua->get('http://localhost/v1/health/service/' . $SRV . '?passing');
    my $ref = JSON::XS::decode_json($res->content);
    my @list = sort { $a cmp $b } map { $_->{Node}{Address} } @$ref;
    # Make Consistent Hash
    my $ketama = Algorithm::ConsistentHash::Ketama->new();
    $ketama->add_bucket($_ . '_' . $timestamp, 1) for @list;
    # Drawing by consistent-hashing
    my $s1 = $ketama->hash($item_id);
    return $s1 eq $my_ip;
    # Update Solr:
    #   if the consistent-hashing result is true, update
    #   if the consistent-hashing result is false, delete
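The ownership check on slide 35 can be sketched in Python as below. This is a simplified hash ring with virtual nodes, not a port of `Algorithm::ConsistentHash::Ketama`, so its placements will not match the Perl module's. `HashRing`, `action_for`, and the `replicas` parameter are hypothetical, and the `$timestamp` suffix used in the original bucket names is left out.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """Simple consistent-hash ring with virtual nodes (Ketama-like, not compatible)."""

    def __init__(self, nodes: List[str], replicas: int = 100):
        if not nodes:
            raise ValueError("at least one node is required")
        points = []
        for node in sorted(nodes):
            for i in range(replicas):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points                 # sorted (hash, node) pairs
        self._keys = [h for h, _ in points]   # hashes only, for bisect

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, key: str) -> str:
        """Return the node whose point follows the key's hash on the ring."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._points[idx][1]


def action_for(item_id: str, my_ip: str, healthy_nodes: List[str]) -> str:
    """'update' if this worker owns the item, otherwise 'delete'."""
    ring = HashRing(healthy_nodes)
    return "update" if ring.owner(item_id) == my_ip else "delete"
```

Every worker sees every update, builds the same ring from the same healthy-node list, and reaches the same answer independently, so exactly one Solr master keeps each item while the others delete it.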

  36. [Diagram: PHP sends select to API/Go, which distributes the select request to all Solr servers and merges their responses. Four units, each with blackhole, Q4M, a worker (trigger, dequeue), Solr (master), and consul]
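The merge step of the scatter-gather on slide 36 might look like the sketch below. It is written in Python for illustration, whereas the slide labels the real component API/Go. It assumes each server returns its hits already sorted newest first and that documents carry a numeric `created` field; both the field name and `merge_newest_first` are hypothetical. Sending the requests to the servers is out of scope here.

```python
import heapq
from itertools import islice
from typing import Dict, Iterable, List


def merge_newest_first(shard_results: Iterable[List[Dict]], rows: int) -> List[Dict]:
    """Merge per-server results, each already sorted newest first, and keep `rows`."""
    merged = heapq.merge(*shard_results, key=lambda d: d["created"], reverse=True)
    return list(islice(merged, rows))


# Usage with two in-memory result lists.
shard_a = [{"id": "a2", "created": 50}, {"id": "a1", "created": 10}]
shard_b = [{"id": "b2", "created": 40}, {"id": "b1", "created": 30}]
top = merge_newest_first([shard_a, shard_b], rows=3)
```

Since consistent hashing places each item on exactly one server, the merged list needs no de-duplication in this sketch.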


  38. Current


  39. The End