mercari
October 04, 2018

MTC2018 - Mercari's Search Service: Now and the Future

Speakers: Kenji Sugiki, Yanpeng Lin

The Search Team develops and improves features for the Mercari app, such as suggestions, related words, and the recommendations tab. In this session, Sugiki will introduce the features the team is developing toward its goal of competing on the global stage.


Transcript

  1. Mercari's Search Service: Now and the Future
    Kenji Sugiki, Software Engineer (Search)
    Yanpeng Lin, Software Engineer (Search)
  2. Largest number of listings in Japan
    Chart: total number of listings on the Mercari app (axis unit: million); 1 billion as of 7/13/2018
  3. Lots of active traffic (listing→buying)
    GMV/MAU trends: GMV and MAU are increasing steadily.
    Chart axis units: 100 million yen (GMV); 10,000 people (MAU)
  4. Mercari's uniqueness = C2C
    ・Life cycle of listings is fast
    ・One of a kind (only one user can purchase)
    ・High frequency of listings and updates
    ・High MAU and GMV
    ・Price negotiations and edits occur frequently
    ・Largely no master catalog is maintained
    The quality and functionality of search are very important
  5. Search functions supporting Mercari
    ・Timeline
    ・Item search
    ・New listing notifications
    ・Related listings
    ・Keyword completion
    ・Related keywords
    ・Recommendation timeline
    Legend: functions that have been added or improved in the last year
  6. Current search platform: monolithic environment
    Diagram labels: mercari-api; Master/slave configuration; Data divided by service; Proxy; Reduce internal hassle
  7. Internal search API: basic design
    Search pipeline components (as labeled in the diagram): QueryBuilder, Reranking, SolrFetcher, PostItemFilter, QueryRewriter, ResultBuilder, Controller
    Other diagram labels: Search service; Internal service (Timeline, related listings, etc.)
    Diagram annotations: Query generated from parameters; Logic; AB Test; Dynamic query conversion; Adjust/exclude certain items; Search field; Synonyms; Token filter; Additional data from DB; Format Solr result
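
The pipeline on slide 7 can be pictured as a chain of stages that each transform a shared request context. The Python sketch below is only illustrative: the stage names are borrowed from the slide, but their order, the `SearchContext` type, the synonym table, the stand-in Solr response, and every function body are assumptions made for this example, not Mercari's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SearchContext:
    """Hypothetical state passed from stage to stage."""
    params: Dict[str, str]
    query: str = ""
    items: List[Dict] = field(default_factory=list)


Stage = Callable[[SearchContext], SearchContext]

# Hypothetical synonym table used by the rewriter stage.
SYNONYMS = {"tv": "(tv OR television)"}


def query_builder(ctx: SearchContext) -> SearchContext:
    # Generate a query string from the request parameters.
    ctx.query = f"name:{ctx.params['keyword']}"
    return ctx


def query_rewriter(ctx: SearchContext) -> SearchContext:
    # Dynamically convert the query, here by expanding a synonym.
    keyword = ctx.params["keyword"]
    if keyword in SYNONYMS:
        ctx.query = f"name:{SYNONYMS[keyword]}"
    return ctx


def solr_fetcher(ctx: SearchContext) -> SearchContext:
    # Stand-in for a real Solr request: returns canned documents.
    ctx.items = [
        {"id": 1, "name": "tv stand", "status": "on_sale", "score": 1.0},
        {"id": 2, "name": "tv 40 inch", "status": "sold_out", "score": 3.0},
        {"id": 3, "name": "television", "status": "on_sale", "score": 2.0},
    ]
    return ctx


def post_item_filter(ctx: SearchContext) -> SearchContext:
    # Exclude certain items, here anything no longer on sale.
    ctx.items = [item for item in ctx.items if item["status"] == "on_sale"]
    return ctx


def reranking(ctx: SearchContext) -> SearchContext:
    # Reorder by score; a real system might pick the logic per AB test.
    ctx.items.sort(key=lambda item: item["score"], reverse=True)
    return ctx


def result_builder(ctx: SearchContext) -> SearchContext:
    # Format the result for the caller.
    ctx.items = [{"id": item["id"], "name": item["name"]} for item in ctx.items]
    return ctx


PIPELINE: List[Stage] = [
    query_builder, query_rewriter, solr_fetcher,
    post_item_filter, reranking, result_builder,
]


def search(params: Dict[str, str]) -> SearchContext:
    ctx = SearchContext(params=params)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx


result = search({"keyword": "tv"})
```

Keeping each stage as a plain function with the same signature makes it easy to insert, remove, or swap a stage, which is one plausible reason for structuring a search API as a pipeline.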
  8. Search team's current work
    Search system + quality improvement
    ・Add functions, correct logic, AB tests
    ・Solr operations (synonyms, schema changes)
    ・Create next-generation search platform
    ・AI-based search optimization
    Search functions + UI/UX improvement
    ・Keyword completion
    ・Related keywords
    ・Personalized recommendations
    ・Recommendation timeline
    Responsible for all of the various functions to search listings
  9. Architecture of the recommendation timeline
    Diagram labels: mercari-api; Viewing history API; user_id; Data acquisition; Data inquiries; New arrival notification; Polling
    ・Acquire frequent categories/brands from viewed listings
    ・Build Solr filter query and cache
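
The two steps named on slide 9 (acquire frequent categories/brands from viewed listings, then build and cache a Solr filter query) could look roughly like the sketch below. The field names `category_id` and `brand_id`, the top-N cutoff, the in-memory dict cache, and the `fetch_viewed` callback are all hypothetical; the slide does not describe the actual schema or cache.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Listing = Dict[str, Optional[int]]


def top_values(viewed: List[Listing], key: str, n: int) -> List[int]:
    """Return the n most frequent non-null values of `key`."""
    counts = Counter(item[key] for item in viewed if item.get(key) is not None)
    return [value for value, _ in counts.most_common(n)]


def build_filter_query(viewed: List[Listing], n: int = 2) -> str:
    """Build a Solr-style filter query such as 'category_id:(10 OR 20)'."""
    clauses = []
    for field_name in ("category_id", "brand_id"):  # hypothetical field names
        values = top_values(viewed, field_name, n)
        if values:
            joined = " OR ".join(str(value) for value in values)
            clauses.append(f"{field_name}:({joined})")
    return " OR ".join(clauses)


# Hypothetical per-user cache; a real service would likely use an external store.
_filter_cache: Dict[int, str] = {}


def filter_query_for_user(user_id: int,
                          fetch_viewed: Callable[[int], List[Listing]]) -> str:
    """Return the cached filter query, computing it on first use."""
    if user_id not in _filter_cache:
        _filter_cache[user_id] = build_filter_query(fetch_viewed(user_id))
    return _filter_cache[user_id]


# Sample viewing history for one user (made-up data).
VIEWED = [
    {"category_id": 10, "brand_id": 7},
    {"category_id": 10, "brand_id": 7},
    {"category_id": 20, "brand_id": None},
    {"category_id": 30, "brand_id": 8},
    {"category_id": 20, "brand_id": 8},
    {"category_id": 10, "brand_id": 7},
]

example_query = build_filter_query(VIEWED)
```

With the sample data, `example_query` selects the two most viewed categories and brands, and repeated calls to `filter_query_for_user` for the same user reuse the cached string instead of fetching the history again.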
  10. Related keyword service
    ・Popular words by category
    ・Popular words by brand
    ・Words related to the keyword
    Can be applied to only categories or brands
    *Currently public release is only for iOS
  11. Architecture for related keywords (API side)
    Diagram labels: mercari-api; Related keywords; Kubernetes Engine; Batch computation; Save; Keyword; Categories/brands; Related keywords
    ・All data is cached
    ・Name-based data: keyword→keyword
    ・Related keyword: keyword→related keyword...
  12. Architecture for related keywords (batch side; CI/CD)
    Batch pipeline (daily cron-like execution) on Kubernetes Engine
    ・Generate related keywords
    ・Import
    ・Mercari API
    ・Delete keyword differences
    CI/CD: Develop; git push; Automatic image building and deployment; Notification of batch results via Slack
  13. Generation of related keywords
    Diagram labels: BigQuery; User × Search keyword (N x M); Item × Search keyword (K x M)
    ① Acquire user event log: search with keyword then tap items
    ② Keyword normalization
      ・NFKC + lowercase
      ・Synonyms, restricted words, spacing
      ・Align according to most-tapped keywords
    ③ Determine categories/brands for each keyword
      ・Aggregate frequency of categories/brands of tapped items
      ・Calculate skewed distribution with custom UDFs
      ・Removal may occur
    ④ Transform into sparse matrices to calculate keyword similarity
      - keyword, user_id
      - keyword, item_id
    ⑤ Merge both models
      ・Deduplicate if related keywords have an inclusion relation
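
Steps ② and ④ on slide 13 can be illustrated with a small, self-contained sketch. It applies NFKC normalization plus lowercasing, then treats each keyword as a sparse binary vector over the IDs that co-occurred with it and compares keywords by cosine similarity. The choice of cosine similarity, the set-based sparse representation, the sample events, and all function names are assumptions for illustration; the slide does not say which similarity measure is used, and merging the user-based and item-based models (step ⑤) is left out.

```python
import math
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalize_keyword(raw: str) -> str:
    """NFKC-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw).lower()
    return " ".join(text.split())


def build_sparse_rows(events: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized keyword to the set of IDs (user or item) seen with it."""
    rows: Dict[str, Set[str]] = defaultdict(set)
    for keyword, entity_id in events:
        rows[normalize_keyword(keyword)].add(entity_id)
    return dict(rows)


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary vectors stored as sets of non-zero columns."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def related_keywords(keyword: str,
                     rows: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank the other keywords by similarity to `keyword`, dropping zero scores."""
    target = rows[normalize_keyword(keyword)]
    scored = [(other, cosine(target, ids))
              for other, ids in rows.items()
              if other != normalize_keyword(keyword)]
    return sorted([pair for pair in scored if pair[1] > 0.0],
                  key=lambda pair: pair[1], reverse=True)


# Made-up (keyword, user_id) events; the same code works for (keyword, item_id).
EVENTS = [
    ("ＮＩＫＥ", "u1"), ("nike", "u2"),
    ("Nike  shoes", "u1"), ("nike shoes", "u2"), ("nike shoes", "u3"),
    ("camera", "u4"),
]

rows = build_sparse_rows(EVENTS)
related = related_keywords("NIKE", rows)
```

Because full-width and mixed-case variants collapse to the same normalized keyword before the rows are built, their events are pooled, which is the practical point of normalizing first.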
  14. Query Autocomplete (QAC)
    Example: typing "Merca" suggests "Mercari store", "Mercari book", "Mercari material", "Mercado bag"
    Release on demand
    Keyword quality control
  15. QAC built upon microservice platform
    Diagram labels: Client; API gateway; Mercari API; Sakura; GCP; K8S; QAC; QAC quality control service
    Continuous testing; Continuous building; Continuous delivery
  16. Keyword quality control
    QAC (quality control service)
    Diagram labels: Recall strategy 1, 2, ..., N; Filter function 1, 2, ..., N; Matching function
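
Slide 16 shows a set of recall strategies feeding a set of filter functions. One way to read that diagram is sketched below: every recall strategy proposes candidate keywords for a prefix, and a candidate is kept only if all filter functions accept it. The specific strategies, filters, keyword list, and the `suggest` function are hypothetical examples, and the matching function from the slide is not modeled here.

```python
from typing import Callable, List, Sequence

RecallStrategy = Callable[[str], List[str]]
FilterFunction = Callable[[str], bool]

# Hypothetical keyword inventory and block list.
POPULAR_KEYWORDS = ["mercari store", "mercari book", "mercado bag", "xxx"]
BLOCKED_KEYWORDS = {"xxx"}


def prefix_recall(prefix: str) -> List[str]:
    """Recall strategy: keywords that start with the typed prefix."""
    return [kw for kw in POPULAR_KEYWORDS if kw.startswith(prefix)]


def substring_recall(prefix: str) -> List[str]:
    """Recall strategy: keywords that contain the typed text anywhere."""
    return [kw for kw in POPULAR_KEYWORDS if prefix in kw]


def not_blocked(keyword: str) -> bool:
    """Filter function: reject keywords on the block list."""
    return keyword not in BLOCKED_KEYWORDS


def not_too_long(keyword: str) -> bool:
    """Filter function: reject overly long suggestions."""
    return len(keyword) <= 30


def suggest(prefix: str,
            recalls: Sequence[RecallStrategy],
            filters: Sequence[FilterFunction],
            limit: int = 5) -> List[str]:
    """Collect candidates from every strategy, then keep those passing all filters."""
    seen = set()
    accepted: List[str] = []
    for recall in recalls:
        for keyword in recall(prefix):
            if keyword in seen:
                continue  # the same candidate may come from several strategies
            seen.add(keyword)
            if all(check(keyword) for check in filters):
                accepted.append(keyword)
    return accepted[:limit]


suggestions = suggest("merca",
                      [prefix_recall, substring_recall],
                      [not_blocked, not_too_long])
```

Since strategies and filters are plain lists of functions, adding an Nth strategy or filter does not require touching `suggest`, which matches the open-ended "1 ... N" layout in the diagram.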
  17. Keyword quality control (text matching)
    Diagram labels: Recall strategy 1, 2, ..., N; Filter function 1, 2, ..., N
    Example: Bag, Shine muscat, Erotica, XXX → Bag, Shine muscat
  18. Keyword quality control (semantic matching)
    Matching function: Keyword, Item
    ・Heterogeneous network embedding
    ・Super-bit hashing
    ・Lucene angle matching
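
Slide 18 only names the techniques, so the following is a generic, pure-Python sketch of Super-Bit hashing rather than a description of Mercari's system. Random Gaussian projection vectors are orthogonalized in batches (Gram-Schmidt), each vector is hashed to the sign bits of its projections, and the Hamming distance between two signatures gives an estimate of the angle between the original vectors. The vector dimension, batch sizes, seed, and the sample embedding are all made up, and how the embeddings are produced or matched in Lucene is not covered.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_batch(dim: int, size: int, rng: random.Random) -> List[Vector]:
    """Gram-Schmidt on random Gaussian vectors; requires size <= dim."""
    basis: List[Vector] = []
    while len(basis) < size:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            projection = dot(v, b)
            v = [x - projection * y for x, y in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:  # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def make_projections(dim: int, batch_size: int, batches: int,
                     seed: int = 0) -> List[Vector]:
    """Build batches * batch_size projection vectors, orthogonal within each batch."""
    if batch_size > dim:
        raise ValueError("batch_size must not exceed the vector dimension")
    rng = random.Random(seed)
    projections: List[Vector] = []
    for _ in range(batches):
        projections.extend(orthonormal_batch(dim, batch_size, rng))
    return projections


def signature(vec: Vector, projections: List[Vector]) -> List[int]:
    """One bit per projection: 1 if the projection is non-negative, else 0."""
    return [1 if dot(vec, p) >= 0 else 0 for p in projections]


def hamming(a: List[int], b: List[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def estimated_angle(a: List[int], b: List[int]) -> float:
    """Estimate the angle (radians) between the original vectors from their signatures."""
    return math.pi * hamming(a, b) / len(a)


# Made-up 4-dimensional embedding and its negation.
projections = make_projections(dim=4, batch_size=4, batches=8)
keyword_vec = [1.0, 2.0, 3.0, 4.0]
opposite_vec = [-x for x in keyword_vec]
sig_keyword = signature(keyword_vec, projections)
sig_opposite = signature(opposite_vec, projections)
```

Comparing short bit signatures is much cheaper than comparing dense embeddings, which is the usual motivation for hashing before a similarity lookup.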
  19. QAC microservice recap
    Diagram labels: Client; API gateway; Mercari API; Sakura; GCP; K8S; QAC; QAC quality control service
    Continuous testing; Continuous building; Continuous delivery
  20. Creating a world-standard search team
    ・Recruiting more members
    ・From monolithic to microservice architecture
    ・Create a more scalable, stronger search platform
    ・Apply IR, NLP, and ML in ways we can show to the world
    ・Structure semi-structured data by detecting item attributes and entities
    ・High-speed PDCA for search improvements
    ・Consider search evaluation metrics unique to C2C
    ・Automatic rank tuning (LTR)
    ・Participate in and present at various international conferences
  21. In July and September, we attended international conferences and held briefing sessions! Please join us next year!