
Unsupervised Synonym Extraction

Kanji Yomoda
December 22, 2021

An introduction to synonyms in search and to the following two papers:
- Query Rewriting using Automatic Synonym Extraction for E-commerce Search
http://ceur-ws.org/Vol-2410/paper20.pdf
- Unsupervised Synonym Extraction for Document Enhancement in E-commerce Search
https://assets.amazon.science/ba/2b/33b4140240049f3c8261d2ddcf2c/unsupervised-synonym-extraction-for-document-enhancement-in-ecommerce-search.pdf

Transcript

  1. Confidential & Proprietary 2021

    Contents
    - Concept of synonyms
    - Case study: eBay, Amazon, GBD project
    - Conclusion
  2. Synonyms in search?

    - LV ↭ Louis Vuitton (acronym)
    - Smartphone → iPhone (parent → child)
    - 開戦 → 海鮮 (wrong kanji selected: both are read "kaisen", but 開戦 means "outbreak of war" and 海鮮 means "seafood")
    - Buisness → Business (misspelling)
    - Sofa ↭ Couch (synonym)
    - Feb ↭ February (abbreviation)
  3. Synonyms in search?

    [Diagram: a query surrounded by its synonyms]
    - Synonyms express what the user wants to find with a query
    - Adding them to the query is one way of doing query expansion
  4. Background

    - "Xmas tree" => 904 results
    - "Christmas tree" => 999+ (about 15k) results
  5. Candidate Generation - eBay

    1. Create several pairs of n-grams from the following transitions:
    - Query - Query transitions: pairs of queries within the same user session
    - Query - Item transitions: pairs of a query and the titles of items clicked on the same search results page (SRP)
    - Item - Item transitions: pairs of items shown on the same SRP
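
A minimal Python sketch of collecting the three transition types, assuming hypothetical log structures (a list of queries per session, the titles clicked for a query, the titles shown on one SRP). The helper names are illustrative, and pairing only adjacent queries in a session is an assumption; the slide only says "within the same user session".

```python
from itertools import combinations


def query_query_transitions(session_queries):
    """Pairs of queries issued one after another in the same user session.

    Assumption: only adjacent queries are paired.
    """
    return list(zip(session_queries, session_queries[1:]))


def query_item_transitions(query, clicked_titles):
    """Pairs of a query and each item title clicked on its SRP."""
    return [(query, title) for title in clicked_titles]


def item_item_transitions(srp_titles):
    """Unordered pairs of item titles shown on the same SRP."""
    return list(combinations(srp_titles, 2))


# Hypothetical toy logs
session = ["womens dresses", "ladies gown", "ladies gown red"]
print(query_query_transitions(session))
print(query_item_transitions("ladies gown", ["Women's Evening Gown"]))
print(item_item_transitions(["Gown A", "Dress B", "Gown C"]))
```

Each pair of texts produced here would then be broken into n-gram pairs, as the next slide illustrates.
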
  6. Candidate Generation - eBay: Query - Query Transitions (bigram)

    e.g. "womens dresses" → "ladies gown" yields the pairs:
    womens - ladies, womens - gown, dresses - ladies, dresses - gown, womens dresses - ladies, womens dresses - gown, womens dresses - ladies gown, ladies gown - womens, ladies gown - dresses
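
The n-gram pairing in this example amounts to a Cartesian product of each query's unigrams and bigrams. A minimal sketch, assuming whitespace tokenization and n <= 2 (function names are hypothetical). The slide writes the unigram-to-bigram pairs in reverse order (e.g. "ladies gown - womens"); they are the same unordered pairs.

```python
from itertools import product


def ngrams(tokens, max_n=2):
    """All contiguous n-grams with 1 <= n <= max_n, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def ngram_pairs(source_query, target_query, max_n=2):
    """Cartesian product of the n-grams of two queries in a transition,
    skipping pairs whose two sides are identical."""
    source = ngrams(source_query.split(), max_n)
    target = ngrams(target_query.split(), max_n)
    return [(a, b) for a, b in product(source, target) if a != b]


for a, b in ngram_pairs("womens dresses", "ladies gown"):
    print(f"{a} - {b}")
```

For the "womens dresses" → "ladies gown" transition this yields the nine pairs listed on the slide (3 source n-grams × 3 target n-grams).
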
  7. Candidate Generation - eBay

    2. Filter the candidates with the following components:
      1. Stemming equivalents → exclude (e.g. boat - boats)
      2. Compounding equivalents → exclude (e.g. sail boat - sailboat)
      3. Synonyms already found in an external dictionary → exclude
      4. Acronym - full form equivalents → identify (e.g. hp ↭ Hewlett Packard)
      5. Phrasing (enforce strict word ordering) → restrict, e.g. match "new balance" only as an exact phrase
      6. And more (e.g. stop words)
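
Rules 1 to 4 can be sketched as simple string checks. This is an illustrative sketch, not eBay's implementation: Porter stemmer step 1a (plural suffixes only) stands in for whatever stemmer was actually used, the acronym test simply compares against word initials, phrasing and stop-word handling are omitted, and the "exclude" / "acronym" / "keep" routing labels are hypothetical.

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer: sses->ss, ies->i, ss->ss, s->''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def is_stemming_equivalent(a, b):
    """True if a and b differ but reduce to the same stems token by token."""
    ta, tb = a.lower().split(), b.lower().split()
    return (a != b and len(ta) == len(tb)
            and all(porter_step_1a(x) == porter_step_1a(y) for x, y in zip(ta, tb)))


def is_compounding_equivalent(a, b):
    """True if a and b differ only in spacing, e.g. 'sail boat' vs 'sailboat'."""
    return a != b and a.replace(" ", "").lower() == b.replace(" ", "").lower()


def is_acronym_of(short, full):
    """True if `short` equals the initials of the (multi-word) `full`."""
    words = full.split()
    return len(words) > 1 and short.lower() == "".join(w[0] for w in words).lower()


def classify_candidate(a, b, dictionary=frozenset()):
    """Route a candidate pair to 'exclude', 'acronym', or 'keep'."""
    if is_stemming_equivalent(a, b) or is_compounding_equivalent(a, b):
        return "exclude"
    if (a, b) in dictionary or (b, a) in dictionary:
        return "exclude"  # already covered by the external dictionary
    if is_acronym_of(a, b) or is_acronym_of(b, a):
        return "acronym"
    return "keep"  # left for the learned filter


print(classify_candidate("boat", "boats"))
print(classify_candidate("hp", "Hewlett Packard"))
```

Pairs routed to "keep" are the ones a learned filter, such as the classifier on the next slide, would still need to judge.
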
  8. Filtering - eBay

    Employed a Random Forest classifier.
    • Trained on human-judged binary labels indicating whether a synonym is applicable for query expansion:
      label=0 => irrelevant pair, e.g. shoes - sandals
      label=1 => true synonyms, e.g. shoe - shoes
    • The classifier has 100 trees with a tree depth of 20, built with the Python scikit-learn package
    • Behavioral features: clicks, sales and other associated engagement signals (query-item and item-item transitions, and so on)
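
The classifier configuration on this slide maps directly onto scikit-learn's `RandomForestClassifier`. The sketch below uses randomly generated features and labels purely for illustration; the four feature columns are hypothetical stand-ins for the behavioral signals, not the paper's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only (NOT from the paper):
# each row is one candidate pair, each column a hypothetical behavioral feature
# (e.g. normalized co-click, co-sale, query-item and item-item transition counts).
rng = np.random.default_rng(seed=0)
X = rng.random((500, 4))
# Synthetic labels: 1 = true synonym, 0 = irrelevant pair.
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

# Hyperparameters as stated on the slide: 100 trees, tree depth 20.
clf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=0)
clf.fit(X, y)

# Predicted probability that each of the first five pairs is a true synonym.
print(clf.predict_proba(X[:5])[:, 1])
```

With real data, `X` would hold the behavioral features of each candidate pair and `y` the human judgments; pairs with a high predicted probability could then be kept as synonyms.
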
  9. Filtering - eBay

    Overall, the absolute error for stemming-based synonyms appears higher than that for space synonyms in most subtypes. (http://ceur-ws.org/Vol-2410/paper20.pdf)
  10. Experiment - eBay: Online A/B test

    Ran an online A/B test for 3 weeks.
    - Control (default): no query rewrite
    - Test: rewrite the query with the extracted synonyms, e.g. "ps 4 and games" → (ps 4 OR playstation 4) AND (games OR game)
    → Improvements in both the relevance of the search results page (SRP) and user engagement
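
The rewrite in this example can be reproduced with a greedy longest match over a synonym dictionary. A minimal sketch under stated assumptions: the dictionary contents are hypothetical, "and" is treated as a stop word, and matching is limited to n-grams of up to two tokens; the rewrite logic used in the actual experiment may differ.

```python
STOP_WORDS = {"and"}  # assumption: dropped, since clauses are AND-ed anyway


def rewrite_query(query, synonyms, max_n=2):
    """Rewrite a query into a boolean expression using a synonym dictionary.

    Greedy longest match: at each position, use the longest n-gram
    (n <= max_n) found in `synonyms`, otherwise fall back to one token.
    """
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    clauses = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n == 1 or phrase in synonyms:
                break
        alternatives = [phrase] + synonyms.get(phrase, [])
        # OR the phrase with its synonyms; leave phrases without synonyms bare.
        clauses.append(f"({' OR '.join(alternatives)})" if len(alternatives) > 1 else phrase)
        i += n
    return " AND ".join(clauses)


# Hypothetical synonym dictionary
synonyms = {"ps 4": ["playstation 4"], "games": ["game"]}
print(rewrite_query("ps 4 and games", synonyms))
# (ps 4 OR playstation 4) AND (games OR game)
```

Matching the longest n-gram first keeps "ps 4" together as one unit, so its synonym "playstation 4" is attached to the whole phrase rather than to "ps" alone.
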
  11. Candidate Generation - Amazon

    Employed a query-product graph to obtain more candidates.
    [Figure: query-product graph. A solid line indicates a direct connection; a dotted line denotes an indirect connection.]
  12. Candidate Generation - Amazon

    1. Create a query - title bipartite graph
    2. Cluster the nodes with a label propagation algorithm (https://arxiv.org/pdf/1709.05634.pdf)
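
Label propagation comes in several variants, and the slide does not say which one was used. The sketch below uses the common community-detection form (each node repeatedly adopts the most frequent label among its neighbours), which may differ from the semi-supervised algorithm in the referenced label propagation paper. The sorted update order, smallest-label tie-breaking, and the "q:" / "p:" node naming are assumptions made to keep the example deterministic.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=20):
    """Cluster nodes of an undirected graph by label propagation.

    Each node starts with its own label and repeatedly adopts the most
    frequent label among its neighbours (ties broken by the smallest label),
    visiting nodes in sorted order, until no label changes or max_iter is hit.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    labels = {node: node for node in neighbours}
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[nb] for nb in neighbours[node])
            best = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical query-title bipartite graph ("q:" = query, "p:" = product title)
edges = [("q:xmas tree", "p:1"), ("q:christmas tree", "p:1"),
         ("q:christmas tree", "p:2"), ("q:phone case", "p:3")]
clusters = defaultdict(list)
for node, label in label_propagation(edges).items():
    clusters[label].append(node)
print(dict(clusters))
```

Queries that end up in the same cluster, such as "xmas tree" and "christmas tree" here, become candidates for the similarity step on the next slide.
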
  13. Candidate Generation - Amazon

    3. Vectorize each query as a weight vector over the products connected to it in the same graph
    4. Compute the cosine similarity between the query vectors
    5. Add indirect connections
    [Figure: a solid line indicates a direct connection; a dotted line denotes an indirect connection.]
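
Steps 3 to 5 can be sketched as follows, assuming each edge carries a weight such as a click count and that the edges passed in belong to a single cluster. The 0.5 threshold and all names are hypothetical; the slide does not state how the weights or the threshold were chosen.

```python
import math
from collections import defaultdict
from itertools import combinations


def query_vectors(edges):
    """Map each query to a sparse weight vector over its connected products.

    edges: (query, product, weight) triples; weight could be e.g. a click count.
    """
    vectors = defaultdict(dict)
    for query, product, weight in edges:
        vectors[query][product] = vectors[query].get(product, 0.0) + weight
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def indirect_connections(edges, threshold=0.5):
    """Query pairs whose product vectors have cosine similarity >= threshold."""
    vectors = query_vectors(edges)
    links = []
    for q1, q2 in combinations(sorted(vectors), 2):
        sim = cosine(vectors[q1], vectors[q2])
        if sim >= threshold:
            links.append((q1, q2, sim))
    return links


# Hypothetical edges of one cluster: (query, product, click count)
edges = [("xmas tree", "p1", 3.0), ("christmas tree", "p1", 4.0),
         ("christmas tree", "p2", 1.0), ("phone case", "p3", 5.0)]
print(indirect_connections(edges))
```

Here "xmas tree" and "christmas tree" share product p1, so they get an indirect connection, while "phone case" shares no products and does not.
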
  14. Filtering - Amazon

    Employed a BERT model.
    1. Trained the model on a dataset of search queries and product titles
    2. Fine-tuned it using Query-Intent labels
    3. Generated a 768-dimensional embedding for each n-gram in the synonym candidates
    4. Filtered by the cosine similarity of each synonym pair
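
The final filtering step reduces to a cosine-similarity threshold over n-gram embeddings. In this sketch the fine-tuned BERT encoder is abstracted as a hypothetical `embed` callable, toy 3-dimensional vectors stand in for the 768-dimensional embeddings, and the 0.8 threshold is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def filter_by_embedding(pairs, embed, threshold=0.8):
    """Keep candidate pairs whose n-gram embeddings are similar enough.

    embed: any callable mapping an n-gram to a vector (e.g. a fine-tuned
    encoder producing 768-dimensional vectors); the threshold is hypothetical.
    """
    kept = []
    for a, b in pairs:
        score = cosine(embed(a), embed(b))
        if score >= threshold:
            kept.append((a, b, score))
    return kept


# Toy 3-dimensional embeddings standing in for 768-dimensional BERT vectors
toy = {"sofa": [0.9, 0.1, 0.0], "couch": [0.8, 0.2, 0.0],
       "shoes": [0.1, 0.9, 0.1], "sandals": [0.1, 0.3, 0.9]}
candidates = [("sofa", "couch"), ("shoes", "sandals")]
print(filter_by_embedding(candidates, toy.__getitem__))
```

Keeping the encoder behind a callable separates the filtering logic from model loading, so the same function works with any embedding source.
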
  15. Experiment - Amazon

    Offline human audits
    - 14.5% of expansions were irrelevant
    - 33.1% were relevant in context
    - 52.4% were fully interchangeable
    Online A/B test
    - Expanded the text of product descriptions using over one million extracted synonyms and ran an A/B test in 6 countries
    → Increased both purchases and conversion
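
Document expansion, unlike the query rewriting in the eBay experiment, adds synonyms to the product side of the index. A minimal sketch, assuming a hypothetical synonym dictionary and n-grams of up to two tokens; how the expansion terms were actually indexed (appended to the description or stored in a separate field) is not stated on the slide.

```python
def ngrams(tokens, max_n=2):
    """All contiguous n-grams with 1 <= n <= max_n, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def expand_document(text, synonyms, max_n=2):
    """Return synonyms to index alongside a product's text.

    For every n-gram (n <= max_n) of the text found in `synonyms`, collect its
    synonyms that do not already occur as an n-gram of the text, without duplicates.
    """
    grams = ngrams(text.lower().split(), max_n)
    present = set(grams)
    expansion = []
    for gram in grams:
        for syn in synonyms.get(gram, []):
            if syn not in present and syn not in expansion:
                expansion.append(syn)
    return expansion


# Hypothetical synonym dictionary
synonyms = {"xmas tree": ["christmas tree"], "womens": ["ladies"]}
print(expand_document("Womens Xmas Tree Ornament", synonyms))
# ['ladies', 'christmas tree']
```

Because the expansion happens at indexing time, a query for "christmas tree" can match this product without any query-side rewrite.
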
  16. Takeaways

    1. There are several approaches to generating a large set of synonym candidates
    2. Generating candidates is easy; filtering them is hard
    3. Synonyms can be applied in different ways: query expansion or document expansion
  17. References

    - Query Rewriting using Automatic Synonym Extraction for E-commerce Search
    - Unsupervised Synonym Extraction for Document Enhancement in E-commerce Search
    - Learning from Labeled and Unlabeled Data with Label Propagation
    - Light Feed-Forward Networks for Shard Selection in Large-scale Product Search