Slide 1

Slide 1 text

Unsupervised Synonym Extraction
Kanji Yomoda (@k-yomo)
December 2021

Slide 2

Slide 2 text

Confidential & Proprietary 2021
Contents
- Concept of synonyms
- Case study
  - eBay
  - Amazon
- GBD project
- Conclusion

Slide 3

Slide 3 text

Synonyms in search?
Synonyms in search are not only true synonyms

Slide 4

Slide 4 text

Synonyms in search?
- Acronym: LV ↭ Louis Vuitton
- Parent - Child: Smartphone → iPhone
- Selecting wrong kanji: 開戦 → 海鮮 (both read "kaisen"; "outbreak of war" → "seafood")
- Misspelling: Buisness → Business
- Synonym: Sofa ↭ Couch
- Abbreviation: Feb ↭ February

Slide 5

Slide 5 text

Synonyms in search?
[Figure: a query and its synonyms together cover what the user wants to find with the query]
A way of query expansion

Slide 6

Slide 6 text

Background
“Xmas tree” => 904 results
“Christmas tree” => 999+ (about 15k) results

Slide 7

Slide 7 text

Case study

Slide 8

Slide 8 text

eBay
http://ceur-ws.org/Vol-2410/paper20.pdf

Slide 9

Slide 9 text

Candidate Generation - eBay
1. Create several pairs of n-grams from the following transitions
- Query - Query transitions: pairs of queries within the same user session
- Query - Item transitions: pairs of a query and clicked item titles in the same search result page (SRP)
- Item - Item transitions: pairs of items shown in the same SRP

Slide 10

Slide 10 text

Candidate Generation - eBay
Query - Query transitions (bigram)
e.g. "womens dresses" → "ladies gown"
- womens - ladies
- womens - gown
- dresses - ladies
- dresses - gown
- womens dresses - ladies
- womens dresses - gown
- womens dresses - ladies gown
- ladies gown - womens
- ladies gown - dresses
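The pair generation for a query - query transition can be sketched as a cross product of the n-grams on each side. This is a minimal sketch, not the paper's implementation: the function names `ngrams` and `candidate_pairs` and the whitespace tokenization are assumptions made for illustration. For the example transition it yields the same nine pairs as the list above, ignoring the orientation of each pair.

```python
from itertools import product


def ngrams(query, max_n=2):
    """Return all word n-grams of the query for n = 1..max_n."""
    tokens = query.lower().split()  # assumption: plain whitespace tokenization
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def candidate_pairs(source_query, target_query, max_n=2):
    """Pair every n-gram of the source query with every n-gram of the target query."""
    return list(product(ngrams(source_query, max_n), ngrams(target_query, max_n)))


pairs = candidate_pairs("womens dresses", "ladies gown")
for left, right in pairs:
    print(f"{left} - {right}")
```

The same function could be applied to query - item and item - item transitions by passing item titles instead of queries.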

Slide 11

Slide 11 text

Candidate Generation - eBay
2. Filter with the following components
  1. Stemming equivalents → exclude, e.g. boat - boats
  2. Compounding equivalents → exclude, e.g. sail boat and sailboat
  3. External dictionary based synonyms → exclude ones found in the dictionary
  4. Acronym - full form equivalents → identify, e.g. hp ↭ Hewlett Packard
  5. Phrasing (enforce strict word ordering) → restrict, e.g. only “new balance”
  6. and more (stop words)
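A few of these components can be approximated with simple string checks. The sketch below is only an illustration under stated assumptions: `naive_stem` is a crude stand-in for a real stemmer (it only strips one trailing "s"), and the function names and category labels are hypothetical, not taken from the paper.

```python
def naive_stem(word):
    """Crude stand-in for a real stemmer: strip a single trailing 's'."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def is_stemming_equivalent(a, b):
    """True when both phrases are identical after stemming each word."""
    stem = lambda phrase: " ".join(naive_stem(w) for w in phrase.split())
    return a != b and stem(a) == stem(b)


def is_compounding_equivalent(a, b):
    """True when the phrases differ only by spaces, e.g. 'sail boat' / 'sailboat'."""
    return a != b and a.replace(" ", "") == b.replace(" ", "")


def is_acronym_of(short, full):
    """True when `short` is built from the initials of a multi-word `full`."""
    words = full.split()
    return len(words) > 1 and "".join(w[0] for w in words) == short


def classify_pair(a, b):
    """Return a hypothetical category label for a candidate pair, or None."""
    a, b = a.lower(), b.lower()
    if is_stemming_equivalent(a, b):
        return "stemming"
    if is_compounding_equivalent(a, b):
        return "compounding"
    if is_acronym_of(a, b) or is_acronym_of(b, a):
        return "acronym"
    return None


print(classify_pair("boat", "boats"))
print(classify_pair("sail boat", "sailboat"))
print(classify_pair("hp", "Hewlett Packard"))
```

Pairs that fall into none of these categories return `None` and would move on to the remaining components.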

Slide 12

Slide 12 text

Filtering - eBay
Employed a Random Forest classifier
● Trained on human-judged binary labels indicating the applicability of the synonyms to query expansion
  label=0 => irrelevant pair, e.g. shoes - sandals
  label=1 => true synonyms, e.g. shoe - shoes
● The classifier has 100 trees with a tree depth of 20 and was built with the scikit-learn package for Python
● Behavioral features: clicks, sales and other associated engagement signals (such as query-item and item-item transitions)
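The classifier setup described on this slide might look roughly like the following in scikit-learn. Only the hyperparameters (100 trees, depth 20) come from the slide; the three feature columns, the toy values and the labels are hypothetical placeholders, and `random_state` is set only to make the sketch reproducible.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical behavioral features per candidate pair:
# [click overlap, sale overlap, transition count]. Values are made up.
X_train = [
    [0.90, 0.80, 120],  # e.g. shoe - shoes
    [0.20, 0.10, 15],   # e.g. shoes - sandals
    [0.85, 0.70, 90],
    [0.30, 0.20, 10],
]
y_train = [1, 0, 1, 0]  # 1 = true synonym, 0 = irrelevant pair

# 100 trees with a maximum depth of 20, as stated on the slide.
clf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=0)
clf.fit(X_train, y_train)

# Probability that a new candidate pair is a true synonym.
scores = clf.predict_proba([[0.88, 0.75, 100]])[:, 1]
print(scores[0])
```

Candidates whose score falls below a chosen cutoff would be dropped before query expansion; the cutoff itself is not given in the slides.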

Slide 13

Slide 13 text

Filtering - eBay
The absolute error for stemming-based synonyms overall seems higher than that of the space synonyms for most subtypes
http://ceur-ws.org/Vol-2410/paper20.pdf

Slide 14

Slide 14 text

Experiment - eBay
Online A/B test
Ran an online A/B test for 3 weeks
Default: No rewrite
Test: Rewrite the query with extracted synonyms
E.g. “ps 4 and games” → ((ps 4 OR playstation 4) AND (games OR game))
→ improvements in both relevance of the search result page (SRP) and user engagement
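A rewrite like the one in the example can be sketched as follows. This is an assumption-laden illustration, not eBay's rewriter: the function name `rewrite_query`, the stop word set, the greedy longest-match segmentation and the synonym dictionary are all hypothetical.

```python
STOP_WORDS = {"and"}  # assumption: "and" is dropped as a stop word


def rewrite_query(query, synonyms, max_n=2):
    """Expand each matched n-gram into an OR group and join groups with AND."""
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    groups = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest n-gram first, fall back to one token.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in synonyms or n == 1:
                groups.append([phrase] + synonyms.get(phrase, []))
                i += n
                break
    clauses = [
        "(" + " OR ".join(alts) + ")" if len(alts) > 1 else alts[0]
        for alts in groups
    ]
    return "(" + " AND ".join(clauses) + ")"


synonyms = {"ps 4": ["playstation 4"], "games": ["game"]}
rewritten = rewrite_query("ps 4 and games", synonyms)
print(rewritten)  # ((ps 4 OR playstation 4) AND (games OR game))
```

Terms without any extracted synonym are left as plain AND clauses, so the rewrite only ever broadens recall.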

Slide 15

Slide 15 text

Experiment - eBay
http://ceur-ws.org/Vol-2410/paper20.pdf

Slide 16

Slide 16 text

Amazon
https://assets.amazon.science/ba/2b/33b4140240049f3c8261d2ddcf2c/unsupervised-synonym-extraction-for-document-enhancement-in-ecommerce-search.pdf

Slide 17

Slide 17 text

Candidate Generation - Amazon
Employed a query-product graph to obtain more candidates
[Figure: a solid line indicates a direct connection and a dotted line denotes an indirect connection]

Slide 18

Slide 18 text

Candidate Generation - Amazon
1. Create a query - title bipartite graph
2. Cluster nodes with the label propagation algorithm
https://arxiv.org/pdf/1709.05634.pdf
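The two steps above can be sketched with a small, deterministic label propagation over a query - title graph. The slides do not say which variant of label propagation was used, so this sketch assumes the simple asynchronous majority-label scheme; the node names, edges and tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iters=20):
    """Cluster graph nodes by repeatedly adopting the most common neighbor label."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {node: node for node in neighbors}  # every node starts in its own cluster
    for _ in range(max_iters):
        changed = False
        for node in sorted(neighbors):  # fixed order keeps the sketch deterministic
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            # Assumption: ties are broken by the smallest label.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical bipartite graph: "q:" nodes are queries, "t:" nodes are product titles.
edges = [
    ("q:xmas tree", "t:tree-1"),
    ("q:christmas tree", "t:tree-1"),
    ("q:sofa", "t:sofa-1"),
    ("q:couch", "t:sofa-1"),
]
labels = label_propagation(edges)
print(labels)
```

Queries that end up with the same label form one cluster, which is where synonym candidates are then searched for.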

Slide 19

Slide 19 text

Candidate Generation - Amazon
3. Vectorize each query as a weight vector over the products connected to it in the same graph
4. Compute the cosine similarity between query vectors
5. Add indirect connections
[Figure: a solid line indicates a direct connection and a dotted line denotes an indirect connection]
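Steps 3 to 5 can be sketched as below. The weights, product ids, the 0.8 threshold and the function names are hypothetical; the slides do not state how the weights or the threshold were chosen, and the sketch assumes all queries shown belong to one cluster.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def indirect_connections(query_vectors, threshold=0.8):
    """For each query, collect products reachable only via a similar query."""
    indirect = {}
    for query, vec in query_vectors.items():
        borrowed = set()
        for other, other_vec in query_vectors.items():
            if other != query and cosine(vec, other_vec) >= threshold:
                borrowed |= set(other_vec) - set(vec)
        indirect[query] = borrowed
    return indirect


# Hypothetical weights, e.g. click counts between a query and a product.
query_vectors = {
    "xmas tree": {"p1": 3, "p2": 1},
    "christmas tree": {"p1": 2, "p2": 2, "p3": 1},
    "sofa": {"p9": 4},
}
similarity = cosine(query_vectors["xmas tree"], query_vectors["christmas tree"])
indirect = indirect_connections(query_vectors)
print(round(similarity, 3), indirect["xmas tree"])
```

Here "xmas tree" gains an indirect (dotted) connection to product `p3`, which it never reached directly, because its vector is close to that of "christmas tree".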

Slide 20

Slide 20 text

Filtering - Amazon
Employed a BERT model
1. Train the model on the search query and product title dataset
2. Fine-tune it using Query-Intent labels
3. Generate a 768-dimensional embedding for each n-gram in the synonym candidates
4. Filter by the cosine similarity of each synonym pair

Slide 21

Slide 21 text

Experiment - Amazon
Offline human audits
14.5% of expansions were irrelevant, 33.1% were relevant in the context, and 52.4% were fully interchangeable
Online A/B test
Expanded the text in the product description using over one million extracted synonyms and ran an A/B test in 6 countries
→ increased both purchase and conversion
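Document expansion, as opposed to query rewriting, adds the synonyms to the indexed text. A minimal sketch, assuming a hypothetical `expand_document` helper, whitespace tokenization and a toy synonym dictionary (how Amazon stores the expanded text is not described in the slides):

```python
def expand_document(text, synonyms, max_n=2):
    """Append synonyms of every matched n-gram to the end of the document text."""
    tokens = text.lower().split()
    extra = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            for synonym in synonyms.get(phrase, []):
                if synonym not in extra:  # avoid adding the same synonym twice
                    extra.append(synonym)
    return text if not extra else text + " " + " ".join(extra)


synonyms = {"xmas tree": ["christmas tree"]}
expanded = expand_document("artificial xmas tree with lights", synonyms)
print(expanded)  # artificial xmas tree with lights christmas tree
```

With the expanded text indexed, a search for "christmas tree" can match this product without any change to the query.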

Slide 22

Slide 22 text

Takeaway
1. There are several approaches to populate a large set of synonym candidates
2. Generating candidates is easier; filtering them is difficult
3. There are different approaches for applying synonyms: query expansion or document expansion

Slide 23

Slide 23 text

References
- Query Rewriting using Automatic Synonym Extraction for E-commerce Search
- Unsupervised Synonym Extraction for Document Enhancement in E-commerce Search
- Learning from Labeled and Unlabeled Data with Label Propagation
- Light Feed-Forward Networks for Shard Selection in Large-scale Product Search

Slide 24

Slide 24 text

Thanks!