Slide 1

Slide 1 text

Recent Advances on Candidate Matching
Chenliang LI, Wuhan University
2023.03.02

Slide 2

Slide 2 text

Part 1: Introduction
We are now living in an age of INFORMATION EXPLOSION.
• Search
• Recommendation
• Chatbot
Information seeking happens every day, for everyone, almost everywhere.

Slide 3

Slide 3 text

Part 1: Introduction
The two-stage pipeline is widely deployed in real-world systems.
• Phase 1, Candidate Matching: matching models reduce all items (millions) to hundreds of candidates.
• Phase 2, Ranking: ranking models reduce the hundreds of candidates to tens.
• Data sources: all items, user history and contexts, all other side info.
The terms candidate matching, retrieval, and generation are used interchangeably.

Slide 4

Slide 4 text

Part 1: Introduction
The two-stage pipeline is widely deployed in real-world systems (Phase 1: candidate matching, millions to hundreds; Phase 2: ranking, hundreds to tens).
• Matching models must run with LOW LATENCY.
• [Figure: a query encoder (example query "starbucks drinks") and an item encoder produce a score S.]
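The encoder pair on this slide can be illustrated with a minimal sketch, assuming the score S is a dot product and that item vectors are precomputed offline so that only the query is encoded at request time. The toy embeddings and the names `ITEM_VECS`, `TOKEN_VECS`, `encode_query`, and `retrieve` are hypothetical and not taken from the talk.

```python
import heapq

# Hypothetical item vectors, precomputed offline by the item encoder.
ITEM_VECS = {
    "latte": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.1],
    "sneakers": [0.0, 0.1, 0.9],
}

# Hypothetical token embeddings standing in for a learned query encoder.
TOKEN_VECS = {
    "starbucks": [1.0, 0.0, 0.0],
    "drinks": [0.6, 0.4, 0.0],
}


def encode_query(text):
    """Toy query encoder: average the embeddings of the known tokens."""
    vecs = [TOKEN_VECS[t] for t in text.lower().split() if t in TOKEN_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve(query, k=2):
    """Score every item against the query vector and keep the top k."""
    q = encode_query(query)
    scored = ((dot(q, vec), item) for item, vec in ITEM_VECS.items())
    return [item for _, item in heapq.nlargest(k, scored)]


print(retrieve("starbucks drinks", k=2))
```

In a real system the exhaustive loop over items would typically be replaced by an approximate nearest-neighbor index, which is what makes the decoupled encoders attractive under a latency budget.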

Slide 5

Slide 5 text

Part 1: Introduction
The two-stage pipeline is widely deployed in real-world systems (Phase 1: candidate matching, millions to hundreds; Phase 2: ranking, hundreds to tens).
• Matching models must run with LOW LATENCY: a query encoder (example query "starbucks drinks") and an item encoder produce a score S.
• Open question: how to use user history, contexts, and all other side info while keeping LOW LATENCY?

Slide 6

Slide 6 text

Part 1: Introduction
The HISTORY of candidate matching (from before 2013 to now):
• Item-based CF: Pearson-based CF, SLIM (i.e., MF)
• Representation Learning: DSSM, YoutubeDNN, BST, DPR, ESAM, Condenser, MADR, MIND, KEMI
• Interaction-based Learning: UMI, PDN, AGREE

Slide 7

Slide 7 text

Part 1: Introduction
The HISTORY of candidate matching: Item-based CF
• Item-based filtering (Amazon, 2001): items A, B, C linked by high correlation
• Pearson-based CF: Pearson coefficient
• SLIM (i.e., MF): latent vector
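As an illustration of Pearson-based item-to-item filtering, here is a minimal sketch that assumes every user has rated every item. The rating data and the helper names `pearson` and `most_similar` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant rating column carries no correlation signal
    return cov / (std_x * std_y)


# Hypothetical ratings: one list per item, one position per user.
RATINGS = {
    "A": [5, 4, 1, 2],
    "B": [4, 5, 2, 1],
    "C": [1, 2, 5, 4],
}


def most_similar(item):
    """Return the other item whose ratings correlate most with `item`."""
    others = [(pearson(RATINGS[item], r), name)
              for name, r in RATINGS.items() if name != item]
    return max(others)[1]


print(most_similar("A"))
```

With real, sparse data the correlation is usually computed only over users who rated both items.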

Slide 8

Slide 8 text

Part 1: Introduction
The HISTORY of candidate matching: Representation Learning
• Models: DSSM, YoutubeDNN, BST, DPR, ESAM, Condenser, MADR, MIND, KEMI
• Techniques: DNN/Transformer, Multi-Interest Learning

Slide 9

Slide 9 text

Part 1: Introduction
The HISTORY of candidate matching: Interaction-based Learning
• Models: UMI, PDN, AGREE
• Open question: how can interaction-based feature learning be made efficient?

Slide 10

Slide 10 text

Part 2: Overview - Representation Learning - DNN
• Parallelized, accelerated computation is a must for dense retrieval.
References:
• Learning Deep Structured Semantic Models for Web Search using Clickthrough Data, CIKM 2013
• Deep Neural Networks for YouTube Recommendations, RecSys 2016

Slide 11

Slide 11 text

Part 2: Overview - Representation Learning - Multi-Interest Learning
• Goal: model the multi-aspect / multi-interest nature of the world.
References:
• Controllable Multi-Interest Framework for Recommendation, KDD 2020
• Multi-Aspect Dense Retrieval, KDD 2022
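One way to picture multi-interest matching is that a user is represented by several interest vectors instead of one, and an item is scored against its best-matching interest. The sketch below assumes this max-over-interests scoring, which is a common choice but not necessarily the one used in the cited papers. All names and vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multi_interest_score(interests, item_vec):
    """Score an item by the interest vector that matches it best."""
    return max(dot(z, item_vec) for z in interests)


def retrieve(interests, items, k):
    """Rank items by their multi-interest score and keep the top k."""
    ranked = sorted(
        items,
        key=lambda name: multi_interest_score(interests, items[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical user with two distinct interests and three candidate items.
interests = [[1.0, 0.0], [0.0, 1.0]]
items = {
    "coffee": [0.9, 0.1],
    "shoes": [0.1, 0.8],
    "mixed": [0.5, 0.5],
}
print(retrieve(interests, items, k=2))
```

A single averaged user vector such as [0.5, 0.5] would score "coffee" and "mixed" equally, which is the loss of resolution that separate interest vectors are meant to avoid.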

Slide 12

Slide 12 text

Part 2: Overview - Representation Learning - GNN
• Goal: exploit high-order semantics and correlations.
References:
• Neural Graph Collaborative Filtering, SIGIR 2019
• LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation, SIGIR 2020
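To make "high-order correlations" concrete, here is a minimal sketch of LightGCN-style propagation on a user-item graph: each layer replaces a node's embedding with a symmetrically normalized sum of its neighbors' embeddings, with no weight matrices or nonlinearities, and the final embedding is the mean over layers. The function name `propagate` and the toy graph are hypothetical, and training is omitted.

```python
import math


def propagate(edges, emb, num_layers=2):
    """LightGCN-style propagation over a bipartite user-item graph.

    edges: list of (user_id, item_id) pairs.
    emb:   dict mapping every node id to its layer-0 embedding (list of floats).
    Returns a dict with the mean of the embeddings from layers 0..num_layers.
    """
    neighbors = {node: [] for node in emb}
    for user, item in edges:
        neighbors[user].append(item)
        neighbors[item].append(user)
    degree = {node: len(ns) for node, ns in neighbors.items()}

    layers = [emb]
    for _ in range(num_layers):
        prev, cur = layers[-1], {}
        for node in emb:
            dim = len(emb[node])
            acc = [0.0] * dim
            for other in neighbors[node]:
                # Symmetric normalization 1 / sqrt(deg(node) * deg(other)).
                weight = 1.0 / math.sqrt(degree[node] * degree[other])
                for d in range(dim):
                    acc[d] += weight * prev[other][d]
            cur[node] = acc
        layers.append(cur)

    return {
        node: [sum(layer[node][d] for layer in layers) / len(layers)
               for d in range(len(emb[node]))]
        for node in emb
    }


# Hypothetical graph: one user connected to one item.
result = propagate([("u1", "i1")], {"u1": [1.0, 0.0], "i1": [0.0, 1.0]}, num_layers=1)
print(result["u1"])
```

Stacking k layers lets a user's embedding absorb signal from nodes k hops away, which is where the higher-order correlations come from.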

Slide 13

Slide 13 text

Part 2: Overview - Representation Learning - GNN + Multi-Interest
• Goal: exploit the benefits of both GNN and multi-interest learning.
1. Graph convolution and aggregation: refine user preferences based on multi-level correlations between historical items.
2. Multi-interest learning: extract different interests by clustering historical items.
Reference: When Multi-Level Meets Multi-Interest: A Multi-Grained Neural Model for Sequential Recommendation, SIGIR 2022

Slide 14

Slide 14 text

Part 2: Overview - Representation Learning - GNN + Multi-Interest
Reference: When Multi-Level Meets Multi-Interest: A Multi-Grained Neural Model for Sequential Recommendation, SIGIR 2022

Slide 15

Slide 15 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Non-displayed items cause exposure bias: the model is trained only with displayed items, yet it must retrieve items from the entire space.
• [Figure: exposure bias; displayed items (with labels) vs. non-displayed items]
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020

Slide 16

Slide 16 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Why is long-tail performance poor?
  • Cause: domain shift
  • Cause: representation learning is not robust and consistent
• How to solve it: unsupervised domain adaptation (to reduce domain shift)
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020

Slide 17

Slide 17 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Ranking model backbone: unsupervised domain adaptation to reduce domain shift
• Formula definition [Figure: ESAM applied to displayed items and non-displayed items]
• Motivation:
  • Architectural solutions may not generalize well.
  • Highlight the importance of learning good feature representations for non-displayed items.
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020

Slide 18

Slide 18 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Domain shift: displayed items (with labels) form the source domain; non-displayed items form the target domain.
  • Attribute correlation alignment: L_DA
  • High-level attribute distribution definition
  • Distribution verification
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020
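The idea behind an attribute correlation alignment loss such as L_DA can be sketched as follows, assuming a CORAL-style formulation: compute the feature correlation (second-moment) matrix for a batch of source items and for a batch of target items, then penalize the squared difference between the two. The exact definition and normalization in the ESAM paper may differ. The function names are hypothetical.

```python
def correlation_matrix(features):
    """Second-moment matrix (1/n) * F^T F of an n x L batch of feature vectors."""
    n, dim = len(features), len(features[0])
    return [
        [sum(row[j] * row[k] for row in features) / n for k in range(dim)]
        for j in range(dim)
    ]


def alignment_loss(source, target):
    """Mean squared difference between source and target correlation matrices."""
    cs, ct = correlation_matrix(source), correlation_matrix(target)
    dim = len(cs)
    total = sum((cs[j][k] - ct[j][k]) ** 2 for j in range(dim) for k in range(dim))
    return total / dim ** 2


# Hypothetical batches: displayed (source) and non-displayed (target) item features.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 1.0], [1.0, 1.0]]
print(alignment_loss(source, target))
```

Minimizing this quantity pushes the two domains toward the same feature-to-feature correlations without needing any labels on the target side.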

Slide 19

Slide 19 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Center-wise clustering for the source domain: L_DC^c
  • Attribute correlation alignment
  • L_DC^c pulls similar items together and pushes dissimilar items apart.
  • Items with the same feedback (click) are treated as similar.
• [Figure: items marked by click feedback: ✔ × ✔ × × × ✔]
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020

Slide 20

Slide 20 text

Part 2: Overview - Representation Learning - Long-tail Problem
• Self-training for target clustering: L_DC^p
  • Purpose: suppress negative transfer, following an easy-to-hard strategy
  • Why negative transfer occurs: target label information is ignored during alignment
  • Method: entropy regularization
Reference: ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance, SIGIR 2020
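Entropy regularization in its plainest form can be sketched as below: the mean binary entropy of the predicted click probabilities on unlabeled (non-displayed) items is used as a penalty, so minimizing it pushes predictions toward confident values. This is only the generic technique; the thresholded self-training loss L_DC^p in the paper is not reproduced here. The function names are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy of a Bernoulli(p) prediction, clamped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_regularizer(probs):
    """Mean binary entropy of predicted click probabilities on unlabeled items."""
    return sum(binary_entropy(p) for p in probs) / len(probs)


# Confident predictions incur a smaller penalty than uncertain ones.
print(entropy_regularizer([0.99, 0.01]) < entropy_regularizer([0.6, 0.4]))
```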

Slide 21

Slide 21 text

Part 2: Overview - Representation Learning - Auxiliary Knowledge
• Linking items with their attributes
• Knowledge-aware attention for information propagation
Reference: KGAT: Knowledge Graph Attention Network for Recommendation, KDD 2019

Slide 22

Slide 22 text

Part 2: Overview - Representation Learning - Auxiliary Knowledge - MMoE
• Sharing App in Taobao: UV ~ 19M, GMV ~ 400M
• Target: more sharing between users on the platform; more interactions and more friends
• Features: many different sharing scenarios; correlations and discrepancies
Questions:
1. How to exploit scenario-dependent knowledge?
2. How to handle the low-resource nature of long-tail scenarios?
3. How to accommodate social relations between users?
Reference: Heterogeneous Graph Augmented Multi-Scenario Sharing Recommendation with Tree-Guided Expert Networks, WSDM 2021

Slide 23

Slide 23 text

Part 2: Overview - Representation Learning - Auxiliary Knowledge - MMoE
• TreeMMoE
  • A tree structure can model the hierarchical relations between different scenarios, e.g. "C2C→Sharing→Entity→Product→Makeups".
  • Some fine-grained scenarios share common ancestor scenarios.
  • Knowledge can be transferred.
Reference: Heterogeneous Graph Augmented Multi-Scenario Sharing Recommendation with Tree-Guided Expert Networks, WSDM 2021

Slide 24

Slide 24 text

Part 2: Overview - Representation Learning - Auxiliary Knowledge - MMoE
• TreeMMoE: each layer of the tree has a gate for the corresponding expert network.
Reference: Heterogeneous Graph Augmented Multi-Scenario Sharing Recommendation with Tree-Guided Expert Networks, WSDM 2021
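The gating idea can be sketched with a generic mixture-of-experts combination: a gate produces one logit per expert, the logits are normalized with a softmax, and the expert outputs are mixed with those weights. This sketch covers a single gate only; how TreeMMoE arranges one gate per tree layer is not reproduced. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mixture(expert_outputs, gate_logits):
    """Mix expert output vectors using softmax-normalized gate weights."""
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


# Hypothetical outputs of two experts and a gate that trusts them equally.
print(gated_mixture([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a multi-scenario setting the gate logits would be computed from the scenario (and possibly the input), so that related scenarios end up sharing experts.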

Slide 25

Slide 25 text

Part 2: Overview - Representation Learning
• Summary of representation learning:
  • Two-Tower DNN
  • Multi-Interest Learning
  • Long-tail Problem
  • Auxiliary Knowledge

Slide 26

Slide 26 text

Part 2: Overview - Interaction-based Learning - Attention Mechanism
• Attention mechanism (target attention) for highlighting relevant features
Reference: Deep Interest Network for Click-Through Rate Prediction, KDD 2018
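Target attention can be sketched as follows: each past behavior of the user is weighted by its relevance to the candidate (target) item, and the weighted behaviors are pooled into a target-specific user representation. For brevity this sketch assumes dot-product relevance with softmax normalization; the activation unit described in the Deep Interest Network paper is a learned network and differs from this simplification. The function name is hypothetical.

```python
import math


def target_attention(behaviors, target):
    """Pool behavior vectors with weights conditioned on the target item.

    behaviors: list of behavior embedding vectors from the user's history.
    target:    embedding vector of the candidate item.
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(b * t for b, t in zip(vec, target)) for vec in behaviors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, behaviors)) for d in range(dim)]
    return pooled, weights


# Hypothetical history of two behaviors; the target resembles the first one.
pooled, weights = target_attention([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(weights)
```

Because the pooled vector depends on the target item, it cannot be precomputed per user, which is the efficiency problem the later slides on interaction-based learning address.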

Slide 27

Slide 27 text

Part 2: Overview - Interaction-based Learning - User Side
• Identify important user features / behaviors for better representation
Reference: User-Aware Multi-Interest Learning for Candidate Matching in Recommenders, SIGIR 2022

Slide 28

Slide 28 text

Part 2: Overview - Interaction-based Learning - Item Side
• Identify important item features / behaviors for precise item-item relevance
Reference: Path-based Deep Network for Candidate Item Matching in Recommenders, SIGIR 2021

Slide 29

Slide 29 text

Part 2: Overview - Interaction-based Learning - Optimization
• Enable target attention efficiently
Reference: Fast Semantic Matching via Flexible Contextualized Interaction, WSDM 2022

Slide 30

Slide 30 text

Part 2: Overview - Interaction-based Learning
• Summary of interaction-based learning:
  • User Side
  • Item Side
  • Reduce Computation Cost

Slide 31

Slide 31 text

Part 3: Future Trends - One Model Serves ALL
Automatic network sharing and optimization
[Figure: search scenarios in the Alipay app]
• Scenarios: Category Search, Insurance Search, Hot Trends, Item Search, Live Commerce, Section/Concept Search, Fund Search (Price), QA
• 5 million Content Matrices: live streaming / video
• Complicated entity relations: X00+ insurances, 8k+ funds, 80K+ stocks (US / Hong Kong / A-shares), 4k+ agents, 100+ financial organizations, XXX sections
• Diverse queries & intents: liquor / military industry / materials; 富/广发/鹏华; life insurance / auto insurance, property insurance / health insurance; newly issued / popular / gold-selected
Reference: Automatic Expert Selection for Multi-Scenario and Multi-Task Search, SIGIR 2022

Slide 32

Slide 32 text

Part 3: Future Trends - One Model Serves ALL
One model serves ALL: automatic network sharing and optimization
[Figure: the same Alipay app search scenarios, entity relations, and query intents as on the previous slide]
Reference: Automatic Expert Selection for Multi-Scenario and Multi-Task Search, SIGIR 2022

Slide 33

Slide 33 text

Part 3: Future Trends - One Model Serves ALL
One model serves ALL: automatic network sharing and optimization
• Personalization for each instance
• Modeling complex scenarios & tasks
• Flexible & scalable
• End-to-end, low cost
Reference: Automatic Expert Selection for Multi-Scenario and Multi-Task Search, SIGIR 2022

Slide 34

Slide 34 text

Part 3: Future Trends - Go Beyond Two-Tower
REAL interaction-based learning: coupling and decoupling
• Expensive but effective
• [Figure: interaction between the query "starbucks drinks" and items, producing a score S]
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 35

Slide 35 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• Expensive but effective
• [Figure: interaction between the query "starbucks drinks" and items, producing a score S]
• Can we transfer the capacity of interaction-based learning to the inference phase?
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 36

Slide 36 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• Coupling for training
• Decoupling for inference: cheap yet effective
• [Figure: coupled and decoupled encoders for the query "starbucks drinks", each producing a score S]
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 37

Slide 37 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• Coupling for training
• Decoupling for inference: cheap yet effective
• How can we design a coupling mechanism that supports effective representation learning and is easy to decouple afterwards for the inference phase?
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 38

Slide 38 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• Attribute Fusion Layer
• Attribute-Aware Learning
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 39

Slide 39 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• The most relevant attributes are more diverse for AGREE than for a vanilla two-tower solution.
• [Figure panel: Without Attribute-Aware Learning]
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 40

Slide 40 text

Part 3: Future Trends
REAL interaction-based learning: coupling and decoupling
• The most relevant attributes are more diverse for AGREE than for a vanilla two-tower solution.
• [Figure panel: Without Attribute-Aware Learning]
• More sophisticated coupling and decoupling mechanisms deserve investigation.
Reference: Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval, WWW 2023

Slide 41

Slide 41 text

Part 4: We need YOU
The Most Beautiful Campus in China

Slide 42

Slide 42 text

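Part 4: We need YOU
The Most Beautiful Campus in China
• National Science Fund for Excellent Young Scholars (Overseas) positions
• Tenure-track Professor
• Tenure-track Associate Professor
• Distinguished Research Fellow
• Distinguished Associate Research Fellow
Faculty positions at every level are available!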

Slide 43

Slide 43 text

The End
• A SIMPLE overview of current progress on candidate matching
• LOW LATENCY is a MUST (we need ARTs for both effectiveness and efficiency)
  • We human beings are GREEDY
• Some insights into future trends
• Hope some of you can join us!
Let us Q&A!