Scaling The E-Commerce Recommendation System

Slide 2

Slide 2 text

Scaling The E-Commerce Recommendation System

Slide 3

Slide 3 text

CONTENT
01 Challenges in LINE SHOPPING
02 Multi-Stage Recommender
03 Retrieval
04 Ranking
05 Re-rank
06 Model Training

Slide 4

Slide 4 text

Arthur Huang
Machine Learning Engineer, LINE Taiwan
Work Experience
• LINE Taiwan, MLE (2021 to present)
• SHOPLINE, DE (2019 to 2021)

Slide 5

Slide 5 text

SECTION 01 Challenges in LINE SHOPPING

Slide 6

Slide 6 text

Challenges in LINE SHOPPING
Complex Scenarios
• More than 20 types of recommendations.
Huge Item Corpus
• Millions of products.

Slide 7

Slide 7 text

SECTION 02 Multi-Stage Recommender

Slide 8

Slide 8 text

Multi-Stage Recommender
Item Corpus (millions) → Retrieval (hundreds) → Ranking (dozens) → Re-rank (dozens) → Recommended Items
• Retrieval: quickly retrieve items the user is interested in.
• Ranking: rank candidates based on user behavior within the module.
• Re-rank: re-rank by diversity, freshness, and business logic.
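The funnel can be pictured as three functions chained together, each shrinking the candidate list. This is a minimal sketch under assumed interfaces: the `Stage` signature, the `sizes` keys, and the toy lambdas are hypothetical and not the production design.

```python
from typing import Callable, Dict, List

# A stage takes a user ID and an ordered list of candidate item IDs and
# returns a new ordered list (hypothetical interface, for illustration only).
Stage = Callable[[str, List[str]], List[str]]


def recommend(
    user_id: str,
    corpus: List[str],
    retrieve: Stage,
    rank: Stage,
    rerank: Stage,
    sizes: Dict[str, int],
) -> List[str]:
    """Run retrieval -> ranking -> re-rank, truncating the list after each stage."""
    candidates = retrieve(user_id, corpus)[: sizes["retrieval"]]  # millions -> hundreds
    ranked = rank(user_id, candidates)[: sizes["ranking"]]  # hundreds -> dozens
    return rerank(user_id, ranked)[: sizes["rerank"]]  # dozens -> dozens


# Toy usage with stand-in stages.
corpus = [f"item{i}" for i in range(1000)]
result = recommend(
    "u1",
    corpus,
    retrieve=lambda user, items: items[::2],
    rank=lambda user, items: sorted(items, key=lambda s: int(s[4:]), reverse=True),
    rerank=lambda user, items: items,
    sizes={"retrieval": 200, "ranking": 30, "rerank": 20},
)
```

Keeping the stages as separate functions mirrors why the system is split this way: the cheap retrieval stage touches the whole corpus, while the more expensive ranking model only sees a few hundred candidates.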

Slide 9

Slide 9 text

SECTION 03 Retrieval
Quickly retrieve items users are interested in.

Slide 10

Slide 10 text

Retrieval - Training
Two-Tower Model
• Learns user and item embeddings
• Target
  • Positive: clicked items
  • Negative: in-batch negative sampling
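The defining property of a two-tower model is that users and items are encoded independently and only meet in a dot product. A minimal stdlib sketch, assuming toy single-layer random towers and hypothetical feature sizes (a real tower would be a trained deep network):

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class Tower:
    """Toy tower: one random linear layer mapping features to an embedding.

    A production tower would be a trained multi-layer network; this is a stand-in.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, features: Vector) -> Vector:
        return [dot(row, features) for row in self.weights]


# Hypothetical sizes: 4 user features, 6 item features, 8-dim shared embedding space.
user_tower = Tower(in_dim=4, out_dim=8, seed=0)
item_tower = Tower(in_dim=6, out_dim=8, seed=1)


def score(user_features: Vector, item_features: Vector) -> float:
    """Relevance = dot product of the two embeddings; the towers never see each other's inputs."""
    return dot(user_tower(user_features), item_tower(item_features))
```

Because `item_tower` never looks at user features, every item embedding can be computed ahead of time, which is what makes fast retrieval over millions of items possible.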

Slide 11

Slide 11 text

In-Batch Negative Sampling
[Figure: a batch of users, each paired with the item they clicked; each user's clicked item is its positive item]

Slide 12

Slide 12 text

In-Batch Negative Sampling
[Figure: the same batch; for each user, the items clicked by the other users in the batch serve as negative items]
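With in-batch negatives, each row of a batch becomes a softmax classification: pick your own clicked item out of all items clicked in the batch. A minimal stdlib sketch of that loss, assuming embeddings have already been produced by the two towers (the function name is hypothetical):

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embs: List[Vector], item_embs: List[Vector]) -> float:
    """Mean softmax cross-entropy over a batch of (user, clicked item) pairs.

    Row i's positive is item_embs[i]; every other item in the batch
    (clicked by some other user) acts as a negative for user i.
    """
    if len(user_embs) != len(item_embs):
        raise ValueError("each user needs exactly one positive item")
    total = 0.0
    for i, user in enumerate(user_embs):
        logits = [dot(user, item) for item in item_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(user_embs)
```

No extra negatives have to be sampled or encoded, which is why the technique is cheap. Because popular items appear in more batches, they are also used as negatives more often; some systems correct for this bias, which the sketch omits.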

Slide 13

Slide 13 text

Feature Engineering Example: Spotify Million Playlist Dataset

Slide 14

Slide 14 text

Feature Engineering Example: Spotify Million Playlist Dataset
• Numeric features
  • Normalization
  • Power transform
  • Wilson score interval (e.g., CTR)
• Categorical features
  • One-hot encoding
  • Label encoding + embedding layer (e.g., user ID, item ID)
  • Feature hashing
  • Ordinal encoding
  • Frequency encoding
• Text features
  • BERT encoding
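Several of the listed encodings are small enough to sketch with the standard library. The Wilson lower bound follows the standard Wilson score interval formula, with `z = 1.96` (about 95% confidence) as an assumed default; `log1p` is used as one simple shifted-log transform for heavy-tailed counts. Feature hashing uses MD5 from `hashlib` because Python's built-in `hash()` is salted per process for strings, which would make buckets change between runs. Function names and bucket counts are hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import List


def wilson_lower_bound(clicks: int, impressions: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a click-through rate."""
    if impressions == 0:
        return 0.0
    n = impressions
    p = clicks / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def log_transform(x: float) -> float:
    """Compress heavy-tailed non-negative counts (e.g., play counts)."""
    return math.log1p(x)


def hash_bucket(value: str, num_buckets: int) -> int:
    """Feature hashing: map an unbounded categorical value to a fixed bucket range."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def frequency_encode(values: List[str]) -> List[float]:
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]
```

The Wilson bound matters for sparse items: an item with 1 click in 1 impression has a raw CTR of 100%, but its lower bound (about 0.21) ranks below an item with 90 clicks in 100 impressions (about 0.83).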

Slide 15

Slide 15 text

Feature Engineering
• Embedding layer
  • Parameter size = num_embeddings × embedding_dim
• Shared embedding
  • Reduces parameter size
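The parameter-size formula is the shape of the embedding weight matrix (in PyTorch, `torch.nn.Embedding(num_embeddings, embedding_dim)` has a weight of exactly that shape). A minimal stdlib sketch of the arithmetic and of sharing one table between two features; the sizes and the choice of "candidate item" and "clicked-item history" as the two sharing features are illustrative assumptions:

```python
import random
from typing import List


def embedding_params(num_embeddings: int, embedding_dim: int) -> int:
    """Parameter count of one embedding table: num_embeddings x embedding_dim."""
    return num_embeddings * embedding_dim


class EmbeddingTable:
    """Minimal lookup table, one vector per ID (stand-in for a framework embedding layer)."""

    def __init__(self, num_embeddings: int, embedding_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights = [
            [rng.gauss(0.0, 0.01) for _ in range(embedding_dim)] for _ in range(num_embeddings)
        ]

    def __call__(self, index: int) -> List[float]:
        return self.weights[index]


# Hypothetical sizes: 1M item IDs, 64-dim embeddings.
separate = 2 * embedding_params(1_000_000, 64)  # one table for candidate item, one for history
shared = embedding_params(1_000_000, 64)  # one table reused by both features

# Sharing: the candidate item and the user's clicked-item history look up the
# same table, so item 42 gets one vector wherever it appears.
item_table = EmbeddingTable(num_embeddings=1000, embedding_dim=16)
candidate_vec = item_table(42)
history_vecs = [item_table(i) for i in (3, 42, 7)]
```

With these assumed sizes, sharing halves the item-embedding parameters from 128,000,000 to 64,000,000, and the history feature also benefits from every gradient update made through the candidate feature.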

Slide 16

Slide 16 text

Retrieval - Inference
Quickly retrieve items the user is interested in.

Slide 17

Slide 17 text

Retrieval - Inference
Online Serving
[Figure: online and offline components of retrieval serving]
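A common online/offline split for two-tower retrieval, assumed here rather than taken from the slide, is: item embeddings are computed offline and loaded into an index, and at request time only the user embedding is computed and matched against it. The sketch uses exact brute-force search via `heapq.nlargest`; with millions of items a production system would typically use an approximate nearest neighbor (ANN) index instead. Function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_item_index(item_embeddings: Dict[str, Vector]) -> List[Tuple[str, Vector]]:
    """Offline: materialize item-tower outputs once (a real system would build an ANN index)."""
    return list(item_embeddings.items())


def retrieve_top_k(user_embedding: Vector, index: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Online: score every indexed item by dot product and keep the k best (exact search)."""
    best = heapq.nlargest(k, index, key=lambda pair: dot(user_embedding, pair[1]))
    return [item_id for item_id, _ in best]
```

Usage: `retrieve_top_k([1.0, 0.0], build_item_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}), 2)` returns `["a", "c"]`.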

Slide 18

Slide 18 text

Retrieval - Inference
Item2Item
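The slide gives only the title. One common way to serve item-to-item recommendations from a two-tower model is to reuse the item embeddings and return the nearest neighbors of a seed item, such as the product the user is currently viewing. That choice, the cosine metric, and the function names are assumptions in this sketch.

```python
import heapq
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def similar_items(seed_id: str, item_embeddings: Dict[str, Vector], k: int) -> List[str]:
    """Return the k items closest to the seed item, excluding the seed itself."""
    seed = item_embeddings[seed_id]
    candidates = (
        (item_id, cosine(seed, vec))
        for item_id, vec in item_embeddings.items()
        if item_id != seed_id
    )
    return [item_id for item_id, _ in heapq.nlargest(k, candidates, key=lambda p: p[1])]
```

Since the neighbors of each item do not depend on the user, item-to-item lists can also be precomputed offline and served from a key-value lookup.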

Slide 19

Slide 19 text

SECTION 04 Ranking
Rank based on user behavior within the module.

Slide 20

Slide 20 text

Ranking - Training
Deep Ranking Network
• Learns the probability of a click event.
• Target (focus on interactions within the module)
  • Positive: click
  • Negative: impression but no click
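Because the ranking target is a binary click label, a standard loss choice (assumed; the slide does not name one) is binary cross-entropy on the network's output logit. A minimal stdlib sketch in the usual numerically stable "with logits" form; function names are hypothetical:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Predicted click probability from a logit (stable for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy; label 1 = click, 0 = impression without click."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable rewrite of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)
```

The stable form avoids computing `log(sigmoid(z))` directly, which underflows to `log(0)` for large negative logits.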

Slide 21

Slide 21 text

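Ranking - Target
[Figure: among the items shown in a module, the clicked item is labeled positive and the shown-but-unclicked items are labeled negative]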
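Building these labels from logs amounts to joining impressions with clicks in the same module. The sketch assumes a hypothetical log schema of `(user_id, module_id, item_id)` tuples; items never shown in the module get no row at all, which keeps the negatives restricted to "shown but not clicked".

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str, str]  # (user_id, module_id, item_id); hypothetical schema
Labeled = Tuple[str, str, str, int]


def build_ranking_labels(impressions: Iterable[Event], clicks: Iterable[Event]) -> List[Labeled]:
    """Label each impression 1 if the same user clicked it in the same module, else 0."""
    clicked: Set[Event] = set(clicks)
    return [(u, m, i, int((u, m, i) in clicked)) for (u, m, i) in impressions]
```

Clicks that have no matching impression (for example, from another module) are ignored, in line with the focus on interactions within the module.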

Slide 22

Slide 22 text

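Why can't we use items that received an impression but no click as negative samples during retrieval?
[Figure: the multi-stage funnel, Item Corpus (millions) → Retrieval (hundreds) → Ranking (dozens) → Re-rank (dozens) → Recommended Items]
• Retrieval (chooses from the whole corpus)
  • Interested: clicked items
  • Not interested: almost the entire item corpus
• Ranking (chooses among items that already passed retrieval)
  • Very interested: clicked items
  • Interested: impressed but not clicked
An impressed item has already passed retrieval, so it is closer to the user's interests than almost everything else in the corpus. Using it as a retrieval negative would push relevant items away; in ranking, where every candidate is already relevant, it is the right negative.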

Slide 23

Slide 23 text

Ranking - Inference
Batch Inference Using PySpark

Slide 24

Slide 24 text

Ranking - Inference
Distributed Model Inference
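Slides 23 and 24 name batch inference with PySpark and distributed model inference without showing code. In PySpark this pattern is commonly written with `RDD.mapPartitions` or `DataFrame.mapInPandas`, so the model is loaded once per partition rather than once per row. The sketch below imitates that pattern in plain Python so it runs without a cluster: `ThreadPoolExecutor` stands in for Spark executors, and `load_model`, the row schema, and the toy scoring rule are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (user_id, item_id) candidate pair; hypothetical schema
Scored = Tuple[str, str, float]
Model = Callable[[List[Row]], List[float]]


def load_model() -> Model:
    """Hypothetical loader; a real job would deserialize the trained ranking model here."""
    return lambda rows: [float(len(user) + len(item)) for user, item in rows]  # toy scores


def _emit(batch: List[Row], model: Model) -> Iterator[Scored]:
    for (user, item), s in zip(batch, model(batch)):
        yield user, item, s


def score_partition(rows: Iterable[Row], batch_size: int = 2) -> Iterator[Scored]:
    """Score one partition: load the model once, then predict in mini-batches."""
    model = load_model()
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from _emit(batch, model)
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield from _emit(batch, model)


def distributed_score(partitions: List[List[Row]], workers: int = 4) -> List[Scored]:
    """Fan partitions out to workers and gather results in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = list(pool.map(lambda part: list(score_partition(part)), partitions))
    return [row for part in per_partition for row in part]
```

Loading the model per partition and scoring in mini-batches is the key design choice: it amortizes model deserialization and lets the model use vectorized batch prediction.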

Slide 25

Slide 25 text

SECTION 05 Re-rank
Rank by diversity, freshness, and business logic.

Slide 26

Slide 26 text

Re-rank
Diversity
• Do not show items from the same category consecutively.
Freshness
• Promote fresher items.
Business Logic
• Promotions / holiday campaigns
• Product profit
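One simple way to combine these rules, assumed here rather than taken from the slides, is to adjust each ranking score with additive freshness, promotion, and profit terms, then fill slots greedily while skipping items whose category matches the previous slot. The `Candidate` fields and all weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    item_id: str
    category: str
    score: float  # output of the ranking model
    age_days: float  # days since listing (hypothetical field)
    promoted: bool = False  # e.g., part of a holiday campaign (hypothetical)
    margin: float = 0.0  # profit margin (hypothetical)


def adjusted_score(c: Candidate, freshness_weight: float = 0.1,
                   promo_boost: float = 0.2, profit_weight: float = 0.0) -> float:
    """Ranking score plus freshness, promotion, and profit adjustments."""
    freshness = freshness_weight / (1.0 + c.age_days)  # newer items get a larger bonus
    promo = promo_boost if c.promoted else 0.0
    return c.score + freshness + promo + profit_weight * c.margin


def rerank(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Greedily fill k slots, avoiding the same category in consecutive slots when possible."""
    pool = sorted(candidates, key=adjusted_score, reverse=True)
    result: List[Candidate] = []
    while pool and len(result) < k:
        last_category = result[-1].category if result else None
        # Best remaining item from a different category; fall back to the best overall.
        idx = next((j for j, c in enumerate(pool) if c.category != last_category), 0)
        result.append(pool.pop(idx))
    return result
```

With two high-scoring shoes, a bag, and a new promoted hat, the output alternates categories (shoes, hat, shoes, bag) even though the second pair of shoes has the higher raw score.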