Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Scaling The E-Commerce Recommendation System

Slide 3

Slide 3 text

CONTENT
01 Challenges in LINE SHOPPING
02 Multi-Stage Recommender
03 Retrieval
04 Ranking
05 Re-rank
06 Model Training

Slide 4

Slide 4 text

Arthur Huang
LINE Taiwan Machine Learning Engineer
Work Experience
• LINE Taiwan MLE (2021~Now)
• SHOPLINE DE (2019~2021)

Slide 5

Slide 5 text

SECTION 01 Challenges in LINE SHOPPING

Slide 6

Slide 6 text

Challenges in LINE SHOPPING
• Complex Scenario
  • More than 20 types of recommendations.
• Huge Item Corpus
  • Millions of products.

Slide 7

Slide 7 text

SECTION 02 Multi-Stage Recommender

Slide 8

Slide 8 text

Multi-Stage Recommender
Item Corpus (millions) → Retrieval (hundreds) → Ranking (dozens) → Re-rank (dozens) → Recommended Items
• Retrieval: Quickly retrieve the items users are interested in.
• Ranking: Rank based on user behavior in the module.
• Re-rank: Rank by Diversity, Freshness, Business Logic.
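The funnel on this slide can be sketched as three functions chained together. This is a minimal illustration only: the field names (`retrieval_score`, `click_prob`, `promoted`), the cut-off sizes, and the toy corpus are all hypothetical and stand in for the real models and business rules.

```python
import heapq
import random
from typing import Dict, List

Item = Dict[str, object]


def retrieve(corpus: List[Item], k: int = 200) -> List[Item]:
    """Stage 1: cheap score over the whole corpus, keep hundreds."""
    return heapq.nlargest(k, corpus, key=lambda it: it["retrieval_score"])


def rank(candidates: List[Item], k: int = 24) -> List[Item]:
    """Stage 2: costlier click-probability score, keep dozens."""
    return heapq.nlargest(k, candidates, key=lambda it: it["click_prob"])


def rerank(ranked: List[Item]) -> List[Item]:
    """Stage 3: a toy business rule. Promoted items move to the front;
    sorted() is stable, so the ranking order is kept within each group."""
    return sorted(ranked, key=lambda it: not it["promoted"])


def recommend(corpus: List[Item]) -> List[Item]:
    return rerank(rank(retrieve(corpus)))


def make_toy_corpus(n: int = 10_000, seed: int = 0) -> List[Item]:
    """Random stand-in data; a real corpus would hold millions of products."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "retrieval_score": rng.random(),
            "click_prob": rng.random(),
            "promoted": rng.random() < 0.05,
        }
        for i in range(n)
    ]
```

Each stage sees far fewer items than the one before it, which is what allows the later stages to use more expensive scoring.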

Slide 9

Slide 9 text

SECTION 03 Retrieval Quickly retrieve the items users are interested in.

Slide 10

Slide 10 text

Retrieval - Training
Two-Tower Model
• Learning User-Item Embeddings
• Target
  • Positive: Clicked Items
  • Negative: In-batch negative sampling

Slide 11

Slide 11 text

In-Batch Negative Sampling
[Figure: User × Item grid of clicks in one batch, highlighting the Positive Item for a user.]

Slide 12

Slide 12 text

In-Batch Negative Sampling
[Figure: User × Item grid of clicks in one batch, highlighting the Negative Items for a user.]
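A minimal sketch of how in-batch negatives can be turned into a training loss. The slides do not name the loss function, so the softmax cross-entropy below is an assumption (a common choice for two-tower models), and the plain-Python vectors stand in for the outputs of the user and item towers. Row i of the batch pairs user i with the item that user clicked; every other item in the batch serves as a negative for that user.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embs: List[Vector], item_embs: List[Vector]) -> float:
    """Mean cross-entropy over the batch.

    user_embs[i] and item_embs[i] form a clicked (positive) pair;
    item_embs[j] for j != i are treated as negatives for user i.
    """
    if len(user_embs) != len(item_embs) or not user_embs:
        raise ValueError("need a non-empty batch of aligned user/item pairs")
    total = 0.0
    for i, user in enumerate(user_embs):
        logits = [dot(user, item) for item in item_embs]
        # Numerically stable log-sum-exp over all items in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(user_embs)
```

The loss falls when each user embedding scores its own clicked item higher than the other items in the batch, so no separate negative-sampling job is needed.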

Slide 13

Slide 13 text

Feature Engineering
Example: Spotify Million Playlist Dataset

Slide 14

Slide 14 text

Feature Engineering
Example: Spotify Million Playlist Dataset
• Numeric Feature
  • Normalization
  • Power Transform
  • Wilson Score Interval (e.g. CTR)
• Categorical Feature
  • One-Hot Encoding
  • Label Encoding + Embedding Layer
    • e.g. User ID, Item ID
  • Feature Hashing
  • Ordinal Encoding
  • Frequency Encoding
• Text Feature
  • BERT Encoding
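Two of the listed techniques are easy to show in a few lines. The sketch below computes a Wilson score interval for a click-through rate and maps a categorical value to a hash bucket. The function names, the default z = 1.96 (roughly a 95% interval), and the use of MD5 as a stable hash are illustrative assumptions, not details taken from the talk.

```python
import hashlib
import math
from typing import Tuple


def wilson_interval(clicks: int, impressions: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a CTR; returns (lower, upper).

    With no impressions nothing is known, so the full range is returned.
    """
    if impressions == 0:
        return (0.0, 1.0)
    p = clicks / impressions
    z2 = z * z
    denom = 1.0 + z2 / impressions
    center = (p + z2 / (2.0 * impressions)) / denom
    half = z * math.sqrt(p * (1.0 - p) / impressions + z2 / (4.0 * impressions ** 2)) / denom
    return (center - half, center + half)


def hash_bucket(value: str, num_buckets: int) -> int:
    """Feature hashing: map any string to one of num_buckets indices.

    hashlib is used instead of the built-in hash() because the latter is
    randomized per process for strings.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Using the lower bound as the feature keeps an item with 1 click out of 1 impression from looking better than an item with 100 clicks out of 100 impressions, which a raw CTR cannot distinguish.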

Slide 15

Slide 15 text

Feature Engineering
• Embedding Layer
  • Parameters Size = num_embeddings × embedding_dim
• Shared Embedding
  • Reduce Parameters Size
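A small worked example of the parameter-size formula and of why sharing helps. The corpus size, embedding dimension, and the two feature names are hypothetical; the point is only that two features indexing the same ID space can reuse a single table.

```python
def embedding_params(num_embeddings: int, embedding_dim: int) -> int:
    """Parameters Size = num_embeddings x embedding_dim."""
    return num_embeddings * embedding_dim


NUM_ITEMS = 1_000_000   # hypothetical item vocabulary size
EMBEDDING_DIM = 64      # hypothetical embedding width

# Hypothetical features that both take an item ID as input.
ITEM_ID_FEATURES = ["candidate_item_id", "last_clicked_item_id"]

# One table per feature: the cost is paid once for every feature.
separate = sum(embedding_params(NUM_ITEMS, EMBEDDING_DIM) for _ in ITEM_ID_FEATURES)

# One shared table looked up by both features: the cost is paid once.
shared = embedding_params(NUM_ITEMS, EMBEDDING_DIM)

print(f"separate tables: {separate:,} parameters")
print(f"shared table:    {shared:,} parameters")
```

With these numbers the separate tables hold 128,000,000 parameters and the shared table holds 64,000,000.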

Slide 16

Slide 16 text

Retrieval - Inference
Quickly retrieve the items users are interested in.

Slide 17

Slide 17 text

Retrieval - Inference
Online Serving
[Figure: serving architecture split into Online and Offline parts.]
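The diagram itself is not recoverable from the text, so the following is only one plausible reading of the Online/Offline split, stated as an assumption: item embeddings are computed offline and stored in an index, while the user embedding is computed at request time and scored against that index. The brute-force search below is for illustration; at a scale of millions of items an approximate nearest-neighbor index would normally replace it.

```python
import heapq
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_items(user_emb: Vector, item_index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Online step: score one user embedding against the precomputed
    item embeddings and return the k best (item_id, score) pairs."""
    scored = ((item_id, dot(user_emb, emb)) for item_id, emb in item_index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Offline step (hypothetical data): item-tower outputs keyed by item ID.
ITEM_INDEX: Dict[str, Vector] = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}
```

For example, `top_k_items([1.0, 0.2], ITEM_INDEX, k=2)` scores the three items at 1.0, 0.2, and 0.84 and returns items "a" and "c".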

Slide 18

Slide 18 text

Retrieval - Inference
Item2Item

Slide 19

Slide 19 text

SECTION 04 Ranking Ranking based on user behavior in the module.

Slide 20

Slide 20 text

Ranking - Training
Ranking based on user behavior in the module.
Deep Ranking Network
• Learning the probability of a click event.
• Target (Focus on Module Interaction)
  • Positive: Click
  • Negative: Impression but no click.
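A minimal sketch of how the ranking target above could be built and scored. The impression-log fields (`user_id`, `item_id`, `clicked`) are hypothetical, and binary cross-entropy is an assumed loss: the slides only say the network learns the probability of a click, for which log loss is the usual fit.

```python
import math
from typing import Dict, List, Tuple


def label_impressions(impressions: List[Dict[str, object]]) -> List[Tuple[str, str, int]]:
    """Turn a module's impression log into (user, item, label) rows.

    label = 1 for a click, 0 for an impression that was not clicked.
    """
    return [
        (str(row["user_id"]), str(row["item_id"]), 1 if row["clicked"] else 0)
        for row in impressions
    ]


def binary_cross_entropy(labels: List[int], probs: List[float], eps: float = 1e-7) -> float:
    """Mean log loss between click labels and predicted click probabilities."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

Only items that were actually shown in the module produce rows, which keeps the training data focused on module interaction.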

Slide 21

Slide 21 text

Ranking - Target
[Figure: items shown in a module; one is labeled Positive and three are labeled Negative.]

Slide 22

Slide 22 text

Why can't we use items that were shown but not clicked as negative samples during retrieval?
Item Corpus (millions) → Retrieval (hundreds) → Ranking (dozens) → Re-rank (dozens) → Recommended Items
• Retrieval
  • Interested: Click
  • Not Interested: Almost the entire Item Corpus
• Ranking
  • Very Interested: Click
  • Interested: Impression but no click

Slide 23

Slide 23 text

Ranking - Inference Batch Inference using PySpark

Slide 24

Slide 24 text

Ranking - Inference Distributed Model Inference

Slide 25

Slide 25 text

SECTION 05 Re-rank Ranking by Diversity, Freshness, Business Logic

Slide 26

Slide 26 text

Re-rank
• Diversity
  • Do not show items from the same category in a row.
• Freshness
  • Promote fresher items.
• Business Logic
  • Promotion / Holiday Campaign
  • Product Profit
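The diversity and freshness rules can be sketched as two small passes over the ranked list. Both are illustrative assumptions rather than the production logic: the greedy strategy, the linear decay, and the field names (`score`, `age_days`, `category`) are hypothetical.

```python
from typing import Dict, List

Item = Dict[str, object]


def freshness_boost(items: List[Item], max_boost: float = 0.1, horizon_days: float = 30.0) -> List[Item]:
    """Freshness: add a bonus that decays linearly to zero at
    horizon_days, then sort by the boosted score."""
    def boosted(it: Item) -> float:
        bonus = max_boost * max(0.0, 1.0 - float(it["age_days"]) / horizon_days)
        return float(it["score"]) + bonus
    return sorted(items, key=boosted, reverse=True)


def diversify(ranked: List[Item]) -> List[Item]:
    """Diversity: at each position take the best remaining item whose
    category differs from the previous one; if every remaining item
    shares that category, fall back to the best remaining item."""
    remaining = list(ranked)
    result: List[Item] = []
    while remaining:
        prev = result[-1]["category"] if result else None
        pick = next(
            (i for i, it in enumerate(remaining) if it["category"] != prev),
            0,
        )
        result.append(remaining.pop(pick))
    return result
```

Business rules such as campaign promotion or profit weighting could be applied in the same way, as another pass over the list.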

Slide 27

Slide 27 text

SECTION 06 Model Training

Slide 28

Slide 28 text

Petastorm Efficient reading of large datasets

Slide 29

Slide 29 text

Petastorm Shuffling
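The slide does not show how shuffling is configured. As a generic illustration of why shuffling a dataset too large for memory needs care (this is not Petastorm's code or API), a bounded shuffle buffer approximates a full shuffle while holding only `capacity` rows at a time. The function name and parameters below are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def shuffle_buffer(stream: Iterable[T], capacity: int, seed: Optional[int] = None) -> Iterator[T]:
    """Yield rows from stream in approximately random order.

    Rows are collected until the buffer holds `capacity` of them; from
    then on each new row causes one randomly chosen buffered row to be
    emitted. Larger capacity gives better mixing at a higher memory cost.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for row in stream:
        buffer.append(row)
        if len(buffer) >= capacity:
            idx = rng.randrange(len(buffer))
            # Swap the chosen row to the end so pop() is O(1).
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left once the stream ends.
    rng.shuffle(buffer)
    yield from buffer
```

With `capacity=1` the input order is preserved. A capacity equal to the dataset size gives a full shuffle.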

Slide 30

Slide 30 text

MLflow Experiment Tracking + Model Registry

Slide 31

Slide 31 text

PyTorch Lightning

Slide 32

Slide 32 text

Airflow

Slide 33

Slide 33 text

No content