Slide 1

Slide 1 text

Hierarchical Product Classification @Shopify
Kshetrajna Raghavan
Data Science @ Shopify

Slide 2

Slide 2 text

Overview
● What does Product Understanding mean @Shopify?
● Problem Formulation and Model Architecture
● Infrastructure and Scaling
● How merchants interact with ML and feedback loops
● What next?

Slide 3

Slide 3 text

What is Product Understanding?
Build a product metadata identification service that effectively extracts useful product information for downstream applications or analytical purposes.

Slide 4

Slide 4 text

Walk down memory lane …

Slide 5

Slide 5 text

Category: Dog Bed
Color: Gray
Shape: Rectangular
Fill Material: Memory Foam

Slide 6

Slide 6 text

Problem Formulation: Category Prediction
Taxonomy / Controlled Lists
Machine Learning to categorize products

Slide 7

Slide 7 text

Google Product Taxonomy

Slide 8

Slide 8 text

Google Product Taxonomy

Slide 9

Slide 9 text

Model Architecture
● Multi-Task, Multi-Class Learning:
○ Each level of the taxonomy is a separate learning task.
○ Each task is a multi-class classification problem.
● Transfer Learning: pretrained models for both text and image features.
○ Multilingual BERT for text
○ MobileNet V2 for images
● Subclassed model architecture, so that individual pieces of the model can be deployed standalone or in combination with other models.
● TensorFlow Transform to perform stateful transformations during preprocessing.
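A minimal sketch of the multi-task, multi-class idea: one softmax head per taxonomy level on top of shared features, with one cross-entropy loss per level. The unweighted sum of the per-level losses is an assumption (the slides do not say how the task losses are combined), and `head`, `multi_task_loss`, and the toy weights are hypothetical names and values for illustration only.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head(features: List[float], weights: List[List[float]]) -> List[float]:
    # One linear head per taxonomy level: one weight row per class at that level.
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def multi_task_loss(features: List[float],
                    heads: List[List[List[float]]],
                    labels: List[int]) -> float:
    # Assumption: total loss is the unweighted sum of per-level cross-entropies.
    total = 0.0
    for weights, label in zip(heads, labels):
        probs = softmax(head(features, weights))
        total += -math.log(probs[label])
    return total

# Shared features (in the talk these would come from the text and image encoders).
features = [1.0, 0.0]
heads = [
    [[2.0, 0.0], [0.0, 2.0]],              # level 1: 2 classes
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # level 2: 3 classes
]
labels = [0, 1]  # true class index at each level
loss = multi_task_loss(features, heads, labels)
```

Each level contributes its own classification loss, so the shared features are trained against every level of the taxonomy at once.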

Slide 10

Slide 10 text

Model Architecture: Training
● Parent nodes help child node predictions.
● 7 levels/tasks in total, spanning over 5,500 nodes
● Data parallelism using distributed TensorFlow across multiple machines/GPUs
● Uses Shopify’s ML platform, which is built on Google Cloud Platform.
● Taxonomy-unaware during training!

Slide 11

Slide 11 text

Inference Scaling Requirements
● Shopify has multiple billions of products historically.
● Tens of billions of images
● Tens of millions of products created/updated daily.
● Multiple downstream consumers. Ex: Search, Product Sync, Admin Page, Analytics.
● Real-time, streaming, and batch applications.

Slide 12

Slide 12 text

Model Architecture: Inference
Model: Image_Text2Pred; Input: Raw Text & Raw Image; Output: Prediction per level
Model: Image_Text2Prob; Input: Raw Text & Raw Image; Output: All Probabilities
Model: Text2Emb; Input: Raw Text; Output: Text Emb
Model: Image2Emb; Input: Raw Image; Output: Img Emb
Model: Emb2Pred; Input: Img Emb, Text Emb; Output: Prediction per level
Model: Emb2Prob; Input: Img Emb, Text Emb; Output: All Probabilities
Model: Prob2Pred; Input: All Prob; Output: Prediction per level
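The sub-models listed on this slide compose into one another. A rough sketch with toy stand-ins is shown here: the function names mirror the slide's model names, but every function body is a hypothetical placeholder (no BERT or MobileNet involved) used only to show how the pieces chain.

```python
from typing import List

def text2emb(text: str) -> List[float]:
    # Placeholder for the text encoder: toy 2-d "embedding".
    return [float(len(text)), float(text.count(" ") + 1)]

def image2emb(image: bytes) -> List[float]:
    # Placeholder for the image encoder: toy 2-d "embedding".
    return [float(len(image)), float(sum(image) % 7)]

def emb2prob(text_emb: List[float], img_emb: List[float]) -> List[List[float]]:
    # Placeholder classifier heads: one score list per taxonomy level.
    feats = text_emb + img_emb
    total = sum(feats) or 1.0
    level_1 = [f / total for f in feats]
    level_2 = list(reversed(level_1))
    return [level_1, level_2]

def prob2pred(probs: List[List[float]]) -> List[int]:
    # Index of the highest score at each level.
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

# The combined models are compositions of the standalone pieces.
def emb2pred(text_emb: List[float], img_emb: List[float]) -> List[int]:
    return prob2pred(emb2prob(text_emb, img_emb))

def image_text2prob(text: str, image: bytes) -> List[List[float]]:
    return emb2prob(text2emb(text), image2emb(image))

def image_text2pred(text: str, image: bytes) -> List[int]:
    return prob2pred(image_text2prob(text, image))

# End-to-end call versus reusing embeddings computed once.
text, image = "gray dog bed", b"\x01\x02\x03"
end_to_end = image_text2pred(text, image)
from_cached_embeddings = emb2pred(text2emb(text), image2emb(image))
```

One reason such a split can help is that a consumer holding stored embeddings or probabilities only needs to run the later stages.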

Slide 13

Slide 13 text

Taxonomy Structure During Inference
● Raw predictions contain confidence scores for every node in the taxonomy.
● Greedy prediction logic with thresholding to enforce the taxonomy structure.
Raw predictions: array of confidence scores per level
Choose the level 1 prediction with the highest score
Filter to only the immediate descendants of the predicted level 1 category
Choose the level 2 descendant with the highest score
Continue until hitting a leaf node or level 7
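The greedy walk described on this slide can be sketched as follows. The toy taxonomy, the scores, and the rule "stop at the first level whose best score falls below the threshold" are assumptions for illustration; the slides do not spell out how the threshold is applied.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: parent -> immediate children (None is the root).
CHILDREN: Dict[Optional[str], List[str]] = {
    None: ["Animals & Pet Supplies", "Furniture"],
    "Animals & Pet Supplies": ["Pet Supplies"],
    "Pet Supplies": ["Dog Supplies", "Cat Supplies"],
    "Dog Supplies": ["Dog Beds"],
    "Furniture": ["Beds & Accessories"],
}

# Raw predictions: one dict of node -> confidence score per level.
SCORES: List[Dict[str, float]] = [
    {"Animals & Pet Supplies": 0.9, "Furniture": 0.1},
    {"Pet Supplies": 0.8, "Beds & Accessories": 0.95},
    {"Dog Supplies": 0.7, "Cat Supplies": 0.2},
    {"Dog Beds": 0.3},
]

def greedy_decode(scores_per_level: List[Dict[str, float]],
                  children: Dict[Optional[str], List[str]],
                  threshold: float = 0.5,
                  max_depth: int = 7) -> List[str]:
    path: List[str] = []
    parent: Optional[str] = None
    for level_scores in scores_per_level[:max_depth]:
        # Only immediate descendants of the previous prediction are eligible.
        candidates = children.get(parent, [])
        if not candidates:
            break  # reached a leaf node
        best = max(candidates, key=lambda n: level_scores.get(n, 0.0))
        if level_scores.get(best, 0.0) < threshold:
            break  # assumption: stop when confidence drops below the threshold
        path.append(best)
        parent = best
    return path

path = greedy_decode(SCORES, CHILDREN)
```

Here "Beds & Accessories" has the highest raw level 2 score but is never chosen, because it is not a child of the level 1 prediction. With the default threshold the walk stops at "Dog Supplies", since the "Dog Beds" score of 0.3 is too low.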

Slide 14

Slide 14 text

Infrastructure/Tools

Slide 15

Slide 15 text

ML from Notebooks to Prod

Slide 16

Slide 16 text

Ray Train for Training
Ray Actor Pools for Batch Inference
Microservices for Online Inference

Slide 17

Slide 17 text

Let's continue that story from before …

Slide 18

Slide 18 text

Metrics
● Model Metrics:
○ Hierarchical Precision
○ Hierarchical Recall
○ Coverage
○ Relative Lowest Common Ancestor
● Product Metrics:
○ Adoption
○ Merchant acceptance
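For reference, one commonly used set-based definition of hierarchical precision and recall compares the ancestor sets of the predicted and true nodes. Whether the talk uses exactly this definition is an assumption; the taxonomy fragment and function names below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical taxonomy fragment: child -> parent (top-level nodes have no entry).
PARENT: Dict[str, str] = {
    "Pet Supplies": "Animals & Pet Supplies",
    "Dog Supplies": "Pet Supplies",
    "Cat Supplies": "Pet Supplies",
    "Dog Beds": "Dog Supplies",
}

def ancestors_inclusive(node: str, parent: Dict[str, str]) -> Set[str]:
    # The node itself plus every ancestor up to the top level.
    out: Set[str] = set()
    current = node
    while current is not None:
        out.add(current)
        current = parent.get(current)
    return out

def hierarchical_pr(pairs: List[Tuple[str, str]],
                    parent: Dict[str, str]) -> Tuple[float, float]:
    # pairs: non-empty list of (predicted node, true node).
    overlap = pred_total = true_total = 0
    for predicted, actual in pairs:
        p = ancestors_inclusive(predicted, parent)
        t = ancestors_inclusive(actual, parent)
        overlap += len(p & t)
        pred_total += len(p)
        true_total += len(t)
    return overlap / pred_total, overlap / true_total

# Predicting a sibling branch still earns credit for the shared ancestors.
hp, hr = hierarchical_pr([("Cat Supplies", "Dog Beds")], PARENT)
```

Under this definition a prediction that stops early at a correct ancestor keeps full precision and loses only recall.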

Slide 19

Slide 19 text

What next?
● Evolve and adapt the taxonomy.
● Vary the probability-to-prediction logic for different downstream consumers.
● Automate threshold tuning based on merchant feedback.

Slide 20

Slide 20 text

Vision and Next Steps
● Develop connected taxonomies of categories and attributes.
Category: Electric Guitar
Attributes:
Color: {Red, Green, Blue, Black, Brown}
# Strings: {6, 7, 12}
Fretboard Material: {Rosewood, Ebony, Maple}
Category: Refrigerator
Attributes:
Color: {Black, White, Stainless Steel}
Door Type: {French, Side by Side, Top Freezer}
Lock Type: {Electronic, Manual}
● Expand model to infer additional attributes.
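One possible way to represent such a connected category/attribute taxonomy is a mapping from each category to its allowed attribute values. The structure and the `invalid_attributes` helper are hypothetical illustrations, not the design described in the talk; the values are the ones on the slide.

```python
from typing import Dict, Set

# Category -> attribute name -> allowed values (values taken from the slide).
ATTRIBUTES: Dict[str, Dict[str, Set[str]]] = {
    "Electric Guitar": {
        "Color": {"Red", "Green", "Blue", "Black", "Brown"},
        "# Strings": {"6", "7", "12"},
        "Fretboard Material": {"Rosewood", "Ebony", "Maple"},
    },
    "Refrigerator": {
        "Color": {"Black", "White", "Stainless Steel"},
        "Door Type": {"French", "Side by Side", "Top Freezer"},
        "Lock Type": {"Electronic", "Manual"},
    },
}

def invalid_attributes(category: str, attrs: Dict[str, str]) -> Dict[str, str]:
    # Return the attribute/value pairs that the category's controlled lists reject.
    allowed = ATTRIBUTES.get(category, {})
    return {name: value for name, value in attrs.items()
            if value not in allowed.get(name, set())}

rejected = invalid_attributes("Electric Guitar", {"Color": "Red", "# Strings": "5"})
```

Because the category determines which attributes and values are valid, a predicted attribute can be checked against the predicted category.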

Slide 21

Slide 21 text

Thank you