How Shopify used Ray and TensorFlow to build a Product Hierarchical Categorization model to auto-classify billions of products using NLP and Computer Vision

Organizing products using structured metadata is crucial in online retail. This metadata is needed by many downstream applications, including search and discovery, trust and safety, and analytics and reporting. At Shopify, we like to make the commerce journey as easy as possible for our merchants, and one part of this is using machine learning to predict the product category for the billions of products that our merchants sell.

We will look at how we solved this problem using transfer learning through natural language processing and computer vision to create a hierarchical classification deep neural network that categorizes products into a hierarchical tree taxonomy. We will dig deeper into the modeling challenges and how we arrived at specific architecture decisions. We will then dive into how Ray and other tool choices made this work at Shopify scale. The talk will also cover how we continuously monitor the model's performance using both ML and business metrics, and how this feeds a feedback mechanism that results in better models.

Finally, we will talk about how all of this was built with merchant success front and center in every product and technical decision, by walking through features built on top of this model that have benefited our merchants.

Anyscale

January 30, 2023

Transcript

  1. Hierarchical Product Classification @Shopify
    Kshetrajna Raghavan
    Data Science @ Shopify

  2. Overview
    ● What does Product Understanding mean @Shopify?
    ● Problem Formulation and Model architecture.
    ● Infrastructure and Scaling
    ● How merchants interact with ML and feedback loops
    ● What next?

  3. What is Product Understanding?
    Build a product metadata identification service that
    effectively extracts useful product information for
    downstream applications or analytical purposes.

  4. Walk down memory lane …

  5. Category: Dog Bed
    Color: Gray
    Shape: Rectangular
    Fill Material: Memory Foam
    *
    *
    *

  6. Problem Formulation: Category Prediction
    Taxonomy / Controlled Lists
    Machine Learning to categorize products

  7. Google Product Taxonomy

  8. Google Product Taxonomy

  9. Model Architecture
    ● Multi-Task, Multi-Class Learning:
    ○ Each level of the taxonomy is a separate learning task.
    ○ Each task is a multi-class classification problem.
    ● Transfer Learning: Pretrained models for both text and image features.
    ○ Multilingual BERT for text
    ○ MobileNet V2 for images
    ● Subclassed model architecture, so that different pieces of the model can be deployed either as standalone models or in combination with other models.
    ● TensorFlow Transform to perform stateful transformations during preprocessing.
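
The multi-task, multi-class setup on this slide can be illustrated with a small loss computation. This is a minimal sketch in plain Python, not the model's actual TensorFlow code: it assumes one softmax head per taxonomy level and a total loss that is a weighted sum of the per-level cross-entropies. The function names and the equal default weights are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Multi-class cross-entropy for one example at one taxonomy level."""
    return -math.log(softmax(logits)[true_index])


def multi_task_loss(logits_per_level, labels_per_level, weights=None):
    """Weighted sum of per-level losses (equal weights are an assumption)."""
    if weights is None:
        weights = [1.0] * len(logits_per_level)
    return sum(
        w * cross_entropy(logits, label)
        for w, logits, label in zip(weights, logits_per_level, labels_per_level)
    )


# Two levels: 2 classes at level 1, 4 classes at level 2, all logits equal.
loss = multi_task_loss([[0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], [0, 3])
```

With uniform logits each head contributes the log of its class count, so the total here is log(2) + log(4).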

  10. Model Architecture: Training
    ● Parent nodes help child node predictions.
    ● 7 levels/tasks in total, spanning over 5,500 nodes.
    ● Data parallelism using distributed TensorFlow across multiple machines/GPUs.
    ● Uses Shopify's ML platform, which is built on Google Cloud Platform.
    ● Taxonomy-unaware during training!

  11. Inference Scaling Requirements
    ● Shopify has multiple billions of products historically.
    ● Tens of billions of images.
    ● Tens of millions of products created/updated daily.
    ● Multiple downstream consumers, e.g., Search, Product Sync, Admin Page, Analytics.
    ● Real-time, streaming, and batch applications.

  12. Model Architecture: Inference
    ● Model: Image_Text2Pred. Input: Raw Text & Raw Image. Output: Prediction per level.
    ● Model: Image_Text2Prob. Input: Raw Text & Raw Image. Output: All Probabilities.
    ● Model: Text2Emb. Input: Raw Text. Output: Text Emb.
    ● Model: Image2Emb. Input: Raw Image. Output: Img Emb.
    ● Model: Emb2Pred. Input: Img, Text Emb. Output: Prediction per level.
    ● Model: Emb2Prob. Input: Img, Text Emb. Output: All Probabilities.
    ● Model: Prob2Pred. Input: All Prob. Output: Prediction per level.
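
The inputs and outputs listed on this slide suggest that the larger models are compositions of the smaller ones. The sketch below shows that composition in plain Python, under that assumption. The function bodies are toy stand-ins invented for illustration (a real Text2Emb would wrap multilingual BERT, a real Image2Emb would wrap MobileNet V2); only the wiring between the pieces is the point.

```python
def text2emb(raw_text):
    """Toy text encoder: [character count, word count] as a fake embedding."""
    return [float(len(raw_text)), float(len(raw_text.split()))]


def image2emb(raw_image):
    """Toy image encoder: mean pixel value as a one-element fake embedding."""
    return [sum(raw_image) / len(raw_image)]


def emb2prob(img_emb, text_emb):
    """Toy classifier head: returns one probability list per taxonomy level."""
    features = img_emb + text_emb
    total = sum(features) or 1.0
    return [[f / total for f in features]]  # a single level, for brevity


def prob2pred(all_probs):
    """Pick the highest-probability class index at each level."""
    return [max(range(len(p)), key=p.__getitem__) for p in all_probs]


def emb2pred(img_emb, text_emb):
    """Emb2Pred = Prob2Pred applied to Emb2Prob."""
    return prob2pred(emb2prob(img_emb, text_emb))


def image_text2prob(raw_text, raw_image):
    """Image_Text2Prob = Emb2Prob applied to both encoders' outputs."""
    return emb2prob(image2emb(raw_image), text2emb(raw_text))


def image_text2pred(raw_text, raw_image):
    """Image_Text2Pred = the full pipeline from raw inputs to predictions."""
    return prob2pred(image_text2prob(raw_text, raw_image))


text, image = "gray dog bed", [0.2, 0.4]
end_to_end = image_text2pred(text, image)
staged = emb2pred(image2emb(image), text2emb(text))
```

Splitting the pipeline this way lets a consumer that already stores embeddings call only the embedding-to-prediction stage, while a consumer with raw inputs calls the end-to-end model and gets the same answer.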

  13. Taxonomy Structure During Inference
    ● Raw predictions contain confidence scores for every node in the taxonomy.
    ● Greedy prediction logic with thresholding to enforce the taxonomy structure.
    ○ Raw predictions: array of confidence scores per level.
    ○ Choose the level 1 prediction with the highest score.
    ○ Filter to only the immediate descendants of the predicted level 1 category.
    ○ Choose the level 2 descendant with the highest score.
    ○ Continue until hitting a leaf node or level 7.
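
The greedy walk described on this slide can be sketched as follows. This is an illustrative plain-Python version, not the production logic: the data layout (one score dictionary per level, a `children` map) and the rule of stopping as soon as the best candidate falls below a single global `threshold` are assumptions, since the slide does not say how thresholding is applied. The taxonomy fragment is a toy example.

```python
def greedy_decode(scores_per_level, children, roots, threshold=0.5, max_depth=7):
    """Walk the taxonomy top-down, keeping the best-scoring valid node per level.

    scores_per_level: list of dicts {node: confidence}, one dict per level.
    children: dict {node: list of immediate descendants}.
    roots: list of level 1 nodes.
    """
    path = []
    candidates = roots
    for level_scores in scores_per_level[:max_depth]:
        if not candidates:
            break  # the previous prediction was a leaf node
        best = max(candidates, key=lambda node: level_scores.get(node, 0.0))
        if level_scores.get(best, 0.0) < threshold:
            break  # assumed rule: stop when confidence is too low
        path.append(best)
        candidates = children.get(best, [])  # restrict to immediate descendants
    return path


# Toy taxonomy fragment and raw scores (hypothetical values).
roots = ["Animals & Pet Supplies", "Electronics"]
children = {
    "Animals & Pet Supplies": ["Pet Supplies"],
    "Pet Supplies": ["Dog Supplies", "Cat Supplies"],
    "Dog Supplies": ["Dog Beds"],
}
scores = [
    {"Animals & Pet Supplies": 0.9, "Electronics": 0.1},
    {"Pet Supplies": 0.8},
    {"Dog Supplies": 0.7, "Cat Supplies": 0.2},
    {"Dog Beds": 0.3, "Cat Beds": 0.6},
]
confident_path = greedy_decode(scores, children, roots, threshold=0.5)
full_path = greedy_decode(scores, children, roots, threshold=0.0)
```

Note that "Cat Beds" has the highest raw score at level 4 but is never considered, because it is not a descendant of the level 3 prediction. That filtering is what keeps the output a valid path in the tree even though the model itself is taxonomy-unaware.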

  14. Infrastructure/Tools

  15. ML from Notebooks to Prod

  16. Ray Train for Training
    Ray Actor Pools for Batch Inference
    Microservices for Online Inference
  17. Let's continue that story from before …

  18. Metrics
    ● Model Metrics:
    ○ Hierarchical Precision
    ○ Hierarchical Recall
    ○ Coverage
    ○ Relative Lowest common ancestor
    ● Product Metrics:
    ○ Adoption
    ○ Merchant acceptance
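
For the model metrics above, one common way to define hierarchical precision and recall is to compare the ancestor sets of the predicted and true nodes. The sketch below uses that textbook per-example definition; the talk does not spell out the exact formulas used, so treat this as an assumption. The `parent` map is a toy taxonomy fragment.

```python
def ancestors_inclusive(node, parent):
    """Return the node together with all of its ancestors, as a set."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent.get(node)  # None once we pass the root
    return path


def hierarchical_precision_recall(predicted, actual, parent):
    """Score a prediction by how much of its path overlaps the true path."""
    pred_set = ancestors_inclusive(predicted, parent)
    true_set = ancestors_inclusive(actual, parent)
    overlap = len(pred_set & true_set)
    return overlap / len(pred_set), overlap / len(true_set)


# Toy child -> parent map (root nodes have no entry).
parent = {
    "Pet Supplies": "Animals & Pet Supplies",
    "Dog Supplies": "Pet Supplies",
    "Cat Supplies": "Pet Supplies",
    "Dog Beds": "Dog Supplies",
}
precision, recall = hierarchical_precision_recall("Cat Supplies", "Dog Beds", parent)
```

Here the prediction shares two of its three path nodes with the truth, and covers two of the four true path nodes, so a wrong leaf under the right parent is penalized less than a prediction in an unrelated branch.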

  19. What next?
    ● Evolve and adapt the taxonomy
    ● Vary the probability-to-prediction logic for different downstream consumers.
    ● Automate threshold tuning based on merchant feedback.

  20. Vision and Next Steps
    ● Develop connected taxonomies of categories and attributes.
    Category: Electric Guitar
    Attributes:
      Color: {Red, Green, Blue, Black, Brown}
      # Strings: {6, 7, 12}
      Fretboard Material: {Rosewood, Ebony, Maple}
    Category: Refrigerator
    Attributes:
      Color: {Black, White, Stainless Steel}
      Door Type: {French, Side by Side, Top Freezer}
      Lock Type: {Electronic, Manual}
    ● Expand model to infer additional attributes.

  21. Thank you
