Learning to Rank with Multimodal Data

Learning to Rank with Multimodal Data @ Etsy NYC Machine Learning Meetup
Kamelia Aryafar, Senior Data Scientist, @karyafar
Nishan Subedi, Senior Software Engineer, @subedinishan
Search Sciences, Etsy
July 2016

Etsy is a global marketplace where people around the world connect, both online and offline, to make, sell and buy unique goods.

By the Numbers
1.6M active sellers (as of December 31, 2015)
24M active buyers (as of December 31, 2015)
$2.39B annual GMS (gross merchandise sales) in 2015
35+M items for sale (as of December 31, 2015)
Photo by Kirsty-Lyn Jameson
Disclaimer: The statistics included on the following slides are updated quarterly.

Work and Culture
819 employees around the world (as of December 31, 2015)
9 offices in 7 countries (as of December 31, 2015)
Photo by Emily Andrews

Large and Unique Seller Base
1.6M active sellers (as of September 30, 2015)
95% of sellers run their Etsy shop from home (2014 Etsy Seller Survey)
76% consider their shop a business (2014 Etsy Seller Survey)
Photo by Moira K. Lime

24M active buyers (as of December 31, 2015)
92% of buyers agree Etsy offers products they can't find elsewhere (2014 Etsy Buyer Survey)
Photo by Jean-Michael Seminaro (Etsy Made in Canada)

Learning to Rank

Approaches to Learning to Rank
• Pointwise
  - For an item, predict its grade (implicit ordering)
  - Labels come from interactions with items
  - Possible class imbalance
• Pairwise
  - Ranking is transformed into pairwise classification or regression
  - Labels depend on the ordering of the item pair
  - Ability to create balanced classes
• Listwise
  - Input is the entire set of documents associated with a query
  - Output is their ranked list
  - E.g., the loss measures the distance between the ranking generated by the model and the perfect ranking for that set of documents
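To make the three families concrete, here is a minimal sketch of one representative loss for each. The specific choices (logistic loss for pointwise, hinge loss on a score difference for pairwise, and Kendall tau distance, i.e. the count of discordant pairs, as the listwise ranking distance) are illustrative assumptions, not necessarily what the deck's system used.

```python
import math
from itertools import combinations


def pointwise_log_loss(score, label):
    """Pointwise: logistic loss of one item's score against a 0/1 interaction label."""
    # Map the label to -1/+1; log(1 + e^(-y*s)) equals binary cross-entropy on sigmoid(s).
    y = 1.0 if label else -1.0
    return math.log1p(math.exp(-y * score))


def pairwise_hinge_loss(score_i, score_j, y):
    """Pairwise: hinge loss on the score difference; y = +1 if item i should rank above j, else -1."""
    return max(0.0, 1.0 - y * (score_i - score_j))


def kendall_tau_distance(model_ranking, ideal_ranking):
    """Listwise: count item pairs the model orders differently from the ideal ranking.

    Both arguments must be orderings of the same items.
    """
    position = {item: rank for rank, item in enumerate(model_ranking)}
    discordant = 0
    for a, b in combinations(ideal_ranking, 2):
        # combinations() preserves ideal order, so a should precede b; count the pair if it doesn't.
        if position[a] > position[b]:
            discordant += 1
    return discordant


print(kendall_tau_distance(["c", "b", "a"], ["a", "b", "c"]))  # 3
```

A fully reversed list of three items disagrees with the ideal order on all three pairs, so its distance is 3; a perfect ranking scores 0.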

Pairwise Learning
Each training instance represents a pair of items, <item1, item2>, from the same set of search results in your logs.
The learner must learn to order item1 and item2 correctly with respect to the user preference decisions found in your logs.
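A minimal sketch of turning one logged result page into such pairs. It assumes a hypothetical log format of (item_id, event) tuples and a hypothetical grade order purchase > cart > click > no interaction; the deck does not specify either.

```python
from itertools import combinations

# Hypothetical grades for logged interactions; the deck does not specify these.
GRADE = {"purchase": 3, "cart": 2, "click": 1, None: 0}


def pairs_from_result_page(results):
    """Build preference pairs from ONE logged search result page.

    results: list of (item_id, event) tuples, where event is a key of GRADE.
    Returns (preferred_item, other_item) tuples; items with equal grades are skipped.
    """
    pairs = []
    for (item_a, event_a), (item_b, event_b) in combinations(results, 2):
        grade_a, grade_b = GRADE[event_a], GRADE[event_b]
        if grade_a > grade_b:
            pairs.append((item_a, item_b))
        elif grade_b > grade_a:
            pairs.append((item_b, item_a))
    return pairs


page = [("mug", "click"), ("tile", "purchase"), ("print", None)]
print(pairs_from_result_page(page))
# [('tile', 'mug'), ('mug', 'print'), ('tile', 'print')]
```

Pairing only within one result page keeps both items tied to the same query, which is what makes the preference between them meaningful.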

Features

Label Creation for Pairwise Features
{housewarming, gift, photo} - {housewarming, gift, ceramic, tile} → +1
{housewarming, gift, ceramic, tile} - {housewarming, gift, photo} → -1
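Read as the standard pairwise transform, the slide subtracts one item's features from the other's and labels the difference +1, with the reversed difference labeled -1. A minimal sketch, assuming binary bag-of-words features over a small hypothetical vocabulary and treating {housewarming, gift, photo} as the preferred item:

```python
def vectorize(tokens, vocab):
    """Binary bag-of-words vector: 1 if the vocabulary term appears in the item's tokens."""
    return [1 if term in tokens else 0 for term in vocab]


def pairwise_examples(preferred, other, vocab):
    """Return both orderings of one pair as (difference_vector, label) examples."""
    diff = [p - o for p, o in zip(vectorize(preferred, vocab), vectorize(other, vocab))]
    return [(diff, +1), ([-d for d in diff], -1)]


vocab = ["housewarming", "gift", "photo", "ceramic", "tile"]  # hypothetical vocabulary
examples = pairwise_examples(
    {"housewarming", "gift", "photo"},
    {"housewarming", "gift", "ceramic", "tile"},
    vocab,
)
print(examples)
# [([0, 0, 1, -1, -1], 1), ([0, 0, -1, 1, 1], -1)]
```

The shared tokens (housewarming, gift) cancel, so each example carries only what distinguishes the two items, and emitting both orderings is what yields balanced classes.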

Train Classifier (SVM)
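Training a linear SVM on pairwise difference vectors is the RankSVM idea: the learned weight vector then doubles as a scoring function for single items. The sketch below swaps in a tiny pure-Python Pegasos solver (stochastic subgradient descent on the primal SVM objective) instead of a library such as scikit-learn, so it runs without dependencies; the toy items, preferences and hyperparameters are hypothetical.

```python
import random


def train_linear_svm(examples, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM (no bias term).

    examples: list of (feature_vector, label) pairs with label in {+1, -1}.
    Minimizes lam/2 * ||w||^2 + mean hinge loss max(0, 1 - y * w.x).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            x, y = examples[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss is active: step toward the example
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def score(w, x):
    """Ranking score of a single item under the learned weights."""
    return sum(wj * xj for wj, xj in zip(w, x))


items = {"A": [1.0, 0.0], "B": [0.5, 0.5], "C": [0.0, 1.0]}  # toy feature vectors
preferences = [("A", "B"), ("B", "C"), ("A", "C")]           # (preferred, other)
examples = []
for better, worse in preferences:
    diff = [p - o for p, o in zip(items[better], items[worse])]
    examples += [(diff, 1), ([-d for d in diff], -1)]

w = train_linear_svm(examples)
ranking = sorted(items, key=lambda name: score(w, items[name]), reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

No bias term is used because the difference vectors come in symmetric +1/-1 pairs, so the separating hyperplane passes through the origin.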

Learning to Rank Pipeline

Multimodal Learning to Rank

Image vs. Text Features
Image: texture, shape, color
Text: title, tags
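One simple way to combine the two modalities, assumed here for illustration rather than taken from the deck, is early fusion: L2-normalize each modality's vector and concatenate them, so the pairwise-difference and SVM steps from the earlier slides can run unchanged on the joint vector. The `image_weight` knob and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def multimodal_features(text_vec, image_vec, image_weight=1.0):
    """Concatenate per-modality, L2-normalized blocks into one item feature vector.

    image_weight is a hypothetical knob for how much the image block counts
    relative to the text block.
    """
    return l2_normalize(text_vec) + [image_weight * v for v in l2_normalize(image_vec)]


text_vec = [3.0, 4.0]   # e.g. title/tag term weights (toy values)
image_vec = [0.0, 2.0]  # e.g. texture/shape/color or CNN activations (toy values)
print(multimodal_features(text_vec, image_vec))  # [0.6, 0.8, 0.0, 1.0]
```

Normalizing per block keeps a high-dimensional image embedding from swamping a short text vector purely because of its scale.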

Extracting Image Features

Feature Engineering vs. Deep Learning
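As an example of the hand-engineered side of this contrast, a color histogram is a classic image feature. It is chosen here for illustration and is not necessarily a feature the deck used; the input format (a list of 0-255 RGB tuples) is an assumption.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB color histogram, normalized to sum to 1.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    bins_per_channel must divide 256 so every value maps to a valid bin.
    """
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    width = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Flatten the (r_bin, g_bin, b_bin) cell into a single index.
        index = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[index] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


pixels = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2  # two red and two blue pixels
hist = color_histogram(pixels)
print(hist[48], hist[3])  # 0.5 0.5  (red cell, blue cell)
```

A feature like this captures color well but says little about shape or semantics, which is the gap learned CNN features are meant to fill.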

ImageNet
Photo from: http://www.image-net.org/

Convolutional Neural Nets (CNNs)
Photo from: http://cs231n.stanford.edu/
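The core CNN building blocks can be sketched in a few lines of plain Python: a convolution (implemented, as in CNN libraries, as cross-correlation), a ReLU nonlinearity, and 2x2 max pooling. This is a single-channel toy for intuition, not an efficient or complete CNN layer; the image and kernel are hypothetical.

```python
def conv2d(image, kernel):
    """Single-channel 'valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Elementwise max(0, x) nonlinearity."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling (stride 2); a trailing odd row or column is dropped."""
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]) - 1, 2)]
        for i in range(0, len(feature_map) - 1, 2)
    ]


image = [[0, 0, 1, 1] for _ in range(4)]  # dark left half, bright right half
edge_kernel = [[-1, 1]]                    # responds to left-to-right brightness increases
fmap = relu(conv2d(image, edge_kernel))
print(fmap)                # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(max_pool_2x2(fmap))  # [[1], [1]]
```

In a trained CNN the kernels are learned rather than hand-set, and stacking many such layers produces increasingly abstract features.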

Model Specs: VGGnet
Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan & Andrew Zisserman
Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank, Corey Lynch, Kamelia Aryafar & Josh Attenberg
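The transcript names VGGnet without listing its specs. Assuming the 16-layer variant (configuration D in Simonyan & Zisserman) for illustration, this short calculation shows where its roughly 138M parameters sit for a 224x224 RGB input with 1000 ImageNet classes.

```python
# VGG-16 ("configuration D"): integers are 3x3 conv layers with that many output
# channels (stride 1, padding 1, so spatial size is kept); "M" is 2x2 max-pooling, stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def vgg_param_count(conv_cfg, in_channels=3, input_size=224, fc_sizes=(4096, 4096, 1000)):
    """Return (total trainable parameters, final conv feature-map shape as (h, w, c))."""
    params, channels, size = 0, in_channels, input_size
    for layer in conv_cfg:
        if layer == "M":
            size //= 2
        else:
            params += 3 * 3 * channels * layer + layer  # weights + biases
            channels = layer
    fan_in = channels * size * size  # flattened feature map feeds the first FC layer
    for width in fc_sizes:
        params += fan_in * width + width
        fan_in = width
    return params, (size, size, channels)


print(vgg_param_count(VGG16_CONV))  # (138357544, (7, 7, 512))
```

About 14.7M of those parameters are in the 3x3 convolution stack; the first fully connected layer alone (7·7·512 inputs to 4096 outputs) holds about 102.8M.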

Transfer Learning
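Transfer learning in this setting means reusing a network trained on ImageNet as a fixed feature extractor and training only a new model on top of its outputs. The dependency-free sketch below shows that split; the one-layer "backbone" and its weights are hypothetical stand-ins for a real pretrained CNN such as VGG, and the logistic-regression head is purely illustrative (in the deck's setting, the extracted features would feed the ranker instead).

```python
import math


def frozen_backbone(x, weights):
    """Stand-in for a pretrained network: one fixed linear layer plus ReLU.

    The weights are only read, never updated, mimicking a frozen ImageNet model.
    """
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train_head(features, labels, lr=0.5, epochs=300):
    """Fit a new logistic-regression head on frozen features with plain SGD."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            logit = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            g = p - y  # derivative of the log loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w, b, f):
    """Hard 0/1 prediction from the trained head."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


backbone_weights = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical "pretrained" weights
inputs = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
labels = [1, 1, 0, 0]
features = [frozen_backbone(x, backbone_weights) for x in inputs]  # extract once, reuse
w, b = train_head(features, labels)
print([predict(w, b, f) for f in features])  # [1, 1, 0, 0]
```

Because the backbone is frozen, its features can be computed once per image and cached, and only the small head needs training.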

Photo by Corey Lynch

Production Pipeline

Collecting Item Pairs with Labels

Strategies for Model Application
• Real time
  - Pros: can handle unseen items
  - Cons: latency cost ∝ complexity; query-time feature computations
• Offline
  - Pros: no cost to feature complexity
  - Cons: computations compound as considerations increase

Real time model evaluation
User query → index → top-k results → fetch ranking model (cache, key-value store) → apply ranking (ranking or reranking pass) → reranked top-k results
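A minimal sketch of that reranking pass, assuming a linear model whose weights live in a key-value store. Here a plain dict stands in for the store, `functools.lru_cache` stands in for the model cache, and the model key and feature names are hypothetical.

```python
from functools import lru_cache

# Hypothetical key-value store (a plain dict here) mapping a model key to linear weights.
MODEL_STORE = {"ranker:v1": {"title_match": 1.5, "image_sim": 0.8, "price_z": -0.2}}


@lru_cache(maxsize=16)
def fetch_model(model_key):
    """Fetch weights once from the (hypothetical) store, then serve them from an in-process cache."""
    return MODEL_STORE[model_key]


def rerank_top_k(candidates, model_key="ranker:v1", k=3):
    """Rescore the index's top-k candidates with a linear model; return item ids by new score.

    candidates: list of (item_id, feature_dict) in the order returned by the index.
    """
    weights = fetch_model(model_key)

    def model_score(candidate):
        _, features = candidate
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    return [item_id for item_id, _ in sorted(candidates[:k], key=model_score, reverse=True)]


candidates = [
    ("a", {"title_match": 1.0}),
    ("b", {"title_match": 1.0, "image_sim": 1.0}),
    ("c", {"price_z": 1.0}),
    ("d", {"title_match": 5.0}),  # outside the top-k, so it is not reranked
]
print(rerank_top_k(candidates))  # ['b', 'a', 'c']
```

Limiting the model to the index's top k keeps the query-time cost bounded, which is the latency trade-off listed for real-time application.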

Gaining Confidence

Performance Replays

Ranking Replays

Offline Evaluation Metrics
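The transcript does not list which offline metrics were used. As one common example (an assumption, not necessarily what Etsy reported), NDCG@k compares the graded relevance of a model's top k results against the best possible ordering of the same items.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with the (2^rel - 1) / log2(rank + 1) form; ranks start at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """relevances: graded labels in the order the model ranked the items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([3, 2, 1], 3))  # 1.0 (already ideally ordered)
print(ndcg_at_k([0, 1], 2))     # about 0.63: the relevant item sits at rank 2
```

The log discount means a relevant item lost from rank 1 costs far more than one lost from rank 10, which matches how users scan search results.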

Model Understanding: Side by Side

Custom Queries & Explain Logs

The future…
