Learning to Rank with
Multimodal Data @ Etsy
NYC Machine Learning Meetup
Kamelia Aryafar, Senior Data Scientist, @karyafar
Nishan Subedi, Senior Software Engineer, @subedinishan
Search Sciences, Etsy
July 2016
Slide 2
Slide 2 text
Etsy
Slide 3
Slide 3 text
Etsy is a global marketplace where people around the world connect,
both online and offline, to make, sell and buy unique goods.
Slide 4
Slide 4 text
By the Numbers
1.6M
active sellers
AS OF DECEMBER 31, 2015
24M
active buyers
AS OF DECEMBER 31, 2015
$2.39B
annual GMS
IN 2015
35+M
items for sale
AS OF DECEMBER 31, 2015
Photo by Kirsty-Lyn Jameson
DISCLAIMER
The statistics included
on the following slides
are updated quarterly.
Slide 5
Slide 5 text
819
employees around
the world
AS OF DECEMBER 31, 2015
9
offices in
7 countries
AS OF DECEMBER 31, 2015
Photo by Emily Andrews
Work and Culture
Slide 6
Slide 6 text
Large and Unique Seller Base
1.6M
active sellers
AS OF SEPTEMBER 30, 2015
95%
of sellers run their
Etsy shop from home
2014 ETSY SELLER SURVEY
76%
consider their
shop a business
2014 ETSY SELLER SURVEY
Photo by Moira K. Lime
Slide 7
Slide 7 text
Etsy Made in Canada
Photo by Jean-Michael Seminaro
24M
active buyers
AS OF DECEMBER 31, 2015
92%
of buyers agree Etsy offers products they can't find elsewhere
2014 ETSY BUYER SURVEY
Slide 11
Slide 11 text
Learning To Rank
Slide 12
Slide 12 text
Approaches to Learning to Rank
• Pointwise
- For an item, predict its grade (implicit ordering)
- Labels come from interactions with items
- Possible class imbalance
• Pairwise
- Ranking transformed to pairwise classification or regression
- Labels depend on ordering of item pair
- Ability to create balanced classes
• Listwise
- Input is entire set of documents associated with query
- Output is their ranked list
- E.g., the loss measures the distance between the ranking generated by the model and the perfect ranking for the set of documents
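To make the three approaches concrete, here is a minimal sketch of the training signal each one might derive from the same result list. The `results` data, the function names, and the choice of 1 - NDCG as the listwise distance are all illustrative assumptions, not details from the talk.

```python
import math

# Hypothetical logged results for one query: (item_id, model_score, grade).
results = [
    ("a", 0.9, 0),
    ("b", 0.7, 2),
    ("c", 0.4, 1),
]

def pointwise_examples(results):
    """Pointwise: one (item, grade) target per item."""
    return [(item, grade) for item, _, grade in results]

def pairwise_examples(results):
    """Pairwise: one (preferred, other) pair per two items with different grades."""
    pairs = []
    for i, (item_i, _, g_i) in enumerate(results):
        for item_j, _, g_j in results[i + 1:]:
            if g_i > g_j:
                pairs.append((item_i, item_j))
            elif g_j > g_i:
                pairs.append((item_j, item_i))
    return pairs

def dcg(grades):
    """Discounted cumulative gain of grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))

def listwise_loss(results):
    """Listwise: 1 - NDCG of the whole list ordered by model score."""
    by_score = [g for _, _, g in sorted(results, key=lambda r: -r[1])]
    ideal = dcg(sorted(by_score, reverse=True))
    if ideal == 0:
        return 0.0  # no relevant items, so every ordering is equally good
    return 1.0 - dcg(by_score) / ideal
```

The pairwise view drops ties, which is one way it can produce balanced classes: each kept pair can be presented in either order with the label flipped.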
Slide 13
Slide 13 text
Pairwise Learning
Each training instance represents a pair of items from the same set of search
results in your logs.
The learner must learn to order item1 and item2 correctly with respect to the user
preference decisions found in your logs.
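As an illustration of the pairwise setup, the sketch below trains a linear scorer with logistic loss on feature differences between the preferred item and the other item. This is a generic reduction, not necessarily the model used at Etsy; the feature names, values, preference pairs, and hyperparameters are hypothetical.

```python
import math

# Hypothetical item features, e.g. [text_match, image_similarity].
features = {
    "item1": [0.9, 0.2],
    "item2": [0.3, 0.8],
    "item3": [0.1, 0.1],
}
# Hypothetical preference pairs mined from logs: (preferred, other).
preferences = [("item2", "item1"), ("item2", "item3"), ("item1", "item3")]

def diff(a, b):
    """Element-wise difference of two feature vectors."""
    return [x - y for x, y in zip(a, b)]

def score(w, x):
    """Linear score of one item."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(features, preferences, epochs=200, lr=0.5):
    """Stochastic gradient descent on log(1 + exp(-w . (x_preferred - x_other)))."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, other in preferences:
            d = diff(features[preferred], features[other])
            margin = score(w, d)
            # Derivative of the logistic loss with respect to the margin.
            scale = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * scale * di for wi, di in zip(w, d)]
    return w

w = train_pairwise(features, preferences)
ranking = sorted(features, key=lambda item: score(w, features[item]), reverse=True)
```

Because the model only ever sees differences, the learned weights score single items at query time, so ranking a result set is a sort by `score`.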
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Karen Simonyan & Andrew Zisserman
Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank
Corey Lynch, Kamelia Aryafar & Josh Attenberg
Model Specs: VGGnet
Slide 26
Slide 26 text
Transfer Learning
Slide 27
Slide 27 text
Photo by Corey Lynch
Slide 28
Slide 28 text
Photo by Corey Lynch
Slide 35
Slide 35 text
Production Pipeline
Slide 36
Slide 36 text
Collecting Item Pairs with Labels
Slide 37
Slide 37 text
Strategies for Model Application
• Real Time
- Pros: Can handle unseen items
- Cons: Latency cost ∝ complexity; query-time feature computations
• Offline
- Pros: No cost to feature complexity
- Cons: Computations compound as considerations increase
Slide 38
Slide 38 text
Real time model evaluation
[Diagram: User Query → Index → Top-k results → Apply Ranking (ranking or reranking pass) → Top-k results, reranked. The Ranking Model is supplied by Fetch Model (cache, key-value store).]
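The real-time flow on this slide (fetch the model, then apply it to the top-k results from the index) could be sketched as follows. The in-memory `MODEL_STORE`, the key `"ranker:v1"`, the linear model form, and the feature names are assumptions made for illustration; a production system would use its own cache or key-value store and model format.

```python
# Hypothetical stand-in for a cache or key-value store holding model weights.
MODEL_STORE = {"ranker:v1": {"text_match": 1.0, "image_similarity": 2.0}}

def fetch_model(key, store=MODEL_STORE):
    """Fetch Model step: look up the ranking model by key."""
    return store[key]

def rerank(top_k, model):
    """Apply Ranking step: rescore first-pass results with a linear model.

    Python's sort is stable, so items with equal scores keep first-pass order.
    """
    def score(item):
        return sum(w * item["features"].get(name, 0.0) for name, w in model.items())
    return sorted(top_k, key=score, reverse=True)

# Hypothetical top-k results returned by the index for a user query.
top_k = [
    {"id": "item1", "features": {"text_match": 0.9, "image_similarity": 0.2}},
    {"id": "item2", "features": {"text_match": 0.3, "image_similarity": 0.8}},
    {"id": "item3", "features": {"text_match": 0.1, "image_similarity": 0.1}},
]
reranked = rerank(top_k, fetch_model("ranker:v1"))
```

Only k items are rescored per query, which is why the latency cost of this strategy grows with model and feature complexity rather than with index size.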