Learning to Rank with Multimodal Data

Slide 1

Slide 1 text

Learning to Rank with Multimodal Data @ Etsy NYC Machine Learning Meetup
Kamelia Aryafar, Senior Data Scientist, @karyafar
Nishan Subedi, Senior Software Engineer, @subedinishan
Search Sciences, Etsy
July 2016

Slide 2

Slide 2 text

Etsy

Slide 3

Slide 3 text

Etsy is a global marketplace where people around the world connect, both online and offline, to make, sell and buy unique goods.

Slide 4

Slide 4 text

By the Numbers
• 1.6M active sellers (as of December 31, 2015)
• 24M active buyers (as of December 31, 2015)
• $2.39B annual GMS (in 2015)
• 35+M items for sale (as of December 31, 2015)
Photo by Kirsty-Lyn Jameson
Disclaimer: The statistics included on the following slides are updated quarterly.

Slide 5

Slide 5 text

Work and Culture
• 819 employees around the world (as of December 31, 2015)
• 9 offices in 7 countries (as of December 31, 2015)
Photo by Emily Andrews

Slide 6

Slide 6 text

Large and Unique Seller Base
• 1.6M active sellers (as of September 30, 2015)
• 95% of sellers run their Etsy shop from home (2014 Etsy Seller Survey)
• 76% consider their shop a business (2014 Etsy Seller Survey)
Photo by Moira K. Lime

Slide 7

Slide 7 text

Etsy Made in Canada
Photo by Jean-Michael Seminaro
• 24M active buyers (as of December 31, 2015)
• 92% of buyers agree Etsy offers products they can't find elsewhere (2014 Etsy Buyer Survey)

Slide 8

Slide 8 text

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Learning To Rank

Slide 12

Slide 12 text

Approaches to Learning to Rank
• Pointwise
  - For an item, predict its grade (implicit ordering)
  - Labels come from interactions with items
  - Possible class imbalance
• Pairwise
  - Ranking is transformed into pairwise classification or regression
  - Labels depend on the ordering of the item pair
  - Ability to create balanced classes
• Listwise
  - Input is the entire set of documents associated with a query
  - Output is their ranked list
  - E.g., the loss measures the distance between the ranking generated by the model and the perfect ranking for the set of documents
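The listwise bullet describes a loss as a distance between the model's ranking and the perfect ranking. One common way to measure such a distance is to count the item pairs that the two rankings order differently (the Kendall tau distance). The slide does not name a specific distance, so the choice of measure, the function name and the sample items below are assumptions made only for illustration.

```python
from itertools import combinations


def kendall_tau_distance(predicted, ideal):
    """Count item pairs that the two rankings order differently.

    Assumption: both lists contain the same items, each exactly once.
    """
    position = {item: rank for rank, item in enumerate(ideal)}
    discordant = 0
    for first, second in combinations(predicted, 2):
        # `first` precedes `second` in the predicted ranking; the pair is
        # discordant when the ideal ranking puts them the other way round.
        if position[first] > position[second]:
            discordant += 1
    return discordant


ideal = ["a", "b", "c", "d"]                      # hypothetical perfect ranking
same = kendall_tau_distance(["a", "b", "c", "d"], ideal)      # 0
one_swap = kendall_tau_distance(["b", "a", "c", "d"], ideal)  # 1
reversed_order = kendall_tau_distance(["d", "c", "b", "a"], ideal)  # 6
```

A distance of 0 means the model reproduced the perfect ranking; the maximum for four items is 6, reached when the order is fully reversed.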

Slide 13

Slide 13 text

Pairwise Learning
Each training instance represents a pair of items from the same set of search results in your logs. The learner must learn to order item1 and item2 correctly with respect to the user preference decisions found in your logs.
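A minimal sketch of how such pairs might be pulled out of a logged result list. It assumes that a click marks an item as preferred over the unclicked items shown in the same results; the slide does not say which interactions are used, and the function name, the session layout and the item ids are hypothetical.

```python
from itertools import product


def pairs_from_session(results, clicked):
    """Return (preferred, other) pairs for one logged result list.

    Assumption: a clicked item is preferred over every unclicked item
    that appeared in the same set of search results.
    """
    clicked = set(clicked)
    preferred = [item for item in results if item in clicked]
    skipped = [item for item in results if item not in clicked]
    return list(product(preferred, skipped))


# Hypothetical log entry: three items shown, the second one clicked.
session = {"results": ["item_1", "item_2", "item_3"], "clicked": ["item_2"]}
pairs = pairs_from_session(session["results"], session["clicked"])
# pairs == [("item_2", "item_1"), ("item_2", "item_3")]
```

Pairs are only formed within one result list, which matches the slide's requirement that both items come from the same set of search results.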

Slide 14

Slide 14 text

Features

Slide 15

Slide 15 text

Label Creation for Pairwise Features
{housewarming, gift, photo} - {housewarming, gift, ceramic, tile} → +1
{housewarming, gift, ceramic, tile} - {housewarming, gift, photo} → -1
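The example above can be sketched as a feature-vector difference. This assumes binary bag-of-words features and that the first item in the +1 line is the preferred one; the slide shows only the token sets, so the encoding, the helper names and the vocabulary ordering are illustrative assumptions.

```python
def to_vector(tokens, vocabulary):
    """Binary bag-of-words vector: 1 if the term is present, else 0."""
    return [1 if term in tokens else 0 for term in vocabulary]


def pairwise_instances(preferred_tokens, other_tokens):
    """Build the two mirrored training instances for one item pair."""
    vocabulary = sorted(set(preferred_tokens) | set(other_tokens))
    preferred = to_vector(set(preferred_tokens), vocabulary)
    other = to_vector(set(other_tokens), vocabulary)
    diff = [p - o for p, o in zip(preferred, other)]
    # (preferred - other) is labeled +1; the mirrored difference gets -1,
    # which keeps the two classes balanced.
    return vocabulary, [(diff, +1), ([-d for d in diff], -1)]


photo = {"housewarming", "gift", "photo"}
tile = {"housewarming", "gift", "ceramic", "tile"}
vocabulary, instances = pairwise_instances(photo, tile)
# vocabulary == ["ceramic", "gift", "housewarming", "photo", "tile"]
# instances  == [([-1, 0, 0, 1, -1], 1), ([1, 0, 0, -1, 1], -1)]
```

Terms shared by both items (here "housewarming" and "gift") cancel to 0, so only the terms that distinguish the two items carry signal.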

Slide 16

Slide 16 text

Train Classifier (SVM)
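A rough sketch of what training a classifier on such difference vectors could look like. It assumes a linear SVM fitted by plain subgradient descent on the L2-regularized hinge loss; the slide names only "SVM", so the linear form, the optimizer, the hyperparameters and every function name here are assumptions, and a production system would normally use an established SVM library instead.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(instances, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights by subgradient steps on the L2-regularized hinge loss."""
    weights = [0.0] * len(instances[0][0])
    for _ in range(epochs):
        for features, label in instances:
            margin = label * dot(weights, features)
            # Shrink the weights (subgradient of the L2 penalty).
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # Hinge loss is active: step toward the correct side.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
    return weights


def prefers_first(features_a, features_b, weights):
    """True when the model ranks item A above item B."""
    diff = [a - b for a, b in zip(features_a, features_b)]
    return dot(weights, diff) > 0


# Hypothetical difference vectors over (ceramic, gift, housewarming, photo, tile).
instances = [([-1, 0, 0, 1, -1], +1), ([1, 0, 0, -1, 1], -1)]
weights = train_linear_svm(instances)

photo_item = [0, 1, 1, 1, 0]
tile_item = [1, 1, 1, 0, 1]
photo_ranked_first = prefers_first(photo_item, tile_item, weights)
```

Because the model is linear, the sign of the score on a difference vector equals the comparison of the two items' individual scores, so the learned weights can also be used to score and sort items one at a time.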

Slide 17

Slide 17 text

Learning to Rank Pipeline

Slide 18

Slide 18 text

Multimodal Learning to Rank

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Image vs. Text Features
Image: Texture, Shape, Color
Text: Title, Tags

Slide 21

Slide 21 text

Extracting Image Features

Slide 22

Slide 22 text

Feature Engineering
Deep Learning

Slide 23

Slide 23 text

ImageNet
Photo from: http://www.image-net.org/

Slide 24

Slide 24 text

Convolutional Neural Nets (CNNs)
Photo from: http://cs231n.stanford.edu/

Slide 25

Slide 25 text

Model Specs: VGGnet
• Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan & Andrew Zisserman
• Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank. Corey Lynch, Kamelia Aryafar & Josh Attenberg

Slide 26

Slide 26 text

Transfer Learning
Images Don’t Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank. Corey Lynch, Kamelia Aryafar & Josh Attenberg

Slide 27

Slide 27 text

Photo by Corey Lynch

Slide 28