2017 ICCV-CVF Invited Talk

Exploiting the Web to Understand Fashion
Edgar Simo-Serra
October 29th, 2017
Waseda University

Talk Overview
• Motivation
• Dealing with Data
  • Dataset Annotation
  • Fine-Tuning
  • Approaches that deal with Noise
• Modelling Fashion Style
  • StyleNet
  • Latent “Look”
• Predicting Fashionability
• Fashion Datasets
• Wrapping Up

Motivation
“Fashion is the armour to survive everyday life.” (Bill Cunningham)

What is Fashion?
• Cambridge Dictionary
  • A style that is popular at a particular time, especially in clothes, hair, make-up, etc.
• Oxford Dictionary
  • A popular or the latest style of clothing, hair, decoration, or behaviour.
• Merriam-Webster Dictionary
  • a: a prevailing custom, usage, or style
  • b (1): the prevailing style (as in dress) during a particular time
  • b (2): a garment in such a style
  • c: social standing or prominence especially as signalized by dress or conduct

Fashion Difficulties
• Significant temporal and geographical dependency
  • Difficulty in obtaining such data
  • Limits applicability
• Highly subjective (multi-modal)
  • Fashion experts do not agree with each other
  • Nearly impossible to obtain good annotations
  • Hard to evaluate
• Fine-grained and dependent on many factors
  • Must think of combinations
  • Difference between stockings, leggings, and tights?
• Extreme class imbalance
• Fully supervised learning is nearly impossible

The Noisy Web
• Web 2.0: Paradigm shift where users provide content
• Social media boom (Twitter, Instagram, ...)
• The data is incredibly noisy and hard to use!

Noisy Data Strategies
• Unsupervised Learning
  • Widest applicability to data
  • Lowest performance
  • Ignores possible labels
• Weak-label Learning
  • Attempts to model label noise
  • Exploits all possible data
• Supervised Learning
  • Highest performance
  • Requires expensive annotations
  • Impossible in some cases

Dealing with Data

Image Classification
• Supervised Learning
  • Input: image
  • Output: class label
• Standard Computer Vision/Deep Learning Problem
• Makes use of large datasets for performance
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.

ImageNet Dataset
• Most famous computer vision dataset
• 14,197,122 images
• 21,841 categories
• Collected from the web and crowd-labelled
Deng et al. ImageNet: A Large-Scale Hierarchical Image Database. CVPR, 2009.

Crowd Sourced Annotations
• Supervised learning requires annotated data
• Search queries are very noisy
• Use human annotators to clean data
Zhou et al. Places: A 10 million Image Database for Scene Recognition. PAMI, 2017.

Importance of Large Datasets
• Quantity of data is critical for performance
• Models trained with large data can be modified to work with small data
[Figure: performance against amount of data, for deep learning and traditional learning]

Fine-tuning (Transfer Learning)
• Train model on available large dataset
• Remove last layer and replace with new layer for new task
• Continue training with new data
  • Significantly improves results on small datasets
  • Use a small learning rate so as not to forget what was learnt
  • Learning rate is usually larger on the new layer
[Figure: network from input to output, first trained on the large dataset, then given a new last layer and trained on the small dataset]
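The learning-rate advice above can be illustrated with a minimal sketch of one SGD step that uses a different learning rate per layer group. The group names (`backbone`, `new_head`), the toy weights, and the learning-rate values are all assumptions chosen for illustration; they are not taken from the talk.

```python
# Sketch of a fine-tuning update with per-group learning rates.
# All names and numbers here are hypothetical.

def sgd_step(params, grads, lrs):
    """One SGD update where each parameter group has its own learning rate."""
    return {
        name: [w - lrs[name] * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }

# "backbone" stands for the layers kept from the large-dataset model,
# "new_head" for the freshly initialised last layer of the new task.
params = {"backbone": [0.50, -0.20], "new_head": [0.00, 0.00]}
grads = {"backbone": [1.0, 1.0], "new_head": [1.0, 1.0]}
lrs = {"backbone": 0.001, "new_head": 0.01}  # assumed: new layer 10x larger

updated = sgd_step(params, grads, lrs)
```

With equal gradients, the pre-trained weights move ten times less than the new layer, which is the "do not forget what was learnt" behaviour the slide asks for.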

Modelling Label Noise
• Model instances as noise-free, pure random, or confusing noise
• Train model to clean label noise to then train the main model
• Requires supervised clean set to learn noise (≈ 50%)
[Figure: four pie charts showing the proportions of noise-free, pure random, and confusing-noise instances]
Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang. CVPR, 2015.

Jointly Cleaning and Predicting Labels
• Trains one CNN to correct labels jointly with a prediction CNN
• Requires supervised clean set to learn noise (≈ 0.5%)
[Figure: a training sample with an image and noisy labels passes through a CNN feature extractor; the visual features feed a label cleaning network and a multi-label classifier, and the cleaned labels supervise the classifier. Example noisy label set: cuisine, dish, produce, coconut, food, dim sum food, dessert, xiaolongbao. Example cleaned label set: cuisine, dish, food, dim sum food, xiaolongbao]
[Figure: architecture detail. The label cleaning network concatenates low-dimensional embeddings of the noisy labels and the visual features and uses an identity skip-connection; samples with human-rated (verified) labels train the cleaning network, samples with only noisy labels train the image classifier on the cleaned labels, with no gradient propagation between the two]
Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie. CVPR, 2017.

Modelling Fashion Style

Fashion Styles
• Used for similarity search and recommendation
• Necessary step for understanding fashion
• Subjective, temporal, location-specific, and ambiguous
  • Large intra-class variability
  • Constantly in evolution
  • Not always well defined
• Recent approaches treat it as a weakly-supervised learning problem

Modelling Fashion Styles [Simo-Serra et al. CVPR 2016]
[Figure: mean images of Hipster Wars (1,893 images), Fashion144k (277,527 images), Places (2,469,371 images), and ImageNet (1,331,167 images)]
1. Pre-trained networks limit architecture and application
2. Abundant data with incomplete and noisy user-provided tags
3. Problem? Standard training for classification is not robust to noisy data
4. Solution: Jointly use a ranking loss with a classification loss

Objective
• Learn compact, discriminative representations of images with Convolutional Neural Networks
• Exploit weak data in the form of incomplete and noisy user-provided tags
• Optimize for comparisons with L2 distance
[Figure: example outfit images with their user-provided tags, such as Black-Pants, Blue-Shirt, Gray-Scarf, White-Sweater, Black-Bag, Yellow-Shoes]

Approach
• Define similarity on noisy binary user tags
• Build image triplets with anchor image, similar image, and dissimilar image
• Ranking loss encourages similar images to have smaller distances than dissimilar images
• Classification loss stabilizes and accelerates learning
[Figure: three feature CNNs with shared parameters process the anchor, similar, and dissimilar images; a classifier is attached to the features]
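To make the combination of a ranking loss and a classification loss concrete, here is a minimal sketch. It swaps in a common margin-based (hinge) triplet loss and a binary cross-entropy over tags; the talk does not state the exact loss formulas, so the margin, the loss weight, and both loss forms are assumptions.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ranking_loss(anchor, similar, dissimilar, margin=1.0):
    """Hinge triplet loss (assumed form): zero once the dissimilar image is
    at least `margin` farther from the anchor than the similar image."""
    return max(0.0, margin + l2_distance(anchor, similar)
               - l2_distance(anchor, dissimilar))

def tag_classification_loss(probs, tags):
    """Mean binary cross-entropy between predicted tag probabilities
    and noisy 0/1 user tags (assumed form)."""
    eps = 1e-12  # avoids log(0)
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, tags)
    ) / len(tags)

def joint_loss(anchor, similar, dissimilar, probs, tags, weight=0.1):
    """Ranking loss plus a weighted classification loss (weight is hypothetical)."""
    return (ranking_loss(anchor, similar, dissimilar)
            + weight * tag_classification_loss(probs, tags))
```

The ranking term only looks at relative distances, so a single wrong tag matters less than it would for a pure classifier; the classification term gives a denser training signal.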

Cleaning the Dataset
• Images obtained from chictopia.com
• Annotate 3,000 images as good/bad (2-3 hours)
• Fine-tune VGG network to predict good images
• Use network scores to filter data
• Fewer images but better quality
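The filtering step can be sketched as a simple threshold on the network's "good image" score. The file names, the scores, and the 0.5 threshold are hypothetical; the talk does not say which cut-off was used.

```python
def filter_by_score(scored_images, threshold=0.5):
    """Keep only images whose predicted 'good' score reaches the threshold."""
    return [name for name, score in scored_images if score >= threshold]

# Hypothetical (image, score) pairs as a fine-tuned classifier might output.
scores = [("a.jpg", 0.92), ("b.jpg", 0.31), ("c.jpg", 0.77)]
kept = filter_by_score(scores, threshold=0.5)
```

Raising the threshold trades dataset size for quality, which is the "fewer images but better quality" trade-off on the slide.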

Implementation
• 128-dimension features from whole images
• Pre-training with classification loss only
• Batches formed by selecting anchor images and randomly sampling until similar/dissimilar criterion is met
[Figure: two example triplets, each with a dissimilar, an anchor, and a similar image]
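The sampling procedure can be sketched as follows, assuming that similarity between two images is the intersection over union of their tag sets and that fixed thresholds decide what counts as similar or dissimilar. The thresholds, the `max_tries` limit, and the toy tags are assumptions for illustration.

```python
import random

def tag_iou(tags_a, tags_b):
    """Intersection over union of two tag sets (0.0 if both are empty)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def sample_triplet(anchor, tags, rng, sim_thresh=0.5, dis_thresh=0.1, max_tries=1000):
    """Draw random images until one similar and one dissimilar image are found."""
    names = [n for n in tags if n != anchor]
    if not names:
        return None
    similar = dissimilar = None
    for _ in range(max_tries):
        candidate = rng.choice(names)
        iou = tag_iou(tags[anchor], tags[candidate])
        if similar is None and iou >= sim_thresh:
            similar = candidate
        elif dissimilar is None and iou <= dis_thresh:
            dissimilar = candidate
        if similar is not None and dissimilar is not None:
            return anchor, similar, dissimilar
    return None  # criterion not met within max_tries

# Hypothetical images with noisy user tags.
tags = {
    "img0": {"black-pants", "blue-shirt", "gray-scarf"},
    "img1": {"black-pants", "blue-shirt", "black-bag"},
    "img2": {"red-wedges", "ivory-shorts"},
    "img3": {"white-hat", "white-shirt"},
}
triplet = sample_triplet("img0", tags, random.Random(0))
```

Images whose tag overlap falls between the two thresholds are never used with that anchor, which keeps ambiguous pairs out of the ranking loss.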

Results
• Trained on Fashion144k dataset [Simo-Serra et al. CVPR, 2015]
• Evaluation on Hipster Wars dataset [Kiapour et al. ECCV, 2014]
• Task: fashion style prediction

Table 1: Similarity search (no training).
feature          dim.    top-1   top-2   top-3
Ours Joint       128     63.5    79.9    86.3
VGG M            4096    53.2    71.7    81.3
VGG 16           4096    53.2    71.5    80.4
VGG M 128        128     44.6    64.0    76.2
VGG 16 Places    4096    40.1    61.0    72.0

Table 2: Linear classifier evaluated on 100 random 9:1 train-test splits.
feature          params  dim.    acc.    pre.    rec.    iou
Ours Joint       1.6M    128     75.9    75.4    76.5    61.5
Ours Ranking     1.6M    128     74.5    74.2    74.5    59.6
Ours Class.      1.6M    128     73.5    71.7    74.1    57.3
Kiapour et al.   -       39,168  70.6    70.6    70.4    54.6
VGG M            99M     4096    71.9    72.9    70.9    56.2
VGG 16           134M    4096    70.1    70.5    69.7    54.8
VGG M 128        82M     128     63.5    62.8    63.5    46.3
VGG 16 Places    134M    4096    57.4    57.6    59.4    41.5
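One plausible reading of the top-k columns in the similarity-search table is: a query counts as correct if its style label appears among its k nearest neighbours under L2 distance. The sketch below implements that reading; the exact protocol is an assumption, and the features and labels are toy values.

```python
import math

def top_k_accuracy(queries, gallery, k):
    """Fraction of queries whose label appears among the k nearest
    gallery items under L2 distance."""
    hits = 0
    for q_vec, q_label in queries:
        ranked = sorted(gallery, key=lambda item: math.dist(q_vec, item[0]))
        if any(label == q_label for _, label in ranked[:k]):
            hits += 1
    return hits / len(queries)

# Hypothetical 2-D features with style labels.
gallery = [((0.0, 0.0), "goth"), ((1.0, 0.0), "preppy"), ((5.0, 5.0), "hipster")]
queries = [((0.1, 0.0), "goth"), ((0.9, 0.1), "goth")]

top1 = top_k_accuracy(queries, gallery, k=1)
top2 = top_k_accuracy(queries, gallery, k=2)
```

No classifier is trained here, which matches the "no training" note on the table: only distances between features are compared.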

Visualizing Descriptor Change
[Figure: for each input image, the descriptor norm and the first three PCA components, shown for ours and for fine-tuned VGG M 128]

Visualizing the Latent Space
[Figure: example groups in the latent space: Black Dress; White Spotted Navy Dress; Blue Dress; Light Pink and White Dress. Annotations: independent of ethnicity and background; transition from no pattern to patterns; amplified patterns]

Learning the Latent “Look” [Hsiao and Grauman. ICCV, 2017]
• Learn consistent and diverse styles
• Use a polytopic model on localized attributes
[Figure: retrieval results for a query. Instance match: low diversity. Label-based: inconsistent. Latent looks: consistent, diverse]

PolyTopic Model
• Based on Latent Dirichlet Allocation (LDA)
  • In LDA, latent topics account for a word distribution
  • Here, latent topics account for an attribute distribution
• PolyTopic model from NLP
  • In NLP, used for aligned corpora in multiple languages
  • Here, used for aligned attributes of many body regions
• Body regions: outer layer, upper body, lower body, hosiery
[Figure: plate diagram with hyperparameters α and β, variables θ, z, x, and ϕ, and plates N (outer) through N (hosiery), K (style), and M (outfit)]
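The plate diagram can be read as the following generative process. This is a sketch that assumes the standard LDA-style construction used by polylingual topic models in NLP, with body regions playing the role of languages; the index names k, m, n, r and the per-region count N_r are introduced here for illustration.

```latex
\begin{align*}
\phi_k^{(r)} &\sim \mathrm{Dirichlet}(\beta)
  && \text{for each style } k = 1,\dots,K \text{ and body region } r \\
\theta_m &\sim \mathrm{Dirichlet}(\alpha)
  && \text{for each outfit } m = 1,\dots,M \\
z_{m,n}^{(r)} &\sim \mathrm{Multinomial}(\theta_m)
  && \text{for each attribute } n = 1,\dots,N_r \text{ of region } r \\
x_{m,n}^{(r)} &\sim \mathrm{Multinomial}\big(\phi^{(r)}_{z_{m,n}^{(r)}}\big)
\end{align*}
```

Under this reading, all body regions of one outfit share a single style mixture θ, while each region has its own attribute distributions ϕ per style.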

Obtaining Attributes
• New dataset proposed
  • 195 localized attributes
  • 70-600 positives per attribute from Google search
  • 2,000 additional Chictopia images as negatives
  • Total of 18,878 images
• Piecewise training by dividing attributes into 6 groups: material, shape, …
• Use RCNN to provide people crops
• Train ResNet50 for the different attributes
• DeepLab VGG16 with DenseCRF for color and items

Results
• Using additional supervised data (attributes) is beneficial
• Evaluation on HipsterWars and DeepFashion
• Significantly outperforms other approaches
• Model allows styles to be mixed and summarized naturally
[Figure: example images grouped by style: Pinup, Goth, Preppy, Hipster, Bohemian]

               HipsterWars                  DeepFashion
               Avg AP        NMI            Avg AP             NMI
StyleNet       0.39          0.20           0.0501             0.0011
ResNet         0.30          0.16           0.0524             0.0004
Attr-ResNet    0.35          0.18           0.0615             0.0002
Attributes     0.28 / 0.32   0.19 / 0.28    0.0560 / 0.1294    0.0017 / 0.0082
PolyLDA        0.50 / 0.53   0.21 / 0.31    0.0647 / 0.1762    0.0116 / 0.0227

Predicting Fashionability

Fashionability
• Metric that measures how fashionable a subject is
  • Does not exist
• Exploit metadata as a proxy for fashionability
• Jointly consider different factors to predict

Predicting Fashionability [Simo-Serra et al. CVPR 2015]
• Predict user, outfit, setting, and fashionability from a single image
• Afterwards recommend outfit to increase fashionability
[Figure: example prediction for an input image]
• Setting: Urban
• User: Female, Age ~25
• Outfit: Blouse, Skirt, Boots, Bag, Gloves
• Evaluation: Fashionability Score 4
• Recommendation: Blue Jacket (8)

Dataset
[Figure: example post (user from Seoul; 410 votes, 8 comments, 82 favourites; tags Chic, Brunch, Fall), annotated with the information collected for each post]
• Photo
• Garments
• Tags
• Colours
• Comments
• Post details: votes, comments, favourites
• User details: followers, location
• Date

Dataset
[Figure: four histograms of post votes: raw votes, logarithm of votes, time-normalized votes, and binned votes. The transformations shown are logarithm, time normalization, and non-linear binning]
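Two of the vote transformations named on this slide, the logarithm and the non-linear binning, can be sketched as below. The exact procedure is not given in the talk: `log1p`, equal-count bins, and the choice of 10 bins are assumptions, the time normalization is left out, and the vote counts are made up.

```python
import math

def log_votes(votes):
    """Compress the heavy tail of raw vote counts (log(1 + v) handles zero votes)."""
    return [math.log1p(v) for v in votes]

def equal_count_bins(values, n_bins=10):
    """Assign each value a score in 1..n_bins so that bins hold roughly
    equal numbers of posts. Ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = 1 + rank * n_bins // len(values)
    return scores

# Hypothetical vote counts for ten posts.
votes = [0, 3, 12, 40, 410, 7, 95, 1, 20, 250]
scores = equal_count_bins(log_votes(votes), n_bins=10)
```

Equal-count bins are non-linear in the vote count: the bin edges are close together where posts are dense and far apart in the long tail.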

Model
• Learn feature representation with deep network
• Interpretable results with Conditional Random Field model
[Figure: both diagrams are built on the factors Fans, Personal, Scene, Location, Singles, Colours, Garments, Comments, ΔT, Tags, and Style; the deep network ends in a softmax]

Results
• Current outfit: Pink/Black Misc. (5). Recommendations: Pastel Dress (8), Black/Blue Going out (8), Black Casual (8)
• Current outfit: Pink Outfit (3). Recommendations: Heels (8), Pastel Shirts/Skirts (8), Black/Gray Tights/Sweater (5)
• Current outfit: Pink/Blue Shoes/Dress Shorts (3). Recommendations: Black/Gray Tights/Sweater (5), Black Casual (5), Black Boots/Tights (5)
• Current outfit: Blue with Scarf (3). Recommendations: Heels (8), Pastel Shirts/Skirts (8), Black Casual (8)
• Current outfit: Pink/Blue Shoes/Dress Shorts (3). Recommendations: Black Casual (7), Black Heavy (3), Navy and Bags (3)
• Current outfit: Formal Blue/Brown (5). Recommendations: Pastel Shirts/Skirts (9), Black/Blue Going out (8), Black Boots/Tights (8)
[Figure: outfit styles by trimester for Los Angeles and Manila. Styles: Black Heavy, Pastel Shirts/Skirts, Shoes and Blue Dress, Pink/Black Misc., Heels, Black Casual, Pink Outfit, Shirts and Jeans, Blue with Scarf, Black with Bag/Glasses, Pastel Dress, Black/Gray Tights/Sweater, Pink/Blue Shoes/Dress/Shorts, Bags/Dresses, Navy and Bags, Brown/Blue Jacket, White/Black Blouse/Heels, Black Boots/Tights, Formal Blue/Brown, Black/Blue Going out]

Fashion Datasets

Fashion Datasets Overview
• Fashion Segmentation
  • Fashionista
• Attribute Prediction
  • DeepFashion
  • PaperDoll
  • Fashion550k
  • Learning the Latent Look
  • StreetStyle-27k
  • Fashion Culture Database
• Fashion Style
  • HipsterWars
  • FashionStyle14
• Fashionability
  • Fashion144k
• Similarity Search
  • Runway
  • Street2Shop
• Temporal
  • Amazon

Fashion Datasets Overview
Dataset            #Images       Annotations
DeepFashion        800,000       Weak attributes, pose
Paperdoll          339,797       Weak attributes
Runway             348,598       Weak attributes, pairs
Fashion144k        144,169       Weak attributes, fashionability
Fashion550k        402,572       Weak attributes
StreetStyle-27k    27,000        Attributes
Fashion Culture    76,532,519    Weak location attribute
Fashionista        685           Pixel-level annotations
HipsterWars        1,893         Style label
FashionStyle14     13,126        Style label
Street2Shop        39,479        Street and shop pairs
Amazon             5,933,184     Weak attributes, ratings, pairs

Fashionista
• Per-pixel garment annotation
• Per-image labels and pose obtained by crowd sourcing
[Figure: four annotated examples with per-pixel labels such as null, shoes, shirt, jeans, tights, jacket, dress, hat, heels, shorts, blouse, bracelet, wedges, top, stockings, hair, skin]
Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg. CVPR, 2012.

Weak Attribute Datasets
• Easily able to obtain large datasets (chictopia…)
• Label quality is problematic
• Focus on methodology and scalability
• Paperdoll, Fashion144k/550k, Fashion Culture, Amazon, …
[Figure: example images with their weak tag lists, such as "accessories, boots, dress, jacket, sweater" or "flats, necklace, shirt, skirt"]
Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg. ICCV, 2013.

Crowd-sourced Datasets
• Hard and expensive to scale in size
• Applicability limited to simpler problems
• HipsterWars, Street2Shop, StreetStyle-27k, …
[Figure: histogram of number of players against number of games played]
M. Hadi Kiapour, Kota Yamaguchi, Alexander C. Berg, Tamara L. Berg. ECCV, 2014.
M. Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg. ICCV, 2015.

Expert-curated Datasets
• Experts are hard to come by
• Share the problems of crowd-sourced datasets
• Important as an evaluation tool
• FashionStyle14, …
Moeko Takagi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa. ICCV-CVF, 2017.

Fashion MNIST
“MNIST should only be used to test if code is working.” (Twitter people)
• Drop-in replacement for MNIST
Han Xiao, Kashif Rasul, Roland Vollgraf. arXiv, 2017.

Wrapping Up

Conclusion
• In computer vision for fashion, data is king
  • Lots of data out there and growing!
  • Plagued by noise, class imbalance, …
• Important directions
  • Unsupervised / Weakly-supervised learning
  • Attributes
  • Dealing with multi-modality
  • High quality datasets (evaluation)

Questions? Thanks for your attention!
