is popular at a particular time, especially in clothes, hair, make-up, etc.
• Oxford Dictionary
  • A popular or the latest style of clothing, hair, decoration, or behaviour.
• Merriam-Webster Dictionary
  a. a prevailing custom, usage, or style
  b. (1) the prevailing style (as in dress) during a particular time; (2) a garment in such a style
  c. social standing or prominence especially as signalized by dress or conduct
in obtaining such data
  • Limits applicability
• Highly subjective (multi-modal)
  • Fashion experts do not agree with each other
  • Nearly impossible to obtain good annotations
  • Hard to evaluate
• Fine-grained and dependent on many factors
  • Must think of combinations
  • Difference between stockings, leggings, and tights?
• Extreme class imbalance
• Fully supervised learning is nearly impossible
data
  • Lowest performance
  • Ignores possible labels
• Weak-label Learning
  • Attempts to model label noise
  • Exploits all possible data
• Supervised Learning
  • Highest performance
  • Requires expensive annotations
  • Impossible in some cases
class label
• Standard Computer Vision/Deep Learning problem
• Makes use of large datasets for performance
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
• Remove the last layer and replace it with a new layer for the new task
• Continue training with the new data
• Significantly improves results on small datasets
• Use a small learning rate so as not to forget what was learnt
• The learning rate is usually higher on the new layer
[Figure: a pretrained network (input to output) fine-tuned on a small dataset]
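The fine-tuning recipe above can be sketched in plain Python on a toy problem. Everything here is an illustrative assumption: a one-layer "pretrained" feature extractor `W_pre`, a freshly initialised logistic output layer that replaces the old last layer, and the two hypothetical learning rates `lr_pretrained` (small, for the pretrained weights) and `lr_head` (larger, for the new layer). In practice this would be done with a deep learning framework's per-parameter-group learning rates rather than hand-written gradients.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, v, b, x):
    # "Pretrained" feature layer: h_j = relu(W[j] . x)
    h = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W]
    # New task-specific last layer: a single logistic output
    p = sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + b)
    return h, p


def log_loss(W, v, b, data):
    total = 0.0
    for x, y in data:
        _, p = forward(W, v, b, x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fine_tune(W_pretrained, data, epochs=200, lr_pretrained=0.001, lr_head=0.1, seed=0):
    rng = random.Random(seed)
    W = [row[:] for row in W_pretrained]  # copy: leave the original weights intact
    # Replace the old last layer with a freshly initialised one (hypothetical init)
    v = [rng.uniform(-0.1, 0.1) for _ in W]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(W, v, b, x)
            err = p - y  # d(log loss)/d(logit) for a sigmoid output
            # Small steps on the pretrained layer (gradient uses v before its update)
            for j, row in enumerate(W):
                if h[j] > 0.0:
                    for k in range(len(row)):
                        row[k] -= lr_pretrained * err * v[j] * x[k]
            # Larger steps on the new layer
            for j in range(len(v)):
                v[j] -= lr_head * err * h[j]
            b -= lr_head * err
    return W, v, b


# Toy "small dataset" (made up for illustration)
W_pre = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.1, 1.0], 0), ([0.2, 0.8], 0)]
W, v, b = fine_tune(W_pre, data)
print(log_loss(W, v, b, data))
```

The printed loss should end up well below log 2 ≈ 0.693, the loss of an untrained (all-zero) head; the pretrained weights move only slightly because of the small learning rate.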
or confusing noise
• Train a model to clean the label noise, then use it to train the main model
• Requires a supervised clean set to learn the noise (≈ 50%)
Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang. CVPR, 2015.
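One way to model such noise, loosely in the spirit of the noise-type model above, is sketched below as a simplified illustration rather than the paper's exact formulation. The main network predicts the clean label distribution p(y|x), a second head predicts the noise type z ∈ {none, random, confusing}, and the distribution of the observed label mixes the three cases. The uniform treatment of random noise, the `confusion` matrix, and all function names are assumptions.

```python
import math


def noisy_label_distribution(p_true, p_noise_type, confusion):
    """Predicted distribution of the observed (noisy) label.

    p_true[y]       : p(y | x), clean-label probabilities from the main model
    p_noise_type    : p(z | x) for z in {"none", "random", "confusing"}
    confusion[y][j] : p(noisy = j | true = y) under confusing noise (rows sum to 1)
    """
    n = len(p_true)
    p_noisy = [0.0] * n
    for y, py in enumerate(p_true):
        for j in range(n):
            none = 1.0 if j == y else 0.0            # label kept as is
            rand = 0.0 if j == y else 1.0 / (n - 1)  # assumption: uniform over other labels
            conf = confusion[y][j]                   # structured confusion between classes
            p_noisy[j] += py * (
                p_noise_type["none"] * none
                + p_noise_type["random"] * rand
                + p_noise_type["confusing"] * conf
            )
    return p_noisy


def noisy_nll(p_noisy, observed):
    # Training signal: negative log-likelihood of the label actually observed
    return -math.log(p_noisy[observed])


# Made-up numbers for illustration
p_true = [0.7, 0.2, 0.1]
confusion = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
p_type = {"none": 0.6, "random": 0.1, "confusing": 0.3}
p_noisy = noisy_label_distribution(p_true, p_type, confusion)
```

Because the noise model is only applied on top of p(y|x), the clean-label head can be used on its own at test time.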
correct labels jointly with a prediction CNN
• Requires a supervised clean set to learn the noise (≈ 0.5%)
[Figure: a training sample with an image and noisy labels (cuisine, dish, produce, coconut, food, dim sum food, dessert, xiaolongbao); a CNN feature extractor provides visual features to a label cleaning network, which outputs the cleaned label set (cuisine, dish, food, dim sum food, xiaolongbao) used as supervision for a multi-label classifier]
Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie. CVPR, 2017.
[Figure: architecture of the label cleaning network and image classifier. Visual features from a convolutional network are concatenated with a low-dimensional embedding of the noisy labels (a d-dimensional vector with labels in {0, 1} for each class), passed through linear layers, and combined with the noisy labels through an identity skip-connection to produce cleaned labels; the image classifier (linear layer and sigmoid) outputs predicted labels. Training samples come from a set with human-rated (verified) labels and from a set with only noisy labels; the legend marks connections with no gradient propagation.]
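The core of the label cleaning idea, a correction added to the noisy labels through an identity skip-connection, can be sketched as follows. This is a simplification under stated assumptions: a single linear layer (hypothetical names `W_label`, `W_visual`, `bias`) stands in for the embedding and linear layers in the figure, the output is clipped to [0, 1], and the mean absolute difference to the verified labels is assumed as the cleaning objective.

```python
def linear(W, x):
    # Plain matrix-vector product: one output per row of W
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def clean_labels(noisy, visual, W_label, W_visual, bias):
    """noisy: d labels in {0, 1}; visual: image feature vector from the CNN."""
    correction = [
        a + c + b
        for a, c, b in zip(linear(W_label, noisy), linear(W_visual, visual), bias)
    ]
    # Identity skip-connection: start from the noisy labels, add the learned
    # correction, and clip back to the valid label range [0, 1]
    return [min(1.0, max(0.0, y + dy)) for y, dy in zip(noisy, correction)]


def cleaning_loss(cleaned, verified):
    # Assumed objective: mean absolute difference to the human-verified labels
    return sum(abs(c - v) for c, v in zip(cleaned, verified)) / len(verified)


# Hypothetical labels ("food", "coconut", "dessert") and made-up weights
noisy = [1, 1, 1]
visual = [0.3, 0.9]
W_label = [[0.0] * 3 for _ in range(3)]
W_visual = [[0.0, 0.0], [0.0, -2.0], [0.0, -2.0]]
bias = [0.0, 0.0, 0.0]
cleaned = clean_labels(noisy, visual, W_label, W_visual, bias)
```

With all weights at zero the network returns the noisy labels unchanged, which is the point of the skip-connection: it only has to learn the corrections.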
Necessary step for understanding fashion
• Subjective, temporal, location-specific, and ambiguous
• Large intra-class variability
• Constantly evolving
• Not always well defined
• Recent approaches treat it as a weakly-supervised learning problem
[Figure: mean images of …Wars (1,893 images), Fashion144k (277,527 images), Places (2,469,371 images), and ImageNet (1,331,167 images)]
1. Pre-trained networks limit the architecture and application
2. Abundant data with incomplete and noisy user-provided tags
3. Problem? Standard training for classification is not robust to noisy data
4. Solution: jointly use a ranking loss with a classification loss
[Figure: an anchor image I with a similar image I⁺ and a dissimilar image I⁻]
• Define similarity on noisy binary user tags
• Build image triplets with an anchor image, a similar image, and a dissimilar image
• Ranking loss encourages similar images to have smaller distances than dissimilar images
• Classification loss stabilizes and accelerates learning
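A minimal sketch of the joint objective described above, assuming tag similarity is measured as the intersection-over-union of the two tag sets and using a standard hinge (margin) triplet loss. The original work may use a different ranking loss, and the `alpha` and `margin` values are arbitrary placeholders.

```python
import math


def tag_similarity(tags_a, tags_b):
    """Intersection-over-union of two images' binary user tags (as sets)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ranking_loss(f_anchor, f_sim, f_dis, margin=1.0):
    # Hinge triplet loss: zero once the similar image is closer to the anchor
    # than the dissimilar one by at least `margin` (squared distances)
    return max(0.0, sq_dist(f_anchor, f_sim) - sq_dist(f_anchor, f_dis) + margin)


def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for the classification branch
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def joint_loss(f_anchor, f_sim, f_dis, logits, label, alpha=0.1, margin=1.0):
    # Ranking loss on the feature triplet plus a weighted classification loss
    return ranking_loss(f_anchor, f_sim, f_dis, margin) + alpha * cross_entropy(logits, label)


sim = tag_similarity({"black", "casual", "boots"}, {"black", "boots"})
```

The classification term gives a dense gradient even when every triplet in a batch already satisfies the margin, which is one way to read "stabilizes and accelerates learning".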
3000 images as good/bad (2-3 hours)
• Fine-tune a VGG network to predict good images
• Use the network scores to filter the data
• Fewer images but better quality
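The filtering step amounts to thresholding the classifier's score. In this sketch `score_fn` stands in for the fine-tuned VGG network; the threshold, file names, and scores are hypothetical.

```python
def filter_dataset(images, score_fn, threshold=0.5):
    """Keep images whose predicted "good" score reaches the threshold.

    Returns the kept images and the fraction of the dataset retained.
    """
    kept = [img for img in images if score_fn(img) >= threshold]
    fraction = len(kept) / len(images) if images else 0.0
    return kept, fraction


# Made-up scores for illustration only
scores = {"img_001.jpg": 0.92, "img_002.jpg": 0.15, "img_003.jpg": 0.61, "img_004.jpg": 0.40}
kept, fraction = filter_dataset(list(scores), scores.get)
```

Raising the threshold trades dataset size for label quality, matching the "fewer images but better quality" trade-off.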
classification loss only
• Batches are formed by selecting anchor images and randomly sampling until the similar/dissimilar criterion is met
[Figure: example triplets, each showing an anchor image with a similar and a dissimilar image]
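The batch construction can be sketched as rejection sampling over tag similarity. The IoU similarity, the thresholds `sim_thresh` and `dis_thresh`, and the toy tag sets are assumptions chosen for illustration, not values from the original work.

```python
import random


def iou(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sample_triplet(tags, rng, sim_thresh=0.75, dis_thresh=0.1, max_tries=1000):
    """Pick a random anchor, then draw random images until one similar and one
    dissimilar image (by tag IoU) have been found. Thresholds are illustrative."""
    anchor = rng.randrange(len(tags))
    similar = dissimilar = None
    for _ in range(max_tries):
        j = rng.randrange(len(tags))
        if j == anchor:
            continue
        r = iou(tags[anchor], tags[j])
        if similar is None and r > sim_thresh:
            similar = j
        elif dissimilar is None and r < dis_thresh:
            dissimilar = j
        if similar is not None and dissimilar is not None:
            return anchor, similar, dissimilar
    return None  # criterion never met for this anchor


def make_batch(tags, batch_size, seed=0, max_anchors=1000):
    rng = random.Random(seed)
    batch = []
    for _ in range(max_anchors):
        if len(batch) == batch_size:
            break
        triplet = sample_triplet(tags, rng)
        if triplet is not None:  # skip anchors with no valid partners
            batch.append(triplet)
    return batch


# Toy tag sets (hypothetical)
tags = [
    {"black", "casual"}, {"black", "casual"},
    {"pastel", "skirt"}, {"pastel", "skirt"},
]
batch = make_batch(tags, batch_size=4)
```

Capping both the number of draws per anchor and the number of anchors keeps the loop from running forever when the tags contain no valid partners.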
Learn consistent and diverse styles
• Use a polytopic model on localized attributes
[Figure: retrieval results for a query image. Instance match: low diversity; label-based: inconsistent; latent looks: consistent and diverse]
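Latent topics account for a word distribution
• PolyTopic model from NLP
• Used for aligned corpora in multiple languages
[Figure: plate diagram. A per-outfit style mixture θ (prior α, M outfits) generates topic assignments z and observed attributes x in each body region, from N (outer) to N (hosiery); the K styles have per-region distributions ϕ with prior β]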
Latent topics account for an attribute distribution
• PolyTopic model from NLP
• Used for aligned attributes of many body regions
• Body regions: outer layer, upper body, lower body, hosiery
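The generative story of such a model can be sketched by sampling one outfit. This follows the usual polylingual topic model structure (one style mixture θ per outfit shared across body regions, separate per-style attribute distributions ϕ for each region), which is our reading of the slide rather than a confirmed implementation; the attribute counts per region, the number of styles `K`, and the priors are made up.

```python
import random


def dirichlet(alpha, k, rng):
    # Sample from a symmetric Dirichlet via normalised Gamma draws
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_outfit(phi, n_per_region, alpha, rng):
    """Generate attribute indices for one outfit.

    phi[region][k] is the attribute distribution of style k in that region.
    One style mixture theta is shared by all regions of the outfit; each region
    draws its attributes from its own per-style distributions.
    """
    n_styles = len(next(iter(phi.values())))
    theta = dirichlet(alpha, n_styles, rng)
    outfit = {}
    for region, per_style in phi.items():
        attrs = []
        for _ in range(n_per_region):
            z = rng.choices(range(n_styles), weights=theta)[0]  # style for this slot
            dist = per_style[z]
            attrs.append(rng.choices(range(len(dist)), weights=dist)[0])  # attribute index
        outfit[region] = attrs
    return theta, outfit


rng = random.Random(0)
# Number of attributes per body region (made up)
regions = {"outer layer": 4, "upper body": 5, "lower body": 5, "hosiery": 3}
K = 3  # number of latent styles (made up)
phi = {r: [dirichlet(0.5, n, rng) for _ in range(K)] for r, n in regions.items()}
theta, outfit = sample_outfit(phi, n_per_region=2, alpha=1.0, rng=rng)
```

Sharing θ across regions is what ties the regions together: an outfit that leans towards one style tends to draw that style's attributes in every body region at once.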
• 70-600 positives per attribute from Google search
• 2,000 additional Chictopia images as negatives
• Total of 18,878 images
• Piecewise training by dividing attributes into 6 groups: material, shape, …
• Use R-CNN to provide person crops
• Train ResNet-50 for the different attributes
• DeepLab (VGG16) with DenseCRF for color and items
outfit, setting, and fashionability from a single image
• Afterwards, recommend an outfit to increase fashionability
[Figure: example. Input image; setting: urban; user: female, age ~25; outfit: blouse, skirt, boots, bag, gloves; evaluation: fashionability score 4; recommendation: Blue Jacket (8)]
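Given any model that predicts fashionability for an outfit with the user and setting held fixed, the recommendation step reduces to ranking alternative outfits. In this sketch `score_fn`, the outfit names, and the scores are invented placeholders for the model's predictions.

```python
def recommend(current, candidates, score_fn, k=3):
    """Return the current outfit's score and up to k alternatives that the
    (hypothetical) score_fn predicts to be more fashionable, best first."""
    base = score_fn(current)
    better = [(score_fn(c), c) for c in candidates if c != current and score_fn(c) > base]
    better.sort(key=lambda item: (-item[0], item[1]))  # highest score, then name
    return base, better[:k]


# Invented scores for one user and setting, standing in for model predictions
predicted = {"current": 4, "blue jacket": 8, "pastel dress": 8, "black casual": 7, "heavy black": 2}
base, recs = recommend("current", list(predicted), predicted.__getitem__, k=2)
```

Only outfits predicted to beat the current one are returned, so an already well-scored outfit may get no recommendations at all.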
[Figure: example Chictopia post, annotated with the available fields: photo, garments, tags, colours, comments, post details (votes, comments, favourites), user details (followers, location), and date]
• Interpretable results with a Conditional Random Field model
[Figure: two model diagrams over the variables Fans, Personal, Location, Scene, Colours, Singles, Garments, Comments, ΔT, Tags, and Style: one ending in a Softmax output, one as a Conditional Random Field]
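To illustrate why a CRF gives interpretable results, here is a toy pairwise CRF with exact inference by enumeration. The three variables and their potentials are invented; the actual model uses the variables in the figure with learned potentials, and brute-force enumeration only works for tiny graphs. Each pairwise table can be read directly, for example how strongly a given outfit value favours being fashionable.

```python
import itertools
import math


def score(assign, unary, pairwise):
    # Log-potential of a full assignment: unary terms plus pairwise terms
    s = sum(unary[v][assign[v]] for v in unary)
    s += sum(table[assign[a]][assign[b]] for (a, b), table in pairwise.items())
    return s


def conditional(query, evidence, domains, unary, pairwise):
    """Exact p(query | evidence) by summing out the remaining variables.
    Enumeration is exponential in their number: tiny toy models only."""
    free = [v for v in domains if v != query and v not in evidence]
    weights = []
    for q in range(domains[query]):
        total = 0.0
        for values in itertools.product(*(range(domains[v]) for v in free)):
            assign = dict(evidence)
            assign[query] = q
            assign.update(zip(free, values))
            total += math.exp(score(assign, unary, pairwise))
        weights.append(total)
    z = sum(weights)
    return [w / z for w in weights]


# Toy model with invented variables and potentials
domains = {"outfit": 2, "setting": 2, "fashionable": 2}
unary = {v: [0.0] * n for v, n in domains.items()}
pairwise = {
    ("outfit", "fashionable"): [[1.0, 0.0], [0.0, 1.0]],  # outfit 1 favours "fashionable"
}
p = conditional("fashionable", {"outfit": 1}, domains, unary, pairwise)
```

Here the unobserved `setting` is summed out; since it has no potentials attached, it does not change the result.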
[Figure: example recommendations, each showing the current outfit and recommended outfits with fashionability scores in parentheses; and plots of outfit types per trimester for Los Angeles and Manila. Outfit types: Black Heavy, Pastel Shirts/Skirts, Shoes and Blue Dress, Pink/Black Misc., Heels, Black Casual, Pink Outfit, Shirts and Jeans, Blue with Scarf, Black with Bag/Glasses, Pastel Dress, Black/Gray Tights/Sweater, Pink/Blue Shoes/Dress/Shorts, Bags/Dresses, Navy and Bags, Brown/Blue Jacket, White/Black Blouse/Heels, Black Boots/Tights, Formal Blue/Brown, Black/Blue Going out]
obtained by crowdsourcing
[Figure: example images with per-pixel clothing labels such as null, hair, skin, shoes, shirt, jeans, tights, jacket, dress, hat, heels, shorts, blouse, bracelet, wedges, top, and stockings]
Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg. CVPR, 2012.
(Chictopia…)
• Label quality is problematic
• Focus on methodology and scalability
• Paperdoll, Fashion144k/550k, Fashion Culture, Amazon, …
[Figure: example images with their user-provided garment tags, such as accessories, bag, belt, blazer, boots, bracelet, cardigan, dress, flats, heels, jacket, necklace, pants, pumps, shirt, shoes, shorts, skirt, sweater, t-shirt, tights, and top]
Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg. ICCV, 2013.
• Applicable only to simpler problems
• HipsterWars, Street2Shop, StreetStyle-27k, …
[Figure: number of players versus number of games played]
M. Hadi Kiapour, Kota Yamaguchi, Alexander C. Berg, Tamara L. Berg. ECCV, 2014.
M. Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg. ICCV, 2015.
Shares the problems of crowd-sourced datasets
• Important as an evaluation tool
• FashionStyle14, …
Moeko Takagi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa. ICCV-CVF, 2017.
• Lots of data out there, and growing!
• Plagued by noise, class imbalance, …
• Important directions
  • Unsupervised / weakly-supervised learning
  • Attributes
  • Dealing with multi-modality
  • High-quality datasets (evaluation)