[EN] Has Image Recognition Technology Surpassed Human Abilities?

Mercari Tech Conf 2017
Has Image Recognition Surpassed Human Abilities? Item Image Recognition at Mercari: The Present and Future
Takuma Yamaguchi, Machine Learning Engineer

Introduction: Takuma Yamaguchi
• Fields of study: pattern recognition, image processing, machine learning
• PhD in Engineering
• Packaged software company
  • Image recognition and voice recognition algorithms
  • OCR / speech recognition engines
• Manufacturing company for transportation infrastructure
  • Operations research
  • Train fare calculation algorithms
  • Image recognition, computer vision
  • Turnstiles, parking, surveillance cameras
• Social gaming company
  • Developed and administered tools for data analysis
  • Hadoop / Spark / Hive / Presto
  • Server-side engineering
• Mercari, Inc. (2016 to present)
  • Image recognition and other machine learning technology
  • Server-side engineering

[Figure: ImageNet (ILSVRC) recognition error; human top-5 error: 5.1%]
Source: O. Russakovsky, et al., ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

Image Data Comparison: Mercari vs. ImageNet
Mercari
• Number of categories: 1,100+
• clothes, shoes, bags, books, cars, handmade, cosmetics, toys, etc.
ImageNet (ILSVRC)
• Number of categories: 1,000
• milk can, rugby ball, paper towel, earthstar, envelope, miniskirt/mini, cowboy hat/ten-gallon hat, trolleybus/trolley coach, etc.
Distinguishing items is about as difficult on Mercari as it is on ImageNet.
→ Perhaps machine learning can match human ability at recognizing items sold on Mercari?

Item Category Recognition
• Data
  • 1,108 classes
  • Training: 2,200,000 images
  • Testing: 110,000 images
• Framework: TensorFlow
• Model: deep neural networks (Inception-v3, https://github.com/tensorflow/models)
• Training environment: AWS EC2 p2.8xlarge
  • CPU x 32
  • GPU (Tesla K80) x 8
• Parameters
  • Learning rate: 0.1
  • Batch size: 256
  • Epochs: 15 (60 hours)
  • Almost no tuning
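The setup above implies a few derived figures worth sanity-checking: steps per epoch, total steps, and training throughput. The sketch below computes them under assumptions the slide does not confirm: the batch size of 256 is the global batch across all 8 GPUs, a partial final batch counts as one step, and the 60 hours cover all 15 epochs with no evaluation overhead.

```python
import math

# Figures taken from the slide
NUM_TRAIN_IMAGES = 2_200_000
BATCH_SIZE = 256        # assumed to be the global batch across all GPUs
EPOCHS = 15
TRAINING_HOURS = 60     # assumed to cover all 15 epochs
NUM_GPUS = 8


def training_stats(num_images: int, batch_size: int, epochs: int,
                   hours: float, num_gpus: int) -> dict:
    """Derive step counts and throughput implied by a training setup."""
    steps_per_epoch = math.ceil(num_images / batch_size)  # partial batch = 1 step
    total_steps = steps_per_epoch * epochs
    images_seen = num_images * epochs
    images_per_sec = images_seen / (hours * 3600)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "images_per_sec": images_per_sec,
        "images_per_sec_per_gpu": images_per_sec / num_gpus,
    }


stats = training_stats(NUM_TRAIN_IMAGES, BATCH_SIZE, EPOCHS, TRAINING_HOURS, NUM_GPUS)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the run processed roughly 150 images per second overall, or about 19 per GPU.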

Item Category Recognition
Top-5 error: 29.3%
• Recognition quality is high even without data cleansing or parameter tuning, but is it as good as a human?
• Items are properly recognized for most categories.
• What is happening in the remaining categories? Can we improve recognition quality even further?

Categories with Low Recognition Rates (High Error Rates)
• Women > Accessories > Others
• Women > Bags > Others
• Women > Tops > Others
• Women > Others > Others
• Men > Others
• Baby/Kids > Others > Others
• Handmade > Hobbies/Toys > Others
• Sports/Leisure > Others > Others
• Cosmetics/Perfume/Beauty > Others > Others
• Entertainment/Hobbies > Collections > Novelty items
• Entertainment/Hobbies > Collections > Others
• Entertainment/Hobbies > Others > Others
• Apartment/Decor > Others
• Others > Household goods/Travel > Household goods
• Others > Household goods/Travel > Others
• Others > Antique/Collections > Miscellaneous goods
• Others > Antique/Collections > Others
• Others > Others
Most errors occurred in “Others” categories. Sellers choose these precisely because no more specific category fits, so this result is somewhat expected.
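A ranking like the list above can be produced by grouping test results by their true category and sorting by per-category error rate. The sketch assumes each result is a (true category, top-5 predictions) pair. The helper name and sample data are hypothetical, and the prediction lists are shortened for brevity.

```python
from collections import defaultdict


def error_rate_by_category(results):
    """results: iterable of (true_category, top5_predictions) pairs.

    Returns a list of (category, error_rate, count), worst category first.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for true_cat, top5 in results:
        totals[true_cat] += 1
        if true_cat not in top5:
            misses[true_cat] += 1
    ranked = [(cat, misses[cat] / totals[cat], totals[cat]) for cat in totals]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical test results (prediction lists shortened)
results = [
    ("Men > Others", ["Men > Tops > T-shirts", "Men > Bags > Backpacks"]),
    ("Men > Others", ["Men > Others", "Men > Tops > T-shirts"]),
    ("Men > Tops > Polo Shirt", ["Men > Tops > Polo Shirt"]),
    ("Men > Tops > Polo Shirt", ["Men > Tops > Polo Shirt"]),
]
ranked = error_rate_by_category(results)
for category, rate, count in ranked:
    print(f"{category}: {rate:.0%} error over {count} images")
```

In practice a minimum image count per category would help, since rates from a handful of images are noisy.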

Analysis of Miscategorizations
• Babies/Kids > Girls outfit 100cm~ > Jackets (0.3682)
• Babies/Kids > Baby girls outfit ~95cm > Outerwear (0.3037)
• Babies/Kids > Baby boys/girls outfit ~95cm > Outerwear (0.0615)
• Babies/Kids > Boys/girls outfit 100cm~ > Jackets (0.0529)
• Babies/Kids > Baby boys outfit ~95cm > Outerwear (0.0481)
The numbers in parentheses are the model's recognition scores; the scores across all categories sum to 1.0.
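The five scores listed add up to 0.8344, and the rest of the probability mass is spread over the other categories. This is consistent with a softmax output layer, which Inception-v3 uses for classification. The following minimal sketch turns raw scores (logits) into probabilities and lists the top candidates. The logits are hypothetical and not taken from the model above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1.0."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, names, k=5):
    """Return the k most probable (name, probability) pairs, best first."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for four categories
names = [
    "Babies/Kids > Girls outfit 100cm~ > Jackets",
    "Babies/Kids > Baby girls outfit ~95cm > Outerwear",
    "Babies/Kids > Boys outfit 100cm~ > Jackets",
    "Babies/Kids > Toys > Musical box",
]
logits = [2.0, 1.8, 1.5, -1.0]
probs = softmax(logits)
for name, p in top_k(probs, names, k=3):
    print(f"{name} ({p:.4f})")
```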

Analysis of Miscategorizations
• Babies/Kids > Girls outfit 100cm~ > Jackets (0.3682)
• Babies/Kids > Baby girls outfit ~95cm > Outerwear (0.3037)
• Babies/Kids > Baby boys/girls outfit ~95cm > Outerwear (0.0615)
• Babies/Kids > Boys/girls outfit 100cm~ > Jackets (0.0529)
• Babies/Kids > Baby boys outfit ~95cm > Outerwear (0.0481)
The answer is… Babies/Kids > Boys outfit 100cm~ > Jackets (Answer: the category selected by our customer)

Analysis of Miscategorizations
• Women > Jackets/Outerwear > Pea coats (0.4849)
• Men > Jackets/Outerwear > Pea coats (0.3372)
• Babies/Kids > Boys outfit 100cm~ > Coats (0.0305)
• Babies/Kids > Boys/girls outfit 100cm~ > Coats (0.0289)
• Women > Jackets/Outerwear > Trench coats (0.0176)

Analysis of Miscategorizations
• Women > Jackets/Outerwear > Pea coats (0.4849)
• Men > Jackets/Outerwear > Pea coats (0.3372)
• Babies/Kids > Boys outfit 100cm~ > Coats (0.0305)
• Babies/Kids > Boys/girls outfit 100cm~ > Coats (0.0289)
• Women > Jackets/Outerwear > Trench coats (0.0176)
The answer is… Babies/Kids > Boys outfit 100cm~ > Jackets (Answer: the category selected by our customer)

Analysis of Miscategorizations
Babies/Kids > Kids outfit 100cm~ > Jackets
• Sometimes multiple categories fit the same image: the pink polka-dot jacket could be for boys as well as girls.
• Even humans have a hard time distinguishing items that differ only slightly in size.
• This applies not only to clothes but also to other items, such as shoes.
Interestingly, these error tendencies are very similar to human mistakes.

Analysis of Miscategorizations
Babies/Kids > Toys > Educational toys (0.7098)
Babies/Kids > Toys > Educational toys
Babies/Kids > Toys > Musical box (0.9111)

Analysis of Miscategorizations
Babies/Kids > Toys > Educational toys
Babies/Kids > Toys > Musical box (0.9111)
• Recognition of the “Looping” and “Mary” toys was still immature: training and data collection were insufficient for the diversity of these images.
• Item similarity can be calculated from the intermediate layers of a CNN.
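The activations of an intermediate layer can serve as an image's feature vector, and the cosine similarity between two such vectors gives a similarity score for the two items. The following sketch assumes the feature vectors have already been extracted. For Inception-v3, one candidate is the pooled activations just before the final classifier, but the talk does not say which layer was used. The 4-dimensional vectors and item IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, k=3):
    """Rank catalog items (item_id -> feature vector) by similarity to query."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical intermediate-layer features
catalog = {
    "item_a": [0.9, 0.1, 0.0, 0.3],
    "item_b": [0.0, 0.8, 0.6, 0.0],
    "item_c": [0.8, 0.2, 0.1, 0.4],
}
query = [1.0, 0.1, 0.0, 0.35]
print(most_similar(query, catalog, k=2))
```

Cosine similarity ignores vector magnitude, so two images with similar activation patterns but different overall activation strength still score as similar.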

What Can We Do to Improve Precision?
“No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences, in both quantity and quality.” (Fei-Fei Li, TED 2015)
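The quote's “hundreds of millions” follows from simple arithmetic: one picture every 200 ms is five pictures per second. The sketch below checks the figure assuming 12 waking hours per day, with 24 hours per day as an upper bound. Both waking-hour figures are assumptions and are not part of the quote.

```python
def pictures_seen(years: int, waking_hours_per_day: int, interval_ms: int = 200) -> int:
    """Estimate 'pictures' seen: one per eye movement every interval_ms while awake."""
    seconds_awake = years * 365 * waking_hours_per_day * 3600
    return seconds_awake * 1000 // interval_ms


for hours in (12, 24):  # assumed waking hours per day
    print(f"{hours} h/day: {pictures_seen(3, hours):,} pictures by age three")
```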

In Conclusion, Has Image Recognition Technology Surpassed Human Abilities?
• For category recognition of item images on Mercari:
  • The technology has not surpassed human abilities (although we have not set specific numeric goals).
  • Existing recognition technology can approximate human recognition capabilities.
  • Larger-scale datasets need to be studied to cope with the diversity of items.
• Beyond item category, image recognition technology can be used to predict:
  • Brand
  • Item title
  • Item condition
  • Price range
  ...for an even better customer experience

Item Brand Recognition
Model: Inception-v3 (the same recognition model as for item category recognition)
Recognition results: Under Armour, Converse, PUMA

Item Brand Recognition
Recognition results: Timberland, Tommy Hilfiger, Gucci

What Information Do Deep Neural Networks Use for Recognition?
• Bolei Zhou, et al., Learning Deep Features for Discriminative Localization, CVPR 2016.
• Visualization technique called Class Activation Mapping (CAM)
• Applicable to convolutional neural networks in general, provided they end in global average pooling followed by a linear classifier
• Also applicable to Inception-v3
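In CAM, the map for class c is a weighted sum of the last convolutional layer's feature maps. The weights are that class's weights in the fully connected layer that follows global average pooling: M_c(x, y) = Σ_k w_k^c f_k(x, y). Ignoring the bias, the class score equals the spatial average of this map, so high values mark the regions that drove the prediction. The following minimal NumPy sketch uses toy 2×2 feature maps with hypothetical values. In practice the map is upsampled to the input image size before it is overlaid, which this sketch omits.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """Compute a CAM for one class.

    feature_maps: (K, H, W) activations of the last convolutional layer.
    fc_weights:   (num_classes, K) weights of the linear layer applied after
                  global average pooling.
    Returns an (H, W) map rescaled to [0, 1].
    """
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0])).astype(np.float64)
    cam -= cam.min()
    peak = cam.max()
    if peak > 0:
        cam /= peak
    return cam


# Toy example: K=2 feature maps of size 2x2 and 2 classes (hypothetical values)
feature_maps = np.array([
    [[1.0, 0.0], [0.0, 0.0]],  # map 0 fires in the top-left corner
    [[0.0, 0.0], [0.0, 1.0]],  # map 1 fires in the bottom-right corner
])
fc_weights = np.array([
    [1.0, 0.0],  # class 0 relies on map 0
    [0.0, 2.0],  # class 1 relies on map 1
])
for c in range(fc_weights.shape[0]):
    cam = class_activation_map(feature_maps, fc_weights, c)
    print(c, np.unravel_index(cam.argmax(), cam.shape))
```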

What Information Do Deep Neural Networks Use for Recognition?

Item Brand Recognition

Generating Image Descriptions
• Oriol Vinyals, et al., Show and Tell: A Neural Image Caption Generator, CVPR 2015.
• https://github.com/tensorflow/models/tree/master/im2txt
• https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html
Beyond image categorization, research on generating accurate descriptions of images is also progressing.
→ Can we generate item titles from item images?

Prediction of Item Details Including Item Title
• Studied several prediction models, including deep neural networks (deep learning)
• Trained on a total of more than 50 million items
Example prediction from an item image:
• Title: Crocs Crocband
• Brand: Crocs
• Category: Babies/Kids > Kids shoes > Sandals
• Price: ¥1,500 - ¥2,500
• Color: Pink
(We are currently at the accuracy testing stage, and quantitative test results are not yet ready for reporting.)

Prediction of Item Details Including Item Title
For people with no specific product or brand knowledge, it can seem as though the prediction has surpassed human capabilities.
• Ralph Lauren polo shirt: Men > Tops > Polo Shirt
• Louis Vuitton Monogram Hock: Women > Accessories > Billfold wallet
• Bvlgari Pour Homme “Bvlgari” (written in Japanese): Cosmetics/Perfume/Beauty > Perfume > cologne (men)

Prediction of Item Details Including Item Title
• Combi baby wipe warmer: Babies/kids > Diaper/Toilet/Bath > Diapers
• “Mappuru” Seoul mini: Entertainment/Hobby > Books > Maps/travel guides
• TV remote controller: Electronics/smartphones/cameras > TV/video equipment > Others

Comparison with Google Vision API
Our prediction: TV remote controller / Electronics/smartphones/cameras > TV/video equipment > Others
• The item title was generated accurately from image features alone, without OCR.
Vision API (Labels)
• Properly captured that the item is a SHARP AQUOS remote controller.

Comparison with Google Vision API
Our prediction: “Mappuru” Seoul mini / Entertainment/Hobby > Books > Maps/travel guides
• The item title was generated accurately from image features alone, without OCR.
Vision API (Labels, Text)
• Recognized that the item is a magazine about Seoul.

Comparison with Google Vision API
Our prediction: Combi baby wipe warmer / Babies/kids > Diaper/Toilet/Bath > Diapers
• An item widely recognized by households with newborn babies and toddlers
Vision API (Labels)
• Did not recognize that the item is a baby wipe warmer, or even a baby product.
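The slides do not say how the Vision API was called. The following sketch shows one way to request both labels and text (OCR) for a single image through the public v1 `images:annotate` REST endpoint. It only builds the JSON body and leaves out authentication and the HTTP call itself. The placeholder image bytes are hypothetical.

```python
import base64
import json

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"  # POST target


def build_annotate_request(image_bytes: bytes, max_labels: int = 10) -> str:
    """Return a JSON body requesting label and text detection for one image."""
    body = {
        "requests": [
            {
                # Inline images are sent base64-encoded in the "content" field
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                    {"type": "TEXT_DETECTION"},  # OCR
                ],
            }
        ]
    }
    return json.dumps(body)


payload = build_annotate_request(b"placeholder image bytes")  # hypothetical image data
print(VISION_ENDPOINT)
print(payload)
```

Label detection returns generic descriptions, while text detection depends on legible printed text. This is one reason a general-purpose API can name a magazine about Seoul but miss what a baby wipe warmer is.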

Summary
• Category recognition on ImageNet (ILSVRC) is considered to have achieved human-level precision.
• We studied category recognition models using Mercari item images.
  • Our recognition rates fell far short of ImageNet recognition rates.
  • Categories with high recognition error rates:
    • Categories such as “Others” whose definitions are unclear or nonspecific
    • Categories distinguished by size and gender (men’s, women’s, kids’, etc.)
  • There is room for improvement:
    • More thorough parameter tuning during training
    • More training data (especially for categories that contain a wide variety of items)
• Future development
  • Predict not just categories but also item titles, brands, prices, colors, etc.
  • A prototype that predicts these attributes has already been built; we are currently testing its accuracy.
  • For people lacking product and brand knowledge, it can seem as though the prediction has surpassed human capabilities (e.g. Combi baby wipe warmer / BVLGARI pour homme).

Thank You
