[EN] Has Image Recognition Technology Surpassed Human Abilities?

mercari
September 30, 2017

Transcript

  1. Has Image Recognition Surpassed Human Abilities? Item Image Recognition at Mercari: The Present and Future

    Mercari Tech Conf 2017 • Takuma Yamaguchi, Machine Learning Engineer
  2. Introduction: Takuma Yamaguchi

    • Fields of study
      • Pattern recognition, image processing, machine learning
      • PhD in Engineering
    • Packaged software company
      • Image recognition, voice recognition algorithms
      • OCR / speech recognition engines
    • Manufacturing company for transportation infrastructure
      • Operations research
      • Train fare calculation algorithms
      • Image recognition, computer vision
      • Turnstile, parking, surveillance camera
    • Social gaming company
      • Developed and administered tools for data analysis
      • Hadoop / Spark / Hive / Presto
      • Server-side engineering
    • Mercari, Inc.
      • 2016 - present
      • Image recognition and other machine learning technology
      • Server-side engineering
  3. 5.1

  4. Image Data Comparison: Mercari vs. ImageNet

    Mercari
    • Number of categories: 1,100+
    • clothes • shoes • bags • books • cars • handmade • cosmetics • toys • etc.

    ImageNet (ILSVRC)
    • Number of categories: 1,000
    • milk can • rugby ball • paper towel • earthstar • envelope • miniskirt, mini • cowboy hat, ten-gallon hat • trolleybus, trolley coach • etc.

    The level of difficulty in distinguishing items is about the same for Mercari and ImageNet
    → Perhaps machine learning can reach human-level recognition of items sold on Mercari?
  5. Item Category Recognition

    • Data
      • 1,108 classes
      • For training: 2,200,000 images
      • For testing: 110,000 images
    • Framework
      • TensorFlow
    • Model
      • Deep Neural Networks
      • Inception-v3
      • https://github.com/tensorflow/models
    • Training environment
      • AWS EC2 p2.8xlarge
      • CPU x 32
      • GPU (Tesla K80) x 8
    • Parameters, etc.
      • Learning rate: 0.1
      • Batch size: 256
      • Epochs: 15 (60 hours)
      • Close to no tuning
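The training figures on this slide imply a rough step count and throughput. The following is only a back-of-the-envelope sketch: it assumes each epoch visits every training image exactly once and that a final partial batch counts as one step, neither of which the slide states.

```python
import math

# Figures taken from the slide.
train_images = 2_200_000
batch_size = 256
epochs = 15
wall_clock_hours = 60
gpus = 8

# Assumption: one pass over all training images per epoch,
# with the last (partial) batch counted as a full step.
steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs

# Average throughput over the whole run, overall and per GPU.
images_per_second = train_images * epochs / (wall_clock_hours * 3600)
images_per_second_per_gpu = images_per_second / gpus

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"images/second:   {images_per_second:.1f} ({images_per_second_per_gpu:.1f} per GPU)")
```

Under these assumptions the run is on the order of 130,000 steps at roughly 150 images per second across the eight GPUs.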
  6. Item Category Recognition

    Top-5 error: 29.3%
    • The quality of recognition is high even without data cleansing or parameter tuning, but is it as good as a human?
    • Items are properly recognized for most categories. Can we improve the recognition quality even more?
    • What is happening here? Can we improve?
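For reference, top-5 error is the fraction of test images whose correct category is not among the five highest-scoring predictions. A minimal sketch of the metric follows; the function name `top_k_error` and the toy scores are illustrative and not taken from the talk.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores.

    scores: list of per-class score lists, one list per example.
    labels: list of true class indices, one per example.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 3 classes (hypothetical numbers).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]
print(top_k_error(toy_scores, toy_labels, k=1))
print(top_k_error(toy_scores, toy_labels, k=2))
```

With 110,000 test images, a top-5 error of 29.3% corresponds to roughly 32,000 images whose seller-selected category was outside the model's top five.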
  7. Categories with Low Recognition Rate (High Error Rate)

    • Women > Accessories > Others
    • Women > Bags > Others
    • Women > Tops > Others
    • Women > Others > Others
    • Men > Others
    • Baby/Kids > Others > Others
    • Handmade > Hobbies/Toys > Others
    • Sports/Leisure > Others > Others
    • Cosmetics/Perfume/Beauty > Others > Others
    • Entertainment/Hobbies > Collections > Novelty items
    • Entertainment/Hobbies > Collections > Others
    • Entertainment/Hobbies > Others > Others
    • Apartment/Decor > Others
    • Others > Household goods/Travel > Household goods
    • Others > Household goods/Travel > Others
    • Others > Antique/Collections > Miscellaneous goods
    • Others > Antique/Collections > Others
    • Others > Others

    Errors were concentrated in the “Others” categories, which are selected when no other option fits, so this is largely as expected.
  8. Analysis of Miscategorizations

    • Babies/Kids > Girls outfit 100cm~ > Jackets (0.3682)
    • Babies/Kids > Baby girls outfit ~95cm > Outerwear (0.3037)
    • Babies/Kids > Baby boys/girls outfit ~95cm > Outerwear (0.0615)
    • Babies/Kids > Boys/girls outfit 100cm~ > Jackets (0.0529)
    • Babies/Kids > Baby boys outfit ~95cm > Outerwear (0.0481)

    The numbers in parentheses are recognition scores; the scores across all candidate categories sum to 1.0.
  9. Analysis of Miscategorizations

    The answer is… Babies/Kids > Boys outfit 100cm~ > Jackets (Answer: the category selected by our customers)
    • Babies/Kids > Girls outfit 100cm~ > Jackets (0.3682)
    • Babies/Kids > Baby girls outfit ~95cm > Outerwear (0.3037)
    • Babies/Kids > Baby boys/girls outfit ~95cm > Outerwear (0.0615)
    • Babies/Kids > Boys/girls outfit 100cm~ > Jackets (0.0529)
    • Babies/Kids > Baby boys outfit ~95cm > Outerwear (0.0481)

    The numbers in parentheses are recognition scores; the scores across all candidate categories sum to 1.0.
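Scores that sum to 1.0 over all categories are what a softmax output layer produces. The sketch below assumes that is how these scores were obtained (the slides do not say so explicitly) and uses the five scores from this example to show how much score is left for the remaining categories.

```python
import math


def softmax(logits):
    """Turn raw class scores into non-negative scores that sum to 1.0."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes, only to show the normalization.
example_scores = softmax([2.0, 1.0, 0.1])
print(sum(example_scores))  # 1.0 up to floating-point rounding

# The five scores listed on the slide.
top5_from_slide = [0.3682, 0.3037, 0.0615, 0.0529, 0.0481]
top5_mass = sum(top5_from_slide)
remaining_mass = 1.0 - top5_mass
print(f"top-5 mass: {top5_mass:.4f}, left for all other categories: {remaining_mass:.4f}")
```

The top five candidates account for about 0.83 of the total score. The category selected by the customer (Boys outfit 100cm~ > Jackets) is not among them, so this image counts as a top-5 error.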
  10. Analysis of Miscategorizations

    • Women > Jackets/Outerwear > Pea coats (0.4849)
    • Men > Jackets/Outerwear > Pea coats (0.3372)
    • Babies/Kids > Boys outfit 100cm~ > Coats (0.0305)
    • Babies/Kids > Boys/girls outfit 100cm~ > Coats (0.0289)
    • Women > Jackets/Outerwear > Trench coats (0.0176)
  11. Analysis of Miscategorizations

    The answer is… Babies/Kids > Boys outfit 100cm~ > Jackets (Answer: the category selected by our customers)
    • Women > Jackets/Outerwear > Pea coats (0.4849)
    • Men > Jackets/Outerwear > Pea coats (0.3372)
    • Babies/Kids > Boys outfit 100cm~ > Coats (0.0305)
    • Babies/Kids > Boys/girls outfit 100cm~ > Coats (0.0289)
    • Women > Jackets/Outerwear > Trench coats (0.0176)

    The numbers in parentheses are recognition scores; the scores across all candidate categories sum to 1.0.
  12. Analysis of Miscategorizations

    Babies/Kids > Kids outfit 100cm~ > Jackets
    • Sometimes multiple categories fit the image
      • The pink jacket with polka dots could be for boys as well as girls
    • Even humans have a hard time distinguishing items of slightly different sizes
      • Applies to clothes and other items like shoes

    Interestingly, the error tendencies are very similar to human mistakes.
  13. Analysis of Miscategorizations

    • Babies/Kids > Toys > Educational toys (0.7098)
    • Babies/Kids > Toys > Educational toys
    • Babies/Kids > Toys > Musical box (0.9111)
  14. Analysis of Miscategorizations

    • Babies/Kids > Toys > Educational toys
    • Babies/Kids > Toys > Musical box
    • Babies/Kids > Toys > Musical box (0.9111)

    Recognition of “Looping” and “Mary” toys was still immature: the training data collected was insufficient for the diversity of these images.
    Item similarity can be calculated from the CNN’s intermediate-layer features.
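One common way to turn intermediate-layer features into a similarity score is cosine similarity between the feature vectors of two images. The slide does not say which layer or which metric was used, so the sketch below is an assumption; the feature vectors and item names are made up, and real Inception-v3 features would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def most_similar(query, catalog):
    """Return (item_id, similarity) for the catalog item closest to the query."""
    scored = ((item_id, cosine_similarity(query, vec)) for item_id, vec in catalog.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 4-dimensional features standing in for CNN activations.
catalog = {
    "bead_maze_toy": [0.9, 0.1, 0.0, 0.3],
    "crib_mobile": [0.1, 0.8, 0.4, 0.0],
    "music_box": [0.0, 0.2, 0.9, 0.1],
}
query_features = [0.8, 0.2, 0.1, 0.2]
print(most_similar(query_features, catalog))
```

A nearest-neighbor lookup like this needs no category labels, which is one reason it can help with item types that are too diverse for the classifier.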
  15. What Can We Do for Precision Improvement?

    “No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences, in both quantity and quality.”
  16. In Conclusion, Has Image Recognition Technology Surpassed Human Abilities?

    • In terms of category recognition of item images on Mercari
      • The technology has not surpassed human abilities (although we have not set specific numeric goals)
      • Existing recognition technology should be able to reach human-level recognition
      • Larger data sets are needed for training to cover the diversity of items
    • Beyond item category, we can use image recognition technology to predict:
      • Brand
      • Item title
      • Item condition
      • Price range
    ...for even better customer experience
  17. Item Brand Recognition

    • Inception-v3: same recognition model as item category recognition
    • Recognition results: Under Armour, Converse, PUMA
  18. What Information Do Deep Neural Networks Use for Recognition?

    Bolei Zhou, et al., Learning Deep Features for Discriminative Localization, CVPR 2016.
    • Visualization technique called Class Activation Mapping
    • Applicable to general Convolutional Neural Networks
    • Also applicable to Inception-v3
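In the cited paper, the class activation map for a class is a weighted sum of the last convolutional layer's feature maps, using the weights that connect each globally average-pooled feature map to that class. The sketch below shows only that weighted sum on tiny hand-made arrays; the feature maps and weights are made up, and it is not the code used for the talk.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class.

    feature_maps: list of K maps, each a height x width list of lists.
    class_weights: list of K weights linking each pooled map to the class.
    """
    if not feature_maps or len(feature_maps) != len(class_weights):
        raise ValueError("need one weight per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two hypothetical 2x2 feature maps and their weights for one class.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.5]
cam = class_activation_map(feature_maps, class_weights)
print(cam)  # larger values mark regions that contribute more to the class score
```

Because the class score is the weighted sum of the averaged feature maps, the mean of the map equals that class score before any bias term, which is why the map can be read as a spatial breakdown of the prediction.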
  19. Generating Image Descriptions

    • Oriol Vinyals, et al., Show and tell: A neural image caption generator, CVPR 2015.
    • https://github.com/tensorflow/models/tree/master/im2txt
    • https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html

    Besides image categorization, research on generating accurate item descriptions is in progress.
    → Can we generate item titles from item images?
  20. Prediction of Item Details Including Item Title

    Studied several deep learning prediction models, such as Deep Neural Networks.
    • Image
    • Title: Crocs Crocband
    • Brand: Crocs
    • Category: Babies/Kids > Kids shoes > Sandals
    • Price: ¥1,500 - ¥2,500
    • Color: Pink

    Trained on more than 50 million item records in total. (We are currently at the accuracy testing stage, and quantitative test results are not ready for reporting.)
  21. Prediction of Item Details Including Item Title

    For people with no specific product or brand knowledge, it can seem as though the prediction has surpassed human capabilities.
    • Ralph Lauren polo shirt: Men > Tops > Polo Shirt
    • Louis Vuitton Monogram Hock: Women > Accessories > Billfold wallet
    • Bvlgari Pour Homme “Bvlgari” (written in Japanese): Cosmetics/Perfume/Beauty > Perfume > cologne (men)
  22. Prediction of Item Details Including Item Title

    For people with no specific product or brand knowledge, it can seem as though the prediction has surpassed human capabilities.
    • Combi baby wipe warmer: Babies/kids > Diaper/Toilet/Bath > Diapers
    • “Mappuru” Seoul mini: Entertainment/Hobby > Books > Maps/travel guides
    • TV remote controller: Electronics/smartphones/cameras > TV/video equipment > Others
  23. Comparison with Google Vision API

    Vision API: Labels
    • Item title was accurately generated from image features without OCR
    • Has properly collected information that the item is a SHARP AQUOS remote controller
    • TV remote controller: Electronics/smartphones/cameras > TV/video equipment > Others
  24. Comparison with Google Vision API

    Vision API: Labels, Text
    • Item title was accurately generated from image features without OCR
    • Has recognized that the item is a magazine about Seoul
    • “Mappuru” Seoul mini: Entertainment/Hobby > Books > Maps/travel guides
  25. Comparison with Google Vision API

    Vision API: Labels
    • An item widely recognized by families/households with newborn babies and toddlers
    • Has not recognized that it is a baby wipe warmer, or a baby good
    • Combi baby wipe warmer: Babies/kids > Diaper/Toilet/Bath > Diapers
  26. Summary

    • Category recognition on ImageNet (ILSVRC) is considered to have achieved human-level precision
    • Studied categorization modeling using Mercari item images
      • Our item recognition rates were far from ImageNet recognition rates
      • Categories with a high rate of recognition errors:
        • Categories like “Others” whose definition is not clear or specific
        • Categories defined by different sizes and gender (men’s, women’s, kids’, etc.)
      • There is room for improvement
        • Hyperparameter tuning during training
        • Add training data (especially for categories that consist of a variety of items)
    • Future development
      • Prediction to include not just categories but item titles, brand, price, color, etc.
      • A prototype for predictions using the above factors has already been built; we are currently testing its accuracy
      • When there is a lack of product and brand knowledge, it seems as though the prediction has surpassed human capabilities
        • e.g. Combi baby wipe warmer / BVLGARI pour homme