Slide 1

Slide 1 text

Food Image Object Detection and Classification: Challenges and Solutions

Slide 2

Slide 2 text

Part 1: Detection

Slide 3

Slide 3 text

Self-introduction • Leszek Rybicki • From Poland • At Cookpad since 2016 • github: lunardog

Slide 4

Slide 4 text

Warning! This presentation contains images that may cause severe drooling and stomach grumbling. @cookpad

Slide 5

Slide 5 text

History

Slide 6

Slide 6 text

ImageNet http://image-net.org

Slide 7

Slide 7 text

ImageNet Large Scale Visual Recognition Challenge http://www.image-net.org/challenges/LSVRC/

Slide 8

Slide 8 text

ILSVRC 2010 task: Classification. For each image, algorithms will produce a list of at most 5 object categories in the descending order of confidence. http://www.image-net.org/challenges/LSVRC/
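The ranked-list output described on this slide can be sketched as below. This is a minimal illustration, not the official evaluation code: the function names, the food labels, and the scores are all made up for the example, and the cutoff `k` is left as a parameter.

```python
def top_k_labels(scores, k=5):
    """Return at most k labels, ordered by descending confidence.

    `scores` maps a category label to a confidence value (hypothetical data).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def is_top_k_correct(scores, true_label, k=5):
    """A prediction counts as correct if the true label appears in the top k."""
    return true_label in top_k_labels(scores, k)


# Illustrative scores for a single image (made-up numbers).
scores = {
    "ramen": 0.61,
    "udon": 0.20,
    "soba": 0.09,
    "pasta": 0.05,
    "salad": 0.03,
    "curry": 0.02,
}
predictions = top_k_labels(scores)
```

With these scores, "curry" falls outside the five returned labels, so an image whose true label is "curry" would count as a miss.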

Slide 9

Slide 9 text

ILSVRC 2011 tasks 1. Classification 2. *Classification with localization *taster task

Slide 10

Slide 10 text

Classification + Localization http://cs231n.stanford.edu/syllabus.html

Slide 11

Slide 11 text

ILSVRC 2012 tasks 1. Classification 2. Classification with localization 3. Fine-grained classification

Slide 12

Slide 12 text

Fine-grained classification http://www.image-net.org/challenges/LSVRC/

Slide 13

Slide 13 text

AlexNet: ImageNet classification with deep convolutional neural networks. A. Krizhevsky, I. Sutskever, G. E. Hinton. Advances in Neural Information Processing Systems, 2012.

Slide 14

Slide 14 text

ILSVRC 2013 tasks 1. Detection 2. Classification 3. Classification with localization

Slide 15

Slide 15 text

ILSVRC 2014 tasks 1. Detection 2. Classification 3. Classification with localization

Slide 16

Slide 16 text

Object Detection http://cs231n.stanford.edu/syllabus.html

Slide 17

Slide 17 text

Deep Learning https://devblogs.nvidia.com

Slide 18

Slide 18 text

ILSVRC 2015 tasks 1. Object detection 2. Object localization 3. *Object detection from video 4. *Scene classification

Slide 19

Slide 19 text

ILSVRC 2016 tasks 1. Object localization 2. Object detection 3. Object detection from video 4. Scene classification 5. Scene parsing

Slide 20

Slide 20 text

Cookpad 2016

Slide 21

Slide 21 text

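Image dataset • Since 1997 • Recipes: about 2.6 million in Japan + overseas + Tsukurepo (user cooking reports) + step-by-step photos • 17 languages, 60 countries • Note: figures are as of February 2017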

Slide 22

Slide 22 text

Research interests in image analysis • Is this food? • Which dish is it? • Where is the food? • ... Part 2

Slide 23

Slide 23 text

Where is the food?

Slide 24

Slide 24 text

Goal: Find food in the image and draw a bounding box around the food item, including the dish if visible.

Slide 25

Slide 25 text

Goal: If there are multiple items, draw a bounding box around each one.

Slide 26

Slide 26 text

area(ground truth ∩ bounding box) / area(ground truth ∪ bounding box) > 0.9
We count a detection as positive if its Intersection over Union (IoU) ratio with the ground truth is greater than 0.9.
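The Intersection over Union test on this slide can be sketched in a few lines. This is a minimal sketch that assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples; the box format and the function names are assumptions for illustration, and only the 0.9 threshold comes from the slide.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_positive(predicted_box, ground_truth_box, threshold=0.9):
    """True if the predicted box overlaps the ground truth tightly enough."""
    return iou(predicted_box, ground_truth_box) > threshold
```

For example, a 10×10 ground truth box and a prediction that misses only a 0.5-unit strip along one edge give an IoU of 0.95, which passes; shifting the prediction sideways by half its width drops the IoU to 1/3, which fails.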

Slide 27

Slide 27 text

precision = number of true positives / number of generated boxes
recall = number of true positives / number of ground truth boxes
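As a worked illustration of the two ratios on this slide, the sketch below counts true positives by matching generated boxes to ground truth boxes. The slides do not say how matches are assigned, so the greedy one-to-one matching here is an assumption, as are the `(x_min, y_min, x_max, y_max)` box format and the function names.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(generated, ground_truth, threshold=0.9):
    """Compute (precision, recall) for one image.

    Assumption: each ground truth box can be matched by at most one
    generated box, chosen greedily in the order the boxes are given.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for box in generated:
        # Best remaining ground truth box for this detection, if any is left.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) > threshold:
            true_positives += 1
            unmatched.remove(best)

    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Two dishes in the image; the detector finds one, repeats it, and adds a stray box.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
generated = [(0, 0, 10, 10), (50, 50, 60, 60), (0, 0, 10, 10)]
precision, recall = precision_recall(generated, ground_truth)
```

Here one of three generated boxes is a true positive (precision 1/3) and one of two ground truth boxes is found (recall 1/2); the repeated detection counts against precision because its ground truth box is already matched.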

Slide 28

Slide 28 text

Methods

Slide 29

Slide 29 text

Idea: 1. Build a classifier 2. Pick Regions of Interest 3. Run classifier on each region 4. Remove duplicate detections
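The four steps on this slide can be sketched end to end as follows. Everything specific here is an assumption made for illustration: regions of interest come from a plain sliding window, the classifier is a hypothetical callable that returns a score for a box, and duplicates are removed with greedy non-maximum suppression, a common choice that the slide itself does not name.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def sliding_windows(width, height, size, stride):
    """Step 2: yield square regions of interest covering the image."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, x + size, y + size)


def non_max_suppression(detections, iou_threshold):
    """Step 4: keep the best-scoring box, drop boxes that overlap a kept one."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


def detect(width, height, classify, size, stride, min_score, iou_threshold):
    """Steps 2 to 4: propose regions, score each one, remove duplicates."""
    candidates = []
    for box in sliding_windows(width, height, size, stride):
        score = classify(box)  # Step 3: run the classifier on each region.
        if score >= min_score:
            candidates.append((score, box))
    return non_max_suppression(candidates, iou_threshold)


# Step 1 stand-in: a toy "classifier" that scores a region by how well it
# overlaps a hidden food item. A real classifier would look at pixels.
FOOD_BOX = (2, 2, 6, 6)


def toy_classifier(box):
    return iou(box, FOOD_BOX)


detections = detect(8, 8, toy_classifier, size=4, stride=2,
                    min_score=0.3, iou_threshold=0.3)
```

In this toy run five windows score above `min_score`, but four of them overlap the best window and are suppressed, leaving a single detection at the food item's location. The classifier runs once per window, so the cost grows with the number of regions.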

Slide 30

Slide 30 text

Fast, Faster R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Fast R-CNN. Ross Girshick

Slide 31

Slide 31 text

Problems 1. Computational cost 2. Context is important 3. ...but context can be confusing. (image labels: hand, food, grass, food) http://pixabay.com

Slide 32

Slide 32 text

Single Shot Detector
SSD: Single Shot MultiBox Detector. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng

Slide 33

Slide 33 text

Joseph Redmon: "Either The Least Or Most Employable Person Ever" (The Huffington Post) github.com/pjreddie pjreddie.com/darknet www.kaggle.com/pjreddie

Slide 34

Slide 34 text

You Only Look Once

Slide 35

Slide 35 text

Slide 36

Slide 36 text

No content