Slide 1

Slide 1 text

Finding beans in burgers: Deep semantic-visual embedding with localization @lunardog Kanto Computer Vision Study Group (関東コンピュータービジョン勉強会) 2018.07.07

Slide 2

Slide 2 text

Self-introduction
● Leszek
● Polish
● 2005~ machine learning researcher
● 2010~ came to Japan
● 2016~ joined Cookpad
● github: @lunardog twitter: @_lunardog_

Slide 3

Slide 3 text

CVPR 2018 SIGIR 2018 MsCOCO Recipe1M

Slide 4

Slide 4 text

CVPR 2017

Slide 5

Slide 5 text

Learning Cross-modal Embeddings for Cooking Recipes and Food Images
● CVPR 2017
● joint embedding of images and recipes

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

CVPR 2018

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

MsCOCO -> MsCOCO
MsCOCO -> Flickr30K

Slide 11

Slide 11 text

triplet loss WELDON pooling

Slide 12

Slide 12 text

Triplet Loss

Slide 13

Slide 13 text

FaceNet: A Unified Embedding for Face Recognition and Clustering Florian Schroff, Dmitry Kalenichenko, James Philbin

Slide 14

Slide 14 text

FaceNet: A Unified Embedding for Face Recognition and Clustering Florian Schroff, Dmitry Kalenichenko, James Philbin

Slide 15

Slide 15 text

[Figure: embeddings y, z, z′; two labels beginning "1-" (truncated in extraction); margin α]
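A minimal sketch of a triplet loss over the symbols on this slide, assuming y is the anchor, z the matching item, z′ the non-matching item, and that the "1-" labels stand for cosine distance, 1 - cos. The function names and the margin value 0.2 are illustrative assumptions, not taken from the slides.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def triplet_loss(y, z, z_neg, alpha=0.2):
    """Hinge loss: zero once z_neg is at least alpha farther from y than z is.

    alpha is the margin; 0.2 is an arbitrary illustrative value.
    """
    return max(0.0, alpha + cosine_distance(y, z) - cosine_distance(y, z_neg))

anchor = [1.0, 0.0]
positive = [1.0, 0.0]   # same direction as the anchor
negative = [0.0, 1.0]   # orthogonal to the anchor

easy = triplet_loss(anchor, positive, negative)  # margin already satisfied
hard = triplet_loss(anchor, negative, positive)  # roles swapped, margin violated
```

With the roles swapped, the loss is positive and pushes the embeddings apart until the margin α holds again.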

Slide 16

Slide 16 text

[Figure: six pairwise margins, each labeled ≥ α]

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

[Figure: six pairwise margins, each labeled ≥ α]

Slide 19

Slide 19 text

triplet loss WELDON pooling

Slide 20

Slide 20 text

[Figure: two labels beginning "1-" (truncated in extraction); margin α]

Slide 21

Slide 21 text

Instance Loss
[Figure: six pairwise margins, each labeled ≥ α]

Slide 22

Slide 22 text

Semantic Loss
[Figure: six pairwise margins, each labeled ≥ α]
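The slides only name the two losses, so the following is a hedged reading rather than the authors' definition: an instance loss is assumed to take the matching recipe as the positive and every other recipe in the batch as a negative, while a semantic loss is assumed to take any recipe of the same class as the positive and a recipe of a different class as the negative. The function names and class labels are hypothetical. The sketch only enumerates index triples (image, positive recipe, negative recipe); each triple would then be fed to a triplet loss with margin α.

```python
def instance_triplets(n):
    """Image i is matched with recipe i; every other recipe j is a negative."""
    return [(i, i, j) for i in range(n) for j in range(n) if j != i]

def semantic_triplets(labels):
    """Positive: a different item of the same class. Negative: another class."""
    n = len(labels)
    return [
        (i, p, q)
        for i in range(n)
        for p in range(n) if p != i and labels[p] == labels[i]
        for q in range(n) if labels[q] != labels[i]
    ]

# Hypothetical batch of three image-recipe pairs with class labels.
labels = ["burger", "burger", "salad"]

inst = instance_triplets(len(labels))
sem = semantic_triplets(labels)
```

For a batch of three pairs, the instance variant yields six triples, one margin constraint per triple.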

Slide 23

Slide 23 text

WELDON Pooling

Slide 24

Slide 24 text

Global Average Pooling Linear Typical Image Classifier

Slide 25

Slide 25 text

WELDON

Slide 26

Slide 26 text

Example 7×7 activation map (shown three times on the slide: as input, under Global MAX Pooling, and under Global Average Pooling):

0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.05 0.3  0.1  0.0  0.0  0.0  0.0
0.5  1.0  1.0  0.3  0.0  0.0  0.0
0.5  1.0  1.0  1.0  0.01 0.0  0.0
0.2  1.0  1.0  1.0  0.0  0.0  0.0
0.0  0.05 0.4  0.2  0.0  0.0  0.0

Global MAX Pooling
Global Average Pooling
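As a small worked example, the two pooling operations named on this slide can be applied to its 49 values. Arranging them as a 7×7 grid is an assumption about the slide layout (it does not change either pooled value), and the stray "00" is read as 0.0. The function names are illustrative.

```python
# The 49 values from the slide, arranged row by row (layout is an assumption).
activation_map = [
    [0.0,  0.0,  0.0, 0.0, 0.0,  0.0, 0.0],
    [0.0,  0.0,  0.0, 0.0, 0.0,  0.0, 0.0],
    [0.05, 0.3,  0.1, 0.0, 0.0,  0.0, 0.0],
    [0.5,  1.0,  1.0, 0.3, 0.0,  0.0, 0.0],
    [0.5,  1.0,  1.0, 1.0, 0.01, 0.0, 0.0],
    [0.2,  1.0,  1.0, 1.0, 0.0,  0.0, 0.0],
    [0.0,  0.05, 0.4, 0.2, 0.0,  0.0, 0.0],
]

def global_max_pool(feature_map):
    """Keep only the single strongest activation."""
    return max(v for row in feature_map for v in row)

def global_average_pool(feature_map):
    """Average over every spatial position, including the empty background."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

pooled_max = global_max_pool(activation_map)
pooled_avg = global_average_pool(activation_map)
```

Max pooling reports 1.0 from a single cell, while average pooling is pulled down to roughly 0.22 by the many zero cells around the active region.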

Slide 27

Slide 27 text

min + max Pooling: bottom m, top k

Example 7×7 activation map (shown twice on the slide):

0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.05 0.3  0.1  0.0  0.0  0.0  0.0
0.5  1.0  1.0  0.3  0.0  0.0  0.0
0.5  1.0  1.0  1.0  0.01 0.0  0.0
0.2  1.0  1.0  1.0  0.0  0.0  0.0
0.0  0.05 0.4  0.2  0.0  0.0  0.0
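A minimal sketch of the "min + max" pooling named on this slide, assuming the pooled score is the mean of the top k activations plus the mean of the bottom m activations. Whether the original method sums or averages within each group, and which k and m it uses, is not stated on the slide, so those choices are assumptions. The score map below is hypothetical and includes negative values, since the bottom-m term only matters when some regions score below zero.

```python
def weldon_pool(score_map, k=2, m=2):
    """Mean of the k highest scores plus mean of the m lowest scores.

    k, m, and the use of means (rather than sums) are illustrative assumptions.
    """
    flat = sorted(v for row in score_map for v in row)
    top_k = flat[-k:]      # strongest evidence for the class
    bottom_m = flat[:m]    # strongest evidence against the class
    return sum(top_k) / k + sum(bottom_m) / m

# Hypothetical 3x3 score map with both positive and negative evidence.
scores = [
    [0.9, 0.8, -0.5],
    [0.1, 0.0, -0.7],
    [1.0, 0.2, -0.1],
]

pooled = weldon_pool(scores, k=2, m=2)
```

Here the top two scores average 0.95 and the bottom two average -0.6, so negative regions lower the pooled score instead of being ignored as they would be under global max pooling.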

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

https://tokyo-ml.github.io/hotdog-tf-js/ http://techlife.cookpad.com/entry/2018/04/06/124455

Slide 31

Slide 31 text

The END