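Graduate School of Informatics, master's program completed (… Lab)
2020 ~ 2023: Same university, doctoral program completed, Ph.D. (Informatics)
2023 ~ present: LINEヤフー株式会社, Image and video AI team

Recipe generation from cooking videos: ACMMM21, MTAP22, ACM TOMM
  Add the butter into a pan
  Crack the egg and stir
  …
Vision&Language dataset of experiment videos: ICCVW21, JNLP22 (outstanding)
  Step 1: Invert 4 times to mix
  Step 2: Add 10μl of Alkaline Protease Solution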
Video-and-Language learning from …

Procedure text (written by hand, little noise)
  1. Crack the eggs and whisk
  2. Add butter into the pan
  3. Pour the eggs into the pan
  4. …
Speech transcription (automatic, much noise)
  [0~10s]: music
  [10~12]: hi, I am Mina …
  [30~50]: crack the eggs and carefully…
  (the early segments are unrelated to the task)
Learning
The process of a person performing operations: materials are progressively mixed together toward the final result

(1) Fine-grained action recognition [Rohrbach+ CVPR12]
(2) Understanding of the whole video [Nishimura+ ACMMM21]

Figure: State transition
  Material list: eggs, butter, cheese
  Key events: step 1, step 2, step 3
  action(s) : state (added, cracked, stirred, added, stirred)
  Procedural text:
    add the butter into a pan
    crack the eggs and stir
    add the egg mixture and cheese and stir
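The state-transition idea on this slide can be sketched as a small bookkeeping routine: each step applies its actions to the materials it mentions, and a material's state is the list of actions it has accumulated. This is only an illustration. The name `track_states`, the `STEPS` annotation, and the pairing of actions with materials are assumptions read off the example recipe, not the representation used in [Nishimura+ ACMMM21].

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One step = (materials touched, actions applied). Hypothetical annotation format.
Step = Tuple[List[str], List[str]]

def track_states(steps: List[Step]) -> Dict[str, List[str]]:
    """Accumulate, per material, the actions applied to it so far."""
    states: Dict[str, List[str]] = defaultdict(list)
    for materials, actions in steps:
        for material in materials:
            # Every material mentioned in the step receives all of its actions.
            states[material].extend(actions)
    return dict(states)

# Assumed reading of the three-step example on the slide.
STEPS: List[Step] = [
    (["butter"], ["added"]),                     # add the butter into a pan
    (["eggs"], ["cracked", "stirred"]),          # crack the eggs and stir
    (["eggs", "cheese"], ["added", "stirred"]),  # add the egg mixture and cheese and stir
]

if __name__ == "__main__":
    for material, history in track_states(STEPS).items():
        print(material, "->", history)
```

Under this reading, the eggs pass through four states while the butter passes through one, which is the kind of per-material history that whole-video understanding has to recover.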
Existing models predict events and sentences in parallel (dependencies between events are ignored)
✔︎ Predict the next step based on the previously predicted events and sentences

Ours: multimodal recurrent prediction

Figure: model overview
  Input Video
  DVC Event Extractor
  Event Candidates (1 … T)
  Our model: Event selector, Sentence generator, Multimodal memory mixing
  Events and sentences:
    Add the butter into a pan
    Crack the eggs and stir
    Add the egg mixture and cheese and stir
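The contrast between parallel and recurrent prediction can be illustrated with a toy selector. The sketch below is not the model on this slide: the learned event selector, sentence generator, and memory are replaced by a fixed confidence score and two hand-written rules (a new event must start after the previous one ends, and must not repeat a sentence already produced). `Candidate`, `parallel_predict`, `recurrent_predict`, and the `CANDS` data are all hypothetical names and values introduced for illustration.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    start: float    # seconds
    end: float      # seconds
    score: float    # independent confidence (made-up numbers)
    sentence: str   # sentence attached to this candidate (made up)

def parallel_predict(cands: List[Candidate], k: int) -> List[Candidate]:
    """Baseline: take the top-k candidates independently, then sort by time."""
    top = sorted(cands, key=lambda c: c.score, reverse=True)[:k]
    return sorted(top, key=lambda c: c.start)

def recurrent_predict(cands: List[Candidate], k: int) -> List[Candidate]:
    """Pick one event at a time, conditioning each pick on the history."""
    history: List[Candidate] = []
    for _ in range(k):
        last_end = history[-1].end if history else 0.0
        used = {h.sentence for h in history}
        # Stand-in for a learned selector: keep only candidates that follow
        # the previous event and do not repeat an earlier sentence.
        eligible = [c for c in cands
                    if c.start >= last_end and c.sentence not in used]
        if not eligible:
            break
        history.append(max(eligible, key=lambda c: c.score))
    return history

CANDS = [
    Candidate(0, 10, 0.9, "Add the butter into a pan"),
    Candidate(2, 11, 0.8, "Add the butter into a pan"),  # near-duplicate
    Candidate(12, 25, 0.7, "Crack the eggs and stir"),
    Candidate(26, 40, 0.6, "Add the egg mixture and cheese and stir"),
]

if __name__ == "__main__":
    print([c.sentence for c in parallel_predict(CANDS, 3)])
    print([c.sentence for c in recurrent_predict(CANDS, 3)])
```

With these made-up scores, the parallel baseline keeps both overlapping "butter" candidates and never reaches the third step, while the history-aware loop returns the three distinct steps in order. The greedy highest-score rule is a simplification; a real selector would also have to avoid jumping to a late event too early.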
Figure: example outputs (video frames omitted), three sets of sentences

(a) Cut the pork into slices
(b) Cover the pork in plastic wrap and pound
(c) Sprinkle salt and pepper on top of the meat
(d) Melt butter in the pan
(e) Mix eggs milk salt and pepper together
(f) Dip the pork in the egg mixture and the bread crumbs
(g) Fry the pork in the pan

(a) Add oil and salt to a bowl
(b) Mix the chicken in the flour and mix
(c) Fry the chicken in a pan
(d) Fry the chicken in a pan
(e) Heat the oil in a pan
(f) Fry the chicken in a pan
(g) Fry the chicken in the pan

(a) Cut the pork in half and remove the pork
(b) Season the pork with salt and pepper
(c) Season the pork with salt and pepper
(d) Heat some butter in a pan
(e) Coat the pork in the break crumbs
(f) Fry the pork in a pan