


[RSJ22] TDP-MAT: Multimodal Language Comprehension for Object Manipulation Tasks via Real Images

More Decks by Semantic Machine Intelligence Lab., Keio Univ.


Transcript

  1. 1

  2. 2

  3. ✓ : “Look in the left wicker vase that is next to the potted plant” Wicker vase :
  4. ✓ : “Look in the left wicker vase that is next to the potted plant” Wicker vase : Wicker vase Wicker vase Wicker vase
  5. ✓ : ✓ Key : “Look in the left wicker vase that is next to the potted plant” Wicker vase : Wicker vase Wicker vase Wicker vase
  6. ✓ REVERIE-fetch: (Instruction) (Context Regions) (Candidate Region)

    “Look in the left wicker vase that is next to the potted plant”
  7. ✓ REVERIE-fetch: (Instruction) (Context Regions) (Candidate Region)

    “Look in the left wicker vase that is next to the potted plant”
  8. ✓ REVERIE-fetch: (Instruction) (Context Regions) (Candidate Region)

    “Look in the left wicker vase that is next to the potted plant”
  9. ✓ REVERIE-fetch: (Instruction) (Context Regions) (Candidate Region)

    “Look in the left wicker vase that is next to the potted plant” Faster R-CNN [Ren+, PAMI16]
  10. MTCM [Magassouba+, RA-L19]: VGG16, LSTM

    Target-dependent UNITER (TDU) [Ishikawa+, RA-L21]: UNITER [Chen+, ECCV20]
    REVERIE task / dataset [Qi+, CVPR20]: REVERIE
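  11. ✓ $\boldsymbol{\delta}_t$ ✓ Input: $\boldsymbol{\delta}_t$. Output: $\boldsymbol{\delta}_{t+1}$.

    1. $E(\boldsymbol{\delta}) = \mathrm{CE}(f(\boldsymbol{x}), \boldsymbol{y})$, $\nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}) = \frac{\partial E}{\partial \boldsymbol{\delta}}$
    2. From $\nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta})$, compute $\boldsymbol{m}_t$, $\boldsymbol{v}_t$: $\boldsymbol{m}_t = \rho_1 \boldsymbol{m}_{t-1} + (1 - \rho_1) \nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}_t)$, $\boldsymbol{v}_t = \rho_2 \boldsymbol{v}_{t-1} + (1 - \rho_2) \left(\nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}_t)\right)^2$
    3. From $\boldsymbol{m}_t$, $\boldsymbol{v}_t$, compute $\Delta\boldsymbol{\delta}_t$: $\hat{\boldsymbol{m}}_t = \frac{\boldsymbol{m}_t}{1 - \rho_1^t}$, $\hat{\boldsymbol{v}}_t = \frac{\boldsymbol{v}_t}{1 - \rho_2^t}$, $\Delta\boldsymbol{\delta}_t = \eta \frac{\hat{\boldsymbol{m}}_t}{\sqrt{\hat{\boldsymbol{v}}_t} + \epsilon}$
    4. $\boldsymbol{\delta}_{t+1} = \Pi_{\|\boldsymbol{\delta}\| \le \epsilon}\left(\boldsymbol{\delta}_t + \frac{\Delta\boldsymbol{\delta}_t}{\|\Delta\boldsymbol{\delta}_t\|_F}\right)$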
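    The four steps on this slide can be sketched in plain Python as below. This is only an illustrative reading of the slide, not the authors' implementation: the function name `mat_step`, the default hyperparameters, the flattened-list representation of the perturbation, and the separate names `adam_eps` and `radius` for the two roles that ε plays on the slide are all assumptions. The square root in the denominator follows the usual Adam form.

```python
import math

def mat_step(delta, grad, m, v, t, eta=0.1, rho1=0.9, rho2=0.999,
             adam_eps=1e-8, radius=1.0):
    """One moment-based update of a flattened perturbation (hypothetical helper).

    delta, grad, m, v are equal-length lists of floats; t is the 1-based step.
    """
    # Step 2: exponential moving averages of the gradient and its square.
    m = [rho1 * mi + (1 - rho1) * g for mi, g in zip(m, grad)]
    v = [rho2 * vi + (1 - rho2) * g * g for vi, g in zip(v, grad)]
    # Step 3: bias-corrected moments and the raw update direction.
    m_hat = [mi / (1 - rho1 ** t) for mi in m]
    v_hat = [vi / (1 - rho2 ** t) for vi in v]
    step = [eta * mh / (math.sqrt(vh) + adam_eps) for mh, vh in zip(m_hat, v_hat)]
    # Step 4: add the norm-normalised step (the Frobenius norm of a matrix
    # equals the Euclidean norm of its flattened entries) ...
    step_norm = math.sqrt(sum(s * s for s in step))
    if step_norm > 0:
        delta = [d + s / step_norm for d, s in zip(delta, step)]
    # ... then project back onto the ball of the given radius.
    delta_norm = math.sqrt(sum(d * d for d in delta))
    if delta_norm > radius:
        delta = [d * radius / delta_norm for d in delta]
    return delta, m, v
```

    Calling `mat_step` repeatedly with `t = 1, 2, ...` and the gradient of the cross-entropy loss with respect to the perturbation gives the iteration; the returned `m` and `v` are passed back in on the next call.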
  12. ✓ REVERIE-fetch dataset - REVERIE dataset ✓ REVERIE [Qi+, CVPR20] - Matterport3D

    https://yuankaiqi.github.io/REVERIE_Challenge/static/img/demo.gif
  13. ✓ REVERIE-fetch dataset - REVERIE dataset ✓ REVERIE [Qi+, CVPR20] - REVERIE

    https://yuankaiqi.github.io/REVERIE_Challenge/static/img/demo.gif
  14. ✓ REVERIE-fetch dataset • REVERIE dataset

    #Samples: 30532; Vocabulary size: 2853; Average sentence length: 19.1
    Training: 26808; Validation: 2552; Test: 1172
    “Look in the left wicker vase that is next to the potted plant”
  15. “Go into the living room and give me the pillow on the couch nearest the plant”

    • → TDP-MAT
  16. • → TDP-MAT ✓ Bounding box

    “Make haste to the office and fluff the pillow sitting on the left of the chair”
  17. • Acc [%] :

    Condition                              Acc [%] ↑
    Baseline : TDU [Ishikawa+, IROS21]     73.3  0.485
    Ours : TDP-MAT
      W/o MAT                              72.5  3.55
      W/o MAT + Smaller learning rate      74.4  0.831
      W/o CLIP & Perceiver                 74.1  1.47
      W/o Pretraining                      73.1  2.24
      Full                                 75.3  0.691
    +2.0
  18. Condition                              Acc [%] ↑

    Baseline : TDU [Ishikawa+, IROS21]     73.3  0.485
    Ours : TDP-MAT
      W/o MAT                              72.5  3.55
      W/o MAT + Smaller learning rate      74.4  0.831
      W/o CLIP & Perceiver                 74.1  1.47
      W/o Pretraining                      73.1  2.24
      Full                                 75.3  0.691
    +2.8
    - 5
    - Smaller learning rate : 1/8
  19. Condition                              Acc [%] ↑

    Baseline : TDU [Ishikawa+, IROS21]     73.3  0.485
    Ours : TDP-MAT
      W/o MAT                              72.5  3.55
      W/o MAT + Smaller learning rate      74.4  0.831
      W/o CLIP & Perceiver                 74.1  1.47
      W/o Pretraining                      73.1  2.24
      Full                                 75.3  0.691
    +1.2
    - CLIP Encoders, Perceiver Module
    - Cross Attention
  20. Condition                              Acc [%] ↑

    Baseline : TDU [Ishikawa+, IROS21]     73.3  0.485
    Ours : TDP-MAT
      W/o MAT                              72.5  3.55
      W/o MAT + Smaller learning rate      74.4  0.831
      W/o CLIP & Perceiver                 74.1  1.47
      W/o Pretraining                      73.1  2.24
      Full                                 75.3  0.691
    +2.2
    - TDU
  21. ✓ ✓ $L$, $N$: $\mathbb{R}^{L \times D}$, $\mathbb{R}^{N \times E}$; $\mathbb{R}^{L \times D}, \mathbb{R}^{N \times D} \to \mathbb{R}^{L \times N}$

    $\mathbb{R}^{L \times D}$; $\mathbb{R}^{L \times D}, \mathbb{R}^{L \times D} \to \mathbb{R}^{L \times L}$