[IROS24] Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models


  1. Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models

    Takayuki Nishimura, Katsuyuki Kuyo, Motonari Kambara, and Komei Sugiura (Keio University, Japan)
  2. Object Segmentation from Manipulation Instructions

    Instruction: “Go to the living room and pick up the pillow closest to the radio art on the wall.” [Figure labels: image, point clouds, instruction, segmentation mask; ×8]
  3. Even SOTA foundation models struggle with our task

    Instruction: “Walk to the living room and fetch me the leftmost pillow on the smaller white sofa, the pillow closest to the plant on the small table.” [Figure panels: Ours, EVF-SAM-2 [Zhang+, 24]]
  4. Proposed method: Polygon-based mask generation based on optimal transport

    Main novelty: Polygon Matching Loss based on Optimal Transport. [Figure labels: Our method, Existing methods, Predicted Mask, Ground Truth Mask; annotation: “Polygon’s vertex order must be the same”]
  5. Quantitative results: Our method outperformed baselines in all metrics

    | Model | mIoU↑ [%] | [email protected]↑ [%] | [email protected]↑ [%] |
    |---|---|---|---|
    | LAVT [Yang+, CVPR22] | 28.16±2.85 | 26.46±4.01 | 18.75±3.29 |
    | SeqTR [Zhu+, ECCV22] | 21.84±2.28 | 17.87±7.00 | 5.16±5.26 |
    | MDSM [Iioka+, IROS23] | 24.36±3.87 | 22.49±5.46 | 13.71±3.34 |
    | Ours | 38.16±2.46 | 48.85±2.70 | 22.29±3.32 |

    Annotated gains of Ours over the best baseline (LAVT): +10.00 in mIoU and +22.39 in the second metric.
  6. Qualitative results: Our method could identify the target object and generate an appropriate mask

    [Figure panels: Ours, LAVT, Ground Truth] ☺ Understood the target object. ☺ Appropriate mask.
  7. Please come to poster ThPI5T5

    Segment the target object from manipulation instructions by polygon matching using optimal transport
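
The polygon matching idea on slide 4 can be illustrated with a small sketch. The slides only state that the loss is based on optimal transport, so everything below is an assumption for illustration and not the authors' implementation: the squared Euclidean cost, the uniform vertex weights, the entropic (Sinkhorn) solver, the regularisation value `eps`, and the helper names `sinkhorn_plan`, `ot_polygon_loss`, and `ordered_l1_loss` are all hypothetical. The sketch compares an order-dependent vertex loss with an optimal transport loss when the ground-truth polygon is listed from a different starting vertex.

```python
import math

def pairwise_sq_dist(pred, gt):
    """Cost matrix: squared Euclidean distance between every vertex pair."""
    return [[(px - gx) ** 2 + (py - gy) ** 2 for gx, gy in gt] for px, py in pred]

def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropic OT plan with uniform marginals (assumed setup, Sinkhorn iterations)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on predicted vertices
    b = [1.0 / m] * m  # uniform mass on ground-truth vertices
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def ot_polygon_loss(pred, gt, eps=0.05, n_iters=200):
    """Transport cost between two vertex sets; independent of vertex order."""
    cost = pairwise_sq_dist(pred, gt)
    plan = sinkhorn_plan(cost, eps, n_iters)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(pred)) for j in range(len(gt)))

def ordered_l1_loss(pred, gt):
    """Order-dependent baseline: i-th predicted vertex vs. i-th ground-truth vertex."""
    total = sum(abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(pred)

# Toy polygons in normalised image coordinates (made-up values).
gt = [(0.2, 0.2), (0.8, 0.2), (0.8, 0.8), (0.2, 0.8)]
pred = [(0.22, 0.19), (0.79, 0.21), (0.81, 0.78), (0.18, 0.82)]
gt_shifted = gt[1:] + gt[:1]  # same polygon, different starting vertex

print("ordered L1, same order:   ", ordered_l1_loss(pred, gt))
print("ordered L1, shifted order:", ordered_l1_loss(pred, gt_shifted))
print("OT loss, same order:      ", ot_polygon_loss(pred, gt))
print("OT loss, shifted order:   ", ot_polygon_loss(pred, gt_shifted))
```

In this toy case the ordered loss grows sharply once the ground-truth vertices are shifted, while the transport-based loss stays essentially unchanged, because the transport plan pairs vertices by position rather than by index. A smaller `eps` brings the plan closer to a hard one-to-one matching at the cost of less stable iterations.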