[IROS23] Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Semantic Machine Intelligence Lab., Keio Univ.


  2. Motivation:
    Mitigating labor-intensive data collection by domain transfer
    • Time-consuming: e.g., multiple environments, arranging physical objects
    [Figure: transfer from simulation (x8) to the real world]


  3. Task:
    Multimodal Language Understanding for Fetching Instruction
    Example instructions: “Look in the left wicker vase next to the potted plant”,
    “Grasp the glass in the sink”
    Binary classification for each object (Pos. / Neg.)
    [Figure: simulation-to-real transfer; each candidate object is labeled Pos. or Neg.]
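
    As a minimal sketch of this formulation (hypothetical names and feature
    dimensions, not the paper's actual architecture): the model scores each
    candidate object against the instruction and predicts Pos./Neg.
    independently per object.

        import torch
        import torch.nn as nn

        class FetchingClassifier(nn.Module):
            """Hypothetical per-object head: instruction + object features -> one logit."""
            def __init__(self, text_dim=768, img_dim=512, hidden=256):
                super().__init__()
                self.head = nn.Sequential(
                    nn.Linear(text_dim + img_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, 1),
                )

            def forward(self, text_feat, obj_feat):
                # text_feat: (B, text_dim) instruction embedding
                # obj_feat:  (B, img_dim) features of one candidate object
                return self.head(torch.cat([text_feat, obj_feat], dim=-1)).squeeze(-1)

        # Each object in the scene is scored independently; sigmoid(logit) > 0.5 -> Pos.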

    View Slide

  4. Contribution:
    Prototypical Contrastive Transfer Learning (PCTL)
    for multimodal language understanding
    • Introduce domain transfer to multimodal language understanding
    • Extend prototypical contrastive loss for classification problems
      in two domains (sketched below)
    Related work:
    • MCDDA [Saito+, CVPR’18]: domain transfer for a single-modality (vision) task
    • PCL [Li+, ICLR’21]: PCTL performs domain transfer based on contrastive
      learning, inspired by PCL
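
    For reference, the prototype-level term of PCL [Li+, ICLR’21] is an
    InfoNCE loss over cluster prototypes; roughly (a sketch of the cited
    idea, not PCTL’s exact extension), with feature v_i, its assigned
    prototype c_s, concentration phi, and r prototypes:

        \mathcal{L}_{\mathrm{ProtoNCE}}
          = -\sum_{i} \log
            \frac{\exp(v_i \cdot c_s / \phi_s)}
                 {\sum_{j=1}^{r} \exp(v_i \cdot c_j / \phi_j)}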


  7. PCTL: Alleviate the domain gap by contrastive learning between two domains
    [Figure: feature vectors and cluster centroids from the simulation and
    real-world domains, with a contrastive objective drawing them together.
    Example instructions: “Clean the top-left picture above TV”,
    “Pick up the glass in the sink”]
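
    As a rough illustration of this idea (not the paper’s exact formulation;
    the function name, clustering step, and temperature below are assumptions):
    cluster the pooled features of both domains, then pull each feature toward
    its assigned centroid and away from all other centroids.

        import torch
        import torch.nn.functional as F

        def proto_contrastive_loss(feats, assign, centroids, temperature=0.1):
            """InfoNCE between L2-normalized features and cluster centroids.

            feats:     (N, D) feature vectors from both domains, concatenated
            assign:    (N,)   index of each feature's assigned centroid
            centroids: (K, D) centroids pooled over simulation and real-world
            """
            feats = F.normalize(feats, dim=-1)
            centroids = F.normalize(centroids, dim=-1)
            logits = feats @ centroids.t() / temperature  # (N, K) similarities
            return F.cross_entropy(logits, assign)        # positive = own centroid

        # Usage sketch: pooling both domains' features before clustering means
        # each sample is also contrasted against the other domain's centroids.
        sim_feats = torch.randn(32, 128)
        real_feats = torch.randn(32, 128)
        feats = torch.cat([sim_feats, real_feats])
        centroids = torch.randn(10, 128)          # e.g., from k-means each epoch
        assign = torch.randint(0, 10, (64,))      # hypothetical assignments
        loss = proto_contrastive_loss(feats, assign, centroids)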


  8. Qualitative results: Correct prediction by PCTL
    Simulation: ALFRED [Shridhar+, CVPR’20], #samples: 34286
    e.g., “Pick up the tissue box on the desk”
    Real-world: REVERIE [Qi+, CVPR’20], #samples: 10342
    e.g., “Go down the stairs to the lower balcony area and turn off the lamp
    on the dresser.”
    [Figure: transfer from a simulation example to a real-world example]


  9. Quantitative results:
    Outperformed both Target Domain Only (+5.1 points, i.e., improvement from
    domain transfer) and the existing method MCDDA+ (+3.2 points)

    Methods                  | Train      | Test | Acc. [%]
    Target Domain Only       | Real       | Real | 73.0 ± 1.87
    MCDDA+ [Saito+, CVPR’18] | Sim → Real | Real | 74.9 ± 3.94
    PCTL (Ours)              | Sim + Real | Real | 78.1 ± 2.49


  11. Summary:
    Prototypical Contrastive Transfer Learning (PCTL)
    Motivation:
    Mitigating labor-intensive data collection by domain transfer
    Novelty:
    • Introduce domain transfer to multimodal language understanding
    • Extend prototypical contrastive loss for classification problems in two domains
    Result:
    Outperformed the target-domain-only condition and an existing domain
    transfer method
