Slide 14
Experiments
• Evaluation benchmarks
• VL-Checklist (VLC) (Zhao et al., arXiv)
• A positive and a negative caption per image: $(C_{pos}, C_{neg}, I)$
• Winoground (Thrush et al., CVPR 2022)
• 2 image-text pairs $(C_0, I_0)$ and $(C_1, I_1)$ whose captions contain the same words in a swapped order
• Attribution, Relation and Order (ARO) (Yuksekgonul et al., ICLR 2023)
• Select the correct caption for an image from 5 candidates, where the distractors alter relations, objects, or attributes
• Visual Spatial Reasoning (VSR) (Liu et al., TACL 2023)
• Judge whether the spatial relation described in the caption holds between the objects in the paired image
• ZS (various zero-shot tasks)
• 21 classification tasks from ELEVATER (Li et al., NeurIPS 2022)
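The benchmarks above are typically scored from image-text similarity. A minimal sketch, assuming a contrastive (CLIP-style) model whose image and caption embeddings are already available as non-zero float vectors; the helper names `cosine`, `vlc_correct`, `winoground_scores` and `pick_best` are hypothetical. The Winoground text/image/group rules follow Thrush et al.; treating VL-Checklist as a positive-vs-negative comparison and ARO/ZS as an argmax over candidates is an assumption about the protocol, not something stated on this slide. VSR is left out of this sketch.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (assumes neither is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vlc_correct(img: Vector, c_pos: Vector, c_neg: Vector) -> bool:
    """Assumed VL-Checklist rule: the positive caption must outscore the negative one."""
    return cosine(img, c_pos) > cosine(img, c_neg)


def winoground_scores(c0: Vector, i0: Vector, c1: Vector, i1: Vector) -> dict:
    """Winoground text, image and group scores for one example (two captions, two images)."""
    s = cosine
    # Text score: for each image, its own caption beats the other caption.
    text = s(c0, i0) > s(c1, i0) and s(c1, i1) > s(c0, i1)
    # Image score: for each caption, its own image beats the other image.
    image = s(c0, i0) > s(c0, i1) and s(c1, i1) > s(c1, i0)
    # Group score: both conditions hold at once.
    return {"text": text, "image": image, "group": text and image}


def pick_best(img: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate most similar to the image.

    Assumed usage: ARO multiple choice (candidate captions) or
    zero-shot classification (one text prompt embedding per class).
    """
    return max(range(len(candidates)), key=lambda k: cosine(img, candidates[k]))


# Toy usage with 2-D "embeddings":
print(vlc_correct([1.0, 0.0], [0.9, 0.1], [0.1, 0.9]))  # True
print(pick_best([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]))  # 2
```

Benchmark accuracy would then be the mean of these per-example booleans (or of `pick_best(...) == true_index`) over the dataset.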
[Figures: Winoground sample; VSR sample]