Appendix: Difference between OSMI and RES
• Sentence
– RES: "the candle on the right"
– OSMI: "Go to the dining table. Then pick up the candle on the right."
Even the latest segmentation model,
SEEM [Zou+, 23], has difficulty with OSMI tasks,
e.g. "Pick up the plant in front of the mirror."