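[Figure: video moment retrieval. Search query: “… man is seen speaking to the camera and holding up a paper.” On the time axis, the segment corresponding to the query is highlighted.] [Otani et al., “Uncovering hidden challenges in query-based video moment retrieval,” BMVC 2020]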
[… Answer: Overcoming Priors for Visual Question Answering,” CVPR 2018] [Otani et al., “Uncovering hidden challenges in query-based video moment retrieval,” BMVC 2020]
VQA task / Video moment retrieval task
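In moment retrieval the model returns a time segment, which is compared with the ground-truth segment for the query. This comparison is commonly scored with temporal IoU and Recall@1 at an IoU threshold (e.g. 0.5). The following is a minimal sketch, assuming segments are (start, end) pairs in seconds. The function names are hypothetical and not taken from Otani et al.

```python
def temporal_iou(pred, gt):
    """IoU of two time segments, each given as (start, end)."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))  # overlap length, 0 if disjoint
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds, gts, threshold=0.5):
    """Fraction of queries whose top-1 predicted segment reaches IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Usage: the first prediction overlaps its target well, the second misses entirely.
score = recall_at_1([(0, 10), (20, 30)], [(2, 10), (0, 5)])
```

Because the metric uses only segment boundaries, it cannot tell whether a prediction came from the video or from a prior over typical segment positions.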
[Figure: a Vision and Language model takes an image v and a text t, e.g. the question “Is the person wearing shorts?” (template “… Is the …?”), and outputs y = “No”. The model is trained on a dataset of such (v, t, y) triples.]
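The “No” answer in the VQA example can often be produced from the question alone. A question-only (“blind”) baseline makes this language prior concrete. The following is a minimal sketch, assuming the question type is approximated by its first three words. The toy QA pairs and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def question_prefix(question, n_words=3):
    """Hypothetical question 'type': the first n lowercased words."""
    return " ".join(question.lower().rstrip("?").split()[:n_words])


def fit_blind_baseline(qa_pairs, n_words=3):
    """Map each question prefix to its most frequent training answer; no image is used."""
    counts = defaultdict(Counter)
    for question, answer in qa_pairs:
        counts[question_prefix(question, n_words)][answer] += 1
    return {prefix: c.most_common(1)[0][0] for prefix, c in counts.items()}


def predict(model, question, fallback="yes", n_words=3):
    """Answer from the question prefix alone, falling back to a default answer."""
    return model.get(question_prefix(question, n_words), fallback)


# Hypothetical toy training data with a skewed answer distribution.
train = [
    ("Is the person wearing a hat?", "no"),
    ("Is the person smiling?", "no"),
    ("Is the person sitting?", "yes"),
    ("What color is the car?", "red"),
    ("What color is the bus?", "red"),
]
model = fit_blind_baseline(train)
answer = predict(model, "Is the person wearing shorts?")  # "no", without looking at any image
```

If such a baseline scores well on a benchmark, the benchmark rewards the text → output correlation rather than visual grounding.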
[Figure: graph over the input x, a feature related to the output y, and a feature unrelated to the output c]
$p(y, c) = p(y)\,p(c)$
$p(y \mid c) = p(y' \mid c)$ for $c \in \mathcal{C}' \subset \mathcal{C}$
y and c have no direct dependency on each other, but they are dependent via x.
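One reading of the figure, which is an assumption here, is that the input x is generated from both y and c (y → x ← c). Under that reading, y and c are independent on their own, as in $p(y, c) = p(y)\,p(c)$, yet become dependent once x is observed. The following toy simulation checks both statements. The generative model x = y + c + noise is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=100_000, noise=0.1, seed=0):
    rng = random.Random(seed)
    ys, cs, xs = [], [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # output-related feature
        c = rng.randint(0, 1)  # output-unrelated feature, drawn independently of y
        x = y + c + rng.gauss(0.0, noise)  # hypothetical input built from both
        ys.append(y)
        cs.append(c)
        xs.append(x)
    return ys, cs, xs


ys, cs, xs = simulate()
marginal = pearson(ys, cs)  # close to 0: y and c are independent

# Condition on the input: keep only samples whose x is near 1.
sel = [i for i, x in enumerate(xs) if abs(x - 1.0) < 0.3]
conditional = pearson([ys[i] for i in sel], [cs[i] for i in sel])  # strongly negative
```

Within the selected samples, knowing c determines y, so a model reading x can pick up c as a cue even though c alone says nothing about y.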
… and answer: Overcoming priors for visual question answering,” CVPR 2018
• Hendrycks and Dietterich, “Benchmarking neural network robustness to common corruptions and perturbations,” ICLR 2019
• Hendrycks et al., “Natural adversarial examples,” CVPR 2021
• Out-of-distribution detection
  • Hendrycks and Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” ICLR 2017
  • Hein et al., “Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem,” CVPR 2019
• Detection of features with spurious correlations
  • Wong et al., “Leveraging sparse linear layers for debuggable deep networks,” ICML 2021
  • Anders et al., “Finding and removing Clever Hans: Using explanation methods to debug and improve deep models,” Information Fusion, Vol. 77, 2022
  • Neuhaus et al., “Spurious features everywhere – Large-scale detection of harmful spurious features in ImageNet,” ICCV 2023
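The baseline of Hendrycks and Gimpel scores each input by the maximum softmax probability (MSP) of a trained classifier and treats low scores as signs of a likely misclassified or out-of-distribution input. The following is a minimal sketch, assuming raw logits are already available. The threshold of 0.5 is hypothetical and would normally be chosen on validation data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability: high for confident, in-distribution-like inputs."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag an input as out-of-distribution when its MSP falls below the threshold."""
    return msp_score(logits) < threshold


confident = msp_score([10.0, 0.0, 0.0])  # close to 1
uncertain = msp_score([1.0, 1.0, 1.0])   # exactly 1/3 for three tied classes
```

As the title of Hein et al. points out, ReLU networks can produce high-confidence predictions far from the training data. An MSP threshold can therefore miss such inputs.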
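… “What makes training multi-modal classification networks hard?” CVPR 2020
• In Vision and Language, this is probably the text modality
• Shah et al., “The pitfalls of simplicity bias in neural networks,” NeurIPS 2020
• Especially when the inputs are an image and a text:
  • Text is discrete, so correlations between input and output are easy to find
  • For images, the concepts in the image must first become recognizable
  • Once the text → output correlation has been learned, training appears to converge there
  • Perhaps a phenomenon like grokking could occur
• Power et al., “Grokking: Generalization beyond overfitting on small algorithmic datasets,” ICLR Workshop 2021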
[Figure: train a model, then evaluate gender, skin tone, age, and ethnicity bias; generated images vs. real images]