(Captioning, Q&A)
• Searching for music with words (Music search), etc.
• Like Vision & Language, a form of multimodal learning with language

Captioning: "It's a jazz piece with a slow tempo. Piano, muted trumpet, tenor sax, bass, and drums appear."
Q&A: Question: "Is this music a vocal song?" Answer: "No. It's an instrumental song."
Music search: "A jazz piece in which a muted trumpet and a tenor sax play the melody." Score: 0.91
• Cross-attention: a method specific to Transformer-based models. In M&L, the Query and the Key & Value are each obtained from a different modality.

[Perez 18] FiLM: Visual Reasoning with a General Conditioning Layer. Perez et al., AAAI 2018
https://vaclavkosar.com/ml/cross-attention-in-transformer-architecture
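
The cross-attention bullet can be illustrated with a minimal single-head sketch in NumPy. Everything here is an assumption for illustration: the slide does not say which modality supplies the Query, so text tokens are used as the Query and audio frames as the Key & Value; the names `cross_attention`, `w_q`, `w_k`, `w_v` and all dimensions are hypothetical; and random matrices stand in for learned projection weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    query_feats:   (n_q, d_q) features from one modality (assumed: text)
    context_feats: (n_c, d_c) features from the other modality (assumed: audio)
    """
    q = query_feats @ w_q      # Query comes from modality A
    k = context_feats @ w_k    # Key comes from modality B
    v = context_feats @ w_v    # Value comes from modality B
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_c)
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights

# Hypothetical sizes: 5 text tokens, 20 audio frames.
rng = np.random.default_rng(0)
d_text, d_audio, d_model = 16, 32, 8
text_tokens = rng.normal(size=(5, d_text))
audio_frames = rng.normal(size=(20, d_audio))

# Random stand-ins for learned projection matrices.
w_q = rng.normal(size=(d_text, d_model))
w_k = rng.normal(size=(d_audio, d_model))
w_v = rng.normal(size=(d_audio, d_model))

fused, attn = cross_attention(text_tokens, audio_frames, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` is a distribution over audio frames for one text token, so `fused` holds one audio-informed vector per text token. Swapping the two inputs would instead let audio frames attend to the text.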