Slide 1

NAACL HLT 2016 (Best Paper)
Paper reading group @ Kasuga Area, Saturday, July 16, 2016
Presenter: himkt
* All figures in these slides are taken from the paper.

Slide 2

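Overview
• Proposes a single model that handles several kinds of question-answering tasks, over:
  • images
  • structured knowledge bases
• Parses the question and dynamically builds a matching network from it (Dynamic Neural Module Network)
• Uses reinforcement learning to learn the parameters that choose the network layout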

Slide 3

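Overview
[Figure from the paper: overview of the model in three stages, labeled 1 (network layout), 2 (module inventory), 3 (producing an answer)]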

Slide 4

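Overview - 1. Network Layout
• Parse the question with a dependency parser (Stanford Dependency Parser)
• From the parse, enumerate the candidate network structures (layouts) it admits
• Choose the network by evaluating the conditional probability of each candidate network given the question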
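The enumeration step can be pictured with a small Python sketch. It assumes, as a simplification, that the parse has already been reduced to a list of module fragments written as strings such as `find[city]`, and that every non-empty subset of fragments is joined with `and` under a given top-level module. The function name `enumerate_layouts` and the fragment strings are hypothetical; how the paper extracts fragments from the Stanford parse is not reproduced here.

```python
from itertools import combinations


def enumerate_layouts(fragments, root):
    """Hypothetical simplification of candidate-layout enumeration.

    Every non-empty subset of parse fragments becomes one candidate:
    a single fragment is used directly, several are joined with `and`,
    and the result is wrapped in the top-level module `root`.
    """
    layouts = []
    for k in range(1, len(fragments) + 1):
        for subset in combinations(fragments, k):
            body = subset[0] if k == 1 else "and(" + ", ".join(subset) + ")"
            layouts.append(f"{root}({body})")
    return layouts


if __name__ == "__main__":
    # Hypothetical fragments for "Are there any cities in Georgia?"
    for layout in enumerate_layouts(["find[city]", "relate[in](lookup[georgia])"], "exists"):
        print(layout)
    # exists(find[city])
    # exists(relate[in](lookup[georgia]))
    # exists(and(find[city], relate[in](lookup[georgia])))
```

With n fragments this yields 2^n - 1 strings, each one candidate z_i for the layout model to score, so the model ranks a short list instead of searching over arbitrary networks.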

Slide 5

Overview - 2. Module inventory
[Figure from the paper: model overview with stages labeled 1, 2, 3]

Slide 6

Module inventory
• Six kinds of functions, called modules
• Each module outputs either an Attention or a Label
  • Attention: a weighting over pixels
  • Label: true/false or a word from the lexicon (e.g. “bird”)
• Each module has "type" constraints on its arguments and its output
  • Lookup :: input -> Attention
  • Find :: input -> Attention
  • Relate :: Attention -> Attention
  • And :: Attention* -> Attention
  • Describe :: Attention -> Labels
  • Exists :: Attention -> Labels
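These type constraints are enough to reject ill-formed networks before running them. A minimal Python sketch, assuming layouts are encoded as nested tuples whose first element is a module name with an optional `[word]` argument (a hypothetical encoding, not the paper's), checks a layout against the six signatures:

```python
# Signatures from the slide: (argument type, arity or None for "one or more",
# output type). Lookup and Find take no module arguments, only a word.
SIGNATURES = {
    "lookup": (None, 0, "Attention"),
    "find": (None, 0, "Attention"),
    "relate": ("Attention", 1, "Attention"),
    "and": ("Attention", None, "Attention"),
    "describe": ("Attention", 1, "Labels"),
    "exists": ("Attention", 1, "Labels"),
}


def output_type(layout):
    """Return the output type of a layout tree, or raise TypeError if ill-typed."""
    name, *children = layout
    module = name.split("[", 1)[0]  # "find[bird]" -> "find"
    if module not in SIGNATURES:
        raise TypeError(f"unknown module {module!r}")
    arg_type, arity, out = SIGNATURES[module]
    if arity is None and not children:
        raise TypeError(f"{module} needs at least one argument")
    if arity is not None and len(children) != arity:
        raise TypeError(f"{module} takes {arity} argument(s), got {len(children)}")
    for child in children:
        child_type = output_type(child)
        if child_type != arg_type:
            raise TypeError(f"{module} expects {arg_type}, got {child_type}")
    return out


if __name__ == "__main__":
    print(output_type(("describe[color]", ("find[bird]",))))  # Labels
    try:
        output_type(("relate[in]", ("exists", ("find[state]",))))
    except TypeError as err:
        print("rejected:", err)  # relate expects Attention, got Labels
```

Checking bottom-up mirrors how the types compose: a Labels-producing module can only sit at the root, because no module accepts Labels as an argument.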

Slide 7

Attention
• find :: input -> Attention
• Outputs a region of the image (a set of pixels) as an attention
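To make "outputs a set of pixels" concrete: find produces a soft attention, one weight per image position, with the weights summing to 1. The Python sketch below swaps in a plain dot product between a word vector and each position's feature vector, followed by a softmax. The paper's find module uses a learned scoring network instead, and the vectors here are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def find(word_vec, region_features):
    """Simplified find[word]: score each image position against the word
    vector with a dot product, then normalize the scores into an attention."""
    scores = [sum(w * f for w, f in zip(word_vec, feat)) for feat in region_features]
    return softmax(scores)


if __name__ == "__main__":
    bird = [1.0, 0.0]                               # hypothetical embedding of "bird"
    regions = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]  # hypothetical features of 3 positions
    print(find(bird, regions))  # positions 0 and 2 get most of the weight
```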

Slide 8

Overview - 3. Produce an answer
[Figure from the paper: model overview with stages labeled 1, 2, 3]

Slide 9

Produce an answer
• What color is the bird? -> (describe[color] find[bird]) -> black and white (lexicon)
• Are there any states? -> (exists find[state]) -> true
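The slides write layouts in two notations: a Lisp-style form such as `(describe[color] find[bird])` and a call form such as `and(find, relate(lookup))`. A small Python sketch, with hypothetical helper names `parse_layout` and `to_call`, turns the Lisp-style form into a nested tuple tree and prints that tree in call form:

```python
def tokenize(text):
    """Split a Lisp-style layout string into parentheses and module tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return a tuple tree."""
    if not tokens:
        raise ValueError("unexpected end of layout")
    tok = tokens.pop(0)
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok != "(":
        return (tok,)  # a leaf module such as find[bird]
    head = tokens.pop(0) if tokens else None
    if head in (None, "(", ")"):
        raise ValueError("expected a module name after '('")
    children = []
    while tokens and tokens[0] != ")":
        children.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # drop the closing ")"
    return (head, *children)


def parse_layout(text):
    """Parse a whole layout string, rejecting leftover tokens."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError(f"trailing tokens: {tokens}")
    return tree


def to_call(tree):
    """Render a tuple tree in call notation, e.g. describe[color](find[bird])."""
    head, *children = tree
    if not children:
        return head
    return head + "(" + ", ".join(to_call(c) for c in children) + ")"


if __name__ == "__main__":
    for text in ["(describe[color] find[bird])", "(exists find[state])"]:
        print(to_call(parse_layout(text)))
    # describe[color](find[bird])
    # exists(find[state])
```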

Slide 10

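Components
• Layout model: $p(z \mid x; \theta_\ell)$
  • Predicts the network structure
• Execution model: $p_z(y \mid w; \theta_e)$
  • Generates the answer
• Training
  • Learns the two parameter sets, $\theta_\ell$ and $\theta_e$, jointly
  • Reinforcement learning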

Slide 11

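Layout Model
• The conditional probability is the output of a softmax over the candidate layouts:
  $p(z_i \mid x; \theta_\ell) = \frac{e^{s(z_i \mid x)}}{\sum_{j=1}^{n} e^{s(z_j \mid x)}}$, with $s(z_i \mid x) = a^\top \sigma(B h_q(x) + C f(z_i) + d)$
• where:
  • $\theta_\ell = (a, B, C, d)$ are the parameters
  • $h_q(x)$ is the output of an LSTM over the question
  • $f(z_i)$ is a feature vector (embedding) of $z_i$, the i-th candidate network
  • $\sigma$ is an elementwise nonlinearity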
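The scoring function can be traced by hand on tiny inputs. The Python sketch below computes the score of each candidate as a one-hidden-layer network, a^T ReLU(B h_q(x) + C f(z_i) + d), then normalizes with a softmax. ReLU is an assumption for the nonlinearity; h_q(x) and each f(z_i) are taken as given vectors (the LSTM and the real layout features are not modeled), plain lists stand in for a tensor library, and all numbers are made up.

```python
import math


def matvec(M, v):
    """Matrix-vector product with a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def layout_scores(h_q, features, a, B, C, d):
    """s(z_i | x) = a^T relu(B h_q(x) + C f(z_i) + d) for every candidate z_i."""
    Bh = matvec(B, h_q)  # the question term is shared by all candidates
    scores = []
    for f in features:
        pre = [bh + cf + di for bh, cf, di in zip(Bh, matvec(C, f), d)]
        hidden = [max(0.0, p) for p in pre]  # ReLU (assumed nonlinearity)
        scores.append(sum(ai * hi for ai, hi in zip(a, hidden)))
    return scores


def layout_probs(scores):
    """p(z_i | x) = exp(s_i) / sum_j exp(s_j), computed stably."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    scores = layout_scores(
        h_q=[1.0, 0.0],                     # hypothetical question encoding
        features=[[1.0, 0.0], [0.0, 1.0]],  # hypothetical f(z_1), f(z_2)
        a=[1.0, -1.0], B=identity, C=identity, d=[0.0, 0.0],
    )
    print(scores)                # [2.0, 0.0]
    print(layout_probs(scores))  # about [0.881, 0.119]
```

Computing B h_q(x) once and reusing it reflects that only the C f(z_i) term differs between candidates for a given question.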

Slide 12

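Execution Model
• The model that generates the answer: $p_z(y \mid w) = (\llbracket z \rrbracket_w)_y$, the entry for answer $y$ in the output of network $z$ run on the world representation $w$
• Once a module's inputs are known, i.e. $z = m(h^1, h^2)$ (e.g. and(find, relate(lookup))), the output can be written recursively as $\llbracket z \rrbracket_w = \llbracket m \rrbracket(\llbracket h^1 \rrbracket_w, \llbracket h^2 \rrbracket_w)$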
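The recursive evaluation can be sketched in Python over a toy structured world. The world (entities, categories, the `in` relation) and the module bodies are hypothetical stand-ins: attentions are lists of weights over entities, `and` multiplies attentions elementwise, and `exists` turns the largest attention weight into a probability of "true". The paper's modules are learned; only the bottom-up recursion and the final lookup of answer y in the network's output follow the slide.

```python
# Hypothetical toy world w: entities, their categories, and one relation.
WORLD = {
    "entities": ["atlanta", "savannah", "georgia", "boston"],
    "categories": {"atlanta": {"city"}, "savannah": {"city"},
                   "georgia": {"state"}, "boston": {"city"}},
    "relations": {"in": {("atlanta", "georgia"), ("savannah", "georgia")}},
}


def split_name(name):
    """'find[city]' -> ('find', 'city'); 'and' -> ('and', None)."""
    if "[" in name:
        module, word = name.split("[", 1)
        return module, word.rstrip("]")
    return name, None


def evaluate(z, w):
    """[[m(h1, ..., hk)]]_w = [[m]]([[h1]]_w, ..., [[hk]]_w), evaluated bottom-up."""
    name, *children = z
    module, word = split_name(name)
    inputs = [evaluate(child, w) for child in children]
    ents = w["entities"]
    if module == "lookup":  # one-hot attention on a named entity
        return [1.0 if e == word else 0.0 for e in ents]
    if module == "find":    # attention on entities of a category (toy, not learned)
        return [1.0 if word in w["categories"][e] else 0.0 for e in ents]
    if module == "relate":  # move attention along a relation: x collects h[y] if (x, y) holds
        (h,) = inputs
        rel = w["relations"][word]
        return [sum(h[j] for j, y in enumerate(ents) if (x, y) in rel) for x in ents]
    if module == "and":     # intersect attentions elementwise
        out = inputs[0]
        for h in inputs[1:]:
            out = [a * b for a, b in zip(out, h)]
        return out
    if module == "exists":  # toy distribution over the labels true/false
        (h,) = inputs
        p_true = min(1.0, max(h))
        return {"true": p_true, "false": 1.0 - p_true}
    raise ValueError(f"unknown module {module!r}")


def p_answer(z, y, w):
    """p_z(y | w) = ([[z]]_w)_y: the entry for answer y in the network's output."""
    return evaluate(z, w)[y]


if __name__ == "__main__":
    # exists(and(find[city], relate[in](lookup[georgia])))
    z = ("exists", ("and", ("find[city]",), ("relate[in]", ("lookup[georgia]",))))
    print(p_answer(z, "true", WORLD))  # 1.0
```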

Slide 13

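Training
• Reinforcement learning
  • Sample $z$ from $p(z \mid x; \theta_\ell)$
  • Once the network is fixed, update $\theta_e$ by directly maximizing $\log p(y \mid z, w; \theta_e)$
  • Update $\theta_\ell$ with a policy gradient method
  • Gradient of the expected reward: $\nabla J(\theta_\ell) = \mathbb{E}\big[\nabla \log p(z \mid x; \theta_\ell) \cdot r\big]$, where the reward is $r = \log p(y \mid z, w; \theta_e)$, i.e. $\nabla J(\theta_\ell) = \mathbb{E}\big[\nabla \log p(z \mid x; \theta_\ell) \cdot \log p(y \mid z, w; \theta_e)\big]$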
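The policy-gradient update for the layout parameters can be sketched in a few lines of Python. To stay self-contained, the sketch makes two simplifying assumptions: the layout parameters are reduced to one logit per candidate layout (no question encoding), and the rewards r = log p(y | z, w) are fixed numbers, as if the execution model were frozen. The single-sample update uses the fact that the gradient of log softmax(logits)[z] is the one-hot vector for z minus the current probabilities, with no baseline, matching the gradient above. The function name `reinforce` and the reward values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on layout logits (a stand-in for theta_l).

    rewards[i] plays the role of r = log p(y | z_i, w; theta_e) for candidate z_i
    and is held fixed here (hypothetical frozen execution model).
    Returns the final layout distribution p(z | x).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(len(logits)), weights=probs)[0]  # z ~ p(z | x)
        r = rewards[z]
        # Single-sample estimate of E[grad log p(z | x) * r], applied as ascent;
        # d log softmax(logits)[z] / d logits[k] = 1[k == z] - probs[k].
        for k in range(len(logits)):
            logits[k] += lr * r * ((1.0 if k == z else 0.0) - probs[k])
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical answer likelihoods p(y | z_i, w) for three candidate layouts.
    rewards = [math.log(0.9), math.log(0.2), math.log(0.05)]
    print(reinforce(rewards))  # most of the mass should end up on layout 0
```

Every reward here is negative, so each sampled layout's own logit is pushed down, but worse layouts are pushed down harder: in expectation logit k moves by p_k (r_k - r̄), where r̄ is the average reward under the current distribution, so probability shifts toward the layout whose answers are most likely.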

Slide 14

Experimental result
• State of the art on VisualQA (Table 1) and GeoQA (Table 2)
  • VisualQA: images
  • GeoQA: structured domains
• Demonstrates that one model can handle several kinds of question-answering tasks