Slide 1

Slide 1 text

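ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs
Wenpeng Yin, Hinrich Schütze, Bin Xiang, Bowen Zhou
TACL 2016
Note: all figures and tables in these slides are taken from the paper.
Mamoru Komachi
8th State-of-the-Art NLP Study Group @ University of Tokyo, 2016/09/12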

Slide 2

Slide 2 text

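Sentence-pair modeling is needed for answer selection, paraphrase identification, and textual entailment recognition
Note: two sentences can differ in meaning even when their surface forms are similar.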

Slide 3

Slide 3 text

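Problems with sentence-pair modeling
- Systems are optimized for one specific task
  (we do not want to optimize separately for every task)
- Sentence representations are modeled individually, without considering the pair
  (humans do consider the pair)
- Hand-designed, task-specific features
  (we do not want to do feature engineering by hand)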

Slide 4

Slide 4 text

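ABCNN: Attention Based Convolutional Neural Network
- Systems are optimized for one specific task
  → Applicable to a variety of tasks that require sentence-pair modeling
- Sentence representations are modeled individually, without considering the pair
  → Takes the sentence pair into account
- Hand-designed, task-specific features
  → State of the art on answer selection, paraphrase identification, and textual entailment recognition without any feature design
https://github.com/yinwenpeng/Answer_Selection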

Slide 5

Slide 5 text

CNNs in natural language processing
- Widely used for sentence (pair) classification tasks (Kalchbrenner et al., 2014; Kim, 2014; Socher et al., 2011; Yin and Schütze 2015)
- Can robustly capture abstract features of the input (or so it is believed)

Slide 6

Slide 6 text

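Why a CNN rather than an LSTM?
- An LSTM encodes the whole context; a CNN detects local phrases (through its filters)
  → Attention acts differently in the two, and it works more effectively with a CNN (the CNN, which looks locally, and attention, which looks globally, complement each other)
- Convolution layers can be stacked to raise the level of abstraction
  → Especially effective for answer selection and textual entailment recognition (tasks where the non-synonymous parts of a sentence pair, rather than the synonymous ones, should be detected)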

Slide 7

Slide 7 text

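Basics of BCNN
- Siamese architecture
- Average pooling over the whole sentence (all-ap) and average pooling within a window (w-ap)
  → all of these are passed to the output layer
- Word embeddings are initialized with word2vec

$p_i = \tanh(W \cdot c_i + b)$
p: phrase embedding
c: concatenation of word embeddings
W: shared convolution weights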
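The convolution and the two pooling variants on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the list-of-columns layout, and the zero padding on both sides (a wide convolution) are assumptions made for the example.

```python
import math


def wide_convolution(words, W, b, w):
    """Apply p_i = tanh(W . c_i + b) to every window of w word vectors.

    words: list of s word embeddings, each a list of d0 floats.
    W:     d1 rows of length w * d0 (shared convolution weights).
    b:     list of d1 biases.
    Returns s + w - 1 phrase embeddings (zero padding is an assumption).
    """
    d0 = len(words[0])
    zero = [0.0] * d0
    padded = [zero] * (w - 1) + words + [zero] * (w - 1)
    phrases = []
    for i in range(len(words) + w - 1):
        # c_i: concatenation of w consecutive (padded) word embeddings
        c = [x for vec in padded[i:i + w] for x in vec]
        p = [math.tanh(sum(wk * ck for wk, ck in zip(row, c)) + bk)
             for row, bk in zip(W, b)]
        phrases.append(p)
    return phrases


def window_average_pool(columns, w):
    """w-ap: average each run of w consecutive columns."""
    d = len(columns[0])
    return [[sum(col[k] for col in columns[j:j + w]) / w for k in range(d)]
            for j in range(len(columns) - w + 1)]


def all_average_pool(columns):
    """all-ap: average over all columns, giving one vector per sentence."""
    d = len(columns[0])
    return [sum(col[k] for col in columns) / len(columns) for k in range(d)]
```

With s words and window width w, the convolution yields s + w - 1 columns, and w-ap brings that back to s columns, so another convolution layer can be stacked on top.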

Slide 8

Slide 8 text

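Use attention to take into account which parts of the other sentence to look at

$A_{i,j} = \text{match-score}(F_{0,r}[:, i], F_{1,r}[:, j])$
$F_{0,a} = W_0 \cdot A^{\top}$
$F_{1,a} = W_1 \cdot A$

- $F_{0,a}$ and $F_{1,a}$ are the attention feature maps
  → $W_0$ and $W_1$ are learned
- match-score uses the Euclidean distance: $1 / (1 + |x - y|)$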
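The attention matrix and the attention feature maps can be sketched as follows. This is a rough illustration under stated assumptions: feature maps are passed as lists of columns (one vector per word), W0 has shape d x s1 and W1 has shape d x s0 so that the products are defined for sentences of different lengths, and all function names are made up for the example.

```python
import math


def match_score(x, y):
    """1 / (1 + |x - y|), with |.| the Euclidean distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)


def attention_matrix(F0_cols, F1_cols):
    """A[i][j] = match-score(column i of sentence 0, column j of sentence 1)."""
    return [[match_score(u, v) for v in F1_cols] for u in F0_cols]


def transpose(X):
    return [list(row) for row in zip(*X)]


def matmul(X, Y):
    """Plain matrix product of two row-major matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def attention_feature_maps(A, W0, W1):
    """F_{0,a} = W0 . A^T (d x s0) and F_{1,a} = W1 . A (d x s1)."""
    return matmul(W0, transpose(A)), matmul(W1, A)
```

Identical columns get a score of 1, and the score decays toward 0 as the two vectors move apart, so each row of A shows where one word of sentence 0 finds close matches in sentence 1.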

Slide 9

Slide 9 text

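ABCNN-2: apply attention to the output rather than the input

$F^{p}_{i,r}[:, j] = \sum_{k=j:j+w} a_{i,k} F^{c}_{i,r}[:, k]$

Attention weights are applied to the output of the convolution (the attention matrix is summed over rows or over columns, which aggregates it into one weight per unit).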
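A minimal sketch of this attention-weighted pooling, assuming the attention matrix A is already available as a list of rows and the convolution output as a list of columns. The function names are illustrative, and the window k = j:j+w is read here as w consecutive columns, which is an interpretation rather than something the slide states.

```python
def attention_weights(A):
    """Collapse the attention matrix into one weight per unit.

    Row sums give the weights for sentence 0, column sums for sentence 1.
    """
    a0 = [sum(row) for row in A]
    a1 = [sum(col) for col in zip(*A)]
    return a0, a1


def attention_pool(conv_cols, a, w):
    """Pooled column j = sum of a[k] * conv_cols[k] over a window of w columns.

    conv_cols: convolution output as a list of columns (each a list of floats).
    a:         one attention weight per convolution column.
    """
    d = len(conv_cols[0])
    return [[sum(a[k] * conv_cols[k][dim] for k in range(j, j + w))
             for dim in range(d)]
            for j in range(len(conv_cols) - w + 1)]
```

Compared with plain w-ap, columns that match the other sentence well contribute more to the pooled representation.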

Slide 10

Slide 10 text

ABCNN-3: stacking ABCNN-1 and ABCNN-2
- ABCNN-1 can capture word-level correspondences
- ABCNN-2 can form phrase-level abstractions

Slide 11

Slide 11 text

Evaluated on answer selection, paraphrase detection, and textual entailment recognition
- Adagrad + L2 regularization + layerwise training
- Hyperparameters are tuned on the dev set
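For reference, one Adagrad step with an L2 penalty looks roughly like this. It is the textbook update rule, not the paper's training code: the function name and the default values of `lr`, `l2`, and `eps` are placeholders, and layerwise training is not shown.

```python
import math


def adagrad_l2_step(theta, grad, accum, lr=0.1, l2=1e-4, eps=1e-8):
    """One Adagrad update on a flat parameter list with L2 regularization.

    theta: current parameters; grad: gradient of the unregularized loss;
    accum: running sum of squared gradients (start with zeros).
    Default hyperparameter values are placeholders, not the paper's settings.
    """
    new_theta, new_accum = [], []
    for t, g, acc in zip(theta, grad, accum):
        g = g + l2 * t              # gradient of the (l2 / 2) * t**2 penalty
        acc = acc + g * g           # accumulate squared gradients
        new_theta.append(t - lr * g / (math.sqrt(acc) + eps))
        new_accum.append(acc)
    return new_theta, new_accum
```

Each parameter gets its own effective learning rate, which shrinks as its squared gradients accumulate.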

Slide 12

Slide 12 text

Datasets
- WikiQA
  - Open-domain QA dataset
  - train : dev : test = 20,360 : 1,130 : 2,352
- MSRP (Microsoft Research Paraphrase)
  - train = 2,753 T : 1,323 F, test = 1,147 T : 578 F; 400 examples are sampled at random from the training set as dev data
- SICK
  - Three-class classification: entailment, contradiction, neutral
  - train : dev : test = 4,439 : 495 : 4,906

Slide 13

Slide 13 text

Answer selection: BCNN already exceeds the state of the art
Refining the attention keeps improving accuracy (although making the network deeper changes almost nothing)

Slide 14

Slide 14 text

Paraphrase identification: BCNN is slightly worse than the state of the art
As with answer selection, making the network deeper changes almost nothing

Slide 15

Slide 15 text

Textual entailment recognition: BCNN also achieves the state of the art
Note: without the linguistic features, the accuracy of ABCNN-3 is 84.6 (about the same as the previous state of the art)

Slide 16

Slide 16 text

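Visualization of unigrams in the textual entailment recognition task
Attention falls on positions that are similar at the word level (if there is no similar word, nothing is attended to)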

Slide 17

Slide 17 text

Visualization of the first CNN layer (length 3) in the textual entailment recognition task
Is it solving a kind of coreference problem? (Attention links "on it" and "building")

Slide 18

Slide 18 text

Visualization of the second CNN layer (length 5) in the entailment recognition task
The picture is blurry overall, but long phrases are extracted (several murals on it ~= of a colorful building)

Slide 19

Slide 19 text

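Summary: attention integrated into a CNN-based sentence-pair model
- Adding attention is effective
- One layer versus two layers makes little difference
- Linguistic features are also effective (although ABCNN can reach the state of the art even without linguistic features)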