
Learning Co-Substructures by Kernel Dependence Maximization (2017-08-04)

Sho Yokoi
August 04, 2017

ERATO感謝祭 SeasonIV

Transcript

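  1. Learning Co-Substructures by Kernel Dependence Maximization (IJCAI'17)

    Sho Yokoi (1,2), Daichi Mochihashi (3), Ryo Takahashi (1), Naoaki Okazaki (4), Kentaro Inui (1,2)
    1 Tohoku University, 2 RIKEN/AIP, 3 The Institute of Statistical Mathematics, 4 Tokyo Institute of Technology
    ERATO 感謝祭 SeasonIV, 2017-08-04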
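  2. Introduction: what we want to do

    [Figure: dependency tree of the sentence below and the extracted pair ⟨X have dinner, X be full⟩]
    • Raw sentence in the corpus: "Bob had dinner with his friends at his favorite Japanese restaurant just now and he is full." It contains many superfluous words.
    • We want to select automatically which words to include in the knowledge.
    • Commonsense knowledge we want to acquire: ⟨have dinner, be full⟩
    We regard this as the problem of determining rooted subtrees without supervision.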
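  3. Focus: acquiring and predicting pairs of linguistic expressions

    Acquiring and predicting related pairs of linguistic expressions is one of the central problems in NLP:
    • Collect related pairs of linguistic expressions from a corpus (knowledge acquisition)
    • Predict whether a given pair of linguistic expressions is related or not

    For example:
    • Relations between words: relations between concepts
      • Are the meanings close? Is there a hypernym-hyponym relation?
    • Relations between sentences: relations between propositions
      • Is there an entailment relation? A causal relation?
    • Relations between events (the example adopted in this talk)
      • We want to acquire and predict pairs of events that typically occur in succession [Schank&Abelson'77]
      • Example: ⟨have dinner, be full⟩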
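  4. Problem: the acquired pattern is fixed

    Typical process for acquiring and predicting event pairs [Chambers&Jurafsky'08]:
    1. Collect sentence pairs: gather from a corpus pairs of sentences that share a coreferent argument
       "Tom killed Nancy. The police arrested him immediately." (Tom and him are coreferent)
    2. Convert to a predetermined abstract representation: look at "the predicate verb and the position of the participant" → ⟨X kill, arrest X⟩
    3. Model: model the strength of association with PMI
       $$\mathrm{PMI}(X\ \text{kill},\ \text{arrest}\ X) = \log \frac{p(X\ \text{kill},\ \text{arrest}\ X)}{p(X\ \text{kill})\, p(\text{arrest}\ X)}$$

    At first glance this looks unproblematic:
    • Knowledge acquisition: human-readable knowledge ⟨X kill, arrest X⟩
    • Prediction: modeling with pointwise mutual information (PMI)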
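The PMI model in step 3 can be sketched with plain co-occurrence counts. This is a minimal illustration under assumptions: the function name `pmi_table`, the toy pairs, and the choice to return negative infinity for unseen pairs are not from the slides. It uses the count form $\log \frac{N \cdot c(x, y)}{c(x)\, c(y)}$, which equals the probability form above under maximum-likelihood estimates.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Build a PMI scorer from a list of observed (x, y) abstract event pairs."""
    n = len(pairs)
    c_xy = Counter(pairs)                    # joint counts c(x, y)
    c_x = Counter(x for x, _ in pairs)       # marginal counts c(x)
    c_y = Counter(y for _, y in pairs)       # marginal counts c(y)

    def pmi(x, y):
        if c_xy[(x, y)] == 0:
            # Unseen pair: the maximum-likelihood estimate is log 0.
            return float("-inf")
        return math.log(n * c_xy[(x, y)] / (c_x[x] * c_y[y]))

    return pmi


# Hypothetical toy data in the fixed <predicate, argument position> format.
pairs = [("X kill", "arrest X"), ("X kill", "arrest X"),
         ("X have", "fire X"), ("X have", "hire X")]
pmi = pmi_table(pairs)
print(pmi("X kill", "arrest X"))
print(pmi("X kill", "fire X"))
```

Because the score depends on exact matches of the abstract pair, any pair that was never observed verbatim gets no usable score, which is the sparsity issue the later slides address.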
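  5. Problem: the acquired pattern is fixed

    The method above is problematic in some cases [Granroth-Wilding&Clark'16]:
    • ⟨"Tom had had absent repeatedly.", "He was fired."⟩ → ⟨X have, fire X⟩
    • ⟨"Bob has a talent for accounting work.", "He was hired with favorable treatment."⟩ → ⟨X have, hire X⟩

    The ideal knowledge in these cases:
    • ⟨X have absent repeatedly, fire X⟩, ⟨X have talent, hire X⟩

    The information that is needed differs from instance to instance.
    In addition, the presence or absence of phrases such as the following can change the meaning of an event substantially:
    • Negation
    • Modifier clauses that express a specific condition
    • etc.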
  6. The problem we want to solve

    Determine, for each instance and without supervision, the substructures that make a pair a pair (that is, decide the "granularity" of the acquired knowledge without supervision).
    [Figure: pairs of dependency trees and the set Z of selected rooted-subtree pairs]
    Setting in this work: each sentence is represented as a dependency tree. (Adjusting the size of the rooted subtree adjusts the level of abstraction of the expression.)
  7. Formulation: dependence maximization

    Input: a set of sentence pairs $D = \{(s_i, t_i)\}_{i=1}^{n}$
    Output: a set of pairs of substructures of the original sentences $Z = \{(x_i, y_i)\}_{i=1}^{n}$
    Objective: regarding $Z \overset{\text{i.i.d.}}{\sim} P_{XY}$, maximize $D[P_{XY} \,\|\, P_X P_Y]$
    [Figure: pairs of dependency trees and the set Z of selected rooted-subtree pairs]
    cf. feature selection: the strength of association between input and output is reduced to dependence [Peng+'05][Song+'12]
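  8. Problems that come with dependence maximization

    Objective function: data sparseness
    • The vocabulary is on the order of thousands to millions of words, and substructures are combinations of them → each individual substructure has extremely low frequency
      Example: have dinner at my favorite Italian restaurant
      Each $p(x, y)$ and $p(x)$ (under naive maximum-likelihood estimation) is extremely sparse:
      $$I(X; Y) = \mathrm{KL}[P_{XY} \,\|\, P_X P_Y] = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
    • Diverse paraphrases (a problem that always accompanies natural language processing)
      Example: get angry ↔ be offended
    → We want to measure dependence on the basis of similarity rather than exact match.

    Search: combinatorial explosion
    • For each $x_i$ and each $y_i$ we have to consider every way of taking a subtree.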
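  9. Objective function (dependence measure): HSIC

    Hilbert–Schmidt Independence Criterion [Gretton+05]
    • A kernel-based measure of independence and dependence:
      $$\mathrm{HSIC}(X, Y) = \mathrm{MMD}^2(P_{XY}, P_X P_Y)$$
    • Estimator of HSIC for the output $Z = \{(x_i, y_i)\}_{i=1}^{N} \overset{\text{i.i.d.}}{\sim} P_{XY}$:
      $$\mathrm{HSIC}(Z; k, \ell) := \frac{1}{N^2} \operatorname{tr}(K H L H) = \frac{1}{N^2} \operatorname{tr}(\tilde{K} \tilde{L})$$
      $K = (k(x_i, x_j)) \in \mathbb{R}^{N \times N}$, $L = (\ell(y_i, y_j)) \in \mathbb{R}^{N \times N}$
      $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ (positive definite kernels)
      $\tilde{K} := H K H$, $\tilde{L} := H L H$ (centered Gram matrices)
      $H = (\delta_{ij} - N^{-1}) \in \mathbb{R}^{N \times N}$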
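The estimator $\frac{1}{N^2} \operatorname{tr}(\tilde{K} \tilde{L})$ can be written out directly. The sketch below is an illustration only: it uses plain Python lists instead of a matrix library, and the Gaussian kernel on scalars and the toy data are assumptions chosen to keep the example self-contained (the talk uses kernels on phrases).

```python
import math


def center(K):
    """Return HKH for a square matrix K, where H = I - (1/N) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]                                # row means
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]  # column means
    tot = sum(row) / n                                           # grand mean
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, k, l):
    """Biased HSIC estimator (1/N^2) tr(K~ L~) for paired samples xs, ys."""
    n = len(xs)
    Kc = center([[k(a, b) for b in xs] for a in xs])
    Lc = center([[l(a, b) for b in ys] for a in ys])
    # tr(Kc Lc) = sum over i, j of Kc[i][j] * Lc[j][i]
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n)) / n ** 2


def gauss(a, b):
    """Assumed toy kernel: unit-width Gaussian on real numbers."""
    return math.exp(-((a - b) ** 2) / 2.0)


xs = [0.0, 1.0, 2.0, 3.0]
print(hsic(xs, xs, gauss, gauss))            # y is a function of x
print(hsic(xs, [5.0] * 4, gauss, gauss))     # y is constant
```

A constant `ys` yields a constant Gram matrix, which centering maps to zero, so the estimate vanishes; a `ys` that moves together with `xs` gives a strictly positive value.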
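  10. Intuition behind the HSIC estimator

    [Figure: the centered Gram matrices $\tilde{K}$ and $\tilde{L}$, with entries $\tilde{k}(x_i, x_j)$ and $\tilde{\ell}(y_i, y_j)$ and rows $\tilde{k}_i$ and $\tilde{\ell}_i$; example pair: have dinner, be full]
    "The HSIC estimate is large" means "when the phrases are placed in the space whose distance is defined by the kernel function (similarity), the relative positions of the antecedent-side phrases and of the consequent-side phrases roughly coincide."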
  11. Intuition behind the HSIC estimator

    The HSIC estimate becomes large in the following cases:
    • If the X sides are similar, the Y sides are also similar
    • If the X sides are dissimilar, the Y sides are also dissimilar
    [Figure: pairs in Z labeled Similar / Similar and Unsimilar / Unsimilar]
    • We can expect consistent (highly dependent) knowledge based on similarity.
    • Because it is not counting based on exact match, it can cope with data sparseness.
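  12. Search: Metropolis–Hastings

    Sample with Metropolis–Hastings (MCMC) from the following distribution:
    $$p(Z; k, \ell, \beta) \propto \exp(\beta \cdot \mathrm{HSIC}(Z; k, \ell))$$
    (Simulated annealing makes almost no difference; with a suitable constant $\beta$ the score climbs almost straight to saturation.)
    Stochastic hill climbing while changing, little by little, how the branches are pruned.
    [Figure: chain $Z \to Z' \to Z''$ with proposals $q(Z' \mid Z)$ and $q(Z'' \mid Z')$]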
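  13. Search: proposal distribution

    1. Current candidate solution: $Z = \{(x_i, y_i)\}_{i=1}^{n}$
    2. Choose a tree: pick one $x_i$ or $y_i$
       $$q(x \mid Z) = q(y \mid Z) = \frac{1}{2n}$$
    3. Choose a branch: change the chosen $x$ slightly to create a new substructure $x'$ (with probability $q(x' \mid x)$), giving a new candidate solution $Z' = \{\ldots, (x'_i, y_i), \ldots\}_{i=1}^{n}$
       $$q(x' \mid x) = \begin{cases} 1 / |M(x)| & (x' \in M(x)) \\ 0 & (\text{otherwise}) \end{cases}$$
    4. Accept $Z'$ with probability $\min(1, r)$
       $$r = \frac{p(Z'; k, \ell, \beta)}{p(Z; k, \ell, \beta)} \cdot \frac{q(Z \mid Z')}{q(Z' \mid Z)} = \exp\bigl(\beta \cdot (\mathrm{HSIC}(Z'; k, \ell) - \mathrm{HSIC}(Z; k, \ell))\bigr) \cdot \frac{q(x \mid x')}{q(x' \mid x)}$$
    5. Repeat steps 2–4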
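Steps 2 to 4 can be sketched as one function. Everything concrete here is an assumption made for illustration: `mh_step`, `ring_neighbors`, and `toy_score` are hypothetical names, substructures are stood in for by integers on a ring of five positions rather than rooted subtrees, and the score is a toy stand-in for $\mathrm{HSIC}(Z; k, \ell)$. The sketch also assumes the neighbourhood is symmetric, so that $x \in M(x')$ whenever $x' \in M(x)$.

```python
import math
import random


def mh_step(Z, score, neighbors, beta, rng):
    """One Metropolis-Hastings step on a list Z of (x, y) substructure pairs."""
    i = rng.randrange(len(Z))        # step 2: pick a pair ...
    side = rng.randrange(2)          # ... and one side, probability 1/(2n) each
    old = Z[i][side]
    cand = neighbors(old)            # M(x): substructures one small edit away
    if not cand:
        return Z
    new = rng.choice(cand)           # step 3: q(x'|x) = 1/|M(x)|
    pair = list(Z[i])
    pair[side] = new
    Z_new = list(Z)
    Z_new[i] = tuple(pair)
    back = neighbors(new)            # M(x'), needed for q(x|x') = 1/|M(x')|
    # step 4: log r = beta * (score' - score) + log(|M(x)| / |M(x')|)
    log_r = (beta * (score(Z_new) - score(Z))
             + math.log(len(cand)) - math.log(len(back)))
    if rng.random() < math.exp(min(0.0, log_r)):
        return Z_new                 # accept with probability min(1, r)
    return Z


def ring_neighbors(s):
    """Toy neighbourhood: the two adjacent positions on a ring of 5."""
    return [(s - 1) % 5, (s + 1) % 5]


def toy_score(Z):
    """Toy stand-in for HSIC: rewards pairs whose two sides agree."""
    return -sum(abs(x - y) for x, y in Z) / len(Z)


rng = random.Random(0)
Z = [(0, 3), (4, 1), (2, 2)]
for _ in range(200):                 # step 5: repeat
    Z = mh_step(Z, toy_score, ring_neighbors, beta=5.0, rng=rng)
print(Z)
```

Working with $\log r$ avoids overflow in $\exp(\beta \cdot \Delta)$ when $\beta$ is large, and the factors $\frac{1}{2n}$ for choosing the tree cancel in the ratio, which is why only $|M(x)|$ and $|M(x')|$ appear.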
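  14. Computational cost

    • The centered Gram matrices $\tilde{K}, \tilde{L}$ are constructed only once, at the start: $O(N^2)$
    • Each sampling step updates only one row of the Gram matrices $K, L$: $O(N)$
    • → Compute $\mathrm{HSIC}(Z; k, \ell)$ after an incomplete Cholesky decomposition (rank $\kappa$) of $K, L$: $O(\kappa^2 N)$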
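The slide does not spell out how the low-rank factors are used. One standard way to reach the $O(\kappa^2 N)$ cost, given here as an assumption rather than as the authors' implementation, is to write the factors as $A, B \in \mathbb{R}^{N \times \kappa}$ (symbols introduced for this sketch) and use the cyclic property of the trace:

```latex
K \approx A A^{\top}, \qquad L \approx B B^{\top}, \qquad A, B \in \mathbb{R}^{N \times \kappa}

\operatorname{tr}(K H L H)
  \approx \operatorname{tr}\!\left(A A^{\top} H B B^{\top} H\right)
  = \operatorname{tr}\!\left((A^{\top} H B)(B^{\top} H A)\right)
  = \left\lVert A^{\top} H B \right\rVert_F^{2}
```

Here $HB$ is $B$ with its column means subtracted, which costs $O(\kappa N)$, and the $\kappa \times \kappa$ product $A^{\top}(HB)$ costs $O(\kappa^2 N)$, so no $N \times N$ matrix has to be formed.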
  15. Summary so far

    Problem we want to solve: determine without supervision the substructures that make a pair a pair
    Input: a set of sentence pairs $D = \{(s_i, t_i)\}_{i=1}^{n}$
    Output: a set of pairs of substructures of the original sentences $Z = \{(x_i, y_i)\}_{i=1}^{n}$
    Objective: regarding $Z \overset{\text{i.i.d.}}{\sim} P_{XY}$, maximize $D[P_{XY} \,\|\, P_X P_Y]$ (dependence maximization)
    [Figure: pairs of dependency trees and the set Z of selected rooted-subtree pairs]

    Proposed method
    • Objective function: data sparseness and diverse paraphrases → HSIC
    • Search: combinatorial explosion → stochastic hill climbing with MH
  16. Qualitative evaluation: knowledge acquisition from small artificial data

    Input: $D = \{(s_i, t_i)\}_{i=1}^{12}$

    s_i | t_i
    I have had breakfast at my house . | I am full .
    We had special dinner . | We are full .
    I have had breakfast at ten . | I 'm full .
    They had breakfast at the eatery . | They are full now .
    She had breakfast with her friends . | She felt happy .
    I had breakfast with my friends at my uncle 's house . | I feel happy .
    They had breakfast with their friends at the cafeteria . | They felt happy .
    He had lunch with his friends at eleven . | He felt happy .
    I had trouble associating with others . | I cry .
    He had trouble with his homework . | He cries .
    I have trouble concentrating . | I cry .
    She had trouble reading books . | She cries .

    Similarity (kernel):
    $$k(x_i, x_j) = \cos(\mathrm{ave}(\mathrm{wordvecs}(x_i)), \mathrm{ave}(\mathrm{wordvecs}(x_j)))$$
    $$\ell(y_i, y_j) = \cos(\mathrm{ave}(\mathrm{wordvecs}(y_i)), \mathrm{ave}(\mathrm{wordvecs}(y_j)))$$
    This is a typical measure of similarity between phrases. Pretrained word vectors are used.
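The phrase kernel above, the cosine of averaged word vectors, can be sketched as follows. The two-dimensional vectors in `toy_vecs` are invented for illustration and stand in for pretrained embeddings; `phrase_kernel` is a hypothetical name, and the sketch assumes every word is in the vocabulary and no averaged vector is all zeros.

```python
import math


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def phrase_kernel(x, y, wordvecs):
    """k(x, y) = cos(ave(wordvecs(x)), ave(wordvecs(y))) for word sequences x, y."""
    return cosine(average([wordvecs[w] for w in x]),
                  average([wordvecs[w] for w in y]))


# Invented 2-d vectors, used only to make the example runnable.
toy_vecs = {
    "had": [0.5, 0.5],
    "breakfast": [1.0, 0.1],
    "lunch": [0.9, 0.2],
    "trouble": [-0.2, 1.0],
}

print(phrase_kernel(("had", "breakfast"), ("had", "lunch"), toy_vecs))
print(phrase_kernel(("had", "breakfast"), ("had", "trouble"), toy_vecs))
```

With these toy vectors, "had breakfast" scores closer to "had lunch" than to "had trouble", which is the kind of soft match that lets a word seen once borrow evidence from a frequent, similar word.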
  17. Qualitative evaluation: knowledge acquisition from small artificial data

    Output: $Z = \{(x_i, y_i)\}_{i=1}^{12}$
    Bold: words kept by the proposed algorithm (the bold marking is not preserved in this transcript)

    x_i | y_i
    I have had breakfast at my house . | I am full .
    We had special dinner . | We are full .
    I have had breakfast at ten . | I 'm full .
    They had breakfast at the eatery . | They are full now .
    She had breakfast with her friends . | She felt happy .
    I had breakfast with my friends at my uncle 's house . | I feel happy .
    They had breakfast with their friends at the cafeteria . | They felt happy .
    He had lunch with his friends at eleven . | He felt happy .
    I had trouble associating with others . | I cry .
    He had trouble with his homework . | He cries .
    I have trouble concentrating . | I cry .
    She had trouble reading books . | She cries .

    • Words that appear only once (dinner, lunch) are kept on the basis of their similarity to the frequent word (breakfast).
    • In the second block, (with) friends is kept, which makes it easier to predict the right-hand side from the left-hand side.
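  18. Quantitative evaluation: relation prediction on real corpora

    [Figure: pipeline. Training: raw data → 1. abstraction → abstract representation $Z = \{(x_i, y_i)\}_i$ → 2. training → model. Test: 1. abstraction → 2. scoring]

    Training (knowledge acquisition)
    1. Collect from a corpus pairs of sentences that share a coreferent argument: $D = \{(s_i, t_i)\}_{i=1}^{n}$
       Example: ⟨"Tom killed Nancy.", "The police arrested him immediately."⟩
    2. Convert them to abstract representations and store them: $Z = \{(x_i, y_i)\}_{i=1}^{n}$
       Example: ⟨X kill, arrest X⟩

    Prediction
    1. Convert the sentence pair $(s, t)$ to an abstract representation $(x, y)$
    2. Score the strength of association of $(x, y)$ using the collected $Z$: $g(x, y; Z)$

    Evaluation measure: AUC-ROC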
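AUC-ROC can be computed from the scores $g(x, y; Z)$ without drawing the curve, since it equals the probability that a randomly chosen positive pair outscores a randomly chosen negative one. The sketch below assumes each test pair carries a binary label (1 for related, 0 for unrelated); how the test pairs and labels are built is not stated on the slide, and the scores here are made up.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores g(x, y; Z) for four test pairs and their assumed labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc_roc(scores, labels))  # 0.75: three of the four pos/neg pairs are ordered correctly
```

A value of 0.5 corresponds to random scoring, which gives a reference point when reading AUC numbers for this kind of task.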
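  19. Quantitative evaluation: measures of association strength

    Pointwise Mutual Information [C&J'08]
    $$\mathrm{PMI}(x, y; Z) = \log \frac{N \cdot c(x, y)}{c(x)\, c(y)}$$

    Pointwise HSIC: a centered kernel density estimate
    $$\mathrm{PHSIC}(x, y; Z) := \frac{1}{N} \sum_{i=1}^{N} \tilde{k}(x, x_i)\, \tilde{\ell}(y, y_i)$$
    $\tilde{k}(\cdot, \cdot)$ is the kernel centered on the known data points $\{x_i\}_{i=1}^{N}$:
    $$\tilde{k}(x, x') := k(x, x') - \frac{1}{N} \sum_{j=1}^{N} k(x, x_j) - \frac{1}{N} \sum_{i=1}^{N} k(x_i, x') + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} k(x_i, x_j)$$
    If addition is defined on $\mathcal{X}$: $\tilde{k}(x, x') = k(x - \overline{x_i},\ x' - \overline{x_i})$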
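PHSIC can be sketched directly from the definitions above. The Gaussian kernel on scalars and the toy data are assumptions made to keep the example self-contained, and the code assumes symmetric kernels, so that $\frac{1}{N} \sum_i k(x_i, x')$ can reuse the same helper as $\frac{1}{N} \sum_j k(x, x_j)$.

```python
import math


def centered_kernel(k, data):
    """Return k~(a, b): the kernel k centered on the known points `data`."""
    n = len(data)
    grand = sum(k(p, q) for p in data for q in data) / n ** 2

    def mean_to_data(a):
        # (1/N) * sum_j k(a, x_j); also used for the second term by symmetry
        return sum(k(a, p) for p in data) / n

    def k_tilde(a, b):
        return k(a, b) - mean_to_data(a) - mean_to_data(b) + grand

    return k_tilde


def phsic(x, y, xs, ys, k, l):
    """PHSIC(x, y; Z) = (1/N) sum_i k~(x, x_i) * l~(y, y_i)."""
    k_t = centered_kernel(k, xs)
    l_t = centered_kernel(l, ys)
    return sum(k_t(x, a) * l_t(y, b) for a, b in zip(xs, ys)) / len(xs)


def gauss(a, b):
    """Assumed toy kernel: unit-width Gaussian on real numbers."""
    return math.exp(-((a - b) ** 2) / 2.0)


# Toy data: small x goes with small y, large x goes with large y.
xs = [0.0, 0.1, 3.0, 3.1]
ys = [0.0, 0.2, 5.0, 5.1]
print(phsic(0.05, 0.1, xs, ys, gauss, gauss))   # consistent with the data
print(phsic(0.05, 5.0, xs, ys, gauss, gauss))   # contradicts the data
```

Neither query pair appears verbatim in the data, yet the first scores positive and the second negative, because the score aggregates similarity to every stored pair instead of counting exact matches.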
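  20. PMI:MI ≈ PHSIC:HSIC

    PHSIC looks like PMI smoothed by similarity.

    Association strength of $(x, y)$ as measured by PMI
    • There exists $(x_i, y_i)$ with $x = x_i \land y = y_i$ → PMI increases
    • There exists $(x_i, y_i)$ with $x = x_i \veebar y = y_i$ → PMI decreases

    Association strength of $(x, y)$ as measured by PHSIC
    • There exists $(x_i, y_i)$ with $\tilde{k}(x, x_i)\, \tilde{\ell}(y, y_i) > 0$ → PHSIC increases
      (it increases when "$x \approx x_i \land y \approx y_i$")
    • There exists $(x_i, y_i)$ with $\tilde{k}(x, x_i)\, \tilde{\ell}(y, y_i) < 0$ → PHSIC decreases

    PMI:MI ≈ PHSIC:HSIC
    $$\mathrm{MI}(X, Y; Z) = \frac{1}{N} \sum_{i} \mathrm{PMI}(x_i, y_i; Z)$$
    $$\mathrm{HSIC}(X, Y; Z) = \frac{1}{N} \sum_{i} \mathrm{PHSIC}(x_i, y_i; Z)$$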
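The last identity follows from the definitions in a few lines. The derivation below is a worked check, not taken from the slides; it uses only the facts that the centered kernel evaluated at data points gives the entries of the centered Gram matrix, $\tilde{k}(x_i, x_j) = (HKH)_{ij} = \tilde{K}_{ij}$, and that $\tilde{L}$ is symmetric.

```latex
\frac{1}{N} \sum_{i=1}^{N} \mathrm{PHSIC}(x_i, y_i; Z)
  = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{N} \sum_{j=1}^{N}
      \tilde{k}(x_i, x_j)\, \tilde{\ell}(y_i, y_j)
  = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \tilde{K}_{ij}\, \tilde{L}_{ji}
  = \frac{1}{N^2} \operatorname{tr}(\tilde{K} \tilde{L})
  = \mathrm{HSIC}(Z; k, \ell)
```

So PHSIC splits the HSIC estimate into one contribution per pair, in the same way that PMI splits mutual information.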
  21. Quantitative evaluation: relation prediction on real corpora

    [Figure: two plots with both axes from 0.0 to 1.0, comparing C&J'08, Jans et al.'12, C&J'08+PHSIC, and Proposed. (a) Gigaword, N = 16,748. (b) Fairy Tale, N = 1,673.]

    Method           | Abstraction | Model       | Gigaword | Fairy Tale
    [C&J'08]         | Fixed (C&J) | PMI         | 0.553    | 0.596
    [Jans+'12]       | Fixed (C&J) | Conditional | 0.556    | 0.576
    [C&J'08] + PHSIC | Fixed (C&J) | PHSIC       | 0.518    | 0.518
    Proposed         | Dynamic     | PHSIC       | 0.633    | 0.646

    • The approach of "deciding for each instance where to focus" also contributes to prediction accuracy.
    • It is not the PHSIC prediction model by itself that improves the score.
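  22. Summary

    Problem: we proposed the problem of "finding the substructures that make a pair a pair"
    Input: a set of sentence pairs $D = \{(s_i, t_i)\}_{i=1}^{n}$
    Output: a set of pairs of substructures of the original sentences $Z = \{(x_i, y_i)\}_{i=1}^{n}$
    Objective: regarding $Z \overset{\text{i.i.d.}}{\sim} P_{XY}$, maximize $D[P_{XY} \,\|\, P_X P_Y]$ (dependence maximization)

    Proposed method
    • Objective function: data sparseness and diverse paraphrases → HSIC
    • Search: combinatorial explosion → stochastic hill climbing with MH

    Experiments: acquisition and prediction of event pairs
    • Qualitative evaluation: the proposed method behaves ideally from the viewpoint of knowledge acquisition
    • Quantitative evaluation: per-instance abstraction contributes to prediction accuracy

    Future work
    • Speed-up: currently on the order of tens of thousands → on the order of millions
    • Introducing more refined similarity functions: structured kernels
    • Application to other tasks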