Slide 1

Slide 1 text

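2012 Summer Boot Camp
Automatic Detection of Errors in English Compositions by Japanese Writers: Exercise
Nara Institute of Science and Technology (NAIST), Natural Language Processing Laboratory
Yutaro Shigeto, Tomoya Mizumoto, Mamoru Komachi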

Slide 2

Slide 2 text

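We detect errors in English compositions written by Japanese university students

KJ Corpus (KJ = Konan University and the Japan Institute for Educational Measurement)
- 170 essays (assigned compositions) by Japanese learners of English
- Grammatical error information and part-of-speech (POS) and phrase information were annotated by hand

Example sentence: He works to the flowershop.
With POS and phrase annotation: [NP He/PRP ] [VP works/VBZ ] [PP to/TO ] [NP the/DT flowershop/NN-NN ] ./.

Note on the error tags: prp = preposition error; at = article error; crr = correction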

Slide 3

Slide 3 text

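The input is a learner's English composition; the output is a correct/error label for each word

Training data: the gold labels are visible to the system
Test (evaluation) data: the gold labels are hidden from the system
- The system is evaluated by how accurately it assigns the gold labels to the test data
- The system used as the point of comparison is called the baseline

    Word        POS  Label
    He          PRP  O
    works       VBZ  O
    to          TO   ERR   (correction: at)
    the         DT   ERR   (correction: a)
    flowershop  NN   O
    .           .    O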

Slide 4

Slide 4 text

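We solve error detection as a sequence labeling problem

- B-ERR: the beginning of an erroneous span
- I-ERR: the continuation of an erroneous span
- O: a position that needs no correction
Note: this labeling scheme is called BIO tagging.

Example: He works to (correction: at) the (correction: a) flowershop.

    Word        Label
    He          O
    works       O
    to          B-ERR
    the         I-ERR
    flowershop  O
    .           O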
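The BIO scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `bio_labels` and the span representation (token index pairs with an exclusive end) are assumptions made here, not part of the KJ Corpus tooling.

```python
def bio_labels(tokens, error_spans):
    """Assign B-ERR / I-ERR / O labels.

    error_spans is a list of (start, end) token index pairs, end exclusive.
    This representation is a hypothetical choice for the sketch.
    """
    labels = ["O"] * len(tokens)
    for start, end in error_spans:
        labels[start] = "B-ERR"          # first token of the erroneous span
        for i in range(start + 1, end):
            labels[i] = "I-ERR"          # remaining tokens of the same span
    return labels


tokens = ["He", "works", "to", "the", "flowershop", "."]
# One contiguous erroneous span covering "to the", as in the slide's example.
labels = bio_labels(tokens, [(2, 4)])
for token, label in zip(tokens, labels):
    print(token, label)
```

Treating "to the" as a single span follows the slide, where "the" carries I-ERR rather than starting a new B-ERR span.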

Slide 5

Slide 5 text

How do we solve the sequence labeling problem?
Machine learning with Conditional Random Fields (CRF)

Slide 6

Slide 6 text

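Whether a word is an error is decided by features and their weights

- "If the preceding verb is work, the preposition should be at" → to is likely to be an error
- Clues of this kind are called features
- P(to is an error) = w("the previous word is work and the target word is to") + w("the next word is the") + ...
  If the value exceeds 0.5, to is judged to be an error (w is the weight of a feature)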
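A minimal sketch of the decision rule on this slide: add up the weights of the features that fire for a word and compare the sum with 0.5. The weight values and the feature-name format are invented for illustration. A real CRF does not threshold a per-word sum like this; it normalizes scores over whole label sequences, so this mirrors only the slide's simplified picture.

```python
# Hypothetical weights, invented for this sketch (not learned from data).
WEIGHTS = {
    "prev=works,cur=to": 0.5,
    "next=the": 0.25,
}


def active_features(words, i):
    """Return the features that fire for the word at position i."""
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    next_word = words[i + 1] if i + 1 < len(words) else "<EOS>"
    return [f"prev={prev_word},cur={words[i]}", f"next={next_word}"]


def error_score(words, i):
    """Sum the weights of all active features; unknown features weigh 0."""
    return sum(WEIGHTS.get(f, 0.0) for f in active_features(words, i))


words = ["He", "works", "to", "the", "flowershop", "."]
score = error_score(words, 2)
print("to:", score, "-> error" if score > 0.5 else "-> ok")
```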

Slide 7

Slide 7 text

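What clues let us detect grammatical errors?

Preposition errors
- Which verbs and nouns the preposition is used with
Article errors
- Whether the noun is a proper noun
- Whether the noun has already been referred to in the preceding context
Number errors
- Whether the noun is countable or uncountable

We use this kind of linguistic knowledge to design features suited to the task.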

Slide 8

Slide 8 text

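We estimate "error-likeness" automatically from data

How should we decide the weight (the model) of a feature such as "the previous word is work and the target word is to"?
- With enough knowledge, the weight can be set by hand
  Even with that knowledge, setting weights by hand is unrealistic when the number of features is huge
- Without that knowledge, the weight cannot be set by intuition

Because estimating the weights by hand is difficult, we use machine learning to estimate the feature weights automatically from data.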

Slide 9

Slide 9 text

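Surrounding words are commonly used as features

Features about single words
- The target word itself: to
- The previous word: works
- The next word: the
Features about word combinations
- The pair of the previous word and the target word: (works, to)
- The pair of the target word and the next word: (to, the)

Enumerating all of these by hand is tedious!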

Slide 10

Slide 10 text

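Feature templates are used to enumerate features automatically

Templates about single words
- The target word itself
- The previous word
- The next word
Templates about word combinations
- The pair of the previous word and the target word
- The pair of the target word and the next word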

Slide 11

Slide 11 text

How to use CRF++
Feature templates and machine learning

Slide 12

Slide 12 text

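We perform sequence labeling with Conditional Random Fields

CRF++: Yet Another CRF toolkit
- http://code.google.com/p/crfpp/
- Developed by Taku Kudo, an alumnus of the Matsumoto Lab (now at Google)
- Open source (the program is released together with its blueprint, i.e. the source code)
- A general-purpose sequence labeling toolkit that supports feature templates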

Slide 13

Slide 13 text

The baseline feature template

- Lines beginning with # are comments
- %x[i,j] is expanded: i is the row offset from the target word and j is the column.
  Example: U02:%x[0,0] → U02:to (the target word)

    # Unigram
    U01:%x[-1,0]
    U02:%x[0,0]
    U03:%x[1,0]
    U05:%x[-1,0]/%x[0,0]
    U06:%x[0,0]/%x[1,0]
    #U11:%x[-1,1]
    #U12:%x[0,1]
    #U13:%x[1,1]
    #U16:%x[-1,1]/%x[0,1]
    #U17:%x[0,1]/%x[1,1]
    #U20:%x[-2,1]/%x[-1,1]/%x[0,1]
    U21:%x[-1,0]/%x[0,0]/%x[1,0]
    #U22:%x[0,1]/%x[1,1]/%x[2,1]

Training data columns (column 0 = word, column 1 = POS, last column = label):

    Word        POS  Label
    He          PRP  O
    works       VBZ  O
    to          TO   B-ERR
    the         DT   I-ERR
    flowershop  NN   O
    .           .    O
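To make the %x[i,j] expansion concrete, the following Python sketch imitates it for a single sentence. It is not CRF++'s own code: the function name `expand` is made up here, and the padding strings for positions outside the sentence (`_B-1`, `_B+1`, ...) reflect how CRF++ is understood to behave, so treat them as an assumption.

```python
import re

# Matches macros such as %x[-1,0]: a row offset and a column index.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, pos):
    """Expand every %x[row,col] macro in one template line at token index pos."""
    def lookup(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if row < 0:                       # before the start of the sentence
            return f"_B{row}"             # assumed padding: _B-1, _B-2, ...
        if row >= len(rows):              # past the end of the sentence
            return f"_B+{row - len(rows) + 1}"
        return rows[row][col]
    return MACRO.sub(lookup, template)


# (word, POS) rows for the slide's example sentence.
rows = [("He", "PRP"), ("works", "VBZ"), ("to", "TO"),
        ("the", "DT"), ("flowershop", "NN"), (".", ".")]

for line in ["U02:%x[0,0]", "U05:%x[-1,0]/%x[0,0]", "U21:%x[-1,0]/%x[0,0]/%x[1,0]"]:
    print(expand(line, rows, 2))   # expand at the target word "to"
```

Uncommenting the lines that use column 1 in the template (for example U12:%x[0,1]) would add POS-based features in the same way.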

Slide 14

Slide 14 text

Let's learn the weights with CRF++

    komachi@bean% crf_learn template train.txt word.model
    CRF++: Yet Another CRF Tool Kit
    Copyright (C) 2005-2009 Taku Kudo, All rights reserved.

    reading training data: 100.. 200.. 300.. 400.. 500.. 600.. 700.. 800.. 900.. 1000.. 1100.. 1200.. 1300.. 1400.. 1500.. 1600.. 1700.. 1800.. 1900.. 2000.. 2100..
    Done!0.21 s

    Number of sentences: 2132
    Number of features:  113475
    Number of thread(s): 1
    Freq:                1
    eta:                 0.00010
    C:                   1.00000
    shrinking size:      20
    iter=0 terr=0.92436 serr=0.99953 act=113475 obj=21786.58030 diff=1.00000
    iter=1 terr=0.11906 serr=0.52111 act=113475 obj=10955.05362 diff=0.49717
    …
    iter=106 terr=0.01140 serr=0.08443 act=113475 obj=2478.28158 diff=0.00006

    Done!5.26 s

Slide 15

Slide 15 text

Let's detect errors with CRF++

    komachi@bean% crf_test -m word.model test.txt > word.out
    komachi@bean% less word.out
    I          PRP  O      O
    'm         VBP  O      O
    a          DT   O      O
    student    NN   O      O
    .          .    O      O
    ...
    Gardening  NN   O      O
    is         VBZ  O      O
    very       RB   O      O
    interest   NN   B-ERR  B-ERR
    ,          ,    O      O
    but        CC   O      O
    very       RB   O      O
    difficult  JJ   O      O
    .          .    O      O
    ...

The columns are word, POS, gold label, and the label predicted by the system.

Slide 16

Slide 16 text

Evaluation method
Automatic evaluation with precision, recall, and F-measure

Slide 17

Slide 17 text

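Evaluation uses precision, recall, and F-measure

- accuracy = (tp + tn) / (tp + fp + tn + fn)
- precision = tp / (tp + fp)
- recall = tp / (tp + fn)
  tp (true positive): errors the system flagged that really are errors
  fp (false positive): items the system flagged as errors that are actually correct
  fn (false negative): real errors the system wrongly left unflagged
  tn (true negative): correct items the system rightly left unflagged
- F-measure = harmonic mean of precision and recall = 2 * precision * recall / (precision + recall)
  The F-measure is high only when both precision and recall are reasonably good.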
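The formulas above can be checked with a short Python sketch. The counts are derived from the conlleval.pl output on slide 18 (81 gold phrases, 13 found, 6 correct), which gives tp = 6, fp = 13 - 6 = 7, and fn = 81 - 6 = 75. The function name is an assumption for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F-measure, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts derived from slide 18: 81 gold phrases, 13 found, 6 correct.
p, r, f = precision_recall_f1(tp=6, fp=7, fn=75)
print(f"precision {p:.2%}  recall {r:.2%}  F {f:.2%}")
```

The low recall dominates the harmonic mean, which is why the F-measure sits much closer to recall than to precision.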

Slide 18

Slide 18 text

There is a standard evaluation script for sequence labeling

conlleval.pl: a program that computes accuracy, precision, recall, and F-measure from a file in CoNLL format
- -d: change the delimiter character (here, to a tab)
- -r: evaluate individual labels instead of chunks
- The file to evaluate is supplied by redirection

    komachi@bean% perl conlleval.pl -d '\t' < word.out
    processed 1032 tokens with 81 phrases; found: 13 phrases; correct: 6.
    accuracy:  88.86%; precision:  46.15%; recall:   7.41%; FB1:  12.77
                  ERR: precision:  46.15%; recall:   7.41%; FB1:  12.77  13

Slide 19

Slide 19 text

Changing the template yields a different model even on the same data

- The model trained with the template that combines words and POS tags scores higher than the simple word-only model (F-measure 19.80 vs. 12.77)

Word-only template:

    komachi@bean% perl conlleval.pl -d '\t' < word.out
    processed 1032 tokens with 81 phrases; found: 13 phrases; correct: 6.
    accuracy:  88.86%; precision:  46.15%; recall:   7.41%; FB1:  12.77
                  ERR: precision:  46.15%; recall:   7.41%; FB1:  12.77  13

Word + POS template:

    komachi@bean% crf_learn template_pos train.txt word_pos.model
    komachi@bean% crf_test -m word_pos.model test.txt > word_pos.out
    komachi@bean% perl conlleval.pl -d '\t' < word_pos.out
    processed 1032 tokens with 81 phrases; found: 20 phrases; correct: 10.
    accuracy:  89.05%; precision:  50.00%; recall:  12.35%; FB1:  19.80
                  ERR: precision:  50.00%; recall:  12.35%; FB1:  19.80  20

Slide 20

Slide 20 text

Exercises
Detecting errors in English compositions with CRF++

Slide 21

Slide 21 text

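Exercise 1: Build a feature template that works well for grammatical error detection

- Task: edit the feature template to create the template that gives the highest F-measure.
- Hint: the word and POS information is in train.txt and can be referenced from the feature template. You may also preprocess train.txt and test.txt, as long as you do not use the label information.
- You may also adjust the options of crf_learn.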

Slide 22

Slide 22 text

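Exercise 2: Do an error analysis of your feature template

- Task: look at several places where the system output is wrong and consider what knowledge would be needed to detect the errors correctly.
- Hint: the system can be wrong in two ways: false positives (a correct word is output as an error) and false negatives (an error is output as correct). The knowledge needed to reduce each kind of mistake is different, so it is best to analyze them separately.
- You can find these places by searching the output file for ERR.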
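One way to separate the two kinds of mistakes is a small Python script over the crf_test output. This is a sketch under stated assumptions: it expects the last two whitespace-separated columns to be the gold label and the predicted label (as in the word.out listing on slide 15), it works token by token rather than on whole spans, and the sample lines are invented for illustration, not taken from the real output.

```python
def split_errors(lines):
    """Return (false_positives, false_negatives) as lists of words.

    Assumes each non-empty line ends with: gold label, predicted label.
    Any label other than "O" (B-ERR or I-ERR) counts as an error label.
    """
    false_pos, false_neg = [], []
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            continue                      # blank line between sentences
        word, gold, pred = cols[0], cols[-2], cols[-1]
        gold_err = gold != "O"
        pred_err = pred != "O"
        if pred_err and not gold_err:
            false_pos.append(word)        # correct word flagged as an error
        elif gold_err and not pred_err:
            false_neg.append(word)        # real error the system missed
    return false_pos, false_neg


# Invented sample in the word / POS / gold / predicted layout.
sample = [
    "Gardening\tNN\tO\tO",
    "is\tVBZ\tO\tB-ERR",
    "very\tRB\tO\tO",
    "interest\tNN\tB-ERR\tO",
    "",
]
fps, fns = split_errors(sample)
print("false positives:", fps)
print("false negatives:", fns)
```

With a real file, the same function could be called on `open("word.out", encoding="utf-8")`.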

Slide 23

Slide 23 text

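Optional exercise: Find the features that work well for grammatical error detection

- Task: among the features expanded from the feature template you created, find the features with the highest and the lowest weights.
- Hint: running crf_learn -t template train.txt model creates a file named model.txt, which lists the features and their weights.
- Note: you cannot do this exercise without writing a program.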
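A possible starting point for this exercise is sketched below in Python. It assumes a particular layout for the text model: sections separated by single blank lines in the order header, labels, templates, feature index ("id name"), and weights (one per line), with a unigram feature of id k owning the weights k, k+1, ... in label order. That layout is an assumption to verify against your own model.txt, and the sample text is synthetic.

```python
def top_weights(model_text, k=3):
    """Return (highest, lowest) lists of (weight, feature, label) tuples.

    Assumed layout (verify on a real model.txt): five sections separated by
    blank lines; unigram feature id f with label index y uses weight f + y.
    """
    blocks = [b.splitlines() for b in model_text.strip().split("\n\n")]
    _header, labels, _templates, feature_lines, weight_lines = blocks[:5]
    weights = [float(w) for w in weight_lines]
    entries = []
    for line in feature_lines:
        fid, name = line.split(" ", 1)
        if not name.startswith("U"):
            continue                      # skip bigram (B) features in this sketch
        for offset, label in enumerate(labels):
            entries.append((weights[int(fid) + offset], name, label))
    entries.sort()
    return entries[-k:][::-1], entries[:k]


# Synthetic model text, invented to illustrate the assumed layout.
SAMPLE_MODEL = """version: 100
cost-factor: 1
maxid: 4
xsize: 2

B-ERR
O

U02:%x[0,0]

0 U02:to
2 U02:the

1.5
-1.5
-0.25
0.25
"""

highest, lowest = top_weights(SAMPLE_MODEL, k=1)
print("highest:", highest)
print("lowest:", lowest)
```

A large positive weight for a (feature, B-ERR) pair suggests the feature is evidence for an error, while a large negative one suggests evidence against it.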

Slide 24

Slide 24 text

Reference URLs

Error Detection and Correction Workshop
- https://sites.google.com/site/edcw2012/
  Error detection using the KJ Corpus

Helping Our Own Shared Task
- http://clt.mq.edu.au/research/projects/hoo/hoo2011/
  Error correction using the ACL Anthology Reference Corpus (papers written by non-native speakers)
- http://clt.mq.edu.au/research/projects/hoo/hoo2012/
  Preposition and determiner error correction using CLC-FCE (essays written by English-exam candidates)