Slide 1

Slide 1 text

Research on Technical Term Extraction Methods and
Development of an Extraction Application

Slide 2

Slide 2 text

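Self-introduction
• Koga Kobayashi: @kajyuuen
• University of Tsukuba, School of Informatics, 4th year
• Research: natural language processing, machine learning
• For development I mostly use Ruby on Rails
• Hobbies
  • The internet, listening to music, motorcycles (currently on a break)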

Slide 3

Slide 3 text

Goal
Develop a system and a method that can extract technical terms
from text in specialized domains where little labeled training data is available

Slide 4

Slide 4 text

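What is a technical term?
"A technical term (専門用語, senmon yōgo) is a word or group of terms that is used and understood only among people engaged in a particular occupation, or within a particular academic field, industry, and so on. It is also called テクニカルターム (English: technical term)." (quoted from Wikipedia)
Example: call center
• Operator, FAQ, VoC, average call time
Example: cooking
• Ichō-giri (ginkgo-leaf cut), katsuramuki (rotary peeling), sanmai-oroshi (three-piece filleting)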

Slide 5

Slide 5 text

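Background
A dictionary of technical terms improves the accuracy of morphological analysis and search
However…
• A model trained on a general domain does not extract terms well when adapted to a specialized domain
• Extracting technical terms requires a lot of time and labor from domain experts
Because of these problems, technical term extraction has been difficult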

Slide 6

Slide 6 text

Background
Making technical term extraction possible at low cost would therefore
lead to better performance of Retrieva products

Slide 7

Slide 7 text

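Proposed method
• Extraction of technical term candidates based on occurrence frequency and connection frequency
• Classification of the technical term candidates by supervised learning with active learning
Combining these two methods makes low-cost technical term extraction possible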

Slide 8

Slide 8 text

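Flow up to technical term extraction
1. Candidate extraction based on occurrence frequency and connection frequency
2. Supervised learning with active learning
3. Extraction of technical terms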

Slide 9

Slide 9 text

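Candidate extraction based on occurrence frequency and connection frequency [Nakagawa+ 2003]
• Assume that a technical term is either a single noun or a compound of several nouns
• Define the smallest unit that makes up a compound as a "simple noun"
• The more often a simple noun joins with other simple nouns to form compounds, the more important it is ⇒ technical term
Example: 自然言語処理 (natural language processing) = 自然 (natural) + 言語 (language) + 処理 (processing)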
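The counting step behind this idea can be sketched as follows. This is only a minimal illustration, assuming the compounds have already been segmented into simple nouns and that every occurrence is counted; the function name `connection_counts` and the sample compounds are hypothetical and do not come from the slides or from [Nakagawa+ 2003].

```python
from collections import Counter

def connection_counts(compounds):
    """Count how often each simple noun joins a preceding or a following noun."""
    joined_to_prev = Counter()  # noun occurs directly after another simple noun
    joined_to_next = Counter()  # noun occurs directly before another simple noun
    for nouns in compounds:
        # Walk over adjacent pairs of simple nouns inside one compound.
        for left, right in zip(nouns, nouns[1:]):
            joined_to_next[left] += 1
            joined_to_prev[right] += 1
    return joined_to_prev, joined_to_next

# Hypothetical, pre-segmented compound nouns (illustrative data only).
compounds = [
    ["自然", "言語", "処理"],
    ["言語", "処理"],
    ["機械", "学習"],
]
prev_counts, next_counts = connection_counts(compounds)
```

In this toy input, 言語 joins a following noun twice, so it would be treated as a more productive building block than 自然 or 機械.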

Slide 10

Slide 10 text

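Candidate extraction based on occurrence frequency and connection frequency [Nakagawa+ 2003]
Example: 自然言語処理 (natural language processing)
[Table: for each simple noun (自然, 言語, 処理), the number of times it joined a preceding word and the number of times it joined a following word]
Importance = geometric mean of the connection counts of the simple nouns that form the compound
\text{importance} = \sqrt[6]{1 \cdot 2 \cdot 2 \cdot 3 \cdot 1 \cdot 1} \approx 1.51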
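The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper name `importance` is an assumption, and the six factors are copied from the slide's example.

```python
import math

def importance(factors):
    """Geometric mean of the connection-count factors of a compound's simple nouns."""
    return math.prod(factors) ** (1.0 / len(factors))

# The six factors shown on the slide for 自然言語処理.
score = importance([1, 2, 2, 3, 1, 1])
print(f"{score:.2f}")  # prints 1.51
```

Using a geometric mean keeps the score comparable between short and long compounds, because the root grows with the number of factors.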

Slide 11

Slide 11 text

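Flow up to technical term extraction
1. Candidate extraction based on occurrence frequency and connection frequency
2. Supervised learning with active learning
3. Extraction of technical terms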

Slide 12

Slide 12 text

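What is active learning?
A method of training a model in which, out of a large amount of unlabeled data,
the data whose labels are most likely to improve the model are recommended to the user,
who annotates them step by step
The model's performance improves with only a small amount of labeled data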

Slide 13

Slide 13 text

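What is active learning?
[Figure: data points labeled "technical term" and "non-technical term", plus two unlabeled points, 1 and 2]
Which data point's label do we want to know?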

Slide 14

Slide 14 text

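What is active learning?
[Figure: data points labeled "technical term" and "non-technical term", plus two unlabeled points, 1 and 2]
Which data point's label do we want to know?
Learning is not effective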

Slide 15

Slide 15 text

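What is active learning?
[Figure: data points labeled "technical term" and "non-technical term", plus two unlabeled points, 1 and 2]
Which data point's label do we want to know?
Learning proceeds effectively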

Slide 16

Slide 16 text

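Building the feature vector
The feature vector is built from:
• The surface forms, parts of speech, and character types of the two words before and after the candidate
• The importance score obtained by unsupervised learning
Example sentence: 研究 / は / 自然言語処理 / と / 機械 / 学習 / です ("My research is natural language processing and machine learning")
• Parts of speech: noun / particle / [technical term candidate] / particle / noun / noun / auxiliary verb
• Importance of the candidate 自然言語処理: 1.51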
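One way such a feature vector could be assembled is sketched below. The slides do not show the actual implementation, so the function names (`char_type`, `window_features`), the feature keys, the padding token, and the coarse character classes are all assumptions made for illustration.

```python
def char_type(token):
    """Coarse character class of a token's first character (illustrative only)."""
    ch = token[0]
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"

def window_features(tokens, pos_tags, index, importance, window=2):
    """Features of the `window` words on each side of the candidate at `index`."""
    feats = {"importance": importance}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the candidate itself is described by its importance score
        j = index + offset
        if 0 <= j < len(tokens):
            surface, pos, ctype = tokens[j], pos_tags[j], char_type(tokens[j])
        else:
            surface = pos = ctype = "<PAD>"  # hypothetical padding token
        feats[f"surface[{offset:+d}]"] = surface
        feats[f"pos[{offset:+d}]"] = pos
        feats[f"ctype[{offset:+d}]"] = ctype
    return feats

# The example sentence from the slide; the English tag names are assumptions.
tokens = ["研究", "は", "自然言語処理", "と", "機械", "学習", "です"]
pos_tags = ["noun", "particle", "candidate", "particle", "noun", "noun", "auxiliary"]
features = window_features(tokens, pos_tags, index=2, importance=1.51)
```

The resulting dictionary mixes categorical and numeric values, so it would still need to be one-hot encoded before being passed to a classifier.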

Slide 17

Slide 17 text

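Training the model
Logistic regression
• A model that classifies whether a word is a technical term or a non-technical term
• Active learning repeats training and prediction many times, so a simple model was chosen
• The active learning method used here requires predicted probabilities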

Slide 18

Slide 18 text

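Data selection and model update
Uncertainty Sampling (least confident)
Recommend the data point that the current model is least certain about

x^*_{LC} = \arg\max_{x \in U} \left( 1 - P_\theta(\hat{y} \mid x) \right)

• \hat{y}: the label with the highest predicted probability
• U: the set of unlabeled data
• x^*_{LC}: the data point recommended for labeling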
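The selection rule can be illustrated with a short sketch. It assumes the model has already produced class probabilities for every unlabeled item; the function name `least_confident` and the probability values are hypothetical.

```python
def least_confident(probabilities):
    """Index of the sample that maximizes 1 - P(y_hat | x)."""
    uncertainty = [1.0 - max(p) for p in probabilities]
    return max(range(len(uncertainty)), key=uncertainty.__getitem__)

# Hypothetical predicted probabilities [P(term), P(non-term)] for three unlabeled items.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
]
chosen = least_confident(pool_probs)  # index 1: the model is closest to undecided
```

For a binary classifier this amounts to picking the item whose predicted probability is closest to 0.5.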

Slide 19

Slide 19 text

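Experiment 1: Technical term extraction on Wikipedia
• Data
  • Extract technical terms from 61 Wikipedia texts
• Conditions
  • Annotate one third of the terms extracted by unsupervised learning
  • Retrain the model each time 5 data points have been labeled
  • Compare active learning with random sampling, and compare dictionaries
Aim: show that active learning outperforms random sampling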
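The annotate-then-retrain cycle in these conditions can be sketched as a small simulation. Everything here is an assumption made for illustration: the one-dimensional synthetic pool, the `oracle` function standing in for a human annotator, the two seed items, and the hand-written gradient-descent logistic regression. It is not the implementation used in the experiment.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    """Probability that x is a technical term under the current model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, labels, epochs=200, lr=0.5):
    """Fit a small logistic regression with batch gradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    n = len(samples)
    for _ in range(epochs):
        errors = [predict_proba((w, b), x) - y for x, y in zip(samples, labels)]
        w = [wi - lr * sum(e * x[k] for e, x in zip(errors, samples)) / n
             for k, wi in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

def confidence(model, x):
    """Probability of the most likely label; low values mean high uncertainty."""
    p = predict_proba(model, x)
    return max(p, 1.0 - p)

def active_learning(pool, oracle, seed_indices, budget, batch_size=5):
    """Label `budget` items, retraining after every `batch_size` new labels."""
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    model = train([pool[i] for i in labeled], list(labeled.values()))
    while len(labeled) < budget:
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Least confident first: these are recommended to the annotator.
        unlabeled.sort(key=lambda i: confidence(model, pool[i]))
        for i in unlabeled[:min(batch_size, budget - len(labeled))]:
            labeled[i] = oracle(pool[i])
        model = train([pool[i] for i in labeled], list(labeled.values()))
    return labeled, model

random.seed(0)
pool = [[random.random()] for _ in range(60)]  # synthetic 1-D features

def oracle(x):
    """Stand-in for the human annotator (hypothetical labeling rule)."""
    return 1 if x[0] > 0.5 else 0

seeds = [min(range(len(pool)), key=lambda i: pool[i][0]),
         max(range(len(pool)), key=lambda i: pool[i][0])]
# Annotate one third of the pool, retraining after every 5 labels.
labeled, model = active_learning(pool, oracle, seeds, budget=len(pool) // 3)
```

Replacing the sort key with a random shuffle gives the random sampling baseline that active learning is compared against.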

Slide 20

Slide 20 text

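Experiment 1: Results
[Tables: one for IPAdic and one for NEologd, each with columns Model, Precision, Recall, F-value and rows Unsupervised learning, Random sampling, Active learning]
• With both dictionaries, active learning outperformed random sampling
• Performance was higher when NEologd was used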
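For reference, the three metrics in these tables can be computed as in the sketch below. It assumes that "F-value" means the balanced F1 score and that extracted terms are compared against gold terms as sets; the function name and the example terms are hypothetical, not the experiment's data.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall, and F1 of a set of extracted terms against gold terms."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Hypothetical extraction result and gold annotation (illustrative only).
predicted_terms = ["operator", "FAQ", "program", "channel"]
gold_terms = ["operator", "FAQ", "VoC"]
precision, recall, f_value = precision_recall_f(predicted_terms, gold_terms)
```

Here two of the four extracted words are correct and two of the three gold terms are found, so precision is 0.5 and recall is about 0.67.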

Slide 21

Slide 21 text

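Experiment 2: Technical term extraction on an FAQ domain
• Training data
  • FAQ text taken from the SKY PerfecTV! (スカパー！) help content, 5,113 characters
  • https://helpcenter.skyperfectv.co.jp
• Conditions
  • Compare against a model whose annotation targets are chosen at random
  • Retrain the model each time 5 data points have been labeled
  • When the number of annotations is 0, treat every extracted word as a technical term
Aim: check how much annotation is needed before the model becomes practical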

Slide 22

Slide 22 text

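Experiment 2: Precision and recall
[Figures: precision and recall plots comparing active learning and random sampling. Annotations: "Without annotation, precision is low (unsupervised learning)"; "Unsupervised learning covers 72.7% of the technical terms"]
• Precision saturates earlier with active learning than with random sampling
• In recall, active learning greatly exceeds random sampling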

Slide 23

Slide 23 text

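Experiment 2: F-value
[Figure: F-value plot. Annotation: "A difference of up to about 20 points"]
Annotating only about 40% was enough for the F-value to exceed 70%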

Slide 24

Slide 24 text

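Experiment 2: Examples of words extracted by unsupervised learning
Technical terms extracted successfully
• スカパー！ (SKY PerfecTV!), プレミアムサービス光マンション向けサービス (Premium Service Hikari service for apartment buildings)
Technical terms that could not be extracted
• TZ-WR4KP, SP-HR200H, アンテナサポートプラン (Antenna Support Plan)
Words extracted by mistake
• 番組 (program), チャンネル (channel), Myチャンネル1 (My Channel 1)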

Slide 25

Slide 25 text

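Interface
To make annotation more efficient,
an interface was developed as a web application
Features
• Highlighting and extraction of technical terms
• Training by active learning and recommendation of data to annotate
• CSV export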

Slide 26

Slide 26 text

DEMO

Slide 27

Slide 27 text

Application architecture

Slide 28

Slide 28 text

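Summary
Goal
• Extract technical terms from a small amount of text data
Method
• Provide a web application that uses unsupervised learning plus active learning
Future work
• Speed up by reimplementing the extraction algorithm
• Evaluate performance in applications such as search, and investigate new methods and features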

Slide 29

Slide 29 text

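References
[1] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. Term extraction based on occurrence and concatenation frequency. Journal of Natural Language Processing. 2003, 10(1), p.27-45.
[2] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. Extraction of index terms for hypertext generation using connection information between nouns in Japanese manual text. IPSJ SIG Technical Report, Natural Language Processing. 1996, (114), p.65-72.
[3] "Perl module for automatic extraction of technical terms (keywords)". Welcome to the page of the "Automatic technical term (keyword) extraction system". http://gensen.dl.itc.u-tokyo.ac.jp/termextract.html, (accessed 2018-9-4).
[4] Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648. 2010. http://burrsettles.com/pub/settles.activelearning.pdf, (accessed 2018-9-4).
[5] Burr Settles, Mark Craven. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. EMNLP. 2008.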