
Research on Technical Term Extraction Methods and Development of an Extraction Application

Koga Kobayashi
September 27, 2018

Transcript

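  1. Self-introduction
     • Koga Kobayashi: @kajyuuen
     • University of Tsukuba, School of Informatics, 4th year
     • Research: natural language processing and machine learning
     • For development I mostly use Ruby on Rails
     • Hobbies: the internet, listening to music, motorcycles (currently on a break)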
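  2. Extracting technical-term candidates from occurrence frequency and concatenation frequency [Nakagawa+ 2003]
     Example: 自然言語処理 (natural language processing)
     [Table: single noun (自然, 言語, 処理) vs. number of times it was joined to a preceding word and number of times it was joined to a following word]
     Importance = geometric mean of the concatenation counts of the single nouns that make up the compound
     $= \sqrt[6]{1 \cdot 2 \cdot 2 \cdot 3 \cdot 1 \cdot 1} = 1.51$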
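The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the geometric mean itself: the function name `geometric_mean_score` is hypothetical, and the six factors are copied from the slide as-is, since the transcript does not preserve which table cell each factor comes from.

```python
import math


def geometric_mean_score(counts):
    """Geometric mean of the given concatenation counts (hypothetical helper)."""
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("counts must be non-empty and strictly positive")
    # n-th root of the product, where n is the number of factors
    return math.prod(counts) ** (1.0 / len(counts))


# The six factors shown on the slide for the compound 自然言語処理.
score = geometric_mean_score([1, 2, 2, 3, 1, 1])
print(round(score, 2))
```

The product of the factors is 12, and the sixth root of 12 rounds to the 1.51 shown on the slide.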
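  3. Building the feature vector
     • Surface form, part of speech, and character type of the two words before and after the candidate
     • Importance score from unsupervised learning
     The feature vector is built from these.
     Example sentence: 研究 は 自然言語処理 と 機械 学習 です ("My research is natural language processing and machine learning")
     Parts of speech: noun, particle, [technical-term candidate], particle, noun, noun, auxiliary verb
     Importance: 1.51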
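One way to picture the feature vector described on this slide is as a dictionary of context features. The sketch below is an assumption-laden illustration, not the author's implementation: the function names, the feature keys, the coarse character-type categories, and the pre-tokenized `(surface, part of speech)` input are all hypothetical.

```python
def char_type(word):
    """Coarse character type of a word (hypothetical categories)."""
    def one(ch):
        if "\u3040" <= ch <= "\u309f":
            return "hiragana"
        if "\u30a0" <= ch <= "\u30ff":
            return "katakana"
        if "\u4e00" <= ch <= "\u9fff":
            return "kanji"
        if ch.isdigit():
            return "digit"
        if ch.isascii() and ch.isalpha():
            return "latin"
        return "other"

    types = {one(ch) for ch in word}
    return types.pop() if len(types) == 1 else "mixed"


def candidate_features(tokens, index, importance, window=2):
    """Features for the candidate at tokens[index].

    tokens is a list of (surface, part_of_speech) pairs; importance is the
    score from the unsupervised step.
    """
    features = {"importance": importance}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the candidate itself is described by its importance
        position = index + offset
        if 0 <= position < len(tokens):
            surface, tag = tokens[position]
            features[f"{offset:+d}:surface"] = surface
            features[f"{offset:+d}:pos"] = tag
            features[f"{offset:+d}:chartype"] = char_type(surface)
        else:
            features[f"{offset:+d}:pad"] = True  # context falls outside the sentence
    return features


# The example sentence from the slide, already tokenized (assumed input format).
tokens = [("研究", "noun"), ("は", "particle"), ("自然言語処理", "candidate"),
          ("と", "particle"), ("機械", "noun"), ("学習", "noun"),
          ("です", "auxiliary verb")]
features = candidate_features(tokens, 2, 1.51)
```

A dictionary of this shape can be passed to most feature-based classifiers after one-hot encoding of the string values.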
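  4. Data selection and model update
     Uncertainty Sampling (least confident)
     Recommend the data point that the current model is least certain about.
     $x^*_{LC} = \arg\max_{x \in U} \; 1 - P_\theta(\hat{y} \mid x)$
     $\hat{y}$: the label with the highest probability under the model
     $U$: the set of unlabeled data
     $x^*_{LC}$: the data point recommended for labeling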
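The least-confident rule above can be sketched in Python as follows. The function name `least_confident` and the input format (a dictionary from item id to per-label probabilities) are assumptions made for illustration; in a sequence-labeling setting, the probability of the best label would be replaced by the probability of the best label sequence.

```python
def least_confident(probabilities):
    """Return the id of the item that maximizes 1 - max_y P(y | x).

    probabilities maps an item id to a dict of label -> probability.
    """
    def uncertainty(item_id):
        return 1.0 - max(probabilities[item_id].values())

    return max(probabilities, key=uncertainty)


# Hypothetical model outputs for three unlabeled candidates.
pool = {
    "a": {"term": 0.95, "not_term": 0.05},
    "b": {"term": 0.55, "not_term": 0.45},
    "c": {"term": 0.20, "not_term": 0.80},
}
chosen = least_confident(pool)
print(chosen)  # b
```

Item `b` is chosen because its most likely label has the lowest probability (0.55) among the three items.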
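  5. Experiment 1: technical term extraction on Wikipedia
     • Data
       • Technical terms are extracted from 61 Wikipedia articles
     • Conditions
       • One third of the terms extracted by unsupervised learning are annotated
       • The model is retrained each time 5 data points have been labeled
       • Active learning is compared with random sampling, and the dictionaries are compared
     Goal: show that active learning outperforms random sampling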
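The retrain-every-5-labels schedule described above can be sketched as a generic loop. Everything here is hypothetical scaffolding: `active_learning_loop`, `train`, `select`, and `oracle` are illustrative names, and the toy threshold model stands in for whatever classifier the experiment actually used.

```python
def active_learning_loop(pool, oracle, train, select, batch_size=5):
    """Label the pool one item at a time, retraining every batch_size labels."""
    unlabeled = list(pool)
    labeled = []
    model = train(labeled)
    retrain_count = 0
    while unlabeled:
        item = select(model, unlabeled)        # pick the next item to annotate
        unlabeled.remove(item)
        labeled.append((item, oracle(item)))   # ask the annotator for a label
        if len(labeled) % batch_size == 0:
            model = train(labeled)             # retrain after each full batch
            retrain_count += 1
    return labeled, retrain_count


# Toy setup: items are numbers, and the true label is "item >= 6".
def train(labeled):
    positives = [x for x, y in labeled if y]
    negatives = [x for x, y in labeled if not y]
    if not positives or not negatives:
        return None  # not enough information for a threshold yet
    return (min(positives) + max(negatives)) / 2


def select(model, unlabeled):
    if model is None:
        return unlabeled[0]
    # The item closest to the current threshold is the most uncertain one.
    return min(unlabeled, key=lambda x: abs(x - model))


labeled, retrain_count = active_learning_loop(range(12), lambda x: x >= 6, train, select)
```

Swapping `select` for a function that picks a random unlabeled item gives the random-sampling baseline with the same retraining schedule.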
  6. Experiment 1: results
     [Tables for IPAdic and NEologd: Model (unsupervised learning, random sampling, active learning) vs. Precision, Recall, F-value]
     • With both dictionaries, active learning outperformed random sampling
     • Performance was higher with NEologd
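For reference, the three metrics in the result tables can be computed from a set of extracted terms and a set of gold terms as sketched below. The function name and the example term sets are hypothetical, and the sketch assumes that "F-value" means the balanced F1 score, which the slide does not state.

```python
def precision_recall_f(extracted, gold):
    """Set-based precision, recall, and F1 for extracted terms."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical example: 2 of 3 extracted terms are correct, out of 4 gold terms.
extracted = {"自然言語処理", "機械学習", "研究"}
gold = {"自然言語処理", "機械学習", "能動学習", "専門用語"}
precision, recall, f_value = precision_recall_f(extracted, gold)
```

With these sets, precision is 2/3, recall is 1/2, and the F-value is 4/7.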
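  7. Experiment 2: technical term extraction in the FAQ domain
     • Training data
       • FAQ text taken from the SKY PerfecTV! (スカパー！) help content, 5,113 characters
     • Conditions
       • Comparison with a model whose annotations are chosen at random
       • The model is retrained each time 5 data points have been labeled
       • With 0 annotations, every extracted word is treated as a technical term
     Goal: check how much annotation is needed before the model becomes practical
     https://helpcenter.skyperfectv.co.jp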
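  8. References
     [1] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. 出現頻度と連接頻度に基づく専門用語抽出 (Technical term extraction based on occurrence frequency and concatenation frequency). 自然言語処理 (Journal of Natural Language Processing). 2003, 10(1), p.27-45.
     [2] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. 日本語マニュアル文における名詞間の連接情報を用いたハイパーテキスト化のための索引語の抽出 (Extraction of index terms for hypertext generation using concatenation information between nouns in Japanese manuals). 情報処理学会研究報告自然言語処理 (IPSJ SIG Technical Reports, Natural Language Processing). 1996, (114), p.65-72.
     [3] "専門用語（キーワード）自動抽出用Perlモジュール" (Perl module for automatic extraction of technical terms (keywords)). "専門用語（キーワード）自動抽出システム"のページへようこそ (Welcome to the page of the automatic technical term (keyword) extraction system). http://gensen.dl.itc.u-tokyo.ac.jp/termextract.html, (accessed 2018-9-4).
     [4] Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648. 2010. http://burrsettles.com/pub/settles.activelearning.pdf, (accessed 2018-9-4).
     [5] Burr Settles, Mark Craven. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. EMNLP. 2008.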