
Research on Technical Term Extraction Methods and
Development of an Extraction Application

Koga Kobayashi
September 27, 2018

Transcript

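  1. Self-introduction • Koga Kobayashi: @kajyuuen • University of Tsukuba, School of Informatics, 4th year • Research interests: natural language processing, machine learning
    • For development I mostly use Ruby on Rails • Hobbies • The internet, listening to music, motorcycles (currently on a break)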
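  2. Extracting technical term candidates by occurrence frequency and connection frequency [Nakagawa+ 2003] • Example: 自然言語処理 (natural language processing), made of the simple nouns 自然, 言語, and 処理 • For each simple noun, count how many times it was joined to a preceding word and how many times it was joined to a following word
     Importance = geometric mean of the connection counts of the simple nouns that form the compound noun
 $\text{Importance}(\text{自然言語処理}) = \sqrt[6]{1 \cdot 2 \cdot 2 \cdot 3 \cdot 1 \cdot 1} = \sqrt[6]{12} \approx 1.51$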
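A minimal sketch of the importance score above: collect the two connection counts of every simple noun, multiply them, and take the n-th root. The function name `connection_importance` is hypothetical, and pairing the six slide values as (preceding, following) per noun is an assumption; the product, and therefore the score, does not depend on that pairing.

```python
import math

def connection_importance(connection_counts):
    """Geometric mean of the connection counts of the simple nouns in a compound noun.

    connection_counts: one (preceding, following) pair per simple noun, i.e. how often
    the noun was joined to a word before it and to a word after it.
    """
    values = [count for pair in connection_counts for count in pair]
    return math.prod(values) ** (1 / len(values))

# 自然 / 言語 / 処理 with the six counts from the slide (pairing assumed).
counts = [(1, 2), (2, 3), (1, 1)]
print(round(connection_importance(counts), 2))  # 1.51
```

Because the score is a product, a single zero count makes the whole score zero.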
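  3. Building feature vectors • Features: the surface form, part of speech, and character type of the two words before and after the candidate • plus the importance score from the unsupervised method; these are combined into a feature vector
    Example sentence: 研究 は 自然言語処理 と 機械 学習 です ("My research is on natural language processing and machine learning") • Parts of speech: 研究 noun, は particle, 自然言語処理 technical term candidate, と particle, 機械 noun, 学習 noun, です auxiliary verb • Importance of the candidate: 1.51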
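A sketch of the feature construction, assuming the sentence has already been split into (surface form, part of speech) pairs by a morphological analyzer and that the candidate spans `tokens[start:end]`. The names `make_features`, `char_type`, the `<PAD>` token, and the character-type categories are hypothetical choices; the result is a feature dict that would still need to be encoded (for example one-hot) into a numeric vector.

```python
import unicodedata

PAD = "<PAD>"  # placeholder when the window runs past the sentence boundary

def char_type(text):
    """Coarse character type of a token; this category set is a hypothetical choice."""
    kinds = set()
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            kinds.add("kanji")
        elif name.startswith("HIRAGANA"):
            kinds.add("hiragana")
        elif name.startswith("KATAKANA"):
            kinds.add("katakana")
        elif ch.isdigit():
            kinds.add("digit")
        elif ch.isascii() and ch.isalpha():
            kinds.add("latin")
        else:
            kinds.add("other")
    return "+".join(sorted(kinds))

def make_features(tokens, start, end, importance, window=2):
    """Feature dict for the candidate tokens[start:end].

    tokens: list of (surface form, part of speech) pairs.
    Uses the `window` tokens before and after the candidate plus its importance score.
    """
    features = {"importance": importance}
    for offset in range(1, window + 1):
        for side, idx in (("prev", start - offset), ("next", end - 1 + offset)):
            surface, pos = tokens[idx] if 0 <= idx < len(tokens) else (PAD, PAD)
            features[f"{side}{offset}_surface"] = surface
            features[f"{side}{offset}_pos"] = pos
            features[f"{side}{offset}_chartype"] = PAD if surface == PAD else char_type(surface)
    return features

# The slide's example sentence, with the candidate kept as a single token.
tokens = [("研究", "名詞"), ("は", "助詞"), ("自然言語処理", "名詞"), ("と", "助詞"),
          ("機械", "名詞"), ("学習", "名詞"), ("です", "助動詞")]
features = make_features(tokens, start=2, end=3, importance=1.51)
print(features["prev2_surface"], features["prev1_surface"],
      features["next1_surface"], features["next2_surface"])  # 研究 は と 機械
```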
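  4. Data selection and model updates • Uncertainty Sampling (least confident): recommend the data point about which the current model is least certain
    $x^*_{LC} = \operatorname{arg\,max}_{x \in U} \left(1 - P_\theta(\hat{y} \mid x)\right)$ • $\hat{y}$: the label with the highest probability • $U$: the set of unlabeled data • $x^*_{LC}$: the data point recommended for labeling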
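The least-confident rule translates almost directly into code. This sketch assumes each candidate is classified on its own (term vs. non-term) and that the model exposes label probabilities through a callable; for a sequence model, $\hat{y}$ would instead be the most likely label sequence. The function name and the probability values are hypothetical.

```python
def least_confident(unlabeled, predict_proba):
    """Pick x* = argmax_{x in U} (1 - P(y_hat | x)), y_hat being the most probable label.

    predict_proba(x) -> {label: probability} under the current model.
    """
    return max(unlabeled, key=lambda x: 1.0 - max(predict_proba(x).values()))

# Hypothetical model outputs for three candidates (term / non-term).
probs = {
    "自然言語処理": {"term": 0.90, "other": 0.10},
    "能動学習": {"term": 0.55, "other": 0.45},
    "研究": {"term": 0.20, "other": 0.80},
}
print(least_confident(probs, probs.get))  # 能動学習
```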
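  5. Experiment 1: Technical term extraction on Wikipedia • Data • Technical terms are extracted from 61 Wikipedia texts • Setup • One third of the terms extracted by the unsupervised method are annotated
    • The model is retrained each time 5 data points have been labeled • Active learning is compared with random sampling, and results are compared across dictionaries • Aim: show that active learning outperforms random sampling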
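The experimental protocol (label 5 items, retrain, repeat until one third of the candidates are labeled; least-confident vs. random selection) can be simulated as below. This is a sketch under stated assumptions: `simulate_annotation`, the `oracle` standing in for the human annotator, and the toy `train`/`predict_proba` functions are all hypothetical, and drawing the very first batch at random (since no model exists yet) is an assumption rather than the deck's method.

```python
import random

def simulate_annotation(pool, oracle, train, predict_proba, budget,
                        batch_size=5, strategy="least_confident", seed=0):
    """Label `batch_size` items at a time and retrain after each batch, until `budget` labels.

    oracle(x) -> gold label (stands in for the human annotator)
    train(labeled_pairs) -> model
    predict_proba(model, x) -> {label: probability}
    strategy: "least_confident" or "random" (the random-sampling baseline)
    """
    rng = random.Random(seed)
    unlabeled = list(pool)
    labeled, model = [], None
    while unlabeled and len(labeled) < budget:
        k = min(batch_size, budget - len(labeled), len(unlabeled))
        if strategy == "random" or model is None:
            # Assumption: with no trained model yet, the first batch is drawn at random.
            batch = rng.sample(unlabeled, k)
        else:
            # Lowest top-label probability = least confident, i.e. largest 1 - P(y_hat | x).
            batch = sorted(unlabeled, key=lambda x: max(predict_proba(model, x).values()))[:k]
        for x in batch:
            unlabeled.remove(x)
            labeled.append((x, oracle(x)))
        model = train(labeled)  # retrain once the batch is labeled
    return model, labeled

# Toy setup (hypothetical): 30 candidates, budget = one third of them.
pool = list(range(30))

def oracle(x):
    return "term" if x >= 15 else "other"

def train(pairs):
    return len(pairs)  # toy "model": just the number of labels seen

def predict_proba(model, x):
    p = x / 29
    return {"term": p, "other": 1 - p}

model, labeled = simulate_annotation(pool, oracle, train, predict_proba, budget=len(pool) // 3)
print(len(labeled))  # 10
```

Running the same call with `strategy="random"` gives the random-sampling baseline under an identical labeling budget.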
  6. Experiment 1: Results [Tables, one per dictionary (IPAdic, NEologd): Precision, Recall, and F-value for unsupervised learning, random sampling, and active learning]
    • With both dictionaries, active learning outperformed random sampling • Using NEologd gave higher performance
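The results are reported as Precision, Recall, and F-value. A sketch of how these are usually computed for term extraction, assuming exact matching of extracted terms against a gold-standard term list and F-value meaning F1 (the harmonic mean of precision and recall); the function name and the example terms are hypothetical.

```python
def precision_recall_f(predicted, gold):
    """Term-level precision, recall and F-value (F1, the harmonic mean of P and R)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical system output and gold-standard terms.
p, r, f = precision_recall_f({"自然言語処理", "機械学習", "研究"},
                             {"自然言語処理", "機械学習", "能動学習"})
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.667 F=0.667
```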
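  7. Experiment 2: Technical term extraction in the FAQ domain • Training data • FAQ text (5,113 characters) collected from the スカパー！ (SKY PerfecTV!) help content • Setup •
    Compared against a model whose annotations are chosen at random • The model is retrained each time 5 data points have been labeled • When the number of annotations is 0, all extracted words are treated as technical terms • Aim: determine how much annotation is needed to obtain a practical model https://helpcenter.skyperfectv.co.jp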
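The zero-annotation rule gives the starting point of the learning curve: with no trained model, the output is simply the full candidate list. A sketch, where `predict_terms`, the 0.5 `threshold`, and the toy probability function are hypothetical; the deck does not say how a trained model's output is thresholded.

```python
def predict_terms(candidates, model=None, predict_proba=None, threshold=0.5):
    """Technical terms output by the system.

    With zero annotations there is no model, so every extracted candidate counts as a term.
    Otherwise keep candidates whose "term" probability reaches `threshold` (a hypothetical cutoff).
    """
    if model is None:
        return list(candidates)
    return [c for c in candidates if predict_proba(model, c)["term"] >= threshold]

candidates = ["チャンネル", "契約", "ご案内"]  # hypothetical extracted words
print(predict_terms(candidates))  # ['チャンネル', '契約', 'ご案内']

def toy_proba(model, word):
    return {"term": 0.9 if word != "ご案内" else 0.1}

print(predict_terms(candidates, model="trained", predict_proba=toy_proba))  # ['チャンネル', '契約']
```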
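  8. References
    [1] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. Technical term extraction based on occurrence frequency and connection frequency (in Japanese). Journal of Natural Language Processing. 2003, 10(1), pp. 27-45.
    [2] Hiroshi Nakagawa, Hiroaki Yumoto, Tatsunori Mori. Extraction of index terms for hypertext conversion of Japanese manuals using connection information between nouns (in Japanese). IPSJ SIG Technical Report, Natural Language Processing. 1996, (114), pp. 65-72.
    [3] "Perl module for automatic extraction of technical terms (keywords)". "Welcome to the Automatic Technical Term (Keyword) Extraction System page". http://gensen.dl.itc.u-tokyo.ac.jp/termextract.html (accessed 2018-9-4).
    [4] Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648. 2010. http://burrsettles.com/pub/settles.activelearning.pdf (accessed 2018-9-4).
    [5] Burr Settles, Mark Craven. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. EMNLP. 2008.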