極大部分文字列による関連フレーズ抽出とその応用 / Related Keyphrase Extraction by Maximal Substrings

Takuya Asano

January 26, 2017

Transcript

  1. Related Keyphrase Extraction by Maximal Substrings and Its Applications
     id:takuya-a / @takuya_a
     Tech study session, 2017-01-26
  2. id:takuya-a
     • Application engineer; joined the company in April 2015
     • Interests: information retrieval, natural language processing, machine learning
     • OSS work: kuromoji.js and others
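  3. Automatic keyphrase extraction
     • An NLP task that automatically selects, from the body text of a document, phrases that are both important and on-topic (Turney 2000)
     • Example text: 「何なんだよ日本。一億総活躍社会じゃねーのかよ。昨日見事に保育園落ちたわ。」 ("What is wrong with you, Japan? Weren't we supposed to be a society where all hundred million people play an active part? My child was flatly rejected from daycare yesterday.") http://anond.hatelabo.jp/20160215171759
     • Named entity recognition (NER)
       • Also picks up expressions that are unrelated to the main topic
     • Entity linking
       • Links keyphrases to entities (such as Wikipedia titles)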
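  4. Automatic keyphrase extraction
     • Single words (morphemes) are too fine-grained a unit
     • Morphological analysis alone sometimes fails to recover meaningful units
       • Person name: 上_白石_萌_音 (Mone Kamishiraishi)
       • Place name: 烏丸_御池 (Karasuma Oike)
       • Title of a work: あの_日_見_た_花_の_名前_を_僕達_は_まだ_知ら_ない_。 (the anime known as "Anohana")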
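  5. Automatic keyphrase extraction
     • The problem can hardly be called solved
     • Even for English, the best-performing methods score low (Hasan & Ng 2014)
       • Precision: 27.2 to 35.0%
       • Recall: 27.8 to 66.0%
       • F-score: 27.5 to 45.7%
       • (The numbers vary considerably by corpus)
     • Japanese is likely to be even harder
     • Goal: accuracy that holds up in a product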
  6. Well-known APIs
     • Yahoo! JAPAN Text Analysis Web API
       • Keyphrase Extraction API
     • Microsoft Cognitive Services
       • Text Analytics API
     • There may be others
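  7. Yahoo! Keyphrase Extraction API
     • Tried on the 「保育園落ちた」 ("rejected from daycare") post
     "保育園": 100            (daycare)
     "児童手当": 79           (child allowance)
     "賄賂": 76               (bribe)
     "ゴマン": 63             (colloquial "tons of")
     "半分位クビ": 58         ("fire about half")
     "子供産むやつなんかい": 55  (sentence fragment)
     "エンブレム": 55         (emblem)
     "ウチワ": 54             (hand fan)
     "国会議員": 48           (Diet member)
     "少子化": 47             (declining birthrate)
     "オリンピック": 45       (Olympics)
     "デザイナー": 44         (designer)
     "日本": 42               (Japan)
     "ムシ": 41               (word fragment)
     "財源": 40               (funding source)
     "児童手当20万": 39       (child allowance of 200,000)
     "無償": 36               (free of charge)
     "一億総活躍社会": 35     (society where all hundred million play an active part)
     "費用全て": 34           (all costs)
     "税金": 33               (taxes)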
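  8. Implementing keyphrase extraction
     • We would rather implement it ourselves
       • We want to tune it for each use case
       • Hatena Bookmark's large-scale text data is available
       • Every document is already in Elasticsearch!
     • A job for string algorithms
       • Deliver realistic accuracy in realistic computation time
       • Keep costs down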
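  9. Keyphrase extraction pipeline
     1. Extract candidate keyphrases
        • Treating every substring as a candidate gives on the order of n^2 candidates
          • n possible start positions x n possible lengths
        • We need a way to cover the correct keyphrases while keeping the computation down
        • The later stages of the pipeline should run fast
     2. Score the keyphrases
        • Assign a score to each candidate keyphrase
        • Select the top-scoring keyphrases by a score threshold, a result count, or similar
     (For details, see Hasan & Ng 2014, 3. Keyphrase Extraction Approaches)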
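  10. Phase 1: Extracting candidate keyphrases
     • Naive approach: considering every substring costs O(n^2)
       • In practice you end up cutting off at something like 5 tokens
       • Long phrases are then missed
     • Existing approaches: heuristic, rule-based
       1. Filter out stopwords with a dictionary (Liu+ 2009)
       2. Select spans matching part-of-speech sequence patterns (Mihalcea & Tarau 2004, Wan & Xiao 2008, Liu+ 2009)
       3. Select spans matching lexico-syntactic patterns (Nguyen and Phan 2009)
       4. Select spans matching substrings (n-grams) of Wikipedia titles (Grineva+ 2009)
     (For details, see Hasan & Ng 2014, 3.1 Selecting Candidate Words and Phrases)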
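To make the candidate blow-up on this slide concrete, here is a minimal Python sketch that counts contiguous token spans with and without a length cutoff. The function name `candidate_ngrams` and its `max_len` parameter are illustrative and do not come from the slides; the token sequence is the segmented title shown on the earlier slide.

```python
def candidate_ngrams(tokens, max_len=None):
    """Return every contiguous token span, optionally capped at max_len tokens."""
    n = len(tokens)
    spans = []
    for start in range(n):
        longest = n - start if max_len is None else min(max_len, n - start)
        for length in range(1, longest + 1):
            spans.append(tuple(tokens[start:start + length]))
    return spans


# Segmented title from the morphological-analysis example (14 tokens).
title = ["あの", "日", "見", "た", "花", "の", "名前", "を",
         "僕達", "は", "まだ", "知ら", "ない", "。"]

all_spans = candidate_ngrams(title)                # quadratic in len(title)
capped_spans = candidate_ngrams(title, max_len=5)  # cut off at 5 tokens

print(len(all_spans), len(capped_spans))           # 105 60
print(tuple(title) in capped_spans)                # False: the full title is lost
```

The cutoff roughly halves the candidates here, but the 14-token title itself can no longer be a candidate, which is the limitation the slide points out.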
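  11. Guess-the-part-of-speech game
     • Noun? Verb? Particle? Auxiliary verb? Adverb? Adnominal?
     • 「そう」: noun
     • 「いわく」: noun
     • 「ご覧」: noun
     • 「きらびやか」: noun
     • All of them are nouns!
     Source: それはたぶんあなたの欲しかった名詞ではない ("That is probably not the noun you wanted") - 押してダメならふて寝しろ http://ikawaha.hateblo.jp/entry/2016/05/20/155504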
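  12. Candidate keyphrase extraction with maximal substrings
     • We want to enumerate frequent phrases without missing any
       • In this Phase 1, coverage (recall) matters most
       • Filtering happens later, in Phase 2
     • By choosing a "representative" for each bundle of substrings, we reduce the number of candidates while keeping coverage
     • Look at containment between substrings in terms of their occurrence positions
       • Two substrings count as the same if their occurrence positions coincide after shifting by the difference in string length (explained later)
     • Group all substrings by this containment relation
     • The longest substring in each group is a maximal substring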
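  13. Example of maximal substrings (1): Y a b r e - K a b r e
     • The only maximal substring is abre
     • Every substring that appears two or more times is contained in abre
       • { a, b, r, e, ab, br, re, abr, bre, abre }
       • These substrings all belong to the same group
       • The longest one, abre, is the maximal substring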
  16. Example of maximal substrings (2): a b r a c a d a b r a
     • The maximal substrings are abra and a
     • a also occurs outside of abra, so it forms a separate group
  18. Example of maximal substrings (2): a b r a c a d a b r a
     • The maximal substrings are abra and a
     • a also occurs outside of abra, so it forms a separate group
     • ab is in the same group as abra
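  19. Example of maximal substrings (3)
     s h i m o b a y a s h i
     • The maximal substrings are shi and a
     • Shifting i by two characters makes its occurrence positions coincide with those of shi, so i is in the same group as shi
     a k a w a k a m i
     • The maximal substrings are aka and a
     • The a that shows up a second time inside aka has different occurrence positions, so a is not in the same group as aka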
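The three examples can be checked with a brute-force Python sketch. It relies on an equivalent characterization rather than building the groups explicitly: a substring that occurs at least twice is the longest member of its group exactly when it cannot be extended to the left or to the right without losing an occurrence. That means its occurrences are preceded by at least two different characters and followed by at least two different characters, with the text boundary counted as its own character. The function name and the `None` boundary marker are choices made for this illustration, and the cubic-time enumeration is only meant for tiny inputs.

```python
from collections import defaultdict


def maximal_substrings(text):
    """Brute force: map each maximal substring of `text` to its occurrence count."""
    n = len(text)
    occurrences = defaultdict(list)
    for start in range(n):
        for end in range(start + 1, n + 1):
            occurrences[text[start:end]].append(start)

    result = {}
    for sub, starts in occurrences.items():
        if len(starts) < 2:
            continue
        # Characters just before / after each occurrence; None marks the text boundary.
        left = {text[s - 1] if s > 0 else None for s in starts}
        right = {text[s + len(sub)] if s + len(sub) < n else None for s in starts}
        if len(left) > 1 and len(right) > 1:
            result[sub] = len(starts)
    return result


print(maximal_substrings("yabre-kabre"))    # {'abre': 2}
print(maximal_substrings("abracadabra"))    # {'a': 5, 'abra': 2}
print(maximal_substrings("shimobayashi"))   # {'shi': 2, 'a': 2}
print(maximal_substrings("akawakami"))      # {'a': 4, 'aka': 2}
```

For example, ab is always followed by r in abracadabra, so it can be extended without losing an occurrence and is absorbed into abra.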
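  20. Enumerating candidate keyphrases with maximal substrings
     • With maximal substrings we can enumerate frequent strings of arbitrary length
       • Maximal substrings are also called ∞-grams
     • During enumeration we learn how many times each maximal substring occurs
       • So we can filter by occurrence count
     • Internal nodes of the suffix tree correspond to maximal substrings
       • However, not every node is a maximal substring (explained later)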
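  21. Suffix tree
     • A Patricia trie over all suffixes of the text
     • Example: the suffix tree of abracadabra$
     [Okanohara & Tsujii 08] Daisuke Okanohara, Jun'ichi Tsujii. "全ての部分文字列を考慮した文書分類" (Document classification using all substrings), NL187, SIG Natural Language Processing (自然言語処理研究会), 2008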
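  22. Suffix tree
     • Looking at the BWT tells us whether a node is a maximal substring [1]
       • BWT here means the character that precedes each suffix
     • If the BWT characters of the suffixes under a suffix-tree node consist of two or more distinct characters, that node is a maximal substring
     [1] 極大部分文字列 - アスペ日記 http://d.hatena.ne.jp/takeda25/20101202/1291269994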
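As a small illustration of the BWT check, the following Python sketch builds a naive suffix array for abracadabra$, derives the preceding character of each suffix, and tests whether the suffixes that start with a given string have two or more distinct preceding characters. The helper names are made up for this example, and the linear scan in `is_left_diverse` stands in for the suffix-tree node lookup. The check covers only the left side; being an internal node of the suffix tree supplies the right side.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_chars(text, sa):
    """Character preceding each suffix; suffix 0 wraps around to the sentinel."""
    return [text[i - 1] for i in sa]


def is_left_diverse(text, sa, bwt, sub):
    """True if the suffixes starting with `sub` have >= 2 distinct BWT characters."""
    preceding = {b for i, b in zip(sa, bwt) if text.startswith(sub, i)}
    return len(preceding) >= 2


text = "abracadabra$"
sa = suffix_array(text)
bwt = bwt_chars(text, sa)

print("".join(bwt))                            # ard$rcaaaabb
print(is_left_diverse(text, sa, bwt, "abra"))  # True:  preceded by '$' and 'd'
print(is_left_diverse(text, sa, bwt, "bra"))   # False: always preceded by 'a'
```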
  23. Enhanced suffix array (ESA)
     • A data structure that supports suffix-tree operations at the same computational complexity
       • For example, enumerating the nodes of the suffix tree
     • Enhanced suffix array (ESA) = suffix array (SA) + longest common prefix array (LCP)
       • 9n bytes for a text of length n [1]
       • More compact than a suffix tree (20n bytes or more) [2]
     [1] D. Okanohara and J. Tsujii. 2009. Text Categorization with All Substring Features. In the SIAM International Conference on Data Mining (SDM).
     [2] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. 2004. Replacing suffix trees with enhanced suffix arrays. J. Discrete Algs, 2:53–86.
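  24. Enumerating maximal substrings
     1. Build the ESA (SA + LCP) with the SA-IS algorithm and Kasai's algorithm
     2. Check for suffixes where the BWT character changes [2]
     3. Enumerate internal nodes using the LCP array
        • While doing so, check the BWT and enumerate only the maximal substrings
     • Enumerating the internal nodes of the suffix tree runs in time linear in the text length T [1]
     • Checking for BWT changes is also possible in linear time
     [1] T. Kasai, G. Lee, H. Arimura, S. Arikawa and K. Park "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", CPM 2001
     [2] 極大部分文字列 - アスペ日記 http://d.hatena.ne.jp/takeda25/20101202/1291269994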
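The three steps can be sketched end to end in Python. This is an illustration under stated assumptions, not the implementation used in the talk: the suffix array is built by plain sorting as a stand-in for SA-IS, the text has no sentinel (the text boundary is represented by `None` in the BWT), and all function names are invented here. The LCP array uses Kasai's algorithm, internal nodes are found with the usual stack scan over the LCP array, and the BWT check uses prefix counts of BWT changes so that each node is tested in constant time.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; a stand-in for SA-IS.
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai_lcp(text, sa):
    """lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def internal_nodes(lcp):
    """Yield (depth, left, right): suffix-array ranks left..right-1 share a
    prefix of `depth` characters and form one internal node below the root."""
    n = len(lcp)
    stack = []  # entries are (depth, left_rank)
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else 0
        left = i - 1
        while stack and stack[-1][0] > cur:
            depth, left = stack.pop()
            yield depth, left, i
        if cur > 0 and (not stack or stack[-1][0] < cur):
            stack.append((cur, left))


def enumerate_maximal_substrings(text):
    """Map each maximal substring to its number of occurrences."""
    sa = build_suffix_array(text)
    lcp = kasai_lcp(text, sa)
    bwt = [text[s - 1] if s > 0 else None for s in sa]
    # changes[r] = number of ranks k <= r where bwt[k] differs from bwt[k-1].
    changes = [0] * len(sa)
    for r in range(1, len(sa)):
        changes[r] = changes[r - 1] + (bwt[r] != bwt[r - 1])
    result = {}
    for depth, left, right in internal_nodes(lcp):
        if changes[right - 1] - changes[left] > 0:  # BWT varies inside the node
            start = sa[left]
            result[text[start:start + depth]] = right - left
    return result


print(enumerate_maximal_substrings("abracadabra"))  # {'abra': 2, 'a': 5}
```

The node width `right - left` is the occurrence count, which is what makes filtering by frequency cheap once the nodes are enumerated.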
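  25. esaxx
     • A C++ library that enumerates the internal nodes of a suffix tree
       • It builds an enhanced suffix array (ESA)
     • The check for whether a node is a maximal substring has to be implemented yourself, using [1] as a guide
     • https://code.google.com/archive/p/esaxx/
     [1] 極大部分文字列 - アスペ日記 http://d.hatena.ne.jp/takeda25/20101202/1291269994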
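  26. Phase 2: Scoring keyphrases
     • How should the score (weight) of a phrase be computed?
     • There are many ways to weight single words
       • TF-IDF
       • JLH score
       • Mutual information
       • Chi-square value
     • Naive approaches
       1. Sum the weights of the words in the phrase
       2. Compute a weight for the phrase directly (the same way as for a word)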
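Naive approach 1 can be sketched in a few lines of Python. The per-word weights below are invented numbers purely for illustration (they could come from TF-IDF or any of the measures listed), and `phrase_score_sum` is a hypothetical helper name. The example also shows one reason the approach is called naive: appending a low-weight word can never lower the score, so longer candidates are favored.

```python
def phrase_score_sum(tokens, word_weights):
    """Naive phrase score: the sum of per-word weights (unknown words count as 0)."""
    return sum(word_weights.get(token, 0.0) for token in tokens)


# Made-up word weights, for illustration only.
word_weights = {"新海": 3.0, "誠": 2.5, "の": 0.1}

short = phrase_score_sum(["新海", "誠"], word_weights)
longer = phrase_score_sum(["新海", "誠", "の"], word_weights)

print(short)           # 5.5
print(longer > short)  # True: the trailing particle still adds to the score
```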
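  27. Experiment
     1. Search Elasticsearch with a keyword to obtain a document set
        - 「あの花」 ("Anohana"), 「君の名は」 ("Your Name"), 「任天堂」 (Nintendo), and so on
     2. Fetch the body text of each document
        - This time only the first 300 characters
     3. Run morphological analysis with MeCab
        - Use morphemes rather than characters as the basic unit (noise reduction)
     4. Compute maximal substrings and enumerate candidate phrases
        - Only those that occur 5 or more times
     5. Score the candidate phrases by chi-square value
        - Uses Elasticsearch phrase search
        - Computable from the following statistics
          - Total number of documents
          - Number of documents hit by the keyword
          - Number of documents containing the candidate phrase within each of those sets
     6. Return the maximal substrings with the top-K scores
        - This time K = 500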
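Step 5 can be illustrated with the standard Pearson chi-square statistic for a 2x2 contingency table (keyword hit or not, phrase present or not). The slides do not spell out the exact formula that was used, so treat this Python sketch as one plausible reading; the function and parameter names are invented here, and in the setup described above the four counts would come from Elasticsearch queries.

```python
def chi_square(n_docs, n_hits, n_phrase, n_phrase_in_hits):
    """Pearson chi-square for the association between a keyword and a phrase.

    n_docs           -- total number of documents
    n_hits           -- documents matching the keyword
    n_phrase         -- documents containing the phrase (anywhere)
    n_phrase_in_hits -- keyword-matching documents that contain the phrase
    """
    a = n_phrase_in_hits          # keyword and phrase
    b = n_phrase - a              # phrase without keyword
    c = n_hits - a                # keyword without phrase
    d = n_docs - n_hits - b       # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                # degenerate table: no evidence either way
    return n_docs * (a * d - b * c) ** 2 / denom


# Phrase concentrated in the keyword's documents: high score.
print(round(chi_square(1000, 100, 50, 40), 2))  # 286.55
# Phrase spread exactly as expected by chance: score 0.
print(chi_square(1000, 100, 50, 5))             # 0.0
```

Note that the statistic is symmetric: a phrase that is unusually rare among the keyword's documents also scores high, so a real system may want to keep only positively associated phrases.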
  28. Results for 「あの花」 ("Anohana")
     • Top 20
     142684.106  ゆき あつ (Yukiatsu)
     135512.226  長井 龍 雪 (Tatsuyuki Nagai)
     121007.208  アニメ 「 あの 日 見 た 花 の 名前 を
     119079.563  「 めん ま 」 (Menma)
     118675.949  　 あの 花 (leading full-width space)
     118675.949  『 あの 花
     118675.949  「 あの 花 」
     118675.949  『 あの 花 』
     118675.949  「 あの 花
     94760.745   「 あの 日 見 た 花 の 名前 を
  29. Results for 「あの花」 ("Anohana")
     86305.143  「 あの 日 見 た 花 の 名前 を 僕達 は まだ 知ら ない
     86305.143  あの 日 見 た 花 の 名前 を 僕達 は まだ 知ら ない 。
     86305.143  あの 日 見 た 花 の 名前 を 僕達 は まだ 知ら ない
     86305.143  『 あの 日 見 た 花 の 名前 を 僕達 は まだ 知ら ない
     55090.753  田中 将 賀 (Masayoshi Tanaka)
     38692.776  岡田 麿 里 (Mari Okada)
     35751.909  長井 (Nagai)
     29098.469  「 心 が 叫び た がっ てる ん だ 。
     29098.469  『 心 が 叫び た がっ てる ん だ 。 』
     29098.469  「 心 が 叫び た がっ てる ん だ 。 」
  30. Results for 「君の名は」 ("Your Name")
     • Top 80
     390586.996  瀧 と 三 葉 (Taki and Mitsuha)
     126664.722  」 「 君 の 名 は 。 」
     123792.563  映画 「 君 の 名 は 。 」
     123792.563  映画 『 君 の 名 は 。 』
     109792.731  新海 (Shinkai)
     106256.894  、 新海
     106256.894  の 新海
     106256.894  。 新海
     104401.937  。 新海 誠 監督
     103768.965  の 新海 誠
     103768.965  た 新海 誠
  31. ʮ܅ͷ໊͸ʯʹର͢Δ݁Ռ 83881.345 ɻ ʮ ܅ ͷ ໊ ͸ ɻ 83881.345

    ʮ ܅ ͷ ໊ ͸ ɻ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ ͷ 83881.345 ɻ ʰ ܅ ͷ ໊ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ ͕ ɺ 83881.345 ʰ ܅ ͷ ໊ ͸ ɻ ʱ 83881.345 ɺ ʮ ܅ ͷ ໊ ͸ ɻ ʯ 79635.412 ͷ ʮ ܅ ͷ ໊ ͸ ɻ 79635.412 ͨ ɻ ʮ ܅ ͷ ໊ ͸ 79635.412 ͨ ʰ ܅ ͷ ໊ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ Λ 83881.345 ɻ ʮ ܅ ͷ ໊ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ 83881.345 ɻ ʰ ܅ ͷ ໊ ͸ ɻ ʱ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ ͕ 83881.345 ܅ ͷ ໊ ͸ 83881.345 ɻ ʰ ܅ ͷ ໊ ͸ ɻ 83881.345 ɺ ܅ ͷ ໊ ͸ 83881.345 ɺ ܅ ͷ ໊ ͸ ɻ 83881.345 ܅ ͷ ໊ ͸ ʁ 83881.345 ʮ ܅ ͷ ໊ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ɻ ʯ ͸ ɺ 83881.345 ɻ ܅ ͷ ໊ ͸ 83881.345 ʮ ܅ ͷ ໊ ͸ ʯ ͱ
  32. Results for 「君の名は」 ("Your Name")
     68454.408  、 三葉
     66923.168  糸 守 (Itomori)
     52875.263  「 君 の 名 は 」
     48620.102  三葉 (Mitsuha)
     40170.705  映画 『 君 の 名 は 。 』 で ヒロイン の
     38947.540  宮 水 (Miyamizu)
     38243.976  三 葉 は
     37993.993  新海 誠 (Makoto Shinkai)
     29207.758  「 君 の 名 は 。 」 を 見
     29207.758  「 君 の 名 は 。 」 を 見 て
     22217.179  新海 誠 監督 最新 作
     22161.230  新海 誠 の
     21938.554  「 前前 前世
     21155.167  新海 誠 作品
     19854.987  新海 監督
     19222.009  宮 水 三 葉 (Mitsuha Miyamizu)
     17993.221  立花 瀧 (Taki Tachibana)
  33. Results for 「君の名は」 ("Your Name")
     17612.465  「 君 の 名 は 。 」 を 観
     13315.859  「 スパークル
     12497.399  新海 誠 監督 最新 作 『 君 の 名 は 。 』
     12043.648  糸 守 町 (Itomori town)
     11079.889  上 白石 (Kamishiraishi)
     9984.243   「 秒速 5 センチメートル 」 ("5 Centimeters per Second")
     6407.954   新海 誠 監督 の アニメ 映画
     5858.545   新海 誠 監督
     5662.690   「 シン ・ ゴジラ 」 ("Shin Godzilla")
     5472.189   シン ・ ゴジラ
     3677.951   瀧 と
     3454.078   入れ替わり (body swap)
  34. Results for 「君の名は」 ("Your Name")
     3407.773  「 秒速
     2634.372  。 「 君 の
     2634.372  君 の
     2634.372  。 君
     2634.372  。 君 の
     2492.065  「 秒速 5 センチメートル
     2036.764  上 白石 萌 音 (Mone Kamishiraishi)
     1511.662  の アニメ 映画
     1511.662  の アニメ 映画 「
     1456.590  「 君 の
     1274.915  2016 - 08 -
     1222.903  億 円 を 突破
     1134.913  の 大 ヒット
     1129.179  前前 前世
  35. Results for 「任天堂」 (Nintendo)
     • Top 50
     956576.307  … 任天堂
     608643.451  は 、 任天堂 は
     473775.714  岩田 聡 (Satoru Iwata)
     464686.595  岩田 聡 社長 が
     458018.217  し 、 任天堂
     458018.217  任天堂 の
     458018.217  。 任天堂 、
     458018.217  まし た 。 任天堂
     458018.217  任天堂 を
     458018.217  任天堂 *
     458018.217  に 任天堂
     458018.217  た 任天堂
     458018.217  。 任天堂 は
     404220.379  Ｗｉｉ
  36. Results for 「任天堂」 (Nintendo)
     386131.784  岩田 社長 の
     385544.541  は 、 任天堂 が
     314103.697  岩田 聡 社長
     309061.454  マリオ 」
     275932.322  岩田 社長 (President Iwata)
     271259.444  amiibo を
     229298.880  岩田 社長 が
     226600.614  です が 、 任天堂
     219915.400  し た 。 岩田 社長
     217359.056  。 　 岩田 社長 は
     217359.056  。 　 岩田 社長
     217359.056  。 岩田 社長
     205515.499  ファミリー コンピュータ (Family Computer)
     169882.335  て いる 。 任天堂
     167779.063  New ニンテンドー 3 DS
  37. Results for 「任天堂」 (Nintendo)
     166283.753  マリオ の
     162672.396  型 ゲーム
     143406.225  Miitomo
     139477.360  マリオ メーカー (Mario Maker)
     133729.277  、 任天堂
     130604.759  株主 総会 を 欠席 (absent from the shareholders' meeting)
     124955.944  任天堂 の 岩田 聡
     124945.709  マリオ カート (Mario Kart)
     119639.084  。 「 マリオ
     112837.829  据え置き 型 ゲーム 機 (home video game console)
     99299.695   ミートモ (Miitomo)
     89169.832   ３ ＤＳ
     83948.098   米国 任天堂 の
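  38. Results for 「任天堂」 (Nintendo)
     78462.915  「 マリオ 」
     67171.541  年末 商戦 (year-end sales season)
     66552.483  New ニンテンドー 3 DS /
     66088.000  Mii
     64830.347  据え置き 型
     63918.676  「 ゼルダ の 伝説 」 ("The Legend of Zelda")
     61309.998  君島 氏 (Mr. Kimishima)
     60804.122  宮本 氏 (Mr. Miyamoto)
     58416.532  大 乱闘 スマッシュブラザーズ (Super Smash Bros.)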
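  39. Discussion
     • Score and phrase quality seem to be loosely correlated
       • There are no good phrases near the bottom of the ranking
       • On the other hand, some good phrases sit in the middle
     • Symbols are a significant source of noise
       • Filtering out phrases that start with a symbol, a particle, or the like looks promising
     • Do measures such as TF-IDF and chi-square remain meaningful when applied to phrases?
       • Better-performing phrase scoring methods may already have been studied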
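The filter suggested on this slide could look like the Python sketch below. It drops a candidate when its first token consists only of punctuation, symbol, or separator characters (judged by Unicode general category) or is one of a few particles. The particle set is a small hand-picked assumption for illustration; with MeCab output one would more likely test the token's part-of-speech tag instead. All names are hypothetical.

```python
import unicodedata

# Illustrative particle list (an assumption); POS tags would be more reliable.
LEADING_PARTICLES = {"は", "が", "を", "に", "の", "で", "と", "も", "へ"}


def is_symbol_token(token):
    """True if every character is punctuation (P*), a symbol (S*), or a separator (Z*)."""
    return all(unicodedata.category(ch)[0] in "PSZ" for ch in token)


def keep_phrase(tokens):
    """Keep a candidate unless it is empty or starts with a symbol or particle."""
    if not tokens:
        return False
    first = tokens[0]
    return not (is_symbol_token(first) or first in LEADING_PARTICLES)


candidates = [
    ["新海", "誠"],
    ["「", "君", "の", "名", "は", "。"],
    ["の", "新海", "誠"],
    ["\u3000", "あの", "花"],  # starts with a full-width space
    ["岩田", "聡", "社長"],
]
for tokens in candidates:
    print(keep_phrase(tokens), " ".join(tokens))
```

Only the first and last candidates survive. Candidates that start with other function words, such as the auxiliary た, would still pass and need a broader rule.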
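  40. Survey of scoring methods
     • Survey paper on existing methods: [Hasan & Ng 2014]
       • The state of the art in keyphrase extraction (as of 2014)
     • There are supervised and unsupervised methods
     • Supervised methods do not necessarily perform better
       • On 3 of 4 datasets the SOTA is unsupervised [Hasan & Ng 2014]
     • Supervised methods come with overhead: preparing training data, managing models, and so on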
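  41. Existing methods (unsupervised)
     1. Graph-based ranking
        • TextRank
     2. Topic-based clustering
        1. KeyCluster
        2. Topical PageRank (TPR)
        3. Community Cluster
     3. Language-model-based
     Approaches 1 and 2 look expensive to train
     1. Graph-based: the graph is built over pairs of words, so the cost is quadratic
     2. Topic-based: running a topic model is heavy
     3. Language-model-based: it only counts words, so the cost is linear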
  42. References
     • [Turney 00] Peter D. Turney. "Learning algorithms for keyphrase extraction", Information Retrieval 2.4 (2000): 303-336
       • https://arxiv.org/pdf/cs/0212020.pdf
     • [Hasan & Ng 14] Kazi Saidul Hasan and Vincent Ng. "Automatic Keyphrase Extraction: A Survey of the State of the Art", Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, pages 1262-1273
       • https://www.aclweb.org/anthology/P/P14/P14-1119.xhtml
       • A systematic review paper on automatic keyphrase extraction
     • [Liu+ 09] Z. Liu, P. Li, Y. Zheng and M. Sun. "Clustering to find exemplar terms for keyphrase extraction", 2009, pp. 257–266
       • When building candidate keyphrases, they filter out stopwords using a stopword dictionary
  43. References
     • [Okanohara & Tsujii 08] Daisuke Okanohara, Jun'ichi Tsujii. "全ての部分文字列を考慮した文書分類" (Document classification using all substrings), NL187, SIG Natural Language Processing (自然言語処理研究会), 2008
       • http://ci.nii.ac.jp/naid/110006980330
     • [Okanohara & Tsujii 09] D. Okanohara and J. Tsujii. "Text Categorization with All Substring Features", In the SIAM International Conference on Data Mining (SDM) 2009
       • http://epubs.siam.org/doi/abs/10.1137/1.9781611972795.72
     • [Abouelhoda+ 04] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. "Replacing suffix trees with enhanced suffix arrays", J. Discrete Algs 2004, 2:53–86.
       • https://pdfs.semanticscholar.org/4ca9/ea95a0a9846965e86619e646d9ca36930c18.pdf
     • [Kasai+ CPM 01] T. Kasai, G. Lee, H. Arimura, S. Arikawa and K. Park "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", CPM 2001