みんなのPython勉強会 #38 talk slides: Growth hacking with tf-idf
sugaya takehiro
September 12, 2018
Technology
Transcript
Better late than never: trying tf-idf in Python. Using tf-idf to grow an app
Today's attendees: engineers, data analysts, business people
About me
• Takehiro Sugara @sugartaker
• Previously: analysis and business development at a research company
• Now: growth hacking for a healthcare app
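A sudden aside: there is one number I have grown a great deal.
102. That is my body weight. (Slide labels: target, appropriate weight)
Today's topics
• About me
• Using tf-idf to grow an app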
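Does this sound familiar?
• Marketing: so what kind of ad banner actually works?
• Sales: so what kind of email newsletter or push notification actually works?
• Writers: so what kind of article actually works?
• Analysts: honestly, we do not have the resources to look into the fine details!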
It happened at FiNC too.
So what kind of company is FiNC?
A health-tech venture specializing in "preventive healthcare × technology" ("About FiNC")
Solutions FiNC can provide: services offered by FiNC
• FiNC app (consumer app, ToC)
• FiNC for Business (service for businesses, ToB)
• FiNC Fit (personal gym)
• FiNC Mall (e-commerce site)
Services offered inside the FiNC app
• Media
• Lifelog
• Chatbot
• Subscription
Media
• Started in January 2018
• Publishes healthcare-related articles
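Challenge: the writers' question, "So what kind of article actually works?"
Challenge
• For each individual piece of content we know the CTR, favorite rate, and time on page
• But which kinds of content do well overall is known only by gut feeling
Solution
• Quantify which words make an article more likely to do well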
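I tried tf-idf
What is tf-idf?
• Short for Term Frequency - Inverse Document Frequency
• It can extract the characteristic words of a text
Why use tf-idf? (It is a classical method, but)
• Easy to compute
• Easy to explain
• Quick to produce results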
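How tf-idf works
• tf: occurrences of the target word in the target document / occurrences of all words in the target document
  → how heavily the word appears in that document
• idf: log(total number of documents / number of documents containing the target word) + 1
  → how rare the word is across all documents
• tf-idf: tf * idf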
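Written out in symbols, the three definitions above read as follows. The notation is an assumption added here for illustration and does not appear on the slides: $n_{t,d}$ is the number of times word $t$ occurs in document $d$, $N$ is the total number of documents, and $\mathrm{df}(t)$ is the number of documents containing $t$. The slide writes only "log"; base 2 is used below because the worked example in the deck computes with log2.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \log_2\frac{N}{\mathrm{df}(t)} + 1, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```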
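STEP: build the raw data → morphological analysis → compute tf-idf
Building the raw data
Document | Content
Document 1 | I read a Python book
Document 2 | I like books
Document 3 | I write Python code while reading a Python book
Morphological analysis
Document | Words
Document 1 | I, Python, book
Document 2 | I, book
Document 3 | I, Python, book, Python, code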
Computing tf-idf
TF is computed for Document 3 (I, Python, book, Python, code); IDF is computed over all three documents.
Word | TF | IDF | TF-IDF (TF * IDF)
I | 1/5 = 0.2 | log2(3/3) + 1 = 1 | 0.20
Python | 2/5 = 0.4 | log2(3/2) + 1 = 1.58 | 0.63
book | 1/5 = 0.2 | log2(3/3) + 1 = 1 | 0.20
code | 1/5 = 0.2 | log2(3/1) + 1 = 2.58 | 0.52
In this document, the word "Python" is the characteristic one!
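The worked example can be reproduced with a few lines of plain Python. This is a minimal sketch, not the speaker's code: the function names `tf`, `idf`, and `tfidf` are made up for illustration, the tokens are English translations of the slide's words, and log base 2 is taken from the numbers shown in the table.

```python
import math
from collections import Counter

# Tokenized documents from the slides, translated to English.
docs = [
    ["I", "Python", "book"],                    # Document 1
    ["I", "book"],                              # Document 2
    ["I", "Python", "book", "Python", "code"],  # Document 3
]


def tf(word, doc):
    """Occurrences of `word` in `doc` divided by the total word count of `doc`."""
    return Counter(doc)[word] / len(doc)


def idf(word, docs):
    """log2(total documents / documents containing `word`) + 1.

    Only call this for words that occur in at least one document.
    """
    containing = sum(1 for doc in docs if word in doc)
    return math.log2(len(docs) / containing) + 1


def tfidf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)


# Score every distinct word of Document 3, in order of first appearance.
target = docs[2]
scores = {word: round(tfidf(word, target, docs), 2) for word in dict.fromkeys(target)}
for word, score in scores.items():
    print(f"{word}: {score:.2f}")
```

Rounded to two decimals, the scores agree with the table: "Python" ranks highest in Document 3 because it is both frequent there and absent from Document 2.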
Challenge (recap)
• For each individual piece of content we know the CTR, favorite rate, and time on page
• But which kinds of content do well overall is known only by gut feeling
This actually happened once
Article | Content | KPI
Article 1 | Exercise matters for dieting | good
Article 2 | Dieting takes moderate exercise and nutrition; restricting carbs is especially effective | good
Article 3 | What if you crave carbs while dieting? | bad
Article 4 | Diet with nutrition in mind: take in nutrients, carbs included, in a balanced way | bad
First impression: "Diet articles are the ones that work, right?"
In reality, there are both good and bad diet articles.
Computing tf-idf for the articles
After morphological analysis:
Article | Words | KPI
Article 1 | diet, exercise | good
Article 2 | diet, exercise, nutrition, carbs | good
Article 3 | carbs, diet | bad
Article 4 | nutrition, diet, nutrition, carbs | bad
Merging the articles by KPI:
Group | Words
Articles 1 and 2 (good KPI) | diet, exercise, diet, exercise, nutrition, carbs
Articles 3 and 4 (bad KPI) | carbs, diet, nutrition, diet, nutrition, carbs
tf-idf per group:
Group | diet | exercise | nutrition | carbs
Articles 1 and 2 (good KPI) | 0.54 | 0.75 | 0.27 | 0.27
Articles 3 and 4 (bad KPI) | 0.56 | 0 | 0.58 | 0.58
Articles about exercise look promising!
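The slides do not say which tf-idf variant or library produced these per-group scores. The sketch below therefore rests on an assumption: it uses raw counts, a smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2 normalization of each row, which is the default weighting documented for scikit-learn's `TfidfVectorizer`, reimplemented here in plain Python. The names `smoothed_idf` and `tfidf_row` are hypothetical. Under that assumption the good-KPI row comes out as on the slide, while the bad-KPI row gives about 0.58 for all three of its words, so the 0.56 shown for "diet" is not reproduced.

```python
import math
from collections import Counter

# Words per KPI group, translated from the slide (articles 1+2 and 3+4 merged).
groups = {
    "good": ["diet", "exercise", "diet", "exercise", "nutrition", "carbs"],
    "bad": ["carbs", "diet", "nutrition", "diet", "nutrition", "carbs"],
}
vocab = ["diet", "exercise", "nutrition", "carbs"]


def smoothed_idf(word, docs):
    """ln((1 + N) / (1 + df)) + 1, with df = number of docs containing `word`."""
    df = sum(1 for doc in docs if word in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1


def tfidf_row(doc, docs, vocab):
    """Raw count * smoothed idf for each vocab word, scaled to unit L2 norm."""
    counts = Counter(doc)
    raw = [counts[w] * smoothed_idf(w, docs) for w in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


docs = list(groups.values())
scores = {name: tfidf_row(doc, docs, vocab) for name, doc in groups.items()}

for name, row in scores.items():
    print(name, [round(x, 2) for x in row])

# Find the word whose score is highest in the good group relative to the bad one.
gap = {w: g - b for w, g, b in zip(vocab, scores["good"], scores["bad"])}
best = max(gap, key=gap.get)
print("largest good-minus-bad gap:", best)
```

Comparing the two rows word by word is what singles out "exercise": it is the only word that scores high in the good group and zero in the bad one.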
Action: create content from the promising-looking words
Result: page views per daily active user (DAU) went up!
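Summary
• tf-idf
  • Easy to compute in Python
  • Extracts the characteristic words of a text
  • Gives a rough picture of which keywords tend to do well and which do not
  • Recommended as a first step toward text summarization and classification
• This case was about articles, but the same approach should work for email newsletters, push notifications, and so on
Thank you for listening!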