Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Pythonistaに憧れた分析屋の奮闘記
Search
kanan
June 09, 2017
Programming
0
1.2k
Pythonistaに憧れた分析屋の奮闘記
kanan
June 09, 2017
Tweet
Share
More Decks by kanan
See All by kanan
PyLadiesCaravan_in_苫小牧
kanan
0
57
Python超入門_データ分析編in青森
kanan
0
110
Pythonデータ分析コトハジメin愛知3rd
kanan
1
120
PyLadiesCaravan in 大阪
kanan
0
250
PyLadiesCaravan in 名古屋Returns
kanan
0
150
PyLadiesCaravan in 愛媛(Python入門データ分析編)
kanan
0
310
Python入門_PyLadiesTokyo2021/08/29
kanan
0
340
コトハジメ的Python入門_WiDS広島
kanan
0
87
予測モデルがポンコツになった日_PyLadiesTokyo10May2020-LT
kanan
0
410
Other Decks in Programming
See All in Programming
CSC509 Lecture 12
javiergs
PRO
0
160
Jakarta EE meets AI
ivargrimstad
0
630
Make Impossible States Impossibleを 意識してReactのPropsを設計しよう
ikumatadokoro
0
240
カンファレンスの「アレ」Webでなんとかしませんか? / Conference “thing” Why don't you do something about it on the Web?
dero1to
1
110
とにかくAWS GameDay!AWSは世界の共通言語! / Anyway, AWS GameDay! AWS is the world's lingua franca!
seike460
PRO
1
900
Quine, Polyglot, 良いコード
qnighy
4
650
Nurturing OpenJDK distribution: Eclipse Temurin Success History and plan
ivargrimstad
0
990
TypeScriptでライブラリとの依存を限定的にする方法
tutinoko
3
700
OSSで起業してもうすぐ10年 / Open Source Conference 2024 Shimane
furukawayasuto
0
110
Jakarta EE meets AI
ivargrimstad
0
160
Why Jakarta EE Matters to Spring - and Vice Versa
ivargrimstad
0
1.2k
ローコードSaaSのUXを向上させるためのTypeScript
taro28
1
630
Featured
See All Featured
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
42
9.2k
Agile that works and the tools we love
rasmusluckow
327
21k
KATA
mclloyd
29
14k
Side Projects
sachag
452
42k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.1k
[RailsConf 2023] Rails as a piece of cake
palkan
52
4.9k
What’s in a name? Adding method to the madness
productmarketing
PRO
22
3.1k
A better future with KSS
kneath
238
17k
The Pragmatic Product Professional
lauravandoore
31
6.3k
The Cost Of JavaScript in 2023
addyosmani
45
6.8k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
665
120k
Ruby is Unlike a Banana
tanoku
97
11k
Transcript
PythonistaʹಌΕͨੳͷฃಆه @kanan* 2017.06.07 ΈΜͳͷPythonษڧձ#25
ࣗݾհ ˎKANANʢ͔ͳΜʣ ɹɹcompass-ID @kanan* ɹɹTwitter-ID @Addition_quince ˎ͓ࣄ ɹɹSIerاۀͰσʔλαΠΤϯεؔ࿈ۀΛͬͯ·͢ ɹɹݴޠSAS ˎPyLadiesʹ2016.ळ͔Βͪΐͪ͜ΐ͜ࢀՃ
ˎझຯԅɻ͓ञ͕େ͖ɻ
ࠓͷ͓ͳ͠ ˎϓϩάϥϛϯάʹڵຯ͕ͳ͔ͬͨੳ͕ ˎεΩϧΞοϓͨͯ͘͠PythonʹखΛग़ͦ͏ͱܾҙ͠ ˎ৺ંΕͳ͕Βฃಆͨ͠10ϲ݄ؒͷ͓Ͱ͢ PythonistaʹಌΕͨੳͷฃಆه Pythonista Lv0 → Lv1.5 ※͖ͬͱੌ͍ਓLv
20͘Β͍
Pythonͱͷग़ձ͍ σʔλੳܥɹˠɹPython Α͋͘Δͭ
͜Ε·Ͱͷϓϩάϥϛϯάྺ 2013 2014 2015 2016 2012 2011 େֶ γεςϜΤϯδχΞ σʔλαΠΤϯςΟετ
ˎੲϓϩάϥϛϯάͯͨͭ͠Γͩͬͨ ˎࣄͰϚωδϝϯτ͕ଟ͘ͳΓɺ৮Βͳ͍ۭനͷظؒ ˎSASݴޠͱ͍͍ͳ͕ΒπʔϧͬΆ͍ ϓϩάϥϛϯάɹʹɹࣄ
͜Ε·Ͱͷϓϩάϥϛϯάྺ ͪΌΜͱϓϩάϥϛϯάͬͯͳ͍
PythonσϏϡʔΛܾҙ PyLevel 0 20169݄ ˎҟ༷ͳ΄ͲͷΔؾ ˎΔੳͷࣝ
ͦͷ̍िؒޙɾɾɾ PyLevel 0 20169݄ ͦͷ̎ ˎΠϯετʔϧํ๏ݕࡧ͢Δͱ ɹɹΓํ͕ຯʹҧ͏ ˎHomebrewʁpyenvʁφχιϨ ˎPython3ΛೖΕͨͷʹ ɹɹόʔδϣϯ͕Python2ͬͯͳΔ
ˎ.bash_plofileͳ͍͠ʂ ˎPython͡ΊΔͨΊʹങͬͨ ɹɹMACͷ͍ํ͕Θ͔Βͳ͍ Πϯετʔϧ Ͱ͖ͳ͍ʂ
ܸɹ
ͦΜͳ࣌ɺPyLadiesTokyoͱग़ձ͏
PyLadies TokyoͰSTEP UP PyLadies Tokyo ळ߹॓ 2016 [2016.10.8-10] PyLadies Tokyo
Meet Up #16 [2016.11.27] PyLadies Tokyo Meet Up #17 [2016.12.11] PyLadies Tokyo Meet Up #20 [2016.03.25] *ڥߏங,AnacondaͱJupyterNotebookͷ͍ํΛֶͿ *ڝٕϓϩάϥϛϯάͰPythonͷॻ͖ํΛֶͿ *ϚΠίϯϘʔυ(STM32 Discovery)ʹMicro PythonͰLνΧ *WebεΫϨΠϐϯάΛͬͯΈΔ
ݸਓతʹσʔλੳपลͰษڧ ˎσʔλੳܥ ɹɹɹ1) titanicੜଘऀ༧ଌ ɹɹɹ2) ΞϝϦΧͷՈͷച٫Ձ֨༧ଌ ɹɹɹ3) ίϯϏχͷച্͛༧ଌ ˎͦͷଞ ɹɹɹ1)
खॻ͖ࣈͷೝࣝ ɹɹɹ2) RaspberryPi3ͰPythonͬͯΈΔ 2017.3 ʙ2017.4
PyLevel 1.0 20174݄ ˎΞϧΰϦζϜָ͍͠ ˎԿ͔Ͱ͖Δͱخ͍͠ ˎExcelSASͰ ɹͬͯͨࣄ͕PythonͰ ˎͬͱ৭ʑͬͯΈ͍ͨ Δؾ෮׆ɻͬͱೖऀϨϕϧʹ
νϟϨϯδ ͕ࣗڵຯ͕͋ΔςʔϚͰ Γ͍ͨͱࢥͬͨͷΛΖ͏
ΟεΩʔͰػցֶशʹઓ ʲͳΜͰΟεΩʔʁʳ ɹˎͱΓ͋͑ͣࢲ͕͖ ɹˎϫΠϯΈ͍ͨʹฑ͕Ռͯ͠ͳ͘ଟ͍Θ͚͡Όͳ͍ ɹˎւ֎ͷϏʔϧΈ͍ͨʹຯ߳Γͷಛ͕͖ͬΓͯ͠Δ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ͖ͳฑ͔ΒͨͿΜ͓ޱʹ߹͏ͷΛਪન
Ϩίϝϯυ
Anaconda Continuum Analyticsࣾఏڙͷ σʔλੳͰΑ͘ར༻͞ΕΔϥΠϒϥϦ܊Λ ҰׅΠϯετʔϧͰ͖ͪΌ͏ύοέʔδ ΟεΩʔͰػցֶशʹઓ Jupyter Notebook ϒϥβͰಈ࡞͢Δରܕ࣮ߦڥ ίʔυهड़ͱ࣮ߦɺίϝϯτૠೖ͕Ͱ͖ɺ
݁Ռͷอଘڞ༗ʹศར AnacondaೖΕΔͱσϑΥϧτͰೖͬͯΔ ɹPythonҎ֎ʹR, node.js, RubyෳݴޠʹରԠ ʲ࣮ߦڥʳ
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̍ʣҰੜݒ໋σʔλΛूΊΔ ̎ʣskimageΛͬͯHOGಛྔΛऔಘ͢Δ ̏ʣֶशσʔλͱςετσʔλʹׂ ̐ʣsklearnΛͬͯCodeBookΛ࡞͠BoVWʹม ̑ʣֶशͱςετ ▪༻ͨ͠ϥΠϒϥϦ ɹɹskimage,
matplotlib, sklearn, glob, os
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̍ʣҰੜݒ໋σʔλΛूΊΔ 899ຕ ࣗͷ͖ͳ11ฑ͚ͩͰ৺ંΕͯఘΊΔ
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̎ʣskimageΛͬͯHOGಛྔΛऔಘ͢Δ HOG(Histograms of Oriented Gradients) ɹہॴྖҬ (ηϧ)
ͷًͷޯํΛώετάϥϜԽ 1.ը૾ΛదͳαΠζʹϦαΠζ͠ɺάϨΠεέʔϧͰಡΈࠐΉ 2.֤pixelͷً͔ΒޯڧͱޯํΛٻΊΔ 3.ηϧྖҬ͝ͱʹώετάϥϜΛٻΊΔʢࠓճ8×8ϐΫηϧʣ 4.ϒϩοΫ͝ͱʹਖ਼نԽ͠ɺಛྔΛநग़͢Δʢࠓճ3×3ηϧʣ
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̎ʣskimageΛͬͯHOGಛྔΛऔಘ͢Δ
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̐ʣsklearnΛͬͯCodeBookΛ࡞͠BoVWʹม BoVW(Bag-of-Visual-Words)ܗࣜ ɹը૾ͷಛྔΛϕΫτϧԽ͠ώετάϥϜʹͨ͠ͷ ɾɾɾɾ ɾɾɾ ɾɾɾ Visual-word
vectors Codebook (දύλʔϯͷϦετʣ ç ç ç ç
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ̑ʣֶशͱςετ ը૾σʔλͱਖ਼ղͷϥϕϧΛֶशͤ͞Δɹ※SVMʢαϙʔτϕΫλϚγϯʣΛ࠾༻ άϦουαʔνͰϋΠύʔύϥϝʔλνϡʔχϯά
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ਖ਼ ɹ72% ςετ݁Ռ
ΟεΩʔͰػցֶशʹઓ Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ ͬͯΈͨ݁Ռ ▪ը૾ͷલॲཧ෦ ɹɹɾࠓճϥϕϧΞοϓͷը૾͔ΓΛ༻ͨ͠ͷͰɺ ɹɹɹҾ͖ͷࣸਅʹରԠͰ͖ͳ͍ʢͬͯΈͨΒյ໓తʣ ɹɹɾഎܠͱ͔ະߟྀͷ·· ɹɹɾOpenCVʁͳʹͦΕ ▪ֶश෦
ɹɹɾಛྔϕʔε(BoVW)ͰͷֶशΛ࠾༻͚ͨ͠Ͳɺ ɹɹɹσΟʔϓϥʔχϯά͏ͱͬͱਫ਼͕͋Δͷ͔ͳ
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά ̎ʣmecabͰ୯ޠʹׂ͠ɺܗ༰ࢺͱ෭ࢺ͚ͩऔΓग़͢ ̏ʣग़ݱͨ͠୯ޠΛ·ͱΊɺTFIDFܭࢉͰಛతͳޠΛಛఆ ̐ʣ୯ޠΛͬͯࣅ͍ͯΔฑΛΫϥελϦϯά
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά beautiful soupΛ༻ ɹɹᶃHTMLऔಘ ɹɹᶄ΄͍͠ใͷ෦ͷλάΛݟ͚ͭΔ ɹɹᶅߏղੳͯ͠ಛఆ෦Λநग़ ɹɹᶆσʔλϑϨʔϜʹม͢Δ
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά beautiful soupΛ༻ ɹɹᶃHTMLऔಘ ɹɹᶄ΄͍͠ใͷ෦ͷλάΛݟ͚ͭΔ ɹɹᶅߏղੳͯ͠ಛఆ෦Λநग़ ɹɹᶆσʔλϑϨʔϜʹม͢Δ
্ख͍͔ͣ͘ ࣌ؒΛ࿘අ
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̍ʣWebαΠτ͔Β֤ฑͷઆ໌ίϝϯτΛεΫϨΠϐϯά ฑͷղઆจ ޱίϛจ ͳΜͱ͔εΫϨΠϐϯάྃ ख࡞ۀͰCSV࡞
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̎ʣmecabͰ୯ޠʹׂ͠ɺܗ༰ࢺͱ෭ࢺ͚ͩऔΓग़͢ ߳Γڧ͍Ͱ͕͢ɺຯ͖ͬ͢Γ͍ͯͯ͠ɺඒຯ͍͠Ͱ͢ɻ ߳Γ//ڧ͍/Ͱ͢/͕/ɺ/ຯ//͖ͬ͢Γ/ͯ͠/͍ͯ/ɺ /ඒຯ͍͠/Ͱ͢/ɻ
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̏ʣग़ݱͨ͠୯ޠΛ·ͱΊɺTFIDFܭࢉͰಛతͳޠΛಛఆ BoW(Bag-of-Words) ܗࣜʹม จॻ ߦྻԽ TFIDF ܭࢉ
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̏ʣग़ݱͨ͠ܗ༰ࢺΛ·ͱΊɺTFIDFܭࢉͰಛతͳޠΛಛఆ TFIDF=TF×IDF TF→Ұ࿈ͷจॻͷͦͷ୯ޠͷग़ݱස IDF→ʢͦͷ୯ޠ͕͋Δจॻͷʗશจॻʣͷٯ ʲͦͷจॻΒ͠͞ʳ ɹͦͷจॻͰଟ͘Ͱͯ͘ΔͷʹɺଞͷจॻͰ͋·ΓͰͯ͜ͳ͍୯ޠ BoW(Bag-of-Words)ܗࣜʹม
จॻߦྻԽ TFIDFܭࢉ BoWܗࣜ=จॻΛͱͯ͠ѻ͏ʢϕΫτϧԽ͢Δʣ ※ࠓճTFʢ୯ޠͷग़ݱසʣΛ༻
WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ΟεΩʔͰػցֶशʹઓ ̐ʣ୯ޠΛͬͯࣅ͍ͯΔฑΛΫϥελϦϯά ֤ฑͷಛతͳ୯ޠTOP5Λͬͯ5ͭʹΫϥελϦϯά …͠Α͏ͱࢥͬͨΒ ˎ൱ఆʮͳ͍ʯͷߟྀ ˎআ֎ͨ͠΄͕͍͍୯ޠ ɹɹɹɹɹɹɹɹɹ ߟྀෆͰɺ
ͳ͍ΫϥελϦϯάʹ
͖ͳฑ͔ΒͨͿΜ͓ޱʹ߹͏ͷΛਪન Ϩίϝϯυ ΟεΩʔͰػցֶशʹઓ ຊ·Ͱʹ౸ୡͰ͖ͣ…ɻ ڧௐϑΟϧλϦϯάͱ͔ͰΓ͔͚ͨͬͨͲɺ ·ͣσʔλΛूΊΔͱ͍͏࠷େͷน͕ͬͯΔɻ
ΟεΩʔͰػցֶशʹઓͷཱྀଓ͘… Ϙτϧͷϥϕϧ͔ΒฑΛผ ը૾ೝࣝ WEBͷใ͔Β֤ฑͷಛΛநग़ ςΩετղੳ ͖ͳฑ͔ΒͨͿΜ͓ޱʹ߹͏ͷΛਪન Ϩίϝϯυ ˎਫ਼্ʂ90%ਖ਼ղ͍ͨ͠ɻOpenCVͬͯΈ͍ͨ ˎσʔλ૿ྔͱ൱ఆޠআ֎બఆɺϊΠζͷѻ͍ͷߟྀ ˎ͜Ε͔ΒؤுΔʂϢʔβෳධՁͰڧௐϑΟϧλϦϯά
WebΞϓϦέʔγϣϯ
·ͱΊ ˎPytonistaʹಌΕͨҰհͷੳͨͿΜPyLv1.5͘Β͍ʹͳΕͨ ˎ࠷ॳͷӈࠨΘ͔Βͳ͍࣌ɺؒͱҰॹ͕͍͍ ˎࣗͷڵຯͱֻ͚߹ΘͤΔͷͬͯؤுΓ͕࣋ଓ͢Δ ˎσʔλूΊΔͷେมͬͯᷚຊͩͬͨ ˎ·͡ͰwebΞϓϦέʔγϣϯΘ͔Γ·ͤΜ ɹಘҙͳਓɺͪΐͬͱͬͯΔਓɺ͓༑ୡʹͳ͍ͬͯͩ͘͞
͋Γ͕ͱ͏͍͟͝·ͨ͠ɻ