Introduction to Data Science for PHP Users
Sotaro Karasawa
September 14, 2013
PHP Conference 2013: "Introduction to Data Science for PHPers" #phpcon2013
Transcript
Introduction to Data Science for PHPers. #phpcon2013, PHP Conference 2013. Crocos, Inc. Sotaro Karasawa @sotarok http://facebook.com/sotarok
Self-introduction: Sotaro Karasawa (@sotarok), d.hatena.ne.jp/sotarok, Crocos Inc. PHP, Git, TD, Red Bull.
Perfect PHP (Gijutsu-Hyohron). Of course everyone owns a copy, right!?
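Data science
For details, see the Data Scientist Training Reader (データサイエンティスト養成読本, Gijutsu-Hyohron). http://www.amazon.co.jp/dp/…
The data science process: business understanding, data understanding, data extraction, data processing, modeling, effect verification, service implementation. (Source: Data Scientist Training Reader, chapter on the data science process.)
Data science means repeating a cycle: analyze and model accumulated data to obtain the metrics that matter for running the business. There is a lot to do, and the required knowledge covers a wide range of fields.
So start from the bare minimum, from whatever is easy to begin with, and take the first step.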
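For PHPers and web applications, what is "data"? Databases and logs.
This talk is about logs.
How do we collect large volumes of application logs, and how do we aggregate them?
With that in mind, today's agenda: how log collection and analysis work, collecting logs from PHP applications, and analysis.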
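How log collection and analysis work
Log collection and analysis is a never-ending source of worries. With large volumes of data: how to collect it, where to store it, how to retrieve it, how to aggregate it. The concerns: network bandwidth, disk capacity, big data processing systems, processing time.
http://www.treasure-data.com/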
[Diagram: Web Server, Web Server, fluentd; TD (S3, Hadoop); Client, Hive; Result, MySQL etc.]
Data accumulates on TD's side. When you submit a query, Hadoop starts up over there and returns the result.
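When doing log analysis, the tedious parts (collecting, storing, and processing data) are handled by TD. You can commit to the essential work: designing and implementing what data to collect and how to aggregate it.
How Crocos uses logs. Application logs: analysis based on Facebook attribute information; execution counts and execution times of key actions; transaction counts, by attribute and by route. Event logs: number of social shares; Modal open/close, etc. And various others.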
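Collecting logs from PHP applications
What kind of application logs? Basic log design.
What kind of logs are you collecting?
Web server logs
Speaking of logs, most people think of web server logs. Even the Treasure Data tutorial uses Apache logs. http://docs.treasure-data.com/articles/quickstart
But what we really want to know is:
What kind of user? In what environment, and from where? When did they do what? Which button did they click or tap?
Application logs
What kind of user? → user registration data. In what environment, and from where? → UA, GEO. When did they do what? → URI, action.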
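How to collect application logs
But first, a quick word about schemaless logs
What is a schemaless log? A log with no schema.
Log schemas until now: for example, TSV
A TSV schema is positional: one column for time, one for status, one for uri, one for user_id, and so on (hoge).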
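    foreach (file('app.log') as $line) {
        // Split one tab-separated line into positional columns
        $column = explode("\t", trim($line));
        $time = $column[0];
        $status = $column[1];
        // ...
    }
* In practice you would not bother doing this in PHP; you would use sed/awk.
Problems: individual fields are hard to interpret, schema changes are difficult, and mismatched understanding between the people analyzing and the people collecting causes accidents.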
TD logs (or rather, fluentd logs) are JSON:
    {
        "time": 1373876885,
        "status": 200,
        "uri": "/52495/facebook",
        "session_id": "kn6avn2fuh21r25a65mgm3rjh3",
        "fb_id": "7c40c5dd2e55cde37a8c40ed80e1",
        ...
    }
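The difference between a positional TSV line and a named-field JSON line can be sketched in plain PHP. This is only an illustration: the sample lines and the extra "device" field are made-up values, not taken from real logs.

```php
<?php
// Hypothetical sample lines; the values are invented for illustration.
$tsvLine  = "1373876885\t200\t/52495/facebook";
$jsonLine = '{"time":1373876885,"status":200,"uri":"/52495/facebook","device":"mobile"}';

// TSV: the meaning of a value depends on its column position,
// so inserting or reordering a column breaks every reader.
$column = explode("\t", trim($tsvLine));
$tsvStatus = (int) $column[1];

// JSON: fields are looked up by name, and readers that do not know
// about an extra field (here "device") are unaffected by it.
$record = json_decode($jsonLine, true);
$jsonStatus = $record['status'];
$device = isset($record['device']) ? $record['device'] : null;

printf("tsv status=%d, json status=%d, device=%s\n", $tsvStatus, $jsonStatus, $device);
```

Because each JSON record names its own fields, a new field can be added for some records without coordinating a column layout with whoever reads the log.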
POSTing logs
fluent-logger-php (https://github.com/fluent/fluent-logger-php):
    use Fluent\Logger\FluentLogger;

    // Connect to a local fluentd and post one record under the tag "debug.test"
    $logger = new FluentLogger("localhost", "24224");
    $logger->post("debug.test", array("hello" => "world"));
Basic log design
Record one record per access
Hook into the response. Most frameworks have a hook point for the response event, right? In Symfony it is onKernelResponse.
    # Service definition: register the listener for the kernel.response event
    tags:
        - { name: kernel.event_listener, event: kernel.response }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $event->getRequest();
        $response = $event->getResponse();
        // Build an array describing this access
        $data = $this->onAccess($request, $response);
        // log data
        $this->logger->post("access", $data);
    }
* In practice this is set up so that multiple Listeners/Loggers can be registered.
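Decide on a basic schema
Even if logs are schemaless, you need to know what kind of logs you are handling. If the meaning differs from record to record, the logs are meaningless.
Decide on a basic schema: time, status, uri, ua, referrer. Matching LTSV-style names may make it easier to understand.
Go beyond what web server logs contain: app, route, controller, process_time, device. Include things like the framework's routing name and controller name (even if the uri is noisy, you can aggregate by routing name).
Denormalize the attributes the application can know and include them in the record
Denormalized record: session_id, user_id, gender, age, device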
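As a rough sketch of what "one flat record per access" might look like in code, the function below merges the basic schema, application-level fields, and denormalized user attributes into a single array. The function name buildAccessRecord, the plain-array inputs standing in for framework request and user objects, and all sample values are assumptions made for illustration.

```php
<?php
/**
 * Build one flat record per access (hypothetical helper).
 * $request and $user are plain arrays standing in for framework objects.
 */
function buildAccessRecord(array $request, array $user, float $startedAt, float $finishedAt): array
{
    return [
        // Basic schema
        'time'         => (int) $startedAt,
        'status'       => $request['status'],
        'uri'          => $request['uri'],
        'ua'           => $request['ua'],
        'referrer'     => $request['referrer'],
        // Application-level fields
        'route'        => $request['route'],
        'process_time' => round($finishedAt - $startedAt, 3),
        // Denormalized user attributes, copied into every record
        'user_id'      => $user['id'],
        'gender'       => $user['gender'],
        'age'          => $user['age'],
    ];
}

$record = buildAccessRecord(
    ['status' => 200, 'uri' => '/52495/facebook', 'ua' => 'ExampleBrowser/1.0', 'referrer' => '', 'route' => 'crocos_index'],
    ['id' => 'u123', 'gender' => 'female', 'age' => 29],
    1373876885.10,
    1373876885.35
);
echo json_encode($record), "\n";
```

A record built this way could be handed to a logger's post() call as-is, since it is already a flat key/value array.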
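Why denormalize? You can apply aggregate functions without a JOIN. Hadoop can do JOINs too, but this way there are fewer steps, so it is fast and simple.
By the way, it is a good idea to hash values such as user_id and session_id. * This protects privacy if something goes wrong.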
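One possible way to hash identifiers before logging is a keyed hash (HMAC-SHA256) via PHP's built-in hash_hmac. The talk only says "hash", so the choice of HMAC, the helper name anonymizeId, and the placeholder key are all assumptions here.

```php
<?php
// Keyed hash so that raw identifiers never reach the log store.
// The key below is a placeholder (assumption); a real key should live
// outside source control.
function anonymizeId(string $id, string $secretKey): string
{
    return hash_hmac('sha256', $id, $secretKey);
}

$key = 'replace-with-a-secret-key';
$hashedUser = anonymizeId('12345', $key);

// The same input always maps to the same hash, so GROUP BY and
// WHERE on the hashed value still work.
echo $hashedUser, "\n";
```

Using a key rather than a bare hash makes it harder to recover sequential numeric IDs by simply hashing every candidate value.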
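To summarize: record one record per access; decide on a basic schema; denormalize the attributes the application can know and include them in the record.
Once you have come this far, you can already do analysis.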
Analysis example:
    SELECT AVG(v['process_time']) FROM access WHERE v['route'] = 'crocos_index'
Analysis example:
    SELECT v['gender'], COUNT(*) FROM access GROUP BY v['gender']
Good thing we denormalized!
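To see why denormalized records make these aggregations simple, here is a small in-memory PHP equivalent of the two queries. It is only a sketch over made-up records; the real aggregation runs as Hive queries on TD, not in PHP.

```php
<?php
// Hypothetical denormalized access records (values invented for illustration).
$records = [
    ['route' => 'crocos_index', 'process_time' => 0.10, 'gender' => 'female'],
    ['route' => 'crocos_index', 'process_time' => 0.30, 'gender' => 'male'],
    ['route' => 'crocos_show',  'process_time' => 0.50, 'gender' => 'female'],
];

// Equivalent of: SELECT AVG(process_time) ... WHERE route = 'crocos_index'
$times = [];
foreach ($records as $r) {
    if ($r['route'] === 'crocos_index') {
        $times[] = $r['process_time'];
    }
}
$avg = array_sum($times) / count($times);

// Equivalent of: SELECT gender, COUNT(*) ... GROUP BY gender
// No lookup into a separate user table is needed, because gender
// is already part of every record.
$byGender = array_count_values(array_column($records, 'gender'));

printf("avg=%.2f\n", $avg);
print_r($byGender);
```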
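Analysis example, for investigating an error:
    SELECT v['route'], v['status'], v['ua'] FROM access WHERE v['user_id'] = 'xxx'
* Date handling is omitted to keep the queries short. In practice you would GROUP BY day or narrow the range in the WHERE clause.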
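Using schemaless logs: transactions
So, logs with a basic schema have started to accumulate.
Now we want to record things that carry special meaning, such as the success of an action.
Transactions. uri and route tell you that a request arrived. But whether it really succeeded is something only the application knows.
This is where schemaless comes in.
Basic schema + additional schema: time, status, uri, ua, referrer + whatever else you need.
You can give special meaning to specific records, without affecting any other records.
Transactions: key_action, key_attr_*
Transactions: key_action, e.g. shop:buy:completed (app:action:status). * This example means "purchase completed".
Transactions: key_attr_* holds any additional information related to the transaction. Its schema differs for each key_action.
Transaction example: key_action = shop:buy:completed, key_attr_item_id = xxxxx, key_attr_ref = fb_share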
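Attaching transaction fields to an ordinary access record could look roughly like this. The helper name withTransaction and the sample uri and route values are hypothetical; only the key_action and key_attr_* naming comes from the slides.

```php
<?php
/**
 * Return a copy of $record with transaction fields attached (hypothetical helper).
 * Only records that represent a meaningful event get these extra keys.
 */
function withTransaction(array $record, string $action, array $attrs): array
{
    $record['key_action'] = $action;
    foreach ($attrs as $name => $value) {
        // Each attribute becomes its own key_attr_* field
        $record['key_attr_' . $name] = $value;
    }
    return $record;
}

// Sample base record; uri and route are invented for illustration.
$base = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy', 'route' => 'shop_buy'];
$purchase = withTransaction($base, 'shop:buy:completed', ['item_id' => 'xxxxx', 'ref' => 'fb_share']);

echo json_encode($purchase), "\n";
```

Records without a transaction keep the basic schema untouched, which is the point of staying schemaless: the extra keys exist only where they mean something.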
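Transaction analysis example:
    SELECT item_id, ref, COUNT(*) FROM access WHERE key_action = 'shop:buy:completed' GROUP BY item_id, ref
* v[''] is omitted to save space.
Transaction analysis use case: record the access source for each campaign, then find the most effective campaign from the number of successful transactions.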
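The "most effective campaign" idea can be sketched in PHP as a count of successful transactions per access source. The records and the ref value "mail" are invented for illustration; in the talk's setup this would be a Hive query rather than PHP.

```php
<?php
// Hypothetical records; only some of them carry transaction fields.
$records = [
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_ref' => 'mail'],
    ['uri' => '/shop/buy', 'key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['uri' => '/shop'], // plain access, no transaction
];

// Count completed purchases per access source (ref)
$successByRef = [];
foreach ($records as $r) {
    if (isset($r['key_action']) && $r['key_action'] === 'shop:buy:completed') {
        $ref = $r['key_attr_ref'];
        $successByRef[$ref] = isset($successByRef[$ref]) ? $successByRef[$ref] + 1 : 1;
    }
}

// Sort by count, highest first, keeping the ref names as keys
arsort($successByRef);
$best = array_keys($successByRef)[0];

printf("best ref: %s (%d purchases)\n", $best, $successByRef[$best]);
```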
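NEXT STEP
From the aggregated results, move on to statistical analysis methods and modeling: compute business-critical metrics and establish an improvement process.
Summary
Collecting and analyzing logs is hard → use Fluentd and Hadoop → use Treasure Data
What logs should you collect? → denormalized logs, one record per access → design the log format itself → make use of schemaless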
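Finally: We are hiring! Crocos is the only place where you can work with authors of Perfect PHP, a former PHP Conference executive committee chair, formerly unpopular guys (非モテ), and a former "dora-musume" (ドラ娘).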