PHP Conference 2013: "An Introduction to Data Science for PHPers" #phpcon2013
Sotaro Karasawa (@sotarok), Crocos, Inc. http://facebook.com/sotarok
About me: Sotaro Karasawa (@sotarok), d.hatena.ne.jp/sotarok. Crocos, Inc. PHP / Git / TD / Red Bull.
"Perfect PHP" (パーフェクトPHP, Gijutsu-Hyoron). Naturally, everyone here owns a copy, right!?
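Data science
For the details, see "Data Scientist Training Reader" (データサイエンティスト養成読本, Gijutsu-Hyoron): http://www.amazon.co.jp/dp/...
Data science: understand the business → understand the data → extract the data → process the data → modeling → verify the effect → implement in the service. (Quoted from データサイエンティスト養成読本, chapter "The Data Science Process".)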
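Data science: analyze and model the accumulated data to obtain the metrics that matter for running the business, and keep repeating that cycle.
There is a lot you have to do, and the required knowledge spans a wide range of fields.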
Start from the bare minimum, from whatever is easy to begin with, and take the first step.
What is "data" for PHPers and web applications? Databases and logs.
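Today's talk is about logs.
How do you collect a large volume of application logs, and how do you aggregate them?
With that in mind, today's agenda: the pains of log collection and analysis / log collection for PHP applications / analysis.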
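The pains of log collection and analysis
The never-ending pains of log collection and analysis: large volumes of data. How to collect it (network bandwidth), where to store it (disk capacity), how to retrieve it (a big data processing system), how to aggregate it (processing time).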
http://www.treasure-data.com
[Diagram labels: TD, WebServer ×2, fluentd, S3, Hadoop, Client, Hive, MySQL etc..., Result] The data accumulates on their side; when you submit a query, Hadoop starts up over there and returns the result.
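The troublesome parts of log analysis (collecting, storing, and processing the data) → TD does them for you. That lets you commit to the essential work: designing and implementing what data to collect and how to aggregate it!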
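How logs are used at Crocos
• Application logs
• Analysis based on Facebook attribute information
• Execution counts and execution times of key actions
• Transactions, by attribute, by route
• Event logs
• Social share counts
• Modal open/close, etc.
• Various others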
Log collection for PHP applications
What kind of application logs? / Basic log design
What kind of logs are you collecting?
Web server logs
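When people say "logs", they usually mean web server logs. Even the Treasure Data tutorial uses Apache logs: http://docs.treasure-data.com/articles/quickstart
But what we really want to know is...
What kind of user? On what device? From where? When did they do what? Which button did they click? Or tap?
Application logs
What kind of user? → user registration data. On what device, from where? → UA, GEO. When did they do what? → URI, action.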
How to collect application logs
Before that, a quick word about schemaless logs
What is a schemaless log? A log with no schema.
Log schemas until now → for example, TSV
Schema: column 1 = time, column 2 = status, column 3 = uri, column 4 = …, hoge
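Reading such a log means addressing every field by position:
    foreach (file('app.log') as $line) {
        // Strip only the line ending so that empty trailing columns survive
        $column = explode("\t", rtrim($line, "\r\n"));
        $time   = $column[0];
        $status = $column[1];
        // ...
    }
* In practice nobody can be bothered to do this in PHP, so it ends up being done with sed/awk.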
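Fields are hard to interpret / changing the schema is difficult / accidents happen because analysts and collectors understand the data differently.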
TD logs (or rather, fluentd logs) are JSON:
    {"time":1373876885,"status":200,"uri":"/52495/facebook","session_id":"kn6avn2fuh21r25a65mgm3rjh3","fb_id":"7c40c5dd2e55cde37a8c40ed80e1",...}
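To contrast with positional TSV parsing, here is a minimal sketch of reading JSON log lines by field name. The sample lines and the helper name `parseLogLine` are assumptions made for illustration; only the field names `time`, `status`, `uri`, and `route` come from the slides.

```php
<?php
// Hypothetical sample lines; the second one carries an extra "route" field.
$lines = [
    '{"time":1373876885,"status":200,"uri":"/52495/facebook"}',
    '{"time":1373876890,"status":500,"uri":"/52495/facebook","route":"crocos_index"}',
];

/**
 * Decode one JSON log line into an associative array.
 * Returns null when the line does not decode to an array.
 */
function parseLogLine(string $line): ?array
{
    $record = json_decode(trim($line), true);
    return is_array($record) ? $record : null;
}

// Drop lines that failed to decode and reindex the result.
$records = array_values(array_filter(array_map('parseLogLine', $lines)));

foreach ($records as $record) {
    // Fields are looked up by name, so an optional field is simply absent.
    $route = $record['route'] ?? '(none)';
    echo $record['time'], "\t", $record['status'], "\t", $route, "\n";
}
```

Because nothing depends on column positions, adding a field to newer records does not break code that reads older ones.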
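POSTing logs
fluent-logger-php:
    // Assumes the library was installed with Composer
    require_once __DIR__ . '/vendor/autoload.php';

    use Fluent\Logger\FluentLogger;

    // Connect to fluentd on localhost, port 24224
    $logger = new FluentLogger("localhost", "24224");
    // Post one record under the tag "debug.test"
    $logger->post("debug.test", array("hello" => "world"));
https://github.com/fluent/fluent-logger-php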
Basic log design
Record logs so that one access produces one record.
Hook into the response. Most frameworks have a hook point for the response event, right? In Symfony, that is onKernelResponse.
Service definition (YAML) tagging the listener:
    tags:
        - { name: kernel.event_listener, event: kernel.response }
Listener method:
    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request  = $event->getRequest();
        $response = $event->getResponse();

        // Build an array describing this access
        $data = $this->onAccess($request, $response);

        // log data
        $this->logger->post("access", $data);
    }
* In practice it is set up so that multiple Listeners / Loggers can be registered.
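Decide on a basic schema
Even with schemaless logs, the logs are meaningless if each record means something different.
Decide on a basic schema: time, status, uri, ua, referrer. Matching LTSV-style field names may make the logs easier to understand.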
Go beyond what the web server log contains: app, route, controller, …, device. For example, the framework's routing name or controller name (even when the URI is noisy, you can aggregate by routing name).
Denormalize the attributes the application can know and include them in the record.
Denormalized record: …, …, gender, age, device
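Why denormalize? The benefit is that you can run aggregate functions without a JOIN. Hadoop can do JOINs too, but denormalizing removes steps, so it is faster and simpler.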
By the way, it is a good idea to hash identifier fields before logging them. * This protects privacy should the worst happen.
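One way to hash identifiers before they reach the log is sketched below. The talk only says "hash"; the choice of keyed HMAC-SHA256, the helper name `hashIdentifiers`, and the secret value are assumptions of this sketch. The field names `user_id` and `fb_id` are borrowed from other slides.

```php
<?php
/**
 * Replace identifier fields with keyed SHA-256 hashes before logging.
 * $fields and $secret are illustrative assumptions, not from the talk.
 */
function hashIdentifiers(array $record, array $fields, string $secret): array
{
    foreach ($fields as $field) {
        if (isset($record[$field])) {
            // The same input and secret always give the same hash,
            // so grouping and counting by this field still work.
            $record[$field] = hash_hmac('sha256', (string) $record[$field], $secret);
        }
    }
    return $record;
}

// Hypothetical record; 'change-me' stands in for a real secret.
$record = ['user_id' => 12345, 'gender' => 'male', 'route' => 'crocos_index'];
$safe = hashIdentifiers($record, ['user_id', 'fb_id'], 'change-me');
```

A keyed hash is used here because plain hashes of small numeric IDs are easy to reverse by enumeration; whether that matters depends on the identifiers in question.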
To summarize: record one access as one record / decide on a basic schema / denormalize the attributes the application can know and include them in the record.
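The three points above could be combined roughly as follows. This is only a sketch: `buildAccessRecord` and the shapes of `$request` and `$user` are hypothetical stand-ins for whatever the framework and user store provide, and the sample values are made up.

```php
<?php
/**
 * Build one flat record for one access.
 * $request and $user are hypothetical plain arrays used for illustration.
 */
function buildAccessRecord(array $request, int $status, array $user, float $startedAt): array
{
    return [
        // Basic schema
        'time'     => time(),
        'status'   => $status,
        'uri'      => $request['uri'] ?? '',
        'ua'       => $request['ua'] ?? '',
        'referrer' => $request['referrer'] ?? '',
        // Application-side fields
        'route'        => $request['route'] ?? '',
        'process_time' => microtime(true) - $startedAt,
        // Denormalized user attributes, copied into every record
        'gender' => $user['gender'] ?? null,
        'age'    => $user['age'] ?? null,
        'device' => $request['device'] ?? null,
    ];
}

$startedAt = microtime(true);
$record = buildAccessRecord(
    ['uri' => '/52495/facebook', 'ua' => 'ExampleBrowser/1.0', 'route' => 'crocos_index', 'device' => 'pc'],
    200,
    ['gender' => 'female', 'age' => 29],
    $startedAt
);
```

The resulting array is what would be handed to the logger's `post()` call in the response listener.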
Once you have come this far, analysis is already possible.
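Analysis example:
    SELECT AVG(v['process_time'])
    FROM access
    WHERE v['route'] = 'crocos_index'
Analysis example:
    SELECT v['gender'], COUNT(*)
    FROM access
    GROUP BY v['gender']
Glad we denormalized!
Analysis example, for investigating errors:
    SELECT v['route'], v['status'], v['ua']
    FROM access
    WHERE v['user_id'] = 'xxx'
* Time-related processing is omitted to keep the queries short. In reality you would GROUP BY day or narrow the range in the WHERE clause.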
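An example use of schemaless logs: transactions
So, logs with a basic schema have started to pile up.
Now we want to record things such as the success of actions that carry special meaning.
Transactions. uri and route tell you that a request arrived. But whether it actually succeeded is something only the application knows.
This is where schemaless comes in.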
Basic schema: time, status, uri, ua, referrer. Additional schema: whatever else you need. You can give special meaning to specific records, and without affecting any other records.
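Transactions: key_action, key_attr_*
key_action: shop:buy:completed, in the form app:action:state. * This example means "purchase completed".
key_attr_*: additional information about the transaction goes here. The schema differs for each key_action.
Transaction example: key_action = shop:buy:completed / key_attr_item_id = xxxxx / key_attr_ref = fb_share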
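As a sketch of how such a record might be produced, the helper below adds the transaction fields to an ordinary access record. The helper name `withTransaction` and the sample `uri` are hypothetical; the `key_action` and `key_attr_` naming follows the slides.

```php
<?php
/**
 * Add transaction fields to a base access record.
 * Attribute names get the key_attr_ prefix used on the slides.
 */
function withTransaction(array $record, string $keyAction, array $attrs = []): array
{
    $record['key_action'] = $keyAction;
    foreach ($attrs as $name => $value) {
        $record['key_attr_' . $name] = $value;
    }
    return $record;
}

// Hypothetical base record for one access.
$base = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy'];

// Only this record gains the extra fields; other records are untouched.
$tx = withTransaction($base, 'shop:buy:completed', [
    'item_id' => 'xxxxx',
    'ref'     => 'fb_share',
]);
```

Records without a transaction simply never call the helper, which is what keeps the extra schema from leaking into the rest of the log.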
Transaction analysis example:
    SELECT item_id, ref, COUNT(*)
    FROM access
    WHERE key_action = 'shop:buy:completed'
    GROUP BY item_id, ref
* v[''] is omitted to save space.
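Transaction analysis in practice: record the access source for each campaign, then find the most effective campaign from the number of successful transactions.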
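To make the idea concrete on a small scale, the sketch below counts successful transactions per access source in plain PHP. It mirrors a GROUP BY over a handful of in-memory records; at real volumes this would be a Hive query. The function name `countByRef`, the `mail` source, and the sample data are assumptions for illustration.

```php
<?php
/**
 * Count records with the given key_action, grouped by key_attr_ref.
 * Records without a key_attr_ref are counted under '(unknown)'.
 */
function countByRef(array $records, string $keyAction): array
{
    $counts = [];
    foreach ($records as $record) {
        if (($record['key_action'] ?? null) !== $keyAction) {
            continue; // not the transaction we are counting
        }
        $ref = $record['key_attr_ref'] ?? '(unknown)';
        $counts[$ref] = ($counts[$ref] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

// Hypothetical sample: three purchases and one ordinary access.
$sample = [
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'mail'],
    ['uri' => '/52495/facebook'],
];
$counts = countByRef($sample, 'shop:buy:completed');
```

The first key of `$counts` is then the source that produced the most successful transactions in the sample.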
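NEXT STEP
From the aggregated results: statistical analysis methods and modeling. Compute the metrics that are critical to the business and establish an improvement process.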
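Summary
Collecting and analyzing logs is hard → use Fluentd and Hadoop → use Treasure Data. Which logs should you collect? → one access = one record, denormalized logs → design the log format itself → make use of schemaless logs.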
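Finally: We are hiring! Crocos is the only place where you can work alongside the authors of Perfect PHP, a former PHP Conference organizing committee chair, a formerly unpopular guy (元非モテ), and a former "Dora-musume" (元ドラ娘).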