
Introduction to Data Science for PHP Users

PHP Conference 2013: "Introduction to Data Science for PHPers" (PHPerのためのデータサイエンス入門) #phpcon2013

Sotaro Karasawa

September 14, 2013

Transcript

 1. Crocos, Inc.
  Sotaro Karasawa
  @sotarok
  http://facebook.com/sotarok
  Introduction to Data Science for PHPers
  #phpcon2013
  PHP Conference 2013

 2. About me
  Sotaro Karasawa (@sotarok)
  柄沢聡太郎
  d.hatena.ne.jp/sotarok
  Crocos Inc. (株式会社クロコス)
  PHP, Git, TD
  Red Bull

 3. Perfect PHP (パーフェクトPHP)
  Gijutsu-Hyohron (技術評論社)
  Of course you all own a copy, right!? ←

 4. Data science

 5. For the details, see
  データサイエンティスト養成読本 (a data scientist training reader)
  Gijutsu-Hyohron (技術評論社)
  http://www.amazon.co.jp/dp/…

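 6. Data science
  Business understanding
  Data understanding
  Data extraction
  Data processing
  Modeling
  Verification of results
  Service implementation
  Source: データサイエンティスト養成読本, chapter "The Data Science Process"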

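 7. Data science
  Analyze and model the accumulated data
  to obtain the metrics that matter
  for running the business, then repeat

 8. Data science
  Analyze and model the accumulated data
  to obtain the metrics that matter
  for running the business, then repeat

  There is a lot that has to be done
  The required knowledge is broad

 9. Start from the bare minimum,
  from whatever is easy to begin with,
  and take the first step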

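 10. Data science
  Business understanding
  Data understanding
  Data extraction
  Data processing
  Modeling
  Verification of results
  Service implementation
  Source: データサイエンティスト養成読本, chapter "The Data Science Process"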

 11. PHPers
  For a web application,
  what is data?

 12. PHPers
  For a web application,
  what is data?
  Databases
  Logs

 13. This talk is about logs

 14. How do we collect
  large volumes of application logs,
  and how do we aggregate them?

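 15. With that in mind
  Today's agenda
  The pains of log collection and analysis
  Collecting logs from PHP applications
  Analysis

 16. The pains of log collection and analysis

 17. The never-ending pains of
  log collection and analysis
  Large volumes of data
  How to collect it
  Where to store it
  How to retrieve it
  How to aggregate it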

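 18. The never-ending pains of
  log collection and analysis
  Large volumes of data
  How to collect it
  Where to store it
  How to retrieve it
  How to aggregate it
  Network bandwidth
  Disk capacity
  Big data processing systems
  Processing time

 19. http://www.treasure-data.com/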

 20. [Diagram: Treasure Data (TD) architecture. Labels: Web Server (x2), fluentd, TD, S3, Hadoop, Client, Hive, MySQL etc., Result]

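 21. [Diagram: Treasure Data (TD) architecture. Labels: Web Server (x2), fluentd, TD, S3, Hadoop, Client, Hive, MySQL etc., Result]
  The data accumulates on their side; when you
  submit a query, Hadoop starts up over there
  and returns the result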

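 22. When you set out to do log analysis
  The troublesome parts: data collection, storage, and processing
    → TD does them for you
  You can commit to designing and implementing the essential work:
  ・what data to collect
  ・how to aggregate it

 23. How Crocos uses logs
  ・Application logs
  ・Analysis based on Facebook attribute data
  ・Execution counts and execution times of the main actions
  ・Transaction counts, by attribute and by path
  ・Event logs
  ・Shares to social networks
  ・Opening and closing of Modals, etc.
  ・Various other things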

 24. Collecting logs from
  PHP applications

 25. What kind of application logs
  Basic log design

 26. What logs are you collecting?

 27. Web server logs

 28. When people say "logs"
  they mean web server logs
  Even the Treasure Data tutorial
  uses Apache logs
  http://docs.treasure-data.com/articles/quickstart

 29. But what we really want is

 30. Which users?
  On what device? From where?
  When did they do what?
  Which button did they click?
  Or tap?

 31. Application logs

 32. Which users?
    → User registration data
  On what device? From where?
    → UA, GEO
  When did they do what?
    → URI, action

 33. How to collect
  application logs

 34. Before that,
  a quick word on schemaless logs

 35. What is a schemaless log?
  A log with no schema

 36. Log schemas
  Until now
  → for example, TSV

 37. Column 1: time
  Column 2: status
  Column 3: uri
  Column 4: …

  hoge
  Schema

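 38. foreach (file('app.log') as $line) {
      // Split the TSV line; each field is identified only by its position
      $column = explode("\t", trim($line));
      $time = $column[0];
      $status = $column[1];
      ...
  }
  ※ In practice you would not bother doing this in PHP; you would use sed or awk

 39. Fields are hard to understand
  Schema changes are difficult
  Accidents caused by analysts and collectors
  understanding the data differently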

 40. TD logs (or rather, fluentd logs)

  JSON
  {
    "time":1373876885,
    "status":200,
    "uri":"/52495/facebook",
    "session_id":"kn6avn2fuh21r25a65mgm3rjh3",
    "fb_id":"7c40c5dd2e55cde37a8c40ed80e1",
    ...
  }
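
As a rough illustration of why named fields are easier to handle than positional TSV columns, here is a minimal sketch of reading one such JSON record in PHP. The `parseLogLine` helper is hypothetical (it is not from the talk), and the sample line reuses a few values from the slide above.

```php
<?php
/**
 * Hypothetical helper: decodes one JSON log line into an associative array.
 */
function parseLogLine(string $line): array
{
    $record = json_decode($line, true);
    if (!is_array($record)) {
        throw new InvalidArgumentException('Invalid JSON log line');
    }
    return $record;
}

$record = parseLogLine('{"time":1373876885,"status":200,"uri":"/52495/facebook"}');

// Fields are read by name, so adding or reordering fields does not break readers.
$status = $record['status'];
$uri    = $record['uri'];
// A field this record does not carry simply falls back to a default.
$device = $record['device'] ?? null;
```

Because each record names its own fields, a reader can ignore fields it does not know and tolerate ones that are missing, which is what makes later schema additions cheap.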

 41. POSTing logs

 42. fluent php logger
  use Fluent\Logger\FluentLogger;
  // Connect to fluentd on localhost, port 24224
  $logger = new FluentLogger("localhost", "24224");
  // post(tag, record): the record is an associative array
  $logger->post("debug.test", array("hello" => "world"));
  https://github.com/fluent/fluent-logger-php

 43. Basic log design

 44. Record so that one access
  becomes one record

 45. Hook into the response
  Most frameworks have
  a hook point for response events, right?
  In Symfony:
  onKernelResponse

 46. tags:
      - { name: kernel.event_listener, event: kernel.response }
  public function onKernelResponse(FilterResponseEvent $event)
  {
      $request = $event->getRequest();
      $response = $event->getResponse();
      // Build an array of some sort
      $data = $this->onAccess($request, $response);
      // log data
      $this->logger->post("access", $data);
  }
  ※ In reality it is set up so that multiple Listeners and Loggers can be registered


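 47. Decide on a basic schema

 48. Even with schemaless logs,
  you need to know what kind of log you are handling;
  if the meaning differs from record to record,
  the log is meaningless

 49. Decide on a basic schema
  time
  status
  uri
  ua
  referrer

  Matching LTSV-style field names
  may make things easier to follow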

 50. Not only what is
  in the web server log
  app
  route
  controller
  …
  device

  Things like the routing name
  and the controller name
  inside the framework
  (even if the uri is noisy,
  you can aggregate by routing name)


 51. Denormalize the attributes
  the application can know
  and include them in the record

 52. Denormalized record
  …
  …
  gender
  age
  device
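
The record design described so far (one record per access, a basic schema, plus denormalized application attributes) could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: `buildAccessRecord` is a hypothetical helper, `$server` mimics PHP's `$_SERVER`, and the sample values are made up apart from those that appear in the slides.

```php
<?php
/**
 * Hypothetical helper (not from the talk): builds one flat record per access.
 * $server mimics PHP's $_SERVER; $user holds attributes the application already knows.
 */
function buildAccessRecord(array $server, int $status, string $route, array $user): array
{
    return [
        // Basic schema, close to what a web server log carries
        'time'     => $server['REQUEST_TIME'] ?? time(),
        'status'   => $status,
        'uri'      => $server['REQUEST_URI'] ?? '',
        'ua'       => $server['HTTP_USER_AGENT'] ?? '',
        'referrer' => $server['HTTP_REFERER'] ?? '',
        // Application-level context
        'route'    => $route,
        // User attributes copied into the record (denormalized), so no JOIN is needed later
        'gender'   => $user['gender'] ?? null,
        'age'      => $user['age'] ?? null,
    ];
}

$record = buildAccessRecord(
    ['REQUEST_TIME' => 1373876885, 'REQUEST_URI' => '/52495/facebook'],
    200,
    'crocos_index',
    ['gender' => 'female', 'age' => 29] // made-up sample attributes
);
```

A record like this is what would be handed to the logger's `post()` call in a response listener.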

 53. Why denormalize? The benefits
  Aggregate functions can be applied without a JOIN
  Hadoop can do JOINs too,
  but this removes a step, so it is
  fast & simple

 54. By the way,
  fields such as
  … and …
  are best stored hashed
  ※ A privacy safeguard,
  just in case
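
One possible way to hash identifiers before logging is sketched below. The slides only say to hash such fields; the use of a keyed HMAC-SHA256, the `hashIdentifier` helper, the `LOG_HASH_KEY` constant, and the `user_id` field name are all assumptions made for this example.

```php
<?php
// Assumption: a secret key kept outside the logs; this value is a placeholder.
const LOG_HASH_KEY = 'replace-with-a-secret';

/**
 * Hypothetical helper: returns a stable token for an identifier using HMAC-SHA256.
 */
function hashIdentifier(string $id): string
{
    return hash_hmac('sha256', $id, LOG_HASH_KEY);
}

// The raw identifier never reaches the log; only the token does.
$record = [
    'user_id' => hashIdentifier('12345'),
    'gender'  => 'female',
];
```

The same input always yields the same token, so filtering and grouping by the hashed field still work in later queries.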

 55. To summarize
  Record so that one access
  becomes one record
  Decide on a basic schema
  Denormalize the attributes the application can know
  and include them in the record

 56. At this point, analysis is already possible

 57. Analysis example
  SELECT
    AVG(v['process_time'])
  FROM
    access
  WHERE
    v['route'] = 'crocos_index'

 58. Analysis example
  SELECT
    v['gender'], COUNT(*)
  FROM
    access
  GROUP BY v['gender']
  Glad we
  denormalized!
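
To make concrete what the GROUP BY query computes over denormalized records, here is a small in-memory PHP equivalent. The sample records are made up for illustration; in the talk the aggregation runs as a query on Treasure Data, not in PHP.

```php
<?php
// Made-up sample records; each one already carries the user's gender.
$records = [
    ['route' => 'crocos_index', 'gender' => 'female'],
    ['route' => 'crocos_index', 'gender' => 'male'],
    ['route' => 'shop_buy',     'gender' => 'female'],
];

// Rough equivalent of: SELECT gender, COUNT(*) FROM access GROUP BY gender
$counts = array_count_values(array_column($records, 'gender'));
```

No lookup into a separate user table is needed because the attribute travels with every record.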

 59. Analysis example: also useful for investigating errors

  SELECT
    v['route'], v['status'], v['ua']
  FROM
    access
  WHERE v['user_id'] = 'xxx'

 60. ※ Date handling is omitted to keep things short
  In reality you would GROUP BY day, narrow down with a WHERE clause, and so on

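 61. Using schemaless logs: an example
  Transactions

 62. Now then,
  logs with a basic schema
  have started to accumulate

 63. We want to record things like
  the success of an action
  that has special meaning

 64. Transactions
  uri and route:
  tell you that a request came in
  But whether it really succeeded
  is something only the application
  knows

 65. This is where schemaless comes in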

 66. Basic schema
  Additional schema
  time
  status
  uri
  ua
  referrer

  such-and-such
  so-and-so
  A specific record can be given
  a special meaning!
  And without affecting
  the other records.

 67. Transactions
  key_action
  key_attr_*

 68. Transactions
  key_action
  shop:buy:completed
  app:action:state
  ※ This example means "purchase completed"

 69. Transactions
  key_attr_*
  Stuff in any additional information
  related to the transaction
  The schema differs
  for each key_action

 70. Transaction example
  key_action
  = shop:buy:completed
  key_attr_item_id
  = xxxxx
  key_attr_ref
  = fb_share
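
Attaching transaction fields to an ordinary access record might look like the sketch below. The `key_action` and `key_attr_*` names follow the slides; the `withTransaction` helper and the sample base record (including its `uri`) are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns a copy of the record with transaction fields added.
 * Attribute names are prefixed with key_attr_, as in the slides.
 */
function withTransaction(array $record, string $action, array $attrs): array
{
    $record['key_action'] = $action;
    foreach ($attrs as $name => $value) {
        $record['key_attr_' . $name] = $value;
    }
    return $record;
}

// Made-up base record using the basic schema.
$base = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy'];

$record = withTransaction($base, 'shop:buy:completed', [
    'item_id' => 'xxxxx',
    'ref'     => 'fb_share',
]);
```

Only records that represent a transaction carry the extra keys; every other record keeps the basic schema untouched.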

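 71. Transaction analysis example
  SELECT
    item_id, ref, COUNT(*)
  FROM
    access
  WHERE
    key_action = 'shop:buy:completed'
  GROUP BY
    item_id, ref
  ※ v[] is omitted for reasons of space

 72. Transaction analysis
  Example use:
  Record the access source for each campaign
  From the number of successful transactions,
  find the most effective campaign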

 73. NEXT STEP

 74. From the aggregated results
  ・Statistical analysis methods
  ・Modeling
  Compute the metrics that are critical to the business
  and establish an improvement process

 75. Summary

 76. Collecting and analyzing logs is hard
    → Use Fluentd and Hadoop
    → Use Treasure Data
  What logs should you collect?
    → Denormalized logs, one record per access
    → Design the log format itself
    → Make use of schemalessness

 77. Finally
  We are hiring
  Perfect PHP authors,
  a former PHP Conference chair,
  the formerly unpopular (元非モテ),
  a former ドラ娘:
  only at Crocos can you work with them