Introduction to Data Science for PHP Users

PHP Conference 2013: "Introduction to Data Science for PHPers" #phpcon2013

Sotaro Karasawa

September 14, 2013

Transcript

  1. Crocos, Inc.
    Sotaro Karasawa
    @sotarok
    http://facebook.com/sotarok
    Introduction to Data Science for PHPers
    #phpcon2013
    PHP Conference 2013

  2. About me
    Sotaro Karasawa (@sotarok)
    柄沢聡太郎
    d.hatena.ne.jp/sotarok
    Crocos, Inc. (株式会社クロコス)
    PHP, Git, TD
    Red Bull

  3. パーフェクトPHP (Perfect PHP)
    Gijutsu-Hyoron (技術評論社)
    Of course you all own a copy, right!? ←

  4. Data science

  5. For details, see:
    データサイエンティスト養成読本 (Data Scientist Training Reader)
    Gijutsu-Hyoron (技術評論社)
    http://www.amazon.co.jp/dp/…

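  6. Data science
    Business understanding
    Data understanding
    Data extraction
    Data processing
    Modeling
    Evaluation of results
    Service implementation
    Source: データサイエンティスト養成読本 (Data Scientist Training Reader),
    p. …, chapter …, "The data science process"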

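  7. Data science
    Analyze and model the accumulated data
    to obtain the metrics that matter
    for running the business, then repeat

  8. Data science
    Analyze and model the accumulated data
    to obtain the metrics that matter
    for running the business, then repeat

    There is a lot that has to be done
    The range of knowledge required is broad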

  9. Start with the bare minimum,
    from wherever it is easy to begin,
    and take the first step

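  10. Data science
    Business understanding
    Data understanding
    Data extraction
    Data processing
    Modeling
    Evaluation of results
    Service implementation
    Source: データサイエンティスト養成読本 (Data Scientist Training Reader),
    p. …, chapter …, "The data science process"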

  11. PHPer
    For a web application,
    what is data?

  12. PHPer
    For a web application,
    what is data?
    Databases
    Logs

  13. This talk is about logs

  14. How to collect
    large volumes of application logs,
    and how to aggregate them

  15. With that in mind,
    today's agenda
    The pains of log collection and analysis
    Log collection for PHP applications
    Analysis

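  16. The pains of log collection and analysis

  17. The never-ending pains of
    log collection and analysis
    Large volumes of data
    How to collect it
    Where to store it
    How to retrieve it
    How to aggregate it

  18. The never-ending pains of
    log collection and analysis
    Large volumes of data
    How to collect it
    Where to store it
    How to retrieve it
    How to aggregate it
    Network bandwidth
    Disk capacity
    Big-data processing stack
    Processing time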

  19. http://www.treasure-data.com/

  20. [Diagram: TD architecture. Labels: Web Server (x2), fluentd, S3, Hadoop, Client, Hive, MySQL etc..., Result]

  21. [Diagram: TD architecture. Labels: Web Server (x2), fluentd, S3, Hadoop, Client, Hive, MySQL etc..., Result]
    The data accumulates on their side;
    when you submit a query, Hadoop starts up
    over there and returns the result

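  22. When doing log analysis
    The troublesome parts: data collection, storage, and processing
      → TD takes care of them
    The essential work
    ・What data to collect
    ・How to aggregate it
    You can commit to designing and implementing these!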

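  23. How logs are used at Crocos
    • Application logs
    • Analysis based on Facebook attribute information
    • Execution counts and execution times of key actions
    • Transaction counts, by attribute and by referral route
    • Event logs
    • Shares to social networks
    • Opening and closing of modals, etc.
    • And various others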

  24. Log collection for
    PHP applications

  25. What kind of application logs
    Basic log design

  26. What logs are you collecting?

  27. Web server logs

  28. Speaking of logs:
    web server logs
    Even the Treasure Data tutorial
    uses Apache logs
    http://docs.treasure-data.com/articles/quickstart

  29. But what we really want to know is

  30. What kind of user?
    On what device? From where?
    What did they do, and when?
    Which button did they click?
    Or tap?

  31. Application logs

  32. What kind of user?
      → User registration data
    On what device? From where?
      → UA, GEO
    What did they do, and when?
      → URI, action

  33. How to collect
    application logs

  34. Before that,
    a quick word about schemaless logs

  35. What is a schemaless log?
    A log without a schema

  36. Log schemas
    Until now
    → for example, TSV

  37. Column 1: time
    Column 2: status
    Column 3: uri
    Column 4: [email protected]

    hoge
    Schema

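  38. foreach (file('app.log') as $line) {
        // Split each TSV line into positional columns (strip only the line ending)
        $column = explode("\t", rtrim($line, "\r\n"));
        $time = $column[0];
        $status = $column[1];
        ...
    }
    * In practice you would not bother doing this in PHP; you would use sed or awk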

  39. Fields are hard to understand
    Schema changes are difficult
    Accidents caused by analysts and collectors
    understanding the data differently

  40. TD's logs (or rather, fluentd's)

    JSON
    {
    "time":1373876885,
    "status":200,
    "uri":"/52495/facebook",
    "session_id":"kn6avn2fuh21r25a65mgm3rjh3",
    "fb_id":"7c40c5dd2e55cde37a8c40ed80e1",
    ...
    }
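
As a rough illustration of why keyed JSON records are easier to consume than positional TSV columns, the sketch below decodes one JSON log line and reads fields by name. The sample line and the helper name `parseLogLine` are assumptions made for this example and do not come from the talk.

```php
<?php
// Hypothetical helper: decode one JSON log line into an associative array.
function parseLogLine(string $line): array
{
    $record = json_decode($line, true);
    if (!is_array($record)) {
        throw new InvalidArgumentException('not a JSON object: ' . $line);
    }
    return $record;
}

// Sample line (made up for illustration), in the shape shown on the slide.
$line = '{"time":1373876885,"status":200,"uri":"/52495/facebook"}';
$record = parseLogLine($line);

// Fields are looked up by name, so adding or reordering keys does not break readers.
$status = $record['status'];
$referrer = $record['referrer'] ?? null; // a missing key is handled explicitly
```

Because readers never depend on column position, a new key can be added to some records without touching existing consumers.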

  41. POSTing logs

  42. fluent php logger
    use Fluent\Logger\FluentLogger;

    // Connect to a local fluentd and post one record under the tag "debug.test"
    $logger =
    new FluentLogger("localhost","24224");
    $logger->post(
    "debug.test",
    array("hello"=>"world")
    );
    https://github.com/fluent/fluent-logger-php

  43. Basic log design

  44. Record logs so that
    one access becomes one record

  45. Hook into the response
    Most frameworks have
    a hook point for the response event, right?
    In Symfony:
    onKernelResponse

  46. tags:
    - { name: kernel.event_listener, event: kernel.response }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $event->getRequest();
        $response = $event->getResponse();
        // build an array of log fields
        $data = $this->onAccess($request, $response);
        // log data
        $this->logger->post("access", $data);
    }
    * In practice it is set up so that multiple Listeners and Loggers can be registered


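  47. Decide on a basic schema

  48. Even with schemaless logs,
    you need to know what kind of logs you are handling;
    if the meaning differs from record to record,
    the logs are meaningless

  49. Decide on a basic schema
    time
    status
    uri
    ua
    referrer

    Matching LTSV-style names
    may make them easier to understand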
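
A basic schema like this could be filled in from PHP's request data roughly as follows. This is a minimal sketch: the function name `buildAccessRecord` and the choice of `$_SERVER`-style keys are assumptions for illustration, and plain arrays stand in for framework request and response objects.

```php
<?php
// Hypothetical helper: turn one request/response pair into a single flat record
// using the basic field names time, status, uri, ua, referrer.
function buildAccessRecord(array $server, int $status, int $now): array
{
    return [
        'time'     => $now,
        'status'   => $status,
        'uri'      => $server['REQUEST_URI'] ?? '',
        'ua'       => $server['HTTP_USER_AGENT'] ?? '',
        'referrer' => $server['HTTP_REFERER'] ?? '',
    ];
}

// Made-up request data; a real application would pass $_SERVER or its framework equivalent.
$record = buildAccessRecord(
    ['REQUEST_URI' => '/shop/buy', 'HTTP_USER_AGENT' => 'ExampleBrowser/1.0'],
    200,
    1373876885
);
```

Every record produced this way has the same five keys, so the meaning of each field stays the same from record to record.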

  50. Not only what is in
    the web server log
    app
    route
    controller
    [email protected]
    device

    Things like the routing name
    and controller name
    inside the framework
    (even if the uri is noisy,
    you can aggregate by routing name)


  51. Denormalize the attributes
    the application can know
    and include them in the record

  52. A denormalized record
    [email protected]
    [email protected]
    gender
    age
    device
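
Denormalizing at write time might look like the sketch below, which copies user attributes into the access record before it is logged. The helper name `denormalize` and the sample profile are hypothetical; only the field names gender, age, and device come from the slide.

```php
<?php
// Hypothetical helper: copy what the application already knows about the user
// into the access record, so no JOIN is needed at aggregation time.
function denormalize(array $record, array $user, string $device): array
{
    // Array union keeps existing keys of $record and adds the new ones.
    return $record + [
        'gender' => $user['gender'] ?? null,
        'age'    => $user['age'] ?? null,
        'device' => $device,
    ];
}

$access = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy'];
$user   = ['gender' => 'female', 'age' => 29]; // made-up profile
$record = denormalize($access, $user, 'smartphone');
```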

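  53. Why denormalize: the benefits
    Aggregate functions can be applied without a JOIN
    Hadoop can do JOINs too,
    but this way there are fewer steps, so it is
    fast & simple

  54. By the way,
    [email protected]
    [email protected]
    and the like are best hashed
    * Out of consideration for privacy,
    just in case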
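
One way to hash identifiers before they are logged is a keyed hash, sketched below. The slide only says to hash them; HMAC-SHA256, the helper name `hashIdentifier`, and the sample values are assumptions chosen for this example.

```php
<?php
// Hypothetical helper: replace a raw identifier with a keyed hash before logging.
// The same input and key always give the same output, so grouping by the hashed value still works.
function hashIdentifier(string $id, string $secretKey): string
{
    return hash_hmac('sha256', $id, $secretKey);
}

$secretKey = 'replace-with-a-real-secret'; // assumption: stored outside the logs
$hashed = hashIdentifier('12345', $secretKey);
```

A keyed hash is used here rather than a bare digest because short numeric IDs are easy to guess by brute force when no secret is involved.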

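  55. To summarize
    Record logs so that one access
    becomes one record
    Decide on a basic schema
    Denormalize the attributes the application
    can know and include them in the record

  56. At this point, analysis is already possible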

  57. Analysis example
    SELECT
    AVG(v['process_time'])
    FROM
    access
    WHERE
    v['route'] = 'crocos_index'

  58. Analysis example
    SELECT
    v['gender'], COUNT(*)
    FROM
    access
    GROUP BY v['gender']
    Glad we denormalized!

  59. Analysis example: also useful for investigating errors

    SELECT
    v['route'], v['status'], v['ua']
    FROM
    access
    WHERE v['user_id'] = 'xxx'

  60. * Date handling is omitted because it would make the queries long
      In reality you would GROUP BY day or narrow down with a WHERE clause
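
The per-day grouping that the queries leave out can be pictured with a small in-memory sketch. This is not Hive; it only mirrors the logic of a GROUP BY on the day, using PHP arrays. The function name `countByDay` and the sample timestamps are assumptions.

```php
<?php
// In-memory illustration of "GROUP BY day": count records per UTC date.
function countByDay(array $records): array
{
    $counts = [];
    foreach ($records as $r) {
        $day = gmdate('Y-m-d', $r['time']); // Unix timestamp -> UTC date string
        $counts[$day] = ($counts[$day] ?? 0) + 1;
    }
    ksort($counts); // oldest day first
    return $counts;
}

$records = [
    ['time' => 0],     // 1970-01-01 00:00:00 UTC
    ['time' => 3600],  // 1970-01-01 01:00:00 UTC
    ['time' => 86400], // 1970-01-02 00:00:00 UTC
];
$perDay = countByDay($records);
```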

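  61. A use case for schemaless logs:
    transactions

  62. So,
    logs with a basic schema
    have started to accumulate

  63. We want to record things
    that carry special meaning,
    such as the success of an action

  64. Transactions
    uri or route:
    tells you that a request arrived
    But whether it really succeeded
    is something only the application
    knows

  65. This is where schemaless comes in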

  66. Basic schema
    time
    status
    uri
    ua
    referrer

    Additional schema
    (whatever)
    (whatever else)

    You can give specific records
    a special meaning!
    And without affecting
    any other records.

  67. Transactions
    key_action
    key_attr_*

  68. Transactions
    key_action
    shop:buy:completed
    app:action:state
    * This example means "purchase completed"

  69. Transactions
    key_attr_*
    Put in any additional information
    related to the transaction
    The schema differs
    for each key_action

  70. Transaction example
    key_action
    = shop:buy:completed
    key_attr_item_id
    = xxxxx
    key_attr_ref
    = fb_share
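
A record like this could be assembled by adding the transaction keys on top of an ordinary access record, as in the sketch below. The helper name `withTransaction` and the base record are hypothetical; the key names and values are the ones shown on the slide.

```php
<?php
// Hypothetical helper: add a transaction marker and its attributes to a record.
// Attribute names are prefixed with "key_attr_" as on the slides.
function withTransaction(array $record, string $action, array $attrs): array
{
    $record['key_action'] = $action;
    foreach ($attrs as $name => $value) {
        $record['key_attr_' . $name] = $value;
    }
    return $record; // arrays are copied by value, so the caller's record is untouched
}

$base = ['time' => 1373876885, 'status' => 200, 'uri' => '/shop/buy'];
$record = withTransaction($base, 'shop:buy:completed', [
    'item_id' => 'xxxxx',
    'ref'     => 'fb_share',
]);
```

Records without a transaction simply lack these keys, which is what lets the extra fields coexist with the basic schema.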

  71. Transaction analysis example
    SELECT
    item_id, ref, COUNT(*)
    FROM
    access
    WHERE
    key_action = 'shop:buy:completed'
    GROUP BY
    item_id, ref
    * v[] is omitted here to save space

  72. Transaction analysis
    Use case:
    Record the access source for each campaign
    From the number of successful transactions,
    find the most effective campaign
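
The idea of ranking access sources by successful transactions can be sketched in memory as follows. This only illustrates the counting logic, not how the query engine runs it; the function name `countCompletedByRef` and the sample records (including the `mail` source) are made up for the example.

```php
<?php
// In-memory illustration: count successful transactions per access source (ref).
function countCompletedByRef(array $records, string $action): array
{
    $counts = [];
    foreach ($records as $r) {
        if (($r['key_action'] ?? null) !== $action) {
            continue; // ordinary access record, or a different transaction
        }
        $ref = $r['key_attr_ref'] ?? 'unknown';
        $counts[$ref] = ($counts[$ref] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

$records = [
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'fb_share'],
    ['key_action' => 'shop:buy:completed', 'key_attr_ref' => 'mail'],
    ['uri' => '/shop'], // no key_action: not a transaction
];
$byRef = countCompletedByRef($records, 'shop:buy:completed');
$best = array_key_first($byRef); // requires PHP 7.3 or later
```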

  73. NEXT STEP

  74. From the aggregated results:
    ・Statistical analysis methods
    ・Modeling
    Compute the metrics that are critical to the business
    and establish an improvement process

  75. Summary

  76. Collecting and analyzing logs is hard work
      → Use Fluentd and Hadoop
      → Use Treasure Data
    What logs should you collect?
      → Denormalized logs, one record per access
      → Design the log format itself
      → Make use of schemaless logging

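  77. Finally
    We are hiring
    … author(s) of パーフェクトPHP (Perfect PHP)
    … former PHP Conference chair(s)
    … former 非モテ (unlucky in love)
    … former ドラ娘 (dora-musume)
    Only at Crocos can you work with them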
