Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Design Patterns for Collecting and Analyzing Schemaless Log

Design Patterns for Collecting and Analyzing Schemaless Log

スキーマレスなログデータの収集と集計のためのデザインパターン

at http://www.zusaar.com/event/876003

4564d3601f6230782300dff8499b2d5a?s=128

Sotaro Karasawa

August 09, 2013
Tweet

Transcript

  1. Crocos, Inc. Sotaro Karasawa @sotarok http://facebook.com/sotarok εΩʔϚϨεͳ ϩάσʔλͷ ऩूͱूܭͷͨΊͷ σβΠϯύλʔϯ

    #ds2013 ·ͨ͸ Treasure Data ϋΠύʔ׆༻ज़
  2. ࣗݾ঺հ 4PUBSP,BSBTBXB!TPUBSPL ฑ୔૱ଠ࿠ EIBUFOBOFKQTPUBSPL גࣜձࣾΫϩίε$SPDPT*OD 1)1 3FE#VMM

  3. ࣾһਓͰ૑ۀ ։ൃऀ౰࣌ਓ ೥݄ʹαʔϏεϩʔϯν ೥݄ʹ5%ಋೖ

  4. ࠓ೔࿩͍ͨ͜͠ͱ ΞϓϦέʔγϣϯϩάΛͲ͏ू ΊΔ͔ εΩʔϚϨεͳϩάͱ͸ جຊతͳϩάઃܭ ϩάऩूͷσβΠϯύλʔϯ

  5. ࠓ೔ͷ໨త ৘ใڞ༗ɺ৘ใަ׵ ͏ͪͰ͸͜͏΍ͬͯΔΑɺͱ͍͏Ұྫ ܾͯ͠ߨࢣͱͯ͠ɺ͜͏΍Γ·͠ΐ͏ͱݴ ͍ʹདྷͨΘ͚Ͱ͸ͳ͘ɻ ࠓޙɺ͜͏͍͏ωλ͕σΟεΧογϣϯͰ ͖Ε͹͍͍ͳͱ

  6. ओʹ 8FCΞϓϦέʔγϣϯ ͷ࿩ Ͱ͕͢ɺ8FCΞϓϦέʔγϣϯ΋ଟ༷Խ͠ ͍ͯ·͢ ޙ൒ͷσβΠϯύλʔϯͷͳ͔Ͱ͍͔ͭ͘ ৮ΕΒΕΔ͔ͳʁ

  7. 2ϩάऩूΛ͍ͯ͠Δ 2qVFOUEΛ࢖͍ͬͯΔ 25%Λ࢖͍ͬͯΔ

  8. ͲΜͳϩάΛूΊͯΔʁ

  9. 8FCαʔόͷϩά

  10. ϩάͱ͍͑͹ 8FCαʔόʔͷϩά 5SFBTVSF%BUBͷνϡʔτϦ Ξϧ΋"QBDIFͷϩά http://docs.treasure-data.com/articles/quickstart

  11. ͚ͩͲຊ౰ʹཉ͍͠ͷ͸

  12. ͲΜͳϢʔβʔ͕ʁ ͲΜͳ୺຤ͰʁͲ͔͜Βʁ ͍ͭԿΛͨ͠ͷ͔ʁ ͲΜͳϘλϯΛΫϦοΫͨ͠ ͷ͔ʁλοϓͨ͠ͷ͔ʁ

  13. ΞϓϦέʔγϣϯϩά

  14. ͲΜͳϢʔβʔ͕ʁ ɹˠϢʔβʔొ࿥৘ใ ͲΜͳ୺຤ͰʁͲ͔͜Βʁ ɹˠ6"(&0 ͍ͭԿΛͨ͠ͷ͔ʁ ɹˠ63*ΞΫγϣϯ

  15. ΞϓϦέʔγϣϯϩάΛ Ͳ͏ूΊΔ͔

  16. εΩʔϚϨεϩάͱ͸ʁ

  17. εΩʔϚϨεϩάͱ͸ʁ εΩʔϚͷແ͍ϩά

  18. ϩάͷεΩʔϚ ͜Ε·Ͱ ˠྫ͑͹547

  19. ΧϥϜ໨UJNF ΧϥϜ໨TUBUVT ΧϥϜ໨VSJ ΧϥϜ໨VTFS@JE  JOEFY

  20. ΧϥϜ໨UJNF ΧϥϜ໨TUBUVT ΧϥϜ໨VSJ ΧϥϜ໨VTFS@JE  JOEFY εΩʔϚ

  21. for line in open('app.log', 'r'): columns = line.split("\t") time =

    columns[0] ...
  22. ߲໨ͷΘ͔ΓͮΒ͞ εΩʔϚมߋͷ೉͠͞ ෼ੳऀͱऩूऀͷೝࣝࠩҟʹ ΑΔࣄނ

  23. 5%ͷϩά ͱ͍͏͔qVFOUE +40/ { "time":1373876885, "status":200, "request_uri":"/52495/facebook", "session_id":"kn6avn2fuh21r25a65mgm3rjh3", "fb_id":"7c40c5dd2e55cde37a8c40ed80e1", ...

    }
  24. Θ͔Γ΍͍͢ ߲໨Λ௥ՃͰ͖Δ σʔλྔ͸૿͑Δɾ৑௕

  25. ΞϓϦέʔγϣϯϩάΛ Ͳ͏ूΊΔ͔

  26. جຊతͳϩάઃܭ

  27.  ΠϕϯτϨίʔυͱͳΔΑ ͏ʹه࿥͢Δ ˞8FCΞϓϦέʔγϣϯͷ৔߹ɺΞΫηε

  28. Πϕϯτͱ͸ 8FCΞϓϦέʔγϣϯͳΒ ɾΞΫηε ωΠςΟϒΞϓϦͳΒ ɾΠϕϯτ

  29.  جຊతͳεΩʔϚΛܾΊΔ

  30. 

  31. εΩʔϚϨεͱ͍ͬͯ΋ Ͳ͏͍͏ϩάΛѻ͍ͬͯΔͷ͔ ֤ϨίʔυͰҙຯ͕ҧͬͯ͸ҙ ຯ͕ແ͍

  32. جຊతͳεΩʔϚΛܾΊΔ UJNF TUBUVT VSJ VB SFGFSSFS 

  33. جຊతͳεΩʔϚΛܾΊΔ UJNF TUBUVT VSJ VB SFGFSSFS  LTSVͬΆ໊͍લʹ ߹Θ͓ͤͯ͘ͱΘ ͔Γ΍͍͔͢΋

  34. 8FCαʔόʹ͋Δϩά ͚ͩͰͳ͘ BQQ SPVUF DPOUSPMMFS QSPDFTT@UJNF EFWJDF 

  35. 8FCαʔόʹ͋Δϩά ͚ͩͰͳ͘ BQQ SPVUF DPOUSPMMFS QSPDFTT@UJNF EFWJDF  ϑϨʔϜϫʔΫ಺Ͱͷϧʔ ςΟϯά໊ͱ͔ɺίϯτ

    ϩʔϥ໊ͱ͔ (uri ʹϊΠζ͕͋ͬͯ΋ routing ໊ͰूܭͰ͖Δ)
  36.  ΞϓϦέʔγϣϯͷ஌Γ͏Δ ଐੑΛඇਖ਼نԽͯ͠Ϩίʔυ ʹؚΊΔ

  37. ඇਖ਼نԽ͞ΕͨϨίʔυ TFTTJPO@JE VTFS@JE HFOEFS BHF EFWJDF 

  38. ͳͥඇਖ਼نԽ͔ͷϝϦοτ +0*/ͤͣʹूܭؔ਺ʹ͔ΔͨΊ

  39. ͪͳΈʹ VTFS@JE TFTTJPO@JE ͳͲ͸IBTIԽ͓ͯ͘͠

  40. ·ͱΊΔͱ ΠϕϯτϨίʔυͱͳΔΑ͏ ʹه࿥͢Δूܭؔ਺ʹ͔ΔͨΊ جຊతͳεΩʔϚΛܾΊΔ ΞϓϦέʔγϣϯͷ஌Γ͏Δଐ ੑΛඇਖ਼نԽͯ͠ϨίʔυʹؚΊΔ

  41. ͜͜·ͰདྷΔͱɺ΋͏෼ੳ͕Մೳ

  42. ෼ੳͷྫ SELECT AVG(v[‘process_time’]) FROM access WHERE v[‘route’] = ‘crocos_index’

  43. ෼ੳͷྫ SELECT v[‘gender’], COUNT(*) FROM access GROUP BY v[‘gender’] ඇਖ਼نԽ͓͍ͯ͠

    ͯΑ͔ͬͨʂ
  44. ෼ੳͷྫ SELECT v[‘gender’], COUNT(*) FROM access GROUP BY v[‘gender’]

  45. ෼ੳͷྫ Τϥʔͷௐࠪʹ΋ SELECT v[‘route’], v[‘status’], v[‘ua’] FROM access WHERE v[‘user_id’]

    = ‘xxx’
  46. ˞௕͘ͳΔͷͰ೔෇ؔ࿈ͷॲཧ͸লུͯ͠·͢ ɹຊ౰͸೔ผʹ(3061#:ͨ͠Γ8&)&3۟ͰߜͬͨΓ

  47. εΩʔϚϨεͳ ΞϓϦέʔγϣϯϩά ͷͨΊͷ σβΠϯύλʔϯ Λߟ͑Δ

  48. ͯ͞ جຊతͳεΩʔϚΛ࣋ͭ ϩά͕ͨ·Γ࢝Ί·ͨ͠

  49. ͔͜͜Βઌ͸ ԿΛ෼ੳΛ͍ͨ͠৔߹ʹ ͲΜͳϩάΛೖΕ͓͚ͯ͹ྑ ͍͔ ύλʔϯʹ෼͚ͯߟ͑·͢

  50. εΩʔϚϨεͷग़൪

  51. جຊతͳεΩʔϚ UJNF TUBUVT VSJ VB SFGFSSFS  ͳΜͪΌΒ ͔ΜͪΌΒ

  52. جຊతͳεΩʔϚ UJNF TUBUVT VSJ VB SFGFSSFS  ͳΜͪΌΒ ͔ΜͪΌΒ ಛఆͷϨίʔυʹɺಛ

    ผͳҙຯΛ΋ͨͤΔ͜ͱ ͕Ͱ͖Δʂ ͔͠΋ଞͷϨίʔυʹӨ ڹΛ͋ͨ͑Δ͜ͱͳ͘ɻ
  53. ύλʔϯ τϥϯβΫγϣϯ

  54. ಛผͳҙຯΛ࣋ͭ ΞΫγϣϯͷ੒ޭͳͲΛ ه࿥͍ͨ͠

  55. τϥϯβΫγϣϯ uri ΍ route: ϦΫΤετ͕དྷͨ͜ͱ͸Θ͔Δ ͔͠͠ɺຊ౰ʹ੒ޭ͔ͨ͠͸ɺ ΞϓϦέʔγϣϯͰ͔͠Θ͔Β ͳ͍

  56. τϥϯβΫγϣϯ key_action key_attr_*

  57. τϥϯβΫγϣϯ key_action present:entry:completed ΞϓϦ:ಈ࡞:ঢ়گ ※͜ͷྫ͸ʮొ࿥׬ྃʯ

  58. τϥϯβΫγϣϯ key_attr_* τϥϯβΫγϣϯʹؔΘΔ෇Ճ తͳ৘ใΛͭͬ͜Ή εΩʔϚ͸ɺkey_action ͝ͱʹ ҟͳΔ

  59. τϥϯβΫγϣϯྫ key_action = shop:register:completed key_attr_user_id = xxxxx key_attr_ref = fb_share

  60. τϥϯβΫγϣϯ෼ੳͷྫ SELECT v[‘key_attr_ref’], COUNT(*) FROM access WHERE v[‘key_action’] = ‘...’

    GROUP BY v[‘key_attr_ref’]
  61. τϥϯβΫγϣϯ෼ੳ ࠷ۙΑ͘ݟͯΔσʔλ ... Ͳͷࢪࡦ͕Ұ൪ޮ͍ͨͷ͔

  62. ύλʔϯ Πϕϯτ

  63. ΞΫηεʹґଘ͠ͳ͍ ΠϕϯτͷൃੜΛ஌Γ͍ͨ

  64. ɾ+BWB4DSJQUʹΑΔΠϕϯτ ɾϞʔμϧͷදࣔ ɾ5XJUUFS΍'BDFCPPL΁ͷ γΣΞ ɾωΠςΟϒΞϓϦ

  65. Πϕϯτ tag = app:action:location & some attributes

  66. Πϕϯτྫ tag = shop:tweet:shop_item item_id = 1234 tweet_id = xxxxx

  67. Πϕϯτ෼ੳͷྫ SELECT v[‘item_id’], COUNT(*) FROM events WHERE v[‘tag’] = ‘shop:tweet:shop_item’

    GROUP BY v[‘item_id’]
  68. τϥϯβΫγϣϯͱ ࣮͸࢓૊Έ͸͔ΘΒͳ͍

  69. εΩʔϚϨεϩάͷѻ͍ํͰ ࠷΋ॏཁͳͷ͸ ղऍͷϧʔϧΛܾΊΔ͜ͱ

  70. ଟ෼͕࣌ؒແ͍ͷͰ ͜ͷ΁ΜͰ

  71. ͜͏͍͏࣌ʹ͸ ͜͏͍͏෩ʹσʔλͷूΊͯ ͜͏ղੳ͠Α͏ ͱ͍͏ͷΛڞ༗͍ͨ͠

  72. ஫ҙ͍ͨ͠ͱ͜Ζ

  73. εΩʔϚϨεͱ͍͑Ͳ ࣄલͷϩάઃܭΛ͔ͬ͠Γ΍Δ ϩά͸Ұ౓ೖΕΔͱมߋ͕೉͍͠ ˠ෼ੳ͍߲ͨ͠໨ͷ࿙Ε͕ແ͍͔ ϓϥΠόγʔ໰୊ʹ΋ؾΛ͚ͭΔ

  74. None