メルカリのデータ分析基盤 / mercari data analysis infrastructure

Tatsuhiko Kubo

April 27, 2017

Transcript

  1. Agenda
     • Introduction to Mercari's data analysis infrastructure
     • Server log analysis infrastructure
     • Event based log analysis infrastructure
     • Machine learning analysis infrastructure
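  2. Division of data analysis roles @ Mercari
     • BI Analyst (Data Scientist)
       • Designs the log data structures needed for analysis
       • Plans and proposes based on data analysis; designs KPIs; handles reporting and visualization
     • Software Engineer (Backend System, Site Reliability)
       • Develops, operates, and supports the data analysis infrastructure
     • Software Engineer (Machine Learning / NLP / AI)
       • (Mainly) bridges specialized technologies such as machine learning and the service
       • Also does research and development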
  3. Agenda
     • Mercari's data analysis infrastructure
     • Server log analysis infrastructure
     • Event based log analysis infrastructure
     • Machine learning analysis infrastructure
  4. Server log analysis infrastructure
     [Diagram: app servers (×3) emitting access_log, application_log, app_error_log, error_log, php_log...; Stream Processing; batch: Filtering & Import logs to BigQuery; BigQuery; Mackerel; Slack]
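The "Filtering & Import logs to BigQuery" batch step can be pictured with the minimal Python sketch below. It assumes the logs are LTSV (tab-separated `key:value` fields) and that health-check requests should be dropped; both the log format and the filter rule are hypothetical, since the slide shows neither. Surviving records are written as newline-delimited JSON, one of the source formats a BigQuery load job accepts.

```python
import json
from typing import Iterable, Iterator

# Hypothetical filter rule: drop load-balancer health checks before import.
EXCLUDED_URIS = {"/healthcheck"}


def parse_ltsv(line: str) -> dict:
    """Parse one LTSV line ("key:value<TAB>key:value...") into a dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first colon only, so values such as timestamps keep theirs.
        key, sep, value = field.partition(":")
        if sep:  # skip malformed fields that have no colon
            record[key] = value
    return record


def filter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed records, skipping blank lines and excluded URIs."""
    for line in lines:
        if not line.strip():
            continue
        record = parse_ltsv(line)
        if record.get("uri") in EXCLUDED_URIS:
            continue
        yield record


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


if __name__ == "__main__":
    sample = [
        "time:2017-04-27T10:00:00+09:00\turi:/some/api\tstatus:200\n",
        "time:2017-04-27T10:00:01+09:00\turi:/healthcheck\tstatus:200\n",
    ]
    print(to_ndjson(filter_records(sample)), end="")
```

The resulting file could then be handed to a BigQuery load job; how Mercari actually schedules and runs this step is not shown in the deck.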
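  5. Google BigQuery
     • The starting point for data analysis
       • Developers run queries for investigations
       • Data source for dashboards and various spreadsheets
       • Also used by other internal services that make use of log data
     • We use the flat-rate plan
       • No need to worry about query charges
       • However, slots are finite, so getting carried away and running too many heavy queries causes large delays
       • Creating intermediate tables reduces the amount of data processed
       • Slot usage can be checked in Stackdriver (threshold alerts are also possible)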
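To illustrate why intermediate tables reduce the processing load under a fixed slot budget, the toy Python sketch below scans raw event rows once to build a per-day summary, then answers follow-up questions from that summary instead of rescanning the raw log. In BigQuery the equivalent would be writing the result of an aggregation query to a destination table; the row layout and function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical raw log row: (date, event_name)
RawRow = Tuple[str, str]
Summary = Dict[Tuple[str, str], int]


def build_daily_counts(raw: List[RawRow]) -> Summary:
    """Scan the raw rows once and materialize an 'intermediate table'
    with one row per (date, event_name) and its count."""
    return dict(Counter(raw))


def events_on(summary: Summary, date: str) -> Dict[str, int]:
    """Answer a follow-up query from the small summary, not the raw rows."""
    return {name: n for (d, name), n in summary.items() if d == date}


if __name__ == "__main__":
    raw = [
        ("2017-04-26", "tap"),
        ("2017-04-26", "tap"),
        ("2017-04-26", "open"),
        ("2017-04-27", "tap"),
        ("2017-04-27", "tap"),
    ]
    summary = build_daily_counts(raw)
    print(len(raw), "raw rows ->", len(summary), "summary rows")
    print(events_on(summary, "2017-04-26"))
```

The saving grows with the ratio of raw rows to summary rows: every later query touches only the summary.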
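  6. Google Spread Sheet & Apps Script
     • Google Spread Sheet
       • Download data from BigQuery, then aggregate and chart it
       • Many people are used to the Excel format, so sharing with non-engineers goes smoothly
     • Google Apps Script
       • Automatically runs registered queries on a schedule to generate a homebrew dashboard
       • Posts the generated chart images to …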
  7. Stream processing by SQL
     Cache hit rate of one particular nginx, per minute:

       SELECT
         COUNT(upstream_cache_status = "HIT") / COUNT(*) AS rate_hit,
         COUNT(upstream_cache_status = "MISS") / COUNT(*) AS rate_miss,
         COUNT(upstream_cache_status = "EXPIRED") / COUNT(*) AS rate_expired
       FROM mercari_some_log.win:time_batch(1 min)
       WHERE uri = "/some/api"
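The slide's query computes, for each one-minute batch of nginx log records for a single URI, the fraction of requests whose `upstream_cache_status` is HIT, MISS, or EXPIRED; the `win:time_batch(1 min)` clause resembles Esper EPL's time-batch (tumbling) window. A minimal Python sketch of the same arithmetic over an in-memory list, assuming each event is a hypothetical `(unix_seconds, uri, upstream_cache_status)` tuple. A real stream processor would emit each window's result as the window closes rather than processing a finished list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (unix_seconds, uri, upstream_cache_status)
Event = Tuple[int, str, str]
STATUSES = ("HIT", "MISS", "EXPIRED")


def cache_rates_per_window(
    events: Iterable[Event], uri: str, window_seconds: int = 60
) -> Dict[int, Dict[str, float]]:
    """Split events into fixed, non-overlapping windows (a tumbling 'time batch')
    and return, per window start, the share of each cache status for `uri`."""
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, event_uri, status in events:
        if event_uri != uri:  # corresponds to WHERE uri = ...
            continue
        window_start = ts - ts % window_seconds
        counts[window_start]["_total"] += 1  # corresponds to COUNT(*)
        counts[window_start][status] += 1
    rates: Dict[int, Dict[str, float]] = {}
    for window_start in sorted(counts):
        c = counts[window_start]
        total = c["_total"]
        rates[window_start] = {f"rate_{s.lower()}": c[s] / total for s in STATUSES}
    return rates


if __name__ == "__main__":
    sample = [
        (0, "/some/api", "HIT"),
        (10, "/some/api", "HIT"),
        (20, "/some/api", "MISS"),
        (30, "/other", "MISS"),
        (65, "/some/api", "EXPIRED"),
    ]
    for start, r in cache_rates_per_window(sample, "/some/api").items():
        print(start, r)
```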
  8. Agenda
     • Mercari's data analysis infrastructure
     • Server log analysis infrastructure
     • Event based log analysis infrastructure
     • Machine learning analysis infrastructure
  9. Pascal - Event based log analysis infrastructure
     [Diagram: apps send events (Powered by cookpad/puree-(ios|android)) to three OpenResty servers, each running hydra(※) (in_tail & out_forward); events go to BigQuery, where developers and data scientists analyze them by SQL and utilize them. (※) fluent-agent-hydra]
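The apps send events through cookpad/puree, which buffers logs on the device and sends them in batches, retrying on failure. The Python class below is a hypothetical sketch of that buffering-and-batching idea only, not puree's API: events accumulate in memory, a batch is sent once it is full, and undelivered events stay buffered so a later flush retries them.

```python
from typing import Callable, List


class BufferedEventSender:
    """Hypothetical client-side sender: buffer events, send them in batches,
    and keep undelivered events buffered so a later flush retries them."""

    def __init__(self, send: Callable[[List[dict]], bool], batch_size: int = 10):
        self._send = send  # returns True if the batch was delivered
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def log(self, event: dict) -> None:
        """Buffer one event; try to send once a full batch is available."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Try to send up to one batch from the front of the buffer."""
        if not self._buffer:
            return
        batch = self._buffer[: self._batch_size]  # copy of the oldest events
        if self._send(batch):
            del self._buffer[: len(batch)]
        # On failure nothing is removed, so the same events are retried later.

    def pending(self) -> int:
        """Number of events still waiting to be delivered."""
        return len(self._buffer)
```

Batching trades a little latency for far fewer HTTP requests, which matters when the backend handles more than 10,000 records per second.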
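  10. Pascal - Event based log analysis infrastructure
      • Challenges in the "only server log analysis infrastructure" era (at the time)
        • The source data needed for various KPIs and analyses was scattered
        • Logs were close to their raw form, so they were hard to use (it took know-how)
        • External analysis tools made things easy, but digging into details or combining them with multiple data sources and tools was hard
      • Let's design and aggregate logs suited to analysis from scratch, so they can be used together with analysis tools!
        • So we decided to build a new log analysis infrastructure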
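  12. Pascal - Event based log analysis infrastructure
      • Over 10,000 records/sec (not requests/sec)
      • Aggregates and forwards event-based logs per channel
        • In-app event logs (e.g., taps)
        • Message-open logs
        • A/B test logs
        • etc.
      • Aggregates and forwards logs not only from the apps but also from various subsystems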
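Aggregating events per channel amounts to routing each record into a batch keyed by its channel. A minimal Python sketch, assuming each event is a dict with a hypothetical `channel` field and a hypothetical one-table-per-channel naming rule (`events_<channel>`); the deck does not describe Pascal's actual routing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_channel(events: Iterable[dict], default: str = "unknown") -> Dict[str, List[dict]]:
    """Route each event into a per-channel batch; events without a
    'channel' field go to the `default` channel."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        batches[event.get("channel", default)].append(event)
    return dict(batches)


def destination_table(channel: str) -> str:
    """Hypothetical rule: one table per channel, with characters other than
    letters, digits and underscores replaced by underscores."""
    return "events_" + re.sub(r"[^0-9A-Za-z_]", "_", channel)


if __name__ == "__main__":
    events = [
        {"channel": "tap", "screen": "home"},
        {"channel": "ab_test", "variant": "B"},
        {"channel": "tap", "screen": "item"},
    ]
    for channel, batch in group_by_channel(events).items():
        print(destination_table(channel), len(batch))
```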
  14. Agenda
      • Mercari's data analysis infrastructure
      • Server log analysis infrastructure
      • Event based log analysis infrastructure
      • Machine learning analysis infrastructure
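  15. Machine learning analysis infrastructure
      • Tech stack
        • Python, Django, scikit-learn, TensorFlow
      • Based on data in BigQuery
        • Demographic estimation
        • Category estimation
        • Labeling
      • Provided as APIs so they can be used from many places
      • A dedicated team was formed around the end of last year, and the system is now in full operation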
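  16. Summary
      • Mercari's data analysis infrastructure
        • Server log analysis infrastructure
        • Event based log analysis infrastructure
        • Machine learning analysis infrastructure
      • Google BigQuery is the starting point for log data analysis
      • Integrated with various tools and services
      • Once an A/B testing mechanism is in place, it can be applied in many ways
      • Systems using machine learning are now in full operation too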