メルカリのデータ分析基盤 / mercari data analysis infrastructure

Tatsuhiko Kubo

April 27, 2017

Transcript

  1. Tatsuhiko Kubo @cubicdaiya
    Data Analysis Infrastructure Night #2 2017/04/26
    Mercari's data analysis infrastructure /
    mercari log analysis

  2. @cubicdaiya / Tatsuhiko Kubo
    Principal Engineer, SRE @ Mercari, Inc.

  3. Agenda
    • Introduction to Mercari's data analysis infrastructure
    • Server log analysis infrastructure
    • Event based log analysis infrastructure
    • Machine learning analysis infrastructure

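  4. Division of roles in data analysis at Mercari
    • BI Analyst (Data Scientist)
    • Designing the log data structures needed for analysis
    • Planning and proposals based on data analysis; KPI design, reporting, and visualization
    • Software Engineer (Backend System, Site Reliability)
    • Development, operation, and support of the data analysis infrastructure
    • Software Engineer (Machine Learning / NLP / AI)
    • Bridging specialized technologies (mainly machine learning) and the service
    • Also does research and development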

  5. Agenda
    • Mercari's data analysis infrastructure
    • Server log analysis infrastructure
    • Event based log analysis infrastructure
    • Machine learning analysis infrastructure

  6. Server log analysis infrastructure
    [Diagram: logs from the app servers (access_log, application_log, app_error_log, error_log, php_log, ...) are filtered and imported into BigQuery in batches, and are also fed into stream processing whose results go to Mackerel and Slack.]

  7. Server log analysis infrastructure
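    • Logs from each server are collected and forwarded with Fluentd
    • They are fed into various services and middleware depending on the use case
    • BigQuery: all logs for analysis are collected here
    • Norikra: stream processing with SQL
    • etc. (e.g., Kibana, KPI reporting)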
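
Fluentd routes each record by its tag: the record goes to the first `<match>` pattern that fits, where `*` matches exactly one dot-separated tag part and `**` matches zero or more parts (sending one record to several outputs, such as BigQuery and Norikra, is the job of the `copy` output plugin). The Python sketch below mimics only that matching rule; the tags and destination names are hypothetical and are not Mercari's actual configuration.

```python
from typing import List, Tuple

# Hypothetical routing table in priority order, like <match> blocks in a Fluentd config.
# A list of destinations stands in for a copy output with several <store> sections.
ROUTES: List[Tuple[str, List[str]]] = [
    ("app.access_log", ["bigquery", "norikra"]),
    ("app.**", ["bigquery"]),
]


def tag_matches(pattern: str, tag: str) -> bool:
    """Fluentd-style tag match: '*' = exactly one tag part, '**' = zero or more parts."""
    p_parts = pattern.split(".")
    t_parts = tag.split(".")

    def match(pi: int, ti: int) -> bool:
        if pi == len(p_parts):
            return ti == len(t_parts)
        part = p_parts[pi]
        if part == "**":
            # Let '**' swallow 0..n of the remaining tag parts.
            return any(match(pi + 1, k) for k in range(ti, len(t_parts) + 1))
        if ti == len(t_parts):
            return False
        if part == "*" or part == t_parts[ti]:
            return match(pi + 1, ti + 1)
        return False

    return match(0, 0)


def route(tag: str) -> List[str]:
    """Return the destinations of the first matching pattern (first match wins)."""
    for pattern, destinations in ROUTES:
        if tag_matches(pattern, tag):
            return destinations
    return []


print(route("app.access_log"))  # ['bigquery', 'norikra']
print(route("app.php_log"))     # ['bigquery']
print(route("db.slow_log"))     # []
```

Ordering matters in this scheme: the specific `app.access_log` rule must come before the catch-all `app.**`, or access logs would never reach the stream processor.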

  8. • Log files are imported in batches
    • over 1TB / day
    • The log files themselves are backed up to GCS and S3
    • Google Cloud SDK & AWS CLI & Embulk
    Google BigQuery
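
To put "over 1TB / day" in perspective, spreading the volume evenly over a day gives the average rate the batch imports must sustain. This sketch assumes decimal units (1 TB = 10^12 bytes), since the slide does not say, and a purely hypothetical hourly import schedule for the per-batch figure; real traffic is not uniform, so peak rates are higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
TB = 10**12  # decimal terabyte (assumption: the slide does not specify TB vs TiB)
GB = 10**9
MB = 10**6


def average_bytes_per_second(bytes_per_day: float) -> float:
    """Average rate if the daily volume were spread evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


rate = average_bytes_per_second(1 * TB)
hourly_batch = 1 * TB / 24  # hypothetical: one import batch per hour

print(f"{rate / MB:.2f} MB/s")                             # 11.57 MB/s
print(f"{hourly_batch / GB:.2f} GB per hourly batch")      # 41.67 GB per hourly batch
```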

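  9. • The starting point for data analysis
    • Developers run queries for investigations
    • Data source for dashboards and various spreadsheets
    • Also used by other internal services that make use of log data
    • We use the flat-rate plan
    • No need to worry about query charges
    • However, slots are finite, so getting carried away and running too many heavy queries causes large delays
    • Creating intermediate tables reduces the amount of processing
    • Slot usage can be checked in Stackdriver (threshold alerts are also available)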
    Google BigQuery
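
A minimal sketch of the intermediate-table idea, using Python's built-in `sqlite3` as a stand-in for BigQuery (an assumption made only so the example runs anywhere; the table and column names are hypothetical). The raw log is aggregated once into a small per-day, per-URI summary, and dashboard-style queries then read that summary instead of rescanning the raw log, which on a flat-rate plan roughly translates into less slot time per query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw access log (in BigQuery this would be the large imported table).
conn.executescript("""
CREATE TABLE access_log (ts TEXT, uri TEXT, status INTEGER);
INSERT INTO access_log VALUES
  ('2017-04-26 10:00:01', '/items', 200),
  ('2017-04-26 10:05:12', '/items', 500),
  ('2017-04-26 11:00:00', '/search', 200),
  ('2017-04-27 09:30:00', '/items', 200);
""")

# Build the intermediate table once: one row per day and URI.
conn.executescript("""
CREATE TABLE daily_uri_summary AS
SELECT substr(ts, 1, 10) AS day,
       uri,
       COUNT(*) AS requests,
       SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS errors
FROM access_log
GROUP BY day, uri;
""")

# Later queries read the small summary instead of the raw log.
rows = conn.execute(
    "SELECT day, SUM(requests), SUM(errors) FROM daily_uri_summary "
    "GROUP BY day ORDER BY day"
).fetchall()
print(rows)  # [('2017-04-26', 3, 1), ('2017-04-27', 1, 0)]
```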

  10. • We use Chartio
    • https://chartio.com/
    • A cloud-based BI service
    • Creates dashboards from a variety of data sources
    Dashboard

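  11. • Google Sheets
    • Download data from BigQuery, then aggregate and graph it
    • Many people are used to Excel-style spreadsheets, which makes sharing with non-engineers easy
    • Google Apps Script
    • Periodically auto-runs registered queries to generate home-made custom dashboards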
    • Posts the generated graph images to Slack
    Google Sheets & Apps Script

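  12. • SQL-based aggregation over windows of a few minutes
    • Graphing API requests per minute
    • Sending nginx's cache hit rate to Mackerel every minute and graphing it
    • Notifying Slack when a specific error occurs at least N times within M minutes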
    • etc…
    • Powered by fluentd-plugin-(mackerel|slack|norikra)
    Norikra
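
The "notify Slack when a specific error occurs N times within M minutes" rule is a sliding-window count. Norikra expresses it as an SQL query; to show the underlying idea, here is a plain-Python sliding-window counter (a swapped-in illustration, not Norikra's query language), assuming timestamps in seconds that arrive in non-decreasing order. The class and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque


class ErrorBurstAlert:
    """Fire when at least `threshold` matching errors arrive within `window_sec` seconds."""

    def __init__(self, threshold: int, window_sec: float) -> None:
        self.threshold = threshold
        self.window_sec = window_sec
        self._times: Deque[float] = deque()

    def observe(self, ts: float) -> bool:
        """Record one error at time `ts`; return True if the alert condition holds."""
        self._times.append(ts)
        # Keep only errors inside the window (ts - window_sec, ts].
        while self._times and self._times[0] <= ts - self.window_sec:
            self._times.popleft()
        return len(self._times) >= self.threshold


alert = ErrorBurstAlert(threshold=3, window_sec=60)
fired = [alert.observe(t) for t in [0, 10, 20, 100, 110, 115]]
print(fired)  # [False, False, True, False, False, True]
```

As written, the check stays true for every further error while the burst continues; a real notifier would typically add a cooldown so Slack is not flooded.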

  13. Visualize and Alerting by log analysis
    [Diagram: logs are filtered, aggregated, and summarized by SQL, then visualized in Mackerel and used for alerting via Slack.]

  14. Stream processing by SQL
    SELECT
      COUNT(upstream_cache_status = "HIT") / COUNT(*) AS rate_hit,
      COUNT(upstream_cache_status = "MISS") / COUNT(*) AS rate_miss,
      COUNT(upstream_cache_status = "EXPIRED") / COUNT(*) AS rate_expired
    FROM mercari_some_log.win:time_batch(1 min)
    WHERE uri = "/some/api"
    Cache hit rate of one of our nginx servers, per minute
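
For readers less familiar with Norikra's `win:time_batch`, this Python sketch computes the same per-window ratios as the query above over a list of already-parsed log records. It is an illustration only: the record layout and the `/items` URI are hypothetical, and the windows are aligned to wall-clock minutes for simplicity, which may not match how Norikra's `time_batch` window positions its batches.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# (unix timestamp in seconds, uri, upstream_cache_status): hypothetical record layout
Record = Tuple[float, str, str]


def cache_hit_rates(
    records: Iterable[Record], uri: str, window_sec: int = 60
) -> Dict[int, Dict[str, float]]:
    """Share of HIT / MISS / EXPIRED per tumbling window, for one URI."""
    counts: DefaultDict[int, Counter] = defaultdict(Counter)
    for ts, rec_uri, status in records:
        if rec_uri != uri:  # WHERE uri = ...
            continue
        window_start = int(ts // window_sec) * window_sec  # time_batch(1 min) bucket
        counts[window_start][status] += 1

    rates: Dict[int, Dict[str, float]] = {}
    for start, c in sorted(counts.items()):
        total = sum(c.values())  # COUNT(*)
        rates[start] = {
            "rate_hit": c["HIT"] / total,
            "rate_miss": c["MISS"] / total,
            "rate_expired": c["EXPIRED"] / total,
        }
    return rates


records = [
    (0, "/items", "HIT"),
    (10, "/items", "HIT"),
    (20, "/items", "MISS"),
    (30, "/search", "HIT"),  # filtered out by the uri condition
    (70, "/items", "EXPIRED"),
    (80, "/items", "HIT"),
]
rates = cache_hit_rates(records, "/items")
print(rates)
```

Each resulting per-minute value is the kind of number that slide 12 describes sending to Mackerel for graphing.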

  15. Agenda
    • Mercari's data analysis infrastructure
    • Server log analysis infrastructure
    • Event based log analysis infrastructure
    • Machine learning analysis infrastructure

  16. [Diagram: apps powered by cookpad/puree-(ios|android) send events to OpenResty servers; on each server, hydra (fluent-agent-hydra) tails the logs and forwards them (in_tail & out_forward) to BigQuery, where developers and data scientists analyze them by SQL and utilize the events.]
    Pascal - Event based log analysis infrastructure

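  17. • Problems back when we had only the server log analysis infrastructure
    • The source data needed for various KPIs and analyses was scattered
    • Logs were close to their raw form, so they were hard to use (it took know-how)
    • External analysis tools make simple things easy, but make it hard to dig into details or to combine multiple data sources and tools
    • Let's design and aggregate analysis-friendly logs from scratch so they can be used together with analysis tools!
    • So we decided to build a new log analysis infrastructure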
    Pascal - Event based log analysis infrastructure

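  19. • over 10,000 records / sec (not requests / sec)
    • Aggregates and forwards event-based logs per channel
    • In-app event logs (e.g., taps)
    • Open-tracking logs
    • A/B test logs
    • etc…
    • Aggregates and forwards logs not only from the apps but also from various subsystems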
    Pascal - Event based log analysis infrastructure
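
At over 10,000 records per second, the pipeline handles more than 864 million records a day (10,000 × 86,400). The "records, not requests" distinction suggests that several events travel in one request; puree, for instance, buffers logs on the device and sends them in batches. The sketch below illustrates per-channel aggregation in that spirit: events are grouped by channel and each full batch is handed to a forwarding callback. The class, channel names, and batch size are hypothetical, not Pascal's actual implementation.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]

RECORDS_PER_SEC = 10_000
print(RECORDS_PER_SEC * 24 * 60 * 60)  # 864000000 records per day


class ChannelBuffer:
    """Group events by channel and hand each full batch to `flush`."""

    def __init__(self, batch_size: int, flush: Callable[[str, List[Event]], None]) -> None:
        self.batch_size = batch_size
        self.flush = flush
        self._buffers: DefaultDict[str, List[Event]] = defaultdict(list)

    def add(self, channel: str, event: Event) -> None:
        buf = self._buffers[channel]
        buf.append(event)
        if len(buf) >= self.batch_size:
            self.flush(channel, buf[:])  # pass a copy, then reset the buffer
            buf.clear()

    def flush_all(self) -> None:
        """Forward any partially filled batches (e.g., on shutdown or a timer)."""
        for channel, buf in self._buffers.items():
            if buf:
                self.flush(channel, buf[:])
                buf.clear()


sent = []
buffer = ChannelBuffer(batch_size=2, flush=lambda ch, batch: sent.append((ch, len(batch))))
for i in range(3):
    buffer.add("tap", {"seq": i})
buffer.add("ab_test", {"variant": "A"})
buffer.flush_all()
print(sent)  # [('tap', 2), ('tap', 1), ('ab_test', 1)]
```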

  21. A/B Testing new features
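    • Flexibility ensured by building A/B testing as a framework
    • Dozens of A/B tests run at the same time
    • It can also be used for purposes other than A/B testing
    • Staged releases (10% -> 50% -> 100%)
    • Turning features themselves on and off
    • Results are analyzed in Google BigQuery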
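
A common way to get stable A/B assignments and staged releases (10% -> 50% -> 100%) is to hash each user ID into a fixed bucket, so the same user always sees the same variant and a user included at 10% stays included at 50%. The slide does not say how Mercari's framework assigns users; this Python sketch shows the hashing approach under that assumption, with hypothetical function and experiment names. A percentage of 0 or 100 doubles as a plain feature on/off switch.

```python
import hashlib
from typing import Sequence


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for one experiment."""
    # Salting with the experiment name decorrelates assignments across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls in the first `percent` buckets (0 = off, 100 = on)."""
    return bucket(user_id, feature) < percent


def variant(user_id: str, experiment: str, arms: Sequence[str] = ("A", "B")) -> str:
    """Pick an arm of an A/B test deterministically."""
    return arms[bucket(user_id, experiment, len(arms))]


users = [f"user{i}" for i in range(10_000)]
at_10 = {u for u in users if in_rollout(u, "new_checkout", 10)}
at_50 = {u for u in users if in_rollout(u, "new_checkout", 50)}
print(len(at_10), len(at_50), at_10 <= at_50)
```

Because raising the percentage only widens the range of accepted buckets, each rollout step is a superset of the previous one.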

  22. Agenda
    • Mercari's data analysis infrastructure
    • Server log analysis infrastructure
    • Event based log analysis infrastructure
    • Machine learning analysis infrastructure

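  23. • Technology stack
    • Python, Django, scikit-learn, TensorFlow
    • Using data in BigQuery as input:
    • Demographic estimation
    • Category estimation
    • Labeling
    • Provided as an API so it can be used from many places
    • A dedicated team was formed around the end of last year, and it is now in full operation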
    Machine learning analysis infrastructure

  24. Summary
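    • Mercari's data analysis infrastructure
    • Server log analysis infrastructure
    • Event based log analysis infrastructure
    • Machine learning analysis infrastructure
    • Google BigQuery is the starting point for log data analysis
    • Integrates with a variety of tools and services
    • Once A/B testing is built into the system, it can be applied in many ways
    • Systems using machine learning are also now in full-scale operation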

  25. We are hiring!
    https://www.mercari.com/jp/jobs/
