
A Log Platform That Stays Close to the Service / pepabo_log_infrastructure_bigfoot


A Log Platform That Stays Close to the Service: Beyond Log Collection
Hatena/Pepabo Tech Conference: Infrastructure Technology Platform @ Kyoto
http://hatena.connpass.com/event/33521/

monochromegane

July 02, 2016


Transcript

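  1. Beyond Log Collection
    Yusuke Miyake (GMO PEPABO inc.)
    Hatena/Pepabo Tech Conference: Infrastructure Technology Platform @ Kyoto
    A Log Platform That Stays Close to the Service

  2. Principal Engineer
    Yusuke Miyake @monochromegane
    minne Division
    http://blog.monochromegane.com

  3. minne
    https://minne.com

  4. Contents
    • Web services and behavior logs
    • Bigfoot
    • A log platform that stays close to the service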

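  5. Web services and behavior logs

  6. Logs are great

  7. Behavior logs

  8. Behavior logs
    Logs emitted at the application layer
    They identify who did what, and when
    They show not only the final outcome of an action but also where along the way the user gave up or hesitated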
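As a rough illustration of the "who, what, when" idea on slide 8, a behavior log record could be built as below. The field names (`uid`, `action`, `target`, `time`) and the helper `build_activity_log` are assumptions made for this sketch, not Bigfoot's actual schema.

```ruby
require 'json'
require 'time'

# Hypothetical behavior log record: who (uid), what (action and target), when (time).
def build_activity_log(uid:, action:, target:, at: Time.now)
  {
    'uid'    => uid,
    'action' => action,
    'target' => target,
    'time'   => at.utc.iso8601
  }
end

record = build_activity_log(uid: 'user-42', action: 'add_to_cart', target: 'item-1001')
puts JSON.generate(record)
```

Logging intermediate actions such as `add_to_cart`, and not only the final purchase, is what makes it possible to see where a user gave up.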

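  9. Behavior logs
    are packed with hints for improving the service

  10. Stages of behavior log utilization

  11. Stages of behavior log utilization
    Collection: behavior logs are being emitted and gathered in one place
    Analysis: the gathered behavior logs can be visualized and analyzed
    Utilization: the analyzed behavior logs drive continuous service improvement

  12. Log platform

  13. What matters in a log platform

  14. "Utilizing logs"

  15. A log "utilization" platform

  16. Bigfoot

  17. Bigfoot
    • Pepabo's next-generation log "utilization" platform
    • Provides company-wide general-purpose tooling and concrete utilization methods at each stage: collecting, analyzing, and utilizing behavior logs
    • The log platform behind minne, one of Japan's largest handmade marketplaces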

  18. Bigfoot
    [Architecture diagram labels: Service, Services, Request, Activity, log, rack-bigfoot, UID, IDFA/GAID, DB, Attribute, Big Cube, Cube, BI, Recommendation, Bandit algorithm, Re-marketing, Feedback, Name identification, Cookie Sync. Icons: https://icons8.com]

  19. The technology behind Bigfoot

  20. Collection and analysis

  21. Sending logs

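  22. rack-bigfoot
    • Rack middleware that connects Rails applications to Fluentd
    • Reads the common parameters Bigfoot needs from request and response headers
    • Service-specific parameters can also be attached
    # Register the middleware and configure it for this service.
    Rails.application.config.app_middleware.insert_after ActionDispatch::Callbacks,
                                                         Rack::Bigfoot do |config|
      config.service = 'minne'
      config.environment = Rails.env
      # Send to Fluentd only in production and staging.
      config.enable_fluent = Rails.env.production? || Rails.env.staging?
      # Skip health-check requests.
      config.ignore_path_patterns << %r(\A/healthcheck)
      # Service-specific header to record in addition to the common parameters.
      config.headers << 'HTTP_X_CLIENT_VERSION'
    end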
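To make the middleware idea on slide 22 concrete, here is a minimal sketch of a Rack middleware that records one entry per request. It is not the rack-bigfoot source: the class name `ActivityLogMiddleware`, its keyword options, the record fields, and the `sink` callable (standing in for a Fluentd forwarder) are all assumptions for illustration.

```ruby
# Hypothetical sketch of a logging Rack middleware; NOT the rack-bigfoot implementation.
class ActivityLogMiddleware
  def initialize(app, service:, headers: [], ignore_path_patterns: [], sink: ->(_record) {})
    @app = app
    @service = service
    @headers = headers
    @ignore_path_patterns = ignore_path_patterns
    @sink = sink # assumed: a callable that forwards the record, e.g. to Fluentd
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, response_headers, body = @app.call(env)
    path = env['PATH_INFO'].to_s
    unless @ignore_path_patterns.any? { |pattern| pattern.match?(path) }
      record = {
        'service'        => @service,
        'request_method' => env['REQUEST_METHOD'],
        'path_info'      => path,
        'status'         => status,
        'response_time'  => Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      }
      # Copy the configured request headers (CGI-style env keys) into the record.
      @headers.each { |key| record[key] = env[key] if env.key?(key) }
      @sink.call(record)
    end
    [status, response_headers, body]
  end
end
```

Keeping the collection logic in middleware means each service only supplies configuration, which fits the company-wide reuse the deck describes.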

  23. Storing logs

  24. TreasureData
    • Cloud-based data management service
    • https://www.treasuredata.com
    • High-capacity log storage and fast log operations through distributed processing
    [Diagram labels: log, Plasma DB, HiveQL, aggregate, export, Data Tanks, SQL]

  25. Working with logs

  26. HiveQL
    • Handle behavior logs on TreasureData in an SQL-like way
    • http://hive.apache.org
    • https://docs.treasuredata.com/articles/hive
    -- Requests logged between 10:00 and 12:00 JST on 2016-07-01
    SELECT
      TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'JST') AS timestamp,
      response_time,
      request_method,
      path_info
    FROM
      activity
    WHERE
      TD_TIME_RANGE(time, '2016-07-01 10:00:00', '2016-07-01 12:00:00', 'JST');

  27. Workflow
    • Uses TreasureData's scheduled queries
    • Developed Pendulum to manage queries as code
    • https://github.com/monochromegane/pendulum
    • Scheduled queries are written in a DSL and kept under version control
    [Diagram labels: Queries on GitHub, Pendulum, Apply, Scheduled queries]

  28. Pendulum
    Schedfile:
    schedule 'test-scheduled-job' do
      database    'db_name'
      query       'select time from access;'
      retry_limit 0
      priority    :normal
      cron        '30 0 * * *'
      timezone    'Asia/Tokyo'
      delay       0
      result_url  'td://@/db_name/table_name'
    end
    Apply:
    $ pendulum --apikey='...' -a --dry-run
    $ pendulum --apikey='...' -a
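The Schedfile above is plain Ruby evaluated as a DSL. The sketch below shows one common way such a DSL can be loaded with `instance_eval`. It is an assumption about the general technique, not Pendulum's actual code; the class names `ScheduleDefinition` and `Schedfile` are hypothetical.

```ruby
# Hypothetical mini-DSL loader; Pendulum's real implementation may differ.
class ScheduleDefinition
  ATTRIBUTES = %i[database query retry_limit priority cron timezone delay result_url].freeze

  attr_reader :name, :settings

  def initialize(name)
    @name = name
    @settings = {}
  end

  # Each DSL keyword simply records its argument under its own name.
  ATTRIBUTES.each do |attribute|
    define_method(attribute) { |value| @settings[attribute] = value }
  end
end

class Schedfile
  attr_reader :schedules

  def initialize
    @schedules = []
  end

  # `schedule 'name' do ... end` evaluates the block against a fresh definition.
  def schedule(name, &block)
    definition = ScheduleDefinition.new(name)
    definition.instance_eval(&block)
    @schedules << definition
  end

  # Evaluate Schedfile source text and return the populated object.
  def self.load(source)
    new.tap { |file| file.instance_eval(source) }
  end
end

file = Schedfile.load(<<~SCHEDFILE)
  schedule 'test-scheduled-job' do
    database 'db_name'
    cron     '30 0 * * *'
  end
SCHEDFILE
puts file.schedules.first.settings.inspect
```

Once the definitions are plain data, a tool can diff them against the schedules registered on the service, which is the kind of comparison a `--dry-run` mode needs.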

  29. Migrating to Digdag
    https://github.com/treasure-data/digdag

  30. Making logs more useful

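  31. Attribute information
    • Combining behavior logs with attribute information broadens what can be analyzed
    [Diagram labels: Master, 1,000 records each, Sidekiq workers, Attribute, Activity, Join, HiveQL, No temporary file]
    # Enqueue one upload job per batch of users, identified by its first and last id.
    # find_in_batches iterates in primary-key order with a default batch size of 1,000.
    def perform(*args)
      User.order(:id).select(:id).find_in_batches do |users|
        UserAttributesUploadJob.perform_later(users.first.id, users.last.id)
      end
    end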
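The batching on slide 31 hands each worker an id range instead of a list of records. A minimal sketch of that splitting, using a plain array of ids in place of ActiveRecord (the helper name `id_ranges` is an assumption):

```ruby
# Split ids into [first_id, last_id] ranges covering at most batch_size ids each.
def id_ranges(ids, batch_size: 1000)
  ids.sort.each_slice(batch_size).map { |batch| [batch.first, batch.last] }
end

id_ranges((1..2500).to_a).each do |first_id, last_id|
  puts "upload attributes for users #{first_id}..#{last_id}"
end
```

Passing only the two boundary ids keeps each job payload small; the worker can then load and upload its own slice.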

  32. Name identification
    • Maps service accounts to each client
    • Activity recorded before login is also linked retroactively once the identity is resolved
    • Combined with Cookie Sync, mapping across services is also possible
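The retroactive linking on slide 32 can be sketched as two passes over the events: learn which client id belongs to which account from logged-in events, then fill in the account on earlier anonymous events. The event shape (`client_id`, `account_id`) and both helper names are assumptions for illustration.

```ruby
# Build a client id => account id mapping from events that carry both ids.
def build_mapping(events)
  events.each_with_object({}) do |event, mapping|
    mapping[event['client_id']] = event['account_id'] if event['account_id']
  end
end

# Attach the resolved account id to every event, including ones logged before login.
def identify(events, client_to_account)
  events.map do |event|
    account_id = event['account_id'] || client_to_account[event['client_id']]
    event.merge('account_id' => account_id)
  end
end

events = [
  { 'client_id' => 'c1', 'action' => 'view' },                        # before login
  { 'client_id' => 'c1', 'account_id' => 'a9', 'action' => 'login' }, # identity known
  { 'client_id' => 'c2', 'action' => 'view' }                         # never logged in
]
resolved = identify(events, build_mapping(events))
puts resolved.map { |e| e['account_id'].inspect }.join(', ')
```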

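  33. Analyzing logs

  34. Big Cube and Cube
    • Big Cube: aggregates the behavior logs of all services
    • Once the analytical angle is settled, data is carved out into Cubes by measure columns and dimension columns
      • Measure: a quantifiable column
      • Dimension: a column that serves as an aggregation axis
      • Example: sales per hour, number of items per prefecture
    • Cubes are placed in a data mart so they can be queried quickly
    [Diagram labels: Activity, Big Cube, HiveQL, Cube, SQL, BI, Dashboard, ad-hoc query, Analyst, Managers, Product owners, Promotion groups]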
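The measure and dimension split on slide 34 amounts to "group by the dimension, aggregate the measure". A minimal in-memory sketch, with hypothetical column names and a hypothetical helper `build_cube` (the deck itself does this with HiveQL and SQL):

```ruby
# Aggregate a measure (summed) by a dimension (grouping key).
def build_cube(rows, dimension:, measure:)
  rows.group_by { |row| row[dimension] }
      .transform_values { |group| group.sum { |row| row[measure] } }
end

rows = [
  { hour: 10, prefecture: 'Kyoto', sales: 500 },
  { hour: 10, prefecture: 'Tokyo', sales: 300 },
  { hour: 11, prefecture: 'Kyoto', sales: 200 }
]
puts build_cube(rows, dimension: :hour, measure: :sales).inspect
```

Precomputing such small aggregates is what lets dashboards read from the data mart quickly instead of scanning the raw logs.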

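  35. Visualization and analysis
    • Tableau Desktop by Tableau is used for visualization and analysis
    • http://www.tableau.com
    • TreasureData can be selected as a data source
    • Dashboard examples
      • Gross merchandise value, cancellation amount, order amount, revenue per user
      • Cumulative members, average order value, newly registered users, DAUC (new), DAUC (existing)
      • Items ordered, order rate, price of ordered items, items available to order
      • Total stock count, unit stock value, total stock value
      • Creators accepting orders, items on sale, items in open shops, total items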

  36. Utilization

  37. Utilization
    • Form hypotheses from the analysis results and modify the system accordingly
      • Change screen designs, revise the steps of a flow
      • A/B tests
    → Static feedback

  38. Dynamic feedback

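  39. Bandit algorithms
    • Continuously updating the balance between exploration and exploitation reduces the opportunity loss of A/B tests
    • https://www.oreilly.co.jp/books/
    • For example, to improve the CTR of a feature, a fixed share of traffic gets the best-known variant (exploitation) and the remaining share tries several variants (exploration)
    [Diagram labels: Epsilon-Greedy algorithm, User, 1-ε: exploitation, ε/pattern: exploration, Click or not click, Activity, Import]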
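The epsilon-greedy rule named on slide 39 can be sketched in a few lines. This is a generic textbook version, not Bigfoot's implementation; the class name, the default `epsilon: 0.1`, and the 0/1 click reward are assumptions.

```ruby
# Minimal epsilon-greedy sketch; names and the default epsilon are assumptions.
class EpsilonGreedy
  attr_reader :counts, :values

  def initialize(arms, epsilon: 0.1, random: Random.new)
    @arms = arms
    @epsilon = epsilon
    @random = random
    @counts = Hash.new(0)   # how many times each arm was shown
    @values = Hash.new(0.0) # running mean reward (e.g. CTR) per arm
  end

  # With probability epsilon pick an arm uniformly (exploration);
  # otherwise pick the arm with the best mean reward so far (exploitation).
  def select_arm
    return @arms.sample(random: @random) if @random.rand < @epsilon

    @arms.max_by { |arm| @values[arm] }
  end

  # reward: 1 for a click, 0 for no click. Updates the running mean incrementally.
  def update(arm, reward)
    @counts[arm] += 1
    @values[arm] += (reward - @values[arm]) / @counts[arm]
  end
end

bandit = EpsilonGreedy.new(%w[pattern_a pattern_b pattern_c])
arm = bandit.select_arm
bandit.update(arm, 1) # the user clicked
```

Because the click results are fed back after every impression, the share of traffic going to the best variant keeps adjusting, unlike a fixed-split A/B test.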

  40. Recommendation
    • minne's "Creators recommended for you"
    • Creators are rated based on user behavior
    [Diagram labels: Users, Activity, fav, follow etc…, Matrix Factorization, Filter and shuffle, Recommendation, import, DB]

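  41. Recommendation: Matrix Factorization
    • Collaborative filtering: accumulate users' preference information and make inferences for a user from the information of other users with similar preferences
    • Matrix Factorization
      • Dimensionality reduction
      • Predicts ratings on sparse data where ratings are skewed per user and per item
    $R \approx P \times Q$, where $R$ is the $m \times n$ User × Item matrix, $P$ is $m \times k$, and $Q$ is $k \times n$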

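  42. Recommendation: Matrix Factorization
    Prediction (mean + biases + factor product):
    $R'_{ui} = \mu + B_u + B_i + P_u^{T} Q_i$
    Training (error + regularization term):
    $\min_{P,Q,B} \sum_{(u,i) \in R} (R_{ui} - R'_{ui})^2 + \lambda (\|B_u\|^2 + \|B_i\|^2 + \|P_u\|^2 + \|Q_i\|^2)$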
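The prediction and training formulas on slide 42 can be exercised with a small stochastic gradient descent loop. This is a plain-Ruby sketch of the general technique, not Hivemall's implementation; the class name `BiasedMF`, the hyperparameter defaults, and the random initialization are assumptions.

```ruby
# Minimal biased matrix factorization trained with SGD (illustrative only).
class BiasedMF
  def initialize(factors: 2, rate: 0.01, lambda_: 0.02, seed: 1)
    @factors = factors
    @rate = rate      # learning rate
    @lambda = lambda_ # regularization strength (lambda in the formula)
    rng = Random.new(seed)
    @mu = 0.0
    @bu = Hash.new(0.0) # user biases B_u
    @bi = Hash.new(0.0) # item biases B_i
    @pu = Hash.new { |h, k| h[k] = Array.new(factors) { rng.rand * 0.1 } } # P_u
    @qi = Hash.new { |h, k| h[k] = Array.new(factors) { rng.rand * 0.1 } } # Q_i
  end

  # R'_ui = mu + B_u + B_i + P_u^T Q_i
  def predict(user, item)
    dot = @pu[user].zip(@qi[item]).sum { |pf, qf| pf * qf }
    @mu + @bu[user] + @bi[item] + dot
  end

  # ratings: array of [user, item, rating] triples.
  def fit(ratings, iterations: 50)
    @mu = ratings.sum { |_, _, rating| rating }.fdiv(ratings.size)
    iterations.times do
      ratings.each do |user, item, rating|
        error = rating - predict(user, item)
        # Gradient steps on the squared error plus the L2 regularization term.
        @bu[user] += @rate * (error - @lambda * @bu[user])
        @bi[item] += @rate * (error - @lambda * @bi[item])
        @factors.times do |f|
          pf = @pu[user][f]
          qf = @qi[item][f]
          @pu[user][f] += @rate * (error * qf - @lambda * pf)
          @qi[item][f] += @rate * (error * pf - @lambda * qf)
        end
      end
    end
    self
  end
end

ratings = [['u1', 'i1', 5], ['u1', 'i2', 3], ['u2', 'i1', 4], ['u2', 'i2', 1]]
model = BiasedMF.new(rate: 0.05).fit(ratings, iterations: 200)
puts model.predict('u1', 'i1')
```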

  43. Hivemall

  44. Recommendation: Matrix Factorization
    -- Train with Hivemall's train_mf_sgd, then average the emitted factors and biases per index.
    SELECT
      idx,
      array_avg(u_rank) AS Pu,
      array_avg(i_rank) AS Qi,
      avg(u_bias) AS Bu,
      avg(i_bias) AS Bi,
      min(mu) AS mu
    FROM (
      SELECT train_mf_sgd(account_id, creator_id, rating,
                          '-factor 20 -iter 50 -update_mu')
             AS (idx, u_rank, i_rank, u_bias, i_bias, mu)
      FROM training
    ) t
    GROUP BY idx;

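  45. And more
    • Extract items affected by so-called browse abandonment and cart abandonment from behavior logs
      • Send win-back notifications under specific conditions
    • Serve highly relevant ads based on behavior logs
      • Re-marketing
      • Segment ad audiences (narrowing, exclusion)
    [Diagram labels: browse, cart abandonment; ad integration]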
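Cart abandonment extraction, as mentioned on slide 45, can be sketched as "items added to the cart but never purchased". The event shape (`uid`, `action`, `item`), the action names, and the helper `abandoned_cart_items` are assumptions for illustration; a production version would presumably also apply a time window.

```ruby
# Hypothetical event shape: { 'uid' => ..., 'action' => 'add_to_cart' | 'purchase', 'item' => ... }.
# Returns, per user, the items added to the cart but not purchased in the given events.
def abandoned_cart_items(events)
  carted = Hash.new { |hash, uid| hash[uid] = [] }
  purchased = Hash.new { |hash, uid| hash[uid] = [] }
  events.each do |event|
    case event['action']
    when 'add_to_cart' then carted[event['uid']] << event['item']
    when 'purchase'    then purchased[event['uid']] << event['item']
    end
  end
  carted.each_with_object({}) do |(uid, items), result|
    left = items.uniq - purchased[uid]
    result[uid] = left unless left.empty?
  end
end

events = [
  { 'uid' => 'u1', 'action' => 'add_to_cart', 'item' => 'i1' },
  { 'uid' => 'u1', 'action' => 'add_to_cart', 'item' => 'i2' },
  { 'uid' => 'u1', 'action' => 'purchase',    'item' => 'i1' }
]
puts abandoned_cart_items(events).inspect
```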

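  46. A log platform that stays close to the service

  47. A log platform that stays close to the service
    • Go beyond merely collecting logs and support the analysis and utilization stages
    • From static feedback to dynamic feedback
    • Toward a seamless world through the circulation of behavior logs

  48. Logs are great

  49. The end

  50. Why not come work at Pepabo?
    Check the latest hiring info → [email protected]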