A Log Infrastructure That Stays Close to the Service/pepabo_log_infrastructure_bigfoot

A Log Infrastructure That Stays Close to the Service: Beyond Log Collection
Hatena-Pepabo Tech Conference: Infrastructure Technology Platforms @ Kyoto
http://hatena.connpass.com/event/33521/

monochromegane

July 02, 2016

Transcript

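  1. Beyond Log Collection: A Log Infrastructure That Stays Close to the Service. Yusuke Miyake, GMO PEPABO inc. Hatena-Pepabo Tech Conference: Infrastructure Technology Platforms @ Kyoto

  2. Principal Engineer Yusuke Miyake (@monochromegane), minne division. http://blog.monochromegane.com

  3. minne https://minne.com

  4. Contents
    - Web services and activity logs
    - Bigfoot
    - A log infrastructure that stays close to the service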

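  5. Web services and activity logs

  6. Logs are great

  7. Activity logs

  8. Activity logs
    - Logs emitted at the application layer
    - They identify who did what, and when
    - They show not only the final outcome of an action but also where the user gave up along the way and how they hesitated

  9. Activity logs are packed with hints for improving the service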

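  10. Stages of activity log utilization

  11. Stages of activity log utilization
    - Collection: activity logs are emitted and consolidated
    - Analysis: the consolidated activity logs can be visualized and analyzed
    - Utilization: the service is continuously improved based on the analyzed activity logs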

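  12. Log infrastructure

  13. What matters in a log infrastructure

  14. "Putting logs to use"

  15. A log "utilization" infrastructure

  16. Bigfoot

  17. Bigfoot
    - Pepabo's next-generation log "utilization" infrastructure
    - At each stage of collecting, analyzing, and utilizing activity logs, it offers generality for company-wide use together with concrete ways to put the logs to use
    - The log infrastructure behind minne, one of Japan's largest handmade marketplaces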

  18. Bigfoot [Architecture diagram. Labels: IDFA/GAID, UID, rack-bigfoot, Service, Request, Activity log, Services DB, Attribute, Big Cube, Cube, BI, Recommendation, Bandit algorithm, Re-marketing, Feedback, Name identification, Cookie Sync. Icons: https://icons8.com]

  19. Technologies behind Bigfoot

  20. Collection and analysis

  21. Sending logs

  22. rack-bigfoot
    - Rack middleware that connects a Rails application to Fluentd
    - Obtains the common parameters Bigfoot needs from the request and response headers
    - Service-specific parameters can also be attached

        Rails.application.config.app_middleware.insert_after ActionDispatch::Callbacks, Rack::Bigfoot do |config|
          config.service = 'minne'
          config.environment = Rails.env
          config.enable_fluent = Rails.env.production? || Rails.env.staging?
          config.ignore_path_patterns << %r(\A/healthcheck)
          config.headers << 'HTTP_X_CLIENT_VERSION'
        end
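The slide describes a Rack middleware that picks up common parameters from each request and forwards them as an activity log. As a rough illustration of that idea only, here is a minimal sketch in plain Ruby. The class name `ActivityLogMiddleware`, its keyword arguments, and the JSON-lines sink are all assumptions made for this example; this is not the rack-bigfoot implementation, and the real middleware sends records to Fluentd rather than to an IO object.

```ruby
require 'json'
require 'time'

# Hypothetical stand-in for a Rack middleware that writes one activity log
# record per request. It is NOT the real Rack::Bigfoot.
class ActivityLogMiddleware
  def initialize(app, service:, headers: [], ignore_path_patterns: [], sink: $stdout)
    @app = app
    @service = service
    @headers = headers                            # Rack env keys to copy, e.g. HTTP_X_CLIENT_VERSION
    @ignore_path_patterns = ignore_path_patterns  # requests matching these are not logged
    @sink = sink                                  # any object responding to #puts
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, response_headers, body = @app.call(env)
    path = env['PATH_INFO'].to_s

    unless @ignore_path_patterns.any? { |pattern| pattern.match?(path) }
      record = {
        'service' => @service,
        'time' => Time.now.utc.iso8601,
        'request_method' => env['REQUEST_METHOD'],
        'path_info' => path,
        'status' => status,
        'response_time' => Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      }
      # Attach the configured request headers when they are present.
      @headers.each { |key| record[key.downcase] = env[key] if env.key?(key) }
      @sink.puts(JSON.generate(record))
    end

    [status, response_headers, body]
  end
end
```

Because a Rack middleware is just an object with `call(env)`, the sketch can be exercised without Rails by wrapping a lambda that returns a `[status, headers, body]` triple.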
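  23. Storing logs

  24. TreasureData
    - Cloud-based data management service
    - https://www.treasuredata.com
    - Storage for large volumes of logs, and fast log operations through distributed processing
    [Diagram labels: log, Plasma DB, HiveQL, export, SQL, aggregate, Data Tanks]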
  25. Handling logs

  26. HiveQL
    - Work with the activity logs on TreasureData in an SQL-like way
    - http://hive.apache.org
    - https://docs.treasuredata.com/articles/hive

        SELECT
          TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'JST') AS timestamp,
          response_time,
          request_method,
          path_info
        FROM activity
        WHERE TD_TIME_RANGE(time, '2016-07-01 10:00:00', '2016-07-01 12:00:00', 'JST');
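  27. Workflow
    - Uses TreasureData's scheduled queries
    - Developed Pendulum to keep the queries under version control
    - https://github.com/monochromegane/pendulum
    - Scheduled queries are written in a DSL and managed as code
    [Diagram labels: Scheduled queries, Queries on GitHub, Apply, Pendulum]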
  28. Pendulum

    Schedfile:

        schedule 'test-scheduled-job' do
          database 'db_name'
          query 'select time from access;'
          retry_limit 0
          priority :normal
          cron '30 0 * * *'
          timezone 'Asia/Tokyo'
          delay 0
          result_url 'td://@/db_name/table_name'
        end

    Apply:

        $ pendulum --apikey='...' -a --dry-run
        $ pendulum --apikey='...' -a
  29. Migrating to Digdag https://github.com/treasuredata/digdag

  30. Making logs more useful

  31. Attribute information
    - Combining activity logs with attribute information widens the range of possible analyses
    [Diagram labels: Attribute, Master, 1,000 records each, Sidekiq workers, Activity, Join, HiveQL, No temporary file]

        def perform(*args)
          User.order(:id).select(:id).find_in_batches do |users|
            UserAttributesUploadJob.perform_later(users.first.id, users.last.id)
          end
        end
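The job on this slide hands each worker the first and last id of a batch ("1,000 records each" in the diagram, which matches the default batch size of ActiveRecord's `find_in_batches`). The same splitting can be sketched without a database. The helper name `id_ranges` is hypothetical and exists only for this illustration.

```ruby
# Split a sorted list of ids into [first_id, last_id] pairs, one pair per batch.
# Each pair is what a single upload job would receive in the slide's code.
def id_ranges(ids, batch_size: 1000)
  ids.each_slice(batch_size).map { |slice| [slice.first, slice.last] }
end

# Ids do not have to be contiguous; each pair only bounds the batch.
ranges = id_ranges((1..2500).to_a)
ranges.each { |first_id, last_id| puts "upload attributes for ids #{first_id}..#{last_id}" }
```

Passing only the boundary ids keeps the job arguments small; each worker can then load its own rows.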
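  32. Name identification
    - Maps service accounts to each of their clients
    - Activity recorded before login is also linked retroactively once identification has taken place
    - Combined with Cookie Sync, mapping across services is also possible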
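The retroactive linking described on this slide can be illustrated with a small sketch. It assumes each log entry carries a client identifier (for example a cookie UID or IDFA/GAID) and, once the user has logged in, an account id. The field names `client_id` and `account_id` and the function `identify` are assumptions for this example, not Bigfoot's actual schema or code.

```ruby
# Link client identifiers to accounts, then backfill entries that were
# recorded before the user logged in on that client.
def identify(logs)
  client_to_account = {}
  logs.each do |log|
    # The first account seen on a client wins in this simplified sketch.
    client_to_account[log[:client_id]] ||= log[:account_id] if log[:account_id]
  end
  logs.map do |log|
    log.merge(account_id: log[:account_id] || client_to_account[log[:client_id]])
  end
end

logs = [
  { client_id: 'c1', account_id: nil, path: '/items/1' }, # before login
  { client_id: 'c1', account_id: 42,  path: '/login' },
  { client_id: 'c2', account_id: nil, path: '/' }         # never logged in
]
identify(logs).each { |log| p log }
```

A production version would also have to decide what to do when several accounts share one client; the sketch simply keeps the first one.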

  33. Analyzing logs

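  34. BigCube and Cube
    - BigCube aggregates the activity logs of all services
    - Once the angle of analysis is settled, the data is cut out into a Cube by measure columns and dimension columns
    - Measure: a column that can be quantified
    - Dimension: a column that serves as the axis of aggregation
    - Example: sales per hour, number of items per prefecture
    - Cubes are placed in a data mart so that they can be queried quickly
    [Diagram labels: Activity, Big Cube, Cube, HiveQL, SQL, BI, Dashboard, ad-hoc query, Analyst, Managers, Product owners, Promotion groups]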
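The relationship between measure and dimension columns can be shown with a short sketch: rows are grouped by the dimension columns and the measure columns are summed within each group. The function `build_cube` and the sample column names are hypothetical; in the talk this step runs as HiveQL on TreasureData, not in Ruby.

```ruby
# Cut a cube out of raw rows: group by the dimension columns and
# sum the measure columns within each group.
def build_cube(rows, dimensions:, measures:)
  rows.group_by { |row| row.values_at(*dimensions) }.map do |key, group|
    cell = dimensions.zip(key).to_h
    measures.each { |measure| cell[measure] = group.sum { |row| row[measure] } }
    cell
  end
end

rows = [
  { hour: 10, prefecture: 'Kyoto', sales: 1200, items: 1 },
  { hour: 10, prefecture: 'Kyoto', sales: 800,  items: 2 },
  { hour: 11, prefecture: 'Tokyo', sales: 500,  items: 1 }
]

# Sales per hour, and number of items per prefecture.
p build_cube(rows, dimensions: [:hour], measures: [:sales])
p build_cube(rows, dimensions: [:prefecture], measures: [:items])
```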
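  35. Visualization and analysis
    - Tableau Desktop by Tableau is used for visualization and analysis
    - http://www.tableau.com
    - TreasureData can be selected as a data source
    - Dashboard examples
      - Gross merchandise value, cancellation amount, order amount, spend per user
      - Cumulative members, average order value, newly registered users, DAUC (new), DAUC (existing)
      - Items ordered, order rate, price of ordered items, items available for order
      - Total stock count, unit price of stock, total stock value
      - Creators accepting orders, items on sale, items in open shops, total items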
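  36. Utilization

  37. Utilization
    - Form hypotheses from the analysis results and modify the system accordingly
    - Changes to screen design, revisions to the steps in a flow
    - A/B testing
    → Static feedback

  38. Dynamic feedback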

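  39. Bandit algorithm
    - Continuously updating the ratio of exploration to exploitation reduces the opportunity loss of A/B testing
    - https://www.oreilly.co.jp/books/
    - For example, to improve the CTR of a feature, serve the best-known method for most requests (exploitation) and try several other methods for the rest (exploration)
    [Diagram labels: Activity, Epsilon-Greedy algorithm, User, 1-ε: exploitation, ε/pattern: exploration, Click or not click, Import]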
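The diagram names the Epsilon-Greedy algorithm: with probability 1-ε the pattern with the best observed click rate is served, and with probability ε a pattern is picked uniformly at random, so each pattern receives ε divided by the number of patterns. The following is a minimal sketch of that rule. The class name, the ε value of 0.1, and the simulated click rates are assumptions for illustration and are not taken from the talk.

```ruby
# Epsilon-greedy selection over a fixed set of patterns (arms).
class EpsilonGreedy
  attr_reader :counts, :values

  def initialize(arms, epsilon:, rng: Random.new)
    @arms = arms
    @epsilon = epsilon
    @rng = rng
    @counts = Hash.new(0)    # number of times each arm was served
    @values = Hash.new(0.0)  # running mean reward (observed CTR) per arm
  end

  # Explore uniformly with probability epsilon, otherwise exploit the best arm so far.
  def select_arm
    if @rng.rand < @epsilon
      @arms[@rng.rand(@arms.size)]
    else
      @arms.max_by { |arm| @values[arm] }
    end
  end

  # reward is 1 for a click and 0 for no click.
  def update(arm, reward)
    @counts[arm] += 1
    @values[arm] += (reward - @values[arm]) / @counts[arm]
  end
end

# Simulated feedback loop with made-up click rates.
true_ctr = { 'pattern_a' => 0.05, 'pattern_b' => 0.20 }
bandit = EpsilonGreedy.new(true_ctr.keys, epsilon: 0.1, rng: Random.new(42))
clicks = Random.new(7)
5000.times do
  arm = bandit.select_arm
  bandit.update(arm, clicks.rand < true_ctr[arm] ? 1 : 0)
end
p bandit.counts
```

Unlike a fixed-split A/B test, the share of traffic sent to a weaker pattern shrinks as evidence accumulates, which is the reduction in opportunity loss the slide refers to.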
  40. Recommendation
    - minne: "Recommended creators for you"
    - Creators are rated based on user activity
    [Diagram labels: Activity, Filter and shuffle, Users, fav, follow etc…, Matrix Factorization, Recommendation, import, DB]
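  41. Recommendation: Matrix Factorization
    - Collaborative filtering: accumulate users' preference information and make inferences for a user from the information of other users with similar preferences
    - Matrix Factorization
      - Dimensionality reduction
      - Predicts ratings on sparse data in which ratings are biased per user and per item
    [Figure: the User × Item rating matrix R is approximated by the product of two factor matrices, $R \approx P \times Q$; dimension labels m, n, k]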
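  42. Recommendation: Matrix Factorization

    Prediction (mean $\mu$, biases $B_u$ and $B_i$):

        $R'_{ui} = \mu + B_u + B_i + P_u^{\top} Q_i$

    Training (squared error plus regularization term):

        $\min_{P,Q,B} \sum_{(u,i) \in R} (R_{ui} - R'_{ui})^2 + \lambda \left( \|B_u\|^2 + \|B_i\|^2 + \|P_u\|^2 + \|Q_i\|^2 \right)$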
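To make the prediction and training formulas concrete, here is a small sketch that fits the same biased model with plain stochastic gradient descent on toy data. The class `BiasedMF`, the learning rate `eta`, the toy ratings, and all hyperparameter values are assumptions for this illustration; the talk itself trains the model with Hivemall, not with this code.

```ruby
# Biased matrix factorization: R'_ui = mu + B_u + B_i + P_u . Q_i,
# fitted by SGD on the regularized squared error.
class BiasedMF
  def initialize(ratings, factors: 2, eta: 0.05, lambda_: 0.02, rng: Random.new(1))
    @ratings = ratings # array of [user, item, rating]
    @eta = eta
    @lambda = lambda_
    @mu = ratings.sum { |_, _, r| r }.to_f / ratings.size
    init = -> { Array.new(factors) { (rng.rand - 0.5) * 0.1 } }
    @p = Hash.new { |h, k| h[k] = init.call } # user factors P_u
    @q = Hash.new { |h, k| h[k] = init.call } # item factors Q_i
    @bu = Hash.new(0.0)                       # user biases B_u
    @bi = Hash.new(0.0)                       # item biases B_i
  end

  def predict(user, item)
    @mu + @bu[user] + @bi[item] + @p[user].zip(@q[item]).sum { |a, b| a * b }
  end

  def train(epochs)
    epochs.times do
      @ratings.each do |u, i, r|
        e = r - predict(u, i) # prediction error for this rating
        @bu[u] += @eta * (e - @lambda * @bu[u])
        @bi[i] += @eta * (e - @lambda * @bi[i])
        pu = @p[u].dup
        qi = @q[i].dup
        @p[u] = pu.zip(qi).map { |a, b| a + @eta * (e * b - @lambda * a) }
        @q[i] = qi.zip(pu).map { |b, a| b + @eta * (e * a - @lambda * b) }
      end
    end
    self
  end

  def squared_error
    @ratings.sum { |u, i, r| (r - predict(u, i))**2 }
  end
end

ratings = [
  ['u1', 'a', 5], ['u1', 'b', 4], ['u2', 'a', 4], ['u2', 'c', 1],
  ['u3', 'b', 5], ['u3', 'c', 2], ['u4', 'a', 5], ['u4', 'c', 1]
]
model = BiasedMF.new(ratings)
puts "squared error before training: #{model.squared_error}"
model.train(200)
puts "squared error after training:  #{model.squared_error}"
puts "predicted rating of item b for u4: #{model.predict('u4', 'b')}"
```

Each update moves a parameter along the negative gradient of the squared error for one observed rating, with the λ term pulling it back toward zero.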
  43. Hivemall

  44. Recommendation: Matrix Factorization

        SELECT
          idx,
          array_avg(u_rank) AS Pu,
          array_avg(i_rank) AS Qi,
          avg(u_bias) AS Bu,
          avg(i_bias) AS Bi,
          min(mu) AS mu
        FROM (
          SELECT
            train_mf_sgd(account_id, creator_id, rating, '-factor 20 -iter 50 -update_mu')
              AS (idx, u_rank, i_rank, u_bias, i_bias, mu)
          FROM training
        ) t
        GROUP BY idx;
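  45. And more
    - Extract so-called browse-abandoned and cart-abandoned items from the activity logs
      - Send win-back notifications when specific conditions are met
    - Show highly relevant ads based on the activity logs
      - Re-marketing
      - Segment the ad audience (narrowing, exclusion)
    [Diagram labels: browse / cart abandonment, ad integration]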
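Extracting abandoned items from activity logs amounts to finding user and item pairs that reached one step but not the next. The sketch below assumes a hypothetical log shape with `account_id`, `item_id`, and an `event` field whose values (`:view`, `:add_to_cart`, `:purchase`) are invented for this example; a real extraction would also apply time windows, which are omitted here.

```ruby
# Return [account_id, item_id] pairs that have a `from` event
# but no `without` event anywhere in the logs.
def abandoned_items(logs, from:, without:)
  done = logs.select { |log| log[:event] == without }
             .map { |log| [log[:account_id], log[:item_id]] }
  candidates = logs.select { |log| log[:event] == from }
                   .map { |log| [log[:account_id], log[:item_id]] }
                   .uniq
  candidates - done
end

logs = [
  { account_id: 1, item_id: 'a', event: :add_to_cart },
  { account_id: 1, item_id: 'a', event: :purchase },
  { account_id: 1, item_id: 'b', event: :add_to_cart },
  { account_id: 2, item_id: 'a', event: :view },
  { account_id: 2, item_id: 'a', event: :view }
]

p abandoned_items(logs, from: :add_to_cart, without: :purchase) # cart abandonment
p abandoned_items(logs, from: :view, without: :add_to_cart)     # browse abandonment
```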

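  46. A log infrastructure that stays close to the service

  47. A log infrastructure that stays close to the service
    - Do not stop at collecting logs; support the analysis and utilization stages as well
    - From static feedback to dynamic feedback
    - Toward a seamless world through the circulation of activity logs

  48. Logs are great

  49. The end

  50. Why not come and work at Pepabo? Check the latest hiring information → @pb_recruit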