minne meets Hivemall/pepabo_minne_matrix_factorization_in_hivemall

monochromegane
September 08, 2016

3rd Hivemall Meetup
https://eventdots.jp/event/597518

Transcript

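  1. Collaborative filtering

    - Collect preferences: build a rating matrix from users and items.
    - Find similar users: for example, compute a similarity score from the ratings each pair of users has given (Euclidean distance, Pearson correlation, Jaccard coefficient, and so on).
    - Recommend items: items that similar users have rated but the target user has not. For example, compute predicted ratings from the user similarity scores.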
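Two of the similarity measures named on this slide can be sketched in Go as follows. This is a minimal illustration rather than code from the talk: the helper names `pearson` and `jaccard` and the sample data are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation between two users' ratings
// of the same items (hypothetical helper; slices must have equal length).
func pearson(a, b []float64) float64 {
	n := float64(len(a))
	var sumA, sumB, sumAB, sumA2, sumB2 float64
	for i := range a {
		sumA += a[i]
		sumB += b[i]
		sumAB += a[i] * b[i]
		sumA2 += a[i] * a[i]
		sumB2 += b[i] * b[i]
	}
	num := sumAB - sumA*sumB/n
	den := math.Sqrt((sumA2 - sumA*sumA/n) * (sumB2 - sumB*sumB/n))
	if den == 0 {
		return 0 // no variance, correlation is undefined
	}
	return num / den
}

// jaccard returns |A ∩ B| / |A ∪ B| for two sets of item IDs.
func jaccard(a, b map[string]bool) float64 {
	inter := 0
	for k := range a {
		if b[k] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	// Sample ratings of three items by two users (made-up values).
	u1 := []float64{5, 3, 4}
	u2 := []float64{4, 2, 3}
	fmt.Printf("pearson: %.3f\n", pearson(u1, u2))

	// Sample sets of items each user has rated (made-up values).
	s1 := map[string]bool{"x": true, "y": true, "z": true}
	s2 := map[string]bool{"y": true, "z": true, "w": true}
	fmt.Printf("jaccard: %.3f\n", jaccard(s1, s2))
}
```

Pearson correlation compares how two users' ratings move together, while the Jaccard coefficient only compares which items they rated.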
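  2. Review: matrix product

    $$\begin{pmatrix} 1 & 3 \\ 1 & 4 \\ 2 & 1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 4 & 1 & 2 \\ 2 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 13 & 8 \\ 12 & 17 & 10 \\ 10 & 6 & 6 \\ 10 & 13 & 8 \end{pmatrix}$$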
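  3. Review: matrix product

    - Each element of the product is the inner product of a row and a column.

    $$\begin{pmatrix} 1 & 3 \\ 1 & 4 \\ 2 & 1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 4 & 1 & 2 \\ 2 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 13 & 8 \\ 12 & 17 & 10 \\ 10 & 6 & 6 \\ 10 & 13 & 8 \end{pmatrix}$$

    For the top-left element: (1 * 4) + (3 * 2) = 10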
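  4. Review: matrix product

    - Each element of the product is the inner product of a row and a column.
    - The product of an m×k matrix and a k×n matrix is an m×n matrix.
    - The converse also holds.

    $$\begin{pmatrix} 1 & 3 \\ 1 & 4 \\ 2 & 1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 4 & 1 & 2 \\ 2 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 10 & 13 & 8 \\ 12 & 17 & 10 \\ 10 & 6 & 6 \\ 10 & 13 & 8 \end{pmatrix}$$

    A (4×2) matrix times a (2×3) matrix gives a (4×3) matrix.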
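The product on the review slides can be checked with a few lines of Go. This is a minimal sketch using plain slices; the function name `matMul` is an assumption and is not part of the talk.

```go
package main

import "fmt"

// matMul multiplies an m×k matrix by a k×n matrix and returns the m×n product.
// Each element c[i][j] is the inner product of row i of a and column j of b.
func matMul(a, b [][]float64) [][]float64 {
	m, k, n := len(a), len(b), len(b[0])
	c := make([][]float64, m)
	for i := 0; i < m; i++ {
		c[i] = make([]float64, n)
		for j := 0; j < n; j++ {
			for l := 0; l < k; l++ {
				c[i][j] += a[i][l] * b[l][j]
			}
		}
	}
	return c
}

func main() {
	p := [][]float64{{1, 3}, {1, 4}, {2, 1}, {1, 3}} // 4×2
	q := [][]float64{{4, 1, 2}, {2, 4, 2}}           // 2×3
	for _, row := range matMul(p, q) {               // 4×3
		fmt.Println(row)
	}
}
```

The first row of the result is (10, 13, 8), matching (1 * 4) + (3 * 2) = 10 for the top-left element.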
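  5. In other words

    $$\underset{m \times n}{R} = \underset{m \times k}{P} \times \underset{k \times n}{Q}$$

    Find a combination of P and Q such that the product of two matrices with k latent factors comes close to the values of the original rating matrix.

    $$\begin{pmatrix} ? & ? \\ ? & ? \\ ? & ? \\ ? & ? \end{pmatrix} \begin{pmatrix} ? & ? & ? \\ ? & ? & ? \end{pmatrix} = \begin{pmatrix} 10 & 13 & 8 \\ 12 & 17 & 10 \\ 10 & 6 & 6 \\ 10 & 13 & 8 \end{pmatrix}$$

    Each matrix obtained this way can be regarded as a summary of the original features:

    - P: a matrix that summarizes the User features in k dimensions
    - Q: a matrix that summarizes the Item features in k dimensions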
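  6. In other words

    $$\underset{m \times n}{R} = \underset{m \times k}{P} \times \underset{k \times n}{Q}$$

    Find a combination of P and Q such that the product of two matrices with k latent factors comes close to the values of the original rating matrix.

    $$\begin{pmatrix} ? & ? \\ ? & ? \\ ? & ? \\ ? & ? \end{pmatrix} \begin{pmatrix} ? & ? & ? \\ ? & ? & ? \end{pmatrix} = \begin{pmatrix} 10 & 13 & 8 \\ 12 & 17 & 10 \\ 10 & 6 & 6 \\ 10 & 13 & 8 \end{pmatrix}$$

    → Start from a random combination of P and Q and keep reducing the error between the computed values and the true values.
    → Gradient descent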
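  7. Matrix Factorization

    - Predicted value (K is the number of latent factors):

    $$\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^{K} p_{ik} q_{kj}$$

    - Squared-error objective function:

    $$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = \left(r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj}\right)^2$$

    - Parameter update rules (α is the learning rate; each step moves against the gradient):

    $$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + 2\alpha e_{ij} q_{kj}$$

    $$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + 2\alpha e_{ij} p_{ik}$$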
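A single application of these update rules can be sketched in Go as below. The function names `dot` and `sgdStep` and the starting values are assumptions chosen for illustration. The sketch only shows that repeated updates shrink the error for one observed rating.

```go
package main

import "fmt"

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// sgdStep applies one update to the latent vectors p (user i) and q (item j)
// for an observed rating r, and returns the error measured before the update.
func sgdStep(p, q []float64, r, alpha float64) float64 {
	e := r - dot(p, q)
	for k := range p {
		pk, qk := p[k], q[k] // use the old values for both updates
		p[k] = pk + 2*alpha*e*qk
		q[k] = qk + 2*alpha*e*pk
	}
	return e
}

func main() {
	// Made-up starting factors and a made-up rating of 4.0.
	p := []float64{0.5, 0.5}
	q := []float64{0.5, 0.5}
	for step := 0; step < 3; step++ {
		e := sgdStep(p, q, 4.0, 0.01)
		fmt.Printf("step %d: error before update = %.4f\n", step, e)
	}
}
```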
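  8. Matrix Factorization: Regularization

    - Predicted value:

    $$\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^{K} p_{ik} q_{kj}$$

    - Squared-error objective function (β is the regularization coefficient):

    $$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 + \frac{\beta}{2}\left(\|P\|^2 + \|Q\|^2\right)$$

    - Parameter update rules:

    $$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + \alpha(2 e_{ij} q_{kj} - \beta p_{ik})$$

    $$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + \alpha(2 e_{ij} p_{ik} - \beta q_{kj})$$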
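The effect of the regularization term can be seen in a small Go sketch. The name `regStep` and the parameter values are assumptions, and `beta` stands for the regularization coefficient in the update rule. When the prediction already equals the rating, the error term vanishes and the update only shrinks the factors toward zero.

```go
package main

import "fmt"

// regStep performs one regularized update of p and q for rating r.
// alpha is the learning rate and beta the regularization coefficient.
func regStep(p, q []float64, r, alpha, beta float64) {
	pred := 0.0
	for k := range p {
		pred += p[k] * q[k]
	}
	e := r - pred
	for k := range p {
		pk, qk := p[k], q[k]
		p[k] = pk + alpha*(2*e*qk-beta*pk)
		q[k] = qk + alpha*(2*e*pk-beta*qk)
	}
}

func main() {
	p := []float64{1, 1}
	q := []float64{1, 1}
	// The prediction is exactly 2.0 here, so e = 0 and only the
	// penalty term changes the factors.
	regStep(p, q, 2.0, 0.1, 0.5)
	fmt.Println(p, q)
}
```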
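  9. Matrix Factorization: Bias

    - Predicted value:

    $$\hat{r}_{ij} = \mu + b_{user(i)} + b_{item(j)} + \sum_{k=1}^{K} p_{ik} q_{kj}$$

    - Squared-error objective function:

    $$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 + \frac{\beta}{2}\left(\|b_{user}\|^2 + \|b_{item}\|^2 + \|P\|^2 + \|Q\|^2\right)$$

    - Parameter update rules:

    $$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + \alpha(2 e_{ij} q_{kj} - \beta p_{ik})$$

    $$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + \alpha(2 e_{ij} p_{ik} - \beta q_{kj})$$

    $$b'_{user(i)} = b_{user(i)} - \alpha \frac{\partial}{\partial b_{user(i)}} e_{ij}^2 = b_{user(i)} + \alpha(2 e_{ij} - \beta b_{user(i)})$$

    $$b'_{item(j)} = b_{item(j)} - \alpha \frac{\partial}{\partial b_{item(j)}} e_{ij}^2 = b_{item(j)} + \alpha(2 e_{ij} - \beta b_{item(j)})$$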
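The biased prediction and a single bias update can be sketched in Go as follows. The names `predict` and `updateBias` and all numeric values are assumptions for illustration only.

```go
package main

import "fmt"

// predict returns mu + bUser + bItem + p·q, the prediction with bias terms.
func predict(mu, bUser, bItem float64, p, q []float64) float64 {
	s := mu + bUser + bItem
	for k := range p {
		s += p[k] * q[k]
	}
	return s
}

// updateBias applies one regularized update to a single bias term,
// given the current error e, learning rate alpha and coefficient beta.
func updateBias(b, e, alpha, beta float64) float64 {
	return b + alpha*(2*e-beta*b)
}

func main() {
	// Made-up factors: global mean 3.0, user bias 0.5, item bias -0.5.
	p := []float64{1, 2}
	q := []float64{0.5, 0.25}
	rHat := predict(3.0, 0.5, -0.5, p, q)
	fmt.Println("prediction:", rHat)

	// Made-up observed rating of 5.0; regularization switched off (beta = 0).
	e := 5.0 - rHat
	fmt.Println("updated user bias:", updateBias(0.5, e, 0.25, 0.0))
}
```

The global mean and the two bias terms absorb how generous a user is and how popular an item is, so the latent factors only need to model the remaining interaction.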
  10. Matrix Factorization

        // Factorize R into p (rows×K) and q (K×cols) by gradient descent.
        rows, cols := R.Dims()
        p := mat64.NewDense(rows, K, randomSlice(rows*K))
        q := mat64.NewDense(K, cols, randomSlice(K*cols)) // q holds K*cols values
        for step := 0; step < 5000; step++ {
            for i := 0; i < rows; i++ {
                for j := 0; j < cols; j++ {
                    rij := R.At(i, j)
                    if rij == 0.0 {
                        continue // 0 marks an unrated cell
                    }
                    pi := p.RowView(i)
                    qj := q.ColView(j)
                    err := rij - mat64.Dot(pi, qj)
                    for k := 0; k < K; k++ {
                        pik := p.At(i, k)
                        qkj := q.At(k, j)
                        p.Set(i, k, pik+alpha*(2*err*qkj))
                        q.Set(k, j, qkj+alpha*(2*err*pik))
                    }
                }
            }
        }
        return p, q

        # R
        ⎡5 0 3⎤
        ⎢0 2 2⎥
        ⎣4 4 0⎦
        # R^
        ⎡4.867 4.489 3.194⎤
        ⎢2.482 2.266 1.629⎥
        ⎣4.155 3.834 2.726⎦
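The slide calls a `randomSlice` helper without showing it. One plausible shape, assuming it simply returns `n` pseudo-random starting values in [0, 1), is sketched below; the original implementation may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomSlice returns n pseudo-random values in [0, 1).
// Assumed implementation; the slide does not show the original helper.
func randomSlice(n int) []float64 {
	s := make([]float64, n)
	for i := range s {
		s[i] = rand.Float64()
	}
	return s
}

func main() {
	// A 3×2 factor matrix needs 3*2 initial values.
	fmt.Println(len(randomSlice(3 * 2)))
}
```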
  11. Bigfoot

    [Architecture diagram. Labels: IDFA/GAID, UID, rack-bigfoot, Service, Request, Activity log, Services DB, Attribute, Big Cube, Cube, BI, Recommendation, Bandit algorithm, Re-marketing, Feedback, Name identification, Cookie Sync. Icons: https://icons8.com]
  12. Matrix Factorization by Hivemall

    [Pipeline diagram. Labels: Service (fav, follow etc…), Activity, Rating, Ratings, testings (20%), trainings (80%), Matrix Factorization, sgd_model (Bias, P, Q), predictions, evaluations (MAE, RMSE), Recommendation]
  13. rating

        INSERT OVERWRITE TABLE ratings
        SELECT
          r.account_id,
          r.creator_id,
          (
            -- rating
          ) AS rating,
          rand(31) as rnd
        FROM (
          ...

    - INSERT OVERWRITE TABLE is faster than Result Export.
    - Assigning a random value to each row at this point is convenient, because later steps split the rows into training and test data.
  14. training / testing

        INSERT OVERWRITE TABLE training
        SELECT
          account_id,
          creator_id,
          rating,
          rnd
        FROM ratings
        WHERE rnd > 0.2 -- rnd <= 0.2
        ;

    Rows with rnd > 0.2 (80%) go to the training table (training). In the same way, the remaining 20% (rnd <= 0.2) go to the test table (testing).
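The same split logic can be written out in Go to make the routing explicit. This is an illustrative sketch: the `rating` struct, the `split` function and the generated rows are assumptions, and the seed 31 only echoes the `rand(31)` call on the previous slide without reproducing Hive's random sequence.

```go
package main

import (
	"fmt"
	"math/rand"
)

// rating mirrors one row of the ratings table, including its random value.
type rating struct {
	accountID, creatorID int
	value, rnd           float64
}

// split routes each row by its pre-assigned random value, mirroring
// WHERE rnd > 0.2 (training) and WHERE rnd <= 0.2 (testing).
func split(rows []rating, threshold float64) (training, testing []rating) {
	for _, r := range rows {
		if r.rnd > threshold {
			training = append(training, r)
		} else {
			testing = append(testing, r)
		}
	}
	return training, testing
}

func main() {
	rng := rand.New(rand.NewSource(31))
	rows := make([]rating, 1000)
	for i := range rows {
		rows[i] = rating{accountID: i, creatorID: i % 10, value: 1, rnd: rng.Float64()}
	}
	training, testing := split(rows, 0.2)
	// Roughly 80% of the rows should land in training.
	fmt.Println(len(training), len(testing))
}
```

Storing the random value with each row keeps the split reproducible: running the two queries at different times still assigns every row to the same side.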
  15. Matrix Factorization

        INSERT OVERWRITE TABLE sgd_model
        SELECT
          idx,
          array_avg(u_rank) as Pu,
          array_avg(i_rank) as Qi,
          avg(u_bias) as Bu,
          avg(i_bias) as Bi,
          min(mu) as mu
        FROM (
          SELECT
            train_mf_sgd(account_id, creator_id, rating, '-factor 20 -iter 50 -update_mu')
              AS (idx, u_rank, i_rank, u_bias, i_bias, mu)
          FROM training
        ) t
        GROUP BY idx;
  16. Matrix Factorization

        INSERT OVERWRITE TABLE sgd_model
        SELECT
          idx,
          array_avg(u_rank) as Pu,
          array_avg(i_rank) as Qi,
          avg(u_bias) as Bu,
          avg(i_bias) as Bi,
          min(mu) as mu
        FROM (
          SELECT
            train_mf_sgd(account_id, creator_id, rating, '-factor 20 -iter 50 -update_mu')
              AS (idx, u_rank, i_rank, u_bias, i_bias, mu)
          FROM training
        ) t
        GROUP BY idx;

    - Later steps use the mean rating, but variables cannot be used on Treasure Data. The -update_mu option is therefore used to compute it here (the value is the same in every row).
    - Logically there are two matrices, but idx is shared between them, so both are stored in a single table.
    - Pu and Qi hold the k latent factors per user and per item (creator), respectively, as arrays.
    - train_mf_sgd performs Matrix Factorization by stochastic gradient descent. train_mf_adagrad, which uses AdaGrad optimization, is also available.
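The query groups by `idx` and merges the factor arrays with `array_avg`. Assuming that `array_avg` averages arrays element by element, and that several partial factor vectors can be emitted for one `idx`, the merge can be sketched in Go as below. The name `arrayAvg` and the sample vectors are hypothetical.

```go
package main

import "fmt"

// arrayAvg averages equal-length vectors element by element.
// It is a stand-in for how array_avg is assumed to merge the
// factor vectors that share one idx.
func arrayAvg(vectors [][]float64) []float64 {
	if len(vectors) == 0 {
		return nil
	}
	avg := make([]float64, len(vectors[0]))
	for _, v := range vectors {
		for k := range avg {
			avg[k] += v[k]
		}
	}
	for k := range avg {
		avg[k] /= float64(len(vectors))
	}
	return avg
}

func main() {
	// Two made-up partial factor vectors for the same idx.
	fmt.Println(arrayAvg([][]float64{{1, 2}, {3, 6}}))
}
```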
  17. evaluation

        SELECT
          mae(t3.predicted, t3.actual) as mae,
          rmse(t3.predicted, t3.actual) as rmse
        FROM (
          SELECT
            t2.actual,
            mf_predict(if(size(t2.Pu)=0, null, t2.Pu), if(size(p2.Qi)=0, null, p2.Qi), t2.Bu, p2.Bi, t2.mu) as predicted
          FROM (
            SELECT
              t1.account_id,
              t1.creator_id,
              t1.rating as actual,
              p1.Pu,
              p1.Bu,
              p1.mu
            FROM testing t1
            LEFT OUTER JOIN sgd_model p1 ON (t1.account_id = p1.idx)
            WHERE p1.Bu IS NOT NULL AND p1.mu IS NOT NULL
          ) t2
          LEFT OUTER JOIN sgd_model p2 ON (t2.creator_id = p2.idx)
        ) t3;

    Use the trained model to compute MAE and RMSE on the test data (smaller values are better).
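For reference, the two metrics follow their standard definitions: MAE is the mean of the absolute errors and RMSE is the square root of the mean squared error. The Go sketch below uses made-up predictions. The function names mirror the SQL functions, but the code is only an illustration and is not Hivemall's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// mae returns the mean absolute error between predictions and actual values.
func mae(predicted, actual []float64) float64 {
	sum := 0.0
	for i := range predicted {
		sum += math.Abs(predicted[i] - actual[i])
	}
	return sum / float64(len(predicted))
}

// rmse returns the root mean squared error, which penalizes
// large errors more heavily than mae does.
func rmse(predicted, actual []float64) float64 {
	sum := 0.0
	for i := range predicted {
		d := predicted[i] - actual[i]
		sum += d * d
	}
	return math.Sqrt(sum / float64(len(predicted)))
}

func main() {
	predicted := []float64{4, 2, 5, 1}
	actual := []float64{5, 2, 3, 1}
	fmt.Printf("mae=%.3f rmse=%.3f\n", mae(predicted, actual), rmse(predicted, actual))
}
```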
  18. prediction

        INSERT OVERWRITE TABLE prediction
        SELECT
          t2.account_id,
          t2.creator_id,
          t2.rank,
          t2.predicted
        FROM (
          SELECT
            ROW_NUMBER() OVER(PARTITION BY t1.account_id ORDER BY t1.predicted DESC) AS rank,
            t1.account_id,
            t1.creator_id,
            t1.predicted
          FROM (
            SELECT
              accounts_creators.account_id,
              accounts_creators.creator_id,
              mf_predict(if(size(accounts.Pu)=0, null, accounts.Pu), if(size(creators.Qi)=0, null, creators.Qi), accounts.Bu, creators.Bi, accounts.mu) AS predicted
            FROM
              -- prediction for all accounts and creators
          ) t1
        ) t2
        WHERE t2.rank <= 100;

    - mf_predict computes the predicted rating of each item from the trained model.
    - ROW_NUMBER() orders the rows by predicted rating within each account.
    - The outer WHERE clause keeps the 100 items with the highest predicted rating for each account.
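The ranking step can be mirrored in Go to show what the window function does. This is a sketch under assumptions: the `prediction` struct, the `topN` function and the sample rows are hypothetical, and `n` is set to 2 instead of 100 to keep the example small.

```go
package main

import (
	"fmt"
	"sort"
)

// prediction mirrors one row produced by the innermost query.
type prediction struct {
	accountID, creatorID int
	predicted            float64
}

// topN keeps, for every account, the n rows with the highest predicted
// rating, like ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) <= n.
func topN(rows []prediction, n int) map[int][]prediction {
	byAccount := make(map[int][]prediction)
	for _, r := range rows {
		byAccount[r.accountID] = append(byAccount[r.accountID], r)
	}
	for id, list := range byAccount {
		sort.Slice(list, func(a, b int) bool { return list[a].predicted > list[b].predicted })
		if len(list) > n {
			list = list[:n]
		}
		byAccount[id] = list
	}
	return byAccount
}

func main() {
	rows := []prediction{
		{1, 10, 3.2}, {1, 11, 4.8}, {1, 12, 4.1},
		{2, 10, 2.5}, {2, 12, 3.9},
	}
	for id, list := range topN(rows, 2) {
		fmt.Println(id, list)
	}
}
```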
  19. Pendulum

    Schedfile

        schedule 'test-scheduled-job' do
          database   'db_name'
          query      'select time from access;'
          retry_limit 0
          priority   :normal
          cron       '30 0 * * *'
          timezone   'Asia/Tokyo'
          delay      0
          result_url 'td://@/db_name/table_name'
        end

    Apply

        $ pendulum --apikey='...' -a --dry-run
        $ pendulum --apikey='...' -a