Livesense Inc.
October 05, 2018

# Implementing Recommendation Algorithms in Julia

2018/10/04
MACHINE LEARNING Meetup KANSAI #3


## Transcript

1. ### Implementing Recommendation Algorithms in Julia

Shotaro Tanaka / @yubessy / Livesense

MACHINE LEARNING Meetup KANSAI #3 LT

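9. ### Characteristics of job recommendation

Data and system requirements differ from those of e-commerce sites and web advertising:

- The numbers of items, users, and rated items per user are not very large.
- Periodic batch processing is acceptable; online processing is not required.

What we need from a recommendation algorithm:

- Reasonably good results even for users with few ratings.
- Higher accuracy is preferred, even at some extra computational cost.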
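10. ### The BPMF algorithm

Matrix Factorization modeled with a hierarchical Bayesian approach:

- Bayesian inference instead of MAP estimation → less prone to overfitting even with little data.
- Priors are also placed on the parameters of the user and item factor matrices → little effort spent on hyperparameter tuning.
- Inference can be done with MCMC (Gibbs Sampling).

Details: "Recommendation with BPMF (Bayesian Probabilistic Matrix Factorization)"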
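For reference, the hierarchical model usually meant by BPMF can be written as follows. This is the standard formulation from Salakhutdinov and Mnih's BPMF paper, assuming the slides follow it; the symbols $\alpha$, $\mu_0$, $\beta_0$, $W_0$, and $\nu_0$ come from that formulation, not from the slides.

$$
\begin{aligned}
R_{ij} \mid U_i, V_j &\sim \mathcal{N}\!\left(U_i^\top V_j,\ \alpha^{-1}\right) \\
U_i \mid \mu_U, \Lambda_U &\sim \mathcal{N}\!\left(\mu_U,\ \Lambda_U^{-1}\right), \qquad
V_j \mid \mu_V, \Lambda_V \sim \mathcal{N}\!\left(\mu_V,\ \Lambda_V^{-1}\right) \\
(\mu_U, \Lambda_U) &\sim \mathcal{N}\!\left(\mu_U \mid \mu_0,\ (\beta_0 \Lambda_U)^{-1}\right)\, \mathcal{W}\!\left(\Lambda_U \mid W_0, \nu_0\right)
\end{aligned}
$$

The prior on $(\mu_V, \Lambda_V)$ has the same Gaussian-Wishart form. Because every conditional distribution is then Gaussian or Gaussian-Wishart, each block can be sampled in closed form, which is what makes Gibbs Sampling applicable.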

12. ### Sketch of a BPMF implementation in Python

```python
import numpy as np

# sample_param_U, sample_param_V, sample_U and sample_V are not shown on the slide.
def bpmf_gibbs_sampling(R, D=10, T=1000):
    N, M = R.shape[0], R.shape[1]
    U, V = np.zeros((T, N, D)), np.zeros((T, M, D))
    # Compute T samples with Gibbs Sampling
    for t in range(T - 1):
        # Sample the parameters of U and V
        lamU, muU = sample_param_U(U[t, :, :])
        lamV, muV = sample_param_V(V[t, :, :])
        # Sample U, one user (row of R) at a time
        for i in range(N):
            U[t+1, i, :] = sample_U(R[i, :], U[t, :, :], V[t, :, :], lamU, muU)
        # Sample V, one item (column of R) at a time
        for j in range(M):
            V[t+1, j, :] = sample_V(R[:, j], U[t+1, :, :], V[t, :, :], lamV, muV)
    return U, V
```
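The slide leaves `sample_U` undefined. As a rough illustration of what such a step could look like, here is a minimal sketch of drawing one user's factor vector from its Gaussian conditional posterior in the standard BPMF model. The function name `sample_user_factor`, its signature, the `alpha` default, and the use of `np.nan` to mark unrated items are all assumptions made for this example, not the author's code.

```python
import numpy as np

def sample_user_factor(r_i, V, lam_u, mu_u, alpha=2.0, rng=None):
    """Draw one user's factor vector from its conditional posterior (hypothetical helper).

    r_i   : (M,) ratings of one user; np.nan marks unrated items (assumption)
    V     : (M, D) current item factors
    lam_u : (D, D) prior precision matrix of the user factors
    mu_u  : (D,) prior mean of the user factors
    alpha : precision of the rating noise (assumed value, not from the slides)
    """
    rng = np.random.default_rng() if rng is None else rng
    rated = ~np.isnan(r_i)
    V_obs, r_obs = V[rated], r_i[rated]
    # Posterior precision: prior precision plus evidence from the rated items
    precision = lam_u + alpha * V_obs.T @ V_obs
    cov = np.linalg.inv(precision)
    # Posterior mean: precision-weighted mix of the data term and the prior mean
    mean = cov @ (alpha * V_obs.T @ r_obs + lam_u @ mu_u)
    cov = (cov + cov.T) / 2  # remove tiny asymmetries from the inversion
    return rng.multivariate_normal(mean, cov)
```

With a very strong prior the draw stays near `mu_u`, and with many precise ratings it is pulled toward the least-squares fit of the user's ratings. The item-side step would be symmetric, with the roles of users and items swapped.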
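13. ### MCMC and Python

MCMC (Gibbs Sampling):

- Repeatedly computes the next sample from the previous one (a random walk).
- A straightforward implementation inevitably involves many computation steps.

Python:

- The interpreter executes the code one step at a time.
- Processing that repeats many-step computations in a `for` loop tends to be slow.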
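To make the sequential dependency concrete, here is a toy Gibbs sampler for a standard bivariate normal with correlation `rho`. It is not BPMF. It only illustrates, under that simplified target, why each iteration needs the previous one and so cannot simply be replaced by one vectorized array operation. The function name and parameters are made up for this example.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        # Each draw conditions on the most recent value of the other variable,
        # so iteration t+1 cannot start before iteration t has finished.
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples
```

In BPMF the same pattern holds at a larger scale: every sweep over users and items depends on the factors produced by the previous sweep, so the outer loop runs in the interpreter however well the inner linear algebra is vectorized.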
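14. ### Compile Python...?

Cython (Ahead-of-Time compilation):

- The syntax is similar, but it is a different language from Python.
- The cost of learning its notation is fairly high.

Numba (Just-in-Time compilation):

- Without care, it falls back to the object type and the speedup is lost.
- Parts that call library functions sometimes cannot be compiled.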

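16. ### Affinity with numerical algorithm implementation

| | Julia | Python |
|---|---|---|
| Multidimensional arrays | Built-in type | NumPy |
| Linear algebra | Standard library | NumPy, SciPy |
| Compilation | JIT by default | Numba, Cython |

- The features needed to implement numerical algorithms are included from the start.
- With JIT compilation, performance reaches up to about 1/2 that of C (as officially claimed).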
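17. ### Sketch of a BPMF implementation in Julia

```julia
using SparseArrays  # provides SparseMatrixCSC (standard library in Julia 1.0)

# sample_param_U, sample_param_V, sample_U and sample_V are not shown on the slide.
function bpmf(R::SparseMatrixCSC{Float64}, D::Int = 10, T::Int = 1000)
    N, M = size(R, 1), size(R, 2)
    U, V = zeros(T, N, D), zeros(T, M, D)
    # Compute T samples with Gibbs Sampling
    for t in 1:(T-1)
        # Sample the parameters of U and V
        λ_u, μ_u = sample_param_U(U[t, :, :])
        λ_v, μ_v = sample_param_V(V[t, :, :])
        # Sample U, one user (row of R) at a time
        for i in 1:N
            U[t+1, i, :] = sample_U(R[i, :], V[t, :, :], λ_u, μ_u)
        end
        # Sample V, one item (column of R) at a time
        for j in 1:M
            V[t+1, j, :] = sample_V(R[:, j], U[t+1, :, :], λ_v, μ_v)
        end
    end
    return U, V
end
```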
18. ### Simple benchmark of the BPMF implementations

| Implementation | Run time | Speedup |
|---|---|---|
| Python (NumPy, SciPy) | 2382s | 1.0 |
| Julia (same implementation as Python) | 122s | 19.5 |
| Julia (optimized with `@inline` etc.) | 40s | 59.5 |

- Dataset: MovieLens 100k (100k ratings, 9000 movies, 600 users)
- Environment: MBP2017 (2.9 GHz Core i7), 1 process
- Parameters: 10 factors, 100 samples
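19. ### Using Julia in production

- Almost all of our ML systems run in Docker containers.
- Each step of a batch job runs in a separate container.

→ Only the numerical computation is written in Julia; DB input/output and similar tasks are in Python.

DB → (Python) → CSV → (Julia) → CSV → (Python) → DB

The Julia part contains no domain-specific processing, so it could perhaps be released as OSS?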
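The Python side of such a CSV handoff could look roughly like the sketch below. Everything specific here is hypothetical: `sqlite3` stands in for the unnamed production database, and the table names (`ratings`, `scores`), column names, and function names are invented for illustration. The Julia step in the middle is assumed to read the first CSV and write the second.

```python
import csv
import sqlite3

def export_ratings(conn, csv_path):
    """Dump the (hypothetical) ratings table to a CSV file for the Julia step."""
    rows = conn.execute(
        "SELECT user_id, item_id, rating FROM ratings ORDER BY user_id, item_id"
    ).fetchall()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "item_id", "rating"])
        writer.writerows(rows)
    return len(rows)

def import_scores(conn, csv_path):
    """Load the scores CSV written by the Julia step back into the database."""
    with open(csv_path, newline="") as f:
        rows = [
            (int(r["user_id"]), int(r["item_id"]), float(r["score"]))
            for r in csv.DictReader(f)
        ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores (user_id INTEGER, item_id INTEGER, score REAL)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Keeping the interface to plain CSV files means the Julia container needs no database driver and no knowledge of the schema, which fits the point that the Julia part has no domain-specific processing.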
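20. ### Inconveniences

- The package manager was weak.
- `sum` and `mean` became slow depending on the type.
- Version compatibility between the DataFrame library and the language itself.
- ...

→ In fact, these improved significantly with Julia 1.0, released in 2018/08. More on that another day.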
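21. ### Summary

- Sequential computations with many steps, such as MCMC, are a pain point for Python.
- Julia looks like a strong alternative to Python for implementing numerical algorithms.
- Julia is running happily in production today as well.

Note: I would like to publish the Julia implementation of BPMF at some point.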