
Improving a Recommendation Algorithm with PySpark

nashibao
January 16, 2015

I gave this talk at "PyData Tokyo Meetup #3 - Distributed Machine Learning with PySpark"!
@nashibao
PLAID, Inc.
http://plaid.co.jp

Transcript

  1. MLlib still offers only a few models and algorithms:
    Classification: Logistic Regression, Naive Bayes, SVM
    Clustering: Gaussian Mixture, K-means
    Regression: Linear Regression, Ridge Regression, Lasso Regression
    Recommendation: Matrix Factorization
    Tree: Decision Tree, Gradient Boosted Tree, Random Forest
  2. MLlib still offers only a few models and algorithms:
    Classification: Logistic Regression, Naive Bayes, SVM
    Clustering: Gaussian Mixture, K-means
    Regression: Linear Regression, Ridge Regression, Lasso Regression
    Recommendation: Matrix Factorization (Alternating Least Squares)
    Tree: Decision Tree, Gradient Boosted Tree, Random Forest
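  3. Row-wise update rules for ALS and GMU (pseudocode)

        import numpy as np

        LAMBDA = 0.01  # regularization weight (assumed; not given on the slide)

        def update_als(x, W, H, V):
            # Regularized least squares for row x of W with H fixed:
            # solve (H^T H + LAMBDA * m * I) w = H^T v.
            # W is unused here; it is kept so both updates share one signature.
            v = V[x, :]
            m, k = H.shape
            HtH = H.T @ H
            HtVt = H.T @ v
            HtH[np.diag_indices(k)] += LAMBDA * m
            return np.linalg.solve(HtH, HtVt)

        def update_gmu(x, W, H, V):
            # Multiplicative update for row x of W with H fixed;
            # the 1e-9 term guards against division by zero.
            w = W[x, :]
            v = V[x, :]
            return w * (v @ H) / (w @ (H.T @ H) + 1e-9)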
  4. Results
    [Figure: Energy vs. Iteration for ALS and GMU, with convergence marked]
    Setup: EMR m1.xlarge x 10, dense random matrix of 10000 x 10000
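  5. Impressions
    1. PySpark's API is clean and easy to use! It almost makes me jealous!
    2. You can work interactively, and responses come back quickly when the data is small, so development is snappy. And you can use numpy! Whoo!
    3. In MLlib I sense a bit of the same dark side as Mahout:
       - [Dark side] The individual implementations do not share a consistent interface, the algorithms are not separated from everything else, and the set of algorithms is not growing much.
       - For in-house use at a company, might it be lighter to write your own implementations and use MLlib only as a reference?
       - Personally, I would rather keep the implementations on the Python side than in Scala.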
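The update rules on slide 3 only make sense inside an alternating loop that also updates H and tracks the "energy" plotted on slide 4. Below is a minimal single-machine NumPy sketch of that loop, under stated assumptions: V is approximated as W Hᵀ with W of shape (n, k) and H of shape (m, k); "energy" is taken to be the squared Frobenius reconstruction error; LAMBDA = 0.01; and the helpers `energy` and `factorize` are hypothetical names, not from the talk. In PySpark, the per-row list comprehensions are the part one would distribute, roughly as Spark's bundled `als.py` example does by mapping an update function over `sc.parallelize(range(n))`.

```python
import numpy as np

LAMBDA = 0.01  # assumed regularization weight; the slides do not give a value


def update_als(x, W, H, V):
    """Regularized least-squares solve for row x of W, holding H fixed."""
    m, k = H.shape
    A = H.T @ H + LAMBDA * m * np.eye(k)  # (k, k), diagonal shifted by LAMBDA * m
    b = H.T @ V[x, :]                     # (k,)
    return np.linalg.solve(A, b)


def update_gmu(x, W, H, V):
    """Multiplicative (Lee-Seung style) update for row x of W, holding H fixed."""
    w = W[x, :]
    return w * (V[x, :] @ H) / (w @ (H.T @ H) + 1e-9)


def energy(W, H, V):
    """Hypothetical objective: squared Frobenius error ||V - W H^T||^2."""
    return float(np.sum((V - W @ H.T) ** 2))


def factorize(V, k, update, iterations=10, seed=0):
    """Hypothetical driver: alternate row-wise updates of W and H, record energy."""
    n, m = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((n, k))  # nonnegative init keeps the multiplicative update >= 0
    H = rng.random((m, k))
    history = []
    for _ in range(iterations):
        # Each row of W depends only on H and V[x, :], so this loop is
        # the part that could be spread over workers.
        W = np.array([update(x, W, H, V) for x in range(n)])
        # Same step for H: its rows solve the transposed problem V.T ~ H W^T.
        H = np.array([update(y, H, W, V.T) for y in range(m)])
        history.append(energy(W, H, V))
    return W, H, history


if __name__ == "__main__":
    V = np.random.default_rng(1).random((30, 20))  # small dense random matrix
    for name, fn in [("ALS", update_als), ("GMU", update_gmu)]:
        _, _, hist = factorize(V, k=5, update=fn)
        print(name, [round(e, 3) for e in hist])
```

Note that each ALS row solve minimizes the LAMBDA-penalized objective, so the plain energy above is not guaranteed to drop at every step; the multiplicative update keeps W and H nonnegative as long as they start nonnegative.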