プログラミングLT_2019_はじめての推薦システム.pdf / introduction-to-recommender-system

プログラミングLT_2019_はじめての推薦システム.pdf / introduction-to-recommender-system

2019/4/30 の プログラミングLT 2019 (https://npoint.connpass.com/event/118783/) (http://prolt.n-point.pw/about/) での発表内容です。

推薦システムのイントロダクション的ななにか。

推薦システムとは?的な話と,
シンプルなユーザベースの協調フィルタリングを実装つきで説明する内容となっています。

3c0649db1ae5ca0ae57c76037243f501?s=128

Yuya Matsumura

April 30, 2019
Tweet

Transcript

  1. 2.

    ✓ Yuya Matsumura ✓ Software Engineer ✓ Wantedly, Inc. Recommendation

    Team ✓ Interested in Information Retrieval, Machine Learning Self-Introduction @yu-ya4 @yu__ya4
  2. 6.

    It is often necessary to make choices without sufficient personal

    experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat’s restaurant guides. Recommender systems assist and augment this natural social process. ਪનγεςϜͱ͸ [Resnick+ 97] ࣗ෼ͷܦݧ͔ΒͷΈͰ͸͋·ΓΑ͘Θ͔Βͳ͍΋ͷͷத͔ΒɼͲ͏ ͯ͠΋ͲΕ͔Λબ͹ͳ͚Ε͹ͳΒͳ͍ͱ͍͏͜ͱ͸Α͋͘Δɻ͜ͷ Α͏ͳࡍΘͨͨͪ͠͸ɼޱίϛɼਪનঢ়ɼ ৽ฉͷॻධ΍өըධɼϨ ετϥϯΨΠυͳͲͷଞਓ͔ΒͷਪનʹཔΔ͜ͱΛ ೔ৗతʹߦͬͯ ͍Δɻ ਪનγεςϜͱ͸ɼ͜͏ͨࣾ͠ձͰී௨ʹߦΘΕ͍ͯΔҰ࿈ ͷߦҝΛิॿͨ͠Γɼଅਐͨ͠Γ͢Δ΋ͷͰ͋Δɻ [Resnick 97] P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, Vol. 40, No. 3, pp. 56–58, 1997.
  3. 7.

    Recommenders: Tools to help identify worthwhile stuff ਪનγεςϜͱ͸ [Konstan+ 03]

    ਪનγεςϜ: ͲΕ͕Ձ஋ͷ͋Δ΋ͷ͔ಛఆ͢ΔͷΛॿ͚Δಓ۩ [Konstan 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003.
  4. 11.

    Recommenders: Tools to help identify worthwhile stuff ࠶ܝɿਪનγεςϜͱ͸ [Konstan+ 03]

    ਪનγεςϜ: ͲΕ͕Ձ஋ͷ͋Δ΋ͷ͔ಛఆ͢ΔͷΛॿ͚Δಓ۩ [Konstan 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003.
  5. 12.

    ਪનγεςϜͷٕज़ཁૉ 㾎 ώϡʔϚϯΠϯλϑΣʔε w ΞΠςϜͷ಺༰΍ਪનʹؔ࿈ͨ͠৘ใΛఏࣔ w ਪનʹඞཁͳ৘ใΛར༻ऀ͔Βత֬ʹऩू 㾎 ػցֶशɾ౷ܭత༧ଌɾ৘ใݕࡧʢਪનΞϧΰϦζϜʣ w

    ऩूͨ͠σʔλʹجͮ͘ਪન৘ใͷੜ੒ w ໨తʹԠͨ͡ਪન৘ใͷม׵ 㾎 σʔλϕʔεɾฒྻܭࢉɾωοτϫʔΫ w ਪનʹඞཁͳ্෍Λ஝ੵɾॲཧɻྲྀ௨ͤ͞Δج൫ http://www.kamishima.net/archive/recsysdoc.pdf
  6. 13.

    ਪનγεςϜͷٕज़ཁૉ 㾎 ώϡʔϚϯΠϯλϑΣʔε w ΞΠςϜͷ಺༰΍ਪનʹؔ࿈ͨ͠৘ใΛఏࣔ w ਪનʹඞཁͳ৘ใΛར༻ऀ͔Βత֬ʹऩू 㾎 ػցֶशɾ౷ܭత༧ଌɾ৘ใݕࡧʢਪનΞϧΰϦζϜʣ w

    ऩूͨ͠σʔλʹجͮ͘ਪન৘ใͷੜ੒ w ໨తʹԠͨ͡ਪન৘ใͷม׵ 㾎 σʔλϕʔεɾฒྻܭࢉɾωοτϫʔΫ w ਪનʹඞཁͳ্෍Λ஝ੵɾॲཧɻྲྀ௨ͤ͞Δج൫ http://www.kamishima.net/archive/recsysdoc.pdf ͜͜ʹ͍ͭͯ࿩͠·͢"
  7. 17.

    ಺༰ϕʔεϑΟϧλϦϯά ϢʔβͷϓϩϑΝΠϧͱΞΠςϜͷಛ௃ͱͷϚονϯάʹΑΓɼ Ϣʔβͷᅂ޷ʹ߹க͢ΔΞΠςϜΛਪન ϢʔβϓϩϑΝΠϧ ✖ ΞΠςϜͷಛ௃ ࿀Ѫ ϛεςϦʔ 90೥୅ͷ࡞඼ ࠷৽ͷ࡞࿀ѪՈ

    ୹ฤू ௕ฤ δϟϯϧ: SF ࡞ऀ: ాத ଠ࿠ ೉қ౓: ೉͍͠ Ϣʔβ͕޷Ή(աڈʹ޷Μͩ)ΞΠςϜͷಛ௃ δϟϯϧɼ࡞ऀɼՁ֨ͳͲͷΞΠςϜͷϝλσʔλ΍ɼ ղੳʹΑΓநग़ͨ͠ಛ௃ δϟϯϧ: ϛεςϦʔ ࡞ऀ: ླ໦ Ұ࿠ ೉қ౓: қ͍͠
  8. 27.

    ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ʁ ⭕ Ξ

    Π ς Ϝ Ϣʔβ 1. ϢʔβͲ͏͠ͷྨࣅ౓Λܭࢉͯ͠ɼᅂ޷ͷࣅ͍ͯΔϢʔβΛݟ͚ͭΔɻ ✖ ✖ ✖ ✖ ✖ ⭕ ⭕ ⭕ ʁ ✖ ⭕ ✖ A B C D E ứ ϋ Ϧ ϙ λ Ừ ứ ޿ ࣙ ԓ Ừ ứ ౷ ܭ ֶ ೖ ໳ Ừ ứ ࢦ ྠ ෺ ޠ Ừ ứ φ ϧ χ Ξ Ừ - - - ✖ # ਪન͞ΕͨΞΠςϜʹ͍ͭͯͷ৘ใͷΈଘࡏ͢Δɻ # ߪೖͨ͠Β 1, ߪೖ͠ͳ͔ͬͨΒ 0 data = { "A": { "φϧχΞ": 1, "ϋϦϙλ": 1, "޿ࣙԓ": 0, "ࢦྠ෺ޠ": 1, "౷ܭֶೖ໳": 0 }, "B": { "ϋϦϙλ": 0, "޿ࣙԓ": 1, "ࢦྠ෺ޠ": 0, "౷ܭֶೖ໳": 1 }, "C": { "φϧχΞ": 1, "޿ࣙԓ": 0, "ࢦྠ෺ޠ": 1 }, "D": { "φϧχΞ": 1, "ϋϦϙλ": 0, "޿ࣙԓ": 1, "౷ܭֶೖ໳": 1 }, "E": { "φϧχΞ": 0, "޿ࣙԓ": 1, "ࢦྠ෺ޠ": 1, "౷ܭֶೖ໳": 0 }, } ⭕ɿߪೖ ✖ɿਪન͞Εͨʢݟͨʣ͕ະߪೖ - ɿະਪનɻݟͨ͜ͱͳ͍ɻ
  9. 28.

    # δϟοΧʔυ܎਺Λܭࢉ def jaccard_similarity(x, y): intersection = len(x.intersection(y)) union =

    len(x.union(y)) return float(intersection / union) ࠓճ͸ྨࣅ౓ई౓ͷܭࢉʹδϟοΧʔυ܎਺ʢJaccard indexʣΛར༻ →ʮ̎ͭͷू߹ʹؚ·Ε͍ͯΔཁૉͷ͏ͪڞ௨ཁૉ͕઎ΊΔׂ߹ʯΛද͢ɻ Ϣʔβ͕ߪೖͨ͠ΞΠςϜͷू߹Ͳ͏͠ͷ δϟοΧʔυ܎਺Λྨࣅ౓ʹ → ಉ͡ΞΠςϜΛߪೖ͍ͯ͠ΔϢʔβ΄Ͳྨࣅ a = set(key for key, val in data["A"].items() if val == 1) c = set(key for key, val in data["C"].items() if val == 1) >>> a {'φϧχΞ', 'ࢦྠ෺ޠ', 'ϋϦϙλ'} >>> c {'φϧχΞ', 'ࢦྠ෺ޠ'} >>> jaccard_similarity(a, c) 0.6666666666666666 1. ϢʔβͲ͏͠ͷྨࣅ౓Λܭࢉͯ͠ɼᅂ޷ͷࣅ͍ͯΔϢʔβΛݟ͚ͭΔɻ
  10. 29.

    # աڈͷߦಈཤྺʹج͍ͮͨ 2 Ϣʔβͷྨࣅ౓Λܭࢉ def get_similarity(user1, user2): # Ϣʔβͷߦಈཤྺ history1

    = data[user1] history2 = data[user2] # ྆ํͱ΋ͷϢʔβʹਪન͞ΕͨΞΠςϜͷू߹ΛͱΔɻ recommended_items1 = set(history1.keys()) recommended_items2 = set(history2.keys()) recommended_both = recommended_items1.intersection(recommended_items2) #྆ํͱ΋ʹਪન͞ΕͨΞΠςϜ͕ͳ͚Ε͹ྨࣅ౓ 0 ͱ͢Δɻ if len(recommended_both) == 0: return 0.0 # ྆ํͱ΋ͷϢʔβʹਪન͞Ε͍͔ͯͯͭɼߪೖͨ͠ΞΠςϜͷू߹ΛͱΔɻ possitive_items1 = set([key for key, val in history1.items() if key in recommended_both and val == 1]) possitive_items2 = set([key for key, val in history2.items() if key in recommended_both and val == 1]) return jaccard_similarity(possitive_items1, possitive_items2) ϢʔβͲ͏͠ͷྨࣅ౓ͷܭࢉ 1. ϢʔβͲ͏͠ͷྨࣅ౓Λܭࢉͯ͠ɼᅂ޷ͷࣅ͍ͯΔϢʔβΛݟ͚ͭΔɻ
  11. 30.

    ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ʁ ⭕ Ξ

    Π ς Ϝ Ϣʔβ ✖ ✖ ✖ ✖ ✖ ⭕ ⭕ ⭕ ʁ ✖ ⭕ ✖ A B C D ứ ϋ Ϧ ϙ λ Ừ ứ ޿ ࣙ ԓ Ừ ứ ౷ ܭ ֶ ೖ ໳ Ừ ứ ࢦ ྠ ෺ ޠ Ừ ứ φ ϧ χ Ξ Ừ - - - ✖ sim = get_similarity("C", "E") >>> sim 0.3333333333333333 E 1. ϢʔβͲ͏͠ͷྨࣅ౓Λܭࢉͯ͠ɼᅂ޷ͷࣅ͍ͯΔϢʔβΛݟ͚ͭΔɻ ྆ํͱ΋ͷϢʔβʹਪન͞Ε͍͔ͯͯͭɼ ߪೖ͞ΕͨΞΠςϜͷू߹Ͳ͏͠ͷδϟοΧʔυ܎਺Λྨࣅ౓ͱ͢Δ
  12. 32.

    def get_recommends(user): # ର৅Ϣʔβ͕·ͩਪન͞Ε͍ͯͳ͍ΞΠςϜͷू߹ɿਪનର৅ΞΠςϜ new_items = all_items - set(data[user].keys()) #

    ਪનର৅ΞΠςϜͷ༧ଌධՁ஋ΛೖΕΔശ sum_scores = {item: 0.0 for item in new_items} sum_sim = {item: 0 for item in new_items} # # ର৅ϢʔβҎ֎ͷϢʔβ others = list(data.keys()); others.remove(user) for other in others: sim = get_similarity(user, other) for new_item in new_items: if new_item in data[other]: # "ධՁ஋" × "ྨࣅ౓" Λਪન౓ͷείΞͱͯ͠ɼશϢʔβʹ͍ͭͯ߹ܭ͢Δ sum_scores[new_item] += data[other][new_item] * sim # ΞΠςϜ͝ͱͷϢʔβͷྨࣅ౓ͷ߹ܭΛܭࢉ͓͖ͯ͠ɼ্هͷείΞΛׂΔʢՃॏฏۉͷΠϝʔδʣ sum_sim[new_item] += sim recommends = {item: sum_scores[item] / sum_sim[item] for item in new_items} recommends = sorted(recommends.items(), key=lambda x:x[1], reverse=True) return recommends 2. ᅂ޷ͷࣅ͍ͯΔϢʔβͷߦಈཤྺ͔Βɼະ஌ͷΞΠςϜʹର͢Δ༧ଌධՁ஋Λܭࢉ͢Δɻ ྨࣅ౓ͷߴ͍ϢʔβͷධՁ΄ͲॏΈ͕ߴ͘ͳΔΑ͏ʹ༧ଌධՁ஋Λܭࢉ
  13. 33.

    ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ʁ ⭕ Ξ

    Π ς Ϝ Ϣʔβ ✖ ✖ ✖ ✖ ✖ ⭕ ⭕ ⭕ ʁ ✖ ⭕ ✖ A B C D E ứ ϋ Ϧ ϙ λ Ừ ứ ޿ ࣙ ԓ Ừ ứ ౷ ܭ ֶ ೖ ໳ Ừ ứ ࢦ ྠ ෺ ޠ Ừ ứ φ ϧ χ Ξ Ừ - - - ✖ sim_A = get_similarity("C", "A") sim_B = get_similarity("C", "B") sim_D = get_similarity("C", "D") sim_E = get_similarity("C", "E") >>> sim_A, sim_B, sim_D, sim_E 1.0 0.0 0.5 0.3333333333333333 sco re C = sco re A * sim A + sco re B * sim B + sco re D * sim D + sco re E * sim E simA + simB + simD + simE = 0 * 1.0 + 1 * 0.0 + 1 * 0.5 + 0 * 0.333 1.0 + 0.0 + 0.5 + 0.333 = 0.2727... recommends = get_recommends("C") >>> recommends [ ('ϋϦϙλ', 0.6666666666666666), ('౷ܭֶೖ໳', 0.27272727272727276) ] 2. ᅂ޷ͷࣅ͍ͯΔϢʔβͷߦಈཤྺ͔Βɼະ஌ͷΞΠςϜʹର͢Δ༧ଌධՁ஋Λܭࢉ͢Δɻ
  14. 35.

    ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ⭕ ʁ ⭕ Ξ

    Π ς Ϝ Ϣʔβ ✖ ✖ ✖ ✖ ✖ ⭕ ⭕ ⭕ ʁ ✖ ⭕ ✖ A B C D E ứ ϋ Ϧ ϙ λ Ừ ứ ޿ ࣙ ԓ Ừ ứ ౷ ܭ ֶ ೖ ໳ Ừ ứ ࢦ ྠ ෺ ޠ Ừ ứ φ ϧ χ Ξ Ừ - - - ✖ recommends = get_recommends("C") >>> recommends [ ('ϋϦϙλ', 0.6666666666666666), ('౷ܭֶೖ໳', 0.27272727272727276) ] 3. ༧ଌධՁ஋ͷߴ͍΋ͷΛϢʔβʹਪન͢Δɻ ਪન!
  15. 37.

    ·ͱΊ 㾎 ਪનγεςϜͱ͸ʁ w ʮͲΕ͕Ձ஋ͷ͋Δ΋ͷ͔ಛఆ͢ΔͷΛॿ͚Δಓ۩ʯ w ώϡʔϚϯΠϯλϑΣʔεػցֶशɾ౷ܭత༧ଌɾ৘ใݕࡧσʔλϕʔεɾ ฒྻܭࢉɾωοτϫʔΫͳͲͷٕज़ཁૉ͔Β੒Γཱͭ 㾎 ਪનख๏ͷ෼ྨ

    w ϙϐϡϥϦςΟ಺༰ϕʔεڠௐϑΟϧλϦϯάϋΠϒϦου 㾎 ڠௐϑΟϧλϦϯάʢϢʔβϕʔεʣͷ࢓૊Έ w ࣅ͍ͯΔϢʔβΛݟ͚ͭΔʂ w ࣅ͍ͯΔϢʔβͷߦಈཤྺʹԠͯ͡ධՁ஋Λ༧ଌͯ͠ਪનʂ
  16. 38.

    Ref. [Resnick 97] P. Resnick and H. R. Varian. Recommender

    systems. Communications of the ACM, Vol. 40, No. 3, pp. 56–58, 1997. [Konstan 03] J. A. Konstan and J. Riedl. Recommender systems: Collaborating in commerce and communities. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, Tutorial, 2003. http://www.kamishima.net/archive/recsysdoc.pdf https://www.slideshare.net/xamat/recsys-2014-tutorial-the-recommender-problem-revisited http://blog.brainpad.co.jp/entry/2017/02/03/153000 https://www.slideshare.net/KentaOku/ss-50762836 https://qiita.com/hik0107/items/96c483afd6fb2f077985