Applications and Challenges of Streaming Service Data

Transferring insight from data into an application is a challenge; persuading your team to deliver it as a product service is even more challenging!

Jing-Kai Lou

July 23, 2017

Transcript

  1. Self Introduction

    • 羅經凱, Jing-Kai Lou • Data Scientist, KKStream, 2016 - now • Data Scientist, KKBOX, 2014 - 2016 • Ronald Coase: “If you torture the data long enough, it will confess.”
  2. Communicate • A way to find niche markets

    • How our titles perform • Which kinds are alluring • How to measure satisfaction instead of popularity? • How users binge-watch • This measure leads us to fascinating titles • Better understanding of user preferences • Observe & Measure • Analysis & Forecast • Smart content purchase (smart expansion of the content library)
  3. Communicate • Be confident to release a better version

    • Which rec. sys performs better? • CTR of related items • Apply A/B testing to tell which one is better • Product Optimization • Observe & Measure • Analysis & Forecast
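The slide does not say how the A/B comparison is evaluated. One common choice, shown here as a minimal sketch, is a two-proportion z-test on click-through rate. The function name `ctr_z_test` and all click counts are hypothetical.

```python
import math

def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test comparing the CTR of variant A and variant B."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value

# Made-up counts for two recommender variants (assumption, not from the deck).
ctr_a, ctr_b, z, p_value = ctr_z_test(480, 10_000, 560, 10_000)
```

A small p-value suggests the CTR difference is unlikely to be noise; the significance threshold is a product decision.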
  4. How does everyone discover new music? (survey in HK, 2014)

    • 52.5% (31% in 2004) by popularity • 57.4% by recommendations from other people • Sources: Xiao Hu, Jin Ha Lee and Leanne Ka Yan Wong (2014), Music Information Behaviors and System Preferences of University Students in Hong Kong; [Citation 174] JH Lee, JS Downie (2004), Survey of music information needs, uses, and seeking behaviours: preliminary findings
  5. Social influence is great, and so is popularity.

    • How does everyone discover new music? (survey in HK, 2014) • 52.5% (31% in 2004) by popularity • 57.4% by recommendations from other people • Sources: Xiao Hu, Jin Ha Lee and Leanne Ka Yan Wong (2014), Music Information Behaviors and System Preferences of University Students in Hong Kong; [Citation 174] JH Lee, JS Downie (2004), Survey of music information needs, uses, and seeking behaviours: preliminary findings
  6. Plays gradually concentrate on popular songs

    [Figure: axes labeled play count (0 to 10000) and proportion of songs (0.0 to 1.0)]
  7. [Figure: 2-D scatter plot, dim1 vs. dim2 (both −50 to 50), points colored by cluster, clusters 1 to 15]
  8. [Figure: 2-D scatter plot, dim1 vs. dim2 (both −50 to 50), points colored by cluster, clusters 1 to 15]

    Highlighted artists: 張學友, 張宇, 信樂團, 陳曉東, 劉德華
  9. [Figure: 2-D scatter plot, dim1 vs. dim2 (both −50 to 50), points colored by cluster, clusters 1 to 15]

    Highlighted artist groups: 張學友, 張宇, 信樂團, 陳曉東, 劉德華; 范逸臣, 陶吉吉, 蕭敬騰, 陳奕迅; 彭佳慧, 齊秦, 杜德偉, 吳宗憲; 周杰倫, 陳零九, 無印良品, 嚴爵; MC Hot Dog, 張震嶽, 謝和弦; MP魔幻力量, 黃鴻升, 蔡旻佑
  10. Is there a better way?

    [Figure: the same cluster plot, clusters 1 to 15]
  11. Two Different Subjects

    [Figure: two heatmaps of activity counts (acts) by hour of day (00 to 23) and weekday (Monday to Sunday), one for Subject A and one for Subject B; annotations: late-night hours, working hours]
  12. Do users listen regularly?

    Trace: users who purchase with MyCard credits • Y-axis: listening time • X-axis: the 168 hours in a week • [Figure: usage over the hours in a week for User 67158956 (User A), ticks at Mon, Wed, Fri, annotated "24hr"]
  13. Regular vs. irregular

    [Figure: nine panels of usage over the hours in a week (ticks at Mon, Wed, Fri) for Users 67158956, 8729390, 21570083, 21566513, 21574953, 9058153, 69277857, 11757913 and 44551330; the first six panels are annotated "24hr"; panels are labeled regular or irregular]
  14. Most users show periodicity

    [Figure: grid of per-user periods; nearly all are 24 hr, with a few at 16, 23, 25 or 26 hr]
  15. [Figure: nine panels of usage over the hours in a day (ticks at 0, 6, 12, 18), one per user group]

    • Group 1: 5.8% • Group 2: 7.3% • Group 3: 11.8% • Group 4: 16.0% • Group 5: 12.8% • Group 6: 13.4% • Group 7: 14.2% • Group 8: 12.4% • Group 9: 6.3%
  16. Commuters

    • Usage peaks at 8 in the morning and 6 in the evening • Peaks are short, lasting only 20 to 30 minutes • [Figure: usage over the hours in a day for Group 2 (7.3%) and Group 5 (12.8%), with average and median curves]
  17. How you describe pref.

    • Latent Representation • Each object is a multi-dimensional vector learned from the crowd, i.e. a point in a latent space • The similarity between two objects is reflected in their distance in the latent space
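A minimal sketch of "similarity as distance in the latent space". The three vectors and their names are hypothetical; in practice they would be learned from usage data, and the deck does not say which distance it uses, so both Euclidean distance and cosine similarity are shown.

```python
import math

# Hypothetical 3-dimensional latent vectors (assumption, not from the deck).
latent = {
    "title_a": [0.9, 0.1, 0.3],
    "title_b": [0.8, 0.2, 0.4],
    "title_c": [-0.5, 0.9, -0.2],
}

def euclidean(u, v):
    """Straight-line distance between two points in the latent space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two latent vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(name, vectors):
    """Return the other object closest to `name` in the latent space."""
    others = (k for k in vectors if k != name)
    return min(others, key=lambda k: euclidean(vectors[name], vectors[k]))
```

With these made-up vectors, `nearest("title_a", latent)` picks `title_b`, the point that lies closest to `title_a`.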
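  18. Accuracy / Diversity / Novelty

    • Accuracy: given a user's past history, can we predict what the user will watch next? For example, given six months of data, we hide the last two months and use only the first four months to predict what happens in the last two. • Diversity: recommendations should avoid catering to a single taste and should make full use of the catalog we hold (coverage). I know this user likes superheroes, but we cannot recommend only superhero movies forever. • Novelty: the new items we surface (new series, or titles that few people have watched) should trigger a positive emotional response in the user. This is currently the hardest part and one of the issues everyone is working on. (Researchers argue that relying on CTR alone overstates the effect of a recommender system; see the article.)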
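The hold-out evaluation and the coverage idea on this slide can be sketched as follows. The view log, the cutoff date, the recommendation lists and the function names are all made up for illustration, and a per-user hit rate is only one possible accuracy measure.

```python
from datetime import date

# Hypothetical view log: (user, title, viewing date).
views = [
    ("u1", "t1", date(2017, 1, 5)),
    ("u1", "t2", date(2017, 3, 9)),
    ("u1", "t3", date(2017, 5, 2)),
    ("u2", "t2", date(2017, 2, 11)),
    ("u2", "t4", date(2017, 6, 20)),
]
cutoff = date(2017, 5, 1)  # first four months for training, last two hidden

train = [v for v in views if v[2] < cutoff]
held_out = {}
for user, title, day in views:
    if day >= cutoff:
        held_out.setdefault(user, set()).add(title)

def hit_rate(recommendations, held_out):
    """Share of users with at least one recommended title in the hidden period."""
    hits = sum(1 for user, seen in held_out.items()
               if seen & set(recommendations.get(user, [])))
    return hits / len(held_out)

def coverage(recommendations, catalog):
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {t for titles in recommendations.values() for t in titles}
    return len(recommended & catalog) / len(catalog)

# Recommendations a model might produce from `train` (made up here).
recs = {"u1": ["t3", "t4"], "u2": ["t1", "t3"]}
catalog = {"t1", "t2", "t3", "t4", "t5"}
accuracy = hit_rate(recs, held_out)
catalog_coverage = coverage(recs, catalog)
```

Novelty is left out on purpose: the slide describes it as a user's emotional response, which a view log alone does not capture.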
  19. This 14-day game has

    • 63 teams • 81 players • 334 downloads • 835 submissions • Internal champion
  20. First-step Observation

    In the training dataset: • for 27% of customers, the label is the last title seen in their view history • for 37% of customers, the label is a title that appeared in their view history • for 18% of customers, the label is a title that never appeared in the training set
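These three shares can be computed directly from the training rows. The sketch below uses a hypothetical row layout (`history`, `label`) and made-up data. It assumes "appeared in history" includes the last-viewed case and that "never appeared" means absent from every user's history; the slide does not spell out either point.

```python
# Hypothetical training rows: view history (oldest first) and the label to predict.
rows = [
    {"history": ["t1", "t2"], "label": "t2"},
    {"history": ["t3", "t1"], "label": "t3"},
    {"history": ["t2"], "label": "t4"},
    {"history": ["t1", "t3"], "label": "t2"},
]

# Every title that occurs in any user's history.
seen_anywhere = {t for row in rows for t in row["history"]}
n = len(rows)

# Label equals the last title in the user's own history.
same_as_last = sum(row["label"] == row["history"][-1] for row in rows) / n
# Label occurs anywhere in the user's own history (includes the case above).
in_own_history = sum(row["label"] in row["history"] for row in rows) / n
# Label is a title no user has in their history.
never_seen = sum(row["label"] not in seen_anywhere for row in rows) / n
```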
  21. Naïve Baseline

    Just fill in the last title id in each individual's view history. You get 27%, namely rank 20th.
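The baseline is simple enough to sketch in a few lines. The data layout (a dict of per-user title lists, oldest first) and the function names are assumptions, and the toy data is made up.

```python
def last_title_baseline(histories):
    """Predict, for each user, the last title id in their view history."""
    return {user: titles[-1] for user, titles in histories.items() if titles}

def accuracy(predictions, labels):
    """Share of users whose predicted title matches the label."""
    correct = sum(predictions.get(user) == label for user, label in labels.items())
    return correct / len(labels)

# Made-up histories and labels for three users.
histories = {"u1": ["t1", "t2"], "u2": ["t3"], "u3": ["t2", "t1", "t4"]}
labels = {"u1": "t2", "u2": "t5", "u3": "t1"}

predictions = last_title_baseline(histories)
score = accuracy(predictions, labels)
```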
  22. Transition Matrix

    In the training data, we observe how users move between titles over time, e.g. 甄嬛傳 → 甄嬛傳 → 甄嬛傳 → 琅琊榜 → 月薪嬌妻
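A minimal sketch of a first-order transition matrix built from view sequences, stored as nested counters rather than a dense matrix. It reads the titles on the slide as one user's viewing sequence (an assumption) and adds a made-up second user; the function names are hypothetical and the deck's actual implementation is not shown.

```python
from collections import Counter, defaultdict

def build_transitions(histories):
    """Count how often title `b` is viewed right after title `a`."""
    counts = defaultdict(Counter)
    for titles in histories:
        for current, following in zip(titles, titles[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, history):
    """Most frequent successor of the last viewed title; fall back to that title."""
    last = history[-1]
    successors = counts.get(last)
    if successors:
        return successors.most_common(1)[0][0]
    return last

histories = [
    ["甄嬛傳", "甄嬛傳", "甄嬛傳", "琅琊榜", "月薪嬌妻"],  # titles from the slide
    ["甄嬛傳", "甄嬛傳", "琅琊榜"],  # made-up extra user
]
counts = build_transitions(histories)
```

Because repeat views of the same title are counted as self-transitions, the matrix can reproduce the "last title viewed" baseline where that pattern dominates and deviate from it where another successor is more frequent.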
  23. Benefits from Transition Matrix

    Collaborative filtering and matrix factorization do not reflect our finding (the last title viewed is often the answer), whereas the transition matrix method supports it. Based on it, we are confident that we can push the score above the baseline. 0.27421
  24. Next Observation

    Treating it as a sequential problem, we overlooked the time spent on each title. Individuals spend time differently on titles: some view a title for no more than 5 minutes and never watch it again. Longer time spent = favorite.
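One way to act on this observation, sketched under assumptions: sum the minutes each user spent per title, drop titles whose total stays within the 5-minute mark from the slide, and rank the rest by time spent. The log format and the function name are hypothetical, and the deck does not say how time spent was actually used.

```python
from collections import defaultdict

MIN_MINUTES = 5  # threshold from the slide: brief views that were never resumed

def favorite_titles(view_log):
    """Rank each user's titles by total minutes watched, ignoring brief one-off views."""
    totals = defaultdict(lambda: defaultdict(float))
    for user, title, minutes in view_log:
        totals[user][title] += minutes
    ranked = {}
    for user, per_title in totals.items():
        # Keep only titles whose total viewing time exceeds the threshold.
        kept = {t: m for t, m in per_title.items() if m > MIN_MINUTES}
        ranked[user] = sorted(kept, key=kept.get, reverse=True)
    return ranked

# Made-up log: (user, title, minutes watched in one session).
view_log = [
    ("u1", "t1", 3.0),
    ("u1", "t2", 45.0),
    ("u1", "t3", 120.0),
    ("u1", "t2", 50.0),
]
favorites = favorite_titles(view_log)
```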