Slide 1

Slide 1 text

My day KKBOX: Survival Log as a Data Scientist. 羅經凱, Assistant Manager, KKStream Data Analytics, Nov. 28, 2017. "A Data Scientist's 1000-Day Survival Log". National Central University, CE6143 - Introduction to Data Science

Slide 2

Slide 2 text

KKBOX since 2004 Music streaming service operated in Taiwan, Hong Kong, Japan, Singapore, and Malaysia. We have rich content (40M high quality songs) and deliver customers personalized experiences.

Slide 3

Slide 3 text

Next wave: Video since 2016 B2C service, aim to provide new TV experience. B2B service, cloud based video solutions to provide the best video experiences that engage your valuable customers on every screen.

Slide 4

Slide 4 text

Me • 2010, KDD Cup Champion as a member • 2014, NTU EE PhD. • 2014, KKBOX Data scientist • 2015, KKBOX DS team lead • 2017, KKStream Data team lead

Slide 5

Slide 5 text

What I do in KKBOX • [Consultant] Support business decision making • strategy to expand video content library? a trade-off between price and customer satisfaction • [Developer] Enhance product perceived quality • system to deliver personalized experience? recommender system, notification optimization, …

Slide 6

Slide 6 text

https://media.netflix.com/en/company-blog/the-power-of-a-picture

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Better Thumbnail

Slide 10

Slide 10 text

1min Summary: 女王偵訊室 S1E1 /Emergency Interrogation Room S1E1

Slide 11

Slide 11 text

Recommender: More like this

Slide 12

Slide 12 text

3 Stages of Me Fledgeling: how I dig out insight Collaborating: how I work with others Advocating: how I ask others to enjoy us

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Collaborating Data Aggregating  Data Mining  Application development 

Slide 15

Slide 15 text

Proof of Concept

Slide 16

Slide 16 text

Performance Eval.

Slide 17

Slide 17 text

Fledgling Fledgeling: how I dig out insight

Slide 18

Slide 18 text

User Behavior Analytics first step to explore

Slide 19

Slide 19 text

How does everyone discover new music?

Slide 20

Slide 20 text

How does everyone discover new music? Survey in HK, 2014: 52.5% (31% in 2004) by popularity; 57.4% by recommendations from other people. Sources: Xiao Hu, Jin Ha Lee and Leanne Ka Yan Wong (2014), Music Information Behaviors and System Preferences of University Students in Hong Kong. [Citation 174] JH Lee, JS Downie (2004), Survey of music information needs, uses, and seeking behaviours: preliminary findings.

Slide 21

Slide 21 text

Social influence is great, and so is popularity. How does everyone discover new music? Survey in HK, 2014: 52.5% (31% in 2004) by popularity; 57.4% by recommendations from other people. Sources: Xiao Hu, Jin Ha Lee and Leanne Ka Yan Wong (2014), Music Information Behaviors and System Preferences of University Students in Hong Kong. [Citation 174] JH Lee, JS Downie (2004), Survey of music information needs, uses, and seeking behaviours: preliminary findings.

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Technology always comes from laziness

Slide 24

Slide 24 text

Figure: proportion of songs (y-axis, 0.0 to 1.0) versus play count (x-axis, 0 to 10000), with curves for 2004, 2008, and 2015.

Slide 25

Slide 25 text

What gets sacrificed is musical diversity

Slide 26

Slide 26 text

How do we nudge users out of their echo chamber?

Slide 27

Slide 27 text

Just pick a random album they haven't heard!

Slide 28

Slide 28 text

The reason behind a recommendation matters

Slide 29

Slide 29 text

So... let's try "worker intelligence" (human labor)

Slide 30

Slide 30 text

Support from a superb editorial department

Slide 31

Slide 31 text

Too exhausting. Any other tricks? Then let's try artificial intelligence

Slide 32

Slide 32 text

Figure: scatter plot of dim1 versus dim2 (both axes −50 to 50), points colored by cluster (1 to 15).

Slide 33

Slide 33 text

Figure: scatter plot of dim1 versus dim2 (both axes −50 to 50), points colored by cluster (1 to 15). Artist labels: 張學友, 張宇, 信樂團,

Slide 34

Slide 34 text

Figure: scatter plot of dim1 versus dim2 (both axes −50 to 50), points colored by cluster (1 to 15). Artist labels: 張學友, 張宇, 信樂團, 范逸臣, 陶吉吉, 蕭 彭佳慧, 齊秦, 杜德偉, 周杰倫, 陳零九, 無印良品, 嚴爵 MC Hot Dog, 張震嶽, 謝和弦 MP魔幻力量, 黃鴻

Slide 35

Slide 35 text

Figure: the cluster scatter plot (clusters 1 to 15). They did not jump very far out of their echo chamber

Slide 36

Slide 36 text

Art is how we decorate space Music is how we decorate time

Slide 37

Slide 37 text

Listening with Purpose

Slide 38

Slide 38 text

Commuting, Studying, Resting. Purpose forms the style of living

Slide 39

Slide 39 text

Two Different Subjects. Figure: two heatmaps of acts by weekday (Monday to Sunday) and hour (00 to 23), one for Subject A and one for Subject B. Labels: late-night hours, working hours.

Slide 40

Slide 40 text

Do users listen regularly? Trace: users who purchase with mycard credits. Y-axis: listening time. X-axis: the 168 hours in a week. Figure: usage by hour of week for User A (User 67158956), with a 24hr interval marked.
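The 168-hour x-axis above can be reproduced by bucketing each listening event into its hour of the week. A minimal sketch, assuming events arrive as Python `datetime` objects; the function names and the sample timestamps are hypothetical, not taken from the talk.

```python
from datetime import datetime
from typing import Iterable, List


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to 0..167, where 0 is Monday 00:00 to 00:59."""
    return ts.weekday() * 24 + ts.hour


def weekly_profile(timestamps: Iterable[datetime]) -> List[int]:
    """Count listening events in each of the 168 hours of a week."""
    profile = [0] * 168
    for ts in timestamps:
        profile[hour_of_week(ts)] += 1
    return profile


# Hypothetical events: two Monday-morning plays and one Tuesday-evening play.
events = [
    datetime(2017, 11, 27, 8, 5),
    datetime(2017, 11, 27, 8, 40),
    datetime(2017, 11, 28, 18, 10),
]
profile = weekly_profile(events)
print(profile[8], profile[42])  # Monday 08:00 bucket, Tuesday 18:00 bucket
```

Counting events is only one choice; summing listening duration per bucket would match the "listening time" y-axis more closely if durations are available.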

Slide 41

Slide 41 text

Figure: usage by hour of week for nine users (67158956, 8729390, 21570083, 21566513, 21574953, 9058153, 69277857, 11757913, 44551330); the first six panels carry a 24hr marker. Labels: regular, irregular.

Slide 42

Slide 42 text

Figure: a grid of period values, almost all 24 hr, with a few at 16, 23, 25, and 26 hr. Most users are periodic.
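The talk does not say how each user's period was estimated. One common approach, sketched here as an assumption, is to take the lag with the highest autocorrelation in an hourly usage series. The search window of 12 to 36 hours and the synthetic user are hypothetical.

```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag`; returns 0.0 for a constant series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def dominant_period(series: Sequence[float], min_lag: int = 12, max_lag: int = 36) -> int:
    """Return the lag (in hours) with the highest autocorrelation in the window."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical user who listens at 08:00 and 18:00 every day for two weeks.
usage = [1.0 if hour % 24 in (8, 18) else 0.0 for hour in range(24 * 14)]
print(dominant_period(usage))  # 24
```

A user whose best lag has a low autocorrelation value could be flagged as irregular; the threshold for that would need tuning on real data.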

Slide 43

Slide 43 text

Figure: usage by hour of day for nine user groups. Group 1: 5.8%, Group 2: 7.3%, Group 3: 11.8%, Group 4: 16.0%, Group 5: 12.8%, Group 6: 13.4%, Group 7: 14.2%, Group 8: 12.4%, Group 9: 6.3%. Many different lifestyles
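Groups like these can be obtained by clustering each user's 24-hour usage profile. The talk does not name the algorithm; the sketch below assumes plain k-means with a deterministic farthest-first initialization, and the four profiles are made up for illustration.

```python
from typing import List, Sequence

Profile = List[float]


def normalize(profile: Sequence[float]) -> Profile:
    """Scale a profile to sum to 1 so heavy and light users are comparable."""
    total = sum(profile)
    return [x / total for x in profile] if total else list(profile)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(profiles: Sequence[Profile], k: int) -> List[Profile]:
    """Deterministic init: start from the first profile, then add the farthest one."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles: Sequence[Profile], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster label for each profile."""
    centroids = farthest_first(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical hourly play counts: two commuters, two office listeners.
commuter_1 = [30 if h == 8 else 25 if h == 18 else 1 for h in range(24)]
commuter_2 = [20 if h == 8 else 28 if h == 18 else 0 for h in range(24)]
office_1 = [10 if 10 <= h < 18 else 0 for h in range(24)]
office_2 = [8 if 10 <= h < 18 else 1 for h in range(24)]

profiles = [normalize(p) for p in (commuter_1, office_1, commuter_2, office_2)]
labels = kmeans(profiles, k=2)
print(labels)
```

Normalizing before clustering groups users by the shape of their day rather than by how much they listen overall.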

Slide 44

Slide 44 text

Commuters (Group 2: 7.3%, Group 5: 12.8%). Usage peaks at 8 a.m. and 6 p.m. Peaks are short, lasting only 20 to 30 minutes. Figure: usage by hour of day, average and median.

Slide 45

Slide 45 text

Office workers (Group 4: 16.0%, Group 7: 14.2%). The usage peak runs from 10:00 to 18:00. The peak is long, lasting 4 to 5 hours. Figure: usage by hour of day.

Slide 46

Slide 46 text

Popularity Time-sensitive

Slide 47

Slide 47 text

Data Visualization preference as vectors

Slide 48

Slide 48 text

How to describe preference • Latent Representation • A multi-dimensional vector, learned from the crowd, specifies a point in a latent space • The similarity between two objects is reflected in their distance in the latent space
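To make "similarity as distance" concrete, here is a minimal sketch that finds the nearest neighbor of a song in a latent space. The 2-D vectors and song names are hypothetical, and Euclidean distance is an assumption; cosine distance is another common choice.

```python
import math
from typing import Dict, List

# Hypothetical 2-D latent vectors; real ones would be learned from listening data.
song_vectors: Dict[str, List[float]] = {
    "ballad_a": [0.9, 0.1],
    "ballad_b": [0.8, 0.2],
    "hiphop_a": [0.1, 0.9],
}


def nearest(name: str, vectors: Dict[str, List[float]]) -> str:
    """Return the other item whose vector is closest to `name` in Euclidean distance."""
    return min(
        (other for other in vectors if other != name),
        key=lambda other: math.dist(vectors[name], vectors[other]),
    )


print(nearest("ballad_a", song_vectors))  # ballad_b
```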

Slide 49

Slide 49 text

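Word to Vector • Words are represented as points in a space, with the following properties • Words with similar meanings lie close together • Relative directions between words preserve meaning, so vector arithmetic is possible • King - Man + Woman = Queen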
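The King - Man + Woman = Queen arithmetic can be shown with toy vectors. This is a sketch with hand-made 2-D embeddings (axis 0 loosely "royalty", axis 1 loosely "gender"), not output from a trained model; every value below is an assumption chosen for illustration.

```python
import math
from typing import Dict, List

# Hand-made toy embeddings, not learned vectors.
vectors: Dict[str, List[float]] = {
    "king": [1.0, 1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "queen": [1.0, -1.0],
    "prince": [0.9, 0.8],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a: str, b: str, c: str) -> str:
    """Solve a - b + c: return the closest word that is not one of the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(analogy("king", "man", "woman"))  # queen
```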

Slide 50

Slide 50 text

Listening history → text passages

Slide 51

Slide 51 text

Music Experience as Words • A run of consecutive listening is like a sentence. • Both "listeners" and "songs" are treated as words. User, Song, User, Song

Slide 52

Slide 52 text

Constructing DeepWalk Graph

Slide 53

Slide 53 text

Including Session

Slide 54

Slide 54 text

Multiple Sessions

Slide 55

Slide 55 text

Multiple Users

Slide 56

Slide 56 text

In 2-D Latent Space. Users, Songs. 蘇打綠, 陳綺貞, 五月天, John Mayer, OneRepublic, Maroon 5

Slide 57

Slide 57 text

Data → Network → Vectors
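The data-to-network step and the walks that feed a word2vec-style model can be sketched as follows. The exact graph used in the talk is not specified; this sketch assumes each user is linked to every song they played and consecutive songs within a session are linked to each other. The sessions, node prefixes, and parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set

# user -> list of sessions, each session an ordered list of song ids (made-up data)
sessions: Dict[str, List[List[str]]] = {
    "user:a": [["song:1", "song:2", "song:3"], ["song:2", "song:4"]],
    "user:b": [["song:3", "song:4", "song:5"]],
}


def build_graph(data: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    """Undirected graph over user and song nodes."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for user, user_sessions in data.items():
        for session in user_sessions:
            for song in session:  # user-song edges
                graph[user].add(song)
                graph[song].add(user)
            for prev, nxt in zip(session, session[1:]):  # consecutive-song edges
                graph[prev].add(nxt)
                graph[nxt].add(prev)
    return graph


def random_walks(graph: Dict[str, Set[str]], walk_length: int,
                 walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """DeepWalk-style uniform random walks; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


graph = build_graph(sessions)
walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(walks[0])
```

The resulting walks can be passed as sentences to any skip-gram trainer, which then yields one vector per user and per song in the same latent space.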

Slide 58

Slide 58 text

Visualisation Framework • Global Trend • Album clusters, • Artist clusters, • … • Individual Preference • Diversity of preference • Factors related to preference • …

Slide 59

Slide 59 text

Representation (TW)

Slide 60

Slide 60 text

An Example

Slide 61

Slide 61 text

Relaxing songs Japanese drama songs Western drama songs Mandarin drama songs

Slide 62

Slide 62 text

Considering time (session)

Slide 63

Slide 63 text

Day and Night

Slide 64

Slide 64 text

Relaxing songs for baby Mandarin pop songs

Slide 65

Slide 65 text

Considering device

Slide 66

Slide 66 text

Account sharing?

Slide 67

Slide 67 text

Korean and Western pop songs Mandarin old songs

Slide 68

Slide 68 text

Personal Preference • There are users with a single musical taste as well as users with several • Account sharing among multiple people can be detected.

Slide 69

Slide 69 text

More applications

Slide 70

Slide 70 text

song / artist / genre

Slide 71

Slide 71 text

Advocating: how I ask others to enjoy us Advocating

Slide 72

Slide 72 text

The easiest way to win an argument: help the other person see things from your perspective

Slide 73

Slide 73 text

Data + Game → Arouse awareness of and interest in the data we have…

Slide 74

Slide 74 text

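We provide the start time and duration of every viewing session for tens of thousands of users. Using the first four months of data as the basis for analysis, contestants predict which series each user will spend the most time watching in the following month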

Slide 75

Slide 75 text

This 14-day game has 63 teams 81 players 334 downloads 835 submissions

Slide 76

Slide 76 text

Gains & Insightful findings from this public game

Slide 77

Slide 77 text

Happy Boss :) Hardworking Boss :)

Slide 78

Slide 78 text

Champion’s secret sauce

Slide 79

Slide 79 text

First-step Observation. In the training dataset: 27% of customers’ labels = the last title in their viewing history; 37% of customers’ labels = a title that appeared in their viewing history; 18% of customers’ labels = a title that never appeared in the training set
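A breakdown like this can be computed by sorting each customer's label into a category. The sketch below assumes the categories are mutually exclusive and checked in the order shown, which the slide does not state; the histories and labels are made-up toy data.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy data: per-user viewing history (oldest first) and true label.
histories: Dict[str, List[str]] = {
    "u1": ["t1", "t2"],
    "u2": ["t3", "t1"],
    "u3": ["t2"],
    "u4": ["t1", "t3"],
}
labels: Dict[str, str] = {"u1": "t2", "u2": "t3", "u3": "t9", "u4": "t2"}


def label_category(history: List[str], label: str, training_titles: Set[str]) -> str:
    """Classify a label relative to the user's history and the whole training set."""
    if history and history[-1] == label:
        return "last_viewed"
    if label in history:
        return "in_history"
    if label not in training_titles:
        return "never_in_training"
    return "other"  # seen by other users only


training_titles = {title for titles in histories.values() for title in titles}
counts = Counter(
    label_category(histories[user], labels[user], training_titles) for user in labels
)
shares = {category: n / len(labels) for category, n in counts.items()}
print(shares)
```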

Slide 80

Slide 80 text

Naïve Baseline. Just fill in the last title id in each individual's view history. You get 27%, which ranks 20th
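The baseline is simple enough to sketch in a few lines. The data layout (a dict of per-user title lists, oldest first) and the toy values are assumptions, not the competition's actual format.

```python
from typing import Dict, List

# Hypothetical toy data, not the competition dataset.
view_history: Dict[str, List[str]] = {
    "u1": ["t1", "t2"],
    "u2": ["t3"],
    "u3": ["t4", "t5"],
    "u4": ["t6", "t7"],
}
true_labels: Dict[str, str] = {"u1": "t2", "u2": "t9", "u3": "t4", "u4": "t8"}


def last_title_baseline(history: Dict[str, List[str]]) -> Dict[str, str]:
    """Predict, for each user, the last title id in their view history."""
    return {user: titles[-1] for user, titles in history.items() if titles}


def accuracy(predictions: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of users whose predicted title matches the true label."""
    hits = sum(predictions.get(user) == title for user, title in truth.items())
    return hits / len(truth)


predictions = last_title_baseline(view_history)
print(accuracy(predictions, true_labels))  # 0.25
```

A baseline like this is worth computing first because it sets the floor any learned model has to beat.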

Slide 81

Slide 81 text

Comments from a participant

Slide 82

Slide 82 text

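Conclusion * Only by constantly measuring yourself can you know whether you are improving! * Data that is not put to good use is wasted. Persuade more colleagues to join in!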

Slide 83

Slide 83 text

One more thing

Slide 84

Slide 84 text

I'm gonna make him a notification he can't refuse http://bit.ly/kktv_dg_1711

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

No content