The most-starred repositories on GitHub?
spark: apache/spark ★ 12.8k
incubator-predictionio: apache/incubator-predictionio ★ 10.2k
playframework: playframework/playframework ★ 9.3k
scala: scala/scala ★ 8.2k
Slide 37
What is PredictionIO?
Slide 38
Apache PredictionIO?
Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
Slide 40
Apache PredictionIO?
A machine learning server that combines state-of-the-art open source software
You can build a predictive engine for any machine learning task
Slide 41
Apache PredictionIO lets you:
• quickly build and deploy an engine as a web service in production with customizable templates (create a template for each task and deploy it right away);
• respond to dynamic queries in real time once deployed as a web service (an API accepts a query and returns the result);
Slide 42
Apache PredictionIO lets you:
• evaluate and tune multiple engine variants systematically (built-in mechanisms for tuning and evaluation);
• unify data from multiple platforms in batch or in real time for comprehensive predictive analytics (interfaces for registering training data in batch or in real time).
PIO CLI
eventserver: Launch an Event Server
app: Manage apps that are used by the Event Server
build: Build an engine at the current directory
train: Kick off a training using an engine
deploy: Deploy an engine as an engine server
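After `pio deploy` starts an engine server, predictions are requested over HTTP. The sketch below assumes the engine server runs on its default port 8000 and uses the `user`/`num` query fields of the recommendation engine template; the user ID is hypothetical, and other templates define their own query fields.

```python
import json
import urllib.request

# Assumption: the engine was started with `pio deploy` on its default port 8000.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(user_id: str, num: int, url: str = ENGINE_URL) -> urllib.request.Request:
    """Build (without sending) a JSON POST request for a deployed engine.

    The {"user": ..., "num": ...} body follows the recommendation template;
    every engine template defines its own query fields.
    """
    body = json.dumps({"user": user_id, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_engine(user_id: str, num: int) -> dict:
    """Send the query and decode the JSON prediction (needs a running engine)."""
    with urllib.request.urlopen(build_query_request(user_id, num), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_query_request("u1", 4)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the query format easy to inspect without a running engine server.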
Slide 56
Photo by Bernard Spragg. NZ
System Architecture
Slide 57
System Architecture
Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
Apache HBase up to 1.2.4
Apache Spark up to 1.6.3 for Hadoop 2.6 (not Spark 2.x)
Elasticsearch up to 1.7.5 (not Elasticsearch 2.x)
Slide 59
Storage roles
Meta Data Event Data Model Data
✓ ✓ ✓
✓ ✓*
✓
✓
LOCALFS ✓
Collaborative Filtering
          Job A    Job B    Job C    Similarity
User X    View     Through  -        1
User A    View     Through  View     1
User B    Through  View     Through  -1
User C    View     View     View     0.5
Recommended                 1.5
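The Recommended value of 1.5 for Job C can be reproduced as a similarity-weighted sum of the other users' actions. The sketch below assumes View counts as 1 and Through (the user passed over the job) as 0; the slide does not state this encoding, but it is the one that yields 1.5. Many collaborative filtering variants also divide by the sum of the absolute similarities, which would give 0.6 instead.

```python
# Assumed numeric encoding of the slide's actions (not stated on the slide):
# View = 1 (interested), Through = 0 (the user passed over the job).
SCORE = {"View": 1.0, "Through": 0.0}

# Similarity of each user to User X, copied from the "Similarity" column.
similarity = {"User A": 1.0, "User B": -1.0, "User C": 0.5}

# Each neighbour's action on Job C, which User X has not seen yet.
job_c_actions = {"User A": "View", "User B": "Through", "User C": "View"}


def predicted_score(sims: dict, actions: dict) -> float:
    """Unnormalised similarity-weighted sum of the neighbours' scores for one item."""
    return sum(sims[user] * SCORE[action] for user, action in actions.items())


print(predicted_score(similarity, job_c_actions))  # 1*1 + (-1)*0 + 0.5*1 = 1.5
```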
Slide 74
In more detail
Slide 75
System Requirements
Stanby's recommendation requirements:
• User click logs
• Logs of jobs added to favorites
• Log data is already uploaded to S3
• Training runs daily
Diagram: Click Log and Favorite Log → pio import → Event Server → ALS Template
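`pio import` loads a file containing one event JSON object per line. The sketch below converts click and favorite log records into that shape; the log record layout, the sample IDs and the mapping of a click to a `view` event are hypothetical, since the slides do not show Stanby's log schema or the event names its ALS template reads.

```python
import json
from datetime import datetime, timezone

# Hypothetical log records: (log type, user id, job id, UNIX timestamp).
# The actual Stanby log schema is not shown on the slides.
raw_logs = [
    ("click", "u1", "job-100", 1500000000),
    ("favorite", "u1", "job-200", 1500000060),
]

# Assumed mapping from log type to event name; an engine template's
# DataSource decides which event names it actually reads.
EVENT_NAME = {"click": "view", "favorite": "favorite"}


def to_event(log_type: str, user_id: str, job_id: str, ts: int) -> dict:
    """Convert one log record into a PredictionIO-style event dictionary."""
    event_time = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "event": EVENT_NAME[log_type],
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": job_id,
        "eventTime": event_time.isoformat(timespec="milliseconds"),
    }


def to_jsonl(records) -> str:
    """Serialise records as one JSON event per line, the layout `pio import` reads."""
    return "".join(json.dumps(to_event(*r)) + "\n" for r in records)


if __name__ == "__main__":
    # Save this output to a file and load it with:
    #   pio import --appid <app id> --input <file>
    print(to_jsonl(raw_logs), end="")
```

Because the logs already sit in S3 and training runs daily, a conversion step like this could run once per day before `pio train`.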
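Slide 84
Architecture diagram:
• Click Log and Favorite Log, loaded with pio import into the Event Server
• Event Server data stored on an Elasticsearch v5.3 cluster
• ALS Template trained on a 2-node Spark cluster (RDD)
• Trained Model stored on LOCALFS
• Query → Predicted Result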
Slide 85
Engine Template?
Slide 88
D-A-S-E
D: Data Source and Data Preparator
A: Algorithm
S: Serving
E: Evaluation Metrics
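To make the division of labour concrete, here is a toy, hypothetical Python rendering of D, A and S wired together. PredictionIO's real components are Scala classes; the class names, the event dictionaries and the popularity "algorithm" below are illustrative assumptions, and E (evaluation) is left out.

```python
from collections import Counter
from dataclasses import dataclass

# Toy, hypothetical rendering of the DASE roles in plain Python.
# Only the division of responsibilities mirrors PredictionIO.


@dataclass
class TrainingData:
    views: list  # (user, item) pairs


class DataSource:
    """D: read raw events and return TrainingData."""

    def __init__(self, events: list):
        self.events = events

    def read_training(self) -> TrainingData:
        return TrainingData([(e["user"], e["item"]) for e in self.events if e["event"] == "view"])


class Preparator:
    """D: preprocess TrainingData (here: drop repeated views)."""

    def prepare(self, td: TrainingData) -> TrainingData:
        return TrainingData(sorted(set(td.views)))


class PopularityAlgorithm:
    """A: train a model, then predict for each query."""

    def train(self, prepared: TrainingData) -> Counter:
        return Counter(item for _, item in prepared.views)

    def predict(self, model: Counter, query: dict) -> list:
        return [item for item, _ in model.most_common(query["num"])]


class Serving:
    """S: turn the algorithms' predictions into the final result."""

    def serve(self, query: dict, predictions: list) -> list:
        return predictions[0]  # a single algorithm: pass its result through


def run_engine(events: list, query: dict) -> list:
    td = DataSource(events).read_training()
    prepared = Preparator().prepare(td)
    algorithm = PopularityAlgorithm()
    model = algorithm.train(prepared)
    return Serving().serve(query, [algorithm.predict(model, query)])


events = [
    {"event": "view", "user": "u1", "item": "j1"},
    {"event": "view", "user": "u1", "item": "j1"},  # duplicate, removed by Preparator
    {"event": "view", "user": "u2", "item": "j1"},
    {"event": "view", "user": "u3", "item": "j1"},
    {"event": "view", "user": "u1", "item": "j2"},
    {"event": "view", "user": "u2", "item": "j2"},
    {"event": "view", "user": "u1", "item": "j3"},
    {"event": "favorite", "user": "u3", "item": "j3"},  # ignored by DataSource
]
print(run_engine(events, {"num": 2}))  # ['j1', 'j2']
```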
Slide 89
Machine Learning Flow
Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
Input Data → Prediction Server → Predicted Result
Slide 98
Machine Learning Flow
Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
Input Data → Predictive Model → Predicted Result
Diagram labels: D Data Source & Preparator, A Algorithm, S Serving, E Evaluation Metrics
Slide 105
DataSource
• Reads data from the Event Store (Event Server)
• Returns TrainingData
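A DataSource for an implicit-feedback engine typically collapses raw events into per-(user, item) strengths, a common input for implicit-feedback ALS. The sketch below uses an in-memory list in place of the Event Store, and its event weights are hypothetical; the slides give no weighting.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical weights for implicit feedback; the slides do not give any.
EVENT_WEIGHTS = {"view": 1.0, "favorite": 3.0}


@dataclass
class TrainingData:
    # (user id, item id) -> accumulated implicit-feedback strength
    ratings: dict = field(default_factory=dict)


def read_training(event_store: list) -> TrainingData:
    """Stand-in for a DataSource: read events, return TrainingData.

    `event_store` is an in-memory list of event dictionaries standing in
    for the Event Server; a real engine would query the Event Store.
    """
    totals = defaultdict(float)
    for e in event_store:
        weight = EVENT_WEIGHTS.get(e["event"])
        if weight is None or e.get("entityType") != "user":
            continue  # skip events this engine does not use
        totals[(e["entityId"], e["targetEntityId"])] += weight
    return TrainingData(dict(totals))


store = [
    {"event": "view", "entityType": "user", "entityId": "u1", "targetEntityId": "j1"},
    {"event": "view", "entityType": "user", "entityId": "u1", "targetEntityId": "j1"},
    {"event": "favorite", "entityType": "user", "entityId": "u1", "targetEntityId": "j1"},
    {"event": "view", "entityType": "user", "entityId": "u2", "targetEntityId": "j2"},
    {"event": "buy", "entityType": "user", "entityId": "u2", "targetEntityId": "j3"},
]
print(read_training(store).ratings)
```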
Slide 123
Serving
• Extends LServing
• Implements serve()
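In PredictionIO, `serve()` receives the query together with one predicted result per algorithm, so Serving is where several algorithms' outputs get combined. The Python sketch below only mirrors that shape; averaging item scores across algorithms is an illustrative choice, not PredictionIO's default behaviour, and the item IDs are hypothetical.

```python
from collections import defaultdict


def serve(query: dict, predicted_results: list) -> list:
    """Merge (item, score) lists returned by several algorithms.

    Mirrors the shape of LServing.serve(query, predictedResults);
    averaging the scores is an illustrative assumption.
    """
    totals = defaultdict(float)
    for result in predicted_results:
        for item, score in result:
            totals[item] += score / len(predicted_results)
    # Highest average score first; ties broken by item ID for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[: query["num"]]


algorithm_outputs = [
    [("j1", 0.9), ("j2", 0.5)],  # e.g. from one algorithm
    [("j2", 0.7), ("j3", 0.4)],  # e.g. from another algorithm
]
print(serve({"num": 2}, algorithm_outputs))
```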
Cross-validation
Diagram: the Training Data is split into Validation Data and Training Data, and the split is repeated x10
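The x10 on the slide corresponds to 10-fold cross-validation: the data is divided into ten parts, and each part serves once as validation data while the other nine are used for training. A minimal sketch, assuming a simple striped assignment (element n goes to fold n % k) rather than whatever split a particular engine template uses:

```python
def k_fold_splits(data: list, k: int = 10):
    """Yield (training, validation) pairs for k-fold cross-validation.

    Element n goes to fold n % k, so each element is validated exactly
    once; shuffle `data` first if its order carries any pattern.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    folds = [data[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation


splits = list(k_fold_splits(list(range(20)), k=10))
for training, validation in splits[:2]:
    print(len(training), validation)
```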
Slide 131
Grid Search
Diagram: a grid over Parameter A and Parameter B
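Grid search evaluates every combination of candidate values for Parameter A and Parameter B and keeps the best-scoring pair. For an ALS engine the two parameters might be, say, the rank and the regularisation parameter lambda; that pairing and the toy scoring function below are hypothetical stand-ins for a cross-validated metric.

```python
from itertools import product


def grid_search(values_a, values_b, score):
    """Score every (A, B) pair on the grid and return the best pair plus all scores."""
    results = {(a, b): score(a, b) for a, b in product(values_a, values_b)}
    best = max(results, key=results.get)
    return best, results


def toy_score(a, b):
    # Hypothetical stand-in for a cross-validated metric; peaks at A=3, B=0.1.
    return -((a - 3) ** 2) - (b - 0.1) ** 2


best, results = grid_search([1, 3, 5], [0.01, 0.1, 1.0], toy_score)
print(best, len(results))
```

The cost grows with the product of the grid sizes, which is why each grid point is usually scored with a cheap but reliable metric.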
Slide 139
Precision@k
Precision@5 / Threshold = 2.0
Predicted: A B C D E
Validation ratings:
A ★☆☆
B ★★★
X ★★☆
D ☆☆☆
E ★★☆
PositiveCount: 2.0
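With the threshold of 2.0, the validation items rated at least two stars are B, X and E; of the five predicted items, only B and E are among them, which appears to be the PositiveCount of 2.0 on the slide. The sketch below computes the textbook Precision@k (hits divided by k) under the assumption that the threshold is inclusive; PredictionIO's own metric classes may normalise differently.

```python
def precision_at_k(predicted: list, ratings: dict, k: int = 5, threshold: float = 2.0) -> float:
    """Textbook Precision@k: relevant items in the top k, divided by k.

    An item is relevant when its validation rating is >= threshold;
    items missing from the validation data count as not relevant.
    """
    relevant = {item for item, rating in ratings.items() if rating >= threshold}
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / k


predicted = ["A", "B", "C", "D", "E"]
# Star counts from the slide's validation data, as numbers.
validation = {"A": 1.0, "B": 3.0, "X": 2.0, "D": 0.0, "E": 2.0}
print(precision_at_k(predicted, validation))  # B and E are hits: 2 / 5 = 0.4
```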
Slide 140
x ML
5 Jobs
Slide 143
Photo by Bernard Spragg. NZ
Conclusion
Slide 144
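Training data
• Interfaces for ingesting data in real time or in batch
• Access tokens can be issued, so integrating with each service is easy
• Takes advantage of Elasticsearch's distributed storage
Training time
• Runs on a Spark cluster, so processing is distributed and training time is reduced
Open Source Machine Learning Server
Slide 145
Model storage
• Stanby uses LOCALFS
• HDFS can be chosen instead, depending on the characteristics of the model
Prediction Web API
• The "pio deploy" command alone creates the prediction API
• The API server is based on Akka-HTTP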
Slide 146
Case Studies
Slide 147
Prediction for Reject Ratio (predicting document-screening pass, job offer and offer acceptance)
Salary Prediction (estimating the annual salary of a job posting)
Job description writing-bot (automatically generating job posting text)