The most-starred repositories
on GitHub?
spark
apache/spark ★ 12.8k
incubator-predictionio
apache/incubator-predictionio ★ 10.2k
playframework
playframework/playframework ★ 9.3k
scala
scala/scala ★ 8.2k
Slide 40
Slide 40 text
What is PredictionIO?
Slide 41
Slide 41 text
Apache PredictionIO?
Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
Slide 43
Slide 43 text
Apache PredictionIO?
• A machine learning server that combines state-of-the-art open source software
• A predictive engine can be built for any machine learning task
Slide 44
Slide 44 text
Apache PredictionIO lets you
• quickly build and deploy an engine as a web service on production with customizable templates;
  (create a template for each target and deploy it right away)
• respond to dynamic queries in real-time once deployed as a web service;
  (an API accepts a query and returns the result)
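A minimal sketch of what querying a deployed engine could look like. It only builds the HTTP request, so it runs without a server. The host, the port 8000, and the "user" and "num" query fields are assumptions for illustration; the actual query fields depend on the engine template in use.

```python
import json
import urllib.request

# Assumption: the engine is deployed locally and listens on port 8000.
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(user, num, url=ENGINE_URL):
    """Build (but do not send) a POST request that carries a JSON query."""
    # "user" and "num" are illustrative fields; each engine template
    # defines its own query format.
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("1", 4)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running engine:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Keeping request construction separate from sending makes the query format easy to check before an engine is running.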
Slide 45
Slide 45 text
Apache PredictionIO lets you
• evaluate and tune multiple engine variants systematically;
  (mechanisms for tuning errors and for evaluation are included)
• unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics;
  (an interface registers training data in batch or in real time)
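As a rough illustration of registering training data in real time, the sketch below builds one "rate" event as an HTTP request for an Event Server. The local host, the port 7070, the placeholder access key, and the sample user and item IDs are assumptions. The request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: an Event Server runs locally on port 7070.
EVENT_SERVER = "http://localhost:7070"


def build_event_request(access_key, user_id, item_id, rating, server=EVENT_SERVER):
    """Build (but do not send) a POST request that registers one rating event."""
    event = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    # The access key identifies the app that the event belongs to.
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{server}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_ACCESS_KEY", "u1" and "i42" are placeholders, not real values.
event_request = build_event_request("YOUR_ACCESS_KEY", "u1", "i42", 4.0)
print(event_request.full_url)
print(event_request.data.decode("utf-8"))
```

For bulk loads, the deck uses the pio import command instead of individual requests.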
System Architecture
• Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
• Apache HBase up to 1.2.4
• Apache Spark up to 1.6.3 for Hadoop 2.6 (not the Spark 2.x version)
• Elasticsearch up to 1.7.5 (not the Elasticsearch 2.x version)
Slide 60
Slide 60 text
Storage roles
Meta Data Event Data Model Data
✓ ✓ ✓
✓ ✓*
✓
✓
LOCALFS ✓
Slide 73
Slide 73 text
Click Log, Favorite Log → pio import → Event Server
Event Server → Data → Elasticsearch v5.3 cluster
Elasticsearch → RDD → Spark 2 node cluster (ALS Template)
Spark → Model → LOCALFS
Query → Predicted Result
Slide 74
Slide 74 text
Engine Template?
Slide 75
Slide 75 text
Engine
Template
Slide 76
Slide 76 text
D-A-S-E
D: Data Source and Data Preparator
A: Algorithm
S: Serving
E: Evaluation Metrics
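The D-A-S-E split can be pictured as four pluggable pieces wired into one engine. The sketch below is a toy Python analogue and not the PredictionIO API. The Engine class, the most_common_label trainer, and the weather data are all hypothetical and exist only to show how the pieces connect.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Engine:
    read_training: Callable[[], Any]      # D: Data Source
    prepare: Callable[[Any], Any]         # D: Data Preparator
    train: Callable[[Any], Any]           # A: Algorithm (training)
    predict: Callable[[Any, Any], Any]    # A: Algorithm (prediction)
    serve: Callable[[Any, Any], Any]      # S: Serving

    def train_model(self):
        # Data flows D -> A: read, prepare, then train.
        return self.train(self.prepare(self.read_training()))

    def query(self, model, q):
        # A produces a prediction, S decides what is returned.
        return self.serve(q, self.predict(model, q))


def accuracy(engine, model, labelled_queries):
    """E: share of queries whose served result equals the expected one."""
    hits = sum(engine.query(model, q) == expected for q, expected in labelled_queries)
    return hits / len(labelled_queries)


def most_common_label(rows):
    """Toy algorithm: remember the most frequent label for each key."""
    counts = {}
    for key, label in rows:
        counts.setdefault(key, Counter())[label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


engine = Engine(
    read_training=lambda: [("Sunny", "walk"), ("sunny", "walk"),
                           ("rainy", "read"), ("sunny", "read")],
    prepare=lambda rows: [(key.lower(), label) for key, label in rows],
    train=most_common_label,
    predict=lambda model, q: model.get(q, "unknown"),
    serve=lambda q, prediction: prediction,
)

model = engine.train_model()
score = accuracy(engine, model, [("sunny", "walk"), ("rainy", "read"), ("snowy", "walk")])
print(model, score)
```

Because each role is a separate callable, one piece can be swapped (for example a different algorithm) without touching the others, which is the point of the template structure.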
Slide 77
Slide 77 text
Machine Learning Flow
Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
Input Data → Prediction Server → Predicted Result
Slide 86
Slide 86 text
Machine Learning Flow
Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
Input Data → Predictive Model → Predicted Result
D: Data Source & Preparator
A: Algorithm
S: Serving
E: Evaluation Metrics
Slide 92
Slide 92 text
D
Slide 93
Slide 93 text
DataSource
• Reads data from the Event Store (Event Server)
• Returns TrainingData
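The two responsibilities above can be sketched in a few lines. This is a Python analogue under stated assumptions, not the template's actual code. The event store is a plain list of dictionaries, and the Rating class, the read_training method name, and the event field names are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rating:
    user: str
    item: str
    rating: float


@dataclass(frozen=True)
class TrainingData:
    ratings: List[Rating]


class DataSource:
    def __init__(self, event_store, event_names=("rate",)):
        # Assumption: event_store is any iterable of event dictionaries.
        self.event_store = event_store
        self.event_names = set(event_names)

    def read_training(self) -> TrainingData:
        """Read the relevant events and return them as TrainingData."""
        ratings = [
            Rating(e["entityId"], e["targetEntityId"], float(e["properties"]["rating"]))
            for e in self.event_store
            if e["event"] in self.event_names
        ]
        return TrainingData(ratings)


events = [
    {"event": "rate", "entityId": "u1", "targetEntityId": "A", "properties": {"rating": 3}},
    {"event": "view", "entityId": "u1", "targetEntityId": "B", "properties": {}},
    {"event": "rate", "entityId": "u2", "targetEntityId": "A", "properties": {"rating": 1}},
]
training_data = DataSource(events).read_training()
print(training_data)
```

Events that are not in event_names (the "view" event here) are skipped, so the algorithm only sees the data it can train on.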
Slide 110
Slide 110 text
Serving
• Extend LServing
• Implement serve()
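A serving component receives the query together with the predictions of every algorithm and decides what to return. The sketch below shows that idea in Python. ServingBase is a hypothetical stand-in for the base class named on the slide, and both strategies are illustrative rather than taken from a template.

```python
from abc import ABC, abstractmethod


class ServingBase(ABC):
    """Hypothetical base class: subclasses implement serve()."""

    @abstractmethod
    def serve(self, query, predictions):
        ...


class FirstServing(ServingBase):
    """Return the first algorithm's prediction unchanged."""

    def serve(self, query, predictions):
        return predictions[0]


class AverageScoreServing(ServingBase):
    """Average each item's score over all algorithms and keep the top items."""

    def serve(self, query, predictions):
        totals = {}
        for prediction in predictions:
            for item, score in prediction.items():
                totals[item] = totals.get(item, 0.0) + score
        # An item that an algorithm did not score counts as 0 for that algorithm.
        averaged = {item: total / len(predictions) for item, total in totals.items()}
        ranked = sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: query["num"]]


predictions = [{"A": 4.0, "B": 1.0}, {"A": 2.0, "B": 2.0, "C": 5.0}]
first = FirstServing().serve({"num": 2}, predictions)
combined = AverageScoreServing().serve({"num": 2}, predictions)
print(first)
print(combined)
```

With a single algorithm, passing the first prediction through is usually enough. A combining strategy only matters once several algorithms run side by side.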
Slide 115
Slide 115 text
Precision@k
Precision@5 / Threshold = 2.0
Predicted: A ★☆☆, B ★★★, C ★★☆, D ☆☆☆, E ★★☆
Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
PositiveCount: 2.0
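One way to compute Precision@k for data shaped like this slide is sketched below. It rests on several assumptions: each filled star counts as 1.0, the first list holds the predicted items and the second the validation ratings, an item is positive when its validation rating is at or above the threshold, and the divisor is min(k, number of positives). Other definitions divide by k instead.

```python
def precision_at_k(predicted_items, actual_ratings, k, threshold):
    """Share of the top-k predicted items that are positive in the validation data."""
    # Positives: validation items rated at or above the threshold.
    positives = {item for item, rating in actual_ratings.items() if rating >= threshold}
    if not positives:
        # Without any positive item the metric is undefined.
        return None
    hits = sum(1 for item in predicted_items[:k] if item in positives)
    return hits / min(k, len(positives))


# Assumed reading of the slide: star counts converted to numbers.
predicted = ["A", "B", "C", "D", "E"]
validation = {"A": 1.0, "B": 3.0, "X": 2.0, "D": 0.0, "E": 2.0}

score = precision_at_k(predicted, validation, k=5, threshold=2.0)
print(score)
```

Under these assumptions B and E are the two predicted items that are also positive in the validation data, while C is predicted but absent from it.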
Slide 116
Slide 116 text
x ML
5 Jobs
Slide 119
Slide 119 text
Photo by Bernard Spragg. NZ
Conclusion
Slide 120
Slide 120 text
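Training data
• Data can be ingested in real time or in batch
• Access tokens can be issued, which makes integration with each service convenient
• The distributed storage features of Elasticsearch are available
Training execution time
• A Spark cluster is used, so processing is distributed and training time is shortened
Open Source Machine Learning Server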
Slide 121
Slide 121 text
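Trained model storage
• LOCALFS is used in the standby setup
• HDFS can be selected depending on the characteristics of the model
Prediction Web API
• The prediction API can be created with the "pio deploy" command alone
• The API server is based on Akka HTTP
Open Source Machine Learning Server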