Apache PredictionIO - Machine Learning 15minutes! #12

Open Source Machine Learning Server: Machine Learning 15minutes!

A Spark & Elasticsearch machine learning platform, as told by an Apache PredictionIO committer

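Machine learning pain points
• Storage for training data and model data
• A distributed processing framework for machine learning
• Exposing predictions as a web service (API)
Options for solving them → what I want to share today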

About me (Photo by Bernard Spragg. NZ)

Profile: Takahiro Hagino, Bizreach, Inc.
• Job search engine "Stanby"
• AI Office

Open Source Machine Learning Server

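At Bizreach, we are working with Apache PredictionIO as the development and operations platform for our machine learning applications. The following members of our company have joined PredictionIO development as committers:
Shinsuke Sugaya, Naoki Takezoe, Takako Shimamoto, Takahiro Hagino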

x ML
• Search Quality and Recommendation (matching job seekers with job postings)
• Salary Prediction
• Job Category Prediction (industry and occupation)
• Prediction of Job Characteristics


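x ML pain points
• Training data: we prepared a separate batch import script into ES for each dataset. Training environments and execution flows depended on the individual in charge, and the training data format was not standardized.
• Training run time: training took longer and longer as the data volume grew.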

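x ML pain points
• Model storage: where trained models were saved also depended on the feature and the individual in charge.
• Prediction web API: we built a simple API server (with Tornado and the like) for each feature, so changes could not be reflected in the apps quickly. In some cases ES itself was used as the API that returns prediction results.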

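AI Office pain points
• Collaboration with businesses across the company: there are multiple businesses, and we provide machine learning features to them. We want a common interface for data integration with each business, and a common development flow.
• No dedicated infrastructure engineers: setting up infrastructure separately for each feature we provide is wasteful. We want to turn as much as possible into a framework.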

Solution: we looked for a solution

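• Can create trained models from data in S3, Redshift, and RDS
• Can run queries for prediction
• Automatically generates a web application that returns predictions from the trained model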

Solution
• Machine Learning as a Service
• ML Tools
• Open Source

Machine Learning Stacks
• Apps: API Server (Tornado…)
• Algorithm: Scikit-learn, SparkML … DL: Caffe2, DL4J, TensorFlow, Chainer …
• Processing: Hadoop, Spark, Storm …
• Datastore: Elasticsearch, HBase, Redshift …
PredictionIO

PredictionIO?

Salesforce Acquires PredictionIO
Feb 19, 2016 - TechCrunch: "Salesforce acquires machine learning platform PredictionIO"

Salesforce Introduces Salesforce Einstein
Sep 18, 2016 - TechCrunch: "Salesforce's ambition to embrace AI: machine learning platform 'Einstein' announced"

The most-starred repositories on GitHub?
• spark (apache/spark) ★ 12.8k
• incubator-predictionio (apache/incubator-predictionio) ★ 10.2k
• playframework (playframework/playframework) ★ 9.3k
• scala (scala/scala) ★ 8.2k

What is PredictionIO?

Apache PredictionIO?
Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
• A machine learning server that combines state-of-the-art open source software
• Lets you build a prediction engine for any machine learning task

Apache PredictionIO lets you
• quickly build and deploy an engine as a web service on production with customizable templates; (create a template for each target problem and deploy it right away)
• respond to dynamic queries in real-time once deployed as a web service; (there is an API that takes a query and returns a result)

Apache PredictionIO lets you
• evaluate and tune multiple engine variants systematically; (there are also mechanisms for tuning error and for evaluation)
• unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics; (there is an interface for registering training data in batch or in real time)

REST: EventAPI SDK: EventClient
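As a rough illustration of the REST side, the sketch below builds (but does not send) a POST request for the Event Server's `/events.json` endpoint. The host and port (`localhost:7070`), the access key value, the `view` event name, and the `build_event_request` helper are assumptions for this example, not values from the talk.

```python
import json
import urllib.parse
import urllib.request


def build_event_request(base_url, access_key, user_id, item_id, event_name="view"):
    """Build a POST request carrying one user-to-item event (not sent here)."""
    payload = {
        "event": event_name,          # hypothetical event name
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and key; replace with the values of your own Event Server.
req = build_event_request("http://localhost:7070", "YOUR_ACCESS_KEY", "u1", "job42")
print(req.get_method(), req.full_url)
# To actually send the event: urllib.request.urlopen(req)
```

The SDK route (EventClient) wraps the same kind of request, so the payload fields are the part worth getting right.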

Engine Template

Quick Start

Versions Latest Release Version v0.11.0

Quick Start (Japanese): takezoe.hatenablog.com/entry/2017/05/11/132410

Algorithm

System Architecture

System Architecture
• Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
• Apache HBase up to 1.2.4
• Apache Spark up to 1.6.3 for Hadoop 2.6 (not the Spark 2.x version)
• Elasticsearch up to 1.7.5 (not the Elasticsearch 2.x version)

Storage roles: Meta Data, Event Data, Model Data (support varies by storage backend; LOCALFS ✓ for one role only)

Implementation

A recommendation system built with Spark and PredictionIO, a machine learning platform written in Scala

System Requirements: recommendation requirements for Stanby
• User click logs
• Add-to-favorites logs
• Log data is uploaded to S3
• Training runs daily
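For a daily batch of click and favorite logs, one plausible preparation step is to turn each log row into one JSON event per line, which is the file shape the `pio import` command reads. The sketch below assumes each log row has already been parsed into a `(user_id, item_id)` pair; the event names `view` and `favorite` and the helper `logs_to_events` are hypothetical.

```python
import json


def logs_to_events(rows, event_name):
    """Yield one JSON-encoded event per (user_id, item_id) log row."""
    for user_id, item_id in rows:
        yield json.dumps(
            {
                "event": event_name,
                "entityType": "user",
                "entityId": user_id,
                "targetEntityType": "item",
                "targetEntityId": item_id,
            },
            sort_keys=True,
        )


# Illustrative rows only; real rows would come from the log files on S3.
clicks = [("u1", "job42"), ("u2", "job7")]
favorites = [("u1", "job7")]

lines = list(logs_to_events(clicks, "view")) + list(logs_to_events(favorites, "favorite"))
print("\n".join(lines))
```

Writing these lines to a file gives something that can be handed to a batch import, while the same event shape can also be posted one by one in real time.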

A recommendation system built with Elasticsearch and the Taste plugin

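Before introducing PIO
• Created an ES index for the data on every run
• Used a dedicated script for data import
• Ran training with the Elasticsearch Taste plugin: convenient, but generality and the run time under growing data became problems
• Output the training results as bulk files
• Used Elasticsearch as the prediction API
• Managed the execution flow with shell scripts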

Architecture with PIO (components shown in the diagram)
• Click Log, Favorite Log
• pio import
• Event Server
• Data: Elasticsearch v5.3 cluster
• RDD: Spark 2 node cluster
• ALS Template
• Model: LOCALFS
• Query → Predicted Result
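The last step of this flow, sending a query and reading the predicted result, can be sketched as follows. This assumes a deployed engine listening on `localhost:8000` with a `/queries.json` endpoint and a recommendation-style query of the form `{"user": ..., "num": ...}`; the response shape with `itemScores` and the sample values are illustrative, and both helper functions are hypothetical.

```python
import json
import urllib.request


def build_query_request(engine_url, user_id, num):
    """Build a POST request asking for `num` recommendations for one user."""
    body = json.dumps({"user": user_id, "num": num}).encode("utf-8")
    return urllib.request.Request(
        f"{engine_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_items(response_text):
    """Extract item ids from a response assumed to contain an itemScores list."""
    result = json.loads(response_text)
    return [entry["item"] for entry in result.get("itemScores", [])]


req = build_query_request("http://localhost:8000", "u1", 4)

# Made-up response body used only to exercise the parser.
sample = '{"itemScores": [{"item": "job42", "score": 4.1}, {"item": "job7", "score": 3.3}]}'
print(top_items(sample))
# To query a running engine: urllib.request.urlopen(req).read().decode("utf-8")
```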

Engine Template?

Engine Template

D-A-S-E
• D: Data Source and Data Preparator
• A: Algorithm
• S: Serving
• E: Evaluation Metrics

Machine Learning Flow
• Training: Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
• Prediction: Input Data → Prediction Server (uses the Predictive Model) → Predicted Result

Machine Learning Flow mapped to D-A-S-E
• Flow elements: Preprocessing, Predictive Model, Input Data, Predicted Result
• D: Data Source & Preparator
• A: Algorithm
• S: Serving
• E: Evaluation Metrics

DataSource
• Reads data from the Event Store (Event Server)
• Returns TrainingData

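Preparator
• Preprocesses the TrainingData
• Feature extraction
• Holds the processing shared across Algorithms when several are used
• Converts the data to PreparedData and passes it to the Algorithm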

Algorithm
• Implements train(): responsible for training the predictive model; invoked by the pio train command; the model is stored in HDFS (LocalFS)
• Implements predict(): called in real time for each query after deployment

Serving
• Extends LServe
• Implements serve()
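To make the division of labor between the components concrete, here is a small Python analogy of the DataSource → Preparator → Algorithm → Serving chain. It is not the PredictionIO API: real engine templates are written in Scala against PredictionIO's controller classes, and the class names, method names such as `read_training` and `prepare`, and the toy popularity ranking (swapped in for ALS) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class TrainingData:
    events: list  # list of (user, item) pairs


@dataclass
class PreparedData:
    items_by_user: dict  # user -> set of items the user interacted with


class DataSource:
    """D: reads raw events (here from memory instead of an Event Server)."""

    def __init__(self, events):
        self._events = events

    def read_training(self):
        return TrainingData(events=list(self._events))


class Preparator:
    """D: turns TrainingData into the structure the algorithm consumes."""

    def prepare(self, training_data):
        items_by_user = defaultdict(set)
        for user, item in training_data.events:
            items_by_user[user].add(item)
        return PreparedData(items_by_user=dict(items_by_user))


class PopularityAlgorithm:
    """A: toy stand-in for ALS that ranks items by how many users touched them."""

    def train(self, prepared):
        counts = Counter()
        for items in prepared.items_by_user.values():
            counts.update(items)
        return {"popularity": counts, "seen": prepared.items_by_user}

    def predict(self, model, query):
        seen = model["seen"].get(query["user"], set())
        # Sort by descending count, then by item id, so the order is deterministic.
        ranked = sorted(model["popularity"].items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked if item not in seen][: query["num"]]


class Serving:
    """S: combines predictions from one or more algorithms (here: take the first)."""

    def serve(self, query, predictions):
        return predictions[0]


events = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "b"), ("u3", "c")]
prepared = Preparator().prepare(DataSource(events).read_training())
algorithm = PopularityAlgorithm()
model = algorithm.train(prepared)          # what a training run would produce
query = {"user": "u1", "num": 2}
result = Serving().serve(query, [algorithm.predict(model, query)])
print(result)
```

Keeping train() and predict() in the same class while isolating data access and serving is what lets the same template be retrained in batch and queried in real time.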

Precision@k
Precision@5 / Threshold = 2.0

Predicted    Validation
A ★☆☆        A ★☆☆
B ★★★        B ★★★
C ★★☆        X ★★☆
D ☆☆☆        D ☆☆☆
E ★★☆        E ★★☆

PositiveCount: 2.0
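A minimal sketch of a Precision@k computation on this example follows. It uses the common definition (hits in the top k divided by k); an engine template's own metric may normalize differently. It also assumes that the first column is the predicted top-5 list, that the second column holds the validation ratings, that each filled star counts as 1.0, and that an item is positive when its rating is at least the threshold. All of these are readings of the slide, not statements from it.

```python
def precision_at_k(predicted, actual_ratings, k, threshold):
    """Fraction of the top-k predicted items that are positive in the validation data."""
    positives = {item for item, rating in actual_ratings.items() if rating >= threshold}
    hits = sum(1 for item in predicted[:k] if item in positives)
    return hits / k


# Assumed reading of the slide: predicted item order, and validation star ratings.
predicted = ["A", "B", "C", "D", "E"]
validation = {"A": 1.0, "B": 3.0, "X": 2.0, "D": 0.0, "E": 2.0}

score = precision_at_k(predicted, validation, k=5, threshold=2.0)
print(score)  # B and E are the two hits among five predictions
```

Under these assumptions, C is predicted but absent from the validation set and X is positive but never predicted, so only B and E count as hits.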

x ML 5 Jobs

Conclusion

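Training data
• Data can be ingested both in real time and in batch
• Access tokens can be issued, which makes integration with each service convenient
• We benefit from Elasticsearch's distributed storage
Training run time
• Because a Spark cluster is used, processing is distributed and training time is shortened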

Model storage
• Stanby uses LOCALFS
• HDFS can be chosen depending on the characteristics of the model
Prediction web API
• The prediction API can be created with the "pio deploy" command alone
• The API server is based on Akka HTTP

No more dependence on individuals

Appendix

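As Bizreach moves to develop machine learning applications in an organized way, we are working with Apache PredictionIO as our development and operations platform. The following members of our company have joined PredictionIO development as committers:
Shinsuke Sugaya, Naoki Takezoe, Takako Shimamoto, Takahiro Hagino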

How to Contribute to PIO

Add support for Elasticsearch 5.x

jpioug.org

• Japan Apache PredictionIO User Group (JPIOUG). Join Us!

June 30 (FRI): 02 JPIOUG Meetup, 19:30 @Shibuya
[Urgent call] Looking for people who want to give a lightning talk (LT)

August 30 (WED): 03 JPIOUG Meetup, 19:30 @Shibuya
[Casual call] Looking for people who want to give a lightning talk (LT)

jpioug.org

Thank You
