
Apache PredictionIO - Machine Learning 15minutes! #12

Lightning talk slides from the 12th Machine Learning 15minutes!, held on 2017/05/27.
"An Apache PredictionIO committer on a Spark & Elasticsearch machine learning platform"


May 29, 2017

More Decks by takahiro-hagino


  1. x ML: Search Quality and Recommendation (matching job seekers with job postings), Salary Prediction, Job Category Prediction (industry and occupation), Prediction of Job Characteristics
  8. Machine Learning Stacks
    Apps: API Server (Tornado…)
    Algorithm: scikit-learn, Spark ML … DL: Caffe2, DL4J, TensorFlow, Chainer …
    Processing: Hadoop, Spark, Storm …
    Datastore: Elasticsearch, HBase, Redshift …
  9. Machine Learning Stacks
    Apps: API Server (Tornado…)
    Algorithm: scikit-learn, Spark ML … DL: Caffe2, DL4J, TensorFlow, Chainer …
    Processing: Hadoop, Spark, Storm …
    Datastore: Elasticsearch, HBase, Redshift …
    PredictionIO
  10. The most-starred repositories on GitHub?
    spark (apache/spark) ★ 12.8k
    incubator-predictionio (apache/incubator-predictionio) ★ 10.2k
    playframework (playframework/playframework) ★ 9.3k
    scala (scala/scala) ★ 8.2k
  11. Apache PredictionIO? Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
  12. Apache PredictionIO? The same definition, summarized: a machine learning server that combines state-of-the-art open source software.
  13. Apache PredictionIO? The same definition, summarized: a machine learning server that combines state-of-the-art open source software, with which you can build a prediction engine for any machine learning task.
  14. Apache PredictionIO lets you
    • quickly build and deploy an engine as a web service in production with customizable templates (create a template for each target problem and deploy it right away);
    • respond to dynamic queries in real time once deployed as a web service (there is an API that accepts a query and returns the result).
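The query API mentioned on this slide can be sketched as a plain HTTP POST. The block below is a minimal illustration, not the deck's own code: the port 8000, the `/queries.json` path, and the `user`/`num` query fields are assumptions based on PredictionIO's recommendation template defaults, and `build_query_request` is a hypothetical helper. It only builds the request and does not send it, so it runs without a deployed engine.

```python
import json
from urllib.request import Request


def build_query_request(user, num, base_url="http://localhost:8000"):
    """Build (but do not send) a JSON POST request for a deployed engine.

    base_url and the query fields are assumptions, not taken from the slides.
    """
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return Request(
        url=f"{base_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request(user="u1", num=4)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running engine: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a real server.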
  15. Apache PredictionIO lets you
    • evaluate and tune multiple engine variants systematically (mechanisms for error tuning and evaluation are included);
    • unify data from multiple platforms in batch or in real time for comprehensive predictive analytics (there is an interface for registering training data in batch or in real time).
  16. System Architecture
    Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
    Apache HBase up to 1.2.4
    Apache Spark up to 1.6.3 for Hadoop 2.6 (not Spark 2.x)
    Elasticsearch up to 1.7.5 (not Elasticsearch 2.x)
  17. Architecture diagram: Click Log, Favorite Log, Elasticsearch v5.3 cluster, Event Server, ALS Template, pio import, Data, Spark 2 node cluster, RDD
  18. Architecture diagram: Click Log, Favorite Log, Elasticsearch v5.3 cluster, Event Server, ALS Template, pio import, Data, LOCALFS, Spark 2 node cluster, RDD, Model
  19. Architecture diagram: Click Log, Favorite Log, Elasticsearch v5.3 cluster, Event Server, ALS Template, pio import, Data, LOCALFS, Spark 2 node cluster, RDD, Model, Query, Predicted Result
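The diagram feeds click and favorite logs into the Event Server, either one event at a time or in bulk through `pio import`. The sketch below shows what both paths might look like for the same event records. The event names `click` and `favorite`, the IDs, the access key placeholder, and the helper functions are hypothetical; the port 7070 and the `/events.json?accessKey=` endpoint are assumed from the Event Server defaults rather than taken from the slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def make_event(event, user_id, item_id, event_time):
    """One user-to-item event in the Event Server's JSON layout."""
    return {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "eventTime": event_time,
    }


def realtime_request(event, access_key, base_url="http://localhost:7070"):
    """Real-time path: build (not send) a POST to the Event Server."""
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def batch_lines(events):
    """Batch path: one JSON object per line, suitable as a pio import input file."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


events = [
    make_event("click", "u1", "job42", "2017-05-27T10:00:00.000Z"),
    make_event("favorite", "u1", "job7", "2017-05-27T10:05:00.000Z"),
]
req = realtime_request(events[0], access_key="YOUR_ACCESS_KEY")
batch = batch_lines(events)
print(req.full_url)
print(batch)
```

Both paths share `make_event`, so a log line converted once can be replayed in bulk or streamed as it arrives.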
  20. D-A-S-E
    D: Data Source and Data Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics
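How the D, A, and S components hand data to one another can be illustrated with a toy pipeline. This is a sketch in Python under simplifying assumptions, not PredictionIO's actual Scala controller API: the class names, method names, and the popularity model are all hypothetical, and the E (Evaluation Metrics) component is left out for brevity.

```python
from collections import Counter


class DataSource:
    """D: reads raw events (an in-memory list stands in for the event store)."""

    def __init__(self, events):
        self.events = events

    def read_training(self):
        return list(self.events)


class Preparator:
    """D: turns raw events into the structure the algorithm expects."""

    def prepare(self, events):
        return [(e["user"], e["item"]) for e in events]


class PopularityAlgorithm:
    """A: toy model that ranks items by how often they appear."""

    def train(self, pairs):
        return Counter(item for _, item in pairs)

    def predict(self, model, query):
        return [item for item, _ in model.most_common(query["num"])]


class Serving:
    """S: shapes the prediction into the response returned to the caller."""

    def serve(self, query, prediction):
        return {"items": prediction}


events = [
    {"user": "u1", "item": "a"}, {"user": "u2", "item": "a"},
    {"user": "u3", "item": "a"}, {"user": "u1", "item": "b"},
    {"user": "u2", "item": "b"}, {"user": "u3", "item": "c"},
]
algo = PopularityAlgorithm()
# Training side: D feeds A.
model = algo.train(Preparator().prepare(DataSource(events).read_training()))
# Serving side: A answers the query, S formats the response.
query = {"num": 2}
response = Serving().serve(query, algo.predict(model, query))
print(response)  # {'items': ['a', 'b']}
```

Because each stage only depends on the output of the previous one, a different algorithm or data source can be swapped in without touching the rest.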
  21. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Prediction Server, Predicted Result
  27. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. Labeled components: D = Data Source & Preparator
  28. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. Labeled components: D = Data Source & Preparator, A = Algorithm
  29. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. Labeled components: D = Data Source & Preparator, A = Algorithm, S = Serving
  30. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. Labeled components: D = Data Source & Preparator, A = Algorithm, S = Serving, E = Evaluation Metrics
  31. D

  32. D A

  34. D

  35. D

  37. A

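  38. Algorithm
    • Implement train()
    • It is responsible for training the predictive model
    • It is invoked by the pio train command
    • The result is stored in HDFS (LocalFS)
    • Implement predict()
    • It is called in real time for each query after deployment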
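The split between a batch train() whose result is stored and a real-time predict() can be sketched as follows. This is a Python illustration of the idea only: the mean-rating model, the pickle file, and the helper names are hypothetical stand-ins, and PredictionIO's real algorithm classes are written in Scala and persist models through its own storage layer.

```python
import pickle
import tempfile
from pathlib import Path


class MeanRatingAlgorithm:
    """Toy algorithm: predicts each item's mean observed rating."""

    def train(self, ratings):
        """Batch step: build the model from (item, rating) pairs."""
        totals = {}
        for item, rating in ratings:
            total, count = totals.get(item, (0.0, 0))
            totals[item] = (total + rating, count + 1)
        return {item: total / count for item, (total, count) in totals.items()}

    def predict(self, model, query):
        """Online step: answer one query from the already-trained model."""
        return model.get(query["item"], 0.0)


def save_model(model, path):
    Path(path).write_bytes(pickle.dumps(model))


def load_model(path):
    return pickle.loads(Path(path).read_bytes())


algo = MeanRatingAlgorithm()
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    # Analogue of the training command: train once and store the result.
    save_model(algo.train([("a", 3.0), ("a", 1.0), ("b", 2.0)]), model_path)
    # Analogue of the deployed server: load once, then answer queries.
    served = load_model(model_path)

result = algo.predict(served, {"item": "a"})
print(result)  # 2.0
```

Training cost is paid once up front, so predict() only does a lookup and can stay fast enough to run per query.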
  41. Precision@k: Precision@5 / Threshold = 2.0
    Predicted: A ★☆☆, B ★★★, C ★★☆, D ☆☆☆, E ★★☆
    Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
  43. Precision@k: Precision@5 / Threshold = 2.0
    Predicted: A ★☆☆, B ★★★, C ★★☆, D ☆☆☆, E ★★☆
    Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
    PositiveCount: 2.0
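The Precision@k example can be worked through in a few lines. The sketch below uses the textbook definition (hits among the top k predictions, divided by k) and reads the slide as five predicted items A to E checked against validation ratings for A, B, X, D, and E, with stars converted to numbers. Both the reading of the slide layout and the choice of denominator are assumptions; PredictionIO's template metric may normalize differently.

```python
def precision_at_k(predicted, actual_ratings, k, threshold):
    """Return (hits, precision) for the top-k predicted items.

    An item counts as a hit when its validation rating is >= threshold.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    positives = {item for item, r in actual_ratings.items() if r >= threshold}
    hits = sum(1 for item in predicted[:k] if item in positives)
    return hits, hits / k


# Star counts from the slide, read as numeric ratings (an assumption).
predicted = ["A", "B", "C", "D", "E"]
validation = {"A": 1.0, "B": 3.0, "X": 2.0, "D": 0.0, "E": 2.0}

hits, precision = precision_at_k(predicted, validation, k=5, threshold=2.0)
print(hits, precision)  # 2 0.4
```

Under this reading only B and E are hits: C is predicted but absent from the validation set, and X is relevant but was not predicted. The hit count of 2 may be what the slide's "PositiveCount: 2.0" refers to, although the slide does not define that quantity.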
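  44. June 30 (Fri), 19:30 @Shibuya: JPIOUG Meetup 02, Open Source Machine Learning Server. [Urgently wanted] People who would like to give a lightning talk
  45. August 30 (Wed), 19:30 @Shibuya: JPIOUG Meetup 03, Open Source Machine Learning Server. [Casually recruiting] People who would like to give a lightning talk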