Building a Recommendation Engine with Spark and Apache PredictionIO

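A Recommendation System with Spark and PredictionIO, a Machine Learning Platform Written in Scala | JJUG CCC 2017 SPRING
#ccc_a3
This session gives an overview of Apache PredictionIO, a machine learning server built on widely used open source software such as Spark, MLlib, HDFS, and Elasticsearch, and shares lessons learned from developing a recommendation system with it. With Apache PredictionIO, you describe a machine learning method in a template, and the training task can then run as a distributed job on Spark. Beyond that, it is a platform that also provides, in an integrated way, an API server that returns predictions from trained models and accepts new event data in real time.
The session covers the PredictionIO architecture, which is also a useful reference for designing machine learning infrastructure. It then draws on the development of a recommendation system that suggests the most suitable jobs to users from the log data of one of Japan's largest job search engines, and shares the know-how and difficulties of putting it into practice: how to build the training logic, how to evaluate and improve trained models, and tuning tips and pitfalls for Spark MLlib. I hope to convey the benefits of using PredictionIO when introducing machine learning into a web system, so please join us.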

takahiro-hagino

May 20, 2017

Transcript

  1. Open Source Machine Learning Server 2017 JJUG CCC Spring

  2. PredictionIO, a Machine Learning Platform Written in Scala, and a Recommendation System with Spark

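  3. Machine learning pain points
    • Storage for training data and model data
    • A distributed processing framework for machine learning
    • Exposing predictions as a web service (API)
    Options for solving these → what this talk wants to convey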

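  5. Warning
    • This talk is not about Java. It is about PredictionIO, a machine learning platform written in Scala.
    • This talk does not cover machine learning itself. It is strictly about the machine learning platform.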

  8. Photo by Bernard Spragg. NZ About me

  9. Profile: Takahiro Hagino, Bizreach (BizReach, Inc.)
    • Job search engine "Stanby"
    • AI Office

  10. Open Source Machine Learning Server

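  11. Stanby, a job search engine
    • Microservice architecture with Play on Scala
    • Elasticsearch adopted for data storage and distributed search
    • Continuously crawls more than 4 million job postings
    • Map search is available in the iOS/Android apps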
  12. x ML
    • Matching job seekers and jobs: Search Quality and Recommendation
    • Annual salary estimation: Salary Prediction
    • Industry and occupation estimation: Job Category Prediction
    • Job feature estimation: Prediction of Job Characteristics
  19. x ML

  20. x ML

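  21. Pain points of x ML
    • Training data: a batch import script into ES was prepared for each dataset
    • Training data: training environments and execution flows depended on whoever was in charge
    • Training data: the training data format was not standardized either
    • Training run time: training took longer as the data volume grew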

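  23. Pain points of x ML
    • Model storage: where trained models were saved also depended on the feature and the person in charge
    • Prediction web API: a simple API server was built with Tornado or similar for each feature, so changes could not be reflected in the apps quickly
    • Prediction web API: in some cases ES was used as the API that returns prediction results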

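  25. Pain points of the AI Office
    • Works with many businesses across the company: there are multiple businesses, and the office provides machine learning features to them
    • Wants a common I/F for exchanging data with each business
    • Wants a common development flow
    • Has no dedicated infrastructure engineer: setting up infrastructure separately for each feature is wasteful
    • Wants to turn as much as possible into a framework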

  26. Solution: we looked for a way to solve this

  27. Machine Learning Stacks
    • Apps: API Server (Tornado…)
    • Algorithm: Scikitlearn, SparkML … DL: Caffe2, DL4j, Tensorflow, Chainer …
    • Processing: Hadoop, Spark, Storm …
    • Datastore: Elasticsearch, HBASE, Redshift …
  28. Machine Learning Stacks
    • Apps: API Server
    • Algorithm: Scikitlearn, SparkML … DL: Caffe2, DL4j, Tensorflow, Chainer …
    • Processing: Hadoop, Spark, Storm …
    • Datastore: Elasticsearch, HBASE, Redshift …
    • PredictionIO
  29. None
  30. PredictionIO?

  31. Salesforce Acquires PredictionIO

  32. Salesforce Acquires PredictionIO Feb 19, 2016 - TechCrunch: "Salesforce acquires the machine learning platform PredictionIO"

  33. Salesforce Introduces Salesforce Einstein

  34. Salesforce Acquires PredictionIO Sep 18, 2016 - TechCrunch: "Salesforce's ambition to take in AI: machine learning platform 'Einstein' announced"

  35. The most starred repositories on GitHub?
    • spark (apache/spark) ★ 12.8k
    • incubator-predictionio (apache/incubator-predictionio) ★ 10.2k
    • playframework (playframework/playframework) ★ 9.3k
    • scala (scala/scala) ★ 8.2k
  36. The most starred repositories on GitHub?
    • spark (apache/spark) ★ 12.8k
    • incubator-predictionio (apache/incubator-predictionio) ★ 10.2k
    • playframework (playframework/playframework) ★ 9.3k
    • scala (scala/scala) ★ 8.2k
    ★10.2k
  37. What is PredictionIO?

  38. Apache PredictionIO? Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
  39. Apache PredictionIO? Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
    • A machine learning server that combines state-of-the-art open source software
  40. Apache PredictionIO? Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
    • A machine learning server that combines state-of-the-art open source software
    • You can build a predictive engine for any machine learning task
  41. Apache PredictionIO lets you
    • quickly build and deploy an engine as a web service on production with customizable templates; (create a template for each target problem and deploy it right away)
    • respond to dynamic queries in real-time once deployed as a web service; (there is an API that takes a query and returns a result)
  42. Apache PredictionIO lets you
    • evaluate and tune multiple engine variants systematically; (there are also mechanisms for tuning error and for evaluation)
    • unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics; (there is an I/F for registering training data in batch or in real time)
  43. None
  44. None
  45. None
  46. REST: EventAPI SDK: EventClient
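
As a rough illustration of the REST route on this slide, the sketch below builds the JSON body and URL for one event without sending it. The host and port (`localhost:7070`), the placeholder access key, the event name `view`, and the entity types `user` and `item` are assumptions for illustration, not values taken from the talk.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed values: an Event Server on localhost:7070 and a placeholder access key.
EVENT_SERVER = "http://localhost:7070"
ACCESS_KEY = "YOUR_ACCESS_KEY"  # hypothetical, issued per app


def build_view_event(user_id: str, job_id: str, when: datetime) -> dict:
    """Build one 'view' event: a user (entity) viewed a job (target entity)."""
    return {
        "event": "view",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": job_id,
        "eventTime": when.isoformat(),
    }


def event_url(server: str, access_key: str) -> str:
    """URL that the JSON body would be POSTed to."""
    return f"{server}/events.json?{urlencode({'accessKey': access_key})}"


event = build_view_event("user-x", "job-a", datetime(2017, 5, 20, tzinfo=timezone.utc))
print(event_url(EVENT_SERVER, ACCESS_KEY))
print(json.dumps(event, indent=2))
```

An SDK client would typically wrap the same request, so the payload shape is the part worth agreeing on across services.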

  47. None
  48. Engine Template

  49. None
  50. Photo by Bernard Spragg. NZ Quick Start

  51. Versions Latest Release Version v0.11.0

  52. Quick Start (in Japanese) takezoe.hatenablog.com/entry/2017/05/11/132410

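  53. Installation
    • Installing PredictionIO: build from source (a build script is provided); Docker images are also available
    • Installing Spark
    • Installing storage: storage that holds the data used for training and so on; the kinds of data that can be saved differ per storage backend
    • PostgreSQL / Elasticsearch / HBase / HDFS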
  54. PIO CLI
    • eventserver: Launch an Event Server
    • app: Manage apps that are used by the Event Server
    • build: Build an engine at the current
    • train: Kick off a training using an engine
    • deploy: Deploy an engine as an engine server
  55. eventserver app build train deploy

  56. Photo by Bernard Spragg. NZ System Architecture

  57. System Architecture
    • Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
    • Apache HBase up to 1.2.4
    • Apache Spark up to 1.6.3 for Hadoop 2.6 (not Spark 2.x version)
    • Elasticsearch up to 1.7.5 (not the Elasticsearch 2.x version)
  58. None
  59. Storage roles Meta Data Event Data Model Data ✓ ✓

    ✓ ✓ ✓* ✓ ✓ LOCALFS ✓
  60. HDFS

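  61. PredictionIO, a Machine Learning Platform Written in Scala, and a Recommendation System with Spark

  62. PredictionIO, a Machine Learning Platform Written in Scala, and a Recommendation System with Spark: from here on, the main topic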

  63. Photo by Bernard Spragg. NZ Implementation of Recommendation Engine Template
  64. x ML

  65. Recommendation? JOB A JOB B Cafe Waiter Shibuya JOB C

    View Restaurant Waiter Shibuya Startup Programmer Roppongi
  67. Recommendation? JOB A JOB B Cafe Waiter Shibuya JOB C

    View Restaurant Waiter Shibuya Startup Programmer Roppongi Item-Based Recommendation
  68. Recommendation? ? User A User B User C

  70. Recommendation? ? User A User B User C User-based Recommendation

  71. Collaborative Filtering
    • A representative technique for recommendation
    • Requires no domain knowledge
    • Works in your favor when there are many users
    • Cold-Start problem

  72. Collaborative Filtering

              Job A    Job B    Job C    Similarity
    User X    View     Through  -        1
    User A    View     Through  View     1
    User B    Through  View     Through  -1
    User C    View     View     View     0.5
    Recommended: 1.5

  73. Collaborative Filtering

              Job A    Job B    Job C    Similarity
    User X    View     Through  -
    User A    View     Through  View     1
    User B    Through  View     Through  -1
    User C    View     View     View     0.5
    Recommended: 1.5
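
One way to read the table is that the score for Job C, which User X has not seen yet, is the sum of each other user's similarity to User X, counted only for users who viewed Job C. The sketch below reproduces the 1.5 on the slide under that reading. The similarity values are copied from the table; the encoding View = 1 and Through = 0 is an assumption, and how the slide derived the similarities themselves is not stated.

```python
# Similarity of each user to User X, copied from the slide's table.
similarity = {"User A": 1.0, "User B": -1.0, "User C": 0.5}

# Whether each user viewed Job C (assumed encoding: View = 1, Through = 0).
viewed_job_c = {"User A": 1, "User B": 0, "User C": 1}


def recommendation_score(sim: dict, viewed: dict) -> float:
    """Sum the similarities of the users who viewed the job."""
    return sum(sim[user] * viewed[user] for user in sim)


score = recommendation_score(similarity, viewed_job_c)
print(score)  # 1.5
```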
  74. For details, see

  75. System Requirements: recommendation requirements for Stanby
    • User click logs
    • Add-to-favorites logs
    • Log data is uploaded to S3
    • Training runs daily

  76. A recommendation system built with Elasticsearch and the Taste plugin

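  77. Before adopting PIO
    • An ES index for the data was created every time
    • A dedicated script handled data import
    • Training ran with the Elasticsearch Taste plugin: convenient, but generality and the run time as data grew became issues
    • Training results were output as a bulk file
    • Elasticsearch was used as the prediction API
    • The execution flow was managed with shell scripts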

  79. Click Log Favorite Log Event Server ALS Template

  80. Click Log Favorite Log Event Server ALS Template pio import

  81. Click Log Favorite Log Elasticsearch v5.3 cluster Event Server ALS

    Template pio import Data
  82. Click Log Favorite Log Elasticsearch v5.3 cluster Event Server ALS

    Template pio import Data Spark 2 node cluster RDD
  83. Click Log Favorite Log Elasticsearch v5.3 cluster Event Server ALS

    Template pio import Data LOCALFS Spark 2 node cluster RDD Model
  84. Click Log Favorite Log Elasticsearch v5.3 cluster Event Server ALS

    Template pio import Data LOCALFS Spark 2 node cluster RDD Model Query Predicted Result
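
The last step in this diagram, a query going in and a predicted result coming back, might look like the sketch below from a client's point of view. It only prepares the HTTP request and does not send it. The address `localhost:8000`, the `/queries.json` path, and the `user` and `num` fields are assumptions based on a typical recommendation template, not details shown on the slide.

```python
import json
from urllib.request import Request

ENGINE_SERVER = "http://localhost:8000"  # assumed address of a deployed engine


def build_query(user_id: str, num: int) -> Request:
    """Prepare (but do not send) a POST asking for `num` recommendations for one user."""
    body = json.dumps({"user": user_id, "num": num}).encode("utf-8")
    return Request(
        f"{ENGINE_SERVER}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query("user-x", 5)
print(request.full_url)
print(request.data.decode("utf-8"))
```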
  85. Engine Template?

  86. Engine Template

  87. None
  88. D A S E D-A-S-E Data Source and Data Preparator

    Algorithm Serving Evaluation Metrics
  89. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Prediction Server, Predicted Result
  95. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. D: Data Source & Preparator
  96. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. D: Data Source & Preparator; A: Algorithm
  97. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. D: Data Source & Preparator; A: Algorithm; S: Serving
  98. Machine Learning Flow: Training Data, Machine Learning Algorithm, Predictive Model, Preprocessing, Input Data, Predictive Model, Predicted Result. D: Data Source & Preparator; A: Algorithm; S: Serving; E: Evaluation Metrics
  99. None
  100. D

  101. D A

  102. None
  104. D

  105. DataSource • Reads data from the Event Store (Event Server) • Returns TrainingData

  106. None
  107. None
  108. D

  109. Preparator • Preprocesses the TrainingData • Feature extraction • Common processing when multiple Algorithms are used • Converts the data to PreparedData and passes it to the Algorithm

  110. None
  111. None
  113. A

  114. Algorithm
    • Implement train(): responsible for training the predictive model; invoked by the pio train command; the model is stored in HDFS (LocalFS)
    • Implement predict(): called in real time for queries after deployment
  115. None
  116. None
  117. None
  118. None
  119. None
  120. None
  122. None
  123. Serving • Extend LServe • Implement serve()
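
To make the division of labor between the components concrete, here is a minimal sketch of the same flow in plain Python. It is not the PredictionIO API, which is Scala: every class and method name below is a hypothetical stand-in that mirrors the slide terms, and a simple popularity count is swapped in for the ALS algorithm so the example stays self-contained.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TrainingData:
    events: list  # (user, job) pairs standing in for events read from the Event Store


class DataSource:
    def __init__(self, events):
        self._events = events

    def read_training(self) -> TrainingData:
        return TrainingData(list(self._events))


class Preparator:
    def prepare(self, td: TrainingData) -> dict:
        """Preprocess: group the viewed jobs by user."""
        by_user = {}
        for user, job in td.events:
            by_user.setdefault(user, set()).add(job)
        return by_user


class PopularityAlgorithm:
    """Stand-in for ALS: recommends the most viewed jobs the user has not seen."""

    def train(self, prepared: dict) -> dict:
        counts = Counter(job for jobs in prepared.values() for job in jobs)
        return {"counts": counts, "seen": prepared}

    def predict(self, model: dict, query: dict) -> list:
        seen = model["seen"].get(query["user"], set())
        # Sort by descending count, then by name so ties are deterministic.
        ranked = sorted(model["counts"].items(), key=lambda kv: (-kv[1], kv[0]))
        return [job for job, _ in ranked if job not in seen][: query["num"]]


class Serving:
    def serve(self, query: dict, predictions: list) -> list:
        """Return the first algorithm's prediction unchanged."""
        return predictions[0]


events = [("a", "job1"), ("a", "job2"), ("b", "job2"), ("b", "job3"), ("c", "job2")]
prepared = Preparator().prepare(DataSource(events).read_training())
algo = PopularityAlgorithm()
model = algo.train(prepared)  # the step a training run would trigger
query = {"user": "c", "num": 2}
result = Serving().serve(query, [algo.predict(model, query)])  # the step a deployed engine answers
print(result)
```

Keeping `train` and `predict` on one object while `serve` only combines predictions is what lets several algorithms share one DataSource and Preparator.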

  124. None
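  126. Evaluation? Testing for the best hyperparameters
    • Hyperparameters: parameters that cannot be learned from the data (a person decides them)
    • Hyperparameters are tuned using metrics
    • Tuning should be automated
    • Cross-validation: a technique that splits the training data and uses part of it as validation data; the validation is repeated several times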

  127. Cross-validation Training Data Validation Data Training Data

  130. Cross-validation Training Data x10 Validation Data Training Data
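
The splitting shown on these slides can be sketched as follows: the data is cut into k contiguous folds, and each fold takes one turn as the validation data while the rest is used for training. The function name and the sample count are hypothetical; k = 10 follows the "x10" on the slide.

```python
def k_fold_splits(n_samples: int, k: int):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


splits = list(k_fold_splits(n_samples=20, k=10))
for training, validation in splits[:2]:
    print(validation, len(training))
```

In practice the data is usually shuffled before splitting; that step is left out here to keep the output easy to follow.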

  131. Grid Search Parameter B Parameter A
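
Grid search over two parameters, as drawn on this slide, amounts to scoring every combination and keeping the best one. The sketch below assumes hypothetical parameter names (`rank`, `reg`) and a toy scoring function; in a real run the score would come from an evaluation metric computed on validation data.

```python
from itertools import product


def grid_search(grid: dict, score):
    """Evaluate every combination in the grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: dict) -> float:
    """Toy stand-in for a metric: peaks at rank=10, reg=0.1."""
    return -abs(params["rank"] - 10) - abs(params["reg"] - 0.1)


# Hypothetical parameter names and candidate values.
grid = {"rank": [5, 10, 20], "reg": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the candidate counts, which is one reason automating the tuning loop matters.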

  136. Precision@k: Precision@5 / Threshold = 2.0
    Predicted: A, B, C, D, E
  137. Precision@k: Precision@5 / Threshold = 2.0
    Predicted: A, B, C, D, E
    Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
  139. Precision@k: Precision@5 / Threshold = 2.0
    Predicted: A, B, C, D, E
    Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
    PositiveCount: 2.0
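
A common definition of Precision@k is the share of the top k predicted items that turn out to be relevant, where relevant here means a held-out rating at or above the threshold. The sketch below applies that definition to the slide's example. Reading the filled stars as ratings of 1, 3, 2, 0, and 2 is an interpretation, and dividing by k is an assumption: some implementations divide by the smaller of k and the number of relevant items instead.

```python
def precision_at_k(predicted, actual_ratings, k, threshold):
    """Fraction of the top-k predicted items whose held-out rating meets the threshold."""
    top_k = predicted[:k]
    hits = sum(1 for item in top_k if actual_ratings.get(item, 0.0) >= threshold)
    return hits / k


predicted = ["A", "B", "C", "D", "E"]
# Held-out ratings read off the slide as star counts (an interpretation).
validation = {"A": 1.0, "B": 3.0, "X": 2.0, "D": 0.0, "E": 2.0}

print(precision_at_k(predicted, validation, k=5, threshold=2.0))  # 0.4
```

Two of the five predicted jobs (B and E) meet the threshold, while X is relevant but was not predicted.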
  140. x ML 5 Jobs

  141. None
  142. None
  143. Photo by Bernard Spragg. NZ Conclusion

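  144. Open Source Machine Learning Server
    • Training data: there is an I/F for ingesting data both in real time and in batch
    • Training data: access tokens can be issued, which makes integration with each service convenient
    • Training data: you get the benefits of Elasticsearch's distributed storage
    • Training run time: because a Spark cluster is used, processing is distributed and training time is shortened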
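  145. Open Source Machine Learning Server
    • Model storage: Stanby uses LOCALFS; HDFS can be chosen depending on the characteristics of the model
    • Prediction web API: the "pio deploy" command alone creates the prediction API
    • Prediction web API: the API server is based on Akka-Http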
  146. Case Studies: other use cases

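  147. Other use cases
    • Predicting the document screening pass rate, the offer rate, and the offer acceptance rate: Prediction for Reject Ratio
    • Estimating the annual salary of a job posting: Salary Prediction
    • Automatically generating job posting content: Job description writing-bot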
  148. None
  149. Moving away from dependence on individuals

  150. Photo by Bernard Spragg. NZ Appendix

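  151. At BizReach, as we develop machine learning applications as an organization, we are working on Apache PredictionIO as our development and operations platform. As of … (year and month), … members from our company have joined the development of PredictionIO as committers. Shinsuke Sugaya, Naoki Takezoe, Takako Shimamoto, Takahiro Hagino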
  153. How to Contribute to PIO

  154. Add support for Elasticsearch 5.x

  155. None
  156. jpioug.org

  157. • Japan Apache PredictionIO User Group (JPIOUG) Join Us!

  158. None
  159. June 30 (Fri)

  160. June 30 (Fri) Open Source Machine Learning Server 02 JPIOUG Meetup 19:30 @Shibuya
  161. June 30 (Fri) Open Source Machine Learning Server 02 JPIOUG Meetup 19:30 @Shibuya [Urgently wanted] People who would like to give an LT
  162. August 30 (Wed)

  163. August 30 (Wed) Open Source Machine Learning Server 03 JPIOUG Meetup 19:30 @Shibuya
  164. August 30 (Wed) Open Source Machine Learning Server 03 JPIOUG Meetup 19:30 @Shibuya [Casually wanted] People who would like to give an LT
  165. jpioug.org

  166. Open Source Machine Learning Server 2017 JJUG CCC Spring Thank

    You