Building a Recommendation Engine with Spark and Apache PredictionIO

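A Recommendation System with PredictionIO, a Scala-Based Machine Learning Platform, and Spark | JJUG CCC 2017 SPRING
#ccc_a3
This session gives an overview of Apache PredictionIO, a machine learning server built on prominent open source projects such as Spark, MLlib, HDFS, and Elasticsearch, and shares what we learned from developing a recommendation system with it. With Apache PredictionIO, you describe a machine learning technique in a template, and training tasks can then run as distributed jobs on Spark. Beyond that, it is a platform that provides, in one integrated package, an API server that returns predictions from trained models and accepts new event data in real time.
The session covers the PredictionIO architecture, which is also a useful reference for designing machine learning infrastructure, and the development of a recommendation system that suggests the best-fitting jobs to users from the log data of one of Japan's largest job search engines. Along the way it discusses the know-how and difficulties of putting this into practice: how to build the training logic, how to evaluate and improve trained models, and how to tune Spark MLlib and avoid its pitfalls. We hope to convey the benefits of using PredictionIO when introducing machine learning into a web system, so please join us.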

takahiro-hagino

May 20, 2017

Transcript

  1. Open Source Machine Learning Server
    JJUG CCC 2017 Spring

  2. A Recommendation System with PredictionIO, a Scala-Based Machine Learning Platform, and Spark

  3. Machine learning pain points
    • Storage for training data and model data
    • A distributed processing framework for machine learning
    • Exposing predictions as a web service (API)
    Options for solving these → what this talk aims to convey

  5. Warning
    This talk does not cover Java.
    It is about PredictionIO, a machine learning platform written in Scala.
    It does not cover machine learning itself.
    It is strictly about machine learning infrastructure.

  8. About me
    Photo by Bernard Spragg. NZ

  9. Profile
    Takahiro Hagino
    Bizreach, Inc.
    • Job search engine "Stanby"
    • AI Office

  10. Open Source Machine Learning Server

  11. Job search engine "Stanby"
    Microservice architecture built with Play on Scala
    Elasticsearch adopted for data storage and distributed search
    Continuously crawls more than 4 million job postings
    Map-based search is available in the iOS/Android apps

  12. x ML
    Matching job seekers with jobs (Search Quality and Recommendation)
    Salary estimation (Salary Prediction)
    Industry and job category estimation (Job Category Prediction)
    Job feature estimation (Prediction of Job Characteristics)

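  21. x ML pain points
    Training data
    • A separate batch import script into ES had to be prepared for each dataset
    • Training environments and execution flows depended on the individual in charge
    • The training data format was not standardized either
    Training execution time
    • Training took longer as the data volume grew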

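  23. x ML pain points
    Storage for trained models
    • Where trained models were stored also varied by feature and by person
    Prediction web API
    • A simple API server was built for each feature with Tornado or similar, so changes could not be reflected in the apps quickly
    • In some cases ES itself was used as the API that returns prediction results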

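  25. AI Office pain points
    Working with many businesses across the company
    • The company runs multiple businesses, and we provide machine learning features to them
    • We want a common interface for exchanging data with each business
    • We want a common development flow
    No dedicated infrastructure engineers
    • Setting up infrastructure separately for each feature we provide is wasteful
    • We want to turn as much as possible into a framework

  26. Solution
    We looked for a solution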

  27. Machine Learning Stacks
    Apps: API Server (Tornado …)
    Algorithm: scikit-learn, Spark ML …; DL: Caffe2, DL4J, TensorFlow, Chainer …
    Processing: Hadoop, Spark, Storm …
    Datastore: Elasticsearch, HBase, Redshift …

  28. Machine Learning Stacks
    Apps: API Server
    Algorithm: scikit-learn, Spark ML …; DL: Caffe2, DL4J, TensorFlow, Chainer …
    Processing: Hadoop, Spark, Storm …
    Datastore: Elasticsearch, HBase, Redshift …
    PredictionIO

  30. PredictionIO?

  32. Salesforce Acquires PredictionIO
    Feb 19, 2016 - TechCrunch
    "Salesforce acquires machine learning platform PredictionIO"

  34. Salesforce Introduces Salesforce Einstein
    Sep 18, 2016 - TechCrunch
    "Salesforce's ambition to embrace AI: machine learning platform 'Einstein' announced"

  35. The most-starred repositories on GitHub?
    spark (apache/spark) ★ 12.8k
    incubator-predictionio (apache/incubator-predictionio) ★ 10.2k
    playframework (playframework/playframework) ★ 9.3k
    scala (scala/scala) ★ 8.2k

  37. What is PredictionIO?

  40. Apache PredictionIO?
    Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
    • A machine learning server that combines state-of-the-art open source software
    • Predictive engines can be built for any machine learning task

  41. Apache PredictionIO lets you
    quickly build and deploy an engine as a web service on production with customizable templates;
    → Create a template per target problem and deploy it right away
    respond to dynamic queries in real-time once deployed as a web service;
    → There is an API that accepts a query and returns a result

  42. Apache PredictionIO lets you
    evaluate and tune multiple engine variants systematically;
    → There are also mechanisms for tuning error and for evaluation
    unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics;
    → There is an interface for registering training data in batch or in real time

  46. REST: EventAPI
    SDK: EventClient
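
The Event API on this slide accepts events as JSON over REST. A minimal sketch of what one request for a job view might look like, assuming an Event Server listening on `http://localhost:7070` and a placeholder access key. The helper `build_event_request`, the IDs, and the `view` event name are hypothetical, and the request is only built here, not sent:

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url, access_key, user_id, job_id, event_name="view"):
    """Build (but do not send) a POST request carrying one user-to-item event."""
    payload = {
        "event": event_name,            # hypothetical event name, e.g. a click or a favorite
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": job_id,
        "eventTime": datetime.now(timezone.utc).isoformat(),
    }
    # The access key identifies the app that the event belongs to.
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_event_request("http://localhost:7070", "YOUR_ACCESS_KEY", "user-x", "job-a")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a server.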

  48. Engine Template

  50. Quick Start
    Photo by Bernard Spragg. NZ

  51. Versions
    Latest Release Version: v0.11.0

  52. Quick Start (Japanese)
    takezoe.hatenablog.com/entry/2017/05/11/132410

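  53. Installation
    Install PredictionIO
    • Build from source (a build script is provided)
    • Docker images are also available
    Install Spark
    Install storage
    • Storage for holding the data used for training, among other things
    • The kinds of data that can be stored differ by storage backend
    • PostgreSQL / Elasticsearch / HBase / HDFS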

  54. PIO CLI
    eventserver: Launch an Event Server
    app: Manage apps that are used by the Event Server
    build: Build an engine at the current directory
    train: Kick off a training using an engine
    deploy: Deploy an engine as an engine server

  56. System Architecture
    Photo by Bernard Spragg. NZ

  57. System Architecture
    Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
    Apache HBase up to 1.2.4
    Apache Spark up to 1.6.3 for Hadoop 2.6 (not the Spark 2.x version)
    Elasticsearch up to 1.7.5 (not the Elasticsearch 2.x version)

  59. Storage roles
    Meta Data / Event Data / Model Data
    ✓ ✓ ✓
    ✓ ✓*
    LOCALFS ✓

  60. HDFS

  62. A Recommendation System with PredictionIO, a Scala-Based Machine Learning Platform, and Spark
    Now, on to the main topic

  63. Implementation of Recommendation Engine Template
    Photo by Bernard Spragg. NZ

  64. x ML

  67. Recommendation?
    View
    JOB A: Cafe, Waiter, Shibuya
    JOB B: Restaurant, Waiter, Shibuya
    JOB C: Startup, Programmer, Roppongi
    Item-Based Recommendation

  70. Recommendation?
    ?
    User A
    User B
    User C
    User-based Recommendation

  71. Collaborative Filtering
    • A representative recommendation technique
    • No domain knowledge required
    • Works to its advantage when there are many users
    • The cold-start problem

  73. Collaborative Filtering
                 Job A     Job B     Job C     Similarity
    User X       View      Through   -
    User A       View      Through   View      1
    User B       Through   View      Through   -1
    User C       View      View      View      0.5
    Recommended                      1.5

  74. For more details

  75. System Requirements
    Recommendation requirements for Stanby
    • User click logs
    • Add-to-favorites logs
    • The log data is uploaded to S3
    • Training runs daily

  76. A recommendation system built with Elasticsearch and the Taste plugin

  77. Before introducing PIO
    • An ES index for the data was created on every run
    • A dedicated script was used for data import
    • Training ran on the Elasticsearch Taste plugin
    • Convenient, but limited generality and growing execution time as data increased became issues
    • Training results were output as a bulk file
    • Elasticsearch was used as the prediction API
    • The execution flow was managed with shell scripts

  84. [Diagram] Click Log / Favorite Log → pio import → Event Server → Data (Elasticsearch v5.3 cluster) → ALS Template on Spark (2 node cluster, RDD) → Model (LOCALFS) → Query / Predicted Result
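
The last step of the diagram, sending a Query to the deployed engine and getting a Predicted Result back, is an HTTP call. A hedged sketch, assuming an engine server at `http://localhost:8000` and a query made of a user ID and a result count; the field names `user` and `num`, and the helper `build_query_request`, are assumptions, since the actual query shape is defined by each engine template:

```python
import json
from urllib.request import Request


def build_query_request(engine_url, user_id, num):
    """Build (but do not send) a prediction query for a deployed engine."""
    payload = {"user": user_id, "num": num}  # assumed query fields
    return Request(
        f"{engine_url}/queries.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = build_query_request("http://localhost:8000", "user-x", 5)
print(query.get_method(), query.full_url)
print(json.loads(query.data))
```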

  85. Engine Template?

  88. D-A-S-E
    D: Data Source and Data Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics

  89. Machine Learning Flow
    [Diagram] Training Data → Preprocessing → Machine Learning Algorithm → Predictive Model
    [Diagram] Input Data → Prediction Server → Predicted Result

  98. Machine Learning Flow
    D: Data Source & Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics

  105. DataSource
    • Reads data from the Event Store (Event Server)
    • Returns TrainingData
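
In PredictionIO this role is written in Scala inside the engine template. The plain-Python sketch below only illustrates the idea of turning raw events into training data; `Rating`, `TrainingData`, `read_training`, and the event weights are all hypothetical and are not PredictionIO's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rating:
    user: str
    item: str
    rating: float


@dataclass
class TrainingData:
    ratings: List[Rating]


# Assumed weights: a favorite is treated as a stronger signal than a view.
EVENT_WEIGHTS = {"view": 1.0, "favorite": 2.0}


def read_training(events):
    """Keep only the events we know how to weight and convert them to ratings."""
    ratings = [
        Rating(e["entityId"], e["targetEntityId"], EVENT_WEIGHTS[e["event"]])
        for e in events
        if e["event"] in EVENT_WEIGHTS
    ]
    return TrainingData(ratings)


events = [
    {"event": "view", "entityId": "user-x", "targetEntityId": "job-a"},
    {"event": "favorite", "entityId": "user-x", "targetEntityId": "job-b"},
    {"event": "signup", "entityId": "user-x", "targetEntityId": None},
]
training = read_training(events)
print(training.ratings)
```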

  109. Preparator
    • Preprocessing of TrainingData
    • Feature extraction
    • Common processing when multiple Algorithms are used
    • Converts to PreparedData and passes it to the Algorithm
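
A typical preprocessing step for recommendation data is mapping string IDs to integer indices, because libraries such as Spark MLlib's RDD-based ALS work with integer user and item IDs. A minimal sketch of that idea in plain Python; the function `prepare` and the sample tuples are hypothetical, and this is not the template's actual Scala code:

```python
def prepare(ratings):
    """Map string user/item IDs to dense integer indices."""
    users = sorted({user for user, _item, _value in ratings})
    items = sorted({item for _user, item, _value in ratings})
    user_index = {user: i for i, user in enumerate(users)}
    item_index = {item: i for i, item in enumerate(items)}
    indexed = [(user_index[u], item_index[i], v) for u, i, v in ratings]
    return {"ratings": indexed, "user_index": user_index, "item_index": item_index}


training_data = [
    ("user-x", "job-a", 1.0),
    ("user-a", "job-c", 2.0),
    ("user-x", "job-b", 1.0),
]
prepared = prepare(training_data)
print(prepared["ratings"])
```

Keeping the two index maps alongside the ratings lets predictions be translated back to the original IDs at query time.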

  114. Algorithm
    • Implement train()
      • Responsible for training the predictive model
      • Invoked by the pio train command
      • The trained model is stored in HDFS (LocalFS)
    • Implement predict()
      • Called in real time for each query after deployment
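
The split between train() and predict() can be illustrated without any framework. The sketch below swaps in a deliberately simple popularity model in place of ALS, purely to show the two-phase shape: an expensive batch step that produces a model, and a cheap per-query step that reads it. The class `PopularityAlgorithm` and the query field `num` are hypothetical:

```python
from collections import Counter


class PopularityAlgorithm:
    """Toy stand-in for a real algorithm: recommends the most-viewed jobs."""

    def train(self, prepared_data):
        # Batch phase: count views per job and keep jobs ordered by popularity.
        counts = Counter(job for _user, job in prepared_data)
        return [job for job, _count in counts.most_common()]

    def predict(self, model, query):
        # Serving phase: a cheap lookup against the already-trained model.
        return model[: query["num"]]


prepared = [
    ("u1", "job-a"), ("u2", "job-a"), ("u3", "job-a"),
    ("u1", "job-b"), ("u2", "job-b"),
    ("u1", "job-c"),
]
algo = PopularityAlgorithm()
model = algo.train(prepared)
top2 = algo.predict(model, {"num": 2})
print(top2)
```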

  123. Serving
    • Extend LServing
    • Implement serve()
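
Conceptually, the serving step receives the query plus the predictions of every algorithm in the engine and decides what to return. A plain-Python sketch of one possible policy, summing scores per item across algorithms; the function `serve`, the tuple layout, and the merge policy are assumptions for illustration, not the template's implementation:

```python
def serve(query, predicted_results):
    """Merge per-algorithm (item, score) lists and return the top items."""
    combined = {}
    for result in predicted_results:
        for item, score in result:
            combined[item] = combined.get(item, 0.0) + score
    # Sort by score descending, then by item ID for a stable order.
    ranked = sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[: query["num"]]


results = [
    [("job-a", 0.5), ("job-b", 0.25)],    # output of a first algorithm
    [("job-b", 0.75), ("job-c", 0.125)],  # output of a second algorithm
]
top = serve({"num": 2}, results)
print(top)
```

With a single algorithm, the simplest policy is to return its result unchanged.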

  126. Evaluation?
    Testing for the best hyperparameters
    Hyperparameters
    • Parameters that cannot be learned from the data (a human decides them)
    • Hyperparameters are tuned using a metric
    • We want tuning to be automated
    Cross-validation
    • A technique that splits the training data and uses part of it as validation data
    • Validation is run multiple times

  130. Cross-validation
    [Diagram] The Training Data is split into Validation Data and Training Data; the split is repeated x10
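
The repeated split on these slides can be sketched as a k-fold generator: each round holds out a different slice as validation data and trains on the rest. The function `k_fold_splits` and the integer records are illustrative only:

```python
def k_fold_splits(records, k=10):
    """Yield (training, validation) pairs, holding out each fold once."""
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation


data = list(range(20))
splits = list(k_fold_splits(data, k=10))
for training, validation in splits[:2]:
    print(len(training), validation)
```

Every record is used for validation exactly once, so the averaged metric is less sensitive to one lucky or unlucky split.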

  131. Grid Search
    [Diagram] Grid over Parameter A and Parameter B
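
Grid search tries every combination of candidate values and keeps the one with the best metric. A minimal sketch with two parameters named after the slide; the `toy_metric` function is invented so the example runs on its own, whereas in practice the score would come from training and evaluating an engine variant:

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in the grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_metric(params):
    # Invented metric that peaks at param_a = 10 and param_b = 0.1.
    return -((params["param_a"] - 10) ** 2) - (params["param_b"] - 0.1) ** 2


grid = {"param_a": [5, 10, 20], "param_b": [0.01, 0.1, 1.0]}
best, score = grid_search(grid, toy_metric)
print(best, score)
```

The cost grows with the product of the list lengths (here 3 x 3 = 9 evaluations), which is one reason to want the tuning automated.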

  139. Precision@k
    Precision@5 / Threshold = 2.0
    Predicted:   A     B     C     D     E
    Validation:  A     B     X     D     E
                 ★☆☆   ★★★   ★★☆   ☆☆☆   ★★☆
    PositiveCount: 2.0
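
A sketch of the calculation on this slide, reading each row of stars as a rating from 0 to 3 and treating ratings at or above the threshold as positives. It uses the textbook definition (hits divided by k); other definitions use a different denominator, so the exact value reported by a given evaluation class may differ. The function name and data layout are assumptions:

```python
def precision_at_k(predicted, actual_ratings, k=5, threshold=2.0):
    """Return (number of hits, hits / k) for the top-k predicted items."""
    positives = {item for item, rating in actual_ratings.items() if rating >= threshold}
    hits = [item for item in predicted[:k] if item in positives]
    return len(hits), len(hits) / k


predicted = ["A", "B", "C", "D", "E"]
validation = {"A": 1, "B": 3, "X": 2, "D": 0, "E": 2}  # stars read as ratings

hits, precision = precision_at_k(predicted, validation)
print(hits, precision)
```

Only B and E are both predicted and rated at or above 2.0, which gives two hits; X is a positive in the validation data but was never predicted.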

  140. x ML
    5 Jobs

  143. Conclusion
    Photo by Bernard Spragg. NZ

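  144. Open Source Machine Learning Server
    Training data
    • There is an interface for ingesting data both in real time and in batch
    • Access tokens can be issued, which makes integration with each service convenient
    • We benefit from Elasticsearch's distributed storage capabilities
    Training execution time
    • Because a Spark cluster is used, processing is distributed and training time is shortened

  145. Open Source Machine Learning Server
    Storage for trained models
    • Stanby uses LOCALFS
    • HDFS can be chosen depending on the characteristics of the model
    Prediction web API
    • The prediction API can be created with the "pio deploy" command alone
    • The API server is based on Akka HTTP

  146. Case Studies
    Other examples

  147. Predicting document screening pass rate, job offer rate, and offer acceptance rate (Prediction for Reject Ratio)
    Salary estimation for job postings (Salary Prediction)
    Automatic generation of job descriptions (Job description writing-bot)

  149. Moving away from dependence on individuals

  150. Appendix
    Photo by Bernard Spragg. NZ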

  151. At Bizreach, in order to develop machine learning applications in an organized way, we are working with Apache PredictionIO as our development and operations platform.
    Engineers from our company have joined the development of PredictionIO as committers.
    Shinsuke Sugaya
    Naoki Takezoe
    Takako Shimamoto
    Takahiro Hagino

  153. How to Contribute to PIO

  154. Add support for Elasticsearch 5.x

  156. jpioug.org

  157. Japan Apache PredictionIO User Group
    JPIOUG
    Join Us!

  161. 30 FRI
    Open Source Machine Learning Server
    02 JPIOUG Meetup
    19:30 @Shibuya
    [Wanted urgently] People who would like to give a lightning talk

  164. 30 WED
    Open Source Machine Learning Server
    03 JPIOUG Meetup
    19:30 @Shibuya
    [Casual call] People who would like to give a lightning talk

  166. Open Source Machine Learning Server
    JJUG CCC 2017 Spring
    Thank You