
Apache PredictionIO - Machine Learning 15minutes! #12


Slides for a lightning talk given at the 12th Machine Learning 15minutes! on 2017/05/27.
"An Apache PredictionIO committer talks about a Spark & Elasticsearch machine learning platform"

takahiro-hagino

May 29, 2017

Transcript

  1. Open Source Machine Learning Server
    15
    Machine Learning
    minutes!


  2. An Apache PredictionIO committer talks about
    a Spark & Elasticsearch machine learning platform

  3. What I want to convey
    Machine learning pain points
    • Storage for training data and model data
    • A distributed processing framework for machine learning
    • Exposing data prediction as a web service (API)
    → Options for solving them

  5. Photo by Bernard Spragg. NZ
    About me


  6. Profile
    Takahiro Hagino
    Bizreach, Inc.
    • Job search engine "Stanby"
    • AI Office

  7. Open Source Machine Learning Server


  8. At Bizreach, we are adopting Apache PredictionIO as the platform
    for developing and operating machine learning applications.
    Members of our company have become committers and joined
    the development of PredictionIO:
    Shinsuke Sugaya
    Naoki Takezoe
    Takako Shimamoto
    Takahiro Hagino

  10. x ML
    Matching job seekers with job postings (Search Quality and Recommendation)
    Salary Prediction
    Job Category Prediction (industry and occupation)
    Prediction of Job Characteristics

  17. x ML pain points
    Training data
    The training environment and execution flow depend on whoever is in charge,
    and the training data format is not standardized.
    A batch import script into ES was prepared separately for each dataset.
    Training execution time
    Training takes longer as the data volume grows.

  19. x ML pain points
    Storage for trained models
    Where trained models are saved also differs by feature and depends on individuals.
    Prediction Web API
    A simple API server is built for each feature with Tornado or similar,
    so models cannot be reflected in the application quickly.
    In some cases ES itself was used as the API that returns prediction results.

  21. AI Office pain points
    Working with various businesses across the company
    There are multiple businesses, and we provide machine learning features to them.
    We want a common interface (I/F) for data integration with each business.
    We want a common development flow.
    No dedicated infrastructure engineers
    Setting up infrastructure separately for each feature we provide is wasteful.
    We want to turn as much as possible into a framework.

  23. Solution
    We looked for a solution.

  24. Can create trained models from data in S3, Redshift, and RDS
    Can run queries for prediction
    Automatically generates a web application that returns
    predicted values from a trained model

  26. Solution
    Machine Learning as a Service
    ML Tools
    Open Source

  28. Machine Learning Stacks
    Apps: API Server (Tornado…)
    Algorithm: Scikitlearn, SparkML …; DL: Caffe2, DL4j, Tensorflow, Chainer …
    Processing: Hadoop, Spark, Storm …
    Datastore: Elasticsearch, HBASE, Redshift …
    PredictionIO

  29. PredictionIO?


  31. Salesforce Acquires PredictionIO
    Feb 19, 2016 - TechCrunch
    "Salesforce acquires machine learning platform PredictionIO"

  33. Salesforce Introduces Salesforce Einstein
    Sep 18, 2016 - TechCrunch
    "Salesforce's ambition to embrace AI: machine learning platform 'Einstein' announced"

  35. The most-starred repositories on GitHub?
    apache/spark ★ 12.8k
    apache/incubator-predictionio ★ 10.2k
    playframework/playframework ★ 9.3k
    scala/scala ★ 8.2k

  36. What is PredictionIO?


  39. Apache PredictionIO?
    Apache PredictionIO (incubating) is an open source Machine Learning Server
    built on top of a state-of-the-art open source stack for developers and
    data scientists to create predictive engines for any machine learning task.
    • A machine learning server that combines state-of-the-art open source software
    • Predictive engines can be built for any machine learning task

  40. Apache PredictionIO lets you
    quickly build and deploy an engine as a web service on production with customizable templates;
    (create a template for each target problem and deploy it right away)
    respond to dynamic queries in real-time once deployed as a web service;
    (there is an API that takes a query and returns the result)

  41. Apache PredictionIO lets you
    evaluate and tune multiple engine variants systematically;
    (there are also mechanisms for tuning error and for evaluation)
    unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics;
    (there is an interface for registering training data in batch or in real time)

  42. REST: EventAPI
    SDK: EventClient
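
    As a rough illustration of the REST side, the sketch below builds the kind of JSON event that is POSTed to the Event Server's `/events.json` endpoint. The host and port, the access key, and the event and entity values are placeholders assumed for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request

# All values below are hypothetical placeholders, not taken from the talk.
EVENT_SERVER_URL = "http://localhost:7070"  # assumed local Event Server
ACCESS_KEY = "YOUR_ACCESS_KEY"              # access key issued for the app


def build_event_request(user_id, item_id, event_name="view"):
    """Build (but do not send) a POST request that records one user-item event."""
    payload = {
        "event": event_name,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    return urllib.request.Request(
        url=f"{EVENT_SERVER_URL}/events.json?accessKey={ACCESS_KEY}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_event_request("u1", "job-42")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a running Event Server:
# urllib.request.urlopen(request)
```

    An SDK client would wrap the same call; building the request by hand simply makes the shape of an event visible.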


  43. Engine
    Template


  44. Photo by Bernard Spragg. NZ
    Quick Start


  45. Versions
    Latest Release Version
    v0.11.0


  46. Quick Start (Japanese)
    takezoe.hatenablog.com/entry/2017/05/11/132410

  47. Photo by Bernard Spragg. NZ
    Algorithm


  48. Photo by Bernard Spragg. NZ
    System
    Architecture


  49. System Architecture
    Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed)
    Apache HBase up to 1.2.4
    Apache Spark up to 1.6.3 for Hadoop 2.6 (not Spark 2.x version)
    Elasticsearch up to 1.7.5 (not the Elasticsearch 2.x version)

  50. Storage roles
    Meta Data Event Data Model Data
    ✓ ✓ ✓
    ✓ ✓*


    LOCALFS ✓


  51. Photo by Bernard Spragg. NZ
    Implementation


  52. A recommendation system with Spark and PredictionIO,
    a machine learning platform written in Scala

  53. System Requirements
    Recommendation requirements for Stanby
    User click logs
    Add-to-favorites logs
    Log data is uploaded to S3
    Training runs daily

  54. Building a recommendation system with Elasticsearch and the Taste plugin

  55. Before introducing PIO
    An ES index for the data was created on every run
    A dedicated script was used for data import
    Training ran on the Elasticsearch Taste plugin
    Convenient, but generality and execution time as data grew became issues
    Training results were output as a bulk file
    Elasticsearch was used as the prediction API
    The execution flow was managed with shell scripts

  62. Click Log, Favorite Log
    pio import
    Event Server
    Data: Elasticsearch v5.3 cluster
    ALS Template
    RDD: Spark 2 node cluster
    Model: LOCALFS
    Query
    Predicted Result

  63. Engine Template?


  64. Engine
    Template


  65. D-A-S-E
    D: Data Source and Data Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics

  66. Machine Learning Flow
    Training Data
    Machine Learning Algorithm
    Predictive Model
    Preprocessing
    Input Data
    Prediction Server
    Predicted Result

  75. Machine Learning Flow
    Training Data
    Machine Learning Algorithm
    Predictive Model
    Preprocessing
    Input Data
    Predictive Model
    Predicted Result
    D: Data Source & Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics

  77. DataSource
    • Reads data from the Event Store (Event Server)
    • Returns TrainingData

  78. Preparator
    • Preprocessing of TrainingData
    • Feature extraction
    • Common processing when multiple Algorithms are used
    • Converts the data to PreparedData and passes it to the Algorithm

  80. Algorithm
    • Implement train()
    • Responsible for training the predictive model
    • Invoked by the pio train command
    • The model is stored in HDFS (LocalFS)
    • Implement predict()
    • Called in real time for queries after deployment

  82. Serving
    • Extend LServe
    • Implement serve()
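
    The DASE roles described on these slides can be pictured as a small pipeline. The following is a conceptual Python sketch only: PredictionIO engines are written in Scala, and the class names, method signatures, and the popularity-count "algorithm" used here are illustrative stand-ins, not the PredictionIO API.

```python
from collections import Counter


class DataSource:
    """D: reads raw events (an in-memory list stands in for the Event Store)."""

    def __init__(self, events):
        self.events = events

    def read_training(self):
        return list(self.events)


class Preparator:
    """D: turns training data into the form the algorithm expects."""

    def prepare(self, training_data):
        # Keep only (user, item) pairs from "view" events.
        return [(e["user"], e["item"]) for e in training_data if e["event"] == "view"]


class PopularityAlgorithm:
    """A: train() builds a model, predict() answers a query with it."""

    def train(self, prepared_data):
        return Counter(item for _, item in prepared_data)

    def predict(self, model, query):
        return [item for item, _ in model.most_common(query["num"])]


class Serving:
    """S: combines the results of one or more algorithms into the response."""

    def serve(self, query, predictions):
        return predictions[0]  # single algorithm: pass its result through


# Hypothetical events, made up for this sketch.
events = [
    {"event": "view", "user": "u1", "item": "job-1"},
    {"event": "view", "user": "u2", "item": "job-1"},
    {"event": "view", "user": "u2", "item": "job-2"},
    {"event": "favorite", "user": "u1", "item": "job-3"},
]
algorithm = PopularityAlgorithm()
model = algorithm.train(Preparator().prepare(DataSource(events).read_training()))
query = {"num": 1}
result = Serving().serve(query, [algorithm.predict(model, query)])
print(result)
```

    Keeping the four roles separate is what lets the data reading or the algorithm be swapped without touching the serving side.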



  86. Precision@k
    Precision@5 / Threshold = 2.0
    Predicted:  A ★☆☆, B ★★★, C ★★☆, D ☆☆☆, E ★★☆
    Validation: A ★☆☆, B ★★★, X ★★☆, D ☆☆☆, E ★★☆
    PositiveCount: 2.0
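
    To make the metric concrete, here is a minimal Python sketch of precision@k with a rating threshold. The item names and ratings are hypothetical, and dividing by min(k, number of positives) is one common convention assumed here (dividing by k is another); it is not claimed to reproduce the numbers on the slide.

```python
def precision_at_k(predicted_items, actual_ratings, k, threshold):
    """Return precision@k, or None when no actual rating reaches the threshold."""
    positives = {item for item, rating in actual_ratings.items() if rating >= threshold}
    if not positives:
        return None  # precision is undefined without any positive item
    hits = sum(1 for item in predicted_items[:k] if item in positives)
    return hits / min(k, len(positives))


# Hypothetical data: a ranked prediction list and held-out ratings.
predicted = ["j1", "j2", "j3", "j4"]
actual = {"j1": 3.0, "j2": 1.0, "j5": 2.5}
score = precision_at_k(predicted, actual, k=3, threshold=2.0)
print(score)
```

    Here two held-out items count as positive (j1 and j5), and only j1 appears in the top three predictions.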


  87. Photo by Bernard Spragg. NZ
    Conclusion


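  88. Training data
    Data can be ingested both in real time and in batch.
    Access tokens can be issued, which makes integration with each service easy.
    We get the benefits of Elasticsearch's distributed storage.
    Training execution time
    Because a Spark cluster is used, processing is distributed and training time is shortened.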

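  89. Storage for trained models
    Stanby uses LOCALFS.
    HDFS can be chosen instead, depending on the characteristics of the model.
    Prediction Web API
    The prediction API can be created with the "pio deploy" command alone.
    The API server is based on Akka-Http.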
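
    As a hedged sketch of what calling such a deployed prediction API might look like, the code below builds a JSON query for an engine's `/queries.json` endpoint. The address and the query fields ("user", "num") are assumptions for illustration; the actual fields depend on the engine template, and the request is only constructed, not sent.

```python
import json
import urllib.request

ENGINE_URL = "http://localhost:8000"  # assumed address of a deployed engine


def build_query_request(user_id, num):
    """Build (but do not send) a prediction query for a deployed engine."""
    # The query fields depend on the engine template; "user" and "num"
    # are assumed here, as in a typical recommendation engine.
    query = {"user": user_id, "num": num}
    return urllib.request.Request(
        url=f"{ENGINE_URL}/queries.json",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query_request = build_query_request("u1", 4)
print(query_request.get_method(), query_request.full_url)
# With an engine running, the JSON response could be read like this:
# with urllib.request.urlopen(query_request) as response:
#     print(json.load(response))
```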

  90. Moving away from dependence on individuals

  91. Photo by Bernard Spragg. NZ
    Appendix


  92. To develop machine learning applications systematically as an organization,
    Bizreach is adopting Apache PredictionIO as its development and operations platform.
    Members of our company have become committers and joined
    the development of PredictionIO:
    Shinsuke Sugaya
    Naoki Takezoe
    Takako Shimamoto
    Takahiro Hagino

  94. How to Contribute to PIO


  95. Add support for Elasticsearch 5.x


  96. Japan Apache PredictionIO User Group
    JPIOUG
    Join Us!

  98. JPIOUG Meetup
    30 FRI
    02
    19:30 @Shibuya
    [Urgently wanted] People who would like to give a lightning talk

  100. JPIOUG Meetup
    30 WED
    03
    19:30 @Shibuya
    [Casually wanted] People who would like to give a lightning talk

  101. Open Source Machine Learning Server
    15
    Machine Learning
    minutes!
    Thank You
