Japan PredictionIO User Group Meetup #01

A Slide for JPIOUG (Japan PredictionIO User Group) Meetup #01
Presentation slides for the 1st PredictionIO study meetup.
https://d-cube.connpass.com/event/48590/

takahiro-hagino

January 22, 2017

Transcript

  1. Open Source Machine Learning Server
    01
    JPIOUG
    Meetup

  2. Topics
    Introduction to Apache PredictionIO
    What is PIO?
    System Architecture
    The architecture of PIO
    Quick Start
    Let's run PIO
    Implementation of Engine Template
    How to build an engine template

  3. Photo by Bernard Spragg. NZ
    Introduction to

    Apache
    PredictionIO

  4. Apache PredictionIO?
    Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.

  5. Apache PredictionIO lets you
    Build a prediction engine from a template and deploy it as a web service right away
    quickly build and deploy an engine as a web service on production with customizable templates;
    Return results to queries in real time
    respond to dynamic queries in real-time once deployed as a web service;

  6. Apache PredictionIO lets you
    Mechanisms for tuning error and for evaluation are provided
    evaluate and tune multiple engine variants systematically;
    Collect data from any platform in one place, in batch or in real time
    unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics;

  7. Apache PredictionIO lets you
    Build machine learning models quickly with systematized engine templates
    speed up machine learning modeling with systematic processes and pre-built evaluation measures;
    Built around machine learning and data processing libraries such as Spark MLlib and OpenNLP
    support machine learning and data processing libraries such as Spark MLlib and OpenNLP;

  8. Apache PredictionIO lets you
    Implement your own learning models and build them into an engine
    implement your own machine learning models and seamlessly incorporate them into your engine;
    Data infrastructure becomes easier to manage
    simplify data infrastructure management.

  9. The Story Behind the Frog
    Frogs (as well as ants and ladybugs) can predict earthquakes from changes in the climate.
    That is how the PredictionIO frog joined the OSS zoo.
    “I end up finding out other animals like ants, frogs,
    ladybugs etc having the ability to predict various
    attributes from temperature change to earthquake, and
    finally settled on the frog.”


    “PredictionIO’s logo (the Frog) joins a veritable zoo of
    other famous open-source logos featuring animals.”

    The Story Behind the Frog
    blog.prediction.io/story-behind-frog-logo/

  10. Initial Committers
    Pat Ferrell
    ActionML
    Tamas Jambor
    Channel4
    Justin Yip
    independent
    Xusen Yin
    USC
    Lee Moon Soo
    NFLabs
    Donald Szeto
    Salesforce

  11. Overview
    Event Server
    Collects data: events are added through the HTTP-based Event API or the EventClient provided by the SDKs
    Engine
    A type of prediction (for example, recommendation)
    Composed of D-A-S-E
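
As a rough illustration of adding data through the HTTP-based Event API, the sketch below only builds the request URL and JSON body and does not send anything. The port (7070), the `/events.json` path, the `accessKey` query parameter, and the event field names follow the Event API as commonly documented and should be treated as assumptions to check against your version. The "rate" event, the IDs, and the helper name `build_event_request` are made up for this example.

```python
import json
from urllib.parse import urlencode


def build_event_request(access_key, user_id, item_id, rating,
                        host="localhost", port=7070):
    """Return (url, body) for a hypothetical "rate" event.

    Nothing is sent; the caller would POST `body` to `url` with a
    Content-Type of application/json.
    """
    query = urlencode({"accessKey": access_key})
    url = f"http://{host}:{port}/events.json?{query}"
    event = {
        "event": "rate",             # event name (assumed for this example)
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    return url, json.dumps(event)


url, body = build_event_request("YOUR_ACCESS_KEY", "u1", "i42", 4.0)
print(url)
print(body)
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect what would be stored before pointing it at a real Event Server.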

  12. Photo by Bernard Spragg. NZ
    Quick Start

  13. Versions
    Latest Release Version
    v0.9.6
    Current Version
    v0.10.0-incubating
    Road Map
    issues.apache.org/jira/browse/PIO/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel

  14. PIO CLI
    status
    Displays status information about PredictionIO
    version
    Displays the version of this command line console
    template
    Creates a new engine based on an engine template

  15. PIO CLI
    build
    Build an engine at the current directory
    train
    Kick off a training using an engine
    deploy
    Deploy an engine as an engine server

  16. PIO CLI
    eventserver
    Launch an Event Server
    app
    Manage apps that are used by the Event Server
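
One plausible order for the subcommands listed so far is sketched below as plain data. The ordering is an assumption based on a typical quick start, not something the slides state, and the app name `MyApp` is a placeholder. Nothing is executed, because `pio eventserver` and `pio deploy` start long-running servers.

```python
def quick_start_commands(app_name):
    """Return the pio invocations for a hypothetical first run, in order."""
    return [
        ["pio", "status"],                # check the installation
        ["pio", "eventserver"],           # long-running: start the Event Server
        ["pio", "app", "new", app_name],  # register an app and get an access key
        ["pio", "build"],                 # run inside the engine directory
        ["pio", "train"],
        ["pio", "deploy"],                # long-running: start the engine server
    ]


for cmd in quick_start_commands("MyApp"):
    print(" ".join(cmd))
```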

  17. PIO CLI
    accesskey
    Manage app access keys
    export
    Export events from the Event Server
    run
    Launch a driver program
    eval
    Kick off an evaluation using an engine
    dashboard
    Launch an evaluation dashboard

  18. Photo by Bernard Spragg. NZ
    Algorithm

  19. Machine Learning?
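    Extracting
    Feature extraction, etc.
    Transforming
    Data processing, morphological analysis, etc.
    Classification
    Classification problems: supervised
    Regression
    Regression analysis: supervised
    Clustering
    Classification problems: unsupervised
    Collaborative filtering
    Recommendation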

  20. Extractors
    TF-IDF 

    Word2Vec
    Count Vectorizer
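
To make the first extractor concrete, here is a minimal pure-Python TF-IDF sketch. It does not use Spark. The smoothed IDF form `log((N + 1) / (df + 1))` is the one I understand Spark MLlib to document, which is an assumption worth verifying. The function name `tf_idf` and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Term frequency is the raw count of the term within the document.
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]


docs = [["frog", "predicts", "rain"], ["frog", "logo"], ["rain", "rain", "again"]]
weights = tf_idf(docs)
print(weights[1])
```

Terms that occur in many documents ("frog", "rain") get a smaller IDF than terms that occur in only one ("logo").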

  21. Transformers
    Tokenizer
    Morphological analysis
    StopWordsRemover
    Stop-word removal
    n-gram
    Character splitting
    Binarizer
    Threshold conversion
    PCA
    Principal component analysis
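
The first four transformers can be imitated in a few lines of plain Python, as sketched below. These are toy stand-ins and not the Spark classes of the same names. Whitespace tokenization is an assumption that only works for space-delimited text; Japanese would need a real morphological analyzer. The strict "greater than" rule in `binarize` is also my choice for the example.

```python
def tokenize(text):
    """Lower-case and split on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n):
    """Join each run of n consecutive tokens into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def binarize(values, threshold):
    """Map values strictly above the threshold to 1.0, everything else to 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


tokens = remove_stop_words(tokenize("The frog predicts the rain"), {"the"})
print(tokens)
print(ngrams(tokens, 2))
print(binarize([0.1, 0.5, 0.9], 0.5))
```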

  22. Classification
    Logistic Regression
    Decision tree
    Random forest
    Naive Bayes

  23. Regression
    Linear regression
    Generalized linear regression
    Decision tree regression (regression trees)
    Survival regression (survival analysis)
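
As a small worked example of the simplest entry, the sketch below fits a one-variable linear regression with the closed-form least-squares solution. It is plain Python and not the MLlib implementation. The function name `fit_line` and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```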

  24. Clustering
    K-means
    Latent Dirichlet allocation (LDA)
    Topic extraction
    Bisecting k-means
    Gaussian Mixture Model (GMM)
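
For K-means, the core loop (assign each point to its nearest center, then move each center to the mean of its points) can be sketched as follows. This is a toy version of Lloyd's algorithm with fixed starting centers and a fixed iteration count, both assumptions for the example. Real implementations choose initial centers more carefully and stop on convergence.

```python
def k_means(points, centers, iterations=10):
    """Lloyd's algorithm on tuples of floats; returns the final centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the center with the smallest squared distance to p.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
print(k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)]))
```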

  25. Photo by Bernard Spragg. NZ
    System
    Architecture

  26. System Architecture
    Apache Hadoop up to 2.7.2
    required only if YARN and HDFS are needed
    Apache HBase up to 1.2.4
    Apache Spark up to 1.6.3 for Hadoop 2.6
    not Spark 2.x
    Elasticsearch up to 1.7.5
    not Elasticsearch 2.x

  27. System Architecture

  28. HBase
    Event Server
    Column-oriented, distributed database
    OSS implementation modeled on Google's Bigtable
    Developed as part of the Apache Hadoop project
    Runs on top of HDFS and provides Bigtable-like capabilities for Hadoop

  29. Apache Spark
    Data preprocessing and algorithm training
    Large-scale data processing engine
    PredictionIO can also be used without Spark
    In most cases MLlib is used

  30. HDFS
    Reading datasets and writing models
    Distributed file system spanning the cluster
    Besides HDFS, models can be written to the local file system or to Elasticsearch

  31. Elasticsearch
    Metadata management
    Distributed full-text search engine
    Manages metadata such as model versions, engine versions, the mapping between access keys and AppIds, and trained models

  32. Hadoop HDFS
    Copyright © 2008 The Apache Software Foundation.

  33. Hadoop MapReduce
    © tutorialspoint 2017.

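  34. Cons - MapReduce
    Processing time
    Even a job that starts a program and exits without doing anything takes tens of seconds
    Overhead
    Reading from and writing to storage on every step adds a large overhead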

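  35. Cons - MapReduce
    Processing time
    Even a job that starts a program and exits without doing anything takes tens of seconds
    Overhead
    Reading from and writing to storage on every step adds a large overhead
    Performs poorly on iterative workloads such as machine learning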

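  36. Pros - Spark
    Introduction of caching
    Data is kept in memory
    Data that does not fit is spilled to disk
    Matrix data used in machine learning usually fits
    RDD (Resilient Distributed Dataset)
    An abstraction of the dataset being processed
    Keeps the information needed to trace back to storage when a failure occurs, so it is resilient by design
    Represented as an immutable collection in Scala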
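
The caching and "trace back to storage" ideas can be mimicked with a toy class, sketched below. This is not Spark: `LineageDataset`, `lose_cache`, and `read_storage` are hypothetical names, everything runs in one process, and only `map` and `filter` are modeled. The point is that each dataset is immutable and remembers its source plus the list of transformations, so a lost cache can be rebuilt.

```python
class LineageDataset:
    """Toy immutable dataset that records how it was derived from storage."""

    def __init__(self, source, transforms=()):
        self._source = source               # function that re-reads the stored input
        self._transforms = tuple(transforms)
        self._cache = None

    def map(self, fn):
        return LineageDataset(self._source, self._transforms + (("map", fn),))

    def filter(self, fn):
        return LineageDataset(self._source, self._transforms + (("filter", fn),))

    def cache(self):
        self._cache = self._compute()       # keep the result in memory
        return self

    def lose_cache(self):
        self._cache = None                  # simulate losing the in-memory copy

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        return self._compute()              # replay the lineage from storage

    def _compute(self):
        data = list(self._source())
        for kind, fn in self._transforms:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


reads = []


def read_storage():
    reads.append(1)                         # count how often storage is touched
    return [1, 2, 3, 4]


squares = LineageDataset(read_storage).map(lambda x: x * x).filter(lambda x: x > 1).cache()
first = squares.collect()                   # served from the cache
squares.lose_cache()
second = squares.collect()                  # recomputed from storage
print(first, second, len(reads))
```

Storage is read once to fill the cache and once more after the simulated failure, which is the contrast with a MapReduce-style job that goes back to storage on every step.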

  37. Spark RDD
    © tutorialspoint 2017.

  38. Examples using RDD

  39. Photo by Bernard Spragg. NZ
    Implementation of 

    Engine Template

  40. D-A-S-E
    D: Data Source and Data Preparator
    A: Algorithm
    S: Serving
    E: Evaluation Metrics
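
How the D, A and S parts hand data to each other can be sketched in plain Python. Real engine templates are written in Scala against the PredictionIO controller classes; the classes below only borrow the component names, and the event fields, the "average rating" model, and the query shape are all invented for illustration. Evaluation is left out.

```python
class DataSource:
    """D: reads raw events and returns training data."""

    def __init__(self, events):
        self.events = events

    def read_training(self):
        return [(e["user"], e["item"], e["rating"])
                for e in self.events if e["event"] == "rate"]


class Preparator:
    """D: turns training data into the form the algorithm expects."""

    def prepare(self, training_data):
        ratings_by_item = {}
        for _user, item, rating in training_data:
            ratings_by_item.setdefault(item, []).append(rating)
        return ratings_by_item


class Algorithm:
    """A: train() builds a model, predict() answers one query."""

    def train(self, prepared):
        return {item: sum(r) / len(r) for item, r in prepared.items()}

    def predict(self, model, query):
        ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _score in ranked[:query["num"]]]


class Serving:
    """S: combines the predictions of all algorithms into one response."""

    def serve(self, query, predictions):
        return predictions[0]


events = [
    {"event": "rate", "user": "u1", "item": "i1", "rating": 5.0},
    {"event": "rate", "user": "u2", "item": "i1", "rating": 3.0},
    {"event": "rate", "user": "u1", "item": "i2", "rating": 4.5},
    {"event": "view", "user": "u2", "item": "i3"},
]

algorithm = Algorithm()
model = algorithm.train(Preparator().prepare(DataSource(events).read_training()))
query = {"num": 1}
print(Serving().serve(query, [algorithm.predict(model, query)]))
```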

  41. Template Sample
    D
    S
    A
    E

  42. PEvents / LEvents
    PEvents
    Called from Spark during training
    Accesses the data store via Hadoop
    Returns RDD[Event]
    LEvents
    Accesses the data store when the Event Server is called
    Returns Future[Event]

  43. DataSource
    •Reads data from the Event Store (Event Server)
    •Returns TrainingData
    •Extends PDataSource
    •Implements readTraining()
    •Reads the data with the PEventStore Engine API

  44. Preparator
    • Preprocessing of TrainingData
    • Feature extraction
    • Shared processing when multiple Algorithms are used
    • Converts the data to PreparedData and passes it to the Algorithm
    • Implements prepare()

  45. Algorithm
    • Implement train()
    • Responsible for training the predictive model
    • Called by the pio train command
    • The model is stored in HDFS (LocalFS)
    • Implement predict()
    • Called in real time for queries after deployment

  46. Algorithm
    • P2LAlgorithm
    • The model is serialized and saved
    • PAlgorithm
    • For cases where the model contains an RDD
    • The model extends IPersistentModel
    • Implement save() (write)
    • Implement apply() on the companion object (read)
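
The write/read split described for persistent models (a `save()` on the model, a loader on the companion object) can be illustrated with a Python analogue. This is not the PredictionIO interface: the class `AverageRatingModel`, the JSON file layout, and the `load` classmethod standing in for the companion object's `apply()` are all assumptions made for the sketch.

```python
import json
import os
import tempfile


class AverageRatingModel:
    """Toy model that knows how to persist and restore itself."""

    def __init__(self, averages):
        self.averages = averages

    def save(self, model_id, directory):
        """Write side: store the model under its id and report success."""
        path = os.path.join(directory, f"{model_id}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.averages, f)
        return True

    @classmethod
    def load(cls, model_id, directory):
        """Read side: rebuild the model from what save() wrote."""
        path = os.path.join(directory, f"{model_id}.json")
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    AverageRatingModel({"i1": 4.0, "i2": 4.5}).save("demo", tmp)
    restored = AverageRatingModel.load("demo", tmp)

print(restored.averages)
```

Custom persistence like this matters when the model cannot simply be serialized as one object, which is the situation the slide describes for models that contain an RDD.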

  47. Serving
    • Extends LServe
    • Implements serve()

  48. Photo by Bernard Spragg. NZ
    Appendix

  49. How to Contribute to PIO

  50. Add support for Elasticsearch 5.x

  51. Bug fix for Templates

  52. Japan Apache PredictionIO User Group
    JPIOUG
    https://groups.google.com/forum/#!forum/predictionio-user-jp
    Join Us!

  53. Open Source Machine Learning Server
    01
    JPIOUG
    Meetup
    Thank You
