Japan PredictionIO User Group Meetup #01

Slides for JPIOUG (Japan PredictionIO User Group) Meetup #01, the first PredictionIO study meetup.
https://d-cube.connpass.com/event/48590/

takahiro-hagino

January 22, 2017

Transcript

  1. Open Source Machine Learning Server 01 JPIOUG Meetup

  2. Topics: Introduction to Apache PredictionIO (what PIO is); System Architecture (PIO's architecture); Quick Start (running PIO); Implementation of Engine Template (how to build an engine template)
  3. Introduction to Apache PredictionIO (Photo by Bernard Spragg. NZ)

  4. Apache PredictionIO? Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task.
  5. Apache PredictionIO lets you: quickly build and deploy an engine as a web service in production with customizable templates; respond to dynamic queries in real time once deployed as a web service;
  6. Apache PredictionIO lets you: evaluate and tune multiple engine variants systematically; unify data from multiple platforms in batch or in real time for comprehensive predictive analytics;
  7. Apache PredictionIO lets you: speed up machine learning modeling with systematic processes and pre-built evaluation measures; support machine learning and data processing libraries such as Spark MLlib and OpenNLP;
  8. Apache PredictionIO lets you: implement your own machine learning models and seamlessly incorporate them into your engine; simplify data infrastructure management.
  9. The Story Behind the Frog: Frogs (like ants and ladybugs) can predict earthquakes from changes in the weather, and so the PredictionIO frog joined the OSS zoo. “I end up finding out other animals like ants, frogs, ladybugs etc having the ability to predict various attributes from temperature change to earthquake, and finally settled on the frog.” “PredictionIO’s logo (the Frog) joins a veritable zoo of other famous open-source logos featuring animals.” Source: The Story Behind the Frog, blog.prediction.io/story-behind-frog-logo/
  10. Initial Committers: Pat Ferrell (ActionML); Tamas Jambor (Channel4); Justin Yip (independent); Xusen Yin (USC); Lee Moon Soo (NFLabs); Donald Szeto (Salesforce)
  11. None
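  12. Overview. Event Server: collects data; events are added through the HTTP-based Event API or through the EventClient provided by the SDKs. Engine: one per kind of prediction (for example, recommendation); composed of D-A-S-E.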
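As a rough illustration of the HTTP-based Event API mentioned on the Overview slide, the sketch below builds (but does not send) a request that posts one event to an Event Server. It assumes the server listens on its default port 7070 on localhost; the access key, the entity IDs, and the helper name `build_event_request` are placeholders invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url, access_key, event):
    """Build a POST request for the Event API without sending it."""
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    body = json.dumps(event).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event: user u1 rates item i42 with 4.0.
rate_event = {
    "event": "rate",
    "entityType": "user",
    "entityId": "u1",
    "targetEntityType": "item",
    "targetEntityId": "i42",
    "properties": {"rating": 4.0},
    "eventTime": "2017-01-22T10:00:00.000+09:00",
}

request = build_event_request("http://localhost:7070", "YOUR_ACCESS_KEY", rate_event)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually submit the event; that step is left out so the sketch runs without a server.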

  13. Quick Start (Photo by Bernard Spragg. NZ)

  14. None
  15. Versions: latest release version v0.9.6; current version v0.10.0-incubating; road map: issues.apache.org/jira/browse/PIO/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel
  16. PIO CLI: status: displays status information about PredictionIO; version: displays the version of this command line console; template: creates a new engine based on an engine template
  17. PIO CLI: build: builds an engine at the current directory; train: kicks off a training using an engine; deploy: deploys an engine as an engine server
  18. PIO CLI: eventserver: launches an Event Server; app: manages apps that are used by the Event Server
  19. PIO CLI: accesskey: manages app access keys; export: exports events from the Event Server; run: launches a driver program; eval: kicks off an evaluation using an engine; dashboard: launches an evaluation dashboard
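The subcommands listed on the PIO CLI slides are typically run in a fixed order when bringing up an engine. The sketch below only assembles that order as command lines and prints them; it executes nothing. The app name and the helper `quick_start_commands` are made up for illustration, and the ordering is an assumption about a typical quick start rather than something the slides state.

```python
import shlex


def quick_start_commands(app_name):
    """Return a typical sequence of pio invocations as argv lists."""
    return [
        ["pio", "status"],                # check that the installation is healthy
        ["pio", "app", "new", app_name],  # register an app and obtain an access key
        ["pio", "eventserver"],           # long-running; usually started in its own terminal
        ["pio", "build"],                 # build the engine in the current directory
        ["pio", "train"],                 # train a model with the built engine
        ["pio", "deploy"],                # long-running; serves queries over HTTP
    ]


for argv in quick_start_commands("MyApp"):
    print(shlex.join(argv))
```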
  20. Algorithm (Photo by Bernard Spragg. NZ)

  21. None
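  22. Machine Learning? Extracting: feature extraction, etc.; Transforming: processing data, morphological analysis, etc.; Classification: classification problems (supervised); Regression: regression analysis (supervised); Clustering: grouping problems (unsupervised); Collaborative filtering: recommendation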
  23. Extractors: TF-IDF; Word2Vec; CountVectorizer
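To make the first extractor concrete, here is a minimal plain-Python sketch of TF-IDF. It assumes raw term counts for TF and the smoothed IDF, log((N + 1) / (df + 1)), described in the Spark MLlib documentation; it is not the MLlib implementation, and the tiny corpus is invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}
    # Term frequency is the raw count of the term within the document.
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


docs = [["spark", "mllib", "spark"], ["spark", "hbase"], ["hbase", "hdfs"]]
weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(doc, {term: round(value, 3) for term, value in w.items()})
```

A term that is rare across the corpus ("mllib") can outweigh a term that is repeated within one document but common elsewhere ("spark").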

  24. Transformers: Tokenizer (morphological analysis); StopWordsRemover (stop-word removal); n-gram (character splitting); Binarizer (thresholding); PCA (principal component analysis)
  25. Classification: logistic regression; decision tree; random forest; naive Bayes
  26. Regression: linear regression; generalized linear regression; decision tree regression; survival regression
  27. Clustering: k-means; latent Dirichlet allocation (LDA), for topic extraction; bisecting k-means; Gaussian mixture model (GMM)
  28. System Architecture (Photo by Bernard Spragg. NZ)

  29. System Architecture: Apache Hadoop up to 2.7.2 (required only if YARN and HDFS are needed); Apache HBase up to 1.2.4; Apache Spark up to 1.6.3 for Hadoop 2.6 (not Spark 2.x); Elasticsearch up to 1.7.5 (not Elasticsearch 2.x)
  30. System Architecture

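  31. HBase (Event Server): a column-oriented, distributed database; an open source implementation modeled on Google's Bigtable; developed as part of the Apache Hadoop project; runs on HDFS and gives Hadoop Bigtable-like capabilities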

  32. Apache Spark (data preprocessing and algorithm training): a large-scale data processing engine; PredictionIO can also be used without Spark; in most cases MLlib is used

  33. HDFS (reading datasets and writing models): a distributed file system spanning the cluster; besides HDFS, models can be written to the local file system or to Elasticsearch

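  34. Elasticsearch (metadata management): a distributed full-text search engine; manages metadata such as model versions, engine versions, the mapping between access keys and app IDs, and trained models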

  35. Hadoop HDFS Copyright © 2008 The Apache Software Foundation.

  36. Hadoop MapReduce © tutorialspoint 2017.

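  37. Cons - MapReduce. Processing time: even a job that starts a program and exits without doing anything takes tens of seconds. Overhead: every step reads from and writes to storage, which adds a large overhead.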

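  38. Cons - MapReduce. Processing time: even a job that starts a program and exits without doing anything takes tens of seconds. Overhead: every step reads from and writes to storage, which adds a large overhead. Iterative workloads such as machine learning therefore perform poorly.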

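  39. Pros - Spark. Caching: data is kept in memory and spilled to disk when it does not fit; matrix data used for machine learning usually fits. RDD (Resilient Distributed Dataset): an abstraction of the dataset being processed; designed to be resilient, because it keeps the information needed to trace the data back to storage after a failure; represented as an immutable collection in Scala.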
  40. Spark RDD © tutorialspoint 2017.

  41. Examples using RDD
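The RDD properties from the preceding slides (lazy transformations, in-memory caching, and recovery by replaying the lineage from storage) can be illustrated with a toy, single-machine class in plain Python. `MiniRDD`, `drop_cache`, and `read_source` are hypothetical names for this sketch; this is not Spark's API and it distributes nothing.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, rebuilt from its lineage."""

    def __init__(self, source, steps=()):
        self._source = source        # zero-argument function that re-reads the input
        self._steps = tuple(steps)   # lineage: transformations to replay on the source
        self._want_cache = False
        self._cache = None

    def map(self, fn):
        # Transformations return a new MiniRDD; nothing is computed yet.
        return MiniRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._steps + (("filter", pred),))

    def cache(self):
        # Only marks the dataset; the cache is filled by the first action.
        self._want_cache = True
        return self

    def drop_cache(self):
        # Simulates losing the in-memory copy, e.g. after a node failure.
        self._cache = None

    def collect(self):
        # Action: use the cached copy if present, otherwise replay the lineage.
        if self._cache is not None:
            return list(self._cache)
        data = list(self._source())
        for kind, fn in self._steps:
            data = [fn(x) for x in data] if kind == "map" else [x for x in data if fn(x)]
        if self._want_cache:
            self._cache = list(data)
        return data


reads = []  # records every time the "storage" is read


def read_source():
    reads.append(1)
    return [1, 2, 3, 4]


even_squares = MiniRDD(read_source).filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
even_squares.collect()           # first action reads the source and fills the cache
even_squares.collect()           # served from memory, no source read
even_squares.drop_cache()        # lose the cached copy
result = even_squares.collect()  # rebuilt from the source via the lineage
print(result, "source reads:", len(reads))
```

The second `collect()` avoids touching storage, which is the property that makes iterative machine learning workloads cheaper than with MapReduce.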

  42. Implementation of Engine Template (Photo by Bernard Spragg. NZ)

  43. None
  44. D-A-S-E: [D] Data Source and Data Preparator; [A] Algorithm; [S] Serving; [E] Evaluation Metrics
  45. None
  46. None
  47. Template Sample D S A E

  48. PEvents / LEvents. PEvents: called from Spark during training; accesses the data store through Hadoop; returns RDD[Event]. LEvents: accesses the data store when the Event Server is called; returns Future[Event].
  49. DataSource • reads data from the Event Store (Event Server) • returns TrainingData • extends PDataSource • implements readTraining() • reads data through the PEventStore Engine API
  50. Preparator • preprocesses TrainingData • feature extraction • shared processing when multiple Algorithms are used • converts the data to PreparedData and passes it to the Algorithm • implements prepare()
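  51. Algorithm • implements train() • responsible for training the predictive model • invoked by the pio train command • the model is stored in HDFS (or the local file system) • implements predict() • called in real time for each query after deployment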
  52. Algorithm • P2LAlgorithm • the model is serialized and saved • PAlgorithm • for models that contain an RDD • the model extends IPersistentModel • implements save() (write) • implements apply() on the companion object (read)
  53. Serving • extends LServing • implements serve()
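The division of labor among the D-A-S-E components on the preceding slides can be sketched end to end in plain Python. PredictionIO engines are written in Scala against its controller classes, so everything below is a simplified analogy: the class and method names mirror the slides (readTraining, prepare, train, predict, serve), while the event data and the mean-rating "algorithm" are invented for illustration. Evaluation is left out.

```python
from collections import Counter


class DataSource:
    """D: reads raw events (here an in-memory list standing in for the Event Store)."""

    def __init__(self, events):
        self.events = events

    def read_training(self):
        return [e for e in self.events if e["event"] == "rate"]


class Preparator:
    """D: turns training data into the shape the algorithm expects."""

    def prepare(self, training_data):
        return [
            (e["entityId"], e["targetEntityId"], e["properties"]["rating"])
            for e in training_data
        ]


class MeanRatingAlgorithm:
    """A: train() builds a model offline, predict() answers one query."""

    def train(self, prepared_data):
        totals, counts = Counter(), Counter()
        for _user, item, rating in prepared_data:
            totals[item] += rating
            counts[item] += 1
        return {item: totals[item] / counts[item] for item in totals}

    def predict(self, model, query):
        ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _score in ranked[: query["num"]]]


class Serving:
    """S: combines the results of all algorithms; here it returns the first."""

    def serve(self, query, predictions):
        return predictions[0]


events = [
    {"event": "rate", "entityId": "u1", "targetEntityId": "i1", "properties": {"rating": 5.0}},
    {"event": "rate", "entityId": "u2", "targetEntityId": "i1", "properties": {"rating": 3.0}},
    {"event": "rate", "entityId": "u1", "targetEntityId": "i2", "properties": {"rating": 4.5}},
    {"event": "view", "entityId": "u2", "targetEntityId": "i2", "properties": {}},
]

# "Training": DataSource -> Preparator -> Algorithm.train
algorithm = MeanRatingAlgorithm()
model = algorithm.train(Preparator().prepare(DataSource(events).read_training()))

# "Query time": Algorithm.predict -> Serving.serve
query = {"num": 1}
result = Serving().serve(query, [algorithm.predict(model, query)])
print(result)
```

Keeping data access, preprocessing, modeling, and serving in separate classes is what lets a template swap one part (for example the algorithm) without touching the others.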

  54. Appendix (Photo by Bernard Spragg. NZ)

  55. How to Contribute to PIO

  56. Add support for Elasticsearch 5.x

  57. Bug fix for Templates

  58. • Japan Apache PredictionIO User Group (JPIOUG): https://groups.google.com/forum/#!forum/predictionio-user-jp Join Us!

  59. Open Source Machine Learning Server 01 JPIOUG Meetup Thank You