Hakka Labs
February 13, 2015

Building and Deploying ML Applications on Production in a Fraction of the Time

Transcript

  1. Building and Deploying ML Applications on Production in a Fraction of the Time. #predictionio SF Data Mining
  2. Available Open Source Tools
    • Processing Framework: e.g. Apache Spark, Apache Hadoop
    • Algorithm Libraries: e.g. MLlib, Mahout
    • Data Storage: e.g. HBase, Cassandra
  3. A Classic Recommender Example…
    • You have a mobile app.
    • You need a Recommendation Engine: predict products that a customer will like, and show them.
    • Algorithm → Predictive model, based on users’ behaviors
    • App: Predict products
  4. A Classic Recommender Example: prototyping…

      def pseudocode() {
        // Read training data
        val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
        // Build a predictive model with an algorithm
        // (MLlib ALS arguments: rank = 10, iterations = 20, lambda = 0.01)
        val model = ALS.train(trainingData, 10, 20, 0.01)
        // Make predictions: top 5 products for each user
        allUsers.foreach { user =>
          model.recommendProducts(user, 5)
        }
      }
  5. Beyond Prototyping
    • How do you deploy a scalable service that responds to dynamic prediction queries?
    • How do you persist the predictive model in a distributed environment?
    • How do you make HBase, Spark and the algorithms talk to each other?
    • How should you prepare, or transform, the data for model training?
    • How do you update the model with new data without downtime?
    • Where should you add business logic?
    • How do you make the code configurable, reusable and maintainable?
    • How do you build all of this with a separation of concerns (SoC)?
  6. A Classic Recommender Example on production…
    • Mobile App → Event Server (data storage): Data: User Actions
    • Mobile App → Engine: Query via REST: User ID
    • Engine → Mobile App: Predicted Result: A list of Product IDs
  7. Event Server
    • Mobile App → Event Server (data storage): Data: User Actions
    • Mobile App → Engine: Query via REST: User ID
    • Engine → Mobile App: Predicted Result: A list of Product IDs
  8. Event Server: Collecting Data
    • $ pio eventserver
    • Event-based:

      client.create_event(
          event="rate",
          entity_type="user",
          entity_id="user_123",
          target_entity_type="item",
          target_entity_id="item_100",
          properties={"rating": 5.0}
      )
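The `create_event` call above comes from a PredictionIO client SDK. As a rough sketch of what such an event carries, the snippet below builds the same "rate" event as a plain dictionary and serializes it to JSON. The helper name `build_rate_event` and the camelCase field names are assumptions made for illustration, not a confirmed wire format; check the Event Server documentation for the version you run.

```python
import json


def build_rate_event(user_id, item_id, rating):
    """Build a dict describing a "user rates item" event.

    Hypothetical helper: the camelCase keys are assumed, not taken from the slides.
    """
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": float(rating)},
    }


# Same values as the create_event call on the slide.
event = build_rate_event("user_123", "item_100", 5.0)
payload = json.dumps(event, sort_keys=True)
print(payload)
```

Keeping the payload construction separate from the network call makes it easy to log or unit-test events before they reach the Event Server.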
  9. Engine
    • Mobile App → Event Server (data storage): Data: User Actions
    • Mobile App → Engine: Query via REST: User ID
    • Engine → Mobile App: Predicted Result: A list of Product IDs
  10. Engine: Building an Engine with Separation of Concerns (SoC)
    • DASE - the “MVC” for Machine Learning
    • Data: Data Source and Data Preparator
    • Algorithm(s)
    • Serving
    • Evaluator
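To make the DASE split concrete, here is a minimal plain-Python sketch of the same separation of concerns. It is not PredictionIO's API: every class and method name below is hypothetical, the data is hard-coded, the "algorithm" is a trivial mean-rating stand-in for something like ALS, and the Evaluator is left out for brevity.

```python
class DataSource:
    """D: reads raw events (hard-coded here instead of an Event Server)."""

    def read_training(self):
        # (user, item, rating) tuples; illustrative data only
        return [("u1", "i1", 5.0), ("u1", "i2", 3.0),
                ("u2", "i1", 4.0), ("u2", "i3", 2.0)]


class Preparator:
    """D: turns raw training data into the shape the algorithm expects."""

    def prepare(self, training_data):
        ratings_by_item = {}
        for _user, item, rating in training_data:
            ratings_by_item.setdefault(item, []).append(rating)
        return ratings_by_item


class MeanRatingAlgorithm:
    """A: a deliberately trivial stand-in for a real algorithm."""

    def train(self, prepared_data):
        # The "model" is just the mean rating per item.
        return {item: sum(r) / len(r) for item, r in prepared_data.items()}

    def predict(self, model, query):
        ranked = sorted(model.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[: query["num"]]


class Serving:
    """S: post-processes predictions; a natural home for business logic."""

    def serve(self, query, predictions):
        return {"itemScores": [{"item": i, "score": s} for i, s in predictions]}


class Engine:
    """Wires the components together without knowing their internals."""

    def __init__(self, data_source, preparator, algorithm, serving):
        self.data_source = data_source
        self.preparator = preparator
        self.algorithm = algorithm
        self.serving = serving
        self.model = None

    def train(self):
        training_data = self.data_source.read_training()
        prepared = self.preparator.prepare(training_data)
        self.model = self.algorithm.train(prepared)

    def query(self, query):
        predictions = self.algorithm.predict(self.model, query)
        return self.serving.serve(query, predictions)


engine = Engine(DataSource(), Preparator(), MeanRatingAlgorithm(), Serving())
engine.train()
result = engine.query({"user": "u1", "num": 2})
print(result)
```

Because the engine only calls `read_training`, `prepare`, `train`, `predict` and `serve`, any one component can be replaced (for example, a different algorithm) without touching the others.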
  11. Engine: Functions of an Engine
    A. Train deployable predictive model(s)
    B. Respond to dynamic query
    C. Evaluation
  12. Engine: A. Train predictive model(s)
    • Event Server → Data Source → TrainingData → Data Preparator → PreparedData
    • PreparedData → Algorithm 1 → Model 1
    • PreparedData → Algorithm 2 → Model 2
    • PreparedData → Algorithm 3 → Model 3
  13. Engine: A. Train predictive model(s)

      class DataSource(…) extends PDataSource
        def readTraining(sc: SparkContext) ==> trainingData

      class Preparator(…) extends PPreparator
        def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData

      class Algorithm1(…) extends PAlgorithm
        def train(preparedData: PreparedData) ==> Model

      $ pio train
  14. Engine: B. Respond to dynamic query
    • Mobile App → Engine: Query (input)
    • Algorithm 1 + Model 1, Algorithm 2 + Model 2, Algorithm 3 + Model 3 → Predicted Results
    • Predicted Results → Serving → Mobile App: Predicted Result (output)
  15. Engine: B. Respond to dynamic query
    • Query (Input):

      $ curl -H "Content-Type: application/json" \
          -d '{ "user": "1", "num": 4 }' \
          http://localhost:8000/queries.json

      case class Query(
        val user: String,
        val num: Int
      ) extends Serializable
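For clients that cannot shell out to curl, the same query can be built with Python's standard library. This is a sketch under the slide's assumptions (engine deployed at `localhost:8000`, JSON body with `user` and `num`); the helper name `build_query_request` is hypothetical. The request is only constructed here, and the commented lines show how it would be sent.

```python
import json
import urllib.request


def build_query_request(user, num, url="http://localhost:8000/queries.json"):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("1", 4)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a deployed engine listening on that port):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```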
  16. Engine: B. Respond to dynamic query
    • Predicted Result (Output):

      {"itemScores":[{"item":"22","score":4.072304374729956},
                     {"item":"62","score":4.058482414005789},
                     {"item":"75","score":4.046063009943821}]}

      case class PredictedResult(
        val itemScores: Array[ItemScore]
      ) extends Serializable

      case class ItemScore(
        item: String,
        score: Double
      ) extends Serializable
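On the client side, the JSON response maps naturally onto small typed records that mirror the `PredictedResult` and `ItemScore` case classes. The sketch below parses the sample response from the slide; the Python class and function names are illustrative choices, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ItemScore:
    item: str
    score: float


@dataclass
class PredictedResult:
    item_scores: List[ItemScore]


def parse_predicted_result(text):
    """Parse the engine's JSON response into typed records."""
    raw = json.loads(text)
    return PredictedResult(
        [ItemScore(entry["item"], entry["score"]) for entry in raw["itemScores"]]
    )


# Sample response copied from the slide.
response_text = (
    '{"itemScores":[{"item":"22","score":4.072304374729956},'
    '{"item":"62","score":4.058482414005789},'
    '{"item":"75","score":4.046063009943821}]}'
)
result = parse_predicted_result(response_text)
best = max(result.item_scores, key=lambda s: s.score)
print(best.item, best.score)
```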
  17. Engine: B. Respond to dynamic query (Query via REST)

      class Algorithm1(…) extends PAlgorithm
        def predict(model: ALSModel, query: Query) ==> predictedResult

      class Serving extends LServing
        def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
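Since `serve` receives one predicted result per algorithm, it has to decide how to combine them. The plain-Python sketch below shows one possible policy, chosen here purely for illustration: average each item's score over the algorithms that returned it and keep the top `num` items. The slides do not say which policy the deck's engine uses, and the input scores are made up.

```python
from collections import defaultdict


def serve(query, predicted_results):
    """Combine one result per algorithm into a single ranked list.

    Assumed policy: average each item's score over the algorithms that
    returned it, then keep the top query["num"] items.
    """
    scores = defaultdict(list)
    for result in predicted_results:
        for entry in result["itemScores"]:
            scores[entry["item"]].append(entry["score"])
    averaged = [
        {"item": item, "score": sum(vals) / len(vals)}
        for item, vals in scores.items()
    ]
    averaged.sort(key=lambda e: e["score"], reverse=True)
    return {"itemScores": averaged[: query["num"]]}


# Made-up outputs from two algorithms for the same query.
algo1 = {"itemScores": [{"item": "22", "score": 4.0}, {"item": "62", "score": 2.5}]}
algo2 = {"itemScores": [{"item": "22", "score": 2.0}, {"item": "75", "score": 5.0}]}
combined = serve({"user": "1", "num": 2}, [algo1, algo2])
print(combined)
```

Other reasonable policies include returning the first algorithm's result unchanged or filtering out items by a business rule before ranking.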
  18. Engine DASE Factory

      object RecEngine extends IEngineFactory {
        def apply() = {
          new Engine(
            classOf[DataSource],
            classOf[Preparator],
            Map("algo1" -> classOf[Algorithm1]),
            classOf[Serving])
        }
      }
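The factory registers algorithms by name, which is what lets one engine train several models from the same prepared data. A rough plain-Python analogue of that idea is sketched below; both algorithm classes, the function names and the data are hypothetical and exist only to show the name-to-class map in action.

```python
class PopularityAlgorithm:
    """Hypothetical algorithm: scores items by how often they appear."""

    def train(self, prepared_data):
        counts = {}
        for _user, item in prepared_data:
            counts[item] = counts.get(item, 0) + 1
        return counts


class RecencyAlgorithm:
    """Hypothetical algorithm: scores items by their last position in the data."""

    def train(self, prepared_data):
        return {item: index for index, (_user, item) in enumerate(prepared_data)}


def engine_factory():
    """Return a name -> algorithm class map, similar in spirit to Map("algo1" -> ...)."""
    return {"popularity": PopularityAlgorithm, "recency": RecencyAlgorithm}


def train_all(algorithm_classes, prepared_data):
    """Instantiate each registered algorithm and train one model per name."""
    return {
        name: cls().train(prepared_data) for name, cls in algorithm_classes.items()
    }


prepared = [("u1", "i1"), ("u2", "i1"), ("u1", "i2")]
models = train_all(engine_factory(), prepared)
print(models)
```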
  19. PredictionIO
    • PredictionIO is a machine learning server for building and deploying predictive engines on production in a fraction of the time.
    • Built on Apache Spark, MLlib and HBase.
  20. Running on Production
    • Install PredictionIO
      $ bash -c "$(curl -s http://install.prediction.io/install.sh)"
    • Start the Event Server
      $ pio eventserver
    • Deploy an Engine
      $ pio build; pio train; pio deploy
    • Update Engine Model with New Data
      $ pio train; pio deploy
  21. Deploy on Production
    • Website, Mobile App, Email Campaign → Event Server (data storage): Data
    • Website, Mobile App, Email Campaign → Engine 1, Engine 2, Engine 3, Engine 4: Query via REST
    • Engines → clients: Predicted Result
  22. The Next Step
    • Quickstart with an Engine Template!
    • Follow on Github: github.com/predictionio/
    • Learn PredictionIO: prediction.io/
    • Contribute!