Open standards for predictive model deployment - a lightning talk in March 2019

This is a 15-minute presentation I gave at a Meetup of the "Write/Speak/Code" group on March 26, 2019. Predictive model deployment is an important part of the overall machine learning process, and it can be difficult because model building and deployment are often done by different teams working in different languages and environments.
PMML, PFA, and ONNX are three open standards that help solve this problem.
PMML is more than 20 years old, is based on XML, and is widely used for traditional machine learning. PFA is newer, is JSON-based, and is attracting interest. ONNX was released in 2017 by Microsoft and Facebook for deep learning models; many companies now work on it, it supports most DL frameworks, and it is adding support for traditional ML.

Svetlana Levitan

March 26, 2019

Transcript

  1. Open standards for predictive model deployment — Svetlana Levitan, PhD

    IBM Developer Advocate, Center for Open Source Data and AI Technologies, [email protected], @SvetaLevitan
  2. CODAIT (codait.org): Center for Open Source Data and AI Technologies

    DBG / Oct 4, 2018 / © 2018 IBM Corporation
    “Make AI solutions dramatically easier to create, deploy, and manage in the enterprise”
    Projects: • Apache Spark • Fabric for Deep Learning • AI Fairness 360 • Adversarial Robustness Toolkit • Model Asset Exchange • …
    Improving Enterprise AI Lifecycle in Open Source
    Lifecycle stages: Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model
    Tools shown: Python Data Science Stack, Fabric for Deep Learning (FfDL), Mleap + PFA, Scikit-Learn, Pandas, Apache Spark, Jupyter, Model Asset eXchange, Keras + Tensorflow
  3. Typical Stages in Machine Learning

    • Collect Data
    • Analyze and Clean Data
    • Transform Data
    • Build a Model
    • Deploy the model
    • Monitor and update as needed
    Deployment Challenges:
    • different teams, languages, environments
    • must keep transformations with the model(s)
    (C) 2019 IBM Corp
  4. DMG to the rescue!

    Data Mining Group (dmg.org)
    Founded in the late 1990s by Professor Robert Grossman
    DMG develops PMML and PFA
  5. PMML - Predictive Model Markup Language

    • An open standard for XML representation of models and transforms
    • Over 30 vendors and organizations
    • XML elements for specific data transforms and models
    • Now 16 models + ensembles / compositions
    • Platform-, language-, and framework-agnostic
    • Human- and machine-readable
    • Started in or before 1997
    dmg.org/pmml
  6. PMML document structure

    Header: • Application info, timestamp, copyright
    Data Dictionary: • Field info: name, data type, etc.
    Transformation Dictionary: • Derived fields
    Model(s): • Mining schema • Specific model contents
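The document structure on this slide can be sketched with Python's standard library. This is a minimal illustration, not the output of any real PMML exporter: the application name, copyright string, timestamp, version number, and field names are placeholders chosen for the example, and the PMML XML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_pmml_skeleton():
    """Assemble the top-level PMML sections listed on the slide."""
    # The version value is an illustrative choice.
    pmml = ET.Element("PMML", version="4.3")

    # Header: application info, timestamp, copyright (all placeholder values).
    header = ET.SubElement(pmml, "Header", copyright="Example Corp (hypothetical)")
    ET.SubElement(header, "Application", name="example-exporter")
    ET.SubElement(header, "Timestamp").text = "2019-03-26T00:00:00"

    # Data Dictionary: one DataField per field, with name and data type.
    data_dict = ET.SubElement(pmml, "DataDictionary")
    ET.SubElement(data_dict, "DataField", name="petal_length",
                  optype="continuous", dataType="double")
    ET.SubElement(data_dict, "DataField", name="species",
                  optype="categorical", dataType="string")

    # Transformation Dictionary: would hold DerivedField elements; empty here.
    ET.SubElement(pmml, "TransformationDictionary")

    # Model: a mining schema followed by model-specific contents (omitted).
    model = ET.SubElement(pmml, "TreeModel", functionName="classification")
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", name="petal_length")
    ET.SubElement(schema, "MiningField", name="species", usageType="target")
    return pmml


pmml = build_pmml_skeleton()
section_names = [child.tag for child in pmml]
print(section_names)
print(ET.tostring(pmml, encoding="unicode"))
```

A real document would also carry the model body (for a tree, the nested Node elements) after the mining schema.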
  7. Example PMML for a Tree Model

    <Node id="0">
      <True/>
      <Node id="1" score="Iris-setosa" recordCount="50.0">
        <SimplePredicate field="petal_length" operator="lessOrEqual" value="2.6"/>
        <ScoreDistribution value="Iris-setosa" recordCount="50.0"/>
        <ScoreDistribution value="Iris-versicolor" recordCount="0.0"/>
        <ScoreDistribution value="Iris-virginica" recordCount="0.0"/>
      </Node>
      <Node id="2">
        <SimplePredicate field="petal_length" operator="greaterThan" value="2.6"/>
        <Node id="3" score="Iris-versicolor" recordCount="40.0">
          <SimplePredicate field="petal_length" operator="lessOrEqual" value="4.75"/>
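To show how a scoring engine might walk a tree like this one, here is a small sketch using only the Python standard library. It is not JPMML or any production evaluator. The slide excerpt stops inside node 3, so the fragment below is closed off by hand, and node 4 is an assumption added so that every input reaches a leaf. Only `True` and `SimplePredicate` are handled, and the missing-value and no-true-child strategies from the PMML specification are ignored.

```python
import operator
import xml.etree.ElementTree as ET

PMML_TREE = """
<Node id="0">
  <True/>
  <Node id="1" score="Iris-setosa">
    <SimplePredicate field="petal_length" operator="lessOrEqual" value="2.6"/>
  </Node>
  <Node id="2">
    <SimplePredicate field="petal_length" operator="greaterThan" value="2.6"/>
    <Node id="3" score="Iris-versicolor">
      <SimplePredicate field="petal_length" operator="lessOrEqual" value="4.75"/>
    </Node>
    <!-- Node 4 is an assumption: the slide excerpt ends before this point. -->
    <Node id="4" score="Iris-virginica">
      <SimplePredicate field="petal_length" operator="greaterThan" value="4.75"/>
    </Node>
  </Node>
</Node>
"""

# Map PMML comparison operator names to Python functions.
OPERATORS = {
    "lessOrEqual": operator.le,
    "lessThan": operator.lt,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def matches(node, record):
    """Evaluate the node's own predicate against a record (a dict of fields)."""
    if node.find("True") is not None:
        return True
    predicate = node.find("SimplePredicate")
    compare = OPERATORS[predicate.get("operator")]
    return compare(record[predicate.get("field")], float(predicate.get("value")))


def score(node, record):
    """Return the score of the deepest matching node, or None if none matches."""
    if not matches(node, record):
        return None
    for child in node.findall("Node"):
        result = score(child, record)
        if result is not None:
            return result
    return node.get("score")


root = ET.fromstring(PMML_TREE)
for petal_length in (1.4, 4.0, 5.5):
    print(petal_length, score(root, {"petal_length": petal_length}))
```

Because the predicates and the scores both live in the XML, the same file can be scored by any engine that understands PMML, independent of the language used for training.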
  8. JPMML – Open source package for PMML

    Created by Villu Ruusmann, CTO and Founder at Openscoring.IO, Estonia
    JPMML-Model is licensed under the BSD 3-Clause License. Other parts of JPMML have the Apache 2 license.
  9. Portable Format for Analytics - PFA

    PMML is great, except when a model or feature is not supported
    PFA was designed to overcome this
    • JSON format, Avro schemas for data types
    • Built-in functions and common models
    • A mini functional math language + schema specification, portable
    Info: dmg.org/pfa
    Jim Pivarski
  10. A Simple Example of PFA (copied from Nick Pentreath’s presentation)

    • Example: multi-class logistic regression
    • Specify input and output types using Avro schemas
    • Specify the action to perform (typically on input)
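The slide's original example is not in the transcript, so the following is a reconstruction under assumptions rather than the code shown in the talk. It builds a PFA-style document as a Python dictionary with `input`, `output`, `cells`, and `action` sections, then mirrors the same computation in plain Python. The function names `model.reg.linear` and `m.link.softmax` are taken from my reading of the PFA function library and should be checked against the specification at dmg.org/pfa. The coefficients are made up, and `run_action` is a hand-written stand-in, not a PFA engine such as Hadrian.

```python
import json
import math

# A PFA-style document for multi-class logistic regression (illustrative values).
pfa_document = {
    # Input and output types are declared as Avro schemas.
    "input": {"type": "array", "items": "double"},
    "output": {"type": "array", "items": "double"},
    # The model parameters live in a cell, typed with an Avro record schema.
    "cells": {
        "model": {
            "type": {
                "type": "record",
                "name": "Model",
                "fields": [
                    {"name": "coeff",
                     "type": {"type": "array",
                              "items": {"type": "array", "items": "double"}}},
                    {"name": "const", "type": {"type": "array", "items": "double"}},
                ],
            },
            "init": {
                "coeff": [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
                "const": [0.0, 0.5, 0.0],
            },
        }
    },
    # The action: a linear model followed by a softmax link (assumed names).
    "action": [
        {"m.link.softmax": {"model.reg.linear": ["input", {"cell": "model"}]}}
    ],
}


def linear(features, model):
    """One score per class: dot(coeff_row, features) + const."""
    return [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(model["coeff"], model["const"])
    ]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def run_action(document, features):
    """Plain-Python stand-in for what the action above describes."""
    model = document["cells"]["model"]["init"]
    return softmax(linear(features, model))


probabilities = run_action(pfa_document, [2.0, 1.0])
serialized = json.dumps(pfa_document, indent=2)
print(probabilities)
```

Since the whole document is JSON, `serialized` is what would be handed from the model-building environment to the scoring environment.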
  11. Managing State in PFA (copied from Nick Pentreath’s presentation)

    • Data storage specified by cells
    • A cell is a named value acting as a global variable
    • Typically used to store state (such as model coefficients, vocabulary mappings, etc.)
    • Types specified with Avro schemas
    • Cell values are mutable within an action, but immutable between action executions of a given PFA document
    • Persistent storage specified by pools
    • Closer in concept to a database
    • Pool values are mutable across action executions
  12. Known Support for PFA

    • Hadrian (PFA export and scoring engine) from Open Data Group (Chicago, IL)
    • Aardpfark (PFA export in SparkML) by Nick Pentreath, IBM CODAIT, South Africa
    • Woken (PFA export and validation) by Ludovic Claude, CHUV, Lausanne, Switzerland
    There is a lot of interest in PFA! Some work is needed to help it grow.
  13. Open Neural Network eXchange (ONNX)

    IBM Data and AI, 2019 / © 2019 IBM Corporation
    • Binary protobuf format: computation graph + weights + metadata
    • Originally only for deep neural networks; support for other model types has since been added
    • Started by Microsoft and Facebook in September 2017
    • Now supports all major DL frameworks and some traditional ML
    • Many companies are actively working on it
    For more info see onnx.ai
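The "computation graph + weights + metadata" idea can be illustrated with a toy model in plain Python. This sketch does not use the `onnx` package or the protobuf format. The dictionary layout, the metadata values, and the `run` helper are all invented for illustration; only the operator names (`MatMul`, `Add`, `Relu`) are borrowed from the ONNX operator set.

```python
# A toy stand-in for an exported model: metadata, weights, and a graph of nodes.
toy_model = {
    "metadata": {"producer_name": "toy-example", "opset_version": 9},  # illustrative
    "weights": {
        "W": [[1.0, -1.0], [0.5, 2.0]],
        "B": [0.0, -4.0],
    },
    # Each node names an operator plus its input and output values.
    "graph": [
        {"op_type": "MatMul", "inputs": ["X", "W"], "outputs": ["XW"]},
        {"op_type": "Add", "inputs": ["XW", "B"], "outputs": ["Z"]},
        {"op_type": "Relu", "inputs": ["Z"], "outputs": ["Y"]},
    ],
}


def matmul(vec, mat):
    """Row vector times matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


# A tiny "backend": one Python function per supported operator.
OPS = {
    "MatMul": matmul,
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


def run(model, inputs):
    """Evaluate the graph; nodes are assumed to be listed in topological order."""
    values = dict(model["weights"])
    values.update(inputs)
    for node in model["graph"]:
        args = [values[name] for name in node["inputs"]]
        values[node["outputs"][0]] = OPS[node["op_type"]](*args)
    return values


result = run(toy_model, {"X": [1.0, 2.0]})["Y"]
print(result)
```

Any framework that implements the same operators can run the same graph, which is what lets a model trained in one framework be served by another.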
  14. ONNX use pattern

    Diagram: ONNX IR Spec (.onnx) at the center, between a frontend and a backend
    Frontend: models in different frameworks
    Backend: models in different frameworks
    Tools: Netron visualizer, Net Drawer visualizer, Checker, Shape Inferencer, Graph Optimizer, Opset Version Converter
    Flow labels: Training, Inference, Export, Import, Run
  15. https://callforcode.org

    The world answered the call in 2018, and we made a difference… …in 2019, we have a chance to change the world together…
  16. HOW YOU CAN ENGAGE

    INDIVIDUAL CONTRIBUTOR: We’re calling on individual developers to amplify our messaging, register for the challenge and get started building applications that will save lives.
    SUPPORTER (ORGANIZATION): Add your organization’s support to the cause by promoting our messaging and adding your company’s logo to our website indicating that you support the cause.
    SPONSOR (ORGANIZATION): Paid sponsors have the benefit of being front-and-center in all Call for Code outreach. You’ll also have the option of hosting events and tying your product or service to the cause.
    Learn more: https://callforcode.org/why-answer/ https://callforcode.org/become-a-sponsor/ https://callforcode.org/become-a-supporter/ https://callforcode.org/challenge/
  17. Thank you!

    https://www.linkedin.com/in/svetlanalevitan @SvetaLevitan [email protected]
    PMML: dmg.org/pmml
    PFA: dmg.org/pfa
    ONNX: onnx.ai
    IBM CODAIT: codait.org
    Meetup groups: Chicago Cloud Developers, IBM Cloud, Big Data Developers in Chicago, Open Source Analytics