• How does the business benefit from the insights?
• Operationalization is frequently the weak link
  – Operationalizing PowerPoint?
  – Hand-rolled scoring flows?
a different data platform to training
  – Framework-specific persistence formats
• Complex data preprocessing requirements
  – Data cleansing and feature engineering
• Batch training versus real-time/stream scoring
• How frequently are models updated?
• How is performance monitored?
• Created in 1998
  – Version 4.3 just released
• Good for specifying many common model types
• Limited support for complex data preprocessing
  – Can require companion scripts/code
• Broad PMML export support
• Limited import support
deployment to a scoring engine
  – Entire flow encapsulated in model output
• Pre-Processing Workflow
  – Transformations required before model
  – ETL, feature engineering, etc.
• Trained model
  – ML model
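To make "entire flow encapsulated" concrete, here is a minimal sketch in plain Python of a pre-processing workflow and a trained model composed into a single scoring function. Everything in it is an assumption for illustration: the step names (`fill_missing_income`, `add_log_income`), the default income value, and the model coefficients are hypothetical and do not come from the talk.

```python
import math
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Step = Callable[[Record], Record]


def fill_missing_income(record: Record) -> Record:
    # Data cleansing: replace a missing income with a hypothetical default.
    out = dict(record)
    if out.get("income") is None:
        out["income"] = 50000.0
    return out


def add_log_income(record: Record) -> Record:
    # Feature engineering: derive a log-scaled feature from income.
    out = dict(record)
    out["log_income"] = math.log(out["income"])
    return out


def linear_model(record: Record) -> float:
    # Stand-in for a trained model: logistic regression with made-up coefficients.
    z = -10.0 + 0.9 * record["log_income"] + 0.02 * record["age"]
    return 1.0 / (1.0 + math.exp(-z))


def make_scoring_flow(steps: List[Step], model: Callable[[Record], float]) -> Callable[[Record], float]:
    # Bundle pre-processing and model so the scoring side takes raw input only.
    def score(raw: Record) -> float:
        record = raw
        for step in steps:
            record = step(record)
        return model(record)

    return score


score = make_scoring_flow([fill_missing_income, add_log_income], linear_model)
```

Calling `score({"income": None, "age": 40.0})` runs cleansing, feature engineering, and the model in one step, so the scoring engine never needs a separate copy of the transformation logic.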
export PFA
• Process entire pipeline from raw data input to final model output
  – Synthesize PFA doc to represent the flow
• PFA is capable of representing many key operations
  – Much richer than PMML
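As a rough illustration of a PFA doc that represents a flow, the sketch below builds a small PFA-style JSON document in Python: a log transform followed by a linear expression, with `input`, `output`, and `action` sections. The coefficients are hypothetical, the document has not been validated against a real PFA engine, and `toy_eval` is a hand-written stand-in that understands only the three functions used here; it is not a PFA implementation.

```python
import json
import math

# PFA-style document: pre-processing (natural log) and model (linear
# expression) live in one action. Coefficients are made up for illustration.
pfa_doc = {
    "input": "double",
    "output": "double",
    "action": [
        {"+": [{"*": [0.9, {"m.ln": ["input"]}]}, -10.0]}
    ],
}

# The serialized JSON is what would be handed to a scoring engine.
pfa_json = json.dumps(pfa_doc, indent=2)

# Toy support for just the functions used above (assumption: illustration only).
_FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "m.ln": math.log,
}


def toy_eval(expr, symbols):
    # A bare string is treated as a symbol reference, a number as a literal,
    # and a single-key object as a function call on its argument list.
    if isinstance(expr, str):
        return symbols[expr]
    if isinstance(expr, (int, float)):
        return float(expr)
    (name, args), = expr.items()
    return _FUNCTIONS[name](*(toy_eval(a, symbols) for a in args))


def toy_score(doc, value):
    # Evaluate each action expression in order; the last value is the output.
    result = None
    for expr in doc["action"]:
        result = toy_eval(expr, {"input": value})
    return result
```

With these assumptions, `toy_score(json.loads(pfa_json), 60000.0)` scores one raw input from the serialized document alone, which is the property that makes such a document portable between training and scoring platforms.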
• Need cross-platform and cross-framework interoperability
• Need easy model deployment to ensure maximum impact
• PFA makes it much simpler to deploy complex scoring flows
• OSS PFA scoring engines available and easily integrated with Spark
• Working to enable PFA model export from SparkML