Apache SystemML AI/ML

This presentation gives an overview of the Apache SystemML AI/ML project. It explains Apache SystemML in terms of its functionality and dependencies, and describes how SystemDS was forked from it to provide greater functionality.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

June 13, 2020

Transcript

  1. What Is Apache SystemML?
    • A machine learning system
    • Designed to scale to Spark / Hadoop clusters
    • Open source / Apache 2 license
    • Developed in Java
    • Supports R-like and Python-like languages
    • Which are designed to scale into the big data range
    • Automatic optimization based on data and cluster characteristics
  2. SystemML Execution Modes
    • SystemML supports multiple execution modes
    • Including
      – Standalone
      – Spark Batch
      – Spark MLContext
      – Hadoop Batch
      – Java Machine Learning Connector (JMLC)
  3. SystemML Dependencies
    • SystemDS was forked from SystemML 1.2
    • Current dependencies
      – Java 8+
      – Scala 2.11+
      – Python 2.7/3.5+
      – Hadoop 2.6+
      – Spark 2.1+
  4. What Is Apache SystemDS?
    • Forked from Apache SystemML 1.2 in September 2018
    • Supports linear algebra programs over matrices
    • Replaces the underlying data model and compiler
    • Substantially extends the supported functionality
    • Supports the whole data science lifecycle
      – Data integration, cleaning
      – Feature engineering
      – Model training (efficient, local and distributed ML)
      – Deployment, serving
  5. What Is Apache SystemDS?
    • R-like languages for
      – The different stages of the data science lifecycle
      – Users with differing levels of expertise
    • High-level scripts are compiled into hybrid execution plans
      – For local, in-memory CPU / GPU operations
      – For distributed operations on Apache Spark
    • The underlying data model is the DataTensor
      – A tensor (multi-dimensional array) whose first dimension may have a heterogeneous and nested schema
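The idea of a hybrid execution plan can be illustrated with a small sketch. This is not SystemDS code: the names `MatrixShape`, `choose_backend` and `plan_matmul`, the dense-size estimate, and the 2 GiB budget are all hypothetical, chosen only to show how a compiler might decide per operation between a local in-memory backend and a distributed one.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: every cell is stored as a dense double


@dataclass
class MatrixShape:
    """Hypothetical descriptor holding only the dimensions of a matrix."""
    rows: int
    cols: int

    def dense_bytes(self) -> int:
        # Worst-case size estimate, ignoring sparsity and compression.
        return self.rows * self.cols * BYTES_PER_DOUBLE


def choose_backend(inputs, output, memory_budget_bytes):
    """Return "local" if all inputs plus the output fit in the budget, else "spark"."""
    needed = sum(m.dense_bytes() for m in inputs) + output.dense_bytes()
    return "local" if needed <= memory_budget_bytes else "spark"


def plan_matmul(a, b, memory_budget_bytes):
    """Pick a backend for the matrix product of a and b."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions must match")
    out = MatrixShape(a.rows, b.cols)
    return choose_backend([a, b], out, memory_budget_bytes)


budget = 2 * 1024**3  # hypothetical 2 GiB local memory budget
small = plan_matmul(MatrixShape(1_000, 100), MatrixShape(100, 10), budget)
large = plan_matmul(MatrixShape(100_000_000, 100), MatrixShape(100, 10), budget)
print(small, large)
```

The small product needs well under a megabyte and stays local, while the large left operand alone would need about 80 GB as dense doubles, so the sketch routes it to the distributed backend. A real compiler would also account for sparsity, intermediate results and GPU memory.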
  6. SystemDS Algorithms
    • Descriptive Statistics
      – Univariate Statistics
      – Bivariate Statistics
      – Stratified Bivariate Statistics
    • Classification
      – Multinomial Logistic Regression
      – Support Vector Machines
        • Binary-Class Support Vector Machines
        • Multi-Class Support Vector Machines
      – Naive Bayes
      – Decision Trees
      – Random Forests
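As a rough illustration of what a univariate statistics pass computes for a single numeric column, the sketch below uses only the Python standard library. It is not the SystemDS script; the sample values and the choice of population (rather than sample) variance are assumptions made for the example.

```python
import statistics

# Hypothetical sample column; any list of numbers would do.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    # Population variance and standard deviation (divide by n, not n - 1).
    "variance": statistics.pvariance(values),
    "std_dev": statistics.pstdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A scalable system would compute the same quantities over a distributed matrix column instead of an in-memory list.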
  7. SystemDS Algorithms
    • Clustering
      – K-Means Clustering
    • Regression
      – Linear Regression
      – Stepwise Linear Regression
      – Generalized Linear Models
      – Stepwise Generalized Linear Regression
      – Regression Scoring and Prediction
    • Matrix Factorization
      – Principal Component Analysis
      – Matrix Completion via Alternating Minimizations
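To make the linear regression and scoring entries concrete, here is a minimal single-feature least-squares sketch in plain Python. It is an illustration only, not the SystemDS algorithm script; the function names `fit_simple_linear_regression` and `predict` and the sample data are assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Score a new observation with the fitted line."""
    return intercept + slope * x


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope, predict(intercept, slope, 5.0))
```

With many features the same fit is usually expressed as a linear algebra program over matrices, which is the kind of computation the system is designed to scale.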
  8. SystemDS Algorithms
    • Survival Analysis
      – Kaplan-Meier Survival Analysis
      – Cox Proportional Hazard Regression Model
    • Factorization Machines
      – Factorization Machine
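The Kaplan-Meier estimator named above can be sketched in a few lines of plain Python. This is a textbook-style illustration, not the SystemDS implementation; the function name `kaplan_meier` and the sample durations are assumptions. Subjects censored at a given time are treated as still at risk at that time.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival_probability)] for each time with at least one event.

    durations: time until event or censoring for each subject.
    event_observed: True if the event occurred, False if the subject was censored.
    """
    if len(durations) != len(event_observed):
        raise ValueError("durations and event_observed must have equal length")
    records = sorted(zip(durations, event_observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        removed = 0
        # Group all subjects that share the same time t.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                events += 1
            removed += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical study with five subjects, two of them censored.
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for time, prob in curve:
    print(time, round(prob, 3))
```

Each step multiplies the running survival probability by the fraction of at-risk subjects that did not experience the event at that time.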
  9. SystemDS Deep Neural Nets
    • Use SystemDS to implement deep neural networks
      – Specify the network in Keras format and invoke it with the Keras2DML API
      – Specify the network in Caffe format and invoke it with the Caffe2DML API
      – Use the DML-bodied SystemDS-NN library
    • Ease the compute resource demands of training with
      – Native BLAS (Basic Linear Algebra Subprograms)
      – SystemDS GPU backend
  10. Available Books
    • See “Big Data Made Easy” – Apress Jan 2015
    • See “Mastering Apache Spark” – Packt Oct 2015
    • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
    • Find the author on Amazon
      – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
    • Connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
  11. Connect
    • Feel free to connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
    • See my open source blog at
      – open-source-systems.blogspot.com/
    • I am always interested in
      – New technology
      – Opportunities
      – Technology-based issues
      – Big data integration