Slide 1

Slide 1 text

Machine & Deep Learning with Spring Cloud Data Flow
Christian Tzolov
Pivotal Engineer, Spring Cloud Data Flow
Apache Committer, Crunch PMC member

Slide 2

Slide 2 text

Industry Trends
- Enterprises are adopting DevOps practices as they transition into software- and data-driven businesses.
- ETL, integration with existing systems, and modernization efforts remain very important.
- Continuous event processing is becoming mainstream.
- Integration of IoT data flows and Machine Learning/Deep Learning algorithms.

Slide 3

Slide 3 text

Machine / Deep Learning (ML/DL)
- Brings unprecedented abilities to the software engineering field
- Provides a different way to reason about problems
- Solves “un-programmable” tasks

Slide 4

Slide 4 text

How can ML/DL help us deliver richer business solutions? And for Java practitioners?

Slide 5

Slide 5 text

Image Recognition TensorFlow Demo:
Spoiler: Spring Cloud Data Flow (SCDF) can tackle the ML integration complexity.

Slide 6

Slide 6 text

The ML Paradigm
- Observations about an uncertain world
- Experiments with training datasets
- Statistics to analyze the results

Slide 7

Slide 7 text

ML/DL Life-cycle
- Phase 1: Train a model on historical datasets
- Phase 2: Run the pre-trained model for predictive analytics
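To make the two phases concrete, here is a minimal plain-Java sketch, assuming a toy one-feature linear model fitted by ordinary least squares instead of any real ML library. The names `LifecycleSketch`, `LinearModel`, and `train` are hypothetical; the point is only that Phase 1 produces a model object once and Phase 2 reuses it, unchanged, for predictions.

```java
import java.util.Arrays;

/**
 * Sketch of the two-phase ML life-cycle with a toy model y = slope * x + intercept.
 * All names are hypothetical; no ML library is used.
 */
class LifecycleSketch {

    /** The "pre-trained model": just the learned parameters, immutable after training. */
    static final class LinearModel {
        final double slope;
        final double intercept;

        LinearModel(double slope, double intercept) {
            this.slope = slope;
            this.intercept = intercept;
        }

        /** Phase 2: inference with fixed parameters. */
        double predict(double x) {
            return slope * x + intercept;
        }
    }

    /** Phase 1: fit slope and intercept by ordinary least squares on historical (x, y) pairs. */
    static LinearModel train(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double meanX = Arrays.stream(xs).average().getAsDouble();
        double meanY = Arrays.stream(ys).average().getAsDouble();
        double covXY = 0;
        double varX = 0;
        for (int i = 0; i < xs.length; i++) {
            covXY += (xs[i] - meanX) * (ys[i] - meanY);
            varX += (xs[i] - meanX) * (xs[i] - meanX);
        }
        if (varX == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = covXY / varX;
        return new LinearModel(slope, meanY - slope * meanX);
    }

    public static void main(String[] args) {
        // Phase 1: train once on made-up historical observations (here y = 2x + 1).
        LinearModel model = train(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
        // Phase 2: reuse the trained model for new inputs.
        System.out.println(model.predict(10)); // 21.0
    }
}
```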

Slide 8

Slide 8 text

For Java practitioners?
Model inference for predictive analytics is the most common use of ML/DL in Java applications.

Slide 9

Slide 9 text

Inference Considerations
- ML model reusability: PMML, PFA, MLeap, ONNX … TensorFlow
- Model serving vs. embedding
[Diagrams: (1) Input Data Stream → Java Process with the pre-trained ML model embedded → Output Predictions Stream; (2) Input Data Stream → Java Process calling a pre-trained ML model hosted in an External System → Output Predictions Stream]
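A rough sketch of the serving vs. embedding choice, assuming a hypothetical `Predictor` interface: the embedded variant is an in-process method call, while the served variant posts features to an external model server over HTTP. The endpoint URL and the comma-separated wire format are assumptions, not any real model-serving API. Only the embedded variant is exercised here, since the remote one needs a running server.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch contrasting embedded and served model inference. */
class InferenceOptions {

    /** Common contract for both deployment styles. */
    interface Predictor {
        String predict(double[] features) throws Exception;
    }

    /** Embedding: the pre-trained model lives in the same JVM as the stream app. */
    static final class EmbeddedPredictor implements Predictor {
        private final Function<double[], String> model;

        EmbeddedPredictor(Function<double[], String> model) {
            this.model = model;
        }

        @Override
        public String predict(double[] features) {
            return model.apply(features); // plain method call, no network hop
        }
    }

    /** Serving: the model runs in an external system reached over HTTP. */
    static final class RemotePredictor implements Predictor {
        private final HttpClient client = HttpClient.newHttpClient();
        private final URI endpoint; // hypothetical model-server URL

        RemotePredictor(URI endpoint) {
            this.endpoint = endpoint;
        }

        @Override
        public String predict(double[] features) throws Exception {
            // Hypothetical wire format: comma-separated features in, label out.
            String body = Arrays.stream(features)
                    .mapToObj(d -> Double.toString(d))
                    .collect(Collectors.joining(","));
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "text/plain")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy stand-in for a pre-trained model: threshold on the first feature.
        Predictor embedded = new EmbeddedPredictor(f -> f[0] > 2.5 ? "large" : "small");
        System.out.println(embedded.predict(new double[] {3.0})); // large
    }
}
```

Embedding avoids a network round trip per record and one more system to operate; serving lets the model be updated or scaled independently of the stream application.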

Slide 10

Slide 10 text

Reference Architecture
- Real-time ML inference
- Embedded pre-trained models
- PMML & TensorFlow ML models
[Diagram: Input Data Stream → Java Process (embedded pre-trained ML model) → Output Predictions]
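A minimal sketch of the processor in this reference architecture, assuming CSV-encoded input records and a hypothetical `Model` interface standing in for an embedded PMML or TensorFlow model that is loaded once and reused for every message. The processor is a `java.util.function.Function<String, String>`; recent Spring Cloud Stream versions can bind such a `Function` bean as a processor, but this sketch uses no Spring APIs and simulates the input stream with a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of a real-time inference processor with an embedded model. */
class InferenceProcessorSketch {

    /** Hypothetical pre-trained model: loaded once, reused for every message. */
    interface Model {
        String predict(double[] features);
    }

    /** Builds the processor: input record -> features -> prediction -> output record. */
    static Function<String, String> processor(Model model) {
        Function<String, double[]> decode = line -> Arrays.stream(line.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
        Function<double[], String> infer = model::predict;
        Function<String, String> encode = label -> "{\"prediction\":\"" + label + "\"}";
        return decode.andThen(infer).andThen(encode);
    }

    public static void main(String[] args) {
        // Toy rule standing in for a real PMML or TensorFlow model.
        Model model = f -> f[2] < 2.5 ? "setosa" : "other";
        Function<String, String> proc = processor(model);

        // Simulated input data stream; in a deployed pipeline records arrive from the broker.
        List<String> output = List.of("5.1,3.5,1.4,0.2", "6.4,3.2,4.5,1.5")
                .stream()
                .map(proc)
                .collect(Collectors.toList());
        output.forEach(System.out::println);
    }
}
```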

Slide 11

Slide 11 text

Species Prediction
- Iris Flower Dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
- Naive Bayes classifier: https://en.wikipedia.org/wiki/Naive_Bayes_classifier
- SCDF Sample: https://docs.spring.io/spring-cloud-dataflow-samples/docs/current/reference/htmlsingle/#_data_science
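To illustrate the classifier named on this slide, here is a from-scratch Gaussian Naive Bayes sketch in plain Java. The SCDF sample itself evaluates a pre-trained PMML model; this sketch instead trains on nine rows taken from the Iris dataset (three per species), assumes normally distributed features that are independent given the species, and adds a hypothetical smoothing constant (`VAR_SMOOTHING`) to every variance so a constant feature does not cause division by zero.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Gaussian Naive Bayes sketch: per-class feature means/variances plus class priors. */
class IrisNaiveBayes {

    /** Hypothetical smoothing constant added to each variance to avoid division by zero. */
    private static final double VAR_SMOOTHING = 1e-2;

    private final Map<String, double[]> means = new HashMap<>();
    private final Map<String, double[]> variances = new HashMap<>();
    private final Map<String, Double> logPriors = new HashMap<>();

    /** Training: estimates per-class statistics from feature rows grouped by species label. */
    IrisNaiveBayes(Map<String, List<double[]>> rowsByLabel) {
        int total = rowsByLabel.values().stream().mapToInt(List::size).sum();
        rowsByLabel.forEach((label, rows) -> {
            int n = rows.size();
            int d = rows.get(0).length;
            double[] mean = new double[d];
            double[] var = new double[d];
            for (double[] row : rows) {
                for (int j = 0; j < d; j++) mean[j] += row[j] / n;
            }
            for (double[] row : rows) {
                for (int j = 0; j < d; j++) var[j] += (row[j] - mean[j]) * (row[j] - mean[j]) / n;
            }
            for (int j = 0; j < d; j++) var[j] += VAR_SMOOTHING;
            means.put(label, mean);
            variances.put(label, var);
            logPriors.put(label, Math.log((double) n / total));
        });
    }

    /** Inference: returns the label with the highest log-posterior (up to a shared constant). */
    String predict(double[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : means.keySet()) {
            double[] mu = means.get(label);
            double[] var = variances.get(label);
            double score = logPriors.get(label);
            for (int j = 0; j < x.length; j++) {
                // log N(x_j; mu_j, var_j); the "naive" step sums these as if features were independent
                score += -0.5 * Math.log(2 * Math.PI * var[j])
                        - (x[j] - mu[j]) * (x[j] - mu[j]) / (2 * var[j]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    /** Three Iris rows per species: sepal length, sepal width, petal length, petal width (cm). */
    static Map<String, List<double[]>> sampleTrainingData() {
        return Map.of(
                "setosa", List.of(new double[] {5.1, 3.5, 1.4, 0.2},
                                  new double[] {4.9, 3.0, 1.4, 0.2},
                                  new double[] {4.7, 3.2, 1.3, 0.2}),
                "versicolor", List.of(new double[] {7.0, 3.2, 4.7, 1.4},
                                      new double[] {6.4, 3.2, 4.5, 1.5},
                                      new double[] {6.9, 3.1, 4.9, 1.5}),
                "virginica", List.of(new double[] {6.3, 3.3, 6.0, 2.5},
                                     new double[] {5.8, 2.7, 5.1, 1.9},
                                     new double[] {7.1, 3.0, 5.9, 2.1}));
    }

    public static void main(String[] args) {
        IrisNaiveBayes model = new IrisNaiveBayes(sampleTrainingData());
        System.out.println(model.predict(new double[] {6.7, 3.1, 4.4, 1.4})); // versicolor
    }
}
```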

Slide 12

Slide 12 text

Let’s do a Twitter Sentiment Analysis!
☹ Ingest → Processing (predictions) → Storage

Slide 13

Slide 13 text

Let’s do real-time Object Detection!

Slide 14

Slide 14 text

Spring Cloud Data Flow: a toolkit for building data integration, real-time, and batch data processing pipelines

Slide 15

Slide 15 text

Spring Cloud Stream: an event-driven microservice framework
- eliminates boilerplate when developing messaging apps
- pluggable messaging middleware abstraction
- durable publish/subscribe semantics
- data partitioning
- schema evolution and management
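To illustrate the data partitioning bullet, here is a minimal sketch assuming hash-based partitioning: messages with the same key always map to the same partition, so related data (for example, all readings from one sensor) is handled by the same consumer instance. Spring Cloud Stream's default partition selection is also derived from the key's hash code, but `PartitionSketch` and `partitionFor` are hypothetical and not its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical key-based partitioning sketch; not the Spring Cloud Stream API. */
class PartitionSketch {

    /** Maps a message key to a partition in [0, partitionCount); equal keys always map to the same one. */
    static int partitionFor(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String sensorId : List.of("sensor-a", "sensor-b", "sensor-a", "sensor-c")) {
            partitions.computeIfAbsent(partitionFor(sensorId, 3), p -> new ArrayList<>()).add(sensorId);
        }
        // Both "sensor-a" messages land in the same partition.
        System.out.println(partitions);
    }
}
```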

Slide 16

Slide 16 text

Spring Cloud Streams
[Diagram: Stream A: Source → Processor → Processor → Sink, connected through the Transport Middleware]
- DSL inspired by Unix Pipes & Filters
- Source | Processor* | Sink
- The data payload flows through a transport abstraction
- Example:
  stream create demo --deploy --definition "http | transform --expression=payload.toUpperCase() | file"
  (http is the Source, transform is the Processor configured with the --expression option, and file is the Sink)
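The DSL example can be mirrored in plain Java, assuming string messages and a source that signals the end of input by returning `null` (a simplification; real sources are unbounded). A `Supplier` plays the source, a `Function` the processor, and a `Consumer` the sink. Spring Cloud Stream's functional programming model uses the same three `java.util.function` types for sources, processors, and sinks, but the hypothetical `PipeSketch` below uses neither Spring nor any messaging middleware.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical in-memory sketch of source | processor | sink. */
class PipeSketch {

    /** Pumps messages from source through processor into sink until the source returns null. */
    static <I, O> void run(Supplier<I> source, Function<I, O> processor, Consumer<O> sink) {
        for (I msg = source.get(); msg != null; msg = source.get()) {
            sink.accept(processor.apply(msg));
        }
    }

    public static void main(String[] args) {
        // Stand-in for the "http" source: a fixed list of incoming payloads.
        Iterator<String> incoming = List.of("hello", "data flow").iterator();
        Supplier<String> source = () -> incoming.hasNext() ? incoming.next() : null;

        // Equivalent of: transform --expression=payload.toUpperCase()
        Function<String, String> transform = String::toUpperCase;

        // Stand-in for the "file" sink: collect lines in memory.
        List<String> file = new ArrayList<>();
        Consumer<String> sink = file::add;

        run(source, transform, sink);
        System.out.println(file); // [HELLO, DATA FLOW]
    }
}
```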

Slide 17

Slide 17 text

Spring Cloud Task: a short-lived microservice framework
- end-to-end auditing
- snapshotting and checkpointing for replays
- pluggable task repository abstraction
- remote partitioning
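A minimal sketch of what end-to-end auditing and a pluggable task repository mean for a short-lived task, assuming a hypothetical `ExecutionRepository` interface with an in-memory implementation. Spring Cloud Task provides its own task repository for this purpose; none of the names below are its API, and the exit-code convention is an assumption.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical sketch of auditing short-lived task runs; not the Spring Cloud Task API. */
class TaskAuditSketch {

    /** Hypothetical audit record for one run of a task. */
    static final class Execution {
        final String taskName;
        final Instant start;
        final Instant end;
        final int exitCode;
        final String error; // null when the task completed without throwing

        Execution(String taskName, Instant start, Instant end, int exitCode, String error) {
            this.taskName = taskName;
            this.start = start;
            this.end = end;
            this.exitCode = exitCode;
            this.error = error;
        }
    }

    /** Hypothetical pluggable repository: swap the in-memory list for a database-backed one. */
    interface ExecutionRepository {
        void save(Execution execution);
    }

    static final class InMemoryRepository implements ExecutionRepository {
        final List<Execution> executions = new ArrayList<>();

        @Override
        public void save(Execution execution) {
            executions.add(execution);
        }
    }

    /** Runs the task to completion and records the outcome, including failures. */
    static int run(String taskName, IntSupplier task, ExecutionRepository repository) {
        Instant start = Instant.now();
        int exitCode;
        String error = null;
        try {
            exitCode = task.getAsInt();
        } catch (RuntimeException ex) {
            exitCode = 1; // assumed convention: an uncaught exception means exit code 1
            error = ex.getMessage();
        }
        repository.save(new Execution(taskName, start, Instant.now(), exitCode, error));
        return exitCode;
    }

    public static void main(String[] args) {
        InMemoryRepository repository = new InMemoryRepository();
        run("import-csv", () -> 0, repository);
        run("export-report", () -> { throw new IllegalStateException("disk full"); }, repository);
        for (Execution e : repository.executions) {
            System.out.println(e.taskName + " exit=" + e.exitCode + " error=" + e.error);
        }
    }
}
```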

Slide 18

Slide 18 text

Streaming Apps
- Source: file, ftp, gemfire, gemfire-cq, http, jdbc, jms, load-generator, loggregator, mail, mongodb, mqtt, rabbit, s3, sftp, syslog, tcp, tcp-client, time, trigger, triggertask, twitterstream
- Processor: aggregator, bridge, filter, groovy-filter, groovy-transform, header-enricher, httpclient, pmml, python-http, python-jython, scriptable-transform, splitter, tasklaunchrequest-transform, tcp-client, tensorflow, transform, twitter-sentiment
- Sink: aggregate-counter, cassandra, counter, field-value-counter, file, ftp, gemfire, gpfdist, hdfs, hdfs-dataset, jdbc, log, mongodb, mqtt, pgcopy, rabbit, redis-pubsub, router, s3, sftp, task-launcher-cloudfoundry, task-launcher-local, task-launcher-yarn, tcp, throughput, websocket

Batch/Task Apps
- Task: composed-task-runner, jdbchdfs-local, spark-client, spark-cluster, spark-yarn, timestamp, timestamp-batch

Slide 19

Slide 19 text

SCDF TensorFlow Processor

Slide 20

Slide 20 text

Pivotal Data Suite

Slide 21

Slide 21 text

References
[1] PMML - Predictive Model Markup Language: https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
[2] Spring Cloud Data Flow (SCDF): https://cloud.spring.io/spring-cloud-dataflow/
[3] Image Recognition Demo Video: https://www.youtube.com/watch?v=bvDM7_CKQjo&t=38s
[4] Species Prediction PMML Sample: https://docs.spring.io/spring-cloud-dataflow-samples/docs/current/reference/htmlsingle/#_data_science
[5] SCDF Twitter Sentiment Analysis (TensorFlow): http://bit.ly/2DHpTfX
[6] SCDF Object Detection TensorFlow Processor: https://github.com/spring-cloud-stream-app-starters/tensorflow/tree/master/spring-cloud-starter-stream-processor-object-detection
[7] Object Detection Example: https://www.youtube.com/watch?v=2uOtImHKtgI&t=2s
[8] Spring Cloud Stream: http://cloud.spring.io/spring-cloud-stream/
[9] Spring Cloud Task: http://cloud.spring.io/spring-cloud-task/

Slide 22

Slide 22 text

Keep in touch
https://github.com/spring-cloud-stream-app-starters/tensorflow
https://twitter.com/christzolov
https://www.linkedin.com/in/tzolov