Slide 1

Slide 1 text

© 2014 Pivotal Software, Inc. All rights reserved.
Distributed Data Processing with Spring Cloud Data Flow
Kenny Bastani, Spring Developer Advocate

Slide 2

Slide 2 text

Speaker Intro

Slide 3

Slide 3 text

Kenny Bastani
Follow me on Twitter: @kennybastani

Slide 4

Slide 4 text

Agenda
1. Agenda
2. Microservices
3. What is Spring Boot?
4. What is Spring Cloud?
5. Lattice - Cloud Native Platform
6. Spring Cloud Data Flow
7. Streaming Analytics Example (Twitter)
8. Demo

Slide 5

Slide 5 text

Microservices
“Kind of like a headache, but more of them.”

Slide 6

Slide 6 text

Quick Explanation of Microservices
• Each team gets one database and one service
• Shared caches are platform-provided services that are shared for consistency

Slide 7

Slide 7 text


Slide 8

Slide 8 text

Cloud Native Microservices

Slide 9

Slide 9 text

Cloud-native Microservice Deployment
• Each microservice can be containerized with its application dependencies
• Containers get scheduled on virtual machines with an allotted resource policy

Slide 10

Slide 10 text

Auto-scaling
• An elastic runtime handles auto-scaling of VMs with cloud providers
• Microservices should be scaled horizontally (more load-balanced instances) and not vertically

Slide 11

Slide 11 text

Composition & Orchestration
• Each microservice needs to communicate outside containers
• Service discovery provides an automatic method for finding other service dependencies

Slide 12

Slide 12 text

Spring Boot
A JVM micro-framework for building microservices

Slide 13

Slide 13 text

What is Spring Boot?

Slide 14

Slide 14 text

Spring Boot Roles

Slide 15

Slide 15 text

Automatic Configuration
• An application class is annotated with @SpringBootApplication
• Additional annotations are added to indicate the role of the Spring Boot application

Slide 16

Slide 16 text

Spring Boot for Microservices

Slide 17

Slide 17 text

Spring Cloud
A toolset designed for building distributed systems

Slide 18

Slide 18 text

What is Spring Cloud?
• Spring Cloud provides a way to turn Spring Boot microservices into distributed applications

Slide 19

Slide 19 text

What is Spring Cloud?
✴ Service Discovery
✴ API Gateway
✴ Circuit Breakers
✴ Distributed Tracing

Slide 20

Slide 20 text

Service Discovery & Configuration Service

Slide 21

Slide 21 text

Configuration Service

Slide 22

Slide 22 text

Service Discovery

Slide 23

Slide 23 text

API Gateway

Slide 24

Slide 24 text

But what about deployments?

Slide 25

Slide 25 text

Lattice
A cloud-native platform for deploying and scaling containers in production

Slide 26

Slide 26 text

Containers, containers, containers
• Lattice helps you manage Docker container deployments on clusters of VMs
• Choose the cloud provider you want; Lattice deploys containers from Docker Hub

Slide 27

Slide 27 text

Spring Cloud Data Flow
Java microservices that process data in a pipeline

Slide 28

Slide 28 text

Spring Cloud Data Flow
• What is Spring Cloud Data Flow?
  – Spring Cloud Data Flow is a data processing pipeline that uses Spring Boot microservices
  – Each Spring Boot microservice takes in a message and produces a message containing the data that you're processing

Slide 29

Slide 29 text

How do we design a data processing pipeline?

Slide 30

Slide 30 text

Start with understanding your source data
• What do you want to measure?
  – Trending analytics in real time

Slide 31

Slide 31 text

Understand the function of your pipeline
• What does an individual message contain?
  – A single tweet

Slide 32

Slide 32 text

Understand what you want to measure
• What do you want to filter?
  – A set of hash tags in the body of a tweet

Slide 33

Slide 33 text

Understand the outputs
• What do I want to measure over time?
  – The velocity of hash tag counts from tweets every second

Slide 34

Slide 34 text

What is the result of our measurements?
• A graph that shows the velocity of each tweet and its hash tags over time
• Real-time streaming analytics so we can make fast, informed decisions

Slide 35

Slide 35 text

Data Processing Pipeline Example
[Diagram: Data -> Source (ingest messages) -> Filter (filter messages) -> Process (data transformation) -> Counter (increment counters) -> Database]

Slide 36

Slide 36 text

Scalable distributed data processing
[Diagram: the pipeline stages Source (ingest messages), Filter (filter messages), Process (data transformation), and Counter (increment counters), writing to a database]

Slide 37

Slide 37 text

Input and output channels
• Each Spring Boot microservice has an input channel and an output channel
[Diagram: message queue -> input channel -> Spring Boot microservice -> output channel -> message queue]

Slide 38

Slide 38 text

Input and output channels
[Diagram: messages on a message queue -> input channel -> Spring Boot microservice -> output channel -> messages on a message queue]
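
A minimal sketch of the input/output channel idea, using only the Java standard library. The in-memory `BlockingQueue`s stand in for the message queues, and the `ChannelStage` class and its `drain` method are hypothetical names for illustration; they are not part of Spring Cloud Data Flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical stand-in for one pipeline stage: it reads messages from an
// input channel, applies a function, and writes results to an output channel.
final class ChannelStage {

    // Drains every message currently queued on `input`, applies `handler`,
    // and publishes each result to `output`. Returns the number handled.
    static int drain(BlockingQueue<String> input,
                     BlockingQueue<String> output,
                     Function<String, String> handler) {
        List<String> batch = new ArrayList<>();
        input.drainTo(batch);
        for (String message : batch) {
            output.add(handler.apply(message));
        }
        return batch.size();
    }

    public static void main(String[] args) {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        in.add("hello");
        in.add("world");
        drain(in, out, String::toUpperCase);
        System.out.println(out); // prints [HELLO, WORLD]
    }
}
```

In a real deployment the queues would be provided by a message broker rather than held in memory.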

Slide 39

Slide 39 text

Building Real-time Analytics on Twitter hash tags

Slide 40

Slide 40 text

Setting up a data processing pipeline

Slide 41

Slide 41 text

We want to build something like this:
[Diagram: the pipeline stages Source (ingest messages), Filter (filter messages), Process (data transformation), and Counter (increment counters), writing to a database]

Slide 42

Slide 42 text

Tweet Source Module

Slide 43

Slide 43 text

Responsibility of a source module
• Ingest data from multiple sources, such as a streaming REST API endpoint or HDFS
• Transform a stream into discrete messages that are uniformly distributed, such as an individual tweet
• Output those messages to an output channel for the next service to process
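
The "stream into discrete messages" step can be sketched as follows. This assumes, purely for illustration, that the incoming stream is newline-delimited text with one tweet per line; `LineSource` and `toMessages` are hypothetical names, and a real source module would read from the Twitter streaming endpoint instead of a `StringReader`.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical source helper: turns a newline-delimited text stream into
// one message per non-blank line.
final class LineSource {

    // The caller owns `stream` and is responsible for closing it.
    static List<String> toMessages(Reader stream) {
        return new BufferedReader(stream).lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Reader stream = new StringReader("first tweet\n\nsecond tweet\n");
        System.out.println(toMessages(stream)); // prints [first tweet, second tweet]
    }
}
```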

Slide 44

Slide 44 text

Ingesting tweets
• We start by building a Spring Boot source module that imports tweets from Twitter
[Diagram: Data -> Tweet Source Module (ingest tweets)]

Slide 45

Slide 45 text

Visualize the tweet source module
[Diagram: Twitter API streaming endpoint (Twitter stream) -> Twitter Stream Service (Spring Boot source module) -> output channel "twitter-stream", one message per tweet]

Slide 46

Slide 46 text

Adding a filter module

Slide 47

Slide 47 text

Responsibility of a filter module
• Filter messages from the source module
• Filter noise to increase the quality of measurements in downstream modules
• Example:
  – I only want to measure tweets containing the hash tag #java2days
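
A minimal sketch of that filter rule in plain Java. The `HashtagFilter` class is a hypothetical name, and the matching rule (whitespace-separated tokens, case-insensitive, trailing punctuation ignored) is an assumption; the talk does not specify how the demo's filter module matches hash tags.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical filter: keeps only tweets that contain a given hash tag.
final class HashtagFilter {

    // True when one whitespace-separated token of the tweet equals the hash
    // tag, ignoring case and any trailing punctuation such as "!" or ",".
    static boolean containsHashtag(String tweet, String hashtag) {
        String wanted = hashtag.toLowerCase(Locale.ROOT);
        for (String token : tweet.split("\\s+")) {
            String cleaned = token.replaceAll("[^\\p{L}\\p{N}_#]+$", "");
            if (cleaned.toLowerCase(Locale.ROOT).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Returns the tweets that pass the filter, in their original order.
    static List<String> filter(List<String> tweets, String hashtag) {
        return tweets.stream()
                .filter(tweet -> containsHashtag(tweet, hashtag))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Great talk at #java2days!",
                "Lunch time #food",
                "#Java2Days keynote");
        System.out.println(filter(tweets, "#java2days"));
        // prints [Great talk at #java2days!, #Java2Days keynote]
    }
}
```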

Slide 48

Slide 48 text

Visualizing the filter module
[Diagram: input channel "twitter-stream" (tweets) -> Filter Service (Spring Boot), which passes only tweets containing #java2days -> output channel "processor-stream"]

Slide 49

Slide 49 text

Our pipeline now looks like:
[Diagram: Twitter Stream API -> Twitter Stream (source) -> tweets -> Filter Tweets (#java2days) -> …]

Slide 50

Slide 50 text

Adding a processor module

Slide 51

Slide 51 text

Responsibility of a processor module
• Take a filtered stream of messages and produce multiple output messages by transforming the payload into multiple dimensions of attributes
• For example:
  – Take a #java2days tweet, parse the other hash tags, and output one message per hash tag
  – #java2days -> (#Java, #SpringBoot, #JavaEE)
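
The one-message-per-hash-tag transformation can be sketched with a regular expression. `HashtagProcessor` and `split` are hypothetical names, and the pattern `#\w+` is an assumption: by default `\w` in Java matches ASCII letters, digits, and underscore only, so this sketch would miss non-ASCII hash tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical processor: splits one filtered tweet into one output message
// per hash tag, leaving out the tag that the filter stage already matched.
final class HashtagProcessor {

    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    static List<String> split(String tweet, String filterTag) {
        List<String> messages = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(tweet);
        while (matcher.find()) {
            String tag = matcher.group();
            if (!tag.equalsIgnoreCase(filterTag)) {
                messages.add(tag);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String tweet = "Learning at #java2days: #Java #SpringBoot #JavaEE";
        System.out.println(split(tweet, "#java2days"));
        // prints [#Java, #SpringBoot, #JavaEE]
    }
}
```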

Slide 52

Slide 52 text

Processor module
[Diagram: input channel "processor-stream" (#java2days tweets) -> Processor Service (Spring Boot), "… #java2days …" -> #Java, #SpringBoot, #JavaEE… -> output channel "counter-stream"]

Slide 53

Slide 53 text

Our pipeline now looks like:
[Diagram: Twitter Stream API -> Twitter Stream (source) -> tweets -> Filter Tweets (#java2days) -> Process Hash tags (#Java, #SpringBoot…) -> …]

Slide 54

Slide 54 text

Adding a counter module

Slide 55

Slide 55 text

Responsibility of a counter module
• Take messages from an input channel and output an increment to multiple buckets that count message attributes over time
• Save the results to a sink, for example a Redis database
• Use the Spring Cloud Data Flow admin tool to measure tweets over time
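
A minimal in-memory sketch of the increment step. The `ConcurrentHashMap` here is only a stand-in for the Redis sink, and `HashtagCounter` is a hypothetical name; this is not how the demo's counter module is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical counter: each incoming hash tag message adds one to the
// running total for that tag.
final class HashtagCounter {

    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Atomically adds one to the tag's count and returns the new total.
    long increment(String hashtag) {
        return counts.merge(hashtag, 1L, Long::sum);
    }

    // Returns the current total for the tag, or 0 if it was never seen.
    long count(String hashtag) {
        return counts.getOrDefault(hashtag, 0L);
    }

    public static void main(String[] args) {
        HashtagCounter counter = new HashtagCounter();
        counter.increment("#Java");
        counter.increment("#SpringBoot");
        counter.increment("#Java");
        System.out.println(counter.count("#Java")); // prints 2
    }
}
```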

Slide 56

Slide 56 text

Counter module
[Diagram: input channel "counter-stream" (hash tags) -> Counter Service (Spring Boot), which increments counts for hashtags (#Java -> +1, #SpringBoot -> +1) -> Redis DB]
Counter Metrics: #Java: 201, #SpringBoot: 120, #JavaEE: 111

Slide 57

Slide 57 text

Our pipeline now looks like:
[Diagram: Twitter Stream API -> Twitter Stream (source) -> tweets -> Filter Tweets (#java2days) -> tweets -> Process Hash tags (#Java, #SpringBoot…) -> hash tags -> Increment Counter Metrics (#Java: 201, #SpringBoot: 123) -> Redis]

Slide 58

Slide 58 text

Scaling our pipeline
• Services in the pipeline can be scaled up and down automatically to handle the load and prevent bottlenecks

Slide 59

Slide 59 text

Auto-scaling
[Diagram: the Source, Filter, Process, and Counter stages with the figures 5,121, 52, and 23, instance counts of 1, 4, 2, and 1, and "Scale up" and "Scale down" labels]

Slide 60

Slide 60 text

Auto-scaling
[Diagram: the Source, Filter, Process, and Counter stages with the figures 123, 412, and 121, instance counts of 1, 3, 3, and 1, and "Scale down" and "Scale up" labels]

Slide 61

Slide 61 text

Goal of auto-scaling
• Keep the data processing pipeline uniform to prevent bottlenecks
• Optimize the instance count on cloud providers so that cost can be predicted and optimized
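
One possible scaling rule, sketched under stated assumptions: each stage reports how many messages are waiting, each instance is assumed to handle a fixed number of messages per interval, and the instance count is clamped between a minimum and a maximum to keep cost bounded. `ScalingPolicy`, its parameters, and the capacity figure are all hypothetical; the talk does not describe the rule the platform actually uses.

```java
// Hypothetical scaling rule: enough instances to clear the backlog,
// clamped to a configured range.
final class ScalingPolicy {

    static int desiredInstances(long queuedMessages,
                                long perInstanceCapacity,
                                int minInstances,
                                int maxInstances) {
        if (queuedMessages < 0 || perInstanceCapacity <= 0) {
            throw new IllegalArgumentException("backlog must be >= 0 and capacity > 0");
        }
        if (minInstances > maxInstances) {
            throw new IllegalArgumentException("minInstances must not exceed maxInstances");
        }
        // Ceiling division: a partly filled instance still counts as one.
        long needed = (queuedMessages + perInstanceCapacity - 1) / perInstanceCapacity;
        return (int) Math.max(minInstances, Math.min(maxInstances, needed));
    }

    public static void main(String[] args) {
        // Assumed capacity of 1,500 messages per instance per interval.
        System.out.println(desiredInstances(5121, 1500, 1, 10)); // prints 4
        System.out.println(desiredInstances(23, 1500, 1, 10));   // prints 1
    }
}
```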

Slide 62

Slide 62 text

Sinking messages into counters
• Each message in the pipeline has an opportunity to increase multiple counters
• Counters are like buckets, and we can increment those buckets with a name and timestamp
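
The bucket idea can be sketched as a counter keyed by name and time window. The one-second window, the `name@second` key format, and the `TimeBucketCounter` class are assumptions made for illustration; the in-memory map again stands in for a real sink such as Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical time-bucketed counter: one bucket per (name, second) pair.
final class TimeBucketCounter {

    private final Map<String, Long> buckets = new ConcurrentHashMap<>();

    // Builds the bucket key for a counter name and an epoch second.
    static String bucketKey(String name, long epochSecond) {
        return name + "@" + epochSecond;
    }

    // Adds one to the bucket that covers the given timestamp and returns
    // the bucket's new total.
    long increment(String name, long epochMillis) {
        long second = Math.floorDiv(epochMillis, 1000L);
        return buckets.merge(bucketKey(name, second), 1L, Long::sum);
    }

    // Returns the total recorded for the name during the given second.
    long count(String name, long epochSecond) {
        return buckets.getOrDefault(bucketKey(name, epochSecond), 0L);
    }

    public static void main(String[] args) {
        TimeBucketCounter counter = new TimeBucketCounter();
        counter.increment("#Java", 1_000L);
        counter.increment("#Java", 1_999L);
        counter.increment("#Java", 2_000L);
        System.out.println(counter.count("#Java", 1)); // prints 2
        System.out.println(counter.count("#Java", 2)); // prints 1
    }
}
```

Reading consecutive buckets for one name gives the per-second velocity of that hash tag.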

Slide 63

Slide 63 text

Twitter Analytics Demo
• We’re going to brave the demo gods
• Wish me luck
• First, let’s review the steps for the demo (in case it fails)

Slide 64

Slide 64 text

What we will demo:
[Diagram: Twitter Stream API -> Twitter Stream (source) -> tweets -> Filter Tweets (all tweets) -> tweets -> Process Hash tags (#Fun, #Awesome…) -> hash tags -> Increment Counter Metrics (#Fun: 201, #Awesome: 123) -> Redis]

Slide 65

Slide 65 text

Let’s go through the steps
• Start a Redis server
• Start the Twitter streaming module
• Start the filter module
• Start the processor module
• Start the counter module
• Start Spring Cloud Data Flow Admin UI

Slide 66

Slide 66 text

Start Redis Server

Slide 67

Slide 67 text

Start Twitter Stream Module

Slide 68

Slide 68 text

Start the Filter Module

Slide 69

Slide 69 text

Start the Processor Module

Slide 70

Slide 70 text

Start the Counter Module

Slide 71

Slide 71 text

Start Spring Cloud Data Flow Admin UI

Slide 72

Slide 72 text

Navigate to the Admin UI

Slide 73

Slide 73 text

Thanks! Questions?
http://start.spring.io/