Slide 1

Slide 1 text

Continuous Delivery for Data Pipelines

Slide 2

Slide 2 text

Spring XD, Spring Cloud Stream, Spring Cloud Task, Spring Cloud Skipper, Spring Flo, Spring Cloud Data Flow
Sabby Anandan | @sabbyanandan

Slide 3

Slide 3 text

Role of Data Integration
Data Pipeline Concepts
CI/CD & Data Pipelines
Orchestrate All-the-Things

Slide 4

Slide 4 text

This talk is not a ..
Dev/Testing Deep-dive
CI/CD Workshop
Tools Selection

Slide 5

Slide 5 text


Slide 6

Slide 6 text

You have data from disparate systems

Slide 7

Slide 7 text

You have data of different types

Slide 8

Slide 8 text

You have data of varying speed, size & shape

Slide 9

Slide 9 text

You have data that evolves

Slide 10

Slide 10 text

“Simple things should be simple, complex things should be possible” (Alan Kay)

Slide 11

Slide 11 text

Spring Cloud Data Flow: a toolkit for building data integration, real-time streaming, and batch data processing pipelines

Slide 12

Slide 12 text

Data Integration
Streaming Apps
Source: file, ftp, gemfire, gemfire-cq, http, jdbc, jms, load-generator, loggregator, mail, mongodb, mqtt, rabbit, s3, sftp, syslog, tcp, tcp-client, time, trigger, triggertask, twitterstream
Processor: aggregator, bridge, filter, groovy-filter, groovy-transform, header-enricher, httpclient, pmml, python-http, python-jython, scriptable-transform, splitter, tasklaunchrequest-transform, tcp-client, tensorflow, transform, twitter-sentiment
Sink: aggregate-counter, cassandra, counter, field-value-counter, file, ftp, gemfire, gpfdist, hdfs, hdfs-dataset, jdbc, log, mongodb, mqtt, pgcopy, rabbit, redis-pubsub, router, s3, sftp, task-launcher-cloudfoundry, task-launcher-local, task-launcher-yarn, tcp, throughput, websocket
Batch/Task Apps
Task: composed-task-runner, jdbchdfs-local, spark-client, spark-cluster, spark-yarn, timestamp, timestamp-batch

Slide 13

Slide 13 text

A DSL inspired by Unix’s pipes-and-filters syntax
“source | processor | … | processor | sink”
“source | processor > :commonDestination”
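To make the pipe syntax concrete, here is a minimal Python sketch that splits a stream definition of the shape shown above into its app names and an optional named destination. The function name `parse_stream` and the returned dictionary layout are hypothetical; this is not the Spring Cloud Data Flow parser, and it ignores everything the real DSL supports beyond `|` and `> :name`.

```python
def parse_stream(definition: str) -> dict:
    """Split a pipe-delimited stream definition into app names.

    Illustrative only: handles "a | b | c" and "a | b > :destination".
    """
    destination = None
    if ">" in definition:
        # Everything after ">" is treated as the named destination.
        definition, _, target = definition.partition(">")
        destination = target.strip().lstrip(":")
    # Each segment between pipes is one app in the pipeline.
    apps = [part.strip() for part in definition.split("|") if part.strip()]
    return {"apps": apps, "destination": destination}


if __name__ == "__main__":
    print(parse_stream("http | transform | log"))
    print(parse_stream("http | filter > :commonDestination"))
```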

Slide 14

Slide 14 text


Slide 15

Slide 15 text

DEMO: Add prefix to each Payload
Input: 111-22-3333, 444-55-6666, 777-88-9999, . . .
Output:
The Security Number = 111-22-3333
The Security Number = 444-55-6666
The Security Number = 777-88-9999
. . .
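The demo's processor step can be sketched in a few lines of Python. The names `add_prefix` and `run_pipeline` are hypothetical and stand in for the processor app and the messaging plumbing; the actual demo code lives in the repository listed under Resources and may look quite different.

```python
from typing import Callable, Iterable, List

PREFIX = "The Security Number = "


def add_prefix(payload: str) -> str:
    """Processor logic: prepend the fixed prefix to one payload."""
    return f"{PREFIX}{payload}"


def run_pipeline(source: Iterable[str], processor: Callable[[str], str]) -> List[str]:
    """Toy stand-in for "source | processor | sink": apply the processor to every payload."""
    return [processor(payload) for payload in source]


if __name__ == "__main__":
    for line in run_pipeline(["111-22-3333", "444-55-6666", "777-88-9999"], add_prefix):
        print(line)
```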

Slide 16

Slide 16 text

OK, what’s different about it? Abstractions!

Slide 17

Slide 17 text

Cloud Runtime Abstraction
Flexible cloud-runtime implementations
Apps run standalone or in any cloud runtime .. exactly the same

Slide 18

Slide 18 text

Message Binder Abstraction
Flexible messaging-middleware implementations
Same code; same tests; runs exactly the same on different message brokers
Google PubSub, ActiveMQ, IBM MQ, Solace, Amazon SQS, Amazon Kinesis, RabbitMQ, Apache Kafka, Kafka Streams
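The idea behind the binder abstraction can be illustrated with a toy Python sketch: the business logic only sees an interface, and the transport behind it can be swapped without touching that logic. Every name here (`Binder`, `ListBinder`, `DequeBinder`, `run_processor`) is hypothetical, and the two in-memory classes merely stand in for real brokers; this is not the Spring Cloud Stream binder API.

```python
from collections import deque
from typing import Callable, List, Protocol


class Binder(Protocol):
    """Minimal transport interface the business logic depends on."""

    def send(self, message: str) -> None: ...

    def receive_all(self) -> List[str]: ...


class ListBinder:
    """Stand-in for one broker: messages kept in a plain list."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive_all(self) -> List[str]:
        drained, self._messages = self._messages, []
        return drained


class DequeBinder:
    """Stand-in for a different broker: messages kept in a deque."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def send(self, message: str) -> None:
        self._queue.append(message)

    def receive_all(self) -> List[str]:
        drained = []
        while self._queue:
            drained.append(self._queue.popleft())
        return drained


def run_processor(logic: Callable[[str], str], inbound: Binder, outbound: Binder) -> None:
    """Same code path regardless of which binder implementation is plugged in."""
    for message in inbound.receive_all():
        outbound.send(logic(message))
```

Because `run_processor` never names a concrete transport, the same test can be run against either stand-in, which is the property the slide claims for real brokers.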

Slide 19

Slide 19 text

Wild Side of Data Processing
Latency
Data Corruption
False Predictions
Recovery and Resiliency

Slide 20

Slide 20 text

It is all about Customer Experience!
[image] vs. [image]

Slide 21

Slide 21 text

FEEDBACK
“The DSL doesn’t provide granular control over application lifecycle”
“Rely on runtime-platform’s blue-green deployment support for rolling upgrades”
“Manually tweak and re-deploy application properties by hand”
“Changing deployment properties means a new stream/task altogether”

Slide 22

Slide 22 text

How do we Continuously Deliver?

Slide 23

Slide 23 text

Spring Cloud Skipper

Slide 24

Slide 24 text

Inspiration

Slide 25

Slide 25 text

DEMO: Mask each Payload
Input: 111-22-3333, 444-55-6666, 777-88-9999, . . .
Output:
The Security Number = xxx-xx-3333
The Security Number = xxx-xx-6666
The Security Number = xxx-xx-9999
. . .
Diagram labels: Don’t Disturb, Don’t Disturb, Fix This!
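The masking step in this demo could look roughly like the Python sketch below. The function name `mask` and the regular expression are assumptions made for illustration; the pattern only covers the `ddd-dd-dddd` shape seen in the sample payloads and keeps the last four digits, as in the output above.

```python
import re

# Three digits, two digits, four digits, separated by hyphens; the last group is kept.
SECURITY_NUMBER = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask(payload: str) -> str:
    """Replace the first five digits of every matching number with 'x'."""
    return SECURITY_NUMBER.sub(r"xxx-xx-\1", payload)


if __name__ == "__main__":
    print(mask("The Security Number = 111-22-3333"))
```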

Slide 26

Slide 26 text

[Architecture diagram] Labels: SCDF Shell; SCDF Server (REST); Skipper Server (REST); Kubernetes / Cloud Foundry; stream apps (source, process, sink) connected through Binders; Changes Detected; Diff; Stream App Deploy Delta; Record; Single Source of Truth
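The "Diff" and "Deploy Delta" labels suggest that only the apps whose definition changed are redeployed. A minimal Python sketch of that idea, under the assumption that a release can be modelled as a mapping from app name to its properties, follows. The function `changed_apps` and the property keys are hypothetical and do not reflect Skipper's actual data model.

```python
from typing import Dict, List

Release = Dict[str, Dict[str, str]]  # app name -> deployment/application properties


def changed_apps(current: Release, desired: Release) -> List[str]:
    """Return the apps that are new or whose properties differ from the current release."""
    return sorted(
        name for name, properties in desired.items()
        if current.get(name) != properties
    )


if __name__ == "__main__":
    current = {
        "source": {"version": "1.0"},
        "process": {"version": "1.0"},
        "sink": {"version": "1.0"},
    }
    desired = {
        "source": {"version": "1.0"},
        "process": {"version": "1.1"},  # only the processor is upgraded
        "sink": {"version": "1.0"},
    }
    print(changed_apps(current, desired))
```

In this model the source and sink are left undisturbed and only the processor would be redeployed.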

Slide 27

Slide 27 text

[Pipeline diagram] Continuous Delivery vs. Continuous Deployment
Stage labels: Build, Test, Package, Unit Test, IT Test, E2E Test, Candidate, Stage, Deploy to PROD
Continuous Delivery: every transition is automatic except one manual step
Continuous Deployment: every transition is automatic

Slide 28

Slide 28 text

Each and every action is versioned … Single Source of Truth
V1, V2, V3, V4, V5, V6

Slide 29

Slide 29 text

Granular app-lifecycle controls …
History, Diff, Upgrade, Rollback
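The four controls named here can be sketched with a small, append-only release history in Python. The class `ReleaseHistory` and its methods are hypothetical. In particular, recording a rollback as a new version is a design choice made for this illustration so that the history stays a single source of truth; it is not a statement about how Skipper stores releases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ReleaseHistory:
    """Append-only list of release properties; version numbers start at 1."""

    versions: List[Dict[str, str]] = field(default_factory=list)

    def upgrade(self, properties: Dict[str, str]) -> int:
        """Record a new release and return its version number."""
        self.versions.append(dict(properties))
        return len(self.versions)

    def history(self) -> List[int]:
        """List all recorded version numbers, oldest first."""
        return list(range(1, len(self.versions) + 1))

    def diff(self, old: int, new: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Map each changed key to its (old, new) value; None means the key is absent."""
        before, after = self.versions[old - 1], self.versions[new - 1]
        keys = sorted(before.keys() | after.keys())
        return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}

    def rollback(self, to_version: int) -> int:
        """Re-apply an earlier release as a new version (assumption of this sketch)."""
        return self.upgrade(self.versions[to_version - 1])

    def current(self) -> Dict[str, str]:
        """Return the properties of the latest release."""
        return self.versions[-1]
```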

Slide 30

Slide 30 text

Pluggable Deployment Pipeline Strategies
ChatOps
Continuous Integration Tools

Slide 31

Slide 31 text

Resources:
Spring Cloud Data Flow: http://cloud.spring.io/spring-cloud-dataflow
Spring Cloud Skipper: http://cloud.spring.io/spring-cloud-skipper/
Concourse: http://concourse.ci/
Demo: https://github.com/sabbyanandan/xfmr
Keep Your Customers Happy!

Slide 32

Slide 32 text

Resources (images):
Bike to Work Day in San Francisco
Unsafe bike lane disclaimer
Aerial Of Dubai Highway Roads 4k