DevNexus 2018: Continuous Delivery for Data Pipelines

Sabby Anandan
February 22, 2018

Abstract: Continuous delivery is central to every software-driven organization. Whether you practice it today or would like to learn it or extend it to “data-centric applications”, this demo-driven talk supplements your current methods and provides useful context for future developments.

In this talk, we will review Spring Cloud Data Flow and Spring Cloud Skipper and how they come together to address data integration and continuous delivery challenges, respectively. The day-to-day developer workflow, including development, testing, CI, and the overall orchestration on cloud platforms (e.g., Cloud Foundry, Kubernetes), will be demonstrated.

Transcript

  1. Continuous Delivery for Data Pipelines

  2. Spring XD, Spring Cloud Stream, Spring Cloud Task, Spring Cloud Skipper, Spring Flo, Spring Cloud Data Flow. Sabby Anandan | @sabbyanandan

  3. Role of Data Integration; Data Pipeline Concepts; CI/CD & Data Pipelines; Orchestrate All-the-Things

  4. This talk is not a ..: Dev/Testing Deep-dive; CI/CD Workshop; Tools Selection

  5. (no text)

  6. You have data from disparate systems

  7. You have data of different types

  8. You have data of varying speed, size & shape

  9. You have data that evolves

  10. “Simple things should be simple, complex things should be possible” (Alan Kay)

  11. Spring Cloud Data Flow: a toolkit for building data integration, real-time streaming, and batch data processing pipelines

  12. Data Integration

    Streaming Apps
    Source: file, ftp, gemfire, gemfire-cq, http, jdbc, jms, load-generato, loggregator, mail, mongodb, mqtt, rabbit, s3, sftp, syslog, tcp, tcp-client, time, trigger, triggertask, twitterstream
    Processor: aggregator, bridge, filter, groovy-filter, groovy-transform, header-enricher, httpclient, pmml, python-http, python-jython, scriptable-transform, splitter, tasklaunchrequest-transform, tcp-client, tensorflow, transform, twitter-sentiment
    Sink: aggregate-counter, cassandra, counter, field-value-counter, file, ftp, gemfire, gpfdist, hdfs, hdfs-dataset, jdbc, log, mongodb, mqtt, pgcopy, rabbit, redis-pubsub, router, s3, sftp, task-launcher-cloudfoundry, task-launcher-local, task-launcher-yarn, tcp, throughput, websocket

    Batch/Task Apps
    Task: composed-task-runner, jdbchdfs-local, spark-client, spark-cluster, spark-yarn, timestamp, timestamp-batch

  13. A DSL inspired by Unix’s pipes-and-filters syntax: “source | processor | … | processor | sink” and “source | processor > :commonDestination”

  14. (no text)

  15. DEMO: Add prefix to each Payload

    Input: 111-22-3333, 444-55-6666, 777-88-9999, . . .
    Output: The Security Number = 111-22-3333, The Security Number = 444-55-6666, The Security Number = 777-88-9999, . . .

  16. OK, what’s different about it? Abstractions!

  17. Cloud Runtime Abstraction: flexible cloud-runtime implementations. Apps run standalone or in any cloud runtime .. exactly the same

  18. Message Binder Abstraction: flexible messaging-middleware implementations. Same code; same tests; runs exactly the same on different message brokers: Google PubSub, ActiveMQ, IBM MQ, Solace, Amazon SQS, Amazon Kinesis, RabbitMQ, Apache Kafka, Kafka Streams

  19. Wild Side of Data Processing: Latency; Data Corruption; False Predictions; Recovery and Resiliency

  20. It is all about Customer Experience! vs.

  21. FEEDBACK: “The DSL doesn’t provide granular control over application lifecycle”; “Rely on runtime-platform’s blue-green deployment support for rolling-upgrades”; “Manually tweak and re-deploy application properties by-hand”; “Changing deployment properties means, a new stream/task altogether”

  22. How do we Continuously Deliver?

  23. Spring Cloud Skipper

  24. Inspiration

  25. DEMO: Mask each Payload

    Input: 111-22-3333, 444-55-6666, 777-88-9999, . . .
    Output: The Security Number = xxx-xx-3333, The Security Number = xxx-xx-6666, The Security Number = xxx-xx-9999, . . .
    Diagram labels: Don’t Disturb; Don’t Disturb; Fix This!

  26. [Architecture diagram: SCDF Shell; SCDF Server (REST); Skipper Server (REST); Kubernetes / Cloud Foundry; streams of source, process, and sink apps; Binders. Labels: Changes Detected; Stream App; Deploy Delta; Diff; Record; Single Source of Truth]

  27. [Diagram: Continuous Delivery and Continuous Deployment. Stages shown: Build, Test, Package, Unit Test, IT Test, Candidate, Stage, E2E Test, Deploy to PROD. Step labels: automatic for every step except one, which is manual]

  28. V1, V2, V3, V4, V5, V6: all and every action is versioned … Single Source of Truth

  29. Granular app-lifecycle controls … History, Diff, Upgrade, Rollback

  30. Pluggable Deployment Pipeline Strategies; ChatOps; Continuous Integration Tools

  31. Resources: Spring Cloud Data Flow: http://cloud.spring.io/spring-cloud-dataflow; Spring Cloud Skipper: http://cloud.spring.io/spring-cloud-skipper/; Concourse: http://concourse.ci/; Demo: https://github.com/sabbyanandan/xfmr. Keep Your Customers Happy!

  32. Resources: Bike to Work Day in San Francisco; Unsafe bike lane disclaimer; Aerial Of Dubai Highway Roads 4k
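
The pipe syntax on slide 13 and the two demos on slides 15 and 25 can be tied together with a small sketch. This is plain Python for illustration, not Spring Cloud Data Flow code. It assumes that the stream definition stays the same between the two demos and that only the processor is swapped for a new version. The names `parse_definition`, `transform_v1`, `transform_v2`, and `run`, and the regular expression, are hypothetical.

```python
import re

def parse_definition(definition):
    """Split a 'source | processor | sink' definition into app names."""
    return [part.strip() for part in definition.split("|")]

def transform_v1(payload):
    """Slide 15: add a prefix to each payload."""
    return f"The Security Number = {payload}"

def transform_v2(payload):
    """Slide 25: add the prefix, then mask all but the last four digits."""
    prefixed = transform_v1(payload)
    return re.sub(r"\d{3}-\d{2}-(\d{4})", r"xxx-xx-\1", prefixed)

def run(definition, payloads, processors):
    """Apply each app between the source and the sink, in order."""
    apps = parse_definition(definition)
    results = list(payloads)
    for name in apps[1:-1]:  # first app is the source, last is the sink
        results = [processors[name](p) for p in results]
    return results

# Hypothetical stream: the source and sink are left untouched,
# only the implementation behind "transform" changes.
definition = "http | transform | log"
payloads = ["111-22-3333", "444-55-6666", "777-88-9999"]

before = run(definition, payloads, {"transform": transform_v1})
after = run(definition, payloads, {"transform": transform_v2})

for line in before + after:
    print(line)
```

Swapping the function behind one name while the definition string is unchanged mirrors the "Don’t Disturb / Fix This!" labels on slide 25: the neighbouring apps are not redeployed.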
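
Slides 28 and 29 list versioned actions with History, Diff, Upgrade, and Rollback. The toy model below sketches how those four operations could relate to one another. It is not Skipper's API: the `ReleaseHistory` class, its method names, and the property keys are all hypothetical, and recording a rollback as a new version is an assumption made here to keep the history append-only.

```python
class ReleaseHistory:
    """Toy append-only history of release properties (illustrative only)."""

    def __init__(self):
        self._versions = []  # index 0 holds version 1

    @property
    def current(self):
        """Number of the latest version, or 0 if nothing is recorded."""
        return len(self._versions)

    def upgrade(self, properties):
        """Record a new version and return its number."""
        self._versions.append(dict(properties))
        return self.current

    def history(self):
        """Return (version, properties) pairs, oldest first."""
        return [(i + 1, dict(p)) for i, p in enumerate(self._versions)]

    def diff(self, old, new):
        """Return {key: (old_value, new_value)} for keys that differ."""
        a, b = self._versions[old - 1], self._versions[new - 1]
        keys = sorted(set(a) | set(b))
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

    def rollback(self, version):
        """Re-apply an earlier version's properties as a new version."""
        return self.upgrade(self._versions[version - 1])

releases = ReleaseHistory()
releases.upgrade({"transform.version": "1.0.0"})
releases.upgrade({"transform.version": "1.0.1", "transform.count": "2"})

changes = releases.diff(1, 2)   # what changed between V1 and V2
latest = releases.rollback(1)   # V3 carries the same properties as V1
```

Because every action appends a version, the history itself can serve as the single record of what was deployed and when, which is the idea slide 28 calls a "Single Source of Truth".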