DevNexus 2018: Continuous Delivery for Data Pipelines

Sabby Anandan
February 22, 2018

Abstract: Continuous delivery is central to every software-driven organization. Whether you are doing this today or would like to learn the practice or extend it to “data-centric applications”, you will find that this demo-driven talk supplements your current methods and provides useful context for future developments.

In this talk, we will review Spring Cloud Data Flow and Spring Cloud Skipper and how they come together to solve the data integration and continuous delivery challenges, respectively. We will demonstrate the day-to-day developer workflow, including development, testing, CI, and the overall orchestration on cloud platforms (e.g., Cloud Foundry, Kubernetes).

Transcript

  1. Continuous Delivery
    for
    Data Pipelines

  2. Spring XD
    Spring Cloud Stream
    Spring Cloud Task
    Spring Cloud Skipper
    Spring Flo
    Spring Cloud Data Flow
    Sabby Anandan | @sabbyanandan

  3. Role of Data Integration
    Data Pipeline Concepts
    CI/CD & Data Pipelines
    Orchestrate All-the-Things

  4. This talk is not a ..
    Dev/Testing Deep-dive
    CI/CD Workshop
    Tools Selection

  5. 5

  6. You have data
    from disparate systems

  7. You have data
    of different types

  8. You have data
    of varying speed, size & shape

  9. You have data
    that evolves

  10. “simple things should be simple,
    complex things should be possible”
    (Alan Kay)

  11. Spring Cloud Data Flow
    a toolkit for building data integration,
    real-time streaming, and batch data
    processing pipelines

  12. Data Integration
    Streaming Apps
    Source: file, ftp, gemfire, gemfire-cq, http, jdbc, jms, load-generato, loggregator, mail, mongodb, mqtt, rabbit, s3, sftp, syslog, tcp, tcp-client, time, trigger, triggertask, twitterstream
    Processor: aggregator, bridge, filter, groovy-filter, groovy-transform, header-enricher, httpclient, pmml, python-http, python-jython, scriptable-transform, splitter, tasklaunchrequest-transform, tcp-client, tensorflow, transform, twitter-sentiment
    Sink: aggregate-counter, cassandra, counter, field-value-counter, file, ftp, gemfire, gpfdist, hdfs, hdfs-dataset, jdbc, log, mongodb, mqtt, pgcopy, rabbit, redis-pubsub, router, s3, sftp, task-launcher-cloudfoundry, task-launcher-local, task-launcher-yarn, tcp, throughput, websocket
    Batch/Task Apps
    Task: composed-task-runner, jdbchdfs-local, spark-client, spark-cluster, spark-yarn, timestamp, timestamp-batch

  13. A DSL inspired by Unix pipes and filters syntax
    “source | processor | … | processor | sink”
    “source | processor > :commonDestination”
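
The pipe syntax on this slide can be read as an ordered list of apps: the first is the source, the last is the sink, and anything in between is a processor. The following is a minimal Java sketch of that reading, not Spring Cloud Data Flow's actual parser. The class `PipelineDsl` and its methods are hypothetical names introduced for illustration, and the sketch ignores app options and the `> :destination` form.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reader for pipe-delimited stream definitions (illustrative only, not SCDF code). */
final class PipelineDsl {

    /** Splits "a | b | c" into trimmed app names, preserving order. */
    static List<String> parse(String definition) {
        List<String> apps = new ArrayList<>();
        for (String part : definition.split("\\|")) {
            String app = part.trim();
            if (app.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in: " + definition);
            }
            apps.add(app);
        }
        if (apps.size() < 2) {
            throw new IllegalArgumentException("Need at least a source and a sink");
        }
        return apps;
    }

    /** The first app produces data. */
    static String source(List<String> apps) {
        return apps.get(0);
    }

    /** The last app consumes data. */
    static String sink(List<String> apps) {
        return apps.get(apps.size() - 1);
    }

    /** Everything between the source and the sink transforms data. */
    static List<String> processors(List<String> apps) {
        return apps.subList(1, apps.size() - 1);
    }

    public static void main(String[] args) {
        List<String> apps = parse("http | transform | log");
        System.out.println(source(apps) + " -> " + processors(apps) + " -> " + sink(apps));
    }
}
```

The app names `http`, `transform`, and `log` are taken from the source, processor, and sink lists on the previous slide.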

  14. 14

  15. DEMO
    Add prefix to each Payload
    111-22-3333
    444-55-6666
    777-88-9999
    . . .
    The Security Number = 111-22-3333
    The Security Number = 444-55-6666
    The Security Number = 777-88-9999
    . . .
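
The transformation in this demo step is a one-line function over each payload. Here is a minimal sketch in plain Java, assuming payloads arrive as strings. The slides do not show the demo's source, so `PrefixProcessor` is a hypothetical name and this is not necessarily how the demo application is written.

```java
import java.util.function.Function;

/** Illustrative stand-in for the demo's processor: prepends a label to each payload. */
final class PrefixProcessor {

    /** The label is copied from the slide's sample output. */
    static final String PREFIX = "The Security Number = ";

    /** Maps one incoming payload to one outgoing payload. */
    static final Function<String, String> ADD_PREFIX = payload -> PREFIX + payload;

    public static void main(String[] args) {
        String[] payloads = {"111-22-3333", "444-55-6666", "777-88-9999"};
        for (String payload : payloads) {
            System.out.println(ADD_PREFIX.apply(payload));
        }
    }
}
```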

  16. OK, what’s different about it?
    Abstractions!

  17. Cloud Runtime Abstraction
    Flexible cloud-runtime implementations
    Apps run standalone or in any cloud runtime .. exactly the same

  18. Message Binder Abstraction
    Flexible messaging-middleware implementations
    Same code; same tests; runs exactly the same on different message brokers
    Google PubSub, Active MQ, IBM MQ, Solace, Amazon SQS, Amazon Kinesis, Rabbit MQ, Apache Kafka, Kafka Streams

  19. Wild Side of Data Processing
    Latency
    Data Corruption
    False Predictions
    Recovery and Resiliency

  20. It is all about Customer Experience!
    vs.

  21. FEEDBACK
    “The DSL doesn’t provide granular control over application lifecycle”
    “Rely on runtime-platform’s blue-green deployment support for rolling-upgrades”
    “Manually tweak and re-deploy application properties by-hand”
    “Changing deployment properties means, a new stream/task altogether”

  22. How do we Continuously Deliver?

  23. Spring Cloud Skipper

  24. Inspiration

  25. DEMO
    Mask each Payload
    111-22-3333
    444-55-6666
    777-88-9999
    . . .
    The Security Number = xxx-xx-3333
    The Security Number = xxx-xx-6666
    The Security Number = xxx-xx-9999
    . . .
    Don’t Disturb Don’t Disturb
    Fix This!
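
The change requested in this demo step, masking all but the last four digits, fits in a small regular-expression replacement. The sketch below is plain Java under the assumption that each payload is a string containing a number in the `ddd-dd-dddd` shape shown on the slide. `MaskProcessor` is a hypothetical name, and the demo's real implementation may differ.

```java
import java.util.regex.Pattern;

/** Illustrative masking step: hides the first five digits of ddd-dd-dddd numbers. */
final class MaskProcessor {

    /** Captures only the last four digits so they can be kept in the output. */
    private static final Pattern NUMBER = Pattern.compile("\\b\\d{3}-\\d{2}-(\\d{4})\\b");

    /** Replaces every match with xxx-xx- followed by the captured last four digits. */
    static String mask(String payload) {
        return NUMBER.matcher(payload).replaceAll("xxx-xx-$1");
    }

    public static void main(String[] args) {
        System.out.println(mask("The Security Number = 111-22-3333"));
    }
}
```

Text that does not contain a matching number passes through unchanged, so the function is safe to apply to every payload in the stream.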

  26. [Diagram. Labels: SCDF Shell, REST, SCDF Server, REST, Skipper Server, Kubernetes / Cloud Foundry, Stream App (source, process, sink), Binders, Changes Detected, Diff, Deploy Delta, Record]
    Single Source of Truth
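
The labels "Changes Detected", "Diff", and "Deploy Delta" suggest that only the apps whose configuration changed are redeployed. The following is a rough Java sketch of that idea, assuming each app in a stream is described by a map of properties. `StreamDelta` and `changedApps` are hypothetical names for illustration and are not part of the Skipper API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative delta computation between two versions of a stream's app properties. */
final class StreamDelta {

    /**
     * Returns the names of apps that were added, removed, or whose
     * properties differ between the recorded and the requested state.
     */
    static Set<String> changedApps(Map<String, Map<String, String>> recorded,
                                   Map<String, Map<String, String>> requested) {
        Set<String> allApps = new TreeSet<>(recorded.keySet());
        allApps.addAll(requested.keySet());

        Set<String> changed = new TreeSet<>();
        for (String app : allApps) {
            // A missing app yields null on one side, which also counts as a change.
            if (!Objects.equals(recorded.get(app), requested.get(app))) {
                changed.add(app);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> v1 = new HashMap<>();
        v1.put("http", new HashMap<>());
        v1.put("transform", new HashMap<>());
        v1.get("transform").put("mode", "prefix"); // hypothetical property name
        v1.put("log", new HashMap<>());

        Map<String, Map<String, String>> v2 = new HashMap<>();
        v2.put("http", new HashMap<>());
        v2.put("transform", new HashMap<>());
        v2.get("transform").put("mode", "mask");
        v2.put("log", new HashMap<>());

        // Only the processor changed, so only it would need redeploying.
        System.out.println(changedApps(v1, v2));
    }
}
```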

  27. [Diagram comparing two pipelines. Stage labels: Build, Test, Package, IT Test, Unit Test, Candidate Stage, E2E Test, Deploy to PROD]
    Continuous Delivery: every transition is automatic except one manual step
    Continuous Deployment: every transition is automatic

  28. Each and every action is versioned …
    V1 V2 V3 V4 V5 V6
    Single Source of Truth

  29. Granular app-lifecycle controls …
    History
    Diff
    Upgrade
    Rollback
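
The four controls listed here can be pictured as operations on an append-only list of versions. The sketch below is a toy Java model, assuming each version stores one manifest string and that a rollback is recorded as a new version rather than by deleting history. `ReleaseHistory` and its methods are hypothetical and do not mirror Skipper's real classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a versioned release: history, diff, upgrade, rollback. */
final class ReleaseHistory {

    /** Index 0 holds version 1, index 1 holds version 2, and so on. */
    private final List<String> manifests = new ArrayList<>();

    /** Upgrade: records a new manifest and returns its version number. */
    int upgrade(String manifest) {
        manifests.add(manifest);
        return manifests.size();
    }

    /** Rollback: re-applies an older manifest as a brand-new version. */
    int rollback(int version) {
        return upgrade(manifestOf(version));
    }

    /** Diff: reports whether two versions carry different manifests. */
    boolean differs(int versionA, int versionB) {
        return !manifestOf(versionA).equals(manifestOf(versionB));
    }

    /** History: read-only view of every manifest, oldest first. */
    List<String> history() {
        return Collections.unmodifiableList(manifests);
    }

    int currentVersion() {
        return manifests.size();
    }

    String manifestOf(int version) {
        if (version < 1 || version > manifests.size()) {
            throw new IllegalArgumentException("Unknown version: " + version);
        }
        return manifests.get(version - 1);
    }

    public static void main(String[] args) {
        ReleaseHistory release = new ReleaseHistory();
        release.upgrade("transform: add prefix");
        release.upgrade("transform: mask digits");
        release.rollback(1);
        System.out.println(release.currentVersion() + " " + release.history());
    }
}
```

Keeping every version, including rollbacks, is what lets the history act as a single record of what was deployed and when.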

  30. Pluggable
    Deployment
    Pipeline
    Strategies
    ChatOps
    Continuous Integration Tools

  31. Resources:
    Spring Cloud Data Flow: http://cloud.spring.io/spring-cloud-dataflow
    Spring Cloud Skipper: http://cloud.spring.io/spring-cloud-skipper/
    Concourse: http://concourse.ci/
    Demo: https://github.com/sabbyanandan/xfmr
    Keep Your Customers Happy!

  32. Resources:
    Bike to Work Day in San Francisco
    Unsafe bike lane disclaimer
    Aerial Of Dubai Highway Roads 4k
