Mahmoud Ben Hassine

March 12, 2020


  1. Spring Batch on Kubernetes: Efficient batch processing at scale
    Mahmoud Ben Hassine, March 2020
    Copyright © 2020 VMware, Inc. or its affiliates.
  2. About me
    • Principal Software Engineer at VMware
    • Spring Batch Co-Lead
    • Open source enthusiast
    • @benas @b_e_n_a_s
  3. What about you?
    • Any Spring Batch users?
    • Any Spring Boot users?
    • Any Kubernetes users?
  4. Agenda
    • Spring Batch 101
    • Kubernetes Jobs 101
    • Spring Batch on Kubernetes, a perfect match!
    • Demo
    • Q+A
  5. Spring Batch 101

  6. What is batch processing?
    “Batch processing … is defined as the processing of a finite amount of data without interaction or interruption.”
    Michael Minella, The Definitive Guide to Spring Batch
  7. Batch domain language (1/2)

  8. Batch domain language (2/2)
    Once successfully completed, a job instance cannot be (re)started again.
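
The rule on this slide can be illustrated with a small in-memory sketch. It assumes, as in Spring Batch, that a job instance is identified by the job name plus its identifying parameters. The class `JobInstanceSketch` and its `launch` method are hypothetical names for illustration only, not Spring Batch APIs, and the set stands in for the job repository.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only: hypothetical names, NOT Spring Batch classes.
class JobInstanceSketch {

    // Stand-in for the job repository: keys of instances that completed successfully.
    private final Set<String> completedInstances = new HashSet<>();

    // A job instance key is the job name plus its (sorted) identifying parameters.
    private static String instanceKey(String jobName, Map<String, String> identifyingParams) {
        return jobName + "?" + new TreeMap<>(identifyingParams);
    }

    // Returns true if the instance ran, false if it was rejected because
    // the same instance has already completed successfully.
    boolean launch(String jobName, Map<String, String> identifyingParams) {
        String key = instanceKey(jobName, identifyingParams);
        if (completedInstances.contains(key)) {
            return false; // a completed job instance cannot be started again
        }
        // ... the actual job work would run here ...
        completedInstances.add(key);
        return true;
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        System.out.println(repository.launch("importJob", Map.of("date", "2020-03-12")));
        System.out.println(repository.launch("importJob", Map.of("date", "2020-03-12")));
        System.out.println(repository.launch("importJob", Map.of("date", "2020-03-13")));
    }
}
```

Launching the same job with a different identifying parameter value creates a new instance, which is why the third call in `main` is accepted while the second is rejected.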
  9. Batch domain model

  10. Chunk-oriented processing
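
The slide's diagram is not part of this transcript, so here is a minimal plain-Java sketch of the idea: items are read and processed one at a time, then written together once a chunk is full. `ChunkSketch` and `run` are hypothetical names, and the reader, processor and writer are modeled with standard Java interfaces instead of Spring Batch's own types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative sketch only: hypothetical names, NOT the Spring Batch implementation.
class ChunkSketch {

    // Reads and processes items one by one, and hands them to the writer
    // one chunk at a time. Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            // Write when the chunk is full or the input is exhausted.
            if (chunk.size() == chunkSize || !reader.hasNext()) {
                // In a real batch framework, this write would be one transaction.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = i -> "item-" + i;
        Consumer<List<String>> writer = written::add;
        int chunks = run(List.of(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        // prints: 3 chunks: [[item-1, item-2], [item-3, item-4], [item-5]]
        System.out.println(chunks + " chunks: " + written);
    }
}
```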

  11. Core Features
    Robustness
    • Repeat/Retry/Skip/Restart
    • Transaction management
    • Chunk-oriented processing
    Scalability
    • Multi-threaded steps
    • Parallel steps
    • Remote chunking/partitioning
    Flexibility
    • XML/Java config styles
    • Declarative I/O
    • Rich library of item readers/writers
    And based on Spring Framework!
  12. Use cases
    • ETL processing
    • Generation of statements/reports
    • Data analysis
    • Data science
    • Business intelligence
  13. History of Spring Batch
    • v1.0 (Mar 28, 2008): Initial APIs • Item-oriented processing • XML configuration • Java 1.4 • Spring Framework 2.5
    • v2.0 (Apr 11, 2009): Step scope • Chunk-oriented processing • Remote chunking/partitioning • Java 5 • Spring Framework 3
    • v2.2 (Jun 05, 2013): Java configuration • Spring Data support • Non-identifying Job params • AMQP support • SQLFire support
    • v3.0 (May 22, 2014): Job scope • JSR-352 support • SQLite support • Spring Batch Integration • Spring Boot support
    • v4.0 (Dec 1, 2017): Builders for readers • Builders for writers • Java 8 • Spring Framework 5
  14. Why would you need a framework like Spring Batch?

  15. Why would you need a framework like Spring Batch?

  16. Why would you need a framework like Spring Batch?

  17. Kubernetes Jobs 101

  18. Kubernetes Jobs
    • A Job creates one or more Pods and ensures that a specified number of them successfully terminate.
    • The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).
    • You can also use a Job to run multiple Pods in parallel.
    • Deleting a Job will clean up the Pods it created.
  19. Kubernetes Jobs - example

  20. Kubernetes CronJobs
    • A CronJob creates Jobs on a time-based schedule (Unix-like cron).
    • There are certain circumstances where two jobs might be created, or no job might be created [..] Therefore, jobs should be idempotent.
    • The CronJob Controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, then it does not start the job and logs the error.
    • The CronJob is only responsible for creating Jobs that match its schedule, and the Job in turn is responsible for the management of the Pods it represents.
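
The missed-schedule rule above can be sketched as a small calculation. This is a simplification under stated assumptions, not the controller's actual code: the schedule is modeled as a fixed, positive interval instead of a cron expression, and `MissedScheduleSketch`, `missedRuns` and `shouldStart` are hypothetical names. Only the threshold of 100 comes from the slide.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch only: NOT the Kubernetes CronJob controller implementation.
class MissedScheduleSketch {

    // Threshold quoted on the slide.
    static final long MAX_MISSED_SCHEDULES = 100;

    // Counts how many fixed-interval schedule times fall between the last
    // scheduled time and now. Assumes a positive interval (simplification:
    // real CronJobs use cron expressions, not fixed intervals).
    static long missedRuns(Instant lastScheduled, Instant now, Duration interval) {
        if (!now.isAfter(lastScheduled)) {
            return 0;
        }
        return Duration.between(lastScheduled, now).toMillis() / interval.toMillis();
    }

    // The job is started only if no more than 100 schedules were missed.
    static boolean shouldStart(Instant lastScheduled, Instant now, Duration interval) {
        return missedRuns(lastScheduled, now, interval) <= MAX_MISSED_SCHEDULES;
    }

    public static void main(String[] args) {
        Instant last = Instant.parse("2020-03-12T00:00:00Z");
        Duration everyMinute = Duration.ofMinutes(1);
        System.out.println(shouldStart(last, last.plus(Duration.ofMinutes(90)), everyMinute));
        System.out.println(shouldStart(last, last.plus(Duration.ofMinutes(101)), everyMinute));
    }
}
```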
  21. Kubernetes CronJobs - example

  22. Why would you need Kubernetes (for your batch jobs)?
    • It is not hype anymore, Kubernetes is awesome! Give it a try, even if you are not Google or Netflix (see the reasonable migration path suggested in the next slides)
    • Ability to run batch jobs on any node in the cluster with a single command
    • Ability to query the entire cluster for running jobs with a single command
    • Ability to automatically run jobs to completion (in case of node/pod failure)
    • Efficient resource management (k8s plays Tetris with your cluster)
    • Scalability
  23. Spring Batch on Kubernetes A perfect match!

  24. Cloud-friendly batch jobs, how?
    • Spring Batch jobs maintain their state in an external database, and as such, they are already 12-factor processes [1] (and can easily be fully 12-factorized: log to standard output, configure from the environment, etc.)
    • Skip steps that executed successfully in the previous run in case of failure (cost efficient)
    • Retry failed items in case of transient errors (like a call to a web service that might be temporarily down or being re-scheduled in a cloud environment)
    • Restart from the last save point within the same step thanks to the chunk-oriented processing model (cost efficient)
    • Safe against duplicate job executions (due to a human error, k8s pod rescheduling, or the CronJob limitation where it might run the same job twice)
    [1]:
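
The "restart from the last save point" idea can be sketched in plain Java. This is a hedged illustration, not Spring Batch code: `RestartSketch` is a hypothetical class, its `committedCount` field stands in for the state kept in the external database, and the `failAt` parameter simulates a crash such as a pod being killed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, NOT Spring Batch classes.
class RestartSketch {

    // Stand-in for state persisted in the external database:
    // how many items have been committed so far.
    int committedCount = 0;

    // Stand-in for the output target (for example a table or a file).
    final List<String> written = new ArrayList<>();

    // Processes items chunk by chunk, starting at the last committed position.
    // failAt simulates a crash when that item index is reached (-1 = no crash).
    // Returns true if the whole input was processed.
    boolean execute(List<String> items, int chunkSize, int failAt) {
        int position = committedCount; // resume from the last save point
        while (position < items.size()) {
            int end = Math.min(position + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    return false; // the uncommitted chunk is discarded (rolled back)
                }
                chunk.add(items.get(i).toUpperCase());
            }
            written.addAll(chunk); // write the chunk
            committedCount = end;  // record the new save point
            position = end;
        }
        return true;
    }

    public static void main(String[] args) {
        RestartSketch job = new RestartSketch();
        List<String> items = List.of("a", "b", "c", "d", "e");
        System.out.println(job.execute(items, 2, 3));  // first run crashes at index 3
        System.out.println(job.execute(items, 2, -1)); // restart resumes at index 2
        System.out.println(job.written);               // prints: [A, B, C, D, E]
    }
}
```

The first chunk is not reprocessed on restart, which is the cost saving the slide refers to.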
  25. Containerised batch jobs, why?
    • Separate logs
    • Independent life cycle (bugs/features, deployment, etc.)
    • Separate parameters / exit codes!
    • Restartability (in case of failure, only restart the failed job)
    • Testability
    • Scalability
    • Resource usage efficiency (optimized resource limits => better pod scheduling)
  26. BigBang-less migration plan
    • Keep the database outside Kubernetes [1], migrate only stateless batch jobs: gradual, hybrid migration path
    • Traditional job => bootify it => 12-factorize it => dockerize it => kubernetize it
    • Use Kubernetes namespaces for testing/deploying jobs in staging/production [2]
    • Update CronJobs live from CI/CD with “kubectl set image” or its REST API equivalent [3]
    [1]: [2]: [3]:
  27. What you should know before the migration
    • Job/Container exit code is very important!
    • Understand graceful/abrupt shutdown implications
    • Choose the right job pattern [1] (Job instances volume)
    • Choose the right restart/concurrency policies
    • Understand CronJob limitations [2]
    [1]: [2]:
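
Why the exit code matters: Kubernetes decides whether a container succeeded from its process exit code, with 0 meaning success and any non-zero value meaning failure. The sketch below shows one way to propagate a job outcome to the exit code. The `Status` enum, `runJob` and `toExitCode` are hypothetical names used for illustration, not Spring Batch types.

```java
// Illustrative sketch only: hypothetical names, NOT Spring Batch classes.
class ExitCodeSketch {

    // Simplified stand-in for a batch job's final status.
    enum Status { COMPLETED, STOPPED, FAILED }

    // Maps the job outcome to a process exit code: 0 only on success,
    // so that Kubernetes sees anything else as a failed container.
    static int toExitCode(Status status) {
        return status == Status.COMPLETED ? 0 : 1;
    }

    // Placeholder for launching the actual job (assumption for this sketch).
    static Status runJob() {
        return Status.COMPLETED;
    }

    public static void main(String[] args) {
        // Without an explicit System.exit, a failed job could still exit with 0
        // and be reported as successful by Kubernetes.
        System.exit(toExitCode(runJob()));
    }
}
```

In a Spring Boot application, a commonly used equivalent is `System.exit(SpringApplication.exit(SpringApplication.run(MyApp.class, args)))`, where `MyApp` is a placeholder for the application class.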
  28. Job deployment and administration with Spring Cloud Data Flow
    [Diagram: Spring Batch, Spring Boot, Spring Cloud Data Flow]
  29. DEMO

  30. Q+A

  31. Thank you!
    © 2020 Spring. A VMware-backed project.
    Code: Slides: Spring Batch home: Kubernetes home:
    GitHub: @benas
    Twitter: @b_e_n_a_s