spring-batch-kubernetes.pdf

Mahmoud Ben Hassine

March 12, 2020

Transcript

  1. Spring Batch on Kubernetes: Efficient batch processing at scale
    Mahmoud Ben Hassine, March 2020
    Copyright © 2020 VMware, Inc. or its affiliates.
  2. About me
    • Principal Software Engineer at VMware
    • Spring Batch Co-Lead
    • Open source enthusiast
    • @benas (GitHub), @b_e_n_a_s (Twitter)
  3. What about you?
    • Any Spring Batch users?
    • Any Spring Boot users?
    • Any Kubernetes users?
  4. Agenda
    • Spring Batch 101
    • Kubernetes Jobs 101
    • Spring Batch on Kubernetes, a perfect match!
    • Demo
    • Q+A
  5. What is batch processing?
    “Batch processing … is defined as the processing of a finite amount of data without interaction or interruption.”
    Michael Minella, The Definitive Guide to Spring Batch
  6. Core Features
    Robustness
    • Repeat/Retry/Skip/Restart
    • Transaction management
    • Chunk-oriented processing
    Scalability
    • Multi-threaded steps
    • Parallel steps
    • Remote chunking/partitioning
    Flexibility
    • XML/Java config styles
    • Declarative I/O
    • Rich library of item readers/writers
    And based on Spring Framework!
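
The chunk-oriented model with retry and skip listed on this slide can be pictured with a small, framework-free sketch. It does not use the Spring Batch API: the class `ChunkSketch`, its `run` method and the `retryLimit` parameter are hypothetical names chosen for illustration, and the writer call only stands in for what would be one transaction per chunk in a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, framework-free illustration; this is not the Spring Batch API.
class ChunkSketch {

    /** Counters describing one run. */
    static final class Result {
        int written;
        int skipped;
    }

    static <I, O> Result run(List<I> items, int chunkSize, int retryLimit,
                             Function<I, O> processor, Consumer<List<O>> writer) {
        Result result = new Result();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            List<O> chunk = new ArrayList<>();
            for (I item : items.subList(start, end)) {
                O out = processWithRetry(item, retryLimit, processor);
                if (out == null) {
                    result.skipped++;   // skip: give up on this item only (null counts as skipped)
                } else {
                    chunk.add(out);
                }
            }
            writer.accept(chunk);       // one write per chunk, not per item
            result.written += chunk.size();
        }
        return result;
    }

    private static <I, O> O processWithRetry(I item, int retryLimit, Function<I, O> processor) {
        for (int attempt = 1; attempt <= retryLimit; attempt++) {
            try {
                return processor.apply(item);
            } catch (RuntimeException e) {
                // retry: try the same item again until the limit is reached
            }
        }
        return null;
    }
}
```

Grouping writes per chunk is what keeps a failure local: one bad item costs at most a few retries and a skip, while the other chunks are written normally.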
  7. Use cases
    • ETL processing
    • Generation of statements/reports
    • Data analysis
    • Data science
    • Business intelligence
  8. History of Spring Batch
    v1.0 (Mar 28, 2008)
    • Initial APIs
    • Item-oriented processing
    • XML configuration
    • Java 1.4
    • Spring Framework 2.5
    v2.0 (Apr 11, 2009)
    • Step scope
    • Chunk-oriented processing
    • Remote chunking/partitioning
    • Java 5
    • Spring Framework 3
    v2.2 (Jun 05, 2013)
    • Java configuration
    • Spring Data support
    • Non-identifying Job params
    • AMQP support
    • SQLFire support
    v3.0 (May 22, 2014)
    • Job scope
    • JSR-352 support
    • SQLite support
    • Spring Batch Integration
    • Spring Boot support
    v4.0 (Dec 1, 2017)
    • Builders for readers
    • Builders for writers
    • Java 8
    • Spring Framework 5
  9. Kubernetes Jobs
    • A Job creates one or more Pods and ensures that a specified number of them successfully terminate.
    • The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).
    • You can also use a Job to run multiple Pods in parallel.
    • Deleting a Job will clean up the Pods it created.
    Source: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion
  10. Kubernetes CronJobs
    • A Cron Job creates Jobs on a time-based schedule (Unix-like cron).
    • There are certain circumstances where two jobs might be created, or no job might be created [..] Therefore, jobs should be idempotent.
    • The CronJob Controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, then it does not start the job and logs the error.
    • The CronJob is only responsible for creating Jobs that match its schedule, and the Job in turn is responsible for the management of the Pods it represents.
    Source: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
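
The missed-schedule rule quoted on this slide can be sketched as follows. This is a simplified illustration, not Kubernetes source code: it assumes a fixed, positive interval between runs instead of a real cron expression, and `MissedScheduleSketch`, `missedSchedules` and `shouldStartJob` are hypothetical names. Only the threshold of 100 comes from the slide.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified illustration of the rule quoted above; not Kubernetes source code.
class MissedScheduleSketch {

    static final int MAX_MISSED = 100; // threshold quoted on the slide

    // Assumes a fixed, positive interval between runs (real CronJobs use cron expressions).
    static long missedSchedules(Instant lastScheduled, Instant now, Duration interval) {
        if (!now.isAfter(lastScheduled)) {
            return 0;
        }
        return Duration.between(lastScheduled, now).toMillis() / interval.toMillis();
    }

    // The job is started only when no more than MAX_MISSED schedules were missed.
    static boolean shouldStartJob(Instant lastScheduled, Instant now, Duration interval) {
        return missedSchedules(lastScheduled, now, interval) <= MAX_MISSED;
    }
}
```

With a one-minute interval, a controller outage of 50 minutes would still start the job under this sketch, while an outage of 101 minutes would not.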
  11. Why would you need Kubernetes (for your batch jobs)?
    • It is not hype anymore, Kubernetes is awesome! Give it a try, even if you are not Google or Netflix (see the reasonable migration path suggested in the next slides)
    • Ability to run batch jobs on any node in the cluster with a single command
    • Ability to query the entire cluster for running jobs with a single command
    • Ability to automatically run jobs to completion (in case of node/pod failure)
    • Efficient resource management (k8s plays Tetris with your cluster)
    • Scalability
  12. Cloud-friendly batch jobs, how?
    • Spring Batch jobs maintain their state in an external database, and as such, they are already 12-factor processes [1] (and can easily be made fully 12-factor: log to standard output, take configuration from the environment, etc.)
    • Skip steps that executed successfully in the previous run when restarting after a failure (cost efficient)
    • Retry failed items in case of transient errors (like a call to a web service that might be temporarily down or being re-scheduled in a cloud environment)
    • Restart from the last save point within the same step thanks to the chunk-oriented processing model (cost efficient)
    • Safe against duplicate job executions (due to a human error, k8s pod rescheduling, or the CronJob limitation where the same job might run twice)
    [1]: https://12factor.net/processes
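
The restart and duplicate-protection behaviour described on this slide can be sketched with plain Java. The in-memory map below is a hypothetical stand-in for the external database, and `RestartSketch`, `JobState` and `run` are illustrative names, not Spring Batch classes. The sketch only shows the idea: the save point is committed after each chunk, a failed run resumes from it, and a completed job instance is refused a second run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for an external job repository; not the Spring Batch API.
class RestartSketch {

    static final class JobState {
        int nextIndex;      // first item not yet committed (the "save point")
        boolean completed;
    }

    // In a real deployment this state lives in a database outside the pod.
    private final Map<String, JobState> repository = new HashMap<>();

    /** Runs or resumes the job instance and returns the number of items written by this execution. */
    int run(String jobKey, List<String> items, int chunkSize, Consumer<List<String>> writer) {
        JobState state = repository.computeIfAbsent(jobKey, k -> new JobState());
        if (state.completed) {
            // duplicate execution of an already successful job instance is rejected
            throw new IllegalStateException("Job instance already completed: " + jobKey);
        }
        int written = 0;
        while (state.nextIndex < items.size()) {
            int end = Math.min(state.nextIndex + chunkSize, items.size());
            writer.accept(items.subList(state.nextIndex, end)); // may throw; save point is kept
            written += end - state.nextIndex;
            state.nextIndex = end;                               // commit the save point
        }
        state.completed = true;
        return written;
    }
}
```

Because the state is keyed by the job instance and kept outside the process, a rescheduled pod or a duplicate CronJob trigger either resumes the remaining work or is rejected, rather than reprocessing everything.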
  13. Containerised batch jobs, why?
    • Separate logs
    • Independent life cycle (bugs/features, deployment, etc.)
    • Separate parameters and exit codes!
    • Restartability (in case of failure, only restart the failed job)
    • Testability
    • Scalability
    • Resource usage efficiency (optimized resource limits => better pod scheduling)
  14. BigBang-less migration plan
    • Keep the database outside Kubernetes [1], migrate only stateless batch jobs: gradual, hybrid migration path
    • Traditional job => bootify it => 12 factorize it => dockerize it => kubernetize it
    • Use Kubernetes namespaces for testing/deploying jobs in staging/production [2]
    • CI/CD: update CronJobs live with “kubectl set image” or its REST API equivalent [3]
    [1]: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-mapping-external-services
    [2]: https://kubernetes.io/blog/2015/08/using-kubernetes-namespaces-to-manage/
    [3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#patch-cronjob-v1beta1-batch
  15. What you should know before the migration
    • The Job/Container exit code is very important!
    • Understand the implications of graceful/abrupt shutdown
    • Choose the right job pattern [1] (depending on the volume of job instances)
    • Choose the right restart/concurrency policies
    • Understand CronJob limitations [2]
    [1]: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#job-patterns
    [2]: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations
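
The point about exit codes can be sketched as a mapping from a job outcome to a process exit code. Kubernetes treats exit code 0 as success and any non-zero code as a container failure, which then drives the restart policy. The `JobOutcome` enum and the specific non-zero values (1 and 2) are assumptions made for illustration; Spring Batch has its own status classes that are not used here.

```java
// Hypothetical status type for illustration; Spring Batch has its own status classes.
class ExitCodeSketch {

    enum JobOutcome { COMPLETED, FAILED, STOPPED }

    // 0 signals success to Kubernetes; any non-zero code marks the container as failed.
    // The values 1 and 2 are arbitrary choices for this sketch.
    static int toExitCode(JobOutcome outcome) {
        switch (outcome) {
            case COMPLETED: return 0;
            case STOPPED:   return 2;
            default:        return 1;
        }
    }

    public static void main(String[] args) {
        // The outcome would normally come from the finished job; here it is read from the arguments.
        JobOutcome outcome = args.length > 0 ? JobOutcome.valueOf(args[0]) : JobOutcome.COMPLETED;
        System.exit(toExitCode(outcome));
    }
}
```

In a Spring Boot application the same idea is commonly expressed by passing the result of `SpringApplication.exit(...)` to `System.exit(...)`, so that a failed job does not terminate with code 0.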
  16. Job deployment and administration with Spring Cloud Data Flow
    [Diagram combining Spring Batch, Spring Boot and Spring Cloud Data Flow]
  17. Q+A

  18. Thank you!
    © 2020 Spring. A VMware-backed project.
    Code: https://github.com/benas/spring-batch-lab/tree/master/talks
    Slides: https://speakerdeck.com/benas/spring-batch-kubernetes
    Spring Batch home: https://projects.spring.io/spring-batch/
    Kubernetes home: https://kubernetes.io
    GitHub: @benas
    Twitter: @b_e_n_a_s