Slide 1

Singapore Spring User Group, 3rd April 2014, www.singasug.com
Sergiu Bodiu, IT Consultant, BAML
https://www.linkedin.com/in/sergiubodiu
Spring Batch: A Quickstart guide to running batch applications with Spring

Slide 2

Agenda
● Batch processing
● Spring Batch high-level overview
● Quick start using Spring Batch
● Batch Specification Language
● General Principles and Guidelines

Slide 3

What are the characteristics of batch processing?
● Long-running
– Often run outside office hours
● Non-interactive
– Often include logic for handling errors or restarts
● Process large volumes of data
– More than fits in memory or in a single transaction

Slide 4

Batch processing
● Close of business processing
– Order processing
– Business reporting
– Account reconciliation
● Import/export handling
– a.k.a. ETL jobs (Extract-Transform-Load)
– Instrument/position import
– Data warehouse synchronization
● Large-scale output jobs
– Loyalty scheme emails
– Bank statements

Slide 5

Batch Domain
● The batch domain adds value to a plain business process by introducing new concepts:
– A job has an identity that defines what needs to be done
– A job has steps
– A job instance can be restarted after a failure; each restart is a new execution
– Each execution has a start time, a stop time and a status
– The job instance has an overall status
– Each execution can tell us how many items were processed, and how many commits, rollbacks and skips occurred
● Adds value through robustness, reliability and traceability (SLA)
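To make the domain concepts above concrete, here is a minimal plain-Java sketch with no Spring Batch dependency. The names `JobInstanceModel`, `Execution` and `BatchStatus` are hypothetical simplifications, not the framework's `JobInstance`/`JobExecution` types: a job instance keeps one execution per run, a restart after a failure adds a new execution, and the instance's overall status is taken from its latest execution.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the batch domain concepts.
// These are NOT Spring Batch's JobInstance/JobExecution classes.
enum BatchStatus { STARTED, COMPLETED, FAILED }

final class Execution {
    final Instant startTime;
    Instant endTime;                        // "stop time"; null while running
    BatchStatus status = BatchStatus.STARTED;
    int readCount, writeCount, commitCount, rollbackCount, skipCount;

    Execution(Instant startTime) { this.startTime = startTime; }

    void finish(BatchStatus finalStatus, Instant when) {
        this.status = finalStatus;
        this.endTime = when;
    }
}

final class JobInstanceModel {
    final String jobName;                   // what needs to be done
    final String identifyingKey;            // assumed: e.g. the business date being processed
    final List<Execution> executions = new ArrayList<>();

    JobInstanceModel(String jobName, String identifyingKey) {
        this.jobName = jobName;
        this.identifyingKey = identifyingKey;
    }

    // Starting or restarting the instance always creates a NEW execution.
    Execution start(Instant now) {
        BatchStatus current = overallStatus();
        if (current == BatchStatus.COMPLETED || current == BatchStatus.STARTED) {
            throw new IllegalStateException("instance is " + current + "; cannot start again");
        }
        Execution execution = new Execution(now);
        executions.add(execution);
        return execution;
    }

    // Overall status = status of the most recent execution (null if never run).
    BatchStatus overallStatus() {
        return executions.isEmpty() ? null : executions.get(executions.size() - 1).status;
    }
}
```

Treating the latest execution as authoritative is one simple choice for this sketch; the real framework keeps richer metadata in its JobRepository.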

Slide 6

Batch Applications for the Java Platform: an API for robust batch processing targeted at Java EE and Java SE
● ItemReader, which reads the input data one item at a time (usually a single record)
● ItemProcessor, which applies business and domain logic to each item
● ItemWriter, to which the items are delegated after processing and which writes them out aggregated into chunks
[Diagram: JobOperator, Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter]
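The three roles described above can be sketched as small plain-Java interfaces. This is an illustration under assumptions, not the Spring Batch or JSR 352 API: `SimpleItemReader`, `SimpleItemProcessor`, `SimpleItemWriter`, `ListItemSource` and `CollectingWriter` are hypothetical names. The sketch keeps two conventions that the Spring Batch interfaces also use: `read()` returns null at the end of the input, and a null from `process()` filters the item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical plain-Java stand-ins for the three roles. The real interfaces
// are org.springframework.batch.item.* (Spring Batch) and
// javax.batch.api.chunk.* (JSR 352); their signatures differ in detail.
interface SimpleItemReader<I> {
    I read();                               // one item per call; null = input exhausted
}

interface SimpleItemProcessor<I, O> {
    O process(I item);                      // business logic; null = filter the item out
}

interface SimpleItemWriter<O> {
    void write(List<? extends O> items);    // receives a whole chunk of items at once
}

// Reader over an in-memory list, e.g. for tests.
final class ListItemSource<T> implements SimpleItemReader<T> {
    private final Iterator<T> it;
    ListItemSource(List<T> source) { this.it = source.iterator(); }
    public T read() { return it.hasNext() ? it.next() : null; }
}

// Writer that just remembers each chunk it was given.
final class CollectingWriter<T> implements SimpleItemWriter<T> {
    final List<List<T>> chunks = new ArrayList<>();
    public void write(List<? extends T> items) { chunks.add(new ArrayList<>(items)); }
}
```

In use, a driver loop calls `read()` until it returns null, passes each item through `process()`, and hands the surviving items to `write()`.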

Slide 7

Sample: Import flat files to database
[Diagram: a Step in which an ItemReader reads from a File, an ItemProcessor processes each item, and an ItemWriter writes to a Database]
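A plain-Java sketch of the flat-file-to-database sample, assuming a comma-delimited `firstName,lastName` file and an upper-casing business rule (both illustrative assumptions). An in-memory list stands in for the database table so the example runs without a JDBC driver; `Person` and `FlatFileImport` are hypothetical classes, and each method is labelled with the role it plays in the step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: lines are "firstName,lastName", names are upper-cased,
// and the returned list plays the role of the target database table.
final class Person {
    final String firstName;
    final String lastName;
    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
    @Override public String toString() { return firstName + " " + lastName; }
}

final class FlatFileImport {
    // ItemReader role: map one delimited line to a Person.
    static Person mapLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2) throw new IllegalArgumentException("bad line: " + line);
        return new Person(fields[0].trim(), fields[1].trim());
    }

    // ItemProcessor role: apply the (assumed) business rule.
    static Person process(Person p) {
        return new Person(p.firstName.toUpperCase(Locale.ROOT), p.lastName.toUpperCase(Locale.ROOT));
    }

    // Runs the step end to end over the file content.
    static List<Person> importFile(String fileContent) {
        List<Person> table = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(fileContent))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;   // ignore blank lines
                table.add(process(mapLine(line)));     // ItemWriter role: "insert" the row
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```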

Slide 8

Job Configuration

Slide 9

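Batch Applications with the Java Config

import javax.sql.DataSource;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    // Person and PersonItemProcessor are the sample's own domain classes (not shown)

    @Bean
    public ItemReader<Person> reader() {
        FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
        // ... (reader configuration not shown on the slide)
        return reader;
    }

    @Bean
    public ItemProcessor<Person, Person> processor() {
        return new PersonItemProcessor();
    }

    @Bean
    public ItemWriter<Person> writer(DataSource dataSource) {
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
        // ... (writer configuration not shown on the slide)
        return writer;
    }

    // A chunk-oriented step: items are read and processed one by one, written 10 at a time
    @Bean
    public Step step1(StepBuilderFactory stepBuilder, ItemReader<Person> reader,
                      ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
        return stepBuilder.get("step1")
                .<Person, Person>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}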

Slide 10

Spring Batch provides a framework for building, deploying, and running batch applications. Spring Batch has influenced JSR 352, which addresses three critical concerns:
● a programming model: application developers have clear, reusable interfaces for constructing batch-style applications.
● a job specification language: job writers have a powerful expression language for how to execute the steps of a batch execution.
● a batch runtime: solution integrators have a runtime API for initiating and controlling batch execution.

Slide 11

Batch Applications for the Java Platform
Batch Applications for the Java Platform, also known as JSR 352, offers application developers a model for developing robust batch processing systems. The core of this programming model is a development pattern borrowed from Spring Batch, the Reader-Processor-Writer pattern, in which developers are encouraged to adopt chunk-oriented processing.

Slide 12

Batch Programming Artifact Overview
● JSR 352 codifies key batch programming constructs
– Reader, Processor, Writer, Listener and more
– The batch runtime orchestrates the flow based on well-known patterns

Slide 13

Chunk-Oriented Processing
● Input and output can be grouped together
● Input collects items before outputting them
● Optional ItemProcessor to delegate business logic
[Diagram: chunk-oriented processing with a chunk of size N]
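The chunk-oriented loop can be sketched in plain Java with `java.util.function` types standing in for the reader, processor and writer. This is a simplified model, not the framework's implementation: `ChunkLoop.run` is a hypothetical helper, and transactions are only indicated in comments. What it shows is that items are read and processed individually but written N at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified sketch of chunk-oriented processing (not Spring Batch's own code).
// Up to N items are read one at a time, each goes through the processor, and
// the results are written together; in a real job each chunk is one transaction.
final class ChunkLoop {
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read: collect up to chunkSize input items (null = end of input).
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize) {
                I item = reader.get();
                if (item == null) { exhausted = true; break; }
                inputs.add(item);
            }
            if (inputs.isEmpty()) break;

            // 2. Process: apply business logic; a null result filters the item.
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I item : inputs) {
                O out = processor.apply(item);
                if (out != null) outputs.add(out);
            }

            // 3. Write the chunk in one call (where a real job would then commit).
            if (!outputs.isEmpty()) writer.accept(outputs);
            chunks++;
        }
        return chunks;
    }
}
```

Passing `Function.identity()` as the processor models a step that has no ItemProcessor.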

Slide 14

Batch Usage Patterns

Slide 15

[Sequence diagram: Client → JobLauncher start() → Job execute() → Business doStuff(); the result is returned as a JobExecution with ExitStatus.COMPLETED or FAILED; Done]
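The launch sequence in the diagram can be approximated with a few lines of plain Java. All names here (`SimpleJob`, `SimpleLauncher`, `ExitCode`, `ExecutionResult`) are hypothetical; in Spring Batch itself the client calls `JobLauncher.run(Job, JobParameters)` and inspects the returned `JobExecution`. The sketch only shows the key design choice: the launcher turns the outcome of the business logic into COMPLETED or FAILED instead of letting the exception escape to the client.

```java
// Hypothetical sketch of the launch sequence; not the Spring Batch API.
enum ExitCode { COMPLETED, FAILED }

final class ExecutionResult {
    final ExitCode exitStatus;
    final Throwable failure;                // null when the job completed
    ExecutionResult(ExitCode exitStatus, Throwable failure) {
        this.exitStatus = exitStatus;
        this.failure = failure;
    }
}

interface SimpleJob {
    void execute() throws Exception;        // runs the business logic ("doStuff()")
}

final class SimpleLauncher {
    // Start the job and translate its outcome into an exit status for the client.
    ExecutionResult start(SimpleJob job) {
        try {
            job.execute();
            return new ExecutionResult(ExitCode.COMPLETED, null);
        } catch (Exception e) {
            // Failures are reported in the result rather than thrown at the client.
            return new ExecutionResult(ExitCode.FAILED, e);
        }
    }
}
```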

Slide 16

More Readers and Writers
● Spring Batch provides many implementations of ItemReader and ItemWriter, e.g.
– Flat files
– XML
– JDBC: cursor and driving query
– Hibernate
– JMS
● Some simple jobs can be implemented with off-the-shelf components
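To illustrate the difference between the two JDBC reading styles listed above, here is a sketch of a driving-query reader in plain Java, with a `TreeMap` standing in for a table keyed by primary key. `DrivingQueryReader` is a hypothetical class, not a Spring Batch component, and the SQL in the comment is illustrative only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "driving query" strategy. The driving query
// selects only the keys up front; each read() then fetches one full row by
// key. A cursor-based reader would instead stream full rows from one query.
final class DrivingQueryReader {
    private final Map<Long, String> table;
    private final Iterator<Long> keys;
    int rowLookups = 0;                     // simulated "SELECT ... WHERE id = ?" calls

    DrivingQueryReader(Map<Long, String> table, long minId, long maxId) {
        this.table = table;
        // Driving query, roughly "SELECT id FROM people WHERE id BETWEEN ? AND ? ORDER BY id".
        List<Long> ids = new ArrayList<>(new TreeMap<>(table).subMap(minId, true, maxId, true).keySet());
        this.keys = ids.iterator();
    }

    // Returns the next row, or null when all driving keys have been consumed.
    String read() {
        if (!keys.hasNext()) return null;
        rowLookups++;
        return table.get(keys.next());
    }
}
```

The trade-off: a driving query issues one lookup per item (more round trips), while a cursor streams full rows through a single open query.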

Slide 17

● Run Tier is concerned with the scheduling and launching of the application. A vendor product is typically used in this tier to allow time-based and interdependent scheduling of batch jobs and to provide parallel processing capabilities.
● Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.
● Application Tier contains the components required to execute the program. It contains specific modules that address the required batch functionality and enforces policies around module execution (e.g., commit intervals, capture of statistics).
● Data Tier provides the integration with the physical data sources, which might include databases, files, or queues.
Note: In some cases the Job tier can be completely missing, and in other cases one job script can start several batch job instances.

Slide 18

General Principles and Guidelines
● A batch architecture typically affects the online architecture and vice versa. Design with both architectures and environments in mind, using common building blocks when possible.
● Simplify as much as possible and avoid building complex logical structures in single batch applications.
● Process data as close as possible to where the data physically resides, or vice versa (i.e., keep your data where your processing occurs).
● Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
● Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, the following four common flaws need to be looked for:
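One way to apply the guideline on minimizing I/O is to read reference data once and keep it in memory for the rest of the run. The sketch below is a hypothetical example (`CachedLookup` and the per-code factors are made up): a `Function` stands in for the database read, and a counter records how many of those reads actually happen.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: reference data comes from a slow source, but each
// code is fetched only once per run and then served from an in-memory map
// instead of being re-read for every item.
final class CachedLookup {
    private final Function<String, Double> source;  // stands in for a SELECT on a reference table
    private final Map<String, Double> cache = new HashMap<>();
    int sourceCalls = 0;                            // how many "physical" reads happened

    CachedLookup(Function<String, Double> source) { this.source = source; }

    double factor(String code) {
        Double cached = cache.get(code);
        if (cached == null) {
            sourceCalls++;
            cached = source.apply(code);
            cache.put(code, cached);
        }
        return cached;
    }
}
```

Caching like this only suits reference data that is small enough to hold in memory and does not change during the run.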

Slide 19

General Principles and Guidelines
● Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.
● Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
● Implement checksums for internal validation where possible. For example, flat files should have a trailer record stating the total number of records in the file and an aggregate of the key fields.
● Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
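The trailer-record check described above might look like this. The file layout is an assumption made for the example: data lines are `id,name`, and the last line is `T,<record count>,<sum of ids>`, with the sum of the key field serving as the aggregate. `TrailerCheck` is a hypothetical class.

```java
import java.util.List;

// Hypothetical trailer check. Assumed layout (illustrative only):
//   data lines: <id>,<name>
//   last line:  T,<number of data records>,<sum of all ids>
// The reader recomputes both values and rejects the file on a mismatch.
final class TrailerCheck {
    static void validate(List<String> lines) {
        if (lines.isEmpty()) throw new IllegalArgumentException("empty file: no trailer");
        String[] trailer = lines.get(lines.size() - 1).split(",");
        if (trailer.length != 3 || !trailer[0].equals("T")) {
            throw new IllegalArgumentException("missing or malformed trailer record");
        }
        long expectedCount = Long.parseLong(trailer[1]);
        long expectedKeySum = Long.parseLong(trailer[2]);

        long count = 0, keySum = 0;
        for (String line : lines.subList(0, lines.size() - 1)) {
            keySum += Long.parseLong(line.split(",")[0]);   // aggregate of the key field
            count++;
        }
        if (count != expectedCount) {
            throw new IllegalStateException("record count " + count + " != trailer " + expectedCount);
        }
        if (keySum != expectedKeySum) {
            throw new IllegalStateException("key sum " + keySum + " != trailer " + expectedKeySum);
        }
    }
}
```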

Slide 20

Questions?

Slide 21

References
● http://docs.spring.io/spring-batch/batch-principles-guidelines.html
● http://docs.spring.io/spring-batch/faq.html
● http://docs.spring.io/spring-batch-core/index.html
● http://docs.spring.io/spring-batch-admin/reference/reference.xhtml
● http://spring.io/guides/gs/batch-processing/
● https://github.com/spring-projects/spring-batch
● https://github.com/spring-guides/gs-batch-processing

Slide 22

Spring Batch Admin
● Subproject of Spring Batch
● Provides a web UI and a RESTful interface to manage batch processes: http://static.springsource.org/spring-batch-admin/index.html
● Manager, Resources, Sample WAR
– Deployed together with the batch job(s) as a single app, to control and monitor jobs
– Or monitors external jobs only, via a shared database

Slide 23

Home Page

Slide 24

Registered Jobs

Slide 25

Launching Jobs

Slide 26

Details for Job Execution

Slide 27

No content

Slide 28

General Principles and Guidelines
● There are a great many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). Clients are expected to create their own more specific strategies that can be plugged in to control things like commit intervals (CompletionPolicy), rules about how to deal with exceptions (ExceptionHandler), and many others.
● Generally you can expect anything at the top level of the source tree in packages org.springframework.batch.* to be public, but not necessarily sub-classable. Extending the concrete implementations of most strategies is discouraged in favour of a composition or forking approach. If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability.
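As an illustration of preferring composition to subclassing, here is a deliberately simplified completion strategy in plain Java. `ChunkCompletion` and `Policies` are hypothetical names; Spring Batch's actual `CompletionPolicy` has a richer contract built around a repeat context, so this shows the design idea, not that interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified completion strategy: small policies are combined
// rather than extended by subclassing a concrete implementation.
interface ChunkCompletion {
    boolean isComplete(int itemsInChunk, long elapsedMillis);
}

final class Policies {
    // Complete after a fixed number of items (a commit interval).
    static ChunkCompletion everyNItems(int n) {
        return (items, elapsed) -> items >= n;
    }

    // Complete once a chunk has been open for too long.
    static ChunkCompletion timeout(long maxMillis) {
        return (items, elapsed) -> elapsed >= maxMillis;
    }

    // Composition: the chunk is complete as soon as any delegate says so.
    static ChunkCompletion anyOf(ChunkCompletion... delegates) {
        List<ChunkCompletion> all = Arrays.asList(delegates);
        return (items, elapsed) -> all.stream().anyMatch(p -> p.isComplete(items, elapsed));
    }
}
```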

Slide 29

General Principles and Guidelines
● A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see PartitionStep).
● There are a number of technologies that could play a role here. The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing.
● One implementation that we have had some experience with is a set of remote web services handling the business processing. We send a specific range of primary keys for the inputs to each of a number of remote calls.
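The primary-key-range approach described above can be sketched with a thread pool in place of the remote web-service calls. `KeyRangePartitioner` and the `worker` callback are hypothetical: the key space is split into contiguous ranges, each range is processed concurrently, and the per-range item counts are summed once every partition has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongBinaryOperator;

// Hypothetical sketch: local threads stand in for remote agents; the assumed
// "worker" processes one inclusive key range and returns how many items it handled.
final class KeyRangePartitioner {
    static List<long[]> split(long minId, long maxId, int partitions) {
        if (partitions < 1 || maxId < minId) throw new IllegalArgumentException("bad range or partition count");
        List<long[]> ranges = new ArrayList<>();
        long size = (maxId - minId + 1 + partitions - 1) / partitions;  // ceiling division
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new long[] {start, Math.min(start + size - 1, maxId)});
        }
        return ranges;
    }

    static long runAll(long minId, long maxId, int partitions, LongBinaryOperator worker) {
        List<long[]> ranges = split(minId, maxId, partitions);
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] r : ranges) {
                results.add(pool.submit(() -> worker.applyAsLong(r[0], r[1])));
            }
            long processed = 0;
            for (Future<Long> f : results) processed += f.get();  // waits for every partition
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping the thread pool for remote calls changes the transport but not the shape: each agent receives only its own key range, so the partitions never overlap.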

Slide 30

No content