Spring Batch Overview

Presenting Spring Batch, a framework for building, deploying, and running batch applications. The talk shows how developers can use a batch programming model, a job specification language, and a batch runtime, building on reusable interfaces for constructing batch-style applications.

sergiubodiu

April 03, 2014

Transcript

  1. Singapore Spring User Group, 3rd April 2014, www.singasug.com
    Sergiu Bodiu, IT Consultant, BAML
    https://www.linkedin.com/in/sergiubodiu
    mailto:[email protected]
    Spring Batch: A Quickstart guide to running batch applications with Spring
  2. Agenda
    • Batch processing
    • Spring Batch high-level overview
    • Quick start using Spring Batch
    • Batch Specification Language
    • General Principles and Guidelines
  3. What are the characteristics of batch processing?
    • Long-running
      – Often outside office hours
    • Non-interactive
      – Often includes logic for handling errors or restarts
    • Processes large volumes of data
      – More than fits in memory or in a single transaction
  4. Batch processing
    • Close of business processing
      – Order processing
      – Business reporting
      – Account reconciliation
    • Import/export handling
      – a.k.a. ETL jobs (Extract-Transform-Load)
      – Instrument/position import
      – Data warehouse synchronization
    • Large-scale output jobs
      – Loyalty scheme emails
      – Bank statements
  5. Batch Domain
    • The batch domain adds value to a plain business process by introducing new concepts:
      – A job has an identity and defines what needs to be done
      – A job has steps
      – A job instance can be restarted after a failure; each restart is a new execution
      – Each execution has a start time, stop time, and status
      – The job instance has an overall status
      – Each execution can tell us how many items were processed and how many commits, rollbacks, and skips occurred
    • Adds value through robustness, reliability, and traceability (SLA)
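To make the instance/execution distinction above concrete, here is a minimal plain-Java sketch. It is not Spring Batch's `JobInstance`/`JobExecution` API: the classes `ToyJobRepository`, `Execution` and `RunStatus`, and the use of a run date as the identifying parameter, are hypothetical. The sketch assumes a job instance is keyed by job name plus identifying parameters, that every attempt is recorded as a new execution, and that a completed instance cannot be run again (Spring Batch behaves similarly, refusing to rerun a completed instance with the same identifying parameters).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical, simplified model of the batch domain; not Spring Batch's API.
enum RunStatus { COMPLETED, FAILED }

// One attempt at running a job instance.
final class Execution {
    final int attempt;          // 1 = first run, 2 = first restart, ...
    final RunStatus status;
    final int itemsProcessed;
    final long startMillis;
    final long endMillis;

    Execution(int attempt, RunStatus status, int itemsProcessed, long startMillis, long endMillis) {
        this.attempt = attempt;
        this.status = status;
        this.itemsProcessed = itemsProcessed;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

final class ToyJobRepository {
    // Instance identity = job name + identifying parameter (a run date here, by assumption).
    private final Map<String, List<Execution>> history = new HashMap<>();

    private static String key(String jobName, String runDate) {
        return jobName + "|" + runDate;
    }

    /** Runs the job body once and records the attempt as a new execution. */
    Execution run(String jobName, String runDate, IntSupplier body) {
        List<Execution> executions = history.computeIfAbsent(key(jobName, runDate), k -> new ArrayList<>());
        if (!executions.isEmpty() && executions.get(executions.size() - 1).status == RunStatus.COMPLETED) {
            throw new IllegalStateException("Job instance already completed: " + key(jobName, runDate));
        }
        long start = System.currentTimeMillis();
        RunStatus status;
        int processed;
        try {
            processed = body.getAsInt();        // the body reports how many items it processed
            status = RunStatus.COMPLETED;
        } catch (RuntimeException e) {
            processed = 0;
            status = RunStatus.FAILED;
        }
        Execution execution = new Execution(executions.size() + 1, status, processed, start, System.currentTimeMillis());
        executions.add(execution);
        return execution;
    }

    /** The instance's overall status is that of its most recent execution (null if never run). */
    RunStatus instanceStatus(String jobName, String runDate) {
        List<Execution> executions = history.get(key(jobName, runDate));
        return executions == null ? null : executions.get(executions.size() - 1).status;
    }

    int executionCount(String jobName, String runDate) {
        List<Execution> executions = history.get(key(jobName, runDate));
        return executions == null ? 0 : executions.size();
    }
}

final class BatchDomainDemo {
    public static void main(String[] args) {
        ToyJobRepository repo = new ToyJobRepository();
        // First attempt fails, e.g. because an input file has not arrived yet.
        Execution first = repo.run("endOfDay", "2014-04-03", () -> { throw new IllegalStateException("input missing"); });
        // Restarting the same instance creates a second execution.
        Execution second = repo.run("endOfDay", "2014-04-03", () -> 1000);
        System.out.println(first.status + " -> " + second.status + " after "
                + repo.executionCount("endOfDay", "2014-04-03") + " executions");
        // prints FAILED -> COMPLETED after 2 executions
    }
}
```

Keeping executions as an append-only history is what lets the repository answer the slide's questions (start and stop times, status, item counts) for every attempt, not just the last one.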
  6. Batch Applications for the Java Platform
    • An API for robust batch processing targeted at Java EE and Java SE
    • ItemReader reads the input one item at a time (usually a single record)
    • ItemProcessor applies business and domain logic to each item
    • ItemWriter receives the processed items, aggregated into a chunk, and writes them out
    [Diagram: JobOperator, Job, Step, Job Repository, ItemReader, ItemProcessor, ItemWriter]
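The three roles can be sketched with small plain-Java interfaces. The names `SimpleReader`, `SimpleProcessor`, `SimpleWriter`, `ListReader`, `CollectingWriter` and `RoleDemo` are hypothetical stand-ins, not the Spring Batch or JSR 352 types; they only mirror two conventions those APIs share with this sketch: a reader returns `null` when the input is exhausted, and a processor returns `null` to filter an item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interfaces shaped like the reader/processor/writer roles (not the real Spring Batch or JSR 352 types).
interface SimpleReader<T> {
    T read();                               // returns null once the input is exhausted
}

interface SimpleProcessor<I, O> {
    O process(I item);                      // returns null to filter the item out
}

interface SimpleWriter<T> {
    void write(List<? extends T> items);    // receives the items that survived processing
}

final class ListReader<T> implements SimpleReader<T> {
    private final Iterator<T> it;

    ListReader(List<T> items) {
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }
}

final class CollectingWriter<T> implements SimpleWriter<T> {
    final List<T> written = new ArrayList<>();

    @Override
    public void write(List<? extends T> items) {
        written.addAll(items);
    }
}

final class RoleDemo {
    /** Reads until null, processes each item, then hands the non-filtered results to the writer. */
    static <I, O> void runOnce(SimpleReader<I> reader, SimpleProcessor<I, O> processor, SimpleWriter<O> writer) {
        List<O> out = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            O processed = processor.process(item);
            if (processed != null) {
                out.add(processed);
            }
        }
        writer.write(out);
    }

    public static void main(String[] args) {
        CollectingWriter<String> writer = new CollectingWriter<>();
        // Upper-case names and filter out blank ones.
        runOnce(new ListReader<>(List.of("ada", " ", "grace")),
                name -> name.isBlank() ? null : name.toUpperCase(),
                writer);
        System.out.println(writer.written);   // prints [ADA, GRACE]
    }
}
```

For brevity this sketch hands all surviving items to the writer in a single call; grouping them into fixed-size chunks is the subject of the chunk-oriented processing slide.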
  7. Job Configuration

        <job id="myJob">
            <step id="myStep">
                <tasklet>
                    <!-- commit-interval: number of items processed per transaction -->
                    <chunk reader="myItemReader" processor="myItemProcessor"
                           writer="myItemWriter" commit-interval="100" />
                </tasklet>
            </step>
        </job>

        <bean id="myItemReader" class="...MyItemReader" />
        <bean id="myItemProcessor" class="...MyItemProcessor" />
        <bean id="myItemWriter" class="...MyItemWriter" />
  8. Batch Applications with Java Config

        @Bean
        public ItemReader<Person> reader() {
            FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
            // ... (resource and line-mapping configuration not shown on the slide)
            return reader;
        }

        @Bean
        public ItemProcessor<Person, Person> processor() {
            return new PersonItemProcessor();
        }

        @Bean
        public ItemWriter<Person> writer(DataSource dataSource) {
            JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<Person>();
            // ...
            return writer;
        }

        @Bean
        public Step step1(StepBuilderFactory stepBuilder, ItemReader<Person> reader,
                ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
            // chunk(10): read and process 10 items, then write and commit them together
            return stepBuilder.get("step1")
                    .<Person, Person>chunk(10)
                    .reader(reader)
                    .processor(processor)
                    .writer(writer)
                    .build();
        }
  9. Spring Batch makes available a framework for building, deploying, and running batch applications. Spring Batch has influenced JSR 352, and it addresses three critical concerns:
    • A programming model: application developers have clear, reusable interfaces for constructing batch-style applications.
    • A job specification language: job writers have a powerful expression language for how to execute the steps of a batch execution.
    • A batch runtime: solution integrators have a runtime API for initiating and controlling batch execution.
  10. Batch Applications for the Java Platform
    Batch Applications for the Java Platform, also known as JSR 352, offers application developers a model for developing robust batch processing systems. The core of this programming model is a development pattern borrowed from Spring Batch, the Reader-Processor-Writer pattern, in which developers are encouraged to adopt a chunk-oriented processing standard.
  11. Batch Programming Artifact Overview
    • JSR 352 codifies key batch programming constructs
      – Reader, Processor, Writer, Listener, and more
      – The batch runtime orchestrates the flow based on well-known patterns
  12. Chunk-Oriented Processing
    • Input and output can be grouped together
    • Input collects items before outputting them: chunk-oriented processing
    • Optional ItemProcessor to which business logic is delegated
    [Diagram: a chunk with size N]
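A minimal sketch of the chunk loop, assuming a fixed chunk size N: items are read and processed one at a time, collected until N items have been read, and then written together, with each write standing in for one transaction commit. `ChunkLoop.run` is a hypothetical helper, not Spring Batch's implementation; N plays the role of `commit-interval="100"` in the XML configuration and `chunk(10)` in the Java configuration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical sketch of chunk-oriented processing (not Spring Batch's implementation):
 * read and process up to chunkSize items, then write the whole chunk at once.
 * Each write stands in for one transaction commit.
 */
final class ChunkLoop {
    static <I, O> int run(Iterator<I> input, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int commits = 0;
        while (input.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize);
            // Fill the chunk until it holds chunkSize items or the input runs out.
            while (chunk.size() < chunkSize && input.hasNext()) {
                chunk.add(processor.apply(input.next()));
            }
            writer.accept(chunk);   // one write per chunk...
            commits++;              // ...followed by one commit
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            input.add(i);
        }
        List<Integer> chunkSizes = new ArrayList<>();
        int commits = run(input.iterator(), x -> x * 2, chunk -> chunkSizes.add(chunk.size()), 10);
        System.out.println(commits + " commits, chunk sizes " + chunkSizes);
        // prints 3 commits, chunk sizes [10, 10, 5]
    }
}
```

With 25 input items and N = 10 the writer is called three times (10, 10 and 5 items). In a transactional setting, a failure while writing the last chunk would only roll back that chunk's 5 items, not the whole run.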
  13. More Readers and Writers
    • Spring Batch provides many implementations of ItemReader and ItemWriter, e.g.
      – Flat files
      – XML
      – JDBC: cursor & driving query
      – Hibernate
      – JMS
    • Some simple jobs can be implemented with off-the-shelf components
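Off-the-shelf readers such as Spring Batch's `FlatFileItemReader` handle the parsing for you. To show what such a component does, here is a hypothetical plain-Java delimited-file reader. The `Person` class with `firstName`/`lastName` fields and the two-column comma-separated layout are assumptions for illustration, and `DelimitedPersonReader` is not a Spring Batch class. It follows the ItemReader convention of returning `null` when no input is left.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical domain object; the two fields are an assumption for illustration.
final class Person {
    final String firstName;
    final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}

/** Hypothetical flat-file reader: one comma-separated line per item, null at end of input. */
final class DelimitedPersonReader {
    private final BufferedReader in;
    private int lineNumber = 0;

    DelimitedPersonReader(BufferedReader in) {
        this.in = in;
    }

    Person read() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                if (line.isBlank()) {
                    continue;                           // skip empty lines
                }
                String[] fields = line.split(",", -1);  // -1 keeps trailing empty fields
                if (fields.length != 2) {
                    throw new IllegalArgumentException(
                            "line " + lineNumber + ": expected 2 fields but found " + fields.length);
                }
                return new Person(fields[0].trim(), fields[1].trim());
            }
            return null;                                // end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class FlatFileDemo {
    public static void main(String[] args) {
        DelimitedPersonReader reader = new DelimitedPersonReader(
                new BufferedReader(new StringReader("Jill,Doe\n\nJoe,Doe\n")));
        for (Person p = reader.read(); p != null; p = reader.read()) {
            System.out.println(p);   // prints Jill Doe, then Joe Doe
        }
    }
}
```

Reporting the line number on a malformed record is the kind of detail an off-the-shelf reader gives you for free, and it matters when a file with millions of lines fails halfway through.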
  14. Run Tier is concerned with the scheduling and launching of the application. A vendor product is typically used in this tier to allow time-based and interdependent scheduling of batch jobs as well as to provide parallel processing capabilities.
    Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.
    Application Tier contains the components required to execute the program. It contains specific modules that address the required batch functionality and enforces policies around module execution (e.g., commit intervals, capture of statistics, etc.).
    Data Tier provides the integration with the physical data sources, which might include databases, files, or queues.
    Note: In some cases the Job Tier can be completely missing, and in other cases one job script can start several batch job instances.
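The Job Tier's responsibility of running steps in order while keeping every step in a known state can be sketched as follows. `SequentialJob` is a hypothetical class, and keeping step state in memory between runs is a simplification (a real job tier would persist it, for example in a job repository). The sketch assumes two policies: a failed step stops the job, and a rerun skips steps that already completed, which mirrors Spring Batch's default restart behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical job tier: runs named steps in order, stops at the first failure,
 * and on a rerun skips steps that already completed. Step state is kept in memory
 * here; a real job tier would persist it between runs.
 */
final class SequentialJob {
    enum StepStatus { COMPLETED, FAILED }

    private final Map<String, Runnable> steps = new LinkedHashMap<>();   // preserves step order
    private final Map<String, StepStatus> state = new LinkedHashMap<>();

    SequentialJob step(String name, Runnable body) {
        steps.put(name, body);
        return this;
    }

    /** Runs the job once and returns the names of the steps executed in this run. */
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
            String name = entry.getKey();
            if (state.get(name) == StepStatus.COMPLETED) {
                continue;                      // finished in an earlier run
            }
            executed.add(name);
            try {
                entry.getValue().run();
                state.put(name, StepStatus.COMPLETED);
            } catch (RuntimeException e) {
                state.put(name, StepStatus.FAILED);
                break;                         // later steps must not run after a failure
            }
        }
        return executed;
    }

    /** True once every step has completed. */
    boolean isComplete() {
        for (String name : steps.keySet()) {
            if (state.get(name) != StepStatus.COMPLETED) {
                return false;
            }
        }
        return true;
    }
}

final class JobTierDemo {
    public static void main(String[] args) {
        boolean[] inputReady = {false};
        SequentialJob job = new SequentialJob()
                .step("extract", () -> { })
                .step("transform", () -> {
                    if (!inputReady[0]) {
                        throw new IllegalStateException("input not ready");
                    }
                })
                .step("load", () -> { });
        System.out.println(job.run() + " complete=" + job.isComplete());   // prints [extract, transform] complete=false
        inputReady[0] = true;                                             // fix the cause, then restart
        System.out.println(job.run() + " complete=" + job.isComplete());   // prints [transform, load] complete=true
    }
}
```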
  15. General Principles and Guidelines
    • A batch architecture typically affects the online architecture and vice versa. Design with both architectures and environments in mind, using common building blocks when possible.
    • Simplify as much as possible and avoid building complex logical structures in single batch applications.
    • Process data as close as possible to where the data physically resides, or vice versa (i.e., keep your data where your processing occurs).
    • Minimize system resource use, especially I/O. Perform as many operations as possible in internal memory.
    • Review application I/O (analyze SQL statements) to ensure that unnecessary physical I/O is avoided. In particular, the following four common flaws need to be looked for:
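One common way to apply "perform as many operations as possible in internal memory" is to cache reference data instead of re-reading it for every record. In this hypothetical sketch, `LookupCaching.CountingTable` stands in for a database table and counts physical reads; the currency codes are made-up sample data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of keeping reference data in memory:
 * the naive version performs one lookup per record, the cached version
 * reads each distinct key from the table at most once.
 */
final class LookupCaching {
    /** Stand-in for a database table that counts physical reads. */
    static final class CountingTable {
        private final Map<String, String> rows;
        int reads = 0;

        CountingTable(Map<String, String> rows) {
            this.rows = rows;
        }

        String lookup(String key) {
            reads++;
            return rows.get(key);
        }
    }

    /** One table read per record, even for keys seen before. */
    static int enrichNaively(List<String> keys, CountingTable table) {
        int resolved = 0;
        for (String key : keys) {
            if (table.lookup(key) != null) {
                resolved++;
            }
        }
        return resolved;
    }

    /** One table read per distinct key; later hits are served from memory. */
    static int enrichWithCache(List<String> keys, CountingTable table) {
        Map<String, String> cache = new HashMap<>();
        int resolved = 0;
        for (String key : keys) {
            // computeIfAbsent does not store null results, so unknown keys are looked up again.
            if (cache.computeIfAbsent(key, table::lookup) != null) {
                resolved++;
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("USD", "SGD", "USD", "USD", "SGD");
        Map<String, String> rows = Map.of("USD", "US dollar", "SGD", "Singapore dollar");
        CountingTable naive = new CountingTable(rows);
        CountingTable cached = new CountingTable(rows);
        enrichNaively(keys, naive);
        enrichWithCache(keys, cached);
        System.out.println(naive.reads + " reads vs " + cached.reads + " reads");   // prints 5 reads vs 2 reads
    }
}
```

The saving grows with volume: over millions of records that share a handful of reference keys, the cached version keeps the number of table reads close to the number of distinct keys.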
  16. General Principles and Guidelines
    • Allocate enough memory at the beginning of a batch application to avoid time-consuming reallocation during the process.
    • Always assume the worst with regard to data integrity. Insert adequate checks and record validation to maintain data integrity.
    • Implement checksums for internal validation where possible. For example, flat files should have a trailer record giving the total number of records in the file and an aggregate of the key fields.
    • Plan and execute stress tests as early as possible in a production-like environment with realistic data volumes.
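A trailer check along the lines of the checksum guideline above might look like this. The file layout is an assumption for illustration: detail lines of the form `<id>,<amount>` and a final trailer line `TRL,<record count>,<sum of amounts>`, with the amount field used as the aggregated key field. `TrailerCheck` is a hypothetical helper.

```java
import java.util.List;

/**
 * Hypothetical trailer check. Assumed file layout:
 *   detail lines:  <id>,<amount>            (amount in integer minor units, e.g. cents)
 *   trailer line:  TRL,<record count>,<sum of amounts>
 */
final class TrailerCheck {
    static void validate(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty file: trailer record missing");
        }
        String[] trailer = lines.get(lines.size() - 1).split(",");
        if (trailer.length != 3 || !trailer[0].equals("TRL")) {
            throw new IllegalArgumentException("last line is not a trailer record");
        }
        long expectedCount = Long.parseLong(trailer[1]);
        long expectedSum = Long.parseLong(trailer[2]);

        long count = 0;
        long sum = 0;
        for (String line : lines.subList(0, lines.size() - 1)) {
            String[] fields = line.split(",");
            if (fields.length != 2) {
                throw new IllegalArgumentException("malformed detail record: " + line);
            }
            sum += Long.parseLong(fields[1]);
            count++;
        }
        if (count != expectedCount) {
            throw new IllegalStateException("record count " + count + " does not match trailer " + expectedCount);
        }
        if (sum != expectedSum) {
            throw new IllegalStateException("amount total " + sum + " does not match trailer " + expectedSum);
        }
    }

    public static void main(String[] args) {
        validate(List.of("1,100", "2,250", "TRL,2,350"));           // passes
        try {
            validate(List.of("1,100", "TRL,2,350"));                // a detail record went missing
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
            // prints rejected: record count 1 does not match trailer 2
        }
    }
}
```

Checking both the count and the aggregate catches different failures: a truncated file changes the count, while a corrupted amount leaves the count intact but breaks the total.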
  17. Reference
    • http://docs.spring.io/spring-batch/batch-principles-guidelines.html
    • http://docs.spring.io/spring-batch/faq.html
    • http://docs.spring.io/spring-batch-core/index.html
    • http://docs.spring.io/spring-batch-admin/reference/reference.xhtml
    • http://spring.io/guides/gs/batch-processing/
    • https://github.com/spring-projects/spring-batch
    • https://github.com/spring-guides/gs-batch-processing
  18. Spring Batch Admin
    • Subproject of Spring Batch
    • Provides a web UI and a RESTful interface to manage batch processes
      http://static.springsource.org/spring-batch-admin/index.html
    • Manager, Resources, Sample WAR
      – Deployed with the batch job(s) as a single app to be able to control & monitor jobs
      – Or monitors external jobs only, via a shared database
  19. General Principles and Guidelines
    • There are a great many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). Clients are expected to create their own more specific strategies that can be plugged in to control things like commit intervals (CompletionPolicy), rules about how to deal with exceptions (ExceptionHandler), and many others.
    • Generally you can expect anything at the top level of the source tree in packages org.springframework.batch.* to be public, but not necessarily subclassable. Extending the concrete implementations of most strategies is discouraged in favour of a composition or forking approach. If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability.
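To illustrate composition over subclassing, here is a hypothetical completion strategy that decides when a chunk is finished. `ChunkCompletion` is a deliberately simplified stand-in, not Spring Batch's `CompletionPolicy` (whose real contract is based on a `RepeatContext`); the point is that new behaviour comes from combining small policies with `anyOf` rather than by extending a concrete class.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified completion strategy (not Spring Batch's CompletionPolicy):
 * decides whether the current chunk is finished.
 */
interface ChunkCompletion {
    boolean isComplete(int itemCount, long payloadChars);

    static ChunkCompletion maxItems(int limit) {
        return (items, chars) -> items >= limit;
    }

    static ChunkCompletion maxChars(long limit) {
        return (items, chars) -> chars >= limit;
    }

    /** Composition instead of subclassing: complete as soon as any delegate is complete. */
    static ChunkCompletion anyOf(ChunkCompletion... policies) {
        return (items, chars) -> {
            for (ChunkCompletion policy : policies) {
                if (policy.isComplete(items, chars)) {
                    return true;
                }
            }
            return false;
        };
    }
}

final class CompletionDemo {
    /** Groups items into chunks, closing a chunk whenever the policy reports completion. */
    static List<List<String>> chunk(List<String> items, ChunkCompletion policy) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long chars = 0;
        for (String item : items) {
            current.add(item);
            chars += item.length();
            if (policy.isComplete(current.size(), chars)) {
                chunks.add(current);
                current = new ArrayList<>();
                chars = 0;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);   // trailing partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        ChunkCompletion policy = ChunkCompletion.anyOf(ChunkCompletion.maxItems(3), ChunkCompletion.maxChars(10));
        System.out.println(chunk(List.of("a", "b", "c", "0123456789", "e"), policy));
        // prints [[a, b, c], [0123456789], [e]]
    }
}
```

Because each policy only depends on the `ChunkCompletion` interface, a new rule (for example a time limit) can be added without touching or subclassing the existing ones.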
  20. General Principles and Guidelines
    • A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see PartitionStep).
    • There are a number of technologies that could play a role here. The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing.
    • One implementation that we have had some experience with is a set of remote web services handling the business processing. We send a specific range of primary keys for the inputs to each of a number of remote calls.
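The primary-key-range approach described on this slide can be sketched in plain Java. `KeyRangePartitioner` is hypothetical, and local threads from an `ExecutorService` stand in for the remote web services; Spring Batch's own abstraction for splitting work is the `Partitioner` interface, which this sketch does not use. Ranges are contiguous and inclusive, cover the whole key space without gaps or overlaps, and spread any remainder over the first partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongBinaryOperator;

/**
 * Hypothetical partitioning sketch: split an inclusive primary-key range into
 * contiguous sub-ranges and process them concurrently. Local threads stand in
 * for the remote workers described on the slide.
 */
final class KeyRangePartitioner {
    /** Returns inclusive [from, to] ranges covering minKey..maxKey with no gaps or overlaps. */
    static List<long[]> partition(long minKey, long maxKey, int gridSize) {
        if (gridSize < 1 || maxKey < minKey) {
            throw new IllegalArgumentException("invalid key range or grid size");
        }
        long total = maxKey - minKey + 1;
        long base = total / gridSize;
        long remainder = total % gridSize;
        List<long[]> ranges = new ArrayList<>();
        long start = minKey;
        for (int i = 0; i < gridSize; i++) {
            long size = base + (i < remainder ? 1 : 0);   // spread the remainder over the first ranges
            if (size == 0) {
                break;                                     // fewer keys than partitions
            }
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    /** Runs the worker on every range concurrently and sums the per-range results. */
    static long runAll(List<long[]> ranges, LongBinaryOperator worker) {
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : ranges) {
                results.add(pool.submit(() -> worker.applyAsLong(range[0], range[1])));
            }
            long total = 0;
            for (Future<Long> result : results) {
                try {
                    total += result.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a partition", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a partition failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> ranges = partition(1, 10, 3);
        for (long[] range : ranges) {
            System.out.println(range[0] + ".." + range[1]);   // prints 1..4, then 5..7, then 8..10
        }
        // Hypothetical worker: "processes" a range by summing its keys.
        long sum = runAll(ranges, (from, to) -> {
            long s = 0;
            for (long k = from; k <= to; k++) {
                s += k;
            }
            return s;
        });
        System.out.println(sum);   // prints 55
    }
}
```

Splitting by key range assumes keys are reasonably evenly distributed; with sparse or skewed keys, some partitions will do far more work than others.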