Slide 1

Slide 1 text

Easy Batch
The simple, stupid batch processing framework for Java
Mahmoud Ben Hassine
https://benas.github.io
@b_e_n_a_s

Slide 2

Slide 2 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 3

Slide 3 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 4

Slide 4 text

Batch vs Stream processing
Batch processing | Stream processing
Bounded data set | Unbounded data stream
High latency | Low latency
Static data set | Dynamic data stream

Slide 5

Slide 5 text

Batch processing • Long running jobs • No human interaction • No fancy GUIs • OutOfMemory errors!

Slide 6

Slide 6 text

State of the art*
JSR-352
Excellent solutions! But...
*: Big data tools like Spark, Flink, etc. are out of scope

Slide 7

Slide 7 text

What’s wrong with Spring Batch / JSR 352?
“I have to admit I got a little overwhelmed by the complexity and amount of configuration needed for even a simple example”, Jeff Zapotoczny
“What should we think of the Spring Batch solution? Complex. Obviously, it looks more complicated than the simple approaches. This is typical of a framework: the learning curve is steeper”, Arnaud Cogoluègnes
“Recently evaluated Spring Batch, and quickly rejected it once I realized that it added nothing to my project aside from bloat and overhead”, RT. Person
Complex configuration + Steep learning curve

Slide 8

Slide 8 text

What’s wrong with Spring Batch / JSR 352?
“The context of a Spring Batch application grows pretty quick and involves configuring a lot of stuff that, at the outset, it just doesn't seem like you should need to configure. A "job repository" to track the status and history of job executions, which itself requires a data source - just to get started? Wow, that's a bit heavy handed.”, Jeff Zapotoczny
“We can see that a transaction manager is needed. This property is mandatory, which in my view is a pity for simple cases like ours, where we do not use transactions.”, Julien Jakubowski
“Spring Batch or How Not to Design an API.. Why do I Need a Transaction Manager? Why do I Need a Job Repository?”, William Shields
Mandatory components that you might not need

Slide 9

Slide 9 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 10

Slide 10 text

Motivations (1/2)
Goals:
• Keep it simple, stupid
• Flexible and extensible API
• Modular architecture
• Reduce boilerplate code
Non-goals:
Build yet another big data, cloud-native, map-reduce, fault-tolerant, ultra high-performance, massively parallel, distributed, reactive, real-time, resilient, [put buzzword here] processing framework. No, this is not the goal.

Slide 11

Slide 11 text

Motivations (2/2)
products.csv:
    #id,name,description,price,published,lastUpdate
    0001,product1,description1,2500,true,2014-01-01
    000x,product2,description2,2400,true,2014-01-01
    0003,,description3,2300,true,2014-01-01
    0004,product4,description4,-2200,true,2014-01-01
    0005,product5,description5,2100,true,2024-01-01
    0006,product6,description6,2000,true,2014-01-01,Blah!
Product.java:
    import java.util.Date;

    public class Product {
        private long id;
        private String name;
        private String description;
        private double price;
        private boolean published;
        private Date lastUpdate;
        // getters, setters omitted
    }
Common requirements:
- Read file line by line
- Filter header record
- Parse and map data to the Product bean
- Validate product data
- Do something with the product (business logic)
- Log errors
- Report statistics
Most of these steps are boilerplate. The goal is to keep the focus on business logic!
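To make the amount of boilerplate concrete, here is a minimal plain-Java sketch of the common requirements listed above, written without any framework. The class name `ManualProductImport`, the `Report` holder and the exact validation rules (numeric id, non-empty name, non-negative price, `lastUpdate` not after a reference date) are assumptions chosen to match the bad rows in the sample file. The slide does not spell them out.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class ManualProductImport {

    /** Counters a hand-written job has to maintain itself. */
    static final class Report {
        int filtered;                                  // header/comment lines
        int valid;                                     // records that passed every check
        final List<String> errors = new ArrayList<>(); // one message per rejected line
    }

    /** Reads, filters, parses and validates each line; today bounds lastUpdate. */
    static Report run(List<String> lines, LocalDate today) {
        Report report = new Report();
        for (String line : lines) {
            if (line.startsWith("#")) {                // filter the header record
                report.filtered++;
                continue;
            }
            try {
                String[] f = line.split(",", -1);      // -1 keeps empty fields
                if (f.length != 6) {
                    throw new IllegalArgumentException("expected 6 fields, got " + f.length);
                }
                Long.parseLong(f[0]);                  // id must be numeric
                if (f[1].isEmpty()) {
                    throw new IllegalArgumentException("name is empty");
                }
                if (Double.parseDouble(f[3]) < 0) {
                    throw new IllegalArgumentException("price is negative");
                }
                if (LocalDate.parse(f[5]).isAfter(today)) {
                    throw new IllegalArgumentException("lastUpdate is in the future");
                }
                report.valid++;                        // business logic would go here
            } catch (RuntimeException e) {             // log the error, keep going
                report.errors.add(line + " -> " + e.getMessage());
            }
        }
        return report;
    }
}
```

Only the single line marked "business logic would go here" is specific to the application. Everything else is the plumbing the deck argues a framework should take over.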

Slide 12

Slide 12 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 13

Slide 13 text

Easy Batch in a nutshell • Name: Easy Batch • Date of birth: 13/08/2012 • Weight: 108 KB (v6) • DNA: https://github.com/j-easy/easy-batch

Slide 14

Slide 14 text

Overview

Slide 15

Slide 15 text

The Record abstraction (1/2)

Slide 16

Slide 16 text

The Record abstraction (2/2)
Record = Header (number, source, etc.) + Payload (raw data)
Record.java:
    public interface Record<P> {
        /** Header of the record. */
        Header getHeader();

        /** Payload of the record. */
        P getPayload();
    }
Multiple implementations: FlatFileRecord, XmlRecord, JsonRecord, JdbcRecord, JmsRecord, etc.

Slide 17

Slide 17 text

The Batch abstraction
Batch = { record 1, record 2, ... record n }
Batch.java:
    public class Batch<P> implements Iterable<Record<P>> {
        private List<Record<P>> records;
    }

Slide 18

Slide 18 text

The Job abstraction
    public interface Job extends Callable<JobReport> {
        String getName();
    }
    class BatchJob implements Job { }
• Synchronous execution
    JobReport report = jobExecutor.execute(job);
• Asynchronous execution
    Future<JobReport> report = jobExecutor.submit(job);
• Parallel execution
    jobExecutor.submitAll(job1, job2);
• Scheduled execution
    scheduledExecutorService.schedule(job, 2, MINUTES);
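Because a job is just a Callable, the four execution modes above map directly onto standard JDK executors. The sketch below illustrates that idea with plain `java.util.concurrent` only; it does not use Easy Batch's `JobExecutor`, and the `job(...)` factory with its String "report" is a hypothetical stand-in for a real job and its report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CallableJobSketch {

    /** Hypothetical stand-in for a job: a Callable whose result plays the role of the report. */
    static Callable<String> job(String name) {
        return () -> name + ": completed";
    }

    /** Runs the same kind of job in the four execution modes and collects the reports. */
    static List<String> runAllModes() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<String> reports = new ArrayList<>();
        try {
            reports.add(job("sync").call());                  // synchronous: run in the caller's thread
            reports.add(pool.submit(job("async")).get());     // asynchronous: submit, then wait on the Future
            List<Callable<String>> jobs = List.of(job("p1"), job("p2"));
            for (Future<String> f : pool.invokeAll(jobs)) {   // parallel: both jobs share the pool
                reports.add(f.get());
            }
            reports.add(scheduler                             // scheduled: start after a delay
                    .schedule(job("later"), 10, TimeUnit.MILLISECONDS).get());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            scheduler.shutdown();
        }
        return reports;
    }
}
```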

Slide 19

Slide 19 text

Batch Jobs • Read records in sequence • Process records in pipeline • Write records in batches
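The three steps above can be sketched as a single loop. This is an illustration of the read/process/write shape under simplifying assumptions, not Easy Batch's actual implementation: the reader is a plain `Iterator`, processors are `UnaryOperator`s, the writer is a `Consumer` of lists, and the name `BatchLoopSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

class BatchLoopSketch {

    /**
     * Pulls records one at a time from the reader, passes each through every
     * processor in order, and hands the results to the writer in lists of at
     * most batchSize. Returns the number of records written.
     */
    static <T> int run(Iterator<T> reader, List<UnaryOperator<T>> pipeline,
                       Consumer<List<T>> writer, int batchSize) {
        List<T> batch = new ArrayList<>();
        int written = 0;
        while (reader.hasNext()) {
            T record = reader.next();
            for (UnaryOperator<T> processor : pipeline) {
                record = processor.apply(record);
            }
            batch.add(record);
            if (batch.size() == batchSize) {
                writer.accept(new ArrayList<>(batch));   // copy, so the writer may keep it
                written += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {                          // flush the last, partial batch
            writer.accept(new ArrayList<>(batch));
            written += batch.size();
        }
        return written;
    }
}
```

Reading one record at a time keeps memory use bounded by the batch size rather than by the size of the data source.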

Slide 20

Slide 20 text

Workflow listeners: Job listener

Slide 21

Slide 21 text

Workflow listeners: Batch listener

Slide 22

Slide 22 text

Workflow listeners: Reader/Writer listeners

Slide 23

Slide 23 text

Workflow listeners: Pipeline listener

Slide 24

Slide 24 text

Reading data • Streaming APIs • One record at a time • Hide low-level APIs

Slide 25

Slide 25 text

Filtering data • Filter undesired records • Data cleaning • Filter chain: multiple filters
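A filter chain can be pictured as several rejection rules combined into one. The sketch below uses `java.util.function.Predicate` for this; it is an analogy in plain Java rather than Easy Batch's filter API, and `FilterChainSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterChainSketch {

    /** Keeps only the records that no filter in the chain rejects. */
    static List<String> apply(List<String> records, List<Predicate<String>> rejectIf) {
        // Combine all filters: a record is rejected as soon as any filter matches it.
        Predicate<String> rejected = rejectIf.stream().reduce(r -> false, Predicate::or);
        return records.stream().filter(rejected.negate()).collect(Collectors.toList());
    }
}
```

For example, chaining "starts with #" and "is empty" drops header and blank lines while leaving data lines untouched.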

Slide 26

Slide 26 text

Mapping data to domain objects • POJO-centric development • Abstract data format • Enforce DDD

Slide 27

Slide 27 text

Validating data
• Validate data against the application’s constraints
• Declarative approach: Bean Validation API (JSR 303)
    public class Tweet {
        private int id;
        @NotNull
        private String user;
        @Size(min=0, max=280)
        private String message;
    }
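For comparison, this is roughly what the two annotations on Tweet would look like if written by hand. It is a hand-rolled equivalent for illustration only and does not use the Bean Validation API; the class name `TweetChecks` and the message texts are assumptions. Note that `@Size` treats a null value as valid, which the sketch mirrors.

```java
import java.util.ArrayList;
import java.util.List;

class TweetChecks {

    /** Returns one message per violated constraint; an empty list means the tweet is valid. */
    static List<String> validate(String user, String message) {
        List<String> violations = new ArrayList<>();
        if (user == null) {                                   // mirrors @NotNull
            violations.add("user must not be null");
        }
        if (message != null && message.length() > 280) {      // mirrors @Size(min=0, max=280)
            violations.add("message size must be between 0 and 280");
        }
        return violations;
    }
}
```

The declarative version keeps these rules next to the fields they constrain, so the validation code does not have to be repeated in every job.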

Slide 28

Slide 28 text

Processing pipeline • Define the application’s business logic • Multiple processors • Unix-like pipelines
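The Unix-pipe analogy can be illustrated with plain function composition: each stage does one thing and feeds its output to the next. The sketch uses `java.util.function.Function` rather than Easy Batch's processor interface, and the stage names are hypothetical.

```java
import java.util.function.Function;

class PipelineSketch {

    // Three single-purpose stages, chained like: trim | upper | length
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> UPPER = String::toUpperCase;
    static final Function<String, Integer> LENGTH = String::length;

    /** Output of each stage becomes the input of the next; types may change along the way. */
    static final Function<String, Integer> PIPELINE = TRIM.andThen(UPPER).andThen(LENGTH);
}
```

Keeping stages small makes them easy to test on their own and to recombine in a different order.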

Slide 29

Slide 29 text

Writing data • Hide low-level APIs • Write records in batches • Transaction management for relational databases

Slide 30

Slide 30 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 31

Slide 31 text

Demo 1

Slide 32

Slide 32 text

Demo 2

Slide 33

Slide 33 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 34

Slide 34 text

Parallel processing • Jobs are Callable objects => jobExecutor.submitAll(job1, job2) • ReportMerger API to merge partial reports • Suitable for physical/logical partitioning
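The partition-then-merge idea can be sketched with plain JDK executors. This is a simplified illustration, not the Easy Batch `ReportMerger` API: `PartialReport` is a hypothetical type that only counts records, and the partitioning is a simple split into contiguous slices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedRunSketch {

    /** Hypothetical partial report: how many records one partition processed. */
    static final class PartialReport {
        final int processed;
        PartialReport(int processed) { this.processed = processed; }
    }

    /** Splits records into contiguous slices, runs one job per slice, merges the counts. */
    static int processInParallel(List<String> records, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions); // assumes partitions >= 1
        try {
            List<Callable<PartialReport>> jobs = new ArrayList<>();
            int sliceSize = (records.size() + partitions - 1) / partitions; // ceiling division
            for (int from = 0; from < records.size(); from += sliceSize) {
                List<String> slice = records.subList(from, Math.min(from + sliceSize, records.size()));
                jobs.add(() -> new PartialReport(slice.size()));  // real work would happen here
            }
            int total = 0;
            for (Future<PartialReport> partial : pool.invokeAll(jobs)) {
                total += partial.get().processed;                 // merge step
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```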

Slide 35

Slide 35 text

Fault tolerance
• Retry feature
  • Retryable record reader/processor/writer
  • Custom RetryPolicy + RetryTemplate if needed
• Skip feature
  • Batch scanning in case of write error
  • Skip bad records instead of failing the whole job
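Both features can be sketched in a few lines of plain Java. The code below is one possible reading of "retry" and "batch scanning", not Easy Batch's `RetryPolicy` or `RetryTemplate`: the class and method names are hypothetical, there is no delay between attempts, and the scanning logic assumes that a failed batch write leaves nothing written (as with a rolled-back transaction).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class FaultToleranceSketch {

    /** Retry: calls the action until it succeeds or maxAttempts calls have failed. */
    static <T> T retry(Supplier<T> action, int maxAttempts) {
        RuntimeException last = new IllegalArgumentException("maxAttempts must be >= 1");
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;                       // remember the failure and try again
            }
        }
        throw last;
    }

    /**
     * Batch scanning: try the whole batch first; if that fails, write the records
     * one by one so that only the bad ones are skipped. Returns the skipped records.
     */
    static <T> List<T> writeWithScanning(List<T> batch, Consumer<List<T>> writer) {
        List<T> skipped = new ArrayList<>();
        try {
            writer.accept(batch);
        } catch (RuntimeException batchFailure) {
            for (T record : batch) {
                try {
                    writer.accept(Collections.singletonList(record));
                } catch (RuntimeException recordFailure) {
                    skipped.add(record);        // bad record: skip it, keep the job alive
                }
            }
        }
        return skipped;
    }
}
```

Scanning trades throughput for resilience: it only kicks in after a batch fails, so the common case still writes whole batches.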

Slide 36

Slide 36 text

Real time monitoring
    Job job = new JobBuilder().enableJmx(true).build();

Slide 37

Slide 37 text

Agenda • Introduction • State of the art • Motivations • Easy Batch • Overview • Basic usage • Advanced topics • Wrap-up

Slide 38

Slide 38 text

Wrap-up
The good ones:
• Lightweight, free and open source
• Easy to learn, configure and use
• Flexible & extensible API
• Modular architecture
• Fault tolerance features
• Declarative data validation
• Real-time monitoring
The not so good ones:
• No step concept with flows
• No remote partitioning
• No remote chunking
• Not suitable for big data

Slide 39

Slide 39 text

FAQs • How does Easy Batch compare to Spring Batch? • Why does Easy Batch not persist job state in a database like Spring Batch? • Why does Easy Batch not provide a Step concept like Spring Batch?

Slide 40

Slide 40 text

Final word: be pragmatic!

Slide 41

Slide 41 text

Who is using Easy Batch?
Community feedback:
“I use this framework in production (and love it)” chsFleury / @github
“Try EasyBatch. The simple stupid Batch framework. Try it once and use it forever.” Eddy Bayonne / @stackoverflow
“Loving it so far. Making something I'm working on very simple” zackehh_ / @twitter
“Thanks @easy_batch. You guys rock - especially your use of fluent interfaces in your APIs :-) #cleancode” NorthConcepts / @twitter
“we have successfully used @easy_batch in production at Leroy Merlin and we love it” benensi / @twitter

Slide 42

Slide 42 text

Thank you! https://benas.github.io @b_e_n_a_s https://github.com/j-easy/easy-batch