Batch Processing How-To
Or the “Single Threaded Batch Processing Paradigm”

Stefan Rufer, Netcetera
Matthias Markwalder, SIX Card Solutions

6840


Post on 23-Aug-2020


Page 1: Batch Processing How-To · > Custom batch processing framework (not Spring Batch) > 1 controller builds the jobs 35 workers process the steps of jobs (or as many as you want and your



Speakers

> Stefan Rufer

– Studied business IT at the University of Applied Sciences in Bern

– Senior Software Engineer at Netcetera

– Main interest: Server side application development using JEE

> Matthias Markwalder

– Graduated from ETH Zurich

– Senior Developer + Framework Responsible at SIX Card Solutions

– Main interest: High performance and quality batch processing


Why are we here?

> Let's learn how to bake an omelet.


AGENDA

> What do we do

> Sharing our experience

> Wrap up + Q&A


What do we do

> Credit / debit card transaction processing

> Backoffice batch processing application 24x7x365

> 1.7 million card transactions a day

> Volume will double by end of 2010: be ready…

> Migrated from Forté UDS to JEE

> More agile code base now


How do we do it

> Transactional integrity at any time

> Custom batch processing framework (not Spring Batch)

> 1 controller builds the jobs; 35 workers process the steps of jobs (or as many as you want and your system can take)

> 1 application server (12 cores)

> 1 database server (12 cores, 1.5TB SAN)


Batch Processing Basics

> It's simple, but parallel:

– Read file(s)

– Process a bit

– Write file(s)

> Terminology from Spring Batch


AGENDA

> What do we do

> Sharing our experience

> Wrap up + Q&A


Bake an omelet

> 200g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt

> Stir well, wait 30min

> Stir again

> Put a little butter in a heated pan

> Add 1dl dough

> Bake until slightly brown, flip over, bake again half as long

> Put cheese / marmalade / apfelmus / ... on top, fold

> Enjoy


Jobs run in parallel

Motivation

> Load balancing

Example

> Complete yesterday's reports while doing today's business

How to achieve

> Use a batch scheduling application that controls your entire processing.

> Read / modify categorization of jobs


Load limitations

Motivation

> Load balancing

Example

> Generate 70 reports, but max 20 in parallel

How to achieve

> Number of workers one job can use

> Priorities of the steps of a job
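
The "70 reports, max 20 in parallel" limit can be sketched in plain Java. This is only an illustration, not the framework from the talk: the class `LoadLimitSketch`, the method `runWithLimit` and the 5 ms sleep are hypothetical, and pool threads stand in for the separate worker JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a fixed pool caps how many steps of one job run at once.
class LoadLimitSketch {

    /** Runs stepCount dummy steps on at most maxWorkers threads and returns the peak parallelism observed. */
    static int runWithLimit(int stepCount, int maxWorkers) {
        ExecutorService workers = Executors.newFixedThreadPool(maxWorkers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < stepCount; i++) {
                pending.add(workers.submit(() -> {
                    // Track how many steps are active right now.
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5); // stands in for generating one report
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> step : pending) {
                step.get(); // wait for every step to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("step failed", e);
        } finally {
            workers.shutdown();
        }
        return peak.get();
    }
}
```

Calling `LoadLimitSketch.runWithLimit(70, 20)` queues all 70 steps immediately, but the returned peak never exceeds 20.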


Decouple controller + workers

Motivation

> Scalability

Example

> SETI@home


Step trees, Sequential, Fail on Exception

Motivation

> Avoid structuring steps in code

Example

> Collect data, afterwards write a file.

How to achieve

> Sequential execution

> Fail on exception (rollback entire step)


Step trees, Parallel, Continue on Exception

Motivation

> Minimize work left

Example

> Process 30'000 transactions in 3 steps.

How to achieve

> Parallel execution

> Continue on exception (still rollback entire step)
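
The two exception policies of the step trees can be compared in a small sketch. It models only the "fail" versus "continue" decision. The steps run one after another here, whereas in the talk's setup they would be dispatched to workers. `StepTreeSketch`, `OnError` and `run` are hypothetical names, not the framework's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two policies: stop at the first failed step, or keep going.
class StepTreeSketch {

    enum OnError { FAIL, CONTINUE }

    /**
     * Runs the steps in list order and returns the indexes of the steps that failed.
     * With FAIL the remaining steps are skipped after the first failure;
     * with CONTINUE every step gets its chance, which minimizes the work left for a rerun.
     */
    static List<Integer> run(List<Runnable> steps, OnError policy) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (RuntimeException e) {
                failed.add(i); // a real framework would roll back this step's transaction here
                if (policy == OnError.FAIL) {
                    break;
                }
            }
        }
        return failed;
    }
}
```

With three steps of which the second throws, `FAIL` leaves the third step undone, while `CONTINUE` completes steps one and three and reports only the second as failed.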


Parallelize reading

Motivation

> Speedup

Example

> A file of 200'000 credit card authorisations and transactions has to be read into the database.

How to achieve

> Cut the input file in pieces of 10'000 lines each.

– btw: perl, sort are unbeaten for this...

> Process each piece in a parallel step.
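
The splitting step can be sketched as follows. The talk recommends perl or sort for the real cut, so this in-memory Java version is only a stand-in that shows the piece boundaries. `FileSplitSketch` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cut a list of input lines into pieces of a fixed size.
class FileSplitSketch {

    /** Returns consecutive pieces of at most pieceSize lines; the last piece may be shorter. */
    static <T> List<List<T>> split(List<T> lines, int pieceSize) {
        if (pieceSize <= 0) {
            throw new IllegalArgumentException("pieceSize must be positive");
        }
        List<List<T>> pieces = new ArrayList<>();
        for (int from = 0; from < lines.size(); from += pieceSize) {
            int to = Math.min(from + pieceSize, lines.size());
            // Copy the slice so each piece can be handed to its own step independently.
            pieces.add(new ArrayList<>(lines.subList(from, to)));
        }
        return pieces;
    }
}
```

With a piece size of 10'000, the 200'000-line file from the example yields 20 pieces, each of which can become one parallel step.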


Parallelize processing

Motivation

> Speedup

Example

> Summarize accounting data and store result in database again.

How to achieve

> Group data in chunks of 10'000 and process each chunk in a parallel step.

> Choose grouping criteria carefully:

– No overlapping data areas

– Pass along data that you had to read for the grouping process


Parallelize processing – how to group

Motivation

> Structuring your data in parallelizable chunks

> Load balancing

Example

> Parallelize processing by client as data is distinct by design.

How to achieve

> Group by client

> Group by keys: Ranges or ids

– Ranges (1..5) can grow very large

– Keys (1, 2, 3, 4, 5) can become very many
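
Grouping by client might look like the sketch below. The `Tx` record shape (a client id and an amount in cents) is an assumption made for the example, and `GroupByClientSketch` is a hypothetical name. Each resulting group has no overlap with the others, so each can be handed to its own step.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: one chunk per client, so parallel steps never touch the same data.
class GroupByClientSketch {

    /** Assumed minimal transaction shape for the example. */
    static final class Tx {
        final String clientId;
        final long amountCents;

        Tx(String clientId, long amountCents) {
            this.clientId = clientId;
            this.amountCents = amountCents;
        }
    }

    /** Groups transactions by client id; the TreeMap keeps the chunks in a stable key order. */
    static Map<String, List<Tx>> groupByClient(List<Tx> transactions) {
        return transactions.stream()
                .collect(Collectors.groupingBy(tx -> tx.clientId, TreeMap::new, Collectors.toList()));
    }
}
```

Group sizes follow the data, so one very large client produces one very large chunk. That is the load-balancing concern the slide raises about ranges and keys.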


Parallelize writing

Motivation

> Transactional integrity while writing files.

> Easy recovery while writing files.

Example

> Collect data for the payment file.

How to achieve

> Collect data in parallel and write to a staging table.

> Staging table content very close to target file format.

> In a last step dump entire content of staging table to file.


Different processes write in parallel

Motivation

> Don't lock each other out

Example

> Account information changes while account balance grows.

How to achieve

> No optimistic locking

> Modify deltas on sums and counters

> Keep distinct fields for different parallel jobs

> Be aware of deadlock potential


Avoid insert and update in same table and step

Motivation

> Speedup

> Avoid DB locks

Example

> Summary rows in same table as the raw data.

How to achieve

> Normalize your database.


Let the database work for you

Motivation

> Simple code

> Speedup

Example

> Sorting or joining arrays in memory.

How to achieve

> Code review.

> Book SQL course.


Read long, write short

Motivation

> Keep lock contention on database minimal

> Keep transactional DB overhead minimal

Example

> Fully process the whole batch of 1'000 records before starting to write to DB.

How to achieve

> 1 (one) "writing" database transaction per step.

interface IModifyingStepRunner {

    // Read and compute everything the step needs; no writing transaction is open yet.
    void prepareData();

    // Write the prepared results in the step's single writing transaction.
    void writeData();

}
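
How a framework might drive such a step runner is sketched below. Only the interface comes from the slide. `SumStep`, `ReadLongWriteShortSketch.runStep` and the string log that stands in for begin / commit / rollback are assumptions made for illustration.

```java
import java.util.List;

interface IModifyingStepRunner {
    void prepareData();
    void writeData();
}

// Hypothetical step: sums amounts in the long read phase, writes one row in the short write phase.
class SumStep implements IModifyingStepRunner {
    private final List<Long> input;
    private final List<String> table; // stands in for a database table
    private long total;

    SumStep(List<Long> input, List<String> table) {
        this.input = input;
        this.table = table;
    }

    @Override
    public void prepareData() {
        total = 0;
        for (long amount : input) {
            total += amount;
        }
    }

    @Override
    public void writeData() {
        table.add("total=" + total);
    }
}

// Hypothetical driver: the writing transaction spans writeData() only.
class ReadLongWriteShortSketch {

    static void runStep(IModifyingStepRunner step, List<String> txLog) {
        step.prepareData();          // read long: nothing is locked for writing yet
        txLog.add("begin");          // write short: the transaction opens as late as possible
        try {
            step.writeData();
        } catch (RuntimeException | Error failure) {
            txLog.add("rollback");
            throw failure;
        }
        txLog.add("commit");
    }
}
```

The point of the split is that a slow computation in `prepareData()` never extends the time during which rows are locked.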


This omelet did not taste like grandma's!

> Even if you follow the recipe, there are hidden corners

> Let's have a look at some pitfalls


Don't forget to catch Error

Motivation

> Application integrity delegated to DB

Example

> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.

How to fix

try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    // Throwable, not just Exception: an Error must trigger the rollback too.
    transactionManager.rollback(status);
    throw err;
}
transactionManager.commit(status);
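
A self-contained variant of the same pattern is sketched below so the effect can be tried without a database. `FakeTx` is a hypothetical stand-in for the transaction manager and only records which call was made; `CatchThrowableSketch.inTransaction` is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: roll back on any Throwable, including Errors such as OutOfMemoryError.
class CatchThrowableSketch {

    /** Minimal stand-in for a transaction manager; it only records what happened. */
    static final class FakeTx {
        final List<String> events = new ArrayList<>();

        void commit() {
            events.add("commit");
        }

        void rollback() {
            events.add("rollback");
        }
    }

    static <T> T inTransaction(FakeTx tx, Supplier<T> action) {
        T result;
        try {
            result = action.get();
        } catch (Throwable err) {
            // Catching only Exception would skip this branch for an Error.
            tx.rollback();
            throw err; // compiles since Java 7: only unchecked throwables can reach here
        }
        tx.commit();
        return result;
    }
}
```

If the action throws `new OutOfMemoryError("simulated")`, `events` ends up as `[rollback]` and the error still propagates to the caller.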


Use BufferedReader / BufferedWriter

Motivation

> Speedup (file reading time cut in half)

Example

> Forgot to use BufferedReader in file reading framework.

How to fix

> Code review.

> Profile if performance "feels not right".
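
For reference, wrapping the underlying reader is a one-line change. The sketch below counts lines from any `Reader`. `BufferedReadSketch` and `countLines` are hypothetical names; in a real file reader the source would typically be a `FileReader` or an `InputStreamReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch: always read through a BufferedReader instead of the raw Reader.
class BufferedReadSketch {

    /** Counts the lines of the source; the buffer avoids one underlying read per character. */
    static int countLines(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            int count = 0;
            while (in.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```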


Use 1 thread only

Motivation

> Simplicity for the programmer

> Safety (no concurrent access)

Example

> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...

How to achieve

> Configure framework to use one JVM per worker.


Cache wisely

Motivation

> Speedup

> Limit memory use

Example

> Tax rates do not change during a processing day, cache them long.

> Customer data will be reused if processing transactions of the same customer, cache it short.

How to achieve

> Cache per worker

> Cache lifetimes: Worker / step / on demand
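
A per-worker cache with two of the lifetimes could be sketched like this. `WorkerCacheSketch`, `getLong`, `getShort` and `endOfStep` are hypothetical names. Plain `HashMap`s are enough because, under the one-thread-per-worker rule, no other thread ever touches the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one cache instance per worker, with worker-long and step-long entries.
class WorkerCacheSketch {
    private final Map<String, Object> workerScope = new HashMap<>(); // lives as long as the worker
    private final Map<String, Object> stepScope = new HashMap<>();   // cleared after every step

    /** Cache long: loaded once per worker, e.g. data that is stable for the processing day. */
    Object getLong(String key, Function<String, Object> loader) {
        return workerScope.computeIfAbsent(key, loader);
    }

    /** Cache short: reused within one step only, which limits memory use. */
    Object getShort(String key, Function<String, Object> loader) {
        return stepScope.computeIfAbsent(key, loader);
    }

    /** To be called by the framework when a step ends. */
    void endOfStep() {
        stepScope.clear();
    }
}
```

A customer looked up twice within one step hits the loader once; after `endOfStep()` the next lookup loads again, while worker-scoped entries survive.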


Support JDBC batch operations

Motivation

> Speedup

Example

List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);

How to achieve

> Enhance your database layer with a built-in JDBC batch facility.

> Execute batch after 1000 items added.

> Automatically re-run failed batch using single JDBC statements
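
The flush-and-retry logic can be sketched independently of JDBC. In a real database layer the batch writer would presumably wrap `PreparedStatement.addBatch()` / `executeBatch()` and the single writer `executeUpdate()`. Here both are plain callbacks, and `BatchBufferSketch` is a hypothetical name. The sketch also assumes that a failed batch left nothing behind (for example because it was rolled back), so re-running every item singly is safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer items, write them as a batch, fall back to single writes on failure.
class BatchBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchWriter; // stands in for a JDBC batch execution
    private final Consumer<T> singleWriter;      // stands in for one single JDBC statement
    private final List<T> pending = new ArrayList<>();

    BatchBufferSketch(int batchSize, Consumer<List<T>> batchWriter, Consumer<T> singleWriter) {
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
        this.singleWriter = singleWriter;
    }

    /** Adds one item and flushes automatically once batchSize items are pending. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes all pending items; call once more at the end of the step for the remainder. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            batchWriter.accept(batch);
        } catch (RuntimeException batchFailure) {
            // Re-run item by item so the one bad row can be identified.
            for (T item : batch) {
                singleWriter.accept(item);
            }
        }
    }
}
```

With a batch size of 1000, as on the slide, the caller keeps adding bookings and never deals with batch boundaries itself.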


Structured patching

Motivation

> Risk management

> Stay agile in production

Example

> Bug found, fixed and unit tested. Deploy to production asap.

How to achieve

> Eclipse-wizard to create patch (all files involved to fix a bug)

> Patch-script that applies .class file / SQL script / whatever...


Never, ever, update primary keys

Motivation

> Good database design

> Speedup

Example

> Homemade library always wrote entire row to database.

How to fix

> Only write changed fields (dirty flags).

> Make primary keys immutable on your objects.
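
Both fixes can be combined in one small sketch. The table name `booking`, the column names and the class `BookingRowSketch` are made up for the example. The primary key is a `final` field without a setter, and the generated UPDATE mentions only the columns whose setters actually changed a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: immutable primary key plus dirty flags per column.
class BookingRowSketch {
    private final long id; // primary key: final, no setter, never part of the SET clause
    private String status;
    private long amountCents;
    private final List<String> dirtyColumns = new ArrayList<>();

    BookingRowSketch(long id, String status, long amountCents) {
        this.id = id;
        this.status = status;
        this.amountCents = amountCents;
    }

    long getId() {
        return id;
    }

    void setStatus(String newStatus) {
        if (!Objects.equals(status, newStatus)) {
            status = newStatus;
            markDirty("status");
        }
    }

    void setAmountCents(long newAmountCents) {
        if (amountCents != newAmountCents) {
            amountCents = newAmountCents;
            markDirty("amount_cents");
        }
    }

    private void markDirty(String column) {
        if (!dirtyColumns.contains(column)) {
            dirtyColumns.add(column);
        }
    }

    /** Builds an UPDATE for the changed columns only; returns an empty string if nothing changed. */
    String updateSql() {
        if (dirtyColumns.isEmpty()) {
            return "";
        }
        return "UPDATE booking SET " + String.join(" = ?, ", dirtyColumns) + " = ? WHERE id = ?";
    }
}
```

Setting a field to the value it already has leaves the row clean, so no statement is issued for it at all.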


AGENDA

> What do we do

> Sharing our experience

> Wrap up + Q&A


Future

> Scalability is an issue with a single database server.

– Partitioning options used, but not to the end.

– Will Moore's law save us again?

> Processing double the volume still to be proven...


If you remember just three things...

Java batch processing works and is cool :-)

Trade-offs:

> Do not stock the work, start.

> Single threaded, many JVMs.

> Designing for scalability, stability needs experts.

http://www.google.ch/search?q=how+to+flip+an+omelet


Stefan Rufer [email protected]

Netcetera AG www.netcetera.ch

Matthias Markwalder [email protected]

SIX Card Solutions www.six-group.com


Links / References

> http://en.wikipedia.org/wiki/Batch_processing

> http://static.springframework.org/spring-batch/

> http://www.bmc.com/products/offering/control-m.html

> http://www.javaspecialists.eu/

And to really learn how to bake fine omelets, buy a book:

> http://de.wikipedia.org/wiki/Marianne_Kaltenbach

> http://www.oreilly.de/catalog/geeksckbkger/


Other batch processing frameworks (public only)

> http://www.bmap4j.org/

> http://freshmeat.net/projects/jppf

> http://hadoop.apache.org/