
Post on 23-Aug-2020


Batch Processing How-To

Or: “The Single Threaded Batch Processing Paradigm”

Stefan Rufer, Netcetera

Matthias Markwalder, SIX Card Solutions

6840


Speakers

> Stefan Rufer

– Studied business IT at the University of Applied Sciences in Bern

– Senior Software Engineer at Netcetera

– Main interest: Server-side application development using JEE

> Matthias Markwalder

– Graduated from ETH Zurich

– Senior Developer + Framework Responsible at SIX Card Solutions

– Main interest: High performance and quality batch processing

Why are we here?

> Let's learn how to bake an omelet.

AGENDA

> What do we do

> Sharing our experience

> Wrap up + Q&A

What do we do

> Credit / debit card transaction processing

> Back-office batch processing application, 24x7x365

> 1.7 million card transactions a day

> Volume will double by end of 2010: be ready…

> Migrated from Forté UDS to JEE

> More agile code base now

How do we do it

> Transactional integrity at any time

> Custom batch processing framework (not Spring Batch)

> 1 controller builds the jobs, 35 workers process the steps of jobs (or as many as you want and your system can take)

> 1 application server (12 cores)

> 1 database server (12 cores, 1.5 TB SAN)

Batch Processing Basics

> It's simple, but parallel:

– Read file(s)

– Process a bit

– Write file(s)

> Terminology from Spring Batch


Bake an omelet

> 200 g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt

> Stir well, wait 30 min

> Stir again

> Put a little butter in a heated pan

> Add 1 dl dough

> Bake until slightly brown, flip over, bake again half as long

> Put cheese / marmalade / apple sauce / ... on top, fold

> Enjoy

Jobs run in parallel

Motivation

> Load balancing

Example

> Complete yesterday's reports while doing today's business

How to achieve

> Use a batch scheduling application that controls your entire processing.

> Read / modify categorization of jobs

Load limitations

Motivation

> Load balancing

Example

> Generate 70 reports, but max 20 in parallel

How to achieve

> Number of workers one job can use

> Priorities of the steps of a job
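The "70 reports, max 20 in parallel" limit can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the pool threads stand in for the framework's workers, which the talk runs as separate JVMs rather than as threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class LoadLimitSketch {

    /** Runs `steps` report steps with at most `maxWorkers` at once; returns the observed peak. */
    static int runReports(int steps, int maxWorkers) {
        // The pool size is the "number of workers one job can use".
        ExecutorService workers = Executors.newFixedThreadPool(maxWorkers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < steps; i++) {
                pending.add(workers.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for generating one report
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> step : pending) {
                step.get(); // wait for the step and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("report step failed", e);
        } finally {
            workers.shutdown();
        }
        return peak.get();
    }
}
```

Calling `LoadLimitSketch.runReports(70, 20)` queues all 70 steps at once, but the pool never runs more than 20 of them concurrently; the remaining steps wait for a free worker.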

Decouple controller + workers

Motivation

> Scalability

Example

> SETI@home

Step trees, Sequential, Fail on Exception

Motivation

> Avoid structuring steps in code

Example

> Collect data, afterwards write a file.

How to achieve

> Sequential execution

> Fail on exception (roll back entire step)

Step trees, Parallel, Continue on Exception

Motivation

> Minimize work left

Example

> Process 30'000 transactions in 3 steps.

How to achieve

> Parallel execution

> Continue on exception (still roll back entire step)

Parallelize reading

Motivation

> Speedup

Example

> A file of 200'000 credit card authorisations and transactions has to be read into the database.

How to achieve

> Cut the input file into pieces of 10'000 lines each.

– btw: perl, sort are unbeaten for this...

> Process each piece in a parallel step.
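To make the cutting step concrete, here is a minimal Java sketch that splits a list of lines into pieces of a fixed size. The class name `FileChunker` is hypothetical and the lines are assumed to be in memory already; the slide itself recommends perl or sort for the real job, so treat this purely as an illustration of the chunk boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class FileChunker {

    /** Splits `lines` into consecutive pieces of at most `pieceSize` lines each. */
    static List<List<String>> cut(List<String> lines, int pieceSize) {
        if (pieceSize <= 0) {
            throw new IllegalArgumentException("pieceSize must be positive");
        }
        List<List<String>> pieces = new ArrayList<>();
        for (int from = 0; from < lines.size(); from += pieceSize) {
            int to = Math.min(from + pieceSize, lines.size());
            // Copy the slice so each piece can be handed to its own step.
            pieces.add(new ArrayList<>(lines.subList(from, to)));
        }
        return pieces;
    }
}
```

With the numbers from the slide, a file of 200'000 lines cut with a piece size of 10'000 yields 20 pieces, each of which can become one parallel step.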

Parallelize processing

Motivation

> Speedup

Example

> Summarize accounting data and store result in database again.

How to achieve

> Group data in chunks of 10'000 and process each chunk in a parallel step.

> Choose grouping criteria carefully:

– No overlapping data areas

– Pass along data that you had to read for the grouping process

Parallelize processing – how to group

Motivation

> Structuring your data in parallelizable chunks

> Load balancing

Example

> Parallelize processing by client, as data is distinct by design.

How to achieve

> Group by client

> Group by keys: ranges or ids

– Ranges (1..5) can grow very large

– Keys (1, 2, 3, 4, 5) can become very many
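Grouping by key ranges could look roughly like the sketch below. It assumes numeric, reasonably dense ids; `KeyRanges` and `Range` are hypothetical names, not part of the framework described in the talk. The ranges are inclusive and never overlap, which matches the "no overlapping data areas" rule.

```java
import java.util.ArrayList;
import java.util.List;

class KeyRanges {

    /** Inclusive id range [from, to] handled by one parallel step. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Splits [minId, maxId] into non-overlapping ranges of at most `size` ids. */
    static List<Range> split(long minId, long maxId, long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += size) {
            ranges.add(new Range(from, Math.min(from + size - 1, maxId)));
        }
        return ranges;
    }
}
```

A range only bounds the ids, not the number of rows actually present, which is the "ranges can grow very large" caveat: if ids are sparse or skewed, explicit key lists give more even chunks at the cost of longer step parameters.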

Parallelize writing

Motivation

> Transactional integrity while writing files.

> Easy recovery while writing files.

Example

> Collect data for the payment file.

How to achieve

> Collect data in parallel and write to a staging table.

> Staging table content is very close to the target file format.

> In a last step, dump the entire content of the staging table to file.

Different processes write in parallel

Motivation

> Don't lock each other out

Example

> Account information changes while account balance grows.

How to achieve

> No optimistic locking

> Modify deltas on sums and counters

> Keep distinct fields for different parallel jobs

> Be aware of deadlock potential

Avoid insert and update in same table and step

Motivation

> Speedup

> Avoid DB locks

Example

> Summary rows in the same table as the raw data.

How to achieve

> Normalize your database.

Let the database work for you

Motivation

> Simple code

> Speedup

Example

> Sorting or joining arrays in memory.

How to achieve

> Code review.

> Book an SQL course.

Read long, write short

Motivation

> Keep lock contention on database minimal

> Keep transactional DB overhead minimal

Example

> Fully process the whole batch of 1'000 records before starting to write to DB.

How to achieve

> 1 (one) "writing" database transaction per step.

interface IModifyingStepRunner {

void prepareData();

void writeData();

}
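A possible way to use this interface is sketched below. Only `IModifyingStepRunner` and its two methods come from the slide; `SummaryStep`, `StepExecutor` and the `log` list (a stand-in for the database and transaction manager) are assumptions made for illustration.

```java
import java.util.List;

interface IModifyingStepRunner { // as shown on the slide
    void prepareData();
    void writeData();
}

/** Hypothetical step: computes a total first, then writes one result. */
class SummaryStep implements IModifyingStepRunner {
    private final List<Long> amounts;
    private final List<String> log; // stand-in for the database
    private long total;

    SummaryStep(List<Long> amounts, List<String> log) {
        this.amounts = amounts;
        this.log = log;
    }

    @Override
    public void prepareData() { // read long: no writing transaction is open
        for (long amount : amounts) {
            total += amount;
        }
        log.add("prepared");
    }

    @Override
    public void writeData() { // write short: only the finished result
        log.add("write total=" + total);
    }
}

/** Hypothetical executor: exactly one writing transaction per step. */
class StepExecutor {
    static void run(IModifyingStepRunner step, List<String> log) {
        step.prepareData();
        log.add("begin");
        try {
            step.writeData();
            log.add("commit");
        } catch (RuntimeException | Error e) {
            log.add("rollback");
            throw e;
        }
    }
}
```

The point of the split is that all reading and computing happens before "begin", so the writing transaction stays open only for the time the writes themselves take.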

This omelet did not taste like grandma's!

> Even if you follow the recipe, there are hidden corners

> Let's have a look at some pitfalls

Don't forget to catch Error

Motivation

> Application integrity delegated to DB

Example

> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.

How to fix

try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    // Catch Throwable, not only Exception, so that an Error also triggers the rollback.
    transactionManager.rollback(status);
    throw err;
}
transactionManager.commit(status);

Use BufferedReader / BufferedWriter

Motivation

> Speedup (file reading time cut in half)

Example

> Forgot to use BufferedReader in file reading framework.

How to fix

> Code review.

> Profile if performance "feels not right".
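For reference, wrapping the underlying reader is a one-line change. The sketch below uses a hypothetical `LineCounter` class and simply counts lines; the speedup figure on the slide is the authors' measurement and is not reproduced here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class LineCounter {

    /** Counts the lines of `source`, reading through a BufferedReader. */
    static long countLines(Reader source) {
        // BufferedReader fetches large blocks from the underlying reader
        // instead of hitting it for every single read call.
        try (BufferedReader reader = new BufferedReader(source)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a file reading framework the `source` would typically be a `FileReader` or an `InputStreamReader` over the input file; the same wrapping applies to `BufferedWriter` on the output side.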

Use 1 thread only

Motivation

> Simplicity for the programmer

> Safety (no concurrent access)

Example

> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...

How to achieve

> Configure framework to use one JVM per worker.

Cache wisely

Motivation

> Speedup

> Limit memory use

Example

> Tax rates do not change during a processing day: cache them long.

> Customer data will be reused if processing a transaction of the same customer: cache it short.

How to achieve

> Cache per worker

> Cache lifetimes: worker / step / on demand
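A per-worker cache with two of the lifetimes named above might be sketched as follows. `WorkerCache`, `Lifetime` and `endOfStep` are hypothetical names; the "on demand" lifetime is left out, and the class is deliberately not thread-safe because each worker is assumed to run single threaded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class WorkerCache {

    enum Lifetime { WORKER, STEP }

    private final Map<String, Object> workerScoped = new HashMap<>();
    private final Map<String, Object> stepScoped = new HashMap<>();

    /** Returns the cached value for `key`, calling `loader` only on a miss. */
    @SuppressWarnings("unchecked")
    <T> T get(Lifetime lifetime, String key, Function<String, T> loader) {
        Map<String, Object> scope =
                lifetime == Lifetime.WORKER ? workerScoped : stepScoped;
        return (T) scope.computeIfAbsent(key, loader);
    }

    /** Called by the (hypothetical) framework when a step ends. */
    void endOfStep() {
        stepScoped.clear(); // short-lived entries go, worker-scoped entries stay
    }
}
```

Tax rates would be fetched with `Lifetime.WORKER` and survive across steps, while customer data would use `Lifetime.STEP` and be dropped at the end of each step, which bounds memory use.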

Support JDBC batch operations

Motivation

> Speedup

Example

List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);

How to achieve

> Enhance your database layer with a built-in JDBC batch facility.

> Execute batch after 1000 items added.

> Automatically re-run failed batch using single JDBC statements
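What such a batch facility does underneath can be sketched with plain JDBC. The table and column names, the `Booking` fields and the `BookingBatchDao` class are assumptions for illustration, not the authors' database layer. The sketch covers the "execute after 1000 items" rule only; the automatic re-run of a failed batch with single statements is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BookingBatchDao {

    static final int BATCH_SIZE = 1000;

    // Hypothetical table and columns.
    static final String SQL = "UPDATE booking SET amount = ? WHERE id = ?";

    /** Hypothetical value object with just enough fields for the example. */
    static final class Booking {
        final long id;
        final long amount;

        Booking(long id, long amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    /** Sends the updates in JDBC batches of at most BATCH_SIZE statements. */
    static void update(Connection con, List<Booking> bookings) {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            int pending = 0;
            for (Booking booking : bookings) {
                ps.setLong(1, booking.amount);
                ps.setLong(2, booking.id);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // one round trip for 1000 statements
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
        } catch (SQLException e) {
            throw new IllegalStateException("batch update failed", e);
        }
    }
}
```

Committing or rolling back is left to the caller, so the batch still takes part in the single writing transaction of the step.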

Structured patching

Motivation

> Risk management

> Stay agile in production

Example

> Bug found, fixed and unit tested. Deploy to production asap.

How to achieve

> Eclipse wizard to create patch (all files involved to fix a bug)

> Patch script that applies .class file / SQL script / whatever...

Never, ever, update primary keys

Motivation

> Good database design

> Speedup

Example

> Homemade library always wrote entire row to database.

How to fix

> Only write changed fields (dirty flags).

> Make primary keys immutable on your objects.
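Both fixes fit into one small class. The sketch below is an assumption about how dirty flags could be kept, using a hypothetical `Account` object and `account` table; it is not the authors' homemade library.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

class Account {

    private final long id; // primary key: final and without a setter
    private String owner;
    private long balance;

    // Names of the columns changed since loading (the dirty flags).
    private final Set<String> dirty = new LinkedHashSet<>();

    Account(long id, String owner, long balance) {
        this.id = id;
        this.owner = owner;
        this.balance = balance;
    }

    long getId() {
        return id;
    }

    void setOwner(String owner) {
        if (!Objects.equals(this.owner, owner)) {
            this.owner = owner;
            dirty.add("owner");
        }
    }

    void setBalance(long balance) {
        if (this.balance != balance) {
            this.balance = balance;
            dirty.add("balance");
        }
    }

    /** Builds an UPDATE for the changed columns only, or null if nothing changed. */
    String updateSql() {
        if (dirty.isEmpty()) {
            return null;
        }
        StringBuilder sql = new StringBuilder("UPDATE account SET ");
        String separator = "";
        for (String column : dirty) {
            sql.append(separator).append(column).append(" = ?");
            separator = ", ";
        }
        return sql.append(" WHERE id = ?").toString();
    }
}
```

Because `id` never appears in the SET clause, the primary key cannot be rewritten by accident, and untouched objects produce no statement at all.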


Future

> Scalability is an issue with a single database server.

– Partitioning options used, but not to the end.

– Will Moore's law save us again?

> Processing double the volume still to be proven...

If you remember just three things...

Java batch processing works and is cool :-)

Trade-offs:

> Do not stock the work, start.

> Single threaded, many JVMs.

> Designing for scalability, stability needs experts.

http://www.google.ch/search?q=how+to+flip+an+omelet

Stefan Rufer stefan.rufer@netcetera.ch

Netcetera AG www.netcetera.ch

Matthias Markwalder matthias.markwalder@six-group.com

SIX Card Solutions www.six-group.com

Links / References

> http://en.wikipedia.org/wiki/Batch_processing

> http://static.springframework.org/spring-batch/

> http://www.bmc.com/products/offering/control-m.html

> http://www.javaspecialists.eu/

And to really learn how to bake fine omelets, buy a book:

> http://de.wikipedia.org/wiki/Marianne_Kaltenbach

> http://www.oreilly.de/catalog/geeksckbkger/

Other batch processing frameworks (public only)

> http://www.bmap4j.org/

> http://freshmeat.net/projects/jppf

> http://hadoop.apache.org/