Batch Processing How-To
Or: “The Single Threaded Batch Processing Paradigm”
Stefan Rufer, Netcetera
Matthias Markwalder, SIX Card Solutions
6840
2
Speakers
> Stefan Rufer
– Studied business IT at the University of Applied Sciences in Bern
– Senior Software Engineer at Netcetera
– Main interest: Server side application development using JEE
> Matthias Markwalder
– Graduated from ETH Zurich
– Senior Developer + Framework Responsible at SIX Card Solutions
– Main interest: High performance and quality batch processing
3
Why are we here?
> Let's learn how to bake an omelet.
4
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
5
What do we do
> Credit / debit card transaction processing
> Back-office batch processing application, 24x7x365
> 1.7 million card transactions a day
> Volume will double by end of 2010: be ready…
> Migrated from Forté UDS to JEE
> More agile code base now
6
How do we do it
> Transactional integrity at any time
> Custom batch processing framework (not Spring Batch)
> 1 controller builds the jobs
> 35 workers process the steps of jobs (or as many as you want and your system can take)
> 1 application server (12 cores)
> 1 database server (12 cores, 1.5TB SAN)
7
Batch Processing Basics
> It's simple, but parallel:
– Read file(s)
– Process a bit
– Write file(s)
> Terminology from Spring Batch
8
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
9
Bake an omelet
> 200 g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt
> Stir well, wait 30 min
> Stir again
> Put a little butter in a heated pan
> Add 1 dl dough
> Bake until slightly brown, flip over, bake again half as long
> Put cheese / marmalade / applesauce / ... on top, fold
> Enjoy
10
Jobs run in parallel
Motivation
> Load balancing
Example
> Complete yesterday's reports while doing today's business
How to achieve
> Use a batch scheduling application that controls your entire processing.
> Read / modify categorization of jobs
12
Load limitations
Motivation
> Load balancing
Example
> Generate 70 reports, but max 20 in parallel
How to achieve
> Number of workers one job can use
> Priorities of the steps of a job
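A minimal sketch of how a controller could combine a per-job worker limit with step priorities. The class and method names (StepDispatcher, limitJob, dispatch, finished) are hypothetical and not taken from the framework described in the talk. The sketch is not thread-safe and assumes a single controller thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical controller-side dispatcher: caps the workers one job may use. */
class StepDispatcher {
    /** A pending step; a higher priority value is dispatched first. */
    static final class Step {
        final String jobId;
        final String name;
        final int priority;

        Step(String jobId, String name, int priority) {
            this.jobId = jobId;
            this.name = name;
            this.priority = priority;
        }
    }

    private final Map<String, Integer> maxWorkersPerJob = new HashMap<>();
    private final Map<String, Integer> runningPerJob = new HashMap<>();
    private final List<Step> pending = new ArrayList<>();

    void limitJob(String jobId, int maxWorkers) {
        maxWorkersPerJob.put(jobId, maxWorkers);
    }

    void submit(Step step) {
        pending.add(step);
    }

    /** Returns the next step for a free worker, or null if nothing may run right now. */
    Step dispatch() {
        Step best = null;
        for (Step candidate : pending) {
            int running = runningPerJob.getOrDefault(candidate.jobId, 0);
            int limit = maxWorkersPerJob.getOrDefault(candidate.jobId, Integer.MAX_VALUE);
            // Skip steps whose job already uses all the workers it is allowed.
            if (running < limit && (best == null || candidate.priority > best.priority)) {
                best = candidate;
            }
        }
        if (best != null) {
            pending.remove(best);
            runningPerJob.merge(best.jobId, 1, Integer::sum);
        }
        return best;
    }

    /** Called when a worker reports that a step is done, freeing one slot of its job. */
    void finished(Step step) {
        runningPerJob.merge(step.jobId, -1, Integer::sum);
    }
}
```

For the example on the slide, a call like limitJob("reports", 20) would keep the 70 report steps from occupying more than 20 workers, leaving the rest free for other jobs.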
13
Decouple controller + workers
Motivation
> Scalability
Example
> SETI@home
14
Step trees, Sequential, Fail on Exception
Motivation
> Avoid structuring steps in code
Example
> Collect data, afterwards write a file.
How to achieve
> Sequential execution
> Fail on exception (roll back entire step)
15
Step trees, Parallel, Continue on Exception
Motivation
> Minimize work left
Example
> Process 30'000 transactions in 3 steps.
How to achieve
> Parallel execution
> Continue on exception (still roll back entire step)
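The two exception policies of a step tree can be sketched as follows. StepGroup and its methods are hypothetical names, and each Step is assumed to roll back its own transaction before it throws. For brevity the sketch runs the steps in a plain loop; in the setup described in the talk, the steps of a parallel group would be handed to separate workers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical step group illustrating the two exception policies. */
class StepGroup {
    /** One unit of work; assumed to roll back its own transaction when it throws. */
    interface Step {
        void run() throws Exception;
    }

    /** Fail on exception: the first failure stops the group, later steps never start. */
    static void runFailOnException(List<Step> steps) throws Exception {
        for (Step step : steps) {
            step.run();
        }
    }

    /**
     * Continue on exception: every step gets its chance.
     * Returns the indexes of the failed steps so only those need a rerun.
     */
    static List<Integer> runContinueOnException(List<Step> steps) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (Exception e) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```

With "continue on exception", a failure in one of the 3 steps leaves only that step's share of the 30'000 transactions to be reprocessed.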
16
Parallelize reading
Motivation
> Speedup
Example
> A file of 200'000 credit card authorisations and transactions has to be read into the database.
How to achieve
> Cut the input file into pieces of 10'000 lines each.
– btw: perl, sort are unbeaten for this...
> Process each piece in a parallel step.
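The slide recommends perl or sort for cutting the file. For completeness, a pure Java sketch of the same idea is shown here. FileSplitter and the naming of the pieces (input name plus a running index) are assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a text file into pieces of at most N lines. */
class FileSplitter {
    static List<Path> split(Path input, Path outDir, int linesPerPiece) throws IOException {
        if (linesPerPiece <= 0) {
            throw new IllegalArgumentException("linesPerPiece must be positive");
        }
        List<Path> pieces = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            try {
                int linesInPiece = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    // Start a new piece for the first line and whenever the current one is full.
                    if (out == null || linesInPiece == linesPerPiece) {
                        if (out != null) {
                            out.close();
                        }
                        Path piece = outDir.resolve(input.getFileName() + "." + pieces.size());
                        pieces.add(piece);
                        out = Files.newBufferedWriter(piece, StandardCharsets.UTF_8);
                        linesInPiece = 0;
                    }
                    out.write(line);
                    out.newLine();
                    linesInPiece++;
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return pieces;
    }
}
```

Each returned path can then become the input of one parallel step.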
17
Parallelize processing
Motivation
> Speedup
Example
> Summarize accounting data and store result in database again.
How to achieve
> Group data in chunks of 10'000 and process each chunk in a parallel step.
> Choose grouping criteria carefully:
– No overlapping data areas
– Pass along data that you had to read for the grouping process
18
Parallelize processing – how to group
Motivation
> Structuring your data in parallelizable chunks
> Load balancing
Example
> Parallelize processing by client as data is distinct by design.
How to achieve
> Group by client
> Group by keys: Ranges or ids
– Ranges (1..5) can grow very large
– Keys (1, 2, 3, 4, 5) can become very many
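Grouping by explicit keys could look like the sketch below. KeyChunker is a hypothetical helper; it only shows that consecutive slices of a key list never overlap, which is the property the grouping criteria must guarantee.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts a list of keys into non-overlapping chunks. */
class KeyChunker {
    static <T> List<List<T>> chunk(List<T> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, keys.size());
            // Copy the slice so each step owns its own key list.
            chunks.add(new ArrayList<>(keys.subList(from, to)));
        }
        return chunks;
    }
}
```

Each chunk would be passed to one parallel step. A range-based variant would pass only the first and last key of each chunk instead of the full list, trading a short parameter for the risk of a very large range.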
19
Parallelize writing
Motivation
> Transactional integrity while writing files.
> Easy recovery while writing files.
Example
> Collect data for the payment file.
How to achieve
> Collect data in parallel and write to a staging table.
> Staging table content very close to target file format.
> In a last step, dump the entire content of the staging table to file.
20
Different processes write in parallel
Motivation
> Don't lock each other out
Example
> Account information changes while account balance grows.
How to achieve
> No optimistic locking
> Modify deltas on sums and counters
> Keep distinct fields for different parallel jobs
> Be aware of deadlock potential
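"Modify deltas on sums and counters" can be read as letting the database add a delta instead of writing back a value computed in Java. The sketch below assumes a table account with columns id, balance and booking_count; these names, like BalanceDao itself, are invented for illustration. The JDBC calls are standard java.sql API.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO showing a delta update on a sum and a counter. */
class BalanceDao {
    // Only the sum and the counter are touched, and only relative to their current value.
    static final String ADD_TO_BALANCE_SQL =
            "UPDATE account SET balance = balance + ?, booking_count = booking_count + 1 WHERE id = ?";

    /** Returns the number of updated rows (1 if the account exists). */
    static int addToBalance(Connection con, long accountId, BigDecimal delta) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(ADD_TO_BALANCE_SQL)) {
            ps.setBigDecimal(1, delta);
            ps.setLong(2, accountId);
            return ps.executeUpdate();
        }
    }
}
```

Because the statement never sets the balance to an absolute value, two jobs that add deltas concurrently cannot overwrite each other's result, and a job that changes other columns of the same row leaves the balance alone.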
21
Avoid insert and update in same table and step
Motivation
> Speedup
> Avoid DB locks
Example
> Summary rows in same table as the raw data.
How to achieve
> Normalize your database.
22
Let the database work for you
Motivation
> Simple code
> Speedup
Example
> Sorting or joining arrays in memory.
How to achieve
> Code review.
> Book SQL course.
23
Read long, write short
Motivation
> Keep lock contention on database minimal
> Keep transactional DB overhead minimal
Example
> Fully process the whole batch of 1'000 records before starting to write to DB.
How to achieve
> 1 (one) "writing" database transaction per step.
interface IModifyingStepRunner {
    // Read long: load and process all data, no database writes
    void prepareData();
    // Write short: persist the prepared results in one transaction
    void writeData();
}
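How a framework might drive such a step runner is sketched below. Only IModifyingStepRunner comes from the slide; StepExecutor and RecordingTransaction are hypothetical, and the transaction class merely records calls so the order of events is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Interface as shown on the slide. */
interface IModifyingStepRunner {
    void prepareData();
    void writeData();
}

/** Hypothetical stand-in for a real transaction API; it only records what happened. */
class RecordingTransaction {
    final List<String> log = new ArrayList<>();

    void begin() { log.add("begin"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

/** Hypothetical driver: exactly one writing transaction per step. */
class StepExecutor {
    static void execute(IModifyingStepRunner runner, RecordingTransaction tx) {
        // Read long: all reading and computing happens before the writing transaction opens.
        runner.prepareData();
        tx.begin();
        try {
            // Write short: only the prepared results are written inside the transaction.
            runner.writeData();
        } catch (Throwable t) {
            // Roll back on Errors too, so a half-written step is never committed.
            tx.rollback();
            throw t;
        }
        tx.commit();
    }
}
```

Splitting the runner into two methods lets the framework, not the step author, decide where the transaction starts and ends.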
24
This omelet did not taste like grandma's!
> Even if you follow the recipe, there are hidden corners
> Let's have a look at some pitfalls
25
Don't forget to catch Error
Motivation
> Application integrity delegated to DB
Example
> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.
How to fix
try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    transactionManager.rollback(status);
    throw err;
}
transactionManager.commit(status);
26
Use BufferedReader / BufferedWriter
Motivation
> Speedup (file reading time cut in half)
Example
> Forgot to use BufferedReader in file reading framework.
How to fix
> Code review.
> Profile if performance "feels not right".
27
Use 1 thread only
Motivation
> Simplicity for the programmer
> Safety (no concurrent access)
Example
> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...
How to achieve
> Configure framework to use one JVM per worker.
28
Cache wisely
Motivation
> Speedup
> Limit memory use
Example
> Tax rates do not change during a processing day: cache them long.
> Customer data will be reused if processing a transaction of the same customer: cache it short.
How to achieve
> Cache per worker
> Cache lifetimes: Worker / step / on demand
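A per-worker cache with two of the lifetimes named above might be sketched like this. WorkerCache, Lifetime and endStep are hypothetical names. The class is deliberately not thread-safe, on the assumption that each worker runs in its own single-threaded JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-worker cache; one instance per worker JVM, no synchronization. */
class WorkerCache<K, V> {
    enum Lifetime { WORKER, STEP }

    private final Map<K, V> workerScoped = new HashMap<>();
    private final Map<K, V> stepScoped = new HashMap<>();

    /** Returns the cached value, calling the loader only on a cache miss. */
    V get(K key, Lifetime lifetime, Function<K, V> loader) {
        Map<K, V> cache = (lifetime == Lifetime.WORKER) ? workerScoped : stepScoped;
        return cache.computeIfAbsent(key, loader);
    }

    /** Called by the framework after each step: short-lived entries are dropped. */
    void endStep() {
        stepScoped.clear();
    }
}
```

Tax rates would be requested with Lifetime.WORKER and customer data with Lifetime.STEP, so memory use stays bounded by what a single step touches.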
29
Support JDBC batch operations
Motivation
> Speedup
Example
List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);
How to achieve
> Enhance your database layer with a built-in JDBC batch facility.
> Execute batch after 1000 items added.
> Automatically re-run failed batch using single JDBC statements
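The flush-and-fallback logic could be sketched as below. To keep the example runnable without a database, the JDBC calls are hidden behind a hypothetical Sink interface; in real code writeBatch would typically wrap PreparedStatement.addBatch() and executeBatch(), and writeSingle a plain executeUpdate(). BatchingWriter and Sink are not names from the talk. The sketch also assumes that a failed batch leaves nothing written, for example because it was rolled back to a savepoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batching helper: flushes every batchSize items, retries one by one on failure. */
class BatchingWriter<T> {
    /** Hypothetical seam over the database layer. */
    interface Sink<T> {
        void writeBatch(List<T> items) throws Exception;
        void writeSingle(T item) throws Exception;
    }

    private final Sink<T> sink;
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();
    private final List<T> rejected = new ArrayList<>();

    BatchingWriter(Sink<T> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered; must also be called once at the end of the step. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> items = new ArrayList<>(buffer);
        buffer.clear();
        try {
            sink.writeBatch(items);
        } catch (Exception batchFailure) {
            // Assumes the failed batch wrote nothing; re-run item by item to isolate the bad ones.
            for (T item : items) {
                try {
                    sink.writeSingle(item);
                } catch (Exception itemFailure) {
                    rejected.add(item);
                }
            }
        }
    }

    List<T> rejected() {
        return rejected;
    }
}
```

The single-statement rerun costs speed only in the rare failing batch, and it pinpoints which item the database refused.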
30
Structured patching
Motivation
> Risk management
> Stay agile in production
Example
> Bug found, fixed and unit tested. Deploy to production asap.
How to achieve
> Eclipse wizard to create patch (all files involved to fix a bug)
> Patch script that applies .class file / SQL script / whatever...
31
Never, ever, update primary keys
Motivation
> Good database design
> Speedup
Example
> Homemade library always wrote entire row to database.
How to fix
> Only write changed fields (dirty flags).
> Make primary keys immutable on your objects.
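Both fixes can be combined in one domain object, sketched here under assumptions: the fields of Booking, the table name booking and its column names are invented for this example, and the generated SQL is only meant to show that the primary key never appears in the SET clause.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical domain object with an immutable primary key and dirty flags. */
class Booking {
    private final long id;            // primary key: final, no setter
    private String status;
    private long amountCents;
    private final Set<String> dirtyColumns = new LinkedHashSet<>();

    Booking(long id, String status, long amountCents) {
        this.id = id;
        this.status = status;
        this.amountCents = amountCents;
    }

    long getId() {
        return id;
    }

    void setStatus(String newStatus) {
        if (!Objects.equals(status, newStatus)) {
            status = newStatus;
            dirtyColumns.add("status");
        }
    }

    void setAmountCents(long newAmountCents) {
        if (amountCents != newAmountCents) {
            amountCents = newAmountCents;
            dirtyColumns.add("amount_cents");
        }
    }

    /** Builds an UPDATE touching only the changed columns, or returns null if nothing changed. */
    String updateSql() {
        if (dirtyColumns.isEmpty()) {
            return null;
        }
        StringBuilder sql = new StringBuilder("UPDATE booking SET ");
        boolean first = true;
        for (String column : dirtyColumns) {
            if (!first) {
                sql.append(", ");
            }
            sql.append(column).append(" = ?");
            first = false;
        }
        return sql.append(" WHERE id = ?").toString();
    }
}
```

Since the key is only ever used in the WHERE clause, the database never has to maintain its primary key index for an update.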
32
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
33
Future
> Scalability is an issue with a single database server.
– Partitioning options used, but not to the end.
– Will Moore's law save us again?
> Processing double the volume still to be proven...
34
If you remember just three things...
Java batch processing works and is cool :-)
Trade-offs:
> Do not stock the work, start.
> Single threaded, many JVMs.
> Designing for scalability, stability needs experts.
http://www.google.ch/search?q=how+to+flip+an+omelet
Stefan Rufer [email protected]
Netcetera AG www.netcetera.ch
Matthias Markwalder [email protected]
SIX Card Solutions www.six-group.com
36
Links / References
> http://en.wikipedia.org/wiki/Batch_processing
> http://static.springframework.org/spring-batch/
> http://www.bmc.com/products/offering/control-m.html
> http://www.javaspecialists.eu/
And to really learn how to bake fine omelets, buy a book:
> http://de.wikipedia.org/wiki/Marianne_Kaltenbach
> http://www.oreilly.de/catalog/geeksckbkger/
37
Other batch processing frameworks (public only)
> http://www.bmap4j.org/
> http://freshmeat.net/projects/jppf
> http://hadoop.apache.org/