Published on Javalobby (http://java.dzone.com)
Getting Started With Spring Batch 2.0
By wwheeler
Created 2009/03/25 - 1:26am

In this article we're going to take a look at Spring Batch 2.0, the latest version of the Spring Batch framework. Our approach will be strongly practical: we'll cover the key ideas without dwelling too much on the details, we'll get you up and running with one of the sample applications that ships with Spring Batch, and finally we'll take a closer look at the sample app so you can understand what's going on.

At the time of this writing Spring Batch 2.0 is actually in RC2 status, so there may be minor changes between now and the GA release.

Let's begin with an overview of Spring Batch itself.

What Is Spring Batch?

While there are lots of different frameworks for building web applications, building web services, performing object/relational mapping and so forth, batch processing frameworks are comparatively rare. Yet enterprises use batch jobs to process billions of transactions daily.

Spring Batch fills the gap by providing a Spring-based framework for batch processing. Like all Spring frameworks, it's based on POJOs and dependency injection. In addition it provides infrastructure for building batch jobs as well as execution runtimes for running them.

At the highest level, the Spring Batch architecture looks like this:

Getting Started With Spring Batch 2.0 http://java.dzone.com/print/8845?page=0,1

1 of 14 10/06/2010 12:56 PM

In figure 1, the top of the hierarchy is the batch application itself. This is whatever batch processing application you want to write. It depends on the Spring Batch core module, which primarily provides a runtime environment for your batch jobs. Both the batch app and the core module in turn depend upon an infrastructure module that provides classes useful for both building and running batch apps.

Batch processing itself is a decades-old computing concept, and as such, the domain has standard concepts, terminology and methods. Spring Batch adopts the standard approach, as shown in figure 2:

Here we see a hypothetical three-step job, though obviously a job can have arbitrarily many steps. The steps are typically sequential, though as of Spring Batch 2.0 it's possible to define conditional flows (e.g., execute step 2 if step 1 succeeds; otherwise execute step 3). We won't cover conditional flows in this article.

Within any given step, the basic process is as follows: read a bunch of "items" (e.g., database rows, XML elements, lines in a flat file, whatever), process them, and write them out somewhere to make it convenient for subsequent steps to work with the result. There are some subtleties around how often commits occur, but we'll ignore those for now.
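To make the read/write cycle a bit more tangible, here is a minimal plain-Java sketch of a step that reads items one at a time and hands them to a writer in fixed-size chunks. It does not use Spring Batch at all: the Reader and Writer interfaces, the runStep method and the commitInterval parameter are hypothetical stand-ins chosen for illustration, and the real framework also wraps each chunk in a transaction, which this sketch only marks with a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {

    /** Hypothetical reader: returns the next item, or null once the input is exhausted. */
    interface Reader<T> {
        T read();
    }

    /** Hypothetical writer: receives one chunk of items at a time. */
    interface Writer<T> {
        void write(List<T> chunk);
    }

    /**
     * Reads items one by one and passes them to the writer in chunks of at most
     * commitInterval items. Returns the number of chunks written.
     */
    static <T> int runStep(Reader<T> reader, Writer<T> writer, int commitInterval) {
        int chunks = 0;
        List<T> chunk = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            chunk.add(item);
            if (chunk.size() == commitInterval) {
                writer.write(chunk); // a real step would commit its transaction here
                chunks++;
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk); // final, possibly smaller, chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList("a", "b", "c", "d", "e").iterator();
        List<List<String>> written = new ArrayList<>();
        Reader<String> reader = () -> lines.hasNext() ? lines.next() : null;
        Writer<String> writer = written::add;
        int chunks = runStep(reader, writer, 2);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

With five items and a chunk size of two, the writer is called three times, the last time with a single leftover item.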

With the high-level overview of Spring Batch behind us, let's jump right into the football sample application that comes with Spring Batch.

Running The Football Sample Application

Spring Batch includes several sample batch applications. A good starting point is the (American) football sample app. I'm going to assume that you have your project already set up in your IDE. If you're using Eclipse, I recommend installing the latest version of Spring IDE, including the core plug-in and the Batch extension. That will allow you to visualize the bean dependencies in your Spring bean configuration files.

The football sample app is a three-step job. The first step loads a bunch of player data in from a text file and copies it into a database table called players. The second step does the same thing with game data, placing the result in a table called games. Finally, the third step generates player summary stats from the players and games tables and writes them into a third database table called player_summary.

You might find it useful to glance at the player and game data files, just to see what's up. The data files are inside src/main/resources/data/footballjob/input.

Let's run it. We can run the job by running the JUnit test

org.springframework.batch.sample.FootballJobFunctionalTests

in the src/test/java folder. Go ahead and try that now. The JUnit tests should pass.

By default the test uses an in-memory HSQLDB database. While this makes for a fast test, it's not so useful for trying to see what the job is actually doing. So instead let's run the batch job against a persistent database. I'm using MySQL though you can use whatever you like. Here's what we need to do.

Step 1. Create a database; e.g. CREATE DATABASE spring_batch_samples.

Step 2. Inside src/main/resources you'll see various batch-xxx.properties files. Open the one corresponding to your RDBMS of choice and modify the properties as necessary. Make sure the value of batch.jdbc.url matches the database name you chose in step 1.

Step 3. When running the tests, we need to override the default RDBMS specified in src/main/resources/data-source-context.xml. If you're lazy, you can just find the environment bean in that file and change the defaultValue property's value from hsql to mysql or sqlserver or whatever. (The options correspond to the batch-xxx.properties files we mentioned above.) The right way to do it, though, is to set the

org.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT

system property. There are different ways to do that. If you're using Eclipse, go to the Run > Run Configurations dialog, and in the run configuration for FootballJobFunctionalTests go to the Arguments tab. Then add the following to the VM arguments:

-Dorg.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT=mysql

(Enter that argument as a single line, with no line breaks or spaces inside it.)

Step 4. Just to make this batch job more interesting (i.e., to make it much bigger), open up the src/main/resources/jobs/footballJob.xml application context file and look for the footballProperties bean. Change its properties from

<beans:value>
    games.file.name=games-small.csv
    player.file.name=player-small1.csv
    job.commit.interval=2
</beans:value>

to

<beans:value>
    games.file.name=games.csv
    player.file.name=player.csv
    job.commit.interval=100
</beans:value>

Step 5. Run FootballJobFunctionalTests again. It will run for a while depending on how fast your computer is. Mine is pretty slow but the job still finishes in a couple of minutes.
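To picture what the override in step 3 amounts to, here is a small plain-Java sketch of a "system property with a default" lookup. The property name and the hsql default come from the steps above; the class and method names are hypothetical, and the way the value is turned into a batch-xxx.properties file name is an assumption for illustration, not a copy of what the sample's SystemPropertyInitializer actually does.

```java
public class EnvironmentSketch {

    // Property name as given in step 3 of the article.
    static final String ENVIRONMENT_KEY =
            "org.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT";

    /** Returns the -D value if one was supplied, otherwise the given default. */
    static String resolveEnvironment(String defaultValue) {
        return System.getProperty(ENVIRONMENT_KEY, defaultValue);
    }

    /** Assumed naming scheme: the environment name fills the "xxx" in batch-xxx.properties. */
    static String propertiesFileFor(String environment) {
        return "batch-" + environment + ".properties";
    }

    public static void main(String[] args) {
        // Run with -D<ENVIRONMENT_KEY>=mysql to see the override take effect.
        System.out.println(propertiesFileFor(resolveEnvironment("hsql")));
    }
}
```

The point of preferring the system property over editing the XML is that the checked-in configuration stays untouched and each run configuration can pick its own database.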

Assuming everything runs as it should, step 5 creates several tables in your database. Here's what it looks like in MySQL:

mysql> show tables;
+--------------------------------+
| Tables_in_spring_batch_samples |
+--------------------------------+
| batch_job_execution            |
| batch_job_execution_context    |
| batch_job_execution_seq        |
| batch_job_instance             |
| batch_job_params               |
| batch_job_seq                  |
| batch_staging                  |
| batch_staging_seq              |
| batch_step_execution           |
| batch_step_execution_context   |
| batch_step_execution_seq       |
| customer                       |
| customer_seq                   |
| error_log                      |
| games                          |
| player_summary                 |
| players                        |
| trade                          |
| trade_seq                      |
+--------------------------------+
19 rows in set (0.00 sec)

Spring Batch uses the batch_xxx tables to manage job execution. These are part of Spring Batch itself, not part of the samples, and so the SQL scripts that generate them are inside the org.springframework.batch.core-2.0.0.RC2.jar. On the other hand, the other tables are sample business tables. These are defined in the src/main/resources/business-schema-xxx.sql scripts. As you can see, there are some extra tables here (these support some of the other sample apps), but the only business tables we care about are players, games and player_summary.

There's a lot of data in the tables. Here's what it looks like:

mysql> select count(*) from players;
+----------+
| count(*) |
+----------+
|     4320 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from games;
+----------+
| count(*) |
+----------+
|    56377 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from player_summary;
+----------+
| count(*) |
+----------+
|     5931 |
+----------+
1 row in set (0.01 sec)

If you want to check out some of the data itself without having to pull down the entire dataset, you can use the following queries:

select * from players limit 10;

select * from games limit 10;

select * from player_summary limit 10;

Just for kicks, you might find it entertaining to investigate the batch_xxx tables too. For instance:

mysql> select * from batch_job_execution;
+------------------+---------+-----------------+---------------------+
| JOB_EXECUTION_ID | VERSION | JOB_INSTANCE_ID | CREATE_TIME         |
+------------------+---------+-----------------+---------------------+
|                1 |       2 |               1 | 2009-03-22 20:31:40 |
+------------------+---------+-----------------+---------------------+
+---------------------+---------------------+-----------+-----------+
| START_TIME          | END_TIME            | STATUS    | EXIT_CODE |
+---------------------+---------------------+-----------+-----------+
| 2009-03-22 20:31:40 | 2009-03-22 20:33:44 | COMPLETED | COMPLETED |
+---------------------+---------------------+-----------+-----------+
+--------------+---------------------+
| EXIT_MESSAGE | LAST_UPDATED        |
+--------------+---------------------+
|              | 2009-03-22 20:33:44 |
+--------------+---------------------+
1 row in set (0.00 sec)

This will give you some visibility into how Spring Batch keeps track of job executions, but we're not going to worry about that here. (Consult the Spring Batch 2.0 reference manual for more information on that.)

It's time to take a closer look at what's going on behind the scenes.

Understanding The Football Sample Application

Let's start from FootballJobFunctionalTests and work backwards.

Normally we wouldn't launch batch jobs from JUnit tests, but that's what we're doing here so let's look at that. The sample app uses the Spring TestContext framework, and without going into the gory details (they're not directly relevant to Spring Batch), it turns out that the TestContext framework provides a default application context file for FootballJobFunctionalTests; namely

org/springframework/batch/sample/FootballJobFunctionalTests-context.xml

inside the src/test/resources folder. Listing 1 shows what it contains.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

    <import resource="classpath:/simple-job-launcher-context.xml" />
    <import resource="classpath:/jobs/footballJob.xml" />
</beans>

In listing 1 we can see that the app context provides a couple of things: first, it provides via the simple-job-launcher-context.xml import a JobLauncher bean so we can run jobs; second, it provides via the jobs/footballJob.xml import an actual job to run. Both of these live in the src/main/resources folder. Once you have a JobLauncher, a Job and a JobParameters (we're using an empty JobParameters bean for this sample app), all we have to do is this:

jobLauncher.run(job, jobParameters);

That's exactly what the FootballJobFunctionalTests class does, though you have to navigate up its inheritance hierarchy to AbstractBatchLauncherTests.testLaunchJob() to see it.

Anyway, let's look first at the JobLauncher.

Defining A JobLauncher

As noted above, the sample app defines the JobLauncher bean in simple-job-launcher-context.xml. We can see some of the bean dependencies in figure 3, courtesy of Spring IDE:

Listing 2 shows the corresponding application context file.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/aop
        http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-2.5.xsd">

    ... some imports ...

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher"
        p:jobRepository-ref="jobRepository" />

    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"
        p:dataSource-ref="dataSource"
        p:transactionManager-ref="transactionManager" />

    ... other bean definitions ...

</beans>

I've obviously suppressed some of the beans from the application context. The two beans we need to know about here are the JobLauncher itself and its JobRepositoryFactoryBean dependency, which is a factory for SimpleJobRepository instances. I already mentioned that the JobLauncher allows us to run jobs. The point of the JobRepository is to store and retrieve job metadata of the sort stored in the batch_xxx tables we saw earlier. Again we're not going to cover that here, but the basic idea is that the JobRepository contains information on which jobs we ran when, which steps succeeded and failed, and that sort of thing. That kind of metadata allows Spring Batch to support, for example, job retries.

We'll now consider the job definition itself.

Defining A Job

The footballJob.xml application context defines the football job. It's a long file, so let's digest it in pieces. First, here are the namespace declarations:

<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
        http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.0.xsd
        http://www.springframework.org/schema/aop
        http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-2.0.xsd">

Next, here's the configuration that defines the job's high-level step structure:

<job id="footballJob">
    <step id="playerload" next="gameLoad">
        <tasklet reader="playerFileItemReader"
            writer="playerWriter"
            commit-interval="${job.commit.interval}" />
    </step>
    <step id="gameLoad" next="playerSummarization">
        <tasklet reader="gameFileItemReader"
            writer="gameWriter"
            commit-interval="${job.commit.interval}" />
    </step>
    <step id="playerSummarization" ref="summarizationStep" />
</job>

<step id="summarizationStep">
    <tasklet reader="playerSummarizationSource"
        writer="summaryWriter"
        commit-interval="${job.commit.interval}" />
</step>

As we discussed above, we're using batch namespace elements like job, step and tasklet. You can see that it certainly cleans up the configuration as compared to defining everything using bean elements.

Our football job has three steps. Individual steps can use the next attribute to point to the next step in the flow. Each step has some internal tasklet details (more on these in a minute), and we can define steps either internally to the job (see, e.g., playerload and gameLoad) or externally (see, e.g., playerSummarization, which refers to summarizationStep). I don't think there's a good reason to externalize the playerSummarization step here other than simply to show that it can be done and to show how to do it.
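The way next attributes chain steps together can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how Spring Batch resolves flows internally: the executionOrder method and the map-based representation are hypothetical, while the step ids are the ones from the job definition above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StepFlowSketch {

    /**
     * Follows "next" links starting from firstStep. A step with no entry in the
     * map has no next attribute and therefore ends the job. The sketch assumes
     * the links contain no cycle.
     */
    static List<String> executionOrder(String firstStep, Map<String, String> next) {
        List<String> order = new ArrayList<>();
        for (String step = firstStep; step != null; step = next.get(step)) {
            order.add(step);
        }
        return order;
    }

    public static void main(String[] args) {
        // Step ids and next links as they appear in the job definition.
        Map<String, String> next = new LinkedHashMap<>();
        next.put("playerload", "gameLoad");
        next.put("gameLoad", "playerSummarization");
        System.out.println(executionOrder("playerload", next));
    }
}
```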

Earlier in the article we noted that each step reads items from some source, optionally processes them in some way and finally writes them out somewhere. Our three steps fit that general pattern. None of them includes an explicit processing step, but they all read items and write them back out.

You may recall that we said that the first two steps read player and game data from flat files. They do this using a class from the Spring Batch infrastructure module called FlatFileItemReader. Let's see how that works.

Loading Items From a Flat File

We'll focus on the playerload step, since it's essentially the same as the gameLoad step. Here's the definition for the playerFileItemReader bean we reference from the playerload step:

<beans:bean id="playerFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader">
    <beans:property name="resource"
        value="classpath:data/footballjob/input/${player.file.name}" />
    <beans:property name="lineMapper">
        <beans:bean
            class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <beans:property name="lineTokenizer">
                <beans:bean
                    class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <beans:property name="names"
                        value="ID,lastName,firstName,position,birthYear,debutYear" />
                </beans:bean>
            </beans:property>
            <beans:property name="fieldSetMapper">
                <beans:bean
                    class="org.springframework.batch.sample.domain.football.internal.PlayerFieldSetMapper" />
            </beans:property>
        </beans:bean>
    </beans:property>
</beans:bean>

The player.file.name property resolves to player.csv since that's what we set it to just before we ran the job. Anyway, there are a couple of dependencies the reader needs. First it needs a Resource to represent the file we want to read. (See the Spring 2.5.6 Reference Documentation, chapter 4, for more information about Resources.) Second it needs a LineMapper to help tokenize and parse the file. We won't dig into all the details of the LineMapper dependencies (you can check out the Javadocs for the various infrastructure classes involved), but it's worth taking a peek at the PlayerFieldSetMapper class, since that's a custom class. Listing 3 shows what it does.

package org.springframework.batch.sample.domain.football.internal;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.sample.domain.football.Player;

public class PlayerFieldSetMapper implements FieldSetMapper<Player> {

    public Player mapFieldSet(FieldSet fs) {
        if (fs == null) {
            return null;
        }
        Player player = new Player();
        player.setId(fs.readString("ID"));
        player.setLastName(fs.readString("lastName"));
        player.setFirstName(fs.readString("firstName"));
        player.setPosition(fs.readString("position"));
        player.setDebutYear(fs.readInt("debutYear"));
        player.setBirthYear(fs.readInt("birthYear"));
        return player;
    }
}

PlayerFieldSetMapper maps a FieldSet (essentially a set of named tokens) to a Player domain object. If you don't want to do this kind of mapping manually, you might check out the Javadocs for BeanWrapperFieldSetMapper, which uses reflection to accomplish the mapping automatically.
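If the division of labor between the tokenizer and the field set mapper feels abstract, the following plain-Java sketch mimics the two stages without any Spring Batch classes. Everything in it is a hypothetical stand-in: the nested Player class, the tokenize and mapFields methods, and the sample line are made up for illustration, and a Map plays the role of the FieldSet. Only the column names are taken from the reader configuration above.

```java
import java.util.HashMap;
import java.util.Map;

public class LineMappingSketch {

    /** Hypothetical stand-in for the sample's Player domain class. */
    static class Player {
        String id;
        String lastName;
        String firstName;
        String position;
        int birthYear;
        int debutYear;
    }

    /** Stage 1: split a comma-delimited line and label each token with its column name. */
    static Map<String, String> tokenize(String line, String[] names) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty tokens
        if (tokens.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " tokens but found " + tokens.length);
        }
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], tokens[i].trim());
        }
        return fields;
    }

    /** Stage 2: copy the named tokens into a domain object, converting types as needed. */
    static Player mapFields(Map<String, String> fields) {
        Player player = new Player();
        player.id = fields.get("ID");
        player.lastName = fields.get("lastName");
        player.firstName = fields.get("firstName");
        player.position = fields.get("position");
        player.birthYear = Integer.parseInt(fields.get("birthYear"));
        player.debutYear = Integer.parseInt(fields.get("debutYear"));
        return player;
    }

    public static void main(String[] args) {
        String[] names = {"ID", "lastName", "firstName", "position", "birthYear", "debutYear"};
        // Made-up input line, not taken from the sample data files.
        Player p = mapFields(tokenize("DoeJo00,Doe,John,qb,1980,2002", names));
        System.out.println(p.firstName + " " + p.lastName + " debuted in " + p.debutYear);
    }
}
```

Keeping the two stages separate is what lets the same tokenizer configuration be reused with different domain objects: only the second stage knows about Player.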

We'll now turn to the topic of storing records to a database table.

Storing Items Into a Database

Here's the playerWriter bean we referenced from the playerload step:

<beans:bean id="playerWriter"
    class="org.springframework.batch.sample.domain.football.internal.PlayerItemWriter">
    <beans:property name="playerDao">
        <beans:bean
            class="org.springframework.batch.sample.domain.football.internal.JdbcPlayerDao">
            <beans:property name="dataSource" ref="dataSource" />
        </beans:bean>
    </beans:property>
</beans:bean>

The PlayerItemWriter is a custom class, though it turns out that it's pretty trivial, as listing 4 shows.

package org.springframework.batch.sample.domain.football.internal;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.sample.domain.football.Player;
import org.springframework.batch.sample.domain.football.PlayerDao;

public class PlayerItemWriter implements ItemWriter<Player> {
    private PlayerDao playerDao;

    public void setPlayerDao(PlayerDao playerDao) {
        this.playerDao = playerDao;
    }

    public void write(List<? extends Player> players) throws Exception {
        for (Player player : players) {
            playerDao.savePlayer(player);
        }
    }
}

There isn't anything special happening here. The step will use the FlatFileItemReader to pull items from a flat file and will pass them in chunks to the PlayerItemWriter, which dutifully saves them to the database.

The examples we've seen so far are among the simplest possible, but the general idea behind ItemReaders and ItemWriters should be clear now: readers pull items from an arbitrary data source and map them to domain objects, whereas writers map domain objects to items in a data sink. But just for good measure, let's take a look at one more ItemReader.

JdbcCursorItemReader

The JdbcCursorItemReader allows us to pull items from a database. In the case of the football job, we're using the JdbcCursorItemReader to pull player and game data from the database so that we can synthesize them into PlayerSummary domain objects, which we'll subsequently write. At any rate here's the definition for our playerSummarizationSource, which is part of the job's third step:

<beans:bean id="playerSummarizationSource"
    class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <beans:property name="dataSource" ref="dataSource" />
    <beans:property name="rowMapper">
        <beans:bean
            class="org.springframework.batch.sample.domain.football.internal.PlayerSummaryMapper" />
    </beans:property>
    <beans:property name="sql">
        <beans:value>
            SELECT games.player_id, games.year_no, SUM(COMPLETES),
            SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
            SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
            SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
            from games, players
            where players.player_id = games.player_id
            group by games.player_id, games.year_no
        </beans:value>
    </beans:property>
</beans:bean>

The sql property, as you might guess, provides the SQL used to pull data from the data source. Here we're using both the players and games tables to compute player stats. The result of that query is a JDBC ResultSet, which this particular ItemReader implementation passes to a RowMapper implementation. The PlayerSummaryMapper is a custom implementation, and it essentially takes a row in a ResultSet and maps it to a PlayerSummary domain object.
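For readers who think more easily in Java than in SQL, here is a rough in-memory analogue of what the GROUP BY in that query computes, reduced to a single statistic. It is a sketch under simplifying assumptions: the Game class and totalPassingYards method are hypothetical, only passing yards are summed, and the join against the players table is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SummarySketch {

    /** Hypothetical, trimmed-down game row: one player's passing yards in one game. */
    static class Game {
        final String playerId;
        final int year;
        final int passingYards;

        Game(String playerId, int year, int passingYards) {
            this.playerId = playerId;
            this.year = year;
            this.passingYards = passingYards;
        }
    }

    /**
     * Groups games by (player id, year) and sums passing yards per group, much
     * like GROUP BY player_id, year_no with SUM(PASSING_YARDS). Keys have the
     * form "playerId/year".
     */
    static Map<String, Integer> totalPassingYards(List<Game> games) {
        return games.stream().collect(Collectors.groupingBy(
                g -> g.playerId + "/" + g.year,
                TreeMap::new,
                Collectors.summingInt(g -> g.passingYards)));
    }

    public static void main(String[] args) {
        // Made-up rows, not taken from the sample data.
        List<Game> games = Arrays.asList(
                new Game("p1", 2000, 100),
                new Game("p1", 2000, 50),
                new Game("p1", 2001, 30),
                new Game("p2", 2000, 7));
        System.out.println(totalPassingYards(games));
    }
}
```

In the real job the database does this aggregation, and the reader only has to map each already-summarized row to a PlayerSummary.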

Summary

With that we conclude our introductory tour of Spring Batch 2.0. We've only scratched the surface, showing how to create simple jobs with simple sequential steps, and how to run them.

Once you feel comfortable with simple jobs, it makes sense to spend a little time with the introductory chapters of the Spring Batch Reference Documentation to learn more about the execution environment, including the difference between Jobs, JobInstances and JobExecutions. You can use Spring Batch in conjunction with a scheduler (such as Quartz) to run batch jobs on a recurring basis in an automated fashion.

More advanced topics include non-sequential step flow, such as conditional flows and parallel flows, and support for partitioning individual steps across multiple threads or even servers.

Enjoy!

Willie is an IT director with 12 years of Java development experience. He and his brother John are coauthors of the upcoming book Spring in Practice by Manning Publications (www.manning.com/wheeler/). Willie also publishes technical articles (including many on Spring) to wheelersoftware.com/articles/.

Source URL: http://java.dzone.com/articles/getting-started-spring-batch
