Spring Batch Workshop (advanced)

Arnaud Cogoluegnes, Consultant with Zenika, co-author of Spring Batch in Action. March 20, 2012


DESCRIPTION

Arnaud walks you through the Spring Batch framework: from Hello World! to multi-threaded batch execution, by way of CSV file reading and restart after failure. The techniques the framework uses to read and write large volumes of data efficiently are covered as well! The presentation follows a problem/solution approach, with many code examples and demos. After this presentation, you will know whether Spring Batch fits your needs and will have everything you need to integrate it into your batch applications.

TRANSCRIPT

Page 1: Spring Batch Workshop (advanced)

Spring Batch Workshop (advanced)


Arnaud Cogoluegnes

Consultant with Zenika, Co-author Spring Batch in Action

March 20, 2012

Page 2: Spring Batch Workshop (advanced)


Outline

Overview

XML file reading

Item enrichment

File reading partitioning

File dropping launching

Database reading partitioning

Complex flow

Atomic processing

Page 3: Spring Batch Workshop (advanced)

Overview

- This workshop highlights advanced Spring Batch features
  - Problem/solution approach
  - A few slides to cover the feature
  - A project to start from, just follow the TODOs
- Prerequisites
  - Basics about Java and Java EE
  - Spring: dependency injection, enterprise support
  - Spring Batch: what the first workshop covers
- https://github.com/acogoluegnes/Spring-Batch-Workshop

Page 4: Spring Batch Workshop (advanced)

Overview

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

Page 5: Spring Batch Workshop (advanced)

XML file reading

- Problem: reading items from an XML file and sending them to another source (e.g. a database)
- Solution: use the StaxEventItemReader

Page 6: Spring Batch Workshop (advanced)

XML file reading

Spring Batch's support for XML file reading

- Spring Batch has built-in support for XML files
  - Through the StaxEventItemReader for reading
- The StaxEventItemReader handles I/O for efficient XML processing
- 2 main steps:
  - Configuring the StaxEventItemReader
  - Configuring a Spring OXM Unmarshaller

Page 7: Spring Batch Workshop (advanced)


XML file reading

The usual suspects

<?xml version="1.0" encoding="UTF-8"?>
<contacts>
  <contact>
    <firstname>De-Anna</firstname>
    <lastname>Raghunath</lastname>
    <birth>2010-03-04</birth>
  </contact>
  <contact>
    <firstname>Susy</firstname>
    <lastname>Hauerstock</lastname>
    <birth>2010-03-04</birth>
  </contact>
  (...)
</contacts>

public class Contact {

  private Long id;
  private String firstname, lastname;
  private Date birth;

  (...)
}

Page 8: Spring Batch Workshop (advanced)


XML file reading

The StaxEventItemReader configuration

<bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader">

<property name="fragmentRootElementName" value="contact" />

<property name="unmarshaller">

<bean class="org.springframework.oxm.xstream.XStreamMarshaller">

<property name="aliases">

<map>

<entry key="contact" value="com.zenika.workshop.springbatch.Contact" />

</map>

</property>

<property name="converters">

<bean class="com.thoughtworks.xstream.converters.basic.DateConverter">

<constructor-arg value="yyyy-MM-dd" />

<constructor-arg><array /></constructor-arg>

<constructor-arg value="true" />

</bean>

</property>

</bean>

</property>

<property name="resource" value="classpath:contacts.xml" />

</bean>

I NB: Spring OXM supports XStream, JAXB2, etc.
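For the JAXB2 alternative, here is a minimal Java wiring sketch (an assumption for illustration, not part of the workshop project; JaxbContactReaderFactory and contactReader are made-up names) using Spring OXM's Jaxb2Marshaller with the same StaxEventItemReader:

import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

public class JaxbContactReaderFactory {

  // Builds the same reader as above, but with JAXB2 instead of XStream.
  // Assumption: Contact carries JAXB annotations (e.g. @XmlRootElement).
  public StaxEventItemReader<Contact> contactReader() {
    Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
    marshaller.setClassesToBeBound(Contact.class);

    StaxEventItemReader<Contact> reader = new StaxEventItemReader<Contact>();
    reader.setFragmentRootElementName("contact");
    reader.setUnmarshaller(marshaller);
    reader.setResource(new ClassPathResource("contacts.xml"));
    return reader;
  }
}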

Page 9: Spring Batch Workshop (advanced)


XML file reading

Going further...

- StaxEventItemWriter to write XML files
- Spring OXM's support for other marshallers
- Skipping badly formatted lines

Page 10: Spring Batch Workshop (advanced)


Item enrichment

- Problem: I want to enrich read items with a Web Service before they get written
- Solution: implement an ItemProcessor to make the Web Service call

Page 11: Spring Batch Workshop (advanced)


Item enrichment

Use case

- Reading contacts from a flat file
- Enriching each contact with its social security number
- Writing the whole contact to the database

Page 12: Spring Batch Workshop (advanced)


Item enrichment

1,De-Anna,Raghunath,2010-03-04
2,Susy,Hauerstock,2010-03-04
3,Kiam,Whitehurst,2010-03-04
4,Alecia,Van Holst,2010-03-04
5,Hing,Senecal,2010-03-04

- NB: no SSN in the CSV file!

public class Contact {

  private Long id;
  private String firstname, lastname;
  private Date birth;
  private String ssn;

  (...)
}

Page 13: Spring Batch Workshop (advanced)


Item enrichment

The Web Service

- It can be any kind of Web Service (SOAP, REST)
- Our Web Service
  - URL: http://host/service?firstname=John&lastname=Doe
  - It returns:

<contact>
  <firstname>John</firstname>
  <lastname>Doe</lastname>
  <ssn>987-65-4329</ssn>
</contact>

Page 14: Spring Batch Workshop (advanced)


Item enrichment

The ItemProcessor implementation

package com.zenika.workshop.springbatch;

import javax.xml.transform.dom.DOMSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.web.client.RestTemplate;
import org.w3c.dom.NodeList;

public class SsnWebServiceItemProcessor implements
    ItemProcessor<Contact, Contact> {

  private RestTemplate restTemplate = new RestTemplate();

  private String url;

  @Override
  public Contact process(Contact item) throws Exception {
    DOMSource source = restTemplate.getForObject(url, DOMSource.class,
        item.getFirstname(), item.getLastname());
    String ssn = extractSsnFromXml(item, source);
    item.setSsn(ssn);
    return item;
  }

  private String extractSsnFromXml(Contact item, DOMSource source) {
    // some DOM code
  }

  (...)
}
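The DOM extraction is elided on the slide; a minimal sketch of what extractSsnFromXml could look like (an assumption, not the workshop's actual code) is:

// Hypothetical implementation: pull the <ssn> element out of the returned DOM.
// Assumption: the DOMSource wraps a parsed org.w3c.dom.Document, which is what
// RestTemplate's source message converter produces.
private String extractSsnFromXml(Contact item, DOMSource source) {
  org.w3c.dom.Document doc = (org.w3c.dom.Document) source.getNode();
  NodeList ssnNodes = doc.getElementsByTagName("ssn");
  return ssnNodes.getLength() > 0 ? ssnNodes.item(0).getTextContent() : null;
}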

Page 15: Spring Batch Workshop (advanced)


Item enrichment

Configuring the SsnWebServiceItemProcessor

<batch:job id="itemEnrichmentJob">

<batch:step id="itemEnrichmentStep">

<batch:tasklet>

<batch:chunk reader="reader" processor="processor" writer="writer"

commit-interval="3"/>

</batch:tasklet>

</batch:step>

</batch:job>

<bean id="processor"

class="com.zenika.workshop.springbatch.SsnWebServiceItemProcessor">

<property name="url"

value="http://localhost:8085/?firstname={firstname}&amp;lastname={lastname}" />

</bean>

Page 16: Spring Batch Workshop (advanced)


Item enrichment

But my Web Service has a lot of latency!

- The Web Service call can benefit from multi-threading
- Why not run several processings at the same time?
- We could wait for their completion in the ItemWriter
- Let's use asynchronous ItemProcessor and ItemWriter implementations
- Provided in the Spring Batch Integration project

Page 17: Spring Batch Workshop (advanced)


Item enrichment

Using async ItemProcessor and ItemWriter

- This is only about wrapping

<bean id="processor"
      class="o.s.b.integration.async.AsyncItemProcessor">
  <property name="delegate" ref="processor" />
  <property name="taskExecutor" ref="taskExecutor" />
</bean>

<bean id="writer"
      class="o.s.b.integration.async.AsyncItemWriter">
  <property name="delegate" ref="writer" />
</bean>

<task:executor id="taskExecutor" pool-size="5" />
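NB: the AsyncItemProcessor submits each call to the delegate on the task executor and returns a Future; the AsyncItemWriter waits for those Futures before handing the items to the wrapped writer. In a real configuration the wrapped (delegate) beans keep their own ids, distinct from the wrappers shown here.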

Page 18: Spring Batch Workshop (advanced)


Item enrichment

Going further...

- Business delegation with an ItemProcessor
- Available ItemProcessor implementations
  - Adapter, validator
- The ItemProcessor can filter items (see the sketch below)
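A filtering processor simply returns null for the items it wants to drop; here is a minimal sketch (MissingSsnFilterProcessor is a hypothetical class, not part of the workshop project):

package com.zenika.workshop.springbatch;

import org.springframework.batch.item.ItemProcessor;

// Hypothetical example: drop contacts for which no SSN could be found
public class MissingSsnFilterProcessor implements ItemProcessor<Contact, Contact> {

  @Override
  public Contact process(Contact item) throws Exception {
    // Returning null filters the item out: it never reaches the writer
    return item.getSsn() == null ? null : item;
  }
}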

Page 19: Spring Batch Workshop (advanced)


File reading partitioning

- Problem: I have multiple input files and I want to process them in parallel
- Solution: use partitioning to parallelize the processing on multiple threads

Page 20: Spring Batch Workshop (advanced)


File reading partitioning

Serial processing

Page 21: Spring Batch Workshop (advanced)


File reading partitioning

Partitioning in Spring Batch

- Partition the input data
  - e.g. one input file = one partition
  - each partition is processed in a dedicated step execution
- Partitioning is easy to set up but needs some knowledge about the data
- Partition handler implementations
  - Multi-threaded
  - Spring Integration

Page 22: Spring Batch Workshop (advanced)


File reading partitioning

Multi-threaded partitioning

Page 23: Spring Batch Workshop (advanced)


File reading partitioning

Partitioner for input files

<bean id="partitioner"

class="o.s.b.core.partition.support.MultiResourcePartitioner">

<property name="resources"

value="file:./src/main/resources/input/*.txt" />

</bean>
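NB: MultiResourcePartitioner creates one partition per matched resource and stores the resource URL in each partition's execution context under the key fileName, which is exactly what the step-scoped reader on the next slide reads back.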

Page 24: Spring Batch Workshop (advanced)


File reading partitioning

The partitioner sets a context for the components

<bean id="reader"

class="org.springframework.batch.item.file.FlatFileItemReader"

scope="step">

(...)

<property name="resource"

value="#{stepExecutionContext[’fileName’]}" />

</bean>

Page 25: Spring Batch Workshop (advanced)


File reading partitioning

Using the multi-threaded partition handler

<batch:job id="fileReadingPartitioningJob">

<batch:step id="partitionedStep" >

<batch:partition step="readWriteContactsPartitionedStep"

partitioner="partitioner">

<batch:handler task-executor="taskExecutor" />

</batch:partition>

</batch:step>

</batch:job>

<batch:step id="readWriteContactsPartitionedStep">

<batch:tasklet>

<batch:chunk reader="reader" writer="writer"

commit-interval="10" />

</batch:tasklet>

</batch:step>
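NB: taskExecutor refers to a plain Spring TaskExecutor bean; it can be declared just like on the asynchronous item-processing slide, e.g. <task:executor id="taskExecutor" pool-size="5" />.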

Page 26: Spring Batch Workshop (advanced)


File reading partitioning

Going further...

- Spring Integration partition handler implementation
- Other scaling approaches
  - parallel steps, remote chunking, multi-threaded step

Page 27: Spring Batch Workshop (advanced)


File dropping launching

- Problem: downloading files from an FTP server and processing them with Spring Batch
- Solution: use Spring Integration to poll the FTP server and trigger Spring Batch accordingly

Page 28: Spring Batch Workshop (advanced)


File dropping launching

Using Spring Integration for transfer and triggering

Page 29: Spring Batch Workshop (advanced)


File dropping launching

The launching code

public class FileContactJobLauncher {

  private Job job;

  private JobLauncher jobLauncher;

  public void launch(File file) throws Exception {
    JobExecution exec = jobLauncher.run(
      job,
      new JobParametersBuilder()
        .addString("input.file", "file:"+file.getAbsolutePath())
        .toJobParameters()
    );
  }

  // setters for job and jobLauncher (...)
}

- The File is the local copy of the downloaded file

Page 30: Spring Batch Workshop (advanced)


File dropping launching

Listening to the FTP server

<int:channel id="fileIn" />

<int-ftp:inbound-channel-adapter local-directory="file:./input"

channel="fileIn" session-factory="ftpClientFactory"

remote-directory="/" auto-create-local-directory="true">

<int:poller fixed-rate="1000" />

</int-ftp:inbound-channel-adapter>

<bean id="ftpClientFactory"

class="o.s.i.ftp.session.DefaultFtpSessionFactory">

<property name="host" value="localhost"/>

<property name="port" value="2222"/>

<property name="username" value="admin"/>

<property name="password" value="admin"/>

</bean>

Page 31: Spring Batch Workshop (advanced)


File dropping launching

Calling the launcher on an inbound message

<int:channel id="fileIn" />

<int:service-activator input-channel="fileIn">

<bean

class="c.z.w.springbatch.integration.FileContactJobLauncher">

<property name="job" ref="fileDroppingLaunchingJob" />

<property name="jobLauncher" ref="jobLauncher" />

</bean>

</int:service-activator>
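NB: the FTP inbound channel adapter publishes each downloaded file as a message with a java.io.File payload, and the service activator routes that payload to the launch(File) method of FileContactJobLauncher.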

Page 32: Spring Batch Workshop (advanced)


File dropping launching

Going further...

- Checking Spring Integration connectors
  - Local file system, FTPS, SFTP, HTTP, JMS, etc.
- Checking operations on messages
  - Filtering, transforming, routing, etc.

Page 33: Spring Batch Workshop (advanced)


Database reading partitioning

- Problem: I want to export items from different categories from a database to files
- Solution: provide a partition strategy and use partitioning

Page 34: Spring Batch Workshop (advanced)


Database reading partitioning

The use case

ID  name     category
1   Book 01  books
2   DVD 01   dvds
3   DVD 02   dvds
4   Book 02  books

products_books.txt:
1,Book 01,books
4,Book 02,books

products_dvds.txt:
2,DVD 01,dvds
3,DVD 02,dvds

Page 35: Spring Batch Workshop (advanced)


Database reading partitioning

Partitioning based on categories

- 2 partitions in this case

ID  name     category
1   Book 01  books
2   DVD 01   dvds
3   DVD 02   dvds
4   Book 02  books

Page 36: Spring Batch Workshop (advanced)


Database reading partitioning

Partitioning logic with the Partitioner interface

public class ProductCategoryPartitioner implements Partitioner {

  (...)

  @Override
  public Map<String, ExecutionContext> partition(int gridSize) {
    List<String> categories = jdbcTemplate.queryForList(
      "select distinct(category) from product",
      String.class
    );
    Map<String, ExecutionContext> results =
        new LinkedHashMap<String, ExecutionContext>();
    for(String category : categories) {
      ExecutionContext context = new ExecutionContext();
      context.put("category", category);
      results.put("partition."+category, context);
    }
    return results;
  }
}

Page 37: Spring Batch Workshop (advanced)


Database reading partitioning

Output of the Partitioner

- Excerpt:

for(String category : categories) {
  ExecutionContext context = new ExecutionContext();
  context.put("category", category);
  results.put("partition."+category, context);
}

- Results:

partition.books = { category => 'books' }
partition.dvds  = { category => 'dvds' }

Page 38: Spring Batch Workshop (advanced)


Database reading partitioning

Components can refer to partition parameters

- They need to use the step scope

<bean id="reader"
      class="org.springframework.batch.item.database.JdbcCursorItemReader"
      scope="step">
  <property name="sql"
            value="select id,name,category from product where category = ?" />
  <property name="preparedStatementSetter">
    <bean class="org.springframework.jdbc.core.ArgPreparedStatementSetter">
      <constructor-arg value="#{stepExecutionContext['category']}" />
    </bean>
  </property>
</bean>

<bean id="writer"
      class="org.springframework.batch.item.file.FlatFileItemWriter"
      scope="step">
  <property name="resource"
            value="file:./target/products_#{stepExecutionContext['category']}.txt" />
  (...)
</bean>

Page 39: Spring Batch Workshop (advanced)


Database reading partitioning

Configure the partitioned step

- The default partition handler implementation is multi-threaded

<batch:job id="databaseReadingPartitioningJob">
  <batch:step id="partitionedStep" >
    <batch:partition step="readWriteProductsPartitionedStep"
                     partitioner="partitioner">
      <batch:handler task-executor="taskExecutor" />
    </batch:partition>
  </batch:step>
</batch:job>

<batch:step id="readWriteProductsPartitionedStep">
  <batch:tasklet>
    <batch:chunk reader="reader" writer="writer"
                 commit-interval="10" />
  </batch:tasklet>
</batch:step>

Page 40: Spring Batch Workshop (advanced)


Database reading partitioning

Going further...

- Check existing Partitioner implementations
- Check other partition handler implementations
- Check other scaling strategies

Page 41: Spring Batch Workshop (advanced)


Complex flow

- Problem: my job has a complex flow of steps, how can Spring Batch deal with it?
- Solution: use the step flow attributes and tags, as well as StepExecutionListeners.

Page 42: Spring Batch Workshop (advanced)


Complex flow

The next attribute for a linear flow

<batch:job id="complexFlowJob">

<batch:step id="digestStep" next="flatFileReadingStep">

<batch:tasklet ref="digestTasklet" />

</batch:step>

<batch:step id="flatFileReadingStep">

<batch:tasklet>

<batch:chunk reader="reader" writer="writer"

commit-interval="3" />

</batch:tasklet>

</batch:step>

</batch:job>

Page 43: Spring Batch Workshop (advanced)


Complex flow

There's also a next tag for non-linear flows

<batch:job id="complexFlowJob">
  <batch:step id="digestStep">
    <batch:tasklet ref="digestTasklet" />
    <batch:next on="COMPLETED" to="flatFileReadingStep"/>
    <batch:next on="FAILED" to="trackIncorrectFileStep"/>
  </batch:step>
  <batch:step id="flatFileReadingStep">
    <batch:tasklet>
      <batch:chunk reader="reader" writer="writer"
                   commit-interval="3" />
    </batch:tasklet>
  </batch:step>
  <batch:step id="trackIncorrectFileStep">
    <batch:tasklet ref="trackIncorrectFileTasklet" />
  </batch:step>
</batch:job>

- NB: there are also the end, fail, and stop tags

Page 44: Spring Batch Workshop (advanced)


Complex flow

But what if I have more complex flows?

- The next tag's decision is based on the step's exit code
- You can use your own exit codes
- A StepExecutionListener can modify the exit code of a step execution

Page 45: Spring Batch Workshop (advanced)


Complex flow

Modifying the exit code of a step execution

package com.zenika.workshop.springbatch;

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

public class SkipsListener implements StepExecutionListener {

  @Override
  public void beforeStep(StepExecution stepExecution) { }

  @Override
  public ExitStatus afterStep(StepExecution stepExecution) {
    String exitCode = stepExecution.getExitStatus().getExitCode();
    if (!exitCode.equals(ExitStatus.FAILED.getExitCode()) &&
        stepExecution.getSkipCount() > 0) {
      return new ExitStatus("COMPLETED WITH SKIPS");
    } else {
      return null;
    }
  }
}
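NB: returning null from afterStep leaves the step's exit status untouched; the custom COMPLETED WITH SKIPS status is only applied when items were actually skipped.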

Page 46: Spring Batch Workshop (advanced)


Complex flow

Plugging the SkipsListener

<step id="flatFileReadingStep">

<tasklet>

<chunk reader="reader" writer="writer"

commit-interval="3" skip-limit="10">

<skippable-exception-classes>

<include

class="o.s.b.item.file.FlatFileParseException"/>

</skippable-exception-classes>

</chunk>

</tasklet>

<end on="COMPLETED"/>

<next on="COMPLETED WITH SKIPS" to="trackSkipsStep"/>

<listeners>

<listener>

<beans:bean class="c.z.workshop.springbatch.SkipsListener" />

</listener>

</listeners>

</step>

Page 47: Spring Batch Workshop (advanced)


Complex flow

Going further...

- The JobExecutionDecider and the decision tag (see the sketch below)
- The stop, fail, and end tags
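For reference, a minimal JobExecutionDecider sketch (SkipsDecider is a hypothetical example, not part of the workshop project) that makes the same COMPLETED WITH SKIPS decision at the job-flow level:

package com.zenika.workshop.springbatch;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Hypothetical decider: route the flow depending on the previous step's skip count
public class SkipsDecider implements JobExecutionDecider {

  @Override
  public FlowExecutionStatus decide(JobExecution jobExecution,
      StepExecution stepExecution) {
    return stepExecution.getSkipCount() > 0
        ? new FlowExecutionStatus("COMPLETED WITH SKIPS")
        : FlowExecutionStatus.COMPLETED;
  }
}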

Page 48: Spring Batch Workshop (advanced)


Atomic processing

- Problem: I want to process an input file in an all-or-nothing manner.
- Solution: don't do it, atomic processing isn't batch processing! Anyway, if you really need atomic processing, use a custom CompletionPolicy.

Page 49: Spring Batch Workshop (advanced)


Atomic processing

Why is atomic processing a bad idea?

- You lose the benefits of chunk-oriented processing
  - speed, small memory footprint, etc.
- On a rollback, you lose everything (but that's perhaps the point!)
- The rollback can take a long time (several hours)
- It all depends on the amount of data and on the processing

Page 50: Spring Batch Workshop (advanced)


Atomic processing

I really need atomic processing

- Roll back yourself, with compensating transactions
- Use a transaction rollback
  - It's only a never-ending chunk!

Page 51: Spring Batch Workshop (advanced)


Atomic processing

Quick and dirty, large commit interval

- Set the commit interval to a very large value
- You should never have more items than that!

<chunk reader="reader" writer="writer" commit-interval="1000000"/>

Page 52: Spring Batch Workshop (advanced)


Atomic processing

Use a never-ending CompletionPolicy

- Spring Batch uses a CompletionPolicy to know if a chunk is complete

package org.springframework.batch.repeat;

public interface CompletionPolicy {

  boolean isComplete(RepeatContext context, RepeatStatus result);

  boolean isComplete(RepeatContext context);

  RepeatContext start(RepeatContext parent);

  void update(RepeatContext context);
}

Page 53: Spring Batch Workshop (advanced)


Atomic processing

Plugging in the CompletionPolicy

<batch:job id="atomicProcessingJob">

<batch:step id="atomicProcessingStep">

<batch:tasklet>

<batch:chunk reader="reader" writer="writer"

chunk-completion-policy="atomicCompletionPolicy" />

</batch:tasklet>

</batch:step>

</batch:job>

I NB: remove the commit-interval attribute when using aCompletionStrategy

Page 54: Spring Batch Workshop (advanced)


Atomic processing

Which CompletionPolicy for my atomic processing?

<bean id="atomicCompletionPolicy"

class="o.s.b.repeat.policy.DefaultResultCompletionPolicy" />

Page 55: Spring Batch Workshop (advanced)


Atomic processing

Going further...

- Flow in a job
- SkipPolicy, RetryPolicy