
No Fluff Just Stuff, the Magazine
Volume II Issue I | Jan/Feb/March 2010

Elegance in
OSGi Myths
Clojure: Taking Control
Open Source Business Intelligence
REST and SOAP Web Services Testing with SoapUI


Make Your Training Dollars Go Further In 2010
NFJS brings a high-quality technical conference to your city... no travel required. Many of our speakers are the same people you will see speaking at JavaOne and other national events.

Develop Your Skills
Staying up-to-date can be overwhelming. NFJS helps developers improve skills, solve problems, and be productive. The best developers possess many talents. NFJS events deliver a select body of technical knowledge and hands-on training. During an event, there are always interesting sessions available to you.

Learn How Agile Teams Win
Software is a difficult industry with very high rates of failure. NFJS emphasizes Agile practices such as Test Driven Development, Continuous Integration, Code Quality Measures, and Team Building methods to stabilize your processes and improve quality. Do your work better and faster!

Exchange Knowledge with Your Peers
Do you want to solve a problem? Get a fresh opinion. Few developer problems are truly new! NFJS is a great opportunity to interact with your peers and exchange ideas.

Save Big with NFJS Discounts
1) Earlybird Rates - register early to get the best rate
2) Group Discounts - NFJS makes it practical to bring your team
3) Alumni Discounts - have you attended before? Watch your email for alumni rates.
* Be sure to check nofluffjuststuff.com early to get the best rate available.

Topics for 2010
Most NFJS events offer 5 concurrent sessions. Throughout the duration of an event, there are sure to be sessions of interest available to all developers. Our content is updated frequently. Check nofluffjuststuff.com for current session details.

Agile Practices
Core Java
Enterprise Java: EJB3 & JMS
Architecture & Scaling
Groovy and Grails
Security
Dynamic Languages
Frameworks: Hibernate, Spring, JSF
AJAX, Flex, RIA, REST

For more tour dates and information please visit http://www.nofluffjuststuff.com/ or [email protected]

Featured Speakers

Neal Ford - Application Architect at ThoughtWorks, Inc.
Ted Neward - Enterprise, Virtual Machine and Language Wonk
Venkat Subramaniam - Founder of Agile Developer, Inc.
Brian Sletten - author of Resource-Oriented Architectures: Building Webs of Data
Matthew McCullough - Open Source Architect, Ambient Ideas

Tour Schedule
Bloomington, IL: Apr 9 - 10
Tampa, FL: Apr 16 - 18
Memphis, TN: Apr 23 - 25
Reston, VA: Apr 30 - May 2
St. Louis, MO: May 21 - 23
Dallas, TX: Jun 4 - 6
Columbus, OH: Jun 25 - 27
Salt Lake City, UT: Jul 9 - 10
Austin, TX: Jul 16 - 17
Des Moines, IA: Jul 30 - Aug 1
Jersey City, NJ: Aug 6 - 8
Columbia, MD: Aug 13 - 15
Boston, MA: Sep 10 - 12
Seattle, WA: Sep 17 - 19
Minneapolis, MN: Oct 1 - 3
Atlanta, GA: Oct 22 - 24
Reston, VA: Nov 5 - 7
Chicago, IL: Nov 12 - 14
Denver, CO: Nov 19 - 21


Contributors

Publisher & Editor in Chief: Jay Zimmerman

Technical Editor: Nick Watts

Layout/Design: Alicia Weller

Author Contributors for Volume II Issue I:

Rohit Bhardwaj

Craig Walls

Tim Berglund

Howard Lewis Ship

Table of Contents

REST and SOAP Web Service Testing with soapUI, Part 1 (pages 5-11)
Busting OSGi Myths and Misconceptions (pages 13-19)
Open Source Business Intelligence (pages 21-26)
Clojure: Taking Control Of Your Language (pages 28-33)

Greetings!

Welcome to the first issue of the 2010 NFJS, the Magazine! We look forward to delivering another great year of excellent articles. An exciting new offering this year is an ePub version, so you can read NFJS, the Magazine on virtually any smartphone, Kindle, or iPad.

The 2010 NFJS tour is in full swing. We just completed our first three shows of the year (Milwaukee, Boston, and Minneapolis). All three events were great, and the new talks and speakers were well received. Be sure to check out www.nofluffjuststuff.com for the full 2010 schedule.

I am proud to announce a premier destination Java event hosted by NFJS called Über Conf 2010. The Ü will offer over 100 technically focused sessions, including hands-on workshops centered on Architecture, Cloud, Security, Enterprise Java, Languages on the JVM, Build/Test, Mobility, and Agility. The goal of Über Conf is a simple one: totally blow the minds of our attendees.

Be sure to check out www.uberconf.com for details. Super Earlybird discount pricing is available through April 12th. I hope to see you there!

We thank each of you for your support of NFJS, the Magazine. It is a pleasure to bring you excellent technical content every issue. As always, your feedback is greatly appreciated! Have a great month!

Until next month,

Jay Zimmerman
Publisher & Editor in Chief
[email protected]
Twitter: @NoFluff


Come to the Ü for an Extreme Tech Experience

Über Conf will take place June 14 - 17, 2010 in Denver, CO. This event will focus on the best practices, new languages, and latest advancements on the Java Platform.

This is an exciting time of innovation and change. Java is not just a language; Java is a technology platform and ecosystem. Über Conf will educate developers and explore the powerful languages and tools that are changing the way we create software using the Java Platform.

» Technology Deep Dive
There are no beginner sessions here. This is your opportunity to go beyond the basics and master critical skills. We expect that you are a competent developer who is ready to solve problems using today's best tools and practices.

» Agile Practices that Work
Software is a difficult industry with high rates of failure. To create winning teams, we embrace principles laid out by the Agile Manifesto. Speakers at Über Conf emphasize and present on topics such as Test Driven Development, Continuous Integration, Code Quality Measurements, Code Smells, Team Building, and Customer Collaboration.

» Hands On Workshops
At Über Conf you will not just listen to lectures. You will have the opportunity to participate in workshops, get your hands dirty, and write code.

» Learn from the Best
Über Conf will bring together many of the industry's best project leaders, developers, authors, and trainers.

» Rates
Take advantage of our all-inclusive travel package. This package includes conference registration, 3 nights lodging, and airfare in the continental US.

Super Earlybird Rate (register by Monday, April 12): $1,250, or $2,100 all-inclusive
Earlybird Rate (register by Friday, May 14): $1,400, or $2,250 all-inclusive
Regular Rate: $1,550, or $2,400 all-inclusive

For more information please visit http://uberconf.com/ or [email protected]

Featured Speakers

Alex Antonov - Technical Lead on the Core Frameworks team at Orbitz Worldwide
Tim Berglund - Developer, Consultant, Author
Cliff Click - Chief JVM Architect of Azul Systems
Esther Derby - Co-author of "Behind Closed Doors: Secrets of Great Management"
Hans Dockter - Founder and Project Lead of Gradle
Keith Donald - SpringSource Principal & Founding Partner

Define Über
1: a superlative example of its kind
2: to an extreme degree

Topics at Über Conf
Languages on the JVM
Security
SOA and ROA
Enterprise Java
Cloud Computing
Java Internals
Agility
Mobile Dev (iPhone, Android)


REST and SOAP Web Service Testing with soapUI, Part 1
by Rohit Bhardwaj

In the enterprise world, web services act as a link to connect disparate applications. SOA and web services help companies interact seamlessly with each other. Users do not care where the service is rendered or which technology is used. Problems arise when resolving issues with an SOA or web service implementation. SoapUI helps by greatly simplifying the testing of web services, and it is extremely useful to developers, quality assurance, and managers. This article is the first of a two-part series in which we will delve into soapUI. We will discuss RESTful and SOAP web services and how to perform functional testing, and we will look at powerful Groovy scripting for manipulating data and validation.

Why soapUI for web service development and testing?

Service Oriented Architecture (SOA) and web services help companies connect seamlessly, so enhanced testing techniques are crucial when developing web services. Integration problems can occur at any level of a service. Web services are an easy way to connect to external software interfaces, but with each interface added there is another opportunity for errors, and those errors surface when debugging or integrating different web services.

Even a small mistake can make web services fail.

Testing becomes even more essential when multiple web services are called to provide an enhanced service to the customer. Common web service problems involve integration with other web services, performance, and scalability. For example, Google suffered service outages on September 24, 2009 and May 14, 2009, which created plenty of problems for existing web services built on Google services. It is extremely difficult to debug and fix such problems without web service test suites.

SoapUI effectively simplifies the testing of web services and is accessible to developers, quality assurance, and managers. Its graphical interface is one of the simplest to use, with support for both SOAP and RESTful web services. REST stands for Representational State Transfer. SOAP (Simple Object Access Protocol) is another industry standard: a protocol specification for exchanging structured information in the implementation of web services.

A key benefit of soapUI is its ability to inspect a web service and automatically generate web service requests and tests. SoapUI has greatly helped my company's testing effort, reducing the time for feature integration tests from two weeks to two days. SoapUI provides security features such as web service authentication and WS-Security, and its monitor makes it easy to watch and analyze traffic. SoapUI also provides command-line options for running tests, which helps in automating tests and running them as batch files, and it integrates with Ant scripts and Maven.

What is in this article?
In this article we will look at the following topics in soapUI:

• A simple SOAP web service
• A simple RESTful web service
• Functional testing in soapUI (including dynamic Groovy scripting)

In this article I have used soapUI Pro version 3.0.1. The next few sections will describe soapUI features with web service examples.


A simple SOAP web service

Let's use a web service to find a U.S. weather forecast for a week. The input parameter for this service is a valid zip code or place name in the U.S., and the output is forecast details for the next seven days.

Weather forecast SOAP service tutorial
The WSDL for this service is located here.

The following operations are supported in this WSDL:

GetWeatherByPlaceName – Get one week’s weather forecast for a place name (USA).

GetWeatherByZipCode – Get one week’s weather forecast for a valid Zip Code (USA).

Using soapUI, click on "New soapUI Project" in the File menu. In the Initial WSDL/WADL field, add the service WSDL URL. Choose the checkboxes to create requests and a TestSuite; you can also choose to store relative paths. Click "OK" and save this newly created project. Next, soapUI will give you the option to create default requests for each operation; click "OK". SoapUI will now add the SOAP/HTTP binding for the weather service to your project and create nodes for each operation. Figure BHA-1 shows the values to enter and options to choose when starting the new project.

Request and response of the weather web service
Now that you have added the weather service, you can inspect and test all requests. SoapUI created default requests for each operation in accordance with the WSDL and schema definitions. Let's first look at the first SOAP envelope in this example. The WeatherForecast WSDL contains the method GetWeatherByPlaceName. Navigate to this method from the WeatherForecastSoap directory in soapUI. Once Request 1 is open, enter the city whose weather you want for the next seven days. Boston is used as the place name in Figure BHA-2.

As you can see in Figure BHA-2, when you click the green arrow button at the top left, you will get back the response shown in Figure BHA-3.

A simple RESTful web service

Since soapUI 2.5, invoking and testing REST/HTTP services has become extremely easy. You can start with a WADL definition, in which case the resources, methods, and representations are created automatically by soapUI.

Figure BHA-1: New soapUI Project

Figure BHA-2

Figure BHA-3


Yahoo traffic REST web service tutorial
Let's start with a Yahoo web service called the Yahoo Traffic Web Service, which gives access to real-time traffic alert information for a given location. The traffic web service is a REST API whose input is a location (street, city, state). First create a new soapUI project and select 'REST web service', as shown in Figure BHA-4. Here, add the project name and select the options 'Add REST Service' and 'Relative Paths'. Click 'OK' without entering anything into the 'Initial WSDL/WADL' field.

You will first be asked to save the project to disk. When the 'New REST Service' dialog box appears, add the following link as the Service Endpoint, as shown in Figure BHA-5, and check the 'Extract Resource/Method' option.

http://local.yahooapis.com/MapsService/V1/trafficData?appid=YdnDemo&street=701+First+Ave&city=Sunnyvale&state=CA

Accept the default values for the parameters automatically extracted from the endpoint. SoapUI will create a 'New REST GET Method' named 'Method 1' and then open the newly created 'Request 1', which you can run with the green button at the top left.

Inspecting the WADL definition
SoapUI will automatically generate a WADL file for the Yahoo Traffic Web Service, containing its resources, methods, and representations. To see the new WADL file, right-click the 'Yahoo testing' service in the explorer view and click 'Show Service Viewer'. With this view, it becomes remarkably easy to inspect and test RESTful web services. The WADL file is shown in Figure BHA-6.

Functional testing in SoapUI

Functional testing is used to validate test scenarios, much like a unit test. SoapUI also supports integration testing, where two modules can be tested together. Groovy support helps in creating robust tests, including tests that link to a database for verification or to fetch 'initial seed' data; a sketch of such a step follows.
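As a small illustration of that database linking, here is a minimal Groovy Script step sketch. The JDBC URL, credentials, table, and property names are hypothetical; the groovy.sql.Sql API and the testRunner object model it uses are the same ones that appear in the listings later in this article.

// Hypothetical seed-data step: read a zip code from a database and store it
// as a TestCase property so that later request steps can reference it.
import groovy.sql.Sql

def sql = Sql.newInstance("jdbc:h2:mem:seed", "sa", "", "org.h2.Driver")
try {
    def row = sql.firstRow("select zip from seed_zipcodes where active = 1")
    testRunner.testCase.setPropertyValue("zipCode", row.zip as String)
    log.info "Seeded zipCode = ${row.zip}"
} finally {
    sql.close()
}

Subsequent request steps can then pull the value back out with a property expansion such as ${#TestCase#zipCode}.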

Test suites, test cases, and test steps
The next step is to develop a test suite and add a test case. SoapUI supports unit testing in the same spirit as JUnit, which enables test-driven development: first write a test that fails, then write the web service that makes the failing test pass. In soapUI, a TestSuite contains many TestCases, each TestCase tests one scenario for a web service, and a TestCase contains many TestSteps that execute in sequence. Load tests are run against a TestCase scenario to check whether the scenario meets its Service Level Agreements (SLAs). Table BHA-1 (below) describes the different types of TestSteps supported by soapUI.

Figure BHA-4: New REST Yahoo web service

Figure BHA-5: New REST Yahoo web service

Figure BHA-6: The traffic WADL definition is automatically generated


DataSource TestStep tutorial
The DataSource TestStep is part of soapUI Pro and supports data-driven tests. It provides a way to loop over a list of entries, making web service calls and validating the responses. The DataSourceLoop step creates the loop over the data source. Table BHA-2 describes the currently supported DataSources.

Adding assertions
SoapUI checks the response generated by a web service with assertions. SoapUI ships with a number of assertion types that help in validating a web service during development and testing. Table BHA-3 (below) describes the different types of assertions used in soapUI, and Figure BHA-7 shows an example of an XPath assertion that makes sure the severity is greater than zero. A scripted alternative is sketched below.
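When none of the built-in assertions fit, the Script Assertion type from Table BHA-3 accepts arbitrary Groovy. The following is only a sketch and is not taken from the article's example project; it assumes the response should contain a Title element and relies on the messageExchange and log variables that soapUI makes available to script assertions.

// Minimal Script Assertion sketch: fail the step unless the raw response body
// contains a <Title> element (a simple string check, no XML parsing).
def response = messageExchange.responseContent
assert response != null : "No response was received"
assert response.contains("<Title>") : "Response is missing a Title element"
log.info "Response size: ${response.length()} characters"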

Request: Sends a SOAP request and allows the response to be validated using a variety of assertions.
REST Request Step: Executes a REST request to a resource defined in the project.
HTTP Request Step: Executes an arbitrary HTTP request.
Property Transfer: Used for transferring property values between two test steps.
Groovy Script: Runs a Groovy script that can do more or less "anything".
Properties: Used for defining global properties that can be read from an external source.
Conditional Goto: Allows any number of conditional jumps in the TestCase execution path. Conditions are specified as XPath expressions and applied to the previous request step's response.
Delay Step: Pauses a TestCase run for the specified number of milliseconds.
Run TestCase Step: Runs another TestCase from within an existing one.
MockResponse Step: Waits/listens for an incoming SOAP request that can be validated and returns a mock response.
DataSource Step: Reads external data to be used as input to requests, etc. (soapUI Pro only).
DataSourceLoop Step: Used together with a DataSource to specify looping over external data rows (soapUI Pro only).
DataSink Step: Writes properties to external storage (soapUI Pro only).
DataGen Step: Generates property values (soapUI Pro only).

Table BHA-1: TestStep types

JDBC DataSource: Reads data from a JDBC data source.
JDBC Connection DataSource: Reads data through a JDBC connection configured at the project level.
Excel DataSource: Extracts data from an Excel (.xls) file.
Grid DataSource: Allows entry and management of data from within the editor.
XML DataSource: Extracts data from an XML property.
File DataSource: Extracts data from a columnar data file.
Directory DataSource: Reads files into properties.
Groovy DataSource: Opens the door to any kind of DataSource via a custom Groovy script.

Table BHA-2: DataSource types

Figure BHA-7: Assertion to make sure the severity is greater than 0


Property Transfers example
Property Transfers are used to copy values from response XML into a property, which makes it easy to extract values from a response. The test step can contain a number of "transfers"; each transfer is between a source and a destination property, and each can optionally use XPath/XQuery expressions (Figure BHA-8).

Code coverage
Just as there is code coverage for Java or .NET, soapUI offers web service coverage, in which web service calls are dynamically analyzed. The coverage analysis is done mainly for functional tests, which may use mock services and clients.

Web services in soapUI consist of operations, messages, and representations, and expected output can be verified by using assertions. Coverage analysis looks at how many elements or attributes are actually used during the tests. Because validation of web services is done using assertions, soapUI also analyzes how many elements or attributes have actually been asserted; this analysis is called "Assertion Coverage". Figure BHA-9 shows coverage for the Traffic service.

Schema Compliance *: Validates the response message against its XML schema.
Simple Contains: Checks for the existence of a token.
Simple Not Contains: Checks for the non-existence of a token.
SOAP Fault *: Checks that the response is a SOAP fault.
Not SOAP Fault *: Checks that the response is not a SOAP fault.
SOAP Response *: Checks that the response is a valid SOAP response.
Response SLA *: Checks that the response time is under a specified value.
XPath Match: Matches the result of a specified XPath expression against a predefined value.
XQuery Match: Matches the result of a specified XQuery expression against a predefined value.
Script Assertion: Allows a custom Groovy script for asserting the message exchange.
WS-Security Status *: Checks that incoming WS-Security processing was successful.
WS-Addressing Response Assertion: Checks that the response has valid WS-A header properties.
WS-Addressing Request Assertion: Checks that the request has valid WS-A header properties.

Assertions marked with a (*) are "singular assertions", meaning they can be added only once to a TestRequest.

Table BHA-3: Assertion types

Figure BHA-8: Transfer property style and title from response



Dynamic Groovy Scripting

The Groovy Script step is the most powerful feature of soapUI. Using a Groovy script, the developer can access the soapUI object model. Here are a few of the tasks that can be performed:

• Reading initial data from a properties file.
• Reading data from a data source for initial setup.
• Creating dynamic properties within a test case.
• Providing control flow, where the script decides the next step to go to.
• Performing validations and assertions.
• Writing results to a data source or external files.

The following are a few examples of Groovy scripts in soapUI.

Groovy script transfer property example
In the Groovy script in Figure BHA-10, the property 'Title' is read from a source property into a target property. You can also add assertions to validate the property.

Groovy script reading from a file example
Listing BHA-1 shows a Groovy script that reads properties from a file and assigns them to a Properties step in soapUI. Here "inputProperty.txt" is read and its properties are put into the target step "Properties".

Groovy script adding TestStep dynamically example
Listing BHA-2 shows a Groovy script adding a TestStep dynamically.

Figure BHA-10: Transfer from source property to target property

// read the file
def properties = new java.util.Properties();
properties.load( new java.io.FileInputStream( "inputProperty.txt" ));

def targetStep = testRunner.testCase.getTestStepByName( "Properties" );

// assign single property
targetStep.setPropertyValue( "newproperty", properties.getProperty( "newproperty" ));

// assign all properties from the file
def names = properties.propertyNames();
while( names.hasMoreElements() )
{
  def name = names.nextElement();
  targetStep.setPropertyValue( name, properties.getProperty( name ));
}

Listing BHA-1

// add a properties step
testRunner.testCase.addTestStep( "properties", "New Test step" );

Listing BHA-2


Conclusion

In this article I introduced soapUI for testing web services. SoapUI is an immensely powerful tool that enables users to test web services effectively during development and testing, and it helps in testing enterprise web services that interface with other products. We discussed both a simple RESTful web service and a simple SOAP web service, and we saw that agile, test-driven web service development is possible using soapUI. Dynamic Groovy scripting is the most powerful feature of soapUI: using Groovy scripts, the developer can access the soapUI object model and perform a variety of tasks. In the second part of this series we will explore load testing and web service simulation, and we will also look at a few testing scenarios for login services and template-driven testing.

References

http://www.soapUI.org

http://en.wikipedia.org/wiki/SoapUI

http://developer.yahoo.com

http://www.programmableweb.com/apis/directory/1?protocol=REST

http://www.xml.com/pub/a/2004/08/04/tr-xml.html

http://googlecode.blogspot.com/2006/12/beyond-soap-search-api.html

http://developer.yahoo.com/faq/#rest

About the Author

Rohit Bhardwaj works at Kronos Incorporated as a Principal Software Engineer. He has fifteen years of extensive experience in architecture, design, and agile development. He is an expert in application development with Service Oriented Architecture (SOA), REST, Cloud Computing, RIA, Android, Web Services, XML, XSL, SOAP, UDDI, JSON, and soapUI. Rohit is a Sun Certified Java Developer for the Java 1.5 platform. He earned his Master's in Computer Science from Boston University and Harvard University.

Rohit is a world-class speaker and has given presentations on topics such as SOA, REST and SPARQL for the Semantic Web, cloud computing, Android, RIA, agile development, test-driven development, and performance monitoring and scalability. Rohit can be reached at [email protected]


Your world class partner. Keeping you ahead of the curve.

Consulting

NFJS One has access to the richest technical talent in the world. We offer expert consultations on a variety of topics, including:

• Architectural Reviews
• Test Automation
• Web Frameworks
• Groovy and Grails
• Agile Introductions
• Finding Open Source Alternatives
• And More...

Our team has a solid foundation in a broad variety of technologies, and our industry contacts include experts from every major software area.

We welcome the chance to work alongside your team and help make your project a success.

Training

We offer training classes for teams of all sizes, providing instructional-style classes as well as hands-on workshops to immerse your team.

Our classes range from two to five days, and can be customized from introductory to expert level.

Current courses include:

• Migrating Away from Struts
• Automation Strategies
• Moving to Java 6
• Moving to REST
• Beginning Ruby & Rails
• Leveraging Continuous Integration

Our cutting-edge instructors can deliver an excellent hands-on experience to boost your team's productivity.

Mentoring

What better way to learn than to have a seasoned veteran working alongside your team? NFJS One and your team can work through the problems together as we first perform a high-level project triage, and then provide the additional expertise you need.

NFJS One also offers a research service for your CTO/CIO/CEO. When you want to stay on top of the latest technology and need to be sure that you're ahead of the curve, we'll provide precise position papers on the topics you need.

We're available to mentor your developers, team leads, or executives.

Stay ahead of the curve
NFJSOne.com
sales@nfjsone.com


Busting OSGi Myths and Misconceptions
by Craig Walls

Methods, classes, packages, JAR files...these are all mechanisms that we can use to organize the code within Java applications. Slicing an application into cohesive and loosely-coupled parts increases the maintainability, comprehensibility, testability, and reusability of those parts.

Methods and classes are especially effective at defining discrete units of functionality. But Java doesn't offer much for coarse-grained modularity. Packages appear to be a means for modularity, but ultimately offer little more than a weak organizational mechanism. JAR files also seem to be a way to define modules in Java—but once they're put into a classpath, the boundaries fall off and the contents of JAR files are laid bare among the contents of other JAR files on the classpath.

In the past few years, OSGi has garnered a lot of attention in the developer community as a way to achieve modularity on the Java platform. OSGi extends the concept of a JAR file as a bundle that can contain both public and private contents. Moreover, by enabling bundles to publish and consume services, applications can be assembled from bundles that are highly cohesive and loosely-coupled.

Along with the attention given to OSGi has come some skepticism and even some misinformation. In this article, we'll look at several of the most common myths and misconceptions concerning OSGi and attempt to debunk or confirm them. Let's start with the myth that I hear most often: that OSGi is too heavyweight.

“A place for everything and everything in its place.”

package com.habuma.hello.service.impl;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

import com.habuma.hello.service.HelloService;

public class HelloPublisher implements BundleActivator {

    private ServiceRegistration registration;

    // register a HelloImpl instance under the HelloService interface when the bundle starts
    public void start(BundleContext context) throws Exception {
        registration = context.registerService(
                HelloService.class.getName(), new HelloImpl(), null);
    }

    // withdraw the service when the bundle stops
    public void stop(BundleContext context) throws Exception {
        registration.unregister();
    }
}

Listing WAL-1: HelloPublisher.java

Myth: OSGi is Heavyweight

The term "heavyweight" is often used to mean that a framework or specification imposes itself too heavily, making excessive demands of code that is to work within the framework. Specifically, when a framework requires that you write components that extend framework-specific classes or implement framework-specific interfaces, you're dealing with a heavyweight framework. EJB 2.x is considered to be the epitome of a heavyweight specification. Even the simplest EJB would require:

• An implementation class that implements javax.ejb.SessionBean and implements a handful of lifecycle methods (which are usually left empty).

• A home interface that extends either javax.ejb.EJBHome or javax.ejb.EJBLocalHome

• A business interface that extends either javax.ejb.EJBObject or javax.ejb.EJBLocalObject

That’s quite a lot to ask for a single component! EJB 2.x classes and interfaces leave their mark all over your application code.

Is OSGi heavyweight in the same way that EJB was heavyweight? For this myth to be busted, I’ll have to show that OSGi does not unduly invade our code.

At first glance, you might say that OSGi is heavyweight because bundle activators must implement the org.osgi.framework.BundleActivator interface and implement its start() and stop() methods.

For example, consider HelloPublisher.java in Listing WAL-1, a simple bundle activator that registers an instance of HelloImpl as an OSGi service exposed through the HelloService interface.


Clearly, if you’re writing a bundle activator, you’re allowing the OSGi API to invade your code. Doesn’t this mean that OSGi is heavyweight?

If this myth is to be confirmed, I'd have to show that OSGi requires you to work directly with the OSGi API. All I've done so far is show that you can write code against the OSGi API. But that doesn't mean that you must do it that way.

The OSGi specification offers three ways of publishing and consuming services: Declarative Services (DS), Blueprint Services, and programmatically. My preferred way to publish services is to use the OSGi Blueprint Services. With Blueprint Services, an instance of HelloImpl can be published declaratively by creating an XML file within the bundle’s /OSGI-INF/blueprint directory. Listing WAL-2 shows a Blueprint Service declaration of the HelloService.

No Java code was harmed in the publication of the HelloService! In fact, there’s no need for the bundle activator at all. The service implementation, interface, and Blueprint Service declaration file are all that’s needed. The only place where OSGi is involved is in the Blueprint Service declaration file. The rest is just plain-old Java code...completely unaware that it will be used as an OSGi service.

If you’re thinking that the Blueprint Service declaration file looks a little like a Spring configuration file, then you’re right! The Blueprint Services specification is based on Spring-DM, and Spring-DM is the reference implementation of Blueprint Services.

Publishing services with OSGi Blueprint Services is just one bit of evidence that working with OSGi doesn’t require working directly with the OSGi API. I could go on and show you several more examples, but I’ve got other myths to bust. Suffice it to say that most anything you could want to do with OSGi can be done declaratively and without importing anything from an OSGi-specific package.

MYTH BUSTED

Myth: OSGi is Complex

When I hear about the alleged complexity of OSGi, what people are usually referring to is the manifest file.

In truth, the current version of OSGi requires that you put only one entry in a bundle's /META-INF/MANIFEST.MF. The Bundle-SymbolicName entry is used to identify a bundle and is the only compulsory header in a bundle's manifest. As shown in Listing WAL-3, the manifest for the hello service bundle may contain the following Bundle-SymbolicName entry:

Of course, if all you do is add a Bundle-SymbolicName to the manifest, then you’ll have a perfectly valid OSGi bundle that’s probably perfectly useless. Depending on what the bundle is intended to do, you may need to add additional OSGi headers, such as:

Import-Package – Specifies one or more packages that need to be imported from other bundles into the bundle’s class space.

Export-Package – Specifies one or more packages that are to be exported from this bundle and made available for import by other bundles.
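For the hello service bundle used in Listings WAL-1 and WAL-2, a manifest that uses these headers might look something like the sketch below. The version value is illustrative rather than taken from the article, and Import-Package: org.osgi.framework is only needed as long as the bundle still ships the activator from Listing WAL-1.

Bundle-SymbolicName: com.habuma.HelloWorldService
Bundle-Version: 1.0.0
Import-Package: org.osgi.framework
Export-Package: com.habuma.hello.service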

But if you’re writing bundle manifests by hand, you’re doing it wrong. Tools such as Peter Kriens’ Bnd or SpringSource’s Bundlor can automate much of the management of bundle manifests.

MYTH BUSTED

Myth: You Don’t Need OSGi to Achieve Modularity

I’ll concede that you don’t need OSGi to build modular applications. Modularity is ultimately a design discipline. With the right mindset and a lot of discipline, it’s very possible to design modularity into an application without OSGi.

But before we label this myth as confirmed, let me point out that there’s not much in the Java platform that encourages modularity, much less enforces it. Even the most disciplined and well-intentioned Java developers

<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
           default-activation="lazy">

  <bean id="helloService"
        class="com.habuma.hello.service.impl.HelloImpl" />

  <service ref="helloService"
           interface="com.habuma.hello.service.HelloService" />

</blueprint>

Listing WAL-2: /OSGI-INF/blueprint/hello-service.xml

Bundle-SymbolicName: com.habuma.HelloWorldService

Listing WAL-3: OSGi entry in /META-INF/MANIFEST.MF


make mistakes. In short, if modularity isn't enforced, then it doesn't exist. Although OSGi doesn't strictly enforce modularity, it does strongly encourage it. With OSGi, it's easier to develop modularly than not to.

And let me also suggest that it’s possible to employ dependency injection without Spring or Guice. And you don’t need an ORM or JPA to do object persistence. An IDE is unnecessary when it comes to developing in Java. And you don’t need JUnit to write well-designed and working code. You don’t need any of these things; but I am not ready to give them up.

I think OSGi is an essential choice in developing modular applications. But because it is possible to achieve modularity without it, I’ll call this myth...

MYTH PLAUSIBLE

Myth: OSGi is For Eclipse Users

This myth is probably less prevalent than it was a year ago. But I still occasionally hear from someone who thinks that OSGi is an Eclipse thing and isn’t applicable to people who work with IntelliJ IDEA or some other IDE.

The origin of this myth seems to be that, for quite a while, the vast majority of articles about OSGi were written about the Eclipse Plugin Development Environment (PDE). If you were to read a handful of articles about OSGi, you could easily walk away with the impression that Eclipse is the only way to develop OSGi bundles.

Truthfully, however, OSGi is not coupled to any particular IDE. Certainly, Eclipse PDE offers some great features for OSGi developers. But IntelliJ users can develop OSGi bundles using the Osmorc plugin.

Even with PDE and Osmorc as options for developers and even though I use Eclipse (SpringSource Toolsuite, to be precise) for Java editing, I find it easy enough to edit most OSGi artifacts (such as manifests, Bnd files, and blueprint XML files) using a plain text editor.

To sum it up, Eclipse is certainly a fantastic option for working with OSGi. But OSGi doesn't require Eclipse. You're free to pick whichever Java development environment suits you best.

MYTH BUSTED

Myth: OSGi is Not A Standard

Personally, I’m not sure that standards are as important as some people think. Spring isn’t a standard. Hibernate isn’t a standard (although JPA was heavily influenced by Hibernate). Many of these established frameworks and libraries are rarely questioned anymore, despite the fact that they aren’t directly backed by a JSR. Solutions, after all, are more important than standards.

But, if standards are something that are important to you or your company, then rest easy knowing that OSGi is, in fact, a recognized Java standard.

To start, OSGi is self-standardized by the OSGi Alliance, the governing body that produces the OSGi specification.

But bringing it closer to Java, OSGi R4.1 is the subject of JSR-291 “Dynamic Component Support for Java SE”. This Java Specification Request (JSR) was given final approval and was released in mid-2007. The ballot included 12 votes in favor, two votes against, and two members of the expert panel did not vote.

In fact, despite recent efforts in the Java Community Process (JCP) to produce a modularization specification, OSGi is the only recognized modularization specification that has been approved within the JCP.

MYTH BUSTED

Myth: OSGi is Too New

The buzz around OSGi is fairly new. Until Eclipse 3.0 based its plugin infrastructure on OSGi in mid-2004, most developers hadn’t even heard of OSGi. OSGi’s popularity heated up again in late 2007 when SpringSource started promoting an OSGi-based development model in some Spring projects.

But OSGi is actually older than you may think. By the time you read this, OSGi will have celebrated its 11th birthday, having reached many milestones along the way, including those shown in the timeline in Figure WAL-1.

As you can see, OSGi has been around for a long while; longer than many technologies that are readily accepted without question.

MYTH BUSTED


Myth: OSGi is Not Ready for the Real World

The real world is a scary place. Applications must be robust and scalable to survive. Can OSGi stand up to real world pressures?

In my experience as a software developer, the litmus test for whether something is a real world application or not has always been “Would you use it to save lives?” and “Would you use it to launch rockets?”

I can’t say for sure how many lives OSGi has saved or how many rockets have been launched with it. But I do know that OSGi has played a part in medical imaging systems and has been used at NASA. Siemens Medical Solutions and ProSyst Software used OSGi in the development of a solution for maintenance of medical imaging solutions. And NASA is using OSGi on its Mars rover missions.

If that’s not enough to convince you that OSGi is ready for real world problems, then consider that since version 3.0, the Eclipse IDE has used OSGi as the basis of its plugin system. And virtually every major Java application server is either based on OSGi or has plans to build upon OSGi. And a few of those servers are even starting to expose OSGi’s modular programming model to developers.

MYTH BUSTED

Myth: There's no tool support for OSGi

Despite common belief, there are several great tools for working with OSGi. Certainly, there’s room for more tooling around OSGi, yet there is no dearth of OSGi tools, as some have suggested.

When it comes to OSGi tools, the first tool that comes to mind is Peter Kriens’ Bnd. Bnd is a remarkable little tool for creating and diagnosing problems with OSGi bundles. Using Bnd you can view a bundle’s manifest and contents, wrap a non-bundle JAR file so that it becomes an OSGi-ready bundle, create a bundle given a specification and classpath, and verify the validity of a bundle’s manifest.

What’s especially interesting about Bnd is that, although it comes as a single JAR file, it is actually three tools. If you run it with java -jar bnd.jar, it is a command-line tool. If you add it to the Eclipse plugins directory, it is an Eclipse plugin. And, it has an Ant task built into it so you can use it in an Ant build.

Similar to Bnd is SpringSource’s Bundlor. Bundlor addresses many of the same problems as Bnd, but with some additional features that make for a better experience when used within Eclipse.

And there are a lot more tools for OSGi...more than I could possibly list here. But to name a few of my favorites: Pax Runner, Pax Construct, Pax URL, Apache Felix File Install, Apache Felix Maven Bundle Plugin, Apache Felix Web Console, Spring Roo’s Bundlor Add-on.

And there are hundreds more tools for working with OSGi. Just looking at the Felix, Equinox, and Pax web sites alone should give you a great deal of tools to start with. Beyond that, asking Google will turn up several dozen more tools.

MYTH BUSTED

Figure WAL-1: The life and times of OSGi


Myth: OSGi is Difficult to Test

In conventional applications there are several levels of tests that can be written. At the very lowest level, unit tests assert the correctness of an individual class or method. Stepping up a bit, integration tests check the correctness of several units in concert.

These kinds of tests are still applicable in OSGi and are written in pretty much the same way. Deep down in every OSGi bundle is a bunch of classes that can be tested in a unit test. And the classes within a bundle can be tested together in an intra-bundle integration test.

In addition to the conventional forms of testing, OSGi adds inter-bundle testing. Inter-bundle testing involves installing one or more bundles into an OSGi framework (such as Equinox or Felix) and then making assertions about how those bundles work together, what packages they export, and (even more interesting) the services that they expose.

Testing OSGi bundles “in-container” may sound difficult, but it’s actually quite easy. Pax Exam is an extension to JUnit 4 that loads a selection of bundles into an OSGi runtime of your choosing along with an on-the-fly bundle that contains the test class itself. From within the OSGi framework, the test can access and make assertions against services and exports from other bundles.

Spring-DM also includes support for in-container tests that is quite similar to Pax Exam. Spring-DM’s testing support differs from Pax Exam in that it is Spring-aware and tests can be autowired with references to OSGi services. But where Pax Exam is based on JUnit 4, Spring-DM’s testing support is based on JUnit 3.
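To make the idea concrete, here is a rough sketch of what an in-container test body tends to look like. It is not tied to Pax Exam or Spring-DM in particular; it simply assumes that the harness hands the test a BundleContext (both frameworks arrange this, each in its own way), and it reuses the HelloService interface from Listing WAL-1.

// In-container test sketch: assert that some bundle has published HelloService.
import org.junit.Test;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import static org.junit.Assert.assertNotNull;

import com.habuma.hello.service.HelloService;

public class HelloServiceInContainerTest {

    // Supplied by the test harness (Pax Exam injection, Spring-DM autowiring, etc.)
    private BundleContext context;

    @Test
    public void helloServiceShouldBePublished() {
        ServiceReference ref = context.getServiceReference(HelloService.class.getName());
        assertNotNull("Expected a published HelloService", ref);
        HelloService hello = (HelloService) context.getService(ref);
        assertNotNull("HelloService reference could not be resolved", hello);
        context.ungetService(ref);
    }
}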

Clearly, testing OSGi bundles isn’t that difficult. Therefore, this myth is...

MYTH BUSTED

Myth: OSGi Will Be Replaced by JSR-294 and Jigsaw

I find that there are a lot of people who think that OSGi may be okay, but with JSR-294 and Project Jigsaw looming they might as well wait—that OSGi will be obsolete when Java 7 is released. Before I can attempt to bust this myth, it’s important to understand what these two projects entail.

JSR-294's title tells most of its story. The "Improved Modularity Support in the Java Programming Language" indicates that JSR-294's mission is focused on language features to support modularity. In addition to language changes, there will also be some JVM changes to support modularity. JSR-294 is not, however, a module system. OSGi may leverage some of JSR-294's features, but OSGi will not be replaced by JSR-294.

Project Jigsaw, on the other hand, is a module system that takes advantage of JSR-294. Its primary aim is to modularize the JDK itself. The idea is to break the JDK up into pieces so that you only need to install what you need (after all, how many projects are actually using the CORBA stuff that comes in the JDK?). Jigsaw could be used to modularize applications built in Java, but the focus of the project is squarely on the JDK.

In any event, JSR-294 is slated to be part of Java 7. Given that it involves language and JVM changes, this implies that JSR-294 won't be usable with older versions of Java. Contrast this with OSGi, which works fine with versions of Java going back as far as Java 1.3 (not to mention other languages such as Groovy and Scala that also run on the JVM).

Jigsaw has an even more dubious support story. Jigsaw is being grown out of the OpenJDK project and will not be an official part of the Java 7 specification. Therefore, it may not be supported in all Java SE 7 implementations.

To put this myth to rest, also consider that JSR-294 and Jigsaw only speak to the issue of defining modules whose contents may or may not be visible to other modules in an application. Neither provides for the service-oriented approach to development afforded by OSGi's service model.

MYTH BUSTED

Myth: OSGi is Impossible to Use With Existing Applications

This almost sounds like an admission that modularity is too hard to do on your own, without something like OSGi to guide you. After all, if you could do it on your own, then your application would already be broken into modules and adding OSGi to manage those modules would be a cinch.

It is certainly much easier to start using OSGi on an application early on and let the application grow with


OSGi to underpin the modular design. Without some sort of module system in place early on, a project’s code base will tend to veer away from modularity as it grows and it will become more difficult to bring it back on course.

But that does not mean that it is impossible to work OSGi into an existing application to make it more modular. You will just need to develop a plan to move toward modularity and carefully follow it to its end.

For example, suppose you have a traditional web application that is deployed as a big monolithic WAR file. I propose the following stages to bring it in line with OSGi:

1. Convert the WAR file into an OSGi bundle by adding a Bundle-SymbolicName header to its manifest, as well as a Bundle-ClassPath header that references /WEB-INF/classes and each of the JAR files in /WEB-INF/lib (a sketch of such a manifest follows these steps).

2. Extract the third-party libraries out of /WEB-INF/lib and deploy them as bundles alongside the WAR bundle.

3. Extract proprietary libraries out of /WEB-INF/lib and deploy them as bundles alongside the WAR bundle.

4. Extract classes out of /WEB-INF/classes into one or more external bundles (if it makes sense to do so).
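As a sketch of stage 1 only, the WAR's manifest might gain entries along the following lines. The symbolic name and library file names are placeholders rather than anything taken from a real project:

Bundle-SymbolicName: com.example.legacywebapp
Bundle-ClassPath: WEB-INF/classes,
 WEB-INF/lib/some-library.jar,
 WEB-INF/lib/another-library.jar

Each continuation line in a manifest must begin with a single space, and every Bundle-ClassPath entry points at a location inside the bundle itself.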

Is OSGi impossible to work into existing applications? Not at all. Is it difficult? Perhaps, but much of the difficulty has to do with how modular the application is already. Adapting an application that is referred to as a “big ball of mud” to OSGi might be very difficult. But if you’re already taking steps toward modularity, then adding OSGi to the mix will probably not be that big of a deal.

MYTH BUSTED

Myth: Third Party Libraries Aren’t OSGi-Ready

In order for a JAR file to be used as an OSGi bundle, it must have (at least) a Bundle-SymbolicName entry in the JAR’s manifest. And, if any of its contents are to be used by other bundles (a likely scenario), then it should export the public packages using an Export-Package header in the manifest.

Fortunately, as OSGi has gained popularity, more and more Java libraries are coming in the form of OSGi bundles. If you're not sure, try installing the JAR file in an OSGi framework and then starting it. If it starts, then it's a valid OSGi bundle. You can also crack open the JAR file and examine the manifest to determine whether it's OSGi-ready or not.

If one of the libraries that your application depends upon is not OSGi-ready, then all is not lost. You have a few options to be able to use that JAR file within an OSGi application.

First, if the JAR file is a library within your project and you have the source code for it, then you have no excuses for not adding the manifest entries. You can either create the manifest by hand or use Bnd (or a Bnd-based tool such as the Felix Maven bundle plugin) to generate the manifest automatically.

If it's an open-source library, then take a look in SpringSource's Enterprise Bundle Repository or one of the other bundle repositories on the web. You might find a ready-made OSGi-ified version of your library there.

If you can't find a bundle of your library anywhere, then no problem. Just use Bnd to wrap the JAR file as a bundle. What this does is create a new JAR file with the original JAR file embedded within it and the appropriate Export-Package entries to export the JAR's packages.
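To give a feel for what wrapping looks like, a Bnd descriptor for this job is typically only a few lines. The file names and symbolic name below are placeholders; only the instruction names (Bundle-SymbolicName, Bundle-Version, Export-Package, and Bnd's -classpath) are standard Bnd/manifest vocabulary, and this is a sketch rather than a recipe taken from the article:

# legacy-lib.bnd - wrap a non-OSGi JAR (all names here are hypothetical)
-classpath: legacy-lib.jar
Bundle-SymbolicName: org.example.legacylib
Bundle-Version: 1.0.0
Export-Package: *

Feeding a descriptor like this to Bnd (from the command line, its Ant task, or the Eclipse plugin mentioned earlier) produces a new JAR whose manifest exports the wrapped packages.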

So, while not every JAR file out there is OSGi-ready, you shouldn’t let that stop you from using them in an OSGi application.

MYTH BUSTED

Myth: OSGi Doesn’t Have a Cool Name

OSGi sounds complex. OSGi sounds heavyweight. OSGi sounds like a painful medical procedure.

Okay, I'll concede that OSGi doesn't have a hip and cool name like Spring or Hibernate. Even the Jigsaw project has a much more fashionable name than the lackluster OSGi.

OSGi’s name was originally an acronym for “Open Services Gateway Initiative”, reflecting its initial focus on development of home gateway systems. Since then, however, OSGi’s approach to modularization has proven useful in many applications beyond home gateways and thus the acronym has lost its meaning.

Although the name has stuck, it's no longer an acronym at all. It's just "OSGi". Many OSGi-related projects have cool names like Aries, Gemini, Pax, Equinox, and Felix. But OSGi still has its not-so-cool name. Therefore, I unfortunately must rule this myth...

MYTH CONFIRMED


Conclusion

Despite being surrounded by a cloud of misinformation, OSGi is a great way to carve up an application into discrete modules that can be developed, tested, and deployed independently. In this article, I've tried to dispel many of the myths that I've heard regarding OSGi.

There’s a lot of benefit in building modular applications. Factoring an application into modules makes it easier to maintain, understand, test, and reuse. OSGi is the Java standard for modularity that is both simple and lightweight.

If you’ve been avoiding OSGi because of any of the myths in this article, then I challenge you to give it another look.

References

Peter Kriens’ Bnd tool: http://www.aqute.biz/Code/Bnd

SpringSource Bundlor: http://www.springsource.org/bundlor

Osmorc plugin for IntelliJ IDEA: http://www.osmorc.org/

OSGi’s JSR: http://jcp.org/en/jsr/summary?id=291

Equinox Helps NASA Improve Efficiency of Interplanetary Missions: http://www.eclipse.org/equinox-portal/case_studies/nasafinal.pdf

OPS4J (Pax): http://www.ops4j.org

Apache Felix: http://felix.apache.org

Eclipse Equinox: http://www.eclipse.org/equinox

About the Author

Craig Walls has been professionally developing software for over 15 years (and longer than that for the pure geekiness of it). He is a Principal Consultant with Improving Enterprises in Dallas, TX and is the author of Modular Java (published by Pragmatic Bookshelf) and Spring in Action and XDoclet in Action (both published by Manning). He’s a zealous promoter of the Spring Framework, speaking frequently at local user groups and conferences and writing about Spring and OSGi on his blog. When he’s not slinging code, Craig spends as much time as he can with his wife, two daughters, 6 birds and 2 dogs.


NFJS One provides access to the top technical talent in the industry. Our services include in-depth public and private training sessions, project consulting engagements, team mentoring, and on-site private events.

Investing in your people should be your number one priority

The benefits of keeping your folks sharp and ahead of the curve are endless:

Increase your organization's productivity
Reduce development costs
Accelerate employees' learning curves

The NFJS One Model:

NFJS One takes pride in ensuring a recognized expert is matched to your organizational strategies and initiatives. Whether it is a private 3-day training class on a specific tool, an organizational assessment, or a project coach, our first goal is to get a grasp on your environment today.

What are your strengths/weaknesses (toolset, process, initiatives, etc.)?
What challenges are preventing you from meeting your goals?

Once we come to an agreement on services and benefits, NFJS One will align you to one of our 75+ experts – ensuring this person is a 5 star resource on the subject matter at hand.

Training:
Our courses range from beginner to expert on various tools, processes, frameworks, languages, and more. We align our courses to your environment, and we don't come in for a lecture – be prepared to get your hands dirty and leave ready to use what you've learned tomorrow.

Mentoring/Consulting:
We have various models to meet our customers' requirements, including organizational assessments, agile coaching, and technology evaluations.

Private Events:
NFJS One is an entity of the No Fluff Just Stuff Software Symposium. If you are ready to host a professional conference with the best speakers in the industry, NFJS One is your provider.

[email protected]


Open Source Business Intelligence
by Tim Berglund

As the relentlessly improving economics of data storage and retrieval continue to reward businesses for recording ever-greater volumes of data, wise developers see a greater priority in understanding the basics of business intelligence tools and concepts. Traditionally these tools have been the province of specialized data architects and heavily funded strategic initiatives. Their strategic nature is more obvious than ever, but now free and open-source tools have emerged that enable nontraditional data specialists to take control of Big Data.

Big Data

Storage and computing are cheap and getting cheaper. The volumes of data collected by businesses and governments today are often expressed in staggering numbers that would have been prohibitively expensive or outright impossible just a few years ago. In the June, 2008 issue of Wired magazine, Chris Anderson illustrated the progression of big data. A 1TB hard drive, at the time of this writing an $85 device, holds 260,000 songs. The Hubble Space Telescope has generated 120TB of imagery in its storied, 20-year history. The Large Hadron Collider will generate 330TB of data each week when fully operational. Google’s servers process more than 1PB of data every 72 minutes.1

Most of us face much more pedestrian concerns than searching for the Higgs Boson or indexing the entire web every hour and a half. However, the decreasing costs of data storage and analysis are bringing entirely new classes of data processing problems into the view of the ordinary developer. More importantly, the professional open source business model pioneered a decade ago by Red Hat and others is providing low-cost tools to meet this emerging need, working to reverse the long-term association of data analysis tools with six-figure license fees, marching armies of consultants, and golf. Fortunately for many businesses, open source business intelligence is coming into its own.

Isn’t Business Intelligence an Oxymoron?

The label Business Intelligence prompts jokes about our favorite management missteps and corporate blunders, but the category has nothing to do with the skill of any particular management team or corporate board. Rather, business intelligence (BI) is a suite of tools, concepts, and practices oriented towards taking the raw data on which a business operates and turning it into actionable information that decision makers can trust.

On a fundamental level, businesses conduct transactions with their trading partners, exchanging business output for valuable consideration—usually money. A business’s output might be consulting services, information, an entertainment experience, the brokered products or services of another business, a manufactured device, or any other valuable thing. In the process of producing its output, a business can engage in any number of smaller, internal transactions, with many differentiated operational units inside the business trading goods and information for the ultimate purpose of generating customer value.

BI concerns itself with the quantitative analysis of those transactions. Each unit of business exchange can be measured, and each measurement can be qualified by a number of related attributes. A measurement might be a sales total, the duration of a tech support phone call, or the weight of a shipment. A qualifying attribute might be the customer making a purchase, the call center agent taking a call, or the carrier selected to deliver goods.


These measurements and their metadata are typically scattered around the enterprise in diverse formats in potentially many separate databases and database technologies. That diversity is a normal consequence of organizational differentiation, a phenomenon which occurs in large enterprises in order to break them down into small enough pieces for decision makers to be able to think about them. Due to the localized work of differentiated departments, enterprise data is stored in forms that create local efficiencies optimized to the needs of the people and systems directly using it. Individually these systems serve locally optimized business purposes, but together they create a significant global suboptimization. For a BI stack to function, the operational data of the enterprise must be re-integrated into one data store where it can be represented in a consistent format and made accessible to standard database management tools. This store is called the data warehouse.

Once the business’s data is made accessible in a data warehouse, it also must be made visible through reports. Decision makers have predictable questions they will ask of the data, and reporting tools must be made available to render answers in a convenient and readable way, perhaps employing sophisticated visualizations to render complex data to the eye. Through a variety of delivery mechanisms and formats, reports can provide up-to-date answers to well-understood questions.

Beyond reporting, analytics tools can expose ad-hoc views of enterprise data to enable creative, entrepreneurial decision makers to explore data dynamically. A well-trained manager with an analytics console can approach the business’s data with an approximate idea of the question she intends to ask, then iterate on the question, getting more and more refined answers as her understanding of her own query and the business results she’s exploring improve over time.

Let’s take a more detailed look at data warehousing, reporting, and analytics in turn.

Data Warehousing

A data warehouse is a unified store of enterprise data maintained in parallel with the existing, operational data stores used by the business’s applications. It is structured differently than a typical transactional database, featuring simple but strict parent-child hierarchies and a high degree of denormalization relative to most application databases. A specialized batch process updates it at regular intervals, making it a purely derived, read-only version of the business’s live data.

A complex organization will support a diverse set of applications, each tailored to the unique needs of an individual group or organizational unit. These applications may or may not share a single database; moreover, not all of the data on which an enterprise operates may even be stored in a relational database. A minority of it might be in non-relational mainframe stores, spreadsheets, flat files, XML, or other formats. For the data that is in a relational database, its schema is probably heavily normalized, designed to eliminate data duplication and enforce the integrity of parent-child relationships between domain entities.

The data warehouse, on the other hand, is stored in a single relational database. The diverse data sources of the enterprise are brought together in one place, where the data quality problems can be addressed and varying accounts of the physics of the business—different units of measurement, divergent models of business activity, and parallel lists of common entities like customers or locations—can be conformed to one agreed-upon standard. Furthermore, this single account of the business’s activity is not stored in a heavily normalized schema, but is intentionally denormalized. Data duplication is not a cost within the data warehouse as it would be in the transactional database, but is instead a necessary feature designed to enable queries that are easy to write and fast to execute.

Figure BER-1: An example of a normalized schema of the kind that application software might use.

Figure BER-2: A star schema of the kind that might be used in a data warehouse. This schema contains the same data as shown in Figure BER-1.


The fundamental data warehouse design pattern is the star schema. At the center of the star is a fact table. A fact table describes a series of discrete, measurable business events like shipments, orders, or support calls. Each row in the fact table can contain as many numeric measurements as can be associated with the event the fact describes. A sales order might have a subtotal, a tax amount, and a shipping amount, while a support call might have a hold time, a duration, and a count of times the caller was forwarded. Fact tables also contain many foreign keys to dimension tables. Dimension tables contain the non-numeric attributes of a fact, or perspectives from which a fact might be understood. A sales order might have salesperson, customer, and shipper dimensions, while a support call might have customer, service representative, and product dimensions.

The organization of the data warehouse into facts and related dimensions presupposes that later analysis will focus on selecting and aggregating business measurements in light of the various perspectives provided by the dimension tables. Exactly how this works will become clear when we look at analytics later on.
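Even before we get to analytics, the shape of a query against such a star schema can be made concrete with a small sketch. The Java/JDBC program below joins a fact table to two dimensions and aggregates a measurement; the table names, column names, and connection details are hypothetical and stand in for whatever your warehouse actually contains.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StarSchemaQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical warehouse connection; adjust the URL and credentials for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://warehouse/dw", "report", "secret")) {

            // Join the fact table to two dimension tables and aggregate a measurement.
            String sql =
                "SELECT sp.name, pr.product_family, SUM(f.subtotal) AS total_sales " +
                "FROM fact_sales f " +
                "JOIN dim_salesperson sp ON f.salesperson_key = sp.salesperson_key " +
                "JOIN dim_product pr ON f.product_key = pr.product_key " +
                "GROUP BY sp.name, pr.product_family " +
                "ORDER BY total_sales DESC";

            try (PreparedStatement stmt = conn.prepareStatement(sql);
                 ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s / %s: %s%n",
                        rs.getString("name"),
                        rs.getString("product_family"),
                        rs.getBigDecimal("total_sales"));
                }
            }
        }
    }
}

Because the schema is denormalized around the fact table, the query needs only simple joins from the fact to its dimensions, which is exactly the property the star schema is designed to provide.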

The work of extracting the source data, conforming it to business norms, transforming it into a star schema, and finally loading it into the data warehouse is performed by a specialized program called an ETL (Extract, Transform, and Load) tool. Traditionally, enterprise-grade ETL toolsets have been a costly piece of software infrastructure, but now several open-source options have emerged with free and low-cost licenses.

Talend Open Studio (TOS) is one such option. It is an Eclipse-based visual designer providing hundreds of built-in components for data integration, transformation, and job control. Components can be dragged and dropped onto a design palette with data flows and execution paths mapped visually between them. External resources accessed by a job, like a database, a flat file, a spreadsheet, or an FTP site, can be stored in the project’s metadata repository, so static configuration information need not be duplicated across the project.

The built-in component collection in TOS is quite rich, and covers most of the integration and transformation tasks a typical ETL or data integration job will need. When the standard components aren’t adequate, though, TOS provides components to run arbitrary Java or Groovy code inside of a job. And when custom requirements overwhelm that facility, Talend provides the Talend Forge site, which contains a repository of custom components and support for developing your own.

TOS provides a rich debugging and execution environment for developing ETL jobs, but is not adequate as a production environment by itself. For deployment, Talend generates an executable JAR artifact containing all generated code and all required dependencies to run from a command line2. Talend Open Studio is licensed under the GPL 2.0.

Reporting

An integrated, conformed data warehouse with a smoothly running ETL process providing hourly updates to the data is an object of enterprise software delight, but it is of little use if the data cannot be observed. The primary task of the reporting system is to expose data to users in a way that makes it easy to understand and visualize. A careful consideration of reporting also further underscores the value of the data warehouse.

Reports answer questions that are known in advance. A report definition generally consists of a fixed data query and report layout. The layout specifies the location and formatting of data tables, charts, and static decoration like a company logo or report title. The report’s query, which provides the data to populate the layout, usually will be parameterized, being able to be run over a certain date range, for a certain product line, or on behalf of a certain salesperson or customer service representative. Even with these parametric inputs, however, the basic structure of the report doesn’t change over time. The report reliably produces a timely answer to a known, fixed question.

Figure BER-3: The Talend Open Studio visual designer showing a job that creates a fact table from a sales database.

This doesn’t imply that report design must be difficult, or must be constrained only to the hands of specialists; report writing is not software development. Power users in the business are quite capable of designing their own reports when trained on a comprehensible database schema and given a visual design tool. Jasper iReport is one such tool. It is free and open-source, licensed under the GNU Affero General Public License 3.0. iReport provides an easy-to-use, drag-and-drop environment for designing, testing, and executing reports against an existing database.

However, limiting reports to a desktop client application is not likely to be a part of a winning data delivery architecture. You will almost certainly want to integrate reporting into your enterprise applications, delivering them through the web, to mobile devices, or through internal web services—not through Jasper iReport or a similar tool. JasperReports is a first-class Java API providing integrated reporting functionality. It is capable of digesting the report definitions created by iReport and rendering them into HTML, PDF, CSV, Excel, and other file types using data provided by the application in a number of integration-friendly formats. JasperReports is free and open source, licensed under the GNU Lesser General Public License (LGPL).
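As a rough illustration of that integration path, the sketch below compiles a report design produced with iReport, fills it with parameters against a JDBC connection, and exports the result to PDF using the JasperReports manager classes. The report file name, parameter names, and connection details are hypothetical; treat this as one possible way to drive the API, not as code from the article.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.HashMap;
import java.util.Map;

import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;

public class SalesReportRunner {
    public static void main(String[] args) throws Exception {
        // Compile the report design created with iReport (hypothetical file name).
        JasperReport report = JasperCompileManager.compileReport("sales_by_region.jrxml");

        // Parameters referenced by the report's query, e.g. a date range (hypothetical names).
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("START_DATE", java.sql.Date.valueOf("2010-02-01"));
        params.put("END_DATE", java.sql.Date.valueOf("2010-02-28"));

        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://warehouse/dw", "report", "secret");
        try {
            // Run the report's query against the warehouse and lay out the result.
            JasperPrint filled = JasperFillManager.fillReport(report, params, conn);

            // Render the filled report to one of the supported output formats.
            JasperExportManager.exportReportToPdfFile(filled, "sales_by_region.pdf");
        } finally {
            conn.close();
        }
    }
}

The same filled JasperPrint object can be handed to the other exporters for HTML, CSV, or Excel output, which is what makes the library convenient to embed behind a web or service tier.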

Often direct integration of report generation into your application’s business tier imposes too great a computational burden on the app servers, resulting in slow reporting and sluggish application response in general. In this case, the reporting function should be partitioned from the application and deployed on a separate server or cluster. JasperServer is a free and open-source reporting server, licensed under the GNU Affero General Public License 3.0, that provides a direct web UI to reporting functions as well as a RESTful API for integration with client applications.

Careful consideration of reporting functions also helps underscore the value of the data warehouse. Report queries often feature large result sets, selects on columns of low cardinality whose performance is not substantially improved by indexing, and lots of computationally expensive aggregation. They all have run-time characteristics that are substantially different from the transactional queries issued by an application, and would impair the performance of the transactional database if issued directly against it. Running them against the warehouse allows designers to defer the performance penalty to a less-critical system and to optimize that system to handle the different queries issued by the reporting system.

There are other reasons a warehouse is a safer place for reporting. Designing a detailed report against a heavily normalized transactional database can be a daunting task even for a database professional that is very familiar with the system, and might be outright impossible even for a power-user in the business who wants to write his own reports. The data needed for one report might not even exist in a single target database, or it might be scattered in different operational areas of the schema whose attending applications have adopted different units and data representation idioms. Only the data integration and conforming functions of the ETL process can bring all of the enterprise’s data together in one safe place to support a responsibly designed reporting system.

Analytics

If reporting is the process of answering questions known in advance, then analytics is about enabling decision makers to interact with data when they don’t yet know the right question to ask. Analytics systems are sometimes called Online Analytical Processing, or OLAP, systems. An OLAP query centers on a measured business event which can be viewed through multiple perspectives. This can be very expensive both conceptually and computationally, so we will introduce some helpful abstractions and optimizations to ease the burden.

A typical report provides a single perspective on the subject data. Reports may apply many individual constraints, but all of them are fundamentally functions that return lists. A sales report may list all salespeople selling over $100,000 in February with territories in the United States, and it may list many attributes of those salespeople and their sales performance. It could even include sub-reports detailing each salesperson’s top-selling products, but it’s hard to imagine much more complexity than that. Suppose you want to study the relationship between shipping method and sales results while retaining a view of salesperson, product, and region. It is precisely this kind of perspective-driven view that analytics tools provide.

The fundamental unit of OLAP analysis is the cube.


Cubes can be understood in terms of the star schema of the data warehouse. The star schema’s dimension tables form the literal dimensions of the cube, and the fact table provides the results visible at each “cell” inside the cube. A cube can have fewer or more than three dimensions, but the paradigm is best understood in terms of its three-dimensional analogy.

Suppose we have a fact table representing sales orders with three associated dimensions: time, salesperson, and product. Visualize a three-dimensional space in which each of the three axes represents one of those dimensions (Figure BER-4). Each point in that space holds one or more sales order facts. Thus if we select the time of February 22, 2010, the salesperson of Willy Lowman, and the product of Men’s Large Left-Handed Baseball Glove, we would have selected all of the orders Willy booked for that kind of glove on that day.

If, on the other hand, we wanted to know who sold the most of what on February 22, 2010, we might ask our analytics tool to display a table in which salespeople were rows and products were columns, then to aggregate sales numbers in each cell. It is as if we have taken the data cube and squashed it along its time dimension into a square, leaving one axis of product types and one axis of salespeople. This would provide a convenient, two-dimensional, tabular view of our dimensional analysis, allowing us easily to see whose numbers were strongest in which product lines—assuming the number of salespeople and the number of products were manageable3.
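The “squash along a dimension” idea can be sketched without any OLAP tooling at all. The toy Java program below aggregates a handful of made-up fact rows into a salesperson-by-product table, which is essentially the bookkeeping a pivot view performs for us at scale; the class and data are hypothetical and exist only to illustrate the aggregation.

import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

public class PivotSketch {
    // A toy fact row: two dimension values and one measurement.
    static class SalesFact {
        final String salesperson;
        final String product;
        final BigDecimal amount;
        SalesFact(String salesperson, String product, String amount) {
            this.salesperson = salesperson;
            this.product = product;
            this.amount = new BigDecimal(amount);
        }
    }

    public static void main(String[] args) {
        SalesFact[] facts = {
            new SalesFact("Willy", "Glove", "120.00"),
            new SalesFact("Willy", "Bat", "80.00"),
            new SalesFact("Linda", "Glove", "200.00"),
            new SalesFact("Linda", "Glove", "50.00"),
        };

        // Rows are salespeople, columns are products, cells are summed amounts:
        // every other dimension (such as time) is aggregated away.
        Map<String, Map<String, BigDecimal>> pivot =
                new LinkedHashMap<String, Map<String, BigDecimal>>();
        for (SalesFact f : facts) {
            Map<String, BigDecimal> row = pivot.get(f.salesperson);
            if (row == null) {
                row = new LinkedHashMap<String, BigDecimal>();
                pivot.put(f.salesperson, row);
            }
            BigDecimal current = row.get(f.product);
            row.put(f.product, current == null ? f.amount : current.add(f.amount));
        }

        System.out.println(pivot); // {Willy={Glove=120.00, Bat=80.00}, Linda={Glove=250.00}}
    }
}

A real OLAP front end does the same kind of grouping and summing, but against the warehouse and with the ability to re-slice along any dimension interactively.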

Before we leave this example, we should note that it is no small thing that we asked the OLAP system to aggregate sales volume. In a query spanning a long period of time for a high-volume sales organization, this could mean hundreds of thousands or millions of rows being retrieved from a query to compute a sum. It’s possible for fast, responsibly provisioned hardware to return those results in a fairly timely fashion, but for large data sets, it’s unlikely. The design of the data warehouse needs to be extended to help.

When users are observed using a particular aggregation over and over—for instance, if some decision-maker in the organization always wants monthly sales numbers by salesperson and region—then it is desirable to optimize the schema by building an aggregate table. This takes the form of a new fact table expressing aggregate facts referencing new, aggregated dimension tables. Aggregate fact tables will have fewer dimensions (all dimensions other than those participating in the aggregation will disappear), and those dimension tables will have fewer records in them. The tables’ smaller size and pre-computed values are jointly responsible for the performance gains. In general, if an aggregate query is executed frequently, and building an aggregate table would result in at least a 90% reduction in the size of the fact table, it’s worth it to perform this optimization.
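One plausible way to build such an aggregate table, sketched here with hypothetical table and column names, is a periodic INSERT ... SELECT that the ETL job runs after each warehouse load. This is an illustration of the idea only, not a prescription from Talend or any other particular tool.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BuildAggregateTable {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://warehouse/dw", "etl", "secret");
        try {
            Statement stmt = conn.createStatement();
            // Rebuild the monthly sales-by-salesperson-and-region aggregate
            // from the detailed fact table (hypothetical schema).
            stmt.executeUpdate("DELETE FROM agg_sales_month_sp_region");
            stmt.executeUpdate(
                "INSERT INTO agg_sales_month_sp_region " +
                "  (month_key, salesperson_key, region_key, total_sales, order_count) " +
                "SELECT d.month_key, f.salesperson_key, s.region_key, " +
                "       SUM(f.subtotal), COUNT(*) " +
                "FROM fact_sales f " +
                "JOIN dim_date d ON f.date_key = d.date_key " +
                "JOIN dim_salesperson s ON f.salesperson_key = s.salesperson_key " +
                "GROUP BY d.month_key, f.salesperson_key, s.region_key");
            stmt.close();
        } finally {
            conn.close();
        }
    }
}

Queries that only need monthly numbers can then hit the much smaller aggregate table instead of scanning the detailed facts.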

Analytics tools have traditionally been a very expensive part of the business intelligence stack. Now vendors like Pentaho have provided options like the Pentaho BI Platform, which provides a rich schema editor and a pivot table interface to the data warehouse. The BI Platform contains other enterprise-grade features a data architect might want in an analytics platform, like role-based permissions and a dashboard interface. Once trained in the data warehouse schema, decision makers can be turned loose on open-ended views into the enterprise data, free to ask, refine, and re-ask whatever questions the warehouse is capable of answering. Tools like the Pentaho BI Platform are the crown jewels of the BI stack: they expose the full power and value of the enterprise data to decision makers in a form they can readily use to improve the operations of the business. The Pentaho BI Platform Community Edition is free and open-source, licensed under the GPL 2.0.

Figure BER-4: A graphical representation of a data cube. Each axis represents a dimension in the star schema, with fact table values located within the “cells” inside the cube.


Conclusion

The increasing size and scope of enterprise data stores and the increasing demands for intelligent analysis of that data have placed a growing premium on business intelligence as a data processing discipline and as a toolset. Professional open-source vendors like Talend, JasperSoft, and Pentaho have dramatically lowered the barriers to entry for motivated developers and enterprising organizations to enter the field, bringing tools that once cost hundreds of thousands of dollars into play for free or for affordable subscription prices. Now more than ever, developers wanting to make a difference should master the theoretical concepts behind business intelligence and familiarize themselves with the tools of a growing trade. Data is only getting bigger. Our skills must grow with it.

For Further Study

The following works are helpful source material to continue your business intelligence education. The Kimball Group and its founder, Ralph Kimball, are prominent thought leaders in the data warehousing space.

Business Intelligence For Dummies. This is a good overview of basic concepts written in the quirky For Dummies style.

Adaptive Business Intelligence. A treatment of business intelligence generally by a recognized leader.

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence. A summary of data warehousing and BI concepts by the redoubtable Kimball Group. This volume is less detailed than their usual works, and is therefore good for beginners.

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. A comprehensive treatment of data warehouse schema design.

The Data Warehouse ETL Toolkit. A very thorough treatment of the ETL process, with an overview of schema design principles.

The Data Warehousing Institute is a professional association that produces some helpful educational resources related to data warehousing specifically. They may have a chapter and educational events in your area.

About the Author

Tim Berglund runs a consulting firm called the August Technology Group, which provides training, development, and coaching services to customers building web applications with open-source tools, especially with the Grails framework. His technology interests span web applications, business integration, data architecture, and software architecture, but his greatest passion is to help developers improve in their craft. He is a speaker on the No Fluff Just Stuff tour, at conferences internationally, and at user groups in the United States. Through his partnership with ThirstyHead.com, Tim offers public and private classroom training in Groovy, Grails, and the Liquibase database refactoring tool, and is available to develop custom courseware by private engagement. His firm offers first-class consulting and development using open-source business intelligence tools. He lives with the wife of his youth and their three children in Littleton, CO.

Endnotes

1 http://www.wired.com/science/discoveries/magazine/16-07/pb_intro

2 Since these JARs contain all dependencies, they are quite unwieldy. A good future open-source contribution might unpack a Talend JAR, remove its many megabytes of open-source dependencies, and re-bundle the generated code for scripted execution by Gradle or Maven, letting that tool’s dependency management framework do the heavy downloading. This would decrease deployment times for ETL jobs that change frequently.

3 If the count of salespeople or the number of products were too large to consider all at once, we could define hierarchies in the cube’s dimensions, such that we could group salespeople into business units, and individual products into product families. Hierarchies like this are always necessary if there are too many individual categories in a single dimension. No data analysis tool can give the human mind the power to think about more than a few things at once.


Come to the Ü for an Extreme Tech Experience

Über Conf will take place June 14 - 17, 2010 in Denver, CO. This event will focus on the best practices, new languages, and latest advancements on the Java Platform.

This is an exciting time of innovation and change. Java is not just a language. Java is a technology platform and ecosystem. Über Conf will educate developers and explore the powerful languages and tools which are changing the way we create software using the Java Platform.

» Technology Deep Dive
There are no beginner sessions here. This is your opportunity to go beyond the basics and master critical skills. We expect that you are a competent developer who is ready to solve problems using today's best tools and practices.

» Agile Practices that Work
Software is a difficult industry with high rates of failure. To create winning teams, we embrace principles laid out by the Agile Manifesto. Speakers at Über Conf emphasize and present on topics such as: Test Driven Development, Continuous Integration, Code Quality Measurements, Code Smells, Team Building, and Customer Collaboration.

» Hands On Workshops
At Über Conf you will not just listen to lectures. You will have the opportunity to participate in workshops, get your hands dirty, and write code.

» Learn from the Best
Über Conf will bring together many of the industry's best project leaders, developers, authors, and trainers.

» Rates
Take advantage of our all-inclusive travel package. This package includes: conference registration, 3 nights lodging, and airfare in the continental US.

                       Register By         Price     All-Inclusive Price
Super Earlybird Rate   Monday, April 12    $1,250    $2,100
Earlybird Rate         Friday, May 14      $1,400    $2,250
Regular Rate                               $1,550    $2,400

For more information please visit http://uberconf.com/ or [email protected]

Featured Speakers

Alex Antonov, Technical Lead on the Core Frameworks team at Orbitz Worldwide

Tim Berglund, Developer, Consultant, Author

Cliff Click, Chief JVM Architect of Azul Systems

Esther Derby, Co-author of "Behind Closed Doors: Secrets of Great Management"

Hans Dockter, Founder and Project Lead of Gradle

Keith Donald, SpringSource Principal & Founding Partner

Define Über
1 : a superlative example of its kind
2 : to an extreme degree

Topics at Über Conf
Languages on the JVM
Security
SOA and ROA
Enterprise Java
Cloud Computing
Java Internals
Agility
Mobile Dev (iPhone, Android)


Clojure: Taking Control of Your Language
by Howard Lewis Ship

Clojure enthusiasts often talk about the “power” of the Clojure language, but it can be hard to nail down exactly what that means. We’ll take a close look at Clojure’s macro feature, to demonstrate how Clojure gives you control over the language itself.

Who Owns Your Language?

So you’re cranking through your current project and you notice that for about the hundredth time, you’ve created some kind of cookie-cutter solution to some common problem. Maybe you’ve cobbled together a couple of AtomicReferences to form a kind of lazily-evaluated function, or perhaps you’ve found a clever way to use a queue and a thread pool to run your algorithm in parallel. It could even be something more trivial, like remembering to invoke isDebugEnabled() before debug() when using a logging API. In any case, you suddenly think: “This should be part of the language.” Not just a new API, or some clever use of static methods and generics, but a new pattern baked right into the language. Chances are, what you need is some new language syntax, and that requires updates to the Java compiler. What do you do?

Well, unless you are very, very patient, you put these thoughts aside and keep cranking out code. Changing the syntax of the Java language is a long, long process: it requires buy-in from Sun and the other members of the Java Community Process. A JSR (Java Specification Request) will need to be written, hashed out by a committee and (eventually) changes to compilers and IDEs will roll out. Perhaps five years later, if everything goes well, you might be in a position to use your new language syntax in actual production code.

It’s not that much different for C# or C++ or Scala or most other languages, even dynamic languages like Ruby and Python. As a programmer, you are a consumer of the language compiler provided to you. Even if you are savvy enough to fork the code for your compiler, it’s not likely you will have the influence necessary to get your customized compiler distributed very far. If you don’t own the compiler, you don’t own the language.

Clojure is a different beast entirely. It’s not just that Clojure is relatively new, or that it is open source and easy to fork on GitHub. Clojure is a Lisp dialect, and that one simple statement implies a vast number of differences from what you may be used to in a programming language. Any variation of Lisp exists to give the programmer all the power they can handle, and then some … no hand-holding, and more than enough rope to hang yourself.

Clojure, as a Lisp, is both a complete language and a language toolkit: you are expected to build the language you need on top of and from within Clojure. Clojure so completely embraces this concept that the term “domain specific language” is rarely, if ever, used. It’s all just considered programming.

The core language feature that supports these approaches is the Clojure macro, which should not be confused with the very simple facility built into the C family of programming languages. C macros operate at the source text level: they allow for very simple manipulations of the source code before it is parsed and compiled. Clojure macros operate at a higher, structural level. To appreciate the differences between these two approaches, we first need to take a step back and examine the differences between how Java and Clojure implement primitive operations.

Some Ifs, Ands, and Buts

Consider, for a moment, the lowly logical and operator in Java: &&. The job of && is to evaluate two boolean expressions and return true only if both of the expressions return true. If Java’s designers had somehow omitted && from the language, you could imagine writing it as a static Java method (as in listing LEW-1 on the next page).


So, if left is true, the result is whatever right is. If left is false, the whole expression is false. In fact, if left is false, then we don’t really need right at all.

As method parameters, both left and right are evaluated before being passed to the and method. This is no big deal when they’re just local variables or simple expressions … but when right is the result of invoking method doATableScanAndAFewDatabaseJoins() it feels wasteful to evaluate it when left is false and right won’t even be used.

That’s why Java includes the && operator in the first place: as an operator, && can short-circuit. That is, when the left expression evaluates to false, the right expression is not evaluated at all. That’s great for the && operator but bad for you: as a Java programmer, you can only write methods, not operators.

A naïve re-implementation of and in Clojure, as shown in listing LEW-2, has the same problem. This code defines three versions of function and*: With no arguments, it returns true. With a single argument, it returns the argument. With multiple arguments it gets more interesting: the first argument is evaluated. If it is logical true (neither nil nor false) then the remaining arguments (if any) are evaluated. Otherwise, the first argument must be logical false (either false or nil) and is returned.

The good news is that we’re getting close to the desired short-circuiting logic present in the Java && operator. Clojure’s if may look like a function, but is really a Clojure special form: one of a number of built-in function-like hooks into Clojure’s internals. The if form only evaluates its second argument if its first argument is logical true. That means that as soon as we come across an x that evaluates to false, we can terminate the recursive and* calls.

Still, and* itself is a function, so all the arguments passed to it are evaluated first, before it is called … meaning that we are not sheltered from the pain of an unnecessary call to function do-a-table-scan-and-a-few-database-joins.

If we follow this approach, but want to ensure that no unnecessary expressions are evaluated, we will need to leverage Clojure’s let and if forms. Let’s say we were building a nuclear missile defense against stray asteroids that may intersect Earth’s orbit. We might, at some point, want to see if the missile launcher is enabled (a quick easy check), whether the missile is ready for launch (more expensive, as there’s a lot of subsystems to check) and finally, if there is a suitable target (very expensive, lots of database activity and calculations to see if there’s an asteroid coming and whether it’s dangerous enough to shoot down). To do this efficiently with if, we might write some code like listing LEW-3.

Notice that at each step, the result of a single evaluation is captured in a local symbol (we don’t want to call is-launcher-enabled? twice). The captured symbol is evaluated, and then either returned (if it is logical false) or evaluation continues deeper into the other function calls and their local symbols. This works, but is cumbersome. A better implementation of find-target is shown in LEW-4:

With the implementation of and* (from listing LEW-2), this code is not ideal, as it evaluates scan-for-target even when the missile is not enabled. What we want is a new implementation of and* that converts listing LEW-4 into LEW-3. That’s exactly what Clojure macros do.

public static boolean and(boolean left, boolean right) {
    if (left) {
        return right;
    }
    return false;
}

Listing LEW-1: and implemented in Java

(defn and*
  ([] true)
  ([x] x)
  ([x & rest]
    (if x (apply and* rest) x)))

Listing LEW-2: and* implemented in Clojure

(defn find-target [missile]
  (let [enabled (is-launcher-enabled? missile)]
    (if enabled
      (let [ready (is-missile-ready? missile)]
        (if ready
          (scan-for-target missile)
          ready))
      enabled)))

Listing LEW-3: Using if instead of and*

(defn find-target [missile]
  (and* (is-launcher-enabled? missile)
        (is-missile-ready? missile)
        (scan-for-target missile)))

Listing LEW-4: Improved find-target function


A macro is a special kind of function that converts a portion of a Clojure program from how the programmer authors it, into a form that Clojure can ultimately compile. The and* macro is shown in listing LEW-5.

Clojure compiles each function definition as it is first read from the source file; this applies to both ordinary function definitions and to macro definitions. The and* macro definition will be compiled first, and then that macro will be used when the find-target function is compiled. Inside find-target, the and* form is a kind of placeholder: the content of the form, (is-launcher-enabled? missile) and so forth, are passed to the and* macro function to be expanded out into nested let and if forms.

Invoking and* with no arguments evaluates to true. It’s as if someone used a text editor on the source and converted (and*) into true. Don’t let that metaphor confuse you, however: what’s being operated on are Clojure forms: lists, symbols and the like – not the individual characters. The return values from the macro are also forms: replacement forms, taking the place of the and* form. That’s exactly what a macro is: a function called to dynamically replace certain forms within the overall expression.

With a single argument to the and* macro, all that’s needed is the argument itself. With multiple arguments, the and* macro starts building out nested let and if forms, much like those in listing LEW-3.

That last line of listing LEW-5 is where much of the magic lies; this is a syntax quote (denoted by the leading back quote), a special kind of list that primarily exists to make it easier to create macros. Inside the syntax quote, most symbols are passed through unchanged: they evaluate to themselves.

The ~ prefix causes a replacement … Clojure jumps up a level, out of the syntax quote, and substitutes in the value of x; in our example, the initial x will be (is-launcher-enabled? missile). The ~@rest sequence splices in the contents of list rest, forming a recursive call to the and* macro, with multiple arguments. Clojure will continue with these forms and look for yet more macros to expand (including the recursive and* macro calls). This continues until all macros have been expanded.

That leaves the x# symbol. The trailing # on the symbol name is meaningful inside a syntax quote. It identifies x# as a generated local symbol that’s given a unique name, and x# is replaced with that name throughout the syntax quote. This is the equivalent of the enabled and ready symbols in listing LEW-3. Why is this name-generating-and-replacing business necessary? Because if you used a fixed name like x, y, or temp you could be sure that someday, somewhere, someone would pass in as an argument a symbol with a matching name, and that could cause some part of the expanded macro to evaluate incorrectly.

Macro expansion is a phase that takes place just once, between the point that the Clojure reader has read individual characters and parsed them into forms (such as symbols, literals and lists) and the point at which Clojure converts the forms into Java classes. Once the and* macro inside the find-target function is expanded, all that’s left are the nested if and let forms and the original function calls. In fact, any Clojure program, through the process of macro expansion, eventually consists of only special forms (such as if, let and def) or function calls. Only those can be used by Clojure to create Java classes.

At this point, you might be thinking that it’s pretty cool that you can use macros to replicate the behavior of Clojure’s built-in library … but it’s more than that. As shown in listing LEW-6, we’ve actually duplicated the definition of Clojure’s and macro. They are on equal footing, there is absolutely no difference between them.

In fact, the majority of Clojure is implemented in Clojure, and over time the ratio of Clojure code to native Java code will grow, not just because new functions are being added to the clojure.core namespace, but because of a determined effort to reduce the amount of Java code in Clojure to an absolute minimum, to assist in porting Clojure beyond the Java platform.

(defmacro and*
  ([] true)
  ([x] x)
  ([x & rest]
    `(let [x# ~x]
       (if x# (and* ~@rest) x#))))

Listing LEW-5: and* implemented as a macro

(defmacro and
  "Evaluates exprs one at a time, from left to right. If a form
  returns logical false (nil or false), and returns that value and
  doesn't evaluate any of the other expressions, otherwise it returns
  the value of the last expr. (and) returns true."
  ([] true)
  ([x] x)
  ([x & next]
    `(let [and# ~x]
       (if and# (and ~@next) and#))))

Listing LEW-6: clojure.core/and source



Now that we understand the basics, we can look at a few more interesting examples of macros. First off, that pesky logging code.

Consistently Invoking Loggers

I’m a fan of the logging framework Simple Logging Facade for Java (SLF4J). This framework acts as a pluggable intermediary between your code and a logging framework such as the one built into the JDK, or Log4J. The Cascade web framework I’m developing uses SLF4J and it was important to create an idiomatic way for Clojure code to make use of it.
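For comparison, the boilerplate the macros below abstract away looks roughly like this in plain Java with SLF4J; this is a minimal sketch, and the class name and message are made up for illustration.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FileTracker {
    private static final Logger log = LoggerFactory.getLogger(FileTracker.class);

    public void track(String file, java.util.Date lastModified) {
        // Guard the call so the formatted message is only built when debugging is enabled.
        if (log.isDebugEnabled()) {
            log.debug(String.format("Tracking %s (last-modified %tc)", file, lastModified));
        }
    }
}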

The goal was to make it easy to inject calls into Clojure code, such as (debug "Tracking %s (last-modified %tc)" file last-modified), with the following provisions:

• the logger is automatically determined from the current namespace name

• the call into the logger is properly cleared via isDebugEnabled()

• the format string is only evaluated (into a simple string) if debugging is actually enabled

This approach is not unusual when designing macros: on the one side, you need to consider the code that is required for your operation, even if that code will be tedious or verbose. On the other side, you determine what you’d like to appear in your source, which will be concise and readable. It’s then just a matter of writing a macro to convert the latter into the former.

Listing LEW-7 is the result, and like most Clojure code it is quite succinct. At the core is the log* macro, which uses the *ns* symbol, defined by Clojure, to determine the active namespace when the macro is expanded. Once the namespace name is known, it is possible to obtain a Logger instance. Inside the syntax quote, the log* macro captures the logger in a local symbol, and uses Clojure’s Java interop forms to invoke the correct check method on the logger before invoking the corresponding logging method. For the debug macro, these methods are isDebugEnabled() and debug(), respectively.

The call to format is nested deeply enough in the macro that it will only occur if logging is enabled.

(ns #^{:doc "Wrappers around Simple Logging Facade for Java (SLF4J)"}
  cascade.logging
  (:import (org.slf4j LoggerFactory Logger)))

(defn #^Logger get-logger [#^String name]
  (LoggerFactory/getLogger name))

(defmacro log* [check-member log-member fmt & args]
  (let [logger-name (name (ns-name *ns*))]
    `(let [logger# (get-logger ~logger-name)]
       (and (. logger# ~check-member)
            (. logger# ~log-member (format ~fmt ~@args))))))

(defmacro debug [fmt & args]
  `(log* isDebugEnabled debug ~fmt ~@args))

(defmacro info [fmt & args]
  `(log* isInfoEnabled info ~fmt ~@args))

(defmacro error [fmt & args]
  `(log* isErrorEnabled error ~fmt ~@args))

Listing LEW-7: Cascade’s logging namespace

Listing LEW-7 shows how macros can be used to tweak and simplify boilerplate code; next up we’ll see how macros can be used to build an entirely new and different language from within Clojure.

Cascade Templates

Cascade is an experimental web application framework built in Clojure. Like any web framework, a significant part of what it does is to convert a template version of an HTML page into a final stream of markup.

Whereas most mainstream web frameworks use some form of external file for templates, Cascade applications are built entirely inside Clojure, including view templates. This is driven by the template macro, as shown in listing LEW-8. The as-ul function uses an embedded template to format a list of strings into a <ul> element around nested <li> elements (an HTML unordered list).


Inside the template macro, keywords (such as :ul or :li) are directly mapped to HTML elements (<ul> and <li>, respectively). After an element keyword, a vector contains the body of that element. This is evaluated recursively; however symbols or function calls mixed into a template are incorporated into the expanded code. Ultimately, what’s produced is a Document Object Model (DOM) containing elements and their bodies; the last step in a Cascade request is to convert this to HTML markup and stream it to the client.

The details of how the template macro is implemented are beyond the scope of this article, but using the Clojure REPL (Read Eval Print Loop, the Clojure console) we can investigate how the embedded template is expanded into code, as shown in listing LEW-9.

user=> (defn as-ul [items]
         (template :ul [ (for [i items] (template :li [ i ])) ]))
#'user/as-ul
user=> (pprint (as-ul ["moe" "larry" "curly"]))
[{:type :element,
  :name :ul,
  :attributes nil,
  :value
  [{:type :element, :name :li, :attributes nil,
    :value [{:type :text, :name nil, :attributes nil, :value "moe"}]}
   {:type :element, :name :li, :attributes nil,
    :value [{:type :text, :name nil, :attributes nil, :value "larry"}]}
   {:type :element, :name :li, :attributes nil,
    :value [{:type :text, :name nil, :attributes nil, :value "curly"}]}]}]
nil
user=>

Listing LEW-8: Use of the template macro

Experimenting with the template macro is made easier by Clojure’s built-in macroexpand function, and by the pprint (pretty printer) function that’s part of Clojure’s contrib library.

Elements in the template expand to calls to the cascade.dom/element-node function; the overall template, and element bodies within the template, are assembled via the cascade.internal.viewbuilder/combine function. The nested for and template macros will themselves eventually be expanded into yet more calls to combine, element-node and so forth.

The end result is that the template is translated into the code necessary to build the Cascade DOM, working from bottom to top. Although this represents a considerable distance from ordinary Clojure code, it still conforms to Clojure conventions and it still, just like the and macro and the debug macro, ultimately transforms into executable Clojure.

user=> (pprint (macroexpand
         '(template :p [ "Current items:" ]
                    :ul [ (for [i items] (template :li [i])) ])))
(cascade.internal.viewbuilder/combine
 (cascade.dom/element-node :p nil
   (cascade.internal.viewbuilder/combine
     (cascade.dom/raw-node "Current items:")))
 (cascade.dom/element-node :ul nil
   (cascade.internal.viewbuilder/combine
     (for [i items] (template :li [i])))))
nil
user=>

Listing LEW-9: Expanding the template macro


Conclusion

In Clojure you have full rein to add new language features. Anytime you need to control the context of evaluation, including when or if an expression is evaluated, you can use macros. In effect, macros let you tie in to the normal compilation phase of the Clojure language, the equivalent of being able to customize the Java compiler. Clojure lets you own the language, extending it as you see fit, and do it right now … not in five years. With most of Clojure itself written in Clojure and extensible from within Clojure, you end up with a single language that can be precisely adapted to whatever needs you can define.

Many of those new to Clojure are concerned that the use of macros and other facilities can lead to write-only programs: programs whose inner workings can no longer be followed by anyone not intimately aware of how the macros work. To my eye, based on over a year of Clojure coding, this is not an issue … adding new language features in Clojure is no different than adding new frameworks and libraries in a more traditional language … and it is a reasonable expectation that a Clojure application be much more concise than an equivalent Java application.

Clojure macros, like any powerful tool, take time to learn to use most effectively. In most cases, writing macros should be avoided; but in those situations where explicit function calls are too unwieldy, or precise control over the context of expression evaluation is required, Clojure macros provide you the power and control you need to really own the language.

Once you begin to appreciate macros and how they fit in with all the many other interrelated features of Clojure, you may just find that working in Java has become a primitive and tedious option. You have been warned!

References

Clojure: http://clojure.org

Simple Logging Facade for Java: http://slf4j.org/

Cascade web framework: http://github.com/hlship/cascade

Programming Clojure: http://pragprog.com/titles/shcloj/programming-clojure

The Joy of Clojure: http://joyofclojure.com/

About the Author

Howard Lewis Ship is the creator of the Apache Tapestry web framework and has been active in the Java community since 1998. He’s always been fascinated with sophisticated, elegant abstractions, which is one reason Clojure is so attractive to him. Howard is a frequent speaker on the NFJS tour, as well as JavaOne, ApacheCon, Devoxx, and other conferences. As an independent consultant, Howard specializes in Tapestry training, mentoring and project work (see http://howardlewisship.com for more details). He lives in Portland, OR with his wife Suzanne and son, Jacob.



Coming up in the April Issue

Venkat Subramaniam explains the ‘Execute Around Method Pattern’. This pattern is from the Smalltalk days and helps with, among other things, timely, deterministic cleanup of resources. In the article Venkat will explore how that pattern can be implemented and used in Java and other JVM languages.

Matthew McCullough explores the Hadoop framework. Hadoop is the divide-and-conquer MapReduce framework from Apache based on concepts from Google, and currently used to solve terabyte and petabyte data computations at Yahoo, Facebook and even the Large Hadron Collider at CERN.

Nathaniel Schutta updates us on modern Ajax and JavaScript tools of the trade. His April article will provide you with an overview of the modern JavaScript developer’s toolkit including IDEs that offer code completion, debugging beyond alerts, testing tools and utilities like JSLint.

Rohit Bhardwaj continues his tutorial on the SOAP/REST web services testing tool soapUI. In part two of his article, Rohit will explore the advanced topics of load testing and web service simulation in soapUI. He will also look at a few testing scenarios for login services and template-driven testing.
