2001: Bridging the Gap between RSS and Java Old School Style


Upload: russell-castagnaro

Post on 23-Jan-2015


Category:

Technology



DESCRIPTION

Before Atom, RSS and similar formats had really caught on, many people were looking for ways to handle syndicated content. This was a pretty successful talk that I ended up giving quite a bit.

TRANSCRIPT

Page 1: 2001: Bridging the Gap between RSS and Java Old School Style


Enabling Live Newsfeeds using RSS, Servlets and Transformations

Russell Castagnaro
[email protected]

Introduction

Presenter
- Russell Castagnaro
- Chief Mentor
  - 4Charity.com
  - SyncTank Solutions, Inc
- [email protected]
- Experience


Introduction

4Charity.com
- Application Service Provider for the Non-Profit industry
- Pure Java development
- http://www.4charity.com
- Locations:
  - San Francisco, CA (HQ)
  - Honolulu, HI (Tech Team)

Goals

- Leverage the Servlet 2.2 API
- Employ XML for data and configuration
- Use the Resource Description Framework (RDF) for content data
- Format XML using XSL Transformation
- Eliminate hard-coded values


What’s the deal?

- Newsfeeds are becoming a requirement for portal sites.
- Easy integration with existing web services is a key requirement.
- How can we avoid writing custom code for information providers?
- Can we avoid applets!!?

Background

- In 1999 I wrote an information portal application.
- Live newsfeeds seemed like a good idea.
- I wrote custom parsers and employed an open-source tool called Cocoon.
- Every time the HTML changed, I had to recode!


Code Example

- Needed a different ‘ParseSpec’ for each content provider
- URLToXMLConsumer.java
- SpaceProducer.java
- These worked great for 2 months...

‘ParseSpec’

#HeadlineEntry
cacheTime=6000
HeadlineEntry=start=\n,end=<p>,attributes=Link,URL,Headline,Source,Date
HeadlineEntry.Link=start=<a href=",end=">
HeadlineEntry.Headline=start=">,end=</a>
HeadlineEntry.Source=start=<font size="-1">,end=</font>
#HeadlineEntry.Description=start=<br>,end=<br>
HeadlineEntry.Date=start=- <i>,end=</i>
HeadlineEntry.DTD="http://space.synctank.com/dtds/newsfeed.dtd "
HeadlineEntry.Doctype=Newsfeed
HeadlineEntry.URL=http://search.news.yahoo.com/search/news?p=space+aerospace&n=
HeadlineEntry.QTY=10
HeadlineEntry.XML=version="1.0"
HeadlineEntry.Header=\
<?xml-stylesheet href="http://space.synctank.com/xsl/spacenews.xsl" type="text/xsl"?>\n\
<?cocoon-process type="xslt"?>\n\
<!-- ============================================================ -->\n\
<!-- spacenews.xml -->\n\
<!-- Simple XML file that uses the Newsfeed DTD. -->\n\
<!-- Author: XML Loader Russell Castagnaro Thu Nov 18 22:59:07 HST 1999 -->\n\
<!-- ============================================================ -->\n\
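The ParseSpec drives extraction with start and end markers. Below is a minimal sketch of that idea, not the original URLToXMLProducer code: the class and method names (`DelimiterScraper`, `between`, `entries`) and the sample HTML are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing marker-based scraping; not the original producer code. */
class DelimiterScraper {

    /** Returns the text between the first start marker at or after 'from' and the next end marker, or null. */
    static String between(String html, String start, String end, int from) {
        int s = html.indexOf(start, from);
        if (s < 0) return null;
        s += start.length();
        int e = html.indexOf(end, s);
        if (e < 0) return null;
        return html.substring(s, e);
    }

    /** Splits a page into entries using entry-level start and end markers. */
    static List<String> entries(String html, String start, String end) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (true) {
            int s = html.indexOf(start, pos);
            if (s < 0) break;
            s += start.length();
            int e = html.indexOf(end, s);
            if (e < 0) break;
            result.add(html.substring(s, e));
            pos = e + end.length();
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample page; real provider markup differed and changed often.
        String page =
              "\n<a href=\"http://example.com/a\">Shuttle launch</a> <font size=\"-1\">(AP)</font><p>"
            + "\n<a href=\"http://example.com/b\">Boeing upgrade</a> <font size=\"-1\">(Reuters)</font><p>";
        for (String entry : entries(page, "\n", "<p>")) {
            String link = between(entry, "<a href=\"", "\">", 0);
            String headline = between(entry, "\">", "</a>", 0);
            String source = between(entry, "<font size=\"-1\">", "</font>", 0);
            System.out.println(headline + " | " + link + " | " + source);
        }
    }
}
```

Every marker is tied to the provider's exact markup, so a small change on the provider's side (a different font size, an extra attribute on the anchor) can silently return null or the wrong text.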


Java Code

URLToXMLProducer.xml and subclasses

Nice Features

- All search providers’ content was converted to one XML document type
- Once the XML was created, all search engines’ results were handled easily with XSLT


Document Type Definition

<?xml version="1.0" encoding="US-ASCII" ?>
<!-- Newsfeed.dtd -->
<!-- Simple DTD that defines a grammar for news Feeds. -->
<!-- Author: Russell Castagnaro Nov 15 1999 -->
<!ELEMENT Newsfeed (HeadlineEntry)+>
<!ELEMENT HeadlineEntry (Link, Headline, Source, Description, Date)>
<!ELEMENT Link (#PCDATA)>
<!ELEMENT Headline (#PCDATA)>
<!ELEMENT Source (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Date (#PCDATA)>

NewsFeed Content (XML)

<?xml version="1.0"?>
<?xml-stylesheet href="spacenews.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<Newsfeed>
  <HeadlineEntry>
    <Link>http://dailynews.yahoo.com/h/ap/19991222/sc/space_shuttle_77.html</Link>
    <Headline>Shuttle Astronauts Begin <b>Space</b>walk</Headline>
    <Source>(Associated Press)</Source>
    <Date>Dec 22 6:08 PM EST</Date>
  </HeadlineEntry>
  <HeadlineEntry>
    <Link>http://biz.yahoo.com/rf/991222/xr.html</Link>
    <Headline>RESEARCH ALERT - Boeing raised to buy</Headline>
    <Source>(Reuters)</Source>
    <Date>Dec 22 12:03 PM EST</Date>
  </HeadlineEntry>
</Newsfeed>


Transforming the Newsfeed

Make the news feed human readable:
- Create a Stylesheet using the XML DOCTYPE rules
- Transform the XML Document using the XSL Document

* Specifics on transformations coming soon!

The StyleSheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/">
    <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
      <TR>
        <TD bgcolor="#3366CC" align="left" valign="middle">
          <font face="helvetica, arial" size="2" color="#FFFFFF"><nobr><b>News</b></nobr></font>
        </TD>
        <TD align="right" bgcolor="#3366CC" valign="top">
          <a href="/space/news/spacenews.xml">
            <font face="helvetica, arial" size="1" color="#FFFFFF">View</font></a>
          <IMG SRC="/space/images/spacer2.gif" BORDER="0" WIDTH="5" HEIGHT="2"/>
        </TD>
      </TR>
      <TR>
        <TD>
          <font size="2" face="Arial, Helvetica, sans-serif"><b>Space and Aerospace News</b></font><BR/>
          <xsl:apply-templates/>
        </TD>
      </TR>
    </TABLE>
  </xsl:template>
  <xsl:template match="HeadlineEntry">
    <B><FONT face="helvetica, arial" size="1"><A HREF="{Link}"><xsl:value-of select="Headline"/></A></FONT></B>
    - <I><FONT size="-2" face="Arial, Helvetica, sans-serif"><xsl:value-of select="Source"/></FONT></I><BR/>
  </xsl:template>
</xsl:stylesheet>


HTML Content

Then the Display Format Changed

- Simple changes in the format from any site required significant changes
- Changing the parsing rules was not trivial
- Eventually this became boring and tiresome


Interesting Points

- I was not interested in manipulating XML documents within Java*
- I did not want to deal with DOM or SAX
- I was interested in displaying data in a clean, efficient manner
- The producer code I created was a bit embarrassing

*I was not lazy. I had a very full schedule at the time... Sheesh!

Time Warp (Oct 2000)

- None of my parsing instructions still worked
- I had no interest in using the old code
- There had to be a better way
- I heard about O’Reilly’s Meerkat project...


Enter RDF Site Summary

- Preliminary format was v0.91 from Netscape (remember them?)
- RSS 0.91 DTD: http://my.netscape.com/publish/formats/rss-0.91.dtd
- Eliminates the need to parse through HTML for content
- Standard: RSS 1.0, built on the W3C’s RDF, is now available

RSS Example

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title> Space science news</title>
    <link>http://www.moreover.com</link>
    <description>Space science news - news headlines from around the web, refreshed every 15 minutes</description>
    <language>en-us</language>
    <image>
      <title>moreover...</title>
      <url>http://i.moreover.com/pics/rss.gif</url>
      <link>http://www.moreover.com</link>
      <width>144</width>
      <height>16</height>
      <description>News headlines from more than 1,800 sources, harvested every 15 minutes...</description>
    </image>
    <item>
      <title>NASA releases space station crew logs</title>
      <link>http://c.moreover.com/click/here.pl?r16768175</link>
      <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
    </item>
    <item>
      <title>Tough love but support for space by George W. Bushs team</title>
      <link>http://c.moreover.com/click/here.pl?r16768185</link>
      <description>floridatoday.com Mar 22 2001 12:20AM ET</description>
    </item>
  </channel>
</rss>

C:\development\Castagnaro\space\space-moreover.xml


RSS Stylesheet Example

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="rss">
    <!-- version is an attribute of rss, so it is selected with @version -->
    <Foo bar="{@version}"> <xsl:apply-templates/> </Foo>
  </xsl:template>
  <xsl:template match="channel">
    <TABLE width="100%" cellpadding="0" cellspacing="0" border="0">
      <TR>
        <TH align="left" bgcolor="#3366CC" valign="top">
          <a alt="{description}" href="{link}"><font face="helvetica, arial" size="2" color="#FFFFFF"><nobr><xsl:value-of select="title"/></nobr></font></a>
        </TH>
      </TR>
      <xsl:apply-templates select="image"/>
      <xsl:apply-templates select="item"/>
    </TABLE>
  </xsl:template>
  <xsl:template match="image">
    <TR><TD align="right"><a href="{link}"><IMG SRC="{url}" BORDER="0" WIDTH="{width}" HEIGHT="{height}"/></a></TD></TR>
  </xsl:template>
  <xsl:template match="item">
    <TR>
      <TD colspan="2">
        <B><FONT face="helvetica, arial" size="1"><A HREF="{link}"><xsl:value-of select="title"/></A></FONT></B>
        - <I><FONT size="-2" face="Arial, Helvetica, sans-serif"><xsl:value-of select="description"/></FONT></I>
      </TD>
    </TR>
  </xsl:template>
</xsl:stylesheet>

Newsfeed HTML


Access to RSS Feeds

- Where do you find providers???
- Directory of open RSS providers:
  - http://www.superopendirectory.com/directory/4/standards/rss/sources

RSS Providers

- 10.am
  - http://10.am/search/-rss?search=<your term here>
  - List of topics: http://10.am/extra/ocsdirectory.xml
- echofactor
  - http://www.echofactor.com/feed_categories.html?format=RSS
- MoreOver
  - http://w.moreover.com/categories/category_list.html

Now we need to make this content readable!

Transforming XML to HTML

We have many options for performing XSL Transformations:
- Depend on the client’s browser to transform the XML
- Write a Servlet to handle the transformation
- Use software that is widely available and standards based

Issues:
- IE 5.x is one of the few browsers that support XSL transformations
- Publicly available software has many merits too
- Servlets are easy enough. Transformations can be done in < 10 lines


Transformation in a Servlet

// Uses the Xalan-Java 1.x classes from org.apache.xalan.xslt
public void service(HttpServletRequest req, HttpServletResponse res)
        throws IOException, ServletException {
    // Set the content type before obtaining the writer
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    // The XML and XSL file names come from request parameters, relative to sourcePath
    File xmlFile = new File(sourcePath, req.getParameter("XML"));
    File xslFile = new File(sourcePath, req.getParameter("XSL"));
    try {
        XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
        processor.process(new XSLTInputSource(new FileReader(xmlFile)),
                          new XSLTInputSource(new FileReader(xslFile)),
                          new XSLTResultTarget(out));
    } catch (Exception e) {
        out.println("Error: " + e.getMessage());
    }
    out.flush();
}
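The servlet above uses the Xalan 1.x processor classes. For comparison, here is a minimal sketch of the same transformation step using the standard JAXP API (`javax.xml.transform`), which is a swapped-in technique and not what the talk used. The class name `RssToHtml`, the tiny stylesheet and the trimmed feed are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSL stylesheet to an XML document with JAXP. */
class RssToHtml {

    static String transform(String xml, String xsl) throws TransformerException {
        // Compile the stylesheet, then run the XML through it
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down feed in the shape of the RSS 0.91 example
        String rss = "<rss version=\"0.91\"><channel><title>Space science news</title>"
                   + "<item><title>NASA releases space station crew logs</title>"
                   + "<link>http://c.moreover.com/click/here.pl?r16768175</link></item>"
                   + "</channel></rss>";
        // Illustrative stylesheet: one list item per RSS item
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                   + "<xsl:output method=\"html\"/>"
                   + "<xsl:template match=\"/\"><ul><xsl:apply-templates select=\"rss/channel/item\"/></ul></xsl:template>"
                   + "<xsl:template match=\"item\"><li><a href=\"{link}\"><xsl:value-of select=\"title\"/></a></li></xsl:template>"
                   + "</xsl:stylesheet>";
        System.out.println(transform(rss, xsl));
    }
}
```

In a servlet, the `StreamResult` could wrap the response writer instead of a `StringWriter`, and a compiled stylesheet could be reused across requests via `javax.xml.transform.Templates`.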

One Problem

- We have to get the XML (RSS) file from the content provider!
- Use the networking classes to access the URL
- Be considerate of your provider!


New Code

public void doGet(HttpServletRequest req, HttpServletResponse res) {
    try {
        // Set the content type before obtaining the writer
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        // Open a connection to the content provider's RSS feed
        URL url = new URL(sourceURL);
        URLConnection con = url.openConnection();
        con.connect();
        DataInputStream in = new DataInputStream(con.getInputStream());
        // The stylesheet is read from the local file system
        FileReader fr = new FileReader(xslsrc);
        try {
            XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
            processor.process(new XSLTInputSource(in),
                              new XSLTInputSource(fr),
                              new XSLTResultTarget(out));
        } catch (Exception e) {
            log("Error: " + e.getMessage());
        } finally {
            // Always release the network stream and the stylesheet reader
            in.close();
            fr.close();
        }
        out.flush();
    } catch (Exception e) { … }
}
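The fetch step can be isolated from the servlet. The following is a sketch under stated assumptions, not the code from the talk: the class name `FeedFetcher` and the timeout value are hypothetical, and the demo reads a temporary file through a `file:` URL so that it runs without network access.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a feed over a URLConnection with timeouts. */
class FeedFetcher {

    static byte[] fetch(URL url, int timeoutMillis) throws IOException {
        URLConnection con = url.openConnection();
        // Timeouts keep a slow provider from tying up the calling thread
        con.setConnectTimeout(timeoutMillis);
        con.setReadTimeout(timeoutMillis);
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a provider: a local file served through a file: URL
        Path tmp = Files.createTempFile("feed", ".xml");
        Files.write(tmp, "<rss version=\"0.91\"/>".getBytes(StandardCharsets.ISO_8859_1));
        byte[] data = fetch(tmp.toUri().toURL(), 5000);
        System.out.println(data.length + " bytes fetched");
        Files.delete(tmp);
    }
}
```

Returning raw bytes rather than a decoded string leaves character decoding to the XML parser, which can honor the feed's own encoding declaration (iso-8859-1 in the RSS example).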

XSLT Model

[Diagram with the labels: Request, Response, Servlet, URL Loaded XML, XSLTProcessor, XSL Document, HTML NewsFeed]


Setting up your servlet

- Most app servers or web servers support WARs and Deployment Descriptors
- You create a WebApp which has servlets, parameters and servlet mappings

Deployment Descriptor

<web-app>
  <servlet>
    <servlet-name>newsServlet</servlet-name>
    <servlet-class>com.synctank.http.servlets.RSSServlet</servlet-class>
    <init-param>
      <param-name>ERROR_URL</param-name>
      <param-value>/error.jsp</param-value>
      <description>The error page for this app.</description>
    </init-param>
    <init-param>
      <param-name>SOURCE_SERVLET_URI</param-name>
      <param-value>http://www.moreover.com/cgi-local/page?o=rss&amp;c=Space%20science%20news</param-value>
      <description>An absolute url that points to your XML</description>
    </init-param>


Deployment Descriptor

    <init-param>
      <param-name>STYLESHEET</param-name>
      <param-value>/xsl/rss.xsl</param-value>
      <description>The Stylesheet for presentation of the headlines. Should be a subdirectory of the war. The default is /xsl/rss.xsl </description>
    </init-param>
    <load-on-startup>0</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>newsServlet</servlet-name>
    <url-pattern>/newsy</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>/foo/news.html</welcome-file>
  </welcome-file-list>
  <error-page>
    <error-code>404</error-code>
    <location>/error.jsp</location>
  </error-page>
</web-app>

War directory structure

Root
- WEB-INF
  - web.xml
  - classes
    - com\synctank\http\servlets\RSSServlet.class
- xsl
  - rss.xsl
- docs
  - Index.html
- error.jsp


Moving Forward

- RSS version 1.0, built on the W3C’s RDF, has been released
- 1.0 has more flexibility
- Once more providers support

Review

- Don’t do the time!
- Leverage RSS and open content providers
- Use XSL to transform XML content to your format of choice
- Cache requests to content providers (keep them free!)
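One way to cache requests to a provider is a small time-based cache in front of the fetch. This is a minimal sketch with hypothetical names (`CachedFeed`, `maxAgeMillis`) and an arbitrary demo refresh interval. The loader is passed in as a `Supplier`, so the real network call can be plugged in later.

```java
import java.util.function.Supplier;

/** Hypothetical time-based cache: reloads the feed only after it is older than maxAgeMillis. */
class CachedFeed {
    private final Supplier<String> loader;
    private final long maxAgeMillis;
    private String cached;
    private long loadedAt;

    CachedFeed(Supplier<String> loader, long maxAgeMillis) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the cached feed, reloading it when missing or stale at the given time. */
    synchronized String get(long nowMillis) {
        if (cached == null || nowMillis - loadedAt >= maxAgeMillis) {
            cached = loader.get();
            loadedAt = nowMillis;
        }
        return cached;
    }

    /** Convenience overload that uses the system clock. */
    String get() {
        return get(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in loader that counts how often the "provider" is contacted
        CachedFeed feed = new CachedFeed(() -> "feed#" + (++calls[0]), 1000);
        System.out.println(feed.get(0));     // loads from the provider
        System.out.println(feed.get(999));   // served from the cache
        System.out.println(feed.get(1000));  // stale, so it reloads
        System.out.println("provider calls: " + calls[0]);
    }
}
```

Passing the current time as a parameter keeps the expiry logic easy to check without waiting on a real clock. A servlet could hold one `CachedFeed` per source URL and call `get()` on each request.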


Finally

- Thanks for attending
- Source Code Available
  - http://www.synctank.com/xmldevcon
  - [email protected]

Aloha