XML Data Management
XSLT
Werner Nutt
A Hello World! Stylesheet
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" encoding="utf-8" />
<xsl:template match="/">
<hello>world</hello>
</xsl:template>
</xsl:stylesheet>
• Top-level: an <xsl:stylesheet> element with a version="1.0" attribute
• Declarations (all elements except the <xsl:template> ones): in this case, just an <xsl:output>
• Template rules: in this case, a template that applies to the root node
Invocation of an XSLT Stylesheet
An XSLT stylesheet may be invoked:
• Programmatically, through an XSLT library
• Through a command-line interface
• In a Web publishing context, by including a styling processing instruction in the XML document
<?xml version="1.0"?>
<?xml-stylesheet href="blabla.xsl" type="application/xml"?>
<doc>
  <blublu/>
</doc>
– the transformation can be processed on the server side by a PHP, ASP, JSP, ... script
– or on the client side through the XSLT engines integrated into most browsers.
Web Publishing with XSLT
[Figure: two diagrams, each showing an XML Document, an XSLT Stylesheet, a Network, and the resulting HTML]
Stylesheet Output
• method is either xml (default), html or text
• encoding is the desired encoding of the result
• doctype-public and doctype-system make it possible to add a document type declaration to the resulting document
• indent specifies whether the resulting XML document will be indented (default is no)
<xsl:output method="html"
encoding="iso-8859-1"
doctype-public="-//W3C//DTD HTML 4.01//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"
indent="yes" />
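As a contrast to the HTML settings above, here is a minimal sketch of a stylesheet that uses method="text". The input structure (a book element with a title child) is an assumption made only for this illustration.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- method="text": only string values are written, no markup -->
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Assumed input: <book><title>...</title></book> -->
  <xsl:template match="/">
    <xsl:text>Title: </xsl:text>
    <xsl:value-of select="book/title"/>
    <!-- &#10; is a line feed, written explicitly with xsl:text -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With this output method, doctype-public, doctype-system and indent have no effect, since no markup is serialized.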
Handling Whitespace
• <xsl:strip-space> specifies the set of elements whose whitespace-only text child nodes will be removed
• <xsl:preserve-space> allows for exceptions to this list
Both elements require a whitespace-separated list of name tests as the value of their elements attribute.
<xsl:strip-space elements="*" />
<xsl:preserve-space elements="para poem" />
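To make the effect of the two declarations visible, the following sketch combines them with an identity-style copy, so that whitespace stripping is the only change to the document. The element names doc and note in the comment are hypothetical; para is taken from the declaration above.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Strip whitespace-only text nodes everywhere ... -->
  <xsl:strip-space elements="*"/>
  <!-- ... except inside para and poem elements -->
  <xsl:preserve-space elements="para poem"/>
  <!-- Copy every node and attribute unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Assumed input: <doc> <para> </para> <note> </note> </doc>
       The whitespace-only text nodes below doc and note are removed,
       while the one below para is kept. -->
</xsl:stylesheet>
```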
The <xsl:template> Element
A template consists of
• A pattern: a (restricted) XPath expression that determines the nodes to which the template applies. The pattern is the value of the “match” attribute.
• A body: a well-formed XML fragment that is inserted in the output document when the template is instantiated
<xsl:template match="book">
  The book title is: <xsl:value-of select="title" />
  <h2>Authors list</h2>
  <ul>
    <xsl:apply-templates select="authors/name" />
  </ul>
</xsl:template>
XPath Patterns in XSLT
The XPath expression of the “match” attribute describes the nodes
that can be the target of a template instantiation.
Those expressions are called patterns and must satisfy:
• A pattern always denotes a node set.
Example: <xsl:template match='1'> is incorrect.
• Checking whether a node is matched must be easy.
Example: <xsl:template match='preceding::*[12]'> is meaningful, but difficult to evaluate.
Pattern syntax:
• A pattern is a valid XPath expression that uses only the child and attribute (@) axes and the abbreviation //. Predicates are allowed.
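The pattern syntax can be illustrated with a few match attributes. This is only a sketch: the element and attribute names are borrowed from the examples in these slides, and the template bodies are placeholders.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- A single name test: matches every book element -->
  <xsl:template match="book">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Child steps: name elements whose parent is an authors element -->
  <xsl:template match="authors/name">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- The // abbreviation: title elements at any depth below a book -->
  <xsl:template match="book//title">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- A predicate: only ingredients that carry an amount attribute -->
  <xsl:template match="ingredient[@amount]">
    <xsl:value-of select="@amount"/>
  </xsl:template>
  <!-- The attribute axis: matches name attributes -->
  <xsl:template match="@name">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Not allowed as a pattern: match="preceding::*[12]",
       because it uses an axis other than child or attribute -->
</xsl:stylesheet>
```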
Content of a Template Body
• Literal elements and text
Example: <h2>Authors</h2>. Creates in the output document an element h2 with a text child node 'Authors'.
• Values and elements from the input document
Example: <xsl:value-of select='title'/>. Inserts in the output document a text node holding the string value of the result of the XPath expression title.
• Calls to other templates
Example: <xsl:apply-templates select='authors'/>. Applies a template to each node in the node set that results from the XPath expression authors.
These are only the basics of XSLT programming! Many advanced features (modes, priorities, loops and tests) lie beyond this core description.
Instantiation of an <xsl:template>
Main principles:
• Literal elements (those that don’t belong to the XSLT namespace) and text are simply copied to the output document.
• Context node: A template is always instantiated in the context of a node from the input document.
• XPath expressions: all the (relative) XPath expressions found in the template are evaluated with respect to the context node (an exception: inside <xsl:for-each>, the context node is the node currently iterated over).
• Calls with <xsl:apply-templates>: find and instantiate a template for each node selected by the XPath expression in select.
• Template call substitution: any call to other templates is eventually replaced by the instantiation of these templates.
The <xsl:apply-templates> Element
• select: an XPath expression which, if relative, is interpreted with respect to the context node. Note: the default value is child::node(), i.e., select all the children of the context node
• mode: a label which can be used to specify which kind of template is required.
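• priority: gives a priority level in case of conflict. Note: priority is an attribute of <xsl:template>, not of <xsl:apply-templates>.
<xsl:apply-templates select="authors/name" mode="top"/>
<xsl:template match="name" mode="top" priority="1">
  <li><xsl:value-of select="." /></li>
</xsl:template>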
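The interplay of mode and priority can be sketched as follows. In XSLT, mode appears on both <xsl:apply-templates> and <xsl:template>, whereas priority is written on <xsl:template>. The output element names out, summary and list are hypothetical and chosen only for this illustration.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="book">
    <out>
      <!-- Asks for templates labelled mode="top" -->
      <summary><xsl:apply-templates select="authors/name" mode="top"/></summary>
      <!-- Asks for templates in the default (unnamed) mode -->
      <list><xsl:apply-templates select="authors/name"/></list>
    </out>
  </xsl:template>
  <!-- Only considered for calls with mode="top" -->
  <xsl:template match="name" mode="top">
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <!-- Two templates match name in the default mode. Without the explicit
       priority, the more specific pattern authors/name (default priority
       0.5) would win over name (default priority 0); priority="1"
       reverses this. -->
  <xsl:template match="name" priority="1">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
  <xsl:template match="authors/name">
    <li>not chosen</li>
  </xsl:template>
</xsl:stylesheet>
```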
The <xsl:apply-templates> Mechanism
Templates:
<xsl:template match="book">
  <ul>
    <xsl:apply-templates select="authors/name" />
  </ul>
</xsl:template>
<xsl:template match="name">
  <li><xsl:value-of select="." /></li>
</xsl:template>
Input:
<book>...
  <authors>
    <name>Jim</name>
    <name>Joe</name>
  </authors>
</book>
Output:
<ul>
  <li>Jim</li>
  <li>Joe</li>
</ul>
The Execution Model of XSLT
An XSLT stylesheet consists of a set of templates.
The transformation of an input document I proceeds as follows:
1. The engine considers the root node r of I, and selects the template that applies to r.
2. The template body is copied in the output document O.
3. Next, the engine considers all the <xsl:apply-templates> in O, and evaluates their select XPath expressions, taking r as context node.
4. For each node in the result of the XPath evaluation, a template is selected, and its body replaces the <xsl:apply-templates> call.
5. The process iterates, as new <xsl:apply-templates> are inserted in O.
6. The transformation terminates when O is free of xsl: instructions.
The Execution Model: Illustration
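The model can be traced on the small book example. The sketch below is a conceptual illustration only: real processors need not literally rewrite an output document, and the result wrapper element is a hypothetical addition.

```xml
<!-- Input document I (from the book example, elided content left out):
     <book><authors><name>Jim</name><name>Joe</name></authors></book> -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- This template applies to the root node r; its body starts O -->
  <xsl:template match="/">
    <result><xsl:apply-templates select="book"/></result>
  </xsl:template>
  <xsl:template match="book">
    <ul><xsl:apply-templates select="authors/name"/></ul>
  </xsl:template>
  <xsl:template match="name">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>
<!-- Successive states of O under the model:
     (a) <result><xsl:apply-templates select="book"/></result>
     (b) <result><ul><xsl:apply-templates select="authors/name"/></ul></result>
     (c) <result><ul><li><xsl:value-of select="."/></li>
                     <li><xsl:value-of select="."/></li></ul></result>
     (d) <result><ul><li>Jim</li><li>Joe</li></ul></result>
     No xsl: instructions remain in (d), so the transformation terminates. -->
```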
“Return all Titles of Recipes”
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="iso-8859-1" indent="yes"/>
<xsl:template match="/">
  <titles>
    <xsl:for-each select="//title">
      <xsl:copy-of select="./self::*"/>
    </xsl:for-each>
  </titles>
</xsl:template>
</xsl:stylesheet>
• <xsl:for-each>: iterate over all title elements
• <xsl:copy-of>: copy each title
A Variant That Does Not Copy the Content
…
<xsl:template match="/">
  <titles>
    <xsl:for-each select="//title">
      <xsl:copy/>
    </xsl:for-each>
  </titles>
</xsl:template>
…
• copy-of: produces a deep copy, i.e., copies a subtree
• copy: produces a shallow copy, i.e., copies one node
(plus optionally namespace info for elements)
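The difference between the two instructions shows up on a title that has an attribute. In this sketch the lang attribute and the deep and shallow wrapper elements are hypothetical; the input is assumed to consist of the single title element given in the comment.

```xml
<!-- Assumed input: <title lang="en">Ricotta Pie</title> -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="/">
    <result>
      <!-- copy-of: the element with its attribute and its text -->
      <deep><xsl:copy-of select="title"/></deep>
      <!-- copy: the element node only, without attribute or text -->
      <shallow>
        <xsl:for-each select="title">
          <xsl:copy/>
        </xsl:for-each>
      </shallow>
    </result>
  </xsl:template>
</xsl:stylesheet>
<!-- deep contains <title lang="en">Ricotta Pie</title>,
     shallow contains an empty <title/> -->
```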
Identity Map
Deep copy:
<xsl:template match="/">
  <xsl:copy-of select="*"/>
</xsl:template>
Shallow copy:
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>
Shallow Copy, with Explicit Text Copy
<xsl:template match="/">
<titles>
<xsl:for-each select="//title">
<xsl:copy>
<xsl:for-each select="text()">
<xsl:copy/>
</xsl:for-each>
</xsl:copy>
</xsl:for-each>
</titles>
</xsl:template>
• Outer <xsl:copy>: shallow copy of title
• Inner <xsl:copy>: shallow copy of title content
Element Construction
<xsl:template match="/">
<xsl:element name="titles">
<xsl:for-each select="//title">
<xsl:copy>
<xsl:for-each select="text()">
<xsl:copy/>
</xsl:for-each>
</xsl:copy>
</xsl:for-each>
</xsl:element>
</xsl:template>
Dynamic element constructor
Copying Attributes
<xsl:template match="/">
<xsl:element name="recipe-ingredients">
<xsl:for-each select="//recipe[1]//ingredient/@name">
<xsl:element name="ingredient">
<xsl:copy/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
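Copying attributes leads to element fusion: the copied attribute is attached to the element under construction (here, the new ingredient element)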
Constructing Attributes
<xsl:template match="/">
<xsl:element name="recipe-ingredients">
<xsl:for-each select="//recipe[1]//ingredient/@name">
<xsl:element name="ingredient">
<xsl:attribute name="name" select="."/>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
Dynamic attribute constructor
• It’s called “dynamic” because element and attribute names can be computed on the fly
Computing Attribute Values
<xsl:template match="/">
<xsl:element name="recipe-ingredients">
<xsl:for-each select="//recipe[1]//ingredient/@name">
      <ingredient name="{.}"/>
</xsl:for-each>
</xsl:element>
</xsl:template>
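Expressions in {braces} inside attribute values will be evaluated. Here the context node is a name attribute, so {.} yields its value (self::* would select nothing, because it matches only elements).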
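For comparison, the following sketch places the brace notation next to an explicit attribute constructor written in XSLT 1.0 style (content instead of a select attribute). Both forms are meant to produce the same element, so every ingredient appears twice in the output of this illustration.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="/">
    <recipe-ingredients>
      <xsl:for-each select="//recipe[1]//ingredient/@name">
        <!-- Attribute value template: {.} is the string value
             of the current name attribute -->
        <ingredient name="{.}"/>
        <!-- Same result with a dynamic attribute constructor -->
        <ingredient>
          <xsl:attribute name="name">
            <xsl:value-of select="."/>
          </xsl:attribute>
        </ingredient>
      </xsl:for-each>
    </recipe-ingredients>
  </xsl:template>
</xsl:stylesheet>
```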
Nested Iteration with <xsl:for-each>
<xsl:template match="/">
<my-recipes>
<xsl:for-each select=".//recipe">
<my-recipe title="{title}">
<xsl:for-each select="ingredient">
<my-ingredient>
<xsl:value-of select="@name"/>
</my-ingredient>
</xsl:for-each>
</my-recipe>
</xsl:for-each>
</my-recipes>
</xsl:template>
Turn
• recipe titles into attributes
• ingredient names into strings
As Before, with 2 Levels of Ingredients
<xsl:template match="/">
  <my-recipes>
    <xsl:for-each select=".//recipe">
      <my-recipe title="{title}">
        <!-- Level 1 -->
        <xsl:for-each select="ingredient">
          <my-ingredient>
            <xsl:value-of select="@name"/>
            <!-- Level 2 -->
            <xsl:for-each select="ingredient">
              <my-ingredient>
                <xsl:value-of select="@name"/>
              </my-ingredient>
            </xsl:for-each>
          </my-ingredient>
        </xsl:for-each>
      </my-recipe>
    </xsl:for-each>
  </my-recipes>
</xsl:template>
Nested Iteration with Template Calls
<xsl:template match="/">
<xsl:apply-templates select="recipes"/>
</xsl:template>
<xsl:template match="recipes">
<my-recipes>
<xsl:apply-templates select="recipe"/>
</my-recipes>
</xsl:template>
• Root calls recipes
• Recipes calls recipe
Nested Iteration with Template Calls (cont'd)
<xsl:template match="recipe">
  <my-recipe title="{title}">
    <xsl:apply-templates select="ingredient"/>
  </my-recipe>
</xsl:template>
<xsl:template match="ingredient">
  <ingredient>
    <name>
      <xsl:value-of select="@name"/>
    </name>
    <xsl:apply-templates select="ingredient"/>
  </ingredient>
</xsl:template>
• Recipe calls ingredient
• Ingredient calls ingredient
Sorted List of All Ingredients
<xsl:template match="/">
  <result>
    <xsl:apply-templates select="/recipes"/>
  </result>
</xsl:template>
<xsl:template match="recipes">
  <xsl:for-each select="recipe//ingredient">
    <xsl:sort select="@name" />
    <ingredient>
      <xsl:value-of select="@name"/>
    </ingredient>
  </xsl:for-each>
</xsl:template>
<xsl:sort> can be nested into <xsl:for-each>
Sorting: Does This Work?
<xsl:template match="/">
  <result>
    <xsl:apply-templates select="//ingredient"/>
  </result>
</xsl:template>
<xsl:template match="ingredient">
  <xsl:for-each select=".">
    <xsl:sort select="@name" />
    <ingredient>
      <xsl:value-of select="@name"/>
    </ingredient>
  </xsl:for-each>
</xsl:template>
We want to sort all ingredients by name
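The stylesheet above does not sort: the ingredient template is instantiated once per ingredient, so <xsl:for-each select="."> iterates over a single node and <xsl:sort> has nothing to reorder. One possible repair, sketched here, places <xsl:sort> inside <xsl:apply-templates>, which sorts the selected nodes before templates are applied to them.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="/">
    <result>
      <!-- Sort the whole node set before applying templates -->
      <xsl:apply-templates select="//ingredient">
        <xsl:sort select="@name"/>
      </xsl:apply-templates>
    </result>
  </xsl:template>
  <!-- Called once per ingredient, in sorted order -->
  <xsl:template match="ingredient">
    <ingredient>
      <xsl:value-of select="@name"/>
    </ingredient>
  </xsl:template>
</xsl:stylesheet>
```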
Exercise: Restructuring Recipes
Return a list, inside an element <recipes>, of recipes, containing for every recipe the recipe’s title element and an element with the number of calories.
Use different approaches:
(a) express iteration by recursive calls of templates
(b) express iteration by <xsl:for-each> elements.
Create new elements
(a) by explicit construction, that is, by writing the tags into the code,
(b) by dynamic construction, that is, by using <xsl:element> and <xsl:attribute> elements,
(c) by shallow and deep copying, wherever the latter is possible.
Ordered Output
Using iteration by recursion, return a similar list,
alphabetically ordered according to title.
Using iteration by means of <xsl:for-each>,
return a similar list, ordered according to calories
in descending order.
Element and Attribute Construction
Return a similar list, with title as attribute and
calories as content.
Return a list, inside an element <recipes>, of recipes,
where each recipe contains the title and the top level
ingredients, while dropping the lower level ingredients.
XSLT for Recipes (1/6)
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="iso-8859-1" indent="yes"/>
<xsl:template match="/">
  <html>
    <head>
      <title>My Best Recipes</title>
    </head>
    <body>
      <table border="1">
        <xsl:apply-templates select="recipes/recipe"/>
      </table>
    </body>
  </html>
</xsl:template>
XSLT for Recipes (2/6)
<xsl:template match="recipe">
<tr>
<td>
<h1><xsl:value-of select="title"/></h1>
<ul><xsl:apply-templates select="ingredient"/></ul>
<xsl:apply-templates select="preparation"/>
<xsl:apply-templates select="comment"/>
<xsl:apply-templates select="nutrition"/>
</td>
</tr>
</xsl:template>
XSLT for Recipes (3/6)
<xsl:template match="ingredient">
  <xsl:choose>
    <xsl:when test="@amount">
      <li>
        <xsl:if test="@amount!='*'">
          <xsl:value-of select="@amount"/>
          <xsl:text> </xsl:text>
          <xsl:if test="@unit">
            <xsl:value-of select="@unit"/>
            <xsl:if test="number(@amount)>number(1)">
              <xsl:text>s</xsl:text>
            </xsl:if>
            <xsl:text> of </xsl:text>
          </xsl:if>
        </xsl:if>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@name"/>
      </li>
    </xsl:when>
XSLT for Recipes (4/6)
<xsl:otherwise>
<li><xsl:value-of select="@name"/></li>
<ul><xsl:apply-templates select="ingredient"/></ul>
<xsl:apply-templates select="preparation"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
XSLT for Recipes (5/6)
<xsl:template match="preparation">
  <ol><xsl:apply-templates select="step"/></ol>
</xsl:template>
<xsl:template match="step">
  <li><xsl:value-of select="node()"/></li>
</xsl:template>
<xsl:template match="comment">
  <ul>
    <li type="square">
      <xsl:value-of select="node()"/>
    </li>
  </ul>
</xsl:template>
XSLT for Recipes (6/6)
<xsl:template match="nutrition">
  <table border="2">
    <tr>
      <th>Calories</th><th>Fat</th><th>Carbohydrates</th><th>Protein</th>
      <xsl:if test="@alcohol"><th>Alcohol</th></xsl:if>
    </tr>
    <tr>
      <td align="right"><xsl:value-of select="@calories"/></td>
      <td align="right"><xsl:value-of select="@fat"/></td>
      <td align="right"><xsl:value-of select="@carbohydrates"/></td>
      <td align="right"><xsl:value-of select="@protein"/></td>
      <xsl:if test="@alcohol">
        <td align="right"><xsl:value-of select="@alcohol"/></td>
      </xsl:if>
    </tr>
  </table>
</xsl:template>
A Different View
<xsl:template match="/">
  <nutrition>
    <xsl:apply-templates select="recipes/recipe"/>
  </nutrition>
</xsl:template>
<xsl:template match="recipe">
  <dish name="{title/text()}"
        calories="{nutrition/@calories}"
        fat="{nutrition/@fat}"
        carbohydrates="{nutrition/@carbohydrates}"
        protein="{nutrition/@protein}"
        alcohol="{if (nutrition/@alcohol) then nutrition/@alcohol else '0%'}"/>
</xsl:template>
The Output
<?xml version="1.0" encoding="iso-8859-1"?>
<nutrition>
  <dish name="Beef Parmesan with Garlic Angel Hair Pasta"
        calories="1167" fat="23" carbohydrates="45" protein="32" alcohol="0%"/>
  <dish name="Ricotta Pie"
        calories="349" fat="18" carbohydrates="64" protein="18" alcohol="0%"/>
  <dish name="Linguine alla Pescatora"
        calories="532" fat="12" carbohydrates="59" protein="29" alcohol="0%"/>
  <dish name="Zuppa Inglese"
        calories="612" fat="49" carbohydrates="45" protein="4" alcohol="2"/>
  <dish name="Cailles en Sarcophages"
        calories="1892" fat="33" carbohydrates="28" protein="39" alcohol="0%"/>
</nutrition>
A Sorted List of Ingredients w/o Duplicates
<xsl:template match="recipes">
  <xsl:for-each select="recipe//ingredient">
    <xsl:sort select="@name" />
    <xsl:if test="not(@name = preceding::ingredient/@name)">
      <ingredient>
        <xsl:copy-of select="@name"/>
      </ingredient>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
We ensure that only ingredients are output that have not appeared before.
This test for duplicates can be expensive!
Duplicate Elimination à la Muench *
Step 1: Construct keys (= indices for node sets)
name: name of the key
match: the node set to be indexed
use: the index key values
<xsl:key name="ingredients-by-name" match="ingredient" use="@name"/>
<xsl:key name="recipes-by-title" match="recipe" use="title"/>
* Invented by Steve Muench, called the “Muenchian Method” in the XSLT world
<xsl:key> is a top-level element that declares a named key that can be used in the stylesheet with the key() function.
Note: A key does not have to be unique!
Duplicate Elimination à la Muench
Step 2: Iterate over the recipes …
<xsl:template match="/"> <result> <xsl:apply-templates select="/recipes"/> </result></xsl:template>
Duplicate Elimination à la Muench
<xsl:template match="recipes">
  <xsl:for-each select="recipe//ingredient
      [count(. | key('ingredients-by-name', @name)[1]) = 1]">
    <xsl:sort select="@name" />
    <ingredient>
      <xsl:copy-of select="@name"/>
      <xsl:for-each select="key('recipes-by-title',
                                ancestor::recipe/title)">
        <xsl:copy>
          <xsl:copy-of select="title"/>
        </xsl:copy>
      </xsl:for-each>
    </ingredient>
  </xsl:for-each>
</xsl:template>
• Select those ingredients that occur as the first element in their index group; the others are redundant …
• Sort ingredients by name, then retrieve recipes from their index
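The "first in its group" test is often written with generate-id() instead of count(. | ...). The sketch below shows that variant on the ingredients key; the uses attribute, which counts the members of each group, is a hypothetical addition for illustration.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Index all ingredient elements by the value of their name attribute -->
  <xsl:key name="ingredients-by-name" match="ingredient" use="@name"/>
  <xsl:template match="/">
    <result>
      <!-- Keep an ingredient only if it is the very node that comes
           first in its key group (same generated identifier) -->
      <xsl:for-each select="//ingredient[generate-id() =
          generate-id(key('ingredients-by-name', @name)[1])]">
        <xsl:sort select="@name"/>
        <!-- The key lookup also gives access to the whole group -->
        <ingredient name="{@name}"
                    uses="{count(key('ingredients-by-name', @name))}"/>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Both tests check that the current node and the first node of its group are identical: the union of two identical nodes has size 1, and identical nodes have the same generated identifier.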
Grouping in XSLT 2.0
<xsl:template match="/">
<uses>
<xsl:for-each-group select="//ingredient"
group-by="@name">
<xsl:sort select="@name"/>
<use name="{current-grouping-key()}"
count="{count(current-group())}"/>
</xsl:for-each-group>
</uses>
</xsl:template>
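A small worked case may help to see what current-grouping-key() and current-group() deliver. The input fragment in the comment is hypothetical data, and this sketch sorts by current-grouping-key(), a variant of the sort key used above.

```xml
<!-- Assumed input (hypothetical data):
     <recipes>
       <recipe><ingredient name="sugar"/><ingredient name="flour"/></recipe>
       <recipe><ingredient name="flour"/></recipe>
     </recipes> -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <uses>
      <!-- One iteration per distinct value of the name attribute -->
      <xsl:for-each-group select="//ingredient" group-by="@name">
        <xsl:sort select="current-grouping-key()"/>
        <use name="{current-grouping-key()}"
             count="{count(current-group())}"/>
      </xsl:for-each-group>
    </uses>
  </xsl:template>
</xsl:stylesheet>
<!-- Groups: "flour" with two members and "sugar" with one, so the result
     is, up to indentation,
     <uses><use name="flour" count="2"/><use name="sugar" count="1"/></uses> -->
```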
countries.dtd
<!ELEMENT countries (country*)>
<!ELEMENT country (city*, language*)>
<!ATTLIST country name CDATA #REQUIRED
                  population CDATA #REQUIRED
                  area CDATA #REQUIRED>
<!ELEMENT city (name, population)>
<!ELEMENT language (#PCDATA)>
<!ATTLIST language percentage CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT population (#PCDATA)>
Queries About Countries: Example 1
Restructure the document by
• listing countries according to population,
• cities within each country according to population, and
• languages within each country according to percentage.
Restructuring the Countries Document (1)
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="iso-8859-1" indent="yes"/>
<xsl:template match="/">
  <xsl:apply-templates select="countries"/>
</xsl:template>
<xsl:template match="countries">
  <xsl:copy>
    <xsl:apply-templates select="country">
      <xsl:sort select="@population" order="descending" data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>
Restructuring the Countries Document (2)
<xsl:template match="country">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates select="city">
      <xsl:sort select="population" order="descending" data-type="number"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="language">
      <xsl:sort select="@percentage" order="descending" data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>
Restructuring the Countries Document (3)
<xsl:template match="city">
<xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="language">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
Compare the XQuery
let $doc := doc("countries.xml")
let $cs := $doc//country
return
  element countries {
    for $c in $cs
    order by number($c/@population) descending
    return
      element country {
        $c/@*,
        for $city in $c/city
        order by number($city/population) descending
        return $city,
        for $l in $c/language
        order by number($l/@percentage) descending
        return $l
      }
  }
Languages Spoken in Countries
Return a list of language elements, alphabetically sorted, where each language element contains
– a list of country elements,
– such that the language is spoken in the country,
– together with the number of speakers of the language in that country.
Difficulties:
• eliminate duplicates among languages
• retrieve the countries where the language is spoken: how can we remember that language?
Languages Spoken in Countries: XQuery
let $doc := doc("countries.xml")
let $ls := distinct-values($doc//language)
let $cs := $doc//country
return
<languages>{
  for $l in $ls
  order by $l
  return
    <language>
      {attribute name {$l}}
      {for $c in $cs[language=$l]
       order by $c/@name
       return
         <country>
           {$c/@name}
           {attribute speakers
              {xs:int(($c/@population div 100)
                      * $c/language[.=$l]/@percentage)}}
         </country>}
    </language>
}</languages>
The language is remembered in variable $l
Languages Spoken in Countries: XSLT
<xsl:template name="top" match="/">
<xsl:element name="languages">
<xsl:apply-templates select=".//language">
<xsl:sort select="text()"
order="ascending"
data-type="text"/>
</xsl:apply-templates>
</xsl:element>
</xsl:template>
named template
Languages Spoken in Countries: XSLT
<xsl:template name="language" match="language">
<xsl:if test="not(text() = preceding::language)">
<xsl:copy>
<xsl:attribute name="name" select="text()"/>
<xsl:call-template name="country-with-language">
<xsl:with-param name="language" select="."/>
</xsl:call-template>
</xsl:copy>
</xsl:if>
</xsl:template>
• <xsl:if>: eliminate duplicates (why not "not(. = preceding::language)"?)
• <xsl:call-template>: calling a named template
• <xsl:with-param>: adding a parameter to the call
Languages Spoken in Countries: XSLT
<xsl:template name="country-with-language">
<xsl:param name="language"/>
<xsl:for-each select="//country[language=$language]">
<xsl:copy>
<xsl:copy-of select="@name"/>
<xsl:attribute name="speakers" select="format-number(
(@population div 100) * $language/@percentage,'0')"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>
• No matching: the context node is the context node at the call
• <xsl:param> makes the parameter of the call available
• $language is a reference to the parameter
• With format-number we can specify number formats, e.g., '0' indicates digit notation