Processing of structured documents: transforming XML
Extensible Stylesheet Language (XSL)
a language for transforming XML documents: XSLT
an XML vocabulary for specifying the formatting of XML documents
XSLT
specifies the conversion of a document from one format to another
an XSLT transformation (stylesheet) is a well-formed XML document
based on a hierarchical tree structure
a transformation describes rules for transforming a source tree into a result tree
a rule: a template with a pattern
a pattern is matched against elements in the source tree
a template is instantiated to create part of the result tree
Processing model
A list of source nodes is processed to create a result tree fragment
the result tree is constructed by processing a list containing just the root node
a list of source nodes is processed by appending the result tree structure created by processing each of the members of the list in order
Processing model
A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them
the chosen rule’s template is then instantiated with the node as the current node and with the list of source nodes as the current node list
A template typically contains instructions that select an additional list of source nodes (e.g. children) for processing
processing continues until no new source nodes are selected
An XSL stylesheet is an XML document
must be well-formed
should begin with an XML declaration
must declare all the namespaces it uses
the XSL namespace (prefix xsl:) defines elements that are needed for performing transformations
Skeleton XSL stylesheet
<?xml version="1.0" ?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
...
</xsl:stylesheet>
Printing all the text data:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
Template rules
A template rule is specified with the xsl:template element
attribute match: a pattern that identifies the source node or nodes to which the rule applies
the content is the template that is instantiated when the template rule is applied
<xsl:template match="[XPath expression]">
  <!-- content -->
</xsl:template>
Example
In XML document:
This is an <emph>important</emph> point.
The following template rule matches emph elements and produces a fo:inline-sequence formatting object with a font-weight property of bold.
<xsl:template match="emph">
  <fo:inline-sequence font-weight="bold">
    <xsl:apply-templates/>
  </fo:inline-sequence>
</xsl:template>
Applying template rules
recursively processing the children of the current source element
element: xsl:apply-templates
attribute: select
in the absence of select attribute, the xsl:apply-templates instruction processes all of the children of the current node, including text nodes
a select attribute is used to process nodes selected by an expression (that returns a node set)
the selected set of nodes is processed in document order
Examples
<xsl:template match="author-group">
  <fo:inline-sequence>
    <xsl:apply-templates select="author"/>
  </fo:inline-sequence>
</xsl:template>
<xsl:template match="author-group">
  <fo:inline-sequence>
    <xsl:apply-templates select="author/given-name"/>
  </fo:inline-sequence>
</xsl:template>
Example
Processing of all of the heading descendant elements of the book element:
<xsl:template match="book">
  <fo:block>
    <xsl:apply-templates select=".//heading"/>
  </fo:block>
</xsl:template>
Example
Assume: a department element has a dname child and employee descendants
the rule finds an employee’s department and then processes the dname child of the department
<xsl:template match="employee">
  <fo:block>
    Employee <xsl:apply-templates select="name"/> belongs to
    department <xsl:apply-templates select="ancestor::department/dname"/>.
  </fo:block>
</xsl:template>
Built-in template rules
There are built-in template rules to allow recursive processing to continue in the absence of a successful pattern match by an explicit template rule in the stylesheet
<xsl:template match="* | /">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text() | @*">
  <xsl:value-of select="."/>
</xsl:template>
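A minimal sketch of how an explicit rule interacts with the built-in rules, assuming a hypothetical source document that contains title elements somewhere in the tree. The built-in element rule keeps the recursion going, while an empty rule for text() replaces the built-in text rule so that only the titles are printed:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Explicit rule: print the string-value of each title on its own line.
       "title" is an assumed element name, used only for illustration. -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Explicit empty rule: takes the place of the built-in text rule,
       so text outside title elements is dropped instead of copied. -->
  <xsl:template match="text()"/>

  <!-- No rule for other elements: the built-in rule for "*" and "/"
       applies templates to their children. -->
</xsl:stylesheet>
```

Without the empty text() rule, the built-in rule would copy every text node to the output.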
Named templates
Templates can be invoked by name
an xsl:template element with a name attribute specifies a named template
if an xsl:template element has a name attribute, it may also have a match attribute
an xsl:call-template element invokes a template by name (using the name attribute)
xsl:call-template does not change the current node or the current node list (unlike xsl:apply-templates)
Creating content
Literal result elements
creating elements with xsl:element
creating attributes with xsl:attribute and named attribute sets with xsl:attribute-set
creating text, PIs and comments
copying
computing generated text
numbering
Literal result elements
In a template, an element that does not belong to the XSLT namespace (i.e. is not an XSL instruction) is instantiated to create an element node with the same name
the content of the element is a template, which is instantiated to give the content of the created element node
the created element node will have the attribute nodes that were present on the element node in the stylesheet tree
Example: Generating HTML
<xsl:template match="book">
  <html>
    <head>
      <title>Here is my HTML page!</title>
    </head>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
Creating elements with xsl:element
The xsl:element element allows an element to be created with a computed name
the name of the element to be created is specified by a required name attribute and an optional namespace attribute
the content of the xsl:element element is a template for the attributes and children of the created element
the name attribute is interpreted as an attribute value template
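A short sketch of a computed element name, assuming a hypothetical heading element whose level attribute holds a number from 1 to 6. Because the name attribute is an attribute value template, the expression in curly braces is evaluated for each matched node:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- "heading" and its "level" attribute are assumed names.
       <heading level="2">Intro</heading> becomes <h2>Intro</h2>. -->
  <xsl:template match="heading">
    <xsl:element name="h{@level}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

A literal result element could not express this, since its name is fixed in the stylesheet.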
Creating attributes with xsl:attribute
The xsl:attribute element can be used to add attributes to result elements whether created by literal result elements in the stylesheet or by instructions such as xsl:element
the name of the attribute to be created is specified by a required name attribute and an optional namespace attribute
instantiating an xsl:attribute element adds an attribute node to the containing result element node
Creating attributes with xsl:attribute
the content of the xsl:attribute element is a template for the value of the created attribute
the name attribute is interpreted as an attribute value template
adding an attribute to an element replaces any existing attribute of that element with the same name
an attribute has to be added before the children
Example
<xsl:element name="myElement">
  <xsl:attribute name="myAttribute">XML</xsl:attribute>is great!</xsl:element>
Produces: <myElement myAttribute="XML">is great!</myElement>
Named attribute sets
The xsl:attribute-set element defines a named set of attributes
the name attribute specifies the name of the attribute set
the content of the xsl:attribute-set element consists of zero or more xsl:attribute elements that specify the attributes in the set
attribute sets are used by specifying a use-attribute-sets attribute on xsl:element, xsl:copy or xsl:attribute-set elements
Named attribute sets
The value of the use-attribute-sets attribute is a whitespace-separated list of names of attribute sets
attribute sets can also be used by specifying an xsl:use-attribute-sets attribute on a literal result element
order of adding attributes:
1. attribute sets
2. attributes specified on the literal result element
3. any attributes specified by xsl:attribute elements
later ones override the earlier ones
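A sketch of the override order, assuming a hypothetical price element that is rendered as an HTML table cell. The attribute set name cell-style and the attribute values are chosen only for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- 1. Attributes from the set are added first. -->
  <xsl:attribute-set name="cell-style">
    <xsl:attribute name="align">left</xsl:attribute>
    <xsl:attribute name="valign">top</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="price">
    <!-- 2. align="right" on the literal result element overrides the set. -->
    <td xsl:use-attribute-sets="cell-style" align="right">
      <!-- 3. xsl:attribute is added last and overrides valign from the set. -->
      <xsl:attribute name="valign">bottom</xsl:attribute>
      <xsl:apply-templates/>
    </td>
  </xsl:template>

</xsl:stylesheet>
```

The created cell carries align="right" and valign="bottom".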
Creating text
a template can also contain text nodes
each text node in a template will create a text node with the same string-value in the result tree
adjacent text nodes are automatically merged
literal data characters may also be wrapped in an xsl:text element (may change whitespace handling)
Creating PIs
The xsl:processing-instruction element is instantiated to create a processing instruction node
the name attribute specifies the name of the processing instruction node
<xsl:processing-instruction name="xml-stylesheet">href="book.css" type="text/css"</xsl:processing-instruction>
creates: <?xml-stylesheet href="book.css" type="text/css"?>
Creating comments
The xsl:comment element is instantiated to create a comment node in the result tree
<xsl:comment>This file is automatically generated. Do not edit!</xsl:comment>
creates:
<!--This file is automatically generated. Do not edit!-->
Copying
The xsl:copy element provides an easy way of copying the current node
attributes and children are not automatically copied
the content of the xsl:copy element is a template for the attributes and children of the created node
Example
Copying the language attributes for each element
use (instead of <xsl:apply-templates/>): <xsl:call-template name="apply-templates-copy-lang"/>
<xsl:template name="apply-templates-copy-lang">
  <xsl:for-each select="@xml:lang">
    <xsl:copy/>
  </xsl:for-each>
  <xsl:apply-templates/>
</xsl:template>
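A common use of xsl:copy is the identity transformation, sketched here together with one overriding rule. The element name draft-note is an assumption made for the example:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Identity rule: copy the current node, then process its attributes
       and children so that they are copied in turn. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override (hypothetical element): an empty template drops
       draft-note elements and everything inside them. -->
  <xsl:template match="draft-note"/>

</xsl:stylesheet>
```

The override wins because a pattern that is only an element name has default priority 0, while node() and @* have -0.5.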
xsl:copy-of
The xsl:copy-of element can be used to insert a result tree fragment into the result tree, without first converting it to a string (as xsl:value-of does)
the required select attribute contains an expression
when the result of evaluating the expression is
a result tree fragment: the complete fragment is copied into the result tree
a node-set: all the nodes are copied (with children)
Copying parts without transforming
sometimes a part should be passed as such, without any transformation
assume: copyright contains some HTML formatting:
<xsl:template match="copyright">
  <xsl:copy-of select="*"/>
</xsl:template>
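A small sketch of the difference between xsl:copy-of and xsl:value-of, assuming a variable named note (a made-up name) whose value is a result tree fragment containing markup:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- The value is a result tree fragment: a b element followed by text. -->
  <xsl:variable name="note"><b>Note:</b> draft</xsl:variable>

  <xsl:template match="/">
    <div>
      <!-- Copies the fragment with its markup: <p><b>Note:</b> draft</p> -->
      <p><xsl:copy-of select="$note"/></p>
      <!-- Converts the fragment to a string first: <p>Note: draft</p> -->
      <p><xsl:value-of select="$note"/></p>
    </div>
  </xsl:template>

</xsl:stylesheet>
```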
Computing generated text
Within a template, the xsl:value-of element can be used to compute generated text
e.g. by extracting text from the source tree or by inserting the value of a variable
the xsl:value-of element is instantiated to create a text node in the result tree
the required select attribute is an expression
the expression is evaluated and the resulting object is converted to a string
Example
Assume: a person element with given-name and family-name attributes
create an HTML paragraph containing the value of the given-name attribute, a space, and the value of the family-name attribute (of the current node)
<xsl:template match="person">
  <p>
    <xsl:value-of select="@given-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="@family-name"/>
  </p>
</xsl:template>
Examples
<xsl:value-of select="."/>
output the string-value of the current node
<xsl:value-of select="title"/>
output the string-value of the first child title element of the current node
<xsl:value-of select="sum(@*)"/>
output the sum of the values of the attributes of the current node, converted to a string
<xsl:value-of select="$x"/>
output the value of the variable $x, converted to a string
Attribute value templates
In an attribute value that is interpreted as an attribute value template, such as an attribute of a literal result element, an expression can be used by surrounding the expression with curly braces ({})
Example
<xsl:variable name="img-dir">/images</xsl:variable>
<xsl:template match="photograph">
  <img src="{$img-dir}/{href}" width="{size/@width}"/>
</xsl:template>
XML document:
<photograph>
  <href>headquarters.jpg</href>
  <size width="300"/>
</photograph>
result: <img src="/images/headquarters.jpg" width="300"/>
Numbering
The xsl:number element is used to insert a formatted number into the result tree
the number to be inserted may be specified by an expression
the value attribute contains an expression
the expression is evaluated and the resulting object is converted to a number
the number is rounded to an integer and converted to a string
if no value attribute is specified, the number based on the position of the current node in the source tree is inserted
Example: numbering a sorted list
<xsl:template match="items">
  <xsl:for-each select="item">
    <xsl:sort select="."/>
    <p>
      <xsl:number value="position()" format="1. "/>
      <xsl:value-of select="."/>
    </p>
  </xsl:for-each>
</xsl:template>
Numbering by position
The xsl:number element has the following attributes
level: specifies what levels of the source tree should be considered; has values single, multiple, or any
count: is a pattern that specifies what nodes should be counted at those levels
if not specified, it defaults to the pattern that matches any node with the same node type as the current node, and if the current node has a name, with the same name as the current node
from: is a pattern that specifies where counting starts
Example
Assume: a document contains a sequence of chapters followed by a sequence of appendixes
both chapters and appendixes contain sections, which in turn contain subsections
the following rules would number title elements
numbering:
chapters: 1, 2, 3, …
appendixes: A, B, C, …
sections in chapters: 1.1, 1.2, 1.3, …
sections in appendixes: A.1, A.2, A.3, …
Example
<xsl:template match="title">
  <fo:block>
    <xsl:number level="multiple" count="chapter|section|subsection" format="1.1 "/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
Example
<xsl:template match="appendix//title" priority="1">
  <fo:block>
    <xsl:number level="multiple" count="appendix|section|subsection" format="A.1 "/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
Number to string conversion attributes
format: tokens with separators
the default value is 1
format tokens and the sequences they generate:
1 (any token where the last character has a decimal digit value of 1): 1 2 3 …
01: 01 02 03 … 09 10 11 …
A: A B C … Z AA AB AC …
a: a b c … z aa ab ac …
i: i ii iii iv v vi …
I: I II III IV V VI …
separators: e.g. A.1
if there are more numbers than format tokens, the last format token is used to format the remaining numbers
Number to string conversion attributes
grouping-separator: the grouping (e.g. thousands) separator in decimal numbering sequences
grouping-size: the size (normally 3) of the grouping
e.g. grouping-separator="," and grouping-size="3" produce numbers of the form 1,000,000
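The conversion attributes can be tried in isolation by giving xsl:number a literal value. This is a minimal sketch. The numbers are arbitrary and the text output method is used only to keep the result easy to read:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Decimal with grouping: 1,000,000 -->
    <xsl:number value="1000000" grouping-separator="," grouping-size="3"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Zero-padded decimal: 07 -->
    <xsl:number value="7" format="01"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Upper-case roman numerals: IV -->
    <xsl:number value="4" format="I"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Lower-case alphabetic, continuing after z: aa -->
    <xsl:number value="27" format="a"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```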
Repetition
When the result has a known regular structure, it is useful to be able to specify directly the template for selected nodes
the xsl:for-each instruction contains a template, which is instantiated for each node selected by the expression specified by the select attribute
the expression must evaluate to a node-set
the template is instantiated with the selected node as the current node, and with the list of all of the selected nodes as the current node list
Example: XML document
<customers>
  <customer>
    <name>…</name>
    <order>…</order>
    <order>…</order>
  </customer>
  <customer>
    <name>…</name>
    <order>…</order>
    <order>…</order>
  </customer>
</customers>
Create HTML document containing a table with a row for each customer element
<xsl:template match="/">
  <html>
    <head><title>Customers</title></head>
    <body>
      <table><tbody>
        <xsl:for-each select="customers/customer">
          <tr>
            <th><xsl:apply-templates select="name"/></th>
            <xsl:for-each select="order">
              <td><xsl:apply-templates/></td>
            </xsl:for-each>
          </tr>
        </xsl:for-each>
      </tbody></table>
    </body>
  </html>
</xsl:template>
Conditional processing
Two instructions support conditional processing
xsl:if (if-then conditionality)
xsl:choose (choice from several alternatives)
xsl:if has a test attribute, which specifies an expression
example: a comma follows each name unless it is the last in the list
<xsl:template match="namelist/name">
  <xsl:apply-templates/>
  <xsl:if test="not(position()=last())">, </xsl:if>
</xsl:template>
Example
The following colors every other table row yellow:
<xsl:template match="item">
  <tr>
    <xsl:if test="position() mod 2 = 0">
      <xsl:attribute name="bgcolor">yellow</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </tr>
</xsl:template>
Conditional processing
the xsl:choose element selects one among a number of possible alternatives
consists of a sequence of xsl:when elements followed by an optional xsl:otherwise element
each xsl:when element has a single attribute, test, which specifies an expression
each of the xsl:when elements is tested in turn
the content of the first, and only the first, xsl:when element whose test is true is instantiated
if no test is true, the content of xsl:otherwise is instantiated
Example
<xsl:for-each select="chapter">
  <xsl:choose>
    <xsl:when test="@focus='Java'">
      <li><xsl:value-of select="title"/> (Java Focus)</li>
    </xsl:when>
    <xsl:when test="@focus='JavaScript'">
      <li><xsl:value-of select="title"/> (JavaScript Focus)</li>
    </xsl:when>
    <xsl:otherwise>
      <li><xsl:value-of select="title"/> (XML Focus)</li>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>
Sorting
Sorting is specified by adding xsl:sort elements as children of an xsl:apply-templates or xsl:for-each element
the first xsl:sort child specifies the primary sort key, the second xsl:sort child specifies the second sort key, and so on
nodes are sorted according to the sort keys, and then processed in sorted order
xsl:sort has a select attribute
default is . (string-value of the current node as a key)
Sorting
xsl:sort has optional attributes
order: ascending (default) or descending
lang: language of the sort keys
data-type: the data type of the strings
text: sort keys should be sorted lexicographically
number: sort keys should be converted to numbers and then sorted according to the numeric value
other values may be provided later (from XML Schemas)
case-order: upper-first or lower-first
default is language dependent
Example
<employees>
  <employee>
    <name>
      <given>James</given>
      <family>Clark</family>
    </name>
    …
  </employee>
</employees>
Example: list of employees sorted by name
<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/family"/>
      <xsl:sort select="name/given"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>
<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/given"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/family"/>
  </li>
</xsl:template>
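The optional attributes matter most for numeric keys. The sketch below assumes a hypothetical products element with product children that carry a price attribute and a name child. None of these names come from the slides:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="products">
    <ul>
      <xsl:for-each select="product">
        <!-- Primary key: price as a number, highest first.
             With the default data-type="text", "100" would sort before "20". -->
        <xsl:sort select="@price" data-type="number" order="descending"/>
        <!-- Secondary key: name, as text, ascending (the defaults). -->
        <xsl:sort select="name"/>
        <li>
          <xsl:value-of select="name"/>
          <xsl:text>: </xsl:text>
          <xsl:value-of select="@price"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```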
Variables and parameters
A variable is a name that may be bound to a value
the value of the variable can be an object of any of the types that can be returned by expressions
two elements: xsl:variable and xsl:param
xsl:param: the value specified on the xsl:param variable is only a default value for the binding
when the template or the stylesheet within which the xsl:param element occurs is invoked, parameters may be passed that are used instead of the defaults
Variables and parameters
Both xsl:variable and xsl:param have a required name attribute: name of the variable
for any use of xsl:variable and xsl:param, there is a region of the stylesheet tree within which the binding is visible
within this region, any binding of the variable that was visible on the variable-binding element itself is hidden -> only the innermost binding is visible
Values of variables and parameters
A variable-binding element can specify the value of the variable in three alternative ways
if the element has a select attribute:
the value of the select attribute must be an expression
the value of the variable is the object resulting from evaluation of the expression
the content of the variable-binding element has to be empty
if the element does not have a select attribute and the content is non-empty:
the content of the element specifies the value
the content is a template, which is instantiated to give the value
the value is a result tree fragment
otherwise: the value is an empty string
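The three alternatives can be sketched side by side. The variable names and the item element are assumptions made for the example:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <!-- 1. select attribute: the value is the result of the expression
         (here a number); the element content must be empty. -->
    <xsl:variable name="count" select="count(//item)"/>

    <!-- 2. No select, non-empty content: the content is a template and
         the value is a result tree fragment. -->
    <xsl:variable name="label">Items: </xsl:variable>

    <!-- 3. No select, empty content: the value is an empty string. -->
    <xsl:variable name="nothing"/>

    <p>
      <xsl:value-of select="$label"/>
      <xsl:value-of select="$count"/>
      <xsl:value-of select="$nothing"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

For a source document with three item elements the paragraph reads "Items: 3".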
Top-level variables and parameters
Both xsl:variable and xsl:param are allowed as top-level elements
a top-level variable-binding element declares a global variable that is visible everywhere
a top-level xsl:param declares a parameter to the stylesheet
XSLT does not specify how the parameters are passed to the stylesheet
context for expressions for specifying the value: the root node
Variables within templates
Both xsl:variable and xsl:param are allowed in templates
xsl:variable is allowed anywhere that an instruction (xsl:…) is allowed
the binding is visible for all following siblings and their descendants
the binding is not visible for the xsl:variable element itself
xsl:param is allowed at the beginning of an xsl:template element
visibility as with xsl:variable
Passing parameters to templates
Parameters are passed to templates using the xsl:with-param element
the required name attribute specifies the name of the parameter
xsl:with-param is allowed within xsl:call-template and xsl:apply-templates
the value is specified as for xsl:variable and xsl:param
Example
<xsl:template name="numbered-block">
  <xsl:param name="format">1. </xsl:param>
  <fo:block>
    <xsl:number format="{$format}"/>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
<xsl:template match="ol//ol/li">
  <xsl:call-template name="numbered-block">
    <xsl:with-param name="format">a. </xsl:with-param>
  </xsl:call-template>
</xsl:template>
Output
xsl:output element allows stylesheet authors to specify how they wish the result tree to be output
xsl:output is a top-level element
the method attribute identifies the method that should be used for outputting the result tree
the value can be: html, xml, text, or some other name (behavior not specified by XSLT)
Output
Default of the method attribute
the default is html, if
the root node of the result tree has an element child
the name of the first element child is html
any text nodes preceding the first element child contain whitespace characters only
otherwise the default is xml
XML output method
Outputs the result tree as a well-formed external parsed entity
if the root node of the result tree has a single element node child and no text node children, then the entity should be a well-formed XML document entity
attributes (among others):
version: the XML version (default 1.0)
encoding: the preferred character encoding
omit-xml-declaration (yes or no)
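A sketch of an explicit output declaration for the XML method. The report element is a made-up result element, and indent is a further xsl:output attribute defined in XSLT 1.0 that is not covered above:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Top-level declaration: serialize as XML 1.0 in UTF-8,
       keep the XML declaration, and allow indentation. -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
              omit-xml-declaration="no" indent="yes"/>

  <!-- A single element child of the root, so the output is a
       well-formed XML document entity. -->
  <xsl:template match="/">
    <report>
      <xsl:apply-templates/>
    </report>
  </xsl:template>

</xsl:stylesheet>
```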
HTML output method
Outputs the result tree as HTML
attributes:
version: the version of HTML (default is 4.0)
un-prefixed elements are interpreted as HTML, others as XML
empty elements: <br></br> and <br/> -> <br>
HTML names should be recognized regardless of case
Text output method
Outputs the result tree by outputting the string-value of every text node in the result tree in document order without any escaping (= character references are expanded)
Combining stylesheets
Two mechanisms to combine stylesheets
inclusion: allows stylesheets to be combined without changing the semantics of the stylesheets being combined
import: allows stylesheets to override each other
Stylesheet inclusion
An XSLT stylesheet may include another XSLT stylesheet using an xsl:include element
the element has an href attribute whose value is a URI reference identifying the stylesheet to be included
xsl:include is a top-level element
the resource located by the href attribute is parsed as an XML document and the children of the xsl:stylesheet element in this document replace the xsl:include element in the including document
Stylesheet import
An XSLT stylesheet may import another XSLT stylesheet using an xsl:import element
importing a stylesheet is the same as including it, except that definitions and template rules in the importing stylesheet take precedence over template rules and definitions in the imported stylesheet
xsl:import is a top-level element
it has an href attribute
Stylesheet import
The xsl:import elements must precede all the other element children of an xsl:stylesheet element, including any xsl:include elements
when xsl:include is used to include a stylesheet, any xsl:import elements in the included document are moved up in the including document to after any existing xsl:import elements in the including document
The import tree
The xsl:stylesheet elements encountered during processing of a stylesheet that contains xsl:import elements are treated as forming an import tree
in the import tree, each xsl:stylesheet element has one import child for each xsl:import element that it contains
any xsl:include elements are resolved before constructing the import tree
import precedence is defined based on a post-order traversal (before = lower, after = higher)
The import tree
Assume:
stylesheet A imports stylesheets B and C in that order
stylesheet B imports stylesheet D
stylesheet C imports stylesheet E
the order of import precedence (lowest first): D, B, E, C, A
a definition or template rule with higher import precedence takes precedence over a definition or template rule with lower import precedence
Conflict resolution for template rules
It is possible for a source node to match more than one template rule
the template rule to be used is determined as follows
1. All matching template rules that have lower import precedence than the matching rules with the highest import precedence are eliminated from consideration
2. All matching template rules that have lower priority than the matching rules with the highest priority are eliminated from consideration
Conflict resolution for template rules: priority
The priority of a template rule is specified by the priority attribute on the template rule
if the pattern contains multiple alternatives separated by | , then it is treated equivalently to a set of template rules, one for each alternative
explicit priority is specified using a numeric value for the priority attribute
implicit priority assumed based on pattern specificity (see next slide)
The default priority is defined as follows:
-0.5 for patterns comprised of only a wildcard or node type
wildcards: "*" and "@*"
node types: node(), comment(), processing-instruction() or text()
-0.25 for patterns comprised of a namespace prefix and a wildcard
"prefix:*"
0 for patterns comprised of only a node's name
un-prefixed or child:: axis for elements
prefixed with "@" or attribute:: axis for an attribute
with or without a namespace prefix
0.5 for all other patterns
Example
The element figure specifies a figure reference
the element para specifies a paragraph of content
the element margin specifies a marginalia construct
Example
The rendering of figures differs based on context
when found outside a paragraph
when found inside a paragraph
when found inside a paragraph that is marginalia
without priority the following rules would conflict when processing a figure in a paragraph in marginalia
<xsl:template match="margin/para/figure" priority="2">
<xsl:template match="para/figure"> <!-- priority="0.5" -->
<xsl:template match="figure"> <!-- priority="0" -->
<xsl:template match="*"> <!-- priority="-0.5" -->
Overriding template rules
A template rule that is being used to override a template rule in an imported stylesheet can use the xsl:apply-imports element to invoke the overridden template rule
Overriding template rules
the following template rule is contained in doc.xsl:
<xsl:template match="example">
  <pre><xsl:apply-templates/></pre>
</xsl:template>
another stylesheet imports doc.xsl and overrides the rule:
<xsl:import href="doc.xsl"/>
<xsl:template match="example">
  <div style="border: solid red">
    <xsl:apply-imports/>
  </div>
</xsl:template>
effect: <div style="border: solid red"><pre>…</pre></div>
Modes
Modes allow an element to be processed multiple times, each time producing a different result
both xsl:template and xsl:apply-templates have an optional mode attribute
if xsl:template does not have a match attribute, it must not have a mode attribute
if an xsl:apply-templates element has a mode attribute, then it applies only to those template rules from xsl:template elements that have a mode attribute with the same value
an xsl:apply-templates element without a mode attribute applies only to template rules without a mode attribute ("no mode" applies to "no mode")
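A sketch of processing the same elements twice with modes, assuming a hypothetical book document whose chapter elements contain a title and para children. The mode name toc is chosen for the example:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/book">
    <html>
      <body>
        <!-- First pass over the chapters: table of contents. -->
        <ul>
          <xsl:apply-templates select="chapter" mode="toc"/>
        </ul>
        <!-- Second pass over the same chapters: full content (no mode). -->
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>

  <!-- Used only when templates are applied in mode "toc". -->
  <xsl:template match="chapter" mode="toc">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

  <!-- Used only when templates are applied without a mode. -->
  <xsl:template match="chapter">
    <h1><xsl:value-of select="title"/></h1>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```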