XML technologies for text encoding
Tamás Váradi
[email protected]
BTANT129 w4 2
Introduction
• Processing XML files
  – CSS: getting the picture right
  – XPath: finding our way around
  – XSLT: extracting the right info
• Encoding content the right way
  – Text Encoding Initiative
  – TEI Lite
• Tools
Benefits of XML
• makes structure and content clear
• encoding is independent of display and device
• portable, platform independent
• ideal for exchange of data
• with a DTD, validating a document is easy
Limitations of XML
• Verbose annotation increases the size of the files (sometimes hugely)
• Not an efficient format for fast access and retrieval
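To put a number on "verbose annotation", the sketch below compares the size of an XML string with the size of its character content alone. The dictionary entry and the `markup_overhead` function are hypothetical, made up for illustration; real ratios depend entirely on how densely a text is tagged.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary entry: little text, comparatively heavy markup.
SAMPLE = ('<entry><form><orth>cat</orth></form>'
          '<sense n="1"><def>a small domesticated feline</def></sense></entry>')


def markup_overhead(xml_text: str) -> float:
    """Ratio of total encoded size to the size of the character content alone."""
    root = ET.fromstring(xml_text)
    text_only = "".join(root.itertext())  # all text nodes, tags stripped
    if not text_only:
        raise ValueError("document has no character content")
    return len(xml_text.encode("utf-8")) / len(text_only.encode("utf-8"))


if __name__ == "__main__":
    print(f"markup makes the file {markup_overhead(SAMPLE):.1f} times larger than its text")
```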
Displaying XML files?
• Stylesheets
  – consistent design
  – easy to change
  – one stylesheet can serve many XML documents
  – one document can use different stylesheets
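The idea that one document can be shown through different stylesheets can be sketched in Python, with two rendering functions standing in for two stylesheets. This is an analogy, not CSS: the `<letter>` element names and both functions are hypothetical. The point is that the XML source stays the same while only the presentation layer changes.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical letter markup; the element names are invented for illustration.
LETTER = """<letter>
  <sender>Anna</sender>
  <recipient>Béla</recipient>
  <body>See you on Friday.</body>
</letter>"""


def as_plain_text(root: ET.Element) -> str:
    """First 'stylesheet': a one-line plain-text rendering."""
    return (f"From {root.findtext('sender')} to "
            f"{root.findtext('recipient')}: {root.findtext('body')}")


def as_html(root: ET.Element) -> str:
    """Second 'stylesheet': an HTML rendering of the same content."""
    def esc(tag: str) -> str:
        # Escape text so characters like < or & cannot break the HTML.
        return html.escape(root.findtext(tag, default=""))
    return (f"<p>Dear {esc('recipient')},</p>"
            f"<p>{esc('body')}</p>"
            f"<p><em>{esc('sender')}</em></p>")


if __name__ == "__main__":
    letter = ET.fromstring(LETTER)
    print(as_plain_text(letter))
    print(as_html(letter))
```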
Cascading Stylesheets
Elements are associated with display styles:
h1 { font-size: 3em; }
Each rule has a selector (h1), a property (font-size) and a value (3em).
A stylesheet is a collection of style rules.
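To make the selector/property/value anatomy concrete, here is a toy parser for a single rule. It is a simplified sketch that assumes one selector, one declaration block, and no comments or @-rules. The real CSS grammar is much richer, and `parse_rule` is a hypothetical helper, not part of any CSS library.

```python
import re

# Simplified pattern: "selector { declarations }" and nothing else.
RULE = re.compile(r"^\s*([^{]+?)\s*\{([^}]*)\}\s*$")


def parse_rule(rule: str) -> tuple[str, dict[str, str]]:
    """Split one CSS rule into its selector and a property -> value mapping."""
    m = RULE.match(rule)
    if m is None:
        raise ValueError(f"not a simple CSS rule: {rule!r}")
    selector, block = m.group(1), m.group(2)
    declarations = {}
    for decl in block.split(";"):  # declarations are separated by semicolons
        if decl.strip():
            prop, _, value = decl.partition(":")
            declarations[prop.strip()] = value.strip()
    return selector, declarations


if __name__ == "__main__":
    print(parse_rule("h1 { font-size: 3em; }"))
```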
Declaring the stylesheet
<?xml-stylesheet
type = "text/css"
href = "url-of-stylesheet"
?>
For example:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
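The `xml-stylesheet` declaration is a processing instruction, not an element, so it sits outside the document's root. The sketch below uses Python's `xml.dom.minidom`, which keeps document-level processing instructions, to read it back. The regex for the pseudo-attributes is a simplification that assumes plain `name="value"` pairs, and `stylesheet_links` is a hypothetical helper name.

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
<cards><card>Ace</card></cards>"""

# Pseudo-attributes look like attributes: name = "value" or name = 'value'.
PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*(["\'])(.*?)\2')


def stylesheet_links(xml_text: str) -> list[dict[str, str]]:
    """Return the pseudo-attributes of every xml-stylesheet instruction."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:  # nodes at document level, around the root
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            links.append({name: value
                          for name, _quote, value in PSEUDO_ATTR.findall(node.data)})
    return links


if __name__ == "__main__":
    print(stylesheet_links(DOC))
```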
An example
• Load the file letter.xml into Internet Explorer
• Now load the file letter2.xml
• View the source
• Open the file letter.css in Notepad
• Check that what you see corresponds to what is in the CSS file
Cascading stylesheets
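• Many style properties (e.g. font settings and color) are inherited down the XML tree: a child element takes on its parent's value unless a rule overrides it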
• Three levels of applying styles:
  1. External stylesheets
  2. Internal style definitions
  3. Inline style settings
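Inheritance and the three levels can be modelled with a simplified sketch. It assumes that each level maps element names to declarations, that later levels override earlier ones (inline over internal over external), and that only `font-family` and `color` inherit. Real CSS precedence also depends on selector specificity and rule order. The functions `cascade` and `computed_styles` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: only these properties inherit (in CSS, border does not, for example).
INHERITED = {"font-family", "color"}


def cascade(external: dict, internal: dict, inline: dict) -> dict:
    """Merge the three levels; later levels override earlier ones."""
    merged: dict[str, dict[str, str]] = {}
    for level in (external, internal, inline):
        for tag, decls in level.items():
            merged.setdefault(tag, {}).update(decls)
    return merged


def computed_styles(root: ET.Element, rules: dict) -> dict:
    """Walk the tree top-down, passing inheritable properties to children."""
    result = {}

    def visit(elem: ET.Element, parent_style: dict) -> None:
        style = {k: v for k, v in parent_style.items() if k in INHERITED}
        style.update(rules.get(elem.tag, {}))  # the element's own rule wins
        result[elem] = style
        for child in elem:
            visit(child, style)

    visit(root, {})
    return result
```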
Limitations of CSS
• Elements are formatted in their original sequence
• No means to reorder elements
• No means to select a set of elements
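For contrast, here is the kind of processing that CSS cannot express: picking out a subset of elements and emitting them in a new order. This is a Python stand-in for what XSLT and XPath do. The archive markup and `letters_from` are hypothetical, and the sort relies on ISO 8601 dates comparing correctly as strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive: letters stored in arbitrary order.
ARCHIVE = """<archive>
  <letter date="1849-03-15"><from>Anna</from><body>Second</body></letter>
  <letter date="1848-01-02"><from>Béla</from><body>Hello</body></letter>
  <letter date="1847-07-30"><from>Anna</from><body>First</body></letter>
</archive>"""


def letters_from(xml_text: str, sender: str) -> list[str]:
    """Select one sender's letters and return their bodies in date order."""
    root = ET.fromstring(xml_text)
    # Selection: keep only the matching subset of <letter> elements.
    chosen = [el for el in root.findall("letter") if el.findtext("from") == sender]
    # Reordering: YYYY-MM-DD dates sort chronologically as plain strings.
    chosen.sort(key=lambda el: el.get("date", ""))
    return [el.findtext("body") for el in chosen]


if __name__ == "__main__":
    print(letters_from(ARCHIVE, "Anna"))
```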
More advanced techniques
• XSL: Extensible Stylesheet Language
• XSLT: XSL Transformations
• XPath: a standard way to locate elements in the XML hierarchy
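Python's `xml.etree.ElementTree` supports a limited subset of XPath, which is enough to show the main path forms: descendant search, child paths, attribute predicates and positional predicates. Full XPath 1.0 needs a dedicated library such as lxml. The book markup below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup for illustration.
BOOK = ET.fromstring("""<book>
  <chapter n="1"><title>Origins</title><para>One.</para></chapter>
  <chapter n="2"><title>Growth</title><para>Two.</para><para>Three.</para></chapter>
</book>""")

# Descendant axis: every <para>, at any depth below <book>.
all_paras = BOOK.findall(".//para")
# Attribute predicate: the chapter whose n attribute is "2".
chapter_two = BOOK.find("chapter[@n='2']")
# Child path: the <title> of each <chapter>, in document order.
titles = [t.text for t in BOOK.findall("chapter/title")]
# Positional predicate (1-based, as in XPath): the title of the second chapter.
second_title = BOOK.find("chapter[2]/title")

if __name__ == "__main__":
    print(len(all_paras), chapter_two.get("n"), titles, second_title.text)
```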
XSLT
• See the excellent introduction to XSLT by Sebastian Rahtz available here
Standard annotation of content
• XML is an annotation standard
• It is not designed for any particular domain
• We need a standard way of encoding typical text genres such as books, dictionaries, letters, radio news, etc.
• => TEXT ENCODING INITIATIVE (TEI)
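As a preview of what a TEI document looks like, the sketch below builds the minimal structure of a TEI P5 document: a `teiHeader` with `titleStmt`, `publicationStmt` and `sourceDesc`, followed by `text/body`, all in the TEI namespace. The helper names and the placeholder statements ("Unpublished draft.", "Born digital.") are hypothetical. Check the TEI Guidelines for what a particular project actually requires.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title: str, paragraphs: list[str]) -> str:
    """Build a bare-bones TEI document with a header and a body."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    # Placeholder statements; a real project records publication and source here.
    pub = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub, tei("p")).text = "Unpublished draft."
    src = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(src, tei("p")).text = "Born digital."
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in paragraphs:
        ET.SubElement(body, tei("p")).text = para
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_tei("A letter", ["Dear Anna,", "See you soon."]))
```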