Page 1: XML technologies for text encoding Tamás Váradi varadi@nytud.hu

XML technologies for text encoding

Tamás Váradi
varadi@nytud.hu

BTANT129 w4 2

Introduction

• Processing XML files
  – CSS: getting the picture right
  – XPath: finding our way around
  – XSLT: extracting the right info
• Encoding content the right way
  – Text Encoding Initiative
  – TEI Lite

• Tools

Benefits of XML

• makes structure and content clear
• encoding independent of display and device
• portable, platform independent
• ideal for exchange of data
• with a DTD, validation of the document is easy

Limitations of XML

• Verbose annotation increases the size of the files (sometimes hugely)

• Not very efficient format for fast access and recall

Displaying XML files?

• Style sheets
  – consistent design
  – easy to change
  – one stylesheet can serve many XML documents
  – one document can use different stylesheets

Cascading Stylesheets

h1 { font-size: 3em; }

Elements are associated with display styles

selector: h1, property: font-size, value: 3em

A stylesheet is a collection of style rules
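
The anatomy of a rule (selector, property, value) can be made concrete with a short Python sketch. This is only an illustration: `parse_rule` is a hypothetical helper written for this slide, not part of any CSS library, and it handles only one simple rule without nested braces or comments.

```python
import re

# Hypothetical helper, not a real CSS parser: it accepts one simple rule
# of the form "selector { property: value; ... }".
RULE = re.compile(r"^\s*([^{}]+?)\s*\{([^{}]*)\}\s*$")


def parse_rule(rule_text):
    """Split a simple CSS rule into its selector and its declarations."""
    match = RULE.match(rule_text)
    if match is None:
        raise ValueError(f"not a simple CSS rule: {rule_text!r}")
    selector, body = match.group(1), match.group(2)
    declarations = {}
    for declaration in body.split(";"):
        if not declaration.strip():
            continue  # skip the empty piece after the final semicolon
        prop, _, value = declaration.partition(":")
        declarations[prop.strip()] = value.strip()
    return selector, declarations


selector, declarations = parse_rule("h1 { font-size: 3em; }")
print(selector, declarations)
```

Note that the selector is followed directly by the braces; the colon belongs between property and value only.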

Declaring the stylesheet

<?xml-stylesheet
type = "text/css"
href = "url-of-stylesheet"
?>

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
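
To see how a program might discover which stylesheet a document declares, here is a minimal Python sketch using the standard library's `xml.dom.minidom`. The `<cards>` content and the function name `stylesheet_instructions` are assumptions made up for the example; only `cards.css` comes from the slide.

```python
from xml.dom import minidom

# The declaration must start at the very first character, with no space
# between "<?" and the target name.
DOCUMENT = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
<cards><card>Hello</card></cards>
"""


def stylesheet_instructions(xml_text):
    """Return the data of every top-level xml-stylesheet instruction."""
    document = minidom.parseString(xml_text)
    return [
        node.data
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


print(stylesheet_instructions(DOCUMENT))
```

The instruction is an ordinary processing instruction placed before the root element, so any XML parser that exposes processing instructions can read it.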

An example

• Load the file letter.xml into Internet Explorer
• Now load the file letter2.xml
• View source
• Open the file letter.css in Notepad
• Check that what you see corresponds to what is in the CSS file

Cascading stylesheets

• Features are inherited down the XML tree
• Three levels of applying styles:
  1. External stylesheets
  2. Internal style definitions
  3. Inline style settings
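
Inheritance down the tree can be sketched in a few lines of Python. This is a deliberate simplification: the document, the `STYLES` table and the function `inherited_style` are invented for illustration, and every property is treated as inherited, which real CSS does not do for all properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical document and style table, invented for illustration.
LETTER = ET.fromstring(
    "<letter><body><para>Dear <name>Anna</name></para></body></letter>"
)
STYLES = {
    "letter": {"font-family": "serif", "color": "black"},
    "name": {"color": "blue"},
}


def inherited_style(root, target, styles):
    """Merge the styles on the path from root to target, so that values
    set nearer to the target override the inherited ones."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    path = [target]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    resolved = {}
    for element in reversed(path):  # root first, target last
        resolved.update(styles.get(element.tag, {}))
    return resolved


name = LETTER.find(".//name")
print(inherited_style(LETTER, name, STYLES))
```

Here `<name>` keeps the font set on `<letter>` but overrides the colour with its own value.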

Limitations of CSS

• Elements are formatted in their original sequence
• No means to reorder elements
• No means to select a set of elements
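
What reordering and selecting mean in practice can be shown with a small Python sketch. It does not use CSS or XSLT; it only illustrates the kind of operation that needs a transformation step rather than a stylesheet rule. The `<cards>` data and the function `select_and_sort` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, invented for illustration.
CARDS = ET.fromstring(
    "<cards>"
    "<card><name>Zoltán</name><city>Pécs</city></card>"
    "<card><name>Anna</name><city>Budapest</city></card>"
    "<card><name>Márta</name><city>Budapest</city></card>"
    "</cards>"
)


def select_and_sort(cards, city):
    """Build a new <cards> tree holding only the cards for one city,
    ordered by name rather than by their position in the source."""
    result = ET.Element("cards")
    chosen = [card for card in cards if card.findtext("city") == city]
    result.extend(sorted(chosen, key=lambda card: card.findtext("name")))
    return result


budapest = select_and_sort(CARDS, "Budapest")
print([card.findtext("name") for card in budapest])
```

The source document is left unchanged; the selection and the new order exist only in the output tree.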

More advanced techniques

• XSL: Extensible Stylesheet Language
• XSLT: XSL Transformations
• XPath: a standard way to find elements in the XML hierarchy
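
A few XPath expressions can be tried out with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The letter document below is invented for illustration; a full XPath processor accepts many more expressions than the three shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
LETTER = ET.fromstring("""
<letter>
  <sender><name>Anna</name></sender>
  <recipient><name>Béla</name></recipient>
  <body><para>First.</para><para>Second.</para></body>
</letter>
""")

# Every <name> element anywhere below the root.
names = [element.text for element in LETTER.findall(".//name")]
# Only the <name> that is a child of <recipient>.
recipient = LETTER.find("./recipient/name").text
# The second <para> child of <body> (positions count from 1).
second = LETTER.find("./body/para[2]").text

print(names, recipient, second)
```

Each expression is a path through the element hierarchy, much like a file path through a directory tree.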

XSLT

• See the excellent introduction to XSLT by Sebastian Rahtz available here

Standard annotation of content

• XML is an annotation standard
• it is not designed for any particular domain
• Need for a standard way of encoding typical text genres like books, dictionaries, letters, radio news, etc.
• => TEXT ENCODING INITIATIVE (TEI)
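
As a rough idea of what TEI-encoded text looks like, the Python sketch below parses a very small document and pulls out its title and paragraphs. It assumes the TEI P5 conventions (root element `TEI` in the namespace `http://www.tei-c.org/ns/1.0`); the slides may have had an earlier TEI version in mind, and the letter content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents (an assumption about the TEI version).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: a header describing the file, then the text itself.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear Anna,</p>
      <p>Thank you for your letter.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)
title = root.find(
    "./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
).text
paragraphs = [p.text for p in root.findall("./tei:text/tei:body/tei:p", TEI_NS)]

print(title, paragraphs)
```

Because every TEI document shares this header-plus-text layout, the same paths work across books, letters and other genres encoded this way.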