1
CEAL Preconference Workshop: XML
Wooseob Jeong, Assistant Professor
School of Information Studies, University of Wisconsin – Milwaukee
March 2, 2004, San Diego, CA
Sponsored by School of Information Studies, University of Wisconsin – Milwaukee
University of California – San Diego Library
2
Why XML?
Simply because it’s already everywhere.
  MS Office
  XHTML - WYSIWYG
  PDF
  RDF - Dublin Core, RSS
  MARC in XML
  E-books
3
What is XML?
Extensible Markup Language
  XML is a concept, not an application.
Meta Language
  Linguistics for individual languages
  XHTML is an application of XML.
Brief history of XML
  SGML – HTML
  Not enough … why?
4
Learning XML
No technical experience needed.
  Having no HTML experience is fine, even welcome.
  HTML vs. XHTML (different families)
  Again, XML is a concept.
Good starting point for XML
  http://www.infomotions.com/musings/getting-started/
5
XML is simple but very strict.
You can define your own markup set as you like, with minimal requirements.
  Every tag must be paired.
  Tags must nest in a hierarchy.
However, once you establish the set, you have to follow it.
  It’s the law! No exceptions.
  Otherwise, your document won’t be displayed at all.
“Well-formedness” – minimum requirement
DTD (Document Type Definition)
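The two well-formedness rules above (paired tags, proper nesting) can be tried out with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample menu strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, not taken from the workshop files.
paired = "<menu><food>Soup</food></menu>"
unpaired = "<menu><food>Soup</menu>"          # <food> is never closed
crossed = "<menu><food>Soup</menu></food>"    # tags overlap instead of nesting

for sample in (paired, unpaired, crossed):
    print(sample, "->", is_well_formed(sample))
```

The parser gives no partial result for the broken samples, which is the "no exceptions" behavior described on this slide.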
6
Philosophy of XML
Separation of presentation information from its content.
  No decorative information is allowed in the content.
Presentation should be rendered by methods outside the document, currently either CSS or XSLT.
  CSS has been used in HTML as well as in XML.
    Ex) http://www.uwm.edu/~dhedberg/MENU.xml
  XSLT is more powerful.
    Ex) http://web.utk.edu/~rgilmou1/xml4lita/
More Examples
7
Markup information
Presentational Markup: Describes Appearance
  <blockquote>1234 N. Oakland Ave. Milwaukee, WI 53201</blockquote>
Semantic Markup: Indicates Meaning
  <address>
    <street>1234 N. Oakland Ave.</street>
    <city>Milwaukee</city>
    <state>WI</state>
    <zip>53201</zip>
  </address>
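One practical payoff of semantic markup is that a program can pick out each part of the address by name, which the blockquote version does not allow. A small sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The semantic version of the address from this slide.
semantic = (
    "<address>"
    "<street>1234 N. Oakland Ave.</street>"
    "<city>Milwaukee</city>"
    "<state>WI</state>"
    "<zip>53201</zip>"
    "</address>"
)

address = ET.fromstring(semantic)

# Each child element names its own meaning, so a lookup table falls out directly.
fields = {child.tag: child.text for child in address}

print(fields["city"])
print(fields["zip"])
```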
8
Your First XML Document
Using NotePad, please follow the instructions at http://supervoca.com/xml/first.htm
The result should look like http://supervoca.com/xml/first.xml
9
Restaurant Menu Exercise
Well-formedness
CSS (Cascading Style Sheets)
  Simple but not flexible
XSLT (Extensible Stylesheet Language Transformations)
  It is an XML document itself.
  Complex but really powerful
Online Exercises
10
Menu CSS Exercise
Use NotePad and type it yourself, please!
  Watch out for the “Save As” option.
Modify “menu.xml” with your favorite foods, adding CSS info.
Modify “menu.css” with your preferences.
Comprehensive CSS reference
  http://www.w3schools.com/css/default.asp
11
Menu XSLT Exercise
Modify “menu2.xml” by adding XSLT info.
Modify “menu2.xsl” with your preferences.
  It is like a limited programming language.
  Selective displays with the same data.
    Examples
You may use HTML tags freely, but every attribute’s value must be quoted.
Watch out for typos!
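The idea of "selective displays with the same data" can be previewed outside XSLT. The sketch below is a Python stand-in, not XSLT itself: it uses the standard xml.etree.ElementTree module, and the menu content, the vegetarian attribute, and both view functions are hypothetical. An XSLT stylesheet would express the same selection with templates and match patterns.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu data; the workshop's menu2.xml is not reproduced here.
menu = ET.fromstring(
    "<menu>"
    '<food vegetarian="yes"><name>Garden Salad</name><price>6.00</price></food>'
    '<food vegetarian="no"><name>Beef Stew</name><price>11.50</price></food>'
    '<food vegetarian="yes"><name>Vegetable Soup</name><price>5.25</price></food>'
    "</menu>"
)


def full_view(root):
    """Every dish with its price."""
    return [f"{f.findtext('name')} ({f.findtext('price')})" for f in root.findall("food")]


def vegetarian_view(root):
    """Only the dishes flagged vegetarian, names only."""
    return [f.findtext("name") for f in root.findall("food[@vegetarian='yes']")]


print(full_view(menu))
print(vegetarian_view(menu))
```

The data is written once; each view decides what to show, which is the role the stylesheet plays in the exercise.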
12
Unicode in XML
Unicode is the default character set in XML.
What’s Unicode?
  http://unicode.org/
  Why is it so important?
  Where is ASCII?
  Multilingual vs. Multiscript
WordPad or MS Word should be used for Unicode documents.
  Save as “Unicode Text”
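To make "Where is ASCII?" and "multiscript" concrete, the sketch below counts characters and UTF-8 bytes for one word in each of four scripts. It assumes Python 3.7 or later (for str.isascii); the describe helper is made up for illustration, and UTF-8 is only one of the Unicode encodings an XML document may use.

```python
def describe(text: str) -> dict:
    """Character count, UTF-8 byte count, and whether the text is plain ASCII."""
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "is_ascii": text.isascii(),
    }


# One word per script.
samples = {
    "English": "hunger",
    "French": "extrême",
    "Chinese": "饥饿",
    "Russian": "голода",
}

report = {language: describe(word) for language, word in samples.items()}

for language, facts in report.items():
    print(language, facts)
```

ASCII text is encoded in UTF-8 with one byte per character, so ASCII sits inside Unicode rather than beside it; the other scripts need two or three bytes per character here.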
13
“united.xml”
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="united.css"?>
<united>
  <English>Eradicate extreme poverty and hunger</English>
  <Chinese>消灭极端贫穷和饥饿</Chinese>
  <French>Réduire l'extrême pauvreté et la faim</French>
  <Russian>Ликвидация крайней нищеты и голода</Russian>
</united>
14
“united.css”
English {display: block; color: red}
French {display: block; color: blue}
Chinese {display: block; color: green}
Russian {display: block; color: purple}
15
More Unicode Exercise
Multilingual/multiscript sources
  United Nations
  International Bible Society
Since an XSLT file is an XML document, you can use any language or any script in your XSLT.
Only Windows 2000 or XP supports Unicode fully.
CSS – “bible.xml”
XSLT – “biblecjk.xml”
16
SMIL Exercise (1)
Synchronized Multimedia Integration Language
  Still an XML application!
  Multiple media are played together.
  Example: Closed Captioning.
RealText Exercise
  Based on Real Player setting
17
SMIL Exercise (2)
Locate an audio source.
  Ex) Voice of America at http://voanews.com
Locate its transcript.
Modify “example.smil” file according to your information.
Modify “example.rt” file with your transcript.
It can be a “Karaoke” application.
18
SMIL Exercise (3)
Online Exercise
  http://supervocab.com/xml
Locate any CJK real audio file on the web, and copy the URL to the form.
  Ex) http://homepage.third-wave.com/didreat/kor/real.htm
Type the script in CJK.
Choose a character set and a font.
SMIL in Real Audio still supports local character sets only.
19
Document Type Definition
What is DTD?
  The master plan that dictates all the rules for elements, attributes, and entities.
  You may make your own DTD, but once you make it, you must follow its rules. No exceptions!
Why is DTD important?
  Data Exchange
20
Elements, Attributes, and Entities
Elements
  Building blocks of markup (tags)
Attributes
  Qualifying Elements (properties)
Entities
  Referencing External Content and Saving Typing
  Ex) special characters
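The three building blocks can be seen side by side in one small document. A sketch assuming Python's standard xml.etree.ElementTree module; the dish element and its attributes are hypothetical. It uses only the predefined entity &amp;amp; and a numeric character reference, not entities declared in a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document:
#   element   -> dish
#   attribute -> price, spicy
#   entity    -> &amp; (predefined) and &#169; (character reference for the copyright sign)
doc = '<dish price="8.50" spicy="yes">Fish &amp; chips &#169; 2004</dish>'

dish = ET.fromstring(doc)

element_name = dish.tag       # the element's name
attributes = dish.attrib      # the attributes that qualify the element
content = dish.text           # entity references already replaced by the parser

print(element_name, attributes, content)
```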
21
DTDs
XHTML
TEI (Text Encoding Initiative)
EAD (Encoded Archival Description)
RDF (Resource Description Framework)
  Dublin Core; RSS
22
Validation
To count as a given type of document, a document must be valid against that type’s DTD.
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1.1//EN" "http://www.tei-c.org/Lite/DTD/teixlite.dtd">
Online validation tool
Well-formedness vs. Validation
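The difference between the two checks can be sketched in a few lines. Python's standard xml.etree.ElementTree module checks well-formedness only and does not validate against a DTD, so the ALLOWED_CHILDREN table below is a hand-written, hypothetical stand-in for a DTD's content rules, not a real validator. The menu vocabulary is also made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element name maps to the child names it may contain.
ALLOWED_CHILDREN = {
    "menu": {"food"},
    "food": {"name", "price"},
    "name": set(),
    "price": set(),
}


def check(text: str) -> str:
    """Classify a document as not well-formed, well-formed only, or valid under the toy rules."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    for parent in root.iter():
        if parent.tag not in ALLOWED_CHILDREN:
            return "well-formed but not valid"
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                return "well-formed but not valid"
    return "well-formed and valid"


valid_doc = "<menu><food><name>Soup</name><price>5.25</price></food></menu>"
invalid_doc = "<menu><drink><name>Tea</name></drink></menu>"   # <drink> is not in the rules
broken_doc = "<menu><food><name>Soup</food></menu>"            # <name> is never closed

for doc in (valid_doc, invalid_doc, broken_doc):
    print(check(doc))
```

For real work, a validating parser or the online validation tool mentioned on this slide should do the checking against the actual DTD.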
23
TEI Letter Transcript Exercise
The purpose of this exercise is to make a valid TEI document transcribing a letter.
Use a remote TEI DTD
  TEIXLite DTD
  <!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1.1//EN" "http://www.tei-c.org/Lite/DTD/teixlite.dtd">
Modify “letter.xml” and “letter.css” with your text and preferences.
Do a validation test, please.
24
Dublin Core
Most Frequent Example in RDF
  http://dublincore.org/
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN" "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2002-07-31</dc:date>
  </rdf:Description>
</rdf:RDF>
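Because each Dublin Core field is a named element, the record above can be read back as a simple lookup table. A sketch assuming Python's standard xml.etree.ElementTree module; the DOCTYPE line is left out of the sample so that nothing depends on fetching the remote DTD, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS, "dc": "http://purl.org/dc/elements/1.1/"}

# The record from this slide, without its DOCTYPE declaration.
record = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2002-07-31</dc:date>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(record)
description = root.find("rdf:Description", NS)

# The resource being described is named by the rdf:about attribute.
about = description.get(f"{{{RDF_NS}}}about")

# ElementTree reports tags as "{namespace-uri}local-name"; keep only the local name.
metadata = {child.tag.split("}")[1]: child.text for child in description}

print(about)
print(metadata)
```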
25
RSS (RDF Site Summary)
“Rich Site Summary”
More practical and active example in RDF
  http://supervocab.com/rss
There are many RSS feeds on the web.
  http://mtgear.net/index.rdf
  http://homepage.mac.com/cyberdog_to_go/iblog/B1549800066/rss.xml
  http://blog.isism.net/b2rdf.php
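An RDF-based feed is read the same way as any other XML document. The sketch below, assuming Python's standard xml.etree.ElementTree module, lists the headlines of a small RSS 1.0 feed. The feed content and the example.org addresses are hypothetical; only the two namespace URIs are the real RDF and RSS 1.0 ones. The feeds listed on this slide may use other RSS versions with a different layout.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical feed; a real one would be downloaded from its URL.
feed = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news">
    <title>Example Library News</title>
    <link>http://example.org/news</link>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>New XML workshop announced</title>
    <link>http://example.org/news/1</link>
  </item>
  <item rdf:about="http://example.org/news/2">
    <title>Reading room hours extended</title>
    <link>http://example.org/news/2</link>
  </item>
</rdf:RDF>"""

root = ET.fromstring(feed)

channel_title = root.findtext("rss:channel/rss:title", namespaces=NS)

# One (title, link) pair per item, in document order.
headlines = [
    (item.findtext("rss:title", namespaces=NS), item.findtext("rss:link", namespaces=NS))
    for item in root.findall("rss:item", NS)
]

print(channel_title)
for title, link in headlines:
    print(title, link)
```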
26
Other important parts in XML (1)
XSL
  XSL Transformations
  XSL Formatting Objects
    Ex) PDF
URI (Uniform Resource Identifiers)
  URL (Uniform Resource Locator)
  ISBN/ISSN
27
Other important parts in XML (2)
XLink
  More than what HTML links do
  Ex) inbound link information, behavior of links (when, how to activate)
XPointer
  More than what HTML anchors do
  XPointers refer to particular parts of or locations in XML documents.
  Ex) linking to the third sentence of the seventeenth paragraph in a document
28
Other important parts in XML (3)
Namespace
  An XML namespace is a collection of names, identified by a URI reference
  Problem: same element names
  Ex) title in HTML and title of a book
Schema
  Alternative to DTD
  Data type
Popular E-book Formats
  Adobe: basically PDF
  Microsoft
  Palm
Free E-book Projects
  http://etext.lib.virginia.edu/ebooks/ebooklist.html
  http://www.sois.uwm.edu/xml/
Authoring tools
Universal CJK support cannot be found yet.
30
Conclusion
XML is a concept.
There are many XML applications.
XML should separate its presentation information from its contents.
XML’s default character set is Unicode.
XML should be “well-formed” at least.
DTD/Schema is very important for data/information interchange.