XML for Information Management – Day 1 Airi Salminen XML for Information Management University of Erlangen-Nuremberg Computational Linguistics Instructor: Professor Airi Salminen http://users.jyu.fi/~airi/ 12.1.-16.1. 2009

Post on 21-Dec-2015

XML for Information Management – Day 1
Airi Salminen

XML for Information Management

University of Erlangen-Nuremberg
Computational Linguistics

Instructor: Professor Airi Salminen
http://users.jyu.fi/~airi/

12.1.-16.1. 2009

Outline

Day 1: Course introduction, XML examples and concepts

1. Course introduction
2. XML examples
3. XML concepts

1. Course introduction: Instructor

‣ Home university: University of Jyväskylä in Finland, Faculty of Information Technology

‣ Home page: http://users.jyu.fi/~airi/

‣ Experience Jyväskylä: http://www3.jkl.fi/international/experience/index.html

1. Course introduction: Instructor

‣ My research areas: structured documents, content management in organizations, document standardization, semantic web, information retrieval

‣ My XML-related research has concerned:
• modelling structured text
• querying structured text
• SGML/XML standardization

1. Course introduction: Instructor

Tague, J., Salminen, A., & McClellan, C. (1991). Complete formal model for information retrieval systems. In Proc. of the 14th ACM SIGIR Conference, 14-20. New York: ACM Press.

Salminen, A., & Watters, C. (1992). A two-level structure for textual databases to support hypertext access. Journal of the American Society for Information Science 43 (6), 432-447.

Salminen, A., & Tompa, F. (1993). PAT expressions: an algebra for text search. Acta Linguistica Hungarica 41 (1-4), 277-306. http://www.cs.jyu.fi/~airi/papers/COMPLEX-1992.pdf

Salminen, A., Tague-Sutcliffe, J., & McClellan, C. (1995). From text to hypertext by indexing. ACM Transactions on Information Systems 13 (1), 69-99.

Salminen, A., Lehtovaara, M., & Kauppinen, K. (1996). Standardization of digital legislative documents - a case study. In Proceedings of the Twenty-Ninth Hawaii International Conference on System Sciences (pp. 72-81). Los Alamitos, CA: IEEE Computer Society Press.

Kuikka, E., & Salminen, A. (1997). Two-dimensional filters for structured text. Information Processing and Management 33 (1), 37-54.

1. Course introduction: Instructor

Salminen, A., Kauppinen, K., & Lehtovaara, M. (1997). Towards a methodology for document analysis. Journal of the American Society for Information Science 48 (7), Special Issue on Structured Information/Standards for Document Architectures, 644-655.

Salminen, A., & Tompa, F. (1999). Grammars++ for modelling information in text. Information Systems 24 (1), 1-24.

Salminen, A., Tiitinen, P., & Lyytikäinen, V. (1999). Usability evaluation of a structured document archive. In Proc. of the Thirty-Second Hawaii International Conference on System Sciences. Los Alamitos, CA: IEEE Computer Society Press.

Lyytikäinen, V., Tiitinen, P., & Salminen, A. (2001). XML metadata for accessing heterogeneous legal databases. In Proc. of the XML Europe 2001 Conference. http://www.gca.org/papers/xmleurope2001/papers/html/s27-4.html

Salminen, A., & Tompa, F.W. (2001). Requirements for XML document database systems. In Proc. of the ACM Symposium on Document Engineering (DocEng '01), 85-94. New York: ACM Press.

Salminen, A., Lyytikäinen, V., Tiitinen, P., & Mustajärvi, O. (2001). Experiences of SGML standardization: The case of the Finnish legislative documents. In Proc. of the Thirty-Fourth Hawaii International Conference on System Sciences. Los Alamitos, CA: IEEE Computer Society Press.

1. Course introduction: Instructor

Salminen, A. (2003). Document analysis methods. Encyclopedia of Library and Information Science, Second Edition, Revised and Expanded (pp. 916-927). New York: Marcel Dekker.

Korhonen, R., & Salminen, A. (2003). Visualization of EDI messages: Facing the problems in the use of XML. In Proc. of the Fifth International Conference on Electronic Commerce, 466-473. New York: ACM Press.

Salminen, A., Lyytikäinen, V., Tiitinen, P., & Mustajärvi, O. (2004). Implementing digital government in the Finnish Parliament. In Digital Government: Strategies and Implementation (pp. 242-259). Hershey, PA: Idea Group Publishing.

Salminen, A. (2005). Building digital government by XML. In Proc. of the Thirty-Eighth Hawaii International Conference on System Sciences. Los Alamitos, CA: IEEE Computer Society Press.

Salminen, A., Nurmeksela, R., Lehtinen, A., Lyytikäinen, V., & Mustajärvi, O. (2006). Content production strategies for e-Government. In Encyclopedia of Digital Government, Vol. I (pp. 224-230). Hershey, PA: Idea Group Publishing.

Nurmeksela, R., Jauhiainen, E., Salminen, A., & Honkaranta, A. (2007). XML document implementation: Experiences from three cases. In Proceedings of the Second International Conference on Digital Information Management (pp. 224-229). Los Alamitos, CA: IEEE.

1. Course introduction: Instructor

XML-related projects

‣ RASKE (1994-1998): Developing Standards for Structured Documents

‣ inSGML (1998-2001): Methods for SGML standardization in industry

‣ EULEGIS (1998-2000): European User Views to Legislative Information in Structured Form

‣ AirXML (2002-2004): XML and Data Warehousing in Air Defence

‣ RASKE2 (2003-2006): Methods for the Integration of Systems and Services in e-Government

1. Course introduction

‣ Syllabus: http://users.jyu.fi/~airi/opetus/xml/erlangen/

‣ Course Readings: available on the course web site

‣ Project Assignment: http://users.jyu.fi/~airi/opetus/xml/erlangen/project.html

‣ Contact by email: [email protected]

1. Course introduction: project

‣ Purpose
• The projects are intended to explore the application of XML in various contexts. Students interested in practical XML exercises are free to suggest a practical project where they can test some XML software and/or build an application of their own.
• The project can also be an investigation of an existing or planned XML solution in an organizational context together with an analysis of the impacts of the solution.

‣ Topics: Proposed by students

‣ Teams of two, or individual projects

‣ The phases
• 2 page topic proposal: due on Feb. 20
• Project report: due on March 31

2. XML examples

• separation of the primary content and markup

• markup is metadata adding some information to the primary content

<?xml version="1.0"?>
<poem author="Murasaki Shikibu" author_born="974">
  <stanza>
    <line>This life of ours would not cause you sorrow</line>
    <line>if you thought of it as like</line>
    <line>the mountain cherry blossoms</line>
    <line>which bloom and fade in a day.</line>
  </stanza>
</poem>

Note: The text of the line elements is taken from http://www.bopsecrets.org/rexroth/translations/japanese.htm, containing Kenneth Rexroth's translations of Japanese poetry.
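The separation of primary content and markup can be made concrete with a small sketch. It assumes Python's standard `xml.etree.ElementTree` module as the XML processor; the variable names are illustrative only and are not part of the course material.

```python
import xml.etree.ElementTree as ET

# The poem document from the slide, written with straight quotes.
POEM = """<?xml version="1.0"?>
<poem author="Murasaki Shikibu" author_born="974">
  <stanza>
    <line>This life of ours would not cause you sorrow</line>
    <line>if you thought of it as like</line>
    <line>the mountain cherry blossoms</line>
    <line>which bloom and fade in a day.</line>
  </stanza>
</poem>"""

root = ET.fromstring(POEM)

# Primary content: the character data inside the line elements.
primary_content = [line.text for line in root.iter("line")]

# Metadata carried by the markup: the attributes of the poem element.
metadata = dict(root.attrib)

print("\n".join(primary_content))
print(metadata)
```

The poem text and the information about its author end up in two separate values, even though both come from the same document.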

2. XML examples

External presentation for human perception can be defined in a separate stylesheet. With a proper stylesheet the previous XML document might look like:

This life of ours would not cause you sorrow
if you thought of it as like
the mountain cherry blossoms
which bloom and fade in a day.

Examples of the attachment of stylesheets: try searching Google for "xml examples".

2. XML examples

A piece of prose in the TEI Guidelines:

http://www.tei-c.org/Guidelines/Customization/Lite/U5-eg.html

3. XML concepts

XML = Extensible Markup Language

T. Bray, J. Paoli, & C. M. Sperberg-McQueen (Eds.), Extensible Markup Language (XML) 1.0, W3C Recommendation 10 February 1998, http://www.w3.org/TR/1998/REC-xml-19980210/

T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, & F. Yergeau (Eds.), Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008, http://www.w3.org/TR/2008/REC-xml-20081126/

T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, & J. Cowan (Eds.), Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation 16 August 2006, http://www.w3.org/TR/2006/REC-xml11-20060816/

A set of rules for defining and representing information as structured documents for applications on the Internet. XML is a restricted form of the older markup language called SGML.

XML Development History: http://www.w3.org/XML/hist2002

3. XML concepts: XML processor

Processing XML documents

[Figure: XML Document → XML Processor → Application]

3. XML concepts: physical and logical structure

An XML processor recognizes two structures in a document:
• physical structure, consisting of entities
• logical structure, where elements are the core composites

3. XML concepts: entity

Entity:
• file (text or some other kind of data)
• named piece of text

3. XML concepts: entity

Example of an entity structure

[Figure: entity structure with a root entity, the entities part 1 and part 2, and the image files figure1.jpg, figure2.jpg and figure3.jpg; legend: entity, entity reference]

3. XML concepts: entity

Entity as a named piece of text, like in HTML:

name    value   reference
auml    ä       &auml;
ouml    ö       &ouml;

Y&ouml; Jyv&auml;skyl&auml;ss&auml;  →  Yö Jyväskylässä
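A minimal sketch of how an XML processor replaces entity references with their values. Unlike HTML, XML does not predefine auml or ouml, so the sketch declares them in an internal DTD subset; that declaration and the element name `phrase` are assumptions added for illustration. Python's standard `xml.etree.ElementTree` is used as the processor.

```python
import xml.etree.ElementTree as ET

# The two entities are declared in the internal subset; &#228; and &#246;
# are the character references for ä and ö.
DOC = """<!DOCTYPE phrase [
  <!ENTITY auml "&#228;">
  <!ENTITY ouml "&#246;">
]>
<phrase>Y&ouml; Jyv&auml;skyl&auml;ss&auml;</phrase>"""

phrase = ET.fromstring(DOC)

# The application never sees the references, only the replacement text.
expanded = phrase.text
print(expanded)
```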

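3. XML concepts: element

An element is marked up by a begin-tag (called a start-tag in the XML specification) and an end-tag.

<year>1654</year>

begin-tag: <year>
content: 1654
end-tag: </year>

The begin-tag, the content and the end-tag together form the element.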

3. XML concepts: element

Example 1: a document of seven elements

<?xml version="1.0"?>
<rhymecollection>
  <rhyme>
    <line>Ole aina iloinen</line>
    <line>niin kuin pikku varpunen</line>
  </rhyme>
  <rhyme>
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>
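The element count of Example 1 can be checked with a short sketch, assuming Python's standard `xml.etree.ElementTree` as the XML processor. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

EXAMPLE_1 = """<?xml version="1.0"?>
<rhymecollection>
  <rhyme>
    <line>Ole aina iloinen</line>
    <line>niin kuin pikku varpunen</line>
  </rhyme>
  <rhyme>
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>"""

root = ET.fromstring(EXAMPLE_1)

# root.iter() visits the root element and all of its descendants.
counts = Counter(element.tag for element in root.iter())
total = sum(counts.values())

print(counts)
print(total)
```

One rhymecollection, two rhyme and four line elements give the seven elements named on the slide.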

3. XML concepts: tree structure

Example 1 as an element tree

rhymecollection (root element)
  rhyme
    line
    line
  rhyme
    line
    line

There is always one root element.

Every non-root element is a child element of a parent element.
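The two tree properties can be checked on Example 1 with a sketch like the following, again assuming Python's `xml.etree.ElementTree`. The helper `outline` and the dictionary `parent_of` are hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

EXAMPLE_1 = (
    "<rhymecollection>"
    "<rhyme><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>"
    "<rhyme><line>See, see! What shall I see?</line>"
    "<line>A horse's head where his tail should be</line></rhyme>"
    "</rhymecollection>"
)

root = ET.fromstring(EXAMPLE_1)

# Map every child element to its parent; the root is the only element
# that never appears as a key.
parent_of = {child: parent for parent in root.iter() for child in parent}
non_root = [element for element in root.iter() if element is not root]


def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```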

3. XML concepts: attribute

Extra information can be attached to elements by attributes.

An attribute has:
• name
• value (character string)

<lastname earlier="Rantanen">Korhonen</lastname>

name: earlier
value: Rantanen

Two predefined attributes: xml:lang and xml:space.
• xml:lang for identifying the language of the content of an element
• xml:space for signaling that white space should be preserved by the application
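A small sketch of reading attributes with Python's `xml.etree.ElementTree`. The wrapping `person` element and the `xml:lang="fi"` value are assumptions added for the illustration. ElementTree reports `xml:lang` under the name of the reserved XML namespace, which needs no declaration in the document.

```python
import xml.etree.ElementTree as ET

# Identifier of the namespace bound to the reserved prefix "xml".
XML_NS = "http://www.w3.org/XML/1998/namespace"

DOC = '<person><lastname earlier="Rantanen" xml:lang="fi">Korhonen</lastname></person>'

lastname = ET.fromstring(DOC).find("lastname")

earlier = lastname.get("earlier")           # ordinary attribute
lang = lastname.get(f"{{{XML_NS}}}lang")    # predefined attribute xml:lang
content = lastname.text                     # element content

print(earlier, lang, content)
```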

3. XML concepts: elements and attributes

Data in XML elements:
• as element content
• as attribute value

3. XML concepts: elements and attributes

Three alternative ways of giving two last names for a person:

1. <lastname earlier="Rantanen">Korhonen</lastname>

2. <lastname><earlier>Rantanen</earlier><now>Korhonen</now></lastname>

3. <lastname earlier="Rantanen" now="Korhonen"></lastname>

What is the difference?

3. XML concepts: elements and attributes

In the logical structure:
• Child elements of a parent element are ordered.
• The writing order of attributes in an element is insignificant.

3. XML concepts: elements and attributes

Different structures:

<lastname><earlier>Rantanen</earlier><now>Korhonen</now></lastname>
(1. child element: earlier, 2. child element: now)

<lastname><now>Korhonen</now><earlier>Rantanen</earlier></lastname>
(1. child element: now, 2. child element: earlier)

3. XML concepts: elements and attributes

Equivalent solutions:

<lastname earlier="Rantanen" now="Korhonen"></lastname>

<lastname now="Korhonen" earlier="Rantanen"></lastname>
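The difference between ordered child elements and unordered attributes can be observed with a sketch like this one, assuming Python's `xml.etree.ElementTree`. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Attributes written in two different orders.
attrs_a = ET.fromstring('<lastname earlier="Rantanen" now="Korhonen"></lastname>')
attrs_b = ET.fromstring('<lastname now="Korhonen" earlier="Rantanen"></lastname>')

# Attributes are exposed as a name-to-value mapping, so writing order
# plays no role in the comparison.
same_attributes = attrs_a.attrib == attrs_b.attrib

# Child elements written in two different orders.
kids_a = ET.fromstring(
    "<lastname><earlier>Rantanen</earlier><now>Korhonen</now></lastname>")
kids_b = ET.fromstring(
    "<lastname><now>Korhonen</now><earlier>Rantanen</earlier></lastname>")

# Children form a sequence, so their order is part of the structure.
order_a = [child.tag for child in kids_a]
order_b = [child.tag for child in kids_b]
same_children = order_a == order_b

print(same_attributes, same_children)
```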

3. XML concepts: Unicode

XML documents are encoded in Unicode, which is intended for content written in any natural language of the world.

The development work is done by the Unicode Consortium.

The latest version (at the time of the course, January 2009): Unicode 5.1.0

3. XML concepts: DTD

XML is a metalanguage intended to define languages for special application areas.

Document Type Definition (DTD) is the mechanism to define languages.

3. XML concepts: DTD

DTD:

<!DOCTYPE rhymecollection [
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ELEMENT line (#PCDATA)>
]>

Example 1 meets the constraints defined in the DTD.
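What the content models in this DTD require can be sketched by hand. The code below is not a validating XML processor and does not read the DTD; it transcribes the four element declarations into regular expressions over child-element names (an assumption made for illustration) and checks documents parsed with Python's `xml.etree.ElementTree`. The names `CONTENT_MODELS` and `conforms` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Each content model is written as a regular expression over the
# sequence of child tags, with every tag followed by one space.
CONTENT_MODELS = {
    "rhymecollection": r"(title )?(rhyme )+",  # (title?, rhyme+)
    "rhyme": r"(line )+",                      # (line+)
    "title": r"",                              # (#PCDATA): no child elements
    "line": r"",                               # (#PCDATA): no child elements
}


def conforms(element):
    """Check the element and all of its descendants against the models."""
    model = CONTENT_MODELS.get(element.tag)
    child_tags = "".join(child.tag + " " for child in element)
    if model is None or re.fullmatch(model, child_tags) is None:
        return False
    return all(conforms(child) for child in element)


example_1 = ET.fromstring(
    "<rhymecollection>"
    "<rhyme><line>Ole aina iloinen</line><line>niin kuin pikku varpunen</line></rhyme>"
    "<rhyme><line>See, see! What shall I see?</line>"
    "<line>A horse's head where his tail should be</line></rhyme>"
    "</rhymecollection>"
)
title_last = ET.fromstring(
    "<rhymecollection><rhyme><line>x</line></rhyme><title>t</title></rhymecollection>"
)
no_rhyme = ET.fromstring("<rhymecollection><title>t</title></rhymecollection>")

print(conforms(example_1), conforms(title_last), conforms(no_rhyme))
```

Example 1 passes because the optional title is absent and at least one rhyme is present; the other two documents break the order or the occurrence constraint of the first declaration.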

3. XML concepts: DTD

Attributes added

<!DOCTYPE rhymecollection [
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ATTLIST rhyme
  xml:lang NMTOKEN #REQUIRED
  author CDATA #IMPLIED >
<!ELEMENT line (#PCDATA)>
]>

3. XML concepts: DTD

A DTD can be attached to a document
• as an internal subset
• as an external subset
• by combining internal and external markup declarations

The DTD consists of all markup declarations together.

3. XML concepts: DTD

Internal DTD

<?xml version="1.0" ?>
<!DOCTYPE rhymecollection [
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ATTLIST rhyme
  xml:lang NMTOKEN #REQUIRED
  author CDATA #IMPLIED >
<!ELEMENT line (#PCDATA)>
]>
<rhymecollection>
  <rhyme>
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>

3. XML concepts: DTD

The system identifier "myrhyme.dtd" gives the address of the external DTD.

<?xml version="1.0"?>
<!DOCTYPE rhymecollection SYSTEM "myrhyme.dtd">
<rhymecollection>
  <rhyme>
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>

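3. XML concepts: DTD

Markup declarations in "myrhyme.dtd". The file starts with a text declaration and contains only the declarations themselves, without a surrounding <!DOCTYPE ... [ ]> wrapper. A text declaration must include an encoding declaration.

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ATTLIST rhyme
  xml:lang NMTOKEN #REQUIRED
  author CDATA #IMPLIED >
<!ELEMENT line (#PCDATA)>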

3. XML concepts: DTD

DTD is just one definition mechanism available for constraining XML data. The most important other mechanisms:
• XML Schema
• RELAX NG

The term schema (or XML schema) can refer to a definition written by any definition mechanism developed for XML data. The languages for defining schemas are called schema languages.

3. XML concepts: XML application

An XML application is an XML-based language, (usually) defined by some schema language.

Examples of XML applications:
• XHTML: http://www.w3.org/TR/xhtml1/
• RSS (Really Simple Syndication): http://blogs.law.harvard.edu/tech/rss
• TEI (Text Encoding Initiative): http://www.tei-c.org/index.xml
• ebXML (Electronic Business using XML): http://www.ebxml.org/

3. XML concepts: XML application

XML – SGML – HTML – XHTML

• XML is a subset of SGML
• HTML is an SGML application
• XHTML is an XML application

3. XML concepts: well-formed and valid

Two kinds of constraints in the XML specification:
• well-formedness constraints: all XML documents have to meet them; documents that do are called well-formed
• validity constraints: documents that are associated with a DTD and meet these constraints (including the constraints expressed in the DTD) are called valid

3. XML concepts: well-formed and valid

A requirement for well-formed documents: each child element has to be contained in the parent element.

<date><day>24<month>1</day></month><year>2005</year></date>

NOT well-formed
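A sketch of how a processor reacts to the overlapping elements above, assuming Python's `xml.etree.ElementTree`, which raises `ParseError` for input that is not well-formed. The helper name `is_well_formed` and the corrected variant of the date are illustrative additions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the processor can parse the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The example from the slide: month starts inside day but ends outside it.
overlapping = "<date><day>24<month>1</day></month><year>2005</year></date>"

# A properly nested variant of the same data.
nested = "<date><day>24</day><month>1</month><year>2005</year></date>"

print(is_well_formed(overlapping), is_well_formed(nested))
```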

3. XML concepts: well-formed and valid

<?xml version="1.0" ?>
<!DOCTYPE rhymecollection [
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ATTLIST rhyme
  xml:lang NMTOKEN #REQUIRED
  author CDATA #IMPLIED >
<!ELEMENT line (#PCDATA)>
]>
<rhymecollection>
  <rhyme xml:lang="fi">
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>

VALID, even though the attribute value is not correct (the rhyme is in English, not in Finnish)

3. XML concepts: well-formed and valid

<?xml version="1.0" ?>
<!DOCTYPE rhymecollection [
<!ELEMENT rhymecollection (title?, rhyme+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT rhyme (line+)>
<!ATTLIST rhyme
  xml:lang NMTOKEN #REQUIRED
  author CDATA #IMPLIED >
<!ELEMENT line (#PCDATA)>
]>
<rhymecollection>
  <rhyme>
    <line>See, see! What shall I see?</line>
    <line>A horse's head where his tail should be</line>
  </rhyme>
</rhymecollection>

NOT valid: the required attribute xml:lang is missing from the rhyme element
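The #REQUIRED constraint can be sketched as a hand-written check. Python's `xml.etree.ElementTree` is a non-validating processor, so it accepts both documents as well-formed; the table `REQUIRED` and the function `missing_required` are hypothetical helpers that transcribe the ATTLIST declaration instead of reading the DTD.

```python
import xml.etree.ElementTree as ET

# Name under which ElementTree reports the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand transcription of: <!ATTLIST rhyme xml:lang NMTOKEN #REQUIRED ...>
REQUIRED = {"rhyme": [XML_LANG]}


def missing_required(root):
    """List (element name, attribute name) pairs that violate #REQUIRED."""
    return [
        (element.tag, name)
        for element in root.iter()
        for name in REQUIRED.get(element.tag, [])
        if name not in element.attrib
    ]


with_lang = ET.fromstring(
    '<rhymecollection><rhyme xml:lang="fi">'
    "<line>See, see! What shall I see?</line></rhyme></rhymecollection>"
)
without_lang = ET.fromstring(
    "<rhymecollection><rhyme>"
    "<line>See, see! What shall I see?</line></rhyme></rhymecollection>"
)

print(missing_required(with_lang))
print(missing_required(without_lang))
```

The check only sees that an attribute is present. Like a DTD, it cannot tell that "fi" is the wrong language for an English rhyme.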

3. XML concepts: Namespaces

There is often a need to use elements and attributes originating from different environments (or applications).

Vocabularies in two environments may include common names intended for different purposes.

If multiple declarations are used in a single DTD, name collisions must be avoided.

3. XML concepts: Namespaces

XML namespaces

Provide a method for qualifying element and attribute names so that name collisions can be avoided

Motivation: modularity and documentation

If a well-understood markup vocabulary for element and attribute names exists, it should be re-used rather than re-invented, especially if there is also software available.

http://www.w3c.org/TR/REC-xml-names

3. XML concepts: Namespaces

XML namespace
• Collection of names, identified by a URI
• No formal rules for defining names in a namespace

URI (Uniform Resource Identifier)
• URL (Uniform Resource Locator) or
• URN (Uniform Resource Name)
• Generic Syntax, RFC 3986: http://www.ietf.org/rfc/rfc3986.txt

In XML Names 1.1, URI has been replaced by IRI (Internationalized Resource Identifier), RFC 3987: http://www.rfc-editor.org/rfc/rfc3987.txt

3. XML concepts: Namespaces

Example

Namespace: http://uwaterloo.ca
Element names: department, name, professor, student, last_name, first_name, ...
Global attribute names: id, ...
Per-element-type attribute names: student: supervisor, ...

3. XML concepts: Namespaces

Namespace declaration: defines a label (prefix) for the namespace and associates it with the namespace identifier (URI)

Qualified name: a namespace prefix and a local part, separated by a colon

<?xml version="1.0"?>
<report xmlns:uw="http://uwaterloo.ca">
  <uw:department>
    <uw:name>Department of Computer Science</uw:name>
    ...
</report>
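A sketch of how a namespace-aware processor treats qualified names, assuming Python's `xml.etree.ElementTree`. The document is a completed variant of the example above: the elided part is left out and the closing `</uw:department>` tag is added, both assumptions made so that the sketch can be parsed.

```python
import xml.etree.ElementTree as ET

REPORT = """<?xml version="1.0"?>
<report xmlns:uw="http://uwaterloo.ca">
  <uw:department>
    <uw:name>Department of Computer Science</uw:name>
  </uw:department>
</report>"""

root = ET.fromstring(REPORT)

# The prefix is only a label: searches bind a prefix to the namespace URI.
name = root.find("uw:department/uw:name", {"uw": "http://uwaterloo.ca"})

# Any other prefix bound to the same URI finds the same element.
same = root.find("x:department/x:name", {"x": "http://uwaterloo.ca"})

# ElementTree stores the expanded name: {namespace URI}local part.
expanded_tag = name.tag

print(root.tag)      # report is not in any namespace
print(expanded_tag)
print(name.text)
```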

3. XML concepts: Namespaces

Prefix xml is reserved for W3C development work and its identifier is http://www.w3.org/XML/1998/namespace.

The namespace can be declared in a document but it can be used without declaration.

Prefix xmlns is used only for declaring namespaces. It cannot be used as a name of a namespace.

Open source software for experimentation:

http://www.w3.org/Status