Introduction to XML
Based on tutorials of B. Cormia, D. Suciu, H. Boley, S. Decker, M. Sintek, E. R. Harold and others
[Figure: Integration & Interoperability, with Tools, Web Services, and Data (XML)]
A format that describes the content of a Web document, rather than the way to display it, is among the basic needs of intelligent Web applications.
Introduction
XML is a text-based markup language that is fast becoming the standard for data interchange on the Web. Like HTML, XML uses tags. But unlike HTML, XML tags identify the data rather than specifying how to display it.
Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <message>...</message>).
HTML vs. XML
HTML:
<h1> Bibliography </h1>
<p> <i> Foundations of DBs </i>, Abiteboul, Hull, Vianu <br> Addison-Wesley, 1995
<p> <i> Logics for DBs and ISs </i>, Chomicki, Saake, eds. <br> Kluwer, 1998
XML:
<bibliography>
  <book>
    <title> Foundations of DBs </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison-Wesley </publisher>
    .....
  </book>
  <book> ... <editor> Chomicki </editor> ... </book>
  ...
</bibliography>
HTML tags: presentation, generic document structure
XML tags: content, "semantic", (DTD-) specific
External Presentations from XML
XML Markup:
<address>
  <name>Xaver M. Linde</name>
  <street>Wikingerufer 7</street>
  <town>10555 Berlin</town>
</address>
External Presentations:
Xaver M. Linde
Wikingerufer 7
10555 Berlin

Xaver M. Linde
Wikingerufer 7
10555 Berlin
. . .
XML stylesheets are, e.g., usable to generate different presentations
XML to XML Transformations
XML Markup 1:
<address>
  <name>Xaver M. Linde</name>
  <street>Wikingerufer 7</street>
  <town>10555 Berlin</town>
</address>
XML Markup 2:
<address>
  <name>Xaver M. Linde</name>
  <place>
    <street>Wikingerufer 7</street>
    <town>10555 Berlin</town>
  </place>
</address>
XML stylesheets are also usable to transform XML representations
XML Queries
XML Markup:
<address>
  <name>Xaver M. Linde</name>
  <street>Wikingerufer 7</street>
  <town>10555 Berlin</town>
</address>
XML Query (XML-QL):
WHERE <address>
        <name>Xaver M. Linde</name>
        <street>$s</street>
        <town>$t</town>
      </address>
CONSTRUCT <binding>
            <s>$s</s>
            <t>$t</t>
          </binding>
Query result:
<binding>
  <s>Wikingerufer 7</s>
  <t>10555 Berlin</t>
</binding>
XML queries can select subelements of XML elements
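As a rough illustration of what the XML-QL query above does, the same selection of subelements can be sketched in Python with the standard-library xml.etree.ElementTree module. This is not XML-QL itself; the helper name build_binding is hypothetical and only mirrors the WHERE/CONSTRUCT pattern of the slide.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """
<address>
  <name>Xaver M. Linde</name>
  <street>Wikingerufer 7</street>
  <town>10555 Berlin</town>
</address>
"""


def build_binding(address_xml, name):
    """Hypothetical helper: match an <address> by name, then build <binding>."""
    address = ET.fromstring(address_xml)
    # WHERE part: the pattern only matches when <name> has the given text.
    if address.findtext("name") != name:
        return None
    # CONSTRUCT part: copy the bound subelement texts into a new element.
    binding = ET.Element("binding")
    ET.SubElement(binding, "s").text = address.findtext("street")
    ET.SubElement(binding, "t").text = address.findtext("town")
    return binding


binding = build_binding(ADDRESS_XML, "Xaver M. Linde")
print(ET.tostring(binding, encoding="unicode"))
```

When the name does not match, the sketch returns None, which corresponds to the query producing no binding.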
PART_OF and HAS_PART Example
kitchen PART_OF flat
flat HAS_PART kitchen
XML
<FLAT> kitchen </FLAT>
Role of an Object
Role of kitchen in flat: to be place for making food in
XML
<FLAT>
  <PLACE_FOR_MAKING_FOOD> kitchen </PLACE_FOR_MAKING_FOOD>
</FLAT>
Multi-Roles Object
Roles of Vagan in University of Kharkov:
to be Head of Department
to be a lecturer
to be Head of Research Lab.
to be Member of Council
to be Head of Exchange Programs
XML (1)
<UNIVERSITY_OF_KHARKOV>
  <MEMBER_OF_COUNCIL> Vagan </MEMBER_OF_COUNCIL>
  <HEAD_OF_EXCHANGE_PROGRAM> Vagan </HEAD_OF_EXCHANGE_PROGRAM>
  <HEAD_OF_DEPARTMENT> Vagan </HEAD_OF_DEPARTMENT>
  <HEAD_OF_RESEARCH_LAB> Vagan </HEAD_OF_RESEARCH_LAB>
  <LECTURER> Vagan </LECTURER>
</UNIVERSITY_OF_KHARKOV>
XML (2)
<VAGAN>
  <ROLES_IN_UNIVERSITY_OF_KHARKOV>
    <ROLE_1> Member of Council </ROLE_1>
    <ROLE_2> Head of Exchange Program </ROLE_2>
    <ROLE_3> Head of Department </ROLE_3>
    <ROLE_4> Head of Research Lab </ROLE_4>
    <ROLE_5> Lecturer </ROLE_5>
  </ROLES_IN_UNIVERSITY_OF_KHARKOV>
</VAGAN>
XML (3)
<UNIVERSITY_OF_KHARKOV>
  <ROLE type="MEMBER OF COUNCIL"> Vagan </ROLE>
  <ROLE type="HEAD OF EXCHANGE PROGRAM"> Vagan </ROLE>
  <ROLE type="HEAD OF DEPARTMENT"> Vagan </ROLE>
  <ROLE type="HEAD OF RESEARCH LAB"> Vagan </ROLE>
  <ROLE type="LECTURER"> Vagan </ROLE>
</UNIVERSITY_OF_KHARKOV>
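One practical difference between the three encodings is how easily a program can list the roles. A minimal sketch, assuming the attribute-based form XML (3) with underscores in the element name (XML names cannot contain spaces) and using Python's standard xml.etree.ElementTree; the function name roles_of is made up for this illustration.

```python
import xml.etree.ElementTree as ET

ROLES_XML = """
<UNIVERSITY_OF_KHARKOV>
  <ROLE type="MEMBER OF COUNCIL"> Vagan </ROLE>
  <ROLE type="HEAD OF EXCHANGE PROGRAM"> Vagan </ROLE>
  <ROLE type="HEAD OF DEPARTMENT"> Vagan </ROLE>
  <ROLE type="HEAD OF RESEARCH LAB"> Vagan </ROLE>
  <ROLE type="LECTURER"> Vagan </ROLE>
</UNIVERSITY_OF_KHARKOV>
"""


def roles_of(person, xml_text):
    """Hypothetical helper: collect the 'type' of every ROLE held by person."""
    root = ET.fromstring(xml_text)
    return [
        role.get("type")
        for role in root.iter("ROLE")
        if (role.text or "").strip() == person
    ]


roles = roles_of("Vagan", ROLES_XML)
print(roles)
```

Because every role uses the same element name, one loop covers all of them; with XML (1) the program would need to know each role's element name in advance.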
Multi-Contextual Role of Object
Vagan in University of Kharkov: to be Head of AI Department
Vagan in University of Jyvaskyla: to be a lecturer
XML (1)
<VAGAN>
  <ROLE_IN_UNIVERSITY_OF_KHARKOV>
    Head of Department
  </ROLE_IN_UNIVERSITY_OF_KHARKOV>
  <ROLE_IN_UNIVERSITY_OF_JYVASKYLA>
    Lecturer
  </ROLE_IN_UNIVERSITY_OF_JYVASKYLA>
</VAGAN>
XML (2)
<VAGAN>
  <ROLE place="UNIVERSITY OF KHARKOV"> Head of Department </ROLE>
  <ROLE place="UNIVERSITY OF JYVASKYLA"> Lecturer </ROLE>
</VAGAN>
Multilevel Context Roles
Vagan
Ukraine: ... citizen
Kharkov University: ... employer
AI Department: ... Head
XML
<COUNTRY>
  <NAME> Ukraine </NAME>
  <LEADING_UNIVERSITY>
    <NAME> Kharkov University </NAME>
    <BEST_DEPARTMENT>
      <NAME> AI Department </NAME>
      <HEAD>
        <NAME> Vagan </NAME>
      </HEAD>
    </BEST_DEPARTMENT>
  </LEADING_UNIVERSITY>
</COUNTRY>
Not enough
Contents
XML Specification
Document Type Definitions
Cascading Style Sheets
Querying XML
XML Specification
Elements, Attributes, and Values
XML uses the same building blocks as HTML: elements, attributes, and values
Elements contain attributes
Attributes contain values
Values are contained in quotation marks (" ")
Simple XML Element (no attributes)
<position>professor</position>
opening tag: <position>
closing tag: </position>
name of the element: position
content of the element: professor
Simple XML Element
<position>professor</position>
is equivalent to
<position>professor
</position>
is different from
<diagnosis>professor
</diagnosis>
XML Element with Attribute
<position place="university">professor</position>
opening tag: <position place="university">
closing tag: </position>
name of the element: position
attribute of the element: place
value of the attribute: university
content of the element: professor
XML Element with Two Attributes
<position place="university" type="teaching"> professor </position>
is similar but not equivalent to
<position>
  <name>professor</name>
  <place>university</place>
  <type>teaching</type>
</position>
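To see why the two forms are similar but not equivalent, it can help to look at what a parser reports for each. The sketch below uses Python's standard xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Form 1: information carried in attributes.
with_attributes = ET.fromstring(
    '<position place="university" type="teaching">professor</position>'
)

# Form 2: the same information carried in child elements.
with_children = ET.fromstring(
    "<position><name>professor</name>"
    "<place>university</place><type>teaching</type></position>"
)

# Attributes form an unordered name/value mapping on the element itself.
print(with_attributes.attrib)
print(len(with_attributes))          # number of child elements

# Child elements form an ordered list, and the attribute mapping is empty.
print(with_children.attrib)
print([child.tag for child in with_children])
```

A program written against one form will not find the data in the other, which is why the choice between attributes and child elements is normally fixed in the DTD.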
Do Not Forget to Put Quotation Marks
... place="university" ...
quotation marks are obligatory around the value of an attribute
Nominal vs. Numerical Attributes
<price currency="Euro"> 49.90 </price>
<constant value="3.14"> </constant>
Empty Element
<position/>
is equivalent to
<position></position>
name of the element: position
opening and closing tags are merged together
Empty Element with Attribute
<picture location="/images/blueball.gif"/>
is equivalent to
<picture location="/images/blueball.gif"></picture>
Tags Must be Nested Correctly
Correct:
<department> <head> vagan </head> </department>
Incorrect:
<department> <head> vagan </department> </head>
Case Matters
<department> Artificial Intelligence </department>
is not the same as
<Department> Artificial Intelligence </Department>
Incorrect (opening and closing tags differ in case):
<Department> Artificial Intelligence </department>
A Root Element is Required
Correct:
<CS_Faculty>
  <department> Artificial Intelligence </department>
  <department> Information Systems </department>
</CS_Faculty>
Incorrect (no single root element):
<department> Artificial Intelligence </department>
<department> Information Systems </department>
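The three rules above (correct nesting, case-sensitive tag names, a single root element) are well-formedness rules, so any XML parser will reject documents that break them. A minimal sketch with Python's standard xml.etree.ElementTree; the helper name is_well_formed is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


examples = {
    "correct nesting": "<department><head>vagan</head></department>",
    "crossed tags": "<department><head>vagan</department></head>",
    "case mismatch": "<Department>Artificial Intelligence</department>",
    "no root element": (
        "<department>Artificial Intelligence</department>"
        "<department>Information Systems</department>"
    ),
    "with root element": (
        "<CS_Faculty><department>Artificial Intelligence</department>"
        "<department>Information Systems</department></CS_Faculty>"
    ),
}

for label, text in examples.items():
    print(f"{label}: {is_well_formed(text)}")
```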
Writing five special symbols
To write the five special symbols:
Type &amp; for ampersand (&)
Type &lt; for the less than sign (<)
Type &gt; for the greater than sign (>)
Type &quot; to create a double quote (")
Type &apos; to create an apostrophe (')
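When XML is produced by a program, these replacements are usually done by a library call rather than by hand. A small sketch using escape and unescape from Python's standard xml.sax.saxutils module; by default escape handles only &, < and >, so the two quote characters are passed in explicitly. The sample string is invented for the example.

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}

raw = 'AT&T says "x < y" isn\'t "x > y"'

escaped = escape(raw, QUOTE_ENTITIES)
print(escaped)

# Unescaping restores the original text.
restored = unescape(escaped, QUOTE_CHARACTERS)
print(restored)
```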
Declaring the XML Version
At the very beginning of the document type: <?xml
Then type: version="1.0"
Type: ?>
<?xml version="1.0" ?>
Declaring the XML Version: Obligatory and Optional Attributes
<?xml version="number" [encoding="encoding"] [standalone="yes|no"] ?>
version is obligatory
encoding and standalone are optional
Encoding Attribute
<?xml version="1.0" encoding="US-ASCII" ?>
Encoding Attribute Values
US-ASCII
US-ASCII is a 7-bit encoding scheme that covers the English-language alphabet.
UTF-8
UTF-8 is a variable-length encoding scheme built from 8-bit units. Characters from the English-language alphabet are each encoded using a single 8-bit byte. Characters for other languages are encoded using 2, 3 or even 4 bytes. UTF-8 therefore produces compact documents for the English language, but larger documents for languages whose characters need 3 or 4 bytes each.
UTF-16
UTF-16 is an encoding scheme built from 16-bit units. Most characters in common use, including the common ideograms of languages like Chinese, are encoded using 2 bytes; the remaining characters are encoded using 4 bytes (a pair of 16-bit units). An English-language document that uses UTF-16 will be twice as large as the same document encoded using UTF-8. Documents written in languages whose characters take 3 bytes in UTF-8, however, will be smaller using UTF-16.
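The size trade-off between UTF-8 and UTF-16 can be checked directly by encoding a few strings. The sketch below is plain Python; the sample words are chosen for this illustration (each means "professor") and are not from the original slides. The "utf-16-le" codec is used so that no byte-order mark is added to the count.

```python
samples = {
    "English": "professor",
    "Greek": "καθηγητής",
    "Chinese": "教授",
}


def encoded_sizes(text):
    """Return the number of bytes the text occupies in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


for language, word in samples.items():
    print(language, len(word), "characters:", encoded_sizes(word))
```

For the English word UTF-16 is twice the size of UTF-8, for the Greek word the two are equal, and for the Chinese word UTF-16 is the smaller one.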
Standalone Attribute
<?xml version="1.0" standalone="no" ?>
An outside DTD is needed to correctly interpret the XML document
<?xml version="1.0" standalone="yes" ?>
An outside DTD is not needed
A DTD (Document Type Definition) is a file which describes the elements and attributes that may appear in the XML document and is used to check its syntactical structure
The optional standalone attribute in the XML declaration specifies whether an outside DTD is required to parse the document. The value must be "yes" or "no".
Writing comments
To write comments:
Type <!--
Write the desired comments
Type -->
<!-- This is a comment -->
Namespaces
Namespaces are a recent addition to the XML specification. The use of namespaces is not mandatory in XML, but it's often wise.
Namespaces were created to ensure uniqueness among XML elements.
<CS_Faculty xmlns='http://www.academic.com'>
...
</CS_Faculty>
Namespaces
Attribute xmlns: XML namespace
Value: namespace identifier (URL)
Element: area of validity of the namespace
Namespace Prefix
<stock xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
  ...
</stock>
edi: namespace prefix
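A parser resolves the prefix to the namespace identifier, so programs match on the identifier rather than on the prefix text. A short sketch with Python's standard xml.etree.ElementTree, which writes a qualified name as {identifier}localname; the elided "..." of the slide is left out and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

STOCK_XML = """
<stock xmlns:edi='http://ecommerce.org/schema'>
  <edi:price units='Euro'>32.18</edi:price>
</stock>
"""

root = ET.fromstring(STOCK_XML)

# Map a prefix of our own choosing to the namespace identifier.
namespaces = {"edi": "http://ecommerce.org/schema"}
price = root.find("edi:price", namespaces)

print(price.tag)                 # qualified name as seen by the parser
print(price.get("units"), price.text)

# Looking up the bare local name does not match the namespaced element.
print(root.find("price"))
```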
Document Type Definitions
Document type definitions
A DTD specifies how elements inside an XML document should relate to each other
It also provides grammar rules for the document and each of the elements
A document that fits the XML specifications and the rules outlined by its DTD is considered to be "valid"
(Not to be confused with a well-formed document, which adheres to XML syntax rules)
Declaring DTD in XML Document
<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">
CS_Faculty: root element in XML file
SYSTEM: keyword; denotes that the DTD resides in a separate local file
"faculty.dtd": file with DTD
Declaring DTD in XML Document
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">
<!-- Here begins the XML data -->
<CS_Faculty>
  <department> Artificial Intelligence </department>
  <department> Information Systems </department>
</CS_Faculty>
Declaring an internal DTD
At the top of the XML document, after the XML declaration, type: <!DOCTYPE root [
where root corresponds to the name of the root element in the document that the DTD will be applied to.
Type: ]> to complete the DTD.
Example code
<?xml version="1.0" ?>
<!DOCTYPE CS_Faculty [
]>
Leave room between [ and ] for document type definitions.
Declaring a personal external DTD
In the XML declaration at the top of the document, add standalone="no"
Type <!DOCTYPE root (name of root element)
Type SYSTEM to indicate that the external DTD is a personal, non-standardized DTD
Type "file.dtd", where "file.dtd" is the DTD file
Type > to complete the document type declaration
Thus use a declaration like:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE CS_Faculty SYSTEM "faculty.dtd">
Writing a personal external DTD
Create a new text file faculty.dtd with a text editor
Define the rules for the DTD (document type definitions for defining elements and attributes, and entities and notations)
Save the file as text only with the .dtd extension
Naming an external DTD
Type:
+ if DTD is approved by a standards body
- if DTD is not a recognized standard
Type: //Owner//DTD where owner identifies who wrote or maintains the DTD
Type a space followed by a label for the DTD, then //XX// where XX defines the language
Naming an external DTD (example)
- //Vagan Terziyan//DTD Faculties//EN//
Vagan Terziyan is the owner
Faculties is the DTD description
EN means the DTD is written in English
Declaring a public external DTD
In the XML declaration at the top of the document, add standalone="no"
Type <!DOCTYPE root (name of root element)
Type PUBLIC to indicate that the external DTD is a standardized set of rules
Type "DTD_name" where DTD_name is the official name of the DTD you are referencing
Type "file.dtd", where "file.dtd" is the DTD file
Type > to complete the document type declaration
Example code
<?xml version="1.0" standalone="no"?>
<!DOCTYPE CS_Faculty PUBLIC "- //Vagan Terziyan//DTD Faculties//EN//" "http://www.ac.com/XML/examples/faculty.dtd">
Defining elements and attributes in a DTD
Type the <!ELEMENT tag
Type name of the element
Type EMPTY if no contents
Or specify contents
Or type ANY to allow any combination of elements or text
Type > to complete the element declaration
Defining an element to contain only text
Type: <!ELEMENT
Type: name of the element
Next type: (#PCDATA)
Finally type: >
<!ELEMENT faculty (#PCDATA)>
Defining an element to contain one child
Type: <!ELEMENT
Type: name of the element
Next type: (child of the element)
Finally type: >
<!ELEMENT faculty (department)>
Defining an element to contain a sequence
Type: <!ELEMENT
Type: name of the element
Type: (child1, child2, ..., childn of the element)
Type: >
<!ELEMENT faculty (deans_office, department)>
Defining choices
Type: <!ELEMENT
Type: name of the element
Type: (child1 | child2 | ... | childn of the element)
Type: >
<!ELEMENT faculty (deans_office, (department | research_lab))>
Defining how many units
To define how many units:
Type ? to indicate that the unit can appear at most once, if at all (zero or one)
Type + to indicate that the unit must appear at least once (one or more)
Or type * to indicate that the unit can appear as many times as necessary, or not at all (zero or more)
Defining how many units
<!ELEMENT faculty (deans_office, financial_office*, library?, (department+ | research_lab+))>
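Sequences, choices and the ?, + and * markers behave much like a regular expression over the list of child element names. The following is only a sketch of that idea in Python, not a DTD validator: it handles element-only content models (no #PCDATA, EMPTY or ANY), and the function names are hypothetical. Python's standard library does not validate against DTDs, which is why the content model is passed in as a plain string here.

```python
import re


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each child name is matched as 'name ' (name plus one space), so a list
    of children is checked by joining the names with trailing spaces.
    """
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model)
    parts = []
    for token in tokens:
        if token == "(":
            parts.append("(?:")          # grouping only, no capture
        elif token == ",":
            continue                     # a sequence is plain concatenation
        elif token in ")|?*+":
            parts.append(token)          # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    pattern = content_model_to_regex(model)
    sequence = "".join(name + " " for name in child_names)
    return pattern.fullmatch(sequence) is not None


FACULTY_MODEL = (
    "(deans_office, financial_office*, library?, "
    "(department+ | research_lab+))"
)

print(children_match(FACULTY_MODEL, ["deans_office", "department"]))
print(children_match(FACULTY_MODEL, ["deans_office", "department", "research_lab"]))
```

With an XML parser, the list of names would come from the children of the element being checked, for example [child.tag for child in faculty_element] with xml.etree.ElementTree.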
About attributes
Attributes add information about an element
Information contained in attributes tends to be about the content of the page
Elements are perhaps better for information you want to display; attributes are better for information about information
Defining simple attributes
Type <!ATTLIST
Type element (the name of the element)
Type attribute (the name of the attribute)
Type CDATA
Or type (choice_1 | choice_2)
Type "default" (a default value in quotation marks)
or type #REQUIRED (attribute must be explicitly provided)
or type #IMPLIED (attribute is optional)
Type >
Defining attributes example
<!ELEMENT slideshow (slide+)>
<!ATTLIST slideshow
    title    CDATA #REQUIRED
    date     CDATA #IMPLIED
    author   CDATA #REQUIRED
    language (English | German) #IMPLIED>
<!ELEMENT slide (title, item*)>
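The effect of #REQUIRED, #IMPLIED and an enumerated choice list can be sketched with a small hand-written check. This is an illustration only, not a DTD processor: the attribute table below is transcribed by hand from the slideshow ATTLIST, and the names SLIDESHOW_ATTLIST and attribute_errors are invented for the example.

```python
import xml.etree.ElementTree as ET

# attribute name -> (type, default); a tuple as type means an enumeration.
SLIDESHOW_ATTLIST = {
    "title": ("CDATA", "#REQUIRED"),
    "date": ("CDATA", "#IMPLIED"),
    "author": ("CDATA", "#REQUIRED"),
    "language": (("English", "German"), "#IMPLIED"),
}


def attribute_errors(element, attlist):
    """Return a list of messages for attributes that break the declarations."""
    errors = []
    for name, (attr_type, default) in attlist.items():
        value = element.get(name)
        if value is None:
            if default == "#REQUIRED":
                errors.append(f"missing required attribute '{name}'")
            continue
        if attr_type != "CDATA" and value not in attr_type:
            errors.append(f"attribute '{name}' has undeclared value '{value}'")
    for name in element.attrib:
        if name not in attlist:
            errors.append(f"undeclared attribute '{name}'")
    return errors


good = ET.fromstring('<slideshow title="Demo" author="Vagan" language="English"/>')
bad = ET.fromstring('<slideshow title="Demo" language="French"/>')

print(attribute_errors(good, SLIDESHOW_ATTLIST))
print(attribute_errors(bad, SLIDESHOW_ATTLIST))
```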
Creating shortcuts for text
Predefined Entities
In the DTD type <!ENTITY
Type abbreviation
Type "content"
Type >
Using shortcuts for text
In the XML document type: &
Type: abbreviation
where abbreviation is the identifying name of your entity (and matches the one used in the previous example)
Type: ;
Example
<!ENTITY product "WonderWidget">
<!ENTITY products "WonderWidgets">
<slideshow title="&product; Slide Show" ...
<!-- TITLE SLIDE -->
<slide type="all">
  <title>Wake up to &products;!</title>
</slide>
<!-- OVERVIEW -->
<slide type="all">
  <title>Overview</title>
  <item>Why <em>&products;</em> are great</item>
  <item/>
  <item>Who <em>buys</em> &products;</item>
</slide>
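To see the shortcuts being expanded, the entity declarations can be placed in an internal DTD and the document handed to a parser. A minimal sketch with Python's standard xml.etree.ElementTree, which expands entities declared in the internal DTD subset; the document below is a shortened version of the slideshow example, trimmed for illustration.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE slideshow [
  <!ENTITY product "WonderWidget">
  <!ENTITY products "WonderWidgets">
]>
<slideshow title="&product; Slide Show">
  <slide type="all">
    <title>Wake up to &products;!</title>
  </slide>
</slideshow>
"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced each entity reference by its content.
print(root.get("title"))
print(root.find("slide/title").text)
```

Changing the product name then only requires editing the two ENTITY declarations.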
Cascading Style Sheets
CSS
CSS can be used to format XML documents for presentation
External style sheets give global control of presentation
The Anatomy of Style
A style is made up of a selector and one or more declarations
Declarations determine how the chosen elements will be displayed
A selector can be as simple as an element name
Declarations have a property and a value: color:red or font:bold 12pt Tekton
Creating an External Style Sheet
To create a style sheet:
Create a text document
Type name of selector for elements
Type { to begin the properties that should be applied
Define as many properties as desired
Type } to mark the end of the rule
Sample CSS code
name {display:block; position:absolute}
intro {display:block; border:medium dotted red; padding:5; margin-top:5}
picture {display:block}
population {display:inline}
latin_name {display:inline}
more_info {display:inline}
Calling a Style Sheet for an XML Document
To create the processing instruction manually:
At the top of the document, after the initial XML declaration, type: <?xml-stylesheet type="text/css"
Then type: href="style.css"
Finally, type: ?> to complete the processing instruction
<?xml-stylesheet type="text/css" href="style.css" ?>
Setting the Text Color
To set the text color:
Type color:
Type colorname, where colorname is one of 16 predefined colors
Or type #rrggbb, where rr, gg and bb are two-digit hexadecimal values (00-ff)
Or type rgb(r,g,b), where each can be a value from 0-255
Or rgb(r%,g%,b%), where r, g, b specify the percentage of red, green, or blue.
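The three numeric notations describe the same color in different units. The arithmetic can be sketched in a few lines of Python; the function names are made up for this example, and percentages are rounded to whole numbers, which is a simplification.

```python
def parse_hex(color):
    """Split '#rrggbb' into three integers in the range 0-255."""
    digits = color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def to_hex(r, g, b):
    """Write three 0-255 values in #rrggbb notation."""
    return f"#{r:02x}{g:02x}{b:02x}"


def to_percent(r, g, b):
    """Write three 0-255 values in rgb(r%, g%, b%) notation, rounded."""
    percents = [round(channel * 100 / 255) for channel in (r, g, b)]
    return "rgb({}%, {}%, {}%)".format(*percents)


r, g, b = parse_hex("#ff8000")
print((r, g, b))
print(to_hex(r, g, b))
print(to_percent(r, g, b))
```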
Aligning Text
You can set up certain HTML tags to always be aligned to the right, left, center, or justified, as desired.
To align text:
Type text-align:
Type left to align text to the left
Type right to align text to the right
Type center to center the text in the middle of the screen
Type justify to align the text on both the right and left
Underlining Text
To underline text:
Type text-decoration:
To underline text, type underline
For a line above the text, type overline
To strike out text, type line-through
To get rid of underlining, overlining, etc., type text-decoration:none
Querying XML
A Query Language for XML: XML-QL
Designed in AT&T Labs (w. Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy)
Implementation on top of Strudel (Alin Deutsch, Mary Fernandez)
Prototype: http://www.research.att.com/sw/tools/xmlql
XML-QL Example Data (bib.xml)
<bib>
  <book year="1995">
    <title> An Introduction to DB Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1995">
    <title> Foundations for OR Databases </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>
XML-QL Example Data (bib.dtd)
<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>
Query Example
Find all the names of the authors whose publisher is Addison-Wesley:
WHERE <book>
        <publisher><name> Addison-Wesley </name></publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
Query Example (syntax)
The use of </> instead of </XXX>:
WHERE <book>
        <publisher><name> Addison-Wesley </></>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
Result of the query:
The output is in XML form:
<lastname> Date </lastname>
<lastname> Darwen </lastname>
<lastname> Date </lastname>
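For readers without an XML-QL implementation at hand, the same question can be approximated in Python with the standard xml.etree.ElementTree module. This is a sketch of the query's intent, not of XML-QL itself: the helper name authors_by_publisher is hypothetical, the data is a compact copy of bib.xml, and results come back in document order as plain last names rather than as <lastname> elements.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1995">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def authors_by_publisher(bib_xml, publisher_name):
    """Collect author last names of every book with the given publisher."""
    bib = ET.fromstring(bib_xml)
    last_names = []
    for book in bib.iter("book"):
        name = (book.findtext("publisher/name") or "").strip()
        if name != publisher_name:
            continue                      # WHERE pattern does not match
        for author in book.findall("author"):
            last_names.append(author.findtext("lastname").strip())
    return last_names


print(authors_by_publisher(BIB_XML, "Addison-Wesley"))
```

As in the query result above, Date appears twice because the pattern matches once per book and author.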
XML-QL: Pattern Matching and Selections
WHERE <book>
        <publisher>Springer</publisher>
        <author> $a </author>
        <year> $y </year>
      </book> IN "www.a.b.c/bib.xml",
      1991 <= $y AND $y <= 1994
CONSTRUCT $a
XML-QL: Construction of New XML Data
WHERE <book>
        <publisher> Springer </>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </>
            <title> $t </>
          </>
Constructing new XML data: (result)
<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to DB Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for OR Databases </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
XML-QL Semantics
WHERE $X .. $Y .. $Z
CONSTRUCT $X $Y $Z
Step 1: find all substitutions
Step 2: construct XML result
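The two steps can be made concrete with a small Python sketch built on the standard xml.etree.ElementTree module. It is an approximation under stated assumptions: the pattern is hard-coded for books of one publisher, each substitution is represented as a dictionary with keys "t" and "a", and the function names find_substitutions and construct_results are invented for the example.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1995">
    <title>Foundations for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def find_substitutions(bib, publisher_name):
    """Step 1: one binding of $t and $a per (title, author) pair that matches."""
    bindings = []
    for book in bib.iter("book"):
        if (book.findtext("publisher/name") or "").strip() != publisher_name:
            continue
        for title in book.findall("title"):
            for author in book.findall("author"):
                bindings.append({
                    "t": title.text.strip(),
                    "a": author.findtext("lastname").strip(),
                })
    return bindings


def construct_results(bindings):
    """Step 2: build one <result> element from each binding."""
    results = []
    for binding in bindings:
        result = ET.Element("result")
        author = ET.SubElement(result, "author")
        ET.SubElement(author, "lastname").text = binding["a"]
        ET.SubElement(result, "title").text = binding["t"]
        results.append(result)
    return results


bindings = find_substitutions(ET.fromstring(BIB_XML), "Addison-Wesley")
results = construct_results(bindings)
for result in results:
    print(ET.tostring(result, encoding="unicode"))
```

Keeping the two steps in separate functions mirrors the semantics on the slide: the WHERE clause only produces bindings, and the CONSTRUCT clause only consumes them.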
Conclusions
XML is for structuring data
XML looks a bit like HTML
XML is text, but isn't meant to be read
XML is a family of technologies
XML is modular
XML is the basis for RDF and the Semantic Web
XML is license-free, platform-independent and well-supported