applied component-based software engineering xml basics
Post on 24-Feb-2016
Applied Component-Based Software Engineering
XML Basics
CSE 668 / ECE 668
Prof. Roger Crawfis
XML Quiz
What does XML stand for?
Is XML a language?
What is HTML? What is XHTML? What is HTTPS?
Is HTML a language?
XML Quiz
What does XML stand for? eXtensible Markup Language
Is XML a language? No!
What is HTML? What is XHTML? What is HTTPS? XHTML is HTML written as well-formed XML.
Is HTML a language? Yes!
XML Motivation
Data interchange is critical in today’s networked world. Examples:
  Banking: funds transfer
  Order processing (especially inter-company orders)
  Scientific data
    Chemistry: ChemML, …
    Genetics: BSML (Bio-Sequence Markup Language), …
Paper flow of information between organizations is being replaced by electronic flow of information.
Each application area has its own set of standards for representing information.
  Plain text with line headers indicating the meaning of fields
XML has become the basis for all new generation data interchange formats.
Semi-structured Data
Nodes = objects.
Labels on arcs (attributes, relationships).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: no restriction on:
  Labels out of a node.
  Number of successors with a given label.
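Example: Data Graph
[Figure: data graph with a root node. Arc labels: beer, beer, bar, manf, manf, servedAt, name, name, name, addr, prize, year, award. Leaf values: Bud, A.B., Gold, 1995, Maple, Joe’s, M’lob. Callouts: "The bar object for Joe’s Bar", "The beer object for Bud", "Notice a new kind of data."]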
XML Standardization
World Wide Web Consortium (W3C) http://www.w3.org
More resources at http://www.xml.com
Java-XML (and web services) info at http://java.sun.com/javaee/technologies
.NET-XML (via web services) info at http://www.microsoft.com/net/TechnicalResources
XML Uses
Example: the Ajax technology. Small volume browser-server communication in XML supports more interactive Web pages.
Example: Web services. Marshalling and unmarshalling data in SOAP uses XML. Service descriptions use XML.
XML Uses
Example: Data exchange formats. (Applications must agree on common meaning for tags.)
Older data exchange formats have been redesigned as instances of XML, e.g., HL7 in health informatics, FIX in the financial industry, etc. Even proprietary formats like MS Word now have open XML versions.
Example: Software development configuration files, e.g., in W3C, Apache, Java EE, .NET frameworks.
(All this may be geek paradise but it’s awfully verbose and the scarcity of visual editors is puzzling.)
Why People Like XML
Can get data from all sorts of sources
  Allows us to touch data we don’t own!
  Can integrate various data sources as if they were databases (almost)
We can publish some of the data in our databases on the Web conveniently
Well-Formed and Valid XML
Well-Formed XML allows you to invent your own tags.
  Similar to labels in semi-structured data.
Valid XML involves either a:
  DTD (Document Type Definition), a grammar for tags.
  XSD (XML Schema Definition), a grammar for tags, itself written in XML format.
Well-Formed XML
A legal XML document: fully parsable by an XML parser
  All open-tags have matching close-tags
  Attributes (which are unordered) only appear once in an element
  There’s a single root element
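The three rules above can be checked mechanically by handing the text to any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


ok = "<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>"
unclosed = "<BARS><BAR><NAME>Joe's Bar</BAR></BARS>"   # NAME is never closed
duplicate = '<SELLS price="2.50" price="3.00"/>'       # attribute appears twice
two_roots = "<BAR/><BAR/>"                             # no single root element

for label, doc in [("ok", ok), ("unclosed", unclosed),
                   ("duplicate", duplicate), ("two_roots", two_roots)]:
    print(label, is_well_formed(doc))
```

Only the first string is accepted; each of the others breaks exactly one of the listed rules.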
Well-Formed XML
Start the document with a declaration, surrounded by <?xml … ?> .
Normal declaration is: <?xml version = "1.0" standalone = "yes" ?>
  standalone = "yes": no DTD is provided (more precisely, the document does not rely on external markup declarations).
Balance of document is a root tag surrounding nested tags.
Tags
Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO> .
Tags may be nested arbitrarily.
XML tags are case sensitive.
Example: Well-Formed XML
<?xml version = "1.0" standalone = "yes" ?>
<BARS>
  <BAR>
    <NAME>Joe’s Bar</NAME>
    <BEER>
      <NAME>Bud</NAME>
      <PRICE>2.50</PRICE>
    </BEER>
    <BEER>
      <NAME>Miller</NAME>
      <PRICE>3.00</PRICE>
    </BEER>
  </BAR>
  <BAR> …
</BARS>
A NAME subobject
A BEER subobject
XML and Semi-structured Data
Well-Formed XML with nested tags is exactly the same idea as trees of semi-structured data.
Graphs are possible through indirection.
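Example
The <BARS> XML document is:
[Figure: tree with root BARS and BAR children. The first BAR has a NAME child (Joe’s Bar) and two BEER children, each with a NAME and a PRICE child (Bud 2.50, Miller 3.00). The remaining BAR nodes are elided (. . .).]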
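To make the tree reading concrete, here is a small sketch that parses the first BAR of the BARS document and prints one indented line per node. It uses Python's standard xml.etree.ElementTree; the function name outline is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def outline(element, depth=0):
    """Return one indented line per element, with its text if it has any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:           # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Interior nodes correspond to tags and leaves carry the atomic values, which is the same shape as the semi-structured data graph restricted to a tree.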
XML as a Data Model
XML “information set” includes 7 types of nodes:
  Document (root)
  Element
  Attribute
  Processing instruction
  Text (content)
  Namespace
  Comment
XML data model includes this, plus order info and a few other things
XML Anatomy
<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
Callouts on the slide: Processing Instr., Open-tag, Close-tag, Element, Attribute
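A Visualization of XML Data
[Figure: tree for the dblp document. Root has children ?xml (p-i) and dblp (element). dblp has children mastersthesis and article. mastersthesis has attributes mdate, key and elements author, title, year, school. article has attributes mdate, key and elements editor, title, journal, volume, year, ee, ee. Leaves are text values (2002…, ms/Brown92, Kurt P…., PRPL…, 1992, Univ…., and so on). Legend: root, p-i, element, attribute, text.]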
Empty Elements
We can do all the work of an element in its attributes. Like BEER in previous example.
Another example: SELLS elements could have attribute price rather than a value that is a price.
Example use: <SELLS theBeer = "Bud" price = "2.50"/>
Note exception to “matching tags” rule
XML Namespaces
Namespaces allow us to specify a context for different tags
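Two parts:
  Binding of namespace to URI
  Qualified names
<tag xmlns="http://www.fictitious.com/mypath" xmlns:myns="http://www.fictitious.com/mypath" xmlns:otherns="http://www.fictitious.com/otherpath">
  <thistag>is in the default namespace, bound here to the same URI as myns</thistag>
  <myns:thistag>is the same</myns:thistag>
  <otherns:thistag>is a different tag</otherns:thistag>
</tag>
(Every prefix must be bound to a URI before use; the otherns URI is made up for this example.)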
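A namespace-aware parser replaces each prefix with the URI it is bound to, so two tags are "the same" exactly when URI and local name agree. The sketch below shows this with Python's standard xml.etree.ElementTree; the default-namespace binding and the otherpath URI are assumptions added so that every prefix is bound.

```python
import xml.etree.ElementTree as ET

MYPATH = "http://www.fictitious.com/mypath"
OTHER = "http://www.fictitious.com/otherpath"   # hypothetical second namespace

doc = f"""<tag xmlns="{MYPATH}" xmlns:myns="{MYPATH}" xmlns:otherns="{OTHER}">
  <thistag>default namespace</thistag>
  <myns:thistag>same namespace, reached through a prefix</myns:thistag>
  <otherns:thistag>different namespace</otherns:thistag>
</tag>"""

root = ET.fromstring(doc)

# ElementTree reports expanded names in the form {uri}localname.
expanded = [child.tag for child in root]
print(expanded)

# Searching by qualified name needs a prefix-to-URI map; the prefix itself is arbitrary.
in_mypath = root.findall("m:thistag", {"m": MYPATH})
print(len(in_mypath))
```

The first two children get identical expanded names, while the third differs only in its URI.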
XML Attributes
An (opening) tag may contain attributes. These are typically used to describe the content of an element
<entry>
  <word language = "en"> cheese </word>
  <word language = "fr"> fromage </word>
  <word language = "ro"> branza </word>
  <meaning> A food made … </meaning>
</entry>
XML Attributes
Another common use for attributes is to express dimension or type
<picture>
  <height dim = "cm"> 2400 </height>
  <width dim = "in"> 96 </width>
  <data encoding = "gif" compression = "zip"> M05-.+C$@02!G96YE<FEC ... </data>
</picture>
When to use attributes
<person ssno = "123 45 6789">
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>
<person>
  <ssno> 123 45 6789 </ssno>
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>
The choice between representing data as attributes or as elements is sometimes unclear; it is largely a matter of taste.
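From the consuming program's side, the two designs differ only in how the value is looked up. A minimal sketch with Python's standard xml.etree.ElementTree follows; the trimmed-down person documents are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person ssno="123 45 6789"><name>F. MacNiel</name></person>')
as_element = ET.fromstring(
    '<person><ssno>123 45 6789</ssno><name>F. MacNiel</name></person>')

ssno_a = as_attribute.get("ssno")       # attribute lookup on the element
ssno_e = as_element.findtext("ssno")    # text of the first ssno child

print(ssno_a == ssno_e)
```

One practical difference: an attribute can occur at most once per element and holds only a string, whereas a child element can repeat and can have structure of its own.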
Defining the structure of an XML file
We can check if an XML file is well-formed
  By looking at it, maybe
  By loading it into a browser
    If well-formed, it will be displayed
However, how can we check that the well-formed file contains the correct elements in the correct quantities?
  We need to write a specification for the XML file
XML Needs Help
It’s too unconstrained for many cases!
  How will we know when we’re getting garbage?
  How will we query?
  How will we understand what we got?
We also need:
  Some idea of the structure
  Presentation, in some cases: CSS, XSL
  Some way of interpreting the tags
Defining the structure of an XML file
There are 2 main alternatives
  Document Type Definitions
    Original and simple
  XML Schema
    More versatile and complex
We will look at both
  Concentrating on XML Schema
XML documents are not required to have an associated schema
Document Type Definition (DTD)
The type of an XML document can be specified using a DTD
DTD constrains structure of XML data
  What elements can occur
  What attributes can/must an element have
  What sub-elements can/must occur inside each element, and how many times
DTD does not constrain data types
  All values represented as strings in XML
DTD syntax
  <!ELEMENT element (subelements-specification) >
  <!ATTLIST element (attributes) >
Example: An Address Book
<person ssn = "4444">
  <name> Homer Simpson </name>
  <tel> 2543 </tel>
  <tel> 2544 </tel>
  <email> homer@math.springfield.edu </email>
</person>
Up to 4 tel nos
At least one email
Exactly one name
An attribute
One or more persons
Example: The Address Book2
<person>
  <name> MacNiel, John </name>
  <greet> Dr. John MacNiel </greet>
  <addr> 1234 Huron Street </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> jm@abc.com </email>
</person>
Exactly one name
At most one greeting
As many address lines as needed (in order)
Mixed telephones and faxes
At least one
DTD - Specifying the Structure
In a DTD, we can specify the permitted content for each element, using regular expressions
For a person element, the regular expression is
  name, title?, tel*, email+
What’s in a person Element?
This means
  name = there must be a name element
  title? = there is an optional title element (i.e., 0 or 1 title elements)
  name, title? = the name element is followed by an optional title element
  tel* = there are 0 or more tel elements
  email+ = there are 1 or more email elements
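Because the content model is a regular expression over child tags, it can be imitated with an ordinary regex applied to the sequence of child element names. This is only a sketch of the idea, not how a validating parser is implemented; PERSON_MODEL and matches_person_model are names invented for the illustration, and the check ignores attributes and text.

```python
import re
import xml.etree.ElementTree as ET

# name, title?, tel*, email+  rewritten over space-terminated tag names.
PERSON_MODEL = re.compile(r"name (title )?(tel )*(email )+")


def matches_person_model(person) -> bool:
    """Check the order and count of a person element's children."""
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None


good = ET.fromstring(
    "<person><name>Homer Simpson</name><tel>2543</tel><tel>2544</tel>"
    "<email>homer@math.springfield.edu</email></person>")
no_email = ET.fromstring(
    "<person><name>Homer Simpson</name><tel>2543</tel></person>")
wrong_order = ET.fromstring(
    "<person><email>homer@math.springfield.edu</email>"
    "<name>Homer Simpson</name></person>")

for label, person in [("good", good), ("no_email", no_email),
                      ("wrong_order", wrong_order)]:
    print(label, matches_person_model(person))
```

The second document fails because email+ needs at least one email, and the third fails because the comma operator fixes the order.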
DTD For the Address Book2
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
PCDATA means parsed character data
Regular expressions
Attributes in a DTD
XML elements can have attributes.
General Syntax for DTD:
  <!ATTLIST element-name
      attribute-name1 type1 default-value1
      ….
      attribute-namen typen default-valuen>
Example: <!ATTLIST person ssn CDATA #REQUIRED>
  CDATA means character data
  Default value could be #REQUIRED or #IMPLIED (meaning optional)
Example: DTD
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
A BARS object has zero or more BAR’s nested within.
A BAR has one NAME and one or more BEER subobjects.
A BEER has a NAME and a PRICE.
NAME and PRICE are text.
Use of DTD’s
1. Set standalone = "no".
2. Either:
  a) Include the DTD as a preamble of the XML document, or
  b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.
Use of DTD’s
<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR>
    <NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>
The DTD
The document
Use of DTD’s
Assume the BARS DTD is in file bar.dtd.
<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR>
    <NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>
Get the DTD from the file bar.dtd
Valid Documents
A document with a DTD is valid if it conforms to the DTD, i.e., the document conforms to the regular-expression grammar, types of attributes are correct, and constraints on references are satisfied
DTDs Problems
DTDs are rather weak specifications by DB & programming-language standards
Some limitations:
  Only one base type: PCDATA
  Also no constraints, e.g. range of values, frequency of occurrence
  Not easily parsed (since they are not XML)
  Not easy to express that element a has exactly the children c, d, e in any order
DTDs Problems
Difficult to specify unordered sets of subelements
  Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  (A | B)* allows specification of an unordered set, but
    Cannot ensure that each of A and B occurs only once
Many other more complex problems.
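The gap between (A | B)* and "each of A and B exactly once, in any order" can be seen with plain regular expressions. The sketch below uses Python's re module on single-letter labels; the helper exactly_once_any_order is an invented name, and it simply enumerates every ordering, which is what a DTD author would have to write by hand.

```python
import re
from itertools import permutations

loose = re.compile(r"(A|B)*")   # the DTD-style workaround


def exactly_once_any_order(labels):
    """Build the alternation of all orderings, e.g. AB|BA for two labels."""
    return re.compile("|".join("".join(p) for p in permutations(labels)))


strict = exactly_once_any_order(["A", "B"])
three = exactly_once_any_order(["c", "d", "e"])

print(loose.fullmatch("AAB") is not None)    # accepted: A occurs twice
print(strict.fullmatch("AAB") is not None)   # rejected
print(three.pattern)                         # one alternative per ordering
```

For n children the explicit alternation has n! branches (6 for c, d, e), which is why this constraint is called hard to express in a DTD.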
XML Schema
DTDs are now being superseded by XML schemas. They provide the following features:
  XML syntax
    So can be parsed and validated with standard XML tools
  Data types other than #PCDATA
    There are built-in types such as integer, float, boolean, string and many others
  Greater control over permitted constructs
    Can specify maximum and minimum occurrences
    Can use regular expressions to set patterns to be matched
  Support for modularity and inheritance
Schema types
There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID
Each element is composed of either simple types or complex types. A complex type is often a sequence of elements
The content of the type can be declared as shown in the following example. A type can also be declared, named and referred to.
Notice the use of minOccurs and maxOccurs. Their default is 1.
Simple Schema Example
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <!-- details of the person element: see Schema Example Continued -->
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
standard stuff
Top-level element
Namespace
Namespace declaration
So at the start of a document we must specify what namespaces we are using.
In the schema example, we are using the XML schema namespace with the xs prefix
We declare this namespace in an attribute in the top-level element
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
We then use the xs prefix in all the XML Schema elements, e.g. complexType, sequence, element etc.
Schema Example Continued
Details of the person element
<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>
A person is a complex type which is a sequence of elements and an attribute
Empty element
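Since a schema is itself an XML document, the same parser that reads data can read the schema. The sketch below loads a standalone version of the person declaration with Python's standard xml.etree.ElementTree and lists each child element with its occurrence bounds, filling in the default of 1 where the attribute is omitted. The wrapper xs:schema element is an assumption added to make the fragment a complete document, and the code only inspects the schema; it does not validate anything.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="tel" type="xs:string"/>
        <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="sssNo" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(schema)
sequence = root.find(f"{XS}element/{XS}complexType/{XS}sequence")

# minOccurs and maxOccurs both default to 1 when they are omitted.
occurs = {
    el.get("name"): (el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    for el in sequence.findall(f"{XS}element")
}
print(occurs)
```

Read this way, name and tel must each appear exactly once, while email is optional.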
Restrictions on elements
You can also restrict the data values to
  a range
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="120"/>
  an enumerated list
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
  a pattern
    <xs:pattern value="([a-z])*"/>
    Means 0 or more lowercase alphabetic chars
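To show what each restriction accepts, the following sketch mirrors the three facets with ordinary Python checks. It is an emulation for illustration only, not a schema validator; the function names are invented. One detail worth noting is that an xs:pattern must match the entire value, which corresponds to re.fullmatch rather than re.search.

```python
import re

CARS = {"Audi", "Golf", "BMW"}     # the enumerated list


def in_range(value: int) -> bool:
    """Mirror minInclusive = 0 and maxInclusive = 120 (both bounds allowed)."""
    return 0 <= value <= 120


def is_allowed_car(value: str) -> bool:
    """Mirror the enumeration: the value must be one of the listed strings."""
    return value in CARS


def is_lowercase_word(value: str) -> bool:
    """Mirror the pattern ([a-z])*, applied to the whole value."""
    return re.fullmatch(r"([a-z])*", value) is not None


print(in_range(120), in_range(121))
print(is_allowed_car("Golf"), is_allowed_car("Fiat"))
print(is_lowercase_word("abc"), is_lowercase_word("abC"), is_lowercase_word(""))
```

The empty string passes the pattern check because * allows zero repetitions.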
XSD Built-in Types
Declaring your own types
Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute
A named type is declared
  <xs:simpleType name="ssstype">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>
And used as the attribute type
  <xs:attribute name="sssNo" type="ssstype" use="required"/>
More complex Schemas
The previous example shows a simple schema.
It is also possible to make the schema easier to maintain
  By declaring all the simple elements first and then referring to them in the body of the document
  By naming the declaration of simple and complex types, which could then be used later in the document, and more than once if necessary
Referring to a schema
Save your schema in a file with the extension xsd.
Linking schema definition with a document is done using a special attribute of the root node of the document:
<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="people.xsd">
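The link is an ordinary namespaced attribute, so a program can read it like any other. A minimal sketch with Python's standard xml.etree.ElementTree follows; the closing </people> tag is added here as an assumption so that the fragment parses, and the code only reads the hint, it does not load or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = f"""<people xmlns:xsi="{XSI}"
        xsi:noNamespaceSchemaLocation="people.xsd">
</people>"""

root = ET.fromstring(doc)

# Namespaced attributes are keyed as {uri}localname, just like element tags.
schema_file = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(schema_file)
```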
Validating
Validators
  http://www.w3.org/2001/03/webdata/xsv
  http://tools.decisionsoft.com/schemaValidate/
  Many others on the web
XML Schema Example
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bank" type="BankType"/>
  <xs:element name="account">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="account_number" type="xs:string"/>
        <xs:element name="branch_name" type="xs:string"/>
        <xs:element name="balance" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  ….. definitions of customer and depositor ….
  <xs:complexType name="BankType">
    <xs:sequence>
      <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Application Program Interface
Two standard application program interfaces to XML data (Java, C++, etc.):
  SAX (Simple API for XML) (3rd party for .NET)
    Based on parser model; user provides event handlers (call-back functions) for parsing events
      E.g. start of element, end of element
    Not suitable for database applications
  DOM (Document Object Model)
    XML data is parsed into a tree representation
    Functions for accessing, traversing and searching the DOM
    .NET DOM API provides XmlNode class with properties:
      ParentNode, ChildNodes, NextSibling, FirstChild, Attributes
.NET adds a 3rd method: LINQ to XML.
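The contrast between the two interfaces can be sketched with Python's standard library, which ships both a SAX parser (xml.sax) and a small DOM implementation (xml.dom.minidom). The slides name the .NET XmlNode properties; the Python DOM uses the W3C spellings parentNode, childNodes, nextSibling and firstChild, which play the same roles. The TagCounter class and the sample document are assumptions made for this illustration.

```python
import xml.sax
from xml.dom import minidom

doc = ("<BARS><BAR><NAME>Joe's Bar</NAME>"
       "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")


class TagCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back on each event; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once for every open-tag, in document order.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = TagCounter()
xml.sax.parseString(doc.encode("utf-8"), handler)
print(handler.counts)

# DOM: the whole document becomes a tree that can be navigated freely.
dom = minidom.parseString(doc)
bar = dom.documentElement.firstChild    # the BAR element
name = bar.firstChild                   # its NAME child
beer = name.nextSibling                 # the BEER element that follows NAME
print(name.firstChild.data, beer.parentNode.tagName, len(bar.childNodes))
```

SAX keeps memory use low because nothing is retained between callbacks, whereas DOM pays for the full tree up front in exchange for random access.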