module 2 xml basics - stanford...

49
1 Module 2 Module 2 XML Basics XML Basics (XML, Namespaces, (XML, Namespaces, Usage scenarios, DTDs) Usage scenarios, DTDs)

Upload: others

Post on 07-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

1

Module 2Module 2

XML BasicsXML Basics(XML, Namespaces, (XML, Namespaces,

Usage scenarios, DTDs)Usage scenarios, DTDs)

Page 2: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

2

History: SGML vs. HTML vs. History: SGML vs. HTML vs. XMLXML

SGML (1960)

XML(1996)

HTML(1990) XHTML(2000)

http://www.w3.org/TR/2006/REC-xml-20060816/

Page 3: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

3

Why XML ?Why XML ? HTML is to be interpreted by browsers HTML is to be interpreted by browsers

Shown on the screen to a humanShown on the screen to a human Desire to separate the “content” from Desire to separate the “content” from

“presentation”“presentation” Presentation has to please the human eyePresentation has to please the human eye Content can be interpreted by machines, for Content can be interpreted by machines, for

machines presentation is a handicapmachines presentation is a handicap Semantic markup of the dataSemantic markup of the data

Page 4: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

4

Information about a book in Information about a book in HTMLHTML

<td><h1 class=”<td><h1 class=”BooksBooks">">Politics of experience by Ronald Laing, Politics of experience by Ronald Laing, published in 1967published in 1967</h1></td><td align="right" nowrap> Item </h1></td><td align="right" nowrap> Item number:320070381076</td><td align="right" valign="top"><img number:320070381076</td><td align="right" valign="top"><img src="http://pics.booksstatic.com/aw/pics/globalAssets/rtCurve.gisrc="http://pics.booksstatic.com/aw/pics/globalAssets/rtCurve.gif" width="8" height="8"></td></tr><tr><td colspan="6" f" width="8" height="8"></td></tr><tr><td colspan="6" valign="middle" bgcolor="#5F66EE"><img valign="middle" bgcolor="#5F66EE"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="4"></td></tr></table><table width="100%" border="0" height="4"></td></tr></table><table width="100%" border="0" cellpadding="0" cellspacing="0"><tr><td cellpadding="0" cellspacing="0"><tr><td bgcolor="#CCCCFF"><img bgcolor="#CCCCFF"><img src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" src="http://pics.booksstatic.com/aw/pics/s.gif" width="1" height="1"></td><td bgcolor="#EEEEFF"><div height="1"></td><td bgcolor="#EEEEFF"><div id="FastVIPBIBO"><table border="0" cellpadding="0" id="FastVIPBIBO"><table border="0" cellpadding="0" cellspacing="0" width="100%">cellspacing="0" width="100%">

Page 5: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

5

The same information in XMLThe same information in XML<<bookbook yearyear=“1967”>=“1967”> <<titletitle>Politics of experience</>Politics of experience</titletitle>> <<authorauthor>>

<<firstnamefirstname>Ronald</>Ronald</firstnamefirstname>><<lastnamelastname>Laing</>Laing</lastnamelastname>>

</</authorauthor>></</bookbook>> Elements

• Information is (1) decoupled from presentation, then (2) chopped into smaller pieces, and then (3) marked with semantic meaning • It can be processed by machines•Like HTML, only syntax, not logical abstract data model

Page 6: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

6

XML key conceptsXML key concepts

DocumentsDocuments ElementsElements AttributesAttributes Namespace declarationsNamespace declarations TextText CommentsComments Processing InstructionsProcessing Instructions All inherited from SGML, then HTMLAll inherited from SGML, then HTML

Page 7: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

7

The key concepts of XMLThe key concepts of XML<<bookbook yearyear=“1967”>=“1967”> <<titletitle>Politics of experience</>Politics of experience</titletitle>> <<authorauthor>>

<<firstnamefirstname>Ronald</>Ronald</firstnamefirstname>><<lastnamelastname>Laing</>Laing</lastnamelastname>>

</</authorauthor>></</bookbook>> Elements

• Documents• Elements• Attributes• Text

• Nested structure• Conceptual tree• Order is important• Only “characters”, not integers, etc

Page 8: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

8

ElementsElements Enclosed in TagsEnclosed in Tags

Begin Tag: e.g., Begin Tag: e.g., <bibliography><bibliography> End Tag: e.g., End Tag: e.g., </bibliography></bibliography> Element without content: e.g., Element without content: e.g., <bibliography /> <bibliography /> is a is a

shorthand forshorthand for <bibliography> </bibliography> <bibliography> </bibliography>

Elements can be nestedElements can be nested<bib> <bib> <book> Wilde Wutz </book> <book> Wilde Wutz </book> </bib></bib>

Subelements can implement multisets Subelements can implement multisets <bib> <bib> <book> ... </book> <book> ... </book> <book> ... </book> <book> ... </book> </bib></bib>

Order is important !Order is important ! Documents must be well-formedDocuments must be well-formed

<a> <a> <b> <b> </a> </a> </b> </b> is forbidden!is forbidden!<a> <a> <b> </b> <b> </b> is forbidden!is forbidden!

Page 9: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

9

AttributesAttributes Attribute are associated to ElementsAttribute are associated to Elements

<book price = „55“ year = „1967“<book price = „55“ year = „1967“ >> <title> ... </title><title> ... </title> <author> ... </author> <author> ... </author></book></book>

Elements can have only attributesElements can have only attributes<person name = „Wutz“ age = „33“/><person name = „Wutz“ age = „33“/>

Attribute names must be unique! (No Multisets)Attribute names must be unique! (No Multisets)<person name = „Wilde“ name = „Wutz“/> <person name = „Wilde“ name = „Wutz“/> is illegal!is illegal!

What is the difference between a nested element What is the difference between a nested element and an attribute? Are attributes useful?and an attribute? Are attributes useful?

Modeling decision: should „name“ be an attribute Modeling decision: should „name“ be an attribute or a subelement of a person ? What about „age“ ?or a subelement of a person ? What about „age“ ?

Page 10: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

10

Text and Mixed ContentText and Mixed Content Text appears in element contentText appears in element content

<title><title>The politics of experienceThe politics of experience</title></title> Can be mixed with other subelementsCan be mixed with other subelements

<title><title>The politics of <em>experience</em>The politics of <em>experience</em></title></title> Mixed ContentMixed Content

For „documents“ data -- very usefulFor „documents“ data -- very useful The need does not arise in „data“ processing, only entities The need does not arise in „data“ processing, only entities

and relationshipsand relationships People speak in sentences, not entities and relationships. People speak in sentences, not entities and relationships.

XML allows to preserve the structure of natural language, XML allows to preserve the structure of natural language, while adding semantic markup that can be interpreted by while adding semantic markup that can be interpreted by machines.machines.

Page 11: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

11

Continuous spectrum between Continuous spectrum between natural language, semi-structured natural language, semi-structured

data, and structured datadata, and structured data1.1. Dana said that the book entitledDana said that the book entitled „The politics of „The politics of

experience“ is really excellent !experience“ is really excellent !2.2. <citation author=„Dana“><citation author=„Dana“> The book entitled The book entitled „The „The

politics of experience“ is really excellent ! politics of experience“ is really excellent ! </citation></citation>3.3. <citation author=„Dana“><citation author=„Dana“> The book entitled The book entitled <title><title>

The politics of experienceThe politics of experience</title></title> is really excellent ! is really excellent ! </citation></citation>

4.4. <citation><citation> <author><author>DanaDana</author></author> <aboutTitle><aboutTitle>The politics of experienceThe politics of experience</aboutTitle></aboutTitle> <rating><rating> excellent excellent</rating></rating>

</citation></citation>

Page 12: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

12

CDATA sectionsCDATA sections Sometimes we would like to preserve the Sometimes we would like to preserve the

original characters, and not interpret them as original characters, and not interpret them as markupmarkup

CDATA sectionsCDATA sections Not parsed as XMLNot parsed as XML

<message><message> <greeting>Hello,world!</greeting><greeting>Hello,world!</greeting>

</message></message>

<message> <message> <![CDATA[<greeting>Hello, <![CDATA[<greeting>Hello, world!</greeting>]]>world!</greeting>]]> </message> </message>

Page 13: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

13

Comments, PIs, PrologComments, PIs, Prolog Comment: Syntax as in HTMLComment: Syntax as in HTML

<!-- this is a comment --><!-- this is a comment --> Processing InstructionsProcessing Instructions

Contain no data - interpretation by processorContain no data - interpretation by processor Syntax: Syntax: <?pause 10 secs ?> <?pause 10 secs ?> Pause is Pause is „Target“; „Target“; 10secs 10secs is „Content“is „Content“ XML XML is a reserved target for prologis a reserved target for prolog

PrologProlog<?xml version=„1.0“ encoding=„UTF-8“ standalone=„yes“ ?><?xml version=„1.0“ encoding=„UTF-8“ standalone=„yes“ ?> Standalone defines whether there is a DTDStandalone defines whether there is a DTD Encoding is usually Unicode.Encoding is usually Unicode.

Page 14: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

14

Whitespaces declarationWhitespaces declaration Whitespace = Continuous sequence ofWhitespace = Continuous sequence of Space Space, , TabTab

andand Return Return character character Special Attribute Special Attribute xml:spacexml:space to control useto control use Human-readible XML (with Whitespace)Human-readible XML (with Whitespace)

<book <book xml:space=„preserve“xml:space=„preserve“ > > <title>The politics of experience</title> <title>The politics of experience</title> <author>Ronald laing</author> <author>Ronald laing</author></book></book>

(Efficient) machine-readible XML (no WS)(Efficient) machine-readible XML (no WS) <book <book xml:space=„default“xml:space=„default“ ><title>The politics of ><title>The politics of experience</title><author>Ronald experience</title><author>Ronald Laing</author></book>Laing</author></book>

Performance improvement: ca. Factor 2.Performance improvement: ca. Factor 2.

Page 15: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

15

Language declarationLanguage declaration <p <p xml:lang="en">xml:lang="en">The quick The quick brown fox jumps over the lazy brown fox jumps over the lazy dog.</p>dog.</p>

<p <p xml:lang="en-GB">xml:lang="en-GB">What colour What colour is it?</p>is it?</p>

<p <p xml:lang="en-US">xml:lang="en-US">What color What color is it?</p>is it?</p>

Page 16: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

16

Universal Resource Identifiers Universal Resource Identifiers on the Webon the Web

URLs, URIs, IRIsURLs, URIs, IRIs URL (Universal Resource Locators):URL (Universal Resource Locators): deferenceable deferenceable

identifier on the Webidentifier on the Web The target of an URL pointer is an HTML file (virtual or The target of an URL pointer is an HTML file (virtual or

materialized)materialized) URIs (Unique Resource Identifier):URIs (Unique Resource Identifier): general purpose key general purpose key

to resources on the Webto resources on the Web Uniquely identifies a resourceUniquely identifies a resource Target is not an HTML file, can be anything (schema, table, file, Target is not an HTML file, can be anything (schema, table, file,

entity, object, tuple, person, physical item, etc)entity, object, tuple, person, physical item, etc) Lifetime and scope of this “key” is user dependentLifetime and scope of this “key” is user dependent

IRI (Internationalized Resource Identifiers)IRI (Internationalized Resource Identifiers) Allow non Latin characters (Chinese, Arabic, Japanese, etc)Allow non Latin characters (Chinese, Arabic, Japanese, etc)

URL, URI, IRIsURL, URI, IRIs All stringsAll strings Very LONG stringsVery LONG strings

Page 17: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

17

NamespacesNamespaces Integration of Data from diverse data sourcesIntegration of Data from diverse data sources Integration of different XML Vocabularies (aka Namespaces)Integration of different XML Vocabularies (aka Namespaces) Each „vocabulary“ has a unique key, identified by a URI/IRIEach „vocabulary“ has a unique key, identified by a URI/IRI Same local name, from different vocabularies can haveSame local name, from different vocabularies can have

Different meaningDifferent meaning Different structure associated with itDifferent structure associated with it

Qualified Names (Qname) to attach a „name“ to its Qualified Names (Qname) to attach a „name“ to its „vocabulary“„vocabulary“ for all nodes in an XML document that has names (Attributes, Elements, for all nodes in an XML document that has names (Attributes, Elements,

PisPis QNameQName ::= triple ( URI ::= triple ( URI [ prefix: ][ prefix: ] localname )localname )

Binding (prefix, URI) is introduced in elements start tagBinding (prefix, URI) is introduced in elements start tag Later only the prefix is used, not the long URIsLater only the prefix is used, not the long URIs Prefix is optional, default namespacesPrefix is optional, default namespaces Prefix and localname a separated by „:“ Prefix and localname a separated by „:“

„„http://w3.org/TR/1999/REC-xml-names“http://w3.org/TR/1999/REC-xml-names“

Page 18: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

18

Namespaces (cont)Namespaces (cont)

Namespace definitions look like AttributesNamespace definitions look like Attributes Identified by „xmlns:prefix“ or „xmlns“ (default)Identified by „xmlns:prefix“ or „xmlns“ (default) Bind the Prefix to the URIBind the Prefix to the URI

Scope is the entire element where the Scope is the entire element where the namespace is declarednamespace is declared Includes the element itslef, its attributes and ist Includes the element itslef, its attributes and ist

subtreessubtrees ExampleExample

<<ns:ns:a a xmlns:ns=„someURI“ ns:xmlns:ns=„someURI“ ns:b=„foo“> b=„foo“> <<ns:ns:b>content</b>content</nsns:b>:b>

</</ns:ns:a>a>

Page 19: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

19

Default namespacesDefault namespaces Default namespaces, no prefixDefault namespaces, no prefix

<a xmlns=„someURI“ ><a xmlns=„someURI“ > <b/> <!-- a and b are in the someURI namespace! --><b/> <!-- a and b are in the someURI namespace! --></a></a>

Only applies to subelements, not attributesOnly applies to subelements, not attributes<a xmlns=„someURI“ <a xmlns=„someURI“ c = „not in someURI c = „not in someURI

namespace“namespace“>> <b/> <!-- a and b are in the someURI namespace! --><b/> <!-- a and b are in the someURI namespace! --></a></a>

Page 20: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

20

Example: NamespacesExample: Namespaces DQ1 defines DQ1 defines dishdish for for chinachina

Diameter, Volume, Decor, ...Diameter, Volume, Decor, ... DQ2 defines DQ2 defines dishdish for for satellitessatellites

Diameter, FrequencyDiameter, Frequency How many „dishes“ are there?How many „dishes“ are there? Better ask for:Better ask for:

„„How many How many dishes dishes are there?“are there?“ or or

„„How many How many dishesdishes are there are there?“ ?“

Page 21: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

21

Example: NamespacesExample: Namespaces<gs:dish <gs:dish xmlns:gs = „http://china.com“xmlns:gs = „http://china.com“ >> <gs:dm gs:unit = „cm“><gs:dm gs:unit = „cm“>2020</gs:dm></gs:dm> <gs:vol gs:unit = „l“><gs:vol gs:unit = „l“>55</gs:vol></gs:vol> <gs:decor><gs:decor>MeissnerMeissner</gs:decor></gs:decor></gs:dish></gs:dish>

<sat:dish <sat:dish xmlns:sat = „http://satelite.com“xmlns:sat = „http://satelite.com“ >> <sat:dm><sat:dm>200200</sat:dm></sat:dm> <sat:freq><sat:freq>20-2000MHz20-2000MHz</sat:freq></sat:freq></sat:dish></sat:dish>

Page 22: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

22

Mixing Several NamespacesMixing Several Namespaces<<gs:dish xmlns:gs = „http://china.com“gs:dish xmlns:gs = „http://china.com“ xmlns:uom = „http://units.com“>xmlns:uom = „http://units.com“> <<gs:dmgs:dm uom:unit = „cm“>uom:unit = „cm“>2020<</gs:dm/gs:dm>> <<gs:volgs:vol uom:unit = „l“>uom:unit = „l“>55<</gs:vol/gs:vol>> <<gs:decorgs:decor>>MeissnerMeissner<</gs:decor/gs:decor>> <comment><comment>This is an unqualified element nameThis is an unqualified element name</comment></comment><</gs:dish/gs:dish>>

Page 23: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

23

Example XML dataExample XML data XHTML (browser/presentation)XHTML (browser/presentation) RSS (blogs)RSS (blogs) UBL (Universal Business Language)UBL (Universal Business Language) HealthCare Level 7 (medical data)HealthCare Level 7 (medical data) XBRL (financial data)XBRL (financial data) Digital photography metadata (XMP)Digital photography metadata (XMP) XMI (metadata)XMI (metadata) XQueryX (programs)XQueryX (programs) XForms (forms)XForms (forms) SOAP (message envelopes)SOAP (message envelopes) Microsoft Office -- Powerpoint in XML Microsoft Office -- Powerpoint in XML

(documents)(documents)

Page 24: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

24

XHTMLXHTML

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 25: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

25

RSS, blogsRSS, blogs <?xml version="1.0"?><rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/"> <channel rdf:about="http://www.xml.com/xml/news.rss"> <title>XML.com</title> <link>http://xml.com/pub</link> <description> XML.com features a rich mix of information and services for the XML community. </description> <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" /> <items> <rdf:Seq> <rdf:li resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" /> <rdf:li resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" /> </rdf:Seq> </items> <textinput rdf:resource="http://search.xml.com" /> </channel> <image rdf:about="http://xml.com/universal/images/xml_tiny.gif"> <title>XML.com</title> <link>http://www.xml.com</link> <url>http://xml.com/universal/images/xml_tiny.gif</url> </image>

Page 26: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

26

UBL (Universal Business UBL (Universal Business Language) Language)

Vocabularies definitions for:Vocabularies definitions for: ApplicationResponse•AttachedDocument•BillOfLadingApplicationResponse•AttachedDocument•BillOfLading

•Catalogue•CatalogueDeletion•CatalogueItemSpecific•Catalogue•CatalogueDeletion•CatalogueItemSpecificationUpdate•CataloguePricingUpdate•CatalogueRequationUpdate•CataloguePricingUpdate•CatalogueRequest•CertificateOfOrigin•CreditNote•DebitNote•Despatcest•CertificateOfOrigin•CreditNote•DebitNote•DespatchAdvice•ForwardingInstructions•FreightInvoice•InvoichAdvice•ForwardingInstructions•FreightInvoice•Invoice•Order•OrderCancellation•OrderChange•OrderRespe•Order•OrderCancellation•OrderChange•OrderResponse•OrderResponseSimple•PackingList•Quotation•Ronse•OrderResponseSimple•PackingList•Quotation•ReceiptAdvice•Reminder•RemittanceAdvice•RequestFeceiptAdvice•Reminder•RemittanceAdvice•RequestForQuotation•SelfBilledCreditNote•SelfBilledInvoice•StorQuotation•SelfBilledCreditNote•SelfBilledInvoice•Statement•TransportationStatus•Waybillatement•TransportationStatus•Waybill

Page 27: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

27

HealthCareLevel 7HealthCareLevel 7 Medical information that is being exchanged Medical information that is being exchanged

between hospitals, patients, doctors, between hospitals, patients, doctors, pharmacies and insurance companiespharmacies and insurance companies

http://en.wikipedia.org/wiki/HL7http://en.wikipedia.org/wiki/HL7

Page 28: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

28

XBRL (Financial information)XBRL (Financial information) Goal: facilitate the exchange of business Goal: facilitate the exchange of business

and financial performance information and financial performance information between companies, governments, between companies, governments, insurance companies, banks, etc.insurance companies, banks, etc.

Mandate by law in many countriesMandate by law in many countries http://en.wikipedia.org/wiki/XBRLhttp://en.wikipedia.org/wiki/XBRL

Page 29: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

29

Extensible Metadata Platform Extensible Metadata Platform (XMP)(XMP)

Used in Used in PDFPDF, , photographyphotography and and photo editingphoto editing applications.applications.

Particular Particular schemasschemas for basic properties useful for for basic properties useful for recording the history of a resource as it passes recording the history of a resource as it passes through multiple processing steps, from being through multiple processing steps, from being photographed, photographed, scannedscanned, or authored as text, , or authored as text, through photo editing steps (such as through photo editing steps (such as croppingcropping or or color adjustment), to assembly into a final image.color adjustment), to assembly into a final image.

XMP allows each software program or device along XMP allows each software program or device along the way to add its own information to a digital the way to add its own information to a digital resource, which can then be retained in the final resource, which can then be retained in the final digital file.digital file.

http://en.wikipedia.org/wiki/Extensible_Metadathttp://en.wikipedia.org/wiki/Extensible_Metadata_Platforma_Platform

Page 30: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

30

Microsoft Office in XMLMicrosoft Office in XML Office 2003 was able to import/export all Office 2003 was able to import/export all

documents into XMLdocuments into XML Office 2007 models the documents NATIVELY Office 2007 models the documents NATIVELY

in XMLin XML Examples of vocabularies and schemas:Examples of vocabularies and schemas:

WordprocessingML (the XML file format for WordprocessingML (the XML file format for Word 2003), SpreadsheetML (Excel 2003), Word 2003), SpreadsheetML (Excel 2003), FormTemplate XML schemas (InfoPath 2003) FormTemplate XML schemas (InfoPath 2003) and DataDiagramingML (Visio 2003)and DataDiagramingML (Visio 2003)

Page 31: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

31

Forms on the Web in XMLForms on the Web in XML XML Forms (Xforms)XML Forms (Xforms)

http://www.w3.org/TR/xforms/http://www.w3.org/TR/xforms/ <xforms:model> <xforms:instance> <xforms:model> <xforms:instance> <ecommerce xmlns=""> <method/> <ecommerce xmlns=""> <method/> <number/> <expiry/> <number/> <expiry/> </ecommerce> </xforms:instance> </ecommerce> </xforms:instance> <xforms:submission <xforms:submission action="http://example.com/submit" action="http://example.com/submit" method="post" id="submit" method="post" id="submit" </xforms:model></xforms:model>

Page 32: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

32

Programs and queries in XMLPrograms and queries in XML XQuery, the XML query language, has an XML XQuery, the XML query language, has an XML

representationrepresentation Programs and queries are also DATAPrograms and queries are also DATA Blurring the distinction between data, metadata, codeBlurring the distinction between data, metadata, code <xqx:functionName>distinct</xqx:functionName> <xqx:functionName>distinct</xqx:functionName>

<xqx:parameters> <xqx:expr <xqx:parameters> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:functionCallExpr"> xsi:type="xqx:functionCallExpr"> <xqx:functionName>document</xqx:functionName> <xqx:functionName>document</xqx:functionName> <xqx:parameters> <xqx:expr <xqx:parameters> <xqx:expr xsi:type="xqx:stringConstantExpr"> xsi:type="xqx:stringConstantExpr"> <xqx:value>http://www.bn.com</xqx:value> <xqx:value>http://www.bn.com</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr> </xqx:parameters> </xqx:expr> </xqx:expr> <xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:elementTest> <xqx:elementTest> <xqx:nodeName> <xqx:nodeName> <xqx:QName>author</xqx:QName> <xqx:QName>author</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:stepExpr> </xqx:expr></xqx:expr>

Page 33: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

33

SOAP and Web ServicesSOAP and Web Services Web Services is the favorite way of exchanging Web Services is the favorite way of exchanging

information between applicationsinformation between applications XML exchange over HTTP, with a specific protocol XML exchange over HTTP, with a specific protocol

(SOAP)(SOAP) <?xml version='1.0' ?><env:Envelope <?xml version='1.0' ?><env:Envelope

xmlns:env="http://www.w3.org/2003/05/soap-envelope"> xmlns:env="http://www.w3.org/2003/05/soap-envelope"> <env:Header> <m:reservation <env:Header> <m:reservation xmlns:m="http://travelcompany.example.org/reservation" xmlns:m="http://travelcompany.example.org/reservation" env:role="http://www.w3.org/2003/05/soap- env:role="http://www.w3.org/2003/05/soap-envelope/role/next" env:mustUnderstand="true"> envelope/role/next" env:mustUnderstand="true"> <m:reference>uuid:093a2da1-q345-739r-ba5d-<m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference> <m:dateAndTime>2001-11-pqff98fe8j7d</m:reference> <m:dateAndTime>2001-11-29T13:20:00.000-05:00</m:dateAndTime> </m:reservation> 29T13:20:00.000-05:00</m:dateAndTime> </m:reservation> <n:passenger <n:passenger xmlns:n="http://mycompany.example.com/employees" xmlns:n="http://mycompany.example.com/employees" env:role="http://www.w3.org/2003/05/soap-env:role="http://www.w3.org/2003/05/soap-envelope/role/next" env:mustUnderstand="true"> envelope/role/next" env:mustUnderstand="true"> <n:name>Åke Jógvan Øyvind</n:name> </n:passenger> <n:name>Åke Jógvan Øyvind</n:name> </n:passenger> </env:Header> <env:Body/> </env:Envelope></env:Header> <env:Body/> </env:Envelope>

Page 34: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

34

The need for XML “schemas”The need for XML “schemas” Unlike any other data format, XML is totally flexible, Unlike any other data format, XML is totally flexible,

elements can be nested in arbitrary wayselements can be nested in arbitrary ways We can start by writing the XML data -- no need for We can start by writing the XML data -- no need for

a priori design of a schemaa priori design of a schema Think relational databases, or Java classesThink relational databases, or Java classes

However, schemas are necessary:However, schemas are necessary: Facilitate the writing of applications that process dataFacilitate the writing of applications that process data Constraint the data that is correct for a certain applicationConstraint the data that is correct for a certain application Have a priori agreements between parties with respect to Have a priori agreements between parties with respect to

the data being exchangedthe data being exchanged Schema: a model of the dataSchema: a model of the data

Structural definitionsStructural definitions Type definitionsType definitions DefaultsDefaults

Page 35: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

35

History and role of XML Schema History and role of XML Schema LanguagesLanguages

Several standard Schema LanguagesSeveral standard Schema Languages DTDs, XML Schema, RelaxNGDTDs, XML Schema, RelaxNG

Schema languages have been designed after, and in Schema languages have been designed after, and in an orthogonal fashion, to XML itselfan orthogonal fashion, to XML itself

Schemas and data are completely decoupled in XMLSchemas and data are completely decoupled in XML Data can exist with or without schemasData can exist with or without schemas Or with multiple schemasOr with multiple schemas Schema evolutions rarely impose evolving the dataSchema evolutions rarely impose evolving the data Schemas can be designed before the data, or extracted from Schemas can be designed before the data, or extracted from

the data (DataGuide -- Stanford)the data (DataGuide -- Stanford) Makes XML the right choice for manipulating semi-Makes XML the right choice for manipulating semi-

structured data, or rapidly evolving data, or highly structured data, or rapidly evolving data, or highly customizable datacustomizable data

Page 36: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

36

DTDsDTDs Inherited from SGMLInherited from SGML Part of the original XML 1.0 specificationPart of the original XML 1.0 specification Describe the “grammar” of the XML fileDescribe the “grammar” of the XML file

Element declarations:Element declarations: how elements are allowed to nest how elements are allowed to nest within each other by rules and constraintswithin each other by rules and constraints

Attributes lists:Attributes lists: describe what attributes are allowed on describe what attributes are allowed on which elementwhich element

Some constraints on the value of elements and Some constraints on the value of elements and attributesattributes

Which is the root element of the XML fileWhich is the root element of the XML file Checking the structural constraints: Checking the structural constraints: DTD DTD

validation validation (valid vs. invalid documents)(valid vs. invalid documents) DTD very useful for a while, not used anymore, DTD very useful for a while, not used anymore,

several major limitationsseveral major limitations

Page 37: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

37

Declaring the structure of Declaring the structure of elementselements

Grammar that describes the structure of the elementGrammar that describes the structure of the element Subelements, identified by Name orSubelements, identified by Name or #PCDATA#PCDATA

Combinators :Combinators : „„+“ for at least 1+“ for at least 1 „„*“ for 0 or more *“ for 0 or more „„?“ for 0 or 1?“ for 0 or 1 „ „ , „ for concatenation, „ for concatenation „ „ | „ for choice | „ for choice

<!ELEMENT a ( (b | c) * , d ? , e ) ><!ELEMENT a ( (b | c) * , d ? , e ) > PCDATA: only textual content allowedPCDATA: only textual content allowed

<!ELEMENT a #PCDATA><!ELEMENT a #PCDATA> EMPTY : the element must be emptyEMPTY : the element must be empty

<!ELEMENT a EMPTY><!ELEMENT a EMPTY> ANY: allows any contentANY: allows any content

<!ELEMENT a ANY ><!ELEMENT a ANY >

Page 38: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

38

Example DTD for recipesExample DTD for recipes<!E L E M E NT collection (description,recipe*)><!E L E M E NT description ANY ><!E L E M E NT recipe

(title,ingredient*,preparation,comment?,nutrition)>

<!E L E M E NT title (#P CD ATA)><!E L E M E NT ingredient

(ingredient*,preparation)?><!E L E M E NT preparation (step*)><!E L E M E NT step (#P CD ATA)><!E L E M E NT comment (#P CD ATA)><!E L E M E NT nutrition E M P TY >

Page 39: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

39

Defining the attribute listsDefining the attribute lists

Structure: Structure: <!ATTLIST <!ATTLIST ElementNameElementName definitiondefinition>>

<!ATTLIST<!ATTLIST ingredient ingredient name CDATA #REQUIRED name CDATA #REQUIRED amount CDATA #IMPLIED amount CDATA #IMPLIED unit CDATA #FIXED „cup“ unit CDATA #FIXED „cup“ >>

CDATA means normal contentCDATA means normal content #REQUIRED, or #IMPLIED refer to the fact #REQUIRED, or #IMPLIED refer to the fact

that the attribute is optional or notthat the attribute is optional or not Default value possibleDefault value possible

Page 40: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

40

Attributes (cont.)Attributes (cont.) #REQUIRED#REQUIRED

Document must specify a value for attributeDocument must specify a value for attribute #IMPLIED#IMPLIED

Attribute is optional, there is no defaultAttribute is optional, there is no default valuevalue

Default value, if no other value specifiedDefault value, if no other value specified #FIXED #FIXED valuevalue

Default value, if no other value specifiedDefault value, if no other value specified If value specified, it must be the fixed valueIf value specified, it must be the fixed value

Page 41: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

41

Major attribute typesMajor attribute types

PCDATA: normal Text contentPCDATA: normal Text content IDID

Value is unique within documentValue is unique within document Element has at most one attribute of this typeElement has at most one attribute of this type No default values allowedNo default values allowed

IDREF, IDREFSIDREF, IDREFS References to other elements within the References to other elements within the

documentdocument IDREFS: Enumeration, „ “ as separatorIDREFS: Enumeration, „ “ as separator

Page 42: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

42

ID and IDREF attributesID and IDREF attributes <!ATTLIST<!ATTLIST book book

isbn ID #REQUIRED isbn ID #REQUIRED price CDATA #IMPLIED price CDATA #IMPLIED index IDREFS „“ index IDREFS „“ >>

<book id=„1“ index=„2 3 “ ><book id=„1“ index=„2 3 “ > <book id=„2“ index=„3“/><book id=„2“ index=„3“/> <book id =„3“/><book id =„3“/>

Page 43: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

43

Attributes list exampleAttributes list example<!E L E M E NT ingredient (ingredient*,preparation)?><!ATTL I ST ingredient name CD ATA #R E QUI R E D

amount CD ATA #I M P L I E D

unit CD ATA #I M P L I E D ><!E L E M E NT nutrition E M P TY ><!ATTL I ST nutrition protein CD ATA #R E QUI R E D

carbohydrates CD ATA

#R E QUI R E D fat CD ATA #R E QUI R E D

calories CD ATA #R E QUI R E D

alcohol CD ATA #I M P L I E D >

Page 44: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

44

Mixed content in DTDsMixed content in DTDs Mixing PCDATA declarations with other Mixing PCDATA declarations with other

subelements means that the content can be subelements means that the content can be “mixed”“mixed”

<!ELEMENT p(#PCDATA|a|ul|b|i|em)*><!ELEMENT p(#PCDATA|a|ul|b|i|em)*>

<p>some text <em>some emphasized <p>some text <em>some emphasized text</em> blah <b>some bold text</em> blah <b>some bold text</b> </p>text</b> </p>

Page 45: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

45

Declarations of DTDsDeclarations of DTDs No DTD (well-formed Documents)No DTD (well-formed Documents) DTD inside the Document: DTD inside the Document:

<!DOCTYPE name <!DOCTYPE name [definition][definition] >> DTD external, specified by URI:DTD external, specified by URI:

<!DOCTYPE name <!DOCTYPE name SYSTEM „demo.dtd“>SYSTEM „demo.dtd“> DTD external, Name and optional URI:DTD external, Name and optional URI:

<!DOCTYPE name <!DOCTYPE name PUBLIC „Demo“>PUBLIC „Demo“><!DOCTYPE name <!DOCTYPE name PUBLIC „Demo“ „demo.dtd“>PUBLIC „Demo“ „demo.dtd“>

DTD inside the document + external:DTD inside the document + external:<!DOCTYPE name1 <!DOCTYPE name1 SYSTEM „demo.dtdSYSTEM „demo.dtd >>

Page 46: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

46

Correctness of XML documentsCorrectness of XML documents Well formedWell formed documents documents

Verify the basic XML constraints, e.g. <a></b>Verify the basic XML constraints, e.g. <a></b> Valid documentsValid documents

Verify the additional DTD structural constraintsVerify the additional DTD structural constraints Non well formed XML documents cannot be processedNon well formed XML documents cannot be processed Non-valid documents can still be processed (queried, Non-valid documents can still be processed (queried,

transformed, etc)transformed, etc)

Page 47: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

47

Limitations of DTDsLimitations of DTDs DTDs describe only the “grammar” of the XML DTDs describe only the “grammar” of the XML

file, not the detailed structure and/or typesfile, not the detailed structure and/or types This grammatical description has some obvious This grammatical description has some obvious

shortcomings:shortcomings: we cannot express that a “length” element must we cannot express that a “length” element must

contain a non-negative number contain a non-negative number (constraints on the (constraints on the type of the value of an element or attribute)type of the value of an element or attribute)

The “unit”The “unit” element should only be allowed when element should only be allowed when ““amount”amount” is present is present (co-occurrence constraints)(co-occurrence constraints)

the “the “comment”comment” element should be allowed to appear element should be allowed to appear anywhere anywhere (schema flexibility)(schema flexibility)

Page 48: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

48

Good Schema design principles

The XML schema language shall be1. more expressive than XML DTDs2. expressed in XML3. self-describing4. usable by a wide variety of applications that

employ XML5. straightforwardly usable on the Internet6. optimized for interoperability7. simple enough to be implemented with

modest design and runtime resources8. coordinated with relevant W3C specs

Page 49: Module 2 XML Basics - Stanford Universityweb.stanford.edu/class/cs345b/slides/XMLDB-M2-Stanford.pdfUBL (Universal Business Language) HealthCare Level 7 (medical data) XBRL (financial

49

RecapitulationRecapitulation XML as inheriting from the Web historyXML as inheriting from the Web history

SGML, HTML, XHTML, XMLSGML, HTML, XHTML, XML XML key conceptsXML key concepts

Documents, elements, attributes, textDocuments, elements, attributes, text Order, nested structure, textual informationOrder, nested structure, textual information

NamespacesNamespaces XML usage scenariosXML usage scenarios

Financial, medical, metadata, blogs, etcFinancial, medical, metadata, blogs, etc DTDs and the need for describing the DTDs and the need for describing the

“structure” of an XML file“structure” of an XML file Next: XML SchemasNext: XML Schemas