Chapter 15: XML TP2543 Web Programming Mohammad Faidzul Nasrudin

Upload: barnaby-tyler

Post on 21-Jan-2016

Chapter 15: XML

TP2543 Web Programming
Mohammad Faidzul Nasrudin

15.1 Introduction

• Development of the Extensible Markup Language (XML) began in 1996 under the World Wide Web Consortium’s (W3C’s) XML Working Group; XML 1.0 became a W3C Recommendation in 1998

• XML is a portable, widely supported, open (i.e., nonproprietary) technology for data storage and exchange

What is the difference between XML and HTML?

An HTML Example

<h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
<i>by <b>V. Marek</b> and <b>M. Truszczynski</b></i><br>
Springer 1993<br>
ISBN 0387976892

The Same Example in XML

<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>

HTML versus XML: Similarities

• Both use tags (e.g. <h2> and <year>)
• Tags may be nested (tags within tags)
• Human users can read and interpret both HTML and XML representations quite easily
• But how about machines?

Problems with Automated Interpretation of HTML Documents

• An intelligent agent trying to retrieve the names of the authors of the book

• Authors’ names could appear immediately after the title or immediately after the word “by”

• Are there two authors?
• Or just one, called “V. Marek and M. Truszczynski”?

HTML vs XML: Structural Information

• HTML documents do not contain structural information: pieces of the document and their relationships.

• XML is more easily accessible to machines because
– Every piece of information is described.
– Relations are also defined through the nesting structure.
– E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.
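As a rough illustration of why the nesting helps machines, the following sketch uses Python's standard xml.etree.ElementTree module to pull the author names out of the book example. Python is not part of the slides, and the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# The book example from the earlier slide, stored as a string.
BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Each <author> child of <book> is one author, so there is no guessing
# about where a name starts or ends.
authors = [author.text for author in book.findall("author")]
year = int(book.findtext("year"))

print(authors)
print(year)
```

With the HTML version, the same program would have to guess from formatting tags such as <i> and <b> which text is an author name.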

HTML vs XML: Formatting

• The HTML representation provides more than the XML representation:
– The formatting of the document is also described
• The main use of an HTML document is to display information: it must define formatting
• XML: separation of content from display
– The same information can be displayed in different ways

15.2 XML Basics

• XML permits document authors to create markup for virtually any type of information
– Can create entirely new markup languages that describe specific types of data, including mathematical formulas, chemical molecular structures, music and recipes

• XML describes data in a way that human beings can understand and computers can process

15.2 XML Basics (2)

• An XML parser is responsible for identifying components of XML documents (typically files with the .xml extension) and then storing those components in a data structure for manipulation

• An XML document can reference a Document Type Definition (DTD) or schema that defines the document’s proper structure

15.2 XML Basics (3)

• An XML document that conforms to a DTD/schema (i.e., has the appropriate structure) is valid

• If an XML parser (validating or non-validating) can process an XML document successfully, that XML document is well-formed

player.xml

XML that describes a baseball player’s information

15.2 XML Basics (4)

• DTDs and schemas are essential for business-to-business (B2B) transactions and mission-critical systems

• Validating XML documents ensures that disparate systems can manipulate data structured in standardized ways and prevents errors caused by missing or malformed data.

15.3 Structuring Data

XML Prolog

15.3 Structuring Data (2)

• XML element names can be of any length and can contain letters, digits, underscores, hyphens and periods
– They must begin with either a letter or an underscore, and they should not begin with “xml” in any combination of uppercase and lowercase letters, as this is reserved for use in the XML standards
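The naming rule above can be sketched as a small Python check. This is an ASCII-only approximation written for illustration: the XML specification also permits many non-ASCII letters and the colon, and the function name is hypothetical.

```python
import re

# ASCII-only approximation of the slide's rule (an assumption: real XML
# names may also contain non-ASCII letters and the colon).
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_element_name(name: str) -> bool:
    """Return True if name follows the slide's rule for element names."""
    # First character: letter or underscore; the rest: letters, digits,
    # underscores, hyphens and periods.
    if _NAME_RE.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["first-name", "_id", "2nd", "XmlData", "first name"]:
    print(candidate, is_acceptable_element_name(candidate))
```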

15.3 Structuring Data (3)

• When a user loads an XML document in a browser, the browser uses a style sheet to format the data for display

• Google Chrome places a down arrow and right arrow next to every container element; they’re not part of the XML document.
– A down arrow indicates that the browser is displaying the container element’s child elements
– Clicking the right arrow next to an element expands that element

article.xml in web browser

XML used to mark up an article

15.3 Structuring Data (4)

• An error will happen if:
– the XML declaration is missing (strictly, XML 1.0 only recommends a declaration, so many parsers accept a document without one)
– any characters, including white space, are placed before the XML declaration
– a start tag is not matched with an end tag, or either tag is omitted
– different cases are used for the start-tag and end-tag names of the same element

15.3 Structuring Data (5)

– a white-space character is used in an XML element name
– XML tags are nested improperly. For example, <x><y>hello</x></y> is an error, because the </y> tag must precede the </x> tag
– attribute values are not enclosed in double ("") or single ('') quotes
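The errors listed above can be reproduced with a non-validating parser. The sketch below uses Python's xml.etree.ElementTree, which is not part of the slides; the sample documents and the helper name is_well_formed are made up for illustration. Note that this parser accepts a document with no XML declaration at all, because XML 1.0 recommends a declaration but does not require one.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule from the slides.
SAMPLES = {
    "ok": '<?xml version="1.0"?><x><y>hello</y></x>',
    "no declaration": "<x><y>hello</y></x>",
    "space before declaration": ' <?xml version="1.0"?><x/>',
    "missing end tag": "<x><y>hello</y>",
    "case mismatch": "<x>hello</X>",
    "space in name": "<my element>hello</my element>",
    "bad nesting": "<x><y>hello</x></y>",
    "unquoted attribute": "<x id=1>hello</x>",
}

results = {label: is_well_formed(doc) for label, doc in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'error'}")
```

Only the first two samples are accepted; every other sample raises a parse error.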

letter.xml in web browser

Business letter marked up with XML

15.3 Structuring Data (6)

• An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure

• Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD

15.4 Namespaces

• XML namespaces provide a means to prevent naming collisions

• Each namespace prefix is bound to a uniform resource identifier (URI) that uniquely identifies the namespace
– The URI can be a URN, a URL or even a random string
– The parser does not visit these URLs, nor do these URLs need to refer to actual web pages

15.4 Namespaces (2)

• To eliminate the need to place a namespace prefix in each element, authors can specify a default namespace for an element and its children
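To make prefixes and default namespaces concrete, here is a minimal Python sketch using xml.etree.ElementTree. The documents, element names and urn:example URIs are invented for this example; they are not the contents of the slides' namespace.xml files. ElementTree reports every element as {namespace-URI}local-name, which shows how two elements that are both called file stay distinct.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two prefixes keep the two "file" elements apart.
# The URIs are plain identifiers and are never fetched by the parser.
PREFIXED = """
<text:directory xmlns:text="urn:example:textInfo"
                xmlns:image="urn:example:imageInfo">
  <text:file filename="book.xml"/>
  <image:file filename="funny.jpg"/>
</text:directory>
"""

# Hypothetical document: a default namespace applies to the element that
# declares it and to its unprefixed children.
DEFAULTED = """
<directory xmlns="urn:example:textInfo">
  <file filename="book.xml"/>
</directory>
"""

prefixed_root = ET.fromstring(PREFIXED)
prefixed_tags = [child.tag for child in prefixed_root]

default_root = ET.fromstring(DEFAULTED)
default_tags = [default_root.tag] + [child.tag for child in default_root]

print(prefixed_tags)
print(default_tags)
```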

namespace.xml and defaultnamespace.xml

XML namespaces demonstration and default namespace demonstration

15.5 Document Type Definitions (DTDs)

• To verify whether an XML document is valid (i.e., its elements contain the proper attributes and appear in the proper sequence), an XML parser needs:
– a Document Type Definition (DTD) or
– a Schema (not covered in this course)

• DTDs and schemas specify documents’ element types and attributes, and their relationships to one another

15.5 Document Type Definitions (DTDs) (2)

• A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar

• In a DTD:
– an ELEMENT element type declaration defines the rules for an element
– an ATTLIST attribute-list declaration defines attributes for a particular element

15.5 Document Type Definitions (DTDs) (3)

• Internal DTD

<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>

15.5 Document Type Definitions (DTDs) (4)

• External DTD

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

15.5 Document Type Definitions (DTDs) (5)

• In ELEMENT, when children are declared in a sequence separated by commas, the children must appear in the same sequence in the document

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

• PCDATA specifies that an element (e.g., name) may contain parsed character data. Elements with parsed character data cannot contain markup characters, such as less than (<), greater than (>) or ampersand (&). Replace them with the entity references &lt;, &gt; and &amp;
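As a small illustration of the replacement rule, the sketch below escapes markup characters with Python's xml.sax.saxutils.escape and then parses the result. The element name condition and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can be placed safely inside an element; the
# parser turns the entity references back into the original characters.
element = ET.fromstring(f"<condition>{escaped}</condition>")
print(element.text)
```

Inserting the raw string directly would fail, because the parser would treat < and & as the start of markup.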

15.5 Document Type Definitions (DTDs) (6)

• In ELEMENT:
• Declaring Only One Occurrence
<!ELEMENT note (message)>
• Minimum One Occurrence
<!ELEMENT note (message+)>
• Zero or More Occurrences
<!ELEMENT note (message*)>
• Declaring Zero or One Occurrences
<!ELEMENT note (message?)>
• Declaring either/or Content
<!ELEMENT note (to,from,header,(message|body))>
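The occurrence indicators behave much like regular-expression operators. The sketch below is not a DTD validator and is not how a real parser works internally; it is a hypothetical Python helper that translates an element-only content model into a regular expression and checks a list of child names against it. It does not handle #PCDATA, EMPTY or ANY.

```python
import re
from typing import Sequence


def model_to_regex(model: str) -> "re.Pattern":
    """Translate an element-only DTD content model into a regex."""
    compact = re.sub(r"\s+", "", model)
    # Every child name becomes a token that ends with a single space, so
    # "to" can never match the beginning of a longer name.
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a
    # regex; the operators ? * + | and the parentheses carry over as-is.
    return re.compile(pattern.replace(",", ""))


def children_match(model: str, children: Sequence[str]) -> bool:
    """Return True if the child names satisfy the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(message+)", ["message", "message"]))
print(children_match("(message?)", ["message", "message"]))
print(children_match("(to,from,header,(message|body))",
                     ["to", "from", "header", "body"]))
```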

15.5 Document Type Definitions (DTDs) (7)

• Attributes are declared with an ATTLIST declaration
• CDATA specifies that the attribute type contains character data. A parser will pass such data to an application without modification
• #REQUIRED, #IMPLIED, #FIXED value
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
<!ATTLIST contact fax CDATA #IMPLIED>
<!ATTLIST person number CDATA #REQUIRED>
<!ATTLIST sender company CDATA #FIXED "Microsoft">
• Enumerated Attribute Values
<!ATTLIST payment type (check|cash) "cash">

15.5 Document Type Definitions (DTDs) (8)

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

• attributes cannot contain multiple values (child elements can)
• attributes are not easily expandable (for future changes)
• attributes cannot describe structures (child elements can)
• attributes are more difficult to manipulate by program code
• attribute values are not easy to test against a DTD

15.5 Document Type Definitions (DTDs) (9)

<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<note>
  <date>12/11/2002</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>2002</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

15.5 Document Type Definitions (DTDs) (10)

• ENTITY: to define shortcuts to special characters
• Internal declaration
DTD:
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
XML:
<author>&writer;&copyright;</author>
• External declaration
DTD:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
XML:
<author>&writer;&copyright;</author>
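The internal declaration can be tried out with Python's xml.etree.ElementTree, whose underlying parser expands entities declared in the internal DTD subset. This is only a sketch: the parser is non-validating, so the missing ELEMENT declaration for author is not reported, and external entities such as the SYSTEM examples are not fetched here.

```python
import xml.etree.ElementTree as ET

# Internal entity declarations from the slide, placed in an internal DTD.
DOC = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright;</author>
"""

author = ET.fromstring(DOC)

# The parser replaces each entity reference with its declared text.
print(author.text)
```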

letter2.xml and letter.dtd

Document Type Definition (DTD) for a business letter

15.7 XML Vocabularies

• XML allows authors to create their own tags to describe data precisely
– People and organizations in various fields of study have created many different kinds of XML for structuring data
– Some of these markup languages are:
• MathML (Mathematical Markup Language): describes mathematical expressions for display
• Scalable Vector Graphics (SVG)

15.7 XML Vocabularies (2)

• Wireless Markup Language (WML)
• Extensible Business Reporting Language (XBRL)
• Extensible User Interface Language (XUL)
• Product Data Markup Language (PDML)
• W3C XML Schema
• Extensible Stylesheet Language (XSL)

Mathml2.mml

file:///H:/TP2543/textbookcode/ch15/Fig15_15/mathml2.mml

Firefox

15.8 Extensible Stylesheet Language and XSL Transformations

• Convert XML into any text-based document
• XSL documents have the extension .xsl
• XSL is a group of three technologies:
– XSL-FO (XSL Formatting Objects): specifying formatting
– XPath (XML Path Language): locating structures and data (such as specific elements and attributes)
– XSLT (XSL Transformations): transforming the structure of the XML document data to another structure

15.8 Extensible Stylesheet Language and XSL Transformations (2)

• For example, XSLT allows you to convert a simple XML document to an HTML5 document that presents the XML document’s data (or a subset of the data) formatted for display in a web browser

• Transforming an XML document using XSLT involves two tree structures
– the source tree (i.e., the XML document to transform)
– the result tree (i.e., the XML document to create)
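The two-tree idea can be sketched without an XSLT processor. Python's standard library does not include one, so the code below is a hand-written transformation, not XSLT: it reads a source tree and builds a separate result tree, which is the same shape of work an XSLT style sheet describes declaratively. The sports data and the function name to_html_table are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document (not the slides' sports.xml).
SOURCE = """
<sports>
  <game id="783"><name>Cricket</name></game>
  <game id="239"><name>Baseball</name></game>
</sports>
"""


def to_html_table(source_root):
    """Build a result tree (an HTML table) from the source tree."""
    table = ET.Element("table")
    for game in source_root.findall("game"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = game.get("id")
        ET.SubElement(row, "td").text = game.findtext("name")
    return table


source_tree = ET.fromstring(SOURCE)      # the source tree
result_tree = to_html_table(source_tree)  # the result tree
html = ET.tostring(result_tree, encoding="unicode")
print(html)
```

The source tree is left unchanged; all output goes into the newly created result tree.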

sports.xml, sports.xsl, style.css

http://test.deitel.com/iw3htp5/ch15/Fig15_18-19/sports.xml

sorting.xml, sorting.xsl, style.css

15.8 Extensible Stylesheet Language and XSL Transformations (3)

• XPath character / (a forward slash)
– Selects the document root
– In XPath, a leading forward slash specifies that we are using absolute addressing
– An XPath expression with no beginning forward slash uses relative addressing
• XSL @ symbol
– Retrieves an attribute’s value
• XSL name()
– Retrieves the current node’s element name
• XSL text()
– Retrieves the text between an element’s start and end tags
• XPath expression //*
– Selects all the nodes in an XML document

• Fig. 15.22 for XSL style-sheet elements
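Some of these selections can be tried in Python, with the caveat that xml.etree.ElementTree implements only a small subset of XPath and uses relative paths from an element. The sketch below shows rough equivalents, not the XSL constructs themselves; the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """
<sports>
  <game id="783"><name>Cricket</name></game>
  <game id="239"><name>Baseball</name></game>
</sports>
"""
root = ET.fromstring(DOC)

# Rough counterpart of //*: every element below the root, in document order.
all_tags = [node.tag for node in root.findall(".//*")]

# Counterpart of @id: read an attribute's value.
ids = [game.get("id") for game in root.findall("game")]

# Counterpart of name(): the element's name is its tag.
root_name = root.tag

# Counterpart of text(): the text between the start and end tags.
names = [name.text for name in root.findall("game/name")]

# A relative path with an attribute test, as supported by ElementTree.
baseball = root.find("game[@id='239']")

print(all_tags, ids, root_name, names, baseball.findtext("name"))
```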

The End

Thank You