XML, DOM and the Web

Madalina Croitoru, IUT Montpellier


What is XML?

• Extensible Markup Language

• Markup Language (like HTML)

• Differences from HTML:

– HTML: designed to display data

– XML: designed to transport and carry data

– HTML: tags are predefined

– XML: you define your own tags

XML “does nothing”

• XML was created to structure, store and transport information:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

XML and the Web

• Early Web: URI + HTTP + HTML
– URIs identify resources
– HTTP retrieves resources
– HTML is the resource format

• Web today: many different technologies:
– URI + HTTP + HTML + PHP for basic Web publishing
– CSS & JavaScript for advanced publishing

• JavaScript & XML (AJAX)
– Scripts dynamically loading data from a server
– Machine-to-machine interaction: the server and the script

From Humans to Machines

• The Web was designed for humans:
– HTML is a language for describing page layouts and links
– Machines were only used for implementing it

• Search engines were the first machine users on the Web:
– They made the Web's success possible
– They demonstrated how hard it is to understand HTML pages

HTML is for HUMANS

• HTML is:

– GOOD for rendering Web pages

– BAD for understanding Web pages

• Web growth in the late 90’s was enormous:

– Everybody was putting information online that was inaccessible to machines

Machine-Friendly Web

• Information should be published in a machine-understandable way:
– Machines need other structures to process Web content

• 1996: W3C Working Group SGML on the Web
– SGML is a very complex and expensive technology
– HTML is just one document type defined with SGML
– How can SGML be made easily and widely usable?

SGML, HTML and XML

• SGML: Standard Generalised Markup Language
– Language for designing document types

• HTML: Hypertext Markup Language
– Implements a simple SGML document type
– Its syntax is SGML but it uses very few SGML features

• XML: Extensible Markup Language
– A language for designing document types
– Greatly simplified version of SGML

XML usage for the Web

• Server side foundation for Web publishing

• Successful:

– Technically sound (simple)

– Human-readable, based on a well-known syntax

– Great for rapid prototyping and experiments

• Ontologies etc.

XML Usage elsewhere

• Messages from sensors
• Genome sequences
• Scalable Vector Graphics
• Etc.

• Information professionals should know and use XML:
– XML and some schema language
– XSLT for processing

XML is a syntax for trees

• Not all data is easily represented by trees

• XML encodes a structure purely on the syntactic level

• XML structures must be accompanied by semantic descriptions

XML encoding

• XML documents can use a wide array of characters

– Defined by Unicode

– Unicode currently defines more than 100000 characters

<?xml version="1.0" encoding="UTF-8"?>

• XML processors must support UTF-8 and UTF-16
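As a small illustration of the encoding declaration, here is a minimal sketch using PHP's DOMDocument. The function name read_latin1_name and the sample document are assumptions made for this example. The document is declared as ISO-8859-1; the parser reads that declaration, and the DOM hands the text back as UTF-8.

```php
<?php
// Hypothetical helper: parses a small ISO-8859-1 document and returns
// the declared encoding together with the text of the root element.
function read_latin1_name(): array
{
    // "\xE9" is the single byte for "é" in ISO-8859-1.
    $xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Zo\xE9</name>";

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    return [
        'declared' => $doc->encoding,                     // encoding named in the XML declaration
        'text'     => $doc->documentElement->textContent, // the DOM returns text as UTF-8
    ];
}

$result = read_latin1_name();
print $result['declared'] . "\n";
```

The same character that occupied one byte in the source comes back as the two-byte UTF-8 sequence, which is why the declaration has to match the bytes actually stored in the file.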

Basic Concepts

• XML documents start with an XML declaration (optional in XML 1.0, but recommended)

• Exactly one document element (the root element)

• Elements are marked up using tags
– Most elements have content, surrounded by start and end tags

• Elements may be nested
– Elements may be repeated

• Elements may have attributes
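The concepts above can be sketched by building a small document with PHP's DOMDocument. This is only an illustration: the helper name build_note and the priority attribute are invented for the example.

```php
<?php
// Hypothetical helper: builds a note document element by element.
function build_note(): DOMDocument
{
    $doc = new DOMDocument('1.0', 'UTF-8');  // version and encoding for the XML declaration

    $note = $doc->createElement('note');     // the single document (root) element
    $note->setAttribute('priority', 'high'); // "priority" is an invented attribute
    $doc->appendChild($note);

    // Nested child elements with text content.
    $note->appendChild($doc->createElement('to', 'Tove'));
    $note->appendChild($doc->createElement('from', 'Jani'));

    return $doc;
}

print build_note()->saveXML();
```

The serialized result contains the declaration, one root element carrying an attribute, and two nested child elements.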

Example 1

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

• root element: <note>

• 4 child elements of root: <to>, <from>, <heading>, and <body>

XML Documents form a Tree Structure

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

Example 2

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Syntax Rules

• All XML elements must have a closing tag

• XML tags are case sensitive

• XML elements must be properly nested

• XML Documents must have a root element

• XML Attribute values must be quoted
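These rules can be checked mechanically: a parser rejects any document that breaks them. The sketch below assumes a helper named is_well_formed (not part of PHP) built on DOMDocument::loadXML, which returns false when parsing fails.

```php
<?php
// Hypothetical helper: returns true when $xml is well-formed.
function is_well_formed(string $xml): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

var_dump(is_well_formed('<note><to>Tove</to></note>')); // closed and properly nested
var_dump(is_well_formed('<note><to>Tove</note></to>')); // improperly nested
var_dump(is_well_formed('<note><To>Tove</to></note>')); // tag names differ in case
var_dump(is_well_formed('<note id=1></note>'));         // unquoted attribute value
var_dump(is_well_formed('<note/><note/>'));             // two root elements
```

Only the first call should report a well-formed document; each of the others violates one of the rules listed above.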

Entity references and comments

• This will generate an XML error:

<message>if salary < 1000 then</message>

• Use instead:

<message>if salary &lt; 1000 then</message>

• <!-- This is an XML comment -->

• Note: in XML, the white space in a document is preserved
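When a document is produced from a program, the escaping does not have to be done by hand. In the sketch below (make_message is a name invented for the example), the text is stored as a DOM text node and the "<" character is turned into an entity reference when the element is serialized.

```php
<?php
// Hypothetical helper: wraps arbitrary text in a <message> element.
function make_message(string $text): string
{
    $doc = new DOMDocument();
    $message = $doc->createElement('message');
    // createTextNode() stores the raw text; special characters are
    // escaped only when the node is serialized.
    $message->appendChild($doc->createTextNode($text));
    $doc->appendChild($message);

    return $doc->saveXML($message); // serialize just this element
}

print make_message('if salary < 1000 then') . "\n";
```

Parsing the result again gives back the original text, with the entity reference resolved to "<".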

XML attributes vs Elements

• <person sex="female">
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
  </person>

• <person>
    <sex>female</sex>
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
  </person>

• As much as possible try to use elements rather than attributes

Why avoid XML attributes?

• Problems with using attributes:

– attributes cannot contain multiple values (elements can)

– attributes cannot contain tree structures (elements can)

– attributes are not easily expandable (for future changes)

Valid XML Documents

• A valid XML document is well-formed and also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

XML DTD

• In the previous example, DOCTYPE is a reference to an external DTD file

• The DTD defines the structure of an XML document. Declared inside the document, it looks like this:

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

XML Schema

• The W3C supports an XML-based alternative to DTD called XML Schema:

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

XML DTD

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

• !DOCTYPE note defines that the root element of this document is note

• !ELEMENT note defines that the note element contains four elements: "to,from,heading,body"

• !ELEMENT to defines the to element to be of type "#PCDATA"

• !ELEMENT from defines the from element to be of type "#PCDATA"

• !ELEMENT heading defines the heading element to be of type "#PCDATA"

• !ELEMENT body defines the body element to be of type "#PCDATA"
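To see these declarations enforced, a document can be validated against its DTD from PHP. The following is a minimal sketch, assuming a helper called is_valid_note (an invented name) that prepends the DTD above as an internal subset and calls DOMDocument::validate().

```php
<?php
// Hypothetical helper: wraps a note element in the DTD from the slide
// and reports whether the document is valid against it.
function is_valid_note(string $noteXml): bool
{
    $dtd = '<!DOCTYPE note [' .
        '<!ELEMENT note (to,from,heading,body)>' .
        '<!ELEMENT to (#PCDATA)>' .
        '<!ELEMENT from (#PCDATA)>' .
        '<!ELEMENT heading (#PCDATA)>' .
        '<!ELEMENT body (#PCDATA)>' .
        ']>';

    $previous = libxml_use_internal_errors(true); // keep validation errors quiet
    $doc = new DOMDocument();
    $valid = $doc->loadXML($dtd . $noteXml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $valid;
}

$complete = '<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>'
          . '<body>Don\'t forget me this weekend!</body></note>';
$noBody   = '<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>';

var_dump(is_valid_note($complete)); // all four children, in the declared order
var_dump(is_valid_note($noBody));   // <body> is missing
```

Both documents are well-formed; only the first one is valid, because the content model of note requires the four children in that order.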

XML DTD

• External DTD declaration:
<!DOCTYPE root-element SYSTEM "filename">

• XML file:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
…

• DTD file:
<!ELEMENT … >
…

DTD View: XML building blocks

• Elements

• Attributes: extra information about elements. They come in name/value pairs

• Entities: &lt;, &gt;, &amp;, &quot;, &apos;

• PCDATA: parsed character data. Text that will be parsed by a parser for entities and markup

• CDATA: character data. Text that will not be parsed by a parser (tags inside the text will not be treated as markup)

Declaring elements

• <!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>

– <!ELEMENT element-name EMPTY>

– <!ELEMENT element-name (#PCDATA)>

– <!ELEMENT element-name (child1,child2,...)>

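– <!ELEMENT note (message+)> (one or more message elements)

– <!ELEMENT note (message*)> (zero or more message elements)

– <!ELEMENT note (to,from,header,(message|body))> (the last child is either a message or a body)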

Declaring attributes

• <!ATTLIST element-name attribute-name attribute-type default-value>

– <!ATTLIST payment type CDATA "check">

• Attribute type:
– CDATA: the value is character data
– (en1|en2|..): the value must be one from an enumerated list
– ID: the value is a unique id

• Default-value:
– #REQUIRED: the attribute is required
– #IMPLIED: the attribute is not required
– #FIXED value: the attribute value is fixed
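The effect of a default value can be observed from PHP. This is a sketch under assumptions: payment_type is an invented helper, the payment element is declared EMPTY for simplicity, and the LIBXML_DTDATTR option is passed to ask the parser to apply default attributes from the DTD.

```php
<?php
// Hypothetical helper: returns the "type" attribute of a <payment> element,
// letting the DTD supply the default when the attribute is absent.
function payment_type(string $paymentXml): string
{
    $dtd = '<!DOCTYPE payment [' .
        '<!ELEMENT payment EMPTY>' .
        '<!ATTLIST payment type CDATA "check">' .
        ']>';

    $doc = new DOMDocument();
    // LIBXML_DTDATTR asks the parser to add default attributes from the DTD.
    $doc->loadXML($dtd . $paymentXml, LIBXML_DTDATTR);

    return $doc->documentElement->getAttribute('type');
}

print payment_type('<payment/>') . "\n";             // attribute omitted in the document
print payment_type('<payment type="cash"/>') . "\n"; // attribute given explicitly
```

An explicit value in the document takes precedence; the default only fills in when the attribute is left out.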

XML Schema defines

• elements that can appear in a document
• attributes that can appear in a document
• which elements are child elements
• the order of child elements
• the number of child elements
• whether an element is empty or can include text
• data types for elements and attributes
• default and fixed values for elements and attributes

XML, DTD, XML Schema

XML document:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

DTD:

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML Schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">

  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

Referencing an external XML Schema

<?xml version="1.0"?>

<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  …
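Schema validation can also be triggered from PHP. The sketch below is an illustration under assumptions: matches_note_schema is an invented helper, and the schema is passed as a string to DOMDocument::schemaValidateSource() rather than being located through xsi:schemaLocation.

```php
<?php
// Hypothetical helper: validates a note document against an XML Schema
// equivalent to the note schema shown in this lecture.
function matches_note_schema(string $noteXml): bool
{
    $xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

    $previous = libxml_use_internal_errors(true); // keep validation errors quiet
    $doc = new DOMDocument();
    $ok = $doc->loadXML($noteXml) && $doc->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$inNamespace = '<note xmlns="http://www.w3schools.com"><to>Tove</to><from>Jani</from>'
             . '<heading>Reminder</heading><body>Hello</body></note>';
$noNamespace = '<note><to>Tove</to><from>Jani</from>'
             . '<heading>Reminder</heading><body>Hello</body></note>';

var_dump(matches_note_schema($inNamespace)); // elements are in the target namespace
var_dump(matches_note_schema($noNamespace)); // same structure, but no namespace
```

Because the schema declares a targetNamespace with elementFormDefault="qualified", the instance document has to place its elements in that namespace, which is what the xmlns attribute on note does.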

The <schema> Element

• The root of every XML Schema

<xs:schema>
...
</xs:schema>

Simple Elements and Attributes

• <xs:element name="xxx" type="yyy"/>

• <xs:attribute name="xxx" type="yyy"/>

– xs:string

– xs:decimal

– xs:integer

– xs:boolean

– xs:date

– xs:time

Complex Elements

• <employee>
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </employee>

• <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

DOM

• W3C standard

• Defines a standard for accessing documents like XML and HTML:
– Objects and properties of all document elements
– Methods to access them

• Three parts:
– Core DOM: standard model for any structured document
– XML DOM: standard model for XML documents
– HTML DOM: standard model for HTML documents

XML DOM

• Standard object model for XML

• Standard programming interface for XML

• A standard for how to get, change, add or delete XML elements:

– The entire document is a document node

– Every XML element is an element node

– The text in an XML element is held in text nodes

– Every attribute is an attribute node

– Comments are comment nodes
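The node kinds listed above can be inspected through the nodeType property. In this sketch the helper node_kind and the sample document (including its id attribute) are assumptions made for illustration; the XML_*_NODE constants are the ones PHP's DOM extension defines.

```php
<?php
// Hypothetical helper: maps a DOM node to a readable type name.
function node_kind(DOMNode $node): string
{
    switch ($node->nodeType) {
        case XML_DOCUMENT_NODE:  return 'document';
        case XML_ELEMENT_NODE:   return 'element';
        case XML_ATTRIBUTE_NODE: return 'attribute';
        case XML_TEXT_NODE:      return 'text';
        case XML_COMMENT_NODE:   return 'comment';
        default:                 return 'other';
    }
}

$doc = new DOMDocument();
$doc->loadXML('<note id="n1"><!-- reminder --><to>Tove</to></note>');

$note = $doc->documentElement;
$to   = $note->getElementsByTagName('to')->item(0);

print node_kind($doc) . "\n";                          // the whole document
print node_kind($note) . "\n";                         // <note>
print node_kind($note->getAttributeNode('id')) . "\n"; // id="n1"
print node_kind($note->firstChild) . "\n";             // <!-- reminder -->
print node_kind($to->firstChild) . "\n";               // the text "Tove"
```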

XML DOM Node Tree

• XML DOM views an XML document as a tree-structure (called a node-tree)

Node Parents, Children, Siblings

– In a node tree, the top node is called the root

– Every node, except the root, has exactly one parent node

– A node can have any number of children

– A leaf is a node with no children

– Siblings are nodes with the same parent

• An XML parser reads the XML and converts it into an XML DOM object that can be accessed from different programming languages
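These relationships map directly onto DOM properties. A minimal sketch in PHP follows, using a compact version of the note document (written without white space between tags, an assumption made so that every child of note is an element).

```php
<?php
$doc = new DOMDocument();
// No white space between tags, so every child of <note> is an element.
$doc->loadXML('<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>');

$root = $doc->documentElement;      // <note>, the root element
$from = $root->childNodes->item(1); // second child: <from>
$text = $from->firstChild;          // the text node inside <from>

print $from->parentNode->nodeName . "\n";      // parent of <from>
print $from->previousSibling->nodeName . "\n"; // sibling before <from>
print $from->nextSibling->nodeName . "\n";     // sibling after <from>
print $root->childNodes->length . "\n";        // number of children of <note>
var_dump($text->hasChildNodes());              // a text node is a leaf
```

In the DOM, the parent of the root element is the document node itself, which sits at the very top of the node tree.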

The HTML DOM Node Tree

DOM Model?

• OK – we have the model – what do we do with it?

– Manipulate it with a programming language!

• In this lecture and practical lesson: PHP

• Next week: JavaScript

DOM and PHP: the functions

• Let us consider the following XML file called note.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Load and Output XML

<?php
$xmlDoc = new DOMDocument(); // creates a DOMDocument object
$xmlDoc->load("note.xml");   // loads the XML file
print $xmlDoc->saveXML();    // serializes the internal XML document to a string and prints it
?>

The script prints the complete XML markup. Viewed in a browser, where the tags are not displayed, the output appears as:

Tove Jani Reminder Don't forget me this weekend!

Looping through XML

<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("note.xml");

$x = $xmlDoc->documentElement; // the root element <note>

foreach ($x->childNodes as $item) {
    print $item->nodeName . " = " . $item->nodeValue . "<br />";
}
?>
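When the XML file is indented, the line breaks and spaces between the tags are themselves text nodes, so a loop over childNodes also visits nodes named "#text". One way to skip them, sketched here with an invented helper child_elements and an inline sample document, is to test the nodeType of each child.

```php
<?php
// Hypothetical helper: returns name => text pairs for the element children
// of the root, ignoring white-space-only text nodes between the tags.
function child_elements(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $pairs = [];
    foreach ($doc->documentElement->childNodes as $item) {
        if ($item->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip "#text" nodes made of line breaks and indentation
        }
        $pairs[$item->nodeName] = $item->nodeValue;
    }
    return $pairs;
}

$xml = "<note>\n  <to>Tove</to>\n  <from>Jani</from>\n</note>";
foreach (child_elements($xml) as $name => $value) {
    print $name . " = " . $value . "\n";
}
```

An alternative would be to set the preserveWhiteSpace property of the DOMDocument to false before loading; the nodeType test is used here because it makes the filtering explicit.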

Other interesting functions

• getElementsByTagName();

// $doc is assumed to be a DOMDocument already loaded with the bookstore XML
$books = $doc->getElementsByTagName("book");

foreach ($books as $book) {
    $authors = $book->getElementsByTagName("author");
    $author = $authors->item(0)->nodeValue; // text of the first <author>
}

– The script uses the getElementsByTagName method to get a list of all the elements with the given name.

– Within the loop over the book nodes, the script uses getElementsByTagName again to find the author elements and reads the nodeValue of the first one. The nodeValue is the text within the node.
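A self-contained variant of this script might look as follows. It is only a sketch: the helper first_authors is an invented name, and the XML is a shortened version of the bookstore example passed as a string instead of being loaded from a file.

```php
<?php
// Hypothetical helper: returns the first author of every <book>.
function first_authors(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $names = [];
    foreach ($doc->getElementsByTagName('book') as $book) {
        $authors = $book->getElementsByTagName('author');
        if ($authors->length > 0) { // a book might have no <author>
            $names[] = $authors->item(0)->nodeValue;
        }
    }
    return $names;
}

$xml = '<bookstore>'
     . '<book category="COOKING"><title lang="en">Everyday Italian</title>'
     . '<author>Giada De Laurentiis</author></book>'
     . '<book category="WEB"><title lang="en">Learning XML</title>'
     . '<author>Erik T. Ray</author></book>'
     . '</bookstore>';

print implode(", ", first_authors($xml)) . "\n";
```

Checking the length of the node list before calling item(0) avoids an error on books that carry no author element.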

More information (French)

• http://eusebius.developpez.com/php5dom/

• http://www.scriptol.fr/xml/dom.php

• http://durand.iut-amiens.fr/mcr51:cours:dom#chargement_d_un_fichier_xml_ou_d_une_chaine_xml_php