Dr James Denholm-Price

[email protected]

XML Lecture 3

Client Server: PHP + XML

Delimiters

http://ars.userfriendly.org/cartoons/?id=20041025


Lecture 3 contents

• Motivation: Context & why XML.
• Examples: “Use cases”
• Client-server: Server-side processing.
  – Using what we know (the DOM!)
  – Looking forward to what we don’t yet know...
  – Client-side processing will be later.

Context

“Things should be made as simple as possible, but no simpler.” – Albert Einstein.
  – “Things” → CD player?

• To play a CD you put your CD into a CD player and the player plays it for you.
• The CD player provides a CD-playing service.
• Object-oriented programming approach:
  – You bind data and its processing together.
  – In O-O, every CD would come with its own player.
• Service-orientation approach:
  – Services provide functions that “do things”.
  – For software, services exchange data which must be comprehensible ...
  – XML is a lingua franca commonly (but not exclusively) used.

Image: Albert Einstein Archives, The Hebrew University of Jerusalem, Israel.

Services

• Huge subject – “Service-Oriented Architecture” (SOA).
• The essence is something like:
  – Break down “business process” workflows.
  – Provide “services” using common protocols & well-defined models to securely expose “business data”.
  – Aggregate service data into “business objects” to answer questions.
  – E.g. “Who’s on module CO3070 in 2010/11?”
    • Module code → module name.
    • Module code + year → list of student IDs.
    • Student ID → student’s details.
    • Student ID → student photo?
    • Aggregate it all together to give a class list (e.g. PDF).

Why bother with “services” at all?

[Diagram: a Server hosting a DB and a Data Service, exchanging XML data over HTTP]

“data” direct to the client! ... More later...

[Diagram: the same Server, DB and Data Service, with the exchanges labelled “HTTP Data”]

Why XML?

• Why not text? After all...
  – Easily read, easily written!
• Why not arbitrary binary files? After all...
  – Easily written, efficient/small size!
• But are they really:
  – Easily understood?
    • Internationalisation! (Test: Ff vs IE)
    • Meaning!
    • Sharing!
  – Easy to write?!

http://www.w3.org/International/tests/tests-html-css/tests-character-encoding/generate?test=55



XML

• Written in plain text → easy to write, exchange.
• Defaults to UTF-8 → internationalised.
• Can be defined → messages can be:
  – Specified.
    • DTDs (also XML Schema, RelaxNG, Schematron...)
  – Validated.
• Can be exchanged → HTTP is easy & ubiquitous.
• Easy for machines to parse into data...
• It’s not particularly efficient (size-wise) → issue?
• It’s also not the only data specification with (most of) these properties ... more of which in a later lecture!
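To make “specified” and “validated” a little more concrete, here is a minimal sketch using PHP's DOM extension. The tiny `note` vocabulary, its DTD and the helper name `isValidXml` are all made up for illustration; only `DOMDocument::loadXML()`, `DOMDocument::validate()` and `libxml_use_internal_errors()` are real PHP calls.

```php
<?php
// Hypothetical message format: a <note> must contain one <to> then one <body>.
$dtd = <<<DTD
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
DTD;

$good = $dtd . '<note><to>CO3070</to><body>Hello</body></note>';
$bad  = $dtd . '<note><body>No recipient</body></note>';

// Collect libxml errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Illustrative helper: true only if the XML is well-formed AND valid.
function isValidXml(string $xml): bool
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return false;          // not even well-formed
    }
    return $doc->validate();   // check the document against its DTD
}

var_dump(isValidXml($good));
var_dump(isValidXml($bad));
```

Both messages are well-formed; only the first also follows the rules set out in the DTD, which is the difference between “parsable” and “valid”.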

http://www.joelonsoftware.com/articles/Unicode.html

What data’s out there?

• TV
  – http://bleb.org/tv/data/rss.php?ch=bbc1&day=0
• Youtube
  – http://gdata.youtube.com/feeds/base/videos?q=htc%20desire%20hd&client=ytapi-youtube-search&alt=rss&v=2
• weather data
  – http://weather.yahooapis.com/forecastrss?p=UKXX0770&u=c (Valid?)
• Notice anything?
  – RSS = early example of “paving the cowpaths” ;-)


http://validator.w3.org/feed/check.cgi?url=http://weather.yahooapis.com/forecastrss?p=UKXX0770&u=c

http://en.wikipedia.org/wiki/RSS

http://www.w3.org/TR/html-design-principles/#pave-the-cowpaths


RSS

• “Rich Site Summary” or “Really Simple Syndication” ... its history is complex
  – Wikipedia has a good summary!
• RSS v2.0 seems well-specified (DTD or XSD) if “unofficial” ;-)

http://en.wikipedia.org/wiki/Rss

http://www.rssboard.org/rss-2-0

http://www.silmaril.ie/software/rss2.dtd

http://www.thearchitect.co.uk/schemas/rss-2_0.xsd

RSS example: WebTech podcast

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Web Technologies Podcast</title>
    <link>http://lms.../webapps/lobj-podcast-bb_bb60/feed/CO2013-A_SEM1/podcast.xml</link>
    <description>A podcast to accompany lectures in Web Technologies ...</description>
    <image>
      <title>CO2013A banner.GIF</title>
      <url>http://lms.../webapps/lobj-podcast-bb_bb60/files/itunes/...</url>
    </image>
    <item>
      <title>Lecture 1 Introduction and revision</title>
      <link>http://staffnet.../~ku13043/WebTech/media/lecture01.mp3</link>
      <description>Easy introduction to the module, ...</a><br></description>
      <enclosure url="http://staffnet.../~ku13043/WebTech/media/lecture01.mp3" type="audio/mpeg" />
      <pubDate>Mon, 05 Oct 2009 13:37:05 GMT</pubDate>
      <guid>http://staffnet..../~ku13043/WebTech/media/lecture01.mp3</guid>
      <dc:date>2009-10-05T13:37:05Z</dc:date>
    </item>
  </channel>
</rss>

Valid?

http://validator.w3.org/feed/check.cgi?url=http://lms.kingston.ac.uk/webapps/lobj-podcast-bb_bb60/feed/CO2013-A_SEM1/podcast.xml

Mashups

• Mash data from disparate sources into an application.

• Google maps is commonly used because its client-side APIs support reading & displaying data, e.g. XML data.
  – http://googlemapsmania.blogspot.com/
  – http://ccgi.arutherford.plus.com/website/flexTraffic/
• UK Highways Agency Java applet
  – Where’s the path? OS data + Google maps
  – WalkJogRun + iPhone app


http://www.trafficengland.com/map.aspx?long0=-153.02732193009462&lat0=3107.724938251105&long1=16.731196588423927&lat1=3034.7776501155117

http://wtp2.appspot.com/wheresthepath.htm?lat=51.297504038048956&lon=-0.5562584101470003&gz=15&oz=8&gt=1

http://www.walkjogrun.net/running-routes/UK

How can we process XML?

• Depends what we want to do with it & when:
  – Java application – many ways, JAXB/P is common.
  – Windows app’s – .Net has tools...
  – Web (client/server):
    • Server-side languages (server) usually have platform-specific packages.
      – E.g. JSP → JAXB/P, PHP → DOM ;-) SimpleXML etc.
    • Browsers (client) have varying support for XML, usually XMLDOM (lecture 8) & XSLT (lecture 6)
      – Security issues (“same origin”).

http://php.net/manual/en/refs.xml.php

http://php.net/manual/en/book.dom.php

http://php.net/simplexml
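As a taste of one of the PHP packages listed above, here is a minimal SimpleXML sketch. The feed is a cut-down, made-up RSS document with example.org URLs so that it runs without network access; `simplexml_load_string()` is the real PHP function.

```php
<?php
// A cut-down, made-up RSS feed so the example runs offline.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Web Technologies Podcast</title>
    <item><title>Lecture 1</title><link>http://example.org/lecture01.mp3</link></item>
    <item><title>Lecture 2</title><link>http://example.org/lecture02.mp3</link></item>
  </channel>
</rss>
XML;

// SimpleXML maps elements onto object properties ("language-native notation").
$rss = simplexml_load_string($xml);

$titles = [];
foreach ($rss->channel->item as $item) {
    $titles[] = (string) $item->title;   // cast the element to its text
}

echo implode(', ', $titles), "\n";
```

The rest of this lecture uses the DOM instead, because the same ideas carry over to client-side code later on.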


Request for remote data

[Diagram: timeline (“Time” axis) of a request between the Browser, a DNS server and the Origin server with its DB]

1. DNS query
2. TCP connection
3. HTTP request
4. HTTP response
5. Optional parallel connections (embedded resources)
6. Asynchronous requests

Security (1/2)

• Is it a good idea to let an arbitrary web page load data from any source?
  – No!
• On the client
  – E.g. What if you inadvertently make it possible for someone nefarious to inject arbitrary JavaScript into a page ... That JavaScript could load “data” from anywhere and do anything to the page, including monitoring keystrokes, passwords, credit cards, reading cookies, phishing...
  – Current solution = “same origin” policy:
    • Scripted requests must go to the same domain (web server) that served the HTML.

Security (2/2)

  – “Same origin” is a severe restriction so workarounds exist ;-)
    • E.g. a <script> tag can do pretty-much anything; Google for “comet”.
    • A W3C working party (on “webapps”) is looking at APIs and “trust” issues...
• On the server
  – Strict restrictions don’t apply (but the remote server might filter requests).
  – Do whatever you like & get away with it badge...

http://www.w3.org/2008/webapps/



Approaches to XML processing

• XPath & XSLT: after lecture ~5.
  – “Better” ways to navigate the node tree & manipulate data.
• DOM (“Document Object Model”)
  – leverages what we know (from CO2013): Server-side and client-side.
  – Simple but relatively heavyweight API.
  – Well-defined by W3C.
  – Works ~OK with namespaces.
• Objects let us use language-native notation
  – Usually needs bolt-on search mechanisms, XPath...
  – Different in each language.
  – Namespaces are a mess.
• Other processors like SAX try to be more lightweight, particularly for large XML doc’s.

http://www.w3.org/DOM/
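To give a feel for the difference between walking the DOM and using an XPath “bolt-on”, the sketch below pulls the same attribute values out both ways. The `<channel>` fragment and file names are invented for illustration; `DOMDocument` and `DOMXPath` are PHP's own classes.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<channel>'
    . '<item><title>Lecture 1</title><enclosure url="l1.mp3" type="audio/mpeg"/></item>'
    . '<item><title>Lecture 2</title><enclosure url="l2.mp3" type="audio/mpeg"/></item>'
    . '</channel>'
);

// DOM navigation: find the elements, then read the attribute from each.
$domUrls = [];
foreach ($doc->getElementsByTagName('enclosure') as $enclosure) {
    $domUrls[] = $enclosure->getAttribute('url');
}

// XPath: describe the nodes wanted in a single expression.
$xpath = new DOMXPath($doc);
$xpathUrls = [];
foreach ($xpath->query('/channel/item/enclosure/@url') as $attr) {
    $xpathUrls[] = $attr->nodeValue;
}

print_r($domUrls);
print_r($xpathUrls);
```

Both arrays should hold the same two URLs; XPath itself is covered properly in a later lecture.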


PHP

• PHP is a server-side scripting language
  – Like ASP, JSP, ASP.Net, Cold Fusion it is executed
    • by the server!
    • on the server!
    • you must preview/test your PHP from the server!
• Unlike CGI (Perl etc) the code is embedded within the web page
  – usually identified by a special extension ‘file.php’ not ‘file.html’
• PHP documentation is your friend...
  – The online manual is fantastic so bookmark it!
    • http://www.php.net/docs.php

CO3041 Databases and the Web


PHP in web pages (from the PHP manual)

• The recommended way to embed PHP in (X)HTML is:
  – <?php echo "if you want to serve XHTML or XML documents, do like this\n"; ?>
  – You should all use this!
• PHP short tags are:
  – <? echo "this is the simplest, an SGML processing instruction\n"; ?>
  – These can interfere with XML instructions, like DOCTYPE or XML prolog <?xml version="1.0" encoding="utf-8"?>
• PHP echo shortcut (like <%= %> in ASP):
  – <?= expression ?> This is a shortcut for "<? echo expression ?>"
• Alternative embedding to please old M$ FrontPage
  – <script language="php">echo "some editors";</script>
• Finally ASP-style could be enabled (for some mad reason!)
  – <% echo "You may optionally use ASP-style tags"; %>


A syntax quickie

• C-like syntax (similar to C, Java, JavaScript etc.)
• $ denotes a PHP variable (scalar, array, object)
• '....' delimits unparsed strings; "..." is parsed for variables, wrap arrays in {}
• <?php
    $date = date('r');
    echo "Today's date is <b>$date</b>. ";
    echo 'Can\'t wait until the summer ;-)';
  ?>
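The quoting rules above are easy to check for yourself. This small sketch (the variable names and values are just illustrative) shows an unparsed single-quoted string, a parsed double-quoted string, and array elements wrapped in {}.

```php
<?php
$module  = 'CO3070';
$lecture = ['number' => 3, 'title' => 'PHP + XML'];

// Single quotes: nothing is parsed, so $module and \n stay as typed.
$single = 'Module: $module\n';

// Double quotes: variables are replaced by their values.
$double = "Module: $module";

// Array elements inside double quotes are wrapped in {}.
$braces = "Lecture {$lecture['number']}: {$lecture['title']}";

echo $single, "\n", $double, "\n", $braces, "\n";
```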

Client vs Server

• PHP is server-side code!
  – The client only sees stuff that is deliberately written to the output stream.
  – E.g.
    • echo, print, printf, print_r
    • Error messages (if error_reporting is enabled.)
    • header(...) used to set HTTP headers (also see output buffering)

http://php.net/manual/en/errorfunc.configuration.php#ini.error-reporting

http://uk2.php.net/manual/en/book.outcontrol.php
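The following is a rough sketch of how output, headers and output buffering fit together. The `<greeting>` element and the `$serverOnly` variable are made up; `ob_start()`, `ob_get_clean()` and `header()` are the real PHP functions the slide refers to.

```php
<?php
// Start buffering so nothing reaches the client until we decide to send it.
ob_start();

echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
printf("<greeting lang=\"%s\">%s</greeting>\n", 'en', 'Hello');

// This value is computed on the server but never written to the output
// stream, so the client has no way of seeing it.
$serverOnly = strtoupper('never echoed');

// Because the output so far is only buffered, a header can still be set here.
header('Content-Type: text/xml; charset=utf-8');

// Collect the buffered output and send it in one go.
$body = ob_get_clean();
echo $body;
```

Without buffering, the `header()` call would have to come before the first `echo`, since headers travel ahead of the body in the HTTP response.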

Introducing the document object model

• DOM:
  – Stands for Document Object Model
  – Developed by the W3C
  – Designed for XML and HTML
  – Interface between programming language and document content
  – Allows the browser to run programs to manipulate documents


W3C DOM levels

• Level 1:
  – HTML and XML
• Level 2:
  – Supports namespaces, Cascading Style Sheets, and user-initiated actions, such as mouse clicks and key strokes
• Level 3:
  – Standardise support for document loading and saving
  – Still under development
• DOM spec’s exist for many XML languages & platforms
  – SMIL
  – MathML
  – SVG
  – {JavaScript, ActionScript, JScript} ECMAScript
  – PHP


http://www.w3.org/DOM/DOMTR#dom1




CO3070 XML for the Web

DOM nodes & tree structure

• W3C Level 1 Document Object Model (DOM1)
  – Gives access to the elements in a document
  – Tree structure
    • elements correspond with nodes (objects)
    • attributes are properties
    • element text is a node itself
  – Provides methods to
    • Query elements’ contents, attributes, children
    • Add, delete, move elements within the tree
    • Add, delete, change attributes
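The operations listed above can be sketched with PHP's DOM classes as follows. The `<body>` fragment is adapted from the lecture's simple example and the `class="blob"` value is arbitrary; every method used (`createElement`, `appendChild`, `insertBefore`, `removeChild`, `setAttribute`) is part of the standard DOM API.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<body><h1>Simple</h1><p>A <b>bold</b> example</p></body>');

$body = $doc->documentElement;
$h1   = $body->getElementsByTagName('h1')->item(0);
$p    = $body->getElementsByTagName('p')->item(0);

// Query: read an element's text content.
$heading = $h1->textContent;

// Add: create a new element and append it to <body>.
$extra = $doc->createElement('p', 'Ha!');
$body->appendChild($extra);

// Move: inserting an existing node elsewhere takes it out of its old position.
$body->insertBefore($p, $h1);

// Delete: remove the <b> element (and the text inside it) from the paragraph.
$b = $p->getElementsByTagName('b')->item(0);
$p->removeChild($b);

// Change an attribute.
$p->setAttribute('class', 'blob');

echo $doc->saveXML($body), "\n";
```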


A simple example document

<html>
  <head>
    <title>Hi!</title>
  </head>
  <body>
    <h1>Simple</h1>
    <p>A <b>bold</b> example</p>
  </body>
</html>


Behind the scenes: Tree of nodes

documentElement
  <html>
    <head>
      <title>
        'Hi!'
    <body>
      <h1>
        'Simple'
      <p>
        'A '
        <b>
          'bold'
        ' example'

Locating objects by id

• DOM1: $document->getElementById('theP');
  – This example returns a reference to the object that represents the <p> element in the DOM
  – However this only works in XML if there is an attached & valid DTD which defines an attribute of type ID.


<html>
  <head>
    <title>Hi!</title>
  </head>
  <body>
    <h1>Simple</h1>
    <p id="theP">A <b>bold</b> example</p>
  </body>
</html>

Example (src)

NB: Examples require PHP5 XML DOM.

http://staffnet.kingston.ac.uk/~ku13043/XML/week03/lecture-example.php?n=1

http://staffnet.kingston.ac.uk/~ku13043/pp.php?s=XML/ex/lecture-example.php
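The DTD caveat above catches many people out, so here is a minimal sketch of it. The DTD is a hypothetical one written just for this fragment; in PHP the document also has to be validated (here via `validateOnParse`) for the ID to be registered.

```php
<?php
libxml_use_internal_errors(true);   // keep any parser messages quiet

$body = '<body><h1>Simple</h1><p id="theP">A <b>bold</b> example</p></body>';

// Hypothetical DTD for this fragment; the ATTLIST line declares id as type ID.
$dtd = <<<DTD
<!DOCTYPE body [
  <!ELEMENT body (h1, p)>
  <!ELEMENT h1 (#PCDATA)>
  <!ELEMENT p (#PCDATA | b)*>
  <!ELEMENT b (#PCDATA)>
  <!ATTLIST p id ID #IMPLIED>
]>
DTD;

// Without a DTD, "id" is just an ordinary attribute.
$plain = new DOMDocument();
$plain->loadXML($body);
$withoutDtd = $plain->getElementById('theP');

// With the DTD and validation switched on, the lookup succeeds.
$typed = new DOMDocument();
$typed->validateOnParse = true;
$typed->loadXML($dtd . $body);
$withDtd = $typed->getElementById('theP');

var_dump($withoutDtd === null);
echo $withDtd->nodeName, "\n";
```

When no DTD is available, `getElementsByTagName` (next slide) or XPath is usually the more practical way to find a node.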


Locating objects by tag

• DOM1: $pEls = $document->getElementsByTagName('p');
  – Can also be called from any DOM node.
  – This returns a collection (~array) of objects, no prizes for guessing what's in it (not much in this simple doc!)
    • $pEls->item(…)


<html>
  <head>
    <title>Hi!</title>
  </head>
  <body>
    <h1>Simple</h1>
    <p id="thePara">A <b>bold</b> example</p>
  </body>
</html>

Example (src)
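A minimal runnable version of the lookup above, using the slide's document loaded from a string (an assumption made so the sketch is self-contained):

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<html><head><title>Hi!</title></head>'
    . '<body><h1>Simple</h1><p id="thePara">A <b>bold</b> example</p></body></html>'
);

// Called on the document: every <p> anywhere in the tree.
$pEls = $doc->getElementsByTagName('p');
echo $pEls->length, "\n";

// Called on a node: only the <b> elements below the first <p>.
$firstP = $pEls->item(0);
$bEls   = $firstP->getElementsByTagName('b');
echo $bEls->item(0)->textContent, "\n";

// Asking for an index past the end gives null rather than an error.
var_dump($pEls->item(1));
```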




Locating objects by tag #2

• DOM1: $pEls = $document->getElementsByTagName('p');
  – $pEls->item(…)
  – $pEls->item(0)
  – $pEls->item(1)
• What about “namespaces”?
  – getElementsByTagNameNS(namespace, tagName)


<html>
  <head>
    <title>Hi!</title>
  </head>
  <body>
    <h1>Simple</h1>
    <p>A <b>bold</b> example</p>
    <p>Ha!</p>
  </body>
</html>

example (src)
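For the namespace question, a small sketch based on the `dc:date` element from the podcast feed shown earlier. The fragment is trimmed down for illustration; the namespace URI is the Dublin Core one used in that feed.

```php
<?php
$dcNs = 'http://purl.org/dc/elements/1.1/';

$doc = new DOMDocument();
$doc->loadXML(
    '<rss xmlns:dc="' . $dcNs . '" version="2.0"><channel><item>'
    . '<pubDate>Mon, 05 Oct 2009 13:37:05 GMT</pubDate>'
    . '<dc:date>2009-10-05T13:37:05Z</dc:date>'
    . '</item></channel></rss>'
);

// Match on namespace URI + local name; the "dc" prefix chosen by the
// feed's author does not matter.
$dates = $doc->getElementsByTagNameNS($dcNs, 'date');

echo $dates->length, "\n";
echo $dates->item(0)->nodeValue, "\n";
```

Matching on the URI rather than the prefix means the code should keep working if another feed binds the same namespace to a different prefix.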





Looping over objects by tag

• DOM1: $pEls = $document->getElementsByTagName('p');
• for ($i = 0; $i < $pEls->length; $i++) {
      echo $pEls->item($i)->firstChild->nodeValue;
  }


<html>
  <head>
    <title>Hi!</title>
  </head>
  <body>
    <h1>Simple</h1>
    <p>A <b>bold</b> example</p>
    <p>Ha!</p>
  </body>
</html>

example (src)




Parent node relationships

If $objP = $doc->getElementById('thp'); then
  $objP->childNodes   a collection (~array) containing 3 nodes…
  $objP->firstChild   the 'A ' text node
  $objP->lastChild    the ' example' text node

<p id="thp">A <b>bold</b> example</p>

<p>                          (parent)
  'A '   <b>   ' example'    (children)
         'bold'

example (src)




Questions:

$objP->childNodes holds the children of <p>…

Q: What is $objP->childNodes->length?
A: 3

Q: What does $objP->childNodes->item(1) refer to?
A: The <b> element.

Q: How do I access the 'bold' text node?
A: $objP->childNodes->item(1)->firstChild

<p>                         (parent)
  'A'   <b>   ' example'    (children)
        'bold'

example (src)




Child node relationships

Each node in $objP->childNodes has a parentNode property that points back up the tree to the <p> element

  – E.g. $objP->childNodes->item(0) is the 'A' text node
  – $objP->childNodes->item(0)->parentNode refers back to <p>

<p>                         (parent)
  'A'   <b>   ' example'    (children)
        'bold'


Question:

Q: If $objP->childNodes->item(1)->firstChild is the 'bold' text node, what is:
   $objP->childNodes->item(1)->firstChild->parentNode->parentNode ?

<p>                         (parent)
  'A'   <b>   ' example'    (children)
        'bold'

A: It refers to the parent of the parent of the 'bold' text node, i.e. back to the <p>.

example (src)




Sibling node relationships

The children of the <p> element are siblings (of each other):

• If $objB represents the <b> node then
  – $objB == $objP->childNodes->item(1) from before
  – $objB->nextSibling is the ' example' text node
  – $objB->previousSibling is the 'A' text node

'A'   <b>   ' example'    (siblings)

example (src)




Question:

Q: If $objB represents the <b> node then what does $objB->nextSibling->parentNode refer to?

<p>                         (parent)
  'A'   <b>   ' example'    (siblings)

A: It refers back to the parent of the ' example' text node, i.e. back to the <p> as before. (As any of the 'sibling' nodes do)
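The parent, child and sibling questions above can all be checked with a short script. This sketch loads the `<p id="thp">` fragment on its own and takes `$objP` from `documentElement` rather than `getElementById` (an assumption made to avoid needing a DTD).

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p id="thp">A <b>bold</b> example</p>');

$objP = $doc->documentElement;         // the <p> element
$objB = $objP->childNodes->item(1);    // the <b> element

echo $objP->childNodes->length, "\n";           // how many children <p> has
echo $objP->firstChild->nodeValue, "\n";        // first text node
echo $objB->firstChild->nodeValue, "\n";        // text inside <b>
echo $objB->previousSibling->nodeValue, "\n";   // text node before <b>
echo $objB->nextSibling->nodeValue, "\n";       // text node after <b>

// Every route back up the tree ends at the same <p> node.
$viaBold    = $objB->firstChild->parentNode->parentNode;
$viaSibling = $objB->nextSibling->parentNode;
var_dump($viaBold->isSameNode($objP));
var_dump($viaSibling->isSameNode($objP));
```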


Node attribute methods

• For any DOM object $obj, use
  – $obj->getAttribute(attName); to query (return) an attribute value
  – $obj->getAttributeNS(ns, attName); to query (return) an attribute value with a namespace (ex)
  – $obj->setAttribute(attName, attValue); to set an attribute value
  – $obj->hasAttributes(); returns true/false


http://www.w3schools.com/dom/met_element_getattributens.asp
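A short sketch exercising the four methods above. The `<gallery>` wrapper, the empty `<caption/>` and the `xlink:href` attribute are invented for illustration; the `img` element is the one used on the next slide.

```php
<?php
$xlinkNs = 'http://www.w3.org/1999/xlink';

$doc = new DOMDocument();
$doc->loadXML(
    '<gallery xmlns:xlink="' . $xlinkNs . '">'
    . '<img src="image.gif" id="img1" xlink:href="big.gif"/>'
    . '<caption/>'
    . '</gallery>'
);

$img     = $doc->getElementsByTagName('img')->item(0);
$caption = $doc->getElementsByTagName('caption')->item(0);

// Query a plain attribute and a namespaced one.
$before = $img->getAttribute('src');
$href   = $img->getAttributeNS($xlinkNs, 'href');

// Set (here: overwrite) an attribute value.
$img->setAttribute('src', 'image2.gif');
$after = $img->getAttribute('src');

echo "$before -> $after ($href)\n";
var_dump($img->hasAttributes());       // <img> carries attributes
var_dump($caption->hasAttributes());   // <caption/> carries none
```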


Node attributes

• E.g. attributes of tags can be accessed from those DOM1 object methods:
  – <img src="image.gif" id="img1">
  – So to change the image source:
      $imageObj = $document->getElementById('img1');
      $imageObj->setAttribute('src', 'image2.gif');



Node type and value properties

Node            nodeName         nodeType   nodeValue
Element         HTML tag name    1          null
Attribute       Attribute name   2          The att. value
Text node       #text            3          The text
CDATA           #cdata-section   4          The CDATA text
Comment node    #comment         8          The comment text
Document node   #document        9          null

More: W3C

http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-1841493061

Node type and value properties

For <p id="thp">A <b>bold</b> example</p>:

Node         nodeName   nodeType   nodeValue
'A'          #text      3          'A'
<b>          'b'        1          null
' example'   #text      3          ' example'
id="thp"     'id'       2          'thp'
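The table above can be reproduced from PHP with a few lines. This is a sketch that loads the `<p id="thp">` fragment on its own; `getAttributeNode()` is used to get at the attribute as a node.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p id="thp">A <b>bold</b> example</p>');
$objP = $doc->documentElement;

// Collect [nodeName, nodeType] for each child of <p>...
$rows = [];
foreach ($objP->childNodes as $child) {
    $rows[] = [$child->nodeName, $child->nodeType];
}

// ...and for the id attribute, which is a node too but not a child.
$idAttr = $objP->getAttributeNode('id');
$rows[] = [$idAttr->nodeName, $idAttr->nodeType];

foreach ($rows as [$name, $type]) {
    printf("%-6s %d\n", $name, $type);
}
```

PHP also defines named constants for these numbers, e.g. `XML_ELEMENT_NODE`, `XML_ATTRIBUTE_NODE` and `XML_TEXT_NODE`. One PHP-specific wrinkle: the PHP manual notes that, contrary to the W3C specification, `nodeValue` of an element returns its text content rather than null.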

PHP XML DOM

• Client or server the procedure is the same:
  1. Load the XML document.
  2. Successfully-loaded doc is parsed into a DOM.
  3. Methods & properties of the DOM objects give access to the information and relationships represented by tags.
• E.g. See w3schools.com, php.net

http://www.w3schools.com/php/php_xml_dom.asp

http://php.net/book.dom

Example: Parsing RSS

• The example RSS document:
  – <?xml version="1.0" encoding="UTF-8"?>
    <rss ...>
      <channel> ...
        <item> ...
          <title>...</title>
          <link>...</link>
          <description>...</description>
          <enclosure>...</enclosure>
  – etc... Repeated for each <item>
  – NB: What about white space?

XML DOM

• So...
  – $doc = new DOMDocument();
  – $doc->preserveWhiteSpace = FALSE;
  – $doc->load('http://.../podcast.xml');
  – $items = $doc->getElementsByTagName('item');
  – for ($i = 0; $i < $items->length; $i++) ...
  – $items->item($i)->firstChild is <title>
  – $items->item($i)->childNodes->item(2) is <description>
  – $items->item($i)->childNodes->item(3) is <enclosure>

Example

http://staffnet.kingston.ac.uk/~ku13043/XML/week03/hello-world.php
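Putting the steps together, here is a minimal sketch of the RSS parsing described above. It is not the linked lecture example: the feed is an inline, cut-down copy with made-up example.org URLs so that it runs offline, and `loadXML()` stands in for `load()` on the real podcast URL.

```php
<?php
// Inline, cut-down feed (made-up URLs) so the sketch runs without a network.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Web Technologies Podcast</title>
    <item>
      <title>Lecture 1 Introduction and revision</title>
      <link>http://example.org/media/lecture01.mp3</link>
      <description>Easy introduction to the module</description>
      <enclosure url="http://example.org/media/lecture01.mp3" type="audio/mpeg" />
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;   // set before loading, or it has no effect
$doc->loadXML($rss);

$items = $doc->getElementsByTagName('item');
for ($i = 0; $i < $items->length; $i++) {
    $item        = $items->item($i);
    $title       = $item->firstChild;             // <title>
    $description = $item->childNodes->item(2);    // <description>
    $enclosure   = $item->childNodes->item(3);    // <enclosure>

    echo $title->nodeValue, ' -> ', $enclosure->getAttribute('url'), "\n";
}
```

This also answers the white space question: with `preserveWhiteSpace` left at its default, the indentation between the tags would show up as extra text nodes and the positions 0, 2 and 3 would no longer line up with `<title>`, `<description>` and `<enclosure>`.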
