Dr James Denholm-Price. XML Lecture 3. Client-server: PHP + XML
Post on 20-Dec-2015
Delimiters
http://ars.userfriendly.org/cartoons/?id=20041025
Lecture 3 contents
• Motivation: Context & why XML.
• Examples: “Use cases”
• Client-server: Server-side processing.
  – Using what we know (the DOM!)
  – Looking forward to what we don’t yet know...
  – Client-side processing will be later.
Context
“Things should be made as simple as possible, but no simpler.” – Albert Einstein.
  – “Things”: a CD player?
• To play a CD you put your CD into a CD player and the player plays it for you.
• The CD player provides a CD-playing service.
• Object-oriented programming approach:
  – You bind data and its processing together.
    • In O-O, every CD would come with its own player.
• Service-orientation approach:
  – Services provide functions that “do things”.
  – For software, services exchange data which must be comprehensible ... XML is a lingua franca commonly (but not exclusively) used.
Image: Albert Einstein Archives, The Hebrew University of Jerusalem, Israel.
Services
• Huge subject – “Service-Oriented Architecture” (SOA).
• The essence is something like:
  – Break down “business process” workflows.
  – Provide “services” using common protocols & well-defined models to securely expose “business data”.
  – Aggregate service data into “business objects” to answer questions.
  – E.g. “Who’s on module CO3070 in 2010/11?”
    • Module code → module name.
    • Module code + year → list of student IDs.
    • Student ID → student’s details.
    • Student ID → student photo?
    • Aggregate it all together to give a class list (e.g. PDF).
Why bother with “services” at all?
[Figure: Server, HTTP, DB and Data Service, exchanging XML data]
“data” direct to the client! ... More later...
[Figure: Server, HTTP, DB and Data Service, exchanging data over HTTP]
Why XML?
• Why not text? After all...
  – Easily read, easily written!
• Why not arbitrary binary files? After all...
  – Easily written, efficient/small size!
• But are they really:
  – Easily understood?
    • Internationalisation! (Test: Ff vs IE)
    • Meaning!
    • Sharing!
  – Easy to write?!
XML
• Written in plain text → easy to write, exchange.
• Defaults to UTF-8 → internationalised.
• Can be defined → messages can be:
  – Specified.
    • DTDs (also XML Schema, RelaxNG, Schematron...)
  – Validated.
• Can be exchanged → HTTP is easy & ubiquitous.
• Easy for machines to parse into data...
• It’s not particularly efficient (size-wise) → issue?
• It’s also not the only data specification with (most of) these properties ... more of which in a later lecture!
What data’s out there?
• TV
  – http://bleb.org/tv/data/rss.php?ch=bbc1&day=0
• YouTube
  – http://gdata.youtube.com/feeds/base/videos?q=htc%20desire%20hd&client=ytapi-youtube-search&alt=rss&v=2
• Weather data
  – http://weather.yahooapis.com/forecastrss?p=UKXX0770&u=c (Valid?)
• Notice anything?
  – RSS = early example of “paving the cowpaths” ;-)
RSS
• “Rich Site Summary” or “Really Simple Syndication” ... its history is complex
  – Wikipedia has a good summary!
• RSS v2.0 seems well-specified (DTD or XSD) if “unofficial” ;-)
RSS specification
• Human-readable versions online (e.g. RSS2)
• DTD:
  – <!ELEMENT rss (channel)>
  – <!ATTLIST rss version CDATA #FIXED "2.0">
  – <!ELEMENT channel ((item+)|(title,link,description,(language|copyright|managingEditor|webMaster|pubDate|lastBuildDate|category|generator|docs|cloud|ttl|image|textInput|skipHours|skipDays)*))>
  – <!ELEMENT item ((title|description)+,link?,(author|category|comments|enclosure|guid|pubDate|source)*)>
  – Etc...
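A DTD like the one above can be checked from PHP with DOMDocument::validate(). The following is only a minimal sketch: the DTD is a cut-down, hypothetical subset (not the full RSS 2.0 DTD), embedded as an internal subset, and the feed content is made up for illustration.

```php
<?php
// Cut-down, hypothetical DTD: only enough of RSS 2.0 for this sketch.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rss [
  <!ELEMENT rss (channel)>
  <!ATTLIST rss version CDATA #FIXED "2.0">
  <!ELEMENT channel (title, link, description, item*)>
  <!ELEMENT item (title, link?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT link (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <link>http://example.org/</link>
    <description>A made-up feed</description>
    <item><title>First item</title></item>
  </channel>
</rss>
XML;

libxml_use_internal_errors(true);   // collect validation errors instead of printing warnings
$doc = new DOMDocument();
$doc->loadXML($xml);
$isValid = $doc->validate();        // checks the document against its DTD
echo $isValid ? "valid\n" : "not valid\n";
```

Removing the <description> element from the sample should make validate() return false, and libxml_get_errors() would then list what went wrong.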
RSS example: WebTech podcast
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
  <title>Web Technologies Podcast</title>
  <link>http://lms.../webapps/lobj-podcast-bb_bb60/feed/CO2013-A_SEM1/podcast.xml</link>
  <description>A podcast to accompany lectures in Web Technologies ...</description>
  <image>
    <title>CO2013A banner.GIF</title>
    <url>http://lms.../webapps/lobj-podcast-bb_bb60/files/itunes/...</url>
  </image>
  <item>
    <title>Lecture 1 Introduction and revision</title>
    <link>http://staffnet.../~ku13043/WebTech/media/lecture01.mp3</link>
    <description>Easy introduction to the module, ...</a><br></description>
    <enclosure url="http://staffnet.../~ku13043/WebTech/media/lecture01.mp3" type="audio/mpeg" />
    <pubDate>Mon, 05 Oct 2009 13:37:05 GMT</pubDate>
    <guid>http://staffnet..../~ku13043/WebTech/media/lecture01.mp3</guid>
    <dc:date>2009-10-05T13:37:05Z</dc:date>
  </item>
</channel>
</rss>
Valid?
Mashups
• Mash data from disparate sources into an application.
• Google Maps is commonly used because its client-side APIs support reading & displaying data, e.g. XML data.
  – http://googlemapsmania.blogspot.com/
  – http://ccgi.arutherford.plus.com/website/flexTraffic/
    • UK Highways Agency Java applet
  – Where’s the path? OS data + Google maps
  – WalkJogRun + iPhone app
How can we process XML?
• Depends what we want to do with it & when:
  – Java application – many ways, JAXB/P is common.
  – Windows apps – .Net has tools...
  – Web (client/server):
    • Server-side languages (server) usually have platform-specific packages.
      – E.g. JSP → JAXB/P, PHP → DOM ;-) SimpleXML etc.
    • Browsers (client) have varying support for XML, usually XMLDOM (lecture 8) & XSLT (lecture 6)
      – Security issues (“same origin”).
Request for remote data
1. DNS query
2. TCP connection
3. HTTP request
4. HTTP response
5. Optional parallel connections (embedded resources)
6. Asynchronous requests
[Diagram: sequence over time between Browser, DNS server, Origin server and DB]
Security (1/2)
• Is it a good idea to let an arbitrary web page load data from any source?
  – No!
• On the client
  – E.g. What if you inadvertently make it possible for someone nefarious to inject arbitrary JavaScript into a page ... That JavaScript could load “data” from anywhere and do anything to the page, including monitoring keystrokes, passwords, credit cards, reading cookies, phishing...
  – Current solution = “same origin” policy:
    • Scripted requests must go to the same domain (web server) that the HTML came from.
Security (2/2)
  – “Same origin” is a severe restriction so workarounds exist ;-)
    • E.g. a <script> tag can do pretty much anything; Google for “comet”.
    • A W3C working party (on “webapps”) is looking at APIs and “trust” issues...
• On the server
  – Strict restrictions don’t apply (but the remote server might filter requests).
  – Do whatever you like & get away with it badge...
Approaches to XML processing
• XPath & XSLT: after lecture ~5.
  – “Better” ways to navigate the node tree & manipulate data.
• DOM (“Document Object Model”)
  – Leverages what we know (from CO2013): server-side and client-side.
  – Simple but relatively heavyweight API.
  – Well-defined by W3C.
  – Works ~OK with namespaces.
• Objects let us use language-native notation
  – Usually needs bolt-on search mechanisms, XPath...
  – Different in each language.
  – Namespaces are a mess.
• Other processors like SAX try to be more lightweight, particularly for large XML docs.
PHP
PHP
• PHP is a server-side scripting language
  – Like ASP, JSP, ASP.Net, Cold Fusion it is executed
    • by the server!
    • on the server!
    • you must preview/test your PHP from the server!
• Unlike CGI (Perl etc) the code is embedded within the web page
  – usually identified by a special extension ‘file.php’ not ‘file.html’
• PHP documentation is your friend...
  – The online manual is fantastic so bookmark it!
    • http://www.php.net/docs.php
CO3041 Databases and the Web
PHP in web pages (from the PHP manual)
• The recommended way to embed PHP in (X)HTML is:
  – <?php echo "if you want to serve XHTML or XML documents, do like this\n"; ?>
  – You should all use this!
• PHP short tags are:
  – <? echo "this is the simplest, an SGML processing instruction\n"; ?>
  – These can interfere with XML instructions, like DOCTYPE or XML prolog <?xml version="1.0" encoding="utf-8"?>
• PHP echo shortcut (like <%= %> in ASP):
  – <?= expression ?> This is a shortcut for "<? echo expression ?>"
• Alternative embedding to please old M$ FrontPage
  – <script language="php">echo "some editors";</script>
• Finally ASP-style could be enabled (for some mad reason!)
  – <% echo "You may optionally use ASP-style tags"; %>
A syntax quickie
• C-like syntax (similar to C, Java, JavaScript etc.)
• $ denotes a PHP variable (scalar, array, object)
• '....' delimits unparsed strings; "..." is parsed for variables, wrap arrays in {}
• <?php
  $date = date('r');
  echo "Today's date is <b>$date</b>. ";
  echo 'Can\'t wait until the summer ;-)';
  ?>
Client vs Server
• PHP is server-side code!
  – The client only sees stuff that is deliberately written to the output stream.
  – E.g.
    • echo, print, printf, print_r
    • Error messages (if error_reporting is enabled.)
    • header(...) used to set HTTP headers (also see output buffering)
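As a rough illustration of the points above, a PHP script that serves XML might set the content type with header(...) and then echo the document. This is a sketch only: the buildXml() helper and the <greeting> element are hypothetical names invented for the example.

```php
<?php
// Build a small XML document as a string (hypothetical content).
function buildXml(string $message): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->createElement('greeting');
    $root->appendChild($doc->createTextNode($message));
    $doc->appendChild($root);
    return $doc->saveXML();
}

$xml = buildXml('Hello from the server');

// Headers must be sent before any body output reaches the client.
if (!headers_sent()) {
    header('Content-Type: text/xml; charset=UTF-8');
}
echo $xml;   // only what is written to the output stream is visible to the client
```

Everything else in the script (the function, the variables, the DOM object) stays on the server; the client receives just the header and the echoed string.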
Introducing the document object model
• DOM:
  – Stands for Document Object Model
  – Developed by the W3C
  – Designed for XML and HTML
  – Interface between programming language and document content
  – Allows the browser to run programs to manipulate documents
W3C DOM levels
• Level 1:
  – HTML and XML
• Level 2:
  – Supports namespaces, Cascading Style Sheets, and user-initiated actions, such as mouse clicks and key strokes
• Level 3:
  – Standardise support for document loading and saving
  – Still under development
• DOM specs exist for many XML languages & platforms
  – SMIL
  – MathML
  – SVG
  – {JavaScript, ActionScript, JScript} ECMAScript
  – PHP
CO3070 XML for the Web
DOM nodes & tree structure
• W3C Level 1 Document Object Model (DOM1)
  – Gives access to the elements in a document
  – Tree structure
    • elements correspond with nodes (objects)
    • attributes are properties
    • element text is a node itself
  – Provides methods to
    • Query elements’ contents, attributes, children
    • Add, delete, move elements within the tree
    • Add, delete, change attributes
A simple example document
<html>
<head>
<title>Hi!</title>
</head>
<body>
<h1>Simple</h1>
<p>A <b>bold</b>
example</p>
</body>
</html>
Behind the scenes: Tree of nodes
documentElement
<html>
  <head>
    <title>
      'Hi!'
  <body>
    <h1>
      'Simple'
    <p>
      'A '
      <b>
        'bold'
      ' example'
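The node tree for the simple example document can be printed from PHP by walking childNodes recursively. A minimal sketch, assuming the document is loaded as XML from a string; describeTree() is a hypothetical helper written for this illustration, not part of the DOM API.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<html><head><title>Hi!</title></head><body><h1>Simple</h1><p>A <b>bold</b> example</p></body></html>');

// Recursively describe a node and its children, one line per node.
function describeTree(DOMNode $node, int $depth = 0): string
{
    $label = ($node->nodeType === XML_TEXT_NODE)
        ? "'" . $node->nodeValue . "'"      // text nodes shown in quotes
        : '<' . $node->nodeName . '>';      // elements shown as tags
    $out = str_repeat('  ', $depth) . $label . "\n";
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $out .= describeTree($child, $depth + 1);
        }
    }
    return $out;
}

$tree = describeTree($doc->documentElement);
echo $tree;
```

The indentation in the output mirrors the depth of each node, so text nodes appear one level below the element that contains them.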
Locating objects by id
• DOM1: $document->getElementById('theP');
  – This example returns a reference to the object that represents the <p> element in the DOM
  – However this only works in XML if there is an attached & valid DTD which defines an attribute of type ID.
<html>
<head>
<title>Hi!</title>
</head>
<body>
<h1>Simple</h1>
<p id="theP">A <b>bold</b> example</p>
</body>
</html>
Example (src)
NB: Examples require PHP5 XML DOM.
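The DTD requirement for getElementById can be seen in a short sketch. The internal DTD below is an assumption written for this example (the slide's document has no DTD shown); it declares the id attribute of <p> as type ID.

```php
<?php
// With a DTD that declares id as type ID, the lookup works.
$withDtd = <<<'XML'
<!DOCTYPE html [
  <!ELEMENT html (head, body)>
  <!ELEMENT head (title)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body (h1, p)>
  <!ELEMENT h1 (#PCDATA)>
  <!ELEMENT p (#PCDATA | b)*>
  <!ELEMENT b (#PCDATA)>
  <!ATTLIST p id ID #IMPLIED>
]>
<html><head><title>Hi!</title></head><body><h1>Simple</h1><p id="theP">A <b>bold</b> example</p></body></html>
XML;

$document = new DOMDocument();
$document->validateOnParse = true;      // validate against the DTD while loading
$document->loadXML($withDtd);
$p = $document->getElementById('theP');
echo $p->nodeName, ': ', $p->textContent, "\n";

// Without a DTD, an attribute that merely happens to be called "id" is not an ID.
$plain = new DOMDocument();
$plain->loadXML('<p id="theP">A <b>bold</b> example</p>');
$missing = $plain->getElementById('theP');
var_dump($missing);
```

When no DTD is available, DOMElement::setIdAttribute() is another way to mark an attribute as an ID before calling getElementById.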
Locating objects by tag
• DOM1: $pEls = $document->getElementsByTagName('p');
  – Can also be called from any DOM element node.
  – This returns a collection (~array) of objects, no prizes for guessing what's in it (not much in this simple doc!)
    • $pEls->item(…)
<html>
<head>
<title>Hi!</title>
</head>
<body>
<h1>Simple</h1>
<p id="thePara">A <b>bold</b> example</p>
</body>
</html>
Example (src)
Locating objects by tag #2
• DOM1: $pEls = $document->getElementsByTagName('p');
  – $pEls->item(…)
  – $pEls->item(0) is <p>A <b>bold</b> example</p>
  – $pEls->item(1) is <p>Ha!</p>
• What about “namespaces”?
  – getElementsByTagNameNS(namespace, tagName)
<html>
<head>
<title>Hi!</title>
</head>
<body>
<h1>Simple</h1>
<p>A <b>bold</b> example</p>
<p>Ha!</p>
</body>
</html>
example (src)
<p class="blob">A…e</p>
<p class="blob">Ha!</p>
Looping over objects by tag
• DOM1: $pEls = $document->getElementsByTagName('p');
• for ($i = 0; $i < $pEls->length; $i++) {
      echo $pEls->item($i)->firstChild->nodeValue;
  }
<html>
<head>
<title>Hi!</title>
</head>
<body>
<h1>Simple</h1>
<p>A <b>bold</b> example</p>
<p>Ha!</p>
</body>
</html>
example (src)
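The loop above can also be written with foreach, since a DOMNodeList is iterable. The sketch below assumes a cut-down version of the example document and contrasts firstChild->nodeValue (first child only) with textContent (all descendant text).

```php
<?php
$document = new DOMDocument();
$document->loadXML('<body><h1>Simple</h1><p>A <b>bold</b> example</p><p>Ha!</p></body>');

$pEls = $document->getElementsByTagName('p');

$firstTexts = [];
$fullTexts = [];
foreach ($pEls as $p) {                          // DOMNodeList is iterable
    $firstTexts[] = $p->firstChild->nodeValue;   // only the first child (a text node here)
    $fullTexts[]  = $p->textContent;             // text of all descendants
}
print_r($firstTexts);
print_r($fullTexts);
```

For the first paragraph the two differ: the first child is just the text before <b>, whereas textContent also includes the bold text and what follows it.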
Parent node relationships
If $objP = $doc->getElementById('thp'); then
  $objP->childNodes   a node list containing 3 nodes…
  $objP->firstChild   the 'A ' text node
  $objP->lastChild    the ' example' text node
<p id="thp">A <b>bold</b> example</p>
[Diagram: <p> (parent) with children 'A ', <b>, ' example'; <b> contains 'bold']
example (src)
Questions:
$objP->childNodes holds the children of <p>…
Q: What is $objP->childNodes->length?
A: 3
Q: What does $objP->childNodes->item(1) refer to?
A: The <b> element.
Q: How do I access the 'bold' text node?
A: $objP->childNodes->item(1)->firstChild
<p id="thp">A <b>bold</b> example</p>
[Diagram: <p> (parent) with children 'A ', <b>, ' example'; <b> contains 'bold']
example (src)
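The answers to these questions can be checked directly in PHP. A minimal sketch, assuming the <p> fragment is loaded on its own so that it is the document element (which avoids needing a DTD for getElementById).

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p id="thp">A <b>bold</b> example</p>');
$objP = $doc->documentElement;               // the <p> element

$count  = $objP->childNodes->length;         // number of child nodes of <p>
$second = $objP->childNodes->item(1);        // the <b> element
$bold   = $second->firstChild;               // the 'bold' text node

echo $count, "\n";
echo $second->nodeName, "\n";
echo $bold->nodeValue, "\n";
```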
Child node relationships
Each node in $objP->childNodes has a parentNode property that points back up the tree to the <p> element
  – E.g. $objP->childNodes->item(0) is the 'A ' text node
    $objP->childNodes->item(0)->parentNode refers back to <p>
<p id="thp">A <b>bold</b> example</p>
[Diagram: <p> (parent) with children 'A ', <b>, ' example'; <b> contains 'bold']
Question:
Q: If $objP->childNodes->item(1)->firstChild is the 'bold' text node, what is: $objP->childNodes->item(1)->firstChild->parentNode->parentNode ?
A: It refers to the parent of the parent of the 'bold' text node, i.e. back to the <p>.
<p id="thp">A <b>bold</b> example</p>
[Diagram: <p> (parent) with children 'A ', <b>, ' example'; <b> contains 'bold']
example (src)
Sibling node relationships
Each of the children of the <p> element are siblings (of each other):
• If $objB represents the <b> node then
  – $objB == $objP->childNodes->item(1) from before
  – $objB->nextSibling is the ' example' text node
  – $objB->previousSibling is the 'A ' text node
<p id="thp">A <b>bold</b> example</p>
[Diagram: siblings 'A ', <b>, ' example']
example (src)
Question:
Q: If $objB represents the <b> node then what does $objB->nextSibling->parentNode refer to?
A: It refers back to the parent of the ' example' text node, i.e. back to the <p> as before. (As any of the 'sibling' nodes do)
<p id="thp">A <b>bold</b> example</p>
[Diagram: <p> (parent) above siblings 'A ', <b>, ' example']
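The sibling and parent relationships can be verified with a few lines of PHP. This is a sketch under the same assumption as before: the <p> fragment is loaded on its own and taken as the document element.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p id="thp">A <b>bold</b> example</p>');
$objP = $doc->documentElement;
$objB = $objP->childNodes->item(1);              // the <b> element

$before = $objB->previousSibling->nodeValue;     // text before <b>
$after  = $objB->nextSibling->nodeValue;         // text after <b>
echo "[$before] [$after]\n";

// Every sibling shares the same parent, the <p> element.
$sameParent = $objB->nextSibling->parentNode->isSameNode($objP);
var_dump($sameParent);

// Two steps up from the 'bold' text node also reaches <p>.
$grandParent = $objB->firstChild->parentNode->parentNode;
var_dump($grandParent->isSameNode($objP));
```

isSameNode() is used rather than == because the question is whether both expressions refer to the very same node in the tree.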
Node attribute methods
• For any DOM element object $obj, use
  – $obj->getAttribute(attName); to query (return) an attribute value
  – $obj->getAttributeNS(ns,attName); to query (return) an attribute value with a namespace (ex)
  – $obj->setAttribute(attName,attValue); to set an attribute value
  – $obj->hasAttributes(); returns true/false
Node attributes
• E.g. attributes of tags can be accessed from those DOM1 object methods:
  – <img src="image.gif" id="img1">
  – So to change the image source:
    $imageObj = $document->getElementById('img1');
    $imageObj->setAttribute('src','image2.gif');
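A runnable version of the image example might look like the sketch below. It assumes the <img> sits in a small XML string and locates it by tag name instead of by id, because getElementById needs a DTD-declared ID when the document is loaded as XML.

```php
<?php
$document = new DOMDocument();
$document->loadXML('<body><img src="image.gif" id="img1"/></body>');

// Located by tag name here, because getElementById needs a DTD-declared ID in XML.
$imageObj = $document->getElementsByTagName('img')->item(0);

$before = $imageObj->getAttribute('src');
$imageObj->setAttribute('src', 'image2.gif');    // change the image source
$after = $imageObj->getAttribute('src');

echo "$before -> $after\n";
var_dump($imageObj->hasAttributes());            // src and id are present
var_dump($imageObj->getAttribute('alt'));        // empty string when the attribute is absent
```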
Node type and value properties
CO2013/CO3013 XML for the Web
Node            nodeName         nodeType  nodeValue
Element         HTML tag name    1         null
Attribute       Attribute name   2         The att. value
Text node       #text            3         The text
CDATA           #cdata-section   4         The CDATA text
Comment node    #comment         8         The comment text
Document node   #document        9         null
More: W3C
Node type and value properties
<p id="thp">A <b>bold</b> example</p>
Node          nodeName  nodeType  nodeValue
'A '          #text     3         'A '
<b>           'b'       1         null
' example'    #text     3         ' example'
id="thp"      'id'      2         'thp'
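The same properties can be listed from PHP, as in the sketch below. One difference worth knowing: the PHP manual notes that, contrary to the W3C specification, nodeValue of a DOMElement equals its textContent, so the <b> row shows 'bold' in PHP rather than null.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<p id="thp">A <b>bold</b> example</p>');
$objP = $doc->documentElement;

// Collect nodeName, nodeType and nodeValue for each child of <p>.
$rows = [];
foreach ($objP->childNodes as $node) {
    $rows[] = [$node->nodeName, $node->nodeType, $node->nodeValue];
}
// The id attribute is a node too (type 2), reached via getAttributeNode().
$idAttr = $objP->getAttributeNode('id');
$rows[] = [$idAttr->nodeName, $idAttr->nodeType, $idAttr->nodeValue];

foreach ($rows as [$name, $type, $value]) {
    printf("%-6s %d %s\n", $name, $type, var_export($value, true));
}
```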
PHP XML DOM
• Client or server the procedure is the same:
  1. Load the XML document.
  2. Successfully-loaded doc is parsed into a DOM.
  3. Methods & properties of the DOM objects give access to the information and relationships represented by tags.
• E.g. See w3schools.com, php.net
Example: Parsing RSS
• The example RSS document:
  – <?xml version="1.0" encoding="UTF-8"?>
    <rss ...>
      <channel> ...
        <item> ...
          <title>...</title>
          <link>...</link>
          <description>...</description>
          <enclosure>...</enclosure>
  – etc... Repeated for each <item>
  – NB: What about white space?
XML DOM
• So...
  – $doc = new DOMDocument();
  – $doc->preserveWhiteSpace=FALSE;
  – $doc->load('http://.../podcast.xml');
  – $items = $doc->getElementsByTagName('item');
  – for ($i=0; $i < $items->length; $i++) ...
  – $items->item($i)->firstChild is <title>
  – $items->item($i)->childNodes->item(2) is <description>
  – $items->item($i)->childNodes->item(3) is <enclosure>
Example
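Put together, the steps above might look like the following sketch. The inline feed is hypothetical and stands in for the podcast URL, which is elided in the slides; loadXML() is used instead of load() so the example runs without network access.

```php
<?php
// Hypothetical inline feed standing in for the podcast URL elided in the slide.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Demo podcast</title>
    <item>
      <title>Lecture 1</title>
      <link>http://example.org/lecture01.mp3</link>
      <description>Introduction</description>
      <enclosure url="http://example.org/lecture01.mp3" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;   // drop whitespace-only text nodes between tags
$doc->loadXML($rss);

$items = $doc->getElementsByTagName('item');
$summary = [];
for ($i = 0; $i < $items->length; $i++) {
    $item = $items->item($i);
    $summary[] = [
        'title'       => $item->firstChild->nodeValue,
        'description' => $item->childNodes->item(2)->nodeValue,
        'enclosure'   => $item->childNodes->item(3)->getAttribute('url'),
    ];
}
print_r($summary);
```

Indexing children by position only works while every <item> has the same child order and white space is stripped; looking children up by name, e.g. with getElementsByTagName('description') on the item, is likely to be more robust for real feeds.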