x ml basics seminar
TRANSCRIPT
7/29/2019 x Ml Basics Seminar
http://slidepdf.com/reader/full/x-ml-basics-seminar 1/97
XML Basics
Wednesday May 12, 1999 SD99
Copyright 1999 Elliotte Rusty Harold
http://metalab.unc.edu/xml/slides/
What is XML?
• Extensible Markup Language
• A syntax for documents
• A Meta-Markup Language
• A Structural and Semantic language, not a formatting language
• Not just for Web pages
XML is a Meta Markup Language
• Not like HTML, troff, LaTeX
• Make up the tags you need as you need them
• The tags you create can be documented in a Document Type Definition (DTD)
• A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
XML describes structure and semantics, not formatting
• XML documents form a tree
• Element and attribute names reflect the kind of the element
• Formatting can be added with a stylesheet
A Song Description in HTML
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>
A Song Description in XML
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
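Because the element names say what each piece of data is, a program can ask for exactly the fields it wants. A minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither appears in the original slides):

```python
import xml.etree.ElementTree as ET

# The song description from the slide above.
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)

# Look fields up by name instead of by position on the page.
title = song.findtext("TITLE")
composers = [c.text for c in song.findall("COMPOSER")]

print(title, "by", ", ".join(composers))
```

The same lookup against the HTML version would have to guess which <li> holds which fact.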
Style Sheets provide formatting
SONG {display: block}
TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold}
COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic}
ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic}
PUBLISHER {display: block; font-size: 14pt; font-family: Times, Times New Roman, serif}
LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
Attaching style sheets to documents
• Processing Instruction
<?xml-stylesheet type="text/css" href="song.css"?>
• Converter Program
What is XML used for?
• Domain-Specific Markup Languages
• Self-Describing Data
• Interchange of Data Among Applications
• Structured and Integrated Data
Domain-Specific Markup Languages
• Non-proprietary format
• Don’t pay for what you don’t use
Self-Describing Data
• Much data is lost due to format problems
• XML is very simple
• XML is self-describing
• XML is well documented
<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
Interchange of Data Among Applications
• E-commerce
• Syndication
Structured and Integrated Data
• Can specify relationships between elements
• Can assemble data from multiple sources
XML Applications
• A specific markup language that uses the XML meta-syntax is called an XML application
• Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
• Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas
Example XML Applications
• Web Pages
• Mathematical Equations
• Music Notation
• Vector Graphics
• Metadata
• and more…
Mathematical Markup Language
Channel Definition Format
<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>
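A channel file like this is meant to be read by software rather than people. As a rough sketch of what a consumer might do, assuming Python's standard xml.etree.ElementTree module (not something the slides prescribe), the items and their links can be listed like this:

```python
import xml.etree.ElementTree as ET

# The CDF channel from the slide above.
CDF = """<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>"""

channel = ET.fromstring(CDF)

# Each ITEM carries its link in the HREF attribute and its label in a TITLE child.
items = [(item.findtext("TITLE"), item.get("HREF")) for item in channel.findall("ITEM")]

print(channel.findtext("TITLE"))
for label, link in items:
    print(" ", label, "->", link)
```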
Classic Literature
• The Complete Plays of Shakespeare
• The Bible
• The Koran
• The Book of Mormon
Vector Graphics
• Vector Markup Language (VML)
– Internet Explorer 5.0
– Microsoft Office 2000
• Scalable Vector Graphics (SVG)
The Resource Description
Framework (RDF)
• Meta-data
• Dublin Core
• Better Web searching
An Example of RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>
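The rdf: and dc: prefixes are only shorthand for the namespace URIs bound to them, so a program matches on the URIs. A small sketch, assuming Python's standard xml.etree.ElementTree module and the RDF example above with every attribute value fully quoted:

```python
import xml.etree.ElementTree as ET

RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI map used only by this script; the prefixes here need not
# match the ones chosen inside the document.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF)
description = root.find("rdf:Description", NS)

resource = description.get("about")  # the thing being described
creator = description.findtext("dc:CREATOR", namespaces=NS)
title = description.findtext("dc:TITLE", namespaces=NS)

print(resource, "|", title, "|", creator)
```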
XML for XML
• XSL: The Extensible Stylesheet Language
• DCD: The Document Content Description Schema Language
• XLL: The Extensible Linking Language
XSL: The Extensible Stylesheet Language
• XSL Transformations
• XSL Formatting Objects
DCD: The Document Content Description Schema Language
• Data Typing in XML is Weak
• <MONTH>9</MONTH>
<DCD>
  <ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" />
</DCD>
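To a plain XML parser the content of MONTH is just the characters "9". The constraint the DCD expresses (an integer between 1 and 12) otherwise has to be enforced by application code. A hand-written sketch of that check in Python; check_month is a hypothetical helper, not part of DCD or of any library:

```python
import xml.etree.ElementTree as ET

def check_month(xml_text, minimum=1, maximum=12):
    """Return the MONTH value as an int; raise ValueError if it is not an
    integer or falls outside minimum..maximum."""
    element = ET.fromstring(xml_text)
    if element.tag != "MONTH":
        raise ValueError("expected a MONTH element")
    # int() itself raises ValueError for text such as "September".
    value = int((element.text or "").strip())
    if not minimum <= value <= maximum:
        raise ValueError(f"MONTH {value} is outside {minimum}..{maximum}")
    return value

print(check_month("<MONTH>9</MONTH>"))
```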
XLL: The Extensible Linking Language
• Any element can be a link
• Links can be bi-directional
• Links can be separated from the documents they connect
<footnote xlink:form="simple" href="footnote7.xml">7</footnote>
File Formats, In-house applications, and other behind the scenes uses
• Microsoft Office 2000
• Federal Express Web API
• Netscape What’s Related
Hello XML
<?xml version="1.0" standalone="yes"?>
<FOO>
Hello XML!
</FOO>
• Plain ASCII or UTF-8 text
• .xml is standard file extension
• Any standard text editor will work
The XML Declaration
• version attribute
– required
– always has the value 1.0
• standalone attribute
– yes
– no
• encoding attribute
– UTF-8
– 8859_1
– etc.
<?xml version="1.0" standalone="yes"?>
The FOO element
• Start tag <FOO>
• Contents "Hello XML!"
• End tag </FOO>
<FOO>
Hello XML!
</FOO>
greeting.xml
<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello XML!
</GREETING>
Style sheets
• Separate from the XML document
• Different Languages
– Cascading Style Sheets Level 1 (CSS1)
  Internet Explorer 5.0
  Mozilla 5.0
– Cascading Style Sheets Level 2 (CSS2)
  Internet Explorer 5 (partial)
  Mozilla 5.0 (partial)
– Extensible Style Language (XSL)
  Internet Explorer 5.0 (older draft, buggy)
  LotusXSL, XT, Other non-browser converters
– Document Style Semantics and Specification Language (DSSSL)
xml-stylesheet
• Style sheets are attached via an xml-stylesheet processing instruction in the prolog
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>
– type attribute has the value text/css or text/xsl
– href attribute is a URL to the stylesheet, possibly relative
• Can also use non-browser converters like
greeting.css
GREETING {display: block; font-size: 24pt; font-weight: bold}
A larger example: Baseball statistics
• Examine the data
• Design a vocabulary for the data
• Write a style sheet
Sample statistics
http://cbs.sportsline.com/u/baseball/mlb/stats.htm
Organizing the Data
• XML documents are trees.
• XML elements contain other elements as well as text
• Within these limits there's more than one way to organize the data
– Hierarchically
– Relationally
– Objects
What is the Root Element
• The League?
• The Season?
• A custom Document element?
The Root Element
<?xml version="1.0"?>
<SEASON>
</SEASON>
• Choose SEASON for the root element
• Everything else will be a descendant of SEASON
• This is not the only possible choice
What are the Immediate Children of the Root?
• Leagues?
• Teams?
• Players?
• Games?
Child Elements
<?xml version="1.0"?>
<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>
White space in XML is not especially significant
<?xml version="1.0"?>
<SEASON><YEAR>1998</YEAR></SEASON>
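A parser still hands the white space inside an element to the application, which is then free to ignore it. A short sketch, assuming Python's standard xml.etree.ElementTree module, comparing a spread-out and a compact form of the same document once the padding is stripped; year_of is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

SPREAD_OUT = """<?xml version="1.0"?>
<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>"""

COMPACT = """<?xml version="1.0"?>
<SEASON><YEAR>1998</YEAR></SEASON>"""

def year_of(xml_text):
    """Return the YEAR content with surrounding white space removed."""
    return ET.fromstring(xml_text).findtext("YEAR").strip()

# The raw text differs (the first form keeps its line breaks and indentation),
# but the stripped values are the same.
raw_spread_out = ET.fromstring(SPREAD_OUT).findtext("YEAR")
print(repr(raw_spread_out))
print(year_of(SPREAD_OUT) == year_of(COMPACT))
```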
Leagues
• Major league baseball is divided into two leagues
• Each league has
– a name
– three divisions
Divisions
• Each division has
– name
– 4-6 teams
Teams
• Each team has
– Name
– City
– Players
Player Data
• Each player has
– First name
– Last name
– Position
– Statistics
Player Batting Statistics
• G Games Played
• GS Games Started
• AB At Bats
• R Runs
• H Hits
• 2B Doubles
• 3B Triples
• HR Home Runs
• RBI Runs Batted In
• SB Stolen Bases
• CS Caught Stealing
• SH Sacrifice Hits
• SF Sacrifice Flies
• Err Errors
• PB Pitcher Balked
• BB Base on Balls (Walks)
• SO Strike Outs
• HBP Hit By Pitch
What does a player look like
• Long names vs. short names
The Complete 1998 Major League
• Long version
A Style Sheet
• 1998shortstats.xml
• baseballstats.css
• <?xml-stylesheet type="text/css" href="baseballstats.css"?>
• styled1998shortstats.xml
Cascading Style Sheets
• Partially supported by Mozilla and IE 5.0
• Full W3C Recommendation
The Default Rule
• Not every element needs a rule
• The root element should be at least display: block
SEASON {font-size: 14pt; background-color: white; color: black; display: block}
A style rule for the YEAR element
• Make it look like a title
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
Style Rules for Division and League Names
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
Alternate Style Rules for Division and League Names
LEAGUE_NAME, DIVISION_NAME {display: block; text-align: center; font-weight: bold}
LEAGUE_NAME {font-size: 28pt}
DIVISION_NAME {font-size: 24pt}
Style Rules for Teams
• Team name and Team city must be one title
• Must be inline elements
• Previous and following must be block elements
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM, PLAYER {display: block}
Style Rules for Players
TEAM {display: table}
TEAM_CITY {display: table-caption}
TEAM_NAME {display: table-caption}
PLAYER {display: table-row}
SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS,
DOUBLES, TRIPLES, HOME_RUNS, RBI, STEALS, CAUGHT_STEALING, SACRIFICE_HITS,
SACRIFICE_FLIES, ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}
Finished Style Sheet
SEASON {font-size: 14pt; background-color: white; color: black; display: block}
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM {display: block}
PLAYER {display: block}
Possible Extensions
• There should be captions like "RBI" or "At Bats."
• Derived numbers like batting averages are not included.
• The titles are short. E.g. "1998" instead of "1998 Major League Baseball".
• The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice.
• Pitcher stats should be separated from batter stats.
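CSS Level 1 can only display what is already in the document, which is why a derived number needs a program or a more capable style language. As one illustration, a batting average (hits divided by at bats) could be computed outside the style sheet. The sketch below assumes Python's standard xml.etree.ElementTree module; the player record and its numbers are made up for the example, and batting_average is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Made-up player record for illustration only; these are not real statistics.
PLAYER_XML = """<PLAYER>
  <GIVEN_NAME>Sample</GIVEN_NAME>
  <SURNAME>Player</SURNAME>
  <AT_BATS>400</AT_BATS>
  <HITS>120</HITS>
</PLAYER>"""

def batting_average(player):
    """Return hits / at bats, or None when the player has no at bats."""
    at_bats = int(player.findtext("AT_BATS", default="0"))
    hits = int(player.findtext("HITS", default="0"))
    return hits / at_bats if at_bats else None

player = ET.fromstring(PLAYER_XML)
print(f"{batting_average(player):.3f}")
```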
Possible Solutions
• CSS Level 2
• XSL
• XSL + JavaScript
Well-formedness Rules
• Open and close all tags
• Empty tags end with />
• There is a unique root element
• Elements may not overlap
• Attribute values are quoted
• < and & are only used to start tags and entities
• Only the five predefined entity references are used
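Any conforming parser rejects a document that breaks these rules, so the quickest check is to try to parse it. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<GREETING>Hello XML!</GREETING>",           # fine
    "<GREETING>Hello XML!",                      # tag never closed
    "<A><B></A></B>",                            # elements overlap
    "<A HREF=http://metalab.unc.edu/xml/></A>",  # attribute value not quoted
    "<A/><B/>",                                  # more than one root element
    "<H1>O'Reilly & Associates</H1>",            # bare & in text
]

for text in samples:
    print(is_well_formed(text), text)
```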
Open and close all tags
Empty tags end with />
• <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
• Web browsers deal inconsistently with these
• Can use <BR></BR> <HR></HR> <IMG></IMG> instead
There is a unique root element
• One element completely contains all other elements of the document
• This is HTML in HTML files
• XML Declaration is not an element
<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello XML!
</GREETING>
Elements may not overlap
• If an element contains a start tag for an element, it must also contain the corresponding end tag
• Empty elements may appear anywhere
• Every non-root element has a parent element
Attribute values are quoted
• Good:
– <A HREF="http://metalab.unc.edu/xml/">
• Bad:
– <A HREF=http://metalab.unc.edu/xml/>
< and & are only used to start tags and entities
• Good: <H1>O'Reilly &amp; Associates</H1>
• Bad: <H1>O'Reilly & Associates</H1>
• Good:
– <CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
• Bad:
– <CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
Only the five predefined entity references are used
• Good:
– &amp;
– &lt;
– &gt;
– &quot;
– &apos;
• Bad:
– &copy;
– &reg;
– &tm;
– &alpha;
– &eacute;
– &nbsp;
– etc.
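When text is generated by a program, the escaping is best left to a library routine. A small sketch, assuming Python's standard xml.sax.saxutils.escape function, which replaces &, < and > with their entity references (the quote characters are only escaped if they are passed in explicitly):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "O'Reilly & Associates"
escaped = escape(raw)  # & becomes an entity reference; the apostrophe is left alone

# The escaped text can be dropped into a document, and a parser turns it back
# into the original characters.
document = f"<H1>{escaped}</H1>"
round_trip = ET.fromstring(document).text

code = escape("for (int i = 0; i <= args.length; i++ ) {")

# Optional second argument: extra replacements, here for the two quote characters.
quoted = escape("\"'", {'"': "&quot;", "'": "&apos;"})

print(escaped)
print(code)
print(quoted)
```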
DTDs and Validity
• A Document Type Definition describes the elements and attributes that may appear in a document
• Validation compares a particular document against a DTD
• Well-formedness is a prerequisite for validity
What is a DTD?
• a list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other
• internal vs. external DTDs
The importance of validation
• Ensures that data is correct before feeding it into a program
• Ensures that a format is followed
• Establishes what must be supported
• Not all documents need to be valid; sometimes well-formed is enough
A DTD for greeting.xml
• greeting.xml:
<?xml version="1.0"?>
<GREETING>
Hello XML!
</GREETING>
• greeting.dtd:
<!ELEMENT GREETING (#PCDATA)>
Document Type Declarations
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>
Hello XML!
</GREETING>
• specifies the root element
• gives a URL for the DTD
Invalid Documents
• Valid:
<GREETING>
various random text but no markup
</GREETING>
• Invalid: anything else including
<GREETING>
  <sometag>various random text</sometag>
  <someEmptyTag/>
</GREETING>
– or
<GREETING>
  <GREETING>various random text</GREETING>
</GREETING>
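Python's standard library parses XML but does not validate against a DTD, so the sketch below is only a hand-written stand-in for the single rule <!ELEMENT GREETING (#PCDATA)>, not a real validator. It assumes the xml.etree.ElementTree module; is_valid_greeting is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

def is_valid_greeting(xml_text):
    """Mimic <!ELEMENT GREETING (#PCDATA)>: the root must be GREETING
    and may hold text but no child elements."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid either
    return root.tag == "GREETING" and len(root) == 0

print(is_valid_greeting("<GREETING>various random text but no markup</GREETING>"))
print(is_valid_greeting("<GREETING><sometag>various random text</sometag><someEmptyTag/></GREETING>"))
print(is_valid_greeting("<GREETING><GREETING>various random text</GREETING></GREETING>"))
```

A validating parser does the same job for every declaration in the DTD at once.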
Validating Tools
• Command line programs like XJParse
• Online validators
– http://www.stg.brown.edu/service/xmlvalid/
– http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html
• Browsers
Element Declarations
• Each tag must be declared in a <!ELEMENT> declaration.
• A <!ELEMENT> declaration gives the name and content model of the element
• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
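The resemblance to regular expressions can be made concrete. The sketch below takes one content model, (DIVISION_NAME, TEAM+), translates it by hand into a Python regular expression over the list of child element names, and tests elements against it. This is an illustration of the idea, not how a validating parser is implemented; children_match and the sample team data are assumptions made for the example:

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model (DIVISION_NAME, TEAM+).
# Each child name is followed by one space so that TEAM cannot be
# confused with a longer name such as TEAM_CITY.
DIVISION_MODEL = re.compile(r"(DIVISION_NAME )(TEAM )+")

def children_match(element, model):
    """Return True if the sequence of child element names fits the model."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None

good = ET.fromstring(
    "<DIVISION><DIVISION_NAME>East</DIVISION_NAME><TEAM/><TEAM/></DIVISION>")
no_team = ET.fromstring(
    "<DIVISION><DIVISION_NAME>East</DIVISION_NAME></DIVISION>")
wrong_order = ET.fromstring(
    "<DIVISION><TEAM/><DIVISION_NAME>East</DIVISION_NAME></DIVISION>")

print(children_match(good, DIVISION_MODEL))
print(children_match(no_team, DIVISION_MODEL))
print(children_match(wrong_order, DIVISION_MODEL))
```

The comma in the content model becomes concatenation and the + suffix carries over unchanged.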
Content Specifications
• ANY
• #PCDATA
• Sequences
• Choices
• Mixed Content
• Modifiers
• Empty
ANY
<!ELEMENT SEASON ANY>
• A SEASON can contain any child element and/or raw text (parsed character data)
#PCDATA
<!ELEMENT YEAR (#PCDATA)>
• Parsed Character Data; i.e. raw text, no markup
#PCDATA
• Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR>
The year of our Lord one thousand, nine hundred, and ninety-nine
</YEAR>
• Invalid:
<YEAR>
  <MONTH>January</MONTH>
  <MONTH>February</MONTH>
  <MONTH>March</MONTH>
  <MONTH>April</MONTH>
  <MONTH>May</MONTH>
  <MONTH>June</MONTH>
  <MONTH>July</MONTH>
  <MONTH>August</MONTH>
  <MONTH>September</MONTH>
  <MONTH>October</MONTH>
  <MONTH>November</MONTH>
  <MONTH>December</MONTH>
</YEAR>
Child Elements
• To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>
Sequences
• Separate multiple required child elements with commas; e.g.
<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>
One or More Children +
<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
Zero or More Children *
<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>
Zero or One Children ?
<!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES,
  GAMES_STARTED, AT_BATS?, RUNS?, HITS?, DOUBLES?, TRIPLES?,
  HOME_RUNS?, RBI?, STEALS?, CAUGHT_STEALING?, SACRIFICE_HITS?,
  SACRIFICE_FLIES?, ERRORS?, WALKS?, STRUCK_OUT?, HIT_BY_PITCH?,
  WINS?, LOSSES?, SAVES?, COMPLETE_GAMES?, SHUT_OUTS?, ERA?,
  INNINGS?, EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?, BALK?,
  WALKED_BATTER?, STRUCK_OUT_BATTER?)>
Finished DTD
Choices
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
Grouping With Parentheses
• Parentheses combine several elements into a single element.
• Parenthesized elements can be nested inside other parentheses in place of a single element.
• The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
Mixed Content
• Both #PCDATA and child elements in a choice
<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>
• #PCDATA must come first
• #PCDATA cannot be used in a sequence
Empty elements
<!ELEMENT BR EMPTY>
<!ELEMENT IMG EMPTY>
<!ELEMENT HR EMPTY>
Internal DTDs
<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>
Internal DTD Subsets
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>
• Internal declarations override external declarations (in XML 1.0 this applies to entity and attribute declarations; a valid document declares each element type only once)
Programming with XML
• Java works best
• C, Perl, Python etc. can also be used
• Unicode support is the biggest issue
SAX, the Simple API for XML
• Event based
• Programs can plug in different parsers
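In an event-based API the parser calls back into the program as it meets each piece of the document, and nothing is kept in memory unless the program keeps it. The slides recommend Java; the sketch below uses Python's standard xml.sax module instead, purely as an illustration. ElementCounter is a hypothetical handler written for this example:

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(
    b"<SEASON><YEAR>1998</YEAR><LEAGUE/><LEAGUE/></SEASON>", handler)

print(handler.counts)
```

The handler depends only on the callback interface, so the underlying parser can be replaced without touching it.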
The Document Object Model (DOM)
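Where SAX reports events, the DOM builds the whole document as a tree of nodes that a program can walk and modify. A short sketch using Python's standard xml.dom.minidom module, chosen here only for illustration:

```python
from xml.dom import minidom

doc = minidom.parseString("<SEASON><YEAR>1998</YEAR></SEASON>")
root = doc.documentElement

# Navigate the tree: find the YEAR element and read its text node.
year = root.getElementsByTagName("YEAR")[0]
year_text = year.firstChild.data

# Modify the tree: create a new element and attach it to the root.
league = doc.createElement("LEAGUE")
root.appendChild(league)

print(root.tagName, year_text)
print(root.toxml())
```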
To Learn More: Books
• XML: Extensible Markup Language
– IDG Books 1998
– ISBN 0-76453-199-9
• The XML Bible
– IDG Books 1999
– ISBN 0-76453-236-7
Questions?