August 2006 1
Chapter 2 - Markup and Core Concepts Learning XML
by Erik T. Ray
Slides were developed by Jack Davis
College of Information Science and Technology
Radford University
XML Syntax
• “Syntax” refers to the rules of a language
• Syntax is needed with any language so that the documents created with that language are consistent
• Programs that process documents expect the syntax rules to be followed; otherwise, the document may not be interpreted correctly
Components of an XML Document
• XML Declaration
• Elements
• Attributes
• Entities
• Comments
Components: The XML Declaration
• The XML Declaration:
– Tells the processing program that the document is an XML document, along with other optional information
– The declaration is optional, but when present it must be the very first thing in the document, with nothing (not even whitespace) before it
– Attributes that can be used in the declaration, in this order:
• version (required)
• encoding
• standalone
– Example: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
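As a rough illustration of how a processing program sees the declaration, the sketch below uses Python's standard xml.parsers.expat module to report the three attributes. The helper name read_declaration and the <doc/> element are made up for this example.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the version, encoding, and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk of input
    return found


declaration = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>'
)
print(declaration)
```

Putting even a single space before <?xml makes the same parser raise an error, which is why the declaration has to come first.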
Document Type Declaration
• A document type declaration serves two purposes. First, it defines entities and default attribute values. Second, it supports validation, a special mode of parsing that checks the grammar and vocabulary of the markup: a validating parser must read the declarations for the element rules before it can begin to parse. In both cases, the information goes in the document type declaration section.
• A document type declaration consists of:
– delimiter: <!DOCTYPE
– element name: identifies the document (root) element
– DTD identifier: a local path or URL
– entity declarations: an optional list of entity declarations
• The DTD identifier supports two methods of identification: system-specific and public
<!DOCTYPE doc SYSTEM "/usr/simple.dtd">
<!DOCTYPE html PUBLIC "-//w3c//DTD HTML 3.2//EN" "http://www.w3.org/TR … >
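A small sketch of how the parts of a document type declaration can be read back, using Python's standard xml.dom.minidom. The public identifier "-//Example//DTD Simple 1.0//EN" and the file name simple.dtd in the second document are invented for illustration; the parser used here does not fetch or validate against either DTD.

```python
from xml.dom import minidom

# System-specific identifier: only a path or URL is given.
system_doc = minidom.parseString('<!DOCTYPE doc SYSTEM "/usr/simple.dtd"><doc/>')
system_type = system_doc.doctype
print(system_type.name, system_type.systemId, system_type.publicId)

# Public identifier followed by a system identifier (both hypothetical).
public_doc = minidom.parseString(
    '<!DOCTYPE doc PUBLIC "-//Example//DTD Simple 1.0//EN" "simple.dtd"><doc/>'
)
public_type = public_doc.doctype
print(public_type.name, public_type.publicId, public_type.systemId)
```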
Components: XML Elements
• Elements:
– Used to describe the data. Consist of:
• A start tag
• Content
• An end tag
– Example: <element>Content</element>
– The “root” element of a document is the outermost element, and contains all of the other elements in the document. There can be only one root element in a single document
• An element that does not contain any content is known as an “empty element” and may be written as a single tag, for example <element/>
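To make the start tag, content, and end tag concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It parses the slide's example element and an empty element; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# An element with a start tag, content, and an end tag.
root = ET.fromstring("<element>Content</element>")
print(root.tag)   # the element name taken from the tags
print(root.text)  # the content between the tags

# An empty element has no content and no child elements.
empty = ET.fromstring("<element/>")
print(empty.text, len(empty))
```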
Element Nesting
• The term “nesting” refers to the process of containing elements within other elements
• Terminology:
– Child elements – elements that are contained within other elements
– Parent elements – elements that contain other elements
– Sibling elements – elements that share the same parent element
Nesting Example
<family_tree>
  <mother>Sally</mother>
  <father>Joe</father>
  <children>
    <child>Larry</child>
    <child>Curly</child>
    <child>Mo</child>
  </children>
</family_tree>
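The parent, child, and sibling relationships in this example can be explored with a short sketch, again assuming Python's standard xml.etree.ElementTree as the parser. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

FAMILY_TREE = """
<family_tree>
  <mother>Sally</mother>
  <father>Joe</father>
  <children>
    <child>Larry</child>
    <child>Curly</child>
    <child>Mo</child>
  </children>
</family_tree>
"""

root = ET.fromstring(FAMILY_TREE)

# Child elements of the root; they are siblings of one another.
top_level = [child.tag for child in root]

# <children> is the parent of the three <child> elements.
children = root.find("children")
sibling_names = [child.text for child in children]

print(root.tag, top_level, sibling_names)
```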
Components: XML Attributes
• Attributes help to describe XML elements
• Attributes are always contained in the start tag of the element they are describing
• Attributes are known as “name-value pairs”
• Example: address="123 Main Street"
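A brief sketch of reading a name-value pair from a start tag with Python's standard xml.etree.ElementTree. The <customer> element and its content are assumptions added for the example; only the address attribute comes from the slide.

```python
import xml.etree.ElementTree as ET

# The attribute lives in the start tag of the element it describes.
customer = ET.fromstring('<customer address="123 Main Street">Sally</customer>')

print(customer.attrib)          # all name-value pairs as a dictionary
print(customer.get("address"))  # the value for one attribute name
```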
Components: XML Entities
• Two types of entities:
– General – placeholders for information contained in the XML document
– Parameter – used within a DTD to reference a grouping of elements
• Three types of general entities:
– Character – used in place of special characters
– Content – used for blocks of frequently used text
– Unparsed – used for binary or non-text data, like image files
Examples of Entities
• Character entity:
– Character: >
– Entity reference: &gt; or &#62;
– Usage: <formula> x &gt; y </formula>
• Content entity:
– Declaration:
<!ENTITY address "123 Main St">
– Usage: <ship_address> &address; </ship_address>
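• Unparsed entity:
– Declaration:
<!ENTITY image SYSTEM "sunset.gif" NDATA GIF>
– Usage: an unparsed entity is referenced by name from an attribute declared with type ENTITY, not with an & reference in content (the attribute name source is illustrative):
<picture source="image"/>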
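The sketch below shows character and content entities being expanded by a parser, assuming Python's standard xml.etree.ElementTree. The <order> wrapper element is made up for the example; the address entity is declared in the internal subset of a document type declaration so the parser knows its replacement text.

```python
import xml.etree.ElementTree as ET

# Character entity: &gt; (or the numeric form &#62;) stands for ">".
formula = ET.fromstring("<formula> x &gt; y </formula>")
numeric = ET.fromstring("<formula> x &#62; y </formula>")

# Content entity: declared once, then referenced as &address;.
ORDER = """<!DOCTYPE order [
  <!ENTITY address "123 Main St">
]>
<order><ship_address>&address;</ship_address></order>"""

order = ET.fromstring(ORDER)
ship_address = order.find("ship_address").text

print(formula.text, numeric.text, ship_address)
```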
Components: Comments
• An XML comment is ignored by applications that process XML
• Comments are commonly used for documentation, or to add information for others viewing the document
• The content of the comment is surrounded by special comment tags: <!-- and -->
• Example: <!-- This is a comment -->
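One way to see a comment being ignored is to parse a document that contains one. This sketch assumes Python's standard xml.etree.ElementTree, which drops comments by default; the <note> element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note><!-- This is a comment -->Call Sally</note>")

# The comment is not part of the parsed content.
print(note.text)
print(len(note))  # number of child elements
```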
Well-Formed XML Documents
• A “well-formed” document is one which adheres to the syntax rules for XML:
– An XML document contains one root element
– All elements must have a start tag and a matching end tag; empty elements may instead use a single self-closing tag such as <element/>
– Elements must be properly nested
– All attributes must have a value, and the value must be enclosed in quotes
– Attributes can only appear in the start tag and must be unique to that element
– Element names are case-sensitive
– The special characters < and & must be written as entity references (&lt; and &amp;) when they appear in content
– Element names can start only with a letter or an underscore, and can contain letters, digits, hyphens, periods, and underscores
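The rules above can be tried out with a small checker. This is only a sketch: it assumes Python's standard xml.etree.ElementTree, the helper name is_well_formed is hypothetical, and each sample string is written to break exactly one rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One deliberately broken document per rule.
BROKEN = {
    "two root elements": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "attribute without a value": "<a checked/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "mismatched case": "<Doc></doc>",
    "unescaped ampersand": "<a>Tom & Jerry</a>",
    "name starting with a digit": "<1st/>",
}

results = {rule: is_well_formed(text) for rule, text in BROKEN.items()}
for rule, accepted in results.items():
    print(rule, "->", "accepted" if accepted else "rejected")

print(is_well_formed("<a><b>Tom &amp; Jerry</b></a>"))
```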
XML Parsers
• A “parser” is a program that checks the syntax of an XML document to ensure that the document is well-formed
• Two types of parsers:
– Non-validating – checks only that the document is well-formed
– Validating – checks that the document is well-formed and also verifies it against a DTD or Schema
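The difference between the two kinds of parser shows up with a document that is well-formed but breaks its own DTD. The sketch below assumes Python's standard xml.etree.ElementTree, which is built on a non-validating parser; the element names doc and extra are invented. A validating parser would be expected to reject this document because the DTD allows only text inside <doc>.

```python
import xml.etree.ElementTree as ET

# The DTD says <doc> may contain only text, yet the document adds an <extra/> child.
WELL_FORMED_BUT_INVALID = """<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
]>
<doc><extra/></doc>"""

# A non-validating parser checks syntax only, so parsing succeeds.
doc = ET.fromstring(WELL_FORMED_BUT_INVALID)
child_tags = [child.tag for child in doc]
print(doc.tag, child_tags)
```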