
Using XML in .NET

Sorting through the choices

XML and APIs

• XML is a series of W3C recommendations

• Focus on structure and behavior

• W3C says very little about API implementation

• The only real API defined by the W3C is the Document Object Model (a.k.a. DOM)

• W3C does not mandate any particular API

Relevant W3C Recommendations

• Extensible Markup Language 1.0
  – Defines syntax
  – Defines document node types
    Core content represented as Elements, Attributes and Text
    Non-core content represented as Comments and Processing Instructions

W3C Recommendations (cont.)

• Namespaces
  – Define rules for unique element and attribute naming
  – Uniqueness achieved through the use of Uniform Resource Identifiers (URIs)
  – All .NET APIs support namespaces

• XML Information Set (Infoset)
  – Defines information items to represent XML node types
  – Abstracts data relationships from syntax
  – Abstracts application from document encoding
    Content represented to applications as Unicode
  – Most higher-level recommendations are based on the Infoset
  – .NET APIs mostly represent content consistently with the Infoset

• XPath
  – XML document query language
  – Syntactically similar to directory paths
    /Invoices/Invoice[descendant::Price > 50]
  – Identify what you want, not how to get it
    Loosely analogous to SQL
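The invoice query from the XPath bullet can be run through XmlDocument.SelectNodes. This is a minimal C# sketch. The Invoices document, its Line elements and its id attributes are hypothetical sample data. A node-set compared to a number is true if any node in the set satisfies the comparison. That is why invoice C matches on its second line.

```csharp
using System;
using System.Xml;

// Hypothetical sample document; only Invoices/Invoice/Price come from the slide.
string xml = @"<Invoices>
  <Invoice id='A'><Line><Price>20</Price></Line></Invoice>
  <Invoice id='B'><Line><Price>75</Price></Line></Invoice>
  <Invoice id='C'><Line><Price>10</Price></Line><Line><Price>60</Price></Line></Invoice>
</Invoices>";

XmlDocument dom = new XmlDocument();
dom.LoadXml(xml);

// The query states *what* to select; the XPath engine decides how to walk the tree.
XmlNodeList hits = dom.SelectNodes("/Invoices/Invoice[descendant::Price > 50]");

foreach (XmlElement inv in hits)
    Console.WriteLine("Invoice {0}", inv.GetAttribute("id"));
```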

Verifying Document Content

• Well-formed
  – Document conforms to the XML 1.0 recommendation
  – All .NET APIs enforce well-formedness

• Validation
  – Document conforms to defined content rules
  – .NET supported validation types
    Document Type Definitions (DTD), W3C XML Schema, XML-Data Reduced (XDR)
  – Not all .NET APIs support validation
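This minimal sketch shows well-formedness enforcement through the DOM loader. XmlDocument.LoadXml rejects malformed text with an XmlException that reports where parsing failed. IsWellFormed is a hypothetical helper name used only for illustration.

```csharp
using System;
using System.Xml;

// Well-formed: every start tag has a matching end tag.
bool ok = IsWellFormed("<Class><Students>12</Students></Class>");
// Not well-formed: </Class> arrives while <Students> is still open.
bool bad = IsWellFormed("<Class><Students>12</Class>");
Console.WriteLine("ok={0} bad={1}", ok, bad);

// Hypothetical helper: true if the text parses as XML.
static bool IsWellFormed(string text)
{
    try
    {
        new XmlDocument().LoadXml(text);
        return true;
    }
    catch (XmlException ex)
    {
        Console.WriteLine("Rejected at line {0}, position {1}: {2}",
            ex.LineNumber, ex.LinePosition, ex.Message);
        return false;
    }
}
```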

.NET XML APIs

• Document Object Model (DOM)
  – Implements the W3C DOM recommendation
  – A collection of classes representing the various Infoset information items
  – Indirectly supports validation
  – Relatively resource intensive
  – Only API that both reads & writes
  – Excellent for random access processing
  – Easiest migration path for developers experienced with MSXML's DOMDocument

.NET XML APIs (cont.)

• XPathNavigator
  – A scrollable, cursor-based document reader
  – Indirectly supports document validation
  – May be more resource efficient than the DOM
  – Well-suited for processing document subsets

• XmlReader
  – Abstract (MustOverride) forward-only document reader
  – Well-suited for sequential processing
  – XmlTextReader
    Derived from XmlReader
    Fastest .NET document reader
    Does not support validation
  – XmlValidatingReader
    Derived from XmlReader
    Directly supports validation

• XmlWriter
  – Abstract (MustOverride) forward-only document writer
  – Well-suited for sequential document generation
  – XmlTextWriter
    Derived from XmlWriter
    Fastest .NET document writer
    Does not support validation

Where’s the SAX Parser?

• .NET does not provide a SAX parser

• Benefits of SAX available through XmlReader implementations

• Microsoft asserts that XmlReader provides several benefits over SAX
  – XmlReader's "pull" model is simpler to program than SAX's "push" model
  – The pull model allows a program to be optimized for a specific document structure
  – Programming is simpler when multiple documents are processed simultaneously
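The multiple-document benefit is easiest to see in code. This sketch pulls element names from two XmlTextReader instances in lockstep and checks whether both documents have the same element sequence. The helpers NextElement and SameElementSequence are hypothetical. With a push parser, the same comparison would typically need buffering or a second thread, because each parser drives its own callbacks.

```csharp
using System;
using System.IO;
using System.Xml;

bool same = SameElementSequence("<a><b/><c/></a>", "<a><b>text</b><c/></a>");
bool diff = SameElementSequence("<a><b/></a>", "<a><c/></a>");
Console.WriteLine("{0} {1}", same, diff);

// Hypothetical helper: advance to the next start tag; false at end of input.
static bool NextElement(XmlReader r)
{
    while (r.Read())
        if (r.NodeType == XmlNodeType.Element)
            return true;
    return false;
}

// Hypothetical helper: the caller pulls from both readers in turn.
static bool SameElementSequence(string a, string b)
{
    using XmlReader ra = new XmlTextReader(new StringReader(a));
    using XmlReader rb = new XmlTextReader(new StringReader(b));
    while (true)
    {
        bool moreA = NextElement(ra);
        bool moreB = NextElement(rb);
        if (moreA != moreB) return false;      // one document ended first
        if (!moreA) return true;               // both ended together
        if (ra.Name != rb.Name) return false;  // element names diverge
    }
}
```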

Document Object Model

Document Object Model (DOM)

• A hierarchy of classes representing the various document nodes

• Classes in System.Xml

• Well-suited for random access and dynamic modification

• All node classes inherit from XmlNode

• The only creatable class is XmlDocument

• Contents validated if populated through XmlValidatingReader

.NET DOM Class Hierarchy (classes in System.Xml)

System.Object
  XmlNode
    XmlAttribute
    XmlDocument
    XmlDocumentFragment
    XmlEntity
    XmlNotation
    XmlLinkedNode
      XmlCharacterData
        XmlCDataSection
        XmlComment
        XmlSignificantWhitespace
        XmlText
        XmlWhitespace
      XmlDeclaration
      XmlDocumentType
      XmlElement
      XmlEntityReference
      XmlProcessingInstruction

XmlNode

• XmlNode
  – Abstract (MustOverride) class that generically represents each node
  – Properties & methods to manage node relationships
  – Properties expose node type, name, namespace and content
  – Meaning of name and content varies depending on node type

XmlNode Name & Value Definitions

NodeType                 Name              Value
Element                  Element QName     null
Attribute                Attribute QName   Attribute value
Text                     #text             Text
Document                 #document         null
XmlDeclaration           xml               Declaration content
Comment                  #comment          Comment text
ProcessingInstruction    PI target         PI data
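The table can be checked against a live document. This sketch loads a hypothetical one-line document that contains one node of each listed type. It then prints each node's NodeType, Name and Value.

```csharp
using System;
using System.Xml;

XmlDocument dom = new XmlDocument();
dom.LoadXml("<?xml version=\"1.0\"?><!--note--><Class name=\"XML\">Jim<?fmt bold?></Class>");

XmlNode decl = dom.FirstChild;                 // XmlDeclaration
XmlNode comment = decl.NextSibling;            // Comment
XmlElement cls = dom.DocumentElement;          // Element
XmlAttribute attr = cls.Attributes["name"];    // Attribute
XmlNode text = cls.FirstChild;                 // Text
XmlNode pi = text.NextSibling;                 // ProcessingInstruction

foreach (XmlNode n in new XmlNode[] { dom, decl, comment, cls, attr, text, pi })
    Console.WriteLine("{0,-22} Name={1,-10} Value={2}",
        n.NodeType, n.Name, n.Value ?? "(null)");
```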

XmlNode Relationships

• Relationships managed through read-only XmlNode reference properties
  – OwnerDocument
  – ParentNode
  – Siblings
    PreviousSibling, NextSibling
  – Children
    FirstChild, LastChild

DOM Tree Walker

static void Main(string[] args)
{
    XmlDocument dom = new XmlDocument();
    dom.Load(@"C:\DataFiles\Classes.xml");
    TreeWalk(dom);
}

// Visit a node, then its first child, then its next sibling (depth-first)
static void TreeWalk(XmlNode node)
{
    if (node == null) return;
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
        node.NodeType, node.Name, node.Value);
    TreeWalk(node.FirstChild);
    TreeWalk(node.NextSibling);
}

XmlNode Collections

• Child nodes
  – ChildNodes property returns an XmlNodeList
    Nodes accessed either through the Item method or the indexer []
  – XmlNode also exposes an indexer [] that returns the first child element with a given name
    Invoice["Price"]
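This minimal sketch shows the three access paths described above. It uses the list's Item method, the list indexer, and XmlNode's string indexer. The Invoice document is hypothetical sample data.

```csharp
using System;
using System.Xml;

// Hypothetical invoice document used only for illustration.
XmlDocument dom = new XmlDocument();
dom.LoadXml("<Invoice><Item>Widget</Item><Price>42.50</Price></Invoice>");
XmlElement invoice = dom.DocumentElement;

XmlNodeList children = invoice.ChildNodes;
XmlNode viaItem = children.Item(1);   // Item method, zero-based
XmlNode viaIndexer = children[1];     // the same node through the indexer

// XmlNode's string indexer returns the first child element with that name.
XmlElement price = invoice["Price"];

Console.WriteLine("{0} children; Price = {1}", children.Count, price.InnerText);
```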

XmlNode Collections (cont.)

• Attribute nodes
  – Attributes property returns an XmlAttributeCollection
    Attributes accessed through an indexer [] by either name or position
    Attributes added or changed through the SetNamedItem method
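This sketch shows the XmlAttributeCollection operations listed above on a hypothetical Class element. It looks attributes up by name and by position. It then calls SetNamedItem twice: once to replace an existing attribute and once to add a new one. The "level" and "room" attributes are invented for illustration.

```csharp
using System;
using System.Xml;

XmlDocument dom = new XmlDocument();
dom.LoadXml("<Class name=\".NET XML\" level=\"200\"/>");
XmlElement cls = dom.DocumentElement;
XmlAttributeCollection attrs = cls.Attributes;

string byName = attrs["name"].Value;   // indexer by name
string byPos = attrs[1].Value;         // indexer by position (document order)

// SetNamedItem replaces an attribute with the same name...
XmlAttribute level = dom.CreateAttribute("level");
level.Value = "300";
attrs.SetNamedItem(level);

// ...or adds the attribute if no such name exists yet (hypothetical attribute).
XmlAttribute room = dom.CreateAttribute("room");
room.Value = "B12";
attrs.SetNamedItem(room);

Console.WriteLine(cls.OuterXml);
```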

DOM Tree Walker Using Collections

static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
        node.NodeType, node.Name, node.Value);
    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}

DOM Tree Walker Using Collections (with attributes)

static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
        node.NodeType, node.Name, node.Value);

    // Attributes is null for nodes that cannot carry attributes
    if (node.Attributes != null)
        foreach (XmlAttribute attr in node.Attributes)
            Console.WriteLine("\tAttr: {0}={1}", attr.Name, attr.Value);

    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}

XPath Support

• SelectNodes
  – Returns an XmlNodeList containing the selection result

• SelectSingleNode
  – Returns an XmlNode containing only the first node in the selection result (null if nothing matches)

void ShowStudentNames(XmlNode node)
{
    XmlNodeList nl = node.SelectNodes("Student/@Name");
    foreach (XmlNode n in nl)
        Console.WriteLine("Student: {0}", n.Value);
}
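This sketch runs SelectNodes and SelectSingleNode side by side on a hypothetical two-class document. It shows that SelectSingleNode returns only the first match, and null when the expression matches nothing.

```csharp
using System;
using System.Xml;

XmlDocument dom = new XmlDocument();
dom.LoadXml(
    "<Classes>" +
    "<Class name=\".NET XML\"><Students>12</Students></Class>" +
    "<Class name=\"XSLT\"><Students>8</Students></Class>" +
    "</Classes>");

XmlNodeList all = dom.SelectNodes("/Classes/Class/@name");       // every match
XmlNode first = dom.SelectSingleNode("/Classes/Class/@name");    // first match only
XmlNode none = dom.SelectSingleNode("/Classes/Class[@name='Perl']");  // no match

Console.WriteLine("{0} names; first = {1}; missing is null: {2}",
    all.Count, first.Value, none == null);
```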

DOM Modification

• Node creation
  – New nodes must be created through XmlDocument factory methods (CreateElement, CreateTextNode, etc.)

• XmlNode placement methods
  – InsertBefore, InsertAfter
  – PrependChild, AppendChild
  – RemoveChild, ReplaceChild, RemoveAll

• Modification events managed through delegates
  – NodeChanging, NodeChanged
  – NodeInserting, NodeInserted
  – NodeRemoving, NodeRemoved
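This sketch subscribes to two of the modification events with XmlNodeChangedEventHandler delegates written as lambdas, then inserts and removes a node. The log wording is hypothetical. Note that CreateElement alone raises no event; only placing the node in the tree does.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument dom = new XmlDocument();
dom.LoadXml("<Classes/>");
List<string> log = new List<string>();

// Handlers receive XmlNodeChangedEventArgs describing the change.
dom.NodeInserted += (sender, e) => log.Add("Inserted " + e.Node.Name + " into " + e.NewParent.Name);
dom.NodeRemoving += (sender, e) => log.Add("Removing " + e.Node.Name);

XmlElement c = dom.CreateElement("Class");   // created, not yet in the tree
dom.DocumentElement.AppendChild(c);          // raises NodeInserting / NodeInserted
dom.DocumentElement.RemoveChild(c);          // raises NodeRemoving / NodeRemoved

foreach (string entry in log) Console.WriteLine(entry);
```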

DOM Modification

static void BuildClass(XmlDocument dom, string Path)
{
    XmlNode n;   // reused for each child element
    dom.AppendChild(dom.CreateXmlDeclaration("1.0", "utf-8", null));

    XmlElement cs = dom.CreateElement("Classes");
    dom.AppendChild(cs);
    XmlElement c = dom.CreateElement("Class");
    c.SetAttribute("name", ".NET XML");
    cs.AppendChild(c);

    n = c.AppendChild(dom.CreateElement("Students"));
    n.AppendChild(dom.CreateTextNode("12"));
    n = c.AppendChild(dom.CreateElement("Location"));
    n.AppendChild(dom.CreateTextNode("Maine Bytes"));
    n = c.AppendChild(dom.CreateElement("Instructor"));
    n.AppendChild(dom.CreateTextNode("Jim"));

    dom.Save(Path);
}

<?xml version="1.0" encoding="utf-8"?>
<Classes>
  <Class name=".NET XML">
    <Students>12</Students>
    <Location>Maine Bytes</Location>
    <Instructor>Jim</Instructor>
  </Class>
</Classes>

XPathNavigator

• Classes in System.Xml.XPath

• Read-only

• Provides a scrolling cursor “window” over the document

• Great support for document filtering

• Best XPath support

• Content is interpreted according to XPath specification

Creating the Navigator

• XPathNavigator is an abstract (MustOverride) class

• Must be created by a factory object

• Factory objects must implement IXPathNavigable
  – XPathDocument implementation creates an efficient navigator cache
    Can be populated from XmlValidatingReader
  – XmlNode implementation creates a navigator over the corresponding DOM instance
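Both factories from the list can be exercised through the IXPathNavigable interface. This minimal sketch builds one navigator from an XPathDocument and one from an XmlDocument over the same hypothetical document. It then asks both navigators the same XPath question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

const string xml = "<Classes><Class name=\".NET XML\"/></Classes>";

// Factory 1: XPathDocument, a cache optimized for XPath navigation.
IXPathNavigable cache = new XPathDocument(new StringReader(xml));
XPathNavigator fromCache = cache.CreateNavigator();

// Factory 2: an XmlDocument; its navigator works over the DOM instance.
XmlDocument dom = new XmlDocument();
dom.LoadXml(xml);
IXPathNavigable domSource = dom;
XPathNavigator fromDom = domSource.CreateNavigator();

string a = (string)fromCache.Evaluate("string(/Classes/Class/@name)");
string b = (string)fromDom.Evaluate("string(/Classes/Class/@name)");
Console.WriteLine("{0} | {1}", a, b);
```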

XPathNavigator Name & Value Definitions

NodeType                 Name              Value
Element                  Element QName     Concatenated value of descendant text nodes in document order
Attribute                Attribute QName   Attribute value
Text                     (empty string)    Text
Root                     (empty string)    Concatenated value of all text nodes in document order
Comment                  (empty string)    Comment text
ProcessingInstruction    PI target         PI data
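This sketch checks the Element, Root and Text rows of the table on a hypothetical document. The element value concatenates the descendant text and skips the comment. The text node's Name is an empty string.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

XPathNavigator nav = new XPathDocument(
    new StringReader("<Class>Jim <b>Wilson</b><!--x--></Class>")).CreateNavigator();

string rootValue = nav.Value;    // root: all text nodes in document order
nav.MoveToFirstChild();          // now on <Class>
string elemName = nav.Name;
string elemValue = nav.Value;    // descendant text only; the comment is excluded
nav.MoveToFirstChild();          // now on the text node "Jim "
string textName = nav.Name;

Console.WriteLine("[{0}] [{1}] [{2}] [{3}]", rootValue, elemName, elemValue, textName);
```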

Cursor Navigation

• Cursor is controlled by "MoveTo…" methods
  – MoveToRoot
  – MoveToParent
  – MoveToFirstChild
  – Siblings
    MoveToFirst, MoveToPrevious, MoveToNext
  – Attributes
    MoveToFirstAttribute, MoveToNextAttribute

Navigator Tree Walker

static void Main(string[] args)
{
    XPathDocument doc = new XPathDocument(Path);   // Path: document file name, defined elsewhere
    XPathNavigator nav = doc.CreateNavigator();
    TreeWalk(nav);
}

static void TreeWalk(XPathNavigator nav)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
        nav.NodeType, nav.Name, nav.Value);
    if (nav.HasChildren)
    {
        nav.MoveToFirstChild();
        TreeWalk(nav);        // walks the children, leaving nav on the last child
        nav.MoveToParent();
    }
    if (nav.MoveToNext())
        TreeWalk(nav);
}

Navigator Tree Walker (with attributes)

static void TreeWalk(XPathNavigator nav)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
        nav.NodeType, nav.Name, nav.Value);
    // MoveToNextAttribute fails unless already on an attribute,
    // so enter the attribute list with MoveToFirstAttribute
    if (nav.MoveToFirstAttribute())
    {
        do
            Console.WriteLine("\tAttr: {0}={1}", nav.Name, nav.Value);
        while (nav.MoveToNextAttribute());
        nav.MoveToParent();
    }
    if (nav.HasChildren)
    {
        nav.MoveToFirstChild();
        TreeWalk(nav);
        nav.MoveToParent();
    }
    if (nav.MoveToNext())
        TreeWalk(nav);
}

Processing Document Subsets

• Selection methods
  – SelectChildren, SelectDescendants, SelectAncestors
  – All return an XPathNodeIterator

• XPathNodeIterator
  – Represents a cursor over the selected set
  – MoveNext method
    Advances the cursor
  – Current property
    Returns an XPathNavigator positioned at the current node
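This sketch uses SelectDescendants to total the Students counts across a hypothetical Classes document. It passes an element name, an empty namespace URI, and matchSelf = false. The iterator's MoveNext/Current pair does the walking, and the original navigator stays where it was.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

XPathNavigator nav = new XPathDocument(new StringReader(
    "<Classes><Class><Students>12</Students></Class>" +
    "<Class><Students>8</Students></Class></Classes>")).CreateNavigator();

// All descendant <Students> elements (no namespace), excluding nav itself.
XPathNodeIterator it = nav.SelectDescendants("Students", "", false);
int matches = it.Count;               // size of the selected set

int total = 0;
while (it.MoveNext())                 // advance the cursor
    total += int.Parse(it.Current.Value);   // Current sits on the matched node

Console.WriteLine("{0} classes, {1} students", matches, total);
```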

XPath Support

• Select method
  – Generic XPath selection
  – Node set returned as an XPathNodeIterator

• Evaluate method
  – Returns typed XPath results
  – XPath supports arithmetic operations, logical operations and function calls
  – XPath statements can return numeric, string and Boolean results
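This sketch shows what Evaluate returns for each kind of result on a hypothetical two-class document. A number comes back as a double, a string as a string, a Boolean as a bool, and a node set as an XPathNodeIterator.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

XPathNavigator nav = new XPathDocument(new StringReader(
    "<Classes><Class name='XML'><Students>12</Students></Class>" +
    "<Class name='XSLT'><Students>8</Students></Class></Classes>")).CreateNavigator();

object number = nav.Evaluate("sum(//Students) div count(//Class)");   // arithmetic + functions
object text = nav.Evaluate("concat((//Class/@name)[1], '/', (//Class/@name)[2])");
object flag = nav.Evaluate("count(//Class) > 1 and //Students > 10"); // logical operators
object nodes = nav.Evaluate("//Class");                                // node set

foreach (object r in new[] { number, text, flag, nodes })
    Console.WriteLine("{0,-20} {1}", r.GetType().Name, r);
```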

Navigator XPath Support

static void LargeClasses(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("//Class[Students > 10]");
    while (nodes.MoveNext())
    {
        // Clone before moving off the selected node; moving Current
        // directly can invalidate the iterator's state
        XPathNavigator classNav = nodes.Current.Clone();
        classNav.MoveToAttribute("name", "");
        Console.WriteLine("Class: {0}", classNav.Value);
    }
}

static void CountStdnt(XPathNavigator nav)
{
    // sum() yields an XPath number, returned as a double
    double total = (double)nav.Evaluate("sum(//Students)");
    Console.WriteLine("Total Students: {0}", total);
}

Enhancing XPath Performance

• XPath statements can be compiled to improve performance
  – XPathNavigator.Compile
    Compiles an XPath string into an XPathExpression
  – XPathExpression represents a compiled XPath statement
  – Select & Evaluate are overloaded to accept XPathExpressions in addition to XPath strings

void CountStudents(XPathNavigator nav)
{
    XPathExpression exp = nav.Compile("sum(//Students)");
    Console.WriteLine("Total: {0}", nav.Evaluate(exp));
}
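Compilation pays off when the same expression runs many times. In this sketch, the XPath text is compiled once and the resulting XPathExpression is reused against navigators over several hypothetical documents.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

string[] docs =
{
    "<Classes><Class><Students>12</Students></Class></Classes>",
    "<Classes><Class><Students>5</Students></Class><Class><Students>7</Students></Class></Classes>",
};

XPathExpression sum = null;
double grand = 0;
foreach (string xml in docs)
{
    XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
    // Parse the XPath text only once; later documents reuse the compiled form.
    if (sum == null) sum = nav.Compile("sum(//Students)");
    grand += (double)nav.Evaluate(sum);
}
Console.WriteLine("Total: {0}", grand);
```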

Sequential Reading

XmlReader

• Abstract (MustOverride) class

• Represents a forward-only document reader

• Exposes information only for the current node position
  – NodeType, Name, NamespaceURI, Value

• Application must handle some lexical details itself, such as whitespace, comments and processing instructions

XmlReader (cont.)

• Read method
  – Supports generic document processing
  – Reads the next hierarchical node
  – Application code must manage the details of each node

• Attributes must be read explicitly
  – MoveToFirstAttribute, MoveToNextAttribute
    Iterate through the attribute list
  – MoveToAttribute
    Moves to a named attribute or an attribute position
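This sketch is a generic Read loop over a hypothetical document. Read visits elements, text and end tags, but it never lands on attributes. The element case therefore walks them with MoveToFirstAttribute/MoveToNextAttribute and returns to the element with MoveToElement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

List<string> events = new List<string>();
XmlTextReader rdr = new XmlTextReader(new StringReader(
    "<Classes><Class name='XML' level='200'>Jim</Class></Classes>"));

while (rdr.Read())                        // pull the next node
{
    switch (rdr.NodeType)
    {
        case XmlNodeType.Element:
            events.Add("start " + rdr.Name);
            if (rdr.MoveToFirstAttribute())
            {
                do
                {
                    events.Add("  attr " + rdr.Name + "=" + rdr.Value);
                } while (rdr.MoveToNextAttribute());
                rdr.MoveToElement();      // back to the owning element
            }
            break;
        case XmlNodeType.Text:
            events.Add("text " + rdr.Value);
            break;
        case XmlNodeType.EndElement:
            events.Add("end " + rdr.Name);
            break;
    }
}
rdr.Close();
foreach (string entry in events) Console.WriteLine(entry);
```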

XmlReader (cont.)

• ReadStartElement & ReadEndElement provide element optimizations
  – Reader verifies that the node is an element
  – Overloads support name and namespace verification

• ReadElementString
  – Encapsulates node type, name & namespace verification
  – Reads element start & end tags and the text child
  – Returns the value of the text child

• MoveToContent
  – Skips over whitespace, comments & processing instructions

XmlTextReader

• Derived from XmlReader

• Most performant .NET XML reader

• Adds properties to interrogate file information
  – LineNumber, LinePosition, Encoding

• Adds methods to simplify large data block handling
  – ReadBase64, ReadBinHex, ReadChars
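LineNumber and LinePosition are useful when reporting where something was found. This sketch reads a hypothetical three-class listing and records the position of the element whose name attribute is "XSLT". Both values are 1-based.

```csharp
using System;
using System.IO;
using System.Xml;

string xml = "<Classes>\n  <Class name='XML'/>\n  <Class name='XSLT'/>\n</Classes>";
XmlTextReader rdr = new XmlTextReader(new StringReader(xml));

int line = 0, pos = 0;
while (rdr.Read())
{
    if (rdr.NodeType == XmlNodeType.Element && rdr.GetAttribute("name") == "XSLT")
    {
        line = rdr.LineNumber;      // 1-based line of the current node
        pos = rdr.LinePosition;     // 1-based column position on that line
        break;
    }
}
rdr.Close();
Console.WriteLine("XSLT class at line {0}, position {1}", line, pos);
```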

XmlTextReader

static void Main(string[] args)
{
    ClassInfo(new XmlTextReader(Path));
}

static void ClassInfo(XmlTextReader Rdr)
{
    Rdr.MoveToContent();
    Rdr.ReadStartElement("Classes");
    Rdr.MoveToContent();
    // Loop until the reader reaches the </Classes> end tag
    while (Rdr.Name != "Classes")
    {
        Console.Write("{0}|", Rdr["name"]);
        Rdr.ReadStartElement("Class");
        Console.Write("{0}|", Rdr.ReadElementString("Students"));
        Console.Write("{0}|", Rdr.ReadElementString("Location"));
        Console.WriteLine("{0}", Rdr.ReadElementString("Instructor"));
        Rdr.ReadEndElement();
        Rdr.MoveToContent();
    }
}

XmlValidatingReader

• Derived from XmlReader

• Provides document validation over an existing XmlReader instance

• Validation errors reported to a delegate or by throwing an exception
  – Delegates registered with the ValidationEventHandler event
  – If no delegate is registered, an XmlException is thrown

XmlValidatingReader

• XmlDocument and XPathDocument can be populated through XmlValidatingReader

void LoadDom(XmlDocument dom, string fileName)
{
    XmlTextReader TRdr = new XmlTextReader(fileName);
    XmlValidatingReader VRdr = new XmlValidatingReader(TRdr);
    VRdr.ValidationEventHandler += new ValidationEventHandler(vCallBack);
    dom.Load(VRdr);
}

void vCallBack(object sender, ValidationEventArgs args)
{
    . . .
}

XmlValidatingReader

• ValidationType & Schemas properties can be used to manage the validation process

• ReadTypedValue returns the value as the proper CLR type
  – Requires XML Schema or XDR validation
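XmlValidatingReader was later marked obsolete, starting with .NET Framework 2.0, in favor of XmlReader.Create with XmlReaderSettings. This sketch therefore uses that newer route. The ideas are the same as above: set ValidationType, add to Schemas, and report errors to a delegate. If no handler is attached, the reader throws an XmlSchemaValidationException instead. The schema and the Validate helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical schema: <Students> must contain an integer.
const string xsd =
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
    "<xs:element name='Students' type='xs:int'/>" +
    "</xs:schema>";

List<string> good = Validate("<Students>12</Students>", xsd);
List<string> bad = Validate("<Students>twelve</Students>", xsd);
foreach (string msg in bad) Console.WriteLine(msg);

// Hypothetical helper: returns the validation messages reported for xml.
static List<string> Validate(string xml, string schema)
{
    List<string> errors = new List<string>();
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(null, XmlReader.Create(new StringReader(schema)));
    settings.ValidationEventHandler += (s, e) => errors.Add(e.Message);
    using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        while (r.Read()) { }        // validation happens as the reader advances
    return errors;
}
```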

Sequential Writing

XmlWriter

• Abstract (MustOverride) class

• Represents a forward-only, sequential document writer

• Checks well-formedness of generated content

• Does not validate

XmlWriter (cont.)

• Provides "Write" methods for the various node types
  – WriteStartElement, WriteEndElement, WriteString, WriteComment, etc.
  – WriteElementString writes the start tag, end tag and character child in a single call

• WriteDocType method writes a DOCTYPE declaration, including an optional internal DTD subset

• WriteRaw method allows pass-through writing of raw XML
  – Writer does not check well-formedness of raw writes
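This sketch writes the same markup two ways. WriteString escapes it as character data. WriteRaw passes it through untouched, even when the result is not well-formed. The Write helper and the Note element are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

string escaped = Write("<b>bold</b>", raw: false);
string passedThrough = Write("<b>bold</b>", raw: true);
string broken = Write("<b>", raw: true);    // unclosed tag: accepted without complaint

Console.WriteLine(escaped);
Console.WriteLine(passedThrough);
Console.WriteLine(broken);

// Hypothetical helper: wrap text in <Note>, escaped or raw.
static string Write(string text, bool raw)
{
    StringWriter sw = new StringWriter();
    XmlTextWriter wrt = new XmlTextWriter(sw);
    wrt.WriteStartElement("Note");
    if (raw)
        wrt.WriteRaw(text);      // no escaping, no well-formedness check
    else
        wrt.WriteString(text);   // markup characters escaped as entities
    wrt.WriteEndElement();
    wrt.Flush();
    return sw.ToString();
}
```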

XmlTextWriter

• Derives from XmlWriter

• Adds formatting control
  – Formatting, Indentation, IndentChar & QuoteChar properties

• Adds methods to simplify large data block handling
  – WriteBase64, WriteBinHex, WriteChars

XmlTextWriter

static void WriteClass(XmlTextWriter wrt)
{
    wrt.Formatting = Formatting.Indented;
    wrt.WriteStartDocument();
    wrt.WriteStartElement("Classes");
    wrt.WriteAttributeString("name", ".NET XML");
    wrt.WriteElementString("Students", "12");
    wrt.WriteElementString("Location", "Maine Bytes");
    wrt.WriteElementString("Instructor", "Jim");
    wrt.WriteEndElement();      // </Classes>
    wrt.WriteEndDocument();
}

static void Main(string[] args)
{
    // Path: output file name, defined elsewhere; Encoding is in System.Text
    XmlTextWriter wrt = new XmlTextWriter(Path, Encoding.UTF8);
    WriteClass(wrt);
    wrt.Close();
}

<?xml version="1.0" encoding="utf-8"?>
<Classes name=".NET XML">
  <Students>12</Students>
  <Location>Maine Bytes</Location>
  <Instructor>Jim</Instructor>
</Classes>

Summary

• Each API is optimized for a different use

• DOM
  – Random access, dynamic updates

• XPathNavigator
  – Document subsets, rich XPath support

• XmlReader
  – Abstract class, models sequential reading

• XmlTextReader
  – Most performant reader

Summary (cont.)

• XmlValidatingReader
  – Validation; usable directly or with DOM & XPathDocument

• XmlWriter
  – Abstract class, models sequential writing

• XmlTextWriter
  – Most performant writer

• Often the best solution is to use a combination of the APIs

Download & Contact Information

Jim Wilson
jimw@jwhedgehog.com

Presentation Download
http://www.jwhedgehog.com/MaineBytes0701

Sample Code Download
http://www.jwhedgehog.com/MaineBytes0701
