Post on 27-Dec-2015

Using XML in .NET

Sorting through the choices

XML and APIs

• XML is a series of W3C recommendations

• Focus on structure and behavior

• W3C says very little about API implementation

• The only real API defined by the W3C is the Document Object Model (a.k.a. DOM)

• W3C does not mandate any particular API

Relevant W3C Recommendations

• Extensible Markup Language (XML) 1.0
  – Defines syntax
  – Defines document node types
    – Core content represented as Elements, Attributes and Text
    – Non-core content represented as Comments and Processing Instructions

W3C Recommendations (cont.)

• Namespaces
  – Defines rules for unique element and attribute naming
  – Uniqueness achieved through the use of Uniform Resource Identifiers (URIs)
  – All .NET APIs support namespaces
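Here is a minimal sketch of namespace-aware selection from C#, written as a top-level program. Two `Item` elements share a local name but belong to different namespaces. `SelectNodes` picks one of them through an `XmlNamespaceManager`. The `urn:example:a`/`urn:example:b` URIs and the `x` prefix are made up for illustration.

```csharp
using System;
using System.Xml;

// Two <Item> elements share a local name but live in different (hypothetical) namespaces.
const string xml =
    "<Root xmlns:a='urn:example:a' xmlns:b='urn:example:b'>" +
    "<a:Item>first</a:Item><b:Item>second</b:Item></Root>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// XPath prefixes are bound through a namespace manager; they need not match the document's prefixes.
var ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("x", "urn:example:b");

XmlNodeList matches = doc.SelectNodes("/Root/x:Item", ns);
foreach (XmlNode item in matches)
    Console.WriteLine("{0} | {1} | {2}", item.LocalName, item.NamespaceURI, item.InnerText);
// prints: Item | urn:example:b | second
```

The match depends only on the namespace URI. Binding `x` instead of reusing `b` shows that the prefix itself is just a local alias.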

W3C Recommendations (cont.)

• XML Information Set (Infoset)
  – Defines information items to represent XML node types
  – Abstracts data relationships from syntax
  – Abstracts application from document encoding
    – Content represented to applications as Unicode
  – Most higher-level recommendations are based on the Infoset
  – .NET APIs mostly represent content consistently with the Infoset
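This sketch illustrates the encoding point. It loads the same one-element document once from UTF-8 bytes and once from UTF-16 bytes, then compares the text the DOM hands back. The document content and the helper names `ToBytes` and `RootText` are hypothetical. A byte-order mark is written so the parser can detect each encoding.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

// Hypothetical helper: encode the XML text as raw bytes, prefixed with the encoding's byte-order mark.
static byte[] ToBytes(string xml, Encoding encoding) =>
    encoding.GetPreamble().Concat(encoding.GetBytes(xml)).ToArray();

// Hypothetical helper: parse raw bytes and return the root element's text as a .NET (Unicode) string.
static string RootText(byte[] bytes)
{
    var doc = new XmlDocument();
    using (var stream = new MemoryStream(bytes))
        doc.Load(stream);
    return doc.DocumentElement.InnerText;
}

string fromUtf8 = RootText(ToBytes("<?xml version=\"1.0\" encoding=\"utf-8\"?><p>caf\u00e9</p>", Encoding.UTF8));
string fromUtf16 = RootText(ToBytes("<?xml version=\"1.0\" encoding=\"utf-16\"?><p>caf\u00e9</p>", Encoding.Unicode));

Console.WriteLine(fromUtf8 == fromUtf16); // True: the application sees identical Unicode text
```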

W3C Recommendations (cont.)

• XPath
  – XML document query language
  – Syntactically similar to directory paths
    /Invoices/Invoice[descendant::Price > 50]
  – Identify what you want, not how to get it
    – Loosely analogous to SQL
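To illustrate the declarative style, the slide's expression can be run through `XmlDocument.SelectNodes`. The invoice document below is invented for the example.

```csharp
using System;
using System.Xml;

// A small, made-up invoice document to run the slide's XPath expression against.
var doc = new XmlDocument();
doc.LoadXml(
    "<Invoices>" +
    "<Invoice id='1'><Line><Price>20</Price></Line></Invoice>" +
    "<Invoice id='2'><Line><Price>75</Price></Line></Invoice>" +
    "<Invoice id='3'><Line><Price>10</Price></Line><Line><Price>60</Price></Line></Invoice>" +
    "</Invoices>");

// The query states *what* to select; the XPath engine decides how to walk the tree.
XmlNodeList expensive = doc.SelectNodes("/Invoices/Invoice[descendant::Price > 50]");
foreach (XmlElement invoice in expensive)
    Console.WriteLine("Invoice {0}", invoice.GetAttribute("id"));
// prints: Invoice 2, then Invoice 3
```

Invoice 3 matches even though one of its prices is 10. In XPath, comparing a node-set with a number is true if *any* node in the set satisfies it.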

Verifying Document Content

• Well-formed
  – Document conforms to the XML 1.0 recommendation
  – All .NET APIs enforce well-formedness
• Validation
  – Document conforms to defined content rules
  – .NET-supported validation types
    – Document Type Definitions (DTD), W3C XML Schema, XML-Data Reduced (XDR)
  – Not all .NET APIs support validation
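Here is a small sketch of well-formedness enforcement. `XmlDocument.LoadXml` rejects a document with mismatched end tags by throwing an `XmlException`, which reports where the problem was found. `CheckWellFormed` is a hypothetical helper name.

```csharp
using System;
using System.Xml;

// Hypothetical helper: returns null if the text is well-formed XML, otherwise a description of the error.
static string CheckWellFormed(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return null;
    }
    catch (XmlException ex)
    {
        // XmlException carries the position of the problem in the source text.
        return string.Format("line {0}, position {1}: {2}", ex.LineNumber, ex.LinePosition, ex.Message);
    }
}

Console.WriteLine(CheckWellFormed("<a><b></b></a>") ?? "well-formed");
Console.WriteLine(CheckWellFormed("<a><b></a></b>") ?? "well-formed"); // mismatched end tags
```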

.NET XML APIs

• Document Object Model (DOM)
  – Implements the W3C DOM recommendation
  – A collection of classes representing the various Infoset information items
  – Indirectly supports validation
  – Relatively resource intensive
  – Only API that both reads & writes
  – Excellent for random access processing
  – Easiest migration if experienced with MSXML’s DOMDocument

.NET XML APIs (cont.)

• XPathNavigator
  – A scrollable, cursor-based document reader
  – Indirectly supports document validation
  – May be more resource efficient than DOM
  – Well-suited for processing document subsets

.NET XML APIs (cont.)

• XmlReader
  – Abstract (MustOverride) forward-only document reader
  – Well-suited for sequential processing
  – XmlTextReader
    – Derived from XmlReader
    – Fastest document reader
    – Does not support validation
  – XmlValidatingReader
    – Derived from XmlReader
    – Directly supports validation

.NET XML APIs (cont.)

• XmlWriter
  – Abstract (MustOverride) forward-only document writer
  – Well-suited for sequential document generation
  – XmlTextWriter
    – Derived from XmlWriter
    – Fastest document writer
    – Does not support validation

Where’s the SAX Parser?

• .NET does not provide a SAX parser
• The benefits of SAX are available through XmlReader implementations
• Microsoft asserts that XmlReader provides several benefits over SAX
  – XmlReader’s “pull” model is simpler to program than SAX’s “push” model
  – The pull model allows a program to be optimized for a specific document structure
  – Simpler programming when multiple documents are processed simultaneously
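To make the pull model concrete, here is a sketch that counts element names with `XmlTextReader`. The application's own `while (reader.Read())` loop requests each node, rather than registering handlers for a parser to call. `CountElements` and the sample document are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: the application drives the parser by calling Read(),
// instead of receiving callbacks pushed by a SAX-style parser.
static Dictionary<string, int> CountElements(string xml)
{
    var counts = new Dictionary<string, int>();
    using (var reader = new XmlTextReader(new StringReader(xml)))
    {
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element)
                continue; // the loop itself decides which nodes matter
            counts.TryGetValue(reader.Name, out int n);
            counts[reader.Name] = n + 1;
        }
    }
    return counts;
}

var result = CountElements("<Classes><Class/><Class><Students>12</Students></Class></Classes>");
foreach (var pair in result)
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
```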

Document Object Model

Document Object Model (DOM)

• A hierarchy of classes representing the various document nodes

• Classes in System.Xml

• Well-suited for random access and dynamic modification

• All node classes inherit from XmlNode

• The only creatable class is XmlDocument

• Contents validated if populated through XmlValidatingReader

.NET DOM Class Hierarchy (classes in System.Xml)

System.Object
  XmlNode
    XmlAttribute
    XmlDocument
    XmlDocumentFragment
    XmlEntity
    XmlNotation
    XmlLinkedNode
      XmlCharacterData
        XmlCDataSection
        XmlComment
        XmlSignificantWhitespace
        XmlText
        XmlWhitespace
      XmlDeclaration
      XmlDocumentType
      XmlElement
      XmlEntityReference
      XmlProcessingInstruction

XmlNode

• XmlNode
  – Abstract (MustOverride) class that generically represents each node
  – Properties & methods to manage node relationships
  – Properties expose node type, name, namespace and content
  – Meaning of name and content varies depending on node type

XmlNode Name & Value Definitions

NodeType              | Name            | Value
Element               | Element QName   | null
Attribute             | Attribute QName | Attribute value
Text                  | #text           | Text
Document              | #document       | null
XmlDeclaration        | xml             | Declaration content
Comment               | #comment        | Comment text
ProcessingInstruction | PI target       | PI data

XmlNode Relationships

• Relationships managed through read-only XmlNode reference properties
  – OwnerDocument
  – ParentNode
  – Siblings
    – PreviousSibling, NextSibling
  – Children
    – FirstChild, LastChild

DOM Tree Walker

using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument dom = new XmlDocument();
        dom.Load(@"C:\DataFiles\Classes.xml");
        TreeWalk(dom);
    }

    // Print a node, then recurse into its first child and then its next sibling.
    static void TreeWalk(XmlNode node)
    {
        if (node == null) return;
        Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}", node.NodeType, node.Name, node.Value);
        TreeWalk(node.FirstChild);
        TreeWalk(node.NextSibling);
    }
}

XmlNode Collections

• Child nodes
  – The ChildNodes property returns an XmlNodeList
    – Nodes accessed either through the Item method or the [] indexer
  – XmlNode also exposes a [] indexer that returns the first child element with a given name
    – invoice["Price"]
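A few lines are enough to show the difference between the positional child list and the name-based indexer. The `Invoice` document here is made up for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<Invoice><Customer>Acme</Customer><Price>42.50</Price></Invoice>");
XmlNode invoice = doc.DocumentElement;

// ChildNodes is an XmlNodeList; the [i] indexer and Item(i) return the same node.
XmlNode first = invoice.ChildNodes[0];
XmlNode alsoFirst = invoice.ChildNodes.Item(0);

// XmlNode's string indexer returns the first child *element* with that name, or null.
XmlElement price = invoice["Price"];

Console.WriteLine("{0} {1} {2}", first.Name, ReferenceEquals(first, alsoFirst), price.InnerText);
// prints: Customer True 42.50
```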

XmlNode Collections (cont.)

• Attribute nodes
  – The Attributes property returns an XmlAttributeCollection
    – Attributes accessed through an indexer [] by either name or position
    – Attributes added or changed through the SetNamedItem method
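Here is a minimal sketch of the attribute collection in use. It reads attributes by name and by position, then uses `SetNamedItem` to replace one attribute and add another. The `room` and `level` attributes are hypothetical.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<Class name='.NET XML' room='A'/>");
XmlAttributeCollection attrs = doc.DocumentElement.Attributes;

// Indexer by name or by position.
Console.WriteLine(attrs["name"].Value);  // .NET XML
Console.WriteLine(attrs[1].Value);       // A

// SetNamedItem replaces an existing attribute with the same name...
XmlAttribute room = doc.CreateAttribute("room");
room.Value = "B";
attrs.SetNamedItem(room);

// ...or adds the attribute if no attribute of that name exists yet.
XmlAttribute level = doc.CreateAttribute("level");
level.Value = "intro";
attrs.SetNamedItem(level);

Console.WriteLine(doc.DocumentElement.OuterXml);
```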

DOM Tree Walker Using Collections

using System;
using System.Xml;

// Print a node, then recurse into each node in its ChildNodes list.
static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}", node.NodeType, node.Name, node.Value);
    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}

DOM Tree Walker Using Collections

using System;
using System.Xml;

static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}", node.NodeType, node.Name, node.Value);

    // Attributes is null for node types that cannot carry attributes (e.g. text nodes).
    if (node.Attributes != null)
        foreach (XmlAttribute attr in node.Attributes)
            Console.WriteLine("\tAttr: {0}={1}", attr.Name, attr.Value);

    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}

XPath Support

• SelectNodes
  – Returns an XmlNodeList containing the selection result
• SelectSingleNode
  – Returns an XmlNode containing only the first node in the selection result

using System;
using System.Xml;

static void ShowStudentNames(XmlNode node)
{
    // Select the Name attribute of every Student child of node.
    XmlNodeList nl = node.SelectNodes("Student/@Name");
    foreach (XmlNode n in nl)
        Console.WriteLine("Student: {0}", n.Value);
}

DOM Modification

• Node creation
  – New nodes must be created by the XmlDocument
• XmlNode placement methods
  – InsertBefore, InsertAfter
  – PrependChild, AppendChild
  – RemoveChild, ReplaceChild, RemoveAll
• Modification events raised by XmlDocument, handled through XmlNodeChangedEventHandler delegates
  – NodeChanging, NodeChanged
  – NodeInserting, NodeInserted
  – NodeRemoving, NodeRemoved
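Here is a sketch of the modification events in action. The handlers record each insertion and removal performed on a small document; the lambdas are converted to `XmlNodeChangedEventHandler` delegates. Element names follow the Classes sample, and the log format is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<Classes/>");

// Handlers are attached after loading, so only the changes below are logged.
var log = new List<string>();
doc.NodeInserted += (sender, e) =>
    log.Add(string.Format("{0} {1} into {2}", e.Action, e.Node.Name, e.NewParent.Name));
doc.NodeRemoving += (sender, e) =>
    log.Add(string.Format("{0} {1} from {2}", e.Action, e.Node.Name, e.OldParent.Name));

XmlElement c = doc.CreateElement("Class");  // creating a node does not raise an event
doc.DocumentElement.AppendChild(c);         // raises NodeInserting, then NodeInserted
doc.DocumentElement.RemoveChild(c);         // raises NodeRemoving, then NodeRemoved

log.ForEach(Console.WriteLine);
// prints: Insert Class into Classes
//         Remove Class from Classes
```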

DOM Modification

using System.Xml;

static void BuildClass(XmlDocument dom, string path)
{
    // Every node is created by the document, then placed with AppendChild.
    dom.AppendChild(dom.CreateXmlDeclaration("1.0", "utf-8", null));
    XmlElement cs = dom.CreateElement("Classes");
    dom.AppendChild(cs);
    XmlElement c = dom.CreateElement("Class");
    c.SetAttribute("name", ".NET XML");
    cs.AppendChild(c);
    XmlNode n = c.AppendChild(dom.CreateElement("Students"));
    n.AppendChild(dom.CreateTextNode("12"));
    n = c.AppendChild(dom.CreateElement("Location"));
    n.AppendChild(dom.CreateTextNode("Maine Bytes"));
    n = c.AppendChild(dom.CreateElement("Instructor"));
    n.AppendChild(dom.CreateTextNode("Jim"));
    dom.Save(path);
}

<?xml version="1.0" encoding="utf-8"?>
<Classes>
  <Class name=".NET XML">
    <Students>12</Students>
    <Location>Maine Bytes</Location>
    <Instructor>Jim</Instructor>
  </Class>
</Classes>

XPathNavigator

XPathNavigator

• Classes in System.Xml.XPath

• Read-only

• Provides a scrolling cursor “window” over the document

• Great support for document filtering

• Best XPath support

• Content is interpreted according to XPath specification

Creating the Navigator

• XPathNavigator is an abstract (MustOverride) class
• Must be created by a factory object
• Factory objects must implement IXPathNavigable
  – XPathDocument implementation creates an efficient navigator cache
    – Can be populated from XmlValidatingReader
  – XmlNode implementation creates a navigator over the corresponding DOM instance
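Because both factories implement `IXPathNavigable`, navigator code can be written once against the interface. The sketch below runs the same query over an `XmlDocument` and an `XPathDocument`. The helper name `FirstClassName` and the inline document are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

const string xml = "<Classes><Class name='.NET XML'><Students>12</Students></Class></Classes>";

// Hypothetical helper that depends only on IXPathNavigable.
static string FirstClassName(IXPathNavigable source)
{
    XPathNavigator nav = source.CreateNavigator();
    XPathNodeIterator it = nav.Select("/Classes/Class/@name");
    return it.MoveNext() ? it.Current.Value : null;
}

var dom = new XmlDocument();                            // XmlNode implementation: navigator over a live DOM
dom.LoadXml(xml);
var cache = new XPathDocument(new StringReader(xml));   // read-only store optimized for XPath

Console.WriteLine(FirstClassName(dom));    // .NET XML
Console.WriteLine(FirstClassName(cache));  // .NET XML
```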

XPathNavigator Name & Value Definitions

NodeType              | Name            | Value
Element               | Element QName   | Concatenated value of descendant text nodes in document order
Attribute             | Attribute QName | Attribute value
Text                  | (empty string)  | Text
Root                  | (empty string)  | Concatenated value of all text nodes in document order
Comment               | (empty string)  | Comment text
ProcessingInstruction | PI target       | PI data

Cursor Navigation

• Cursor is controlled by “MoveTo…” methods
  – MoveToRoot
  – MoveToParent
  – MoveToFirstChild
  – Siblings
    – MoveToFirst, MoveToPrevious, MoveToNext
  – Attributes
    – MoveToFirstAttribute, MoveToNextAttribute

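Navigator Tree Walker

using System;
using System.Xml.XPath;

class Program
{
    // Assumed location: the same sample file used by the DOM tree walker.
    const string Path = @"C:\DataFiles\Classes.xml";

    static void Main(string[] args)
    {
        XPathDocument doc = new XPathDocument(Path);
        XPathNavigator nav = doc.CreateNavigator();
        TreeWalk(nav);
    }

    // Print the current node, walk its children, then continue with the next sibling.
    static void TreeWalk(XPathNavigator nav)
    {
        Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}", nav.NodeType, nav.Name, nav.Value);
        if (nav.HasChildren)
        {
            nav.MoveToFirstChild();
            TreeWalk(nav);
            nav.MoveToParent();
        }
        if (nav.MoveToNext())
            TreeWalk(nav);
    }
}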

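Navigator Tree Walker

using System;
using System.Xml.XPath;

static void TreeWalk(XPathNavigator nav)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}", nav.NodeType, nav.Name, nav.Value);
    // MoveToNextAttribute only succeeds once the cursor is on an attribute,
    // so the attribute loop must start with MoveToFirstAttribute.
    if (nav.MoveToFirstAttribute())
    {
        do
        {
            Console.WriteLine("\tAttr: {0}={1}", nav.Name, nav.Value);
        } while (nav.MoveToNextAttribute());
        nav.MoveToParent(); // from the attribute back to its element
    }
    if (nav.HasChildren)
    {
        nav.MoveToFirstChild();
        TreeWalk(nav);
        nav.MoveToParent();
    }
    if (nav.MoveToNext())
        TreeWalk(nav);
}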

Processing Document Subsets

• Selection methods
  – SelectChildren, SelectDescendants, SelectAncestors
  – All return an XPathNodeIterator
• XPathNodeIterator
  – Represents a cursor over the selected set
  – MoveNext method
    – Advances the cursor
  – Current property
    – Returns an XPathNavigator positioned at the current node
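Here is a short sketch of the iterator pattern. `SelectDescendants` returns an `XPathNodeIterator`, `MoveNext` advances it, and `Current` exposes a navigator positioned on each selected element. The two-class document is made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

var doc = new XPathDocument(new StringReader(
    "<Classes><Class name='A'><Students>12</Students></Class>" +
    "<Class name='B'><Students>8</Students></Class></Classes>"));
XPathNavigator nav = doc.CreateNavigator();

// SelectDescendants(localName, namespaceURI, matchSelf) returns an XPathNodeIterator.
XPathNodeIterator it = nav.SelectDescendants("Class", string.Empty, false);
var names = new List<string>();
while (it.MoveNext())                      // advance the cursor over the selected set
{
    XPathNavigator current = it.Current;   // positioned on the current <Class>
    names.Add(current.GetAttribute("name", string.Empty));
}
Console.WriteLine(string.Join(",", names)); // A,B
```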

XPath Support

• Select method
  – Generic XPath selection
  – Node set returned as an XPathNodeIterator
• Evaluate method
  – Returns typed XPath results
  – XPath supports arithmetic operations, logical operations and function calls
  – XPath expressions can return numeric, string and Boolean results
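This sketch shows the typed results. It evaluates a numeric, a string and a Boolean expression and prints the CLR type each one comes back as. The document is an invented two-class sample.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

var doc = new XPathDocument(new StringReader(
    "<Classes><Class name='A'><Students>12</Students></Class>" +
    "<Class name='B'><Students>8</Students></Class></Classes>"));
XPathNavigator nav = doc.CreateNavigator();

// Evaluate returns object; its runtime type follows the XPath result type.
object count = nav.Evaluate("count(//Class)");          // number  -> double
object first = nav.Evaluate("string(//Class/@name)");   // string  -> string
object big = nav.Evaluate("//Students > 10");           // boolean -> bool

Console.WriteLine("{0} ({1})", count, count.GetType().Name);  // 2 (Double)
Console.WriteLine("{0} ({1})", first, first.GetType().Name);  // A (String)
Console.WriteLine("{0} ({1})", big, big.GetType().Name);      // True (Boolean)
```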

Navigator XPath Support

using System;
using System.Xml.XPath;

static void LargeClasses(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("//Class[Students > 10]");
    while (nodes.MoveNext())
    {
        // Read the attribute without moving the iterator's own navigator.
        string name = nodes.Current.GetAttribute("name", string.Empty);
        Console.WriteLine("Class: {0}", name);
    }
}

static void CountStdnt(XPathNavigator nav)
{
    // sum() yields an XPath number, which Evaluate returns as a boxed double.
    double total = (double) nav.Evaluate("sum(//Students)");
    Console.WriteLine("Total Students: {0}", total);
}

Enhancing XPath Performance

• XPath expressions can be compiled to improve performance
  – XPathNavigator.Compile
    – Compiles an XPath string into an XPathExpression
  – XPathExpression represents a compiled XPath expression
  – Select & Evaluate are overloaded to accept XPathExpressions in addition to XPath strings

using System;
using System.Xml.XPath;

static void CountStudents(XPathNavigator nav)
{
    // Compile once; the XPathExpression can be reused across many Evaluate calls.
    XPathExpression exp = nav.Compile("sum(//Students)");
    Console.WriteLine("Total: {0}", nav.Evaluate(exp));
}

Sequential Reading

XmlReader

• Abstract (MustOverride) class
• Represents a forward-only document reader
• Exposes information only for the current node position
  – NodeType, Name, NamespaceURI, Value
• Application must handle some lexical issues

XmlReader (cont.)

• Read method
  – Supports generic document processing
  – Reads the next node in document order
  – Application code must manage the details of each node
• Attributes must be read explicitly
  – MoveToFirstAttribute, MoveToNextAttribute
    – Iterate through the attribute list
  – MoveToAttribute
    – Moves to a named attribute or an attribute position
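Here is a sketch of explicit attribute reading with an `XmlReader`. `Read` never stops on attributes, so the loop visits them with `MoveToFirstAttribute`/`MoveToNextAttribute`. It then returns to the element with `MoveToElement`. The sample document is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

const string xml = "<Class name='.NET XML' room='A'><Students>12</Students></Class>";

var attributes = new List<string>();
using (var reader = new XmlTextReader(new StringReader(xml)))
{
    while (reader.Read())
    {
        // Read() never lands on attributes; visit them explicitly from the element.
        if (reader.NodeType == XmlNodeType.Element && reader.MoveToFirstAttribute())
        {
            do
            {
                attributes.Add(reader.Name + "=" + reader.Value);
            } while (reader.MoveToNextAttribute());
            reader.MoveToElement(); // return to the owning element before reading on
        }
    }
}
Console.WriteLine(string.Join("; ", attributes)); // name=.NET XML; room=A
```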

XmlReader (cont.)

• ReadStartElement & ReadEndElement provide element optimizations
  – Reader verifies that the current node is an element start (or end) tag
  – Overloads support name and namespace verification
• ReadElementString
  – Encapsulates node type, name & namespace verification
  – Reads the element start & end tags and text child
  – Returns the value of the text child
• MoveToContent
  – Skips over whitespace, comments & processing instructions

XmlTextReader

• Derived from XmlReader
• Most performant .NET XML reader
• Adds properties to interrogate file information
  – LineNumber, LinePosition, Encoding
• Adds methods to simplify large data block handling
  – ReadBase64, ReadBinHex, ReadChars
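Here is a minimal sketch of the line-information properties. It records the line on which each element appears in a small multi-line document, which is made up for the example. `LinePosition` is printed as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

const string xml = "<Classes>\n  <Class name='A'/>\n  <Class name='B'/>\n</Classes>";

var found = new List<(string Name, int Line)>();
using (var reader = new XmlTextReader(new StringReader(xml)))
{
    while (reader.Read())
    {
        if (reader.NodeType != XmlNodeType.Element)
            continue;
        // LineNumber and LinePosition describe where the current node sits in the source text.
        found.Add((reader.Name, reader.LineNumber));
        Console.WriteLine("{0}: line {1}, position {2}", reader.Name, reader.LineNumber, reader.LinePosition);
    }
}
```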

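XmlTextReader

using System;
using System.Xml;

class Program
{
    // Assumed location: the Classes.xml sample used throughout.
    const string Path = @"C:\DataFiles\Classes.xml";

    static void Main(string[] args)
    {
        using (XmlTextReader rdr = new XmlTextReader(Path))
            ClassInfo(rdr);
    }

    // Expects <Classes><Class name="..."><Students/><Location/><Instructor/></Class>...</Classes>
    static void ClassInfo(XmlTextReader rdr)
    {
        rdr.MoveToContent();                  // skip the XML declaration and whitespace
        rdr.ReadStartElement("Classes");
        rdr.MoveToContent();
        while (rdr.Name != "Classes")         // stop at the </Classes> end tag
        {
            Console.Write("{0}|", rdr["name"]);
            rdr.ReadStartElement("Class");
            Console.Write("{0}|", rdr.ReadElementString("Students"));
            Console.Write("{0}|", rdr.ReadElementString("Location"));
            Console.WriteLine("{0}", rdr.ReadElementString("Instructor"));
            rdr.ReadEndElement();
            rdr.MoveToContent();
        }
    }
}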

XmlValidatingReader

• Derived from XmlReader
• Provides document validation over an existing XmlReader instance
• Validation errors reported to a delegate or by throwing an exception
  – Delegates registered with the ValidationEventHandler event
  – If no delegate is registered, an XmlException is thrown
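Here is a sketch of delegate-based error reporting. It assumes a document with an inline DTD, invented for the example. Because the handler collects the messages, validation continues past the first error instead of throwing. XmlValidatingReader was marked obsolete in .NET 2.0 in favor of XmlReaderSettings, which is why the compiler warning is suppressed around it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Validates a document against its inline DTD and returns every reported message.
static List<string> Validate(string xml)
{
    var errors = new List<string>();
    var textReader = new XmlTextReader(new StringReader(xml));
#pragma warning disable CS0618 // XmlValidatingReader is obsolete from .NET 2.0 onward
    var validator = new XmlValidatingReader(textReader);
#pragma warning restore CS0618
    validator.ValidationType = ValidationType.DTD;
    // With a handler registered, errors are reported to it instead of being thrown.
    validator.ValidationEventHandler += (sender, e) => errors.Add(e.Message);
    while (validator.Read()) { }
    validator.Close();
    return errors;
}

// A hypothetical DTD: <Classes> may contain only <Class> elements holding text.
const string dtd = "<!DOCTYPE Classes [<!ELEMENT Classes (Class*)><!ELEMENT Class (#PCDATA)>]>";

List<string> ok = Validate(dtd + "<Classes><Class>.NET XML</Class></Classes>");
List<string> bad = Validate(dtd + "<Classes><Class>.NET XML</Class><Room/></Classes>");
Console.WriteLine("valid document: {0} error(s); invalid document: {1} error(s)", ok.Count, bad.Count);
```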

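XmlValidatingReader

• XmlDocument and XPathDocument can be populated through XmlValidatingReader

using System.Xml;
using System.Xml.Schema;

static void LoadDom(XmlDocument dom, string fileName)
{
    XmlTextReader tRdr = new XmlTextReader(fileName);
    XmlValidatingReader vRdr = new XmlValidatingReader(tRdr);
    // With a handler registered, validation errors go to vCallBack instead of being thrown.
    vRdr.ValidationEventHandler += new ValidationEventHandler(vCallBack);
    dom.Load(vRdr);
    vRdr.Close(); // dom.Load does not close the reader it is given
}

static void vCallBack(object sender, ValidationEventArgs args)
{
    // . . .
}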

XmlValidatingReader

• ValidationType & Schemas properties can be used to manage the validation process
• ReadTypedValue returns the value as the proper CLR type
  – Requires XML Schema or XDR validation

Sequential Writing

XmlWriter

• Abstract (MustOverride) class
• Represents a forward-only, sequential document writer
• Checks well-formedness of generated content
• Does not validate
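The well-formedness check shows up when a writer is asked for an end tag while nothing is open. In this sketch, `XmlTextWriter` refuses with an `InvalidOperationException` instead of emitting broken XML. `TryWriteUnbalanced` is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: returns the writer's error message, or null if the call was accepted.
static string TryWriteUnbalanced()
{
    var writer = new XmlTextWriter(new StringWriter());
    try
    {
        // No start tag is open, so an end tag here would make the output ill-formed.
        writer.WriteEndElement();
        return null;
    }
    catch (InvalidOperationException ex)
    {
        return ex.Message;
    }
}

Console.WriteLine(TryWriteUnbalanced() ?? "accepted");
```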

XmlWriter (cont.)

• Provides “Write” methods for the various node types
  – WriteStartElement, WriteEndElement, WriteString, WriteComment, etc.
  – WriteElementString writes the start tag, end tag and character child in a single call
• WriteDocType method supports writing DTD entries
• WriteRaw method allows pass-through writing of raw XML
  – Writer does not check well-formedness of raw writes

XmlTextWriter

• Derives from XmlWriter
• Adds formatting control
  – Formatting, Indentation, IndentChar & QuoteChar properties
• Adds methods to simplify large data block handling
  – WriteBase64, WriteBinHex, WriteChars
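Here is a short sketch of the formatting properties: indented output with four spaces per level and single-quoted attribute values. The output goes to a `StringWriter` so the result can be inspected. The element names reuse the Classes sample.

```csharp
using System;
using System.IO;
using System.Xml;

var output = new StringWriter();
var writer = new XmlTextWriter(output)
{
    Formatting = Formatting.Indented, // put each element on its own line
    Indentation = 4,                  // number of IndentChar characters per nesting level
    IndentChar = ' ',
    QuoteChar = '\''                  // single quotes around attribute values
};
writer.WriteStartElement("Classes");
writer.WriteStartElement("Class");
writer.WriteAttributeString("name", ".NET XML");
writer.WriteElementString("Students", "12");
writer.WriteEndElement(); // </Class>
writer.WriteEndElement(); // </Classes>
writer.Flush();

string xml = output.ToString();
Console.WriteLine(xml);
```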

XmlTextWriter

using System.Text;
using System.Xml;

class Program
{
    // Assumed output location: the Classes.xml sample used throughout.
    const string Path = @"C:\DataFiles\Classes.xml";

    static void WriteClass(XmlTextWriter wrt)
    {
        wrt.Formatting = Formatting.Indented;
        wrt.WriteStartDocument();
        wrt.WriteStartElement("Classes");
        wrt.WriteStartElement("Class");
        wrt.WriteAttributeString("name", ".NET XML");
        wrt.WriteElementString("Students", "12");
        wrt.WriteElementString("Location", "Maine Bytes");
        wrt.WriteElementString("Instructor", "Jim");
        wrt.WriteEndElement();   // </Class>
        wrt.WriteEndElement();   // </Classes>
        wrt.WriteEndDocument();
    }

    static void Main(string[] args)
    {
        XmlTextWriter wrt = new XmlTextWriter(Path, Encoding.UTF8);
        WriteClass(wrt);
        wrt.Close();
    }
}

<?xml version="1.0" encoding="utf-8"?>
<Classes>
  <Class name=".NET XML">
    <Students>12</Students>
    <Location>Maine Bytes</Location>
    <Instructor>Jim</Instructor>
  </Class>
</Classes>

Summary

• Each API is optimized for a different use
• DOM
  – Random access, dynamic updates
• XPathNavigator
  – Document subsets, rich XPath support
• XmlReader
  – Abstract class, models sequential reading
• XmlTextReader
  – Most performant reader

Summary (cont.)

• XmlValidatingReader
  – Validation; usable directly or with DOM & XPathDocument
• XmlWriter
  – Abstract class, models sequential writing
• XmlTextWriter
  – Most performant writer
• Often the best solution is to use a combination of the APIs

Download & Contact Information

Jim Wilson, jimw@jwhedgehog.com

Presentation download: http://www.jwhedgehog.com/MaineBytes0701

Sample code download: http://www.jwhedgehog.com/MaineBytes0701
