Easily find tags and values in a large XML document using XmlTextReader in VB
In this article you will learn how to find tags and values in a large XML document using XmlTextReader in VB.
Use XmlTextReader to parse large XML documents.
Public Sub findAParticularNodesUsingTextReader()
    ' Requires Imports System.Xml at the top of the file.
    Using txtreaderObj As New XmlTextReader("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        txtreaderObj.WhitespaceHandling = WhitespaceHandling.None
        While txtreaderObj.Read()
            ' Match only the start tag of a TotalPrice element.
            If txtreaderObj.NodeType = XmlNodeType.Element AndAlso txtreaderObj.Name = "TotalPrice" Then
                ' Advance to the text node inside the element and output its value,
                ' followed by a space so that consecutive values stay separated.
                txtreaderObj.Read()
                richTextBox1.AppendText(txtreaderObj.Value & " ")
            End If
        End While
    End Using
End Sub
Result
12.36 11.99 7.97
For faster, read-only, XPath query-based access to data, use XPathDocument and XPathNavigator with an XPath query.
Public Sub FindTagsUsingXPthNaviatorAndXPathDocumentNew()
    ' Requires Imports System.Xml.XPath at the top of the file.
    Dim xpDoc As New XPathDocument("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
    Dim xpNav As XPathNavigator = xpDoc.CreateNavigator()
    ' Compile the query once so that it can be reused.
    Dim xpExpression As XPathExpression = xpNav.Compile("/Orders/Order/TotalPrice")
    Dim xpIter As XPathNodeIterator = xpNav.[Select](xpExpression)
    While xpIter.MoveNext()
        ' Append each value followed by a space so that consecutive values stay separated.
        richTextBox1.AppendText(xpIter.Current.Value & " ")
    End While
End Sub
Result
12.36 11.99 7.97
You can also combine XmlReader and XmlDocument. On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.
Public Sub UserXmlReaderAndXmlDocument()
    ' Requires Imports System.Xml at the top of the file.
    Using RdrObj As XmlReader = XmlReader.Create("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        While RdrObj.Read()
            ' Match only the start tag of a TotalPrice element.
            If RdrObj.NodeType = XmlNodeType.Element AndAlso RdrObj.Name = "TotalPrice" Then
                ' Advance to the text node inside the element and output its value.
                RdrObj.Read()
                richTextBox1.AppendText(RdrObj.Value & " ")
            End If
        End While
    End Using
End Sub
Result
12.36 11.99 7.97
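Public Sub UserXmlReaderAndXmlDocumentNew()
    ' Loads only the TotalPrice nodes into DOM nodes instead of calling Load() on the
    ' whole document. Calling Load() after the reader has reached the end of the file
    ' would fail, because nothing is left to read.
    Using RdrObj As XmlReader = XmlReader.Create("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        Dim XmlDocObj As New XmlDocument()
        RdrObj.MoveToContent()
        While Not RdrObj.EOF
            If RdrObj.NodeType = XmlNodeType.Element AndAlso RdrObj.Name = "TotalPrice" Then
                ' ReadNode consumes the element and leaves the reader on the node that follows it.
                Dim priceNode As XmlNode = XmlDocObj.ReadNode(RdrObj)
                richTextBox1.AppendText(priceNode.InnerText & " ")
            Else
                RdrObj.Read()
            End If
        End While
    End Using
End Sub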
Design Considerations
- Avoid XML where possible.
- Avoid processing large documents.
- Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
- Avoid DTD, especially IDs and entity references.
- Use streaming interfaces such as XmlReader or SAXdotnet.
- Consider hard-coded processing, including validation.
- Shorten node name length.
- Consider sharing a NameTable, but only when the same names are likely to recur across documents. As more and more irrelevant names accumulate in the table, lookups become slower and slower.
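The NameTable advice can be illustrated with a minimal sketch. It assumes two small in-memory documents that use the same element names; the module name, the function name and the sample XML are hypothetical and not part of the original article. Both readers share one NameTable, so the element name can be compared by reference instead of by string content.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module NameTableExample
    ' Counts TotalPrice elements by comparing atomized names by reference.
    ' Hypothetical helper, written only for this illustration.
    Function CountTotalPrices(xmlText As String, names As XmlNameTable) As Integer
        Dim settings As New XmlReaderSettings()
        settings.NameTable = names
        ' Add returns the atomized string; adding an existing name returns the same instance.
        Dim totalPriceName As String = names.Add("TotalPrice")
        Dim found As Integer = 0
        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   Object.ReferenceEquals(reader.LocalName, totalPriceName) Then
                    found += 1
                End If
            End While
        End Using
        Return found
    End Function

    Sub Main()
        ' Assumed sample data; only the Orders/Order/TotalPrice shape comes from the article.
        Dim sharedNames As New NameTable()
        Dim firstDoc As String = "<Orders><Order><TotalPrice>12.36</TotalPrice></Order></Orders>"
        Dim secondDoc As String = "<Orders><Order><TotalPrice>11.99</TotalPrice></Order>" & _
                                  "<Order><TotalPrice>7.97</TotalPrice></Order></Orders>"
        Console.WriteLine(CountTotalPrices(firstDoc, sharedNames) + CountTotalPrices(secondDoc, sharedNames))
    End Sub
End Module
```

Sharing pays off here only because both documents use the same few names; with unrelated vocabularies the table would simply grow.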
Parsing XML
- Use XmlTextReader and avoid validating readers.
- When only a single node is required, consider using XmlDocument.ReadNode() rather than loading the entire document with Load().
- Set the XmlResolver property to Nothing (null in C#) on XmlReaders that expose it, to avoid access to external resources.
- Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the benefit is almost nothing when you use XmlValidatingReader.
- Avoid accessing Value for Text/CDATA nodes where possible.
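As a rough sketch of the parsing tips above, the following reads TotalPrice values while skipping a subtree that is not needed, with the resolver disabled. The Items element, the module and function names and the sample XML are assumptions made for the illustration; the article's own sample file is not shown.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module SkipExample
    ' Hypothetical helper: collects TotalPrice values and skips every Items subtree.
    Function ReadTotalPrices(xmlText As String) As List(Of String)
        Dim prices As New List(Of String)()
        Dim settings As New XmlReaderSettings()
        settings.XmlResolver = Nothing      ' no access to external resources
        settings.IgnoreWhitespace = True
        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
            reader.MoveToContent()          ' jump straight to the root element
            reader.Read()                   ' step inside the root element
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "TotalPrice" Then
                    ' Reads the text and moves past the end tag.
                    prices.Add(reader.ReadElementContentAsString())
                ElseIf reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "Items" Then
                    ' Skip the whole subtree without visiting its children.
                    reader.Skip()
                Else
                    reader.Read()
                End If
            End While
        End Using
        Return prices
    End Function

    Sub Main()
        ' Assumed sample data for the illustration.
        Dim sampleXml As String = _
            "<Orders>" & _
            "<Order><Items><Item>Book</Item></Items><TotalPrice>12.36</TotalPrice></Order>" & _
            "<Order><Items><Item>Pen</Item></Items><TotalPrice>11.99</TotalPrice></Order>" & _
            "</Orders>"
        For Each price As String In ReadTotalPrices(sampleXml)
            Console.WriteLine(price)
        Next
    End Sub
End Module
```

Because Skip() and ReadElementContentAsString() both advance the reader themselves, the loop calls Read() only in the remaining branch; calling it unconditionally would step over nodes.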
Validating XML
- Avoid extraneous validation.
- Consider caching schemas.
- Avoid identity constraints. They store keys and fields for the entire document, and the keys are boxed.
- Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). It could also result in avoiding access to the Value string.
Writing XML
- Write output directly wherever possible.
- To save documents, XmlTextWriter without indentation is better than TextWriter/Stream/file output (which is always indented), unless the output is meant for human reading.
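A minimal sketch of writing output directly with a non-indenting XmlTextWriter might look like this. The module name, the function name and the price values are placeholders chosen for the illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module WriteExample
    ' Hypothetical helper: writes an Orders document directly, without building a DOM first.
    Function BuildOrdersXml(prices As String()) As String
        Dim sw As New StringWriter()
        Using writer As New XmlTextWriter(sw)
            writer.Formatting = Formatting.None   ' no indentation (this is also the default)
            writer.WriteStartElement("Orders")
            For Each price As String In prices
                writer.WriteStartElement("Order")
                writer.WriteElementString("TotalPrice", price)
                writer.WriteEndElement()          ' </Order>
            Next
            writer.WriteEndElement()              ' </Orders>
        End Using
        Return sw.ToString()
    End Function

    Sub Main()
        Console.WriteLine(BuildOrdersXml(New String() {"12.36", "11.99"}))
    End Sub
End Module
```

The sketch writes to a StringWriter so that it stays self-contained; for a real document the writer would normally wrap a file or stream.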
DOM Processing
- Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
- Avoid PreviousSibling. XmlDocument is very inefficient for backward traversal.
- Append nodes as soon as possible. Adding a big subtree results in a longer extraneous run to check ID attributes.
- Prefer FirstChild/NextSibling and avoid accessing ChildNodes. It creates an XmlNodeList, which is not instantiated until it is first accessed.
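The forward-only traversal recommended above can be sketched as follows, assuming the Orders/Order/TotalPrice structure used by the XPath query earlier in the article. The module name, the function name and the sample XML are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Xml

Module DomExample
    ' Hypothetical helper: sums TotalPrice values using only FirstChild and NextSibling.
    Function SumTotalPrices(doc As XmlDocument) As Decimal
        Dim total As Decimal = 0D
        Dim orderNode As XmlNode = doc.DocumentElement.FirstChild
        While orderNode IsNot Nothing
            Dim child As XmlNode = orderNode.FirstChild
            While child IsNot Nothing
                If child.NodeType = XmlNodeType.Element AndAlso child.Name = "TotalPrice" Then
                    total += Decimal.Parse(child.InnerText, CultureInfo.InvariantCulture)
                End If
                child = child.NextSibling      ' forward only; PreviousSibling is never used
            End While
            orderNode = orderNode.NextSibling
        End While
        Return total
    End Function

    Sub Main()
        ' Assumed sample data for the illustration.
        Dim doc As New XmlDocument()
        doc.LoadXml("<Orders><Order><TotalPrice>12.36</TotalPrice></Order>" & _
                    "<Order><TotalPrice>11.99</TotalPrice></Order></Orders>")
        Console.WriteLine(SumTotalPrices(doc).ToString(CultureInfo.InvariantCulture))
    End Sub
End Module
```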
XPath Processing
- Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but there is no equivalent for XPathDocument.
- Avoid preceding-sibling and preceding axes queries, especially over XmlDocument. They would result in sorting, and for XmlDocument they need access to PreviousSibling.
- Avoid // (descendant). The returned nodes are most likely to be irrelevant.
- Avoid position(), last() and positional predicates (especially things like foo[last()-1]).
- Compile the XPath string to an XPathExpression and reuse it for frequent queries.
- Don't run XPath queries frequently. It is costly, since it always has to Clone() XPathNavigators.
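To illustrate reusing a compiled expression, here is a small sketch that compiles the article's query once and applies it to more than one document. The module name, the function name and the in-memory sample XML are assumptions made for the example.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module XPathReuseExample
    ' Compiled once and shared by every call below.
    Private ReadOnly TotalPriceQuery As XPathExpression = _
        XPathExpression.Compile("/Orders/Order/TotalPrice")

    ' Hypothetical helper: counts the nodes selected by the shared expression.
    Function CountTotalPrices(xmlText As String) As Integer
        Dim doc As New XPathDocument(New StringReader(xmlText))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Dim iter As XPathNodeIterator = nav.Select(TotalPriceQuery)
        Return iter.Count
    End Function

    Sub Main()
        ' Assumed sample data for the illustration.
        Dim firstDoc As String = "<Orders><Order><TotalPrice>12.36</TotalPrice></Order></Orders>"
        Dim secondDoc As String = "<Orders><Order><TotalPrice>11.99</TotalPrice></Order>" & _
                                  "<Order><TotalPrice>7.97</TotalPrice></Order></Orders>"
        Console.WriteLine(CountTotalPrices(firstDoc))
        Console.WriteLine(CountTotalPrices(secondDoc))
    End Sub
End Module
```

Each call still pays for the query itself, so this removes only the repeated parsing of the XPath string, not the per-query cost mentioned in the last item.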
XSLT Processing
- Reuse (cache) XslTransform objects.
- Avoid key() in XSLT. It can return all kinds of nodes, which prevents node-type based optimization.
- Avoid document() especially with nonstatic argument.
- Pull style (e.g. xsl:for-each) is usually better than template match.
- Minimize output size. More importantly, minimize input.
NOTE: THIS ARTICLE IS CONVERTED FROM C# TO VB.NET USING A CONVERSION TOOL. ORIGINAL ARTICLE CAN BE FOUND ON C# Corner (http://www.c-sharpcorner.com/).