Easily find tags and values in a large XML document using XmlTextReader in VB

Posted by Liju Gopalan Articles | XML in VB.NET April 17, 2008
In this article you will learn how to find tags and values in a large XML document using XmlTextReader in VB.
 
Use XmlTextReader to parse large XML documents.
 

' Requires: Imports System.Xml
Public Sub findAParticularNodesUsingTextReader()
    ' Stream the file; Using closes the reader when the block ends.
    Using txtreaderObj As New XmlTextReader("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        txtreaderObj.WhitespaceHandling = WhitespaceHandling.None
        While txtreaderObj.Read()
            ' Match only the start tag of <TotalPrice>.
            If txtreaderObj.NodeType = XmlNodeType.Element AndAlso txtreaderObj.Name = "TotalPrice" Then
                ' Advance to the text node inside the element.
                txtreaderObj.Read()
                ' Separate the values with a space, as in the result below.
                richTextBox1.AppendText(txtreaderObj.Value & " ")
            End If
        End While
    End Using
End Sub

Result 

12.36 11.99 7.97
 

For faster, read-only, XPath query-based access to data, use XPathDocument and XPathNavigator together with an XPath query.


' Requires: Imports System.Xml.XPath
Public Sub FindTagsUsingXPthNaviatorAndXPathDocumentNew()
    ' Load the file into a read-only store optimized for XPath queries.
    Dim xpDoc As New XPathDocument("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
    Dim xpNav As XPathNavigator = xpDoc.CreateNavigator()
    ' Compile the query once so it can be reused.
    Dim xpExpression As XPathExpression = xpNav.Compile("/Orders/Order/TotalPrice")
    Dim xpIter As XPathNodeIterator = xpNav.[Select](xpExpression)
    While xpIter.MoveNext()
        ' Separate the values with a space, as in the result below.
        richTextBox1.AppendText(xpIter.Current.Value & " ")
    End While
End Sub

Result

12.36 11.99 7.97


Combine XmlReader and XmlDocument. On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.


' Requires: Imports System.Xml
Public Sub UserXmlReaderAndXmlDocument()
    Using RdrObj As XmlReader = XmlReader.Create("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        While RdrObj.Read()
            ' NodeType = Element already identifies a start tag.
            If RdrObj.NodeType = XmlNodeType.Element AndAlso RdrObj.Name = "TotalPrice" Then
                ' Advance to the text node inside the element.
                RdrObj.Read()
                richTextBox1.AppendText(RdrObj.Value & " ")
            End If
        End While
    End Using
End Sub

Result

12.36 11.99 7.97


' Requires: Imports System.Xml
Public Sub UserXmlReaderAndXmlDocumentNew()
    Using RdrObj As XmlReader = XmlReader.Create("C:\Documents and Settings\Administrator\Desktop\samleXmlDoc.xml")
        Dim XmlDocObj As New XmlDocument()
        ' Skip the prolog, then step inside the root element.
        RdrObj.MoveToContent()
        RdrObj.Read()
        While Not RdrObj.EOF
            If RdrObj.NodeType = XmlNodeType.Element AndAlso RdrObj.Name = "TotalPrice" Then
                ' ReadNode builds a DOM node for this element only and
                ' leaves the reader on the node that follows it.
                Dim priceNode As XmlNode = XmlDocObj.ReadNode(RdrObj)
                richTextBox1.AppendText(priceNode.InnerText & " ")
            Else
                RdrObj.Read()
            End If
        End While
    End Using
End Sub

Design Considerations

 

  • Avoid XML where possible.
  • Avoid processing large documents.
  • Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
  • Avoid DTD, especially IDs and entity references.
  • Use streaming interfaces such as XmlReader or SAXdotnet.
  • Consider hard-coded processing, including validation.
  • Shorten node name length.
  • Consider sharing a NameTable, but only when the documents are likely to have most of their names in common. As more and more unrelated names accumulate in the table, it becomes slower and slower.
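
The NameTable tip can be sketched roughly as follows. This is a minimal illustration, not the author's code: the two in-memory documents, the module name, and the use of Console output instead of richTextBox1 are all assumptions made to keep the example self-contained.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module SharedNameTableSketch
    Sub Main()
        ' Assumed sample data: two documents that use the same element names.
        Dim firstDoc As String = "<Orders><Order><TotalPrice>12.36</TotalPrice></Order></Orders>"
        Dim secondDoc As String = "<Orders><Order><TotalPrice>11.99</TotalPrice></Order></Orders>"

        ' One NameTable shared by every reader, so each name is atomized only once.
        Dim sharedNames As New NameTable()
        Dim settings As New XmlReaderSettings()
        settings.NameTable = sharedNames

        For Each xmlText As String In New String() {firstDoc, secondDoc}
            Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
                While reader.Read()
                    If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "TotalPrice" Then
                        reader.Read()                      ' move to the text node
                        Console.WriteLine(reader.Value)
                    End If
                End While
            End Using
        Next
    End Sub
End Module
```

Sharing pays off here because both documents use exactly the same names; with unrelated documents the shared table only grows.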

Parsing XML

 

  • Use XmlTextReader and avoid validating readers.
  • When only a single node is required, consider using XmlDocument.ReadNode() rather than loading the entire document with Load().
  • Set the XmlResolver property to null (Nothing in VB) on readers that expose it, such as XmlTextReader, to avoid access to external resources.
  • Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the benefit becomes almost nothing when you use XmlValidatingReader.
  • Avoid accessing Value for Text/CDATA nodes as far as possible.
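
A small sketch of the MoveToContent(), Skip() and XmlResolver tips above. The Items subtree, the sample data, and the module name are hypothetical, and output goes to the console rather than richTextBox1 so the example can run on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module SkipSketch
    Sub Main()
        ' Assumed sample data: each order carries an <Items> subtree we do not need.
        Dim xmlText As String =
            "<Orders>" &
            "<Order><Items><Item>a</Item><Item>b</Item></Items><TotalPrice>12.36</TotalPrice></Order>" &
            "<Order><Items><Item>c</Item></Items><TotalPrice>11.99</TotalPrice></Order>" &
            "</Orders>"

        Dim settings As New XmlReaderSettings()
        settings.XmlResolver = Nothing    ' no external DTD or entity lookups

        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
            reader.MoveToContent()        ' jump past any prolog to the root element
            reader.Read()                 ' step inside the root
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "Items" Then
                    ' Skip the whole subtree; the reader lands on the node after </Items>.
                    reader.Skip()
                ElseIf reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "TotalPrice" Then
                    ' Reads the text and moves past </TotalPrice>.
                    Console.WriteLine(reader.ReadElementContentAsString())
                Else
                    reader.Read()
                End If
            End While
        End Using
    End Sub
End Module
```

Because Skip() and ReadElementContentAsString() both advance the reader themselves, the loop calls Read() only in the remaining branch; calling it unconditionally would step over the node that follows.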

Validating XML

 

  • Avoid extraneous validation.
  • Consider caching schemas.
  • Avoid identity constraint usage. Not only because it stores key/fields for the entire document, but also because the keys are boxed.
  • Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). On the other hand, it could also let you avoid accessing the Value string.
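
One way to read the "consider caching schemas" tip is to compile an XmlSchemaSet once and hand the same instance to every validating reader. The sketch below assumes a deliberately tiny schema (a root TotalPrice element of type xs:decimal); the module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module CachedSchemaSketch
    ' Compiled on first use, then reused for every validation run.
    Private cachedSchemas As XmlSchemaSet

    Private Function GetSchemas() As XmlSchemaSet
        If cachedSchemas Is Nothing Then
            ' Assumed minimal schema, kept inline so the example is self-contained.
            Dim xsd As String =
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" &
                "<xs:element name='TotalPrice' type='xs:decimal'/>" &
                "</xs:schema>"
            Dim schemas As New XmlSchemaSet()
            schemas.Add(Nothing, XmlReader.Create(New StringReader(xsd)))
            schemas.Compile()
            cachedSchemas = schemas
        End If
        Return cachedSchemas
    End Function

    ' Hypothetical helper: True when the document validates against the cached schema.
    Function IsValid(xmlText As String) As Boolean
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        settings.Schemas = GetSchemas()
        Try
            Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText), settings)
                While reader.Read()
                    ' Reading the document to the end drives validation.
                End While
            End Using
            Return True
        Catch ex As XmlSchemaValidationException
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(IsValid("<TotalPrice>12.36</TotalPrice>"))
        Console.WriteLine(IsValid("<TotalPrice>abc</TotalPrice>"))
    End Sub
End Module
```

The cache here is a plain module-level field with no locking, which is enough for a single-threaded sketch but would need synchronization in a multithreaded application.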

Writing XML

 

  • Write output directly as far as possible.
  • To save documents, XmlTextWriter without indentation is better than TextWriter/Stream/file output (all indented), unless the output is meant for human reading.
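
As a rough illustration of writing directly with an unindented XmlTextWriter, the sketch below writes a small document to a StringWriter. The element names follow the article's sample; the module name and the in-memory target are assumptions.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module WriterSketch
    Sub Main()
        Dim output As New StringWriter()
        Using writer As New XmlTextWriter(output)
            ' Formatting.None is the default; it is set here only to make the choice explicit.
            writer.Formatting = Formatting.None
            writer.WriteStartElement("Orders")
            writer.WriteStartElement("Order")
            writer.WriteElementString("TotalPrice", "12.36")
            writer.WriteEndElement()   ' </Order>
            writer.WriteEndElement()   ' </Orders>
        End Using
        Console.WriteLine(output.ToString())
    End Sub
End Module
```

Switching to Formatting.Indented is the one-line change to make when the output is meant to be read by people.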

DOM Processing

 

  • Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
  • Avoid PreviousSibling. XmlDocument is very inefficient for backward traversal.
  • Append nodes as soon as possible. Adding a big subtree results in a longer extraneous pass to check ID attributes.
  • Prefer FirstChild/NextSibling and avoid accessing ChildNodes. Accessing it creates an XmlNodeList, which is otherwise not instantiated.
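
The FirstChild/NextSibling tip might look like this in practice. The document content and module name are assumed for illustration, and the walk is forward-only so that neither ChildNodes nor PreviousSibling is touched.

```vbnet
Imports System
Imports System.Xml

Module SiblingWalkSketch
    Sub Main()
        Dim doc As New XmlDocument()
        ' Assumed sample data in the shape used by the article's XPath query.
        doc.LoadXml("<Orders><Order><TotalPrice>12.36</TotalPrice></Order>" &
                    "<Order><TotalPrice>11.99</TotalPrice></Order></Orders>")

        ' Walk the <Order> elements forward via FirstChild and NextSibling.
        Dim order As XmlNode = doc.DocumentElement.FirstChild
        While order IsNot Nothing
            Dim child As XmlNode = order.FirstChild
            While child IsNot Nothing
                If child.NodeType = XmlNodeType.Element AndAlso child.Name = "TotalPrice" Then
                    Console.WriteLine(child.InnerText)
                End If
                child = child.NextSibling
            End While
            order = order.NextSibling
        End While
    End Sub
End Module
```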

XPath Processing

 

  • Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but XPathDocument has no equivalent.
  • Avoid preceding-sibling and preceding axes queries, especially over XmlDocument. They would result in sorting, and for XmlDocument they need access to PreviousSibling.
  • Avoid // (descendant). The returned nodes are most likely to be irrelevant.
  • Avoid position(), last() and positional predicates (especially things like foo[last()-1]).
  • Compile the XPath string to an XPathExpression and reuse it for frequently run queries.
  • Don't run XPath queries more often than necessary. Each query is costly since it always has to Clone() XPathNavigators.
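
A minimal sketch of compiling an XPath string once and reusing the resulting XPathExpression. The PrintTotals helper, the module name, and the in-memory documents are hypothetical; the query string is the one used earlier in the article.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module CompiledXPathSketch
    ' Compiled once and reused for every document that is queried.
    Private ReadOnly TotalPriceQuery As XPathExpression =
        XPathExpression.Compile("/Orders/Order/TotalPrice")

    ' Hypothetical helper: prints every TotalPrice value in the given document.
    Sub PrintTotals(xmlText As String)
        Dim nav As XPathNavigator = New XPathDocument(New StringReader(xmlText)).CreateNavigator()
        Dim iter As XPathNodeIterator = nav.Select(TotalPriceQuery)
        While iter.MoveNext()
            Console.WriteLine(iter.Current.Value)
        End While
    End Sub

    Sub Main()
        PrintTotals("<Orders><Order><TotalPrice>12.36</TotalPrice></Order></Orders>")
        PrintTotals("<Orders><Order><TotalPrice>11.99</TotalPrice></Order></Orders>")
    End Sub
End Module
```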

XSLT Processing

 

  • Reuse (cache) XslTransform objects.
  • Avoid key() in XSLT. It can return all kinds of nodes, which prevents node-type based optimization.
  • Avoid document(), especially with a non-static argument.
  • Pull style (e.g. xsl:for-each) is usually better than template matching.
  • Minimize output size. More importantly, minimize input.
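
The caching tip can be sketched as below. Note the substitution: the list names XslTransform, while this example uses XslCompiledTransform, its replacement in later versions of .NET. The GetTransform helper, the cache key, the stylesheet, and the input document are all hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module CachedTransformSketch
    Private ReadOnly cache As New Dictionary(Of String, XslCompiledTransform)()

    ' Hypothetical helper: compiles the stylesheet on first use, then returns the cached copy.
    Function GetTransform(key As String, stylesheetText As String) As XslCompiledTransform
        Dim xslt As XslCompiledTransform = Nothing
        If Not cache.TryGetValue(key, xslt) Then
            xslt = New XslCompiledTransform()
            Using reader As XmlReader = XmlReader.Create(New StringReader(stylesheetText))
                xslt.Load(reader)
            End Using
            cache(key) = xslt
        End If
        Return xslt
    End Function

    Sub Main()
        ' Assumed stylesheet: pull style (xsl:for-each) with text output.
        Dim sheet As String =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" &
            "<xsl:output method='text'/>" &
            "<xsl:template match='/'>" &
            "<xsl:for-each select='Orders/Order'>" &
            "<xsl:value-of select='TotalPrice'/><xsl:text> </xsl:text>" &
            "</xsl:for-each>" &
            "</xsl:template></xsl:stylesheet>"
        Dim inputXml As String =
            "<Orders><Order><TotalPrice>12.36</TotalPrice></Order>" &
            "<Order><TotalPrice>11.99</TotalPrice></Order></Orders>"

        ' The second pass reuses the transform compiled during the first.
        For pass As Integer = 1 To 2
            Dim xslt As XslCompiledTransform = GetTransform("totals", sheet)
            Using source As XmlReader = XmlReader.Create(New StringReader(inputXml))
                xslt.Transform(source, Nothing, Console.Out)
            End Using
            Console.WriteLine()
        Next
    End Sub
End Module
```

The Dictionary is not synchronized, so a server application would need a lock or a concurrent collection around the cache.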

NOTE: THIS ARTICLE IS CONVERTED FROM C# TO VB.NET USING A CONVERSION TOOL. ORIGINAL ARTICLE CAN BE FOUND ON C# Corner (http://www.c-sharpcorner.com/).

 

How do I loop through the file using xpath navigator and first get dr-nbr for the first record? As the first record does not have bidder info, go to the bidder info, which is in a sub node in the second record. See the XML data shown below:

<reports>
  <report>
    <xml-report>
      <summary>
        <dr-nbr>2012004</dr-nbr>
      </summary>
      <data>
        <proj-title>
          <title-code>
            <title ci = "N">District 321</title>
          </title-code>
        </proj-title>
        <p-location>
          <project-location>
            <title-code>
              <p-county-name ci = "N">BOOLE</p-county-name>
              <p-fips-county>MT101</p-fips-county>
              <p-city-name ci = "N">MTELBY</p-city-name>
              <p-state-id ci = "N">MT</p-state-id>
              <p-zip-code ci = "N">69474</p-zip-code>
              <p-zip-code5 ci = "N">69474</p-zip-code5>
              <p-country-id ci = "N">USA</p-country-id>
            </title-code>
          </project-location>
          <pct-project-county>
            <title-code>
              <p-county-name ci = "N">TOOLE</p-county-name>
              <p-fips-county>MT101</p-fips-county>
              <p-state-id>MT</p-state-id>
              <p-country-id>USA</p-country-id>
            </title-code>
          </pct-project-county>
        </p-location>
        <status>
          <title-code>
            <status-proj-dlvry-sys ci = "N">Design-Bid-Build</status-proj-dlvry-sys>
          </title-code>
        </status>
      </data>
    </xml-report>
  </report>
  <report>
    <xml-report>
      <summary>
        <dr-nbr>2011005</dr-nbr>
      </summary>
      <data>
        <proj-title>
          <title-code>
            <title ci = "N">Plane Pitch</title>
          </title-code>
        </proj-title>
        <p-location>
          <project-location>
            <title-code>
              <p-county-name ci = "A">SUMMIT</p-county-name>
              <p-fips-county>MI153</p-fips-county>
              <p-city-name ci = "A">AVON</p-city-name>
              <p-state-id ci = "A">MI</p-state-id>
              <p-zip-code ci = "C">44308</p-zip-code>
              <p-zip-code5 ci = "C">44308</p-zip-code5>
              <p-country-id ci = "A">USA</p-country-id>
            </title-code>
          </project-location>
          <pct-project-county>
            <title-code>
              <p-county-name ci = "A">SUMMIT</p-county-name>
              <p-fips-county>OH153</p-fips-county>
              <p-state-id>OH</p-state-id>
              <p-country-id>USA</p-country-id>
            </title-code>
          </pct-project-county>
        </p-location>
        <project-bidder-information>
          <title-code>
            <bid-header>
              <bid-header-desc ci = "N">Low Bidders</bid-header-desc>
              <bid-title>
                <bid-details>
                  <contact-information>
                    <firm-name>Many Stocks</firm-name>
                    <contact-name>Many Moree</contact-name>
                  </contact-information>
                </bid-details>
                <bid-details>
                  <contact-information>
                    <firm-name>Who Constrcution</firm-name>
                  </contact-information>
                </bid-details>
              </bid-title>
            </bid-header>
          </title-code>
        </project-bidder-information>
      </data>
    </xml-report>
  </report>
</reports>

I am able to get bidder info like this, but how do I know which project it is for?

Dim BidCompanyiter As XPathNodeIterator
BidCompanyiter = nav.Select("reports/report/xml-report/data/project-bidder-information/title-code/bid-header/bid-title/bid-details/contact-information/firm-name")
While BidCompanyiter.MoveNext
    'Get the data we need from the node
    lstNav = BidCompanyiter.Current
    iterNews = lstNav.SelectDescendants(XPathNodeType.Element, False)
    'Loop through the child nodes
    Response.Write(iterNews.Current.Name & ": " & iterNews.Current.Value)
    Response.Write("<BR>")
    Exit While
End While

Posted by Murari Yalamanchily Feb 29, 2012

Great job!

Posted by John Doe May 13, 2009