Using Firebug to help with web page HTML parsing using HTMLParser

by Viper 22. January 2009 16:45

For a long time I have been using my own application to parse Microsoft Knowledge Base articles and store the content locally for display. All of a sudden the parser stopped working. When I started debugging, I realized that Microsoft had made some changes to their Knowledge Base pages that altered the structure of the markup. When I first wrote the application, I had to save the page as HTML, open it in the Visual Studio editor, and reformat it to figure out the structure. Not any more. Now we have a handy-dandy tool called Firebug that I can fire up to easily inspect the structure of a page. You can do the same with the Internet Explorer Developer Toolbar. With the new structure identified, my code to parse the page was reduced to the following snippet.


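// Match table cell nodes (TableColumn corresponds to <td>).
NodeFilter tableFilter = new NodeClassFilter(typeof(TableColumn));
// Match nodes whose class attribute is "listContainer".
NodeFilter obAttribFilter = new HasAttributeFilter("class", "listContainer");
// Require both conditions, then pull the matching nodes from the parser.
// obParser is the parser instance already created for the page.
NodeFilter andFilter = new AndFilter(tableFilter, obAttribFilter);
NodeList tblNodes = obParser.ExtractAllNodesThatMatch(andFilter);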

I use HTMLParser.Net for all my web page parsing. That's what reduced the parsing to those four lines of code.

Tags:

HTMLParser
