Parsing HTML Files in C#

阿新 • Published: 2019-01-03

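When you need to parse a web page, plain string searching is enough for very simple cases, and regular expressions can handle slightly more complex ones. Regular expressions quickly become painful, though, because HTML itself is messy. Take the common img tag: in the markup a browser works with, it has no closing tag, because img is a void element in HTML. Trying to parse the page as XML fails for the same reason, since the markup is not well-formed XML. Today I came across an HTML parsing library, Html Agility Pack, tried it out, and found it very easy to use.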
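To see why a strict XML parser rejects ordinary HTML, here is a minimal sketch that uses only System.Xml from the .NET base library. The sample markup and the names XmlStrictnessDemo and TryParseAsXml are made up for illustration.

```csharp
using System;
using System.Xml;

public static class XmlStrictnessDemo
{
    // Hypothetical helper: returns true if the markup is well-formed XML,
    // false if XmlDocument rejects it with an XmlException.
    public static bool TryParseAsXml(string markup)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Typical HTML: <img> is written without a closing tag.
        Console.WriteLine(TryParseAsXml("<p><img src=\"a.png\"></p>"));   // False

        // XHTML-style self-closing form, which is well-formed XML.
        Console.WriteLine(TryParseAsXml("<p><img src=\"a.png\" /></p>")); // True
    }
}
```

The first call fails because the XML parser expects a closing tag for img before it sees the closing tag for p. An HTML-aware parser does not have this restriction.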

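I'll paste the example here directly; it should be clear at a glance. Because the document is handled much like XML, XPath is unavoidable. If you don't know XPath yet, take a quick look at it: the syntax is fairly simple, the basics are enough to get started, and CSS selectors express similar ideas.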

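The most basic entry point is not SelectSingleNode but GetElementbyId (note the lowercase "b" in the Html Agility Pack method name). This is where it differs from typical XmlDocument usage.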

    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    // The HtmlWeb class is a utility class to get the HTML over HTTP
    HtmlWeb htmlWeb = new HtmlWeb();

    // Creates an HtmlDocument object from a URL.
    // The URL argument is missing from the original listing; "url" is a placeholder string.
    HtmlAgilityPack.HtmlDocument document = htmlWeb.Load(url);

    // Targets a specific node
    HtmlNode someNode = document.GetElementbyId("mynode");

    // If there is no node with that Id, someNode will be null
    if (someNode != null) {
        // Extracts all links within that node
        IEnumerable<HtmlNode> allLinks = someNode.Descendants("a");

        // Outputs the href for external links
        foreach (HtmlNode link in allLinks) {
            // Checks whether the link contains an HREF attribute
            if (link.Attributes.Contains("href")) {
                // Simple check: if the href begins with "http://", prints it out
                if (link.Attributes["href"].Value.StartsWith("http://"))
                    Console.WriteLine(link.Attributes["href"].Value);
            }
        }
    }
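The same lookup can be tried without a network request. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name OfflineLinkDemo, the method ExternalLinks, and the sample markup are hypothetical. It parses a string with LoadHtml instead of fetching a page, and uses GetAttributeValue with a default so that links without an href need no separate check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

public static class OfflineLinkDemo
{
    // Hypothetical helper: returns the href of every <a> under the element
    // with the given id whose href starts with "http://".
    // Returns an empty list when no element has that id.
    public static List<string> ExternalLinks(string html, string id)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse from a string, no HTTP request

        HtmlNode container = document.GetElementbyId(id);
        if (container == null)
            return new List<string>();

        return container.Descendants("a")
            // GetAttributeValue returns the default ("") when href is missing
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.StartsWith("http://", StringComparison.Ordinal))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup: one external link, one local link, one anchor without href.
        const string html =
            "<div id='mynode'>" +
            "<a href='http://example.com/'>external</a>" +
            "<a href='/local'>local</a>" +
            "<a name='anchor'>no href</a>" +
            "</div>";

        foreach (string href in ExternalLinks(html, "mynode"))
            Console.WriteLine(href);
    }
}
```

Returning an empty list instead of null keeps the calling loop free of null checks, which is a design choice of this sketch rather than something the library requires.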

Using XPath

    // Extracts all links under a specific node that have an href that begins with "http://"
    HtmlNodeCollection allLinks = document.DocumentNode.SelectNodes("//*[@id='mynode']//a[starts-with(@href,'http://')]");

    // By default SelectNodes returns null when nothing matches, so guard before iterating
    if (allLinks != null) {
        // Outputs the href for external links
        foreach (HtmlNode link in allLinks)
            Console.WriteLine(link.Attributes["href"].Value);
    }

A few more XPath expressions

    xpath = "//table[@id='1' or @id='2' or @id='3']//a[@onmousedown]";
    xpath = "//ul[@id='wg0']//li[position()<4]/h3/a";
    xpath = "//div[@class='resitem' and position()<4]/a";
    xpath = "//li[@class='result' and position()<4]/a";
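The position()<4 predicate in these expressions keeps only the first three matching children of each parent. Here is a minimal sketch of that behavior using System.Xml on a small well-formed document, on the assumption that Html Agility Pack applies the same XPath 1.0 semantics. The class name XPathPositionDemo, the helper SelectTexts, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathPositionDemo
{
    // Hypothetical helper: evaluates an XPath 1.0 expression against
    // well-formed XML and returns the text of each matching node.
    public static List<string> SelectTexts(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            result.Add(node.InnerText);
        return result;
    }

    public static void Main()
    {
        // Made-up list with five entries.
        const string xml =
            "<root><ul id='wg0'>" +
            "<li><h3><a>one</a></h3></li>" +
            "<li><h3><a>two</a></h3></li>" +
            "<li><h3><a>three</a></h3></li>" +
            "<li><h3><a>four</a></h3></li>" +
            "<li><h3><a>five</a></h3></li>" +
            "</ul></root>";

        // Only the first three <li> children of the <ul> pass the predicate.
        List<string> texts = SelectTexts(xml, "//ul[@id='wg0']//li[position()<4]/h3/a");
        Console.WriteLine(string.Join(", ", texts)); // one, two, three
    }
}
```

Note that position() counts among sibling elements selected by the step, so with several lists under the same ul the expression would return the first three li of each one.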
