
Scraping Website Links with HtmlAgilityPack in C#

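Today I was looking for download links for a TV series and found a page that lists the download addresses for the whole series. The series has more than 40 episodes, though, and the links are long and awkward to copy by hand, so I decided to scrape them with HtmlAgilityPack instead.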


NuGet packages used: HtmlAgilityPack, HttpCode.Core

Request the page and fetch its full HTML

    using HttpCode.Core;

    static void Main(string[] args)
    {
        HttpHelpers httpHelpers = new HttpHelpers();
        HttpItems items = new HttpItems();
        items.Url = "https://www.123455.com/videodetails/2222.html"; // page to request
        items.Method = "Get"; // request method (use "Post" for a POST request)
        HttpResults hr = httpHelpers.GetHtml(items);
        JX(hr.Html); // parse the returned HTML
    }
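HttpCode.Core is a third-party wrapper around the HTTP request. If you would rather not take on that dependency, the same GET request can be made with the built-in `HttpClient`. The following is only a sketch of that alternative, not the author's code: `PageFetcher`, `FetchHtmlAsync` and `FixedHtmlHandler` are hypothetical names, and the stub handler exists only so the method can be tried without network access.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: fetches a page's HTML with the built-in HttpClient
public static class PageFetcher
{
    // One shared HttpClient for the whole program avoids exhausting sockets
    private static readonly HttpClient SharedClient = new HttpClient();

    // Downloads the page at `url`; `client` can be supplied to swap the handler (e.g. in tests)
    public static async Task<string> FetchHtmlAsync(string url, HttpClient client = null)
    {
        HttpClient http = client ?? SharedClient;
        using (HttpResponseMessage response = await http.GetAsync(url))
        {
            // Throws HttpRequestException on 4xx/5xx instead of returning an error page
            response.EnsureSuccessStatusCode();
            // Decodes the body using the charset from the Content-Type header when present
            return await response.Content.ReadAsStringAsync();
        }
    }
}

// Hypothetical stub handler: answers every request with fixed HTML, no network involved
public sealed class FixedHtmlHandler : HttpMessageHandler
{
    private readonly string _html;

    public FixedHtmlHandler(string html)
    {
        _html = html;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_html, Encoding.UTF8, "text/html"),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}
```

With an `async Task Main`, `JX(await PageFetcher.FetchHtmlAsync(url));` would take the place of the `HttpHelpers` calls. If the site serves GBK/GB2312 pages, .NET Core may also need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` before the response can be decoded.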

Parse the fetched HTML

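    using System.IO;
    using HtmlAgilityPack;

    /// <summary>
    /// Parses the downloaded HTML and writes each extracted link to a text file
    /// </summary>
    /// <param name="htmlCode">HTML source of the page</param>
    public static void JX(string htmlCode)
    {
        // HtmlAgilityPack
        // Source code: https://html-agility-pack.net/?z=codeplex
        // Alternative download: https://codeplexarchive.blob.core.windows.net/archive/projects/htmlagilitypack/htmlagilitypack.zip
        string path = System.AppDomain.CurrentDomain.BaseDirectory;
        var fileName = "抓取檔案.txt"; // output file name ("scraped file.txt")
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(htmlCode);
        HtmlNode rootNode = document.DocumentNode;
        // categoryNodeList: the collection of nodes matched by the XPath expression
        // Predicate syntax: tag[@attribute='value']
        HtmlNodeCollection categoryNodeList = rootNode.SelectNodes("//div[@id='content']//li[@id='li3_0']//span[@id='s3p0']");
        // Nodes can also be selected by an absolute XPath; HAPExplorer.exe (built from the source code above) can generate such paths
        //HtmlNodeCollection categoryNodeList = rootNode.SelectNodes("/html[1]/head[1]/div[2]/div[6]/ul[1]");
        // SelectNodes returns null, not an empty collection, when nothing matches
        if (categoryNodeList == null)
            return;
        foreach (var item in categoryNodeList)
        {
            var span = item.InnerHtml.Trim();
            // Split the span's markup on double quotes; element [3] is the second quoted attribute value (the link on this page)
            var href = span.Split('"')[3];
            WriteMessage(Path.Combine(path, fileName), href);
        }
    }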
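Splitting `InnerHtml` on `"` and taking index 3 only works as long as the markup keeps exactly that attribute order. A sturdier variant reads the `href` attribute directly. This is a sketch assuming the links are ordinary `<a href="...">` elements; `LinkExtractor` and `ExtractHrefs` are hypothetical names, while `SelectNodes`, `GetAttributeValue` and `HtmlEntity.DeEntitize` are HtmlAgilityPack members.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: reads href attributes instead of splitting raw markup
public static class LinkExtractor
{
    // Returns the decoded href of every node matched by `xpath`, in document order,
    // skipping empty values and duplicates
    public static List<string> ExtractHrefs(string html, string xpath = "//a[@href]")
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var hrefs = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return hrefs;

        var seen = new HashSet<string>();
        foreach (HtmlNode node in nodes)
        {
            // DeEntitize decodes entities such as &amp; that may remain in the attribute value
            string href = HtmlEntity.DeEntitize(node.GetAttributeValue("href", string.Empty)).Trim();
            if (href.Length > 0 && seen.Add(href))
                hrefs.Add(href);
        }
        return hrefs;
    }
}
```

For a page like the one in this post, an XPath such as `//div[@id='content']//a[@href]` (hypothetical; it depends on the page's actual markup) would limit the search to the episode list.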

Write the links to a text file

    using System.IO;

    /// <summary>
    /// Appends the given message to a text file
    /// </summary>
    /// <param name="path">Path of the text file</param>
    /// <param name="msg">Message to write</param>
    public static void WriteMessage(string path, string msg)
    {
        // FileMode.Append creates the file if needed and positions the stream at the end
        using (FileStream fs = new FileStream(path, FileMode.Append, FileAccess.Write))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            // One message per line; disposing the writer flushes it
            sw.WriteLine(msg);
        }
    }
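Because `WriteMessage` reopens the file for every link and always appends, running the program twice leaves every link in the file twice. A minimal alternative, assuming the links are first collected into a list inside the `foreach`, writes them all in one call; `LinkFileWriter` and `SaveLinks` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: saves all scraped links in a single write
public static class LinkFileWriter
{
    // Writes one link per line, overwriting the file so re-running the scraper
    // does not accumulate duplicates; blank entries and repeats are skipped
    public static void SaveLinks(string path, IEnumerable<string> links)
    {
        var unique = new List<string>();
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (string link in links)
        {
            if (!string.IsNullOrWhiteSpace(link) && seen.Add(link))
                unique.Add(link);
        }
        // Creates the file if it does not exist, otherwise replaces its contents
        File.WriteAllLines(path, unique);
    }
}
```

Overwriting is a deliberate choice here: the file always reflects the latest scrape. If a running log is preferred, `File.AppendAllLines` takes the same arguments and appends instead.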