Python (Part 10): Fetching Web Page Content and Parsing HTML with the BeautifulSoup Library

阿新 • Published: 2019-02-18

1. Fetching web page content with the urllib package

# Import the package
from urllib.request import urlopen

response = urlopen("http://fund.eastmoney.com/fund.html")
html = response.read()

# This page is encoded as gb2312
# print(html.decode("gb2312"))

# Save the HTML content to a file, re-encoded as UTF-8
# (the with block closes the file automatically)
with open("1.txt", "wb") as f:
    f.write(html.decode("gb2312").encode("utf8"))
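
The snippet above hard-codes gb2312, which fails if the site changes its encoding or sends a byte outside that character set. A minimal sketch of a more defensive decode step follows. The helper names `decode_html` and `fetch_text` are hypothetical and not part of the original article. The sketch assumes the server may declare a charset in its Content-Type header and falls back to gb2312 otherwise.

```python
from typing import Optional
from urllib.request import urlopen


def decode_html(raw: bytes, declared: Optional[str] = None,
                fallback: str = "gb2312") -> str:
    """Decode raw page bytes, preferring the charset the server declared.

    `declared` and `fallback` are illustrative parameters (assumptions).
    """
    for encoding in (declared, fallback):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset name: try the next candidate.
            continue
    # Last resort: decode anyway, replacing bytes that cannot be decoded.
    return raw.decode(fallback, errors="replace")


def fetch_text(url: str, fallback: str = "gb2312") -> str:
    """Download a page and return it as text (performs a network request)."""
    with urlopen(url, timeout=10) as response:
        # Charset from the Content-Type header, or None if absent.
        declared = response.headers.get_content_charset()
        return decode_html(response.read(), declared, fallback)
```

A call such as `fetch_text("http://fund.eastmoney.com/fund.html")` would then return a `str` that can be written out with `open("1.txt", "w", encoding="utf8")`, without the manual decode and encode round trip.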

2. Parsing HTML with BeautifulSoup

from bs4 import BeautifulSoup

# Read back the file saved in step 1
with open("1.txt", "rb") as f:
    html = f.read().decode("utf8")

# Parse the HTML content
soup = BeautifulSoup(html, "html.parser")

# Extract the page title
print(soup.title)  # <title>每日開放式基金淨值表 _ 天天基金網</title>

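# Fund codes: every <td class="bzdm"> cell inside the table with id "oTable"
codes = soup.find("table", id="oTable").tbody.find_all("td", "bzdm")

result = ()  # Initialize an empty tuple
for code in codes:
    # The cells after the code cell hold the name, the net asset value (NAV),
    # and the accumulated NAV, in that order
    result += ({
        "code": code.get_text(),
        "name": code.next_sibling.find("a").get_text(),
        "NAV": code.next_sibling.next_sibling.get_text(),
        "ACCNAV": code.next_sibling.next_sibling.next_sibling.get_text()
    },)

# Print each fund's name and NAV
for item in result:
    print(item["name"] + "---" + item["NAV"])
# Print the name of the first fund
print(result[0]["name"])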
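
One caveat with chained `next_sibling` calls: when the markup has whitespace or line breaks between `<td>` elements, `next_sibling` returns that whitespace text node rather than the next cell. The sketch below uses `find_next_sibling("td")` instead, which skips text nodes. The sample markup, the fund values, and the `parse_funds` helper are hypothetical, invented only to mirror the shape of the table described above. The sketch assumes each code cell is followed by at least three more cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the table the article scrapes.
SAMPLE = """
<table id="oTable"><tbody>
<tr>
  <td class="bzdm">000001</td>
  <td class="tol"><a href="#">Fund A</a></td>
  <td>1.2340</td>
  <td>3.4560</td>
</tr>
</tbody></table>
"""


def parse_funds(html):
    """Return one dict per fund row found in the table with id 'oTable'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="oTable")
    if table is None:
        return []
    rows = []
    for code in table.find_all("td", class_="bzdm"):
        # find_next_sibling("td") skips whitespace text nodes between cells.
        name_cell = code.find_next_sibling("td")
        nav_cell = name_cell.find_next_sibling("td")
        acc_cell = nav_cell.find_next_sibling("td")
        rows.append({
            "code": code.get_text(strip=True),
            "name": name_cell.get_text(strip=True),
            "NAV": nav_cell.get_text(strip=True),
            "ACCNAV": acc_cell.get_text(strip=True),
        })
    return rows


funds = parse_funds(SAMPLE)
for fund in funds:
    print(fund["name"] + "---" + fund["NAV"])
```

Collecting rows in a list with `append` also avoids rebuilding a tuple on every iteration, which matters once the table has thousands of rows.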
