Scraping a Web List Page with a Python Crawler

阿新 • Published: 2018-12-17

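Target: the private fund disclosure list page.

Open the page, right-click and choose Inspect, then watch the request URLs in the Network tab to find the pattern in how the URL changes from page to page.

It turns out that only the page parameter changes, so paging is just a matter of counting it up.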
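As a minimal sketch of that observation, the URL for each list page can be generated by varying only `page`. The endpoint and the `rand` and `size` values are copied from the script in this post; the helper names `build_page_url` and `build_page_urls` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# Endpoint and fixed query values as they appear in the post's script.
BASE_URL = "http://gs.amac.org.cn/amac-infodisc/api/pof/fund"
RAND = "0.49229080398526315"


def build_page_url(page, size=20):
    """Return the list URL for one page; only `page` differs between pages."""
    query = urlencode({"rand": RAND, "page": page, "size": size})
    return BASE_URL + "?" + query


def build_page_urls(first, last, size=20):
    """Return the URLs for pages first..last-1 (hypothetical helper)."""
    return [build_page_url(page, size) for page in range(first, last)]


if __name__ == "__main__":
    for url in build_page_urls(0, 4):
        print(url)
```

Building the query with `urlencode` instead of string formatting keeps the parameters in one place, which makes it easier to adjust `size` or drop `rand` if the site turns out not to need it.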

# -*- coding: utf-8 -*-
"""
Created on Sat Oct 27 11:21:11 2018

@author: Belinda
"""
import csv
import time
from multiprocessing import Pool  # only needed for the disabled Pool lines below

import requests
from lxml import etree


def spider(writer):
    # Browser-like User-Agent header sent with every request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/46.0.2490.80 Safari/537.36'
    }
    for i in range(0, 4):
        url = ('http://gs.amac.org.cn/amac-infodisc/api/pof/fund'
               '?rand=0.49229080398526315&page={}&size=20').format(i)
        html = requests.get(url, headers=headers)
        time.sleep(1)  # pause between requests
        # Initialise etree with the fetched page to get a selector,
        # then use XPath on the selector to extract the data
        selector = etree.HTML(html.text)
        # Get the fund list first: compare the XPath of each row and use the
        # shared part as the XPath for simu_list (tr rather than tr[1], so
        # that every row is matched instead of only the first)
        simu_list = selector.xpath('//*[@id="fundlist"]/tbody/tr')
        for simu in simu_list:
            fund_id = ''.join(simu.xpath('td[1]/text()'))
            fundName = ''.join(simu.xpath('td[2]/a/text()'))
            managerName = ''.join(simu.xpath('td[3]/a/text()'))
            mandatorName = ''.join(simu.xpath('td[4]/text()'))
            establishDate = ''.join(simu.xpath('td[5]/text()'))
            recordTime = ''.join(simu.xpath('td[6]/text()'))
            item = (fund_id, fundName, managerName, mandatorName,
                    establishDate, recordTime)
            print(item)
            writer.writerow(item)


if __name__ == '__main__':
    fp = open("./simuwang.csv", 'a+', encoding="utf-8", newline="")
    writer = csv.writer(fp)
    # Column names of the CSV file
    writer.writerow(('id', 'fundName', 'managerName', 'mandatorName',
                     'establishDate', 'recordTime'))
    # pool = Pool(4)
    # pool.map(spider())
    spider(writer)
    fp.close()
    print("Crawl finished!")
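The script leaves its Pool lines commented out and fetches the pages one after another. Below is a rough sketch of how a pool `map` over page numbers could look. It is not the author's method: it assumes a per-page function that returns rows instead of writing them, and it swaps in the thread-backed `multiprocessing.dummy.Pool`, which exposes the same `map()` interface, because the work is mostly waiting on the network. `crawl_pages`, `write_rows` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for real requests so the example runs offline.

```python
import csv
import io
from multiprocessing.dummy import Pool  # thread pool with the Pool.map() interface


def crawl_pages(fetch_page, pages, workers=4):
    """Call fetch_page(page) for every page concurrently.

    map() receives the function itself (not the result of calling it)
    and returns the per-page results in the same order as `pages`.
    """
    with Pool(workers) as pool:
        per_page = pool.map(fetch_page, pages)
    # Flatten the list of per-page row lists into one list of rows.
    return [row for rows in per_page for row in rows]


def write_rows(fp, header, rows):
    """Write the header and all rows from a single thread."""
    writer = csv.writer(fp)
    writer.writerow(header)
    writer.writerows(rows)


def fake_fetch_page(page):
    # Hypothetical stand-in for a per-page spider; makes no network request.
    return [("{}-{}".format(page, n), "fund {}-{}".format(page, n))
            for n in range(2)]


if __name__ == "__main__":
    rows = crawl_pages(fake_fetch_page, range(4))
    buffer = io.StringIO()
    write_rows(buffer, ("id", "fundName"), rows)
    print(buffer.getvalue())
```

Collecting the rows first and writing them in one place avoids several workers sharing one `csv.writer`, which is the main reason the per-page function returns data rather than writing it.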
