Scraping Weibo Data with Python and Selenium: A Code Example

By 阿新 • Published: 2020-05-23

The goal is to scrape one user's Weibo posts, covering every post across their entire timeline.

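The approach:

Create the driver -> get the page -> find and extract the information -> save to CSV -> go to the next page -> get the page (the loop starts over) -> ... -> stop when there is no "next page" link.

The loop is a plain while True rather than a function that calls itself.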
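The control flow described above can be tried without a browser. The following is a minimal sketch, assuming a hypothetical in-memory table `FAKE_PAGES` in place of real Weibo pages; `fetch_page`, `crawl`, and the page names are made up for illustration and do not appear in the scraper itself.

```python
# Hypothetical stand-in for the site: each "page" holds some rows and
# the URL of the next page (None on the last page).
FAKE_PAGES = {
    'page1': {'rows': ['post A', 'post B'], 'next': 'page2'},
    'page2': {'rows': ['post C'], 'next': 'page3'},
    'page3': {'rows': ['post D'], 'next': None},
}

def fetch_page(url):
    """Pretend to load a page; the real script drives Chrome here."""
    return FAKE_PAGES[url]

def crawl(start_url):
    """Visit pages one after another until no next-page link is left."""
    collected = []
    url = start_url
    while True:
        page = fetch_page(url)          # get the page
        collected.extend(page['rows'])  # extract and store its rows
        next_url = page['next']         # look for the "next page" link
        if next_url is None:            # no link: this was the last page
            break
        url = next_url                  # otherwise loop again on the next page
    return collected

if __name__ == '__main__':
    print(crawl('page1'))
```

A loop keeps the call stack flat no matter how many pages there are, whereas a function that calls itself once per page adds a stack frame for each page and could hit Python's recursion limit on a long timeline.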

嘟大海's Weibo: https://weibo.com/u/1623915527

辦公室小野's Weibo: https://weibo.com/bgsxy

The code is as follows:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import csv
import os
import time
 
# These are the only two settings: to scrape someone else's Weibo, change the URL and the target CSV name here
weibo_url = 'https://weibo.com/bgsxy?profile_ftype=1&is_all=1#_0'
csv_name = 'bgsxy_allweibo.csv'
 
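def start_chrome():
  print('Starting the browser')
  # Path to the local chromedriver binary; adjust it for your machine.
  # executable_path is the Selenium 3 style; Selenium 4 passes the path through a Service object instead.
  driver = webdriver.Chrome(executable_path='C:/Users/lori/Desktop/python52project/chromedriver_win32/chromedriver.exe')
  return driver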
 
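def get_web(url):  # Load the page, then scroll to the very bottom
  print('Opening the target page')
  driver.get(url)
  time.sleep(7)  # wait for the initial page load
  scroll_down()
  time.sleep(5)  # give content loaded by scrolling time to render

def scroll_down():  # Press END repeatedly to reach the bottom of the page
  html_page = driver.find_element_by_tag_name('html')
  for i in range(7):
    print(i)
    html_page.send_keys(Keys.END)
    time.sleep(1)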
 
def get_data():
  print('Finding and extracting data')
  card_sel = 'div.WB_cardwrap.WB_feed_type'
  time_sel = 'a.S_txt2[node-type="feed_list_item_date"]'
  source_sel = 'a.S_txt2[suda-uatrack="key=profile_feed&value=pubfrom_guest"]'
  content_sel = 'div.WB_text.W_f14'
  interact_sel = 'span.line.S_line1>span>em:nth-child(2)'
 
  cards = driver.find_elements_by_css_selector(card_sel)
  info_list = []
 
  for card in cards:
    # A card may contain two time elements; the first one is the one we want.
    time_eles = card.find_elements_by_css_selector(time_sel)
    post_time = time_eles[0].text  # not named "time", so the time module is not shadowed
    source_eles = card.find_elements_by_css_selector(source_sel)
    source = source_eles[0].text if source_eles else ''
    content = card.find_elements_by_css_selector(content_sel)[0].text
    link = time_eles[0].get_attribute('href')
    counts = card.find_elements_by_css_selector(interact_sel)
    # Indexes 1, 2 and 3 hold the repost, comment and like counts.
    trans = counts[1].text
    comment = counts[2].text
    like = counts[3].text
    info_list.append([post_time,source,content,link,trans,comment,like])
 
  return info_list
 
def save_csv(info_list,csv_name):
  csv_path = './' + csv_name
  print('Writing to the CSV file')
  if os.path.exists(csv_path):
    # newline='' avoids blank rows; utf-8-sig writes a BOM so that Excel reads the Chinese text correctly
    with open(csv_path,'a',newline='',encoding='utf-8-sig') as f:
      writer = csv.writer(f)
      writer.writerows(info_list)
  else:
    # New file: write the header row first, with the same newline='' setting
    with open(csv_path,'w',newline='',encoding='utf-8-sig') as f:
      writer = csv.writer(f)
      writer.writerow(['Post time','Source','Content','Link','Reposts','Comments','Likes'])
      writer.writerows(info_list)
  time.sleep(5)
 
def next_page_url():
  next_page_sel = 'a.page.next'
  next_page_ele = driver.find_elements_by_css_selector(next_page_sel)
  if next_page_ele:
    return next_page_ele[0].get_attribute('href')
  else:
    return None
 
 
driver = start_chrome()
input('Log in to weibo.com in Chrome, then press Enter')   # Pause the script so you can log in to weibo.com manually
 
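while True:
  get_web(weibo_url)
  info_list = get_data()
  save_csv(info_list,csv_name)
  next_url = next_page_url()  # look up the "next page" link once per page
  if next_url:
    weibo_url = next_url
  else:
    print('Scraping finished')
    break

driver.quit()  # close the browser once the last page is done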
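The append-or-create logic in save_csv can also be tried on its own. This is a minimal sketch, assuming a hypothetical helper `append_rows`, a made-up two-column header, and sample rows invented for the demo; it writes the header only when the file does not exist yet and uses the same `newline=''` and `utf-8-sig` settings.

```python
import csv
import os
import tempfile

HEADER = ['time', 'content']  # hypothetical two-column header for the demo

def append_rows(csv_path, rows):
    """Append rows to csv_path, writing the header first if the file is new."""
    is_new = not os.path.exists(csv_path)
    # newline='' stops the csv module from producing blank rows on Windows;
    # utf-8-sig puts a BOM at the start of a new file.
    with open(csv_path, 'a', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerows(rows)

def read_rows(csv_path):
    """Read the file back; utf-8-sig strips the BOM on reading."""
    with open(csv_path, newline='', encoding='utf-8-sig') as f:
        return list(csv.reader(f))

if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
    append_rows(path, [['2020-05-01', '第一條']])
    append_rows(path, [['2020-05-02', '第二條']])
    print(read_rows(path))
```

Opening in append mode for both cases keeps a single code path, and Python writes the BOM only when the file is empty at open time, so repeated appends do not scatter BOM bytes through the file.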

That is all for this article. I hope it helps with your studies.
