Scrapy (2): Extracting Data from Inside a script Tag

阿新 • Published: 2019-02-19

1. Example data

1.1 Content to extract

(Figure: the main content to extract)

2. Writing the code (Python 3.6)

Only the key parts of the code are shown here.

import scrapy
import js2xml
from bs4 import BeautifulSoup
from lxml import etree


class HdbSpider(scrapy.Spider):
    name = 'hdb'
    # allowed_domains expects bare domain names, not full URLs
    allowed_domains = ['www.hdb.com']
    start_urls = ['http://www.hdb.com/']
    # Nationwide listing
    globalUrl = ['http://www.hdb.com/quanguo/']

    # Named start_requests so that Scrapy calls it when the spider starts
    def start_requests(self):
        url = 'http://www.hdb.com/party/a0lz2.html'
        yield scrapy.Request(url, self.parse, dont_filter=True)

    def parse(self, response):
        # Main content
        resp = response.text
        soup = BeautifulSoup(resp, 'lxml')
        # On this page the data sits in the seventh <script> inside <head>
        src = soup.select('head script')[6].string
        # Parse the JavaScript source into an XML tree
        src_text = js2xml.parse(src, debug=False)
        src_tree = js2xml.pretty_print(src_text)
        print('treeeeeeeeeeeeeeeeeeeeeeeeeeeee')
        print(src_tree)
        # The printed result is shown in Figure 1
        selector = etree.HTML(src_tree)
        # print(selector)
        # Write your own XPath to match the data you need
        content = selector.xpath("//property[@name = '_id']/string/text()")[0]
        print(content)
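Picking the script by a fixed position such as [6] breaks as soon as the page adds or removes a script tag. One possible alternative is to pick the script by what it contains. The sketch below is not the author's method: it uses only the standard library's html.parser instead of BeautifulSoup, and the names ScriptCollector, find_script and SAMPLE_HTML, as well as the sample page itself, are made up for illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text of every <script> element in document order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self._in_script = True
            self.scripts.append('')

    def handle_endtag(self, tag):
        if tag == 'script':
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry
        if self._in_script:
            self.scripts[-1] += data


def find_script(html, marker):
    """Return the first script body that contains marker, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for body in collector.scripts:
        if marker in body:
            return body
    return None


# Hypothetical page, made up for illustration only
SAMPLE_HTML = """
<html><head>
<script>var tracker = 1;</script>
<script>var info = {_id: 'abc123', title: 'demo'};</script>
</head><body></body></html>
"""

script_body = find_script(SAMPLE_HTML, '_id')
print(script_body)
```

With BeautifulSoup the same idea amounts to looping over soup.find_all('script') and keeping the first tag whose .string contains the marker; the returned text can then be passed to js2xml.parse as in the spider.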

Figure 1

The generated result

Full source code

[email protected]:yzw1/python-Reptilian-content.git

References

1. https://blog.csdn.net/fan3652/article/details/72780301 (removing the content inside)
2. https://blog.csdn.net/qq_34246164/article/details/80700399
3. https://blog.csdn.net/freeking101/article/details/64461574
