Fetching novel content with Python

阿新 • Published: 2018-12-17

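Before running the script, install the third-party Python libraries BeautifulSoup and pymysql. The code also asks BeautifulSoup for the 'lxml' parser, so lxml must be installed as well.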
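
As a quick pre-flight check, the sketch below reports which of these packages cannot be imported yet. The helper name `missing_packages` is made up for this illustration; the pip package names (`beautifulsoup4`, `pymysql`, `lxml`) are the usual ones for the modules the listing imports.

```python
import importlib.util

# Import name -> pip package name. Only bs4 differs between the two.
REQUIRED = {"bs4": "beautifulsoup4", "pymysql": "pymysql", "lxml": "lxml"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```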

The code stores its results in a MySQL database.
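
The post does not show the table definitions. The sketch below is one possible schema, reconstructed from the column names that appear in the SQL statements of the listing (`story` and `chapter`). The column types, lengths, and the `create_tables` helper are assumptions for illustration, not the author's actual schema.

```python
# Column names come from the SQL in the listing; types and lengths are guesses.
STORY_DDL = """
CREATE TABLE IF NOT EXISTS story (
    id        VARCHAR(32)  PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    start     VARCHAR(16),
    end_start VARCHAR(16),
    author    VARCHAR(64)
) DEFAULT CHARSET = utf8mb4
"""

CHAPTER_DDL = """
CREATE TABLE IF NOT EXISTS chapter (
    chapter_id      VARCHAR(32)  PRIMARY KEY,
    story_id        VARCHAR(32)  NOT NULL,
    chapter_name    VARCHAR(255),
    chapter_content LONGTEXT,
    chapter_href    VARCHAR(512)
) DEFAULT CHARSET = utf8mb4
"""


def create_tables(connection):
    """Create both tables on an open DB-API connection, e.g. one from pymysql.connect()."""
    with connection.cursor() as cursor:
        cursor.execute(STORY_DDL)
        cursor.execute(CHAPTER_DDL)
    connection.commit()
```

`LONGTEXT` is chosen here because a chapter body can be long; a smaller text type may be enough for your data.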

The novel site that the code scrapes is: http://www.kbiquge.com
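
The script builds page addresses by concatenating the site root with a path. That only works when every link is a root-relative path. A slightly more forgiving sketch uses `urllib.parse.urljoin`, which also handles relative and absolute links. The `/86_86683/` path is the one used in the listing; the chapter file name `1.html` and the helper `chapter_url` are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.kbiquge.com"


def chapter_url(index_url, href):
    """Resolve a chapter link found on the table-of-contents page."""
    return urljoin(index_url, href)


# Table-of-contents page of one novel (path taken from the listing).
index_url = urljoin(BASE_URL, "/86_86683/")

# Hypothetical chapter links in the two common forms.
root_relative = chapter_url(index_url, "/86_86683/1.html")
page_relative = chapter_url(index_url, "1.html")

print(index_url)
print(root_relative)
print(page_relative)
```

Both hypothetical links resolve to the same address, so the scraper does not need to know which form the site uses.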

The source code follows:

# coding=utf-8
import time
import uuid

import pymysql
from urllib import request
from bs4 import BeautifulSoup


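# Batch-insert rows into the chapter table. Each item of usersvalues is a tuple
# (chapter_id, story_id, chapter_name, chapter_content, chapter_href).
def Write_info(usersvalues):
    # Keyword arguments: recent PyMySQL releases reject positional parameters.
    # charset is an addition so that Chinese text is stored intact; adjust as needed.
    db = pymysql.connect(host="localhost", user="root", password="123456",
                         database="python", charset="utf8mb4")
    try:
        with db.cursor() as cursor:
            sql = ("INSERT INTO chapter(chapter_id, story_id, chapter_name, "
                   "chapter_content, chapter_href) VALUES (%s, %s, %s, %s, %s)")
            # Execute the statement for every row as one batch insert
            cursor.executemany(sql, usersvalues)
        db.commit()
    except pymysql.MySQLError as err:
        print("Error: unable to insert data:", err)
        db.rollback()
    finally:
        db.close()
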
# Look up a novel by name in the story table and return its id,
# inserting a new row first if the novel is not stored yet.
def Story_name(story_name):
    db = pymysql.connect(host="localhost", user="root", password="123456",
                         database="python", charset="utf8mb4")
    uuids = uuid.uuid1().hex
    try:
        with db.cursor() as cursor:
            # Parameterized queries avoid SQL injection and quoting bugs in titles
            cursor.execute("SELECT id FROM story WHERE name = %s", (story_name,))
            row = cursor.fetchone()
            if row is not None:
                return row[0]
            cursor.execute(
                "INSERT INTO story(id, name, start, end_start, author) "
                "VALUES (%s, %s, '1', '1', 'wangyh')",
                (uuids, story_name))
        db.commit()
        return uuids
    except pymysql.MySQLError as err:
        print("Error: unable to fetch data:", err)
        db.rollback()
        # Re-raise so the caller does not continue without a story id
        raise
    finally:
        db.close()


if __name__ == '__main__':
    # Table-of-contents page
    url_xs = 'http://www.kbiquge.com'
    url = url_xs + '/86_86683/'
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166  Safari/535.19'
    req = request.Request(url, headers=head)
    response = request.urlopen(req)
    html = response.read()
    # Parse the table-of-contents page
    soup = BeautifulSoup(html, 'lxml')
    # Novel title: the <h1> inside <div id="info">
    story_name = soup.find('div', id='info').find('h1').text
    # Look up (or create) the novel in the story table; story_id is its ID
    story_id = Story_name(story_name)
    print("story_id:" + story_id)
    # Chapter list: <div id="list">
    soup_texts = soup.find('div', id='list')
    usersvalues = []
    # Visit every chapter link in the list and download the chapter text
    for a in soup_texts.find_all('a'):
        # Pause 0.5 seconds between requests
        time.sleep(0.5)
        download_url = url_xs + a.get('href')
        download_req = request.Request(download_url, headers=head)
        download_response = request.urlopen(download_req)
        download_html = download_response.read()
        download_soup = BeautifulSoup(download_html, 'lxml')
        # Chapter body: <div id="content">, with non-breaking spaces normalized
        download_soup_texts = download_soup.find('div', id='content').text
        download_soup_texts = download_soup_texts.replace('\xa0', ' ')
        # Millisecond timestamp, prefixed with "w", used as the chapter ID
        uuids = "w" + str(int(round(time.time() * 1000)))
        data = (uuids, story_id, a.text, download_soup_texts, download_url)
        usersvalues.append(data)
    # Write all collected chapters in one batch
    Write_info(usersvalues)
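
The script collects every chapter in memory and writes them in a single call at the very end, so a failed request halfway through loses everything downloaded so far. One possible variation, not part of the original code, is to hand rows to the writer in smaller batches. The helpers `chunked` and `write_in_batches` below are hypothetical names for this sketch; `write` stands for any function that accepts a list of row tuples, such as `Write_info`.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_in_batches(rows, write, size=50):
    """Call write(batch) for each batch and return the number of rows passed on."""
    total = 0
    for batch in chunked(rows, size):
        write(batch)
        total += len(batch)
    return total
```

To actually save progress as the crawl runs, `rows` would need to be a generator that yields each chapter tuple as soon as it is downloaded, for example `write_in_batches(chapter_rows(), Write_info, size=50)` with a hypothetical `chapter_rows` generator wrapping the download loop.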
