Automating Batch Downloads of Online Papers with Python

阿新 • Published: 2021-01-19

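When doing research, we inevitably need to look up the relevant literature. Many of you probably already know SCI-HUB, a tool that searches for a paper and downloads its full text. It has helped a great many researchers, and using it feels great.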

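I. Code Analysis

The analysis follows the same routine as always: capture and analyze the traffic -> simulate the request -> integrate the code.

1. Searching for a Paper

Search for the paper by its URL, PMID, DOI, or title, then use the bs4 library to extract the link to the PDF from the result page. The code is as follows: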

import requests
from bs4 import BeautifulSoup

def search_article(artName):
    '''
    Search for a paper
    ---------------
    Input: paper name (URL, PMID, DOI, or title)
    ---------------
    Output: the PDF link, or "" if nothing was found
    '''
    url = 'https://www.sci-hub.ren/'
    # Content-Length is left out: requests computes it from the form body
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding':'gzip, deflate', # no 'br': decoding it needs the optional brotli package
               'Content-Type':'application/x-www-form-urlencoded',
               'Origin':'https://www.sci-hub.ren',
               'Connection':'keep-alive',
               'Upgrade-Insecure-Requests':'1'}
    data = {'sci-hub-plugin-check':'',
            'request':artName}
    res = requests.post(url, headers=headers, data=data)
    html = res.text
    soup = BeautifulSoup(html, 'html.parser')
    iframe = soup.find(id='pdf') # the PDF viewer is embedded in the element with id="pdf"
    if iframe is None: # no matching article found
        return ''
    else:
        downUrl = iframe['src']
        if not downUrl.startswith('http'): # protocol-relative link such as //host/path
            downUrl = 'https:'+downUrl
        return downUrl
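
The prefix check above only handles protocol-relative links. As a minimal sketch, assuming the iframe's src could also be a site-relative path or carry a trailing fragment, the standard library's urllib.parse can resolve it against the page URL. The helper name normalize_pdf_url and the example.org addresses are hypothetical and used for illustration only.

```python
from urllib.parse import urldefrag, urljoin

def normalize_pdf_url(page_url, src):
    """Resolve an iframe src against the page URL and drop any #fragment."""
    # urljoin handles absolute, protocol-relative (//host/..) and site-relative (/path) links
    absolute = urljoin(page_url, src)
    # urldefrag splits off the part after '#', which is not needed for downloading
    return urldefrag(absolute).url

if __name__ == '__main__':
    base = 'https://example.org/'
    for src in ('//cdn.example.org/a.pdf#view=FitH',
                '/files/a.pdf',
                'http://other.example.org/a.pdf'):
        print(normalize_pdf_url(base, src))
```

With a helper like this, the return value of search_article could be passed through normalize_pdf_url(url, iframe['src']) instead of prepending 'https:' by hand.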

2. Downloading the Paper

Once we have the paper's link, a single request sent with requests is enough to download it:

import requests

def download_article(downUrl):
    '''
    Download the article from its link
    ----------------------
    Input: paper link
    ----------------------
    Output: the PDF file as bytes
    '''
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding':'gzip, deflate', # no 'br': decoding it needs the optional brotli package
               'Connection':'keep-alive',
               'Upgrade-Insecure-Requests':'1'}
    res = requests.get(downUrl, headers=headers)
    return res.content # raw response body as bytes
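
download_article returns whatever bytes the server sent, so an HTML error page would be saved with a .pdf extension just the same. As a small sketch, the bytes can be checked against the marker that PDF files start with (%PDF-) before writing them to disk. The function name looks_like_pdf is hypothetical, not part of the original script.

```python
def looks_like_pdf(content):
    """Return True if the bytes start with the PDF header marker '%PDF-'."""
    return content.startswith(b'%PDF-')

if __name__ == '__main__':
    print(looks_like_pdf(b'%PDF-1.7\n...'))            # a PDF-style header
    print(looks_like_pdf(b'<html><body>Error</body>'))  # an HTML page
```

A caller could skip saving, or print a warning, whenever looks_like_pdf(pdf) is False.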

II. Complete Code

Putting the two functions together, my complete code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Tue Jan  5 16:32:22 2021

@author: kimol_love
"""
import os
import time
import requests
from bs4 import BeautifulSoup

def search_article(artName):
    '''
    Search for a paper
    ---------------
    Input: paper name (URL, PMID, DOI, or title)
    ---------------
    Output: the PDF link, or "" if nothing was found
    '''
    url = 'https://www.sci-hub.ren/'
    # Content-Length is left out: requests computes it from the form body
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding':'gzip, deflate', # no 'br': decoding it needs the optional brotli package
               'Content-Type':'application/x-www-form-urlencoded',
               'Origin':'https://www.sci-hub.ren',
               'Connection':'keep-alive',
               'Upgrade-Insecure-Requests':'1'}
    data = {'sci-hub-plugin-check':'',
            'request':artName}
    res = requests.post(url, headers=headers, data=data)
    html = res.text
    soup = BeautifulSoup(html, 'html.parser')
    iframe = soup.find(id='pdf') # the PDF viewer is embedded in the element with id="pdf"
    if iframe is None: # no matching article found
        return ''
    else:
        downUrl = iframe['src']
        if not downUrl.startswith('http'): # protocol-relative link such as //host/path
            downUrl = 'https:'+downUrl
        return downUrl
        
def download_article(downUrl):
    '''
    Download the article from its link
    ----------------------
    Input: paper link
    ----------------------
    Output: the PDF file as bytes
    '''
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0',
               'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
               'Accept-Encoding':'gzip, deflate', # no 'br': decoding it needs the optional brotli package
               'Connection':'keep-alive',
               'Upgrade-Insecure-Requests':'1'}
    res = requests.get(downUrl, headers=headers)
    return res.content # raw response body as bytes

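def welcome():
    '''
    Welcome screen
    '''
    # 'cls' exists only on Windows; other systems use 'clear'
    os.system('cls' if os.name == 'nt' else 'clear')
    # Raw string, so the backslashes in the banner are not treated as escapes
    title = r'''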
               _____  _____ _____      _    _ _    _ ____  
              / ____|/ ____|_   _|    | |  | | |  | |  _ \ 
             | (___ | |      | |______| |__| | |  | | |_) |
              \___ \| |      | |______|  __  | |  | |  _ < 
              ____) | |____ _| |_     | |  | | |__| | |_) |
             |_____/ \_____|_____|    |_|  |_|\____/|____/
                

            '''
    print(title)
    
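if __name__ == '__main__':
    while True:
        welcome()
        request = input('Enter a URL, PMID, DOI, or paper title: ')
        print('Searching...')
        downUrl = search_article(request)
        if downUrl == '':
            print('No matching paper found. Please try another search.')
        else:
            print('Paper link: %s'%downUrl)
            print('Downloading...')
            pdf = download_article(downUrl)
            # A DOI or URL contains characters such as "/" that are invalid in file names
            fileName = ''.join(c if c.isalnum() or c in ' ._-' else '_' for c in request)
            with open('%s.pdf'%fileName, 'wb') as f:
                f.write(pdf)
            print('---Download complete---')
        time.sleep(0.8)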
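
Because the output file is named after whatever was typed at the prompt, a reusable helper for building that name may be handy. The following is a minimal sketch, assuming that collapsing path separators, whitespace, and other reserved characters into underscores is acceptable. The name safe_filename, the 100-character limit, and the 'paper' fallback are hypothetical choices, not part of the original script.

```python
import re

def safe_filename(name, max_length=100):
    """Turn a free-form query (title, DOI, URL) into a usable .pdf file name."""
    # Replace runs of reserved characters and whitespace with a single underscore
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('._')
    # Fall back to a fixed name if nothing usable is left
    return (cleaned[:max_length] or 'paper') + '.pdf'

if __name__ == '__main__':
    for query in ('10.1000/xyz123', 'https://doi.org/10.1000/x', 'A study: part 1'):
        print(safe_filename(query))
```

In the main loop, open(safe_filename(request), 'wb') would then take the place of building the file name by hand.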

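Final Words

Of course, my code is only a reference. Feel free to adjust and modify it to suit your own needs, which is how you will get the most value out of it.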
