Scraping the Annual Reports of All A-Share Listed Companies from NetEase Finance (網易財經)
阿新 • Published: 2019-02-13
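First, we need the stock codes of all A-share listed companies. The code (a 6-digit number) of every stock is taken from the stock list on Eastmoney (東方財富網), where each entry is a link like this: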
<a target="_blank" href="http://quote.eastmoney.com/sh500001.html">基金金泰(500001)</a>
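Extract the information we need from the page, store it in a dictionary, and write it to the file "stock_name.txt" as JSON, which is the format the download script reads back with json.loads.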
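The extraction step can be sketched roughly as follows. This is a minimal sketch, not the author's code: the function names `parse_stock_links`, `save_stock_names` and `build_stock_file` are hypothetical, the regular expression is modelled only on the sample link shown above, and the page encoding (`gbk`) is an assumption. The source does not give the list page's address, so it is left as a parameter.

```python
import json
import re
import urllib.request

# Modelled on the sample entry from the list page:
# <a target="_blank" href="http://quote.eastmoney.com/sh500001.html">基金金泰(500001)</a>
LINK_PATTERN = re.compile(
    r'<a[^>]*href="http://quote\.eastmoney\.com/s[hz]\d{6}\.html"[^>]*>'
    r'([^<(]+)\((\d{6})\)</a>'
)


def parse_stock_links(html):
    """Return a {code: name} dict for every stock link found in html."""
    return {code: name.strip() for name, code in LINK_PATTERN.findall(html)}


def save_stock_names(stocks, path='stock_name.txt'):
    """Write the dict as JSON so it can be read back with json.loads."""
    # json.dump escapes non-ASCII characters by default, so the file is
    # plain ASCII and readable regardless of the platform's default encoding.
    with open(path, 'w') as f:
        json.dump(stocks, f)


def build_stock_file(list_url, path='stock_name.txt', encoding='gbk'):
    """Fetch the list page, parse it, and save the result.

    list_url is the address of the stock list page (not given in the
    source); encoding='gbk' is an assumption about that page.
    """
    with urllib.request.urlopen(list_url) as response:
        html = response.read().decode(encoding, errors='replace')
    stocks = parse_stock_links(html)
    save_stock_names(stocks, path)
    return stocks
```

Keying the dictionary by code matches how the download script iterates over `stockdict.items()`. Note that the list may also contain non-A-share entries (the sample link above is a fund), so filtering by code prefix may be needed; which prefixes to keep is not stated in the source.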
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 9 00:03:46 2018

@author: South
"""
import json
import os
import time

import requests

PAUSE_SECONDS = 10  # how long to rest when a download looks blocked


def get_file(url, filename):
    """Download url and save the raw response body to filename."""
    r = requests.get(url)
    try:
        with open(filename, 'wb') as file:
            file.write(r.content)
    except OSError:
        # Report the file that could not be written and carry on.
        print(filename)


def check_file(filename):
    """Check for anti-scraping: False if the file is missing or looks blocked."""
    if not os.path.exists(filename):
        return False
    # Read bytes so the check does not depend on the CSV's text encoding.
    with open(filename, 'rb') as f:
        line = f.readline()
    # Heuristic from the original script: a blocked response has 'Doc'
    # in its first line instead of CSV data.
    return b'Doc' not in line


def check_item(num):
    """Check that all three statements for stock code num downloaded intact."""
    zcfzb = './data/zcfzb/' + num + '.csv'
    lrb = './data/lrb/' + num + '.csv'
    xjllb = './data/xjllb/' + num + '.csv'
    return check_file(zcfzb) and check_file(lrb) and check_file(xjllb)


with open('stock_name.txt', 'r') as f:
    stockdict = json.loads(f.read())

# Make sure the output directories exist before writing into them.
for folder in ('zcfzb', 'lrb', 'xjllb'):
    os.makedirs('./data/' + folder, exist_ok=True)

count = 0
for num, v in stockdict.items():
    count = count + 1
    if count % 100 == 0:
        print(int(count * 100 / len(stockdict)), '% completed downloading')

    # Paths where the files are stored
    zcfzb = './data/zcfzb/' + num + '.csv'
    lrb = './data/lrb/' + num + '.csv'
    xjllb = './data/xjllb/' + num + '.csv'

    # Download URLs
    zcfzb_url = "http://quotes.money.163.com/service/zcfzb_" + num + ".html?type=year"
    lrb_url = "http://quotes.money.163.com/service/lrb_" + num + ".html?type=year"
    xjllb_url = "http://quotes.money.163.com/service/xjllb_" + num + ".html?type=year"

    get_file(zcfzb_url, zcfzb)
    get_file(lrb_url, lrb)
    get_file(xjllb_url, xjllb)
    # time.sleep(1)

    if not check_item(num):
        print("Blocked by anti-scraping, resting for %d s" % PAUSE_SECONDS)
        time.sleep(PAUSE_SECONDS)
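The loop above only rests when a download looks blocked; it does not fetch the file again, so that stock's files stay incomplete. One possible way to handle this is a small retry wrapper. The sketch below is an assumption about how this could be done, not part of the original script; `fetch_until_valid` and its parameters are hypothetical names.

```python
import time


def fetch_until_valid(fetch, is_valid, attempts=3, pause=10.0):
    """Call fetch() until is_valid() returns True.

    fetch and is_valid are zero-argument callables. Returns True as soon
    as a fetch passes validation, or False after `attempts` failed tries.
    The function sleeps `pause` seconds between tries.
    """
    for attempt in range(attempts):
        fetch()
        if is_valid():
            return True
        # Rest before the next try, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(pause)
    return False
```

With the script's own helpers, a call might look like `fetch_until_valid(lambda: get_file(lrb_url, lrb), lambda: check_file(lrb))`. Passing callables keeps the wrapper independent of how the download and the check are implemented.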
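Once we have the stock codes, we can download the statements from NetEase Finance. Take Kweichow Moutai (貴州茅臺) as an example; its stock code is 600519.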
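For this example, the three download addresses follow the URL pattern used in the script: the statement's short name and the stock code are substituted into the same template. A minimal sketch, where `statement_urls` is a hypothetical helper and the English labels are my reading of the abbreviations (zcfzb for 資產負債表, lrb for 利潤表, xjllb for 現金流量表):

```python
# Short names as they appear in the download URLs, with assumed meanings.
STATEMENTS = {
    'zcfzb': 'balance sheet',
    'lrb': 'income statement',
    'xjllb': 'cash flow statement',
}

URL_TEMPLATE = 'http://quotes.money.163.com/service/%s_%s.html?type=year'


def statement_urls(code):
    """Return {short_name: url} for the three annual statements of a stock."""
    return {name: URL_TEMPLATE % (name, code) for name in STATEMENTS}


# Print the three addresses for Kweichow Moutai (600519).
for name, url in statement_urls('600519').items():
    print(STATEMENTS[name], url)
```

The `type=year` query parameter is taken from the script as-is; it appears to select annual rather than interim figures.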
In the end, this yields the three statements for 3,654 A-share listed companies.