Crawling Codeforces contest problems with Python

阿新 • Published: 2018-12-24

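Codeforces problems contain a lot of LaTeX formulas, and those formulas are delimited with three dollar signs ('$$$'). That makes copying a problem into a blog post inconvenient, so I wrote a crawler that saves the information for every problem in a contest.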
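The delimiter conversion itself is just a string substitution. Here is a minimal sketch, assuming only inline '$$$...$$$' formulas need handling. The function name convert_delimiters is hypothetical. This is a simplified variant of the script's Clear function: it uses str.replace instead of deleting characters in a loop, so the two may give different results on runs of more than three dollar signs.

```python
def convert_delimiters(text, target='$'):
    """Replace Codeforces' '$$$' math delimiters with target ('$$$', '$$' or '$')."""
    if target not in ('$$$', '$$', '$'):
        raise ValueError("target must be '$$$', '$$' or '$'")
    # str.replace rewrites every non-overlapping '$$$', scanning left to right
    return text.replace('$$$', target)


print(convert_delimiters('Given $$$n$$$ and $$$m$$$.'))  # Given $n$ and $m$.
```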

```python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

# LaTeX delimiter chosen by the user: 0 keeps '$$$', 1 converts to '$$', 2 converts to '$'
Latextag = 0

def GetHtmlText(url):
    """Download a page and return its HTML, or an empty string if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        return ""

def Clear(text):
    """Shorten every '$$$' delimiter according to Latextag."""
    if Latextag == 0:
        return text  # tag 0: keep '$$$' unchanged
    index = text.find('$$$')
    while index != -1:
        # Drop Latextag characters: '$$$' -> '$$' (tag 1) or '$' (tag 2)
        text = text[:index] + text[index + Latextag:]
        index = text.find('$$$')
    return text

def FindInfo(soup, url, f):
    """Write one problem's statement, samples and link to f as Markdown."""
    AllInfo = soup.find('div', class_='problemindexholder')
    if AllInfo is None:  # download failed or the page layout is different
        print('Could not parse %s' % url)
        return
    divs = AllInfo.find_all('div')
    # These indices follow the nested-div order of the Codeforces problem page at the time of writing
    title = '# ' + divs[3].get_text()
    f.write('%s\n' % title)
    problem = '## Description:\n' + divs[12].get_text()
    problem = Clear(problem)
    f.write('%s\n' % problem)
    Input = '## Input:\n' + divs[13].get_text()[5:]  # [5:] strips the "Input" section title
    Input = Clear(Input)
    f.write('%s\n' % Input)
    Output = '## Output:\n' + divs[15].get_text()[6:]  # [6:] strips the "Output" section title
    Output = Clear(Output)
    f.write('%s\n' % Output)
    Sample = soup.find('div', class_='sample-test')
    SampleInputs = Sample.find_all('div', class_='input')
    SampleOutputs = Sample.find_all('div', class_='output')
    # Pair each sample input with its output
    for SampleInput, SampleOutput in zip(SampleInputs, SampleOutputs):
        f.write('## Sample Input:\n%s\n' % SampleInput.get_text()[5:])
        f.write('## Sample Output:\n%s\n' % SampleOutput.get_text()[6:])
    f.write('### [Problem link](%s)\n\n' % url)
    f.write('## AC code:\n```\n```\n')

def main():
    global Latextag
    print('Welcome to the Codeforces contest crawler\n')
    Latextag = int(input("Choose the LaTeX delimiter (0: '$$$', 1: '$$', 2: '$'):\n"))
    Url = input("Enter the contest URL (e.g. 'http://codeforces.com/contest/1003'):\n")
    Problem = input('Enter the problem indices (e.g. A B C D E F):\n').split()
    Url = Url.rstrip('/') + '/problem/'
    # Write UTF-8 so the file comes out the same on every OS
    with open('blog.md', 'w', encoding='utf-8') as f:
        for i in Problem:
            url = Url + i
            print(url)
            # Turn line breaks and paragraph ends into newlines before parsing
            html = GetHtmlText(url).replace('<br />', '\n').replace('</p>', '\n')
            soup = BeautifulSoup(html, "html.parser")
            FindInfo(soup, url, f)

if __name__ == '__main__':
    main()
```

Execution result: [screenshot]
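FindInfo locates each section by its fixed position in the list of nested div elements (divs[3], divs[12], divs[13], divs[15]). Those positions break whenever Codeforces changes its markup. Selecting sections by class name is more robust. This is a minimal sketch under stated assumptions: the class names problem-statement, header, title, input-specification, output-specification and section-title are assumptions about Codeforces' markup and should be checked against the live page. The function name extract_sections is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_sections(html):
    """Return title, input spec and output spec found by class name (assumed layout)."""
    soup = BeautifulSoup(html, 'html.parser')
    statement = soup.find('div', class_='problem-statement')
    if statement is None:
        return None

    def section_text(cls):
        div = statement.find('div', class_=cls)
        if div is None:
            return ''
        heading = div.find('div', class_='section-title')
        if heading is not None:
            heading.extract()  # remove the "Input"/"Output" heading instead of slicing [5:]/[6:]
        return div.get_text().strip()

    header = statement.find('div', class_='header')
    title_div = header.find('div', class_='title') if header is not None else None
    return {
        'title': title_div.get_text() if title_div is not None else '',
        'input': section_text('input-specification'),
        'output': section_text('output-specification'),
    }
```

Because the lookup is by class instead of position, an extra div added to the header (for example a new limit field) does not shift the sections.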

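The generated blog.md follows a fixed Markdown template for each problem: title, description, input, output, each sample pair, the problem link, and an empty code block for the AC solution. Separating that template into a pure function makes it testable without network access. The following is a sketch: the name render_problem and its parameters are hypothetical, and it follows the same section order as the f.write calls in FindInfo.

```python
def render_problem(title, description, input_spec, output_spec, samples, url):
    """Return one problem as Markdown; samples is a list of (input, output) pairs."""
    parts = [
        '# %s\n' % title,
        '## Description:\n%s\n' % description,
        '## Input:\n%s\n' % input_spec,
        '## Output:\n%s\n' % output_spec,
    ]
    for sample_in, sample_out in samples:
        parts.append('## Sample Input:\n%s\n' % sample_in)
        parts.append('## Sample Output:\n%s\n' % sample_out)
    parts.append('### [Problem link](%s)\n\n' % url)
    # Empty fenced block where the accepted solution is pasted later
    parts.append('## AC code:\n```\n```\n')
    return ''.join(parts)
```

The crawler could then call f.write(render_problem(...)) once per problem instead of interleaving extraction and writing.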