A rookie of python_crawler----2(dict)

阿新 • Published: 2018-12-13

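I used a crawler to build a small dictionary. It translates Chinese to English and English to Chinese, and it also shows example sentences. It's just a simple experiment.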
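The script below picks the lookup direction from the input itself. Input that starts with a Latin letter is looked up as English, and anything else as Chinese. The query is then percent-encoded into a dict.cn URL; the comments in the code give http://dict.cn/%E8%8B%B9%E6%9E%9C for 苹果 ("apple"). Here is a minimal sketch of that routing step. It assumes a "contains a CJK ideograph" check in place of the original first-letter check. `route` and `contains_cjk` are hypothetical helper names and are not part of the original program.

```python
# A minimal sketch, not the original program: choose the lookup direction and build
# the dict.cn URL. Treating any CJK ideograph as "Chinese input" is an assumption
# that replaces the original "starts with a Latin letter" test.
from typing import Tuple
from urllib.parse import quote

BASE_URL = 'http://dict.cn/'  # base URL used by the script


def contains_cjk(text: str) -> bool:
    """Return True if text has a character in the CJK Unified Ideographs block."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)


def route(query: str) -> Tuple[str, str]:
    """Return (direction, url); direction is 'CH_EN' or 'EN_CH'."""
    query = query.strip()
    direction = 'CH_EN' if contains_cjk(query) else 'EN_CH'
    # quote() percent-encodes the UTF-8 bytes, e.g. 苹果 -> %E8%8B%B9%E6%9E%9C
    return direction, BASE_URL + quote(query)


if __name__ == '__main__':
    for word in ('apple', '苹果', '水果'):
        print(word, *route(word))
```

Checking for CJK characters, rather than only the first character, also routes mixed input such as "iPhone 手机" to the Chinese lookup.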

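# A simple dictionary built with a crawler, supporting Chinese->English and English->Chinese
import re
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup

browser = requests.Session()


def build_url():
    a = 0
    mesg = input("Please input: ")
    # Input starting with a Latin letter is looked up as English (EN_CH), anything else as Chinese (CH_EN)
    if re.search('^[A-Za-z].*', mesg):
        a = 1
    # Percent-encode the UTF-8 bytes for the URL, e.g. 苹果 -> %E8%8B%B9%E6%9E%9C
    mesg = quote(mesg)
    return mesg, a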


def CH_EN(addr):
    # browser=requests.Session()
    # http://dict.cn/%E8%8B%B9%E6%9E%9C  apple
    # http://dict.cn/%E6%B0%B4%E6%9E%9C  fruit
    url = 'http://dict.cn/' + addr
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/70.0.3538.77 Safari/537.36'
    }

    response = browser.get(url, headers=headers)

    dic = BeautifulSoup(response.text.replace('<br>', '').replace('<br/>', ''), 'html.parser')

    ch_label = dic.find('h1', class_='keyword')  # Chinese headword
    if ch_label is None:
        print('No entry found')
        return
    print(ch_label.get_text())

    # Basic meanings (基本釋義): English words linked from the entry page
    en_meaning = dic.find_all('a', href=re.compile(r'http://dict\.cn/[a-z]'))
    print('------------')
    print("基本釋義")
    # Skip navigation links: the directory, the other-language dictionaries and the phonetic-symbol list
    skip_paths = ('/dir/', '/jp/', '/de/', '/fr/', '/kr/', '/es/', '/it/', '/ru/', '/list/yinbiao')
    for mean in en_meaning:
        if any(path in mean['href'] for path in skip_paths):
            continue
        print(mean.get_text().strip())

    # Example sentences (例句)
    print('------------')
    print("例句")
    sentence = dic.find_all('ol', slider='2')
    for each in sentence:
        for item in each.find_all('li'):
            # .string is None when <li> has several children, so use get_text()
            text = item.get_text()
            # Keep only the text that sits between two runs of five tabs
            key = re.findall('(?<=\t\t\t\t\t)(.*)(?=\t\t\t\t\t)', text)
            for i in key:
                print(i)

    return


def EN_CH(addr):
    # browser=requests.Session()
    # http://dict.cn/%E8%8B%B9%E6%9E%9C  apple
    # http://dict.cn/%E6%B0%B4%E6%9E%9C  fruit
    url = 'http://dict.cn/' + addr
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/70.0.3538.77 Safari/537.36'
    }

    response = browser.get(url, headers=headers)
    dic = BeautifulSoup(response.text.replace('<br>', '').replace('<br/>', ''), 'html.parser')

    en_label = dic.find('h1', class_='keyword')  # English headword
    if en_label is None:
        print('No entry found')
        return
    print(en_label.get_text())

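    # Meanings (釋義): each <li> is expected to hold a label in <span> and the meaning in <strong>
    print('------------')
    print("釋義")
    mean = dic.find_all('li')
    for each in mean:
        each_2 = each.find('span')
        each_3 = each.find('strong')
        # Stop at the first <li> without a <span>, treated as the end of the meaning list
        if not each_2:
            break
        if each_3:
            print(each_2.get_text() + '\t' + each_3.get_text())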

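    # Example sentences (例句)
    print('------------')
    print("例句")
    mean = dic.find_all('ol', slider='2')
    for each in mean:
        if not each.string:
            for item in each.find_all('li'):
                # Skip items containing <strong>, and items whose .string is None (mixed content)
                if not item.find('strong') and item.string:
                    # Print only the English sentences (lines starting with a Latin letter)
                    if re.search('^[A-Za-z].*', item.string):
                        print(item.string + '\n')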


# main
if __name__ == '__main__':
    url_addr, flag = build_url()

    if flag == 1:
        EN_CH(url_addr)
    else:
        CH_EN(url_addr)
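Both lookup functions pull example sentences out of raw `<li>` text. CH_EN uses a regex anchored on runs of five tabs, and EN_CH keeps strings that start with a Latin letter. Both depend on dict.cn's exact whitespace. A more layout-tolerant alternative is to split the text into lines, strip each one, and keep the English ones. The sample string below is made up for illustration; the real page layout is an assumption, and `english_sentences` is a hypothetical helper, not part of the script above.

```python
import re
from typing import List

# Same test EN_CH uses: a line counts as English if it starts with a Latin letter
ENGLISH_LINE = re.compile(r'^[A-Za-z]')


def english_sentences(raw: str) -> List[str]:
    """Split raw <li> text into lines, strip whitespace, and keep the English ones."""
    lines = (line.strip() for line in raw.splitlines())
    return [line for line in lines if line and ENGLISH_LINE.match(line)]


# Hypothetical <li> text: a Chinese sentence and its translation, padded with tabs
sample = '\n\t\t\t\t\t我喜欢吃苹果。\n\t\t\t\t\tI like eating apples.\n\t\t\t\t\t'
print(english_sentences(sample))  # ['I like eating apples.']
```

Because it never counts tabs, this version keeps working if the page changes its indentation. Like the original check, it drops English sentences that begin with a digit or a quotation mark.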
