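Python 3: fixing the googletrans timeout error and optimizing a translation tool (source code included)
I. Problem:
While writing a script that calls the Google Translate endpoint, I kept getting an error. I was using the translate method of Translator from the googletrans module, and the program failed with a connection timeout when run: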
Traceback (most recent call last):
  File "E:/PycharmProjects/MyProject/Translate/translate_test.py", line 3, in <module>
    result=translator.translate('안녕하세요.')
  File "D:\python3\lib\site-packages\googletrans\client.py", line 182, in translate
    data = self._translate(text,dest,src,kwargs)
  File "D:\python3\lib\site-packages\googletrans\client.py", line 78, in _translate
    token = self.token_acquirer.do(text)
  File "D:\python3\lib\site-packages\googletrans\gtoken.py", line 194, in do
    self._update()
  File "D:\python3\lib\site-packages\googletrans\gtoken.py", line 54, in _update
    r = self.client.get(self.host)
  File "D:\python3\lib\site-packages\httpx\_client.py", line 763, in get
    timeout=timeout,
  File "D:\python3\lib\site-packages\httpx\_client.py", line 601, in request
    request,auth=auth,allow_redirects=allow_redirects,timeout=timeout,
  line 621, in send
    request,
  line 648, in send_handling_redirects
    request,history=history
  File "D:\python3\lib\site-packages\httpx\_client.py", line 684, in send_handling_auth
    response = self.send_single_request(request,timeout)
  File "D:\python3\lib\site-packages\httpx\_client.py", line 719, in send_single_request
    timeout=timeout.as_dict(),
  File "D:\python3\lib\site-packages\httpcore\_sync\connection_pool.py", line 153, in request
    method,url,headers=headers,stream=stream,timeout=timeout
  File "D:\python3\lib\site-packages\httpcore\_sync\connection.py", line 65, in request
    self.socket = self._open_socket(timeout)
  File "D:\python3\lib\site-packages\httpcore\_sync\connection.py", line 86, in _open_socket
    hostname,port,ssl_context,timeout
  File "D:\python3\lib\site-packages\httpcore\_backends\sync.py", line 139, in open_tcp_stream
    return SyncSocketStream(sock=sock)
  File "D:\python3\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type,value,traceback)
  File "D:\python3\lib\site-packages\httpcore\_exceptions.py", line 12, in map_exceptions
    raise to_exc(exc) from None
httpcore._exceptions.ConnectTimeout: timed out
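II. Solution:
1. Finding a solution
After a lot of searching, I finally learned that Google had updated its translation endpoint, so the googletrans module I had been using no longer works. Fortunately, someone has already published a replacement package: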
https://github.com/lushan88a/google_trans_new
Many thanks to the author!
2. Applying the solution
Run the following command in cmd:
pip install google_trans_new
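Once the package is installed, a single call can be sketched as follows. This is a minimal sketch rather than the package's documented usage: it relies only on the two calls that appear in this article, google_translator(timeout=10) and translate(text, target). The names translate_once and EchoTranslator are hypothetical; EchoTranslator is a stand-in so the wrapper can be tried without network access.

```python
def translate_once(text, target="zh-CN", translator=None):
    """Translate one string and return the result, or None if the call fails."""
    if translator is None:
        # Imported lazily so the wrapper can also be run with a stand-in object.
        from google_trans_new import google_translator
        translator = google_translator(timeout=10)
    try:
        # Same positional call as the script in this article: (text, target language)
        return translator.translate(text, target)
    except Exception as exc:
        # Network problems (such as a connect timeout) end up here.
        print("translation failed:", exc)
        return None


class EchoTranslator:
    """Hypothetical stand-in that mimics the translate(text, target) call shape."""

    def translate(self, text, target):
        return "[{}] {}".format(target, text)


if __name__ == "__main__":
    # Uses the stand-in; drop the translator argument to call the real package.
    print(translate_once("안녕하세요.", translator=EchoTranslator()))
```

Returning None on failure keeps a single bad request from stopping a longer batch; the caller decides whether to retry or skip.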
III. Code (optimized)
from google_trans_new import google_translator
from multiprocessing.dummy import Pool as ThreadPool
import time
import re

"""
This version uses google_trans_new,
calls the Google Translate endpoint from multiple threads,
and can translate text with len(text) > 5000.
"""


class Translate(object):
    def __init__(self):
        # Path of the text file to translate, and the target language
        self.txt_file = './test.txt'
        self.aim_language = 'zh-CN'

    # Read the text file to translate
    def read_txt(self):
        with open(self.txt_file, 'r', encoding='utf-8') as f:
            txt = f.readlines()
        return txt

    # Pre-process the text (this is the optimization)
    def cut_text(self, text):
        # The pattern is '.{1,5000}' rather than '.{5000}' so that the final
        # piece, which is usually shorter than 5000 characters, is not dropped.
        if len(text) == 1:
            # A single line: split it only if it is longer than 5000 characters
            str_text = ''.join(text).strip()
            if len(str_text) > 5000:
                return re.findall('.{1,5000}', str_text, re.S)
            # A single line of at most 5000 characters is returned unchanged
            return text
        # More than one line. Two cases:
        # (1) the line has at most 5000 characters
        # (2) the line has more than 5000 characters
        result = []
        for line in text:
            if len(line) <= 5000:
                # Case (1): keep the line as it is
                result.append(line)
            else:
                # Case (2): split the line and append the pieces
                result.extend(re.findall('.{1,5000}', line, re.S))
        return result

    def translate(self, text):
        if text:
            aim_lang = self.aim_language
            try:
                t = google_translator(timeout=10)
                translate_text = t.translate(text, aim_lang)
                print(translate_text)
                return translate_text
            except Exception as e:
                print(e)


def main():
    time1 = time.time()
    # Start eight threads
    pool = ThreadPool(8)
    trans = Translate()
    txt = trans.read_txt()
    texts = trans.cut_text(txt)
    pool.map(trans.translate, texts)
    pool.close()
    pool.join()
    time2 = time.time()
    print("Translated {} sentences in {:.2f} s".format(len(texts), time2 - time1))


if __name__ == "__main__":
    main()
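The two ideas in the script, splitting long text and translating the pieces on a thread pool, can also be tried in isolation. The sketch below is an illustration, not part of the original tool: split_text, translate_in_order and fake_translate are hypothetical names, and fake_translate merely upper-cases its input in place of a real translation call. It uses slicing instead of a regular expression, so the last piece shorter than the limit is kept, and it collects the return value of pool.map, which preserves input order.

```python
from multiprocessing.dummy import Pool as ThreadPool


def split_text(text, limit=5000):
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Slicing keeps the final, shorter piece and treats newlines like any character.
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def translate_in_order(pieces, translate_fn, workers=8):
    """Run translate_fn on every piece concurrently; results keep input order."""
    if not pieces:
        return []
    with ThreadPool(workers) as pool:
        # pool.map blocks until all pieces are done and returns them in order.
        return pool.map(translate_fn, pieces)


def fake_translate(piece):
    # Hypothetical stand-in for a real translation call.
    return piece.upper()


if __name__ == "__main__":
    pieces = split_text("a" * 12000)
    print([len(p) for p in pieces])
    joined = "".join(translate_in_order(pieces, fake_translate))
    print(len(joined))
```

With a 12000-character input the pieces have lengths 5000, 5000 and 2000, so nothing is lost. Because the results come back in input order, they can be joined directly into one translated text instead of relying on the order in which the threads happen to print.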
I have put the test text at: http://xiazai.jb51.net/202012/yuanma/test.rar
Feel free to download it.
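IV. Results
V. Summary
This article first resolved the error raised when calling the googletrans module, then rewrote the script with the new Google translation module, and finally worked around the limit that the text to be translated cannot be longer than 5000 characters.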