Captcha recognition with python + pillow + pytesseract + Tesseract-OCR [repost]

Install pillow and pytesseract. After installing these modules, you also need to install tesseract-ocr.

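(PS: if pip is installed, open cmd in Python's Scripts folder and run pip install pillow to get the latest version of pillow. To install a specific version instead, either pin it with pip install pillow==<version> or download and install it yourself. Other third-party libraries can be installed the same way.)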
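To confirm the install worked before going further, a small check like the one below can help. It is only a sketch: the helper name missing_modules is made up for this example, and it only tests whether the modules can be found for import, not whether tesseract-ocr itself is present.

```python
import importlib.util

# Import names of the two packages this post relies on.
# Pillow is installed as "pillow" but imported as "PIL".
REQUIRED_MODULES = ["PIL", "pytesseract"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("pillow and pytesseract are importable")
```

An empty result means both packages are visible to the interpreter you ran the script with, which matters if several Python versions are installed.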

tesseract-ocr download page: https://digi.bib.uni-mannheim.de/tesseract/

This test used tesseract-ocr-setup-4.00.00dev.exe, and several problems came up along the way:

FileNotFoundError: [WinError 2] The system cannot find the file specified.

pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\eng.traineddata')
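A quick way to see which piece is missing is to check for the executable and the language data directly. The FileNotFoundError generally means the tesseract executable could not be launched, and the "Error opening data file" message means eng.traineddata was not found where Tesseract looked. The sketch below is an assumption-laden helper (diagnose_tesseract is a made-up name); it checks both the TESSDATA_PREFIX folder and a tessdata subfolder, because which one Tesseract expects depends on the version.

```python
import os
import shutil


def diagnose_tesseract(env=None):
    """Report whether the tesseract binary and English data can be located.

    `env` defaults to os.environ; it is a parameter only so the check can be
    run against a fake environment.
    """
    env = os.environ if env is None else env
    # Look for the executable on the PATH taken from `env`.
    exe = shutil.which("tesseract", path=env.get("PATH", ""))
    prefix = env.get("TESSDATA_PREFIX")
    data_found = False
    if prefix:
        # Check both layouts, since the expected one varies by version.
        candidates = [
            os.path.join(prefix, "eng.traineddata"),
            os.path.join(prefix, "tessdata", "eng.traineddata"),
        ]
        data_found = any(os.path.isfile(path) for path in candidates)
    return {
        "executable": exe,
        "tessdata_prefix": prefix,
        "eng_data_found": data_found,
    }


if __name__ == "__main__":
    for key, value in diagnose_tesseract().items():
        print(key, "=", value)
```

If executable comes back as None, the first error above is the likely one; if eng_data_found is False, expect the data file error.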

These problems all come down to installing and configuring Tesseract-OCR:

  1. Download and install tesseract-ocr.

  2. Add an environment variable: TESSDATA_PREFIX = C:\Program Files (x86)\Tesseract-OCR (PS: create a new environment variable named TESSDATA_PREFIX whose value is the install path, C:\Program Files (x86)\Tesseract-OCR)

  3. Edit the file D:\Python35\Lib\site-packages\pytesseract\pytesseract.py

tesseract_cmd = 'tesseract'
change to:
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
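As an alternative to editing the installed pytesseract.py, the same two settings can be applied from your own script, since pytesseract exposes tesseract_cmd as a module-level setting. This is a minimal sketch, not the original author's method: configure_tesseract is a made-up helper, the install folder is the one assumed in the steps above, and the tesseract.exe file name assumes a Windows install.

```python
import os

# Assumed install location, taken from the steps above; adjust to your machine.
INSTALL_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def configure_tesseract(install_dir=INSTALL_DIR):
    """Point pytesseract at a Tesseract install for the current process only."""
    exe_path = os.path.join(install_dir, "tesseract.exe")
    # Same value as step 2, but set for this process rather than system-wide.
    os.environ["TESSDATA_PREFIX"] = install_dir
    try:
        import pytesseract
    except ImportError:
        # pytesseract is not installed; only the environment variable was set.
        return exe_path
    # Module-level setting that pytesseract reads when it launches tesseract.
    pytesseract.pytesseract.tesseract_cmd = exe_path
    return exe_path
```

Call configure_tesseract() once before the first image_to_string call. The advantage is that the change survives a pytesseract upgrade, which would overwrite a hand-edited pytesseract.py. Depending on the Tesseract version, TESSDATA_PREFIX may need to point at the tessdata subfolder rather than the install folder; the value here simply mirrors step 2.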

https://blog.csdn.net/qq_33472658/article/details/78760135

# coding=utf-8
import requests
import pytesseract
from PIL import Image
from io import BytesIO

# captcha_url = 'https://www.'
# captcha_content = requests.get(url=captcha_url)
# captcha_content = captcha_content.content
# # Read the image from the raw bytes
# image = Image.open(BytesIO(captcha_content))
img_path = r'1351_5243.png'
image = Image.open(img_path)
# Convert to grayscale
imgry = image.convert('L')
# Lookup table: pixels darker than 140 become 0, the rest become 1
table = [0 if i < 140 else 1 for i in range(256)]
# Binarize so the characters stand out more clearly
out = imgry.point(table, '1')
# out.show()
captcha = pytesseract.image_to_string(out)
captcha = captcha.strip()
captcha = captcha.upper()
print(captcha)
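The same steps can be wrapped in small functions so the threshold is easy to tune per captcha style. This is a sketch under stated assumptions: the function names are made up for this example, and 140 is just the threshold used in the script above, not a value that suits every image.

```python
def build_threshold_table(threshold=140):
    """Lookup table for Image.point(): 0 below the threshold, 1 at or above it."""
    return [0 if level < threshold else 1 for level in range(256)]


def normalize_captcha(text):
    """Trim surrounding whitespace and upper-case the OCR output."""
    return text.strip().upper()


def recognize_captcha(img_path, threshold=140):
    """Grayscale, binarize, then OCR one image file."""
    # Imported here so the helpers above work without Pillow or pytesseract.
    import pytesseract
    from PIL import Image

    with Image.open(img_path) as image:
        table = build_threshold_table(threshold)
        binary = image.convert("L").point(table, "1")
    return normalize_captcha(pytesseract.image_to_string(binary))
```

With the sample file from the script, usage would be print(recognize_captcha(r'1351_5243.png')). If the result comes back empty or wrong, trying a few different threshold values is usually the first thing to adjust.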