Implementing CAPTCHA recognition with Python, OpenCV, and pytesseract

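1. Environment setup

You need the pillow and pytesseract libraries, both of which can be installed with pip. The examples also import cv2, which is provided by the opencv-python package (pip install opencv-python).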

pip install pillow -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
pip install pytesseract -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

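Install Tesseract-OCR.exe.

Configuring the pytesseract library: locate pytesseract.py in the installed package, open it, find tesseract_cmd, and change its value to the path of the tesseract.exe you just installed.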
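
Changes made inside the installed pytesseract.py are likely to be lost when the package is reinstalled or upgraded. As an alternative sketch, the same tesseract_cmd variable can be set at runtime from your own script through pytesseract.pytesseract.tesseract_cmd. The helper name resolve_tesseract_cmd and the Windows path below are assumptions for illustration; substitute your actual install location.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates):
    """Return the first usable tesseract executable, or None.

    Hypothetical helper: checks PATH first, then each candidate path.
    """
    found = shutil.which("tesseract")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed install location on Windows; adjust to your machine.
    cmd = resolve_tesseract_cmd([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])
    try:
        import pytesseract
    except ImportError:
        pytesseract = None
    if cmd and pytesseract is not None:
        # Same variable the article edits inside pytesseract.py, set at runtime.
        pytesseract.pytesseract.tesseract_cmd = cmd
    print(f"tesseract command: {cmd}")
```

If the script prints None, Tesseract was not found on PATH or at the assumed location, and pytesseract calls will fail until the path is corrected.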

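2. CAPTCHA recognition

Before recognizing a CAPTCHA, the image should be preprocessed to remove the lines and noise that would otherwise reduce recognition accuracy.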
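
To make the idea concrete, here is a minimal pure-Python sketch of one such preprocessing step: morphological opening (erosion followed by dilation) with a 3x3 square neighbourhood on a tiny binary grid. The examples in this section do the equivalent with cv.erode, cv.dilate, and cv.morphologyEx. This is an illustration only, not OpenCV's implementation; the function names, the sample grid, and the choice to treat pixels outside the grid as background are assumptions made for this sketch.

```python
def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground (1)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground (1)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out


# A 3x3 "character" blob plus a one-pixel-thick interference line on row 5.
grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

# Opening = erosion followed by dilation.
opened = dilate(erode(grid))
for row in opened:
    print("".join("#" if v else "." for v in row))
```

Erosion removes the thin line entirely and shrinks the blob to its center pixel; dilation then grows the blob back to its original size, so only the line is lost.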

Example 1

import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving mean-shift filtering to remove noise
    dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
    # Convert to grayscale
    gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
    # Binarize; with THRESH_OTSU the thresh argument (0) is ignored and chosen automatically
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # Morphology: erode, then dilate (a kernel of None selects the default 3x3 element)
    erode = cv.erode(binary, None, iterations=2)
    dilate = cv.dilate(erode, None, iterations=1)
    cv.imshow('dilate', dilate)
    # Invert in place so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(dilate, dilate)
    cv.imshow('binary-image', dilate)
    # Run OCR
    test_message = Image.fromarray(dilate)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/044.png')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()

The output is:

Recognition result: 3n3D

Process finished with exit code 0

Example 2

import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving mean-shift filtering to remove noise
    blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
    cv.imshow('dst', blur)
    # Convert to grayscale
    gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
    # Binarize with Otsu's automatically chosen threshold (returned as ret)
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    print(f'Adaptive binarization threshold: {ret}')
    cv.imshow('binary', binary)
    # Morphology: build structuring elements and apply an opening with each
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (3, 2))
    bin1 = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
    cv.imshow('bin1', bin1)
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (2, 3))
    bin2 = cv.morphologyEx(bin1, cv.MORPH_OPEN, kernel)
    cv.imshow('bin2', bin2)
    # Invert in place so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(bin2, bin2)
    cv.imshow('binary-image', bin2)
    # Run OCR
    test_message = Image.fromarray(bin2)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/045.png')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()

The output is:

Adaptive binarization threshold: 181.0
Recognition result: 8A62N1

Process finished with exit code 0
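
The adaptive threshold printed in Example 2's output (181.0) is the value that cv.THRESH_OTSU selected. As a rough sketch of the idea (not OpenCV's code), Otsu's method tries every candidate threshold and keeps the one that maximizes the between-class variance of the two resulting pixel groups. The function name otsu_threshold and the sample pixel values are assumptions for illustration.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form the low class, pixels > t the high class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0 = 0  # number of pixels in the low class
    sum0 = 0    # sum of their intensities
    for t in range(256):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so this is not a valid split
        mean0 = sum0 / count0
        mean1 = (total_sum - sum0) / count1
        var_between = count0 * count1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Synthetic "image": dark text pixels around 60, light background around 220.
pixels = [55, 60, 65] * 20 + [215, 220, 225] * 80
t = otsu_threshold(pixels)
# Same rule as THRESH_BINARY_INV: pixels above t become 0, the rest 255.
binary_inv = [255 if p <= t else 0 for p in pixels]
print(f"Otsu threshold: {t}")
print(f"pixels marked as text: {binary_inv.count(255)}")
```

Because the threshold is derived from the histogram alone, it adapts to each image, which is why Example 2 prints the value instead of hard-coding it.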

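Example 3

import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving mean-shift filtering to remove noise.
    # The sp and sr values are missing from the source; the values from
    # Example 2 are reused here as an assumption.
    blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
    # Convert to grayscale
    gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
    # Binarize with a fixed threshold; with the adaptive threshold the yellow 4 is not extracted
    ret, binary = cv.threshold(gray, 185, 255, cv.THRESH_BINARY_INV)
    print(f'Fixed binarization threshold: {ret}')
    cv.imshow('binary', binary)
    # Invert in place so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(binary, binary)
    cv.imshow('bg_image', binary)
    # Run OCR
    test_message = Image.fromarray(binary)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/045.jpg')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()

The output is:

Fixed binarization threshold: 185.0
Recognition result: 7364

Process finished with exit code 0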
