Extracting Text from Images with Python

阿新 • Published: 2019-02-05

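Environment

Python 3
The pillow and pytesseract packages for Python 3
Install them with pip install pillow and pip install pytesseract,
or install them through PyCharm
The tesseract-ocr recognition engine, installed separately from the Python packages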
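Before running any OCR code, it can help to confirm that the pieces listed above are actually in place. The following is a minimal sketch rather than part of the original write-up: the helper name check_environment is made up for illustration, and it assumes the engine's executable is called tesseract and is expected on PATH.

```python
import importlib.util
import shutil


def check_environment():
    """Report which OCR prerequisites can be found on this machine.

    Hypothetical helper: the name and the returned layout are illustrative.
    """
    return {
        # pillow is imported under the name PIL
        "PIL": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Assumes the engine's executable is named "tesseract" and is on PATH
        "tesseract": shutil.which("tesseract") is not None,
    }


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A MISSING entry for tesseract only means the executable is not on PATH; it may still be installed elsewhere, in which case its full path has to be given to pytesseract.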

Code

# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# Recognize English text with pytesseract; the lang argument can be omitted
print(pytesseract.image_to_string(Image.open('textEng.png'), lang='eng'))
# Recognize Chinese text (English may be mixed in, but the recognition rate drops)
print(pytesseract.image_to_string(Image.open('textCh.png'), lang='chi_sim'))

The recognition rate is decent for English but noticeably worse for Chinese. Even so, it is more convenient than typing the text out by hand.
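One common way to try to improve a weak recognition rate is to clean up the image before handing it to the engine. The sketch below is an illustration added here, not something from the original post: it converts the image to grayscale and binarizes it with a fixed threshold. The helper names and the threshold of 140 are arbitrary assumptions, and whether this helps at all depends on the image.

```python
def threshold_table(threshold=140):
    """Build a 256-entry lookup table: dark pixels -> 0, light pixels -> 255."""
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(image, threshold=140):
    """Return a black-and-white copy of a PIL image.

    `image` is expected to be a PIL.Image.Image; convert('L') makes it
    grayscale and point() maps every pixel through the lookup table.
    """
    return image.convert("L").point(threshold_table(threshold))


def ocr_binarized(path, lang="chi_sim", threshold=140):
    """Hypothetical wrapper: binarize the image at `path`, then run OCR on it."""
    # Imported here so the helpers above stay usable without the OCR stack
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(binarize(image, threshold), lang=lang)
```

For example, print(ocr_binarized('textCh.png')) would run the Chinese sample through this step. The threshold may need adjusting per image.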

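Errors and fixes

1. FileNotFoundError: [WinError 2] The system cannot find the file specified.

Fix:
Search for the file pytesseract.py, find the code below, and change the value of tesseract_cmd to a full path (the full path of the tesseract executable, which is located in the Tesseract-OCR directory):

tesseract_cmd = 'tesseract'

Change it to

tesseract_cmd = r'E:\Python36\Tesseract-OCR\tesseract'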
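Editing the installed pytesseract.py works, but the change is lost whenever the package is reinstalled or upgraded. pytesseract also allows the same tesseract_cmd variable to be assigned from your own script. The sketch below takes that route; the helper names and any candidate paths are illustrative assumptions, so substitute the location where Tesseract-OCR is installed on your machine.

```python
import os
import shutil


def find_tesseract(candidates):
    """Return the first usable tesseract executable path, or None.

    Checks PATH first, then each candidate path in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(candidates):
    """Point pytesseract at the located executable (hypothetical helper)."""
    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    import pytesseract  # imported lazily so find_tesseract works without it

    # Same variable the fix above edits, set from the calling script instead
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract([r'E:\Python36\Tesseract-OCR\tesseract.exe']) once, before the first image_to_string call, would have the same effect as the edit above. The .exe file name is an assumption about a Windows install.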

2. pytesseract.pytesseract.TesseractError: (1, 'Error opening data file ··· ··· Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

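Fix:
1. Check whether TESSDATA_PREFIX has been added to the system variables. If not, create a new system variable (in the system-level list, not the user-level one) named TESSDATA_PREFIX with the value E:\Python37\Tesseract-OCR\ (use the path of your own Tesseract-OCR directory here).
2. Check whether chi_sim.traineddata exists under "Tesseract-OCR\tessdata" (if the error says that eng cannot be loaded, check for the corresponding file instead). If it is missing, download chi_sim.traineddata and place it in "Tesseract-OCR\tessdata".
3. If the problem persists:
Open pytesseract.py, find image_to_string, and on the line above it set the config parameter to the path of the tessdata directory, as follows:

tessdata_dir_config = r'--tessdata-dir "E:\Python37\Tesseract-OCR\tessdata"'
def image_to_string(image,
                    lang=None,
                    config='',
                    nice=0,
                    boxes=False,
                    output_type=Output.STRING):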
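The first two checks above can be automated, and the tessdata path can also be passed through the config argument of image_to_string at the call site instead of by editing the library. The following sketch uses hypothetical helper names (diagnose_tessdata, tessdata_config) and assumes the directory passed in is the tessdata folder itself.

```python
import os


def diagnose_tessdata(tessdata_dir, lang="chi_sim"):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if "TESSDATA_PREFIX" not in os.environ:
        problems.append("TESSDATA_PREFIX is not set")
    model = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(model):
        problems.append("missing language file: " + model)
    return problems


def tessdata_config(tessdata_dir):
    """Build a config string that points tesseract at `tessdata_dir`.

    The double quotes keep a path that contains spaces in one piece.
    """
    return '--tessdata-dir "{}"'.format(tessdata_dir)
```

With these in place, a call such as pytesseract.image_to_string(Image.open('textCh.png'), lang='chi_sim', config=tessdata_config(r'E:\Python37\Tesseract-OCR\tessdata')) passes the directory explicitly for that one call.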

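3. permission denied: [WinError 5] Access is denied

Fix:
Tesseract-OCR is installed by default under "C:\Program Files (x86)", and accessing that path requires administrator privileges. Install Tesseract-OCR to a different location and update the value of tesseract_cmd to match.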
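To tell a missing executable apart from a permissions problem before running OCR, the command can be probed directly. This is a minimal sketch using only the standard library; the function name probe_tesseract and its status strings are made up for illustration, and it relies on the tesseract command line accepting --version.

```python
import subprocess


def probe_tesseract(cmd="tesseract"):
    """Try to run `cmd --version` and classify the outcome.

    Returns one of "ok", "missing", "denied" or "failed" (illustrative labels).
    """
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except FileNotFoundError:
        return "missing"   # corresponds to [WinError 2] on Windows
    except PermissionError:
        return "denied"    # corresponds to [WinError 5] on Windows
    return "ok" if result.returncode == 0 else "failed"


print(probe_tesseract())
```

Passing the same full path that tesseract_cmd is set to, instead of the default "tesseract", checks the exact executable that pytesseract will try to launch.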
