
Building Your Own tesseract-docker Environment Image (Hands-On)

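When you ship OCR (image-to-text recognition) on a Linux system, the tesseract environment has to be installed. The information online is messy: there are Dockerfiles based on all sorts of Linux distributions, their behavior varies widely, and it is hard to make sense of them. Below is the result of a day or two of my own exploration. It deploys the environment reliably, and I hope it is useful to you. The process has roughly three stages: (1) build the base image with the tesseract environment installed; (2) upload the tessdata language packs to the server so tesseract can use them during recognition; (3) build the application image, mount the tessdata directory at /usr/local/share/tessdata, and set the container's TESSDATA_PREFIX environment variable.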

1. Prepare the Dockerfile for the base image. It needs two source archives placed next to it, tesseract-4.1.1.tar.gz and leptonica-1.80.0.tar.gz:

https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1

http://www.leptonica.org/source/leptonica-1.80.0.tar.gz
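Because the Dockerfile pulls both archives in with ADD, they have to sit in the build context under exactly these file names. The sketch below is my own addition, not part of the original steps: fetch_sources and check_build_context are hypothetical helper names, and the GitHub archive URL for the 4.1.1 tag is an assumption (the release page above offers the same source tarball).

```shell
# Hypothetical helper: download both source archives into directory $1 (default: .).
# The GitHub archive URL for tag 4.1.1 is an assumption; adjust it if the release page differs.
fetch_sources() {
  ctx="${1:-.}"
  curl -fL -o "$ctx/tesseract-4.1.1.tar.gz" \
    https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz &&
  curl -fL -o "$ctx/leptonica-1.80.0.tar.gz" \
    http://www.leptonica.org/source/leptonica-1.80.0.tar.gz
}

# Hypothetical helper: fail if any file the base Dockerfile needs is missing from $1.
check_build_context() {
  ctx="${1:-.}"
  for f in Dockerfile tesseract-4.1.1.tar.gz leptonica-1.80.0.tar.gz; do
    if [ ! -f "$ctx/$f" ]; then
      echo "missing from build context: $f" >&2
      return 1
    fi
  done
}
```

Running check_build_context . before the build catches a misnamed archive early, instead of partway through a long compile.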

FROM mamohr/centos-java
LABEL AUTHOR="siman([email protected])" VERSION="1.0.0" BUILD_DATE="2020-09-01" \
      RESOURCES="https://github.com/tesseract-ocr/tesseract http://www.leptonica.org/index.html https://github.com/tesseract-ocr/tessdata" \
      DESCRIPTION="This image integrates the runtime environment of tesseract-4.1.1 and leptonica-1.80.0 on a CentOS system. Based on this base image, you can run your own tess4j jar application."
# Environment variables (tesseract)
ENV LD_LIBRARY_PATH="/usr/local/lib" \
    LIBLEPT_HEADERSDIR="/usr/local/include" \
    PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"
# Install the tesseract environment (ADD unpacks local tar.gz archives automatically)
ADD tesseract-4.1.1.tar.gz /
ADD leptonica-1.80.0.tar.gz /
RUN yum -y install file automake libicu-devel pango-devel cairo-devel libjpeg-devel libpng-devel libtiff-devel zlib-devel libtool gcc-c++ make \
    && cd /leptonica-1.80.0 && ./configure && make && make install \
    && cd /tesseract-4.1.1 && ./autogen.sh && ./configure && make && make install \
    && rm -rf /leptonica-1.80.0 /tesseract-4.1.1
# Time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo 'Asia/Shanghai' >/etc/timezone

2. Build the base image

docker build -t tess/centos-java:v1.0 . 
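Once the build finishes, a quick sanity check is to ask the image for its tesseract version. This is a sketch I am adding, not one of the original steps: verify_base_image is a hypothetical helper, and --entrypoint is used so the check does not depend on whatever entrypoint the parent image defines. The DOCKER variable is only a convenience for swapping in another CLI or doing a dry run with echo.

```shell
# Hypothetical helper: run `tesseract --version` inside the freshly built base image.
# $1 is the image tag; DOCKER defaults to the docker CLI.
verify_base_image() {
  image="${1:-tess/centos-java:v1.0}"
  "${DOCKER:-docker}" run --rm --entrypoint tesseract "$image" --version
}

# Usage (needs a running Docker daemon):
#   verify_base_image tess/centos-java:v1.0
```

If the command reports a missing shared library instead of a version string, LD_LIBRARY_PATH in the Dockerfile is the first thing I would check.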

3. Install the tessdata package

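Link: https://pan.baidu.com/s/1XAvPkTdUXuFq-q2InDREhQ (extraction code: 6vjp)

After downloading, upload the tessdata directory to the server. The docker run command at the end of this article mounts it from /usr/tessdata.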
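Tesseract language models are files named <lang>.traineddata, so it helps to confirm that the uploaded directory really contains them before mounting it. A minimal sketch, assuming the files sit directly in the directory you pass in; tessdata_langs is a hypothetical helper name.

```shell
# Hypothetical helper: print the language codes found in a tessdata directory,
# one per line, derived from the *.traineddata file names.
tessdata_langs() {
  dir="${1:-/usr/tessdata}"
  for f in "$dir"/*.traineddata; do
    # If the glob matched nothing, $f is the literal pattern; skip it.
    [ -e "$f" ] || continue
    basename "$f" .traineddata
  done
}

# Usage:
#   tessdata_langs /usr/tessdata
```

An empty result means the language files ended up in a subdirectory or were not uploaded at all, which would leave tesseract with nothing to load inside the container.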

4. Build your own springboot-ocr service image

FROM tess/centos-java:v1.0
LABEL AUTHOR="siman([email protected])" VERSION="1.0.0" BUILD_DATE="2020-09-01"
VOLUME /tmp
ADD simm-framework-test-1.0.jar app.jar
EXPOSE 8080
ENV TESSDATA_PREFIX="/usr/local/share/tessdata"
# Entry point
ENTRYPOINT ["java","-jar","/app.jar"]
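The run command in the final step expects an image tagged ocr-api:v1.0, so this Dockerfile has to be built first. A minimal sketch, assuming the Dockerfile and simm-framework-test-1.0.jar sit in the same directory; build_app_image is a hypothetical helper name, and DOCKER is only there to allow a dry run with echo.

```shell
# Hypothetical helper: build the application image.
# $1 is the tag (default taken from the run command), $2 is the build context directory.
build_app_image() {
  tag="${1:-ocr-api:v1.0}"
  ctx="${2:-.}"
  "${DOCKER:-docker}" build -t "$tag" "$ctx"
}

# Usage (needs a running Docker daemon and the base image built earlier):
#   build_app_image ocr-api:v1.0 .
```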

5. Start the container and mount the tessdata directory

docker run -it -v /usr/tessdata:/usr/local/share/tessdata -p 8080:8080 --name="ocr-api" ocr-api:v1.0
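With the container up (the -it flags keep it in the foreground, so use a second terminal), it is worth confirming that the environment variable and the mounted language files are visible from inside. This check is my own suggestion rather than part of the original procedure; check_container is a hypothetical helper, while printenv and tesseract --list-langs are standard commands.

```shell
# Hypothetical helper: print TESSDATA_PREFIX and the languages tesseract can see
# inside the running container named $1 (default: ocr-api).
check_container() {
  name="${1:-ocr-api}"
  "${DOCKER:-docker}" exec "$name" printenv TESSDATA_PREFIX &&
  "${DOCKER:-docker}" exec "$name" tesseract --list-langs
}

# Usage (needs the ocr-api container to be running):
#   check_container ocr-api
```

If the language list comes back empty, the usual suspects are a wrong host path in the -v option or language files nested one directory too deep.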