[OCR Technology Series, Part 8] End-to-End Variable-Length Text Recognition: CRNN Code Implementation

阿新 • Published: 2020-10-25

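CRNN is a classic and widely used recognition algorithm in OCR. Its theory is covered in my previous article; this post focuses on the code implementation of CRNN and on its recognition results.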

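Data processing

Using image processing techniques, we synthetically generated a large batch of text images, 3.6 million samples in total. Examples are shown below:

We split the data into a training set and a test set (10:1) and saved the two lists as separate text files:

The labels in these text files have the following format:

What we have so far is the raw dataset. For deep-learning training on images, the raw data is usually converted to LMDB format to make later training easier, so we do the same here. The code below performs the conversion. The idea is simple: read each image together with its text label, store the pair in a dictionary (cache), and use the put function of the lmdb package to write the cached key-value pairs into the LMDB database. The cache is flushed with a put pass every 1000 samples.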


import lmdb
import cv2
import numpy as np
import os


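def checkImageIsValid(imageBin):
    """Return True if imageBin (raw bytes) decodes to a non-empty image."""
    if imageBin is None:
        return False
    try:
        # np.fromstring is deprecated for binary data; frombuffer replaces it
        imageBuf = np.frombuffer(imageBin, dtype=np.uint8)
        img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
        imgH, imgW = img.shape[0], img.shape[1]
    except Exception:
        # imdecode returns None for undecodable data, so img.shape raises here
        return False
    return imgH * imgW != 0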


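def writeCache(env, cache):
    # Under Python 3, LMDB keys and values must be bytes, so encode any str first
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            key = k.encode() if isinstance(k, str) else k
            value = v.encode() if isinstance(v, str) else v
            txn.put(key, value)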


def createDataset(outputPath, imagePathList, labelList, lexiconList=None, checkValid=True):
    """
    Create LMDB dataset for CRNN training.
    ARGS:
        outputPath    : LMDB output path
        imagePathList : list of image path
        labelList     : list of corresponding groundtruth texts
        lexiconList   : (optional) list of lexicon lists
        checkValid    : if true, check the validity of every image
    """
    assert (len(imagePathList) == len(labelList))
    nSamples = len(imagePathList)
    env = lmdb.open(outputPath, map_size=1099511627776)
    cache = {}
    cnt = 1
    for i in range(nSamples):
        imagePath = ''.join(imagePathList[i]).split()[0].replace('\n', '').replace('\r\n', '')
        # print(imagePath)
        label = ''.join(labelList[i])
        print(label)
        # if not os.path.exists(imagePath):
        #     print('%s does not exist' % imagePath)
        #     continue

        # Images are binary files, so they must be opened in 'rb' mode
        with open('.' + imagePath, 'rb') as f:
            imageBin = f.read()

        if checkValid:
            if not checkImageIsValid(imageBin):
                print('%s is not a valid image' % imagePath)
                continue
        imageKey = 'image-%09d' % cnt
        labelKey = 'label-%09d' % cnt
        cache[imageKey] = imageBin
        cache[labelKey] = label
        if lexiconList:
            lexiconKey = 'lexicon-%09d' % cnt
            cache[lexiconKey] = ' '.join(lexiconList[i])
        if cnt % 1000 == 0:
            writeCache(env, cache)
            cache = {}
            print('Written %d / %d' % (cnt, nSamples))
        cnt += 1
        print(cnt)
    nSamples = cnt - 1
    cache['num-samples'] = str(nSamples)
    writeCache(env, cache)
    print('Created dataset with %d samples' % nSamples)


OUT_PATH = '../crnn_train_lmdb'
IN_PATH = './train.txt'

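if __name__ == '__main__':
    outputPath = OUT_PATH
    os.makedirs(OUT_PATH, exist_ok=True)
    # Each line of the list file is "<image path> <label>"; UTF-8 is assumed here
    with open(IN_PATH, encoding='utf-8') as imgdata:
        imagePathList = list(imgdata)

    labelList = []
    for line in imagePathList:
        # NOTE: this keeps only the first whitespace-separated token after the path.
        # If labels contain spaces, use line.split(maxsplit=1)[1].strip() instead.
        word = line.split()[1]
        labelList.append(word)
    createDataset(outputPath, imagePathList, labelList)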

Running the code above produces the LMDB databases for the training set and the test set.
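The key layout written by createDataset can be illustrated without LMDB at all. The sketch below is only a stand-in under stated assumptions: FakeStore and build_records are hypothetical names, a plain dict plays the role of the LMDB environment, and the image bytes are dummies. It shows the image-%09d / label-%09d / num-samples key scheme and the flush-every-N pattern, nothing more.

```python
class FakeStore:
    """Hypothetical stand-in for an LMDB environment: a dict plus a flush counter."""

    def __init__(self):
        self.data = {}
        self.flushes = 0

    def write(self, cache):
        # Mirror writeCache: keys and values are stored as bytes.
        for k, v in cache.items():
            value = v.encode() if isinstance(v, str) else v
            self.data[k.encode()] = value
        self.flushes += 1


def build_records(samples, store, flush_every=1000):
    """Write (image_bytes, label) pairs with the same key scheme as createDataset."""
    cache = {}
    cnt = 1
    for image_bytes, label in samples:
        cache['image-%09d' % cnt] = image_bytes
        cache['label-%09d' % cnt] = label
        if cnt % flush_every == 0:
            store.write(cache)
            cache = {}
        cnt += 1
    cache['num-samples'] = str(cnt - 1)
    store.write(cache)  # the final flush also stores the sample count
    return cnt - 1


store = FakeStore()
samples = [(b'fake-image-%d' % i, 'label%d' % i) for i in range(5)]
n = build_records(samples, store, flush_every=2)

# A reader looks up num-samples first, then indexes samples from 1 to n.
num = int(store.data[b'num-samples'])
first_label = store.data[b'label-%09d' % 1].decode()
print(num, first_label, store.flushes)
```

Because keys are 1-based and zero-padded, a dataset class can fetch sample i directly by formatting the key, without scanning the database.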

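One more step of data preparation deserves emphasis: converting text labels to numbers, i.e. representing every character (Chinese character, English letter, punctuation mark) by an integer id. For example, "我" is mapped to id 1, "l" to id 1000, "？" to id 90, and so on. A dictionary is enough to store this mapping: labels are encoded (encode) before training, and at prediction time the network output is decoded (decode) back into text.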


import torch


class strLabelConverter(object):
    """Convert between str and label.

    NOTE:
        Insert `blank` to the alphabet for CTC.

    Args:
        alphabet (str): set of the possible characters.
        ignore_case (bool, default=False): whether to lowercase the alphabet and ignore case.
    """

    def __init__(self, alphabet, ignore_case=False):
        self._ignore_case = ignore_case
        if self._ignore_case:
            alphabet = alphabet.lower()
        self.alphabet = alphabet + '-'  # for `-1` index

        self.dict = {}
        for i, char in enumerate(alphabet):
            # NOTE: 0 is reserved for 'blank' required by wrap_ctc
            self.dict[char] = i + 1

    def encode(self, text):
        """Support batch or single str.

        Args:
            text (str or list of str): texts to convert.

        Returns:
            torch.IntTensor [length_0 + length_1 + ... length_{n - 1}]: encoded texts.
            torch.IntTensor [n]: length of each text.
        """

        length = []
        result = []
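        if isinstance(text, (str, bytes)):
            text = [text]  # treat a single string as a batch of one
        for item in text:
            # Labels read back from LMDB are bytes; plain str needs no decoding
            if isinstance(item, bytes):
                item = item.decode('utf-8', 'strict')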

            length.append(len(item))
            for char in item:

                index = self.dict[char]
                result.append(index)

        text = result
        # print(text,length)
        return (torch.IntTensor(text), torch.IntTensor(length))

    def decode(self, t, length, raw=False):
        """Decode encoded texts back into strs.

        Args:
            torch.IntTensor [length_0 + length_1 + ... length_{n - 1}]: encoded texts.
            torch.IntTensor [n]: length of each text.

        Raises:
            AssertionError: when the texts and its length does not match.

        Returns:
            text (str or list of str): texts to convert.
        """
        if length.numel() == 1:
            length = length[0]
            assert t.numel() == length, "text with length: {} does not match declared length: {}".format(t.numel(),
                                                                                                         length)
            if raw:
                return ''.join([self.alphabet[i - 1] for i in t])
            else:
                char_list = []
                for i in range(length):
                    if t[i] != 0 and (not (i > 0 and t[i - 1] == t[i])):
                        char_list.append(self.alphabet[t[i] - 1])
                return ''.join(char_list)
        else:
            # batch mode
            assert t.numel() == length.sum(), "texts with length: {} does not match declared length: {}".format(
                t.numel(), length.sum())
            texts = []
            index = 0
            for i in range(length.numel()):
                l = length[i]
                texts.append(
                    self.decode(
                        t[index:index + l], torch.IntTensor([l]), raw=raw))
                index += l
            return texts
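To make the encode/decode convention concrete, here is a small pure-Python sketch that follows the same rules as strLabelConverter: ids start at 1, id 0 is reserved for the CTC blank, and decoding collapses repeated ids before dropping blanks. The function names and the three-letter alphabet are illustrative assumptions, not part of the original code.

```python
def build_char_to_index(alphabet):
    """Map each character to an id starting at 1; id 0 stays reserved for the CTC blank."""
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def encode(texts, char_to_index):
    """Flatten a batch of strings into one id list plus the length of each string."""
    ids, lengths = [], []
    for text in texts:
        lengths.append(len(text))
        ids.extend(char_to_index[ch] for ch in text)
    return ids, lengths


def greedy_ctc_decode(frame_ids, alphabet):
    """Skip ids equal to the previous frame, and skip blanks (id 0)."""
    chars = []
    prev = None
    for idx in frame_ids:
        if idx != 0 and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return ''.join(chars)


alphabet = 'abc'  # toy alphabet, purely illustrative
table = build_char_to_index(alphabet)
ids, lengths = encode(['cab', 'bb'], table)
print(ids, lengths)  # [3, 1, 2, 2, 2] [3, 2]

frames = [2, 2, 0, 2, 1, 1, 0, 0, 3]
decoded = greedy_ctc_decode(frames, alphabet)
print(decoded)  # bbac
```

Note how the blank between the two runs of id 2 is what allows the doubled letter "bb" to survive the collapse step.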

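Network design

According to the CRNN paper, the network is built from three parts, CNN -> RNN -> CTC, which correspond to the convolutional layers, the recurrent layers and the transcription layer. The CNN extracts low-level features; the RNN, a BiLSTM, learns the sequence context and predicts a label distribution for each frame; CTC handles sequence alignment and produces the final prediction.

To feed the features into the recurrent layers, the following processing is applied:

First, the image is scaled to 32×W×3.
After the CNN, it becomes 1×(W/4)×512.
Then, for the LSTM, set T = W/4 and D = 512, and the features can be fed into the LSTM.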

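The above is the idealized procedure. The CRNN paper, however, specifies the network input as a normalized 100×32 grayscale image, i.e. the height is fixed at 32 pixels. The figure below shows the structure of the CRNN network. The CNN follows the classic VGG16 design (the version implemented here has 7 convolutional layers). Note that in the 3rd and 4th max-pooling layers CRNN uses a rectangular 1×2 pooling window (w×h) instead of the square 2×2 window of the standard VGG16. The reason is that most text images are short and wide, so their feature maps are short and wide as well; a 1×2 window keeps more resolution along the width, which suits the recognition of English letters (for example, telling "i" from "l"). The VGG16 part also adds BatchNormalization modules to speed up convergence. Also note that the input to CRNN is a grayscale image, i.e. the image depth is 1. The CNN outputs a feature map of size 512×1×w' (c×h×w). The code comments use w' = 16 as the running example; with the layers as written below, a 100-pixel-wide input gives w' = 22.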
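The relation between input width and sequence length can be checked by hand with the standard output-size formula for convolution and pooling layers. The sketch below traces (height, width) through the kernel, stride and padding settings of the Vgg_16 class shown later in this post; the helper names conv_out and crnn_cnn_output_hw are my own, and the layer table is simply transcribed from that class.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def crnn_cnn_output_hw(height, width):
    """Trace (height, width) through the layer sequence of the Vgg_16 class."""
    # Each entry: (kernel_h, kernel_w), (stride_h, stride_w), (pad_h, pad_w)
    layers = [
        ((3, 3), (1, 1), (1, 1)),  # convolution1
        ((2, 2), (2, 2), (0, 0)),  # pooling1
        ((3, 3), (1, 1), (1, 1)),  # convolution2
        ((2, 2), (2, 2), (0, 0)),  # pooling2
        ((3, 3), (1, 1), (1, 1)),  # convolution3
        ((3, 3), (1, 1), (1, 1)),  # convolution4
        ((1, 2), (2, 1), (0, 0)),  # pooling3
        ((3, 3), (1, 1), (1, 1)),  # convolution5
        ((3, 3), (1, 1), (1, 1)),  # convolution6
        ((1, 2), (2, 1), (0, 0)),  # pooling4
        ((2, 2), (1, 1), (0, 0)),  # convolution7
    ]
    for (kh, kw), (sh, sw), (ph, pw) in layers:
        height = conv_out(height, kh, sh, ph)
        width = conv_out(width, kw, sw, pw)
    return height, width


print(crnn_cnn_output_hw(32, 100))  # (1, 22)
print(crnn_cnn_output_hw(32, 76))   # (1, 16)
```

So for this particular layer stack the sequence length is W/4 - 3 when W is a multiple of 4: a width of 100 gives 22 time steps, and a width of 76 would give the 16 used in the code comments. The height collapses to 1 only for 32-pixel-high inputs, which is why the forward pass asserts h == 1.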

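Next comes the RNN part. It uses bidirectional LSTMs with 256 hidden units, and CRNN stacks two BiLSTM layers to form the RNN. The output of the RNN has shape (s, b, class_num), where s is the sequence length, b is the batch size and class_num is the total number of character classes.

Note that LSTM modules in PyTorch expect a 3-D tensor as input, and the meaning of each dimension must not be mixed up. The first dimension is the sequence, the second is the mini-batch, and the third holds the elements of the input (the features). If your application does not use mini-batches, set that dimension to 1, but the dimension must still be present.

LSTM input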

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. 
The input can also be a packed variable length sequence.
input shape(a,b,c)
a: seq_len    -> sequence length
b: batch      -> batch size
c: input_size -> number of input features

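To satisfy the LSTM input requirement, the CNN output has to be rearranged into [seq_len, batch, input_size]. Concretely: first use squeeze to remove the h dimension, then use permute to reorder the remaining dimensions from [b, c, w] to [w, b, c] = [seq_len, batch, input_size]. In the running example the size is [16, batch, 512]. After this step the tensor can be fed into the RNN layers.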


x = self.cnn(x)
b, c, h, w = x.size()
# print(x.size()): b,c,h,w
assert h == 1   # "the height of conv must be 1"
x = x.squeeze(2)  # remove h dimension, b *512 * width
x = x.permute(2, 0, 1)  # [w, b, c] = [seq_len, batch, input_size]
x = self.rnn(x)

The RNN output has the format below. Because we use a bidirectional LSTM, the last output dimension is hidden_unit * 2.

Outputs: output, (h_n, c_n)
output of shape (seq_len, batch, num_directions * hidden_size)
h_n of shape (num_layers * num_directions, batch, hidden_size)
c_n (num_layers * num_directions, batch, hidden_size)
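The shape bookkeeping described above can be verified in a few lines. This is a minimal sketch with random data and illustrative sizes (batch of 4, width 16, a toy class_num of 10), assuming a recent PyTorch; it is not the model itself, only a trace of squeeze, permute, one bidirectional LSTM and one Linear layer.

```python
import torch

batch, channels, width = 4, 512, 16   # illustrative sizes only
hidden_unit, class_num = 256, 10      # class_num is an arbitrary toy value

# Pretend this came out of the CNN: [b, c, h=1, w]
features = torch.randn(batch, channels, 1, width)

seq = features.squeeze(2)      # [b, c, w]
seq = seq.permute(2, 0, 1)     # [w, b, c] = [seq_len, batch, input_size]

lstm = torch.nn.LSTM(channels, hidden_unit, bidirectional=True)
output, (h_n, c_n) = lstm(seq)
print(output.shape)            # torch.Size([16, 4, 512])
print(h_n.shape)               # torch.Size([2, 4, 256])

# Flatten time and batch before the Linear layer, then restore the 3-D shape.
T, b, h = output.size()
linear = torch.nn.Linear(hidden_unit * 2, class_num)
logits = linear(output.reshape(T * b, h)).view(T, b, -1)
print(logits.shape)            # torch.Size([16, 4, 10])
```

The last dimension of output is 512 only because hidden_unit * 2 happens to equal the CNN channel count here; with a different hidden size the two numbers would differ.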

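Next, a linear layer, self.embedding1 = torch.nn.Linear(hidden_unit * 2, 512), brings the feature dimension back to 512 before the second LSTM layer. The second LSTM is followed by another linear layer, torch.nn.Linear(hidden_unit * 2, class_num), so that the final output dimension of the RNN part equals the total number of character classes.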

import torch
import torch.nn.functional as F


class Vgg_16(torch.nn.Module):

    def __init__(self):
        super(Vgg_16, self).__init__()
        self.convolution1 = torch.nn.Conv2d(1, 64, 3, padding=1)
        self.pooling1 = torch.nn.MaxPool2d(2, stride=2)
        self.convolution2 = torch.nn.Conv2d(64, 128, 3, padding=1)
        self.pooling2 = torch.nn.MaxPool2d(2, stride=2)
        self.convolution3 = torch.nn.Conv2d(128, 256, 3, padding=1)
        self.convolution4 = torch.nn.Conv2d(256, 256, 3, padding=1)
        # PyTorch tuples are (height, width): a 1x2 window that moves 2 rows / 1 column per step
        self.pooling3 = torch.nn.MaxPool2d((1, 2), stride=(2, 1))
        self.convolution5 = torch.nn.Conv2d(256, 512, 3, padding=1)
        self.BatchNorm1 = torch.nn.BatchNorm2d(512)
        self.convolution6 = torch.nn.Conv2d(512, 512, 3, padding=1)
        self.BatchNorm2 = torch.nn.BatchNorm2d(512)
        self.pooling4 = torch.nn.MaxPool2d((1, 2), stride=(2, 1))
        self.convolution7 = torch.nn.Conv2d(512, 512, 2)

    def forward(self, x):
        x = F.relu(self.convolution1(x), inplace=True)
        x = self.pooling1(x)
        x = F.relu(self.convolution2(x), inplace=True)
        x = self.pooling2(x)
        x = F.relu(self.convolution3(x), inplace=True)
        x = F.relu(self.convolution4(x), inplace=True)
        x = self.pooling3(x)
        x = self.convolution5(x)
        x = F.relu(self.BatchNorm1(x), inplace=True)
        x = self.convolution6(x)
        x = F.relu(self.BatchNorm2(x), inplace=True)
        x = self.pooling4(x)
        x = F.relu(self.convolution7(x), inplace=True)
        return x  # b*512x1x16


class RNN(torch.nn.Module):
    def __init__(self, class_num, hidden_unit):
        super(RNN, self).__init__()
        self.Bidirectional_LSTM1 = torch.nn.LSTM(512, hidden_unit, bidirectional=True)
        self.embedding1 = torch.nn.Linear(hidden_unit * 2, 512)
        self.Bidirectional_LSTM2 = torch.nn.LSTM(512, hidden_unit, bidirectional=True)
        self.embedding2 = torch.nn.Linear(hidden_unit * 2, class_num)

    def forward(self, x):
        x = self.Bidirectional_LSTM1(x)   # LSTM output: output, (h_n, c_n)
        T, b, h = x[0].size()   # x[0]: (seq_len, batch, num_directions * hidden_size)
        x = self.embedding1(x[0].view(T * b, h))  # pytorch view() reshape as [T * b, nOut]
        x = x.view(T, b, -1)  # [16, b, 512]
        x = self.Bidirectional_LSTM2(x)
        T, b, h = x[0].size()
        x = self.embedding2(x[0].view(T * b, h))
        x = x.view(T, b, -1)
        return x  # [16,b,class_num]


# output: [s,b,class_num]
class CRNN(torch.nn.Module):
    def __init__(self, class_num, hidden_unit=256):
        super(CRNN, self).__init__()
        self.cnn = torch.nn.Sequential()
        self.cnn.add_module('vgg_16', Vgg_16())
        self.rnn = torch.nn.Sequential()
        self.rnn.add_module('rnn', RNN(class_num, hidden_unit))

    def forward(self, x):
        x = self.cnn(x)
        b, c, h, w = x.size()
        # print(x.size()): b,c,h,w
        assert h == 1   # "the height of conv must be 1"
        x = x.squeeze(2)  # remove h dimension, b *512 * width
        x = x.permute(2, 0, 1)  # [w, b, c] = [seq_len, batch, input_size]
        # x = x.transpose(0, 2)
        # x = x.transpose(1, 2)
        x = self.rnn(x)
        return x

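Loss function design

With the CNN and RNN layers in place, we now design the transcription layer, which translates the RNN output into the final recognized text and thereby enables variable-length text recognition. PyTorch releases before 1.0 had no built-in CTC loss (torch.nn.CTCLoss was added in 1.0), so this implementation uses a third-party CTC loss from GitHub, warp-ctc, for the loss function. It is installed as follows: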

git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build; cd build
cmake ..
make
cd ../pytorch_binding/
python setup.py install
cd ../build
cp libwarpctc.so ../../usr/lib

Once installed, the CTC loss can be called directly. A small example illustrates its usage.

import torch
from warpctc_pytorch import CTCLoss
ctc_loss = CTCLoss()
# expected shape of seqLength x batchSize x alphabet_size
probs = torch.FloatTensor([[[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.6, 0.1, 0.1]]]).transpose(0, 1).contiguous()
labels = torch.IntTensor([1, 2])
label_sizes = torch.IntTensor([2])
probs_sizes = torch.IntTensor([2])
probs.requires_grad_(True)  # tells autograd to compute gradients for probs
cost = ctc_loss(probs, labels, probs_sizes, label_sizes)
cost.backward()
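To see what number the toy example above actually computes, the CTC loss can be worked out by hand. The following pure-Python sketch implements the standard CTC forward (alpha) recursion for a single sequence. It applies a softmax to the activations first, on the assumption (consistent with the warp-ctc documentation quoted below) that the library receives pre-softmax activations. The function names are my own.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def ctc_neg_log_likelihood(activations, labels, blank=0):
    """CTC loss for one sequence via the forward (alpha) recursion.

    activations: list of T rows of unnormalised scores, one row per time step.
    labels: list of target ids (without blanks).
    """
    probs = [softmax(row) for row in activations]
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank in between is allowed only for two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)


acts = [[0.1, 0.6, 0.1, 0.1, 0.1],   # time step 0
        [0.1, 0.1, 0.6, 0.1, 0.1]]   # time step 1
loss = ctc_neg_log_likelihood(acts, [1, 2])
print(f"{loss:.4f}")
```

With two time steps and two labels there is exactly one valid alignment (label 1, then label 2), so the loss is just -2·log(softmax probability of the 0.6 entry), roughly 2.4629.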

CTCLoss(size_average=False, length_average=False)
    # size_average (bool): normalize the loss by the batch size (default: False)
    # length_average (bool): normalize the loss by the total number of frames in the batch. If True, supersedes size_average (default: False)

forward(acts, labels, act_lens, label_lens)
    # acts: Tensor of (seqLength x batch x outputDim) containing output activations from network (before softmax)
    # labels: 1 dimensional Tensor containing all the targets of the batch in one large sequence
    # act_lens: Tensor of size (batch) containing size of each output sequence from the network
    # label_lens: Tensor of (batch) containing label length of each example

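As the warp-ctc example shows, CTCLoss takes [probs, labels, probs_sizes, label_sizes], i.e. the predictions, the labels, the length of each prediction sequence and the length of each label. Following this example, we now design the CTC loss for CRNN.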


preds = net(image)
preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size))  # preds.size(0) = sequence length w (16 in the running example)
cost = criterion(preds, text, preds_size, length) / batch_size   # length holds the label length of every sample in the batch; dividing by batch_size gives the mean loss
cost.backward()
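On PyTorch 1.0 or later, the same call can be written with the built-in torch.nn.CTCLoss instead of warp-ctc. The sketch below is not the code used in this post. It uses random activations as a stand-in for net(image) and toy sizes, and it shows the one difference that matters in practice: the built-in loss expects log-probabilities, so a log_softmax has to be applied first.

```python
import torch
import torch.nn.functional as F

T, batch_size, class_num = 22, 2, 5   # toy sizes; class 0 is the CTC blank
# Random stand-in for preds = net(image), shape [seq_len, batch, class_num]
preds = torch.randn(T, batch_size, class_num, requires_grad=True)

# nn.CTCLoss expects log-probabilities, unlike warp-ctc, which takes raw activations.
log_probs = F.log_softmax(preds, dim=2)

text = torch.tensor([1, 2, 3, 4, 4], dtype=torch.long)   # all labels concatenated
length = torch.tensor([3, 2], dtype=torch.long)          # label length per sample
preds_size = torch.full((batch_size,), T, dtype=torch.long)

criterion = torch.nn.CTCLoss(blank=0, reduction='sum')
cost = criterion(log_probs, text, preds_size, length) / batch_size
cost.backward()
print(cost.item(), preds.grad.shape)
```

Using reduction='sum' and dividing by batch_size mirrors the averaging done in the snippet above; the default reduction='mean' additionally divides each sample's loss by its target length.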

Training design

Next we flesh out the training procedure. We also wrote a trainBatch function that performs one gradient update on a batch.

def trainBatch(net, criterion, optimizer, train_iter):
    data = next(train_iter)  # built-in next() works on Python 3 iterators
    cpu_images, cpu_texts = data
    batch_size = cpu_images.size(0)
    lib.dataset.loadData(image, cpu_images)
    t, l = converter.encode(cpu_texts)
    lib.dataset.loadData(text, t)
    lib.dataset.loadData(length, l)

    preds = net(image)
    #print("preds.size=%s" % preds.size)
    preds_size = Variable(torch.IntTensor([preds.size(0)] * batch_size))  # preds.size(0)=w=22
    cost = criterion(preds, text, preds_size, length) / batch_size  # length= a list that contains the len of text label in a batch
    net.zero_grad()
    cost.backward()
    optimizer.step()
    return cost

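The overall training flow is: create the CTC loss object -> create the CRNN network object -> initialize the image, text and length tensors -> initialize the optimizer, then loop over the epochs, running validation and saving the model every specified number of iterations. The CRNN paper uses the Adadelta optimizer, but in my experiments Adadelta converged very slowly, so I switched to RMSprop, which made the model converge much faster.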


    criterion = CTCLoss()

    net = Net.CRNN(n_class)
    print(net)

    net.apply(lib.utility.weights_init)

    image = torch.FloatTensor(Config.batch_size, 3, Config.img_height, Config.img_width)
    text = torch.IntTensor(Config.batch_size * 5)
    length = torch.IntTensor(Config.batch_size)

    if cuda:
        net.cuda()
        image = image.cuda()
        criterion = criterion.cuda()

    image = Variable(image)
    text = Variable(text)
    length = Variable(length)

    loss_avg = lib.utility.averager()

    optimizer = optim.RMSprop(net.parameters(), lr=Config.lr)
    #optimizer = optim.Adadelta(net.parameters(), lr=Config.lr)
    #optimizer = optim.Adam(net.parameters(), lr=Config.lr,
                           #betas=(Config.beta1, 0.999))

    for epoch in range(Config.epoch):
        train_iter = iter(train_loader)
        i = 0
        while i < len(train_loader):
            for p in net.parameters():
                p.requires_grad = True
            net.train()

            cost = trainBatch(net, criterion, optimizer, train_iter)
            loss_avg.add(cost)
            i += 1

            if i % Config.display_interval == 0:
                print('[%d/%d][%d/%d] Loss: %f' %
                      (epoch, Config.epoch, i, len(train_loader), loss_avg.val()))
                loss_avg.reset()

            if i % Config.test_interval == 0:
                val(net, test_dataset, criterion)

            # do checkpointing
            if i % Config.save_interval == 0:
                torch.save(
                    net.state_dict(), '{0}/netCRNN_{1}_{2}.pth'.format(Config.model_dir, epoch, i))
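The training loop relies on lib.utility.averager, whose implementation is not shown in this post. As a rough idea of what such a helper might look like, here is a hypothetical stand-in (RunningAverage is my own name, and the interval value is assumed) together with the report-and-reset pattern used with Config.display_interval.

```python
class RunningAverage:
    """Hypothetical stand-in for lib.utility.averager: mean of values since the last reset."""

    def __init__(self):
        self.reset()

    def add(self, value):
        # Accept plain numbers or anything with .item() (e.g. a 0-dim tensor).
        if hasattr(value, 'item'):
            value = value.item()
        self.total += float(value)
        self.count += 1

    def val(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0


display_interval = 3          # assumed value, standing in for Config.display_interval
loss_avg = RunningAverage()
fake_losses = [4.0, 2.0, 3.0, 1.0, 1.0, 1.0]
reported = []
for i, cost in enumerate(fake_losses, start=1):
    loss_avg.add(cost)
    if i % display_interval == 0:
        reported.append(loss_avg.val())
        loss_avg.reset()
print(reported)  # [3.0, 1.0]
```

Resetting after each report means every printed value is the mean over the most recent interval only, which makes the log react faster than a cumulative average would.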

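Training process and inference design

The figure below shows the CRNN training process. There are 6732 character classes, training runs for 20 epochs, and batch_size is set to 64, which gives 51244 iterations per epoch.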
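The iteration count quoted above can be sanity-checked with simple arithmetic. The numbers 64, 51244 and 20 come from the text; the exact training-set size is not stated, so the last lines assume an exact 10:1 split of 3.6 million samples purely for comparison.

```python
import math

batch_size = 64
iterations_per_epoch = 51244   # figure quoted in the text
epochs = 20

# Upper bound on the number of training samples implied by the iteration count.
max_train_samples = iterations_per_epoch * batch_size
total_iterations = iterations_per_epoch * epochs
print(max_train_samples)   # 3279616
print(total_iterations)    # 1024880

# Going the other way: iterations needed for an assumed dataset size.
assumed_train_samples = 3_600_000 * 10 // 11   # exact 10:1 split of 3.6 million
approx_iterations = math.ceil(assumed_train_samples / batch_size)
print(assumed_train_samples, approx_iterations)   # 3272727 51137
```

The two figures are close but not identical, which suggests that "3.6 million" and "10:1" are rounded descriptions of the real split.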

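After 4 epochs the loss drops to about 0.1 and the accuracy rises to 0.98.

Next we write the inference code. First initialize the CRNN network and load the trained model; then read the image to be recognized and resize it to a grayscale image of height 32; then feed the image into the network; finally decode the network output into text.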


import time
import torch
import os
from torch.autograd import Variable
import lib.convert
import lib.dataset
from PIL import Image
import Net.net as Net
import alphabets
import sys
import Config

os.environ['CUDA_VISIBLE_DEVICES'] = "4"

crnn_model_path = './bs64_model/netCRNN_9_48000.pth'
IMG_ROOT = './test_images'
running_mode = 'gpu'
alphabet = alphabets.alphabet
nclass = len(alphabet) + 1


def crnn_recognition(cropped_image, model):
    converter = lib.convert.strLabelConverter(alphabet)  # converts between labels and text

    image = cropped_image.convert('L')  # convert the image to grayscale

    ### Testing images are scaled to have height 32. Widths are
    # proportionally scaled with heights, but at least 100 pixels
    w = int(image.size[0] / (280 * 1.0 / Config.infer_img_w))
    #scale = image.size[1] * 1.0 / Config.img_height
    #w = int(image.size[0] / scale)

    transformer = lib.dataset.resizeNormalize((w, Config.img_height))
    image = transformer(image)
    if torch.cuda.is_available():
        image = image.cuda()
    image = image.view(1, *image.size())
    image = Variable(image)

    model.eval()
    preds = model(image)

    _, preds = preds.max(2)
    preds = preds.transpose(1, 0).contiguous().view(-1)

    preds_size = Variable(torch.IntTensor([preds.size(0)]))
    sim_pred = converter.decode(preds.data, preds_size.data, raw=False)  # decode the prediction into text
    print('results: {0}'.format(sim_pred))


if __name__ == '__main__':

    # crnn network
    model = Net.CRNN(nclass)
    
    # Load the trained weights; CPU and GPU loading differ, so handle them separately
    if running_mode == 'gpu' and torch.cuda.is_available():
        model = model.cuda()
        model.load_state_dict(torch.load(crnn_model_path))
    else:
        model.load_state_dict(torch.load(crnn_model_path, map_location='cpu'))

    print('loading pretrained model from {0}'.format(crnn_model_path))

    files = sorted(os.listdir(IMG_ROOT))  # sort by file name
    for file in files:
        started = time.time()
        full_path = os.path.join(IMG_ROOT, file)
        print("=============================================")
        print("ocr image is %s" % full_path)
        image = Image.open(full_path)

        crnn_recognition(image, model)
        finished = time.time()
        print('elapsed time: {0}'.format(finished - started))
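In crnn_recognition the active line scales the width by a fixed ratio (280 / Config.infer_img_w), while the two commented-out lines compute a width proportional to the height. A minimal sketch of that proportional variant is given below; target_size is a hypothetical helper name and the min_w guard is my own addition.

```python
def target_size(orig_w, orig_h, target_h=32, min_w=1):
    """Scale (orig_w, orig_h) so the height becomes target_h, keeping the aspect ratio."""
    scale = orig_h / target_h
    new_w = max(min_w, int(orig_w / scale))
    return new_w, target_h


print(target_size(280, 32))    # (280, 32)
print(target_size(560, 64))    # (280, 32)
print(target_size(300, 48))    # (200, 32)
```

The resulting (width, height) tuple is in the same order as the argument passed to lib.dataset.resizeNormalize in the code above. Keeping the aspect ratio avoids stretching characters, at the cost of a sequence length that varies from image to image.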

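Recognition results and summary

First I took a few images from the test set and ran them through the model; all of them were recognized correctly.

I also cropped text snippets at random from document images and scanned images and fed them to the model. The results are good as well, mostly correct, which indicates that the model generalizes well.

I also cropped text from scanned VAT invoices to see whether the model stays stable:

A brief summary: for end-to-end variable-length text recognition, CRNN is the most classic algorithm, and in practice it works very well. As the results above show, although the training set was synthetically generated, the model recognizes text from PDF documents, scanned images and similar sources quite well. To improve recognition on a specific domain further, simply add a large number of images of that kind to the training data. The complete CRNN code is available on my GitHub.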

Category: OCR series
Previous post: [OCR Technology Series, Part 7] End-to-End Variable-Length Text Recognition: the CRNN Algorithm Explained
Posted 2019-02-01 11:44 by Madcola
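Comments

#51 明天週六了 (2019-07-11 15:10)
Is the CNN really VGG16? Why does the number of layers look a bit different?

#52 puma360 (2019-07-15 11:18)
Hello, has the data in train/test.txt already been converted to numeric ids? Thanks.

#53 puma360 (2019-07-15 15:57)
Does anyone know a way to convert the already digitized part of train.txt and test.txt back into text?

#54 puma360 (2019-07-15 16:49)
@西決英豪
Hello, when converting the dataset to lmdb, are both train/test.txt and the .jpg image set needed? Is the image set the one with 3.6 million images? Thanks~

#55 puma360 (2019-07-15 16:49)
@Keep_exercising
Hello, when converting the dataset to lmdb, are both train/test.txt and the .jpg image set needed? Is the image set the one with 3.6 million images? Thanks~

#56 puma360 (2019-07-15 16:51)
@ruz_zhang
Hello, when generating the lmdb, are both train/test.txt and the .jpg image set needed? Thanks!

#57 puma360 (2019-07-17 11:27)
@走吧！走吧
Quote: Everyone, my labels are strings with spaces, e.g. "80 33 181 758 2 662 19 94 71 73", but the validation accuracy is 0, which means my label format is wrong. So what should it look like? Chinese characters need no spaces between them, but once converted to numbers they must be separated by spaces. What is the label format?
I have the same question. The text in the training and test sets we use is already digitized: in the line after xxxx.jpg, each character is represented by a number and the numbers are separated by spaces. The statement word = line.split()[1] reads only the first number of each line into labelList. Is this handling correct?

#58 rider99 (2019-07-27 09:36)
Has anyone run into the error below?
RuntimeError: DataLoader worker (pid 2675) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

#59 杜沐清 (2019-08-14 00:17)
@寂寞的小乞丐
https://github.com/suntreeDu/easy_ocr

#60 (2019-09-05 18:27)
Congratulations to the author. I am just getting started and have read several of your articles; they all feel very professional and cover a lot of ground, and a few of them match my needs exactly, so I am impressed. Could you recommend any books or websites for beginners to fill in the basics, or to learn more about how these techniques are used in real-world solutions? Thanks again.

#61 佛科院的小鴻 (2019-09-07 15:50)
@puma36 Hello, have you resolved this question? I ran into the same problem. data_train.txt and data_text.txt contain coordinates, not text.

#62 佛科院的小鴻 (2019-09-07 21:07)
@回溯法
Hello, your share has been deleted. Could you share it again? Or could you send it to me via QQ? 531257949

#63 xxxqcb (2019-10-11 10:50)
@rider99
I ran it on Windows 10 and hit the same problem. Setting data_worker in Config to 0 fixes it.

#64 xxxqcb (2019-10-16 09:32)
@puma360
I have the same question. Once the labels are digitized they cannot be handled this way.

#65 胡笳 (2019-11-04 16:16)
@明天週六了
This is a modified VGG network. Because an RNN follows it, VGG cannot be used as-is.

#66 小灰灰超 (2019-11-05 20:31)
The decoding step only finds the single most probable path. Is there a function or library for CTC decoding?

#67 Ryansanity (2020-01-16 11:00)
Has anyone run into a problem like mine:
img = Image.open(buf).convert('L')
OSError: cannot identify image file <_io.BytesIO object at 0x000000000F3CA938>
Any advice would be appreciated. From what I found it seems to be a Pillow issue, but I found no solution.

#68 yyynu (2020-02-25 22:03)
@回溯法
Could the dataset be shared again? The link is dead.

#69 LudwigH (2020-03-30 21:39)
@Ryansanity
Have you solved this problem? I ran into it too.

#70 Heyjude1984 (2020-04-13 19:55)
Running train.py fails at this statement:
train_iter = iter(train_loader)
The error is: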
Traceback (most recent call last):
File "C:/Users/mds/Desktop/Lets_OCR-master/Lets_OCR-master/recognizer/crnn/train.py", line 155, in <module>
train_iter = iter(train_loader)
File "C:\Program Files\Python35\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Program Files\Python35\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
w.start()
File "C:\Program Files\Python35\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Program Files\Python35\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Program Files\Python35\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\Program Files\Python35\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\Program Files\Python35\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'Environment'>: attribute lookup Environment on builtins failed
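Has anyone seen this or know how to fix it? I am using a dataset I generated myself.

#71 oweiii (2020-05-15 17:48)
@LudwigH
According to an issue in the author's GitHub repository, it can be fixed like this:
def writeCache(env, cache):
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            # The image data is bytes, while the label and the key k are str; str must be converted to bytes
            if isinstance(v, bytes):
                txn.put(k.encode(), v)  # store the value under its key
            elif isinstance(v, str):
                txn.put(k.encode(), v.encode())

#72 oweiii (2020-05-15 17:50)
@yyynu
If you only run inference with the trained model, you do not need to modify the author's alphabet.py, because the correct character set turns into the author's version after being compiled, and the right characters are shown at inference time.
If you need the dataset labels, they can be found in issue #45 of the author's GitHub repository.

#73 oweiii (2020-05-15 17:53)
@佛科院的小鴻
The text can be found in issue #82 of the author's GitHub repository.

#74 精業 (2020-06-15 17:21)
@LudwigH
Hello, has this been solved?

#75 咕咕MY (2020-06-30 13:10)
Are the path and the label in the txt file separated by a space? If it is not an array or a list, indexing with [0] and [1] would not work, would it?

#76 Casablancarsq (2020-08-06 16:39)
This code has too many problems: the data loader, the evaluation, too much needs changing...

#77 Casablancarsq (2020-08-07 13:50)
@西決英豪
They are numbers, but they do not match the alphabets on GitHub at all; the dictionary and the numeric indices are scrambled.

#78 NoOne_xixi (2020-08-12 01:39)
@Ryansanity
Has this been solved?

#79 老笨啊 (2020-08-17 16:19)
A question: I used the author's create_lmdb_dataset.py to convert the data into an lmdb file, and I keep getting this error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence
How can this be solved?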