
Converting a Caffe model to PyTorch --- LSTM

I have previously converted several networks from Caffe to PyTorch:
refinenet https://www.cnblogs.com/yanghailin/p/13096258.html
refinedet https://www.cnblogs.com/yanghailin/p/12965695.html
The first extracts the Caffe weights and converts them to libtorch; the second converts the corresponding PyTorch version to libtorch, with a lot of the post-processing done in libtorch. A colleague later also managed to convert Caffe weights to libtorch directly.
Unsurprisingly, all of the above require building Caffe's Python interface. But in a typical engineering scenario we only use Caffe's C++ side, and sometimes there is no corresponding Python project, so building and calling the Python interface is a hassle.
Later I wondered why we were taking the long way around: can't we just use the Caffe C++ forward-inference project directly?
We can; it is just that the Caffe source code is complex and hard to understand at first.
In this series of posts I extract the weights directly from the Caffe C++ project, build the same network in PyTorch, fill in the Caffe weights, and then run forward inference directly.

Here is how I went about it. First I built the CPU version of Caffe with LSTM so it can be debugged in CLion: in /caffe_ocr/tools/caffe.cpp I deleted the original contents of caffe.cpp and replaced them with code that runs LSTM forward inference, then built that Caffe source.
After that I could set breakpoints and step through.

The Caffe source code is a highly abstracted project: Layer is the base class, and every other operator module is derived from it.
The Net class is very important; it manages and orchestrates the whole network. From Net you can get every intermediate feature map and the weights of every layer.
Since my goal is to convert the LSTM to PyTorch, getting a clear picture of how the LSTM operator is implemented is crucial. The first look told me nothing; the second left me stunned. The LSTM implementation is really complex: internally it builds a Net of its own! A bidirectional LSTM builds two Nets, derived from the RecurrentLayer class.
As for how an LSTM works, it boils down to the six standard equations; these two posts cover them well (the equations are also written out right after the links):

https://www.jianshu.com/p/9dc9f41f0b29
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
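For reference, these are the six equations of the standard LSTM cell (my notation; sigma is the sigmoid):

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}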




This post does not go into the Caffe source or the concrete LSTM implementation in detail; I may write a separate post on that later.
This post focuses on extracting each layer's weights from the caffemodel. A weight is usually a large matrix, e.g. [64,3,7,7], and these weights need to be saved somewhere Python can read them.
At first I looked for something in C++ as convenient for matrices as NumPy is in Python. I considered JSON, XML, or using Caffe's own Blob class directly, but I did not know how; Caffe's proto would probably work too, but again I did not know how to use it.
So I went with the most direct approach: write the weights to a local txt file, one value per line. The file is named after the layer, so if the layer is called conv1 the files are conv1_weight_0.txt, conv1_weight_1.txt. The first line holds the shape, e.g. 64,3,7,7.
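For example, conv1_weight_0.txt would then look something like this (the numbers below are placeholders, not real weights):

64,3,7,7
0.0127
-0.0412
0.0009
...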
Since the weights live in Blobs, I added a function to the Blob source that saves the Blob's data to a local txt file; you only need to supply the output path. As follows:

void save_data_to_txt(const string path_txt,bool b_save_shape = true)
  {
    std::ofstream fOut(path_txt);
    if (!fOut)
    {
      std::cout << "Open output file failed." << std::endl;
      return;
    }
    // Optionally write the shape as the first line, comma separated, e.g. 64,3,7,7
    if(b_save_shape)
    {
      for(int i=0;i<shape_.size();i++)
      {
        fOut << shape_[i];
        if(i == shape_.size()-1)
        {
          fOut<<std::endl;
        }else
        {
          fOut<<",";
        }
      }
    }

    // Then write the blob data, one value per line
    const Dtype* data_vec = cpu_data();
    for (int i = 0; i < count_; ++i) {
      fOut << data_vec[i] << std::endl;
    }
    fOut.close();
  }

Below is my code that saves every layer's weights to txt:

 std::cout<<"\n\n\n\n============2021-11-18======================================="<<std::endl;
      shared_ptr<Net<float> > net_ = classifier.get_net(); // get the Net from the class that runs forward inference
      vector<shared_ptr<Layer<float> > >  layers = net_->layers(); // pointers to every Layer operator
      vector<shared_ptr<Blob<float> > > params = net_->params(); // pointers to all weight blobs
      vector<vector<Blob<float>*> > bottom_vecs_ = net_->bottom_vecs(); // all bottom feature maps
      vector<vector<Blob<float>*> > top_vecs_ = net_->top_vecs(); // all top feature maps; note that layers, bottom_vecs_ and top_vecs_ correspond one-to-one
      std::cout<<"size layer=" << layers.size()<<std::endl;
      std::cout<<"size params=" << params.size()<<std::endl;
      string path_save_dir = "/data_1/Yang/project/save_weight/";

      for(int i=0;i<layers.size();i++)
      {
          shared_ptr<Layer<float> > layer = layers[i];
          string name_layer = layer->layer_param().name(); // name of the current layer
          std::cout<<i<<"   layer_name="<<name_layer<<"    type="<<layer->layer_param().type()<<std::endl;
          int bottom_name_size = layer->layer_param().bottom().size();
          std::cout<<"=================bottom================"<<std::endl;
          if(bottom_name_size>0)
          {
              for(int ii=0;ii<bottom_name_size;ii++)
              {
                  std::cout<<ii<<" ::bottom name="<<layer->layer_param().bottom(ii)<<std::endl;
                  Blob<float>* ptr_blob = bottom_vecs_[i][ii];
                  std::cout<<"bottom shape="<<ptr_blob->shape_string()<<std::endl;
              }
          } else{
              std::cout<<"no bottom"<<std::endl;
          }
          std::cout<<"=================top================"<<std::endl;
          int top_name_size = layer->layer_param().top().size();
          if(top_name_size>0)
          {
              for(int ii=0;ii<top_name_size;ii++)
              {
                  std::cout<<ii<<" ::top name="<<layer->layer_param().top(ii)<<std::endl;
                  Blob<float>* ptr_blob = top_vecs_[i][ii];
                  std::cout<<"top shape="<<ptr_blob->shape_string()<<std::endl;
              }
          } else{
              std::cout<<"no top"<<std::endl;
          }


          vector<shared_ptr<Blob<float> > > params = layer->blobs();
          std::cout<<"=================params ================"<<std::endl;
          std::cout<<"params size= "<<params.size()<<std::endl;
          if(0 == params.size())
          {
              std::cout<<"has no params"<<std::endl;
          } else
          {
              for(int j=0;j<params.size();j++)
              {
                  std::cout<<"params_"<<j<<" shape="<<params[j]->shape_string()<<std::endl;

                  params[j]->save_data_to_txt(path_save_dir + name_layer + "_weight_" + std::to_string(j)+".txt");
              }
          }
          std::cout<<std::endl;
      }


      // To check whether a given layer's output matches between Caffe and PyTorch, first save Caffe's feature map for that layer.
      string name_aim_top = "premuted_fc";
      const shared_ptr<Blob<float>> feature_map = net_->blob_by_name(name_aim_top);
      bool b_save_shape = false;
      std::cout<<"featuremap shape="<<std::endl;
      std::cout<<feature_map->shape_string()<<std::endl;
      feature_map->save_data_to_txt("/data_1/Yang/project/myfile/blob_val/"+name_aim_top+".txt",b_save_shape);

To inspect a Caffe network you can simply paste the prototxt file into this web page:
http://ethereon.github.io/netscope/quickstart.html
It is much more intuitive to look at it this way.

One thing that needs special attention here is in-place operations. For example, conv1, conv1_bn, conv1_scale and conv1_relu are chained together in the graph, and because their bottom and top share the same name, each layer's result directly overwrites its bottom: they share the same memory.
This is a real trap. A colleague doing similar work was comparing accuracy across frameworks and found that the very first layers already disagreed. He hunted for the problem for a week without success; when he finally asked me to take a look, it took me the better part of a day to realize the in-place operations were the cause: you simply cannot fetch the conv1 feature map on its own, because what you actually get is the result after all four operations conv1, conv1_bn, conv1_scale and conv1_relu! A tiny PyTorch illustration of the same effect follows.
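The same aliasing happens with in-place ops in PyTorch, so here is a minimal, purely illustrative sketch of the effect (PyTorch side, not Caffe):

import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 2.0, -3.0])
pre_relu = x                 # just another reference to the same storage, not a copy
F.relu(x, inplace=True)      # overwrites x in place, like Caffe layers sharing bottom/top
print(pre_relu)              # tensor([0., 2., 0.]) -- the pre-ReLU values are gone

If you really need the intermediate value, clone it (pre_relu = x.clone()) or give the layers distinct top names in the prototxt.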

With the above, every layer's weights get dumped. If a layer has several weight blobs they are distinguished by the counter 0, 1, 2 at the end of the filename, i.e. layerName_weight_cnt.txt. The first line of each txt file is the weight's shape, e.g. 64,64,1,1.

Once that is done, on the Python side I first wrote a script that reads the txt files and stores the weights in a dictionary.

import os
import numpy as np

# this class exists so we can assign into nested dictionaries in one go
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value


def get_weight_numpy(path_dir):
    out = AutoVivification()
    list_txt = os.listdir(path_dir)
    for cnt,txt in enumerate(list_txt):
        print(cnt, "  ", txt)
        txt_ = txt.replace(".txt","")
        layer_name, idx = txt_.split("_weight_")
        path_txt = path_dir + txt
        with open(path_txt, 'r') as fr:
            lines = fr.readlines()
            data = []
            shape_line = []
            for cnt_1, line in enumerate(lines):
                if(0 == cnt_1):
                    shape_line = []
                    shape_line = line.strip().split(",")
                else:
                    data.append(float(line))

            shape_line = [int(x) for x in shape_line]
            data = np.array(data).reshape(shape_line)
            # new_dict = {}
            out[layer_name][int(idx)] = data

    return out

if __name__ == "__main__":
    path_dir = "/data_1/Yang/project/save_weight/"
    out = get_weight_numpy(path_dir)
    conv1_weight = out['conv1'][0]
    conv1_bias = out['conv1'][1]

Next, the code that stuffs the saved Caffe weights into the PyTorch network I built:

# coding=utf-8
import torch
import torchvision
from torch import nn
import torch.nn.functional as F

import cv2
import numpy as np
from weight_numpy import get_weight_numpy



class lstm_general(nn.Module):  # SfSNet = PS-Net in SfSNet_deploy.prototxt
    def __init__(self):
        super(lstm_general, self).__init__()
        # self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
        self.data_bn = nn.BatchNorm2d(3)
        self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
        self.conv1_bn = nn.BatchNorm2d(64)

        self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias = False)
        self.layer_64_1_bn2 = nn.BatchNorm2d(64)

        self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
        self.layer_64_1_bn3 = nn.BatchNorm2d(64)

        self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)

        self.layer_128_1_bn1 = nn.BatchNorm2d(256)

        self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
        self.layer_128_1_bn2 = nn.BatchNorm2d(128)

        self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
        self.layer_128_1_bn3 = nn.BatchNorm2d(128)

        self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
        self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)

        self.last_bn = nn.BatchNorm2d(512)


        # self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
        self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)

        self.fc1x1_r2_v2_a = nn.Linear(200,7118)


    def forward(self, inputs):
        # x = F.relu(self.bn1_1(self.conv1_1(inputs)))
        x = self.data_bn(inputs)
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = self.conv1_pool(x) #[1,64,8,80]

        x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x)))  # 1 64 8 80
        layer_64_1_conv1 = x

        x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))

        x = self.layer_64_1_conv3(x)

        layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
        layer_64_3_sum = x + layer_64_1_conv_expand  #1 256 8 80

        x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
        layer_128_1_bn1 = x

        x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
        x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
        x = self.layer_128_1_conv3(x) #1, 512, 8, 80
        layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1)  #1, 512, 8, 80
        layer_128_4_sum = x + layer_128_1_conv_expand

        x = F.relu(self.last_bn(layer_128_4_sum))
        x = F.dropout(x, p=0.7, training=False) #1 512 8 80
        x = x.permute(3,0,1,2) # 80 1 512 8
        x = x.reshape(80,1,512*8)
        #
        # merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)

        lstm_out,(_,_) = self.lstm_lr(x) #(80,1,200)
        out = self.fc1x1_r2_v2_a(lstm_out) #(80,1,7118)

        return out



def save_tensor(tensor_in,path_save):
    tensor_in = tensor_in.contiguous().view(-1,1)
    np_tensor = tensor_in.cpu().detach().numpy()
    # np_tensor = np_tensor.view()
    np.savetxt(path_save,np_tensor,fmt='%.12e')



def access_pixels(frame):
    print(frame.shape)  # shape has three elements: height, width, channels, in that order
    height = frame.shape[0]
    weight = frame.shape[1]
    channels = frame.shape[2]
    print("weight : %s, height : %s, channel : %s" % (weight, height, channels))

    with open("/data_1/Yang/project/myfile/blob_val/img_stand_python.txt", "w") as fw:
        for row in range(height):  # iterate over rows
            for col in range(weight):  # iterate over columns
                for c in range(channels):  # iterate over channels
                    pv = frame[row, col, c]
                    fw.write(str(int(pv)))
                    fw.write("\n")




def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
    img_h, img_w, _ = img.shape
    if img_h < 2 or img_w < 2:
        return
    # if 32 == img_h and 320 == img_w:
    #     return img

    ratio_now = img_w * 1.0 / img_h
    if ratio_now <= ratio:
        mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
        mask[0:img_h,0:img_w,:] = img
    else:
        mask = np.ones((int(img_w*1.0/ratio), img_w, 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img

    mask_stand = cv2.resize(mask,(stand_w, stand_h),interpolation=cv2.INTER_LINEAR)

    # access_pixels(mask_stand)
    return mask_stand




if __name__ == '__main__':

    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

    net = lstm_general()
    # net.eval()

    index = 0
    print("*" * 50)
    for name, param in list(net.named_parameters()):
        print(str(index) + ':', name, param.size())
        index += 1
    print("*" * 50)

    ## once the network is built, this shows the parameter names it expects
    for k, v in net.state_dict().items():
        print(k)
        print(v.shape)

        # print(k,v)
    print("@" * 50)

    # aaa = np.zeros((400,1))






    path_dir = "/data_1/Yang/project/OCR/3rdlib/caffe_ocr_2021/myfile/save_weight/"
    weight_numpy_dict = get_weight_numpy(path_dir)
    from torch import from_numpy
    state_dict = {}
    state_dict['data_bn.running_mean'] = from_numpy(weight_numpy_dict["data_bn"][0] / weight_numpy_dict["data_bn"][2])
    state_dict['data_bn.running_var'] = from_numpy(weight_numpy_dict["data_bn"][1] / weight_numpy_dict["data_bn"][2])
    state_dict['data_bn.weight'] = from_numpy(weight_numpy_dict['data_scale'][0])
    state_dict['data_bn.bias'] = from_numpy(weight_numpy_dict['data_scale'][1])

    state_dict['conv1.weight'] = from_numpy(weight_numpy_dict['conv1'][0])
    state_dict['conv1.bias'] = from_numpy(weight_numpy_dict['conv1'][1])
    state_dict['conv1_bn.running_mean'] = from_numpy(weight_numpy_dict["conv1_bn"][0] / weight_numpy_dict["conv1_bn"][2])
    state_dict['conv1_bn.running_var'] = from_numpy(weight_numpy_dict["conv1_bn"][1] / weight_numpy_dict["conv1_bn"][2])
    state_dict['conv1_bn.weight'] = from_numpy(weight_numpy_dict['conv1_scale'][0])
    state_dict['conv1_bn.bias'] = from_numpy(weight_numpy_dict['conv1_scale'][1])

    state_dict['layer_64_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv1'][0])
    state_dict['layer_64_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][0] / weight_numpy_dict["layer_64_1_bn2"][2])
    state_dict['layer_64_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][1] / weight_numpy_dict["layer_64_1_bn2"][2])
    state_dict['layer_64_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][0])
    state_dict['layer_64_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][1])


    state_dict['layer_64_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv2'][0])
    state_dict['layer_64_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][0] / weight_numpy_dict["layer_64_1_bn3"][2])
    state_dict['layer_64_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][1] / weight_numpy_dict["layer_64_1_bn3"][2])
    state_dict['layer_64_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][0])
    state_dict['layer_64_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][1])

    state_dict['layer_64_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv3'][0])
    state_dict['layer_64_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv_expand'][0])

    state_dict['layer_128_1_bn1.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][0] / weight_numpy_dict["layer_128_1_bn1"][2])
    state_dict['layer_128_1_bn1.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][1] / weight_numpy_dict["layer_128_1_bn1"][2])
    state_dict['layer_128_1_bn1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][0])
    state_dict['layer_128_1_bn1.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][1])

    state_dict['layer_128_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv1'][0])
    state_dict['layer_128_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][0] / weight_numpy_dict["layer_128_1_bn2"][2])
    state_dict['layer_128_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][1] / weight_numpy_dict["layer_128_1_bn2"][2])
    state_dict['layer_128_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][0])
    state_dict['layer_128_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][1])

    state_dict['layer_128_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv2'][0])
    state_dict['layer_128_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][0] / weight_numpy_dict["layer_128_1_bn3"][2])
    state_dict['layer_128_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][1] / weight_numpy_dict["layer_128_1_bn3"][2])
    state_dict['layer_128_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][0])
    state_dict['layer_128_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][1])

    state_dict['layer_128_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv3'][0])
    state_dict['layer_128_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv_expand'][0])

    state_dict['last_bn.running_mean'] = from_numpy(weight_numpy_dict["last_bn"][0] / weight_numpy_dict["last_bn"][2])
    state_dict['last_bn.running_var'] = from_numpy(weight_numpy_dict["last_bn"][1] / weight_numpy_dict["last_bn"][2])
    state_dict['last_bn.weight'] = from_numpy(weight_numpy_dict['last_scale'][0])
    state_dict['last_bn.bias'] = from_numpy(weight_numpy_dict['last_scale'][1])

    ## caffe i f o g
    ## pytorch i f g o

    ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0])  # [400,4096]
    ww_200_if = ww[:200,:] #[200,4096]
    ww_100_o = ww[200:300,:] #[100,4096]
    ww_100_g = ww[300:400,:]#[100,4096]
    ww_cat_ifgo = torch.cat((ww_200_if,ww_100_g,ww_100_o),0)
    state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo

    bb = from_numpy(weight_numpy_dict['lstm1x_r2'][1])  # [400]
    bb_200_if = bb[:200]
    bb_100_o = bb[200:300]
    bb_100_g = bb[300:400]
    bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
    state_dict['lstm_lr.bias_ih_l0'] = bb_cat_ifgo

    ww = from_numpy(weight_numpy_dict['lstm1x_r2'][2])  # [400,100]
    ww_200_if = ww[:200, :]  # [200,100]
    ww_100_o = ww[200:300, :]  # [100,100]
    ww_100_g = ww[300:400, :]  # [100,100]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_hh_l0'] = ww_cat_ifgo

    state_dict['lstm_lr.bias_hh_l0'] = from_numpy(np.zeros((400)))

    ##########################################
    ww = from_numpy(weight_numpy_dict['lstm2x_r2'][0])  # [400,4096]
    ww_200_if = ww[:200, :]  # [200,4096]
    ww_100_o = ww[200:300, :]  # [100,4096]
    ww_100_g = ww[300:400, :]  # [100,4096]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_ih_l0_reverse'] = ww_cat_ifgo

    bb = from_numpy(weight_numpy_dict['lstm2x_r2'][1])  # [400]
    bb_200_if = bb[:200]
    bb_100_o = bb[200:300]
    bb_100_g = bb[300:400]
    bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
    state_dict['lstm_lr.bias_ih_l0_reverse'] = bb_cat_ifgo

    ww = from_numpy(weight_numpy_dict['lstm2x_r2'][2])  # [400,100]
    ww_200_if = ww[:200, :]  # [200,100]
    ww_100_o = ww[200:300, :]  # [100,100]
    ww_100_g = ww[300:400, :]  # [100,100]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_hh_l0_reverse'] = ww_cat_ifgo

    state_dict['lstm_lr.bias_hh_l0_reverse'] = from_numpy(np.zeros((400)))

    state_dict['fc1x1_r2_v2_a.weight'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][0])
    state_dict['fc1x1_r2_v2_a.bias'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][1])



    ####input########################################
    path_img = "/data_2/project/1.jpg"
    img = cv2.imread(path_img)
    # access_pixels(img)

    img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)


    img_stand = img_stand.astype(np.float32)
    # img = (img / 255. - config.DATASET.MEAN) / config.DATASET.STD
    img_stand = img_stand.transpose([2, 0, 1])
    img_stand = img_stand[None,:,:,:]
    img_stand = torch.from_numpy(img_stand)

    img_stand = img_stand.type(torch.FloatTensor)

    img_stand = img_stand.to(device)
    # img_stand = img_stand.view(1, *img.size())



    #######net##########################
    net.load_state_dict(state_dict)
    net.cuda()
    net.eval()

    preds = net(img_stand)
    print("out shape=",preds.shape)


    torch.save(net.state_dict(), './lstm_model.pth')



    # name_top_caffe_layer = "fc1x_a"  #"merge_lstm_rlstmx"  #"#"data_bn"
    # path_save = "/data_1/Yang/project/myfile/blob_val/" + name_top_caffe_layer + "_torch.txt"
    # save_tensor(preds, path_save)


    aaa = 0

Note that a Caffe BN layer has three parameter blobs: the first two are the accumulated mean and variance, and the third is a scale factor. Both the mean and the variance have to be divided by that factor; in this model it is the fixed value 999.982.

The Caffe Scale layer then supplies the affine coefficients (gamma and beta), which map onto the weight and bias of PyTorch's BatchNorm.
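A minimal sketch of that folding, assuming weight_numpy_dict is the dictionary produced by get_weight_numpy and the layer names follow the prototxt used in this post (e.g. "conv1_bn" / "conv1_scale"):

from torch import from_numpy

def fold_caffe_bn(state_dict, bn_name, scale_name, torch_bn_name, weight_numpy_dict):
    # Caffe BatchNorm blobs: [0] mean accumulator, [1] variance accumulator, [2] scale factor
    factor = weight_numpy_dict[bn_name][2]
    state_dict[torch_bn_name + '.running_mean'] = from_numpy(weight_numpy_dict[bn_name][0] / factor)
    state_dict[torch_bn_name + '.running_var'] = from_numpy(weight_numpy_dict[bn_name][1] / factor)
    # Caffe Scale blobs: [0] gamma -> BatchNorm.weight, [1] beta -> BatchNorm.bias
    state_dict[torch_bn_name + '.weight'] = from_numpy(weight_numpy_dict[scale_name][0])
    state_dict[torch_bn_name + '.bias'] = from_numpy(weight_numpy_dict[scale_name][1])

# e.g. fold_caffe_bn(state_dict, "conv1_bn", "conv1_scale", "conv1_bn", weight_numpy_dict)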

A few more words about the LSTM operator. In Caffe the time_step is set to 80 and the hidden size to 100, and the feature map fed into the LSTM has shape 80,1,512,8.
Looking at the layer's weights, the LSTM has 3 parameter blobs of sizes [400,4096], [400] and [400,100].
Reading the source shows that the trainable parts are two inner-product (fully connected) layers. [400,4096] and [400] are the weight and bias of the inner product applied to the input; 400 comes from 100*4. Why 4? That follows from the LSTM equations: there are four groups of products involving h and x (the i, f, o, g gates).
[400,100] is the weight of the inner product applied to the hidden state h; the shapes are summarized below.
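In other words, with hidden size H = 100 and input size D = 512*8 = 4096, the gate pre-activations are computed with stacked weights (my notation; Caffe stacks the gates in i, f, o, g order, which matters later):

\begin{aligned}
\begin{bmatrix} \hat{i}_t \\ \hat{f}_t \\ \hat{o}_t \\ \hat{g}_t \end{bmatrix}
&= W_{x}\, x_t + b + W_{h}\, h_{t-1}, \\
W_{x} &\in \mathbb{R}^{4H \times D} = [400, 4096], \qquad
b \in \mathbb{R}^{4H} = [400], \qquad
W_{h} \in \mathbb{R}^{4H \times H} = [400, 100].
\end{aligned}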
See the PyTorch manual's description of nn.LSTM and its parameters:
https://pytorch.org/docs/1.0.1/nn.html?highlight=lstm#torch.nn.LSTM



Then, based on those parameters, I wrote a standalone test of the LSTM operator to see what it looks like:

import  torch
import torch.nn as nn



# rnn = nn.LSTM(512*8, 100, 1, False)
# input = torch.randn(80, 1, 512*8)
#
# output, (hn, cn) = rnn(input)
#
#
# for name,parameters in rnn.named_parameters():
#   print(name,':',parameters.size())
#   # parm[name]=parameters.detach().numpy()
#
# aa = 0


rnn = nn.LSTM(512*8, 100, 1, bidirectional=True)
input = torch.randn(80, 1, 512*8)

output, (hn, cn) = rnn(input)
print("out shape=",output.shape)

for name,parameters in rnn.named_parameters():
  print(name,':',parameters.size())
  # parm[name]=parameters.detach().numpy()

aa = 0

The output is:

('out shape=', (80, 1, 200))
('weight_ih_l0', ':', (400, 4096))
('weight_hh_l0', ':', (400, 100))
('bias_ih_l0', ':', (400,))
('bias_hh_l0', ':', (400,))
('weight_ih_l0_reverse', ':', (400, 4096))
('weight_hh_l0_reverse', ':', (400, 100))
('bias_ih_l0_reverse', ':', (400,))
('bias_hh_l0_reverse', ':', (400,))

Process finished with exit code 0

You can see that the parameters PyTorch's LSTM needs essentially match Caffe's, except that a Caffe LSTM has 3 parameter blobs while PyTorch's has 4. Evidently Caffe's hidden-state inner product has no bias, so it is enough to set one of PyTorch's biases to zero!

Things did not go smoothly, however. The code shown above is the successful version, but before reaching it I had stuffed in all the parameters and the accuracy was wrong. Only after reading the LSTM source carefully did I find Caffe's computation order, in lstm_unit_layer.cpp:

template <typename Dtype>
void LSTMUnitLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const int num = bottom[0]->shape(1);//1
  const int x_dim = hidden_dim_ * 4;
  const Dtype* C_prev = bottom[0]->cpu_data();
  const Dtype* X = bottom[1]->cpu_data();
  const Dtype* cont = bottom[2]->cpu_data();
  Dtype* C = top[0]->mutable_cpu_data();
  Dtype* H = top[1]->mutable_cpu_data();
  for (int n = 0; n < num; ++n) { //1
    for (int d = 0; d < hidden_dim_; ++d) {//100
      const Dtype i = sigmoid(X[d]);
      const Dtype f = (*cont == 0) ? 0 :
          (*cont * sigmoid(X[1 * hidden_dim_ + d]));
      const Dtype o = sigmoid(X[2 * hidden_dim_ + d]);
      const Dtype g = tanh(X[3 * hidden_dim_ + d]);
      const Dtype c_prev = C_prev[d];
      const Dtype c = f * c_prev + i * g;
      C[d] = c;
      const Dtype tanh_c = tanh(c);
      H[d] = o * tanh_c;
    }
    C_prev += hidden_dim_;
    X += x_dim;
    C += hidden_dim_;
    H += hidden_dim_;
    ++cont;
  }
}

So Caffe's computation order is i, f, o, g.
The PyTorch documentation describes the weight order as:

weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)

That is i, f, g, o -- slightly different. So all I had to do was reorder the Caffe weights to match PyTorch and try again, which is how the code above came about:

 ## caffe i f o g
    ## pytorch i f g o

    ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0])  # [400,4096]
    ww_200_if = ww[:200,:] #[200,4096]
    ww_100_o = ww[200:300,:] #[100,4096]
    ww_100_g = ww[300:400,:]#[100,4096]
    ww_cat_ifgo = torch.cat((ww_200_if,ww_100_g,ww_100_o),0)
    state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo

With that change it worked: the accuracy matches!! My approach for checking accuracy across frameworks is described here:
https://www.cnblogs.com/yanghailin/p/15593614.html
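As a minimal sketch of that kind of check, assuming both sides were dumped one value per line as in this post (no shape line), something like this compares a Caffe feature-map dump against the PyTorch one; the file paths are only examples:

import numpy as np

caffe_txt = "/data_1/Yang/project/myfile/blob_val/premuted_fc.txt"        # written by save_data_to_txt(..., false)
torch_txt = "/data_1/Yang/project/myfile/blob_val/premuted_fc_torch.txt"  # written by save_tensor()

a = np.loadtxt(caffe_txt).reshape(-1)
b = np.loadtxt(torch_txt).reshape(-1)
assert a.shape == b.shape, (a.shape, b.shape)
diff = np.abs(a - b)
print("max diff =", diff.max(), "  mean diff =", diff.mean())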
And here is the code I used to produce the recognition result:

# -*- coding: utf-8
import torch
from torch import nn
import torch.nn.functional as F

import cv2
import numpy as np
import os

from chn_tab import chn_tab



class lstm_general(nn.Module):  # SfSNet = PS-Net in SfSNet_deploy.prototxt
    def __init__(self):
        super(lstm_general, self).__init__()
        # self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
        self.data_bn = nn.BatchNorm2d(3)
        self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
        self.conv1_bn = nn.BatchNorm2d(64)

        self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias = False)
        self.layer_64_1_bn2 = nn.BatchNorm2d(64)

        self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
        self.layer_64_1_bn3 = nn.BatchNorm2d(64)

        self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)

        self.layer_128_1_bn1 = nn.BatchNorm2d(256)

        self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
        self.layer_128_1_bn2 = nn.BatchNorm2d(128)

        self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
        self.layer_128_1_bn3 = nn.BatchNorm2d(128)

        self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
        self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)

        self.last_bn = nn.BatchNorm2d(512)






        # self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
        self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)



        self.fc1x1_r2_v2_a = nn.Linear(200,7118)


    def forward(self, inputs):
        # x = F.relu(self.bn1_1(self.conv1_1(inputs)))
        x = self.data_bn(inputs)
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = self.conv1_pool(x) #[1,64,8,80]

        x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x)))  # 1 64 8 80
        layer_64_1_conv1 = x

        x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))

        x = self.layer_64_1_conv3(x)

        layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
        layer_64_3_sum = x + layer_64_1_conv_expand  #1 256 8 80

        x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
        layer_128_1_bn1 = x

        x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
        x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
        x = self.layer_128_1_conv3(x) #1, 512, 8, 80
        layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1)  #1, 512, 8, 80
        layer_128_4_sum = x + layer_128_1_conv_expand

        x = F.relu(self.last_bn(layer_128_4_sum))###acc ok

        x = F.dropout(x, p=0.7, training=False) #1 512 8 80
        x = x.permute(3,0,1,2) # 80 1 512 8
        x = x.reshape(80,1,512*8)###acc ok


        #
        # merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)

        lstm_out,(_,_) = self.lstm_lr(x) #(80,1,200)

        # return lstm_out  # debugging leftover: returning here would skip the final FC layer and break the decoding below

        out = self.fc1x1_r2_v2_a(lstm_out) #(80,1,7118)

        return out


def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
    img_h, img_w, _ = img.shape
    if img_h < 2 or img_w < 2:
        return
    # if 32 == img_h and 320 == img_w:
    #     return img

    ratio_now = img_w * 1.0 / img_h
    if ratio_now <= ratio:
        mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
        mask[0:img_h,0:img_w,:] = img
    else:
        mask = np.ones((int(img_w*1.0/ratio), img_w, 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img

    mask_stand = cv2.resize(mask,(stand_w, stand_h),interpolation=cv2.INTER_LINEAR)

    # access_pixels(mask_stand)
    return mask_stand




if __name__ == '__main__':
    path_model = "/data_1/everyday/1118/pytorch_lstm_test/lstm_model.pth"
    path_img = "/data_2/project_202009/chejian/test_data/model_test/rec_general/1.jpg"
    blank_label = 7117
    prev_label = blank_label


    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')

    img = cv2.imread(path_img)
    img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)
    img_stand = img_stand.astype(np.float32)
    img_stand = img_stand.transpose([2, 0, 1])
    img_stand = img_stand[None, :, :, :]
    img_stand = torch.from_numpy(img_stand)
    img_stand = img_stand.type(torch.FloatTensor)
    img_stand = img_stand.to(device)

    net = lstm_general()
    checkpoint = torch.load(path_model)
    net.load_state_dict(checkpoint)
    net.cuda()
    net.eval()

    # traced_script_module = torch.jit.trace(net, img_stand)
    # traced_script_module.save("./lstm.pt")

    preds = net(img_stand)
    # print("out shape=", preds.shape)

    preds_1 = preds.squeeze()
    # print("preds_1 out shape=", preds_1.shape)
    val, pos = torch.max(preds_1,1)
    pos = pos.cpu().numpy()


    rec = ""
    for predict_label in pos:
        if predict_label != blank_label and predict_label != prev_label:
            # print("predict_label=",predict_label)
            print(chn_tab[predict_label])
            rec += chn_tab[predict_label]
        prev_label = predict_label


    # print("rec=",rec)
    print(rec)

It worked -- but the joy only lasted a day.

My ultimate goal is to run this under C++, so the next step was converting to libtorch. I thought that would be trivial, but it was not so simple.
I found that in my libtorch code the values stop matching right after the LSTM layer, even though everything before it matches. No solution!!!
It may be version related, because with a newer libtorch I had previously converted a CRNN successfully without problems:
https://github.com/wuzuowuyou/crnn_libtorch
That one was on PyTorch 1.7, whereas here I am on 1.0. I tried for a long time and the accuracy is still wrong; I cannot solve it and do not even know where to start. I went through the issues on the PyTorch GitHub and nobody seems to have hit the same problem... short of digging through the PyTorch source, which is too hard.
I filed an issue on the PyTorch GitHub:
https://github.com/pytorch/pytorch/issues/68864
I know it will probably sink without a trace anyway.

Below is my messy, unfinished code:

#include <torch/script.h> // One-stop header.
#include "torch/torch.h"
#include "torch/jit.h"
#include <memory>
#include "opencv2/opencv.hpp"
#include <queue>

#include <dirent.h>
#include <iostream>
#include <cstdlib>
#include <cstring>

#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;

// cv::Mat m_stand;

#define TABLE_SIZE 7117
static string chn_tab[TABLE_SIZE+1] = {"啊","阿","埃"

                                        。。。
                                        。。。
                                        。。。
                                       "0","1","2","3","4","5","6","7","8","9",
                                       ":",";","<","=",">","?","@",
                                       "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                                       "[","\\","]","^","_","`",
                                       "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                                       "{","|","}","~",
                                       " "};

bool LstmImgStandardization_src_1(const cv::Mat &src, const float &ratio, int standard_w, int standard_h, cv::Mat &dst)
{
    if(src.empty())return false;
    float width=src.cols;
    float height=src.rows;
    float  a=width/ height;

    if(a <=ratio)
    {
        Mat mask(height, ratio*height, CV_8UC3, cv::Scalar(255, 255, 255));
        Mat imageROI = mask(Rect(0, 0, width, height));
        src.copyTo(imageROI);
        dst=mask.clone();
    }
    else
    {
        Mat mask(width/ratio, width, CV_8UC3, cv::Scalar(255, 255, 255));
        Mat imageROI = mask(Rect(0, 0, width, height));
        src.copyTo(imageROI);
        dst=mask.clone();
    }

    //cv::resize(dst, dst, cv::Size(standard_w,standard_h));
    cv::resize(dst, dst, cv::Size(standard_w,standard_h),0,0,cv::INTER_AREA);
    return true;
}

bool lstm_img_standardization(cv::Mat src, cv::Mat &dst,float ratio)
{
    if(src.empty())return false;
    double width=src.cols;
    double height=src.rows;
    double a=width/height;

    if(a <=ratio)//6
    {
        Mat mask(height, ratio*height, CV_8UC3, Scalar(255, 255, 255));
        Mat imageROI = mask(Rect(0, 0, width, height));
        src.copyTo(imageROI);
        dst=mask.clone();
    }
    else
    {
        Mat mask(width/ratio, width, CV_8UC3, Scalar(255, 255, 255));
        Mat imageROI = mask(Rect(0, 0, width, height));
        src.copyTo(imageROI);
        dst=mask.clone();
    }

//    cv::resize(dst, dst, cv::Size(360,60));
    cv::resize(dst, dst, cv::Size(320,32));

    return true;
}

//torch::Tensor pre_img(cv::Mat &img)
//{
//    cv::Mat m_stand;
//    float ratio = 10.0;
//    if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
//    lstm_img_standardization(img, m_stand, ratio);
//
//    std::vector<int64_t> sizes = {m_stand.rows, m_stand.cols, m_stand.channels()};
//    torch::TensorOptions options = torch::TensorOptions().dtype(torch::kByte);
//    torch::Tensor tensor_image = torch::from_blob(m_stand.data, torch::IntList(sizes), options);
//    // Permute tensor, shape is (C, H, W)
//    tensor_image = tensor_image.permute({2, 0, 1});
//
//
//    // Convert tensor dtype to float32, and range from [0, 255] to [0, 1]
//    tensor_image = tensor_image.toType(torch::ScalarType::Float);
//
//
////    tensor_image = tensor_image.div_(255.0);
////    // Subtract mean value
////    for (int i = 0; i < std::min<int64_t>(v_mean.size(), tensor_image.size(0)); i++) {
////        tensor_image[i] = tensor_image[i].sub_(v_mean[i]);
////    }
////    // Divide by std value
////    for (int i = 0; i < std::min<int64_t>(v_std.size(), tensor_image.size(0)); i++) {
////        tensor_image[i] = tensor_image[i].div_(v_std[i]);
////    }
//    //[c,h,w]  -->  [1,c,h,w]
//    tensor_image.unsqueeze_(0);
//    std::cout<<tensor_image;
//    return tensor_image;
//}



bool pre_img(cv::Mat &img, torch::Tensor &input_tensor)
{
    static cv::Mat m_stand;
    float ratio = 10.0;
//    if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
    lstm_img_standardization(img, m_stand, ratio);
    m_stand.convertTo(m_stand, CV_32FC3);


//    imshow("m_stand",m_stand);
//    waitKey(0);

//    Mat m_stand_new;
//        m_stand.convertTo(m_stand_new, CV_32FC3);

//        int rowNumber = m_stand_new.rows;  // number of rows
//        int colNumber = m_stand_new.cols*m_stand_new.channels();  // columns x channels = elements per row
//        std::ofstream out_file("/data_1/everyday/1123/img_acc/after_CV_32FC3-float-111.txt");
//        // double loop over all pixel values
//        for (int i = 0; i < rowNumber; i++)  // row loop
//        {
//            uchar *data = m_stand_new.ptr<uchar>(i);  // pointer to the start of row i
//            for (int j = 0; j < colNumber; j++)   // column loop
//            {
//                // --------- process each pixel ---------
//                int pix = int(data[j]);
//                out_file << pix << std::endl;
//            }
//        }
//
//        out_file.close();
//        std::cout<<"==m_stand.convertTo(m_stand, CV_32FC3);=="<<std::endl;
//        while(1);




    int stand_row = m_stand.rows;
    int stand_cols = m_stand.cols;

    input_tensor = torch::from_blob(
            m_stand.data, {stand_row, stand_cols, 3}).toType(torch::kFloat);
    input_tensor = input_tensor.permute({2,0,1});
    input_tensor = input_tensor.unsqueeze(0);//.to(torch::kFloat);

//    std::cout<<input_tensor;
    return true;
}



void GetFileInDir(string dirName, vector<string> &v_path)
{
    DIR* Dir = NULL;
    struct dirent* file = NULL;
    if (dirName[dirName.size()-1] != '/')
    {
        dirName += "/";
    }
    if ((Dir = opendir(dirName.c_str())) == NULL)
    {
        cerr << "Can't open Directory" << endl;
        exit(1);
    }
    while (file = readdir(Dir))
    {
        //if the file is a normal file
        if (file->d_type == DT_REG)
        {
            v_path.push_back(dirName + file->d_name);
        }
            //if the file is a directory
        else if (file->d_type == DT_DIR && strcmp(file->d_name, ".") != 0 && strcmp(file->d_name, "..") != 0)
        {
            GetFileInDir(dirName + file->d_name,v_path);
        }
    }
}

string str_replace(const string &str,const string &str_find,const string &str_replacee)
{
    string str_tmp=str;
    size_t pos = str_tmp.find(str_find);
    while (pos != string::npos)
    {
        str_tmp.replace(pos, str_find.length(), str_replacee);

        size_t pos_t=pos+str_replacee.length();
        string str_sub=str_tmp.substr(pos_t,str_tmp.length()-pos_t);

        size_t pos_tt=str_sub.find(str_find);
        if(string::npos != pos_tt)
        {
            pos =pos_t + str_sub.find(str_find);
        }else
        {
            pos=string::npos;
        }
    }
    return str_tmp;
}

string get_ans(const string path)
{
    int pos_1 = path.find_last_of("_");
    int pos_2 = path.find_last_of(".");
    string ans = path.substr(pos_1+1,pos_2-pos_1-1);
    ans = str_replace(ans,"@","/");
    return ans;
}

bool save_tensor_txt(torch::Tensor tensor_in_,string path_txt)
{
#include "fstream"
    ofstream outfile(path_txt);
    torch::Tensor tensor_in = tensor_in_.clone();
    tensor_in = tensor_in.view({-1,1});
    tensor_in = tensor_in.to(torch::kCPU);

    auto result_data = tensor_in.accessor<float, 2>();

    for(int i=0;i<result_data.size(0);i++)
    {
        float val = result_data[i][0];
//        std::cout<<"val="<<val<<std::endl;
        outfile<<val<<std::endl;

    }

    return true;
}



int main()
{
    std::string path_pt = "/data_1/everyday/1118/pytorch_lstm_test/lstmunidirectional20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm10000.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm.pt";
    std::string path_img_dir = "/data_1/2020biaozhushuju/2021_rec/general/test";//"/data_1/everyday/1118/pytorch_lstm_test/test_data";
    int blank_label = 7117;


    std::ifstream list("/data_1/everyday/1123/list.txt");

    int standard_w = 320;
    int standard_h = 32;

//    vector<string> v_path;
//    GetFileInDir(path_img_dir, v_path);
//    for(int i=0;i<v_path.size();i++)
//    {
//        std::cout<<i<<"  "<<v_path[i]<<std::endl;
//    }


    torch::Device m_device(torch::kCUDA);
//    torch::Device m_device(torch::kCPU);
    std::shared_ptr<torch::jit::script::Module> m_model = torch::jit::load(path_pt);

    torch::NoGradGuard no_grad;

    m_model->to(m_device);
    std::cout<<"success load model"<<std::endl;

    int cnt_all = 0;
    int cnt_right = 0;
    double start = getTickCount();
    string file;
    while(list >> file)
    {
        file = "/data_1/everyday/1123/img/bxd_39_發動機號碼.jpg";
        cout<<cnt_all++<<" :: "<<file<<endl;
        string jpg=".jpg";
        string::size_type idx = file.find( jpg );
        if ( idx == string::npos )
            continue;

        int pos_1 = file.find_last_of("_");
        int pos_2 = file.find_last_of(".");
        string answer = file.substr(pos_1+1,pos_2-pos_1-1);

        cv::Mat img = cv::imread(file);
//        int rowNumber = img.rows;  // number of rows
//        int colNumber = img.cols*img.channels();  // columns x channels = elements per row
//        std::ofstream out_file("/data_1/everyday/1123/img_acc/libtorch_img.txt");
//        // double loop over all pixel values
//        for (int i = 0; i < rowNumber; i++)  // row loop
//        {
//            uchar *data = img.ptr<uchar>(i);  // pointer to the start of row i
//            for (int j = 0; j < colNumber; j++)   // column loop
//            {
//                // --------- process each pixel ---------
//                int pix = int(data[j]);
//                out_file << pix << std::endl;
//            }
//        }
//
//        out_file.close();
//        while(1);




        torch::Tensor tensor_input;
        pre_img(img, tensor_input);
        tensor_input = tensor_input.to(m_device);
        tensor_input.print();

        std::cout<<tensor_input[0][2][12][25]<<std::endl;
        std::cout<<tensor_input[0][1][15][100]<<std::endl;
        std::cout<<tensor_input[0][0][16][132]<<std::endl;
        std::cout<<tensor_input[0][1][17][156]<<std::endl;
        std::cout<<tensor_input[0][2][5][256]<<std::endl;
        std::cout<<tensor_input[0][0][14][205]<<std::endl;

        save_tensor_txt(tensor_input, "/data_1/everyday/1124/acc/libtorch_input-100.txt");

        torch::Tensor output = m_model->forward({tensor_input}).toTensor();
        output.print();
//        output = output.squeeze();//80,7118
//        output.print();

        save_tensor_txt(output, "/data_1/everyday/1124/acc/libtorch-out-100.txt");
////        std::cout<<output<<std::endl;
        while(1);
//
        torch::Tensor index = torch::argmax(output,1).cpu();//.to(torch::kInt);
        index.print();
//        std::cout<<index<<std::endl;
//        while(1);


        int prev_label = blank_label;
        string result;
        auto result_data = index.accessor<long, 1>();
        for(int i=0;i<result_data.size(0);i++)
        {
//            std::cout<<result_data[i]<<std::endl;
              int predict_label = result_data[i];
            if (predict_label != blank_label && predict_label != prev_label )
            {
                {
                    result = result + chn_tab[predict_label];
                }
            }
            prev_label = predict_label;
        }

        cout << "answer: " << answer << endl;
        cout << "result : " << result << endl;

        imshow("src",img);
        waitKey(0);


//        while(1);


    }


//    for(int i=0;i<v_path.size();i++)
//    {
//        cnt_all += 1;
//        std::string path_img = v_path[i];
//        string ans = get_ans(path_img);
//        std::cout<<i<<"  path="<<path_img<<"    ans="<<ans<<std::endl;
//        cv::Mat img = cv::imread(path_img);



//        torch::Tensor input = pre_img(img, v_mean, v_std, standard_w, standard_h);
//        input = input.to(m_device);
//        torch::Tensor output = m_module.forward({input}).toTensor();
//
//        std::string rec = get_label(output);
//#if 1   //for show
//        std::cout<<"rec="<<rec<<std::endl;
//        std::cout<<"ans="<<ans<<std::endl;
//        cv::imshow("img",img);
//        cv::waitKey(0);
//#endif
//
//#if 0   //In order to test the accuracy
//        std::cout<<"rec="<<rec<<std::endl;
//        std::cout<<"ans="<<ans<<std::endl;
//        if(ans == rec)
//        {
//            cnt_right += 1;
//        }
//        std::cout<<"cnt_right="<<cnt_right<<std::endl;
//        std::cout<<"cnt_all="<<cnt_all<<std::endl;
//        std::cout<<"ratio="<<cnt_right * 1.0 / cnt_all<<std::endl;
//#endif
//    }
//    double time_cunsume = ((double)getTickCount() - start) / getTickFrequency();
//    std::cout<<"ave time="<< time_cunsume * 1.0 / cnt_all * 1000 <<"ms"<<std::endl;

    return 0;
}

A good memory is no match for a worn-out keyboard --- bit by bit, accumulate, improve!