Converting a Caffe Model to PyTorch: LSTM
I have already converted several networks from Caffe to PyTorch:
refinenet https://www.cnblogs.com/yanghailin/p/13096258.html
refinedet https://www.cnblogs.com/yanghailin/p/12965695.html
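The first one extracts the Caffe weights and then converts them to libtorch. The second converts the corresponding PyTorch version directly to libtorch, with most of the post-processing done in libtorch. Later, a colleague also managed to go directly from Caffe weights to libtorch.
Unsurprisingly, all of the above required building Caffe's Python interface. In a typical engineering setting, however, we only use Caffe from C++, and sometimes there is no corresponding Python project at all; building and calling the Python interface then becomes a hassle.
Later I wondered why we bother with that detour: why not simply work from the C++ project that already runs Caffe's forward pass?
In fact you can. The catch is that the Caffe source is complex and hard to follow at first.
This series of posts extracts the weights directly from the C++ Caffe project, builds the same network in PyTorch, and fills in the Caffe weights so that forward inference runs directly.
Here is how I went about it. First I built the CPU version of the Caffe LSTM code so I could debug it in CLion. In /caffe_ocr/tools/caffe.cpp I deleted all of the original contents of caffe.cpp, replaced them with code that runs LSTM forward inference, and then built Caffe from this modified source.
After that I could set breakpoints and step through the code.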
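Caffe's source is highly abstracted: Layer is the base class, and every other algorithm module derives from it.
Net is a very important class: it manages and coordinates the whole network. Through a Net you can access every intermediate feature map and every layer's weights.
My goal was to port the LSTM to PyTorch, so understanding how this operator is implemented was crucial. The first look told me nothing; the second left me stunned. The LSTM implementation is really complicated: it builds a Net of its own inside the layer! A bidirectional LSTM builds two such Nets. The layer derives from the RecurrentLayer class.
As for the theory, an LSTM comes down to the familiar six equations; this post explains them well: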
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
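For reference, these are the six equations in that post's notation: $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, $\sigma$ is the sigmoid, and $\odot$ is element-wise multiplication. In the Caffe code quoted later in this post, `g` plays the role of $\tilde{C}_t$ (`c = f * c_prev + i * g`, `H = o * tanh(c)`).

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```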
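This post does not go into the Caffe source or the details of the LSTM implementation; maybe I will write a separate post on that some day.
What it covers is how to extract each layer's weights from the caffemodel. A weight is usually a large multi-dimensional array, e.g. of shape [64,3,7,7], and these weights need to be saved somewhere Python can read them.
At first I looked for a way to handle arrays in C++ as conveniently as with NumPy in Python. I considered JSON, XML, or Caffe's own Blob class, but didn't know how to use them for this. Caffe's protobuf would presumably work as well, but I didn't know how to use that either.
So I went with the most direct approach: write each weight to a local txt file, one value per line, and name the file after the layer. For a layer named conv1, the files are conv1_weight_0.txt, conv1_weight_1.txt, and so on. The first line holds the shape, e.g. 64,3,7,7.
Since the weights are themselves stored as Blobs, I added a function to the Blob source that saves the blob's data to a local txt file; you only pass the output path: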
void save_data_to_txt(const string path_txt,bool b_save_shape = true)
{
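std::ofstream fOut(path_txt);
if (!fOut)
{
std::cout << "Open output file failed: " << path_txt << std::endl;
return; // stop here instead of writing to a stream that failed to open
}
// float needs up to 9 significant digits to round-trip; the stream default is only 6
fOut.precision(9);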
if(b_save_shape)
{
for(int i=0;i<shape_.size();i++)
{
fOut << shape_[i];
if(i == shape_.size()-1)
{
fOut<<std::endl;
}else
{
fOut<<",";
}
}
}
const Dtype* data_vec = cpu_data();
for (int i = 0; i < count_; ++i) {
fOut << data_vec[i] << std::endl;
}
fOut.close();
}
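When comparing against PyTorch it helps to write and read exactly the same format on the Python side. Below is a minimal pure-Python sketch that mirrors the layout written by save_data_to_txt (optional shape line, then one value per line); the names `save_blob_txt` and `load_blob_txt` are hypothetical and not part of the original code.

```python
import os
import tempfile


def save_blob_txt(values, shape, path, save_shape=True):
    """Write a flat list of values in the layout produced by save_data_to_txt:
    an optional first line with the comma-separated shape, then one value per line."""
    with open(path, "w") as f:
        if save_shape:
            f.write(",".join(str(d) for d in shape) + "\n")
        for v in values:
            f.write(repr(float(v)) + "\n")


def load_blob_txt(path, has_shape=True):
    """Read such a file back. Returns (shape or None, flat list of floats)."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    shape = [int(s) for s in lines[0].split(",")] if has_shape else None
    values = [float(s) for s in (lines[1:] if has_shape else lines)]
    if shape is not None:
        # sanity check: the product of the shape must equal the number of values
        expected = 1
        for d in shape:
            expected *= d
        if expected != len(values):
            raise ValueError("shape %s does not match %d values" % (shape, len(values)))
    return shape, values


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "conv1_weight_0.txt")
    save_blob_txt([0.5, -1.25, 3.0, 4.0, 5.0, 6.0], [1, 2, 3], path)
    print(load_blob_txt(path))  # ([1, 2, 3], [0.5, -1.25, 3.0, 4.0, 5.0, 6.0])
```

The shape check catches a truncated or mismatched file early, before a confusing reshape error appears further down the pipeline.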
Here is my code that saves every layer's weights to txt:
std::cout<<"\n\n\n\n============2021-11-18======================================="<<std::endl;
shared_ptr<Net<float> > net_ = classifier.get_net(); // get the Net from the class that runs the forward pass
vector<shared_ptr<Layer<float> > > layers = net_->layers(); // pointer to every layer
vector<shared_ptr<Blob<float> > > params = net_->params(); // pointers to all learnable parameter blobs
vector<vector<Blob<float>*> > bottom_vecs_ = net_->bottom_vecs(); // every layer's bottom (input) blobs
vector<vector<Blob<float>*> > top_vecs_ = net_->top_vecs(); // every layer's top (output) blobs. Note: layers, bottom_vecs_ and top_vecs_ are index-aligned
std::cout<<"size layer=" << layers.size()<<std::endl;
std::cout<<"size params=" << params.size()<<std::endl;
string path_save_dir = "/data_1/Yang/project/save_weight/";
for(int i=0;i<layers.size();i++)
{
shared_ptr<Layer<float> > layer = layers[i];
string name_layer = layer->layer_param().name(); // name of the current layer
std::cout<<i<<" layer_name="<<name_layer<<" type="<<layer->layer_param().type()<<std::endl;
int bottom_name_size = layer->layer_param().bottom().size();
std::cout<<"=================bottom================"<<std::endl;
if(bottom_name_size>0)
{
for(int ii=0;ii<bottom_name_size;ii++)
{
std::cout<<ii<<" ::bottom name="<<layer->layer_param().bottom(ii)<<std::endl;
Blob<float>* ptr_blob = bottom_vecs_[i][ii];
std::cout<<"bottom shape="<<ptr_blob->shape_string()<<std::endl;
}
} else{
std::cout<<"no bottom"<<std::endl;
}
std::cout<<"=================top================"<<std::endl;
int top_name_size = layer->layer_param().top().size();
if(top_name_size>0)
{
for(int ii=0;ii<top_name_size;ii++)
{
std::cout<<ii<<" ::top name="<<layer->layer_param().top(ii)<<std::endl;
Blob<float>* ptr_blob = top_vecs_[i][ii];
std::cout<<"top shape="<<ptr_blob->shape_string()<<std::endl;
}
} else{
std::cout<<"no top"<<std::endl;
}
vector<shared_ptr<Blob<float> > > params = layer->blobs();
std::cout<<"=================params ================"<<std::endl;
std::cout<<"params size= "<<params.size()<<std::endl;
if(0 == params.size())
{
std::cout<<"has no params"<<std::endl;
} else
{
for(int j=0;j<params.size();j++)
{
std::cout<<"params_"<<j<<" shape="<<params[j]->shape_string()<<std::endl;
params[j]->save_data_to_txt(path_save_dir + name_layer + "_weight_" + std::to_string(j)+".txt");
}
}
std::cout<<std::endl;
}
// To check whether one layer's output matches between Caffe and PyTorch, first save that layer's Caffe feature map.
string name_aim_top = "premuted_fc";
const shared_ptr<Blob<float>> feature_map = net_->blob_by_name(name_aim_top);
bool b_save_shape = false;
std::cout<<"featuremap shape="<<std::endl;
std::cout<<feature_map->shape_string()<<std::endl;
feature_map->save_data_to_txt("/data_1/Yang/project/myfile/blob_val/"+name_aim_top+".txt",b_save_shape);
To inspect a Caffe network, you can paste the prototxt into Netscope:
http://ethereon.github.io/netscope/quickstart.html
This gives a much more intuitive view.
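One thing needs special attention here: in-place operations. For example, conv1, conv1_bn, conv1_scale and conv1_relu, which appear chained together in the network diagram, all use the same name for their bottom and top. As a result, each layer's output overwrites its input: they all share one block of memory.
This is a trap. A colleague was doing similar work and comparing accuracy across frameworks, and found that the outputs already disagreed in the first few layers. He searched for a whole week without finding the cause. When he finally asked me to take a look, it took me most of a day to see that the in-place operations were responsible: you cannot get conv1's own feature map this way. What you read has already gone through all four steps, conv1, conv1_bn, conv1_scale and conv1_relu!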
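A quick way to spot in-place layers before comparing outputs is to check which blob names are written by more than one layer; after the forward pass such a blob only holds the output of its last writer. Below is a minimal pure-Python sketch, assuming the (layer, bottoms, tops) list is hand-transcribed from the prototxt; the function names and the exact blob names (including conv1_pool) are illustrative.

```python
def blob_writers(layers):
    """Map each blob name to the layers that write it, in network order.

    layers: list of (layer_name, bottoms, tops) tuples transcribed from a prototxt.
    A blob written by several layers is reused in place, so after the forward
    pass it holds only the output of the last writer."""
    writers = {}
    for name, bottoms, tops in layers:
        for top in tops:
            writers.setdefault(top, []).append(name)
    return writers


def in_place_layers(layers):
    """Names of layers whose top reuses one of their own bottoms."""
    return [name for name, bottoms, tops in layers if set(bottoms) & set(tops)]


if __name__ == "__main__":
    net = [
        ("conv1", ["data"], ["conv1"]),
        ("conv1_bn", ["conv1"], ["conv1"]),
        ("conv1_scale", ["conv1"], ["conv1"]),
        ("conv1_relu", ["conv1"], ["conv1"]),
        ("conv1_pool", ["conv1"], ["conv1_pool"]),
    ]
    print(blob_writers(net)["conv1"])  # ['conv1', 'conv1_bn', 'conv1_scale', 'conv1_relu']
    print(in_place_layers(net))        # ['conv1_bn', 'conv1_scale', 'conv1_relu']
```

One workaround, assuming you can edit the prototxt for debugging, is to give conv1 a distinct top name (and update conv1_bn's bottom accordingly) so that its raw output survives the forward pass.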
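Running the code above writes out the weights of every layer. If a layer has several weight blobs, they are distinguished by a counter 0, 1, 2 at the end of the file name; the naming scheme is layerName + "_weight_" + cnt + ".txt". The first line of each txt file is the weight's shape, e.g. 64,64,1,1.
Once that was done, on the Python side I first wrote a script (weight_numpy.py, the module imported by the loader below) that reads the txt files and stores the weights in a dictionary.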
import os
import numpy as np

# This class exists so that nested keys can be assigned directly, e.g. out['conv1'][0] = data
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

def get_weight_numpy(path_dir):
    out = AutoVivification()
    list_txt = os.listdir(path_dir)
    for cnt, txt in enumerate(list_txt):
        print(cnt, " ", txt)
        txt_ = txt.replace(".txt", "")
        layer_name, idx = txt_.split("_weight_")
        path_txt = os.path.join(path_dir, txt)
        with open(path_txt, 'r') as fr:
            lines = fr.readlines()
        # first line: the shape, e.g. "64,3,7,7"; remaining lines: one value each
        shape_line = [int(s) for s in lines[0].strip().split(",")]
        data = [float(line) for line in lines[1:]]
        data = np.array(data).reshape(shape_line)
        out[layer_name][int(idx)] = data
    return out

if __name__ == "__main__":
    path_dir = "/data_1/Yang/project/save_weight/"
    out = get_weight_numpy(path_dir)
    conv1_weight = out['conv1'][0]
    conv1_bias = out['conv1'][1]
Next, here is the code that plugs the saved Caffe weights into the PyTorch layers I built:
# coding=utf-8
import torch
import torchvision
from torch import nn
import torch.nn.functional as F
import cv2
import numpy as np
from weight_numpy import get_weight_numpy

class lstm_general(nn.Module):
    def __init__(self):
        super(lstm_general, self).__init__()
        # self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
        self.data_bn = nn.BatchNorm2d(3)
        self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
        self.conv1_bn = nn.BatchNorm2d(64)
        self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
        self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias=False)
        self.layer_64_1_bn2 = nn.BatchNorm2d(64)
        self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
        self.layer_64_1_bn3 = nn.BatchNorm2d(64)
        self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_128_1_bn1 = nn.BatchNorm2d(256)
        self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
        self.layer_128_1_bn2 = nn.BatchNorm2d(128)
        self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
        self.layer_128_1_bn3 = nn.BatchNorm2d(128)
        self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
        self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)
        self.last_bn = nn.BatchNorm2d(512)
        # self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
        self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)
        self.fc1x1_r2_v2_a = nn.Linear(200, 7118)

    def forward(self, inputs):
        # x = F.relu(self.bn1_1(self.conv1_1(inputs)))
        x = self.data_bn(inputs)
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = self.conv1_pool(x)  # [1,64,8,80]
        x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x)))  # 1 64 8 80
        layer_64_1_conv1 = x
        x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))
        x = self.layer_64_1_conv3(x)
        layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
        layer_64_3_sum = x + layer_64_1_conv_expand  # 1 256 8 80
        x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
        layer_128_1_bn1 = x
        x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
        x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
        x = self.layer_128_1_conv3(x)  # 1, 512, 8, 80
        layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1)  # 1, 512, 8, 80
        layer_128_4_sum = x + layer_128_1_conv_expand
        x = F.relu(self.last_bn(layer_128_4_sum))
        x = F.dropout(x, p=0.7, training=False)  # 1 512 8 80 (a no-op at inference)
        x = x.permute(3, 0, 1, 2)  # 80 1 512 8: the width becomes the time axis
        x = x.reshape(80, 1, 512 * 8)
        #
        # merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)
        lstm_out, (_, _) = self.lstm_lr(x)  # (80,1,200)
        out = self.fc1x1_r2_v2_a(lstm_out)  # (80,1,7118)
        return out
def save_tensor(tensor_in, path_save):
    # flatten to one value per line, the same layout as the Caffe dumps (no shape line)
    tensor_in = tensor_in.contiguous().view(-1, 1)
    np_tensor = tensor_in.cpu().detach().numpy()
    # np_tensor = np_tensor.view()
    np.savetxt(path_save, np_tensor, fmt='%.12e')

def access_pixels(frame):
    print(frame.shape)  # shape has three elements, in order: height, width, channels
    height = frame.shape[0]
    width = frame.shape[1]
    channels = frame.shape[2]
    print("width : %s, height : %s, channel : %s" % (width, height, channels))
    with open("/data_1/Yang/project/myfile/blob_val/img_stand_python.txt", "w") as fw:
        for row in range(height):  # iterate over rows (height)
            for col in range(width):  # iterate over columns (width)
                for c in range(channels):  # iterate over channels
                    pv = frame[row, col, c]
                    fw.write(str(int(pv)))
                    fw.write("\n")

def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
    img_h, img_w, _ = img.shape
    if img_h < 2 or img_w < 2:
        return
    # if 32 == img_h and 320 == img_w:
    #     return img
    ratio_now = img_w * 1.0 / img_h
    if ratio_now <= ratio:
        # narrower than ratio:1, so pad on the right with white up to width = img_h * ratio
        mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img
    else:
        # wider than ratio:1, so pad at the bottom with white up to height = img_w / ratio
        mask = np.ones((int(img_w * 1.0 / ratio), img_w, 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img
    mask_stand = cv2.resize(mask, (stand_w, stand_h), interpolation=cv2.INTER_LINEAR)
    # access_pixels(mask_stand)
    return mask_stand
if __name__ == '__main__':
    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
    net = lstm_general()
    # net.eval()
    index = 0
    print("*" * 50)
    for name, param in list(net.named_parameters()):
        print(str(index) + ':', name, param.size())
        index += 1
    print("*" * 50)
    ## once the network is built, this prints the parameter names it expects
    for k, v in net.state_dict().items():
        print(k)
        print(v.shape)
        # print(k,v)
    print("@" * 50)
    # aaa = np.zeros((400,1))
    path_dir = "/data_1/Yang/project/OCR/3rdlib/caffe_ocr_2021/myfile/save_weight/"
    weight_numpy_dict = get_weight_numpy(path_dir)
    from torch import from_numpy
    state_dict = {}
    # Caffe BatchNorm blobs: [0] mean, [1] variance, [2] scale factor; gamma/beta live in the Scale layer
    state_dict['data_bn.running_mean'] = from_numpy(weight_numpy_dict["data_bn"][0] / weight_numpy_dict["data_bn"][2])
    state_dict['data_bn.running_var'] = from_numpy(weight_numpy_dict["data_bn"][1] / weight_numpy_dict["data_bn"][2])
    state_dict['data_bn.weight'] = from_numpy(weight_numpy_dict['data_scale'][0])
    state_dict['data_bn.bias'] = from_numpy(weight_numpy_dict['data_scale'][1])
    state_dict['conv1.weight'] = from_numpy(weight_numpy_dict['conv1'][0])
    state_dict['conv1.bias'] = from_numpy(weight_numpy_dict['conv1'][1])
    state_dict['conv1_bn.running_mean'] = from_numpy(weight_numpy_dict["conv1_bn"][0] / weight_numpy_dict["conv1_bn"][2])
    state_dict['conv1_bn.running_var'] = from_numpy(weight_numpy_dict["conv1_bn"][1] / weight_numpy_dict["conv1_bn"][2])
    state_dict['conv1_bn.weight'] = from_numpy(weight_numpy_dict['conv1_scale'][0])
    state_dict['conv1_bn.bias'] = from_numpy(weight_numpy_dict['conv1_scale'][1])
    state_dict['layer_64_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv1'][0])
    state_dict['layer_64_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][0] / weight_numpy_dict["layer_64_1_bn2"][2])
    state_dict['layer_64_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn2"][1] / weight_numpy_dict["layer_64_1_bn2"][2])
    state_dict['layer_64_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][0])
    state_dict['layer_64_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale2'][1])
    state_dict['layer_64_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv2'][0])
    state_dict['layer_64_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][0] / weight_numpy_dict["layer_64_1_bn3"][2])
    state_dict['layer_64_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_64_1_bn3"][1] / weight_numpy_dict["layer_64_1_bn3"][2])
    state_dict['layer_64_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][0])
    state_dict['layer_64_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_64_1_scale3'][1])
    state_dict['layer_64_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv3'][0])
    state_dict['layer_64_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_64_1_conv_expand'][0])
    state_dict['layer_128_1_bn1.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][0] / weight_numpy_dict["layer_128_1_bn1"][2])
    state_dict['layer_128_1_bn1.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn1"][1] / weight_numpy_dict["layer_128_1_bn1"][2])
    state_dict['layer_128_1_bn1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][0])
    state_dict['layer_128_1_bn1.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale1'][1])
    state_dict['layer_128_1_conv1.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv1'][0])
    state_dict['layer_128_1_bn2.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][0] / weight_numpy_dict["layer_128_1_bn2"][2])
    state_dict['layer_128_1_bn2.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn2"][1] / weight_numpy_dict["layer_128_1_bn2"][2])
    state_dict['layer_128_1_bn2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][0])
    state_dict['layer_128_1_bn2.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale2'][1])
    state_dict['layer_128_1_conv2.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv2'][0])
    state_dict['layer_128_1_bn3.running_mean'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][0] / weight_numpy_dict["layer_128_1_bn3"][2])
    state_dict['layer_128_1_bn3.running_var'] = from_numpy(weight_numpy_dict["layer_128_1_bn3"][1] / weight_numpy_dict["layer_128_1_bn3"][2])
    state_dict['layer_128_1_bn3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][0])
    state_dict['layer_128_1_bn3.bias'] = from_numpy(weight_numpy_dict['layer_128_1_scale3'][1])
    state_dict['layer_128_1_conv3.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv3'][0])
    state_dict['layer_128_1_conv_expand.weight'] = from_numpy(weight_numpy_dict['layer_128_1_conv_expand'][0])
    state_dict['last_bn.running_mean'] = from_numpy(weight_numpy_dict["last_bn"][0] / weight_numpy_dict["last_bn"][2])
    state_dict['last_bn.running_var'] = from_numpy(weight_numpy_dict["last_bn"][1] / weight_numpy_dict["last_bn"][2])
    state_dict['last_bn.weight'] = from_numpy(weight_numpy_dict['last_scale'][0])
    state_dict['last_bn.bias'] = from_numpy(weight_numpy_dict['last_scale'][1])
    ## caffe i f o g
    ## pytorch i f g o
    # forward direction: Caffe layer lstm1x_r2
    ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0])  # [400,4096]
    ww_200_if = ww[:200, :]  # [200,4096]
    ww_100_o = ww[200:300, :]  # [100,4096]
    ww_100_g = ww[300:400, :]  # [100,4096]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo
    bb = from_numpy(weight_numpy_dict['lstm1x_r2'][1])  # [400]
    bb_200_if = bb[:200]
    bb_100_o = bb[200:300]
    bb_100_g = bb[300:400]
    bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
    state_dict['lstm_lr.bias_ih_l0'] = bb_cat_ifgo
    ww = from_numpy(weight_numpy_dict['lstm1x_r2'][2])  # [400,100]
    ww_200_if = ww[:200, :]  # [200,100]
    ww_100_o = ww[200:300, :]  # [100,100]
    ww_100_g = ww[300:400, :]  # [100,100]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_hh_l0'] = ww_cat_ifgo
    # Caffe's hidden-state inner product has no bias, so bias_hh is all zeros
    state_dict['lstm_lr.bias_hh_l0'] = from_numpy(np.zeros((400)))
    ##########################################
    # reverse direction: Caffe layer lstm2x_r2
    ww = from_numpy(weight_numpy_dict['lstm2x_r2'][0])  # [400,4096]
    ww_200_if = ww[:200, :]  # [200,4096]
    ww_100_o = ww[200:300, :]  # [100,4096]
    ww_100_g = ww[300:400, :]  # [100,4096]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_ih_l0_reverse'] = ww_cat_ifgo
    bb = from_numpy(weight_numpy_dict['lstm2x_r2'][1])  # [400]
    bb_200_if = bb[:200]
    bb_100_o = bb[200:300]
    bb_100_g = bb[300:400]
    bb_cat_ifgo = torch.cat((bb_200_if, bb_100_g, bb_100_o), 0)
    state_dict['lstm_lr.bias_ih_l0_reverse'] = bb_cat_ifgo
    ww = from_numpy(weight_numpy_dict['lstm2x_r2'][2])  # [400,100]
    ww_200_if = ww[:200, :]  # [200,100]
    ww_100_o = ww[200:300, :]  # [100,100]
    ww_100_g = ww[300:400, :]  # [100,100]
    ww_cat_ifgo = torch.cat((ww_200_if, ww_100_g, ww_100_o), 0)
    state_dict['lstm_lr.weight_hh_l0_reverse'] = ww_cat_ifgo
    state_dict['lstm_lr.bias_hh_l0_reverse'] = from_numpy(np.zeros((400)))
    state_dict['fc1x1_r2_v2_a.weight'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][0])
    state_dict['fc1x1_r2_v2_a.bias'] = from_numpy(weight_numpy_dict['fc1x1_r2_v2_a'][1])
    ####input########################################
    path_img = "/data_2/project/1.jpg"
    img = cv2.imread(path_img)
    # access_pixels(img)
    img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)
    img_stand = img_stand.astype(np.float32)
    # img = (img / 255. - config.DATASET.MEAN) / config.DATASET.STD
    img_stand = img_stand.transpose([2, 0, 1])  # HWC -> CHW
    img_stand = img_stand[None, :, :, :]  # add batch dim: 1,3,32,320
    img_stand = torch.from_numpy(img_stand)
    img_stand = img_stand.type(torch.FloatTensor)
    img_stand = img_stand.to(device)
    # img_stand = img_stand.view(1, *img.size())
    #######net##########################
    net.load_state_dict(state_dict)
    net.to(device)  # follow the device chosen above, so this also runs on CPU-only machines
    net.eval()
    preds = net(img_stand)
    print("out shape=", preds.shape)
    torch.save(net.state_dict(), './lstm_model.pth')
    # name_top_caffe_layer = "fc1x_a" #"merge_lstm_rlstmx" #"#"data_bn"
    # path_save = "/data_1/Yang/project/myfile/blob_val/" + name_top_caffe_layer + "_torch.txt"
    # save_tensor(preds, path_save)
    aaa = 0
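With save_tensor on the PyTorch side and save_data_to_txt(..., false) on the Caffe side, both dumps contain one value per line in the same memory order, so they can be compared line by line. Below is a minimal pure-Python sketch; `compare_dumps` and the default tolerance of 1e-4 are assumptions, not values from the original.

```python
import os
import tempfile


def load_values(path):
    """Read a one-value-per-line dump without a shape line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def compare_dumps(path_a, path_b, atol=1e-4):
    """Return (max_abs_diff, mean_abs_diff, within_tolerance) for two dumps of the same blob."""
    a, b = load_values(path_a), load_values(path_b)
    if len(a) != len(b):
        raise ValueError("length mismatch: %d vs %d" % (len(a), len(b)))
    diffs = [abs(x - y) for x, y in zip(a, b)]
    max_diff = max(diffs) if diffs else 0.0
    mean_diff = sum(diffs) / len(diffs) if diffs else 0.0
    return max_diff, mean_diff, max_diff <= atol


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    caffe_txt, torch_txt = os.path.join(d, "caffe.txt"), os.path.join(d, "torch.txt")
    with open(caffe_txt, "w") as f:
        f.write("0.5\n-1.0\n")
    with open(torch_txt, "w") as f:
        f.write("5.000000000000e-01\n-1.000000000000e+00\n")  # np.savetxt '%.12e' style
    print(compare_dumps(caffe_txt, torch_txt))  # (0.0, 0.0, True)
```

Reporting both the maximum and the mean difference, rather than only a pass/fail flag, makes it easier to tell floating-point noise apart from a genuine mapping error.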
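Note that Caffe's BatchNorm layer has three blobs: the first two are the mean and the variance, and the third is a scale factor; both the mean and the variance must be divided by it. In this model the factor is the constant 999.982.
Caffe's Scale layer holds the remaining coefficients of the batch-normalization formula y = γ·(x − μ)/√(σ² + ε) + β: its two blobs are γ and β, which map to BatchNorm2d's weight and bias.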
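The mapping can be checked numerically. Below is a minimal pure-Python sketch for a single channel, assuming inference mode (use_global_stats) and Caffe's default eps of 1e-5, which is also BatchNorm2d's default; if the prototxt sets a different eps, the PyTorch layer must use the same value. The function names are hypothetical.

```python
import math


def caffe_bn_scale(x, mean_acc, var_acc, factor, gamma, beta, eps=1e-5):
    """Caffe BatchNorm (use_global_stats) followed by Scale, for one channel.

    BatchNorm's blobs hold accumulated statistics: the real mean/variance are
    blob[0] / blob[2] and blob[1] / blob[2] (Caffe uses 0 if blob[2] is 0)."""
    inv = 0.0 if factor == 0 else 1.0 / factor
    mean, var = mean_acc * inv, var_acc * inv
    x_hat = (x - mean) / math.sqrt(var + eps)  # BatchNorm layer
    return gamma * x_hat + beta                 # Scale layer: gamma = blob 0, beta = blob 1


def torch_bn_eval(x, running_mean, running_var, weight, bias, eps=1e-5):
    """nn.BatchNorm2d in eval mode, for one channel."""
    return (x - running_mean) / math.sqrt(running_var + eps) * weight + bias


if __name__ == "__main__":
    factor = 999.982  # the value of blob[2] in this model
    caffe_out = caffe_bn_scale(0.7, 0.3 * factor, 2.0 * factor, factor, gamma=1.5, beta=-0.2)
    torch_out = torch_bn_eval(0.7, 0.3, 2.0, weight=1.5, bias=-0.2)
    print(abs(caffe_out - torch_out) < 1e-9)  # True
```

This is the same mapping the loader applies: running_mean = bn[0] / bn[2], running_var = bn[1] / bn[2], weight = scale[0], bias = scale[1].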
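The LSTM itself also needs some explanation. In the Caffe model, time_step is 80 and the hidden size is 100, and the feature map entering the LSTM has shape 80,1,512,8.
Looking at the layer's weights, the LSTM has 3 blobs, of shapes [400,4096], [400] and [400,100].
Reading the source shows that the only parts with parameters are two fully connected (inner product) layers. [400,4096] and [400] are the weight and bias of the inner product applied to the input x (4096 = 512*8), and 400 = 100*4. The factor of 4 comes from the structure of the LSTM: in short, x and h each go through four sets of multiplications, one per gate.
[400,100] is the weight of the inner product applied to the hidden state h.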
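To make the shapes concrete: the 4H rows of the input weight (4H = 400 here) are four stacked blocks of H rows, one per gate, and the hidden weight is laid out the same way. Below is a minimal pure-Python sketch of one time step with Caffe's layout and gate order (i, f, o, g, as lstm_unit_layer.cpp further down shows). It assumes a single sequence starting from a zero state and ignores Caffe's `cont` sequence-continuation input; the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Row-major matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def caffe_lstm_step(x, h_prev, c_prev, w_x, b_x, w_h):
    """One time step with Caffe's parameter layout (hidden size H, input size D).

    w_x: 4H x D, b_x: 4H, w_h: 4H x H (no bias).
    The 4H pre-activations are split into gate blocks in Caffe order i, f, o, g."""
    H = len(h_prev)
    pre = [a + b + c for a, b, c in zip(matvec(w_x, x), b_x, matvec(w_h, h_prev))]
    i = [sigmoid(v) for v in pre[0:H]]
    f = [sigmoid(v) for v in pre[H:2 * H]]
    o = [sigmoid(v) for v in pre[2 * H:3 * H]]
    g = [math.tanh(v) for v in pre[3 * H:4 * H]]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # c = f*c_prev + i*g
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # h = o*tanh(c)
    return h, c


if __name__ == "__main__":
    # H = 1, D = 1: zero weights, bias 1.0 on the g block only
    h, c = caffe_lstm_step([0.0], [0.0], [0.0], [[0.0]] * 4, [0.0, 0.0, 0.0, 1.0], [[0.0]] * 4)
    print(c[0], h[0])
```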
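Next I looked up nn.LSTM in the PyTorch documentation, in particular the description of its parameters:
https://pytorch.org/docs/1.0.1/nn.html?highlight=lstm#torch.nn.LSTM
Based on those parameters, I wrote a standalone LSTM test: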
import torch
import torch.nn as nn
# rnn = nn.LSTM(512*8, 100, 1, False)
# input = torch.randn(80, 1, 512*8)
#
# output, (hn, cn) = rnn(input)
#
#
# for name,parameters in rnn.named_parameters():
# print(name,':',parameters.size())
# # parm[name]=parameters.detach().numpy()
#
# aa = 0
rnn = nn.LSTM(512*8, 100, 1, bidirectional=True)
input = torch.randn(80, 1, 512*8)
output, (hn, cn) = rnn(input)
print("out shape=",output.shape)
for name, parameters in rnn.named_parameters():
    print(name, ':', parameters.size())
    # parm[name]=parameters.detach().numpy()
aa = 0
Output:
('out shape=', (80, 1, 200))
('weight_ih_l0', ':', (400, 4096))
('weight_hh_l0', ':', (400, 100))
('bias_ih_l0', ':', (400,))
('bias_hh_l0', ':', (400,))
('weight_ih_l0_reverse', ':', (400, 4096))
('weight_hh_l0_reverse', ':', (400, 100))
('bias_ih_l0_reverse', ':', (400,))
('bias_hh_l0_reverse', ':', (400,))
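So the parameters PyTorch's LSTM needs essentially match Caffe's. The one difference is that a Caffe LSTM has 3 parameter blobs, while a PyTorch LSTM has 4 per direction. The reason is that Caffe's inner product on the hidden state has no bias, so the extra PyTorch bias (bias_hh) can simply be set to 0.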
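Why zeroing bias_hh is enough: PyTorch adds both biases into every gate's pre-activation, so only their sum matters. Below is a minimal pure-Python check of that argument, with made-up per-row values; the function names are hypothetical.

```python
def torch_preact(wih_x, b_ih, whh_h, b_hh):
    """Per-gate-row pre-activation in PyTorch: W_ih x + b_ih + W_hh h + b_hh."""
    return [a + b + c + d for a, b, c, d in zip(wih_x, b_ih, whh_h, b_hh)]


def caffe_preact(wx_x, b_x, wh_h):
    """Caffe: the only bias sits on the input inner product; W_h h has none."""
    return [a + b + c for a, b, c in zip(wx_x, b_x, wh_h)]


if __name__ == "__main__":
    wx_x, b_x, wh_h = [0.5, -1.0], [0.25, 0.75], [2.0, 0.0]
    zeros = [0.0] * len(b_x)
    print(caffe_preact(wx_x, b_x, wh_h))         # [2.75, -0.25]
    print(torch_preact(wx_x, b_x, wh_h, zeros))  # [2.75, -0.25]
```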
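It did not go smoothly, though. The code given above is the version that works; before it, I had plugged in all the parameters and the outputs were still wrong. Reading the LSTM source carefully, I then found the order in which Caffe computes the gates: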
lstm_unit_layer.cpp
template <typename Dtype>
void LSTMUnitLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const int num = bottom[0]->shape(1);//1
const int x_dim = hidden_dim_ * 4;
const Dtype* C_prev = bottom[0]->cpu_data();
const Dtype* X = bottom[1]->cpu_data();
const Dtype* cont = bottom[2]->cpu_data();
Dtype* C = top[0]->mutable_cpu_data();
Dtype* H = top[1]->mutable_cpu_data();
for (int n = 0; n < num; ++n) { //1
for (int d = 0; d < hidden_dim_; ++d) {//100
const Dtype i = sigmoid(X[d]);
const Dtype f = (*cont == 0) ? 0 :
(*cont * sigmoid(X[1 * hidden_dim_ + d]));
const Dtype o = sigmoid(X[2 * hidden_dim_ + d]);
const Dtype g = tanh(X[3 * hidden_dim_ + d]);
const Dtype c_prev = C_prev[d];
const Dtype c = f * c_prev + i * g;
C[d] = c;
const Dtype tanh_c = tanh(c);
H[d] = o * tanh_c;
}
C_prev += hidden_dim_;
X += x_dim;
C += hidden_dim_;
H += hidden_dim_;
++cont;
}
}
So Caffe's gate order is i, f, o, g.
The PyTorch documentation, on the other hand, gives the weight order as:
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
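That is i, f, g, o, which is slightly different. So all I needed to do was reorder the Caffe weights to match PyTorch, which is where the code above comes from: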
## caffe i f o g
## pytorch i f g o
ww = from_numpy(weight_numpy_dict['lstm1x_r2'][0]) # [400,4096]
ww_200_if = ww[:200,:] #[200,4096]
ww_100_o = ww[200:300,:] #[100,4096]
ww_100_g = ww[300:400,:]#[100,4096]
ww_cat_ifgo = torch.cat((ww_200_if,ww_100_g,ww_100_o),0)
state_dict['lstm_lr.weight_ih_l0'] = ww_cat_ifgo
With that change it worked: the outputs match! The code I used to check accuracy is in this post:
Verifying accuracy across frameworks: https://www.cnblogs.com/yanghailin/p/15593614.html
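The same slicing can be wrapped in one function that works for all three Caffe blobs and both directions (lstm1x_r2 and lstm2x_r2). Below is a minimal NumPy sketch; the function name `caffe_ifog_to_torch_ifgo` is hypothetical.

```python
import numpy as np


def caffe_ifog_to_torch_ifgo(blob, hidden):
    """Reorder the 4*hidden gate blocks along axis 0 from Caffe's (i, f, o, g)
    to PyTorch's (i, f, g, o). Works for weights (4H x D, 4H x H) and biases (4H,)."""
    blob = np.asarray(blob)
    if blob.shape[0] != 4 * hidden:
        raise ValueError("expected %d rows, got %d" % (4 * hidden, blob.shape[0]))
    i_f = blob[:2 * hidden]           # i and f are already in place
    o = blob[2 * hidden:3 * hidden]
    g = blob[3 * hidden:]
    return np.concatenate([i_f, g, o], axis=0)


if __name__ == "__main__":
    b = np.arange(8)  # H = 2: i=[0,1], f=[2,3], o=[4,5], g=[6,7]
    print(caffe_ifog_to_torch_ifgo(b, 2).tolist())  # [0, 1, 2, 3, 6, 7, 4, 5]
```

For example, `state_dict['lstm_lr.weight_hh_l0'] = from_numpy(caffe_ifog_to_torch_ifgo(weight_numpy_dict['lstm1x_r2'][2], 100))`; bias_hh still gets zeros, since Caffe has no such bias.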
Here is the code I use to run inference and decode the recognition result:
# -*- coding: utf-8
import torch
from torch import nn
import torch.nn.functional as F
import cv2
import numpy as np
import os
from chn_tab import chn_tab

class lstm_general(nn.Module):
    def __init__(self):
        super(lstm_general, self).__init__()
        # self.conv1_1 = nn.Conv2d(3, 64, 3, 1, 1)
        self.data_bn = nn.BatchNorm2d(3)
        self.conv1 = nn.Conv2d(3, 64, 7, 2, 3)
        self.conv1_bn = nn.BatchNorm2d(64)
        self.conv1_pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
        self.layer_64_1_conv1 = nn.Conv2d(64, 64, 1, 1, 0, bias=False)
        self.layer_64_1_bn2 = nn.BatchNorm2d(64)
        self.layer_64_1_conv2 = nn.Conv2d(64, 64, 3, 1, 1, bias=False)
        self.layer_64_1_bn3 = nn.BatchNorm2d(64)
        self.layer_64_1_conv3 = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_64_1_conv_expand = nn.Conv2d(64, 256, 1, 1, 0, bias=False)
        self.layer_128_1_bn1 = nn.BatchNorm2d(256)
        self.layer_128_1_conv1 = nn.Conv2d(256, 128, 1, 1, 0, bias=False)
        self.layer_128_1_bn2 = nn.BatchNorm2d(128)
        self.layer_128_1_conv2 = nn.Conv2d(128, 128, 3, 1, 1, bias=False)
        self.layer_128_1_bn3 = nn.BatchNorm2d(128)
        self.layer_128_1_conv3 = nn.Conv2d(128, 512, 1, 1, 0, bias=False)
        self.layer_128_1_conv_expand = nn.Conv2d(256, 512, 1, 1, 0, bias=False)
        self.last_bn = nn.BatchNorm2d(512)
        # self.lstm_1 = nn.LSTM(512 * 8, 100, 1, bidirectional=False)
        self.lstm_lr = nn.LSTM(512 * 8, 100, 1, bidirectional=True)
        self.fc1x1_r2_v2_a = nn.Linear(200, 7118)

    def forward(self, inputs):
        # x = F.relu(self.bn1_1(self.conv1_1(inputs)))
        x = self.data_bn(inputs)
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = self.conv1_pool(x)  # [1,64,8,80]
        x = F.relu(self.layer_64_1_bn2(self.layer_64_1_conv1(x)))  # 1 64 8 80
        layer_64_1_conv1 = x
        x = F.relu(self.layer_64_1_bn3(self.layer_64_1_conv2(x)))
        x = self.layer_64_1_conv3(x)
        layer_64_1_conv_expand = self.layer_64_1_conv_expand(layer_64_1_conv1)
        layer_64_3_sum = x + layer_64_1_conv_expand  # 1 256 8 80
        x = F.relu(self.layer_128_1_bn1(layer_64_3_sum))
        layer_128_1_bn1 = x
        x = F.relu(self.layer_128_1_bn2(self.layer_128_1_conv1(x)))
        x = F.relu(self.layer_128_1_bn3(self.layer_128_1_conv2(x)))
        x = self.layer_128_1_conv3(x)  # 1, 512, 8, 80
        layer_128_1_conv_expand = self.layer_128_1_conv_expand(layer_128_1_bn1)  # 1, 512, 8, 80
        layer_128_4_sum = x + layer_128_1_conv_expand
        x = F.relu(self.last_bn(layer_128_4_sum))  ###acc ok
        x = F.dropout(x, p=0.7, training=False)  # 1 512 8 80
        x = x.permute(3, 0, 1, 2)  # 80 1 512 8
        x = x.reshape(80, 1, 512 * 8)  ###acc ok
        #
        # merge_lstm_rlstmx, (hn, cn) = self.lstm_r(x)
        lstm_out, (_, _) = self.lstm_lr(x)  # (80,1,200)
        # return lstm_out  # uncomment to compare the raw LSTM output against Caffe
        out = self.fc1x1_r2_v2_a(lstm_out)  # (80,1,7118)
        return out
def LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32):
    img_h, img_w, _ = img.shape
    if img_h < 2 or img_w < 2:
        return
    # if 32 == img_h and 320 == img_w:
    #     return img
    ratio_now = img_w * 1.0 / img_h
    if ratio_now <= ratio:
        # narrower than ratio:1, so pad on the right with white up to width = img_h * ratio
        mask = np.ones((img_h, int(img_h * ratio), 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img
    else:
        # wider than ratio:1, so pad at the bottom with white up to height = img_w / ratio
        mask = np.ones((int(img_w * 1.0 / ratio), img_w, 3), dtype=np.uint8) * 255
        mask[0:img_h, 0:img_w, :] = img
    mask_stand = cv2.resize(mask, (stand_w, stand_h), interpolation=cv2.INTER_LINEAR)
    # access_pixels(mask_stand)
    return mask_stand
if __name__ == '__main__':
    path_model = "/data_1/everyday/1118/pytorch_lstm_test/lstm_model.pth"
    path_img = "/data_2/project_202009/chejian/test_data/model_test/rec_general/1.jpg"
    blank_label = 7117
    prev_label = blank_label
    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
    img = cv2.imread(path_img)
    img_stand = LstmImgStandardization(img, ratio=10.0, stand_w=320, stand_h=32)
    img_stand = img_stand.astype(np.float32)
    img_stand = img_stand.transpose([2, 0, 1])
    img_stand = img_stand[None, :, :, :]
    img_stand = torch.from_numpy(img_stand)
    img_stand = img_stand.type(torch.FloatTensor)
    img_stand = img_stand.to(device)
    net = lstm_general()
    checkpoint = torch.load(path_model, map_location=device)
    net.load_state_dict(checkpoint)
    net.to(device)
    net.eval()
    # traced_script_module = torch.jit.trace(net, img_stand)
    # traced_script_module.save("./lstm.pt")
    preds = net(img_stand)
    # print("out shape=", preds.shape)
    preds_1 = preds.squeeze()  # (80, 7118): one score vector per time step
    # print("preds_1 out shape=", preds_1.shape)
    val, pos = torch.max(preds_1, 1)
    pos = pos.cpu().numpy()
    rec = ""
    # greedy CTC decoding: skip blanks and repeats of the previous step's label
    for predict_label in pos:
        if predict_label != blank_label and predict_label != prev_label:
            # print("predict_label=",predict_label)
            print(chn_tab[predict_label])
            rec += chn_tab[predict_label]
        prev_label = predict_label
    # print("rec=",rec)
    print(rec)
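The decoding loop above is greedy (best-path) CTC decoding. Below is a minimal pure-Python sketch of it as a standalone function; `ctc_greedy_decode` is a hypothetical name, and in this model the blank is 7117, the last class. The key detail is that the previous label is updated on every step, including blanks, so a blank between two identical characters keeps them apart.

```python
def ctc_greedy_decode(labels, blank, table=None):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks.

    labels: per-time-step argmax indices (e.g. the `pos` array above).
    table: optional index -> character mapping such as chn_tab."""
    out = []
    prev = blank
    for lab in labels:
        if lab != blank and lab != prev:
            out.append(table[lab] if table is not None else lab)
        prev = lab  # update on every step, including blanks
    return out if table is None else "".join(out)


if __name__ == "__main__":
    print(ctc_greedy_decode([5, 5, 0, 5, 3, 3, 0], blank=0))  # [5, 5, 3]
```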
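It worked, but my joy lasted only one day.
My ultimate goal was to run the model in C++, so I converted it to libtorch. I thought this would be easy, but it turned out not to be that simple.
My libtorch outputs stopped matching right after the LSTM layer, while everything before it matched. I could not find a fix.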
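Locating the first layer where two implementations disagree, as was done here, can be automated once each framework's per-layer outputs are dumped. Below is a minimal pure-Python sketch; `first_divergence` is a hypothetical helper, and it assumes both sets of dumps were loaded as flat lists of floats in the same element order, keyed by layer name.

```python
def first_divergence(ref, test, names, atol=1e-4):
    """Walk layers in network order and return the first whose outputs differ.

    ref, test: dicts mapping layer/blob name -> flat list of floats.
    Returns (name, max_abs_diff), or None if every listed layer matches."""
    for name in names:
        a, b = ref[name], test[name]
        if len(a) != len(b):
            return name, float("inf")  # shape mismatch counts as divergence
        max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if max_diff > atol:
            return name, max_diff
    return None


if __name__ == "__main__":
    ref = {"last_bn": [1.0, 2.0], "lstm_lr": [0.1, 0.2]}
    test = {"last_bn": [1.0, 2.00001], "lstm_lr": [0.1, 0.9]}
    print(first_divergence(ref, test, ["last_bn", "lstm_lr"])[0])  # lstm_lr
```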
It may be version-related: I had previously converted a CRNN with a newer libtorch without any problems:
https://github.com/wuzuowuyou/crnn_libtorch
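That project used PyTorch 1.7, whereas here I am on 1.0. I tried for a long time and the outputs still did not match, and I had no idea where to even start. I combed through the issues on PyTorch's GitHub and found nobody with the same problem. Short of digging through the PyTorch source myself, which is far too hard, there was nothing more I could do.
So I opened an issue on PyTorch's GitHub: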
https://github.com/pytorch/pytorch/issues/68864
I expect it will sink without a trace.
Below is my messy, unfinished code:
#include <torch/script.h> // One-stop header.
#include "torch/torch.h"
#include "torch/jit.h"
#include <memory>
#include "opencv2/opencv.hpp"
#include <queue>
#include <dirent.h>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;
// cv::Mat m_stand;
#define TABLE_SIZE 7117
static string chn_tab[TABLE_SIZE+1] = {"啊","阿","埃"
。。。
。。。
。。。
"0","1","2","3","4","5","6","7","8","9",
":",";","<","=",">","?","@",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
"[","\\","]","^","_","`",
"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
"{","|","}","~",
" "};
bool LstmImgStandardization_src_1(const cv::Mat &src, const float &ratio, int standard_w, int standard_h, cv::Mat &dst)
{
if(src.empty())return false;
float width=src.cols;
float height=src.rows;
float a=width/ height;
if(a <=ratio)
{
Mat mask(height, ratio*height, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, cv::Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
//cv::resize(dst, dst, cv::Size(standard_w,standard_h));
cv::resize(dst, dst, cv::Size(standard_w,standard_h),0,0,cv::INTER_AREA);
return true;
}
bool lstm_img_standardization(cv::Mat src, cv::Mat &dst,float ratio)
{
if(src.empty())return false;
double width=src.cols;
double height=src.rows;
double a=width/height;
if(a <=ratio)//6
{
Mat mask(height, ratio*height, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
else
{
Mat mask(width/ratio, width, CV_8UC3, Scalar(255, 255, 255));
Mat imageROI = mask(Rect(0, 0, width, height));
src.copyTo(imageROI);
dst=mask.clone();
}
// cv::resize(dst, dst, cv::Size(360,60));
cv::resize(dst, dst, cv::Size(320,32));
return true;
}
//torch::Tensor pre_img(cv::Mat &img)
//{
// cv::Mat m_stand;
// float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
// lstm_img_standardization(img, m_stand, ratio);
//
// std::vector<int64_t> sizes = {m_stand.rows, m_stand.cols, m_stand.channels()};
// torch::TensorOptions options = torch::TensorOptions().dtype(torch::kByte);
// torch::Tensor tensor_image = torch::from_blob(m_stand.data, torch::IntList(sizes), options);
// // Permute tensor, shape is (C, H, W)
// tensor_image = tensor_image.permute({2, 0, 1});
//
//
// // Convert tensor dtype to float32, and range from [0, 255] to [0, 1]
// tensor_image = tensor_image.toType(torch::ScalarType::Float);
//
//
//// tensor_image = tensor_image.div_(255.0);
//// // Subtract mean value
//// for (int i = 0; i < std::min<int64_t>(v_mean.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].sub_(v_mean[i]);
//// }
//// // Divide by std value
//// for (int i = 0; i < std::min<int64_t>(v_std.size(), tensor_image.size(0)); i++) {
//// tensor_image[i] = tensor_image[i].div_(v_std[i]);
//// }
// //[c,h,w] --> [1,c,h,w]
// tensor_image.unsqueeze_(0);
// std::cout<<tensor_image;
// return tensor_image;
//}
bool pre_img(cv::Mat &img, torch::Tensor &input_tensor)
{
static cv::Mat m_stand;
float ratio = 10.0;
// if(1 == img.channels()) { cv::cvtColor(img,img,CV_GRAY2BGR); }
lstm_img_standardization(img, m_stand, ratio);
m_stand.convertTo(m_stand, CV_32FC3);
// imshow("m_stand",m_stand);
// waitKey(0);
// Mat m_stand_new;
// m_stand.convertTo(m_stand_new, CV_32FC3);
// int rowNumber = m_stand_new.rows; // number of rows
// int colNumber = m_stand_new.cols*m_stand_new.channels(); // columns x channels = elements per row
// std::ofstream out_file("/data_1/everyday/1123/img_acc/after_CV_32FC3-float-111.txt");
// // double loop over all pixel values
// for (int i = 0; i < rowNumber; i++) // row loop
// {
// uchar *data = m_stand_new.ptr<uchar>(i); // pointer to the start of row i
// for (int j = 0; j < colNumber; j++) // column loop
// {
// // ---------[process each pixel]-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// std::cout<<"==m_stand.convertTo(m_stand, CV_32FC3);=="<<std::endl;
// while(1);
int stand_row = m_stand.rows;
int stand_cols = m_stand.cols;
input_tensor = torch::from_blob(
m_stand.data, {stand_row, stand_cols, 3}).toType(torch::kFloat);
input_tensor = input_tensor.permute({2,0,1});
input_tensor = input_tensor.unsqueeze(0);//.to(torch::kFloat);
// std::cout<<input_tensor;
return true;
}
void GetFileInDir(string dirName, vector<string> &v_path)
{
DIR* Dir = NULL;
struct dirent* file = NULL;
if (dirName[dirName.size()-1] != '/')
{
dirName += "/";
}
if ((Dir = opendir(dirName.c_str())) == NULL)
{
cerr << "Can't open Directory" << endl;
exit(1);
}
while (file = readdir(Dir))
{
//if the file is a normal file
if (file->d_type == DT_REG)
{
v_path.push_back(dirName + file->d_name);
}
//if the file is a directory
else if (file->d_type == DT_DIR && strcmp(file->d_name, ".") != 0 && strcmp(file->d_name, "..") != 0)
{
GetFileInDir(dirName + file->d_name,v_path);
}
}
closedir(Dir);
}
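If a C++17 compiler is available, the recursive `opendir`/`readdir` walk in `GetFileInDir` can be sketched with `std::filesystem::recursive_directory_iterator`. That also avoids relying on `d_type`, which some filesystems report as `DT_UNKNOWN`. This is an alternative, not what the original program uses, and `list_regular_files` is a hypothetical name.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical C++17 replacement for GetFileInDir: collect every regular file
// under dir_name, descending into subdirectories. The iterator never yields
// "." or "..", so no explicit check is needed. It throws
// fs::filesystem_error if dir_name cannot be opened.
std::vector<std::string> list_regular_files(const std::string& dir_name) {
    std::vector<std::string> paths;
    for (const auto& entry : fs::recursive_directory_iterator(dir_name)) {
        if (entry.is_regular_file()) {
            paths.push_back(entry.path().string());
        }
    }
    return paths;
}
```

Throwing replaces the original `exit(1)`, so the caller decides how to handle a missing directory. With older GCC (8.x) you may need to link `-lstdc++fs`.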
string str_replace(const string &str,const string &str_find,const string &str_replacee)
{
string str_tmp = str;
if (str_find.empty())
{
return str_tmp; // an empty pattern would otherwise loop forever
}
size_t pos = str_tmp.find(str_find);
while (pos != string::npos)
{
str_tmp.replace(pos, str_find.length(), str_replacee);
// resume the search after the inserted text so it is never re-matched
pos = str_tmp.find(str_find, pos + str_replacee.length());
}
return str_tmp;
}
string get_ans(const string path)
{
int pos_1 = path.find_last_of("_");
int pos_2 = path.find_last_of(".");
string ans = path.substr(pos_1+1,pos_2-pos_1-1);
ans = str_replace(ans,"@","/");
return ans;
}
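`get_ans` assumes the ground-truth text is encoded in the file name as the part between the last `_` and the last `.`, with `@` standing in for `/` (which cannot appear in a file name). Below is a compact, self-contained sketch of the same idea, with a guard for names that lack the `_ ... .` pattern. `label_from_path` and `replace_all` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `from` with `to`, scanning left to right and
// resuming after each inserted `to` so replacements are never re-scanned.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;  // an empty pattern would loop forever
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// ".../bxd_39_<label>.jpg" -> "<label>", with '@' mapped back to '/'.
// Returns an empty string if there is no '_' followed later by a '.'.
std::string label_from_path(const std::string& path) {
    const std::size_t us = path.find_last_of('_');
    const std::size_t dot = path.find_last_of('.');
    if (us == std::string::npos || dot == std::string::npos || dot < us) return "";
    return replace_all(path.substr(us + 1, dot - us - 1), "@", "/");
}
```

Note that the rule only looks at the last `_`, so a file name without an underscore inside a directory such as `/data_1/` would still produce a (wrong) non-empty label. The naming convention has to be respected by the data set.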
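bool save_tensor_txt(torch::Tensor tensor_in_,string path_txt)
{
ofstream outfile(path_txt);
if (!outfile)
{
std::cout << "Open output file failed: " << path_txt << std::endl;
return false;
}
// Flatten to 1-D float on the CPU; contiguous() guards against permuted
// (non-contiguous) tensors such as the network input, which view() rejects
torch::Tensor tensor_in = tensor_in_.detach().to(torch::kCPU).to(torch::kFloat).contiguous().view({-1});
auto result_data = tensor_in.accessor<float, 1>();
for(int64_t i=0;i<result_data.size(0);i++)
{
float val = result_data[i];
// std::cout<<"val="<<val<<std::endl;
outfile<<val<<std::endl;
}
outfile.close();
return true;
}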
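Dumping tensors one value per line, here and in the Caffe-side `save_data_to_txt`, exists so the two frameworks can be compared numerically. Here is a minimal sketch of such a check. It assumes both files hold one float per line with no shape header. `save_tensor_txt` writes none, but a Caffe dump saved with `b_save_shape = true` starts with a shape line that would have to be skipped first. `max_abs_diff` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cmath>
#include <fstream>
#include <stdexcept>
#include <string>

// Read two text dumps (one float per line) in lockstep and return the largest
// absolute element-wise difference. Throws if a file cannot be opened or if
// the two files contain different numbers of values.
double max_abs_diff(const std::string& path_a, const std::string& path_b) {
    std::ifstream fa(path_a), fb(path_b);
    if (!fa || !fb) throw std::runtime_error("cannot open dump file");
    double a = 0.0, b = 0.0, worst = 0.0;
    while (true) {
        const bool ok_a = static_cast<bool>(fa >> a);
        const bool ok_b = static_cast<bool>(fb >> b);
        if (ok_a != ok_b) throw std::runtime_error("dump lengths differ");
        if (!ok_a) break;  // both files exhausted at the same time
        worst = std::max(worst, std::fabs(a - b));
    }
    return worst;
}
```

Both dump functions use the stream's default precision of 6 significant digits, so differences around that level are formatting noise rather than a conversion error.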
int main()
{
std::string path_pt = "/data_1/everyday/1118/pytorch_lstm_test/lstmunidirectional20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm20211124.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm10000.pt";//"/data_1/everyday/1118/pytorch_lstm_test/lstm.pt";
std::string path_img_dir = "/data_1/2020biaozhushuju/2021_rec/general/test";//"/data_1/everyday/1118/pytorch_lstm_test/test_data";
int blank_label = 7117;
std::ifstream list("/data_1/everyday/1123/list.txt");
int standard_w = 320;
int standard_h = 32;
// vector<string> v_path;
// GetFileInDir(path_img_dir, v_path);
// for(int i=0;i<v_path.size();i++)
// {
// std::cout<<i<<" "<<v_path[i]<<std::endl;
// }
torch::Device m_device(torch::kCUDA);
// torch::Device m_device(torch::kCPU);
std::shared_ptr<torch::jit::script::Module> m_model = torch::jit::load(path_pt);
torch::NoGradGuard no_grad;
m_model->to(m_device);
std::cout<<"success load model"<<std::endl;
int cnt_all = 0;
int cnt_right = 0;
double start = getTickCount();
string file;
while(list >> file)
{
file = "/data_1/everyday/1123/img/bxd_39_發動機號碼.jpg"; // debug: ignore the list entry and always test this one image
cout<<cnt_all++<<" :: "<<file<<endl;
string jpg=".jpg";
string::size_type idx = file.find( jpg );
if ( idx == string::npos )
continue;
int pos_1 = file.find_last_of("_");
int pos_2 = file.find_last_of(".");
string answer = file.substr(pos_1+1,pos_2-pos_1-1);
cv::Mat img = cv::imread(file);
// int rowNumber = img.rows; // number of rows
// int colNumber = img.cols*img.channels(); // columns x channels = values per row
// std::ofstream out_file("/data_1/everyday/1123/img_acc/libtorch_img.txt");
// // double loop over every pixel value
// for (int i = 0; i < rowNumber; i++) // row loop
// {
// uchar *data = img.ptr<uchar>(i); // start of row i
// for (int j = 0; j < colNumber; j++) // column loop
// {
// // ---------[process each pixel]-------------
// int pix = int(data[j]);
// out_file << pix << std::endl;
// }
// }
//
// out_file.close();
// while(1);
torch::Tensor tensor_input;
pre_img(img, tensor_input);
tensor_input = tensor_input.to(m_device);
tensor_input.print();
std::cout<<tensor_input[0][2][12][25]<<std::endl;
std::cout<<tensor_input[0][1][15][100]<<std::endl;
std::cout<<tensor_input[0][0][16][132]<<std::endl;
std::cout<<tensor_input[0][1][17][156]<<std::endl;
std::cout<<tensor_input[0][2][5][256]<<std::endl;
std::cout<<tensor_input[0][0][14][205]<<std::endl;
save_tensor_txt(tensor_input, "/data_1/everyday/1124/acc/libtorch_input-100.txt");
torch::Tensor output = m_model->forward({tensor_input}).toTensor();
output.print();
// output = output.squeeze();//80,7118
// output.print();
save_tensor_txt(output, "/data_1/everyday/1124/acc/libtorch-out-100.txt");
//// std::cout<<output<<std::endl;
while(1); // debug: halt after dumping the first output for comparison; remove to run the decoding below
//
torch::Tensor index = torch::argmax(output,1).cpu();//.to(torch::kInt);
index.print();
// std::cout<<index<<std::endl;
// while(1);
int prev_label = blank_label;
string result;
auto result_data = index.accessor<int64_t, 1>(); // argmax returns kLong (int64_t)
for(int i=0;i<result_data.size(0);i++)
{
// std::cout<<result_data[i]<<std::endl;
int predict_label = result_data[i];
if (predict_label != blank_label && predict_label != prev_label )
{
result = result + chn_tab[predict_label];
}
prev_label = predict_label;
}
cout << "answer: " << answer << endl;
cout << "result : " << result << endl;
imshow("src",img);
waitKey(0);
// while(1);
}
// for(int i=0;i<v_path.size();i++)
// {
// cnt_all += 1;
// std::string path_img = v_path[i];
// string ans = get_ans(path_img);
// std::cout<<i<<" path="<<path_img<<" ans="<<ans<<std::endl;
// cv::Mat img = cv::imread(path_img);
// torch::Tensor input = pre_img(img, v_mean, v_std, standard_w, standard_h);
// input = input.to(m_device);
// torch::Tensor output = m_module.forward({input}).toTensor();
//
// std::string rec = get_label(output);
//#if 1 //for show
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// cv::imshow("img",img);
// cv::waitKey(0);
//#endif
//
//#if 0 //In order to test the accuracy
// std::cout<<"rec="<<rec<<std::endl;
// std::cout<<"ans="<<ans<<std::endl;
// if(ans == rec)
// {
// cnt_right += 1;
// }
// std::cout<<"cnt_right="<<cnt_right<<std::endl;
// std::cout<<"cnt_all="<<cnt_all<<std::endl;
// std::cout<<"ratio="<<cnt_right * 1.0 / cnt_all<<std::endl;
//#endif
// }
// double time_consume = ((double)getTickCount() - start) / getTickFrequency();
// std::cout<<"ave time="<< time_consume * 1.0 / cnt_all * 1000 <<"ms"<<std::endl;
return 0;
}
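The label loop in `main` is greedy (best-path) CTC decoding. It takes the argmax class at each time step, collapses consecutive repeats, and drops the blank (`blank_label = 7117`, the last of the 7118 classes). Here is a stdlib-only sketch of that rule on a `T x C` score matrix. It returns label indices instead of looking them up in `chn_tab`, and `ctc_greedy_decode` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Best-path CTC decoding: argmax per time step, merge consecutive repeats,
// then remove blanks. scores[t][c] is the score of class c at time step t.
std::vector<int> ctc_greedy_decode(const std::vector<std::vector<float>>& scores,
                                   int blank_label) {
    std::vector<int> labels;
    int prev_label = blank_label;  // same initialisation as the loop in main()
    for (const auto& step : scores) {
        const int best = static_cast<int>(
            std::distance(step.begin(), std::max_element(step.begin(), step.end())));
        if (best != blank_label && best != prev_label) {
            labels.push_back(best);
        }
        prev_label = best;  // a blank between two equal labels keeps both
    }
    return labels;
}
```

Updating `prev_label` on blanks as well is what lets a genuinely doubled character (label, blank, label) survive, while a label held over several frames is emitted once.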
A good memory is no match for a worn keyboard: little by little, accumulate, improve!