Walkthrough of the PSPNet Test Code
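The PSPNet test code is published on the GitHub page of the author of the original paper "Pyramid Scene Parsing Network": https://github.com/hszhao/PSPNet. After downloading and extracting it, open the evaluation folder. The six .m files (MATLAB code) in it are the code used for testing, as shown in the figure below:
1. Code walkthrough
In the figure above, run.sh is the launch script. Its contents are explained below: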
matlab -nodisplay -r "eval_all;exit" 2>&1 | tee matlab.log
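Here -nodisplay and -r are both options of the matlab command. -nodisplay starts the JVM but not the desktop, so nothing display-related is started; the effect is shown in the figure below:
-r executes the statements that follow it; multiple statements are separated by ';', as in "eval_all;exit" above.
'2>&1' redirects standard error to standard output (1 denotes standard output, 2 denotes standard error).
'| tee matlab.log' pipes that combined output into tee, which prints it in the terminal and also writes it to the file matlab.log, so everything printed while the program runs (including error messages) ends up in this log.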
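To make the effect of `2>&1 | tee matlab.log` concrete, here is a minimal Python sketch (an illustration only, not part of the PSPNet repository; the function name run_and_tee is hypothetical). It merges a child process's stderr into its stdout and copies every line both to the terminal and to a log file.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd (a list of arguments), merge its stderr into stdout like `2>&1`,
    and copy every output line to our stdout and to log_path like `| tee`."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # equivalent of 2>&1
        text=True,
    )
    captured = []
    with open(log_path, "w") as log:  # tee without -a overwrites the file
        for line in proc.stdout:
            sys.stdout.write(line)  # tee part 1: echo to the terminal
            log.write(line)         # tee part 2: write to the log file
            captured.append(line)
    proc.stdout.close()
    proc.wait()
    return proc.returncode, captured
```

Assuming matlab is on the PATH, run_and_tee(["matlab", "-nodisplay", "-r", "eval_all;exit"], "matlab.log") would roughly mirror run.sh; passing the arguments as a list avoids shell quoting of "eval_all;exit".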
The command also shows that testing starts from eval_all.m, so we begin the analysis with that file.
The original source code of eval_all.m is as follows:
%{
Variables need to be modified: data_root, eval_list;
and the default GPUs used for evaluation are with ID [0:3],
modify variable 'gpu_id_array' if needed.
%}
close all; clc; clear;
addpath('../matlab'); %add matcaffe path
addpath('visualizationCode');
data_name = 'ADE20K'; %set to 'VOC2012' or 'cityscapes' for relevant datasets
switch data_name
    case 'ADE20K'
        isVal = true; %evaluation on valset
        step = 500; %equals to number of images divide num of GPUs in testing e.g. 500=2000/4
        data_root = '/data2/hszhao/dataset/ADEChallengeData2016'; %root path of dataset
        eval_list = 'list/ADE20K_val.txt'; %evaluation list, refer to lists in folder 'samplelist'
        save_root = 'mc_result/ADE20K/val/pspnet50_473/'; %root path to store the result image
        model_weights = 'model/pspnet50_ADE20K.caffemodel';
        model_deploy = 'prototxt/pspnet50_ADE20K_473.prototxt';
        fea_cha = 150; %number of classes
        base_size = 512; %based size for scaling
        crop_size = 473; %crop size fed into network
        data_class = 'objectName150.mat'; %class name
        data_colormap = 'color150.mat'; %color map
    case 'VOC2012'
        isVal = false; %evaluation on testset
        step = 364; %364=1456/4
        data_root = '/data2/hszhao/dataset/VOC2012';
        eval_list = 'list/VOC2012_test.txt';
        save_root = 'mc_result/VOC2012/test/pspnet101_473/';
        model_weights = 'model/pspnet101_VOC2012.caffemodel';
        model_deploy = 'prototxt/pspnet101_VOC2012_473.prototxt';
        fea_cha = 21;
        base_size = 512;
        crop_size = 473;
        data_class = 'objectName21.mat';
        data_colormap = 'colormapvoc.mat';
    case 'cityscapes'
        isVal = true;
        step = 125; %125=500/4
        data_root = '/data2/hszhao/dataset/cityscapes';
        eval_list = 'list/cityscapes_val.txt';
        save_root = 'mc_result/cityscapes/val/pspnet101_713/';
        model_weights = 'model/pspnet101_cityscapes.caffemodel';
        model_deploy = 'prototxt/pspnet101_cityscapes_713.prototxt';
        fea_cha = 19;
        base_size = 2048;
        crop_size = 713;
        data_class = 'objectName19.mat';
        data_colormap = 'colormapcs.mat';
end
skipsize = 0; %skip several images in the list
is_save_feat = false; %set to true if final feature map is needed (not suggested for storage consuming)
save_gray_folder = [save_root 'gray/']; %path for predicted gray image
save_color_folder = [save_root 'color/']; %path for predicted color image
save_feat_folder = [save_root 'feat/']; %path for predicted feature map
scale_array = [1]; %set to [0.5 0.75 1 1.25 1.5 1.75] for multi-scale testing
mean_r = 123.68; %means to be subtracted and the given values are used in our training stage
mean_g = 116.779;
mean_b = 103.939;
acc = double.empty;
iou = double.empty;
gpu_id_array = [0:3]; %multi-GPUs for parfor testing, if number of GPUs is changed, remember to change the variable 'step'
runID = 1;
gpu_num = size(gpu_id_array,2);
index_array = [(runID-1)*gpu_num+1:runID*gpu_num];
parfor i = 1:gpu_num %change 'parfor' to 'for' if single GPU testing is used
    eval_sub(data_name,data_root,eval_list,model_weights,model_deploy,fea_cha,base_size,crop_size,data_class,data_colormap, ...
        is_save_feat,save_gray_folder,save_color_folder,save_feat_folder,gpu_id_array(i),index_array(i),step,skipsize,scale_array,mean_r,mean_g,mean_b);
end
if(isVal)
    eval_acc(data_name,data_root,eval_list,save_gray_folder,data_class,fea_cha);
end
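The annotated version follows (data_name, the paths, and the GPU settings have been adapted to my setup of two GPUs and the VOC2012 test set):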
%{
Before running this code, some parameters must be adjusted: data_root (root directory where the data is stored),
eval_list (path of the test/validation index file), and gpu_id_array (IDs of the GPUs to use; change it to match
the number of GPUs you have and update the step parameter accordingly). Adjust the other paths as appropriate.
Variables need to be modified: data_root, eval_list;
and the default GPUs used for evaluation are with ID [0:3],
modify variable 'gpu_id_array' if needed.
%}
close all; clc; clear;
addpath('/home/b622/PSPNet-master/matlab'); %path of matcaffe (change it to your own path; an absolute path is best)
addpath('visualizationCode'); %path of the visualization helpers (needed here mainly for the per-class color information)
data_name = 'VOC2012'; %choose the dataset to test (I used the VOC2012 test set, which requires registration to download)
switch data_name
    case 'ADE20K'
        isVal = true; %evaluation on valset
        step = 500; %equals to number of images divide num of GPUs in testing e.g. 500=2000/4
        data_root = '/data2/hszhao/dataset/ADEChallengeData2016'; %root path of dataset
        eval_list = 'list/ADE20K_val.txt'; %evaluation list, refer to lists in folder 'samplelist'
        save_root = 'mc_result/ADE20K/val/pspnet50_473/'; %root path to store the result image
        model_weights = 'model/pspnet50_ADE20K.caffemodel';
        model_deploy = 'prototxt/pspnet50_ADE20K_473.prototxt';
        fea_cha = 150; %number of classes
        base_size = 512; %based size for scaling
        crop_size = 473; %crop size fed into network
        data_class = 'objectName150.mat'; %class name
        data_colormap = 'color150.mat'; %color map
    case 'VOC2012'
        isVal = false; %false because the VOC2012 test set has no public annotations (no labels y), so only the segmentation results can be produced, not the accuracy
        step = 728; %728=1456/2: I have only two GPUs, so the test set is split in half and each GPU tests one half
        %Fill in the paths strictly following the example below:
        %data_root is the root directory of the dataset and must not end with '/', because every entry of the test index file
        %(e.g. VOC2012_test.txt under PSPNet-master/evaluation/samplelist) has the form '/JPEGImages/2008_000006.jpg';
        %therefore data_root must contain the folder JPEGImages and must not end with '/'
        %eval_list is the test index file and must be placed in the data_root directory
        data_root = '/media/b622/My Passport/VOC2012'; %change to your own data root directory
        eval_list = '/VOC2012_test.txt';
        save_root = '/media/b622/My Passport/VOC2012test/'; %choose freely, but it must end with '/'
        model_weights = '/media/b622/My Passport/SPSNet/pspnet101_VOC2012.caffemodel'; %path of the model weights (available under PSPNet-master/evaluation/model)
        model_deploy = 'prototxt/pspnet101_VOC2012_473.prototxt'; %path of the model's deploy prototxt (available under PSPNet-master/evaluation/prototxt)
        fea_cha = 21; %VOC2012 has 21 classes
        base_size = 512; %base image size (multi-scale testing multiplies it by the entries of scale_array)
        crop_size = 473; %crop size (the network was trained on 473×473 inputs, so images are cropped to this size; see the analysis of the other files below)
        data_class = 'objectName21.mat'; %objectName21.mat stores the name of each class
        data_colormap = 'colormapvoc.mat'; %colormapvoc.mat is the palette holding each class's RGB color (normalized to 0-1, as MATLAB colormaps are)
    case 'cityscapes'
        isVal = true;
        step = 125; %125=500/4
        data_root = '/data2/hszhao/dataset/cityscapes';
        eval_list = 'list/cityscapes_val.txt';
        save_root = 'mc_result/cityscapes/val/pspnet101_713/';
        model_weights = '/media/b622/My Passport/SPSNet/pspnet101_cityscapes.caffemodel';
        model_deploy = 'prototxt/pspnet101_cityscapes_713.prototxt';
        fea_cha = 19;
        base_size = 2048;
        crop_size = 713;
        data_class = 'objectName19.mat';
        data_colormap = 'colormapcs.mat';
end
skipsize = 0; %skip several images in the list (set to 0 here so that no test image is skipped)
is_save_feat = false; %set to true if final feature map is needed (not suggested for storage consuming); whether to save the feature data
save_gray_folder = [save_root 'gray/']; %path for predicted gray image (prediction saved as a grayscale label image)
save_color_folder = [save_root 'color/']; %path for predicted color image (prediction saved as a color image)
save_feat_folder = [save_root 'feat/']; %path for predicted feature map (actually saved as data, not as an image)
scale_array = [1]; %set to [0.5 0.75 1 1.25 1.5 1.75] for multi-scale testing, i.e. testing at several sizes; here only the original scale is used
%RGB means of the training set (the network was trained on mean-subtracted inputs, so the same means are subtracted at test time)
mean_r = 123.68; %means to be subtracted and the given values are used in our training stage
mean_g = 116.779;
mean_b = 103.939;
acc = double.empty;
iou = double.empty;
gpu_id_array = [0:1]; %only two GPUs, hence 0-1; after changing this, remember to change the parameter 'step'
runID = 1;
gpu_num = size(gpu_id_array,2);
index_array = [(runID-1)*gpu_num+1:runID*gpu_num]; %1-based index of each GPU's slice of the list (MATLAB indices start at 1)
%parfor runs the loop iterations in parallel on the workers of a parallel pool; for a single GPU, change parfor to an ordinary for
parfor i = 1:gpu_num
    eval_sub(data_name,data_root,eval_list,model_weights,model_deploy,fea_cha,base_size,crop_size,data_class,data_colormap, ...
        is_save_feat,save_gray_folder,save_color_folder,save_feat_folder,gpu_id_array(i),index_array(i),step,skipsize,scale_array,mean_r,mean_g,mean_b);
end
if(isVal)
    eval_acc(data_name,data_root,eval_list,save_gray_folder,data_class,fea_cha); %call eval_acc to compute the accuracy and mean IoU on the validation set
end
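The way eval_all.m splits the image list across GPUs (step, index_array, and the loop bounds in eval_sub.m) can be checked with the minimal Python sketch below. gpu_slices is a hypothetical helper written for this walkthrough, not part of the repository; it mirrors the MATLAB index formulas and raises an error where the MATLAB loop would read past the end of the list.

```python
def gpu_slices(num_images, gpu_ids, step, skipsize=0, run_id=1):
    """Return {gpu_id: (first, last)}: the 1-based, inclusive image range each GPU
    processes, following eval_all.m / eval_sub.m:
        index_array = (runID-1)*gpu_num+1 : runID*gpu_num
        i           = skipsize+(index-1)*step+1 : skipsize+index*step
    """
    gpu_num = len(gpu_ids)
    slices = {}
    for pos, gpu_id in enumerate(gpu_ids, start=1):
        index = (run_id - 1) * gpu_num + pos  # entry of index_array for this GPU
        first = skipsize + (index - 1) * step + 1
        last = skipsize + index * step
        if last > num_images:
            # in MATLAB, reading list{i} past numel(list) would fail here
            raise ValueError(f"GPU {gpu_id}: images {first}-{last} exceed list length {num_images}")
        slices[gpu_id] = (first, last)
    return slices


print(gpu_slices(1456, [0, 1], 728))  # {0: (1, 728), 1: (729, 1456)}
```

With the 1456 VOC2012 test images and two GPUs, step = 728 covers the list exactly. In general step should satisfy skipsize + step × (number of GPUs) = list length: a smaller step leaves images untested, a larger one makes the last worker run past the end.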
The last statements of eval_all.m call eval_sub(), which is defined in eval_sub.m. The annotated function:
function eval_sub(data_name,data_root,eval_list,model_weights,model_deploy,fea_cha,base_size,crop_size,data_class,data_colormap, ...
    is_save_feat,save_gray_folder,save_color_folder,save_feat_folder,gpu_id,index,step,skipsize,scale_array,mean_r,mean_g,mean_b)
list = importdata(fullfile(data_root,eval_list)); %fullfile joins the path parts with the file separator
load(data_class); %load the class names (the name of each class, e.g. airplane)
load(data_colormap); %load the palette; its variable (e.g. colormapvoc) appears in the workspace
if(~isdir(save_gray_folder)) %create the output folder if it does not exist
    mkdir(save_gray_folder);
end
if(~isdir(save_color_folder))
    mkdir(save_color_folder);
end
if(~isdir(save_feat_folder) && is_save_feat)
    mkdir(save_feat_folder);
end
phase = 'test'; %run with phase test (so that dropout isn't applied)
if ~exist(model_weights, 'file')
    error('Model missing!');
end
caffe.reset_all();
caffe.set_mode_gpu();
caffe.set_device(gpu_id); %select the GPU given by gpu_id
net = caffe.Net(model_deploy, model_weights, phase);
for i = skipsize+(index-1)*step+1:skipsize+index*step %this worker's slice of the list; skipsize+gpu_num*step must not exceed numel(list)
    fprintf(1, 'processing %d (%d)...\n', i, numel(list));
    str = strsplit(list{i});
    img = imread(fullfile(data_root,str{1}));
    if(size(img,3) < 3) %for gray image: replicate the single channel into three identical channels
        im_r = img;
        im_g = img;
        im_b = img;
        img = cat(3,im_r,im_g,im_b); %cat concatenates the arrays along dimension 3
    end
    ori_rows = size(img,1); %original height
    ori_cols = size(img,2); %original width
    data_all = zeros(ori_rows,ori_cols,fea_cha,'single');
    for j = 1:size(scale_array,2)
        long_size = base_size*scale_array(j) + 1;
        new_rows = long_size;
        new_cols = long_size;
        %scale the long side to long_size and the short side proportionally (aspect ratio preserved)
        if ori_rows > ori_cols
            new_cols = round(long_size/single(ori_rows)*ori_cols);
        else
            new_rows = round(long_size/single(ori_cols)*ori_rows);
        end
        img_scale = imresize(img,[new_rows new_cols],'bilinear'); %resize with bilinear interpolation
        data_all = data_all + scale_process(net,img_scale,fea_cha,crop_size,ori_rows,ori_cols,mean_r,mean_g,mean_b);
    end
    data_all = data_all/size(scale_array,2); %average the class probabilities over all scales
    data = data_all; %already exp process
    img_fn = strsplit(str{1},'/'); %split the image path at '/' (the path ends with the file name)
    img_fn = img_fn{end}; %the last piece is the file name, including its extension
    img_fn = img_fn(1:end-4); %drop the 4-character extension, e.g. /JPEGImages/2008_000006.jpg -> 2008_000006
    %max(data,[],3) takes the maximum over the fea_cha channels (for VOC2012, data holds 21 score maps of size
    %[ori_rows,ori_cols]); the channel index of the largest score at each pixel is the class assigned to that pixel
    [~,imPred] = max(data,[],3); %imPred holds the (1-based) class index of every pixel
    imPred = uint8(imPred); %convert to 8-bit unsigned integers
    switch data_name
        case 'ADE20K'
            rgbPred = colorEncode(imPred, colors);
            imwrite(imPred,[save_gray_folder img_fn '.png']);
            imwrite(rgbPred,[save_color_folder img_fn '.png']);
        case 'VOC2012'
            imPred = imPred - 1; %VOC2012 labels are 0-20 (21 classes), but index 1 in imPred corresponds to label 0 (and so on), so subtract 1
            imwrite(imPred,[save_gray_folder img_fn '.png']);
            imwrite(imPred,colormapvoc,[save_color_folder img_fn '.png']); %write as an indexed image colored with the palette colormapvoc
        case 'cityscapes'
            imPred = imPred - 1;
            imwrite(imPred,[save_gray_folder img_fn '.png']);
            imwrite(imPred,colormapcs,[save_color_folder img_fn '.png']);
    end
    if(is_save_feat)
        save([save_feat_folder img_fn],'data');
    end
end
caffe.reset_all();
end
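Two pieces of arithmetic in eval_sub.m, the target size of img_scale and the conversion from probability maps to a label image, are sketched below in plain Python. This is an illustration under stated assumptions: images are nested lists, the helper names are hypothetical, and matlab_round reproduces MATLAB's round-half-away-from-zero only for non-negative inputs.

```python
import math


def matlab_round(x):
    """MATLAB round() for x >= 0: halves round away from zero
    (Python's built-in round() would turn 1024.5 into 1024)."""
    return math.floor(x + 0.5)


def scaled_size(ori_rows, ori_cols, base_size, scale):
    """Size of img_scale in eval_sub.m: the long side becomes base_size*scale + 1
    and the short side keeps the aspect ratio. Assumes base_size*scale is an
    integer (true for the scales listed in eval_all.m); computed in double
    precision, whereas MATLAB uses single."""
    long_size = matlab_round(base_size * scale + 1)
    if ori_rows > ori_cols:
        return long_size, matlab_round(long_size / ori_rows * ori_cols)
    return matlab_round(long_size / ori_cols * ori_rows), long_size


def label_map(prob, offset):
    """prob[r][c] holds the class probabilities of pixel (r, c). Like
    [~,imPred] = max(data,[],3), take the 1-based index of the largest value
    (the first one on ties), then subtract offset (1 for VOC2012/cityscapes, 0 for ADE20K)."""
    return [[max(range(len(px)), key=px.__getitem__) + 1 - offset for px in row]
            for row in prob]


print(scaled_size(500, 375, 512, 1))  # (513, 385)
```

A 500×375 VOC image therefore becomes 513×385 at scale 1, which is why the long side always exceeds crop_size = 473 in that setting.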
The scale_process() function called there is defined in scale_process.m. The annotated function:
function data_output = scale_process(net,img_scale,fea_cha,crop_size,ori_rows,ori_cols,mean_r,mean_g,mean_b)
data_output = zeros(ori_rows,ori_cols,fea_cha,'single'); %array that will hold the prediction
new_rows = size(img_scale,1);
new_cols = size(img_scale,2);
long_size = new_rows;
short_size = new_cols;
if(new_cols > long_size)
    long_size = new_cols;
    short_size = new_rows;
end
if(long_size <= crop_size)
    %pre_img() pads the image up to crop_size x crop_size, subtracts the mean and converts it to Caffe's blob layout
    input_data = pre_img(img_scale,crop_size,mean_r,mean_g,mean_b);
    score = caffe_process(input_data,net); %forward pass; one score map per class (21 for VOC2012)
    score = score(1:new_rows,1:new_cols,:); %pre_img pads with 'post', so the image occupies the top-left [1:new_rows,1:new_cols] region
else %the long side exceeds crop_size: test crop_size x crop_size tiles one by one and merge the results
    stride_rate = 2/3;
    stride = ceil(crop_size*stride_rate); %sliding step of the window
    img_pad = img_scale;
    if(short_size < crop_size) %the long side exceeds crop_size but the short side does not: pad the short side
        if(new_rows < crop_size) %height is the short side: pad the height (same scheme as in pre_img)
            im_r = padarray(img_pad(:,:,1),[crop_size-new_rows,0],mean_r,'post');
            im_g = padarray(img_pad(:,:,2),[crop_size-new_rows,0],mean_g,'post');
            im_b = padarray(img_pad(:,:,3),[crop_size-new_rows,0],mean_b,'post');
            img_pad = cat(3,im_r,im_g,im_b);
        end
        if(new_cols < crop_size) %width is the short side: pad the width (same scheme as in pre_img)
            im_r = padarray(img_pad(:,:,1),[0,crop_size-new_cols],mean_r,'post');
            im_g = padarray(img_pad(:,:,2),[0,crop_size-new_cols],mean_g,'post');
            im_b = padarray(img_pad(:,:,3),[0,crop_size-new_cols],mean_b,'post');
            img_pad = cat(3,im_r,im_g,im_b);
        end
    end
    pad_rows = size(img_pad,1);
    pad_cols = size(img_pad,2);
    h_grid = ceil(single(pad_rows-crop_size)/stride) + 1; %number of window positions vertically
    w_grid = ceil(single(pad_cols-crop_size)/stride) + 1; %number of window positions horizontally
    data_scale = zeros(pad_rows,pad_cols,fea_cha,'single');
    count_scale = zeros(pad_rows,pad_cols,fea_cha,'single');
    %slide the window with the given stride and run a forward pass on every crop
    for grid_yidx=1:h_grid
        for grid_xidx=1:w_grid
            s_x = (grid_xidx-1) * stride + 1; %x of the crop's start corner (start_x)
            s_y = (grid_yidx-1) * stride + 1; %y of the crop's start corner (start_y)
            e_x = min(s_x + crop_size - 1, pad_cols); %x of the crop's end corner (end_x)
            e_y = min(s_y + crop_size - 1, pad_rows); %y of the crop's end corner (end_y)
            s_x = e_x - crop_size + 1; %recompute the start so that every crop is exactly crop_size x crop_size
            s_y = e_y - crop_size + 1;
            img_sub = img_pad(s_y:e_y,s_x:e_x,:); %extract the crop
            count_scale(s_y:e_y,s_x:e_x,:) = count_scale(s_y:e_y,s_x:e_x,:) + 1; %crops overlap, so count how many times each pixel is predicted
            input_data = pre_img(img_sub,crop_size,mean_r,mean_g,mean_b); %preprocess the crop
            data_scale(s_y:e_y,s_x:e_x,:) = data_scale(s_y:e_y,s_x:e_x,:) + caffe_process(input_data,net); %predict and accumulate
        end
    end
    score = data_scale./count_scale; %average prediction at each pixel
    score = score(1:new_rows,1:new_cols,:); %the padding was added with 'post', so keep the top-left [1:new_rows,1:new_cols] region
end
data_output = imresize(score,[ori_rows ori_cols],'bilinear'); %resize back to the original image size, again with bilinear interpolation
data_output = bsxfun(@rdivide, data_output, sum(data_output, 3)); %renormalize so that the fea_cha class probabilities of each pixel sum to 1 after resizing
end
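The window placement in scale_process.m can be reproduced along one axis with the sketch below. window_starts is a hypothetical helper; the same formulas apply to rows (h_grid) and columns (w_grid). It assumes the axis has already been padded to at least crop_size.

```python
import math


def window_starts(length, crop_size, stride_rate=2 / 3):
    """1-based start indices of the sliding windows along one axis, following
    scale_process.m:
        stride = ceil(crop_size*stride_rate)
        grid   = ceil((length-crop_size)/stride) + 1
        e = min(s + crop_size - 1, length); s = e - crop_size + 1   (back-off)
    """
    stride = math.ceil(crop_size * stride_rate)
    grid = math.ceil((length - crop_size) / stride) + 1
    starts = []
    for idx in range(1, grid + 1):
        s = (idx - 1) * stride + 1
        e = min(s + crop_size - 1, length)   # clip the window at the border...
        starts.append(e - crop_size + 1)     # ...and shift it back to full size
    return starts


print(window_starts(513, 473))  # [1, 41]
```

For VOC2012 at scale 1 the long side is 513, so that axis gets two windows starting at 1 and 41 (stride 316); a 2049-pixel Cityscapes width with crop_size 713 gets four windows starting at 1, 477, 953 and 1337, the last one shifted back from 1429.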
The pre_img() and caffe_process() functions called by scale_process() are explained below:
(1) pre_img()
function im_pad = pre_img(im,crop_size,mean_r,mean_g,mean_b)
row = size(im,1);
col = size(im,2);
im_pad = single(im); %convert to single precision
if(size(im_pad,3) < 3) %for a gray image, replicate it into three identical channels
    im_r = im_pad;
    im_g = im_pad;
    im_b = im_pad;
    im_pad = cat(3,im_r,im_g,im_b);
end
if(row < crop_size)
    %padarray is MATLAB's padding function and 'post' pads after the last element of each dimension:
    %[crop_size-row,0] appends crop_size-row rows after the last row and 0 columns, filled with the channel mean mean_r/g/b
    im_r = padarray(im_pad(:,:,1),[crop_size-row,0],mean_r,'post');
    im_g = padarray(im_pad(:,:,2),[crop_size-row,0],mean_g,'post');
    im_b = padarray(im_pad(:,:,3),[crop_size-row,0],mean_b,'post');
    im_pad = cat(3,im_r,im_g,im_b);
end
if(col < crop_size)
    im_r = padarray(im_pad(:,:,1),[0,crop_size-col],mean_r,'post');
    im_g = padarray(im_pad(:,:,2),[0,crop_size-col],mean_g,'post');
    im_b = padarray(im_pad(:,:,3),[0,crop_size-col],mean_b,'post');
    im_pad = cat(3,im_r,im_g,im_b);
end
im_mean = zeros(crop_size,crop_size,3,'single');
im_mean(:,:,1) = mean_r;
im_mean(:,:,2) = mean_g;
im_mean(:,:,3) = mean_b;
im_pad = single(im_pad) - im_mean; %subtract the mean (the padded area becomes 0)
im_pad = im_pad(:,:,[3 2 1]); %reorder RGB to BGR, the channel order Caffe expects
im_pad = permute(im_pad,[2 1 3]); %swap the first two dimensions (height and width), also to match Caffe's layout
%Note: matcaffe stores blobs in (Width,Height,Channel,Number) order
end
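The steps of pre_img() can be traced on a tiny image with the plain-Python sketch below. pre_img_sketch is a hypothetical stand-in, not the repository's function: it assumes a 3-channel RGB input stored as a rows × cols × 3 nested list no larger than crop_size, and it omits the Num dimension of the Caffe blob.

```python
def pre_img_sketch(im, crop_size, mean_rgb):
    """im: rows x cols x 3 nested list in RGB order, rows and cols <= crop_size.
    Returns a crop_size x crop_size x 3 nested list indexed [w][h][c] in BGR
    order with the channel means subtracted."""
    rows, cols = len(im), len(im[0])
    # 'post' padding with the channel means: new rows go below the last row and
    # new columns to the right of the last column, so the image stays top-left
    padded = [[list(im[r][c]) if r < rows and c < cols else list(mean_rgb)
               for c in range(crop_size)]
              for r in range(crop_size)]
    # subtract the means (padded pixels become exactly 0) and reorder RGB -> BGR
    centered = [[[px[2] - mean_rgb[2], px[1] - mean_rgb[1], px[0] - mean_rgb[0]]
                 for px in row]
                for row in padded]
    # permute(..., [2 1 3]): swap the height and width axes -> [w][h][c]
    return [[centered[h][w] for h in range(crop_size)] for w in range(crop_size)]


print(pre_img_sketch([[[10, 20, 30], [40, 50, 60]]], 2, (0, 0, 0)))
```

Padding with the mean rather than with 0 is what makes the padded region exactly 0 after mean subtraction.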
(2) caffe_process()
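function score = caffe_process(input_data,net)
score = net.forward({input_data}); %forward pass to compute the scores
score = score{1};
score_flip = net.forward({input_data(end:-1:1,:,:)}); %end:-1:1 reverses the first (Width) dimension, i.e. mirrors the image left-right
score_flip = score_flip{1};
score = score + score_flip(end:-1:1,:,:); %mirror the second prediction back and add it to the first
score = permute(score, [2 1 3]); %restore the Height x Width x Channel layout
%Exponentiate element-wise: the provided network definitions (e.g. pspnet101_VOC2012_473.prototxt) have no softmax layer,
%so softmax is computed by hand here to obtain class probabilities:
%p_i = exp(x_i) / sum_{j=1..21} exp(x_j), i = 1,2,...,21
score = exp(score);
score = bsxfun(@rdivide, score, sum(score, 3)); %bsxfun applies element-wise right division (rdivide, ./), expanding the per-pixel sum across all channels
end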
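The tail of caffe_process() (flip the second prediction back, add the two logit maps, then apply softmax over the classes) is sketched below for a single image row in plain Python. The function name is hypothetical, and unlike the MATLAB code the sketch subtracts the per-pixel maximum before exponentiating; this gives the same probabilities but avoids overflow for large logits.

```python
import math


def flip_fused_softmax(score, score_flip):
    """score[w][k]: logits of the normal pass; score_flip[w][k]: logits of the
    pass on the left-right mirrored input (both width x classes, one row).
    Returns per-pixel class probabilities of the summed logits."""
    unflipped = score_flip[::-1]  # undo the horizontal mirror
    probs = []
    for a, b in zip(score, unflipped):
        z = [x + y for x, y in zip(a, b)]  # score + score_flip(end:-1:1,:,:)
        m = max(z)  # stability shift, not in the MATLAB code
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        probs.append([v / total for v in e])
    return probs


row = [[0.0, 1.0], [2.0, 0.0]]
print(flip_fused_softmax(row, row[::-1]))  # a mirror-consistent "network" doubles the logits
```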
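The image tiling scheme used in the code above works as follows:
PSPNet was trained on VOC2012 with a fixed input size of 473*473, whereas VOC2012 images come in various sizes, so images smaller than 473*473 must be padded and images larger than 473*473 must be cropped into tiles. The sizes compared with 473 are those of the rescaled image img_scale (whose long side is base_size*scale+1, i.e. 513 at scale 1), not those of the raw image. The padding value is the per-channel mean, which becomes 0 after mean subtraction, hence "zero padding". The steps during testing are: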
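(1) When neither side of the image exceeds 473, it is zero-padded as in the figure below (red is the original image, white is the padding, which is appended after the last row and column, i.e. 'post' padding).
(2) When one side exceeds 473 and the other is shorter than 473, the shorter side is zero-padded ('post') as in the figure below; if both sides exceed 473, the image is left unchanged.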
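Both padding rules can be summarized in a few lines. pad_amounts is a hypothetical helper, a sketch of the decision made by scale_process.m and pre_img.m rather than code from the repository: in both cases each side shorter than crop_size is padded up to crop_size, and only the branch (single pass or sliding window) differs.

```python
def pad_amounts(new_rows, new_cols, crop_size):
    """Rows and columns appended ('post') to img_scale before inference, plus
    whether the sliding-window branch of scale_process.m is taken.
    Case (1), long side <= crop_size: pre_img pads both sides up to crop_size.
    Case (2), long side > crop_size: only a side shorter than crop_size is padded.
    Both cases reduce to max(0, crop_size - side) for each side."""
    sliding = max(new_rows, new_cols) > crop_size
    return max(0, crop_size - new_rows), max(0, crop_size - new_cols), sliding


print(pad_amounts(513, 385, 473))  # (0, 88, True)
```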
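After padding, the images in case (2) are cropped so that every tile is exactly 473*473: a 473*473 window slides across the image and takes each part in turn. The sliding step is the parameter 'stride' in the code (ceil(473*2/3) = 316), and a window that would run past the image border is shifted back until it again covers a full 473*473 area, as shown in the figure below (4 tiles in total):
Each tile is passed through the network, and in overlapping areas the summed predictions are divided by the number of times the area was covered (the central area in the figure above is covered 4 times, so its summed prediction is divided by 4).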
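The overlap averaging (data_scale./count_scale in scale_process.m) is sketched below for a single channel. average_overlaps and its predict callback are hypothetical: predict(r0, c0) stands in for pre_img plus caffe_process on the tile whose top-left corner is (r0, c0), given 1-based, and the window starts are passed in already computed.

```python
def average_overlaps(pad_rows, pad_cols, row_starts, col_starts, crop_size, predict):
    """Sum the tile predictions into the full image and divide each pixel by
    the number of tiles that covered it. Returns (averaged scores, counts)."""
    total = [[0.0] * pad_cols for _ in range(pad_rows)]
    count = [[0] * pad_cols for _ in range(pad_rows)]
    for r0 in row_starts:
        for c0 in col_starts:
            tile = predict(r0, c0)  # crop_size x crop_size scores
            for dr in range(crop_size):
                for dc in range(crop_size):
                    r, c = r0 - 1 + dr, c0 - 1 + dc  # 1-based start -> 0-based index
                    total[r][c] += tile[dr][dc]
                    count[r][c] += 1
    averaged = [[t / n for t, n in zip(tr, nr)] for tr, nr in zip(total, count)]
    return averaged, count


# 5x5 image, 3x3 tiles starting at rows/cols 1 and 3: the centre pixel is covered 4 times
avg, cnt = average_overlaps(5, 5, [1, 3], [1, 3], 3,
                            lambda r0, c0: [[float(r0 * 10 + c0)] * 3 for _ in range(3)])
print(cnt[2][2], avg[2][2])  # 4 22.0
```

Dividing by the per-pixel count turns the sum into a plain average, so every pixel ends up with one score per class no matter how many tiles covered it.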
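2. Running the test and the results
Open a terminal, change to the directory containing run.sh, and start the test with:
./run.sh
Part of the console output during the run:
Segmentation results: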