PyTorch training and testing: the full code (4)

Continuing from the previous post.

            # Show 10 * 3 images results each epoch
            if ii % (num_img_tr // 10) == 0:
                grid_image = make_grid(inputs[:3].clone().cpu().data, 3, normalize=True)
                writer.add_image('Image', grid_image, global_step)
                grid_image = make_grid(utils.decode_seg_map_sequence(torch.max(outputs[:3], 1)[1].detach().cpu().numpy()), 3, normalize=False,
                                       range=(0, 255))
                writer.add_image('Predicted label', grid_image, global_step)
                grid_image = make_grid(utils.decode_seg_map_sequence(torch.squeeze(labels[:3], 1).detach().cpu().numpy()), 3, normalize=False, range=(0, 255))
                writer.add_image('Groundtruth label', grid_image, global_step)

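This part still needs to be filled in. The topics I have yet to write up are the model architecture, tensorboardX, and how data is logged through the writer.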

        # Save the model
        if (epoch % snapshot) == snapshot - 1:  # snapshot = 10
            torch.save(net.state_dict(), os.path.join(save_dir, 'models', modelName + '_epoch-' + str(epoch) + '.pth'))
            print("Save model at {}\n".format(os.path.join(save_dir, 'models', modelName + '_epoch-' + str(epoch) + '.pth')))

net.load_state_dict(
        torch.load(os.path.join(save_dir, 'models', modelName + '_epoch-' + str(resume_epoch - 1) + '.pth'),
                   map_location=lambda storage, loc: storage)) # Load all tensors onto the CPU

The net.load_state_dict(...) call above loads the saved model parameters.

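The model parameters are saved once every ten epochs; this scheme leaves room for improvement. torch.save(net.state_dict(), os.path.join(save_dir, 'models', modelName + '_epoch-' + str(epoch) + '.pth')) writes a file whose name ends in .pth. One question about net.state_dict() is worth answering. From the earlier code, net and all of its parameters live on the GPU, while the save destination is clearly a path on local disk, so why are they not moved to the CPU before saving? state_dict() itself does not move anything: it returns tensors that are still on the GPU. It is torch.save that serializes the tensor data to the file and records which device each tensor was on. That is also why the loading code passes map_location: without it, torch.load would try to restore the tensors to the device they were saved from.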
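A minimal sketch of the save/load round trip may make this concrete. It assumes a small nn.Linear as a stand-in for net and an in-memory buffer instead of the .pth path; both are illustrative choices, not part of the original code.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for `net`; moved to the GPU only if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Linear(4, 2).to(device)

# state_dict() returns tensors that live on the same device as the model.
state = net.state_dict()
same_device = all(t.device.type == device.type for t in state.values())

# torch.save serializes the tensors; a BytesIO buffer replaces the file path here.
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)

# map_location="cpu" remaps every tensor to the CPU while loading.
loaded = torch.load(buffer, map_location="cpu")
restored = nn.Linear(4, 2)
restored.load_state_dict(loaded)

weights_match = torch.equal(restored.weight, net.weight.detach().cpu())
print(same_device, weights_match)
```

The string "cpu" has the same effect here as the lambda storage, loc: storage form used in the original code.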

 # One testing epoch
        if useTest and epoch % nTestInterval == (nTestInterval - 1):  # nTestInterval = 5
            total_miou = 0.0
            net.eval()
            for ii, sample_batched in enumerate(testloader):
                inputs, labels = sample_batched['image'], sample_batched['label']

                # Forward pass of the mini-batch
                inputs, labels = Variable(inputs, requires_grad=True), Variable(labels)
                if gpu_id >= 0:
                    inputs, labels = inputs.cuda(), labels.cuda()

                with torch.no_grad():
                    outputs = net.forward(inputs)

                predictions = torch.max(outputs, 1)[1]

                loss = criterion(outputs, labels, size_average=False, batch_average=True)
                running_loss_ts += loss.item()

                total_miou += utils.get_iou(predictions, labels)

                # Print stuff
                if ii % num_img_ts == num_img_ts - 1:

                    miou = total_miou / (ii * testBatch + inputs.data.shape[0])
                    running_loss_ts = running_loss_ts / num_img_ts

                    print('Validation:')
                    print('[Epoch: %d, numImages: %5d]' % (epoch, ii * testBatch + inputs.data.shape[0]))
                    writer.add_scalar('data/test_loss_epoch', running_loss_ts, epoch)
                    writer.add_scalar('data/test_miour', miou, epoch)
                    print('Loss: %f' % running_loss_ts)
                    print('MIoU: %f\n' % miou)
                    running_loss_ts = 0

The code above is the validation part; it also runs inside the training epoch loop.
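To see when the two modulo conditions fire, here is a small sketch using snapshot = 10 and nTestInterval = 5 from the code comments. The total of 20 epochs is a made-up value for illustration only.

```python
snapshot = 10        # from the comment in the save condition
n_test_interval = 5  # from the comment in the test condition
n_epochs = 20        # hypothetical total, chosen only for this illustration

# Epochs (0-based) at which the model is saved and at which validation runs.
save_epochs = [e for e in range(n_epochs) if e % snapshot == snapshot - 1]
test_epochs = [e for e in range(n_epochs)
               if e % n_test_interval == n_test_interval - 1]

print(save_epochs)  # [9, 19]
print(test_epochs)  # [4, 9, 14, 19]
```

With these values every saved checkpoint coincides with a validation pass, because 10 is a multiple of 5.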

net.eval()  # switch to evaluation mode for testing
# So far we have seen the following calls that involve net:
net.load_state_dict(
        torch.load(os.path.join(save_dir, 'models', modelName + '_epoch-' + str(resume_epoch - 1) + '.pth'),
                   map_location=lambda storage, loc: storage))
net.cuda()
optimizer = optim.SGD(net.parameters(), lr=p['lr'], momentum=p['momentum'], weight_decay=p['wd'])
net.train()
net.forward(inputs)
torch.save(net.state_dict(), os.path.join(save_dir, 'models', modelName + '_epoch-' + str(epoch) + '.pth'))

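Call net.eval() when testing, just as net.train() is called when training. The two calls switch layers such as dropout and batch normalization between their evaluation and training behavior.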
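The difference between the two modes is easy to observe on a single layer. The sketch below uses a standalone nn.Dropout as a stand-in; whether the actual network contains dropout is an assumption, not something shown in the code above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Dropout(p=0.5)  # stand-in for a mode-dependent layer inside `net`
x = torch.ones(1000)

layer.train()              # training mode: elements are randomly zeroed
y_train = layer(x)

layer.eval()               # evaluation mode: dropout is a no-op
y_eval = layer(x)

print(layer.training)                  # False after eval()
print((y_train == 0).any().item())     # some elements were dropped
print(torch.equal(y_eval, x))          # input passes through unchanged
```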

 # Forward pass of the mini-batch
                inputs, labels = Variable(inputs, requires_grad=True), Variable(labels)
                if gpu_id >= 0:
                    inputs, labels = inputs.cuda(), labels.cuda()

                with torch.no_grad():
                    outputs = net.forward(inputs)
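As a quick check of what torch.no_grad() changes, the sketch below compares a forward pass with and without it. The nn.Linear model is a hypothetical stand-in for net.

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 2)      # hypothetical stand-in for the real network
inputs = torch.randn(5, 3) # requires_grad is False by default

# Without no_grad: the parameters still require gradients,
# so autograd records a graph for the output.
out_plain = net(inputs)

# With no_grad: no graph is recorded at all.
with torch.no_grad():
    out_nograd = net(inputs)

print(out_plain.requires_grad)   # True
print(out_nograd.requires_grad)  # False
```

Setting requires_grad=False on the inputs alone therefore does not stop graph construction; torch.no_grad() does.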

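Gradients are no longer needed here, so the forward pass runs under torch.no_grad(). The code could also be changed to the following, since requires_grad defaults to False for a Variable. Note that the two versions are not fully equivalent: the network's parameters still have requires_grad=True, so without torch.no_grad() autograd still records a graph for outputs and uses more memory.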

 # Forward pass of the mini-batch
                inputs, labels = Variable(inputs, requires_grad=False), Variable(labels)
              # or inputs, labels = Variable(inputs), Variable(labels)
                if gpu_id >= 0:
                    inputs, labels = inputs.cuda(), labels.cuda()

                outputs = net.forward(inputs)

predictions = torch.max(outputs, 1)[1]

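For what torch.max does, see: https://blog.csdn.net/Z_lbj/article/details/79766690. Called with a dimension, torch.max(outputs, 1) returns a (values, indices) pair, so [1] selects the index of the largest score along dimension 1, which for segmentation outputs is normally the class dimension.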

I have not yet worked out the exact shape and values of predictions; that requires understanding the network's output first.
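Until then, a toy example can show what the indexing does. The sketch assumes the network output has shape (N, C, H, W) with one score per class per pixel; the tensor values are made up.

```python
import torch

# Toy output: N=1 image, C=3 classes, H=1, W=2 pixels (assumed layout).
outputs = torch.tensor([[[[0.1, 0.9]],    # class 0 scores
                         [[0.8, 0.2]],    # class 1 scores
                         [[0.3, 0.5]]]])  # class 2 scores

values, indices = torch.max(outputs, 1)   # reduce over the class dimension
predictions = indices                      # same as torch.max(outputs, 1)[1]

print(predictions.shape)     # torch.Size([1, 1, 2])
print(predictions.tolist())  # [[[1, 0]]]
```

Under this assumption predictions holds one class index per pixel, with the class dimension removed.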

Next comes computing the mIoU:

total_miou += utils.get_iou(predictions, labels)

# Definition of get_iou (in utils):
import numpy as np
import torch

def get_iou(pred, gt, n_classes=21):
    total_miou = 0.0
    for i in range(len(pred)):
        pred_tmp = pred[i]
        gt_tmp = gt[i]

        # list * n repeats the list: [0, 0, ..., 0] with n_classes entries
        intersect = [0] * n_classes
        union = [0] * n_classes
        for j in range(n_classes):
            # Cast to int before adding: recent PyTorch versions return bool masks,
            # and bool + bool stays bool, so match == 2 would never be true.
            match = (pred_tmp == j).int() + (gt_tmp == j).int()

            it = torch.sum(match == 2).item()  # pixels where both are class j
            un = torch.sum(match > 0).item()   # pixels where either is class j

            intersect[j] += it
            union[j] += un

        iou = []
        unique_label = np.unique(gt_tmp.data.cpu().numpy())
        for k in range(len(intersect)):
            if k not in unique_label:
                continue  # class k does not occur in the label image
            iou.append(intersect[k] / union[k])

        miou = (sum(iou) / len(iou))
        total_miou += miou

    return total_miou

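utils.get_iou(predictions, labels) returns the sum of the per-image mIoU values over one batch. intersect holds, for each class, the number of pixels where the prediction and the label are both that class. union holds, for each class, the number of pixels where either the prediction or the label is that class, so it also counts pixels wrongly predicted as that class, for example background predicted as an object. unique_label = np.unique(gt_tmp.data.cpu().numpy()) gives the set of class ids that actually occur in the label image.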

if k not in unique_label:
                continue

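The check above skips every class that does not occur in the ground-truth image, so a class that shows up only because of wrong predictions (for example background pixels predicted as an object) contributes no IoU term of its own. The iou list therefore usually has fewer than 21 elements. For each class that is present, IoU = (pixels where both prediction and label are that class) / (pixels where either one is that class). False positives for that class still enlarge the denominator, so this is not simply the number of correctly predicted pixels divided by the number of ground-truth pixels.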
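A tiny worked example makes the per-class computation concrete. The function below is an illustrative re-implementation with boolean masks, not the utils.get_iou from the code above, and the 2x2 prediction and label are made-up values.

```python
import torch


def image_miou(pred, gt, n_classes):
    """Mean IoU over the classes that occur in the ground-truth image."""
    ious = []
    for k in range(n_classes):
        gt_mask = gt == k
        if not gt_mask.any():
            continue  # class k is absent from the label image: skipped
        pred_mask = pred == k
        intersection = (pred_mask & gt_mask).sum().item()
        union = (pred_mask | gt_mask).sum().item()
        ious.append(intersection / union)
    return sum(ious) / len(ious), ious


pred = torch.tensor([[0, 1],
                     [1, 1]])
gt = torch.tensor([[0, 1],
                   [0, 1]])

miou, ious = image_miou(pred, gt, n_classes=3)
print(ious)  # class 0: 1/2, class 1: 2/3; class 2 is skipped
print(miou)  # (1/2 + 2/3) / 2 = 7/12
```

The pixel at row 1, column 0 is a false positive for class 1: it adds nothing to the intersection but raises the union from 2 to 3.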

This line adds up the mIoU values produced by every batch to get total_miou:

total_miou += utils.get_iou(predictions, labels)
# running sum of the mIoU values

Once all validation images have been processed:

 # Print stuff
                if ii % num_img_ts == num_img_ts - 1:

                    miou = total_miou / (ii * testBatch + inputs.data.shape[0])
                    running_loss_ts = running_loss_ts / num_img_ts

                    print('Validation:')
                    print('[Epoch: %d, numImages: %5d]' % (epoch, ii * testBatch + inputs.data.shape[0]))
                    writer.add_scalar('data/test_loss_epoch', running_loss_ts, epoch)
                    writer.add_scalar('data/test_miour', miou, epoch)
                    print('Loss: %f' % running_loss_ts)
                    print('MIoU: %f\n' % miou)
                    running_loss_ts = 0

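Here ii * testBatch + inputs.data.shape[0] = 241*6 + 3 = 1449. The total number of images is 1449, which is not a multiple of 6, so the last batch holds only 3 images and inputs.data.shape[0] is 3 rather than 6 (with 6 the expression would give 242*6 = 1452).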

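So the miou obtained here is the average mIoU per image. running_loss_ts is divided by num_img_ts, so it is an average of the per-batch losses; how closely that matches a per-image loss depends on how criterion normalizes with batch_average=True.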
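The image count can be checked with a few lines of arithmetic. The sketch assumes the loader keeps the final partial batch, which matches the 3-image last batch described above.

```python
import math

num_images = 1449  # total validation images, as stated above
test_batch = 6     # testBatch

num_batches = math.ceil(num_images / test_batch)      # 242 batches
last_ii = num_batches - 1                             # index of the last batch: 241
last_batch_size = num_images - last_ii * test_batch   # 3 images

# Same expression as in the code: ii * testBatch + inputs.data.shape[0]
images_seen = last_ii * test_batch + last_batch_size

print(num_batches, last_ii, last_batch_size, images_seen)  # 242 241 3 1449
```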

Finally, once all epochs have finished, the writer must be closed:

writer.close()