
Freezing BN layer parameters in PyTorch

Background: in a PyTorch model, I wanted to freeze the main branch's parameters and train only a sub-branch, but found that the same test data passed through the main branch produced different outputs in different epochs.

Cause: the running_mean and running_var of the main branch's BN layers were not frozen.

Solution: set the BN layers that need to be frozen to eval mode, as demonstrated below.

Problem example

Environment: torch 1.7.0

# -*- coding:utf-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.bn2 = nn.BatchNorm2d(16)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 5)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.bn1(self.conv1(x))), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.bn2(self.conv2(x))), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

def print_parameter_grad_info(net):
    print('-------parameters requires grad info--------')
    for name, p in net.named_parameters():
        print(f'{name}:\t{p.requires_grad}')

def print_net_state_dict(net):
    for key in net.state_dict():
        print(key)

if __name__ == "__main__":
    net = Net()

    print_parameter_grad_info(net)
    net.requires_grad_(False)
    print_parameter_grad_info(net)

    torch.random.manual_seed(5)
    test_data = torch.rand(1, 1, 32, 32)
    train_data = torch.rand(5, 1, 32, 32)

    # print(test_data)
    # print(train_data[0, ...])
    for epoch in range(2):
        # training phase; assume each epoch runs only one iteration
        net.train()
        pre = net(train_data)
        # compute the loss, update parameters, etc.
        # ....

        # test phase
        net.eval()
        x = net(test_data)
        print(f'epoch:{epoch}', x)

Output:

-------parameters requires grad info--------
conv1.weight:   True
conv1.bias:     True
bn1.weight:     True
bn1.bias:       True
conv2.weight:   True
conv2.bias:     True
bn2.weight:     True
bn2.bias:       True
fc1.weight:     True
fc1.bias:       True
fc2.weight:     True
fc2.bias:       True
fc3.weight:     True
fc3.bias:       True
-------parameters requires grad info--------
conv1.weight:   False
conv1.bias:     False
bn1.weight:     False
bn1.bias:       False
conv2.weight:   False
conv2.bias:     False
bn2.weight:     False
bn2.bias:       False
fc1.weight:     False
fc1.bias:       False
fc2.weight:     False
fc2.bias:       False
fc3.weight:     False
fc3.bias:       False
epoch:0 tensor([[-0.0755,  0.1138,  0.0966,  0.0564, -0.0224]])
epoch:1 tensor([[-0.0763,  0.1113,  0.0970,  0.0574, -0.0235]])

Two things can be seen:

net.requires_grad_(False) has set every parameter in the network to a state where no gradient update occurs, yet the same test data test_data produces different results when run forward in different epochs.

Calling print_net_state_dict shows that the BN statistics running_mean and running_var do not appear among the optimizable parameters returned by net.parameters():

bn1.weight
bn1.bias
bn1.running_mean
bn1.running_var
bn1.num_batches_tracked
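
The reason they do not appear is that running_mean and running_var are registered as buffers rather than parameters, so requires_grad_ never touches them. As a quick check (a sketch reusing the Net class defined above), named_parameters() and named_buffers() separate the two groups:

net = Net()

print('parameters (affected by requires_grad_):')
for name, _ in net.named_parameters():
    print(f'  {name}')

print('buffers (not affected by requires_grad_):')
for name, _ in net.named_buffers():
    print(f'  {name}')  # bn1.running_mean, bn1.running_var, bn1.num_batches_tracked, ...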

However, these two buffers are updated during the forward pass of the training phase, which is why the same test data produced different results even though the entire network was frozen.
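
This is easy to verify directly. A minimal check (again assuming the Net class above) shows the buffer changing after a single training-mode forward pass, even though every parameter is frozen:

net = Net()
net.requires_grad_(False)
net.train()

before = net.bn1.running_mean.clone()
net(torch.rand(5, 1, 32, 32))                      # one training-mode forward pass
print(torch.equal(before, net.bn1.running_mean))   # False: the buffer was updated anyway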

From the PyTorch BatchNorm2d documentation: "Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1."
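
The update rule behind that sentence is running = (1 - momentum) * running + momentum * batch_stat, where the running variance uses the unbiased batch variance. A small sketch of one update step on a standalone BatchNorm2d (the shapes here are arbitrary):

bn = nn.BatchNorm2d(3)                    # momentum defaults to 0.1
x = torch.randn(4, 3, 8, 8)
bn.train()
bn(x)                                     # one training-mode forward pass

batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3), unbiased=True)
# running_mean starts at 0 and running_var starts at 1
expected_mean = 0.9 * torch.zeros(3) + 0.1 * batch_mean
expected_var = 0.9 * torch.ones(3) + 0.1 * batch_var

print(torch.allclose(bn.running_mean, expected_mean, atol=1e-6))  # True
print(torch.allclose(bn.running_var, expected_var, atol=1e-6))    # True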

Therefore, explicitly set the BN layers to eval mode during the training phase:

if __name__ == "__main__":
    net = Net()
    net.requires_grad_(False)

    torch.random.manual_seed(5)
    test_data = torch.rand(1, 1, 32, 32)
    train_data = torch.rand(5, 1, 32, 32)

    # print(test_data)
    # print(train_data[0, ...])
    for epoch in range(2):
        # training phase; assume each epoch runs only one iteration
        net.train()
        net.bn1.eval()
        net.bn2.eval()
        pre = net(train_data)
        # compute the loss, update parameters, etc.
        # ....

        # test phase
        net.eval()
        x = net(test_data)
        print(f'epoch:{epoch}', x)

The results are now consistent across epochs:

epoch:0 tensor([[ 0.0944, -0.0372,  0.0059, -0.0625, -0.0048]])
epoch:1 tensor([[ 0.0944, -0.0372,  0.0059, -0.0625, -0.0048]])
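
Calling .eval() on each BN layer by name works for a small model but does not scale. A more general pattern (a sketch, not taken from the original fix) is to walk the module tree with apply() after switching to train mode:

def set_bn_eval(m):
    # covers BatchNorm1d/2d/3d; add nn.SyncBatchNorm here if your model uses it
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        m.eval()

net.train()
net.apply(set_bn_eval)   # puts every BN layer back into eval mode

Note that apply() must run after net.train(), because train() recursively puts every submodule, including the BN layers, back into training mode.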
