
SENet & Semantic Segmentation Study Notes

Optimize and improve the HybridSN hyperspectral classification network from the previous session; study and implement SENet; watch two talks: "Self-Attention and Low-Rank Reconstruction in Semantic Segmentation" by Li Xia (Peking University) and "Recent Advances in Image Semantic Segmentation" by Prof. Cheng Ming-Ming (Nankai University).

Improving the HybridSN hyperspectral classification network

On the use of Dropout

  • Because the previous experiment's code uses Dropout, net.train() and net.eval() have to be called at the right stages
    • model.train() puts the model in training mode: Dropout and batch normalization are active during training, which helps prevent overfitting
    • net.eval() freezes BN and Dropout: instead of computing batch statistics, they use the values learned during training
      • As a result, since all network parameters are fixed during testing, repeated test runs give identical results
  • Accuracy is around 95.5%
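The effect of the train/eval switch can be demonstrated with a toy model (the layers here are illustrative, not HybridSN itself):

```python
import torch
import torch.nn as nn

# A toy model containing Dropout and BatchNorm, the two layer types
# whose behavior changes between net.train() and net.eval().
net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.4))
x = torch.randn(4, 8)

net.eval()  # Dropout is disabled; BatchNorm uses its running statistics
with torch.no_grad():
    y1 = net(x)
    y2 = net(x)
print(torch.equal(y1, y2))  # True: eval-mode outputs are deterministic

net.train()  # Dropout and batch statistics are active again for training
```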

Model improvement: 2D convolution first, then 3D convolutions

# Model improvement: 2D convolution first, then 3D convolutions
class_num = 16

class HybridSN(nn.Module):

  def __init__(self):
    super(HybridSN, self).__init__()
    # 2D convolution: input (30, 25, 25) through 64 kernels of 3x3 (over 30 channels) ==> (64, 23, 23)
    self.conv4_2d = nn.Sequential(
        nn.Conv2d(30,64,(3,3)),
        nn.BatchNorm2d(64),
        nn.ReLU()
    )
    # three 3D convolutions
    # conv1: (1, 64, 23, 23), 8 kernels of 7x3x3 ==> (8, 58, 21, 21)
    self.conv1_3d = nn.Sequential(
        nn.Conv3d(1,8,(7,3,3)),
        nn.BatchNorm3d(8),
        nn.ReLU()
    )
    # conv2: (8, 58, 21, 21), 16 kernels of 5x3x3 ==> (16, 54, 19, 19)
    self.conv2_3d = nn.Sequential(
        nn.Conv3d(8,16,(5,3,3)),
        nn.BatchNorm3d(16),
        nn.ReLU()
    )
    # conv3: (16, 54, 19, 19), 32 kernels of 3x3x3 ==> (32, 52, 17, 17)
    self.conv3_3d = nn.Sequential(
        nn.Conv3d(16,32,(3,3,3)),
        nn.BatchNorm3d(32),
        nn.ReLU()
    )

    self.fn1 = nn.Linear(480896,256) # 480896 = 32*52*17*17; you can verify by printing out.size()
    self.fn2 = nn.Linear(256,128)

    self.fn_out = nn.Linear(128,class_num)

    self.drop = nn.Dropout(p = 0.4)
    # Note: adding a Softmax here made the training loss stop decreasing.
    # Likely cause: nn.CrossEntropyLoss already applies log-softmax internally, so the network should output raw logits.
    # self.soft = nn.Softmax(dim = 1)

  def forward(self, x):
    # drop the singleton channel dim so the 2D convolution can be applied first
    out = x.view(x.shape[0],x.shape[2],x.shape[3],x.shape[4])
    out = self.conv4_2d(out)
    # add the channel dim back: (64, 23, 23) --> (1, 64, 23, 23)
    out = out.view(out.shape[0],1,out.shape[1],out.shape[2],out.shape[3])

    out = self.conv1_3d(out)
    out = self.conv2_3d(out)
    out = self.conv3_3d(out)
    # flatten: reshape to b rows and d columns (d inferred automatically)
    out = out.view(out.shape[0],-1)

    out = self.fn1(out)
    out = self.drop(out)
    out = self.fn2(out)
    out = self.drop(out)

    out = self.fn_out(out)

    return out

# random input to sanity-check that the network runs end to end
x = torch.randn(1,1, 30, 25, 25)
net = HybridSN()
y = net(x)
print(y.shape)
print(y)
  • With the 2D convolution applied first, the input (30, 25, 25) passes through 64 3x3 kernels to give (64, 23, 23) before the 3D convolutions. The parameter count increases noticeably, so the whole model takes longer to train, but accuracy improves as well
  • Accuracy is around 97.3%
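The softmax problem mentioned in the code comments has a likely explanation: `nn.CrossEntropyLoss` is log-softmax plus NLL loss, so it expects raw logits; applying `Softmax` first squashes the inputs into [0, 1] and flattens the gradients. A small check (the values here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # raw network outputs (illustrative)
target = torch.tensor([0])

ce = nn.CrossEntropyLoss()
# CrossEntropyLoss == log_softmax + NLLLoss, so it expects raw logits
loss_from_logits = ce(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss_from_logits, loss_manual))  # True

# Feeding probabilities instead (i.e. applying Softmax first) squashes
# every input into [0, 1], which flattens the loss and its gradients.
loss_double_softmax = ce(torch.softmax(logits, dim=1), target)
print(loss_double_softmax > loss_from_logits)  # True: the loss is inflated
```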

Introducing an attention mechanism

# Introducing an attention mechanism
class_num = 16

class Attention_Block(nn.Module):

    def __init__(self, planes, size):
        super(Attention_Block, self).__init__()

        self.globalAvgPool = nn.AvgPool2d(size, stride=1)

        self.fc1 = nn.Linear(planes, round(planes / 16))
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(round(planes / 16), planes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        residual = x

        out = self.globalAvgPool(x)
        out = out.view(out.shape[0], out.shape[1])
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        
        out = out.view(out.shape[0], out.shape[1], 1, 1)
        out = out * residual

        return out



class HybridSN(nn.Module):

    def __init__(self):
        super(HybridSN, self).__init__()
        # three 3D convolutions
        # conv1: (1, 30, 25, 25), 8 kernels of 7x3x3 ==> (8, 24, 23, 23)
        self.conv1_3d = nn.Sequential(
            nn.Conv3d(1,8,(7,3,3)),
            nn.BatchNorm3d(8),
            nn.ReLU()
        )
        # conv2: (8, 24, 23, 23), 16 kernels of 5x3x3 ==> (16, 20, 21, 21)
        self.conv2_3d = nn.Sequential(
            nn.Conv3d(8,16,(5,3,3)),
            nn.BatchNorm3d(16),
            nn.ReLU()
        )
        # conv3: (16, 20, 21, 21), 32 kernels of 3x3x3 ==> (32, 18, 19, 19)
        self.conv3_3d = nn.Sequential(
            nn.Conv3d(16,32,(3,3,3)),
            nn.BatchNorm3d(32),
            nn.ReLU()
        )
        # 2D convolution: (576, 19, 19) through 64 kernels of 3x3 ==> (64, 17, 17)
        self.conv4_2d = nn.Sequential(
            nn.Conv2d(576,64,(3,3)),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        # channel attention blocks
        self.layer1 = self.make_layer(Attention_Block,planes = 576, size = 19)
        self.layer2 = self.make_layer(Attention_Block,planes = 64, size = 17)

        # then fully connected layers of 256 and 128 units, each followed by Dropout with p = 0.1
        self.fn1 = nn.Linear(18496,256)
        self.fn2 = nn.Linear(256,128)

        self.fn_out = nn.Linear(128,class_num)

        self.drop = nn.Dropout(p = 0.1)
        # Note: adding a Softmax here made the training loss stop decreasing.
        # Likely cause: nn.CrossEntropyLoss already applies log-softmax internally, so the network should output raw logits.
        # self.soft = nn.Softmax(dim = 1)

    def make_layer(self, block, planes, size):
        layers = []
        layers.append(block(planes, size))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1_3d(x)
        out = self.conv2_3d(out)
        out = self.conv3_3d(out)
        # reshape for the 2D convolution: merge the 32*18 dims to get (576, 19, 19)
        out = out.view(out.shape[0],out.shape[1]*out.shape[2],out.shape[3],out.shape[4])

        # apply channel attention around the 2D convolution
        out = self.layer1(out)
        out = self.conv4_2d(out)
        out = self.layer2(out)
        # flatten to an 18496-dim vector
        # reshape to b rows and d columns (d inferred automatically)
        out = out.view(out.shape[0],-1)

        out = self.fn1(out)
        out = self.drop(out)
        out = self.fn2(out)
        out = self.drop(out)

        out = self.fn_out(out)

        # out = self.soft(out)

        return out

# random input to sanity-check that the network runs end to end
x = torch.randn(1, 1, 30, 25, 25)
net = HybridSN()
y = net(x)
print(y.shape)
print(y)
  • The network converges noticeably faster and training is very stable; the final test accuracy reaches about 99%

SENet

Central idea: for each channel of the input feature map, a Squeeze operation summarizes the channel and an Excitation step produces a per-channel weight; multiplying each channel by its weight reweights the channels and yields a new feature map.


Network structure

  • \(X \rightarrow U\)
    • \(F_{tr}\) is an ordinary convolution
  • \(U \rightarrow \widetilde X\)
    • Squeeze: \(F_{sq}(\cdot)\)
      • Apply Global Average Pooling to each channel of U, producing a 1x1xC tensor
        • Averaging over an entire channel means the scale is computed from channel-level global information
        • This discards the spatial structure within each channel, but since the goal is to model the relationships between channels, that matters little
      • The result summarizes the activation statistics of the layer's C feature maps
    • Excitation: \(F_{ex}(\cdot, W)\)
      • \(s = F_{ex}(z,W) = \sigma(g(z,W)) = \sigma(W_2\delta(W_1z))\)
      • The 1x1xC vector first goes through a fully connected layer whose weight \(W_1\) has shape C/r x C
        • r is a reduction ratio (16 in the paper) that shrinks the channel count to lower the computational cost
        • Fully connected layers are used here so that the correlations between channels can be fully exploited when computing the weights
      • then a ReLU layer
      • then another fully connected layer whose weight \(W_2\) has shape C x C/r
      • finally a sigmoid restricts the weights to the range [0, 1]
    • The resulting s is multiplied onto each channel of U as a scale
  • By controlling the scale, important features are strengthened and unimportant ones weakened, making the extracted features more discriminative
  • The authors also give two examples of embedding the block in real architectures
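The Squeeze-and-Excitation pipeline above can be condensed into a small standalone module (a minimal sketch: it uses `AdaptiveAvgPool2d` instead of a fixed-size pooling, with `reduction=16` as in the paper; the class name is mine):

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Minimal Squeeze-and-Excitation block: global average pooling,
    a bottleneck FC pair (reduction r), then sigmoid gating per channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # F_sq: (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(                     # F_ex: sigma(W2 * ReLU(W1 * z))
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # scale limited to [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.squeeze(x).view(b, c)
        s = self.excite(s).view(b, c, 1, 1)
        return x * s                                     # reweight each channel

x = torch.randn(2, 64, 7, 7)
y = SELayer(64)(x)
print(y.shape)  # torch.Size([2, 64, 7, 7])
```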

Code implementation

The implementation code below comes from this link

import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo

__all__ = ['SENet', 'se_resnet_18', 'se_resnet_34', 'se_resnet_50', 'se_resnet_101',
           'se_resnet_152']

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

        if planes == 64:
            self.globalAvgPool = nn.AvgPool2d(56, stride=1)
        elif planes == 128:
            self.globalAvgPool = nn.AvgPool2d(28, stride=1)
        elif planes == 256:
            self.globalAvgPool = nn.AvgPool2d(14, stride=1)
        elif planes == 512:
            self.globalAvgPool = nn.AvgPool2d(7, stride=1)
        self.fc1 = nn.Linear(in_features=planes, out_features=round(planes / 16))
        self.fc2 = nn.Linear(in_features=round(planes / 16), out_features=planes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        original_out = out
        out = self.globalAvgPool(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.size(0), out.size(1), 1, 1)
        out = out * original_out

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        if planes == 64:
            self.globalAvgPool = nn.AvgPool2d(56, stride=1)
        elif planes == 128:
            self.globalAvgPool = nn.AvgPool2d(28, stride=1)
        elif planes == 256:
            self.globalAvgPool = nn.AvgPool2d(14, stride=1)
        elif planes == 512:
            self.globalAvgPool = nn.AvgPool2d(7, stride=1)
        self.fc1 = nn.Linear(in_features=planes * 4, out_features=round(planes / 4))
        self.fc2 = nn.Linear(in_features=round(planes / 4), out_features=planes * 4)
        self.sigmoid = nn.Sigmoid()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        original_out = out
        out = self.globalAvgPool(out)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        out = out.view(out.size(0),out.size(1),1,1)
        out = out * original_out

        out += residual
        out = self.relu(out)

        return out


class SENet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(SENet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def se_resnet_18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SENet(BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def se_resnet_34(pretrained=False, **kwargs):
    """Constructs a ResNet-34 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SENet(BasicBlock, [3, 4, 6, 3], **kwargs)
    return model


def se_resnet_50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SENet(Bottleneck, [3, 4, 6, 3], **kwargs)
    return model


def se_resnet_101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SENet(Bottleneck, [3, 4, 23, 3], **kwargs)
    return model


def se_resnet_152(pretrained=False, **kwargs):
    """Constructs a ResNet-152 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = SENet(Bottleneck, [3, 8, 36, 3], **kwargs)
    return model

Self-Attention and Low-Rank Reconstruction in Semantic Segmentation

Semantic segmentation


  • The original networks were built for image classification: convolutional layers plus fully connected layers produce a single classification result
  • For segmentation, the final layers stay convolutional, and upsampling then produces an n x n output map
    • A fully convolutional network, no matter how large its kernels, is always limited by the size of its receptive field
    • Semantic segmentation, however, needs a much larger receptive field

Nonlocal Networks

  • In a convolutional network, the receptive field of a layer is the kernel size: only a local region is considered, hence "local". "Non-local" means the receptive field can be very large rather than a local neighborhood (a fully connected layer is non-local)

  • To predict information about an object, we should gather information from as many positions in the image as possible, modeling the relation between the current pixel and every other pixel

    • i.e., weight each position's feature by its similarity to the other positions
    • \(y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i,x_j)\,g(x_j)\)
    • \(f(x_i,x_j) = e^{\theta(x_i)^T \phi(x_j)}\) measures the similarity between \(x_i\) and \(x_j\); \(C(x)\) is a normalization factor; \(g(x_j)\) is a transform of the reference position
      • The mechanism is illustrated in the paper's diagram (figure omitted here)

      • Several similarity functions are possible; the differences are small, so the authors chose one that is easy to implement

      • As for the embeddings: for images, \(\theta\) and \(\phi\) are both implemented as 1x1 convolutions in the paper

    • \(z_i = W_zy_i + x_i\)
      • forms a residual connection
      • which turns the whole thing into a block that can be inserted directly into a neural network
      • experiments also confirm that this structure is both necessary and effective
    • Relation to fully connected layers
      • If the pairwise weight is no longer computed from the two positions' features but is a directly learned parameter
      • \(g(x_j) = x_j\)
      • and the normalization factor is 1
      • then the block degenerates into a fully connected layer, so an FC layer can be seen as a special case of non-local
  • On the concrete implementation:

    • When the input feature map is large, the non-local computation becomes very expensive, so the block is applied only in deeper (high-level semantic) layers
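The embedded-Gaussian non-local block described above can be sketched as follows (a minimal sketch: theta, phi, g and W_z are 1x1 convolutions as in the paper, but the class and variable names are mine):

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block:
    y_i = sum_j softmax(theta(x_i)^T phi(x_j)) g(x_j), then z = W_z y + x."""
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2           # reduced embedding dimension
        self.theta = nn.Conv2d(channels, inner, 1)
        self.phi = nn.Conv2d(channels, inner, 1)
        self.g = nn.Conv2d(channels, inner, 1)
        self.w_z = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        theta = self.theta(x).view(b, -1, n)     # (B, C', N)
        phi = self.phi(x).view(b, -1, n)         # (B, C', N)
        g = self.g(x).view(b, -1, n)             # (B, C', N)
        # pairwise similarity f(x_i, x_j); softmax plays the role of C(x)
        attn = torch.softmax(theta.transpose(1, 2) @ phi, dim=-1)   # (B, N, N)
        y = (attn @ g.transpose(1, 2)).transpose(1, 2)              # (B, C', N)
        y = y.view(b, -1, h, w)
        return self.w_z(y) + x                   # residual connection z = W_z y + x

x = torch.randn(1, 32, 8, 8)
print(NonLocalBlock(32)(x).shape)  # torch.Size([1, 32, 8, 8])
```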

Recent Advances in Image Semantic Segmentation

Res2Net

  • To exploit multi-scale information better, the channels inside a single ResNet block are split again into multiple scale groups, making fuller use of scale information
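The hierarchical split inside a Res2Net block can be sketched like this (a simplified sketch with scale s = 4; the real Res2Net wraps this unit between 1x1 convolutions inside a bottleneck block, and the class name is mine):

```python
import torch
import torch.nn as nn

class Res2NetSplit(nn.Module):
    """Simplified Res2Net multi-scale unit: split channels into `scale`
    groups; each group after the first passes through a 3x3 conv and also
    receives the previous group's output, widening the receptive field."""
    def __init__(self, channels, scale=4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in range(scale - 1)
        )

    def forward(self, x):
        xs = torch.chunk(x, self.scale, dim=1)
        out = [xs[0]]                 # first split passes through unchanged
        prev = None
        for i, conv in enumerate(self.convs):
            inp = xs[i + 1] if prev is None else xs[i + 1] + prev
            prev = conv(inp)          # each step sees one more 3x3 receptive hop
            out.append(prev)
        return torch.cat(out, dim=1)  # concatenate all scales back together

x = torch.randn(1, 64, 16, 16)
print(Res2NetSplit(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```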

Strip Pooling

  • Strip pooling
    • Standard pooling kernels are mostly square, but real scenes contain elongated objects, so we want to capture long-range features as far as possible
    • Set the width or height of the standard spatial pooling kernel to 1, then average all the horizontal or vertical elements each time
  • The SP module
    • For an input x (HxW), two pathways apply horizontal and vertical strip pooling respectively, then each result is expanded back to the original size (HxW)
    • The two pathways are summed together, a 1x1 convolution reduces dimensionality, and a sigmoid activation is applied
    • The processing above effectively computes a weight matrix, i.e. a weight for every pixel position, which feels quite similar to SENet's attention mechanism
    • At the same time, any two pixels become connected through this bridge-like structure, capturing more global information
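The SP module's steps above can be sketched as follows (a minimal sketch; the class and layer names are mine, and the details differ from the paper's full module):

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Strip Pooling sketch: pool along H and along W with 1-wide strips,
    expand both strips back to (H, W), fuse them, and gate the input."""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1): average over W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W): average over H
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)   # 1x1 fusion conv

    def forward(self, x):
        _, _, h, w = x.shape
        sh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)  # expand strip to HxW
        sw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)
        gate = torch.sigmoid(self.fuse(sh + sw))               # per-position weights
        return x * gate                                        # reweight the input

x = torch.randn(1, 32, 10, 12)
print(StripPooling(32)(x).shape)  # torch.Size([1, 32, 10, 12])
```

Because every gate value mixes a full row average and a full column average, any two pixel positions are linked through at most one horizontal plus one vertical strip, which is the "bridging" effect described above.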