Non-local Neural Networks

阿新 • Published: 2020-09-14

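1. Paper Overview

For images, the goal is to enlarge the receptive field; for video sequences, it is to relate pixels at different positions across neighboring frames. In both cases the aim is to move from local information to global information.

The main contribution is as follows (similar ideas may have been proposed before):

Address the limitation of local receptive fields by designing a reusable block.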

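2. Module Details

2.1 Local and Non-Local

Local and non-local both describe the receptive field. A 3*3 convolution means the current pixel has a receptive field of 9 pixels (or 8 if the center pixel is not counted; the idea is the same).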
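To make the receptive-field idea concrete, here is a minimal sketch that computes the receptive field (along one axis) of a stack of convolution or pooling layers. The helper name `receptive_field` and the `(kernel_size, stride)` layer format are assumptions made for illustration, and dilation is ignored.

```python
def receptive_field(layers):
    """Receptive field, along one axis, of a stack of layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    Assumes no dilation.
    """
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # distance, in input pixels, between neighboring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)]))      # one 3*3 conv: 3 per axis, i.e. 3*3 = 9 pixels
print(receptive_field([(3, 1)] * 3))  # three stacked 3*3 convs: 7 per axis
```

With stride 1 the receptive field grows only linearly with depth, which is the limitation a non-local block is meant to sidestep.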

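An aside:

Reading this paper, I really wished I had found it sooner. I had previously seen ShuffleNet, where shuffling the channels (reordering them by a fixed rule) increases the amount of information and gives better results. So why not shuffle the feature along its other dimensions as well? A feature has the dimensions \((B, C, W, H)\); let's go through them one by one: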

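B is already shuffled during sampling, and its size is configurable. In theory, as long as the model is robust enough, the larger B is, the better.

There are many operations on C: a plain convolution expands C, shuffling is what ShuffleNet does, and per-channel weighting is what attention does. Most papers operate on C; ResNet's shortcut, for example, adds feature maps together channel by channel.

Operations on W and H are rare. The most direct one is a fully connected (FC) layer, which works well but is far too expensive to compute. Regression heads nowadays generally avoid FC and use a 1*1 convolution plus a reshape instead, for example in (small) face-landmark networks.

My original idea was to regroup the feature map block by block, so that a following convolution could pick up information from different regions.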
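The channel shuffling mentioned in this aside can be sketched in a few lines. This is a minimal pure-Python illustration on a list of channel indices, assuming the usual reshape, transpose, flatten rule; `channel_shuffle` is a hypothetical helper name, and a real network would apply the same permutation to the C axis of a tensor.

```python
def channel_shuffle(channels, groups):
    """Reorder channels so that each output group mixes channels from all input groups.

    Equivalent to reshaping to (groups, per_group), transposing, and flattening.
    """
    n = len(channels)
    assert n % groups == 0, "channel count must be divisible by groups"
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

Note that this only permutes C; the spatial axes W and H are left untouched, which is exactly the gap the aside is pointing at.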

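Notes:

Stacking several convolutions enlarges the receptive field, but information is lost along the way, so the global information obtained by stacking is insufficient (every operation loses something; it is only a question of how much).

The SE module can capture global information, but it is nowhere near as powerful as FC.

Is there something cheaper than FC that captures roughly as much information?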
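As a rough way to quantify that question, the sketch below compares parameter counts for a fully connected layer that mixes every position and channel against the non-local block implemented later in this post (three 1*1 embeddings plus the output 1*1 convolution, BatchNorm excluded). The helper names and the example size 64*14*14 are assumptions chosen for illustration.

```python
def fc_params(c, h, w):
    """Weights + biases of an FC layer mapping a flattened (c, h, w) map to the same size."""
    n = c * h * w
    return n * n + n


def nonlocal_params(c, inter=None):
    """Weights + biases of theta, phi, g and W, all 1*1 convolutions (no BatchNorm)."""
    if inter is None:
        inter = max(c // 2, 1)            # same default as the block: in_channels // 2
    embeddings = 3 * (c * inter + inter)  # theta, phi, g: c -> inter
    output = inter * c + c                # W: inter -> c
    return embeddings + output


print(fc_params(64, 14, 14))  # 157364480
print(nonlocal_params(64))    # 8352
```

The parameter count of the block does not depend on H or W, although its pairwise matrix still costs memory and compute proportional to the square of the number of positions.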

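The paper's key figure captures the core idea: predicting a given point draws on support from other positions, and the strength of that support is controlled by a weight W.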

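2.2 Implementation

2.2.1 Theory

Consider Equation \((1)\):

\[ y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \qquad (1) \]

Here \(x\) is the input feature, \(x_i\) is the feature at the current position, and \(x_j\) ranges over all positions (the surrounding features). \(f\) is the pairwise function that scores the relation between \(x_i\) and \(x_j\), \(C(x)\) is the normalization factor (a softmax usually suffices), and \(g\) is a transform applied to the feature at position \(j\).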

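It is actually a very simple function: treat \(f\) as a correlation function (its implementation is covered later) and \(g\) as a plain convolution. Multiplying the two and summing over all positions yields an output that carries global information.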
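The generic operation can be written out directly in plain Python. This is a minimal sketch, not the paper's code: it assumes the Gaussian form \(f(x_i, x_j) = e^{x_i \cdot x_j}\) with \(C(x) = \sum_j f(x_i, x_j)\), so that \(f / C\) is a softmax over positions, and it uses the identity for \(g\) by default. The name `non_local` is hypothetical.

```python
import math


def non_local(x, g=lambda v: v):
    """y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j) for a list of N feature vectors.

    f is assumed to be exp(x_i . x_j) and C(x) its sum over j (a softmax over j).
    g is a unary transform of x_j; identity by default (an assumption for brevity).
    """
    gx = [g(xj) for xj in x]
    out = []
    for xi in x:
        # pairwise scores between the current position and every position
        scores = [math.exp(sum(a * b for a, b in zip(xi, xj))) for xj in x]
        c = sum(scores)
        weights = [s / c for s in scores]
        # weighted sum of the transformed features
        yi = [sum(w * v[k] for w, v in zip(weights, gx)) for k in range(len(gx[0]))]
        out.append(yi)
    return out


print(non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Every output vector is a convex combination of all the transformed inputs, which is how each position ends up seeing the whole feature map in one step.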

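Much of the paper is devoted to ways of constructing the pairwise function \(f\): Gaussian, embedded Gaussian, and so on. There is no need to study them closely, because they are more work to implement; where a convolution will do, there is no reason to use anything else.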

Equation \((4)\) below is the Gaussian function, and Equation \((5)\) is the function \(g\):
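For reference, here is a sketch of the forms these formulas take in the paper, assuming the embedded Gaussian variant (the one the code in section 2.2.2 implements). The equation numbers of the original figures are not reproduced, since they may differ from the ones quoted above.

```latex
% Embedded Gaussian pairwise function; theta and phi are linear (1*1 conv) embeddings
f(x_i, x_j) = e^{\theta(x_i)^{T} \phi(x_j)}, \qquad
\theta(x_i) = W_\theta x_i, \quad \phi(x_j) = W_\phi x_j

% Normalization factor and the linear embedding g
C(x) = \sum_{\forall j} f(x_i, x_j), \qquad g(x_j) = W_g x_j

% With this choice, (1 / C(x)) f(x_i, x_j) is a softmax over j:
y = \mathrm{softmax}\left(x^{T} W_\theta^{T} W_\phi x\right) g(x)
```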

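If the formulas above are still unclear, reading the code makes everything click.

2.2.2 Code Implementation

The code follows the paper's description exactly, and its overall structure matches the block diagram in the paper. Downsampling is done simply by adding max pooling after \(\phi\) and \(g\).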

import torch
from torch import nn
from torch.nn import functional as F


class _NonLocalBlockND(nn.Module):
    def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True):
        super(_NonLocalBlockND, self).__init__()

        assert dimension in [1, 2, 3]

        self.dimension = dimension
        self.sub_sample = sub_sample

        self.in_channels = in_channels
        self.inter_channels = inter_channels

        if self.inter_channels is None:
            self.inter_channels = in_channels // 2
            if self.inter_channels == 0:
                self.inter_channels = 1

        if dimension == 3:
            conv_nd = nn.Conv3d
            max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2))
            bn = nn.BatchNorm3d
        elif dimension == 2:
            conv_nd = nn.Conv2d
            max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2))
            bn = nn.BatchNorm2d
        else:
            conv_nd = nn.Conv1d
            max_pool_layer = nn.MaxPool1d(kernel_size=(2))
            bn = nn.BatchNorm1d

        self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                         kernel_size=1, stride=1, padding=0)

        if bn_layer:
            self.W = nn.Sequential(
                conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                        kernel_size=1, stride=1, padding=0),
                bn(self.in_channels)
            )
            nn.init.constant_(self.W[1].weight, 0)
            nn.init.constant_(self.W[1].bias, 0)
        else:
            self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                             kernel_size=1, stride=1, padding=0)
            nn.init.constant_(self.W.weight, 0)
            nn.init.constant_(self.W.bias, 0)

        self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                             kernel_size=1, stride=1, padding=0)
        self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
                           kernel_size=1, stride=1, padding=0)

        if sub_sample:
            self.g = nn.Sequential(self.g, max_pool_layer)
            self.phi = nn.Sequential(self.phi, max_pool_layer)

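    def forward(self, x):
        '''
        :param x: (b, c, t, h, w) for dimension=3, (b, c, h, w) for 2, (b, c, l) for 1
        :return: z, same shape as x
        '''

        batch_size = x.size(0)

        # g(x): (b, inter_c, N') -> (b, N', inter_c); N' < N when sub_sample=True
        g_x = self.g(x).view(batch_size, self.inter_channels, -1)
        g_x = g_x.permute(0, 2, 1)

        # theta(x): (b, N, inter_c), one row per query position
        theta_x = self.theta(x).view(batch_size, self.inter_channels, -1)
        theta_x = theta_x.permute(0, 2, 1)
        # phi(x): (b, inter_c, N'), one column per key position
        phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)
        # pairwise scores f: (b, N, N'); the softmax over keys plays the role of 1 / C(x)
        f = torch.matmul(theta_x, phi_x)
        f_div_C = F.softmax(f, dim=-1)

        # weighted sum over all positions: (b, N, inter_c), reshaped back to the input layout
        y = torch.matmul(f_div_C, g_x)
        y = y.permute(0, 2, 1).contiguous()
        y = y.view(batch_size, self.inter_channels, *x.size()[2:])
        # W restores in_channels; it is zero-initialized, so the block starts as an identity mapping
        W_y = self.W(y)
        z = W_y + x

        return z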


class NONLocalBlock1D(_NonLocalBlockND):
    def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
        super(NONLocalBlock1D, self).__init__(in_channels,
                                              inter_channels=inter_channels,
                                              dimension=1, sub_sample=sub_sample,
                                              bn_layer=bn_layer)


class NONLocalBlock2D(_NonLocalBlockND):
    def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
        super(NONLocalBlock2D, self).__init__(in_channels,
                                              inter_channels=inter_channels,
                                              dimension=2, sub_sample=sub_sample,
                                              bn_layer=bn_layer)


class NONLocalBlock3D(_NonLocalBlockND):
    def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
        super(NONLocalBlock3D, self).__init__(in_channels,
                                              inter_channels=inter_channels,
                                              dimension=3, sub_sample=sub_sample,
                                              bn_layer=bn_layer)


if __name__ == '__main__':
    # smoke test: the output shape should equal the input shape for every configuration
    for (sub_sample, bn_layer) in [(True, True), (False, False), (True, False), (False, True)]:
        img = torch.zeros(2, 3, 20)
        net = NONLocalBlock1D(3, sub_sample=sub_sample, bn_layer=bn_layer)
        out = net(img)
        print(out.size())

        img = torch.zeros(2, 3, 20, 20)
        net = NONLocalBlock2D(3, sub_sample=sub_sample, bn_layer=bn_layer)
        out = net(img)
        print(out.size())

        img = torch.randn(2, 3, 8, 20, 20)
        net = NONLocalBlock3D(3, sub_sample=sub_sample, bn_layer=bn_layer)
        out = net(img)
        print(out.size())
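The effect of sub_sample on the pairwise matrix f in forward can be estimated with a little shape arithmetic. The sketch below is a hypothetical helper (`pairwise_matrix_shape` is not part of the code above); it assumes the pooling configuration used in this block, namely kernel size 2 with the default stride, and no pooling along t in the 3D case.

```python
from math import prod  # Python 3.8+


def pairwise_matrix_shape(spatial, sub_sample=True):
    """Shape (N, N') of f for one sample, given the spatial size of x.

    spatial: (l,), (h, w) or (t, h, w).
    N counts query positions (theta); N' counts key positions (phi and g after pooling).
    """
    n = prod(spatial)
    if not sub_sample:
        return n, n
    pooled = [s // 2 for s in spatial]  # kernel 2, stride 2, no padding
    if len(spatial) == 3:
        pooled[0] = spatial[0]          # MaxPool3d(kernel_size=(1, 2, 2)) keeps t
    return n, prod(pooled)


print(pairwise_matrix_shape((20, 20)))     # (400, 100)
print(pairwise_matrix_shape((8, 20, 20)))  # (3200, 800)
```

For 2D and 3D inputs the pooled version stores a quarter of the pairwise entries, while the output still has one row per original position, so the block's output shape is unchanged.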
