Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of the Adaptive Weights

阿新 • Published: 2022-03-25

Overview
Main content
- One-dimensional case
  - How to speed up
- Multidimensional case
Code

Nguyen D. and Widrow B. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In International Joint Conference on Neural Networks (IJCNN), 1990.

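Overview

This paper proposes a weight initialization method for two-layer networks.

Main content

Suppose we want to fit a function $g(\bm{x})$ with a two-layer network

\[f(\bm{x}) = \sum_{i=1}^{H} v_i \tanh (\bm{w}_i^T \bm{x} + b_i), \]

where $\bm{x}, \bm{w}_i \in \mathbb{R}^{d}$.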

The activation

\[\tanh (x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]

is approximately linear on $[-1, 1]$.

For convenience, let $f_i(\bm{x}) := v_i \tanh (\bm{w}_i^T \bm{x} + b_i)$.
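As a concrete reading of the definition above, here is a minimal sketch of the forward pass in plain Python. The function name `two_layer_tanh` and the toy parameter values are illustrative assumptions, not something taken from the paper.

```python
import math

def two_layer_tanh(x, W, b, v):
    """Evaluate f(x) = sum_i v_i * tanh(w_i . x + b_i) for one input vector x."""
    total = 0.0
    for w_i, b_i, v_i in zip(W, b, v):
        # Pre-activation of hidden unit i: inner product plus bias.
        pre = sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i
        total += v_i * math.tanh(pre)
    return total

# Hypothetical toy parameters: H = 2 hidden units, d = 2 inputs.
W = [[0.5, -0.25], [1.0, 0.0]]
b = [0.0, 0.0]
v = [1.0, -1.0]
y0 = two_layer_tanh([0.0, 0.0], W, b, v)
```

Each term of the sum is one $f_i(\bm{x})$, so the per-unit discussion that follows applies to each loop iteration separately.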

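One-dimensional case

\[f(x) = \sum_{i=1}^{H} v_i \tanh (w_i x + b_i), \\ f_i(x) = v_i \tanh (w_i x + b_i). \]

Ignoring the $\tanh$ (and the output scale $v_i$), when $x \in [-1, 1]$ each $f_i(x)$, or more precisely its pre-activation $w_i x + b_i$, falls (approximately) in the interval

\[[b_i - |w_i|, b_i + |w_i|]. \]

This interval is centered at $b_i$ and has length $2|w_i|$.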
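The interval can be checked directly with a small sketch; the helper name `preactivation_range` and the sample values are assumptions made for illustration.

```python
def preactivation_range(w, b):
    """Range of w*x + b as x sweeps [-1, 1]: centred at b, total width 2*|w|."""
    return b - abs(w), b + abs(w)

# Hypothetical unit with w = -3 and b = 0.5.
lo, hi = preactivation_range(-3.0, 0.5)
```

The endpoints are attained at $x = \pm 1$; which end maps to which depends on the sign of $w_i$.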
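So the whole network can be viewed as drawing a value $\xi_i$ from each interval and summing:

\[f(x) = \sum_{i=1}^H v_i \xi_i. \]

If the intervals are all roughly the same, $f(x)$ itself easily ends up very flat.

Note: the curve in the paper is even flatter, which depends on the initialization; but since every unit is approximately linear, a straight line is to be expected.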

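How to speed up

The situation above is what results when the weights $w_i, b_i$ are sampled from $\mathcal{U}[-1, 1]$ and from $\mathcal{U}[-5, 5]$. We want the output $f(x)$ not to be flat, but we also do not want it biased toward particular values; with $\mathcal{U}[-5, 5]$, the smaller values clearly take up a large share.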

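$f(x)$ has $H$ nodes in total. The $x$-interval on which

\[-1 < w_ix + b_i < 1 \]

holds is

\[-1/|w_i| - b_i/ w_i < x < 1 / |w_i| - b_i / w_i, \]

whose length is $2 / |w_i|$. If the $H$ nodes are to split the input range $[-1, 1]$ (of length $2$) evenly, each node should cover a length of $2 / H$:

\[2 / |w_i| = 2 / H \rightarrow |w_i| = H. \]

That is, every node's interval has the same size in $x$, namely $2 / H$. The final output is then likely to be evenly spread and not completely flat.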
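The interval arithmetic above is easy to verify numerically. In this sketch, `linear_region` is a hypothetical helper name and the values of `H`, `w` and `b` are arbitrary choices.

```python
def linear_region(w, b):
    """x-interval on which -1 < w*x + b < 1 (requires w != 0)."""
    center = -b / w              # where the pre-activation crosses zero
    half_width = 1.0 / abs(w)    # half of the total length 2/|w|
    return center - half_width, center + half_width

H = 4
lo, hi = linear_region(float(H), 2.0)   # |w| = H, so the length should be 2/H
length = hi - lo
```

With $|w_i| = H$ every unit gets a region of the same length $2 / H$; only its position, set by $-b_i / w_i$, differs from unit to unit.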

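Note: in this case, for any $x \in [-1, 1]$, only one $f_i(x)$ has its pre-activation in $[-1, 1]$; the others all behave nonlinearly. Is that what makes the output non-flat?

To make the intervals overlap somewhat, in practice we use $|w_i| = 0.7H$.

The interval centers $-b_i / w_i$ are sampled from $[-1, 1]$, which amounts to

\[b_i \sim \mathcal{U}[-|w_i|, |w_i|]. \]

Finally, $v_i$ is sampled uniformly from $\mathcal{U}[-0.5, 0.5]$.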
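Putting the three sampling rules together gives the following minimal one-dimensional sketch in plain Python. The function name `nguyen_widrow_1d` and the fixed seed are assumptions. The text only fixes $|w_i|$, so drawing the sign of $w_i$ at random is also an assumption of this sketch.

```python
import random

def nguyen_widrow_1d(H, overlap=0.7, seed=0):
    """Sample (w, b, v) for H hidden units with a scalar input in [-1, 1]."""
    rng = random.Random(seed)
    magnitude = overlap * H                                      # |w_i| = 0.7 * H
    # Assumption: the sign of w_i is chosen at random.
    w = [rng.choice((-1.0, 1.0)) * magnitude for _ in range(H)]
    b = [rng.uniform(-magnitude, magnitude) for _ in range(H)]   # b_i ~ U[-|w_i|, |w_i|]
    v = [rng.uniform(-0.5, 0.5) for _ in range(H)]               # v_i ~ U[-0.5, 0.5]
    return w, b, v

w, b, v = nguyen_widrow_1d(H=5)
# Centres of the linear regions, -b_i / w_i; they should lie in [-1, 1].
centers = [-b_i / w_i for w_i, b_i in zip(w, b)]
```

Because $|b_i| \le |w_i|$, every centre lands inside the input range, so each unit's linear region sits somewhere over $[-1, 1]$.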


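Multidimensional case

The authors explain this case by going from $f_i(x)$ to its Fourier-transform representation, but frankly I do not really follow the reasoning (perhaps because I do not understand the Fourier transform well enough).
Here is my own tentative explanation.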

First,

\[f_i(\bm{x}) = v_i \tanh (\bm{w}_i^T \bm{x} + b_i). \]

If $\bm{x} \in [-1, 1]^d$, then

\[-\|\bm{w}_i\|_1 + b_i \le \bm{w}_i^T\bm{x} + b_i \le \|\bm{w}_i\|_1 + b_i, \]

so the interval has length at most $2\|\bm{w}_i\|_1$. As in the one-dimensional case, we want the different intervals to have the same length, i.e. $\|\bm{w}_i\|_1$ should be the same for every $i$.
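The bound follows in one step from the triangle inequality. Writing $w_{ij}$ for the $j$-th component of $\bm{w}_i$ (notation introduced here for illustration) and using $|x_j| \le 1$:

\[|\bm{w}_i^T \bm{x}| = \Big|\sum_{j=1}^{d} w_{ij} x_j\Big| \le \sum_{j=1}^{d} |w_{ij}| \, |x_j| \le \sum_{j=1}^{d} |w_{ij}| = \|\bm{w}_i\|_1. \]

Equality holds at the corner $x_j = \mathrm{sign}(w_{ij})$, so both ends of the interval are actually reached and its length is exactly $2\|\bm{w}_i\|_1$ over the whole cube.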

The paper gives

\[\|\bm{w}_i\|_1 = H^{\frac{1}{d}}. \]

Similarly, in practice one uses

\[\|\bm{w}_i\|_1 = 0.7 \cdot H^{\frac{1}{d}}. \]

In addition, $b_i$ is sampled from $\mathcal{U}[-\|\bm{w}_i\|_1, \|\bm{w}_i\|_1]$.
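The multidimensional rule can be sketched without any dependencies as follows. The name `nguyen_widrow_rows`, the seed, and the choice of drawing the initial direction from $\mathcal{U}[-0.5, 0.5]$ (mirroring the PyTorch version in the Code section) are assumptions, as is the use of the $\ell_1$ norm.

```python
import random

def nguyen_widrow_rows(H, d, overlap=0.7, seed=0):
    """Return (W, b): H weight rows of equal L1 norm 0.7 * H**(1/d), plus biases."""
    rng = random.Random(seed)
    magnitude = overlap * H ** (1.0 / d)
    W, b = [], []
    for _ in range(H):
        row = [rng.uniform(-0.5, 0.5) for _ in range(d)]   # random direction
        # Fall back to 1.0 so an (extremely unlikely) all-zero row does not divide by zero.
        l1 = sum(abs(r) for r in row) or 1.0
        W.append([magnitude * r / l1 for r in row])        # rescale to the target L1 norm
        b.append(rng.uniform(-magnitude, magnitude))       # b_i ~ U[-|w_i|, |w_i|]
    return W, b

W, b = nguyen_widrow_rows(H=8, d=3)
norms = [sum(abs(w_ij) for w_ij in row) for row in W]
```

For $d = 1$ the target magnitude reduces to $0.7H$, which agrees with the one-dimensional case.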

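Note: this coefficient is derived via the Fourier transform, slices and so on, but I have not worked out the connection.
Note: the paper writes $|\cdot|$ for the magnitude; I cannot guarantee that it means the $\ell_1$ norm.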

Code

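import torch
import torch.nn as nn
import torch.nn.functional as F

def nguyen_widrow_init(weight, bias, scale: float = 0.7):
    # weight: (H, d) matrix of the hidden layer; bias: (H,) vector.
    hidden_units, in_features = weight.size()
    # Random directions; only the direction survives the normalization below.
    nn.init.uniform_(weight, -0.5, 0.5)
    # Target magnitude 0.7 * H^(1/d), shared by every row w_i.
    magnitude = scale * hidden_units ** (1 / in_features)
    with torch.no_grad():
        # Rescale each row to L1 norm `magnitude` (L1 is this post's reading of |w_i|).
        weight.copy_(magnitude * F.normalize(weight, p=1, dim=-1))
    # b_i ~ U[-|w_i|, |w_i|]
    nn.init.uniform_(bias, -magnitude, magnitude)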


# test

import torch
import torch.nn as nn
import torch.nn.functional as F

from freeplot.base import FreePlot

class Net(nn.Module):

    def __init__(self) -> None:
        super().__init__()

        self.linear1 = nn.Linear(1, 4)
        self.linear2 = nn.Linear(4, 1)


        nguyen_widrow_init(self.linear1.weight, self.linear1.bias)
        nn.init.uniform_(self.linear2.weight, -0.5, 0.5)
        nn.init.uniform_(self.linear2.bias, -0.5, 0.5)

    def forward(self, x):
        z = torch.tanh(self.linear1(x))
        y = self.linear2(z)
        return z, y


model = Net()
x = torch.linspace(-1, 1, 100).view(-1, 1)
z, y = model(x)

x = x.flatten().detach().clone().numpy()
y = y.flatten().detach().clone().numpy()
z = z.transpose(0, 1).detach().clone().numpy()


fp = FreePlot((1, 2), (5, 2), sharey=False)
for i in range(4):
    fp.lineplot(x, z[i], label=str(i))
fp.lineplot(x, y, label='ALL', index=(0, 1), color='black')
fp.set_label(r"$f_i(x)$", index=(0, 0))
fp.set_label(r"$f(x)$", index=(0, 1))
fp.show()
