PyTorch Notes 8: Dropout
阿新 · Published: 2018-12-30
Overview
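Suppose the training loss is already very small, but the loss jumps sharply when the trained network is run on the test set. The network has most likely overfit the training data.
Overfitting can generally be reduced by enlarging the training set, adding a regularization term to the loss function, or using Dropout. This note demonstrates Dropout.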
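The regularization-term option can be sketched as follows. This is a minimal illustration only, assuming a plain L2 penalty; the tiny model, the data, and the strength `l2_lambda` are hypothetical and are not part of this note's experiment.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny model and data, only to show how the penalty enters the loss
net = torch.nn.Linear(1, 1)
x = torch.linspace(-1, 1, 10).unsqueeze(1)
y = x + 0.3 * torch.randn(10, 1)

loss_func = torch.nn.MSELoss()
l2_lambda = 1e-3  # hypothetical regularization strength

mse = loss_func(net(x), y)
# L2 term: sum of squared values of every parameter (weights and biases alike)
l2_penalty = sum((p ** 2).sum() for p in net.parameters())
loss = mse + l2_lambda * l2_penalty

loss.backward()  # each gradient now also contains 2 * l2_lambda * p
print(f"MSE: {mse.item():.4f}  regularized loss: {loss.item():.4f}")
```

PyTorch optimizers such as `torch.optim.Adam` also accept a `weight_decay` argument, which adds `weight_decay * p` to each gradient; that corresponds to a penalty of `weight_decay / 2 * ||p||^2` without writing the term by hand.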
import torch
from torch.autograd import Variable  # legacy: since PyTorch 0.4, Variable(t) simply returns a Tensor
import matplotlib.pyplot as plt
torch.manual_seed(1)
%matplotlib inline
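Preparing the data
Overfitting usually arises when there is little training data and the network is comparatively complex. To make the overfitting obvious, only 10 data points are used for training.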
DATA_SIZE = 10
# training set
x = torch.unsqueeze(torch.linspace(-1, 1, DATA_SIZE), dim=1)  # shape (10, 1)
y = x + 0.3*torch.normal(torch.zeros(DATA_SIZE, 1), torch.ones(DATA_SIZE, 1))
x, y = Variable(x), Variable(y)
# test set
test_x = torch.unsqueeze(torch.linspace(-1, 1, DATA_SIZE), dim=1)
test_y = test_x + 0.3 *torch.normal(torch.zeros(DATA_SIZE,1), torch.ones(DATA_SIZE,1))
test_x, test_y = Variable(test_x), Variable(test_y)
# scatter
plt.scatter(x.data.numpy(), y.data.numpy(), label='train')
plt.scatter(test_x.data.numpy(), test_y.data.numpy(), label='test')
plt.legend(loc='upper left')
plt.ylim((-2.5, 2.5))
plt.show()
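Building the neural networks
Two networks are built here: one with Dropout and one without. To tell them apart, the one without Dropout, which is prone to overfitting, is named net_overfitting, and the one with Dropout is named net_dropout.
- Note 1: In torch.nn.Dropout(0.5), the 0.5 means that in every training iteration each neuron of that layer has a 50% chance of being randomly dropped (deactivated) and takes no part in that iteration. In general, layers with many neurons are given a higher dropout probability than layers with few neurons.
- Note 2: Because Dropout is being applied and the overfitting should be pronounced, the layers are given many neurons, 300 here.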
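In PyTorch, "dropping" a neuron means zeroing its output: in training mode `torch.nn.Dropout(p)` zeros each element with probability `p` and scales the surviving elements by `1 / (1 - p)` (so-called inverted dropout), and in evaluation mode it passes its input through unchanged. The sketch below checks this on a tensor of ones; the input size is an arbitrary choice for illustration.

```python
import torch

torch.manual_seed(0)

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 10000)

# Training mode (the default): elements are zeroed with probability p and the
# survivors are scaled by 1 / (1 - p) = 2.0, so the expected value is unchanged.
drop.train()
out_train = drop(x)
print(out_train.unique())               # only the values 0. and 2. occur
print((out_train == 0).float().mean())  # fraction of zeroed elements, close to 0.5

# Evaluation mode: Dropout is the identity function.
drop.eval()
out_eval = drop(x)
print(torch.equal(out_eval, x))
```

The rescaling during training is why no extra correction is needed at test time: switching to `eval()` is enough.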
N_HIDDEN = 300
# quick build NN by using Sequential
net_overfitting = torch.nn.Sequential(
    torch.nn.Linear(1, N_HIDDEN),         # first hidden layer
    torch.nn.ReLU(),                      # activation func for first hidden layer
    torch.nn.Linear(N_HIDDEN, N_HIDDEN),  # second hidden layer
    torch.nn.ReLU(),                      # activation func for second hidden layer
    torch.nn.Linear(N_HIDDEN, 1)
)
net_dropout = torch.nn.Sequential(
    torch.nn.Linear(1, N_HIDDEN),
    torch.nn.Dropout(0.5),                # drop 50% of neurons
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, N_HIDDEN),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, 1)
)
print('net_overfitting: \n', net_overfitting)
print('\n net_dropout: \n', net_dropout)
net_overfitting:
Sequential (
  (0): Linear (1 -> 300)
  (1): ReLU ()
  (2): Linear (300 -> 300)
  (3): ReLU ()
  (4): Linear (300 -> 1)
)

net_dropout:
Sequential (
  (0): Linear (1 -> 300)
  (1): Dropout (p = 0.5)
  (2): ReLU ()
  (3): Linear (300 -> 300)
  (4): Dropout (p = 0.5)
  (5): ReLU ()
  (6): Linear (300 -> 1)
)
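Dropout layers have no learnable parameters, so the two networks differ only in the Dropout layers, not in size. A quick count also shows how large the networks are relative to 10 training points: 600 + 90,300 + 301 = 91,201 parameters. The helper names `build` and `count_params` below are hypothetical; `build` simply reproduces the two architectures printed above.

```python
import torch

N_HIDDEN = 300

def build(use_dropout: bool) -> torch.nn.Sequential:
    """Rebuild the architectures above; Dropout(0.5) follows each hidden Linear if requested."""
    layers = [torch.nn.Linear(1, N_HIDDEN)]
    if use_dropout:
        layers.append(torch.nn.Dropout(0.5))
    layers += [torch.nn.ReLU(), torch.nn.Linear(N_HIDDEN, N_HIDDEN)]
    if use_dropout:
        layers.append(torch.nn.Dropout(0.5))
    layers += [torch.nn.ReLU(), torch.nn.Linear(N_HIDDEN, 1)]
    return torch.nn.Sequential(*layers)

def count_params(net: torch.nn.Module) -> int:
    """Number of trainable scalar parameters in a module."""
    return sum(p.numel() for p in net.parameters() if p.requires_grad)

net_overfitting = build(use_dropout=False)
net_dropout = build(use_dropout=True)
print(count_params(net_overfitting))  # 91201
print(count_params(net_dropout))      # 91201
```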
Training
To isolate a single variable (whether the network uses Dropout), net_overfitting and net_dropout use the same loss function, optimizer, learning rate, and other settings.
optimizer_overfitting = torch.optim.Adam(net_overfitting.parameters(), lr=0.01)
optimizer_dropout = torch.optim.Adam(net_dropout.parameters(), lr=0.01)
loss_func = torch.nn.MSELoss()
for t in range(1000):
    # train the NN by training set
    prediction_overfitting = net_overfitting(x)
    prediction_dropout = net_dropout(x)
    loss_overfitting = loss_func(prediction_overfitting, y)
    loss_dropout = loss_func(prediction_dropout, y)
    optimizer_overfitting.zero_grad()
    optimizer_dropout.zero_grad()
    loss_overfitting.backward()
    loss_dropout.backward()
    optimizer_overfitting.step()
    optimizer_dropout.step()
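The symptom described in the overview (small training loss, much larger test loss) can be watched directly by evaluating the test loss every few hundred steps. The sketch below is an assumption-laden variant, not the author's procedure: it regenerates data of the same form (y = x + 0.3 * Gaussian noise), trains only a network shaped like net_overfitting, and records `(step, train_loss, test_loss)` in a hypothetical `history` list. Whether and how much the gap widens depends on the random data.

```python
import torch

torch.manual_seed(1)
DATA_SIZE, N_HIDDEN = 10, 300

# Same kind of synthetic data as above: y = x + Gaussian noise
x = torch.linspace(-1, 1, DATA_SIZE).unsqueeze(1)
y = x + 0.3 * torch.randn(DATA_SIZE, 1)
test_x = torch.linspace(-1, 1, DATA_SIZE).unsqueeze(1)
test_y = test_x + 0.3 * torch.randn(DATA_SIZE, 1)

net = torch.nn.Sequential(
    torch.nn.Linear(1, N_HIDDEN), torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, N_HIDDEN), torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, 1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_func = torch.nn.MSELoss()

history = []  # (step, train_loss, test_loss)
for t in range(1000):
    net.train()
    loss = loss_func(net(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if t % 200 == 0 or t == 999:
        net.eval()
        with torch.no_grad():  # no gradients needed to measure the test loss
            test_loss = loss_func(net(test_x), test_y).item()
        history.append((t, loss.item(), test_loss))

for step, train_loss, test_loss in history:
    print(f"step {step:4d}  train {train_loss:.4f}  test {test_loss:.4f}")
```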
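Testing and plotting
Using the synthetic test set created earlier, compare the loss of net_overfitting and net_dropout, and show the results in a plot.
- Note 1: Dropout should not be active at test time, so switch the network to evaluation mode with eval() before testing. net_overfitting contains no Dropout layer, so it needs no change (calling eval() on it would be harmless). Call net_dropout.train() to switch Dropout back on before any further training.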
# test the NN by test set
net_dropout.eval()  # evaluation mode: Dropout is disabled at test time
test_prediction_overfitting = net_overfitting(test_x)
test_prediction_dropout = net_dropout(test_x)
# scatter
# plt.scatter(x.data.numpy(), y.data.numpy(), label='train')
plt.scatter(test_x.data.numpy(), test_y.data.numpy(), label='test')
# plot
plt.plot(test_x.data.numpy(), test_prediction_overfitting.data.numpy(),'r-', label='overfitting')
plt.plot(test_x.data.numpy(), test_prediction_dropout.data.numpy(), 'b--', label='dropout')
plt.text(0, -1.2, 'overfitting loss=%.4f' % loss_func(test_prediction_overfitting, test_y).item(), fontdict={'size': 20, 'color': 'red'})
plt.text(0, -1.5, 'dropout loss=%.4f' % loss_func(test_prediction_dropout, test_y).item(), fontdict={'size': 20, 'color': 'blue'})
plt.legend(loc='upper left')
plt.ylim((-2.5, 2.5))
plt.show()
The figure above shows that on the test set, net_overfitting (without Dropout) has a larger loss than net_dropout (with Dropout): the red line (without Dropout) fits the test data worse than the blue dashed line (with Dropout).
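The test step (switch to eval(), compute the loss on the test set) can be wrapped in a small helper so that Dropout is never accidentally left on or off. This is a sketch only: the name `evaluate`, the use of `torch.no_grad()`, and restoring the previous mode afterwards are additions not in the original code, and the usage below runs on a small stand-in network rather than the trained nets from this note.

```python
import torch

def evaluate(net: torch.nn.Module, x: torch.Tensor, y: torch.Tensor, loss_func) -> float:
    """Loss of `net` on (x, y) with Dropout disabled; the previous mode is restored afterwards."""
    was_training = net.training
    net.eval()               # Dropout layers become the identity
    with torch.no_grad():    # evaluation does not need an autograd graph
        loss = loss_func(net(x), y).item()
    net.train(was_training)  # return to train mode if the net was training before
    return loss

# Usage on a small stand-in network and data
torch.manual_seed(1)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 300), torch.nn.Dropout(0.5), torch.nn.ReLU(), torch.nn.Linear(300, 1)
)
test_x = torch.linspace(-1, 1, 10).unsqueeze(1)
test_y = test_x + 0.3 * torch.randn(10, 1)
loss_func = torch.nn.MSELoss()

first = evaluate(net, test_x, test_y, loss_func)
second = evaluate(net, test_x, test_y, loss_func)
print(first == second, net.training)  # same loss both times; net is back in train mode
```

With Dropout disabled the forward pass is deterministic, so repeated evaluations give the same loss, while the network is still left ready for further training.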