[Hands-On PyTorch] Softmax Regression

阿新 • Published: 2020-02-13

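1. What is softmax?

Given an array S with elements S_i, the softmax value of S_i is the ratio of that element's exponential to the sum of the exponentials of all elements. As a formula:

$$\mathrm{softmax}(S_i) = \frac{e^{S_i}}{\sum_j e^{S_j}}$$

Softmax regression is, at its core, also a way of producing an estimate from data: its outputs are non-negative and sum to 1, so they can be read as class probabilities.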
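As a quick check of the definition, here is a minimal sketch in plain Python. The helper name `softmax_list` and the sample values are illustrative and not from the original post.

```python
import math

def softmax_list(scores):
    """Softmax of a list of numbers: exp of each element over the sum of all exps."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

S = [1.0, 2.0, 3.0]  # illustrative values
probs = softmax_list(S)
print(probs)       # three positive numbers, largest for the largest score
print(sum(probs))  # sums to 1 up to floating-point rounding
```

Larger scores map to larger probabilities, and the ordering of the elements is preserved.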

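2. The cross-entropy loss function

When estimating a loss, especially a loss over probabilities, the cross-entropy loss is the more common choice. For sample i with q classes, the cross-entropy between the label distribution and the predicted distribution is:

$$H\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)}$$

When each sample contains a single object (that is, each sample has exactly one label), the vector $\boldsymbol{y}^{(i)}$ we construct is one-hot: every component is either 0 or 1, and only the component at index $y^{(i)}$ (the class label) is 1. The sum therefore collapses to a single term:

$$H\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right) = -\log \hat{y}_{y^{(i)}}^{(i)}$$

In other words, cross-entropy only cares about the predicted probability of the correct class: as long as that value is large enough (above 0.5, for instance), the classification is guaranteed to be correct. When a sample has several labels, for example an image that contains more than one object, this simplification is not available. Even then, cross-entropy still only cares about the predicted probabilities of the classes that actually appear in the image.

The cross-entropy loss function over n training samples is:

$$\ell = \frac{1}{n} \sum_{i=1}^{n} H\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right)$$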
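The one-hot simplification can be checked numerically. The sketch below uses made-up probabilities and a hypothetical helper `cross_entropy_full`; it only illustrates that the full sum over classes and the single-term form agree for a one-hot label.

```python
import math

def cross_entropy_full(y_onehot, y_hat):
    """Full cross-entropy: -sum_j y_j * log(y_hat_j)."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, y_hat))

y_hat = [0.1, 0.3, 0.6]   # predicted class probabilities (illustrative)
label = 2                 # index of the correct class
y_onehot = [1.0 if j == label else 0.0 for j in range(len(y_hat))]

full = cross_entropy_full(y_onehot, y_hat)
short = -math.log(y_hat[label])   # only the correct class contributes
print(full, short)                # the two values agree
```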

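3. Getting the Fashion-MNIST training set and reading the data

Here we use the torchvision package, which serves the PyTorch deep learning framework and is mainly used to build computer vision models. torchvision consists mainly of the following parts:

torchvision.datasets: functions for loading data and interfaces to common datasets;
torchvision.models: common model architectures (including pretrained models), such as AlexNet, VGG, and ResNet;
torchvision.transforms: common image transformations, such as cropping and rotation;
torchvision.utils: other useful utilities.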

```
from IPython import display
import matplotlib.pyplot as plt

import torch
import torchvision
import torchvision.transforms as transforms
import time

import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

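# Get the datasets. train=True selects the training split and train=False the test split (train defaults to True).
# To get both at once, see the d2l.load_data_fashion_mnist call in the first data-loading code of section 4.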

mnist_train = torchvision.datasets.FashionMNIST(root='/home/kesci/input/FashionMNIST2065', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='/home/kesci/input/FashionMNIST2065', train=False, download=True, transform=transforms.ToTensor())
```
class torchvision.datasets.FashionMNIST(root, train=True, transform=None, target_transform=None, download=False)
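- root (string): root directory of the dataset, where the processed/training.pt and processed/test.pt files are stored.
- train (bool, optional): if True, create the dataset from training.pt; otherwise from test.pt.
- download (bool, optional): if True, download the data from the internet and put it under root. If the data already exists under root, it is not downloaded again.
- transform (callable, optional): a function or transform that takes a PIL image and returns the transformed data, e.g. transforms.RandomCrop.
- target_transform (callable, optional): a function or transform that takes the target and transforms it.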

# show result 
print(type(mnist_train))
print(len(mnist_train), len(mnist_test))

<class 'torchvision.datasets.mnist.FashionMNIST'>
60000 10000

# Any sample can be accessed by index
feature, label = mnist_train[0]
print(feature.shape, label)  # Channel x Height x Width

torch.Size([1, 28, 28]) 9

# Without a transform the dataset returns PIL images; print one to inspect its type and parameters
mnist_PIL = torchvision.datasets.FashionMNIST(root='/home/kesci/input/FashionMNIST2065', train=True, download=True)
PIL_feature, label = mnist_PIL[0]
print(PIL_feature)

<PIL.Image.Image image mode=L size=28x28 at 0x7F54A41612E8>

# This function is saved in the d2lzh package for later use
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    # _ denotes a variable we ignore (do not use)
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

X, y = [], []
for i in range(10):
    X.append(mnist_train[i][0]) # append the i-th feature to X
    y.append(mnist_train[i][1]) # append the i-th label to y
show_fashion_mnist(X, get_fashion_mnist_labels(y))

# Read the data
batch_size = 256
num_workers = 4


train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

start = time.time()
for X, y in train_iter:
    continue
print('%.2f sec' % (time.time() - start))
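To see what the iterator yields without downloading anything, here is a small sketch that wraps synthetic tensors in a `DataLoader`. The random data and the choice of `num_workers=0` are assumptions made for the demo; the batching behavior is the same as for the Fashion-MNIST loaders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for Fashion-MNIST: 1000 random "images" and labels
features = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(features, labels)

# num_workers=0 loads in the main process, the simplest setting for a small demo
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=0)

batch_sizes = [X.shape[0] for X, y in loader]
print(len(loader))   # number of batches per epoch
print(batch_sizes)   # [256, 256, 256, 232]: the last batch holds the remainder
```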

4. Softmax from scratch

import torch
import torchvision
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

# Get the training and test data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, root='/home/kesci/input/FashionMNIST2065')

# Initialize the model parameters
num_inputs = 784
print(28*28)  # prints 784: each 28x28 image is flattened into 784 inputs
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

# Track gradients for both parameters
W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

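# Operations on multi-dimensional tensors
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(X.sum(dim=0, keepdim=True))  # dim=0: sum down each column; the reduced dimension is kept with size 1
print(X.sum(dim=1, keepdim=True))  # dim=1: sum across each row; the reduced dimension is kept with size 1
print(X.sum(dim=0, keepdim=False)) # dim=0: sum down each column; the reduced dimension is dropped
print(X.sum(dim=1, keepdim=False)) # dim=1: sum across each row; the reduced dimension is dropped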

tensor([[5, 7, 9]])
tensor([[ 6],
        [15]])
tensor([5, 7, 9])
tensor([ 6, 15])
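The reason `keepdim` matters is broadcasting. The sketch below (floating-point values chosen for illustration) divides each row by its own sum, which works with the (2, 1) result and fails with the (2,) result.

```python
import torch

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

row_sum_keep = X.sum(dim=1, keepdim=True)    # shape (2, 1)
row_sum_flat = X.sum(dim=1, keepdim=False)   # shape (2,)
print(row_sum_keep.shape, row_sum_flat.shape)

# (2, 3) / (2, 1) broadcasts along the columns, so each row is divided by its own sum
normalized = X / row_sum_keep
print(normalized.sum(dim=1))  # each row sums to 1

# (2, 3) / (2,) cannot broadcast: trailing dimensions 3 and 2 do not match
try:
    X / row_sum_flat
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```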

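Defining softmax:

def softmax(X):
    X_exp = X.exp()                              # exponentiate every component
    partition = X_exp.sum(dim=1, keepdim=True)   # per-row normalizer, shape (n, 1)
    # Debug prints; remove them before training, because net() calls softmax on every batch
    print("X_exp is ", X_exp)
    print("partition is ", partition, partition.size())
    return X_exp / partition

X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, '\n', X_prob.sum(dim=1))

# Without keepdim=True in the sum, partition would have shape (2,) instead of (2, 1),
# and the division would no longer broadcast row by row

# Sample output (X is random, so the values change from run to run; the probabilities
# printed below appear to come from a different run than the X_exp values)
X_exp is  tensor([[2.1143, 1.4179, 2.1258, 2.3031, 1.2574],
        [1.1700, 1.1645, 1.1296, 1.8801, 1.3726]])
partition is  tensor([[9.2185],
        [6.7168]]) torch.Size([2, 1])

tensor([[0.2253, 0.1823, 0.1943, 0.2275, 0.1706],
        [0.1588, 0.2409, 0.2310, 0.1670, 0.2024]]) 
tensor([1.0000, 1.0000])    # the probabilities in each row (each sample) sum to 1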
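The softmax above exponentiates the raw scores directly, which can overflow for large inputs. A common variation, not part of the original post, subtracts each row's maximum first; the result is mathematically the same. The function name `stable_softmax` and the test values are illustrative.

```python
import torch

def stable_softmax(X):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    shifted = X - X.max(dim=1, keepdim=True).values
    X_exp = shifted.exp()
    return X_exp / X_exp.sum(dim=1, keepdim=True)

X = torch.tensor([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])

naive = X.exp() / X.exp().sum(dim=1, keepdim=True)
print(naive[1])              # nan values: exp(1000) overflows to inf in float32
print(stable_softmax(X)[1])  # finite probabilities

# Agreement with PyTorch's built-in softmax
print(torch.allclose(stable_softmax(X), torch.softmax(X, dim=1)))
```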

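Building the regression model

def net(X):
    # The number of rows (the batch size) is unknown and the number of columns is num_inputs,
    # so we write .view(-1, num_inputs): whichever dimension is unknown is given as -1.
    # X.view(-1) alone would flatten the tensor into a single 1-D vector holding all components.
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)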
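The shapes involved in `net` can be traced with a made-up batch. In this sketch the parameters are zero placeholders and the batch size of 4 is arbitrary; only the shapes matter.

```python
import torch

num_inputs, num_outputs = 784, 10
X = torch.rand(4, 1, 28, 28)               # a made-up batch of 4 images
W = torch.zeros(num_inputs, num_outputs)   # placeholder parameters, zeros for simplicity
b = torch.zeros(num_outputs)

flat = X.view(-1, num_inputs)    # -1 lets PyTorch infer the batch dimension
logits = torch.mm(flat, W) + b   # b is broadcast across the 4 rows

print(flat.shape)        # torch.Size([4, 784])
print(logits.shape)      # torch.Size([4, 10])
print(X.view(-1).shape)  # torch.Size([3136]): everything flattened into one vector
```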

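Defining the loss function

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))  # pick the y(i)-th entry of y_hat for each sample

Supplement: gather(input, dim, index) or input.gather(dim, index)

index is supplied as a tensor. dim determines the axis along which values are picked: with dim=0 the index selects a row for each output position, and with dim=1 it selects a column within each row.

In the example below dim=1, and

y.view(-1, 1) = (0, 2)' is a column vector,
so the code takes, row by row, the column-0 entry of the first row (0.1) and the column-2 entry of the second row (0.5).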

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))

tensor([[0.1000],
        [0.5000]])
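For comparison, the same selection can be written with integer indexing. This is an equivalent formulation offered as a sketch, not how the original post does it.

```python
import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

picked_gather = y_hat.gather(1, y.view(-1, 1))   # shape (2, 1)
picked_index = y_hat[torch.arange(len(y)), y]    # shape (2,)
print(picked_gather.view(-1), picked_index)      # both select 0.1 and 0.5

# Per-sample cross-entropy is the negative log of the selected probabilities
loss_per_sample = -torch.log(picked_index)
print(loss_per_sample)  # -log(0.1) and -log(0.5)
```

The `gather` form keeps a trailing dimension of size 1, while the indexing form returns a flat vector; after `.sum()` both give the same loss.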

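Defining accuracy

After making predictions we need an accuracy function to check them.

def accuracy(y_hat, y):
    # argmax(dim=1) returns the index of the largest value in each row, i.e. the predicted class.
    # The comparison gives 1 where the prediction matches the true label and 0 otherwise; the mean of these is the accuracy.
    return (y_hat.argmax(dim=1) == y).float().mean().item()
print(accuracy(y_hat, y))

# Compute the average accuracy. This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step: its full implementation is described in the "Image Augmentation" section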
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n

print(evaluate_accuracy(test_iter, net))
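Stepping through the accuracy computation on the two-sample example used for `gather` shows each intermediate result:

```python
import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

pred = y_hat.argmax(dim=1)            # predicted class per row: tensor([2, 2])
correct = (pred == y)                 # tensor([False,  True])
acc = correct.float().mean().item()   # 1 correct out of 2
print(pred, correct, acc)             # accuracy is 0.5
```

The first sample is misclassified (predicted class 2, true class 0) and the second is correct, so the accuracy is 0.5.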

Training the model

num_epochs, lr = 5, 0.1

# This function is saved in the d2lzh_pytorch package for later use
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        # train_l_sum accumulates the training loss, train_acc_sum the number of correct training predictions
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # Zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step() 
            
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
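The post calls `d2l.sgd` without showing it. The sketch below is an assumption about what that helper does (plain minibatch SGD that divides the gradient of the summed loss by the batch size); it is not copied from the package.

```python
import torch

def sgd(params, lr, batch_size):
    """Minibatch SGD sketch (assumed behavior of d2l.sgd, not copied from the package)."""
    for param in params:
        # Dividing by batch_size averages the gradient of the summed loss
        param.data -= lr * param.grad / batch_size

# Tiny check: one step on the summed loss sum((w - t)^2) over a "batch" of 2 targets
w = torch.tensor([0.0], requires_grad=True)
targets = torch.tensor([1.0, 3.0])
loss = ((w - targets) ** 2).sum()
loss.backward()                  # gradient is 2*(0-1) + 2*(0-3) = -8
sgd([w], lr=0.1, batch_size=2)   # w <- 0 - 0.1 * (-8) / 2
print(w)                         # w moved from 0.0 to 0.4
```

Under this assumption, summing the per-sample losses in `train_ch3` and dividing by `batch_size` inside the update is equivalent to taking a gradient step on the mean loss.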

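Model prediction

Now that the model is trained, we can use it to make predictions and see how accurate it really is. This demonstrates how to classify images: given a series of images (the third row of the output, showing the images), we compare their true labels (first line of text) with the model's predictions (second line of text).

X, y = next(iter(test_iter))  # use the built-in next(); Python 3 iterators have no .next() method

true_labels = d2l.get_fashion_mnist_labels(y.numpy())  # true labels
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())  # predicted labels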
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])

5. A concise implementation with PyTorch

import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

# Set the batch size and load the data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, root='/home/kesci/input/FashionMNIST2065')

# Define the network model (i.e. the regression model)
num_inputs = 784    # 28×28
num_outputs = 10    # 10 classes of images

class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    def forward(self, x): # shape of x: (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
    
# net = LinearNet(num_inputs, num_outputs)

class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # shape of x: (batch, *, *, ...)
        return x.view(x.shape[0], -1)

from collections import OrderedDict
net = nn.Sequential(
        # FlattenLayer(),
        # LinearNet(num_inputs, num_outputs) 
        OrderedDict([
           ('flatten', FlattenLayer()),
           ('linear', nn.Linear(num_inputs, num_outputs))]) # our own LinearNet(num_inputs, num_outputs) would also work here
        )

# Initialize the model parameters
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

# Define the loss function
loss = nn.CrossEntropyLoss() # its signature is shown below
# class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

# Define the optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.1) # its signature is shown below
# class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

# Train
num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
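Note that the `nn.Sequential` model above has no softmax layer. That works because `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally. The sketch below, using made-up logits and labels, checks this against a manual softmax plus cross-entropy.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)        # made-up raw outputs: 4 samples, 10 classes
y = torch.tensor([3, 0, 9, 1])     # made-up labels

# Built-in loss: applies log-softmax internally and averages over the batch
builtin = nn.CrossEntropyLoss()(logits, y)

# Manual version: softmax, pick the correct-class probability, -log, then mean
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs.gather(1, y.view(-1, 1))).mean()

print(builtin.item(), manual.item())
print(torch.allclose(builtin, manual, atol=1e-5))  # the two agree
```

One consequence worth keeping in mind: with the default `reduction='mean'` the loss is already a scalar batch average, so the `.sum()` inside `train_ch3` leaves it unchanged. If `d2l.train_ch3` matches the version shown in section 4, the printed `train_l_sum / n` would then be roughly the mean loss divided by the batch size, and so not directly comparable with the from-scratch numbers.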
