
PyTorch Learning Notes (4): Training a Classifier


Training a Classifier

1. Data

When dealing with image, text, audio, or video data, you can use the standard Python packages that load the data into a NumPy array, and then convert that array into a torch.*Tensor (a short sketch of this workflow follows the list below):

  • For images, packages such as Pillow and OpenCV are useful
  • For audio, packages such as SciPy and librosa
  • For text, raw Python or Cython based loading, or NLTK and SpaCy, are useful
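As a minimal sketch of that workflow (the file name sample.png is a placeholder of my own, not from the tutorial), an image can be loaded with Pillow, converted to a NumPy array, and then turned into a tensor in the (C, H, W) layout PyTorch expects:

from PIL import Image
import numpy as np
import torch

# load an image with Pillow (sample.png is a hypothetical file)
img = Image.open('sample.png').convert('RGB')

# NumPy array of shape (H, W, C) with values in [0, 255]
arr = np.array(img)

# float tensor in (C, H, W) layout, scaled to [0, 1]
tensor = torch.from_numpy(arr).permute(2, 0, 1).float() / 255.0
print(tensor.shape, tensor.min().item(), tensor.max().item())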

Specifically for vision, there is a package called torchvision, which has data loaders for common datasets (such as ImageNet, CIFAR10, MNIST, etc.) and data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.

We will use the CIFAR10 dataset. It has the classes: "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck". The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.

Dataset source: CIFAR-10 and CIFAR-100 datasets

[Figure: sample CIFAR-10 images for each of the ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck]

2. Training a Classifier

We will do the following steps, in order:

  1. Load and normalize the CIFAR10 training and test datasets using torchvision
  2. Define a convolutional neural network
  3. Define a loss function
  4. Train the network on the training data
  5. Test the network on the test data

2.1. Load and Normalize the Data

Using torchvision, it is extremely easy to load CIFAR10:

In[1]:
import torch
import torchvision
import torchvision.transforms as transforms

The output of torchvision datasets are PILImage images in the range [0, 1]. We transform them to tensors in the normalized range [-1, 1]:

In[2]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=0)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=0)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Files already downloaded and verified
Files already downloaded and verified
  • Note: if you are running on Windows and get a BrokenPipeError, try setting num_workers of torch.utils.data.DataLoader() to 0. The official example uses num_workers=2.
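As a quick sanity check (a small sketch of my own, not part of the original tutorial), you can pull one batch from trainloader and confirm that, after Normalize with per-channel mean 0.5 and std 0.5, the pixel values indeed fall in [-1, 1]:

# fetch one batch and inspect the value range after normalization
images, labels = next(iter(trainloader))
print(images.shape)                               # torch.Size([4, 3, 32, 32])
print(images.min().item(), images.max().item())   # both within [-1, 1]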

Let's display some of the training images:

In[3]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))
dog   frog  dog   cat  

2.2. Define a Convolutional Neural Network

In[4]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
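To see where the 16 * 5 * 5 input size of fc1 comes from, here is a small shape trace with a dummy input (my own sketch, not part of the tutorial): each 5x5 convolution shrinks 32 -> 28 and 14 -> 10, and each 2x2 max-pool halves the spatial size, leaving 16 feature maps of 5x5.

# trace the feature-map sizes with a dummy batch of one 3x32x32 image
x = torch.randn(1, 3, 32, 32)
x = net.pool(F.relu(net.conv1(x)))    # -> torch.Size([1, 6, 14, 14])
x = net.pool(F.relu(net.conv2(x)))    # -> torch.Size([1, 16, 5, 5])
print(x.shape)                        # flattened to 16 * 5 * 5 = 400 features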

2.3. Define a Loss Function and Optimizer

Let's use a classification cross-entropy loss and SGD with momentum:

In[5]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
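As a minimal illustration of what the criterion expects (dummy tensors of my own, not from the tutorial): raw, unnormalized scores of shape (batch, 10) and integer class indices of shape (batch,):

# CrossEntropyLoss takes raw logits and integer class labels
dummy_outputs = torch.randn(4, 10)              # 4 samples, 10 class scores each
dummy_labels = torch.tensor([3, 8, 8, 0])       # e.g. cat, ship, ship, plane
print(criterion(dummy_outputs, dummy_labels))   # a scalar loss tensor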

2.4. Train the Network

This is where things start to get interesting: we simply loop over our data iterator, feed the inputs to the network, and optimize.

In[6]:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
[1,  2000] loss: 2.193
[1,  4000] loss: 1.847
[1,  6000] loss: 1.661
[1,  8000] loss: 1.569
[1, 10000] loss: 1.488
[1, 12000] loss: 1.445
[2,  2000] loss: 1.405
[2,  4000] loss: 1.355
[2,  6000] loss: 1.329
[2,  8000] loss: 1.320
[2, 10000] loss: 1.277
[2, 12000] loss: 1.250
Finished Training

Quickly save the trained model:

In[7]:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
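For a longer training run you might also want to keep the optimizer state and the epoch counter; a sketch of such a checkpoint (the dictionary keys and the file name cifar_checkpoint.pth are my own choice, not part of the original):

# save a fuller checkpoint: model weights, optimizer state, epoch counter
checkpoint = {
    'model_state_dict': net.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': 2,   # e.g. number of epochs completed so far
}
torch.save(checkpoint, './cifar_checkpoint.pth')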

2.5. Test the Network on the Test Data

Display some images from the test set:

In[8]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))
GroundTruth:  cat   ship  ship  plane

Load the saved model back:

In[9]:
net = Net()
net.load_state_dict(torch.load(PATH))
Out[9]:
<All keys matched successfully>

Use the neural network to make predictions:

In[10]:
outputs = net(images)
In[11]:
outputs
Out[11]:
tensor([[-0.4519, -2.6896,  1.1111,  2.4411, -1.2739,  0.9407,  1.2027, -0.9218,
         -0.3061, -1.4944],
        [ 4.0095,  5.7177, -1.3274, -3.2596, -4.4239, -6.4377, -5.2835, -5.2639,
          8.8550,  3.4490],
        [ 2.2643,  1.9055,  0.2977, -1.2159, -1.5517, -2.6117, -2.5904, -2.0696,
          3.1488,  0.7971],
        [ 3.6302,  0.2553,  0.3926, -1.3850,  0.2644, -2.8077, -2.8192, -1.0332,
          1.9776,  0.4094]], grad_fn=<AddmmBackward0>)

The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks the image belongs to that particular class. So let's get the index of the highest energy:

In[12]:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                              for j in range(4)))
Predicted:  cat   ship  ship  plane
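If you prefer probabilities over raw energies (a small sketch of my own, not in the original tutorial), a softmax over the class dimension turns each row of outputs into values that sum to 1:

# convert the raw class energies into probabilities
probs = F.softmax(outputs, dim=1)
top_prob, top_class = torch.max(probs, 1)
print(top_class)   # same indices as `predicted` above
print(top_prob)    # the network's confidence for each prediction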

The results look pretty good.

Let's look at how the network performs on the whole test set:

In[13]:
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
Accuracy of the network on the 10000 test images: 56 %

That looks much better than chance, which would be 10% accuracy (randomly picking one of the 10 classes), so it seems the network learned something.

Let's see which classes the classifier performed well on, and which it did not:

In[14]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')
Accuracy for class: plane is 65.5 %
Accuracy for class: car   is 67.1 %
Accuracy for class: bird  is 30.4 %
Accuracy for class: cat   is 53.5 %
Accuracy for class: deer  is 44.2 %
Accuracy for class: dog   is 35.9 %
Accuracy for class: frog  is 68.2 %
Accuracy for class: horse is 70.3 %
Accuracy for class: ship  is 68.9 %
Accuracy for class: truck is 60.4 %

2.6. Training on GPU

If CUDA is available, first define our device as the first visible CUDA device:

In[15]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)
cuda:0

These methods will then recursively go over all modules and convert their parameters and buffers to CUDA tensors:

In[16]:
net.to(device)
Out[16]:
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

You also have to send the inputs and targets at every step to the GPU:

In[17]:
inputs, labels = data[0].to(device), data[1].to(device)
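Putting the two .to(device) calls together, the training loop from section 2.4 only changes in the line that unpacks the batch; a sketch assuming the same net, criterion, and optimizer as above:

for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # move the batch to the same device as the model
        inputs, labels = data[0].to(device), data[1].to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()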

3. References

[1] Training a Classifier — PyTorch Tutorials 1.10.1+cu102 documentation

[2] 訓練分類器 (Training a Classifier, a Chinese translation of the tutorial)