
Using transforms, Dataset, and DataLoader to process image data and build your own dataset

Tags: python, machine learning, computer vision, deep learning, pytorch

1. torchvision.transforms

In CV tasks, these transforms can be used for image preprocessing, data augmentation, and similar operations.

1.1 Transforms on Image

import torchvision.transforms as transforms
from PIL import Image

img = Image.open('lena.png')
img = img.convert("RGB")
img


width, height = img.size
print(width, height)
132 193

1.1.1 transforms.Resize

Resizes the given image to the given size.

size = (100, 100)
transform = transforms.Resize(size=size)
resize_img = transform(img)
resize_img


1.1.2 transforms.CenterCrop

Crops the given image at its center.

size = (100, 100)
transform = transforms.CenterCrop(size=size)
centercrop_img = transform(img)
centercrop_img


1.1.3 transforms.RandomCrop

Crops the given image at a random location.

size = (100, 100)
transform = transforms.RandomCrop(size=size)
randomcrop_img = transform(img)
randomcrop_img


1.1.4 transforms.RandomHorizontalFlip(p)

Horizontally flips the given image with probability p.

transform = transforms.RandomHorizontalFlip(p=0.5)
rpf_img = transform(img)
rpf_img


1.1.5 transforms.RandomVerticalFlip(p)

Vertically flips the given image with probability p.

transform = transforms.RandomVerticalFlip(p=0.5)
rvf_img = transform(img)
rvf_img


1.1.6 transforms.ColorJitter

Randomly changes the brightness, contrast, saturation, and hue of an image; commonly used for data augmentation.

brightness = (1, 10)
contrast = (1, 10)
saturation = (1, 10)
hue = (0.2, 0.4)
transform = transforms.ColorJitter(brightness, contrast, saturation, hue)
colorjitter_img = transform(img)
colorjitter_img


1.1.7 transforms.Grayscale

Converts an image to grayscale.

transform = transforms.Grayscale()
gray_img = transform(img)
gray_img


1.1.8 transforms.RandomGrayscale

Converts an image to grayscale with probability p.

transform = transforms.RandomGrayscale(p=0.5)
rg_img = transform(img)
rg_img


1.2 Transforms on Tensor

1.2.1 transforms.ToTensor()

Converts a PIL Image to a Tensor, rearranging the layout to C x H x W and scaling pixel values from [0, 255] to [0.0, 1.0].

transform = transforms.ToTensor()
tensor_img = transform(img)
tensor_img
tensor([[[0.7176, 0.7294, 0.7255,  ..., 0.6627, 0.6549, 0.6627],
         [0.7137, 0.7176, 0.7176,  ..., 0.6510, 0.6510, 0.6549],
         [0.7137, 0.7176, 0.7137,  ..., 0.6392, 0.6431, 0.6353],
         ...,
         [0.9922, 1.0000, 0.9725,  ..., 0.6863, 0.6902, 0.7059],
         [1.0000, 1.0000, 0.9961,  ..., 0.6745, 0.6824, 0.6902],
         [1.0000, 0.9961, 0.9882,  ..., 0.6745, 0.6745, 0.6863]],

        [[0.3843, 0.3922, 0.3922,  ..., 0.3529, 0.3451, 0.3529],
         [0.3765, 0.3804, 0.3804,  ..., 0.3412, 0.3412, 0.3412],
         [0.3765, 0.3804, 0.3804,  ..., 0.3294, 0.3412, 0.3333],
         ...,
         [0.8745, 0.8941, 0.8863,  ..., 0.3294, 0.3490, 0.3647],
         [0.9098, 0.9176, 0.9176,  ..., 0.3216, 0.3373, 0.3490],
         [0.9294, 0.9255, 0.9255,  ..., 0.3216, 0.3294, 0.3412]],

        [[0.2745, 0.2863, 0.2784,  ..., 0.2353, 0.2235, 0.2353],
         [0.2784, 0.2745, 0.2745,  ..., 0.2353, 0.2353, 0.2314],
         [0.2784, 0.2745, 0.2706,  ..., 0.2275, 0.2392, 0.2353],
         ...,
         [0.8706, 0.8824, 0.8627,  ..., 0.2510, 0.2706, 0.2863],
         [0.9216, 0.9176, 0.9059,  ..., 0.2392, 0.2588, 0.2706],
         [0.9451, 0.9333, 0.9255,  ..., 0.2392, 0.2510, 0.2588]]])

1.2.2 transforms.Normalize

Normalizes a tensor image channel by channel:

input[channel] = (input[channel] - mean[channel]) / std[channel]

transform = transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
img_normal = transform(tensor_img)
img_normal
tensor([[[ 0.4353,  0.4588,  0.4510,  ...,  0.3255,  0.3098,  0.3255],
         [ 0.4275,  0.4353,  0.4353,  ...,  0.3020,  0.3020,  0.3098],
         [ 0.4275,  0.4353,  0.4275,  ...,  0.2784,  0.2863,  0.2706],
         ...,
         [ 0.9843,  1.0000,  0.9451,  ...,  0.3725,  0.3804,  0.4118],
         [ 1.0000,  1.0000,  0.9922,  ...,  0.3490,  0.3647,  0.3804],
         [ 1.0000,  0.9922,  0.9765,  ...,  0.3490,  0.3490,  0.3725]],

        [[-0.2314, -0.2157, -0.2157,  ..., -0.2941, -0.3098, -0.2941],
         [-0.2471, -0.2392, -0.2392,  ..., -0.3176, -0.3176, -0.3176],
         [-0.2471, -0.2392, -0.2392,  ..., -0.3412, -0.3176, -0.3333],
         ...,
         [ 0.7490,  0.7882,  0.7725,  ..., -0.3412, -0.3020, -0.2706],
         [ 0.8196,  0.8353,  0.8353,  ..., -0.3569, -0.3255, -0.3020],
         [ 0.8588,  0.8510,  0.8510,  ..., -0.3569, -0.3412, -0.3176]],

        [[-0.4510, -0.4275, -0.4431,  ..., -0.5294, -0.5529, -0.5294],
         [-0.4431, -0.4510, -0.4510,  ..., -0.5294, -0.5294, -0.5373],
         [-0.4431, -0.4510, -0.4588,  ..., -0.5451, -0.5216, -0.5294],
         ...,
         [ 0.7412,  0.7647,  0.7255,  ..., -0.4980, -0.4588, -0.4275],
         [ 0.8431,  0.8353,  0.8118,  ..., -0.5216, -0.4824, -0.4588],
         [ 0.8902,  0.8667,  0.8510,  ..., -0.5216, -0.4980, -0.4824]]])

1.2.3 transforms.Compose

Chains multiple transforms together and applies them in order.

img = Image.open('lena.png')
img = img.convert('RGB')

transform = transforms.Compose([
    transforms.Resize(100),
    transforms.RandomHorizontalFlip(),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((.5, .5, .5), (.5, .5, .5))
])

img_compose = transform(img)
img_compose.size()
torch.Size([3, 64, 64])

2. torchvision.datasets

torchvision.datasets provides common datasets ready for loading. CIFAR-10 is used as an example below; the transform argument applies the preprocessing covered above.

import torchvision

trainset = torchvision.datasets.CIFAR10(
    root='./dataset',  # where the dataset is downloaded to
    train=True,        # True builds the training set; False builds the test set
    download=True,     # if True, download from the Internet unless already present
    transform=None     # preprocessing applied to each sample; None means no preprocessing
)

3. torch.utils.data.DataLoader

import torch
from torch.utils.data.sampler import SubsetRandomSampler

trainloader = torch.utils.data.DataLoader(
    dataset=trainset,  # a torch.utils.data.Dataset object, e.g. one from torchvision.datasets
    batch_size=1,      # number of samples per batch
    shuffle=False,     # whether to reshuffle the data at every epoch
    sampler=SubsetRandomSampler(indices=list(range(10))),  # draw samples by the given indices (e.g. the first 10); shuffle must be False when a sampler is set
    drop_last=False,   # if the dataset size is not divisible by batch_size: False keeps a smaller final batch, True drops it
    num_workers=0      # number of subprocesses used for data loading
)
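The sampler/batching behavior can be seen end to end on a toy TensorDataset (a synthetic stand-in, not the CIFAR-10 set above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# 10 toy samples: features 0..9 with matching labels.
data = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
dataset = TensorDataset(data, labels)

# Draw only the even indices, in random order; shuffle stays False
# because the sampler already controls the ordering.
loader = DataLoader(dataset,
                    batch_size=2,
                    shuffle=False,
                    sampler=SubsetRandomSampler([0, 2, 4, 6, 8]))

seen = [int(y) for _, ys in loader for y in ys]
print(sorted(seen))  # [0, 2, 4, 6, 8] -- exactly the sampled subset
```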


4. torch.utils.data.Dataset

from torch.utils.data.dataset import Dataset


# Basic skeleton
class CustomDataset(Dataset):
    def __init__(self):
        """
        Initialization goes here.
        """
        # TODO
        # 1. Initialize file paths or a list of file names.
        pass

    def __getitem__(self, index):
        """
        Returns one data sample and its label; invoked by indexing:
        img, label = dataset[index]
        """
        # TODO
        # 1. Read one sample from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass

    def __len__(self):
        """
        Returns the total number of samples.
        """
        return 9  # replace 9 with the actual size of your dataset
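Filling the skeleton with a toy example makes the protocol concrete (`SquaresDataset` is a hypothetical name, just for illustration):

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Toy dataset where sample i is the pair (i, i*i)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Called by dataset[index] and by DataLoader workers.
        return torch.tensor(index), torch.tensor(index * index)

    def __len__(self):
        return self.n

ds = SquaresDataset(5)
x, y = ds[3]                        # indexing goes through __getitem__
print(len(ds), x.item(), y.item())  # 5 3 9
```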

Suppose we now have an image-classification problem where the data is organized into one training folder and one test folder, with 6 classes and many images inside each class folder.

How to build a custom Dataset

  1. Create one DataFrame for the training set and one for the test set. Each DataFrame has two columns: one for the image file name ('Images') and another for the label ('labels'):

     Images   labels
     0.jpg    0
     99.jpg   5

  2. Build the custom Dataset:
import os
import torch
from PIL import Image
from torch.utils.data.dataset import Dataset


class INTELDataset(Dataset):
    def __init__(self, img_data, img_path, transform=None):
        self.img_path = img_path   # root directory of the images
        self.transform = transform
        self.img_data = img_data   # the DataFrame built in step 1

    def __getitem__(self, index):
        # Full path: <root>/<label folder>/<image name>
        img_name = os.path.join(self.img_path,
                                str(self.img_data.loc[index, 'labels']),
                                self.img_data.loc[index, 'Images'])
        image = Image.open(img_name)   # load the image
        image = image.convert('RGB')
        label = torch.tensor(self.img_data.loc[index, 'labels'])  # the label
        if self.transform is not None:
            image = self.transform(image)
        return image, label

    def __len__(self):
        return len(self.img_data)   # dataset size
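Step 1 above has no accompanying code; one way to build the DataFrame, assuming the class folders are named by their numeric labels (`build_dataframe` and the paths are hypothetical names, not from the original post):

```python
import os
import pandas as pd

def build_dataframe(root):
    """Scan root/<label>/<image> and collect (Images, labels) rows."""
    rows = []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            rows.append({'Images': name, 'labels': int(label)})
    return pd.DataFrame(rows)

# Usage sketch (paths and transform are hypothetical):
# train_df = build_dataframe('./seg_train')
# trainset = INTELDataset(train_df, './seg_train', transform=transforms.ToTensor())
```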