Building your own CIFAR10-like dataset in the Python-version format

阿新 • Published: 2022-05-08

The data format of the Python version of CIFAR10 is already described on the official site:

data – a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels – a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
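
To make the channel-plane layout described above concrete, here is a minimal sketch that converts one 3072-value row into a 32x32x3 image and back. The function names `row_to_image` and `image_to_row` are hypothetical helpers introduced for illustration, and the example row is synthetic rather than real CIFAR10 data.

```python
import numpy as np


def row_to_image(row):
    """Turn one 3072-value row into a (32, 32, 3) height x width x channel image."""
    row = np.asarray(row, dtype=np.uint8)
    # The row is laid out as [red plane | green plane | blue plane],
    # and each 1024-value plane is stored in row-major order.
    return row.reshape(3, 32, 32).transpose(1, 2, 0)


def image_to_row(image):
    """Inverse of row_to_image: flatten a (32, 32, 3) image into 3072 values."""
    image = np.asarray(image, dtype=np.uint8)
    # Move channels first, then flatten plane by plane.
    return image.transpose(2, 0, 1).reshape(-1)


# Synthetic row, used only to demonstrate the index mapping.
row = (np.arange(3072) % 256).astype(np.uint8)
image = row_to_image(row)
```

With this mapping, `image[r, c, k]` corresponds to `row[k * 1024 + r * 32 + c]`, where `k` is 0, 1, 2 for red, green, blue.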

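So to build your own dataset, you only need to prepare data and labels. You can also load one of the files that CIFAR10 ships and inspect its format. Taking data_batch_1 as an example (it can be read with cifar10_read.py):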

{'data': array([[ 59,  43,  50, ..., 140,  84,  72],
       [154, 126, 105, ..., 139, 142, 144],
       [255, 253, 253, ...,  83,  83,  84],
       ..., 
       [ 71,  60,  74, ...,  68,  69,  68],
       [250, 254, 211, ..., 215, 255, 254],
       [ 62,  61,  60, ..., 130, 130, 131]], dtype=uint8), 
'labels': [6, 9, 9, 4, 1, 1, 2, 7, 8, 3, 4, 7, 7, 2, 9, 9, 9, 3, 2, 6, 4, 3, 6, 6, 2, 6, 3, 5, 4, 0, 0, 9, 1, 3, 4, 0, 3, 7, 3, 3, 5, 2, 2, 7, 1, 1, 1, 2, 2, 0, 9, 5, 7, 9, 2, 2, 5, 2, 4, 3, 1, 1, 8, 2, 1, 1, 4, 9, 7, 8, 5, 9, 6, 7, 3, 1, 9, 0, 3, 1, 3, 5, 4, 5, 7, 7,  ... , 9, 8, 9, 4, 4, 7, 1, 0, 4, 3, 6, 3, 9, 8, 3, 6, 8, 3, 6, 6, 2, 6, 7, 3, 0, 0, 0, 2, 5, 1, 2, 9, 2, 2, 1, 6, 3, 9, 1, 1, 5], 
'batch_label': 'training batch 1 of 5', 
'filenames': ['leptodactylus_pentadactylus_s_000004.png', 'camion_s_000148.png', 'tipper_truck_s_001250.png', ... , 'truck_s_000036.png', 'car_s_002296.png', 'estate_car_s_001433.png', 'cur_s_000170.png']}
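
Output like the above can be obtained by unpickling the batch file. The sketch below is not the author's cifar10_read.py; `load_batch` is a hypothetical helper, and the path in the commented usage line assumes the archive was extracted into the current directory.

```python
import pickle


def load_batch(path):
    """Load one pickled batch file and return a dict with str keys."""
    with open(path, "rb") as f:
        # encoding="bytes" lets Python 3 read files that were pickled under Python 2.
        batch = pickle.load(f, encoding="bytes")
    # Files written by Python 2 come back with bytes keys (b"data"); normalise them.
    return {
        (key.decode("utf-8") if isinstance(key, bytes) else key): value
        for key, value in batch.items()
    }


# Example usage (assumed path):
# batch = load_batch("data_batch_1")
# print(batch["data"].shape, batch["labels"][:10])
```

Only the keys are decoded here. String values such as the entries of `filenames` may still be `bytes` when the file was written by Python 2.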

Clearly, the Python version is stored as a dict whose keys include:

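data: the image data, an n x 3072 uint8 array, one row per image;
labels: the label of each image, a list of n numbers;
batch_label: a descriptive string for the batch;
filenames: the list of file names.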
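
As a rough sketch of how these four keys could be assembled and written out, assuming the images are already loaded as 32x32x3 uint8 arrays (image decoding is left out). `make_batch` and `save_batch` are hypothetical names, not the functions in the author's implementation.

```python
import pickle

import numpy as np


def make_batch(images, labels, filenames, batch_label):
    """Assemble a CIFAR10-style dict from N images of shape (32, 32, 3)."""
    images = np.asarray(images, dtype=np.uint8)
    if images.ndim != 4 or images.shape[1:] != (32, 32, 3):
        raise ValueError("expected images with shape (N, 32, 32, 3)")
    if not (len(images) == len(labels) == len(filenames)):
        raise ValueError("images, labels and filenames must have the same length")
    # (N, H, W, C) -> (N, C, H, W) -> (N, 3072): red plane, then green, then blue.
    data = images.transpose(0, 3, 1, 2).reshape(len(images), -1)
    return {
        "data": data,
        "labels": [int(label) for label in labels],
        "batch_label": batch_label,
        "filenames": list(filenames),
    }


def save_batch(batch, path):
    """Pickle the dict to disk so it can be read back like an official batch."""
    with open(path, "wb") as f:
        pickle.dump(batch, f)
```

For example, `make_batch(np.full((2, 32, 32, 3), 255, dtype=np.uint8), [0, 1], ["a.png", "b.png"], "training batch 0 of 1")` yields a dict whose `data` array has shape (2, 3072).
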
For the full details, see the implementation code. demo.py also provides test data; the result of reading the generated file is printed here:

{'data': array([[255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255]], dtype=uint8), 
'label': [0, 1], 
'batch_label': 'training batch 0 of 1', 
'filenames': ['a.png', 'b.png']}

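This matches the output format of the official data. It has not been tested in training, but in theory it should work. If you run into problems while testing it, please point them out.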
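
Since the generated file has not been run through training, a structural check against the documented format may catch problems early. The sketch below uses a hypothetical helper, `check_batch`, with a `num_classes` parameter assumed to default to the 10 classes of CIFAR10. Note that the demo output above uses the key `label`, while the official batches use `labels`; a loader that expects the official name would need the key to match.

```python
import numpy as np


def check_batch(batch, num_classes=10):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    for key in ("data", "labels", "batch_label", "filenames"):
        if key not in batch:
            problems.append(f"missing key: {key}")

    n = None
    if "data" in batch:
        data = np.asarray(batch["data"])
        if data.dtype != np.uint8:
            problems.append(f"data dtype is {data.dtype}, expected uint8")
        if data.ndim != 2 or data.shape[1] != 3072:
            problems.append(f"data shape is {data.shape}, expected (n, 3072)")
        else:
            n = data.shape[0]

    if "labels" in batch:
        labels = list(batch["labels"])
        if n is not None and len(labels) != n:
            problems.append(f"{len(labels)} labels for {n} images")
        if any(not (0 <= int(value) < num_classes) for value in labels):
            problems.append(f"labels outside the range 0..{num_classes - 1}")

    if "filenames" in batch and n is not None and len(batch["filenames"]) != n:
        problems.append(f"{len(batch['filenames'])} filenames for {n} images")
    return problems
```

Passing a dict shaped like the demo output, with `label` instead of `labels`, would report `missing key: labels`.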
