How to Get the MNIST Dataset
Author: 阿新 · Published: 2018-11-10
Downloading the MNIST dataset from the official site is painfully slow, so I uploaded a copy for anyone who needs it: MNIST dataset
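The first time I opened these files I had no idea how to read them. Fortunately it turned out to be easy. Here is the code that got me unstuck: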
```python
import numpy as np

# Training set
# Open in binary mode ('rb'): the files contain raw bytes, not text
with open('./minist_data/train-images.idx3-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
    # Skip the 16-byte header (magic number, image count, rows, columns)
    train_data = loaded[16:].reshape((60000, 784))
print(train_data.shape)  # (60000, 784)

with open('./minist_data/train-labels.idx1-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
    # Skip the 8-byte header (magic number, label count)
    train_labels = loaded[8:]
print(train_labels.shape)  # (60000,)

# Test set
with open('./minist_data/t10k-images.idx3-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
    test_data = loaded[16:].reshape((10000, 784))
print(test_data.shape)  # (10000, 784)

with open('./minist_data/t10k-labels.idx1-ubyte', 'rb') as f:
    loaded = np.fromfile(file=f, dtype=np.uint8)
    test_labels = loaded[8:]
print(test_labels.shape)  # (10000,)
```
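The code above hard-codes both the header sizes (16 bytes for image files, 8 for label files) and the sample counts. As a sketch, assuming the files follow the IDX layout described on the MNIST page (a 4-byte magic number whose first two bytes are zero, whose third byte is the data-type code, 0x08 for unsigned bytes, and whose fourth byte is the number of dimensions, followed by one big-endian 32-bit size per dimension), the shape can be read from the header instead. `load_idx` is a hypothetical helper name, not part of NumPy or any other library.

```python
import struct
import numpy as np

def load_idx(path):
    """Read an unsigned-byte IDX file and return an array shaped by its header."""
    with open(path, 'rb') as f:
        # Magic number: two zero bytes, a data-type code, and the number of dimensions
        zero, dtype_code, ndim = struct.unpack('>HBB', f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError(f'{path}: not an unsigned-byte IDX file')
        # One big-endian unsigned 32-bit size per dimension
        shape = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
        # Everything after the header is the pixel or label data
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)

# Usage with the same files as above (paths taken from the original code):
# train_data = load_idx('./minist_data/train-images.idx3-ubyte').reshape(-1, 784)
# train_labels = load_idx('./minist_data/train-labels.idx1-ubyte')
```

For an image file the header has 3 dimensions, so it is 4 + 3 × 4 = 16 bytes; for a label file it has 1 dimension, so 4 + 4 = 8 bytes, which matches the offsets used above. Note that `np.frombuffer` returns a read-only array; call `.copy()` on the result if you need to modify it in place.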
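The printed shapes show that the training set has 60,000 samples with 784 features each (one 28×28 grayscale image flattened into a row), and the test set has 10,000 samples of the same size.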
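Each of the 784 features is one pixel of a 28×28 image, and the pixels are stored row by row, so feature `r * 28 + c` is pixel `(r, c)`. A minimal sketch of turning the flat rows back into images; `to_images` is a hypothetical helper and `demo` is made-up stand-in data, not real MNIST samples.

```python
import numpy as np

def to_images(flat, rows=28, cols=28):
    """Reshape (n, rows*cols) pixel rows into (n, rows, cols) images."""
    n, features = flat.shape
    if features != rows * cols:
        raise ValueError(f'expected {rows * cols} features per sample, got {features}')
    # Pixels are stored row by row, so feature r*cols + c becomes pixel (r, c)
    return flat.reshape(n, rows, cols)

# Made-up stand-in for train_data: 2 samples whose pixel values are 0, 1, 2, ... (mod 256)
demo = (np.arange(2 * 784) % 256).astype(np.uint8).reshape(2, 784)
images = to_images(demo)
print(images.shape)     # (2, 28, 28)
print(images[0, 1, 0])  # 28, i.e. feature 1*28 + 0 of the first sample
```

With the arrays loaded earlier, `to_images(train_data)` would give a `(60000, 28, 28)` array and `to_images(test_data)` a `(10000, 28, 28)` array.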