[NumPy] Saving and Loading Files

阿新 • Published: 2020-11-24

import numpy as np

Binary files

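save(), savez() and load() store and read data in NumPy's own binary formats (npy, npz). The three functions handle metadata such as the number of dimensions, dtype and shape automatically, which makes reading and writing arrays very convenient. The files written by save() are, however, hard to use from programs written in other languages.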

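npy format: stores a single array in binary form. The file starts with a short header that records the array's metadata (dtype, shape and so on) as text, followed by the raw data. The contents can be inspected with any binary viewer.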
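As a rough illustration of that header, the sketch below saves a small array into an in-memory buffer and decodes the leading bytes by hand. It assumes format version 1.0, which NumPy normally picks for small arrays; the variable names are only illustrative.

```python
import io
import numpy as np

# Save into an in-memory buffer so no file on disk is needed.
buf = io.BytesIO()
np.save(buf, np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64))
raw = buf.getvalue()

# Bytes 0-5 are the magic string, bytes 6-7 the format version.
magic = raw[:6]
major, minor = raw[6], raw[7]

# Assumption: version 1.0, where the header length is a 2-byte
# little-endian integer stored right after the version bytes.
header_len = int.from_bytes(raw[8:10], "little")
header_text = raw[10:10 + header_len].decode("latin1")

print(magic)                # b'\x93NUMPY'
print(major, minor)
print(header_text.strip())  # a dict literal with descr, fortran_order, shape
```

In everyday code np.load() does this parsing itself; reading the bytes manually is only useful for seeing what the file contains.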

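npz format: packs several arrays into a single ZIP archive, which can be unpacked with any archive tool. Note that savez() stores the arrays without compression; savez_compressed() writes a compressed archive.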
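The difference between a plain and a compressed archive can be checked with a small experiment. This is a minimal sketch that writes both variants to in-memory buffers; the array of zeros is chosen only because it compresses well.

```python
import io
import zipfile
import numpy as np

x = np.zeros(10000)  # highly compressible test data

plain = io.BytesIO()
np.savez(plain, x=x)              # ZIP archive, members stored as-is
packed = io.BytesIO()
np.savez_compressed(packed, x=x)  # ZIP archive, members deflated

plain_size = len(plain.getvalue())
packed_size = len(packed.getvalue())

# Inspect how the member of the plain archive was stored.
with zipfile.ZipFile(plain) as zf:
    stored_uncompressed = zf.infolist()[0].compress_type == zipfile.ZIP_STORED

print(plain_size, packed_size, stored_uncompressed)
```

Both archives are read back with the same np.load() call, so the choice only affects file size and write/read time.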

numpy.save(file, arr, allow_pickle=True, fix_imports=True)
Save an array to a binary file in NumPy .npy format.

numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')
Load arrays or pickled objects from .npy, .npz or pickled files.

outfile = r"./data/numpy_task001_1.npy"
x = np.array([[1,2,3],[4,5,6]])
print(x)
np.save(outfile, x)  # arguments: output path, array to save
y = np.load(outfile)
print(y)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
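The allow_pickle parameter in the two signatures matters once an array holds arbitrary Python objects. The sketch below, using an illustrative object array, shows that save() accepts it with its default settings while load() refuses to read it back unless allow_pickle=True is passed explicitly.

```python
import io
import numpy as np

# An object array holding arbitrary Python objects (illustrative data).
obj = np.empty(2, dtype=object)
obj[0] = {"a": 1}
obj[1] = [1, 2, 3]

buf = io.BytesIO()
np.save(buf, obj)  # allow_pickle defaults to True when saving

# Loading with the default allow_pickle=False is rejected.
buf.seek(0)
try:
    np.load(buf)
    refused = False
except ValueError:
    refused = True

# Opting in explicitly restores the objects.
buf.seek(0)
restored = np.load(buf, allow_pickle=True)

print(refused)
print(restored)
```

Unpickling can execute arbitrary code, so allow_pickle=True is best reserved for files from a trusted source.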

outfile = r"./data/numpy_task001_2.npz"
x = np.linspace(0, np.pi, 5)
y = np.sin(x)
z = np.cos(x)
np.savez(outfile, x, y, z_d=z)
data = np.load(outfile)
np.set_printoptions(suppress=True)  # print small values without scientific notation
print(data.files)

['z_d', 'arr_0', 'arr_1']

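The first argument of savez() is the file name; all following arguments are the arrays to save. Keyword arguments give an array a name, while arrays passed positionally are named arr_0, arr_1, … automatically.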

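savez() writes a ZIP archive (file extension npz) in which every member is an npy file as written by save(), named after the corresponding array. load() recognizes npz files automatically and returns a dictionary-like object, from which each array can be retrieved using its name as the key.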

print(data['arr_0'], data['arr_1'], data['z_d'], sep='\n')

[0.         0.78539816 1.57079633 2.35619449 3.14159265]
[0.         0.70710678 1.         0.70710678 0.        ]
[ 1.          0.70710678  0.         -0.70710678 -1.        ]

Unpacking the archive yields one file per array: arr_0.npy, arr_1.npy and z_d.npy.
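The member names can also be listed without unpacking anything by hand. The following sketch rebuilds the same three arrays in a temporary directory (the file name demo.npz is hypothetical) and reads the archive with both zipfile and np.load().

```python
import os
import tempfile
import zipfile
import numpy as np

x = np.linspace(0, np.pi, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")  # hypothetical file name
    np.savez(path, x, np.sin(x), z_d=np.cos(x))

    # An npz file is an ordinary ZIP archive of .npy members.
    with zipfile.ZipFile(path) as zf:
        members = sorted(zf.namelist())

    # Using np.load() as a context manager closes the file afterwards.
    with np.load(path) as data:
        keys = sorted(data.files)
        z = data["z_d"]  # the array is read into memory here

print(members)  # ['arr_0.npy', 'arr_1.npy', 'z_d.npy']
print(keys)     # ['arr_0', 'arr_1', 'z_d']
```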

Text files

savetxt(), loadtxt() and genfromtxt() write and read text files (TXT, CSV and so on). genfromtxt() is more powerful than loadtxt() and can handle missing data.

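numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
Save an array to a text file.
fname: path of the output file.
X: the array to write to the file.
fmt: format string applied to each element; the default '%.18e' is scientific notation with 18 digits after the decimal point.
delimiter: string separating the columns; defaults to a single space.

numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None)
Load data from a text file.
fname: path of the input file.
dtype: data type of the resulting array; defaults to float.
comments: a string or list of strings marking the start of a comment; defaults to '#'.
skiprows: number of lines to skip at the start of the file, typically 1 to skip a header row.
usecols: tuple of zero-based column indices selecting which columns to read.
unpack: if True, the result is transposed so that each column can be assigned to its own variable.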
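The header and comments parameters in the savetxt() signature work together with the comments parameter of loadtxt(). As a minimal sketch with made-up data and an in-memory text buffer, a header written by savetxt() is prefixed with '# ' and is therefore skipped again on loading, without any skiprows argument.

```python
import io
import numpy as np

data = np.array([[1, 123, 1.0], [2, 110, 0.5]])  # illustrative values

buf = io.StringIO()
# The header line is written with the default comment prefix '# '.
np.savetxt(buf, data, fmt="%g", delimiter=",", header="id,value1,value2")
text = buf.getvalue()

# loadtxt() treats lines starting with '#' as comments and ignores them.
buf.seek(0)
loaded = np.loadtxt(buf, delimiter=",")

print(text)
print(loaded)
```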

outfile = r"./data/numpy_task001_1.txt"
x = np.array([[1,2,3],[4,5,6]])
np.savetxt(outfile, x)
y = np.loadtxt(outfile)
print(x, x.dtype, sep='\n', end='\n\n')
print(y, y.dtype, sep='\n')

[[1 2 3]
 [4 5 6]]
int32

[[1. 2. 3.]
 [4. 5. 6.]]
float64

Contents of numpy_task001_1.txt:

1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00

outfile = r"./data/numpy_task001_2.csv"
x = np.array([[1.1,2.2,3.3],[4.4,5.5,6.6]], dtype=np.float32)
np.savetxt(outfile, x, fmt='%d', delimiter=',')  # write as integers (fractional part is truncated), separated by commas
y = np.loadtxt(outfile, delimiter=',', dtype=np.int32)
print(x, x.dtype, sep='\n', end='\n\n')
print(y, y.dtype, sep='\n')

[[1.1 2.2 3.3]
 [4.4 5.5 6.6]]
float32

[[1 2 3]
 [4 5 6]]
int32

Contents of numpy_task001_2.csv:

1,2,3
4,5,6
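fmt does not have to be a single format string: savetxt() also accepts one format per column. The sketch below uses illustrative values to show that '%d' drops the fractional part rather than rounding, while the second column keeps one decimal place.

```python
import io
import numpy as np

x = np.array([[1.9, 2.26], [4.4, 5.5]])  # illustrative values

buf = io.StringIO()
# One format per column: integer for column 0, one decimal for column 1.
np.savetxt(buf, x, fmt=["%d", "%.1f"], delimiter=",")
lines = buf.getvalue().splitlines()

print(lines)  # ['1,2.3', '4,5.5']
```

If rounding is wanted instead of truncation, the array can be rounded (for example with np.rint) before it is written.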

Create a CSV file with the following contents:

id,value1,value2,value3
1,123,1.0,23
2,110,0.5,18
3,164,2.1,19

outfile = r'./data/numpy_task001_data.csv'
x = np.loadtxt(outfile, delimiter=',', skiprows=1)  # skip the header row
print(x)

[[  1.  123.    1.   23. ]
 [  2.  110.    0.5  18. ]
 [  3.  164.    2.1  19. ]]

x = np.loadtxt(outfile, delimiter=',', skiprows=1, usecols=(1,2))
print(x)

[[123.    1. ]
 [110.    0.5]
 [164.    2.1]]

x1, x2 = np.loadtxt(outfile, delimiter=',', skiprows=1, usecols=(1,2), unpack=True)
print(x1, x2)

[123. 110. 164.] [1.  0.5 2.1]

genfromtxt() is geared toward structured arrays and the handling of missing data.

numpy.genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=''.join(sorted(NameValidator.defaultdeletechars)), replace_space='_', autostrip=False, case_sensitive=True, defaultfmt="f%i", unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')
Load data from a text file, with missing values handled as specified.
names: when set to True, the first line read is used as the column names.

outfile = r'./data/numpy_task001_data.csv'
x = np.genfromtxt(outfile, delimiter=',', names=True)
print(x)  # x is a <class 'numpy.ndarray'> (a structured array)
print(x.dtype)

[(1., 123., 1. , 23.) (2., 110., 0.5, 18.) (3., 164., 2.1, 19.)]
[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')]

print(x['id'])
print(x['value1'])
print(x['value2']) 
print(x['value3'])

[1. 2. 3.]
[123. 110. 164.]
[1.  0.5 2.1]
[23. 18. 19.]
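With the default dtype=float every column becomes a float field. As a hedged variation on the example above, passing dtype=None asks genfromtxt() to infer a type for each column separately; the CSV text is inlined here through io.StringIO so that the sketch does not depend on a file.

```python
import io
import numpy as np

csv_text = """id,value1,value2,value3
1,123,1.0,23
2,110,0.5,18
3,164,2.1,19
"""

# dtype=None lets genfromtxt() choose a type per column.
typed = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                      names=True, dtype=None, encoding="utf-8")

print(typed.dtype)      # integer fields for id, value1, value3; float for value2
print(typed["value1"])
```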

Now modify the CSV file so that it contains empty fields:

id,value1,value2,value3
1,,1.0,23
2,110,,18
3,164,2.1,

x = np.genfromtxt(outfile, delimiter=',', names=True)
print(x['id'])
print(x['value1'])
print(x['value2']) 
print(x['value3'])

[1. 2. 3.]
[ nan 110. 164.]
[1.  nan 2.1]
[23. 18. nan]
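Once the gaps have become nan, there are two common ways to deal with them. The sketch below, again with the CSV text inlined via io.StringIO, either uses a nan-aware reduction such as np.nanmean() or replaces missing fields while reading through the filling_values parameter from the signature above. The fill value 0 is an arbitrary choice for illustration.

```python
import io
import numpy as np

csv_text = """id,value1,value2,value3
1,,1.0,23
2,110,,18
3,164,2.1,
"""

# Option 1: keep nan and use a nan-aware reduction.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)
mean_value1 = np.nanmean(raw["value1"])  # mean of 110 and 164

# Option 2: substitute a value for empty fields while reading.
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                       names=True, filling_values=0)

print(mean_value1)       # 137.0
print(filled["value1"])
```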

Text formatting options

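numpy.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None, sign=None, floatmode=None, **kwarg)
Set printing options.
- precision: number of digits printed after the decimal point for floating point values; default 8.
- threshold: total number of array elements above which the output is summarized with "..." instead of printed in full; default 1000.
- linewidth: number of characters per line after which a line break is inserted; default 75.
- suppress: if True, small floating point values are printed in fixed-point notation instead of scientific notation; default False.
- nanstr: string representation of floating point not-a-number; default nan.
- infstr: string representation of floating point infinity; default inf.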
These options determine the way floating point numbers, arrays and other NumPy objects are displayed.

np.set_printoptions(precision=4)
x = np.array([1.112233445566])
print(x)

[1.1122]

np.set_printoptions(threshold=2000)
x = np.ones((50, 50))
print(x)  # x has 2500 elements, more than the threshold of 2000, so the output is summarized

[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]

print(np.get_printoptions())  # return the current print options

{'edgeitems': 3, 'threshold': 2000, 'floatmode': 'maxprec', 'precision': 4, 'suppress': True, 'linewidth': 75, 'nanstr': 'nan', 'infstr': 'inf', 'sign': '-', 'formatter': None, 'legacy': False}
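set_printoptions() changes the settings globally, which is why the dictionary above still carries the values set in earlier examples. When an option is only needed for a few lines, one alternative is the np.printoptions() context manager (available in NumPy 1.15 and later), sketched here with an illustrative array.

```python
import numpy as np

x = np.array([1.112233445566, 1e-10])  # illustrative values

before = np.get_printoptions()["precision"]

# The options apply only inside the with-block.
with np.printoptions(precision=3, suppress=True):
    inside = str(x)

after = np.get_printoptions()["precision"]

print(inside)
print(before == after)  # True: the previous setting is restored
```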
