Python: Saving Objects & Model Files
阿新 • Published: 2018-11-02
1. Saving variables
1.1 pickle
The pickle library can store several variables in a single .pickle file. This approach works well when there are not many variables to save.
import pickle
# obj0, obj1, obj2 are created here...
obj0, obj1, obj2 = [1, 2], [2, 3], [3, 4]
# Saving the objects:
# Tip: passing protocol=-1 to dump() can reduce the file size
with open('test.pickle', 'wb') as f:  # Python 3: open(..., 'wb')
    pickle.dump([obj0, obj1, obj2], f)
# The with block closes the file automatically
# Getting back the objects:
with open('test.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x0, x1, x2 = pickle.load(f)
print(x0)
# [1, 2]
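The comment in the snippet above mentions protocol=-1, but the call to dump() never passes it. As a rough sketch of the effect, the following compares the serialized size of the same list under protocol 0 and under protocol=-1 (which selects pickle.HIGHEST_PROTOCOL). The sample list `values` is an assumption made only for illustration, and the sizes are measured in memory with pickle.dumps() rather than on disk.

```python
import pickle

# Hypothetical sample data, chosen only to make the size difference visible
values = list(range(1000))

# Protocol 0 is the original, human-readable ASCII format
size_text = len(pickle.dumps(values, protocol=0))

# protocol=-1 selects pickle.HIGHEST_PROTOCOL, a compact binary format
size_binary = len(pickle.dumps(values, protocol=-1))

print(size_text, size_binary)

# The same argument works when writing to a file:
#     pickle.dump(values, f, protocol=-1)
restored = pickle.loads(pickle.dumps(values, protocol=-1))
```

Python 3's default protocol is already a binary one, so the saving relative to the default is smaller than the saving relative to protocol 0.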
1.2 cPickle
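cPickle is faster and otherwise virtually identical to pickle. In Python 3, cPickle became _pickle, and the pickle module already uses this C implementation automatically when it is available. Details: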
Docstring: Optimized C implementation for the Python pickle module.
import _pickle as cpickle
# obj0, obj1, obj2 are created here...
obj0, obj1, obj2 = [1, 2], [2, 3], [3, 4]
# Saving the objects:
# Tip: passing protocol=-1 to dump() can reduce the file size
with open('test.pickle', 'wb') as f:  # Python 3: open(..., 'wb')
    cpickle.dump([obj0, obj1, obj2], f)
# Discard the values loaded in section 1.1 (skip this line if they are not defined)
del x0, x1, x2
# Getting back the objects:
with open('test.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x0, x1, x2 = cpickle.load(f)
print(x0)
# [1, 2]
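To see how pickle and _pickle relate in Python 3, a small check like the one below may help. It is only a sketch: it assumes a CPython interpreter, where the pickle module re-exports the functions from _pickle, and the flag name `uses_c_implementation` is made up for this example. On other interpreters the import may fail, which the code treats as "no C implementation".

```python
import pickle

try:
    import _pickle  # C accelerator; may be missing on non-CPython interpreters
    # On CPython, pickle.dumps is the very same function object as _pickle.dumps
    uses_c_implementation = pickle.dumps is _pickle.dumps
except ImportError:
    uses_c_implementation = False

print("pickle backed by _pickle:", uses_c_implementation)

# Either way, the plain pickle API round-trips the data
data = [[1, 2], [2, 3], [3, 4]]
restored = pickle.loads(pickle.dumps(data))
```

If the flag is True, importing _pickle directly brings no extra speed over a plain `import pickle`.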
1.3 shelve
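shelve does not seem to support objects such as built-in functions, so it is not all that smart; you might as well use pickle directly. Reference: http://www.php.cn/python-tutorials-410803.html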
import shelve
T = 'Hiya'
val = [1, 2, 3]
filename = 'test'
my_shelf = shelve.open(filename, 'n')  # 'n' for new
# Try to store every name visible in the current scope
for key in dir():
    try:
        my_shelf[key] = globals()[key]
    except Exception:
        # __builtins__, my_shelf, and imported modules can not be shelved.
        print('ERROR shelving: {0}'.format(key))
my_shelf.close()
del val, T
# Restore the saved names into the global namespace
my_shelf = shelve.open(filename)
for key in my_shelf:
    globals()[key] = my_shelf[key]
my_shelf.close()
print(T)
# Hiya
print(val)
# [1, 2, 3]
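Since shelving every global name runs into objects that cannot be stored, one alternative is to shelve only the variables you name explicitly. The following is a minimal sketch of that idea, not the approach from the referenced article; the temporary directory and the `restored` dict are assumptions added so the example cleans up after itself and does not touch globals().

```python
import os
import shelve
import tempfile

T = 'Hiya'
val = [1, 2, 3]

# Write to a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test')

    # Store only the variables we name, keyed by string
    with shelve.open(path, 'n') as shelf:
        shelf['T'] = T
        shelf['val'] = val

    # Reopen the shelf and read the values back into a plain dict
    with shelve.open(path) as shelf:
        restored = {key: shelf[key] for key in shelf}

print(restored['T'], restored['val'])
```

Reading into a dict instead of writing to globals() keeps the restored values from silently overwriting names that already exist.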
1.4 dill
Saving the session works in PyCharm, but it raises an error in Jupyter; I am not sure why.
dump_session(filename='/tmp/session.pkl', main=None, byref=False)
pickle the current state of __main__ to a file
import dill
# Save the current session to a file
filename = 'globalsave.pkl'
dill.dump_session(filename)
# Restore the session from the file
dill.load_session(filename)
2. Saving model files
2.1 .model files
2.1.1 Training the model
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# -- Load the iris dataset
iris_dataset = load_iris()
# -- Prepare the data & split it into training and test sets
rawdata = pd.DataFrame(iris_dataset['data'], columns=['x0', 'x1', 'x2', 'x3'])
rawlabel = pd.DataFrame(iris_dataset['target'], columns=['label'])
dt_model = DecisionTreeClassifier()
train_X, test_X, train_y, test_y = train_test_split(
    rawdata, rawlabel, test_size=0.3, random_state=0)
dt_model.fit(X=train_X, y=train_y)
# -- Report performance on the training set, then on the test set
print(metrics.classification_report(train_y,
                                    dt_model.predict(X=train_X)))
print(metrics.classification_report(test_y,
                                    dt_model.predict(X=test_X)))
2.1.2 Saving & loading the model file
import joblib
# Save the model (the ./Code directory must already exist)
joblib.dump(dt_model, './Code/dt_model.model')
# Load the model
dt_model_load = joblib.load('./Code/dt_model.model')
print(metrics.classification_report(test_y,
                                    dt_model_load.predict(X=test_X)))
2.2 pickle files
It turns out pickle works for models too, though I am not sure whether it has any performance drawbacks.
import pickle
# Save the trained model
with open('dt_model.pickle', 'wb') as f:
    pickle.dump(dt_model, f)
# Load the model back
with open('dt_model.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x = pickle.load(f)
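2.3 pmml files
Models trained with sklearn can be saved as PMML files, which apparently can be called directly from Java. I will come back to this when I need it; nothing more to add for now.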
2018-09-29, written in Nanjing at 紫東創業園