『MXNet』Part 1: Basic Operations and Common Layer Implementations
阿新 • Published: 2018-05-15
MXNet is the foundation and Gluon is the high-level wrapper, much like TensorFlow and Keras. Thanks to MXNet's dynamic-graph mechanism, however, the two interoperate far more conveniently than TensorFlow and Keras do. Its basic operations closely resemble PyTorch's while being more convenient in several ways, so anyone with PyTorch experience will find it easy to pick up.
Library imports:
from mxnet import ndarray as nd
from mxnet import autograd
from mxnet import gluon
import mxnet as mx
MXNet
mxnet.ndarray is the foundation of the whole scientific-computing system; its API largely mirrors NumPy's ndarray, much as in PyTorch. But unlike PyTorch, which has distinct Variable and Tensor types, MXNet keeps things simple with a single NDArray type, and mxnet.autograd computes gradients on it directly, which is very convenient.
Automatic differentiation
x = nd.arange(4).reshape((4, 1))
# mark the variable so a gradient buffer is attached to it
x.attach_grad()
# autograd needs the computation recorded into a graph
with autograd.record():
    y = 2 * nd.dot(x.T, x)
# backpropagate from the output
y.backward()
# read the gradient
print('x.grad: ', x.grad)
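Since y = 2·xᵀx, the gradient is 4x, which is what x.grad should contain after the call above. The formula can be sanity-checked in plain NumPy with a central-difference numerical gradient (a sketch, not MXNet code):

```python
import numpy as np

x = np.arange(4, dtype=np.float64).reshape(4, 1)
analytic_grad = 4 * x  # d/dx of 2 * x^T x

# numerical gradient via central differences
eps = 1e-6
num_grad = np.zeros_like(x)
for i in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[i, 0] += eps
    xm[i, 0] -= eps
    num_grad[i, 0] = (2 * (xp.T @ xp) - 2 * (xm.T @ xm)).item() / (2 * eps)

print(np.allclose(analytic_grad, num_grad))  # the two should agree
```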
Converting an NDArray to a Python scalar
x.asscalar()  # x must be a single-element NDArray
Converting between NDArray and NumPy arrays
y = nd.array(x)   # NumPy to NDArray
z = y.asnumpy()   # NDArray to NumPy
Memory-saving addition
nd.elemwise_add(x, y, out=z)  # writes the result into z instead of allocating a new array
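NumPy has the same out= idiom, which makes it easy to verify that the destination buffer really is reused rather than reallocated (a NumPy sketch of the same idea):

```python
import numpy as np

x = np.ones(3)
y = np.arange(3, dtype=float)
z = np.empty(3)

before = id(z)
np.add(x, y, out=z)     # result written into z's existing buffer
assert id(z) == before  # z was reused, not reallocated
print(z)                # [1. 2. 3.]
```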
Layer implementations
ReLU activation
def relu(X):
    return nd.maximum(X, 0)
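The same element-wise maximum works verbatim in NumPy, so the definition can be sanity-checked without MXNet installed:

```python
import numpy as np

def relu(X):
    # element-wise max(x, 0): negatives clipped to zero, positives pass through
    return np.maximum(X, 0)

X = np.array([[-2.0, 0.0, 3.5]])
print(relu(X))  # [[0.  0.  3.5]]
```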
Fully connected layer
# create the parameters
w = nd.random.normal(scale=1, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
params = [w, b]
# attach gradient buffers to the parameters
for param in params:
    param.attach_grad()
# the fully connected layer itself
def net(X, w, b):
    return nd.dot(X, w) + b
SGD implementation
def sgd(params, lr, batch_size):
    for param in params:
        param[:] = param - lr * param.grad / batch_size
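The division by batch_size assumes the gradient was summed (not averaged) over the batch. A NumPy sketch of one step, with param.grad replaced by an explicit grad argument since plain arrays carry no gradient buffer:

```python
import numpy as np

def sgd_step(params, grads, lr, batch_size):
    # in-place update: param <- param - lr * grad / batch_size
    for param, grad in zip(params, grads):
        param -= lr * grad / batch_size

w = np.array([1.0, 2.0])
g = np.array([10.0, -10.0])  # gradient summed over a batch of 10
sgd_step([w], [g], lr=0.1, batch_size=10)
print(w)  # [0.9 2.1]
```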
Gluon
Loading an in-memory dataset
import mxnet as mx
from mxnet import autograd, nd
import numpy as np

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)

from mxnet.gluon import data as gdata

batch_size = 10
dataset = gdata.ArrayDataset(features, labels)
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)
for X, y in data_iter:
    print(X, y)
    break
[[-1.74047375  0.26071024]
 [ 0.65584248 -0.50490594]
 [-0.97745866 -0.01658815]
 [-0.55589193  0.30666101]
 [-0.61393601 -2.62473822]
 [ 0.82654613 -0.00791582]
 [ 0.29560572 -1.21692061]
 [-0.35985938 -1.37184834]
 [-1.69631028 -1.74014604]
 [ 1.31199837 -1.96280086]]
<NDArray 10x2 @cpu(0)>
[ -0.14842382   7.22247267   2.30917668   2.0601418   11.89551163
   5.87866735   8.94194221   8.15139961   6.72600317  13.50252151]
<NDArray 10 @cpu(0)>
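Under the hood, DataLoader with shuffle=True is essentially a shuffled mini-batch iterator. A minimal NumPy stand-in (the helper name data_iter is my own) shows the mechanics:

```python
import numpy as np

def data_iter(features, labels, batch_size, shuffle=True):
    # yield (X, y) mini-batches, optionally in random order
    idx = np.arange(len(features))
    if shuffle:
        np.random.shuffle(idx)
    for i in range(0, len(features), batch_size):
        batch = idx[i:i + batch_size]
        yield features[batch], labels[batch]

features = np.random.normal(size=(1000, 2))
labels = 2 * features[:, 0] - 3.4 * features[:, 1] + 4.2
X, y = next(data_iter(features, labels, batch_size=10))
print(X.shape, y.shape)  # (10, 2) (10,)
```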
Model definition
- create a sequential model
- add layers to it
- initialize the model parameters
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(1))
# initialize the parameters from a normal distribution
net.collect_params().initialize(mx.init.Normal(sigma=1))
Optimizer
The wd parameter adds L2 regularization (weight decay) to the model; the update rule becomes w = w - lr * (grad + wd * w).
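Equivalently, weight decay adds wd * w to the effective gradient, so even with a zero gradient the weights shrink toward zero. A NumPy sketch of one decayed step (assuming the w ← w − lr·(grad + wd·w) form above):

```python
import numpy as np

def sgd_wd_step(w, grad, lr, wd):
    # weight decay: shrink w toward zero on top of the gradient step
    return w - lr * (grad + wd * w)

w = np.array([1.0])
w_plain = sgd_wd_step(w, grad=np.array([0.0]), lr=0.1, wd=0.0)
w_decay = sgd_wd_step(w, grad=np.array([0.0]), lr=0.1, wd=0.5)
print(w_plain, w_decay)  # with zero gradient, only decay moves w: [1.] [0.95]
```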
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': learning_rate, 'wd': weight_decay})
trainer.step(batch_size) must be called after each backward pass; it is what actually updates the parameters. A typical training loop looks like this:
for e in range(epochs):
    for data, label in data_iter_train:
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
    train_loss.append(test(net, X_train, y_train))
    test_loss.append(test(net, X_test, y_test))
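The same loop, minus Gluon, fits in a few lines of NumPy: linear model, squared loss, closed-form gradients in place of autograd, and the SGD update from earlier. A self-contained sketch on synthetic data like the dataset above:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = rng.normal(size=(1000, 2))
y = X @ true_w + true_b + rng.normal(scale=0.01, size=1000)

w, b = np.zeros(2), 0.0
lr, batch_size = 0.03, 10
for epoch in range(3):
    idx = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):
        batch = idx[i:i + batch_size]
        Xb, yb = X[batch], y[batch]
        err = Xb @ w + b - yb                # residuals of the linear model
        w -= lr * Xb.T @ err / batch_size    # gradient of 0.5 * mean(err^2) wrt w
        b -= lr * err.mean()                 # gradient wrt b
print(w, b)  # should approach [2, -3.4] and 4.2
```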
Layer APIs
Flatten
nn.Flatten()
Fully connected layer
gluon.nn.Dense(256, activation="relu")
The first argument is the number of output units.
Loss function class APIs
Cross-entropy
from mxnet.gluon import loss as gloss
loss = gloss.SoftmaxCrossEntropyLoss()
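SoftmaxCrossEntropyLoss fuses softmax and cross-entropy for numerical stability; the underlying math is -log(softmax(logits)[label]). A NumPy sketch using the log-sum-exp trick:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # stable log-softmax via the log-sum-exp trick
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick out -log p(correct class) for each sample
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([0])
loss_val = softmax_cross_entropy(logits, labels)
print(loss_val)  # about 0.417 for this example
```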