
LSTM Trigonometric Function Prediction


Preface

I had been saying for a long time that I would hand-write an LSTM prediction from scratch, but after hitting an issue with buckets last semester I shelved it, and it came back to embarrass me a few times afterwards (⊙﹏⊙)b.
Well, let me first put the issue up: https://github.com/apache/incubator-mxnet/issues/8663 — no expert has paid it any attention so far (I don't know why either ...).

Code

Today the urge came suddenly again, so I wrote a small test program (these days most people probably play with gluon and no longer bother with the symbol API):

import mxnet as mx
from mxnet import gluon
import numpy as np

hidden_sizes = [10, 20, 1]
batch_size = 300
iteration = 300000
log_freq = 20
ctx = mx.gpu()
opt = 'adam'  # or 'sgd'
unroll_len = 9

# unroll_len + 1 evenly spaced phases per sample, shifted by a random offset
t = mx.nd.arange(0, 0.01 * (1 + unroll_len), .01, ctx=ctx)
tt = mx.nd.random.uniform(shape=(iteration, 1), ctx=ctx)
t = (t + tt).T                       # (unroll_len + 1, iteration)
y = mx.nd.sin(t[-1]) / 2             # target: scaled sine of the last point

model = gluon.rnn.SequentialRNNCell()
with model.name_scope():
    for hidden_size in hidden_sizes:
        model.add(gluon.rnn.LSTMCell(hidden_size))
model.initialize(ctx=ctx)

L = gluon.loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), opt)

prev_batch_idx = -1
acc_l = mx.nd.array([0, ], ctx=ctx)
for batch_idx in range(iteration // batch_size):
    # one slice of the batch per time step; each element of x_list: (batch_size, 1)
    x_list = [x[batch_idx * batch_size:(batch_idx + 1) * batch_size].reshape((-1, 1))
              for x in t[:unroll_len]]
    label = y[batch_idx * batch_size:(batch_idx + 1) * batch_size]
    with mx.autograd.record():
        outputs, states = model.unroll(unroll_len, x_list)
        l = L(outputs[-1], label)
    l.backward()
    trainer.step(batch_size)
    acc_l += l.mean()
    if batch_idx - prev_batch_idx == log_freq:
        print('loss: %.4f' % (acc_l / log_freq).asscalar())
        prev_batch_idx = batch_idx
        acc_l *= 0
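(A small addition of mine, not part of the original script.) A minimal inference sketch under the same setup: it feeds one fresh window of unroll_len phase points through the trained cells and compares the prediction with the scaled sine target; the 1.234 phase offset is an arbitrary choice.

# Minimal inference sketch (assumes the training loop above has already run)
t_test = mx.nd.arange(0, 0.01 * (1 + unroll_len), .01, ctx=ctx) + 1.234   # arbitrary phase offset
x_test = [t_test[i:i + 1].reshape((1, 1)) for i in range(unroll_len)]     # batch of one sample
outputs, _ = model.unroll(unroll_len, x_test)
pred = outputs[-1].asscalar()
target = (mx.nd.sin(t_test[unroll_len:unroll_len + 1]) / 2).asscalar()
print('prediction: %.4f  target: %.4f' % (pred, target))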

Note

  1. adam is noticeably faster than sgd; see the loss comparison table at the end of the post.
  2. Without a relu activation, does the model become hard to optimize once more layers are stacked?
    On the first point: the LSTM equations simply have no place for a relu. On the second point, I found a few links:
    https://www.reddit.com/r/MachineLearning/comments/30eges/batch_normalization_or_other_tricks_for_lstms/
    https://groups.google.com/forum/#!topic/lasagne-users/EczUQckJggU
    The above are related discussions.
    This work (http://cn.arxiv.org/abs/1603.09025) proposes BN for the hidden-to-hidden transition (a rough sketch of the idea follows this list). Judging from its description and the posted results, it does not bring a noticeable improvement in convergence speed or accuracy.
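
As I recall the paper, the idea is roughly to batch-normalize the hidden-to-hidden and input-to-hidden projections separately before forming the gate pre-activations, and to normalize the cell state before the output nonlinearity (notation mine; see the paper for the exact parameterization and the per-time-step statistics):

\[
\begin{aligned}
(\tilde{i}_t,\ \tilde{f}_t,\ \tilde{o}_t,\ \tilde{g}_t) &= \mathrm{BN}(W_h h_{t-1};\ \gamma_h) + \mathrm{BN}(W_x x_t;\ \gamma_x) + b \\
c_t &= \sigma(\tilde{f}_t) \odot c_{t-1} + \sigma(\tilde{i}_t) \odot \tanh(\tilde{g}_t) \\
h_t &= \sigma(\tilde{o}_t) \odot \tanh\big(\mathrm{BN}(c_t;\ \gamma_c, \beta_c)\big)
\end{aligned}
\]
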
Loss comparison during training (adam vs sgd):

adam      sgd
0.0378    0.0387
0.0223    0.0335
0.0059    0.0284
0.0043    0.0247
0.0030    0.0214
