DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks

1、Introduction

Solving the visual odometry (VO) problem with deep learning: end-to-end VO with a recurrent convolutional neural network (RCNN).

2、Network structure

a.CNN based Feature Extraction

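  The paper uses the KITTI dataset.

  The CNN has 9 convolutional layers. Every convolutional layer except Conv6 is followed by one ReLU layer, which gives 17 layers in total (9 convolutions plus 8 ReLUs).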
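  The layer count can be checked with a short sketch. The layer names below are an assumption made for illustration (only Conv6 is named in these notes), and `build_layer_list` is a hypothetical helper, not code from the paper.

```python
# Assumed names for the 9 convolutional layers; only "Conv6" appears in the notes.
conv_names = ["Conv1", "Conv2", "Conv3", "Conv3_1", "Conv4",
              "Conv4_1", "Conv5", "Conv5_1", "Conv6"]


def build_layer_list(names, no_relu=("Conv6",)):
    """Interleave a ReLU after every convolution except those in `no_relu`."""
    layers = []
    for name in names:
        layers.append(name)
        if name not in no_relu:
            layers.append(f"ReLU({name})")
    return layers


layers = build_layer_list(conv_names)
print(len(conv_names), len(layers))  # 9 17
```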

b.RNN based Sequential Modelling

  RNN is different from CNN in that it maintains memory of its hidden states over time and has feedback loops among them, which enables its current hidden state to be a function of the previous ones.

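  Given a convolutional feature xk at time k, an RNN updates at time step k by

$$
\begin{aligned}
h_k &= H(W_{xh} x_k + W_{hh} h_{k-1} + b_h) \\
y_k &= W_{hy} h_k + b_y
\end{aligned}
$$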

  hk and yk are the hidden state and output at time k respectively.

  W terms denote corresponding weight matrices.

  b terms denote bias vectors.

  H is an element-wise nonlinear activation function.
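  A minimal NumPy sketch of one recurrent update built from these terms, assuming the standard form in which the hidden state is H applied to an affine function of xk and hk-1, and the output is an affine function of hk. The function name `rnn_step`, the choice of tanh for H, and the toy dimensions are assumptions for illustration only.

```python
import numpy as np


def rnn_step(x_k, h_prev, W_xh, W_hh, b_h, W_hy, b_y, H=np.tanh):
    """One vanilla RNN update: returns the new hidden state and the output."""
    h_k = H(W_xh @ x_k + W_hh @ h_prev + b_h)  # hidden state depends on h_{k-1}
    y_k = W_hy @ h_k + b_y                     # output at time k
    return h_k, y_k


rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 4, 6                   # toy sizes, not the paper's
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))
b_h = np.zeros(n_hid)
b_y = np.zeros(n_out)

h = np.zeros(n_hid)                            # initial hidden state
for x in rng.normal(size=(3, n_in)):           # a sequence of 3 feature vectors
    h, y = rnn_step(x, h, W_xh, W_hh, b_h, W_hy, b_y)

print(h.shape, y.shape)
```

  Because h is fed back into the next call, each hidden state is a function of all earlier inputs, which is the memory property described above.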

  LSTM

Figure: Folded and unfolded LSTMs and the internal structure of an LSTM unit.

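  Given the input xk at time k, and the hidden state hk-1 and memory cell ck-1 of the previous LSTM unit, the LSTM updates at time step k by

$$
\begin{aligned}
i_k &= \sigma(W_{xi} x_k + W_{hi} h_{k-1} + b_i) \\
f_k &= \sigma(W_{xf} x_k + W_{hf} h_{k-1} + b_f) \\
g_k &= \tanh(W_{xg} x_k + W_{hg} h_{k-1} + b_g) \\
c_k &= f_k \odot c_{k-1} + i_k \odot g_k \\
o_k &= \sigma(W_{xo} x_k + W_{ho} h_{k-1} + b_o) \\
h_k &= o_k \odot \tanh(c_k)
\end{aligned}
$$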

  ⊙ is the element-wise product of two vectors.

  σ is sigmoid non-linearity.

  tanh is hyperbolic tangent non-linearity.

  W terms denote corresponding weight matrices.

  b terms denote bias vectors.

  ik, fk, gk, ck and ok are the input gate, forget gate, input modulation gate, memory cell and output gate at time k, respectively.

  Each of the LSTM layers has 1000 hidden states.
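  A minimal NumPy sketch of a single LSTM update using the gates named above, assuming the standard formulation (sigmoid gates, tanh modulation, element-wise products). Stacking the four gate weight matrices into `W_x`, `W_h` and `b` is a layout chosen here for brevity, and the hidden size of 5 is a toy value rather than the 1000 used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_k, h_prev, c_prev, W_x, W_h, b):
    """One LSTM update.

    W_x: (4*n_hid, n_in), W_h: (4*n_hid, n_hid), b: (4*n_hid,),
    with rows stacked in the order [input, forget, modulation, output].
    """
    n_hid = h_prev.shape[0]
    z = W_x @ x_k + W_h @ h_prev + b
    i_k = sigmoid(z[0 * n_hid:1 * n_hid])   # input gate
    f_k = sigmoid(z[1 * n_hid:2 * n_hid])   # forget gate
    g_k = np.tanh(z[2 * n_hid:3 * n_hid])   # input modulation gate
    o_k = sigmoid(z[3 * n_hid:4 * n_hid])   # output gate
    c_k = f_k * c_prev + i_k * g_k          # memory cell
    h_k = o_k * np.tanh(c_k)                # hidden state
    return h_k, c_k


rng = np.random.default_rng(0)
n_in, n_hid = 8, 5                          # toy sizes
W_x = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
W_h = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.normal(size=(3, n_in)):        # a sequence of 3 feature vectors
    h, c = lstm_step(x, h, c, W_x, W_h, b)

print(h.shape, c.shape)
```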

3、Cost Function and Optimisation

  DeepVO models the conditional probability of the poses Yt = (y1, . . . , yt) given a sequence of monocular RGB images Xt = (x1, . . . , xt) up to time t:

$$
p(Y_t \mid X_t) = p(y_1, \ldots, y_t \mid x_1, \ldots, x_t)
$$

  The optimal parameters θ* maximise this conditional probability:

$$
\theta^* = \arg\max_{\theta} \, p(Y_t \mid X_t; \theta)
$$

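  The parameters θ of the DNNs are learned by minimising the mean squared error (MSE) between the ground truth and the estimated positions and orientations:

$$
\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{t} \left\| \hat{p}_k - p_k \right\|_2^2 + \kappa \left\| \hat{\varphi}_k - \varphi_k \right\|_2^2
$$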

  (pk, φk) is the ground truth pose.

  (p̂k, φ̂k) is the estimated pose.

  κ (100 in the experiments) is a scale factor to balance the weights of positions and orientations.

  N is the number of samples.

  The orientation φ is represented by Euler angles rather than a quaternion, since a quaternion is subject to an extra unit-norm constraint, which hinders the optimisation problem of DL.
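  A minimal NumPy sketch of a loss built from the terms defined above, assuming it is the squared Euclidean position error plus κ times the squared Euclidean orientation error, summed over the time steps of each sequence and averaged over the N samples. The function name `pose_mse_loss` and the (N, t, 3) array layout are assumptions for illustration.

```python
import numpy as np


def pose_mse_loss(p_hat, p, phi_hat, phi, kappa=100.0):
    """Pose loss for arrays of shape (N, t, 3).

    p / phi:         ground truth positions / Euler angles
    p_hat / phi_hat: estimated positions / Euler angles
    kappa:           weight balancing position and orientation errors
    """
    pos_err = np.sum((p_hat - p) ** 2, axis=-1)      # (N, t) squared position error
    ori_err = np.sum((phi_hat - phi) ** 2, axis=-1)  # (N, t) squared orientation error
    per_sample = np.sum(pos_err + kappa * ori_err, axis=1)  # sum over time steps
    return np.mean(per_sample)                       # average over N samples


# Toy example: one sequence (N = 1) with two time steps (t = 2).
p = np.zeros((1, 2, 3))
p_hat = np.ones((1, 2, 3))        # each step is off by 1 on every axis
phi = np.zeros((1, 2, 3))
phi_hat = np.zeros((1, 2, 3))
phi_hat[..., 0] = 0.1             # each step is off by 0.1 rad on one axis

loss = pose_mse_loss(p_hat, p, phi_hat, phi)
print(loss)  # approximately 8.0: (3 + 100 * 0.01) per step, times 2 steps
```

  With κ = 100, an orientation error of 0.1 rad contributes as much as a position error of 1 unit on one axis, which shows how the scale factor balances the two terms.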
