caffe原始碼解析—caffe layer的工作原理理解
caffe是現在運用廣泛的深度學習框架,最近也在閱讀caffe原始碼,將layer的原理個人理解跟大家分享一下。
看完需要點耐心,分析的自認為比較清楚了,程式碼不多。
caffe要實現神經網路的前向以及反向傳播計算需要兩個要素:一個是資料,一個是演算法。
先說資料:caffe定義了blob類,用來儲存訓練時的資料,在此不細講,以後有機會再分享吧。
然後是演算法:首先我們知道深度學習有很多種型別的層,比如卷積,pooling,RELU等。那麼要實現這麼多種層的演算法該怎麼實現。(本文是自底向上的結構)
caffe運用的正是c++面對物件的動態多型。首先定義了一個基類layer,在layer.hpp檔案裡面。
並且包含了如下的檔案:
#include <algorithm>
#include <string>
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer_factory.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/math_functions.hpp"
可以看到blob也被包含,which means這裡就是演算法部分。
layer類裡面看一下有哪些成員(挑重要的):
inline Dtype Forward(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
inline void Backward(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
首先是兩個函式:一個是前向,一個是後向。
上面是宣告,下面看定義:
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
// Lock during forward to ensure sequential forward
Lock();
Dtype loss = 0;
Reshape(bottom, top);
switch (Caffe::mode()) {
case Caffe::CPU:
Forward_cpu(bottom, top);
for (int top_id = 0; top_id < top.size(); ++top_id) {
if (!this->loss(top_id)) { continue; }
const int count = top[top_id]->count();
const Dtype* data = top[top_id]->cpu_data();
const Dtype* loss_weights = top[top_id]->cpu_diff();
loss += caffe_cpu_dot(count, data, loss_weights);
}
break;
case Caffe::GPU:
Forward_gpu(bottom, top);
template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
switch (Caffe::mode()) {
case Caffe::CPU:
Backward_cpu(top, propagate_down, bottom);
break;
case Caffe::GPU:
Backward_gpu(top, propagate_down, bottom);
break;
default:
LOG(FATAL) << "Unknown caffe mode.";
}
}
可以看到,特分別呼叫了Forward_cpu以及Forward_gpu兩個函式,同樣,Backward也是呼叫了它的cpu以及gpu函式
那我們再看前向以及後向的cpu和gpu函式,它的宣告仍然在layer.hpp中:
/** @brief Using the CPU device, compute the layer output. */
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) = 0;
/**
* @brief Using the GPU device, compute the layer output.
* Fall back to Forward_cpu() if unavailable.
*/
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
// LOG(WARNING) << "Using CPU code as backup.";
return Forward_cpu(bottom, top);
}
/**
* @brief Using the CPU device, compute the gradients for any parameters and
* for the bottom blobs if propagate_down is true.
*/
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) = 0;
/**
* @brief Using the GPU device, compute the gradients for any parameters and
* for the bottom blobs if propagate_down is true.
* Fall back to Backward_cpu() if unavailable.
*/
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
// LOG(WARNING) << "Using CPU code as backup.";
Backward_cpu(top, propagate_down, bottom);
}
看到virtual是不是就有感覺了並且是純虛擬函式!!!!!!!!!!!!!!!!!!!!!!如果不懂virtual請移步c++面對物件好好看看。
這裡其實就是實現不同層的前向以及反向傳播的關鍵了。我們以卷積層為例來看看卷積層是怎麼實現的。首先我們應該想到,動態多型的實現靠virtual以及繼承。
那麼卷積層肯定是繼承了layer的,不信請看(conv_layer.hpp以及base_conv_layer.hpp):
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/im2col.hpp"
namespace caffe {
/**
* @brief Abstract base class that factors out the BLAS code common to
* ConvolutionLayer and DeconvolutionLayer.
*/
template <typename Dtype>
class BaseConvolutionLayer : public Layer<Dtype> {
explicit ConvolutionLayer(const LayerParameter& param)
: BaseConvolutionLayer<Dtype>(param) {}
這裡有點小插曲,就是卷積在實現的時候用了一種方法,請看部落格(此處為引用):
http://blog.csdn.net/mounty_fsc/article/details/51290446
所以先定義了一個 BaseConvolutionLayer類,!!!!!!!!!!!!!繼承了layer!!!!!!!!!!!!!!!
然後ConvolutionLayer類再繼承BaseConvolutionLayer。
繼承已經出現了,我們直接進入主題,要實現動態多型子類肯定要重新定義前面的Foward_cpu,gpu等函式。先上圖,conv_layer.hpp檔案中:
protected:
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual inline bool reverse_dimensions() { return false; }
virtual void compute_output_shape();
先聲明瞭四個函式,再看定義(conv_layer.cpp檔案):
template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const Dtype* weight = this->blobs_[0]->cpu_data();
for (int i = 0; i < bottom.size(); ++i) {
const Dtype* bottom_data = bottom[i]->cpu_data();
Dtype* top_data = top[i]->mutable_cpu_data();
for (int n = 0; n < this->num_; ++n) {
this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
top_data + n * this->top_dim_);
if (this->bias_term_) {
const Dtype* bias = this->blobs_[1]->cpu_data();
this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
}
}
}
}
template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
const Dtype* weight = this->blobs_[0]->cpu_data();
Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
for (int i = 0; i < top.size(); ++i) {
const Dtype* top_diff = top[i]->cpu_diff();
const Dtype* bottom_data = bottom[i]->cpu_data();
Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
// Bias gradient, if necessary.
if (this->bias_term_ && this->param_propagate_down_[1]) {
Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
for (int n = 0; n < this->num_; ++n) {
this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
}
}
if (this->param_propagate_down_[0] || propagate_down[i]) {
for (int n = 0; n < this->num_; ++n) {
// gradient w.r.t. weight. Note that we will accumulate diffs.
if (this->param_propagate_down_[0]) {
this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
top_diff + n * this->top_dim_, weight_diff);
}
// gradient w.r.t. bottom data, if necessary.
if (propagate_down[i]) {
this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
bottom_diff + n * this->bottom_dim_);
}
}
}
}
}
函式裡面還呼叫了兩個函式forward_cpu_gemm,backward_cpu_gemm。裡面的具體實現細節就不關注了。在這裡就可以看出,卷積層定義了自己forward_cpu,gpu以及
Backward_cpu,gpu函式這些函式就是每種型別的類實現前向以及反向的演算法。同理,其他的型別的層也定義了屬於自己的演算法。到此為止,基礎設施已經搭建完成了,要
實現動態多型還要通過呼叫才能實現(動態多型我沒記錯的話應該是在執行的時候決定呼叫哪個函式)。
那麼我們來看看caffe 的test()函式(caffe.cpp檔案):
int test() {
CHECK_GT(FLAGS_model.size(), 0) << "Need a model definition to score.";
CHECK_GT(FLAGS_weights.size(), 0) << "Need model weights to score.";
// Set device id and mode
vector<int> gpus;
get_gpus(&gpus);
if (gpus.size() != 0) {
LOG(INFO) << "Use GPU with device ID " << gpus[0];
#ifndef CPU_ONLY
cudaDeviceProp device_prop;
cudaGetDeviceProperties(&device_prop, gpus[0]);
LOG(INFO) << "GPU device name: " << device_prop.name;
#endif
Caffe::SetDevice(gpus[0]);
Caffe::set_mode(Caffe::GPU);
} else {
LOG(INFO) << "Use CPU.";
Caffe::set_mode(Caffe::CPU);
}
// Instantiate the caffe net.
Net<float> caffe_net(FLAGS_model, caffe::TEST);
caffe_net.CopyTrainedLayersFrom(FLAGS_weights);
LOG(INFO) << "Running for " << FLAGS_iterations << " iterations.";
vector<int> test_score_output_id;
vector<float> test_score;
float loss = 0;
for (int i = 0; i < FLAGS_iterations; ++i) {
float iter_loss;
const vector<Blob<float>*>& result =
caffe_net.Forward(&iter_loss);
loss += iter_loss;
int idx = 0;
for (int j = 0; j < result.size(); ++j) {
const float* result_vec = result[j]->cpu_data();
for (int k = 0; k < result[j]->count(); ++k, ++idx) {
const float score = result_vec[k];
if (i == 0) {
test_score.push_back(score);
test_score_output_id.push_back(j);
} else {
test_score[idx] += score;
}
const std::string& output_name = caffe_net.blob_names()[
caffe_net.output_blob_indices()[j]];
LOG(INFO) << "Batch " << i << ", " << output_name << " = " << score;
}
}
}
loss /= FLAGS_iterations;
LOG(INFO) << "Loss: " << loss;
for (int i = 0; i < test_score.size(); ++i) {
const std::string& output_name = caffe_net.blob_names()[
caffe_net.output_blob_indices()[test_score_output_id[i]]];
const float loss_weight = caffe_net.blob_loss_weights()[
caffe_net.output_blob_indices()[test_score_output_id[i]]];
std::ostringstream loss_msg_stream;
const float mean_score = test_score[i] / FLAGS_iterations;
if (loss_weight) {
loss_msg_stream << " (* " << loss_weight
<< " = " << loss_weight * mean_score << " loss)";
}
LOG(INFO) << output_name << " = " << mean_score << loss_msg_stream.str();
}
return 0;
}
看到程式碼中的Forwad函式就對了,呼叫它的物件是caffe_net是Net類。我們直接轉到它的定義(net.cpp檔案):
template <typename Dtype>
const vector<Blob<Dtype>*>& Net<Dtype>::Forward(Dtype* loss) {
if (loss != NULL) {
*loss = ForwardFromTo(0, layers_.size() - 1);
} else {
ForwardFromTo(0, layers_.size() - 1);
}
return net_output_blobs_;
}
呼叫了一個ForwardFromTo函式,直接看定義:
template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
CHECK_GE(start, 0);
CHECK_LT(end, layers_.size());
Dtype loss = 0;
for (int i = start; i <= end; ++i) {
// LOG(ERROR) << "Forwarding " << layer_names_[i];
Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
loss += layer_loss;
if (debug_info_) { ForwardDebugInfo(i); }
}
return loss;
}
有沒有一種似曾相識的感覺,再看一眼:
layer_[i]是什麼,是不是很激動!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!layer
現在回想一下,一個深度神經網路Net有很多層組成的是不是,那麼每一層的計算分別呼叫自己的演算法函式不就很完美。通過前面的介紹,
對於某一層,首先呼叫Forward()函式,這個函式是在layer基類中定義的,他又會去掉用Forward_cpu等函式來具體實現,這個時候有分別
呼叫各自定義的函式,對於一個Net網路,每一層都會呼叫自己的演算法,從而實現準確的前向以及反向計算。同理,train()函式的實現大家
可自行參考程式碼。
!!!!!!!!!!!!!!!!結尾了!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
看程式碼不易,先從.h檔案著手,理解專案的整體框架結構,誰引用了誰,實現了什麼,結構清晰就可以嘗試自己去構造具體實現(大神的工作)。
寫的挺糟糕的,需要有耐心去看。