DeepLearning to digit recognizer in kaggle

阿新 • • Published: 2017-06-27


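I have recently been studying deep learning, so I picked the Digit Recognizer competition on Kaggle for practice. I solve it with two toolboxes and compare their results.

The two toolboxes are DeepLearnToolbox and Caffe.

A walkthrough of the DeepLearnToolbox source code: http://blog.csdn.net/lu597203933/article/details/46576017

Caffe documentation: http://caffe.berkeleyvision.org/

I: DeepLearnToolbox

DeepLearnToolbox is written in MATLAB and is quite simple. Reading its source code is a good way to understand how a convolutional neural network works.

Here I mainly preprocess the dataset provided by the Digit Recognizer competition so that it fits DeepLearnToolbox. There are two .m files: predeal.m and cnntest.m.

All you need to change is the addpath paths. The code comments are detailed, so read through them yourself.

Code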

predeal.m

% Use DeepLearnToolbox to solve the Kaggle Digit Recognizer.
clear; clc
trainFile = 'train.csv';
testFile = 'test.csv';

M = csvread(trainFile, 1);   % read everything in the CSV except the header row
train_x = M(:, 2:end);       % columns 2..end hold the pixel data
label = M(:, 1)';            % column 1 holds the labels
label(label == 0) = 10;      % map digit 0 to class 10; sparse() below needs indices >= 1
train_y = full(sparse(label, 1:size(train_x, 1), 1));   % one-hot label matrix, 10 x N

train_x = double(reshape(train_x', 28, 28, size(train_x, 1))) / 255;   % 28 x 28 x N, scaled to [0, 1]

%% Prepare the data to predict on
M = csvread(testFile, 1);    % read everything in the CSV except the header row
test_x = double(reshape(M', 28, 28, size(M, 1))) / 255;
clear label M testFile trainFile


addpath D:\DeepLearning\DeepLearnToolbox-master\data\      % change these paths to your own
addpath D:\DeepLearning\DeepLearnToolbox-master\CNN
addpath D:\DeepLearning\DeepLearnToolbox-master\util
rand('state', 0)
cnn.layers = {        % number of feature maps, kernel size, etc. for each layer
    struct('type', 'i') % input layer
    struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5) % convolution layer
    struct('type', 's', 'scale', 2) % subsampling layer
    struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) % convolution layer
    struct('type', 's', 'scale', 2) % subsampling layer
};

opts.alpha = 0.01;      % learning rate
opts.batchsize = 50;    % mini-batch size: each gradient update uses 50 samples
opts.numepochs = 25;    % number of epochs
cnn = cnnsetup(cnn, train_x, train_y);      % initialize the weights and biases of every layer
cnn = cnntrain(cnn, train_x, train_y, opts);  % training: backpropagation and the update loop

test_y = cnntest(cnn, test_x);      % predict on the test set
test_y(test_y == 10) = 0;      % map class 10 back to digit 0
test_y = test_y';
M = [(1:length(test_y))' test_y(:)];
csvwrite('test_y.csv', M);
figure; plot(cnn.rL);

cnntest.m

function [test_y] = cnntest(net, x)
    % feedforward, then take the most probable class for each sample
    net = cnnff(net, x);
    [~, test_y] = max(net.o);
end

Result: the score obtained with DeepLearnToolbox is not very good, only 0.94586.
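The label handling in predeal.m is the part that is easiest to get wrong: digit 0 is moved to class 10 because the one-hot matrix is indexed from 1, and the prediction has to be mapped back afterwards. Below is a minimal sketch of that round trip in plain Python rather than the post's MATLAB; the function names are hypothetical and chosen only for illustration.

```python
def to_class_index(digit):
    """Map a digit 0..9 to a 1-based class index 1..10 (digit 0 becomes class 10)."""
    return 10 if digit == 0 else digit


def to_digit(class_index):
    """Inverse mapping: class 10 goes back to digit 0."""
    return 0 if class_index == 10 else class_index


def one_hot(digits, num_classes=10):
    """Return a num_classes x N matrix (a list of rows) with a single 1 per column."""
    matrix = [[0] * len(digits) for _ in range(num_classes)]
    for col, digit in enumerate(digits):
        # Row index is the 1-based class index shifted to Python's 0-based lists.
        matrix[to_class_index(digit) - 1][col] = 1
    return matrix
```

Forgetting the inverse mapping would turn every predicted 0 into a 10 in the submission file, which is why predeal.m resets `test_y == 10` to 0 before writing the CSV.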

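II: Caffe for the digit recognizer

Caffe ships with an MNIST example for handling digits, but the data on the official site is in a binary format and the example only reports a simple accuracy figure, so it cannot be applied as-is.

The procedure is as follows:

1: Convert the given CSV data to LMDB format

I wrote a program, convert_data_to_lmdb.cpp, in the mnist directory to process the data:

The code is as follows: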

#include <cstdio>     // snprintf
#include <fstream>    // std::ifstream
#include <iostream>
#include <sstream>
#include <string>

#include "boost/scoped_ptr.hpp"
#include "gflags/gflags.h"
#include "glog/logging.h"

#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/rng.hpp"

using namespace caffe;
using namespace std;
using std::pair;
using boost::scoped_ptr;

/* edited by Zack
 * argv[1] the input file, argv[2] the output file*/

DEFINE_string(backend, "lmdb", "The backend for storing the result");  // FLAGS_backend defaults to "lmdb"

int main(int argc, char **argv){
	::google::InitGoogleLogging(argv[0]);

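	#ifndef GFLAGS_GFLAGS_H_
	  namespace gflags = google;
	#endif
	// Parse --backend=... and strip it from argv, leaving only the two file arguments.
	gflags::ParseCommandLineFlags(&argc, &argv, true);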

	if(argc < 3){
		LOG(ERROR)<< "please check the input arguments!";
		return 1;
	}
	ifstream infile(argv[1]);
	if(!infile){
		LOG(ERROR)<< "cannot open the input file: " << argv[1];
		return 1;
	}
	string str;
	int count = 0;
	int rows = 28;
	int cols = 28;
	unsigned char *buffer = new  unsigned char[rows*cols];
	stringstream ss;

	Datum datum;             // this data structure store the data and label
	datum.set_channels(1);    // the channels
	datum.set_height(rows);    // rows
	datum.set_width(cols);     // cols

	scoped_ptr<db::DB> db(db::GetDB(FLAGS_backend));         // new DB object
	db->Open(argv[2], db::NEW);                    // open the lmdb file to store the data
	scoped_ptr<db::Transaction> txn(db->NewTransaction());   // new Transaction object to put and commit the data

	const int kMaxKeyLength = 256;           // to save the key
	char key_cstr[kMaxKeyLength];

	bool flag= false;
	while(getline(infile, str)){
		if(flag == false){
			flag = true;
			continue;
		}
		int beg = 0;
		int end = 0;
		int str_index = 0;
		// For test.csv (no label column), change 1 of 2: uncomment the next line.
		//datum.set_label(0);
		while((end = str.find_first_of(',', beg)) != string::npos){
			//cout << end << endl;
			string dig_str = str.substr(beg, end - beg);
			int pixes;
			ss.clear();
			ss << dig_str;
			ss >> pixes;
			// For test.csv, change 2 of 2: delete the following if-block.
			if(beg == 0){
				datum.set_label(pixes);
				beg = ++ end;
				continue;
			}
			buffer[str_index++] = (unsigned char)pixes;
			beg = ++end;
		}
		string dig_str = str.substr(beg);
		int pixes;
		ss.clear();
		ss << dig_str;
		ss >> pixes;
		buffer[str_index++] = (unsigned char)pixes;
		datum.set_data(buffer, rows*cols);

		int length = snprintf(key_cstr, kMaxKeyLength, "%08d", count);

		    // Put in db
		string out;
		CHECK(datum.SerializeToString(&out));              // serialize to string
		txn->Put(string(key_cstr, length), out);        // put it, both the key and value

		if (++count % 1000 == 0) {       // to commit every 1000 iteration
		  // Commit db
		  txn->Commit();
		  txn.reset(db->NewTransaction());
		  LOG(ERROR) << "Processed " << count << " files.";
		}

	}
	// write the last batch
	  if (count % 1000 != 0) {            // commit the last batch
		txn->Commit();
		LOG(ERROR) << "Processed " << count << " files.";
	  }

	delete[] buffer;   // release the pixel buffer allocated with new[]
	return 0;
}

Then run make all -j8 to compile the code.

The corresponding binary is then generated under the build directory.


Then run ./build/examples/mnist/convert_data_to_lmdb.bin examples/mnist/kaggle/data/train.csv examples/mnist/kaggle/mnist_train_lmdb --backend=lmdb

This produces the training data in LMDB format. test.csv has no labels, so the code needs two small adjustments for it; both are marked in the code above.

Then run make all -j8 again, followed by ./build/examples/mnist/convert_data_to_lmdb.bin examples/mnist/kaggle/data/test.csv examples/mnist/kaggle/mnist_test_lmdb --backend=lmdb

This produces the corresponding LMDB file for the test data.
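The only difference between converting train.csv and test.csv is whether the first field of each row is a label. As a rough illustration in Python (not the post's C++, and without any LMDB dependency), the row parsing and the zero-padded key could look like the sketch below. The names `parse_row` and `make_key` and the `has_label` switch are hypothetical; the switch stands in for the two manual code changes described above.

```python
ROWS, COLS = 28, 28


def parse_row(line, has_label=True):
    """Split one CSV data row into (label, pixels).

    With has_label=False (the test.csv case) every field is a pixel and the
    label is set to the placeholder 0, mirroring datum.set_label(0).
    """
    values = [int(field) for field in line.strip().split(",")]
    if has_label:
        label, pixels = values[0], values[1:]
    else:
        label, pixels = 0, values
    if len(pixels) != ROWS * COLS:
        raise ValueError("expected %d pixels, got %d" % (ROWS * COLS, len(pixels)))
    # bytes() rejects values outside 0..255, matching the unsigned char buffer.
    return label, bytes(pixels)


def make_key(count):
    """Zero-padded record key, the same "%08d" format the converter uses."""
    return "%08d" % count
```

A single switch avoids recompiling between the two conversions, at the cost of one extra argument that the original program does not have.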

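2: Train on the training data to obtain a model

While training a model, Caffe evaluates on a test dataset every test_interval iterations (running test_iter batches each time), so here we can build a validation LMDB from the first 1000 rows of train.csv. The procedure is the same as above.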
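Cutting the first 1000 rows out of train.csv is a few lines of scripting. The sketch below, in Python, keeps the header line in both outputs because the converter skips the first line of its input. The function name `split_holdout` is hypothetical. Note that the post trains on the full train.csv, so the remainder returned here is optional and only matters if you want the validation rows kept out of training.

```python
def split_holdout(lines, num_validation=1000):
    """Split CSV lines (header first) into a validation part and the remainder.

    Both returned lists start with the header line.
    """
    header, rows = lines[0], lines[1:]
    validation = [header] + rows[:num_validation]
    remainder = [header] + rows[num_validation:]
    return validation, remainder


# Example usage (file names are assumptions, adjust to your layout):
# with open("train.csv") as f:
#     validation, remainder = split_holdout(f.read().splitlines())
# with open("val.csv", "w") as f:
#     f.write("\n".join(validation) + "\n")
```

When the same rows also appear in the training set, as in the post, the accuracy Caffe reports during training is likely to be optimistic compared with the Kaggle score.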

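Copy lenet_solver.prototxt and lenet_train_test.prototxt from the mnist folder into the kaggle folder, and modify the file paths and batch sizes they contain accordingly. For details, see the code download linked at the end of this post.

Then run ./build/tools/caffe train --solver=examples/mnist/kaggle/lenet_solver.prototxt, which produces lenet_iter_10000.caffemodel.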

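3: Extract the prob-layer features for the test set

Here we use extract_features.cpp from the tools folder. That source file writes its results in LMDB format, however, so I modified the source code as follows: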

#include <stdio.h>   // for snprintf
#include <stdlib.h>  // for atoi
#include <string.h>  // for strcmp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

#include "boost/algorithm/string.hpp"
#include "google/protobuf/text_format.h"

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/net.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/io.hpp"
#include "caffe/vision_layers.hpp"

using caffe::Blob;
using caffe::Caffe;
using caffe::Datum;
using caffe::Net;
using boost::shared_ptr;
using std::string;
namespace db = caffe::db;

template<typename Dtype>
int feature_extraction_pipeline(int argc, char** argv);

int main(int argc, char** argv) {
  return feature_extraction_pipeline<float>(argc, argv);
//  return feature_extraction_pipeline<double>(argc, argv);
}

template<typename Dtype>
int feature_extraction_pipeline(int argc, char** argv) {
  ::google::InitGoogleLogging(argv[0]);
  const int num_required_args = 7;     // argc must be at least 7 (program name + 6 arguments)
  if (argc < num_required_args) {
    LOG(ERROR)<<
    "This program takes in a trained network and an input data layer, and then"
    " extract features of the input data produced by the net.\n"
    "Usage: extract_features  pretrained_net_param"
    "  feature_extraction_proto_file  extract_feature_blob_name1[,name2,...]"
    "  save_feature_dataset_name1[,name2,...]  num_mini_batches  db_type"
    "  [CPU/GPU] [DEVICE_ID=0]  text_output_prefix\n"
    "Note: you can extract multiple features in one pass by specifying"
    " multiple feature blob names and dataset names separated by ','."
    " The names cannot contain white space characters and the number of blobs"
    " and datasets must be equal.";
    return 1;
  }
  int arg_pos = num_required_args;     // index of the first optional argument

  if (argc > arg_pos && strcmp(argv[arg_pos], "GPU") == 0) {          // optional: "GPU [DEVICE_ID]"
    LOG(ERROR)<< "Using GPU";
    int device_id = 0;                 // signed, so the CHECK_GE below is meaningful
    if (argc > arg_pos + 1) {
      device_id = atoi(argv[arg_pos + 1]);
      CHECK_GE(device_id, 0);
    }
    LOG(ERROR) << "Using Device_id=" << device_id;
    Caffe::SetDevice(device_id);
    Caffe::set_mode(Caffe::GPU);
  } else {
    LOG(ERROR) << "Using CPU";
    Caffe::set_mode(Caffe::CPU);
  }

  arg_pos = 0;  // the name of the executable
  std::string pretrained_binary_proto(argv[++arg_pos]);      // the trained model (.caffemodel)

  // Expected prototxt contains at least one data layer such as
  //  the layer data_layer_name and one feature blob such as the
  //  fc7 top blob to extract features.
  /*
   layers {
     name: "data_layer_name"
     type: DATA
     data_param {
       source: "/path/to/your/images/to/extract/feature/images_leveldb"
       mean_file: "/path/to/your/image_mean.binaryproto"
       batch_size: 128
       crop_size: 227
       mirror: false
     }
     top: "data_blob_name"
     top: "label_blob_name"
   }
   layers {
     name: "drop7"
     type: DROPOUT
     dropout_param {
       dropout_ratio: 0.5
     }
     bottom: "fc7"
     top: "fc7"
   }
   */
  std::string feature_extraction_proto(argv[++arg_pos]);    // the net definition (prototxt)
  shared_ptr<Net<Dtype> > feature_extraction_net(
      new Net<Dtype>(feature_extraction_proto, caffe::TEST));               // build the net and set up each layer
  feature_extraction_net->CopyTrainedLayersFrom(pretrained_binary_proto);           // load the trained weights

  std::string extract_feature_blob_names(argv[++arg_pos]);          // which blobs to extract features from
  std::vector<std::string> blob_names;
  boost::split(blob_names, extract_feature_blob_names, boost::is_any_of(","));   // several blobs may be extracted, each stored under its own dataset name

  std::string save_feature_dataset_names(argv[++arg_pos]);   // where to store the features
  std::vector<std::string> dataset_names;
  boost::split(dataset_names, save_feature_dataset_names,         // one dataset name per blob
               boost::is_any_of(","));
  CHECK_EQ(blob_names.size(), dataset_names.size()) <<
      " the number of blob names and dataset names must be equal";
  size_t num_features = blob_names.size();     // number of blobs to extract

  for (size_t i = 0; i < num_features; i++) {
    CHECK(feature_extraction_net->has_blob(blob_names[i]))
        << "Unknown feature blob name " << blob_names[i]
        << " in the network " << feature_extraction_proto;
  }

  int num_mini_batches = atoi(argv[++arg_pos]);            // number of mini-batches to run forward

  // init the DB and Transaction for all blobs you want to extract features
  std::vector<shared_ptr<db::DB> > feature_dbs;               // one DB per blob
  std::vector<shared_ptr<db::Transaction> > txns;            // one Transaction per blob


  // edit by Zack: also write each blob's features to a text file
   //std::string strfile = "/home/hadoop/caffe/textileImage/features/probTest";
  std::string strfile = argv[argc-1];          // path prefix of the text output, taken from the last argument
  std::vector<std::ofstream*> vec(num_features, 0);

  const char* db_type = argv[++arg_pos];                  // backend used to store the features, e.g. lmdb
  for (size_t i = 0; i < num_features; ++i) {
    LOG(INFO)<< "Opening dataset " << dataset_names[i];               // dataset_names[i] stores the features of blob i
    shared_ptr<db::DB> db(db::GetDB(db_type));             // the type of the db
    db->Open(dataset_names.at(i), db::NEW);          // open the dir to store the feature
    feature_dbs.push_back(db);             // put the db to the vector
    shared_ptr<db::Transaction> txn(db->NewTransaction());     // the transaction to the db
    txns.push_back(txn);                // put the transaction to the vector

// edit by Zack: open <prefix><i>.txt for blob i

    std::stringstream ss;
    ss << i;
    std::string str = strfile + ss.str() + ".txt";
    vec[i] = new std::ofstream(str.c_str());
  }

  LOG(ERROR)<< "Extracting Features";

  Datum datum;
  const int kMaxKeyStrLength = 100;
  char key_str[kMaxKeyStrLength];      // buffer for the DB key
  std::vector<Blob<float>*> input_vec;
  std::vector<int> image_indices(num_features, 0);   // per-blob count of images written so far


  for (int batch_index = 0; batch_index < num_mini_batches; ++batch_index) {
    feature_extraction_net->Forward(input_vec);
    for (int i = 0; i < num_features; ++i) {    // loop over the requested blobs, e.g. fc7, fc8
      const shared_ptr<Blob<Dtype> > feature_blob = feature_extraction_net
          ->blob_by_name(blob_names[i]);
      int batch_size = feature_blob->num();     // number of images in the batch
      int dim_features = feature_blob->count() / batch_size;    // feature dimension per image in this blob
      const Dtype* feature_blob_data;   // pointer to one image's features
      for (int n = 0; n < batch_size; ++n) {
        datum.set_height(feature_blob->height());     // set the height
        datum.set_width(feature_blob->width());     // set the width
        datum.set_channels(feature_blob->channels());    // set the channel
        datum.clear_data();               // clear data
        datum.clear_float_data();        // clear float_data
        feature_blob_data = feature_blob->cpu_data() +
            feature_blob->offset(n);    // features of the n-th image in the batch
        for (int d = 0; d < dim_features; ++d) {
          datum.add_float_data(feature_blob_data[d]);
          (*vec[i]) << feature_blob_data[d] << " ";          // also write the feature to the text file
        }
        (*vec[i]) << std::endl;
        //LOG(ERROR)<< "dim" << dim_features;
        int length = snprintf(key_str, kMaxKeyStrLength, "%010d",
            image_indices[i]);       // key = running index of the image
        string out;
        CHECK(datum.SerializeToString(&out));    // serialize to string
        txns.at(i)->Put(std::string(key_str, length), out);       // put to transaction
        ++image_indices[i];       // advance the key
        if (image_indices[i] % 1000 == 0) {    // commit every 1000 images
          txns.at(i)->Commit();
          txns.at(i).reset(feature_dbs.at(i)->NewTransaction());
          LOG(ERROR)<< "Extracted features of " << image_indices[i] <<
              " query images for feature blob " << blob_names[i];
        }
      }  // for (int n = 0; n < batch_size; ++n)
    }  // for (int i = 0; i < num_features; ++i)
  }  // for (int batch_index = 0; batch_index < num_mini_batches; ++batch_index)
  // write the last batch
  for (int i = 0; i < num_features; ++i) {
    if (image_indices[i] % 1000 != 0) {     // commit the last partial batch
      txns.at(i)->Commit();
    }
    // edit by Zack
      vec[i]->close();
      delete vec[i];

    LOG(ERROR)<< "Extracted features of " << image_indices[i] <<
        " query images for feature blob " << blob_names[i];
    feature_dbs.at(i)->Close();
  }

  LOG(ERROR)<< "Successfully extracted the features!";
  return 0;
}

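This finally writes the prob layer (the final class probabilities) to a txt file.

I also adjusted the network definition: since only prediction is needed, the training-related parameters in the layers can all be dropped.

The deploy.prototxt is as follows: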

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/kaggle/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
 
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
   
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"

  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
   
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  
  inner_product_param {
    num_output: 500
    
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"

  inner_product_param {
    num_output: 10
   
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Then run

./build/tools/extract_features.bin examples/mnist/kaggle/lenet_iter_10000.caffemodel examples/mnist/kaggle/deploy.prototxt prob examples/mnist/kaggle/features 280 lmdb /home/hadoop/caffe/caffe-master/examples/mnist/kaggle/feature

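Here 280 is the number of mini-batches: batch_size is set to 100 in deploy.prototxt, so 280 × 100 = 28000, the size of the whole test set. /home/hadoop/caffe/caffe-master/examples/mnist/kaggle/feature is the path prefix under which the extracted features are saved as txt, and examples/mnist/kaggle/features is the LMDB output directory. examples/mnist/kaggle/lenet_iter_10000.caffemodel holds the trained weights, and examples/mnist/kaggle/deploy.prototxt is the network definition.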
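The batch count passed on the command line has to match the batch_size in deploy.prototxt, otherwise the text file will not have exactly one row per test sample. A tiny Python sketch of that arithmetic follows; the function name is hypothetical, and refusing a non-divisible batch size is a choice made here for safety, not something the tool itself enforces.

```python
def num_mini_batches(num_samples, batch_size):
    """Number of forward passes needed to cover num_samples exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches, leftover = divmod(num_samples, batch_size)
    if leftover:
        # A partial last batch would give a row count that differs from
        # num_samples, so ask for a batch size that divides evenly.
        raise ValueError("batch_size %d does not divide %d" % (batch_size, num_samples))
    return batches


print(num_mini_batches(28000, 100))  # 280, the value used in the command above
```

If you change batch_size in deploy.prototxt, recompute this number before running extract_features.bin.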

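4: Post-process the resulting txt file

After the three steps above we have feature0.txt, which holds a 28000 × 10 matrix: for every sample, the probability of belonging to each class. Running the MATLAB code below then produces the submission file Kaggle requires. The final accuracy is 0.98986, and my ranking also improved by more than 400 places. Great!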

% Caffe toolbox: post-processing of the extracted probabilities
clear; clc;
feature = load('feature0.txt');
feature = feature';
[~, test_y] = max(feature);     % index (1..10) of the most probable class per sample
test_y = test_y - 1;            % column k holds the probability of digit k-1
test_y = test_y';
M = [(1:length(test_y))' test_y(:)];
csvwrite('test_y3.csv', M);
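For readers without MATLAB, the same post-processing can be sketched in a few lines of Python: take the arg-max of each row of probabilities and pair it with a 1-based sample index. The function names are hypothetical. Because Python lists are 0-based, the arg-max is already the digit and no "minus one" step is needed.

```python
def predict_digits(lines):
    """Return the most probable digit for each line of space-separated probabilities."""
    digits = []
    for line in lines:
        probs = [float(value) for value in line.split()]
        if not probs:
            continue  # skip blank lines
        # Index of the largest probability; index k corresponds to digit k.
        digits.append(max(range(len(probs)), key=probs.__getitem__))
    return digits


def submission_rows(digits):
    """Pair each prediction with its 1-based sample index, as 'index,digit' strings."""
    return ["%d,%d" % (index, digit) for index, digit in enumerate(digits, start=1)]


# Example usage (the file name follows the post; the header line is an
# assumption about the submission format, which the post does not show):
# with open("feature0.txt") as f:
#     rows = submission_rows(predict_digits(f))
# with open("test_y3.csv", "w") as f:
#     f.write("ImageId,Label\n" + "\n".join(rows) + "\n")
```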

All files and code can be downloaded from: https://github.com/zack6514/zackcoding
