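Pedestrian Detection with OpenCV (HOG + SVM)
1. Theoretical background
The main idea behind pedestrian detection with OpenCV: HOG + SVM
HOG: The Histogram of Oriented Gradients (HOG) feature is a feature descriptor used for object detection in computer vision and image processing. A HOG feature is built by computing and accumulating histograms of gradient orientations over local regions of an image.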
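To make the "histogram of gradient orientations over a local region" concrete, here is a minimal sketch for a single cell in plain C++ (no OpenCV). The function name `cellHistogram`, the 9 bins of 20 degrees each, and the hard assignment of each pixel to one bin are assumptions made for illustration; real HOG implementations typically interpolate votes between neighboring bins and normalize over blocks.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Build a 9-bin histogram of unsigned gradient orientations (0..180 degrees)
// for one grayscale cell stored row-major in `pixels` (width * height values).
// Each interior pixel votes with its gradient magnitude; border pixels are skipped.
std::array<double, 9> cellHistogram(const std::vector<float>& pixels, int width, int height)
{
    assert(static_cast<int>(pixels.size()) == width * height);
    const double kPi = 3.14159265358979323846;
    std::array<double, 9> hist{};  // all bins start at zero
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            // Centered differences, i.e. the [-1 0 1] filter in x and y.
            double gx = pixels[y * width + (x + 1)] - pixels[y * width + (x - 1)];
            double gy = pixels[(y + 1) * width + x] - pixels[(y - 1) * width + x];
            double magnitude = std::sqrt(gx * gx + gy * gy);
            double angle = std::atan2(gy, gx) * 180.0 / kPi;  // in (-180, 180]
            if (angle < 0.0) angle += 180.0;                   // fold to unsigned orientation
            int bin = static_cast<int>(angle / 20.0) % 9;      // 180 degrees wraps to bin 0
            hist[bin] += magnitude;
        }
    }
    return hist;
}
```

For a cell whose brightness increases only from left to right, all votes land in bin 0 (orientation near 0 degrees); for a top-to-bottom ramp they land in bin 4 (orientation near 90 degrees).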
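SVM: A Support Vector Machine (SVM) is a widely used discriminative method. In machine learning it is a supervised learning model, typically used for pattern recognition, classification, and regression analysis. In pedestrian detection it serves as the classifier that separates pedestrians from non-pedestrians.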
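For a linear SVM, the classifier reduces to a weight vector w and an offset rho, and a feature vector x is scored as w . x - rho. The sketch below shows that decision rule in plain C++, with the weights and the negated offset packed into one vector, which is the same layout the get_svm_detector function later in this post produces. The names `packDetector`, `score`, and `isPedestrian`, and the threshold of 0, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack the hyperplane weights w and the offset rho into one vector:
// [w_0, ..., w_{n-1}, -rho].
std::vector<float> packDetector(const std::vector<float>& w, double rho)
{
    std::vector<float> detector(w);
    detector.push_back(static_cast<float>(-rho));
    return detector;
}

// Score of feature vector x under the packed detector: w . x - rho.
double score(const std::vector<float>& detector, const std::vector<float>& x)
{
    assert(detector.size() == x.size() + 1);
    double s = detector.back();  // the stored -rho term
    for (std::size_t i = 0; i < x.size(); ++i)
        s += static_cast<double>(detector[i]) * x[i];
    return s;
}

// Assumed convention: a positive score means "pedestrian".
bool isPedestrian(const std::vector<float>& detector, const std::vector<float>& x)
{
    return score(detector, x) > 0.0;
}
```

Which side of the hyperplane counts as "pedestrian" depends on how the labels were assigned during training, so the sign convention here should be read as an assumption.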
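When HOG + SVM is used for pedestrian detection, the idea behind HOG feature extraction is that the appearance and shape of a local object in an image can be described well by the distribution of gradient or edge directions. We collect histograms of gradient or edge orientations over the pixels of the image, and the information in these histograms describes the features of the picture. Fortunately, OpenCV already provides a method for computing HOG features, and the resulting HOG feature vectors are passed to the SVM for classification. Put simply, an SVM is a classifier, so pedestrian detection becomes a two-class problem: pedestrian versus non-pedestrian. OpenCV provides an SVM whose parameters can be chosen by grid search. We extract HOG features from the collected positive samples (pedestrians) and negative samples (non-pedestrians, such as cars, trees, and street lamps), train the SVM classifier on them to obtain a pedestrian detection model, and then use that model to detect pedestrians.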
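The length of the HOG feature vector handed to the SVM follows directly from the window geometry. The sketch below computes it in plain C++; the helper name `hogDescriptorLength` is hypothetical. The values in the usage note (16*16 blocks, 8*8 block stride, 8*8 cells, 9 bins) are the defaults of OpenCV's HOGDescriptor, which the code in this post leaves unchanged apart from the window size.

```cpp
#include <cassert>

// Number of floats in the HOG descriptor of one detection window.
// Blocks slide over the window with the given stride; each block holds
// (blockSize / cellSize)^2 cells, and each cell contributes `bins` values.
int hogDescriptorLength(int winW, int winH, int blockSize, int blockStride,
                        int cellSize, int bins)
{
    assert((winW - blockSize) % blockStride == 0);
    assert((winH - blockSize) % blockStride == 0);
    assert(blockSize % cellSize == 0);
    int blocksX = (winW - blockSize) / blockStride + 1;
    int blocksY = (winH - blockSize) / blockStride + 1;
    int cellsPerBlock = (blockSize / cellSize) * (blockSize / cellSize);
    return blocksX * blocksY * cellsPerBlock * bins;
}
```

For the 64*128 window used here, `hogDescriptorLength(64, 128, 16, 8, 8, 9)` gives 7 * 15 * 4 * 9 = 3780, so each training sample becomes a 3780-dimensional vector.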
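2. Training and testing
The procedure consists of the following steps:
1. Prepare the training sample sets: a positive sample set and a negative sample set.
2. Process the training samples: crop samples according to the sample sets and image information provided by the INRIA Person Database (from the French National Institute for Research in Computer Science and Automation). See the ImgCut() function in the Tools class. The main purpose of this step is to normalize the samples to a single scale (64*128).
3. Extract HOG features from the positive samples.
4. Extract HOG features from the negative samples.
5. Label the samples: +1 for positive samples and -1 for negative samples.
6. Feed the HOG features and labels of the positive and negative samples into the SVM for training.
7. After training, the result is saved in the Pedestrain.xml file (spelled as in the code).
8. Load the resulting Pedestrain.xml file to obtain the pedestrian detector and run detection. Pedestrian detection is mainly applied in intelligent transportation. The program can process video as well as still images, and the results are displayed visually. The video data used here comes from the Caltech Pedestrian Detection Benchmark.
9. Handle hard examples. A hard example is a window in an original negative image (which contains no people) where the classifier from the first round of training reports a pedestrian. These false positives make the detector less accurate, so the falsely detected rectangles are saved as new negative samples and the classifier is trained a second time with them.
Note: the training and test datasets are listed in the appendix; the results are in the attachment.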
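The hard-example step (step 9) and the labeling step (step 5) can be sketched in plain C++ as follows. This is only an illustration under simplifying assumptions: the classifier is a linear score w . x + b, each window is already reduced to a feature vector, and the names `linearScore`, `mineHardNegatives`, and `buildLabels` are hypothetical rather than taken from the code in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Feature;

// Linear score w . x + b of one feature vector.
double linearScore(const Feature& w, double b, const Feature& x)
{
    assert(w.size() == x.size());
    double s = b;
    for (std::size_t i = 0; i < w.size(); ++i)
        s += static_cast<double>(w[i]) * x[i];
    return s;
}

// Return the features of person-free windows that the current model still
// scores above `threshold`. These false positives are the hard examples.
std::vector<Feature> mineHardNegatives(const Feature& w, double b,
                                       const std::vector<Feature>& negativeWindows,
                                       double threshold = 0.0)
{
    std::vector<Feature> hard;
    for (const Feature& x : negativeWindows)
        if (linearScore(w, b, x) > threshold) hard.push_back(x);
    return hard;
}

// Labels for a training set that lists all positives first: +1, then -1.
std::vector<int> buildLabels(std::size_t numPositives, std::size_t numNegatives)
{
    std::vector<int> labels(numPositives, +1);
    labels.insert(labels.end(), numNegatives, -1);
    return labels;
}
```

For the second round, the mined features would be appended to the negative set, the labels rebuilt with the new negative count, and the classifier trained again on the enlarged set.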
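3. Strengths and weaknesses of the model and results
(1) This HOG + SVM pedestrian detection model performs well in the following cases:
1. Pedestrians with distinct features are detected reliably.
2. Detection is reliable when there is a single target.
3. Detection is reliable when there is little interference.
(2) Weaknesses:
Detection is poor on images with crowds and on images where the person's features are not distinct.
(3) Ways to address the weaknesses:
1. Train with more samples.
2. Retrain on the hard examples that were misclassified.
4. Problems found while building the model, and how to improve
Pedestrian detection with HOG + SVM is time-consuming. Training twice on 2416 positive samples and 12180 negative samples took 595.919 seconds in total.
One possible improvement is CPU parallelization that keeps every CPU core fully loaded. The work can be divided into three threads:
1. Image preprocessing.
2. HOG feature extraction.
3. SVM training.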
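As a rough illustration of the CPU parallelization idea, the sketch below spreads a per-image job across cores with an OpenMP `parallel for`. Note that this is a different split from the three stage threads listed above: it parallelizes across images within one stage, on the assumption that per-image work such as feature extraction dominates the run time. `processOne` is a hypothetical stand-in for that work, not a function from the code in this post.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive per-image job such as HOG
// extraction: here it just returns the sum of squared pixel values.
double processOne(const std::vector<float>& image)
{
    double sum = 0.0;
    for (float v : image) sum += static_cast<double>(v) * v;
    return sum;
}

// Process every image and return one result per image, in input order.
std::vector<double> processAll(const std::vector<std::vector<float> >& images)
{
    // A signed int loop index keeps the loop valid for older OpenMP versions.
    assert(images.size() <= static_cast<std::size_t>(INT_MAX));
    std::vector<double> results(images.size());
    const int n = static_cast<int>(images.size());
    // Each iteration writes only results[i], so iterations are independent
    // and can safely run on different cores.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        results[i] = processOne(images[i]);
    return results;
}
```

The pragma only takes effect when OpenMP is enabled in the compiler (for example `-fopenmp` with GCC or Clang, `/openmp` with Visual C++); otherwise it is ignored and the loop runs serially with the same results.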
5. Pedestrian detection code with OpenCV (C++)
1. Tools class (image processing)
#include "opencv2/core/core.hpp"
#include "opencv2/objdetect.hpp"
#include "opencv2/core/cuda.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/video/video.hpp"
#include "opencv2/ml.hpp"
#include "opencv2/opencv.hpp"
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/imgcodecs.hpp>
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#include <math.h>
#include <iostream>
#include <vector>
#include <fstream>
#include <cstdlib>
#include <cstdio>
using namespace std;
using namespace cv;
using namespace cv::ml;
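class Tools
{
public:
Tools() {}
int CropImageCount = 0; // number of negative sample images cropped so far
void ImgCut()
{
Mat src;
string ImgName;
char saveName[256]; // file name of each cropped negative sample
ifstream fin("E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\Train\\neg.lst"); // open the list of original negative images
srand((unsigned int)time(NULL)); // seed the random number generator once with the current system time; reseeding per image would repeat the same crop positions within one second
// read the file list line by line
while (getline(fin, ImgName))
{
cout << "Processing: " << ImgName << endl;
ImgName = "E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\" + ImgName;
src = imread(ImgName, 1); // read the image as 3-channel color
if (src.empty()) continue; // skip files that could not be read
//cout<<"width:"<<src.cols<<", height:"<<src.rows<<endl;
// the image must be large enough to contain at least one 64*128 window
if (src.cols >= 64 && src.rows >= 128)
{
// randomly crop ten 64*128 person-free negative samples from each image
for (int i = 0; i<10; i++)
{
int x = (rand() % (src.cols - 64 + 1)); // x of the top-left corner; +1 avoids a modulo by zero when cols == 64
int y = (rand() % (src.rows - 128 + 1)); // y of the top-left corner; +1 avoids a modulo by zero when rows == 128
//cout<<x<<","<<y<<endl;
Mat imgROI = src(Rect(x, y, 64, 128));
sprintf(saveName, "E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\negphoto\\noperson%06d.jpg", ++CropImageCount); // build the file name of the cropped negative sample
imwrite(saveName, imgROI); // save the file
}
}
}
}
~Tools() {}
};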
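The cropping logic in ImgCut() boils down to picking a random top-left corner such that the window still fits inside the image. A minimal sketch of that step using the C++ `<random>` library instead of rand() is shown below; `CropBox` and `randomCrop` are hypothetical names, and the caller is assumed to own and seed the generator once.

```cpp
#include <cassert>
#include <random>

// Axis-aligned crop rectangle: top-left corner plus size.
struct CropBox { int x; int y; int width; int height; };

// Pick a uniformly random top-left corner so that a winW x winH box lies
// entirely inside an imgW x imgH image. The image may be exactly the
// window size, in which case the only valid corner is (0, 0).
CropBox randomCrop(int imgW, int imgH, int winW, int winH, std::mt19937& rng)
{
    assert(imgW >= winW && imgH >= winH);
    std::uniform_int_distribution<int> distX(0, imgW - winW);  // bounds are inclusive
    std::uniform_int_distribution<int> distY(0, imgH - winH);
    CropBox box;
    box.x = distX(rng);
    box.y = distY(rng);
    box.width = winW;
    box.height = winH;
    return box;
}
```

Because the distribution bounds are inclusive, the case where the image is exactly 64*128 needs no special handling, unlike a modulo on the size difference.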
2. _Pedestrain class (pedestrian detection)
It mainly consists of the following functions:
(1) Build the HOG detector vector from the trained SVM
void get_svm_detector(const Ptr< SVM >& svm, vector< float > & hog_detector)
(2) Convert the training samples into the matrix format used by OpenCV's machine learning algorithms
void convert_to_ml(const vector< Mat > & train_samples, Mat& trainData)
(3) Load the image samples from a directory
void load_images(const String & dirname, vector< Mat > & img_lst, bool showImages = false)
(4) Compute HOG features
void computeHOGs(const Size wsize, const vector< Mat > & img_lst, vector< Mat > & gradient_lst)
(5) Test the trained classifier
int test_trained_detector(String obj_det_filename, String test_dir, String videofilename)
#include "opencv2/core/core.hpp"
#include "opencv2/objdetect.hpp"
#include "opencv2/core/cuda.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/video/video.hpp"
#include "opencv2/ml.hpp"
#include "opencv2/opencv.hpp"
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/imgcodecs.hpp>
#include <stdlib.h>
#include <time.h>
#include <algorithm>
#include <math.h>
#include <iostream>
#include <vector>
#include <fstream>
#include <cstdlib>
#include <cstring>
using namespace std;
using namespace cv;
using namespace cv::ml;
class _Pedestrain
{
public:
_Pedestrain() {}
// member functions
void get_svm_detector(const Ptr< SVM >& svm, vector< float > & hog_detector)
{
// get the support vectors
Mat sv = svm->getSupportVectors();
const int sv_total = sv.rows;
// get the decision function
Mat alpha, svidx;
double rho = svm->getDecisionFunction(0, alpha, svidx);
CV_Assert(alpha.total() == 1 && svidx.total() == 1 && sv_total == 1); // CV_Assert raises an error when the condition in parentheses is false
CV_Assert((alpha.type() == CV_64F && alpha.at<double>(0) == 1.) ||
(alpha.type() == CV_32F && alpha.at<float>(0) == 1.f));
CV_Assert(sv.type() == CV_32F);
hog_detector.clear();
hog_detector.resize(sv.cols + 1);
memcpy(&hog_detector[0], sv.ptr(), sv.cols * sizeof(hog_detector[0])); // memcpy copies n bytes from the source address to the destination address; here it copies the sv.cols weights into hog_detector
hog_detector[sv.cols] = (float)-rho;
}
/*
* Convert training/testing set to be used by OpenCV Machine Learning algorithms.
* TrainData is a matrix of size (#samples x max(#cols,#rows) per samples), in 32FC1.
* Transposition of samples are made if needed.
*/
void convert_to_ml(const vector< Mat > & train_samples, Mat& trainData)
{
//--Convert data
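const int rows = (int)train_samples.size(); // number of rows = number of training samples
const int cols = (int)std::max(train_samples[0].cols, train_samples[0].rows); // number of columns = the larger of a sample's width and height (each sample is a 1-D feature vector, so this is its length)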
Mat tmp(1, cols, CV_32FC1); //< used for transposition if needed
trainData = Mat(rows, cols, CV_32FC1);
for (size_t i = 0; i < train_samples.size(); ++i)
{
CV_Assert(train_samples[i].cols == 1 || train_samples[i].rows == 1);
if (train_samples[i].cols == 1)
{
transpose(train_samples[i], tmp);
tmp.copyTo(trainData.row((int)i));
}
else if (train_samples[i].rows == 1)
{
train_samples[i].copyTo(trainData.row((int)i));
}
}
}
void load_images(const String & dirname, vector< Mat > & img_lst, bool showImages = false)
{ // load the image samples from a directory
vector< String > files;
glob(dirname, files); // fills files with the paths that match dirname
for (size_t i = 0; i < files.size(); ++i)
{
Mat img = imread(files[i]); // load the image
if (img.empty()) // invalid image, skip it.
{
cout << files[i] << " is invalid!" << endl;
continue;
}
if (showImages)
{
imshow("image", img);
waitKey(1);
}
img_lst.push_back(img); // append img to img_lst
}
}
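void sample_neg(const vector< Mat > & full_neg_lst, vector< Mat > & neg_lst, const Size & size)
{ // crops one random detector-sized (64*128) sample from each negative image; main does not call it because the negatives were already cropped beforehand
Rect box;
box.width = size.width; // equal to the detector width
box.height = size.height; // equal to the detector height
const int size_x = box.width;
const int size_y = box.height;
srand((unsigned int)time(NULL)); // seed the random number generator
for (size_t i = 0; i < full_neg_lst.size(); i++)
{ // crop each negative image at a random x, y to a sample of the detector size
if (full_neg_lst[i].cols < size_x || full_neg_lst[i].rows < size_y) continue; // skip images smaller than the detector window
box.x = rand() % (full_neg_lst[i].cols - size_x + 1); // +1 avoids a modulo by zero when the sizes are equal
box.y = rand() % (full_neg_lst[i].rows - size_y + 1);
Mat roi = full_neg_lst[i](box);
neg_lst.push_back(roi.clone());
}
}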
void computeHOGs(const Size wsize, const vector< Mat > & img_lst, vector< Mat > & gradient_lst)
{ // compute HOG features
HOGDescriptor hog;
hog.winSize = wsize;
Rect r = Rect(0, 0, wsize.width, wsize.height);
r.x += (img_lst[0].cols - r.width) / 2; // (sample image size - detector size) / 2, which centers the window in the image
r.y += (img_lst[0].rows - r.height) / 2;
Mat gray;
vector< float > descriptors;
for (size_t i = 0; i< img_lst.size(); i++)
{
cvtColor(img_lst[i](r), gray, COLOR_BGR2GRAY);
hog.compute(gray, descriptors, Size(8, 8), Size(0, 0)); // Size(8, 8) is the window stride, Size(0, 0) the padding
gradient_lst.push_back(Mat(descriptors).clone());
}
}
int test_trained_detector(String obj_det_filename, String test_dir, String videofilename)
{ // when videofilename is empty, only the images in test_dir are checked for pedestrians
cout << "Testing trained detector..." << endl;
HOGDescriptor hog;
hog.load(obj_det_filename);
vector< String > files;
glob(test_dir, files);
int delay = 0;
VideoCapture cap;
if (videofilename != "")
{
cap.open(videofilename);
}
obj_det_filename = "testing " + obj_det_filename;
namedWindow(obj_det_filename, WINDOW_NORMAL);
for (size_t i = 0;; i++)
{
Mat img;
if (cap.isOpened())
{
cap >> img;
delay = 1;
}
else if (i < files.size())
{
img = imread(files[i]);
}
if (img.empty())
{
return 0;
}
vector< Rect > detections;
vector< double > foundWeights;
hog.detectMultiScale(img, detections, foundWeights);
for (size_t j = 0; j < detections.size(); j++)
{
if (foundWeights[j] < 0.5) continue; // discard detection windows with low weights
Scalar color = Scalar(0, foundWeights[j] * foundWeights[j] * 200, 0);
rectangle(img, detections[j], color, img.cols / 400 + 1);
}
imshow(obj_det_filename, img);
if (27 == waitKey(delay))
{
return 0;
}
}
return 0;
}
int trainAndTest(int argc, char** argv, const char* keys)
{
CommandLineParser parser(argc, argv, keys); // command-line parser; it reads the entries in keys, each formatted as {name alias | default value | help text}
if (parser.has("help"))
{
parser.printMessage();
exit(0);
}
String pos_dir = parser.get< String >("pd"); // directory of positive samples
String neg_dir = parser.get< String >("nd"); // directory of negative samples
String test_dir = parser.get< String >("td"); // directory of test samples
String obj_det_filename = parser.get< String >("fn"); // file name of the trained SVM detector
String videofilename = parser.get< String >("tv"); // test video
int detector_width = parser.get< int >("dw"); // detector width
int detector_height = parser.get< int >("dh"); // detector height
bool test_detector = parser.get< bool >("t"); // test an already trained detector
bool train_twice = parser.get< bool >("d"); // train twice
bool visualization = parser.get< bool >("v"); // visualize the training process (false is recommended, otherwise every loaded image is displayed)
if (test_detector) // if true, run the detector on the test set and exit
{
test_trained_detector(obj_det_filename, test_dir, videofilename);
exit(0);
}
if (pos_dir.empty() || neg_dir.empty()) // both sample directories must be given
{
parser.printMessage();
cout << "Wrong number of parameters.\n\n"
<< "Example command line:\n" << argv[0] << " -pd=/INRIAPerson/96X160H96/Train/pos -nd=/INRIAPerson/neg -td=/INRIAPerson/Test/pos -fn=HOGpedestrian96x160.xml -d\n"
<< "\nExample command line for testing trained detector:\n" << argv[0] << " -t -dw=96 -dh=160 -fn=HOGpedestrian96x160.xml -td=/INRIAPerson/Test/pos";
exit(1);
}
vector< Mat > pos_lst, // positive sample images
full_neg_lst, // full-size negative images
neg_lst, // cropped negative samples
gradient_lst; // the HOG descriptors are stored here
vector< int > labels; // label vector
clog << "Positive images are being loaded...";
load_images(pos_dir, pos_lst, visualization); // load the positive images, which are 96*160
if (pos_lst.size() > 0)
{
clog << "...[done]" << endl;
}
else
{
clog << "no image in " << pos_dir << endl;
return 1;
}
Size pos_image_size = pos_lst[0].size(); // pos_image_size = size of the positive samples
cout <<"pis = " << pos_image_size << endl;
// check that all positive samples have the same size
for (size_t i = 0; i < pos_lst.size(); ++i)
{
if (pos_lst[i].size() != pos_image_size)
{
cout << "All positive images should be same size!" << endl;
exit(1);
}
}
pos_image_size = pos_image_size / 8 * 8;
// if a detector size was given, use it as pos_image_size
if (detector_width && detector_height)
{
pos_image_size = Size(detector_width, detector_height);
}
labels.assign(pos_lst.size(), +1); // fill labels with pos_lst.size() entries of +1, marking the positive samples
const unsigned int old = (unsigned int)labels.size(); // number of labels before the negatives are added
clog << "Negative images are being loaded...";
load_images(neg_dir, neg_lst, false); // load the negative sample images
//sample_neg(full_neg_lst, neg_lst, pos_image_size);
clog << "...[done]" << endl;
labels.insert(labels.end(), neg_lst.size(), -1); // append neg_lst.size() entries of -1, marking the negative samples
CV_Assert(old < labels.size()); // CV_Assert raises an error if the expression in parentheses is false
clog << "Histogram of Gradients are being calculated for positive images...";
computeHOGs(pos_image_size, pos_lst, gradient_lst); // compute the HOG features of the positive images
clog << "...[done]" << endl;
clog << "Histogram of Gradients are being calculated for negative images...";
computeHOGs(pos_image_size, neg_lst, gradient_lst); // compute the HOG features of the negative images
clog << "...[done]" << endl;
Mat train_data;
convert_to_ml(gradient_lst, train_data); // convert to the training data layout required by ml
clog << "Training SVM...";
Ptr< SVM > svm = SVM::create();
/* Default values to train SVM */
svm->setCoef0(0.0);
svm->setDegree(3);
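svm->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 1e-3)); // TermCriteria::MAX_ITER and EPS replace the legacy CV_TERMCRIT_* macros, which OpenCV 4 no longer exposes by default
svm->setGamma(0);
svm->setKernel(SVM::LINEAR); // use the linear kernel; other kernels such as SIGMOID and RBF can be set instead (the kernel type values range from 0 to 5)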
svm->setNu(0.5);
svm->setP(0.1); // for EPSILON_SVR, epsilon in loss function?
svm->setC(0.01); // From paper, soft classifier
svm->setType(SVM::EPS_SVR); // C_SVC; // EPSILON_SVR; // may be also NU_SVR; // do regression task
svm->train(train_data, ROW_SAMPLE, Mat(labels));
clog << "...[done]" << endl;
// train twice
if (train_twice)
{
clog << "Testing trained detector on negative images. This may take a few minutes...";
HOGDescriptor my_hog;
my_hog.winSize = pos_image_size;
// Set the trained svm to my_hog
vector< float > hog_detector;
get_svm_detector(svm, hog_detector);
my_hog.setSVMDetector(hog_detector);
vector< Rect > detections;
vector< double > foundWeights;
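// NOTE: full_neg_lst is never filled in this version (the pre-cropped negatives are loaded straight into neg_lst),
// so this loop finds no hard examples unless the original full-size negative images are first loaded into full_neg_lst
for (size_t i = 0; i < full_neg_lst.size(); i++)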
{
my_hog.detectMultiScale(full_neg_lst[i], detections, foundWeights);
for (size_t j = 0; j < detections.size(); j++)
{
Mat detection = full_neg_lst[i](detections[j]).clone();
resize(detection, detection, pos_image_size);
neg_lst.push_back(detection);
}
if (visualization)
{
for (size_t j = 0; j < detections.size(); j++)
{
rectangle(full_neg_lst[i], detections[j], Scalar(0, 255, 0), 2);
}
imshow("testing trained detector on negative images", full_neg_lst[i]);
waitKey(5);
}
}
clog << "...[done]" << endl;
labels.clear();
labels.assign(pos_lst.size(), +1);
labels.insert(labels.end(), neg_lst.size(), -1);
gradient_lst.clear();
clog << "Histogram of Gradients are being calculated for positive images...";
computeHOGs(pos_image_size, pos_lst, gradient_lst);
clog << "...[done]" << endl;
clog << "Histogram of Gradients are being calculated for negative images...";
computeHOGs(pos_image_size, neg_lst, gradient_lst);
clog << "...[done]" << endl;
clog << "Training SVM again...";
convert_to_ml(gradient_lst, train_data);
svm->train(train_data, ROW_SAMPLE, Mat(labels));
clog << "...[done]" << endl;
}
//-------------------------------------------------------------------
vector< float > hog_detector; // detector coefficients for the HOG descriptor
get_svm_detector(svm, hog_detector); // extract the detector from the trained SVM
HOGDescriptor hog;
hog.winSize = pos_image_size; // detection window size
hog.setSVMDetector(hog_detector);
hog.save(obj_det_filename); // save the detector
test_trained_detector(obj_det_filename, test_dir, videofilename); // run the detector on the test set
return 0;
}
~_Pedestrain() {}
};
3. Main program
#include "Tools.h"
#include "_Pedestrain.h"
using namespace std;
using namespace cv;
using namespace cv::ml;
//video_dir | J:\\Download\\SEQ\\set01\\V003.seq
const char* keys =
{
"{help h| | show help message}"
"{pd | E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\96X160H96\\Train\\pos | path of the directory containing positive images}"
"{nd | E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\negphoto | path of the directory containing negative images}"
"{td | E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\Test\\pos | path of the directory containing test images}"
"{tv | | test video file name}"
"{dw | 64 | width of the detector}"
"{dh | 128 | height of the detector}"
"{d |true| train twice}"
"{t |false| test a trained detector}"
"{v |false| visualize training steps}"
"{fn |E:\\Pedestrain.xml| file name of trained SVM}"
};
string obj_det_filename = "E:\\Pedestrain.xml";
string test_dir = "E:\\Program\\Vstudio\\OpenCv\\picture\\INRIAPerson\\Test\\pos";
string vediofilename = "";
int main(int argc, char** argv)
{
// data preprocessing
//Tools tool;
//tool.ImgCut();
//cout << tool.CropImageCount << endl;
// train and test
_Pedestrain pt;
//pt.trainAndTest(argc, argv, keys);
// test only
pt.test_trained_detector(obj_det_filename, test_dir, vediofilename);
return 0;
}
6. Appendix
1. INRIA Person DataBase
http://pascal.inrialpes.fr/data/human/
2. Caltech Pedestrian Detection Benchmark
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
7. References
1. http://blog.csdn.net/qianqing13579/article/details/46509037
2. http://blog.csdn.net/k87974/article/details/78583501?locationNum=8&fps=1
3. http://blog.sina.com.cn/s/blog_844b767a0102wqfh.html
4. https://www.zhihu.com/question/27662700?from=profile_question_card
5. http://blog.csdn.net/zouxy09/article/details/7929348
6. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV