Object Detection with Deep Neural Networks, Part 1: Abstract
Note: This article is taken from my master's thesis; reproduction without permission is strictly prohibited. For the original, see CNKI (知網): the thesis download page on CNKI.
Abstract
With the rapid improvement in computer performance, the long-dormant family of deep learning algorithms has finally entered a period of rapid development. Object recognition (also called object detection) is one of the most valuable research directions in computer vision. This thesis studies the application of convolutional neural networks to object recognition in general scenes; more specifically, object recognition here refers to recognizing road-condition information while driving, including pedestrians, passing vehicles, and traffic signals.
Traditional object recognition methods consist of three steps: first, generate region proposals on the original image; then extract features from these proposals; finally, classify the objects inside the boxes and perform bounding-box regression. Each step has problems. The near-exhaustive proposal-generation strategy directly hurts detection speed and accuracy and introduces redundant computation; hand-crafted image features cannot guarantee feature quality; and classifying features with traditional machine-learning methods is slow. More importantly, the three steps are completely separate, which rules out real-time detection.
To address these three problems, this thesis uses neural network algorithms to break through each difficulty. First, for the difficulty of hand-crafting image features, the thesis implements a convolutional neural network improved from the densely connected network (DenseNet), which automatically extracts high-quality deep features and replaces hand-crafted ones. Second, for the slowness of traditional classifiers, the thesis uses a Softmax classifier for prediction; since it combines naturally with a convolutional network, the second and third steps can be merged into a single network, greatly improving detection speed and accuracy. Third, for the proposal-generation strategy, the thesis abandons generating proposals directly on the original image; instead, a neural network first extracts image features, and proposals are then generated on the feature maps, which is both accurate and efficient. Finally, the thesis applies these solutions to the SSD detection method, producing an improved SSD algorithm. In testing, the improved SSD method shows clear gains in both detection speed and accuracy, and it performs better on small objects. The improved SSD method merges the three detection stages into one network, achieving true end-to-end real-time detection.
Finally, based on the improved SSD detection algorithm, this thesis implements a real-time object recognition system. Thanks to the algorithm's speed and accuracy, the system achieves stable real-time detection.
Keywords: object recognition, convolutional neural network, dense connections, real-time
ABSTRACT
With the rapid improvement of computer performance, deep learning algorithms, dormant for many years, have finally entered a period of rapid development. Object recognition (also called object detection) is one of the most valuable research directions in the field of computer vision. This dissertation focuses on applying convolutional neural network algorithms to object recognition in general scenes. More specifically, object recognition here refers to the recognition of road-condition information (including pedestrians, passing vehicles, traffic lights, etc.) while driving.
Traditional object detection methods are divided into three steps: first, region proposals are generated on the original image; then features are extracted from these proposals; finally, the objects in the boxes are classified and bounding-box regression is performed. Each step has problems: the near-exhaustive proposal-generation strategy directly affects detection speed and accuracy and adds redundant computation; manually designed image features cannot guarantee feature quality; and classification with traditional machine-learning methods is slow. More importantly, the three steps are completely separate, so real-time detection is impossible.
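The three-stage pipeline above can be sketched in a few lines. This is my own minimal illustration, not the thesis code: the proposal generator is a plain sliding window, the "feature" is an intensity histogram standing in for hand-crafted descriptors such as HOG, and the "classifier" is a fixed linear scorer standing in for an SVM. It makes the redundancy problem concrete: even a coarse stride over a small image already yields dozens of boxes, each processed independently.

```python
import numpy as np

def sliding_window_proposals(h, w, box=64, stride=32):
    """Near-exhaustive proposal boxes (y, x, size) over an h x w image."""
    return [(y, x, box)
            for y in range(0, h - box + 1, stride)
            for x in range(0, w - box + 1, stride)]

def handcrafted_feature(patch, bins=16):
    """Stand-in for a hand-crafted descriptor: a normalized histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def classify(feature):
    """Stand-in for a traditional classifier: a fixed linear scorer."""
    w_vec = np.linspace(-1.0, 1.0, feature.size)  # hypothetical weights
    return float(feature @ w_vec)

image = np.random.rand(256, 256)
proposals = sliding_window_proposals(*image.shape)
scores = [classify(handcrafted_feature(image[y:y + s, x:x + s]))
          for (y, x, s) in proposals]
print(len(proposals))  # 49 boxes even for a 256x256 image at stride 32
```

Each stage here runs in isolation, which is exactly the separation the dissertation sets out to remove.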
This dissertation addresses the three problems above with neural network algorithms. First, for the difficulty of manually extracting image features, the paper implements a convolutional neural network improved from the densely connected network (DenseNet), which automatically extracts high-quality deep features and replaces hand-crafted ones. Second, for the slowness of traditional classifiers, the paper uses a Softmax classifier for prediction; because it combines naturally with a convolutional network, the second and third steps can be merged into one network, greatly improving detection speed and accuracy. Third, for the proposal-generation strategy, the paper abandons generating proposals directly on the original image; instead, a neural network first extracts image features, and proposals are then generated on the feature maps, which is both accurate and efficient. Finally, the paper applies these solutions to the SSD detection method to produce an improved SSD detection algorithm. In testing, the improved SSD method shows significantly better detection speed and accuracy, and it performs better on small targets. The improved SSD method merges the three detection stages into the same network and truly enables end-to-end real-time detection.
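The two key ideas in this paragraph can be sketched together: DenseNet-style dense connectivity, where each layer receives the concatenation of all earlier feature maps, and a Softmax head applied directly on the feature map, so classification lives inside the same network as feature extraction. This is an illustrative toy (my assumption of the structure, not the thesis implementation): the "convolution" is a random 1x1 channel-mixing projection, and the class count of 4 is hypothetical.

```python
import numpy as np

def conv1x1(x, out_ch, rng):
    """Placeholder 1x1 convolution: a random channel projection + ReLU."""
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.1
    return np.maximum(x @ w, 0.0)

def dense_block(x, growth, layers, rng):
    """DenseNet-style block: each layer's output is concatenated onto
    the running feature map, so later layers see all earlier features."""
    for _ in range(layers):
        y = conv1x1(x, growth, rng)
        x = np.concatenate([x, y], axis=-1)  # dense connection
    return x

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))           # 8x8 feature map, 16 channels
feat = dense_block(feat, growth=12, layers=3, rng=rng)
num_classes = 4                                  # hypothetical class count
logits = conv1x1(feat, num_classes, rng)         # classification head
probs = softmax(logits)                          # per-location class scores
print(feat.shape, probs.shape)                   # (8, 8, 52) (8, 8, 4)
```

Because the Softmax head is just another layer over the feature map, classification is produced at every spatial location in one forward pass, which is what lets steps two and three collapse into a single network.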
Finally, this dissertation implements a real-time object recognition (road-condition information recognition) system based on the improved SSD detection algorithm. Thanks to the algorithm's fast detection speed and high accuracy, the system achieves stable real-time detection.
Keywords: object recognition, convolutional neural networks, dense connections, real-time