2017-CVPR-Spindle Net: Person Re-identification with Human Body Region Guided Feature

阿新 • Published: 2018-11-22

Reposted from: https://blog.csdn.net/weixin_41427758/article/details/82910295

Paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Spindle_Net_Person_CVPR_2017_paper.pdf

Motivation

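Body misalignment between pedestrian images, caused by detection errors and pose variation, severely harms feature matching across images --> how can this be solved?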

Contribution

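First work to consider human body structure information in ReID:
- helps align body-region features across different images
- strengthens the representation of local detail
SpindleNet
- a multi-stage ROI pooling framework --> features at different semantic levels are extracted at different stages
- a tree-structured fusion network + competitive strategy --> merges features of different semantic levels
SenseReID, a ReID dataset from real surveillance scenes, is introduced to evaluate performance; the method reaches SOTA results on most of the datasets tested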

1. Introduction

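Definition and uses of ReID
- retrieving a person across cameras or across time segments
- mainly used in security scenarios

Common challenges in ReID:
- body misalignment between images, caused by detection errors and pose variation, as in figure (a) above
- capturing discriminative detail: in figure (b), the head region is the more discriminative cue for the two images
- occlusion: how to lower the importance of features from occluded regions during comparison
Addressed with: a tree-structured feature fusion strategy + a competitive strategy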

2. Related Work

Feature learning
Metric learning
Video based

3. Body Region Proposal Network

The Region Proposal Network (RPN) produces 7 body regions in two steps:
- keypoint localization
- body region generation

Step 1: locate 14 keypoints in the input image

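Borrowing from CPM (Convolutional Pose Machines), a fully convolutional network generates response maps coarse-to-fine in a sequential framework --> 14 response maps
- at each stage, the CNN extracts features and combines them with the previous stage's response maps to refine the estimated keypoint locations
- CPM is modified to reduce its complexity:
  - the first few convolutional layers share parameters
  - pooling layers are replaced by stride-2 convolutions
  - the input size, the number of stages, and the channel counts of the convolutional layers are reduced
The 14 joints are obtained by taking the location of the maximum value in each response map: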
$$P_i = [x_i, y_i] = \arg\max_{x \in [1, X],\; y \in [1, Y]} F_i(x, y)$$
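
The argmax over each response map can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `locate_keypoints`, the `(K, Y, X)` array layout, and the toy maps are assumptions made for the example.

```python
import numpy as np

def locate_keypoints(response_maps):
    """Return the (x, y) peak position of every response map.

    response_maps: array of shape (K, Y, X), one map F_i per keypoint
    (this layout is an assumption for the sketch).
    Coordinates are 1-based to match x in [1, X], y in [1, Y].
    """
    k, height, width = response_maps.shape
    # argmax over the flattened map, then convert back to (row, column).
    flat_idx = response_maps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs + 1, ys + 1], axis=1)

# Toy input: 14 maps of size 6 x 4, each with a single known peak.
maps = np.zeros((14, 6, 4))
for i in range(14):
    maps[i, i % 6, i % 4] = 1.0
keypoints = locate_keypoints(maps)
```

If several positions tie for the maximum, `argmax` returns the first one in row-major order.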

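Step 2: generate 7 body regions

From the 14 joints, 3 macro regions (head-shoulder, upper body, lower body) and 4 micro regions (two legs, two arms) are generated; see the figure above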
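
One plausible way to turn joints into regions is to take the tight bounding box around a subset of joints. The sketch below assumes that approach; the joint index groups in `REGION_JOINTS` are hypothetical, since these notes do not list which joints belong to which region.

```python
import numpy as np

# Hypothetical joint groups: the notes name the 7 regions but not their joints.
REGION_JOINTS = {
    "head_shoulder": [0, 1, 2, 5],
    "upper_body": [2, 3, 4, 5, 6, 7, 8, 11],
    "lower_body": [8, 9, 10, 11, 12, 13],
    "left_arm": [5, 6, 7],
    "right_arm": [2, 3, 4],
    "left_leg": [11, 12, 13],
    "right_leg": [8, 9, 10],
}

def region_boxes(keypoints, region_joints=REGION_JOINTS):
    """Map each region to the tight box (x_min, y_min, x_max, y_max) around its joints."""
    boxes = {}
    for name, idx in region_joints.items():
        pts = keypoints[idx]  # (n_joints, 2) array of (x, y)
        x_min, y_min = pts.min(axis=0)
        x_max, y_max = pts.max(axis=0)
        boxes[name] = (int(x_min), int(y_min), int(x_max), int(y_max))
    return boxes

# Toy keypoints laid out on a diagonal, only to exercise the function.
keypoints = np.array([[10 + i, 5 + 3 * i] for i in range(14)])
boxes = region_boxes(keypoints)
```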
RPN training:
- the MPII human pose dataset
- a Gaussian kernel
- loss function: L2 distance
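
A rough sketch of how a Gaussian kernel and an L2 loss usually fit together in keypoint training: the target response map is a Gaussian bump centered on the annotated joint, and the loss compares predicted and target maps. The value of `sigma`, the use of the summed squared difference, and both function names are assumptions of this example, not details taken from the notes.

```python
import numpy as np

def gaussian_target(height, width, center_xy, sigma=1.5):
    """Target response map: a Gaussian peak at the annotated joint (0-based x, y)."""
    cx, cy = center_xy
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def l2_loss(predicted, target):
    """Sum of squared differences between predicted and target maps."""
    return float(np.sum((predicted - target) ** 2))

target = gaussian_target(height=8, width=6, center_xy=(2, 5))
perfect = l2_loss(target, target)              # prediction equals target
blank = l2_loss(np.zeros_like(target), target)  # all-zero prediction
```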

4. Body Region Guided Spindle Net

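Two main parts:
- the Feature Extraction Network (FEN): takes the pedestrian image and the region proposals as input ==> computes the global feature and the sub-region features
- the Feature Fusion Network (FFN): merges the feature vectors of the different regions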

4.1. Feature Extraction Network (FEN)

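FEN consists of three convolution stages (FEN-C1, FEN-C2, FEN-C3) and two ROI pooling stages (FEN-P1, FEN-P2):
- the full image and each of the 7 body sub-regions yield a 256-dimensional vector
- sub-region features are cropped from the full-image feature maps at different stages
- after FEN-C3, one global pooling layer and one inner product layer convert the output into a 256-dimensional vector
The figure below shows the effectiveness of the sub-region features

(b) and (e) are global features after FEN-C1; distances computed from them are large for misaligned images of the same person and small for two different but similar-looking people
(c) and (f) are features after FEN-P1; with them, the distance between misaligned images of the same person shrinks and the distance between similar-looking people grows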
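
Cropping a sub-region from the full-image feature map and pooling it to a fixed size can be sketched as below. This is a plain NumPy max-pooling version written for illustration; the function name `roi_max_pool`, the inclusive box convention, and the 2 x 2 output size are assumptions, and a real network would also scale image-space boxes by the feature-map stride first.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=(2, 2)):
    """Crop box = (x0, y0, x1, y1), inclusive, and max-pool it to out_size.

    feature_map: array of shape (C, H, W); box is in feature-map coordinates.
    """
    x0, y0, x1, y1 = box
    crop = feature_map[:, y0:y1 + 1, x0:x1 + 1]
    out_h, out_w = out_size
    channels, crop_h, crop_w = crop.shape
    # Bin edges that split the crop into out_h x out_w roughly equal cells.
    y_edges = np.linspace(0, crop_h, out_h + 1).astype(int)
    x_edges = np.linspace(0, crop_w, out_w + 1).astype(int)
    pooled = np.empty((channels, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Guarantee every cell covers at least one position.
            ya, yb = y_edges[i], max(y_edges[i + 1], y_edges[i] + 1)
            xa, xb = x_edges[j], max(x_edges[j + 1], x_edges[j] + 1)
            pooled[:, i, j] = crop[:, ya:yb, xa:xb].max(axis=(1, 2))
    return pooled

feature_map = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
pooled = roi_max_pool(feature_map, box=(1, 2, 4, 5))
```

Because every region is pooled to the same spatial size, the later layers can treat all 7 sub-regions uniformly regardless of their original box size.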

4.2. Feature Fusion Network (FFN)

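FFN: merges the 8 feature vectors into a single 256-dimensional vector that describes the pedestrian image well
Fusion unit: performs the feature fusion; the input is two or more feature vectors of the same size, the output is the merged feature vector
- the feature competition and selection process: element-wise maximization operation
- the feature transformation process: an inner product layer ==> the fully connected layer in Caffe
A tree-structured fusion strategy
- feature vectors are merged at different stages according to the semantic level and relations of the sub-regions
- two legs, two arms --> leg result + lower body, arm result + upper body --> previous-stage results + head-shoulder --> concatenated with the full-image feature and transformed into a 256-dimensional vector
Fusion of head-shoulder, upper body, and lower body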
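
The fusion unit and the tree order described above can be sketched as follows. This is one reading of the notes, not the authors' implementation: the layers are randomly initialized stand-ins for trained inner product layers, and the dictionary keys, layer names, and the choice of two separate fusion units in the second stage are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 256
REGIONS = ["full", "head_shoulder", "upper_body", "lower_body",
           "left_arm", "right_arm", "left_leg", "right_leg"]

def make_linear(in_dim, out_dim):
    """Random (weights, bias) pair standing in for a trained inner product layer."""
    return rng.normal(scale=0.01, size=(out_dim, in_dim)), np.zeros(out_dim)

def fusion_unit(vectors, layer):
    """Element-wise max over same-sized inputs, then an inner product layer."""
    competed = np.maximum.reduce(vectors)
    weights, bias = layer
    return weights @ competed + bias

def tree_fusion(feats, layers):
    # Stage 1: fuse the micro regions pairwise.
    legs = fusion_unit([feats["left_leg"], feats["right_leg"]], layers["legs"])
    arms = fusion_unit([feats["left_arm"], feats["right_arm"]], layers["arms"])
    # Stage 2: merge each micro result with its macro region.
    lower = fusion_unit([legs, feats["lower_body"]], layers["lower"])
    upper = fusion_unit([arms, feats["upper_body"]], layers["upper"])
    # Stage 3: merge the stage-2 results with head-shoulder.
    body = fusion_unit([lower, upper, feats["head_shoulder"]], layers["body"])
    # Stage 4: concatenate with the full-image feature, project to DIM.
    weights, bias = layers["final"]
    return weights @ np.concatenate([body, feats["full"]]) + bias

feats = {name: rng.normal(size=DIM) for name in REGIONS}
layers = {k: make_linear(DIM, DIM) for k in ["legs", "arms", "lower", "upper", "body"]}
layers["final"] = make_linear(2 * DIM, DIM)
descriptor = tree_fusion(feats, layers)
```

The element-wise max lets the stronger response of two regions win in each dimension, which fits the notes' point about reducing the influence of occluded regions.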

4.3. Training Details

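Progressive strategy: train FEN first, then FFN; all weights are randomly initialized
FEN training steps:
- first train with the full image as input
- fix the FEN-C1 parameters and train the three macro branches
- fix FEN-C1 and FEN-C2 and train the four micro branches
FFN is trained on the feature vectors produced by FEN, with a Softmax loss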
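
The FEN schedule above can be written down as a small lookup of which stages are frozen at each step. This is only a bookkeeping sketch: the step names and the helper `is_frozen` are hypothetical, and the stage names FEN-C1 and FEN-C2 are the only identifiers taken from the notes.

```python
# Each entry: (step name, set of convolution stages kept fixed during that step).
FEN_STEPS = [
    ("full_image", set()),
    ("macro_branches", {"FEN-C1"}),
    ("micro_branches", {"FEN-C1", "FEN-C2"}),
]

def frozen_modules(step_name):
    """Return the set of stages whose parameters are fixed in the given step."""
    for name, frozen in FEN_STEPS:
        if name == step_name:
            return frozen
    raise KeyError(step_name)

def is_frozen(step_name, module):
    """True if `module` must not be updated while training `step_name`."""
    return module in frozen_modules(step_name)

step_order = [name for name, _ in FEN_STEPS]
```

In a training loop, `is_frozen` would decide which parameter groups are handed to the optimizer at each step.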

5. Experiments

5.1. Datasets

The experimental datasets and their split strategy are listed in the table below:

5.2. Comparison Results

The method achieves SOTA results on most of the datasets

6. Investigations on Spindle Net

6.1. Investigations on FEN

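Finding the best ROI pooling position for the macro and micro regions

From the figure above:
- Macro is best after FEN-C1: macro regions carry more complex identity information and should be pooled out earlier to get more independently learned parameters
- Micro is best after FEN-C2
Combination experiments with the full-image feature and macro/micro features extracted at different stages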

6.2. Investigations on FFN

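Effect of each feature type on its own: full image > macro > micro

Comparison of the tree-structured fusion strategy with other fusion strategies: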

7. Conclusion

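The proposed Spindle Net:
- a multi-stage ROI pooling network: extracts the features of different body regions separately
- a tree-structured fusion network: merges the features of different body regions
Body features at different levels help align body regions across pedestrian images
Experiments verify the effectiveness of the feature competition and fusion network
The method achieves SOTA results on multiple datasets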
