Multi-Object Tracking Survey 2021

阿新 • Published: 2022-01-05

TransTrack(Init One Stage)/(TrackFormer)
TransCenter(Trans + Regression)
SiamMOT
CorrTrack(Detection)
CenterTrack (joint detection and tracking)
~~MPNTracker~~

Background

MOT Categories

Track-by-detection

Detection
ReID(Data Association)
- ReID should be sufficient
Bounding Boxes
- Features of ROI
Siamese/Re-ID/IOU(Appearance Affinity)
Motion
- the projection of non-linear 3D motion into the 2D image domain still poses a challenging problem for many models.

An important assumption here: the tracking results are always a subset of the detection results.

Two-Stage Milestones: SORT and DeepSORT

Detector quality dominates: swapping in a better detector alone improves tracking by up to 18.9%.

SORT

Faster-RCNN
Kalman Filter
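- \(\mathbf{o}_i = [x, y, s, r, \dot{x}, \dot{y}, \dot{s}]\), where \((x, y)\) is the box center, \(s\) the area (scale) and \(r\) the aspect ratio
  - Position and area are treated as changing at a constant rate (constant velocity); the aspect ratio \(r\) is assumed constant, so it has no velocity term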
- linear constant velocity model / independent of other objects and camera motion.
Tracklet Init and Deletion
- Accept all detections (> minHeight & minConfidence)
- Immediate deletion: a track is deleted as soon as it goes unmatched
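A minimal numpy sketch of the linear constant-velocity Kalman filter described above, assuming the state is ordered as [x, y, s, r, ẋ, ẏ, ṡ] (center, area, aspect ratio, velocities) and only [x, y, s, r] is observed. The noise matrices `Q` and `R` are placeholders you would tune; SORT's actual values are not reproduced here.

```python
import numpy as np

def make_cv_model(dt=1.0):
    """Transition F and observation H for state [x, y, s, r, vx, vy, vs]."""
    F = np.eye(7)
    F[0, 4] = F[1, 5] = F[2, 6] = dt  # x, y, s advance by their velocities; r stays constant
    H = np.eye(4, 7)                  # observe [x, y, s, r] only
    return F, H

def predict(x, P, F, Q):
    """Propagate the mean and covariance one step ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Correct the prediction with a matched detection z = [x, y, s, r]."""
    y = z - H @ x                     # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P
```

Unmatched tracks only call `predict`, so their covariance keeps growing; this is the effect that later motivates DeepSORT's matching cascade.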

DeepSORT

Faster-RCNN
- \[c = \lambda D_1(\text{track}_i,\text{detection}_j) + (1-\lambda)D_2(\text{track}_i,\text{detection}_j) \]
- Motion
  - \(D_1\): Mahalanobis distance
- Appearance
  - \(D_2\): cosine distance (1 - cosine similarity) between deep appearance features
  - ReID Pretrained Model
Tracklet Init and Deletion
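- Tentative for the first 3 frames (> minHeight & minConfidence)
- Tentative tracks that miss an association during their first 3 frames are deleted; confirmed tracks are deleted once they exceed a maximum age \(A_{max}\)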
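The cost \(c = \lambda D_1 + (1-\lambda) D_2\) can be sketched in numpy as follows. Assumptions: the measurement is 4-dimensional, so the Mahalanobis gate uses the 0.95 chi-square quantile for 4 degrees of freedom (9.4877); \(D_2\) is the smallest cosine distance to a gallery of the track's past features; the appearance gate `app_gate=0.2` is a hypothetical value.

```python
import numpy as np

CHI2_95_4DOF = 9.4877  # 0.95 chi-square quantile, 4 degrees of freedom

def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance of measurement z to the predicted distribution."""
    d = z - mean
    return float(d @ np.linalg.solve(cov, d))

def cosine_distance(feat, gallery):
    """Smallest cosine distance between a detection feature and a track's gallery."""
    a = feat / np.linalg.norm(feat)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return float(np.min(1.0 - g @ a))

def combined_cost(det_z, det_feat, trk_mean, trk_cov, trk_gallery, lam=0.0, app_gate=0.2):
    """lam * D1 + (1 - lam) * D2; pairs failing either gate are inadmissible (inf)."""
    d1 = mahalanobis_sq(det_z, trk_mean, trk_cov)
    d2 = cosine_distance(det_feat, trk_gallery)
    if d1 > CHI2_95_4DOF or d2 > app_gate:
        return np.inf
    return lam * d1 + (1.0 - lam) * d2
```

Setting `lam=0` uses appearance only for the cost while the motion term still acts as a gate.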

Joint Detection and Tracking (Detection-by-tracking)

『IDEA [Detection (which is itself multi-object) is an upper bound for multi-object tracking]』

The core idea is joint learning.

TODO 『2021-12-21 [CenterTrack]』

Eval

Specifically, the mapping between ground truth and hypotheses is established as follows: if the ground-truth object \(o_i\) and the hypothesis \(h_j\) are matched in frame \(t-1\), and in frame \(t\) \(\text{IoU}(o_i, h_j) \geq 0.5\), then \(o_i\) and \(h_j\) are matched in that frame, even if there exists another hypothesis \(h_k\) such that \(\text{IoU}(o_i, h_j) < \text{IoU}(o_i, h_k)\), considering the continuity constraint. After the matching from previous frames has been performed, the remaining objects are tried to be matched with the remaining hypotheses, still using a 0.5 IoU threshold. Ground-truth bounding boxes that cannot be associated with a hypothesis are counted as false negatives (FN), and hypotheses that cannot be associated with a real bounding box are marked as false positives (FP). Also, every time the tracking of a ground-truth object is interrupted and later resumed counts as a fragmentation, while every time the ID of a tracked ground-truth object is incorrectly changed during the tracking duration counts as an ID switch. From these counts, the simple metrics are computed.
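A minimal pure-Python sketch of this matching procedure, assuming boxes are (x1, y1, x2, y2) and each frame is a pair of dicts mapping ID to box. Two simplifications: the remaining pairs are matched greedily by highest IoU rather than with the Hungarian algorithm, and fragmentations are not counted. MOTA uses the standard definition \(1 - (\text{FN} + \text{FP} + \text{IDSW}) / \text{GT}\).

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def clear_mot(frames: List[Tuple[Dict[int, Box], Dict[int, Box]]], thr: float = 0.5) -> dict:
    fn = fp = idsw = num_gt = 0
    prev: Dict[int, int] = {}       # gt id -> hyp id matched in the previous frame
    last_hyp: Dict[int, int] = {}   # gt id -> hyp id it was most recently matched to
    for gt, hyp in frames:
        num_gt += len(gt)
        # 1) continuity: keep last frame's pairs while IoU stays >= thr
        matches = {o: h for o, h in prev.items()
                   if o in gt and h in hyp and iou(gt[o], hyp[h]) >= thr}
        # 2) remaining objects vs remaining hypotheses (greedy simplification)
        used_h = set(matches.values())
        free_o = [o for o in gt if o not in matches]
        free_h = [h for h in hyp if h not in used_h]
        pairs = sorted(((iou(gt[o], hyp[h]), o, h) for o in free_o for h in free_h), reverse=True)
        for score, o, h in pairs:
            if score < thr:
                break
            if o not in matches and h not in used_h:
                matches[o] = h
                used_h.add(h)
        # 3) count errors
        for o, h in matches.items():
            if o in last_hyp and last_hyp[o] != h:
                idsw += 1
            last_hyp[o] = h
        fn += len(gt) - len(matches)
        fp += len(hyp) - len(matches)
        prev = matches
    mota = 1.0 - (fn + fp + idsw) / num_gt if num_gt else 0.0
    return {"FN": fn, "FP": fp, "IDSW": idsw, "MOTA": mota}
```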

Detection

[ ] DETR

Transformer

https://jalammar.github.io/illustrated-transformer/

RNN
- Maintain the hidden state
Transformer
- Global self-attention
- Contextualized Embedding

SiamMOT = DeepSORT + Motion Model

Motivation & Background

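Traditional MOT
- Bipartite graph matching
  - Nodes maintain appearance + motion features
  - A global optimization problem
  - No explicit frame-to-frame association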
  - improving local linking over consecutive frames rather than building an offline graph to re-identify instances across large temporal gaps.

SORT & DeepSORT

Siamese network
Response map

Drawback of Siamese networks: they cannot distinguish between different objects.

TODO 『2021-12-21 [Faster RCNN]』

TODO 『2021-12-21 [center Track] SKIPPED』

Contribution

SiamMOT: Faster RCNN + Motion Model
- consecutive frames & local linking
- Based on SORT

Pipeline

Arch

Given \(\mathbf{F}^{t}, \mathbf{F}^{t+1}\) and \(\mathbf{R}_{i}^{t}\) (region, i.e., bounding box)

Faster-RCNN

\(R_{i}^{t}\) is the tracking result for instance \(i\) at time \(t\)
\(\mathbf{D}^{t+1} = \text{Detector}(\mathbf{F}^{t+1})\)

Region-based Siamese Tracker

\(\mathbf{f}_{R_i}^{t} = \operatorname{ROIAlign}(R_{i}^{t},\text{dim})\)
- feature of Region(Object) i in frame t
\(\mathbf{f}_{S_i}^{t+1} = \operatorname{ROIAlign}(S_{i}^{t+1},\text{dim})\)
- feature of the search region of object i in frame t+1
- \(S_{i}^{t+1} = \operatorname{Expand}(R_{i}^{t},\text{factor})\)
- (assuming objects do not move far between two consecutive frames)
\(\mathbf{R}^{t+1} = \text{SpatialMatching}(\mathbf{Mot}^{t+1},\mathbf{Det}^{t+1} )\)
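- \(\mathbf{Mot}^{t+1}\) and \(\mathbf{Det}^{t+1}\) denote the motion model's predictions and the detector's results, respectively
- Both are obtained within the prior expanded region
- \(\mathbf{Mot}^{t+1}\) comes from cross-correlation regression
- \(\mathbf{Det}^{t+1}\) comes from the detector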

\[\mathbf{Mot}_{i}^{t+1} = \operatorname{SiameseTracker}(\mathbf{f}_{R_i}^{t},\mathbf{f}_{S_i}^{t+1}) \]

Core: the Siamese tracker

Implicit MM

\[[\text{Confidence}, \text{Offset}] = \operatorname{MLP}(\mathbf{f}_{S_i}^{t},\mathbf{f}_{S_i}^{t+1}) \]
\[\mathbf{Mot}_{i}^{t+1} = \operatorname{Modify}(R_{i}^{t},\text{Offset}) \]

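『IDEA [Why use the search region (S) here?]』

[ ] I haven't figured this part out yet

❓It feels like this might be a typo

Because the search region is not necessarily on the trajectory, it has no motion correlation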

Explicit MM
- channel-wise cross-correlation
- response map

Via cross-correlation, which is a convolution operation: this effectively exploits the flexibility (extensibility) of convolution

The output includes a confidence map

\[\mathbf{v}_i(x,y) = \text{Confidence}_i(x,y) \]

and a box regression map (distances to the left, top, right and bottom edges)

\[\mathbf{p}_i(x,y) = [l; t; r; b] \]

Both \(\mathbf{v}\) and \(\mathbf{p}\) are defined for every pixel of the 16×16 feature map

Finally, find the optimal location

\[(x^{*},y^{*}) = \mathop{\arg\max}\limits_{x,y}\left(\mathbf{v}_i \odot \boldsymbol{\eta}_i\right)(x,y) \]

where \(\boldsymbol{\eta}_i\) is a penalty map introduced to discourage dramatic movements.
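A minimal numpy sketch of the explicit motion model's two ingredients: channel-independent (depthwise) cross-correlation between the target features and the search-region features, and picking the box at the argmax of \(\mathbf{v}_i \odot \boldsymbol{\eta}_i\). Assumptions: the learned head that turns the response map into `v` and `p` is omitted (they are passed in directly), and `gaussian_penalty` is a hypothetical stand-in for SiamMOT's penalty map.

```python
import numpy as np

def channelwise_xcorr(search, template):
    """Depthwise cross-correlation, 'valid' padding.
    search: (C, Hs, Ws) features of S_i^{t+1}; template: (C, Ht, Wt) features of R_i^t.
    Returns (C, Hs-Ht+1, Ws-Wt+1): one response map per channel."""
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.zeros((C, Hs - Ht + 1, Ws - Wt + 1))
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            out[:, y, x] = (search[:, y:y + Ht, x:x + Wt] * template).sum(axis=(1, 2))
    return out

def gaussian_penalty(h, w, cy, cx, sigma):
    """Hypothetical penalty map: close to 1 near (cy, cx), decaying with distance."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def decode_emm(v, p, eta, stride=1.0):
    """v: (H, W) confidence; p: (4, H, W) distances [l, t, r, b]; eta: (H, W) penalty.
    Returns the box (x1, y1, x2, y2) at argmax(v * eta) and its unpenalized confidence."""
    y, x = np.unravel_index(np.argmax(v * eta), v.shape)
    l, t, r, b = p[:, y, x]
    cx, cy = x * stride, y * stride
    return (cx - l, cy - t, cx + r, cy + b), float(v[y, x])
```

A far-away location with a slightly higher raw confidence loses to a nearby one once the penalty is applied, which is the "discourage dramatic movements" behavior.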

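『IDEA [I have always felt that ROI-based regression is insufficient: the convolution cannot perceive changes in the object at inference time, because its parameters are already fixed]』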

『IDEA [BoundingBoxReg learns generic parameters that regress offsets (adjustments) from features. In principle this seems too generic, yet it works very well]』

❓It is called a motion model here, but I don't think it is one: no motion model is actually built. In essence it is the same as SiamFC.

First it uses the channel independent correlation operation to allow the network to explicitly learn a matching function between the same instance in sequential frames. Second, it enables a mechanism for finer-grained pixel-level supervision which is important to reduce the cases of falsely matching to distractors.

So why does it work so well?

Training

\[Loss = l_{rpn} + l_{detect} + l_{motion} \]

\(l_{motion}\) is the loss function mentioned earlier

Short Occlusion

Inference

Spatial Matching

Solver

Some matching rules

CorrTrack/TLD = FairMOT+Corr.

Overall, this paper is not written very clearly

There is no code and no clear architecture diagram

Motivation & Background

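CNNs are structurally local (local perception)
- Instances of the same semantic class are highly similar and interfere strongly
Cannot effectively capture long-range spatial and temporal dependencies
A characteristic of MOT: it must handle multi-scale features because object sizes vary (as in detection), so detection's strategies for multi-scale features apply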

Similarity

\[\mathbf{A}_{i j}=\operatorname{dist}\left(\mathbf{f}\left(\mathbf{d}_{t}^{i}\right), \hat{\mathbf{f}}\left(\mathbf{T}_{t-1}^{j}\right)\right)+\alpha \operatorname{IoU}\left(\mathbf{d}_{t}^{i}, \hat{\mathbf{d}}_{t}^{j}\right) \]
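A minimal numpy sketch of the affinity matrix \(\mathbf{A}_{ij}\), assuming "dist" is cosine similarity (so larger means more similar and it can be added to the IoU term) and that \(\hat{\mathbf{d}}_t^j\) are the track boxes predicted into frame \(t\). Boxes are (x1, y1, x2, y2).

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.clip(br - tl, 0, None).prod(axis=2)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def affinity(det_feats, trk_feats, det_boxes, pred_boxes, alpha=1.0):
    """A[i, j] = cos(f(d_i), f_hat(T_j)) + alpha * IoU(d_i, d_hat_j); rows = detections."""
    f = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    g = trk_feats / np.linalg.norm(trk_feats, axis=1, keepdims=True)
    return f @ g.T + alpha * iou_matrix(det_boxes, pred_boxes)
```

Both terms only look at the detection box and the track's previous ROI, which is the limitation the notes below point out.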

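Semantic similarity computed from the previous frame / previous ROI
- Spatial correlation is limited by the detector
- Many similar-looking surrounding instances (especially in pedestrian scenes)
- This leads to ID switches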

the correlation information between the cropped image patches is lost directly, and the spatial adjacency relationship is retained only in the coordinates

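Distractors need to be distinguished

❓But is this necessary? Neighboring objects do not move that drastically.

Perhaps the authors mean a "touching switch" (identity drift when objects touch; my own term)

Built on FairMOT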

FairMOT

Pipeline

Arch

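Generic feature extraction
Jointly learn correlations from spatio-temporal dependencies and predict detections
Associate detections with the closest tracklets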

Spatial Local Correlation Layers

The goal is to obtain context correlation features (spatial features fused with context)

Spatial Correlation

Reference

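The authors take the Non-local Module as a reference and make a neighborhood-based modification

The modification is actually quite intuitive

It simply imposes a hand-set local receptive field

In effect, it performs self-attention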

Attention weights \(\alpha(x,y) = \text{NeighborsCorrAt}(x,y)\)
\[\text{Correlated Feature} = \mathbf{F}_{C}^{l} = \sum_{x,y} \alpha(x,y) \mathbf{F}_{t}^{l}(x,y) \]
- The superscript \(l\) denotes the feature-pyramid level
- At this point \(l\) is not actually used yet
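A minimal numpy sketch of neighborhood self-attention in the spirit of \(\mathbf{F}_{C} = \sum_{x,y} \alpha(x,y) \mathbf{F}_{t}(x,y)\). This is a hypothetical reading of `NeighborsCorrAt`: each position attends to the \((2R+1)^2\) neighbors sampled every `dilation` pixels, weights are a softmax of scaled dot products, and there are no learned query/key/value projections.

```python
import numpy as np

def local_self_attention(feat, radius, dilation=1):
    """feat: (C, H, W). Returns correlated features of the same shape.
    Neighbours falling outside the map are skipped."""
    C, H, W = feat.shape
    out = np.zeros_like(feat, dtype=float)
    offsets = [(dy * dilation, dx * dilation)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    for y in range(H):
        for x in range(W):
            q = feat[:, y, x]
            keys = [feat[:, y + dy, x + dx] for dy, dx in offsets
                    if 0 <= y + dy < H and 0 <= x + dx < W]
            K = np.stack(keys)                # (N, C) neighbour features
            logits = K @ q / np.sqrt(C)       # scaled dot-product correlation
            w = np.exp(logits - logits.max())
            w /= w.sum()                      # alpha(x, y): softmax over the neighbourhood
            out[:, y, x] = w @ K              # weighted sum of neighbour features
    return out
```

With `radius=0` each position only sees itself, so the layer reduces to the identity; growing `radius` or `dilation` widens the hand-set receptive field.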

Spatial Corr. on FPN

Feature pyramid

Dilated convolution (Dilated Conv.)

\[[0,R\times D \times 2^l] \]

What does this mean?

Temporal Correlation

I don't quite understand this part

This is exactly the same formula as an earlier one

colorization as a proxy task

The loss function is not even mentioned here

Actually, our method intensively performs siamese tracking operations \(M\times N\) times to increase the discrimination.

Self-Supervised Feature Learning

TransTrack

Motivation

Why query-key is promising

For the same object, its feature in different frames is highly similar, which enables the query-key mechanism to output ordered object sets. This inspiration should also be beneficial to the MOT task.

How to transfer q-k from SOT to MOT

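The authors argue that the most serious problem is newly appearing objects, which have no corresponding query. In SOT, by contrast, the target is guaranteed to be in the frame.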

A desirable solution should be able to well capture new coming objects and propagate previously detected objects to the following frames at the same time

New-coming Objects

Traditional Way

[ ] Traditional object initialization methods

Pipeline

Arch

The Q-K here does not refer to the query/key inside attention

one Encoder -> key
- Input: extracted features (of 2 frames)
2 parallel decoders -> query?
- Object Detection(DETR)
- Track
  - Appearance & Location Information
Box Association
- KM (Kuhn-Munkres) algorithm
- Maximum-weight bipartite matching
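A pure-Python sketch of maximum-weight bipartite matching for box association, using the classic O(n²m) Hungarian (Kuhn-Munkres) algorithm on negated weights. Assumptions: the weights would be IoUs between track boxes and detection boxes, and `min_weight` is a hypothetical threshold for dropping weak pairs; in practice `scipy.optimize.linear_sum_assignment` does the same job.

```python
def hungarian_min_cost(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.
    Returns assignment[row] = column."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:                           # grow an alternating tree from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:                           # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def max_weight_matching(weights, min_weight=0.0):
    """(row, col) pairs maximizing total weight; pairs below min_weight are dropped."""
    if not weights or not weights[0]:
        return []
    transposed = len(weights) > len(weights[0])
    w = [list(col) for col in zip(*weights)] if transposed else weights
    assign = hungarian_min_cost([[-x for x in row] for row in w])
    pairs = [(c, r) if transposed else (r, c) for r, c in enumerate(assign)]
    return [(r, c) for r, c in pairs if weights[r][c] >= min_weight]
```

On `[[0.9, 0.8], [0.85, 0.1]]` a greedy matcher takes 0.9 first and ends with a total of 1.0, while KM returns `[(0, 1), (1, 0)]` with a total of 1.65.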

[ ] What is the iterative process?
set prediction
- 2 sets
- (DETR) object query for detection
  - (NO NMS)
- features (of tracked objects) as track queries
  - provide consistent object information to maintain tracklets.
simple IoU matching to generate the final ordered object set from them.
previous frames -> data association

Input and Output

Training

Inference

Object initialization

TransTrack first detects objects on the first frame, where the feature maps are from two copies of the first frame.

Occlusions and short-term disappearance

Specifically, if a tracking box is unmatched, it is kept as an “inactive” tracking box until it remains unmatched for K consecutive frames. Inactive tracking boxes can be matched to detection boxes and regain their ID.
We choose K = 32.
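A minimal sketch of the inactive-track bookkeeping described above, assuming matching has already produced the set of matched track IDs for the frame. `TrackState` and `step` are hypothetical names for illustration, not TransTrack's code.

```python
from dataclasses import dataclass

K_INACTIVE = 32  # frames a track may stay unmatched before it is removed

@dataclass
class TrackState:
    track_id: int
    misses: int = 0  # consecutive unmatched frames

    @property
    def active(self) -> bool:
        return self.misses == 0

def step(tracks, matched_ids, k=K_INACTIVE):
    """Update miss counters after one frame and return the tracks still kept."""
    kept = []
    for t in tracks:
        if t.track_id in matched_ids:
            t.misses = 0      # matched (possibly re-activated): keeps its original ID
        else:
            t.misses += 1     # unmatched: becomes or stays inactive
        if t.misses < k:      # removed once unmatched for k consecutive frames
            kept.append(t)
    return kept
```

A pedestrian hidden behind an obstacle for up to 31 frames keeps its ID when it reappears and is matched again.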

Why use a Transformer / the dominant reason

Decent framework
No priors
[ ] Advantages of Transformers in vision tasks

Summary

JDT paradigm

What's Different from TD

TD = Detection + Association
Joint Learning/Task-Driven

Questions

Bounding-box estimates may be inaccurate
- SiamMask
- Segmentation is the upper bound of Detection
- This may hurt accuracy
ID Switch

TODO

[ ] When can I say I have understood a paper after reading it?
- [ ] I can write its pseudocode
- [ ] I can state the authors' motivation

Appendix

Kalman Filter

SORT

DeepSORT

\(D_1\)
- Mahalanobis distance computed on the Kalman filter's predicted distribution
- \(M(\text{track}_i,\text{det}_j;S_{\text{Kalman Filter}})\)

\(D_2\)
- Cosine distance between deep ReID features
- Tracking is possible by matching on appearance features alone.
Matching cascade

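If a track is occluded for a long time, the Kalman filter keeps predicting without updates, so its probability distribution spreads out. If two tracks then compete for the same detection, the one occluded longer tends to get the smaller Mahalanobis distance (it appears closer).
The cause is the covariance matrix in the Mahalanobis distance, which grows with every prediction step.
This makes a detection tend to be assigned to the track that has been lost longer, whereas intuitively it should go to the most recently seen track. DeepSORT therefore introduces a matching cascade that gives higher priority to more frequently seen objects.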
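A minimal numpy sketch of the matching cascade. Tracks are matched level by level in order of `time_since_update` (1 means updated in the previous frame), so recently seen tracks claim detections first. Simplification: each level is matched greedily by lowest cost instead of with the Hungarian algorithm; the dict-based track format is an assumption for illustration.

```python
import numpy as np

def mahalanobis_sq(z, mean, cov):
    d = z - mean
    return float(d @ np.linalg.solve(cov, d))

def matching_cascade(tracks, detections, cost_fn, max_age, thresh):
    """tracks: dicts with 'id' and 'time_since_update'. Returns (matches, unmatched_dets)."""
    unmatched_dets = list(range(len(detections)))
    matches = []
    for age in range(1, max_age + 1):        # most recently updated tracks first
        if not unmatched_dets:
            break
        level = [t for t in tracks if t["time_since_update"] == age]
        candidates = sorted((cost_fn(t, detections[d]), t["id"], d)
                            for t in level for d in unmatched_dets)
        used = set()
        for cost, tid, d in candidates:
            if cost > thresh:
                break                        # remaining pairs are gated out
            if tid in used or d not in unmatched_dets:
                continue
            matches.append((tid, d))
            used.add(tid)
            unmatched_dets.remove(d)
    return matches, unmatched_dets
```

For example, a track lost for 5 frames with covariance 25·I gives a detection 2 units away a squared Mahalanobis distance of 0.16, versus 4 for a fresh track with covariance I; a flat minimum-cost match would pick the stale track, but the cascade gives the detection to the fresh one.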

Handling occlusion

IoU matching

A final IoU association (as in SORT) is run on the unconfirmed tracks and the unmatched tracks of age \(n = 1\). This helps to account for sudden appearance changes, e.g., due to partial occlusion with static scene geometry, and to increase robustness against erroneous initialization.

Faster-RCNN

RPN + Fast R-CNN (commonly paired with an FPN backbone)

RPN
- Predefined Anchors
  - ROI Pooling
2 MLP Heads
- Classification
- Coord. regression: refines the candidate boxes
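The coordinate-regression head predicts offsets relative to an anchor or proposal rather than absolute boxes. A minimal numpy sketch of decoding, assuming the usual R-CNN parameterization (center shift scaled by anchor size, log-space width/height) and omitting the per-coordinate normalization weights some implementations add:

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """anchors: (N, 4) as x1, y1, x2, y2; deltas: (N, 4) as dx, dy, dw, dh.
    Returns refined boxes (N, 4)."""
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights
    dx, dy, dw, dh = deltas.T
    pred_x = ctr_x + dx * widths        # shift center by a fraction of the anchor size
    pred_y = ctr_y + dy * heights
    pred_w = widths * np.exp(dw)        # scale size in log space
    pred_h = heights * np.exp(dh)
    return np.stack([pred_x - 0.5 * pred_w, pred_y - 0.5 * pred_h,
                     pred_x + 0.5 * pred_w, pred_y + 0.5 * pred_h], axis=1)
```

Zero deltas return the anchor unchanged; `dw = log 2` doubles the width around the same center.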

Feature Pyramid

Problems to solve

Multi-scale
- Fine-grained information is lost
- Small objects cannot be recovered

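Image pyramid
Single high-level feature map
Directly using different feature levels
Feature pyramid
- Downsampling
- Upsampling
  - Nearest-neighbor interpolation
- 1×1 convolution, lateral connections + merge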
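A minimal numpy sketch of the FPN top-down pathway: a 1×1 lateral convolution (a per-pixel matrix multiply) brings every bottom-up map to the same channel count, then each coarser output is upsampled 2× by nearest-neighbor interpolation and added to the next finer lateral. Assumptions: each bottom-up map is exactly half the size of the previous one, and the 3×3 smoothing convolution that FPN applies after merging is omitted.

```python
import numpy as np

def upsample_nearest_2x(x):
    """(C, H, W) -> (C, 2H, 2W) by repeating each pixel."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_1x1(x, weight):
    """1x1 convolution: weight (C_out, C_in) applied at every pixel of x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def fpn_top_down(features, lateral_weights):
    """features: bottom-up maps, finest first. Returns merged maps, finest first."""
    laterals = [lateral_1x1(f, w) for f, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]                    # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        outputs.append(lat + upsample_nearest_2x(outputs[-1]))  # merge = add
    return outputs[::-1]
```

The finest output thus carries both its own high-resolution detail and the semantics propagated down from every coarser level.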

Dilated Conv.

Enlarges the receptive field without the information loss caused by pooling
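A small sketch of why dilation helps: a k-tap kernel with dilation d spans k + (k - 1)(d - 1) input positions, so stacking stride-1 layers with growing dilation enlarges the receptive field quickly while keeping full resolution. The function names are illustrative.

```python
def effective_kernel(k, d):
    """Input span of a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """layers: list of (kernel, stride, dilation), applied in order.
    Returns the receptive field (in input pixels) of one output unit."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump  # growth measured in input pixels
        jump *= s                                  # distance between adjacent units
    return rf
```

Three stride-1 3×3 layers cover 7 pixels, while the same layers with dilations 1, 2, 4 cover 15, with no downsampling at all.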

Non-local Module

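Original design intent of convolution kernels

Capture local fine-grained structure / pattern matching
- Can be viewed as local filtering

Goal: long-range dependencies

Ways to enlarge the receptive field

Stacking
- A lot of information is lost during sampling
Fully connected / attention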
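A minimal numpy sketch of a non-local block in its embedded-Gaussian form: every position attends to all positions, and the result is added back through a residual connection. Assumptions: features are flattened to (N positions, C channels), and the projection matrices stand in for the block's 1×1 convolutions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """x: (N, C). w_theta, w_phi, w_g: (C, C'); w_z: (C', C). Returns (N, C)."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T, axis=1)  # (N, N): pairwise weights over all positions
    y = attn @ g                           # aggregate information from everywhere
    return x + y @ w_z                     # residual: zero w_z leaves x unchanged
```

Unlike stacking convolutions, one block already connects any two positions, at the price of an N×N attention matrix.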

Multi Scale Perception

ROI Pooling/ROI Align
- Essentially a pooling method, i.e., lossy sampling
FPN
- Feature pyramid
- Fuses features from different scales well
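A minimal numpy sketch of ROI Align on a single-channel map, to show what "pooling, i.e., lossy sampling" means concretely: the ROI is split into an `out_size`×`out_size` grid and each bin is reduced to one bilinearly interpolated sample at its center. Assumptions: pixel (i, j) sits at coordinates (i, j), and one sample per bin is taken (real implementations usually average several).

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate feat (H, W) at a real-valued location, clamped to the map."""
    H, W = feat.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def roi_align(feat, box, out_size):
    """box: (x1, y1, x2, y2) in feature-map coordinates, no rounding."""
    x1, y1, x2, y2 = box
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear(feat, y1 + (i + 0.5) * bh, x1 + (j + 0.5) * bw)
    return out
```

Unlike ROI Pooling, box coordinates are never rounded to the grid, but everything inside a bin is still summarized by a few samples.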

FairMOT

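Object detection: treated as center-based bounding-box regression on a high-resolution feature map
- Plays the role of Faster R-CNN's classification + box regression: class heatmap + BBReg (anchor-free)
Network architecture adapted to the ReID task
- Similar to a feature pyramid
- Multi-scale fusion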
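The center-based detection branch above is decoded CenterNet-style: peaks of the center heatmap replace NMS, and each peak reads its box size and sub-pixel offset from the regression maps. A minimal numpy sketch, assuming `wh` and `offset` are in feature-map units and the output stride is 4; both are assumptions here.

```python
import numpy as np

def decode_centers(heatmap, wh, offset, thresh=0.4, stride=4):
    """heatmap: (H, W) center scores; wh, offset: (2, H, W).
    Returns [(x1, y1, x2, y2, score)] in input pixels, highest score first."""
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max filter: a location is a peak if no neighbour scores higher
    local_max = np.max([padded[dy:dy + H, dx:dx + W]
                        for dy in range(3) for dx in range(3)], axis=0)
    peaks = (heatmap == local_max) & (heatmap >= thresh)
    dets = []
    for y, x in zip(*np.nonzero(peaks)):
        cx = (x + offset[0, y, x]) * stride
        cy = (y + offset[1, y, x]) * stride
        w, h = wh[0, y, x] * stride, wh[1, y, x] * stride
        dets.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, float(heatmap[y, x])))
    return sorted(dets, key=lambda d: d[4], reverse=True)
```

In FairMOT the same peak locations are also where the ReID embedding is read out for association.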

The key lies in the design of the loss function

Data association
- Based on DeepSORT
- In fact it only replaces the ReID features
- which makes it task-driven
