[Paper Notes, ECCV 2020] Learning to Count in the Crowd from Limited Labeled Data

阿新 • Published: 2020-10-12

Paper: https://arxiv.org/pdf/2007.03195.pdf

Abstract

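Recent crowd counting methods achieve strong performance, but most are fully supervised and depend on large amounts of annotation, which is time-consuming, laborious, and expensive to obtain. To reduce annotation cost, the paper proposes a network that learns to count from a limited number of labeled samples while also exploiting a large pool of unlabeled data. An iterative learning mechanism based on Gaussian processes (GP) estimates pseudo-labels for the unlabeled samples, and these pseudo-labels then serve as supervision for training the network in the usual supervised manner.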

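Concretely, a Gaussian process first relates the latent vectors of the labeled samples (those with ground truth, GT) to the latent vectors of the unlabeled samples, which yields a pseudo-label for each unlabeled sample. This pseudo-label is compared with the one the GP produced for the same unlabeled input in the previous round, and it is then used as the target for supervised training on the unlabeled data.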

Contributions

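We propose a Gaussian-process-based framework that effectively exploits unlabeled data during training to improve overall performance. The method trains iteratively on both labeled and unlabeled data; for the unlabeled data, the pseudo-labels are estimated with a Gaussian process at the labeling step.
The framework is effective in both semi-supervised and transfer-learning settings, and ablation studies show that it generalizes across different network architectures.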

Model Architecture (GP-based iterative learning)

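The network has an encoder-decoder structure. The framework is agnostic to the encoder, and the experiments show that it generalizes well to VGG16, ResNet50, and ResNet101. The decoder consists of a set of two conv-ReLU layers. An input image $x$ passes through the encoder to produce a latent vector $z$, which the decoder maps to the output density map $y$.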

Figure: model architecture

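Training proceeds in two stages:

1. Labeled training stage

In this stage, we use the labeled dataset and a supervised loss (for example, an L2 loss) to learn the network parameters.

2. Unlabeled training stage

In this stage, a Gaussian process generates pseudo-labels for the unlabeled data points, which are then used for supervised training.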

Labeled Stage

The network is trained with an L2 loss on the labeled samples only:
$$L_s = L_2 = \|y^{pred}_l - y_l\|_2$$

Here $y^{pred}_l = g(z_l, \phi_d)$ is the model output, $y_l$ is the ground truth, and $z = h(x, \phi_e)$ is the latent vector. Note that the intermediate latent vectors must also be stored as a matrix $F_{z_l} = \{z_l^i\}_{i=1}^{N_l}$, which is used later to compute the pseudo-labels of the unlabeled data. The matrix has dimensions $N_l \times M$, where $M$ is the dimension of the latent vector $z_l$: $64 \times 32 \times 32 = 65536$.
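To make the bookkeeping of the labeled stage concrete, here is a minimal NumPy sketch. It is not the authors' code: the function name `l2_loss`, the toy value `N_l = 4`, the random latents, and the `float32` storage type are all assumptions made for illustration. Only the latent shape $64 \times 32 \times 32$ comes from the notes.

```python
import numpy as np

def l2_loss(y_pred, y_true):
    """L2 (Frobenius) distance between a predicted and a ground-truth density map."""
    return float(np.linalg.norm(y_pred - y_true))

# Hypothetical toy setup: 4 labeled samples with random latents.
rng = np.random.default_rng(0)
N_l, C, H, W = 4, 64, 32, 32
latents = rng.standard_normal((N_l, C, H, W)).astype(np.float32)

# Flatten each latent tensor into one row of F_{z_l}, giving shape (N_l, M).
F_zl = latents.reshape(N_l, -1)
M = F_zl.shape[1]                      # 64 * 32 * 32 = 65536
bytes_per_row = M * F_zl.itemsize      # 65536 * 4 bytes = 256 KiB per sample
```

Under the `float32` assumption, each labeled sample adds 256 KiB to $F_{z_l}$, so the stored matrix grows linearly with $N_l$.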

Unlabeled Stage

In the unlabeled training stage, a Gaussian process generates pseudo-labels that supervise the training of the network. Using the latent vectors $F_{z_l}$ stored during the labeled stage, we model the mapping $y = t(z)$ between latent vectors and output density maps.
The Gaussian process jointly models the distribution of the function $t(\cdot)$ from the latent vectors of the labeled and unlabeled data:
$$P(t(z) \mid D_L, F_{z_l}, T_{y_l}) \sim GP(\mu, K(F_{z_l}, F_{z_l}) + \sigma_\epsilon^2 I)$$
Here $\mu$ is the function value computed by the Gaussian process, $\sigma_\epsilon^2$ is set to 1, and $K$ is the kernel function. It follows that, for the latent vector $z_u^k$ of the $k$-th unlabeled sample $x_u^k$, the conditional joint distribution is:
$$P(t(z_u^k) \mid D_L, F_{z_l}, T_{y_l}) = \mathcal{N}(\mu_u^k, \Sigma_u^k)$$
where $\mu_u^k$ and $\Sigma_u^k$ are:
$$\mu_u^k = K(z_u^k, F_{z_l}) [K(F_{z_l}, F_{z_l}) + \sigma_\epsilon^2 I]^{-1} T_{y_l}$$
$$\Sigma_u^k = K(z_u^k, z_u^k) - K(z_u^k, F_{z_l}) [K(F_{z_l}, F_{z_l}) + \sigma_\epsilon^2 I]^{-1} K(F_{z_l}, z_u^k) + \sigma_\epsilon^2$$
The kernel is the cosine similarity between latent vectors:
$$K(Z,Z)_{k,i} = \mathcal{K}(z_u^k, z_l^i) = \frac{\langle z_u^k, z_l^i \rangle}{|z_u^k| \cdot |z_l^i|}$$
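The posterior mean and variance above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's implementation: the names `cosine_kernel` and `gp_posterior` are hypothetical, $T_{y_l}$ is assumed to hold one flattened ground-truth density map per row, and the variance is reduced to a single scalar per unlabeled sample.

```python
import numpy as np

def cosine_kernel(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T

def gp_posterior(z_u, F_zl, T_yl, sigma2=1.0):
    """Posterior mean and variance for one unlabeled latent vector.

    z_u:  (M,)      latent vector of the unlabeled sample
    F_zl: (N_l, M)  latent vectors of the labeled samples
    T_yl: (N_l, D)  labeled targets (assumed: flattened density maps)
    """
    z_u = z_u[None, :]
    K_ll = cosine_kernel(F_zl, F_zl) + sigma2 * np.eye(len(F_zl))
    K_ul = cosine_kernel(z_u, F_zl)            # shape (1, N_l)
    K_uu = cosine_kernel(z_u, z_u)             # shape (1, 1), equals 1 for cosine
    # Solve linear systems rather than forming the explicit inverse.
    mu = K_ul @ np.linalg.solve(K_ll, T_yl)    # shape (1, D)
    var = K_uu - K_ul @ np.linalg.solve(K_ll, K_ul.T) + sigma2
    return mu[0], float(var[0, 0])

# Tiny worked example: two orthogonal labeled latents.
F_zl = np.array([[1.0, 0.0], [0.0, 1.0]])
T_yl = np.array([[2.0], [4.0]])
mu, var = gp_posterior(np.array([1.0, 0.0]), F_zl, T_yl)
```

In this toy case $K + \sigma_\epsilon^2 I = 2I$, so the mean is half the target of the matching labeled sample (1.0) and the variance is $1 - 0.5 + 1 = 1.5$.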
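As the number of labeled samples $N_l$ grows, the dimension of $K(F_{z_l}, F_{z_l})$ also becomes very large, which makes computation and storage challenging. Therefore not all labeled latent vectors are used: only the $N_n$ labeled latent vectors that are most similar (nearest) to the unlabeled latent vector are selected to form the matrix $F_{z_l,n}$. The mean $\mu_u^k$ is used directly as the pseudo-label of the $k$-th unlabeled sample, $y_{u,pseudo}^k = \mu_u^k$, and the model is optimized with the $L_2$ distance, updating the parameters of both the encoder and the decoder.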
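The neighbor selection can be sketched as follows. The notes do not say which similarity measure is used for the selection, so this sketch assumes cosine similarity (the same measure as the kernel); the function name `nearest_labeled` and the toy data are hypothetical.

```python
import numpy as np

def nearest_labeled(z_u, F_zl, n_neighbors):
    """Indices of the n_neighbors labeled latents most similar to z_u."""
    sims = (F_zl @ z_u) / (np.linalg.norm(F_zl, axis=1) * np.linalg.norm(z_u))
    # argsort is ascending, so reverse it to put the most similar first.
    return np.argsort(sims)[::-1][:n_neighbors]

# Toy data: four labeled latents in 2-D and one unlabeled latent.
F_zl = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
z_u = np.array([1.0, 0.1])

idx = nearest_labeled(z_u, F_zl, n_neighbors=2)
F_zl_n = F_zl[idx]   # reduced matrix F_{z_l,n} passed to the GP
```

With this reduction the kernel matrix that has to be inverted is $N_n \times N_n$ instead of $N_l \times N_l$.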
In addition, the variance $\Sigma_{u,n}^k$ that the Gaussian process computes between $z_u^k$ and its $N_n$ nearest latent vectors is also minimized, which gives the final unlabeled loss $\mathcal{L}_{un}$:

$$\mathcal{L}_{un} = \frac{1}{|\Sigma_{u,n}^k|} \|y_{u,pred}^k - y_{u,pseudo}^k\|_2 + \log \Sigma_{u,n}^k$$
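A minimal sketch of this loss for a single unlabeled sample, assuming the variance is a positive scalar (the notes do not spell out its shape); the function name `unlabeled_loss` is hypothetical.

```python
import numpy as np

def unlabeled_loss(y_pred, y_pseudo, var):
    """L_un = ||y_pred - y_pseudo||_2 / var + log(var) for one sample."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return float(np.linalg.norm(y_pred - y_pseudo) / var + np.log(var))

y_pred = np.ones((2, 2))
y_pseudo = np.zeros((2, 2))
confident = unlabeled_loss(y_pred, y_pseudo, var=1.0)   # 2.0 / 1 + log(1)
uncertain = unlabeled_loss(y_pred, y_pseudo, var=2.0)   # 2.0 / 2 + log(2)
```

One way to read the two terms: dividing by the variance down-weights pseudo-labels the GP is unsure about, while the log term keeps the variance from growing without bound.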

Total loss

$$\mathcal{L}_f = \mathcal{L}_s + \lambda_{un} \mathcal{L}_{un}$$

Implementation Details

Optimizer: Adam with learning rate 1e-5 and momentum 0.9; batch size 24; trained on an Nvidia Titan Xp GPU.
Training: random crops of size $256 \times 256$.
MAE and MSE are used as evaluation metrics.
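The notes do not define the two metrics. The sketch below follows the usual crowd-counting convention, which is an assumption here: both are computed on per-image counts (the sum of each density map), and "MSE" denotes the square root of the mean squared count error. The function names and the example counts are hypothetical.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def mse(pred_counts, gt_counts):
    """Root of the mean squared count error (crowd-counting convention)."""
    squared = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(squared / len(gt_counts))

# Hypothetical per-image counts for three test images.
pred = [10.0, 20.0, 30.0]
gt = [12.0, 20.0, 26.0]
mae_value = mae(pred, gt)   # (2 + 0 + 4) / 3
mse_value = mse(pred, gt)   # sqrt((4 + 0 + 16) / 3)
```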

Results & Ablation Study

Ablation 1: with 5% of the data labeled, compare performance with and without the unlabeled samples and the Gaussian process

100% labeled dataset
5% labeled
5% labeled + 95% unlabeled + Ranking Loss
5% labeled + 95% unlabeled + Gaussian Process

Ablation 2: performance as the labeled fraction varies from 5% to 75%

No-GP (labeled dataset only)
GP (labeled and unlabeled dataset)

Figure: qualitative results

Ablation 3: performance with different network architectures as the encoder

Pseudo-label analysis:

It can be observed that the pseudo-GT errors are concentrated in the lower end of the error region as compared to the prediction errors. This implies that the pseudo-GTs are more closer to the GTs than the predictions. Hence, the pseudo-GTs obtained using the proposed method are able to provide good quality supervision on the unlabeled data.

In short, the pseudo-labels generated by the Gaussian process are effective and can provide good supervision for training the network.

Ablation 4: transferability

No Adapt
Cycle GAN
SE Cycle GAN
Proposed Method
