
Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)

Paper notes: Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)

Paper link: TransForensics: Image Forgery Localization with Dense Self-Attention

Abstract

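In this work, we propose a technique that uses a fully convolutional network (FCN) to localize image splicing attacks. We first evaluated a single-task FCN (SFCN) trained only on the surface label. Although the SFCN outperforms existing methods, it still produces a coarse localization output in certain cases. We therefore propose a multi-task FCN (MFCN) that uses two output branches for multi-task learning. One branch learns the surface label, while the other learns the edge or boundary of the spliced region. We trained the networks on the CASIA v2.0 dataset and tested the trained models on the CASIA v1.0, Columbia Uncompressed, Carvalho, and DARPA/NIST Nimble Challenge 2016 SCI datasets. Experiments show that the SFCN and MFCN outperform existing splicing localization algorithms, and that the MFCN achieves finer localization than the SFCN.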

1 Introduction

  • The base network architecture is the FCN VGG-16 architecture with skip connections, but we incorporate several modifications, including batch normalization layers and class weighting.


  • Thus, we next propose the use of a multi-task FCN (MFCN) that utilizes two output branches for multi-task learning. One branch is used to learn the surface label, while the other branch is used to learn the edge or boundary of the spliced region. It is shown that by simultaneously training on the surface and edge labels, we can achieve finer localization of the spliced region, as compared to the SFCN. Once the MFCN was trained, we evaluated two different inference approaches. The first approach utilizes only the surface output probability map in the inference step. The second approach, which is referred to as the edge-enhanced MFCN, utilizes both the surface and edge output probability maps to achieve finer localization.


  • Furthermore, we show that after applying various post-processing techniques such as JPEG compression, blurring, and added noise to the spliced images, the SFCN and MFCN methods still outperform the existing methods.


3 Proposed Methods

3.1 Brief Review of Fully Convolutional Networks (FCNs)

  • In [24], the authors adapted common classification networks into fully convolutional ones for the task of semantic segmentation. It was shown in [24] that FCNs can efficiently learn to make dense predictions for per-pixel tasks such as semantic segmentation.

3.2 Single-task Fully Convolutional Network (SFCN)

  • In addition, we incorporated several modifications, including batch normalization and class weighting. We utilized batch normalization to eliminate the bias and normalize the inputs at each layer [17]. Class weighting refers to the application of different weights to the different classes in the loss function.


  • We apply a larger weight to the spliced pixels (since there are fewer spliced pixels than non-spliced ones). In particular, we used median frequency class weighting [13, 2].

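As a rough illustration of median frequency class weighting, here is a minimal sketch in plain Python. It assumes the common definition of the scheme: each class frequency is the class's pixel count divided by the total pixel count of the images in which that class appears, and the weight is the median frequency divided by the class frequency. The function name `median_frequency_weights` and the list-of-lists mask format are hypothetical and not taken from the paper.

```python
from statistics import median

def median_frequency_weights(masks, num_classes=2):
    """Return one loss weight per class using median frequency balancing.

    masks: list of 2-D label masks (lists of rows), 0 = authentic, 1 = spliced.
    This is an illustrative sketch, not the authors' implementation.
    """
    pixel_count = [0] * num_classes   # pixels of class c over all masks
    image_pixels = [0] * num_classes  # total pixels of the masks that contain class c
    for mask in masks:
        flat = [label for row in mask for label in row]
        for c in range(num_classes):
            n = flat.count(c)
            if n > 0:
                pixel_count[c] += n
                image_pixels[c] += len(flat)
    # freq(c) = pixels of class c / pixels of all images in which c appears
    freq = [pixel_count[c] / image_pixels[c] if image_pixels[c] else 0.0
            for c in range(num_classes)]
    med = median([f for f in freq if f > 0])
    # Rare classes end up with weights above 1, frequent classes below 1.
    return [med / f if f > 0 else 0.0 for f in freq]
```

With a mask that is 25% spliced, the spliced class gets weight 2.0 and the authentic class gets about 0.67, which matches the idea of weighting the rarer spliced pixels more heavily.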

3.3 Multi-task Fully Convolutional Network (MFCN)

  • In our work, we adopt the idea in [30] of utilizing a multi-task network, but we incorporate several modifications, including skip connections, batch normalization, and class weighting (as discussed in Section 3.2). In contrast to the SFCN, the MFCN utilizes two output branches for multi-task learning. One branch is used to learn the surface label, while the other branch is used to learn the edge or boundary of the spliced region.


  • The architecture of the MFCN used in our paper is shown in Fig. 3. In addition to the surface labels, the boundaries between inserted regions and their host background can be an important indicator of a manipulated area. This is what motivated us to use a multi-task learning network. The weights or parameters of the network are influenced by both the surface and edge labels during the training process. By simultaneously training on the surface and edge labels, we are able to obtain a finer localization of the spliced region, as compared to training only on the surface labels. Once the network was fully trained, we evaluated two different binary output mask generation approaches. In the first approach, we extract the surface output probability map, and then threshold it to yield the binary system output mask. In this approach, the edge output probability map is not utilized in the inference step. Please note that the edge label still influenced the weights of the network during the training process.


3.4 Edge-enhanced MFCN Inference

  • The second inference strategy, which we refer to as the edge-enhanced MFCN,
    utilizes both the surface and edge output probability maps, as described in the
    following steps:
    1. We threshold the surface probability map with a given threshold.
    2. We threshold the edge probability map with a given threshold.
    3. Next, we apply hole-filling to the output of step (2), yielding the hole-filled,
      thresholded edge mask.
    4. Finally, we generate the binary system output mask by computing the
      intersection of the output of step (1) and output of step (3).

It is shown in this paper that by utilizing both the edge and surface probability
maps in the inference step, we obtain finer localization of the spliced region. An
example illustrating inference with edge-enhancement is shown in Figure 4. It
can be seen that utilizing both the edge and surface probability maps leads to
a finer localization of the spliced region.

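The four steps of the edge-enhanced inference can be sketched in plain Python as follows. This is only an illustration under stated assumptions: probability maps are lists of rows, the default thresholds of 0.5 are placeholders (the paper sweeps the threshold), hole-filling is done with a 4-connected flood fill from the image border, and all function names are hypothetical.

```python
from collections import deque

def threshold(prob_map, t):
    """Binarize a 2-D probability map: 1 where the probability exceeds t."""
    return [[1 if p > t else 0 for p in row] for row in prob_map]

def fill_holes(mask):
    """Fill background regions that are not connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every background pixel on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and mask[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Anything not reachable from the border is foreground or an enclosed hole.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]

def edge_enhanced_mask(surface_prob, edge_prob, surface_t=0.5, edge_t=0.5):
    """Combine the surface and edge probability maps into one binary mask."""
    surface_mask = threshold(surface_prob, surface_t)  # step 1
    edge_mask = threshold(edge_prob, edge_t)           # step 2
    filled_edge = fill_holes(edge_mask)                # step 3
    return [[s & e for s, e in zip(s_row, e_row)]      # step 4: intersection
            for s_row, e_row in zip(surface_mask, filled_edge)]
```

The intersection in step 4 is what trims a coarse surface mask: pixels the surface branch marks as spliced survive only if they also fall inside the region enclosed by the predicted boundary. In practice `scipy.ndimage.binary_fill_holes` performs the same hole-filling on NumPy arrays.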

3.5 Training and Testing Procedure

For the MFCN, the total loss function, $L_t$, is the sum of the loss corresponding to the surface label and the loss corresponding to the edge label, denoted by $L_s$ and $L_e$, respectively. Thus, we have
$$L_t = L_s + L_e,$$
where $L_s$ and $L_e$ are per-pixel softmax loss functions. In addition, we apply median-frequency class weighting to the surface and edge loss functions, as described in Sections 3.2 and 3.3. For the SFCN, the total loss function is equal to the surface loss function $L_s$.
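To make the total loss concrete, here is a small sketch of a class-weighted per-pixel softmax loss and the sum of the two branch losses. The excerpt does not say how the per-pixel losses are reduced, so averaging over pixels is an assumption here, and the function names and the flat per-pixel input format are hypothetical.

```python
import math

def weighted_softmax_loss(logits, labels, class_weights):
    """Class-weighted per-pixel softmax cross-entropy, averaged over pixels.

    logits: one list of class scores per pixel.
    labels: one class index per pixel.
    Averaging over pixels is an assumption, not something the paper states.
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        m = max(scores)  # subtract the max score for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        log_prob = scores[label] - log_sum  # log-softmax of the true class
        total += -class_weights[label] * log_prob
    return total / len(labels)

def mfcn_total_loss(surface_logits, surface_labels, surface_weights,
                    edge_logits, edge_labels, edge_weights):
    """Total MFCN loss: surface-branch loss plus edge-branch loss."""
    l_s = weighted_softmax_loss(surface_logits, surface_labels, surface_weights)
    l_e = weighted_softmax_loss(edge_logits, edge_labels, edge_weights)
    return l_s + l_e
```

For the SFCN, only the `weighted_softmax_loss` call on the surface branch would apply.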

4 Performance Evaluation Metrics

For each output map, we varied the threshold and picked the optimal threshold (this is done for each method). This technique of varying the threshold was also utilized by Zampoglou et al. in [37].

Once the MFCN or SFCN is trained, we use the trained model to evaluate other images not in the training set. We evaluated the performance of the proposed and existing methods using the F1 and Matthews Correlation Coefficient (MCC) metrics, which are per-pixel localization metrics.
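The per-pixel F1 and MCC scores, together with the threshold sweep, can be sketched as below. The formulas are the standard definitions; the sweep grid (0.05 to 0.95 in steps of 0.05) and the function names are assumptions, since the excerpt does not list the thresholds that were tried.

```python
import math

def confusion_counts(pred, truth):
    """Count true/false positives and negatives over all pixels."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def f1_score(pred, truth):
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc_score(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_threshold(prob_map, truth, metric, thresholds=None):
    """Sweep thresholds and return (best score, threshold) for one output map."""
    if thresholds is None:
        thresholds = [i / 20 for i in range(1, 20)]  # assumed grid
    scored = []
    for t in thresholds:
        pred = [[1 if p > t else 0 for p in row] for row in prob_map]
        scored.append((metric(pred, truth), t))
    return max(scored)  # ties are broken in favor of the larger threshold
```

A call such as `best_threshold(prob_map, truth, f1_score)` returns the best F1 for one image together with the threshold that produced it.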

5 Experimental Results

6 Conclusion

It was demonstrated in this work that the application of FCN to the splicing localization problem yields a large improvement over current published techniques. The FCN we utilized is based on the FCN VGG-16 architecture with skip connections, and we incorporated several modifications, such as batch normalization layers and class weighting. We first evaluated a single-task FCN (SFCN) trained only on the surface ground truth mask (which classifies each pixel in an image as spliced or authentic). Although the single-task network is shown to outperform existing techniques, it can still yield a coarse localization output in certain cases.

Thus, we next proposed the use of a multi-task FCN (MFCN) that is simultaneously trained on the surface ground truth mask and the edge ground truth mask, which indicates whether each pixel belongs to the boundary of the spliced region. For the MFCN-based method, we presented two different inference approaches. In the first approach, we compute the binary system output mask by thresholding the surface output probability map. In this approach, the edge output probability map is not utilized in the inference step. This first MFCN-based inference approach is shown to outperform the SFCN-based approach. In the second MFCN-based inference approach, which we refer to as edge-enhanced MFCN, we utilize both the surface and edge output probability maps when generating the binary system output mask. The edge-enhanced MFCN is shown to yield finer localization of the spliced region, as compared to the SFCN-based approach and the MFCN without edge-enhanced inference.

The proposed methods were evaluated on manipulated images from the Carvalho, CASIA v1.0, Columbia, and the DARPA/NIST Nimble Challenge 2016 SCI datasets. The experimental results showed that the proposed methods outperform existing splicing localization methods on these datasets, with the edge-enhanced MFCN performing the best.

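Personal Summary

The main contributions of this paper:

  1. Several FCN models, all built on other people's architectures. Thumbs down!
  2. A comparison of two inference approaches: one predicts directly, the other also uses the edge probability map.
  3. The small modifications to the architecture are still worth borrowing, including batch normalization layers and class weighting; I will take a closer look at the code.
  4. Training requires edge labels, which you have to generate yourself.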
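Since the edge labels have to be generated, here is one simple way to derive them from a surface ground truth mask: mark every spliced pixel that has a non-spliced pixel within a small neighborhood. This is a guess at a reasonable recipe, not the procedure used in the paper; the function name `edge_label_from_mask` and the `width` parameter are hypothetical.

```python
def edge_label_from_mask(mask, width=1):
    """Mark spliced pixels that lie within `width` pixels of a non-spliced pixel.

    mask: 2-D list of rows, 1 = spliced, 0 = authentic.
    Illustrative sketch only; the paper's own edge-label recipe may differ.
    """
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            for dy in range(-width, width + 1):
                for dx in range(-width, width + 1):
                    ny, nx = y + dy, x + dx
                    # Positions beyond the image border are ignored, so a
                    # spliced region touching the border gets no edge there.
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0:
                        edge[y][x] = 1
    return edge
```

A larger `width` gives a thicker boundary band, which may be easier to learn given how few edge pixels there are compared with the rest of the image.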