Paper Walkthrough - RRU-Net: The Ringed Residual U-Net for Image Splicing Forgery Detection
Abstract
- The proposed RRU-Net is an end-to-end image essence attribute segmentation network that is independent of the human visual system and accomplishes forgery detection without any preprocessing or post-processing. The core idea of RRU-Net is to strengthen the way a CNN learns; it is inspired by the recall and consolidation mechanisms of the human brain and implemented through the propagation and feedback of the residual in the CNN. Residual propagation recalls the input feature information to solve the gradient degradation problem in deeper networks; residual feedback consolidates the input feature information to make the differences in image attributes between the un-tampered and tampered regions more obvious.
1. Introduction
- To improve the detected tampered regions, the detection methods in [1, 27] use non-overlapping image patches as the input of CNNs. However, when an image patch comes entirely from the tampered regions, it will be labeled as un-tampered. In [15], the authors use bigger image patches to reveal the image attributes of the tampered regions, but the detection method may fail if the forgery image is small. Because the existing CNN-based detection methods take image patches as the network input, the contextual spatial information is lost, which easily causes incorrect predictions. Moreover, when the network architecture gets deeper, the gradient degradation problem appears and the discriminative power of the features weakens, which makes splicing forgery detection more difficult or even causes it to fail.
- To overcome the drawbacks of traditional feature extraction-based methods and to further solve the problems of current CNN-based detection methods, a ringed residual U-Net (RRU-Net) is proposed in this paper. RRU-Net is an end-to-end image essence attribute segmentation network that is independent of the human visual system and can directly locate the forged regions without any preprocessing or post-processing. Furthermore, RRU-Net can effectively decrease incorrect predictions since it makes better use of the contextual spatial information in an image. Most importantly, the ringed residual structure in RRU-Net strengthens the way a CNN learns and simultaneously prevents the gradient degradation problem of deeper networks, which keeps the image essence attribute features discriminative as they are extracted across the layers of the network.
3. The Ringed Residual U-Net (RRU-Net)
3.1. Residual Propagation
According to the discussion above, the differences in image essence attributes are the main basis for detecting image splicing forgery; however, the gradient degradation problem destroys this basis when the network architecture gets deeper. To solve the gradient degradation problem, we add residual propagation to each group of stacked layers. A building block is shown in Fig. 2; it consists of two convolutional (dilated convolution [31], dconv) layers and residual propagation. The output of the building block is defined as:
\[y_{f}=F\left(x,\left\{W_{i}\right\}\right)+W_{s} * x \tag{2}\]
where \(x\) and \(y_{f}\) are the input and output of the building block, \(W_{i}\) represents the weights of layer \(i\), and the function \(F\left(x,\left\{W_{i}\right\}\right)\) represents the residual mapping to be learned. For the example in Fig. 2, which has two convolutional layers, \(F=W_{2} \sigma\left(W_{1} * x\right)\), in which \(\sigma\) denotes ReLU [19] and the biases are omitted to simplify the notation. The linear projection \(W_{s}\) is used to change the dimension of \(x\) to match the dimension of \(F\left(x,\left\{W_{i}\right\}\right)\). The operation \(F + W_{s} * x\) is performed by a shortcut connection and element-wise addition.
Residual propagation resembles the recall mechanism of the human brain. We may forget previous knowledge as we learn more new knowledge, so we need the recall mechanism to help us revive those earlier, fuzzy memories.
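The building-block output of Section 3.1, \(y_{f}=F(x,\{W_{i}\})+W_{s} * x\) with \(F=W_{2}\sigma(W_{1} * x)\), can be traced by hand on a toy input. The sketch below is only an illustration under stated assumptions: each convolution is reduced to a matrix-vector product on one pixel's channel vector (a 1x1 convolution), and the weights `W1`, `W2`, `Ws` are made-up numbers, not values from the paper.

```python
# Residual propagation on a single feature vector.
# Assumption: "*" is simplified to a matrix-vector product (a 1x1 convolution
# seen at one pixel); all weights below are hypothetical.

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def relu(v):
    """Element-wise ReLU, the sigma in F = W2 * sigma(W1 * x)."""
    return [max(0.0, a) for a in v]

def residual_propagation(x, W1, W2, Ws):
    """Return y_f = W2 * relu(W1 * x) + Ws * x."""
    residual = matvec(W2, relu(matvec(W1, x)))  # F(x, {W1, W2})
    shortcut = matvec(Ws, x)                    # Ws lifts x to F's dimension
    return [r + s for r, s in zip(residual, shortcut)]

# Toy setup: 2 input channels mapped to 3 output channels.
x = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                 # 3x2
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 3x3 identity
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]                 # 3x2 projection

y_f = residual_propagation(x, W1, W2, Ws)
print(y_f)  # [2.0, -2.0, 0.0]
```

In this toy case the ReLU inside \(F\) zeroes the second channel, yet the value -2.0 still reaches the output through the shortcut term \(W_{s} * x\). That is the "recall" role of residual propagation: the input information is carried forward even where the learned mapping discards it.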
3.2. Residual Feedback
It is obvious that, in splicing forgery detection, if the differences in image essence attributes between the un-tampered and tampered regions can be further strengthened, the detection performance can be further improved. In [36], the proposed method superposes an additional difference in noise attributes by passing the forged image through an SRM filter layer to enhance the detection results. The SRM filter layer has a certain effect; however, it is a manually chosen method and only works for RGB image forgery detection. Moreover, when the un-tampered and tampered regions come from cameras of the same brand and model, the effectiveness of the SRM filter layer drops sharply, since the regions have the same noise attributes. To further strengthen the differences in image essence attributes, the residual feedback is proposed; it is an automatic learning method and does not focus on just one or several specific image attributes. Furthermore, we design a simple and effective attention mechanism, which draws on the ideas of Hu et al. [9], and add it to the residual feedback so that more attention is paid to the discriminative features of the input information. In this attention mechanism, we employ a simple gating mechanism with a sigmoid activation function to learn a nonlinear interaction between discriminative feature channels and avoid diffusion of feature information, and then we superpose the response values obtained by the sigmoid activation on the input information to amplify the differences in image essence attributes between the un-tampered and tampered regions. The residual feedback in a building block is shown in Fig. 3 and is defined as Eq.(3),
where \(x\) is the input, \(y_{f}\) is the output of residual propagation defined in Eq.(2) (Section 3.1), and \(y_{b}\) is the enhanced input. The function \(G\) is a linear projection, which is used to change the dimensions of \(y_{f}\). The function \(s\) is a sigmoid activation function.
In contrast to the recall mechanism imitated by residual propagation, the residual feedback acts like the consolidation mechanism of the human brain: we need to consolidate the knowledge we have already learned in order to obtain a new understanding of the features. The residual feedback can amplify the differences in image essence attributes between the un-tampered and tampered regions in the input; as shown in Fig. 1(c), the tampered region 'eagle' is amplified to the global maximal response values by the residual feedback. Furthermore, it also has two far-reaching effects:
(1) the strengthening of the discriminative features can simultaneously be viewed as the repression of the negative label features;
(2) the network converges faster during training.
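The gating described in Section 3.2 can be sketched on the same kind of toy vectors. The form used here, \(y_{b}=\left(s\left(G\left(y_{f}\right)\right)+1\right) * x\), is an assumption reconstructed from the definitions of \(x\), \(y_{f}\), \(y_{b}\), \(G\) and \(s\) above (sigmoid responses superposed on the input), so please check it against Eq.(3) in the paper. The projection `G` and all numbers are hypothetical, and the product with \(x\) is taken element-wise.

```python
import math

# Residual feedback on a single feature vector.
# Assumed form: y_b = (sigmoid(G * y_f) + 1) * x, element-wise on x.
# G is a hypothetical linear projection back to the dimension of x.

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sigmoid(a):
    """Logistic function s(a), with values strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))

def residual_feedback(x, y_f, G):
    """Return the enhanced input y_b = (sigmoid(G * y_f) + 1) * x."""
    gate = [sigmoid(g) for g in matvec(G, y_f)]      # response values
    return [(g + 1.0) * a for g, a in zip(gate, x)]  # superpose on the input

x = [1.0, -2.0]                          # input with 2 channels
y_f = [2.0, -2.0, 0.0]                   # residual-propagation output, 3 channels
G = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]   # hypothetical 2x3 projection

y_b = residual_feedback(x, y_f, G)
print([round(v, 4) for v in y_b])
```

Under this assumed form the scaling factor always lies between 1 and 2, so the feedback never erases an input feature; it only amplifies it, and more strongly where the gate response is high. In the toy run the first channel (gate 0.5) is scaled by 1.5, while the second channel (gate about 0.88) is scaled by about 1.88.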
3.3. Ringed Residual Structure and Network Architectures
- The proposed ringed residual structure, which combines residual propagation and residual feedback, is shown in Fig. 4.
- To sum up, the ringed residual structure keeps the image essence attribute features discriminative as they are extracted across the layers of the network, which yields better and more stable detection performance than traditional feature extraction-based detection methods and existing CNN-based detection methods. The network architecture of RRU-Net is shown in Fig. 5; it is an end-to-end image essence attribute segmentation network that can directly detect splicing forgery without any preprocessing or post-processing.
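One plausible way to read the "ring" is as a forward pass, a feedback pass that enhances the input, and a second forward pass on the enhanced input. The sketch below follows that reading on one pixel's channel vector. It is not the authors' implementation: the feedback form \(y_{b}=(s(G(y_{f}))+1) * x\), the reuse of the same weights in both forward passes, and every weight value are assumptions made for illustration.

```python
import math

# A toy ringed residual block: propagate -> feed back -> propagate again.
# Assumptions: convolutions reduced to matrix-vector products, feedback of the
# form (sigmoid(G * y_f) + 1) * x, weights shared by both forward passes.

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def propagate(x, W1, W2, Ws):
    """Residual propagation: W2 * relu(W1 * x) + Ws * x."""
    residual = matvec(W2, relu(matvec(W1, x)))
    return [r + s for r, s in zip(residual, matvec(Ws, x))]

def feedback(x, y_f, G):
    """Residual feedback: (sigmoid(G * y_f) + 1) * x, element-wise."""
    return [(sigmoid(g) + 1.0) * a for g, a in zip(matvec(G, y_f), x)]

def ringed_residual_block(x, W1, W2, Ws, G):
    y_f = propagate(x, W1, W2, Ws)     # recall: first forward pass
    y_b = feedback(x, y_f, G)          # consolidation: enhance the input
    return propagate(y_b, W1, W2, Ws)  # forward pass on the enhanced input

x = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
G = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

out = ringed_residual_block(x, W1, W2, Ws, G)
print([round(v, 4) for v in out])
```

With these toy weights a single forward pass gives `[2.0, -2.0, 0.0]`, while the ringed block gives roughly `[3.0, -3.7616, 0.0]`. The nonzero responses grow and the zero response stays at zero, so the gap between them widens, which is the effect the section attributes to the ringed structure.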
4.1. Detection at Pixel Level
4.2. Detection at Image Level
5. Conclusion
- In this paper, we propose a ringed residual U-Net (RRU-Net) for image splicing forgery detection, which is an end-to-end image essence attribute segmentation network and can achieve forgery detection without any preprocessing or post-processing. Inspired by the recall and consolidation mechanisms of the human brain, the proposed RRU-Net strengthens the way a CNN learns through the propagation and feedback of the residual. We also verify the validity of the ringed residual structure in RRU-Net through theoretical analysis and experimental comparison. In future work, we will further explore and visualize the latent discriminative features between tampered and un-tampered regions to explain the key issues of image splicing forgery detection.