
The difference between the SAME and VALID padding modes in TensorFlow

TensorFlow supports two padding modes. The SAME mode is the tricky one: it can produce asymmetric padding, a pitfall I hit while verifying a TensorFlow network.

TensorFlow's padding formula

The 2D convolution interface

tf.nn.conv2d(
    input, filters, strides, padding, data_format='NHWC', dilations=None, name=None
)

How padding is computed

Pay attention to the padding argument: if it is a string, the two options are SAME and VALID; if it is a list of numbers, it explicitly specifies how much padding is applied along each dimension.

First, when padding is given as explicit numbers, it has twice as many entries as the input has dimensions, because every dimension is padded on two sides (left and right for width, top and bottom for height). However, TensorFlow never pads the N or C dimension, only H and W, so for NHWC

padding = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]], while for NCHW the entries are reordered accordingly: padding = [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]].

Second, when a string option is passed, the padding it applies can still be expressed through the same padding parameter:

  • VALID means no padding along any dimension, equivalent to padding = [[0, 0], [0, 0], [0, 0], [0, 0]].

  • SAME means that the output size tracks the input size at the scale of the stride \(S\); taking the width dimension as an example, the output width \(W_o\) and input width \(W_i\) must satisfy

    \[W_{o}=\left\lceil\frac{W_{i}}{S}\right\rceil \]

    We know that when dilation = 1 and there is no padding, the input width \(W_i\), the kernel width \(W_k\), the output width \(W_o\) and the stride \(S\) along a given dimension satisfy

    \[W_{o}=\left\lfloor\frac{W_{i}-W_{k}}{S}\right\rfloor+1 \]

    Taking the minimal total padding \(P_a\) needed to reach the SAME output size as the baseline, we get

    \[W_{o}=\frac{W_{i}+P_{a}-W_{k}}{S}+1 \]

    Solving for \(P_a\) gives

    \[P_{a}=\left(W_{o}-1\right) S+W_{k}-W_{i} \]

    This is the total amount of padding to add. TensorFlow distributes it as symmetrically as possible over the two sides (a small sketch follows this list):

    • If \(P_a\) is even, both sides get \(P_l = P_a/2\).
    • If \(P_a\) is odd, the left (or top) side gets \(P_l = \lfloor{P_a/2}\rfloor\) and the right (or bottom) side gets \(P_r = P_a-P_l\).
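
This rule can be written down directly; the helper below is only a minimal sketch (the name same_pad_1d is made up for illustration, not a TensorFlow API) that returns the (before, after) padding SAME would apply along one axis.

import math

def same_pad_1d(W_i, W_k, S):
    """Padding (before, after) applied by SAME mode along one axis."""
    W_o = math.ceil(W_i / S)                 # SAME keeps W_o = ceil(W_i / S)
    P_a = max((W_o - 1) * S + W_k - W_i, 0)  # total padding, never negative
    P_l = P_a // 2                           # smaller half on the top/left
    return P_l, P_a - P_l                    # larger half on the bottom/right

# Shapes used in the example below: H = 7, W = 8, 4x4 kernel, stride 3
print(same_pad_1d(7, 4, 3))  # (1, 2)
print(same_pad_1d(8, 4, 3))  # (1, 1)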

The following code verifies this: the explicit padding list is chosen to match what SAME computes for this input shape.

import numpy as np
import tensorflow as tf

inputH, inputW = 7, 8
strideH, strideW = 3, 3
filterH = 4
filterW = 4
inputC = 16
outputC = 3
inputData = np.ones([1,inputH,inputW,inputC]) # format [N, H, W, C]
filterData = np.float16(np.ones([filterH, filterW, inputC, outputC]) - 0.33)
strides = (1, strideH, strideW, 1)
convOutputSame = tf.nn.conv2d(inputData, filterData, strides, padding='SAME')  # let TF resolve SAME
convOutput = tf.nn.conv2d(inputData, filterData, strides, padding=[[0,0],[1,2],[1,1],[0,0]]) # explicit pads that SAME resolves to for this shape
print("output1, ", convOutputSame)
print("output2, ", convOutput)
print("Sum of a - b is ", np.sum(np.square(convOutputSame - convOutput)))

The result is

    output1,  tf.Tensor(
    [[[[ 96.46875  96.46875  96.46875]
       [128.625   128.625   128.625  ]
       [ 96.46875  96.46875  96.46875]]

      [[128.625   128.625   128.625  ]
       [171.5     171.5     171.5    ]
       [128.625   128.625   128.625  ]]

      [[ 64.3125   64.3125   64.3125 ]
       [ 85.75     85.75     85.75   ]
       [ 64.3125   64.3125   64.3125 ]]]])

    output2,  tf.Tensor(
    [[[[ 96.46875  96.46875  96.46875]
       [128.625   128.625   128.625  ]
       [ 96.46875  96.46875  96.46875]]

      [[128.625   128.625   128.625  ]
       [171.5     171.5     171.5    ]
       [128.625   128.625   128.625  ]]

      [[ 64.3125   64.3125   64.3125 ]
       [ 85.75     85.75     85.75   ]
       [ 64.3125   64.3125   64.3125 ]]]], shape=(1, 3, 3, 3), dtype=float64)
    Sum of a - b is  0.0

ONNX's padding formula

ONNX's interface, as given by the operator schema in the IR definition, is the following:

std::function<void(OpSchema&)> ConvOpSchemaGenerator(const char* filter_desc) {
  return [=](OpSchema& schema) {
    std::string doc = R"DOC(
The convolution operator consumes an input tensor and {filter_desc}, and
computes the output.)DOC";
    ReplaceAll(doc, "{filter_desc}", filter_desc);
    schema.SetDoc(doc);
    schema.Input(
        0,
        "X",
        "Input data tensor from previous layer; "
        "has size (N x C x H x W), where N is the batch size, "
        "C is the number of channels, and H and W are the "
        "height and width. Note that this is for the 2D image. "
        "Otherwise the size is (N x C x D1 x D2 ... x Dn). "
        "Optionally, if dimension denotation is "
        "in effect, the operation expects input data tensor "
        "to arrive with the dimension denotation of [DATA_BATCH, "
        "DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].",
        "T");
    schema.Input(
        1,
        "W",
        "The weight tensor that will be used in the "
        "convolutions; has size (M x C/group x kH x kW), where C "
        "is the number of channels, and kH and kW are the "
        "height and width of the kernel, and M is the number "
        "of feature maps. For more than 2 dimensions, the "
        "kernel shape will be (M x C/group x k1 x k2 x ... x kn), "
        "where (k1 x k2 x ... kn) is the dimension of the kernel. "
        "Optionally, if dimension denotation is in effect, "
        "the operation expects the weight tensor to arrive "
        "with the dimension denotation of [FILTER_OUT_CHANNEL, "
        "FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL ...]. "
        "X.shape[1] == (W.shape[1] * group) == C "
        "(assuming zero based indices for the shape array). "
        "Or in other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL. ",
        "T");
    schema.Input(
        2,
        "B",
        "Optional 1D bias to be added to the convolution, has size of M.",
        "T",
        OpSchema::Optional);
    schema.Output(
        0,
        "Y",
        "Output data tensor that contains the result of the "
        "convolution. The output dimensions are functions "
        "of the kernel size, stride size, and pad lengths.",
        "T");
    schema.TypeConstraint(
        "T",
        {"tensor(float16)", "tensor(float)", "tensor(double)"},
        "Constrain input and output types to float tensors.");
    schema.Attr(
        "kernel_shape",
        "The shape of the convolution kernel. If not present, should be inferred from input W.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "dilations",
        "dilation value along each spatial axis of the filter. If not present, the dilation defaults is 1 along each spatial axis.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "strides",
        "Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "auto_pad",
        auto_pad_doc,
        AttributeProto::STRING,
        std::string("NOTSET"));
    schema.Attr(
        "pads",
        pads_doc,
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "group",
        "number of groups input channels and output channels are divided into.",
        AttributeProto::INT,
        static_cast<int64_t>(1));
    schema.TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
      propagateElemTypeFromInputToOutput(ctx, 0, 0);
      convPoolShapeInference(ctx, true, false, 0, 1);
    });
  };
}
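
For reference, here is a minimal sketch of how these attributes look when building a Conv node with the onnx Python helpers; the attribute values are illustrative only and are not taken from the original post.

from onnx import helper

# A Conv node that relies on auto_pad instead of an explicit pads attribute.
conv_node = helper.make_node(
    "Conv",
    inputs=["X", "W"],       # input tensor and weight tensor, as in the schema above
    outputs=["Y"],
    auto_pad="SAME_UPPER",   # one of NOTSET / VALID / SAME_UPPER / SAME_LOWER
    kernel_shape=[4, 4],
    strides=[3, 3],
)
print(conv_node)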

Note the auto_pad attribute in the schema above. As in TensorFlow, the settings of interest are:

  • NOTSET (the default) uses the explicit pads attribute, which defaults to all zeros; with zero pads it behaves like TensorFlow's VALID (ONNX also has an explicit VALID value with the same effect).

  • SAME_UPPER or SAME_LOWER; how these are resolved can be seen in the code from ONNX's defs.cc:

    std::vector<int64_t> pads;
      if (getRepeatedAttribute(ctx, "pads", pads)) {
        if (pads.size() != n_input_dims * 2) {
          fail_shape_inference("Attribute pads has incorrect size");
        }
      } else {
        pads.assign(n_input_dims * 2, 0); // pads has twice as many entries as there are spatial dims
        const auto* auto_pad_attr = ctx.getAttribute("auto_pad");
        if ((nullptr != auto_pad_attr) && (auto_pad_attr->s() != "VALID")) { // taken when the pad mode is SAME_UPPER or SAME_LOWER
          int input_dims_size = static_cast<int>(n_input_dims);
          for (int i = 0; i < input_dims_size; ++i) { // compute the pads for each axis
            int64_t residual = 0;
            int64_t stride = strides[i];
            if (stride > 1) { // if stride == 1, total_pad ends up as Wk - stride = Wk - 1
              if (!input_shape.dim(2 + i).has_dim_value()) {
                continue;
              }
              residual = input_shape.dim(2 + i).dim_value();
              while (residual >= stride) {
                residual -= stride;
              }
            }
            int64_t total_pad = residual == 0 ? effective_kernel_shape[i] - stride : effective_kernel_shape[i] - residual; // effective_kernel_shape = Wk
            if (total_pad < 0)
              total_pad = 0;
            int64_t half_pad_small = total_pad >> 1;
            int64_t half_pad_big = total_pad - half_pad_small;
            if (auto_pad_attr->s() == "SAME_UPPER") {           // pad mode is here
              pads[i] = half_pad_small;
              pads[i + input_dims_size] = half_pad_big;
            } else if (auto_pad_attr->s() == "SAME_LOWER") {
              pads[i] = half_pad_big;
              pads[i + input_dims_size] = half_pad_small;
            }
          }
        }
      }
    

    The hardest part of the code above is the residual loop and the total_pad line. The total_pad computed there is exactly the \(P_a\) from the TensorFlow section; pushing the earlier formula one step further,

    \[\begin{aligned} P_{a}&=\left(W_{o}-1\right) S+W_{k}-W_{i}\\ &=\left(\left\lceil\frac{W_{i}}{S}\right\rceil - 1\right)S+W_{k}-W_{i} \end{aligned} \]

    The analysis splits into two cases, matching the total_pad line in the code:

    • If \(W_i\) is an integer multiple of \(S\), i.e. \(W_i = nS\), substituting into the formula above gives \(P_a = W_k - S\).
    • If \(W_i\) is not a multiple of \(S\), write \(W_i = nS+m\) with \(0 < m < S\); substituting gives \(P_a = W_k - m\), where \(m\) is the remainder of \(W_i\) divided by the stride, i.e. residual in the code.

    SAME_UPPER and SAME_LOWER only differ when \(P_a\) is odd; if it is even both give the same result. When it is odd, SAME_UPPER puts the smaller half \(\lfloor{P_a/2}\rfloor\) at the beginning (extra padding at the end), while SAME_LOWER puts the larger half \(P_a - \lfloor{P_a/2}\rfloor\) at the beginning. TensorFlow's SAME mode behaves like SAME_UPPER, as the pad of [1, 2] on the height axis in the example above shows (a Python sketch of this rule follows).
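
The same arithmetic can be sketched in Python; this is a non-authoritative sketch of the defs.cc logic above for a single spatial axis, and the helper name onnx_same_pad_1d is made up for illustration.

def onnx_same_pad_1d(W_i, W_k, S, mode="SAME_UPPER"):
    """(begin, end) pads for one spatial axis, mirroring the defs.cc snippet above."""
    residual = W_i % S if S > 1 else 0            # the while-loop above is just a modulo
    total_pad = W_k - S if residual == 0 else W_k - residual
    total_pad = max(total_pad, 0)
    half_small = total_pad // 2
    half_big = total_pad - half_small
    if mode == "SAME_UPPER":                      # extra padding goes at the end
        return half_small, half_big
    return half_big, half_small                   # SAME_LOWER: extra padding at the beginning

# Same shapes as the TensorFlow example: H = 7, W = 8, 4x4 kernel, stride 3
print(onnx_same_pad_1d(7, 4, 3))  # (1, 2) -- identical to TensorFlow's SAME
print(onnx_same_pad_1d(8, 4, 3))  # (1, 1)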

Example

The Stack Overflow question python - What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? - Stack Overflow gives a concrete example. It shows that with SAME the stride walks over the entire input with nothing dropped and nothing missing, whereas VALID simply drops whatever does not fit.

  • VALID mode

    inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                    |________________|                dropped
                                   |_________________|
    
  • SAME mode

                   pad|                                      |pad
       inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
                   |________________|
                                  |_________________|
                                                 |________________|
    
    

    In this example \(W_i = 13\), \(W_k = 6\), \(S = 5\); the worked calculation below shows how the pads in the diagram come out.
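
Plugging these numbers into the formulas derived earlier reproduces the padding shown in the SAME diagram:

\[P_{a}=\left(\left\lceil\frac{13}{5}\right\rceil-1\right)\cdot 5+6-13=2\cdot 5+6-13=3,\qquad P_{l}=\lfloor 3/2\rfloor=1,\quad P_{r}=3-1=2 \]

i.e. one zero padded on the left and two zeros on the right, while VALID gives \(\lfloor(13-6)/5\rfloor+1 = 2\) windows and drops inputs 12 and 13.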

References