The difference between the SAME and VALID padding modes in TensorFlow
TensorFlow supports two padding modes. The SAME mode is the special one: it can produce asymmetric padding, which tripped me up while validating a TensorFlow network.
TensorFlow's formulas
The 2D convolution interface
tf.nn.conv2d(
input, filters, strides, padding, data_format='NHWC', dilations=None, name=None
)
The padding formula
Pay attention to the padding argument: as a string it can be either SAME or VALID, while as a list of integers it specifies the exact amount of padding along each dimension.
First, when padding is given as explicit numbers, it contains one [before, after] pair per input dimension, i.e. twice as many entries as the input has dimensions, because every dimension can be padded on two sides (left and right for width, top and bottom for height). TensorFlow never pads the N or C dimension, only H and W, so for NHWC the list is
padding = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]
while for NCHW the pairs are reordered accordingly: padding = [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]].
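As a quick illustration of the [before, after] pair layout, np.pad happens to take exactly the same padding structure (the shapes here are made up for illustration):

```python
import numpy as np

# Hypothetical shapes, chosen only to illustrate the layout
N, H, W, C = 1, 5, 6, 3
x = np.ones([N, H, W, C])  # NHWC layout

pad_top, pad_bottom, pad_left, pad_right = 1, 2, 1, 1

# One (before, after) pair per dimension; N and C are never padded
padding_nhwc = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]
padded = np.pad(x, padding_nhwc)  # np.pad uses the same (before, after) layout
print(padded.shape)  # (1, 8, 8, 3)

# For NCHW, the spatial pairs simply move to the last two slots
padding_nchw = [[0, 0], [0, 0], [pad_top, pad_bottom], [pad_left, pad_right]]
```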
Second, when padding is given as a string option, the implied pads can still be mapped onto the explicit padding argument:
- VALID mode pads nothing in any dimension, equivalent to padding = [[0, 0], [0, 0], [0, 0], [0, 0]];
- SAME mode keeps the output size equal to the input size measured at the scale of the stride \(S\) (taking the width dimension as an example), i.e. it must satisfy
\[W_{o}=\left\lceil\frac{W_{i}}{S}\right\rceil \]
We know that if dilation = 1, then along a given dimension the input width \(W_i\), output width \(W_o\) and stride \(S\) satisfy, without any padding,
\[W_{o}=\left\lfloor\frac{W_{i}-W_{k}}{S}\right\rfloor+1 \]
Taking the minimal total pad \(P_a\) as the baseline, the relation becomes
\[W_{o}=\frac{W_{i}+P_{a}-W_{k}}{S}+1 \]
and solving for the pad gives
\[P_{a}=\left(W_{o}-1\right) S+W_{k}-W_{i} \]
This is the total amount of padding needed, and TensorFlow splits it across the two sides as symmetrically as possible:
- if \(P_a\) is even, both sides get \(P_a/2\);
- if \(P_a\) is odd, the left gets the smaller half \(P_l = \lfloor{P_a/2}\rfloor\) and the right gets the larger half \(P_r = P_a-P_l\).
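The total-pad formula and its even/odd split can be sketched as a small helper. This is a sketch of the formulas above, not TensorFlow's actual implementation; the shapes are chosen to match the example that follows:

```python
import math

def same_pads(W_i, W_k, S):
    """Total SAME pad P_a and its (before, after) split, per the formulas above."""
    W_o = math.ceil(W_i / S)                  # SAME output size
    P_a = max((W_o - 1) * S + W_k - W_i, 0)   # total pad, clamped at zero
    pad_before = P_a // 2                     # smaller half goes on the top/left
    pad_after = P_a - pad_before              # larger half on the bottom/right
    return P_a, pad_before, pad_after

# Shapes from the example below: H=7, W=8, 4x4 kernel, stride 3
print(same_pads(7, 4, 3))  # (3, 1, 2)
print(same_pads(8, 4, 3))  # (2, 1, 1)
```

These (1, 2) and (1, 1) splits are exactly the explicit padding list used in the code below.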
See the following code:
import numpy as np
import tensorflow as tf

inputH, inputW = 7, 8
strideH, strideW = 3, 3
filterH = 4
filterW = 4
inputC = 16
outputC = 3
inputData = np.ones([1, inputH, inputW, inputC]) # format [N, H, W, C]
filterData = np.float16(np.ones([filterH, filterW, inputC, outputC]) - 0.33)
strides = (1, strideH, strideW, 1)
convOutputSame = tf.nn.conv2d(inputData, filterData, strides, padding='SAME')
convOutput = tf.nn.conv2d(inputData, filterData, strides, padding=[[0,0],[1,2],[1,1],[0,0]]) # explicit pads equivalent to SAME
print("output1, ", convOutputSame)
print("output2, ", convOutput)
print("Sum of a - b is ", np.sum(np.square(convOutputSame - convOutput)))
The result is
output1, tf.Tensor(
[[[[ 96.46875 96.46875 96.46875]
[128.625 128.625 128.625 ]
[ 96.46875 96.46875 96.46875]]
[[128.625 128.625 128.625 ]
[171.5 171.5 171.5 ]
[128.625 128.625 128.625 ]]
[[ 64.3125 64.3125 64.3125 ]
[ 85.75 85.75 85.75 ]
[ 64.3125 64.3125 64.3125 ]]]])
output2, tf.Tensor(
[[[[ 96.46875 96.46875 96.46875]
[128.625 128.625 128.625 ]
[ 96.46875 96.46875 96.46875]]
[[128.625 128.625 128.625 ]
[171.5 171.5 171.5 ]
[128.625 128.625 128.625 ]]
[[ 64.3125 64.3125 64.3125 ]
[ 85.75 85.75 85.75 ]
[ 64.3125 64.3125 64.3125 ]]]], shape=(1, 3, 3, 3), dtype=float64)
Sum of a - b is 0.0
ONNX's formulas
ONNX's interface, per the IR definition, is as follows:
std::function<void(OpSchema&)> ConvOpSchemaGenerator(const char* filter_desc) {
  return [=](OpSchema& schema) {
    std::string doc = R"DOC(
The convolution operator consumes an input tensor and {filter_desc}, and
computes the output.)DOC";
    ReplaceAll(doc, "{filter_desc}", filter_desc);
    schema.SetDoc(doc);
    schema.Input(
        0,
        "X",
        "Input data tensor from previous layer; "
        "has size (N x C x H x W), where N is the batch size, "
        "C is the number of channels, and H and W are the "
        "height and width. Note that this is for the 2D image. "
        "Otherwise the size is (N x C x D1 x D2 ... x Dn). "
        "Optionally, if dimension denotation is "
        "in effect, the operation expects input data tensor "
        "to arrive with the dimension denotation of [DATA_BATCH, "
        "DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].",
        "T");
    schema.Input(
        1,
        "W",
        "The weight tensor that will be used in the "
        "convolutions; has size (M x C/group x kH x kW), where C "
        "is the number of channels, and kH and kW are the "
        "height and width of the kernel, and M is the number "
        "of feature maps. For more than 2 dimensions, the "
        "kernel shape will be (M x C/group x k1 x k2 x ... x kn), "
        "where (k1 x k2 x ... kn) is the dimension of the kernel. "
        "Optionally, if dimension denotation is in effect, "
        "the operation expects the weight tensor to arrive "
        "with the dimension denotation of [FILTER_OUT_CHANNEL, "
        "FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL ...]. "
        "X.shape[1] == (W.shape[1] * group) == C "
        "(assuming zero based indices for the shape array). "
        "Or in other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL. ",
        "T");
    schema.Input(
        2,
        "B",
        "Optional 1D bias to be added to the convolution, has size of M.",
        "T",
        OpSchema::Optional);
    schema.Output(
        0,
        "Y",
        "Output data tensor that contains the result of the "
        "convolution. The output dimensions are functions "
        "of the kernel size, stride size, and pad lengths.",
        "T");
    schema.TypeConstraint(
        "T",
        {"tensor(float16)", "tensor(float)", "tensor(double)"},
        "Constrain input and output types to float tensors.");
    schema.Attr(
        "kernel_shape",
        "The shape of the convolution kernel. If not present, should be inferred from input W.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "dilations",
        "dilation value along each spatial axis of the filter. If not present, the dilation defaults is 1 along each spatial axis.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "strides",
        "Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.",
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "auto_pad",
        auto_pad_doc,
        AttributeProto::STRING,
        std::string("NOTSET"));
    schema.Attr(
        "pads",
        pads_doc,
        AttributeProto::INTS,
        OPTIONAL);
    schema.Attr(
        "group",
        "number of groups input channels and output channels are divided into.",
        AttributeProto::INT,
        static_cast<int64_t>(1));
    schema.TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
      propagateElemTypeFromInputToOutput(ctx, 0, 0);
      convPoolShapeInference(ctx, true, false, 0, 1);
    });
  };
}
Note the auto_pad option above. Similar to TensorFlow's string modes, it controls automatic padding:
- NOTSET means the explicit pads attribute is used instead; its default of all zeros behaves the same as TensorFlow's VALID;
- VALID pads nothing, exactly like TensorFlow's VALID;
- SAME_UPPER or SAME_LOWER, whose behaviour can be read from the code in the ONNX file defs.cc:
中的程式碼std::vector<int64_t> pads; if (getRepeatedAttribute(ctx, "pads", pads)) { if (pads.size() != n_input_dims * 2) { fail_shape_inference("Attribute pads has incorrect size"); } } else { pads.assign(n_input_dims * 2, 0); // pads的size是輸入維度的2倍 const auto* auto_pad_attr = ctx.getAttribute("auto_pad"); if ((nullptr != auto_pad_attr) && (auto_pad_attr->s() != "VALID")) { // 如果pad mode是SAME_UPPER或者SAME_LOWER則進入該分支 int input_dims_size = static_cast<int>(n_input_dims); for (int i = 0; i < input_dims_size; ++i) { // 計算每個axis的pads int64_t residual = 0; int64_t stride = strides[i]; if (stride > 1) { // 如果stride == 1,那麼total_pad就是Wk - Stride = Wk - 1 if (!input_shape.dim(2 + i).has_dim_value()) { continue; } residual = input_shape.dim(2 + i).dim_value(); while (residual >= stride) { residual -= stride; } } int64_t total_pad = residual == 0 ? effective_kernel_shape[i] - stride : effective_kernel_shape[i] - residual; // effective_kernel_shape = Wk if (total_pad < 0) total_pad = 0; int64_t half_pad_small = total_pad >> 1; int64_t half_pad_big = total_pad - half_pad_small; if (auto_pad_attr->s() == "SAME_UPPER") { // pad mode is here pads[i] = half_pad_small; pads[i + input_dims_size] = half_pad_big; } else if (auto_pad_attr->s() == "SAME_LOWER") { pads[i] = half_pad_big; pads[i + input_dims_size] = half_pad_small; } } } }
The hardest part of the code above is the residual loop that computes total_pad. This total_pad is in fact the \(P_a\) from the TensorFlow formula: pushing the earlier derivation one step further,
\[\begin{aligned} P_{a}&=\left(W_{o}-1\right) S+W_{k}-W_{i}\\ &=\left(\left\lceil\frac{W_{i}}{S}\right\rceil - 1\right)S+W_{k}-W_{i} \end{aligned} \]
The analysis splits into two cases, matching the ternary expression that chooses between residual == 0 and residual != 0:
- If \(W_i\) is an integer multiple of \(S\), i.e. \(W_i = nS\), substituting into the formula gives \(P_a = W_k - S\);
- If \(W_i\) is not a multiple of \(S\), i.e. \(W_i = nS+m\) with \(m \gt 0\), substituting gives \(P_a = W_k - m\), where \(m\) is the remainder of \(W_i\) divided by the stride, i.e. residual in the code.
SAME_UPPER and SAME_LOWER only differ when \(P_a\) is odd; when it is even, the two give identical results. When it is odd, SAME_UPPER puts the smaller half \(\lfloor{P_a/2}\rfloor\) at the beginning and the larger half at the end, while SAME_LOWER puts the larger half \(P_a - \lfloor{P_a/2}\rfloor\) at the beginning.
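To check that the ONNX logic really matches TensorFlow's \(P_a\), here is a sketch that transcribes the residual computation into Python and compares it against the formula. The helper names are mine, and dilation is assumed to be 1 so that effective_kernel_shape is simply \(W_k\):

```python
import math

def onnx_total_pad(W_i, W_k, S):
    """Python transcription of the residual logic in the ONNX shape inference above."""
    residual = 0
    if S > 1:
        residual = W_i % S  # the while-loop in the C++ code computes W_i mod S
    total_pad = W_k - S if residual == 0 else W_k - residual
    return max(total_pad, 0)

def tf_total_pad(W_i, W_k, S):
    """P_a = (ceil(W_i / S) - 1) * S + W_k - W_i from the TensorFlow derivation."""
    return max((math.ceil(W_i / S) - 1) * S + W_k - W_i, 0)

# The two computations agree over a range of shapes
for W_i in range(1, 30):
    for W_k in range(1, 8):
        for S in range(1, 6):
            assert onnx_total_pad(W_i, W_k, S) == tf_total_pad(W_i, W_k, S)
print("match")
```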
An example
There is a concrete example in python - What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? - Stack Overflow. It shows that SAME mode lets the stride cover the whole input exactly, with nothing left over and nothing missing.
- VALID mode

inputs:  1  2  3  4  5  6  7  8  9  10 11 (12 13)
        |________________|                 dropped
                       |_________________|

- SAME mode

            pad|                                      |pad
inputs:   0 |1  2  3  4  5  6  7  8  9  10 11 12 13| 0  0
         |________________|
                        |_________________|
                                       |________________|
In this example, \(W_i = 13, W_k = 6, S = 5\).
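Plugging these numbers into the formulas above confirms the two diagrams (a quick numeric check, not taken from the Stack Overflow post itself):

```python
import math

W_i, W_k, S = 13, 6, 5  # numbers from the Stack Overflow example above

# VALID: no pad; the trailing inputs that do not fill a window are dropped
valid_out = (W_i - W_k) // S + 1
print(valid_out)  # 2

# SAME: the output is ceil(W_i / S) and the pad makes up the shortfall
same_out = math.ceil(W_i / S)
P_a = (same_out - 1) * S + W_k - W_i
print(same_out, P_a)  # 3 3 -> pad 1 on the left and 2 on the right
```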
References
- TensorFlow中CNN的兩種padding方式“SAME”和“VALID” - wuzqChom's blog on CSDN
- python - What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? - Stack Overflow
- What does the 'same' padding parameter in convolution mean in TensorFlow? - Quora