tensorflow tf.nn.softmax_cross_entropy_with_logits & tf.nn.sparse_softmax_cross_entropy_with_logits

阿新 • • 發佈：2019-01-21

____tz_zs

tf.nn.sparse_softmax_cross_entropy_with_logits

sparse_softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    name=None
)

labels 的輸入是稀疏表示的，是 [0，num_classes）中的一個數值，代表正確分類。labels 的 shape為 [batch_size]，即有batch個數值

logits 是神經網路執行的結果（tensorflow神經網路中沒有softmax層，而是在此方法會進行softmax運算）。logits 的shape是 [batch_size,num_classes]，即 batch數*類別數的矩陣

一般的使用

# 損失函式
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)

tf.nn.softmax_cross_entropy_with_logits

softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    dim=-1,
    name=None
)

logits和labels的shape都是[batch_size, num_classes] ，

異同

兩方法的結果是相同的。sparse_softmax_cross_entropy_with_logits 直接用標籤上計算交叉熵，而 softmax_cross_entropy_with_logits 是標籤的onehot向量參與計算。softmax_cross_entropy_with_logits 的 labels 是 sparse_softmax_cross_entropy_with_logits 的 labels 的一個獨熱版本（one hot version）。

nn_ops.py原始碼

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Wrappers for primitive Neural Net (NN) Operations."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numbers

import numpy as np

from tensorflow.python.framework import dtypes
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import ops
from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import tensor_util
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gen_nn_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import random_ops
# go/tf-wildcard-import
# pylint: disable=wildcard-import
from tensorflow.python.ops.gen_nn_ops import *
# pylint: enable=wildcard-import

# Aliases for some automatically-generated names.
local_response_normalization = gen_nn_ops.lrn

# pylint: disable=protected-access


def _non_atrous_convolution(input, filter, padding, data_format=None,  # pylint: disable=redefined-builtin
                            strides=None, name=None):
  """Computes sums of N-D convolutions (actually cross correlation).

  It is required that 1 <= N <= 3.

  This is used to implement the more generic `convolution` function, which
  extends the interface of this function with a `dilation_rate` parameter.

  Args:

    input: Rank N+2 tensor of type T of shape
      `[batch_size] + input_spatial_shape + [in_channels]` if `data_format`
      does not start with `"NC"`, or
      `[batch_size, in_channels] + input_spatial_shape` if `data_format` starts
      with `"NC"`.
    filter: Rank N+2 tensor of type T of shape
      `filter_spatial_shape + [in_channels, out_channels]`.  Rank of either
      `input` or `filter` must be known.
    padding: Padding method to use, must be either "VALID" or "SAME".
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".
    strides: Sequence of N positive integers, defaults to `[1] * N`.
    name: Name prefix to use.

  Returns:
    Rank N+2 tensor of type T of shape
    `[batch_size] + output_spatial_shape + [out_channels]`, where
    if padding == "SAME":
      output_spatial_shape = input_spatial_shape
    if padding == "VALID":
      output_spatial_shape = input_spatial_shape - filter_spatial_shape + 1.

  Raises:
    ValueError: if ranks are incompatible.

  """
  with ops.name_scope(name, "non_atrous_convolution", [input, filter]) as scope:
    input = ops.convert_to_tensor(input, name="input")
    filter = ops.convert_to_tensor(filter, name="filter")
    filter_shape = filter.get_shape().with_rank(input.get_shape().ndims)
    input_shape = input.get_shape().with_rank(filter_shape.ndims)
    if input_shape.ndims is None:
      raise ValueError("Rank of convolution must be known")
    if input_shape.ndims < 3 or input_shape.ndims > 5:
      raise ValueError(
          "`input` and `filter` must have rank at least 3 and at most 5")
    conv_dims = input_shape.ndims - 2
    if strides is None:
      strides = [1] * conv_dims
    elif len(strides) != conv_dims:
      raise ValueError("len(strides)=%d, but should be %d" %
                       (len(strides), conv_dims))
    if conv_dims == 1:
      # conv1d uses the 2-d data format names
      if data_format is None or data_format == "NWC":
        data_format_2d = "NHWC"
      elif data_format == "NCW":
        data_format_2d = "NCHW"
      else:
        raise ValueError("data_format must be \"NWC\" or \"NCW\".")
      return conv1d(
          value=input,
          filters=filter,
          stride=strides[0],
          padding=padding,
          data_format=data_format_2d,
          name=scope)
    elif conv_dims == 2:
      if data_format is None or data_format == "NHWC":
        data_format = "NHWC"
        strides = [1] + list(strides) + [1]
      elif data_format == "NCHW":
        strides = [1, 1] + list(strides)
      else:
        raise ValueError("data_format must be \"NHWC\" or \"NCHW\".")
      return gen_nn_ops.conv2d(
          input=input,
          filter=filter,
          strides=strides,
          padding=padding,
          data_format=data_format,
          name=name)
    elif conv_dims == 3:
      if data_format is None or data_format == "NDHWC":
        strides = [1] + list(strides) + [1]
      elif data_format == "NCDHW":
        strides = [1, 1] + list(strides)
      else:
        raise ValueError("data_format must be \"NDHWC\" or \"NCDHW\". Have: %s"
                         % data_format)
      return gen_nn_ops.conv3d(
          input=input,
          filter=filter,
          strides=strides,
          padding=padding,
          data_format=data_format,
          name=name)


def with_space_to_batch(
    input,  # pylint: disable=redefined-builtin
    dilation_rate,
    padding,
    op,
    filter_shape=None,
    spatial_dims=None,
    data_format=None):
  """Performs `op` on the space-to-batch representation of `input`.

  This has the effect of transforming sliding window operations into the
  corresponding "atrous" operation in which the input is sampled at the
  specified `dilation_rate`.

  In the special case that `dilation_rate` is uniformly 1, this simply returns:

    op(input, num_spatial_dims, padding)

  Otherwise, it returns:

    batch_to_space_nd(
      op(space_to_batch_nd(input, adjusted_dilation_rate, adjusted_paddings),
         num_spatial_dims,
         "VALID")
      adjusted_dilation_rate,
      adjusted_crops),

  where:

    adjusted_dilation_rate is an int64 tensor of shape [max(spatial_dims)],
    adjusted_{paddings,crops} are int64 tensors of shape [max(spatial_dims), 2]

  defined as follows:

  We first define two int64 tensors `paddings` and `crops` of shape
  `[num_spatial_dims, 2]` based on the value of `padding` and the spatial
  dimensions of the `input`:

  If `padding = "VALID"`, then:

    paddings, crops = required_space_to_batch_paddings(
      input_shape[spatial_dims],
      dilation_rate)

  If `padding = "SAME"`, then:

    dilated_filter_shape =
      filter_shape + (filter_shape - 1) * (dilation_rate - 1)

    paddings, crops = required_space_to_batch_paddings(
      input_shape[spatial_dims],
      dilation_rate,
      [(dilated_filter_shape - 1) // 2,
       dilated_filter_shape - 1 - (dilated_filter_shape - 1) // 2])

  Because `space_to_batch_nd` and `batch_to_space_nd` assume that the spatial
  dimensions are contiguous starting at the second dimension, but the specified
  `spatial_dims` may not be, we must adjust `dilation_rate`, `paddings` and
  `crops` in order to be usable with these operations.  For a given dimension,
  if the block size is 1, and both the starting and ending padding and crop
  amounts are 0, then space_to_batch_nd effectively leaves that dimension alone,
  which is what is needed for dimensions not part of `spatial_dims`.
  Furthermore, `space_to_batch_nd` and `batch_to_space_nd` handle this case
  efficiently for any number of leading and trailing dimensions.

  For 0 <= i < len(spatial_dims), we assign:

    adjusted_dilation_rate[spatial_dims[i] - 1] = dilation_rate[i]
    adjusted_paddings[spatial_dims[i] - 1, :] = paddings[i, :]
    adjusted_crops[spatial_dims[i] - 1, :] = crops[i, :]

  All unassigned values of `adjusted_dilation_rate` default to 1, while all
  unassigned values of `adjusted_paddings` and `adjusted_crops` default to 0.

  Note in the case that `dilation_rate` is not uniformly 1, specifying "VALID"
  padding is equivalent to specifying `padding = "SAME"` with a filter_shape of
  `[1]*N`.

  Advanced usage. Note the following optimization: A sequence of
  `with_space_to_batch` operations with identical (not uniformly 1)
  `dilation_rate` parameters and "VALID" padding

    net = with_space_to_batch(net, dilation_rate, "VALID", op_1)
    ...
    net = with_space_to_batch(net, dilation_rate, "VALID", op_k)

  can be combined into a single `with_space_to_batch` operation as follows:

    def combined_op(converted_input, num_spatial_dims, _):
      result = op_1(converted_input, num_spatial_dims, "VALID")
      ...
      result = op_k(result, num_spatial_dims, "VALID")

    net = with_space_to_batch(net, dilation_rate, "VALID", combined_op)

  This eliminates the overhead of `k-1` calls to `space_to_batch_nd` and
  `batch_to_space_nd`.

  Similarly, a sequence of `with_space_to_batch` operations with identical (not
  uniformly 1) `dilation_rate` parameters, "SAME" padding, and odd filter
  dimensions

    net = with_space_to_batch(net, dilation_rate, "SAME", op_1, filter_shape_1)
    ...
    net = with_space_to_batch(net, dilation_rate, "SAME", op_k, filter_shape_k)

  can be combined into a single `with_space_to_batch` operation as follows:

    def combined_op(converted_input, num_spatial_dims, _):
      result = op_1(converted_input, num_spatial_dims, "SAME")
      ...
      result = op_k(result, num_spatial_dims, "SAME")

    net = with_space_to_batch(net, dilation_rate, "VALID", combined_op)

  Args:
    input: Tensor of rank > max(spatial_dims).
    dilation_rate: int32 Tensor of *known* shape [num_spatial_dims].
    padding: str constant equal to "VALID" or "SAME"
    op: Function that maps (input, num_spatial_dims, padding) -> output
    filter_shape: If padding = "SAME", specifies the shape of the convolution
      kernel/pooling window as an integer Tensor of shape [>=num_spatial_dims].
      If padding = "VALID", filter_shape is ignored and need not be specified.
    spatial_dims: Monotonically increasing sequence of `num_spatial_dims`
      integers (which are >= 1) specifying the spatial dimensions of `input`
      and output.  Defaults to: `range(1, num_spatial_dims+1)`.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".

  Returns:
    The output Tensor as described above, dimensions will vary based on the op
    provided.

  Raises:
    ValueError: if `padding` is invalid or the arguments are incompatible.
    ValueError: if `spatial_dims` are invalid.

  """
  input = ops.convert_to_tensor(input, name="input")
  dilation_rate = ops.convert_to_tensor(dilation_rate,
                                        dtypes.int32,
                                        name="dilation_rate")
  try:
    rate_shape = dilation_rate.get_shape().with_rank(1)
  except ValueError:
    raise ValueError("rate must be rank 1")

  if not dilation_rate.get_shape().is_fully_defined():
    raise ValueError("rate must have known shape")

  num_spatial_dims = rate_shape[0].value

  if data_format is not None and data_format.startswith("NC"):
    starting_spatial_dim = 2
  else:
    starting_spatial_dim = 1

  if spatial_dims is None:
    spatial_dims = range(starting_spatial_dim,
                         num_spatial_dims + starting_spatial_dim)
  orig_spatial_dims = list(spatial_dims)
  spatial_dims = sorted(set(int(x) for x in orig_spatial_dims))
  if spatial_dims != orig_spatial_dims or any(x < 1 for x in spatial_dims):
    raise ValueError(
        "spatial_dims must be a montonically increasing sequence of positive "
        "integers")  # pylint: disable=line-too-long

  if data_format is not None and data_format.startswith("NC"):
    expected_input_rank = spatial_dims[-1]
  else:
    expected_input_rank = spatial_dims[-1] + 1

  try:
    input.get_shape().with_rank_at_least(expected_input_rank)
  except ValueError:
    ValueError("input tensor must have rank %d at least" %
               (expected_input_rank))

  const_rate = tensor_util.constant_value(dilation_rate)
  rate_or_const_rate = dilation_rate
  if const_rate is not None:
    rate_or_const_rate = const_rate
    if np.any(const_rate < 1):
      raise ValueError("dilation_rate must be positive")
    if np.all(const_rate == 1):
      return op(input, num_spatial_dims, padding)

  # We have two padding contributions. The first is used for converting "SAME"
  # to "VALID". The second is required so that the height and width of the
  # zero-padded value tensor are multiples of rate.

  # Padding required to reduce to "VALID" convolution
  if padding == "SAME":
    if filter_shape is None:
      raise ValueError("filter_shape must be specified for SAME padding")
    filter_shape = ops.convert_to_tensor(filter_shape, name="filter_shape")
    const_filter_shape = tensor_util.constant_value(filter_shape)
    if const_filter_shape is not None:
      filter_shape = const_filter_shape

    # Spatial dimensions of the filters and the upsampled filters in which we
    # introduce (rate - 1) zeros between consecutive filter values.
    filter_spatial_shape = filter_shape[:num_spatial_dims]
    dilated_filter_spatial_shape = (filter_spatial_shape +
                                    (filter_spatial_shape - 1) *
                                    (rate_or_const_rate - 1))
    pad_extra_shape = dilated_filter_spatial_shape - 1

    # When full_padding_shape is odd, we pad more at end, following the same
    # convention as conv2d.
    pad_extra_start = pad_extra_shape // 2
    pad_extra_end = pad_extra_shape - pad_extra_start
    base_paddings = array_ops.stack([[pad_extra_start[i], pad_extra_end[i]]
                                     for i in range(num_spatial_dims)])
  elif padding == "VALID":
    base_paddings = np.zeros([num_spatial_dims, 2], np.int32)
  else:
    raise ValueError("Invalid padding method %r" % padding)

  # Handle input whose shape is unknown during graph creation.
  input_spatial_shape = None
  if input.get_shape().ndims is not None:
    input_shape_list = input.get_shape().as_list()
    input_spatial_shape = [input_shape_list[i] for i in spatial_dims]
  if input_spatial_shape is None or None in input_spatial_shape:
    input_shape_tensor = array_ops.shape(input)
    input_spatial_shape = array_ops.stack(
        [input_shape_tensor[i] for i in spatial_dims])

  paddings, crops = array_ops.required_space_to_batch_paddings(
      input_shape=input_spatial_shape,
      base_paddings=base_paddings,
      block_shape=dilation_rate)

  def adjust(orig, fill_value):
    """Returns an `adjusted` version of `orig` based on `spatial_dims`.

    Tensor of the same type as `orig` and with shape
    `[max(spatial_dims), ...]` where:

      adjusted[spatial_dims[i] - 1, ...] = orig[i, ...]

    for 0 <= i < len(spatial_dims), and

      adjusted[j, ...] = fill_value

    for j != spatial_dims[i] - 1 for some i.

    If `orig` is a constant value, then the result will be a constant value.

    Args:
      orig: Tensor of rank > max(spatial_dims).
      fill_value: Numpy scalar (of same data type as `orig) specifying the fill
        value for non-spatial dimensions.

    Returns:
      `adjusted` tensor.
    """
    fill_dims = orig.get_shape().as_list()[1:]
    dtype = orig.dtype.as_numpy_dtype
    parts = []
    const_orig = tensor_util.constant_value(orig)
    const_or_orig = const_orig if const_orig is not None else orig
    prev_spatial_dim = 0
    i = 0
    while i < len(spatial_dims):
      start_i = i
      start_spatial_dim = spatial_dims[i]
      if start_spatial_dim > 1:
        # Fill in any gap from the previous spatial dimension (or dimension 1 if
        # this is the first spatial dimension) with `fill_value`.
        parts.append(
            np.full(
                [start_spatial_dim - 1 - prev_spatial_dim] + fill_dims,
                fill_value,
                dtype=dtype))
      # Find the largest value of i such that:
      #   [spatial_dims[start_i], ..., spatial_dims[i]]
      #     == [start_spatial_dim, ..., start_spatial_dim + i - start_i],
      # i.e. the end of a contiguous group of spatial dimensions.
      while (i + 1 < len(spatial_dims) and
             spatial_dims[i + 1] == spatial_dims[i] + 1):
        i += 1
      parts.append(const_or_orig[start_i:i + 1])
      prev_spatial_dim = spatial_dims[i]
      i += 1
    if const_orig is not None:
      return np.concatenate(parts)
    else:
      return array_ops.concat(parts, 0)

  dilation_rate = adjust(dilation_rate, 1)
  paddings = adjust(paddings, 0)
  crops = adjust(crops, 0)

  input_converted = array_ops.space_to_batch_nd(
      input=input,
      block_shape=dilation_rate,
      paddings=paddings)

  result = op(input_converted, num_spatial_dims, "VALID")

  result_converted = array_ops.batch_to_space_nd(
      input=result, block_shape=dilation_rate, crops=crops)
  return result_converted


def _get_strides_and_dilation_rate(num_spatial_dims, strides, dilation_rate):
  """Helper function for verifying strides and dilation_rate arguments.

  This is used by `convolution` and `pool`.

  Args:
    num_spatial_dims: int
    strides: Optional.  List of N ints >= 1.  Defaults to [1]*N.  If any value
      of strides is > 1, then all values of dilation_rate must be 1.
    dilation_rate: Optional.  List of N ints >= 1.  Defaults to [1]*N.  If any
      value of dilation_rate is > 1, then all values of strides must be 1.

  Returns:
    Normalized (strides, dilation_rate) as int32 numpy arrays of shape
    [num_spatial_dims].

  Raises:
    ValueError: if the parameters are invalid.
  """
  if dilation_rate is None:
    dilation_rate = [1] * num_spatial_dims
  elif len(dilation_rate) != num_spatial_dims:
    raise ValueError("len(dilation_rate)=%d but should be %d" %
                     (len(dilation_rate), num_spatial_dims))
  dilation_rate = np.array(dilation_rate, dtype=np.int32)
  if np.any(dilation_rate < 1):
    raise ValueError("all values of dilation_rate must be positive")

  if strides is None:
    strides = [1] * num_spatial_dims
  elif len(strides) != num_spatial_dims:
    raise ValueError("len(strides)=%d but should be %d" %
                     (len(strides), num_spatial_dims))
  strides = np.array(strides, dtype=np.int32)
  if np.any(strides < 1):
    raise ValueError("all values of strides must be positive")

  if np.any(strides > 1) and np.any(dilation_rate > 1):
    raise ValueError(
        "strides > 1 not supported in conjunction with dilation_rate > 1")
  return strides, dilation_rate


def convolution(input, filter,  # pylint: disable=redefined-builtin
                padding, strides=None, dilation_rate=None,
                name=None, data_format=None):
  # pylint: disable=line-too-long
  """Computes sums of N-D convolutions (actually cross-correlation).

  This also supports either output striding via the optional `strides` parameter
  or atrous convolution (also known as convolution with holes or dilated
  convolution, based on the French word "trous" meaning holes in English) via
  the optional `dilation_rate` parameter.  Currently, however, output striding
  is not supported for atrous convolutions.

  Specifically, in the case that `data_format` does not start with "NC", given
  a rank (N+2) `input` Tensor of shape

    [num_batches,
     input_spatial_shape[0],
     ...,
     input_spatial_shape[N-1],
     num_input_channels],

  a rank (N+2) `filter` Tensor of shape

    [spatial_filter_shape[0],
     ...,
     spatial_filter_shape[N-1],
     num_input_channels,
     num_output_channels],

  an optional `dilation_rate` tensor of shape [N] (defaulting to [1]*N)
  specifying the filter upsampling/input downsampling rate, and an optional list
  of N `strides` (defaulting [1]*N), this computes for each N-D spatial output
  position (x[0], ..., x[N-1]):

  ```
    output[b, x[0], ..., x[N-1], k] =
        sum_{z[0], ..., z[N-1], q}
            filter[z[0], ..., z[N-1], q, k] *
            padded_input[b,
                         x[0]*strides[0] + dilation_rate[0]*z[0],
                         ...,
                         x[N-1]*strides[N-1] + dilation_rate[N-1]*z[N-1],
                         q]
  ```
  where `padded_input` is obtained by zero padding the input using an effective
  spatial filter shape of `(spatial_filter_shape-1) * dilation_rate + 1` and
  output striding `strides` as described in the
  @{tf.nn.convolution$comment here}.

  In the case that `data_format` does start with `"NC"`, the `input` and output
  (but not the `filter`) are simply transposed as follows:

    convolution(input, data_format, **kwargs) =
      tf.transpose(convolution(tf.transpose(input, [0] + range(2,N+2) + [1]),
                               **kwargs),
                   [0, N+1] + range(1, N+1))

  It is required that 1 <= N <= 3.

  Args:
    input: An N-D `Tensor` of type `T`, of shape
      `[batch_size] + input_spatial_shape + [in_channels]` if data_format does
      not start with "NC" (default), or
      `[batch_size, in_channels] + input_spatial_shape` if data_format starts
      with "NC".
    filter: An N-D `Tensor` with the same type as `input` and shape
      `spatial_filter_shape + [in_channels, out_channels]`.
    padding: A string, either `"VALID"` or `"SAME"`. The padding algorithm.
    strides: Optional.  Sequence of N ints >= 1.  Specifies the output stride.
      Defaults to [1]*N.  If any value of strides is > 1, then all values of
      dilation_rate must be 1.
    dilation_rate: Optional.  Sequence of N ints >= 1.  Specifies the filter
      upsampling/input downsampling rate.  In the literature, the same parameter
      is sometimes called `input stride` or `dilation`.  The effective filter
      size used for the convolution will be `spatial_filter_shape +
      (spatial_filter_shape - 1) * (rate - 1)`, obtained by inserting
      (dilation_rate[i]-1) zeros between consecutive elements of the original
      filter in each spatial dimension i.  If any value of dilation_rate is > 1,
      then all values of strides must be 1.
    name: Optional name for the returned tensor.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".

  Returns:
    A `Tensor` with the same type as `input` of shape

        `[batch_size] + output_spatial_shape + [out_channels]`

    if data_format is None or does not start with "NC", or

        `[batch_size, out_channels] + output_spatial_shape`

    if data_format starts with "NC",
    where `output_spatial_shape` depends on the value of `padding`.

    If padding == "SAME":
      output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])

    If padding == "VALID":
      output_spatial_shape[i] =
        ceil((input_spatial_shape[i] -
              (spatial_filter_shape[i]-1) * dilation_rate[i])
             / strides[i]).

  Raises:
    ValueError: If input/output depth does not match `filter` shape, if padding
      is other than `"VALID"` or `"SAME"`, or if data_format is invalid.

  """
  # pylint: enable=line-too-long
  with ops.name_scope(name, "convolution", [input, filter]) as name:
    input = ops.convert_to_tensor(input, name="input")
    filter = ops.convert_to_tensor(filter, name="filter")
    num_total_dims = filter.get_shape().ndims
    if num_total_dims is None:
      num_total_dims = input.get_shape().ndims
    if num_total_dims is None:
      raise ValueError("rank of input or filter must be known")

    num_spatial_dims = num_total_dims - 2

    try:
      input.get_shape().with_rank(num_spatial_dims + 2)
    except ValueError:
      ValueError("input tensor must have rank %d" % (num_spatial_dims + 2))

    try:
      filter.get_shape().with_rank(num_spatial_dims + 2)
    except ValueError:
      ValueError("filter tensor must have rank %d" % (num_spatial_dims + 2))

    if data_format is None or not data_format.startswith("NC"):
      input_channels_dim = input.get_shape()[num_spatial_dims + 1]
      spatial_dims = range(1, num_spatial_dims+1)
    else:
      input_channels_dim = input.get_shape()[1]
      spatial_dims = range(2, num_spatial_dims+2)

    if not input_channels_dim.is_compatible_with(filter.get_shape()[
        num_spatial_dims]):
      raise ValueError(
          "number of input channels does not match corresponding dimension of filter, "
          "{} != {}".format(input_channels_dim, filter.get_shape()[
              num_spatial_dims]))

    strides, dilation_rate = _get_strides_and_dilation_rate(
        num_spatial_dims, strides, dilation_rate)

    def op(input_converted, _, padding):
      return _non_atrous_convolution(
          input=input_converted,
          filter=filter,
          padding=padding,
          data_format=data_format,
          strides=strides,
          name=name)

    return with_space_to_batch(
        input=input,
        filter_shape=array_ops.shape(filter),
        spatial_dims=spatial_dims,
        dilation_rate=dilation_rate,
        padding=padding,
        op=op)


def pool(input,  # pylint: disable=redefined-builtin
         window_shape,
         pooling_type,
         padding,
         dilation_rate=None,
         strides=None,
         name=None,
         data_format=None):
  # pylint: disable=line-too-long
  """Performs an N-D pooling operation.

  In the case that `data_format` does not start with "NC", computes for
      0 <= b < batch_size,
      0 <= x[i] < output_spatial_shape[i],
      0 <= c < num_channels:

  ```
    output[b, x[0], ..., x[N-1], c] =
      REDUCE_{z[0], ..., z[N-1]}
        input[b,
              x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
              ...
              x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1],
              c],
  ```

  where the reduction function REDUCE depends on the value of `pooling_type`,
  and pad_before is defined based on the value of `padding` as described in the
  @{tf.nn.convolution$comment here}.
  The reduction never includes out-of-bounds positions.

  In the case that `data_format` starts with `"NC"`, the `input` and output are
  simply transposed as follows:

  ```
    pool(input, data_format, **kwargs) =
      tf.transpose(pool(tf.transpose(input, [0] + range(2,N+2) + [1]),
                        **kwargs),
                   [0, N+1] + range(1, N+1))
  ```

  Args:
    input: Tensor of rank N+2, of shape
      `[batch_size] + input_spatial_shape + [num_channels]` if data_format does
      not start with "NC" (default), or
      `[batch_size, num_channels] + input_spatial_shape` if data_format starts
      with "NC".  Pooling happens over the spatial dimensions only.
    window_shape: Sequence of N ints >= 1.
    pooling_type: Specifies pooling operation, must be "AVG" or "MAX".
    padding: The padding algorithm, must be "SAME" or "VALID".
      See the @{tf.nn.convolution$comment here}
    dilation_rate: Optional.  Dilation rate.  List of N ints >= 1.
      Defaults to [1]*N.  If any value of dilation_rate is > 1, then all values
      of strides must be 1.
    strides: Optional.  Sequence of N ints >= 1.  Defaults to [1]*N.
      If any value of strides is > 1, then all values of dilation_rate must be
      1.
    name: Optional. Name of the op.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".

  Returns:
    Tensor of rank N+2, of shape
      [batch_size] + output_spatial_shape + [num_channels]

    if data_format is None or does not start with "NC", or

      [batch_size, num_channels] + output_spatial_shape

    if data_format starts with "NC",
    where `output_spatial_shape` depends on the value of padding:

    If padding = "SAME":
      output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])

    If padding = "VALID":
      output_spatial_shape[i] =
        ceil((input_spatial_shape[i] - (window_shape[i] - 1) * dilation_rate[i])
             / strides[i]).

  Raises:
    ValueError: if arguments are invalid.

  """
  # pylint: enable=line-too-long
  with ops.name_scope(name, "%s_pool" %
                      (pooling_type.lower()), [input]) as scope:
    input = ops.convert_to_tensor(input, name="input")

    num_spatial_dims = len(window_shape)
    if num_spatial_dims < 1 or num_spatial_dims > 3:
      raise ValueError("It is required that 1 <= num_spatial_dims <= 3.")

    input.get_shape().with_rank(num_spatial_dims + 2)

    strides, dilation_rate = _get_strides_and_dilation_rate(
        num_spatial_dims, strides, dilation_rate)

    if padding == "SAME" and np.any(dilation_rate > 1):
      raise ValueError(
          "pooling with SAME padding is not implemented for dilation_rate > 1")

    if np.any(strides > window_shape):
      raise ValueError(
          "strides > window_shape not supported due to inconsistency between "
          "CPU and GPU implementations")

    pooling_ops = {("MAX", 1): max_pool,
                   ("MAX", 2): max_pool,
                   ("MAX", 3): max_pool3d,  # pylint: disable=undefined-variable
                   ("AVG", 1): avg_pool,
                   ("AVG", 2): avg_pool,
                   ("AVG", 3): avg_pool3d,  # pylint: disable=undefined-variable
                  }
    op_key = (pooling_type, num_spatial_dims)
    if op_key not in pooling_ops:
      raise ValueError("%d-D %s pooling is not supported." %
                       (op_key[1], op_key[0]))

    if data_format is None or not data_format.startswith("NC"):
      adjusted_window_shape = [1] + list(window_shape) + [1]
      adjusted_strides = [1] + list(strides) + [1]
      spatial_dims = range(1, num_spatial_dims + 1)
    else:
      adjusted_window_shape = [1, 1] + list(window_shape)
      adjusted_strides = [1, 1] + list(strides)
      spatial_dims = range(2, num_spatial_dims + 2)

    if num_spatial_dims == 1:
      if data_format is None or data_format == "NWC":
        data_format_kwargs = dict(data_format="NHWC")
      elif data_format == "NCW":
        data_format_kwargs = dict(data_format="NCHW")
      else:
        raise ValueError("data_format must be either \"NWC\" or \"NCW\".")
      adjusted_window_shape = [1] + adjusted_window_shape
      adjusted_strides = [1] + adjusted_strides
    else:
      data_format_kwargs = dict(data_format=data_format)

    def op(converted_input, _, converted_padding):  # pylint: disable=missing-docstring
      if num_spatial_dims == 1:
        converted_input = array_ops.expand_dims(converted_input,
                                                spatial_dims[0])
      result = pooling_ops[op_key](converted_input,
                                   adjusted_window_shape,
                                   adjusted_strides,
                                   converted_padding,
                                   name=scope,
                                   **data_format_kwargs)
      if num_spatial_dims == 1:
        result = array_ops.squeeze(result, [spatial_dims[0]])
      return result

    return with_space_to_batch(
        input=input,
        dilation_rate=dilation_rate,
        padding=padding,
        op=op,
        spatial_dims=spatial_dims,
        filter_shape=window_shape)


def atrous_conv2d(value, filters, rate, padding, name=None):
  """Atrous convolution (a.k.a. convolution with holes or dilated convolution).

  This function is a simpler wrapper around the more general
  @{tf.nn.convolution}, and exists only for backwards compatibility. You can
  use @{tf.nn.convolution} to perform 1-D, 2-D, or 3-D atrous convolution.


  Computes a 2-D atrous convolution, also known as convolution with holes or
  dilated convolution, given 4-D `value` and `filters` tensors. If the `rate`
  parameter is equal to one, it performs regular 2-D convolution. If the `rate`
  parameter is greater than one, it performs convolution with holes, sampling
  the input values every `rate` pixels in the `height` and `width` dimensions.
  This is equivalent to convolving the input with a set of upsampled filters,
  produced by inserting `rate - 1` zeros between two consecutive values of the
  filters along the `height` and `width` dimensions, hence the name atrous
  convolution or convolution with holes (the French word trous means holes in
  English).

  More specifically:

  ```
  output[batch, height, width, out_channel] =
      sum_{dheight, dwidth, in_channel} (
          filters[dheight, dwidth, in_channel, out_channel] *
          value[batch, height + rate*dheight, width + rate*dwidth, in_channel]
      )
  ```

  Atrous convolution allows us to explicitly control how densely to compute
  feature responses in fully convolutional networks. Used in conjunction with
  bilinear interpolation, it offers an alternative to `conv2d_transpose` in
  dense prediction tasks such as semantic image segmentation, optical flow
  computation, or depth estimation. It also allows us to effectively enlarge
  the field of view of filters without increasing the number of parameters or
  the amount of computation.

  For a description of atrous convolution and how it can be used for dense
  feature extraction, please see: [Semantic Image Segmentation with Deep
  Convolutional Nets and Fully Connected CRFs](http://arxiv.org/abs/1412.7062).
  The same operation is investigated further in [Multi-Scale Context Aggregation
  by Dilated Convolutions](http://arxiv.org/abs/1511.07122). Previous works
  that effectively use atrous convolution in different ways are, among others,
  [OverFeat: Integrated Recognition, Localization and Detection using
  Convolutional Networks](http://arxiv.org/abs/1312.6229) and [Fast Image
  Scanning with Deep Max-Pooling Convolutional Neural Networks](http://arxiv.org/abs/1302.1700).
  Atrous convolution is also closely related to the so-called noble identities
  in multi-rate signal processing.

  There are many different ways to implement atrous convolution (see the refs
  above). The implementation here reduces

  ```python
      atrous_conv2d(value, filters, rate, padding=padding)
  ```

  to the following three operations:

  ```python
      paddings = ...
      net = space_to_batch(value, paddings, block_size=rate)
      net = conv2d(net, filters, strides=[1, 1, 1, 1], padding="VALID")
      crops = ...
      net = batch_to_space(net, crops, block_size=rate)
  ```

  Advanced usage. Note the following optimization: A sequence of `atrous_conv2d`
  operations with identical `rate` parameters, 'SAME' `padding`, and filters
  with odd heights/ widths:

  ```python
      net = atrous_conv2d(net, filters1, rate, padding="SAME")
      net = atrous_conv2d(net, filters2, rate, padding="SAME")
      ...
      net = atrous_conv2d(net, filtersK, rate, padding="SAME")
  ```

  can be equivalently performed cheaper in terms of computation and memory as:

  ```python
      pad = ...  # padding so that the input dims are multiples of rate
      net = space_to_batch(net, paddings=pad, block_size=rate)
      net = conv2d(net, filters1, strides=[1, 1, 1, 1], padding="SAME")
      net = conv2d(net, filters2, strides=[1, 1, 1, 1], padding="SAME")
      ...
      net = conv2d(net, filtersK, strides=[1, 1, 1, 1], padding="SAME")
      net = batch_to_space(net, crops=pad, block_size=rate)
  ```

  because a pair of consecutive `space_to_batch` and `batch_to_space` ops with
  the same `block_size` cancel out when their respective `paddings` and `crops`
  inputs are identical.

  Args:
    value: A 4-D `Tensor` of type `float`. It needs to be in the default "NHWC"
      format. Its shape is `[batch, in_height, in_width, in_channels]`.
    filters: A 4-D `Tensor` with the same type as `value` and shape
      `[filter_height, filter_width, in_channels, out_channels]`. `filters`'
      `in_channels` dimension must match that of `value`. Atrous convolution is
      equivalent to standard convolution with upsampled filters with effective
      height `filter_height + (filter_height - 1) * (rate - 1)` and effective
      width `filter_width + (filter_width - 1) * (rate - 1)`, produced by
      inserting `rate - 1` zeros along consecutive elements across the
      `filters`' spatial dimensions.
    rate: A positive int32. The stride with which we sample input values across
      the `height` and `width` dimensions. Equivalently, the rate by which we
      upsample the filter values by inserting zeros across the `height` and
      `width` dimensions. In the literature, the same parameter is sometimes
      called `input stride` or `dilation`.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
    name: Optional name for the returned tensor.

  Returns:
    A `Tensor` with the same type as `value`.
    Output shape with `'VALID`` padding is:

        [batch, height - 2 * (filter_width - 1),
         width - 2 * (filter_height - 1), out_channels].

    Output shape with `'SAME'` padding is:

        [batch, height, width, out_channels].

  Raises:
    ValueError: If input/output depth does not match `filters`' shape, or if
      padding is other than `'VALID'` or `'SAME'`.
  """
  return convolution(
      input=value,
      filter=filters,
      padding=padding,
      dilation_rate=np.broadcast_to(rate, (2,)),
      name=name)


def conv2d_transpose(value,
                     filter,
                     output_shape,
                     strides,
                     padding="SAME",
                     data_format="NHWC",
                     name=None):
  """The transpose of `conv2d`.

  This operation is sometimes called "deconvolution" after [Deconvolutional
  Networks](http://www.matthewzeiler.com/pubs/cvpr2010/cvpr2010.pdf), but is
  actually the transpose (gradient) of `conv2d` rather than an actual
  deconvolution.

  Args:
    value: A 4-D `Tensor` of type `float` and shape
      `[batch, height, width, in_channels]` for `NHWC` data format or
      `[batch, in_channels, height, width]` for `NCHW` data format.
    filter: A 4-D `Tensor` with the same type as `value` and shape
      `[height, width, output_channels, in_channels]`.  `filter`'s
      `in_channels` dimension must match that of `value`.
    output_shape: A 1-D `Tensor` representing the output shape of the
      deconvolution op.
    strides: A list of ints. The stride of the sliding window for each
      dimension of the input tensor.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
      See the @{tf.nn.convolution$comment here}
    data_format: A string. 'NHWC' and 'NCHW' are supported.
    name: Optional name for the returned tensor.

  Returns:
    A `Tensor` with the same type as `value`.

  Raises:
    ValueError: If input/output depth does not match `filter`'s shape, or if
      padding is other than `'VALID'` or `'SAME'`.
  """
  with ops.name_scope(name, "conv2d_transpose",
                      [value, filter, output_shape]) as name:
    if data_format not in ("NCHW", "NHWC"):
      raise ValueError("data_format has to be either NCHW or NHWC.")
    value = ops.convert_to_tensor(value, name="value")
    filter = ops.convert_to_tensor(filter, name="filter")
    axis = 3 if data_format == "NHWC" else 1
    if not value.get_shape()[axis].is_compatible_with(filter.get_shape()[3]):
      raise ValueError("input channels does not match filter's input channels, "
                       "{} != {}".format(value.get_shape()[3], filter.get_shape(
                       )[3]))

    output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
    if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(4)):
      raise ValueError("output_shape must have shape (4,), got {}"
                       .format(output_shape_.get_shape()))

    if isinstance(output_shape, (list, np.ndarray)):
      # output_shape's shape should be == [4] if reached this point.
      if not filter.get_shape()[2].is_compatible_with(output_shape[axis]):
        raise ValueError(
            "output_shape does not match filter's output channels, "
            "{} != {}".format(output_shape[axis], filter.get_shape()[2]))

    if padding != "VALID" and padding != "SAME":
      raise ValueError("padding must be either VALID or SAME:"
                       " {}".format(padding))

    return gen_nn_ops.conv2d_backprop_input(input_sizes=output_shape_,
                                            filter=filter,
                                            out_backprop=value,
                                            strides=strides,
                                            padding=padding,
                                            data_format=data_format,
                                            name=name)


def atrous_conv2d_transpose(value,
                            filters,
                            output_shape,
                            rate,
                            padding,
                            name=None):
  """The transpose of `atrous_conv2d`.

  This operation is sometimes called "deconvolution" after [Deconvolutional
  Networks](http://www.matthewzeiler.com/pubs/cvpr2010/cvpr2010.pdf), but is
  actually the transpose (gradient) of `atrous_conv2d` rather than an actual
  deconvolution.

  Args:
    value: A 4-D `Tensor` of type `float`. It needs to be in the default `NHWC`
      format. Its shape is `[batch, in_height, in_width, in_channels]`.
    filters: A 4-D `Tensor` with the same type as `value` and shape
      `[filter_height, filter_width, out_channels, in_channels]`. `filters`'
      `in_channels` dimension must match that of `value`. Atrous convolution is
      equivalent to standard convolution with upsampled filters with effective
      height `filter_height + (filter_height - 1) * (rate - 1)` and effective
      width `filter_width + (filter_width - 1) * (rate - 1)`, produced by
      inserting `rate - 1` zeros along consecutive elements across the
      `filters`' spatial dimensions.
    output_shape: A 1-D `Tensor` of shape representing the output shape of the
      deconvolution op.
    rate: A positive int32. The stride with which we sample input values across
      the `height` and `width` dimensions. Equivalently, the rate by which we
      upsample the filter values by inserting zeros across the `height` and
      `width` dimensions. In the literature, the same parameter is sometimes
      called `input stride` or `dilation`.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
    name: Optional name for the returned tensor.

  Returns:
    A `Tensor` with the same type as `value`.

  Raises:
    ValueError: If input/output depth does not match `filters`' shape, or if
      padding is other than `'VALID'` or `'SAME'`, or if the `rate` is less
      than one, or if the output_shape is not a tensor with 4 elements.
  """
  with ops.name_scope(name, "atrous_conv2d_transpose",
                      [value, filters, output_shape]) as name:
    value = ops.convert_to_tensor(value, name="value")
    filters = ops.convert_to_tensor(filters, name="filters")
    if not value.get_shape()[3].is_compatible_with(filters.get_shape()[3]):
      raise ValueError(
          "value's input channels does not match filters' input channels, "
          "{} != {}".format(value.get_shape()[3], filters.get_shape()[3]))
    if rate < 1:
      raise ValueError("rate {} cannot be less than one".format(rate))

    if rate == 1:
      return conv2d_transpose(value,
                              filters,
                              output_shape,
                              strides=[1, 1, 1, 1],
                              padding=padding,
                              data_format="NHWC")

    output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
    if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(4)):
      raise ValueError("output_shape must have shape (4,), got {}"
                       .format(output_shape_.get_shape()))

    if isinstance(output_shape, (list, np.ndarray)):
      # output_shape's shape should be == [4] if reached this point.
      if not filters.get_shape()[2].is_compatible_with(output_shape[3]):
        raise ValueError(
            "output_shape does not match filter's output channels, "
            "{} != {}".format(output_shape[3], filters.get_shape()[2]))

    # We have two padding contributions. The first is used for converting "SAME"
    # to "VALID". The second is required so that the height and width of the
    # zero-padded value tensor are multiples of rate.

    # Padding required to reduce to "VALID" convolution
    if padding == "SAME":
      # Handle filters whose shape is unknown during graph creation.
      if filters.get_shape().is_fully_defined():
        filter_shape = filters.get_shape().as_list()
      else:
        filter_shape = array_ops.shape(filters)
      filter_height, filter_width = filter_shape[0], filter_shape[1]

      # Spatial dimensions of the filters and the upsampled filters in which we
      # introduce (rate - 1) zeros between consecutive filter values.
      filter_height_up = filter_height + (filter_height - 1) * (rate - 1)
      filter_width_up = filter_width + (filter_width - 1) * (rate - 1)

      pad_height = filter_height_up - 1
      pad_width = filter_width_up - 1

      # When pad_height (pad_width) is odd, we pad more to bottom (right),
      # following the same convention as conv2d().
      pad_top = pad_height // 2
      pad_bottom = pad_height - pad_top
      pad_left = pad_width // 2
      pad_right = pad_width - pad_left
    elif padding == "VALID":
      pad_top = 0
      pad_bottom = 0
      pad_left = 0
      pad_right = 0
    else:
      raise ValueError("padding must be either VALID or SAME:"
                       " {}".format(padding))

    in_height = output_shape[1] + pad_top + pad_bottom
    in_width = output_shape[2] + pad_left + pad_right

    # More padding so that rate divides the height and width of the input.
    pad_bottom_extra = (rate - in_height % rate) % rate
    pad_right_extra = (rate - in_width % rate) % rate

    # The paddings argument to space_to_batch is just the extra padding
    # component.
    space_to_batch_pad = [[0, pad_bottom_extra], [0, pad_right_extra]]

    value = array_ops.space_to_batch(input=value,
                                     paddings=space_to_batch_pad,
                                     block_size=rate)

    input_sizes = [rate * rate * output_shape[0],
                   (in_height + pad_bottom_extra) // rate,
                   (in_width + pad_right_extra) // rate,
                   output_shape[3]]

    value = gen_nn_ops.conv2d_backprop_input(input_sizes=input_sizes,
                                             filter=filters,
                                             out_backprop=value,
                                             strides=[1, 1, 1, 1],
                                             padding="VALID",
                                             data_format="NHWC")

    # The crops argument to batch_to_space includes both padding components.
    batch_to_space_crop = [[pad_top, pad_bottom + pad_bottom_extra],
                           [pad_left, pad_right + pad_right_extra]]

    return array_ops.batch_to_space(input=value,
                                    crops=batch_to_space_crop,
                                    block_size=rate)


def conv3d_transpose(value,
                     filter,
                     output_shape,
                     strides,
                     padding="SAME",
                     data_format="NDHWC",
                     name=None):
  """The transpose of `conv3d`.

  This operation is sometimes called "deconvolution" after [Deconvolutional
  Networks](http://www.matthewzeiler.com/pubs/cvpr2010/cvpr2010.pdf), but is
  actually the transpose (gradient) of `conv3d` rather than an actual
  deconvolution.

  Args:
    value: A 5-D `Tensor` of type `float` and shape
      `[batch, depth, height, width, in_channels]`.
    filter: A 5-D `Tensor` with the same type as `value` and shape
      `[depth, height, width, output_channels, in_channels]`.  `filter`'s
      `in_channels` dimension must match that of `value`.
    output_shape: A 1-D `Tensor` representing the output shape of the
      deconvolution op.
    strides: A list of ints. The stride of the sliding window for each
      dimension of the input tensor.
    padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
      See the @{tf.nn.convolution$comment here}
    data_format: A string, either `'NDHWC'` or `'NCDHW`' specifying the layout
      of the input and output tensors. Defaults to `'NDHWC'`.
    name: Optional name for the returned tensor.

  Returns:
    A `Tensor` with the same type as `value`.

  Raises:
    ValueError: If input/output depth does not match `filter`'s shape, or if
      padding is other than `'VALID'` or `'SAME'`.
  """
  with ops.name_scope(name, "conv3d_transpose",
                      [value, filter, output_shape]) as name:
    value = ops.convert_to_tensor(value, name="value")
    filter = ops.convert_to_tensor(filter, name="filter")
    axis = 1 if data_format == "NCDHW" else 4
    if not value.get_shape()[axis].is_compatible_with(filter.get_shape()[4]):
      raise ValueError("input channels does not match filter's input channels, "
                       "{} != {}".format(value.get_shape()[axis],
                                         filter.get_shape()[4]))

    output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
    if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(5)):
      raise ValueError("output_shape must have shape (5,), got {}"
                       .format(output_shape_.get_shape()))

    if isinstance(output_shape, (list, np.ndarray)):
      # output_shape's shape should be == [5] if reached this point.
      if not filter.get_shape()[3].is_compatible_with(output_shape[4]):
        raise ValueError(
            "output_shape does not match filter's output channels, "
            "{} != {}".format(output_shape[4], filter.get_shape()[3]))

    if padding != "VALID" and padding != "SAME":
      raise ValueError("padding must be either VALID or SAME:"
                       " {}".format(padding))

    return gen_nn_ops.conv3d_backprop_input_v2(input_sizes=output_shape_,
                                               filter=filter,
                                               out_backprop=value,
                                               strides=strides,
                                               padding=padding,
                                               data_format=data_format,
                                               name=name)


# pylint: disable=protected-access
def bias_add(value, bias, data_format=None, name=None):
  """Adds `bias` to `value`.

  This is (mostly) a special case of `tf.add` where `bias` is restricted to 1-D.
  Broadcasting is supported, so `value` may have any number of dimensions.
  Unlike `tf.add`, the type of `bias` is allowed to differ from `value` in the
  case where both types are quantized.

  Args:
    value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
      `int16`, `int8`, `complex64`, or `complex128`.
    bias: A 1-D `Tensor` with size matching the last dimension of `value`.
      Must be the same type as `value` unless `value` is a quantized type,
      in which case a different quantized type may be used.
    data_format: A string. 'NHWC' and 'NCHW' are supported.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` with the same type as `value`.
  """
  with ops.name_scope(name, "BiasAdd", [value, bias]) as name:
    value = ops.convert_to_tensor(value, name="input")
    bias = ops.convert_to_tensor(bias, dtype=value.dtype, name="bias")
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)


# pylint: disable=protected-access
def bias_add_v1(value, bias, name=None):
  """Adds `bias` to `value`.

  This is a deprecated version of bias_add and will soon to be removed.

  This is (mostly) a special case of `tf.add` where `bias` is restricted to 1-D.
  Broadcasting is supported, so `value` may have any number of dimensions.
  Unlike `tf.add`, the type of `bias` is allowed to differ from `value` in the
  case where both types are quantized.

  Args:
    value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
      `int16`, `int8`, `complex64`, or `complex128`.
    bias: A 1-D `Tensor` with size matching the last dimension of `value`.
      Must be the same type as `value` unless `value` is a quantized type,
      in which case a different quantized type may be used.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` with the same type as `value`.
  """
  with ops.name_scope(name, "BiasAddV1", [value, bias]) as name:
    value = ops.convert_to_tensor(value, name="input")
    bias = ops.convert_to_tensor(bias, dtype=value.dtype, name="bias")
    return gen_nn_ops._bias_add_v1(value, bias, name=name)


def crelu(features, name=None):
  """Computes Concatenated ReLU.

  Concatenates a ReLU which selects only the positive part of the activation
  with a ReLU which selects only the *negative* part of the activation.
  Note that as a result this non-linearity doubles the depth of the activations.
  Source: [Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. W. Shang, et al.](https://arxiv.org/abs/1603.05201) 

  Args:
    features: A `Tensor` with type `float`, `double`, `int32`, `int64`, `uint8`,
      `int16`, or `int8`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` with the same type as `features`.
  """
  with ops.name_scope(name, "CRelu", [features]) as name:
    features = ops.convert_to_tensor(features, name="features")
    c = array_ops.concat([features, -features], -1, name=name)
    return gen_nn_ops.relu(c)


def relu6(features, name=None):
  """Computes Rectified Linear 6: `min(max(features, 0), 6)`.
  Source: [Convolutional Deep Belief Networks on CIFAR-10. A. Krizhevsky](http://www.cs.utoronto.ca/~kriz/conv-cifar10-aug2010.pdf)

  Args:
    features: A `Tensor` with type `float`, `double`, `int32`, `int64`, `uint8`,
      `int16`, or `int8`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` with the same type as `features`.
  """
  with ops.name_scope(name, "Relu6", [features]) as name:
    features = ops.convert_to_tensor(features, name="features")
    return gen_nn_ops._relu6(features, name=name)


def _flatten_outer_dims(logits):
  """Flattens logits' outer dimensions and keep its last dimension."""
  rank = array_ops.rank(logits)
  last_dim_size = array_ops.slice(
      array_ops.shape(logits), [math_ops.subtract(rank, 1)], [1])
  output = array_ops.reshape(logits, array_ops.concat([[-1], last_dim_size], 0))

  # Set output shape if known.
  shape = logits.get_shape()
  if shape is not None and shape.dims is not None:
    shape = shape.as_list()
    product = 1
    product_valid = True
    for d in shape[:-1]:
      if d is None:
        product_valid = False
        break
      else:
        product *= d
    if product_valid:
      output_shape = [product, shape[-1]]
      output.set_shape(output_shape)

  return output


def _softmax(logits, compute_op, dim=-1, name=None):
  """Helper function for softmax and log_softmax.

  It reshapes and transposes the input logits into a 2-D Tensor and then invokes
  the tf.nn._softmax or tf.nn._log_softmax function. The output would be
  transposed and reshaped back.

  Args:
    logits: A non-empty `Tensor`. Must be one of the following types: `half`,
      `float32`, `float64`.
    compute_op: Either gen_nn_ops._softmax or gen_nn_ops._log_softmax
    dim: The dimension softmax would be performed on. The default is -1 which
      indicates the last dimension.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `logits`. Same shape as `logits`.
  Raises:
    InvalidArgumentError: if `logits` is empty or `dim` is beyond the last
      dimension of `logits`.
  """
  def _swap_axis(logits, dim_index, last_index, name=None):
    """Swaps logits's dim_index and last_index."""
    return array_ops.transpose(logits,
                               array_ops.concat([
                                   math_ops.range(dim_index), [last_index],
                                   math_ops.range(dim_index + 1, last_index),
                                   [dim_index]
                               ], 0), name=name)

  logits = ops.convert_to_tensor(logits)

  # We need its original shape for shape inference.
  shape = logits.get_shape()
  is_last_dim = (dim is -1) or (dim == shape.ndims - 1)

  if shape.ndims is 2 and is_last_dim:
    return compute_op(logits, name=name)

  # If dim is the last dimension, simply reshape the logits to a matrix and
  # apply the internal softmax.
  if is_last_dim:
    input_shape = array_ops.shape(logits)
    logits = _flatten_outer_dims(logits)
    output = compute_op(logits)
    output = array_ops.reshape(output, input_shape, name=name)
    return output

  # If dim is not the last dimension, we have to do a reshape and transpose so
  # that we can still perform softmax on its last dimension.

  # Swap logits' dimension of dim and its last dimension.
  input_rank = array_ops.rank(logits)
  logits = _swap_axis(logits, dim, math_ops.subtract(input_rank, 1))
  shape_after_swap = array_ops.shape(logits)

  # Reshape logits into a matrix.
  logits = _flatten_outer_dims(logits)

  # Do the actual softmax on its last dimension.
  output = compute_op(logits)

  # Transform back the output tensor.
  output = array_ops.reshape(output, shape_after_swap)
  output = _swap_axis(output, dim, math_ops.subtract(input_rank, 1), name=name)

  # Make shape inference work since reshape and transpose may erase its static
  # shape.
  output.set_shape(shape)

  return output


def softmax(logits, dim=-1, name=None):
  """Computes softmax activations.

  For each batch `i` and class `j` we have

      softmax = exp(logits) / reduce_sum(exp(logits), dim)

  Args:
    logits: A non-empty `Tensor`. Must be one of the following types: `half`,
      `float32`, `float64`.
    dim: The dimension softmax would be performed on. The default is -1 which
      indicates the last dimension.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `logits`. Same shape as `logits`.
  Raises:
    InvalidArgumentError: if `logits` is empty or `dim` is beyond the last
      dimension of `logits`.
  """
  return _softmax(logits, gen_nn_ops._softmax, dim, name)


def log_softmax(logits, dim=-1, name=None):
  """Computes log softmax activations.

  For each batch `i` and class `j` we have

      logsoftmax = logits - log(reduce_sum(exp(logits), dim))

  Args:
    logits: A non-empty `Tensor`. Must be one of the following types: `half`,
      `float32`, `float64`.
    dim: The dimension softmax would be performed on. The default is -1 which
      indicates the last dimension.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `logits`. Same shape as `logits`.

  Raises:
    InvalidArgumentError: if `logits` is empty or `dim` is beyond the last
      dimension of `logits`.
  """
  return _softmax(logits, gen_nn_ops._log_softmax, dim, name)


def _ensure_xent_args(name, sentinel, labels, logits):
  # Make sure that all arguments were passed as named arguments.
  if sentinel is not None:
    raise ValueError("Only call `%s` with "
                     "named arguments (labels=..., logits=..., ...)" % name)
  if labels is None or logits is None:
    raise ValueError("Both labels and logits must be provided.")


def softmax_cross_entropy_with_logits(_sentinel=None,  # pylint: disable=invalid-name
                                      labels=None, logits=None,
                                      dim=-1, name=None):
  """Computes softmax cross entropy between `logits` and `labels`.

  Measures the probability error in discrete classification tasks in which the
  classes are mutually exclusive (each entry is in exactly one class).  For
  example, each CIFAR-10 image is labeled with one and only one label: an image
  can be a dog or a truck, but not both.

  **NOTE:**  While the classes are mutually exclusive, their probabilities
  need not be.  All that is required is that each row of `labels` is
  a valid probability distribution.  If they are not, the computation of the
  gradient will be incorrect.

  If using exclusive `labels` (wherein one and only
  one class is true at a time), see `spar

 
 
              
           
              
              
            
            相關推薦
			   
            
            
            
 

    

    
    tensorflow tf.nn.softmax_cross_entropy_with_logits & tf.nn.sparse_softmax_cross_entropy_with_logits
      
                ____tz_zstf.nn.sparse_softmax_cross_entropy_with_logits.sparse_softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    

  
 

    

    
    【TensorFlow】tf.nn.softmax_cross_entropy_with_logits的用法
      white   交叉   none   padding   tomat   ros   true   const   cross   在計算loss的時候，最常見的一句話就是 tf.nn.softmax_cross_entropy_with_logits ，那麽它到底是怎麽做的呢？
首先明確一點，loss是代 

  
 

    

    
    TensorFlow學習---tf.nn.softmax_cross_entropy_with_logits的用法
      
                
轉載部落格：http://blog.csdn.net/mao_xiao_feng/article/details/53382790


在計算loss的時候，最常見的一句話就是tf.nn.softmax_cross_entropy_with_logits，那麼它到底是怎麼做 

  
 

    

    
    Tensorflow學習筆記之tf.nn.relu
       
  
  
 Tensorflow學習筆記之tf.nn.relu 關於Tensorflow的學習筆記大部分為其他部落格或者書籍轉載，只為督促自己學習。 
 線性整流函式（Rectified Linear Unit，ReLU），又稱修正線性單元。其定義如下圖，在橫座標的右側，ReLU函式為線性函式。在橫座標 

  
 

    

    
    tensorflow常用函式之tf.nn.softmax
       
  
  
  
 
 關於softmax的詳細說明，請看Softmax。  通過Softmax迴歸，將logistic的預測二分類的概率的問題推廣到了n分類的概率的問題。通過公式    可以看出當月分類的個數變為2時，Softmax迴歸又退化為logistic迴歸問題。 
 

  
 

    

    
    TensorFlow 辨異 tf add a b 與 a b tf assign 與 tf nn bia
       
  
  
 分享一下我老師大神的人工智慧教程！零基礎，通俗易懂！http://blog.csdn.net/jiangjunshow
 
 也歡迎大家轉載本篇文章。分享知識，造福人民，實現我們中華民族偉大復興！
 
 
          

  
 

    

    
    tensorflow-tf.nn.softmax,tf.nn.sparse_softmax_cr
      #!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Oct  2 08:43:02 2018

@author: myhaspl
@email:[email protected]
tf.nn.softmax
tf.nn. 

  
 

    

    
    tensorflow學習（一）——有關tensorflow不同層的使用（tf.nn 和tf.layers以及tf.contrib.layers）的簡單區別
      
                小trick: 對於使用tf.layers建立的神經網路，如果想要對loss函式進行正則話，可以採用如下方式[1]：

但是該方法不適用於程式設計者自己定義不同層的正則化。

 l2 = tf.add_n([tf.nn.l2_loss(var) for var in tf.t 

  
 

    

    
    tensorflow-啟用函式及tf.nn.dropout
      
								
								            
							
							
							參考《Tensorflow技術解析與實戰》



啟用函式


啟用函式（activation function）將神經元計算wTx+b的結果經過非線性表達對映到下一層。
需要可微，啟用函式不會改變輸入 

  
 

    

    
    TensorFlow 學習（七） — 常用函式 api、tf.nn、tf.keras
      
								
								            
							
							
							0. 四則運算

平方：tf.square()，開方：tf.sqrt()
tf.add()、tf.sub()、tf.mul()、tf.div()、tf.mod()、tf.abs()、tf.neg()

 

  
 

    

    
    TensorFlow（七）tf.nn庫
      
								
								            
							
							
							tf.nn，tf.layers， tf.contrib模組有很多功能是重複的

下面是對三個模組的簡述：


tf.nn ：提供神經網路相關操作的支援，包括卷積操作（conv）、池化操作（pooling 

  
 

    

    
    Tensorflow卷積操作tf.nn.conv2d的理解
      
							
							
							函式定義

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)

功能：在給定4-D 輸入和fliters的情況下，計 

  
 

    

    
    tf.convert_to_tensor() tf.nn.embedding_lookup tf.cast() KMeans解釋和實現
       
  
  
 tf.convert_to_tensor() 
 tf.convert_to_tensor(
    value,
    dtype=None,
    name=None,
    preferred_dtype=None
)

 
 將給定value轉換為Tensor。
此函式將各種型 

  
 

    

    
    詳解tf.nn.bias_add和tf.add、tf.add_n的區別
      
                tf.add(x,y,name=None)

x:a tensor musut be one of                           the following types: half, float32, float64, uint8, int8, int1 

  
 

    

    
    tf.nn，tf.layers， tf.contrib概述
      
                轉自：https://blog.csdn.net/u014365862/article/details/77833481

我們在使用tensorflow時，會發現tf.nn，tf.layers， tf.contrib模組有很多功能是重複的，尤其是卷積操作，在使用的時候，我們 

  
 

    

    
    tf.nn.conv2d tf.nn.bias_add tf.nn.max_pool tf.nn.bias_add tf.nn.relu 實現一個CNN
       
  
  
 溫馨提示 
 我首先會對知識點進行講解，後面的程式碼會用到這裡所講的所有知識點，在看的時候，如果不懂也沒事，看了後面程式碼就會明白 
 tf.nn.conv2d 
 tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
 

  
 

    

    
    （轉載）tf.nn，tf.layers， tf.contrib概述
      我們在使用tensorflow時，會發現tf.nn，tf.layers， tf.contrib模組有很多功能是重複的，尤其是卷積操作，在使用的時候，我們可以根據需要現在不同的模組。但有些時候可以一起混用。        下面是對三個模組的簡 

  
 

    

    
    tf.depth_to_space和torch.nn.pixel_shuffle
      
							
							
							   最近做專案用到了這兩個函式，本人經過仔細對比，認為它們的功能應該是完全一樣的，都是將一個較多通道的特徵變成較少通道的特徵，具體定義如下：



def depth_to_space(input, block_size, name=None):

   bl 

  
 

    

    
    tf.add 與tf.nn.bias_add 區別
      
                1. tf.add(a, b) 與 a+b在神經網路前向傳播的過程中，經常可見如下兩種形式的程式碼：tf.add(tf.matmul(x, w), b)tf.matmul(x, w) + b簡而言之，就是 tf.add(a, b) 與 a + b二者的區別，類似的也有，tf. 

  
 

    

    
    tensorflow API _ 2 (tf.app.flags.FLAGS)
      string   執行   with   自動調用   save   文件   oat   自動   viso   tf.app.flags.FLAGS 的使用，主要是在用命令行執行程序時，需要傳些參數，代碼如下：新建一個名為：app_flags.py 的文件。
#coding:utf-8  import t

tensorflow tf.nn.softmax_cross_entropy_with_logits & tf.nn.sparse_softmax_cross_entropy_with_logits

tf.nn.sparse_softmax_cross_entropy_with_logits

tf.nn.softmax_cross_entropy_with_logits

異同

相關知識

nn_ops.py原始碼

相關推薦