Understanding TensorFlow's clip_by_norm function
阿新 • Published: 2019-02-04
clip_by_norm
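Here clip_by_norm refers to gradient clipping: it caps the norm of a gradient so that it cannot exceed a chosen maximum, which guards against exploding gradients. It is one of the more commonly used ways of constraining gradients.
clip_by_norm in TensorFlow
Example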
import tensorflow as tf  # TensorFlow 1.x API

# learning_rate and cost are assumed to be defined elsewhere in the model code
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.5)
grads = optimizer.compute_gradients(cost)
for i, (g, v) in enumerate(grads):
    if g is not None:
        grads[i] = (tf.clip_by_norm(g, 5), v)  # clip gradients
train_op = optimizer.apply_gradients(grads)
The code above is a fairly common pattern for computing and applying gradients, and it calls tf.clip_by_norm. The source of that function is shown below:
def clip_by_norm(t, clip_norm, axes=None, name=None):
    """Clips tensor values to a maximum L2-norm.

    Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
    normalizes `t` so that its L2-norm is less than or equal to `clip_norm`,
    along the dimensions given in `axes`. Specifically, in the default case
    where all dimensions are used for calculation, if the L2-norm of `t` is
    already less than or equal to `clip_norm`, then `t` is not modified. If
    the L2-norm is greater than `clip_norm`, then this operation returns a
    tensor of the same type and shape as `t` with its values set to:

    `t * clip_norm / l2norm(t)`

    In this case, the L2-norm of the output tensor is `clip_norm`.

    As another example, if `t` is a matrix and `axes == [1]`, then each row
    of the output will have L2-norm equal to `clip_norm`. If `axes == [0]`
    instead, each column of the output will be clipped.

    This operation is typically used to clip gradients before applying them with
    an optimizer.

    Args:
        t: A `Tensor`.
        clip_norm: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
        axes: A 1-D (vector) `Tensor` of type int32 containing the dimensions
            to use for computing the L2-norm. If `None` (the default), uses all
            dimensions.
        name: A name for the operation (optional).

    Returns:
        A clipped `Tensor`.
    """
    with ops.name_scope(name, "clip_by_norm", [t, clip_norm]) as name:
        t = ops.convert_to_tensor(t, name="t")

        # Calculate L2-norm, clip elements by ratio of clip_norm to L2-norm
        l2norm_inv = math_ops.rsqrt(
            math_ops.reduce_sum(t * t, axes, keep_dims=True))
        tclip = array_ops.identity(t * clip_norm * math_ops.minimum(
            l2norm_inv, constant_op.constant(1.0, dtype=t.dtype) / clip_norm),
            name=name)

    return tclip
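The docstring makes the purpose clear: the function places an upper bound, clip_norm, on the L2 norm of the incoming gradient tensor t. If the L2 norm of t exceeds clip_norm, t is rescaled to t * clip_norm / l2norm(t), so the L2 norm of the result is less than or equal to clip_norm.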
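To see why the single expression in the source covers both cases, here is a minimal pure-Python sketch of the same arithmetic. It is an illustration only, not TensorFlow's implementation: the names clip_by_norm_sketch and l2_norm are made up for this example, it operates on plain lists rather than tensors, and the zero-norm guard is an addition (Python raises on division by zero).

```python
import math


def l2_norm(values):
    """Return the L2 norm of a flat list of numbers (hypothetical helper)."""
    return math.sqrt(sum(x * x for x in values))


def clip_by_norm_sketch(t, clip_norm):
    """Scale the list `t` so that its L2 norm is at most `clip_norm`."""
    norm = l2_norm(t)
    if norm == 0.0:
        return list(t)  # added guard: nothing to scale, avoids dividing by zero
    # Same shape as the source: t * clip_norm * min(1 / norm, 1 / clip_norm)
    scale = clip_norm * min(1.0 / norm, 1.0 / clip_norm)
    return [x * scale for x in t]


big = [1, 1, 3, 4, 2, 2, 1, 4, 2, 3]  # norm is sqrt(65), above the bound of 5
small = [1.0, 2.0, 2.0]               # norm is 3, below the bound of 5

clipped_big = clip_by_norm_sketch(big, 5.0)
clipped_small = clip_by_norm_sketch(small, 5.0)

print(l2_norm(clipped_big))  # close to 5.0, up to floating-point rounding
print(clipped_small)         # same values as `small`
```

When the norm is above clip_norm, min picks 1 / norm and the result is t * clip_norm / norm. Otherwise it picks 1 / clip_norm, the scale becomes 1, and t passes through unchanged. The NumPy walk-through that follows rescales unconditionally, which gives the same result only because the norm of its vector is larger than 5.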
Example
The following code gives a hands-on look at what the function does.
Generate random numbers
import numpy as np
t = np.random.randint(low=0, high=5, size=10)
t
array([1, 1, 3, 4, 2, 2, 1, 4, 2, 3])
Compute the L2 norm
l2norm4t = np.linalg.norm(t)
l2norm4t
8.0622577482985491
Rescale the random numbers
clip_norm = 5
transformed_t = t * clip_norm / l2norm4t
transformed_t
array([ 0.62017367, 0.62017367, 1.86052102, 2.48069469, 1.24034735,
1.24034735, 0.62017367, 2.48069469, 1.24034735, 1.86052102])
Verify
np.linalg.norm(transformed_t)
5.0
As the result shows, the L2 norm of the random sequence has been scaled down to the value of clip_norm.
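The docstring quoted earlier also describes an axes argument. As a rough sketch of what axes == [1] means for a matrix, the pure-Python snippet below clips each row separately. The function name clip_rows_by_norm and the sample matrix are hypothetical, chosen only for illustration. It follows the min-based formula in the source as I read it, under which rows already within the bound are left unchanged.

```python
import math


def clip_rows_by_norm(matrix, clip_norm):
    """Clip each row of `matrix` (a list of lists) to an L2 norm <= clip_norm."""
    clipped = []
    for row in matrix:
        row_norm = math.sqrt(sum(x * x for x in row))
        # Rows that are already within the bound keep a scale of 1.0
        scale = clip_norm / row_norm if row_norm > clip_norm else 1.0
        clipped.append([x * scale for x in row])
    return clipped


m = [[3.0, 4.0],   # row norm 5, above the bound of 2
     [0.6, 0.8]]   # row norm about 1, below the bound of 2

rows = clip_rows_by_norm(m, 2.0)
row_norms = [math.sqrt(sum(x * x for x in row)) for row in rows]

print(rows)       # first row is scaled down, second row is unchanged
print(row_norms)  # first norm is close to 2, second is close to 1
```

Clipping per row (or per column with axes == [0]) bounds each slice independently, whereas the default of using all dimensions bounds the norm of the whole tensor at once.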