TensorFlow Learning Notes -- 6. Baidu warp-ctc Parameters and Test Examples Explained (2)
阿新 · Published 2018-12-30
1 Baidu CTC
2 CTC Explained in Detail
In short, the idea is to design a loss that does not require frame-aligned labels; by minimizing this loss we obtain accurate recognition (i.e., the output can still be decoded without aligning the labels). Its effectiveness and advantages are especially clear in speech recognition.
To be continued.
3 Baidu warp-ctc Parameters and Examples, Explained
1 The ctc function
ctc(activations, flat_labels, label_lengths, input_lengths, blank_label=0)
Computes the CTC loss between a sequence of activations and a
ground truth labeling.
Args:
activations: A 3-D Tensor of floats. The dimensions
should be (t, n, a), where t is the time index, n
is the minibatch index, and a indexes over
activations for each symbol in the alphabet.
#This is essentially the logits (the RNN's predicted output). As in TensorFlow, the first dimension is the time step t, the second is the batch index n, and the third is the alphabet dimension a.
flat_labels: A 1-D Tensor of ints, a concatenation of all the
labels for the minibatch.
#labels is a 1-D tensor: all label sequences in the minibatch concatenated together. For example, if one example has labels 1, 2, its labels are written [1, 2]; if two examples in the minibatch both have labels 1, 2, then flat_labels becomes [1, 2, 1, 2].
label_lengths: A 1-D Tensor of ints, the length of each label
for each example in the minibatch.
#The length of each example's label sequence in the minibatch. Since all labels are concatenated into flat_labels, these lengths are what let the function split them back apart per example.
input_lengths: A 1-D Tensor of ints, the number of time steps
for each sequence in the minibatch.
#The input length: the number of time steps for each sequence in the minibatch.
blank_label: int, the label value/index that the CTC
calculation should use as the blank label
#The label index treated as the blank symbol (0 by default). The function returns the cost of each example in the minibatch.
Returns:
1-D float Tensor, the cost of each example in the minibatch
(as negative log probabilities).
* This class performs the softmax operation internally.
* The label reserved for the blank symbol should be label 0.
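To make the argument layout concrete, here is a minimal numpy sketch of how the four tensors relate for a hypothetical batch of two sequences (labels [1, 2] and [3], alphabet size 5, blank = 0). The shapes and values are illustrative only; no warp-ctc call is made.

```python
import numpy as np

# Illustrative shapes only (a hypothetical batch, not warp-ctc's own test).
t, n, a = 4, 2, 5                       # time steps, batch size, alphabet size
activations = np.random.randn(t, n, a).astype(np.float32)  # (t, n, a) logits

flat_labels = np.array([1, 2, 3], dtype=np.int32)   # both label sequences, concatenated
label_lengths = np.array([2, 1], dtype=np.int32)    # splits flat_labels back per example
input_lengths = np.array([4, 4], dtype=np.int32)    # time steps used by each sequence

# The lengths must be consistent with the concatenated labels and activations.
assert flat_labels.shape[0] == label_lengths.sum()
assert input_lengths.shape[0] == activations.shape[1]
```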
2 Basic test: the _test_basic inputs, explained
#activations starts with shape (2, 5)
activations = np.array([
[0.1, 0.6, 0.1, 0.1, 0.1],
[0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)
alphabet_size = 5
# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
#activations now has shape (2, 1, 5), i.e. (t, batch_size, alphabet_size)
activations = np.expand_dims(activations, 1)
#labels
labels = np.asarray([1, 2], dtype=np.int32)
#the length of each example's label sequence in the minibatch
label_lengths = np.asarray([2], dtype=np.int32)
#the number of input time steps per sequence
input_lengths = np.asarray([2], dtype=np.int32)
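Since T equals the label length here, the only valid CTC alignment of [1, 2] is "label 1 at t=0, label 2 at t=1" (there is no room for blanks), so the expected cost can be checked by hand with plain numpy. This is a sketch that mirrors the softmax warp-ctc applies internally and does not require warp-ctc to be installed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)

probs = softmax(activations)        # (t=2, a=5); warp-ctc does this internally
# The only valid alignment of [1, 2] in two steps: label 1 at t=0, label 2 at t=1.
cost = -np.log(probs[0, 1] * probs[1, 2])
print(float(cost))                  # ≈ 2.4629, the expected per-example cost
```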
3 Multi-batch test: inputs explained
#activations starts with shape (2, 5)
activations = np.array([
[0.1, 0.6, 0.1, 0.1, 0.1],
[0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)
alphabet_size = 5
# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
#activations now has shape (2, 1, 5), i.e. (t, batch_size, alphabet_size)
_activations = np.expand_dims(activations, 1)
#activations now has shape (2, 2, 5), i.e. (t, batch_size, alphabet_size)
activations = np.concatenate([_activations, _activations], axis=1)
#flat labels: the label sequences of both examples, concatenated
labels = np.asarray([1, 2, 1, 2], dtype=np.int32)
#label-sequence length for each example in the minibatch, collected into one array
label_lengths = np.asarray([2, 2], dtype=np.int32)
#input time steps for each sequence, likewise collected into one array
input_lengths = np.asarray([2, 2], dtype=np.int32)
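Because the two batch entries are identical copies, the batched call should return the same cost for each example. As a sanity check, here is a numpy-only sketch (an assumed illustration, no warp-ctc needed) that recomputes both per-example costs by hand:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)
_activations = np.expand_dims(activations, 1)                    # (2, 1, 5)
batched = np.concatenate([_activations, _activations], axis=1)   # (2, 2, 5)

probs = softmax(batched)                 # softmax over the alphabet axis
# Only valid alignment of [1, 2] in two steps: label 1 at t=0, label 2 at t=1.
costs = -np.log(probs[0, :, 1] * probs[1, :, 2])   # one cost per example
print(costs.shape)                       # (2,) -- both entries are equal
```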