
Utterance-Wise Recurrent Dropout And Iterative Speaker Adaptation For Robust Monaural Speech Recognition



WRBN (wide residual BLSTM network)

[2] J. Heymann, L. Drude, and R. Haeb-Umbach, "Wide residual BLSTM network with discriminative speaker adaptation for robust speech recognition," submitted to the CHiME-4 workshop, 2016.

reverberation (n.): reverb; reflection; echo

CLDNN (convolutional, long short-term memory, fully connected deep neural network)

[1] T.N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4580–4584.

speech separation: separating a recording in which multiple speakers talk at the same time into individual utterances, one per speaker.

Using dropout during LSTM training effectively mitigates overfitting.

[3] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.

Applying frame-dropping masks sampled once per utterance at the output, forget, and input gates achieves the best results (Cheng dropout).

[7] G. Cheng, V. Peddinti, D. Povey, V. Manohar, S. Khudanpur, and Y. Yan, "An exploration of dropout with lstms," in Proceedings of Interspeech, 2017.
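To make the gate masking concrete, here is a minimal NumPy sketch of utterance-wise recurrent dropout: one mask per gate is sampled once for the whole utterance and reused at every time step. The cell layout, inverted-dropout scaling, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def utterance_gate_masks(hidden_dim, drop_prob, rng):
    """Sample ONE dropout mask per gate for the whole utterance.

    Utterance-wise: the same masks are reused at every time step,
    instead of being resampled frame by frame."""
    keep = 1.0 - drop_prob
    def mask():
        # Inverted dropout: scale kept units so no rescaling is needed at test time.
        return rng.binomial(1, keep, size=hidden_dim) / keep
    return {"input": mask(), "forget": mask(), "output": mask()}

def lstm_step(x_t, h_prev, c_prev, W, U, b, masks):
    """One LSTM step with the per-utterance masks applied to the three gates."""
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations [i, f, o, g]
    i, f, o, g = np.split(z, 4)
    i = sigmoid(i) * masks["input"]       # a dropped gate unit stays dropped
    f = sigmoid(f) * masks["forget"]      # for the entire utterance
    o = sigmoid(o) * masks["output"]
    c_t = f * c_prev + i * np.tanh(g)
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```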

The MLLR-based iterative adaptation method uses the decoding result of the previous iteration to update the Gaussian parameters.

[10] P.C. Woodland, D. Pye, and M.J.F. Gales, "Iterative unsupervised adaptation using maximum likelihood linear regression," in Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on. IEEE, 1996, vol. 2, pp. 1133–1136.
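A sketch of that iterative loop, under stated assumptions: `decode` and `estimate_mllr_transform` are hypothetical stand-ins for a real recognizer and the EM-based transform estimation; only the mean-transform step μ' = Aμ + b is spelled out.

```python
import numpy as np

def apply_mllr(means, A, b):
    """MLLR adapts all Gaussian means with one shared affine transform:
    mu' = A @ mu + b (means: (n_gaussians, dim), A: (dim, dim), b: (dim,))."""
    return means @ A.T + b

def iterative_mllr(model, utterances, n_iters=3):
    """Iterative unsupervised MLLR in the spirit of [10]: each pass decodes
    with the current model, then re-estimates the transform from that
    (noisy) transcription and updates the Gaussian parameters."""
    for _ in range(n_iters):
        hyps = decode(model, utterances)                          # hypothetical recognizer
        A, b = estimate_mllr_transform(model, utterances, hyps)   # hypothetical EM step
        model.means = apply_mllr(model.means, A, b)
    return model
```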

Recently, a batch-regularized speaker adaptation method was proposed.

[14] P. Swietojanski, J. Li, and S. Renals, "Learning hidden unit contributions for unsupervised acoustic model adaptation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 8, pp. 1450–1463, 2016.

This paper uses unsupervised LIN (linear input network) speaker adaptation [11].

The LIN layer matrix is 80×80, and the layer is shared by the three input feature streams (static, delta, and delta-delta).
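A minimal PyTorch sketch of such a shared LIN layer, assuming identity initialization so the unadapted model is initially unchanged (a common choice for LIN; the class name is illustrative):

```python
import torch
import torch.nn as nn

class SharedLIN(nn.Module):
    """One 80x80 linear input network shared by the static, delta,
    and delta-delta feature streams."""
    def __init__(self, feat_dim=80):
        super().__init__()
        self.lin = nn.Linear(feat_dim, feat_dim)
        nn.init.eye_(self.lin.weight)   # identity init: before adaptation,
        nn.init.zeros_(self.lin.bias)   # the network behaves as if LIN were absent
    def forward(self, static, delta, delta_delta):
        # The same weight matrix transforms all three streams.
        return self.lin(static), self.lin(delta), self.lin(delta_delta)
```

During adaptation only these 80×80 + 80 parameters would be updated per speaker; the acoustic model behind them stays frozen.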

This paper tries the following two approaches to iterative speaker adaptation (sketched after the list):

  • Iter: at each iteration, use the model from the previous iteration to generate new labels for training.
  • Stack: at each iteration, stack one additional linear input layer (mathematically, multiple stacked linear layers are equivalent to a single linear layer).
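A hedged sketch of the two schemes, reusing the `SharedLIN` class above; `decode` and `train_lin` are hypothetical helpers standing in for hypothesis generation and backpropagation through the frozen acoustic model.

```python
def adapt_iter(model, utterances, n_iters=2):
    """'Iter': keep ONE LIN layer and retrain it each pass on labels
    produced by the previous pass's adapted model."""
    lin = SharedLIN()                              # identity-initialized (above)
    for _ in range(n_iters):
        labels = decode(model, [lin], utterances)  # hypothetical decoder
        train_lin(lin, model, utterances, labels)  # hypothetical trainer; only
    return lin                                     # the LIN weights are updated

def adapt_stack(model, utterances, n_iters=2):
    """'Stack': each pass prepends a fresh LIN. Composed linear maps collapse
    to a single linear map, so capacity is unchanged; each pass simply
    restarts optimization from a new identity layer."""
    lins = []
    for _ in range(n_iters):
        labels = decode(model, lins, utterances)
        new_lin = SharedLIN()
        train_lin(new_lin, model, utterances, labels)  # earlier layers frozen
        lins.insert(0, new_lin)                        # new layer at the input
    return lins
```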

Conventional DNN training is segment-wise (fixed-length chunks of frames).
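The contrast with utterance-wise training can be sketched as two batching strategies; the helper names and segment length are illustrative assumptions.

```python
import numpy as np

def segment_wise_batches(utterances, seg_len=16):
    """Conventional DNN-style training: cut each utterance into fixed-length
    frame segments that can be shuffled freely across utterances."""
    segments = []
    for utt in utterances:                       # utt: (n_frames, feat_dim)
        for start in range(0, len(utt) - seg_len + 1, seg_len):
            segments.append(utt[start:start + seg_len])
    return segments

def utterance_wise_batches(utterances):
    """Utterance-wise training: each example is a whole utterance, so a single
    recurrent-dropout mask can stay fixed across all of its frames."""
    return list(utterances)
```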

Experiments show that Iter (the iterative scheme) performs better with the RNN, while Stack (the stacking scheme) performs better with the tri-gram.
