End-to-End Speech Recognition (Part 2): CTC

阿新 • Published: 2019-02-08

History

ICML-2006. Graves et al. [1] introduced the connectionist temporal classification (CTC) objective function for phone recognition.
ICML-2014. Graves and Jaitly [2] demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing.
Baidu. Deep Speech (2014) [3] and Deep Speech 2 (2015) [4].
ASRU-2015. Yajie Miao et al. [5] presented the EESEN framework.
ASRU-2015. Google [6] extended the application of Context-Dependent (CD) LSTM trained with CTC and sMBR loss.
ICASSP-2016. Google [7] presented a compact large vocabulary speech recognition system that can run efficiently on mobile devices, accurately and with low latency.
NIPS-2016. Google [8] used whole words as acoustic units.
2017. IBM [9] employed direct acoustics-to-word models.

References

[1]. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
[2]. Graves, Alex and Jaitly, Navdeep. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772, 2014.
[3]. Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., et al. (2014). Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567.
[4]. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” CoRR arXiv:1512.02595, 2015.
[5]. Yajie Miao, Mohammad Gowayyed, Florian Metze. EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding. 2015 Automatic Speech Recognition and Understanding Workshop (ASRU 2015).
[6]. A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath, and K. Rao, “Acoustic Modelling with CD-CTC-sMBR LSTM RNNs,” in ASRU, 2015.
[7]. I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada, “Personalized speech recognition on mobile devices,” in Proc. of ICASSP, 2016.
[8]. H. Soltau, H. Liao, and H. Sak, “Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition,” arXiv preprint arXiv:1610.09975, 2016.
[9]. K. Audhkhasi, B. Ramabhadran, G. Saon, M. Picheny, D. Nahamoo, “Direct Acoustics-to-Word Models for English Conversational Speech Recognition,” arXiv preprint arXiv:1703.07754, 2017.
