
A summary of errors when running the thchs30 Chinese speech corpus on Kaldi

After finishing the TIMIT example, I moved on to the Chinese corpus thchs30. The first error I hit during the run was:

decode.sh: feature type is lda
steps/align_fmllr.sh: doing final alignment.
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstdeterminizestar[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input

[ Stack-Trace: ]
fstdeterminizestar() [0x626fe2]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start

ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmsymbols[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input

[ Stack-Trace: ]
fstrmsymbols() [0x54d89c]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start

ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmepslocal[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input

[ Stack-Trace: ]
fstrmepslocal() [0x5739d4]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start

ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstminimizeencoded[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input

[ Stack-Trace: ]
fstminimizeencoded() [0x5c3b92]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start

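After asking the expert @wbglearn, I learned that the script fails when its stages run in parallel. The fix is to remove the trailing background operator & from the decode (local/thchs-30_decode.sh) lines in the code below, such as the monophone and tri1 test lines, so that each decode finishes before the script moves on: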

#monophone
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1; 
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &

#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;

#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &

#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;

#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc 

#lda_mllt_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;

#sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
#test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc

#sat_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri3b exp/tri3b_ali || exit 1;

#quick
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
#test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc

#quick_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri4b exp/tri4b_ali || exit 1;

#quick_ali_cv
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/dev data/lang exp/tri4b exp/tri4b_ali_cv || exit 1;

#train dnn model
local/nnet/run_dnn.sh --stage 0 --nj $n exp/tri4b exp/tri4b_ali exp/tri4b_ali_cv || exit 1;
After this change, the errors above go away.
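If you would rather not edit run.sh by hand, the change can be scripted. The following is a minimal sketch, not part of Kaldi or the thchs30 recipe: the function name strip_decode_bg is hypothetical, and it assumes the decode commands start at the beginning of a line with local/thchs-30_decode.sh, as in the stock s5/run.sh. It only drops a single trailing &, leaves && untouched, and keeps a .bak copy of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): remove the trailing background
# operator '&' from every local/thchs-30_decode.sh line of a recipe script,
# so each decode runs in the foreground and finishes before the next stage.
# A backup of the original file is written to "<script>.bak".
strip_decode_bg() {
  local script="$1"
  # Capture everything up to the last non-space, non-'&' character, then
  # require optional spaces, exactly one '&', and optional spaces at the end.
  # Lines ending in '&&' do not match and are left unchanged.
  sed -i.bak -E \
    's|^([[:space:]]*local/thchs-30_decode\.sh.*[^[:space:]&])[[:space:]]*&[[:space:]]*$|\1|' \
    "$script"
}
```

Usage, from egs/thchs30/s5: strip_decode_bg run.sh, then diff run.sh.bak run.sh to confirm that only the decode lines changed.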

However, when running the DAE (denoising autoencoder) recipe on noisy speech, another error appeared:

num_fea = 40
run.pl: job failed, log is in exp/tri4b_dnn_dae/log/train_nnet.log
The job failed, and the error log is in the file at the path above. Opening that file shows the following error:
steps/nnet/train_scheduler.sh: line 86: 21609 Segmentation fault      (core dumped) 
$train_tool --cross-validate=true --randomize=false --verbose=$verbose 
$train_tool_opts ${feature_transform:+ --feature-transform=$feature_transform} 
${frame_weights:+ "--frame-weights=$frame_weights"} 
${utt_weights:+ "--utt-weights=$utt_weights"} "$feats_cv" "$labels_cv" 
$mlp_best 2>> $log
Another log file in the same folder contains the following error:
LOG (nnet-train-frmshuff[5.1]:Init():nnet-randomizer.cc:32) Seeding by srand with : 777
LOG (nnet-train-frmshuff[5.1]:main():nnet-train-frmshuff.cc:157) CROSS-VALIDATION STARTED
apply-cmvn --norm-vars=false scp:exp/tri4b_dnn_dae/tgt_cmvn.scp ark:- ark:- 
copy-feats scp:exp/tri4b_dnn_dae/tgt_feats.scp ark:- 
WARNING (apply-cmvn[5.1]:Open():util/kaldi-table-inl.h:1650) Script file exp/tri4b_dnn_dae/tgt_cmvn.scp contains duplicate key: A02
ERROR (apply-cmvn[5.1]:RandomAccessTableReader():util/kaldi-table-inl.h:2528) Error opening RandomAccessTableReader object  (rspecifier is: scp:exp/tri4b_dnn_dae/tgt_cmvn.scp)

[ Stack-Trace: ]
apply-cmvn() [0x5413ae]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::RandomAccessTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<double> > >::RandomAccessTableReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
kaldi::RandomAccessTableReaderMapped<kaldi::KaldiObjectHolder<kaldi::Matrix<double> > >::RandomAccessTableReaderMapped(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
main
__libc_start_main
_start
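I honestly don't know how to fix this one; I'm still too much of a beginner. Since the denoising part of my ASR work doesn't use Kaldi's DNN, this issue has little impact on my research, so I've set it aside for now.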
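For anyone who wants to dig further, the WARNING line above is a natural starting point: apply-cmvn reports that exp/tri4b_dnn_dae/tgt_cmvn.scp contains the duplicate key A02, and the RandomAccessTableReader then fails to open that file. A minimal diagnostic sketch (the function name find_dup_keys is hypothetical) lists every key that occurs more than once, treating the first whitespace-separated field of each line as the key, as in Kaldi script files. It only locates the duplicates; it does not say which entry is correct or why they were written twice.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of Kaldi): print each key (first field)
# that appears more than once in a Kaldi-style .scp file, with its count.
# Blank lines are ignored; output is sorted by key.
find_dup_keys() {
  awk 'NF > 0 { n[$1]++ } END { for (k in n) if (n[k] > 1) print k, n[k] }' "$1" | sort
}
```

For example, find_dup_keys exp/tri4b_dnn_dae/tgt_cmvn.scp, followed by grep -n '^A02 ' exp/tri4b_dnn_dae/tgt_cmvn.scp to see which archives the duplicate entries point to.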

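The decoding results, recognition accuracy and related information for clean speech are stored under /home/wang/download/KALDI_ROOT/egs/thchs30/s5/exp. The recognition results are in the corresponding tri1, tri2b, tri3b, tri4b and tri4b_dnn subfolders.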
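To pull the best score out of a decode directory quickly, Kaldi recipes usually provide utils/best_wer.sh. The sketch below is a standalone approximation that does not depend on the Kaldi tree. It assumes, and this is an assumption to check against your own scoring output, that each decode directory (for example exp/tri1/decode_test_word) contains scoring files named wer_* with lines of the form "%WER 36.06 [ ... ]". The function name best_wer is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "%WER" line with the lowest error rate among
# the wer_* files of one decode directory, prefixed by the file it came from.
# Assumes lines look like "%WER <number> [ ... ]" (usual Kaldi scoring output).
best_wer() {
  local dir="$1"
  # grep -H prefixes each match with "path:", so field 2 (space-separated)
  # is the WER value; LC_ALL=C keeps '.' as the decimal point for sort -n.
  grep -H '%WER' "$dir"/wer_* 2>/dev/null | LC_ALL=C sort -t ' ' -k2,2n | head -n 1
}
```

Run from egs/thchs30/s5, for d in exp/*/decode*; do best_wer "$d"; done prints one line per decode directory that has wer_* files.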

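Although this problem doesn't matter much for my work, it still bothers me like a thorn in my side. If you have run into the same problem, I'd be glad to hear from you.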