Summary of errors when running the thchs30 Chinese speech corpus on Kaldi
阿新 • Published: 2019-01-01
After finishing the timit example, I moved on to the Chinese corpus thchs30. The first error I hit while running it was the following:
decode.sh: feature type is lda
steps/align_fmllr.sh: doing final alignment.
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstdeterminizestar[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstdeterminizestar() [0x626fe2]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmsymbols[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstrmsymbols() [0x54d89c]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstrmepslocal[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstrmepslocal() [0x5739d4]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstminimizeencoded[5.1]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input
[ Stack-Trace: ]
fstminimizeencoded() [0x5c3b92]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start
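Only after asking @wbglearn, who knows Kaldi far better than I do, did I learn that the script was failing while running steps in parallel. The fix is to remove the background operator & at the end of each local/thchs-30_decode.sh line in the code below (it was marked in red in the original post):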
#monophone
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1;
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &
#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;
#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &
#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &
#lda_mllt_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;
#sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
#test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &
#sat_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri3b exp/tri3b_ali || exit 1;
#quick
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
#test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &
#quick_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri4b exp/tri4b_ali || exit 1;
#quick_ali_cv
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/dev data/lang exp/tri4b exp/tri4b_ali_cv || exit 1;
#train dnn model
local/nnet/run_dnn.sh --stage 0 --nj $n exp/tri4b exp/tri4b_ali exp/tri4b_ali_cv || exit 1;
With that change, the error above went away.
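The same edit can be made mechanically instead of by hand. The following is a minimal sketch, not part of the thchs30 recipe: strip_background_decodes is a hypothetical helper name, and the file names in the usage comment are assumptions. It removes a trailing & only from lines that call local/thchs-30_decode.sh and prints the result, leaving the original file untouched.

```shell
# Hypothetical helper: print a copy of a recipe script in which the decode
# wrapper calls run in the foreground (trailing "&" removed).
strip_background_decodes() {
  # Only lines that call local/thchs-30_decode.sh are changed;
  # every other line is printed as-is.
  sed -E '/thchs-30_decode\.sh/ s/[[:space:]]*&[[:space:]]*$//' "$1"
}

# Usage (file names are assumptions):
#   strip_background_decodes run.sh > run_serial.sh
```

The trade-off is that each decode now blocks the training step that follows it, so the whole recipe takes longer to finish.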
However, when I ran the noisy-speech dae (denoising autoencoder) stage, another error appeared:
num_fea = 40
run.pl: job failed, log is in exp/tri4b_dnn_dae/log/train_nnet.log
The job failed, and the error log is in the file at the path above. Opening that file shows the following error:
steps/nnet/train_scheduler.sh: line 86: 21609 Segmentation fault (core dumped)
$train_tool --cross-validate=true --randomize=false --verbose=$verbose
$train_tool_opts ${feature_transform:+ --feature-transform=$feature_transform}
${frame_weights:+ "--frame-weights=$frame_weights"}
${utt_weights:+ "--utt-weights=$utt_weights"} "$feats_cv" "$labels_cv"
$mlp_best 2>> $log
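Kaldi logs are long, and the line that matters is easy to miss. A small sketch for pulling the error and warning lines out of every log in one directory follows; scan_logs is a hypothetical helper name, and the pattern list is an assumption based only on the messages quoted in this post.

```shell
# Hypothetical helper: list error-like lines from all *.log files in a directory.
scan_logs() {
  # -H: always print the file name, -n: print the line number,
  # -E: extended regex so that | means "or".
  grep -H -n -E 'ERROR|WARNING|Segmentation fault' "$1"/*.log
}

# Usage (directory taken from the run.pl message above):
#   scan_logs exp/tri4b_dnn_dae/log
```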
There is another log file in the same folder, which contains the following error:
LOG (nnet-train-frmshuff[5.1]:Init():nnet-randomizer.cc:32) Seeding by srand with : 777
LOG (nnet-train-frmshuff[5.1]:main():nnet-train-frmshuff.cc:157) CROSS-VALIDATION STARTED
apply-cmvn --norm-vars=false scp:exp/tri4b_dnn_dae/tgt_cmvn.scp ark:- ark:-
copy-feats scp:exp/tri4b_dnn_dae/tgt_feats.scp ark:-
WARNING (apply-cmvn[5.1]:Open():util/kaldi-table-inl.h:1650) Script file exp/tri4b_dnn_dae/tgt_cmvn.scp contains duplicate key: A02
ERROR (apply-cmvn[5.1]:RandomAccessTableReader():util/kaldi-table-inl.h:2528) Error opening RandomAccessTableReader object (rspecifier is: scp:exp/tri4b_dnn_dae/tgt_cmvn.scp)
[ Stack-Trace: ]
apply-cmvn() [0x5413ae]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::RandomAccessTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix<double> > >::RandomAccessTableReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
kaldi::RandomAccessTableReaderMapped<kaldi::KaldiObjectHolder<kaldi::Matrix<double> > >::RandomAccessTableReaderMapped(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
main
__libc_start_main
_start
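I really do not know how to solve this one, and I blame my own inexperience. Since the noise-reduction part of my ASR work does not use Kaldi's DNN, the problem has little effect on my research direction, so I have set it aside for now.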
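For anyone who wants to dig further, the WARNING line in the log points at a concrete starting place: tgt_cmvn.scp contains the key A02 more than once. A minimal sketch for listing every repeated key in an scp file is given here; list_duplicate_keys is a hypothetical helper name, and it assumes the usual scp layout in which the key is the first whitespace-separated field of each line.

```shell
# Hypothetical helper: print each key that occurs more than once in an scp file.
list_duplicate_keys() {
  # Take the first field of every line, sort the keys,
  # and keep only those that appear at least twice.
  awk '{print $1}' "$1" | sort | uniq -d
}

# Usage (path taken from the log above):
#   list_duplicate_keys exp/tri4b_dnn_dae/tgt_cmvn.scp
```

This only confirms what the warning reports. It does not explain why the key appears twice, and it is not a fix for the crash.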
The decoding results and recognition rates for clean speech are saved under /home/wang/download/KALDI_ROOT/egs/thchs30/s5/exp. The tri1, tri2b, tri3b, tri4b and tri4b_dnn folders there hold the recognition results.
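Although this problem has little impact on my work, it still nags at me like a thorn in the flesh. If anyone has run into the same issue, feel free to get in touch.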