Kaldi Chinese speech recognition: what the thchs30 model training code does and how its configuration parameters work

阿新 • Published: 2018-11-01

Monophone

Training the monophone model

# Flat start and monophone training, with delta-delta features.
# This script applies cepstral mean normalization (per speaker).
# monophone: train the monophone model
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1;
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &

train_mono.sh usage

echo "Usage: steps/train_mono.sh [options] <data-dir> <lang-dir> <exp-dir>"
echo " e.g.: steps/train_mono.sh data/train.1k data/lang exp/mono"
echo "main options (for others, see top of script file)"

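With the settings below, the script trains the baseline monophone HMM for 40 iterations (num_iters) and realigns the data on the iterations listed in realign_iters.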

# Begin configuration section.
nj=4
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
num_iters=40 # Number of iterations of training
max_iter_inc=30 # Last iter to increase #Gauss on.
totgauss=1000 # Target #Gaussians.
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38";
config= # name of config file.
stage=-4
power=0.25 # exponent to determine number of gaussians from occurrence counts
norm_vars=false # deprecated, prefer --cmvn-opts "--norm-vars=false"
cmvn_opts= # can be used to add extra options to cmvn.
# End configuration section.
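To make the realignment schedule concrete, here is a minimal sketch that walks the iteration counter and checks it against realign_iters. It assumes the training loop runs x from 1 while x is less than num_iters and tests membership with a whole-word grep; the variables realign_count and accumulate_only are hypothetical names used only for this illustration.

```shell
# Sketch only: replays the iteration schedule, no Kaldi binaries are called.
num_iters=40
realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"

realign_count=0      # hypothetical counter: passes that realign the data
accumulate_only=""   # hypothetical list: passes that reuse old alignments

x=1
while [ "$x" -lt "$num_iters" ]; do
  # -w matches whole words, so x=3 does not match "23" or "32".
  if echo "$realign_iters" | grep -qw "$x"; then
    realign_count=$((realign_count + 1))
  else
    accumulate_only="$accumulate_only $x"
  fi
  x=$((x + 1))
done

echo "realigned on $realign_count of $((num_iters - 1)) passes"
echo "reused existing alignments on:$accumulate_only"
```

The list is dense early on, when the model changes quickly, and sparse later, so under these assumptions realignment happens on 21 passes.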

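thchs-30_decode.sh tests the monophone model. It uses mkgraph.sh to build the full recognition network and output it as a finite-state transducer, then uses decode.sh, with the language model (compiled into that graph) and the test data as input, to compute the WER.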

#decode word
utils/mkgraph.sh $opt data/graph/lang $srcdir $srcdir/graph_word || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_word $datadir/test $srcdir/decode_test_word || exit 1
#decode phone
utils/mkgraph.sh $opt data/graph_phone/lang $srcdir $srcdir/graph_phone || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_phone $datadir/test_phone $srcdir/decode_test_phone || exit 1
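The WER that decoding reports is the standard word error rate: substitutions plus deletions plus insertions, divided by the number of reference words. The sketch below works through that arithmetic; the error counts are hypothetical values chosen for illustration, not results from thchs30.

```shell
# Hypothetical error counts, for illustration only.
sub=12         # substituted words
del=3          # deleted words
ins=5          # inserted words
ref_words=200  # words in the reference transcripts

# WER (%) = 100 * (S + D + I) / N
wer=$(awk -v s="$sub" -v d="$del" -v i="$ins" -v n="$ref_words" \
  'BEGIN { printf "%.2f", 100 * (s + d + i) / n }')

echo "WER = ${wer}%"
```

Because insertions count as errors, WER can exceed 100% on very poor output.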

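align_si.sh aligns the given data with the given model. It is usually run before training a new model: it takes the previous model as input and writes its output to <align-dir>.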

#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;
# Computes training alignments using a model with delta or
# LDA+MLLT features.
# If you supply the "--use-graphs true" option, it will use the training
# graphs from the source directory (where the model is). In this
# case the number of jobs must match with the source directory.
echo "usage: steps/align_si.sh <data-dir> <lang-dir> <src-dir> <align-dir>"
echo "e.g.: steps/align_si.sh data/train data/lang exp/tri1 exp/tri1_ali"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --nj <nj> # number of parallel jobs"
echo " --use-graphs true # use graphs in src-dir"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
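The note about "--use-graphs true" means the job count has to agree with the one used when the source directory was built. The sketch below shows one way to check this up front. It assumes the source directory records its job count in a num_jobs file; check_nj is a hypothetical helper, and a temporary directory stands in for exp/mono so the example runs without Kaldi.

```shell
# check_nj <src-dir> <nj>: succeeds only if <src-dir>/num_jobs equals <nj>.
# Hypothetical helper, not part of Kaldi.
check_nj() {
  srcdir_arg=$1
  nj_arg=$2
  if [ ! -f "$srcdir_arg/num_jobs" ]; then
    echo "no num_jobs file in $srcdir_arg" >&2
    return 1
  fi
  [ "$nj_arg" -eq "$(cat "$srcdir_arg/num_jobs")" ]
}

# Stand-in for a source directory that was trained with 4 jobs.
srcdir=$(mktemp -d)
echo 4 > "$srcdir/num_jobs"

if check_nj "$srcdir" 4; then
  echo "nj=4 matches: training graphs can be reused"
fi
if ! check_nj "$srcdir" 8; then
  echo "nj=8 does not match: graphs cannot be reused"
fi
```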

Triphone

Train a context-dependent triphone model, using the monophone model (through its alignments in exp/mono_ali) as input.

#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &

The relevant configuration in train_deltas.sh is shown below, followed by the usage line that lists its inputs.

# Begin configuration.
stage=-4 # This allows restarting after partway, when something went wrong.
config=
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
beam=10
careful=false
retry_beam=40
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=true"
# use the option --cmvn-opts "--norm-means=false"
cmvn_opts=
delta_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone
# End configuration.
echo "Usage: steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>"
echo "e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1"
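The two positional numbers (2000 and 10000 in the recipe) interact with max_iter_inc. The sketch below assumes the script starts with one Gaussian per tree leaf and adds a constant increment on every iteration up to max_iter_inc; that is a reading of the script rather than something stated above, so treat the schedule as illustrative.

```shell
# Values taken from the recipe call and the configuration shown above.
numleaves=2000
totgauss=10000
num_iters=35
max_iter_inc=25

# Assumption: start at one Gaussian per leaf, grow by a fixed step.
numgauss=$numleaves
incgauss=$(( (totgauss - numgauss) / max_iter_inc ))

x=1
while [ "$x" -lt "$num_iters" ]; do
  if [ "$x" -le "$max_iter_inc" ]; then
    numgauss=$((numgauss + incgauss))
  fi
  x=$((x + 1))
done

echo "increment per iteration: $incgauss"
echo "Gaussians after the last growing iteration: $numgauss"
```

Under these assumptions the model grows by 320 Gaussians per iteration and reaches the 10000 target at iteration 25, leaving the remaining iterations to refine a fixed-size model.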

LDA_MLLT

Transform the features with LDA and MLLT, and train a triphone model on the transformed features.

LDA+MLLT refers to the way we transform the features after computing the MFCCs: we splice across several frames, reduce the dimension (to 40 by default) using Linear Discriminant Analysis, and then later estimate, over multiple iterations, a diagonalizing transform known as MLLT or STC.

For details, see http://kaldi-asr.org/doc/transform.html

#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &

The relevant configuration in train_lda_mllt.sh is as follows:

# Begin configuration.
cmd=run.pl
config=
stage=-5
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
mllt_iters="2 4 6 12";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
dim=40
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
randprune=4.0 # This is approximately the ratio by which we will speed up the
# LDA and MLLT calculations via randomized pruning.
splice_opts=
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=false"
cmvn_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone.
# End configuration.
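The splice options and dim together determine how much LDA compresses each frame. The sketch below does the arithmetic for the recipe's "--left-context=3 --right-context=3". It assumes 13-dimensional MFCC input, which is a common default and not something this article states; substitute the real feature dimension if it differs.

```shell
left_context=3    # from --splice-opts in the recipe
right_context=3   # from --splice-opts in the recipe
feat_dim=13       # ASSUMPTION: per-frame MFCC dimension
dim=40            # LDA output dimension from the configuration above

# Each spliced vector stacks the current frame with its neighbours.
frames=$((left_context + 1 + right_context))
spliced_dim=$((frames * feat_dim))

echo "spliced input: $frames frames x $feat_dim = $spliced_dim dims"
echo "after LDA: $dim dims"
```

With these numbers LDA maps a 91-dimensional spliced vector down to 40 dimensions before the MLLT transform is estimated on the iterations in mllt_iters.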

SAT

Speaker adaptive training using feature-space maximum likelihood linear regression (fMLLR).

This does Speaker Adapted Training (SAT), i.e. train on fMLLR-adapted features. It can be done on top of either LDA+MLLT, or delta and delta-delta features. If there are no transforms supplied in the alignment directory, it will estimate transforms itself before building the tree (and in any case, it estimate
