Facebook releases the wav2letter toolkit for end-to-end automatic speech recognition

阿新 • Published: 2022-05-04

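AI 研習社 reports that Facebook AI Research recently released the wav2letter toolkit, a simple and efficient end-to-end automatic speech recognition (ASR) system. It implements the architectures proposed in two papers: Wav2Letter: an End-to-End ConvNet-based Speech Recognition System and Letter-Based Speech Recognition with Gated ConvNets. If you want to start recognizing speech right away, Facebook provides a model pre-trained on the LibriSpeech dataset.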

The system requirements and installation steps, as compiled by AI 研習社, are as follows:

Requirements:

Operating system: macOS or Linux

Torch: installation is covered below

Training on CPU: Intel MKL

Training on GPU: NVIDIA CUDA Toolkit (cuDNN v5.1 for CUDA 8.0)

Reading audio files: Libsndfile

Standard speech features: FFTW

Installation:

MKL

If you plan to train on CPU, installing Intel MKL is strongly recommended.

Add the following to your .bashrc file:

# We assume Torch will be installed in $HOME/usr.

# Change according to your needs.

export PATH=$HOME/usr/bin:$PATH



# This is to detect MKL during compilation

# but also to make sure it is found at runtime.

INTEL_DIR=/opt/intel/lib/intel64
MKL_DIR=/opt/intel/mkl/lib/intel64
MKL_INC_DIR=/opt/intel/mkl/include



if [ ! -d "$INTEL_DIR" ]; then
   echo "$ warning: INTEL_DIR out of date"

fi

if [ ! -d "$MKL_DIR" ]; then
   echo "$ warning: MKL_DIR out of date"

fi

if [ ! -d "$MKL_INC_DIR" ]; then
   echo "$ warning: MKL_INC_DIR out of date"

fi



# Make sure MKL can be found by Torch.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INTEL_DIR:$MKL_DIR

export CMAKE_LIBRARY_PATH=$LD_LIBRARY_PATH

export CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:$MKL_INC_DIR

LuaJIT and LuaRocks

The following commands install LuaJIT and LuaRocks under $HOME/usr. For a system-wide installation, remove -DCMAKE_INSTALL_PREFIX=$HOME/usr from the cmake command.

git clone https://github.com/torch/luajit-rocks.git

cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install

cd ../..

From here on, we assume that luarocks and luajit are in $PATH. If they are not, and you installed them locally under $HOME/usr, run ~/usr/bin/luarocks and ~/usr/bin/luajit instead.
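The fallback described above can be sketched as a small shell helper that looks for a tool in $PATH first and under the local prefix second. The function name resolve_tool is hypothetical and not part of wav2letter; the default prefix $HOME/usr is the one used throughout this guide.

```shell
# Hypothetical helper: print the path to a tool, preferring $PATH and
# falling back to PREFIX/bin (PREFIX defaults to $HOME/usr, as in this guide).
resolve_tool() {
    tool="$1"
    prefix="${2:-$HOME/usr}"
    if command -v "$tool" >/dev/null 2>&1; then
        # Found in $PATH: print its location.
        command -v "$tool"
    elif [ -x "$prefix/bin/$tool" ]; then
        # Not in $PATH, but installed under the local prefix.
        printf '%s\n' "$prefix/bin/$tool"
    else
        echo "error: $tool not found in \$PATH or $prefix/bin" >&2
        return 1
    fi
}
```

With this in place, a command such as "$(resolve_tool luarocks)" install torch would work with either kind of installation.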

KenLM language model toolkit (https://kheafield.com/code/kenlm)

If you want to use the wav2letter decoder, you need to install KenLM.

KenLM requires Boost:

# make sure boost is installed (with system/thread/test modules)

# actual command might vary depending on your system

sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-test-dev

Once Boost is installed, you can install KenLM:

wget https://kheafield.com/code/kenlm.tar.gz
tar xfvz kenlm.tar.gz
cd kenlm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 4
make install
cp -a lib/* ~/usr/lib # libs are not installed by default :(
cd ../..

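OpenMPI (https://www.open-mpi.org/) and TorchMPI (https://github.com/facebookresearch/TorchMPI)

If you plan to use multiple CPUs/GPUs (or multiple machines), you need to install OpenMPI and TorchMPI.

Disclaimer: we strongly encourage you to recompile OpenMPI. The OpenMPI binaries shipped with standard distributions are built with inconsistent compilation flags, and specific flags are critical for compiling and running TorchMPI successfully.

First, install OpenMPI: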

wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2

cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install

Note: openmpi-3.0.0.tar.bz2 can be used as well, but you then need to remove --enable-mpi-thread-multiple.
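As a rough sketch of the note above, the configure flags can be generated per OpenMPI version. The function name ompi_configure_flags is hypothetical. It assumes that every 3.x release behaves like 3.0.0, which the source states only for 3.0.0 itself. The default CUDA path is the one from the command above and will likely need adjusting on your system.

```shell
# Sketch: print the configure flags from this guide for a given OpenMPI version.
# Assumption: all 3.x releases must omit --enable-mpi-thread-multiple,
# as the guide states for 3.0.0.
ompi_configure_flags() {
    version="$1"
    cuda_dir="${2:-/public/apps/cuda/9.0}"  # path from the guide; adjust as needed
    flags="--prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm"
    case "$version" in
        # Only 2.x releases get the thread-multiple flag.
        2.*) flags="$flags --enable-mpi-thread-multiple" ;;
    esac
    flags="$flags --enable-mpi-ext=affinity,cuda --with-cuda=$cuda_dir"
    printf '%s\n' "$flags"
}
```

The printed flags can then be passed to OpenMPI's configure script, for example through $(ompi_configure_flags 3.0.0).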

Now you can install TorchMPI:

MPI_CXX_COMPILER=$HOME/usr/bin/mpicxx ~/usr/bin/luarocks install torchmpi

Torch and other Torch packages

luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn # for GPU support

The wav2letter packages

git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# Assuming here you got KenLM in $HOME/kenlm
# And only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..

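Training a wav2letter model

Data preprocessing

The data folder contains scripts for preprocessing several datasets. For now, only scripts for the LibriSpeech and TIMIT datasets are provided.

Below is an example of preprocessing the LibriSpeech ASR dataset: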

wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz

# repeat for train-clean-100, train-clean-360, train-other-500, dev-other, test-clean, test-other

luajit ~/wav2letter/data/librispeech/create.lua ~/LibriSpeech ~/librispeech-proc
luajit ~/wav2letter/data/utils/create-sz.lua librispeech-proc/train-clean-100 librispeech-proc/train-clean-360 librispeech-proc/train-other-500 librispeech-proc/dev-clean librispeech-proc/dev-other librispeech-proc/test-clean librispeech-proc/test-other
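The "repeat for ..." comment in the listing above can be expanded into a loop. The sketch below uses the subset names from that comment and assumes every subset follows the same URL pattern as dev-clean. The function names librispeech_urls and fetch_librispeech are hypothetical.

```shell
# Subset names come from the listing above; the URL pattern is assumed to
# match the dev-clean URL for every subset.
LIBRISPEECH_SUBSETS="dev-clean dev-other test-clean test-other train-clean-100 train-clean-360 train-other-500"
LIBRISPEECH_BASE_URL="http://www.openslr.org/resources/12"

# Print one download URL per subset.
librispeech_urls() {
    for subset in $LIBRISPEECH_SUBSETS; do
        printf '%s/%s.tar.gz\n' "$LIBRISPEECH_BASE_URL" "$subset"
    done
}

# Hypothetical helper: download and unpack every subset into the current
# directory, stopping at the first failure.
fetch_librispeech() {
    for url in $(librispeech_urls); do
        wget -c "$url" || return 1          # -c resumes a partial download
        tar xfz "${url##*/}" || return 1    # ${url##*/} is the archive file name
    done
}
```

Running librispeech_urls alone only prints the URLs, which is a cheap way to check them before calling fetch_librispeech. The training subsets are large, so the full download takes a while.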

Training

mkdir experiments
luajit ~/wav2letter/train.lua --train -rundir ~/experiments -runname hello_librispeech -arch ~/wav2letter/arch/librispeech-glu-highdropout -lr 0.1 -lrcrit 0.0005 -gpu 1 -linseg 1 -linlr 0 -linlrcrit 0.005 -onorm target -nthread 6 -dictdir ~/librispeech-proc  -datadir ~/librispeech-proc -train train-clean-100+train-clean-360+train-other-500 -valid dev-clean+dev-other -test test-clean+test-other -gpu 1 -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05

Training on multiple GPUs

Using OpenMPI:

mpirun -n 2 --bind-to none  ~/TorchMPI/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 ...

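Running the decoder (inference)

Running the decoder requires a small amount of preprocessing.

First, create a letter dictionary that includes the special repetition letters used in wav2letter: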

cat ~/librispeech-proc/letters.lst >> ~/librispeech-proc/letters-rep.lst && echo "1" >> ~/librispeech-proc/letters-rep.lst && echo "2" >> ~/librispeech-proc/letters-rep.lst
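Because the one-liner above appends with >>, running it twice would duplicate entries in letters-rep.lst. A minimal sketch of a re-runnable variant follows. The function name make_letters_rep is hypothetical, and the default of 2 repetition labels mirrors the -replabel 2 and -r 2 options used elsewhere in this guide.

```shell
# Hypothetical helper: write OUT as a copy of IN followed by the repetition
# labels 1..N, one per line. OUT is overwritten, so re-running is safe.
make_letters_rep() {
    in_file="$1"
    out_file="$2"
    n_rep="${3:-2}"
    cp "$in_file" "$out_file" || return 1
    i=1
    while [ "$i" -le "$n_rep" ]; do
        printf '%s\n' "$i" >> "$out_file"
        i=$((i + 1))
    done
}
```

For the setup in this guide, the call would be make_letters_rep ~/librispeech-proc/letters.lst ~/librispeech-proc/letters-rep.lst 2.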

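Next, obtain a language model and preprocess it. Here we use the pre-trained LibriSpeech language model, though you can also train your own with KenLM. During preprocessing, the script may warn about words that are transcribed incorrectly. This is not a serious problem, because those words are rare.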

wget http://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz

luajit ~/wav2letter/data/utils/convert-arpa.lua ~/3-gram.pruned.3e-7.arpa.gz ~/3-gram.pruned.3e-7.arpa ~/dict.lst -preprocess ~/wav2letter/data/librispeech/preprocess.lua -r 2 -letters letters-rep.lst

Optional: use KenLM to convert the model to binary format, which loads faster.

build_binary 3-gram.pruned.3e-7.arpa 3-gram.pruned.3e-7.bin
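Since the binary conversion is optional, a script that calls the decoder may want to use the .bin file when it exists and fall back to the .arpa file otherwise. The sketch below shows one way to do that. The function name pick_lm is hypothetical, and it assumes the decoder's -lm option accepts the binary file, which the text implies but does not show.

```shell
# Hypothetical helper: print BASE.bin when it exists, otherwise BASE.arpa.
# Assumption: the decoder's -lm option accepts either format.
pick_lm() {
    base="$1"
    if [ -f "$base.bin" ]; then
        printf '%s\n' "$base.bin"
    else
        printf '%s\n' "$base.arpa"
    fi
}
```

A decoder call could then pass -lm "$(pick_lm ~/3-gram.pruned.3e-7)".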

Now run test.lua to generate the emissions. The script also displays the letter error rate (LER) and word error rate (WER).

luajit ~/wav2letter/test.lua ~/experiments/hello_librispeech/001_model_dev-clean.bin -progress -show -test dev-clean -save

Once the emissions are saved, run the decoder to compute the WER:

luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show -letters ~/librispeech-proc/letters-rep.lst  -words ~/dict.lst -lm ~/3-gram.pruned.3e-7.arpa -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max -show

Pre-trained models:

We provide a fully trained model for LibriSpeech:

wget https://s3.amazonaws.com/wav2letter/models/librispeech-glu-highdropout.bin

Note: this model was trained within Facebook's own infrastructure, so test.lua needs to be run with slightly different parameters:

luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai

Join the wav2letter community

Facebook: https://www.facebook.com/groups/717232008481207/

Google group: https://groups.google.com/forum/#!forum/wav2letter-users

via: https://github.com/facebookresearch/wav2letter
