
ROS Tutorial 3: Robot Speech Recognition, Understanding, Synthesis, and Control (ASR, NLU, TTS)


github

I. Overall speech-processing framework

    1. Speech recognition (ASR, Automatic Speech Recognition)
    2. Natural language understanding (NLU, Natural Language Understanding)
    3. Speech synthesis (TTS, Text To Speech)

1. Speech recognition (ASR), supported packages:
    International: CMU Sphinx  ——>  pocketsphinx
    Chinese: iFLYTEK, etc.

2. Natural language understanding (NLU): Turing Robot (Tuling)

3. Speech synthesis (TTS):
    International: Festival    ——>  sound_play, part of ros-indigo-audio-common (see the sketch below)
    Chinese: iFLYTEK, etc.
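To get a quick feel for the TTS side, sound_play exposes a simple Python client, SoundClient.
A minimal sketch, assuming the sound_play server is already running
(rosrun sound_play soundplay_node.py); the file name tts_demo.py is made up:

#!/usr/bin/env python
# tts_demo.py -- minimal sound_play TTS sketch
import rospy
from sound_play.libsoundplay import SoundClient

if __name__ == '__main__':
    rospy.init_node('tts_demo')
    voice = SoundClient()
    rospy.sleep(1)                  # give the publisher time to connect to the sound server
    voice.say("Hello, I am ready")  # synthesized with the default Festival voice
    rospy.sleep(2)                  # keep the node alive while the audio plays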

II. International libraries

1. Speech recognition: pocketsphinx

 1) Installation
  sudo apt-get install gstreamer0.10-pocketsphinx   # native GStreamer plugin
  sudo apt-get install ros-indigo-pocketsphinx      # ROS interface
  sudo apt-get install ros-indigo-audio-common      # includes sound_play (TTS)
  sudo apt-get install libasound2                   # audio driver
  sudo apt-get install gstreamer0.10-gconf          # GStreamer component
  
 2) Testing
   The pocketsphinx package ships a node, recognizer.py, that reads the audio input stream
   from the microphone, matches what is said against the loaded vocabulary, and publishes
   the recognized words on the /recognizer/output topic.
   A vocabulary prepared for the RoboCup test can be launched with:
   roslaunch pocketsphinx robocup.launch
   Speak into the microphone to test it.
   Display the topic messages:
     rostopic echo /recognizer/output
   View the vocabulary:
   roscd pocketsphinx/demo
   more robocup.corpus
   Only phrases that are in the vocabulary will be recognized reliably.
  # robocup.launch
 <launch>
  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">   <!-- recognizer node -->
    <param name="lm" value="$(find pocketsphinx)/demo/robocup.lm"/>                  <!-- language model, generated from the corpus with the online tool -->
    <param name="dict" value="$(find pocketsphinx)/demo/robocup.dic"/>               <!-- pronunciation dictionary -->
  </node>
 </launch>
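Any node can consume the recognized text simply by subscribing to /recognizer/output.
A minimal sketch (the file name voice_listener.py is made up):

#!/usr/bin/env python
# voice_listener.py -- print every phrase published by the recognizer
import rospy
from std_msgs.msg import String

def callback(msg):
    # msg.data holds the lower-cased text published by recognizer.py
    rospy.loginfo("heard: %s", msg.data)

if __name__ == '__main__':
    rospy.init_node('voice_listener')
    rospy.Subscriber('/recognizer/output', String, callback)
    rospy.spin()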
# recognizer.py (the original node)
import roslib; roslib.load_manifest('pocketsphinx')
import rospy

import pygtk
pygtk.require('2.0')
import gtk

import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

from std_msgs.msg import String
from std_srvs.srv import *
import os
import commands

class recognizer(object):
    """ GStreamer based speech recognizer. """

    def __init__(self):
        # Start node
        rospy.init_node("recognizer")

        self._device_name_param = "~mic_name"  # Find the name of your microphone by typing pacmd list-sources in the terminal
        self._lm_param = "~lm"
        self._dic_param = "~dict"

        # Configure mics with gstreamer launch config
        if rospy.has_param(self._device_name_param):
            self.device_name = rospy.get_param(self._device_name_param)
            self.device_index = self.pulse_index_from_name(self.device_name)
            self.launch_config = "pulsesrc device=" + str(self.device_index)
            rospy.loginfo("Using: pulsesrc device=%s name=%s", self.device_index, self.device_name)
        elif rospy.has_param('~source'):
            # common sources: 'alsasrc'
            self.launch_config = rospy.get_param('~source')
        else:
            self.launch_config = 'gconfaudiosrc'

        rospy.loginfo("Launch config: %s", self.launch_config)

        self.launch_config += " ! audioconvert ! audioresample " \
                            + '! vader name=vad auto-threshold=true ' \
                            + '! pocketsphinx name=asr ! fakesink'

        # Configure ROS settings
        self.started = False
        rospy.on_shutdown(self.shutdown)
        self.pub = rospy.Publisher('~output', String)
        rospy.Service("~start", Empty, self.start)
        rospy.Service("~stop", Empty, self.stop)

        if rospy.has_param(self._lm_param) and rospy.has_param(self._dic_param):
            self.start_recognizer()
        else:
            rospy.logwarn("lm and dic parameters need to be set to start recognizer.")

    def start_recognizer(self):
        rospy.loginfo("Starting recognizer... ")

        self.pipeline = gst.parse_launch(self.launch_config)
        self.asr = self.pipeline.get_by_name('asr')
        self.asr.connect('partial_result', self.asr_partial_result)
        self.asr.connect('result', self.asr_result)
        self.asr.set_property('configured', True)
        self.asr.set_property('dsratio', 1)

        # Configure language model
        if rospy.has_param(self._lm_param):
            lm = rospy.get_param(self._lm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a language model file.')
            return

        if rospy.has_param(self._dic_param):
            dic = rospy.get_param(self._dic_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a dictionary.')
            return

        self.asr.set_property('lm', lm)
        self.asr.set_property('dict', dic)

        self.bus = self.pipeline.get_bus()
        self.bus.add_signal_watch()
        self.bus_id = self.bus.connect('message::application', self.application_message)
        self.pipeline.set_state(gst.STATE_PLAYING)
        self.started = True

    def pulse_index_from_name(self, name):
        output = commands.getstatusoutput("pacmd list-sources | grep -B 1 'name: <" + name + ">' | grep -o -P '(?<=index: )[0-9]*'")

        if len(output) == 2:
            return output[1]
        else:
            raise Exception("Error. pulse index doesn't exist for name: " + name)

    def stop_recognizer(self):
        if self.started:
            self.pipeline.set_state(gst.STATE_NULL)
            self.pipeline.remove(self.asr)
            self.bus.disconnect(self.bus_id)
            self.started = False

    def shutdown(self):
        """ Delete any remaining parameters so they don't affect next launch """
        for param in [self._device_name_param, self._lm_param, self._dic_param]:
            if rospy.has_param(param):
                rospy.delete_param(param)

        """ Shutdown the GTK thread. """
        gtk.main_quit()

    def start(self, req):
        self.start_recognizer()
        rospy.loginfo("recognizer started")
        return EmptyResponse()

    def stop(self, req):
        self.stop_recognizer()
        rospy.loginfo("recognizer stopped")
        return EmptyResponse()

    def asr_partial_result(self, asr, text, uttid):
        """ Forward partial result signals on the bus to the main thread. """
        struct = gst.Structure('partial_result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def asr_result(self, asr, text, uttid):
        """ Forward result signals on the bus to the main thread. """
        struct = gst.Structure('result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def application_message(self, bus, msg):
        """ Receive application messages from the bus. """
        msgtype = msg.structure.get_name()
        if msgtype == 'partial_result':
            self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
        if msgtype == 'result':
            self.final_result(msg.structure['hyp'], msg.structure['uttid'])

    def partial_result(self, hyp, uttid):
        """ Delete any previous selection, insert text and select it. """
        rospy.logdebug("Partial: " + hyp)

    def final_result(self, hyp, uttid):
        """ Insert the final result. """
        msg = String()
        msg.data = str(hyp.lower())
        rospy.loginfo(msg.data)
        self.pub.publish(msg)

if __name__ == "__main__":
    start = recognizer()
    gtk.main()
#!/usr/bin/python
# -*- coding:utf-8 -*-
### Modified recognizer.py: adds an ~hmm (acoustic model) parameter so that, e.g., a Chinese model can be loaded
import roslib
roslib.load_manifest('pocketsphinx')
import rospy
import pygtk   # PyGTK: GUI toolkit bindings for Python
pygtk.require('2.0')
import gtk     # GTK: the GIMP Toolkit

import gobject # GObject: the GLib object system (object-oriented C library)
import pygst   # PyGST: Python bindings for GStreamer
pygst.require('0.10')
gobject.threads_init() # initialise GObject threading
import gst

from std_msgs.msg import String
from std_srvs.srv import *
import os
import commands

class recognizer(object):
    """GStreamer是一個多媒體框架,它可以允許你輕易地建立、編輯與播放多媒體檔案"""
    # 初始化系統配置
    def __init__(self):
        # 建立節點
        rospy.init_node("recognizer")
        # 全域性引數
        self._device_name_param = "~mic_name"  # 麥克風
        self._lm_param = "~lm"                 # 語言模型 language model  
        self._dic_param = "~dict"              # 語言字典
        self._hmm_param = "~hmm"               # 識別網路  hiden markov model 隱馬爾可夫模型 分中英文模型
        
        
        # 用 gstreamer launch config 配置 麥克風  一些啟動資訊
        if rospy.has_param(self._device_name_param):# 按照指定的麥克風
            self.device_name = rospy.get_param(self._device_name_param)# 麥克風名字
            self.device_index = self.pulse_index_from_name(self.device_name)# 麥克風編號 ID
            self.launch_config = "pulsesrc device=" + str(self.device_index)# 啟動資訊
            rospy.loginfo("Using: pulsesrc device=%s name=%s", self.device_index, self.device_name)
        elif rospy.has_param('~source'):
            # common sources: 'alsasrc'
            self.launch_config = rospy.get_param('~source')
        else:
            self.launch_config = 'gconfaudiosrc'

        rospy.loginfo("麥克風配置: %s", self.launch_config) # "Launch config: %s",self.launch_config

        self.launch_config += " ! audioconvert ! audioresample " \
                            + '! vader name=vad auto-threshold=true ' \
                            + '! pocketsphinx name=asr ! fakesink'

        # Configure ROS settings
        self.started = False
        rospy.on_shutdown(self.shutdown)              # shutdown hook
        self.pub = rospy.Publisher('~output', String) # publish recognized text on the ~output topic (note: no queue_size, old rospy API)

        rospy.Service("~start", Empty, self.start)   # start service
        rospy.Service("~stop", Empty, self.stop)     # stop service
        # Check that a language model and dictionary are configured
        if rospy.has_param(self._lm_param) and rospy.has_param(self._dic_param):
            self.start_recognizer()
        else:
            rospy.logwarn("The lm and dict parameters need to be set to start the recognizer.")

    def start_recognizer(self):
        rospy.loginfo("Starting recognizer... ")
        
        self.pipeline = gst.parse_launch(self.launch_config)        # build the GStreamer pipeline from the launch config
        self.asr = self.pipeline.get_by_name('asr')                 # the pocketsphinx ASR element
        self.asr.connect('partial_result', self.asr_partial_result) # callbacks defined below
        self.asr.connect('result', self.asr_result)
        #self.asr.set_property('configured', True)  # left disabled so that the hmm model can still be set below
        self.asr.set_property('dsratio', 1)

        # Configure the language model
        if rospy.has_param(self._lm_param):
            lm = rospy.get_param(self._lm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a language model (lm).')
            return

        if rospy.has_param(self._dic_param):
            dic = rospy.get_param(self._dic_param)
        else:
            rospy.logerr('Recognizer not started. Please specify a dictionary (dict).')
            return
        
        if rospy.has_param(self._hmm_param):
            hmm = rospy.get_param(self._hmm_param)
        else:
            rospy.logerr('Recognizer not started. Please specify an acoustic model (hmm).')
            return


        self.asr.set_property('lm', lm)    # set the ASR language model
        self.asr.set_property('dict', dic) # set the ASR pronunciation dictionary
        self.asr.set_property('hmm', hmm)  # set the ASR acoustic model
        

        self.bus = self.pipeline.get_bus()
        self.bus.add_signal_watch()
        self.bus_id = self.bus.connect('message::application', self.application_message)
        self.pipeline.set_state(gst.STATE_PLAYING)
        self.started = True

    # Resolve a PulseAudio source name to its index
    def pulse_index_from_name(self, name):
        output = commands.getstatusoutput("pacmd list-sources | grep -B 1 'name: <" + name + ">' | grep -o -P '(?<=index: )[0-9]*'")

        if len(output) == 2:
            return output[1]
        else:
            raise Exception("Error. pulse index doesn't exist for name: " + name)
        
    # Stop the recognizer
    def stop_recognizer(self):
        if self.started:
            self.pipeline.set_state(gst.STATE_NULL)
            self.pipeline.remove(self.asr)
            self.bus.disconnect(self.bus_id)
            self.started = False
    # Node shutdown
    def shutdown(self):
        """ Delete any remaining parameters so they don't affect the next launch. """
        for param in [self._device_name_param, self._lm_param, self._dic_param]:
            if rospy.has_param(param):
                rospy.delete_param(param)

        """ 關閉 GTK 程序. """
        gtk.main_quit()
    # start service handler
    def start(self, req):
        self.start_recognizer()
        rospy.loginfo("recognizer started")
        return EmptyResponse()
    # stop service handler
    def stop(self, req):
        self.stop_recognizer()
        rospy.loginfo("recognizer stopped")
        return EmptyResponse()
    
    def asr_partial_result(self, asr, text, uttid):
        """前線部分結果到主執行緒. """
        struct = gst.Structure('partial_result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def asr_result(self, asr, text, uttid):
        """ 前線結果到主執行緒 """
        struct = gst.Structure('result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))

    def application_message(self, bus, msg):
        """ 從總線上接收應用資料. """
        msgtype = msg.structure.get_name()
        if msgtype == 'partial_result':
            self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
        if msgtype == 'result':
            self.final_result(msg.structure['hyp'], msg.structure['uttid'])
    # partial result
    def partial_result(self, hyp, uttid):
        """ Delete any previous selection, insert text and select it. """
        rospy.logdebug("Partial: " + hyp)
        
    # final result
    def final_result(self, hyp, uttid):
        """ Insert the final result. """
        msg = String()       # topic message type
        msg.data = str(hyp)  # text corresponding to the recognized speech
        rospy.loginfo(msg.data)
        self.pub.publish(msg)

if __name__ == "__main__":
    start = recognizer()
    gtk.main()
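The node also exposes ~start and ~stop services (std_srvs/Empty), so recognition can be
paused and resumed from another node. A minimal sketch (the file name recognizer_toggle.py
is made up; the service names follow from the node name "recognizer"):

#!/usr/bin/env python
# recognizer_toggle.py -- pause the recognizer for five seconds, then resume it
import rospy
from std_srvs.srv import Empty

if __name__ == '__main__':
    rospy.init_node('recognizer_toggle')
    rospy.wait_for_service('/recognizer/stop')
    stop = rospy.ServiceProxy('/recognizer/stop', Empty)    # calls stop_recognizer() in the node
    start = rospy.ServiceProxy('/recognizer/start', Empty)  # rebuilds the GStreamer pipeline
    stop()
    rospy.sleep(5)
    start()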
 3) Creating a new vocabulary
  a) Create a corpus file: a plain txt file with one word or sentence per line.
   For example:
    roscd rbx1_speech/config
    more nav_commands.txt
pause speech
continue speech
move forward
move backward
move back
move left
move right
...
## Chinese example: voice_ctr.txt
## (the words below mean: forward, backward, turn left, turn right, turn to the left, turn to the right, stop, speed up, slow down)
前進
後退
左轉
右轉
向左轉
向右轉
停止
加速
減速
   b) Compile the vocabulary into a language model
    Use the online CMU lmtool to generate the language model (lm):
    http://www.speech.cs.cmu.edu/tools/lmtool-new.html
    Upload the corpus file ("Upload a sentence corpus file: Browse"),
    compile it online ("COMPILE KNOWLEDGE BASE"),
    and download the generated files.

The tool produces a .dic pronunciation dictionary (word -> phonemes)
and a .lm language model file (word-sequence probabilities).

Note: for a Chinese corpus the generated .dic file is empty,
so the dictionary has to be written by hand.

Use an existing, comprehensive .dic file to look up the phonemes of the words you defined.
For example:
cd ewenwan/catkin_ws/src/voice_system/model/lm/zh/zh_CN/
grep 停止 mandarin_notone.dic
>>>
停止	t ing zh ib
停止聽寫	t ing zh ib t ing x ie
停止錄音	t ing zh ib l u y in
停止注水	t ing zh ib zh u sh ui
呼吸停止	h u x i t ing zh ib
自動停止	z if d ong t ing zh ib
# Write the pronunciation dictionary (phonemes looked up in mandarin_notone.dic)
# voice_ctr.dic
前進	q ian j in
後退	h ou t ui
左轉	z uo zh uan
右轉	y uo zh uan
向左轉	x iang z uo zh uan
向右轉	x iang y uo zh uan
停止	t ing zh ib
加速	j ia s u
減速	j ian s u
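Before writing the launch file in the next step, it helps to see how the recognized Chinese
commands would drive a robot. A minimal sketch that maps a few of the voice_ctr.txt words to
velocity commands (the /cmd_vel topic and the file name voice_cmd_vel.py are assumptions):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# voice_cmd_vel.py -- turn recognized Chinese commands into velocity commands
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Twist

class VoiceCmdVel(object):
    def __init__(self):
        rospy.init_node('voice_cmd_vel')
        self.pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/recognizer/output', String, self.callback)
        self.twist = Twist()

    def callback(self, msg):
        cmd = msg.data
        if cmd == '前進':                 # forward
            self.twist.linear.x = 0.2
        elif cmd == '後退':               # backward
            self.twist.linear.x = -0.2
        elif cmd in ('左轉', '向左轉'):   # turn left
            self.twist.angular.z = 0.5
        elif cmd in ('右轉', '向右轉'):   # turn right
            self.twist.angular.z = -0.5
        elif cmd == '停止':               # stop
            self.twist = Twist()
        self.pub.publish(self.twist)

if __name__ == '__main__':
    VoiceCmdVel()
    rospy.spin()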
   c) Write your own launch file

voice_nav_commands.launch

<launch>
 <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">  <!-- recognizer node -->
  <param name="lm" value="$(find rbx1_