[Chatbot Series] Chatbots: From the Basics to Applications
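I. Introduction
On Wikipedia, a bot is a computer program or script, together with the account it logs in under, used mainly to help editors carry out large volumes of automated, high-speed, or mechanical and tedious editing work.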
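II. Details
1. The simplest kind is a rule-based chatbot.
It simply looks up the user's sentence in a hand-written corpus of question-and-answer sentences. This is elementary-school level: a fixed question gets a fixed answer.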
import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Questions of the form "How are you?"
question = ['How are you?', 'How are you doing?']
# Replies meaning "I'm fine"
responses = ['Okay', "I'm fine"]

# Start the bot
while True:
    userInput = input(">>> ")
    if userInput in greetings:
        # Reply with a randomly chosen greeting
        print(random.choice(greetings))
    elif userInput in question:
        # Reply with a randomly chosen answer
        print(random.choice(responses))
    # Keep going until the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")
Output:
>>> hi
hey
>>> how are u
I did not understand what you said
>>> how are you
I did not understand what you said
>>> how are you?
I did not understand what you said
>>> How are you?
I'm fine
>>> bye
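The transcript above shows why exact matching is brittle: "how are you" and "how are you?" are rejected only because of case and punctuation. A minimal sketch of one way to soften this is to normalize the input before looking it up. The helper names `normalize` and `classify` are hypothetical and not part of the original bot.

```python
import string

# Stored patterns are kept in normalized form (lowercase, no punctuation).
GREETINGS = {"hola", "hello", "hi", "hey"}
QUESTIONS = {"how are you", "how are you doing"}


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def classify(text):
    """Return 'greeting', 'question', or 'unknown' for one user input."""
    cleaned = normalize(text)
    if cleaned in GREETINGS:
        return "greeting"
    if cleaned in QUESTIONS:
        return "question"
    return "unknown"


for sample in ["Hi", "how are you", "How are you?", "how are u"]:
    print(sample, "->", classify(sample))
```

Normalization only removes superficial differences; a paraphrase such as "how are u" still falls through to the fallback reply.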
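2. Upgrade I:
Clearly such rules are too crude. We need somewhat better "exact answers", for example by using keywords to work out the intent behind a sentence (intents).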
# word_tokenize needs NLTK's Punkt tokenizer data, installed via nltk.download()
from nltk import word_tokenize
import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Keywords for the "holiday" topic
question = ['break', 'holiday', 'vacation', 'weekend']
# Replies for the holiday topic
responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]

# Start the bot
while True:
    userInput = input(">>> ")
    # Tokenize the input to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare the words against the keywords to decide which intent this is
    if not set(cleaned_input).isdisjoint(greetings):
        # Reply with a randomly chosen greeting
        print(random.choice(greetings))
    elif not set(cleaned_input).isdisjoint(question):
        # Reply with a randomly chosen holiday answer
        print(random.choice(responses))
    # Keep going until the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")
>>> hi
hey
>>> how was your holiday?
It was nice! I went to Paris
>>> wow, amazing!
I did not understand what you said
>>> bye
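As you may have noticed, this is still exact matching at the level of surface text. The mainstream research direction today is matching at the semantic level. For example, "I'm so hungry" and "it's mealtime" should both mean that it is time to eat. At that level, embedding methods such as word vectors are needed; later lessons will cover them.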
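To make the idea of semantic matching concrete, here is a toy sketch that compares sentences by cosine similarity of averaged word vectors. The three-dimensional vectors are invented purely for illustration; a real system would load trained embeddings. The names `TOY_VECTORS`, `INTENTS`, and `guess_intent` are hypothetical.

```python
import math

# Invented toy vectors; the dimensions loosely mean (food, time, travel).
TOY_VECTORS = {
    "hungry": [0.9, 0.1, 0.0],
    "lunchtime": [0.7, 0.6, 0.0],
    "trip": [0.1, 0.1, 0.9],
}

# One prototype vector per intent, also invented for this sketch.
INTENTS = {"eat": [1.0, 0.0, 0.0], "holiday": [0.0, 0.0, 1.0]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(sentence):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def guess_intent(sentence):
    """Return the intent whose prototype is closest to the sentence, or None."""
    vec = sentence_vector(sentence)
    if vec is None:
        return None
    return max(INTENTS, key=lambda name: cosine(vec, INTENTS[name]))


print(guess_intent("so hungry"))        # maps to the "eat" intent
print(guess_intent("it is lunchtime"))  # different words, same intent
```

The point of the sketch is that two sentences sharing no keyword can still land on the same intent, because the comparison happens in vector space rather than on the strings themselves.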
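3. Upgrade II:
Being able to chatter is not enough; the bot needs a body of knowledge before it can solve users' problems. We can build such a system on various databases and then look up answers by searching it. The simplest example is a "map" built as a graph, represented here with a plain Python dict that maps each city to the cities directly reachable from it. With this map we can find a route from one place to another and return it to the user as the answer.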
# Build a database for the target domain.
# Here we use a plain Python dict as an adjacency list (a directed graph).
graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}

# Find a path from A to B with a depth-first search
def find_path(start, end, path=None):
    # Avoid a mutable default argument; start with an empty path
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip nodes already on the path to avoid cycles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None

print(find_path('上海', "鎮江"))
['上海', '蘇州', '常州', '鎮江']
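The depth-first search above returns the first route it finds, which is not necessarily the one with the fewest stops. As a sketch of an alternative, a breadth-first search over the same map returns a route with the fewest hops. The function name `shortest_path` is hypothetical, and the graph is repeated so the block runs on its own.

```python
from collections import deque

# Same directed map as in the listing above.
graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}


def shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(shortest_path(graph, '上海', '鎮江'))  # ['上海', '蘇州', '鎮江']
```

Because the edges are directed, some queries have no answer at all: no city in this map leads back to 上海, so `shortest_path(graph, '鎮江', '上海')` returns `None`, and the bot should turn that into a polite "no route found" reply.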
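The same knowledge-graph approach also works with logic programming, for example Prolog, which students of AI in the last century all learned, or PyKE, a Prolog-inspired knowledge engine for Python. These let you build a complex network of logical relations from which information can be derived conveniently, without hand-coding every piece of it: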
son_of(bruce, thomas, norma)
son_of(fred_a, thomas, norma)
son_of(tim, thomas, norma)
daughter_of(vicki, thomas, norma)
daughter_of(jill, thomas, norma)
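To show what "deriving information" from such facts can look like, here is a plain-Python sketch. It does not use PyKE's API; it stores the same five facts as tuples and assumes each fact reads as (child, father, mother). The helpers `children_of` and `siblings_of` are hypothetical.

```python
# The five facts from the listing above, stored as
# (relation, child, father, mother). The argument order is an assumption.
FACTS = [
    ("son_of", "bruce", "thomas", "norma"),
    ("son_of", "fred_a", "thomas", "norma"),
    ("son_of", "tim", "thomas", "norma"),
    ("daughter_of", "vicki", "thomas", "norma"),
    ("daughter_of", "jill", "thomas", "norma"),
]


def children_of(parent):
    """All children recorded for the given father or mother."""
    return [child for _, child, father, mother in FACTS
            if parent in (father, mother)]


def siblings_of(person):
    """Derived relation: people sharing both parents with `person`."""
    parents = {(father, mother) for _, child, father, mother in FACTS
               if child == person}
    return [child for _, child, father, mother in FACTS
            if (father, mother) in parents and child != person]


print(children_of("norma"))
print(siblings_of("vicki"))
```

The sibling relation is never stored; it is computed from the parent facts. A logic-programming engine generalizes this idea: you state rules once and let the engine chain them instead of writing one helper per question.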
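4. Upgrade III:
Every industry has a front end and a back end, and AI is no exception. The algorithms discussed here all run on the back end. To be usable in practice, many projects also need a simple, reliable front end. As an example, we use Google's API to write a voice assistant in the style of Jarvis, the assistant of Iron Man's Tony Stark. Let's start with the simplest version, one that only speaks: it uses gTTS (Google Text-to-Speech API) to convert text to audio.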
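from gtts import gTTS
import os

# The text means: "Hello, I am your personal assistant; my name is Pepper."
# 'zh-tw' is kept from the original; newer gTTS releases may expect 'zh-TW'.
tts = gTTS(text='您好,我是您的私人助手,我叫小辣椒', lang='zh-tw')
tts.save("hello.mp3")
# Play the file; assumes the mpg321 command-line player is installed
os.system("mpg321 hello.mp3")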
Likewise, now that we have text-to-speech, we can use the Google API to read Jarvis's replies aloud:
(Note: this requires several libraries on your machine: SpeechRecognition, PyAudio, and PySpeech.)
import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS

# Speak the AI's reply aloud
def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    os.system("mpg321 audio.mp3")

# Record what you say
def recordAudio():
    # Capture your speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    # Convert the audio to text with the Google API
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return data

# Built-in conversation skills (rules)
def jarvis():
    while True:
        data = recordAudio()
        if "how are you" in data:
            speak("I am fine")
        if "what time is it" in data:
            speak(ctime())
        if "where is" in data:
            # Keep data as a string so the "bye" check below still works
            words = data.split(" ")
            location = words[2]
            speak("Hold on Tony, I will show you where " + location + " is.")
            # Opens Safari on macOS
            os.system("open -a Safari https://www.google.com/maps/place/" + location + "/&")
        if "bye" in data:
            speak("bye bye")
            break

# Initialize
time.sleep(2)
speak("Hi Tony, what can I do for you?")
# Run
jarvis()
Hi Tony, what can I do for you?
You said: how are you
I am fine
You said: what time is it now
Fri Apr 7 18:16:54 2017
You said: where is London
Hold on Tony, I will show you where London is.
You said: ok bye bye
bye bye
Voice is not the only front end. Other channels such as WeChat, Slack, and Facebook Messenger can all have our chatbot integrated into them.
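One way to make such integrations easier is to keep the reply logic separate from the channel that delivers it. The sketch below is only an illustration under that assumption: `reply`, `handle`, and `ConsoleChannel` are hypothetical names, and no real messaging API is called. A channel here is any object with a `send(text)` method.

```python
from time import ctime


def reply(text):
    """Map one user message to one bot reply, independent of any front end."""
    message = text.lower()
    if "how are you" in message:
        return "I am fine"
    if "what time is it" in message:
        return ctime()
    if "bye" in message:
        return "bye bye"
    return "I did not understand what you said"


class ConsoleChannel:
    """Hypothetical front end that prints replies and remembers them."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)
        print(text)


def handle(channel, incoming):
    """Glue code: compute the reply and deliver it through the channel."""
    channel.send(reply(incoming))


console = ConsoleChannel()
handle(console, "How are you")
handle(console, "ok bye bye")
```

With this split, a voice front end would pass recognized speech to `reply` and speak the result, while a messaging integration would wrap its own send call in a class shaped like `ConsoleChannel`.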