[Chatbot Series] Chatbots: From the Basics to Applications

阿新 • Published: 2018-07-08


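I. Preface

On Wikipedia, a bot is a computer program or script, together with the account it logs in under, that mainly helps editors carry out large volumes of automated, high-speed, or mechanical and tedious editing work.

II. Details

1. The simplest kind is a rule-based chatbot.

It answers from a predesigned corpus of question-and-answer sentences. It works at a grade-schooler's level: ask it exactly what it knows and it replies, nothing more.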

import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Greeting to reply with (chosen once, at startup)
random_greeting = random.choice(greetings)

# Ways of asking "How are you?"
question = ['How are you?', 'How are you doing?']
# Possible "I'm fine" replies
responses = ['Okay', "I'm fine"]
# Reply to use (chosen once, at startup)
random_response = random.choice(responses)

# Run the bot
while True:
    userInput = input(">>> ")
    if userInput in greetings:
        print(random_greeting)
    elif userInput in question:
        print(random_response)
    # Keep going until you say "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

Result:

>>> hi
hey
>>> how are u
I did not understand what you said
>>> how are you
I did not understand what you said
>>> how are you?
I did not understand what you said
>>> How are you?
I'm fine
>>> bye
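
The transcript above shows the weakness of exact string matching: "how are you" and "how are you?" both fail because only "How are you?" is in the list. Below is a minimal sketch of one way to soften this, assuming it is acceptable to lowercase the input and strip punctuation before the lookup. The helper names `normalize` and `respond` are illustrative and not part of the original code.

```python
import random
import string

GREETINGS = {"hola", "hello", "hi", "hey"}
QUESTIONS = {"how are you", "how are you doing"}
RESPONSES = ["Okay", "I'm fine"]


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def respond(user_input):
    """Return a reply for one line of input, or None when the user says bye."""
    cleaned = normalize(user_input)
    if cleaned in GREETINGS:
        return random.choice(sorted(GREETINGS))
    if cleaned in QUESTIONS:
        return random.choice(RESPONSES)
    if cleaned == "bye":
        return None
    return "I did not understand what you said"


for line in ["hi", "how are you", "How are you?", "how are u"]:
    print(repr(line), "->", respond(line))
```

Misspellings such as "how are u" are still rejected; normalization only removes differences in case and punctuation.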

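2. Upgrade I:

Rules like these are clearly too crude. We need somewhat better "precise matching", for example using keywords to determine the intent of a sentence (its intents).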

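from nltk import word_tokenize
import random

# word_tokenize needs NLTK's tokenizer data, a one-time download,
# e.g. nltk.download('punkt'); newer NLTK releases may ask for 'punkt_tab'.

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']
# Greeting to reply with (chosen once, at startup)
random_greeting = random.choice(greetings)

# Keywords for the "holiday" topic
question = ['break', 'holiday', 'vacation', 'weekend']
# Replies for the holiday topic
responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]
# Reply to use (chosen once, at startup)
random_response = random.choice(responses)

# Run the bot
while True:
    userInput = input(">>> ")
    # Split the input into tokens to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare the tokens with the keywords to decide which topic this is
    if not set(cleaned_input).isdisjoint(greetings):
        print(random_greeting)
    elif not set(cleaned_input).isdisjoint(question):
        print(random_response)
    # Keep going until you say "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")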

>>> hi
hey
>>> how was your holiday?
It was nice! I went to Paris
>>> wow, amazing!
I did not understand what you said
>>> bye

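As you may have noticed, this is still "exact matching" at the text level. The mainstream research direction today is matching at the semantic level. For example, "I'm so hungry" and "it's mealtime" should both mean that it is time to eat. At this level we need embedding methods such as word vectors, which will be covered in a later lesson.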
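
To make the idea of semantic matching a little more concrete, here is a toy sketch of comparing sentences by the cosine similarity of their vectors. The three-dimensional vectors are hand-picked for illustration and are purely hypothetical; real embeddings come from a trained model and have far more dimensions. The names `TOY_VECTORS`, `cosine_similarity`, and `closest` are assumptions made for this example.

```python
import math

# Hypothetical, hand-picked "embeddings" for illustration only.
TOY_VECTORS = {
    "I'm so hungry": [0.9, 0.1, 0.0],
    "it's mealtime": [0.8, 0.3, 0.1],
    "how was your holiday": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest(sentence, candidates):
    """Return the candidate whose vector is most similar to the sentence's."""
    query = TOY_VECTORS[sentence]
    return max(candidates,
               key=lambda c: cosine_similarity(query, TOY_VECTORS[c]))


print(closest("I'm so hungry", ["it's mealtime", "how was your holiday"]))
```

The two sentences share no keywords, yet their vectors point in nearly the same direction, which is the property a keyword rule cannot capture.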

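3. Upgrade II:

Being able to chatter is not enough; the bot needs a body of knowledge before it can solve users' problems. We can build such a system on top of various databases and then look up answers by searching. The simplest example is a "map" built as a graph from Python's own data structures (a dict of lists). With this map we can find a route from one place to another and return it to the user as the answer.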

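# Build a database for the target domain.
# Here the graph is a plain Python dict: each city maps to the cities reachable from it.
graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}

# Define how to find a path from A to B (depth-first search)
def find_path(start, end, path=None):
    # Avoid a mutable default argument; always work on a fresh list
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip cities already on the current path to avoid cycles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None

print(find_path('上海', "鎮江"))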

['上海', '蘇州', '常州', '鎮江']
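
Note that the depth-first search above returns the first route it finds, which is not necessarily the one with the fewest stops. If the chatbot should answer with the shortest route, one option is breadth-first search. The following is a sketch under that assumption; `find_shortest_path` is an illustrative name, and the graph is repeated so the example runs on its own.

```python
from collections import deque

graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}


def find_shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_shortest_path(graph, '上海', '鎮江'))
# ['上海', '蘇州', '鎮江']
```

Because the queue is processed level by level, the first time the destination is taken off the queue its path is guaranteed to use the minimum number of hops.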

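The same approach to building a knowledge graph also works with logic programming, for example Prolog, which AI students of the last century all learned, or PyKE, a Python take on Prolog. These tools let you build a complex network of logical relations from which information can be extracted conveniently, without hand-coding every piece of it: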

son_of(bruce, thomas, norma)
son_of(fred_a, thomas, norma)
son_of(tim, thomas, norma)
daughter_of(vicki, thomas, norma)
daughter_of(jill, thomas, norma)
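
To show what "extracting information without hand-coding it" can look like, here is a small sketch that stores the facts above as plain Python tuples and derives new relations from them. This is ordinary Python, not PyKE's or Prolog's actual API; the tuple layout and the function names `children_of` and `siblings_of` are assumptions made for illustration.

```python
# Each fact is (relation, child, father, mother), mirroring the listing above.
FACTS = [
    ("son_of", "bruce", "thomas", "norma"),
    ("son_of", "fred_a", "thomas", "norma"),
    ("son_of", "tim", "thomas", "norma"),
    ("daughter_of", "vicki", "thomas", "norma"),
    ("daughter_of", "jill", "thomas", "norma"),
]


def children_of(parent):
    """Everyone who lists `parent` as father or mother."""
    return [child for _, child, father, mother in FACTS
            if parent in (father, mother)]


def siblings_of(person):
    """Derived rule: siblings share the same father and mother."""
    parents = {(f, m) for _, c, f, m in FACTS if c == person}
    return [c for _, c, f, m in FACTS
            if (f, m) in parents and c != person]


print(children_of("norma"))
print(siblings_of("vicki"))
```

No sibling fact was ever written down; it follows from the parent facts, which is the kind of inference a logic engine automates at larger scale.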

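4. Upgrade III:

Every industry has a front end and a back end, and AI is no exception. The algorithms discussed so far all run on the back end. To be usable, many projects also need a front end that is simple, easy to use, and reliable. As an example, here we use Google's APIs to write a voice assistant in the style of Jarvis, the assistant of Iron Man's Tony Stark. Let's start with the simplest version, one that only speaks: it uses gTTS (Google Text-to-Speech API) to turn text into audio.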

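from gtts import gTTS
import os

# The text says: "Hello, I am your personal assistant. My name is Little Pepper."
tts = gTTS(text='您好，我是您的私人助手，我叫小辣椒', lang='zh-tw')
tts.save("hello.mp3")
# Play the file with the external mpg321 command-line player (must be installed)
os.system("mpg321 hello.mp3")

Likewise, with text-to-speech in place, we can add Google's speech recognition API so that Jarvis hears what you say and reads its replies aloud:

(Note: this requires a few libraries on your machine: SpeechRecognition, PyAudio, and PySpeech.)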

import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS

# Speak the AI's reply out loud
def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    os.system("mpg321 audio.mp3")

# Record what you say
def recordAudio():
    # Capture your speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)

    # Convert the audio to text with the Google API
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    return data

# Built-in conversation skills (rules)
def jarvis():
    while True:
        data = recordAudio()

        if "how are you" in data:
            speak("I am fine")

        if "what time is it" in data:
            speak(ctime())

        if "where is" in data:
            # Keep only the third word, so this assumes "where is <place>"
            words = data.split(" ")
            location = words[2]
            speak("Hold on Tony, I will show you where " + location + " is.")
            # macOS only: open the map in Safari; the trailing & runs it in the background
            os.system("open -a Safari https://www.google.com/maps/place/" + location + "/&")

        if "bye" in data:
            speak("bye bye")
            break

# Initialize
time.sleep(2)
speak("Hi Tony, what can I do for you?")

# Run it
jarvis()

Hi Tony, what can I do for you?
You said: how are you
I am fine
You said: what time is it now
Fri Apr  7 18:16:54 2017
You said: where is London
Hold on Tony, I will show you where London is.
You said: ok bye bye
bye bye
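
One limitation of the "where is" rule is that it keeps only the third word of the utterance, so "where is New York" would look up "New". Below is a sketch of a more forgiving parser, assuming we take everything after "where is" as the place name and URL-encode it before building the link. `extract_location` and `maps_url` are hypothetical helper names, not part of the original program.

```python
import re
from urllib.parse import quote


def extract_location(utterance):
    """Return the text after 'where is', or None if the phrase is absent."""
    match = re.search(r"where is\s+(.+)", utterance, re.IGNORECASE)
    if match is None:
        return None
    location = match.group(1).strip(" ?.!")
    return location or None


def maps_url(location):
    """Build a Google Maps URL with the place name percent-encoded."""
    return "https://www.google.com/maps/place/" + quote(location)


place = extract_location("where is New York")
print(place)
print(maps_url(place))
```

Passing the resulting URL to the standard library's `webbrowser.open` instead of building a shell command would also avoid quoting problems with spaces and special characters.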

Voice is not the only front end. Application platforms such as WeChat, Slack, and Facebook Messenger can all integrate our chatbot as well.
