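Installing and Using LTP, the HIT Natural Language Processing Toolkit, on Windows 10

ltp is a natural language processing toolkit from Harbin Institute of Technology (HIT), and pyltp is the Python wrapper around ltp (which is written in C++).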

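Installing pyltp on Linux is easy because the build tools are readily available. On Windows you need to install Visual Studio and do some extra configuration. The people I support all work on Windows and need to use ltp there, which is why I wrote these notes. I settled on two options, plus a third that I only reference:

  • Install ltp in bash on Win10, start the ltp server, and call ltp from Python on Windows over HTTP.
  • Install a prebuilt wheel (currently only for Python 3.6/3.5, amd64). I recommend this option.
  • At the bottom of this post I also reference a third method: use the official prebuilt exe files and call them directly from a command line such as cmd.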

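Option 1: Install under bash

Basic environment

  • Windows 10
  • Bash for Windows
  • Python 3.6

Install Bash on Ubuntu on Windows

Search online for setup instructions; the installation is simple.

Install the build environment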

sudo apt install cmake
sudo apt install g++

The installation takes somewhat over ten minutes.
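Before moving on, it can help to confirm that the tools actually ended up on the PATH. The following is a minimal sketch using only the Python standard library. The helper name missing_tools is my own, and make is included on the assumption that the later build step needs it.

```python
import shutil


def missing_tools(tools=("cmake", "g++", "make")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All build tools were found on PATH.")
```

Run it with the Python interpreter inside bash, since that is where the build will happen.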

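Download the LTP source code

  • Download the source code from the project's GitHub repository.
  • Extract it to a location you will remember.

Build

cd into the source directory. Mine, for example: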

cd /mnt/d/bash-sites/ltp-3.4.0

Run the build commands:

./configure
make

The build takes somewhat over ten minutes. Afterwards my directory contains a new bin folder:

drwxrwxrwx 0 root root 512 Jan 31 15:42 ./
drwxrwxrwx 0 root root 512 Jan 31 15:30 ../
-rwxrwxrwx 1 root root 800 Jan 31 15:30 appveyor.yml*
-rwxrwxrwx 1 root root 0 Jan 31 15:30 AUTHORS*
drwxrwxrwx 0 root root 512 Jan 31 15:53 bin/
drwxrwxrwx 0 root root 512 Jan 31 15:42 build/
-rwxrwxrwx 1 root root 29301 Jan 31 15:30 ChangeLog.md*
drwxrwxrwx 0 root root 512 Jan 31 15:30 cmake/
-rwxrwxrwx 1 root root 1439 Jan 31 15:30 CMakeLists.txt*
drwxrwxrwx 0 root root 512 Jan 31 15:30 conf/
-rwxrwxrwx 1 root root 131 Jan 31 15:30 configure*
-rwxrwxrwx 1 root root 902 Jan 31 15:30 COPYING*
drwxrwxrwx 0 root root 512 Jan 31 15:30 doc/
-rwxrwxrwx 1 root root 79976 Jan 31 15:30 Doxyfile*
drwxrwxrwx 0 root root 512 Jan 31 15:30 examples/
-rwxrwxrwx 1 root root 1028 Jan 31 15:30 .gitignore*
drwxrwxrwx 0 root root 512 Jan 31 15:42 include/
-rwxrwxrwx 1 root root 85 Jan 31 15:30 INSTALL*
drwxrwxrwx 0 root root 512 Jan 31 15:53 lib/
-rwxrwxrwx 1 root root 965 Jan 31 15:30 Makefile*
-rwxrwxrwx 1 root root 6639 Jan 31 15:30 NEWS.md*
-rwxrwxrwx 1 root root 4750 Jan 31 15:30 README.md*
drwxrwxrwx 0 root root 512 Jan 31 15:30 src/
-rwxrwxrwx 1 root root 3048 Jan 31 15:30 subproject.d.json*
drwxrwxrwx 0 root root 512 Jan 31 15:31 thirdparty/
drwxrwxrwx 0 root root 512 Jan 31 15:31 tools/
-rwxrwxrwx 1 root root 1372 Jan 31 15:30 .travis.yml*

Configure the server

The first time I started the server, I ran into this error:

[INFO] 2018-01-31 15:54:39 Loading segmentor model from "ltp_data/cws.model" ...
[ERROR] 2018-01-31 15:54:39 /mnt/d/bash-sites/ltp-3.4.0/src/ltp/LTPResource.cpp: line 50: LoadSegmentorResource(): Failed to load segmentor model
[ERROR] 2018-01-31 15:54:39 /mnt/d/bash-sites/ltp-3.4.0/src/ltp/Ltp.cpp: line 78: load(): in LTP::wordseg,failed to load segmentor resource
[ERROR] 2018-01-31 15:54:39 /mnt/d/bash-sites/ltp-3.4.0/src/server/ltp_server.cpp: line 172: main(): Failed to setup LTP engine.

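The error is caused by missing model files, so download the latest model files.

Extract them into /mnt/d/bash-sites/ltp-3.4.0/ltp_data/, which is where ltp looks for its model data by default.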
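To catch a missing model before launching the server, a small check like the one below may be useful. It is only a sketch: the five file names are the ones the server reports loading in its startup log, the directory is the example path from this post, and the helper name missing_models is my own.

```python
from pathlib import Path

# File names as they appear in the server's startup log.
EXPECTED_MODELS = ("cws.model", "pos.model", "ner.model", "parser.model", "pisrl.model")


def missing_models(ltp_data_dir, expected=EXPECTED_MODELS):
    """Return the expected model files that are absent from ltp_data_dir."""
    base = Path(ltp_data_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    # Example path from this post; adjust it to your own checkout.
    missing = missing_models("/mnt/d/bash-sites/ltp-3.4.0/ltp_data")
    print("Missing model files:", ", ".join(missing) if missing else "none")
```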

After that, the server starts without problems.

syd@DESKTOP-J02R2VJ:/mnt/d/bash-sites/ltp-3.4.0$ ./bin/ltp_server --port 9090
[INFO] 2018-01-31 15:56:36 Loading segmentor model from "ltp_data/cws.model" ...
[INFO] 2018-01-31 15:56:36 segmentor model is loaded.
[INFO] 2018-01-31 15:56:36 Loading postagger model from "ltp_data/pos.model" ...
[INFO] 2018-01-31 15:56:36 postagger model is loaded
[INFO] 2018-01-31 15:56:36 Loading NER resource from "ltp_data/ner.model"
[INFO] 2018-01-31 15:56:36 NER resource is loaded.
[INFO] 2018-01-31 15:56:36 Loading parser resource from "ltp_data/parser.model"
[INFO] 2018-01-31 15:56:37 parser is loaded.
[INFO] 2018-01-31 15:56:37 Loading srl resource from "ltp_data/pisrl.model"
[dynet] random seed: 493907432
[dynet] allocating memory: 2000MB
[dynet] memory allocation done.
[INFO] 2018-01-31 15:56:39 srl resource is loaded.
[INFO] 2018-01-31 15:56:39 Resources loading finished.
[INFO] 2018-01-31 15:56:39 Start listening on port [9090]...
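Loading the models takes a few seconds, so a client script started right after the server may try to connect too early. One way to handle this, sketched here with only the standard library, is to poll the port until it accepts a TCP connection. The function name and the default timeout values are my own choices; 9090 is the port used in the command above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9090, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Note that this only confirms that something is listening on the port, not that it is the ltp server.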

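Test

Send a quick request to see the result:

import json

import requests

uri_base = "http://127.0.0.1:9090/ltp"
# s: input text, x: whether the input is XML ('n' = no), t: analysis task
data = {'s': '我認為他叫湯姆去拿外衣和鞋子。', 'x': 'n', 't': 'srl'}
# Send the parameters as a form-encoded POST body
response = requests.post(uri_base, data=data)
rdata = response.json()
print(json.dumps(rdata, indent=4, ensure_ascii=False))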

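[
 [
  [
   {
    "arg": [],"cont": "我","id": 0,"ne": "O","parent": 1,"pos": "r","relate": "SBV"
   },{
    "arg": [
     {
      "beg": 0,"end": 0,"id": 0,"type": "A0"
     },{
      "beg": 2,"end": 9,"id": 1,"type": "A1"
     }
    ],"cont": "認為","id": 1,"ne": "O","parent": -1,"pos": "v","relate": "HED"
   },{
    "arg": [],"cont": "他","id": 2,"ne": "O","parent": 3,"pos": "r","relate": "SBV"
   },{
    "arg": [
     {
      "beg": 2,"end": 2,"id": 0,"type": "A0"
     },{
      "beg": 4,"end": 4,"id": 1,"type": "A1"
     },{
      "beg": 5,"end": 9,"id": 2,"type": "A2"
     }
    ],"cont": "叫","id": 3,"ne": "O","parent": 1,"pos": "v","relate": "VOB"
   },{
    "arg": [],"cont": "湯姆","id": 4,"ne": "S-Nh","parent": 3,"pos": "nh","relate": "DBL"
   },{
    "arg": [],"cont": "去","id": 5,"ne": "O","parent": 6,"pos": "v","relate": "ADV"
   },{
    "arg": [
     {
      "beg": 7,"end": 9,"id": 0,"type": "A1"
     }
    ],"cont": "拿","id": 6,"ne": "O","parent": 3,"pos": "v","relate": "VOB"
   },{
    "arg": [],"cont": "外衣","id": 7,"ne": "O","parent": 6,"pos": "n","relate": "VOB"
   },{
    "arg": [],"cont": "和","id": 8,"ne": "O","parent": 9,"pos": "c","relate": "LAD"
   },{
    "arg": [],"cont": "鞋子","id": 9,"ne": "O","parent": 7,"pos": "n","relate": "COO"
   },{
    "arg": [],"cont": "。","id": 10,"ne": "O","parent": 1,"pos": "wp","relate": "WP"
   }
  ]
 ]
]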
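The response is nested three levels deep (paragraphs, then sentences, then tokens), and each token carries its text in cont, its part of speech in pos, and its dependency head in parent and relate. A rough sketch of how one might walk that structure follows. The helper names are my own, and the sample data is hand-trimmed to two tokens with their arg lists emptied.

```python
def iter_tokens(ltp_json):
    """Yield every token dict from the paragraph -> sentence -> token nesting."""
    for paragraph in ltp_json:
        for sentence in paragraph:
            for token in sentence:
                yield token


def dependency_arcs(sentence):
    """Return (word, relation, head word) triples for one sentence.

    `parent` holds the id of the head token; -1 marks the root.
    """
    by_id = {token["id"]: token for token in sentence}
    arcs = []
    for token in sentence:
        parent = token["parent"]
        head = "ROOT" if parent == -1 else by_id[parent]["cont"]
        arcs.append((token["cont"], token["relate"], head))
    return arcs


if __name__ == "__main__":
    # Hand-trimmed sample, not a full server response.
    sample = [[[
        {"arg": [], "cont": "我", "id": 0, "ne": "O", "parent": 1, "pos": "r", "relate": "SBV"},
        {"arg": [], "cont": "認為", "id": 1, "ne": "O", "parent": -1, "pos": "v", "relate": "HED"},
    ]]]
    print([(t["cont"], t["pos"]) for t in iter_tokens(sample)])
    print(dependency_arcs(sample[0][0]))
```

With a real response, pass rdata to iter_tokens, or rdata[0][0] to dependency_arcs for the first sentence.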

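Option 2: Install the wheel

Download the wheels

Download whichever of the two files below matches your Python version. I built them on my own machine (Win10), so I cannot guarantee they work on your system, although any 64-bit Windows should be fine. Leave a comment below if you run into problems.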

  • pyltp-0.2.1-cp35-cp35m-win_amd64.whl
  • pyltp-0.2.1-cp36-cp36m-win_amd64.whl

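Note: the only difference between the two files is the Python version.

Install the file

Once the download finishes, open a command line, cd to the directory containing the wheel file, and run pip install followed by the wheel file name.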
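If you are unsure which file matches your interpreter, the short sketch below derives the file name from the running Python. It simply follows the naming pattern of the two files listed above; the helper names are my own, and only the 3.5 and 3.6 builds are stated to exist, so any other result means there is no matching wheel here.

```python
import struct
import sys


def is_64bit_python():
    """True when the running interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


def wheel_name_for(version_info=None, pyltp_version="0.2.1"):
    """Build the pyltp wheel file name for a CPython major.minor version."""
    major, minor = (version_info or sys.version_info)[:2]
    tag = "cp{}{}".format(major, minor)
    return "pyltp-{}-{}-{}m-win_amd64.whl".format(pyltp_version, tag, tag)


if __name__ == "__main__":
    if not is_64bit_python():
        print("These wheels are amd64 builds; a 64-bit Python is required.")
    print("pip install", wheel_name_for())
```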

Test

After installation, open a Python shell and try it out.

from pyltp import SentenceSplitter
sents = SentenceSplitter.split('元芳你怎麼看?我就趴視窗上看唄!')  # split into sentences
print('\n'.join(sents))

Download the models data

  • Models download link: https://pan.baidu.com/s/1o9vytmU password: 5ntf
  • Put the files anywhere convenient, because your program has to load them explicitly.
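Loading a model explicitly might look like the sketch below, which uses pyltp's Segmentor for word segmentation. The assumptions here are that the downloaded archive contains a file named cws.model (the name the ltp server uses for its segmentation model) and that you pass in the directory where you unpacked it. The helper names are my own.

```python
import os


def model_path(models_dir, name="cws.model"):
    """Join the models directory with one model file name."""
    return os.path.join(models_dir, name)


def segment(text, models_dir):
    """Segment `text` into a list of words with pyltp's Segmentor."""
    # Imported lazily so that model_path() can be used without pyltp installed.
    from pyltp import Segmentor

    segmentor = Segmentor()
    segmentor.load(model_path(models_dir))  # load the segmentation model from disk
    try:
        return list(segmentor.segment(text))
    finally:
        segmentor.release()  # free the loaded model
```

For example, segment('元芳你怎麼看', models_dir) with models_dir set to your own unpack location should return the sentence as a list of words.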

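Option 3: Call the prebuilt ltp executables directly

This approach is described in a separate article. However, it failed in my tests with version 3.4 (loading the srl resource failed), while it worked with version 3.3.1.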
