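Starting a Python NLP Journey on Ubuntu 14.04 (Part 1)
This system was installed last year and runs Ubuntu 14.04. I had already set up a Python environment on Windows with various packages installed, including nltk, numpy, and matplotlib. I now want to move my working environment entirely to Ubuntu, so I started wrestling with the setup. I am a beginner, so please point out anything that is wrong so I can correct it.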
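1. Installing Python 3.5.2
I downloaded the Python 3.5.2 package from the official site, extracted it, and installed it under /usr/local/Python/Python-3.5.2/. I then deleted the existing python symlink in /usr/bin and created a new python symlink there pointing to the freshly installed Python 3.5.2. A quick test showed that Python 3.5.2 was installed. Next I wanted to install nltk, which first requires pip. Installing pip produced a lot of errors, and in the end even the system itself started to misbehave. Only later did I realize that the Python 2.7 that ships with Ubuntu must not be removed, because system tools depend on it. So I pointed the python symlink in /usr/bin back to /usr/bin/python2.7 and decided to take a different route.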
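Before changing anything under /usr/bin, it can help to see what the python and python3 commands currently resolve to. The following is a minimal sketch and not part of the original steps; the helper name resolve_command is made up for illustration, and it assumes Python 3.3 or later (for shutil.which).

```python
import os
import shutil


def resolve_command(name):
    """Return the real file behind a command on PATH, or None if it is not found."""
    path = shutil.which(name)
    if path is None:
        return None
    # realpath follows symlinks, e.g. /usr/bin/python -> /usr/bin/python2.7
    return os.path.realpath(path)


if __name__ == "__main__":
    for command in ("python", "python2.7", "python3"):
        print(command, "->", resolve_command(command))
```

If python does not resolve to the system's python2.7, that is a hint that the symlink has been changed.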
2. Installing setuptools
unzip setuptools-32.1.2.zip
cd setuptools-32.1.2
python3 setup.py build
sudo python3 setup.py install
3. Installing pip
tar -zxvf pip-9.0.1.tar.gz
cd pip-9.0.1
python3 setup.py build
sudo python3 setup.py install
4. Installing nltk
sudo pip install -U nltk
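Test the installation:
Run python3 to enter the Python 3.4 interpreter (the python3 that ships with Ubuntu 14.04), then type import nltk. If no error is reported, nltk is installed.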
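The same check can be scripted. This is a small sketch rather than an official nltk tool; check_import is a hypothetical helper name, and the fallback to "unknown" is there because not every module defines __version__.

```python
import importlib


def check_import(name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = check_import("nltk")
    if version is None:
        print("nltk is not importable from this interpreter")
    else:
        print("nltk", version)
```

Run it with python3 so that it checks the same interpreter the package was installed for.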
5. Installing packages
Install numpy
sudo pip install -U numpy
Install scipy
sudo pip install -U scipy
Install matplotlib
sudo pip install -U matplotlib
Note that sudo is needed here to get the required permissions; without it, the installation fails with a permission error.
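The permission error arises because the system-wide package directories are normally writable only by root. The sketch below lists those directories and whether the current user can write to them. It assumes a system interpreter where site.getsitepackages() is available, and writable_site_dirs is a hypothetical helper name.

```python
import os
import site


def writable_site_dirs():
    """Map each global site-packages directory to whether the current user can write to it."""
    # os.access also returns False for directories that do not exist.
    return {path: os.access(path, os.W_OK) for path in site.getsitepackages()}


if __name__ == "__main__":
    for path, writable in writable_site_dirs().items():
        print(path, "writable" if writable else "not writable")
```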
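Remark: downloading a package such as matplotlib through pip is sometimes painfully slow. In that case, download the package file first and then install it locally with pip.
First install wheel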
sudo pip install wheel
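matplotlib: https://pypi.python.org/pypi/matplotlib/
scipy: https://pypi.python.org/pypi/scipy/
When downloading, be sure to pick the package that matches your Python version. For example, I use Python 3.4, so I download matplotlib-2.0.0rc2-cp34-cp34m-manylinux1_x86_64.whl; note the "cp34".
You can then install the .whl file with the command pip install XXX.whl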
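To work out which tag to look for, the interpreter can report it. This is a rough sketch that assumes CPython and the standard wheel file name layout (name-version-pythontag-abitag-platform.whl); cpython_tag and wheel_matches are illustrative names, and the check only looks at the Python tag, not the ABI or platform.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Build the CPython tag, e.g. (3, 4) -> 'cp34'."""
    return "cp{}{}".format(version_info[0], version_info[1])


def wheel_matches(filename, tag):
    """Check whether a wheel file name carries the given Python tag."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        return False
    # The last three fields are python tag, ABI tag, and platform tag.
    python_tag = parts[-3]
    # A wheel may list several Python tags joined by dots, e.g. 'py2.py3'.
    return tag in python_tag.split(".")


if __name__ == "__main__":
    wheel = "matplotlib-2.0.0rc2-cp34-cp34m-manylinux1_x86_64.whl"
    print(cpython_tag(), wheel_matches(wheel, cpython_tag()))
```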
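6. nltk_data
Downloading and installing the data the officially recommended way is far too slow. Instead, download it from Baidu Cloud at https://pan.baidu.com/s/1hq7UUFU and extract it. But which directory should it go in?
Start python3 and enter from nltk.book import *; an error message like the following appears: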
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 63, in __load
try: root = nltk.data.find('corpora/%s' % zip_name)
File "/usr/local/lib/python3.4/dist-packages/nltk/data.py", line 641, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'corpora/gutenberg.zip/gutenberg/' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download()
Searched in:
- '/home/×××yourName/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/nltk/book.py", line 20, in <module>
text1 = Text(gutenberg.words('melville-moby_dick.txt'))
File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 99, in __getattr__
self.__load()
File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 64, in __load
except LookupError: raise e
File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 61, in __load
root = nltk.data.find('corpora/%s' % self.__name)
File "/usr/local/lib/python3.4/dist-packages/nltk/data.py", line 641, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'corpora/gutenberg' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/home/***yourName/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
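Extract the downloaded zip file and place the data in any one of the five directories listed above, so that a resource such as corpora/gutenberg ends up under that nltk_data directory.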
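To confirm that the data landed in the right place, the search can be imitated with the standard library alone. This is only a sketch: SEARCH_DIRS copies the five directories from the error message above, find_resource is a hypothetical helper, and nltk's own lookup does more than this. The list nltk actually uses is available as nltk.data.path.

```python
import os

# The five locations from the error message; the first one is the user's home.
SEARCH_DIRS = [
    os.path.expanduser("~/nltk_data"),
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_resource(resource, search_dirs=SEARCH_DIRS):
    """Return the first directory that contains the resource, or None."""
    for base in search_dirs:
        plain = os.path.join(base, *resource.split("/"))
        # Accept either an extracted directory or a still-zipped corpus.
        if os.path.exists(plain) or os.path.exists(plain + ".zip"):
            return base
    return None


if __name__ == "__main__":
    print(find_resource("corpora/gutenberg"))
```

If this prints None, the corpus is in none of the five locations, and from nltk.book import * will keep failing with the LookupError shown above.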
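Problems encountered:
On Windows, you can copy a package's download link into Xunlei (Thunder) to download it faster. My Ubuntu installation has no wine, so I use uget and aria2 instead, together with the flashgot add-on in Firefox, which makes package downloads somewhat faster.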