Starting a Python NLP Journey on Ubuntu 14.04 (Part 1)

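I installed this system last year, and it runs Ubuntu 14.04. On Windows I had already set up a Python environment with various packages, including nltk, numpy and matplotlib. Now I plan to move my work environment entirely to Ubuntu, so I started setting everything up again. I'm a beginner, so if anything here is wrong, please point it out so I can correct it.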

1. Install Python 3.5.2

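I downloaded the Python 3.5.2 package from the official website, extracted it, and installed it under /usr/local/Python/Python-3.5.2/. I also deleted the original python symlink in /usr/bin and created a new python symlink in /usr/bin/ pointing to the freshly installed Python 3.5.2. A quick test confirmed that Python 3.5.2 was installed. Next I set out to install nltk, which first requires pip. Installing pip, however, produced error after error, and eventually the system itself started to misbehave. After running into far too many problems, I finally realized that Ubuntu's bundled Python 2.7 must not be removed, since parts of the system depend on it. So I pointed the python symlink in /usr/bin back to /usr/bin/python2.7 and decided to take a different route, using the python3 command already provided by the system (Python 3.4 on Ubuntu 14.04) for the steps below.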
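Because /usr/bin/python is a symlink that Ubuntu itself relies on, it is worth checking where each Python command actually resolves before changing anything. The sketch below is a minimal standard-library example; `resolve_command` is a hypothetical helper name, and it only reads the filesystem, it never modifies links.

```python
import os
import shutil
import sys


def resolve_command(name):
    """Return (path found on PATH, final target after following symlinks), or None if not found."""
    found = shutil.which(name)  # search PATH the way the shell would
    if found is None:
        return None
    return found, os.path.realpath(found)  # follow every symlink in the chain


if __name__ == "__main__":
    for cmd in ("python", "python2", "python3"):
        print(cmd, "->", resolve_command(cmd))
    # The interpreter running this script, for comparison
    print("current:", sys.executable, sys.version.split()[0])
```

On a system where python still points at Python 2.7, the first line should show a target such as /usr/bin/python2.7, which is the link the system expects to stay in place.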

2. Install setuptools

unzip setuptools-32.1.2.zip
cd setuptools-32.1.2
python3 setup.py build
sudo python3 setup.py install
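setuptools-32.1.2 comes as a .zip while pip-9.0.1 comes as a .tar.gz, so on the command line they need different tools (`unzip` for the former, `tar -zxvf` for the latter). If you'd rather not keep track of that, Python's standard `shutil.unpack_archive` infers the format from the file extension. A minimal sketch; `extract_source` is a hypothetical helper, and the archive names are the ones used in this post.

```python
import os
import shutil


def extract_source(archive, dest="."):
    """Extract a .zip / .tar.gz / .tar archive into dest; return the newly created top-level entries."""
    before = set(os.listdir(dest))
    shutil.unpack_archive(archive, dest)  # archive format inferred from the extension
    return sorted(set(os.listdir(dest)) - before)


if __name__ == "__main__":
    for name in ("setuptools-32.1.2.zip", "pip-9.0.1.tar.gz"):
        if os.path.exists(name):
            print(name, "->", extract_source(name))
```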

3. Install pip

tar -zxvf pip-9.0.1.tar.gz
cd pip-9.0.1
python3 setup.py build
sudo python3 setup.py install
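Since Ubuntu 14.04 keeps Python 2.7 and Python 3.4 side by side, it helps to confirm which interpreter a freshly installed pip belongs to. `pip --version` prints a line of the form `pip 9.0.1 from /usr/local/lib/python3.4/dist-packages (python 3.4)` (the exact line here is illustrative). The sketch below parses it; `parse_pip_version` is a hypothetical helper. Running `python3 -m pip` instead of a bare `pip` sidesteps the ambiguity altogether.

```python
import re
import subprocess
import sys

# Matches e.g. "pip 9.0.1 from /usr/local/lib/python3.4/dist-packages (python 3.4)"
PIP_VERSION_RE = re.compile(r"^pip (\S+) from (.+) \(python (\d+\.\d+)\)$")


def parse_pip_version(line):
    """Return (pip_version, install_location, python_version), or None if the line doesn't match."""
    m = PIP_VERSION_RE.match(line.strip())
    return m.groups() if m else None


if __name__ == "__main__":
    # Ask the pip module of *this* interpreter, like `python3 -m pip --version`
    try:
        out = subprocess.check_output([sys.executable, "-m", "pip", "--version"],
                                      universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        out = ""  # pip is not installed for this interpreter
    print(parse_pip_version(out))
```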

4. Install NLTK

sudo pip install -U nltk
Test the installation:
Run python3 to enter the Python 3.4 interpreter, then type import nltk. If no error is reported, NLTK is installed.
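Instead of typing `import nltk` by hand, a short script can report which of the packages in this post the current python3 can import, and at what version. A minimal standard-library sketch; `report` is a hypothetical helper, and it reads the `__version__` attribute, which NLTK, NumPy, SciPy and matplotlib all define.

```python
import importlib
import importlib.util


def report(names):
    """Map each module name to its __version__, 'installed' if it has none, or None if not importable."""
    result = {}
    for name in names:
        if importlib.util.find_spec(name) is None:  # not on sys.path for this interpreter
            result[name] = None
            continue
        module = importlib.import_module(name)
        result[name] = getattr(module, "__version__", "installed")
    return result


if __name__ == "__main__":
    for name, version in report(["nltk", "numpy", "scipy", "matplotlib"]).items():
        print("{:<12} {}".format(name, version or "NOT INSTALLED"))
```

Run it with python3 after each install step; a package that pip installed for Python 2.7 instead will show up as NOT INSTALLED here.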

5. Install other packages

Install NumPy
sudo pip install -U numpy
Install SciPy
sudo pip install -U scipy
Install matplotlib
sudo pip install -U matplotlib

Note that these installs must be run with sudo to get root privileges; otherwise pip fails with a permission error.
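The permission error arises because pip writes into the system-wide package directory (e.g. /usr/local/lib/python3.4/dist-packages on this system), which only root can modify. The sketch below lists the system and per-user install locations for the running interpreter and whether the current user can write to them. The per-user directory is where `pip install --user` installs packages, an alternative to sudo that the original setup did not use. `install_targets` is a hypothetical helper.

```python
import os
import site
import sysconfig


def _writable(path):
    """True/False if the directory exists, None if it does not exist yet."""
    return os.access(path, os.W_OK) if os.path.isdir(path) else None


def install_targets():
    """Return a list of (label, directory, writable_by_current_user) for this interpreter."""
    if hasattr(site, "getsitepackages"):
        system_dirs = site.getsitepackages()
    else:  # some older virtualenv versions ship a site.py without it
        system_dirs = [sysconfig.get_paths()["purelib"]]
    targets = [("system", d, _writable(d)) for d in system_dirs]
    # Where `pip install --user` puts packages; pip creates it on first use
    user_dir = site.getusersitepackages()
    targets.append(("user", user_dir, _writable(user_dir)))
    return targets


if __name__ == "__main__":
    for label, path, writable in install_targets():
        print("{:<7} {:<50} writable={}".format(label, path, writable))
```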

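Tip: downloading a package such as matplotlib can be painfully slow. In that case, download the package file first and then install it locally with pip.
First install wheel:
sudo pip install wheel
matplotlib: https://pypi.python.org/pypi/matplotlib/
scipy: https://pypi.python.org/pypi/scipy/
When downloading, be sure to pick the file that matches your Python version. For example, I use Python 3.4, so I downloaded matplotlib-2.0.0rc2-cp34-cp34m-manylinux1_x86_64.whl; note the "cp34" in the name.
Then install the .whl file with pip install XXX.whl (prefixed with sudo, as noted above).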
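The "cp34" in the file name is the wheel's Python tag. Under the wheel naming convention (PEP 427), a file name has the form `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so the tag can be checked against the running interpreter before you download. A minimal sketch; `parse_wheel_name` and `matches_cpython` are hypothetical helpers, and they check only the CPython version tag, not the ABI or platform tags (pip performs the full check itself at install time).

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel name: " + filename)
    return {"name": name, "version": version, "build": build,
            "python": py_tag, "abi": abi_tag, "platform": plat_tag}


def matches_cpython(py_tag, version_info=sys.version_info):
    """True if a tag like 'cp34' (or a dotted set like 'cp34.cp35') covers this CPython version."""
    wanted = "cp{}{}".format(version_info[0], version_info[1])
    return wanted in py_tag.split(".")


if __name__ == "__main__":
    info = parse_wheel_name("matplotlib-2.0.0rc2-cp34-cp34m-manylinux1_x86_64.whl")
    print(info)
    print("usable by this interpreter:", matches_cpython(info["python"]))
```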

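6. nltk_data

Downloading the data the officially recommended way (nltk.download()) was far too slow, so I downloaded it from Baidu Cloud at https://pan.baidu.com/s/1hq7UUFU and extracted it. But which directory should it go in?
Run python3 and enter from nltk.book import *; it prints the following error: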

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 63, in __load
    try: root = nltk.data.find('corpora/%s' % zip_name)
  File "/usr/local/lib/python3.4/dist-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'corpora/gutenberg.zip/gutenberg/' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/home/×××yourName/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/nltk/book.py", line 20, in <module>
    text1 = Text(gutenberg.words('melville-moby_dick.txt'))
  File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 99, in __getattr__
    self.__load()
  File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 64, in __load
    except LookupError: raise e
  File "/usr/local/lib/python3.4/dist-packages/nltk/corpus/util.py", line 61, in __load
    root = nltk.data.find('corpora/%s' % self.__name)
  File "/usr/local/lib/python3.4/dist-packages/nltk/data.py", line 641, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'corpora/gutenberg' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/***yourName/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

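Extract the downloaded zip file and put the data in any one of the five directories listed above, so that, for example, corpora/gutenberg sits directly under that directory.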
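The search list in the error message is all you need to check the placement: NLTK looks for corpora/gutenberg (or its .zip) under each listed directory in turn. The sketch below reproduces that lookup with the standard library only, so it works even before NLTK can find its data. The directory list is copied from the error above with `~` in place of the home directory, and `find_resource` is a hypothetical helper. Inside NLTK itself, the real search list is `nltk.data.path`, and a custom location can be added with `nltk.data.path.append(...)` or the `NLTK_DATA` environment variable.

```python
import os

# Search order copied from the LookupError above ("~" stands for the home directory)
SEARCH_DIRS = [
    "~/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def find_resource(resource, search_dirs=SEARCH_DIRS):
    """Return the first folder or .zip matching resource (e.g. 'corpora/gutenberg'), or None."""
    for base in search_dirs:
        candidate = os.path.join(os.path.expanduser(base), *resource.split("/"))
        # As the error shows, NLTK tries both an extracted folder and a .zip of the same name
        for path in (candidate, candidate + ".zip"):
            if os.path.exists(path):
                return path
    return None


if __name__ == "__main__":
    print(find_resource("corpora/gutenberg") or "corpora/gutenberg not found in any search directory")
```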

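Problems I ran into:

On Windows, I could speed up downloads by copying a package's download link into Thunder (Xunlei). My Ubuntu system doesn't have Wine installed, so instead I used uGet and aria2, together with the FlashGot add-on for Firefox, which makes downloading packages somewhat faster.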
