Python Natural Language Processing 01: Setting Up the Environment

阿新 • Published: 2019-01-07

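I am learning from the book Natural Language Processing with Python (《python自然語言處理》). The book uses Python 2.4 or 2.5, but installing Anaconda2 kept failing for me, so I switched to Anaconda3. Anaconda is a handy Python distribution for development work.
First, install nltk. Anaconda already bundles it by default, so you only need to open Anaconda Prompt and enter the following command: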

conda install nltk

Then enter y when prompted and wait a few minutes. If no errors are reported, the installation is complete.
Next, open ipython to test it, and enter:
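As a quick sanity check after the install step, the following sketch reports whether a package can be imported and which version is installed. The helper name `installed_version` is hypothetical (it is not part of nltk or conda), and the sketch assumes Python 3.8 or later for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (for example a stdlib module).
        return "unknown"


if __name__ == "__main__":
    print("nltk:", installed_version("nltk") or "not installed")
```

If this prints "not installed", the conda command above probably ran in a different environment from the one the interpreter belongs to.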

In [1]: import nltk

In [2]: nltk.download()

If the NLTK Downloader window appears, the installation succeeded.

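Next, change the installation path (Download Directory). By default the data is installed on the C: drive; I changed it to a folder under the Anaconda directory, Anaconda/nltk_data.
Click the row marked book to install, in one step, all the data needed for Natural Language Processing with Python. The download is fairly slow, and some packages will show as out of date (check under Corpora). Whatever you do, do not click individual files mid-download to try fetching them one by one; that will freeze the downloader.
During the download you will realize you were too optimistic: the server is overseas, so the download is not only slow but also error-prone. I uploaded the data to Baidu Cloud: http://pan.baidu.com/s/1pKWv5MV password: 655i (a package from the end of 2016, which is quite sufficient). To use it, open the link in IE, right-click, and choose to download all links with Xunlei (Thunder) for a fast batch download. The total is 2.5 GB, of which 2.1 GB is panlex_lite.zip.
First extract all the archives except panlex_lite.zip into the nltk_data folder, then arrange them according to the following directory layout: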
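When the data lives in a non-default folder, NLTK has to be told where to look. The sketch below is one way to do that for the current process: it sets the `NLTK_DATA` environment variable, which NLTK reads at import time, and also extends `nltk.data.path` if nltk is available. The function name `use_nltk_data_dir` and the example path are assumptions for illustration only.

```python
import os
from pathlib import Path


def use_nltk_data_dir(directory):
    """Point NLTK at `directory` for the current process and return the path."""
    data_dir = Path(directory).expanduser().resolve()
    data_dir.mkdir(parents=True, exist_ok=True)
    # NLTK consults the NLTK_DATA environment variable when it is imported.
    os.environ["NLTK_DATA"] = str(data_dir)
    try:
        import nltk
    except ImportError:
        return data_dir  # nltk is not installed; only the variable is set
    # If nltk was already imported, extend its search path directly as well.
    if str(data_dir) not in nltk.data.path:
        nltk.data.path.insert(0, str(data_dir))
    return data_dir
```

A call such as `use_nltk_data_dir("C:/Anaconda3/nltk_data")` (hypothetical path) would go before any code that loads corpora. The same folder can also be passed to the downloader without the GUI, for example `nltk.download("book", download_dir="C:/Anaconda3/nltk_data")`.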

├─chunkers     # this level: folders under nltk_data
│ └─maxent_ne_chunker     # this level: data files inside the corresponding folder
├─corpora      # this level: folders under nltk_data
│ ├─abc        # this level: data files inside the corresponding folder
│ ├─alpino
│ ├─basque_grammars
│ ├─biocreative_ppi
│ ├─book_grammars
│ ├─brown
│ ├─brown_tei
│ ├─cess_cat
│ ├─cess_esp
│ ├─chat80
│ ├─city_database
│ ├─cmudict
│ ├─comtrans
│ ├─conll2000
│ ├─conll2002
│ ├─conll2007
│ ├─dependency_treebank
│ ├─europarl_raw
│ ├─floresta
│ ├─gazetteers
│ ├─genesis
│ ├─gutenberg
│ ├─hmm_treebank_pos_tagger
│ ├─ieer
│ ├─inaugural
│ ├─indian
│ ├─jeita
│ ├─kimmo
│ ├─knbc
│ ├─langid
│ ├─large_grammars
│ ├─machado
│ ├─mac_morpho
│ ├─maxent_ne_chunker
│ ├─maxent_treebank_pos_tagger
│ ├─movie_reviews
│ ├─names
│ ├─nombank.1.0
│ ├─nps_chat
│ ├─oanc_masc
│ ├─paradigms
│ ├─pe08
│ ├─pil
│ ├─pl196x
│ ├─ppattach
│ ├─problem_reports
│ ├─propbank
│ ├─ptb
│ ├─punkt
│ ├─qc
│ ├─reuters
│ ├─rslp
│ ├─rte
│ ├─sample_grammars
│ ├─semcor
│ ├─senseval
│ ├─shakespeare
│ ├─sinica_treebank
│ ├─smultron
│ ├─spanish_grammars
│ ├─state_union
│ ├─stopwords
│ ├─swadesh
│ ├─switchboard
│ ├─tagsets
│ ├─timit
│ ├─toolbox
│ ├─treebank
│ ├─udhr
│ ├─udhr2
│ ├─unicode_samples
│ ├─verbnet
│ ├─webtext
│ ├─wordnet
│ ├─wordnet_ic
│ ├─words
│ └─ycoe
├─grammars
│ ├─basque_grammars
│ ├─book_grammars
│ ├─large_grammars
│ ├─sample_grammars
│ └─spanish_grammars
├─help
│ └─tagsets
├─stemmers
│ └─rslp
├─taggers
│ ├─hmm_treebank_pos_tagger
│ ├─maxent_ne_chunker
│ └─maxent_treebank_pos_tagger
└─tokenizers
  └─punkt
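After extracting the archives by hand, it is easy to end up with a folder one level too deep or in the wrong category. The sketch below checks a handful of entries from the tree above and lists any that are missing. The `EXPECTED` mapping is only a small sample chosen for illustration, and `missing_packages` is a hypothetical helper, not an nltk function.

```python
from pathlib import Path

# A small sample of the packages in the tree above, grouped by top-level folder.
EXPECTED = {
    "corpora": ["brown", "gutenberg", "stopwords", "wordnet"],
    "taggers": ["maxent_treebank_pos_tagger"],
    "tokenizers": ["punkt"],
}


def missing_packages(nltk_data_dir, expected=EXPECTED):
    """Return 'category/name' entries that are not directories under nltk_data_dir."""
    root = Path(nltk_data_dir)
    return [
        f"{category}/{name}"
        for category, names in expected.items()
        for name in names
        if not (root / category / name).is_dir()
    ]
```

Calling `missing_packages("Anaconda/nltk_data")` with your own path should return an empty list once the sampled folders are in place; extend `EXPECTED` to cover more of the tree if needed.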

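After that you can work through the experiments in the book while waiting for panlex_lite.zip to finish downloading, then put it in the corresponding directory.

Test: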

In [1]: from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
