Python 3.x: Natural Language Processing Basics

By 阿新 · Published 2019-02-07

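import nltk
# Download the NLTK data used below. Called with no argument, nltk.download() opens the
# interactive downloader; nltk.download('book') fetches only the collection used by nltk.book.
# If NLTK itself is not installed, install it from the command line first: pip install nltk
nltk.download('book')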

from nltk.book import *

### Searching text
# Search for a word: concordance() prints every occurrence with its surrounding context
text1.concordance("monstrous")
text2.concordance("affection")
text3.concordance("lived")
text5.concordance("lol")
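`concordance()` prints each occurrence of a word together with the words around it. The sketch below reproduces that keyword-in-context idea in plain Python so it runs without the NLTK data. It is not NLTK's implementation: the helper name `kwic`, its `width` parameter (tokens shown on each side) and the case-insensitive match are assumptions made for illustration.

```python
def kwic(tokens, word, width=3):
    """Hypothetical helper: one line per occurrence of `word`,
    with `width` tokens of context on each side (case-insensitive match)."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


# Toy sentence made up for this example
tokens = "the whale was a monstrous size ; a monstrous fish indeed".split()
for line in kwic(tokens, "monstrous"):
    print(line)
# whale was a [monstrous] size ; a
# size ; a [monstrous] fish indeed
```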

# Find words that appear in contexts similar to those of the given word
text1.similar("monstrous")

text2.similar("monstrous")

# Find the contexts shared by two or more words
text2.common_contexts(["monstrous", "very"])
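`similar()` and `common_contexts()` compare words by the contexts they occur in. The following is a simplified pure-Python sketch of that idea, assuming a context is just the (previous word, next word) pair. The helper names `contexts`, `common_contexts` and `similar` are hypothetical, and the ranking is a plain overlap count, so results are not guaranteed to match NLTK's.

```python
from collections import Counter, defaultdict


def contexts(tokens):
    """Map each lowercased word to a Counter of its (previous, next) neighbour pairs."""
    words = [t.lower() for t in tokens]
    ctx = defaultdict(Counter)
    for i, w in enumerate(words):
        prev_w = words[i - 1] if i > 0 else "*START*"
        next_w = words[i + 1] if i + 1 < len(words) else "*END*"
        ctx[w][(prev_w, next_w)] += 1
    return ctx


def common_contexts(tokens, targets):
    """(previous, next) pairs in which every word in `targets` appears."""
    ctx = contexts(tokens)
    shared = set(ctx[targets[0].lower()])
    for t in targets[1:]:
        shared &= set(ctx[t.lower()])
    return sorted(shared)


def similar(tokens, word, n=5):
    """Other words ranked by how many (previous, next) contexts they share with `word`."""
    ctx = contexts(tokens)
    target = set(ctx[word.lower()])
    scores = Counter()
    for other, c in ctx.items():
        if other != word.lower():
            overlap = len(target & set(c))
            if overlap:
                scores[other] = overlap
    return [w for w, _ in scores.most_common(n)]


# Toy sentence made up for this example
tokens = "a very good day and a monstrous good day but a very bad night".split()
print(common_contexts(tokens, ["monstrous", "very"]))  # [('a', 'good')]
print(similar(tokens, "monstrous"))                    # ['very']
```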

# Lexical dispersion plot: where each word occurs across the text (requires matplotlib)
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
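A dispersion plot draws, for each word, a mark at every position where that word occurs. The data behind such a plot is simply a list of token offsets per word. The hypothetical helper `offsets` below computes it in plain Python without drawing anything; matching is case-sensitive in this sketch, and the sentence is made up.

```python
def offsets(tokens, words):
    """Hypothetical helper: {word: [token positions]} for each requested word."""
    positions = {w: [] for w in words}
    for i, tok in enumerate(tokens):
        if tok in positions:  # exact, case-sensitive match
            positions[tok].append(i)
    return positions


tokens = "freedom and duties and freedom for citizens of America".split()
print(offsets(tokens, ["freedom", "duties", "America"]))
# {'freedom': [0, 4], 'duties': [2], 'America': [8]}
```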

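### Counting vocabulary
len(text3)  # number of tokens (words and punctuation)

sorted(set(text3))  # the vocabulary: distinct tokens, sorted

len(set(text3))  # vocabulary size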

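# Lexical richness: average number of times each distinct token is used
# (in Python 3, / is already true division, so no __future__ import is needed)
len(text3) / len(set(text3))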

# Keyword frequency: how often a word occurs, and its share of the text in percent
text3.count("smote")

100 * text4.count('a') / len(text4)

def lexical_diversity(text):
    # average number of times each distinct token is used
    return len(text) / len(set(text))

def percentage(count, total):
    # count as a percentage of total
    return 100 * count / total

lexical_diversity(text3)

lexical_diversity(text5)

percentage(4, 5)

percentage(text4.count('a'), len(text4))
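Because `/` in Python 3 is true division, `lexical_diversity()` and `percentage()` return floats without any `__future__` import. Here is a self-contained worked example on a made-up six-token list; the two functions are repeated so the block runs on its own, and `//` is shown only to contrast floor division.

```python
def lexical_diversity(tokens):
    """Average number of uses per distinct token: total tokens / distinct tokens."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Share of `count` in `total`, in percent."""
    return 100 * count / total


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(len(tokens), len(set(tokens)))    # 6 5
print(lexical_diversity(tokens))        # 1.2  (6 / 5)
print(len(tokens) // len(set(tokens)))  # 1    (// is floor division)
print(percentage(4, 5))                 # 80.0
```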

### Texts as lists of words
sent1 = ['Call', 'me', 'Ishmael', '.']
sent1

len(sent1)

lexical_diversity(sent1)

print(sent2)
print(sent3)

# Concatenation
sent4+sent1
# Append
sent1.append("some")
print(sent1)

# Indexing: the token at a position, and the first position of a token
text4[173]

text4.index('awaken')

# Slicing
print(text5[16715:16735])
print(text6[1600:1625])

# Note: indexes start at 0
sent = ['word1', 'word2', 'word3', 'word4', 'word5','word6', 'word7', 'word8', 'word9', 'word10']
print(sent[0])
print(sent[9])

print(sent[10])  # IndexError: list index out of range (valid indexes are 0 to 9)

print(sent[5:8])
print(sent[5])
print(sent[6])
print(sent[7])

print(sent[:3])
print(text2[141525:])

sent[0] = 'First'
sent[9] = 'Last'
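sent[1:9] = ['Second', 'Third']  # replaces the 8 items at indexes 1-8 with 2 items
print(sent)  # ['First', 'Second', 'Third', 'Last']
sent[9]  # IndexError: after the slice assignment the list has only 4 items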
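Both `sent[10]` and the final `sent[9]` fail for the same reason: valid indexes for a list of length n run from 0 to n - 1, and assigning to a slice can change the list's length. A plain-list sketch of the same steps, using the same placeholder words:

```python
sent = ["word1", "word2", "word3", "word4", "word5",
        "word6", "word7", "word8", "word9", "word10"]

print(sent[0], sent[-1])  # word1 word10  (negative indexes count from the end)
print(sent[5:8])          # ['word6', 'word7', 'word8']  (stop index 8 is excluded)
print(sent[:3])           # ['word1', 'word2', 'word3']

# Slice assignment may change the list's length: 8 items are replaced by 2.
sent[0] = "First"
sent[9] = "Last"
sent[1:9] = ["Second", "Third"]
print(sent, len(sent))    # ['First', 'Second', 'Third', 'Last'] 4

# Index 9 no longer exists; sent[-1] always reaches the last item.
print(sent[-1])           # Last
```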

### Simple statistics
# Frequency distribution
from nltk import FreqDist
fdist1 = FreqDist(text1)
fdist1

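# In NLTK 3, FreqDist.keys() is not ordered by frequency; most_common(50) returns
# the 50 most frequent tokens together with their counts
fdist1.most_common(50)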

fdist1['whale']

fdist1.plot(50, cumulative=True)  # cumulative counts of the 50 most common tokens (requires matplotlib)

fdist1.hapaxes()  # words that occur only once (hapaxes)
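NLTK 3's `FreqDist` is a subclass of Python's `collections.Counter`, so the standard `Counter` API gives a runnable approximation of the calls above without downloading any corpus. The toy token list is made up for this sketch, and the hapax and cumulative computations are written by hand rather than taken from NLTK.

```python
from collections import Counter
from itertools import accumulate

tokens = "the whale and the sea and the ship".split()
fdist = Counter(tokens)  # stands in for FreqDist(tokens)

print(fdist["the"])          # 3
print(fdist.most_common(2))  # [('the', 3), ('and', 2)]

# Hapaxes: words that occur exactly once.
hapaxes = sorted(w for w, c in fdist.items() if c == 1)
print(hapaxes)               # ['sea', 'ship', 'whale']

# Running totals over the tokens ordered from most to least common.
cumulative = list(accumulate(c for _, c in fdist.most_common()))
print(cumulative)            # [3, 5, 6, 7, 8]
```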

# Fine-grained selection of words
V = set(text4)
long_words = [w for w in V if len(w) > 15]
sorted(long_words)

V = set(text5)
long_words = [w for w in V if len(w) > 15]
sorted(long_words)

fdist5 = FreqDist(text5)
sorted([w for w in set(text5) if len(w) > 7 and fdist5[w] > 7])
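The same two filters (word length alone, and word length combined with frequency), applied to a made-up token list with `collections.Counter` standing in for `FreqDist`. The thresholds 10, 7 and 2 are arbitrary choices for this toy data.

```python
from collections import Counter

tokens = ("congratulations everyone congratulations ok ok ok "
          "internationalization congratulations").split()
fdist = Counter(tokens)

# Long words: distinct tokens longer than 10 characters.
long_words = sorted(w for w in set(tokens) if len(w) > 10)
print(long_words)     # ['congratulations', 'internationalization']

# Long and frequent: longer than 7 characters and seen more than twice.
frequent_long = sorted(w for w in set(tokens) if len(w) > 7 and fdist[w] > 2)
print(frequent_long)  # ['congratulations']
```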

# Collocations and bigrams
from nltk.util import bigrams
list(bigrams(['more', 'is', 'said', 'than', 'done']))

text4.collocations()

text8.collocations()
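A bigram list is just each token paired with the one that follows it, which `zip` produces directly. NLTK's `collocations()` scores bigrams with an association measure and filters out stopwords, so counting repeated bigrams, as in the second half of this sketch, is only a rough approximation of it. The helper name `bigrams` here is a local stand-in, and the sentence is made up.

```python
from collections import Counter


def bigrams(tokens):
    """Adjacent token pairs, comparable to list(nltk.util.bigrams(tokens))."""
    return list(zip(tokens, tokens[1:]))


print(bigrams(["more", "is", "said", "than", "done"]))
# [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]

tokens = "united states of america and the united states congress".split()
counts = Counter(bigrams(tokens))
# Bigrams seen more than once, most frequent first (raw counts only).
repeated = [bg for bg, c in counts.most_common() if c > 1]
print(repeated)  # [('united', 'states')]
```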

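### Other statistics
[len(w) for w in text1]  # length of every token

fdist = FreqDist([len(w) for w in text1])  # distribution of word lengths
fdist
fdist.keys()  # the word lengths that occur

fdist.items()  # (length, count) pairs

fdist.max()  # the most common word length

fdist[3]  # number of tokens of length 3

fdist.freq(3)  # proportion of all tokens that have length 3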
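The word-length statistics can be sketched with `collections.Counter` as well: `most_common(1)` plays the role of `FreqDist.max()`, and dividing a count by the total number of tokens plays the role of `FreqDist.freq()`. The seven-token list is a toy example.

```python
from collections import Counter

tokens = ["Call", "me", "Ishmael", ".", "Some", "years", "ago"]
lengths = Counter(len(w) for w in tokens)  # word length -> number of tokens

print(sorted(lengths.items()))
# [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (7, 1)]

most_common_length, count = lengths.most_common(1)[0]  # like FreqDist.max()
print(most_common_length, count)                        # 4 2

share = lengths[4] / sum(lengths.values())              # like FreqDist.freq(4)
print(round(share, 3))                                  # 0.286
```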
