Counting Chinese Words with Python

阿新 • Published: 2019-01-30

# coding: UTF-8
# Requires Python 3.
# Counts how often each Chinese character occurs in the input files.

def wordHan(inIo, outIo='wordcountHAN.txt', writing='w'):
    # Read every input file as UTF-8 and concatenate the text.
    s = ''
    for fg in inIo:
        with open(fg, 'r', encoding='utf-8') as f:
            s = s + f.read()
    print("Total:", len(s), "characters")
    # Keep only the distinct characters in the CJK range U+4E00..U+9FA5.
    word = [x for x in set(s) if 19968 <= ord(x) <= 40869]
    # Pair each character with its relative frequency in the whole text.
    m = [[x + "-->", str(s.count(x) * 1.0 / len(s))] for x in word]
    m = wordsort(m)
    # Write one "character-->frequency" entry per line.
    with open(outIo, writing, encoding='utf-8') as w:
        for i in m:
            w.writelines(i)
            w.write('\n')


def wordsort(m):
    # Sort the [label, frequency] pairs by frequency, highest first.
    return sorted(m, key=lambda v: float(v[1]), reverse=True)


if __name__ == '__main__':
    wordHan(['test1.txt', 'test2.txt'], writing='w')
    # wordEn is not defined in this post, so the call is left commented out.
    # wordEn('test1.txt', writing='w')
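
The script above calls s.count(x) once per distinct character, so the whole text is rescanned for every character it reports. As a rough sketch of an alternative (not the author's method), the standard-library collections.Counter can tally all characters in a single pass. The function name han_frequencies and the constants HAN_FIRST and HAN_LAST are made up for this illustration. The range and the denominator (the length of the whole text) are assumed to match the script above.

```python
from collections import Counter

# Same code point range as the script above: 19968..40869 (U+4E00..U+9FA5).
HAN_FIRST, HAN_LAST = 0x4E00, 0x9FA5


def han_frequencies(text):
    """Return (character, relative frequency) pairs, most frequent first."""
    # Tally every character in the range in one pass over the text.
    counts = Counter(ch for ch in text if HAN_FIRST <= ord(ch) <= HAN_LAST)
    # Divide by the length of the whole text, punctuation included.
    total = len(text)
    return [(ch, n / total) for ch, n in counts.most_common()]


if __name__ == "__main__":
    for ch, freq in han_frequencies("你好，你们好"):
        print(ch, "-->", freq)
```

Counter.most_common() already returns the entries sorted by count, so no separate sorting step is needed in this variant.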

