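Using Python to count the ten most frequently appearing characters in the novel 《三國演義》 (Romance of the Three Kingdoms) and visualize the result.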

阿新 • Published: 2020-11-24

# 1. Install the required third-party libraries

> jieba (a widely used third-party library for Chinese word segmentation)

> pyecharts (a data visualization library)

> [Download 《三國演義》.txt](https://pan.baidu.com/s/10y0C1iE5XEGh1MQy2eQDgg) (extraction code: kist)

The code below opens `三國演義.txt` from the working directory, so save the downloaded file under that name next to the script.

## Installing the libraries with PyCharm

- Open PyCharm and choose Settings under the [File] menu.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212204458-1158385426.png)

- The following page appears.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212326581-2045746829.png)

- Click the [+] on the right to open the page below, search for the library you need at the top of that page, and install it.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123212543283-674205403.png)

# 2. Writing the code

```Python
import jieba  # Chinese word segmentation

print("人物出現次數前十名：")
# Read the whole novel; the file is encoded as gb18030
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
words = jieba.lcut(txt)  # split the text into a list of words
counts = {}
for word in words:
    # Skip single characters, then merge different names for the same person
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending
for i in range(10):  # print the top ten
    word, count = items[i]
    print("{}:{}".format(word, count))
```

- The result is shown below:

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201123224330781-1222440661.png)

- Many of these entries are not character names, so we need to remove them. The code changes as follows:

```Python
import jieba  # Chinese word segmentation

print("人物出現次數前十名：")
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude, collected by running the program repeatedly
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將",
          "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中",
          "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州",
          "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single characters, then merge different names for the same person
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in remove:
    # pop() with a default avoids a KeyError if a word never occurred
    counts.pop(word, None)
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):  # print the top ten
    word, count = items[i]
    print("{}:{}".format(word, count))
```

- The result is shown below:

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124184517406-460277805.png)

> Now every entry is a character name.

- Export the data with the following code:

```Python
import jieba  # Chinese word segmentation

print("人物出現次數前十名：")
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude, collected by running the program repeatedly
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將",
          "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中",
          "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州",
          "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single characters, then merge different names for the same person
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in remove:
    counts.pop(word, None)  # no KeyError if a word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Export the data. Mode "w" overwrites the file, so rerunning the
# script does not append duplicate rows (the original used "a").
with open("三國人物出場次數.txt", "w", encoding='utf-8') as fo:
    for i in range(10):
        word, count = items[i]
        fo.write("{}:{}\n".format(word, count))  # name and count separated by a colon
```

- Now run it and check whether the file was exported. The result is shown below.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124102400726-2121004112.png)

> A file named 三國人物出場次數.txt has been generated, and it contains the data we just computed.

# 3. Data visualization

- Visualization needs data first, so we convert the exported data into a dictionary. The code is as follows:

```Python
# Convert the data in the txt file into a dictionary
dic = {}
keys = []  # records the order in which the names were read
with open('三國人物出場次數.txt', 'r', encoding='utf-8') as fr:
    for line in fr:
        v = line.strip().split(':')
        dic[v[0]] = v[1]
        keys.append(v[0])
print(dic)
```

- The output is as follows:

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124103534014-1428020670.png)

- Plot with pyecharts
- First import the modules:

```Python
from pyecharts import options as opts
from pyecharts.charts import Bar
```

- The plotting code is as follows:

```Python
# Plotting: use the dictionary's keys and values as chart data
list1 = list(dic.keys())
list2 = [int(v) for v in dic.values()]  # counts were read as strings
c = (
    Bar()
    .add_xaxis(list1)
    .add_yaxis("人物出場次數", list2)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
    )
    .render("人物出場次數視覺化圖.html")
)
```

- After running the program, a file named 人物出場次數視覺化圖.html appears in the directory, as shown below.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124185044727-535685821.png)

- Open it in a browser to see the data presented as a chart.

  ![](https://img2020.cnblogs.com/blog/2205265/202011/2205265-20201124185256956-213926224.png)

# 4. Full code

```Python
# Python code for counting character appearances in 《三國演義》
import jieba  # Chinese word segmentation
from pyecharts import options as opts
from pyecharts.charts import Bar

print("人物出現次數前十名：")
with open('三國演義.txt', 'r', encoding='gb18030') as f:
    txt = f.read()
# Words to exclude, collected by running the program repeatedly
remove = {"將軍", "卻說", "不能", "後主", "上馬", "不知", "天子", "大叫", "眾將",
          "不可", "主公", "蜀兵", "只見", "如何", "商議", "都督", "一人", "漢中",
          "人馬", "陛下", "魏兵", "天下", "今日", "左右", "東吳", "於是", "荊州",
          "如此", "大喜", "引兵", "次日", "軍士", "軍馬", "二人", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single characters, then merge different names for the same person
    if len(word) == 1:
        continue
    elif word == "諸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "關公" or word == "雲長":
        rword = "關羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "劉備"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in remove:
    counts.pop(word, None)  # no KeyError if a word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Export the data ("w" overwrites, so reruns do not append duplicate rows)
with open("三國人物出場次數.txt", "w", encoding='utf-8') as fo:
    for i in range(10):
        word, count = items[i]
        fo.write("{}:{}\n".format(word, count))  # name and count separated by a colon

# Convert the data in the txt file into a dictionary
dic = {}
keys = []  # records the order in which the names were read
with open('三國人物出場次數.txt', 'r', encoding='utf-8') as fr:
    for line in fr:
        v = line.strip().split(':')
        dic[v[0]] = v[1]
        keys.append(v[0])
print(dic)

# Plotting: use the dictionary's keys and values as chart data
list1 = list(dic.keys())
list2 = [int(v) for v in dic.values()]  # counts were read as strings
c = (
    Bar()
    .add_xaxis(list1)
    .add_yaxis("人物出場次數", list2)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
    )
    .render("人物出場次數視覺化圖.html")
)
```
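
As an alternative to the hand-written dictionary loop, the counting, alias merging, and filtering steps could be sketched with `collections.Counter` from the standard library. This is only an illustration, not the author's method: the names `ALIASES`, `top_names`, and `sample` are hypothetical, and the function assumes the text has already been split into a word list (for example by `jieba.lcut`).

```python
from collections import Counter

# Hypothetical alias table: maps alternate names to one canonical name,
# mirroring the if/elif chain in the script.
ALIASES = {
    "諸葛亮": "孔明", "孔明曰": "孔明",
    "關公": "關羽", "雲長": "關羽",
    "玄德": "劉備", "玄德曰": "劉備",
    "孟德": "曹操", "丞相": "曹操",
}


def top_names(words, stopwords=frozenset(), aliases=ALIASES, n=10):
    """Return the n most common names as (name, count) pairs."""
    counts = Counter(
        aliases.get(w, w)   # merge aliases into one canonical name
        for w in words
        if len(w) > 1       # skip single characters, as the script does
    )
    for w in stopwords:
        counts.pop(w, None)  # safe even if the stopword never occurred
    return counts.most_common(n)


# A tiny hand-made word list standing in for the segmented novel
sample = ["玄德", "曰", "孔明", "諸葛亮", "將軍", "劉備", "孔明曰", "曹操"]
print(top_names(sample, stopwords={"將軍"}, n=2))
# → [('孔明', 3), ('劉備', 2)]
```

`Counter.most_common(n)` replaces the manual `sort` plus `range(10)` loop and also copes with texts that contain fewer than `n` distinct words, where indexing `items[i]` would raise an `IndexError`.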
