Using Java to Count Word Frequencies in CET-4 Exam Papers

阿新 • Published: 2018-11-17

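1. Preface

The goal is to record how often every word appears in CET-4 (College English Test Band 4) exam papers and to sort the words by frequency, from highest to lowest. The same approach works for any other text, not just exam papers.

2. Approach

How can we use Java to record every word in a CET-4 paper?

Read the full text of the paper
Separate words from other characters, and use a map to associate each word with its number of occurrences
Sort the map entries by count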

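3. Implementation

3.1 Reading the full text

First, download a Word version of a CET-4 paper from the web and copy its contents into a .txt file. We then read everything in that text file with Java.

The file is read with the Files and Path classes of the java.nio.file package (NIO.2), added in Java 7, which makes this very convenient. Make sure the JDK version is 1.7 or later. Note that the one-argument Files.newBufferedReader(Path) overload was only added in Java 8; on Java 7, pass a Charset as the second argument.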

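// Pass the path of the file to read (backslashes must be escaped in Java string literals)
Path path = Paths.get("D:\\code\\words\\2015年6月.txt");

// Requires: import java.nio.charset.StandardCharsets;
// try-with-resources closes the reader automatically; the explicit charset also works on Java 7
try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
  String line;
  while ((line = br.readLine()) != null) {
    System.out.println(line);
  }
}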
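
As a self-contained variation, the sketch below writes a small temporary file and reads it back with Files.readAllLines, which loads every line into a list in one call. The class name ReadAllLinesSketch, the helper readLines, and the sample text are made up for illustration, and UTF-8 is an assumption about the file's encoding. For very large files, reading line by line is likely the better fit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original program.
class ReadAllLinesSketch {

  // Reads the whole file into memory as a list of lines (UTF-8 is assumed).
  static List<String> readLines(Path path) throws IOException {
    return Files.readAllLines(path, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // A temporary file stands in for the exam paper text.
    Path tmp = Files.createTempFile("paper", ".txt");
    try {
      Files.write(tmp, Arrays.asList("Part I Writing", "Part II Listening"),
          StandardCharsets.UTF_8);
      for (String line : readLines(tmp)) {
        System.out.println(line);
      }
    } finally {
      // Clean up the temporary file whether or not reading succeeded.
      Files.deleteIfExists(tmp);
    }
  }
}
```

Both Files.readAllLines(Path, Charset) and Files.write(Path, Iterable, Charset) are available from Java 7 onward, so this stays within the JDK requirement stated above.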

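3.2 Separating words from other characters and mapping each word to its count

At first I wanted to split the text into words with a regular expression, but that turned out to be fiddly because there are many kinds of separators: spaces, parentheses, quotation marks, periods, and so on.

Another important reason is that I wanted to convert a word's capitalized first letter to lowercase, which makes the frequency counting easier.

So I used the following approach: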

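// Maps each word to its number of occurrences; TreeMap keeps the words in alphabetical order
Map<String, Integer> map = new TreeMap<>();

while ((line = br.readLine()) != null) {
  StringBuilder word = new StringBuilder();
  // Iterate one position past the end so that a word ending the line is flushed too
  for (int j = 0; j <= line.length(); j++) {
    char ch = j < line.length() ? line.charAt(j) : ' ';
    if (ch >= 'a' && ch <= 'z') {
      word.append(ch);
    } else if (ch >= 'A' && ch <= 'Z') {
      // Convert uppercase letters to lowercase so that "The" and "the" are counted together
      word.append(Character.toLowerCase(ch));
    } else {
      // A non-letter ends the current word; one-letter words are not counted
      if (word.length() > 1) {
        String word_string = word.toString();
        if (!map.containsKey(word_string)) {
          // First occurrence: store the word with a count of 1
          map.put(word_string, 1);
        } else {
          // Known word: look up its count, add 1, and overwrite the old value
          map.put(word_string, map.get(word_string) + 1);
        }
      }
      // Always clear the buffer, so a skipped one-letter word cannot leak into the next word
      word.setLength(0);
    }
  }
}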
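
For comparison, here is a rough sketch of the regular-expression route that was set aside above. It splits each line on every run of non-letter characters, lower-cases the tokens, and counts only tokens of two or more letters. The class name RegexCountSketch and the method countWords are hypothetical, the sample sentences are invented, and Map.merge needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original program.
class RegexCountSketch {

  // Counts words of two or more ASCII letters, ignoring case.
  static Map<String, Integer> countWords(List<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      // Any run of characters other than A-Z or a-z acts as a separator.
      for (String token : line.split("[^A-Za-z]+")) {
        if (token.length() > 1) { // also skips the empty token split() may produce first
          // Insert 1 for a new word, or add 1 to the existing count.
          counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> lines =
        Arrays.asList("The cat (a big one) saw the dog.", "\"Dog\" is a word");
    System.out.println(countWords(lines));
    // prints {big=1, cat=1, dog=2, is=1, one=1, saw=1, the=2, word=1}
  }
}
```

The single character class covers spaces, parentheses, quotes, and periods at once, so the separator list does not have to be enumerated. The trade-off is an extra array of tokens per line, which should not matter for files the size of an exam paper.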

3.3 Sorting the map

This step uses the Collections.sort() method and Map.Entry, a nested interface of Map.

Map.Entry represents a single entry of a map, that is, one key-value pair.
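
As a small illustration of Map.Entry, the sketch below walks a map's entrySet() and reads each pair through getKey() and getValue(). The class name EntrySketch, the describe helper, and the sample counts are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original program.
class EntrySketch {

  // Turns every key-value pair into a "key=value" string, in the map's iteration order.
  static List<String> describe(Map<String, Integer> map) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
      out.add(entry.getKey() + "=" + entry.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Integer> map = new TreeMap<>(); // TreeMap iterates keys in sorted order
    map.put("the", 3);
    map.put("cat", 1);
    System.out.println(describe(map)); // prints [cat=1, the=3]
  }
}
```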

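A map cannot be sorted by value directly, so the following code copies its entries into a list and sorts that list by value in descending order: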

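// Copy the entries into a list, because the map itself is ordered by key
List<Map.Entry<String, Integer>> infoIds = new ArrayList<>(map.entrySet());

Collections.sort(infoIds, new Comparator<Map.Entry<String, Integer>>() {
  @Override
  public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
    // Comparing o2 against o1 (rather than o1 against o2) yields descending order
    return Integer.compare(o2.getValue(), o1.getValue());
  }
});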
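
On Java 8 or later, the same descending sort can probably be written more compactly with the comparator helpers on Map.Entry. The sketch below also breaks ties alphabetically by word, which is an added assumption about the desired order and not something stated above. SortSketch and sortByCount are illustrative names, and the sample counts are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original program.
class SortSketch {

  // Returns the entries ordered by count (highest first), then by word (A to Z).
  static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
    entries.sort(
        Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
    return entries;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new HashMap<>();
    counts.put("dog", 2);
    counts.put("the", 5);
    counts.put("cat", 2);
    for (Map.Entry<String, Integer> e : sortByCount(counts)) {
      System.out.println(e.getKey() + " " + e.getValue());
    }
    // prints "the 5", then "cat 2", then "dog 2", one per line
  }
}
```

Making the tie-break explicit means the output order no longer depends on which Map implementation supplied the entries.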

4. Code

Below is the complete code, which can process multiple papers in one run.

package indi.zmj.corejava.bit;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;

/**
 * @author zmj
 * @create 2018/11/11
 */
public class splitWords {

  public void readPapers(String[] url) throws IOException {

    // Maps each word to its number of occurrences across all files
    TreeMap<String, Integer> map = new TreeMap<>();

    for (int i = 0; i < url.length; i++) {
      Path path = Paths.get(url[i]);
      // try-with-resources closes the reader even if reading fails
      try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
        String line;
        // Read the file line by line, collecting each word in a StringBuilder
        while ((line = br.readLine()) != null) {
          StringBuilder word = new StringBuilder();
          // Iterate one position past the end so that a word ending the line is flushed too
          for (int j = 0; j <= line.length(); j++) {
            char ch = j < line.length() ? line.charAt(j) : ' ';
            if (ch >= 'a' && ch <= 'z') {
              word.append(ch);
            } else if (ch >= 'A' && ch <= 'Z') {
              // Convert uppercase letters to lowercase so that "The" and "the" are counted together
              word.append(Character.toLowerCase(ch));
            } else {
              // A non-letter ends the current word; one-letter words are not counted
              if (word.length() > 1) {
                String word_string = word.toString();
                if (!map.containsKey(word_string)) {
                  // First occurrence: store the word with a count of 1
                  map.put(word_string, 1);
                } else {
                  // Known word: look up its count, add 1, and overwrite the old value
                  map.put(word_string, map.get(word_string) + 1);
                }
              }
              // Always clear the buffer, so a skipped one-letter word cannot leak into the next word
              word.setLength(0);
            }
          }
        }
      }
    }

    // Copy the entries into a list and sort it by count, highest first
    List<Map.Entry<String, Integer>> infoIds = new ArrayList<>(map.entrySet());

    Collections.sort(infoIds, new Comparator<Map.Entry<String, Integer>>() {
      @Override
      public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
        return Integer.compare(o2.getValue(), o1.getValue());
      }
    });
    for (int i = 0; i < infoIds.size(); i++) {
      System.out.printf("Word: %12s   Count: %d%n", infoIds.get(i).getKey(), infoIds.get(i).getValue());
    }
  }

  public static void main(String[] args) throws IOException {
    String[] url = {"D:\\code\\words\\2015年6月.txt", "D:\\code\\words\\2015年12月.txt",
            "D:\\code\\words\\2016年6月.txt", "D:\\code\\words\\2016年12月.txt",
            "D:\\code\\words\\2017年12月.txt", "D:\\code\\words\\2018年6月.txt"};
    new splitWords().readPapers(url);
  }
}

5. Results

Part of the output is shown below.
[Screenshot of the program output]
