Counting the 10 most frequent words in an English article

阿新 • • Published: 2019-02-08

package se;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class damn {
	public static void main(String[] args) throws IOException {
		String str2 = System.getProperty("java.io.tmpdir");
		System.out.println(str2);
		long start = System.currentTimeMillis(); // program start time
		File file = new File("C:/Users/Wll/Desktop/Computer.txt");

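		BufferedReader br = new BufferedReader(new FileReader(file));
		StringBuilder sb = new StringBuilder();
		String line = null;
		while ((line = br.readLine()) != null) {
			// readLine() strips the line break, so append a space to keep
			// the last word of one line apart from the first word of the next
			sb.append(line).append(' ');
		}
		br.close(); // close the stream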

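		// The whole text, lowercased so that counting and stop-word matching ignore case
		String words = sb.toString().toLowerCase();
		String targetString = words.replaceAll("[.,\"\\?!:;\\(\\)]", ""); // strip punctuation

		// Split into words (on any run of whitespace), then list English words that carry
		// little meaning on their own, such as prepositions, pronouns and modal verbs
		String[] singleWord = targetString.trim().split("\\s+");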
		String[] keys = { "you", "i", "he", "she", "me", "him", "her", "it",
				"they", "them", "we", "us", "your", "yours", "our", "his",
				"her", "its", "my", "in", "into", "on", "for", "out", "up",
				"down", "at", "to", "too", "with", "by", "about", "among",
				"between", "over", "from", "be", "been", "am", "is", "are",
				"was", "were", "without", "the", "of", "and", "a", "an",
				"that", "this", "be", "or", "as", "will", "would", "can",
				"could", "may", "might", "shall", "should", "must", "has",
				"have", "had", "than" };

		// Replace common stop words with the placeholder "#" so they can be recognised and skipped when the counts are printed
		for (int i = 0; i < singleWord.length; i++) {
			for (String str : keys) {
				if (singleWord[i].equals(str))
					singleWord[i] = "#";
			}
		}

		// Associate each word with its number of occurrences
		for (int i = 0; i < singleWord.length; i++) {
			if (singleWord[i].isEmpty())
				continue; // skip empty tokens left over from splitting
			count++; // total number of words
			// Lowercase before touching the map so lookup and update use the same key
			String word = singleWord[i].toLowerCase();
			wordMap.merge(word, 1, Integer::sum); // store 1 for a new word, otherwise add 1 to its count
		}

		System.out.println("\t\t--File info--");
		System.out.println("     Name: " + file.getName() + "    Size: "
				+ file.length() / 1024 + " KB");
		System.out.println("\t\t--File info--");
		System.out.println();
		System.out.println("■■■■ The 10 most frequent of " + count + " words ■■■■");

		// Comparator: sort by count, highest first
		List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(
				wordMap.entrySet());
		Collections.sort(list, new Comparator<Entry<String, Integer>>() {
			public int compare(Entry<String, Integer> e1,
					Entry<String, Integer> e2) {
				// compareTo returns 0 for equal counts, so the Comparator contract
				// holds and no legacy merge sort workaround is needed
				return e2.getValue().compareTo(e1.getValue());
			}
		});

		int wordCount = 0; // number of words printed so far
		for (Map.Entry<String, Integer> entry : list) {
			if (entry.getKey().equals("#")) // filter: skip the placeholder that stands for stop words
				continue;
			wordCount++;
			System.out.printf("\t%2d. %8s \t %4d times\n", wordCount,
					entry.getKey(), entry.getValue());
			if (wordCount == 10) // print only the top 10
				break;
		}
		// Report the elapsed time even when the text has fewer than 10 distinct words
		long end = System.currentTimeMillis(); // program end time
		System.out.println("■■■■■■■■■■■■■■■ Elapsed " + (end - start)
				+ " ms" + " ■■■■■■■■■■■■■■■■");
	}

	private static HashMap<String, Integer> wordMap = new HashMap<String, Integer>();
	private static int count = 0;
}

The program's run output was shown here as a screenshot.

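Overall, I think this program turned out fairly well, and I learned a lot from writing it. I had barely touched regular expressions before, so this was a chance to learn a good deal about them. I also became more familiar with the HashMap and ArrayList classes, and picked up some methods and techniques for writing programs that keep the code clearly organised.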
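
For readers who want to try the regex, HashMap and ArrayList combination on a small input, here is a minimal sketch. The class name `TopWords`, the helper `topWords` and the sample sentence are hypothetical and not part of the original program. It also uses a `Set` for stop words rather than the "#" placeholder, which is a different technique from the one in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopWords {
    // Hypothetical helper: returns up to `limit` (word, count) entries, most frequent first.
    static List<Map.Entry<String, Integer>> topWords(String text, Set<String> stopWords, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase first, then split on every run of characters that is
        // neither a lowercase letter nor an apostrophe.
        for (String word : text.toLowerCase().split("[^a-z']+")) {
            if (word.isEmpty() || stopWords.contains(word)) {
                continue; // skip empty tokens and stop words
            }
            counts.merge(word, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Descending by count; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        String text = "The cat and the dog. The dog barks; the cat sleeps!";
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        for (Map.Entry<String, Integer> e : topWords(text, stopWords, 10)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Looking stop words up in a `HashSet` avoids the nested loop over the `keys` array, and the alphabetical tie-break keeps the output stable between runs.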
