Java Lab: Counting Letter Frequencies and Word Occurrences

阿新 • Published: 2019-05-04


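This week's lab builds on the earlier word-counting program (see the word-counting lab in a previous post). The task is to adapt it to the required format, add a letter-frequency count, and write the final results to text files in a specified format.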

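The counting itself is not hard to implement. The import and export functions from the original program are generalized so that they work with any file. The loop that reads the text one character at a time for word counting now also tallies letters, with a check that folds uppercase and lowercase letters together.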
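The letter tally described above can be sketched in isolation as follows. This is a minimal illustration, not the lab code itself: the class name `LetterFrequencySketch` and the methods `countLetters` and `percent` are hypothetical names chosen for this example, and it works on an in-memory string rather than a file.

```java
import java.util.Locale;

class LetterFrequencySketch {
    // Returns 26 counters: index 0 holds 'a' and 'A', index 25 holds 'z' and 'Z'.
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++;
            } else if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;   // uppercase letters share the lowercase slot
            }
        }
        return counts;
    }

    // Share of one letter among all counted letters, in percent (0 if there are none).
    static double percent(int[] counts, int index) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum == 0 ? 0.0 : 100.0 * counts[index] / sum;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("Gone with the Wind");
        for (int i = 0; i < 26; i++) {
            if (counts[i] > 0) {
                String share = String.format(Locale.ROOT, "%.2f%%", percent(counts, i));
                System.out.println((char) ('a' + i) + ":" + share);
            }
        }
    }
}
```

Mapping both cases to the same array index avoids a separate lowercase conversion step; the zero check in `percent` covers input that contains no letters at all.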

In addition, the word counter now has a list of uninformative words (stop words), so that it reports the n most frequent meaningful words.
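As a rough sketch of the stop-word idea, the following example counts words in a string, skips the stop words and single-character tokens, and returns the n most frequent remaining words. It uses a stream-based sort rather than the selection loop in the lab code, and the names `TopWordsSketch`, `STOP_WORDS`, and `topWords` are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TopWordsSketch {
    // The same stop words as the unUse array in the lab code.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("it", "in", "to", "of", "the", "and", "that", "for"));

    // Returns up to n (word, count) pairs, most frequent first; ties are ordered alphabetically.
    static List<Map.Entry<String, Integer>> topWords(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a lowercase letter or an apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (token.length() <= 1 || STOP_WORDS.contains(token)) {
                continue;   // skip empty tokens, single characters, and stop words
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The cat and the dog. The cat!";
        for (Map.Entry<String, Integer> e : topWords(text, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Lowercasing the text before the stop-word check means that "The" and "the" are filtered alike, which matters for words at the start of a sentence.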

The code for the lab is shown below:

package pipei;
// 洪鼎淇 20173627, class 信1705-3
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.HashMap;
import java.util.Map;
// Harry Potter word statistics

public class Pipei {
    // Word -> number of occurrences in the file being processed
    public static Map<String,Integer> map1=new HashMap<String,Integer>();
    // g_Wordcount[i] is the count of one letter; g_Num[i] records which letter
    // that slot holds after sorting (only indices 0..25 are used)
    static int g_Wordcount[]=new int[27];
    static int g_Num[]=new int[27];
    
    // Common words that are excluded from the word ranking
    static String []unUse=new String[] {
        "it",
        "in",
        "to",
        "of",
        "the",
        "and",
        "that",
        "for"
    };
    
    public static void main(String arg[]) {
        daoruFiles("piao.txt","tongji");
        traverseFolder2("C:\\Users\\Halo\\javatest\\pipei\\piao");
        
    }
    // Processes file a, then writes the letter percentages to dc+"1.txt"
    // and the word ranking to dc+"2.txt".
    public static void daoruFiles(String a,String dc)
    {
        map1.clear();
        String sz[];
        Integer num[];
        final int MAXNUM=10; // how many of the most frequent words to report
        
        // Reset the letter counters before reading the file
        for(int i=0;i<g_Wordcount.length;i++)
        {
            g_Wordcount[i]=0;
            g_Num[i]=i;
        }
        
        sz=new String[MAXNUM+1];
        num=new Integer[MAXNUM+1];
        int account =1;
        // Read the file exactly once: reading it a second time without
        // clearing map1 in between would double every word count
        try {
            daoru(a);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("English word occurrences:");
        int g_run=0;
        
        // Repeatedly pick the most frequent remaining word and remove it from the map.
        // One extra round is run because the empty string (recorded whenever two
        // separators are adjacent) is usually the most frequent key; it is skipped below.
        for(g_run=0;g_run<MAXNUM+1;g_run++)
        {
            account=1;
            for(Map.Entry<String,Integer> it : Pipei.map1.entrySet())
            {
                if(account==1)
                {
                    // The first entry seeds the current maximum
                    sz[g_run]=it.getKey();
                    num[g_run]=it.getValue();
                    account=2;
                }
                if(num[g_run]<it.getValue())
                {
                    sz[g_run]=it.getKey();
                    num[g_run]=it.getValue();
                }
            }
            Pipei.map1.remove(sz[g_run]);
        }
        int g_count=1;
        String tx1=new String();
        String tx2=new String();
        for(int i=0;i<g_run;i++)
        {
            // Skip unused slots and the empty-string entry
            if(sz[i]==null)
                continue;
            if(sz[i].equals(""))
                continue;
            String line="Rank "+(g_count)+" word: "+sz[i]+"\t\t\tcount: "+num[i];
            tx1+=line+"\r\n";
            System.out.println(line);
            g_count++;
        }
        try {
            daochu(tx1,dc+"2.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
        
        //------------------------------
        // Sort the letter counts in descending order, moving the letter indices with them
        int temp=g_Wordcount[0];
        int numtemp=0;
        for(int i=0;i<26;i++)
        {
            for(int j=i;j<26;j++)
            {
                if(g_Wordcount[j]>g_Wordcount[i])
                {
                    temp=g_Wordcount[i];
                    g_Wordcount[i]=g_Wordcount[j];
                    g_Wordcount[j]=temp;
                    numtemp=g_Num[i];
                    g_Num[i]=g_Num[j];
                    g_Num[j]=numtemp;
                }
            }
        }
        int sum=0;
        for(int i=0;i<26;i++)
        {
            sum+=g_Wordcount[i];
        }
        for(int i=0;i<26;i++)
        {
            char c=(char) ('a'+g_Num[i]);
            // Guard against a file without any letters, where sum is 0
            double percent=sum==0?0.0:(double)g_Wordcount[i]/sum*100;
            tx2+=c+":"+String.format("%.2f%% \r\n", percent);
        }
        try {
            daochu(tx2,dc+"1.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
        
        //------------------------------
        
    }
    // Reads file s one character at a time, updating the letter counts
    // in g_Wordcount and the word counts in map1.
    public static void daoru(String s) throws IOException
    {
        String string2="";
        // try-with-resources closes the reader (and the underlying stream) on every path
        try(InputStreamReader c=new InputStreamReader(new FileInputStream(s),"UTF-8"))
        {
            int ch;
            // read() returns -1 at end of file
            while((ch=c.read())!=-1)
            {
                char string1=(char) ch;
                if(WordNum(string1)>=0)
                {
                    g_Wordcount[WordNum(string1)]+=1;
                }
                
                //------------------------
                if(!isWord(string1))
                {
                    // A non-word character ends the current word
                    String word=string2.toLowerCase();
                    if(!isBaseWord(word))
                    {
                        map1.merge(word,1,Integer::sum);
                    }
                    string2="";
                }
                else if(isInitWord(string1))
                {
                    string2+=string1;
                }
            }
        }
        // Count the last word if the file does not end with a separator
        if(!string2.isEmpty())
        {
            String word=string2.toLowerCase();
            if(!isBaseWord(word))
            {
                map1.merge(word,1,Integer::sum);
            }
        }
    }
    public static void daochu(String txt,String outfile) throws IOException
    {
        File fi=new File(outfile);
        FileOutputStream fop=new FileOutputStream(fi);
        OutputStreamWriter ops=new OutputStreamWriter(fop,"UTF-8");
        ops.append(txt);
        ops.close();
        fop.close();
    }
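    // True if a can be part of a word: a letter or an apostrophe
    public static boolean isWord(char a)
    {
        if(a<='z'&&a>='a'||a<='Z'&&a>='A'||a=='\'')
            return true;
        return false;
    }
    // True if a is a letter, a digit, or an apostrophe
    public static boolean isInitWord(char a)
    {
        if(a<='z'&&a>='a'||a<='Z'&&a>='A'||a>='0'&&a<='9'||a=='\'')
            return true;
        return false;
    }
    // True if word should be left out of the ranking:
    // a single character or one of the words in unUse (case-insensitive)
    public static boolean isBaseWord(String word)
    {
        if(word.length()==1)
            return true;
        for(int i=0;i<unUse.length;i++)
        {
            if(unUse[i].equalsIgnoreCase(word))
                return true;
        }
        return false;
    }
    // Maps a letter of either case to 0..25; returns -1 for anything else
    public static int WordNum(char a)
    {
        if(a<='z'&&a>='a')
            return a-'a';
        else if(a<='Z'&&a>='A')
            return a-'A';
        return -1;
    }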
    //---- Recursively walk a folder and process every file in it
    public static void traverseFolder2(String path) {

        File file = new File(path);
        if (file.exists()) {
            File[] files = file.listFiles();
            if (null == files || files.length == 0) {
                System.out.println("The folder is empty!");
                return;
            } else {
                for (File file2 : files) {
                    if (file2.isDirectory()) {
                        System.out.println("Folder: " + file2.getAbsolutePath());
                        traverseFolder2(file2.getAbsolutePath());
                    } else {
                        System.out.println("File: " + file2.getAbsolutePath());
                        String name=file2.getName();
                        daoruFiles(file2.getAbsolutePath(), file2.getParentFile()+"\\"+name.replace(".txt", "")+"tongji");
                        
                    }
                }
            }
        } else {
            System.out.println("The path does not exist!");
        }
    }

    
}

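The full text of Gone with the Wind, together with its individual chapters, was placed in one folder. The final results are as follows:

[Image]

Files with the tongji1 suffix contain the letter composition of the text (shown here for the complete English novel Gone with the Wind):

[Image]

Files with the tongji2 suffix contain the ranking of the 10 most frequent meaningful words:

[Image]

Processing the whole novel takes less than one second, and no problems appeared while testing large files or multiple files.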
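One caveat when processing a whole folder: the result files are written next to the inputs, so a second run over the same folder would also treat them as inputs. A possible way to avoid this, sketched here with `java.nio.file.Files.walk`, is to keep only `.txt` files that do not end in `tongji1.txt` or `tongji2.txt`. The class and method names (`TextFileWalkSketch`, `isResultFile`, `outputBase`, `findInputs`) are hypothetical and not part of the lab code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TextFileWalkSketch {
    // True for files written by an earlier run, e.g. "ch1tongji1.txt".
    static boolean isResultFile(Path p) {
        return p.getFileName().toString().matches(".*tongji[12]\\.txt");
    }

    // Output prefix next to the input, e.g. piao/ch1.txt -> piao/ch1tongji.
    static Path outputBase(Path input) {
        String name = input.getFileName().toString();
        String stem = name.endsWith(".txt") ? name.substring(0, name.length() - 4) : name;
        return input.resolveSibling(stem + "tongji");
    }

    // Collects all .txt inputs below root, skipping earlier result files.
    static List<Path> findInputs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .filter(p -> !isResultFile(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path input : findInputs(root)) {
            System.out.println(input + " -> " + outputBase(input));
        }
    }
}
```

Using `Path.resolveSibling` instead of concatenating `"\\"` also keeps the output path independent of the operating system's separator.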
