To be or not to be ...

阿新 • Published: 2018-12-22

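/*
Requirement: build an inverted index over a set of files.
Step 1:
    |-- Assign every file an index ID of the form DocID_<number>
      |-- First list all files in the directory with File.listFiles()
      |-- Write every file's ID and path to a file index, docIndex.txt
Step 2:
    |-- Load each file by its path and split its contents into words
      |-- Count how often each word occurs in each file and write the statistics to the result file, wordIndex.txt

*/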

import java.io.*;
import java.util.*;

class InvertedEngine
{
    public static void main(String[] args) throws IOException
    {   
        String filePath = "documents";
        String docIndex = "docIndex.txt";
        String wordIndex = "wordIndex.txt";
        getFileIndex(filePath , docIndex);
        getWordsFrequency(docIndex,wordIndex);
        System.out.println("Work Done!");
    }

    public static void getFileIndex(String filePath , String docIndex)
    {
        // Locate the directory given by filePath and write the ID and path of every file in it to docIndex
        File[] fileList = new File(filePath).listFiles();
        if (fileList == null)
        {
            // listFiles() returns null when the path is not a directory or cannot be read
            System.out.println("Cannot list files in " + filePath);
            return;
        }
        // try-with-resources closes the writer even if a write fails
        try (BufferedWriter bufw = new BufferedWriter(new FileWriter(docIndex)))
        {
            int docNumber = 0;
            for (File doc : fileList)
            {
                if (!doc.isFile())
                    continue; // skip subdirectories, which FileReader cannot open
                bufw.write("DocID_" + docNumber++ + "\t" + doc.getAbsolutePath());
                bufw.newLine();
            }
        }
        catch (IOException e)
        {
            System.out.println("Failed to write the document index: " + e);
        }
    }
    public static void getWordsFrequency(String docIndex , String wordIndex) throws IOException
    {   // Find each file through the entries in docIndex and count the words it contains
        // Maps word -> (docID -> number of occurrences); TreeMap keeps both levels sorted
        TreeMap<String,TreeMap<String,Integer>> tmp = new TreeMap<String,TreeMap<String,Integer>>();
        try (BufferedReader bufr = new BufferedReader(new FileReader(docIndex)))   // reads docIndex.txt
        {
            String docIDandPath = null;
            while ((docIDandPath = bufr.readLine()) != null)
            {
                String[] docInfo = docIDandPath.split("\t", 2);
                if (docInfo.length < 2)
                    continue; // skip blank or malformed lines
                String docID = docInfo[0];
                String docPath = docInfo[1];   // the docID and the file's path
                // Open each document in its own try block so every reader is closed, not only the last one
                try (BufferedReader bufrDoc = new BufferedReader(new FileReader(docPath)))
                {
                    String wordLine = null;
                    while ((wordLine = bufrDoc.readLine()) != null)
                    {
                        // Split on runs of non-word characters; a leading separator still yields one empty token
                        for (String wordOfDoc : wordLine.split("\\W+"))
                            if (!wordOfDoc.isEmpty())
                                wordDeal(wordOfDoc, docID, tmp);   // tally this occurrence
                    }
                }
            }
        }
        // Write the collected statistics to wordIndex.txt, one word per line
        try (BufferedWriter bufw = new BufferedWriter(new FileWriter(wordIndex)))
        {
            for (Map.Entry<String,TreeMap<String,Integer>> em : tmp.entrySet())
            {
                bufw.write(em.getKey() + "\t" + em.getValue());
                bufw.newLine();
            }
        }
    }
    public static void wordDeal(String wordOfDoc,String docID,TreeMap<String,TreeMap<String,Integer>> tmp)
    {
        wordOfDoc = wordOfDoc.toLowerCase();
        if(!tmp.containsKey(wordOfDoc))
        {
            // First time this word appears in the statistics
            TreeMap<String , Integer> tmpST = new TreeMap<String , Integer>();
            tmpST.put(docID,1);
            tmp.put(wordOfDoc,tmpST);
        }
        else
        {
            // The word is already in tmp: look up its count for this docID.
            // If count is null, this is the word's first occurrence in this document, so store (docID, 1);
            // otherwise store count + 1.
            TreeMap<String ,Integer> tmpST = tmp.get(wordOfDoc);
            Integer count = tmpST.get(docID);
            // count + 1, not count++: the post-increment yields the old value, so the count would never grow
            count = (count == null) ? 1 : count + 1;
            tmpST.put(docID,count);   // tmpST is the map held by tmp, so no write-back to tmp is needed
        }
    }
}
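
The two-level counting done by `wordDeal` can also be written more compactly with `Map.computeIfAbsent` and `Map.merge` (available since Java 8). The following is only a sketch of that idea, not part of the original program: the class name `InvertedIndexSketch` and the methods `addOccurrence` and `build` are made up for illustration, and the documents are passed in as in-memory strings so that the example runs without any files on disk.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only
class InvertedIndexSketch {

    // Records one occurrence of `word` in the document `docId`.
    static void addOccurrence(Map<String, Map<String, Integer>> index, String word, String docId) {
        index.computeIfAbsent(word.toLowerCase(), k -> new TreeMap<>()) // create the per-word map on first sight
             .merge(docId, 1, Integer::sum);                            // start at 1, otherwise add 1
    }

    // Builds word -> (docId -> count) from a map of docId -> document text.
    static Map<String, Map<String, Integer>> build(Map<String, String> docs) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\W+")) {
                if (!word.isEmpty()) { // a leading separator yields one empty token
                    addOccurrence(index, word, doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("DocID_0", "To be or not to be");
        docs.put("DocID_1", "That is the question");
        // Prints one line per word: the word, a tab, then its docId -> count map
        build(docs).forEach((word, postings) -> System.out.println(word + "\t" + postings));
    }
}
```

`merge` handles both the "first occurrence in this document" and the "seen before" cases in one call, so no explicit null check on the count is needed.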
