
Learning the Louvain Community Detection Algorithm (my Java implementation + test data)

For everyone's convenience, I have put the data directly on GitHub:

Algorithm introduction:

Louvain is a modularity-based community detection algorithm. It performs well in both efficiency and quality, and it can uncover hierarchical community structure. Its optimization objective is to maximize the modularity of the whole network.

Modularity is a measure of how good a given division of a network into communities is: it is the difference between the number of edges that fall inside communities and the number expected under a random null model. Its value lies in the range [-1/2, 1), and it is defined as:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

In the formula above, A_ij is the weight of the edge between nodes i and j (for an unweighted graph, every edge weight can be taken as 1), and k_i is the sum of the weights of the edges attached to node i (for an unweighted graph, simply the degree of the node).

m is the sum of all edge weights in the graph, δ is 1 when its two arguments are equal and 0 otherwise, and c_i is the number of the community node i belongs to.

The definition of modularity can be simplified as follows:

$$Q = \sum_{c}\left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^{2}\right]$$

where Σ_in is the sum of the weights of the edges inside community c (counting each internal edge in both directions) and Σ_tot is the sum of the weights of the edges incident to the nodes of community c.
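As a sanity check, both forms can be evaluated directly on a tiny graph and compared (a minimal sketch; the `ModularityDemo` class and the toy graph are mine for illustration, not part of the implementation below):

```java
public class ModularityDemo {
    // Q from the definition: (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
    static double modularityDefinition(double[][] A, int[] cluster) {
        int n = A.length;
        double twoM = 0;                  // 2m = sum of all entries of A
        double[] k = new double[n];       // weighted degree of each node
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { k[i] += A[i][j]; twoM += A[i][j]; }
        double q = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (cluster[i] == cluster[j])
                    q += A[i][j] - k[i] * k[j] / twoM;
        return q / twoM;
    }

    // Q from the per-community form: sum_c [Sigma_in/(2m) - (Sigma_tot/(2m))^2],
    // where Sigma_in counts each internal edge twice (once per direction).
    static double modularityCommunities(double[][] A, int[] cluster, int nClusters) {
        int n = A.length;
        double twoM = 0;
        double[] sigmaIn = new double[nClusters], sigmaTot = new double[nClusters];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                twoM += A[i][j];
                sigmaTot[cluster[i]] += A[i][j];
                if (cluster[i] == cluster[j]) sigmaIn[cluster[i]] += A[i][j];
            }
        double q = 0;
        for (int c = 0; c < nClusters; c++)
            q += sigmaIn[c] / twoM - (sigmaTot[c] / twoM) * (sigmaTot[c] / twoM);
        return q;
    }

    public static void main(String[] args) {
        // two triangles joined by a single edge; the natural split is {0,1,2} and {3,4,5}
        double[][] A = new double[6][6];
        int[][] edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
        for (int[] e : edges) { A[e[0]][e[1]] = 1; A[e[1]][e[0]] = 1; }
        int[] cluster = {0, 0, 0, 1, 1, 1};
        System.out.println(modularityDefinition(A, cluster));    // 5/14 ≈ 0.357
        System.out.println(modularityCommunities(A, cluster, 2)); // same value
    }
}
```

Both methods give 5/14 for this partition, which confirms the two forms agree.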

Our goal is to determine which community every node belongs to, so that the modularity of the resulting partition is maximized.

The idea behind the Louvain algorithm is simple:

1) Treat every node of the graph as its own community; at this point the number of communities equals the number of nodes.

2) For each node i, try moving it into the community of each of its neighbors in turn, compute the modularity change ΔQ of each move, and record the neighbor with the largest ΔQ. If max ΔQ > 0, move node i into that neighbor's community; otherwise leave it where it is.

3) Repeat 2) until no node changes community any more.

4) Compress the graph: collapse all nodes of the same community into one new node, turn the edges inside a community into a self-loop on the new node, and turn edges between communities into edges between the corresponding new nodes.

5) Repeat from 1) until the modularity of the whole graph stops changing.
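Step 4, the compression, can be sketched most easily on an adjacency matrix (the `AggregateDemo` helper is mine for illustration; the implementation below works on adjacency lists): summing the matrix entries over each pair of communities automatically turns the internal edges into the self-loop weight of the new node, which is exactly what preserves modularity.

```java
public class AggregateDemo {
    // Collapse each community into one node: B[c][d] = sum of A[i][j] over i in c, j in d.
    // The diagonal B[c][c] counts every internal edge twice, which is the self-loop
    // weight the compressed graph needs so that modularity is preserved.
    static double[][] aggregate(double[][] A, int[] cluster, int nClusters) {
        double[][] B = new double[nClusters][nClusters];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < A.length; j++)
                B[cluster[i]][cluster[j]] += A[i][j];
        return B;
    }

    public static void main(String[] args) {
        // two triangles joined by a single edge, collapsed to communities {0,1,2} and {3,4,5}
        double[][] A = new double[6][6];
        int[][] edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
        for (int[] e : edges) { A[e[0]][e[1]] = 1; A[e[1]][e[0]] = 1; }
        double[][] B = aggregate(A, new int[]{0, 0, 0, 1, 1, 1}, 2);
        System.out.println(B[0][0] + " " + B[0][1] + " " + B[1][1]); // 6.0 1.0 6.0
    }
}
```

Here each triangle's three internal edges become a self-loop of weight 6 (each edge counted in both directions), and the single bridge edge becomes an edge of weight 1 between the two new nodes.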

A few points deserve attention when writing the code:

* In step 2, the moves are not attempted only for singleton nodes: nodes whose current community already contains more than one member are tried as well. I misread this part of the paper at first, which broke my algorithm.

* In step 3, when repeating 2), the graph is not rebuilt after each single sweep over the nodes. Instead, keep looping over all nodes until one full pass leaves every node's community unchanged, which means the current network is stable; only then rebuild the graph.

* For how the modularity gain is computed, read on.

The process is illustrated in the figure below:


As you can see, Louvain is a heuristic greedy algorithm, and the modularity has to be updated heuristically. This raises a few issues:

1: When trying to assign node i to a neighboring community, if we really moved it and recomputed the modularity from scratch, the efficiency of the algorithm could hardly be guaranteed.

2: For this problem, the greedy strategy only guarantees a local optimum, not a global one.

3: In what order should node i be tried against the neighboring communities?

......


The paper answers the first question: we do not actually have to move node i into the neighboring community and recompute the modularity from scratch. It gives a formula for the modularity gain of moving node i into community c:

$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^{2}\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^{2} - \left(\frac{k_i}{2m}\right)^{2}\right]$$

where Σ_in is the sum of the weights of the edges whose endpoints both lie inside community c, Σ_tot is the sum of the weights of the edges incident to community c, k_i is the weighted degree of node i, k_{i,in} is the sum of the weights of the edges between node i and the nodes of c, and m is the sum of all edge weights.

This gain formula is still rather heavy, though, and would drag down the time efficiency of the algorithm.

Note, however, that we only use the modularity gain to decide whether node i may move into community c. There is no need to compute the gain exactly; we only need to know whether the move increases the modularity.

Hence the relative gain formula:

$$\Delta Q' = k_{i,in} - \frac{k_i \, \Sigma_{tot}}{2m}$$

The relative gain may exceed 1 and is not the true modularity increase, but its sign tells us whether the move increases modularity (expanding the paper's formula shows ΔQ = ΔQ'/m). Using it greatly reduces the running time of the algorithm.
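To see how the two formulas relate, here is a small sketch (the class and method names are mine): expanding the paper's ΔQ shows it equals the relative gain divided by m, so the two always agree in sign.

```java
public class GainDemo {
    // Full modularity gain from the paper, for moving node i into community c.
    static double fullGain(double sigmaIn, double sigmaTot, double kI, double kIIn, double m) {
        double after  = (sigmaIn + 2 * kIIn) / (2 * m)
                      - Math.pow((sigmaTot + kI) / (2 * m), 2);
        double before = sigmaIn / (2 * m)
                      - Math.pow(sigmaTot / (2 * m), 2)
                      - Math.pow(kI / (2 * m), 2);
        return after - before;
    }

    // Relative gain: same sign, far cheaper. This is what try_move_i in the code
    // below computes, with resolution = 1/(2m) supplying the 1/(2m) factor.
    static double relativeGain(double sigmaTot, double kI, double kIIn, double m) {
        return kIIn - kI * sigmaTot / (2 * m);
    }

    public static void main(String[] args) {
        // arbitrary sample values, just to show the two gains differ by a factor of m
        double sigmaIn = 10, sigmaTot = 20, kI = 4, kIIn = 3, m = 50;
        System.out.println(fullGain(sigmaIn, sigmaTot, kI, kIIn, m));     // 0.044
        System.out.println(relativeGain(sigmaTot, kI, kIIn, m) / m);      // 0.044
    }
}
```

Since only the sign matters when ranking candidate communities, the constant factor 1/m can be dropped entirely.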

On the second question: the algorithm indeed cannot guarantee a global optimum. But its heuristic rules are sensible, so it yields a very accurate approximate result.

Also, to calibrate the result we can run the algorithm several times with different node orders and keep the best partition, the one with the largest modularity.

The third question already showed up in the second: when scanning the neighbors of some node i, what order should be used? Increasing, random, or some other rule? I think this has to be analyzed with experimental data.

The paper also mentions orderings that make the result more accurate. My view: if you run the algorithm several times and keep the best result, a random order should be the most stable and accurate choice.

With all that said, here is my Java implementation of the Louvain algorithm for reference. It is heavily commented, which makes it good for studying.

After my code comes an implementation by well-known researchers abroad. Its time and space complexity match mine, but its constant factors are much smaller, so it runs much faster and produces more accurate answers (it performs 10 random restarts), making it better suited to practical use.

My Java code (time complexity O(E), space complexity O(E)):

Edge.java:

package myLouvain;

public class Edge implements Cloneable{
	int v;     // v is the number of the node this edge connects to; weight is the edge's weight
	double weight;
	int next;    // index of the next edge incident to the same node
	Edge(){}
	public Object clone(){
		Edge temp=null;
		try{  
		    temp = (Edge)super.clone();   // shallow copy  
		}catch(CloneNotSupportedException e) {  
		    e.printStackTrace();  
		}   
		return temp;
	}
}


Louvain.java:

package myLouvain;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Random;

public class Louvain implements Cloneable{
    int n; // number of nodes
    int m; // number of edges
    int cluster[]; // which cluster node i belongs to
    Edge edge[]; // adjacency list
    int head[]; // index of the first edge of each node
    int top; // number of edge slots used
    double resolution; // 1/2m, constant for the whole run
    double node_weight[]; // node weights
    double totalEdgeWeight; // total edge weight
    double[] cluster_weight; // cluster weights
    double eps = 1e-14; // tolerance

    int global_n; // the original n
    int global_cluster[]; // final answer: which cluster node i ends up in
    Edge[] new_edge;   // new adjacency list
    int[] new_head;
    int new_top = 0;
    final int iteration_time = 3; // maximum number of iterations
    
    Edge global_edge[];   // the initial adjacency list; stored once, never modified afterwards
    int global_head[];
    int global_top=0;
    
    void addEdge(int u, int v, double weight) {  
        if(edge[top]==null)
            edge[top]=new Edge();
        edge[top].v = v;
        edge[top].weight = weight;
        edge[top].next = head[u];
        head[u] = top++;
    }

    void addNewEdge(int u, int v, double weight) {
        if(new_edge[new_top]==null)
            new_edge[new_top]=new Edge();
        new_edge[new_top].v = v;
        new_edge[new_top].weight = weight;
        new_edge[new_top].next = new_head[u];
        new_head[u] = new_top++;
    }
    
    void addGlobalEdge(int u, int v, double weight) {
        if(global_edge[global_top]==null)
            global_edge[global_top]=new Edge();
        global_edge[global_top].v = v;
        global_edge[global_top].weight = weight;
        global_edge[global_top].next = global_head[u];
        global_head[u] = global_top++;
    }

    void init(String filePath) {
        try {
            String encoding = "UTF-8";
            File file = new File(filePath);
            if (file.isFile() && file.exists()) { // make sure the file exists
                InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);// honor the encoding
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTxt = null;
                lineTxt = bufferedReader.readLine();

                ////// preprocessing
                String cur2[] = lineTxt.split(" ");
                global_n = n = Integer.parseInt(cur2[0]);
                m = Integer.parseInt(cur2[1]);
                m *= 2;
                edge = new Edge[m];
                head = new int[n];
                for (int i = 0; i < n; i++)
                    head[i] = -1;
                top = 0;
                
                global_edge=new Edge[m];
                global_head = new int[n];
                for(int i=0;i<n;i++)
                    global_head[i]=-1;
                global_top=0;
                global_cluster = new int[n];
                for (int i = 0; i < global_n; i++)
                    global_cluster[i] = i;
                node_weight = new double[n];
                totalEdgeWeight = 0.0;
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    String cur[] = lineTxt.split(" ");
                    int u = Integer.parseInt(cur[0]);
                    int v = Integer.parseInt(cur[1]);
                    double curw;
                    if (cur.length > 2) {
                        curw = Double.parseDouble(cur[2]);
                    } else {
                        curw = 1.0;
                    }
                    addEdge(u, v, curw);
                    addEdge(v, u, curw);
                    addGlobalEdge(u,v,curw);
                    addGlobalEdge(v,u,curw);
                    totalEdgeWeight += 2 * curw;
                    node_weight[u] += curw;
                    if (u != v) {
                        node_weight[v] += curw;
                    }
                }
                resolution = 1 / totalEdgeWeight;
                read.close();
            } else {
                System.out.println("file not found");
            }
        } catch (Exception e) {
            System.out.println("error while reading the file");
            e.printStackTrace();
        }
    }

    void init_cluster() {
        cluster = new int[n];
        for (int i = 0; i < n; i++) { // one cluster per node
            cluster[i] = i;
        }
    }

    boolean try_move_i(int i) { // try to move node i into a neighboring cluster
        double[] edgeWeightPerCluster = new double[n];
        for (int j = head[i]; j != -1; j = edge[j].next) {
            int l = cluster[edge[j].v]; // l is the cluster of this neighbor
            edgeWeightPerCluster[l] += edge[j].weight;
        }
        int bestCluster = -1; // index of the best cluster (none yet; falls back to the current one below)
        double maxx_deltaQ = 0.0; // maximum gain seen so far
        boolean[] vis = new boolean[n];
        cluster_weight[cluster[i]] -= node_weight[i];
        for (int j = head[i]; j != -1; j = edge[j].next) {
            int l = cluster[edge[j].v]; // l is the cluster of this neighbor
            if (vis[l]) // evaluate each neighboring cluster only once
                continue;
            vis[l] = true;
            double cur_deltaQ = edgeWeightPerCluster[l];
            cur_deltaQ -= node_weight[i] * cluster_weight[l] * resolution;
            if (cur_deltaQ > maxx_deltaQ) {
                bestCluster = l;
                maxx_deltaQ = cur_deltaQ;
            }
            edgeWeightPerCluster[l] = 0;
        }
        if (maxx_deltaQ < eps) {
            bestCluster = cluster[i];
        }
         //System.out.println(maxx_deltaQ);
        cluster_weight[bestCluster] += node_weight[i];
        if (bestCluster != cluster[i]) { // i was actually moved
            cluster[i] = bestCluster;
            return true;
        }
        return false;
    }

    void rebuildGraph() { // rebuild the compressed graph
        /// first renumber the clusters consecutively
        int[] change = new int[n];
        int change_size=0;
        boolean vis[] = new boolean[n];
        for (int i = 0; i < n; i++) {
            if (vis[cluster[i]])
                continue;
            vis[cluster[i]] = true;
            change[change_size++]=cluster[i];
        }
        int[] index = new int[n]; // index[i] is the node id of cluster i in the new graph
        for (int i = 0; i < change_size; i++)
            index[change[i]] = i;
        int new_n = change_size; // size of the new graph
        new_edge = new Edge[m];
        new_head = new int[new_n];
        new_top = 0;
        double new_node_weight[] = new double[new_n]; // node weights of the new graph
        for(int i=0;i<new_n;i++)
            new_head[i]=-1;
        
        ArrayList<Integer>[] nodeInCluster = new ArrayList[new_n]; // the list of nodes in each cluster
        for (int i = 0; i < new_n; i++)
            nodeInCluster[i] = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            nodeInCluster[index[cluster[i]]].add(i);
        }
        for (int u = 0; u < new_n; u++) { // handle the nodes of one cluster together, so visindex can stay one-dimensional
            boolean visindex[] = new boolean[new_n]; // visindex[v] marks whether edge u->v has already been added to the new adjacency list
            double delta_w[] = new double[new_n]; // accumulated edge weight towards each new node
            for (int i = 0; i < nodeInCluster[u].size(); i++) {
                int t = nodeInCluster[u].get(i);
                for (int k = head[t]; k != -1; k = edge[k].next) {
                    int j = edge[k].v;
                    int v = index[cluster[j]];
                    if (u != v) {
                        if (!visindex[v]) {
                            addNewEdge(u, v, 0);
                            visindex[v] = true;
                        } 
                        delta_w[v] += edge[k].weight;
                    }
                }
                new_node_weight[u] += node_weight[t];
            }
            for (int k = new_head[u]; k != -1; k = new_edge[k].next) {
                int v = new_edge[k].v;
                new_edge[k].weight = delta_w[v];
            }
        }

        // update the answer
        int[] new_global_cluster = new int[global_n];
        for (int i = 0; i < global_n; i++) {
            new_global_cluster[i] = index[cluster[global_cluster[i]]];
        }
        for (int i = 0; i < global_n; i++) {
            global_cluster[i] = new_global_cluster[i];
        }
        top = new_top;
        for (int i = 0; i < m; i++) {
            edge[i] = new_edge[i];
        }
        for (int i = 0; i < new_n; i++) {
            node_weight[i] = new_node_weight[i];
            head[i] = new_head[i];
        }
        n = new_n;
        init_cluster();
    }

    void print() {
        for (int i = 0; i < global_n; i++) {
            System.out.println(i + ": " + global_cluster[i]);
        }
        System.out.println("-------");
    }

    void louvain() {
        init_cluster();
        int count = 0; // number of rebuild iterations
        boolean update_flag; // marks whether any node moved
        do { // outer loop; each pass ends by rebuilding the graph
        //    print(); // print the cluster list
            count++;
            cluster_weight = new double[n];
            for (int j = 0; j < n; j++) { // compute the cluster weights
                cluster_weight[cluster[j]] += node_weight[j];
            }
            int[] order = new int[n]; // generate a random visiting order
            for (int i = 0; i < n; i++)
                order[i] = i;
            Random random = new Random();
            for (int i = 0; i < n; i++) { // Fisher-Yates shuffle for an unbiased random order
                int j = i + random.nextInt(n - i);
                int temp = order[i];
                order[i] = order[j];
                order[j] = temp;
            }
            int enum_time = 0; // consecutive nodes tried without a move; reaching n means every node is stable
            int point = 0; // cyclic pointer into order
            update_flag = false; // whether any move happened in this pass
            do {
                int i = order[point];
                point = (point + 1) % n;
                if (try_move_i(i)) { // try to move node i
                    enum_time = 0;
                    update_flag = true;
                } else {
                    enum_time++;
                }
            } while (enum_time < n);
            if (count > iteration_time || !update_flag) // stop if no node moved or the iteration limit is exceeded
                break;
            rebuildGraph(); // rebuild the compressed graph
        } while (true);
    }
}





Main.java:

package myLouvain;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;

public class Main {
    public static void writeOutputJson(String fileName, Louvain a) throws IOException {
        BufferedWriter bufferedWriter;
        bufferedWriter = new BufferedWriter(new FileWriter(fileName));
        bufferedWriter.write("{\n\"nodes\": [\n");
        for(int i=0;i<a.global_n;i++){
            bufferedWriter.write("{\"id\": \""+i+"\", \"group\": "+a.global_cluster[i]+"}");
            if(i+1!=a.global_n)
            bufferedWriter.write(",");
            bufferedWriter.write("\n");
        }
        bufferedWriter.write("],\n\"links\": [\n");
        
        for(int i=0;i<a.global_n;i++){
            for (int j = a.global_head[i]; j != -1; j = a.global_edge[j].next) {
                int k=a.global_edge[j].v;
                bufferedWriter.write("{\"source\": \""+i+"\", \"target\": \""+k+"\", \"value\": 1}");   
                if(i+1!=a.global_n||a.global_edge[j].next!=-1)
                    bufferedWriter.write(",");
                bufferedWriter.write("\n");
            }
        }
        bufferedWriter.write("]\n}\n");
        bufferedWriter.close();
    }
    
    static void writeOutputCluster(String fileName, Louvain a) throws IOException{
        BufferedWriter bufferedWriter;
        bufferedWriter = new BufferedWriter(new FileWriter(fileName));
        for(int i=0;i<a.global_n;i++){
            bufferedWriter.write(Integer.toString(a.global_cluster[i]));
            bufferedWriter.write("\n");
        }
        bufferedWriter.close();
    }
    
    static void writeOutputCircle(String fileName, Louvain a) throws IOException{
        BufferedWriter bufferedWriter;
        bufferedWriter = new BufferedWriter(new FileWriter(fileName));
        ArrayList list[] = new ArrayList[a.global_n];
        for(int i=0;i<a.global_n;i++){
            list[i]=new ArrayList<Integer>();
        }
        for(int i=0;i<a.global_n;i++){
            list[a.global_cluster[i]].add(i);
        }
        for(int i=0;i<a.global_n;i++){
            if(list[i].size()==0)
                continue;
            for(int j=0;j<list[i].size();j++)
                bufferedWriter.write(list[i].get(j).toString()+" ");
            bufferedWriter.write("\n");
        }
        bufferedWriter.close();
    }     
    public static void main(String[] args) throws IOException {
        Louvain a = new Louvain();
        double beginTime = System.currentTimeMillis();
        a.init("/home/eason/Louvain/testdata.txt");
        a.louvain();
        double endTime = System.currentTimeMillis();
        writeOutputJson("/var/www/html/Louvain/miserables.json", a);  // output for d3 visualization
        writeOutputCluster("/home/eason/Louvain/cluster.txt",a);  // which cluster each node belongs to
        writeOutputCircle("/home/eason/Louvain/circle.txt",a);   // which nodes each cluster contains
        System.out.format("program running time: %f seconds%n", (endTime - beginTime) / 1000.0);
    }

}




Here is the d3 visualization of the partition the Louvain algorithm produces for the 16-node graph shown earlier:

Running my Louvain implementation on an undirected graph with about 4,000 nodes and 80,000 edges (Facebook data) takes a bit over one second, and the resulting partition looks reasonable. The visualization (there are a few too many edges, so they all stick together):


//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Below is the implementation by the well-known researchers abroad (pasted directly for convenience; time complexity O(E), space complexity O(E)):

ModularityOptimizer.java
import java.io.BufferedReader;  
import java.io.BufferedWriter;  
import java.io.FileReader;  
import java.io.FileWriter;  
import java.io.IOException;  
import java.util.Arrays;  
import java.util.Random;  
  
public class ModularityOptimizer  
{  
    public static void main(String[] args) throws IOException  
    {  
        boolean  update;  
         
        double modularity, maxModularity, resolution, resolution2;  
        int algorithm, i, j, modularityFunction, nClusters, nIterations, nRandomStarts;  
        int[] cluster;  
        double beginTime, endTime;  
        Network network;  
        Random random;  
        String inputFileName, outputFileName;  
  
          
        inputFileName = "/home/eason/Louvain/facebook_combined.txt";  
        outputFileName = "/home/eason/Louvain/answer.txt";  
        modularityFunction = 1;  
        resolution = 1.0;  
        algorithm = 1;  
        nRandomStarts = 10;  
        nIterations = 3;  
                
        System.out.println("Modularity Optimizer version 1.2.0 by Ludo Waltman and Nees Jan van Eck");  
        System.out.println();  
                              
        System.out.println("Reading input file...");  
        System.out.println();  
          
        network = readInputFile(inputFileName, modularityFunction);  
        
        System.out.format("Number of nodes: %d%n", network.getNNodes());  
        System.out.format("Number of edges: %d%n", network.getNEdges() / 2);  
        System.out.println();  
        System.out.println("Running " + ((algorithm == 1) ? "Louvain algorithm" : ((algorithm == 2) ? "Louvain algorithm with multilevel refinement" : "smart local moving algorithm")) + "...");  
        System.out.println();  
          
        resolution2 = ((modularityFunction == 1) ? (resolution / network.getTotalEdgeWeight()) : resolution);  
  
        beginTime = System.currentTimeMillis();  
        cluster = null;  
        nClusters = -1;  
        maxModularity = Double.NEGATIVE_INFINITY;  
        random = new Random(100);  
        for (i = 0; i < nRandomStarts; i++)  
        {  
            if (nRandomStarts > 1)  
                System.out.format("Random start: %d%n", i + 1);  
  
            network.initSingletonClusters();       // initialize the network: one cluster per node  
  
            j = 0;  
            update = true;  
            do  
            {  
                if (nIterations > 1)  
                    System.out.format("Iteration: %d%n", j + 1);  
  
                if (algorithm == 1)  
                    update = network.runLouvainAlgorithm(resolution2, random);  
           
                j++;  
  
                modularity = network.calcQualityFunction(resolution2);  
  
                if (nIterations > 1)  
                    System.out.format("Modularity: %.4f%n", modularity);  
            }  
            while ((j < nIterations) && update);  
            if (modularity > maxModularity)     {  
             
                cluster = network.getClusters();  
                nClusters = network.getNClusters();  
                maxModularity = modularity;  
            }  
  
            if (nRandomStarts > 1)  
            {  
                if (nIterations == 1)  
                    System.out.format("Modularity: %.8f%n", modularity);  
                System.out.println();  
            }  
        }  
        endTime = System.currentTimeMillis();  
  
          
        if (nRandomStarts == 1)  
        {  
            if (nIterations > 1)  
                System.out.println();  
            System.out.format("Modularity: %.8f%n", maxModularity);  
        }  
        else  
            System.out.format("Maximum modularity in %d random starts: %f%n", nRandomStarts, maxModularity);  
        System.out.format("Number of communities: %d%n", nClusters);  
        System.out.format("Elapsed time: %.4f seconds%n", (endTime - beginTime) / 1000.0);  
        System.out.println();  
        System.out.println("Writing output file...");  
        System.out.println();  
         
  
        writeOutputFile(outputFileName, cluster);  
    }  
  
    private static Network readInputFile(String fileName, int modularityFunction) throws IOException  
    {  
        BufferedReader bufferedReader;  
        double[] edgeWeight1, edgeWeight2, nodeWeight;  
        int i, j, nEdges, nLines, nNodes;  
        int[] firstNeighborIndex, neighbor, nNeighbors, node1, node2;  
        Network network;  
        String[] splittedLine;  
  
        bufferedReader = new BufferedReader(new FileReader(fileName));  
  
        nLines = 0;  
        while (bufferedReader.readLine() != null)  
            nLines++;  
  
        bufferedReader.close();  
  
        bufferedReader = new BufferedReader(new FileReader(fileName));  
  
        node1 = new int[nLines];  
        node2 = new int[nLines];  
        edgeWeight1 = new double[nLines];  
        i = -1;  
        bufferedReader.readLine() ;
        for (j = 1; j < nLines; j++)  
        {  
            splittedLine = bufferedReader.readLine().split(" ");  
            node1[j] = Integer.parseInt(splittedLine[0]);  
            if (node1[j] > i)  
                i = node1[j];  
            node2[j] = Integer.parseInt(splittedLine[1]);  
            if (node2[j] > i)  
                i = node2[j];  
            edgeWeight1[j] = (splittedLine.length > 2) ? Double.parseDouble(splittedLine[2]) : 1;  
        }  
        nNodes = i + 1;  
  
        bufferedReader.close();  
  
        nNeighbors = new int[nNodes];  
        for (i = 0; i < nLines; i++)  
            if (node1[i] < node2[i])  
            {  
                nNeighbors[node1[i]]++;  
                nNeighbors[node2[i]]++;  
            }  
  
        firstNeighborIndex = new int[nNodes + 1];  
        nEdges = 0;  
        for (i = 0; i < nNodes; i++)  
        {  
            firstNeighborIndex[i] = nEdges;  
            nEdges += nNeighbors[i];  
        }  
        firstNeighborIndex[nNodes] = nEdges;  
  
        neighbor = new int[nEdges];  
        edgeWeight2 = new double[nEdges];  
        Arrays.fill(nNeighbors, 0);  
        for (i = 0; i < nLines; i++)  
            if (node1[i] < node2[i])  
            {  
                j = firstNeighborIndex[node1[i]] + nNeighbors[node1[i]];  
                neighbor[j] = node2[i];  
                edgeWeight2[j] = edgeWeight1[i];  
                nNeighbors[node1[i]]++;  
                j = firstNeighborIndex[node2[i]] + nNeighbors[node2[i]];  
                neighbor[j] = node1[i];  
                edgeWeight2[j] = edgeWeight1[i];  
                nNeighbors[node2[i]]++;  
            }  
  
     
        {  
            nodeWeight = new double[nNodes];  
            for (i = 0; i < nEdges; i++)  
                nodeWeight[neighbor[i]] += edgeWeight2[i];  
            network = new Network(nNodes, firstNeighborIndex, neighbor, edgeWeight2, nodeWeight);  
        }  
        
  
        return network;  
    }  
  
    private static void writeOutputFile(String fileName, int[] cluster) throws IOException  
    {  
        BufferedWriter bufferedWriter;  
        int i;  
  
        bufferedWriter = new BufferedWriter(new FileWriter(fileName));  
  
        for (i = 0; i < cluster.length; i++)  
        {  
            bufferedWriter.write(Integer.toString(cluster[i]));  
            bufferedWriter.newLine();  
        }  
  
        bufferedWriter.close();  
    }  
}  

Network.java

import java.io.Serializable;
import java.util.Random;

public class Network implements Cloneable, Serializable {
	private static final long serialVersionUID = 1;

	private int nNodes;
	private int[] firstNeighborIndex;
	
	private int[] neighbor;
	private double[] edgeWeight;

	private double[] nodeWeight;
	private int nClusters;
	private int[] cluster;

	private double[] clusterWeight;
	private int[] nNodesPerCluster;
	private int[][] nodePerCluster;
	private boolean clusteringStatsAvailable;

	public Network(int nNodes, int[] firstNeighborIndex, int[] neighbor, double[] edgeWeight, double[] nodeWeight) {
		this(nNodes, firstNeighborIndex, neighbor, edgeWeight, nodeWeight, null);
	}

	public Network(int nNodes, int[] firstNeighborIndex, int[] neighbor, double[] edgeWeight, double[] nodeWeight,
			int[] cluster) {
		int i, nEdges;
		this.nNodes = nNodes;
		this.firstNeighborIndex = firstNeighborIndex;
		this.neighbor = neighbor;
		if (edgeWeight == null) {
			nEdges = neighbor.length;
			this.edgeWeight = new double[nEdges];
			for (i = 0; i < nEdges; i++)
				this.edgeWeight[i] = 1;
		} else
			this.edgeWeight = edgeWeight;

		if (nodeWeight == null) {
			this.nodeWeight = new double[nNodes];
			for (i = 0; i < nNodes; i++)
				this.nodeWeight[i] = 1;
		} else
			this.nodeWeight = nodeWeight;
	}

	public int getNNodes() {
		return nNodes;
	}

	public int getNEdges() {
		return neighbor.length;
	}

	public double getTotalEdgeWeight() // total edge weight
	{
		double totalEdgeWeight;
		int i;
		totalEdgeWeight = 0;
		for (i = 0; i < neighbor.length; i++)
			totalEdgeWeight += edgeWeight[i];
		return totalEdgeWeight;
	}

	public double[] getEdgeWeights() {
		return edgeWeight;
	}

	public double[] getNodeWeights() {
		return nodeWeight;
	}

	public int getNClusters() {
		return nClusters;
	}

	public int[] getClusters() {
		return cluster;
	}

	public void initSingletonClusters() {
		int i;
		nClusters = nNodes;
		cluster = new int[nNodes];
		for (i = 0; i < nNodes; i++)
			cluster[i] = i;

		deleteClusteringStats();
	}

	public void mergeClusters(int[] newCluster) {
		int i, j, k;
		if (cluster == null)
			return;
		i = 0;
		for (j = 0; j < nNodes; j++) {
			k = newCluster[cluster[j]];
			if (k > i)
				i = k;
			cluster[j] = k;
		}
		nClusters = i + 1;
		deleteClusteringStats();
	}

	public Network getReducedNetwork() {
		double[] reducedNetworkEdgeWeight1, reducedNetworkEdgeWeight2;
		int i, j, k, l, m, reducedNetworkNEdges1, reducedNetworkNEdges2;
		int[] reducedNetworkNeighbor1, reducedNetworkNeighbor2;
		Network reducedNetwork;
		if (cluster == null)
			return null;
		if (!clusteringStatsAvailable)
			calcClusteringStats();

		reducedNetwork = new Network();
		reducedNetwork.nNodes = nClusters;
		reducedNetwork.firstNeighborIndex = new int[nClusters + 1];
		reducedNetwork.nodeWeight = new double[nClusters];
		
		reducedNetworkNeighbor1 = new int[neighbor.length];
		reducedNetworkEdgeWeight1 = new double[edgeWeight.length];
		
		reducedNetworkNeighbor2 = new int[nClusters - 1];
		reducedNetworkEdgeWeight2 = new double[nClusters];
		reducedNetworkNEdges1 = 0;
		
		for (i = 0; i < nClusters; i++) {
			reducedNetworkNEdges2 = 0;
			for (j = 0; j < nodePerCluster[i].length; j++) {
				k = nodePerCluster[i][j]; // k is the id of the j-th node of cluster i
				for (l = firstNeighborIndex[k]; l < firstNeighborIndex[k + 1]; l++) {
					m = cluster[neighbor[l]]; // m is the cluster of the neighbor of k at position l
					if (m != i) {
						if (reducedNetworkEdgeWeight2[m] == 0) {
							reducedNetworkNeighbor2[reducedNetworkNEdges2] = m;
							reducedNetworkNEdges2++;
						}
						reducedNetworkEdgeWeight2[m] += edgeWeight[l];
					}
				}
				reducedNetwork.nodeWeight[i] += nodeWeight[k];
			}
			for (j = 0; j < reducedNetworkNEdges2; j++) {
				reducedNetworkNeighbor1[reducedNetworkNEdges1 + j] = reducedNetworkNeighbor2[j];
				reducedNetworkEdgeWeight1[reducedNetworkNEdges1
						+ j] = reducedNetworkEdgeWeight2[reducedNetworkNeighbor2[j]];
				reducedNetworkEdgeWeight2[reducedNetworkNeighbor2[j]] = 0;
			}
			reducedNetworkNEdges1 += reducedNetworkNEdges2;
			reducedNetwork.firstNeighborIndex[i + 1] = reducedNetworkNEdges1;
		}
		reducedNetwork.neighbor = new int[reducedNetworkNEdges1];
		reducedNetwork.edgeWeight = new double[reducedNetworkNEdges1];
		System.arraycopy(reducedNetworkNeighbor1, 0, reducedNetwork.neighbor, 0, reducedNetworkNEdges1);
		System.arraycopy(reducedNetworkEdgeWeight1, 0, reducedNetwork.edgeWeight, 0, reducedNetworkNEdges1);
		return reducedNetwork;
	}
	
	public double calcQualityFunction(double resolution) {
		double qualityFunction, totalEdgeWeight;
		int i, j, k;
		if (cluster == null)
			return Double.NaN;
		if (!clusteringStatsAvailable)
			calcClusteringStats();
		qualityFunction = 0;
		totalEdgeWeight = 0;
		for (i = 0; i < nNodes; i++) {
			j = cluster[i];
			for (k = firstNeighborIndex[i]; k < firstNeighborIndex[i + 1]; k++) {
				if (cluster[neighbor[k]] == j)
					qualityFunction += edgeWeight[k];
				totalEdgeWeight += edgeWeight[k];
			}
		}
		for (i = 0; i < nClusters; i++)
			qualityFunction -= clusterWeight[i] * clusterWeight[i] * resolution;
		qualityFunction /= totalEdgeWeight;
		return qualityFunction;
	}
	
	public boolean runLocalMovingAlgorithm(double resolution) {
		return runLocalMovingAlgorithm(resolution, new Random());
	}
	public boolean runLocalMovingAlgorithm(double resolution, Random random) {
		boolean update;
		double maxQualityFunction, qualityFunction;
		double[] clusterWeight, edgeWeightPerCluster;
		int bestCluster, i, j, k, l, nNeighboringClusters, nStableNodes, nUnusedClusters;
		int[] neighboringCluster, newCluster, nNodesPerCluster, nodeOrder, unusedCluster;
		if ((cluster == null) || (nNodes == 1))
			return false;
		update = false;
		clusterWeight = new double[nNodes];
		nNodesPerCluster = new int[nNodes];
		for (i = 0; i < nNodes; i++) {
			clusterWeight[cluster[i]] += nodeWeight[i];
			nNodesPerCluster[cluster[i]]++;
		}
		nUnusedClusters = 0;
		unusedCluster = new int[nNodes];
		for (i = 0; i < nNodes; i++)
			if (nNodesPerCluster[i] == 0) {
				unusedCluster[nUnusedClusters] = i;
				nUnusedClusters++;
			}
		nodeOrder = new int[nNodes];
		for (i = 0; i < nNodes; i++)
			nodeOrder[i] = i;
		for (i = 0; i < nNodes; i++) {
			j = random.nextInt(nNodes);
			k = nodeOrder[i];
			nodeOrder[i] = nodeOrder[j];
			nodeOrder[j] = k;
		}
		edgeWeightPerCluster = new double[nNodes];
		neighboringCluster = new int[nNodes - 1];
		nStableNodes = 0;
		i = 0;
		do {
			j = nodeOrder[i];
			nNeighboringClusters = 0;
			for (k = firstNeighborIndex[j]; k < firstNeighborIndex[j + 1]; k++) {
				l = cluster[neighbor[k]];
				if (edgeWeightPerCluster[l] == 0) {
					neighboringCluster[nNeighboringClusters] = l;
					nNeighboringClusters++;
				}
				edgeWeightPerCluster[l] += edgeWeight[k];
			}
			clusterWeight[cluster[j]] -= nodeWeight[j];
			nNodesPerCluster[cluster[j]]--;
			if (nNodesPerCluster[cluster[j]] == 0) {
				unusedCluster[nUnusedClusters] = cluster[j];
				nUnusedClusters++;
			}
			bestCluster = -1;
			maxQualityFunction = 0;
			for (k = 0; k < nNeighboringClusters; k++) {
				l = neighboringCluster[k];
				qualityFunction = edgeWeightPerCluster[l] - nodeWeight[j] * clusterWeight[l] * resolution;
				if ((qualityFunction > maxQualityFunction)
						|| ((qualityFunction == maxQualityFunction) && (l < bestCluster))) {
					bestCluster = l;
					maxQualityFunction = qualityFunction;
				}
				edgeWeightPerCluster[l] = 0;
			}
			if (maxQualityFunction == 0) {
				bestCluster = unusedCluster[nUnusedClusters - 1];
				nUnusedClusters--;
			}
			clusterWeight[bestCluster] += nodeWeight[j];
			nNodesPerCluster[bestCluster]++;
			if (bestCluster == cluster[j])
				nStableNodes++;
			else {
				cluster[j] = bestCluster;
				nStableNodes = 1;
				update = true;
			}
			i = (i < nNodes - 1) ? (i + 1) : 0;
		} while (nStableNodes < nNodes); // the local moving phase ends only when every node is stable
		newCluster = new int[nNodes];
		nClusters = 0;
		for (i = 0; i < nNodes; i++)
			if (nNodesPerCluster[i] > 0) {
				newCluster[i] = nClusters;
				nClusters++;
			}
		for (i = 0; i < nNodes; i++)
			cluster[i] = newCluster[cluster[i]];
		deleteClusteringStats();
		return update;
	}
	
	public boolean runLouvainAlgorithm(double resolution) {
		return runLouvainAlgorithm(resolution, new Random());
	}
	
	public boolean runLouvainAlgorithm(double resolution, Random random) {
		boolean update, update2;
		Network reducedNetwork;
		if ((cluster == null) || (nNodes == 1))
			return false;
		update = runLocalMovingAlgorithm(resolution, random);
		if (nClusters < nNodes) {
			reducedNetwork = getReducedNetwork();
			reducedNetwork.initSingletonClusters();
			update2 = reducedNetwork.runLouvainAlgorithm(resolution, random);
			if (update2) {
				update = true;
				mergeClusters(reducedNetwork.getClusters());
			}
		}
		deleteClusteringStats();
		return update;
	}
	
	private Network() {
	}
	
	private void calcClusteringStats() {
		int i, j;
		clusterWeight = new double[nClusters];
		nNodesPerCluster = new int[nClusters];
		nodePerCluster = new int[nClusters][];
		for (i = 0; i < nNodes; i++) {
			clusterWeight[cluster[i]] += nodeWeight[i];
			nNodesPerCluster[cluster[i]]++;
		}
		for (i = 0; i < nClusters; i++) {
			nodePerCluster[i] = new int[nNodesPerCluster[i]];
			nNodesPerCluster[i] = 0;
		}
		for (i = 0; i < nNodes; i++) {
			j = cluster[i];
			nodePerCluster[j][nNodesPerCluster[j]] = i;
			nNodesPerCluster[j]++;
		}
		clusteringStatsAvailable = true;
	}
	
	private void deleteClusteringStats() {
		clusterWeight = null;
		nNodesPerCluster = null;
		nodePerCluster = null;
		clusteringStatsAvailable = false;
	}
}


Finally, here is a link to the test data, which contains Facebook, Amazon and other social-network datasets you can use for debugging:

http://snap.stanford.edu/data/#socnets

If anything here is wrong, feel free to point it out so we can learn from each other!