Spark GraphX -- Connected Components
Connected Components
1. What is a connected component
A connected component is a subgraph in which any two vertices are connected to each other by an edge or by a sequence of edges. Its vertex set is a subset of the original graph's vertex set, and its edge set is a subset of the original graph's edge set.
2. The API for computing connected components
class Graph[VD, ED] {
  def connectedComponents(): Graph[VertexID, ED]
}
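connectedComponents() labels every vertex with the smallest VertexId reachable from it, i.e. its component id. A minimal sketch of how those labels can be grouped into component member lists (assuming an already constructed graph: Graph[VD, ED]; the val names here are illustrative only):

// `graph` is assumed to be an already constructed Graph[VD, ED].
// The vertex attribute of the result is the component label.
val cc = graph.connectedComponents()

// Group vertex ids by component label: componentId -> member vertex ids.
val members = cc.vertices                        // RDD[(VertexId, VertexId)]
  .map { case (vid, compId) => (compId, vid) }
  .groupByKey()
  .collect()
  .toMap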
3. Example
In the result below, the one vertex that has no edge to any other vertex does not appear in the triplet output.
package cn.kgc.spark.graphx

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, PartitionID, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object Demo10_ConnectCompents {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession
    val spark: SparkSession = SparkSession.builder()
      .appName(this.getClass.getName)
      .master("local[4]")
      .getOrCreate()

    // Create the SparkContext
    val sc: SparkContext = spark.sparkContext

    // Vertices: (id, (name, age))
    val users: RDD[(VertexId, (String, PartitionID))] = sc.parallelize(Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50)),
      (7L, ("zhsang", 41))
    ))

    // Edges: (src, dst, call count)
    val cntCall: RDD[Edge[PartitionID]] = sc.parallelize(Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    ))

    val graph: Graph[(String, PartitionID), PartitionID] = Graph(users, cntCall)

    // Compute the connected components: each vertex attribute is replaced by
    // the smallest vertex id in its component. Vertex 7 has no edges, so it
    // forms its own singleton component and does not show up in the
    // edge-based triplet output below.
    graph.connectedComponents().triplets.foreach(println)
  }
}
// Output
((3,1),(2,1),4)
((5,1),(3,1),8)
((2,1),(1,1),7)
((2,1),(4,1),2)
((4,1),(1,1),1)
((5,1),(6,1),3)
((3,1),(6,1),3)
((5,1),(2,1),2)
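Note that vertex 7 is still present in the component graph; it simply never appears in the triplet view because it has no edges. A sketch that joins the component labels back to the user names (reusing `graph` and `users` from the example above):

// Join each vertex's component label back to its user record (sketch).
graph.connectedComponents()
  .vertices                                   // (VertexId, componentId)
  .join(users)                                // (VertexId, (componentId, (name, age)))
  .map { case (_, (compId, (name, _))) => (compId, name) }
  .collect()
  .foreach(println)

// Vertices 1 through 6 all carry component id 1 (the smallest id in their
// component), while vertex 7 forms its own singleton component labelled 7.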