學習Spark GraphX
阿新 • • 發佈:2018-12-20
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val userGraph: Graph[(String, String), String]
Name: Compile Error Message: <console>:30: error: class $iw needs to be abstract, since value userGraph is not defined class $iw extends Serializable { ^ StackTrace:
val users: RDD[(VertexId, (String, String))] =
sc.parallelize(Array((3L, ("rxin", "student")),
(7L, ("jgonzal", "postdoc")),
(5L, ("franklin", "prof")),
(2L, ("istoica", "prof"))))
users = ParallelCollectionRDD[0] at parallelize at <console>:35 ParallelCollectionRDD[0] at parallelize at <console>:35
val relationships: RDD[Edge[String]] = sc.parallelize(Array(
Edge(3L, 7L, "collab"),
Edge(5L, 3L, "advisor"),
Edge(2L, 5L, "colleague"),
Edge(5L, 7L, "pi")
))
relationships = ParallelCollectionRDD[1] at parallelize at <console>:34 ParallelCollectionRDD[1] at parallelize at <console>:34
val defaultUser = ("John Doe", "Missing")
val graph = Graph(users, relationships, defaultUser)
defaultUser = (John Doe,Missing)
graph = [email protected]
[email protected]
graph.vertices.filter {case (id, (name, pos)) => pos == "postdoc"}.count
1
graph.vertices.filter {case (id, (name, pos)) => pos == "prof"}.count
2
graph.edges.filter(e => e.srcId < e.dstId).count
3
Graph 操作
詳見 https://spark.apache.org/docs/latest/graphx-programming-guide.html 的Graph類
1. 圖資訊
//邊數
graph.numEdges
4
//頂點數
graph.numVertices
4
//計算入度
graph.inDegrees.reduceByKey(_ + _).take(5)
Array((3,1), (5,1), (7,2))
//計算出度
graph.outDegrees.reduceByKey(_ + _).take(5)
Array((2,1), (3,1), (5,2))
//計算度
graph.degrees.reduceByKey(_ + _).collect()
Array((2,1), (3,2), (5,3), (7,2))
2.圖檢視
//頂點
graph.vertices.filter {case (id, (name, pos)) => pos == "postdoc"}.count
1
//邊
graph.edges.filter(e => e.srcId < e.dstId).count
3
//返回三元組檢視
graph.triplets.collect()
Array(((3,(rxin,student)),(7,(jgonzal,postdoc)),collab), ((5,(franklin,prof)),(3,(rxin,student)),advisor), ((2,(istoica,prof)),(5,(franklin,prof)),colleague), ((5,(franklin,prof)),(7,(jgonzal,postdoc)),pi))
3.圖快取
- persist
- cache
- unpersistVertices
4.分割槽
- partitionBy
5.頂點與邊的轉換
- mapVertices
- mapEdges
- mapTriplets
6.修改圖結構
- reverse
- subgraph
- mask
- groupEdges
7.用圖連線RDD
- joinVertices
- outerJoinVertices
8.彙集鄰近的三元組資訊
- collectNeighborIds
- collectNeighbors
- aggregateMessages
9.互動並行圖計算
- pregel
10.基本圖演算法
- pageRank
- connectedComponents
- triangleCount
- stronglyConnectedComponents