Deploying the graphframes package on a Linux server
1. Install Anaconda3.
2. Download the graphframes package from the official page (https://spark-packages.org/package/graphframes/graphframes) as a zip archive and upload it to the server.
3. On the server, unzip the archive from step 2: unzip xx.zip. Copy the python/graphframes folder into anaconda3/lib/pythonX.Y/site-packages/ (use the pythonX.Y directory that matches your Anaconda install).
4. Install pyspark: conda install pyspark.
5. Installation is complete. Example code:
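Before submitting a job, it can save a round trip to confirm that the interpreter can actually locate both packages from steps 3 and 4. A minimal check using only the standard library (run it with the Anaconda python; the helper name is illustrative):

```python
import importlib.util

def is_importable(name):
    """Return True if `name` can be found on the current Python path."""
    return importlib.util.find_spec(name) is not None

# Check the two packages installed above.
for pkg in ("pyspark", "graphframes"):
    status = "ok" if is_importable(pkg) else "MISSING"
    print(pkg, status)
```

If graphframes shows MISSING, the folder from step 3 was likely copied into the wrong site-packages directory.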
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
import graphframes

CONF = SparkConf().setAppName("My app")
SC = SparkContext(conf=CONF)
sqlContext = SQLContext(SC)

# Create a vertex DataFrame with a mandatory "id" column.
v = sqlContext.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an edge DataFrame with "src" and "dst" columns.
e = sqlContext.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])

g = graphframes.GraphFrame(v, e)

# Query: get the in-degree of each vertex.
g.inDegrees.show()

# Query: count the number of "follow" edges in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run the PageRank algorithm and show the results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
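To make the example's output less opaque, the two graph queries and PageRank can be mirrored in plain Python on the same toy graph. This is only an illustration of what inDegrees and pageRank compute, not the GraphFrames implementation; GraphFrames scales its PageRank values differently (they do not sum to 1), so only the relative ordering should match. Note also that with a reset probability as low as 0.01, power iteration converges slowly, so the sketch uses far more iterations than the maxIter=20 above:

```python
from collections import Counter, defaultdict

edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]
vertices = ["a", "b", "c"]

# In-degree: how many edges point at each vertex.
in_degrees = Counter(dst for _, dst, _ in edges)
print(dict(in_degrees))  # "b" is pointed at by both "a" and "c"

# Count the "follow" edges.
follows = sum(1 for _, _, rel in edges if rel == "follow")
print(follows)

# Power-iteration PageRank with reset probability 0.01,
# normalized so the ranks sum to 1.
reset, n = 0.01, len(vertices)
out_deg = Counter(src for src, _, _ in edges)
ranks = {v: 1.0 / n for v in vertices}
for _ in range(1000):
    # Every vertex gets the teleport share, plus mass from its in-edges.
    new = defaultdict(lambda: reset / n)
    for src, dst, _ in edges:
        new[dst] += (1 - reset) * ranks[src] / out_deg[src]
    ranks = {v: new[v] for v in vertices}
print(ranks)
```

The most-followed vertex "b" ends up with the highest rank, "a" (no in-edges) with the lowest, matching what the GraphFrames queries report on the cluster.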
6. Run the job with spark-submit:
spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--executor-memory 1G \
--archives hdfs:///tmp/buming/tools/anaconda3_bm_v2.tar#anaconda3 \
--jars hdfs:///tmp/cangyuan/package/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.slf4j_slf4j-api-1.7.7.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.scala-lang_scala-reflect-2.11.0.jar \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda3/anaconda3/bin/python \
graph_frame.py
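One detail in the command above is easy to miss: YARN unpacks the archive passed to --archives into a working-directory folder named after the alias following #, and since the tarball itself contains a top-level anaconda3/ directory, the interpreter ends up at the doubled path ./anaconda3/anaconda3/bin/python set in PYSPARK_PYTHON. A local sketch of that layout (all paths here are placeholders, not the real environment):

```shell
# Build a stand-in for the archive: a top-level anaconda3/ dir with a python binary.
mkdir -p demo/anaconda3/bin
touch demo/anaconda3/bin/python
tar -cf demo/anaconda3_bm_v2.tar -C demo anaconda3

# YARN extracts the archive into a directory named after the '#anaconda3' alias...
mkdir -p demo/container/anaconda3
tar -xf demo/anaconda3_bm_v2.tar -C demo/container/anaconda3

# ...so the interpreter sits at alias dir + archive's own top dir:
ls demo/container/anaconda3/anaconda3/bin/python
```

If the tarball were created from inside the anaconda3 directory instead (no top-level folder), PYSPARK_PYTHON would be ./anaconda3/bin/python.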