Deploying the graphframes Package on a Linux Server

1. Install Anaconda3.

2. Download the graphframes package from the official page https://spark-packages.org/package/graphframes/graphframes, choosing the zip format, and upload it to the server.

3. On the server, unzip the archive from step 2: unzip xx.zip. Then copy the python/graphframes folder into anaconda3/lib/python/site-packages/.
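The unzip-and-copy step might look like the following. The archive name and paths are illustrative assumptions; adjust them (in particular the versioned python directory under anaconda3/lib) to match your environment:

```shell
# Unzip the downloaded release (archive name is an example)
unzip graphframes.zip

# Copy the Python bindings into Anaconda's site-packages
# (replace python3.6 with whatever version directory your Anaconda uses)
cp -r graphframes/python/graphframes ~/anaconda3/lib/python3.6/site-packages/

# Quick sanity check: the import should succeed without error
python -c "import graphframes"
```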

4. Install pyspark: conda install pyspark.

5. Installation is complete. Example code:

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SQLContext
import graphframes

CONF = SparkConf().setAppName("My app")
SC = SparkContext(conf=CONF)
sqlContext = SQLContext(SC)
# Create a Vertex DataFrame with a unique "id" column
v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])
g = graphframes.GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
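For intuition, the PageRank call above can be approximated on the same three-vertex graph with a plain-Python power iteration. This is only a sketch of the idea, not GraphFrames' actual implementation; it uses a typical reset probability of 0.15 (the example above uses 0.01), ignores dangling-node handling, and normalizes ranks to sum to 1, which may differ from the scaling GraphFrames reports:

```python
# Power-iteration PageRank sketch on the example graph a->b, b->c, c->b.
# Illustrative only; GraphFrames computes this distributed over Spark.
edges = [("a", "b"), ("b", "c"), ("c", "b")]
vertices = ["a", "b", "c"]
reset = 0.15  # reset (teleport) probability; the article's example uses 0.01

# Start from a uniform distribution; every vertex here has out-degree 1
rank = {v: 1.0 / len(vertices) for v in vertices}
out_deg = {v: sum(1 for s, _ in edges if s == v) for v in vertices}

for _ in range(20):
    # Each vertex keeps a share of the reset mass...
    new = {v: reset / len(vertices) for v in vertices}
    # ...and receives the damped rank of its in-neighbours
    for src, dst in edges:
        new[dst] += (1 - reset) * rank[src] / out_deg[src]
    rank = new

print(rank)  # "b" ends up highest: it is the only vertex with two in-links
```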

6. Submit the job with spark-submit:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 1 \
    --executor-memory 1G \
    --archives hdfs:///tmp/buming/tools/anaconda3_bm_v2.tar#anaconda3 \
    --jars hdfs:///tmp/cangyuan/package/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-api_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.slf4j_slf4j-api-1.7.7.jar,hdfs:///tmp/cangyuan/package/com.typesafe.scala-logging_scala-logging-slf4j_2.11-2.1.2.jar,hdfs:///tmp/cangyuan/package/org.scala-lang_scala-reflect-2.11.0.jar \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda3/anaconda3/bin/python \
    graph_frame.py
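As an alternative to listing every dependency jar by hand, Spark can resolve GraphFrames and its transitive dependencies from the spark-packages repository via --packages. This is a sketch for local testing; it assumes the machine has network access, and the coordinate below matches the 0.5.0 jar version used above:

```shell
# Local test run; Spark downloads graphframes and its dependencies itself
spark-submit \
    --master "local[2]" \
    --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
    graph_frame.py
```

On YARN with a cluster-wide Anaconda archive, the explicit --jars form above avoids any download at submit time, which is why it is used here.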