Writing a Spark program in Java: a simple example and how to run it
I recently needed to look into Spark for work. Since I am not yet fluent in Scala, I started by learning how to write Spark programs with the Java API. Below is the code of my simple test program; the usage of most of the functions is explained in the comments.
My environment: hadoop 2.2.0
spark-0.9.0
scala-2.10.3
jdk1.7
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public final class mysparktest {

    public static void main(String[] args) throws Exception {

        // The context, used to read files -- the Java counterpart of Scala's sc.
        // Constructor signature:
        // JavaSparkContext(master: String, appName: String, sparkHome: String,
        //                  jars: Array[String], environment: Map[String, String])
        JavaSparkContext ctx = new JavaSparkContext("yarn-standalone", "JavaWordCount",
                System.getenv("SPARK_HOME"), JavaSparkContext.jarOfClass(mysparktest.class));

        // The context can also be used to read environment information, e.g.:
        System.out.println("spark home:" + ctx.getSparkHome());

        // Read the file one line at a time, as Strings. There are also hadoopFile,
        // sequenceFile and friends; a plain text file can be read with sc.textFile("path").
        JavaRDD<String> lines = ctx.textFile(args[1], 1); // (String path, int minSplits)
        lines.cache(); // cache the RDD; useful for RDDs that will be reused, since it can cut run time

        // collect() turns an RDD into an ordinary Java collection:
        List<String> line = lines.collect();
        for (String val : line)
            System.out.println(val);

        // Other commonly used RDD methods:
        // lines.collect();      List<String>
        // lines.union(...);     JavaRDD<String>
        // lines.top(1);         List<String>
        // lines.count();        long
        // lines.countByValue();

        /**
         * filter test
         * Define a function returning Boolean; when Spark runs filter it drops every
         * element for which the function returns false. The parameter s stands for
         * one element of lines (lines can be thought of as a sequence of Strings).
         */
        JavaRDD<String> contaninsE = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) throws Exception {
                return (s.contains("they"));
            }
        });
        System.out.println("--------------next filter's result------------------");
        line = contaninsE.collect();
        for (String val : line)
            System.out.println(val);

        /**
         * sample test
         * sample() is easy to use; it draws a random sample from the data.
         * Parameters: withReplacement: Boolean, fraction: Double, seed: Int
         */
        JavaRDD<String> sampletest = lines.sample(false, 0.1, 5);
        System.out.println("-------------next sample-------------------");
        line = sampletest.collect();
        for (String val : line)
            System.out.println(val);

        /**
         * In new FlatMapFunction<String, String>, the two Strings are the input and
         * output types. The overridden call method implements the transformation
         * and must return an Iterable.
         *
         * flatMap is one of the most commonly used Spark functions; simply put, it
         * turns one RDD element into any number of elements using the function you
         * define. For example, each element of lines is currently one line of text;
         * to split it into individual words, we can write:
         */
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) {
                String[] words = s.split(" ");
                return Arrays.asList(words);
            }
        });

        /**
         * map to key-value pairs, similar to the map phase of MapReduce.
         * PairFunction<T, K, V>: T is the input type; K, V are the output key and value.
         * Override call to implement the conversion.
         */
        JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });
        // Function2: a two-argument function that takes arguments
        // of type T1 and T2 and returns an R.

        /**
         * reduceByKey, similar to the reduce phase of MapReduce.
         * The input (ones below) must be in key-value form; values sharing the same
         * key are aggregated by combining them pairwise.
         */
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                // how two values with the same key are combined in the reduce stage
                return i1 + i2;
            }
        });
        // Note: Spark also has a plain reduce method; it accepts any RDD (no
        // key-value pairs required) and combines all input elements pairwise.

        /**
         * sortByKey: as the name suggests, sort the pairs by key.
         */
        JavaPairRDD<String, Integer> sort = counts.sortByKey();
        System.out.println("----------next sort----------------------");

        /**
         * collect has already appeared several times; it converts Spark's RDD type
         * into the ordinary Java types we know.
         */
        List<Tuple2<String, Integer>> output = sort.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1 + ": " + tuple._2());
        }

        /**
         * Save the results; Spark provides many interfaces for writing output:
         */
        sort.saveAsTextFile("/tmp/spark-tmp/test");
        // sort.saveAsNewAPIHadoopFile(...);
        // sort.saveAsHadoopFile(...);

        System.exit(0);
    }
}
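To see what the flatMap → pair → reduceByKey → sortByKey pipeline computes without needing a cluster, here is a plain-Java sketch of the same word count using java.util.stream. The class and method names below are mine for illustration, not part of the Spark program above:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Cluster-free sketch of the same word count: split lines into words
// (flatMap), pair each word with 1 and sum per key (reduceByKey), then
// keep keys sorted (sortByKey). Names here are illustrative.
public class LocalWordCount {

    public static Map<String, Integer> countWords(String[] lines) {
        return Arrays.stream(lines)
                // flatMap: one line -> many words, split on single spaces,
                // matching s.split(" ") in the Spark version
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // groupingBy + summingInt plays the role of the (word, 1)
                // pairing plus reduceByKey; TreeMap keeps keys sorted
                .collect(Collectors.groupingBy(w -> w, TreeMap::new,
                        Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        String[] lines = { "they play and they romp", "play and play" };
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
        // prints: and: 2, play: 3, romp: 1, they: 2 (one per line, sorted by key)
    }
}
```

Like the Spark version, this splits only on single spaces, which is why the real output below contains an empty-string "word" with a large count wherever the input has leading or doubled spaces.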
Once the code is written, package it and upload it to the Linux machine, then write a launch script for the Spark job:
#!/bin/bash
export YARN_CONF_DIR=/usr/lib/cloud/hadoop/hadoop-2.2.0/etc/hadoop
export SPARK_JAR=/usr/lib/cloud/spark/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar

/usr/lib/cloud/spark/spark-0.9.0-incubating-bin-hadoop2/bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar mysparktest.jar \
  --class mysparktest \
  --args yarn-standalone \
  --args /user/zhangdeyang/testspark \
  --num-workers 3 \
  --master-memory 485m \
  --worker-memory 485m \
  --worker-cores 2

(Note that --class takes the class name, mysparktest, not the jar file.) The input data is stored at /user/zhangdeyang/testspark; the test data is as follows:
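The org.apache.spark.deploy.yarn.Client launcher used above is specific to Spark 0.9; later Spark releases submit YARN jobs through spark-submit instead. A rough sketch of the equivalent command, assuming a newer Spark installation (the deploy mode and resource flags are my assumptions, not from this post):

```shell
#!/bin/bash
# Hypothetical spark-submit equivalent of the 0.9-era yarn Client invocation.
# The program reads its input path from args[1], so both original --args
# values are kept as application arguments.
spark-submit \
  --class mysparktest \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --driver-memory 485m \
  --executor-memory 485m \
  --executor-cores 2 \
  mysparktest.jar yarn-standalone /user/zhangdeyang/testspark
```

This is only a launch-command sketch; the program itself would also need updating, since the hard-coded "yarn-standalone" master and JavaSparkContext constructor used here are from the 0.9 API.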
Look! at the window there leans an old maid. She plucks the withered leaf from the balsam, and looks at the grass-covered rampart, on which many children are playing. What is the old maid thinking of? A whole life drama is unfolding itself before her inward gaze. "The poor little children, how happy they are- how merrily they play and romp together! What red cheeks and what angels' eyes! but they have no shoes nor stockings. They dance on the green rampart, just on the place where, according to the old story, the ground always sank in, and where a sportive, frolicsome child had been lured by means of flowers, toys and sweetmeats into an open grave ready dug for it, and which was afterwards closed over the child; and from that moment, the old story says, the ground gave way no longer, the mound remained firm and fast, and was quickly covered with the green turf. The little people who now play on that spot know nothing of the old tale, else would they fancy they heard a child crying deep below the earth, and the dewdrops on each blade of grass would be to them tears of woe. Nor do they know anything of the Danish King who here, in the face of the coming foe, took an oath before all his trembling courtiers that he would hold out with the citizens of his capital, and die here in his nest; they know nothing of the men who have fought here, or of the women who from here have drenched with boiling water the enemy, clad in white, and 'biding in the snow to surprise the city. .
Running the launch script we wrote produces the following results:
spark home:Optional.of(/usr/lib/cloud/spark/spark-0.9.0-incubating-bin-hadoop2)
Look! at the window there leans an old maid. She plucks the
withered leaf from the balsam, and looks at the grass-covered rampart,
on which many children are playing. What is the old maid thinking
of? A whole life drama is unfolding itself before her inward gaze.
"The poor little children, how happy they are- how merrily they
play and romp together! What red cheeks and what angels' eyes! but
they have no shoes nor stockings. They dance on the green rampart,
just on the place where, according to the old story, the ground always
sank in, and where a sportive, frolicsome child had been lured by
means of flowers, toys and sweetmeats into an open grave ready dug for
it, and which was afterwards closed over the child; and from that
moment, the old story says, the ground gave way no longer, the mound
remained firm and fast, and was quickly covered with the green turf.
The little people who now play on that spot know nothing of the old
tale, else would they fancy they heard a child crying deep below the
earth, and the dewdrops on each blade of grass would be to them
tears of woe. Nor do they know anything of the Danish King who here,
in the face of the coming foe, took an oath before all his trembling
courtiers that he would hold out with the citizens of his capital, and
die here in his nest; they know nothing of the men who have fought
here, or of the women who from here have drenched with boiling water
the enemy, clad in white, and 'biding in the snow to surprise the
city.
--------------next filter's result------------------
"The poor little children, how happy they are- how merrily they
they have no shoes nor stockings. They dance on the green rampart,
tale, else would they fancy they heard a child crying deep below the
tears of woe. Nor do they know anything of the Danish King who here,
die here in his nest; they know nothing of the men who have fought
-------------next sample-------------------
"The poor little children, how happy they are- how merrily they
it, and which was afterwards closed over the child; and from that
in the face of the coming foe, took an oath before all his trembling
----------next sort----------------------
: 27
"The: 1
'biding: 1
A: 1
Danish: 1
King: 1
Look!: 1
Nor: 1
She: 1
The: 1
They: 1
What: 2
a: 2
according: 1
afterwards: 1
all: 1
always: 1
an: 3
and: 12
angels': 1
anything: 1
are: 1
are-: 1
at: 2
balsam,: 1
be: 1
been: 1
before: 2
below: 1
blade: 1
boiling: 1
but: 1
by: 1
capital,: 1
cheeks: 1
child: 2
child;: 1
children: 1
children,: 1
citizens: 1
city.: 1
clad: 1
closed: 1
coming: 1
courtiers: 1
covered: 1
crying: 1
dance: 1
deep: 1
dewdrops: 1
die: 1
do: 1
drama: 1
drenched: 1
dug: 1
each: 1
earth,: 1
else: 1
enemy,: 1
eyes!: 1
face: 1
fancy: 1
fast,: 1
firm: 1
flowers,: 1
foe,: 1
for: 1
fought: 1
frolicsome: 1
from: 3
gave: 1
gaze.: 1
grass: 1
grass-covered: 1
grave: 1
green: 2
ground: 2
had: 1
happy: 1
have: 3
he: 1
heard: 1
her: 1
here: 2
here,: 2
his: 3
hold: 1
how: 2
in: 4
in,: 1
into: 1
inward: 1
is: 2
it,: 1
itself: 1
just: 1
know: 3
leaf: 1
leans: 1
life: 1
little: 2
longer,: 1
looks: 1
lured: 1
maid: 1
maid.: 1
many: 1
means: 1
men: 1
merrily: 1
moment,: 1
mound: 1
nest;: 1
no: 2
nor: 1
nothing: 2
now: 1
oath: 1
of: 9
of?: 1
old: 5
on: 5
open: 1
or: 1
out: 1
over: 1
people: 1
place: 1
play: 2
playing.: 1
plucks: 1
poor: 1
quickly: 1
rampart,: 2
ready: 1
red: 1
remained: 1
romp: 1
sank: 1
says,: 1
shoes: 1
snow: 1
sportive,: 1
spot: 1
stockings.: 1
story: 1
story,: 1
surprise: 1
sweetmeats: 1
tale,: 1
tears: 1
that: 3
the: 26
them: 1
there: 1
they: 7
thinking: 1
to: 3
together!: 1
took: 1
toys: 1
trembling: 1
turf.: 1
unfolding: 1
was: 2
water: 1
way: 1
what: 1
where: 1
where,: 1
which: 2
white,: 1
who: 4
whole: 1
window: 1
with: 3
withered: 1
woe.: 1
women: 1
would: 3