Spark word count on HDFS: an example for Eclipse

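This example runs a Spark word count from Eclipse on Windows, reading its input from and writing its result to an HDFS cluster at 192.168.145.180:8020. The original source listing is not part of this capture, so the driver below is a minimal sketch reconstructed from the call sites the log mentions (textFile at SparkWordCount.scala:22, map at line 23, saveAsTextFile at line 24) and from the line the program prints at the end; the master setting, path handling, and split delimiter are assumptions and may differ from the original code.

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkWordCount {
      def main(args: Array[String]): Unit = {
        // Hypothetical defaults; the original program prints both paths at the end,
        // so they may well have been passed in as args(0) and args(1).
        val input  = "hdfs://192.168.145.180:8020/spark/hellospark"
        val output = "hdfs://192.168.145.180:8020/spark/output"   // must not exist before the run

        // local[2] is an assumption: the log shows a single driver executor on localhost
        // running two tasks per stage.
        val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[2]")
        val sc = new SparkContext(conf)

        sc.textFile(input)                    // textFile at SparkWordCount.scala:22 in the log
          .flatMap(_.split(" "))              // break each line into words
          .map(word => (word, 1))             // map at SparkWordCount.scala:23
          .reduceByKey(_ + _)                 // sum the 1s per word (the shuffle stage in the log)
          .saveAsTextFile(output)             // saveAsTextFile at SparkWordCount.scala:24

        println(input + " " + output + " hello end")   // matches the last non-log console line
        sc.stop()
      }
    }

The console output below is from the original run in Eclipse.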
Set()
2018-09-07 20:27:46 INFO Utils:54 - Successfully started service 'sparkDriver' on port 1623.
2018-09-07 20:27:46 INFO SparkEnv:54 - Registering MapOutputTracker
2018-09-07 20:27:46 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-09-07 20:27:46 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-09-07 20:27:46 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-09-07 20:27:46 INFO DiskBlockManager:54 - Created local directory at C:\Users\hsg\AppData\Local\Temp\blockmgr-2184be5b-b56c-4e63-a47f-a6bee53a2cce
2018-09-07 20:27:46 INFO MemoryStore:54 - MemoryStore started with capacity 1987.5 MB
2018-09-07 20:27:46 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-09-07 20:27:46 INFO log:192 - Logging initialized @1353ms
2018-09-07 20:27:46 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-09-07 20:27:47 INFO Server:414 - Started @1408ms
2018-09-07 20:27:47 INFO AbstractConnector:278 - Started ...{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-07 20:27:47 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-09-07 20:27:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@531f4093{/jobs,null,AVAILABLE,@Spark}
... (analogous ServletContextHandler lines for the remaining Spark UI endpoints: /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill)
2018-09-07 20:27:47 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://hsgpc:4040
2018-09-07 20:27:47 INFO Executor:54 - Starting executor ID driver on host localhost
2018-09-07 20:27:47 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 1664.
2018-09-07 20:27:47 INFO NettyBlockTransferService:54 - Server created on hsgpc:1664
2018-09-07 20:27:47 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-09-07 20:27:47 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hsgpc, 1664, None)
2018-09-07 20:27:47 INFO BlockManagerMasterEndpoint:54 - Registering block manager hsgpc:1664 with 1987.5 MB RAM, BlockManagerId(driver, hsgpc, 1664, None)
2018-09-07 20:27:47 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hsgpc, 1664, None)
2018-09-07 20:27:47 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hsgpc, 1664, None)
2018-09-07 20:27:47 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6722db6e{/metrics/json,null,AVAILABLE,@Spark}
2018-09-07 20:27:47 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 236.7 KB, free 1987.3 MB)
2018-09-07 20:27:47 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 1987.2 MB)
2018-09-07 20:27:47 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hsgpc:1664 (size: 22.9 KB, free: 1987.5 MB)
2018-09-07 20:27:47 INFO SparkContext:54 - Created broadcast 0 from textFile at SparkWordCount.scala:22
2018-09-07 20:27:48 INFO FileInputFormat:249 - Total input paths to process : 1
2018-09-07 20:27:48 INFO deprecation:1173 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2018-09-07 20:27:48 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-09-07 20:27:48 INFO SparkContext:54 - Starting job: runJob at SparkHadoopWriter.scala:78
2018-09-07 20:27:48 INFO DAGScheduler:54 - Registering RDD 3 (map at SparkWordCount.scala:23)
2018-09-07 20:27:48 INFO DAGScheduler:54 - Got job 0 (runJob at SparkHadoopWriter.scala:78) with 2 output partitions
2018-09-07 20:27:48 INFO DAGScheduler:54 - Final stage: ResultStage 1 (runJob at SparkHadoopWriter.scala:78)
2018-09-07 20:27:48 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-09-07 20:27:48 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-09-07 20:27:48 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkWordCount.scala:23), which has no missing parents
2018-09-07 20:27:48 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 1987.2 MB)
2018-09-07 20:27:48 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 1987.2 MB)
2018-09-07 20:27:48 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hsgpc:1664 (size: 2.8 KB, free: 1987.5 MB)
2018-09-07 20:27:48 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
2018-09-07 20:27:48 INFO DAGScheduler:54 - Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkWordCount.scala:23) (first 15 tasks are for partitions Vector(0, 1))
2018-09-07 20:27:48 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-09-07 20:27:48 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7880 bytes)
2018-09-07 20:27:48 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, ANY, 7880 bytes)
2018-09-07 20:27:48 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-09-07 20:27:48 INFO Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-09-07 20:27:48 INFO HadoopRDD:54 - Input split: hdfs://192.168.145.180:8020/spark/hellospark:0+18
2018-09-07 20:27:48 INFO HadoopRDD:54 - Input split: hdfs://192.168.145.180:8020/spark/hellospark:18+18
2018-09-07 20:27:48 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1147 bytes result sent to driver
2018-09-07 20:27:48 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 300 ms on localhost (executor driver) (1/2)
2018-09-07 20:27:48 INFO Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1104 bytes result sent to driver
2018-09-07 20:27:48 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 301 ms on localhost (executor driver) (2/2)
2018-09-07 20:27:48 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-09-07 20:27:48 INFO DAGScheduler:54 - ShuffleMapStage 0 (map at SparkWordCount.scala:23) finished in 0.358 s
2018-09-07 20:27:48 INFO DAGScheduler:54 - looking for newly runnable stages
2018-09-07 20:27:48 INFO DAGScheduler:54 - running: Set()
2018-09-07 20:27:48 INFO DAGScheduler:54 - waiting: Set(ResultStage 1)
2018-09-07 20:27:48 INFO DAGScheduler:54 - failed: Set()
2018-09-07 20:27:48 INFO DAGScheduler:54 - Submitting ResultStage 1 (MapPartitionsRDD[5] at saveAsTextFile at SparkWordCount.scala:24), which has no missing parents
2018-09-07 20:27:48 INFO MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 72.3 KB, free 1987.2 MB)
2018-09-07 20:27:48 INFO MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 26.1 KB, free 1987.1 MB)
2018-09-07 20:27:48 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hsgpc:1664 (size: 26.1 KB, free: 1987.4 MB)
2018-09-07 20:27:48 INFO SparkContext:54 - Created broadcast 2 from broadcast at DAGScheduler.scala:1039
2018-09-07 20:27:48 INFO DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at saveAsTextFile at SparkWordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
2018-09-07 20:27:48 INFO TaskSchedulerImpl:54 - Adding task set 1.0 with 2 tasks
2018-09-07 20:27:48 INFO TaskSetManager:54 - Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7649 bytes)
2018-09-07 20:27:48 INFO TaskSetManager:54 - Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7649 bytes)
2018-09-07 20:27:48 INFO Executor:54 - Running task 0.0 in stage 1.0 (TID 2)
2018-09-07 20:27:48 INFO Executor:54 - Running task 1.0 in stage 1.0 (TID 3)
2018-09-07 20:27:48 INFO ShuffleBlockFetcherIterator:54 - Getting 2 non-empty blocks out of 2 blocks
2018-09-07 20:27:48 INFO ShuffleBlockFetcherIterator:54 - Getting 1 non-empty blocks out of 2 blocks
2018-09-07 20:27:48 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 5 ms
2018-09-07 20:27:48 INFO ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 5 ms
2018-09-07 20:27:48 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-09-07 20:27:48 INFO FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-09-07 20:27:49 INFO FileOutputCommitter:535 - Saved output of task 'attempt_20180907202748_0005_m_000001_0' to hdfs://192.168.145.180:8020/spark/output/_temporary/0/task_20180907202748_0005_m_000001
2018-09-07 20:27:49 INFO SparkHadoopMapRedUtil:54 - attempt_20180907202748_0005_m_000001_0: Committed
2018-09-07 20:27:49 INFO Executor:54 - Finished task 1.0 in stage 1.0 (TID 3). 1502 bytes result sent to driver
2018-09-07 20:27:49 INFO TaskSetManager:54 - Finished task 1.0 in stage 1.0 (TID 3) in 696 ms on localhost (executor driver) (1/2)
2018-09-07 20:27:49 INFO FileOutputCommitter:535 - Saved output of task 'attempt_20180907202748_0005_m_000000_0' to hdfs://192.168.145.180:8020/spark/output/_temporary/0/task_20180907202748_0005_m_000000
2018-09-07 20:27:49 INFO SparkHadoopMapRedUtil:54 - attempt_20180907202748_0005_m_000000_0: Committed
2018-09-07 20:27:49 INFO Executor:54 - Finished task 0.0 in stage 1.0 (TID 2). 1459 bytes result sent to driver
2018-09-07 20:27:49 INFO TaskSetManager:54 - Finished task 0.0 in stage 1.0 (TID 2) in 716 ms on localhost (executor driver) (2/2)
2018-09-07 20:27:49 INFO TaskSchedulerImpl:54 - Removed TaskSet 1.0, whose tasks have all completed, from pool
2018-09-07 20:27:49 INFO DAGScheduler:54 - ResultStage 1 (runJob at SparkHadoopWriter.scala:78) finished in 0.734 s
2018-09-07 20:27:49 INFO DAGScheduler:54 - Job 0 finished: runJob at SparkHadoopWriter.scala:78, took 1.260756 s
2018-09-07 20:27:49 INFO SparkHadoopWriter:54 - Job job_20180907202748_0005 committed.
hdfs://192.168.145.180:8020/spark/hellospark hdfs://192.168.145.180:8020/spark/output hello end
2018-09-07 20:27:49 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-09-07 20:27:49 INFO AbstractConnector:318 - Stopped ...{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-07 20:27:49 INFO SparkUI:54 - Stopped Spark web UI at http://hsgpc:4040
2018-09-07 20:27:49 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-09-07 20:27:49 INFO MemoryStore:54 - MemoryStore cleared
2018-09-07 20:27:49 INFO BlockManager:54 - BlockManager stopped
2018-09-07 20:27:49 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-09-07 20:27:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-09-07 20:27:49 INFO SparkContext:54 - Successfully stopped SparkContext
2018-09-07 20:27:49 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-07 20:27:49 INFO ShutdownHookManager:54 - Deleting directory C:\Users\hsg\AppData\Local\Temp\spark-97dc8724-e958-4db6-a005-5e365dcdcdba
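The job writes its result as plain-text part files under hdfs://192.168.145.180:8020/spark/output; the _temporary paths in the log are the output committer's staging area and disappear once the job is committed. As a quick sanity check one could read the result directory back with the same SparkContext. This is only an illustrative sketch, not part of the original article:

    // read the word-count result back and print it; each output line is a (word, count) pair
    sc.textFile("hdfs://192.168.145.180:8020/spark/output").collect().foreach(println)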
