
Chapter 2: Hadoop Quick Start

2.5 A Simple WordCount Application

Hadoop's Hello World program

2.5.1 Creating an HDFS Directory

The hdfs command is located under the bin directory; a directory can be created with `hdfs dfs -mkdir`.

[root@node1 hadoop-2.7.3]# bin/hdfs dfs -mkdir -p input

Directories created on HDFS go under /user/{username}/ by default, where {username} is the current user name, so the input directory should appear under /user/root/.
The `hdfs dfs -ls` command lists files and directories on HDFS:

[root@node1 hadoop-2.7.3]# bin/hdfs dfs -ls /

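The same two steps (create a directory, then list one) can also be done through Hadoop's Java FileSystem API. This is a minimal sketch, assuming a Hadoop 2.7.x client with the cluster's core-site.xml on the classpath; the class name MkdirAndList is hypothetical, and the paths mirror the shell commands above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirAndList {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from the configuration files on the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Relative paths resolve against /user/<current user>, just like the shell
    fs.mkdirs(new Path("input"));

    // Equivalent of: bin/hdfs dfs -ls /
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}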

2.5.2 Uploading a File to HDFS

Create a text file on the local machine:

[root@node1 hadoop-2.7.3]# vi /root/words.txt

Type a few words, then save and quit.
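The exact words are not shown here, but any short word list works. One possible content, consistent with the job counters and results later in this section (3 lines, 9 words, 55 bytes, and the counts Hadoop 3, Hello 2, Java 2, World 2), would be:

Hello World Hadoop
Hello Java Hadoop
World Java Hadoop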

Upload the local file /root/words.txt to HDFS, then list the input directory to confirm:

bin/hdfs dfs -put /root/words.txt input
bin/hdfs dfs -ls input
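The upload can also be done programmatically. A minimal sketch with the FileSystem API, under the same assumptions as the earlier snippet (the class name UploadWords is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWords {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Equivalent of: bin/hdfs dfs -put /root/words.txt input
    fs.copyFromLocalFile(new Path("/root/words.txt"), new Path("input"));
  }
}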


2.5.3 Running WordCount

Run the following command:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output


      
[root@node1 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
17/05/12 09:04:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/05/12 09:04:41 INFO input.FileInputFormat: Total input paths to process : 1
17/05/12 09:04:41 INFO mapreduce.JobSubmitter: number of splits:1
17/05/12 09:04:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494590593576_0001
17/05/12 09:04:43 INFO impl.YarnClientImpl: Submitted application application_1494590593576_0001
17/05/12 09:04:43 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1494590593576_0001/
17/05/12 09:04:43 INFO mapreduce.Job: Running job: job_1494590593576_0001
17/05/12 09:05:08 INFO mapreduce.Job: Job job_1494590593576_0001 running in uber mode : false
17/05/12 09:05:08 INFO mapreduce.Job:  map 0% reduce 0%
17/05/12 09:05:19 INFO mapreduce.Job:  map 100% reduce 0%
17/05/12 09:05:31 INFO mapreduce.Job:  map 100% reduce 100%
17/05/12 09:05:32 INFO mapreduce.Job: Job job_1494590593576_0001 completed successfully
17/05/12 09:05:32 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=54
        FILE: Number of bytes written=237325
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=163
        HDFS: Number of bytes written=32
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8861
        Total time spent by all reduces in occupied slots (ms)=8430
        Total time spent by all map tasks (ms)=8861
        Total time spent by all reduce tasks (ms)=8430
        Total vcore-milliseconds taken by all map tasks=8861
        Total vcore-milliseconds taken by all reduce tasks=8430
        Total megabyte-milliseconds taken by all map tasks=9073664
        Total megabyte-milliseconds taken by all reduce tasks=8632320
    Map-Reduce Framework
        Map input records=3
        Map output records=9
        Map output bytes=91
        Map output materialized bytes=54
        Input split bytes=108
        Combine input records=9
        Combine output records=4
        Reduce input groups=4
        Reduce shuffle bytes=54
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=249
        CPU time spent (ms)=2950
        Physical memory (bytes) snapshot=303017984
        Virtual memory (bytes) snapshot=4157116416
        Total committed heap usage (bytes)=165810176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=55
    File Output Format Counters
        Bytes Written=32
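A few counters are worth relating back to the input: Map input records=3 matches the three lines of words.txt, Map output records=9 matches the nine words, and Reduce output records=4 matches the four distinct words. The combiner already collapsed the nine map outputs down to four records before the shuffle (Combine input records=9, Combine output records=4).

The wordcount class inside the examples jar is the classic MapReduce WordCount. For reference, here is a minimal sketch of such a program; this is not the exact source shipped in the jar, but the canonical form from the Hadoop 2.7 MapReduce tutorial:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in each input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // explains the Combine counters above
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}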

2.5.4 Viewing the Results

List the output directory, then print the result file:

bin/hdfs dfs -ls output
bin/hdfs dfs -cat output/part-r-00000


      
[root@node1 hadoop-2.7.3]# bin/hdfs dfs -ls output/
Found 2 items
-rw-r--r--   1 root supergroup          0 2017-05-12 09:05 output/_SUCCESS
-rw-r--r--   1 root supergroup         32 2017-05-12 09:05 output/part-r-00000
[root@node1 hadoop-2.7.3]# bin/hdfs dfs -cat output/part-r-00000
Hadoop	3
Hello	2
Java	2
World	2
[root@node1 hadoop-2.7.3]#
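The empty _SUCCESS marker shows the job finished without errors, and there is exactly one part-r-00000 file because the job ran a single reduce task (Launched reduce tasks=1 in the counters). To read the result from Java rather than the shell, a minimal sketch under the same assumptions as the earlier snippets (the class name CatResult is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatResult {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Equivalent of: bin/hdfs dfs -cat output/part-r-00000
    try (FSDataInputStream in = fs.open(new Path("output/part-r-00000"))) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
  }
}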
