E-commerce Project in Practice: Packaging and Running on the Server (13)
1. Change the input and output paths
(1) Input path: args[0]
(2) Output path: args[1]
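As a hypothetical illustration (the driver code itself is not shown here), the two positional arguments that follow the main class on the `hadoop jar` command line are what the driver receives as args[0] and args[1]:

```shell
# Hypothetical illustration: the two paths passed after the main class
# become args[0] and args[1] inside the MapReduce driver.
# Paths are the ones used later in pv.sh.
INPUT=hdfs://hadoop000:8020/project/input/raw/          # arrives as args[0]
OUTPUT=hdfs://hadoop000:8020/project/output/v1/pvstat/  # arrives as args[1]

# hadoop jar <jar> <main class> "$INPUT" "$OUTPUT"
echo "args[0]=$INPUT"
echo "args[1]=$OUTPUT"
```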
2. Modify IPParser.java
src/main/java/project/utils/IPParser.java
The local IP database currently sits at ip/qqwry.dat.
Change it to:
// Local IP database path
// private static final String ipFilePath = "ip/qqwry.dat";
// IP database path on the server
private static final String ipFilePath = "/home/hadoop/lib/qqwry.dat";
3. Modify pom.xml
Compile with JDK 1.8.
Add the following inside <project></project>:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.3</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
4. Package locally and upload to the server
In a local cmd window:
C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>cd target/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>dir
 Volume in drive C is Windows-SSD
 Volume Serial Number is F0E4-86A5

 Directory of C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target

2021/07/27  16:19    <DIR>          .
2021/07/27  16:19    <DIR>          ..
2021/07/27  16:19    <DIR>          classes
2021/07/27  16:19    <DIR>          generated-sources
2021/07/27  16:19    <DIR>          generated-test-sources
2021/07/27  16:19            51,390 hadoop-train-v2-1.0.jar
2021/07/27  16:19    <DIR>          maven-archiver
2021/07/27  16:19    <DIR>          maven-status
2021/07/27  16:19    <DIR>          test-classes
               1 File(s)          51,390 bytes
               8 Dir(s)  148,776,701,952 bytes free
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar [email protected]:~/lib/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\ip>scp qqwry.dat [email protected]:~/lib/
5. The uploaded files on the server
[hadoop@hadoop000 lib]$ pwd
/home/hadoop/lib
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat
6. Data in the data directory
[hadoop@hadoop000 data]$ pwd
/home/hadoop/data
[hadoop@hadoop000 data]$ ls
access.log     data.txt  emp.txt      helloworld.txt  part-r-00000
accessOwn.log  dept.txt  emp.txt-bak  h.txt           trackinfo_20130721.data
7. Upload trackinfo_20130721.data to /project/input/raw in HDFS (the file already exists in HDFS; everything used later is that previously uploaded copy, not your own upload, so mind the path)
[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -put trackinfo_20130721.data /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -ls /project/input/raw
Found 1 items
-rw-r--r--   1 hadoop supergroup  173555592 2018-12-09 08:50 /project/input/raw/trackinfo_20130721.data
8. Write the script
In ~/shell/pv.sh
There is no pv.sh file yet; create and open it directly with vi.
[hadoop@hadoop000 ~]$ clear
[hadoop@hadoop000 ~]$ cd shell/
[hadoop@hadoop000 shell]$ ls
[hadoop@hadoop000 shell]$ vi pv.sh
Write the following into pv.sh:
hadoop jar <path and name of the jar on the server> <fully qualified name of the class to run (Copy Reference)> <input path> <output path>
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
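A slightly more defensive version of pv.sh (a sketch, not from the course) factors the jar, class, and paths into variables and removes any previous output directory first, since MapReduce refuses to start if the output path already exists:

```shell
#!/bin/bash
# Sketch of pv.sh with the hard-coded pieces factored into variables.
JAR=/home/hadoop/lib/hadoop-train-v2-1.0.jar
MAIN=com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp
INPUT=hdfs://hadoop000:8020/project/input/raw/
OUTPUT=hdfs://hadoop000:8020/project/output/v1/pvstat/

# Delete any previous output; FileOutputFormat fails if the dir exists.
hadoop fs -rm -r -skipTrash "$OUTPUT" 2>/dev/null
hadoop jar "$JAR" "$MAIN" "$INPUT" "$OUTPUT"
```

To rerun a job you then only edit MAIN and OUTPUT rather than retyping the whole command line.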
9. Run
Grant execute permission first, then run.
[hadoop@hadoop000 shell]$ chmod u+x pv.sh
[hadoop@hadoop000 shell]$ ./pv.sh
(1) Run com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp
Note: use Copy Reference to copy the fully qualified class name.
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pvstat/part-r-00000
300000
(2) Run com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/provincestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/provincestat/part-r-00000
(3) Run com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pagestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pagestat/part*
(4) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/input/etl/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/input/etl/part*
(5) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/provincestatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/provincestatv2/part*
(6) Run com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/pvstatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/pvstatv2/part*
10. Summary
The results of big-data processing are stored on HDFS.
At its core, that is most of what a big-data job does.
One step further: you need a tool or framework to export the processed results into a database.
Sqoop: export the statistics on HDFS into MySQL.
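As a hedged sketch of that last step (the MySQL host, database, credentials, and table name below are all made up, and the target table must already exist in MySQL with matching columns), a Sqoop export of the PV result could look like:

```shell
# Hypothetical Sqoop export: copies the tab-separated MapReduce output
# from HDFS into an existing MySQL table. All connection details here
# are invented for illustration.
EXPORT_DIR=/project/output/v1/pvstat
TABLE=pv_stat

sqoop export \
  --connect jdbc:mysql://hadoop000:3306/project_db \
  --username root \
  --password 123456 \
  --table "$TABLE" \
  --export-dir "$EXPORT_DIR" \
  --input-fields-terminated-by '\t'
```

The --input-fields-terminated-by '\t' flag matches the default key/value separator that MapReduce TextOutputFormat writes.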