
E-commerce Project in Practice: Packaging and Running on the Server (Part 13)

1. Change the input and output paths

(1) The input path becomes args[0].

(2) The output path becomes args[1] (see the driver sketch below).
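
As a reference, here is a minimal sketch of a driver that takes its paths from the command line. The class name matches the project's PVStatApp, but the body is only an illustrative skeleton, and the Text/IntWritable output types are placeholders rather than the course's actual choices.

    // Driver sketch: read the input/output paths from the command line instead of hardcoding them.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PVStatApp {

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            job.setJarByClass(PVStatApp.class);

            // Mapper/Reducer classes and key/value types stay as the project already defines them;
            // Text/IntWritable here are placeholders.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // args[0] = input path, args[1] = output path, both supplied by the run script.
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }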

2. Modify IPParser.java

src/main/java/project/utils/IPParser.java

Currently the IP database is read from ip/qqwry.dat, a path relative to the local project. On the server this must point to the absolute location where qqwry.dat is uploaded, so change it to:

    // Local IP database path (development machine)
    //private static final String ipFilePath = "ip/qqwry.dat";
    // Server-side IP database path
    private static final String ipFilePath = "/home/hadoop/lib/qqwry.dat";

3. Modify pom.xml

Compile with Java 1.8.

Add the following between <project> and </project>:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

4. Package locally and upload to the server

In the local cmd window:

C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>cd target/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>dir
 驅動器 C 中的卷是 Windows-SSD
 卷的序列號是 F0E4-86A5

 C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target 的目錄

2021/07/27  16:19    <DIR>          .
2021/07/27  16:19    <DIR>          ..
2021/07/27  16:19    <DIR>          classes
2021/07/27  16:19    <DIR>          generated-sources
2021/07/27  16:19    <DIR>          generated-test-sources
2021/07/27  16:19            51,390 hadoop-train-v2-1.0.jar
2021/07/27  16:19    <DIR>          maven-archiver
2021/07/27  16:19    <DIR>          maven-status
2021/07/27  16:19    <DIR>          test-classes
               1 個檔案          51,390 位元組
               8 個目錄  148,776,701,952 可用位元組
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar [email protected]:~/lib/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\ip>scp qqwry.dat [email protected]:~/lib/

5. The uploaded files on the server

[hadoop@hadoop000 lib]$ pwd
/home/hadoop/lib
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat

6. Data in the data directory

[hadoop@hadoop000 data]$ pwd
/home/hadoop/data
[hadoop@hadoop000 data]$ ls
access.log     data.txt  emp.txt      helloworld.txt  part-r-00000
accessOwn.log  dept.txt  emp.txt-bak  h.txt           trackinfo_20130721.data

7. Upload trackinfo_20130721.data to /project/input/raw in HDFS (it already exists in HDFS; from here on the pre-uploaded copy is used rather than your own, so mind the path)

[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -put trackinfo_20130721.data /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -ls /project/input/raw
Found 1 items
-rw-r--r--   1 hadoop supergroup  173555592 2018-12-09 08:50 /project/input/raw/trackinfo_20130721.data

8. Write the run script

In ~/shell/pv.sh.

pv.sh does not exist yet; create and open it directly with vi.

[hadoop@hadoop000 ~]$ clear
[hadoop@hadoop000 ~]$ cd shell/
[hadoop@hadoop000 shell]$ ls
[hadoop@hadoop000 shell]$ vi pv.sh

Write the following into pv.sh:

hadoop jar "jar path and file name on the server" "Copy Reference of the class to run" "input data path" "output data path"

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/

9. Run

First make the script executable, then run it.

[hadoop@hadoop000 shell]$ chmod u+x pv.sh
[hadoop@hadoop000 shell]$ ./pv.sh

(1) Run com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp

Note: use Copy Reference on the class to get its fully qualified name.

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pvstat/part-r-00000
300000
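
The single number in part-r-00000 is the total page-view count for the day. As a rough sketch (an assumed shape, not the course's exact code), such a job can be as simple as the pair below: every log line contributes a 1, the reducer sums them, and a NullWritable output key leaves only the total in the result file.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // PV-count sketch: one constant key, one 1 per line, a summing reducer.
    public class PVStatSketch {

        public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final Text PV = new Text("pv");
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(PV, ONE);    // one page view per log line
            }
        }

        public static class MyReducer extends Reducer<Text, IntWritable, NullWritable, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable v : values) {
                    total += v.get();
                }
                context.write(NullWritable.get(), new IntWritable(total));
            }
        }
    }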

(2) Run com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/provincestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/provincestat/part-r-00000

(3) Run com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pagestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pagestat/part*

(4) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/input/etl/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/input/etl/part*
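
ETLApp reads the raw logs from /project/input/raw and writes a processed version to /project/input/etl, which the v2 jobs below consume instead of the raw data. As orientation only, a map-only ETL mapper of this kind might look like the sketch below; the field separator, field positions, and output layout are assumptions, not the course's code.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only ETL sketch: parse a raw trackinfo line, keep only the fields the
    // downstream jobs need, and write them back out tab-separated.
    public class ETLMapperSketch extends Mapper<LongWritable, Text, NullWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Separator and field positions are illustrative assumptions about the raw log format.
            String[] fields = value.toString().split("\001");
            if (fields.length < 14) {
                return;                               // drop malformed lines
            }
            String url = fields[1];
            String time = fields[3];
            String ip = fields[13];
            String province = "-";                    // in the real project: an IPParser lookup on ip

            context.write(NullWritable.get(),
                    new Text(ip + "\t" + url + "\t" + time + "\t" + province));
        }
    }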

(5) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/provincestatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/provincestatv2/part*

(6) Run com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/pvstatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/pvstatv2/part*

10. Summary

After processing, the results of a big data job are stored on HDFS.
In essence, this is most of what big data work comes down to.
Going one step further, you need a tool or framework to export the processed results into a database.
Sqoop: export the statistical results on HDFS into MySQL.