I. Hadoop Course
2.1 Initial Setup
The initial environment has already been configured on the platform, but students should understand how the setup is done.
1. Change the hostname (using the master node as an example)
[ec2-user@ip-172-31-32-47 ~]$ sudo vi /etc/hostname
#Delete everything in the file and put master on the first line as the new hostname.
#Reboot the instance so the change takes effect
[ec2-user@ip-172-31-32-47 ~]$ sudo reboot
2. Configure the hosts mapping (using the master node as an example)
#Check the IP address of every node
[ec2-user@master ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.31.32.47 netmask 255.255.240.0 broadcast 172.31.47.255
        inet6 fe80::8b2:80ff:fe01:e5c2 prefixlen 64 scopeid 0x20<link>
        ether 0a:b2:80:01:e5:c2 txqueuelen 1000 (Ethernet)
        RX packets 3461 bytes 687720 (671.6 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 3262 bytes 544011 (531.2 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[ec2-user@slave1 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.31.36.81 netmask 255.255.240.0 broadcast 172.31.47.255
        inet6 fe80::87d:36ff:fe72:bc0c prefixlen 64 scopeid 0x20<link>
        ether 0a:7d:36:72:bc:0c txqueuelen 1000 (Ethernet)
        RX packets 2195 bytes 543199 (530.4 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 2178 bytes 361053 (352.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[ec2-user@slave2 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.31.46.142 netmask 255.255.240.0 broadcast 172.31.47.255
        inet6 fe80::850:68ff:fe8c:6c5e prefixlen 64 scopeid 0x20<link>
        ether 0a:50:68:8c:6c:5e txqueuelen 1000 (Ethernet)
        RX packets 2284 bytes 547630 (534.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 2241 bytes 375782 (366.9 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
#Write each node into the hosts file in "IP hostname" format
[ec2-user@master ~]$ sudo vi /etc/hosts
#Check the result. Note: the hosts file must be modified on every node
[ec2-user@master ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost6 localhost6.localdomain6
172.31.32.47 master
172.31.36.81 slave1
172.31.46.142 slave2
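As an optional sanity check (not part of the platform's pre-configured steps), you can confirm that each node now resolves the others by name; this assumes the instances' security group allows ICMP between them:
#Run on every node; each hostname should answer from its private IP
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2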
2.2 Installing the Java Environment
First, let's look at why we install the JDK. The JDK is the software development kit for the Java language, provided for programmers, and is used to build Java applications for everything from servers to mobile and embedded devices. It is the core of Java development: it contains the Java runtime environment (the JVM plus the Java system class library) and the Java development tools.
1. Extract JDK 1.8
#Extract the JDK to the target path
[ec2-user@master ~]$ sudo tar -zxvf hadoop/jdk-8u144-linux-x64.tar.gz -C /usr/local/src/
#Check that the extracted JDK directory is in place
[ec2-user@master ~]$ ls /usr/local/src/
jdk1.8.0_144
2. Rename it to jdk
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
jdk1.8.0_144
[ec2-user@master src]$ sudo mv jdk1.8.0_144/ jdk
[ec2-user@master src]$ ls
jdk
3. Add environment variables (all nodes; master shown as an example)
[ec2-user@master src]$ sudo vi /etc/profile
#Append the following at the end of the file
export JAVA_HOME=/usr/local/src/jdk
export PATH=$PATH:$JAVA_HOME/bin
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile
4. Check the JDK version to verify the installation
[ec2-user@master src]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
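Beyond java -version, an optional way to confirm that the development tools (the compiler, not just the runtime) are on the PATH is to compile and run a throwaway class; the file and class names here are purely illustrative:
#Create, compile, run, and clean up a one-line test class
echo 'public class Hello { public static void main(String[] args) { System.out.println("jdk ok"); } }' > Hello.java
javac Hello.java
java Hello        #should print: jdk ok
rm -f Hello.java Hello.class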
5. Change the ownership (all nodes; master shown as an example)
Our labs run as a regular user, but /usr/local/src/ can only be written with root permission; if we do not change the ownership, distributing files to this directory later will fail with a permission error.
[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root root 6 Apr 9 2019 bin
drwxr-xr-x 2 root root 6 Apr 9 2019 etc
drwxr-xr-x 2 root root 6 Apr 9 2019 games
drwxr-xr-x 2 root root 6 Apr 9 2019 include
drwxr-xr-x 2 root root 6 Apr 9 2019 lib
drwxr-xr-x 2 root root 6 Apr 9 2019 lib64
drwxr-xr-x 2 root root 6 Apr 9 2019 libexec
drwxr-xr-x 2 root root 6 Apr 9 2019 sbin
drwxr-xr-x 5 root root 49 Mar 4 20:51 share
drwxr-xr-x 4 root root 31 Mar 19 06:54 src
#Set the owner and group of /usr/local/src/ and everything under it to ec2-user
[ec2-user@master ~]$ sudo chown -R ec2-user:ec2-user /usr/local/src/
#Check the owner and group of /usr/local/src/ again
[ec2-user@master ~]$ ll /usr/local/
total 0
drwxr-xr-x 2 root root 6 Apr 9 2019 bin
drwxr-xr-x 2 root root 6 Apr 9 2019 etc
drwxr-xr-x 2 root root 6 Apr 9 2019 games
drwxr-xr-x 2 root root 6 Apr 9 2019 include
drwxr-xr-x 2 root root 6 Apr 9 2019 lib
drwxr-xr-x 2 root root 6 Apr 9 2019 lib64
drwxr-xr-x 2 root root 6 Apr 9 2019 libexec
drwxr-xr-x 2 root root 6 Apr 9 2019 sbin
drwxr-xr-x 5 root root 49 Mar 4 20:51 share
drwxr-xr-x 4 ec2-user ec2-user 31 Mar 19 06:54 src
6. Distribute to the other nodes
[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave1:/usr/local/src/
[ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave2:/usr/local/src/
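To double-check that the copies arrived (optional, and assuming the passwordless SSH between nodes that the cluster scripts rely on is already in place), you can invoke the JDK on each slave directly; the full path is used because /etc/profile on the slaves may not have been updated yet:
ssh slave1 /usr/local/src/jdk/bin/java -version
ssh slave2 /usr/local/src/jdk/bin/java -version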
2.3 Installing the Hadoop Cluster
1. Extract
[ec2-user@master src]$ tar -zxvf /home/ec2-user/hadoop/hadoop-2.9.1.tar.gz -C /usr/local/src/
[ec2-user@master src]$ ls
hadoop-2.9.1 jdk
2. Rename it to hadoop
[ec2-user@master src]$ pwd
/usr/local/src
[ec2-user@master src]$ mv hadoop-2.9.1/ hadoop
[ec2-user@master src]$ ls
hadoop jdk
3. Add environment variables (all nodes; master shown as an example)
[ec2-user@master ~]$ sudo vi /etc/profile
#Append the following at the end of the file
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=/usr/local/src/hadoop/lib/*
#Reload the environment variables
[ec2-user@master ~]$ source /etc/profile
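A quick optional check that the new variables took effect in the current shell:
#Should print /usr/local/src/hadoop and the Hadoop 2.9.1 version banner
echo $HADOOP_HOME
hadoop version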
4. Edit the core-site.xml configuration file
[ec2-user@master ~]$ cd /usr/local/src/hadoop/etc/hadoop/
[ec2-user@master hadoop]$ vi core-site.xml
Add the following inside the <configuration></configuration> tags:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/src/hadoop/tmp</value>
</property>
5. Edit the hdfs-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hdfs-site.xml
Add the following inside the <configuration></configuration> tags:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Specify the host for the Hadoop secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/src/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/src/hadoop/tmp/dfs/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
6. Edit the yarn-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi yarn-site.xml
Add the following inside the <configuration></configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
7. Edit the mapred-site.xml configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
[ec2-user@master hadoop]$ vi mapred-site.xml
Add the following inside the <configuration></configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
8. Edit the hadoop-env.sh configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi hadoop-env.sh
Set the JDK path:
export JAVA_HOME=/usr/local/src/jdk
Note: adjust the path to match your own installation.
9. Edit the slaves configuration file
[ec2-user@master hadoop]$ pwd
/usr/local/src/hadoop/etc/hadoop
[ec2-user@master hadoop]$ vi slaves
[ec2-user@master hadoop]$ cat slaves
slave1
slave2
10. Distribute to the other nodes
[ec2-user@master hadoop]$ cd /usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave1:/usr/local/src/
[ec2-user@master src]$ scp -r hadoop/ slave2:/usr/local/src/
11. Format the NameNode (run on the NameNode host)
[ec2-user@master src]$ hdfs namenode -format
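A successful format creates the NameNode metadata directory configured in hdfs-site.xml; as an optional check (path as configured above), its current/ directory should now contain a VERSION file and an initial fsimage:
ls /usr/local/src/hadoop/tmp/dfs/name/current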
12. Start the Hadoop cluster
#Start the whole cluster from the NameNode host
[ec2-user@master src]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (172.31.32.47)' can't be established.
ECDSA key fingerprint is SHA256:Tueyo4xR8lsxmdA11GlXAO3w44n6T75dYHe9flk8Y70.
ECDSA key fingerprint is MD5:22:9b:6d:f2:f3:11:a2:6d:4d:dd:ec:25:56:3b:2d:b2.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.31.32.47' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-namenode-master.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-secondarynamenode-slave1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave2.out
#Check the running processes with jps
[ec2-user@master src]$ jps
31522 Jps
31256 ResourceManager
30973 NameNode
[ec2-user@master src]$ ssh slave1
Last login: Fri Mar 19 06:15:47 2021 from 219.153.251.37
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave1 ~]$ jps
29424 DataNode
29635 NodeManager
29544 SecondaryNameNode
29789 Jps
[ec2-user@slave1 ~]$ ssh slave2
Last login: Fri Mar 19 06:15:57 2021 from 219.153.251.37
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
[ec2-user@slave2 ~]$ jps
29633 Jps
29479 NodeManager
29354 DataNode
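Besides jps, the web UIs are another optional way to confirm the daemons are up. With the Hadoop 2.x default ports (and assuming the security group permits the traffic), the NameNode UI listens on 50070 and the ResourceManager UI on 8088:
#Run on master; an HTTP 200 means the daemon's embedded web server is responding
curl -s -o /dev/null -w "%{http_code}\n" http://master:50070
curl -s -o /dev/null -w "%{http_code}\n" http://master:8088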
13. Check the Hadoop cluster status
[ec2-user@master ~]$ hdfs dfsadmin -report
Configured Capacity: 17154662400 (15.98 GB)
Present Capacity: 11389693952 (10.61 GB)
DFS Remaining: 11389685760 (10.61 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 172.31.36.81:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882510848 (2.68 GB)
DFS Remaining: 5694816256 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021
Name: 172.31.46.142:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 8577331200 (7.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 2882457600 (2.68 GB)
DFS Remaining: 5694869504 (5.30 GB)
DFS Used%: 0.00%
DFS Remaining%: 66.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Mar 19 07:45:06 UTC 2021
Last Block Report: Fri Mar 19 07:41:00 UTC 2021
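As an optional smoke test (the paths and file names here are illustrative), write a small file into HDFS and read it back:
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /test
hdfs dfs -put /tmp/hello.txt /test/
hdfs dfs -cat /test/hello.txt
hdfs dfs -rm -r /test        #clean up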
2.4 Installing Hive
1. Install MySQL
Before installing Hive, we first install MySQL, which will store Hive's metadata.
1) Download the MySQL yum repository package
[ec2-user@master ~]$ sudo wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
2) Install the MySQL yum repository
[ec2-user@master ~]$ sudo yum localinstall mysql57-community-release-el7-8.noarch.rpm
3) Check that the MySQL repository was installed successfully
[ec2-user@master ~]$ sudo yum repolist enabled | grep "mysql.*.community.*"
mysql-connectors-community/x86_64 MySQL Connectors Community 146+39
mysql-tools-community/x86_64 MySQL Tools Community 123
mysql57-community/x86_64 MySQL 5.7 Community Server 484
4) Install MySQL
[ec2-user@master ~]$ sudo yum install mysql-community-server
5) Start the MySQL service and check that it is running
[ec2-user@master ~]$ sudo systemctl start mysqld
[ec2-user@master ~]$ sudo systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-19 07:56:43 UTC; 1s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 31978 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 31927 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 31981 (mysqld)
CGroup: /system.slice/mysqld.service
└─31981 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Mar 19 07:56:39 master systemd[1]: Starting MySQL Server...
Mar 19 07:56:43 master systemd[1]: Started MySQL Server.
6) Look up the initial MySQL password
[ec2-user@master ~]$ sudo grep "password" /var/log/mysqld.log
2021-03-19T07:56:41.030922Z 1 [Note] A temporary password is generated for root@localhost: v=OKXu0laSo;
7) Change the MySQL login password
Copy the initial password found above; when MySQL prompts for a password, paste it and press Enter to reach the MySQL command line.
[ec2-user@master ~]$ sudo mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.33
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Change the password; here the MySQL login password is set to 1234:
mysql> set password for 'root'@'localhost'=password('1234');
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
As the error shows, a new password is rejected if it is too simple.
In that case we relax the password policy first:
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=1;
Query OK, 0 rows affected (0.00 sec)
Set the password again:
mysql> set password for 'root'@'localhost'=password('1234');
Query OK, 0 rows affected, 1 warning (0.00 sec)
8) Enable remote login
Exit MySQL first, then log back in with the new password.
[ec2-user@master ~]$ mysql -uroot -p1234
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.33 MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Create a user:
mysql> create user 'root'@'172.%.%.%' identified by '1234';
Query OK, 0 rows affected (0.00 sec)
Allow remote connections:
mysql> grant all privileges on *.* to 'root'@'172.%.%.%' with grant option;
Query OK, 0 rows affected (0.00 sec)
Flush the privileges:
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
At this point, MySQL has been installed successfully.
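Optionally, the remote account can be verified from another node; this assumes a MySQL-compatible client is available there, which may need to be installed separately:
#Run from slave1 or slave2; it should connect with the root@'172.%.%.%' account created above
mysql -h master -uroot -p1234 -e "select version();"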
2. Extract Hive to the target location
[ec2-user@master ~]$ tar -zxvf hadoop/apache-hive-1.1.0-bin.tar.gz -C /usr/local/src/
3. Rename it to hive
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
apache-hive-1.1.0-bin hadoop jdk
[ec2-user@master src]$ mv apache-hive-1.1.0-bin/ hive
[ec2-user@master src]$ ls
hadoop hive jdk
4. Add environment variables
[ec2-user@master src]$ sudo vi /etc/profile
#Append the following at the end of the file
export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/src/hive/lib/*
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile
5. Create the hive-site.xml configuration file
[ec2-user@master src]$ cd hive/conf/
#Create the hive-site.xml file
[ec2-user@master conf]$ touch hive-site.xml
[ec2-user@master conf]$ vi hive-site.xml
Add the following to hive-site.xml:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1234</value>
</property>
</configuration>
Note: change the MySQL password to the one you set yourself.
6. Edit the hive-env.sh configuration file
[ec2-user@master conf]$ pwd
/usr/local/src/hive/conf
[ec2-user@master conf]$ cp hive-env.sh.template hive-env.sh
[ec2-user@master conf]$ vi hive-env.sh
#Add the following settings
export HADOOP_HOME=/usr/local/src/hadoop
export HIVE_CONF_DIR=/usr/local/src/hive/conf
7. Add the MySQL connector JAR
Copy the MySQL JDBC driver into Hive's lib directory.
[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $HIVE_HOME/lib
[ec2-user@master conf]$ ls $HIVE_HOME/lib/mysql-connector-java-5.1.44-bin.jar
/usr/local/src/hive/lib/mysql-connector-java-5.1.44-bin.jar
8. Start the Hadoop cluster (Hive needs HDFS to store its data)
If Hadoop is already running, skip this step.
start-all.sh
9. Initialize the Hive metastore database in MySQL
[ec2-user@master conf]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 1.1.0
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed
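Optionally, you can confirm that the initialization created the metastore tables in MySQL; the hive database should now contain tables such as DBS and TBLS:
mysql -uroot -p1234 -e "use hive; show tables;"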
10. Start Hive and test it
[ec2-user@master conf]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-1.1.0.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.587 seconds, Fetched: 1 row(s)
At this point, Hive has been installed successfully.
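As an optional follow-up test (the table name is purely illustrative), create and drop a throwaway table from the command line; the table should briefly appear under /user/hive/warehouse in HDFS:
hive -e "create table test_tbl (id int, name string); show tables; drop table test_tbl;"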
2.5 Installing Sqoop
1. Extract
[ec2-user@master ~]$ tar -zxvf hadoop/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
2. Rename it to sqoop
[ec2-user@master ~]$ cd /usr/local/src/
[ec2-user@master src]$ ls
hadoop hive jdk sqoop-1.4.7.bin__hadoop-2.6.0
[ec2-user@master src]$ mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop
[ec2-user@master src]$ ls
hadoop hive jdk sqoop
3. Add environment variables
[ec2-user@master src]$ sudo vi /etc/profile
#Append the following at the end of the file
export SQOOP_HOME=/usr/local/src/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
#Reload the environment variables
[ec2-user@master src]$ source /etc/profile
4. Edit the sqoop-env.sh configuration file
[ec2-user@master src]$ cd sqoop/conf/
[ec2-user@master conf]$ mv sqoop-env-template.sh sqoop-env.sh
[ec2-user@master conf]$ vi sqoop-env.sh
Modify the following settings to match your own environment:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/src/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/src/hive
5. Copy the MySQL driver into Sqoop's lib directory
[ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib
[ec2-user@master conf]$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.44-bin.jar
/usr/local/src/sqoop/lib/mysql-connector-java-5.1.44-bin.jar
6. Verify that Sqoop is configured correctly
[ec2-user@master conf]$ sqoop help
Warning: /usr/local/src/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/src/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/03/19 08:53:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
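A further optional end-to-end check is to let Sqoop query the MySQL server installed earlier (the connection string and password match the setup above); it should list the databases on master, including the hive metastore database:
sqoop list-databases --connect "jdbc:mysql://master:3306/?useSSL=false" --username root --password 1234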
小石小石摩西摩西's study notes. Questions and corrections are welcome!