Sqoop - A Tool for Transferring Data Between Relational Databases and Hadoop
Published: 2019-01-05
Sqoop
The Sqoop version used in this article is 1.4.6.
1. Official website: http://sqoop.apache.org/
2. Purpose
A: Export data from Hadoop into a relational database (e.g. Hive -> MySQL)
B: Import data from a relational database into Hadoop (e.g. MySQL -> Hive); both directions are sketched below
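To make the two directions concrete, here is a minimal sketch of each; the host, database, and table names below are placeholders, not taken from this article's cluster:

# RDBMS -> Hadoop (import)
sqoop import --connect jdbc:mysql://dbhost:3306/mydb --username root -P \
  --table my_table --target-dir /my_table

# Hadoop -> RDBMS (export)
sqoop export --connect jdbc:mysql://dbhost:3306/mydb --username root -P \
  --table my_table --export-dir /my_table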
3. Download
http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
4. Installation
--Upload the tarball to node1 (my earlier Hive installation is also on node1) and extract it in the home directory
cd
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
--Create a symlink
ln -sf /root/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /home/sqoop-1.4.6
5. Environment variables
--Configure environment variables
vi /etc/profile
--$SQOOP_HOME must itself be set, otherwise the PATH entry below expands to nothing
export SQOOP_HOME=/home/sqoop-1.4.6
export HADOOP_PREFIX=$HADOOP_HOME
export PATH=$PATH:$SQOOP_HOME/bin
:wq
source /etc/profile
6. Edit the configuration file
--Edit the configuration file
cd /home/sqoop-1.4.6/conf/
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
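The template mainly asks where your Hadoop and Hive installations live. A minimal sqoop-env.sh sketch, assuming the installation paths that appear later in this article's logs (/home/hadoop-2.5 for Hadoop, /home/hive for Hive); adjust them to your own cluster:

# sqoop-env.sh - the paths below are assumptions based on this cluster's layout
export HADOOP_COMMON_HOME=/home/hadoop-2.5
export HADOOP_MAPRED_HOME=/home/hadoop-2.5
export HIVE_HOME=/home/hive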
7. Add the MySQL JDBC driver
--Copy the MySQL JDBC driver into Sqoop's lib directory
cd
scp mysql-connector-java-5.1.23-bin.jar /home/sqoop-1.4.6/lib/
8. Testing
sqoop version

[root@node1 ~]# sqoop version
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/01/04 23:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015

sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'

[root@node1 ~]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/01/04 23:17:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
19/01/04 23:17:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/04 23:17:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
result_db
spark
sys
You will notice that the output contains a number of warnings.
They can be silenced with the following change:
--Silence the warnings
cd /home/sqoop-1.4.6/bin/
vi configure-sqoop
--Comment out all of the following lines by prefixing each with '#'
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
:wq

--Test again
[root@node1 bin]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
19/01/04 23:34:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
19/01/04 23:34:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/04 23:34:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
result_db
spark
sys
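The remaining WARN line is worth acting on: passing the password on the command line is insecure. Sqoop itself suggests -P, which prompts for the password interactively; a password file is another supported option. A sketch (the file path here is just an example):

# Prompt interactively instead of passing the password as an argument
sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root -P

# Or keep the password in a protected file (note: no trailing newline)
echo -n '!QAZ2wsx3edc' > /root/.mysql.pwd
chmod 400 /root/.mysql.pwd
sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root \
  --password-file file:///root/.mysql.pwd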
9. Usage
9.1. Importing from MySQL to HDFS
Preparation:
--First create a table in the database
CREATE TABLE t_user (id INT, name VARCHAR(20), age INT);
--Insert test data
insert into t_user values(1, 'Tom', 20);
insert into t_user values(2, 'John', 18);
insert into t_user values(3, 'Div', 25);
insert into t_user values(4, 'Susan', 31);
insert into t_user values(5, 'Tiran', 40);
insert into t_user values(6, 'Shasita', 13);
Query result:
mysql> select * from t_user;
+------+---------+------+
| id | name | age |
+------+---------+------+
| 1 | Tom | 20 |
| 2 | John | 18 |
| 3 | Div | 25 |
| 4 | Susan | 31 |
| 5 | Tiran | 40 |
| 6 | Shasita | 13 |
+------+---------+------+
6 rows in set (0.00 sec)
--Import data from the MySQL database onto HDFS
sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user

[root@node1 bin]# sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
19/01/04 23:54:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
19/01/04 23:54:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/04 23:54:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/01/04 23:54:30 INFO tool.CodeGenTool: Beginning code generation
19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
19/01/04 23:54:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop-2.5
Note: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/01/04 23:54:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.jar
19/01/04 23:54:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/01/04 23:54:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/01/04 23:54:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/01/04 23:54:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/01/04 23:54:33 INFO mapreduce.ImportJobBase: Beginning import of t_user
19/01/04 23:54:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/01/04 23:54:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/01/04 23:54:34 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.79.138:8032
19/01/04 23:54:48 INFO db.DBInputFormat: Using read commited transaction isolation
19/01/04 23:54:48 INFO mapreduce.JobSubmitter: number of splits:1
19/01/04 23:54:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1546674829746_0001
19/01/04 23:54:49 INFO impl.YarnClientImpl: Submitted application application_1546674829746_0001
19/01/04 23:54:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1546674829746_0001/
19/01/04 23:54:49 INFO mapreduce.Job: Running job: job_1546674829746_0001
19/01/04 23:54:59 INFO mapreduce.Job: Job job_1546674829746_0001 running in uber mode : false
19/01/04 23:54:59 INFO mapreduce.Job:  map 0% reduce 0%
19/01/04 23:55:06 INFO mapreduce.Job:  map 100% reduce 0%
19/01/04 23:55:06 INFO mapreduce.Job: Job job_1546674829746_0001 completed successfully
19/01/04 23:55:06 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=116299
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=63
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4153
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4153
        Total vcore-seconds taken by all map tasks=4153
        Total megabyte-seconds taken by all map tasks=4252672
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=69
        CPU time spent (ms)=1170
        Physical memory (bytes) snapshot=175808512
        Virtual memory (bytes) snapshot=893071360
        Total committed heap usage (bytes)=84934656
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=63
19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 32.3608 seconds (1.9468 bytes/sec)
19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Retrieved 6 records.
Execution result: the counters above show that all 6 records were transferred to /sqoop_t_user.
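Incidentally, the log suggests --direct, which swaps the JDBC path for a mysqldump-based fast path; it requires the mysqldump binary to be available on the nodes running the map tasks. A sketch of the same import with it enabled:

sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P \
  --table t_user --columns id,name,age -m 1 \
  --target-dir /sqoop_t_user --delete-target-dir --direct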
The official documentation also offers another way to run the same import: putting the arguments in a file and passing it with --options-file. Note that in an options file, each option and each value goes on its own line.
cd
mkdir mysqoopdir
cd mysqoopdir
vi mysql_to_hdfs

import
--connect
jdbc:mysql://node1:3306/sqoop_db
--username
root
--password
'!QAZ2wsx3edc'
--table
t_user
--columns
id,name,age
-m
1
--target-dir
/sqoop_t_user
--delete-target-dir

:wq
sqoop --options-file mysql_to_hdfs
We can verify the import result through Hive:
[root@node1 bin]# ./hive
19/01/05 00:03:29 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/root/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> dfs -cat /sqoop_t_user/*;
1,Tom,20
2,John,18
3,Div,25
4,Susan,31
5,Tiran,40
6,Shasita,13
The result matches the data in MySQL exactly.
Use case: suppose the data we need to process or analyze lives in MySQL, the volume is fairly large, and we want to analyze it offline. We can then use Sqoop to import the MySQL data into HDFS and run the analysis there, as sketched below.
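As a sketch of that workflow: the files Sqoop wrote are plain comma-delimited text (see the dfs -cat output above), so Hive can analyze them in place through an external table. The table name ext_t_user below is made up for illustration:

-- Point an external Hive table at the imported directory
CREATE EXTERNAL TABLE ext_t_user (id INT, name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/sqoop_t_user';

-- Example offline analysis
SELECT COUNT(*), AVG(age) FROM ext_t_user;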
Importing a query result:
--Import the result of a query
cd mysqoopdir
vi mysql_query_to_hdfs

import
--connect
jdbc:mysql://node1:3306/sqoop_db
--username
root
--password
'!QAZ2wsx3edc'
-e
select id, name from t_user where id >= 1 and $CONDITIONS
-m
1
--target-dir
/sqoop_t_user
--delete-target-dir

:wq
sqoop --options-file mysql_query_to_hdfs

--Verify
hive> dfs -cat /sqoop_t_user/*;
1,Tom
2,John
3,Div
4,Susan
5,Tiran
6,Shasita
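A note on $CONDITIONS: Sqoop replaces it in each mapper's copy of the query with that mapper's split predicate, which is why a free-form query must include it. With -m 1 no split column is needed, but running the same query in parallel requires --split-by. A sketch of a two-mapper variant (the target directory is arbitrary here; the single quotes keep the shell from expanding $CONDITIONS):

sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P \
  -e 'select id, name from t_user where id >= 1 and $CONDITIONS' \
  --split-by id -m 2 \
  --target-dir /sqoop_t_user_parallel --delete-target-dir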
9.2. Importing from MySQL to Hive
--Import from MySQL into Hive
cd mysqoopdir
vi mysql_to_hive

import
--connect
jdbc:mysql://node1:3306/sqoop_db
--username
root
--password
'!QAZ2wsx3edc'
--table
t_user
-m
1
--target-dir
/sqoop_mysql_to_hive/
--hive-home
/home/hive/
--hive-import
--hive-table
t_sqoop_mysql_t_user_to_hive
--create-hive-table

:wq
sqoop --options-file mysql_to_hive

--Verify
hive> select * from t_sqoop_mysql_t_user_to_hive;
OK
1	Tom	20
2	John	18
3	Div	25
4	Susan	31
5	Tiran	40
6	Shasita	13
Time taken: 0.577 seconds, Fetched: 6 row(s)
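One caveat: --create-hive-table makes the job fail if the Hive table already exists, so re-running this options file unchanged will error out. For a re-runnable variant, dropping that flag and adding --hive-overwrite should replace the table's data instead; a sketch on the command line:

sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P \
  --table t_user -m 1 --target-dir /sqoop_mysql_to_hive/ \
  --hive-import --hive-table t_sqoop_mysql_t_user_to_hive --hive-overwrite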
9.3. Importing from MySQL to HBase
--Import from MySQL into HBase
cd mysqoopdir
vi mysql_to_hbase

import
--connect
jdbc:mysql://node1:3306/sqoop_db
--username
root
--password
'!QAZ2wsx3edc'
--table
t_user
-m
1
--hbase-table
t_sqoop_mysql_t_user_to_hbase
--hbase-row-key
id
--hbase-create-table
--column-family
cf

:wq
sqoop --options-file mysql_to_hbase

--Verify
hbase(main):004:0> scan 't_sqoop_mysql_t_user_to_hbase'
ROW    COLUMN+CELL
 1     column=cf:age, timestamp=1546680684317, value=20
 1     column=cf:name, timestamp=1546680684317, value=Tom
 2     column=cf:age, timestamp=1546680684317, value=18
 2     column=cf:name, timestamp=1546680684317, value=John
 3     column=cf:age, timestamp=1546680684317, value=25
 3     column=cf:name, timestamp=1546680684317, value=Div
 4     column=cf:age, timestamp=1546680684317, value=31
 4     column=cf:name, timestamp=1546680684317, value=Susan
 5     column=cf:age, timestamp=1546680684317, value=40
 5     column=cf:name, timestamp=1546680684317, value=Tiran
 6     column=cf:age, timestamp=1546680684317, value=13
 6     column=cf:name, timestamp=1546680684317, value=Shasita
6 row(s) in 0.1730 seconds
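Note what --hbase-row-key did in the scan above: the id column became the HBase row key, so there is no cf:id cell, only cf:name and cf:age. A single row can then be fetched by that key in the hbase shell, for example:

get 't_sqoop_mysql_t_user_to_hbase', '1'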
9.4. Exporting from HDFS to MySQL
--Export from HDFS to MySQL
cd mysqoopdir
vi hdfs_to_mysql

export
--connect
jdbc:mysql://node1:3306/sqoop_db
--username
root
--password
'!QAZ2wsx3edc'
--table
t_hdfs_user_to_mysql
-m
1
--columns
id,name,age
--export-dir
/sqoop_t_user/

:wq

--On node1, create the target table first
mysql -u root -p'!QAZ2wsx3edc'
show databases;
use sqoop_db;
show tables;
CREATE TABLE t_hdfs_user_to_mysql (id INT, name VARCHAR(20), age INT);

cd mysqoopdir
sqoop --options-file hdfs_to_mysql

--Verify
--Before the export
mysql> select * from t_hdfs_user_to_mysql;
Empty set (0.00 sec)

--After the export
mysql> select * from t_hdfs_user_to_mysql;
+------+---------+------+
| id   | name    | age  |
+------+---------+------+
|    1 | Tom     |   20 |
|    2 | John    |   18 |
|    3 | Div     |   25 |
|    4 | Susan   |   31 |
|    5 | Tiran   |   40 |
|    6 | Shasita |   13 |
+------+---------+------+
6 rows in set (0.00 sec)
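Note that re-running this export appends the same six rows again, because an export issues plain INSERT statements by default (and t_hdfs_user_to_mysql has no primary key to stop it). For repeated exports, Sqoop's update mode can be used instead; a sketch, assuming id has been made a primary or unique key so updates can match on it:

sqoop export --connect jdbc:mysql://node1:3306/sqoop_db --username root -P \
  --table t_hdfs_user_to_mysql -m 1 --columns id,name,age \
  --export-dir /sqoop_t_user/ \
  --update-key id --update-mode allowinsert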
========================================================
More reading, and English is important.
I'm Hongten
E | [email protected] B | http://www.cnblogs.com/hongten
========================================================