Introduction to Basic Sqoop Syntax
This article introduces Sqoop's basic syntax and simple usage.
1. Viewing command help
[hadoop@hadoop000 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
# The last line tells us to run sqoop help COMMAND (COMMAND being the command to look up) for detailed help on that command
2.list-databases
# View the help for list-databases
[hadoop@hadoop000 ~]$ sqoop help list-databases
usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect <jdbc-uri>                        Specify JDBC connect string
  --connection-manager <class-name>           Specify connection manager class name
  --connection-param-file <properties-file>   Specify connection parameters file
  --driver <class-name>                       Manually specify JDBC driver class to use
  --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
  --help                                      Print usage instructions
  -P                                          Read password from console
  --password <password>                       Set authentication password
  --password-alias <password-alias>           Credential provider password alias
  --password-file <password-file>             Set authentication password file path
  --relaxed-isolation                         Use read-uncommitted isolation for imports
  --skip-dist-cache                           Skip copying jars to distributed cache
  --username <username>                       Set authentication username
  --verbose                                   Print more information while working

# Basic usage
[hadoop@oradb3 ~]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456
# Result
information_schema
mysql
performance_schema
slow_query_log
sys
test
3.list-tables
# Command help
[hadoop@hadoop000 ~]$ sqoop help list-tables
usage: sqoop list-tables [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
  --connect <jdbc-uri>                        Specify JDBC connect string
  --connection-manager <class-name>           Specify connection manager class name
  --connection-param-file <properties-file>   Specify connection parameters file
  --driver <class-name>                       Manually specify JDBC driver class to use
  --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
  --help                                      Print usage instructions
  -P                                          Read password from console
  --password <password>                       Set authentication password
  --password-alias <password-alias>           Credential provider password alias
  --password-file <password-file>             Set authentication password file path
  --relaxed-isolation                         Use read-uncommitted isolation for imports
  --skip-dist-cache                           Skip copying jars to distributed cache
  --username <username>                       Set authentication username
  --verbose                                   Print more information while working

# Usage
[hadoop@hadoop000 ~]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456
# Result
t_order
test0001
test_1013
test_dyc
test_tb
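4. Importing MySQL data into HDFS (import)
(By default, data is imported into the current user's HDFS home directory, under /user/<username>/<table name>.)
A quick side note while we are here:
- hadoop fs -ls lists the current user's home directory, i.e. /user/hadoop
- hadoop fs -ls / lists the HDFS root directory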
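The same rule explains where relative paths such as a bare table name or a relative --target-dir end up. The sketch below only imitates that resolution rule in plain shell; resolve_hdfs_path is a hypothetical helper written for this illustration and is not part of Hadoop.

```shell
# Illustration only: imitates how HDFS resolves a path argument.
# resolve_hdfs_path is a hypothetical helper, not a Hadoop command.
resolve_hdfs_path() {
  path=$1
  user=${2:-$(whoami)}
  case "$path" in
    /*) printf '%s\n' "$path" ;;                   # absolute path: used as-is
    "") printf '/user/%s\n' "$user" ;;             # no path: the user's home directory
    *)  printf '/user/%s/%s\n' "$user" "$path" ;;  # relative path: placed under the home directory
  esac
}

resolve_hdfs_path students hadoop       # prints /user/hadoop/students
resolve_hdfs_path /tmp/students hadoop  # prints /tmp/students
resolve_hdfs_path "" hadoop             # prints /user/hadoop
```

So for the hadoop user, a relative directory name and the same name prefixed with /user/hadoop/ should refer to the same location.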
# View the command help
[hadoop@hadoop000 ~]$ sqoop help import
# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students
At this point you may well hit this error: Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
The fix is to add the java-json.jar package: download it and place java-json.jar in the ../sqoop/lib directory.
# Run the import again
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students
18/07/04 13:28:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 13:28:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 13:28:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 13:28:35 INFO tool.CodeGenTool: Beginning code generation
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
18/07/04 13:28:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 13:28:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/3024b8df04f623e8c79ed9b5b30ace75/students.jar
18/07/04 13:28:37 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/07/04 13:28:37 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/07/04 13:28:37 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/07/04 13:28:37 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/07/04 13:28:37 INFO mapreduce.ImportJobBase: Beginning import of students
18/07/04 13:28:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 13:28:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/07/04 13:28:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/07/04 13:28:41 INFO db.DBInputFormat: Using read commited transaction isolation
18/07/04 13:28:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `students`
18/07/04 13:28:41 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1001 to: 1003
18/07/04 13:28:41 INFO mapreduce.JobSubmitter: number of splits:3
18/07/04 13:28:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530598609758_0015
18/07/04 13:28:42 INFO impl.YarnClientImpl: Submitted application application_1530598609758_0015
18/07/04 13:28:42 INFO mapreduce.Job: The url to track the job: http://oradb3:8088/proxy/application_1530598609758_0015/
18/07/04 13:28:42 INFO mapreduce.Job: Running job: job_1530598609758_0015
18/07/04 13:28:52 INFO mapreduce.Job: Job job_1530598609758_0015 running in uber mode : false
18/07/04 13:28:52 INFO mapreduce.Job: map 0% reduce 0%
18/07/04 13:28:58 INFO mapreduce.Job: map 33% reduce 0%
18/07/04 13:28:59 INFO mapreduce.Job: map 67% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job: map 100% reduce 0%
18/07/04 13:29:00 INFO mapreduce.Job: Job job_1530598609758_0015 completed successfully
18/07/04 13:29:00 INFO mapreduce.Job: Counters: 30
...
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 21.3156 seconds (1.8766 bytes/sec)
18/07/04 13:29:00 INFO mapreduce.ImportJobBase: Retrieved 3 records.
# Take the time to read and understand the log output above
# View the files on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 4 items
-rw-r--r-- 1 hadoop supergroup 0 2018-07-04 13:28 /user/hadoop/students/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 13 2018-07-04 13:28 /user/hadoop/students/part-m-00000
-rw-r--r-- 1 hadoop supergroup 13 2018-07-04 13:28 /user/hadoop/students/part-m-00001
-rw-r--r-- 1 hadoop supergroup 14 2018-07-04 13:28 /user/hadoop/students/part-m-00002
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
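The log above warns that passing the password on the command line is insecure. The help output lists two alternatives, -P and --password-file. The following is a minimal sketch of preparing a password file; write_password_file and the file location are hypothetical names chosen for this illustration, and 123456 is simply the demo password used throughout this article.

```shell
# write_password_file is a hypothetical helper: $1 = target file, $2 = password.
write_password_file() {
  rm -f "$1"
  # printf '%s' writes no trailing newline; a newline in the file
  # would otherwise be read as part of the password.
  ( umask 077; printf '%s' "$2" > "$1" )
  chmod 400 "$1"   # readable by the owner only
}

# A temporary directory is used for the demo; in practice a fixed
# path in the user's home directory is more typical (assumption).
demo_dir=$(mktemp -d)
PW_FILE="$demo_dir/mysql.pwd"
write_password_file "$PW_FILE" '123456'
ls -l "$PW_FILE"

# The file would then replace --password, for example:
# sqoop import \
#   --connect jdbc:mysql://localhost:3306/test \
#   --username root \
#   --password-file "file://$PW_FILE" \
#   --table students
```

The file:// prefix is an assumption about the setup: paths without a scheme are usually resolved against the default filesystem (HDFS), so a local file typically needs it. Alternatively, -P prompts for the password on the console.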
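We can add further parameters to make the import more controllable:
-m: number of map tasks to launch; the default is 4
--delete-target-dir: delete the target directory if it already exists
--mapreduce-job-name: name of the MapReduce job
--target-dir: import into the specified directory
--fields-terminated-by: separator placed between fields
--null-string: for string-typed columns, the string written when the value is NULL
--null-non-string: for non-string columns, the string written when the value is NULL
--columns: import only some of the table's columns
--where: import only the rows that match a condition
--query: import the result of a SQL statement; when --query is used, --table and --columns cannot be used, and --target-dir must be specified
--options-file: read the command and its options from a file
# Run the import
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table students \
> -m 1
# View in HDFS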
[hadoop@hadoop000 ~]$ hadoop fs -ls /user/hadoop/students
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-07-04 13:53 /user/hadoop/students/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 40 2018-07-04 13:53 /user/hadoop/students/part-m-00000
[hadoop@oradb3 ~]$ hadoop fs -cat /user/hadoop/students/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
# Using the --where parameter (together with --columns)
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --mapreduce-job-name FromMySQL2HDFS2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<1002'
# Result in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_WHERE/"part*"
lodd
# Using the --query parameter
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS3 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --target-dir STU_COLUMN_QUERY \
> --query "select * from students where id>1001 and \$CONDITIONS"
# View in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_COLUMN_QUERY/"part*"
1002 sdfs 21
1003 sdfsa 24
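The backslash before $CONDITIONS in the --query example matters. Sqoop expects the literal token $CONDITIONS in the query and substitutes its own predicate for each map task, so the shell must not expand it first. The sketch below uses only plain shell variables (the names q_double, q_single, and q_broken are made up for this illustration) to show what each quoting style passes on.

```shell
# Double quotes: the shell would expand $CONDITIONS, so it has to be escaped.
q_double="select * from students where id>1001 and \$CONDITIONS"

# Single quotes: nothing is expanded, so no backslash is needed.
q_single='select * from students where id>1001 and $CONDITIONS'

# Unescaped inside double quotes: the (unset) variable expands to nothing
# and the token Sqoop is looking for is lost.
unset CONDITIONS
q_broken="select * from students where id>1001 and $CONDITIONS"

printf '%s\n' "$q_double"
printf '%s\n' "$q_single"
printf '%s\n' "$q_broken"
```

The first two lines printed are identical and still contain $CONDITIONS; the third ends after "and", which is the form to avoid.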
# Using the --options-file parameter
[hadoop@hadoop000 ~]$ vi sqoop-import-hdfs.txt
import
--connect
jdbc:mysql://localhost:3306/test
--username
root
--password
123456
--table
students
--target-dir
STU_option_file
# Run the import
[hadoop@hadoop000 ~]$ sqoop --options-file /home/hadoop/sqoop-import-hdfs.txt
# View in HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat STU_option_file/"part*"
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
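An options file can also be generated from a script rather than typed in vi. The following sketch writes a variant of the file above that carries a comment line and leaves the password out; the temporary file location is an assumption made so the example does not overwrite anything.

```shell
# Demo location (assumption); the article keeps the file at
# /home/hadoop/sqoop-import-hdfs.txt.
OPTS_FILE=$(mktemp)

# One option or value per line; lines starting with # are comments.
cat > "$OPTS_FILE" <<'EOF'
# Import the students table into STU_option_file
import
--connect
jdbc:mysql://localhost:3306/test
--username
root
--table
students
--target-dir
STU_option_file
EOF

# Count the non-comment lines, i.e. the arguments Sqoop would read.
grep -vc '^#' "$OPTS_FILE"

# The password is then supplied at run time instead of being stored:
# sqoop --options-file "$OPTS_FILE" -P
```

Keeping the password out of the file and adding -P on the command line is one possible design; storing it in the file, as the article does, also works but leaves the password readable on disk.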
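5. eval
The help text describes this command as "Evaluate a SQL statement and display the results". In other words, it runs a SQL statement against the database and prints the result.
# View the command help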
[hadoop@hadoop000 ~]$ sqoop help eval
usage: sqoop eval [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
  --connect <jdbc-uri>                        Specify JDBC connect string
  --connection-manager <class-name>           Specify connection manager class name
  --connection-param-file <properties-file>   Specify connection parameters file
  --driver <class-name>                       Manually specify JDBC driver class to use
  --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
  --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
  --help                                      Print usage instructions
  -P                                          Read password from console
  --password <password>                       Set authentication password
  --password-alias <password-alias>           Credential provider password alias
  --password-file <password-file>             Set authentication password file path
  --relaxed-isolation                         Use read-uncommitted isolation for imports
  --skip-dist-cache                           Skip copying jars to distributed cache
  --username <username>                       Set authentication username
  --verbose                                   Print more information while working

SQL evaluation arguments:
  -e,--query <statement>                      Execute 'statement' in SQL and exit
# Run it
[hadoop@hadoop000 ~]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --query "select * from students"
18/07/04 14:28:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:28:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:28:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| id | name | age |
----------------------------------------------------
| 1001 | lodd | 23 |
| 1002 | sdfs | 21 |
| 1003 | sdfsa | 24 |
----------------------------------------------------
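6. export (exporting HDFS data, or Hive data, into MySQL)
Common parameters:
--table: name of the table to export into
--input-fields-terminated-by: field separator used in the HDFS files; the default is a comma
--export-dir: HDFS directory containing the data to export
--columns: columns to export
The target table must already exist in MySQL before the export runs; otherwise the export fails.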
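A minimal sketch of creating the target table follows. The column names come from the query results shown in this article, but the column types are assumptions inferred from sample rows such as 1001,lodd,23, and students_demo_ddl is a hypothetical helper name.

```shell
# students_demo_ddl is a hypothetical helper that prints the DDL.
# Assumption: the column types are guessed from the sample data.
students_demo_ddl() {
  cat <<'EOF'
CREATE TABLE IF NOT EXISTS students_demo (
  id   INT,
  name VARCHAR(50),
  age  INT
);
EOF
}

students_demo_ddl

# To apply it to the test database (prompts for the password):
# students_demo_ddl | mysql -u root -p test
```

The column order matters here: unless --columns is given, the fields in each HDFS line are matched to the table's columns by position.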
# Source file on HDFS
[hadoop@hadoop000 ~]$ hadoop fs -cat /user/hadoop/students/part-m-00000
1001,lodd,23
1002,sdfs,21
1003,sdfsa,24
# Export to MySQL
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> --export-dir /user/hadoop/students/
18/07/04 14:46:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/07/04 14:46:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/04 14:46:20 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/07/04 14:46:20 INFO tool.CodeGenTool: Beginning code generation
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students_demo` AS t LIMIT 1
18/07/04 14:46:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
18/07/04 14:46:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fc7b53dd6eef701c0731c7a7c4a4b340/students_demo.jar
18/07/04 14:46:24 INFO mapreduce.ExportJobBase: Beginning export of students_demo
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/04 14:46:25 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/07/04 14:46:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
...
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Transferred 672 bytes in 29.3122 seconds (22.9256 bytes/sec)
18/07/04 14:46:55 INFO mapreduce.ExportJobBase: Exported 3 records.
# View in MySQL
mysql> select * from students_demo;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1001 | lodd | 23 |
| 1002 | sdfs | 21 |
| 1003 | sdfsa | 24 |
+------+-------+------+
3 rows in set (0.00 sec)
Running the export a second time appends the same rows to the table again.
# Adding the --columns parameter
[hadoop@hadoop000 ~]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo2 \
> --export-dir /user/hadoop/students/ \
> --columns id,name
# Result in MySQL
mysql> select * from students_demo2;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1001 | lodd | NULL |
| 1002 | sdfs | NULL |
| 1003 | sdfsa | NULL |
+------+-------+------+
3 rows in set (0.00 sec)
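7. Importing MySQL data into Hive
Common parameters:
--create-hive-table: create the target table; the job fails if the table already exists
--hive-database: Hive database to import into
--hive-import: import into Hive (without this option the data is only imported into HDFS)
--hive-overwrite: overwrite existing data in the Hive table
--hive-table: name of the Hive table; if omitted, the name of the imported source table is used
--hive-partition-key: partition column of the Hive table
--hive-partition-value: partition value to import into
The first import may fail with the following error:
18/07/04 15:06:26 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/07/04 15:06:26 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Fix: copy a few jar files from Hive's lib directory into Sqoop's lib directory.
# Fixing the error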
[hadoop@hadoop000 lib]$ pwd
/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib
[hadoop@hadoop000 lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop000 lib]$ cp hive-shims* /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib/
# Run the import once the error is fixed
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HIVE \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0
# View in Hive
hive> show tables;
OK
stu_import
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> select * from stu_import;
OK
1001 lodd 23
1002 sdfs 21
1003 sdfsa 24
Time taken: 0.969 seconds, Fetched: 3 row(s)
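Recommendation: avoid --create-hive-table when importing into Hive and create the Hive table yourself beforehand, because the column types of an automatically created table may not be the ones you want.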
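As a rough sketch of that recommendation, the Hive table could be created up front with a statement like the one below. The database name hive and the table name stu_import come from the import command in this article; the column types and the helper name stu_import_ddl are assumptions made for illustration.

```shell
# stu_import_ddl is a hypothetical helper that prints the Hive DDL.
# Assumption: column types are chosen by hand; adjust them as needed.
# The field delimiter must match --fields-terminated-by '\t' in the import.
stu_import_ddl() {
  cat <<'EOF'
CREATE TABLE IF NOT EXISTS hive.stu_import (
  id   INT,
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
EOF
}

stu_import_ddl

# To create the table before importing:
# hive -e "$(stu_import_ddl)"
```

With the table in place, the import would be run without --create-hive-table, so the hand-written column types are kept.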
# Adding the partition parameters
[hadoop@hadoop000 ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root --password 123456 \
> --table students \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import2 \
> --mapreduce-job-name FromMySQL2HIVE2 \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"
# View in Hive
hive> select * from stu_import2;
OK
1001 lodd 23 2018-08-08
1002 sdfs 21 2018-08-08
1003 sdfsa 24 2018-08-08
Time taken: 0.192 seconds, Fetched: 3 row(s)
8. Using sqoop job
sqoop job saves a command as a named job. The command is not executed when the job is created; you can inspect the job, run it at any time, or delete it, which makes scheduling tasks convenient.
--create <job-id>: create a new job
--delete <job-id>: delete a job
--exec <job-id>: run a job
--show <job-id>: show a job's parameters
--list: list all jobs
# Create a job
[hadoop@hadoop000 ~]$ sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table students_demo \
> -m 1 \
> --delete-target-dir
# List the jobs
[hadoop@hadoop000 ~]$ sqoop job --list
Available jobs:
person_job1
# Run the job; you will be prompted for the MySQL root user's password
[hadoop@hadoop000 ~]$ sqoop job --exec person_job1
# View in HDFS
[hadoop@hadoop000 lib]$ hadoop fs -ls /user/hadoop/students_demo
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-07-04 15:34 /user/hadoop/students_demo/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 40 2018-07-04 15:34 /user/hadoop/students_demo/part-m-00000
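Notice that running person_job1 requires typing the database password. How can we avoid entering it every time?
Configuring sqoop-site.xml solves this.
# Uncomment the sqoop.metastore.client.record.password property, or add it if it is missing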
[hadoop@hadoop000 conf]$ pwd
/home/hadoop/app/sqoop-1.4.6-cdh5.7.0/conf
[hadoop@hadoop000 conf]$ vi sqoop-site.xml
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>
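Once the password can be stored in the metastore, a saved job can run unattended, which is what makes scheduling practical. The following is a minimal sketch of a cron entry for person_job1; the schedule (02:00 every day) and the log path are hypothetical, and a job created before the setting was enabled may need to be recreated for its password to be saved.

```shell
# Hypothetical schedule and log path; adjust both to your environment.
JOB_NAME=person_job1
LOG_FILE=/tmp/sqoop-person_job1.log

# Fields: minute hour day-of-month month day-of-week command
CRON_LINE="0 2 * * * sqoop job --exec $JOB_NAME >> $LOG_FILE 2>&1"

printf '%s\n' "$CRON_LINE"

# To install it, append the line to the current user's crontab:
# ( crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -
```

cron usually runs with a minimal environment, so the entry may need the full path to the sqoop executable and any required Hadoop environment variables.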
Reference: https://blog.csdn.net/yu0_zhang0/article/details/79069251