
Sqoop: Small Examples

Note: when working with Sqoop:
1. The CDH versions of all components must match.
2. Even when they match, a few jars still need to be copied into $SQOOP_HOME/lib:
   1) from $HIVE_HOME/lib, copy hive-common-1.1.0-cdh5.7.0.jar and hive-shims-*;
   2) copy the MySQL JDBC connector jar;
   3) copy the JSON jar.
The external jars can be downloaded from: https://download.csdn.net/download/huonan_123/10746467
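A minimal sketch of that jar setup, assuming standard $HIVE_HOME and $SQOOP_HOME layouts; the connector and JSON jar file names below are illustrative:

# copy the Hive jars Sqoop needs out of Hive's lib directory
cp $HIVE_HOME/lib/hive-common-1.1.0-cdh5.7.0.jar $SQOOP_HOME/lib/
cp $HIVE_HOME/lib/hive-shims-*.jar $SQOOP_HOME/lib/
# copy the MySQL JDBC driver and the JSON jar (file names are illustrative)
cp mysql-connector-java-5.1.27-bin.jar $SQOOP_HOME/lib/
cp java-json.jar $SQOOP_HOME/lib/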

1. MySQL => HDFS

  1. Import without specifying a target path
[hadoop@hadoop001 bin]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table emp

[hadoop@hadoop001 bin]$ hadoop fs -ls /user/hadoop/
18/10/25 05:16:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-10-25 05:14 /user/hadoop/emp
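To check what the import actually wrote, the files under the auto-created directory can be listed and dumped (a quick sanity check, assuming the default /user/hadoop home seen above):

# list the part files Sqoop generated for the emp table, then print their contents
hadoop fs -ls /user/hadoop/emp
hadoop fs -cat /user/hadoop/emp/part-m-*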
  2. --target-dir => specify the HDFS output path
    Note: pass the path as --target-dir sqoop_emp (if omitted, a directory named after the source table is created automatically)
[hadoop@hadoop001 bin]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp \
> --target-dir sqoop_emp
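Note that sqoop_emp is a relative path, so it resolves under the running user's HDFS home directory; a quick check, assuming user hadoop:

# the relative --target-dir lands under /user/hadoop/
hadoop fs -ls /user/hadoop/sqoop_emp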

  3. --delete-target-dir => delete the target directory if it already exists
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table emp \
--delete-target-dir 
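In practice --delete-target-dir is combined with --target-dir so the same import can be re-run safely; a sketch:

# re-runnable import: the old output directory is dropped before writing
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table emp \
--delete-target-dir \
--target-dir sqoop_emp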

2. HDFS => MySQL

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table emp_sqoop \
--export-dir emp 
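Note that the target table must already exist in MySQL before the export runs; a minimal sketch that clones the emp schema, assuming emp lives in the same sqoop database:

# create the export target with the same column layout as emp
mysql -uroot -p123456 sqoop -e "CREATE TABLE IF NOT EXISTS emp_sqoop LIKE emp;"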

[hadoop@hadoop001 sqoop-1.4.6-cdh5.7.0]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp_sqoop \
> --export-dir emp 
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/10/27 23:56:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/10/27 23:56:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/10/27 23:56:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/10/27 23:56:10 INFO tool.CodeGenTool: Beginning code generation
18/10/27 23:56:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp_sqoop` AS t LIMIT 1
18/10/27 23:56:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp_sqoop` AS t LIMIT 1
18/10/27 23:56:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
Note: /tmp/sqoop-hadoop/compile/74c212a715892b55898d0c44514495ad/emp_sqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/10/27 23:56:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/74c212a715892b55898d0c44514495ad/emp_sqoop.jar
18/10/27 23:56:13 INFO mapreduce.ExportJobBase: Beginning export of emp_sqoop
18/10/27 23:56:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/27 23:56:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/10/27 23:56:14 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/10/27 23:56:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/27 23:56:19 INFO input.FileInputFormat: Total input paths to process : 4
18/10/27 23:56:19 INFO input.FileInputFormat: Total input paths to process : 4
18/10/27 23:56:19 INFO mapreduce.JobSubmitter: number of splits:3
18/10/27 23:56:19 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/10/27 23:56:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540701601006_0031
18/10/27 23:56:21 INFO impl.YarnClientImpl: Submitted application application_1540701601006_0031
18/10/27 23:56:21 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1540701601006_0031/
18/10/27 23:56:21 INFO mapreduce.Job: Running job: job_1540701601006_0031
18/10/27 23:56:35 INFO mapreduce.Job: Job job_1540701601006_0031 running in uber mode : false
18/10/27 23:56:35 INFO mapreduce.Job:  map 0% reduce 0%
18/10/27 23:56:47 INFO mapreduce.Job:  map 100% reduce 0%
18/10/27 23:56:47 INFO mapreduce.Job: Job job_1540701601006_0031 completed successfully
18/10/27 23:56:47 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=417579
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1513
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=21
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=3
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=27910
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=27910
		Total vcore-seconds taken by all map tasks=27910
		Total megabyte-seconds taken by all map tasks=28579840
	Map-Reduce Framework
		Map input records=14
		Map output records=14
		Input split bytes=594
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=286
		CPU time spent (ms)=4520
		Physical memory (bytes) snapshot=527609856
		Virtual memory (bytes) snapshot=4675698688
		Total committed heap usage (bytes)=452460544
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
18/10/27 23:56:47 INFO mapreduce.ExportJobBase: Transferred 1.4775 KB in 32.3879 seconds (46.7149 bytes/sec)
18/10/27 23:56:47 INFO mapreduce.ExportJobBase: Exported 14 records.

Result:

mysql> select * from emp_sqoop;
+-------+--------+-----------+------+------------+------+------+--------+
| EMPNO | ENAME  | JOB       | MGR  | HIREDATE   | SAL  | COMM | DEPTNO |
+-------+--------+-----------+------+------------+------+------+--------+
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800 | NULL |     20 |
|  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600 |  300 |     30 |
|  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250 |  500 |     30 |
|  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975 | NULL |     20 |
|  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250 | 1400 |     30 |
|  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850 | NULL |     30 |
|  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450 | NULL |     10 |
|  7788 | SCOTT  | ANALYST   | 7566 | 1982-12-09 | 3000 | NULL |     20 |
|  7839 | KING   | PRESIDENT | NULL | 1981-11-17 | 5000 | NULL |     10 |
|  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500 |    0 |     30 |
|  7876 | ADAMS  | CLERK     | 7788 | 1983-01-12 | 1100 | NULL |     20 |
|  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950 | NULL |     30 |
|  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000 | NULL |     20 |
|  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300 | NULL |     10 |
+-------+--------+-----------+------+------------+------+------+--------+
14 rows in set (0.01 sec)

3. MySQL => Hive

Run:

[hadoop@hadoop001 sqoop-1.4.6-cdh5.7.0]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp \
> --delete-target-dir \
> --hive-import  \
> --hive-table d5_emp_test_p \
> --fields-terminated-by '\t' \
> --columns 'EMPNO,ENAME,JOB,SAL,COMM' \
> --hive-overwrite \
> --hive-partition-key 'pt' \
> --hive-partition-value 'rz'

Error: (screenshot of the error omitted)

Cause: a permission problem on the HDFS directory. Delete it with hadoop fs -rmr /tmp/hive; the directory is recreated on the next run. Re-run the job and it succeeds. Result: (screenshot omitted)
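The fix and a follow-up check as commands (a sketch; the scratch directory can alternatively be opened up with chmod instead of being deleted):

# remove the scratch directory with the broken permissions; it is recreated on the next run
hadoop fs -rmr /tmp/hive
# (alternative) relax permissions instead of deleting:
# hadoop fs -chmod -R 777 /tmp/hive

# after re-running the import, verify the partition from the Hive side
hive -e "SELECT * FROM d5_emp_test_p WHERE pt='rz' LIMIT 5;"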

  • Common options
Option                    Purpose
--connect                 JDBC URL of the MySQL database, e.g. jdbc:mysql://localhost:3306/sqoop
--username                MySQL username
--password                MySQL password
--table                   name of the MySQL table to read
--export-dir              HDFS path to export from
--mapreduce-job-name      name for the MR job; visible in the YARN web UI on port 8088
--fields-terminated-by    field delimiter for the imported data
--where                   SQL WHERE condition to filter the rows
-m                        use 'n' map tasks to import in parallel
-e                        run a free-form SQL query instead of --table (the query must contain $CONDITIONS)
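Putting several of these options together, a hedged sketch that imports only department 20 with two parallel map tasks and a custom job name (assuming EMPNO is the table's primary key, so the work can be split without an explicit --split-by):

# filtered, parallel, re-runnable import with a recognizable job name
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table emp \
--where 'DEPTNO=20' \
--delete-target-dir \
--target-dir emp_dept20 \
--mapreduce-job-name emp_dept20_import \
-m 2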