Sqoop: Small Working Examples
阿新 • Published: 2018-12-18
Notes on using Sqoop:
1. Make sure the CDH versions of the components are consistent.
2. Even with consistent versions, you still need to copy some jars into $SQOOP_HOME:
   1) from $HIVE_HOME/lib, copy hive-common-1.1.0-cdh5.7.0.jar and hive-shims-*;
   2) copy the MySQL java-connector jar;
   3) copy the JSON jar.
The external jars can be downloaded from: https://download.csdn.net/download/huonan_123/10746467
1. MySQL => HDFS
- Without specifying an import path
[hadoop@hadoop001 bin]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table emp
[hadoop@hadoop001 bin]$ hadoop fs -ls /user/hadoop/
18/10/25 05:16:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2018-10-25 05:14 /user/hadoop/emp
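As the listing above shows, when no target directory is given Sqoop writes to /user/&lt;current-user&gt;/&lt;table-name&gt; on HDFS. A quick way to inspect the result (a sketch; the exact part-file names depend on the number of map tasks):

```shell
# List the part files produced by the import (one per map task)
hadoop fs -ls /user/hadoop/emp
# Show the imported rows; the default field delimiter is a comma
hadoop fs -cat /user/hadoop/emp/part-m-00000
```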
- --target-dir => specify the HDFS target directory
Note:
[hadoop@hadoop001 bin]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp \
> --target-dir sqoop_emp
- --delete-target-dir => delete the target directory first if it already exists
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password root \
--table emp \
--delete-target-dir
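The options covered so far can be combined into a single import. A hedged sketch (the job name emp_import is illustrative, not from the original):

```shell
# Import 'emp' into an explicit HDFS directory, overwriting any
# previous run, with tab-delimited fields and a single map task.
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root \
  --password root \
  --table emp \
  --target-dir sqoop_emp \
  --delete-target-dir \
  --fields-terminated-by '\t' \
  --mapreduce-job-name emp_import \
  -m 1
```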
2. HDFS => MySQL
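Sqoop export does not create the target table, so emp_sqoop must already exist in MySQL. A possible DDL, assuming the columns mirror the classic EMP table used above (the types are assumptions; adjust them to your schema):

```shell
# Create the export target table in the 'sqoop' database
mysql -uroot -p123456 sqoop <<'SQL'
CREATE TABLE IF NOT EXISTS emp_sqoop (
  EMPNO    INT,
  ENAME    VARCHAR(10),
  JOB      VARCHAR(9),
  MGR      INT,
  HIREDATE DATE,
  SAL      DECIMAL(7,2),
  COMM     DECIMAL(7,2),
  DEPTNO   INT
);
SQL
```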
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table emp_sqoop \
--export-dir emp
[hadoop@hadoop001 sqoop-1.4.6-cdh5.7.0]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp_sqoop \
> --export-dir emp
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/10/27 23:56:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/10/27 23:56:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/10/27 23:56:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/10/27 23:56:10 INFO tool.CodeGenTool: Beginning code generation
18/10/27 23:56:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp_sqoop` AS t LIMIT 1
18/10/27 23:56:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp_sqoop` AS t LIMIT 1
18/10/27 23:56:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
Note: /tmp/sqoop-hadoop/compile/74c212a715892b55898d0c44514495ad/emp_sqoop.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/10/27 23:56:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/74c212a715892b55898d0c44514495ad/emp_sqoop.jar
18/10/27 23:56:13 INFO mapreduce.ExportJobBase: Beginning export of emp_sqoop
18/10/27 23:56:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/27 23:56:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/10/27 23:56:14 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/10/27 23:56:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/10/27 23:56:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/27 23:56:19 INFO input.FileInputFormat: Total input paths to process : 4
18/10/27 23:56:19 INFO input.FileInputFormat: Total input paths to process : 4
18/10/27 23:56:19 INFO mapreduce.JobSubmitter: number of splits:3
18/10/27 23:56:19 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/10/27 23:56:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540701601006_0031
18/10/27 23:56:21 INFO impl.YarnClientImpl: Submitted application application_1540701601006_0031
18/10/27 23:56:21 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1540701601006_0031/
18/10/27 23:56:21 INFO mapreduce.Job: Running job: job_1540701601006_0031
18/10/27 23:56:35 INFO mapreduce.Job: Job job_1540701601006_0031 running in uber mode : false
18/10/27 23:56:35 INFO mapreduce.Job: map 0% reduce 0%
18/10/27 23:56:47 INFO mapreduce.Job: map 100% reduce 0%
18/10/27 23:56:47 INFO mapreduce.Job: Job job_1540701601006_0031 completed successfully
18/10/27 23:56:47 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=417579
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1513
HDFS: Number of bytes written=0
HDFS: Number of read operations=21
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=3
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=27910
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=27910
Total vcore-seconds taken by all map tasks=27910
Total megabyte-seconds taken by all map tasks=28579840
Map-Reduce Framework
Map input records=14
Map output records=14
Input split bytes=594
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=286
CPU time spent (ms)=4520
Physical memory (bytes) snapshot=527609856
Virtual memory (bytes) snapshot=4675698688
Total committed heap usage (bytes)=452460544
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
18/10/27 23:56:47 INFO mapreduce.ExportJobBase: Transferred 1.4775 KB in 32.3879 seconds (46.7149 bytes/sec)
18/10/27 23:56:47 INFO mapreduce.ExportJobBase: Exported 14 records.
Result:
mysql> select * from emp_sqoop;
+-------+--------+-----------+------+------------+------+------+--------+
| EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
+-------+--------+-----------+------+------------+------+------+--------+
| 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600 | 300 | 30 |
| 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
| 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975 | NULL | 20 |
| 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250 | 1400 | 30 |
| 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1982-12-09 | 3000 | NULL | 20 |
| 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000 | NULL | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500 | 0 | 30 |
| 7876 | ADAMS | CLERK | 7788 | 1983-01-12 | 1100 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950 | NULL | 30 |
| 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300 | NULL | 10 |
+-------+--------+-----------+------+------------+------+------+--------+
14 rows in set (0.01 sec)
3. MySQL => Hive
Run:
[hadoop@hadoop001 sqoop-1.4.6-cdh5.7.0]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table emp \
> --delete-target-dir \
> --hive-import \
> --hive-table d5_emp_test_p \
> --fields-terminated-by '\t' \
> --columns 'EMPNO,ENAME,JOB,SAL,COMM' \
> --hive-overwrite \
> --hive-partition-key 'pt' \
> --hive-partition-value 'rz'
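Once the import finishes, the partition can be checked from the Hive CLI (a sketch, assuming the import above succeeded):

```shell
# Confirm the partition was created and spot-check a few rows
hive -e "SHOW PARTITIONS d5_emp_test_p"
hive -e "SELECT * FROM d5_emp_test_p WHERE pt='rz' LIMIT 5"
```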
Error:
Cause: a permissions problem on the HDFS directory. Delete it with `hadoop fs -rmr /tmp/hive` (the directory is recreated on the next run), then run the import again.
Result:
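The fix described above, as commands (note that `-rmr` is deprecated; `-rm -r` is the current form):

```shell
# Remove the /tmp/hive directory that carries the bad permissions
hadoop fs -rm -r /tmp/hive
# Re-running the sqoop import recreates /tmp/hive owned by the
# submitting user, after which the Hive import should go through
```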
- Common options
| Option | Purpose |
|---|---|
| --connect | MySQL JDBC URL, e.g. jdbc:mysql://localhost:3306/sqoop |
| --username | username |
| --password | password |
| --table | MySQL table to read |
| --export-dir | HDFS path to export from |
| --mapreduce-job-name | name for the MR job, viewable in the YARN web UI (port 8088) |
| --fields-terminated-by | field delimiter for imported records |
| --where | SQL WHERE condition |
| -m | use 'n' map tasks to import in parallel |
| -e | import the results of a SQL query |
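Several of these options combine into a free-form query import. With `-e` (an alias for `--query`), Sqoop requires the literal `$CONDITIONS` token in the WHERE clause, plus either `--split-by` or `-m 1`. A sketch (the target directory name is illustrative):

```shell
# Single quotes keep the shell from expanding $CONDITIONS;
# Sqoop substitutes it with per-split range predicates.
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root \
  --password 123456 \
  -e 'SELECT EMPNO, ENAME, SAL FROM emp WHERE SAL > 2000 AND $CONDITIONS' \
  --target-dir emp_high_sal \
  --delete-target-dir \
  -m 1
```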