
Sqoop incremental import from MySQL to Hive fails with: Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject

1. Problem description:

(1) Example:

Step 1: Create the job:

[Hadoop@master TestDir]$ sqoop job \
> --create myjob_1 \
> -- import \
> --connect "jdbc:mysql://master:3306/source?useSSL=false&user=Hive&password=******" \
> --table sales_order \
> --columns "order_number,customer_number,product_code,order_date,entry_date,order_amount" \
> --where "entry_date < current_date()" \
> --hive-import \
> --hive-table rds.sales_order \
> --incremental append \
> --check-column entry_date \
> --last-value '1900-01-01'
2021-11-06 15:08:48,003 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2021-11-06 15:08:48,242 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
2021-11-06 15:08:48,242 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject

at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 12 more

Step 2: List the jobs:
[Hadoop@master TestDir]$ sqoop job --list
2021-11-06 15:09:01,663 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Available jobs:
myjob_1

Step 3: Check the saved timestamp:
[Hadoop@master TestDir]$ sqoop job --show myjob_1 |grep last.value
2021-11-06 15:09:10,169 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
at org.apache.sqoop.tool.JobTool.showJob(JobTool.java:261)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:302)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more

Step 4: Execute the job:
[Hadoop@master TestDir]$ sqoop job --exec myjob_1
2021-11-06 15:09:20,062 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
--table or --query is required for import. (Or use sqoop import-all-tables.)
Try --help for usage instructions.

Step 5: Check the timestamp again:
[Hadoop@master TestDir]$ sqoop job --show myjob_1 |grep last.value
2021-11-06 15:09:28,496 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
at org.apache.sqoop.tool.JobTool.showJob(JobTool.java:261)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:302)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more

(2) Summary:

As the steps above show, every job operation in the incremental import from MySQL to Hive fails on the same root cause:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject

(Note that Step 4's "--table or --query is required" message is a side effect, not a separate problem: job creation threw while writing the job's properties, so the saved job contains no import options.) Resolving the NoClassDefFoundError is the key to the whole problem.

2. Problem analysis:

(1) Reference: https://www.cnblogs.com/QuestionsZhang/p/10082735.html

(2) Analysis: Based on the reference and the stack trace above (the failure originates in org.apache.sqoop.util.SqoopJsonUtil), the cause is almost certainly that java-json.jar is missing from the lib directory under the Sqoop installation path. The fix is simply to obtain a copy of java-json.jar and place it in that lib directory. The retest below confirms this diagnosis.

3. Solution:

(1) Download java-json.jar and place it in the lib directory of the Sqoop installation:

Download link: http://www.java2s.com/Code/JarDownload/java-json/java-json.jar.zip
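Once downloaded and unzipped, the whole fix is a single copy into Sqoop's lib directory. A hedged sketch (the install_jar helper and the SQOOP_HOME path are illustrative, not part of Sqoop):

```shell
#!/bin/sh
# install_jar: copy a jar into a lib directory and verify it arrived.
# The helper is illustrative; Sqoop itself just needs the jar in lib/.
install_jar() {
  # $1: path to the jar, $2: target lib directory
  cp "$1" "$2"/ && [ -e "$2/$(basename "$1")" ]
}

# Typical usage after downloading the zip from the link above:
#   unzip java-json.jar.zip                      # yields java-json.jar
#   install_jar java-json.jar "$SQOOP_HOME/lib"
# No restart is needed: the sqoop launcher rebuilds its classpath from
# lib/*.jar on every invocation.
```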

(2) Retest:

Step 1: Create the job:

[Hadoop@master tmp]$ sqoop job \
> --create myjob_1 \
> -- import \
> --connect "jdbc:mysql://master:3306/source?useSSL=false&user=Hive&password=******" \
> --table sales_order \
> --columns "order_number,customer_number,product_code,order_date,entry_date,order_amount" \
> --where "entry_date < current_date()" \
> --hive-import \
> --hive-table rds.sales_order \
> --incremental append \
> --check-column entry_date \
> --last-value '1900-01-01'
2021-11-06 15:49:52,350 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2021-11-06 15:49:52,632 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
2021-11-06 15:49:52,632 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.

Step 2: List the jobs:

[Hadoop@master tmp]$ sqoop job --list
2021-11-06 15:50:10,531 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Available jobs:
myjob_1

Step 3: Check the saved timestamp:

[Hadoop@master tmp]$ sqoop job --show myjob_1 |grep last.value
2021-11-06 15:50:29,832 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
incremental.last.value = 1900-01-01

Step 4: Execute the job:

[Hadoop@master tmp]$ sqoop job --exec myjob_1
2021-11-06 16:05:23,188 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2021-11-06 16:05:23,815 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2021-11-06 16:05:23,851 INFO tool.CodeGenTool: Beginning code generation
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2021-11-06 16:05:27,962 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2021-11-06 16:05:29,150 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
......(output omitted)
2021-11-06 16:09:35,112 INFO hive.HiveImport: OK
2021-11-06 16:09:35,273 INFO hive.HiveImport: Time taken: 1.611 seconds
2021-11-06 16:09:35,274 INFO hive.HiveImport: 2021-11-06 16:09:35,273 INFO [af634dbf-7808-4849-bd2e-f2b12b7435a8 main] CliDriver (SessionState.java:printInfo(1227)) - Time taken: 1.611 seconds
2021-11-06 16:09:35,274 INFO hive.HiveImport: 2021-11-06 16:09:35,273 INFO [af634dbf-7808-4849-bd2e-f2b12b7435a8 main] conf.HiveConf (HiveConf.java:getLogIdVar(5040)) - Using the default value passed in for log id: af634dbf-7808-4849-bd2e-f2b12b7435a8
2021-11-06 16:09:35,274 INFO hive.HiveImport: 2021-11-06 16:09:35,273 INFO [af634dbf-7808-4849-bd2e-f2b12b7435a8 main] session.SessionState (SessionState.java:resetThreadName(452)) - Resetting thread name to main
2021-11-06 16:09:35,274 INFO hive.HiveImport: 2021-11-06 16:09:35,274 INFO [main] conf.HiveConf (HiveConf.java:getLogIdVar(5040)) - Using the default value passed in for log id: af634dbf-7808-4849-bd2e-f2b12b7435a8
2021-11-06 16:09:35,319 INFO hive.HiveImport: 2021-11-06 16:09:35,318 INFO [main] session.SessionState (SessionState.java:dropPathAndUnregisterDeleteOnExit(885)) - Deleted directory: /user/hive/tmp/grid/af634dbf-7808-4849-bd2e-f2b12b7435a8 on fs with scheme hdfs
2021-11-06 16:09:35,319 INFO hive.HiveImport: 2021-11-06 16:09:35,319 INFO [main] session.SessionState (SessionState.java:dropPathAndUnregisterDeleteOnExit(885)) - Deleted directory: /tmp/hive/Local/af634dbf-7808-4849-bd2e-f2b12b7435a8 on fs with scheme file
2021-11-06 16:09:35,319 INFO hive.HiveImport: 2021-11-06 16:09:35,319 INFO [main] metastore.HiveMetaStoreClient (HiveMetaStoreClient.java:close(600)) - Closed a connection to metastore, current connections: 1
2021-11-06 16:09:35,389 INFO hive.HiveImport: Hive import complete.
2021-11-06 16:09:35,392 INFO hive.HiveImport: Export directory is empty, removing it.
2021-11-06 16:09:35,408 INFO tool.ImportTool: Saving incremental import state to the metastore
2021-11-06 16:09:35,454 INFO tool.ImportTool: Updated data for job: myjob_1

Step 5: Check the timestamp again:

[Hadoop@master tmp]$ sqoop job --show myjob_1 |grep last.value
2021-11-06 16:09:45,037 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
incremental.last.value = 2016-06-30 17:26:24.0

The second timestamp check verifies that Sqoop really did import the MySQL data into Hive incrementally: incremental.last.value has advanced from the initial 1900-01-01 to the latest entry_date among the imported rows, so the next execution will only pull newer records. For further confirmation you can also query the data directly in Hive, for example by counting the rows in rds.sales_order.
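The append-mode bookkeeping itself is easy to illustrate: a subsequent run imports only rows whose check column is strictly greater than the saved last.value. A small standalone sketch of that comparison (not Sqoop code; timestamps in this format compare correctly as strings):

```shell
#!/bin/sh
# next_batch: read one entry_date per line on stdin and print the ones a
# subsequent --incremental append run would import (strictly greater than
# the saved last.value). Illustrative only -- Sqoop applies this filter
# server-side in the generated WHERE clause.
next_batch() {
  # $1: saved incremental.last.value
  while IFS= read -r ts; do
    # expr exits 0 when the string comparison is true
    expr "$ts" \> "$1" >/dev/null && echo "$ts"
  done
  return 0
}

# With last.value = 2016-06-30 17:26:24.0, only newer rows qualify:
printf '2016-06-29\n2016-07-01\n' | next_batch '2016-06-30 17:26:24.0'
```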