
Spark Upgrade: Troubleshooting a java.lang.NoSuchMethodError

Background

Spark 2.3 introduced several useful new features, such as ORC read/write optimizations, bucket joins in SQL, and Continuous Processing. Since our workloads need some of them, we prepared an upgrade: the patches already carried on the Spark version running in production were merged onto the spark v2.3.2 branch, and the resulting build was rolled out to the test environment for testing.

Test environment configuration:

  • hadoop2.6.0-cdh5.13.1
  • spark-2.3.2-bin-2.6.0-cdh5.13.1 (built from the community Spark tag v2.3.2 against hadoop2.6.0-cdh5.13.1)
  • Spark applications are run in spark on yarn mode

Run the following code in spark-shell:

> spark.sql("select count(*) from test_db.test_table where dt='20180924'").show

The tasks on the executor side all succeed, as can be seen in the Spark UI, but the driver fails with the following error:

java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
  at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
  at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$$decodeUnsafeRows(SparkPlan.scala:274)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
  ... 49 elided

Problem Analysis

Checking the spark-shell process's classpath

Investigate with the following commands:

# Get the spark-shell process id ${pid}
$ jps -ml | grep -i spark-shell

# Query the process's classpath and system properties with jinfo
$ jinfo ${pid} | grep -i "lz4"
# The classpath contains one and only one lz4 entry: /xxx/spark-2.3.2-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar
# Next, check whether the unresolvable class is actually inside that jar
$ grep 'net\/jpountz\/lz4\/LZ4BlockInputStream'  /xxx/spark-2.3.2-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar
# Output: Binary file /xxx/spark-2.3.2-vip-1.0.0-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar matches; the class we need is indeed on the classpath

The analysis above shows that the class the JVM fails to resolve is in fact present on the process's classpath, so why does the error still occur?
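grep only proves that a class file named LZ4BlockInputStream is packed into the jar; it does not tell us whether the two-argument constructor from the error message is actually there. As an extra sanity check (a sketch, reusing the jar path from above), javap can dump the class's signatures straight from the jar:

# List the public members of LZ4BlockInputStream as shipped with the Spark distribution;
# the LZ4BlockInputStream(java.io.InputStream, boolean) constructor should show up,
# since lz4-java 1.4.0 is the version that introduced it
$ javap -classpath /xxx/spark-2.3.2-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar net.jpountz.lz4.LZ4BlockInputStream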

Could it be that the jar was never loaded?

# ${pid} is your JVM process id; use pgrep or jps to find it
$ /usr/sbin/lsof -p ${pid} | grep lz4
java    100541 hdfs  mem    REG                8,3    59545   23294926 /tmp/liblz4-java1940655373971688476.so
java    100541 hdfs  mem    REG                8,3   370119   21442570 /xxx/spark-2.3.2-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar
java    100541 hdfs  171r   REG                8,3   370119   21442570 /xxx/spark-2.3.2-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar

The output above confirms that the jar has indeed been loaded by the JVM, and no lz4*.jar of a conflicting version shows up.
So why is the method still not found?
Since it is beyond doubt that the JVM cannot resolve net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V, the only option left is to go through the other external dependencies loaded by the JVM and check whether one of them introduces a conflicting jar that prevents the required class from being loaded correctly.
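Another way to see which jar actually supplies the class at runtime (not part of the original session, just a sketch) is to restart spark-shell with class-loading tracing enabled and grep the log once the error has been reproduced:

# -verbose:class makes the JVM print every class it loads together with the jar it came from
$ spark-shell --driver-java-options "-verbose:class" 2>&1 | tee /tmp/spark-shell-classload.log
# reproduce the failing query, then in another terminal:
$ grep 'LZ4BlockInputStream' /tmp/spark-shell-classload.log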

How to quickly find the conflicting jar that is pulled in implicitly

$ /usr/sbin/lsof -p ${pid} | grep '\.jar' | awk '{print $9}' | grep -v 'jdk' | xargs grep -w 'net\/jpountz\/lz4\/LZ4BlockInputStream'
# output:
# Binary file /xxx/spark-2.3.2-vip-1.0.0-bin-2.6.0-cdh5.13.1/jars/lz4-java-1.4.0.jar matches
# Binary file /xxx/yyy/spark-plugins-0.1.0-SNAPSHOT-jar-with-dependencies.jar matches

At this point the cause is basically clear: the spark-plugins module bundles its own copy of the lz4 classes, which leads to the net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V not found error.
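Before touching the build, the javap trick from earlier can be reused to confirm that the copy bundled in the fat jar is the one missing the constructor (a sketch; the path is the one reported by lsof above):

# Dump the constructors of the LZ4BlockInputStream bundled in spark-plugins;
# if the (java.io.InputStream, boolean) variant is absent, this copy comes from an
# older lz4 release and shadows the lz4-java-1.4.0.jar shipped with Spark
$ javap -classpath /xxx/yyy/spark-plugins-0.1.0-SNAPSHOT-jar-with-dependencies.jar net.jpountz.lz4.LZ4BlockInputStream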

Open the spark-plugins project and analyze its dependency tree; the result is as follows:

......
[INFO] +- org.apache.kafka:kafka-clients:jar:0.9.0.1:compile
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.1.7:compile
[INFO] |  \- net.jpountz.lz4:lz4:jar:1.2.0:compile
[INFO] \- org.apache.commons:commons-pool2:jar:2.4.1:compile
......
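For larger projects, scanning the full tree by eye is tedious; the Maven dependency plugin can filter it down to the offending group id directly (run from the spark-plugins project root):

# Print only the branches of the dependency graph that pull in net.jpountz.lz4 artifacts
$ mvn dependency:tree -Dincludes=net.jpountz.lz4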

Modify the kafka-clients dependency in pom.xml to exclude the lz4 dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>${kafka.version}</version>
        <exclusions>
            <exclusion>
                <groupId>net.jpountz.lz4</groupId>
                <artifactId>lz4</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

Rebuild spark-plugins-0.1.0-SNAPSHOT-jar-with-dependencies.jar, deploy it to the test environment, and restart spark-shell.
The test passes; the jar conflict is fixed.
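As an extra sanity check before redeploying, the assembled jar itself can be inspected to confirm that the old lz4 classes are gone (a sketch assuming the standard Maven target/ layout; adjust the path to wherever the fat jar is built):

# No output means the exclusion took effect and no lz4 classes are bundled any more
$ unzip -l target/spark-plugins-0.1.0-SNAPSHOT-jar-with-dependencies.jar | grep 'net/jpountz/lz4'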

Summary

Most components in the big data ecosystem run on the JVM, and jar dependency conflicts come up from time to time. Having a repeatable way to quickly analyze and pinpoint this class of problem keeps day-to-day development moving smoothly, hence this write-up. If there are any mistakes or omissions, corrections are welcome.

2018.9.29