Compiling Hadoop and obtaining the native libraries
About this document
This document was written from memory after the build was finished and has not been verified with a fresh rebuild. The overall steps should be correct; corrections are welcome.
The original goal of the build was to obtain a suitable native package and to see how a hand-compiled native package differs from the bundled one.
The versions built are apache-hadoop-2.7.6 and hadoop-2.6.0-cdh5.5.0.
Build notes
The source tree contains a BUILDING.txt file; most of the information on required dependencies, build commands, and so on can be found there.
For example, the requirements:
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
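The version minimums above can be sanity-checked up front. The following is a minimal sketch under the assumption of a POSIX shell with GNU `sort -V` available; `version_ge` is our own helper name, not a standard tool:

```shell
# version_ge A B: succeeds when version string A >= B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Minimums from the requirements list, checked against the versions
# recorded in the build environment below.
version_ge "2.5.0" "2.5.0"  && echo "protoc 2.5.0 ok"
version_ge "3.2.5" "3.0"    && echo "maven 3.2.5 ok"
version_ge "2.8.12.2" "2.6" && echo "cmake 2.8.12.2 ok"
```

In practice you would feed `version_ge` the output of `protoc --version`, `mvn -v`, and `cmake -version` instead of literals.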
Build environment
$ uname -a # Linux e3base01 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ java -version # java version "1.7.0_04"
$ mvn -v # Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
# findbugs is not installed.
$ protoc --version # libprotoc 2.5.0
$ cmake -version # cmake version 2.8.12.2
$ yum list installed | grep zlib-devel # zlib-devel.x86_64 1.2.3-29.el6 @base
$ yum list installed | grep openssl-devel # openssl-devel.x86_64 1.0.1e-57.el6 @base
$ yum list installed | grep fuse # fuse.x86_64 2.8.3-4.el6 @anaconda-CentOS-201311272149.x86_64/6.5 # shipped with the OS; not installed deliberately.
## Reserve at least 5 GB for the build directory; after the build it occupies close to 4 GB.
Installing the dependencies
jdk
Download the tarball and untar it.
Configure the environment variables, e.g. add to ~/.bash_profile:
export JAVA_HOME=/path/to/jdk
export PATH=$JAVA_HOME/bin:$PATH
$ source ~/.bash_profile # take effect immediately.
Verify:
$ java -version
maven
Download the tarball and untar it.
Configure the environment variables, e.g. add to ~/.bash_profile:
export MAVEN_HOME=/path/to/maven
export MAVEN_OPTS="-Xmx4g -Xms4g"
export PATH=$MAVEN_HOME/bin:$PATH
$ source ~/.bash_profile # take effect immediately.
Verify:
$ mvn -v
protobuf
Download the tarball and untar it, then build and install (run as root):
# ./configure
# make
# make install
Verify:
$ protoc --version
cmake, zlib-devel, openssl-devel
$ yum install cmake zlib-devel openssl-devel
Verify:
$ yum list installed | egrep 'cmake|zlib-devel|openssl-devel'
Build commands
## Build and generate the native libraries. If the snappy or openssl packages are unavailable, the build fails outright. # Not tested here.
$ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Drequire.openssl
## Build and generate the native libraries, bundling the system's snappy, openssl, and related shared libraries into the native directory. Make sure snappy.lib and openssl.lib point at the system's actual lib directory.
$ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl
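After a successful build, the generated native libraries land in the distribution module's target directory inside the source tree (the version in the path is whatever you built; 2.7.6 below is illustrative):

```shell
# Path of the freshly built native libraries inside the source tree.
VERSION=2.7.6   # substitute the version you actually built
NATIVE_DIR="hadoop-dist/target/hadoop-${VERSION}/lib/native"
echo "$NATIVE_DIR"
# ls "$NATIVE_DIR"   # expect libhadoop.so* among others
```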
Dependencies failing to download during the build
The build may hit errors that typically show "download xxx" and then fail, reporting that a dependency could not be downloaded.
The build requires JDK 1.7, and some repositories no longer accept access from JDK 1.7 (reportedly a network-protocol issue); switching to JDK 1.8 makes the problem go away.
Since this build pins JDK 1.7, the only options are the workarounds below.
Switch repositories, e.g. add the Aliyun repository to Maven's configuration. When fetching a dependency, the same artifact is tried against several repositories in turn; once the Aliyun one is tried, the download succeeds.
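For example, a mirror entry in Maven's settings.xml (the URL is the commonly used Aliyun endpoint; verify it is still current):

```xml
<!-- In $MAVEN_HOME/conf/settings.xml or ~/.m2/settings.xml, inside <settings> -->
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Aliyun public repository</name>
    <url>https://maven.aliyun.com/repository/public</url>
  </mirror>
</mirrors>
```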
Downloads occasionally hang; Ctrl-C out and restart the build.
Sometimes switching repositories does not help either, and the only fix is to place the artifact into the local Maven repository by hand.
A convenient way is to copy the dependency URL shown in the build output, cd into the matching directory of the local Maven repository, and fetch the artifact with wget.
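A sketch of that workflow (`m2_dir` is our own helper name, and the coordinate below is only an example):

```shell
# Map a Maven coordinate (group, artifact, version) to its directory in the
# local repository, so a stuck artifact can be fetched there by hand.
m2_dir() {
  _group=$1; _artifact=$2; _version=$3
  _repo=${M2_REPO:-$HOME/.m2/repository}
  # groupId dots become path separators in the repository layout.
  echo "$_repo/$(echo "$_group" | tr . /)/$_artifact/$_version"
}

# Example: where protobuf-java 2.5.0 lives in the local repository.
m2_dir com.google.protobuf protobuf-java 2.5.0
# cd "$(m2_dir com.google.protobuf protobuf-java 2.5.0)"
# wget <the dependency URL printed in the build error>
```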
Alternatively, copy a reasonably complete repository over from somewhere else. The repository provided in the attachment may not be complete, but it covers most of the dependencies and saves some time.
Build results
# apache-2.7.6 build completed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 1.413 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 1.060 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 1.873 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 3.796 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.214 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 1.974 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 5.477 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 8.330 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 5.958 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 2.701 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:28 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 4.182 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.096 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:20 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 38.823 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 4.453 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 3.027 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.058 s]
[INFO] hadoop-yarn ........................................ SUCCESS [ 0.036 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:54 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.703 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [ 0.040 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 8.185 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 15.118 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 4.324 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 16.786 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 14.241 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 4.823 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 5.175 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [ 2.580 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.046 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 1.914 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 1.467 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [ 0.041 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 3.651 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 3.318 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.198 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 13.687 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 14.004 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 3.039 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 6.411 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 5.591 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 4.838 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 2.332 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 4.209 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [ 2.373 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 3.223 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 7.516 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 1.895 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 3.894 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 2.840 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 1.690 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 1.636 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 3.118 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 5.782 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 4.930 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [01:19 min]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 3.164 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 7.182 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.910 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 5.668 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 7.558 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.031 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [04:35 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:52 min
[INFO] Finished at: 2018-12-12T16:40:32+08:00
[INFO] Final Memory: 242M/3915M
[INFO] ------------------------------------------------------------------------
# cdh-5.5.0 build completed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 1.799 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 0.969 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 2.744 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.317 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 1.852 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 2.922 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 3.499 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 3.985 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 2.539 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:23 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 4.392 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.646 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:28 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [01:18 min]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 3.934 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 3.894 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.061 s]
[INFO] hadoop-yarn ........................................ SUCCESS [ 0.043 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:01 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.434 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [ 0.027 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 8.025 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 23.947 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 7.871 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 6.325 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 16.706 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 0.896 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 4.011 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.033 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 2.224 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 1.695 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [ 0.042 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 4.132 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 4.410 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.159 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 18.474 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 19.756 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 3.631 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 7.248 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 4.414 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 4.481 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 1.721 s]
[INFO] hadoop-mapreduce-client-nativetask ................. SUCCESS [01:05 min]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 3.870 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [ 4.490 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 3.039 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 6.679 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 1.838 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [ 2.622 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 30.751 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 58.028 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 2.962 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 1.581 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 2.339 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 5.728 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 3.728 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 7.629 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 3.251 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 4.500 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 1.157 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 3.543 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 9.379 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.088 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [03:02 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:23 min
[INFO] Finished at: 2018-12-12T17:22:39+08:00
[INFO] Final Memory: 258M/4065M
[INFO] ------------------------------------------------------------------------
Obtaining the official bundled native libraries
hadoop-2.7.6
Download the tarball and untar it; the bundled native libraries are at:
/path/to/hadoop/lib/native
hadoop-2.6.0-cdh5.5.0
Download the rpm package: https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.5.0/RPMS/x86_64/hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm
$ rpm2cpio hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm | cpio -div
# the bundled native libraries are extracted to ./usr/lib/hadoop/lib/native
Comparing the compiled native libraries
hadoop-2.7.6
When the flags -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl are added,
the resulting native package gains files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* compared with a build without them.
libk5crypto.so is a dangling symlink; you have to locate the shared library it points to by hand and copy it into the native directory.
Without those flags, compared with the bundled shared libraries, the file sizes change, but no shared library is missing or added.
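The fix for the dangling libk5crypto.so link can be demonstrated in isolation. On a real node the link typically lives under a system lib directory such as /usr/lib64, and the copy target is the build's native directory; the temp-directory paths below are illustrative:

```shell
# Demo: resolve a symlink with readlink -f and copy the real file into native/.
tmp=$(mktemp -d)
mkdir "$tmp/native"
echo "library bytes" > "$tmp/libk5crypto.so.3.1"        # stands in for the real .so
ln -s "$tmp/libk5crypto.so.3.1" "$tmp/libk5crypto.so"   # the dangling-prone symlink
real=$(readlink -f "$tmp/libk5crypto.so")               # follow the link chain
cp "$real" "$tmp/native/libk5crypto.so"                 # put the real file in native/
ls "$tmp/native"
```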
hadoop-2.6.0-cdh5.5.0
When the flags -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl are added,
the resulting native package gains files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* compared with a build without them.
libk5crypto.so is a dangling symlink; you have to locate the shared library it points to by hand and copy it into the native directory.
Without those flags, compared with the bundled shared libraries, libhdfs.so and libhdfs.so.0.0.0 are added, while libsnappy.so, libsnappy.so.1, and libsnappy.so.1.1.3 are missing.
Checking the native libraries after cluster deployment
$ hadoop checknative ## if everything shows true, the native libraries are working.
18/12/12 19:02:31 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/12/12 19:02:31 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libcrypto.so
Generally, as long as Hadoop's own shared library (the bundled libhadoop) is usable, the other libraries such as zlib, snappy, lz4, bzip2, and openssl can all be loaded from the system's shared libraries in their default install locations.
If you prefer everything in one place, copy the system shared libraries into the native directory; checknative will then use the copies under native.
Whether manually copying shared libraries into native and specifying the bundle flags at build time behave identically has not been tested.
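A quick way to gate on the checknative output is to fail when any line reports false. `all_native_ok` is our own helper name, shown here against a canned transcript so it can be tried offline; on a cluster, pipe `hadoop checknative` in directly:

```shell
# Fail if any library line in checknative output says "false".
all_native_ok() {
  ! grep -qw false
}

printf '%s\n' \
  'hadoop:  true /lib/native/libhadoop.so.1.0.0' \
  'zlib:    true /lib64/libz.so.1' \
  'snappy:  true /lib64/libsnappy.so.1' \
  | all_native_ok && echo "native ok"
# On a real node: hadoop checknative 2>&1 | all_native_ok || echo "native problem"
```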
Attachment: most of the dependency artifacts needed for the build.