
Compiling Hadoop and Obtaining the Native Libraries

Document Notes
    This document was written from memory after the build was finished and has not been verified by recompiling. The overall steps should be correct; corrections are welcome.
    The original goal of the build was to obtain a suitable native package and to see how a hand-compiled native package differs from the bundled one.
    The versions built were apache-hadoop-2.7.6 and hadoop-2.6.0-cdh5.5.0.

Build Notes
    The source tree contains a BUILDING.txt file; most of the information you need, including dependencies and build commands, can be found there.
    For example, the dependencies:

        Requirements:
        * Unix System
        * JDK 1.7+
        * Maven 3.0 or later
        * Findbugs 1.3.9 (if running findbugs)
        * ProtocolBuffer 2.5.0
        * CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
        * Zlib devel (if compiling native code)
        * openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
        * Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
        * Internet connection for first build (to fetch all Maven and Hadoop dependencies)

Build Environment
    $ uname -a    # Linux e3base01 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    $ java -version    # java version "1.7.0_04"
    $ mvn -v    # Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 2014-12-15T01:29:23+08:00)
    # findbugs was not installed.
    $ protoc --version    # libprotoc 2.5.0
    $ cmake -version    # cmake version 2.8.12.2
    $ yum list installed | grep zlib-devel    # zlib-devel.x86_64       1.2.3-29.el6    @base
    $ yum list installed | grep openssl-devel    # openssl-devel.x86_64    1.0.1e-57.el6   @base
    $ yum list installed | grep fuse    # fuse.x86_64             2.8.3-4.el6     @anaconda-CentOS-201311272149.x86_64/6.5    # shipped with the system; not installed deliberately.
    ## Reserve at least 5 GB for the build directory; it occupies close to 4 GB once the build finishes.
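    The environment checks above can be bundled into a small script that fails fast before a long compile. A sketch; the `check_missing` helper is invented here, and only command presence (not versions) is checked:

```shell
#!/bin/sh
# check_missing CMD... -> prints the names that are not on PATH.
# Helper invented for this sketch; run it before kicking off the build.
check_missing() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    # unquoted echo collapses the leading space
    echo $missing
}

# the tools the Hadoop native build needs:
check_missing java mvn protoc cmake gcc make
```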

Installing the Dependencies
    jdk
        Download the tarball and unpack it.
        Configure the environment variables, e.g. add to ~/.bash_profile:
            export JAVA_HOME=/path/to/jdk
            export PATH=$JAVA_HOME/bin:$PATH
        $ source ~/.bash_profile    # takes effect immediately.
        Verify:
            $ java -version
    maven
        Download the tarball and unpack it.
        Configure the environment variables, e.g. add to ~/.bash_profile:
            export MAVEN_HOME=/path/to/maven
            export MAVEN_OPTS="-Xmx4g -Xms4g"
            export PATH=$MAVEN_HOME/bin:$PATH
        $ source ~/.bash_profile    # takes effect immediately.
        Verify:
            $ mvn -v
    protobuf
        Download the tarball and unpack it.
        # ./configure
        # make
        # make install
        Verify:
            $ protoc --version
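    The Hadoop 2.x build checks the protoc version and aborts on a mismatch; it requires 2.5.0 exactly, so it is worth guarding up front. A sketch (the `protoc_ok` helper is invented here; it inspects the text that `protoc --version` prints):

```shell
#!/bin/sh
# protoc_ok VERSION_OUTPUT -> succeeds iff the output comes from protoc 2.5.0,
# the exact version the Hadoop 2.x native build requires.
protoc_ok() {
    echo "$1" | grep -q '^libprotoc 2\.5\.0$'
}

# usage against the real binary:
#   protoc_ok "$(protoc --version)" || echo "need protoc 2.5.0" >&2
```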
    cmake, zlib-devel, openssl-devel
        $ yum install cmake zlib-devel openssl-devel
        Verify:
            $ yum list installed | egrep 'cmake|zlib-devel|openssl-devel'

Build Commands
    ## Build and generate the native libraries. If the snappy or openssl packages are unavailable, the build fails outright.    # not tested.
    $ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Drequire.openssl
    ## Build and generate the native libraries, bundling the system's snappy and openssl shared libraries into the native directory. Make sure snappy.lib and openssl.lib point at the system's actual lib directory.
    $ mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl
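    After a successful build, the distribution, including the native libraries, lands under hadoop-dist/target inside the source tree (the same path pattern that appears in the checknative output later in this document). A sketch for locating and inspecting them; `native_dir` is a helper invented here:

```shell
#!/bin/sh
# native_dir VERSION -> directory (relative to the source tree root) where
# the build drops the native libraries. Helper invented for this sketch.
native_dir() {
    echo "hadoop-dist/target/hadoop-$1/lib/native"
}

# inspection after a build, e.g.:
#   ls -l "$(native_dir 2.7.6)"
#   file "$(native_dir 2.7.6)"/libhadoop.so.1.0.0   # expect an ELF 64-bit shared object
```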

Dependencies That Fail to Download During the Build
    The build can fail with errors of the form "download xxx" followed by a message that a dependency could not be downloaded.
    The build requires JDK 1.7, and some repositories reject access from JDK 1.7 clients; people online say it is a network protocol issue, and that switching to JDK 1.8 fixes it.
    Since the build here is pinned to 1.7, the only options are the workarounds below:
        Switch repositories, e.g. add the Aliyun mirror to Maven's settings file. When downloading, Maven cycles through the configured repositories for the same artifact, and the download succeeds once it reaches the Aliyun mirror.
        Downloads occasionally hang; Ctrl-C out and restart the build.
        Sometimes switching repositories does not help either, and the only option is to place the artifact into the local Maven repository by hand.
        A convenient way is to copy the dependency URL from the build output, change into the corresponding directory under the local Maven repository, and fetch the artifact with wget.
        Alternatively, copy a reasonably complete repository over from somewhere else. The repository in the attachment may not be complete, but it covers most of the dependencies and can save some time.
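    The "wget into the local repository" workaround can be sketched as follows. The `m2_path` helper and the avro coordinates are illustrative, not taken from the original build output; the path rule (dots in the groupId become directory separators) is standard Maven repository layout:

```shell
#!/bin/sh
# m2_path GROUP ARTIFACT VERSION -> directory inside the local Maven
# repository where that artifact belongs. Helper invented for this sketch.
m2_path() {
    echo "$HOME/.m2/repository/$(echo "$1" | tr . /)/$2/$3"
}

# e.g. for a failed download of org.apache.avro:avro:1.7.4 (illustrative):
#   dir=$(m2_path org.apache.avro avro 1.7.4)
#   mkdir -p "$dir"
#   wget -P "$dir" "<the artifact URL printed in the build error>"
```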

Build Results
# apache-2.7.6 build finished.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.413 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.060 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  1.873 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  3.796 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.214 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.974 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  5.477 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  8.330 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  5.958 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  2.701 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:28 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  4.182 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.096 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:20 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 38.823 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [  4.453 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.027 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.058 s]
[INFO] hadoop-yarn ........................................ SUCCESS [  0.036 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:54 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.703 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.040 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [  8.185 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 15.118 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  4.324 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 16.786 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 14.241 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  4.823 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [  5.175 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [  2.580 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.046 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  1.914 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.467 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.041 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [  3.651 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [  3.318 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.198 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 13.687 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 14.004 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  3.039 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [  6.411 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  5.591 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  4.838 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  2.332 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  4.209 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  2.373 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  3.223 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  7.516 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.895 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  3.894 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  2.840 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  1.690 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.636 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  3.118 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  5.782 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  4.930 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [01:19 min]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  3.164 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [  7.182 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.910 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  5.668 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  7.558 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.031 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [04:35 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:52 min
[INFO] Finished at: 2018-12-12T16:40:32+08:00
[INFO] Final Memory: 242M/3915M
[INFO] ------------------------------------------------------------------------


# cdh-5.5.0 build finished.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.799 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  0.969 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  2.744 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.317 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.852 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  2.922 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  3.499 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  3.985 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  2.539 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:23 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  4.392 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 10.646 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.040 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [04:28 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [01:18 min]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [  3.934 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  3.894 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.061 s]
[INFO] hadoop-yarn ........................................ SUCCESS [  0.043 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [02:01 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 18.434 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [  0.027 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [  8.025 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 23.947 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  7.871 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [  6.325 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 16.706 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [  0.896 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [  4.011 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.033 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  2.224 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  1.695 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.042 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [  4.132 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [  4.410 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.159 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 18.474 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 19.756 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  3.631 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [  7.248 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [  4.414 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [  4.481 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  1.721 s]
[INFO] hadoop-mapreduce-client-nativetask ................. SUCCESS [01:05 min]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  3.870 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  4.490 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  3.039 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  6.679 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  1.838 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  2.622 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 30.751 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 58.028 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  2.962 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  1.581 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  2.339 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  5.728 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  3.728 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [  7.629 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  3.251 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [  4.500 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  1.157 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  3.543 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  9.379 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.088 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [03:02 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:23 min
[INFO] Finished at: 2018-12-12T17:22:39+08:00
[INFO] Final Memory: 258M/4065M
[INFO] ------------------------------------------------------------------------

Obtaining the Official Bundled Native Libraries
    hadoop-2.7.6
        Download the tarball and unpack it.
        The native libraries are under /path/to/hadoop/lib/native.
    hadoop-2.6.0-cdh5.5.0
        Download the rpm package: https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.5.0/RPMS/x86_64/hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm
        $ rpm2cpio hadoop-2.6.0+cdh5.5.0+921-1.cdh5.5.0.p0.15.el6.x86_64.rpm | cpio -div
        The native libraries are under ./usr/lib/hadoop/lib/native.

Comparing the Compiled Native Libraries
    hadoop-2.7.6
        With the -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl options added,
            compared with a native package built without them, files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* are added.
            libk5crypto.so is a dangling symlink; find the shared library it points at by hand and copy it into the native directory.
        Without those options, compared with the bundled libraries, the shared library file sizes change, but no shared library is missing or added.
    hadoop-2.6.0-cdh5.5.0
        With the -Drequire.snappy -Dsnappy.lib=/usr/lib64 -Dbundle.snappy -Drequire.openssl -Dopenssl.lib=/usr/lib64 -Dbundle.openssl options added,
            compared with a native package built without them, files such as libcrypto.so*, libk5crypto.so*, and libsnappy.so* are added.
            libk5crypto.so is a dangling symlink; find the shared library it points at by hand and copy it into the native directory.
        Without those options, compared with the bundled libraries, libhdfs.so and libhdfs.so.0.0.0 are added, while libsnappy.so, libsnappy.so.1, and libsnappy.so.1.1.3 are missing.
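    The dangling-symlink problem above can be detected mechanically. A sketch assuming GNU find; the `dangling` helper and the libk5crypto.so.3 soname in the usage comment are invented for illustration:

```shell
#!/bin/sh
# dangling DIR -> list symlinks under DIR whose targets do not exist
# (GNU find's -xtype l matches exactly broken links).
# Helper invented for this sketch.
dangling() {
    find "$1" -xtype l 2>/dev/null
}

# fixing a broken link by hand, e.g. for libk5crypto.so: look the real
# library up in the linker cache and copy it next to the link
# (soname illustrative):
#   cp "$(ldconfig -p | awk '$1 == "libk5crypto.so.3" {print $NF; exit}')" lib/native/
```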

Checking native After Deploying the Cluster
    $ hadoop checknative    ## if every entry shows true, the native libraries are working.

18/12/12 19:02:31 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/12/12 19:02:31 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /src/hadoop-2.6.0-cdh5.5.0/hadoop-dist/target/hadoop-2.6.0-cdh5.5.0/lib/native/libcrypto.so


    Generally, as long as Hadoop's own shared library (the bundled libhadoop) is usable, the other libraries such as zlib, snappy, lz4, bzip2, and openssl can all be loaded from the system's shared libraries in their default install locations.
    If you prefer everything in one place, copy the system shared libraries into the native directory; checknative will then use the copies under native.
    Whether manually copying shared libraries into native behaves identically to specifying the bundle options at build time has not been tested.

 

Attachment: most of the dependency artifacts needed for the build.

https://download.csdn.net/download/anyuzun/10846380