雲帆大資料學院: Compiling the Hadoop 2.2.0 Source Code
2.1 Download Locations
1. Apache Hadoop (100% open source, permanently) download locations:
- http://hadoop.apache.org/releases.html
- SVN: http://svn.apache.org/repos/asf/hadoop/common/branches/
2. CDH (Cloudera's Distribution including Apache Hadoop, 100% open source, permanently) download locations:
- http://archive.cloudera.com/cdh4/cdh/4/ (tar.gz files!)
- http://archive.cloudera.com/cdh5/cdh/ (tar.gz files!)
2.2 Notes on the Official Release
(1) Official site: http://hadoop.apache.org
(2) Download the Hadoop package
(3) Problems with the official release
The official release is compiled in a 32-bit Linux environment, so running it on 64-bit Linux produces an error:
- Warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.
- The native libraries inside the official binary package are 32-bit, which can be verified with the following command:
$ file $HADOOP_PREFIX/lib/native/libhadoop.so.1.0.0
The output shows that the library is a 32-bit build:
libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x9eb1d49b05f67d38454e42b216e053a27ae8bac9, not stripped
2.3 Official Build Instructions
The hadoop-2.2.0-src.tar.gz package contains a BUILDING.txt file that documents the build steps in detail:
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements (prerequisites):
* Unix System (this guide uses 64-bit community CentOS 6.4)
* JDK 1.6+
* Maven 3.0 or later (version 3.0.5 is recommended)
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
----------------------------------------------------------------------------------
Maven main modules:
hadoop                     (Main Hadoop project)
- hadoop-project           (Parent POM for all Hadoop Maven modules.)
                           (All plugins & dependencies versions are defined here.)
- hadoop-project-dist      (Parent POM for modules that generate distributions.)
- hadoop-annotations       (Generates the Hadoop doclet used to generate the Javadocs)
- hadoop-assemblies        (Maven assemblies used by the different modules)
- hadoop-common-project    (Hadoop Common)
- hadoop-hdfs-project      (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist              (Hadoop distribution assembler)
----------------------------------------------------------------------------------
Where to run Maven from?
Maven can be run from any module. The only catch is that if it is not run from the top level (trunk), all modules that are not part of the build run must already be installed in the local Maven cache or available in a Maven repository.
----------------------------------------------------------------------------------
Maven build goals:
* Clean                    : mvn clean
* Compile                  : mvn compile [-Pnative]
* Run tests                : mvn test [-Pnative]
* Create JAR               : mvn package
* Run findbugs             : mvn compile findbugs:findbugs
* Run checkstyle           : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache  : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover               : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat                  : mvn apache-rat:check
* Build javadocs           : mvn javadoc:javadoc
* Build distribution       : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
* Change Hadoop version    : mvn versions:set -DnewVersion=NEWVERSION
Build options:
* Use -Pnative to compile/bundle native code
* Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
* Use -Psrc to create a project source TAR.GZ
* Use -Dtar to create a TAR with the distribution (using -Pdist)
Snappy build options:
Snappy is a compression library that can be utilized by the native code. It is currently an optional component, meaning that Hadoop can be built with or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found. If this option is not specified and the snappy library is missing, we silently build a version of libhadoop.so that cannot make use of snappy. This option is recommended if you plan on making use of snappy and want to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy header files and library files. You do not need this option if you have installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library files. Similarly to snappy.prefix, you do not need this option if you have installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into the final tar file. This option requires that -Dsnappy.lib is also given, and it ignores the -Dsnappy.prefix option.
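Combining these flags, a build that requires snappy and bundles it from a nonstandard location might look like the following (a sketch; /usr/local/snappy is an assumed install prefix):
$ mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy \
      -Dsnappy.prefix=/usr/local/snappy -Dsnappy.lib=/usr/local/snappy/lib -Dbundle.snappy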
---------------------------------------------------------------------------------
Building components separately
If you are building a submodule directory, all the Hadoop dependencies this submodule has will be resolved as all other 3rd-party dependencies are: from the Maven cache or from a Maven repository (if not available in the cache, or if the SNAPSHOT has 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Hadoop source top level once, and then work from the submodule. Keep in mind that SNAPSHOTs time out after a while; using the Maven '-nsu' option will stop Maven from trying to update SNAPSHOTs from external repos. An example of this workflow follows below.
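For example (a sketch; hadoop-common-project/hadoop-common is one of the submodules listed earlier):
$ mvn install -DskipTests                  # once, from the source top level
$ cd hadoop-common-project/hadoop-common   # then iterate inside the submodule
$ mvn package -DskipTests -nsu             # -nsu: do not update SNAPSHOTs from external repos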
----------------------------------------------------------------------------------
Protocol Buffer compiler
The version of the Protocol Buffer compiler, protoc, must match the version of the protobuf JAR.
If you have multiple versions of protoc on your system, you can set the HADOOP_PROTOC_PATH environment variable in your build shell to point to the one you want to use for the Hadoop build. If you don't define this environment variable, protoc is looked up in the PATH.
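For example, assuming a protoc 2.5.0 installed under /usr/local/protobuf-2.5.0 (the path is an assumption):
$ export HADOOP_PROTOC_PATH=/usr/local/protobuf-2.5.0/bin/protoc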
----------------------------------------------------------------------------------
Importing projects to Eclipse
When you import the project into Eclipse, install hadoop-maven-plugins first:
$ cd hadoop-maven-plugins
$ mvn install
Then, generate the Eclipse project files:
$ mvn eclipse:eclipse -DskipTests
Finally, import into Eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
----------------------------------------------------------------------------------
Building distributions:
Create a binary distribution without native code and without documentation (binaries only):
$ mvn package -Pdist -DskipTests -Dtar
Create a binary distribution with native code and with documentation (binaries + native libraries + docs):
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create a source distribution (source only):
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation (source + binaries + native libraries + docs):
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site):
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Handling out-of-memory errors in builds
If the build process fails with an out-of-memory error, you should be able to fix it by increasing the memory available to Maven, which is done via the MAVEN_OPTS environment variable.
Here is an example setting that allocates between 256 and 512 MB of heap space to Maven:
export MAVEN_OPTS="-Xms256m -Xmx512m"
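To make the setting persist across shell sessions, it can also be appended to the shell profile (a sketch, assuming bash):
echo 'export MAVEN_OPTS="-Xms256m -Xmx512m"' >> ~/.bashrc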
----------------------------------------------------------------------------------
2.4 Build Steps
Step 1: Install VMware 10 (omitted)
Step 2: Install a 64-bit Linux operating system (omitted)
This guide uses the 64-bit community CentOS 6.4. Download: http://www.centoscn.com/CentosSoft/
Step 3: Connect the Linux guest to the network
(1) Set the VMware virtual machine network mode to NAT.
(2) Configure the Linux OS to obtain its address dynamically from the DHCP server, sharing the host's network connection; a minimal configuration sketch is shown after this list.
(3) Test: ping www.baidu.com
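A minimal sketch of the interface configuration for (2), assuming the device is eth0 (an assumption; restart the network service after editing):
vi /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    BOOTPROTO=dhcp
    ONBOOT=yes
service network restart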
Step 4: Install the JDK
Note: use JDK 1.6 or later, 64-bit build (this environment uses jdk-6u45-linux-x64.bin).
(1) Use an FTP tool (WinSCP or FileZilla) to upload jdk-6u45-linux-x64.bin to the /software/ directory on the Linux system.
(2) Install the JDK:
cd /software/
chmod u+x jdk-6u45-linux-x64.bin    # grant execute permission
mkdir /workDir                      # create an installation directory (personal preference)
cp jdk-6u45-linux-x64.bin /workDir  # copy the installer to /workDir
cd /workDir
./jdk-6u45-linux-x64.bin            # run the self-extracting file
mv jdk1.6.0_45 jdk6u45              # rename the directory for convenience
(3) Configure the environment variables:
vi /etc/profile
Add the following:
export JAVA_HOME=/workDir/jdk6u45
export PATH=.:$PATH:$JAVA_HOME/bin
(4) Apply the environment variables:
source /etc/profile
(5) Verify that the JDK installed successfully:
java -version
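If the JDK is installed correctly, the command prints the version, which should look something like: java version "1.6.0_45"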
Step 5: Install the dependency packages
yum install autoconf -y
yum install automake -y
yum install libtool -y
yum install cmake -y
yum install ncurses-devel -y
yum install openssl-devel -y
yum install gcc -y
yum install gcc-c++ -y
yum install lzo-devel -y
yum install zlib-devel -y
Note: -y automatically answers "yes" to the prompts during installation.
Verify:
rpm -qa | grep autoconf
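The same packages can also be installed with a single command:
yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel gcc gcc-c++ lzo-devel zlib-devel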
[About the yum command]:
yum (Yellowdog Updater, Modified) is a shell front-end package manager used in Fedora, Red Hat, and SUSE. Built on RPM package management, it automatically downloads RPM packages from a configured server and installs them, resolving dependencies and installing all dependent packages in one pass, with no need to download and install each one by hand. yum provides short, memorable commands for querying, installing, and removing a single package, a group of packages, or even all packages.
The general form of a yum command is: yum [options] [command] [package ...]
[options] is optional and includes -h (help), -y (answer "yes" to all prompts), -q (quiet; do not show progress), and so on. [command] is the operation to perform, and [package ...] is its target.
- Some common commands:
Install the fastest-mirror plugin: yum install yum-fastestmirror
Install the yum GUI plugin: yum install yumex
List installable package groups: yum grouplist
- Installation
yum install              install everything
yum install package1     install the specified package package1
yum groupinstall group1  install the package group group1
Step 6: Install Maven
(1) Maven version: apache-maven-3.0.5-bin.tar.gz
Note: do not use the newer Maven 3.1.1. The Hadoop 2.2.0 source has compatibility problems with Maven 3.1.x that produce:
java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
Maven 3.0.5 is recommended.
(2) Download
Location: http://maven.apache.org/download.cgi
Choose apache-maven-3.0.5-bin.tar.gz.
(3) Upload to Linux and extract into the installation directory:
tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /workDir
(4) Set the environment variables:
vi /etc/profile
Add:
export MAVEN_HOME=/workDir/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
Run: source /etc/profile  or  . /etc/profile
Verify:
mvn -v
Step 7: Configure a domestic Maven mirror
(1) Edit the settings.xml file
Go to the installation directory /workDir/apache-maven-3.0.5/conf.
* In the <mirrors> section, add:
<mirror>
  <id>nexus-osc</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexus osc</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
* In the <profiles> section, add:
<profile>
  <id>jdk-1.6</id>
  <activation>
    <jdk>1.6</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>
(2) Copy the configuration
Note: copy settings.xml into the user's home directory so that every Maven invocation for that user picks up this configuration.
cd /home/hadoop    # check whether the home directory /home/hadoop contains a .m2 directory; if not, create it
mkdir .m2
cp /workDir/apache-maven-3.0.5/conf/settings.xml ~/.m2    # copy the file
(3) Configure DNS
vi /etc/resolv.conf
Set the following:
nameserver 8.8.8.8
nameserver 8.8.4.4
Step 8: Install protobuf
(1) Download protobuf-2.5.0.tar.gz:
https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
(2) Extract into the installation directory:
cd /software
tar -zxvf protobuf-2.5.0.tar.gz -C /workDir
(3) Install the following three dependency packages (skip any that are already installed):
yum install gcc -y
yum install gcc-c++ -y
yum install make -y
Note: if these three packages are missing, the build fails with the following error:
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common
(4) Run the configure script:
cd /workDir/protobuf-2.5.0    # enter the installation directory
./configure                   # run the configure script
(5) Build and install:
make && make check && make install
Note: building protobuf requires the gcc and gcc-c++ packages (no need to reinstall them if they are already present).
(6) Configure the environment variables:
vi /etc/profile
Add:
export PROTOBUF_HOME=/workDir/protobuf-2.5.0
export PATH=$PATH:$PROTOBUF_HOME/bin
Apply the configuration:
source /etc/profile  or  . /etc/profile
Verify:
protoc --version
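The command should print the version:
libprotoc 2.5.0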
Step 9: Install findbugs-3.0.0
(1) Download findbugs-3.0.0.tar.gz:
http://sourceforge.jp/projects/sfnet_findbugs/releases/
(2) Extract into the installation directory:
cd /software
tar -zxvf findbugs-3.0.0.tar.gz -C /workDir
(3) Set the environment variables:
vi /etc/profile
Add the following:
export FINDBUGS_HOME=/workDir/findbugs-3.0.0
export PATH=$PATH:$FINDBUGS_HOME/bin
(4) Apply the environment variables:
source /etc/profile  or  . /etc/profile
(5) Verify:
findbugs -version
Important note:
If the findbugs command fails with a JDK version error, the installed JDK is too old: FindBugs 3.0.0 is compiled for JDK 7 and later, so JDK 7 must be installed on the Linux machine to run it.
Step 10: Compile the hadoop-2.2.0 source
(1) Download hadoop-2.2.0-src.tar.gz:
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
(2) Extract into the installation directory:
cd /software
tar -zxvf hadoop-2.2.0-src.tar.gz -C /workDir
(3) Patch the source
Important: the hadoop-2.2.0 source contains a bug, documented on the official Apache JIRA:
JIRA: https://issues.apache.org/jira/browse/HADOOP-10110
- The official fix:
Index: hadoop-common-project/hadoop-auth/pom.xml
===================================================================
--- hadoop-common-project/hadoop-auth/pom.xml (revision 1543124)
+++ hadoop-common-project/hadoop-auth/pom.xml (working copy)
@@ -54,6 +54,11 @@
     </dependency>
     <dependency>
       <groupId>org.mortbay.jetty</groupId>
+      <artifactId>jetty-util</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.mortbay.jetty</groupId>
       <artifactId>jetty</artifactId>
       <scope>test</scope>
     </dependency>
As the official fix above shows, edit the pom.xml file in $HADOOP_SRC_HOME/hadoop-common-project/hadoop-auth and add the following after line 55:
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
Otherwise the build fails with the following error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
[ERROR] /home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[84,13] cannot access org.mortbay.component.AbstractLifeCycle
[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
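Alternatively, instead of editing pom.xml by hand, the official diff above can be saved to a file and applied with the patch tool (a sketch; the file name and its location under /software are assumptions):
cd /workDir/hadoop-2.2.0-src
patch -p0 < /software/HADOOP-10110.patch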
(4) Build
The official instruction for a full distribution is:
Create source and binary distributions with native code and documentation (source + binaries + native libraries + docs):
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
This guide builds the binary distribution with native libraries:
cd /workDir/hadoop-2.2.0-src
mvn package -DskipTests -Pdist,native -Dtar
Note: if the build runs out of memory, increase the heap available to Maven:
export MAVEN_OPTS="-Xms256m -Xmx512m"
The build takes quite a while, since the dependencies have to be downloaded from the network.
When the following output appears, the build has succeeded:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:53.144s
[INFO] Finished at: Fri Nov 22 16:58:32 CST 2013
[INFO] Final Memory: 70M/239M
[INFO] ------------------------------------------------------------------------
Step 11: After the build
1. Inspect the build output
The build output is under hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0:
cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
ll    # list the build output
The hadoop-2.2.0 directory contains:
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 bin
drwxr-xr-x. 3 root root 4096 Aug 11 12:00 etc
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 include
drwxr-xr-x. 3 root root 4096 Aug 11 12:00 lib
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 libexec
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 sbin
drwxr-xr-x. 4 root root 4096 Aug 11 12:00 share
Enter the bin directory and run the hadoop script to check the version:
cd bin
./hadoop version
The version information is printed:
# ./hadoop version
Hadoop 2.2.0
Subversion Unknown -r Unknown
Compiled by root on 2014-08-11T18:34Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
2. Check the native library build
cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
file lib/native/*
The libraries are now 64-bit versions:
# file lib/native/*
lib/native/libhadoop.a:        current ar archive
lib/native/libhadooppipes.a:   current ar archive
lib/native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib/native/libhadooputils.a:   current ar archive
lib/native/libhdfs.a:          current ar archive
lib/native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'
lib/native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
The build is complete!
Reposted from: https://blog.51cto.com/yfteach01/1629703