Hadoop Single-Node (Pseudo-Distributed) Deployment on CentOS 7
Download Hadoop (2.9.2)
https://hadoop.apache.org/releases.html
Disable the firewall (alternatively, open only the required ports)
# Stop the firewall
systemctl stop firewalld
# Disable firewall autostart on boot
systemctl disable firewalld
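If you prefer to keep firewalld running, a minimal sketch of opening only the ports used later in this guide (the RPC port comes from fs.defaultFS below; the web UI ports are the Hadoop 2.x defaults):

# Keep firewalld running and open only the ports this guide uses
firewall-cmd --permanent --add-port=9000/tcp    # HDFS NameNode RPC (fs.defaultFS)
firewall-cmd --permanent --add-port=50070/tcp   # NameNode web UI
firewall-cmd --permanent --add-port=8088/tcp    # YARN ResourceManager web UI
firewall-cmd --permanent --add-port=19888/tcp   # JobHistory web UI
firewall-cmd --reload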
Edit the hosts file so that the hostname hadoop maps to the machine's real IP address (not 127.0.0.1)
vim /etc/hosts

127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
xxx.xxx.xxx.xxx hadoop
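As a quick sanity check, confirm that the hostname resolves to the LAN address you entered rather than the loopback address:

# Verify hostname resolution (should print the IP you put in /etc/hosts)
getent hosts hadoop
ping -c 1 hadoop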
Install the JDK
https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
# Extract
tar -zxf /opt/jdk-8u202-linux-x64.tar.gz -C /opt/

# Configure environment variables
vim /etc/profile

# JAVA_HOME
export JAVA_HOME=/opt/jdk1.8.0_202/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

# Reload environment variables
source /etc/profile

# Verify
java -version
# java version "1.8.0_202"
# Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
# Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
Install Hadoop
# Extract
tar -zxf /opt/hadoop-2.9.2-snappy-64.tar.gz -C /opt/

# Configure environment variables
vim /etc/profile

# HADOOP_HOME
export HADOOP_HOME=/opt/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# Reload environment variables
source /etc/profile

# Verify (this package was built locally, so your output may differ)
hadoop version
# Hadoop 2.9.2
# Subversion Unknown -r Unknown
# Compiled by root on 2018-12-16T09:39Z
# Compiled with protoc 2.5.0
# From source with checksum 3a9939967262218aa556c684d107985
# This command was run using /opt/hadoop-2.9.2/share/hadoop/common/hadoop-common-2.9.2.jar
Configure Hadoop in pseudo-distributed mode
Configure HDFS
hadoop-env.sh
vim /opt/hadoop-2.9.2/etc/hadoop/hadoop-env.sh

# Set the JDK path
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_202/
core-site.xml
<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoopTmp</value>
    </property>
</configuration>
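Optionally, you can pre-create the runtime directory referenced by hadoop.tmp.dir; the NameNode format step below will also create it if it is missing:

# Optional: pre-create the directory configured as hadoop.tmp.dir
mkdir -p /opt/hadoopTmp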
hdfs-site.xml
<configuration>
    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Start HDFS
# Format the NameNode before first use
hdfs namenode -format

# Start the NameNode
hadoop-daemon.sh start namenode
# Start the DataNode
hadoop-daemon.sh start datanode

# Verify by listing the JVM processes
jps
# 84609 Jps
# 84242 NameNode
# 84471 DataNode
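As a quick smoke test, create a directory on HDFS and list the root to confirm the filesystem responds (the /test path is just an arbitrary example):

# Create and list a directory on HDFS
hdfs dfs -mkdir /test
hdfs dfs -ls /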
Open the CentOS machine's IP address plus the port (50070 by default) in a browser to reach the web UI.
Configure YARN
yarn-env.sh
vim /opt/hadoop-2.9.2/etc/hadoop/yarn-env.sh

# Set the JDK path
# some Java parameters
export JAVA_HOME=/opt/jdk1.8.0_202/
yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value>
    </property>
</configuration>
Start YARN (HDFS must already be running)
# Start the ResourceManager
yarn-daemon.sh start resourcemanager
# Start the NodeManager
yarn-daemon.sh start nodemanager

# Check the JVM processes
jps
# 1604 DataNode
# 1877 ResourceManager
# 3223 Jps
# 1468 NameNode
# 2172 NodeManager
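Another way to confirm that the NodeManager has registered with the ResourceManager is to list the cluster nodes from the command line:

# List the nodes registered with YARN; the single NodeManager should appear as RUNNING
yarn node -list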
Open the CentOS machine's IP address plus the port (8088 by default) in a browser to reach the web UI.
Configure MapReduce
mapred-env.sh
vim /opt/hadoop-2.9.2/etc/hadoop/mapred-env.sh

# Set the JDK path
export JAVA_HOME=/opt/jdk1.8.0_202/
# when HADOOP_JOB_HISTORYSERVER_HEAPSIZE is not defined, set it.
mapred-site.xml
# Copy the template
cp /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml
# Edit it
vim /opt/hadoop-2.9.2/etc/hadoop/mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Run a MapReduce job
# Estimate pi
hadoop jar /opt/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 100
# Job Finished in 26.542 seconds
# Estimated value of Pi is 3.14800000000000000000
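The same examples jar also ships a wordcount job. A sketch that assumes a local file named words.txt and an HDFS output directory /output that does not exist yet:

# Upload a local file to HDFS and count its words with the bundled example
hdfs dfs -mkdir -p /input
hdfs dfs -put words.txt /input/
hadoop jar /opt/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /input /output
# Inspect the result
hdfs dfs -cat /output/part-r-00000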
Open the CentOS machine's IP address plus the port (8088 by default) in a browser to see the job record.
Other configuration
Configure the JobHistory server to enable job history
mapred-site.xml
<configuration>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop:10020</value>
    </property>
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop:19888</value>
    </property>
</configuration>
# Start the JobHistory server
mr-jobhistory-daemon.sh start historyserver

# JVM processes
jps
# 7376 NodeManager
# 6903 DataNode
# 18345 Jps
# 6797 NameNode
# 7086 ResourceManager
# 18254 JobHistoryServer
Open the CentOS machine's IP address plus the port (19888 by default) in a browser to reach the web UI.
Configure log aggregation so that job run details can be viewed in the web UI
yarn-site.xml
<configuration>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log retention time (7 days) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
# All services need to be restarted
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
yarn-daemon.sh stop resourcemanager
yarn-daemon.sh stop nodemanager
mr-jobhistory-daemon.sh stop historyserver

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver

# Run another job; its details can now be viewed
hadoop jar /opt/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 100

You can now view the details of the job you just ran; jobs run before log aggregation was enabled will not show details.
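Aggregated logs can also be fetched from the command line; the application ID below is a placeholder, so substitute the one printed by your job or shown in the ResourceManager UI:

# Fetch the aggregated logs for a finished application (placeholder ID)
yarn logs -applicationId application_XXXXXXXXXXXXX_0001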
Reference: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html