Hadoop Installation and Common Operation Commands

I. Outline

1. HDFS cluster environment setup

2. Common issues

3. HDFS shell commands


II. Cluster Environment Setup

Download: https://hadoop.apache.org/releases.html

 

1. Initialize directories

Create the working directories under /bigdata/hadoop-3.2.2/:

mkdir -p logs secret hadoop_data hadoop_data/tmp hadoop_data/namenode hadoop_data/datanode

 

2. Set the default authentication user

vi /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret

root

 

The 'simple' pseudo-authentication mode requires the caller to identify a user (see core-site.xml below); for stronger authentication, use Kerberos instead. Append ?user.name=root to Hadoop web URLs.

For example: http://yuxuan01:8088/cluster?user.name=root
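The same setup as a shell sketch (the path matches what core-site.xml references below; 'root' is the secret string this article uses):

mkdir -p /bigdata/hadoop-3.2.2/secret
echo 'root' > /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret
chmod 600 /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret   # readable only by the service user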

 

3. Set environment variables on all servers

vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

export HADOOP_HOME=/bigdata/hadoop-3.2.2

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

source /etc/profile
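A quick sanity check on each server that the variables took effect (a sketch; hadoop resolves only once $HADOOP_HOME/bin is on PATH):

echo $JAVA_HOME
echo $HADOOP_HOME        # expect /bigdata/hadoop-3.2.2
hadoop version           # should report Hadoop 3.2.2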

 

4. Configure the env scripts

1) Add the JAVA_HOME variable at the top of httpfs-env.sh, mapred-env.sh, and yarn-env.sh (a shell sketch covering both steps follows step 2)

Directory: $HADOOP_HOME/etc/hadoop

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

 

2) Add JAVA_HOME and HADOOP_HOME to hadoop-env.sh

Directory: $HADOOP_HOME/etc/hadoop

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

export HADOOP_HOME=/bigdata/hadoop-3.2.2
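Equivalent shell for both steps, a sketch (for plain variable exports, appending to the files works the same as adding at the top):

cd $HADOOP_HOME/etc/hadoop
for f in httpfs-env.sh mapred-env.sh yarn-env.sh hadoop-env.sh; do
  echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre' >> "$f"
done
echo 'export HADOOP_HOME=/bigdata/hadoop-3.2.2' >> hadoop-env.sh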

 

5. Configure startup users

Add the following at the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

YARN_RESOURCEMANAGER_USER=root

YARN_NODEMANAGER_USER=root

 

Add the following at the top of start-yarn.sh and stop-yarn.sh:

 

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

 

6. core-site.xml configuration

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://yuxuan01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/bigdata/hadoop-3.2.2/hadoop_data/tmp</value>
  </property>

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>

  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <!-- HTTP authentication: 'simple' pseudo-auth, signed with the secret file created above -->
  <property>
    <name>hadoop.http.filter.initializers</name>
    <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
  </property>
  <property>
    <name>hadoop.http.authentication.type</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.http.authentication.signature.secret.file</name>
    <value>/bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret</value>
  </property>
  <property>
    <name>hadoop.http.authentication.simple.anonymous.allowed</name>
    <value>false</value>
  </property>

  <!-- an HDFS property, conventionally placed in hdfs-site.xml, but effective here too -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hadoop.proxyuser.jack.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.jack.groups</name>
    <value>*</value>
  </property>

  <!-- trash: deleted files are kept for 1440 minutes (24 hours) before being purged -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>1440</value>
  </property>
</configuration>

 

7. hdfs-site.xml configuration

<configuration>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/bigdata/hadoop-3.2.2/hadoop_data/namenode</value>
   </property>

   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/bigdata/hadoop-3.2.2/hadoop_data/datanode</value>
   </property>

   <property>
      <name>dfs.replication</name>
      <value>3</value>
   </property>

   <!-- current name of the deprecated dfs.secondary.http.address -->
   <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>yuxuan02:9001</value>
   </property>

   <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
   </property>
</configuration>
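Once the files are in place, hdfs getconf is a quick way to confirm the values a daemon will actually see; a sketch:

hdfs getconf -confKey fs.defaultFS        # expect hdfs://yuxuan01:9000
hdfs getconf -confKey dfs.replication     # expect 3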

 

8. mapred-site.xml configuration

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
   
   <property>
     <name>yarn.app.mapreduce.am.env</name>
     <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>
   <property>
     <name>mapreduce.map.env</name>
     <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>
   <property>
     <name>mapreduce.reduce.env</name>
     <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>

   <!-- current name of the deprecated mapred.map.output.compression.codec -->
   <property>
     <name>mapreduce.map.output.compress.codec</name>
     <value>com.hadoop.compression.lzo.LzoCodec</value>
   </property>

   <property>
     <name>mapred.child.env</name>
     <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
   </property>
   
   <property>  
     <name>mapred.child.java.opts</name>  
     <value>-Xmx1048m</value>  
   </property> 
   
   <property>  
     <name>mapreduce.map.java.opts</name>  
     <value>-Xmx1310m</value>  
   </property> 
   
   <property>  
     <name>mapreduce.reduce.java.opts</name>  
     <value>-Xmx2620m</value>  
   </property> 
   
   <property>
     <name>mapreduce.job.counters.limit</name>
     <value>20000</value>
     <description>Limit on the number of counters allowed per job.</description>
   </property>
</configuration>
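The three long HADOOP_MAPRED_HOME classpath values above can usually be shortened to just the install root, which is what the official single-cluster guide does; a sketch:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2</value>
</property>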

 

9. yarn-site.xml configuration

<configuration>
   <!-- Site specific YARN configuration properties -->
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>yuxuan01</value>
   </property>
   
   <property>
      <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>7192</value>
   </property>
   
   <property>
      <description>The minimum allocation for every container request at the RM, in MB.
      Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
   </property>

   <property>
      <description>The maximum allocation for every container request at the RM, in MB.
      Memory requests higher than this won't take effect, and will get capped to this value.</description>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>7192</value>
   </property>

   <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
   </property>

   <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx2457m</value>
   </property>
</configuration>
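With yarn.scheduler.minimum-allocation-mb at 1024 and 7192 MB per NodeManager, each node can host at most 7 containers (7192 / 1024 ≈ 7). Once the cluster is running, this can be checked per node; a sketch (the node id comes from the -list output, and the port here is hypothetical):

yarn node -list -all
yarn node -status yuxuan02:45454   # shows memory/vcore capacity and current usage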

 

10. Configure workers

Set which servers run DataNodes. The file was named slaves before Hadoop 3 and is named workers from Hadoop 3 on. Directory: $HADOOP_HOME/etc/hadoop. A sketch is shown below.
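A minimal workers file, assuming all three hosts in this article run DataNodes:

vi $HADOOP_HOME/etc/hadoop/workers

yuxuan01
yuxuan02
yuxuan03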

 

11. Sync to the other servers

scp -r /bigdata/hadoop-3.2.2/ root@yuxuan02:/bigdata/

scp -r /bigdata/hadoop-3.2.2/ root@yuxuan03:/bigdata/
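The same sync as a loop, which also keeps /etc/profile consistent across servers (a sketch, assuming passwordless root SSH):

for h in yuxuan02 yuxuan03; do
  scp -r /bigdata/hadoop-3.2.2/ root@$h:/bigdata/
  scp /etc/profile root@$h:/etc/profile
done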

 

12. Format HDFS

hdfs namenode -format

 

13. Start

./sbin/start-all.sh

jps
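If startup succeeded, jps output would typically look like this (PIDs omitted; assuming yuxuan01 runs the NameNode and ResourceManager, with the SecondaryNameNode on yuxuan02 as configured in hdfs-site.xml):

On yuxuan01: NameNode, ResourceManager, DataNode, NodeManager, Jps

On yuxuan02: SecondaryNameNode, DataNode, NodeManager, Jps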

 

14. Web UI

  1. First visit (a user is required because of the simple authentication policy): http://yuxuan01:9870?user.name=root

  2. Job view: http://yuxuan01:8088/cluster?user.name=root
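A quick end-to-end smoke test, a sketch using the examples jar bundled with the 3.2.2 distribution:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar pi 2 10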

 

III. Common Issues

1. NameNode fails to start

Check whether the /bigdata/hadoop-3.2.2/hadoop_data/namenode directory exists.

Re-initialize it with: ./bin/hdfs namenode -format

 

2. DataNode fails to start

Method 1:

Stop the cluster before each format:

./stop-all.sh

Then format again:

./hdfs namenode -format

Finally, start:

./start-all.sh

 

Method 2:

Remove the /bigdata/hadoop-3.2.2/hadoop_data/namenode directory (the path configured as dfs.namenode.name.dir):

rm -rf /bigdata/hadoop-3.2.2/hadoop_data/namenode

Then format again:

./hdfs namenode -format

Finally, start:

./start-all.sh
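The usual root cause here is a clusterID mismatch: re-formatting the NameNode generates a new clusterID, while the DataNodes still hold the old one. Comparing the VERSION files (paths follow the dfs.*.dir settings above) confirms it; if the IDs differ, clear the datanode directory on each worker before restarting:

grep clusterID /bigdata/hadoop-3.2.2/hadoop_data/namenode/current/VERSION
grep clusterID /bigdata/hadoop-3.2.2/hadoop_data/datanode/current/VERSION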


IV. Common HDFS Shell Commands

https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-common/FileSystemShell.html

User commands and administrator commands:

./hadoop                              list all commands

./hadoop fs -put hadoop /             upload the local file 'hadoop' to the / directory

./hadoop fs -lsr /                    recursively list / (deprecated; use -ls -R)

./hadoop fs -du /                     show file sizes

./hadoop fs -rm /hadoop               delete a file

./hadoop fs -rmr /hadoop              recursively delete a directory (deprecated; use -rm -r)

./hadoop fs -mkdir /louis             create a directory

./hadoop dfsadmin -report             report filesystem information and statistics

./hadoop dfsadmin -safemode enter     enter safe mode (read-only)

./hadoop dfsadmin -safemode leave     leave safe mode

./hadoop fsck /louis -files -blocks   check whether files are healthy

 

What fsck does (example invocations follow the list):

1) Check filesystem health

2) Show which blocks a file is made of

3) Delete corrupt blocks

4) Find missing blocks
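A few example invocations of those uses, with paths from this article:

./hadoop fsck /louis -files -blocks -locations   show health plus block layout and locations
./hadoop fsck / -list-corruptfileblocks          list files with missing or corrupt blocks
./hadoop fsck / -delete                          delete corrupted files (use with care)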

 

hadoop balancer   disk balancer: spreads blocks evenly across DataNodes

hadoop archive    file archiving: packs many small files into a single archive

./hadoop archive -archiveName pack.har -p /louis hadoop archiveDir   create the archive (parent /louis, source 'hadoop', destination archiveDir)

./hadoop fs -lsr /user/louis/archiveDir/pack.har

./hadoop fs -cat /user/louis/archiveDir/pack.har/_index   view the archive's file index
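Archives are also transparently readable through the har:// filesystem; a sketch, assuming the archive created above (the inner file name is hypothetical):

./hadoop fs -ls har:///user/louis/archiveDir/pack.har
./hadoop fs -cat har:///user/louis/archiveDir/pack.har/somefile.txt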