
Tuning configuration for a large Hadoop cluster with 100 DataNodes

Use case: storing and downloading files of all sizes, large and small (this workload consumes a lot of NameNode memory).

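The NameNode server has 100 GB of RAM, of which Hadoop uses 80 GB, and the cluster has 100 DataNodes. Before tuning, each GC took more than 20 seconds; after tuning, each GC takes only about 1 second, which greatly improved cluster efficiency.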

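One more remark: while the JVM is in a stop-the-world GC pause it does not respond to the outside world, and every request to that JVM has to wait until the pause ends.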
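To check pause times like the ones quoted above, one option is to read them from the GC log that the configuration in this post writes via -Xloggc. The following is a minimal sketch, not part of Hadoop: gc_pause_summary is a hypothetical helper name, and it assumes the log was produced with -XX:+PrintGCDetails, so that each event ends with a "[Times: user=... sys=..., real=N.NN secs]" entry. Lines for CMS concurrent phases are skipped because they do not stop the application.

```shell
#!/bin/sh
# gc_pause_summary (hypothetical helper): count GC events and report the
# longest wall-clock time found in HotSpot GC logs written with
# -XX:+PrintGCDetails. Reads the files given as arguments, or stdin.
gc_pause_summary() {
  awk '
    # Skip CMS concurrent phases: they run alongside the application.
    !/CMS-concurrent/ && match($0, /real=[0-9.]+/) {
      # Drop the 5-character "real=" prefix and convert to a number.
      pause = substr($0, RSTART + 5, RLENGTH - 5) + 0
      count++
      if (pause > max) max = pause
    }
    END { printf "events=%d max_real_secs=%.2f\n", count, max }
  ' "$@"
}
```

It could be called as gc_pause_summary "$HADOOP_LOG_DIR"/namenode_gc.log* (the glob is there because log rotation is enabled, so the JVM may append a numeric suffix to the file name). Treat the result as a rough indicator only, since GC log lines can interleave under load.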

hadoop-env.sh configuration

JVM_OPTS="-server -verbose:gc
  -XX:+PrintGCDateStamps
  -XX:+PrintGCDetails
  -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=9
  -XX:GCLogFileSize=256m
  -XX:+DisableExplicitGC
  -XX:+UseCompressedOops
  -XX:SoftRefLRUPolicyMSPerMB=0
  -XX:+UseFastAccessorMethods
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:+CMSParallelRemarkEnabled
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSCompactAtFullCollection
  -XX:CMSFullGCsBeforeCompaction=0
  -XX:+CMSClassUnloadingEnabled
  -XX:CMSMaxAbortablePrecleanTime=301
  -XX:+CMSScavengeBeforeRemark
  -XX:PermSize=160m
  -XX:GCTimeRatio=19
  -XX:SurvivorRatio=2
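  -XX:MaxTenuringThreshold=100"
# This parameter matters: MaxTenuringThreshold controls how many minor GCs an object may survive before it is promoted to the old generation. The default is 15.
# Note: the object age counter is only 4 bits wide, so 15 is also the largest age the JVM can track. Newer JVMs (JDK 8, as far as I know) refuse to start with a value above 15, so use 15 there.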


# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g -Xloggc:$HADOOP_LOG_DIR/namenode_gc.log"
export HADOOP_SECONDARYNAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g"
export HADOOP_DATANODE_OPTS="$JVM_OPTS -Xmx3g -Xms3g -Xmn2g -Xloggc:$HADOOP_LOG_DIR/datanode_gc.log"
export HADOOP_BALANCER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/balancer_gc.log"


export HADOOP_JOBTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/jobtracker_gc.log"
export HADOOP_TASKTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/tasktracker_gc.log"
export HADOOP_CLIENT_OPTS="$JVM_OPTS -Xmx512m -Xms512m -Xmn256m"
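As a rough check of what the NameNode flags above imply, the heap layout can be worked out by hand. The sketch below does the arithmetic in MB with shell integer math. The variable names are only illustrative, and the real sizes chosen by the JVM may differ slightly because of alignment. It also assumes CMSInitiatingOccupancyFraction is read as a percentage of the old generation.

```shell
#!/bin/sh
# Heap layout implied by the NameNode flags, in MB (integer arithmetic).
heap_mb=$((80 * 1024))    # -Xmx80g / -Xms80g
young_mb=$((21 * 1024))   # -Xmn21g
survivor_ratio=2          # -XX:SurvivorRatio=2 (eden : one survivor space)
cms_fraction=70           # -XX:CMSInitiatingOccupancyFraction=70

# Young generation = eden + 2 survivors = (ratio + 2) survivor-sized units.
survivor_mb=$((young_mb / (survivor_ratio + 2)))
eden_mb=$((young_mb - 2 * survivor_mb))
old_mb=$((heap_mb - young_mb))
# Old-generation occupancy at which a CMS cycle is nominally started.
cms_start_mb=$((old_mb * cms_fraction / 100))

echo "eden=${eden_mb}MB survivor=${survivor_mb}MB x2 old=${old_mb}MB cms_start=${cms_start_mb}MB"
```

So with these settings each survivor space is about 5.25 GB, eden is about 10.5 GB, and a CMS cycle nominally starts once the 59 GB old generation holds roughly 41 GB. The large survivor spaces are what allow a high MaxTenuringThreshold to keep short-lived objects out of the old generation.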

Corrections and criticism are welcome. If you repost this, please credit the source: http://blog.csdn.net/maijiyouzou/article/details/23740225