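Hadoop large-cluster tuning configuration (100 DataNodes)
Use case: storing and downloading all kinds of files, both large and small (small files consume a lot of NameNode memory).
The NameNode server has 100 GB of RAM, 80 GB of which is given to the Hadoop heap, and the cluster has 100 DataNodes. Before tuning, every GC took more than 20 seconds; after tuning, each GC takes only about 1 second, which greatly improved cluster efficiency.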
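To put the small-file remark in numbers: a widely quoted rule of thumb is that every file, directory and block is held in NameNode memory as an object of roughly 150 bytes. The sketch below assumes that figure (it is an approximation, not a measurement) and ignores directories; the function name estimate_nn_bytes is hypothetical.

```shell
#!/bin/sh
# Rough NameNode heap estimate, ASSUMING the rule-of-thumb figure of
# ~150 bytes per namespace object (file, directory or block).
BYTES_PER_OBJECT=150

# estimate_nn_bytes FILES BLOCKS_PER_FILE   (hypothetical helper)
# Prints estimated heap bytes for FILES files with BLOCKS_PER_FILE blocks each.
# Directories are ignored to keep the sketch simple.
estimate_nn_bytes() {
    files=$1
    blocks_per_file=$2
    # one file object plus its block objects
    objects=$(( files + files * blocks_per_file ))
    echo $(( objects * BYTES_PER_OBJECT ))
}

# 10 million small files, one block each
estimate_nn_bytes 10000000 1
```

The example prints 3000000000, i.e. about 3 GB. A file smaller than the block size still costs one file object and one block object, so the same data packed into fewer, larger files needs far less NameNode heap.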
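One more point: during a stop-the-world GC pause the JVM does not respond to the outside world, and every request to it has to wait until the pause ends.
hadoop-env.sh configuration: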
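To check pause times like the 20-second and 1-second figures above, you can read the GC logs produced by the -Xloggc and -XX:+PrintGCDetails options in hadoop-env.sh. The sketch below (a hypothetical helper, max_gc_pause) assumes the JDK 6/7/8-style log format, where each event ends with a `[Times: user=... sys=..., real=... secs]` suffix on the same line. It skips CMS concurrent phases, which also report `real=` but run alongside the application.

```shell
#!/bin/sh
# max_gc_pause LOGFILE   (hypothetical helper)
# Prints the longest wall-clock "real=" time among stop-the-world events in a
# HotSpot GC log written with -XX:+PrintGCDetails. Lines for CMS concurrent
# phases are dropped first, because the application keeps running during them.
max_gc_pause() {
    grep -v 'CMS-concurrent' "$1" | grep -o 'real=[0-9.]*' | cut -d= -f2 |
        awk 'BEGIN { max = 0 } { if ($1 + 0 > max) max = $1 + 0 } END { printf "%.2f\n", max }'
}

# Demo on a tiny hand-written sample (illustrative lines, not real cluster output).
sample=$(mktemp)
cat > "$sample" <<'EOF'
2014-04-14T10:00:00.000+0800: 1.000: [GC 1.000: [ParNew: 100K->10K(200K), 0.0200000 secs] 500K->410K(1000K), 0.0210000 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2014-04-14T10:00:05.000+0800: 6.000: [CMS-concurrent-mark: 1.500/1.600 secs] [Times: user=3.00 sys=0.10, real=1.60 secs]
2014-04-14T10:00:07.000+0800: 8.000: [GC[YG occupancy: 50 K (200 K)][1 CMS-remark: 400K(800K)] 450K(1000K), 0.9500000 secs] [Times: user=1.90 sys=0.00, real=0.95 secs]
EOF
max_gc_pause "$sample"   # the 1.60 s concurrent phase is ignored
rm -f "$sample"
```

On a cluster you would run it against the NameNode log, e.g. `max_gc_pause "$HADOOP_LOG_DIR/namenode_gc.log"`. With -XX:+UseGCLogFileRotation the log is split across several suffixed files, so you may need to check each one. For a stricter measure, -XX:+PrintGCApplicationStoppedTime (not part of this configuration) logs the total time application threads were stopped.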
JVM_OPTS="-server -verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=256m
-XX:+DisableExplicitGC
-XX:+UseCompressedOops
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:+UseFastAccessorMethods
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=301
-XX:+CMSScavengeBeforeRemark
-XX:PermSize=160m
-XX:GCTimeRatio=19
-XX:SurvivorRatio=2
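-XX:MaxTenuringThreshold=100"
# This parameter is very important: MaxTenuringThreshold controls how many minor GCs
# an object can survive before it is promoted to the old generation; the default is 15.
# Note: HotSpot stores an object's age in 4 bits, so it cannot count past 15, and
# newer JVMs reject larger values such as 100 at startup.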
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g -Xloggc:$HADOOP_LOG_DIR/namenode_gc.log"
export HADOOP_SECONDARYNAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g"
export HADOOP_DATANODE_OPTS="$JVM_OPTS -Xmx3g -Xms3g -Xmn2g -Xloggc:$HADOOP_LOG_DIR/datanode_gc.log"
export HADOOP_BALANCER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/balancer_gc.log"
export HADOOP_JOBTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/jobtracker_gc.log"
export HADOOP_TASKTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/tasktracker_gc.log"
export HADOOP_CLIENT_OPTS="$JVM_OPTS -Xmx512m -Xms512m -Xmn256m"
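The NameNode flags imply a specific heap layout. SurvivorRatio is the ratio of eden to one survivor space, so the young generation (-Xmn) is split into SurvivorRatio + 2 equal parts. Whatever is left of -Xmx is the old generation, and CMSInitiatingOccupancyFraction is the old-generation occupancy percentage at which a CMS cycle starts. The sketch below (hypothetical helper heap_layout, integer MB arithmetic) works this out; the real JVM aligns each space, so actual sizes can differ slightly.

```shell
#!/bin/sh
# heap_layout HEAP_MB YOUNG_MB SURVIVOR_RATIO CMS_FRACTION   (hypothetical helper)
# Approximates HotSpot generation sizes in MB. Integer math only; the JVM
# rounds each space to its own alignment, so expect small differences.
heap_layout() {
    heap_mb=$1          # -Xmx / -Xms
    young_mb=$2         # -Xmn
    survivor_ratio=$3   # -XX:SurvivorRatio (eden : one survivor space)
    cms_fraction=$4     # -XX:CMSInitiatingOccupancyFraction (% of old gen)

    # young = eden + 2 survivors = (SurvivorRatio + 2) * survivor
    survivor_mb=$(( young_mb / (survivor_ratio + 2) ))
    eden_mb=$(( young_mb - 2 * survivor_mb ))
    old_mb=$(( heap_mb - young_mb ))
    cms_trigger_mb=$(( old_mb * cms_fraction / 100 ))

    echo "eden=${eden_mb} survivor=${survivor_mb} old=${old_mb} cms_trigger=${cms_trigger_mb}"
}

# NameNode: -Xmx80g -Xmn21g -XX:SurvivorRatio=2 -XX:CMSInitiatingOccupancyFraction=70
heap_layout $((80 * 1024)) $((21 * 1024)) 2 70
```

For the NameNode this prints `eden=10752 survivor=5376 old=60416 cms_trigger=42291`: roughly 10.5 GB of eden, two 5.25 GB survivor spaces and about 59 GB of old generation, with CMS starting at roughly 41 GB of old-generation occupancy. Unless -XX:+UseCMSInitiatingOccupancyOnly is also set, HotSpot may additionally start CMS cycles based on its own heuristics rather than this percentage alone.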
Corrections and criticism are welcome. If you repost this, please credit the source: http://blog.csdn.net/maijiyouzou/article/details/23740225