Replica Placement Mechanism in Hadoop 2.x and Hadoop 3.x

阿新 • Published: 2022-03-26

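Each file in HDFS is split into blocks, and every block is stored as multiple replicas. HDFS is fault tolerant: when a replica is lost or a node goes down, the missing replicas are restored automatically. The default replication factor is 3.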

Replica placement policy in 2.8.x and earlier

Official documentation:

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication

For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.

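First replica: placed on the DataNode from which the file is uploaded. If the client writes from outside the cluster, a DataNode is chosen at random, avoiding nodes whose disks are too slow or whose CPU is too busy.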

Second replica: placed on a different node in the same rack as the first replica.

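Third replica: placed on a node in a different rack from the first two replicas, as the documentation quoted above describes.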

Additional replicas: placed on random nodes, while keeping the number of replicas per rack below an upper limit, which is basically (replicas - 1) / racks + 2.

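Because the NameNode does not allow a DataNode to hold more than one replica of the same block, the maximum number of replicas that can be created is the total number of DataNodes at that time.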
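The per-rack limit and the DataNode cap described above can be worked through with a small sketch. The function names `max_replicas_per_rack` and `effective_replicas` are hypothetical and not part of any Hadoop API, and the use of integer (floor) division in the formula is an assumption of this sketch.

```python
def max_replicas_per_rack(replicas: int, racks: int) -> int:
    """Per-rack upper limit from the text: (replicas - 1) / racks + 2.

    Assumption: the division is integer (floor) division.
    """
    if replicas < 1 or racks < 1:
        raise ValueError("replicas and racks must be positive")
    return (replicas - 1) // racks + 2


def effective_replicas(requested: int, datanodes: int) -> int:
    """A DataNode holds at most one replica of a given block, so the number
    of replicas that can be created is capped by the number of DataNodes."""
    return min(requested, datanodes)


if __name__ == "__main__":
    # Worked arithmetic for a few (replicas, racks) combinations.
    for replicas, racks in [(3, 2), (5, 2), (10, 3)]:
        limit = max_replicas_per_rack(replicas, racks)
        print(f"replicas={replicas} racks={racks} -> per-rack limit {limit}")

    # Requesting 5 replicas on a 4-DataNode cluster yields at most 4.
    print(effective_replicas(requested=5, datanodes=4))
```

For example, with 5 replicas on 2 racks the limit works out to (5 - 1) / 2 + 2 = 4, so no single rack would hold all five replicas.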

Replica placement policy in 2.9.x and later, including 3.x

Official documentation:

https://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication

For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.

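First replica: placed on the DataNode from which the file is uploaded. If the client writes from outside the cluster, a DataNode is chosen at random, avoiding nodes whose disks are too slow or whose CPU is too busy.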

Second replica: placed on a node in a different rack from the first replica.

Third replica: placed on a different node in the same rack as the second replica.

Additional replicas: placed on random nodes, while keeping the number of replicas per rack below an upper limit, which is basically (replicas - 1) / racks + 2.

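Because the NameNode does not allow a DataNode to hold more than one replica of the same block, the maximum number of replicas that can be created is the total number of DataNodes at that time.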
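The three-replica rule for 2.9.x and later can be illustrated with a minimal sketch. This is not Hadoop's implementation: the `place_three_replicas` function, the dictionary-based topology, and the rack and node names are all hypothetical. The sketch also ignores the disk and CPU load checks, and it simply raises an error where a real cluster would fall back to another choice.

```python
import random
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, str]  # (rack name, node name), both hypothetical labels


def place_three_replicas(
    topology: Dict[str, List[str]],
    writer: Optional[Node] = None,
    rng: Optional[random.Random] = None,
) -> List[Node]:
    """Pick three nodes following the 2.9.x+ rule described in the text."""
    rng = rng or random.Random()
    all_nodes = [(rack, node) for rack, nodes in topology.items() for node in nodes]

    # First replica: the writer's own DataNode if the writer is in the
    # cluster, otherwise a random DataNode (load checks are not modeled).
    if writer is not None and writer in all_nodes:
        first = writer
    else:
        first = rng.choice(all_nodes)

    # Second replica: a node in a different (remote) rack. The rack needs
    # at least two nodes so the third replica can share it.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != first[0] and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)

    # Third replica: a different node in the same remote rack as the second.
    second_name, third_name = rng.sample(topology[remote_rack], 2)
    return [first, (remote_rack, second_name), (remote_rack, third_name)]


if __name__ == "__main__":
    cluster = {
        "rack1": ["node1", "node2"],
        "rack2": ["node3", "node4"],
        "rack3": ["node5", "node6"],
    }
    print(place_three_replicas(cluster, writer=("rack1", "node1")))
```

With this layout only one replica crosses to a second rack during the write pipeline hop between racks, while the block still survives the loss of either rack, which matches the trade-off the quoted documentation describes.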