ceph/crush/mapper.c source code walkthrough

阿新 • Published: 2018-12-27

（1）crush_find_rule function

int crush_find_rule(const struct crush_map *map, int ruleset, int type, int size)

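crush_find_rule looks up the id of the crush_rule in crush_map that matches the given ruleset, type, and size.
Parameters:
map: the crush_map
ruleset: id of the storage rule set (user-defined)
type: type of the storage rule set (user-defined)
size: size of the output set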
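
A minimal sketch of the lookup described above. The struct names (toy_rule_mask, toy_rule, toy_map) are hypothetical simplifications, not the definitions from crush.h. The sketch assumes that a rule matches when the requested size lies within the rule's min_size..max_size range, and that -1 signals "no matching rule".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the rule mask and the map. */
struct toy_rule_mask { int ruleset, type, min_size, max_size; };
struct toy_rule { struct toy_rule_mask mask; };
struct toy_map { struct toy_rule **rules; unsigned max_rules; };

/*
 * Return the index of the first rule whose ruleset and type match and
 * whose [min_size, max_size] range contains size, or -1 if none does.
 */
int toy_find_rule(const struct toy_map *map, int ruleset, int type, int size)
{
	unsigned i;

	assert(map != NULL);
	for (i = 0; i < map->max_rules; i++) {
		const struct toy_rule *r = map->rules[i];

		/* Empty slots in the rule table are skipped. */
		if (r && r->mask.ruleset == ruleset && r->mask.type == type &&
		    r->mask.min_size <= size && r->mask.max_size >= size)
			return (int)i;
	}
	return -1;
}
```

The returned index is what a caller would then pass on as the rule number when evaluating the rule.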

（2）bucket_perm_choose

static int bucket_perm_choose(struct crush_bucket *bucket,int x, int r)

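Chooses based on a random permutation of the bucket. Given a CRUSH input x and a replica position r (usually the position in the output set), it produces an item from the bucket.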

（3）bucket_uniform_choose

static int bucket_uniform_choose(const struct crush_bucket_uniform *bucket, struct crush_work_bucket *work, int x, int r)

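The uniform type suits buckets in which every item has the same weight and items are rarely added or removed, that is, the number of items stays fairly fixed. It uses a pseudo-random permutation algorithm.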
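
The permutation idea can be sketched as follows. This is only an illustration under stated assumptions: toy_hash3 is a made-up mixer, not Ceph's crush_hash32 family, and toy_perm_choose is a hypothetical name. The sketch rebuilds the whole permutation on every call for clarity, which a real implementation would likely avoid.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ITEMS 64

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/*
 * Shuffle the item indexes with a Fisher-Yates pass that is seeded by
 * x, then return the item at position r of that permutation.
 */
int toy_perm_choose(const int *items, int size, int x, int r)
{
	int perm[TOY_MAX_ITEMS];
	int i;

	assert(size > 0 && size <= TOY_MAX_ITEMS);
	assert(r >= 0);
	for (i = 0; i < size; i++)
		perm[i] = i;
	for (i = 0; i < size - 1; i++) {
		uint32_t h = toy_hash3((uint32_t)x, (uint32_t)i, (uint32_t)size);
		int j = i + (int)(h % (uint32_t)(size - i));
		int t = perm[i];

		perm[i] = perm[j];
		perm[j] = t;
	}
	return items[perm[r % size]];
}
```

Because every x yields a full permutation, the first size values of r map to size distinct items, which is what makes this scheme attractive when all weights are equal.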

（4）bucket_list_choose

static int bucket_list_choose(const struct crush_bucket_list *bucket, int x, int r)

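In a List bucket the child items are kept in memory as a linked list, and the items may have arbitrary weights. When the cluster expands, new devices are added at the head of the list, so very little data migrates. Removing a device, however, causes a lot of data movement. The lookup works as follows:
1) Start at the head item of the List bucket. Take the weight Wh of the head item and the total weight Ws of the head item plus all items remaining in the list.
2) Use Hash(x, r, i) to obtain a value v in [0, 1]. If v falls in [0, Wh/Ws], choose the head item and return its id.
3) Otherwise move on to the rest of the list and repeat the selection recursively.
The lookup complexity is O(n).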
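
The list walk above can be sketched as below. The names toy_hash3 and toy_list_choose are hypothetical, the hash is a made-up mixer rather than Ceph's, and the sketch assumes the newest item (the list head) sits in the last array slot and that 16 hash bits are scaled into the range [0, Ws).

```c
#include <assert.h>
#include <stdint.h>

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

int toy_list_choose(const int *items, const uint32_t *weights, int size,
		    int x, int r)
{
	uint64_t remaining = 0;	/* Ws: weight of the head and everything after it */
	int i;

	assert(size > 0);
	for (i = 0; i < size; i++)
		remaining += weights[i];

	/* Walk from the head (last slot) towards the tail (slot 0). */
	for (i = size - 1; i >= 0; i--) {
		uint64_t v = toy_hash3((uint32_t)x, (uint32_t)items[i],
				       (uint32_t)r) & 0xffff;

		v = (v * remaining) >> 16;	/* scale 16 hash bits to [0, Ws) */
		if (v < weights[i])		/* true with probability about Wh/Ws */
			return items[i];
		remaining -= weights[i];
	}
	return items[0];	/* only reached if every weight is zero */
}
```

An item with weight zero can never satisfy v < weights[i], so it is never returned unless all weights are zero.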

（5） bucket_tree_choose

static int bucket_tree_choose(const struct crush_bucket_tree *bucket, int x, int r)

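In a Tree bucket the items are organized as a tree: each item is a leaf of a decision tree. The root and the intermediate nodes are virtual nodes whose weight equals the sum of the weights of their left and right subtrees. Because the items sit at the leaves, every selection has to walk down to a leaf before an item can be chosen. The lookup works as follows:
1) Start the traversal at the root item (a virtual node) of the Tree bucket.
2) Take the weight Wl of the node's left subtree and the weight Wn of the node itself, then use Hash(x, r, i) to obtain a value v in [0, 1]:
a) If v falls in [0, Wl/Wn], continue selecting in the left subtree.
b) Otherwise continue selecting in the right subtree.
c) Keep descending until a leaf is reached; the item at that leaf is the final result.

As this shows, a Tree bucket has to walk down to a leaf node for every item it selects, so its lookup complexity is O(log n).
When a bucket holds a large number of items, it is more efficient than the List type.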
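
The descent can be sketched as below. To keep it short, the sketch stores the tree as an implicit binary heap (root at index 1, children at 2n and 2n+1) and builds the node weights on every call; that layout, the toy hash, and the name toy_tree_choose are assumptions made for illustration and do not mirror the node numbering used in mapper.c.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_LEAVES 64

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

int toy_tree_choose(const int *items, const uint32_t *weights, int size,
		    int x, int r)
{
	uint64_t node[2 * TOY_MAX_LEAVES] = {0};
	int leaves = 1, n;

	assert(size > 0 && size <= TOY_MAX_LEAVES);
	while (leaves < size)		/* round up to a power of two */
		leaves <<= 1;
	for (n = 0; n < size; n++)	/* leaves carry the item weights */
		node[leaves + n] = weights[n];
	for (n = leaves - 1; n >= 1; n--)	/* virtual node = left + right */
		node[n] = node[2 * n] + node[2 * n + 1];
	assert(node[1] > 0);		/* at least one item must have weight */

	n = 1;
	while (n < leaves) {
		uint64_t t = toy_hash3((uint32_t)x, (uint32_t)n,
				       (uint32_t)r) & 0xffff;

		t = (t * node[n]) >> 16;	/* scale to [0, Wn) */
		n = (t < node[2 * n]) ? 2 * n : 2 * n + 1;	/* left if t < Wl */
	}
	return items[n - leaves];
}
```

Each step halves the remaining subtree, so the loop runs about log2(n) times. Padding leaves have weight zero and are never reached as long as the total weight is positive.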

（6）bucket_straw_choose

static int bucket_straw_choose(struct crush_bucket_straw *bucket,int x, int r)

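bucket_straw_choose performs selection for straw-type buckets. The input parameter x is the pgid and r is the replica position.

The straw bucket is the default selection algorithm. All items in the bucket compete on equal terms, and an item's chance of winning depends on its weight. It is implemented as follows:
1) f(Wi) is a function of the item's weight Wi and determines the probability that each item is chosen.
2) A length is computed for every item as length = f(Wi) * hash(x, r, i).
The item with the largest length is the one chosen.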
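
A minimal sketch of the straw draw described above. It assumes the scaling factors f(Wi) have already been computed and are passed in as the straws array; how they are derived from the weights is outside the sketch. The toy hash and the name toy_straw_choose are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/* straws[i] plays the role of f(Wi); the longest straw wins. */
int toy_straw_choose(const int *items, const uint32_t *straws, int size,
		     int x, int r)
{
	uint64_t draw, high_draw = 0;
	int i, high = 0;

	assert(size > 0);
	for (i = 0; i < size; i++) {
		/* length = f(Wi) * hash(x, r, i), using 16 hash bits */
		draw = toy_hash3((uint32_t)x, (uint32_t)items[i],
				 (uint32_t)r) & 0xffff;
		draw *= straws[i];
		if (i == 0 || draw > high_draw) {
			high = i;
			high_draw = draw;
		}
	}
	return items[high];
}
```

Every item is hashed on every call, which is why the lookup is O(n), but each item's draw depends only on its own id and scaling factor.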

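The structure of List buckets and Tree buckets means that only a limited number of hash values has to be computed and compared against weights to pick an item from the bucket. In doing so they take a divide-and-conquer approach that either gives certain items precedence (for example, those at the head of the list) or removes the need to consider whole subtrees at all. This makes replica placement more efficient, but when items are added, removed, or re-weighted to change the bucket's contents, the resulting reorganization is suboptimal.

A straw bucket lets all items compete fairly against one another in a process similar to drawing straws. When placing a replica, every item in the bucket is assigned a straw of random length, and the item with the longest straw wins (is chosen). Each straw length is first drawn from a fixed range as a hash of the CRUSH input x, the replica number r, and the bucket item i. Each length is then scaled by a factor f(wi) derived from the item's weight, so that heavily weighted items are more likely to win. Although placement is slower than with List and Tree buckets, a straw bucket yields optimal data movement between nested items when it is modified.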

（7）bucket_straw2_choose

static int bucket_straw2_choose(struct crush_bucket_straw2 *bucket,int x, int r)

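An improvement on the straw bucket that reduces the amount of data migrated. For example, after adding a device to item C and thereby changing its weight, or after removing item C, data only moves onto C or from C to elsewhere; no data moves between the other items in the bucket.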
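
The property described above follows when each item's draw depends only on its own weight. A common way to get that, and the assumption this sketch makes, is draw = ln(u) / w with u a hash-derived value in (0, 1], choosing the largest draw. The sketch uses floating point and a small series-based toy_ln to stay dependency-free; the names and the toy hash are illustrative and this is not the fixed-point code in mapper.c.

```c
#include <assert.h>
#include <stdint.h>

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/* Natural log for u in (0, 1] via ln(m) = 2*atanh((m-1)/(m+1)). */
static double toy_ln(double u)
{
	double k = 0.0, z, z2, term, sum = 0.0;
	int n;

	assert(u > 0.0 && u <= 1.0);
	while (u < 0.5) {	/* pull u into [0.5, 1], counting halvings */
		u *= 2.0;
		k += 1.0;
	}
	z = (u - 1.0) / (u + 1.0);
	z2 = z * z;
	term = z;
	for (n = 1; n <= 19; n += 2) {
		sum += term / n;
		term *= z2;
	}
	return 2.0 * sum - k * 0.6931471805599453;
}

int toy_straw2_choose(const int *items, const uint32_t *weights, int size,
		      int x, int r)
{
	double draw, high_draw = 0.0;
	int i, high = 0;

	assert(size > 0);
	for (i = 0; i < size; i++) {
		if (weights[i]) {
			uint32_t h = toy_hash3((uint32_t)x, (uint32_t)items[i],
					       (uint32_t)r) & 0xffff;
			double u = (h + 1) / 65536.0;	/* in (0, 1] */

			draw = toy_ln(u) / weights[i];	/* <= 0, nearer 0 wins */
		} else {
			draw = -1e300;	/* zero weight: can never win a real draw */
		}
		if (i == 0 || draw > high_draw) {
			high = i;
			high_draw = draw;
		}
	}
	return items[high];
}
```

Raising the weight of one item only raises that item's draw, so for any input the winner either stays the same or becomes the re-weighted item; the other items never trade data among themselves.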

（8）crush_bucket_choose

static int crush_bucket_choose(struct crush_bucket *in, int x, int r)

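crush_bucket_choose picks an item from a bucket, using the algorithm that matches the bucket's type.
crush_bucket_choose is the most important function in CRUSH. Because the default bucket type is straw, in the common case a straw bucket is used and the call ends up in bucket_straw_choose.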

（9）crush_choose_firstn

static int crush_choose_firstn(const struct crush_map *map,
			       struct crush_bucket *bucket,
			       const __u32 *weight, int weight_max,
			       int x, int numrep, int type,
			       int *out, int outpos,
			       int out_size,
			       unsigned int tries,
			       unsigned int recurse_tries,
			       unsigned int local_retries,
			       unsigned int local_fallback_retries,
			       int recurse_to_leaf,
			       unsigned int vary_r,
			       unsigned int stable,
			       int *out2,
			       int parent_r)

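Depth-first selection; this is done by crush_choose_firstn.
The function calls crush_bucket_choose to select the required number of replicas and checks each selected OSD for conflicts. If an OSD collides with an earlier choice, has failed, or is overloaded, a new OSD is selected.
The function recursively selects buckets or devices of a specific type and handles collisions and failures.
In a choose step, it selects directly by calling crush_bucket_choose.
In a chooseleaf step, which selects leaf nodes, it recurses until it reaches a leaf.
Parameters:
map: the crush_map
bucket: the bucket we are choosing an item from
x: the CRUSH input value
numrep: the number of items to choose
type: the type of item to choose
out: pointer to the output vector
outpos: our position in that vector
out_size: size of the out vector
tries: number of attempts to make
recurse_tries: number of attempts for the recursive chooseleaf
local_retries: localized retries
local_fallback_retries: localized fallback retries
recurse_to_leaf: true if we want one device under each item of the given type (chooseleaf instead of choose)
stable: stable mode starts rep = 0 in the recursive call for all replicas
vary_r: pass r to the recursive calls
out2: second output vector for leaf items (if recurse_to_leaf)
parent_r: r value passed from the parent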
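
The retry logic can be sketched for a single flat bucket as below. This is a simplification under stated assumptions: there is no recursion and no local retries, a device counts as "out" only when its weight is zero, each retry uses r' = r + ftotal, and toy_choose stands in for crush_bucket_choose with an equal-weight straw draw. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/* Stand-in for crush_bucket_choose: the item with the highest hash wins. */
static int toy_choose(const int *items, int size, int x, int r)
{
	uint32_t best = 0;
	int i, high = 0;

	for (i = 0; i < size; i++) {
		uint32_t d = toy_hash3((uint32_t)x, (uint32_t)items[i],
				       (uint32_t)r);
		if (i == 0 || d > best) {
			best = d;
			high = i;
		}
	}
	return items[high];
}

/* Fill out[] with up to numrep distinct, in-service devices. */
int toy_choose_firstn(const int *items, int size,
		      const uint32_t *weight, int weight_max,
		      int x, int numrep, int *out, int tries)
{
	int rep, outpos = 0;

	assert(size > 0 && numrep >= 0 && tries > 0);
	for (rep = 0; rep < numrep; rep++) {
		int ftotal;

		for (ftotal = 0; ftotal < tries; ftotal++) {
			int r = rep + ftotal;	/* r' = r + f_total */
			int item = toy_choose(items, size, x, r);
			int i, reject = 0;

			for (i = 0; i < outpos; i++)	/* collision check */
				if (out[i] == item)
					reject = 1;
			if (item < 0 || item >= weight_max || weight[item] == 0)
				reject = 1;		/* failed or marked out */
			if (!reject) {
				out[outpos++] = item;
				break;
			}
		}
	}
	return outpos;	/* may be fewer than numrep if retries ran out */
}
```

Because accepted items are appended, a replica that cannot be placed simply shortens the result, and the positions of later replicas shift forward.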

（10）crush_choose_indep

static void crush_choose_indep(const struct crush_map *map,
			       struct crush_bucket *bucket,
			       const __u32 *weight, int weight_max,
			       int x, int left, int numrep, int type,
			       int *out, int outpos,
			       unsigned int tries,
			       unsigned int recurse_tries,
			       int recurse_to_leaf,
			       int *out2,
			       int parent_r)

Used for erasure-coded storage.
Breadth-first selection; this is done by crush_choose_indep.
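
A breadth-first pass over one flat bucket might look like the sketch below. It assumes every output position is tried once per round, that a retry uses r' = r + numrep * ftotal, that unfilled positions end up holding a "none" marker instead of being dropped, and that weight zero means "out". The toy_ names and the hash are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ITEM_UNDEF 0x7ffffffe	/* position not decided yet */
#define TOY_ITEM_NONE  0x7fffffff	/* position could not be filled */

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/* Stand-in for crush_bucket_choose: the item with the highest hash wins. */
static int toy_choose(const int *items, int size, int x, int r)
{
	uint32_t best = 0;
	int i, high = 0;

	for (i = 0; i < size; i++) {
		uint32_t d = toy_hash3((uint32_t)x, (uint32_t)items[i],
				       (uint32_t)r);
		if (i == 0 || d > best) {
			best = d;
			high = i;
		}
	}
	return items[high];
}

void toy_choose_indep(const int *items, int size,
		      const uint32_t *weight, int weight_max,
		      int x, int numrep, int *out, int tries)
{
	int rep, ftotal, left = numrep;

	assert(size > 0 && numrep >= 0);
	for (rep = 0; rep < numrep; rep++)
		out[rep] = TOY_ITEM_UNDEF;

	/* Each round visits every still-undecided position once. */
	for (ftotal = 0; left > 0 && ftotal < tries; ftotal++) {
		for (rep = 0; rep < numrep; rep++) {
			int r, item, i, reject = 0;

			if (out[rep] != TOY_ITEM_UNDEF)
				continue;
			r = rep + numrep * ftotal;	/* r' = r + n*f_total */
			item = toy_choose(items, size, x, r);
			for (i = 0; i < numrep; i++)	/* collision check */
				if (out[i] == item)
					reject = 1;
			if (item < 0 || item >= weight_max || weight[item] == 0)
				reject = 1;		/* failed or marked out */
			if (reject)
				continue;
			out[rep] = item;
			left--;
		}
	}
	for (rep = 0; rep < numrep; rep++)
		if (out[rep] == TOY_ITEM_UNDEF)
			out[rep] = TOY_ITEM_NONE;
}
```

The output always has numrep entries. Keeping a hole rather than shifting later entries forward matters for erasure coding, where each position holds a different shard.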

（11）crush_do_rule

int crush_do_rule(const struct crush_map *map,
		  int ruleno, int x, int *result, int result_max,
		  const __u32 *weight, int weight_max,
		  int *scratch)

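crush_do_rule loops over the steps of the rule and calls the corresponding functions to select buckets. For depth-first selection it calls crush_choose_firstn; for breadth-first selection it calls crush_choose_indep.
Parameters:
map: the crush map structure
ruleno: the rule number
x: the input, usually the id of the pg
result: the output list of OSDs
result_max: maximum number of OSDs in the output list
weight: the weights of all OSDs, used to decide whether an OSD is out
weight_max: the number of OSDs (size of the weight vector)
scratch: a scratch vector for private use; must be >= 3 * result_max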
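
The step loop can be sketched as a tiny interpreter. Everything here is a hypothetical simplification: the toy rule has only take, choose (firstn style), and emit steps, a bucket's items are indexes of other buckets until the last choose step yields device ids, and toy_choose stands in for crush_bucket_choose. The sketch uses two result_max-sized parts of scratch as the working and output vectors and leaves the third unused.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TRIES 50

enum toy_op { TOY_TAKE, TOY_CHOOSE_FIRSTN, TOY_EMIT };
struct toy_step { enum toy_op op; int arg; };	/* take: bucket; choose: count */
struct toy_bucket { const int *items; int size; };

/* Toy integer mixer used only for illustration; NOT Ceph's hash. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 0x9e3779b9u;

	h = (h ^ a) * 0x85ebca6bu;
	h = (h ^ b) * 0xc2b2ae35u;
	h = (h ^ c) * 0x27d4eb2fu;
	return h ^ (h >> 15);
}

/* Stand-in for crush_bucket_choose: the item with the highest hash wins. */
static int toy_choose(const int *items, int size, int x, int r)
{
	uint32_t best = 0;
	int i, high = 0;

	for (i = 0; i < size; i++) {
		uint32_t d = toy_hash3((uint32_t)x, (uint32_t)items[i],
				       (uint32_t)r);
		if (i == 0 || d > best) {
			best = d;
			high = i;
		}
	}
	return items[high];
}

int toy_do_rule(const struct toy_bucket *buckets,
		const struct toy_step *steps, int nsteps,
		int x, int *result, int result_max, int *scratch)
{
	int *w = scratch, *o = scratch + result_max, *tmp;
	int wsize = 0, osize, result_len = 0;
	int s, i, rep, f, k;

	assert(result_max > 0);
	for (s = 0; s < nsteps; s++) {
		switch (steps[s].op) {
		case TOY_TAKE:		/* start from a single bucket */
			w[0] = steps[s].arg;
			wsize = 1;
			break;
		case TOY_CHOOSE_FIRSTN:	/* expand each bucket in w into o */
			osize = 0;
			for (i = 0; i < wsize; i++) {
				const struct toy_bucket *b = &buckets[w[i]];

				for (rep = 0; rep < steps[s].arg &&
				     osize < result_max; rep++) {
					for (f = 0; f < TOY_TRIES; f++) {
						int item = toy_choose(b->items,
							b->size, x, rep + f);

						for (k = 0; k < osize &&
						     o[k] != item; k++)
							;
						if (k == osize) {	/* no collision */
							o[osize++] = item;
							break;
						}
					}
				}
			}
			tmp = w; w = o; o = tmp;	/* output becomes input */
			wsize = osize;
			break;
		case TOY_EMIT:		/* append the working set to result */
			for (i = 0; i < wsize && result_len < result_max; i++)
				result[result_len++] = w[i];
			wsize = 0;
			break;
		}
	}
	return result_len;
}
```

With a root bucket whose items are host buckets, the rule "take root, choose 2, choose 1, emit" yields one device from each of up to two different hosts.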
