XGBoost: Principles, Code, Parameter Tuning, and Deployment Notes
For an algorithm engineer, XGBoost is more or less the opening move, and there is no shortage of tutorials online. This post covers the principles, the code, parameter tuning, and deployment, with the goal of building an intuitive, end-to-end picture of the algorithm.
Is the generated binary tree a full binary tree or a complete binary tree? Every split in XGBoost produces exactly two children, so every node has either zero or two children; but because of the max_depth limit and pruning, different branches stop at different depths, so in general the tree is neither a full binary tree (every level completely filled) nor a complete binary tree.
Parameter tuning
param = {
# step size
'eta': 0.1,
# complexity param: penalty paid per leaf (min loss reduction required to split); larger -> under fitting
# 'gamma': 0.1,
'max_depth': depth,
# pruning param: minimum sum of instance weight (hessian) needed in a child; larger -> under fitting
# 'min_child_weight': 1,
# pruning param: caps each leaf's weight update; 0 means no constraint, a small positive value (1-10) makes updates more conservative
# 'max_delta_step': 0,
# row sample ratio for each tree
'subsample': 0.8,
# column sample ratio each tree
# 'colsample_bytree': 0.8,
# column sample ratio each layer
'colsample_bylevel': 0.3,
# L2 regularization term
# 'lambda': 1,
# L1 regularization term
'alpha': 0.1,
# small data set -> exact, large data set -> approx, just choose auto
# 'tree_method': 'auto',
# 'sketch_eps': 0.03,
# for unbalanced data sets: weight of the positive class relative to the negative one, typically sum(negative) / sum(positive)
# 'scale_pos_weight': 1 / weight,
# "reg:linear" –linear regression
# "reg:logistic" –logistic regression
# "binary:logistic" –logistic regression for binary classification, output probability
# "binary:logitraw" –logistic regression for binary classification, output score before logistic transformation
# "count:poisson" –poisson regression for count data, output mean of poisson distribution, max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
# "multi:softmax" –set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
# "multi:softprob" –same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probability of each data point belonging to each class.
# "rank:pairwise" –set XGBoost to do ranking task by minimizing the pairwise loss
# "reg:gamma" –gamma regression for severity data, output mean of gamma distribution
'objective': 'binary:logistic',
# initial prediction score (global bias / prior probability), not a decision threshold
'base_score': weight / (weight + 1),
# "rmse": root mean square error
# "mae": mean absolute error
# "logloss": negative log-likelihood
# "error": Binary classification error rate. It is calculated as # (wrong cases)/# (all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
# "merror": Multiclass classification error rate. It is calculated as # (wrong cases)/# (all cases).
# "mlogloss": Multiclass logloss
# "auc": Area under the curve for ranking evaluation.
# "ndcg":Normalized Discounted Cumulative Gain
# "map":Mean average precision
# "[email protected]","[email protected]": n can be assigned as an integer to cut off the top positions in the lists for evaluation.
# "ndcg-","map-","[email protected]","[email protected]": In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding "-" in the evaluation metric XGBoost will evaluate these score as 0 to be consistent under some conditions. training repeatedly
# "gamma-deviance": [residual deviance for gamma regression]
'eval_metric': 'auc',
'seed': 31,
}
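A minimal sketch of how this dict drives training with early stopping. It assumes depth and weight have been set beforehand (they appear inside param above) and that X_train, y_train, X_valid, y_valid are your own arrays; num_boost_round and early_stopping_rounds are illustrative values:

import xgboost as xgb

# assumed to exist: depth, weight, and the arrays X_train, y_train, X_valid, y_valid
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

# watch the validation AUC (the eval_metric above) and stop once it plateaus
bst = xgb.train(
    param,
    dtrain,
    num_boost_round=1000,
    evals=[(dtrain, 'train'), (dvalid, 'valid')],
    early_stopping_rounds=50,
)
pred = bst.predict(dvalid)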
Deployment notes:
In real production work you inevitably run into the problem of putting the model online, so the workflow is broken down here:
1. Finish training the model -> save the model file -> parse the model file -> rewrite the prediction code to read the parsed file and walk the trees;
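To make step 1 concrete, here is a minimal Python sketch of the save/dump/parse loop (the production parser in this project is the Java one mentioned below; parse_dump, predict_one, and the file names are hypothetical, and the regexes assume the default binary:logistic text dump written without statistics):

import math
import re

# after training:
#   bst.save_model('model.bin')       # binary model file, reloadable from Python
#   bst.dump_model('model_dump.txt')  # text dump, e.g. "0:[f29<0.5] yes=1,no=2,missing=1"

SPLIT = re.compile(r'(\d+):\[f(\d+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)')
LEAF = re.compile(r'(\d+):leaf=([-+0-9.eE]+)')

def parse_dump(path):
    """Parse the text dump into one {node_id: node} dict per tree."""
    trees = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('booster['):    # "booster[k]:" opens tree k
                trees.append({})
            elif (m := SPLIT.match(line)):
                nid, feat, thr, yes, no, miss = m.groups()
                trees[-1][int(nid)] = ('split', int(feat), float(thr),
                                       int(yes), int(no), int(miss))
            elif (m := LEAF.match(line)):
                trees[-1][int(m.group(1))] = ('leaf', float(m.group(2)))
    return trees

def predict_one(trees, x, base_score=0.5):
    """x maps feature index -> value; absent features take the missing branch."""
    margin = math.log(base_score / (1.0 - base_score))  # base_score, in margin space
    for nodes in trees:
        nid = 0
        while nodes[nid][0] == 'split':
            _, feat, thr, yes, no, miss = nodes[nid]
            v = x.get(feat)
            nid = miss if v is None else (yes if v < thr else no)
        margin += nodes[nid][1]
    return 1.0 / (1.0 + math.exp(-margin))  # sigmoid for binary:logistic

The Java version walks the trees the same way: load the dump once, and prediction reduces to feature comparisons, so the serving path needs no XGBoost runtime at all.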
The role of the loss function in XGBoost:
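The short version, following the standard derivation in the XGBoost paper: the loss enters training only through its first and second derivatives, $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$, so the objective at round $t$ is approximated as

$$
\mathcal{L}^{(t)} \approx \sum_i \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,
$$

which yields the optimal leaf weight and the split gain used while growing each tree:

$$
w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad
\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma,
$$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ sum over the instances landing in leaf $j$. Changing 'objective' (say reg:linear vs binary:logistic) therefore only changes $g_i$ and $h_i$; the tree-growing machinery is untouched, and the $\gamma$ and $\lambda$ here are exactly the 'gamma' and 'lambda' entries of the param dict above.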
Java parsing source: the concrete code and comments have been updated on GitHub.