
Some understanding of why GBDT fits the negative gradient of the loss function

Take the first-order Taylor expansion of \(L(y_i,f(x_i))\) at \(f(x_i)=f_{m-1}(x_i)\) (the remainder term is dropped, hence the approximation):

\[L(y_i,f(x_i))\approx L(y_i,f_{m-1}(x_i))+\left. \frac{\partial L(y_i,f(x_i))}{\partial f(x_i)} \right|_{f(x_i)=f_{m-1}(x_i)}\cdot (f(x_i)-f_{m-1}(x_i)) \]

Substituting \(f(x_i) = f_m(x_i) = f_{m-1}(x_i)+T_m(x_i;\theta _m)\) into the expansion above and rearranging gives

\[L(y_i,f_m(x_i))-L(y_i,f_{m-1}(x_i))\approx \left. \frac{\partial L(y_i,f(x_i))}{\partial f(x_i)} \right|_{f(x_i)=f_{m-1}(x_i)}\cdot T_m(x_i;\theta _m) \]
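The quality of this first-order approximation is easy to check numerically. A minimal sketch, assuming the squared loss \(L(y,f)=\frac{1}{2}(y-f)^2\) and hypothetical values for one sample's label, previous prediction, and update:

```python
import numpy as np

# Squared loss L(y, f) = 0.5 * (y - f)^2 and its gradient w.r.t. f
def loss(y, f):
    return 0.5 * (y - f) ** 2

def grad(y, f):
    return -(y - f)  # dL/df

y = 3.0       # label y_i for one sample (hypothetical)
f_prev = 2.0  # f_{m-1}(x_i): previous round's prediction (hypothetical)
T = 0.1       # a small update T_m(x_i; theta_m) (hypothetical)

# Left-hand side: exact change in loss after adding T
lhs = loss(y, f_prev + T) - loss(y, f_prev)
# Right-hand side: first-order (gradient) approximation of that change
rhs = grad(y, f_prev) * T

print(lhs, rhs)
```

For a small step \(T\) the two printed values nearly coincide, and both are negative here because the step moves the prediction toward the label.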

The left-hand side must be negative (each round's strong learner should achieve a smaller loss than the previous round's, otherwise the optimization is pointless). We therefore let \(T_m(x_i;\theta _m)\) fit \(-\left. \frac{\partial L(y_i,f(x_i))}{\partial f(x_i)} \right|_{f(x_i)=f_{m-1}(x_i)}\): when the fit is good, the right-hand side becomes (approximately) the negative squared gradient, which is non-positive, so the loss decreases.
A common point of confusion: \(f(x_i)\) is a variable, denoting a strong learner's prediction on the \(i\)-th sample \(x_i\), whereas \(f_{m-1}(x_i)\) and \(f_m(x_i)\) are constants, namely the predictions on \(x_i\) of the strong learners obtained at rounds \(m-1\) and \(m\).
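The whole derivation can be sketched as a toy gradient boosting loop. This is a minimal illustration, not a production GBDT: it assumes the squared loss (so the negative gradient is just the residual \(y_i - f_{m-1}(x_i)\)), synthetic 1-D data, and a hand-rolled regression stump in place of a full decision tree.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to pseudo-residuals r."""
    best = (np.inf, 0.0, 0.0, 0.0)  # (sse, split, left mean, right mean)
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, cl, cr = best
    return lambda z: np.where(z <= s, cl, cr)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 50)

nu = 0.3                          # learning rate (shrinkage)
f = np.full_like(y, y.mean())     # f_0: initial constant model
for m in range(30):
    g = f - y                     # gradient of 0.5*(y - f)^2 w.r.t. f
    T = fit_stump(x, -g)          # T_m fits the negative gradient
    f = f + nu * T(x)             # f_m = f_{m-1} + nu * T_m

print(np.mean((y - f) ** 2))      # training MSE, well below the variance of y
```

Each round, `f` plays the role of the constant \(f_{m-1}(x_i)\), and the stump `T` is the new weak learner fitted to the negative gradient, so the training loss decreases monotonically in the typical case.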