
bootstrap && bagging && decision trees && random forests


I read an article introducing these concepts and am jotting down a few notes here. Original link:

https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/

1. Bootstrap Method

The bootstrap is a powerful statistical method for estimating a quantity from a data sample. This is easiest to understand if the quantity is a descriptive statistic such as a mean or a standard deviation.

In other words, the bootstrap is a statistical method for estimating properties of a dataset, such as its mean and variance; when the data are noisy or contain some errors, it can improve the accuracy of the estimate.

Concretely: create many sub-samples of the dataset (drawn with replacement), compute the statistic of interest (e.g. the mean) on each sub-sample, and finally average the results.
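A minimal sketch of that procedure in Python (the toy data and the number of resamples are illustrative assumptions, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # toy sample

n_resamples = 1000
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    # draw a sub-sample of the same size, with replacement
    sub = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = sub.mean()

# average the per-sub-sample statistics to get the bootstrap estimate
print("bootstrap estimate of the mean:", boot_means.mean())
print("bootstrap std. error of the mean:", boot_means.std(ddof=1))
```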

2. Bootstrap Aggregation (Bagging)

Bagging is an ensemble method. Ensemble methods are techniques that combine the predictions from multiple machine learning models, producing results better than any single model's.

Bootstrap Aggregation is a general procedure that can be used to reduce the variance of algorithms that have high variance. A typical high-variance algorithm is the decision tree, such as classification and regression trees (CART).

Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees.

So bagging is just the bootstrap procedure applied to a high-variance algorithm in order to reduce its variance; this is where terms like "5-bagged decision trees" come from.

The procedure itself is simple and mirrors the bootstrap: draw sub-samples of the dataset (with replacement), train a decision tree on each sub-sample, and finally combine the trees' predictions. For example:

Let’s assume we have a sample dataset of 1000 instances (x) and we are using the CART algorithm. Bagging of the CART algorithm would work as follows:

  1. Create many (e.g. 100) random sub-samples of our dataset with replacement.
  2. Train a CART model on each sample.
  3. Given a new dataset, calculate the average prediction from each model.
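These three steps read almost directly as code. Below is a minimal sketch of bagged CART regression using scikit-learn's DecisionTreeRegressor (the synthetic data and sizes are illustrative assumptions, not from the article):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))          # 1000 instances (x)
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 1000)  # noisy target

# 1. Create many (e.g. 100) random sub-samples, with replacement.
# 2. Train a CART model on each sub-sample.
trees = []
for _ in range(100):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# 3. Given new data, average the predictions from each model.
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
bagged_pred = np.mean([t.predict(X_new) for t in trees], axis=0)
print(bagged_pred)
```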

3. Random Forest

Random forests were introduced because, even when decision trees are trained on separate sub-samples, each tree greedily searches for the optimal split; as a result the trees end up highly correlated with one another, which hurts the final ensemble result.

Random forests fix this by limiting the number of features a tree may consider at each split point, which injects extra randomness. Good defaults for that number m are:

  • For classification a good default is: m = sqrt(p)
  • For regression a good default is: m = p/3

where p is the number of input variables, i.e. the number of features; a short scikit-learn sketch of both defaults follows.
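A minimal sketch using scikit-learn's random forests (the datasets and forest sizes are illustrative assumptions); max_features is the knob that limits how many features each split may consider:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# classification: m = sqrt(p)
Xc, yc = make_classification(n_samples=500, n_features=16, random_state=0)
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                             random_state=0).fit(Xc, yc)

# regression: m = p / 3 (a float is read as a fraction of the p features)
Xr, yr = make_regression(n_samples=500, n_features=16, random_state=0)
reg = RandomForestRegressor(n_estimators=100, max_features=1/3,
                            random_state=0).fit(Xr, yr)

print(clf.score(Xc, yc), reg.score(Xr, yr))
```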
