
bootstrap && bagging && decision tree && random forest




1.Bootstrap Method

The bootstrap is a powerful statistical method for estimating a quantity from a data sample. This is easiest to understand if the quantity is a descriptive statistic such as a mean or a standard deviation.
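As a minimal sketch of the idea (not taken from the original post), the snippet below estimates the mean of a small sample by resampling it with replacement many times. The function name `bootstrap_mean`, the sample values, and the choice of 1000 resamples are assumptions made for illustration only.

```python
import random
import statistics

def bootstrap_mean(sample, n_resamples=1000, seed=0):
    """Estimate the mean of `sample` and its standard error by bootstrapping."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the sample with replacement.
        resample = rng.choices(sample, k=n)
        means.append(statistics.mean(resample))
    # The average of the resample means is the bootstrap estimate;
    # their spread approximates the standard error of the mean.
    return statistics.mean(means), statistics.stdev(means)

# Hypothetical data, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
estimate, std_error = bootstrap_mean(data)
print(f"bootstrap mean = {estimate:.3f}, standard error = {std_error:.3f}")
```

The same loop works for a standard deviation or any other descriptive statistic: only the function applied to each resample changes.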



2.Bootstrap Aggregation (Bagging)


Bootstrap Aggregation is a general procedure that can be used to reduce the variance of algorithms that have high variance. Decision trees, such as classification and regression trees (CART), are a typical example of a high-variance algorithm.

Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees.

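In other words, bagging is really just the bootstrap method applied to a high-variance algorithm in order to reduce its variance; this is how one ends up with something like "5-bagged decision trees".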


Let’s assume we have a sample dataset of 1000 instances (x) and we are using the CART algorithm. Bagging of the CART algorithm would work as follows:

  1. Create many (e.g. 100) random sub-samples of our dataset with replacement.
  2. Train a CART model on each sample.
  3. Given new data, calculate the average of the predictions from all the models.
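The three bagging steps can be sketched in plain Python as follows. To keep the example short and dependency-free, it assumes a one-split regression tree (a "stump") as a stand-in for a full CART model, and it uses a smaller synthetic dataset (100 instances, 50 models) than the 1000 instances and 100 models mentioned in the text. All function names and the data are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a simplified stand-in for CART)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors when each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical: fall back to a constant prediction.
        mean_y = sum(ys) / len(ys)
        return (float("inf"), mean_y, mean_y)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bagging_fit(xs, ys, n_models=50, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Step 1: draw a sub-sample of row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: train one model on that sub-sample.
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Step 3: average the predictions of all models.
    return sum(predict_stump(m, x) for m in models) / len(models)

# Synthetic data: a noisy step function (an assumption for illustration).
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(100)]
ys = [(1.0 if x > 0.5 else 0.0) + data_rng.gauss(0, 0.1) for x in xs]

models = bagging_fit(xs, ys)
low, high = bagging_predict(models, 0.1), bagging_predict(models, 0.9)
print(f"prediction at x=0.1: {low:.2f}, at x=0.9: {high:.2f}")
```

For classification, step 3 would typically take a majority vote over the models instead of a numeric average.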

3.Random Forest


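Bagged trees still tend to look alike, because every tree may choose from all features at every split. Random forest is therefore introduced: at each split point it limits the number of features the decision tree is allowed to choose from, which makes the trees more random. Typical defaults for this number m, where p is the total number of input features: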

  • For classification a good default is: m = sqrt(p)
  • For regression a good default is: m = p/3
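A small sketch of how these defaults might be applied at a single split point. The helper names `default_m` and `candidate_features` are hypothetical, and rounding m down to an integer (with a minimum of 1) is an assumption, since the text does not say how fractional values are handled.

```python
import math
import random

def default_m(p, task):
    """Default number of features to consider at each split."""
    if task == "classification":
        return max(1, math.isqrt(p))   # floor(sqrt(p)), rounding is an assumption
    if task == "regression":
        return max(1, p // 3)          # floor(p / 3), rounding is an assumption
    raise ValueError(f"unknown task: {task!r}")

def candidate_features(p, m, rng):
    """Pick m distinct feature indices out of p for one split point."""
    return rng.sample(range(p), m)

rng = random.Random(0)
p = 16
m_cls = default_m(p, "classification")
m_reg = default_m(p, "regression")
split_features = candidate_features(p, m_cls, rng)
print(m_cls, m_reg, split_features)
```

A fresh subset is drawn at every split, so different trees (and different nodes within one tree) end up considering different features.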

