Estimating the Prediction Error of Different Classifiers in R
阿新 • Published: 2019-01-05
Description
To compare different classifiers, we run several classification algorithms through the errorest function from the ipred package, which performs 10-fold cross-validation, to verify whether ensemble classifiers outperform a single decision tree.
Steps
We again use the telecom churn dataset as the input data source to evaluate the misclassification rates of the different classifiers.
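The code below assumes a trainset data frame prepared in an earlier recipe. As a minimal sketch (assuming the churn data bundled with the C50 package and a 70/30 random split; the split used originally may differ), trainset could be built like this:
library(C50)
data(churn)   # loads churnTrain and churnTest
set.seed(2)
ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
trainset = churnTrain[ind == 1, ]   # ~70% of rows for training
testset = churnTrain[ind == 2, ]    # remaining rows held out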
Estimate the misclassification rate of the bagging model as follows:
library(ipred)
churn.bagging = errorest(churn ~ ., data = trainset, model = bagging)
churn.bagging
Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = bagging)
10-fold cross-validation estimator of misclassification error
Misclassification error: 0.0549
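If you need the estimate as a number rather than printed output, errorest's return value stores it in the error component, so it can be read off directly:
churn.bagging$error   # 0.0549 in the run above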
Evaluate the misclassification rate of the boosting model the same way, using ada from the ada package as the classifier:
library(ada)
churn.boosting = errorest(churn ~ ., data = trainset, model = ada)
churn.boosting
Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = ada)
10-fold cross-validation estimator of misclassification error
Misclassification error: 0.0479
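errorest performs 10-fold cross-validation by default; the estimator and fold count can also be requested explicitly through est.para. A sketch using ipred's control.errorest (k is the number of folds):
churn.boosting.cv = errorest(churn ~ ., data = trainset, model = ada,
                             estimator = "cv", est.para = control.errorest(k = 10))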
Evaluate the misclassification rate of the random forest:
library(randomForest)
churn.randomforest = errorest(churn ~ ., data = trainset, model = randomForest)
churn.randomforest
Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = randomForest)
10-fold cross-validation estimator of misclassification error
Misclassification error: 0.0518
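Any extra arguments in the errorest call are passed through to the model function, so classifier hyperparameters can be tuned in place. For example (ntree is randomForest's own parameter; 1000 is just an illustrative value):
churn.rf1000 = errorest(churn ~ ., data = trainset, model = randomForest, ntree = 1000)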
Finally, define a prediction function, churn.predict1, to classify the held-out data, and pass it to errorest to evaluate the misclassification rate of a single decision tree. The wrapper is needed because predict on an rpart classification tree returns class probabilities by default, while errorest expects class labels:
library(rpart)
churn.predict1 = function(object, newdata) { predict(object, newdata = newdata, type = "class") }
churn.tree = errorest(churn ~ ., data = trainset, model = rpart, predict = churn.predict1)
churn.tree
Call:
errorest.data.frame(formula = churn ~ ., data = trainset, model = rpart,
predict = churn.predict1)
10-fold cross-validation estimator of misclassification error
Misclassification error: 0.0648
How it works
This section used the errorest function from the ipred package to evaluate the misclassification rates of four classifiers: boosting, bagging, random forest, and a single decision tree. errorest runs 10-fold cross-validation for each classifier and computes the misclassification rate of the resulting models. The estimates show that boosting has the lowest misclassification rate (0.0479), followed by random forest (0.0518) and bagging (0.0549), while the single decision tree performs worst (0.0648).
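To read the ranking off directly, the four cross-validated estimates can be collected and sorted; a small sketch reusing the objects created above:
errors = c(bagging = churn.bagging$error,
           boosting = churn.boosting$error,
           randomforest = churn.randomforest$error,
           tree = churn.tree$error)
sort(errors)   # boosting < randomforest < bagging < tree, as discussed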