
Non-Linear Classification in R with Decision Trees

In this post you will discover 7 recipes for non-linear classification with decision trees in R.

All recipes in this post use the iris flowers dataset provided with R in the datasets package. The dataset describes the measurements of iris flowers and requires classification of each observation to one of three flower species.

Classification with Decision Trees
Photo by stwn, some rights reserved

Classification and Regression Trees

Classification and Regression Trees (CART) split attributes at the values that minimize a loss function, such as the Gini index for classification or the sum of squared errors for regression.
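As a rough illustration of the split criterion (not the rpart internals), the Gini impurity of a single candidate split can be computed by hand; the Petal.Length threshold of 2.45 cm below is chosen purely for illustration:

```r
# Gini impurity of the iris classes before and after one hypothetical
# split at Petal.Length < 2.45 -- the kind of candidate CART evaluates
gini <- function(p) 1 - sum(p^2)
data(iris)
before <- gini(table(iris$Species) / nrow(iris))
left   <- iris$Petal.Length < 2.45
after  <- mean(left)  * gini(table(iris$Species[left])  / sum(left)) +
          mean(!left) * gini(table(iris$Species[!left]) / sum(!left))
c(before = before, after = after)  # before ~0.667, after ~0.333
```

The split is kept because it lowers the weighted impurity; CART repeats this search recursively in each resulting partition.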

The following recipe demonstrates the recursive partitioning decision tree method on the iris dataset.

CART method in R

```r
# load the package
library(rpart)
# load data
data(iris)
# fit model
fit <- rpart(Species~., data=iris)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4], type="class")
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the rpart function and the rpart package.

C4.5

The C4.5 algorithm is an extension of the ID3 algorithm and constructs a decision tree to maximize information gain (difference in entropy).
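The information-gain criterion can also be computed by hand for a single hypothetical split (the 2.45 cm Petal.Length threshold here is illustrative, not taken from J48's output):

```r
# information gain (entropy reduction, in bits) for a hypothetical split
# at Petal.Length < 2.45 -- the quantity C4.5 seeks to maximize
entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
data(iris)
base <- entropy(table(iris$Species) / nrow(iris))
left <- iris$Petal.Length < 2.45
gain <- base - (mean(left)  * entropy(table(iris$Species[left])  / sum(left)) +
                mean(!left) * entropy(table(iris$Species[!left]) / sum(!left)))
gain  # ~0.918 bits: the split isolates setosa completely
```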

The following recipe demonstrates the C4.5 (called J48 in Weka) decision tree method on the iris dataset.

C4.5 method in R

```r
# load the package
library(RWeka)
# load data
data(iris)
# fit model
fit <- J48(Species~., data=iris)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4])
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the J48 function and the RWeka package.


PART

PART is a rule-based system that builds a pruned C4.5 decision tree for the dataset, extracts the best rule from it, and removes from the training data the instances covered by that rule. The process is repeated until all instances are covered by extracted rules.

The following recipe demonstrates the PART rule system method on the iris dataset.

PART method in R

```r
# load the package
library(RWeka)
# load data
data(iris)
# fit model
fit <- PART(Species~., data=iris)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4])
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the PART function and the RWeka package.

Bagging CART

Bootstrap Aggregation (Bagging) is an ensemble method that creates multiple models of the same type from different sub-samples of the same dataset. The predictions from each separate model are combined to provide a superior result. This approach has proven particularly effective for high-variance methods such as decision trees.
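The mechanism is simple enough to hand-roll as a sketch (the ipred package automates this): fit one tree per bootstrap sample and combine the trees by majority vote.

```r
# minimal hand-rolled bagging sketch: B bootstrap samples, one rpart
# tree each, combined by majority vote across the trees
library(rpart)
data(iris)
set.seed(1)
B <- 25
votes <- sapply(1:B, function(b) {
  boot <- iris[sample(nrow(iris), replace = TRUE), ]  # bootstrap sub-sample
  tree <- rpart(Species ~ ., data = boot)
  as.character(predict(tree, iris, type = "class"))
})
# majority vote over the B trees for each observation
predictions <- apply(votes, 1, function(v) names(which.max(table(v))))
table(predictions, iris$Species)
```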

The following recipe demonstrates bagging applied to the recursive partitioning decision tree for the iris dataset.

Bagging CART in R

```r
# load the package
library(ipred)
# load data
data(iris)
# fit model
fit <- bagging(Species~., data=iris)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4], type="class")
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the bagging function and the ipred package.

Random Forest

Random Forest is a variation on bagging of decision trees that reduces the attributes available at each decision point to a random sub-sample. This further increases the variance of the individual trees, so more trees are required.

The following recipe demonstrates the random forest method applied to the iris dataset.

Random Forest in R

```r
# load the package
library(randomForest)
# load data
data(iris)
# fit model
fit <- randomForest(Species~., data=iris)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, iris[,1:4])
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the randomForest function and the randomForest package.

Gradient Boosted Machine

Boosting is an ensemble method, developed for classification, that reduces bias by adding models that learn to correct the misclassification errors of the existing models. It has been generalized and adapted in the form of Gradient Boosted Machines (GBM) for use with CART decision trees for classification and regression.

The following recipe demonstrates the Gradient Boosted Machines (GBM) method on the iris dataset.

Gradient Boosted Machines in R

```r
# load the package
library(gbm)
# load data
data(iris)
# fit model
fit <- gbm(Species~., data=iris, distribution="multinomial", n.trees=100)
# summarize the fit
print(fit)
# make predictions (an array of per-class probabilities)
probabilities <- predict(fit, iris, n.trees=100, type="response")
# convert probabilities to the most likely class label
predictions <- colnames(probabilities[,,1])[apply(probabilities[,,1], 1, which.max)]
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the gbm function and the gbm package.

Boosted C5.0

The C5.0 method is a further extension of C4.5 and the pinnacle of that line of methods. It was proprietary for a long time, although the code was released in recent years and is available in the C50 package.

The following recipe demonstrates the C5.0 with boosting method applied to the iris dataset.

Boosted C5.0 method in R

```r
# load the package
library(C50)
# load data
data(iris)
# fit model with 10 boosting iterations
fit <- C5.0(Species~., data=iris, trials=10)
# summarize the fit
print(fit)
# make predictions
predictions <- predict(fit, iris)
# summarize accuracy
table(predictions, iris$Species)
```

Learn more about the C5.0 function in the C50 package.

Summary

In this post you discovered 7 recipes for non-linear classification with decision trees in R, applied to the iris flowers dataset.

Each recipe is generic and ready for you to copy and paste and modify for your own problem.



