Compare Models And Select The Best Using The Caret R Package

The caret R package allows you to easily construct many different model types and tune their parameters.

After creating and tuning many model types, you may want to know and select the best model so that you can use it to make predictions, perhaps in an operational environment.

In this post you will discover how to compare the results of multiple models using the caret R package.

Compare Machine Learning Models

While working on a problem, you will settle on one or a handful of well-performing models. After tuning the parameters of each, you will want to compare the models and discover which are the best and worst performing.

It is useful to get an idea of the spread of results across the models: perhaps one can be improved, or you can stop working on one that is clearly performing worse than the others.

In the example below we compare three sophisticated machine learning models on the Pima Indians Diabetes dataset. This dataset summarizes a collection of medical reports and indicates whether each patient showed onset of diabetes within five years.

The three models constructed and tuned are Learning Vector Quantization (LVQ), Stochastic Gradient Boosting (also known as Gradient Boosted Machine or GBM), and Support Vector Machine (SVM). Each model is automatically tuned and is evaluated using 3 repeats of 10-fold cross validation.

The random number seed is set before each algorithm is trained to ensure that each algorithm gets the same data partitions and repeats. This allows us to compare apples to apples in the final results. Alternatively, we could ignore this concern and increase the number of repeats to 30 or 100, using randomness to control for variation in the data partitioning.
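The effect of resetting the seed can be illustrated without caret. The base-R sketch below uses a hypothetical helper, `make_repeated_folds()`, that assigns each row to one of k folds for each repeat. It is not how caret builds its folds internally (caret's `createMultiFolds()` also stratifies by the outcome); it only shows that the same seed produces the same partitions.

```r
# Hypothetical helper (not part of caret): for each repeat, randomly assign
# each of n rows to one of k folds, after seeding the random number generator.
make_repeated_folds <- function(n, k = 10, repeats = 3, seed = 7) {
  set.seed(seed)
  # One column per repeat; each column is a random permutation of fold labels
  sapply(seq_len(repeats), function(r) sample(rep(seq_len(k), length.out = n)))
}

# 768 rows, the size of PimaIndiansDiabetes
folds_a <- make_repeated_folds(768)
folds_b <- make_repeated_folds(768)

# Same seed, same partitions: every model would be scored on identical folds
identical(folds_a, folds_b)                    # TRUE

# 10 folds in each of 3 repeats gives 30 resamples in total
length(unique(folds_a[, 1])) * ncol(folds_a)   # 30
```

Within caret, another way to guarantee identical resamples is to build them once, for example with `createMultiFolds(PimaIndiansDiabetes$diabetes, k = 10, times = 3)`, and pass that list to the `index` argument of `trainControl()`; every model trained with that control object then sees exactly the same training rows.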

Once the models are trained and an optimal parameter configuration found for each, the accuracy results from each of the best models are collected. Each “winning” model has 30 results (3 repeats of 10-fold cross validation). The objective of comparing results is to compare the accuracy distributions (30 values) between the models.

This is done in three ways: the distributions are summarized as percentiles in a table, as box plots, and as dot plots.
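The percentile table is simply a few order statistics plus the mean, computed per model. As a hedged base-R sketch, assuming the per-resample accuracies are available as a named list of numeric vectors (one vector of 30 values per model; caret keeps them in the `values` data frame of the `resamples` object), the hypothetical helper `summarize_distributions()` below produces the same columns as that table. caret's `summary()` already does this for you; the sketch only shows what the columns mean.

```r
# Hypothetical helper (not part of caret): per-model percentile summary of
# resampled scores, laid out like caret's summary() table.
summarize_distributions <- function(scores) {
  stopifnot(is.list(scores), !is.null(names(scores)))
  stats <- vapply(scores, function(x) {
    # Type-7 quantiles, the same default used by base summary()
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                  na.rm = TRUE, names = FALSE)
    c(q[1:3], mean(x, na.rm = TRUE), q[4:5], sum(is.na(x)))
  }, numeric(7))
  out <- t(stats)  # one row per model
  colnames(out) <- c("Min.", "1st Qu.", "Median", "Mean",
                     "3rd Qu.", "Max.", "NA's")
  out
}

# Placeholder values for illustration only, not results from this post
scores <- list(A = c(0.70, 0.72, 0.75, 0.78),
               B = c(0.68, 0.74, 0.76, 0.80))
round(summarize_distributions(scores), 4)
```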

Example of comparing model results using the caret R package:

```r
# load the library
library(mlbench)
library(caret)
# load the dataset
data(PimaIndiansDiabetes)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# train the LVQ model
set.seed(7)
modelLvq <- train(diabetes~., data=PimaIndiansDiabetes, method="lvq", trControl=control)
# train the GBM model
set.seed(7)
modelGbm <- train(diabetes~., data=PimaIndiansDiabetes, method="gbm", trControl=control, verbose=FALSE)
# train the SVM model
set.seed(7)
modelSvm <- train(diabetes~., data=PimaIndiansDiabetes, method="svmRadial", trControl=control)
# collect resamples
results <- resamples(list(LVQ=modelLvq, GBM=modelGbm, SVM=modelSvm))
# summarize the distributions
summary(results)
# boxplots of results
bwplot(results)
# dot plots of results
dotplot(results)
```

Below is the table of results from summarizing the distributions for each model.

```
Models: LVQ, GBM, SVM
Number of resamples: 30

Accuracy
      Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
LVQ 0.5921  0.6623 0.6928 0.6935  0.7273 0.7922    0
GBM 0.7013  0.7403 0.7662 0.7665  0.7890 0.8442    0
SVM 0.6711  0.7403 0.7582 0.7651  0.7890 0.8961    0

Kappa
       Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
LVQ 0.03125  0.1607 0.2819 0.2650  0.3845 0.5103    0
GBM 0.32690  0.3981 0.4638 0.4663  0.5213 0.6426    0
SVM 0.21870  0.3889 0.4167 0.4520  0.5003 0.7638    0
```
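The table also reports Cohen's Kappa, which is accuracy corrected for the agreement expected by chance given the class frequencies: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the chance agreement computed from the row and column totals of the confusion matrix. This helps explain why LVQ's mean accuracy of about 0.69 is paired with a mean Kappa of only about 0.27. A minimal base-R sketch, using made-up counts rather than any result from this post:

```r
# Cohen's Kappa for a square confusion matrix
# (rows = predicted class, columns = observed class).
cohen_kappa <- function(cm) {
  stopifnot(is.matrix(cm), nrow(cm) == ncol(cm))
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                     # observed agreement (accuracy)
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Made-up counts for illustration only
cm <- matrix(c(20,  5,
               10, 15), nrow = 2, byrow = TRUE)
cohen_kappa(cm)  # accuracy 0.7, chance agreement 0.5, Kappa 0.4

# Always predicting the majority class: 65% accuracy, but Kappa 0
cohen_kappa(matrix(c(65, 35,
                      0,  0), nrow = 2, byrow = TRUE))
```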
Figure: Box plot comparing model results using the caret R package

Figure: Dot plot comparing model results using the caret R package

If you needed to make strong claims about which algorithm is better, you could also use statistical hypothesis tests to show whether the differences in the results are statistically significant: for example, Student's t-test if the results are normally distributed, or a rank sum test if the distribution is unknown.
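Because the seed was reset before each model was trained, all three models were scored on the same 30 folds, so their results are paired and the paired forms of these tests apply: a paired t-test, or the Wilcoxon signed-rank test as the rank-based alternative. The base-R sketch below wraps both in a hypothetical helper, `compare_paired()`, that takes two aligned vectors of per-resample scores.

```r
# Hypothetical helper (not part of caret): paired comparison of two models'
# per-resample scores. a[i] and b[i] must come from the same fold.
compare_paired <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 1)
  # Paired Student's t-test on the per-fold differences
  tt <- t.test(a, b, paired = TRUE)
  # Wilcoxon signed-rank test; exact = FALSE uses the normal approximation,
  # which avoids warnings when differences are tied
  wt <- wilcox.test(a, b, paired = TRUE, exact = FALSE)
  c(mean_diff  = mean(a - b),
    t_p        = tt$p.value,
    wilcoxon_p = wt$p.value)
}

# Usage with the `results` object from the example above (requires caret and
# assumes its "Model~Metric" column naming in results$values):
# compare_paired(results$values[["GBM~Accuracy"]],
#                results$values[["SVM~Accuracy"]])
```

caret also has this built in: `summary(diff(results))` runs paired t-tests between every pair of models and reports Bonferroni-adjusted p-values by default. Cross-validation folds share training rows, so the resamples are not fully independent; treat any p-value from these tests as approximate.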

Summary

In this post you discovered how to use the caret R package to compare the results from multiple different models, even after their parameters have been optimized. You saw three ways to compare the results: a table of percentiles, a box plot and a dot plot.

The examples in this post are standalone and you can easily copy-and-paste them into your own project and adapt them for your problem.


Frustrated With Your Progress In R Machine Learning?

Master Machine Learning With R

Develop Your Own Models in Minutes

…with just a few lines of R code

Covers self-study tutorials and end-to-end projects like:
Loading data, visualization, build models, tuning, and much more…

Finally Bring Machine Learning To
Your Own Projects

Skip the Academics. Just Results.


相關推薦

Compare Models And Select The Best Using The Caret R Package

Tweet Share Share Google Plus The Caret R package allows you to easily construct many different

DevSecOps, Threat Modelling and You: Get started using the STRIDE method

DevSecOps, Threat Modelling and You: Get started using the STRIDE methodNowadays more and more DevOps teams are starting to shift towards DevSecOps. The se

AOSP Part 1: Get the code using the Manifest and Repo tool

6 months ago, I moved to New York, the first city I lived in outside of Israel. With a new job at a new place, I decided to also try a new laptop runn

Tuning Machine Learning Models Using the Caret R Package

Tweet Share Share Google Plus Machine learning algorithms are parameterized so that they can be

20世紀十大演算法 The Best of the 20th Century: Editors Name Top 10 Algorithms

at the National Bureau of Standards, initiate the development of Krylov subspace iteration methods.These algorithms address the seemingly simple task of s

Data Visualization with the Caret R package

Tweet Share Share Google Plus The caret package in R is designed to streamline the process of ap

Feature Selection with the Caret R Package

Tweet Share Share Google Plus Selecting the right features in your data can mean the difference

Face Detection and Tracking Using the KLT Algorithm

Face Detection and Tracking Using the KLT Algorithm from: https://cn.mathworks.com/help/vision/examples/face-detection-and-tracking-using-

Margarite and the best present

put script lines turned orm %d perf recent black Little girl Margarita is a big fan of competitive programming. She especially loves prob

Codeforces Round #524 (Div. 2) B. Margarite and the best present 規律題

B. Margarite and the best present time limit per test 1 second memory limit per test 256 megabytes input standard input output stand

(規律)cf#524-B.Margarite and the best present

https://codeforces.com/contest/1080/problem/B 規律->奇數開始每兩個數和為-1,偶數開始每兩個數和為1,處理總個數的最後一個數即可 #include<bits/stdc++.h> using namespa

CF 1084 D. The Fair Nut and the Best Path

D. The Fair Nut and the Best Path https://codeforces.com/contest/1084/problem/D 題意:   在一棵樹內找一條路徑,使得從起點到終點的最後剩下的油最多。(中途沒油了不能再走了,可以在每個點加wi升油,減少的油量為路徑長度)。

Codeforces Round #526 D - The Fair Nut and the Best Path /// 樹上兩點間路徑花費

題目大意: 給定一棵樹 樹上每個點有對應的點權 樹上每條邊有對應的邊權 經過一個點可得到點權 經過一條邊必須花費邊權 即從u到v 最終得分=u的點權-u到v的邊權+v的點權 求樹上一條路徑使得得分最大   看註釋 #include <bits/stdc++.h> #

CodeForces 1084D The Fair Nut and the Best Path

The Fair Nut and the Best Path 題意:求路徑上的 點權和 - 邊權和 最大, 然後不能存在某個點為負數。 題解: dfs一遍, 求所有兒子走到這個點的最大值和次大值。 我們需要明白如果可以從u -> v  那麼一定可以從 v -> u, 當然 指的是

Codeforces Round #524 (Div. 2) Margarite and the best present CodeForces - 1080B

找一下規律即可。。 #include<stdio.h> #include<iostream> #include<algorithm> #include<cmath> #include<cstring> #include<strin

The Best Free Scrum Learning Resources, Guides and Articles

The Best Free Scrum Learning Resources, Guides and Articles Guide - Scrum Guides [PDF Download] Developed and sustained by Scrum creators: Ken Sch

【樹的DFS】codeforces1084D The Fair Nut and the Best Path

D. The Fair Nut and the Best Path time limit per test3 seconds memory limit per test256 megabytes inputstandard input outputstandard output The Fa

Ask HN: Whats the best desktop cfg for ML and Data science side project as R&D?

Should I go for a) All in one powerful desktop b) multiple PCs with RAM in the 4-8 GB range? How to decide?

10 of the best dating sites for introverts, wallflowers, and shy people

Online dating is basically the best thing that ever happened to introverts. You can now scan for a potential mate without ever leaving the comfort zone tha

The Silver Islands are the best (and sexiest) region in 'Assassin's Creed Odyssey'

A lot has been written about the epic story of Assassin's Creed Odyssey, and it's true that the main quests of the game span a massive map and take players