Review of Machine Learning With R
How do you get started with machine learning in R?
In this post, you will discover the book Machine Learning with R by Brett Lantz, which aims to show you exactly how to get started practicing machine learning in R.
We cover the audience for the book, a detailed breakdown of its contents, and a summary of its good and bad points.
Let’s get started.
Note: this review covers the second edition of the book.
Who Should Read This Book?
There are two types of people who should read this book:
- Machine Learning Practitioner. You already know some machine learning and you want to learn how to practice machine learning using R.
- R Practitioner. You are a user of R and you want to learn enough machine learning to practice with R.
From the preface:
It would be helpful to have a bit of familiarity with basic math and programming concepts, but no prior experience is required.
Book Contents
This section steps you through the topics covered in the book.
When picking up a new book, I like to step through each chapter and see the steps or journey it takes me on. The journey of this book is as follows:
- What machine learning is.
- Handling data.
- Lots of machine learning algorithms.
- Evaluating model accuracy.
- Improving model accuracy.
This covers many of the tasks you need for a machine learning project, but it does miss some.
Let’s step through each chapter and see what the book offers:
Chapter 1: Introducing Machine Learning
Provides an introduction to machine learning, terminology and (very) high-level learning theory.
Topics covered include:
- Uses and abuses of machine learning
- How machines learn
- Machine learning in practice
- Machine learning with R
Interestingly, the topic of machine learning ethics is covered, a topic you don’t often see addressed.
Chapter 2: Managing and Understanding Data
Covers R basics but really focuses on how to load, summarize and visualize data.
Topics include:
- R data structures
- Managing data with R
- Exploring and understanding data
A lot of time is spent on different graph types, which I generally like. It is good to know about and use more than one or two graphs.
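To give a flavor of what this looks like in practice, here is a minimal sketch using base R and the built-in iris data (my own example, not the book's):

```r
# A rough sketch of the kind of data exploration covered in Chapter 2,
# using base R and the built-in iris data (not the book's own datasets).
data(iris)

str(iris)      # structure: column names, types and a preview of the values
summary(iris)  # five-number summaries for numeric columns, counts for factors

# A few of the graph types worth knowing beyond the usual one or two
hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")
boxplot(Sepal.Length ~ Species, data = iris, main = "Sepal length by species")
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     xlab = "Sepal length", ylab = "Petal length")
```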
Chapter 3: Lazy Learning – Classification Using Nearest Neighbors
This chapter introduces and demonstrates the k-nearest neighbors (kNN) algorithm.
Topics covered include:
- Understanding nearest neighbor classification
- Example – diagnosing breast cancer with the k-NN algorithm
I like that good time is spent on data transforms, which are so critical to the accuracy of kNN.
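As a rough illustration (my own sketch, not the book's code, which works through a breast cancer dataset), here is a kNN workflow with min-max normalization using the class package and the built-in iris data:

```r
library(class)  # provides knn()

# Min-max normalization: the kind of transform that matters so much for kNN
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

data(iris)
iris_n <- as.data.frame(lapply(iris[1:4], normalize))

set.seed(1)
train_idx <- sample(nrow(iris), 100)

pred <- knn(train = iris_n[train_idx, ],
            test  = iris_n[-train_idx, ],
            cl    = iris$Species[train_idx],
            k     = 5)

table(pred, iris$Species[-train_idx])  # simple confusion matrix
```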
Chapter 4: Probabilistic Learning – Classification Using Naive Bayes
This chapter introduces and demonstrates the Naive Bayes algorithm for classification.
Topics covered include:
- Understanding Naive Bayes
- Example – filtering mobile phone spam with the Naive Bayes Algorithm
I like the interesting case study problem used.
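For a sense of how little code this takes in R, here is a minimal sketch using naiveBayes() from the e1071 package on the iris data; the book's spam-filtering example works on text data and is more involved:

```r
library(e1071)  # provides naiveBayes()

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)

model <- naiveBayes(Species ~ ., data = iris[train_idx, ])
pred  <- predict(model, iris[-train_idx, ])

table(pred, iris$Species[-train_idx])
```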
Chapter 5: Divide and Conquer – Classification Using Decision Trees and Rules
This chapter introduces decision trees and rule systems with the algorithms C5.0, 1R and RIPPER.
Topics covered include:
- Understanding decision trees
- Example – identifying risky bank loans using C5.0 decision trees
- Understanding classification rules
- Example – identifying poisonous mushrooms with rule learners
I like that C5.0 is covered, as it was proprietary for a long time and has only recently been released as open source and made available in R. I am surprised that CART was not covered, the "hello world" of decision tree algorithms.
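To show how approachable C5.0 is in R, here is a hedged sketch using the C50 package on the iris data (the book's bank loan example differs in the details):

```r
library(C50)  # the open-source C5.0 implementation

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)

tree <- C5.0(Species ~ ., data = iris[train_idx, ])
summary(tree)  # prints the learned tree and training error

pred <- predict(tree, iris[-train_idx, ])
table(pred, iris$Species[-train_idx])
```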
Chapter 6: Forecasting Numeric Data – Regression Methods
This chapter is all about regression, with demonstrations of linear regression, CART and M5P.
Topics covered include:
- Understanding Regression
- Example – predicting medical expenses using linear regression
- Understanding regression trees and model trees
- Example – estimating the quality of wines with regression trees and model trees
It is good to see the classics linear regression and CART covered here. M5P is also a nice touch.
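As a quick illustration of both flavors of regression, here is a minimal sketch using lm() and the rpart package on the built-in mtcars data; the book uses medical expense and wine quality datasets, and its model trees may come from a different package:

```r
library(rpart)  # CART-style regression trees

data(mtcars)

# Linear regression: predict fuel efficiency from weight and horsepower
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(lm_fit)

# A regression tree on the same problem (mtcars is tiny, so the tree is small)
tree_fit <- rpart(mpg ~ wt + hp, data = mtcars)
predict(tree_fit, head(mtcars))
```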
Chapter 7: Black Box Methods – Neural Networks and Support Vector Machines
This chapter introduces artificial neural networks and support vector machines.
Topics covered include:
- Understanding neural networks
- Example – modeling the strength of concrete with ANNs
- Understanding support vector machines
- Example – performing OCR with SVMs
It is good to see these algorithms covered and the example problems are interesting.
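For illustration only, here is a minimal sketch of both model types on the iris data using the nnet and e1071 packages; the book may well use different packages, and its concrete strength and OCR examples are far more interesting:

```r
library(nnet)   # a single-hidden-layer neural network
library(e1071)  # svm()

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)

nn_fit  <- nnet(Species ~ ., data = iris[train_idx, ], size = 3, trace = FALSE)
svm_fit <- svm(Species ~ ., data = iris[train_idx, ], kernel = "radial")

table(predict(nn_fit, iris[-train_idx, ], type = "class"),
      iris$Species[-train_idx])
table(predict(svm_fit, iris[-train_idx, ]),
      iris$Species[-train_idx])
```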
Chapter 8: Finding Patterns – Market Basket Analysis Using Association Rules
This chapter introduces and demonstrates association rule algorithms, typically used for market basket analysis.
Topics covered include:
- Understanding association rules.
- Example – identifying frequently purchased groceries with association rules
It's not a topic I like much, nor an algorithm I have ever had to use on a project. I'd drop this chapter.
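Still, for completeness, this is roughly what market basket analysis looks like with the arules package and its bundled Groceries data; the thresholds below are my own illustrative choices:

```r
library(arules)  # provides apriori() and ships with a Groceries dataset

data("Groceries")

# Support and confidence thresholds here are illustrative, not the book's
rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2))

inspect(sort(rules, by = "lift")[1:5])  # the five strongest rules by lift
```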
Chapter 9: Finding Groups of Data – Clustering with k-means
This chapter introduces the k-means clustering algorithm and demonstrates it on data.
Topics covered include:
- Understanding clustering
- Example – finding teen market segments using k-means clustering
Another esoteric topic that I would probably drop. Clustering is interesting, but unsupervised learning algorithms are often really hard to use well in practice. Here are some clusters; now what?
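That said, running k-means in R is trivial, as this throwaway sketch on the iris data shows (the book segments teen social media profiles instead):

```r
data(iris)

set.seed(1)
km <- kmeans(iris[1:4], centers = 3, nstart = 25)

# "Here are some clusters, now what?" -- interpretation is still on you
table(km$cluster, iris$Species)
```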
Chapter 10: Evaluating Model Performance
This chapter presents methods for evaluating model skill.
Topics covered include:
- Measuring performance for classification
- Evaluating future performance
I like that performance measures and resampling methods are covered; many texts skip them. I also like that a lot of time is spent on the more detailed concerns of classification accuracy (e.g. touching on Kappa and F1 scores).
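As an aside, if you want those more detailed measures in your own work, the caret package's confusionMatrix() function reports accuracy, Kappa, sensitivity and more in one call. A minimal sketch (mine, not the book's) follows:

```r
library(caret)  # confusionMatrix() reports accuracy, Kappa, sensitivity, etc.
library(class)

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)

pred <- knn(iris[train_idx, 1:4], iris[-train_idx, 1:4],
            cl = iris$Species[train_idx], k = 5)

confusionMatrix(pred, iris$Species[-train_idx])
```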
Chapter 11: Improving Model Performance
This chapter introduces techniques that you can use to improve the accuracy of your models, namely algorithm tuning and ensembles.
Topics covered include:
- Tuning stock models for better performance
- Improving model performance with meta-learning
Good, but too brief. Algorithm tuning and ensembles are a big part of building accurate models in modern machine learning. The length may be suitable given that it is an introductory text, but more time should be given to the caret package.
If you’re not using caret for machine learning in R, you’re doing it wrong.
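As a taste of why, here is a minimal caret sketch (my own, not from the book) that cross-validates and tunes a random forest in a handful of lines; it assumes the randomForest package is installed:

```r
library(caret)  # assumes the randomForest package is also installed

data(iris)
set.seed(1)

ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

# Resample and tune mtry for a random forest over a small grid
fit <- train(Species ~ ., data = iris,
             method = "rf",
             tuneLength = 3,
             trControl = ctrl)

print(fit)     # accuracy and Kappa for each candidate value of mtry
fit$bestTune   # the winning configuration
```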
Chapter 12: Specialized Machine Learning Topics
This chapter contains a mess of other topics, including:
- Working with proprietary files and databases
- Working with online data and services
- Working with domain-specific data
- Improving performance of R
The topics are very specialized. Perhaps only the last, on improving the performance of R, is really actionable for your machine learning projects.
Machine Learning Algorithms
The book covers a number of different machine learning algorithms. This section lists all of the algorithms covered and in which chapter they can be found.
I note that page 21 of the book does provide a look-up table of algorithms to chapters, but it is too high-level and glosses over the actual names of the algorithms used.
- k-nearest neighbors (chapter 3)
- Naive Bayes (chapter 4)
- C5.0 (chapter 5)
- 1R (chapter 5)
- RIPPER (chapter 5)
- Linear Regression (chapter 6)
- Classification and Regression Trees (chapter 6)
- M5P (chapter 6)
- Artificial Neural Networks (chapter 7)
- Support Vector Machines (chapter 7)
- Apriori (chapter 8)
- k-means (chapter 9)
- Bagged CART (chapter 10)
- AdaBoost (chapter 10)
- Random Forest (chapter 10)
What Do I Think Of This Book?
I like the book as an introduction for how to do machine learning on the R platform.
You must know how to program. You must know a little bit of R. You must have some sense of how to drive a machine learning project from beginning to end. This book will not cover these topics, but it will show you how to complete common machine learning tasks using R.
Set your expectations accordingly:
- This is a practical book with worked examples and high-level algorithm descriptions.
- This is not a machine learning textbook with theory, proof and lots of equations.
Pros
- I like the structured examples, where each algorithm is demonstrated on a different dataset.
- I like that the datasets are small, in-memory examples, perhaps all taken from the UCI Machine Learning Repository.
- I like that references to research papers are provided where appropriate for further reading.
- I like the boxes that summarize usage information for algorithms and other key techniques.
- I like that it is practically focused: the how of machine learning, not the deep why.
Cons
- I don't like that it is so algorithm focused. It follows the general structure of most "applied" books and dumps a lot of algorithms on you, rather than walking you through the extended project lifecycle.
- I don't like that there are no end-to-end examples (problem definition, through to model selection, through to presentation of results). The formal structure of the examples is good, but I'd like a deep case study chapter, I think.
- I cannot download the code and datasets from a GitHub repository or as a zip. I have to sign up and go through the publisher's process.
- There are chapters that feel like they are only there because similar chapters exist in other machine learning books (clustering and association rules). These may be machine learning methods, but they are not used nearly as often as core predictive modeling methods (IMHO).
- Perhaps a little too much filler. I like less talk, more action. If I wanted long algorithm descriptions, I'd read an algorithms textbook. Tell me the broad strokes and let's get to it.
Final Word
If you are looking for a good applied book for machine learning with R, this is it. I like it for beginners who know a little machine learning and/or a little R and want to practice machine learning on the R platform.
Even though I think O’Reilly books are generally better applied books than Packt, I don’t see an offering from O’Reilly that can compete.
If you want to go one step deeper and get some more theory and more explanations I would advise checking out: Applied Predictive Modeling. If you want more math I would suggest An Introduction to Statistical Learning: with Applications in R.
Both books have examples in R, but less focus on R and more focus on the details of machine learning algorithms.
Next Step
Have you read this book? Let me know what you think in the comments.
Are you thinking of buying this book? Have any questions? Let me know in the comments and I’ll do my best to answer them.