6 Questions To Understand Any Machine Learning Algorithm
There are a lot of machine learning algorithms and each algorithm is an island of research.
You have to choose the level of detail at which you study machine learning algorithms. There is a sweet spot if you are a developer interested in applied predictive modeling.
This post describes that sweet spot and gives you a template that you can use to quickly understand any machine learning algorithm.
Let’s get started.
What Do You Need To Know About a Machine Learning Algorithm?
What do you need to know about a machine learning algorithm to be able to use it well on a classification or prediction problem?
I won’t dispute that the more you know about how and why a particular algorithm works, the better you can wield it. But I do believe that there is a point of diminishing returns where you can stop, use what you know to be effective, and dive deeper into the theory and research on an algorithm only if you need to know more in order to get better results.
Let’s take a look at the 6 questions that will reveal how a machine learning algorithm works and how to best use it.
6 Questions To Ask About Any Algorithm
There are 6 questions that you can ask to get to the heart of any machine learning algorithm:
- How do you refer to the technique (e.g. what name)?
- How do you represent a learned model (e.g. what coefficients)?
- How do you learn a model (e.g. the optimization process from data to the representation)?
- How do you make predictions from a learned model (e.g. apply the model to new data)?
- How do you best prepare your data for modeling with the technique (e.g. assumptions)?
- How do you get more information on the technique (e.g. where to look)?
You will note that I have phrased all of these questions as How-To. I did this intentionally to separate the practical concerns of how from the more theoretical concerns of why. I think knowing why a technique works is less important than knowing how it works, if you are looking to use it as a tool to get results. More on this in the next section.
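One way to use the six questions as a template is to treat them as fields to fill in for each algorithm you study. The sketch below is only an illustration: the class name `AlgorithmDescription`, its field names and the `unanswered` helper are all made up for this example, and the k-NN entry is deliberately left incomplete.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmDescription:
    """Fill-in template with one field per question (names are illustrative)."""
    names: list[str]       # 1. how you refer to the technique
    representation: str    # 2. what a learned model consists of
    learning: str          # 3. how the model is learned from data
    prediction: str        # 4. how predictions are made from the model
    data_preparation: list[str] = field(default_factory=list)  # 5. transforms to try
    resources: list[str] = field(default_factory=list)         # 6. where to read more

    def unanswered(self) -> list[str]:
        """Return the names of the fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# A partly filled-in description of k-nearest neighbors.
knn = AlgorithmDescription(
    names=["k-nearest neighbors", "k-NN"],
    representation="the whole stored training set",
    learning="store the training set",
    prediction="",  # not written up yet
)

print(knn.unanswered())
```

Calling `unanswered()` lists the questions you still have to research for that algorithm.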
Let’s take a closer look at each of these questions in turn.
1. How Do You Refer To The Technique?
This is obvious but important. You need to know the canonical name of the technique.
You need to be able to recognize the classical name or the name of the method from other fields as well and know that it is the same thing. This also includes the acronym for the algorithm, because sometimes they are less than intuitive.
This will help you sort out the base algorithm from extensions and the family tree of where the algorithm fits and relates to similar algorithms.
2. How Do You Represent a Learned Model?
I really like this nuts and bolts question.
This question is often overlooked in textbooks and papers, and it is perhaps the first question an engineer has when thinking about how a model will actually be used and deployed.
The representation is the numbers and data structure that captures the distinct details learned from data by the learning algorithm to be used by the prediction algorithm. It’s the stuff you save to disk or the database when you finalize your model. It’s the stuff you update when new training data becomes available.
Let’s make this concrete with an example. In the case of linear regression, the representation is the vector of regression coefficients. That’s it. In the case of a decision tree, it is the tree itself, including the nodes, how they are connected, and the variables and cut-off thresholds chosen.
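As a rough sketch of what that can look like in code, the snippet below writes both representations as plain Python data structures and saves them as JSON. The coefficient values, the tree layout and the dictionary keys are all made up for illustration; real libraries store their models in their own formats.

```python
import json

# Linear regression: the learned model is just the coefficients.
# (Values are invented for illustration, not learned from real data.)
linear_model = {"intercept": 0.5, "coefficients": [2.0, -1.0]}

# Decision tree: the nodes, how they are connected, and the variable
# (feature index) and cut-off threshold chosen at each split.
tree_model = {
    "feature": 0, "threshold": 3.0,
    "left": {"leaf": "A"},
    "right": {
        "feature": 1, "threshold": 1.5,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}

# "The stuff you save to disk": both models survive a round trip through JSON.
saved = json.dumps({"linear": linear_model, "tree": tree_model})
restored = json.loads(saved)
```

Everything the prediction step needs is in `restored`; nothing about the training data or the learning procedure has to be kept.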
3. How Do You Learn a Model?
Given some training data, the algorithm needs to create the model or fill in the model representation. This question is about exactly how that occurs.
In simpler algorithms, learning often involves estimating parameters directly from the training data.
In most other algorithms it involves using the training data as part of a cost or loss function and an optimization algorithm to minimize the function. Simpler linear techniques may use linear algebra to achieve this result, whereas others may use a numerical optimization.
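To illustrate the two routes, here is a minimal sketch that fits the same one-input linear regression twice: once by estimating the coefficients directly with the least squares formulas, and once by minimizing mean squared error with gradient descent. The function names, the toy data, the learning rate and the number of epochs are all assumptions chosen for this example.

```python
# Toy data that follows y = 1 + 2x exactly (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_closed_form(xs, ys):
    """Estimate intercept and slope directly from the data (least squares)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Minimize mean squared error by repeatedly stepping down its gradient."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2.0 * sum(errors) / n
        grad_b1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


direct = fit_closed_form(xs, ys)
iterative = fit_gradient_descent(xs, ys)
```

Both calls fill in the same representation (an intercept and a slope). On this toy data they agree to within rounding error, but the second needs a learning rate and an iteration count to be chosen.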
Often the way a machine learning algorithm learns a model is synonymous with the algorithm itself. This is the challenging and often time consuming part of running a machine learning algorithm.
The learning algorithm may be parameterized and it is often a good idea to list common ranges for parameter values or configuration heuristics that may be used as a starting point.
4. How Do You Make Predictions With A Model?
Once a model is learned, it is intended to be used to make predictions on new data. Note that we are exclusively talking about predictive modeling machine learning algorithms for classification and regression problems.
This is often the fast and even trivial part of using a machine learning algorithm. Often it is so trivial that it is not even mentioned or discussed in the literature.
It may be trivial because prediction may be as simple as filling in the inputs in an equation and calculating a prediction, or traversing a decision tree to see what leaf-node lights up. In other algorithms, like k-nearest neighbors, the prediction algorithm may be the main show (k-NN has no training algorithm other than “store the whole training set”).
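The three cases above can be sketched in a few lines each. This is an illustration under assumptions: the function names, the dictionary layout of the tree, the use of squared Euclidean distance and majority voting for k-NN, and all of the data are invented for the example.

```python
from collections import Counter


def predict_linear(intercept, coefficients, row):
    """Fill the inputs into the equation and calculate the output."""
    return intercept + sum(c * x for c, x in zip(coefficients, row))


def predict_tree(node, row):
    """Walk from the root to a leaf, going left when the value is below the threshold."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def predict_knn(training_rows, training_labels, row, k=3):
    """Vote among the k stored rows closest to the new row."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(
        zip(training_rows, training_labels),
        key=lambda pair: squared_distance(pair[0], row),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


tree = {
    "feature": 0, "threshold": 3.0,
    "left": {"leaf": "A"},
    "right": {"feature": 1, "threshold": 1.5,
              "left": {"leaf": "B"}, "right": {"leaf": "C"}},
}
rows = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["a", "a", "a", "b", "b"]

print(predict_linear(0.5, [2.0, -1.0], [3.0, 1.0]))
print(predict_tree(tree, [4.0, 1.0]))
print(predict_knn(rows, labels, [5.5, 5.0]))
```

Note how the k-NN function does all of its work at prediction time, whereas the other two only read a small learned structure.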
5. How Do You Best Prepare Data For The Algorithm?
Machine learning algorithms make assumptions.
Even the most relaxed non-parametric methods make assumptions about your training data. It is good or even critical to review these assumptions. Even better is to translate these assumptions into specific data preparation operations that you can perform.
This question flushes out transforms that you could use on your data before modeling, or at the very least gives you pause to think about data transforms to try. What I mean by this is that it is best to treat algorithm requirements and assumptions as suggestions of things to try to get the most out of your model, rather than hard and fast rules that your data must adhere to.
Just like you cannot know which algorithm will be best for your data beforehand, you cannot know the best transforms to apply to your data to get the most from an algorithm. Real data is messy and it is a good idea to try a number of presentations of your data with a number of different algorithms to see what warrants deeper investigation. The requirements and assumptions of machine learning algorithms help to point out presentations of your data to try.
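As a small sketch of what "a number of presentations of your data" might mean in practice, the snippet below builds three versions of one numeric column: raw, rescaled to the range 0 to 1, and standardized to zero mean and unit standard deviation. These two transforms are my choice of example, not a list from any particular algorithm's requirements, and the function names and data are made up.

```python
import statistics


def normalize(values):
    """Rescale values to the range 0..1 (assumes they are not all equal)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # assumes the values are not all equal
    return [(v - mean) / stdev for v in values]


column = [2.0, 4.0, 6.0, 8.0]

# Each entry is one presentation of the same data to try with each algorithm.
presentations = {
    "raw": column,
    "normalized": normalize(column),
    "standardized": standardize(column),
}
```

Each presentation can then be evaluated with each candidate algorithm, and the combinations that do well are the ones that warrant deeper investigation.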
6. How Do You Get More Information About the Algorithm?
Some algorithms will bubble up as generally better than others on your data problem.
When they do, you need to know where to look to get a deeper understanding of the technique. This can help with further customizing the algorithm for your data and with tuning the parameters of the learning and prediction algorithms.
It is a good idea to collect and list resources that you can reference if and when you need to dive deeper. This may include:
- Journal Articles
- Conference Papers
- Books including textbooks and monographs
- Webpages
I also think it is a good idea to know of more practical references like example tutorials and open source implementations that you can look inside to get a more concrete idea of what is going on.
For more on researching machine learning algorithms, see the post How to Research a Machine Learning Algorithm.
Summary
In this post you discovered 6 questions that you can ask of a machine learning algorithm that, if answered, will give you a very good and practical idea of how it works and how you can use it effectively.
These questions were focused on machine learning algorithms for predictive modeling problems like classification and regression.
These questions, phrased simply, are:
- What are the common names of the algorithm?
- What representation is used by the model?
- How does the algorithm learn from training data?
- How can you make predictions from the model on new data?
- How can you best prepare your data for the algorithm?
- Where can you look for more information about the algorithm?
For another post along this theme of defining an algorithm description template see How to Learn a Machine Learning Algorithm.
Do you like this approach? Let me know in the comments.