How to Build an Intuition for Machine Learning Algorithms

Machine learning algorithms are complex. To get good at applying a given algorithm you need to study it from multiple perspectives: algorithmic, mathematical and empirical.

It’s this last point I want to stress. You need to build up an intuition for how an algorithm behaves on real data. You need to work on lots of problems.

In this post I want to encourage you to use small in-memory datasets when starting out and when practising machine learning.

Wrapping your head around data
Photo by Nic McPhee, some rights reserved

Study an Algorithm or a Problem, Not Both

You can’t learn a problem and an algorithm at the same time.

If you try, you will progress on both slowly and inefficiently. Your focus will be divided and neither task will be done well.

You will know when you’re in this trap because you will oscillate between diving deep into the problem and deep into a specific algorithm. You will be frustrated and overwhelmed. You’re taking on too much.

Split Your Concerns

The best course of action is to study the algorithm and the problem separately.

You study the problem by using algorithms to learn more about it and posit candidate solutions in the form of models. This means you will be experimenting with a lot of models (spot checking) and likely a lot of algorithm configurations (tuning).

You study an algorithm by focusing on one problem dataset and using it to learn more about the interactions of the algorithm’s parameters and their effects on the model, such as the final result or behaviour over time.

It is this second type of project where you can use empirical experiments to build an intuition for how machine learning algorithms work. You can pair this intuition with the theory of why they work and aim to make informed decisions about which algorithm to use, and when, for a given problem in the future.

Play the Scientist

You are looking to characterize the behaviours of the algorithm as a system on a controlled problem.

The focus of the study is a question, such as:

  • What is the information processing strategy of the algorithm?
  • How does the system behave when a given parameter is varied?

Clearly define the specific question you intend to answer with your study before you get started. Be clear on what form the answer will take.
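As a rough illustration of the second question, here is a minimal sketch of a single-parameter study. Everything in it is an assumption made for the example: a hand-rolled k-nearest neighbours classifier, a small synthetic two-class dataset standing in for a standard one, and the hypothetical helper names `make_two_blobs`, `knn_predict`, `accuracy` and `sweep_k`. The data and the train/test split are held fixed by a seed, and only k changes between runs.

```python
import math
import random
from collections import Counter


def make_two_blobs(n_per_class=60, spread=1.0, seed=1):
    """Generate a small synthetic 2-class dataset (a stand-in, not a real dataset)."""
    rng = random.Random(seed)
    rows = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(cx, spread), rng.gauss(cy, spread))
            rows.append((point, label))
    rng.shuffle(rows)
    return rows


def knn_predict(train, point, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def sweep_k(k_values, seed=1):
    """Hold the data and the 70/30 split fixed; vary only k."""
    rows = make_two_blobs(seed=seed)
    split = int(0.7 * len(rows))
    train, test = rows[:split], rows[split:]
    return {k: accuracy(train, test, k) for k in k_values}


if __name__ == "__main__":
    # Odd values of k avoid tied votes on a two-class problem.
    for k, acc in sweep_k([1, 3, 5, 9, 15, 25]).items():
        print(f"k={k:>2}  accuracy={acc:.3f}")
```

The form of the answer here is a small table of k against accuracy. Changing `spread` or `seed` and repeating the sweep is one way to see whether a pattern holds beyond a single run.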

Practical Results

Studying algorithms has some specific tangible benefits that improve your machine learning skills, such as:

  • Algorithm Tuning: You are learning how the algorithm behaves as a complex system and the influence the algorithm parameters have on those behaviors. These are invaluable insights and intuitions needed for tuning the algorithm on specific problem instances.
  • Problem-Algorithm fit: You are learning about the classes of algorithms and specific algorithm instances that perform well on classes of problems and problem instances. This is an intuition that can only be built up from experience.
  • Project Life-cycle: You are practising the process of applied machine learning, from data preparation through algorithm testing and tuning to the presentation of results.

The key is having standard, well-understood datasets that you can use to better understand the algorithm under study.

Use Standard Datasets

You can use one or a small number of model datasets to study a machine learning algorithm.

Sometimes they are called toy datasets or toy problems, because of their size. Nevertheless, they play an important role when you are learning about and practising machine learning algorithms.

Different datasets have different known properties. It is often desirable to select a small set of those properties to expose different behaviours of an algorithm under study.

For example, some properties may include:

  • Number of Features
  • Class Distribution
  • Data Types
  • Structured Relationship

5 Benefits of Model Datasets

Below are 5 benefits you get from using standard machine learning datasets.

  • Small: The dataset can fit into memory. This means you can run a lot of experiments, quickly and in turn learn about the algorithm quickly.
  • Understood: The dataset is generally understood. It may have significant literature behind it or be a common point of test and study for algorithms. It has known properties for testing the capability of an algorithm.
  • Controlled: A model dataset is constant and provides the basis for controlled experiments. The behavior of the algorithm can be varied to see the effects on the results against the well-understood problem.
  • Free: Model datasets are available for download. You do not need permission or to pay a license fee. The common data sets are available for you to use whenever you need.
  • Simple: The structure or relationships in the data are not complex. They can be easily understood, described with summary statistics and graphs. There are typically few variables.

UCI Machine Learning Repository

Some tools come with sample datasets, but one great source that you can trust to be consistent is the University of California Irvine Machine Learning Repository.

It is a website that hosts hundreds of standard machine learning datasets used in academia for testing, demonstrating and empirically characterizing the behaviours of machine learning algorithms.

You can browse datasets on this site, look at the data, and review papers and articles that have made reference to the dataset.

It is a valuable resource that you can use to find datasets to study a machine learning algorithm.

5 Classic Model Datasets

Below is a list of 5 classic datasets that I like to use when getting familiar with a new algorithm or an old algorithm I’ve forgotten about.

  • Iris Flower: Describes iris flowers in terms of the dimensions of the flowers, divided into three species classes.
  • Ionosphere: Describes radar return data characterizing energy states in the ionosphere. All attributes are numeric and the class is binary.
  • Pima Indians Diabetes: Varied medical record data for Pima Indians with a binary class of whether the patient had an onset of diabetes within 5 years from when the medical data was collected.
  • Glass Identification: Identification of glass type based on the chemical composition of samples, with multiple unbalanced classes.
  • Wisconsin Breast Cancer: Medical biopsy information from breast cancer patients and a binary class variable of whether the sample was cancerous.

You may find one or more of these datasets useful in your own experiments.

Summary

In this post you discovered the difficulties when attempting to learn about a problem dataset and an algorithm at the same time. In fact, they are competing concerns.

You discovered that the answer is to separate those concerns into learning about your problem and learning about an algorithm, and being clear on what your goals are.

You discovered the benefits of small model datasets when learning about an algorithm, where to get standard machine learning datasets and some popular examples you could start with.

If you would like to know more about how to study machine learning algorithms, take a look at my algorithm description template for learning any algorithm and small projects methodology guides for self-study projects including studying algorithms.


