The Machine Learning Mastery Method
5 Steps to Get Started and Get Good at Machine Learning
I teach a 5-step process that you can use to get your start in applied machine learning.
It is unconventional.
The traditional way to teach machine learning is bottom-up.
It starts with the theory and math, moves on to algorithm implementations, and then sends you off to figure out how to start solving real-world problems.
The Machine Learning Mastery approach flips this and starts with the outcome that is most valuable.
It targets the outcome that businesses want to pay for:
how to deliver a result.
That result takes the form of a set of predictions or a model that can reliably make predictions.
This is a top-down and results-first approach.
Starting with the goal of achieving the result that is most desirable in the marketplace, what is the shortest path to take you, the practitioner, to that result?
We can summarize this path in 5 steps as follows:
- Step 1: Adjust Mindset (believe!).
- Step 2: Pick a Process (how to get results).
- Step 3: Pick a Tool (implementation).
- Step 4: Practice on Datasets (put in the work).
- Step 5: Build a Portfolio (show your skills).
That’s it.
It’s why I created this website. I knew an easier way and just had to share it.
Below is a cartoon to illustrate the process, where step 1 (on mindset) and step 5 (on showing your work) are omitted for brevity.
Let’s take a closer look at each step.
Step 0: Landmarks
Before we begin, you must know the landmarks of machine learning.
I often just assume this, but you cannot proceed unless you know some true basics.
For example:
- You should know what machine learning is and be able to explain it to a colleague.
- You should know some examples of machine learning problems off the top of your head.
- You should know that machine learning is the only way to solve some complex problems.
- You should know that predictive modeling is the most useful part of applied machine learning.
- You should know where machine learning fits with regard to AI and Data Science.
- You should know the types of machine learning algorithms available.
Step 1: Mindset
Machine learning is not just for the professors.
It is not just for the gifted or the academics.
You Must Believe
You can learn the topic and apply it to solve problems.
There’s no reason why not.
- You do not need to write code.
- You do not need to know or be good at math.
- You do not need a higher degree.
- You do not need big data.
- You do not need access to a supercomputer.
- You do not need a lot of time.
Really, there is only one thing that can stop you from getting started and getting good at machine learning.
It’s you.
- Maybe you just can’t find the motivation.
- Maybe you think you have to implement everything from scratch.
- Maybe you keep picking advanced problems rather than beginner problems to work on.
- Maybe you don’t have a systematic process to follow in order to deliver a result.
- Maybe you’re not making use of good tools and libraries.
Clear the limiting beliefs stopping you from getting started.
This post might help:
There are a lot of speed bumps you can hit.
Identify them, address them, and keep moving.
Why Machine Learning?
Once you know that you can do machine learning, understand why.
- Maybe you’re interested in learning more about machine learning algorithms.
- Maybe you’re interested in creating predictions.
- Maybe you’re interested in solving complex problems.
- Maybe you’re interested in creating smarter software.
- Maybe you’re even interested in becoming a data scientist.
Think hard on this topic and try to figure out your “why”.
This post might help:
Once you have your “why”, find your tribe.
With which group of machine learning practitioners do you have the most affinity?
- Maybe you’re a business person with a general interest.
- Maybe you’re a manager delivering a project.
- Maybe you’re a machine learning student.
- Maybe you’re a machine learning researcher.
- Maybe you’re a researcher with a sticky problem.
- Maybe you want to implement algorithms.
- Maybe you need one-off predictions.
- Maybe you need a model you can deploy.
- Maybe you’re a data scientist.
- Maybe you’re a data analyst.
Each tribe has different interests and will approach the field of machine learning from a different direction.
Not all books and materials are right for you. Find your tribe, then find the materials that speak to you.
This post might help:
Step 2: Pick a Process
Do you want to reliably get above-average results on problem after problem?
You need to follow a systematic process.
- A process allows you to harness and reuse best practices.
- It means you don’t have to rely on memory or intuition.
- It guides you through a project end-to-end.
- It means that you always know what to do next.
- It can be tailored to your specific problem types and tools.
A systematic process is the difference between a roller coaster of good and bad results on the one hand and above-average, forever-improving results on the other.
I would choose above-average, forever-improving results every time.
A process template that I recommend is as follows:
- Step 1: Define your problem.
- Step 2: Prepare your data.
- Step 3: Spot-check algorithms.
- Step 4: Improve results.
- Step 5: Present results.
Below is a nice cartoon to summarize this systematic process:
You can learn more about this process in the post:
You do not have to use this process, but you do need a systematic process for working through predictive modeling problems.
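As a rough illustration, the five-step template above could be sketched in Python with scikit-learn. This is only a minimal sketch under assumptions, not a prescribed implementation: the iris dataset, the three candidate algorithms, the 80/20 split, the random seed, and the choice to tune k-nearest neighbors are all illustrative picks that do not come from the process itself.

```python
# A minimal sketch of the five-step process template (all choices are assumptions).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Step 1: Define your problem.
# Here: predict the iris species from four flower measurements (classification).
X, y = load_iris(return_X_y=True)

# Step 2: Prepare your data.
# Hold back a test set; scaling happens inside each pipeline to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# Step 3: Spot-check algorithms.
# Evaluate a few different algorithms quickly with 5-fold cross-validation.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "decision_tree": DecisionTreeClassifier(random_state=7),
}
spot_check = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Step 4: Improve results.
# Tune one candidate (k-nearest neighbors is an arbitrary choice for this sketch).
search = GridSearchCV(
    candidates["k_nearest_neighbors"],
    param_grid={"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 5: Present results.
test_accuracy = search.score(X_test, y_test)
for name, score in sorted(spot_check.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean CV accuracy {score:.3f}")
print(f"tuned k-NN: best params {search.best_params_}, test accuracy {test_accuracy:.3f}")
```

The point of the sketch is the shape of the workflow rather than the specific numbers: each step has a clear place in the script, so swapping in a different dataset or a different set of algorithms leaves the structure unchanged.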
Step 3: Pick a Tool
Pick a best-of-breed tool that you can use to deliver machine learning results.
Map your process onto the tool and learn how to use it most effectively.
There are three tools I recommend the most:
- Weka Machine Learning Workbench (Perfect for beginners). Weka offers a graphical interface and no code is required. I use it for quick one-off modeling problems.
- Python Ecosystem (Perfect for intermediate). Specifically pandas and scikit-learn on top of the SciPy platform. You can use the same code and models in development and they are reliable enough to run in operations.
- R Platform (Perfect for advanced). R was designed for statistical computing, and although the language is arcane and some of the packages are poorly documented, it offers the most methods as well as state-of-the-art techniques.
I also have recommendations for specialty areas:
- Keras for Deep Learning. It uses Python, meaning you can leverage the whole Python ecosystem, which saves a lot of time. The interface is very clean, whilst also supporting the power of the Theano and TensorFlow back-ends.
- XGBoost for Gradient Boosting. It is one of the fastest implementations of the technique available. It also supports both R and Python, allowing you to leverage either platform in your project.
These are just my personal recommendations and I have lots of posts as well as more detailed training on each.
Learn how to use your chosen tool well. Study it. Become an expert in it.
What Programming Language?
The programming language does not matter.
Even the tool you use does not matter.
The skills you learn working through problems will transfer from platform to platform easily.
Nevertheless, here are some survey results on the most popular languages in machine learning:
Step 4: Practice on Datasets
Once you have a process and a tool, you need to practice.
You need to practice a lot.
Practice on standard machine learning datasets.
- Use real-world datasets, collected from an actual problem domain (rather than contrived).
- Use small datasets that fit into memory or an Excel spreadsheet.
- Use well-understood datasets so you know what kind of results to expect.
Practice on different types of datasets. Practice on problems that make you uncomfortable, as you will have to push your skills to get a solution. Seek out different traits in data problems, such as:
- Different types of supervised learning such as classification and regression.
- Different sized datasets, from tens to hundreds, thousands, and millions of instances.
- Different numbers of attributes, from fewer than ten to tens, hundreds, and thousands of attributes.
- Different attribute types, including real, integer, categorical, ordinal, and mixtures.
- Different domains that force you to quickly understand and characterize a new problem in which you have no previous experience.
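To make these traits concrete, here is a small sketch that profiles a dataset before you model it: how many instances it has, how many attributes, and a rough guess at each attribute's type. It uses only the Python standard library. The inline `SAMPLE_CSV` data and the helper names `infer_type` and `profile` are hypothetical, made up for this illustration, and the type inference is a simple heuristic rather than a robust method.

```python
import csv
import io

# Hypothetical inline sample standing in for a real CSV file.
SAMPLE_CSV = """age,income,city,owns_home,label
34,52000.5,Austin,yes,1
27,38000.0,Boston,no,0
45,,Austin,yes,1
52,91000.0,Denver,no,1
"""


def infer_type(values):
    """Guess an attribute type from its non-missing string values (heuristic)."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    try:
        [int(v) for v in present]
        return "integer"
    except ValueError:
        pass
    try:
        [float(v) for v in present]
        return "real"
    except ValueError:
        return "categorical"


def profile(csv_text):
    """Summarize instance count and per-attribute type, missing and distinct counts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    summary = {"instances": len(rows), "attributes": {}}
    for name in columns:
        values = [row[name] for row in rows]
        summary["attributes"][name] = {
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
            "distinct": len({v for v in values if v != ""}),
        }
    return summary


summary = profile(SAMPLE_CSV)
print(f"instances: {summary['instances']}")
for name, info in summary["attributes"].items():
    print(f"{name}: {info['type']}, missing={info['missing']}, distinct={info['distinct']}")
```

Note that the heuristic reports an integer-coded class label (like `label` here) as "integer" even though it is categorical in meaning, and it cannot tell ordinal from categorical values, so treat the output as a starting point for your own inspection.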
Use the UCI Machine Learning Repository
These are the most used and best-understood datasets and the best place to start.
Learn more in the post:
Use machine learning competitions, such as Kaggle
These datasets are often larger and require more preparation to model well.
For a list of the most popular datasets that you could practice on, see the post:
Practice on problems of your own devising
Collect data on machine learning problems that matter to you.
You will find the problems and the solutions you devise so much more rewarding.
For more information, see the post:
Step 5: Build a Portfolio
You will build up a collection of completed projects.
Put them to good use.
As you work through datasets and get better, create semi-formal outputs that summarize your findings.
- Maybe upload your code and summarize it in a readme.
- Maybe you write up your results in a blog post.
- Maybe you make a slide deck.
- Maybe you create a little video on YouTube.
Each one of these completed projects represents one piece of your growing portfolio.
Just like a painter, you can build a portfolio of completed work to demonstrate your growing skills in delivering results with machine learning.
You can learn more about this approach in the post:
You can use this portfolio yourself, leveraging code and knowledge from your prior results in larger and more ambitious projects.
Once your portfolio is mature, you may even choose to leverage it into more responsibility at work or into a new machine learning-focused role.
For more on this see the post:
Tips And Tricks
Below are some practical tips and tricks you may consider when using this process.
- Start with a simple process (like above) and a simple tool (like Weka), then advance once you have confidence.
- Begin with the simplest and most used datasets (such as iris flowers and Pima Indians diabetes).
- Each time you apply the process, look for ways to improve it and your usage of it.
- If you discover new methods, figure out the best way to integrate them into your process.
- Study algorithms, but only as much as, and in ways that, help you achieve better results with your process.
- Study and learn from experts and see what methods you can steal and add to your process.
- Study your tool like you do predictive modeling problems and get the most out of it.
- Tackle harder and harder problems; leave the easy ones behind, as you won’t learn much from them.
- Focus on clearly presenting results. The better you do this, the greater the impact of your portfolio.
- Engage with the community on forums and Q&A sites; both ask and answer questions.
Summary
In this post, you discovered a simple 5-step process that you can use to get started and make progress in applied machine learning.
Although simple to lay out, the approach does take hard work, but it does pay off.
Many of my students worked through this process and got work as machine learning engineers and data scientists.
If you are interested in a deeper treatment of this process and related ideas, see the post:
Do you have any questions?
Ask in the comments below and I will do my best to answer.