Why and What of Machine Learning?

The world around us is changing rapidly as machines become more intelligent. Machines learn from the data we have collected over the years and from the data we generate every day. Machine learning is not a new concept: the term was coined by Arthur Samuel in 1959. Today almost everyone seems to be looking for ways to adopt machine learning, primarily because we now have the necessary resources: huge amounts of data from which to draw meaningful conclusions, and the processing power to learn from that data and make predictions.

Machine learning is not just a buzzword but a whole new dimension of knowledge. Who could have guessed that machines would uncover insights no one even knew existed? The recommendations you get on popular e-commerce websites such as Amazon and Flipkart use machine learning to suggest items you are likely to buy. Digital assistants such as Google Assistant, Siri, and Alexa use machine learning to answer the practically endless variety of questions you might ask. Tesla's electric cars use a machine-learning-based Autopilot that can steer, accelerate and brake on the road, although it still requires an attentive human driver.

With machine learning recognised as an important breakthrough in almost every field, it is worth understanding what it really is. But first, let’s see why such a concept is useful at all, using a very basic example: SPAM vs HAM (non-spam) emails.

Why Machine Learning?

Whenever you access your email through a client like Gmail or Outlook, you have probably noticed that some emails land in your inbox while others move to the SPAM folder, without you ever telling the client or the email service provider which emails are spam for you. This is not a miracle but a machine learning application. These systems are trained on a huge collection of emails that have previously been labelled as SPAM or HAM, and this training gives them the ability to categorise each new email as it arrives.

The application seems quite basic, but imagine if every user had to do this categorisation by hand each day; that is how much work it saves. Without such a system, your inbox would fill up with irrelevant emails, and you might spend 15–20 minutes daily sorting out the relevant ones and deleting the rest. Repeating the same task every day would be tedious and a waste of time. This is where a machine can step in and make a world of difference, saving you both time and energy.

Businesses apply the same principle at a much larger scale, using the huge amounts of data generated every day to solve real-life problems. By one widely cited estimate, we generate approximately 2.5 quintillion bytes of data each day. Businesses can tap into the immense potential of this data and extract meaningful information from it.

Definition

There is no single definition that fully explains what machine learning is, but the one below captures its crux.

Machine learning can be defined as the ability of machines to learn from data so that they can make predictions on new data points that are accurate (to a certain extent), without a programmer explicitly programming them for those new data points.

Let’s break this definition down. Given a dataset (a larger amount of data usually gives better results), we can use an algorithm that learns from the data and tries to identify patterns in it. In the SPAM/HAM example, the algorithm tries to learn which words or writing styles are typical of emails labelled SPAM and which are typical of HAM. Once it has learned these patterns, it can predict, with a certain accuracy, whether a new data point (here, a new email) is likely to be SPAM or HAM.
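
To make the idea of learning word patterns from labelled emails concrete, here is a minimal sketch of a word-counting spam filter in Python. It uses multinomial Naive Bayes (one of the supervised algorithms this article names) rather than any particular email provider's method, and the tiny training set and the `train` and `predict` helpers are hypothetical, made up purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data: (email text, label) pairs.
TRAINING_EMAILS = [
    ("win a free prize now", "SPAM"),
    ("claim your free money", "SPAM"),
    ("cheap prize offer win", "SPAM"),
    ("meeting agenda for monday", "HAM"),
    ("lunch with the team on friday", "HAM"),
    ("project report attached for review", "HAM"),
]


def train(emails):
    """Count how often each word appears in SPAM and in HAM emails."""
    word_counts = {"SPAM": Counter(), "HAM": Counter()}
    email_counts = Counter()
    for text, label in emails:
        email_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["SPAM"]) | set(word_counts["HAM"])
    return word_counts, email_counts, vocabulary


def predict(text, word_counts, email_counts, vocabulary):
    """Return the label with the higher log-probability (multinomial Naive Bayes)."""
    total_emails = sum(email_counts.values())
    scores = {}
    for label in ("SPAM", "HAM"):
        # Prior: fraction of training emails carrying this label.
        score = math.log(email_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so an unseen word does not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_EMAILS)
print(predict("free prize inside", *model))            # prints SPAM
print(predict("agenda for the team meeting", *model))  # prints HAM
```

Working with log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long emails.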

Machine learning is a multidisciplinary field that combines Computer Science, Statistics, and Mathematics to solve problems using data. Widely used machine learning algorithms include Support Vector Machines, Random Forests, K-Means Clustering and many others.

Classification of Machine Learning

Machine learning is classified into three different types, namely, Supervised Machine Learning, Unsupervised Machine Learning and Reinforcement Learning.

Supervised Machine Learning: When you have a dataset in which you know both the factors that can affect the prediction (the inputs) and the outcomes (the outputs), you use supervised machine learning. Each input X in the training data is paired with its true output y, which is already known. The algorithm makes a prediction for each input and, when the prediction is wrong, the model is corrected, improving its accuracy. This process is repeated until further training no longer improves the model noticeably. The goal is to map the inputs to their respective outputs as accurately as possible. Examples include Logistic Regression and Naive Bayes.
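
The predict, compare and correct loop of supervised learning can be sketched with logistic regression trained by gradient descent. This is a minimal illustration, not a production implementation: the one-feature dataset (hours studied vs. pass/fail) and the helper names `train_logistic_regression` and `predict_pass_probability` are hypothetical, and the learning rate and number of epochs are arbitrary choices that suit this toy data.

```python
import math

# Hypothetical toy data: hours studied and whether the exam was passed (1) or not (0).
HOURS = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
PASSED = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit p(pass) = sigmoid(w * (x - mean) + b) with batch gradient descent."""
    mean = sum(xs) / len(xs)  # centre the feature so gradient descent converges quickly
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Compare the model's prediction with the known answer y.
            error = sigmoid(w * (x - mean) + b) - y
            grad_w += error * (x - mean)
            grad_b += error
        # Correction step: nudge the parameters to reduce the average log-loss.
        w -= learning_rate * grad_w / len(xs)
        b -= learning_rate * grad_b / len(xs)
    return w, b, mean


def predict_pass_probability(hours, w, b, mean):
    return sigmoid(w * (hours - mean) + b)


w, b, mean = train_logistic_regression(HOURS, PASSED)
print(predict_pass_probability(1.0, w, b, mean) < 0.5)  # prints True: predicted to fail
print(predict_pass_probability(4.0, w, b, mean) > 0.5)  # prints True: predicted to pass
```

Centring the feature (subtracting its mean) is a common trick that helps plain gradient descent converge faster; in practice you would usually rely on a library such as scikit-learn rather than hand-written loops.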

Predicting House Prices: Suppose we have a dataset of houses with features such as area, location and number of rooms, along with their sale prices. We can use supervised machine learning to train an algorithm that learns how these features relate to the price. Then, whenever we provide the details of a new house, the algorithm can give a good estimate of its selling price.
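
A minimal sketch of the house-price example, assuming a linear relationship between the features and the price, fitted with ordinary least squares in plain Python. The dataset is hypothetical and, to keep the check simple, its prices follow an exact linear rule. Only area and number of rooms are used, because a categorical feature such as location would first have to be encoded as numbers (for example, one-hot encoded).

```python
# Hypothetical toy data: (area in square metres, number of rooms) for each house.
HOUSES = [(50, 1), (80, 2), (100, 3), (120, 3), (150, 4), (90, 2)]
# Sale prices, generated here from price = 10000 + 2500 * area + 15000 * rooms.
PRICES = [150_000, 240_000, 305_000, 355_000, 445_000, 265_000]


def fit_linear_regression(features, targets):
    """Ordinary least squares: solve the normal equations (A^T A) w = A^T y."""
    rows = [[1.0, *map(float, f)] for f in features]  # leading 1.0 gives the intercept
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(col + 1, n):
            factor = system[r][col] / system[col][col]
            for c in range(col, n + 1):
                system[r][c] -= factor * system[col][c]
    # Back substitution.
    weights = [0.0] * n
    for i in reversed(range(n)):
        known = sum(system[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (system[i][n] - known) / system[i][i]
    return weights  # [intercept, weight per square metre, weight per room]


def predict_price(weights, house):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], house))


weights = fit_linear_regression(HOUSES, PRICES)
print(round(predict_price(weights, (110, 3))))  # prints 330000
```

With real, noisy prices the fitted weights would only approximate the underlying relationship, and the estimate for a new house would carry some error.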

Unsupervised Machine Learning: Sometimes we have data but do not know what the output should look like. The data holds hidden information that machine learning can reveal, but because it lacks labels, supervised learning cannot be applied. In such a scenario, unsupervised machine learning can help. It finds structure in the data on its own, for example by grouping data points into clusters based on the similarities between them. Common algorithms include Principal Component Analysis (PCA) and K-Means Clustering.
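
As an illustration of grouping unlabelled data, here is a minimal K-Means sketch in plain Python (it needs Python 3.8+ for `math.dist`). The points are hypothetical, and initialising the centroids with the first k points is a simplification; practical implementations usually use smarter schemes such as k-means++.

```python
import math

# Hypothetical unlabelled 2-D data points: no classes are given.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = k_means(POINTS, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The algorithm never sees a label; the two groups emerge purely from the distances between points.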

Recommendation System: Suppose we have a dataset of movies and a few users. As the users watch movies, their preferences are recorded. Even though the dataset had no classes to begin with, unsupervised machine learning can now group the users into clusters in which users share common movie interests. If users A and B belong to the same group and user A watched and liked a movie, that movie can also be recommended to user B.
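
The recommendation idea can be sketched by clustering users on their viewing history with K-Means and then suggesting movies that others in the same cluster liked. Everything here is hypothetical: the movie list, the 0/1 "watched and liked" matrix, the choice of two clusters and the `recommend` helper. Real recommender systems typically use far richer signals.

```python
import math

MOVIES = ["Alien", "Arrival", "Interstellar", "Notting Hill", "Amelie", "Love Actually"]
# Hypothetical viewing history per user: 1 = watched and liked, 0 = not watched.
USERS = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 1, 1, 0, 0, 0],
    "D": [0, 0, 0, 1, 1, 0],
    "E": [0, 0, 0, 1, 1, 1],
}


def k_means(vectors, k, iterations=10):
    """Return a cluster label per vector (the first k vectors seed the centroids)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: math.dist(v, centroids[i])) for v in vectors]
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend(user, users, labels):
    """Movies liked by users in the same cluster that `user` has not watched yet."""
    names = list(users)
    group = labels[names.index(user)]
    picks = set()
    for other, label in zip(names, labels):
        if other != user and label == group:
            for movie, liked, watched in zip(MOVIES, users[other], users[user]):
                if liked and not watched:
                    picks.add(movie)
    return sorted(picks)


labels = k_means(list(USERS.values()), k=2)
print(recommend("B", USERS, labels))  # prints ['Interstellar']
print(recommend("D", USERS, labels))  # prints ['Love Actually']
```

User B receives "Interstellar" because users A and C, who fall in the same cluster, liked it.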

Reinforcement Learning: In this class of machine learning, a machine (the agent) learns to adopt an ideal behaviour in a given environment. The algorithm observes the environment, takes the available actions and favours the ones that lead to rewards; this reward-based system is how it improves its performance. This trial-and-error approach lets the algorithm improve gradually over time and train largely by itself. Examples include Q-Learning and Monte Carlo methods.
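
A minimal Q-Learning sketch on a hypothetical five-cell corridor: the agent starts at cell 0, earns a reward of 1 only when it reaches cell 4, and learns purely by trial and error. The environment, the learning rate `alpha`, the discount factor `gamma` and the purely random exploration policy are illustrative assumptions; real applications usually explore with an epsilon-greedy policy instead.

```python
import random

N_STATES = 5          # corridor cells 0..4
GOAL = N_STATES - 1   # cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Hypothetical environment: reward 1 for reaching the goal, 0 otherwise."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = rng.choice(ACTIONS)  # explore with random actions (trial and error)
            next_state, reward, done = step(state, action)
            # Move Q(s, a) towards reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # prints {0: 1, 1: 1, 2: 1, 3: 1}
```

After training, the greedy policy moves right in every cell: because future rewards are discounted by `gamma`, reaching the goal sooner is always worth more.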

Games: Reinforcement learning is very useful for training models that learn to play games. AlphaGo Zero is an AI system that learned to play the game of Go from scratch by playing games against itself, acting as its own teacher, with wins and losses serving as rewards and punishments. Using reinforcement learning, it became strong enough to defeat earlier versions of AlphaGo, which had already beaten top human Go professionals.

Conclusion

Machine learning is a beautiful concept that is already solving real-life problems using data. This is just the beginning of an era with much more yet to be explored, and who knows what else our data has to tell us.