Machine Learning In A Nutshell

Machine Learning, as the name suggests, is about machines that learn. It enables a machine to undertake tasks that are usually associated with humans. Take reading handwritten text: machines are not conventionally expected to read handwriting, but once a machine has learned what handwritten letters look like, it can read the text.

Let us take another example, where you are required to predict the price of a house or an apartment in a particular neighbourhood. Humans try to evaluate the price on the basis of price/sq m, but is that the only thing that should be kept in mind? Certainly not, so in this case the machine is fed with information such as location, crime rate, air quality and the income status of the neighbourhood. The machine then takes all of these features of the neighbourhood into account and predicts the price of the house.

Machine Learning is widely used in the IT sector, where many people see a lot of potential in the field. It is also fun and easier to get started with than it looks: you rarely need to write problem-specific code from scratch, and a few changes to a template algorithm are usually all that is needed to get started.


How does a person learn something? Of course, by being taught, either visually or orally. The person is given some information and tries to retain it. Similarly, a machine has to be fed with some information, and in the case of machines this is done using a dataset. A dataset is a table of features or scores that is fed to the machine to learn from. This is where the machine gains its knowledge.


There are three types of learning: Supervised learning, Unsupervised learning and Reinforcement learning. A red wine quality dataset will help us understand Supervised and Unsupervised learning, and a rat-maze problem will help us understand Reinforcement learning.


Let us assume that you want to know the quality of a red wine based on the amounts of its constituents. Red wine is a mixture of numerous constituents, but for now let’s assume density, pH, sulphates and alcohol to be the main ones, with a large impact on the quality of the wine.

The constituents of wine listed in the dataset work as independent variables, whereas the quality of wine is a dependent variable, since it depends on the values of the independent variables, i.e. quality depends on density, pH, sulphates and alcohol.
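To make the idea of independent and dependent variables concrete, here is a minimal sketch in plain Python. The three rows are made-up values for illustration only, and `split_features_target` is a hypothetical helper name, not part of any library.

```python
# Hypothetical rows: the numbers are invented for illustration only.
wine_rows = [
    {"density": 0.9978, "pH": 3.51, "sulphates": 0.56, "alcohol": 9.4, "quality": 5},
    {"density": 0.9968, "pH": 3.20, "sulphates": 0.68, "alcohol": 9.8, "quality": 5},
    {"density": 0.9980, "pH": 3.16, "sulphates": 0.58, "alcohol": 9.8, "quality": 6},
]

FEATURES = ["density", "pH", "sulphates", "alcohol"]  # independent variables
TARGET = "quality"                                     # dependent variable


def split_features_target(rows):
    """Return (X, y): one list of feature values per wine, and its quality."""
    X = [[row[name] for name in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


X, y = split_features_target(wine_rows)
print(X[0])  # feature values of the first wine
print(y)     # quality of every wine
```

Keeping the features and the target apart like this is the usual first step before any model is trained.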

The dataset has several columns, and the last one corresponds to the target values. Since the machine is fed with the values of quality, it has a target on which it can be trained. Training on known target values like this is what makes the learning Supervised.

The machine learns a relation between the variables, so as to understand the effect of each one on the quality of wine.

Suppose you have now created a new mixture of constituents and want to test the quality of the wine. The machine predicts a value that corresponds to the quality of the wine made.
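The train-then-predict idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a single feature (alcohol) instead of all four, the numbers are invented, and a plain least-squares straight line stands in for whatever model you would really choose.

```python
# Hypothetical training data: alcohol content (%) and the quality score of each wine.
alcohol = [9.0, 10.0, 11.0, 12.0]
quality = [5.0, 5.5, 6.0, 6.5]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Training: the machine draws a relation between alcohol and quality.
intercept, slope = fit_line(alcohol, quality)

# Prediction: a new mixture the machine has not seen before.
new_mixture_alcohol = 11.5
predicted_quality = intercept + slope * new_mixture_alcohol
print(round(predicted_quality, 2))
```

With all four constituents the same idea applies, only the line becomes a relation in several variables (Multiple Linear Regression).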


In the above case the problem was to predict the quality of wine, so the machine was trained on target values too. But what if the dataset contains the constituents only and the quality of wine is not given?

Now, is the data useless? It cannot be used to predict the quality of wine, because the machine has no way to draw a relation between the constituents and the quality, but the dataset still isn’t a waste. It can be used to draw other conclusions and find patterns, and it can help us understand more about the constituents of wine, for instance, the average pH level that should be maintained or the maximum alcohol content.
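As a small illustration of drawing such conclusions from data that has no quality column, the sketch below computes the average pH and the maximum alcohol content. The four wines are made-up values, used only to show the idea.

```python
from statistics import mean

# Hypothetical wines without any quality column.
unlabelled_wines = [
    {"pH": 3.5, "alcohol": 9.4},
    {"pH": 3.2, "alcohol": 9.8},
    {"pH": 3.3, "alcohol": 12.1},
    {"pH": 3.0, "alcohol": 10.5},
]

# Simple summaries that need no target values at all.
average_ph = mean(wine["pH"] for wine in unlabelled_wines)
max_alcohol = max(wine["alcohol"] for wine in unlabelled_wines)

print(round(average_ph, 2))
print(max_alcohol)
```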

Unsupervised learning covers this case: it enables the machine to form clusters and group the data on the basis of the patterns it finds, without any target values.

If the problem is to separate the data into two groups, i.e. Class I for high alcoholic wine and Class II for less alcoholic wine, then this can be done by clustering with Unsupervised learning. The graph shows these two groups as clusters in different colors, blue for less alcoholic wine and red for high alcoholic wine.
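Here is a minimal sketch of how two such clusters could be formed, assuming a basic K Means loop with two clusters on alcohol content alone. The values and the function name `k_means_1d` are hypothetical, and starting the centres at the smallest and largest value is just one simple choice.

```python
def k_means_1d(values, iterations=10):
    """Split values into two clusters with a basic K Means loop (k = 2)."""
    # Start the two cluster centres at the smallest and the largest value.
    centres = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            # Assign each value to the nearest centre.
            nearest = 0 if abs(value - centres[0]) <= abs(value - centres[1]) else 1
            clusters[nearest].append(value)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Hypothetical alcohol contents (%), with no labels attached.
alcohol = [9.2, 13.1, 9.5, 12.8, 9.8, 13.5]
centres, (less_alcoholic, high_alcoholic) = k_means_1d(alcohol)

print(less_alcoholic)  # Class II: less alcoholic wine
print(high_alcoholic)  # Class I: high alcoholic wine
```

Notice that nobody told the machine which wine belongs where; the two groups come only from how close the values are to each other.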


Reinforcement learning is learning by trial and error. In order to understand it, it is important that we get familiar with two terms: agent and environment.

An agent is something that makes decisions: it has the ability to perceive the environment and act accordingly. The environment is where the agent performs its actions. The agent interacts with the environment, and each time it receives a new observation, it performs an action from the set of actions available. An agent together with an environment is called a world.

Let us take an example of a rat-maze to understand what agents and environments really are.

There is a rat at the starting point, and it has to reach the cheese, which is the destination, by finding the shortest path to the end point. In this case the rat is the agent, since it has to move in order to reach its goal. The rat performs its actions in the maze, so the maze is the environment.

In this case the rat starts its journey towards the piece of cheese. It travels until it meets an obstacle. Every move that takes the rat further can be seen as a reward, since it is getting closer to the goal, but as soon as an obstacle occurs it has to change its path. There might be times when the rat needs to retrace its path back to an earlier point in order to continue. This can be seen as a penalty or punishment.

Reinforcement learning works in the same way: the agent performs an action, the state changes, and at every point the agent either gets a reward (if the state changes in its favour) or a penalty (if it does not).
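The rat-maze can be sketched with Q-learning, one common Reinforcement learning algorithm. Everything specific here is an assumption made for illustration: the 3x3 maze layout, the reward values (+10 for the cheese, -1 for an obstacle, -0.1 for any other move) and the learning settings.

```python
import random

# Hypothetical maze: S = start, G = cheese, # = obstacle, . = free cell.
MAZE = ["S.#",
        ".##",
        "..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START, GOAL = (0, 0), (2, 2)


def step(state, action):
    """The environment: apply an action and return (next_state, reward)."""
    r, c = state[0] + action[0], state[1] + action[1]
    blocked = not (0 <= r < 3 and 0 <= c < 3) or MAZE[r][c] == "#"
    if blocked:
        return state, -1.0   # penalty: obstacle, the rat stays where it is
    if (r, c) == GOAL:
        return (r, c), 10.0  # reward: the cheese
    return (r, c), -0.1      # small cost per move, so shorter paths score higher


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """The agent: learn a value for every (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {}  # (state, action index) -> estimated value, 0.0 if never tried
    for _ in range(episodes):
        state = START
        for _ in range(50):  # give up on an episode after 50 moves
            if rng.random() < epsilon:
                a = rng.randrange(4)  # explore: try a random action
            else:                     # exploit: take the best known action
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))
            nxt, reward = step(state, ACTIONS[a])
            future = 0.0 if nxt == GOAL else max(q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_path(q, limit=20):
    """Follow the best known action from the start until the cheese (or the limit)."""
    path, state = [START], START
    while state != GOAL and len(path) <= limit:
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


q_values = train()
path = greedy_path(q_values)
print(path)
```

After training, following the highest-valued action in each cell should lead the rat straight to the cheese, because the rewards and penalties collected along the way have shaped those values.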


The dataset is split into two parts so that different models can be compared and the best-fitted one, the one with maximum accuracy, can be chosen. The two parts are the training set and the test set. Suppose the dataset has 100 observations and is split in an 80:20 manner: this means that the machine learns on 80 observations and tests itself on the remaining 20. The Machine Learning model predicts the target value for each test observation, and the mean deviation of these predictions from the actual target values in the dataset is computed. The model with the minimum deviation, or error, is the best-fitted model for the dataset.
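The 80:20 split and the comparison of models by error can be sketched as follows. The data is synthetic (roughly y = 2x + 1 with a little random noise), the two candidate models are deliberately simple, and all function names are hypothetical helpers written for this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train):
    """Candidate model 1: always predict the average training target."""
    average = sum(y for _, y in train) / len(train)
    return lambda x: average


def fit_line(train):
    """Candidate model 2: least-squares straight line."""
    mean_x = sum(x for x, _ in train) / len(train)
    mean_y = sum(y for _, y in train) / len(train)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)


def mean_absolute_error(model, test):
    """Mean deviation of the predictions from the actual target values."""
    return sum(abs(y - model(x)) for x, y in test) / len(test)


# Synthetic dataset of 100 observations (x, y).
noise = random.Random(1)
rows = [(x, 2 * x + 1 + noise.uniform(-1, 1)) for x in range(100)]

train, test = train_test_split(rows)  # 80 to learn on, 20 to test on
errors = {
    "always predict the mean": mean_absolute_error(fit_mean(train), test),
    "straight line": mean_absolute_error(fit_line(train), test),
}
best = min(errors, key=errors.get)    # the model with the minimum error

print(len(train), len(test))
print(best)
```

The important detail is that both models are fitted on the training set only and judged on the test set, which they have never seen.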


There are numerous models that you can train on your dataset, and you will get to learn about them in due course.

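• Some of the models used for predicting a numeric value (regression) are as follows

Linear Regression

Multiple Linear Regression

Polynomial Regression

• Some of the models used in case of classification and grouping problems are as follows

Logistic Regression (despite its name, a classification model)

K Nearest Neighbour

K Means Clustering (an Unsupervised model that groups unlabelled data)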
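To give a feel for one of these models, here is a minimal sketch of K Nearest Neighbour: a new wine gets the label that is most common among its k closest labelled wines. The data points (alcohol, pH) and their labels are invented, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical labelled wines: (alcohol %, pH) and a class label.
labelled_wines = [
    ((9.2, 3.4), "less alcoholic"),
    ((9.5, 3.3), "less alcoholic"),
    ((9.8, 3.5), "less alcoholic"),
    ((12.8, 3.2), "high alcoholic"),
    ((13.1, 3.1), "high alcoholic"),
    ((13.5, 3.3), "high alcoholic"),
]


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(labelled_wines, (13.0, 3.2)))
print(knn_predict(labelled_wines, (9.4, 3.4)))
```

In practice the features are usually brought to a similar scale first, since otherwise the one with the largest numbers dominates the distance.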

Learn more about Machine Learning models through video tutorials and then start working on them.


You can find datasets on this website and start working on them to get hands-on experience of Machine Learning.

Find me on Instagram, LinkedIn, Twitter and E-Mail me if you have any query!




21 years old sharing insights as he explores Writing, Business, Self-Help, and Technology |

Harshit Ahuja
