
Hands on machine learning - II

Updated: Sep 15, 2022

Introduction: Machine learning is an exciting branch of Artificial Intelligence, and it’s all around us. It brings out the power of data in new ways, such as Facebook suggesting articles in your feed. Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. The more data you feed into a machine, the more the algorithms have to learn from, which improves the delivered results. When you ask Alexa to play your favourite music station on Amazon Echo, she will go to the station you played the most.


Identifying groups of similar things is called clustering: the task of finding similar instances and assigning them to clusters. Two common clustering algorithms are K-Means and DBSCAN. Clustering is also used for image segmentation.


Suppose we have 5 blobs of instances. K-Means is an algorithm for clustering this kind of dataset. First, we need to specify the number of clusters k that the algorithm must find in the dataset. Each instance gets assigned to one of the 5 clusters. The vast majority of the instances are assigned to the appropriate cluster, but a few instances get mislabeled, especially near the boundaries between clusters.
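As a minimal sketch of the setup described above, the snippet below clusters 5 synthetic blobs with scikit-learn's KMeans. The blob centers, spread, sample count and random seeds are assumptions chosen only for illustration; they do not come from the original charts.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic dataset: 5 blobs of 2-D instances (centers and spread are assumed values).
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0], [3.0, 12.0]])
X, y_true = make_blobs(n_samples=1000, centers=blob_centers,
                       cluster_std=0.6, random_state=42)

k = 5  # the number of clusters has to be specified up front
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)         # one cluster index per instance
centroids = kmeans.cluster_centers_    # one centroid per cluster, shape (5, 2)

# New instances are assigned to the cluster with the closest centroid.
X_new = np.array([[0.2, -0.1], [5.8, 6.1]])
new_labels = kmeans.predict(X_new)
print(centroids)
print(new_labels)
```

The cluster indices returned by K-Means are arbitrary, so they will not necessarily match the order in which the blobs were generated.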

Assigning each instance to a single cluster is called hard clustering. Instead, it can be useful to give each instance a score per cluster, which is called soft clustering. This score can be the distance between the instance and the centroid.
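The difference between the two can be sketched in a few lines of plain Python. The centroids and instances below are hypothetical values picked for illustration, and the helper names `soft_scores` and `hard_label` are made up for this example.

```python
import math

# Hypothetical centroids and instances, chosen only for illustration.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
instances = [(0.5, 0.2), (3.0, 0.0), (2.0, 1.5)]

def soft_scores(instance, centroids):
    """Soft clustering: one score per cluster (here, the Euclidean distance to each centroid)."""
    return [math.dist(instance, c) for c in centroids]

def hard_label(instance, centroids):
    """Hard clustering: the index of the single closest centroid."""
    scores = soft_scores(instance, centroids)
    return scores.index(min(scores))

for x in instances:
    rounded = [round(s, 2) for s in soft_scores(x, centroids)]
    print(x, "hard:", hard_label(x, centroids), "soft:", rounded)
```

The third instance is equally far from all three centroids. A hard label hides that ambiguity, while the per-cluster scores make it visible. In scikit-learn, the `transform()` method of a fitted KMeans model returns this kind of distance-to-centroid score.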

The second chart illustrates a technique called the “Elbow Technique”, which is used to find the number of clusters to use in the model.
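The text does not say what the chart plots. A common choice, assumed here, is the model's inertia (the sum of squared distances between each instance and its closest centroid) for a range of values of k. The sketch below computes that curve on synthetic blobs whose centers are made-up values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 5 well-separated blobs (assumed values, for illustration only).
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0], [3.0, 12.0]])
X, _ = make_blobs(n_samples=1000, centers=blob_centers,
                  cluster_std=0.6, random_state=42)

# Fit one model per candidate k and record its inertia.
inertias = {}
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Inertia keeps shrinking as k grows, so the lowest value is not the goal. The "elbow" is the value of k after which adding more clusters brings only a small improvement; on this synthetic data that bend is expected at k = 5.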

Limits of K-Means:

K-Means is not perfect. It is necessary to run the algorithm several times to avoid sub-optimal solutions, and the number of clusters has to be specified in advance.
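One way to picture the "run it several times" advice is the sketch below: it fits K-Means ten times, each with a single random initialization, and keeps the run with the lowest inertia. The dataset and seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumed values, for illustration only).
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0], [3.0, 12.0]])
X, _ = make_blobs(n_samples=1000, centers=blob_centers,
                  cluster_std=0.6, random_state=42)

# Each run starts from different random centroids and may converge to a different solution.
runs = []
for seed in range(10):
    model = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    runs.append(model)

inertias = [m.inertia_ for m in runs]
best = min(runs, key=lambda m: m.inertia_)   # keep the solution with the lowest inertia
print([round(i, 1) for i in inertias])
print("best inertia:", round(best.inertia_, 1))
```

In practice the manual loop is not needed: the `n_init` parameter of scikit-learn's KMeans performs the repeated runs internally and keeps the best one.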

K-Means also does not perform well on clusters that have varying sizes, different densities, or non-spherical shapes.


DBSCAN defines clusters as continuous regions of high density.

For each instance, the algorithm counts how many instances are located within a small distance epsilon from it. This region is called the instance’s epsilon-neighborhood.

If an instance has at least min_samples instances in its epsilon-neighborhood, then it is considered a core instance.

All instances in the neighborhood of a core instance belong to the same cluster.

Any instance that is not a core instance and does not have one in its neighborhood is considered an anomaly.
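The DBSCAN steps above can be tried out with scikit-learn's DBSCAN class, as in the sketch below. The moons dataset and the values of `eps` and `min_samples` are assumptions chosen for illustration, not settings taken from this article.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: a non-spherical shape that K-Means handles poorly.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)

# eps is the neighborhood radius (epsilon); min_samples is the core-instance threshold.
# In scikit-learn, the count for min_samples includes the instance itself.
dbscan = DBSCAN(eps=0.2, min_samples=5)
dbscan.fit(X)

labels = dbscan.labels_                      # cluster index per instance, -1 marks anomalies
core_indices = dbscan.core_sample_indices_   # indices of the core instances
n_clusters = len(set(labels.tolist()) - {-1})
n_anomalies = int(np.sum(labels == -1))

print("clusters:", n_clusters)
print("core instances:", len(core_indices))
print("anomalies:", n_anomalies)
```

Unlike K-Means, the number of clusters is not passed in; it follows from the density of the data and the chosen `eps` and `min_samples`. A smaller `eps` tends to produce more clusters and more anomalies.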

Naive Bayes:

This machine learning technique is called Naive Bayes because it makes the naive assumption that the features are independent of each other. In reality, some of the features are often dependent.
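In symbols, the naive assumption can be written as follows. The notation is introduced here for illustration and does not appear in the original: y is the class and x_1, ..., x_n are the features, which are assumed independent of each other given the class.

```latex
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The second expression follows from Bayes' theorem: the classifier picks the class y for which the right-hand side is largest.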

If we flip a coin, the probability of getting heads is 1/2. Similarly, when we randomly pick a card, the probability of getting a queen is 4/52, because there are 52 cards in total and 4 of them are queens. Basic probabilities like these are the building blocks of Naive Bayes.
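The two probabilities can be checked with exact fractions. The last part of the sketch is an added illustration that is not in the original text: it applies Bayes' theorem to the probability that a card is a queen given that it is a face card (jack, queen or king).

```python
from fractions import Fraction

p_head = Fraction(1, 2)      # one favourable outcome out of two
p_queen = Fraction(4, 52)    # 4 queens in a 52-card deck, reduces to 1/13

# Added illustration: Bayes' theorem
#   P(queen | face) = P(face | queen) * P(queen) / P(face)
p_face = Fraction(12, 52)          # 4 jacks + 4 queens + 4 kings
p_face_given_queen = Fraction(1)   # every queen is a face card
p_queen_given_face = p_face_given_queen * p_queen / p_face

print(p_head, p_queen, p_queen_given_face)  # 1/2 1/13 1/3
```

Knowing that the card is a face card raises the probability of a queen from 1/13 to 1/3. Naive Bayes updates class probabilities from observed features in the same way.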


Machine learning concepts seem pretty hard at first, but as we dive deeper into them and apply them on our own, they become easier. Creating machine learning models is largely trial and error: we have different methods to get the best results and the best accuracy, and different functions to test the model on test data and validation data. If someone thinks it’s all about math, they might be right, but to apply these concepts you don’t need mathematics. To know how the models actually work on the inside and to learn their core concepts, you do need knowledge of mathematics.

Neural Networks are a different topic that’s beyond the scope of this article. They involve a whole other level of concepts and ideas. To get started with Neural Networks or Computer Vision, do check out the articles here -

Sources :

  1. Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow [O’REILLY]

  2. Deep Learning from Scratch [O’REILLY]

