# How the Hierarchical Clustering Algorithm Works

# How the Hierarchical Clustering Algorithm Works

Hierarchical Clustering is an **unsupervised Learning Algorithm**, and this is one of the most popular clustering technique in Machine Learning.

Expectations of getting insights from machine learning algorithms is increasing abruptly. Initially, we were limited to predict the future by feeding historical data.

This is easy when the **expected results and the features** in the historical data are available to build the supervised learning models, which can predict the future.

For example predicting the **email is spam or not**, using the historical email data.

Learn hierarchical clustering algorithm in detail also, learn about agglomeration and divisive way of hierarchical clustering. #clustering #hierarchicalclustering

But the real world problems are not limited to supervised type, and we do get the unsupervised problems too.

How to build the models for such problems?

Where comes the unsupervised learning algorithms.

In this article, we are going to learn one such popular unsupervised learning algorithm which is hierarchical clustering algorithm.

Before we start learning, Let’s look at the topics you will learn in this article. Only if you read the complete article 🙂

Before we understand what hierarchical clustering is, its benefits, and how it works. Let us learn the unsupervised learning algorithm topic.

## What is Unsupervised Learning

Unsupervised learning is training a machine using information that is **neither classified nor labeled** and allows the machine to act on that information **without** guidance.

In Unsupervised Learning, a machine’s task is to group unsorted information according to **similarities, patterns**, and differences without any prior data training. It is defined as

“Unsupervised Learning Algorithm is a machine learning technique, where you don’t have to supervise the model. Rather, you need to allow the model to work on its own to discover information, and It mainly deals with unlabelled data.”

If you want to know more, we would suggest you to read the **unsupervised learning algorithms** article.

### Types of Unsupervised Learning Algorithm

Unsupervised Learning algorithms are classified into two categories.

**Clustering:**Clustering is a technique of grouping objects into clusters. Objects with the most similarities remain in a group and have less or no similarities with another group’s objects.**Association:**Association rule in unsupervised learning method, which helps in finding the relationships between variables in a large database.

### Unsupervised Learning Algorithms

The list of some popular Unsupervised Learning algorithms are:

**K-means Clustering**- Hierarchical Clustering
- Principal Component Analysis
- Apriori Algorithm
- Anomaly detection
- Independent Component Analysis
- Singular value decomposition

Before we learn about **hierarchical clustering**, we need to know about clustering and how it is different from classification.

## What is Clustering

Clustering is an important technique when it comes to the unsupervised learning algorithm. Clustering mainly deals with finding a structure or pattern in a collection of uncategorized data.

It is a technique that groups similar objects such that objects in the same group are identical to each other than the objects in the other groups. The group of similar objects is called a Cluster.

### How is clustering different from classification?

As a data science beginner, the difference between clustering and classification is confusing. So as the initial step, let us understand the fundamental difference between **classification and clustering. **

For example,

Let us say we have **four categories**:

- Dog
- Cat
- Shark
- Goldfish

In this scenario, clustering would make 2 clusters. The one who lives on land and the other one lives in water.

So the entities of the first cluster would be dogs and cats. Similarly, for the second cluster, it would be sharks and goldfishes.

But in classification, it would classify the four categories into four different classes. One for each category.

So dogs would be classified under the class dog, and similarly, it would be for the rest.

**In classification,** we have labels to tell us and supervise whether the classification is right or not, and that is how we can classify them right. Thus making it a **supervised learning algorithm. **

But in clustering, despite distinctions, we cannot classify them because we don’t have labels for them. And that is why clustering is an **unsupervised learning algorithm**.

In real life, we can expect high volumes of data without labels. Because of such great use, clustering techniques have many real-time situations to help. Let us understand that.

### Applications of Clustering

Clustering has a large number of applications spread across various domains. Some of the most popular applications of clustering are:

**Recommendation Engines**- Clustering similar news articles
- Medical Imaging
- Image Segmentation
- Anomaly detection
- Pattern Recognition

Till now, we got the in depth idea of what is unsupervised learning and its types. We also learned what clustering and various applications of the clustering algorithm.

Now have a look at a detailed explanation of what is **hierarchical clustering** and why it is used?

## What is Hierarchical Clustering

Hierarchical clustering is one of the popular clustering techniques after **K-means Clustering**. It is also known as Hierarchical Clustering Analysis (HCA)

Which is used to group unlabelled datasets into a Cluster. This Hierarchical Clustering technique builds clusters based on the **similarity** between different objects in the set.

It goes through the various features of the data points and looks for the similarity between them.

This process will continue until the dataset has been grouped. Which creates a **hierarchy** for each of these clusters.

Hierarchical Clustering deals with the data in the form of a tree or a **well-defined hierarchy**.

Because of this reason, the algorithm is named as a hierarchical clustering algorithm.

This hierarchy way of clustering can be performed in two ways.

**Agglomerative:**Hierarchy created from bottom to top.**Divisive:**Hierarchy created from top to bottom.

In the next section of this article, let’s learn about these two ways in detail. For now, the above image gives you a high level of understanding.

In the early sections of this article, we were given various algorithms to perform the clustering. But how is this hierarchical clustering different from other techniques?

Let’s discuss that.

## Why Hierarchical Clustering

As we already have some clustering algorithms such as K-Means Clustering, then why do we need Hierarchical Clustering?

As we have already seen in the **K-Means Clustering algorithm article**, it uses a pre-specified number of clusters. It requires advanced knowledge of **K**., i.e., how to define the number of clusters one wants to divide your data.

Still, in hierarchical clustering no need to pre-specify the number of clusters as we did in the K-Means Clustering; one can stop at any number of clusters.

Furthermore, Hierarchical Clustering has an advantage over K-Means Clustering. i.e., it results in an attractive tree-based representation of the observations, called a **Dendrogram**.

## Types of Hierarchical Clustering

The Hierarchical Clustering technique has two types.

**Agglomerative Hierarchical Clustering**- Start with points as individual clusters.
- At each step, it merges the closest pair of clusters until only one cluster ( or K clusters left).

**Divisive Hierarchical Clustering**- Start with one, all-inclusive cluster.
- At each step, it splits a cluster until each cluster contains a point ( or there are clusters).

## Agglomerative Clustering

It is also known as **AGNES** ( Agglomerative Nesting) and follows the **bottom-up **approach.

Each observation starts with its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

That means the algorithm considers each data point as a single cluster initially and then starts combining the closest pair of clusters together.

It does the same process until all the clusters are merged into a single cluster that contains all the datasets.

### How does Agglomerative Hierarchical Clustering work

Let’s take a sample of data and learn how the agglomerative hierarchical clustering work step by step.

#### Step 1

First, make each data point a “single - cluster,” which forms N clusters. (let’s assume there are N numbers of clusters).

#### Step 2

Take the next two closest data points and make them one cluster; now, it forms **N-1 clusters**.

#### Step 3

Again, take the two clusters and make them one cluster; now, it forms N-2 clusters.

#### Step 4

Repeat ‘Step 3’ until you are left with **only one cluster**.

Once all the clusters are combined into a big cluster. We develop the Dendrogram to divide the clusters.

For the divisive hierarchical clustering, it treats all the data points as one cluster and splits the clustering until it creates meaningful clusters.

## Difference ways to measure the distance between two clusters

There are several ways to measure the distance between in order to decide the rules for clustering, and they are often called Linkage Methods.

Some of the popular linkage methods are:

- Simple Linkage
- Complete Linkage
- Average Linkage
- Centroid Linkage
- Ward’s Linkage

### Simple Linkage

Simple Linkage is also known as the **Minimum Linkage (MIN)** method.

In the Single Linkage method, the distance of two clusters is defined as the minimum distance between an object (point) in one cluster and an object (point) in the other cluster. This method is also known as the **nearest neighbor method**.

#### Pros and Cons of Simple Linkage method

#### Pros of Simple Linkage

#### Cons of Simple Linkage

### Complete Linkage

The complete Linkage method is also known as the **Maximum Linkage (MAX)** method.

In the Complete Linkage technique, the distance between two clusters is defined as the maximum distance between an object (point) in one cluster and an object (point) in the other cluster.

And this method is also known as the furthest neighbor method.

#### Pros and Cons of Complete Linkage method

#### Pros of Complete Linkage

#### Cons of Complete Linkage

### Average Linkage

In the Average Linkage technique, the distance between two clusters is the average distance between each cluster’s point to every point in the other cluster.

This method is also known as the **unweighted pair** group method with arithmetic mean.

#### Pros and Cons of the Average Linkage method

#### Pros of Average Linkage

#### Cons of Average Linkage

### Centroid Linkage

In the Centroid Linkage approach, the distance between the two sets or clusters is the distance between two mean vectors of the sets (clusters).

At each stage, we combine the two sets that have the **smallest centroid **distance. In simple words, it is the distance between the centroids of the two sets.

#### Pros and Cons of Centroid Linkage method

#### Pros of Centroid Linkage

#### Cons of Centroid Linkage

### Ward’s Linkage

Ward's Linkage method is the similarity of two clusters. Which is based on the increase in squared error when two clusters are merged, and it is similar to the group average if the distance between points is distance squared.

#### Pros and Cons of Ward’s Linkage method

#### Pros of Ward's Linkage

#### Cons of Ward's Linkage

Some of the other linkage methods are:

- Strong Linkage
- Flexible linkage
- Simple Average

The Linkage method’s choice depends on you, and you can apply any of them according to the type of problem, and different linkage methods lead to different clusters.

Below is the comparison image, which shows all the linkage methods. We took this reference image from greatlearning platform **blog.**

Hierarchical Clustering algorithms generate clusters that are organized into hierarchical structures.

These hierarchical structures can be visualized using a tree-like diagram called **Dendrogram**.

Now let us discuss Dendrogram.

## What is Dendrogram

A Dendrogram is a diagram that represents the **hierarchical relationship** between objects. The Dendrogram is used to display the distance between each pair of sequentially merged objects.

These are commonly used in studying hierarchical clusters before deciding the number of clusters significant to the dataset.

The distance at which the two clusters combine is referred to as the dendrogram distance.

The primary use of a dendrogram is to work out the best way to allocate objects to clusters.

The key point to interpreting or implementing a dendrogram is to focus on the closest objects in the dataset.

Hence from the above figure, we can observe that the objects P6 and P5 are very close to each other, merging them into one cluster named C1, and followed by the object P4 is closed to the cluster C1, so combine these into a cluster (C2).

And the objects P1 and P2 are close to each other so merge them into one cluster (C3), now cluster C3 is merged with the following object P0 and forms a cluster (C4), the object P3 is merged with the cluster C2, and finally the cluster C2 and C4 and merged into a single cluster (C6).

Till now, we have a clear idea of the Agglomerative Hierarchical Clustering and Dendrograms.

Now let us implement python code for the Agglomerative clustering technique.

## Agglomerative Clustering Algorithm Implementation in Python

Let us have a look at how to apply a hierarchical cluster in python on a **Mall_Customers dataset**.

If you remembered, we have used the same dataset in the k-means clustering algorithms implementation too.

Please refer to **k-means article** for getting the dataset.

### Importing the libraries and loading the data

We are importing all the necessary libraries, then we will load the data.

### Dendrogram to find the optimal number of clusters

### Training the Hierarchical Clustering model on the dataset

Now, we are training our dataset using Agglomerative Hierarchical Clustering.

## Advantages and Disadvantages of Agglomerative Hierarchical Clustering Algorithm

#### Advantages

#### Disadvantages

## Divisive Hierarchical Clustering

Divisive Hierarchical Clustering is also known as **DIANA** (Divisive Clustering Analysis.)

It is a top-down clustering approach. It works as similar as Agglomerative Clustering but in the **opposite** direction.

This approach starts with a single cluster containing all objects and then splits the cluster into two least similar clusters based on their characteristics. We proceed with the same process until there is one cluster for each observation.

Here, the divisive approach method is known as rigid, i.e., once a splitting is done on clusters, we can't revert it.

### Steps to perform Divisive Clustering

- Initially, all the objects or points in the dataset belong to one single cluster.
- Partition the single cluster into two least similar clusters.
- And continue this process to form the new clusters until the desired number of clusters means one cluster for each observation.

## Strengths and Limitations of Hierarchical Clustering Algorithm

For every algorithm, we do have strengths and limitations. If we don't know about these, we end up using these algorithms in the cases where they are limited not to use. So let’s learn this as well.

### Strengths of Hierarchical Clustering

- It is to understand and implement.
- We don’t have to pre-specify any particular number of clusters.
- Can obtain any desired number of clusters by cutting the Dendrogram at the proper level.

- They may correspond to meaningful classification.
- Easy to decide the number of clusters by merely looking at the Dendrogram.

### Limitations of Hierarchical Clustering

- Hierarchical Clustering does not work well on vast amounts of data.
- All the approaches to calculate the similarity between clusters have their own disadvantages.
- In hierarchical Clustering, once a decision is made to combine two clusters, it can not be undone.
- Different measures have problems with one or more of the following.
- Sensitivity to noise and outliers.
- Faces Difficulty when handling with different sizes of clusters.
- It is breaking large clusters.
- In this technique, the order of the data has an impact on the final results.

## Conclusion

In this article, we discussed the hierarchical cluster algorithm’s in-depth intuition and approaches, such as the Agglomerative Clustering and Divisive Clustering approach.

Hierarchical Clustering is often used in the form of descriptive rather than predictive modeling.

Mostly we use Hierarchical Clustering when the application requires a hierarchy. The advantage of Hierarchical Clustering is we don’t have to pre-specify the clusters.

However, it doesn’t work very well on vast amounts of data or huge datasets. And there are some disadvantages of the Hierarchical Clustering algorithm that it is not suitable for large datasets. And it gives the best results in some cases only.

#### Recommended Courses

#### Cluster Analysis With Python

Rating: **4.6/5**

#### Unsupervised Learning Algorithms

Rating: **4.5/5**

#### A to Z Machine Learning with Python

Rating: **4.5/5**

The concept is clearly explained and easily understandable.

For unsupervised learning, clustering is very important and the presentation of the article has made me to know hierarchical clustering importance and implementation in real world scenario problems.

Many thanks to the author-shaik irfana for her valuable efforts.

Thanks! Azam.

We wish you a happy learning!

Very well explained. The article is elegant and has a smooth flow.

Thanks! Mir Quadri

Thank! Quadri

Good explanation for all type of lerners and word presentation is very simple and understanding keep it top and more topics can explain for lerners.All the best for more useful topics

Sabiha,

Thanks for your kind words. We try to write much more quality articles like these.

We wish you happy learning!

Thanks! Sabiha 🙂

Looking a great work dear Very well explanation theoretical and Code part I really appreciate you, keep it up 👏👏👏👌👌💐💐

Thanks Rajeswari,

We are glad that you like the article, much more coming. Please visit the site regularly.

Do you have a topic in mind that we can write please let us know.

Thanks, and we wish you a happy learning.

Thanks! Rajeswari

I never seen this type of explanation because this content very useful to who want to learn quickly in an easy way keep it up and we are waiting for your new article in such a way.

Thanks, Ravi Kumar,

We always go one step ahead to create the quality content. Sure, much more are coming on the way.

Thanks, and happy learning!

Thanks! Ravi Kumar

Good explanation with minimal use of words..

Thanks! Ahmed

Excellent presentation skills, u written in easy way to get it. Keep it up👏👏👏👌

Thanks Sree hari.

Thanks! Sree hari

very well explanation thory and coding part

keep it up irfana

very well explanation along with theoretical and code part,

keep going irfana,

Thanks! Chowdary.

We are glad that you liked our article. We wish you happy learning.

Thanks! Chowdary

Well detailed theory along with practical coding, Irfana.

Keep them coming!

Well detailed theory along with practical coding, Irfana.

Keep up the work! Expecting more of such articles.

Thanks, Sandeep,

Many more amazing articles are on the way.