classification and clustering algorithms

classification and clustering algorithms

Classification and Clustering Algorithms

Solving real world problems with data science concepts is so exciting and it yields so fun.

A famous dialogue you could listen from the data science people. It could be true if we add it’s so challenging at the end of the dialogue. The foremost challenge starts from  categorising the problem itself. The first level of categorising could be whether supervised or unsupervised learning. The next level is what kind of algorithms to get start with whether to start with classification algorithms or with clustering algorithms?

As we have covered the first level of categorising supervised and unsupervised learning in our previous post, now we would like to address the key differences between classification and clustering algorithms. First of all, it’s better to know the differences between  classification and prediction before knowing the difference between classification and clustering.

Now Let’s begin with classification concepts.

Classification Concept

Classification Concepts

Classification Concepts

In classification, the idea is to predict the target class by analysis the training dataset. This could be done by finding proper boundaries for each target class. In a general way of saying, Use the training dataset to get better boundary conditions which could be used to determine each target class. Once the boundary conditions determined, the next task is to predict the target class as we have said earlier. The whole process is known as classification.

Target class examples:

  • Analysis the customer data to predict whether he will by computer accessories (Target class: Yes or No)
  • Gender classification from hair length (Target classes: Male or Female)
  • Classifying fruits from each fruit feature like colour, taste, size, weight (Targe class: Apple, Orange, Cherry, Banana)

Let’s understand the concept of classification with gender classification using hair length. To classify gender (target class) using hair length as feature parameter we could train a model using any classification algorithms to come up with some set of boundary conditions which can be used to differentiate the male and female genders using hair length as the training feature. In gender classification case the boundary condition could the proper hair length value. Suppose the differentiated boundary hair length value is 15.0 cm then we can say that if hair length is less than 15.0 cm then gender could be male or else female.

Some classification algorithms listed below.

Classification Algorithms

  • Linear classifiers
    • Logistic regression
    • Naive Bayes classifier
    • Fisher’s linear discriminant
  • Support vector machines
    • Least squares support vector machines
  • Quadratic classifiers
  • Kernel estimation
    • k-nearest neighbor
  • Decision trees
    • Random forests
  • Neural networks
  • Learning vector quantization

Application of Classification Algorithms

  • Email spam classification
  • Bank customers loan pay bank willingness prediction.
  • Cancer tumour cells identification.
  • Sentiment analysis.
  • Drugs classification
  • Facial key points detection
  • Pedestrians detection in an automotive car driving.

Clustering Concept

Clustering Algorithms

Clustering

In clustering the idea is not to predict the target class as like classification , it’s more ever trying to group the similar kind of things by considering the most satisfied condition all the items in the same group should be similar and no two different group items should not be similar.  To group the similar kind of items in clustering, different similarity measures could be used.

Group items Examples:

  • While grouping similar language type documents (Same language documents are one group.)
  • While categorising the news articles (Same news category(Sport) articles are one group )

Let’s understand the concept with clustering genders based on hair length example. To determine gender, different similarity measure could be used to categorise male and female genders. This could be done by finding the similarity between two hair lengths and keep them in the same group if the similarity is less(Difference of hair length is less). The same process could continue until all the hair length properly grouped into two categories.

To get used to different similarity measure to perform clustering we have some popular clustering algorithms. Which are listed below.

Clustering Algorithms

Clustering algorithms can be classified into two main categories Linear clustering algorithms and Non-linear clustering algorithms.

  • Linear clustering algorithm
    • k-means clustering algorithm
    • Fuzzy c-means clustering algorithm
    • Hierarchical clustering algorithm
    • Gaussian(EM) clustering algorithm
    • Quality threshold clustering algorithm
  • Non-linear clustering algorithm
    • MST based clustering algorithm
    • kernel k-means clustering algorithm
    • Density-based clustering algorithm

Application of Clustering Algorithms

  • Recommender systems
  • Anomaly detection
  • Human genetic clustering
  • Genom Sequence analysis
  • Analysis of antimicrobial activity
  • Grouping of shopping items
  • Search result grouping
  • Slippy map optimization
  • Crime analysis
  • Climatology

Conclusion

Let’s summarize the key things we have learnt in this blog post.

Classification: Predicting target class for test dataset from the trained modeled from the training dataset.

Clustering: Using different similarity measure to place the all the similar items in a group.

References

[1] Statistical classification

[2] Data Clustering

[3] Scikit-learn Clustering

Follow us:

FACEBOOKQUORA |TWITTER| GOOGLE+ | LINKEDINREDDIT | FLIPBOARD | MEDIUM | GITHUB

I hope you like this post. If you have any questions then feel free to comment below.  If you want me to write on one specific topic then do tell it to me in the comments below.

  • zehra says:

    Just awesome…very clear and to the point

  • Yansi says:

    I would recommend this post on my blog. Awesome work.

  • maitri says:

    thank you…it was very helpful

  • […] to KNN classifier, we can use Radius Neighbor Classifier for classification tasks. As in KNN classifier, we specify the value of K, similarly, in Radius neighbor classifier the […]

  • […] package provides us direct access to various functions for training our model with various machine learning algorithms like Knn, SVM, decision tree, linear regression, […]

  • […] package provides us direct access to various functions for training our model with various machine learning algorithms like Knn, SVM, decision tree, linear regression, […]

  • […] vector machine classifier is one of the most popular machine learning classification algorithm. Svm classifier mostly used in addressing multi-classification problems. If you are not aware of […]

  • […] Bayes classifier is a straightforward and powerful algorithm for the classification task. Even if we are working on a data set with millions of records with some attributes, it is […]

  • Shafi says:

    Hi Madhu,

    Good effort you put for us,interest automatically increasing while reading ur blog. Thanks for excellent material.

  • […] Using the training dataset to model any classification or clustering algorithm. […]

  • […] caret package provides us direct access to various functions for training our model with different machine learning algorithms like Knn, SVM, decision tree, linear regression, […]

  • […] are getting the high probabilities. This is what we can expect from the softmax function. Later in classification task, we can use the high probability value for predicting the target class for the given input […]

  • […] are getting the high probabilities. This is what we can expect from the softmax function. Later in classification task, we can use the high probability value for predicting the target class for the given input […]

  • […] specialization, you will be learning the machine learning algorithms like regression techniques, classification methods, and clustering techniques. In each course, you were going to learn these particular algorithms from basics to advance, and you […]

  • Sairam says:

    Can you recommend programming tools that we can use for implementing the classification and clustering algorithms?

  • […] in machine learning field is not only about building different classification or clustering models. It’s more about feeding the right set of features into the training […]

  • >