7 Most Popular Data Mining Techniques

September 16, 2014 Saimadhu Polamuri

You can’t become a successful data scientist without knowing these 7 popular data mining techniques.

Read this article to the end if you are an aspiring data scientist and want a successful data science career. Data mining techniques are part of any data scientist's day-to-day work.


Before we dive in, skim through the table of contents of this article.

Table of Contents

What is Data Mining

Seven Steps in Data Mining Processing

Different Types of Data Mining Techniques

Association Rules Analysis

Regression Algorithms

Classification Algorithms

Clustering Algorithms

Time Series Forecasting

Anomaly Detection

Artificial Neural Network Models

Data Mining Applications

Conclusion

What is Data Mining

Data mining is a methodology used predominantly by organizations that hold large amounts of data. These organizations use data mining techniques to extract insights that support their growth.

To give a simple example from the financial sector, banks analyze customers' transactional data to decide how large a loan a customer is eligible for. They also use this data to identify fraudulent activity on credit cards.

More technically, data mining is the process of identifying insights and patterns in raw data stored in databases or data stores (places where data is held).

The organization then uses these insights to make critical business decisions, for example to increase revenue.

Now that the term data mining is clear, let’s look at how organizations carry out the data mining process.

Seven Steps in Data Mining Processing

Data mining is also known as Knowledge Discovery in Databases (KDD), although strictly speaking it is one step of that process. Extracting useful knowledge from data involves several stages.

Gregory Piatetsky-Shapiro coined the term “Knowledge Discovery in Databases” in 1989.

At a high level, the knowledge discovery process has two main stages.

Data Preprocessing
Data Mining

In the data preprocessing stage, we perform the steps below.

Data cleaning
- Convert the raw data into a clean form by removing noisy, inconsistent, and irrelevant data.
Data integration
- Combine data from various sources into a centralized data source.
Data reduction
- Identify the relevant data and remove the data that is not helpful for the analysis.
Data transformation
- Transform the data into the form a model needs, for example by normalizing numeric values and converting categorical data to numerical data.
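
The data transformation step can be sketched in a few lines of Python. This is only a minimal illustration, assuming min-max scaling for normalization and one-hot encoding for categorical data. The function names and the sample columns are made up for this example and do not come from any particular library.

```python
def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(categories):
    """Turn each category into a list of 0/1 flags, one flag per distinct category."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


# Hypothetical columns, invented for illustration only.
ages = [10, 20, 40]
cities = ["delhi", "mumbai", "delhi"]

normalized_ages = min_max_normalize(ages)   # smallest value maps to 0.0, largest to 1.0
encoded_cities = one_hot_encode(cities)     # columns are ordered alphabetically by category
```

In practice a data scientist would usually rely on a library for these transformations. The hand-written version only shows what the step does to the data.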

Likewise, in the data mining stage, we perform the steps below.

Data mining
- Apply various methods to extract patterns and valuable insights from the data.
Pattern evaluation
- Use statistical and machine learning models to identify the patterns that are genuinely important.
Knowledge representation
- Present the insights or patterns we identified using clear visualizations.

To learn more about these 7 steps, you can check out the article on the steps of the data mining process.

Now let’s move forward with the popular data mining techniques.

Different Types of Data Mining Techniques

We have various data mining techniques to extract insights from the data. These techniques identify the underlying patterns and help predict future outcomes.

Below is the list of the data mining techniques we will cover:

Association Rules Analysis
Regression Algorithms
Classification Algorithms
Clustering Algorithms
Time Series Forecasting
Anomaly Detection
Artificial Neural Network Models

Now let’s learn about each of these techniques in detail.

Association Rules Analysis

Association rules help determine how frequently items in a dataset occur together. These rules are heavily used in market basket analysis.

Association rules are if-then statements (for example, if a customer buys item A, then they also buy item B) that capture how frequently items co-occur in large databases or datasets.

They are commonly applied to transaction databases to identify which products people frequently buy together. This helps in recommending products to users.

One popular application of association rules is association-based classification, also called associative classification, which consists of two main steps.

Mine association rules using the Apriori algorithm
Build a classifier using the identified association rules

Association rule analysis uses three main measures to evaluate rules.

Lift
Support
Confidence

Lift

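Lift measures how much more often items A and B are purchased together than we would expect if they were purchased independently. It divides the confidence of the rule by how often item B is purchased overall. A lift above 1 means that buying item A makes buying item B more likely.

Lift = (Confidence) / ((Transactions containing item B) / (Total transactions))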

Support

Support measures how often items A and B are purchased together, relative to the overall dataset.

Support = (Transactions containing both item A and item B) / (Total transactions)

Confidence

Confidence measures how often item B is purchased when item A is purchased.

Confidence = (Transactions containing both item A and item B) / (Transactions containing item A)
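
To make the three measures concrete, here is a small Python sketch that computes them for the rule "if bread, then milk". The transactions are invented for illustration, and the helper names `support`, `confidence`, and `lift` are our own and are not taken from any library.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(a, b, transactions):
    """Of the transactions containing a, the fraction that also contain b."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence of a -> b divided by how often b occurs overall."""
    return confidence(a, b, transactions) / support(b, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

For these baskets the support is 0.6, the confidence is about 0.75, and the lift is about 0.94. A lift slightly below 1 says that, in this toy data, buying bread does not make buying milk more likely than usual.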

Regression Algorithms

Regression analysis helps identify the relationship between the independent variables and a dependent variable. Regression algorithms use that relationship to predict a numerical outcome.

For example, consider house price prediction. Here we have house features such as

Number of rooms
Number of bathrooms
Area of house
Age of the house

Using the above features, we need to predict the house price.

There are various regression models, for example linear regression, ridge regression, and lasso regression.
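
As a rough illustration of the house price example, the sketch below fits a straight line by ordinary least squares using a single feature, the area of the house. The numbers are made up and deliberately simple, and `fit_line` is a hypothetical helper written for this article. A real model would use all the features listed above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: area in square meters and price in thousands.
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)

# Predict the price of a house with an area of 90 square meters.
predicted_price = slope * 90 + intercept
```

Because the invented prices are exactly three times the area, the fitted line has a slope of 3 and an intercept of 0, so the prediction for 90 square meters is 270. Real data is noisy and will never fit this cleanly.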

Classification Algorithms

The classification technique predicts outcomes as categorical, or discrete, values. Taking the house price prediction example above, to treat it as a classification problem we convert the house prices into categories.

For example, is the house price low, medium, or high? The features stay the same.

To make this clear, let’s restate the example.

Given house-related features, classify the house price into the categories below.

Low Price,
Medium Price,
High Price

Using the same features, which are

Number of rooms
Number of bathrooms
Area of house
Age of the house

Hopefully the difference between regression and classification problems is now clear. There are various classification algorithms, for example logistic regression, decision trees, and k-nearest neighbors.
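
The house price classification example can be sketched with a tiny k-nearest neighbors classifier. This is one possible approach among many. The training rows (number of rooms, area) and their labels are invented, and `classify` is a hypothetical helper. The features are left unscaled to keep the sketch short, so the area dominates the distance.

```python
import math
from collections import Counter

# Invented training data: (number of rooms, area of house) -> price category.
training = [
    ((2, 50), "low"),
    ((2, 60), "low"),
    ((3, 90), "medium"),
    ((3, 100), "medium"),
    ((5, 180), "high"),
    ((6, 200), "high"),
]


def classify(features, training, k=3):
    """Return the majority label among the k training rows closest to features."""
    by_distance = sorted(training, key=lambda row: math.dist(features, row[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


label_small = classify((2, 55), training)    # a small house
label_large = classify((5, 190), training)   # a large house
```

With this data the small house is classified as "low" and the large house as "high". Notice that the output is a category, not a number, which is exactly what separates classification from regression.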

Clustering Algorithms

The clustering technique in data mining aims to identify similar data points. In other words, it groups the data points based on a similarity metric.

This technique helps to recognize the differences and similarities within the data.

Clustering is similar to classification, but it groups data points by similarity without relying on predefined labels.

A familiar analogy is the world map: people living in the same land area, who share certain similarities, are grouped into a country.

Below is the list of the most popular clustering algorithms

Agglomerative Hierarchical Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
K-Means
Spectral Clustering
Mean Shift Clustering
Mini-Batch K-Means
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
Affinity Propagation
Gaussian Mixture Models (GMM)
OPTICS
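
To show what grouping by similarity looks like, here is a bare-bones sketch of K-Means from the list above. It assumes the starting centroids are supplied by the caller and runs for a fixed number of iterations, which is simpler than what real libraries do. The points are made up, and `k_means` is our own helper name.

```python
import math


def k_means(points, centroids, iterations=10):
    """Very small K-Means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of each coordinate; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

The three points near the origin end up in one cluster and the three points near (8, 8) in the other. No labels were given to the algorithm, only the points and the number of groups.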

Time Series Forecasting

In time series forecasting, we predict future data points from time series data. These forecasting techniques analyze the trend in the historical data and extrapolate it into the future.

A common example of time series forecasting is predicting a stock price using the historical price trend of the stock.

Below are the most popular time series forecasting models.

Autoregressive (AR)
Autoregressive Integrated Moving Average (ARIMA)
Seasonal Autoregressive Integrated Moving Average (SARIMA)
Exponential Smoothing (ES)
LSTM (Deep Learning)
DeepAR
Prophet
Temporal Fusion Transformer
N-BEATS
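
As a minimal sketch of the idea, the code below implements simple exponential smoothing, the most basic member of the Exponential Smoothing family listed above. The price series is invented, the smoothing factor `alpha` is an arbitrary choice, and the function name is our own.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast: a weighted average that favors recent values."""
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up daily closing prices.
prices = [10.0, 12.0, 11.0, 13.0]
forecast = exponential_smoothing_forecast(prices, alpha=0.5)
```

For this series the forecast for the next day is 12.0. A larger `alpha` makes the forecast react faster to recent prices, while a smaller one smooths out more of the noise.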

Anomaly Detection

Anomaly detection is used both as a modeling technique and in the data preparation stage, for example to find outliers before modeling.

Anomaly detection is also called outlier analysis, where we identify the data points that lie far from the remaining data points.

This technique is widely used in fraud detection. For example, a bank can analyze a user's credit card transactions to identify fraudulent activity on the card.

Below are the popular anomaly detection techniques used in the industry.

One Class SVM
Local Outlier Factor
Isolation Forest
Minimum Covariance Determinant
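
The techniques listed above need a machine learning library, so the sketch below uses a much simpler statistical stand-in instead: the modified z-score based on the median absolute deviation. It is not one of the listed algorithms, but it shows the same idea of flagging points far from the rest. The transaction amounts are invented, and the 3.5 threshold is a commonly used convention rather than a fixed rule.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median) exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Made-up credit card transaction amounts with one suspicious value.
amounts = [20, 22, 19, 21, 23, 20, 950]
outliers = find_outliers(amounts)
```

Only the 950 transaction is flagged. The median is used instead of the mean because a single extreme value barely moves it, so the outlier cannot hide itself by distorting the baseline.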

Artificial Neural Network Models

Artificial neural network models are loosely inspired by the way biological neurons in the human brain work. A neural network contains several layers, and each layer aims to learn patterns in its input data.

A neural network has the building blocks below.

Inputs,
Hidden layers,
Weights,
Bias,
Activation functions,
Learning rate
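
To see how most of these building blocks fit together, here is a sketch of a single artificial neuron trained with gradient descent to reproduce a logical AND gate. It has inputs, weights, a bias, a sigmoid activation function, and a learning rate, but no hidden layers, so it is far simpler than the models discussed in this section. The function names and the training settings are our own choices for illustration.

```python
import math


def sigmoid(z):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train(samples, learning_rate=0.5, epochs=2000):
    """Adjust weights and bias a little after every sample to reduce the error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(inputs, weights, bias) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Truth table of a logical AND gate: output is 1 only when both inputs are 1.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(and_gate)
predictions = [round(predict(inputs, weights, bias)) for inputs, _ in and_gate]
```

After training, the rounded predictions match the AND truth table. Real networks stack many such neurons into hidden layers, which lets them learn patterns that a single neuron cannot.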

Below are some of the popular neural network models.

Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs)
Long Short Term Memory Networks (LSTMs)
Deep Belief Networks (DBNs)
Restricted Boltzmann Machines( RBMs)
Autoencoders
Generative Adversarial Networks (GANs)
Radial Basis Function Networks (RBFNs)
Multilayer Perceptrons (MLPs)
Self Organizing Maps (SOMs)

We have seen various data mining techniques. Now let’s look at the domains where these techniques are used.

Data Mining Applications

Banking
- In banking, data mining techniques are used to identify fraudulent activity and to assess a user's ability to repay a loan.
Education
- In education, data mining techniques are used to recommend personalized learning paths to students, reducing the time they need to learn.
Insurance
- In the insurance sector, data mining techniques help price the various financial products profitably and help acquire new customers.
E-Commerce
- In e-commerce, data mining techniques are used to recommend products to users. They also help in setting user-specific pricing.
Bioinformatics
- In bioinformatics, the techniques we discussed are used to identify patterns in micro-analysis data, which help in establishing medical patterns.

Conclusion

In this article, we discussed various data mining techniques and, for each technique, a list of popular algorithms. We also saw how these techniques are used in multiple domains, with examples.

In a nutshell, data mining techniques are used to get essential business insights and to identify the underlying patterns in the data that help the organization make critical business decisions.


9 Responses to “7 Most Popular Data mining Techniques”

Emilia Jazz
4 years ago

Data mining can showcase the data with real figures and facts that would provide an insight into how you can improvise the future product launches.
- Saimadhu Polamuri
  4 years ago
  
  Yes Emilia Jazz, Getting the valuable insight is key.
Deepak Gautam
7 years ago

good one.
- saimadhu
  7 years ago
  
  Hi Deepak Gautam,
  Thanks for your compliment 🙂
Suren
8 years ago

Hi Sai, Thanks for the basics of data mining. What is training, what is model here? And what is training a model?
Could you please help me understanding these basic terms?
pranav
9 years ago

hello sir, i lyk the way u explained basic concept of data mining. i hav question ,hope u will provide me useful information. i m working on a project DATAMINING IN HEALTHCARE. i m implementing this project in jsp using Eclipse IDE. i am providing training data like this:
PERSON HAVING SOME DISEASE(Because last digit is 1)
30 2 1 2 2 2 2 1 2 2 2 2 2 1 85 18 4 35 1 1
50 1 1 2 1 2 2 1 2 2 2 2 2 0.9 135 42 3.5 35 1 1
78 1 2 2 1 2 2 2 2 2 2 2 2 0.7 96 32 4 35 1 1
PERSON DON’T HAVE DISEASE( last digit is o i.e: histology componet)
44 1 1 2 1 1 2 2 2 1 2 2 1 0.9 135 55 3.42 41 2 0
30 1 2 2 1 1 1 2 1 2 1 1 1 2.5 165 64 2.8 35 2 0
38 1 1 2 1 1 1 2 1 2 1 1 1 1.2 118 16 2.8 35 2 0
now how to use this information in my code?
(i have created all the needed functions and trained classifier)
http://maketshirtsonline.net
9 years ago

I am no longer certain where you are getting your information, however great topic.

I needs to spend some time learning more or figuring out more.
Thanks for fantastic info I used to be searching for this info for
my mission.
Anonymous
9 years ago

Hi Sai Madhu.! Great article to introduce Data Mining.
I have a question. You have said about “reinforcement learning”. What exactly is this and how does this happen? How can an algorithm learn over time? Does it store in any memory to remember? Please give me clear picture about this.
- saimadhu
  9 years ago
  
  Hii someone 🙂
  Thanks for your compliment.
  Reinforcement learning:
  The reason for mentioning reinforcement learning in the introduction to data mining post was to express that learning from data is like trial-and-error learning. This means you build the model from your training data by considering a few parameters. When you test your model, it may not give good accuracy. Then you change the parameters you considered before to get better accuracy. This process continues until you are satisfied with your model's accuracy.
  Example:
  Suppose in our apples example we believed that small, green apples are the good ones, but when you buy green apples you feel that they are not good enough. So you look for new parameters, such as whether buying in the morning or in the evening is better, and then you change your model based on the new parameters.
  
  Coming to your next question how an algorithm will learn over time:
  
  In real-world problems our training data is updated at regular intervals, so our model is updated as well and its accuracy changes. This is why we say the algorithm learns over time.
  
  Coming to your next question Does it store in any memory to remember:
  
  Once we are done with the model, we store it in a variable. When you change your model, you update the variable. That’s it.
  
  if you have any other questions you can mail to hello@dataaspirant.com