Introduction to data mining techniques

Introduction to data mining techniques:

Data mining techniques are a set of algorithms intended to find hidden knowledge in data. Which data mining technique to use depends on the problem we are trying to solve. Some of the popular data mining techniques are classification algorithms, prediction analysis algorithms, and clustering techniques. In this introductory post, we are going to build a basic understanding of the term data mining by presenting a toy example. You can learn more in the data mining beginners guide.

Data Mining History:

In the 1960s, statisticians used the terms “Data Fishing” or “Data Dredging” to refer to what they considered the bad practice of analyzing data without a prior hypothesis. The term “Data Mining” appeared around 1990 in the database community.

Data Mining in Technical Terms:

Technically, data mining is the process of extracting specific information from data and presenting relevant, usable information that can be used to solve problems. Data mining comes in different forms depending on the kind of data involved, such as text mining, web mining, audio and video mining, pictorial data mining, and social network data mining.

Why is data mining such a hot topic for this generation?

Data mining is a young and promising field for the present generation because of its wide range of applications. It has attracted a great deal of attention in the information industry and in society, due to the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and scientific exploration. This is why data mining is also called knowledge discovery from data.

Data Mining Techniques:

  • Classification Technique: predicting the target class of a record (e.g., will purchase or not).
  • Clustering Technique: grouping similar records in the dataset into clusters (e.g., news article clustering).
  • Association Rule Technique: finding items that frequently occur together (e.g., frequently purchased items).
  • Data Visualization: visualizing the data to understand hidden insights.
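To make the association rule idea concrete, here is a minimal sketch in Python. It counts how often pairs of items are bought together (support) and how often buying one item goes with buying another (confidence). The basket data and the function names `frequent_pairs` and `confidence` are made up for illustration. Real association rule miners such as Apriori prune candidate itemsets instead of counting every pair by brute force the way this sketch does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration only
transactions = [
    {"apple", "banana", "milk"},
    {"apple", "banana"},
    {"apple", "milk"},
    {"banana", "bread"},
    {"apple", "banana", "bread"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets containing both) >= min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ("apple", "banana")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e. P(consequent | antecedent)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

if __name__ == "__main__":
    print(frequent_pairs(transactions, min_support=0.4))
    print(confidence(transactions, "apple", "banana"))
```

With `min_support=0.4`, only pairs that appear in at least 2 of the 5 baskets are kept. The rule apple → banana has confidence 3/4 = 0.75, because banana appears in 3 of the 4 baskets that contain apple.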

Data Mining Applications:

  • Weather forecasting.
  • E-commerce.
  • Self-driving cars.
  • Hazards of new medicines.
  • Space research.
  • Fraud detection.
  • Stock trade analysis.
  • Business forecasting.
  • Social networks.
  • Customer likelihood.

Understanding Data Mining with an Apple-Buying Example:


Before explaining data mining with fresh apples, let me share some interesting facts about apples.

Nutrition: According to the United States Department of Agriculture, a typical apple serving weighs 242 grams and contains 126 calories, with significant dietary fiber and modest vitamin C content but otherwise generally low levels of essential nutrients.

Toxicity of apple seeds: The seeds of apples contain small amounts of amygdalin, a sugar and cyanide compound known as a cyanogenic glycoside. Ingesting small amounts of apple seeds causes no ill effects, but extremely large doses can cause adverse reactions. There is only one known case of fatal cyanide poisoning from apple seeds; in this case, the individual chewed and swallowed one cup of seeds. It may take several hours before the poison takes effect, because cyanogenic glycosides must be hydrolyzed before the cyanide ion is released.

Now let’s step into an example to build a basic understanding of a data mining model:

Suppose your family members want to visit someone who is suffering from pancreatic cancer. Some studies suggest that eating apples may lower the risk of pancreatic cancer by up to 23 percent, so your father asks you to bring apples from a nearby shop. Your father also teaches you how to buy apples by giving you a set of rules.

Rules for buying apples:

  • Big apples are less tasty than small apples.
  • Dark red apples are not fresh.
  • Light red apples are fresh.
  • Green apples are good for health.

By carefully observing the rules listed above, you can pick the apples you want to buy. Your family members want to give these apples to someone who is unwell, so you naturally pick green apples. Since small apples are tastier than big ones, when you go shopping you pick small apples that are green in color. That is the end of the story of selecting apples that are good for health.

Non-Data Mining Algorithm:

def select_apple(size, color):
    # Hand-written rules (no learning from data): buy only small, green apples
    if size == "small" and color == "green":
        return "select apple"
    else:
        return "don't select apple"

Comparing with Data Mining:

  • You randomly select apples from the shop (training data).
  • You make a table of the physical characteristics of each apple, such as color and size (features).
  • You note which apples are tasty and which are good for health (output variables).
  • When you go to a new shop to buy apples, the apples there are new, unseen examples (test data).
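The four steps above map directly onto code. The sketch below assumes a tiny, made-up table of apples (size and color as features, "good" or "not good" as the output variable). It uses the simplest possible learner, a majority-vote lookup table, to illustrate training and test data. It is not the specific algorithm the post has in mind, and the names `train`, `predict`, and the data values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data (made up for illustration): apples picked at random
# from the first shop. Features are (size, color); the output variable says
# whether the apple turned out to be good for health.
training_data = [
    (("small", "green"), "good"),
    (("small", "green"), "good"),
    (("big", "green"), "good"),
    (("small", "dark red"), "not good"),
    (("big", "dark red"), "not good"),
    (("big", "light red"), "not good"),
    (("small", "light red"), "not good"),
]

def train(rows):
    """Model building: count how often each label occurs for every feature combination."""
    table = defaultdict(Counter)
    for features, label in rows:
        table[features][label] += 1
    overall = Counter(label for _, label in rows)
    return table, overall

def predict(model, features):
    """Return the most common label seen for these features.
    Unseen feature combinations fall back to the most common label overall."""
    table, overall = model
    counts = table.get(features) or overall
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    model = train(training_data)
    # Test data: apples at a new shop, including a combination never seen in training
    test_data = [("small", "green"), ("big", "dark red"), ("big", "yellow")]
    print([predict(model, features) for features in test_data])
    # prints ['good', 'not good', 'not good']
```

Unlike the hand-written rule, nothing about "small and green" is coded in. The model learns from the labeled rows which size and color combinations were good, and for a combination it has never seen (a big yellow apple) it falls back to the most common label in the training data.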

Everything you have done so far is called model building in data mining terminology. Once you have built the model (here, the proper rules for buying apples), you can buy apples with great confidence, without worrying about the details of how to choose the best ones. What's more, you can make your algorithm improve over time (reinforcement learning): the model's performance improves as you do more training, and it modifies itself when it makes a wrong prediction. Best of all, you can use the same algorithm to train different models, one each for predicting the quality of apples, oranges, bananas, grapes, cherries, and watermelons, and keep all your loved ones happy.
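One classic algorithm that literally "modifies itself when it makes a wrong prediction" is the perceptron. It is an online supervised learning method rather than reinforcement learning in the strict sense, but it shows how a model improves with more training passes. The sketch below is a hedged illustration that assumes two binary features, is_small and is_green, with labels +1 (buy) and -1 (don't buy) following the father's rule. The data and the function names are hypothetical.

```python
def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Perceptron training: the weights change only when the model predicts wrongly."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:  # y is +1 (buy) or -1 (don't buy)
            if perceptron_predict(weights, bias, x) != y:
                # Wrong prediction: move the decision boundary toward the correct answer
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:
            break  # a full pass without mistakes: nothing left to correct
    return weights, bias

def perceptron_predict(weights, bias, x):
    """Return +1 if the weighted sum of features plus bias is positive, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1

# Hypothetical training apples: features are (is_small, is_green)
apples = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(apples)
    print([perceptron_predict(weights, bias, x) for x, _ in apples])
```

On this data the weights stop changing after a few passes, at which point every training apple is classified correctly. The loop exits early as soon as a full pass produces no mistakes.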

This type of learning is called supervised learning in data mining, because the model learns from examples whose outputs (tasty, good for health) are already known. In the next post, you will get a clear understanding of the difference between supervised learning and unsupervised learning with real-life examples.

I hope you like this post. If you have any questions, feel free to comment below. If you want me to write on a specific topic, do tell me in the comments below.

Related Courses:

Do check out unlimited data science courses.

Pattern Discovery in Data Mining
  • You will learn the basic concepts of data mining and its real-world applications.
  • You will also learn data-driven methods and some interesting concepts of pattern discovery.
  • Practice scalable pattern discovery methods on massive transaction data.
Introduction to Machine Learning
  • Introduces basic machine learning, data mining, and pattern recognition concepts.
  • In-depth differences between supervised and unsupervised learning algorithms.
  • Many more case studies and machine learning applications.
Data Mining with Python: Classification and Regression
  • Understand the key concepts in data mining and learn how to apply them to solve real-world problems.
  • Get hands-on experience with the Python programming language.
  • Hands-on experience with the numpy, pandas, and matplotlib Python libraries.

9 Responses to “Introduction to data mining techniques”

  • Data mining can showcase the data with real figures and facts that provide insight into how you can improve future product launches.

  • Deepak Gautam
    6 years ago

    good one.

  • Hi Sai, thanks for the basics of data mining. What is training, and what is the model here? And what is training a model?
    Could you please help me understand these basic terms?

  • Hello sir, I like the way you explained the basic concept of data mining. I have a question; I hope you will provide me useful information. I am working on a project, DATA MINING IN HEALTHCARE. I am implementing this project in JSP using the Eclipse IDE. I am providing training data like this:
    PERSON HAVING SOME DISEASE (because the last digit is 1)
    30 2 1 2 2 2 2 1 2 2 2 2 2 1 85 18 4 35 1 1
    50 1 1 2 1 2 2 1 2 2 2 2 2 0.9 135 42 3.5 35 1 1
    78 1 2 2 1 2 2 2 2 2 2 2 2 0.7 96 32 4 35 1 1
    PERSON DOESN’T HAVE DISEASE (the last digit is 0, i.e., the histology component)
    44 1 1 2 1 1 2 2 2 1 2 2 1 0.9 135 55 3.42 41 2 0
    30 1 2 2 1 1 1 2 1 2 1 1 1 2.5 165 64 2.8 35 2 0
    38 1 1 2 1 1 1 2 1 2 1 1 1 1.2 118 16 2.8 35 2 0
    Now, how do I use this information in my code?
    (I have created all the needed functions and trained the classifier.)

  • I am no longer certain where you are getting your information, but it is a great topic.

    I need to spend some time learning more or figuring out more.
    Thanks for the fantastic info; I was searching for this info for my mission.

  • Anonymous
    8 years ago

    Hi Sai Madhu! Great article introducing data mining.
    I have a question. You mentioned “reinforcement learning”. What exactly is it and how does it happen? How can an algorithm learn over time? Does it store anything in memory to remember? Please give me a clear picture of this.

    • Hi someone 🙂
      Thanks for your compliment.
      Reinforcement learning:
      I used the term reinforcement learning in the introduction to data mining post to express that learning from data is like trial-and-error learning. This means you build the model from your training data by considering a few parameters. When you test your model, it may not give good accuracy, so you change the parameters you considered before to get better accuracy. This process continues until you are satisfied with your model's accuracy.
      Suppose in our apples example we believed that small, green apples are good ones, but when you buy green apples you feel that those apples are not good enough. So you look for new parameters, such as whether buying in the morning or in the evening is better, and then you change your model based on these new parameters.

      Coming to your next question, how an algorithm learns over time:

      In real-world problems, our training data is updated at regular intervals, so our model is also updated and its accuracy changes. This is why we say the algorithm learns over time.

      Coming to your next question, does it store anything in memory to remember:

      Once we are done with the model, we store it in a variable. When you change your model, you update the variable. That's it.
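      Here is a small Python sketch of both ideas. It assumes a made-up `train_model` function that stands in for any real training routine. The model is retrained when new labeled data arrives and simply reassigned to the same variable, and the standard-library `pickle` module is one way to save it so it survives after the program stops.

```python
import pickle

def train_model(rows):
    """Hypothetical stand-in for a real training routine: for each apple color,
    learn the fraction of apples of that color that turned out to be good."""
    totals, goods = {}, {}
    for color, is_good in rows:
        totals[color] = totals.get(color, 0) + 1
        goods[color] = goods.get(color, 0) + int(is_good)
    return {color: goods[color] / totals[color] for color in totals}

# Made-up labeled data collected so far
training_rows = [("green", True), ("green", True), ("dark red", False)]
model = train_model(training_rows)  # the trained model is kept in a variable

# New labeled data arrives at a regular interval: retrain and update the variable
training_rows += [("green", False), ("light red", True)]
model = train_model(training_rows)

# To remember the model after the program exits, serialize it; the bytes in
# `saved` could be written to a file or database and loaded back later
saved = pickle.dumps(model)
restored = pickle.loads(saved)
```

      After the update, the learned fraction for green apples drops from 1.0 to 2/3, which is the "model accuracy changes over time" effect described above.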

      If you have any other questions, you can mail to
