5 things you can do better with data mining

Today we will learn more about data mining. If you are not familiar with the concept, it is worth understanding what lies behind it: a set of powerful tools and techniques that help you get insight from your big data.

A simple definition is: 

Data mining is the set of techniques and methodologies used to collect information from different sources and process it automatically through algorithms and logical patterns.

How can data mining help you collect data?

Data is growing fast, not only on social networks and in open databases, but everywhere.

With data mining techniques like data scraping (taking data from the internet, such as e-commerce prices, weather data, stock exchange quotes…), you can increase the number of data sources you can use for your analysis.

Did you know that you can also get data from images? Discover more here:


In a few minutes, and with very few lines of code, you can learn how to scrape data from the web using Python and R.
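To give an idea of what scraping looks like, here is a minimal sketch in Python that uses only the standard library. The HTML snippet, the class names "name" and "price" and the helper scrape_prices are all made up for illustration; a real page will have its own structure, and you would download it first instead of hard-coding it.

```python
from html.parser import HTMLParser

# Hypothetical product page. A real scraper would download this HTML
# (for example with urllib.request.urlopen) instead of hard-coding it.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Vanilla ice cream</span><span class="price">3.50</span></li>
  <li class="product"><span class="name">Chocolate ice cream</span><span class="price">4.20</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # "name", "price", or None
        self.current_name = None
        self.products = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field == "name":
            self.current_name = data.strip()
        elif self.current_field == "price":
            self.products.append((self.current_name, float(data)))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


def scrape_prices(html):
    """Return a list of (product name, price) tuples found in the HTML."""
    parser = PriceParser()
    parser.feed(html)
    return parser.products


prices = scrape_prices(SAMPLE_HTML)
print(prices)
```

Once the prices are in a list like this, they can be saved next to your other data sources and used in the analysis.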

How to group your data: clustering analysis

Imagine a big database with many customers. It often happens that it contains many different groups of customers. Clustering analysis can identify which customers have enough in common to be addressed as a single target group, for example because they are similar in order size, purchase needs, or purchasing behavior.

Figure: Clustering income vs education (example of clusters from www.dummies.com)

This will help you or your firm to set pricing, product, and general marketing strategies that are more focused on each particular target.

Python or R can help you identify clusters (see below an example with 3 clusters).
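As a rough illustration of how 3 clusters can be found, here is a small k-means sketch in plain Python. The customer data (orders per year, average items per order) is invented, and the starting centroids are picked with a simple "farthest point" rule to keep the result repeatable; libraries usually start from random points instead.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    # Deterministic start: the first point, then repeatedly the point that is
    # farthest from the centroids chosen so far (a stand-in for random init).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Made-up customers: (orders per year, average items per order).
customers = [
    (1, 1), (2, 1), (1, 2),     # few, small orders
    (10, 1), (12, 2), (11, 1),  # many, small orders
    (1, 9), (2, 10), (1, 10),   # few, large orders
]

centroids, labels = kmeans(customers, k=3)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

With real data the columns often have very different scales (for example income and years of education), so it usually makes sense to rescale them before measuring distances.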

Other examples can be found here:

Cluster Analysis by JMP


Regression analysis: identify future output based on historical data

Consider a dataset with ice cream sales for the last three years and another with temperature information. With regression you can build a model that estimates how much ice cream you can sell based on the expected temperature.
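The ice cream example can be sketched with a simple linear regression (ordinary least squares on one variable). The temperatures and sales figures below are invented just to show the mechanics, and fit_line and predict_sales are illustrative names, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: daily temperature (degrees Celsius) and ice creams sold.
temperatures = [15, 20, 25, 30, 35]
sales = [110, 160, 205, 260, 300]

slope, intercept = fit_line(temperatures, sales)


def predict_sales(temperature):
    """Estimate sales for an expected temperature using the fitted line."""
    return slope * temperature + intercept


print("Extra ice creams per degree:", slope)
print("Estimated sales at 28 degrees:", round(predict_sales(28)))
```

A straight line is the simplest possible choice; real sales probably also depend on weekday, holidays and so on, which is where regression with more variables comes in.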

Interesting article that explains regression in more depth, especially for marketing

Anomaly detection

How many times have you seen a dataset with errors such as careless typos or duplicated records? With specific tools and machine learning you can identify and prevent this kind of error by analyzing historical data and suggesting the correct value.

How much time could you save with a more robust and cleaner dataset? Data scientists usually spend from 70% to 90% of their time cleaning data.
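To make the typo and duplicate problem concrete, here is a small sketch. It is not machine learning: it uses fuzzy string matching from Python's difflib module against a list of values that are assumed to be already validated. The city list, the records and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical reference list built from historical, already-validated data.
KNOWN_CITIES = ["Milan", "Rome", "Turin", "Naples"]


def clean_records(records, known_values, cutoff=0.8):
    """Drop exact duplicates and suggest fixes for values not in known_values."""
    seen = set()
    cleaned, suggestions = [], []
    for customer_id, city in records:
        if city not in known_values:
            # Look for the most similar known value above the cutoff.
            match = get_close_matches(city, known_values, n=1, cutoff=cutoff)
            if match:
                suggestions.append((customer_id, city, match[0]))
                city = match[0]
        record = (customer_id, city)
        if record in seen:
            continue  # duplicated row, skip it
        seen.add(record)
        cleaned.append(record)
    return cleaned, suggestions


records = [
    (1, "Milan"),
    (2, "Millan"),  # typo
    (3, "Turin"),
    (1, "Milan"),   # duplicate of the first row
    (4, "Naple"),   # typo
]

cleaned, suggestions = clean_records(records, KNOWN_CITIES)
print(cleaned)
print(suggestions)
```

In practice you would probably show the suggestions to a person before applying them, since a close match is not always the right one.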

Classification analysis: a powerful data mining technique

This is the field where machine learning algorithms and chatbots are growing. In the future they could try to answer most of our questions, for example about a product's features, by classifying each question based on common patterns.
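Here is a toy version of classifying a question by common patterns. It is a nearest-neighbour sketch that compares the words of a new question with a handful of labelled examples using Jaccard similarity. The training questions and the labels "pricing", "shipping" and "features" are invented; a real chatbot would rely on far more data and a proper model.

```python
import re

# Hypothetical labelled questions; a real system would learn from many more.
TRAINING = [
    ("how much does the product cost", "pricing"),
    ("what is the price of the premium plan", "pricing"),
    ("when will my order arrive", "shipping"),
    ("how long does delivery take", "shipping"),
    ("which features does the product include", "features"),
    ("does it support bluetooth", "features"),
]


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(question):
    """Return the label of the most similar training question (1-nearest neighbour)."""
    words = tokens(question)
    _, best_label = max(TRAINING, key=lambda item: jaccard(words, tokens(item[0])))
    return best_label


print(classify("What is the price?"))
print(classify("When will the delivery arrive?"))
print(classify("Which features are included?"))
```

The idea is the same one a larger classifier uses: questions that share patterns with known examples get the same label.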

It could also be interesting to identify common words in books or other texts, for example through a word cloud.
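A word cloud is built from word counts, so a first step can be sketched with Python's collections.Counter. The sample sentence and the very short stop-word list are assumptions for the example; drawing the cloud itself is not covered here.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "it", "on"}


def top_words(text, n=3):
    """Return the n most frequent words in the text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


sample = (
    "Data mining turns raw data into insight. "
    "Mining data is the first step; insight is the goal."
)

print(top_words(sample))
```

The words with the highest counts are the ones a word cloud would draw in the largest font.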

Sign up to our newsletter to learn soon how to analyze any text with a word cloud in Python, and to discover more about data mining tools and techniques.