What is Data Mining?
Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
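As a small illustration of “discovering correlations” with a statistical technique, here is a minimal Python sketch that scans every pair of columns in a table and reports the pairs whose Pearson correlation coefficient is strong. The column names, the values and the 0.8 threshold are hypothetical choices for illustration only.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strong_correlations(columns, threshold=0.8):
    """Return (column_a, column_b, r) for every column pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical toy table: each key is a column, each list holds its values.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [12, 24, 33, 41, 55],
    "returns": [5, 3, 6, 2, 4],
}

for a, b, r in strong_correlations(data):
    print(f"{a} ~ {b}: r = {r}")
```

With this toy table only the ad_spend/sales pair clears the threshold. A strong correlation is a pattern worth investigating, not proof that one column causes the other.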
Data Mining Process:
1. Develop an understanding of the application and its goals
2. Create a dataset for study (often from a data warehouse)
3. Data cleaning and preprocessing
4. Data reduction and projection
5. Choose the data mining task
6. Choose the data mining algorithms
7. Use the algorithms to perform the task
8. Interpret the results and iterate through steps 1 to 7 if necessary
9. Deploy: integrate the results into operational systems
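Steps 3 to 8 can be sketched as a tiny Python pipeline. This is a minimal illustration under stated assumptions, not a standard implementation: the customer records are invented, cleaning simply drops incomplete records, the reduction step keeps two hypothetical features (income and visits) and min-max scales them, and the chosen task and algorithm are clustering with a basic k-means (k = 2, initialized from the first k points so the run is deterministic). The iteration back to earlier steps is not automated here.

```python
import math

# Step 2 (hypothetical data): customer records, None marks a missing value.
raw = [
    {"id": 1, "income": 20, "visits": 2, "age": 25},
    {"id": 2, "income": 22, "visits": 3, "age": None},
    {"id": 3, "income": 80, "visits": 20, "age": 40},
    {"id": 4, "income": 85, "visits": 22, "age": 45},
    {"id": 5, "income": None, "visits": 5, "age": 30},
    {"id": 6, "income": 25, "visits": 4, "age": 28},
    {"id": 7, "income": 78, "visits": 19, "age": 50},
]
FEATURES = ("income", "visits")  # assumed features of interest


def clean(records):
    """Step 3: drop every record that has a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def reduce_and_project(records, features):
    """Step 4: keep only the chosen features and min-max scale each to [0, 1]."""
    lows = {f: min(r[f] for r in records) for f in features}
    highs = {f: max(r[f] for r in records) for f in features}
    return [
        tuple((r[f] - lows[f]) / ((highs[f] - lows[f]) or 1) for f in features)
        for r in records
    ]


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Steps 5-7: cluster the points with a basic k-means."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


cleaned = clean(raw)
points = reduce_and_project(cleaned, FEATURES)
labels, _ = kmeans(points, k=2)

# Step 8: summarize each cluster in the original units so a person can interpret it.
for c in sorted(set(labels)):
    members = [r for r, lab in zip(cleaned, labels) if lab == c]
    mean_income = sum(r["income"] for r in members) / len(members)
    print(f"cluster {c}: ids={[r['id'] for r in members]}, mean income={mean_income:.1f}")
```

With these toy records, k-means separates the low-income, infrequent visitors from the high-income, frequent ones. Deciding whether that split is actually useful is the interpretation in step 8; an unhelpful result would send you back to, for example, step 4 to choose different features.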
As you can see, the core data mining steps are steps 4 to 8. Discussing the data mining process also leads to an important data mining methodology called “CRISP-DM”.
CRISP-DM (“Cross-Industry Standard Process for Data Mining”) is a six-phase model of the entire data mining process, from start to finish, that is broadly applicable across industries and to a wide array of data mining projects.
Here is a short description of each of the six phases:
- Business Understanding – Identify the project objectives and requirements
- Data Understanding – Collect and review the data
- Data Preparation – Select and clean the data
- Modelling – Apply modelling techniques to the prepared data and draw conclusions
- Evaluation – Evaluate the model against the project objectives
- Deployment – Apply the conclusions to the business
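The list reads as a straight sequence, but CRISP-DM is usually drawn as a cycle with feedback loops. The minimal Python sketch below encodes the phases and the moves allowed between them, taking the transitions from the commonly reproduced CRISP-DM diagram (business and data understanding feed each other, modelling can send you back to data preparation, and evaluation can send you back to business understanding). The `Phase` enum, `TRANSITIONS` table and `is_valid_path` helper are hypothetical names for illustration.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions, based on the arrows in the usual CRISP-DM diagram.
# Deployment is treated as the end of one pass; a new project cycle would
# start again from business understanding.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELLING},
    Phase.MODELLING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Check that a recorded sequence of phases starts correctly and uses only allowed moves."""
    if not path or path[0] is not Phase.BUSINESS_UNDERSTANDING:
        return False
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))


# A hypothetical project history with one loop back from modelling.
history = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.DATA_PREPARATION,  # modelling revealed the data needed more work
    Phase.MODELLING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(history))
print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELLING]))
```

The first history is accepted because every step follows an allowed arrow; the second is rejected because it jumps from business understanding straight to modelling.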