Why Is Data Integration In Data Mining Important?
Data Integration is a data processing technique that collects data from different sources (such as data cubes, multiple databases, and flat files) and offers a unified view of the data to the users.
Data integration in data mining must deal with issues such as duplicate data, inconsistent data, and legacy systems. It can be performed manually or automated through middleware and applications.
There are two major approaches to data integration, which are as follows:
Tight Coupling - In this method, data from the different sources is combined into a single physical location, the data warehouse, which then serves as the information retrieval component. The process is known as ETL, which stands for Extraction, Transformation, and Loading.
Loose Coupling - In this method, the data stays in the source databases. An interface takes a query from the user, transforms it into a form the source databases can understand, sends it directly to those databases, and returns the result.
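The two approaches can be contrasted in a minimal Python sketch. Everything in it is hypothetical: two in-memory lists stand in for a sales database and a CRM flat file, and the functions etl_load, warehouse_total, and mediator_query are illustrative names, not part of any real tool. etl_load copies and reshapes all the data into one warehouse table before any query is asked (tight coupling), while mediator_query leaves the data in place and translates each request into each source's own attribute names at query time (loose coupling).

```python
# Hypothetical sources with different schemas (illustrative data only).
SALES_DB = [
    {"cust_id": 1, "amount_usd": 120.0},
    {"cust_id": 2, "amount_usd": 80.0},
    {"cust_id": 1, "amount_usd": 30.0},
]
CRM_FILE = [
    {"customer_number": 1, "name": "Ada"},
    {"customer_number": 2, "name": "Lin"},
]


def etl_load():
    """Tight coupling: extract every source, transform rows to one schema,
    and load them into a single warehouse table ahead of any query."""
    names = {row["customer_number"]: row["name"] for row in CRM_FILE}
    warehouse = []
    for row in SALES_DB:
        warehouse.append({
            "customer_id": row["cust_id"],      # unify the key attribute name
            "name": names.get(row["cust_id"]),  # join in the CRM attribute
            "amount": row["amount_usd"],
        })
    return warehouse


def warehouse_total(warehouse, customer_id):
    """Answer a query from the warehouse copy only, never the sources."""
    return sum(r["amount"] for r in warehouse if r["customer_id"] == customer_id)


def mediator_query(customer_id):
    """Loose coupling: leave data in the sources and translate the query
    into each source's own attribute names at the moment it is asked."""
    amounts = [r["amount_usd"] for r in SALES_DB if r["cust_id"] == customer_id]
    names = [r["name"] for r in CRM_FILE if r["customer_number"] == customer_id]
    return {
        "customer_id": customer_id,
        "name": names[0] if names else None,
        "total_amount": sum(amounts),
    }


warehouse = etl_load()
print(warehouse_total(warehouse, 1))  # 150.0
print(mediator_query(1))
```

The trade-off shows up even at this scale: the warehouse answers from a copy that can go stale until the next load, while the mediator always reads current source data but pays the translation cost on every query.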
What are the Issues Of Data Integration in Data Mining?
There are three main issues to consider during data integration in data mining: schema integration, redundancy, and detection and resolution of data value conflicts.
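1. Schema Integration - Metadata from multiple sources must be integrated, and equivalent real-world entities must be matched across those sources. This is known as the entity identification problem: for example, deciding whether customer_id in one database and cust_number in another refer to the same attribute.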
2. Redundancy - An attribute is redundant if it can be derived from another attribute or set of attributes, such as annual income from monthly income. Inconsistently named attributes may also appear as duplicates in the resulting data set.
Some redundancies can be detected by correlation analysis, which measures how strongly one attribute implies another.
3. Detection and resolution of data value conflicts - This is the third critical issue in data integration. The attribute values collected from different sources may differ for the same real-world entity, for example because of differences in representation, scaling, or encoding, such as a weight recorded in kilograms in one system and in pounds in another. An attribute in one system may also be recorded at a lower level of generalisation than the “same” attribute in another.
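For the schema integration issue, a minimal Python sketch of entity identification might look like the following. It assumes a hand-built correspondence table (SCHEMA_MAP, hypothetical) that records which attribute in each source maps to which attribute in the unified schema, plus a deliberately naive matching rule in the hypothetical same_entity function. Real systems usually derive such mappings from metadata and use fuzzier matching.

```python
# Hypothetical attribute correspondences, worked out from each source's metadata.
SCHEMA_MAP = {
    "source_a": {"customer_id": "customer_id", "full_name": "name"},
    "source_b": {"cust_number": "customer_id", "CustName": "name"},
}


def to_unified(source, record):
    """Rename a record's attributes into the unified schema, dropping
    any attribute that has no known correspondence."""
    mapping = SCHEMA_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def same_entity(a, b):
    """Naive rule (an assumption): two unified records describe the same
    real-world entity when their keys match and their names agree after
    trimming whitespace and ignoring case."""
    return (
        a["customer_id"] == b["customer_id"]
        and a["name"].strip().casefold() == b["name"].strip().casefold()
    )


rec_a = to_unified("source_a", {"customer_id": 42, "full_name": "Ada Lovelace"})
rec_b = to_unified("source_b", {"cust_number": 42, "CustName": " ada lovelace "})
print(same_entity(rec_a, rec_b))  # True
```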
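The correlation analysis mentioned under redundancy can be sketched for numeric attributes with the Pearson correlation coefficient, where a value of r close to +1 or -1 suggests one attribute largely determines the other. This sketch covers numeric attributes only (nominal attributes are usually checked with a chi-square test instead), and the 0.95 threshold, the function names, and the sample columns are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The 1/n factors cancel in r, so raw sums of squares are enough.
    ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if ss_x == 0 or ss_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (ss_x * ss_y)


def redundant_pairs(columns, threshold=0.95):
    """Flag attribute pairs whose |r| meets the (assumed) threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


data = {
    "monthly_income": [2000, 3500, 2800, 5000],
    "annual_income": [24000, 42000, 33600, 60000],  # exactly 12 x monthly
    "age": [25, 52, 31, 40],
}
print(redundant_pairs(data))  # [('monthly_income', 'annual_income', 1.0)]
```

A high correlation is a signal to inspect the pair, not proof of redundancy; two attributes can correlate strongly and still carry distinct meaning.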
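For data value conflicts, one common resolution is to convert every source to a canonical unit and a common level of generalisation before comparing values. The Python sketch below assumes one hypothetical source stores weight in pounds and a city, while another stores kilograms and a province; CITY_TO_PROVINCE is a hypothetical lookup used to roll the finer-grained city value up to the coarser province level. The pound-to-kilogram factor is the exact international definition.

```python
KG_PER_LB = 0.45359237  # exact, by definition of the international pound

# Hypothetical lookup that rolls city-level values up to province level.
CITY_TO_PROVINCE = {"Toronto": "Ontario", "Vancouver": "British Columbia"}


def harmonize(record):
    """Bring a record to a common unit (kg) and a common level (province)."""
    out = dict(record)
    if "weight_lb" in out:  # this source records weight in pounds
        out["weight_kg"] = round(out.pop("weight_lb") * KG_PER_LB, 2)
    if "city" in out:  # city is a lower level of generalisation than province
        out["province"] = CITY_TO_PROVINCE[out.pop("city")]
    return out


a = harmonize({"id": 7, "weight_lb": 154.0, "city": "Toronto"})
b = harmonize({"id": 7, "weight_kg": 69.85, "province": "Ontario"})
print(a == b)  # True
```

Rolling up is one-way: once values are generalised to the province level, the city detail cannot be recovered, so the common level is usually chosen as the finest one that every source can support.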