Preprocessing and Integration of Data from Multiple Sources for Knowledge Discovery
SPACE AND NAVAL WARFARE SYSTEMS CENTER SAN DIEGO CA
The explosive growth in the generation and collection of data has created an urgent need for new techniques and tools that can help transform these data intelligently and automatically into useful knowledge. Knowledge discovery is an emerging multidisciplinary field that attempts to fulfill this need. It is a multi-step process that includes data selection, cleaning, preprocessing, integration, transformation and reduction, data mining, model selection, evaluation and interpretation, and finally consolidation and use of the extracted knowledge. This paper addresses data cleaning and integration for knowledge discovery by proposing a systematic approach for resolving the semantic conflicts encountered when integrating data from multiple sources. Illustrated with examples derived from military databases, the paper presents a heuristics-based algorithm for identifying and resolving semantic conflicts at different levels of information granularity.
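To make the idea of a semantic conflict concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes two hypothetical sources that describe the same kind of entity with different field names (an attribute-level conflict) and different units (a value-level conflict). The field names, the synonym table, and the helper functions are all invented for illustration.

```python
# Hypothetical illustration only; this is NOT the paper's algorithm.

# Attribute-level heuristic: map synonymous field names to one canonical name.
ATTRIBUTE_SYNONYMS = {
    "vessel_name": "ship_name",
    "speed_kmh": "speed_knots",
}

# Value-level heuristic: convert a source unit to the canonical unit.
# 1 knot is defined as exactly 1.852 km/h.
UNIT_CONVERTERS = {
    "speed_kmh": lambda value: value / 1.852,
}


def normalize_record(record):
    """Return a copy of record with canonical field names and units."""
    normalized = {}
    for field, value in record.items():
        convert = UNIT_CONVERTERS.get(field)
        if convert is not None:
            value = convert(value)
        normalized[ATTRIBUTE_SYNONYMS.get(field, field)] = value
    return normalized


def integrate(*sources):
    """Normalize every record from every source into a single list."""
    return [normalize_record(r) for source in sources for r in source]


# Two invented sources that disagree on naming and units.
source_a = [{"ship_name": "Alpha", "speed_knots": 20.0}]
source_b = [{"vessel_name": "Bravo", "speed_kmh": 37.04}]
merged = integrate(source_a, source_b)
```

After integration, both records use the fields ship_name and speed_knots, so downstream data mining steps can treat them uniformly. A real system would presumably need richer heuristics than a fixed lookup table, for example to detect conflicts that are not known in advance.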
- Computer Systems