Building Information Servers
Final rept. 30 Sep 93-29 Sep 97
UNIVERSITY OF SOUTHERN CALIFORNIA MARINA DEL REY INFORMATION SCIENCES INST
This research addressed the problem of determining the relationships among multiple, diverse information sources in order to support the integration of data from these sources. In general, integrating data from multiple sources requires a model of the precise relationships between the sources. Constructing such a model by hand is a difficult and time-consuming process. The relationships captured in a model describe the type of overlap between data instances in different sources. In this work, data mining techniques were used to determine these relationships by comparing the data instances between sources. A related problem is that data instances can exist in different formats across several sources, e.g., International Business Machines may be abbreviated as IBM in one source and appear under its full name in another. This work addressed this problem by developing techniques for automatically determining the mapping between names used in different sources. These integration techniques were used in conjunction with the SIMS information mediator, allowing SIMS to correctly and efficiently integrate data across several sources that contained data instances appearing in multiple formats.
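The two ideas in the abstract, mapping names across sources and classifying how the instances of two sources overlap, can be illustrated with a minimal Python sketch. It is not the method used in the report or in SIMS: the function names (`normalize`, `is_acronym_of`, `map_names`, `classify_overlap`), the acronym heuristic, and the five relationship labels are all assumptions chosen for illustration.

```python
import re


def normalize(name):
    """Lowercase a name, drop punctuation, and split it into tokens."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).split()


def is_acronym_of(short, long):
    """True if `short` spells the initials of the multi-word name `long`."""
    tokens = normalize(long)
    if len(tokens) < 2:  # a single word has no meaningful acronym
        return False
    return "".join(t[0] for t in tokens) == "".join(normalize(short))


def map_names(source_a, source_b):
    """Map each name in source_a to the first matching name in source_b.

    Two names match if they are equal after normalization or if one is
    the acronym of the other (an illustrative heuristic only).
    """
    mapping = {}
    for a in source_a:
        for b in source_b:
            if (normalize(a) == normalize(b)
                    or is_acronym_of(a, b)
                    or is_acronym_of(b, a)):
                mapping[a] = b
                break
    return mapping


def classify_overlap(a, b):
    """Label the set relationship between the instances of two sources."""
    a, b = set(a), set(b)
    if a == b:
        return "equivalent"
    if a < b:
        return "subset"
    if a > b:
        return "superset"
    if a & b:
        return "overlapping"
    return "disjoint"


# Hypothetical sources that use different formats for the same companies.
source_a = ["IBM", "Apple Computer"]
source_b = ["International Business Machines", "apple computer", "Xerox"]

mapping = map_names(source_a, source_b)
# Translate source_a into source_b's vocabulary before comparing instances.
translated = [mapping.get(name, name) for name in source_a]
relationship = classify_overlap(translated, source_b)
```

In this sketch the names are mapped first, so that the instance comparison sees `IBM` and `International Business Machines` as the same instance. Without that step the two example sources would be judged to share no data at all.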
- Computer Systems
- Information Science