Accession Number:

ADA110052

Title:

Duplicate Record Elimination in Large Data Files.

Descriptive Note:

Technical rept.

Corporate Author:

WISCONSIN UNIV-MADISON DEPT OF COMPUTER SCIENCES

Personal Author(s):

Report Date:

1981-08-01

Pagination or Media Count:

29

Abstract:

This paper addresses the issue of duplicate elimination in large data files in which many occurrences of the same record may appear. A comprehensive cost analysis of the duplicate elimination operation is presented. This analysis is based on a combinatorial model developed for estimating the size of intermediate runs produced by a modified merge-sort procedure. The performance of this merge-sort procedure is demonstrated to be significantly superior to the standard duplicate elimination technique of sorting followed by a sequential pass to locate duplicate records. The results can also be used to provide critical input to a query optimizer in a relational database system. (Author)
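The contrast drawn in the abstract can be illustrated with a minimal in-memory sketch. This record does not give the report's actual procedure, so the code below only assumes the general idea: duplicates are discarded as runs are merged, so intermediate runs shrink, rather than being removed in a final pass over fully sorted data. The function names (merge_unique, dedup_merge_sort, sort_then_scan) are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def merge_unique(left: Sequence[T], right: Sequence[T]) -> List[T]:
    """Merge two sorted, duplicate-free runs, keeping one copy of shared records."""
    out: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # Same record in both runs: emit it once and advance both cursors.
            out.append(left[i])
            i += 1
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def dedup_merge_sort(records: Sequence[T]) -> List[T]:
    """Assumed 'modified merge-sort': duplicates are dropped during every merge."""
    runs: List[List[T]] = [[r] for r in records]  # single-record initial runs
    if not runs:
        return []
    while len(runs) > 1:
        merged: List[List[T]] = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_unique(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run carries over unchanged
        runs = merged
    return runs[0]


def sort_then_scan(records: Sequence[T]) -> List[T]:
    """Baseline from the abstract: full sort, then one sequential pass."""
    out: List[T] = []
    for r in sorted(records):
        if not out or out[-1] != r:
            out.append(r)
    return out
```

Both functions return the same duplicate-free sorted output. The difference the abstract points to is in the work done along the way: in the first, each merge pass can produce runs shorter than its inputs, while the baseline carries every duplicate through the entire sort.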

Subject Categories:

  • Information Science
  • Computer Programming and Software
  • Computer Hardware

Distribution Statement:

APPROVED FOR PUBLIC RELEASE