Accession Number : ADA110052


Title : Duplicate Record Elimination in Large Data Files.


Descriptive Note : Technical rept.


Corporate Author : WISCONSIN UNIV-MADISON DEPT OF COMPUTER SCIENCES


Personal Author(s) : Friedland,Dina ; DeWitt,David J


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a110052.pdf


Report Date : Aug 1981


Pagination or Media Count : 29


Abstract : This paper addresses the issue of duplicate elimination in large data files in which many occurrences of the same record may appear. A comprehensive cost analysis of the duplicate elimination operation is presented. This analysis is based on a combinatorial model developed for estimating the size of intermediate runs produced by a modified merge-sort procedure. The performance of this merge-sort procedure is demonstrated to be significantly superior to the standard duplicate elimination technique of sorting followed by a sequential pass to locate duplicate records. The results can also be used to provide critical input to a query optimizer in a relational database system. (Author)
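
Illustrative note : The abstract contrasts two approaches: sorting followed by a sequential pass, and a modified merge-sort that discards duplicates as runs are merged, so intermediate runs shrink. The following is a minimal in-memory sketch of that contrast, not the report's actual procedure or cost model. The function names (merge_dedup, dedup_merge_sort, sort_then_scan) are hypothetical, and a two-way merge starting from single-record runs is an assumption made for brevity.

```python
from typing import List, Sequence


def merge_dedup(left: Sequence, right: Sequence) -> List:
    """Merge two sorted, duplicate-free runs into one sorted, duplicate-free run."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # Equal records: keep one copy and advance both runs,
            # so the duplicate never reaches the next merge phase.
            out.append(left[i])
            i += 1
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def dedup_merge_sort(records: Sequence) -> List:
    """Bottom-up two-way merge sort that drops duplicates during every merge."""
    runs = [[r] for r in records]  # single-record runs are trivially duplicate-free
    if not runs:
        return []
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_dedup(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])  # odd run out is carried to the next phase
        runs = merged
    return runs[0]


def sort_then_scan(records: Sequence) -> List:
    """Baseline: sort everything, then remove duplicates in one sequential pass."""
    out = []
    for r in sorted(records):
        if not out or out[-1] != r:
            out.append(r)
    return out
```

Both functions return the same result for any input of mutually comparable records. The difference the abstract points to is in how much data each merge phase must handle: in the sketch, every call to merge_dedup can return a run shorter than the combined length of its two inputs, whereas the baseline carries every duplicate through the entire sort.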


Descriptors : *DATA MANAGEMENT , *COMPUTER FILES , DATA BASES , REMOVAL , OPTIMIZATION , COST ANALYSIS , SEMANTICS , INPUT OUTPUT PROCESSING , FORMATS , DATA STORAGE SYSTEMS , SORTING , COMBINATORIAL ANALYSIS , REPLICAS


Subject Categories : Information Science
      Computer Programming and Software
      Computer Hardware


Distribution Statement : APPROVED FOR PUBLIC RELEASE