Harvest User's Manual
COLORADO UNIV AT BOULDER DEPT OF COMPUTER SCIENCE
HARVEST is an information discovery and access system [4]. It addresses three critical problems to help users reap the growing body of information accessible via the World Wide Web [2]. First, it provides an efficient and flexible means of indexing widely distributed information, to support resource discovery. Second, it provides network-adaptive means of caching and replicating heavily accessed information, to prevent bottlenecks. Third, it provides support for accessing and manipulating complex data. A key goal of Harvest is to provide a flexible system that can be configured in various ways to create many types of indexes, making very efficient use of Internet servers, network links, and index space on disk. Our measurements indicate that, when building indexes, Harvest can reduce server load by a factor of 6,600, network traffic by a factor of 59, and index space requirements by a factor of 43 compared with previous systems such as Archie, WAIS, and the World Wide Web Worm [3]. Harvest also allows users to extract structured attribute-value pair information from many different information formats and to build indexes that allow these attributes to be referenced (e.g., all documents whose title field matches a given regular expression).
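The attribute-based querying described above can be illustrated with a minimal sketch. It assumes each document has already been summarized into a flat dictionary of attribute-value pairs; the record layout, the function names `build_attribute_index` and `query`, and the example URLs are all hypothetical and do not reproduce Harvest's own summary format or index interface.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical summary record: one dict of attribute -> value per document,
# with the document's location stored under the "url" key.
Record = Dict[str, str]
AttributeIndex = Dict[str, List[Tuple[str, str]]]


def build_attribute_index(records: List[Record]) -> AttributeIndex:
    """Map each attribute name to the (url, value) pairs that carry it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for rec in records:
        url = rec["url"]
        for attr, value in rec.items():
            if attr != "url":
                index[attr].append((url, value))
    return dict(index)


def query(index: AttributeIndex, attribute: str, pattern: str) -> List[str]:
    """Return URLs whose value for `attribute` matches the regular expression."""
    regex = re.compile(pattern)
    return [url for url, value in index.get(attribute, []) if regex.search(value)]


records = [
    {"url": "http://example.org/a.ps", "title": "Caching in the Internet", "author": "A. Smith"},
    {"url": "http://example.org/b.txt", "title": "Resource Discovery Systems", "author": "B. Jones"},
    {"url": "http://example.org/c.html", "title": "Hierarchical Web Caching", "author": "C. Lee"},
]
index = build_attribute_index(records)
print(query(index, "title", r"[Cc]aching"))
# → ['http://example.org/a.ps', 'http://example.org/c.html']
```

Keeping a separate list per attribute means a query such as "title matches /[Cc]aching/" only scans title values rather than full document text, which is the kind of field-restricted search the abstract refers to.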
- Information Science
- Computer Programming and Software