EXPERIMENTAL DESIGN FOR MEASURING THE INTRA- AND INTER-GROUP CONSISTENCY OF HUMAN JUDGMENT OF RELEVANCE.
GEORGIA INST OF TECH ATLANTA
The suspected variability of humans in judging the relevance of documents is one of the current problems confronting the development and improvement of document information and retrieval systems. The purpose of this thesis was to design a method to investigate the variation of relevance judgments between two groups of analysts and among the analysts within each group. A pilot experiment was conducted using two groups of analysts (subject experts and non-experts) and two question-document collections (machine-retrieved and randomly selected). Analysts were instructed to mark each document relevant or not relevant to the given question and to record the time required to make such relevance assessments. The responses were analyzed statistically. The data permitted the following conclusions: (1) the analysts within the groups could consistently agree on the relevance of documents to questions; (2) the degree of consistency of the two groups did not differ significantly; (3) the two groups did agree on the relevance of a particular document to a question; and (4) the method of document selection had a serious effect only on the consistency of the group of non-experts.
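The abstract does not say which statistics were applied to the responses. Purely as an illustration of what intra-group and inter-group consistency of binary relevance judgments can mean, the sketch below uses raw pairwise percent agreement within a group and agreement of majority verdicts between groups. These measures, the function names, and the toy judgments are all assumptions made for the example; they are not taken from the thesis and are not its data.

```python
from itertools import combinations


def pairwise_agreement(judgments):
    """Mean fraction of documents on which two analysts agree,
    averaged over every pair of analysts in one group.

    `judgments` holds one list per analyst; each entry is 1 (relevant)
    or 0 (not relevant) for the same ordered set of documents.
    """
    pairs = list(combinations(judgments, 2))
    if not pairs:
        raise ValueError("need at least two analysts")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("analysts must judge the same documents")
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def majority_vote(judgments):
    """Group verdict per document: 1 if more than half the analysts said relevant."""
    n = len(judgments)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*judgments)]


def between_group_agreement(group_a, group_b):
    """Fraction of documents on which the two groups' majority verdicts match."""
    verdict_a = majority_vote(group_a)
    verdict_b = majority_vote(group_b)
    return sum(x == y for x, y in zip(verdict_a, verdict_b)) / len(verdict_a)


# Hypothetical toy judgments (three analysts per group, four documents).
# These values are invented for illustration only.
experts = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
non_experts = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0]]

within_experts = pairwise_agreement(experts)
within_non_experts = pairwise_agreement(non_experts)
between_groups = between_group_agreement(experts, non_experts)
```

Raw percent agreement is the simplest choice and does not correct for agreement expected by chance; a chance-corrected coefficient or a formal significance test would be needed to support conclusions like those listed in the abstract.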