Accession Number : ADA506585
Title : Human Dimensions of Corpora Comparison: An Analysis of Kilgarriff's (2001) Approach
Descriptive Note : Technical rept.
Corporate Author : DEFENCE SCIENCE AND TECHNOLOGY ORGANISATION EDINBURGH (AUSTRALIA) COMMAND CONTROL COMMUNICATIONS AND INTELLIGENCE DIV
Personal Author(s) : Parsons, Kathryn ; McCormac, Agata ; Butavicius, Marcus
Report Date : Apr 2009
Pagination or Media Count : 62
Abstract : There is a distinct lack of tools that provide a comprehensive measure of the similarity between corpora. Finding similar corpora is necessary for the design of certain user studies investigating text processing. It is also useful for ensuring comparability between studies on document analysis conducted across classified and unclassified domains. In this study, human judgements of corpora similarity were obtained as a gold standard. These were then compared to the values provided by Kilgarriff's (2001) chi-square (χ²) statistic. The findings indicated a high level of agreement between the participants, with 77% shared variance in overall similarity judgements. The results of the χ² measure also correlated well with the human results, with a correlation of approximately 0.66. Although there are complexities associated with the χ² technique that need to be examined in further research, this study provides extremely promising results, suggesting that a statistical technique could provide results that are comparable to human judgements.
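Illustrative note : The chi-square comparison named in the abstract can be sketched roughly as follows. This is a simplified illustration in the spirit of Kilgarriff's (2001) measure, not the implementation used in the report. The function name `chi_square_distance`, the whitespace-tokenised input, and the default of 500 most frequent words are assumptions made for the example. Lower scores indicate more similar corpora.

```python
from collections import Counter


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Hypothetical helper: chi-square score over the top_n most frequent
    words of the joint corpus. Lower means more similar."""
    if not tokens_a or not tokens_b:
        raise ValueError("both corpora must contain at least one token")

    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b

    # Word frequencies pooled across both corpora.
    joint = freq_a + freq_b

    score = 0.0
    for word, joint_count in joint.most_common(top_n):
        # Expected counts if the word were spread in proportion to corpus size.
        expected_a = joint_count * size_a / total
        expected_b = joint_count * size_b / total
        score += (freq_a[word] - expected_a) ** 2 / expected_a
        score += (freq_b[word] - expected_b) ** 2 / expected_b
    return score


if __name__ == "__main__":
    corpus_a = "the cat sat on the mat".split()
    corpus_b = "the dog sat on the log".split()
    print(chi_square_distance(corpus_a, corpus_b))
```

The score depends on corpus size and on the number of words compared, so values are only meaningful when corpora are compared under the same settings.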
Descriptors : *INFORMATION RETRIEVAL , PERFORMANCE(HUMAN) , COMPARISON , TEXT PROCESSING , STATISTICAL ANALYSIS , AUSTRALIA , JUDGEMENT(PSYCHOLOGY) , ALGORITHMS
Subject Categories : Information Science
Distribution Statement : APPROVED FOR PUBLIC RELEASE