Accession Number : ADA506585


Title : Human Dimensions of Corpora Comparison: An Analysis of Kilgarriff's (2001) Approach


Descriptive Note : Technical rept.


Corporate Author : DEFENCE SCIENCE AND TECHNOLOGY ORGANISATION EDINBURGH (AUSTRALIA) COMMAND CONTROL COMMUNICATIONS AND INTELLIGENCE DIV


Personal Author(s) : Parsons, Kathryn ; McCormac, Agata ; Butavicius, Marcus


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a506585.pdf


Report Date : Apr 2009


Pagination or Media Count : 62


Abstract : There is a distinct lack of tools that provide a comprehensive measure of the similarity between corpora. Finding similar corpora is necessary for the design of certain user studies investigating text processing. It is also useful for ensuring comparability between studies on document analysis conducted across classified and unclassified domains. In this study, human judgements of corpora similarity were obtained as a gold standard. These were then compared to the values provided by Kilgarriff's (2001) chi-square (χ²) statistic. The findings indicated a high level of agreement between the participants, with 77% shared variance in overall similarity judgements. The results of the χ² measure also correlated well with the human results, with a correlation of approximately 0.66. Although there are complexities associated with the χ² technique that need to be examined in further research, this study provides extremely promising results, suggesting that a statistical technique could provide results that are comparable to human judgements.
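
For readers unfamiliar with the chi-square comparison named in the abstract, the following is a minimal sketch of one common formulation: the statistic is summed over the most frequent words of the two corpora combined, with expected counts proportional to corpus size. It is an illustration under stated assumptions, not the procedure used in the report. The function name `chi_square_distance`, the `top_n` default of 500, and the use of pre-tokenized word lists are all assumptions made here for the example.

```python
from collections import Counter


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Sum of chi-square terms over the top_n most frequent joint words.

    Hypothetical helper for illustration; lower values suggest that the
    two corpora have more similar word-frequency profiles.
    """
    size_a, size_b = len(tokens_a), len(tokens_b)
    if size_a == 0 or size_b == 0:
        raise ValueError("both corpora must contain at least one token")

    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    joint = count_a + count_b
    total = size_a + size_b

    score = 0.0
    for word, joint_count in joint.most_common(top_n):
        # Expected counts if the word were spread in proportion to corpus size.
        expected_a = joint_count * size_a / total
        expected_b = joint_count * size_b / total
        score += (count_a[word] - expected_a) ** 2 / expected_a
        score += (count_b[word] - expected_b) ** 2 / expected_b
    return score


if __name__ == "__main__":
    corpus_a = "the cat sat on the mat".split()
    corpus_b = "the dog sat on the log".split()
    print(chi_square_distance(corpus_a, corpus_b))
```

Because the raw value grows with corpus size and with the number of words considered, values from this sketch are only comparable between corpus pairs of similar size and with the same `top_n`.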


Descriptors : *INFORMATION RETRIEVAL , PERFORMANCE(HUMAN) , COMPARISON , TEXT PROCESSING , STATISTICAL ANALYSIS , AUSTRALIA , JUDGEMENT(PSYCHOLOGY) , ALGORITHMS


Subject Categories : Information Science


Distribution Statement : APPROVED FOR PUBLIC RELEASE