Accession Number:

ADA460578

Title:

Domain and Language Evaluation Results

Descriptive Note:

Corporate Author:

DEPARTMENT OF DEFENSE FORT GEORGE G MEADE MD

Personal Author(s):

Report Date:

1993-01-01

Pagination or Media Count:

9

Abstract:

The Fifth Message Understanding Conference (MUC-5) focused on the task of data extraction for two distinctly different applications, one within the domain of joint ventures (JV) and the other within the domain of microelectronics (ME). For each application, the task could be performed in English and/or Japanese, giving four combinations: English Joint Ventures, Japanese Joint Ventures, English Microelectronics, and Japanese Microelectronics. Interpreting the evaluation results across domains, and within a single domain between languages, is affected by a number of factors. Differences in task focus, complexity, and domain technicality make it impossible to apply inferential statistics between domains. In addition, even though the task and the template design were the same across languages within a single domain, differences in the types of text sources for each language, and accompanying variations in template fills and fill rules by language, also make it impossible to apply inferential statistics between the language pairs. Moreover, there is considerable variation in the participants' level of effort and funding, and not all of the participants worked in multiple languages and/or multiple domains. In light of these factors, I will present descriptive statistics comparing error per response fill to address the following questions: (1) For both languages, what is the performance difference between domains? (2) Between domains, what are the performance differences for the single shared object and for unattempted slots? (3) For both domains, what is the performance difference between languages? (4) For a single domain, what are representative differences at the object and slot levels between English and Japanese? The discussion of domain and language differences will center upon general factors that influence performance in information extraction.

Subject Categories:

  • Information Science
  • Linguistics
  • Electrical and Electronic Equipment
  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE