Accession Number:

ADA458774

Title:

Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Descriptive Note:

Conference paper

Corporate Author:

MARYLAND UNIV COLLEGE PARK INST FOR ADVANCED COMPUTER STUDIES

Report Date:

2002-02-14

Pagination or Media Count:

11

Abstract:

Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and systematically identified. We describe our techniques for modifying English parse trees to form resulting sentences that share more similarity with the sentences in the other languages. Finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence-handling can improve word-level alignment.

Subject Categories:

  • Linguistics
  • Statistics and Probability

Distribution Statement:

APPROVED FOR PUBLIC RELEASE