Translating Collocations for Use in Bilingual Lexicons
COLUMBIA UNIV NEW YORK DEPT OF COMPUTER SCIENCE
Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages, automatically produces translations of an input list of collocations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages and domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations, in which n and p need not be the same; the collocations can be either flexible or fixed compounds. For example, Champollion translates "to make a decision", "employment equity", and "stock market" into "prendre une décision", "équité en matière d'emploi", and "bourse", respectively. Testing and evaluation of Champollion on one year's worth of the Hansards corpus yielded 300 collocations and their translations, evaluated at 77% accuracy. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.
- Statistics and Probability
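The abstract does not name the statistical measures involved, so the following is only an illustrative sketch of how a candidate translation might be scored against a source collocation over sentence-aligned text. It assumes the Dice coefficient as the association score, which is a common choice for this kind of task and not something the abstract confirms. The function names (`contains_all`, `dice_score`) and the four-sentence toy corpus are hypothetical and invented for illustration; they are not taken from the Hansards data.

```python
from typing import List, Sequence, Tuple

# Toy sentence-aligned corpus (invented for illustration, not Hansards data).
ALIGNED: List[Tuple[str, str]] = [
    ("we must make a decision today",
     "nous devons prendre une décision aujourd'hui"),
    ("the committee will make a final decision",
     "le comité va prendre une décision finale"),
    ("the decision was unpopular",
     "la décision était impopulaire"),
    ("the stock market fell",
     "la bourse a chuté"),
]


def contains_all(sentence: str, words: Sequence[str]) -> bool:
    """True if every word occurs somewhere in the sentence.

    Words need not be adjacent, which loosely models a flexible
    collocation such as "make ... decision".
    """
    tokens = set(sentence.lower().split())
    return all(w in tokens for w in words)


def dice_score(pairs: List[Tuple[str, str]],
               source: Sequence[str],
               target: Sequence[str]) -> float:
    """Dice coefficient 2*f(x,y) / (f(x) + f(y)) over aligned sentence pairs.

    f(x): pairs whose source side contains the source collocation.
    f(y): pairs whose target side contains the candidate translation.
    f(x,y): pairs where both hold.
    """
    f_x = f_y = f_xy = 0
    for src_sentence, tgt_sentence in pairs:
        in_src = contains_all(src_sentence, source)
        in_tgt = contains_all(tgt_sentence, target)
        f_x += in_src
        f_y += in_tgt
        f_xy += in_src and in_tgt
    if f_x + f_y == 0:
        return 0.0  # neither side ever occurs; treat as no association
    return 2.0 * f_xy / (f_x + f_y)


if __name__ == "__main__":
    source = ("make", "decision")
    for candidate in [("prendre", "décision"), ("décision",), ("bourse",)]:
        print(candidate, dice_score(ALIGNED, source, candidate))
```

On this toy data, the two-word candidate scores higher than the single word "décision" alone, because "décision" also appears in a sentence where the source collocation is absent. This illustrates why a translation of an n-word collocation may need p words with p not equal to n.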