A Statistical Word-Level Translation Model for Comparable Corpora
MARYLAND UNIV COLLEGE PARK INST FOR ADVANCED COMPUTER STUDIES
In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach rests on the assumption that if two terms have close distributional profiles, the distributional profiles of their translations should also be close in a comparable corpus. We describe the proposed model and lay out a preliminary investigation on intralanguage comparable corpora. The preliminary results are 92% accurate, suggesting the feasibility of the model. The model needs further improvement and should be tested cross-linguistically before its significance can be assessed.
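The distributional assumption can be illustrated with a minimal sketch for the intralanguage setting. The abstract does not say how profiles are built or compared, so everything here is an assumption made for illustration: a profile is taken to be a bag of co-occurring words within a fixed window, closeness is measured by cosine similarity, and the names `profile`, `cosine`, and `best_match` are hypothetical. A cross-lingual version would also need some way to relate context words across languages, which this sketch does not attempt.

```python
from collections import Counter
from math import sqrt


def profile(term, sentences, window=2):
    """Count the words that co-occur with `term` within `window` tokens.

    Assumption: a 'distributional profile' is a simple co-occurrence
    count vector; the paper may define it differently.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def best_match(term, corpus_a, candidates, corpus_b, window=2):
    """Pick the candidate in corpus_b whose profile is closest to `term` in corpus_a."""
    source = profile(term, corpus_a, window)
    return max(candidates, key=lambda c: cosine(source, profile(c, corpus_b, window)))
```

Each corpus is a list of tokenized sentences. Calling `best_match("bank", corpus_a, ["bank", "river"], corpus_b)` returns whichever candidate has the most similar contexts in the second corpus, which is one simple way to read "close distributional profiles".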