Accession Number:

ADA460258

Title:

A Unigram Orientation Model for Statistical Machine Translation

Descriptive Note:

Corporate Author:

IBM THOMAS J WATSON RESEARCH CENTER YORKTOWN HEIGHTS NY

Personal Author(s):

Report Date:

2004-01-01

Pagination or Media Count:

5.0

Abstract:

In this paper, we present a unigram segmentation model for statistical machine translation where the segmentation units are blocks: pairs of phrases without internal structure. The segmentation model uses a novel orientation component to handle swapping of neighbor blocks. During training, we collect block unigram counts with orientation: we count how often a block occurs to the left or to the right of some predecessor block. The orientation model is shown to improve translation performance over two models: (1) no block re-ordering is used, and (2) the block swapping is controlled only by a language model. We show experimental results on a standard Arabic-English translation task.
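The counting step described in the abstract can be illustrated with a short sketch. This is not the report's implementation. It rests on three assumptions that do not come from the record: a hypothetical `Block` record that holds a source/target phrase pair and the source span it covers; a simple rule under which a block is "right" (R) of its predecessor if its source span starts after the predecessor's span ends, and "left" (L, a swap) otherwise; and a relative-frequency estimate of p(orientation | block) computed from those counts.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Block(NamedTuple):
    # Hypothetical block representation (assumption): a phrase pair plus
    # the inclusive source-side span [src_start, src_end] it covers.
    src: str
    tgt: str
    src_start: int
    src_end: int


LEFT, RIGHT = "L", "R"


def orientation(prev: Block, cur: Block) -> str:
    """Assumed rule: 'R' if cur's source span lies right of prev's, else 'L' (swap)."""
    return RIGHT if cur.src_start > prev.src_end else LEFT


def count_orientations(sentences: Iterable[List[Block]]) -> Counter:
    """Count (block, orientation) pairs over block sequences given in target order.

    The first block of each sequence has no predecessor and is skipped in
    this sketch; how the report handles sentence-initial blocks is not stated.
    """
    counts: Counter = Counter()
    for blocks in sentences:
        for prev, cur in zip(blocks, blocks[1:]):
            counts[((cur.src, cur.tgt), orientation(prev, cur))] += 1
    return counts


def orientation_prob(counts: Counter, src: str, tgt: str, o: str) -> float:
    """Relative-frequency estimate of p(o | block); an assumed estimator."""
    key = (src, tgt)
    total = counts[(key, LEFT)] + counts[(key, RIGHT)]
    return counts[(key, o)] / total if total else 0.0
```

For example, suppose the block ("a b", "Y") follows a block covering source positions 2-3 in one sentence, so it is swapped to the left, and follows a block covering position 0 in another sentence, so it is in order. It then receives one L count and one R count, which gives an estimated p(L | block) of 0.5.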

Subject Categories:

  • Linguistics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE