Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization
AIR FORCE INSTITUTE OF TECHNOLOGY WRIGHT-PATTERSON AFB OH GRADUATE SCHOOL OF ENGINEERING AND MANAGEMENT
Statistical machine translation (SMT) is a method of translating from one natural language (NL) to another using statistical models generated from examples of the NLs. The quality of translation generated by SMT systems is competitive with that of other premier machine translation (MT) systems, and further improvements can be made. This thesis focuses on improving the quality of translation by re-ranking the n-best lists that are generated by modern phrase-based SMT systems. An n-best list represents the n most likely translations of a sentence. The research establishes upper and lower limits of the translation quality achievable through re-ranking. Three methods of generating an n-gram language model (LM) from the n-best lists are proposed. Applying the LMs to re-rank the n-best lists results in improvements of up to six percent in the Bilingual Evaluation Understudy (BLEU) score of the translation.
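The general idea of re-ranking an n-best list with an n-gram LM built from that same list can be illustrated with a minimal sketch. The abstract does not describe the three LM-generation methods, so the code below is not the thesis's procedure: it assumes a bigram model with add-one smoothing and length-normalized scoring, and the names `train_lm`, `log_prob`, and `rerank` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, padded with sentence markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train_lm(hypotheses, n=2):
    """Count n-grams and their contexts over all hypotheses in one n-best list."""
    ngram_counts = Counter()
    context_counts = Counter()
    vocab = {"</s>"}
    for hyp in hypotheses:
        tokens = hyp.split()
        vocab.update(tokens)
        for gram in ngrams(tokens, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts, len(vocab)


def log_prob(hyp, lm, n=2):
    """Length-normalized log-probability with add-one smoothing (an assumption)."""
    ngram_counts, context_counts, vocab_size = lm
    grams = ngrams(hyp.split(), n)
    total = 0.0
    for gram in grams:
        numerator = ngram_counts[gram] + 1
        denominator = context_counts[gram[:-1]] + vocab_size
        total += math.log(numerator / denominator)
    return total / len(grams)


def rerank(nbest, n=2):
    """Sort hypotheses so those sharing the most frequent n-grams come first."""
    lm = train_lm(nbest, n)
    return sorted(nbest, key=lambda h: log_prob(h, lm, n), reverse=True)


if __name__ == "__main__":
    nbest = ["c d", "a b", "a b", "a b"]
    for hyp in rerank(nbest):
        print(hyp)
```

In this sketch, hypotheses whose n-grams recur across the list are promoted. A practical system would more likely combine such an LM score with the decoder's own model scores rather than replace them.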