Accession Number : ADA507147


Title : Domain Adaptation of Translation Models for Multilingual Applications


Descriptive Note : Doctoral thesis


Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE


Personal Author(s) : Rogati, Monica


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a507147.pdf


Report Date : Apr 2009


Pagination or Media Count : 128


Abstract : The performance of a statistical translation algorithm in the context of multilingual applications such as cross-lingual information retrieval (CLIR) and machine translation (MT) depends on the quality, quantity and proper domain matching of the training data. Traditionally, manual selection and customization of training resources have been the prevailing approach. In addition to being labor-intensive, this approach does not scale to the large quantity of heterogeneous resources that have recently become available, such as parallel text and bilingual thesauri in various domains. More importantly, manual customization does not offer a way to efficiently and effectively produce tailored translation models for a mixture of heterogeneous target documents in various domains, topics, languages and genres. Translation models trained on a general domain do not work well in technical domains; models trained on written documents are not appropriate for spoken dialogue; models trained on manual transcripts can be sub-optimal for translating noisy transcripts produced by a speech recognizer; finally, models trained on a mixture of topics are not optimal for any of the topic-specific documents. We seek to address this challenge by automatically adapting translation models (and, implicitly, parallel training resources) to specific target domains or sub-domains.


Descriptors : *MODELS , *MACHINE TRANSLATION , *ADAPTATION , *LANGUAGE , *ALGORITHMS , HETEROGENEITY , PARALLEL ORIENTATION , STATISTICAL PROCESSES , DESIGN CRITERIA , LANGUAGE TRANSLATION , RESOURCES , INFORMATION RETRIEVAL , TARGETS , THESES , TRAINING , OPTIMIZATION


Subject Categories : Linguistics , Numerical Mathematics


Distribution Statement : APPROVED FOR PUBLIC RELEASE