System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS)

Active / Technical Report | Accession Number: AD1165721

Abstract:

This report describes the technical approaches and results for the System for Cross-Language Information Processing, Translation and Summarization (SCRIPTS), funded under the IARPA MATERIAL program. SCRIPTS includes Automatic Speech Recognition (ASR) and Machine Translation (MT) components that preprocess the text and speech corpora provided as part of the program. It also includes a text processing component that performs morphological analysis. In user-facing mode, given a query, SCRIPTS Cross-Language Information Retrieval (CLIR) returns relevant documents, while Summarization generates textual summaries of each document to help an analyst confirm which documents returned by CLIR are actually relevant. Over the course of the program, the team implemented models for nine different languages: Somali, Swahili, Tagalog, Bulgarian, Lithuanian, Pashto, Farsi, Kazakh, and Georgian.

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution Code: A - Approved For Public Release
Distribution Statement: Public Release

RECORD

Collection: TRECMS
Identifying Numbers
Subject Terms