Accession Number:

ADA603814

Title:

Arabic Natural Language Processing System Code Library

Descriptive Note:

Final rept. Oct 2013-Sep 2014

Corporate Author:

ARMY RESEARCH LAB ADELPHI MD COMPUTATIONAL AND INFORMATION SCIENCES DIRECTORATE

Personal Author(s):

Report Date:

2014-06-01

Pagination or Media Count:

14

Abstract:

This technical note briefly describes a Java library for Arabic natural language processing (NLP). The library contains code for training and applying the Arabic NLP system described in the paper "A Cross-Task Flexible Transition Model for Arabic Tokenization, Affix Detection, Affix Labeling, POS Tagging, and Dependency Parsing" by Stephen Tratz, presented at the Statistical Parsing of Morphologically Rich Languages (SPMRL) workshop, held in Seattle in conjunction with the Empirical Methods in Natural Language Processing (EMNLP) conference in October 2013. The system performs clitic separation, inflectional affix identification and labeling, part-of-speech tagging, and dependency parsing for Arabic. The code, which is extended from previously released graduate student code, also supports English part-of-speech tagging, dependency parsing, and semantic disambiguation tasks. The code library is expected to be of most value to natural language processing researchers.

Subject Categories:

  • Linguistics
  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE