Accession Number:

ADA595522

Title:

A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners

Descriptive Note:

Corporate Author:

MASSACHUSETTS INST OF TECH LEXINGTON LINCOLN LAB

Report Date:

2013-08-01

Pagination or Media Count:

10

Abstract:

In this paper, we introduce a new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale. We demonstrate that reading level assessment is a discriminative problem that is best suited to regression. Our baseline uses z-normalized shallow length features and TF-LOG-weighted bag-of-words vectors for Arabic, Dari, English, and Pashto. We compare Support Vector Machines and the Margin-Infused Relaxed Algorithm, evaluating both by mean squared error. We provide an analysis of which features are most predictive of a given level.
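
The feature pipeline named in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The two length features chosen here (average word length and average sentence length), the reading of TF-LOG as log(1 + term count), the whitespace tokenization, and all function names are assumptions made for illustration only; the abstract does not define them.

```python
import math
from collections import Counter


def length_features(text):
    """Shallow length features (assumed): average word length and average sentence length."""
    # Naive sentence split on terminal punctuation; real text needs a proper segmenter.
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    words = text.split()
    n_words = max(len(words), 1)      # guard against empty input
    n_sents = max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_sent_len = len(words) / n_sents
    return [avg_word_len, avg_sent_len]


def z_normalize(rows):
    """Z-normalize each feature column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def log_tf_vector(text):
    """Bag-of-words vector with log-scaled term frequency (assumed form: log(1 + count))."""
    counts = Counter(text.lower().split())
    return {word: math.log(1 + c) for word, c in counts.items()}


def mean_squared_error(y_true, y_pred):
    """Mean squared error between gold and predicted difficulty levels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under these assumptions, the z-normalized length features and the log-weighted term vector would be concatenated per document and passed to a regressor such as a support vector regressor, with predictions scored against gold ILR levels using mean_squared_error.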

Subject Categories:

  • Linguistics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE