Detecting the Difficulty Level of Foreign Language Texts
Abstract:
This report describes experiments on automatically determining the difficulty level of foreign language materials, with the aim of helping teachers, students, and DoD linguists find suitable materials for language learning and sustainment. Difficulty is measured on the Interagency Language Roundtable (ILR) proficiency scale, which is used to measure the proficiency of DoD linguists in listening, reading, speaking, writing, translating, and interpreting. The experiments used a corpus of authentic Arabic and Mandarin Chinese materials from several genres that were hand-labeled for ILR level. The corpus contained materials at the 2, 2+, and 3 levels. ILR level detectors were built for these levels for both the original Arabic and Mandarin sources and for human-produced English translations of these sources. The detectors were based on statistical language modeling techniques. The equal error rates (EERs) obtained ranged from 12.4% to 49.4%, depending on the language, ILR level, language model order, and various other factors related to the experimental design. In general, performance was best for discriminating level 3 materials from level 2 and 2+ materials, with EERs ranging from 12.4% to 33.3% across the languages and translations, language model level, and experimental design. Performance was worst for discriminating level 2+ materials from level 2 and 3 materials, with EERs ranging from 31.2% to 49.4%.
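For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be computed from detector scores. It is an illustration only and not the procedure used in these experiments: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all hypothetical. It assumes that a higher score means the detector considers a document more likely to belong to the target ILR level.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (in percent) by sweeping a decision threshold.

    target_scores: detector scores for documents at the target level.
    nontarget_scores: detector scores for documents at other levels.
    Assumption: higher score = more likely to be the target level.
    """
    # Candidate thresholds are the observed scores themselves.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # Miss: a target document scores below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target document scores at or above it.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (miss + fa) / 2
    return 100.0 * eer


# Toy scores (made up for illustration, not data from the report).
level3_scores = [0.9, 0.8, 0.7, 0.3]
other_scores = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(level3_scores, other_scores))
```

With a finite set of scores the miss and false-alarm rates may never be exactly equal, so this sketch reports the average of the two at the threshold where they are closest. An EER near 50% means the detector does little better than chance at separating the two groups.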