Accession Number:
ADA561948
Title:
A Method for Correcting Broken Hyphenations in Noisy English Text
Descriptive Note:
Final rept.
Corporate Author:
ARMY RESEARCH LAB ADELPHI MD COMPUTATIONAL AND INFORMATION SCIENCES DIRECTORATE/BATTLEFIELD ENVIRONMENT DIV
Report Date:
2012-04-01
Pagination or Media Count:
18.0
Abstract:
The problem of rejoining broken hyphenations in processed English text is addressed. A basic algorithm that makes use of a word validation step is developed. Results of running the algorithm over an English military training text are presented and analyzed. Precision and recall scores show that the algorithm works well for correcting broken hyphenations but fails when certain types of noise are encountered in the data.
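The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it names: rejoin a fragment ending in a hyphen with the fragment that follows, and accept the result only if it passes a word validation step. The `LEXICON` set, the `BROKEN` pattern, and the function `rejoin_hyphenations` are hypothetical names introduced for illustration and are not taken from the report.

```python
import re

# Hypothetical toy lexicon (an assumption); a real run would load a full English word list.
LEXICON = {"information", "algorithm", "training", "well-known"}

# A word fragment, a hyphen, whitespace (including a line break), then another fragment.
BROKEN = re.compile(r"([A-Za-z]+)-\s+([A-Za-z]+)")


def rejoin_hyphenations(text, lexicon=LEXICON):
    """Rejoin broken hyphenations whose repaired form passes word validation."""

    def fix(match):
        left, right = match.group(1), match.group(2)
        joined = left + right            # e.g. "infor- mation" -> "information"
        hyphenated = left + "-" + right  # e.g. "well- known" -> "well-known"
        if joined.lower() in lexicon:
            return joined
        if hyphenated.lower() in lexicon:
            return hyphenated
        # Validation failed: leave the original text untouched.
        return match.group(0)

    return BROKEN.sub(fix, text)
```

In this sketch a candidate that fails validation is left unchanged, so noise that does not resolve to a lexicon word is neither repaired nor further damaged.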
Distribution Statement:
APPROVED FOR PUBLIC RELEASE