Accession Number:
ADA551452
Title:
Security Classification Using Automated Learning (SCALE): Optimizing Statistical Natural Language Processing Techniques to Assign Security Labels to Unstructured Text
Descriptive Note:
Technical memorandum
Corporate Author:
DEFENCE RESEARCH AND DEVELOPMENT CANADA OTTAWA (ONTARIO)
Report Date:
2010-12-01
Pagination or Media Count:
68
Abstract:
Automating the process of assigning security classifications to unstructured text would facilitate a transition to a data-centric architecture, one that promotes information sharing and in which all data in an organization are electronically labelled. In this document, we report the results of a series of experiments conducted to investigate the effectiveness of using statistical natural language processing and machine learning techniques to automatically assign security classifications to documents. We present guidelines for selecting parameters to maximize the accuracy of a machine learning algorithm's classification decisions for several well-defined collections of documents. We examine the significance of a document's topic and the effect of security policy changes on the ability of our system to automate classification; we include design recommendations to address both topic and policy considerations. Our classification techniques prove effective at assessing a document's sensitivity, achieving accuracies upwards of 80%.
Distribution Statement:
APPROVED FOR PUBLIC RELEASE