Developing a Common Interchange Model and Format for Representing Knowledge Synthesized from HLT Analytic Results
MITRE CORP MCLEAN VA
In the Human Language Technology (HLT) domain, analytic results extracted from raw document sources are captured in varied models and formats because of the depth of what can be revealed and the diversity of interpretation. However, a common model and format must be followed so that multiple analytics can operate together in workflows, enabling both communication between analytics and the fusion of parallel or complementary results. This data integration problem is exacerbated when the emphasis is on extracting knowledge from text, because the data model must be both adaptable and extensible to handle current and emerging content extraction capabilities and technologies. This paper describes a common interchange format and model designed to coordinate the information extracted from raw document sources in order to generate knowledge. The approach adheres to the principles of adaptability and extensibility. It also provides the means to represent the annotation data that act as the reference for the knowledge and to maintain provenance about these analytic results. While the data model and format described were designed for the HLT domain, the process used to develop them can be applied to other domains as well (e.g., image processing, signal processing).
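To make the ideas in the abstract concrete, the following is a minimal illustrative sketch of what an interchange record of this kind might look like. It is not the model or format defined in the paper: every class, field, and analytic name here (`Provenance`, `Annotation`, `Assertion`, `InterchangeDocument`, `ner-demo`, `relex-demo`) is a hypothetical chosen for illustration. It assumes only what the abstract states: annotations point back into the source text, knowledge refers to those annotations as evidence, each result carries provenance, and an open attribute map leaves room for extension.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class Provenance:
    # Hypothetical provenance record: which analytic produced a result.
    analytic: str
    version: str


@dataclass
class Annotation:
    # A span of the source text, identified by character offsets.
    id: str
    start: int
    end: int
    label: str
    provenance: Provenance
    # Open-ended map so new analytics can attach fields (extensibility).
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Assertion:
    # A piece of knowledge, backed by the annotations listed in `evidence`.
    subject: str
    predicate: str
    object: str
    evidence: List[str]
    provenance: Provenance


@dataclass
class InterchangeDocument:
    text: str
    annotations: Dict[str, Annotation] = field(default_factory=dict)
    assertions: List[Assertion] = field(default_factory=list)

    def add_annotation(self, ann: Annotation) -> None:
        # Reject spans outside the text and duplicate identifiers.
        if not (0 <= ann.start < ann.end <= len(self.text)):
            raise ValueError(f"span out of range: {ann.id}")
        if ann.id in self.annotations:
            raise ValueError(f"duplicate annotation id: {ann.id}")
        self.annotations[ann.id] = ann

    def add_assertion(self, assertion: Assertion) -> None:
        # Knowledge must reference annotations that already exist.
        missing = [i for i in assertion.evidence if i not in self.annotations]
        if missing:
            raise ValueError(f"unknown evidence ids: {missing}")
        self.assertions.append(assertion)

    def evidence_text(self, assertion: Assertion) -> List[str]:
        # Recover the source strings that support an assertion.
        return [
            self.text[self.annotations[i].start:self.annotations[i].end]
            for i in assertion.evidence
        ]

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Two hypothetical analytics contribute results to one shared document.
text = "Alice works in Paris."
doc = InterchangeDocument(text=text)
ner = Provenance(analytic="ner-demo", version="0.1")
relex = Provenance(analytic="relex-demo", version="0.1")

p0 = text.index("Alice")
l0 = text.index("Paris")
doc.add_annotation(Annotation("a1", p0, p0 + len("Alice"), "PERSON", ner))
doc.add_annotation(Annotation("a2", l0, l0 + len("Paris"), "LOCATION", ner,
                              attributes={"confidence": 0.9}))

fact = Assertion("a1", "works_in", "a2", evidence=["a1", "a2"], provenance=relex)
doc.add_assertion(fact)
```

In this sketch, fusion of results from different analytics amounts to adding them to the same document under distinct provenance records, and the check in `add_assertion` keeps every piece of knowledge tied to annotations that exist in the record. A real interchange format would need decisions the abstract does not describe, such as the identifier scheme and the serialization syntax.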