Accession Number:



Extending Generation and Evaluation and Metrics (GEM) to Grounded Natural Language Generation (NLG) Systems and Evaluating their Descriptive Texts Derived from Image Sequences

Descriptive Note:

[Technical Report]

Corporate Author:


Report Date:


Pagination or Media Count:



We present here, for consideration in a future Generation and Evaluation and Metrics (GEM) challenge, a graduated, task-based approach to evaluating grounded natural language generation (NLG) systems that generate descriptive texts derived from sequences of input images. We start by characterizing grounded NLG tasks that generate descriptive texts at increasing levels of complexity, then step through examples of these levels, with image sequences and facet targets as input and their derivative descriptive texts as output, drawn from our human-authored data set. To evaluate whether a grounded NLG system is good enough for users' needs, we first ask whether the user can recover the images the system used to derive descriptive texts at the relevant, graduated level of complexity. The texts judged adequate in this image-selection task are then analyzed for their semantic facet units (SFUs), which form the basis for scoring descriptive texts generated by other grounded NLG systems. Together, the image-selection task and SFU scoring constitute the evaluation we are piloting for grounded, data-to-text NLG systems.

Subject Categories:

  • Linguistics

Distribution Statement:

[A, Approved For Public Release]