Linguistic Resources for Speech Parsing
LINGUISTIC DATA CONSORTIUM PHILADELPHIA PA
We report on the success of a two-pass approach to annotating metadata, speech effects, and syntactic structure in English conversational speech. Transcribed speech was annotated separately for structural metadata, or structural events (fillers; speech repairs, or edit dysfluencies; and SUs, or syntactic/semantic units), and for syntactic structure (treebanking of constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of dysfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.
- Operations Research