Collection of Spontaneous Speech for the ATIS Domain and Comparative Analyses of Data Collected at MIT and TI
MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE
As part of our development of a spoken language system in the ATIS domain, we have begun a small-scale effort to collect spontaneous speech data. Our procedure differs in many respects from the one used at Texas Instruments (TI). The most important difference is that an existing system, rather than a wizard, participates in data collection. Over the past few months, we have collected over 3,600 spontaneously generated sentences from 100 subjects. This paper documents our data collection process and presents comparative analyses of our data and the data collected at TI. We also discuss the advantages and disadvantages of this method of data collection.
- Information Science