CRL/NMSU and Brandeis: Description of the MucBruce System as Used for MUC-4
NEW MEXICO STATE UNIV LAS CRUCES COMPUTING RESEARCH LAB
Through their involvement in the Tipster project, the Computing Research Laboratory at New Mexico State University and the Computer Science Department at Brandeis University are developing a method for identifying articles of interest and for extracting and storing specific kinds of information from large volumes of Japanese and English texts. We intend the method to be general and extensible: the techniques involved are not explicitly tied to these two languages or to a particular subject area. Development for Tipster has been under way since September 1992, and the system we used for the MUC-4 tests implements only some of the features we plan to include in our final Tipster system. It relies heavily on statistics and on context-free text marking to generate templates. Some more detailed parsing has been added for a limited lexicon, but the lack of fuller coverage places an inherent limit on its performance. Most of the information in our MUC templates is obtained by probing the text surrounding words that are significant for the template type being generated, in order to find appropriately tagged fillers for the template fields.
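The general idea of filling template fields by probing the text around significant words can be illustrated with a minimal, hypothetical sketch. It assumes that an earlier marking pass has already attached a tag to every token; the trigger words, tag names, field names, window size, and the nearest-filler-wins rule are all invented for illustration and are not taken from the MucBruce system.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    tag: str  # hypothetical tag assigned by an earlier text-marking pass


# Hypothetical trigger words signalling one template type (e.g. a bombing).
TRIGGERS = {"bomb", "exploded", "explosion"}

# Hypothetical mapping from template field to the tag its filler must carry.
FIELD_TAGS = {
    "PERPETRATOR": "ORG",
    "TARGET": "BUILDING",
    "LOCATION": "PLACE",
}


def fill_template(tokens, triggers=TRIGGERS, field_tags=FIELD_TAGS, window=5):
    """Probe a fixed window around each trigger word for tagged fillers.

    For each field, the tagged token closest to any trigger wins; a field
    with no matching token inside any window is left as None.
    """
    template = {field: None for field in field_tags}
    best_dist = {field: None for field in field_tags}
    for i, tok in enumerate(tokens):
        if tok.text.lower() not in triggers:
            continue
        # Window of up to `window` tokens on each side of the trigger.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            for field, tag in field_tags.items():
                if tokens[j].tag != tag:
                    continue
                dist = abs(j - i)
                if best_dist[field] is None or dist < best_dist[field]:
                    template[field] = tokens[j].text
                    best_dist[field] = dist
    return template


sentence = [
    Token("Guerrillas", "ORG"),
    Token("exploded", "O"),
    Token("a", "O"),
    Token("bomb", "O"),
    Token("at", "O"),
    Token("the", "O"),
    Token("embassy", "BUILDING"),
    Token("in", "O"),
    Token("Lima", "PLACE"),
]
print(fill_template(sentence))
# {'PERPETRATOR': 'Guerrillas', 'TARGET': 'embassy', 'LOCATION': 'Lima'}
```

The fixed window keeps the search shallow and parser-free, which mirrors the trade-off described above: it needs no full lexicon, but a filler lying outside every window, or carrying the wrong tag, is simply missed.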
- Information Science