BBN: Description of the PLUM System as Used for MUC-3
BBN SYSTEMS AND TECHNOLOGIES CORP CAMBRIDGE MA
Traditional approaches to the problem of extracting data from texts have emphasized handcrafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of a DARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are more rapid development of new applications; the ability to train and retrain systems based on user markings of correct and incorrect output; more accurate selection among interpretations when more than one is found; and more robust partial interpretation when no complete interpretation can be found. We have previously performed experiments on components of the system with texts from the Wall Street Journal; however, the MUC-3 task is the first end-to-end application of PLUM. All components except parsing were developed in the last five months and therefore cannot be considered fully mature. The parsing component, the MIT Fast Parser [4], originated outside BBN and has a more extensive history prior to MUC-3. A central assumption of our approach is that in processing unrestricted text for data extraction, a non-trivial amount of the text will not be understood. As a result, all components of PLUM are designed to operate on partially understood input, taking advantage of information when it is available and not failing when it is unavailable.
- Information Science