BBN: Description of the PLUM System as Used for MUC-4

Descriptive Note:

Conference paper

Traditional approaches to the problem of extracting data from texts have emphasized hand-crafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of a DARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are more rapid development of new applications, the ability to train and retrain systems based on user markings of correct and incorrect output, more accurate selection among interpretations when more than one is found, and more robust partial interpretation when no complete interpretation can be found. A central assumption of our approach is that, in processing unrestricted text for data extraction, a non-trivial amount of the text will not be understood. As a result, all components of PLUM are designed to operate on partially understood input, taking advantage of information when it is available and not failing when it is not. We had previously performed experiments on components of the system with texts from the Wall Street Journal; however, the MUC-3 task was the first end-to-end application of PLUM. Very little hand-tuning of knowledge bases was done for MUC-4, and since MUC-3 the system architecture, as depicted in Figure 1, has remained essentially the same. Since MUC-3, in addition to participating in MUC-4, we have focused on porting the system to new domains and a new language, and on performing experiments designed to control recall/precision tradeoffs. To support these goals, the preprocessing component and the fragment combiner were made declarative; the semantics component was generalized to use probabilities on word senses; we expanded our treatment of reference; we enlarged the set of system parameters at all levels; and we created a new probabilistic classifier for text relevance that filters discourse events.
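The abstract does not say how the text-relevance classifier works. As a purely illustrative sketch, assuming a multinomial naive Bayes model over the word tokens of each discourse event (an assumption, not PLUM's documented method; the class `RelevanceFilter`, its methods, the threshold parameter, and the toy training data are all hypothetical), a probabilistic filter trained on user markings of relevant and irrelevant events might look like this:

```python
import math
from collections import Counter


class RelevanceFilter:
    """Hypothetical naive Bayes relevance filter for discourse events.

    Each event is assumed to be a list of word tokens; this representation
    is an illustrative assumption, not PLUM's actual design.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, events):
        # events: iterable of (tokens, is_relevant) pairs, e.g. from user markings
        for tokens, label in events:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_posterior_odds(self, tokens):
        # log P(relevant | tokens) - log P(irrelevant | tokens);
        # the shared evidence term cancels, so only priors and likelihoods remain.
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        score = {}
        for label in (True, False):
            prior = (self.doc_counts[label] + self.alpha) / (total_docs + 2 * self.alpha)
            s = math.log(prior)
            denom = sum(self.word_counts[label].values()) + self.alpha * v
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    s += math.log((self.word_counts[label][t] + self.alpha) / denom)
            score[label] = s
        return score[True] - score[False]

    def filter(self, events, threshold=0.0):
        # Keep events whose log-odds of relevance exceed the threshold;
        # raising the threshold trades recall for precision.
        return [tokens for tokens in events if self.log_posterior_odds(tokens) > threshold]


# Toy usage with invented training examples.
f = RelevanceFilter()
f.train([
    (["bomb", "exploded", "embassy"], True),
    (["guerrillas", "attacked", "village"], True),
    (["minister", "visited", "factory"], False),
    (["election", "results", "announced"], False),
])
print(f.filter([["bomb", "attacked", "village"], ["election", "announced"]]))
# prints [['bomb', 'attacked', 'village']]
```

The single `threshold` knob is one simple way such a filter could expose the kind of recall/precision tradeoff the abstract mentions; a real system would likely use richer features than bag-of-words counts.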

Subject Categories:

  • Information Science
  • Linguistics
  • Cybernetics
