BBN PLUM: MUC-4 Test Results and Analysis
BBN SYSTEMS AND TECHNOLOGIES CORP CAMBRIDGE MA
Our mid-term to long-term goals in data extraction from text for the next one to three years are to achieve much greater portability to new languages and new domains, greater robustness, and greater scalability. The novel aspect of our approach is the use of learning algorithms and probabilistic models to learn the domain-specific and language-specific knowledge necessary for a new domain and new language. Learning algorithms should contribute to scalability by making it feasible to deal with domains where it would be infeasible to invest sufficient human effort to bring a system up. Probabilistic models can contribute to robustness by allowing for words, constructions, and forms not anticipated ahead of time and by looking for the most likely interpretation in context. We began this research agenda approximately two years ago. During the last twelve months, we have focused much of our effort on porting our data extraction system, PLUM, to a new language (Japanese) and to two new domains. During the next twelve months, we anticipate porting PLUM to two or three additional domains. For any group, participating in MUC is a significant investment. To be consistent with our mid-term and long-term goals, we imposed the following constraints on ourselves in participating in MUC-4: we would focus our effort on semi-automatically acquired knowledge; we would minimize effort on handcrafted knowledge; and, most generally, we would minimize MUC-specific effort. Though the three self-imposed constraints meant our overall scores on the objective evaluation were not as high as if we had focused on handtuning and handcrafting the knowledge bases, MUC-4 became a vehicle for evaluating our progress on the long-term goals.
- Information Science