SRI International: Description of the FASTUS System Used for MUC-4
SRI INTERNATIONAL MENLO PARK CA
FASTUS is a slightly permuted acronym for Finite State Automaton Text Understanding System. It is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. It is an information extraction system, rather than a text understanding system. This distinction is important. In information extraction, only a fraction of the text is relevant. In the case of the MUC-4 terrorist reports, probably only about 10% of the text is relevant. There is a pre-defined, relatively simple, rigid target representation that the information is mapped into. The subtle nuances of meaning and the writer's goals in writing the text are of no interest. This contrasts with text understanding, where the aim is to make sense of the entire text, where the target representation must accommodate the full complexities of language, and where we want to recognize the nuances of meaning and the writer's goals. The MUC evaluations are information extraction tasks, not text understanding tasks. The TACITUS system that was used for MUC-3 in 1991 is a text understanding system [1]. Using it for the information extraction task gave us high precision, the highest of any of the sites. However, our recall was mediocre, and the system was extremely slow. Our motivation in building the FASTUS system was to have a system that was more appropriate to the information extraction task.
- Information Science