University of Sheffield: Description of the LaSIE System as Used for MUC-6
Abstract:
The LaSIE (Large Scale Information Extraction) system has been developed at the University of Sheffield as part of an ongoing research effort into information extraction and, more generally, natural language engineering. LaSIE is a single, integrated system that builds up a unified model of a text, which is then used to produce outputs for all four of the MUC-6 tasks. Of course, this model may also be used for purposes other than generating MUC-6 results; for example, we currently generate natural language summaries of the MUC-6 scenario results. Put most broadly, and superficially, our approach involves compositionally constructing semantic representations of the individual sentences in a text according to semantic rules attached to phrase structure constituents, which are obtained by syntactic parsing with a corpus-derived context-free grammar. The semantic representations of successive sentences are then integrated into a discourse model which, once the entire text has been processed, may be viewed as a specialisation of the general world model with which the system sets out to process each text. LaSIE has a historical connection with the University of Sussex MUC-5 system [GCE93], from which it derives its approaches to world modelling and co-reference resolution, and to recombining the fragmented semantic representations that result from partial grammatical coverage. However, the parser and grammar differ significantly from those used in the Sussex system. For named entity identification, LaSIE borrows to some extent from the approach adopted in the MUC-5 Diderot system [CGJ93]. Virtually all of the code in LaSIE is new; it has been developed since January 1995 with about 20 person-months of effort.
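The pipeline summarised above (semantic rules attached to phrase structure rules, applied bottom-up over a parse tree, with each sentence's result then merged into a growing discourse model) can be illustrated with a toy sketch. Everything in it is hypothetical: the tag set, the rule table `RULES`, the predicate names (`name`, `lsubj`, `lobj`) and the `DiscourseModel` class are assumptions made for illustration, not LaSIE's grammar, semantics or code. Co-reference is reduced to exact proper-name matching, a deliberate simplification standing in for the world-model-based resolution described above, and the world model's class hierarchy is omitted entirely.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List, Optional, Tuple

Pred = Tuple[str, ...]        # e.g. ("name", "e1", "IBM")
Sem = Tuple[str, List[Pred]]  # (head variable, list of predicates)

_ids = count(1)

def new_var() -> str:
    """Fresh instance/event variable, e.g. 'e7'."""
    return f"e{next(_ids)}"

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaves (part-of-speech nodes)

def lexical(node: Node) -> Sem:
    """Hypothetical lexical semantics: proper nouns get a name, other words a one-place predicate."""
    v = new_var()
    if node.label == "NNP":
        return v, [("name", v, node.word)]
    return v, [(node.word.lower(), v)]

def vp_rule(kids: List[Sem]) -> Sem:
    """VP -> VBD NP: the verb's event takes the NP as its logical object."""
    (ev, vpreds), (obj, opreds) = kids
    return ev, vpreds + opreds + [("lobj", ev, obj)]

def s_rule(kids: List[Sem]) -> Sem:
    """S -> NP VP: the NP becomes the logical subject of the VP's event."""
    (subj, spreds), (ev, vpreds) = kids
    return ev, spreds + vpreds + [("lsubj", ev, subj)]

# Hypothetical semantic rules keyed by phrase-structure rule (parent, child labels).
RULES: Dict[Tuple[str, Tuple[str, ...]], Callable[[List[Sem]], Sem]] = {
    ("NP", ("NNP",)): lambda kids: kids[0],
    ("VP", ("VBD", "NP")): vp_rule,
    ("S", ("NP", "VP")): s_rule,
}

def interpret(node: Node) -> Sem:
    """Build a sentence's semantics compositionally, bottom-up over the parse tree."""
    if node.word is not None:
        return lexical(node)
    kids = [interpret(c) for c in node.children]
    rule = RULES.get((node.label, tuple(c.label for c in node.children)))
    if rule is None:
        # No rule covers this constituent: keep the fragments' predicates unconnected.
        return kids[0][0], [p for _, preds in kids for p in preds]
    return rule(kids)

class DiscourseModel:
    """Accumulates sentence semantics; same-named instances are merged (naive co-reference)."""

    def __init__(self) -> None:
        self.instances: Dict[str, set] = {}  # instance id -> facts about it
        self.names: Dict[str, str] = {}      # proper name -> canonical instance id

    def add_sentence(self, sem: Sem) -> None:
        _, preds = sem
        alias: Dict[str, str] = {}
        for p in preds:
            if p[0] == "name":
                # Reuse the instance already introduced under this name, if any.
                alias[p[1]] = self.names.setdefault(p[2], p[1])
        for p in preds:
            args = tuple(alias.get(a, a) for a in p[1:])
            self.instances.setdefault(args[0], set()).add((p[0],) + args)

def proper_np(name: str) -> Node:
    return Node("NP", [Node("NNP", word=name)])

def clause(subj: str, verb: str, obj: str) -> Node:
    return Node("S", [proper_np(subj), Node("VP", [Node("VBD", word=verb), proper_np(obj)])])

dm = DiscourseModel()
for tree in (clause("IBM", "hired", "Smith"), clause("Smith", "joined", "IBM")):
    dm.add_sentence(interpret(tree))
print(sorted(dm.names))  # ['IBM', 'Smith']
```

After both sentences the model holds one instance for "IBM" and one for "Smith", and the second sentence's event is linked to the same instances introduced by the first, which is the sense in which the discourse model accumulates across a text. Where no rule covers a constituent, the sketch simply keeps the unconnected fragments; recombining such fragments properly is a separate problem that this toy does not attempt.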