CRL/Brandeis: The DIDEROT System
NEW MEXICO STATE UNIV LAS CRUCES COMPUTING RESEARCH LAB
Diderot is an information extraction system built at CRL and Brandeis University over the past two years. It was produced as part of our efforts in the Tipster project. The same overall system architecture has been used for English and Japanese and for the micro-electronics and joint venture domains. The history of the system is discussed and the operation of its major components is described. A summary of scores at the 24-month workshop is given. Because of the emphasis on different languages and different subject areas, the research has focused on the development of general-purpose, reusable techniques. The CRL/Brandeis group has implemented statistical methods for focusing on the relevant parts of texts, as well as programs that recognize and mark dates and the names of people, places, and organizations. The actual analysis of the critical parts of the texts is carried out by a parser controlled by lexical structures for the key words in the text. To extend the system's coverage of English and Japanese, some of the content of these lexical structures was derived from machine-readable dictionaries. These structures were then enhanced with information extracted from corpora.
- Information Science