MITRE: Description of the Alembic System Used for MUC-6
MITRE CORP BEDFORD MA
As with several other veteran MUC participants, MITRE's Alembic system has undergone a major transformation in the past two years. The transformation had its genesis in a dinner conversation at the last MUC conference, MUC-5. At that time, several of us reluctantly admitted that our major impediment to improved performance was our reliance on then-standard linguistic models of syntax. We knew we would need an alternative to traditional linguistic grammars, and even to the somewhat non-traditional categorial pseudo-parser we had in place at the time. The problem was: which alternative? The answer came in the form of rule sequences, an approach Eric Brill originally laid out in his work on part-of-speech tagging [5, 7]. Rule sequences now underlie all the major processing steps in Alembic: part-of-speech tagging, syntactic analysis, inference, and even some of the set-fill processing in the Template Element (TE) task. We have found this approach to provide almost an embarrassment of advantages, with speed and accuracy being the most externally visible benefits. In addition, most of our rule sequence processors are trainable, typically from small samples. The rules acquired in this way also make it easy to mix hand-crafted and machine-learned elements. We have exploited this opportunity to apply both machine-learned and hand-crafted rules extensively, choosing in some instances to run sequences that were primarily machine-learned, and in other cases to run sequences that were entirely crafted by hand.
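To make the idea of a rule sequence concrete, the following is a minimal sketch of Brill-style tagging in Python. It is not Alembic's implementation: the names `Rule` and `apply_rule_sequence`, the tag set, and the example rules are all hypothetical, and the only context a rule may test here is the tag of the preceding word, which is a simplification of the rule templates used in Brill's work. Each rule rewrites one tag into another when its context matches, and the rules are applied strictly in order, each one sweeping the whole sentence before the next begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical transformation rule: change from_tag to to_tag
    when the preceding word carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule_sequence(tags, rules):
    """Apply each rule in order; every rule sweeps the whole sentence
    left to right before the next rule runs. The input is not mutated."""
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


# Assumed initial tagging (e.g. each word's most frequent tag) for
# the phrase "to book the run"; both "book" and "run" start out wrong.
initial = ["TO", "NN", "DT", "VB"]

# The sequence may mix rules written by hand with rules learned from
# data; the applier does not need to know which is which.
rules = [
    Rule("NN", "VB", "TO"),  # a noun after "to" becomes a verb
    Rule("VB", "NN", "DT"),  # a verb after a determiner becomes a noun
]

result = apply_rule_sequence(initial, rules)
print(result)
```

Because a rule sequence is just an ordered list, a hand-crafted rule can be inserted before or after learned ones without changing the code that applies them, which is one way to read the mixing of hand-crafted and machine-learned elements described above.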
- Information Science