Analysis of a Table-Driven Algorithm for Fast Context-Free Parsing
Abstract:
Context-free grammars Chomsky, 1956 have been widely used in describing the syntax of programming and natural languages. Numerous algorithms have been developed to recognize sentences in languages so described. Some are general, in the sense that they can handle all or most context-free grammars others are more restricted and can handle only a small subclass of these grammars including the grammars of most programming languages. These latter algorithms, e.g., the LL operator precedence, predictive, and LR parsing algorithms Aho and Ullman, 1972 are typically more efficient than the former because they take advantage of inherent features in the class of grammars they recognize. Most practical parsers analyze the syntax of their input in a single deterministic pass, without need of backup. Each symbol is examined only once, and, at the time it is examined, there is sufficient information available to make an necessary parsing decisions. In his famous paper, Knuth Knuth, 1965 established a family of context-free grammars known as LRk grammars and provided an effective test to determine, for a given positive integer k, whether a grammar belonged to the LR k class. The connection to practical parsers mentioned above is that an LR k grammar describes a language, all of whose sentences can be parsed in a single backup-free parse, if at most k symbols of look-ahead are available. Despite the wide coverage of LRk grammars, there are still many languages for which no LRk grammar exists. The example that sparked this work is ROSIE Kipps et al., 1987, a language for applications in artificial intelligence with a high-level English-like syntax.