Accession Number:

ADA333294

Title:

Lattice Based Language Models

Descriptive Note:

Corporate Author:

CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE

Personal Author(s):

Report Date:

1997-09-01

Pagination or Media Count:

29.0

Abstract:

This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed, and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experimental results are obtained on the SWITCHBOARD corpus, which lead to a 6.5 perplexity reduction over a word trigram model.

Subject Categories:

  • Operations Research
  • Linguistics
  • Cybernetics

Distribution Statement:

APPROVED FOR PUBLIC RELEASE