Accession Number:

ADA439568

Title:

PAFI: A Pattern Finding Toolkit

Descriptive Note:

Technical rept.

Corporate Author:

ARMY HIGH PERFORMANCE COMPUTING RESEARCH CENTER MINNEAPOLIS MN

Report Date:

2003-07-07

Pagination or Media Count:

21

Abstract:

PAFI is a set of programs that can be used to find frequent patterns in large and diverse databases. The current release of PAFI includes three different pattern discovery programs called LPMiner, SLPMiner, and FSG. LPMiner finds patterns corresponding to itemsets in a transaction database and is based on the algorithm described. SLPMiner finds patterns corresponding to sub-sequences in a sequential database and is based on the algorithm described. Finally, FSG finds patterns corresponding to connected undirected subgraphs in an undirected graph database and is based on the algorithms described. These programs can be used to mine a wide range of datasets arising in commercial, information retrieval, and scientific applications. All three programs can be used to find patterns that satisfy a constant minimum support. Moreover, a key feature of LPMiner and SLPMiner is that they can find long frequent patterns without finding a large number of short patterns that are often useless. This is achieved by using length-decreasing support constraints, where the minimum occurrence frequency of a pattern is given as a non-increasing function of pattern length. PAFI's pattern discovery programs usually provide three additional functionalities. First, all three programs can generate maximal frequent patterns. A maximal frequent pattern is a frequent pattern that is not contained in any other frequent pattern. Generally, the number of maximal frequent patterns is much smaller than the number of all the frequent patterns, leading to higher readability of frequent pattern files. Second, SLPMiner and FSG can generate transaction-ID lists (TID-lists) indicating which sequences or graph transactions support a particular frequent pattern. Third, all three programs can generate parent-children lists (PC-lists) that can be used to construct the frequent pattern lattice.
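
The two ideas in the abstract that are easiest to misread, length-decreasing support constraints and maximal frequent patterns, can be illustrated on a toy itemset database. The sketch below is a brute-force illustration only and is not LPMiner's algorithm or PAFI's interface; the function names, the threshold function `min_support`, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def support_counts(transactions, max_len):
    """Count, by brute force, how many transactions contain each itemset
    of size 1..max_len. Practical only for tiny examples."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def min_support(length):
    """Hypothetical length-decreasing constraint: a non-increasing
    function of pattern length (3, 2, 2, ... occurrences)."""
    return max(2, 4 - length)


def frequent_patterns(transactions, max_len, threshold=min_support):
    """Keep itemsets whose support meets the threshold for their length."""
    counts = support_counts(transactions, max_len)
    return {p: c for p, c in counts.items() if c >= threshold(len(p))}


def maximal_patterns(frequent):
    """A frequent pattern is maximal if no other frequent pattern
    is a proper superset of it."""
    return {p for p in frequent if not any(p < q for q in frequent)}


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"d"},
]

freq = frequent_patterns(transactions, max_len=3)
print(sorted("".join(sorted(p)) for p in maximal_patterns(freq)))  # ['abc']
```

In this toy data the single item `c` occurs only twice and fails the length-1 threshold of 3, yet the longer pattern `{a, b, c}` passes its lower threshold of 2. This is the sense in which long patterns can be kept while infrequent short ones are dropped. Of the six frequent patterns, only `{a, b, c}` is maximal.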

Subject Categories:

  • Information Science
  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE