Accession Number:



Discovering Models of Software Processes from Event-Based Data

Descriptive Note:

Technical rept.

Corporate Author:


Report Date:


Pagination or Media Count:



Many software process methods and tools presuppose the existence of a formal model of a process. Unfortunately, developing a formal model for an ongoing, complex process can be difficult, costly, and error-prone. This presents a practical barrier to the adoption of process technologies, which would be lowered by automated assistance in creating formal models. To this end, the authors have developed a data analysis technique that they term process discovery. Under this technique, data describing process events are first captured from an ongoing process and then used to generate a formal model of the behavior of that process. In this paper, the authors describe a Markov method that they developed specifically for process discovery. They also describe two additional methods that they adopted from other domains and augmented for their purposes. The three methods range from the purely algorithmic to the purely statistical. The approach underlying the methods is to view the process discovery problem as one of grammar inference. In other words, the data describing the behavior of a process are viewed as sentences in some language; the grammar of that language is then the formal model of the process. Following an introduction, Section 2 of the paper discusses the framework in which the authors define and analyze event data. Section 3 gives a more complete statement of the discovery problem and outlines their grammar inference approach. Section 4 provides needed background on grammar inference. The discovery methods themselves are described in Section 5. Section 6 presents a comparative evaluation of the methods. Section 7 describes DaGama, the tool implementing the discovery methods. The application of the methods in an industrial case study is reviewed in Section 8. In Section 9, the authors present a summary of their results, an overview of related work, and a discussion of future work.
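To make the grammar-inference view concrete, the following is a minimal illustrative sketch and not the authors' Markov method or any part of DaGama. It assumes event streams are already available as lists of event-type names (the names `edit`, `compile`, `test`, and `checkin` are hypothetical), estimates first-order transition probabilities between event types, and keeps only transitions above an assumed probability threshold as candidate edges of a state-machine model.

```python
from collections import Counter, defaultdict


def transition_probabilities(event_streams):
    """Estimate first-order transition probabilities between event types."""
    counts = defaultdict(Counter)
    for stream in event_streams:
        # Count each adjacent pair (current event, following event).
        for current, following in zip(stream, stream[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def likely_transitions(probs, threshold):
    """Keep transitions whose estimated probability meets the threshold."""
    return sorted(
        (a, b)
        for a, followers in probs.items()
        for b, p in followers.items()
        if p >= threshold
    )


# Hypothetical event streams; real data would come from process instrumentation.
streams = [
    ["edit", "compile", "test", "edit", "compile", "test", "checkin"],
    ["edit", "compile", "edit", "compile", "test", "checkin"],
]
probs = transition_probabilities(streams)
# The 0.3 threshold is an assumption chosen only for this example.
edges = likely_transitions(probs, threshold=0.3)
```

With this sample data, the rare `compile` to `edit` transition falls below the threshold and is dropped, which illustrates how a statistical cutoff can separate dominant behavior from noise in the event stream.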

Subject Categories:

  • Computer Programming and Software
  • Cybernetics

Distribution Statement: