The Search for Lost Cycles: A New Approach to Parallel Program Performance Evaluation
ROCHESTER UNIV NY DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
Traditional performance debugging and tuning of parallel programs is based on the measure-modify approach, in which detailed measurements of program executions are used to guide incremental changes to the program that result in better performance. Unfortunately, the performance of a parallel algorithm is often related to its implementation, input data, and machine characteristics in surprising ways, and the measure-modify approach is unsuited to exploring these relationships fully it is too heavily dependent on experimentation and measurement, which is impractical for studying the large number of variables that can affect parallel program performance. In this paper we argue that the problem of selecting the best implementation of a parallel algorithm requires a new approach to parallel program performance evaluation, one with a greater balance between measurement and modeling. We first present examples that demonstrate that different parallelizations of a program may be necessary to achieve the best possible performance as one varies the input data, machine architecture, or number of processors used. We then present an approach to performance evaluation based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We describe a measurement tool for lost cycles analysis that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1, and use this tool to analyze the performance tradeoffs among implementations of 2D FFT and parallel subgraph isomorphism.
- Computer Programming and Software