Implementing the Multiprefix Operation on Parallel and Vector Computers
CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE
Pagination or Media Count:
For an ordered set of n values, each with an associated integer label, the multiprefix operation calculates a partial sum for each value that is the sum of all preceding values with the same label. The multiprefix operation has been proposed as a parallel primitive because of its power for expressing many data parallel algorithms succinctly. However, most approaches to implementing this operation have used integer sorting to gather elements with the same label together or have suggested special hardware. In this paper we present a work efficient algorithm for the multiprefix operation on n elements that runs in S On parallel steps on a p n processor CRCW-ARB PRAM. The CRCW- ARB model ensures only that of multiple processors writing to the same location, an arbitrary one succeeds. We make use of this feature to resolve data dependencies in the first phase of the algorithm only so that all later steps guarantee EREW memory access. A fully vectorized version of our algorithm has been designed for the CRAY Y-MP and provides good performance for a number of important algorithms. For the integer sorting test of the NAS benchmarks, our multiprefix operation was used to create an algorithm that is competitive in performance with the current best algorithms for that machine. As another example, we show that by using the multiprefix operator for sparse-matrix dense- vector multiplication, we obtain performance exceeding traditional FORTRAN-based approaches. Finally, our algorithm also makes possible the simulation of a CRCW- PLUS PRAM on a p processor CRCW-ARB PRAM with only constant slowdown for problem sizes n p2.
- Computer Programming and Software