Implementation of Parallel Algorithms
Abstract:
Randomization has been demonstrably useful both in simplifying and in improving the efficiency of sorting algorithms on actual parallel machines. Flashsort RV83,86 and Samplesort HC83 are related parallel sorting algorithms that have been proposed in the literature. Both utilize a sophisticated randomized sub-sampling technique with over-sampling to form a splitter set, but Samplesort differs from Flashsort in that it distributes all the splitters to each processor while Flashsort uses splitter-directed routing. In this paper we present B-Flashsort, a new batched-routing variant of Flashsort designed to sort N values using P processors with N P. Like the randomized sorts to which it is related, B-Flashsort performs NP log N parallel comparisons. The algorithm can be implemented on a wide variety of networks, but in this paper we concentrate on its implementation on a d-dimensional toroidal mesh. The algorithm requires only constant space in addition to the input and output.