Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors
Abstract:
The Physically-Constrained Iterative Deconvolution (PCID) image deblurring code is being ported to heterogeneous networks of multi-core systems. This paper reports results from experiments using the JAWS supercomputer at MHPCC and the Cell Cluster at AFRL in Rome, NY. The results compare approaches to parallelizing FFT executions across the Xeons and the Cell's Synergistic Processing Elements (SPEs) for frame-level image processing. Optimization of FFTs in the PCID code led to a decrease in the relative processing time for FFTs. A profile of PCID version 6.2, taken about one year ago, showed that the 13 functions accounting for the highest percentage of processing time were all FFT processing functions. Together they accounted for over 88% of processing time in one run on Xeons. FFT optimizations led to improvement in the current PCID version 8.0. A recent profile showed that only two of the 19 functions with the highest processing time were FFT processing functions. Timing measurements showed that FFT processing for PCID version 8.0 has been reduced to less than 19% of overall processing time. We are working toward a goal of scaling to 200-400 cores per job (1-2 imagery frames/core). Running a pair of cores on each set of frames assigned to a worker reduces latency through multithreaded FFT processing. These results support the next higher level of parallelism in PCID, where groups of frames, each producing one resolved image, are sent to cliques of cores in a round-robin fashion. We are fine-tuning the PCID parallelization strategy to balance processing over Xeons and Cell BEs and to find an optimal partitioning of PCID over the heterogeneous processors. Using a publication/subscription-oriented information management system to implement a unified communications platform makes runs on large HPCs with thousands of intercommunicating cores more flexible and more fault tolerant. Techniques for adapting the code to single precision are reported, along with performance results.
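As an illustration only, the round-robin distribution of frame groups to cliques of cores mentioned above could be sketched as follows. This is not the PCID scheduler: the function name `assign_round_robin`, its parameters, and the sizes in the example are hypothetical, and the sketch ignores the Xeon/Cell load-balancing question entirely.

```python
from collections import defaultdict


def assign_round_robin(num_frame_groups, num_cliques):
    """Map frame-group indices to clique indices in round-robin order.

    Hypothetical helper for illustration; each frame group is assumed
    to produce one resolved image and to be handled by a single clique.
    """
    if num_cliques <= 0:
        raise ValueError("num_cliques must be positive")
    assignment = defaultdict(list)
    for group in range(num_frame_groups):
        # Group g goes to clique g mod num_cliques.
        assignment[group % num_cliques].append(group)
    return dict(assignment)


if __name__ == "__main__":
    # Assumed sizes: 10 frame groups spread over 4 cliques of cores.
    for clique, groups in sorted(assign_round_robin(10, 4).items()):
        print(f"clique {clique}: frame groups {groups}")
```

A plain modulo assignment like this treats all cliques as equally fast; a heterogeneous mix of processors would likely need weights or dynamic dispatch instead.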