A Transformational Approach to High Performance Embedded Computing
COLORADO STATE UNIV FORT COLLINS
This paper describes a transformational, high-level language approach to High Performance Embedded Computing on the SRC-6 machine and its MAP reconfigurable hardware. A program is first written in pure C and compiled by the MAP C compiler. Then, guided by feedback from the compiler, the programmer manually transforms the program in successive steps to improve performance. These transformations remove specific inefficiencies: re-reading values from memory, loop slowdown caused by loop-carried dependencies, and underutilized memory bandwidth. We discuss the transformations in the context of the Wavelet Versatility Benchmark and the Gauss-Seidel iterative linear equation solver. FPGAs use a large number of pins to connect to memories. They have no caches, but they do have on-chip block RAM, which lets the programmer decide what data stays on chip. In addition, fine-grained operation-level parallelism combined with pipelining allows an FPGA to execute an inner loop body in one clock cycle. These characteristics yield a simple, deterministic performance model and give the programmer a well-defined goal: store hot data structures on chip, in either block RAM or registers; create inner loop bodies that execute in one clock cycle; and use the full memory bandwidth of the machine through loop unrolling.
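As an illustration of the first transformation named in the abstract (avoiding re-reads of values from memory), the following is a minimal sketch in plain, portable C. It is not MAP C and is not taken from the paper: the three-tap filter and the function names smooth3_naive and smooth3_window are hypothetical, chosen only to show the idea. The baseline reads each input element up to three times. The transformed version keeps a small sliding window in local variables, which a hardware compiler could plausibly map to registers, so each element is read once.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical example): every iteration reads three neighbouring
   inputs from memory, so each input element is fetched up to three times. */
void smooth3_naive(const int *in, int *out, size_t n)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = in[i - 1] + 2 * in[i] + in[i + 1];
}

/* Transformed: the two most recent inputs are carried in local variables,
   so the loop body performs exactly one memory read per iteration. */
void smooth3_window(const int *in, int *out, size_t n)
{
    assert(n >= 3);
    int left = in[0];
    int mid = in[1];
    for (size_t i = 1; i + 1 < n; i++) {
        int right = in[i + 1];          /* the only read from in[] */
        out[i] = left + 2 * mid + right;
        left = mid;                     /* shift the window by one element */
        mid = right;
    }
}
```

Both functions write the same values to out[1] through out[n-2] and leave the two boundary elements untouched. The window shift is a loop-carried value, but it is a plain register move rather than a multi-cycle computation, so it should not by itself prevent a one-cycle loop body.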
- Computer Programming and Software
- Theoretical Mathematics