An Analysis of Instruction-Cached SIMD Computer Architecture
Abstract:
In a single instruction-stream, multiple data-stream (SIMD) computer, calculations are performed by simple processing elements (PEs) that are not independently capable of program-control operations. In lock-step, the PEs execute one program that is sequenced by a single system controller. Large numbers of these simple PEs are obtained through replication of a PE chip containing many identical PEs. A state-of-the-art SIMD computer is regulated by a single system clock that is distributed throughout the computer. On each system clock cycle, the system controller broadcasts the next instruction to be executed by the PEs. The system clock interval allows time to distribute a PE instruction throughout the computer, an action that typically requires more time than the minimum interval of a clock regulating the PEs themselves within the PE chips. The disparity between the highest rate of PE operation and the rate of global instruction broadcast gives rise to a heretofore uncompensated clock-rate limitation. To overcome this limitation, instruction-cached SIMD computer architecture provides for a small instruction buffer to be placed within the replicated PE chip. This buffer stores repeated instruction sequences for subsequent retrieval at the relatively high rate attainable within the PE chip. The instruction buffer and its control mechanism comprise a SIMD instruction cache, or I-cache.
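
The clock-rate argument above can be made concrete with a rough timing model. The sketch below is illustrative only and is not taken from the work itself: the function names, the parameters (`t_global` for the system clock interval, `t_local` for the on-chip PE clock interval, `cache_size` for the buffer capacity in instructions), and the assumption that a repeated sequence is broadcast once while being stored and then replayed from the buffer are all hypothetical simplifications.

```python
def broadcast_time(seq_len, repeats, t_global):
    """Time when every instruction of every repetition is broadcast
    by the system controller at the global clock interval."""
    return seq_len * repeats * t_global


def cached_time(seq_len, repeats, t_global, t_local, cache_size):
    """Time under a simplified instruction-cache model (an assumption,
    not the authors' exact mechanism): the first pass is broadcast at
    the global rate and stored on-chip; later passes are replayed from
    the buffer at the faster local rate. Assumes repeats >= 1."""
    if seq_len > cache_size:
        # The sequence does not fit in the buffer, so nothing is
        # replayed locally and every pass is broadcast.
        return broadcast_time(seq_len, repeats, t_global)
    first_pass = seq_len * t_global
    replays = seq_len * (repeats - 1) * t_local
    return first_pass + replays


def speedup(seq_len, repeats, t_global, t_local, cache_size):
    """Ratio of broadcast-only time to cached time."""
    return (broadcast_time(seq_len, repeats, t_global)
            / cached_time(seq_len, repeats, t_global, t_local, cache_size))


if __name__ == "__main__":
    # Hypothetical numbers: an 8-instruction sequence repeated 100 times,
    # global interval 4 time units, local interval 1, 16-entry buffer.
    print(broadcast_time(8, 100, 4))      # 3200
    print(cached_time(8, 100, 4, 1, 16))  # 824
```

Under these assumed numbers, the benefit grows with the number of repetitions and approaches the ratio `t_global / t_local` as the first broadcast pass is amortized. A sequence longer than the buffer gains nothing in this model.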