Cache-Based Architectures for High Performance Computing
Abstract:
Many researchers have noted that scientific codes perform poorly on computer architectures built around a cache-based memory hierarchy. Furthermore, a number of researchers and some vendors have concluded that simply making the caches larger would not solve this problem. As an alternative, some vendors of HPC systems have opted to equip their systems with fast memory interfaces, but with a limited amount of on-chip cache and no off-chip cache. Some RISC-based HPC systems, e.g., the Cray T3E, provide a prefetching or streaming facility that allows data to be streamed more efficiently between main memory and the processor. However, there are fundamental limitations on the benefits of these approaches that make it difficult to see how they will, by themselves, eliminate the Memory Wall. It has been shown that if one relies solely on this approach on the Cray T3E, one is unlikely to achieve much better than 4-6% of the machine's peak performance. Does this mean that, as the speed of RISC/CISC processors increases, systems designed to process scientific data are doomed to hit the Memory Wall? The answer to that question depends on the ability of programmers to find innovative ways to take advantage of caches. This report discusses some of the techniques that can be used to overcome this hurdle, allowing one to consider what types of hardware resources are required to support these techniques.
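The abstract appeals to programmer techniques for exploiting caches without spelling one out. As a minimal, illustrative sketch only (not taken from the report), the following C code shows cache blocking (loop tiling) of a matrix multiply, one widely used way to restructure a loop nest so that operands are reused while they are resident in cache; the matrix size N and tile size B are assumed values chosen for the example.

/* Illustrative sketch: cache blocking (loop tiling) for matrix multiply.
 * The tile size B would be tuned so that three B x B tiles fit in the
 * target cache; the values here are placeholders for the example. */
#include <stdio.h>

#define N 512   /* matrix dimension (assumed for the example) */
#define B 64    /* tile size (assumed; tune to the cache size) */

static double A[N][N], Bm[N][N], C[N][N];

static void matmul_blocked(void)
{
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                /* Work on one tile at a time so operands stay in cache. */
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++) {
                        double a = A[i][k];
                        for (int j = jj; j < jj + B; j++)
                            C[i][j] += a * Bm[k][j];
                    }
}

int main(void)
{
    /* Fill with simple values so the run does real work. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            Bm[i][j] = 2.0;
            C[i][j] = 0.0;
        }
    matmul_blocked();
    printf("C[0][0] = %f\n", C[0][0]);  /* expect 2.0 * N = 1024.0 */
    return 0;
}

The point of the restructuring is that each B x B tile of A, Bm, and C is touched many times in succession, so most accesses hit in cache instead of going to main memory; the untiled loop nest streams entire rows and columns and pays the memory-latency cost the abstract describes.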