Accession Number:

ADA604304

Title:

Code Optimizers and Register Organizations for Vector Architectures

Descriptive Note:

Doctoral thesis

Corporate Author:

CALIFORNIA UNIV BERKELEY DEPT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES

Personal Author(s):

Report Date:

1992-05-01

Pagination or Media Count:

172

Abstract:

A major challenge facing computer architects today is designing cost-effective hardware that executes multiple operations simultaneously. The goal of such designs is to improve performance by taking advantage of fine-grain parallelism. In this dissertation, I study vector architectures, the oldest of several processor designs that support fine-grain parallelism. Because implementing a cost-effective processor that performs well requires studying not only the design of processors but also the design of algorithms for compilers, this dissertation encompasses aspects of both hardware and software design. In the first half of this dissertation, I demonstrate that a vector architecture is a cost-effective processor that supports fine-grain parallelism. I show that implementing a vector architecture is no more costly than implementing a superscalar architecture, which is currently popular among designers of VLSI microprocessors. I then show that programs that are rich in parallelism tend also to be vectorizable and are also the ones that execute the longest in a workload, thus demonstrating further the effectiveness of vector architectures. Finally, I show that superpipelined hardware in combination with a vector architecture can take advantage of what little parallelism is available in non-vectorizable programs. In the second half of this dissertation, I investigate the cost and performance of different organizations for a vector register file in the Cray Y-MP vector processor, an investigation that emphasizes the interaction between processor design and compiler algorithms. After showing that instruction scheduling has a major impact on how effectively more vector registers can be used, I present data from simulation experiments indicating that 16 vector registers and a list scheduling algorithm can improve performance significantly over that of 8 vector registers and the scheduling algorithm used in the Cray vectorizing compiler.

Subject Categories:

  • Numerical Mathematics
  • Computer Programming and Software

Distribution Statement:

APPROVED FOR PUBLIC RELEASE