AFIT/GCS/ENG/91D-20



# AD-A243 632

## EFFECT OF SPATIAL LOCALITY PREFETCHING ON STRUCTURAL LOCALITY

#### THESIS

Dirk D. Schalch, Captain, USAF

AFIT/GCS/ENG/91D-20

Approved for public release; distribution unlimited

91 12 24 08 5



| REPORT DOCUMENTATION PAGE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         | OMB No. 0704-0188                                                                                                                                                                                                                                                                                                                                                    |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Public reporting burgen for this collection of informa<br>gathering and maintaining the data needed, and com<br>collection of information, including suggestions for re<br>Davis Highwar, Sute 1204, Arington, v2-22224302                                                                                                                                                                                                                                                                               | tion is estimated to average 1 hour pe<br>pieting and reviewing the collection of<br>ducing this burder, 15 Washington He<br>and to the Office of Management an                                                                                                                                                           | r response, including the time for<br>information Send comments re<br>vadquarters Services, Directorate<br>d Budget, Paperwork Reduction P                                                                                              | reviewing instructions, searching existing data source<br>garding this burden estimate or any other aspect of t<br>for information Operations and Reports, 1215 jeffers<br>roject (0704-0188), Washington, UC 2050                                                                                                                                                   |
| 1. AGENCY USE ONLY (Leave blank)                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 2. REPORT DATE<br>December 1991                                                                                                                                                                                                                                                                                           | 3. REPORT TYPE A<br>Master's The                                                                                                                                                                                                        | ND DATES COVERED                                                                                                                                                                                                                                                                                                                                                     |
| 4. TITLE AND SUBTITLE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         | 5. FUNDING NUMBERS                                                                                                                                                                                                                                                                                                                                                   |
| EFFECT OF SPATIAL LOCAL<br>STRUCTURAL LOCALITY                                                                                                                                                                                                                                                                                                                                                                                                                                                           | ITY PREFETCHINC ON<br>Y                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                      |
| 6. AUTHOR(S)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         | -1                                                                                                                                                                                                                                                                                                                                                                   |
| Dirk D. Schalch                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                      |
| 7. PERFORMING ORGANIZATION NAME                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | (S) AND ADDRESS(ES)                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                                                                                         | 8. PERFORMING ORGANIZATION                                                                                                                                                                                                                                                                                                                                           |
| Air Force Institute of Technology, WPAFB OH 45433-6583                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                           | OH 45433-6583                                                                                                                                                                                                                           | REPORT NUMBER                                                                                                                                                                                                                                                                                                                                                        |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | <i></i> , <i></i> ,                                                                                                                                                                                                                                                                                                       |                                                                                                                                                                                                                                         | AFIT/GCS/ENG/91D-20                                                                                                                                                                                                                                                                                                                                                  |
| . SPONSORING / MONITORING AGENCY                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | NAME(S) AND ADDRESS(E                                                                                                                                                                                                                                                                                                     | 5)                                                                                                                                                                                                                                      | 10. SPONSORING / MONITORING                                                                                                                                                                                                                                                                                                                                          |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                      |
| 1. SUPPLEMENTARY NOTES                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         | <u></u>                                                                                                                                                                                                                                                                                                                                                              |
| 2a. DISTRIBUTION / AVAILABILITY STAT                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | EMENT                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                         | 12b. DISTRIBUTION CODE                                                                                                                                                                                                                                                                                                                                               |
| Distribution Unlimited                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                      |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |                                                                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                      |
| 3. ABSTRACT (Maximum 200 words)<br>The purpose of this rese                                                                                                                                                                                                                                                                                                                                                                                                                                              | earch was to analyz                                                                                                                                                                                                                                                                                                       | ze the effect th                                                                                                                                                                                                                        | at spatial locality                                                                                                                                                                                                                                                                                                                                                  |
| 3. ABSTRACT (Maximum 200 words)<br>The purpose of this rese<br>prefetching in cache mem<br>referencing behavior. The<br>a proposed two-level cach<br>to use spatial locality<br>in executing computer work<br>incorporate the combined<br>prefetching. From these<br>for both caches. Combine<br>referencing models, meass<br>solve the hit probability<br>through structural local<br>locality prefetching is                                                                                           | earch was to analyze<br>ory has on the str<br>o examine this, a<br>the memory subsyste<br>prefetching to exp<br>orkloads. New memo<br>use of structural<br>e models, equations<br>and with the state<br>surements from the<br>y equations. This<br>ity prefetching ar<br>being used in the                                | the effect the<br>suctural localit<br>software simula<br>em. The propose<br>bloit the struct<br>ory referencing a<br>locality and s<br>were derived to<br>transition proba<br>trace-driven sim<br>research showed<br>blower level cach  | at spatial locality<br>y of program memory<br>tor was built to model<br>d subsystem was designed<br>ural locality contained<br>models were developed to<br>patial locality<br>o predict the hit rates<br>abilities of the memory<br>mulations were used to<br>d that performance gains<br>e even when spatial<br>he.                                                 |
| 3. ABSTRACT (Maximum 200 words)<br>The purpose of this rese<br>prefetching in cache mem<br>referencing behavior. The<br>a proposed two-level cach<br>to use spatial locality<br>in executing computer word<br>incorporate the combined<br>prefetching. From these<br>for both caches. Combin<br>referencing models, meass<br>solve the hit probability<br>through structural local<br>locality prefetching is<br>4. SUBJECT TERMS                                                                        | earch was to analyze<br>ory has on the str<br>o examine this, a<br>the memory subsyste<br>prefetching to exp<br>orkloads. New memo<br>use of structural<br>e models, equations<br>and with the state<br>surements from the<br>y equations. This<br>ity prefetching ar<br>being used in the                                | the effect the<br>ructural localit<br>software simula<br>em. The propose<br>oloit the struct<br>bry referencing for<br>locality and s<br>were derived to<br>transition prob-<br>trace-driven sim<br>research showed<br>lower level cach | at spatial locality<br>y of program memory<br>tor was built to model<br>d subsystem was designed<br>ural locality contained<br>models were developed to<br>patial locality<br>o predict the hit rates<br>abilities of the memory<br>mulations were used to<br>d that performance gains<br>e even when spatial<br>he.                                                 |
| 3. ABSTRACT (Maximum 200 words)<br>The purpose of this rese<br>prefetching in cache mem<br>referencing behavior. The<br>a proposed two-level cach<br>to use spatial locality<br>in executing computer words<br>incorporate the combined<br>prefetching. From these<br>for both caches. Combined<br>referencing models, meases<br>solve the hit probability<br>through structural local<br>locality prefetching is<br>4. SUBJECT TERMS<br>Cache, Spatial Locality,                                        | earch was to analyze<br>fory has on the structure<br>to examine this, a<br>the memory subsyste<br>prefetching to exp<br>orkloads. New memory<br>use of structural<br>e models, equations<br>and with the state<br>surements from the<br>y equations. This<br>ity prefetching ar<br>being used in the<br>Structural Locali | ty<br>ty                                                                                                                                                                                                                                | at spatial locality<br>y of program memory<br>tor was built to model<br>d subsystem was designed<br>ural locality contained<br>models were developed to<br>patial locality<br>o predict the hit rates<br>abilities of the memory<br>mulations were used to<br>d that performance gains<br>e even when spatial<br>he.<br>15. NUMBER OF PAGES<br>149<br>16. PRICE CODE |
| 3. ABSTRACT (Maximum 200 words)<br>The purpose of this rese<br>prefetching in cache mem<br>referencing behavior. The<br>a proposed two-level cach<br>to use spatial locality<br>in executing computer words<br>incorporate the combined<br>prefetching. From these<br>for both caches. Combin<br>referencing models, meas<br>solve the hit probability<br>through structural local<br>locality prefetching is<br>4. SUBJECT TERMS<br>Cache, Spatial Locality,<br>7. SECURITY CLASSIFICATION [18. SPINOR] | arch was to analyz<br>hory has on the str<br>to examine this, a<br>the memory subsyste<br>prefetching to exp<br>orkloads. New memo<br>use of structural<br>e models, equations<br>and with the state<br>surements from the<br>y equations. This<br>ity prefetching ar<br>being used in the<br>Structural Locali           | ty                                                                                                                                                                                                                                      | at spatial locality<br>y of program memory<br>tor was built to model<br>d subsystem was designed<br>ural locality contained<br>models were developed to<br>patial locality<br>o predict the hit rates<br>abilities of the memory<br>mulations were used to<br>d that performance gains<br>e even when spatial<br>he.<br>15. NUMBER OF PAGES<br>149<br>16. PRICE CODE |

AFIT/GCS/ENG/91D-20

# EFFECT OF SPATIAL LOCALITY PREFETCHING ON STRUCTURAL LOCALITY

#### THESIS

Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology

Air University

In Partial Fulfillment of the

Requirements for the Degree of

Master of Science in Computer Systems

lon For USANI

orl' Tet Generation Generation

D1 Spination

6 +35911 TTV -

Dist : epastel

20 TRI CARL

. 1 .

1.7

Dirk D. Schalch, B.S.

Captain, USAF

December 1991

Approved for public release; distribution unlimited

#### Acknowledgments

In accomplishing this thesis, I received invaluable assistance from others. I am greatly indebted to my faculty advisor, Maj William C. Hobart, Jr., for his leadership and support. His belief in me and the importance of my research guided my efforts to completion.

I also would like to thank my thesis committee members, Maj Eric R. Christensen and Maj Kim Kanzaki, for their help. Maj Christensen provided timely assistance with the Ada programming language.

And finally, I wish to thank my wife Maggie whose patience and support made a difficult task much easier to bear.

Dirk D. Schalch

# Table of Contents

|                                                                                 | Page  |
|---------------------------------------------------------------------------------|-------|
| Acknowledgments                                                                 | ii    |
| List of Figures                                                                 | iv    |
| List of Tables                                                                  | v     |
| Abstract                                                                        | vi    |
| I. Introduction                                                                 | 1-1   |
| II. Literature Review                                                           | 2-1   |
| III. Methodology                                                                | 3-1   |
| Justification of Methodology Selected                                           | 3-1   |
| Functional Requirements                                                         | 3-3   |
| Cache Simulator Preliminary Design                                              | 3-5   |
| Implementation of Cache Simulator in Ada                                        | 3-8   |
| Validation of Cache Simulator                                                   | 3-31  |
| Summary $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ | 3-33  |
| IV. Findings                                                                    | 4-1   |
| Workload Selection                                                              | 4-1   |
| maraca-driven Simulations                                                       | 4-3   |
| CAN Cache Wit Drobability                                                       | 4-5   |
| CAM Cache Hit Plobability                                                       | 4-5   |
| Structural Locality Cache Hit Probability                                       | 4-15  |
| Performance Analysis                                                            | 4-22  |
| Summary                                                                         | 4-23  |
| V. Conclusions and Recommendations                                              | 5-1   |
| Appendix A: Cache Simulator Source Code                                         | A-1   |
| Appendix B: Test Trace Mapping to Testing Requirements                          | B-1   |
| Appendix C: Wc and Ws vs. Effective Cache Size Graphs                           | C-1   |
| Bibliography                                                                    | BIB-1 |
| Vita                                                                            |       |

# List of Figures

| Figure |                                                                     | Page |
|--------|---------------------------------------------------------------------|------|
| 2.1    | Two-State Markov Model of Program Behavior                          | 2-1  |
| 2.2    | Hobart's Proposed Memory Subsystem                                  | 2-6  |
| 3.1    | Cache Simulator Structure Chart                                     | 3-7  |
| 3.2    | Linked_List_Package Specification                                   | 3-12 |
| 3.3    | CircularQ_Package Specification                                     | 3-14 |
| 3.4    | Addr_Record_Package Specification                                   | 3-16 |
| 3.5    | Cache_Simulator_Driver Procedure                                    | 3-17 |
| 3.6    | Cache Processing Flow                                               | 3-21 |
| 3.7    | SLC Prefetch After SLC Miss and CAM Hit $\ldots$ .                  | 3-23 |
| 3.8    | SLC and CAM Prefetches After Both Caches Missed $$ .                | 3-24 |
| 3.9    | Fetch_Address_Package Specification                                 | 3-25 |
| 3.10   | Hex_to_Dec_Package Specification                                    | 3-26 |
| 3.11   | Determine_Type_Package Specification                                | 3-27 |
| 3.12   | Compute_Miss_Ratios_Package Specification                           | 3-28 |
| 3.13   | Compute_Memory_Access_Time_Package Specification .                  | 3-29 |
| 3.14   | Compute_Cache_Pollution_Package Specification                       | 3-29 |
| 3.15   | Generate_Ref_Frequency_List_Package Specification                   | 3-30 |
| 3.16   | Simulator Testing Requirements                                      | 3-32 |
| 4.1    | Markov Model for CAM Cache Referencing With<br>Prefetching          | 4-6  |
| 4.2    | Modified Markov Model for CAM Cache Referencing<br>With Prefetching | 4-8  |
| 4.3    | Markov Model for SLC Referencing With Prefetching                   | 4-16 |
| 4.4    | Modified Markov Model for SLC Referencing With<br>Prefetching       | 4-18 |

.

# List of Tables

| Table |                                                   | Page |
|-------|---------------------------------------------------|------|
| 3.1   | Cache Simulator Requirements Matrix               | 3-9  |
| 4.1   | Symbolic Workloads Used in Simulations            | 4-2  |
| 4.2   | SLC and CAM Cache Parameters                      | 4-3  |
| 4.3   | Cache Performance Statistics                      | 4-5  |
| 4.4   | State Transition Probabilities - All References . | 4-10 |
| 4.5   | CAM Cache Hit Probability Comparisons             | 4-15 |
| 4.6   | SLC Hit Probability Comparisons                   | 4-21 |
| 4.7   | Speedup Due to Spatial and Structural Prefetching | 4-23 |

#### <u>Abstract</u>

The purpose of this research was to analyze the effects that spatial prefetching in cache memory have on the structural locality of program memory referencing behavior. To examine this, a software simulator was built to model a proposed two-level cache memory subsystem. The proposed subsystem was designed to use spatial prefetching to exploit the structural locality contained in executing computer workloads.

New memory referencing models were developed to incorporate the combined use of structural locality and spatial locality prefetching. From these models, equations were derived to predict the hits rates for both caches. Combined with the state transition probabilities of the memory referencing models, measurements from the trace-driven simulations were used to solve the hit probability equations.

This research showed that performance gains through structural locality prefetching are still possible even when spatial locality prefetching is being used in the lower level cache.

vi

#### Chapter 1

#### Introduction

## 1.1 Background

One of the significant factors in improving the performance of a computer system is minimizing the time required to access instructions and data in main memory. Cache memory performs this vital function. Located between the computer processor and main memory, cache memory is small, high-speed memory designed to temporarily store portions of main memory most likely to be referenced by the computer in the near future. Cache memory can typically reduce access time to instructions and data to 10-25 percent of the time to directly access main memory (Smith, 1982:473).

Its extremely fast operating speeds require cache memory to be implemented by special hardware containing high-speed logic circuits (Hayes, 1988:443). As a result, cache memory is very expensive. The challenge for the cache memory designer is to minimize design cost while maximizing cache performance.

One of the key considerations in designing cache memory is understanding the effects that computer workloads can have on cache memory performance (Hobart, 1989:4). Despite its small size, cache memory is able to successfully perform its

functions due to the locality characteristics of workload execution. Three types of program locality are spatial, temporal, and structural (defined in section 1.5). By characterizing the memory referencing behavior of expected computer workloads, one can optimally design a cache memory subsystem which increases computer performance by reducing memory access time.

To take advantage of these referencing localities, cache memory can prefetch in blocks of instructions and data from main memory. Based on current memory referencing, these cache blocks have a high probability of satisfying subsequent memory references (cache hits).

While program locality ensures cache performance, a trade-off exists in how much to prefetch. If the block size is small, reduced bus bandwidth could boost cache performance through decreased transfer time. However, the block size may not be exploiting the locality potential in the workloads. In turn, cache miss ratios could increase. On the other hand, a large block size could improve hit rates by capturing more of the available locality. But this advantage could be hampered by reduced effective cache size resulting from prefetching unneeded references (known as cache pollution).

From this discussion, it becomes apparent that block size prefetch strategies play an important role in determining the effectiveness of a cache memory design.

## 1.2 Statement of Problem

The purpose of my thesis research is to analyze the effects that spatial locality prefetching has on structural locality. This research focuses on the cache pollution which occurs when spatial prefetching is used in a two-level cache hierarchy.

## 1.3 Research Objectives

This research involves the two-level cache memory subsystem proposed by Hobart (Hobart, 1989:96-99). The proposed design (discussed in detail in next chapter) uses two caches to further reduce memory access time.

The cache hierarchy employs spatial prefetching in the secondary cache (closest to main memory). This action attempts to capture any structural locality being exhibited by the executing workload. These referenced structures are then prefetched into a smaller, faster first-level cache located between the secondary cache and the processor.

In addition, Hobart developed four Markov models to represent the referencing behavior of both caches employing prefetching and no prefetching (Hobart, 1989:100-112). From these models, cache hit probability equations were derived. This research analyzes the two Markov models involving prefetching in both caches (discussed in Chapter 4).

The objectives of this research are as follows:

- To design, build, and implement a cache simulator

to represent Hobart's two-level cache hierarchy,

- To use trace-driven simulation to measure the following cache performance statistics resulting from various spatial prefetching strategies: miss ratios, pollution, and effective memory access time.

- To determine how the two Markov models for prefetching can be modified to account for the effects of spatial prefetching,

- To derive cache hit probability equations from these modified Markov models,

- To incorporate the cache pollution measurements into these hit probability equations.

And in so doing, this research

- Provides a documented analysis of the effects that spatial prefetching has on structural locality,

- Provides an analytic model which comprehensively incorporates the effects of spatial prefetching on the two-level cache performance,

- Provides a method to predict cache hit probabilities using measured pollution rates,

- Identifies optimal prefetch block sizes for given cache sizes which could serve as design parameters for a possible hardware implementation of the proposed memory subsystem.

#### 1.4 Research Questions

The questions involved in this research are as follows: - Can the cache simulator be structurally developed in the Ada language and provide acceptable simulation processing speed?

- Does spatial prefetching into the secondary cache effectively capture referenced structures (structural locality) inherent in the workloads? And in the process, how is cache performance affected by any resulting pollution?

Does structural prefetching into the first-level cache produce an acceptable level of performance? How has this performance been affected by any resulting pollution?
Can the cache pollution measurements obtained from spatial prefetching strategies be used to predict the hit

ratios for both caches?

## 1.5 Definitions

#### 1.5.1 Block

A block (also referred to as line) is a unit of cache memory storage identified by a tag. Block size is always a power of two.

#### 1.5.2 Miss

The event that a requested memory address is not

available when referenced in a given cache memory level. If a miss occurs in main memory, it is known as a page fault. A hit represents the opposite event: requested memory address is available.

# 1.5.3 Miss Ratio

The number of misses occurring in a given cache divided by the total number of memory references to that cache. The miss ratio is a cache performance metric. The hit ratio represents the opposite metric: number of hits in a cache divided by total number of references to that cache.

# 1.5.1 Pollution

As defined by Smith, cache pollution is the portion of prefetched data which is never referenced while residing in the cache (Smith, 1982:482). Cache pollution reduces the cache's effective size.

# 1.5.4 Prefetching

Prefetching is the transfer of a block of instructions or data from one level of the memory hierarchy (such as main memory) to a higher level (such as cache memory) prior to being used at that higher level. Unless otherwise noted, prefetch block size will always equal the cache block size.

# 1.5.2 Spatial Locality

Spatial locality is the condition that subsequent memory references will likely occur in locations close to the current reference. Examples of spatial locality are data files or results of a relational database query clustered by an identifying attribute. Both tend to be stored together physically in memory.

#### 1.5.3 Structural Locality

Structural locality is the condition that a given set of memory references will likely be referenced in the same order as previously referenced. Termed by Thazhuthaveetil, structural locality is the newest concept to be studied (Thaz, 1986). An example of structural locality is subroutine which may be repeated several times during program execution.

#### 1.5.4 Temporal Locality

Temporal locality is the condition that a current memory reference will be likely referenced again in the near future. An example is a program loop which repeats instructions.

## 1.6 Scope of Research

This research involves designing, building, and implementing the cache simulator according to the behavioral description of Hobart's proposed memory subsystem. Once

coded, the simulator is thoroughly tested to prove correctness of design. This validation process ensures that research findings are based on accurate data.

Once the simulator is developed, trace-driven simulations are used to measure the performance in both caches. The traces are comprised of collected memory references obtained from various computer workloads.

The resulting data is used to characterize the effects of spatial prefetching on structural locality. From this analysis, modified cache behavior models are developed to account for these spatial prefetching effects. Cache hit probability equations are derived from these new models. Within these equations, pollution measurements are used to predict the hit rates for the two caches. Using these hit ratios, effective memory access times are calculated for various spatial prefetching parameters.

This research does not involve measuring the effects of prefetching on the access cycle time of each cache memory. To determine the effective memory access time, the cycle times for the memory levels are based on typical values associated with current technology. In addition, this research does not involve developing the hardware circuit design for any components of the proposed memory subsystem. Instead, it employs software simulation to study the different aspects of cache behavior.

#### 1.7 Assumptions and Limitations

- The proposed cache memory subsystem operates in a single processor environment.

- Since the goal of this research is to study the effects of prefetching based on symbolic program behavior, system activities such as context switching and interrupt servicing are not included in the workloads. As a continuation of Hobart's research, this research serves as a baseline from which future studies can analyze the effects of prefetching based on total system behavior. The literature review covers some techniques for incorporating context switching in trace-driven simulations.

#### 1.8 Summary

This chapter has provided an overview of this thesis effort. The following chapters cover three main areas. Chapter 2 provides an extensive literature review of applicable research. Next, Chapter 3 describes the methodology used to conduct this research. In particular, this chapter covers the design and development of the cache simulator. Chapter 4 provides a detailed description of the research results. Modified cache behavior models incorporating the effects of spatial prefetching are presented. From these models, cache hit probability equations are derived. Calculated results are then compared with actual simulation

measurements. In addition, this chapter investigates cache performance improvements occurring from spatial prefetching. Finally, Chapter 5 provides the research conclusions. Recommendations for future research are presented.

#### Chapter 2

# Literature Review

## 2.1 Locality Characteristics of Symbolic Workloads

Hobart analyzed the spatial, temporal, and structural localities of symbolic workloads (Hobart, 1989). Symbolic processing is associated with artificial intelligence applications.

Using trace-driven simulation, Hobart methodically characterized the locality aspects of symbolic workloads by examining the low-level memory referencing behavior. To examine the "temporal distances" of memory references, Hobart developed a two-state Markov model as shown in Figure 2.1 (Hobart, 1989:40-42). The various state transition probabilities are discussed in Chapter 4.



Figure 2.1: Two-State Markov Model of Program Behavior

Previously unreferenced addresses are described by the "new reference" state. While previously referenced addresses are depicted by the "old reference" state. Transitions between the two states occur as memory referencing shifted from old locations to new locations (vice versa). Within the old reference state, consecutive references which occur in the same order in which they were previously referenced are represented as "same-stack-distance (SSD)" transitions. The notion of a "stack" is used to describe the ordered contents of the cache. Within the cache, stack distance represents the spatial distance from the last address (top of the stack) to another address. Given a previously referenced address, an SSD reference takes place when the next reference is located the same spatial distance from the top of the stack as when it was last referenced. In turn, consecutive SSD transitions show a rereferencing of cache addresses in the same order as before. Conversely, consecutive old references with different stack distances (not in the same order) are "not-same-stackdistance (NSSD)" transitions. In total, Hobart identified five possible state transitions: New-New, New-Old, Old-New, Old-SSD, and Old-NSSD.

Employing this model, Hobart implemented a systematic method for extensive analysis of program locality. New metrics were developed to measure referencing behavior.

Studying the spatial characteristics of reference strings, Hobart discovered an unique aspect of memory refer-

encing behavior. When an executing workload is exhibiting spatial locality, subsequent references took place within a physical address distance of 32 words from the previous references (Hobart, 1989:51-53). He labeled this characteristic the "spatial locality window." This narrow spatial window was found to exist in all types of workloads both symbolic and non-symbolic (conventional).

From this, a new spatial locality metric was developed called the "spatial window probability (*Psw*)": the probability that given a current address, the subsequent reference is within 32 words. Hobart observed that the *Psw* of symbolic workloads was almost 50 percent greater than of conventional workloads. This was attributed to the higher percentages of instruction fetches inherent to symbolic processing.

As a result of the higher *Psw*, Hobart suggested that spatial prefetching may prove more effective for symbolic workloads. In addition, the narrow spatial window allowed smaller prefetch block sizes which reduces the potential for cache pollution.

To analyze temporal locality characteristics, three metrics were developed to measure the cumulative temporal distances of program referencing (Hobart, 1989:55-69). The *LRU90*, *LRU95*, and *LRU99* metrics represented "simulated, fullyassociative least recently used (LRU) stack depths" needed to capture 90, 95, and 99 percent of a workload's old references, respectively.

Hobart found that the temporal distances of symbolic workloads were significantly less than those of conventional workloads. The conventional *LRU99* was five times greater. This behavior was attributed to program referencing characteristics. Symbolic workloads tend to access the first few elements of a list. In contrast, conventional workloads tend to access their entire structures evenly. In addition, symbolic workloads only reference about one third the number of distinct data addresses.

From this, Hobart suggested a trade-off between cache design options tailored toward symbolic workloads. Based on the temporal analysis, a 99 percent hit rate on old references can be attained with a cache one fifth the size that would be required for conventional workloads.

To analyze structural locality characteristics, Hobart developed the *Pssd* metric: the probability that given a previously referenced address, the subsequent reference will be to an address with the "same stack distance" to the previous reference (Hobart, 1989:69-71).

Hobart found the *Pssd* depicted one of the most unique aspects of symbolic memory referencing behavior. Over one half of symbolic references (Pssd = 0.550) can be classified as the referencing of ordered structures. In contrast, the percentage for conventional workloads is only slightly more than one fourth.

As a result of his findings, Hobart proposed a memory

subsystem design to exploit the structural locality of symbolic workloads (Hobart, 1989:96-99). The design involves a two-level cache hierarchy comprised of a small structural locality cache (SLC) close to the processor and a larger content-addressable memory (CAM) cache. The proposed design is shown in Figure 2.2.

The main function of the CAM is to capture the structural locality being exhibited by the workload. It can accomplish this goal by using a first-in-first-out (FIFO) circular buffer replacement algorithm. This algorithm allows the CAM to store blocks containing requested addresses in the order received from main memory. As a result of the maintained order, structural locality will remain intact after being fetched into the CAM.

The SLC is then able to exploit this structural locality by prefetching requested blocks from the CAM. Unlike the CAM, the SLC does not require reordering. This reason combined with the small size of the SLC allow an LRU replacement algorithm to be employed.

Both caches are comprised of content addressable memory. This type of memory is fully associative and, in turn, allows addresses to be simultaneously searched. The result is improved access times due to reduced latency.

As discussed in Chapter 1, this proposed memory subsystem will used in this research. One of Hobart's main concerns was cache pollution in the CAM resulting from spatial prefetching.



Figure 2.2: Hobart's Proposed Memory Subsystem

2-6

How would reduced structural locality resulting from pollution effect the performance of the SLC and CAM caches? This research will focus on this concern.

## 2.2 Trace-Driven Simulation

Trace-driven simulation involves the use of collected sequences of virtual addresses (traces) to drive a simulation model of the cache memory system (Smith, 1982:479-480). By changing parameters within the simulator, the effects of cache design choices (cache size, block size, replacement, etc.) can be studied.

Smith identified two major advantages of trace-driven simulation (Smith, 1987:1065). It allows the analysis of cache memory performance based on actual computer workload behavior. Mathematical models and random number generators have fallen short in representing true program characteristics. The other advantage is flexible and feasible cache design assessment. Hardware prototypes require extensive development time with minimum design variance. In contrast, cache simulators can be developed quicker with maximum design parameter ranges.

Smith pointed out two major limitations of trace-driven simulation (Smith, 1987:1065). He found that actual miss ratios from executing workloads in a real system environment are almost always exceed those obtained through simulations. This difference can be attributed to several reasons. Due to

their relatively small sizes, traces may not provide representative samples of the computer workloads. Operating system activity, such as context switching and interrupt servicing, may not be contained in the trace. In addition, input/output handling may not be included.

The other limitation results in reduced analysis of larger caches. In order to incorporate process switching, Smith employed cache "flushing" to simulate the changing of working sets. However, this approach combined with the limited size of traces prevented larger caches (beyond 32K bytes) from being filled. The result was a lower limit to large cache performance.

To improve the trace collection process, Agarwal, Sites, and Horowitz developed a new technique involving the modification of microcode to capture memory references as they occur (Agarwal and others, 1986:119-127). Using the VAX 8200, they implemented microcode changes which recorded all virtual address references to include process switching and system calls. The method was dependent on available microcode memory. Microcode was modified everywhere a memory reference could be generated. As a result, saturated memory prevented all required microcode changes.

Hobart overcame this memory problem. Using a microcode modification technique similar to Agarwal, Sites, and Horowitz's, Hobart eliminated the need to embed microcode changes at every memory referencing location (Hobart, 1989:31-

32). Instead, the page map table was changed to automatically invoke the "page fault abort handler" on every virtual address reference. In turn, the handler was modified to collect the virtual address traces.

(E)

To improve trace data as a true representation of program behavior, Iyer, Laha, and Patel developed a sampling technique to estimate the distribution of cache miss ratios due to task switching (Iyer and others, 1988:1325-1330). Similar to Smith's approach, their technique uses an emptying of the cache to represent a context switch. However, while Smith simulated task switching by purging the cache at consecutive intervals, Laha, Patel, and Iyer sampled a trace at points where task switches occurred.

Their methodology involves the following steps. First, they chose a sample size which represented the length of the process interval. Next, based on trace size, the sampling frequency was established to obtain a target number of samples. Once these parameters were set, the trace samples were collected so that the start of each sample mapped to a context switch. When running the simulation, the cache memory would be purged at the start of each sample to coincide with the task switch.

The result is a trace-driven simulation which produces a more accurate distribution of the cache miss ratio. Typical samples sizes were 5000, 10000, and 20000 address references. Laha, Patel, and Iyer used 35 samples from each trace to

attain an acceptable level of confidence.

#### 2.3 Impact of Prefetching on Cache Performance

The locality of memory referencing in executing workloads provides the opportunity to predict which portions of the address space will most likely be referenced next. Cache memory takes advantage of this locality by prefetching instructions and data ahead of their actual usage.

As Smith pointed out, the result is substantial improvement of system performance (Smith, 1978:7-21). The other choice is to fetch on demand. Smith explained that demand fetches incur high penalties in CPU overhead and idle time. In a single process environment, the CPU must wait while transfers between cache and main memory are accomplished. Even with multiprogramming, a process may not always be ready to execute while memory requests are being satisfied. In addition, having to schedule and start each separate demand fetch can lead to increased overhead per transfer.

In determining the effectiveness of prefetching, Smith compared the reduction of the miss ratio for a given block size to the corresponding increase in transfer ratio. Transfer ratio was comprised of the miss ratio and prefetch ratio: "number of prefetch data transfers to total number of memory references." Smith concluded that prefetching can reduce the cache miss ratio at a cost level less than the percentage increase in transfer ratio. He found that a block

fetch size of 32 bytes was generally effective for cache memory sizes up to 64K bytes. A 64-byte block size produced comparable (even better) miss ratios but at the expense of increased transfer costs. In this research, a CAM block size of 32 bytes is used with a 32K byte CAM for one set of simulations.

In their research of cache performance in a Unix environment, Alexander, Keshlear, Cooper, and Briggs also looked at prefetching effects on bus traffic (Alexander and others, 1986:41-70). Similar to Smith, 32 bytes proved to be an effective prefetch block size. They found that block sizes ranging from 8 to 32 bytes resulted in the largest reductions in bus traffic.

Smith identified several architectural factors which can affect block size choice (Smith, 1987: 1064). "Memory interference" and "memory busy time" can result from larger block sizes. In multiprocessor systems, the longer line can tie up the memory and bus and, in turn, adversely impact other processors. In addition, "I/O overruns" may occur. Memory interference may cause I/O operations to be aborted and reinitiated. Another factor involves address tag storage. If a block size is small, a substantial amount of the cache storage is required for the address tags. The result is the effective size of the cache is decreased. One more factor concerns copy-back caches. A larger block size can increase memory traffic for each copy back. However, Smith suggests

the additional traffic can be countered by the lower miss rates attained from larger lines.

To improve multiprocessor system performance, Johnson also suggests the use of prefetching to reduce contention for shared memory resources (Johnson, 1989:137-141). He explains an approach called "tagged working set prefetching." Each prefetched block carries a tag to uniquely identify its working set. The prefetched blocks are then "broadcast" to all processors. Using the tags, other cache controllers can load any required broadcast blocks. The result is reduced memory traffic by accomplishing several prefetches with single accesses.

Smith attributed the effect on miss ratio as the main influence that block size choice has on cache performance (Smith, 1987:1064-1074). Using extensive trace-driven simulation, Smith observed that larger block sizes generally reduced cache miss ratios. Longer lines tend to exploit the memory referencing localities of executing workloads. The upper limit to this effect occurred when the block size approached the cache size. After this point, the miss ratios increased. Smith explained that the increase was due to less captured program locality resulting from decreased number of cache blocks. For cache sizes of 32K and 64K bytes, prefetch block sizes of 16-64 bytes continued to produce the lowest miss ratios. In addition to the simulations involving a 32 byte CAM block size, another set of simulations used in this

research employs a 16 byte block size for a 32K byte CAM cache.

Using trace-driven simulations involving similar cache sizes, Przybylski also produced optimal prefetch block sizes of 32-64 bytes (Przybylski, 1990:160-169). These sizes resulted in the minimum effective memory access times.

The advantages of prefetching extend to all levels of the memory hierarchy. Excessive dependence on disk I/O operations can significantly decrease system performance. Smith identified database systems as excellent candidates for data caching (Smith, 1978:223-246). The high degree of sequential access inherent in database systems allow subsequent data references to be predicted. In turn, data blocks or segments can be prefetched into main memory to satisfy future requests. The resulting reduction in disk I/O substantially improves database system performance.

# 2.4 Multi-Level Cache Hierarchies

With the speeds of new processors continuing to increase, traditional single-level caches will not be able to exploit the extremely fast CPU cycle times. As Smith pointed out, expanding the size of a single cache produces two problems (Smith, 1982:517). One is large physical size and circuit complexity increases memory access time. The other problem is adding more cost to an already expensive hardware component.

Hennessy, Horowitz, and Przybylski observed that once the

cache size reaches 64KB, there exists little margin for performance improvement (Hennessy and others, 1988:290-298). As they note, "chip to chip communication" and control circuitry continue to contribute a major portion of memory latency. Physical cache size and material properties limit transfer rates. Consequently, as CPU cycle time grows faster, the single-level cache is hard pressed to maximize CPU performance.

Multi-level caches provide a solution. Employing smaller, faster cache design technology, cache hierarchies can locate a first-level cache close to the processor to lower memory latency and transfer times. As Hennessy, Horowitz, and Przybylski pointed out, the addition of a second-level (C2) cache substantially decreases the miss penalty of the firstlevel (C1) cache (Hennessy and others, 1989:114-120). The result is lower effective memory access time leading to increased CPU performance.

In their two-level cache simulation study, Levy and Short show that the addition of a secondary cache can boost system performance (Levy and Short, 1988:81-87). Employing tracedriven simulations, they compared the execution cycle times of two-level caches to those of a single-level cache. Results revealed that given an C2 to C1 size ratio of at least 8:1, system performance was significantly improved from the addition of the C2 cache. For example, given a 15-cycle main memory access time, combining a 4-cycle, 256KB C2 cache with

a 1-cycle, 8KB C1 cache produced a performance increase of 18 percent. In this research, an 8:1 ratio of CAM to SLC size is used in the simulations.

Using trace-driven simulations of two-level caches, Hennessy, Horowitz, and Przybylski showed that a *C1* cache largely decreased the number of memory references to the *C2* cache (Hennessy and others, 1989:114-120). They noted that the smaller amount of *C2* references reduces the impact of *C2* cycle time. As a result, optimal secondary cache sizes can exceed the sizes of single-level caches. Interesting similarities were found between the "global" miss ratios (number of misses divided by total CPU references) of secondary caches and the miss ratios of corresponding singlelevel caches. They determined that if the *C2* cache is at least eight times the size of the *C1* cache, then the global miss ratio of *C2* is basically equivalent to the miss ratio of the comparably-sized single cache.

From trace-driven simulations of two-level caches, Bakka, Bugge, and Kristiansen examined the miss ratios of secondary caches (Bakka and others, 1990:250-259). Their simulations involved C2 cache sizes of 1-8 MB with the smallest size being eight times the size of the C1 cache. They found that C2 block sizes of 128 and 256 bytes produced the lowest miss ratios in the secondary caches.

Due to the high hit ratios of the first-level cache, improving the access time of this cache is an important design

goal. Baer, Levy, and Wang proposed a two-level cache hierarchy designed to minimize the access time of the *C1* cache (Baer and others, 1989:140-148). They suggested that the *C1* cache can be optimally accessed by virtual addresses. In turn, the *C1* access time is reduced since no address translation is required. Address translation and handling of "synonyms" (copies of data under different virtual addresses) is accomplished in the *C2* cache. As discussed earlier, decreased referencing of the *C2* cache lessens the impact of increased cycle time overhead. From their simulation results, they concluded that if the address translation penalty in the *C1* cache is at least six percent, then switching to the virtual *C1* design would reduce effective memory access time.

#### 2.5 Summary

Several cache performance characteristics identified in this chapter have direct applicability to this research. Increasing the cache size beyond 64K bytes results in little improvement to the hit ratio. For cache sizes of 32K and 64K bytes, prefetch block sizes of 16-64 bytes produce the optimal hit ratios. In two-level cache hierarchies, the size ratio of the secondary cache to the primary cache should be at least 8 to 1. These performance characteristics are used to establish the cache size and block size parameters for the trace-driven sirulations.

# Chapter 3 Methodology

This chapter describes the methodology employed to design and build the two-level cache simulator used for this research. It starts with a justification of the methodology selected. An overview of the experimental setup is provided. Next, the cache simulator design and implementation is covered in detail. The chapter ends with a description of the test procedures used to validate and verify the correctness of the simulator functions.

# 3.1 Justification of Methodology Selected

As covered in the background, trace-driven simulation is the technique to be used to evaluate the proposed memory subsystem design. Hobart developed a simulator for the twolevel SLC and CAM cache hierarchy. This simulator was written in Lisp and was built to run on the TI Explorer II computer.

For this research, the decision was made to design and develop another simulator for the SLC and CAM cache memory subsystem. This simulator is written using the Ada programming language.

The rationale for this decision is based on several justifications. One main reason is to take advantage of the

powerful, high-speed Sun SPARC microprocessor workstations. The idea is very simple: to minimize the execution time of the trace-driven simulations. The speed of the Sun SPARC microprocessor is orders of magnitude (over 20 to 1) faster than the TI Explorer II. Importantly, the Sun SPARC workstation contains an extensive Ada software development/ execution environment: Verdix Ada Development System (VADS - Version 6.0)

Another justification is based on the choice of the Ada programming language. From a development standpoint, Ada offers several advantages. A main strength of Ada lies in its handling of abstract data types. Using a "packaging" concept, Ada provides the ability to "encapsulate" abstract data types along with the operations which manipulate their states (Feldman, 1985:4-7). This Ada feature allows the cache simulator to be developed using software packages which can be integrated into a reliable, structured design. In addition, this advantage facilitates modification and expansion of the simulator to incorporate growing research requirements. New aspects of cache behavior can be analyzed by modifying existing or developing additional packages.

Another advantage is reusability. The use of Ada constructs to achieve a highly-structured design promotes an understanding of the software system. In turn, future research efforts can benefit from expanded versions of the cache simulator.
## 3.2 Functional Requirements

The following requirements served as the functional baseline for the SLC and CAM cache simulator.

#### 3.2.1 Workload Traces

Several sets of virtual address traces were available for use in this research. Sixteen workload traces were collected by Agarwal, Sites, and Horowitz from the VAX 8200 (Agarwal and others, 1986). Hobart provided 15 address traces collected on the TI Explorer II (10 traces) and the IBM System/360 Model 91 (5 traces).

The cache simulator must accept as input the memory references contained in these traces. The memory references in the Agarwal, Sites, and Horowitz traces are in hexadecimal format. In addition, each memory reference has an integer number identifier. This identifier indicates whether the address is a data read, data write, or instruction fetch. The memory references in the Explorer II/IBM 360 traces are in integer format. These traces are available in separate versions: data read only, data write only, instruction fetch only, data reference only, and all references. As a result, they do not contain reference identifiers.

#### 3.2.2 Cache Design Parameters

The cache simulator must allow modification of the following parameters for varied simulations: SLC size, SLC

prefetch block size, CAM size, and CAM prefetch block size. The SLC shall employ a LRU replacement algorithm. The CAM shall employ a FIFO circular buffer replacement algorithm.

# 3.2.3 SLC and CAM Miss Ratios

For each trace-driven simulation, the cache simulator must calculate the miss ratios for the SLC and CAM caches.

# 3.2.4 SLC and CAM Memory Access Time

For each trace-driven simulation, the cache simulator must calculate the effective memory access time. If we assume that a cache block transfer can be overlapped with the CPU execution, then the effective memory access time  $t_a$  is:

$$t_{a} = t_{SLC} p_{SLC} + t_{CAM} (1 - p_{SLC}) p_{CAM} + t_{MEM} (1 - p_{CAM}) (1 - P_{SLC})$$

where

# 3.2.5 SLC and CAM Cache Pollution

For each trace-driven simulation, the cache simulator must calculate the SLC and CAM cache pollution percentages.

# 3.2.6 Reference Frequency

For each trace-driven simulation, the cache simulator must individually track how long it takes (number of references) before each prefetched address is first referenced. Cache pollution references will not be included. For each given number of references, the cache simulator must determine how many prefetched addresses required the same number of references. For example, 1000 prefetched addresses in the SLC took 50 memory references before they were referenced for the first time.

# 3.3 Cache Simulator Preliminary Design

The purpose of this stage was to develop a high-level architecture depicting how the simulator can be structured to meet all functional requirements. In turn, this architecture served as guide to producing an Ada software design solution.

The design approach used for this cache simulator is based on functional decomposition. To achieve a structured design, the simulator is comprised of an integrated set of program modules. Each module performs functional requirements involving the processing of input(s) to produce required output(s).

A structure chart is constructed to provide a pictorial representation of how the cache simulator program modules would work together. The structure chart offers a way to develop the simulator architecture without requiring knowledge of the internal workings of each module. The syntax involves using boxes to represent the program modules. Communication between the modules is represented by labeled arrows.

Figure 3.1 shows the structure chart for the cache simulator. As depicted, the program modules are grouped into three major areas: input, processing, and output. The top box represents the cache simulator driver module which directs the activity between the three areas.

The input area is comprised of three modules. The Fetch Trace Address module extracts one memory reference address at a time from the trace file. Prior to being sent to the driver, the memory reference is converted from hexadecimal to integer format by the Convert Hex Address to Integer module. The Determine Reference Type module determines whether the memory reference is a data read, data write, or instruction fetch.

The cache processing area performs the SLC and CAM cache functions. It is comprised of the following three modules: *Process Data Read Address, Process Data Write Address,* and *Process Instruction Fetch Address.* Depending on the memory reference type, the driver invokes the appropriate module to service the trace address. Since these modules execute the





SLC and CAM cache functions, they require access to the SLC and CAM cache data structures. As the selected module processes the memory reference, SLC and CAM data is generated for cache performance analysis.

The output area processes the performance data and produces statistics which are written to files. This area is comprised of four modules. The *Compute Miss Ratios* module calculates the SLC and CAM cache miss ratios. Using the cache miss ratios, the *Compute Memory Access Time* module calculates the effective memory access time. The *Compute Cache Pollution* module calculates the SLC and CAM cache pollution percentages. And finally, the *Generate Reference Frequency List* module produces the output file tracking the number of references before each prefetched address is first referenced.

Table 3.1 maps the structure chart modules to the functional requirements satisfied.

# 3.4 Implementation of Cache Simulator in Ada

Once the simulator architecture was established, development of the cache simulator began. This development phase involved the transformation of the functional design requirements into a complete Ada system. Major tasks included the detailed design, coding, integration and testing of the simulator.

| Module                                                                                                                                                                                                             | Functional<br>Requirements                                                    |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| Driver<br>Fetch Trace Address<br>Determine Reference Type<br>Convert Hex Address to Integer<br>Process Data Read Address<br>Process Data Write Address<br>Process Instruction Fetch Address<br>Compute Miss Ratios | 3.2.2<br>3.2.1<br>3.2.1<br>3.2.1<br>3.2.2<br>3.2.2<br>3.2.2<br>3.2.2<br>3.2.3 |
| Compute Memory Access Time<br>Compute Cache Pollution<br>Generate Reference Frequency List                                                                                                                         | 3.2.4<br>3.2.5<br>3.2.6                                                       |

Table 3.1: Cache Simulator Requirements Matrix

Each program module in the structure chart is mapped into an Ada package. By performing this encapsulation, each Ada package can be developed and tested as an individual functional component. Indicated by the arrow connections between modules (structure chart), visibility into other packages is accomplished by *withing* the required packages. Identified as data flows between modules (structure chart), package communication is accomplished through the passing of parameters to/from the various procedures and functions contained within the packages.

The following sections provide a detailed description of each Ada package used in the cache simulator. Appendix A contains the source code.

# 3.4.1 LinkedLists\_Package

A linked list data structure is employed to represent the SLC cache abstract data type. The LinkedLists\_Package is used to implement the SLC cache as a linked list structure. Ada code for this package is a modified version of code taken from <u>Data Structures with Ada</u> (Feldman, 1985:103-115). The rationale for the linked list structure is based on the LRU replacement algorithms of the SLC cache.

Initially, it appeared that the SLC might be optimally represented as an array. However, if an array was employed, the array elements would constantly have to be reshuffled to depict an LRU ordering. If a time stamp (number of references passed) was used, every array element would have to be searched to determine the LRU candidate for replacement. Both of these array approaches would have lead to simulator performance penalties.

A linked list structure provides a cleaner way to implement the LRU replacement algorithm. As an SLC address is being either referenced (cache hit) or prefetched, the address can simply be placed at the front of the list. As a result, the LRU addresses fall to the rear of the list. When an address has to be replaced, no search is required. The address at the rear of the list is replaced. In addition, since the SLC is very small, the size of the linked list never grows beyond 512 nodes (largest number of addresses in SLC). In turn, SLC searches do not degrade simulator performance.

In addition to the SLC, two linked lists are used to maintain the SLC and CAM reference frequency lists. These lists will be discussed in more detail later. For now, it should be pointed out that these two linked lists are handled by this package.

The LinkedLists\_Package specification is shown in Figure 3.2. LinkedListNode is a record structure containing three fields. The Addr field represents the address of the memory reference. The NumRef field indicates the number of memory references which have passed while the address is waiting to be first referenced. The Next field contains the pointer to the next LinkedListNode.

This package has two functions and four procedures to manipulate the state of the linked list data structure. Receiving the memory reference address, the MakeNode function creates a LinkedListNode for storing a new address in the SLC. The Search function is used to find the node containing the matching address during a SLC search. The AddToFront procedure adds a LinkedListNode to the front of the SLC. The Insert\_In\_Order procedure is used by the SLC and CAM reference frequency lists to insert an address record in ascending order by the NumRef value. Insert\_In\_Order employs the AddToFront and AddToRear (only at end of list) procedures to place a frequency record in its appropriate position in the list. The Delete procedure deletes a designated node from the SLC.

```
package LinkedLists Package is
   type LinkedListNode;
   type NodePointer is access LinkedListNode;
   type LinkedListNode is
      record
         Addr
                  : integer;
         NumRef : integer := 0;
                 : NodePointer := null;
         Next
      end record;
   type List is
     record
                  : NodePointer := null;
         Next
                 : NodePointer := null;
         Tail
      end record;
   function MakeNode (Address: integer)
                      return NodePointer;
   function Search (L: List; P1: NodePointer)
                      return NodePointer;
   procedure AddToFront (L: in out List;
                         P1: NodePointer);
   procedure AddToRear (L: in out List;
                         P1: NodePointer);
   procedure Insert In Order (L: in out List;
                         P1: NodePointer);
   procedure Delete (L: in out List; P1: NodePointer);
end LinkedLists Package;
```

Figure 3.2: LinkedList\_Package Specification

The LinkedLists\_Package body instantiates a generic package called Unchecked\_Deallocation. Dynamic allocation can swiftly use up storage space as linked lists nodes are continually being created and deleted. The Unchecked\_ Deallocation package is called by the Delete procedure to return space used by the deleted node back to available memory.

# 3.4.2 CircularQ Package

A circular queue data structure is used to implement the FIFO circular buffer replacement algorithm of the CAM cache. Blocks of addresses are stored in the CAM in the order they are referenced. The *CircularQ\_Package* maintains the state of the CAM cache using the circular queue. Ada code for this package is also a modified version of code taken from Feldman (Feldman, 1985:144-145).

Using a circular queue eliminates the need to move the entire queue. Since the front and rear of the queue are essentially connected, CAM replacement and prefetch operations can be accomplished by moving head and tail pointers around the queue.

In Chapter 4, the CAM is referred to as a "stack." This terminology is not to be confused with the CAM being implemented as a circular queue. Instead, "stack" is used to describe the memory referencing behavior of the CAM cache: same-stack-distance, not-same-stack-distance. The circular queue describes how the addresses are stored and replaced within the CAM cache.

The CircularQ\_Package specification is shown in Figure 3.3. The circular queue array is created as the dynamically allocated Array\_Type with index of range 1 to 32768. The upper range represents the largest possible CAM size of 32768 addresses. The Queue is a record structure containing four fields. Declared as a dynamically allocated

```
package CircularQ Package is
  type index is range 1 .. 32768;
  type Array_Type is array (index) of integer;
  type Array Ptr Type is access Array Type;
  type Queue is
    record
       Address : Array Ptr Type := new Array Type;
       Ref Count : Array Ptr Type := new Array Type;
       head
                 : index;
       tail
                 : index;
    end record;
  procedure Enqueue (Q: in out Queue;
                     CAM Size Index: in index;
                     Reference: in integer);
  procedure Dequeue (Q: in out Queue;
                     CAM Size Index: in index);
  procedure SearchQ (Q: in Queue;
                     Reference: in integer;
                     CAM Size Index: in index;
                     Position: in out index;
                     Found: in out boolean);
end CircularQ Package;
```

Figure 3.3: CircularQ\_Package Specification

variable of Array\_Ptr\_Type, the Address field is used to store the memory reference address in the CAM. Also of Array\_Ptr\_ Type, the Ref\_Count field indicates the number of memory references that have passed while the address is waiting to be first referenced. The head and tail fields are used to indicate the front and rear of the CAM circular queue, respectively.

This package has three procedures to manipulate the state

of the circular queue data structure. Receiving the prefetched memory reference address, the Enqueue procedure loads the address at the tail of CAM queue. As required by the FIFO replacement algorithm, the Dequeue procedure removes the address located at the head of the CAM queue. The SearchQ procedure searches for a matching address in the CAM. It returns a flag indicating whether or not an address has been found. If a cache hit occurs, SearchQ will also return the position of the address in the CAM queue.

# 3.4.3 Addr Record Package

For type handling purposes, an abstract data type is created for the virtual addresses contained in the Agarwal, Sites, and Horowitz trace files. The Address\_Record\_Package specification is shown in figure 3.4. This package creates a record structure comprised of two fields. The The\_Type field indicates the type of the memory reference: data read, data write, or instruction fetch. The Address field stores the hexadecimal virtual address from the trace file.

# 3.4.4 Cache\_Simulator Driver Procedure

The Cache\_Simulator driver procedure functions as the main controller for the simulator system. It provides an interface between the input, cache processing, and output packages. In order to produce an executable Ada system, the Cache\_Simulator driver is built as a procedure instead of as

```
package Addr_Record_Package is
  type Addr_Record is
    record
    The_Type : character;
    Address : integer;
    end record;
end Addr Record Package;
```

Figure 3.4: Addr Record\_Package Specification

a package. Figure 3.5 shows the *Cache\_Simulator* driver procedure.

As the driver, the *Cache\_Simulator* procedure needs visibility to the input, processing, output packages. The required packages are withed into the driver.

In the variable declaration section, the SLC linked list and CAM circular queue data structures are instantiated. The SLC and CAM reference frequency lists are also instantiated. The SLC size, CAM size, SLC block size, and CAM block size parameters are declared as variables rather than constants. This allows the cache parameters to be interactively entered by the user. The variables used to calculate the cache miss ratios, pollution, and effective memory access time are declared and initialized to zero. These variables include the following: SLC\_Miss, CAM\_Miss, SLC\_Total\_Refs, CAM\_Total\_ Refs, SLC\_Non\_Ref, CAM\_Non\_Ref, SLC\_Total\_Prefetch, and CAM\_Total\_Prefetch.

For convenience, the driver provides the user with an

```
with Text_IO, Addr_Record_Package, LinkedLists_Package,
     CircularQ Package, Fetch_Address_Package,
     Determine_Type_Package, Serv_Instr_Fetch Package,
     Serv Data Read Package, Serv Data Write Package,
     Compute Miss Ratios Package,
     Compute_Memory_Access_Time_Package,
     Compute Cache Pollution Package,
     Generate Ref Frequency List Package;
procedure Cache Simulator is
       ********
       * Variable declaration section *
   --
       ******
begin
       user enters names of output & reference files
   -
   -
       create output & reference files
       user enter name of trace input file
   ----
       open trace input file
       user enters cache parameters: SLC size, CAM size,
   -----
   -
          SLC block size, CAM block size
   -
       while not end of file loop
   --
          call routine to fetch address from trace file
   ---
          call routine to determine type of reference
   ----
          case
             when data read =>
   ----
   --
                call routine to process data read
   -----
             when data write =>
   _ _
                call routine to process data write
   ----
             when instruction fetch =>
   _ _
                call routine to process instruction fetch
   _
             when others => exit
   ----
          end case
   ----
          if number of references processed = 20000 then
   ----
             call routine to compute miss ratios
   ----
             reset reference counter to zero
   _ _
          else if end of trace file then
   ----
             call routine to compute final miss ratios
             call routine to compute average memory
   ---
   _ _
                access time
             call routine to compute cache pollution
   -----
   _ _
             call routine to generate reference
   --
                frequency list
   -
          end if
       end loop
   --
       close input, output, and reference files
end Cache Simulator
```

Figure 3.5: Cache\_Simulator Driver Procedure

interface for executing a simulation. Once the simulator is activated, the user is queried to name the statistics output file to be generated by the simulation run. The statistics output file includes the SLC and CAM miss ratios, the SLC and CAM pollution percentages, and the effective memory access time. Secondly, the user is asked to name the reference output file. This file includes the reference frequency list for the SLC and CAM caches. Next, the user is gueried to name the trace input file to be used in the simulation run. The final user inputs include the cache parameters: SLC size, CAM size, SLC block size, and CAM block size. For archival purposes, the driver writes the trace input filename and the cache parameters at the top of the statistics and reference output files.

Once all user inputs have been entered, the cache simulation begins. Due to the different formats of the address traces, two Cache\_Simulator drivers have been written. One version is designed for the Agarwal, Sites, and Horowitz traces. In this version (Version 1), the driver calls the Load\_Record procedure (Fetch\_Address\_Package). This action returns an address converted from an hexadecimal to an integer format. The driver then calls the Address\_Type function (Determine\_Type\_Package) to determine the type of reference.

The other driver version (Version 2) is designed for the Explorer trace files. As previously explained, these address traces are already in integer format and do not require a type

determination. In turn, this driver version only needs to perform a get (Text\_IO) operation to fetch an address.

Depending on the reference type, the Version 1 driver will invoke one of the three cache processing procedures: Serv\_Data\_Read, Serv\_Data\_Write, or Serv\_Instr\_Fetch. For this research, these three procedures perform the same cache processing functions. Given this, the rationale for creating three separate procedures is to accommodate future research requirements. Continuing research may need the simulator to perform different cache processing functions based on reference type. Separate procedures facilitate implementation of these future cache processing modifications.

Since no reference type determination occurs, the Version 2 driver only requires one main cache processing procedure. The Process\_Memory\_Reference procedure performs the same cache functions as the three processing procedures in the Version 1 driver.

Once the memory address reference has been processed, the driver repeats the fetch and process cycle described above. For every 20000 trace references processed, the driver calls the Compute\_Miss\_Ratios procedure to calculate the cumulative miss ratios in the SLC and CAM caches. When the simulator reaches the end of the trace file, the driver invokes several procedures to produce the cache performance data. First, the driver calls the Compute\_Miss\_Ratios procedure to calculate the final miss ratios. Next, the driver calls the

Compute\_Memory\_Access\_Time procedure to calculate the effective memory access time. Then, the Compute\_Cache\_ Pollution procedure is called to determine the pollution percentages. And finally, the Generate\_Ref\_Frequency\_List procedure is invoked producing the reference frequency data.

At this point, the simulation is finished. The driver concludes the session by closing the trace, statistics, and reference files.

### 3.4.5 Cache Processing Packages

The cache processing packages include the following: (Version 1 driver) Serv\_Data\_Read\_Package, Serv\_Data\_Write\_ Package, and Serv\_Instr\_Fetch\_Package; (Version 2 driver) Process\_Memory\_Reference\_Package. Since all four packages perform the same cache processing functions, the following explanation applies to all four packages.

For types handling, the cache processing packages require visibility (with) to the Linked\_List\_Package, the CircularQ\_ Package, and the Addr\_Record\_Package. Each package contains one cache processing procedure. To assist in a more detailed description, Figure 3.6 shows the flow of the cache processing procedure.

Using the requested address, the SLC is searched for a matching address. If a SLC hit occurs, then the SLC performance statistics are updated. Since the address request has been satisfied, the cache processing is finished. Control





is returned to the driver.

If a SLC miss occurs, then the CAM is searched for a matching address. If a CAM hit occurs, then the block containing the matching address is prefetched from the CAM to the SLC. Figure 3.7 illustrates this prefetching process. The goal is to prefetch the structural locality that exists in the CAM. In turn, the requested address plus the addresses located immediately above in the CAM (equal to SLC block size) are prefetched into the SLC. The prefetched addresses are placed at the front of the SLC linked list. This action causes the LRU addresses to fall toward the back of the list. Once the SLC prefetch is accomplished, the CAM performance statistics are updated. Since cache processing has finished, control is returned to the driver.

If a CAM miss occurs, then the block containing the requested address is prefetched from main memory to the CAM. In addition, a SLC block containing the requested address must be prefetched from the CAM. Figure 3.8 illustrates this prefetching process. Once again, only the requested address plus the addresses immediately above in the stack (equal to SLC block size) are prefetched into the SLC. In the example, although the SLC block size is four, the prefetch results in only one word. Two conditions produce this result. One is the CAM prefetch is always placed at the stack top. The other is the requested address is the last address on the stack. Once both prefetches are accomplished, the SLC and CAM







Figure 3.8: SLC & CAM Prefetches After Both Caches Missed

;

performance statistics are updated. Since cache processing has finished, control is returned to the driver.

### 3.4.6 Fetch\_Address\_Package

The Fetch\_Address\_Package contains one procedure: Load\_Record. The specification for this package is shown in Figure 3.9. Load\_Record procedure extracts one memory reference record (address and reference type) from the current position of the Agarwal, Sites, and Horowitz trace file. To convert the address from hexadecimal to integer format, this procedure calls the Hex\_to\_Dec procedure. The resulting integer address record is then returned to the driver for further processing.

> with Text\_IO, Addr\_Record\_Package; package Fetch\_Address\_Package is procedure Load\_Record (Input\_File: in out File\_Type; Memory\_Ref: out Addr\_Record); end Fetch\_Address Package;

Figure 3.9: Fetch\_Address\_Package Specification

#### 3.4.7 Hex\_to\_Dec\_Package

The Hex\_to\_Dec\_Package contains one function: Hex\_ to\_Dec. The specification for this package is shown in Figure 3.10. The Hex\_to\_Dec function converts the input hexadecimal string into an integer address.

Due to the address structure of the SPARC microprocessor, an offset is required in the conversion value. Although the SPARC machine has a 32-bit address, it reserves 1 bit as a sign bit for integers. In turn, not all eight digit hexadecimal addresses can be represented as integer values. To counter this, the integers are offset to include both the positive and negative values. The resulting integer range is:  $-2^{31}$ ..  $(2^{31} - 1)$ . Although this offset changes the original value of the virtual address, the modified value does not affect the cache processing. The simulator is only interested in matching addresses during searches.

The rationale for converting the hexadecimal address to an integer value is to optimize performance. The simulator

package Hex\_to\_Dec\_Package is
 function Hex\_to\_Dec (Hex\_Addr: string;
 Hex\_Length: natural) return integer;
end Hex\_to\_Dec\_Package;

Figure 3.10: Hex\_to\_Dec\_Package Specification

could be designed to process a hexadecimal address. However, since the address would be represented as a string, cache searches would involve comparing addresses one character at a time. The result would be a substantial drop in simulation speed. By treating the address as an integer value, cache searches only require a single comparison per address.

# 3.4.8 Determine\_Type\_Package

The Determine\_Type\_Package contains one function: Address\_Type. The specification for this package is shown is Figure 3.11. Receiving the address record, the Address\_Type procedure determines the memory reference type: data read, data write, or instruction fetch. The type identifier is returned to the driver.

with Addr\_Record\_Package;

package Determine\_Type Package is

end Determine\_Type\_Package;

Figure 3.11: Determine Type Package Specification

# 3.4.9 Compute Miss Ratios Package

The Compute\_Miss\_Ratios\_Package contains one procedure: Compute\_Miss\_Ratios. The specification for this package is shown in Figure 3.12.

The Compute\_Miss\_Ratios procedure calculates the SLC and CAM miss ratios using input cache performance statistics: number of SLC misses, total number of SLC references, number of CAM misses, and total number of CAM references. The resulting miss ratios are written to the statistics file. In addition, the values are returned to the driver to be used in effective memory access time calculations.

> with Text\_IO; package Compute\_Miss\_Ratios\_Package is procedure Compute\_Miss\_Ratios (SLC\_Miss: in natural; CAM\_Miss: in natural; SLC\_Total\_Refs: in natural; CAM\_Total\_Refs: in natural; Num\_Ref: in natural; SLC\_MR: out float; CAM\_MR: out float Output\_File: in out File\_Type); end Compute Miss Ratios Package;

Figure 3.12: Compute Miss Ratios Package Specification

# 3.4.10 Compute\_Memory\_Access\_Time\_Package

The Compute\_Memory\_Access\_Time\_Package contains one procedure: Compute\_Memory\_Access\_Time. The specification for this package is shown in Figure 3.13. The Compute\_Memory\_ Access\_Time procedure calculates the effective memory access time for the trace workload. The access times (in clock cycles) used in the model for the three memory levels are as follows: SLC: 1; CAM: 4; main memory: 32.

```
with Text_IO;
package Compute_Memory_Access_Time_Package is
    procedure Compute_Memory_Access_Time
    (SLC_MR: in float; CAM_MR: in float;
    Output_File: in out File_Type);
end Compute Memory Access Time Package;
```

Figure 3.13: Compute Memory Access Time Package Specification

# 3.4.11 Compute\_Cache\_Pollution\_Package

The Compute\_Cache\_Pollution\_Package contains one procedure: Compute\_Cache\_Pollution. The specification for this package is shown in Figure 3.14. The Compute\_Cache\_Pollution procedure calculates the SLC and CAM cache pollution using the

with Text\_IO, LinkedLists\_Package, CircularQ\_Package; package Compute\_Cache\_Pollution\_Package is procedure Compute\_Cache Pollution (SLC: in out List; CAM in out Queue; SLC\_Non\_Ref: in out natural; CAM\_Non\_Ref: in out natural; SLC\_Total\_Prefetch: in natural; CAM\_Total\_Prefetch: in natural; Temp\_CAM\_Size: in natural; Output\_File: in out File\_Type); end Compute Cache Pollution Package;

Figure 3.14: Compute\_Cache\_Pollution\_Package Specification

following cache performance statistics: number of prefetched

addresses in the SLC and CAM never referenced; total number of prefetched addresses in the SLC and CAM. The resulting pollution percentages are written to the statistics file.

### 3.4.12 Generate Ref Frequency List Package

The Generate\_Ref\_Frequency\_List\_Package contains one procedure: Generate\_Ref\_Frequency\_List. The specification for this package is shown in Figure 3.15.

The Generate\_Ref\_Frequency\_List procedure produces the output file recording the reference frequencies for the SLC and CAM. The file format consists of two columns. The first column is comprised of values denoting the number of references passed before the prefetched addresses were first referenced. The second column indicates the number of addresses (frequency) which realized the corresponding number of references value.

with Text\_IO, LinkedLists\_Package; package Generate\_Ref\_Frequency\_List\_Package is procedure Generate\_Ref\_Frequency\_List (SLC\_Ref\_List: in out List; CAM\_Ref\_List: in out List; SLC\_Non\_Ref: in out List; SLC\_Non\_Ref: in natural; Reference\_File: in out File\_Type); end Generate\_Ref\_Frequency\_List\_Package;

Figure 3.15: Generate Ref Frequency List Package Spec

# 3.5 Validation of Cache Simulator

Prior to being integrated into the cache simulator system, each Ada package was tested thoroughly as a unit. Using test inputs, the outputs of each package were checked for correctness. Verifying the correct functioning of the packages facilitated the integration effort.

Once the integration of the Ada packages was successfully completed, the cache simulator system was ready for testing. In order to ensure the validity of the research results, the simulator was extensively tested for accuracy. To accomplish this, a test trace file was created. The file contained 50 memory references which would require all cache processing functions to be used. Figure 3.16 provides an outline of the testing requirements. Using this trace file, the test simulations were designed to meet all of these requirements. In Appendix B, each test requirement is mapped to the point within the trace where it is tested.

Five test trace simulations were run and checked for accuracy. Each test run used a different set of cache parameters: SLC size, SLC block size, CAM size, and CAM block size. Prior to running the simulations, the cache performance statistics were manually calculated for each set of cache parameters. After the five test simulations were run, the simulation statistics matched the manual calculations.

Test Input Functions I. (Version 1 driver) Correct handling of memory reference record Α. Hexadecimal address (8 digits) 1. Hexadecimal address (< 8 digits) 2. Conversion to integer value 3. Type determination 4. (Version 2 driver) Correct handling of integer memory reference в. (Both) Able to process all memory references C. Test Cache Processing Functions II. Correct handling of SLC search Α. 1. SLC hit SLC miss 2. Correct handling of CAM search Β. CAM hit 1. 2. CAM miss On CAM hit, correct handling of SLC prefetch C. Before SLC fills 1. 2. After SLC fills 3. From middle of CAM 4. From top of CAM D. On CAM miss, correct handling of CAM prefetch and SLC Prefetch 1. Before CAM fills After CAM fills 2. Before SLC fills 3. After SLC fills 4. Correct tracking of cache performance stats E. (after 25 references) # of SLC misses 1. # of CAM misses 2. 3. total # of SLC references total # of CAM references 4. Test Output Functions III. A. Correct SLC & CAM miss rates Correct effective memory access time в. Correct SLC & CAM cache pollution percentages c. D. Correct values in reference frequency file

Figure 3.16: Simulator Testing Requirements

# 3.6 Summary

The cache simulator was designed to meet all the functional requirements of the proposed memory subsystem. Once developed, the simulator was subjected to rigorous validation testing and cross-checking. Based on this effort, it can be concluded that the cache simulator is capable of providing valid results for this research.

# Chapter 4

# Findings

This chapter provides the research results of the effect that spatial locality prefetching has on structural locality. First, the workload selection for studying the cache pollution in the SLC and CAM is discussed. Next, the modifications to Hobart's memory referencing models are shown to account for spatial prefetching. From these modified models, new equations are derived to predict the SLC and CAM hit probabilities. Next, the approach to using simulation measurements to solve these equations is explained. Results from these equations are then compared against hit ratios produced in the trace-driven simulations. The SLC and CAM hit probabilities are then used to estimate the effective memory access time of the design. Finally, the performance effects of spatial prefetching are compared with baseline results using no prefetching.

# 4.1 Workload Selection

As discussed in chapter two, the differences between the memory referencing behavior of symbolic and conventional workloads has been well documented. In order to produce meaningful results, cache performance studies must take these

differences into account.

The basic motivation for the proposed memory subsystem was to exploit the unique locality characteristics of symbolic workloads (Hobart, 1989:96). As such, this research focuses on symbolic workloads in determining the effects of cache pollution from spatial prefetching. The symbolic workloads used in the trace-driven simulations are shown in Table 4.1. Eight workloads were used from the Explorer traces. The last workload was taken from the VAX traces.

| Workload     | Application           | System      |
|--------------|-----------------------|-------------|
| Boyer        | Theorem Prover        | Explorer II |
| Compile-RB   | Lisp Compiler         | **          |
| Compile-STR  | Lisp Compiler         | 11          |
| GLISP-Comp   | Expert System Tool    | **          |
| GLISP-Pay    | Expert System Tool    | 85          |
| QSIM         | Qualitative Reasoning | 81          |
| Reducer      | Symbolic Computation  | 99          |
| TMYCIN       | Expert System Tool    | **          |
| LISP.0CO.DIN | Lisp Application      | VAX 8200    |

Table 4.1: Symbolic Workloads Used in Simulations

The nine selected workloads included all references. The rationale is to characterize the cache pollution effects of spatial prefetching against total workload behavior.

# 4.2 Trace-driven Simulations

To reasonably limit the number of simulations due to time constraints, the selection of the SLC and CAM size parameters is based on an estimated upper range of current cache technology. The CAM and SLC sizes are set at 8192 and 512 words, respectively. This 16 to 1 size ratio meets the minimum 8 to 1 ratio requirement. The cache parameters chosen for these simulations are shown in Table 4.2.

In order to study the cache pollution effects of spatial prefetching in the CAM, the CAM block size is fixed at 4 words. The rationale for this block size choice was based on the availability of data needed to solve the cache hit probability equations.

In studying cache performance in a RISC environment, Hill and Pnevmatikatos determined that a 32 byte block size (upper limit) produced the lowest miss ratios for a cache size of 32K

| Туре           | Number of<br>Words |
|----------------|--------------------|
| SLC Size       | 512                |
| CAM Size       | 8192               |
| SLC Block Size | 4, 8, 16, 32       |
| CAM Block Size | 4                  |

Table 4.2: SLC and CAM Cache Parameters

bytes (Hill and Pnev, 1990:53-68). The block size of four words (used for the CAM cache) was also found to produce low miss ratios. Using trace-driven simulation in their research, the effects of block sizes were analyzed using a variety of workloads including "Xlisp" (lisp interpreter with objectoriented features).

The three SLC block sizes represent 1:1, 2:1, 4:1, and 8:1 ratios to the CAM block size. The rationale for these block size selections is based on the memory subsystem design. The CAM is designed to capture the structural locality inherent in the workloads while spatially prefetching. In order to take advantage of this structural locality, the SLC should prefetch a multiple of the CAM block size.

The speed performance of the cache simulator is very acceptable. Assuming one user on the SPARC workstation, simulation run time varied from one half hour to two hours. This run time reflects simulations using trace files of up to 450,000 references. Importantly, this research showed the cache simulator could be structurally developed using the Ada language and, in turn, still provide a high level of performance.

A total of 45 simulations (five per workload) were run. Thirty-six simulations involved the prefetch ratios described above. The other nine used the same SLC and CAM sizes but did not use any prefetching. Therefore, the SLC and CAM block sizes were one word. The results from these simulations were

used as a baseline to determine any performance improvements.

The cache performance statistics generated from the trace simulations are shown in Table 4.3. The statistics represent the mean values for all workloads.

Table 4.3:Cache Performance StatisticsSLC Size = 512CAM Size = 8192SLC Block Size (varied)CAM Block Size = 4No Prefetch:SLC & CAM Block Sizes = 1

| SLC Block:    | 4       | 8          | 16    | 32 No     | Prefetch |  |
|---------------|---------|------------|-------|-----------|----------|--|
| SLC Hit Rate  | <u></u> |            |       |           |          |  |
| Mean:         | .859    | .880       | .893  | .908      | .820     |  |
| Std Dev:      | .061    | .055       | .051  | .044      | .074     |  |
| CAM Hit Rate  | (given  | SLC miss)  |       |           |          |  |
| Mean:         | .855    | .827       | .802  | .777      | .754     |  |
| Std Dev:      | .063    | .080       | .100  | .106      | .113     |  |
| SLC Pollution | .608    | .660       | .733  | .807      | .489     |  |
| CAM Pollution | (SLC    | hits count | ed as | pollution | in CAM)  |  |
|               | .771    | .799       | .819  | .828      | .561     |  |
| Eff Memory    | 4 02    | 4 95       | 4 70  | A 67      | 5 25     |  |
| Access Time*  | 4.93    | 4.00       | 4./9  | 4.0/      | 5.25     |  |

\* The equation for effective memory access time was explained in Chapter 3 (Section 3.2.4).

### 4.3 CAM Cache Hit Probability

Hobart developed a two-state Markov model to illustrate CAM cache referencing when prefetching is used (Hobart, 1989:100-106). As shown in Figure 4.1, the model uses state transitions to represent the probabilities of various types of




CAM cache referencing behavior.

In this research, the CAM is assumed to be referenced only when a SLC miss occurs. Thus, the CAM miss rate represents the local miss rate of the CAM. From this assumption, a CAM hit can occur in three ways. Given an old reference is currently being referenced, a hit can take place if the next reference is also old and has not been replaced out of the  $(1 - P_{ON}) W_C$  where  $(1 - P_{ON})$  is the probability of an CAM: old to old state transition;  $W_c$  is the effective cache size hit probability. Given a current new reference, a hit occurs if the next reference is old and has not been replaced out of the CAM:  $P_{ON} W_C$  where  $P_{NO}$  is the probability of a new to old state transition. In addition, given a current new reference, a hit can take place if the next reference is new and exists within the same block:  $P_{CP}$  which is the probability that a new to new reference is made to the block that was just The simplifying assumption is made that once prefetched. consecutive new references are made to a block and a new to old state transition occurs, any unreferenced addresses in the prefetched block(s) are considered pollution.

Thus, in the original model, when an old to new state transition occurs, the new reference is assumed to be to a previously unreferenced block. In order to more accurately predict the effects of spatial prefetching, a new model removes this assumption as shown in Figure 4.2 (modifications within dashed box). Given a current old reference, a hit can





take place if the next reference is new and is located within an existing prefetched block:  $P_{ON}$   $P_{CB}$  where  $P_{CB}$  is the probability of referencing a CAM prefetched block.

Using the modified Markov model, the following equation is derived to determine the expected CAM cache hit rate,  $P_{CAM}$ :

$$P_{CAM} = ((1 - P_{ON}) W_{C} + P_{ON} P_{CB}) p_{s_{0}} + (P_{NO} W_{C} + P_{CP}) p_{s_{1}}$$

where

$$p_{s_0} = \frac{P_{NO}}{P_{NO} + P_{ON}}$$

and

$$p_{s_1} = 1 - p_{s_0} = \frac{P_{ON}}{P_{NO} + P_{ON}}$$

Simplifying the equation, we have:

$$P_{CAM} = \frac{P_{NO} (W_{C} + P_{ON} P_{CB}) + P_{ON} P_{CP}}{P_{NO} + P_{ON}}$$

In order to show how the  $P_{CAM}$  equation is solved, an example using the performance statistics (Table 4.3) involving a CAM block size of 4 words is explained.

By analyzing the temporal distances of reference strings, Hobart determined the state transition probabilities of memory referencing behavior for the symbolic trace workloads (Hobart, 1989:40-41, 137). The mean state transition probabilities (shown in Table 4.4) will be used in the hit probability equation.

| Workload | New-Old<br>(NO) | Same Stack<br>Distance<br>(SSD) | Not Same<br>Stack Distance<br>(NSSD) | Old-New<br>(ON) |
|----------|-----------------|---------------------------------|--------------------------------------|-----------------|
| Boyer    | .506            | .474                            | .502                                 | .024            |
| Comp-RB  | .382            | .562                            | .423                                 | .015            |
| Comp-STR | .383            | .544                            | .438                                 | .018            |
| GLISP-C  | .478            | .623                            | .361                                 | .016            |
| GLISP-P  | .250            | .588                            | .407                                 | .005            |
| QSIM     | .457            | .444                            | .544                                 | .012            |
| Reducer  | .137            | .540                            | .454                                 | .006            |
| TMYCIN   | .378            | .626                            | .364                                 | .010            |
| Mean     | .371            | .550                            | .436                                 | .013            |
| Std Dev  | .1235           | .0653                           | .0634                                | .0063           |

Table 4.4: State Transition Probabilities - All References

 $W_C$  is a function of the CAM size. The probability that an old reference exists in the CAM is limited by the finite size of the CAM. Since spatial prefetching results in pollution, the effective size of the CAM is reduced. Effective CAM size is calculated as follows:

Eff CAM Size =  $N_c$  (1 -  $C_p$ )

where

 $N_c = CAM \ size \ (words)$ 

 $C_p = CAM pollution mean$ 

In the example, effective CAM size is:

Eff CAM Size = 8192 (1 - .771) = 1876 words

Analyzing the cumulative temporal locality characteristics of symbolic workloads, Hobart mapped the effective CAM sizes to corresponding hit probabilities on old-old transitions (Hobart, 1989:103). Using this graph (Appendix C), the  $W_C$  can be estimated for the effective CAM size of 1876 words:

$$W_{c} = 0.96$$

In order to calculate the probability of referencing a prefetched CAM block,  $P_{CB}$ , we must determine the number of old to new reference hits resulting from references to previously prefetched CAM blocks. First, the total number of additional references accessed within a CAM prefetched block,  $A_R$ , is calculated. From the CAM pollution  $(C_p)$  resulting from no prefetching (Table 4.3),  $(1 - C_p)$  or 43.9% of the demand-fetched references are rereferenced on average. We can note that if none of the prefetched words were referenced, the CAM pollution would be:

$$C_p = 1 - \frac{.439}{4} = 0.890$$

However, if all three of the prefetched references are, in fact, referenced, then the CAM pollution would be:

$$C_p = 1 - (\frac{.439}{4} + \frac{3}{4}) = 0.140$$

Our actual cache pollution must be between these two extremes. The following equation can be formed to determine the average number of additional references,  $A_B$ :

$$C_P = 1 - (\frac{(1 - C_P)}{B_C} + \frac{A_R}{B_C})$$

Solving for  $A_p$  gives:

$$A_{R} = (B_{C} - 1) (1 - C_{P})$$

Using this equation in our example, we have:

$$A_R = (3)(1 - .771) = 0.687$$

Therefore, 0.687 additional references are referenced in the CAM block on average.

Next, the number of references occurring within a CAM block during new to new referencing must be determined. Using a probability decision tree and summing all expected value outcomes, 1.41 references were found to be referenced during new to new referencing within a prefetched block of 4 words. Thus, only 0.41 out of the additional three references are referenced on average during new to new transitions. However, when the CAM spatially prefetches a block from main memory, all or a portion of the CAM block will also be prefetched to the SLC depending on where the requested address is located within the CAM block. Since the forward references (above the requested address) are prefetched to the SLC, new to new referencing within the CAM can only occur from those references within the CAM block which were not prefetched to the SLC (below the requested address). For example, if all four references in the CAM block are prefetched into the SLC, then no new to new references from that block can take place in the CAM. At the other extreme, if only the requested

address in the CAM block is prefetched to the SLC, then 0.41 new to new references from that block can take place in the CAM. The actual number of new to new references within the CAM block must be between these two extremes. Assuming onehalf of the CAM block is prefetched into the SLC on average, we can also assume that only 0.2 additional references of the 3 prefetched references is referenced before transitioning back to the old reference state.

From the CAM pollution which occurs with no prefetching, we know that 0.439 references are accessed on average. By subtracting 0.439 references and the 0.2 new-new references from the total number of additional references (0.687), the number of old-new reference hits resulting from references to previously prefetched CAM blocks is 0.048 references.

Based on the state transition probability,  $P_{ON}$  (.013), 13 out of every 1000 references can be expected to be old to new transitions. In this research, simulations revealed that if a prefetched address is to be referenced, it is first referenced within 256 memory accesses after being prefetched. Therefore, we can expect 3.328 (.013 x 256) to be referenced during the next 256 transitions.

The probability of referencing a previously prefetched CAM block,  $P_{CB}$ , can now be determined by dividing the number of old-new reference hits (0.048) by the expected number of old-new references during the "lifetime" of a prefetched reference (3.328) times the expected number of new-new

references (1.2) that will occur in the new reference state:

$$P_{CB} = \frac{0.048}{3.994} = 0.012$$

To calculate the probability of a CAM prefetch block reference during a new to new transition  $(P_{CP})$ , the expected value of number of new to new references,  $U_{CP}$ , can be set equal to the 0.2 additional new to new references.  $U_{CP}$  is treated as a sum of geometric series using  $P_{CP}$ . From this,  $P_{CP}$ can be calculated as follows:

$$U_{CP} = \frac{P_{CP}}{(1 - P_{CP})} = 0.2$$

Therefore

$$P_{CP} = \frac{.2}{1+.2} = 0.167$$

Substituting the state transition probabilities,  $W_C$ ,  $P_{CB}$ , and  $P_{CP}$ , the CAM hit probability equation can be solved:

$$P_{CAM} = \frac{.371 (.96 + (.013) (.012)) + (.013) (.167)}{.384} = 0.932$$

In table 4.5, the CAM cache hit probability computations are compared with the measured mean values from the simulations. As shown, the predicted hit rates are slightly outside one standard deviation range from the measured hit rates. The calculated hit rates consistently overestimate the measured means. To account for this difference, future research may investigate how the effective CAM size may be further reduced

| SLC Size<br>SLC Bloc          | e = 512<br>ck (varied) |              | CAM Size = 8192<br>CAM Block = 4 |              |
|-------------------------------|------------------------|--------------|----------------------------------|--------------|
| SLC Block                     | 4                      | 8            | 16                               | 32           |
| Equation                      | .932                   | .924         | .904                             | .893         |
| Measured<br>Mean:<br>Std Dev: | .855<br>.063           | .827<br>.080 | .802<br>.100                     | .777<br>.106 |

Table 4.5: CAM Cache Hit Probability Comparisons

from spatial prefetching. The resulting smaller effective CAM size would decrease the calculated CAM hit probability.

#### 4.4 Structural Locality Cache Hit Probability

Hobart developed a two-state Markov model to predict SLC hit rates when structural locality prefetching is used (Hobart, 1989:107-112). As shown in Figure 4.3, the model uses state transitions to represent the probabilities of various types of SLC referencing behavior. This model assumes no spatial locality prefetching.

A SLC hit can occur in two ways. Given a current old reference, a hit can take place if the next reference is also old and has not been replaced in the SLC. Since the SLC employs structural prefetching, the probability of this type of hit is calculated using the SSD and NSSD state transitions:  $P_{SSD} + P_{NSSD} W_S$  where  $P_{SSD}$  is the probability of an old to SSD state transition;  $P_{NSSD}$  is the probability of an old to NSSD state transition;  $W_S$  is effective cache size hit probability.





The other way a SLC hit can occur is during a new to old state transition:  $P_{NO} W_S$ . This probability is similar to the hit probability in the CAM model. Unlike the CAM model, a hit cannot occur on a new to new state transition since this model assumes that no spatial locality prefetching is being used in the CAM.

To incorporate the effects of spatial locality prefetching, my model adds two state transitions which can produce hits. As shown in Figure 4.4, these hit probabilities are similar to the corresponding probabilities in the modified CAM model (Figure 4.2).  $P_{SB}$  is the probability of referencing spatially prefetched CAM data during an old to new transition.  $P_{SP}$  is the probability of a reference to the same CAM block during a new to new reference.

Using the modified Markov model, the following equation is derived to determine the expected SLC hit rate,  $P_{SLC}$ :

$$P_{SLC} = ((P_{SSD_{A}} + P_{NSSD} W_{S}) + P_{SB}P_{ON}) \frac{P_{NO}}{P_{NO} + P_{ON}} + (P_{NO}W_{S} + P_{SP}) \frac{P_{ON}}{P_{NO} + P_{ON}}$$

which simplifies to

$$P_{SLC} = \frac{P_{NO} (P_{SSD_A} + (P_{NSSD} + P_{ON}) W_S + P_{ON} P_{SB}) + P_{ON} P_{SP}}{P_{NO} + P_{ON}}$$

In order to determine  $W_S$ , the effective SLC size must be calculated as follows:



Figure 4.4: Modified Markov Model for SLC Referencing with Structural Locality Prefetching and Spatial Locality Prefetching

Eff SLC Size = 
$$N_s$$
 ( $E_{NN}$  +  $E_{ss}$ )

where

 $N_s = SLC \ size \ (words)$ 

# E<sub>NN</sub> = expected % of new-new reference hits occurring within the most recently prefetched SLC block

Given the assumption that on average one-half of the CAM prefetched block (on a CAM miss) is prefetched into the SLC, the expected number of new-new references  $(E_{NN})$  is 1.2 out of 4 references (0.3). Thus, only 0.2 of the three additional prefetched SLC references are accessed on average.

 $E_{SS}$  is calculated as follows. First, the total number of additional references used within a SLC block,  $A_R$ , is determined (discussed in Section 4.3):

 $A_{R} = (B_{S} - 1) (1 - S_{P})$ 

where

 $B_s = SLC \ block \ size \ (words)$ 

 $S_{p} = SLC pollution mean$ 

Using the SLC block size of four words, we have:

$$A_p = 3(.392) = 1.176$$

From the SLC pollution occurring with no prefetching, we know that 0.511 demand-fetched references are rereferenced on

average. Subtracting the 0.511 references and the 0.2 additional new-new references from the total number of additional references (1.176), the number of additional oldnew reference hits resulting from references to previously prefetched SLC blocks is 0.465. From this,  $E_{SS}$  can be determined:

$$E_{ss} = \frac{.465}{4} = .116$$

We can now solve for the effective CAM size:

 $W_S$  can now be estimated (Appendix C) for 213 words:

$$W_{s} = 0.67$$

 $P_{SB}$  is calculated in the same manner as  $P_{CB}$  is calculated for the CAM cache. It is determined by dividing the number of old-new reference hits (.465) by the expected number of oldnew references during the "lifetime" of a prefetched reference (3.328) times the expected number of new-new references (1.2) that will occur in the new reference state:

$$P_{SB} = \frac{.465}{3.994} = 0.116$$

Since SLC blocks prefetch CAM blocks,  $P_{SB}$  remains constant for all SLC block sizes.

Given consecutive new-new references only occur within a given block,  $P_{SP}$  is calculated in a manner similar to  $P_{CP}$ :

$$P_{SP} = \frac{.2}{(1 + .2)} = 0.167$$

Substituting the state transition probabilities,  $W_S$ ,  $P_{SB}$ , and  $P_{SP}$ , the SLC hit probability equation can be solved:

$$P_{SLC} = \frac{.371(.48 + (.449).67 + (.013).116) + (.013).167}{.384} = 0.761$$

In table 4.6, the SLC hit probability computations are compared with the measured mean values from the simulations. As shown, the predicted hit rates are relatively close to the measured means. Hit rates for the SLC block sizes of 8 and 16 words fall within one standard deviation of the measured means. Hit rates for the SLC block sizes of 4 and 32 words are slightly outside one standard deviation range. The calculated rates consistently underestimate the measured rates. Although the overwhelming majority of SLC prefetches take place from the CAM (allowing for a full SLC block), the portion of SLC prefetches initiated from main memory may still have an impact on pollution. Reduced pollution can increase the effective size of SLC. And, in turn, increase the calculated SLC hit, probability.

| SLC Size = 512<br>SLC Block (varied) |              | CAM Size = 8192<br>CAM Block = 4 |              |              |
|--------------------------------------|--------------|----------------------------------|--------------|--------------|
| SLC Block                            | 4            | 8                                | 16           | 32           |
| Equation                             | .761         | .836                             | .846         | .839         |
| Measured<br>Mean:<br>Std Dev:        | .859<br>.061 | .880<br>.055                     | .893<br>.051 | .908<br>.044 |

Table 4.6: SLC Hit Probability Comparisons

In addition, as in the case of the CAM cache, future research may also investigate the possibility that additional SLC hits may occur as a result of new-new references which take place within multiple blocks. These additional hits would be represented by another new-new state transition. The resulting SLC hits would increase the calculated SLC hit probability.

#### 4.5 Performance Analysis

The effective memory access times were measured for the four sets of SLC and CAM cache block parameters (Table 4.3). As presented in Chapter 3, the effective memory access time,  $t_a$ , is calculated as follows:

$$t_{a} = t_{SLC}P_{SLC} + t_{CAM}(1 - P_{SLC})P_{CAM} + t_{MEM}(1 - P_{CAM})(1 - P_{SLC})$$

where the cycle times for the SLC, CAM, and main memory are 1, 4, and 32 clock cycles, respectively.

To serve as a baseline comparison, the effective memory access time is used for the same SLC and CAM cache sizes when no prefetching is involved. From Table 4.3, the measured mean effective memory access time for no prefetching is 2.822. Calling this the baseline access time Tb, the performance speedup S due to prefetching is defined as (Hobart, 1989:113):

$$S = \left(\frac{t_b}{t_n} - 1\right) 100 \qquad (percent)$$

As shown in Table 4.7, the speedup due to spatial and

structural prefetching occurred for all SLC block sizes. For this set of parameters, increasing the SLC block size results in performance improvement. As the SLC block size nears the SLC size, performance gains would be expected to decrease. In this set, the largest SLC block of 32 words did not adversely impact the SLC hit ratio since a sufficient number of blocks (16) could still be stored in the SLC. As a result, it produced the smallest effective memory access time.

Table 4.7: Speedup Due to Spatial and Structural Prefetching

SLC Size = 512 CAM Size = 8192 No Prefetch Tb = 5.25

| CAM Prefetch Block = 4 |       |  |  |  |
|------------------------|-------|--|--|--|
| SLC Block              | S (%) |  |  |  |
| 4                      | 6.5   |  |  |  |
| 8                      | 8.2   |  |  |  |
| 16                     | 9.6   |  |  |  |
| 32                     | 12.4  |  |  |  |

#### 4.6 Summary

This chapter has shown how Hobart's memory referencing models were modified to incorporate the effects of spatial prefetching in the CAM and structural prefetching in the SLC. CAM and SLC hit probability equations were derived from these models. Using the measurements from the trace-driven simulations, the expected hit rates were calculated for both caches using different sets of block size parameters. In particular, cache pollution estimates were used to analyze the effects of spatial prefetching on structural locality.

.

#### Chapter 5

#### Conclusion and Recommendations

#### 5.1 Main Contributions

The principal contribution of this research is characterizing the effects of spatial prefetching on structural locality in the proposed memory subsystem. This research shows that performance gains through structural locality prefetching are still possible even when spatial locality prefetching is being used in the lower level cache.

Existing memory referencing models are modified to incorporate the combined use of structural locality and spatial locality prefetching. From these models, equations are derived to predict the expected hit rates of the SLC and CAM caches. Targeting a representative sample of symbolic workloads, trace-driven simulations provide performance measurements for different combinations of SLC and CAM cache block sizes. Combined with the state transition probabilities of the modified Markov memory referencing models, these measurements are used to solve the hit probability equations.

The purpose of this research is not to exhaustively measure performance for all combinations of cache parameters. Instead, it provides an extended memory referencing model to analytically predict the hit rates and the effective memory

access times for a range of SLC block sizes.

The experimental methodology for this research is also a significant contribution. The cache simulator provides a reliable tool for not only meeting the demands of this research, but also future research as well. Built using the Ada language, the simulator is designed to facilitate tailoring it to meet new research requirements.

#### 8.2 Additional Applications of This Research

Several approaches exist for continuing this research. First, the modified memory referencing models should be applied to other types of workloads such as numeric and dataprocessing. Expanding the applicability to various workload types will increase the usefulness of the models. In addition, the effects of varying other parameters besides SLC block size should be examined.

Another research application is to further analyze the temporal behavior within the SLC and CAM cache. Specifically, it would be useful to differentiate between prefetched instructions and data. Further investigation is also required to characterize this temporal behavior over a wide range of cache parameters. The cache simulator has already been designed to provide data for this analysis. As described in Chapter 3, the Generate\_Ref\_ Frequency\_Package outputs a reference file containing temporal reference information for both caches.

Another research course would be to study the impact of combined structural locality and spatial locality prefetching on individual reference types: instruction fetches, data reads, and data writes.

## Appendix A: Cache Simulator Source Code

. . . .

A-1

```
package LinkedLists Package is
   type LinkedListNode;
   type NodePointer is access LinkedListNode;
   type LinkedListNode is
      record
         Addr : integer;
         NumRef : integer := 0;
         Next : NodePointer := null;
      end record;
   type List is
      record
         Next: NodePointer := null;
         Tail: NodePointer := null;
      end record;
-- create a node (address to be stored in cache)
   function MakeNode (Address: integer) return NodePointer;
-- find node containing matching address with Pl
   function Search (L: List; Address: integer)
                    return NodePointer;
-- add node P1 to front of list
   procedure AddToFront (L: in out List; P1: NodePointer);
-- add node P1 to rear of list
   procedure AddToRear (L: in out List; Pl: NodePointer);
-- insert node P1 in order by number of passed references
___
      before being first referenced
procedure Insert_In_Order (L: in out List; P1: NodePointer);
-- delete node P1 from list
   procedure Delete (L: in out List; P1: NodePointer);
end LinkedLists Package;
```

```
with Unchecked Deallocation;
package body LinkedLists Package is
   procedure Deallocate Node is new Unchecked Deallocation
             (Object => LinkedListNode,
                    => NodePointer);
              Name
   function MakeNode (Address: integer)
                       return NodePointer is
      p: NodePointer;
   begin
      p := new LinkedListNode;
      p.Addr := Address;
      return p;
   end MakeNode;
                                                             • •
   function Search (L: List; Address: integer)
                    return NodePointer is
      p: NodePointer := L.Next;
   begin
      while p /= null and then p.Addr /= Address loop
         p :≈ p.Next;
      end loop;
      return p;
   end Search;
   procedure AddToFront (L: in out List; P1: NodePointer) is
   begin
      Pl.Next := L.Next;
      L.Next := P1;
      if L.Tail = null then L.Tail := P1; end if;
   end AddToFront;
   procedure AddToRear (L: in out List; P1: NodePointer) is
   begin
      if L.Next = null then
         AddToFront (L, Pl);
      end if;
      L.Tail.Next := Pl;
      L.Tail := P1;
   end AddToRear;
   procedure Insert_In_Order (L: in out List; P1: NodePointer) is
      p: NodePointer := L.Next;
      q: NodePointer;
   begin
      if p = null then
         AddToFront (L, P1);
      elsif p.NumRef > Pl.NumRef then
         AddToFront (L, P1);
      else
         while p /= null and then
            p.NumRef < P1.NumRef + 1 loop</pre>
            q := p;
            p := p.Next;
         end loop;
         if p = null then
            AddToRear (L, P1);
         else
            q.Next := P1;
            P1.Next := p_i
                                       A-3
         end if;
      end if;
   end Insert_In_Order;
```

```
procedure Delete (L: in out List; Pl: NodePointer) is
    p: NodePointer := L.Next;
    q: NodePointer := null;
begin
    if Pl = p then L.Next := Pl.Next;
    else
        while p /= null and then p /= Pl loop
        q := p;
        p := p.Next;
        end loop;
        q.Next := Pl.Next;
        Deallocate_Node (p);
    end if;
        if L.Tail = Pl then L.Tail := q; end if;
end Delete;
end LinkedLists_Package;
```

generic

```
type Object is limited private;
type Name is access Object;
```

procedure Unchecked\_Deallocation (x: in out Name);

| package CircularQ_Pa                                                           | ackage is                                                                                                                                 |
|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| type index is rang<br>type Array_Type is<br>type Array_Ptr_Typ                 | ge 1 32768;<br>s array (index) of integer;<br>be is access Array_Type;                                                                    |
| type Queue is<br>record<br>Address<br>Ref_Count<br>head<br>tail<br>end record; | <pre>: Array_Ptr_Type := new Array_Type;<br/>: Array_Ptr_Type := new Array_Type;<br/>: index;<br/>: index;</pre>                          |
| procedure Enqueue                                                              | (Q: in out Queue;<br>CAM_Size_Index: in index;<br>Reference: in integer);                                                                 |
| procedure Dequeue                                                              | (Q: in out Queue;<br>CAM_Size_Index: in index);                                                                                           |
| procedure SearchQ                                                              | <pre>(Q: in Queue;<br/>Reference: in integer;<br/>CAM_Size_Index: in index;<br/>Position: in out index;<br/>Found: in out boolean);</pre> |

end CircularQ\_Package;

.

```
package body CircularQ Package is
   procedure Enqueue (Q: in out Queue;
                      CAM_Size_Index: in index;
                      Reference: in integer) is
   begin
      Q.tail := (Q.tail mod CAM_Size_Index) + 1;
      Q.Address (Q.tail) := Reference;
      Q.Ref_Count (Q.tail) := 0;
   end Enqueue;
   procedure Dequeue (Q: in out Queue;
                      CAM_Size_Index: in index) is
   begin
      Q.head := (Q.head mod CAM Size Index) + 1;
   end Dequeue;
   procedure SearchQ (Q: in Queue;
                      Reference: in integer;
                      CAM_Size_Index: in index;
                      Position: in out index;
                      Found: in out boolean) is
   begin
      for i in 1 .. CAM_Size_Index loop
         Position := (Position mod CAM Size Index) + 1;
         if Q.Address (Position) = Reference
            then Found := true;
            exit;
         end if;
      end loop;
   end SearchQ;
end CircularQ_Package;
```

package Addr\_Record\_Package is

```
type Addr_Record is
    record
    The_Type : character;
    Address : integer;
    end record;
```

.

-

end Addr\_Record\_Package;

LinkedLists Package, CircularQ Package, Fetch Address Package, Determine Type Package, Serv\_Instr\_Fetch\_Package, Serv\_Data\_Read\_Package, Serv\_Data\_Write\_Package, Compute\_Miss Ratios Package, Compute Memory Access Time Package, Compute Cache Pollution Package, Generate Ref Frequency List Package; Text IO, Addr Record Package, se LinkedLists Package, CircularQ Package, Fetch\_Address\_Package, Determine\_Type Package, Serv\_Instr\_Fetch\_Package, Serv\_Data\_Read\_Package, Serv\_Data Write\_Package, Compute Miss Ratios Package, Compute Memory Access Time Package, Compute Cache Pollution Package, Generate Ref Frequency List Package; procedure Cache Simulator is package Type Integer IO is new integer IO (integer); use Type\_Integer\_IO; package Index Integer IO is new integer IO (index); use Index Integer IO; Input\_File : File\_Type; Output\_File : File\_Type; Reference\_File : File\_Type; Memory\_Ref : Addr\_Record; Num\_Ref : natural := 0; Total\_Num\_Ref : natural := 0; Type\_Ref : character; SLC : List; : List; SLC : List; CAM : Queue; SLC\_Ref\_List : List; CAM\_Ref\_List : List; SLC\_Miss : natural := 0; CAM\_Miss : natural := 0; SLC\_Total\_Refs : natural := 0; CAM\_Total\_Refs : natural := 0; Temp\_SLC\_Size : natural := 0; Temp\_CAM\_Size : natural := 0; SLC\_Non\_Ref : natural := 0; SLC\_Total\_Prefetch : natural := 0; CAM\_Total\_Prefetch : natural := 0; SLCCAM\_Total\_Prefetch : natural := 0; SLC\_MR, CAM\_MR : float; CAM\_Size\_Index : index; In\_String : string (1..15); In\_Length : natural; New\_Space : natural := 1; Last\_Space : natural := 0; -- \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\* CACHE PARAMETERS SLC\_Size : natural; CAM\_Size : natural; SLC\_Line\_Size : natural; CAM\_Line\_Size : natural; begin new line; put ("Please enter the following:"); new line(2);

ith Text IO, Addr Record Package,

put ("Statistics Filename: "); get\_line (In\_String, In\_Length); while ((New\_Space < In\_Length) and (In\_String (New\_Space .. New\_Space) /= " ")) loop New Space := New Space + 1; end loop; create (Output File, Out File, In\_String ((Last\_Space + 1) .. New Space)); New Space := 1; Last Space := 0; put ("Reference Filename: "); get\_line (In\_String, In\_Length);
while ((New\_Space < In\_Length) and</pre> (In\_String (New\_Space .. New\_Space) /= " ")) loop New Space := New Space + 1;end loop; create (Reference File, Out File, In String ((Last Space + 1) .. New Space)); New\_Space := 1; Last\_Space := 0; put ("Trace Filename: "); get line (In String, In Length); while ((New Space < In Length) and (In\_String (New\_Space .. New\_Space) /= " ")) loop New\_Space := New\_Space + 1; end loop; open (Input File, In File, In String ((Last\_Space + 1) .. New\_Space)); new\_line; put ("\*\*\* Cache Parameters \*\*\*"); new\_line; put ("SLC Size: "); get (SLC\_Size); put ("SLC Line Size: "); get (SLC Line Size); put ("CAM Size: "); get (CAM\_Size); put ("CAM Line Size: "); get (CAM Line Size); new line; -- Heading info: stats & reference files. \_\_\_\_\_\_ put (Output\_File, "Address Trace: "); put (Output\_File, In\_String((Last\_Space+1)..New\_Space)); put (Reference\_File, "Address Trace: ");
put (Reference\_File, In\_String((Last\_Space+1)..New\_Space)); set\_line (Output\_File, 2); set\_line (Reference\_File, 2); put (Output\_File, "SLC Size: "); put (Output\_File, SLC\_Size); put (Output\_File, " SLC Line: "); put (Output\_File, SLC\_Line\_Size); put (Reference\_File, "SLC Size: ·"); put (Reference\_File, SLC\_Size); put (Reference\_File, " SLC Line: "); put (Reference\_File, SLC\_Line\_Size); set\_line (Output\_File, 3); set\_line (Reference File, 3); put (Output\_File, "CAM Size: put (Output\_File, CAM\_Size); "); put (Output\_File, " CAM Line: "); put (Output File, CAM Line Size); put (Reference\_File, "CAM Size: "); put (Reference\_File, CAM\_Size); \*\* "); put (Reference File, CAM Line: A-10 put (Reference File, CAM Line Size); set\_line (Output\_File, 5); set line (Reference File, 5);

```
- Initialize CAM head & tail to CAM size.
   _____
                 _____
CAM Size Index := index (CAM Size);
CAM.head := CAM Size Index;
CAM.tail := CAM Size Index;
                           while not End_Of_File (Input_File) loop
   Load Record (Input File, Memory Ref);
   Type Ref := Address Type (Memory Ref);
   loop
      case Type Ref is
         when 0' = 
           Num Ref := Num Ref + 1;
            Serv Data Read (Input File, Memory Ref,
                  SLC, CAM, SLC_Ref_List, CAM_Ref_List,
                  SLC Miss, CAM Miss,
                 SLC_Total_Refs, CAM_Total_Refs,
                  SLC_Non_Ref, CAM_Non Ref,
                  SLC Total Prefetch, CAM Total Prefetch,
                  SLC_Size, CAM_Size, SLC_Line Size,
                 CAM Line_Size, Temp SLC_Size,
                  Temp CAM Size); exit;
        when 1' = 
           Num Ref := Num_Ref + 1;
            Serv Data Write (Input File, Memory Ref,
                 SLC, CAM, SLC_Ref_List, CAM_Ref_List,
                  SLC Miss, CAM_Miss,
                 SLC_Total_Refs, CAM_Total_Refs,
                 SLC_Non_Ref, CAM_Non_Ref,
                 SLC_Total Prefetch, CAM Total Prefetch,
                 SLC Size, CAM Size, SLC Line Size,
                 CAM Line_Size, Temp_SLC_Size,
                 Temp CAM Size); exit;
        when 2' \Rightarrow
           Num_Ref := Num_Ref + 1;
           Serv_Instr Fetch (Input File, Memory Ref,
                 SLC, CAM, SLC_Ref_List, CAM Ref List,
                 SLC_Miss, CAM Miss,
                 SLC_Total_Refs, CAM Total Refs,
                 SLC_Non_Ref, CAM_Non_Ref,
                 SLC_Total_Prefetch, CAM_Total Prefetch,
                 SLC Size, CAM Size, SLC Line Size,
                 CAM_Line_Size, Temp_SLC_Size,
                 Temp CAM Size); exit;
        when others => exit;
      end case;
  end loop;
   if Num Ref = 20000 then
     Total_Num_Ref := Total_Num_Ref + Num_Ref;
      -- Compute SLC & CAM miss ratios
                                                        A-11
           for every 20000 memory references
      Compute Miss Ratios (SLC Miss, CAM Miss,
           SLC_Total_Refs, CAM Total Refs,
```

```
Total_Num_Ref, SLC_MR,
          CAM_MR, Output_File);
      Num Ref := 0;
    elsif End Of File (Input File) then
      Total Num Ref := Total Num Ref + Num Ref;
      _____
      -- Compute final SLC & CAM miss ratios
      Compute Miss Ratios (SLC Miss, CAM Miss,
          SLC Total Refs, CAM Total Refs,
          Total Num Ref, SLC MR,
          CAM MR, Output File);
      -- Compute average memory access time
      Compute_Memory_Access_Time (SLC_MR, CAM_MR,
                          Output File);
       -- Compute SLC & CAM pollution
                            Compute_Cache_Pollution (SLC, CAM, SLC_Non_Ref,
        CAM_Non_Ref, SLC_Total_Prefetch, CAM_Total_Prefetch,
        Temp_CAM_Size, Output_File);
      -- Generate reference frequency list
      Generate Ref Frequency List (SLC_Ref_List,
        CAM_Ref_List, SLC_Non_Ref, CAM_Non_Ref,
        Reference File);
                   .
__________
      new line (2);
      put line ("***** End of Simulation *****");
      exit;
    end if;
  end loop;
close (Input_File);
close (Output File);
close (Reference File);
```

end Cache\_Simulator;

vith Addr\_Record\_Package, LinkedLists\_Package, CircularQ\_Package, Text\_I0; lse Addr\_Record\_Package, LinkedLists\_Package, CircularQ\_Package, Text\_I0;

package Serv\_Data\_Read\_Package is

| procedure Serv_Data_Read |   |    |                       |
|--------------------------|---|----|-----------------------|
| (Input_File              | : | in | <pre>File_Type;</pre> |
| Memory_Ref               | : | in | Addr_Record;          |
| SLC                      | : | in | out List;             |
| CAM                      | : | in | out Queue;            |
| SLC_Ref_List             | : | in | out List;             |
| CAM_Ref_List             | : | in | out List;             |
| SLC_Miss                 | : | in | out natural;          |
| CAM_Miss                 | : | in | out natural;          |
| SLC_Total_Refs           | : | in | out natural;          |
| CAM_Total_Refs           | : | in | out natural;          |
| SLC_Non_Ref              | : | in | out natural;          |
| CAM_Non_Ref              | : | in | out natural;          |
| SLC_Total_Prefetch       | : | in | out natural;          |
| CAM_Total_Prefetch       | : | in | out natural;          |
| SLC_Size                 | : | in | natural;              |
| CAM_Size                 | : | in | natural;              |
| SLC_Line_Size            | : | in | natural;              |
| CAM_Line_Size            | : | in | <pre>natural;</pre>   |
| Temp_SLC_Size            | : | in | out natural;          |
| Temp_CAM_Size            | : | in | out natural);         |

end Serv\_Data\_Read\_Package;

### package body Serv\_Data\_Read\_Package is

| procedure Serv Data Read                                                                 |
|------------------------------------------------------------------------------------------|
| (Input File : in File Type:                                                              |
| Memory Ref : in Addr Record;                                                             |
| SLC : in out List;                                                                       |
| CAM : in out Queue;                                                                      |
| SLC_Ref_List : in out List;                                                              |
| CAM_Ref_List : in out List;                                                              |
| SLC_Miss : in out natural;                                                               |
| CAM Miss : in out natural;                                                               |
| CAM Total Refs : in out natural;                                                         |
| SLC Non Ref : in out natural:                                                            |
| CAM Non Ref : in out natural:                                                            |
| SLC Total Prefetch : in out natural;                                                     |
| CAM Total Prefetch : in out natural;                                                     |
| SLC_Size : in natural;                                                                   |
| CAM_Size : in natural;                                                                   |
| SLC_Line_Size : in natural;                                                              |
| CAM_Line_Size : in natural;                                                              |
| Temp_SLC_Size : in out natural;                                                          |
| Temp_CAM_Size : in out natural) is                                                       |
| <pre>package Index_Integer_IO is new integer_IO (index);<br/>use Index_Integer_IO;</pre> |
| Reference : Addr Becord.                                                                 |
| Temp Ndl. Temp Nd2 : NodePointer:                                                        |
| Temp Nd3 : NodePointer;                                                                  |
| Position : index := CAM.head;                                                            |
| Found : boolean := false;                                                                |
| k : natural := 1;                                                                        |
| CAM_Prefetch_Addr : integer;                                                             |
| n,o,p : integer;                                                                         |
| CAM_Size_Index, j : index;                                                               |
| Temp_CAM_Tall : index;                                                                   |
| begin                                                                                    |
| *****                                                                                    |
| Search for memory reference in the SLC.                                                  |
| ~~ ***********************************                                                   |
| <pre>SLC_Total_Refs := SLC_Total_Refs + 1;</pre>                                         |
| Temp Ndl := SLC.Next;                                                                    |
| while Temp Nd1 /= null loop                                                              |
| if Temp_Ndl.NumRef /= -1 then                                                            |
| <pre>Temp_Ndl.NumRef := Temp_Ndl.NumRef + 1;</pre>                                       |
| end if;                                                                                  |
| <pre>Temp_Ndl := Temp_Ndl.Next;</pre>                                                    |
| end loop;                                                                                |
| n :- Momory Pef Address;                                                                 |
| Temp_Ndl := MakeNode (Memory_Ref.Address);                                               |
| <pre>Temp_Nd2 := Search (SLC, n);</pre>                                                  |
| *************                                                                            |
| If the address is found in the SLC,                                                      |
| then delete the address in the cache                                                     |
| (linked list) and add it to the front                                                    |
| or the list (most recently used).                                                        |
| ***** <b>***</b> ***************************                                             |

A-14

if Temp\_Nd2 /= null then
```
AddToFront (SLC, Temp_Ndl);
  SLC.Next.NumRef := -1;
  if Temp Nd2.NumRef /= -1 then
     Temp Nd3 := MakeNode (Temp Nd2.Addr);
     Temp Nd3.NumRef := Temp Nd2.NumRef;
     Insert In Order (SLC Ref List, Temp Nd3);
  end if;
  Delete (SLC, Temp Nd2);
else
__ ***********
-- Since a miss has occurred in the SLC,
-- search for the address in the CAM.
SLC_Miss := SLC_Miss + 1;
  CAM_Total Refs := CAM_Total_Refs + 1;
  CAM_Size Index := index (CAM_Size);
  if Temp CAM Size /= 0 then
     for i in 1 .. Temp CAM Size loop
        j := index (i);
        if CAM.Ref_Count (j) /= -1 then
           CAM.Ref_Count (j) := CAM.Ref_Count (j) + 1;
        end if;
     end loop;
  end if;
  SearchQ (CAM, n, CAM Size Index, Position, Found);
-- If the address is found in the CAM, then
-- prefetch a line from the CAM into the SLC.
-- The SLC line is comprised of the requested
-- address + the addresses located after the
-- requested address (total equal to the SLC
-- line size). This action represents a
-- spatial prefetch of the structural
-- locality captured in the CAM.
if Found = true then
                              -- address found in CAM
     if CAM.Ref Count (Position) /= -1 then
        Temp Nd1 := MakeNode (CAM.Address (Position));
        Temp_Ndl.NumRef := CAM.Ref_Count (Position);
        Insert_In_Order (CAM_Ref_List, Temp_Ndl);
        CAM.Ref_Count (Position) := -1;
     end if;
     while Position /= CAM.tail
        and then k /= SLC_Line_Size loop
           k := k + 1;
          Position := Position mod CAM Size Index + 1;
        end loop;
     for i in 1 .. k loop
     __ *****************************
     -- If the SLC is full, then delete the
     -- address located at the rear of the list
     -- (LRU replacement).
                                                     A-15
     -- ****
                  ******
        if Temp_SLC_Size = SLC_Size then
```

```
if SLC.tail.NumRef /= -1 then
          SLC Non Ref := SLC Non Ref + 1;
       end if;
       Delete (SLC, SLC.tail);
       Temp SLC Size := Temp SLC Size - 1;
     end if;
  __ ***************
    Temp_Nd1 := MakeNode (CAM.Address (Position));
    AddToFront (SLC, Temp Nd1);
     SLC Total Prefetch := SLC_Total_Prefetch + 1;
    Temp SLC Size := Temp SLC Size + 1;
     if Position = 1 then
       Position := CAM_Size_Index;
    else
       Position := Position - 1;
    end if;
  end loop;
else
                       -- address not found in CAM
-- If a miss occurs in the CAM, then prefetch a
-- line from main memory into the CAM. The line
-- will be comprised of the block of memory (equal
-- to the CAM line size) in which the requested
-- address is located.
CAM Miss := CAM Miss + 1;
-- If the CAM is full, then the delete the
-- addresses (amount equal to the CAM line size)
-- in the front of the CAM (FIFO replacement).
if Temp_CAM_Size = CAM_Size then
     for i in 1 .. CAM_Line_Size loop
       if CAM.Ref_Count (CAM.head mod CAM Size Index + 1)
          /= -1 then
         CAM_Non_Ref := CAM_Non_Ref + 1;
       end if;
       Dequeue (CAM, CAM Size Index);
       Temp CAM Size := Temp CAM Size - 1;
    end loop;
  end if;
 o := n / CAM Line Size;
  CAM Prefetch Addr := o * CAM Line Size;
___ *******************************
-- CAM prefetch for a positive integer address.
                                             A-16
if n \ge 0 then
     CAM_Prefetch_Addr := CAM_Prefetch Addr - 1;
```

```
for i in 1 .. CAM_Line_Size loop
        CAM Prefetch Addr := CAM Prefetch Addr + 1;
        Enqueue (CAM, CAM Size Index, CAM Prefetch Addr);
        CAM_Total_Prefetch := CAM Total Prefetch + 1;
        Temp_CAM_Size := Temp CAM_Size + 1;
     end loop;
  else
  -- CAM prefetch for a negative integer address.
  if n rem CAM Line Size /= 0 then
        CAM Prefetch Addr := CAM Prefetch Addr - CAM Line Size;
     end if;
     for i in 1 .. CAM Line Size loop
        Enqueue (CAM, CAM Size Index, CAM Prefetch Addr);
        CAM_Total_Prefetch := CAM_Total Prefetch + 1;
        CAM Prefetch Addr := CAM Prefetch Addr + 1;
        Temp_CAM_Size := Temp_CAM_Size + 1;
     end loop;
  end if;
-- Prefetch the addresses from the CAM to the SLC
-- starting with the requested address and ending
-- with the last (tail of the circular queue)
-- address in the CAM.
Temp_CAM_Tail := CAM.tail;
  while CAM.Address (Temp_CAM_Tail) /= n loop
     Temp Ndl := MakeNode (CAM.Address (Temp CAM Tail));
     if Temp_SLC_Size = SLC_Size then
        if SLC.tail.NumRef /= -1 then
          SLC_Non_Ref := SLC_Non_Ref + 1;
        end if;
        Delete (SLC, SLC.tail);
        Temp_SLC_Size := Temp_SLC_Size - 1;
     end if;
     AddToFront (SLC, Temp_Ndl);
     SLC Total Prefetch := SLC Total Prefetch + 1;
     Temp SLC Size := Temp SLC Size + 1;
     if Temp_CAM_Tail = 1 then
        Temp_CAM_Tail := CAM Size Index;
     else
        Temp_CAM_Tail := Temp_CAM_Tail - 1;
     end if;
  end loop;
  if Temp_SLC_Size = SLC_Size then
     if SLC.tail.NumRef /= -1 then
        SLC_Non_Ref := SLC_Non_Ref + 1;
     end if;
                                                 A-17
     Delete (SLC, SLC.tail);
     Temp_SLC_Size := Temp_SLC_Size - 1;
```

```
end if;
Temp_Ndl := MakeNode (CAM.Address (Temp_CAM_Tail));
AddToFront (SLC, Temp_Ndl);
SLC_Total_Prefetch := SLC_Total_Prefetch + 1;
Temp_SLC_Size := Temp_SLC_Size + 1;
end if;
end if;
```

end Serv\_Data\_Read; end Serv\_Data\_Read\_Package;

| with | Addr_Record_Pack | age, |            |          |       |     |
|------|------------------|------|------------|----------|-------|-----|
|      | LinkedLists Pack | age, | CircularQ_ | Package, | Text_ | IO; |
| use  | Addr Record Pack | age, |            |          |       |     |
|      | LinkedLists_Pack | age, | CircularQ_ | Package, | Text_ | 10; |
|      |                  |      |            |          |       |     |

package Serv\_Data\_Write\_Package is

| : | in | <pre>File_Type;</pre>                                        |
|---|----|--------------------------------------------------------------|
| : | in | Addr_Record;                                                 |
| : | in | out List;                                                    |
| : | in | out Queue;                                                   |
| : | in | out List;                                                    |
| : | in | out List;                                                    |
| : | in | out natural;                                                 |
| : | in | <pre>natural;</pre>                                          |
| : | in | <pre>natural;</pre>                                          |
| : | in | <pre>natural;</pre>                                          |
| : | in | natural;                                                     |
| : | in | out natural;                                                 |
| : | in | <pre>out natural);</pre>                                     |
|   |    | : in<br>: in<br>: in<br>: in<br>: in<br>: in<br>: in<br>: in |

end Serv\_Data\_Write\_Package;

```
package body Serv_Data_Write_Package is
```

```
procedure Serv_Data_Write
         (Input_File
                            : in File_Type;
          Memory_Ref
                           : in Addr_Record;
          SLC
                            : in out List;
          CAM
                            : in out Queue;
          SLC Ref List
                            : in out List;
          CAM_Ref_List
                           : in out List;
          SLC_Miss
                            : in out natural;
         CAM_Miss : in out natural;
SLC_Total_Refs : in out natural;
CAM_Total_Refs : in out natural;
          SLC Non Ref
                            : in out natural;
                      : in out natural;
          CAM Non_Ref
          SLC_Total_Prefetch : in out natural;
          CAM_Total_Prefetch : in out natural;
          SLC_Size
                         : in natural;
                            : in natural;
          CAM Size
          SLC_Line_Size
                           : in natural;
          CAM_Line Size
                            : in natural;
          Temp_SLC_Size
          Temp_SLC_Size : in out natural;
Temp_CAM_Size : in out natural) is
   package Index_Integer IO is new integer IO (index);
   use Index Integer IO;
   Reference
                       : Addr_Record;
   Temp_Nd1, Temp_Nd2 : NodePointer;
   Temp Nd3
                      : NodePointer;
   Position
                      : index := CAM.head;
   Found
                      : boolean := false;
   k
                      : natural := 1;
   CAM_Prefetch_Addr
                      : integer;
   n,o,p
                      : integer;
   CAM_Size_Index, j : index;
   Temp CAM Tail
                      : index;
begin
-- ***************
-- Search for memory reference in the SLC.
-- ***************
   SLC_Total_Refs := SLC_Total_Refs + 1;
   Temp Nd1 := SLC.Next;
   while Temp_Ndl /= null loop
      if Temp_Ndl.NumRef /= -1 then
         Temp_Ndl.NumRef := Temp_Ndl.NumRef + 1;
      end if;
      Temp Ndl := Temp Ndl.Next;
   end loop;
   n := Memory_Ref.Address;
   Temp Ndl := MakeNode (Memory Ref.Address);
   Temp Nd2 := Search (SLC, n);
__ **************
-- If the address is found in the SLC,
-- then delete the address in the cache
-- (linked list) and add it to the front
-- of the list (most recently used).
--- *******************************
```

A-20

```
AddToFront (SLC, Temp_Ndl);
  SLC.Next.NumRe1 := -1;
  if Temp Nd2.NumRef /= -1 then
     Temp Nd3 := MakeNode (Temp Nd2.Addr);
     Temp Nd3.NumRef := Temp Nd2.NumRef;
     Insert_In_Order (SLC Ref List, Temp Nd3);
  end if;
  Delete (SLC, Temp_Nd2);
else
-- Since a miss has occurred in the SLC,
-- search for the address in the CAM.
SLC Miss := SLC Miss + 1;
  CAM Total Refs := CAM Total Refs + 1;
  CAM Size Index := index (CAM_Size);
  if Temp_CAM_Size /= 0 then
     for i in 1 .. Temp CAM Size loop
        j := index (i);
        if CAM.Ref_Count (j) /= -1 then
          CAM.Ref_Count (j) := CAM.Ref_Count (j) + 1;
        end if;
     end loop;
  end if;
  SearchQ (CAM, n, CAM Size Index, Position, Found);
-- If the address is found in the CAM, then
-- prefetch a line from the CAM into the SLC.
-- The SLC line is comprised of the requested
-- address + the addresses located after the
-- requested address (total equal to the SLC
-- line size). This action represents a
-- spatial prefetch of the structural
-- locality captured in the CAM.
if Found = true then
                             -- address found in CAM
     if CAM.Ref_Count (Position) /= -1 then
        Temp_Ndl := MakeNode (CAM.Address (Position));
        Temp_Ndl.NumRef := CAM.Ref_Count (Position);
        Insert In Order (CAM Ref List, Temp Ndl);
        CAM.Ref Count (Position) := -1;
     end if;
     while Position /= CAM.tail
      . and then k /= SLC_Line_Size loop
          k := k + 1;
          Position := Position mod CAM_Size_Index + 1;
        end loop;
     for i in 1 .. k loop
     __ ****************************
     -- If the SLC is full, then delete the
     -- address located at the rear of the list
     -- (LRU replacement).
     -- *
                              *****
        if Temp_SLC_Size = SLC_Size then
```

A-21

```
if SLC.tail.NumRef /= -1 then
          SLC_Non Ref := SLC Non Ref + 1;
        end if;
        Delete (SLC, SLC.tail);
        Temp_SLC_Size := Temp_SLC_Size - 1;
     end if;
  -- ********************************
     Temp Ndl := MakeNode (CAM.Address (Position));
     AddToFront (SLC, Temp Ndl);
     SLC Total Prefetch := SLC Total Prefetch + 1;
     Temp_SLC_Size := Temp_SLC_Size + 1;
     if Position = 1 then
        Position := CAM_Size_Index;
     else
        Position := Position - 1;
     end if;
  end loop;
else
                         -- address not found in CAM
-- If a miss occurs in the CAM, then prefetch a
-- line from main memory into the CAM. The line
-- will be comprised of the block of memory (equal
-- to the CAM line size) in which the requested
-- address is located.
CAM Miss := CAM Miss + 1;
-- ******************************
-- If the CAM is full, then the delete the
-- addresses (amount equal to the CAM line size)
-- in the front of the CAM (FIFO replacement).
-- *********************************
  if Temp_CAM_Size = CAM_Size then
     for i in 1 .. CAM_Line_Size loop
        if CAM.Ref_Count (CAM.head mod CAM_Size_Index + 1)
          /= -1 then
          CAM_Non_Ref := CAM_Non_Ref + 1;
        end if;
       Dequeue (CAM, CAM Size Index);
        Temp CAM Size := Temp CAM Size - 1;
     end loop;
  end if;
  o := n / CAM Line Size;
  CAM Prefetch_Addr := o * CAM_Line_Size;
-- *********************
-- CAM prefetch for a positive integer address.
                                                 A-22
-- ******************
  if n \ge 0 then
     CAM_Prefetch_Addr := CAM_Prefetch_Addr - 1;
```

```
for i in 1 .. CAM_Line_Size loop
        CAM_Prefetch_Addr := CAM Prefetch Addr + 1;
        Enqueue (CAM, CAM Size Index, CAM Prefetch Addr);
        CAM Total Prefetch := CAM Total Prefetch + 1;
        Temp CAM Size := Temp CAM Size + 1;
     end loop;
  else
  -- *************
  -- CAM prefetch for a negative integer address.
  if n rem CAM Line Size /= 0 then
        CAM_Prefetch Addr := CAM_Prefetch Addr - CAM Line Size;
   \cdot end if;
     for i in 1 .. CAM Line Size loop
        Enqueue (CAM, CAM_Size_Index, CAM_Prefetch_Addr);
        CAM Total Prefetch := CAM Total Prefetch + 1;
        CAM Prefetch_Addr := CAM_Prefetch_Addr + 1;
        Temp_CAM_Size := Temp_CAM_Size + 1;
     end loop;
  end if;
-- Prefetch the addresses from the CAM to the SLC
-- starting with the requested address and ending
-- with the last (tail of the circular queue)
-- address in the CAM.
Temp CAM Tail := CAM.tail;
  while CAM.Address (Temp_CAM_Tail) /= n loop
     Temp_Ndl := MakeNode (CAM.Address (Temp CAM Tail));
     if Temp_SLC_Size = SLC_Size then
        if SLC.tail.NumRef /= -1 then
           SLC_Non_Ref := SLC_Non Ref + 1;
        end if;
        Delete (SLC, SLC.tail);
        Temp_SLC_Size := Temp_SLC_Size - 1;
     end if;
     AddToFront (SLC, Temp Ndl);
     SLC Total Prefetch := SLC Total Prefetch + 1;
     Temp_SLC_Size := Temp_SLC_Size + 1;
     if Temp_CAM_Tail = 1 then
        Temp_CAM_Tail := CAM_Size_Index;
     else
        Temp_CAM_Tail := Temp CAM Tail - 1;
     end if;
  end loop;
  if Temp SLC Size = SLC_Size then
     if SLC.tail.NumRef /= -1 then
        SLC_Non_Ref := SLC_Non_Ref + 1;
     end if;
                                                    A-23
     Delete (SLC, SLC.tail);
     Temp_SLC_Size := Temp_SLC_Size - 1;
```

end if;

Temp\_Nd1 := MakeNode (CAM.Address (Temp\_CAM\_Tail)); AddToFront (SLC, Temp\_Nd1); SLC\_Total\_Prefetch := SLC\_Total\_Prefetch + 1; Temp\_SLC\_Size := Temp\_SLC\_Size + 1;

end if;

end if;

end Serv\_Data\_Write; end Serv\_Data\_Write\_Package;

.

with Addr\_Record\_Package, LinkedLists\_Package, CircularQ\_Package, Text\_IO; use Addr\_Record\_Package, LinkedLists\_Package, CircularQ\_Package, Text\_IO;

package Serv\_Instr\_Fetch\_Package is

| procedure Serv_Instr_Fetch |   |    |                          |
|----------------------------|---|----|--------------------------|
| (Input_File                | : | in | <pre>File_Type;</pre>    |
| Memory_Ref                 | : | in | Addr_Record;             |
| SLC                        | : | in | out List;                |
| CAM                        | : | in | out Queue;               |
| SLC_Ref_List               | : | in | out List;                |
| CAM_Ref_List               | : | in | out List;                |
| SLC_Miss                   | : | in | out natural;             |
| CAM_Miss                   | : | in | out natural;             |
| SLC_Total_Refs             | : | in | out natural;             |
| CAM_Total_Refs             | : | in | out natural;             |
| SLC_Non_Ref                | : | in | out natural;             |
| CAM_Non_Ref                | : | in | out natural;             |
| SLC_Total_Prefetch         | : | in | out natural;             |
| CAM_Total_Prefetch         | : | in | out natural;             |
| SLC_Size                   | : | in | natural;                 |
| CAM_Size                   | : | in | <pre>natural;</pre>      |
| SLC_Line_Size              | : | in | <pre>natural;</pre>      |
| CAM_Line_Size              | : | in | <pre>natural;</pre>      |
| Temp_SLC_Size              | : | in | out natural;             |
| Temp_CAM_Size              | : | in | <pre>out natural);</pre> |

end Serv\_Instr\_Fetch\_Package;

,

## package body Serv\_Instr\_Fetch\_Package is

```
procedure Serv Instr Fetch
        (Input File
                           : in File Type;
                           : in Addr Record;
         Memory Ref
                           : in out List;
         SLC
                          : in out Queue;
         CAM
         SLC Ref List
                          : in out List;
         CAM Ref List
                          : in out List;
         SLC Miss
                          : in out natural;
         CAM Miss
                          : in out natural;
         SLC_Total_Refs : in out natural;
CAM_Total_Refs : in out natural;
                          : in out natural;
         SLC_Non_Ref
                      : in out natural;
         CAM Non Ref
         SLC Total Prefetch : in out natural;
         CAM Total Prefetch : in out natural;
         SLC Size
                           : in natural;
                          : in natural;
         CAM Size
         Temp SLC Size
                         : in out natural;
         Temp CAM Size : in out natural) is
  package Index Integer IO is new integer IO (index);
  use Index Integer IO;
                     : Addr Record;
  Reference
  Temp_Nd1, Temp_Nd2 : NodePointer;
  Temp_Nd3
                     : NodePointer;
  Position
                     : index := CAM.head;
                     : boolean := false;
  Found
  k
                     : natural := 1;
  CAM_Prefetch_Addr : integer;
                    : integer;
  n,o,p
  CAM_Size_Index, j : index;
Temp CAM Tail : index;
begin
-- Search for memory reference in the SLC.
-- ******************************
  SLC_Total_Refs := SLC_Total_Refs + 1;
  Temp Nd1 := SLC.Next;
  while Temp Ndl /= null loop
     if Temp Ndl.NumRef /= -1 then
        Temp_Ndl.NumRef := Temp_Ndl.NumRef + 1;
     end if;
     Temp Ndl := Temp Ndl.Next;
  end loop;
  n := Memory_Ref.Address;
  Temp_Nd1 := MakeNode (Memory_Ref.Address);
  Temp Nd2 := Search (SLC, n);
-- *********************************
-- If the address is found in the SLC,
-- then delete the address in the cache
-- (linked list) and add it to the front
-- of the list (most recently used).
```

if Temp Nd2 /= null then

A-26

```
AddToFront (SLC, Temp Ndl);
  SLC.Next.NumRef := -1;
  if Temp_Nd2.NumRef /= -1 then
     Temp Nd3 := MakeNode (Temp Nd2.Addr);
     Temp Nd3.NumRef := Temp Nd2.NumRef;
     Insert In Order (SLC Ref List, Temp Nd3);
  end if;
  Delete (SLC, Temp Nd2);
else
--- *************
-- Since a miss has occurred in the SLC,
-- search for the address in the CAM.
SLC Miss := SLC Miss + 1;
  CAM Total Refs := CAM Total Refs + 1;
  CAM Size Index := index (CAM Size);
  if Temp CAM Size /= 0 then
     for i in 1 .. Temp_CAM_Size loop
        j := index (i);
        if CAM.Ref_Count (j) /= -1 then
           CAM.Ref_Count (j) := CAM.Ref_Count (j) + 1;
        end if;
     end loop;
  end if;
  SearchQ (CAM, n, CAM Size Index, Position, Found);
-- If the address is found in the CAM, then
-- prefetch a line from the CAM into the SLC.
-- The SLC line is comprised of the requested
-- address + the addresses located after the
-- requested address (total equal to the SLC
-- line size).
              This action represents a
-- spatial prefetch of the structural
-- locality captured in the CAM.
                 * * * * * * * *
  if Found = true then
                              -- address found in CAM
     if CAM.Ref Count (Position) /= -1 then
        Temp_Ndl := MakeNode (CAM.Address (Position));
        Temp_Ndl.NumRef := CAM.Ref_Count (Position);
        Insert In_Order (CAM_Ref_List, Temp_Ndl);
        CAM.Ref_Count (Position) := -1;
     end if;
     while Position /= CAM.tail
        and then k /= SLC_Line_Size loop
           k := k + 1;
           Position := Position mod CAM_Size_Index + 1;
        end loop;
     for i in 1 .. k loop
     __ ***************
      -- If the SLC is full, then delete the
     -- address located at the rear of the list
     -- (LRU replacement).
                  *************************
        if Temp_SLC_Size = SLC_Size then
```

A-27

```
if SLC.tail.NumRef /= -1 then
         SLC_Non_Ref := SLC Non Ref + 1;
       end if;
       Delete (SLC, SLC.tail);
       Temp_SLC_Size := Temp_SLC_Size - 1;
    end if;
    Temp Ndl := MakeNode (CAM.Address (Position));
    AddToFront (SLC, Temp_Ndl);
    SLC Total_Prefetch := SLC_Total_Prefetch + 1;
    Temp SLC Size := Temp SLC Size + 1;
    if Position = 1 then
       Position := CAM_Size_Index;
   . else
       Position := Position - 1;
    end if;
  end loop;
else
                       -- address not found in CAM
-- If a miss occurs in the CAM, then prefetch a
-- line from main memory into the CAM. The line
-- will be comprised of the block of memory (equal
-- to the CAM line size) in which the requested
-- address is located.
CAM_Miss := CAM_Miss + 1;
    -- If the CAM is full, then the delete the
-- addresses (amount equal to the CAM line size)
-- in the front of the CAM (FIFO replacement).
-- ******************************
  if Temp CAM Size = CAM Size then
    for i in 1 .. CAM Line Size loop
       if CAM.Ref Count (CAM.head mod CAM Size Index + 1)
         /= -1 then
         CAM Non Ref := CAM Non Ref + 1;
       end if;
       Dequeue (CAM, CAM Size Index);
       Temp_CAM_Size := Temp_CAM_Size - 1;
    end loop;
  end if;
 o := n / CAM Line Size;
  CAM Prefetch Addr := o * CAM Line Size;
 -- CAM prefetch for a positive integer address.
                                              A~28
if n \ge 0 then
    CAM_Prefetch_Addr := CAM_Prefetch Addr - 1;
```

```
for i in 1 .. CAM_Line_Size loop
        CAM_Prefetch_Addr := CAM_Prefetch_Addr + 1;
        Enqueue (CAM, CAM_Size_Index, CAM_Prefetch_Addr);
        CAM Total Prefetch := CAM Total Prefetch + 1;
        Temp_CAM_Size := Temp_CAM_Size + 1;
     end loop;
  else
  -- CAM prefetch for a negative integer address.
  -- ********************************
     if n rem CAM Line Size /= 0 then
        CAM Prefetch Addr := CAM Prefetch Addr - CAM Line Size;
     end if;
     for i in 1 .. CAM_Line Size loop
        Enqueue (CAM, CAM Size Index, CAM Prefetch Addr);
        CAM Total Prefetch := CAM_Total_Prefetch + 1;
        CAM_Prefetch_Addr := CAM_Prefetch_Addr + 1;
        Temp_CAM_Size := Temp_CAM_Size + 1;
     end loop;
  end if;
-- Prefetch the addresses from the CAM to the SLC
-- starting with the requested address and ending
-- with the last (tail of the circular queue)
-- address in the CAM.
Temp CAM Tail := CAM.tail;
  while CAM.Address (Temp_CAM_Tail) /= n loop
     Temp Ndl := MakeNode (CAM.Address (Temp_CAM_Tail));
     if Temp SLC Size = SLC Size then
        if SLC.tail.NumRef /= -1 then
           SLC_Non_Ref := SLC_Non_Ref + 1;
        end if;
        Delete (SLC, SLC.tail);
        Temp_SLC_Size := Temp_SLC_Size - 1;
     end if;
     AddToFront (SLC, Temp_Ndl);
     SLC_Total_Prefetch := SLC_Total Prefetch + 1;
     Temp SLC Size := Temp_SLC_Size + 1;
     if Temp CAM Tail = 1 then
        Temp CAM Tail := CAM Size Index;
     else
        Temp CAM_Tail := Temp_CAM_Tail - 1;
     end if;
  end loop;
  if Temp_SLC_Size = SLC_Size then
     if SLC.tail.NumRef /= -1 then
        SLC Non Ref := SLC Non Ref + 1;
     end if;
                                                    A-29
     Delete (SLC, SLC.tail);
     Temp_SLC_Size := Temp_SLC_Size - 1;
```

end if;

Temp\_Ndl := MakeNode (CAM.Address (Temp\_CAM\_Tail)); AddToFront (SLC, Temp\_Ndl); SLC\_Total\_Prefetch := SLC\_Total\_Prefetch + 1; Temp\_SLC\_Size := Temp\_SLC\_Size + 1;

end if;

end if;

end Serv\_Instr\_Fetch; end Serv\_Instr\_Fetch\_Package; with Text\_IO, Addr\_Record\_Package; use Text\_IO, Addr\_Record\_Package;

package Fetch\_Address\_Package is

end Fetch\_Address\_Package;

.

```
× .
with Text IO, Hex to Dec Package;
use Text_IO, Hex_to_Dec_Package;
package body Fetch_Address Package is
   procedure Load Record (Input File: in out File Type;
                          Memory Ref: out Addr Record) is
      type Field Type is (field1, field2);
      In String
                     : string (1 .. 11);
      In Length
                     : natural;
      New_Space
                    : natural;
                   : natural;
      Last_Space
                   : Field_Type := field1;
: string (1 .. 8);
      In Field
      Hex Addr
      Hex_Length : natural;
   begin
         get_line (Input_File, In_String, In_Length);
         New Space := 1;
         Last Space := 0;
      loop
         while ((New Space < In_Length) and
            (In_String (New_Space .. New_Space) /= " ")) loop
            New Space := New Space + 1;
         end loop;
         case In_Field is
            when field1 => Memory_Ref.The_Type :=
               In_String (New Space - 1);
               In_Field := field2;
            when field2 => Hex_Addr ((Last_Space - 1) .. New_Space - 2) :=
               In_String ((Last_Space + 1) .. New_Space);
               Hex_Length := New_Space - Last_Space;
               Memory_Ref.Address := Hex_to_Dec (Hex_Addr, Hex_Length);
               In_Field := field1; exit;
            when others => exit;
         end case;
         Last Space := New Space;
         New Space := New Space + 1;
      end loop;
   end Load Record;
end Fetch_Address_Package;
```

package Hex\_to\_Dec\_Package is

end Hex\_to\_Dec\_Package;

```
with Text IO; use Text IO;
package body Hex to Dec Package is
   function Hex_to_Dec (Hex_Addr: string; Hex_Length: natural)
                        return integer is
                         : constant := character'pos ('0');
      Zero Pos
      Capital A Pos
                        : constant := character'pos ('A');
                        : constant := character'pos ('a');
      Small A Pos
      package Type Integer IO is new integer_IO (integer);
      use Type_Integer_IO;
      type Dec Value Type is range -2**31 .. 2**31-1;
      Temp Dec Value: Dec Value Type := 0;
      Num_Value: Dec_Value_Type range 0 .. 15;
      Dec_Value: integer;
      Hex_Char: character;
  begin
      for i in 1 .. Hex_Length loop
         Hex_Char := Hex_Addr (i);
         case Hex Char is
            when '0' .. '9' =>
               Num_Value := character'pos (Hex_Char) - Zero_Pos;
            when 'A' \therefore 'F' =>
               Num Value := character'pos (Hex_Char) -
                            Capital_A_Pos + 10;
            when 'a' .. 'f' =>
               Num_Value := character'pos (Hex_Char) -
                            Small A Pos + 10;
            when others => exit;
         end case;
         if i < 8 then
            Temp Dec_Value := 16 * Temp_Dec_Value + Num_Value;
         else
            Temp Dec Value := Temp Dec Value - 2**27;
            Temp Dec_Value := 16 * Temp_Dec_Value + Num_Value;
         end if;
      end loop;
      Dec Value := integer (Temp_Dec_Value);
      return Dec_Value;
   end Hex_to_Dec;
end Hex_to_Dec_Package;
```

with Addr\_Record\_Package; use Addr\_Record\_Package;

package Determine\_Type\_Package is

function Address\_Type (Memory\_Ref: Addr\_Record) return
character;

end Determine\_Type\_Package;

.

with Addr\_Record\_Package; use Addr\_Record\_Package;

package body Determine\_Type\_Package is

return Memory\_Ref.The\_Type; end Address\_Type;

end Determine\_Type\_Package;

with Text\_IO; use Text\_IO;

package Compute\_Miss\_Ratios\_Package is

procedure Compute\_Miss\_Ratios (SLC\_Miss: in natural; CAM\_Miss: in natural; SLC\_Total\_Refs: in natural; CAM\_Total\_Refs: in natural; Num\_Ref: in natural; SLC\_MR: out float; CAM\_MR: out float; Output\_File: in out File\_Type);

end Compute\_Miss\_Ratios\_Package;

```
with Text IO; use Text IO;
package body Compute Miss Ratios Package is
   procedure Compute Miss_Ratios (SLC_Miss: in natural;
      CAM_Miss: in natural; SLC Total Refs: in natural;
      CAM Total Refs: in natural; Num_Ref : in natural;
      SLC_MR: out float; CAM_MR: out float;
      Output File: in out File Type) is
      type MR Type is delta 0.0001 range 0.0 .: 1.0;
      package MR Type IO is new fixed IO (MR Type);
      package Type_Integer_IO is new integer_IO (integer);
      use MR_Type_IO, Type_Integer IO;
      SLC Miss Ratio
                       : MR Type;
                       : MR Type;
      CAM Miss Ratio
      SM, CM, ST, CT
                       : float;
      SMR, CMR
                       : float;
  begin
      SM := float (SLC Miss);
      CM := float (CAM_Miss);
      ST := float (SLC_Total_Refs);
      CT := float (CAM_Total_Refs);
      new line (3);
      put ("Number of References Processed: "); put (Num Ref);
      new line (2);
      put ("Number of SLC Misses:
                                    "); put (SLC_Miss);
      new line;
      put ("Number of CAM Misses:
                                    "); put (CAM_Miss);
      new line;
      put ("Total SLC References:
                                    "); put (SLC_Total_Refs);
      new line;
      put ("Total CAM References:
                                    "); put (CAM Total_Refs);
      new line (2);
      SMR := SM / ST;
      CMR := CM / CT;
      SLC Miss_Ratio := MR_Type (SMR);
      CAM_Miss_Ratio := MR_Type (CMR);
      SLC_MR := float (SLC_Miss_Ratio);
      CAM MR := float (CAM Miss Ratio);
      put ("SLC_Miss_Ratio: "); put (SLC_Miss_Ratio);
      new line;
      put ("CAM Miss Ratio:
                             "); put (CAM Miss Ratio);
      Set_Col (Output_File, 1);
      put (Output_File, Num_Ref);
      put (Output File, SLC Miss);
      put (Output File, CAM Miss);
      put (Output_File, SLC_Total_Refs);
      put (Output_File, CAM_Total_Refs);
      put (Output_File, SLC_Miss_Ratio);
      put (Output File, CAM Miss Ratio);
   end Compute_Miss_Ratios;
```

```
end Compute_Miss_Ratios_Package; A-38
```

with Text\_IO; use Text\_IO;

.

package Compute\_Memory\_Access\_Time\_Package is

procedure Compute\_Memory\_Access\_Time (SLC\_MR: in float; CAM\_MR: in float; Output\_File: in out File\_Type);

end Compute\_Memory\_Access\_Time\_Package;

with Text\_IO; use Text\_IO;

package body Compute Memory Access Time Package is

end Compute\_Memory\_Access\_Time;

end Compute Memory Access Time Package;

end Compute\_Cache\_Pollution\_Package;

package body Compute\_Cache\_Pollution\_Package is

procedure Compute\_Cache Pollution (SLC: in out List; CAM: in out Queue; SLC\_Non\_Ref: in out natural; CAM Non Ref: in out natural; SLC Total Prefetch: in natural; CAM\_Total Prefetch: in natural; Temp CAM Size: in natural; . Output File: in out File Type) is type Fixed\_Type is delta 0.0001 range 0.0 .. 1.0; package Type Fixed IO is new fixed IO (Fixed Type); use Type Fixed IO; package Index Integer IO is new integer IO (index); use Index Integer IO; SLC\_Pollution : Fixed\_Type; CAM\_Pollution : Fixed\_Type; CAM\_Pollution : Fixed\_Type; SNR, CNR, STP, CTP : float; Temp\_Node : NodePointer; : index; j begin Temp Node := SLC.Next; while Temp\_Node /= null loop if Temp\_Node.NumRef /= -1 then SLC\_Non\_Ref := SLC\_Non\_Ref + 1; end if; Temp Node := Temp Node.Next; end loop; SNR := float (SLC\_Non\_Ref); STP := float (SLC\_Total\_Prefetch); SLC\_Pollution := Fixed\_Type (SNR/STP); Set\_Col (Output\_File, 1); put (Output\_File, "SLC Pollution: "); put (Output File, SLC Pollution); \_\_\_\_\_ for i in 1 .. Temp\_CAM\_Size loop j := index (i);if CAM.Ref\_Count (j) /= -1 then CAM\_Non\_Ref := CAM\_Non\_Ref + 1; end if; end loop; CNR := float (CAM\_Non\_Ref); CTP := float (CAM Total Prefetch); CAM Pollution := Fixed Type (CNR/CTP); Set Col (Output File, 1); put (Output\_File, "CAM Pollution: "); put (Output File, CAM Pollution); end Compute Cache Pollution;

end Compute\_Cache\_Pollution\_Package;

with Text\_IO, LinkedLists\_Package; use Text\_IO, LinkedLists\_Package;

package Generate\_Ref\_Frequency\_List\_Package is

end Generate\_Ref\_Frequency\_List\_Package;

package body Generate Ref Frequency List Package is procedure Generate\_Ref\_Frequency\_List (SLC\_Ref\_List: in out List; CAM\_Ref\_List: in out List; SLC\_Non\_Ref: in natural; CAM Non Ref: in natural; Reference\_File: in out File\_Type) is package Type\_Integer\_IO is new integer\_IO (integer); use Type\_Integer\_IO; Temp\_Ndl, Temp\_Nd2 : NodePointer; Reference\_Count : natural := 0; begin Set Col (Reference File, 1); put (Reference\_File, "SLC: Frequency of References"); Set Col (Reference File, 1); 0"); put (Reference\_File, " put (Reference File, SLC Non\_Ref); Temp\_Ndl := SLC\_Ref\_List.Next; Temp\_Nd2 := SLC\_Ref\_List.Next; while Temp\_Nd1 /= null loop while Temp\_Nd1 /= null and then Temp\_Ndl.NumRef = Temp\_Nd2.NumRef loop Temp Ndl := Temp\_Ndl.Next; Reference\_Count := Reference\_Count + 1; end loop; Set Col (Reference\_File, 1); put (Reference\_File, Temp\_Nd2.NumRef); put (Reference\_File, Reference\_Count); Temp Nd2 := Temp Nd1; Reference Count := 0; end loop; Set\_Col (Reference File, 1); put (Reference\_File, " Set Col (Reference\_File, 1); put (Reference\_File, "CAM: Frequency of References"); Set Col (Reference File, 1); 0"); put (Reference\_File, " put (Reference\_File, CAM\_Non\_Ref); Temp Ndl := CAM\_Ref\_List.Next; Temp\_Nd2 := CAM\_Ref\_List.Next; while Temp Ndl /= null loop while Temp Nd1 /= null and then Temp\_Ndl.NumRef = Temp\_Nd2.NumRef loop Temp\_Ndl := Temp\_Ndl.Next; Reference\_Count := Reference\_Count + 1; end loop; Set\_Col (Reference File, 1); put (Reference\_File, Temp\_Nd2.NumRef); put (Reference\_File, Reference\_Count); Temp Nd2 := Temp\_Nd1; Reference\_Count := 0; end loop; end Generate\_Ref\_Frequency\_List;

```
end Generate_Ref_Frequency_List_Package;
```

```
A-44
```

Appendix B: Test Trace Mapping to Testing Requirements

.

## Test Trace Mapping to Testing Requirements (Figure 3.16)

| Note:  | Testi<br>in the<br>test<br>drive<br>Explo<br>so th<br>point | ng re<br>e tes<br>trac<br>r.<br>rer f<br>e te<br>s in | equirements are identified at the first point<br>st trace where they are tested. The following<br>e was designed for the VAX traces: Version 1<br>To test the Version 2 driver for the TI<br>traces, the addresses were changed to integers<br>sting requirements were tested at the same<br>the test trace. |
|--------|-------------------------------------------------------------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|        | <u>Cache</u><br>S<br>SL                                     | LC S<br>C B1                                          | ize: 8 CAM Size: 16<br>ock: 4 CAM Block: 2                                                                                                                                                                                                                                                                   |
| 0 1AF7 | 6945                                                        | <=                                                    | I.A.1, I.A.3, I.A.4, II.A.2, II.B.2,<br>II.D.1, II.D.2                                                                                                                                                                                                                                                       |
| 0 1AF7 | 6946                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 0 1AF7 | 6944                                                        | <=                                                    | II.B.1, II.C.1                                                                                                                                                                                                                                                                                               |
| 1 2378 | BC12                                                        | <=                                                    | II.D.4                                                                                                                                                                                                                                                                                                       |
| 1 2378 | BC13                                                        | <=                                                    | II.A.1                                                                                                                                                                                                                                                                                                       |
| 0 1AF7 | 6946                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 0 1AF7 | 6947                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 1 2378 | BC14                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 0 1AF7 | 6948                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 0 1AF7 | 6945                                                        | <=                                                    | II.C.2, II.C.3                                                                                                                                                                                                                                                                                               |
| 2 1890 | DF03                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 2 1890 | DF02                                                        | <=                                                    | II.C.4                                                                                                                                                                                                                                                                                                       |
| 1 2458 | F3B2                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 1 2458 | F3B3                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 1 2458 | F3B4                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 1 2458 | F3B5                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |
| 1 2458 | F3B6                                                        | <=                                                    | II.D.2                                                                                                                                                                                                                                                                                                       |
| 0 9654 | EF24                                                        |                                                       |                                                                                                                                                                                                                                                                                                              |

B-2

- 0 9654EF25
- 1 2458F3B2
- 1 2458F3B3
- 2 189CDF01
- 2 189CDF02
- 1 2458F3B4
- 1 2458F3B5 <= II.E.1-4
- 3 08971234
- 4 367814BC
- 0 9654EF24
- 0 9654EF25
- 1 67209AF0
- 2 395CAFB1
- 0 9087BA23
- 1 65432DA0
- 1 2458F3B4
- 1 2458F3B5
- 0 56129027
- 1 9BF23467
- 1 67209AF0
- 2 395CAFB1
- 0 4F29B560
- 1 2A9F73CA
- 2 9876AFB3
- 0 68210BD3
- 0 56129027

- 1 9BF23467
- 2 B5DA0 <= I.A.2
- 1 6754D231
- 0 4F29B560
- 1 2A9F73CA

.

0 1AF76945 <= III.A-D

## Test Trace Results

| SLC | Miss Rate: 0   | .7083   |             |
|-----|----------------|---------|-------------|
| CAM | Miss Rate: 0   | .7059   |             |
| SLC | Pollution: 0   | .8267   |             |
| CAM | Pollution: 0   | .8333   |             |
| Eff | Memory Access  | Time:   | 17.124      |
|     | <b>_</b>       |         |             |
| SLC | Frequency of 1 | Referer | nces:       |
| 0   | 62             | (never  | referenced) |
| 1   | 9              | •       | •           |
| 3   | 1              |         |             |
| 4   | 2              |         |             |
| 7   | 1              |         |             |
| •   | -              |         |             |
| CAM | Frequency of 1 | Referer | nces:       |
| 0   | _ 40 _         | (never  | referenced) |
| 1   | 1              | •       | ·           |
| 2   | 1              |         |             |
| 4   | 1              |         |             |
| 5   | 1              |         |             |
| 6   | 2              |         |             |
| 7   | - 2            |         |             |
| •   | -              |         |             |

•

Appendix C: Wc and Ws vs. Effective Cache Size Graphs



-

Ws




٧s



Wс

Effective Cache Size



W<sub>C</sub>

Effective Cache Size

C-5

## **Bibliography**

- Agarwal, A. and others. "ATUM: A New Technique for Capturing Address Traces Using Microcode," Proceedings of the 13th Annual International Symposium on Computer Architecture. 119-127. New York: IEEE Press, 1986.
- Alexander, Cedell and others. "Cache Memory Performance in a Unix Environment," ACM Computer Architecture News, 14: 41-70 (June 1986).
- Baer, Jean-Loup and others. "Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy," Proceedings of the 16th Annual International Symposium on Computer Architecture. 140-148. New York: IEEE Press, 1989.
- Bakka, Bjorn O. and others. "Trace-Driven Simulations for a Two-Level Cache Design in Open Bus Systems," ACM Computer Architecture News, 18: 250-259 (June 1990).
- Feldman, Michael B. Data Structures with Ada. Reston: Reston Publishing Company, Inc., 1985.
- Hayes, John P. Computer Architecture and Organization (Second Edition). New York: McGraw-Hill Book Company, 1988.
- Hennessy, John and others. "Performance Tradeoffs in Cache Design," Proceedings of the 15th Annual International Symposium on Computer Architecture. 290-298. New York: IEEE Press, 1988.
- -----. "Characteristics of Performance-Optimal Multi-Level Cache Hierarchies," Proceedings of the 16th Annual International Symposium on Computer Architecture. 114-121. New York: IEEE Press, 1989.
- Hill, Mark D. and Dionisios N. Pnevmatikatos. "Cache Performance of the Integer SPEC Benchmarks on a RISC," ACM Computer Architecture News, 18: 53-68 (June 1990).
- Hobart, Maj William C., Jr. An Investigation of the Locality of Memory Accesses During Symbolic Program Execution. PhD dissertation. The University of Texas at Austin, Austin TX, 1989.
- Iyer, Ravishankar K. and others. "Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems," IEEE Transactions on Computers, 37: 1325-1335 (November 1988).

- Johnson, Eric E. "Working Set Prefetching for Cache Memories," ACM Computer Architecture News, 17: 137-141 (December 1989).
- Levy, Henry M. and Robert T. Short. "A Simulation Study of Two-Level Caches," ACM Computer Architecture News, 16: 81-87 (May 1988).
- Przybylski, Steven. "The Performance Impact of Block Sizes and Fetch Strategies," Proceedings of the 17th Annual International Symposium on Computer Architecture. 160-169. New York: IEEE Press, 1990.
- Smith, Alan J. "Sequentiality and Prefetching in Database Systems," ACM Transactions on Database Systems," 3: 223-247 (September 1978).
- -----. "Sequential Program Prefetching in Memory Hierarchies," IEEE Computers, 12: 7-21 (December 1978).
- -----. "Cache Memories," Computing Surveys, 14: 473-523 (September 1982).
- -----. "Line (Block) Size Choice for CPU Cache Memories," IEEE Transactions on Computers, C-36: 1063-1075 (September 1987).
- Thazhuthaveetil, M.J. A Structured Memory Access Architecture for Lisp. PhD dissertation. The University of Wisconsin-Madison, Madison WI, 1986.