







The Implications of VLSI ROM Chips on Numerical Analysis

٦

Ľ,

January 8,1982

Principal Investigators: L. Feeser M. Rooney M. Shephard Agency: Office of Naval Research Contract No.: N00014-80-C-0712 R. P. I. Project No.: 5-24350



DISTRIBUTION STATEMENT A Approved for public release; Distribution Unlimited

· W. a submer

## TABLE OF CONTENTS

2

Ľ.

I.

:

| Introduction1                    |
|----------------------------------|
| Overview2                        |
| Review of ROM Hardware5          |
| Data Storage on ROM14            |
| Hardware Look-Up14               |
| Programming Interface            |
| Speed Increases23                |
| Down-Line Loading                |
| Needed Technology                |
| Software Storage on ROM          |
| Advantages and Myths of Firmware |
| Uses of ROM Software34           |
| Programming Interface            |
| Creation of Firmware             |
| Firmware Examples41              |
| Conclusions44                    |
| Acknowledements47                |
| References                       |
| INDEX                            |





- -

.

Me of hand and the second

.

The Implications of VLSI ROM Chips on Numerical Analysis

#### Introduction

Very large scale integrated circuits (VLSI) are an established technique for implementing computing systems and subsystems. This technology has resulted in both increased processing capability and reduced hardware cost to a scale that custom processing elements are practical for small to medium application areas. In particular, the goal of this research effort is to focus upon applications in numerical analysis and structural engineering computations.

In an earlier technical report [1], the hardware aspects of VLSI technology were explored. A series of reports each focusing upon the application and implementation of one VLSI hardware feature now follow. This first report concentrates upon read-only memory (ROM) circuits and their derivatives. The report is divided into four sections: overview of VLSI advances, a brief review of ROM circuits, storage of data on ROM chips, and storage of software on ROM chips (or firmware).

This report will show the potential speed increases made available by storing pre-computed data on ROM rather than computing upon request. The report will also demonstrate the availability of customized processors, particularly micro-computers, made possible through firmware. The implementation of these ROM applications will be discussed with emphasis on potential changes to current programming practice, a prime concern of the numerical analyst.

#### <u>Overview</u>

While this section is not crucial to the comprehension of this report, it is important to maintain a perspective on the advances in micro-electronics with the intent of isolating trends and predicting future cost/performance ratios. This section, therefore, presents a brief overview of those advances.

There are two ways to view the advances resulting from large scale integration of transistor circuits on silicon wafers [2]. First, for a constant cost, VLSI has resulted increased performance. Second, for a constant in performance level, processing costs have decreased. In reality, these are simply restatements of the same concept, shown graphically in Fig. 1. The diagonal lines represent equi-potential performance. These of points have historically remained straight, and it is predicted that these trends will continue.

As a result of performance/cost increases, the levels of integration have also progressed, following the trend illustrated in Fig. 2. Currently, the IBM 370/168 has been

2 1971 B.T





converted to an experimental chip; 16 and 32 bit micro-processors, some with virtual memory, are beginning to appear. And, research is underway to implement a DEC-VAX system onto a single wafer. Memory circuits are following the same pattern of increased integration.

VLSI technology can only be applied to transistor circuits, not electro-mechanical devices. As a result, the primary influences are felt in: CPU's, memory circuits, peripheral electronics (e.g., I/O drivers), and improved reliability. Secondary influences affect power supply circuits, packaging size reductions, and software. Little or no influence exists for electro-mechanical devices such as printers and tape/disk transports.

With these trends of improved performance/cost ratios in mind, the remainder of this report is devoted the examining the impact of readily available read-only memory (ROM) circuits.

#### Review of ROM Hardware

· We have party to any

While the first technical report [1] examined the hardware aspects of VLSI technology, including read-only memory; a brief review of the basic operation of ROM chips is included for completeness and some historical perspective is added. More importantly, the definitions of "ROM", "PROM", and "EPROM" used throughout this report are presented in this section. Finally, it is shown that

current size limitations of ROM chips can be easily overcome by using several chips.

Read-only memory is defined as any type of storage system whose contents can be read, but not changed by the computer. One simple form of secondary read-only memory is the punch card, which actually preceded the first computer. However, it is the intent of this report to focus only upon primary ROM storage; that is, circuits which are part of the CPU hardware and can be randomly accessed.

Ironically, primary ROM circuitry was the original method for programming electronic computers. The "programmer" was required to construct a circuit by plugging jumper wires into a pegboard as shown in Fig. 3. The entire board was then plugged into the computer and the wires (hopefully) made the proper connections. Needless to say, the programming process was tedious, but once constructed, it made the CPU a custom processor. John Von Neumann changed the procedure by storing and handling programs like all other data, and ROM circuitry all but vanished except for bootstrap mechanisms.

The need for ROM circuitry remained, but its usage was hindered by high cost and large physical size as well as the complexity of constructing it. By applying the VLSI technologies to memory circuits, as done for processor elements, ROM circuits are realizable for the storage of



programs and frequently used data. In fact, the highly repetitive nature of memory cells has resulted in even greater cost reductions for ROM than for processors.

A typical configuration for a read-only memory chip is shown in Fig. 4. A binary coded address enters on the left. The address is decoded and closes the appropriate internal switches to allow the contents of the specified memory location to flow out on the right through the data lines. The operation of each memory cell is still based upon the pegboard/jumper system. However, there are three major commercially available systems with different techniques for creating the jumpers.

In a conventional ROM (usually denoted by only the letters ROM), the jumpers are created and not created by the pattern transferred onto the silicon wafer. That is, they are manufactured directly. The advatage of this approach is extremely low cost when produced in quantity. The drawback is that a large number must be produced to offset the high fixed manufacturing costs (e.g., making the masks or patterns).

The programmable read-only memory (denoted as a PROM) is the next level of sophistication. The chip is manufactured without any data in it. After physical manufacture, the virgin chip is placed in a programming device and the data is programmed into it. The final PROM



is then used like a conventional ROM. This programming can be done only once, however. Fig. 5 illustrates a simple model of the internal workings of a PROM and is presented to aid in understanding the limitation (i.e., irreversibility) of PROM circuits. Each data line in each memory location is connect to the supply voltage on one end, to the external data lines and addressing switches on the other end and contains a segment of reduced diameter "wire". This reduced segment acts as a fuse element. Under normal working conditions, the supply voltage is low enough that the induced current through the fuse section of the data line does not cause it to burn out. In order to program a new chip (one where all fuses are intact), the supply voltage is When any data line is connected to ground, raised. sufficient current flows through the data lines to burn out If a zero (no voltage) bit is desired, the fuse the fuse. is burned out; and vice versa. It should now be clear that once a zero bit is "burned in" to a PROM, it cannot be changed. The advantage of a PROM is the ability to program it after manufacture making it well suited to small quantity applications. However, the added step of programming makes the chip most costly.

At the pinnacle of read-only memory technology is the erasable programmable read-only memory (denoted as an EPROM). Both the ROM and PROM are final in their



programming; if an error is embedded into the chip, there is no way to correct it and the chip must be discarded. EPROM's, are similar to PROM's but use a different fusing mechanism, one that can be repaired. Repairs are usually done on a global basis, essentially creating a new chip that must be completely reprogrammed. A common arrangement uses ultraviolet light to refuse the junctions. This added feature often makes the chip more expensive; but more importantly it makes the chip vulnerable to accidental erasing. As a result, the EPROM is best suited to small volume applications where errors are probable to occur, such as an experimental environment.

While the capacity of read-only memory chips have and will continue to increase, there are many applications where the capacity of a single chip is simply not sufficient. As shown in Fig. 6, it is a simple matter to use two or more chips together. Some of the address lines are used as address inputs and are sent to all ROM chips simultaneously. The remaining address lines are fed into a switching circuit which connects the output of the appropriate ROM to the final data output lines. In this manner, the switching circuit acts as a chip select mechanism and the effect is to create a much larger ROM chip.



Fig. 6 - Extending ROM Size

#### Data Storage on ROM

The storage of data is the most logical use of ROM chips. This section examines a technique called "hardware look-up" and how it can be used in numerical analysis applications.

will be shown that the technique can handle It functions of integer values, functions of real values, functions of multiple values, and even simultaneously evaluate multiple functions. Two possible programming implementations will be examined; both requiring little or no changes to existing programs and current programming practice. For the numerical analyst, the prime result will be significant speed increases, and a simulation procedure is demonstrated for predicting expected increases. The current technology will handle "hardware look-up", and the mechanics of placing data on ROM is discussed, including a multi-computer approach called "down-line loading" which enables a micro-computer to emulate a main-frame computer's capabilities.

<u>Hardware</u> <u>Look-Up</u>. The mechanics of the "hardware look-up" scheme works as follows: The input data value is written into a specific memory location. This data is then used as an address by the ROM chip. The output data value is then transferred by the ROM chip into some special memory location. The flow of data is shown in Fig. 7. The scheme



can be extended to functions of two variables by using the combination of two input data as one address for the ROM as shown in Fig. 8. Further, several ROM chips can be connected to the same memory locations to allow several functions to be "looked-up" simultaneously as shown in Fig. 9.

ROM chips are designed for the storage of discrete data; that is, data which is accessed by whole integer numbers. Thus, we may think of data stored in ROM as elements of an array, and the input address to the ROM may be thought of as the index of the array. This makes ROM chips well suited to the storage of tables such as wide-flange beam sizes and properties. In fact, for multi-column tables (tables consisting of multiple items per entry line), an arrangement of parallel ROM chips is ideal, for it allows quick access to all entries and simplified installation of updates.

Most of the data and/or functions used by the numerical analyst are continuous or analog in their nature. That is, they have an infinite number of values. Because ROM chips are discrete (i.e., have a finite number of storage locations), it is <u>theoretically</u> impossible to store a continuous function. However, by considering a practical limitation of computers, numerical round-off, it is not only possible but actually quite simple to store a continuous





وريستومره بعد



function.

All numbers, integer or real, are stored in the computer as a pattern of binary digits [6]. The number of digits used is determined by the word size of the computer and the encoding scheme, but is a fixed number for a given However, for a fixed number of binary digits, say n, type. only 2\*\*n possible permutations exist. In order to accommodate real numbers using a fixed number of digits (and thus a finite number of permutations), the desired range of real numbers is divided into subregions and each subregion corresponds to a permutation. (See Fig. 10.) Any real number within a given subrange is assigned to the same permutation, and thus are all treated as the same real This process is conventionally called "round-off". number. The important result is that the continuous set of real numbers have been converted to a discrete set of binary permutations. By interpretting the binary permutation patterns as integers, an address can be obtained for storage and retrieval on the ROM chip. (The net result is much like that of printing a real number with an I format in the FORTRAN language.)

In summary, real numbers are encoded to a discrete set of bit patterns through round-off; the patterns are then used as input addresses to the ROM chips. No accuracy is lost, no numbers unaccounted for, as all <u>storable</u> real

. The second





numbers are used. Thus, continuous furnctions (such as sine, logarithms, and polynomials) may be computed once and completely stored on ROM by storing the corresponding values for all of the <u>storable</u> real input values. For a 16 bit encoding scheme, 64K of ROM is required.

<u>Programming Interface</u>. To the numerical analyst, the most important issue is how ROM chip capabilities will be used with programming languages; and more specifically, how current programs will have to be modified to take advantage of the new technology. The discussion presented here is geared toward the FORTRAN language; however, similar constructs are available in most other languages, most notably, BASIC, the language used on most micro-computers. Two major approaches are available: function calls, and special COMMON blocks.

The COMMON block approach requires the programmer to include a special labelled COMMON block in the program. The linker (or linking loader) is used to bind this block to the locations in memory where the ROM circuits reside. Most linkers presently provide this capability in their advanced or extended features. The ROM chips may then be used in the manner previously described: input data is sent by making an arithmetic assignment, and the generated output data is available by simply using the appropriate variable from the common block. For example:

· The second second second

COMMON /TRIG/ X,SIN,TAN X=Y\*3.14/180.

Z=SIN+(2.\*TAN)

The first line designates the special COMMON block; the second line loads a value into the parallel ROM circuits; and the third line uses two of the output data values.

The function call approach behaves the same as any system or language supplied function. That is, an ordinary function call is made (e.g., X=SIN(Y)), and a machine or assembly language routine is involked to generate and return the proper value. Further, an array is syntactically called in the same manner as a function; thus, the function call approach can also be compared with addressing arrays. The blending of FORTRAN and assembler routines is done with the linker (or linking loader) and can be easily handled by all existing linkers.

Both approaches are quite workable and simple. The COMMON block approach is better suited when several inputs are needed, and is <u>required</u> if several output are generated by a parallel ROM arrangement. This latter condition occurs because a FORTRAN function can return only one value. The function call approach is vastly superior for existing software. Present code can be left intact: the code for the function is simply replaced with new function code which accesses the ROM chips. However, the COMMON block approach

a to brank the second

can be converted easily to a function call approach (i.e., the COMMON blocks can be hidden away in the function call). As a result, the COMMON block technique should be used as the implementation scheme and the programmer can then use both.

The most significant result is that current programming practice can continue, and conversion can be done to existing code without rewriting it.

<u>Speed Increases</u>. The primary benefit from hardware look-up will be increased speed. This increase will be derived from pre-computing all potentially needed values rather than generating them upon request, much like mathematical tables are (or were) used in engineering practice. In order to gain some insight into the magnitude of the increase, a simple simulation procedure was carried out.

The procedure consisted of running a pair of programs for several different functions. The first of the pair of programs contained a conventional function call; thus, computations were performed upon request. The second program of each pair simulated the ROM chips by: 1) creating an array, 2) computing all potential values of the function and storing them in the array, 3) changing all function calls to array accesses (although this actually requires nothing as the syntax is the same for both), and 4) running

the program. The comparison was made by monitoring the CPU time (through system subroutines linked to the operating system) of the complete first program and the execution <u>after</u> loading the array for the second program. Each pair of programs were run four times to minimize system loading fluctuation and the results are shown in Fig. 11. A typical program pair is shown in Fig. 12, and the computations were performed on an IBM 370/3033.

Figure 13 graphically depicts a series run on polynomials. The gradient change at sixth-order polynomials is believed due to a change in computational procedure: specifically, changing from repetitive multiplication to a logarithm-antilogarithm scheme.

Two important trends can be seen by examining the table of Fig. 11 and the graph of Fig. 13. First, for standard mathematical functions (e.g., SIN, LOG), speed increases of 15 to 20 can be realized. These savings can be applied to a large range of numerical computations (e.g., development of rotation matrices and fourier transforms). Second, for custom functions such as polynomials, speed increases are related to the complexity of the function and result in speed increases of several orders of magnitude. It should be noted that no substantial speed increase will result from converting tabular data stored in arrays to ROM chips, but other benefits are derived.

|                         | Time for function<br>evaluation (cpu sec) | Time for array<br>evaluation (cpu sec) | Speed Increase |
|-------------------------|-------------------------------------------|----------------------------------------|----------------|
| Multiplication          | 0.777                                     | 0.095                                  | 8.18           |
| Addition                | 0.727                                     | 0.095                                  | 7.65           |
| Division                | 0.835                                     | 0.105                                  | 7.95           |
| Logarithm               | 1.649                                     | 0.099                                  | 16.66          |
| Sine                    | 1.694                                     | 0.099                                  | 17.11          |
| Cosine                  | 1.675                                     | 0.094                                  | 17.82          |
| Exponentiation          | 1.658                                     | 0.094                                  | 17.63          |
| Polynomials             |                                           |                                        |                |
| 0 <sup>th</sup> order   | 0.585                                     | 0.101                                  | 5.79           |
| 1 <sup>st</sup> order   | 0.707                                     | 0.101                                  | 7.00           |
| 2 <sup>nd</sup> order   | 0.766                                     | 0.101                                  | 7.58           |
| 3 <sup>rd</sup> order   | 0.893                                     | 0.101                                  | 8.84           |
| 4 <sup>th</sup> order   | 0.988                                     | 0.101                                  | 9.78           |
| 5 <sup>th</sup> order   | 1.105                                     | 0.101                                  | 10.94          |
| 6 <sup>th</sup> order   | 1.243                                     | 0.101                                  | 12.30          |
| 7 <sup>th</sup> order   | 1.944                                     | 0.101                                  | 19.25          |
| 8 <sup>th</sup> order   | 2.649                                     | 0.100                                  | 26.49          |
| , 9 <sup>th</sup> order | 3.378                                     | 0.101                                  | 33.45          |
| 10 <sup>th</sup> order  | 4.108                                     | 0.101                                  | ^ <b>0.67</b>  |

\*Speed Increase = Function Time/Array Time

Fig. 11 - Potential Speed Increases with ROM Circuits

. د شا

# Function Call

M2

|    | F(K)=K**2      |
|----|----------------|
|    | CALL TIME      |
|    | DO 20 I=1,1000 |
|    | DO 10 J=1,50   |
|    | Z=F(J)         |
| 1Ø | CONTINUE       |
| 2Ø | CONTINUE       |
|    | CALL TIME      |
|    | STOP           |
|    | END            |

Array

| DIMENSION F(5#) |
|-----------------|
| DO 5 I=1,5ø     |
| F(I)=I**2       |
| CONTINUE        |
| CALL TIME       |
| DO 20 I=1,1000  |
| DO 10 J=1,50    |
| Z=F(J)          |
| CONTINUE        |
| CONTINUE        |
| CALL TIME       |
| STOP            |
| END             |
|                 |

Fig. 12 - Typical Program Pair

And the second s

ς.



· The second second

<u>Down-Line Loading</u>. The programming of FROM and EPROM chips is, of course, done by a computer. This manufacturing process leads to an interesting extension, particularly if EPROM chips are used. Fig. 14 outlines the major steps.

Down-line loading is a procedure where one computer programs another, regardless of the storage medium. It is suggested here that the programming be a transfer of data and that the storage medium be a PROM circuit (or preferably, an EPROM chip).

The first step is to install a virgin PROM or to erase an existing EPROM. Often the PROM must be connected to a special device during programming, one that can supply the necessary electrical signals; the completed PROM is later moved to a permanent, but ordinary, chip socket. Second, communication is established between: the two CPU's, the user and one of the CPU's, and the virgin PROM and the non-programming CPU. Third, the programming CPU transfers the data through the non-programming CPU to the PROM where the data is embedded into silicon. Communications between the CPU's is no longer needed and is usually severed. Fourth, the non-programming CPU runs alone and uses the data from the PROM as if it were initially manufactured into the CPU.

The prime advantge of down-line loading is that the power of a large computer can be used to quickly generate



complex data to be stored on the PROM. Once programmed, the PROM can be used by a smaller machine, and thus, imitates the power of a much larger CPU. Further, because the programming is done once, the larger CPU can "train" many smaller machines. EPROM chips allows the process to be repeated as updates are needed. The communication between two CPU's is usually very fast and the smaller machine often buffers the data for the slower process of programming the PROM chip. This arrangement is best suited for a main-frame to micro-computer connection.

<u>Needed Technology</u>. All of the plans for placing data into ROM is presently available. But, present technology would require multiple chips in an extended ROM configuration. This would result in a somewhat expensive implementation as compared to the "dollars and cents" <u>cost</u> of computing upon request. The speed increases could easily tip the scales in a critical situation, however.

VLSI and ULSI (ultra-large scale integration) will soon provide a single chip capacity large enough for 16 and 32 bit single value input ROM circuits. At that point, the ROM circuit cost will be less than computing upon request.

### Software Storage on ROM

Because a program is treated as data in the modern computer, ROM chips can be used to store programs and data. Further, no changes are required in the construction of the

ROM chip, and thus, the mechanics of transferring the program to the chip is the same, though the generation of the information is different. As a result, normal ROM's, PROM's, and EPROM's may all be used to store programs. The final product, software stored on ROM, is called <u>firmware</u>.

section examines the impact of firmware on This computing. The section is divided into five subsections, with the first three comprising the heart of the discussion. As a result of this discussion, it will become evident that the key importance of firmware is its ability to transform a general processor into a custom or semi-custom processor. the numerical analyst, two practical applications For presently exist: The first use is to extend the hardware features of the machine through firmware subroutines; these subroutine libraries, will like reduce extensions, programming cost by eliminating the rewriting of standard computational procedures. The customization will be as simple as plugging in appropriate ROM modules. The second use takes the first use an additional step and consists of putting an entire program onto ROM. This will create a complete custom processor. Current examples of these applications are shown in the last subsection, and the firmware fourth subsection describes the creation of "down-line loading". The other applications including (microcode programming and operating systems on ROM) should

not be ignored, however. They are destined to be practically available to the numerical analyst as improved sofware tools are developed.

Advantages and Myths of Firmware. The primary advantage of firmware is its ability to create a custom processor. That is, once the ROM chips are installed, the programmer (or user) may view the new software as an intrinsic part of the hardware. As we shall see shortly, these customizations may range from extended hardware features to specialize operating systems dedicated to the performance of a single task.

One peculiar advantage of firmware is that it is difficult to reproduce a ROM chip unless you are original manufacturer. Further, it is almost impossible to produce the same software for another machine if the software resides in a ROM. That is, it is a prohibitively costly procedure except for the developer of the firmware who alone has access to the source code and original manufacturing chip masks. Thus, firmware aids in the battle against software piracy. In the past, software piracy has not posed a major difficulty for the numerical analyst; but, with the proliferation of micro-computers and corresponding users, the protection of innovative programming (for the recovery of developmental costs) will become increasingly important.

Having seen the two primary advantages of firmware, we shall now expose some of the myths. It is important to firmly establish that software in ROM will NOT yield any significant speed increases. Most programs are loaded into memory prior to the start of execution by the loader or linking loader. While some speed increase certainly occurs in accessing ROM circuitry over disk, the total load time is generally negligible when compared to execution time. Some micro-computer systems appear to exhibit improved performance when operating from ROM; this, however, is an illusion. The speed increase results from executing compiled code (previously translated to machine language) instead of interpretted code (translate and executed simultaneously). This speed improvement can be obtained by simply installing a compiler program.

Firmware usage will result in some space (or memory) savings. However, as memory sizes are and will be expanding with reduced costs, the space savings will not be significant for large machines. For micro-computers the space savings are significant, but, again, an illusion. In order to facilitate firmware, a certain number of memory locations must be connected to the ROM chips; memory locations which must be robbed from normal memory. Thus, it is important to actually limit the amount of memory devoted to firmware.

C. THE ST. P. ST. P. ST.

Firmware has the potential to reduce the cost of software, but only when large quantities of a single item are to be produced. The potential savings result from the ability to produce an entire chip in a few operations regardless of its complexity. There are several snags in this approach, however. First, this maufacturing procedure only applies to normal ROM chips, not PROM or EPROM chips where each datum must be transferred to each chip. Second, the development cost is quite expensive, and thus, must be distributed over a large production run to make duplication a significant cost. Third, the machine targetted for by the software must be physically able to accept ROM chips and the signal interface standards must match. Because hardware varies so greatly, ROM chips are usually compatible with only one machine; thus, many of one type of machine must exist. Hence, production cost savings do not apply for main-frame computers. In fact, these cost savings are only appreciable for popular micro-computers.

<u>Uses of ROM Software</u>. Technically, any program or set of instructions can be placed on a ROM chip. There are, however, certain loose categories of programs that emerge. The discussion here is not meant to be complete, but rather, to indicate the scope of firmware.

The first usage is to extend or define the instruction set of a processor chip. Obviously, this requires a special

processor chip called a micro-programmable micro-processor. This processor is designed and fabricated without a fixed set of instructions. After fabrication a PROM circuit, usually part of the processor chip, is encoded to provide the processor's instruction set. The procedure is called micro-programming and uses a primative but complicated machine language called micro-code. Until such time as the micro-programing task is simplified, this process will remain impractical for numerical analysts.

The second level of usage is to permanently install program segments or subroutines into primary memory. Fig. 15 shows a typical memory map for such an arrangement. The low memory addresses store interrupt vectors and other hardware dependent data. The next low memory block holds the operating system and related data. At the high end of memory are the input/output ports to which the peripherals and peripheral control equipment are wired. The next high block of memory is reserved for ROM chips (data and software), and the ROM chips are physically wired to these locations. The remaining space is conventional primary random access memory, and can, thus, be used for program and data storage. The exact distribution and order of these blocks will vary with each machine and operating system. The subroutines are encoded into ROM chips, which are then "permanently" installed into memory. (We shall explain how





· Mr. with Same Land Martin

they are used shortly.)

At the third level of usage, the ROM space shown in Fig. 15 is expanded and complete programs are installed. Some conventional random access memory must be retained to store the data being processed, the output data, and intermediate results. The classic Von Neumann trade-off of program storage versus data storage limits the system. As with subroutines, several programs can be resident in this configuration and selection made though the operating system.

At the fourth level of usage, the operating system can be implemented as ROM. Again, any associated data must be kept in conventional random access memory. In this fashion, the computer system becomes a dedicated processor (e.g., a word-processing machine or a finite element pocket calculator). This level of usage requires the writing of an operating system, and thus, is impractical for the numerical analyst. As software design aids become available, this technique may prove practical.

<u>Programming Interface</u>. A quick examination of Fig. 15 will reveal that firmware is not any different than normal software except where in memory it is stored. As a result, firmware can readily be handled with existing techniques and tools, most notably the linker, loader, or linking loader.

For the case of the complete program, no linking is required, the code is always resident, and the only operation necessary is for the loader to pass control to the appropriate starting address. The last statement in these ROM programs is to pass control back to the operating system. This procedure also applies to ROM based operating systems except the last step (returning control) is omitted as all operating systems are designed as infinite loops.

While micro-code programming is beyond the scope of this report, the use of micro-code requires the development of new compilers each time the instruction set is redefined or extended. Further, to make efficient use of the newly defined instruction set, the compilers must be optimizing compilers. Hence, it is evident that the micro-programmable processor is currently beyond the practical limits of the numerical analyst.

The only real programming interface for the numerical analyst is the use of program segments or subroutines. These subroutines are (or should be) in a machine readable form and are permanently loaded into the computer. The only concern, then, is the linking process which may be divided into two subproblems. The first subproblem is branching to and returning from the subroutines. The solution is the same as for complete programs except connections are made to a calling program rather than the operating system. Any

existing linker can accomplish this task. The second subproblem is the passing of arguments. The arguments (and other data) must reside in the random access memory, not in Thus, firmware subroutines must be compiled so that ROM. all variables reside separately from the code and all addresses be fixed. The most logical method of assuring this requirement is to use a fixed location COMMON block as was described earlier. It is, of course, possible and practical for all the firmware subroutines to share the same fixed COMMON block as firmware to firmware calls are predictable and shared COMMON blocks will save space. Other methods include passing arguments in the registers, on the system and/or user stacks, and through indirect addressing. These alternate methods are cumbersome for the numerical analyst and provide no real benefit over the COMMON block method.

<u>Creation of Firmware</u>. Virtually all firmware exists as machine language code, yet it is extremely rare for anyone to write machine code. Obviously, some type of translation is made from another language. At the elementary level, an assembly language is used and is translated by an assembler program. Because assembly language is very similar to machine code, assembly language affords the programmer the opportunity to create the highest speed code while simultaneously minimizing software storage requirements.

Although assembly language programs are easier to write than machine code, assembly language is still extremely cumbersome for composing large or complex programs. То alleviate the problem, a high level language such as FORTRAN or BASIC is used to write source code. This source code is then translated to machine code (or an object module) by a special program called a compiler. However, in order to support the flexibility of the high level language, a certain amount of inefficiency must be built into the compiler program. High level languages can also be translated by another type of special program called an interpretter. But, an interpretter translates one line of code at a time as the high level language is being executed. This means that every statement in a loop is translated everytime the loop is executed and the entire high level language code must be retranslated (including all the repetitions of the loops) each time the program is run. While interpretted code has the advantage that source code is stored, the extra translation time quickly outweighs the apparent advantage. As a result, interpretted code is rarely stored as firmware. One final note about translation is warranted: it is common practice to compile or assemble a program on the machine (or same model) where the machine code will operate; however, it is quite possible to compile or assemble on another machine with a special translation

program called a cross-compiler or cross-assembler. This cross translation procedure is becoming increasingly popular for the preparation of firmware.

Once the machine code has been created, it must be transferred into a chip. The mechanics for this process is exactly the same as for data; and conventional ROM, PROM, and EPROM implementations are possible. However, due to the low volumes of firmware production runs, PROM and EPROM chips are almost always used, with EPROM chips handling most of the experimental versions and PROM chips being used for final production runs. As with data storage, down-line loading is possible and used to a small extent.

<u>Firmware Examples</u>. Most firmware in existence (barring system bootstrap loaders) is used in micro-computer systems. As micro-computer usage grows, firmware will also expand. In this section, two systems utilizing firmware are briefly examined. These are certainly not only systems available, but are typical of what is presently done with firmware. (No endorsement of these products is implied.) The first example illustrates how firmware subroutines can be used to extend the hardware features of a machine. The second example illustrates how a complete custom processor can be created. This second example could be used to create a finite element personal computer, i.e., a machine that powers up as a finite element machine. Only the practical

1. 1980 - 10 C 2000 - 1980

limitations of memory size and micro-processor speed hinder this grand creation.

The first system is the Tektronix [3] 4050 series of micro-computers (models 4051, 4052, and 4054). A typical configuration (see Fig. 16) possesses the micro-processor, memory, a vector storage tube display, keyboard, magnetic tape drive, port for a hardcopy unit, an RS-232-C or GPIB port, and slots for ROM cartridges. The machines are programmable only in BASIC and the ROM slots are configured provide various extensions to the BASIC language to depending upon the modules inserted. Modules may be inserted, removed, and swapped by the user as his/her needs change as the modules are simply slid in and out. Presently, only Tektronix offers a line of compatible ROM cartridges and these cartridges are pre-defined (i.e., no mechanism exists for the user to generate his/her own Typical ROM cartridges provide cartridge). matrix an extended line-editor. Because the operations and firmware is in machine language, considerable speed improvements are possible over the normal BASIC language.

The second system is the Radio Shack TRS-80 Color Computer. Models range from 4K memory, standard BASIC language for \$400; to 32K memory, extended BASIC for \$750, both without the television monitor. In addition to BASIC, the computer can be programmed in machine language and



assembly language, thus allowing direct access to the features of the Motorola 6809 micro-processor. The typical system configuration is shown in Fig. 17 and provides the micro-processor, memory, keyboard, and connections to a standard television, cassette recorder, RS-232-C port, and joystick inputs as well as the ROM cartridge slot. In this system the ROM cartridge is intended to customize the operation of the machine. Radio Shack and other independent vendors offer a wide range of cartridges including several games, interactive graphics editor, financial packages, text (or word) processor, cassette filing system, and "smart" terminal package. More importantly, one of the independent vendors [5] will take any BASIC program, translate it, and store the result on a ROM cartridge. The cost ranges from \$42 to \$84 depending upon program size. Using this service, any high level programmer can create a custom processor through firmware.

## <u>Conclusions</u>

This report has presented a thorough overview of the impact of read-only memory (ROM) chips on computing procedures and the relation of this impact with other VLSI technologies. It has been demonstrated that ROM chips can be successfully used to store both data and programs. For the numerical analyst, the following are important points: 1. ROM technology is presently available to provide



numerical analysis requirements on a single board.

- Single chip ROM implementations will be available shortly.
- Data storage on ROM (or hardware look-up) will provide substantial increases in speed.
- 4. Both discrete and continuous data may be stored on ROM chips due to the discrete nature of binary encoding.
- 5. Programs on ROM (or firmware) will allow simplified development of custom processors.
- Subroutines on ROM (firmware) will reduce the programming cost by eliminating the rewriting of standard computational procedures.
- 7. Neither micro-code programming nor operating systems on ROM are within the current practical limitations of the numerical analyst, but are predicted to become available as software tools improve.
- No significant changes are required in programming to accomodate either hardware look-up or firmware.
- 9. Down-line loading will provide main-frame capabilities on micro and mini-computers
- 10. ROM applications, including user defined ROM, is beginning to emerge for micro-computers.

## Acknowledements

The funding for this research was provided by the Office of Naval Research under contract No. N00014-80-C-0712. This is the second technical report produced under that contract.

## References

- Feeser, L.J., Rooney, M.F., and Shephard, M.S., <u>VLSI</u> <u>Technologies and Numerical Analysis</u>, Technical Report to Office of Naval Research, Contract No. N00014-80-C-0712, Department of Civil Engineering, Rensselaer Polytechnic Institute, Troy, New York, 12181, June 1981.
- Marks, M., "Impact of VLSI on System Architecture", Scientific American Seminar, Rensselaer Polytechnic Institute, Troy, New York, November 5, 1981.
- 3. Tektronix Corporation, Beaverton, Oregon.
- Radio Shack, A division of the Tandy Corportion, Fort Worth, Texas.
- 5. Eigen Systems, P.O. Box 10234, Austin, Texas.
- Rooney, Martin F., "Computer Hardware for Civil Engineers", <u>Journal of the Technical Councils</u>, American Society of Civil Engineers, New York, Vol. 107, No. TC1, April, 1981, pp. 153-168.

## INDEX

Acknowledements, 47 Address Line, 12 Advance Due to VLSI, 2 Advantage of Down-Line Loading, 28 Advantages and Myths of Firmware, 32 Analog Data, 16, 46 Approaches to ROM Implementation, 21 Approaches, Suitability, 22 Array, 16, 22, 23, 24 Assembler, 22, 39, 41 Assembly Language, 22, 39, 44 BASIC, 21, 40, 42, 44 Binary Coding, 8 Binary Encoding, 46 Bootstrap, 6, 41 Buffer, 30 Burn In, 10 Chip Select Mechanism, 12 COMMON Block Approach, 21, 39 Compiled Code, 33 Compiler, 38 Compiler, Optimizing, 38 Complexity, 24, 34

L - L bank to

Computer Graphics, 42 Computing Upon Request, 1, 23 Computing Upon Request, Cost of, 30 Conclusions, 44 Continuous Functions, 16 Converting Between Approaches, 23 Cost of Computing Upon Request, 30 Cost/Performance Ratio, 2 Cost, Programming, 31, 46 Cost, Software, 34 Costs, Manufacturing, 8 CPU, 5, 28 CPU Time, 24 Creation of Firmware, 39 Cross-Assembler, 41 Cross-Compiler, 41 Current Programs, 2, 14, 23, 46 Custom Functions, 24 Custom Processor, 1, 31, 32, 37, 41, 44, 46 Data Line, 8, 10 Data Storage on ROM, 14, 46 Dedicated Processor, 37 Digital Equipment Corporation, 5 Discrete Data, 16, 46 Down-Line Loading, 14, 28, 31, 41, 46

4 P & 7

Down-Line Loading, Advantage, 28 Down-Line Loading, Definition, 28 Electro-Mechanical Devices, 5 Encoding Scheme, 19 EFROM, 5, 10 EPROM, Programming of, 28 Erasable Programmable Read-Only Memory, 10 Erasing, 12 Error, 12 Execution Time, 33 Extended Hardware Features, 31, 32, 41 Extended ROM, 12 Finite Element Pocket Calculator, 37 Firmware, 1, 2, 31, 46 Firmware Examples, 41 Firmware, Definition, 31 FORTRAN, 19, 21, 22, 40 Fourier Transform, 24 Function, 16 Function Call Approach, 21, 22 Function of Two Variables, 16 Function, FORTRAN, 22 Functions of Integer Values, 14 Functions of Multiple Values, 14 Functions of Real Values, 14

Functions, Custom, 24 Functions, Multiple, 14 Fuse, 10 GPIB, 42 Hardcopy, 42 Hardware, 1, 5, 32 Hardware Look-Up, 14, 46 Hardware, Extended, 32 High Level Language, 40, 44 I/O Port, 35, 42 IBM, 2, 24 Implementation Approaches, 21 Index of Array, 16 Indirect Addressing, 39 Input Address, 19 Instruction Set, 34 Interfacing, 34 Interpretted Code, 33 Interpretter, 40 Interrupt Vector, 35 Introduction, 1 Irreversibility, 10 Joystick, 44 Jumper, 6 Limitation, Size, 6, 42

Ŷ

Linker, 21, 22, 37, 39 Linking Loader, 21, 22, 37 Load Time, 33 Logarithm, 21, 24 Machine Language, 22, 33, 35, 39, 44 Machine Readable Form, 38 Magnetic Tape, 42 Main-Frame Computer, 14, 34, 46 Manufacturing Costs, 8 Map, Memory, 35 Mask, 8 Matrix, Rotation, 24 Memory Cell, 8 Memory Circuit, 5 Memory Map, 35 Micro-Code, 35, 38 Micro-Code Programming, 31, 46 Micro-Computer, 2, 14, 21, 30, 32, 33, 34, 41, 42, 46 Micro-Electronics, 2 Micro-Processor, 42, 44 Micro-Programmable, 35 Mini-Computer, 46 Modifying Current Programs, 21, 46 Motorola, 44 Multi-Column Table, 16

a n shart to

Multiple Functions, 14 Myths, 33 Needed Technology, 30 Object Module, 40 Operating System, 35, 37, 38 Operating Systems on ROM, 31, 46 Optimizing Compiler, 38 Overview, 2 Packaging, 5 Parallel ROM, 16, 22 Passing Arguments to ROM, 39 Pegboard, 6 Performance/Cost Increases, 2 Peripherals, 5, 35 Permutations, 19 Piracy, Software, 32 Polynomial, 21, 24 Port, 1/0, 35 Power Supplies, 5 Pre-Computing, 1, 23 Present Code, 22 Primary Influences, VLSI, 5 Processor, Custom, 32, 37, 44 Processor, Dedicated, 37 Programmable Read-Only Memory, 8

A section of the sec

Programmable Read-Only Memory, Erasable, 10 Programmable, Micro, 35 Programming Cost, 31, 46 Programming Interface, 21, 37 Programming Language, 21 Programming PROM and EPROM, 28 Programming, Micro-Code, 38 PROM, 5 PROM, Definition, 8 PROM, Programming of, 28 Punch Card, 6 Radio Shack, 42 Random Access Memory, 35, 37, 39 Read-Only Memory, Definition, 6 Read-Only Memory, Erasable Programmable, 10 Read-Only Memory, Programmable, 8 References, 47 Registers, 39 Reproducing ROM Chips, 32 Review of ROM Hardware, 5 Rewriting Programs, 23 ROM, 1, 5 ROM Cartridge, 42, 44 ROM Implementation Approaches, 21 ROM, Conventional, 8

ROM, Definition, 6 Rotation Matrix, 24 Round-Off, 16, 19 RS-232-C, 42, 44 Secondary Influences, VLSI, 5 Semi-Custom Processor, 31 Simulation, 23 Sine, 21, 22, 24 Single Board Configuration, 46 Single Chip Implementation, 46 Size Limitation, 6 Software Piracy, 32 Software Storage on ROM, 30 Software Storage Requirements, 39 Software Tools, 32, 46 Source Code, 32, 40 Space Saving, 33 Speed Increases, 1, 14, 23, 24, 30, 33, 46 Stack, 39 Starting Address, 38 Storable Real Numbers, 21 Storage of Data, 14 Subroutine, 31, 38, 41, 46 Subroutine Libraries, 31 Switching Circuit, 12

Tables, 16, 23 Tabular Data, 24 Tektronix, 42 Time, CPU, 24 Time, Execution, 33 Time, Load, 33 Training, Down-Line Loading, 30 Transform, Fourier, 24 Translate, 44 Translation, 39 Trends, 2, 5, 24 ULSI, 30 Ultra-Large Scale Integration, 30 Ultraviolet Light, 12 Updates, 16 User Defined ROM, 46 Uses of ROM Software, 34 VAX Computer, 5 Virtual Memory, 5 VLSI, 1 Von Neumann, John, 6, 37 Wafer, 2,8 Word Processor, 37 Word Size, 19

Ľ.

| REPORT DOCUMENTATION PAGE                                                                                                                                                                                                                                                                       | READ INSTRUCTIONS                                                |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| REPORT NUMBER                                                                                                                                                                                                                                                                                   | BEFORE COMPLETING FORM                                           |
| AD-A11                                                                                                                                                                                                                                                                                          |                                                                  |
| . TITLE (and Sublitio)                                                                                                                                                                                                                                                                          | S. TYPE OF REPORT & PERIOD COVER                                 |
| The Implications of VLSI ROM Chips on<br>Numerical Analysis                                                                                                                                                                                                                                     | Interim Technical Report                                         |
|                                                                                                                                                                                                                                                                                                 | C. PERFORMING ORG. REPORT NUMBER                                 |
| . AUTHOR(e)                                                                                                                                                                                                                                                                                     | L. CONTRACT OR GRANT NUMBER(4)                                   |
| L.J. Feeser                                                                                                                                                                                                                                                                                     |                                                                  |
| M.F. Rooney                                                                                                                                                                                                                                                                                     | N0014-80-C-0712                                                  |
| M.S. Shephard<br>PERFORMING ORGANIZATION NAME AND ADDRESS                                                                                                                                                                                                                                       | 10. PROGRAM ELEMENT PROJECT, TAL                                 |
| Rensselear Polytechnic Institute                                                                                                                                                                                                                                                                | AREA & WORK UNIT NUMBERS                                         |
| Department of Civil Engineering                                                                                                                                                                                                                                                                 |                                                                  |
| - Troy, New York 12181                                                                                                                                                                                                                                                                          |                                                                  |
| 1. CONTROLLING OFFICE NAME AND ADDRESS<br>Director Structural Mechanics, Material Scient                                                                                                                                                                                                        | e 12. REPORT DATE JONT JAN. 8,                                   |
| Office of Naval Research, 800 No. Quincy Stree                                                                                                                                                                                                                                                  | t, 12. NUMBER OF PAGES                                           |
| Arlington, VA 22217                                                                                                                                                                                                                                                                             | 57                                                               |
| 14. MONITORING AGENCY NAME & ADDREES(I dillorent frem Controlling C                                                                                                                                                                                                                             | THE SECURITY CLASS. (of the report)                              |
| •                                                                                                                                                                                                                                                                                               | UNCLASSIFIED                                                     |
|                                                                                                                                                                                                                                                                                                 | I ONCERDITIED                                                    |
|                                                                                                                                                                                                                                                                                                 |                                                                  |
| - DISTRIBUTION STATEMENT (of this Report)<br>This document has been approved for public re<br>'unlimited.                                                                                                                                                                                       | 16. DECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>unlimited.                                                                                                                                                                                                                                     | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re                                                                                                                                                                                                                                                   | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>unlimited.                                                                                                                                                                                                                                     | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>unlimited.                                                                                                                                                                                                                                     | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>unlimited.                                                                                                                                                                                                                                     | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>"unlimited.<br>77. DISTRIBUTION STATEMENT (of the ebstront entered in Block 30, 11 dth                                                                                                                                                         | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>"unlimited.<br>77. DISTRIBUTION STATEMENT (of the ebstront entered in Block 30, 11 dth                                                                                                                                                         | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>"unlimited.<br>77. DISTRIBUTION STATEMENT (of the ebstront entered in Block 30, 11 dth                                                                                                                                                         | The BECLASSIFICATION/DOWNGRADING                                 |
| This document has been approved for public re<br>"unlimited.<br>T. DISTRIBUTION STATEMENT (of the about an intered in Block 30, 11 differences<br>B. SUPPLEMENTARY NOTES<br>:<br>S. KEY WORDS (Combine on reserve olde M accessory and identify by block                                        | The BECLASSIFICATION/DOWNGRADING<br>Hease and sale; distribution |
| This document has been approved for public re<br>"unlimited.<br>T7. DISTRIBUTION STATEMENT (of the electron entered in Block 39, H dH<br>IS. SUPPLEMENTARY NOTES<br>:                                                                                                                           | The BECLASSIFICATION/DOWNGRADING<br>Hease and sale; distribution |
| This document has been approved for public re<br>"unlimited.<br>T. DISTRIBUTION STATEMENT (of the about an intered in Block 30, 11 differences<br>B. SUPPLEMENTARY NOTES<br>:<br>S. KEY WORDS (Combine on reserve olde M accessory and identify by block                                        | The BECLASSIFICATION/DOWNGRADING<br>Hease and sale; distribution |
| This document has been approved for public re<br>"unlimited.<br>T. DISTRIBUTION STATEMENT (of the abstract enforced in Block 30, H dH<br>B. SUPPLEMENTARY NOTES<br>:<br>S. KEY WORDS (Combine on service of the N accounty and Monthly by Mode<br>Computers, Numerical Analysis, Very Large Sca | The BECLASSIFICATION/DOWNGRADING<br>Hease and sale; distribution |

...

1

ľ

いろうないるからたのないのであるという

| 20. ABSTRACT (Continued                             | ABE (Man Date Enterol) |           |             |           |         |    |
|-----------------------------------------------------|------------------------|-----------|-------------|-----------|---------|----|
| firmware are presented.<br>for the numerical analys | Final conclusions      | highlight | the importa | nce of RO | M chips | 1  |
|                                                     |                        | •         |             |           |         | į  |
|                                                     |                        | •         |             |           | :       |    |
|                                                     |                        |           | •           |           |         | 1  |
|                                                     |                        |           |             |           |         |    |
|                                                     |                        |           |             |           |         |    |
|                                                     |                        |           |             | *•        |         |    |
|                                                     | • ·                    |           |             |           |         |    |
|                                                     | •                      |           |             |           |         |    |
| · · ·                                               | . <b></b>              |           | •           | •         | · ·     | ļ, |
|                                                     | · · ·                  |           |             |           |         |    |
|                                                     |                        |           |             | •         |         |    |
|                                                     |                        |           |             |           |         |    |
|                                                     | . <u>.</u> .           | •••       |             | •         |         |    |
| ·                                                   | :                      |           |             |           | •       |    |
|                                                     | . <b>.</b>             |           |             | •         | •       |    |
|                                                     |                        | •         | •           |           |         |    |
|                                                     |                        |           |             | •         |         |    |
|                                                     |                        |           |             | ••        | •       |    |
|                                                     | :                      |           |             |           |         |    |
|                                                     |                        |           |             |           | •       |    |

<u>R</u>

