# RI Internation ### AD-A276 506 A002/Code W; A003/Documenting; A004/Final Report • February 23, 1994 #### PROOF OF CONCEPT FOR THE REWRITE RULE MACHINE: INTERENSEMBLE STUDIES José Meseguer, Principal Scientist Computer Science Laboratory SRI Project ECU 1505 Prepared for: Chief of Naval Research 800 North Quincy Street Arlington, VA 22217-5660 Attn: Dr. Keith Bromley, Scientific Officer Mary Ann Cook, Contracting Officer Director, Naval Research Laboratory, Attn: Code 2627 **Administrative Contracting Officer Defense Technical Information Center** Contract No. N00014-90-C-0264 Approved: Mark Moriconi, Director Computer Science Laboratory Donald Nielson, Vice President Computing and Engineering Sciences Division This document has been approved for public release and sale; its distribution is unlimited. DTIC QUALLTY HISTEOTIED 2 ## Best Available Copy # Final Report for the Project "Proof of Concept for the Rewrite Rule Machine: Interensemble Studies" Patrick Lincoln, José Meseguer, Babak Taheri, and Timothy Winkler SRI International, Menlo Park, CA 94025 #### 1 Introduction Under the direction of Dr. José Meseguer, the Rewrite Rule Machine (RRM) team began this project on 15 August 1990, and completed the work on 14 August 1991. In addition to the project participants listed as authors, Prof. Joseph Goguen of Oxford University served as a consultant. The main goal was to learn through simulation about the functionality and performance on realistic applications of an RRM system consisting of a collection of RRM ensemble chips (each such chip being a SIMD processor) connected on a network, and to design mechanisms to support the simultaneous parallel computation of applications across many such ensemble chips. To achieve these goals we first built a high-level interensemble simulator and ran a collection of benchmarks on it under varying assumptions about several architectural parameters to obtain a first estimate of the communication requirements for the RRM and to determine the feasibility of those requirements in view of existing network technology. Using a second, very detailed register-transfer level simulator of a single ensemble and the performance results of a collection of applications run on it, together with modeling above the ensemble level, we also estimated interensemble performance; in this way we were able to obtain more detailed and accurate interensemble performance estimates. Mechanisms supporting parallel computations across many ensembles were also studied and designed. Section 2 of this report gives an introductory overview of the RRM. Section 3 describes the new RRM architecture on which the detailed ensemble simulator and Codes Dist Avail and for Special the modeling of higher levels were based. Section 4 describes the interensemble computation mechanisms. Section 5 discusses simulation and performance estimation: interensemble simulations are discussed in Section 5.1, communication requirements in Section 5.2, and ensemble simulations and performance modeling of higher RRM levels based on them in Section 5.3. The code of the interensemble simulator and of several benchmarks run on it are given in Appendix A. A description of the ensemble simulator is given in Appendix B; and its code and that of benchmarks run on it are given in Appendix C. #### 2 Overview of the RRM Following an overview of the Rewrite Rule Machine (RRM) architecture and model of computation with special emphasis on the new ensemble design, we discuss performance estimates based on simulation. The architecture is a multilevel hierarchy, which is SIMD at the lower (chip) levels, and MIMD at the higher levels. This enables the RRM to combine the advantages of the SIMD and MIMD approaches. The RRM model of computation is concurrent graph rewriting, which supports extremely fine-grain parallelism, dynamic resource allocation, and simple semantics. Since performance estimation for a machine like the RRM is difficult, we must carefully justify our approach. We discuss the problems and how we address them later in this document. Our approach to performance estimation may be summarized as follows: we chose a diversity of problems to stress the design in different ways, including communication, memory, and computation; we chose problems representative of different application areas; and we built and used different simulators to get a variety of performance estimates. #### 2.1 Multigrain Concurrency and Applications Many important real-life applications involve a number of diverse, relatively independent processes, many of which are computationally homogeneous. For example, a large simulation problem may involve many independent, loosely coupled processes. Let us call a computation homogeneous, if at each moment it consists of many instances of the same instruction being applied to many data items in parallel; sometimes this is called data parallelism. While many familiar numerical algorithms have this form, many complex computational tasks are locally homogeneous but globally inhomogeneous. Because of its very fine-grain SIMD parallelism at the chip level combined with its flexible coarser-grain MIMD parallelism at the network level that allows different chips to work on very different subtasks of the same problem at once, the RRM can Figure 1: Concurrent Rewriting of Fibonacci Expressions exploit a problem's parallelism at several levels. We call this property multigrain concurrency; it makes the RRM very well suited for solving not only homogeneous problems, but also complex, locally homogeneous but globally inhomogeneous problems in many areas, including discrete event simulation, decision support systems, rapid prototyping, vision, computational geometry, automated deduction, finite element methods, neural nets, and hardware simulation. #### 2.2 Combining SIMD and MIMD At present, the two main approaches to massive parallelism are SIMD machines and MIMD multicomputers. Examples of the state of the art in each category are the Connection Machine, CM-2 (Thinking Machines Inc. [14, 4]), and the MP1216 (MasPar Computer Corporation [23]), for SIMD computers; and Mosaic (Chuck Seitz, Caltech [22]), the J-machine (William Dally, MIT [5]), Paragon (Intel Corporation), and the CM-5 (Thinking Machines, which simulates SIMD by MIMD broadcast), for MIMD computers. These two approaches are quite different. Each has unique advantages not shared by the other approach. The strength of SIMD machines is their exploitation of fine-grain data parallelism, which makes them a good choice for homogeneous problems; their weakness is their centralized control, executing the same code everywhere, which makes them perform poorly on large nonhomogeneous applications. MIMD machines are much more flexible because they allow different code to be run in different processors simultaneously; however, their communication—typically asynchronous interprocessor message passing over a network—is not well suited to data parallelism. A key goal of the RRM is to combine the best of these two approaches in a single architectural design. It shares with SIMD machines the capability for fine-grain data parallelism, which is carried to an even finer level in the RRM ensemble; however, because of its decentralized MIMD control, the RRM can perform well on both homogeneous and nonhomogeneous problems, whereas SIMD machines can excel only on homogeneous problems. Compared with MIMD machines, the RRM enjoys the same flexibility and generality, based on distributed control and asynchronous message passing, but because the RRM is SIMD at the chip level, it can exploit fine-grain data parallelism locally, even for highly nonhomogeneous applications, whereas, at present, purely MIMD machines can get large degrees of parallelism only at the interprocessor level. #### 2.3 Programmability The RRM is programmable in a wide variety of declarative ultra-high-level languages that permit massive exploitation of implicit parallelism and ease the creating and porting of parallel programs. We believe that declarative languages are good choices for programming such applications as vision, real-time plant control, simulations, and expert systems, because they do not require explicit commitment to specific forms of synchronization or scheduling. These convictions are supported by extensive simulations, and by compilation techniques [12, 1, 20] making functional (e.g., OBJ [8]), object-oriented (e.g., Maude [17], FOOPS [11]), and relational (e.g., Eqlog [10]) programming languages easy to compile into RRM code. However, it is a fact of life that some parts of large applications programs have already been written, and it may not be practical to rewrite them in a declarative language. Because its flexible model of computation also supports imperative features, a compiler for the RRM from a conventional language, even a sequential one, could be written relatively straightforwardly. #### 2.4 The Concurrent Rewriting Model of Computation The RRM's model of computation is **concurrent rewriting**. In this model, data are **terms** constructed from a given set of constant and function symbols, and a program is a set of equations that are interpreted as left to right **rewrite rules**. The lefthand side (abbreviated LHS) and righthand side (RHS) of a rewrite rule may have variables as well as function symbols. A variable can be instantiated with any term of the appropriate sort, and a set of instantiations for variables is called a **substitution**. A rewriting computation starts with a given term as its data and a given set of rewrite rules as its program. Applying a rewrite rule has two phases, called matching and replacement. The matching phase attempts to find a substitution that yields a subterm of the input term when applied to the rewrite rule's lefthand side. Then, in the replacement phase, the matched subterm, called the redex, is replaced by the righthand side of the rule, instantiated with the same substitution. Rules are applied until no more matches can be found; then the resulting term is called reduced and considered to be the final result. In the concurrent rewriting model of computation, more than one rule can be applied at once, and each rule can be applied to many subterms of the given term Figure 2: Hierarchical Structure of the RRM at once. Let us explain this by example. Here is a simple program to compute the Fibonacci numbers: - $(1) \quad fibo(0) = 0$ - (2) fibo(1) = 1 - (3) fibo(N) = fibo(N-2) + fibo(N-1) if N > 1 If you give fibo(3) as data, the top node will match rule (3); thus the whole term will be replaced by ``` fibo(1) + fibo(2) ``` In the next step, the first fibo node will match rule (2), and the second fibo will match rule (3) again, and the simultaneous application of these rules yields ``` 1 + (fibo(0) + fibo(1)) ``` in just one step of concurrent rewriting. Figure 1 illustrates these two concurrent rewriting steps, using tree representation for expressions. We say that a concurrent rewriting computation is SIMD, when just one rewrite rule is applied concurrently at each moment; in the RRM, this style of concurrent rewriting is realized by an ensemble chip, as explained later. If several rules are concurrently being applied, each to possibly many instances, we have MIMD concurrent rewriting; this general case is the correct model for the RRM as a whole. See [9] for general background on the concurrent rewriting model, [6] for definitions of SIMD and MIMD rewriting (called parallel and concurrent rewriting in that paper), and [18, 19, 17] for a definition of concurrent rewriting as deduction in rewriting logic and a systematic treatment of concurrent object-oriented computation by means of concurrent rewriting. Two additional topics treated in [9] deserve mention. The first is sharing, which permits a common substructure of two or more given structures to be shared between them, rather than requiring that it be duplicated. This leads to directed acyclic graphs rather than just trees. The second topic is evaluation strategies, which are annotations that impose restrictions on concurrent execution in order to improve the performance of parallel computations. A strategy for a function f of n arguments consists of a set $\{i_1, ..., i_k\} \subseteq \{1, ..., n\}$ indicating the argument places that should be already reduced before a rewrite rule match for f is attempted. For example, if\_then\_else\_fi is typically computed with strategy $\{1\}$ , and integer addition with "bottom-up" strategy $\{1,2\}$ . #### 3 The RRM Architecture The RRM architecture is hierarchical, with each unit consisting of a collection of cooperating units at the next lower level. The most basic processing element is the cell, with four cells making up a tile. An ensemble chip contains hundreds of cells (576 is our current estimate). A cluster is a collection of ensemble chips connected on a board, and the machine as a whole is a network. Figure 2 provides a pictorial representation of the RRM hierarchy. A single ensemble yields very fast, extremely fine-grain SIMD rewriting, but RRM execution is coarse-grain MIMD at the cluster and network levels, since each ensemble independently executes its own rewrites on its own data, communicating with other ensembles when necessary. #### 3.1 Cell, Tile, and Ensemble Architecture The most basic computational element in the RRM is the cell [16, 2], which stores one data item with pointers to other cells, and also provides basic computational and communication capabilities; thus cells mix storage, computation, and communication. A cell consists of: - Several registers (mostly 16-bit), including: - token, which encodes the operation or constant symbol of a data node, - left and right, which point to the descendant nodes1, - a 32-bit marks register, which holds volatile information (similar to condition codes), - flags, which holds less volatile information, such as type and reduction status, <sup>&</sup>lt;sup>1</sup>Unary operations only use left, and n-ary operations for n > 2 are decomposed into binary ones. - Twelve general-purpose registers, including ntoken, nleft, nright and nflags. - An ALU to operate on and test the contents of registers. - Interfaces to communication channels and the controller. We divide the silicon area of the ensemble chip into a $12 \times 12$ mesh of tiles, each with four cells. Adjacent tiles are directly connected by short wires, so that placing logically linked nodes in cells located in adjacent tiles permits very efficient communication. Placing several cells in one tile increases the probability of logically related data being in adjacent cells. Our new ensemble design is simpler and has substantially better overall performance than previous designs [7, 2]. Its simpler instructions allow a faster clock (100 MHz seems a reasonable estimate) and provide much better support for communication between cells. An ensemble has a single SIMD controller that broadcasts its instructions to all cells. The controller can obtain very fast feedback (one clock cycle) about the state of the cells (such as type of data and operation symbols in cells, remote references, success or failure of an instruction, and termination) and can use such feedback to branch to different SIMD code segments. Obeying SIMD instructions, cells can communicate with adjacent cells (each cell has 16 adjacent cells in its 4 adjacent tiles) to find local patterns for rewriting; hundreds of such patterns may be found and transformed simultaneously. Other SIMD instructions allow communication among nonadjacent cells using special row and column buses, relocation of data, and input-output. The short buses, called ports, allowing fast communication of each cell with the 16 cells in the north, south, east, and west (N, S, E, and W) neighboring tiles, as well as the row and column buses used for communication of nonadjacent cells under SIMD control, are shown in the figure below. SIMD concurrent rewriting takes place by broadcasting instructions that implement matching and then replacement of the patterns found. Although for very regular computations it is possible to avoid remote—i.e., not physically adjacent—references within a single ensemble, in general the dynamic nature of the computation will require remote references, and then matching will require relocation of some data. This is accomplished with specialized instructions and chip-level hardware support, including cell and tile features, and buses for communication between distant cells. We use a reference counting scheme for storage management, both within ensembles and in the RRM as a whole. We have fully simulated the details of this within the ensemble for the examples discussed later in this report. #### 3.2 Cluster and Network Architecture The cluster architectural level corresponds to board-level structure in the actual implementation. At this level, ensemble chips can be arranged in a two-dimensional (2D) mesh with fast connections to each of four neighbors, giving 8 connections per ensemble (4 in and 4 out). With current technology, these could be 16-bit-wide connections running at 50 MHz, giving 800 Mbps per connection and 6.4 Gbps total bandwidth per chip. Additional interconnection hardware at the board level beyond the fast, local connections is also desirable, as in the iWARP [3] and DataWave [21] designs. The performance we assume is not that much beyond that provided by these designs; the iWARP has 8 ports, each 8 bits wide at 40 MHz, giving 320 Mbps per port and 2.56 Gbps total (100 to 150 ns latency), and the DataWave has 8 ports, each 12 bits, at 60 MHz, giving 5.76 Gbps total. We are estimating that a cluster will have about 100 ensembles. The network level interconnection for the RRM has not been fixed. We have been considering the wormhole routing networks of Seitz [22] and Dally [5]. Actual realizations of these designs have achieved high communication rates: 205 Mbps for Ametek 2010, and 200 Mbps for the Intel Paragon. For a 2D mesh, average case communication time for 10,000 nodes is estimated at 1885 ns, or 188 clock cycles. For a 3D mesh, the average case communication cost for 10,000 nodes is estimated at 976 ns, or 98 clock cycles. In general, interchip communication in the RRM is asynchronous message passing that imposes no critical timing requirements on the network or switching technology. Thus, the RRM can exploit the best communication technology available, and take advantage of any future improvements. However, the RRM can exploit locality and use fast local interensemble connections at the cluster level to get very high performance for certain problems. Figure 3: Before and After Creation of Ghost (of b) #### 4 Interensemble Computation Sometimes active cells in one ensemble need information from descendants in another ensemble. We call references from one ensemble to another ensemble distant references, to distinguish them from the remote references that occur from cell to nonneighbor cell within a single ensemble. Although distant references can be reduced by relocating data to ensembles that reference it most often, it is impossible to completely eliminate distant data references, even using static memory allocation, because, in general, structures will not fit in a single ensemble. To efficiently support interensemble communication, we have developed two related mechanisms. For symbolic computation, where data is laid out dynamically and computation is asynchronous or delay-insensitive, we use an incremental symbolic cache approach. When a distant reference is made, and it is determined that the distant node should not be relocated to the local ensemble, then a ghost node is instead allocated in a cell of the local ensemble, and data from the target of the distant reference is copied into the ghost node. However, unlike true relocation, the ghost node is prevented from being the root of a rewrite, i.e., is temporarily frozen. Also, a ghost node maintains a copy of the original distant pointer, and thus acts as a passive incremental "symbolic cache" of data that actually resides on another ensemble. After some time, under SIMD control, ghost nodes flush their data and use the stored distant pointer to refresh their contents. This flush-refresh of ghost information may be performed at any time. In addition, at some times the parent of a ghost may copy the distant pointer from its descendant ghost, and then cause deletion of the ghost. For example, in Figure 3, in the before (left) picture, ensemble A contains a cell labeled a that has a distant pointer to a cell labeled b in ensemble B. In the course of pattern matching, cell a requires information from its descendant b. In the after (right) picture, a ghost node for b has been created in ensemble A, and the distant pointer from a to b has been replaced with a local pointer from a to the new ghost of b. Thus the ghost of node b has distant pointers to the children of b, and also has a Figure 4: Systoli. Interensemble Computation copy of the original distant pointer (shown as a dashed arrow to node **b** in ensemble **B**). Note that this process cannot continue indefinitely, since ghosts are not allowed to initiate the matching process themselves. Thus even if the structure underneath **b** is large, only that portion of the structure needed to verify a match rooted at **a** is ever copied to ensemble **A**. The mechanism used in the systolic case is similar in spirit to the symbolic case described above, but can be implemented somewhat more efficiently, due to the locality of reference that (in part) characterizes systolic computations. Because this locality does not change during a computation, we should place elements that communicate frequently on the same ensemble. As in the symbolic case, structures may be too large to fit on a single ensemble, and then we must place portions of the problem on neighboring ensembles, while keeping local copies of the border data current on both ensembles. Since systolic computation is synchronous and delay-sensitive, we must ensure that the border data is updated correctly when it is read by the local ensemble. In general the systolic computation must wait every cycle for the block transfer of data between ensembles. In Figure 4, ensembles A and B each contain an area of active cells delineated by the dashed box. Outside this box are border cells that do not necessarily perform computations, but instead store copies of the near-edge cells of neighboring ensembles. Figure 4 shows a block copy of information from active cells in ensemble B to (passive) edge cells in ensemble A. After information from each neighboring ensemble is copied into ensemble A, the next step of computation can proceed. In many cases we can overlap communication with computation. This potential overlap, or rudimentary pipelining of I/O and computation, is another consequence of our architectural choice of multiple cells per tile. The current design of the ensemble with four cells per tile allows simultaneous systolic computation of four distinct two-dimensional layers at a time. In fact, one or two layers could perform I/O at the same time that the other layers perform their systolic computations. In this way, we may hide some of the potential I/O penalty of interensemble computations. #### 4.1 Load Balancing Allocation in an ensemble normally ensures that allocated cells are neighbors of the allocating cell. However, when an ensemble becomes too full, allocations are made on other ensembles. This process can be described as pushing out computational subtasks. The SIMD controller can gather (perhaps imprecise) information about the utilization level of an ensemble in order to determine when the ensemble is full. For certain computations, it may be advisable to push out subtasks at the outset. Large symbolic computations usually require building and manipulating very large term structures, which may be distributed over several ensembles when they are initialized, may be distributed explicitly by a specially tuned SIMD broadcast, or may migrate implicitly to neighboring ensembles during computation. Allocation is important in architectures like the RRM, due to the sensitivity of computation to locality. Thus, initial placement may have a large impact on performance, especially for relatively short computations with large amounts of data. After initial allocation, the compiled SIMD code may explicitly push subcomputations out of an ensemble, perhaps forming a ghost node in its place. Thus the local copy does not perform rewrites itself, although it would still participate passively in other rewrites. Finally, automatic migration can be performed by pushing subtasks out of an ensemble based on the depth of the subterm from a root node of the ensemble, forcing subcomputations to be pushed out more quickly. However, spreading computation more quickly, and thus more evenly among ensembles, trades off against interensemble communication overhead. The techniques for interensemble computation already described above substantially alleviate this overhead, but it still exists. #### 5 Simulation and Performance Estimation Estimating the performance of computer systems is a difficult art at best, and is even more difficult for radically new machines that have not yet been built. The performance limitations of simulators mean that large problems are very difficult to run. Testing different aspects of a design on the largest possible problems may force using multiple simulators to abstract different details for various choices of performance measure and problem. But then it may be difficult to justify the abstractions, and to ensure that the problems fit the assumptions behind their justifications. For the RRM, these difficulties seem particularly acute, because of the high performance figures that we seek to justify. Given the serious performance limitations posed by trying to simulate our architecture on the workstations available at the time our research was carried out (limitations still applying today to a good extent) the approach to performance estimation that we adopted was a hybrid one: - Interensemble simulations, in order to be computationally feasible, were performed at a high level of modeling using a high-level interensemble simulator (see Section 5.1) not detailed enough to yield precise quantitative information, but useful for obtaining preliminary estimates of communication requirements (see Section 5.2) and also for gaining experience with interensemble computations, and for getting some rough performance estimates. - Ensemble simulations for our new RRM ensemble design were performed using a very detailed register-transfer level ensemble simulator (see Section 5.3). By running a widely varied collection of applications on this simulator, very precise estimates were obtained at the ensemble level. Using modeling—based on architectural assumptions consistent with the interensemble simulation experiments and feasible with current technology—for the higher levels of the RRM architecture, we were then able to obtain more detailed interensemble performance estimates for the RRM as a whole. #### 5.1 Interensemble Simulations We developed an interensemble simulator for the RRM and used it to develop and test ideas about interensemble communication mechanisms and strategies; the code of this simulator as well as that of the benchmarks run on it is given in Appendix A. For this simulator, the RRM as a whole was modeled as a 2D array of clusters which, themselves, were 2D arrays of cells. Clusters were assumed to be interconnected in such a way that one could view the machine as a whole as a 2D array of ensembles. We introduced the notion of clusters to model a difference in technology between board-level interconnection and interconnection at a larger scale. The overall 2D topology was chosen because it was a relatively modest interconnection structure that should be realizable in practice. The simulator was instrumented to keep track of communication at different levels so that we could get some estimates of the communication requirements of the RRM as a whole and between clusters. The simulator was a very high level simulator, manipulating term structures by applying rewrite rules, but a term was considered to be located in a particular ensemble and ensembles had some limitations on the total size of the terms they could contain. The ensembles apply rewrite rules independently, and then apply strategies to determine if terms should be pushed out, making more room, or pulled in, making ghosts. The basic high-level actions are those of pushing out a subcomputation (somewhat similar to a remote procedure call) and copying results or partial results back in (which returns a final result or provides a cached version of a partial result). Two primary strategies were used for deciding when to push subterms out of an ensemble. One strategy was based on a threshold value for the depth of a term below a root in the ensemble. If this depth threshold was chosen to be fairly shallow, then the term would be forced to rapidly spread out through the RRM. This would have the benefit that a given ensemble would be unlikely to contain only an upper part of the tree that was waiting on subcomputations to complete and so be idle. This strategy would be applied regardless of how full the ensemble was; the size limitation would mainly come about when an ensemble refused to accept a subcomputation being pushed out from elsewhere. Such a strategy is analogous to the treatment of allocation within an ensemble. The other strategy used a fullness threshold to decide when to push out a sub-computation: if an ensemble becomes very full, a major subcomputation is pushed out. The strategy for selecting the subterm to be pushed out is to select either whole rooted terms if they are small or to select a top-most large subterm of a very large rooted term. (In practice, it is necessary to have a second fullness threshold that determines when the ensemble will accept pushed out subcomputations from other ensembles.) Some of the problems simulated were: numeric Fibonacci number calculation, Peano arithmetic Fibonacci, bubble sort, merge sort, and matrix multiply. These examples were chosen because the patterns of computation are quite different in these examples and seem to represent an interesting variety. The amount of time required to do the simulations was a limiting factor on the size of problems run for these examples. Different versions of the simulator were used in simulations. Changes were introduced in order to make the simulations more realistic; for example, limits on the amount of communication between ensembles in a cluster or between clusters were imposed in some versions, to test the impact of such limits. Some of the parameters that we experimented with were: time penalties for both intra- and intercluster communication, size of ensemble for allocation, intra- and intercluster communication limits, relocation size threshold (an ensemble must be below a given limit of occupancy before it can relocate structures inward), push-out threshold (and an associated goal for the size of the ensemble when subcomputations have been pushed out), and a value controlling when subcomputations are pushed out based on the depth of a term below a root in an ensemble. Most of these parameters were not critical in that small variations in their values had only small effects on the performance of the simulated RRM. The robustness of the performance relative to these variations is a very positive result. For the simulations performed to generate estimates (discussed later), values were chosen that were as realistic as possible and on the conservative side (i.e., values that would tend to produce the least favorable result). The high level of abstraction of the simulation process means that the results cannot be expected to be precise predictions of the behavior of the RRM. On the other hand, we expect that the large-scale behavior should be roughly similar. We also now believe that the development of multiprocessor interconnection technology is proceeding so rapidly that we can expect to have networks that are very well adapted to the RRM architecture. #### 5.2 Communication Requirements and Networks The following subsections give an indication of the current state of the art in high-performance, low-latency interconnection networks and then present the specific estimates of the communication requirements derived from the interensemble simulations. #### 5.2.1 Prospects for High-Performance Interconnects Demand for very high performance interconnects is being driven both by tightly coupled shared memory systems and by more experimental distributed memory systems. Theoretical results have been abundant in this area, from theoretical studies done for phone systems to a large literature on hypercubes and their variants. However, careful comparisons and evaluations of tradeoffs and actual practical engineering experience are just being developed for the kind of large-scale interconnects that interest us. Just constructing an interconnect for 500 processors is a major project, probably too large for the academic context and hard to justify either commercially or in government funded research without a clear use (i.e., an overall system architecture using the interconnect). The analysis of design tradeoffs in the thesis of Dally has led to a whole new generation of wormhole routing interconnects used in machines such as the Ametek 2010, the Fujitsu AP1000, and the Intel iWARP [3]. The iWARP processor is intended for very high performance, very fine grain systolic computation. A new processor with a similar goal is the ITT DataWave [21]. There are also next generation designs such as Seitz's CalTech MOSAIC project and Dally's J-machine project at MIT. These designs achieve impressive communication rates: 205 Mbps for Ametek 2010, 200 Mbps for Fujitsu AP1000, 160 Mbps per port and 2.56 Gbps aggregate for the iWARP (8 ports, each 8 bits, at 40 Mhz and with 100 to 150 ns latency), and 6 Gbps for the DataWave (8 ports, each 12 bits, at 60 Mhz). Both the iWARP and DataWave examples are interesting because systolic computation is even more demanding of the interconnect than the RRM design is expected to be. Progress can be expected to continue along these lines, and good designs will be developed and tested for systems with large numbers of processors (500 to 1000 or more). For example, the MOSAIC system is planned to have 16000 processors and will be based on a 3D mesh $(32 \times 32 \times 16)$ . Another feature of the iWARP and DataWave designs is that they are single-chip processor designs. This allows the 8×8 prototype for iWARP to fit on a single board. With current technology, this gives a significant advantage as wiring densities can be much denser on a board than off. The other machines (Ametek 2010, Fujitsu AP1000) were based on processing nodes with from one to four processors per board. The goal for the RRM is to have the complete ensemble consist of an ensemble chip (each with 576 processing elements (PE's)) perhaps with one or two more chips for extra memory and routing, thus also allowing many processors per board. We believe that low latency and local high performance may be much more important than global bandwidth. In the future, it is likely that there will be very important solutions based on mixtures of technologies. For example, CMOS to GaAs with silicon lasers and a optical interconnect could provide multi-gigahertz rates over a single optical fiber. To reduce the number of wires and maintain bandwidth, it is necessary to increase the frequency or rate of operation, which suggests a change in basic technology. However, it is likely that CMOS, or related technologies will continue to provide the highest density available, which suggests that a mixed approach might be desirable. The important point here is that there is much room for improvement of interconnection technology and that the RRM—because of its asynchronous model of computation—can easily take advantage of such improvements. #### 5.2.2 Communication Requirements We have used the high-level interensemble simulator on a suite of characteristic examples to estimate upper bounds on the communication demands of an ensemble. Our upper bounds seem to be in the range of the newer network and interconnection technologies such as the wormhole routing networks of Seitz [22] and Dally [5]. As mentioned, interchip communication in the RRM is asynchronous message passing communication and imposes no critical timing requirements on the network or switch technology. This makes the RRM capable of exploiting the best communications technology available, and of taking advantage of any future improvements in such technology. The following are important observations on the communication requirements of an ensemble: - Estimated ensemble I/O rate: 160 Mbps to 520 Mbps (estimate based on specially instrumented interensemble simulations). - Pins are not a bottleneck: realistic current estimate is 4 Gbps (100 pins at 40 Mhz). - Communication capacity seems to be in the range of newer network and interconnection designs (Seitz [22], Dally [5], iWARP[3], DataWave[21]). - Average communication performance is enough; the RRM design doesn't make critical timing requirements on the communication network. ### 5.3 Ensemble Simulations and Interensemble Performance Modeling The new ensemble design and estimates of its performance have been validated by running a variety of benchmarks on a new ensemble simulator written in C, which models the ensemble computation in great detail at the register transfer level. A detailed description of this simulator is given in Appendix B. The code of the simulator as well as that of several applications run on it are given in Appendix C. Our simulations at the ensemble level have a great level of detail and give quite accurate performance estimates, but our overall performance estimates for the RRM are still preliminary, and more studies and experiments are required to increase their accuracy. The present estimates are based on detailed ensemble simulations, high-level interensemble simulations, estimates of communication requirements, and analysis using simple approximate models. More definitive performance estimates will require more detailed simulations and analytic studies for a wider collection of examples and applications. The performance models are based on simple predictions of the computation times for specific strategies for performing the computations. We discuss RRM performance predictions for a variety of examples chosen because their patterns of computation are representative of different kinds of computations; they represent basic examples of general symbolic computations (numeric Fibonacci and the TAK function), highly regular symbolic computations (sorting), and discrete event simulations of a systolic nature (fluid flow and a simple hardware simulator). When describing RRM performance at the cluster or network levels, we specify efficiency as a percentage of the ideal performance. The ideal performance corresponds to a linear extrapolation of a single ensemble's performance, i.e., a linear speedup. We will also give "idealized Sun-relative speedup," which simply is the product of the number of ensembles, the Sun-relative speedup, and the efficiency. We assume a 100-MHz clock and a 12 × 12 array of tiles requiring approximately 6 million transistors. These figures seem achievable since speeds and sizes of this kind have already been demonstrated. For example, the 1991 Hot Chips conference [15] presented two chips with 100-MHz clocks (one of them with 4.1 million transistors), and another chip with 14 million transistors. There are many different performance measures for machines, including machine instruction execution rates, and actual elapsed time. The most intrinsic ensemble performance estimate is the number of clock cycles needed for a given computation. By assuming a specific clock rate, this measure can be translated into seconds. However, some relative comparison of performance between the ensemble and existing sequential processors is also desirable. We use the Sun-relative speedup for this purpose. To obtain this comparative measure we write one program in ensemble SIMD code or with rewrite rules, and another in efficient C. By comparing the actual performance of the C program on a Sun workstation with the performance of the SIMD code on the ensemble simulator, we obtain for each problem a speedup measure called "Sun-relative speedup." In our case, we take a Sun SPARCstation IPC as the basis for comparison. This could also be used to assign a "MIPS" rating to the ensemble by multiplying this speedup by the published MIPS ratings of the specific Sun workstation, which is roughly 15 MIPS for the SPARCstation IPC. In most cases, the aim is to compare a good algorithm for a problem on the RRM with a good sequential algorithm on a Sun. In some cases, the optimized sequential Sun version involves significant variations from the algorithm used on the RRM. When we discuss each benchmark below, at the ensemble level and levels above, we mention the specific assumptions made. #### 5.3.1 Performance Estimates for TAK The TAK benchmark is a subtle modification of the function Ikuo Takeuchi originated specifically to test Lisp systems. The modification accidentally introduced by Richard Gabriel and John McCarthy makes the function more difficult to optimize, but preserves its simple, recursion-intensive structure. We have implemented TAK for the RRM and in C for purposes of comparison. The Lisp and C code are shown below: ``` tak(x,y,z) register int x,y,z; { int r1, r2, r3; while (1) { if (x<=y) return z; r1 = tak(x-1,y,z); r2 = tak(y-1,z,x); r3 = tak(z-1,x,y); x = r1; y = r2; z = r3; }}</pre> ``` Because our most detailed simulations are limited to a single ensemble, we have used the arguments 12,8,4 instead of the more traditional 18,12,6. The RRM code completes this benchmark in 22,428 cycles, while the C version finishes in .0015 second on a SPARCstation IPC. This leads to a Sun-relative speedup of 6.7 (= .0015/.00022428). We currently don't have cluster or RRM estimates for this example. #### 5.3.2 Performance Estimates for Numeric Fibonacci A strategy for computing numeric Fibonacci—which yields a simple approximate model for estimating performance—is to do the computation directly if it fits in one ensemble, and otherwise apply the last of the following rewrite rules for fibo ``` fibo(0) = 0 fibo(1) = 1 fibo(N) = fibo(N-1) + fibo(N-2) if N>1 ``` once, and then push out the subcomputation of fibo(N-2), to proceed in parallel with that of fibo(N-1), which either may be done locally or may push out further subcomputations. This strategy always keeps a significant subcomputation for the current ensemble. Detailed ensemble simulations allow quite accurate estimates of time required for n up to 10 (it is linear in n). By comparing with the time required to run the same algorithm in C on a Sun workstation, we obtain a Sun-relative speedup of 6.7. The cost for larger n is the time to set up the subcomputations, plus the maximum of the cost to finish the local subcomputation and the cost to finish the pushed out subcomputation, plus the cost to finish the computation. Assuming that network I/O can be overlapped with SIMD broadcast, but that transferring a simple expression like fibo(10) out of an ensemble or transferring a result such as 2584 takes just a small number of SIMD instructions, the complete time to compute the numeric Fibonacci can be modeled by a recursive function allowing different assumptions about the network communication delays. For very fast networks, the network communication times and the computation times (for setup and finishing) are roughly comparable, so that network I/O cannot dominate the overall computation time (usually it will be overlapped with computation). The cost of numeric Fibonacci within an ensemble is approximated by $$fibens(n) = 250 \times n - 50$$ for $n \geq 3$ . The approximate cost to compute the n-th Fibonacci, for $n \geq 10$ , is then ``` fibgen(n) = simdcost+ max(fibgen(n-1), fibgen(n-2) + pushcost) ``` where simdcost is the SIMD execution cost to set up the subcomputations, push out, pull in, and finish the Fibonacci computation (approximately 300 clock cycles), pushcost is the cost to do two I/O operations (estimated to be less than 200 clock cycles for 10,000 ensembles, and fibgen(n) = fibens(n) for n < 10. With these estimates the simdcost dominates, I/O is overlapped, and efficiency is very good. For larger n, fibgen(n) = $300 \times n - 455$ . For a 10,000-ensemble RRM, the predicted worst-case efficiency for this example is 88%, which seems quite encouraging. The idealized Sun-relative speedup is 59,000. #### 5.3.3 Performance Estimates for Sorting A simple way to sort a sequence of numbers on an RRM ensemble is to use a 2D exchange sort that uses both "bubblesort" exchanges of consecutive elements of the sequence and "shortcut" exchanges between nonconsecutive elements. By appropriate placement of the sequence within an ensemble, both types of exchanges can be accomplished by simple, local transformations. For a 23 × 23 array of values we can form a linear sequence of numbers in the array by going down the first tile column, up the second column, and so forth. We can also establish horizontal shortcut links between list elements that are adjacent elements of the same row. By folding the 2D array twice, it is possible to embed the array in an ensemble and fit a list with $23 \times 23$ (= 529) elements inside an ensemble in such a way that all links are direct neighbor-to-neighbor connections. The 2D exchange sort algorithm alternates bubble sort exchanges between consecutive elements in the sequence with shortcut exchanges between nonconsecutive, but horizontally adjacent elements. For a list of length nplaced in this manner, the time to do a 2D sort within a single ensemble is proportional to $\sqrt{n}$ , and requires approximately $221 \times \sqrt{n} - 468$ clock cycles. The average number of instructions for either the bubblesort or the shortcut exchange phases is 42, giving a main loop size of 84. Comparing with the time taken by a simple quicksort algorithm written in C and running on a Sun workstation yields a Sun-relative speedup of 127. Uniformly distributed random data was used for the tests. At the interensemble level, one can use the same pattern, i.e., ensembles in a mesh and interchanges in the long chain or rows, but interchanges will always exchange the maximal value from one ensemble with the minimal value from the next. For this problem, the computation within an ensemble has a structure different from the structure at the cluster level and higher. The simple, fixed connectivity is one advantage of this approach; it should be possible to allocate ensembles so that all I/O connections are local, best-case links. The data would be broken into chunks, which start interchanging data internally and across ensemble boundaries, in one of two directions, at their endpoints. When data items are exchanged across a boundary, an item is pushed out and another is pulled in, preserving the size of the chunk in the ensemble. It seems better not to have a lock-step process, in which data items are always exchanged, but instead to exchange data only when there is a need, e.g., when a new value has been interchanged into an end position. In order to get estimates at the cluster level, a special simulator was written in C that simulates the two-level 2D sorting algorithm and calculates clock-count estimates. Note that, because of reduction of the bandwidth through a cross section of the machine, one expects that sorting at the cluster level should be at least 23 times slower than within an ensemble. Since there are 100 ensembles at the cluster level, one might still see some further speedup; however, the algorithms are more complex and less efficient. If very fast neighbor-to-neighbor connections can be used at the cluster level (and this should also be possible at the level of the RRM as a whole), then exchanging data with a neighbor should take only 5 to 10 clock cycles. The phases consist of local plus global linear exchanges, with an additional smallest to largest shortcut, and local plus global row exchanges. The additional cost, due primarily to communication, of global operations is estimated at 20 to 30 instructions. The estimated time to sort in a cluster was compared against the time to quicksort on a Sun giving an estimated Sun-relative speedup, for a 100-ensemble cluster, of 114. For a wormhole routing network, when data items are exchanged between two ensembles, a round-trip message is required with estimated time, assuming a single hop is required, of perhaps 20 clock cycles. The estimated idealized Sun-relative speedup for the RRM would be very close to the cluster case. It is very possible that the network latency could be overlapped with other computation, and the increase in the total computation time compared with the cluster case should not be more than 20%. #### 5.3.4 Performance Estimates for Fluid Flow Fluid dynamics can be studied using a 2D cellular automaton model [13]. This computational model is nearly ideal for the RRM, due to its very regular structure heavily using instructions that efficiently interchange bits among neighboring cells. The same communication pattern could be used for many other 2D processing and cellular automata problems. In fact, we have implemented Conway's game of Life using these same techniques, and have achieved similar performance. Many other problems, such as certain vision algorithms, stress analysis and particle diffusion in solids, fit this pattern of computation. We have implemented a version of the cellular automata approach based on a regular 2D hexagonal lattice. Each cell is connected to its six neighbors by links that may hold at most one particle traveling in each direction in each time step. We use unit time steps, unit particle masses, and unit velocity. Each particle is completely described by the link on which it currently resides, and all particles have constant kinetic energy and zero potential energy. At each time step, particles move along their links, possibly interact with other particles at the center of a hexagonal cell, and move to some other link. We have implemented this model using one RRM cell to simulate each hexagonal cell of the model. Each RRM cell contains six bits that encode the presence or absence of outgoing particles on the links to its six neighbors. Communication is handled by transferring the six bits from each cell to the appropriate neighbor. Computation is handled by performing certain bitwise operations (such as and, or, equal) and a form of table lookup. We used 1000 iterations of 529 hexagonal cells as the benchmark. Assuming that the ensemble chips will have a clock speed of 100 MHz, the whole benchmark should run in 2.2 to 2.6 ms. There are multiple ways to implement this problem in C for comparison. The fastest implementation we developed (using register declarations for variables, changing the way table lookup was handled, moving conditional expressions out of the main loop) ran in 1.4 seconds. This results in a Sun-relative speedup of between 400 and 670 for a single ensemble. The instruction count for the main loop for this problem is about 220 instructions. We estimate that the communication overhead within a cluster, using neighbor-to-neighbor connections, could be as low as 48 clock cycles (6 bits $\times$ 4 cells per tile $\times$ 2). The transfers of marks between ensembles can take place in 12-bit parallel transfers (one cell for each tile on the edge of the ensemble). This gives 268 clock cycles per main loop or 2680 ns at 100 MHz. This gives a cluster-level performance that is 82% of ideal (= 220/268). #### 5.3.5 Performance Estimates for a Hardware Simulator It is possible to do a simple kind of hardware simulation on the RRM extremely fast. The code to simulate two-input NAND and OR gates, where the output state of a gate is represented by the status of a specific mark, has only 24 instructions. This simple simulator cannot simulate arbitrary circuits, since there can be layout problems; gates must be close to the gates that produce their input signals. For a specific very simple circuit, comparison of the simulations with a highly optimized C program running on a Sun workstation gives a Sun-relative speedup estimate for an ensemble of 533, and for an optimized C program that is a more general circuit simulator an estimated speedup of 1,500. The cluster-level performance estimate is close to a linear scaleup of this. Only a single mark per edge cell needs to be transferred across the ensemble boundaries. It seems reasonable to assume that this could be done in 8 clock cycles. The cluster performance would then be 75% of ideal (= 24/32). The idealized Sun-relative speedup for a cluster would be 40,000 to 110,000. #### 5.3.6 Summary of Performance Estimates For a 10,000-ensemble RRM, our present estimates are as follows: - Raw peak performance: 576 trillion operations per second. - For general symbolic applications (the numeric Fibonacci problem is taken as a typical example and the TAK function is a secondary example): - Ensemble Sun-relative speedup is roughly 6.7. - RRM performance with wormhole network at 88% efficiency gives an idealized Sun-relative speedup of 59,000. - For highly regular symbolic applications (the sorting problem is taken as a typical example): - Ensemble performance is a Sun-relative speedup of 127. - Cluster-level performance is a Sun-relative speedup of 114. - RRM performance is estimated at over 80% efficiency (relative to the cluster performance) yielding a Sun-relative speedup of over 91. - For systolic applications (a 2D fluid flow problem is taken as a typical example; a secondary example is a hardware simulator): - Ensemble performance is a Sun-relative speedup of 400 to 670. - Cluster-level performance, which should be attainable in practice, is 82% efficiency. This yields idealized Sun-relative speedups of 33,000 to 55,000. #### References - [1] Hitoshi Aida, Joseph Goguen, and José Meseguer. Compiling concurrent rewriting onto the rewrite rule machine. In S. Kaplan and M. Okada, editors, Conditional and Typed Rewriting Systems, Montreal, Canada, June 1990, pages 320-332. Springer LNCS 516, 1991. - [2] Hitoshi Aida, Sany Leinwand, and José Meseguer. Architectural design of the rewrite rule machine ensemble. In J. Delgado-Frias and W.R. Moore, editors, VLSI for Artificial Intelligence and Neural Networks, pages 11-22. Plenum Publ. Co., 1991. Proceedings of an International Workshop held in Oxford, England, September 1990. - [3] S. Borkar, R. Cohn, G. Cox, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWARP: an integrated solution to high-speed parallel computing. In *Proceedings of Supercomputing '88*, pages 330-339. IEEE Press, 1988. - [4] Thinking Machines Corporation. Connection machine technical summary. 1990. - [5] William Dally. Network and processor architecture for message-driven computers. In R. Suaya and G. Birtwistle, editors, *VLSI and Parallel Computation*, pages 140-222. Morgan Kaufmann, 1990. - [6] J.A. Goguen. Semantic specifications for the rewrite rule machine. In A. Yonezawa, W. McColl, and T. Ito, editors, *Concurrency: Theory, Language and Architecture*, pages 216–234. Springer LNCS, Vol. 491, 1990. - [7] J.A. Goguen, S. Leinwand, J. Meseguer, and T. Winkler. The rewrite rule machine, 1988. Technical Report PRG-76, Oxford University, Programming Research Group, 1989. - [8] Joseph Goguen, Claude Kirchner, Hélène Kirchner, Aristide Mégrelis, José Meseguer, and Timothy Winkler. An introduction to OBJ3. In Jean-Pierre Jouannaud and Stephane Kaplan, editors, Proceedings, Conference on Conditional Term Rewriting, Orsay, France, July 8-10, 1987, pages 258-263. Springer LNCS 308, 1988. - [9] Joseph Goguen, Claude Kirchner, and José Meseguer. Concurrent term rewriting as a model of computation. In R. Keller and J. Fasel, editors, *Proc. Workshop on Graph Reduction*, Santa Fe, New Mexico, pages 53-93. Springer LNCS 279, 1987. - [10] Joseph Goguen and José Meseguer. Eqlog: Equality, types, and generic modules for logic programming. In Douglas DeGroot and Gary Lindstrom, editors, Logic Programming: Functions, Relations and Equations, pages 295-363. Prentice-Hall, 1986. - [11] Joseph Goguen and José Meseguer. Unifying functional, object-oriented and relational programming with logical semantics. In Bruce Shriver and Peter Wegner, editors, Research Directions in Object-Oriented Programming, pages 417-477. MIT Press, 1987. - [12] Joseph Goguen and José Meseguer. Software for the rewrite rule machine. In Proceedings of the International Conference on Fifth Generation Computer Systems, Tokyo. Japan, pages 628-637. ICOT. 1988. - [13] J. Hardy, B. Hasslacher, and Y. Pomeau. Lattice gas automata for the Navier-Stokes equation. *Physical Review Letters*, 56:1505, 1986. - [14] W. Hillis. The Connection Machine. MIT Press, 1985. - [15] HOT Chips III. IEEE, 1991. Record of Symposium held at Stanford University August 26-27, 1991. - [16] S. Leinwand, J.A. Goguen, and T. Winkler. Cell and ensemble architecture for the rewrite rule machine. In *Proceedings of the International Conference on Fifth Generation Computer Systems*, Tokyo, Japan, pages 869-878. ICOT, 1988. - [17] José Meseguer. A logical theory of concurrent objects. In ECOOP-OOPSLA'90 Conference on Object-Oriented Programming, Ottawa, Canada, October 1990, pages 101-115. ACM, 1990. - [18] José Meseguer. Rewriting as a unified model of concurrency. In *Proceedings of the Concur'90 Conference, Amsterdam, August 1990*, pages 384-400. Springer LNCS 458, 1990. - [19] José Meseguer. Conditional rewriting logic as a unified model of concurrency. Theoretical Computer Science, 96(1):73-155, 1992. - [20] José Meseguer and Timothy Winkler. Parallel programming in Maude. In J.-P. Banâtre and D. Le Mètayer, editors, Research Directions in High-level Parallel Programming Languages, pages 253-293. Springer LNCS 574, 1992. Also Technical Report SRI-CSL-91-08, SRI International, Computer Science Laboratory, November 1991. - [21] Ulrich Schmidt and Knut Caesar. Datawave: A single-chip multiprocessor for video applications. *IEEE Micro*, 11(3):22-25, June 1991. - [22] Charles L. Seitz. Concurrent architectures. In R. Suaya and G. Birtwistle, editors, VLSI and Parallel Computation, pages 1-84. Morgan Kaufmann, 1990. - [23] Arthur Trew and Greg Wilson, editors. Past. Present, Parallel: A Survey of Parallel Computing at the Beginning of the 1990s. Springer-Verlag, 1991. A Code of the Interensemble Simulator and of Several Benchmarks Run on it The abstract interensemble simulator presented below was instrumental in our initial explorations of the network requirements for an RRM cluster. The abstract interensemble simulator essentially consists of an an annotated interpreter of rewrite-rules. This interpreter keeps track of which ensemble a term is supposed to reside in. Given a strategy for spawning tasks, the simulator then 'moves' terms from one ensemble to another (or causes newly allocated cells to be created in some other ensemble) by annotating the terms with the name (location) of a new ensemble. Many experiments were run using this abstract simulator. For example, very preliminary experiments were conducted considering simple alternative cluster network configurations (2-D mesh, complete interconnect, and several bus-like connection layouts). Experiments were conducted with alternate strategies for spawning new tasks (allocating new cells in other ensembles only once one ensemble is nearly full, pushing out whole subterms when an ensemble becomes full, and static spawning routines). Network bandwidth requirements were estimated based on the number of pointers detected between terms residing on various simulated ensembles. Some initial experiments were also carried out regarding alternate formulations of the rewrite rules. #1/bin/csh .f 4terl bubble sort foreach size (5 10 15 20 25 30 35 40 50 60 70 80 90 100 110 120) runtest tell \$size end stst2 peano (ibo foreach size (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14) runtest tst2 Seize end itet3 merge fort foreach size (5 10 15 20 25 30 35 40 50 60 70 80) runtedt (4t3 \$size end - ``` make.lsp ``` (cmb) ``` (format t "-6,2f" (quo (state0_sum x) (' (state0_num x) zer))) (terpri) (princ 'num: ') (prinl (stated_num x)) (terpri) (princ 'sum: ') (prinl (stated_sum x)) (terpri) (princ 'man: ') (prinl (stated_sum x)) (terpri) (princ 'sum sq: ') (prinl (stated_sum sq x)) (terpri) (princ 'sum sq: ') (prinl (stated_sum sq x)) (terpri) (princ 'su: ') (format t "-6,2f" (stated_sum sq x)) (princ 'dev: ') (format t "-6,2f" (stated_sum sq x)) (stated 'c (zer (or (stated_sum sq x)) (terpri) (when (and (* 0 zer) (* zer (stated_sum x))) (princ 'non zero av: ') (let (inzval (if (zerop (car e)) 0.0 (quo (cdr e) naeum))) (val (quo (cdr e) sum))) (eat nacum (* nacum naval)) (let ((ss (state_ensemble 1))) (unices (ort (state_uned ss))); (equal none ss) (unices (state) (state_unety ss)) gai) (merge_etate (state)_glb1 (state_output ss)) gao) (setf (state0_av se) 0.0) (setf (state0_dev se) 0.0)) (setf (state0_dev se) av) (setf (state0_av se) av) (setf (state0_dev se) (setf (state0_dev se) (when (zerop (car e)) (setq zers (cdr e))) (incf sum (cdr e))) (let ((nzeum (' sum zere)) (nzcum 0.0) (cum 0.0)) (princ "global per step I/O state") (terpri) (stateO_compute_derived_basic glob) (stateO_compute_derived_basic globo) (princ "luput") (terpri) (stateO_print globa) (terpri) (princ "output") (terpri) (princ "global ensemble I/O state") (terpri) (state0_compute_derived_basic_gs)) (state0_compute_derived_basic_gso) (state0_compute_derived_basic_gso) (princ_input') (terpri) (state0_print_gso) (terpri) (state0_print_gso) (terpri) (state0_print_gso) (terpri) (state0_print_gso) (defun quo (x y) (coerce (/ x y) 'float)) (princ 'disribution: ') (terpri) (print_distr (state0_distr x)) (state0_print globo) (terpri)) (lot ((gs) (stateO_create)) (gso (stateO_create)) :(none (state_make)) (let ((sum 0) (zers 0)) (dolist (e d) (dotimes (i rrm_size) (defun state) print (x) defun print_distr (d) (defun ensiostats () ``` ``` (setq cum (+ cum val)) (format t *-5d: -6d -8.2f -8.2f -8.2f-9. (car e) (cdr e) (* 100 val) (* 100 nzcum) (* 100 cum))))) ``` ``` patch.lsp ``` ``` tst5.lsp ``` ``` (,e, peof) (name) (eq (al nil (0 nil) (empty-row nil)) (empty-row nil)) (eq (alm nil (empty-row nil) (empty-matrix nil)) (empty-matrix nil)) ) (eq (firsts nil (empty-matrix nil)) (empty-row nil)) (eq (firsts nil (alm nil (empty-row nil) m)) (al nil (o nil) (firsts nil m)) (eq (firsts nil (al nil (x) m)) (al nil ( (firsts nil m))) (eq (rests nil (empty-matrix nil)) (empty-matrix nil)) (eq (rests nil (alla nil (empty-row nil) m)) (alm nil (empty-row nil) (rests nil m)) (eq (rests nil (alm nil (al nil x r) m)) (alm nil (alm nil (al nil x r))) (eq (tr nil empty.matrix nil)) (ampty-matrix nil)) :(ceq (tr nil m) (alm nil (fitute nil m) (tr nil (rests nil m))) : (not nil (== nil m '(ampty-matrix nil))) (eq (tr nil (alm (al (matr nil (empty-matrix nil) n) (empty-matrix nil)) (amtr nil (alm nil a m) n) (alm nil (rowxtr nil a n) (mmtr nil m n))) (eq (rowit nil a (empty-matrix nil)) (empty-row nil)) (eq (rowit nil a (alm nil b n)) (ed (ip nil a b) (rowit nil a n))) (let (() (- test-size jo 1))) (setq row (list 'al (list (+ 1 j)) row)) (eq ("m nil m n) (mmtr nil m (tr nil n))) (ip nil (empty-row nil) a) (0 nil)) (ip nil a (empty-row nil) (0 nil)) (ip nil (al nil x a) (al nil y b)) (+ nil (* nil x y) (ip nil a b))) (let ((clock 0) (mat '(empty-matrix))) ((.est. peol) (Inwie, dpunod)) seelun) (init_term (rrm_ensemble 0 0 1 1) (setd mat (list 'alm row mat)) (setq tst 5) (setq test size 3) (setq test size 3) (setq intra_cluster_penalty 2) (setq inter_cluster_penalty 3) (setq punh_lim 3) (setq bull_lim 3) (setq bull_lim 3) (ed (sq nil m) ("m nil m m)) (setq all_partitions '( (to file "tet5.out" . Matrix Multiply (setq root (Inft) Ş Ş 0 0 0 0 0 0 0 0 0 ``` ``` (list '*m mat mat))) ``` ``` (princ 'nitial') (terpri) (print_term root) (terpri) (simul) (load 'a') ``` strat.lsp ``` (defun compute_push_penalty (to from) (if (= {aref ensemble_cluster to) (aref ensemble_cluster from)) (progn (ens_add_output e val) (incf step_output val)) (break 'io_action error') (if (eq 'in dir) (progn (ens. add_input e val) (incf step_input val)) (if (eq 'out dir) (defun global lo_new_group () (format t " -6d -8d-%" step_output step_input) (insert_val input_state step_input) (defun add_io_action (time dir e val) (push (list time dir e val) io_actions)) (defun compute_reloc_penalty (to from knd) (if (eq 'near knd) (defun perform_lo_action {x}) (io_action (car x) (cadr x) (caddr x))) (insert_val output_state step_output) (new_group output_state) (setq step_output_0) (detvar size_lim 100) (detvar reloc_mize_lim) (detvar ene_full_three size_lim) (detvar conflict_reloc_mize_0) (detvar ene_mize_low 50) (detvar ene_full_pumh 0) (detvar ene_full_pumh 0) (perform to action (cdr a)) (push a new)) (defvar mssg_reloc_req_size 40) (defvar mssg_push_size 120) (defun force_relocation (x e) (setq forced_relocation t) (defun to_action (dir e val) (defun perform to actions () intra_cluster_penalty inter_cluster_penalty)) inter_cluster_penalty)) mssg_data_size 100) (let ((loc (term_loc x))) (let ((new nil)) (dolist (a lo_actions) (if (<= (car a) clock) (setq some_relocation t) (new_group input_state) (setq step_input 0) Intra_cluster_penalty (setq lo_actions new) (defvar io_actions nil) (defvar step_output 0) (perform_lo_actions) (defvar step_input 0) (defvar output_state) |defvar input_stats} defvar ``` ``` (aref ensemble cluster e) (aref ensemble cluster loc)))) (when (and res (< wh (cdr res))) (setf (cdr res) wh)))) :@ note (let ((kind (if (= (aref ensemble_cluster icc) (aref ensemble_cluster e)) 'near 'far))) : if ensemble e is full or i is big enough, push to another ensemble per-rule treatment (add to action mid 'in loc mass_reloc_req_size) (add_to_action mid 'out loc mass_data_size)) (add_to_action wh' in a mass_data_size) (cluster_ensemble_inc_reloc e loc)) (io_action 'out e masg_reloc_req_size) (lat (imid (+ clock (trunsate delay 2)))) (add_lo_action mid 'in loc masg_reloc_req_size) (add_lo_action mid 'out loc masg_reloc_req_size)) (let ((current_ensemble e)) (replace_list x (term_manke y)) (replace_list x (term_manke y)) ; term yemake -- need this in case rule is of form l = x ; teeult is always in the original ensemble (when (** 5 verbose) (princ *88 reloc size conflict ens: ") (prinl e) (princ * from *) (prinl luc) (terpri)) (incf conflict_reloc_size)) (let ((delay (compute_reloc_penalty e loc kind))) (let ((wh (+ clock delay))) (the (tree (casoc e (get_an x 'where))) (the (and rea (<= (cdr rea) wh)) (when (and verbose (<= 4 verbose)) (princ "fel") (prinl kind) (princ 'reloc") (terpri) (princ "to") (prinl wh) (terpri) (princ "at") (prinl wh) (terpri) (princ "at") (prinl wh) (terpri) </pre> (when (<- 5 verbose) (prior * 981 raioc comm conflict ens: ") (print e) (princ * and ") (print locy (terpris)) (if (eq 'near kind) (if (or (not test_comm) (test_ensemble_reloc e loc)) (if (< reloc_eize_lim (ens_size e))</pre> (end_inc_raioc e) (io_action 'out e masg_reloc_req_size) (let ((mid (+ clock (truncate delay 2)))) (add_to_action wh 'in a mesg_data_size) (princ '848 repl') (princ 'ens ') (prinl e) (princ 'ersl') (prinl l) (terpri) (princ 'term') (print_term_choose x) (terpri) (princ 'repl') (print_term_choose y) (terpri) (princ 'rule ') (prinl F) (terpri) (< size_lim (ens_size_recompute e))) (incf conflict_near) (incf conflict_far))) ;@ note ) ; if or not test_comm (when loc ;@d wait if being relocated (ens inc reloc far e) (term_add_loc x e wh) (if (eq 'near kind) (1f (or (*- push_11m 1) (move_node x e) (progn ..... ``` ``` (defun test_ensemble_reloc (i j) (let ((ic (aref ensemble_clueter i)) (jc (aref ensemble_clueter j))) (push (build_term push (if push (alloc_ensemble e) e) i2 a) res) (print e) (terpri) (print c) terpri) (princ pushed ') (trint_term_choose u) (terpri)) (princ pushed ') (trint_term_choose u) (terpri)) (princ pushed ') (compute push penalty e current_ensemble))) (setf (car res) 'where) (setf (car res) 'where) (setf (car res) 'where) (setf (car res) 'where) (setf (car res) 'where (set (car res) 'where (setf (car ress) 'where (set (car ress)) (setf r intra_cluster_lim)) (< (state0_cur (aref cluster_state ic jc)) inter_cluster_lim))</pre> let ((res (term make where e (car x) (nreverse res)))) (if (= ic )c) (let ((cesi (aref cluster_ensemble_state ic))) (i2 (+ i 1)) (push nil)) (when (+ build_lim i) (setq i2 0 push t)) (dollet (a (cdr x)) (let ((res (assoc 'new_where (term_an u)))) (defun build_term (fig e i x) (if (or (term_is_war x) (term_is_bi x)) x (when (is_new u) (push node u (alloc_ensemble e)) (dolist (a (term_args u)) (update_new_loc m e)) (when (and verbose (<= 2 verbose)) (princ '### pushing ') (prinl current_ensemble)</pre> (defun ens_size_invalidate (e) (let ((inf (status_ensemble e))) (let ((x (status_eize inf))) (ens_size_invalidate e) :@@ (dolist (u (term_args x)) (eng size_invalidate e) :@@ (ens_size_invalidate e) : flg: should this be a root? (defvar intra_cluster_lim 10) (defvar inter_cluster_lim 10) (defvar conflict_near 0) (defvar conflict_far 0) (8etf (car x) -1)))) (update_new_loc x e)) (defvar test_comm nil) (brinc " to ") (let ((ree nil) (null cest) (when x ``` (defvar orig\_ens) ``` (shove_node a (car args)) (when (or (null (cdr args)) (< sz ene_size_low)) (return)) (setq args (cdr args)) (let ((nevens (alloc_ensemble e))) (term_add_loc tim nevens (compute_push_penalty nevens e)) push part of the one top term (let (itrm (car lat))) (let (itrg (term arge trm))) (when (<= 5 verbose) (princ 'sée ens (ull pushing part ') (prinl e) (princ 'sée ens (ull pushing part ') (prinl e) (incf ens_tull_push_eingle)</pre> (defun term_top_size (x) (if (or (term_is_ver x) (term_is_const x)) 1 (let ((res 1)) :count one for the top node (dolist (e (term_arge x)) (when (cocure in ensemble e orlogens) (setq res (+ res (term_top_size e)))) (when (cdr args) (return)) (setq args (term_args (car args))) (inf (status_ensemble e))) (dolist (r (status_roots inf)) (setq res (+ res (term_top_size r))) (print_term_choose (car args)) (cluster_ensemble_inc_push e nevens) :@ (when (< eng full_three sz) (incf eng full_push) (let ((let (status_roots inf))) (if (null (odr let)))</pre> (set_status_roots inf let) (ensemble_add_root nevens trm) (ens_size_invalidate nevens) ; (and (not (is_root e)) ...) (defun compute_ens_size (e) (let ((orig_ens e)) (defun shove_node (e trm) (ene_inc_push e) :@ (orig_ens e)) (let ((res 0) = ``` ``` strat.lsp ``` ``` (defun term_remove_loc (x e) (lat ((res nil)) (lat ((res nil)) (dolist term_remove_loc res)) (defun term_remove_loc res) (defun term_remove_loc res) (unless (or (term_args x)) (unless (or (term_args x)) (unless (or (term_args x)) (unless (or (term_args x)) (term_remove_loc x e) (dolist (as (term_args x)) (dolist (as (term_args x)) (dolist (as (term_args x)) (term_remove_loc x e) (dolist (as (term_args x)) (dolist (as (term_args x)) (dolist (as (term_args x)) (dolist (as (term_args x)) (dolist (as (term_args x)) (dolist (arg.col)) (terps) (princ ')) (if (or (unberp x) (princ ')) (if (or (unberp x) (term_args x)) (princ ') ``` ``` (print rrm_cluster_height) (princ 'x') (print rrm_cluster_width) (terpri) (eetq inter_cluster_lim 40) :@100 (princ "intra cluster comm limit: ') (print intra_cluster_lim) (terpri) (princ 'inter cluster comm limit: ') (print inter_cluster_lim) (terpri)) (setq intra_cluster_penalty 2) (princ 'intra_cluster_penalty) (terpri) (setq inter_cluster_penalty 2) :@20 [princ inter_cluster_penalty) (terpri) (unless (boundp 'alloc range) (load 'allocd')) (princ 'alloc threshold: ') (prinl alloc_thres) (terpri) (princ 'alloc randomization range: ') (prinl alloc_range) (princ 'aven totally (ull: alloc_ut) (terpri) (princ 'then totally full: alloc ') (when (boundp 'ens_full_thres) (setq ens_full_thres_100) (princ 'ensemble full_thresbold: ") (print_ens_full_thres) (terpri) (setq size_lim 100) (princ "size_limit: ') (prinl size_lim) (terpri) (princ "size_limit: ') (prinl size_lim) (setq reloc size_limi 100) (princ "reloc size_limit: ') (prinl reloc_size_lim) (terpri)) (princ "ensemble low level: ") (print ensemble low) (terpri)) (print cluster height) (princ 'x') (print cluster width) (princ cluster ensembles ') (setq 'random-state' (make'random-state t)) (princ 'random state: ') (prinl 'random-state') (terpri)) (when (boundp 'clock_limit) (princ 'clock limit: ') (print clock_limit) (terpri)) (princ "state: ene ") (print ene_state_param) (princ " clust ene ") (print clust_ene_state_param) (princ " clust ") (print clust_param) (terpri) build lim: ') (print build_lim) (terpri) (princ 'problem size: ') (prinl test-size) (terpri) (setq gc_period 1) (princ 'gc period: ') (princ gc_period) (terpri) (progn (princ no communication limits') (terpri)) (sety test-size (read-from-string val)) (unless (boundp 'compute_push_penalty) (load "strat")) (setq push_lim lim) (princ "push lim: ") (prinl push_lim) (princ "rrm size: rrm clusters ") ((("312" 'getenv 'S12E"))) (when (boundp 'test_comm) (setq test_comm t) (if test_comm (setq ens_size_low 55) (setq build_lim lim) (defvar lim 20) (load 'patch') (terpri) (princ : (progn ``` ``` Adproblem: really would like to have each pointer be to a term at a location roots of ensembles (@todo reduced teating (delay) simple approach: per emable flag, clear whenever start set of simple approach: per emable flag, clear whenever start set of all partitions, est when any rule is applied. If is not set in a pass all nodes with reduced children become reduced (??) (@todo random walk doesn't give snough dispersion (@to where s too, if no longer there, then remove from where list short randomization in penalties? (charge masse?) - will try adding alternate versions (cluster_ensemble_inc_reloc e loc) -> (cluster_ensemble_inc_comm ry) annotations on term -- ensembles having copies, reduced status for reduced: keep track of set of rules used on term since last success SL: GC is equivalent to using just the term, then a pass to find current (cluster_inc_reloc x y) ... (cluster_inc_comm x y) basic simulation step: activate all ensembles -- apply current rule of rules set and find next rule if have to switch rule sets, then increment an ensemble counter a current rule set and current rule a current set of term roots a current 'full-ness' use of term in another ensemble causes current-time plus delay to be added as time for desired ensemble probably want to carefully do match and replacement: replacement is delayed, put on global action list, done at end of major simulation step terms labelled with partitions that they occur in and time each ensemble is in a cluster communication is inside a cluster or across clusters stats on all communication across clusters are kept communication across clusters takes even longer (defvar show_term nil) (setq 'print-case' :downcase) · Inter ensemble Simulations tags: where reduced rules (defvar current_ensemble) (defvar ensemble_cluster) (defvar ensemble_status) (var periodic 500) (defvar push_lim 5) (defvar build_lim 5) (defvar gc_period 10) each ensemble has (defvar verbose 1) when valid (defvar root) ``` ``` cluster row and col (within rrm); ensemble low and col (within cluster) (defun cluster_left (e) (let ((col (rem e cluster_width)) (+ e (· col) (if (* col 0) (· cluster_width 1) (· col 1))))) (truncate c rrm_cluster_width) (rem c rrm_cluster_width) (defun cluster_elt (c 1 f) (let (fm tem c rrm_cluster_width))) (+ (* (* (* c rm) cluster_height) rm) cluster_width) (* 1 rrm_width) (+ (truncate (rem e rrm_width) cluster_width) (* (truncate (truncate e rrm_width) cluster_height) rrm_cluster_width))) (defun cluster_row (e) (rem (truncate e rrm_width) cluster_height)) (defun cluster_ensemble_number (e) (+ (* (cluster_row e) cluster_width) (defun rrm_enemble (1 ) k 1) (+ (* 1 trm_vidth cluster_height) (* ) cluster_width) (* k rrm_width) (defvar eng_state_param 10) (defvar clust_eng_state_param 10) (defvar clust_param 10) (defvar cluster_ensemble_stats2) (defvar cluster_state) (defvar cluster_ensemble_state) (defvar ensemble_state) (defvar forced relocation nil) (defvar some_relocation nil) (defvar replacements 0) (defvar rule_successful nil) (defun ensemble_number (i j) (+ (* i cluster_width) j)) (defwar clock_limit 999999) : (defun cluster_elt (c i j) (defvar replacement_state) (truncate e rrm_width)) lefun cluster_col (e) (rem e cluster_vidth)) (defvar action_list nil) (cluster_col e))) (defun rrm_cluster (e) (defun rrm_col (e) (rem e rrm_vidth)) (defun rrm_elt (1 5) (+ (+ i rrm_width) (defun rrs_row (e) (defvar clock 0) intra-cluster ``` ``` (defun cluster_up (e) (let ((row (rem (truncate e rrm width) cluster_height))) (* e (* rrm_width (if (* row 0) (* cluster_height row 1) -1))))) (defun cluster right (e) (let ((col (Tem e cluster_width))) (* e (* col) (if (* col (* cluster_width 1)) 0 (* col 1))))) :(defun term_make_nil (x y) (cons x (cons nil y))); not used (defun term_make_an (x a y) (cons x (cons a y))) (defun term_apse (x) (cdur x)) (defun term_apse (x) (cdur x)) (defun term_ap (x) (car x)) (defun term_ap (x) (cadr x)) (defun rrm_left (e) (let ((col (rem e rrm_width))) (+ e (- col) (if (= col 0) (- rrm_width 1) (- col 1))))) (defun add_repl (x x i y) (push (list 'repl current_ensemble x r i y) action_list)) (if (= col (- rrm_width 1)) 0 (+ col 1)))) (break "perform_actions: unknown action")) (dolist (a action_list) (if (eq 'repl (car a)) (spply F'perform_replacement (cdr a)) (if (eq 'repl (car a)) (if (eq 'repl add_root (car a)) (defun rm_up (e) (rem (+ e rrm_size (* rrm_width)) rrm_size)) (defun term_is_var (x) (and x (symbolp x))) (defun term_is_const (x) (or (numberp x) (defin add_root (e x) (push (list 'root e x) action_list)) (ens_set_used_s) (let_((inf (status_ensemble e))) (set_status_roots inf (cons x (status_roots inf)))) (defun rrm_down (e) (rem (+ e rrm_width) rrm_eize)) (let ((col (rem e rrm_width))) (defun ensemble_add_root (e x) (all_new_group) (setq clock (+ clock 1))) (defun replace_list (x y) (defun perform_actions () (setf (car x) (car y)) (setf (cdr x) (cdr y)) (defun cluster_down (e) (defun reset_clock () (setq clock 0)) (defun rrm_right (e) (+ e (· col) inter-cluster (defun tick () ``` (and (consp x) ``` (defun term loc (X) (let (0 have each pointer be to a term at a location (defun term loc (X) (cons 'new_where (list (cons current_ensemble clock))) (cons 'new_where (list (cons current_ensemble clock))) (cons 'where (list (cons current_ensemble clock))) (let ((ree (aseoc n (term_an x)))) (if res (setf (cdr res) v) (setf (cadr x) (cons (cons n v) (term_an x)))) :@@ (cons 'where (list (cons wh clock))) (if (numberp x) x (if (and (consp x) (numberp (car x))) (car x) (if (and (consp x) (eq 'bi (car x))) :@@keep more annotations? -- don't want where (term_make_direct (car x) (nreverse res)) (or (numberp (car x)) (eq 'bi (car x))))) (defun term_ia_bi (x) (and (consp x) (eq 'bi (car x)))) (defun term_const_val (x) (defun occurs in_ensemble (x e) (let ((loce (get_an x 'where))) (let ((rem (asuoc e loce))) (defun term_expand (x) (if (term_is_yex x) x (ide ((tes nil)) (dolist (a (off x)) (push (term_expand a) ree) (break "term_const_val")))) (defun term_make_where (wh x y) (<= ( dr res) clock)) , odr (assoc n (term_an x))) (defun term_make_direct (x y) (@ note handling of where the contract of (defun an_add (a n v) (cons (cons n v) a)) (defun term remake (x) (cons (car x) (defun set_an (x n v) (cddr x))) (defun get_an (x n) (11st ( ( K (cons x (cons (cons x (cons (cadr x) (and res = ``` ``` (if (atom p) (if (equal p x) nil 'fail) (if (and (conex x) 'fail) (if (and (conex x) 'fail) (if (not (cocure_in_ensemble x current_ensemble)) (proof) (force_elocation x current_ensemble) (if (not (equal (term_op p) (term_op x))) 'fail) (if (not (equal (term_op p) (term_op x))) 'fail) (when (null pl) (if (null xl) (return res) (return 'fail))) (seeq null xl) (return 'fail)) (seeq al (term_match_l (arr pl)) (car xl))) (if (eq 'fail val) (return 'fail) (if (term_is_var p) (if (term_is_var x) {list (list p x)} (if (term_is_var x) {list (list p x)} (if (not (cours_in_ensemble x current_ensemble)) (progn (force_relocation x current_ensemble) 'fail) (dolist (e locs) (when (<= (cdr e) clock) (return (car e)))</pre> (when (< 60 (col)) (terpri) (princ ')) (if (term_is_var x) (prini x) (if (term_is_const x) (prini (term_const_val x))</pre> (setf (cdr res) w)) (set an x 'where (cons cons e w) wh))) (when (< 60 (col)) (terpri) (princ ' ')) (defun term_match (p x) (when (and verbose (<= 10 verbose)) (princ '### match') (print_term_choose p) (princ '*** (print_term_choose x) (terpri))</pre> (setq res (append res val))) (setq pl (cdr pl) xl (cdr xl)) (defun an_add_loc (a e w) (an_add a 'where (list (cons e w)))) (let ((ree (term_match_1 p x))) (when (and verbose (<- 10 verbose)) (princ '## match reeult ') (prinl ree) (terpxi)) (defun term_add_loc (x e w) (let ((wh (get_an x 'where))) (let ((res (assoc e wh))) (prin1 (term_op x)) (dollet (e (term_args x)) (princ **) (when (< w (cdr res)) (list (list p x)))) (if (term is var x) 'fail (if (term is const p) (defun term_match_1 (p x) (defun print_term (x) (print term e)) (brinc '(') princ ")") (if locs ..... uT) (progn ``` ``` (list (cons current_ensemble clock))) (mapcer #'(lasbda (u) (term_inst u s)) (term_sigs x))) (term_equal (eval_cond_direct (car (term_args e)) mt) (aval_cond_direct (cadr (term_args e)) mt)) (if (member (term_op e) "(numberp equal atom nequal)) (term_make_an (term_op x) :@@teeping annotations (an_add_loc (term_an x) current_ensemble clock) (an_add (term_an x) (if (eq 'and (term_op e)) (ist (ist (tai_ond_direct (ost (term_args e)) mt))) (if (eq 'fail a) 'fail (if (enlia) term (if (eq 'or (term_op e)) (if (eq 'fail an) 'fail (if (eq 'fail an) 'fail {eval_cond_direct (cadr (term_args e)) mt)})) (if (eq '** (term_op e)) (eval_cond_direct (cadr (term_args e)) mt)))) (let ((val (eval cond direct x mt))) (bet ((val (eval cond direct x mt))) (when (eq 'fail val) (return 'fail)) (if (or (atom x) (term is b) x) x (if (and (consp x) (eq 'b-i (car x))) (let ((subs s)) (defun val (v) (var_inst v subs)) (defun bi_inst (r m mt) (apply (car m) (cddr m)) ;@@?? (dollst (x (term_args e) (defun rule_lhe (x) (cadr x)) (defun rule_rhe (x) (caddr x)) (defun rule_cond (x) (caddr x)) (term_inst (rule_rhs r) mt)) (if (term_is_var x) (let ((res (assoc x s))) (if res (cadr res) (defun rule_inst (r x mt) (if (eq 'b-i (rule_rhs r)) (bi_inst r x mt) (let ((res (assoc v s))) (if res (cadr res) (eval (cadr x))) (defun mak (x) (list x)) (defun term_inst (x s) (defun var_inst (v s) (let ((as nil)) ``` (let ((as nil)) (defvar subs) ``` Jolist (e (term_args x)) (unless (or (is_root s) :@@ cut off rewriting at transition @@new (not (occurs_in_ensemble e current_ensemble))) (apply_rule_all (+ i i) r s))) (unless (term_equal (car xa) (car ya)) (return nil)) (setq xa (cdr xa) ya (cdr ya)) (if (rule_cond r) (let ((res (eval_cond_direct (rule_cond r) mt))) (if (eq 'fail res) (break "apply_rule: complex condition") (if (atom_x) (equal x y) (if (term_is_bi x) (equal x y) (if (term_is_bi x) (equal x y) (if (term_is_bi y) nil (and (equal (term_op x) (term_op y)) (let ((xa (term_args x)) (ya (term_args y))) (add_repl x r 1 (rule_inst r x mt))))) (defun apply_rule_all (i r r) (unless (or (term_is_var r) (term_is_const r)) (apply_rule i r r) (oblist (e (term_args r)) (dollet (x (term_args e)) (push (eval_cond_direct x mt) as)) (apply (term_op e) (nreverse as))) (when (null xa) (return (null ya))) (setq rule_successful t) (ens_inc_repl current_ansamble) (incf replacements) (add_repl x r 1 (rule_inst x mt))) (defun direct_value (x) (if (term_is_const_x) (term_const_val x) (defun apply_rule (1 r x) (let ((mt (term_match (rule_lhm r) x))) (unless (eq 'fail mt) (setq rule_successful t) (ens_inc_repl current_ensemble) (incf replacements) (defun nequal (x y) (not (equal x y))) (defun try_rule (inf rul) (defun term_equal (x y) i is a depth (progn ...... = ``` (or rule successful forced relocation) ``` ; (ens inc switch current ensemble) : (detodo only count final switch? ; (ens_inc_switch current_ensemble) :(@@note only one switch (ens_inc_switch current_ensemble) :(@Gnote only one switch (when (and (eq re init_rules) (eq ps init_partitions)) (when (or (null rs) (eq rs init_rules)) (return)) (when (try_rule inf (car rs)) (defun set_status_all_rules (s.x) (setf (nth ) e) x)) (defun set_status_partitions (s.x) (setf (nth 4 e) x)) (defun set_status_alze (s.x) (setf (nth 5 e) x)) (defun status ensemble (e) (aref ensemble status e)) (init partitions (status_all_rules inf))) (let ((re init_rules) (... (defun set_status_roots (e x) (setf (nth 1 e) x)) (defun set_status_rules (e x) (setf (nth 2 e) x)) (return from outer nil) (when (try rule inf (car re)) (set_status_rules inf (cdr re)) (set_status_all_rules inf (cdr pe)) (return-from outer nil)) (set status rules inf (cdr rs)) (set_status_all_rules inf (car ps)) (return-from outer nll)) (when (try_rule inf (car re)) (set_status_rules inf (cdr rs)) (return-from outer nil)) (set status rules inf (cdr rs)) (return-from outer nil)) (setq rs (cdr rs))) (defun status all rules (e) (nth 3 e)) (defun status partitions (e) (nth 4 e)) (defun status size (e) (nth 5 e)) (when (null re) (return))) (setq re (status_all_rules inf)) (ens_full_strategy current_ensemble) (when (try_rule inf (car rs)) (when (null rs) (return))) (setq ps (odr ps))) (when (null re) (return))) (setq pe (odr pe)) (defun status_roots (e) (nth 1 e)) (defun status_rules (e) (nth 2 e)) (when (null pe) (return))) (when (null pe) (return)) (stated_init ens_state_param) (setq pe all_partitions) (defun status_make (e) (list e nil nil nil nil) (ps init_partitions)) (setq rs (cdr re)) (setq rs (cdr rs)) (setq rs (odr rs)) (setd rs (car ps)) (setq rs (car pe)) (defun state_make () (block outer ``` 10 ``` (defun ene_add_input (e val) (state_add_input (aref ensemble_stats e) val)) (defun ens_add_output (e val) (state_add_output (aref ensemble_stats e) val)) (defun ena_inc_switch (e) (state_inc_switch (aref ensemble_state e))) (defun ens_inc_reloc (e) (state_inc_reloc (aref ensemble_state e))) (defun ens_inc_reloc (ar (e) (state_inc_reloc_far (aref ensemble_state e))) (defun ens_inc_push (e) (state_inc_push (aref ensemble_state e))) (dofun ens_inc_repl (e) (state_inc_repl (aref ensemble_state e))) (defun ens_set_used (e) (state_state_set_used (aref ensemble_state e))) (let ((ic (aref ensemble_cluster i)) (jc (aref ensemble_cluster j))) (if (= ic jc) (defun state_add_input (x val) (add_val (nth 6 x) val)) (defun state_add_output (x val) (add_val (nth 7 x) val)) (defun state_inc_switch (x) (inc_val (car x))) (defun state_inc_reloc (x) (inc_val (cadx x))) (defun state_inc_reloc_far (x) (inc_val (caddx x))) (defun state_inc_push (x) (inc_val (caddx x))) (defun state_inc_repl (x) (inc_val (nth 4 x))) (defun state_set_used (x) (setf (nth 8 x) t)) (defun stats_ensemble (e) (aref ensemble_stats e)) (defun state_switch (x) (car x)) (defun state_reloc (x) (cadx x)) (defun state_reloc (xr (xr) (caddr x)) (defun state_reloc (xr (xr) (xr)) (defun state_rep1 (xr) (nth 4 xr)) (defun state_size (xr) (nth 5 xr)) (defun state_output (xr) (nth 6 xr)) (defun state_output (xr) (nth 7 xr)) (defun state_used (xr) (nth 8 xr)) (setf (stats0_cur siz) (ens_size e)) (new_group siz)) (defun ens_add_size (e) (let ((stre (arcf ensemble_stuts e))) (let ((stre(arcf_size stre))) (setf (state_our siz) (ens_size e)) (nev_group siz)))) (defun cluster_ensemble_inc_reloc (1 1) (defun eng_state_new_group (e) (let ((stts (aref ensemble_state e))) (defun cluster_ing_reloc (i j) (ing_val (aref cluster_state i j))) (defmacro state0_cur (x) '(nth 1 ,x)) (stats0_init ens_stats_param) (stats0_init ens_stats_param) (stats0_init ens_stats_param) (new_group (nth 4 stts)) (let ((siz (state_size stts))) (stated_init_ene_stats_param) (stats0_init_ene_stats_param) (stats0_init_ene_state_param) (new_group (nth 0 ette)) (new_group (nth 2 stts)) (new_group (nth 6 etts)) (new_group (nth 7 etts)) ``` ``` (setq cluster_stats | trm_cluster_size rrm_cluster_size clust_patam)) (make_2d_array_state rrm_cluster_size rrm_cluster_size clust_param)) (defun cluster_ensemble_inc_push (i j) (let ((ic (aref ensemble_cluster i)) (jc (aref ensemble_cluster j))) (if (= ic jo) clust_ens_stats_param))) (inc_val (aref cluster_ensemble_stats ic) (cluster_ensemble_number i) (defun init () (setq ensemble_cluster (make-array (list rrm_size))) (setf (aref ensemble_cluster i) (rrm_cluster i))) (setq ensemble_status (make-array (list rrm_size))) (actq ensembla_state (make-array (list rrm_size))) (dotimes (i rrm_size) (dotimes (i rrm_sise) (setf (aref ensemble_status i) (status_make i))) (setf (aref ensemble_state 1) (state_make))) (cluster_ensemble_number 1))) (setq cluster_ensemble_state2 (sake-array (list rrm_cluster_size))) (let ((res (assoc 'new_where (term_an u)))) (setq cluster_ensemble_state (make-array (list rrm_cluster_size))) (setq replacement_state (state0_init 1)) (setq replacemente 0) (inc_val (aref cluster_etate2 ic jo))) (setf (cadr res) (cons e clock)) conly push if new (defun is_new (x) (assoc 'new_where (term_an x))) (when (is new x) (move node x e) (dollst (y (term_args x)) (cluster_inc_reloc to jc)) (update_new_loc y e))) (defun update_new_loc (x e) (if (term_ls_var x) x (setf (car res) 'where) (if (atom x) x (if (term_ie_bi x) x (dotimes (1 rrm size) (setq cluster_state2 (defun move node (u e) ``` ``` ies.lsp ``` ``` (setq action_list nil) (simulation_step) (simulation_step) (shen (and (null action_list) (null some_relocation) (or (not (fboundp 'check_reduced)) (check_reduced root))) (defun setup (file) (with-open-file (*stanadard-input* file :direction :input) (setq all_partitions (read))) (init) (when (*- 1 verbose) {princ '------ ")} (when (= 0 (rem clock gc_period)) (perform_gc)) (when ( clock_limit clock) (princ 'clock Liwit ExcEEDED') (terpri) (setq input_stats (stateo_init 10)) (setq output_stats (statso_init 10)) (when (< 60 (col)) (terpil)) (when (and)</pre> (defun simulation_step () (when (<= 6 verbose) (dotimes (1 trm_size) (activate_ensemble i) (princ 'clock') (princ '') (prinl clock) (setq action_list nil) (setq step_input 0) ( torce .output) (perform_actions) (when verbose (defun simul () (reset_clock) (return)) ``` ``` (dotimes (i rrm_size) ;@@todo eliminate need for (put in ensemble_add_root) (let (finf (etatus_ensemble 1)) (doilet ( retus_ensemble 1)) (doilet (r etatus_roote inf)) (asark_root_node r) (clear_node r) (mapcan f'(lambda (p) (when {member x (status_roots (status_ensemble (car p)))) (defun mark_term (x) (unless (or (term_is_var x) (term_is_bi x) (is_marked x)) (dalts_mode x) (dalts (y (term_args x)) (mark_term y)) (set_an x 'mark clock))) ;(@ clock versus 'marked (defun fix_roote_where (x) (unless (or (term_is_var x) (term_is_bi x)) (when (is_root x) (get_an x 'where)))) (when (null new) (princ '#### null where') (terpri) (print x) (terpri) (print (get_an x 'where)) (terpri) ; (fix_roots_where root) ;@dtodo fix this (progn (clear_node n) nil) (dotimes (1 rrm_size) (lat ((inf (eratum_enemble 1))) (when (etatum_roots inf) (eet_etatum_roots inf (mapoen F'(lambda (n) (if (im_marked n) (when (<= 1 verbose) (terpri)) (defun is_root (x) (eq 'is_root (get_an x 'root))) (defun ie_marked (x) (equal clock (get_an x 'mark))) (clear_node n) (set an x 'where new)) (etatus_roots inf))) (defun mark_root_node (x) (set_an x 'root' is_root)) (list n) (when (<- 1 verbose) (11st p)) (defun clear_node (x) ; (set_an x 'mark 0) (defun perform_gc () (break)) (if (null new) (defun mark_node (x) (when (is_root x) (mark_term root) (princ . cc.) (when verbose ``` ``` (if (or (term_is_var x) (term_is_const x)) i (let (test i) :count one for the top node (dollet (e (term_args x)) (setq res (+ res (term_aixe s))) (("locally" beot) (locally, dbunod)) sealun) (defun init_term (e x) (let ((verbose nil) (current_ensemble 0)) (inf (status ensemble e))) (dolist (r (status_roots inf)) (setq res (* res (term_top_size r))) (defun col () (filecol *standard-output*)) (setg x (come nil nil)) (set status size inf x)) (setf (cdr x) (compute_ens_size e)) (setf (car x) clock) (set_status_size_inf_x)) (setf_(cdr_x) (compute_ens_size_e)) (setf_(car_x) clock) (let ((x (status_elze inf))) (If (and x (= clock (car x))) (cdr x) (let ((inf (status_ensemble e))) (let ((x (status_eize inf))) (when (null x) (defun eng_size (e) (let ((inf (status_ensemble e))) (setq x (cons nil nil)) (defun ens_size_recompute (e) (dolist (y (term_args x)) (defun compute_ens_size (s) (let ([res 0]) (fix_roots_where y)) (build_term t + 0 x) (when (null x) (defun term_size (x) (cdr x) (cdr x) (progn ``` (when {< 60 (col)) (terpti) (princ " ")) (if (atom x) (prinl x)</pre> (defun prin (x) ``` current group man, ain, max, eum-of-sq, distribution(/k) ... keep adaptive distribution where know how many values excluded (came before the last adaptation)? ; liep object that flexibly and extensibly keeps record of statistics (with open-file (*standard-output* ,file :direction :output) (let ((*trace-output* *standard-output*) (currentfile *standard-output*)) (when (atom x) (when x (princ " . ") (prin x)) (return)) (state) (current_state) (global_state) distrparam); current_state ; number sum min max sum·of-eq distribution ; a tag indicating the treatment (for extensibility) (defun state_kind (x) (car x)); state0,state1,... (when (< 60 (col)) (terpri) (princ '')) number sum min max sum of seq distribution (defun compute_derived (sts) (if (eq' verse0 (state_kind sts)) (state_compute_derived sts) (brask 'compute_derived: unknown kind') (defun insert_val (sts v) (if (eq 'tatko' (state, ind sts)) (stateO_insert_val sts v) (break 'insert_val: unknown kind') (defun new_group (ste) (if (eq 'state) new_group ste) (state) new_group; unknown kind') (defmacro to-file (file erest forms) (if to-file (defun inc_val (ste) (if (sq state) (state_kind ste)) (stateO_inc_val ste) (break 'inc_val unknown kind') (if (eq 'state0 (state_kind sts)) (state0_add_val ste val) (break 'add_val: unknown kind') (defun add_val (sts val) -- add_val, new_group (prin (car x)) (setq x (cdr x)) (setq x (cdr x)) (prin (car x)) (progn ,@forms) (defvar to-file t) statistics stuff (brine . .) ; global_etate (brinc '(') (princ ')') stated format 1000 ``` ``` ies.lsp ``` œ ``` (defun state0_create () (list 0 0 most-positive-fixnum most-negative-fixnum 0 nil nil) (setf (state0_distr se) (add_distr n 1 dp (state0_distr se))) (sqrt (' (coerce-float (/ sum sq n)) (* av av)))) (defun coerce-float (x) (coerce x 'long-float)) (defmacro state0_cur (x) '(nth 1 ,x)) (defmacro state0_glb1 (x) '(nth 2 ,x)) (defmacro state0_distr_param (x) '(nth 3 ,x)) (defmacto state0_sum (x) (nth 1 x) (defmacto state0_sum (x) (nth 2 x)) (defmacto state0_sum (x) (nth 3 x)) (defmacto state0_sum (x) (nth 3 x)) (defmacto state0_sum (x) (nth 5 x)) (defmacto state0_sum (x) (nth 6 x)) (defmacto state0_sv (x) (nth 6 x)) (setf (stats0_cur sts) 0) :reset counter (setf (state0_av se) 0.0) (setf (state0_dev se) 0.0)) (let ((av (coerce-float (/ sum n)))) (setf (state0_av se) av) (setf (state0_dev se) (defmacro state0_num (x) '(nth 0 ,x)) (defun stateO_compute_derived (stts) (incf count_asts) (let ((ss (stateO_glbl stts))) (let ((n (stateO_num ss)) (sum (state0_sum es)) (sum_eq (state0_sum_eq es))) (defun state0_init (p) (list 'state0_create) p) (when (or (null val) (< n val)) (setf (stats0_min se) n))) (let ((val (statsO_max ss))) (when (or (null val) (< val n)) (setf (statsO_max ss) n))) (defun stated_insert_val (sts v) (setf (state0_cur sts) v) (defun state0_add_val (ets val) (let ((val (state0 min ss))) (incf (State0_cur sts) val) (defun state0_inc_val (sts) (incf (state0_cur sts)) (defvar count_stats 0) (if (- 0 n) (progn ``` ``` (setf (statso_distr gs) (merge_distr (statso_distr ss) (statso_distr gs))) (when (null cut) (setf (cdr lst) (cons pr nil)) (return)) (when (< n (casr cur)) (setf (cdr lst) (cons pr cur)) (return)) (setq lst cur cur (cdr cur))</pre> (when (null 1) (return (insert_distr v c d))) (when (- v (cast 1)) (incf (cdar 1) c) (return d)) (setq 1 (cdr 1)) (dollst (de d1) (setq d2 (add_distr1 (car de) (cdr de) d2))) (defun merge_state (se ge) (incf (etate) nim ge) (state) nim se)) (incf (state) sim ge) (state) sim se)) (incf (state) sim ge) (state) sim se)) (defun all_new_group () (insert_val replacement_state replacements) (new_group replacement_state) (ang_state_new_group e)) (dotines (i rrm_cluster_size) (dotines (j rrm_cluster_size) (now_group (ared cluster_etate i j)) (new_group (ared cluster_etate2 i j)) (defun make_2d_array_etate (m n p) (let (free (make-array (list m n)))) (dotimes (1 m) (let ((vals (stateO_min se)) (val (stateO_min ge))) (when (or (null val) (< vals val)) (setf (stateO_min ge) vals))) (val (stated_max ge))) (when (or (null val) (< val vals)) (setf (stated_max ge) vals))) (defun add_distr (n c p d) (let ((v (* p (truncate n p)))) (add_distrl v c d) (let ((vals (etats0_max ss)) (setf (aref res 1 j) (state0_init p)))) (defun merge_dietr (dl d2) (defun add_digtr1 (v c d) (let ((1 d)) (global_lo_new_group) (dotimes (e rrm_size) (setq replacements 0) (dotimes () n) 1000 ``` ``` ies.lsp ``` σ ``` (defun alloc_ens (e flag) : when flag try allocating in other clusters, otherwise fail (nil value) (incf alloc_num) (let ((ch (random alloc_range))) (setq res (random rrm_size))); choose a random ensemble (setq res cs) (return-from search nil)))) (return-from search nil)))) (return-from search nil)))) (princ res (rrm.ens.dir e dir)); if try all use first (when (-- 5 verboes) (princ 's salloc totally full ens ') (prinl e) (princ 'in') (princi c) (terpri)) (locf alloc_totally_full) (when flag (0 (rrm_cluster_right c)) (1 (rrm_cluster_down c)) (2 (rrm_cluster_left c)) (3 (rrm_cluster_up c)) (random cluster_up c)) (random cluster_width)) (random cluster_width)) (random cluster_width)) ) ; fail on fotally full (when res (return))) (block search (dotimes (i cluster_height) (dotimes (i cluster_width) (list (ca (cluster_sit o i j))) (let ((size (eng_size ce))) (when (< size alloc_thres) (unless alloc totally_full_random (defvar alloc_range 100) (defvar alloc_three 100) (defvar alloc_three 100) (defvar alloc_three 100) (defvar alloc_thl 0) (defvar alloc_totally_thl 0) (defvar alloc_totally_thl thl 0) (alloc_ens (cluster_elt (case 1 (if (evenp ch) 1 2) (if (evenp ch) 0 3)))) (let ((cdir dir)) (if (< ch alloc_cut) (dotimes (i 4) (defun alloc_ensemble (e) (alloc_ens e t)) (defun rrm_ens_d)r (e x) (case x (3 (rrm_right e)) (1 (rrm_up e)) (2 (rrm_left e)) (3 (rrm_down e)) (unless res (let ((dir right 0 ``` ``` (unless (fboundp 'ensiostats) (load "a2")) (ensiostate) (princ (princ "near communication conflicts: ") (prin1 conflict_near) (terpri) (princ 'far communication conflicts: ") (prin1 conflict_far) (terpri) (when (boundp 'confiltt_reloc_eize) (princ 'reloc size confiltts: ') (print confiltt_reloc_eize) (terprt) princ 'alloc num: ') (print alloc_num) (terpri) (princ 'alloc full: ') (print alloc_full) (terpri) (princ 'alloc totally full: ') (print alloc_totally_full) (terpri) (princ 'ensemble full pushes: ') (print ens_full_push) (terpri) (princ 'ensemble full pushes with single root: ') (print ens_full_push_single) (terpri) updated now that state0 used for state_ensemble) stats in ensemble_state (func state_ensemble) cluster stats for ensembles in cluster_ensemble_state (setq count_state 0) (sil_state_apply #'state0_compute_derived) (princ 'number of state: ') (print count_state) (terpri) (defun stats0_val (x) (stats0_sum (stats0_glb1 x))) (compute_derived_replacement_state) (princ 'replacement per etep state') (terpri) (state0_ehow_state2 replacement_state 0) (princ 'root: ') (print term root) (terpri) (princ 'details: ') (prin root) (terpri) and cluster ensemble state2 (princ 'cluster matrix') (terpri) (when (boundp 'replacement_state) princ 'maximum size") (terpri) (wend_fini_ene, dbunod) news) (when (boundp 'conflict_near) (when (bounds 'alloc_num) (brinc 'roote') (terpri) princ 'size') (terpri) cluster state2 cluster state (eng_max_siz) (defun sha () ensetata) (ena_etz) () a unjap) (terpri)) (ens_rts) ( m/e) (B) ``` ``` (dotines (1 rrm width) (format t "-4d" i)) (terpri) (dotines (1 rrm width) (format t "-4d" i)) (terpri) (dotines (1 rrm width) (dotines (1 rrm width) (dot wid <u>.</u> : (if (state used (state ensamble e)) (princ '(princ ".)) (format t "-4d" s2) (if (stats used (state_ensemble e)) (princ " (princ " "))) (princ "global ensemble state") (terpri) · • (format t ~4d* sazeiz) (if (state_used ss) (princ * (princ * ...))) (princ ' |') (terpri) (princ ' |') (terpri) (princ - | ') (terpri) (princ ' |') (terpri) (defun ens_max_six () (princ " ') (defun clust_mat () (defun ensetate () (princ " (princ) ``` ``` (princ "reloc; ") (prin1 (state switch ss)) (terpri) (princ "reloc; ") (prin1 (state_reloc ss)) (terpri) (princ "reloc_far: ") (prin1 (state_reloc_far ss)) (terpri) (princ "push: ") (prin1 (state_push ss)) (terpri) (princ "state: ") (prin1 (state_repl ss)) (terpri) (princ "size: ") (prin1 (sns_size i)) (terpri) (princ 'ensemble roots') (terpri) (dotimes (i rrm_size) (let ((re (etatus_roots (status_ensemble i)))) (princ 'detailed ensemble info') (terpri) (princ "##") (terpri)) (dolist (r rs) (prin r) (terpri) (princ ' | ') (terpri) (princ ' |') (terpri) (print name) (terpti) (print name) (terpri) (defun ens_name (name) (defun enenum (name) (terpri) (princ . (princ (princ tuck; ) (prin1 aug) (terpti) (princ 'used: ') (prin1 aug) (terpti) (princ 'rel: ') (prin1 aug) (terpti) (princ 'rel: ') (prin1 aug) (terpti) (princ 'rel: ') (prin1 aug) (terpti) (princ 'relia: ') (prin1 pum) (terpti) (princ 'reph: ') (prin1 pum) (terpti) (princ 'aux eize: ') (prin1 aug) (terpti) (princ 'aux eize: ') (prin1 aug) (terpti) (princ 'aux eize: ') (prin1 aug) (terpti) (princ 'aux eize: ') (prin1 died (') (terpti) (princ 'aux eize: ') (prin1 died (') (terpti) (princ 'aux eize: ') (prin1 died (') (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel num) 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (princ 'aux eize: ') (prin1 (coerce (' rel aux 'float)) (terpti) (unless (= 0 closs)) (princ | 2v used: ") (prin1 (coerce (/ used clock) 'float)) (terpri) (princ | 2v used: ") (prin1 (coerce (/ used clock) 'float)) (terpri) (princ | 2v usel | far: ") (prin1 (coerce (/ ral clock) 'float)) (terpri) (princ | 2v ral far: ") (prin1 (coerce (/ ral clock) 'float)) (terpri) (princ | 2v vral far: ") (prin1 (coerce (/ ral clock) 'float)) (terpri) (princ | 2v vral far: ") (prin1 (coerce (/ used cnt) 'float)) (terpri) (princ | 2v used: ") (prin1 (coerce (/ used cnt) 'float)) (terpri) (princ | 2v used: ") (prin1 (coerce (/ used cnt) 'float)) (terpri) (princ | 2v used: ") (prin1 (coerce (/ ver cnt) 'float)) (terpri) (princ | 2v used: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ | 2v vel: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ | 2v vel: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ | 2v vel: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ | 2v vel: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ | 2v vel: ") (prin1 (coerce (/ rel cnt) 'float)) (terpri) (princ 'av size: ') (prini (coerce (/ size num) 'float)) (terpri))) (princ 'av size: av size: av per tick') (terpri) (when (state_used as) (setq used (+ used 1))) (setq sv (+ sv (state_oval (state_avitch se)))) (setq rel (+ rel (state_oval (state_reloc se)))) (setq rel (+ rel (state_oval (state_avioc_far se)))) (setq repl (+ repl (state_oval (state_avioc_far se)))) (setq repl (+ repl (state_oval (state_aviol se)))) (let (set (state_oval (state_oval se)))) (let (mx (state_oval setate_oval (state_aviol se)))) (if (- oval) (incf maxeizeo) (when (< maxsize mx) (setq maxsize mx)) setd marsizecum (+ maxsizesum mx)))) (setq size (+ size (ens size 1))) ;@@ (princ "clock: ") (prin1 clock) (terpri) (let ((ss (stats ensemble 1))) (unless (equal none ss) (none (state_make))) (setq num (+ num 1)) (dotimes (i rrm_size) (maxeizesum 0) (rel 0) (rel far 0) (push 0) (repl 0) (G Caziexem) (maxeize 0) (let ((num 0) (used 0) (8 0) (progn ``` ``` (if (and (~ 0 res) (~ res 9)) (progn (print res)) (case (truncate res 100) (0 (princ * a*)) (1 (princ * b*)) (2 (princ * d*)) (4 (princ * d*)) (6 (princ * d*)) (6 (princ * d*)) (6 (princ * d*)) (6 (princ * d*)) (7 (princ * d*)) (8 (princ * d*)) (9 (princ * d*)) (defun clust_ens_stats (e) (clust_stats (aref ensemble_cluster e))) (docimes (i rrm width) (format t "-2d" i)) (terpri) (docimes (i rrm height) (format t "-3d" | " i) (format t "-3d" | " i) (fortune ( rrm width) (let ((e (rrm width))) (let ((re (rrm width)))) (let ((re (rrm width)))) (if (mumberp res) (if (= 0 res) (princ ",") (defun clust_stats (ec) (let ((mat (aref cluster_ensemble_state ec))) ``` (detail day () ``` "roots: ") (prin1 (length (status_roots (status_ensemble e)))) (let ((v (rrm_down e))) (princ * (*) (prinl (term_root_loc *)) (princ *)*) (defun cluster_ensemble (1 1 e) (princ '(') (prinl (term_op x)) (dollet (e (term_args x)) (if (ie_root e) (dolist (r rs) (brine . .) (bring .).) (brogn (progn = (defun clust_ens_stats2_2d (e) (clust_stats2_2d (aref ensemble_cluster e))) (defun clust_ens_stats_2d (e) (clust_stats_2d (aref ensemble_cluster e))) (defun clust_ens_state2 (e) (clust_state2 (aref ensemble_cluster e))) (dotimes (i cluster_size) (format t "-4d" i)) (terpri) (dotimes (i cluster_size) (dotimes (i cluster_size) (dotimes (i cluster_size) (let (v (sref mat i j))) (format t "-4d" v) (defun clust_state2_2d (ec) (let ((mat (aref cluster_ensemble_state2 ec))) (defun clust_state2 (ec) (let ((mat (aref cluster_ensemble_state2 ec))) (defun clust_state_2d (ec) (let (imst (aref cluster_ensemble_state ec))) (when mat (brinc ' |') (tempri) (princ " |") (terpri) (princ ' |') (terpri) (princ ' | ') (terpri) (defun ens_sketch (e) (when mat (when mat (princ : ``` ``` (princ 'switch: ') (prinl (state_switch se)) (terpri) (princ 'reloc: ') (prinl (state_reloc se)) (terpri) (princ 'reloc_far: ') (prinl (state_reloc_far se)) (terpri) (princ 'push: ') (prinl (state_push se)) (terpri) (princ 'relp: ') (prinl (state_rel) se)) (terpri) (princ 'arize: ') (prinl (state_size se)) (terpri) (princ 'computed size: ') (prinl (ens_size e)) (terpri) (princ 'computed size: ') (prinl (ens_size e)) (terpri) (defun print_term_top (x) (when {< 6 (wold) (terpt) (print " ')) (if (term_le var x) (print x) (if (or (numberp x) (term_ie_bi x)) (print (term_conet_val x))</pre> (let ((re (status roots (status ensemble e)))) (let ((v (rrm_right e))) (if (= ec (aref ensemble_cluster v)) (cluster_ensemble_number v) (let ((v (zrm_up e))) (if (= ec (aref ensemble_cluster v)) (cluster_ensemble_number v) (let ((v (rrm_left e))) (if (* ec (aref ensemble_cluster v)) (if (- ec (aref ensemble_cluster v)) (print_term_top e)))) (when (< 60 (col)) (terpri) (princ ')) (print_term_top r) (terpri))) (let ((en (cluster_ensemble_number e))) (let ((alit (cluster_ensemble_number v) (cluster_ensemble_number v) ``` # a.lsı ``` (let ((g (state0_glb1 es))) (let ((num (state0_num g))) (let ((num (state0_num g))) (setf (state0_num g) (' num zero)) ivisterO_compute_defived se) (princ 'global not zero ') (state0_ehow_state2 se zero) (terpri) (setf (state0_num g) num) (rrm ensemble i ) (truncate a cluster_width) (cluster_col a)) (if (= 1 )) (princ - · · · ) (let ((e (ard m 1 i))) (let ((e (ard m 1 i))) (let ((va) (g (atte0 glb) e)))) (lesert val scan_star val) (new_group scan_state) (when (= 0 val) (incf scan_sero)) (if (= 0 val) (princ ") (format t = -3d" val)) (dotimee (i cluster_size) (format t '-4d' i)) (terpri) (dotimes (i cluster_size) (dotimes () cluster_size) (state0_compute_derived ss) (princ 'global ') (state0_show_state2 ss 0) (tempil) (decf count_state 2) (unless (= 0 (state0_num (state0_glb1 scan_state))) (show_state scan_state scan_zero)) (setq rel scan_state relz scan_zero) (defun ena_rt (a) (princ "ensemble roots") (terpri) (let ((rs (status_roots (status_ensemble e)))) (unless (null m) (princ "reloc cluster ") (prinl e) (terpri) (princ 'relocation within cluster') (terpri) (setq scan_state (state0_init scan_param) scan_zero 0) (defun sh (e) (let ((m (aref cluster_ensemble_stats e))) (defun clust_sketch (c) (princ "cluster: ") (prinl c) (terpri) (princ "reloc") (terpri) (clust_state2 c) (princ "reloc from") (terpri) (clust_state_2d c) (princ "push from") (terpri) (dotimes (e rrm_cluster_size) (prin r) (terpri)))) (defun show stats (se zero) (clust_state c) (princ 'push') (terpri) (defvar scan_stats nil) (defvar scan_zero 0) (defvar scan_param 1) (clust_stats2_2d c) (dolist (r rs) (defun sha (m) (unless (null m) (defun shc () (when re (princ . (a) e) (progn ``` ``` (let ((adjwal (if (= 0 (car x)) ( (cdr x) zero) (cdr x)))) (format t "-5d: -5d-4" (car x) adjwal) (setq tot (+ tot adjwal)) (when (< cut tot) (princ " (terpri)</pre> (let (se (state)_glb1 x)) (princ 'number: ') (prin1 (stateO_num se)) (terpri) (princ 'number: ') (prin1 (stateO_man se)) (terpri) (princ 'man: ') (prin1 (stateO_man se)) (terpri) (princ 'man: ') (prin1 (stateO_man se)) (terpri) (princ 'dev: ') (prin1 (stateO_man se)) (let (d (stateO_dist se)) (let (d (stateO_dist se)) (let (cuti (* .5 num)) (cut (* .5 num)) (defun etate@_ehow_state (x unit) (let ((setse@_gth) x)) (prino 'number: ') (prin1 (state@_num es)) (terpri) (prino 'main: ') (prin1 (state@_min es)) (terpri) (prino 'main: ') (prin1 (state@_min es)) (terpri) (prino 'main: ') (prin1 (state@_win es)) (terpri) (prino 'dev: ') (prin1 (state@_win es)) (terpri) (let (d (state@_distr ms))) (unless (~ 0 (atate0_num (state0_glb) scan_state))) (show_state scan_state scan_zero))) (let ((n (truncate (odr x) unit))) (when (< 60 n) (princ '9') (setq n 60)) (dotimes (i n) (princ '*')) (format t "-5d: -5d " (car x) (cdr x)) (let ((m (aref cluster_ensemble_state2 e))) (unless (null m) (princ 'push cluster ') (print e) (terpri) (defun state0_show_state2 (x zero) (princ * |*) (terpri) (setq cut1 nums)) (terpri)) (Stated_max x)) (defun sh2 (e) (defun shc2 () (defun g (x) (sh2 e) ``` ``` (defun find_empty_where (x) (if (or (term is var x) (term is conet x)) nil (let (fres (null (get_an x "where)))) (when res (princ "--> ") (prind (a_term_loc x)) (princ ":") (print_term_partial 2 x) (terpri)) (doliet (e (term_args x)) (setq res (or res (find_empty_where e))) (when (ig_root x) (princ '--> ') (prinl (a_term_loc x)) (princ ': ') (princ 'term_partial 2 x) (terpri)) (doliet (e (term_arge x)) (find_it e) (princ "<-- ") (prind (a_term_loc x)) (princ ": ") (print_term_partial 2 x) (terpri))</pre> (princ '...') (prinl (a_term_loc xi) (princ ': ') (print term_loc xi) (doliet (e (term_loc xi)) (setq res (or res (find_ture e))) (princ '<- ') (prin1 (a_term_loc x)) (princ ': ') (print_term_partial 2 x) (terpri))</pre> (let ((res (eval_cond_direct (rule_cond r) mt))) (if (eq 'fail res) (break 'apply_rule: complex condition') (if (or (term_is_var x) (term_is_const x)) nil (ter (tes nil)) (dollst (p (gst_an x "where) (setq res t)) (when (<- (cdr p) clock) (return))) (if (or (term_is_var x) (term_is_const x)) nil (add_repl x r 1 (rule_inet r x mt))))) (add_repl x r i (rule_inst r mt))) (defun apply_rule (1 r x) (set_an x 'tried t) (let ((at (term_match (rule_lhs r) x))) (unless (eq 'fall mt) (f (rule_cond r) (ens_inc_repl current_ensemble) (setq rule_successful t) (ens_inc_repl current_ensemble) (incf replacements) (defun a_term_loc(x) (if whloc (si:address x) (let ((locs (get_an x 'where))) (if locs (setq rule_successful t) (incf replacements) (car (car (last locs))) (defun find_future (x) (defun find_it (x) (when res (defvar while t) (when res (princ ' (when res (when res (brogn = ``` ``` (dolist (r rts) (prin1 (a_term_loc r)) (princ ": ") (print_term_partial 2 r) (terprl)) (when (occ_in x r) (princ 'ensemble: ') (print i) (princ '') (print (a_term_loc r)) (princ ': ') (print term_partiel 2 r) (unless (get_an x 'tried) (princ '...' ) (prinl (a_term_loc x)) (princ ':.') (print_term_partial 2 x) (terpri)) (find_untried e) (when (get_an x 'tried) (princ '--> ') (prinl (a_term_loc x)) (princ ': ') (print_term_partial 2 x) (terpri)) (doliet (a (term_args x)) (find_tried a) (if (term is const x) (prini (term const val x)) (and (get_an x 'tried) (dolist (xa (term_arge x) t) (unless (all_tried xa) (return nil)))) (if (or (term is yar x) (term is const x)) nil (if (or (term_is_var x) (term_is_const x)) nil (show_not_tried e)) (when (< 60 (col)) (terpri) (princ " ')) (when (< 60 (col)) (terpri) (princ ' ')) (defun all_tried (x) (if (or (term_is_var x) (term_is_const (dotimes (i rrm_size) (let ((inf (status_ensemble 1))) (let ((rts (status_roots inf))) (defum find_in_roots (x) (dotines (1 rrm_size) (let ((inf (status_enemble 1))) (let ((int (status_enemble 1))) (if (term_is_var x) (prinl x) (princ '(') (prinl (term_op x)) (dollst (e (term_args x)) (princ '') (defun show_not_tried (x) (if (all_tried x) (princ "#") (print i) (terpri) (defun find untried (x) (dolist (r rts) (defun find tried (x) (defun find_roots () (terpri)) (brinc .).) (brogn ``` ``` (unless (find a_root x) (print (a_term_loc x)) (print ": ") (print_term_partial 2 x) (defun find_a_root (x) (let ((ree (find_a_root_l x))) (prinl (a_term_loc x)) (princ " --> ") (prinl ree) (terpri) (setq check_list (make-hash-table :size 5000 :test #'eq)) (dotimes (s rrm_size) (when (eq x r) (return-from whole 1)))))))) (if (eq x y) t (if (or (term is var y) (term is_const y)) nil (if (is_root y) nil (dollat (e (term args y)) (when (occ_in x e) (return t))) (defun find_non_roots (x) (if (or (term_is_var x) (term_is_const x)) nil (nth x (status_roots (status_ensemble e))) (when (eq z r) (princ " ") (prin1 1))))))) (defun show_a_root (x) (dotimes (i rrm_size) (let ((inf (status_ensemble i))) (let ((rife (status_roots inf))) (let ((current_ensemble e)) (let ((inf (exturensemble e))) (Colist (r (etatue_roots inf)) (Craverse_tried r) (let ((inf (status ensemble 1))) (let ((its (status roots inf))) (dolist (e (term_args x)) (find_non_roots e)) (defun addr (x) (si:address x)) (defun find_a_root_1 (x) (block whole (traverse_tried e) (when (is root x) (dotimes (i rrm size) (dolist (r rts) (dollet (r rts) (defun selroot (e x) (defun occ_in (x y) (defvar check_list) (defun not_tried () (terpri) (when rts (when rts (progn ``` ``` (and (gethash x check_list) (doilst (xa (term_args x) t) (unless (every_tried xa) (return nil)))) (When (< 60 (col)) (terpri) (princ '')) (if (term_is_var x) (prinl x) (if (term_is_const x) (prinl (term_const_val x)) (lat ((ifg (gethash x check_list))) (when (and see (not fig)) (when (< 60 (col)) (terpri) (princ '')) (princ (if iig ')' '')) (defun every_tried (x) (if {or (term_is_var x) (term_is_const x)) (defun ex () (exhibit_not_tried root t)) (defun exhibit_not_tried (x see) (if (every_tried x) (princ "#") (exhibit_not_tried e flg)) (print_term_partial 2 x) (dollst (s (term args x)) (princ ") (print (a_term_loc x)) (princ ":") ((.). .). ft #10 ((.)) (print (term_op x)) (find in roots x) (brinc ">")) (princ 'cc') (terpri) ``` ``` tst4.lsp ``` ``` (ro-file 'teté.out' 'teté.out') (ro-fi ``` ``` Loading p.lap rrm size: rrm clusters 4x4 cluster ensembles 4x4 stats: ens 10 clust ens 10 clust 10 intra cluster penalty: 2 inter cluster penalty: 2 gc period: 1 Loading alloc4.0 Finished loading alloc4.0 alloc threshold: 100 alloc threshold: 100 alloc threshold: 100 alloc randomization range: 100 cut: 2 when totally full: alloc in local cluster Loading strat.0 Finished loading strat.0 Finished loading strat.0 Finished loading strat.0 Finished loading strat.0 Finished loading strat.0 strate limit: 100 state lin ``` | | • | | • | | : | | | | | | | ; | | | | | | |-----------------------------------------|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|---------------------------------------|----------------------------------------------------------------|-----------------------------------------------------|-------------------------------------------------------------------|----------------------------------|----------------------------------------------|----------|---------|---------|-------------|----------------|---------------------------------|------------------------|-----------------------------------| | | 20000000 | 000000 | - : | : | 1 | : | | | | | | : | | | | | | | | *00000000 | | , | : | | : | | | | | | • | | | | | | | | 200000000 | • | , | | • | | | | | | | ; | | | | | | | | | • | , ; | : | : | : | | | | | | : | | | | | | | | | • | • | : | : | : | | | | | | : | | | | | | | | | • | , | : | | | | | | | | : | | | | | | | | ********** | • | , | | • | : | | | | | | : | | | | | | | | 800000000 | ; | , | | | | | | | | | : | | | | | 99 | | | ; | 000000 | , [ | | | | | | | | | | | | | | 9999 | | | \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | 000000 | , [ | | Ę. | | | | | | | | | | | | 9999 | | | • | 000000 | ; | 0 | 160 | | | | | | | | | | | | 9999 | | | +0000 0000 | 000000 | , [ | cte | 4 | B | 1573 | Š | 80 | | | | | | | | ž | | | M000 00000 | 000000 | | conflicts: | 0 3 | 3 | 1428 | ; | | | | | | | | | ë | | | ; | 000000 | , ä | | is he | 8 ter | 12857 | | | | | 1 | | | | ): 253<br>0625 | udin | | | ;<br>; | | _; | communication co | ensemble full pushes: 0 ensemble full pushes with single root: | replacement per step stats<br>number: 140<br>min: 0 | max: 27<br>average: 3.2071428571428573<br>dev: 4 0470384737685983 | | 15 | - m vo n | | ble | | | | 6 - 0: 25<br>: 0.40625 | ze excluding 0: 34.66666666666666 | | | 40.000000000000000000000000000000000000 | 000000 | num: 13<br>full: 0<br>totally | | 33 | 140<br>140 | 3.3 | | | | | 140 | | • | | 1 46<br>1 2 2 3 | i i | | | Cluster | | alloc full | near communitar communitar communitar | 44 | replacement<br>number: 140<br>min: 0 | 7 8 4 | 5 # # | <u>, </u> | | äää | | 25° . 5° | 8 8 E | ## <b>#</b> | | 19 | | # # # # # # # # # # # # # # # # # # # | | 6011111 | 1100 | near<br>far<br>reloc | 1 2 2 | repla<br>numb | | | | | | global | in and in a | | pueh: | X Z | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _ | | | | | | ~ <del>!</del> | | | | | | | | ži<br>—— | | | | | SI - | | | | | | -: | | | | | | | | 14 15 | • • • • | | | | 14 15 | | | · · · · | • • • • | • • | | 14 15 | | | | | | 20<br>20<br>nd1 | 13 14 15 | | | | | 13 14 15 | | | | | • • | - | 13 14 15 | | ••• | | | | 9 (- 10<br>9 (- 20<br>9 (- and) | 12 13 14 15 | | | | | 12 13 14 15 | | | | | ••• | | 12 13 14 15 | | • • • | | | | 56 | 11 12 13 14 15 | | | | - · · · · · · · · · · · · · · · · · · · | 77 CI | | | | | | | 11 12 13 14 15 | | • • • | | | | 8 (- 9<br>(- 19<br>(- 29 | 10 11 12 13 14 15 | | | | | 77 CI | | | | | | | 10 11 12 13 14 15 | | | | | | 7 (- 8 (- 9 (- 19 (- 29 (- 29 | <b>ä</b> · · | | | | · · · · · · · · · · · · · · · · · · · | 77 CI | | | | | | | 9 10 11 12 13 14 15 | | | | | | (- 7 (- 8 (- 9<br>17 (- 18 (- 19<br>27 (- 28 (- 29 | 11 01 | | | | - · · · · · · · · · · · · · · · · · · · | 77 CI | | | | | | | 8 9 10 11 12 13 14 15 | | | | | | (- 7 (- 8 (- 9<br>17 (- 18 (- 19<br>27 (- 28 (- 29 | 11 01 | | | | - · · · · · · · · · · · · · · · · · · · | 77 CI | | | | | | | 7 6 9 10 11 12 13 14 15 | | | | | | - 5 ( - 6 ( - 7 ( - 8 ( - 9 ( - 9 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( )))))))))))))))))))))))))))))))))) | 6 7 8 9 10 11 | | | | | 77 CI | | | | | | | 6 7 8 9 10 11 12 13 14 15 | | | | | | - 5 ( - 6 ( - 7 ( - 8 ( - 9 ( - 9 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( )))))))))))))))))))))))))))))))))) | 6 7 8 9 10 11 | | | | | 77 CI | | | | | | | 5 6 7 8 9 10 11 12 13 14 15 | | | | S & S & S & S & S & S & S & S & S & S & | | - 5 ( - 6 ( - 7 ( - 8 ( - 9 ( - 9 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( - 20 ( )))))))))))))))))))))))))))))))))) | 6 7 8 9 10 11 | | | | | 77 CI | | | | | | | 4 5 6 7 8 9 10 11 12 13 14 15 | | | | 100<br>0 120<br>80<br>200<br>40<br>100 | 200<br>200<br>100<br>100<br>40<br>40<br>40<br>40 | (-3 (-4 (-5 (-6 (-7 (-8 (-9 -14 (-15 (-26 (-27 (-28 (-29 (-11) )))))))) | 6 7 8 9 10 11 | | | | | 77 CI | | | | | | | 3 4 5 6 7 8 9 10 11 12 13 14 15 | | | | | | (-3 (-4 (-5 (-6 (-7 (-8 (-9 -14 (-15 (-26 (-27 (-28 (-29 (-11) )))))))) | 6 7 8 9 10 11 | | | | | 77 CI | | | | | | | 3 4 5 6 7 6 | | | | | | (-3 (-4 (-5 (-6 (-7 (-8 (-9 -14 (-15 (-26 (-27 (-28 (-29 (-11) )))))))) | 6 7 8 9 10 11 | | | | | 3 4 5 6 7 8 9 10 11 12 13 14 | | | | | | | 3 4 5 6 7 6 | 9 | | | | | (-3 (-4 (-5 (-6 (-7 (-8 (-9 -14 (-15 (-26 (-27 (-28 (-29 (-11) )))))))) | 6 7 8 9 10 11 | | | | | 2 3 4 5 6 7 8 9 10 11 12 13 14 | 6 20 | | | | | | 1 2 3 4 5 6 7 8 | 30 | | | 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o: | (-3 (-4 (-5 (-6 (-7 (-8 (-9 -14 (-15 (-26 (-27 (-28 (-29 (-11) )))))))) | 6 7 8 9 10 11 | | | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 | | | | | | | 1 2 3 4 5 6 7 8 | | | ``` tstl.out.1 ``` ``` av used: 0.021428571428571429 av sv: 0.77142857142857145 av rel: 0.61428571428571432 av rel: 0.61428571428571432 av repl: 3.07428857142857 av repl: 3.07428857142857 av used: 8.1705357142857144857 av used: 8.1705357142857144857 av rel: 0.0021995535714285716 av rel: 0.0021995535714285716 av repl: 0.012527901785714286 2 # 13 77 7 ដ ; aversge: 0.02916666666666667 dav: 0.19144008346100236 0: 234 15: number of zero values: 234 global not zero number: 6 min: 0 max: 2 average: 1.1666666666666667 dev: 0.37267799624996456 0: 108 108 ..... pushes within cluster push cluster 0 number: 240 global : ä ö ``` global ensemble I/O state num: 420 gum: 13360 min: 0 Loading a2.0 Finished loading a2.0 108 ....... 804 number: 240 global global not zero number: 2 min: 0 max: 1 average: 1.0 dew: 0.0 0: 0 ``` 59.76 77.38 77.62 94.52 96.90 99.29 59.29 76.90 77.38 95.24 97.62 99.52 99.76 7.14 33.57 41.43 67.14 74.29 0.00 43.79 44.38 86.39 92.31 99.21 0.00 43.27 44.44 88.30 94.15 99.42 0.00 28.46 36.92 64.62 72.31 global per step I/O state 59.29 17.62 0.48 17.86 2.38 1.90 0.24 2.38 2.38 2.38 2.38 2.38 0.48 7.14 26.43 7.86 25.71 7.14 14.29 output num: 420 mum: 1360 mum: 1360 mum: 220 mum: 220 mum: 220 mum: 31.81 dew: 44.91 disribution: ax: 300 av: 94.71 dev: 57.55 non zero av: 102.00 disribution: av: 31.81 dav: 45.73 non zero av: 79.05 distibution: eum: 13260 min: 0 120: 120: 200: 200: 1200:1 1200:1 140:: ``` | 89.29<br>90.00 | 97.86 | 98.57 | 99.39 | 100.00 | | | | | | | | | | 34.29 | 41.43 | 47.14 | 52.14 | 53.57 | ~ | 84.29 | 86.43 | 92.86 | 97.14 | 98.57 | 99.39 | 100.00 | |----------------|-------|--------------|-------|--------|--------|----------|------------|--------|----------|----------|---------|-------------------|-------------|-------|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|--------| | 88.46 | 97.69 | • | 99.23 | 100.00 | | | | | | | | | | 0.00 | 10.87 | 19.57 | 27.17 | 29.35 | 73.91 | 76.09 | 79.35 | 89.13 | 95.65 | 97.83 | 16.86 | 100.00 | | 0.71 | 7.86 | 0.71 | 0.71 | 0.71 | | | | | | | | 145.22 | | 34.29 | 7.14 | 5.71 | 8.00 | 1.43 | 29.29 | 1.43 | 2.14 | 6.43 | 4.29 | 1.43 | 0.71 | 0.71 | | | 11 | - | - | - | | | 9 | | | <b>.</b> | .12 | .:<br><b>&gt;</b> | ton: | 8 | 70 | • | ^ | ~ | 7 | ~ | ~ | • | 9 | ~ | <b>-</b> | - | | 160: | 500: | <b>5</b> 50: | 240: | 300: | output | num: 140 | sum: 13360 | min: 0 | Max: 360 | av: 95. | dev: 87 | non zero | disribution | ö | ;<br>• | 90: | 100: | 120: | 140: | 180: | 300: | 220: | 240: | 280: | 340: | 360: | ) (setd all partitions ) > princ 'initial'; (terpri) (princ 'initial'; (terpri) (print\_term root) (terpri) >(- 6 (- 5 (- 4 (- 3 (- 2 (- 1 (- 0 (end1))))))) ")" , 1,11 (simul) 0 GC 1 GC 2 GC 3 GC 4 GC 5 GC 6 GC 7 ('s' bsol) < Loading a.18p Finished loading a.18p r(v) number of state: 1792 root: (- 0 (- 1 (- 2 (- 3 (- 4 (- 5 (- 6 (end1))))))) | • | 0 | - | ~ | ~ | • | S | 9 | 7 | 8 | • | 20 | 11 | 77 | 13 | 7. | 32 | _ | |-----------------------------------------|---|----|---|---|---|---|---|---|-----------|---|----|----|----|----|----|----|---| | ö | • | | | | | | | | | | | | | | | | | | ï | • | - | | | | | | | | | | • | | | • | | | | ë | ٠ | • | | | | | | | | | | | | | | | | | Ë | | | | • | | | | | | | | | | • | ٠ | | _ | | ÷ | ٠ | | ٠ | | | | | | | | | | | | | | | | ; | • | | | | | | | | | | | | | | | | | | ij | ٠ | • | • | | | | | | | | ٠ | | ٠ | | | | | | 7: | ٠ | • | | | | | | | | | | | | | ٠ | | | | ë | ٠ | • | | | | | | | - | | | | | ٠ | | | _ | | | • | | | | | | | | | | | | | | | | _ | | ë | | | • | | | | | | | | | | | | | | _ | | :11 | • | | • | | | | | | | | | | | | | | | | 12: | ٠ | • | • | | | | | | | | | | | | | ٠ | | | 13: | | | | | | | | | | | | | | | ٠ | | | | 14: | • | | | ٠ | | | | | | | | | | • | | | | | 15: | ٠ | • | • | | | | | | | | | | | • | ٠ | | | | 44.14 | : | : | | | | : | : | : | : | : | : | | : | | : | : | : | | | 0 | - | ~ | ~ | • | S | 9 | 7 | <b>80</b> | • | 9 | 11 | 77 | 7 | = | 22 | | | ö | ٠ | • | | | | | | | | | | | | | | ٠ | _ | | <u>-</u> | • | 12 | ٠ | | | | | | | | | | | • | • | | _ | | ä | | | | | | | | | | | | | | | | | _ | | ä | | | | | | | | | | | | | | | | • | | | ÷ | ٠ | | | | | | | | | | | | | | • | • | | | Š | • | | | • | • | | | | | | ٠ | | | | ٠ | | _ | | ÿ | • | | | | | | | | | | | | | ٠ | | | | | ~ | • | • | | • | | | | | | | | | | | - | | | | ä | ٠ | • | | | | | | | | | • | | | ٠ | | | | | ë | • | • | • | | | | | | - | | | | | | | | | | ä | • | • | | | | | | | | | | | - | | | | _ | | ======================================= | • | | ٠ | | • | | | | | | | ٠ | | | ٠ | • | _ | | 12 | • | | | | | | | | | | | | | | | - | _ | | = | • | • | ٠ | | | | | | | | | - | | | | | _ | | : 71 | ٠ | • | ٠ | | | | | | | | | | | | | | _ | | | • | - | • | - | • | ď | ¥ | • | Œ | 0 | 9 | - | 2 | : | 7 | 3.5 | |----------|---|----|---|---|---|---|---|---|---|---|---|---|---|---|---|-----| | | 0 | • | ٠ | , | , | • | , | | , | • | • | • | : | 7 | : | • | | _<br>:: | | | | | | | - | | | | | | ٠ | • | ٠ | ٠ | | _ | • | 15 | | | | | | | | | | | | ٠ | • | ٠ | | - | | | ٠ | | | | | | | | | • | - | | • | • | | <u>.</u> | | - | ٠ | | | | | | | | | ٠ | • | • | • | • | | | | | | | | | | | | | | ٠ | | • | • | • | | - | | | - | | | | | | | | | • | ٠ | | | • | | - 9 | | | • | | | | | | | | • | • | - | • | | ٠ | | <u>.</u> | | | | | | | | | | | | | | • | | ٠ | | <br> | | | | | | | | | • | | | • | • | • | | • | | <u>.</u> | | | | | | | | • | | | | ٠ | • | - | ٠ | • | | ë | | | | | | | | | | | • | - | | • | • | • | | = | | | - | | | | | | | | - | • | | | - | • | | <u>:</u> | | | | | | | | | | | | | | | • | • | | <u>:</u> | • | | | | | | | | | | | • | ٠ | | • | • | | | | | | | | | | | | | • | | | | • | • | | _ | | | | • | | | | | | | | - | | | | • | | - | 0 | ~ < | ~ < | <b>*</b> 0 | s c | • | ~ 6 | <b>c</b> | • | ខ្ល | = 9 | 3 9 | 2 3 | = 5 | 21 0 | _ | |---|---|-----|-----|------------|-----|---|-----|----------|---|-----|-----|-----|-----|-----|------|---| | ۰ | • | -: | 0 | 0 | • | • | • | • | • | • | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 | : | • | 0 | 0 | • | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 | • | : | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 | 0 | 0 | : | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | _ | | 0 | 0 | ٥ | 0 | 0 | : | 0 | 0 | 0 | ٥ | 0 | ٥ | ٥ | 0 | ٥ | ٥ | _ | tstl.out.pdl | 100.00 | 100.00 | 100.00 | 100.00 | |-----------------------------------------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------| | 0.00 | | o | <b>0</b> | | 100.00 | 100.00 | etep I/O state | 100.00 | | av: 0.00<br>dev: 0.00<br>disribution:<br>0: 8 | 📽 | global per step input num: 8 eum: 0 e | output num: 8 star: 0 star: 0 av: 0.00 dev: 0.00 disribution: 0: 8 | B Detailed Description of the New Ensemble Simulator ### The New RISC Ensemble Simulator #### Tim Winkler Abstract: The purpose of these notes is to document the new RISC ensemble simulator. #### 1 Introduction The outline of the rest of this report is this: - Overview of simulator. - Basic structures. - Basic simulation process: a simulation step. - Instruction set summary. - · Annotated code for operations. This discussion only includes the basic simulator. #### 2 Overview The ensemble simulator is written in C and as a whole consists of - risc.h which contains the basic parameters, macro definitions, type definitions, and global variable declarations. - risc.c This contains the main routine, simd\_execute called from the main routine (which executes the simd code) and is defined in the user provided SIMD code, and the basic SIMD execution routines (simd, simd1, ...). Supporting the SIMD execution are various routines in basic.c which are not discussed here. These are all part of the software of the ensemble and do not affect the design of the ensemble. Normally the user of the simulator will put data graph initialization and the routine simd\_execute in the file simd.c and link this with the rest of the simulator. - basic.h, basic.c less basic routines (in fact, these can be viewed as providing SIMD routines) that are not discussed here. #### 3 Basic Structures The basic structures used in the simulation are variables that represent registers and wires/buses. The value of a variable representing a bus indicates the currently asserted state of the bus. Normally the value asserted should not persist beyond one whole clock cycle, and should only change once during a clock phase (roughly speaking). Here are some basic constants. From risc.h: ``` /* ensemble configuration */ #define COLHUM 12 - Number of columns of tiles. #define ROWNUM 12 - Number of rows of tiles. /* Are row and col numbers powers of two corresponding to ADDRSHIFTs? */ #define POW2 0 - See comment above. #define TILENUM (COLNUM + ROWNUM) - Total number of tiles. *define CELLSPERTILE 4 - Number of cells per tile. #define CELLHUM (TILENUM * CELLSPERTILE) — Total number of cells. #define PORTNUM (2*TILENUM + COLNUM + ROWNUM) - Number of ports. #define PORTCHT 4 - Number of ports at cell. Here are the central type defintions. Note: the ids are not used in the simulation process and were included to make structures self-identifying. /* type definitions */ /* only used to get at id (really) */ typedef struct genericstruct { - Any structure below looks like this. int genid; int genval; /* only because is a common case */ } genericthing; typedef struct portstruct { - Represents a port bus. int portid; int portaddr; int portval; int portloc; } portbus; typedef struct colstruct { - Represents a column bus. int colid: int colval; int colowner; /* need not exist in actual hardware */ } colbus; #define FAILFLAG (1<<30) typedef struct rowstruct { - Represents a row bus. int rowid; int rowval; int rowowner; /* need not exist in actual hardware */ } rowbus; typedef struct cellstruct { - Represents a cell. int self; /* own address */ int reg[REGCNT]; - Includes ACC. int marks; portbus *ports[PORTCNT]; ``` ``` colbus *col; rowbus *row; int portno; - Value 0..7; unique for each cell on port bus. int refc; - Reference count. int aux; - Used in SELCELL MAKEREQ FETCHDATA, etc. int state; - The state flags (see below). } cellstate; /* some defines related to the above */ #define CELLTOK reg[TOK] #define CELLFLAGS reg[FLAGS] #define CELLLEFT reg[LEFT] #define CELLRIGHT reg[RIGHT] #define CELLNTOK reg[NTOK] #define CELLNFLAGS reg[NFLAGS] #define CELLWLEFT reg[WLEFT] #define CELLWRIGHT reg[WRIGHT] #define CELLACC reg[ACC] #define PORTE ports[EAST] #define PORTW ports[WORTH] *define PORTW ports[WEST] #define PORTS ports[SOUTH] #define STATE_VALID 1 /* VALID means allocated and allowed to become active */ #define STATE_ACTIVE (1<<1) #define STATE_TRYING (1<<2) #define STATE_REMOTE_LEFT (1<<5)</pre> #define STATE_REMOTE_RIGHT (1<<6)</pre> #define STATE_COMMIT (1<<7)</pre> typedef struct simdstruct { - SIMD broadcast bus. int id: int op; - Current operation/opcode. int arg1, arg2, arg3, arg4; - arguments to operation. int ph; - Clock phase, 1 or 2. } simdbus; There are 32 marks in a cell, referred to by index (0-31). There may be some additional state bits: WORKING, ZERO, MINUS, CARRY, OVERFLOW, and FAILURE. The aux register might be the ACC. Here are the actual declarations of the key shared variables. extern int clock; /* Note: not phase number */ extern int globalfeedback; extern simdbus simdinstr; extern portbus port[PORTHUM]; extern colbus col[COLNUM]; extern rowbus row[ROWNUM]; extern cellstate cell[CELLBUM]; ``` #### 4 The Simulation Process The user code calls the simd, simd1, ... routines, and also uses the resetglob and testglob routines; these routines give control the the ensemble. The user SIMD routines may also use whatever controller state is desired (e.g. flags). Execution of ``` simd2(TSTMARKPTR,1,LEFT); SHOW; ``` causes the opcode (TSTMARKPTR) and arguments (1,LEFT) to be put into the simdinstr structure that represents the SIMD control broadcast bus, and then the execution of simdstep. There are many extranenous details to simdstep (e.g., recording frequency of execution of instructions), but the key process is to - Increment the clock. - Reset port, row, and col buses (if needed). - Set the instruction phase to 1 (simdinstr.ph = 1). - Perform simdaction for every cell. - In the case of CELLARBPTR and ARBROW, perform some inter-clock phase actions that cannot easily be associated with cell actions. - Set the instruction phase to 1 (simdinstr.ph = 2). - Perform simdaction for every cell. The simulaction routine takes a pointer to cell and uses a large switch statement to execute the code associated with the SIMD op. #### 5 Basic definitions The following are very basic bit operations: ``` /* general defines */ — The are bit testing operations based on bit indices. #define TSTBITN(x,n) ((x)&(1<<(n))) #define ADDBITN(x,n) ((x)|(1<<(n))) #define REMBITN(x,n) ((x)& ^(1<<(n))) #define ASNBITN(x,n,y) ((y)?ADDBITN(x,n):REMBITN(x,n)) #define MSKBITN(n) (1<<(n)) #define SETBITN(x,n) ((x) |= (1<<(n))) #define CLRBITN(x,n) ((x) &= ^(1<<(n))) #define LOWERBITS(n) ((1<<(n))-1) ``` — The are bit testing operations based on bit masks. ``` #define TSTBIT(x,n) ((x)&(n)) #define ADDBIT(x,n) ((x)|(n)) #define REMBIT(x,n) ((x)&(~(n))) #define ASMBIT(x,n,y) ((y)?ADDBIT(x,n):REMBIT(x,n)) #define SETBIT(x,n) ((x) |= (n)) #define CLRBIT(x,n) ((x) k= (n)) /* Note: duplication of n in following: */ #define TSTALLBIT(x,n) ((n) == ((x)\&(n))) #define LOWESTBIT(n) ((n)&-(n)) #define ATMOSTONEBIT(n) ((n) & -(n)) #define UNDEF (Oxbadbadee) The following are the basic definitions related to addresses, address mappings, port selection, and PORTNO calculations. /* basic defines */ /* Addresses: top left is 0,0 (is X,Y); X is row and Y is col */ Y=0 1 2 3 4 5 6 7 /* X=0 0 1 2 3 4 5 6 7 1 8 9 10 11 12 13 14 15 2 16 17 18 .... #define ADDRSHIFT 4 - Bits to represent column or row. #define ADDRTOPSHIFT 8 /* should be 2*ADDRSHIFT */ /* Changed cell nums by +1 in addrs */ - Compute address from row, col, cell num. \#define ADDR(x,y,n) ((((n)+1) << ADDRTOPSHIFT) + ((x)<<ADDRSHIFT)+(y)) — Tile is just row, col, no cell number. #define TILE(x) ((x) & ((1<<ADDRTOPSHIFT)-1)) Decode addresses. #define ROW(x) (((x)>>ADDRSHIFT) & ((1<<ADDRSHIFT)-1)) #define COL(x) ((x) & ((1<<ADDRSHIFT)-1)) #define NUM(x) (((x) >> ADDRTOPSHIFT)-1) #define CELLNUMINCR (1<<ADDRTOPSHIFT) #define TILEADDR(x,y) (((x)<<ADDRSHIFT)+(y))</pre> Translating addresses to indices. #if POW2 #define CELLIND(x,y,n) (((n) << ADDRTOPSHIFT) + ((x)<<ADDRSHIFT)+(y)) /* convert address to cell[] index */ #define ADDRTOIND(x) ((x)-CELLNUMINCR) #define TILEIND(x,y) (((x)<<ADDRSHIFT)+(y))</pre> #else #define CELLIED(x,y,n) (((n)+ROWBUM + (x))+COLBUM + (y)) /* convert address to cell index */ #define ADDRTOIND(x) ((NUM(x)*ROWNUM + ROW(x))*COLNUM + COL(x)) ``` ``` #define TILEIND(x,y) (((x)+COLNUM)+(y)) #endif #define ROWINCR (1<<ADDRSHIFT) #define COLINCR 1 #define ADDRUP(x) ((x)-ROWINCR) — Adjacent tiles. #define ADDRDWW(x) ((x)+ROWINCR) #define ADDRLFT(x) ((x)-COLINCR) #define ADDRRGT(x) ((x)+COLINCR) #define CELLAT(a) cell[ADDRTOIND(a)] - Address to cell. #define CELLREF(x,y,n) cell[CELLIND(x,y,n)] -r,c,n to cell. — Indices for ports. #define PORTWIND(x,y) (2*TILEIND(x,y)) #define PORTWIND(x,y) (2*TILEIND(x,y)+1) #define ISPORTW(x) ((x)&1) only when not on right or bottom edge */ #define PORTEIND(x) (2*TILENUM+(x)) /* along right side */ #define PORTSIND(y) (2*TILENUM+ROWNUM+(y)) /* along bottom */ /* access to all ports by allowing "the next index over" */ #define PORTNINDX(x,y) ((x)==ROWNUM ? PORTSIND(y) : PORTNIND(x,y)) #define PORTWINDX(x,y) ((y)==COLNUM ? PORTEIND(x) : PORTWIND(x,y)) /* not really correct: */ #if POW2 #define VALIDADDR(x) (CELLNUMINCR<=(x) && \ (x)<(CELLNUM+CELLNUMINCR)) #else #define VALIDADDR(x) (ADDR(0,0,0)<=(x) 22 \ (x)<ADDR(ROWNUM-1,COLNUM-1,CELLSPERTILE-1) && \ (COL(x)<COLNUM) && (ROW(x)<ROWNUM)) #endif #define VALIDDIR(x) (0<=(x) && (x)<= 3) /* a pair of tileaddrs */ /* Try to treat addr as local when same tile? */ #define ISLOCAL(x,y) (((x)<=(y))? ((COLINCR==((y)-(x)) & COL(y)!=0) \ || ROWINCR==((y)-(x))) : \ ((COLINCR==((x)-(y)) \text{ & } COL(x)!=0) \mid \mid ROWINCR==((x)-(y)))) - Compute direction of y from x (or undef). #define WHICHDIR(x,y) \ ((COLINCR==((y)-(x)))?((COL(y)==0)?UNDEF : EAST): \ (ROWINCR==((y)-(x)))?SOUTH: \ (-COLINCR==((y)-(x)))?((COL(x)==0)? UNDEF: WEST): \ (-ROWINCR==((y)-(x)))?MORTH: UNDEF) ``` ``` /* addr -> portno */ #define PORTHO(a) ((1&((a)+((a)>>ADDRSHIFT)))?HUM(a)+4:HUM(a)) #define PORTMOSHIFT 8 #define PORTHOCHT 8 #define PORTWOCASE(a) ((a) < CELLSPERTILE)</pre> #define PORTWOSELBIG(a,b) (((a)<CELLSPERTILE) ? ((b)>>PORTWOSHIFT) : \ ((b)&LOWERBITS(PORTWOSHIFT))) #define PORTWOSEL(a,b) ((a) ? ((b)>>CELLSPERTILE) : \ ((b)&LOWERBITS(CELLSPERTILE))) #define PORTMOREPLY(a,b) (((a)<CELLSPERTILE) ? ((b)<<CELLSPERTILE) : (b))</pre> /* ROW and COL work for PORTADDRs too */ #define PORTADDR(x,y,d) (((d) << ADDRTOPSHIFT) + ((x)<<ADDRSHIFT)+(y))</pre> #define PORTDIR(x) ((x) >> ADDRTOPSHIFT) /* number of registers in one set */ #define REGNUM 4 These are some simple defines used to make the SIMD code a bit more readable. /* registers */ #define TOK 0 #define FLAGS 1 #define LEFT 2 #define RIGHT 3 #define NTOK 4 #define NFLAGS 5 #define MLEFT 6 #define WRIGHT 7 #define ACC 8 #define REGCNT 9 /* directions */ #define EAST 0 #define WORTH 1 #define WEST 2 #define SOUTH 3 ``` In fact, it is possible to use 16 registers by changing REGCHT to 16. One may also want to adjust REGHUM. ## 6 Instruction Set Summary Specifiers for arguments to SIMD instructions: - val: 16 (actually 32 bit) value - mark: 0-31 bit position - reg, ptr, source-ptr, target-ptr, sreg, treg, source, target: 0-8 register selector May be 0-15 instead. - dir: 0=EAST, 1=NORTH, 2=WEST, 3=SOUTH - portno: 0-7 select active element on port ## Basic instructions ``` MOP - Do nothing. INIT - Activate all allocated. ERASE - Clear marks. CONST val - val to ACC. CLEAR reg -\theta to reg. EQ reg reg - Various tests. MEQ reg reg GT reg reg LE reg reg LT reg reg GE reg reg TSTZERO reg TSTWZERO reg MOVE treg sreg - From source to target. ADD treg sreg SUB treg sreg LOGNOT reg LOGAND treg sreg LOGIOR treg sreg LOGXOR treg sreg (For these next, probably want to use a state bit for bits shifted out) ROTATEL treg sreg - 16-bit rotate left. ROTATER treg sreg — 16-bit rotate right SHIFTL treg sreg - Shift left ... SHIFTR treg sreg - Shift right. SHIFTRA treg sreg - Shift right arithmetic (preserve sign). SETRANDOM treg — Put in 16 (32) bit psuedo-random value. TSTMARK mark — Local mark operations. TSTNOTMARK mark SETMARK mark CLRMARK mark TSTMARKPTR mark source-ptr — Adjacent tile mark operations. TSTWOTMARKPTR mark source-ptr SETMARKPTR mark target-ptr CLRMARKPTR mark target-ptr INCR val - Increment reference count of active cell. val can be +1,-1 INCRPTR1 val target-ptr - RISC incrptr part 1 (cells 0,1). INCRPTR2 val target-ptr - RISC incrptr part 2 (cells 2,3). SETTRYING — Set state trying flag. Performs global feedback on any active. INITTRYING — Equiv. to: INIT, test TRYING flag. ``` Performs global feedback on any active. CLRTRYING - Clear state trying flag. ARBPORTPTR reg - Arbitrate for port based on pointer, clear TRYING if succeed (deactivate if non-local). ARBPORT dir - Arbitrated on given port clear TRYING if succeed. MAKEREQ ptr - RISC fetch part 1; uses aux. May want a version, MAKEREQDIR dir, that goes in a certain direction (not defined yet). FETCHDATA treg source-ptr sreg - RISC fetch part 2. STORE target-ptr treg sreg - RISC store (assumes arbitration). [SHOULD BE SPLIT INTO MAKEREQ and STOREDATA.] STOREPH portno target-ptr treg sreg - Portno activation, no arb. needed. FETCHPN portno treg source-ptr sreg - Portno activation, no arb. needed. CELLARBPTR target-ptr — After ARBPORTPTR, do cell arbitration (not used). ALLOCREQ - RISC alloc part 1; uses aux. AVAIL1 reg - RISC alloc part 2 (cells 0,1). AVAIL2 reg — RISC alloc part 3 (cells 2,3). ALLOCPTR target-ptr - RISC alloc part 4; allocate pointed-at cell. SENDGLOBAL — Initiate global feedback. resetglob() - CONTROLLER CODE: reset global feedback indicator. testgloba() - CONTROLLER CODE: test global feedback indicator. ROW/COL operations ARBTILE - ROW/COL part 1; equiv. to ARBPORT MORTH, but doesn't affect TRYING. ARBCOL - ROW/COL part 2. SELROW target - ROW/COL part 3. ARBROW target - ROW/COL part 4. SELCELL ptr - ROW/COL part 5. FETCHGLBL target source-ptr reg - Global actions. STOREGLBL target-ptr reg source TSTMARKGLBL source-ptr mark TSTNOTMARKGLBL mark source-ptr SETMARKGLBL mark target-ptr CLRMARKGLBL mark target-ptr INCRGLBL val target-ptr Other instructions SWAP — Switch TOK, NTOK, etc. COMMIT — SWAP; set COMMIT flag. COMMITMARK mark — If mark, then SWAP; always set COMMIT flag. SETCOMMIT — Set COMMIT flag. TSTSTATE flags — Operations on state flags (use masks not bit positions). TSTNOTSTATE flags ``` SETSTATE flags CLRSTATE flags TSTLOCAL ptr — Test if local pointer. TSTREMOTE ptr — Test if remote pointer. TSTADDR ptr - Test if appears to be an address. TSTNOTADDR ptr - Test if appears not to be an address. TSTIMUSE — Test is reference count is positive (i=1). TSTUBUSED — Test if reference count is not positive (i=0). TSTUESHARED — Test if reference count is 1. RELOCREQ ptr — If local, succeed, otherwise start relocation. (Set REMOTE_ flags.) DELETE — Clear registers, marks, and state. INITALL - Init all including unallocated. TSTBLACK — Checkerboard activation. TSTRED - Checkerboard activation. TSTCLASS class -- One of 5 cases activation, class: 0-4. TSTCELLHUM cellnum — activate based on cell number: 0-3. ``` ## 7 Annotated Code The following is the key simulator code with some annotations. This is the initialization routine. The ids of objects are filled in (although these are not currently used). The central function is to link up the cells with the port buses. ``` ensemble_init() register int i,j,n,a; cellstate *cp; simdinstr.id = 0; simdinstr.op = NOP; simdinstr.arg1 = UNDEF; simdinstr.arg2 = UNDEF; simdinstr.arg3 = UNDEF; simdinstr.arg4 = UNDEF; for (i=0; i<PORTNUM; i++) {</pre> port[i].portid = MKID(PORT,i); port[i].portloc = 0; for (i=0; i<ROWNUM; i++) row[i].rowid = MKID(ROWS,i); for (i=0; i<COLNUM; i++) col[i].colid = MKID(COLS,i);</pre> for (i=0; i<ROWNUM; i++)</pre> for (j=0; j<COLNUM; j++) {</pre> for (n=0; n<CELLSPERTILE; n++) {</pre> ``` ``` a = ADDR(i,j,n); cp = &CELLAT(a); cp->self = MKID(CELL,a); /* used when all cells on a port need to use a bit, etc. */ cp->portno = n + ((1&(i+j)) ? CELLSPERTILE : 0); cp->row = trow[i]; - Row and col are easy. cp->col = &col[j]; — Ports are more complex. cp->PORTW = *port[PORTWINDX(i,j)]; cp->PORTH->portloc = PORTADDR(i,j,MORTH); cp->PORTW = &port[PORTWINDX(i,j)]; cp->PORTW->portloc = PORTADDR(i,j,WEST); cp->PORTE = &port[PORTWINDX(i,j+1)]; if (j+1 == COLMUM) cp->PORTE->portloc = PORTADDR(i,j,EAST); cp->PORTS = &port[PORTNINDX(i+1,j)]; if (i+1 == ROWNUM) cp->PORTS->portloc = PORTADDR(i,j,SOUTH); cp->marks = 0; cp->refc = 0; cp->state = 0; /* { int m; for (m=0; m<REGCET; m++) cp->reg[m] = 0; } */ } } } ``` Here are the basic SIMD broadcast routines. Simply copy arguments to simdinstr structure. (Would have liked a somewhat different approach to variable numbers of arguments than provided in C.) ``` simd(op) int op; simdinstr.op = op; simdinstr.arg1 = UNDEF; simdinstr.arg2 = UNDEF; simdinstr.arg3 = UNDEF; simdinstr.arg4 = UNDEF; simdstep(); simd1(op,arg) int op, arg; simdinstr.op = op; simdinstr.arg1 = arg; simdinstr.arg2 = UNDEF; simdinstr.arg3 = UNDEF; simdinstr.arg4 = UNDEF; simdstep(); simd2(op,arg1,arg2) int op,arg1,arg2; simdinstr.op = op; simdinstr.arg1 = arg1; simdinstr.arg2 = arg2; simdinstr.arg3 = UNDEF; ``` ``` simdinstr.arg4 = UNDEF; simdstep(); } simd3(op,arg1,arg2,arg3) int op,arg1,arg2,arg3; simdinstr.op = op; simdinstr.arg1 = arg1; simdinstr.arg2 = arg2; simdinstr.arg3 = arg3; simdinstr.arg4 = UNDEF; simdstep(); } simd4(op,arg1,arg2,arg3,arg4) int op,arg1,arg2,arg3,arg4; simdinstr.op = op; simdinstr.arg1 = arg1; simdinstr.arg2 = arg2; simdinstr.arg3 = arg3; simdinstr.arg4 = arg4; simdstep(); This is the core SIMD execution routine (the process of execution has already been briefly discussed). The important state changes involve clock, simdinst.ph, and the commnication buses. simdstep() register int i,op,p,flg,val; clock++; - Update clock. op = simdinstr.op; — Get op. if (recordinstrfreq) { if (op != IMIT) instrfreq[op]++; if (showinstrs) { printf("%d: ",clock); printop(); printf("\n"); fflush(stdout); } if (op != NOP) { /* Actually there seem to be cases here */ /* arbcol, arbrow are actions over different entities */ /* Could improve on this */ if (op==ARBCOL) - Conditionally reset row/col buses. for (i=0; i<COLNUM; i++) { col[i].colval = 0; col[i].colowner = 0; }</pre> else if (op==SELROW) for (i=0; i<COLNUM; i++) col[i].colval = UNDEF;</pre> else if (op==SELCELL) for (i=0; i<CELLNUM; i++) cell[i].aux = 0;</pre> ``` ``` - Always reset port buses. for (i=0; i<PORTHUM; i++) port[i].portval = 0;</pre> for (i=0; i<PORTNUM; i++) port[i].portaddr = UNDEF;</pre> /* if (verbose) printf("phase1\n"); */ simdinstr.ph = 1; - Perform phase 1, all cells. for (i=0; i<CELLNUM; i++)</pre> simdaction(&cell[i]); /* if (verbose) printf("port\n"); */ /* In the following could put checks to see if bus already non-UNDEF which would indicate an error in use; this also ignores issues of cell arb */ if (op == FETCH) { - Not currently used. for (i=0; i<PORTWUM; i++) if (port[i].portaddr != UNDEF) port[i].portval = CELLAT(port[i].portaddr).reg[simdinstr.arg3]; } else if (op == CELLARBPTR) { for (i=0; i<CELLNUM; i++) { flg = 0; val = MSKBITM(cell[i].portno); for (p=0; p<PORTCMT; p++) {</pre> if (TSTBIT(cell[i].ports[p]->portval,val)) { if (flg) CLRBIT(cell[i].ports[p]->portval,val); flg = 1; } } } } else if (op == ARBROW) { for (i=0; i<ROWBUM; i++) { row[i].rowval = UMDEF; row[i].rowowner = 0; } for (p=COLNUM-1; 0<=p; p--) { if (0 != col[p].colowner) { val = col[p].colval; if (val != UNDEF) { i = row[val].rowowner; if (i != 0) SETBIT(col[COL(i)].colval,FAILFLAG); row[val].rowowner = col[p].colowner; } } } } /* if (verbose) printf("phase2\n"); */ simdinstr.ph = 2; - Perform phase 2, all cells. for (i=0; i<CELLWUM; i++)</pre> simdaction(&cell[i]); ``` } This routine, simulaction, is the heart of it all. It gets a pointer to a cell and uses the clock phase and the SIMD broadcast to perform the appropriate action on the cell and busses. Here is the prolog. The large switch statement is discussed below. Note the "local" macros that are introduced. ``` simdaction(cp) register cellstate *cp; { register int op,ph,st,r,val; int dir,tile,mask,a,flg; /* Do some top-level case analysis depending on whether active, etc. */ st = cp->state; #define DEACTIVATE CLRBIT(cp->state,STATE_ACTIVE) #define ACTIVATE SETBIT(cp->state,STATE_ACTIVE) op = simdinstr.op; ph = simdinstr.ph; #define IS_ACTIVE TSTBIT(st,STATE_ACTIVE) #define IS_VALID TSTBIT(st,STATE_VALID) switch (op) { -- All the case are discussed below. ``` Some cases in the switch are omitted. The various operations are discussed below. Here is a large group of relatively trivial operations that are not discussed in detail. These instructions consist of simple local operations and tests. ``` case NOP: break; case INIT: if (ph==2 && TSTBIT(st,STATE_VALID)) ACTIVATE; break; case ERASE: if (ph==2 && IS_ACTIVE) cp->marks = 0; break: case TSTMARK: /* mark */ if (ph==2 && IS_ACTIVE && !TSTBITM(cp->marks, simdinstr.arg1)) DEACTIVATE: break: case TSTNOTMARK: /* mark */ if (ph==2 && IS_ACTIVE && TSTBITM(cp->marks, simdinstr.arg1)) DEACTIVATE: break: case SETMARK: /* mark */ if (ph==2 && IS_ACTIVE) SETBITW(cp->marks,simdinstr.arg1); case CLRMARK: /* mark */ if (ph==2 && IS_ACTIVE) CLRBITM(cp->marks,simdinstr.arg1); break: case CONST: /* val */ if (ph==2 && IS_ACTIVE) cp->CELLACC = simdinstr.arg1; ``` ``` break; case CLEAR: /* reg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] = 0; break; case EQ: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])==(cp->reg[simdinstr.arg2]))) DEACTIVATE: break; case MEQ: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])!=(cp->reg[simdinstr.arg2]))) DEACTIVATE: break; case GT: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])>(cp->reg[simdinstr.arg2]))) DEACTIVATE; break: case LE: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])<=(cp->reg[simdinstr.arg2]))) DEACTIVATE; break: case LT: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])<(cp->reg[simdinstr.arg2]))) DEACTIVATE; break: case GE: /* reg reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])>=(cp->reg[simdinstr.arg2]))) DEACTIVATE; break; case TSTZERO: /* reg */ if (ph==2 && IS_ACTIVE && !((cp->reg[simdinstr.arg1])==0)) DEACTIVATE: break; case TSTWZERO: /* reg */ if (ph==2 && IS_ACTIVE && ((cp->reg[simdinstr.arg1])==0)) DEACTIVATE: break; case MOVE: if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] = cp->reg[simdinstr.arg2]; break: case ADD: /* treg, sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] += cp->reg[simdinstr.arg2]; break; case SUB: /* treg, sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] -= cp->reg[simdinstr.arg2]; ``` ``` break; case LOGNOT: /* reg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] = "cp->reg[simdinstr.arg1]; case LOGAND: /* treg, sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] &= cp->reg[simdinstr.arg2]; break; case LOGIOR: /* treg, sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] |= cp->reg[simdinstr.arg2]; break; case LOGXOR: /* treg sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] ^= cp->reg[simdinstr.arg2]; case ROTATEL: /* treg sreg */ if (ph==2 && IS_ACTIVE) { /* This is specifically a 16 bit operation and doesn't use a condition flag */ val = cp->reg[simdinstr.arg2]; val = ((val << 1) & Oxffff) + (val & Ox8000 ? 1 : 0);</pre> cp->reg[simdinstr.arg1] = val; } break; case ROTATER: /* treg sreg */ if (ph==2 && IS_ACTIVE) { /* This is specifically a 16 bit operation and doesn't use a condition flag */ val = cp->reg[simdinstr.arg2]; val = ((val >> 1) & 0x7fff) + (val & 1 ? 0x8000 : 0); cp->reg[simdinstr.arg1] = val; } break; case SHIFTL: /* treg sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] = (cp->reg[simdinstr.arg2]) << 1;</pre> break; case SHIFTR: /* treg sreg */ if (ph==2 && IS_ACTIVE) cp->reg[simdinstr.arg1] = ((unsigned int)(cp->reg[simdinstr.arg2])) >> 1; case SHIFTRA: /* treg sreg */ if (ph==2 && IS_ACTIVE) /* This is probably not very portable, but produces the desired effect on a SPARC +/ cp->reg[simdinstr.arg1] = (cp->reg[simdinstr.arg2]) >> 1; break: ``` The following operations make key use of the PORTNO assignment for cells which allocates a unique wire to each cell on a port bus. <sup>-</sup> If mark is set on pointed at cell, stay active. ``` case TSTMARKPTR: /* mark, source-ptr */ /* Note: the following is done by all cells */ if (ph==1) { if (IS_VALID && TSTBITW(cp->marks, simdinstr.arg1)) { val = MSKBITW(cp->portno); for (r=0; r<PORTCWT; r++) SETBIT(cp->ports[r]->portval,val); } ' } else { /* ph==2 */ if (IS_ACTIVE) { if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) { tile = TILE(r); dir = WHICHDIR(TILE(cp->self), tile); if (dir == UNDEF) { if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; } else { val = PORTWO(r); if (!TSTBITW(cp->ports[dir]->portval,val)) DEACTIVATE; } else DEACTIVATE; } } break; - If mark is clear on pointed at cell, stay active. case TSTMOTMARKPTR: /* mark, source-ptr */ /* Almost identical to the above */ /* Note: the following is done by all cells */ if (ph==1) { if (IS_VALID && TSTBITW(cp->marks, simdinstr.arg1)) { val = MSKBITW(cp->portno); for (r=0; r<PORTCMT; r++) SETBIT(cp->ports[r]->portval,val); } else { /* ph==2 */ if (IS_ACTIVE) { if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) { tile = TILE(r); dir = WHICHDIR(TILE(cp->self),tile); if (dir == UNDEF) { if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; } else { val = PORTNO(r); if (TSTBITM(cp->ports[dir]->portval, val)) - No!. DEACTIVATE; } else DEACTIVATE; } } break; ``` ``` - Set mark on pointed at cell. case SETMARKPTR: /* mark, target-ptr */ if (ph==1) { if (IS_ACTIVE) { if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); if (UNDEF == dir) { if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; } else { SETBIT(cp->ports[dir]->portval, MSKBITW(PORTWO(r))); } else DEACTIVATE; } } else { /* ph==2 */ if (IS_VALID) { val = MSKBITW(simdinstr.arg1); if (!(TSTBIT(cp->marks,val))) { mask = val; val = MSKBITM(cp->portno); for (r=0; r<PORTCMT; r++)</pre> if (TSTBIT(cp->ports[r]->portval,val)) { SETBIT(cp->marks,mask); break; } } } } break; - Clear mark on pointed at cell. case CLRMARKPTR: /* mark, target-ptr */ /* Almost identical to the above */ if (ph==1) { if (IS_ACTIVE) { if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); if (UNDEF == dir) { if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE: } else { SETBIT(cp->ports[dir]->portval, MSKBITW(PORTWO(r))); } else DEACTIVATE; } else { /* ph==2 */ if (IS_VALID) { val = MSKBITN(simdinstr.arg1); if (TSTBIT(cp->marks, val)) { - No!. mask = val; val = MSKBITM(cp->portno); for (r=0; r<PORTCWT; r++)</pre> if (TSTBIT(cp->ports[r]->portval,val)) { ``` ``` CLRBIT(cp->marks, mask); - Versus SETBIT. break; } } } } break; The following group consists of port arbitration instructions, and fetch/store variants. - Arbitrate on specified port. case ARBPORT: /* dir */ if (IS_ACTIVE) { dir = simdinstr.arg1; if (ph==1) { if (!VALIDDIR(dir)) { CLRBIT(cp->state,STATE_TRYING); /*newer*/ DEACTIVATE: } else /* This requires that the ports be cleared to start with */ SETBITM(cp->ports[dir]->portval,cp->portno); } else { /* ph ==2 */ /* lower numbered bits have higher priority */ if ((cp->ports[dir]->portval) & LOWERBITS(cp->portno)) DEACTIVATE; CLRBIT(cp->state,STATE_TRYING); /*newer*/ } } break; Arbitrate on port given by pointer. case ARBPORTPTR: /* reg */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { tile = TILE(r): dir = WHICHDIR(TILE(cp->self),tile); if (ph==1) { if (dir == UNDEF) { if (simdinstr.arg1==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg1==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); CLRBIT(cp->state,STATE_TRYING); /*newer*/ DEACTIVATE: } else SETBITN(cp->ports[dir]->portval,cp->portno); } else { /* ph ==2 */ if ((cp->ports[dir]->portval) & LOWERBITS(cp->portno)) DEACTIVATE; else CLRBIT(cp->state,STATE_TRYING); /*newer*/ } } break: - Assumes arbitration. -This should be split into two instructions. case STORE: /* target-ptr, treg, sreg */ /* originally written using action between ph==1 and ph==2 */ if (ph==1) { ``` ``` if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { tile = TILE(r); dir = WHICHDIR(TILE(cp->self),tile); if (dir == UNDEF) DEACTIVATE; cp->ports[dir]->portaddr = r; cp->ports[dir]->portval = cp->reg[simdinstr.arg3]; } } else { /* ph==2 */ /* Note: the following is done by all VALID cells */ if (IS_VALID) { val = cp->self; /* Check only one? */ for (r=0; r<PORTCNT; r++) if (cp->ports[r]->portaddr == val) cp->reg[simdinstr.arg2] = cp->ports[r]->portval; } } break: - Assumes arbitration. - MAKEREQ FETCHDATA performs a fetch. case MAKEREQ: /* ptr */ if (ph==1) { cp->aux = 0; /* Note: done by all */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { tile = TILE(r); dir = WHICHDIR(TILE(cp->self), tile); if (dir == UNDEF) { if (simdinstr.arg1==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg1==RIGHT) SETBIT(cp->state,STATE_REMUTE_RIGHT); DEACTIVATE: } else cp->ports[dir]->portaddr = r; /* Could xor */ } else { /* ph ==2 */ /* Note: the following is done by all cells */ val = cp->self; /* Check only one? */ for (r=0; r<PORTCWT; r++)</pre> if (cp->ports[r]->portaddr == val) SETBITM(cp->aux,r); break; case FETCHDATA: /* treg, source-ptr, sreg */ if (ph==1) { /* Note: the following is done by all cells */ if (IS_VALID) { val = cp->aux; if (val != 0) for (r=0; r<PORTCMT; r++) if (TSTBITM(val,r)) { cp->ports[r]->portaddr = cp->self; cp->ports[r]->portval = cp->reg[simdinstr.arg3]; ``` ``` } } else { /* ph ==2 */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2], VALIDADDR(r))) { tile = TILE(r); dir = WHICHDIR(TILE(cp->self), tile); if (dir == UNDEF) DEACTIVATE; else if (cp->ports[dir]->portaddr == r) cp->reg[simdinstr.arg1] = cp->ports[dir]->portval; else { printid(cp); printf(" fetchdata: bad data portaddr = "); printaddr(cp->ports[dir]->portaddr); printf(" portval = %d\n",cp->ports[dir]->portval); } break; - No arbitration needed, activate by PORTNO. case STOREPH: /* portno, target-ptr, treg, sreg */ /* originally written using action between ph==1 and ph==2 */ if (ph==1) { if (IS_ACTIVE && (simdinstr.arg1 == cp->portno) 22 (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { tile = TILE(r): dir = WHICHDIR(TILE(cp->self), tile); if (dir == UNDEF) { if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state_STATE_REMOTE_RIGHT); DEACTIVATE; } else { cp->ports[dir]->portaddr = r; cp->ports[dir]->portval = cp->reg[simdinstr.arg4]; } } else { /* ph==2 */ /* Note: the following is done by all cells */ if (IS_VALID) { val = cp->self; /* Check only one? */ for (r=0; r<PORTCMT; r++)</pre> if (cp->ports[r]->portaddr == val) cp->reg[simdinstr.arg3] = cp->ports[r]->portval; } } break; - No arbitration needed, activate by PORTNO. case FETCHPW: /* portno, treg, source-ptr, sreg */ if (ph==1) { if (IS_VALID) { if (simdinstr.arg1 == cp->portno) { val = cp->reg[simdinstr.arg4]; ``` ``` for (r=0; r<PORTCNT; r++) { cp->ports[r]->portaddr = cp->self; cp->ports[r]->portval = val; } } } else { /* ph==2 */ if (IS_ACTIVE && (val = cp->reg[simdinstr.arg3],VALIDADDR(val))) { for (r=0; r<PORTCNT; r++)</pre> if (cp->ports[r]->portaddr == val) { cp->reg[simdinstr.arg2] = cp->ports[r]->portval; break; } } break; This initiates global feedback. If anything is active, then global feedback bit should eventually be set. case SENDGLOBAL: if (IS_ACTIVE & ph==2) { globalfeedback++; break; These are the row/col instructions with associated global actions. The normal sequence is ARBTILE ARBCOL SELROW ARBROW SELCELL, and then perform global operations. case ARBTILE: if (IS_ACTIVE) { if (ph==1) { SETBITW(cp->ports[WORTH]->portval,cp->portno); } else { /* ph ==2 */ /* lower numbered bits have higher priority */ if ((cp->ports[WORTH]->portval) & LOWERBITS(cp->portno)) DEACTIVATE; } break; case ARBCOL: if (IS_ACTIVE) { if (ph==1) { SETBITW(cp->col->colval,ROW(cp->self)); } else { /* ph==2 */ if ((cp->col->colval) & LOWERBITS(ROW(cp->self))) DEACTIVATE; else cp->col->colowner = cp->self; } } break: case SELROW: /* target */ if (ph==2 && IS_ACTIVE) { if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { if (cp->col->colowner == cp->self) cp->col->colval = ROW(r); else { ``` ``` printid(cp); printf(" selrow bad active cell\n"); DEACTIVATE; } else DEACTIVATE; } break; case ARBROW: /* target */ if (ph==2 && IS_ACTIVE) { if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { if (cp->col->colowner == cp->self) { if (TSTBIT(cp->col->colval,FAILFLAG)) DEACTIVATE; } else { printid(cp); printf(" arbrow bad active cell\n"); DEACTIVATE; } else DEACTIVATE; break; case SELCELL: /* ptr */ if (IS_ACTIVE && ph==2) { if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self) { printid(cp): printf(" selcell: not owner\n"); printcell(cp); printaddr(cp->col->colowner); printf("\n"); printaddr(row[ROW(r)].rowowner); printf("\n"); printcelladdr(row[ROW(r)].rowowner); } else { CELLAT(r).aux = cp->self; } else DEACTIVATE; } break; case FETCHGLBL: /* target source-ptr reg */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" fetchglbl: not owner\n"); } } else { if (ph==2) { /* no cell arbitration is necessary */ /* only really use the number of the cell from the pointer */ cp->reg[simdinstr.arg1] = CELLAT(r).reg[simdinstr.arg3]; } } } ``` ``` break; case STOREGLBL: /* target-ptr reg source */ /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" storeglbl: not owner\n"); } } else { if (ph==2) { /* no cell arbitration is necessary */ /* if (verbose) { printf("storeglbl: "); printaddr(r); printf(" %d ",simdinstr.arg2); printf(" <- %d\n",cp->reg[simdinstr.arg3]); } */ CELLAT(r).reg[simdinstr.arg2] = cp->reg[simdinstr.arg3]; } } case TSTMARKGLBL: /* source-ptr mark (changed order) */ /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" tstmarkglbl: not owner\n"); } } else { if (ph==2) { /* no cell arbitration is necessary */ if (!TSTBITM(CELLAT(r).marks, simdinstr.arg2)) DEACTIVATE: } } } break; case TSTWOTMARKGLBL: /* source-ptr mark */ /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" tstnotmarkglbl: not owner\n"); } } else { if (ph==2) { /* no cell arbitration is necessary */ if (TSTBITH(CELLAT(r).marks, simdinstr.arg2)) ``` ``` DEACTIVATE; } } } break; case SETMARKGLBL: /* target-ptr mark */ /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" setmarkglbl: not owner "); } } else { if (ph==2) { /* no cell arbitration is necessary */ SETBITM(CELLAT(r).marks, simdinstr.arg2); } } } break; case CLRMARKGLBL: /* target-ptr mark */ /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1], VALIDADDR(r))) { if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self || CELLAT(r).aux != cp->self) { if (ph==1) { printid(cp); printf(" clrmarkglbl: not owner "); } } else { if (ph==2) { /* no cell arbitration is necessary */ CLRBITM(CELLAT(r).marks, simdinstr.arg2); } } } break: case INCRGLBL: /* val, target-ptr */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { if (cp->col->colowner != cp->salf || row[ROW(r)].rowowner != cp->self) { if (ph==1) { printid(cp); printf(" incrglbl: not owner\n"); } } else { if (ph==2) { /* no cell arbitration is necessary */ CELLAT(r).refc += simdinstr.arg1; } } } break; ``` This group allows general access to the bits of the state register, but is probably not how this state testing should be implemented. ``` /* Note: the following work with bit masks */ case TSTSTATE: /* flags */ if (ph==2 && IS_ACTIVE && !TSTBIT(cp->state, simdinstr.arg1)) DEACTIVATE; break: case TSTNOTSTATE: /* flags */ if (ph==2 && IS_ACTIVE && TSTBIT(cp->state, simdinstr.arg1)) DEACTIVATE; break; case SETSTATE: /* flags */ if (ph==2 && IS_ACTIVE) SETBIT(cp->state,simdinstr.arg1); break; case CLRSTATE: /* flags */ if (ph==2 && IS_ACTIVE) CLRBIT(cp->state, simdinstr.arg1); break; Here is another group of relatively simple instructions which are tests on addresses/pointers and the reference count register. case TSTLOCAL: /* ptr */ if (ph==2 && IS_ACTIVE) { if (!((r = cp~>reg[simdinstr.arg1], VALIDADDR(r)) && UNDEF != WHICHDIR(TILE(cp->self),TILE(r))) /* deactivate if not a valid address or not local */ DEACTIVATE: } break: case TSTREMOTE: /* ptr */ if (ph==2 && IS_ACTIVE) { if (!((r = cp->reg[simdinstr.arg1],VALIDADDR(r)) && UNDEF == WHICHDIR(TILE(cp->self),TILE(r)))) /* deactivate if not a valid address or local */ DEACTIVATE: } break; case TSTADDR: /* ptr */ if (ph==2 && IS_ACTIVE && !VALIDADDR(cp->reg[simdinstr.arg1])) /* deactivate if actually is a valid address */ DEACTIVATE: break; case TSTNOTADDR: /* ptr */ if (ph==2 && IS_ACTIVE && VALIDADDR(cp->reg[simdinstr.arg1])) /* deactivate if actually is a valid address */ DEACTIVATE: break; case TSTUNUSED: if (ph==2 && IS_ACTIVE && 0 != cp->refc) DEACTIVATE: break; ``` ``` case TSTUNSHARED: if (ph==2 && IS_ACTIVE && 1 != cp->refc) DEACTIVATE; break; case TSTIMUSE: if (ph==2 && IS_ACTIVE && cp->refc <= 0) DEACTIVATE; break: These were designed for use with explicit port arbitration (ARBPORT, ARBPORTPTR), allowing a relatively tight "repeat until all done" loop. case SETTRYING: if (ph==2) { if (IS_ACTIVE) { SETBIT(cp->state,STATE_TRYING); globalfeedback++; else CLRBIT(cp->state,STATE_TRYING); /* When clear? */ } break; case INITTRYING: if (ph==2) { if (TSTALLBIT(st,STATE_VALID|STATE_TRYING)) { ACTIVATE; globalfeedback++; } else { DEACTIVATE; } } break; case CLRTRYING: if (ph==2 && IS_ACTIVE) { CLRBIT(cp->state,STATE_TRYING); break; These are reference count operations (there is also a global version). case INCR: /* val */ if (IS_ACTIVE && ph==2) cp->refc += simdinstr.arg1; case INCRPTR1: /* val, target-ptr */ case IMCRPTR2: if (ph==1) { if (IS_ACTIVE && (val = NUM(cp->self),(op==INCRPTR1 ? 0<=val && val<=1 :</pre> 2<=val && val<=3)) && (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { dir = WHICHDIR(TILE(cp->self),TILE(r)); if (dir == UWDEF) { if (simdinstr.arg1==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg1==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; ``` ``` } else { SETBIT(cp->ports[dir]->portval, ((val==0||val==2)? MSKBITW(PORTWO(r)) : MSKBIT#(PORTWO(r)+PORTWOSHIFT))); } } } else { /* ph==2 */ /* IS_VALID ?? */ val = MSKBITW(cp->portno) | MSKBITW(cp->portno+PORTWOSHIFT); for (r=0; r<PORTCWT; r++)</pre> if (TSTBIT(cp->ports[r]->portval,val)) { if (TSTBIT(cp->ports[r]->portval,MSKBITM(cp->portno))) cp->refc += simdinstr.arg1; if (TSTBIT(cp->ports[r]->portval, MSKBITW(cp->portno+PORTWOSHIFT))) cp->refc += simdinstr.arg1; } } break; ``` These are the allocation instructions. The normal idiom is ALLOCREQ AVAIL1 AVAIL2 ALLOCPTR. This idiom will leave a pointer to a free cell in the specified register (or 0, null pointer, if fails), but the cell is still not allocated. ALLOCPTR is used to actually allocate the cell. Many of the details could be changed and improved here. ``` case ALLOCREQ: if (ph==1) { cp->aux = 0; if (IS_ACTIVE) { val = MSKBITW(cp->portno); for (r=0; r<PORTCMT; r++)</pre> SETBIT(cp->ports[r]->portval,val); } else { /* ph==2 */ if (!TSTBIT(cp->state,STATE_VALID)) { val = 0; mask = PORTNOCASE(cp->portno); for (r=0; r<PORTCMT; r++)</pre> val = (val << CELLSPERTILE) |</pre> PORTMOSEL(mask,cp->ports[r]->portval); cp->aux = val; /* Could use ACC instead */ } } break; case AVAIL1: /* reg */ case AVAIL2: if (ph==1) { if (cp->aux && (val = WUM(cp->self), (op==AVAIL1 ? O<=val && val<=1 : 2<=val && val<=3))) { mask = val: val = LOWESTBIT(cp->aux); for (r=PORTCNT-1; 0<=r; r--) { if (val & LOWERBITS(CELLSPERTILE)) { val = val & LOWERBITS(CELLSPERTILE); ``` ``` val = PORTWOREPLY(cp->portno,val); cp->ports[r]->portval |= ((mask==0 || mask==2) ? val : val << PORTHOSHIFT); break; 3 val >>= CELLSPERTILE; cp->aux = 0, } } else { /* ph==2 */ if (IS_ACTIVE) { if (op==AVAIL1) cp->reg[simdinstr.arg1] = 0; /*??*/ val = MSKBITM(cp->portno) { MSKBITM(cp->portno+PORTMOSHIFT); if (!(op==AVAIL2 && cp->reg[simdinstr.arg1]!=0)) { for (r=0; r<PORTCNT; r++) if (TSTBIT(cp->ports[r]->portval,val)) { if (TSTBIT(cp->ports[r]->portval, MSKBITM(cp->portno))) { if (op==AVAIL1) a = 0; else a = 2; } else if (TSTBIT(cp->ports[r]->portval, MSKBITH(cp->portno+PORTNOSHIFT))) { if (op==AVAIL1) a = 1; else a = 3; val = cp->self; switch (r) { case EAST: val = ADDR(ROW(val),COL(val)+1,a); break; case WORTH: val = ADDR(ROW(val)-1,COL(val),a); break; case WEST: val = ADDR(ROW(val),COL(val)-1,a); break; case SOUTH: val = ADDR(ROW(val)+1,COL(val),a); break; cp->reg[simdinstr.arg1] = val; break; } } if (op==AVAIL2 && cp->reg[simdinstr.arg1] == 0) DEACTIVATE; } break; case ALLOCPTR: /* target-ptr */ if (ph==1) { if (IS_ACTIVE) if ((r = cp->reg[simdinstr.arg1],VALIDADDR(r))) { dir = WHICHDIR(TILE(cp->self),TILE(r)); SETBIT(cp->ports[dir]->portval,MSKBITW(PORTWO(r))); } else DEACTIVATE; } else { /* ph==2 */ /* Note: the following is done by all cells */ if (!TSTBIT(cp->state,STATE_VALID)) { cp->refc = 0; val = MSKBITW(cp->portno); for (r=0; r<PORTCHT; r++) if (TSTBIT(cp->ports[r]->portval,val)) { ``` ``` SETBIT(cp->state, STATE_VALID); cp->refc++; /* should really just do once */ } } break; These are the commit instructions. case COMMIT: /* failure flag? */ if (ph==2 && IS_ACTIVE) { for (r=0; r<REGNUM; r++) { val = cp->reg[r]; cp->reg[r] = cp->reg[r+REGNUM]; /*??*/ cp->reg[r+REGNUM] = val; } /* This should result in eventual deletion of next stuff */ SETBIT(cp->state,STATE_COMMIT); break; case COMMITMARK: /* mark */ if (ph==2 && IS_ACTIVE) { if (TSTBITW(cp->marks, simdinstr.arg1)) { for (r=0; r<REGHUM; r++) {</pre> val = cp->reg[r]; cp->reg[r] = cp->reg[r+REGNUM]; /*??*/ cp->reg[r+REGMUM] = val; } SETBIT(cp->state,STATE_COMMIT); break; case SWAP: if (ph==2 && IS_ACTIVE) { for (r=0; r<REGEUM; r++) { val = cp->reg[r]; cp->reg[r] = cp->reg[r+REGNUM]; /*??*/ cp->reg[r+REGWUM] = val; } } break; case SETCOMMIT: if (ph==2 && IS_ACTIVE) { SETBIT(cp->state,STATE_COMMIT); break; Miscellaneous other intructions: - In real ensemble, expand to code sequence. case DELETE: if (ph==2 && IS_ACTIVE) { ``` ``` cp->marks = 0; cp->refc = 0; cp->state = 0; cp->aux = 0; for (r=0; r<REGCMT; r++) cp->reg[r] = 0; break; - Initiate relocation if non-local. case RELOCREQ: /* ptr */ if (ph==2 && IS_ACTIVE) { if ((simdinstr.arg1 == LEFT || simdinstr.arg1 == RIGHT) && (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) && UNDEF == WHICHDIR(TILE(cp->self),TILE(r))) { if (simdinstr.arg1 == LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; } } break; case SETRANDOM: /* reg */ if (ph==2 && IS_ACTIVE) { cp->reg[simdinstr.arg1] = randoom(); - Own random function. break; — May not be usable because of some IS_VALID tests. case INITALL: if (ph==2) ACTIVATE; break: case TSTBLACK: if (IS_ACTIVE && ph==2) if (PORTNOCASE(cp->portno)) DEACTIVATE: break; case TSTRED: if (IS_ACTIVE && ph==2) if (!PORTWOCASE(cp->portno)) DEACTIVATE; break; case TSTCLASS: /* class -- one of 5 cases */ if (IS_ACTIVE && ph==2) { val = cp->self; if (!(simdinstr.arg1 == ((COL(val)+3*ROW(val))%5))) DEACTIVATE; } break: case TSTCELLNUM: /* cellnum */ if (IS_ACTIVE && ph==2) { val = cp->self; if (!(simdinstr.arg1 == NUM(val))) DEACTIVATE: } break; case CELLARBPTR: /* target-ptr */ if (IS_ACTIVE) { ``` ``` if (ph==1) { if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); SETBIT(cp->ports[dir]->portval,MSKBITW(PORTWO(r))); } else { /* ph==2 */ if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); if (!TSTBIT(cp->ports[dir]->portval,MSKBITM(PORTNO(r)))) DEACTIVATE; } } } break; The action for an illegal op code is: default: printf("Illegal op code %d\n",op); fflush(stdout); exit(-1); } } The following routines are used with the SIMD instruction SENDGLOBAL to simulate the global feedback mechanism. /* special simd operations */ resetglob() globalfeedback = 0; int testglob() return globalfeedback; } ``` C Code of the Ensemble Simulator and of Several Benchmarks Run on it ``` _ ``` ``` "relocreq", "storepn", "fetchpn", "incrglbl", "clrtrying", "delete", "commitmark", "swap", "tstinuse", "setcommit", 'tstrero", 'tstnzero", "selcell", "arbtile", "setrandom", "logxor", "rotatel", "rotater", "shiftl", "shiftr", "shiftra", "alloc" for(1-0;i<CELLNUM;i++) fwrite(scall[i],sizeof(call[0]),l,fp); for(1=0,1<CELLNUM,1++) fread(scell[1],sireof(cell[0]),1,fp); printf("cannot access to .. \n", init); printf("cannot access ts .. \n", init); if(i) (sscanf(argv[0],"ts",name); i--;) strcpy(init,strcat(name,".init")); printf("cannot access ts .. \n", init); verbose - (NULL !- getenv("VERBOSE")); randomstate = 232323; show_ensemble_fn = show_ensemble_std; if((fp-fopen(prog, "r"))==NULL) if((fp-fopen(init,"v")) -- NULL) if((fp-fopen(init, 'r')) ==NULL) sscanf(argv[1],"%s",prog); char name(50),init[50],prog[50]; /* initialize structures */ /* execute SIMD code */ exit(1); exit(1); term build(), main(argc, argv, env) clse fp-stdin; ensemble_.nit(); char **argv, **env; simp_execute(); exit(1); if(ergc+3) 81ze - 4; FILE *fp; 1f(1) 1-arge; int argo; int 1, 100 • "nop", 'nit', "ersse", 'tstaalmark', 'tstnotmark', 'setmark', "clrNAAN", 'CONST", 'CLEAR', 'ED", 'BT", 'LE", 'LT", 'ge", 'move", "sdd", 'sub', 'lognot', 'logand', 'logior', 'arbport', 'arbportptr', 'store', 'stamarbptr', 'stanotmarkptr', 'setmarkptr', 'olmarkptr', 'intall', 'incrptr', 'incr', 'setcommitlock', 'tstnotcommitlock', 'setdeletelock', 'sendglobal', 'callarbptr', 'findfree', 'silocptr', 'arbcol', 'selrow', 'setmarkglobal', 'storeglob', 'stamarkglob', 'tstnotmarkglob', 'setmarkglob', 'clmarkglob', 'findfreedir', 'commit', 'tstromarkglob', 'tstnotstate', 'setstate', 'clmarkglob', 'tstnotmarkglob', 'tstnotmarkglob', 'tstnotstate', 'setstate', 'sallocal', 'stremote', 'tstnotmarkglob', 'tstnotaddr', 'tstunused', 'tstunshared', 'makereq', 'fetchdata', 'settrying', 'inittrying', 'incrptrl', 'incrptr2', 'allocreq', 'svaill', 'svail2', Typed values in registers? Specifically pointer vs number? Goal: stricter RISC approach (e.g. simplified fetch/store). Early draft: not worrying about efficiency at all. Revision based on various discussions. Still using "SIMD code" is "real code" approach Haven't really thought about FAILURE bit stuff. portno stuff not really handled very cleanly RRM "RISC" ensemble simulator in C int clock; /* Note: not phase number */ char *opnames[opcoDENUM] = [ int instrireq[OPCODENUM]; cellstate cell[CELLNUH], int recordingurfreq = 1; portbus port[PORTNUM], /* global variables */ rowbus row[ROWNUM]; /* other globals */ colbus col[COLNUM]; int globalfeedback; simdbus simdinstr; #include <atdio.h> Minclude "risc.h" "def.h" int randomstate, int showinstre, Int verbose, int size; include : ``` Ü risc ``` for (i=0) i <PORTNUM; i++) port[ii.portval = 0; for (i=0; i <PORTNUM; i++) port[ii.portaddr = unDEP;</pre> for (p=0; p<colNUN; p++) { if (col[p].colNal == i) { if (flg) row[i].rowowner = col[p].colowner; else SETBIT(col[p].colval;FAILFLAG);</pre> for (i=0; i <COLNUM; i++) col(i].colval = UNDEF; printop(); printf("\n"); fflush(stdout); ) for (1=0) i < CELLNOW; i++; cell[i].aux = 0; if (op == ARBROW) { /* @@@ this could be much improved(!!) */ /* if (verbose) printf("phase1\n"); */ if (port[i].portaddr !- UNDEF) vel = MSKBITN(cell[1].portno)/ /* @@ could improve on this */ for (1-0; 1<CELLNUM; 1++) { for (1=0; 1 < ROWNUM; 1++) [ for (1-0; 1<PORTNUM; 1++) for (1-0; 1<CELLNUM; 1++) printf("%d: ",clock); if (op -- CELLARBPIR) { simdaction(scell[i]); 1f (ob -- PETCR) ( simdinstr.ph = 1, if (op !- NOP) ( of cell arb */ (ob--SELCELL) flg = 1; flg = 0, if (op--ARBCOL) if (op==SELROH) 97.0 : if (op i= INIT) instrfreq[op]++; register int 1,0p,p,flg,vsl; simd4 (op, argl, arg2, arg3, arg4) int op, argl, arg2, arg3, arg4; simdinstr.op = op; simdinstr.arg1 = UNDEF; simdinstr.arg3 = UNDEF; simdinstr.arg4 = UNDEF; simdinstr.arg2 - UNDEF; simdinstr.arg1 - arg; simdinstr.arg2 - UNDEP; simdinstr.arg3 - UNDEF; simdinstr.arg4 - UNDEF; simdinstr.arg3 - UNDEF, simdinstr.arg4 - UNDEP simdinstr.arg3 = arg3; simdinstr.arg4 = GNDEF; simdinstr.arg1 = arg1; simdinstr.arg2 = arg2; simd3(op, argl, arg2, arg3) simdinstr.arg1 = arg1; simdinstr.arg2 - arg2; simdinstr.arg1 = arg1; simdinstr.arg2 = arg2; simdinstr.arg3 = arg3; simdinstr.arg4 = arg4; (recordinstrfreq) ( int op, argl, arg2, arg3, simdinstr.op = op; simdinatr.op = op; simdinatr.op = op; op = simdinstr.op; if (recordinstrere simd2(op, argl, arg2) if (showinstre) ( int op.argl.arg2; simdinstr.op simdstep(); simdstep(); simdstep(), simdl(op, arg) sindstep(); simdstep(); int op, arg; simdstep() clock++; (do)pujs int op; ``` ``` for (i=0; l < ROWNUM; i++) { row[i], rowval = ONDEF; row[i], rowowner = 0; ] for (i=0) is COLNUM, i++) [ col[i].colval = 0; col[i].colowner = 0; ] port[i].portval = CELLAT(port[i].portaddr).reg[simdinstr.arg3]; /* if (verbose) printf('port\n'); */ /* In the following could put checks to see if bus already non-UNDEP which would indicate an error in use; this also ignores issues /* Actually there seem to be cases here */ /* arbcol, arbrow are actions over different entities */ for (P=0; p<PORTCNT; p++) { if (TSTBIT(cell[i].ports[p]->portval,val)) { if (flg) CLRBIT(cell[i].ports[p]->portval,val), ``` ``` case CLEAR: /* reg */ if (ph==2 && IS_ACTIVE) cp-*reg[simdinstr.arg1] = 0; case ADD: /* treg, sreg */ if (ph==2 && IS_ACTIVE) case SUB: /* treg, sreg */ case LOGNOT: /* reg */ if (ph==2 && IS_ACTIVE) if (ph -- 2 66 IS_ACTIVE) if (ph==2 && IS_ACTIVE) if (ph--2 st IS_ACTIVE) if (ph == 2 && IS_ACTIVE) dir - simdinstr.argl; case ARBPORT: /* dir */ if (IS_ACTIVE) [ case LT: /* reg reg */ if (ph--1) ( DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE, DEACTIVATE; CABB MOVE: break; break; break; break; break, break; break, break break; bresk break break break /* No some top-level case analysis depending on whether active, etc. */ if (ph==2 && IS_ACTIVE && !TSTBITN(cp->marks,simdinstr.arg1)) casq ISTNOTHARK: /* mark */ if (ph==2 & L IS_ACTIVE & TSTBITN(cp->marks,simdinstr.argl)) i = row[val].rowowner; if (i i= 0) SETBIT(col{COL(i)].colval,FAILFLAG); ii (ph--2 & TSTBIT(&t,STATE_VALID)) ACTIVATE; #define DEACTIVATE CLRBIT(cp->state, STATE_ACTIVE) row[val].rowowner = col[p].colowner; case ERASE: if (ph==2 &s IS_ACTIVE) cp->marks = 0; op - simdinatr.op; ph - simdinatr.ph; #define IS_ACTIVE TSTBIT(at,STATE_ACTIVE) #define IS_VALID TSTBIT(at,STATE_VALID) /* if (verbose) printf("phase2\n"); */ SETBITN(cp->marks, simdinstr.argl); CLRBITN(cp->m>rks,simdinstr.argl); if (ph=2 && IS_ACTIVE) cp->CELLACC = simdinstr.argl; if (0 != col[p].colowner) [ val = col[p].colval; tor (p-colnum-1; 0 -p; p--) register int op,ph,st,r,val, sim*linstr.ph = 2; for (i=0; i<CELLNUM; i++)</pre> if (val i= UNDE:F) ( case CLRMARK: /* mark */ if (ph==2 && IS_ACTIVE) if (ph--2 tt IS_ACTIVE) int dir, tile, mask, a, flg; case SETMARK: / mark ./ case TSTMARK: /* mark */ s:mdaction(&cell[i]); case CONST: /* val */ register cellstate *cp; case NOP: break; st - cp->state; DEACTIVATE, DEACTIVATE; switch (op) ( simdaction(cp) case INIT: break, break; ``` ``` !((cp->reg[simdinstr.arg1]) ~~(cp->reg[simdinstr.arg2]))) case LE: /* reg reg */ if (ph==2 && IS ACTIVE && if (cp->reg[simdinstr.arg1]) <=(cp->reg[simdinstr.arg2]))) case GE: /* reg reg */ if (ph=2 is IS_ACTIVE is if (cph=2 is IS_ACTIVE is i((cp->reg[simdinstr.arg1))>=(cp->reg[simdinstr.arg1))) case NEQ: /* reg reg */ if (ph--2 is IS_ACTIVE is if (cp->reg[simdinstr.arg1])!=(cp->reg[simdinstr.arg2]))) 1((cp->reg[simdinstr.arg1])>(cp->reg[simdinstr.arg2]))) !((cp->reg(simdinstr.argl)) <(cp->reg(simdinstr.arg2)))) cp->reg(simdinstr.arg1) |- cp->reg(simdinstr.arg2); cp->reg[simdinstr.arg1] &= cp->reg[simdinstr.arg2]; cp->reg[simdinstr.arg1] += cp->reg[simdinstr.arg2]; cp->reg[simdinstr.arg1] = -cp->reg[simdinstr.arg1]; cp->reg[simdinstr.arg1] -= cp->reg[simdinstr.arg2]; cp->reg(simdinstr.arg1) = cp->reg(simdinstr.arg2); CLRBIT(cp->state,STATE_TRYING); / @@ newer*/ case LOGAND: /* treg, sreg */ case LOGIOR: /* treg, sreg */ case EQ: /* reg reg */ if (ph==2 && IS_ACTIVE && case GT: /* reg reg */ if (ph=2 && IS_ACTIVE && if (ph=-2 66 IS ACTIVE 66 if (!VALIDDIR(dir)) { ``` ``` break; break; break; if (simdinstr.argl==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.argl==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); CLRBIT(cp->state,STATE_TRYING); /*@@ newer*/ if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REWOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REWOTE_RIGHT); if ((cp->ports(dir)->portval) & LOWERBITS(cp->portno)) DEACTIVATE; if ((cp->ports(dir)->portval) & LOWERBITS(cp->portno)) DEACTIVATE; /* This requires that the ports be cleared to start with */ if (IS_ACTIVE && (r = cp->reg[simdinstr.argl], vALIDADDR(r))) { case FETCH: /* treg, source-ptr, sreg */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) [ if (IS_ACTIVE && (r = cp->reg[simdinstr.argl], VALIDADDR(r))) { /* originally written using action between ph==1 and ph==2 */ cp->ports[dir]->portaddr = r; cp->ports[dir]->portval = cp->reg[simdinstr.arg3]; /* Note: the following is done by all VALID cells */ if (iS_VALID) { cp->reg[simdinstr.arg1] = cp->ports[dir]->portval; cp->ports[dir]->portaddr = r; /* could xor */ SETBITN(cp->ports[dir]->portval,cp->portno); SETBITN(cp->ports[dir]->portval,cp->portno); /* lower numbered bits have higher priority */ CLRBIT(cp->state,STATE_TRYING); /*@@ newer*/ CLRBIT(cp->state,STATE_TRYING); /*@@ newer*/ dis - WHICHDIR(TILE(cp->self), tile); dir = WHICHDIR(TILE(cp->self),tile); case STORE: /* target-ptr, treg, sreg */ dir = WHICHDIR(TILE(cp->self),tile); if (dir == UNDEF) DEACTIVATE; /* (Au check only one? */ case ARBPORTPTR: /* reg */ ] else ( /* ph ==2 */ if (dir == UNDEF) ( if (dir -- UNDEF) ( ) else ( /* ph --2 */ ] else { /* ph --2 */ | else | /* ph=-2 */ tile = TILE(r); tile - TILE(r); tile - TILE(r); DEACTIVATE; DEACTIVATE; cb->self; DEACTIVATE; if (ph--1) ( if (ph==1) ( (ph==1) ( break; break; ``` ``` if (simdinstr.arg2==LEFT) SETBIT(cp->state, STATE_REMOTE_LEFT); else if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else If (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); if (simdinstr.arg2--RIGHT) SETBIT(cp->state, STATE_REMOTE_RIGHT); val = MSKBITN(cp->portno); for (r=0; r<PORTCNT; r++) SETBIT(cp->ports[r]->portval,val); for (r=0; r<PORTCNT; r++) SETBIT(cp->ports[r]->portval,val); if (cp->ports[r]->portaddr == val) cp->reg[simdinstr arg2] = cp->ports[r]->portval; 1f (IS_VALID && TSTBITN(cp->marks, simdinstr.argl)) ( (IS_VALID && TSTBITN(cp->marks, simdinstr.argl)) ( if (r = cp->rcg[simdinstr.arq2],VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); (!TSTBITN(cp->ports(dir)->portval,val)) if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) if (r = cp->reg[simdinstr.arg2], VALIDADDR(r)) if (TSTBITN(cp->ports(dir]->portval,val)) /* (@ almost identical to the above */ /* Note: the following is done by all cells */ /* Note: the following is done by all cells */ dir = WHICHDIR(TILE(cp->self),tile); dir - WHICHDIR(TILE(cp->self), tile); case TSTNOTMARKPIR: /* mark, source-ptr */ case SETMARKPTR: /* mark, target-ptr */ case TSTMARKPTR: /* mark, source-ptr */ val - MSKBITN(cp->portno); for (r=0; r<PORTCNT; r++) val - PORTNO(r); if (dir -- under) if (dir -- under) val = PORTNO(r); ] else DEACTIVATE; ) else DEACTIVATE; tile - TILE(r); tile - TILE(r); DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE; if (IS ACTIVE) [ else ( /* ph--2 */ if (IS ACTIVE) ( else ( /* ph==2 */ if (IS_ACTIVE) ( else ( else ( if (ph -- 1) ( if (ph==1) [ 1f (ph==1) ``` ``` risc.c ``` ``` for (r=0; r<PORTCNT; r++) if (TSTBIT(cp->ports[r]->portval,val)) { if (!(simdinstr.arg1 -- val)) DEACTIVATE; case TSTCLASS: /* class - . one of 5 cases */ if (IS_ACTIVE & ph*=2) { val = cp->self; if (TSTBIT(cp->state, STATE_DELETELOCK)) if (TSTBIT(cp->state, STATE_COMMITLOCK)) SETBIT(cp-'state, STATE_DELETELOCK); SETBIT(cp-'state, STATE_COMMITLOCK); if (!(simdinstr.arg1 == NUM(val))) cp->refc += simdinstr.argl; if (PORTNOCASE(cp.>portno)) if (PORTNOCASE(cp. > portno)) val = MSKBITN(cp->portno); /++++++++++++++++++++++++++++++/ case TSTCELLNUM: /* cellnum */ if (IS_ACTIVE && ph==2) { case TSTRED: 1f (IS_ACTIVE 66 ph==2) case TSTNOTCOMMITLOCK: if (IS_ACTIVE && ph==2) case SETDELETELOCK: if (IS_ACTIVE 66 ph==2) case TSTBLACK: if (IS_ACTIVE 66 ph=2) if (IS_ACTIVE && ph=-2) if (IS_ACTIVE && ph==2) if (IS_ACTIVE && ph--2) Case ISTNOTDELETELOCK: case INCR: /* val */ case SETCOMMITLOCK: 1f (IS VALID) ( val - cp->self; val-cp->self; DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE DEACTIVATE; DEACTIVATE; case ALLOC: break; break: if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); DEACTIVATE; if (simdinstr.arg2==LEFT) SETBIT(cp->state,STATE_REWOTE_LEFT); else if (simdinstr.arg2==RIGHT) SETBIT(cp->state,STATE_REWOTE_RIGHT); f (IS_ACTIVE is (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { dir = WHICHDIR(TILE(cp->self),TILE(r)); SETBIT(cp->ports[dir]->portval,MSRBITN(PORTWO(r))); SETBIT(cp->ports(dir)->portval, MSKBITN(PORTNO(r))); SETBIT(cp.>ports(dir)->portval, MSKBITN(PORTNO(r))); if (r = cp->reg[simdinstr.arg2],VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self),TILE(r)); if (TSTBIT(cp->ports(r)->portval,val)) { if (TSTBIT(cp->ports[r]->portval,val)) ( Case CLRMARKPTR: /* mark, target.ptr */ /* (@ almost identical to the above */ f (!(TSTBIT(cp->marks,val))) ( val = MSKBITN(simdinstr.argl); case INCRPTR: /* val,target-ptr */ val = MSKBITN(simdinstr.argl); SETBIT(cp-'marks,mask); CLRBIT(cp.>marks,mask); if (TSTBIT(cp.>marks,val)) [ val - MSKBITN(cp->portno); val = MSKBITN(cp->portno); for (r=0; r < PORTCNT; r++) for (r=0; rePORTCNT; r++) if (UNDEF == dir) { if (UNDEF -- dir) ( } else DEACTIVATE; | else DEACTIVATE; | else ( /* ph--2 */ | else ( /* ph--2 '/ if (ph==2) ACTIVATE; | else ( /* ph--2 */ DEACTIVATE; If (IS ACTIVE) ( ) ( IS VALID) ( if (IS VALID) mask = val; mask = val; break; break; ) else ( else (ph--1) ( case INITALL: if (ph--1) break; break; ``` ``` else ( SETBIT(cp->state,STATE_VALID); cp->refc=1; ) if (!(simdinstr.arg1 -- ((COL(val)+3*ROW(val))%5))) if (IS_ACTIVE 64 ph=-2) cp->refc += simdinstr.argl; ``` ``` 9 ``` ``` case EAST: val = ADDR(ROW(val),COL(val)+1,a); break; case NORTH: val = ADDR(ROW(val)-1,COL(val),a); break; case WEST: val = ADDR(ROW(val),COL(val)-1,a); break; case SOUTH: val = ADDR(ROW(val)+1,COL(val),a); break; if (r = cp > reg[simdinstr.arg]], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp - seelf), TILE(r)); if (!TSTBIT(cp - > ports[dir] - > portval, MSKBITN(PORTNO(r)))) SETBIT(cp->ports(dir)->portval, MSKBITN(PORTNO(r))); if (r = cp->reg[simdinstr.argl], VALIDADDR(r)) { dir = WHICHDIR(TILE(cp->self), TILE(r)); SETBIT(cp->ports[dir]->portval, MSKBITN(PORINO(r))); /* (@ This is a bit much for a poor little cell */ if ((r = cp->reg(simdinstr.arg1), vALIDADDR(r))) ( /* Note: the following is done by all cells */ /* Am assuming that this cell is not free */ /* Note: the following is done by all cells */ if (!TSTBIT(cp->state,STATE_VALID)) { dir - WHICHDIR(TILE(cp-'self), TILE(r)); a = bitpos(PORTNOSEL(flg,val)) & 0x3; SETBIT(cp->ports[r]->portval,val); if (!ISTBIT(cp-'state,STATE_VALID)) ( cp->reg[simdinstr.arg1] = val; cp->reg[simdinstr.arg]] = UNDEF; flg = PORTNOCASE(cp->portno); case CELLARBPTR: /* target-ptr */ val = MSKBITN(cp->portno); for (r=0; r<PORTCNT; r++)</pre> for (r=0; r < PORTCNT; r++) case ALLOCPTR: /* target-ptr */ /* (a) NOT SO NICE . if (IS ACTIVE 66 ph==2) ( case FINDFREE: /* target */ val = cp->self; } else DEACTIVATE; switch (r) [ ) else ( /* ph--2 •/ ) else ( /* ph--2 */ } else { /* ph==2 */ globalfeedback++; if (IS_ACTIVE) ( DEACTIVATE; cp->refc = 0; if (IS_ACTIVE) ( if (IS ACTIVE) case SENDGLOBAL: break; if (ph--1) { (ph==1) ( 1f (ph=-1) { break; break; break; ``` ``` /* no cell arbitration is necessary */ /* only really use the number of the cell from the pointer */ cp->reg[simdinstr.arg1] = CELLAT(r).reg[simdinstr.arg3]; if ((cp->col->colval) & LOWERBITS(ROW(cp->self))) DEACTIVATE; case FETCHGLBL: /* target source-ptr reg */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { cp->refc++; /* should really just do once */ if (r = cp->reg[simdinstr.arg1],VALIDADDR(r)) { if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) { if (TSTBIT(cp->ports[r]->portval,val)) [ SETBITN(cp->col.>colval,ROW(cp->self)); if (cp->col->colowner == cp->self) { if (TSTBIT(cp->col->colval,FAILFLAG)) printid(cp); printf(" selrow bad active cell\n"); DEACTIVATE; printf(" fetchglbl: not owner\n"); ) printf(" arbrow bad active cell\n"); row[ROW(r)].rowowner |- cp->self || case STOREGLBL: /* target-ptr reg source */ else cp->col->colowner = cp->self; SETBIT(cp->state, STATE VALID); if (cp->col->colowner == cp->self) if (cp->col->colowner 1= cp->self CELLAT(r).aux != cp.'self) { cp->col->colval = ROW(r); val - MSKBITN(cp->portno); for (r=0; r < PORTCNT; r++) case SELROW: /* target */ if (ph==2 && IS_ACTIVE) { case ARBROW: /* target */ if (ph**2 & IS_ACTIVE) ( ] else ( /* ph--2 */ ) else DEACTIVATE; ] else DEACTIVATE; DEACTIVATE; printid(cp); printid(cp); DEACTIVATE; if (IS ACTIVE) ( if (ph==1) ( if (ph==2) ( if (ph==1) ( else ( case ARBCOL: ) else ( break; break; break; break; break; ``` ``` risc.c ``` ``` /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg(simdinstr.argl|,VALIDADDR(r))) ( /" as usual very similar to the above "/ if (IS_ACTIVE & (r = cp->reg[simdinstr.argl], VALIDADDR(r))) { /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.argl],VALIDADDR(r))) { if (cp->col->colowner != cp->self || /* as usual very similar to the above */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg1],VALIDADDR(r))) { if (cp->col->colowner != cp->self || case TSTMARKGLBL: /* source-ptr mark (ฟิติผิจกลกged order */ If (ITSTBITN(CELLAT(r) marks, simdinstr. arg2)) if (TSTBITN(CELLAT(r).marks, simdinstr.arg2)) printf(" tatnotmarkglbl: not owner(n"); } printf(" tstmarkglbl: not owner\n"); } /* no cell arbitration is necessary */ /* no cell arbitration is necessary */ /* no cell arbitration is necessary */ printf(" storeglbl: not owner\n"); ) row[ROW(r)].rowowner != cp->self || if (cp->col->colowner i= cp->self || row[ROW(r)].rowowner i= cp->self || row(ROW(r)).rowowner |= cp->self || row[ROW(r)] rowowner |= cp->self || case TSTNOTMARKGLBL: /* source-ptr mark */ case SETMARKGLBL: /* target-ptr mark */ if (cp.,col.,colowner i= cp.,self CELLAT(r).aux |= cp->self) ( if (ph==1) [ CELLAT(r).aux i= cp->self) { CELLAT(r).aux != cp->self) ( CELLAT(r) aux != cp->self) { printf("storeglbl: "); (verbose) printaddr(r); DEACTIVATE; printid(cp); DEACTIVATE; printid(cp); printid(cp); if (ph -- 2) ( if (ph--2) ( if (ph==1) ( if (ph--1) ( if (ph==2) ( if (ph--1) { ) else ( else else break; break; break; ``` ``` if (IS_ACTIVE 64 (r * cp->reg[simdinstr.argl],VALIDADDR(r))) ( case EAST: val = ADDR(ROW(val),COL(val)+1,a); break; case NORTH: val = ADDR(ROW(val)-1,COL(val),a); break; case WEST: val = ADDR(ROW(val),COL(val)-1,a); break; case SOUTH: val = ADDR(ROW(val)+1,COL(val),a); break; if ((val = cp->ports[dir]->portval) != 0) ( /* want to find in the "other" tile */ /* @@ NICENESS */ if (ph==1) { Note: the following is done by all cells */ CLKBITN(CELLAT(r).marks, sindinstr.arg2); SETBITN(CELLAT(r).marks, simdinstr.arg2); a - bitpos(PORTNOSEL(flg,val)) 4 0x3; /* no cell arbitration is necessary */ /* no cell arbitration is necessary */ printf(" setmarkglbl: not owner "); } printf(" clrmarkglbl: not owner "); } row(ROW(r)).rowowner != cp->self | CELLAT(r).aux != cp->self) [ /* as usual very similar to the above */ SETBIT(cp.>ports[r]->portval,val); if (ITSTBIT(cp-'state,STATE_VALID)) cp->reg[simdinstr.arg] = UNDEF; cp->reg[simdinstr.arg] = val; if (cp. col. colowner ! cp. self | case CLRMARKGLBL: /* target-ptr mark */ flg = PORTNOCASE(cp->portno); case FINDFREEDIR: /* target dir */ val = MSKBITN(cp.>portno); for (r=0; r PORTCNT; r++) for (r=0; r<REGNUM; r++) ( dir - simdinstr.arg2; if (ph=-2 && IS_ACTIVE) [ if (VALIDDIR(dir)) { val = cp->self; switch (dir) ( cp.>reg(r); } else ( /* ph=-2 */ else DEACTIVATE; if (IS_ACTIVE) ( /* failure flag? */ printid(cp); printid(cp); if (ph==2) ( if (ph--1) ( 1f (ph--2) ( else ( else ( case COMMIT: break; break; break; ``` risc.c ``` œ ``` ``` if (dir == UNDEF) ( globalfeedback++; global feedback++; printid(cp); tile * TILE(r); if (IS_ACTIVE) [ if (IS_VALID) ( val = cp. aux; DEACTIVATE; if (val 1= 0) DEACTI VATE; case INITTRYING: If (ph=-2) { case SETTRYING: ACTIVATE: if (ph--2) { else break; break; break; break; if (IS_ACTIVE && (r = cp->reg(simdinstr.arg1), VALIDADDR(r))) ( /* (4) This should result in eventual deletion of next stuff */ case TSTADDR: /* ptr */ if (ph==2 is IS_ACTIVE is !VALIDADDR(cp->reg[simdinstr.arg1])) /* deactivate if actually is a valid address */ if (ph==2 && 1S_ACTIVE && VALIDADDR(cp->reg[simdinstr.argl])) /* deactivate if actually is a valid address */ case TSTSTATE: /* flags */ if (ph==2 && IS_ACTIVE && ITSTBIT(cp->state,simdinstr.argl)) if (ph==2 && IS_ACTIVE && TSTBIT(cp->state, simdinstr.argl)) DEACTIVATE; if (!(r = cp->reg[simdinstr.arg1],VALIDADDR(r)) 64 UNDEF == WHICHDIR(TILE(cp->self),TILE(r))) /* deactivate if not a valid address or local */ /* Note: the following work with bit masks */ cp->reg[r] = cp->reg[r+REGNUM]; /*@da*/ dir - WHICHDIR(TILE(cp-'self), tile); if (ph==2 && IS_ACTIVE && 1 != cp->refc) if (ph=-2 && IS_ACTIVE && 0 I= cp->refc cp->aux = 0; /* Note: done by all */ SETBIT(cp->state, simdinstr.argl); CLRBIT(cp->state, simdinstr.argl); SETBIT(cp->state, STATE_COMMIT); cp->reg[r+REGNUM] - val; case TSTNOTSTATE: /* flags */ case TSTREMOTE: /* ptr */ if (ph==2 && IS_ACTIVE) { if (ph==2 66 IS ACTIVE) [ case TSTNOTADDR: /* ptr */ case SETSTATE: /* flags */ case CLRSTATE: /* flags */ if (ph==2 && IS_ACTIVE) if (ph==2 && IS_ACTIVE) case TSTLOCAL: /* ptr */ Case MAKEREQ: /* ptr */ if (ph==1) ( tile - TILE(r); DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE; DEACTIVATE; CASS TSTUNSHARED: DEACTIVATE; case TSTUNUSED: break; break; break; break: break: break; break; ``` ``` if (simdinstr.argl==LEFT) SETBIT(cp->state,STATE_REMOTE_LEFT); else if (simdinstr.argl==RIGHT) SETBIT(cp->state,STATE_REMOTE_RIGHT); } else [ /* ph ==2 */ if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2],VALIDADDR(r))) { printf(" fetchdata: bad data portaddr = "); printaddr(cp->ports[dir]->portaddr); printf(" portval = %d\n",cp->ports[dir]->portval); cp->reg[simdinstr.arg1] - cp->ports[dir]->portval; cp->ports(r)->portval = cp->reg(simdinstr.arg3); else CLRBIT(cp.>state,STATE_TRYING); /* When clear? */ /* (all could xor case FETCHDATA: /* treg, source-ptr, sreg */ if (ph*=1) { /* Note: the following is done by all cells */ If (TSTALLBIT(st, STATE_VALID|STATE_TRYING)) ( Jone by all cells cp->ports[r]->portaddr = cp->self; dir - WHICHDIR(TILE(cp-'self), tile); if (cp->ports[dir]->portaddr == r) cp->ports[dir]->portaddr - r; SETBIT (cp->state, STATE_TRYING); if (cp->ports[r]->portaddr == if (dir -- UNDEF) DEACTIVATE; for (r=0; r<PORTCNT; r++) ] else ( /* ph ==2 */ /* Note: the following is if (TSTBITN(val,r)) ( for (r=0; r < PORTCNT; r++) val * cp->self; /* @d check only one? */ SETBITN(cp->aux,r); ``` ``` if (IS ACTIVE) ( if (1S ACTIVE && DEACTIVATE; CD->aux - 0; break; DEACTIVATE; else ) else ( if (ph--1) bresk; break; if (simdinstr.argl==LEFT) SETBIT(cp->state,STATE_REWOTE_LEFT); else if (simdinstr.argl==RIGHT) SETBIT(cp->state,STATE_REWOTE_RIGHT); if (ISTBIT(cp->ports{r]->portval,MSKBITN(cp->portno+PORTNOSHIFT)) /* (@ IS_VALID ?? */ val = MSKBITN(cp->portno+PORTNOSHIFT); if (TSTBIT(cp->ports[r]->portval,MSKBITN(cp->portno))) (val - NUM(cp->self), (op==INCRPTR1 ? 0<=val &6 val<=1 ((mask==0 | mask==2) ? val : val < PORTNOSHIFT); (val = NUM(cp->self), (op==AVAIL1 ? 0<=val && val<=1 : ((val--0||val--2)? MSKBITN(PORTNO(F)) : 2<=val & val<=3) & (r · cp->reg(simdinstr.arg2),VACIDADDR(r))) ( cp->aux = val; /* @@ could use ACC instead */ 2<=val 66 val<=3))) { MSKBITN(FORTNO(r)+PORTNOSHIFT)); PORTNOSEL(mask, cp.>ports[r]->portval); if (TSTBIT(cp->ports[r]->portval,val)) [ val = val & LOWERBITS(CELLSPERTILE); val = PORTNOREPLY(cp->portno,val); dir - WHICHDIR(TILE(cp. self), TILE(r)); if (val & LOWERBITS(CELLSPERTILE)) ( SETBIT(cp.>ports(r)->portval,val); cp->refc += simdinstr.argl; if (!TSTBIT(cp->state,STATE_VALID)) cp->refc += simdinstr.argl; SETBIT(cp.>ports[dir]->portval, val = (val << CELLSPERTILE) | val = LOWESTBIT(cp.>aux); for (r=PORTCNT-1; 0<=r; r--) {</pre> mask - PORTNOCASE(cp->portno); case incretti: /* val,target-ptr */ val - MSKBITN(cp->portno); cp->ports[r]->portval for (r=0, r PORTCNT; r++) for (r=0; r <PORTCNT; r++) for (r=0, r PORTCNT; r++) if (dir -- UNDEF) case AVAIL1: /* reg */ | else ( / ph--2 '/ | else ( /* ph==2 */ If (IS ACTIVE && if (IS ACTIVE) ( DEACTIVATE; 1f (cp. > dux 66 mask - val; cp->4ux = 0; else ( val = 0; (ph--1) ( Case INCRPTR2: if (ph==1) [ case ALLOCREQ: if (ph==1) ( case AVAIL2: break; break; ``` ``` if (sindinstr.arg2==LEFT) SETBIT(cp->state,STATE_REWOTE_LEFT); else if (sindinstr.arg2==RIGHT) SETBIT(cp->etate,STATE_REWOTE_RIGHT); val = MSKBITN(cp->portno) | MSKBITN(cp->portno+PORTNOSHIFT); case NORTH: val - ADDR ROW(val) 1,COL(val),a); break; case WEST: val - ADDR ROW(val),COL(val)-1,a); break; val = ADDR(ROW(val), COL(val)+1, a); break; case SOUTH: val = ADDR(ROW(vel)+1,COL(val),a); break; if (op==AvAIL2 && cp->reg(simdinstr.arg1) == 0) DEACTIVATE; /+ 2--4d cp->ports[dir]->portval = cp->reg[simdinstr.arg4]; if (op=-AVAIL1) cp->reg(simdinstr.arg1) = 0; /*@@*/ MSKBITN(cp.>portno+PORTNOSHIFT))) if (!(op-=AVAIL2 && cp->reg[simdinstr.argl]!=0)) ( case STOREPN: /* portno, target-ptr, treg, areg */ /* originally written using action between ph**1 and (r = cp - )reg(simdinstr.arg1, VALIDADDR(r))) tile = Tile(r); if (TSTBIT(cp->ports[r]->portval,val)) { if (op -- AVAIL1) a - 0; else a - 2; if (op==AVAIL1) a = 1; else a = 3; MSKBITN(cp. portno))) ( if (TSTBIT(cp->ports(r)->portval, SETBIT(cp. > state, STATE_REMOTE_RIGHT); if (TSTBIT(cp->ports(r)->portval, cp->reg[simdinstr.arg1] = val; dir - WHICHDIR(TILE(cp->self),tile); (simdinstr.arg1 == cp.>portno) 66 cp->ports(dir)->portsddr = r; for (r=0; r<PORTCNT; r++) val >>= CELLSPERTILE; val = cp->self; if (dir -- UNDEF) ( switch (r) ( CASE EAST: | else ( /* ph==2 */ ``` risc.c ``` if (IS_ACTIVE &6 (val = cp->reg[simdinstr.arg3],VALIDADDR(val))) { for (r-0; r<PORTCNT; r++)</pre> if (IS_ACTIVE && (r = cp->reg[simdinstr.arg2], VALIDADDR(r))) ( if (cp-ports[r]-portaddr == val) { cp->reg[simdinstr.arg2] = cp->ports[r]->portval; cp->reg(simdinstr.arg3) = cp->ports(r)->portval; /* Note: the following is done by all cells */ case FETCHPN: /* portno, treg, source-ptr, sreg */ /* no cell arbitration is necessary */ for (r=0; r<REGCNT; r++) cp->reg[r] = 0; if (TSTBITN(cp->marks, simdinstr.argl)) { cp->ports(r)->portaddr = cp->self; cp->ports(r)->portval = val; printid(cp); printf(" incrglbl: not owner\n"); if (cp->colowner != cp->self || row[ROW(r)].rowowner != cp->self) CELLAT(r).refc += simdinstr.argl; or (r=0, r<PORTCNT; r++) if (cp->ports[r]->portaddr == val) if (simdinstr.argl == cp.>portno) ( val = cp->reg(simdinstr.arg4); for (r=0; r<portcvr; r++) (</pre> case INCRGLBL: /* val, target-ptr */ CLRBIT(cp->state, STATE_TRYING); val = cp->self; /* @d check only one? */ for (r=0; r<PORTCNT; r++)</pre> if (ph==2 && IS ACTIVE) ( if (ph==2 && IS_ACTIVE) [ case CONMITMARK: /* mark */ case DELETE: if (ph==2 && IS_ACTIVE) ] else { /* ph--2 */ } else ( /* ph--2 */ if (IS VALID) ( if (IS_VALID) [ 1f (ph--2) ( if (ph--1) ( cp->state = 0; cp->marks = 0; cp->refc = 0; CD-> aux = 0; break; Case CLRTRING: if (ph==1) ( lelse break; break; break; break; break; ``` ``` /* lower numbered bits have higher priority */ if ((cp->ports[NORTH]->portval) & LOWERBITS(cp->portno)) DEACTIVATE; if (ph==2 66 IS_ACTIVE 66 !((cp->reg[simdinstr.arg1])==0)) if (ph==2 66 IS ACTIVE 66 ((cp->reg(simdinstr.argl])==0)) printaddr(cp->col->colowner); printf("\n"); printaddr(row[ROW(r)].rowowner); printf("\n"); printcelladdr(row[ROW(r)].rowowner); SETBITN(cp.>ports(NORTH)->portval,cp->portno); if (r = cp->reg[simdinstr.arg1], VALIDADDR(r)) val = cp->reg[r]; cp->reg[r] = cp->reg[r+REGNUM]; /*@@*/ cp->reg[r+REGNUM] = val; if (cp->col->colowner != cp->self || row[ROW(r)].rowowner != cp->self) ( cp->reg[r] = cp->reg[r+REGNUM]; /*(@@*/ if (ph==2 66 IS_ACTIVE 66 cp->refc <= 0) printf(" selcell: not owner\n"); SETBIT(cp->state, STATE_COMMIT); SETBIT(cp->state, STATE_COMMIT); CELLAT(r). aux = cp->self; for (r=0; r<REGNUM; r++) { for (r=0; r<REGNUM; r++) [ cp->reg[r+REGNUM] = val; case SELCELL: /* ptr */ if (IS_ACTIVE && ph==2) { if (ph -- . tt IS_ACTIVE) ( case SETRANDOM: /* reg */ if (ph==2 && IS_ACTIVE) [ if (ph--2 66 IS_ACTIVE) ( ] elf. { /* ph --2 */ case TSTNZERO: /* reg */ val = cp->reg[r]; case TSTZERO: /* reg */ printcell(cp); ] else DEACTIVATE; printid(cp); case ARBTILE: if (IS_ACTIVE) { DEACTIVATE; 1f (ph==1) [ case SETCOMNIT: DEACTIVATE; DEACTIVATE; Case TSTINUSE: ) else ( CABE SWAP: break; break; break; break; break; break; break; break: ``` ``` risc.c ``` ``` | trusk: | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | treds | case (LOGACR: /* treg steg */ | tre ``` simd1(CONST, FIBO); SHOW; ``` /* ----- flag = 0; /* Keep track of whether there are any fibo or plus tokens */ while (1) ( /* beginning of main loop */ do_posn(0); do_posnl(2,"fibo(0)=0"); fetchvers=0; storevers=fetchvers; build_cell(cp, ZERO, 1, NULL, NULL); sp = alloc_cell(cp); build_cell(cp,SUCC,1,sp,NULL); SIMD routines for Peano Fibonacci sp = alloc_cell(cp); build_cell(cp,FIBO,2,sp,NULL); cp = alloc_in(ROWNUM-3,4); /* rule (eq (fibo 0) 0) */ /* Build the initial term */ Rimdl (CONST, ZERO); SHOW; SINGL (EQ. ACC, TOK); SHOW; simil(SETMARK, 1); SHOW; simd(INIT); SHOW; for (i=0; i<n; i++) [ cellstate *cp, *sp; progname - 'fibo'; stat_print_loop(); SING(ERASE); SHOW; simd(INIT); SHOW; int reducedlim - 2; finclude "risc.h" root = cp.>self; tinclude <stdio.h> int flag, rent; simd_execute() define 2ERO 1 define FIBO 3 star init(); #define SUCC 2 term_build(n) int start; do_init(); define PLUS ds . do int i; Int n; ``` ``` /* rule (eq (fibo (succ (succ x))) (+ (fibo x) (fibo (succ x)))) */ do_posnl(2,*fibo(s s x) = fibo(x) + fibo(s x)*); simd(INIT); SHOW; /* rule (eq (fibo (succ 0)) (succ 0)) */ simd2(SETMARK, 15); SHOW; /* Success */ sind fetch(NLEFT, LEFT, LEFT); SHOW; do_posn1(2, fibo(s 0) = (s 0) "); simd(INIT); SHOW; simd2(TSTMARKPTR, 6, LEFT); SHOW; simd2(TSTMARKPTR, 1, LEFT); SHOW; simd2(SETMARKPTR,7, LEFT); SHOW; simd2(TSTMARKPTR, 9, LEFT); SHOW; simd2(TSTMARKPTR, 1, LEFT); SHOW; simd2(SETWARKPIR, 3, LEFT); SHOW; simd_incr_any(1,NLEFT); SHOW; Bimd2(MOVE, NTOK, ACC); SHOW; simd1(CLEAR,NLEFT); SHOW; simd1(CLEAR,NRIGHT); SHOW; simd1(CONST,ZERO); SHOW; simd2(MOVE,NTOR,ACC); SHOW; simdl(CLEAR, NRIGHT); SHOW; simdl(CONST, SUCC); SHOW; simd2(EQ, ACC, TOK); SHOW; simd2(EQ, ACC, TOK); SHOW; simdl(TSTMARK, 15); SHOW; simd1(TSTMARK, 15); SHOW; simd2(EQ, ACC, TOK); SHOW; simdl(SETMARK, 10); SHOW; simdl(CONST,SUCC); SHOW; Sindl(CONST, SUCC); SHOW simd1(SETMARK, 4); SHOW; simd1(SETMARK, 5); SHOW; simd1(SETMARK,6); SHOW; simdl(TSTMARK,4); SHOW; Bimdl (CLRMARK, 2); SHOW; simdl(TSTMARK,7); SHOW; simdl(SETMARK,9); SHOW; simd1(CLRMARK,2); SHOW; simdl(TSTMARK,2); SHOW; simd1(SETMARK, 0); SHOW; simdl(TSTMARK, 4); SHOW; simd1(SETMARK, 2); SHOW; simdl(SETMARK, 1); SHOW; simd1(TSTMARK,3); SHOW; simdl(TSTMARK, 5); SHOW; simd1(TSTMARK, 5); SHOW; Simdl(SETMARK, 0); SHOW simd(SENDGLOBAL); SHOW; sind(COMMIT); SHOW; Bind(COMMIT); SHOW; /* use forward ? */ simd(INIT); SHOW; Simd(INIT); SHOW; simd(INIT); SHOW; Sind(INIT); SHOW; simd(INIT); SHOW; if (testglob()) ( simd(INIT); BIEDG(INIT); resetglob(); ``` resetglob(); ``` /* Success */ simd_store(NRIGHT, LEFT, NTOK); SHOW; simd_store(NLEFT, LEFT, NTOK); SHOW; simd store(NRIGHT, TOK, TOK); SHOW; simd_fetch(NTOK, LEFT, LEFT); SHOW; simd fetch(NTOK, LEFT, NTOK); SHOW; FVO [ simd(INIT); SHOM; ] FVO [ simdl(TSTMARK,12); SHOM; ] SVO [ simd(INIT); SHOW; ] SVO [ simdl(TSTMARK,12); SHOW; ] simd_fetch(NTOK,LEFT,LEFT); SHOW; SVO [ simd(INIT); SROW; ) SVO [ simdl(TSTMARK,12); SHOW; ) FVO [ simdl(TSTMARK, 12); SHOH; ] SVO { simdl(TSTMARK, 11); SHOW; } SIMIZ(TSIMARKPTR, 10, LEFT); SHOW; SIM12(SETHARKPTR, 13, LEFT); SHOW; Simil store (NLEFT, TOK, TOK); SHOW; sind2(LOGIOR, FLAGS, ACC); SHOW; simd_incr_any(1,NTOK); SHOW; simd(INIT); SHOW; simd_incr_any(1,NTOK); SHOW; simd2(MOVE, NTOK, ACC); SROW; simdl(COMMITHARK, 12); SHOW; Simil (CLEAR, NRIGHT); SHOW; FVO ( simd(INIT); SHOW; ] Svo ( simd(INIT); SHOW; } simd_alloc(NRIGHT); SHOW; /* rule (ed (+ 0 x) x) */ SIM32 (SETMARK, 12); SHOH; simdl(TSTMARK, 12); SHOW; simd1(TSTWARK, 12); SHOW; simdl(TSTMARK, 12); SHOW; do posnl(2, 0 + x = x^*); simd alloc(NLEFT); SHOW; simil(SETHARK, 11); SHOW; simil(TSTMARK, 13); SHOW; simdl(TSTMARK, 11); SHOW; simdl(CONST, PLUS); SHOW; SIMIL (TSTMARK, 0); SHOW; simd1(CONST, ZERO); SHOW Sind2(EQ, ACC, TOK); SHOW, simdl (CONST, PLUS); SHOW simd2(EQ, ACC, TOK); SHOW Bind1(SETHARR, 1); SHOW; do_posnl(2,"setup"); simd(INIT); SHOW; simd1(CONST, 1); SHOW; it (size<2.mainloop) ( do_delete(); /*@@*/ Simd(ERASE); SHOW; Simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SNOW; Sind(INIT); SHOW; simd(INIT); SHOW; (HINIT); SHOM; do_commit(in(); cotred += 4; ``` ``` /" rule (eg (+ (succ x) y) (succ (+ x y))) "/ simdl(CLEAR,NRIGHT); SHOW; simd_alloc(NLEFT); SHOW; simdl(SETWARK,9); SHOW; /* Success */ /* *imd2(TSTMARKPTR,0, LEFT); SHOW; */ SVO [ simdl(TSTMARK,9); SHOW; } simd_store(NLEFT,RIGHT,RIGHT); SHOW; do_posn1(2,"(s x) + y = s(x + y)"); sind(init); SHOW; aind_incr_any(I.NTOK); SHOM; sind(INIT); SHOM; sind(INIT); SHOM; sind atore(NLEFT,LEFT,NTOK); SHOM; SVO [ sind(INIT); SHOW; ) simd fetch(NTOK,LEFT,LEFT); SHOM; FVO [ simd(INIT); SHOW; ] FVO [ simdl(TSTWARK,9); SHOW; ] sind store(NLEFT, TOK, TOK); SHOW; SVO { simd(INIT); SHOW; } SVO { simd1(TSTWARK,9); SHOW; } SVO [ simd1(TSTMARK,9); SHOW; ] simd2(TSTWARKPTR,8,LEFT); SHOW; simd2(TSTMARKPTR, 1, LEFT); SHOW; simd2(SETWARKPIR, 6, LEFT); SHOW; simd(INIT); SHOW; simd1(CONST,1); SHOW; simd2(LOGAND,ACC,FLAGS); SHOW; simd1(TSTNIERO,ACC); SHOW; /* rule "propagate reduced" */ simd2(MOVE,NTOK,ACC); SHOM; simd1(CLEAR,RIGHT); SHOM; simd1(TSTNOTHARK, 0); SHOW; Svo ( simd(INIT); SHOW; ) simdi(CONST, SUCC); SHOW; simdl(CONST,SUCC); SHOW; simdl(EQ,ACC,TOK); SHOW; simd2(MOVE, NLEFT, RIGHT); simdl(CONST,SUCC); SHOW; simd2(EQ,ACC,TOK); SHOW; eimdl(SETWARK,8); SHOW; simdl(TSTMARK,2); SHOW; simdl(SETMARK, 2); SHOW; simd(COMMIT); SHOW; simdl(CLRMARK,2); SHOW; mimdl(TSTMARK,6); SHOW; simd1(SETMARK,0); SHOW; simdl(TSTMARK,2); SHOW; do_posnl(2, "reduced"); simd1 (CONST, FORWARD); simd(SENDGLOBAL); SHOW; simd2 (MOVE, NTOK, ACC); simd1 (CLEAR, RIGHT); simd(COMMIT); SHOW; SING(INIT); SHOM; simd(INIT), SHOW; simd(INIT), SHOW; if (testglob()) ( start - clock; flag - 1; ``` ``` fibo-simd.c ``` ``` selected b(); sind(SETFRILG); sind(SETFRILG); sind(SETFRILG); sind(CETFRILG); sind(CETFRILG); sind(CETFRILG); sind(CETRRILG); sind(CETRRILG); sind(SETNARGER); sind(CETRTILG); ``` /\* end of main loop \*/ do\_final(); mainloop++; ``` /* begin prop east */ mainloop++; do final(); build_full_cell(&CELLREF(i,j,n),0,((i==4)?7:0), /* alive(i,j,n], */ simd2(HOVE,TOK,ACC); SHOW; /* token holds number of neighbors */ simd2(HOVE,NTOK,ACC); SHOW; /* clear NTOK for later use */ simd1(CONST,l); SHOW; /* ACC holds constant l for increments */ simd1(TSTNZERO,FLAGS); SHOW; flag = 0; /* Keep track of whether there are any fibo or pluses */ (()==0) ? NULL : &CELLREF(1,1-1,n)), ((1==ROWNUM-1) ? NULL : &CELLREF(1+1,1,n)), (()==COLNUM-1) ? NULL : &CELLREF(1,1+1,n)), (((i==0) ? NULL : &CELLREF(1-1,j,n))); /* treat edges as permanent dead zone (null pointers) */ for (i=0; i<ROWNUW; i++) { for (j=0; j<COLNUW; j++) {</pre> simd1(SETWARK, 2); SHOW; /* set mark 2 if alive */ simd2(ADD,TOK,ACC); SHOW; /* increment TOK if alive */ cellstate *cp,*sp,*temp; n=0; /* pdl assumes we always allocate in cell 0 */ /* later we loop to make 4 layers */ for (j=0; j<COLNUM; j++) { temp = alloc_cell_in(1,j,n); if ((i==0) && (j==0)) {root = temp->self; ]; /* Everyone Starts with 0 neighbors */ do_posn(0); do_posn1(2,"Start LIFE Main Loop"); SIMD routines for Conway's Game of LIFE while (1) { /* begin of main loop */ fetchvers=0; storevers=fetchvers, /* Build the initial term */ for (1=0; i<ROWNUM; i++) simdl(CONST,0); SHOW; progname = "life"; simd(ERASE); SHOW; stat_print_loop(); simd(INIT); SHOW; finclude "basic.h" term build(ignore) #include <stdio.h> linclude "risc.h" int flag, rent; simd_execute() stat_init(); int start; do init(); int i, j,n; int ignore; ``` ``` simd1(CLEAR,FLAGS); SHOW; /* He's dead, Jim. Undercrowded. 1- neighbors*/ simd2(GT,TOK,ACC); SHOH; simd1(CLEAR,FLAGS); SHOH; /* He's dead, Jim. Overcrowded. 5* neighbors*/ /* end of real loop */ if ((CELLAT(root).CELLFLAGS)&1) break; /* quit when root alive */ simd2(LOGAND, ACC, NTOK); SHOW; /* high (2) bit */ sind(INIT); SHOW; sind2(MOVE,NTOK,TOK); SHOW; sind2(LOGAND,ACC,NTOK); SHOW; /* low (1) bit */ sind1(TSTNZERO,ACC); SHOW; /* begin life/death */ do_posn(3); do_posnl(2,"Start Life/Death"); simd(INIT); SHOW; simdl(CONST,1); /* ACC is 1 again */ simdl(CONST,2); SHOW; /* ACC is 2 */ simd1(CONST,2); SHOW; /* ACC is 2 */ do_posn(1); do_posn1(2, "Prop E-W"); simd(INIT); SHOW; simd2(TSTWARRPTR,2,RIGHT); SHOW; do_posn(2); do_posnl(2, *Prop N-S*); simd(INIT); SHOW; simd2(TSTMARRPTR, 2, NRIGHT); SHOW; simd(INIT); SHOW; simd2(TSTMARKPTR,3,NLEFT); SHOW; simd2(TSTMARKPTR, 4, NLEFT); SHOW; simd2(TSTMARKPTR, 4, LEFT); SHOW; simd2(TSTMARKPTR, 3, LEFT); SHOW; /* begin prop north-south */ *imd2(MOVE, FLAGS, ACC); SHOW; simdl(TSTNZERO, ACC); SHOW; simd2(ADD, TOK, ACC); SHOW; simd2(ADD, TOK, ACC); SHOW; simd2(ADD, TOK, ACC); SHOW; simd(INIT); SHOW; simd2(ADD, TOK, ACC); SHOW; Bimd2(ADD, TOK, ACC); SHOW; simd2(ADD, TOK, ACC); SHOW; /* setup for prop N-S */ simdl(SETMARK, 3); SHOW; simd2(EQ, TOK, ACC); SHOW; simd2(LT, TOK, ACC); SHOW; 1 /* end of main loop */ simd1(SETMARK, 4); SHOW; /* begin prop west */ simd1(CONST, 3); SHOW; simd(INIT); SHOW; simdl(CONST,5); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; ``` ``` suncirc2.c ``` ``` hz " tim2.tms_utime - tim1.tms_utime + tim2.tms_stime - tim1.tms_stime; printf("time taken: 410.3f ms\n",hz*scale/60.0); sscanf(argv[1],"ed", sarg); for (i=0; i<lim; i++) { 52.167 ms unsigned char nxt[500]; struct tms tim1, tim2; unsigned char dat[500]; #include <ays/types.h> #include <ays/times.h> assault's suncirc2 400 scale=10; /* @@ */ lim - 1000/scale; int i,scale, lim; 15 MIPS machine times(6tim2); main(argc, argv) times(ttim1); run(arg); char **argv; time taken: ``` ``` if (((xfxix)) nn |= 1<<0; if (((TSTBIT(cc,0)&tSTBIT(cc,0))) nn if (((TSTBIT(cc,1)&tSTBIT(cc,1))) nn if (((TSTBIT(cc,1)&tSTBIT(cc,2))) nn if (((TSTBIT(cc,1)&tSTBIT(cc,2))) nn if (((TSTBIT(cc,1)&tSTBIT(cc,4))) nn if (((TSTBIT(cc,5)&tSTBIT(cc,5))) nn if (((TSTBIT(cc,5)&tSTBIT(cc,5))) nn if (((TSTBIT(cc,7); nx = TSTBIT(cc,7); nxt[i] = nn;</pre> printf("%1x",cl); for (i=0;i<35;i++) printf("%02x",dat[i]);</pre> for (1=0; 1<size; 1++) dat[1] = nxt[1]; putchar('\n'); cl - cln; • int cl.cln; size = 50; for (1 0<n; n--) { nn = 0; cc = cl; if ((TSTBIT(cc, 3) & TSTBIT(cc, 3))) nn |- 1<<0; if ((TSTBIT(cc, 0) | TSTBIT(cc, 0))) nn |- 1<<1; if ((TSTBIT(cc, 0) | TSTBIT(cc, 1))) nn |- 1<<1; if ((TSTBIT(cc, 2) | TSTBIT(cc, 1))) nn |- 1<<2; if ((TSTBIT(cc, 2) | TSTBIT(cc, 1))) nn |- 1<<2; }</pre> /* cc -0 -o sunctrc2 sunctrc2.c */ *define NAND(x,y) (-(x)\{(y)\}) #define IOR(x,y) ((x)\{(y)\} #define TSTBIT(x,n) (x)\{(x)\{(x)\} register int i, size, cc, nn, x; cln = nn; x = cc&l) for (1=0; l<&ize; i++) ( cc - dat[1]; ``` nn = 0; run (n) int n; ``` simd2d.c ``` ``` /* Build the initial term */ else ( cellstate *cp; term_build(num) | else ( cols = rows; if (rnd) exit(-1); char *val; - 1; int num; printf("[%dxxd -> %d-%dx%d:%d]\n",r,c,res,ROW(res),COL(res),NUM(res)); } if (r<ROWNUM) { *nr = r; } else { *nr = 2*ROWNUM-2 · r; rn += 2; } if (c<COLNUM) { *nc = c; } else { *nc = 2*COLNUM-2 · c; rn += 1; } if (r<ROWNUM) { nr = r, } else { nr = 2*ROWNUM-2 - r; nn += 2; } if (c<COLNUM) { nc = c, } else { nc = 2*COLNUM-2 - c; nn += 1; } /* Mapping of abstract grid onto ensemble */ /* G0 and G1 constitute total order chain */ (int res; res = ADDR(nr,nc,nn); /* produces an "address" */ SIMD routines for 2D sort define MARKN(x) (1<<(x)) /* #define VERSION1 /* */ return ADDR(nr,nc,nn); #define REDUCEDMARK 0 int r.c, *nr, *nc, *nn; #define NUMBERWARK 1 map2e(r,c,nr,nc,nn) Idefine BASIC /* */ tinclude <stdlib.h> #include <stdio.h> finclude "basic.h" finclude "risc.h" Idefine NULLPTR 0 tdefine SORTER 1 int nr,nc,nn; /* not used */ #define GO 2 #define GI 3 #define G2 4 #define G3 5 #define M1 6 /* tokens */ /* marks */ *nn • rn; map2a(r,c) int rn; , . int r,c; ``` ``` (2000 - 32*1 - j) : (2000 + 32*1 - j)); /* alternate bad case 2 */ /* cp->CELLFLAGS = 10000 - 32*1 - j; /* bad case */ /* cp->CELLFLAGS = 10000 + 32*1 - j; /* alternate bad case */ /* cp->CELLFLAGS = (2*cols-1)*(2*rows-2 - i) + (2*cols-2) - j; /* */ ms = MARKN(NDVBERMARK); if ((i+j)1) [ ms |= MARKN(G0); } else [ ms |= MARKN(G1); ] if (j1) [ ms |= MARKN(G2); } else [ ms |= MARKN(G3); } if (j == 2*cols-2) ms |= MARKN(G2); val = getenv("SEED"); if (val != NULL) { rnd = 1; sscanf(val,"%d",&randomstate); } else rnd = 0; if ((2*ROWNUM-1) < num | (2*COLNUM-1) < num) printf("Size is too large: %d\n", num);</pre> int i, j, r, c, n, ms, lp, rp, v, rows, cols, rnd; if (0<j) lp = map2a(i,j-1); /* NEXT OPTIONAL */ if (0<j) rp = map2a(i,j-1);</pre> 1f (0<j) rp = map2a(1,j-1); cp->CELLFLAGS = randoom(); for (i=0; i<2*rows-1; i++) ( cp = alloc_cell_in(r,c,n); for (j=0; j<2*cols-1; j++) { if (1 < (2*rows-2)) ( lp = map2a(i,j-1); /* NEXT OPTIONAL */ lp = map2a(i+1,j); rp = map2a(1, j-1); rp - map2a(1,j-1); lp - map2a(1-1, j); cp->CELLTOK = SORTER; map2e(1, ], &r, &c, &n); cp->CELLFLAGS = Cp.->marks - ms; rp = NULLPIR; if ()&1) ( 1p - NOLLPTR; ((161) ? rows = (num+1)/2; if (0<i) [ ``` ``` simd2d.c ``` ``` fetchvers=1; storevers=fetchvers; /* @@ */ shownext = 0; if (15<cnt) { printf("\n"); cnt=0; ] while (1) ( /* begin of main loop */ /* top node must assume priority */ cp->marks |= MARKN(G0) | MARKN(G2); root = map2a(2*rows-2,2*cols-2); printf(" %d", cp->CELLFLAGS); int start, s, flag, putr, none0; /* Improved worst case */ if (0 cent) printf("\n"); if (verbose) showdat(); cp -> CELLRIGHT " rp; sifdef BASIC s++; if (2<=s) s = 0; switch (s) ( case 0: cp->CELLIEFT = 1p; s++; if (9<=s) s = 0; switch (s) [ cp = &CELLAT(root); progname = "2D sort"; p = cp. >CELLLEFT; cp - 4CELLAT(p); while (p 1=0) { flag = G0; pntr = LEFT; none0 = 0; flag = G2; pntr = RIGHT; cellstate *cp; stat_init(); simd_execute() /* 2D Sort */ do_init(); int p, cnt; break; break; cnt = 0; case 1: showdat() s - 1; ``` ``` simd2(TSTMARRPTR,NUMBERMARK,pntr); SHOW; simd_fetch(NFLAGS,pntr,FLAGS); SHOW; simd(SENDGLOBAL); SHOW; cntred++; simdl(SETWARK,M1); SHOW; if (!testglob() && s == 0) break; simd_store(pntr,FLAGS,ACC); SHOW; do_posn1(2, check overlapping*); simd(INIT); SHOW; simd1(TSTWARE,MI); SHOW; do_posnl(2,"sort perform swap"); simd(INIT); SHOW; simdl(TSTMARK, NUMBERWARK); SHOW; simd2(CERMARKPTR, M1, pntr); SHOW; simd2(CLRMARRPTR, M1, pntr); SHOW; simd2(TSTWARRPTR, M1, pntr); SHOW; #Imd2(HOVE, FLAGS, NFLAGS); SHOW; do_posnl(2,"sort decide swap"); simd(INIT); SHOW; simd2(GT, FLAGS, NFLAGS); SHOW; simdi(TSTHARK,H1); SHOW; simdi(CLRHARK,H1); SHOW; simdi(HOVE,ACC,FLAGS); SHOW; simd1(TSTMARK,flag); SHOW; do_posnl(2,"sort start"); simd(INIT); SHOW; simd(INIT); SHOW; simd1(TSTMARK,M1); SHOW; #1fdef VERSION1 simd1(SETMARK, H1); SHOW; simdl(TSTMARK, H1); SHOW; simd1(CLRMARK, M1); SHOW; SIMG2(CLRMARK, M1); SHOW; stat_print_loop(); flag = G2; pntr = RIGHT; flag = GO; pntr = LEFT; resetglob(); none0 = 0; (0) usod op bresk; break; case 0: case 1: case 2: case 5: case 6: case 7: case 8: case 4: case 3: #endif ``` mainloop\*\*; ] /\* end of main loop \*/ certose showdat(); do\_final(); ``` simd-circuit.c ``` ``` cp = allocceli(x,y,i); cp->CELLTOK = tok; cp->PBATKS = mATKS; cp->CELLFLOGS = flags; cp->CELLFLOGS = flags; if (left==NULL) cp->CELLLEFT = 0; else cp->CELLRIGHT = right->self; return cp; if (ptr==NULL) cp->reg[r] = 0; else cp->reg[r] = ptr->self; res = AT(x,y,i); if (TSTBIT(res->state,STATE_VALID)) printf("cell %dx%d:%d already allocated\n",x,y,i); SETBIT(res->state,STATE_VALID); make_cell(x,y,i,tok,marks,flags,left,right) cellstate *left,*right; int x,y,i,tok,marks,flags; SIMD routines for Circuit Simulation tdefine AT(x,y,i) (&CELLREF(x,y,i)) /* Build the initial term */ setptr(AT(x,y,i),r,ptr); } /* tokens and marks */ setreg(x,y,i,r,ptr) cellstate *ptr; int x,y,i,r; finclude 'basic.h' Idefine NANDMARK 4 setptr(cp,r,ptr) cellstate *cp,*ptr; #include <stdio.h> #define NANDTOK 1 linclude "risc.h" cellstate *res; define ORMARK 5 alloccell(x,y,i) res->refc = 1; cellstate *cp; #define ORTOK 2 return res; cellstate . cellstate * int x,y,i; int r; ``` ``` term_build(n) int n; { int i; cellstate "cp,*sp; int st,ndir,r.c; sp = make_cell(0,0,0,NANDTOK,0,0,NULL,NULL); cp = make_cell(1,0,0,ORTOK,0,0,sp,sp); cp = make_cell(1,1,1,ORTOK,0,0,cp,cp); cp = make_cell(0,1,1,ORTOK,0,0,cp,cp); cp = make_cell(0,1,1,ORTOK,0,0,cp,cp); cp = make_cell(0,1,1,ORTOK,0,0,cp,cp); cp = alloc_cell(sp); build_cell(cp,NANDTOK,0,sp,sp); sp = cp; } ``` ``` simd(INIT); SHOW; simd(SETTRYING); SHOW; simd(SETTRYING); SHOW; simd2(TSTWARKPTR,0,LEFT); SHOW; simd2(TSTWARKPTR,0,REGHT); SHOW; simd1(CLEMARR,0); SHOW; simd2(TSTWARKPTR,0,LEFT); SHOW; simd2(TSTWARKPTR,0,LEFT); SHOW; simd1(SETWARK,0); SHOW; simd2(TSTWARK,0); SHOW; simd1(SETWARK,0); SHOW; simd1(SETWARK,0); SHOW; simd(INIT); SHOW; simd1(TSTWARK,ORMARK); SHOW; simd2(SETTRING); SHOW; simd2(TSTWOTHARRPTR,O,LEFT); SHOW; simd1(TSTWOTHARRPTR,O,REHT); SHOW; simd1(TTTRING); SHOW; simd1(SETHARRPTR,O,LEFT); SHOW; simd1(SETHARRPTR,O,LEFT); SHOW; simd1(SETHARRPTR,O,REFT); SHOW; while (1) ( /* begin of main loop */ stat_print_loop(); do_init(); fetchvers-1; storevers-fetchvers; shownext = 0; simdl(CONST,NANDTOK); SHOW; simd2(EQ,TOK,ACC); SHOW; simdl(SETWARK,NANDWARK); SHOW; simd(INIT); SHOW; simdl(CONST,ORTOK); SHOW; simdl(EQ,TOK,ACC); SHOW; simdl(SETMARK,ORMARK); SHOW; simdl(SETHARK, 0); SHOW; do_posn(0); do_posn1(2,"nand"); progname = "circ"; simd(INIT); SHOW; do_posnl(2, or*); /* Bubble Sort */ termshow = 0; stat_init(); sim execute() do_posn(1); int start; ``` ``` mainloop++; if (size <= mainloop) break; j /* end of main loop */ verbose = 0; do_final(); }</pre> ``` SIMD routines for Finite Element Method ... Hex Grid Each RRM Cell contains one hex grid element, which has 6 particle slots, one in each of 6 directions. These bits are kept in the lowest 6 bits of the FLAGS register. The extra two neighbors (NE and SW) are tested through the We keep pointers to NESW neighbors in R,L,NR,NL registers. All collisions preserve the total momentum and the number of particles. All particles have unit mass, unit velocity. In one timestep (one big loop) particles exit one cell, enter another, and possibly collide. The collision rules are the hardest to encode. The model is a hexagonal grid of space. Between each neighboring grid element, there is space for two particles, one in each direction. A particle continues in a straight line unless it collider with some other particle at the center of a hex grid. Otherwise it passes through. N and S nieghbors, and the information passed on. The model is implemented by having one RRM cell represent each hexagonal grid element. At the top of the big loop, each cell's FLAGS register contains 6 bits determining the presence or absence of 6 particles exiting the cell in each of 6 directions. grid element can be thought of as follows: the "o" is the center of the element. Here, the A,//,v are supposed to be arrows. The cell contains 6 possible particles, at most one on each of the above arrows. These bits (yes or no, a particle is on that track) are represented by the lowermost 6 bits of the the FLAGS register. (the particle The movement proceeds in two phases. In the first phase, any particle in element #1 on the indicated path is passed on to element #2. (the parti particles heading invard toward the center of an element move through the center, and arrive on the path leading out from the grid element. In the second phase, moves in its indicated direction). point of a grid element. However, if two particles meet head-on, they will deflect: In most cases, the particles move unimpeded through the center That is, two particles collide, and scatter through 60 degrees. The same may happen if three or four particles collide. In this model, collisions preserve momentum and number of particles, which nearly determines all the collision rules. The only degree of freedom remaining is the choice of scattering head-on collisions 60 degrees clockwise, or 60 degrees counter-clockwise. #Include <stdio.h> include "risc.h" !include 'basic.h' define ALLDIRS 63 /\* Build the initial term \*/ / Similar to SIMD-LIFE term\_build(ignore) int ignore; inc 1, j,n; cellstate \*cp,\*sp,\*temp; n=0; /\* pdl assumes we always allocate in cell 0 \*/ /\* later we loop to make 4 layers? (flip) \*/ ``` simd(INIT); SHOW; for (i=0; i<ROWNUW; i++) { for (j=0; j<COLNUW; j++) { build_full_cell(&CELLREF(i,j,n),0,(((i*j)/5) & ALLDIRS),/*state(1,j,n)*/</pre> /* note that ((i*j)/5) gives a distribution */ /* with no particles in the 0,0 area, and lots */ /* of particles in the COLNUM,ROWNUM area */ /* so we can detect termination by presence */ /* of particles in cell 0,0 */ ((j=-0) ? NULL : &CELLREF(i,j-1,n)), ((j=-ROWNUM-1) ? NULL : &CELLREF(i+1,j,n)), ((j=-COLNUM-1) ? NULL : &CELLREF(i,j+1,n)), ((1==0) ? NULL : &CELLREF(1-1, j,n)); /* treat edges as permanent dead zone (null pointers) */ simd2(LOGAND.ACC.FLAGS); SHOW; /* is the lst bit ON? */ simd1(TSTNZERO.ACC); SHOW; simd(INIT); SHOW; simd1(CONST.4); SHOW; simd2(LOCAND,ACC,FLAGS); SHOW; /* is the 3rd bit ON? */ simd1(TSTNZERO,ACC); SHOW; simd(INIT); SHOM; simd1(CONST,16); SHOM; simd2(LOGAND,ACC,FLAGS); SHOM; /* is the 5th bit ON? */ simd2(LOGAND, ACC, FLAGS); SHOW; /* is the 2nd bit ON? */ simd2(LOGAND, ACC, FLAGS); SHOW; /* is the 4th bit ON? */ do_posn(0); do_posnl(2, "Start Fluid Hain Loop"); /* transfer FLAGS bits into marks 1-6 */ for (j=0; j<COLNUM: j++) { temp = alloc_cell_in(i,j,n); if ((i==0) && (j==0)) {root = temp->self;}; while (1) [ /* begin of main loop */ fetchvers=0; storevers=fetchvers /* FLAGS now has good bits */ simdl(TSTNZERO, ACC); SHOW; simdl(TSINZERO, ACC); SHOW; simdl(TSTNZERO, ACC); SHOW; for (i=0; i<ROWNUM; i++) ( simd1(SETWARK, 2); SHOW; simdl(SETMARK, 3); SHOW; simdl(SETWARK,4); SHOW; simdl(SETMARK, 1); SHOW; simdl(SETMARK, 5); SHOW, simd1(CONST, 8); SHOW; simdl(CONST, 1); SHOW; mindl(CONST, 2); SHOW; proqueme - "fluid"; simd(ERASE); SHOW; stat_print_loop(); simd(INIT); SROW; sind(INIT); SHOW; simd(INIT); SNOW; stat_init(); simd_execute() /* Start */ do init(); ``` ``` simd1(EQ.ACC.FLAGS); SHOW; simd1(TSTNOTHARK.15); SHOW; /* didn't just become this */ simd1(CONST,32); SHOH; simd2(LOGAND,ACC,FLAGS); SHOM; /* is the 6th bit ON? */ simd(CLRWARK,15); /* clear this for done-test */ simd1(CONST,21); SHOW; /* begin real propagation "/ do_posn(2); do_posn1(2, "Propagate All"); simd(INIT); SHOW; simd2(LOGIOR,FLAGS,ACC); SHOW; do_posn(4); do_posn(2,"Do collisions"); /* first three "/ do_posn(1); do_posn1(2, "Prop Diagonal"); simd2(MOVE, FLAGS, ACC); /* flags = 0 */ simd2(TSTMARKPTR, 2, NRIGHT); SHOW; simd2(TSTWARRPTR,1,NRIGHT); SHOM; simd1(CONST,1); SHOW; simd2(LOGION,FLAGS,ACC); SHOW; simd(INIT); SHOW; simd2(TSTWARRPTR,8,NRIGHT); SHOW; simdl(SETHARK,6); SHOW; /* begin diagonal propagation */ simd2(TSTMARKPTR, 3, NLEFT); SHOW; #imd2(TSTMARKPIR, 4, RIGHT); SHOW; Bimd2(ISTMARRPIR, 7, RIGHT); SHOW simd2(TSTMARKPTR, 5, LEFT); SHOW; #imd2(TSTWARRPTR,6,LEPT); SHOW; simdl(CONST, 16); SHOW; simd2(LOGIOR, FLAGS, ACC); SHOW; simd1(CONST, 2); SHOW; simd2(LOGIOR, FLAGS, ACC); SHOW; Bimd2(LOGIOR, FLAGS, ACC); SHOW; simd2(LOGIOR, FLAGS, ACC); SHOW; #Imd2(MOVE, FLAGS, ACC); SHOM; SIMOZ (MOVE, FLAGS, ACC); SHOM; simd2(MOVE, FLAGS, ACC); SHOW simd1 (TSTNZERO, ACC); SHOW; mimd(INIT); SHOM; mimd1(CONST,54); SHOM; mimd2(EQ,ACC,FLAGS); SHOW; simd2(EQ, ACC, FLAGS); SHOW; simdl(SETMARK, 15); SHOW; simdl(SETHARK, 15); SHOW; simdl(CONST, 27); SHOW; SIMOI(SETMARK, 7); SHOW; simdl(CONST, 42); SHOW; # Lind1 (SETMARK, 8); SHOW simdl(CONST,21); SHOW; simd(INIT); SHOW; simd1(CONST,42); SHOW; simd1(CONST, 32); SHOW Bind1 (CONST, 0); SHOW; simd1(CONST,8); SHOW; simdl(CONST, 4); SHOW; /* second three */ imd(INIT); SHOW; sind(INIT); SHOW; Simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; /* first four */ ``` simdl(CLRMARK, 16); SHOW; ``` simd2(EQ.ACC.FLAGS); SHOW; /* note no need for 15 check */ simd1(CONST.54); SHOW; /* since 54 not created */ simd2(EQ.ACC,FLAGS); SHOW; simd1(TSTNOTWARK,15); SHOW; /* didn't just become this */ simdl(SETWARK,16); SHOW; /* 16 is important case */ simdl(SETWARK,15); SHOW; /* 15 we are done now */ do_posn(5); do_posn(2,"Do Collisions (oppl)"); simd(INIT); SHOM; simd1(CONST.9); SHOM; simd2 (MOVE, NFLAGS, ACC); SHOW; simd2 (LOGAND, ACC, FLAGS); SHOW; simdl(CONST,54); SHOW; simdl(LOGAND,ACC,FLAGS); SHOW; simdl(HOVE,NFLAGS,ACC); SHOW; simd1(CONST,54); SHOW; / simd2(MOVE,FLAGS,ACC); SHOW; simd2(MOVE, FLAGS, ACC); SHOW; simd2(HOVE, FLAGS, ACC); SHOW; simd1(CONST,22); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simd2 (MOVE, FLAGS, ACC); SROW; simdl(CONST,32); SHOW; simd2(EQ,ACC,NFLAGS); SHOW; simdl(CONST,50); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simdl(CONST,2); SHOW; simd2(EQ,ACC,NFLAGS); SHOW; simd1(CONST,38); SHOW; simd2(EQ, ACC, NFLAGS); SHOW; simdl(CONST,4); SHOW; simdl(EQ,ACC,NFLAGS); SHOW; simdl(CONST,16); SHOW; simd2(EQ,ACC,NFLAGS); SHOW; SIMIZ(EQ, ACC, NFLAGS); SHOW; simdl(SETMARK, 15); SHOW; simdl(CLRMARK, 16); SHOW; simd1(SETMARK, 15); SHOW; simdl(TSTMARK, 16); SHOW; simd1(CLRMARK, 16); SHOW; Bimdl(TSTMARK, 16); SHOW; simdl(TSTMARK, 16); SHOW; /* start opposition ONE */ SIMIL (CLRMARK, 16), SHOW; RIMAI (TSTMARK, 16); SHOW; simdl(CLRMARK, 16); SHOW; simd1(CONST, 45); SHOW; simd1(CONST,52); SHOW; simd(INIT); SHOW; simd1(CONST,45); SHOW; simd(INIT); SHOW; simd1(CONST,27); SHOW; sim11 (CONST, 18); SHOW; simil(CONST, 0); SHOW; /* second three */ /* fourth three */ simd(INIT); SHOW; /* first three "/ /* second four */ /* third three */ (MOHS ((LINI)) Simd(INIT); SHOW; SIMONS (LINI) /* third four */ /* first two */ ``` SIMIZ (MOVE, FLAGS, ACC); SHOW; ``` simd2(E),ACC,NFLAGS); SHOH; simd1(TSTNOTWARK,15); SHOH; /* didn't just become this */ simd1(SETWARK,16); SHOH; /* 16 is important case */ simdl(TSTNOTWARK,15); SHOW; /* didn't just become this */ simdl(SETWARK,16); SHOW; /* 16 is important case */ simdl(SETWARK,15); SHOW; /* 15 wc are done now */ Simdl(SETHARK, 15); SHOW; /* 15 we are done now */ do_posn(6); do_posnl(2,"Do Collisions (opp 2)"); simd(INIT); SHOW; simdl(CLRMARK,16); SHOW; /* start opposition THREE */ do_posn(?); do_posn1(2,*Do Collisions (opp 3)*); simd(INIT); SHOW; SIMU2(LOGAND, ACC, FLAGS); SHOW; simd2(LOGAND, ACC, FLAGS); SHOW; simdl(CONST,45); SHOW; simd2(LOGAND,ACC,FLAGS); SHOW; simd1(CONST, 27); SHOW; simd2(LOGANL, ACC, FLAGS); SHOW; simd2(MOVE, NFLAGS, ACC); SHOW; simd2 (MOVE, NFLAGS, ACC); SHOW; SIMU2 (MOVE, NFLAGS, ACC); SHOW; sind1(CONST,4); SHOW; sind2(EQ,ACC,NFLAGS); SHOW; sind1(CONST,13); SHOW; sind2(MOVE,FLAGS,ACC); SHOW; simd2 (MOVE, FLAGS, ACC); SHOW; simd1(CONST,44); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simd1(CONST,41); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simd1(CLRMARK,16); SHOW; simd2(EQ, ACC, NFLAGS); SHOW, simd1(const,8); Show; simd2(EQ,ACC,NFLAGS); SHOW; simd2(EQ, ACC, NFLAGS); SHOW; simd2(EQ, ACC, NFLAGS); SHOW; SING2(EQ, ACC, NFLAGS); SHOW; simdl(TSTMARK, 16); SHOW; simdl(TSTMARK, 16); SHOW; simd1 (CLRMARK, 16); SHOW; simdl(TSTWARK, 16); SHOW; simdl(CLRMARK, 16); SHOW; simdl(CLRMARK, 16); SHOW; simd(INIT); SHOW; simd1(TSTMARK,16); SHOW; /* start opposition TWO */ simd1(CONST, 37); SHOW; #imd1(CONST, 36); SHOW; simd1(CONST, 18); SHOW; simdl(CONST, 32); SHOW; sindl(CONST, 36); SHOW; simdl(CONST,1); SHOW; simd1(CONST,0); SHOW; /* second three */ /* fourth three */ simd(INIT); SHOW; /* first three */ sind(INIT); SHOW; /* third three */ simd(INIT); SHOW; /* first two */ ``` ``` simd-fluid.c ``` ``` /* first three */ simdl(CONST.8); SHOM; simdl(CONST.8); SHOM; simdl(CONST.26); SHOW; simdl(CONST.26); SHOW; simdl(CLRNARK.16); SHOW; /* second three */ sind1(CONST,16): SHOW; sind1(EQ.ACC,NFLAGS); SHOW; sind1(CONST,25): SHOW; sind1(MOVE,FLAGS,ACC); SHOW; sind1(CLRHARK,16); SHOW; /* third three */ simd(INIT): SHOW; simd(TSTMARK,16); SHOW; simd1(CONST,1); SHOW; simd1(CONST,19); SHOW; simd1(CONST,19); SHOW; simd2(HOVE,FLAGS,ACC); SHOW; simd1(CLRMARK,16); SHOW; SIMONE, NFLAGS, ACC); SHOW; simdl(CONST,11); SHOW; simd2(HOVE,FLAGS,ACC); SHOW; simdl(CLRMARK,16); SHOW; simd2(EQ.ACC,NFLAGS); SHOW; simd1(CONST,9); SHOW; simd2(MOVE,FLAGS,ACC); SHOW; simd1(CONST,2); SHOW; simd2(EQ,ACC,NFLAGS); SHOW; simd(INIT); SHOW; simdl(TSTMARK,16); SHOW; simd(INIT); SHOW; simdl(TSTMARK,16); SHOW; simd(INIT); SHOW; simdl(TSTMARK,16); SHOW; simdl(CONST,0); SHOW; /* fourth three */ " first two "/ ``` if ((CELLAT(root).CELLFLAGS)&1) break; /\* quit when particle there \*/ mainloop++; } /\* end of main loop \*/ do\_final(); /\* end oppositions \*/ do\_posn(8); do\_posn1(2, "End of collisions"); /\* now FLAGS is ready for next iteration \*/ /\* end of real loop \*/ ``` flag = 0; /* Keep track of whether there are any fibo or pluses */ while (1) [ /* begin of main loop */ SIMD routines for Numeric Fibonacci do_init(); fetchvers=0; storevers=fetchvers; build_cell(cp, NUMB, n, NULL, NULL); sp = alloc_cell(cp); build_cell(cp,FIBO,0,sp,NULL); /* Build the initial term */ cp = alloc_in(ROWNUM-3,4); /* fibo */ do_posnl(2,"fibo base"); simd(INIT); SHOW; simd(ERASE); SHOW; simd1 (CONST, NUMB); SHOW; simi2(EQ, ACC, TOK); SHOW; simd1 (SETHARK, 1); SHOW; cellstate *cp, *sp; progname = "nfib"; stat_print_loop(); int reducedlim - 2; #include <stdio.h> finclude 'basic.h' define FIBOHOLD 4 root - cp. self; Binclude "risc.h" #define FIBO 2 define NUMB 1 stat_init(); simd_execute() int flag; int opt = 1; term_build(n) (0) usod op int start; cb = 8b; opt = 1; int 1; ``` ``` simd2(SUB, NFLAGS, ACC); SHOM; simd_store(NRIGHT, NFLAGS, NFLAGS); SHOM; sind_store(NLEFT, NFLAGS, NFLAGS); SHOM; sind2(MOVE,TOK,ACC); SHOW; sind2(MOVE,FLAGS,NFLAGS); SHOW; sind1(CLRWARK,3); SHOW; if (opt) { sind1(CLEAR,LEFT); SHOW; } if (CELLAT(root).marks6(1<<1)) break; simd1(SETMARK,3); SHOW; simd_fetch(NFLAGS,LEFT,FLAGS); SHOW; simd_store(LEFT, FLAGS, NFLAGS); SHOW; simd2(SETWARKPIR,7,NRIGHI); SHOW; simd2(SETWARRPTR, 8, NRIGHT); SHOW; simd2(SETWARRPTR, 6, NRIGHT); SHOW; simdl(SETMARK,5); SHOW; simd2(SETMARKPTR,6,NLEFT); SHOW; simd2(TSTMARKPTR, 0, NLEFT); SHOW; simd2(SETWARKPTR, 10, LEFT); SHOW; simd2(TSTMARKPTR,1,LEFT); SHOW; simd2(SUB, NFLAGS, ACC); SHOW; simd_incrptr(-1,LEFT); SHOW; simdl(CONST,NUMB); SHOW; simd2(LE, NPLAGS, ACC); SHOW; #imd2(MOVE, TOK, ACC); SHOH; simd2(MOVE, TOK, ACC); SHOW; simd_alloc(NRIGHT); SHOW; simd_alloc(NLEFT); SHOW; simd(INIT); SHOW; simdl(CONST,FIBO); SHOW; simd2(EQ,ACC,TOK); SHOW; mimd2(CONST,FIBO); SHOW; simdl(TSTMARK,10); SHOW; simdl(CONST,NUMB); SHOW; simd1(TSTMARK,3); SHOW; do_posnl(2,"fibo gen"); simdl(SETMARK,9); SHOW; simdl(TSTMARK,9); SHOW; simdl(SETMARK,2); SHOW; simdl(TSTMARK,3); SHOW; simd_alloc(LEFT); SHOW; simdl(SETHARK,8); SHOW; simdl(SETMARK, 4); SHOW; simdl(TSTMARK,6); SHOW; sindl(TSTMARK,7); SHOW; simd_alloc(LEFT); SHOW; simdl(TSTMARK,6); SHOW; simdl(SETMARK, 8); SHOW; simdl(TSTMARK,5); SHOW; simdl(CONST,1); SHOW; simd(SENDGLOBAL); SHOW; simdl(CONST,1); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW, if (testglob()) ( simd(INIT); resetglob(); ``` ``` simd-numeric-fibo-a.c ``` ``` do_commitfin(); /* was last, then second */ mainloop++; ) /* end of main loop */ do_stat_restore(); do_relocate(); do_forward(); do_delete(); special = 0; do_final(); /* ----- simd(INIT): SHOW; simd1(TSTMARK,11); SHOW; simd_fetch(ACC,RIGHT,FLAGS); SHOW; simd(INIT): SHOW; simd1(TSTMARK,11); SHOW; simd1(ADD,FLAGS,ACC); SHOW; simd_incrptr(-1,LEFT): SHOW; simd_incrptr(-1,LEFT): SHOW; simd_incrptr(-1,RIGHT); SHOW; if (opt) { simd1(CLEAR,LEFT); SHOW; } simd1(SETWARK,11); SHOW; simd_fetch(FLAGS,LEFT,FLAGS); SHOW; do_posn(1); do_posn1(2, *special*); simd2(EQ,TOK,ACC); SHOW; simd2(TSTWARKPTR,1,LEFT); SHOW; simd2(TSTWARKPTR,12,RIGHT); SHOW; simd1(INIT); SHOW; simd1(TSTWARK,12); SHOW; simd2(TSTMARKPTR, 1, RIGHT); SHOW; simd(SENDGLOBAL); SHOW; simd2(TSTWARKPTR,1,LEFT); SHOW; sindl(TSTMARK,7); SHOW; sindl(CONST,10); SHOW; sindl(LT,ACC,NFLAGS); SHOW; sindl(CONST,FIBOHOLD); SHOW; sindl(MOVE,TOK,ACC); SHOW; sindl(CONST,FIBOHOLD); SHOW; sindl(EO,TOK,ACC); SHOW; sindl(CONST,FIBO); SHOW; sindl(MOVE,TOK,ACC); SHOW; do_posnl(2,"fibo release"); simd(INIT); SHOW; simd1(TSTWARR,4); SHOW; simd1(CONST,PLUS); SHOW, simd2(MOVE,NTOR,ACC); SHOW; Simd2(MOVE, TOK, ACC); SHOW; simdl(CLEAR,NFLAGS); SHOW; simdl(COMMITMARK,9); SHOW; simdi(CONST, NUMB); SHOW; simd(INIT); SHOW; simd1(CONST,PLUS); SHOW; do_poshl(2,"plus"); simd(INIT); SHOW; simd1(CONST,PLUS); SHOW; simd1(CONST,PLUS); SHOW; do_posnl(2,"fibo hold"); simd(INIT); SHOW; if (testglob()) ( /. snId ./ resetglob(); /• new •/ ``` /\* excluding actions in autonomous processes from basic counts \*/ do\_stat\_save(); special = 1; ## simd-peano-fibo.c ``` /# ............... flag = 0; /* Keep track of whether there are any fibo or pluses */ while (1) { /* begin of main loop */ stat_print_loop(); do_posn(0); do_posn1(2,"fibo(0)=0"); simd(INIT); SHOW; do_init(); fetchvers=0; storevers=fetchvers; build cell(cp, SUCC, 1, sp, NULL); build_cell(cp, ZERO, 1, NULL, NULL); SIMD routines for Peano Fibonacci build cell(cp, FIBO, 2, sp, NULL); cp = alloc_in(ROWNUM-3,4); root = cp->self; /* rule (eq (fibo 0) 0) */ /* Build the initial term */ simd(ERASE); SHOW; simdl(CONST,ZERO); SHOW; simdl(EQ,ACC,TOK); SHOW; simdi(SETMARK,1); SHOW; simd(INIT); SHOW; sp = alloc cell(cp); for (i=0; i<n; i++) { sp = alloc_cell(cp); progname - "fibo"; cellstate *cp,*sp; int reducedlim = 2; #include <stdio.h> finclude "basic.h" finclude "risc.h" int flag, rent; stat_init(); simd_execute() define ZERO 1 Adefine SUCC 2 define FIBO 3 #define PLUS 4 term build(n) int start; int 1; int n; ``` simdl(CONST, FIBO); SHOW; ``` /* rule (eq (fibo (succ (succ x))) (+ (fibo x) (fibo (succ x)))) */ do_posnl(2,*fibo(s s x) = fibo(x) + fibo(s x)*); simd(INIT); SHOW; /* rule (eq (fibo (succ 0)) (succ 0)) */ /* use forward ? */ do_posn1(2, fibo(s 0) = (s 0)*); simd2(SETMARK, 15); SHOW; /* Success */ simd_fetch(NLEFT, LEFT, LEFT); SHOW; siad2(TSTMARKPTR, 10, LEFT); SHOW; P'3d2(TSTMARRPTR, 6, LEFT); SHOW; simd2(TSTWARKPIR, 1, LEFT); SHOW; simd1(TSTWARK,5); SHOW; simd2(SETWARKPTR,7,LEFT); SHOW; simd2(TSTMARKPTR, 9, LEFT); SHOW; simd2(SETMARKPTR, 3, LEFT); SHOW; simd2(TSTMARKPTR, 1, LEFT); SHOW; simd_incr_any(1,NLEFT); SHOW; simd2(MOVE, NTOK, ACC); SHOW; simd2(MOVE, NTOK, ACC); SHOW; simd1(CLEAR, NRIGHT); SHOW; simd1(CLEAR, NRIGHT); SHOW; simd1(CLEAR, NLEFT); SHOW; simd1 (CONST, SUCC); SHOW; simdl(TSTMARK, 15); SHOW; simd2(EQ, ACC, TOK); SHOW; simdi(CONST, ZERO); SHOW; simdl(const, succ); SHOW; simd2(EQ, ACC, TOK); SHOW; simdl(TSTMARK, 15), SHOW, sindi (CONST, SUCC); SHOW; simd2(EQ, ACC, TOK); SHOW; Bimd1(SETMARK, 10); SHOW; simdl(CLRMARK,2); SHOW; simdl(SETWARK,1); SHOW; simdl(SETWARK,0); SHOW; simdi(TSTMARK,4); SHOW; simdl(SETMARK,9); SHOW; simdi(TSTMARK,4); SHOW; simdl(TSTMARK, 5); SHOW; simdl(SETMARK, 2); SHOW; simdl(TSTMARK,2); SHOW; simdl(SETMARK, 4); SHOW; simd1(TSTMARK,3); SHOW; simd1(SETMARK, 5); SHOW; simd1(SETWARK,6); SHOW; simdl(TSTMARK,7), SHOW; simdl(CLRMARK, 2); SHOW; simdi(SETMARK, 0); SHOW, simd(SENDGLOBAL); SHOW; simd(COMMIT); SHOW; simd(COMMIT); SHOW; sind(INIT); SHOM; sind(INIT); SHOW; sind(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; simd(INIT); SHOW; if (testglob()) ( sind(INIT); simd(INIT); resetglob(); flag - 1; ``` ``` simd1(SETMARK,4); SHOW; simd_fetch(NLEFT,RIGHT,LEPT); SHOW; FVO [ simd(INIT); SHOW; ] FVO [ simdl(TSTMARK,4); SHOW; ] simdlefork, TSGHT,70K); SHOW; FVO [ simd(INIT); SHOW; ] FVO [ simd(INIT); SHOW; ] FVO [ simdl(TSTMARK,4); SHOW; ] FVO [ sindl(TSTMARK,4); SHOW; ] simd_store(NLEFT, TOK, TOK); SHOW; simd2(TSTMARKPTR, 1, LEFT); SHOW; simd2(TSTMARKPTR, 8, LEFT); SHOW; simd2(SETMARKPTR, 6, LEFT); SHOW sind incr any (1, NRIGHT); SHOW; simd incr any(1,NLEFT); SHOW simd1 (RELOCREQ, RIGHT); SHOW; simd_incr_any(1,NTOK); SHOW; simd1(CLEAR, NRIGHT); SHOW; FVO [ simd(INIT); SHOW; ] simd alloc(NLEFT); SHOW; simd2(MOVE, NLEFT, RIGHT); simd1(CONST,SUCC); SHOW; simd2(EQ,ACC,TOK); SHOW; simd(INIT); SHOW; simdl(TSTMARK,4); SHOW; simdl(TSTMARK,9); SHOW; simd1(SETMARK, 2); SHOW; SIMG1 (TSTMARK, 4); SHOW; simdl(CLRMARK,2); SHOW; simdl(TSTMARK,2); SHOW; simd1(SETMARK, 8); SHOW; SIMGI(TSTMARK, 2); SHOW; simd1(TSTMARK,6); SHOW, simd(SENDGLOBAL); SHOW; sindl (CONST, FORWARD); simd2 (MOVE, NTOR, ACC); simdl(CLEAR, RIGHT), simd(COMMIT); SHOW; simd(INIT); SHOW; stad(INIT); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; if (testglob()) ( resetalob(); flag = 1; endif simd2(SETWARK,12); SHOW; /* Success */ simd2(SETWARKPTR,13,LEFT); SHOW; simd_store(NRIGHT,TOK,TOK); SHOW; /* should always succeed */ simd store(NLEFT, TOK, TOK); SHOW; /*vers*/ /* Success */ simd store(NRIGHT, LEFT, NTOK); SHOW; SVO [ simd(INIT); SH-M; ] FVO [ simd(INIT); SHOW; ] FVO [ simd(INIT); SHOW; ] Sind store(NLEFT, LEFT, NON); SHOW; SVO [ simd(INIT); SHOW; ] SVO [ simd(INIT); SHOW; ] sind_fetch(NTOK,LEFT,LEFT); SHOW; FVO [ sind(INIT); SHOW; } FVO [ sind1(TSTWARK,12); SHOW; } simd_fetch(NTOK, LEFT, LEFT); SHOW; simd_fetch(NTOK, LEFT, NTOK); SHOW; Sv0 { simdl(TSTMARK, 11); SHOW; } SVO [ simdl(TSTWARK, 12), SHOW; ] simd1(CONST,1); SHOW; simd2(LOGIOR,FLAGS,ACC); SHOW; cntred += 4; sind incr any (1, NTOK); SHOW; simd_incr_any(1,NTOK); SHOW; simd2(MOVE,NTOK,ACC); SHOW; simd1(COMMITMARK, 12); SHOW Simd1 (CLEAR, NRIGHT); SHOW; Sv0 ( simd(INIT); SHOW; ) simd_alloc(NRIGHT); SHOW; /* rule (eq (+ 0 x) x) */ simdl(TSTMARK, 12); SHOW; simd_alloc(NLEFT); SHOW; simdl(SETMARK,11); SHOW; do_posn1(2,"0 + x = x"); simil(TSTMARK, 13); SHOW; simil(TSTMARK.12); SHOW; simdl(TSTMARK,12); SHOW; simdl(TSTMARK, 11); SHOW; simd(INIT); SHOW; simd1(CONST,PLUS); SHOW; sindl(CONST, PLUS); SHOW; simdl(CONST, 2ERO); SHOW; simd2(EQ, ACC, TOK); SHOW simd2(EQ, ACC, TOK); SHOW simdl(TSTMARK,0); SHOW; simdl(SETWARK,1); SHOW; do_posnl(2, setup"); if (size<2*mainloop) ( do_delete(); /*@*/ SIMO(INIT); SHOW; simd(ERASE); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; SHOM sind(INIT); SHOW; simd(INIT); SHOW; do commitfin(); /. neu ./ ``` ``` SVO [ simd(INIT); SHOW; ) SVO [ simd1(TSTMARK,9); SHOW; ] simd_fetch(WTOK,LEFT,LEFT); SHOW; /* this and next omitted because ? */ FVO [ simd(INIT); SHOW; ) FVO [ simd1(TSTMARK,9); SHOW; ] /* rule (eq (+ (succ x) y) (succ (+ x y))) */ do_posnl(2,"(s x) + y = s(x + y)"); simd(INIT); SHOW; sind fetch(NRIGHT,RIGHT); SHOW; FVO [ sind(INIT); SHOW; } FVO [ sind1(TSTWARR,4); SHOW; } simd fetch(NFLAGS, RIGHT, FLAGS); SHOW; simd1(SETWARK,9); SHOW; /* Success */ sind store(NLEFT, LEFT, NTOR); SHOM; SVG [ sind(INIT); SHOW; ) SVO [ sindl(TSTMARK, 9); SHOM; ) ``` ``` simd-peano-fibo.c ``` simd\_store(NLEFT, RIGHT, RIGHT); SHOW; ~ ``` if (flag 66 reducedlim<rut) break; simdl(CONST,SUCC); SHOM; simd2(EQ,ACC,TOK); SHOM; /* simd2(TSTWARRPTR,0,LEFT); SHOW; */ simd(CLRTRYING); SHOW; simd2(TSTWARRGLBL,LEPT,0); SHOW; simd1(SETWARR,0); SHOW; simd(CLRTRYING): SHOW; simd1(TSTWARRPTR, 0, LEFT): SHOW; simd1(SETWARR, 0); SHOW; SVO [ simd(INIT); SHOW; ] SVO [ simdl(TSTWARK,9); SHOW; ] simdl(CONST,SUCC); SHOW; if (itestglob()) break; simd(ARBCOL); SHOW; simd1(SELROW, LEPT); SHOW; simd1(ARBROW, LEPT); SHOW; simd1(SELCELL, LEFT); SHOW; simdl(CONST,1); SHOW; simd2(LOGAND,ACC,FLAGS); SHOW; simdl(TSTNZERO,ACC); SHOW; /* rule "propagate reduced" */ do_posnl(2,"reduced"); simd(INIT); SHOM; simd1(TSTWARK,0); SHOH; simd1(CONST,1); SHOM; slmd2(LOGIOR,FLAGS,ACC); SHOM; cntred += clock-start; simd2(HOVE,NTOK,ACC); SHOW; simd1(CLEAR,RIGHT); SHOW; simd(COMMIT); SHOW; resetglob(); simd(INITTRYING); SHOM; Bimdl (TSTLOCAL, LEFT); SHOW; simdl(TSTNOTMARK,0); SHOW; simd(INIT); SHOW; simd1(CONST,3); SHOW; simd2(EQ,FLAGS,ACC); SHOW; sind(ARBTILE); SHOW; simdl(SETWARK,0); SHOW; simd(SETTRYING); SHOW; simd(SENDGLOBAL); SHOW; simd(NOP); SHOW; start = clock; simd(INIT); SHOW; if (testglob()) { simd(INIT); SHOW; cntrednum++; while (1) ( resetglob(); resetglob(); rent = 0; ``` ``` if (testglob()) break; if (CELLAT(root).CELLFLAGSE1) break; do_posn(1); do_posn1(2, *special*); /* excluding actions in autonomous processes from basic counts */ do_stat_save(); special = 1; do_commiffin(); /* was last, then second */ do_relocate(); do_delete(); special = 0; do_delete(); special = 0; do_stat_restore(); l /* end of main loop */ do_final(); } ``` ``` simd2(EQ.ACC.TOK); SHOW; /* am I a tak far avay? */ simd2(TSTWARRPTR.1,LEFT); SHOW; /* and are my 3 kids #'s? */ simd2(TSTWARRPTR.1,RIGHT); SHOW; /* must reinit */ simd2(MOVE,TOK,ACC); SHOW; /* now I will be close if (CELLAT(root).marks&(1<<1)) break; while (1) ( /* begin of main loop */ sind_incrptr(-1,LEFT); SHOW; sind_incrptr(-1,RIGHT); SHOW; sind_incrptr(-1,RLAGS); SHOW; sind_icrptr(-LEFT,EFT,FLAGS); SHOW; sind(INIT); SHOW; sind(TSTWARK,3); SHOW; simd_fetch(RIGHT, RIGHT, FLAGS); SHOW; mimd_fetch(FLAGS,FLAGS,FLAGS); SHOW, fetchvers=0; storevers=fetchvers; simd2(TSTWARKPTR,1,PLAGS); SHOW; simd1(SETWARK,3); SHOW; simd1(CONST,TAKCLOSE); SHOW; /* pull far numbers in close */ cp = alloc_in(ROWNUM-3,4); root = cp->self; simd(INIT); SHOW; simd1(CONST,TAKFAR); SHOW; sindl(CONST,NUMB); SHOW; simd2(EQ, ACC, TOK); SHOW; do_posnl(2,"tak base"); simd1(SETMARK,1); SHOW; simdl(TSTMARK,3); SHOW; simd(ERASE); SHOW; cellstate *cp, *sp; stat_print_loop(); sind(INIT); SHOW; simd(INIT); SHOW; progname = "tak"; int reducedlim - 2; do_posn(0); /* TAK */ simd_execute() stat_init(); int start, do init(); cp->CELLFLAGS = flags; if (left==NULL) cp->CELLLEFT = 0; else cp->CELLLEFT = left; if (right==NULL) cp->CELLRIGHT = 0; else cp->CELLRIGHT = right; if (nleft==NULL) cp->CELLNEFT = 0; else cp->CELLNEFT = nleft->self; if (nright==NULL) cp->CELLNRIGHT = 0; else cp->CELLNRIGHT = nright->self; simdl(TSTADDR,NLEFT); /* was NOP, now use simd_incr_addr below */ if (TSTDETAIL(10)) printf(" <--- do_commitfin_lrf finish\n"); build_full_cell_ints(cp,tok,flags,left,right,nleft,nright) cellstate *cp,*nleft,*nright) SIMD routines for the Takeutchi function simdl (ISTSTATE, STATE_COMMIT); simdl(TSTSTATE, STATE_COMMIT); simdl(CLRSTATE, STATE_COMMIT); simdl (TSTSTATE, STATE COMMIT) simd_incr_any(-1,NRIGHT); simd incr addr(-1,NLEFT); simd_incr_any(-1,NFLAGS); int tok, left, right, flags; if (displaysubs) SHOW; cntc +* clock-start, cp->CELLTOR - tok; simd(SENDGLOBAL); if (testglob()) ( do_commitfin_lrf() define TAXCLOSE 4 finclude <stdio.h> #include "basic.h" define TAKHOLD 5 linclude "risc.h" Idefine TAKFAR 3 cp.>marks = 0; start - clock; simd(INIT); simd(INIT); #define NUMB 1 resetglob(); simd(INIT); int start; define ``` ``` ? :: simd2(EQ,ACC,TOK); SHOW; /* am I a tak? "/ simd2(LE, LEFT, RIGHT); SHOW; /* if x < y */ simd1(CONST, TAKCLOSE); SHOW; simd2(MOVE, TOK, ACC); SHOH; simdl(CONST,NUMB); SHOW; simd1(SETWARK, 3); SHOW; /* takelose stuff */ simd(INIT); SHOW; /* clock 31 */ ``` /\* Build the initial term \*/ term build(n) int 1; Idefine PROBLEM\_SIZE 3 ``` /* excluding actions in autonomous processes from basic counts */ /* do we need this? */ simd2(comMITMARK,5); SHOW; /* commit */ /* clock = 134 on commit */ do_posn(1); do_posn1(2, 'special'); simd2(MOVE, NTOK, ACC); SHOW; mainloop++; ) /* end of main loop */ do_commitfin_lrf(); do_stat_restore(); do_stat_save(); special = 1; /* clock = 168 */ do_relocate(); do forward(); special - 0; do delete(); do_final(); ONLY DIFFERENCE ./ /* back at root (even if fail) */ simd(INIT); SHOW; /* set token of decendents */ simdl(TSTWARR,6); SHOW; simdl(CONST,TARCLOSE); SHOW; sindl(TSTWARK,5); SHOW; /* back at root */ sind_atore(NLEFT,RIGHT,RIGHT); SHOW; simd1(TSTWARK,5); SHOW; /* back at root */ simd_store(NLEFT,FLAGS,FLAGS); SHOW; simdl(TSTWARK,5); SHOW; /* back at root */ simd_etore(WRIGHT,FLAGS,LEFT); SHOW; simd(INIT); SHOW; simdl(TSTMARK,5); SHOW; /* back at root */ simd_store(NFLAGS,RIGHT,LEFT); SHOW; simdl(TSTMARK,5); SHOW; /* back at root */ simdl(CONST,1); SHOW; simd2(SUB,LEFT,ACC); SHOW; simd_store(NLEFT,LEFT,LEFT); SHOW; simdl(TSTMARK,5); SHOW; /* back at root */ simd2(SUB,RIGHT,ACC); SHOW; simd_atore(NRIGHT,LEFT,RIGHT); SHOW; simd(INIT); SHOW; simdl(TSTMARK,5); SHOW; /* back at root */ simd2(SUB,FLAGS,ACC); SHOW; ishdi(TSTWARK,5); SHOW; /* back at root */ simdl(TSTWARK,5); SHOW; /* back at root */ simd_etore(NFLAGS,FLAGS,RIGHT); SHOW; simd2(MOVE, FLAGS, FLAGS); SHOW; /* then z simd_store(NRIGHT, RIGHT, FLAGS); SHOW; simd2(SUB,FLAGS,ACC); SHOW; simd_store(NFLAGS,LEFT,FLAGS); SHOW simd2(SETWARRPTR, 6, NLEFT); SHOW; simd2(SETWARRPTR, 6, NRIGHT); SHOW; simd2(SETWARRPTR, 6, NFLAGS); SHOW; simdl(TSTMARK,4); SHOW; // simdl(CONST,TAKFAR); SHOW; simd2(MOVE, TOK, ACC); SHOW; simdl(SETWARK,4); SHOW; simd_alloc(NRIGHT); SHOW; simd_alloc(NFLAGS); SHOW; simdl(SETWARK,5); SHOW; simd_alloc(NLEFT); SHOW; simdl(TSTMARK,3); SHOW; Simdl(CLRMARK, 3); SHOW, do posnl(2,"tak gen"); SIMONS (LINI) PHIS simd(INIT); SHOW; simd(INIT); SHOW; sind(INIT); SHOW; simd(INIT); SHOW; Simd(INIT); SHOW; Bind(INIT); SHOW; simd(INIT); SHOW; simd(INIT); SHOW; /* else */ ```