A Single-Phase Clocked NOR/NOR CMOS Programmable Sequential Array Structure
A static CMOS Programmable Sequential Array (PSA) structure is presented that uses a precharged CMOS NOR/NOR logic structure to implement combinational logic. It is fast, consumes no static power, and imposes no limit on the number of input terms. Only one input clock is required; the additional clocks are generated within the PSA structure. Static latches are added at the outputs, so the outputs remain unchanged in the absence of a high clock pulse. This single-phase clocking technique, with statically latched outputs, allows the proposed PSA to be used with many different overall system timing strategies. The proposed methodology has been implemented with the MOSIS scalable design rules (rev. 6) and adapted to the tiling format of MPLA, part of the Berkeley VLSI CAD tool system. An automatically generated example is given.
I. Introduction

Programmable logic arrays (PLAs) provide an efficient and flexible way to implement general modules for combinational systems in a regular manner. Similarly, programmable sequential arrays can be formed by including storage cells together with the logic, and these arrays can be programmed to implement general modules of sequential systems. To implement a Boolean function with a PLA, the function is first expressed in sum-of-products form. Then a two-level NOR network, with inverters added at both the inputs and the outputs, is used to map the logic equation to gates.
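The mapping from sum-of-products form to a two-level NOR network can be sketched in a few lines. The example function $AB' + C$ and the term encoding (a dictionary from variable name to literal polarity) are hypothetical choices for illustration only; the sketch checks the NOR/NOR form against direct sum-of-products evaluation for every input combination and models logic values only, not circuit behavior.

```python
from itertools import product


def nor(*xs: bool) -> bool:
    """NOR of any number of inputs: true only when every input is false."""
    return not any(xs)


def sop(assign: dict, terms: list) -> bool:
    """Direct sum-of-products evaluation.
    Each term maps a variable name to its literal polarity (True = x, False = x')."""
    return any(all(assign[v] == pol for v, pol in term.items()) for term in terms)


def nor_nor(assign: dict, terms: list) -> bool:
    """Two-level NOR realization with inverters on the inputs and the output."""
    # AND plane: a product term equals the NOR of its complemented literals;
    # the complement of literal (v, pol) is true exactly when assign[v] != pol.
    implicants = [nor(*(assign[v] != pol for v, pol in term.items())) for term in terms]
    f = nor(*implicants)   # OR-plane NOR
    return not f           # output inverter


# Hypothetical example function: F = A B' + C
terms = [{"A": True, "B": False}, {"C": True}]
for bits in product([False, True], repeat=3):
    assign = dict(zip("ABC", bits))
    assert nor_nor(assign, terms) == sop(assign, terms)
print("NOR/NOR form matches the sum of products on all 8 inputs")
```

Each AND-plane row is simply De Morgan's law applied to one product term, which is why the inputs must be available in complemented form.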

While the nMOS realization of NOR gates is simple and efficient, static CMOS/bulk NOR gates present several problems [1]. First, while the faster devices (n-channel) are in parallel, the slower devices (p-channel) are in series, which makes the gate slow. In practice, a NAND/NAND structure is generally used instead, which puts the faster n-channel devices in series. Second, since both n-channel and p-channel devices are required, well locations must be arranged carefully and the total area is large. The Domino CMOS method [2] is used to implement small precharged PLAs in CMOS/bulk. The standard approach uses a precharged NAND structure to implement the AND plane of the PLA and a precharged NOR structure to implement the OR plane, with inverters between the planes and at the outputs. This approach is attractive because precharged NOR gates are not susceptible to charge sharing. However, the main disadvantage of this Domino PLA is that, with many input terms, the series AND in the NAND gates is still slow; in fact, the delay is quadratic in the number of literals in series. It is therefore desirable to have a precharged NOR/NOR PLA structure in CMOS. Unfortunately, precharged NOR gates cannot be concatenated directly to form a NOR/NOR PLA. The output of a precharged NOR gate starts high and can only fall from one to zero during evaluation. If this output is connected directly to the input of another precharged NOR gate, the second gate sees a high input before the first gate has settled, and undesired discharging occurs.
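The quadratic growth can be made concrete with a first-order Elmore-delay estimate. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: every transistor has the same on-resistance `r_on`, every node in the series stack carries the same capacitance `c_node`, and a precharged NOR row discharged through a single conducting transistor is modeled as `r_on` times a load that grows by one drain capacitance `c_drain` per input. Units are normalized.

```python
def elmore_series_stack(n: int, r_on: float = 1.0, c_node: float = 1.0) -> float:
    """Elmore delay of n identical series transistors discharging the output node.

    Assumes each of the n nodes above ground (n - 1 internal nodes plus the
    output) carries capacitance c_node. Node i (i = 1 nearest ground) sees a
    resistance of i * r_on to ground, so the delay is r_on * c_node * n(n+1)/2.
    """
    return sum(i * r_on * c_node for i in range(1, n + 1))


def nor_single_pulldown(n: int, r_on: float = 1.0, c_drain: float = 1.0) -> float:
    """Worst case for a precharged NOR with n parallel pull-downs: only one
    conducts, and it must discharge n drain capacitances (linear in n)."""
    return r_on * n * c_drain


for n in (1, 2, 4, 8, 16):
    print(f"n={n:2d}  series stack={elmore_series_stack(n):6.1f}  "
          f"NOR row={nor_single_pulldown(n):5.1f}")
```

In these normalized units the series stack grows as 1, 3, 10, 36, 136 while the NOR row grows as 1, 2, 4, 8, 16; only the trend is meaningful, not the absolute values.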

Thus, some timing strategy is needed to prevent unwanted discharging. In this paper, a delayed clock is used to precharge and evaluate the OR plane, which allows NOR gates to be concatenated. The circuit and its timing strategy are discussed in the following sections. This PLA structure has been adapted into the Berkeley PLA tools [3], so optimization and automatic generation of general finite state machines (FSMs) with this structure are available to the public.
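The failure mode, and why delaying the second plane's evaluation fixes it, can be seen in a small discrete-time model. This is a hypothetical behavioral sketch, not a circuit simulation: time is counted in abstract steps, each AND-plane row that evaluates low is assumed to remain at its precharged high level until its own settle time, and the OR-plane dynamic node, once discharged, is assumed to stay low until the next precharge.

```python
def or_node_after_eval(implicant_values, settle_times, or_eval_start, t_end=20):
    """Behavioral model of one evaluation phase of a precharged NOR (AND plane)
    driving a precharged NOR (OR plane).

    implicant_values: final logic value of each AND-plane row.
    settle_times: step at which each row that evaluates low finishes
        discharging; until then it is still at its precharged high level.
    or_eval_start: step at which the OR-plane evaluation clock rises.
    Returns the OR-plane dynamic node at t_end (True = still precharged high).
    """
    or_node = True  # precharged high
    for t in range(t_end):
        # A row reads high if its final value is high or it has not discharged yet.
        rows = [v or t < s for v, s in zip(implicant_values, settle_times)]
        if t >= or_eval_start and any(rows):
            # The dynamic node discharges and cannot recover before the next precharge.
            or_node = False
    return or_node


final = [False, False]   # both implicants should end up low
settle = [3, 5]          # hypothetical settle times in steps
print(or_node_after_eval(final, settle, or_eval_start=0))            # False: erroneous discharge
print(or_node_after_eval(final, settle, or_eval_start=max(settle)))  # True: correct result
```

With both implicants evaluating low, the correct OR-plane NOR output is high. Evaluating the OR plane immediately discharges it anyway, while delaying evaluation until the slowest row has settled gives the correct value; in this model the delay plays the role that a delayed clock plays in the circuit.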
II. Circuit Description

Several dynamic CMOS NOR/NOR PLA structures have been proposed [1] [4] [5] [6] [9]. The proposed approach is similar to [4]. A schematic diagram of the circuitry is shown in Figure 1. The AND plane consists of ordinary precharged NOR gates arranged as one row per implicant. It is precharged while the clock is low and evaluates while the clock is high. At the top of the AND plane is a dummy row. For each input literal, a diffusion area equal to the drain area of a pull-down transistor is added to the dummy row. As a result, the total parasitic load on the dummy row is greater than that on any implicant row. Since the dummy row is precharged by the same clock signal as the implicant rows, it discharges at the worst-case rate of all the rows. This slowest "dummy implicant" is inverted to produce the delayed clock, which is then used to precharge and evaluate the OR-plane NOR gates. By the time the delayed clock rises to evaluate the OR-plane logic, every implicant has settled to its final high or low value, so the OR-plane NOR gates cannot be discharged erroneously. The outputs of the OR-plane NOR gates are latched by static flip-flops, which are gated by the inverse of the input clock and by the delayed clock. In effect, a latched output changes state only after the trailing edge of the input clock. This guarantees that no undesirable discharging occurs when blocks of this PSA are concatenated. Moreover, this PSA structure is static from a system point of view.
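Why the dummy row bounds every implicant row can be checked with a first-order model. The sketch below assumes, beyond what the text states, that a row's discharge time is $C V / I$, with $C$ equal to a fixed wire capacitance plus one drain capacitance per transistor (or dummy diffusion area) on the row, that each conducting pull-down sinks the same current, and that the dummy row is discharged through a single pull-down. All names and numeric values are hypothetical and normalized.

```python
def row_delay(n_devices: int, n_conducting: int,
              c_wire: float = 2.0, c_drain: float = 1.0,
              i_on: float = 1.0, v_dd: float = 1.0) -> float:
    """First-order discharge time t = C * V / I of a precharged NOR row."""
    if n_conducting == 0:
        return 0.0  # the row stays at its precharged high value; nothing to settle
    c_total = c_wire + n_devices * c_drain   # parasitic load on the row
    return c_total * v_dd / (n_conducting * i_on)


N_LITERALS = 10  # hypothetical: 5 inputs, each with true and complemented columns

# Dummy row: one diffusion area per literal column, assumed to discharge through one device.
dummy = row_delay(N_LITERALS, 1)

# An implicant row has at most N_LITERALS devices, of which 0..n_devices may conduct.
worst_implicant = max(row_delay(d, k)
                      for d in range(1, N_LITERALS + 1)
                      for k in range(0, d + 1))

print(dummy, worst_implicant)
assert worst_implicant <= dummy
```

A real implicant cannot contain a variable in both polarities, so at most half of the literal columns are populated in any row; the enumeration above is deliberately pessimistic and still never exceeds the dummy row.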

In the absence of clock switching, the output latches hold their previous values. A detailed circuit diagram implementing the logic function $F(A,B,C,D,E) = ABC + DE$ is given in Figure 2. The logic is as follows: since $I_1 = (A' + B' + C')' = ABC$, $I_2 = (D' + E')' = DE$, $f = (I_1 + I_2)'$, and $F = f'$, we have $F = I_1 + I_2 = ABC + DE$. Both the AND and OR planes of the PSA contain cut-off transistors that ensure there is no direct path from Vdd to ground during precharge. While each AND-plane NOR gate needs only one cut-off transistor to disconnect its inputs during precharge, each OR-plane NOR gate needs two transistors in series to cut off its inputs during the precharge period. The detailed timing strategy is discussed in the following section.
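The derivation above can be checked exhaustively. This sketch evaluates the two AND-plane NOR rows, the OR-plane NOR, and the output inverter exactly as written in the equations and compares the result against $ABC + DE$ for all 32 input combinations; it models logic values only, not precharge timing or the cut-off transistors.

```python
from itertools import product


def nor(*xs: bool) -> bool:
    """NOR of any number of inputs."""
    return not any(xs)


def psa_f(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    i1 = nor(not a, not b, not c)   # I1 = (A' + B' + C')' = ABC
    i2 = nor(not d, not e)          # I2 = (D' + E')' = DE
    f = nor(i1, i2)                 # f = (I1 + I2)'
    return not f                    # F = f'


rows = list(product([False, True], repeat=5))
assert all(psa_f(*r) == ((r[0] and r[1] and r[2]) or (r[3] and r[4])) for r in rows)
print(sum(psa_f(*r) for r in rows), "of", len(rows), "input combinations give F = 1")
```

This prints `11 of 32 input combinations give F = 1`: four assignments satisfy $ABC$, eight satisfy $DE$, and one satisfies both.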
III. Timing and Electrical Design Considerations

Only a single input clock is required, which reduces the routing area used for clock signals between blocks. Moreover, a single-phase clocked functional block such as the proposed PSA simplifies the overall system timing strategy [7]. The additional clocks needed are generated by the PSA itself. A timing diagram is given in Figure 3. In total, four clock signals with eight clock edges control this PSA structure; the edges are labeled A through H in Figure 3.

Region 1 lies between edge H of the previous cycle and edge C. Region 2 lies between edges C and D, region 3 between edges D and E, and region 4 between edges F and H.

During region 1, both the AND plane and the OR plane of the PSA are precharged. During region 2, the AND plane evaluates. During region 3, the OR plane evaluates. Finally, the output is latched during region 4. A timing gap between regions 3 and 4 ensures that clock overlap or clock skew cannot cause undesirable discharging of the dynamic NOR gates. Inputs must be valid before edge A, and outputs become valid shortly after edge H. Outputs remain unchanged until shortly after the next edge H. As a result, the outputs of this PSA structure can be used directly as inputs to the same PSA or to other PSAs. The total delay of the worst-case dummy row and the worst-case OR plane must not exceed the input clock pulse width. The clock period must be longer than the sum of the output latch settling time, the total delay through the PLA, and \( \Delta \), where \( \Delta \) is the delay of inverting the input clock. This single-phase clocking strategy does not impose a two-sided timing relation [8].
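The two constraints above can be expressed as a small budget check. This is a sketch with hypothetical names and numbers: it assumes that the total delay through the PLA equals the worst-case dummy-row delay plus the worst-case OR-plane delay, and all values are illustrative nanoseconds, not measurements from the fabricated chip.

```python
from dataclasses import dataclass


@dataclass
class PsaTiming:
    """Hypothetical delay budget for one PSA block (all values in ns)."""
    t_dummy: float   # worst-case dummy-row discharge, producing the delayed clock
    t_or: float      # worst-case OR-plane evaluation
    t_latch: float   # output latch settling time
    delta: float     # delay of inverting the input clock

    @property
    def t_pla(self) -> float:
        # Assumption: total PLA delay = dummy-row delay + OR-plane delay.
        return self.t_dummy + self.t_or


def check_timing(t: PsaTiming, period: float, pulse_width: float) -> list[str]:
    """Return the timing constraints that a given clock violates."""
    violations = []
    if t.t_pla > pulse_width:
        violations.append("dummy row + OR plane exceed the clock pulse width")
    if period <= t.t_latch + t.t_pla + t.delta:
        violations.append("clock period not longer than latch + PLA + delta")
    return violations


budget = PsaTiming(t_dummy=4.0, t_or=3.0, t_latch=2.0, delta=1.0)
print(check_timing(budget, period=20.0, pulse_width=10.0))  # []
print(check_timing(budget, period=10.0, pulse_width=6.0))   # both constraints violated
```

Because only maximum delays appear in the check, there is no matching minimum-delay constraint to verify, which reflects the one-sided nature of the single-phase strategy.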

To avoid noise problems, the layout should not route Vdd or ground through the diffusion layer. Although using diffusion may give a more compact circuit, the resulting noise problems and the speed penalty caused by diffusion resistance outweigh the area savings. All gated-ground and gated-Vdd connections are made in metal only. Under the MOSIS scalable CMOS rules (rev. 6), we obtain an 8x12 lambda pitch for the AND plane and a 12x16 lambda pitch for the OR plane.
IV. Example

A 4-bit counter is implemented. First, the finite state machine is described in the PEG [3] specification format. PEG [3] then translates it automatically into logic equations, which EQNTOTT [3] converts to a truth table. ESPRESSO [3] is used to minimize the truth table. Finally, MPLA [3] generates the layout in Magic format. The resulting PSA measures 194 μm by 343 μm. The counter was fabricated as a MOSIS 2 μm TinyChip and is functional at a clock frequency of 50 MHz. The layout of the fabricated chip is shown in Figure 4.
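To illustrate how a counter maps onto the structure, the sketch below hand-writes sum-of-products next-state equations for a 4-bit binary up-counter and evaluates them as NOR/NOR planes followed by a latch update once per clock. These equations are our own derivation ($Q_k^{+} = Q_k \oplus Q_{k-1}\cdots Q_0$ expanded into product terms), not the PEG/ESPRESSO output used for the fabricated chip, and each output keeps its own product terms rather than sharing them as a minimized PLA would.

```python
def nor(*xs: bool) -> bool:
    """NOR of any number of inputs."""
    return not any(xs)


# Hypothetical next-state equations, Q_k+ = Q_k XOR (Q_{k-1} ... Q_0), as sums of products.
# Each term maps a bit index to its literal polarity (True = Q_i, False = Q_i').
NEXT_STATE = [
    [{0: False}],                                                # Q0+ = Q0'
    [{1: True, 0: False}, {1: False, 0: True}],                  # Q1+ = Q1 Q0' + Q1' Q0
    [{2: True, 1: False}, {2: True, 0: False},
     {2: False, 1: True, 0: True}],                              # Q2+
    [{3: True, 2: False}, {3: True, 1: False}, {3: True, 0: False},
     {3: False, 2: True, 1: True, 0: True}],                     # Q3+
]


def psa_cycle(q: tuple[bool, ...]) -> tuple[bool, ...]:
    """One clock cycle: AND-plane NORs, OR-plane NOR plus inverter, then latch."""
    next_q = []
    for terms in NEXT_STATE:
        # AND plane: each product term is the NOR of its complemented literals.
        implicants = [nor(*(q[i] != pol for i, pol in term.items())) for term in terms]
        # OR-plane NOR followed by the output inverter.
        next_q.append(not nor(*implicants))
    return tuple(next_q)  # latched state presented to the next cycle


def to_int(q: tuple[bool, ...]) -> int:
    return sum(1 << i for i, bit in enumerate(q) if bit)


state = (False,) * 4
seen = []
for _ in range(17):
    seen.append(to_int(state))
    state = psa_cycle(state)
print(seen)  # 0, 1, ..., 15, then wraps to 0
```

Feeding the latched outputs straight back as inputs works in this model for the same reason it works in the circuit: the state presented to the planes changes only once per clock cycle.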

V. Conclusion

Programmable sequential arrays are useful parts of many digital designs. They can serve as building blocks of general finite-state machines or as controllers for a processor. The CMOS PSA structure described here provides a simple and flexible single-phase timing strategy. It uses a precharged CMOS NOR/NOR structure and imposes no practical limit on the number of input variables. Several of these PSA blocks can be concatenated to form a more complex sequential machine. With existing software tools, fast and dense sequential blocks can be designed quickly. An example is given to illustrate the proposed structure.
Acknowledgment

The author would like to thank Dr. G. Lewicki for his encouragement and many helpful discussions.

References


List of Figures

Figure 1. A general schematic diagram of the single-phase NOR/NOR PSA

Figure 2. An Example PSA Implementing $F=ABC+DE$

Figure 3. Timing Diagram with Operating Regions

Figure 4. Layout of a 4-bit Counter
Figure 1. General schematic of the single-phase NOR/NOR PSA
Figure 2. Detailed diagram of the PSA circuit implementing $F=ABC+DE$
Figure 3. Timing diagram with operating regions