This report discusses the advantages of digital charge-coupled logic (DCCL) and compares it with other current high-density/low-power LSI technologies. The basic equations needed to design DCCL logic gates are included, and the design of various logic cells and arithmetic functions is discussed in Sections 1, 2, and 3. The principles used in the design of pipelined multiplier and adder/subtractor arrays are discussed in Section 4. The clocking schemes and test results obtained on both arithmetic arrays and single arithmetic functions are described in Sections 5 and 6. Section 7 describes the metal/polysilicon and double polysilicon fabrication processes used. The report concludes with Section 8, in which recommendations for the direction of future work are made.
# FOREWORD

This report has been prepared by TRW Defense and Space Systems Group. The work summarized in this report was performed under contract N00014-74-C-0068. The scientific officer for this contract was Dr. D. F. Barbe of the Naval Research Laboratory. The sponsor was L. W. Sumney of the Naval Electronics Systems Command. The period of performance was from February 1976 to May 1977.
# TABLE OF CONTENTS

<table>
<thead>
<tr>
<th>Section</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>INTRODUCTION AND SUMMARY</td>
<td>1-1</td>
</tr>
<tr>
<td>1.1</td>
<td>INTRODUCTION</td>
<td>1-1</td>
</tr>
<tr>
<td>1.2</td>
<td>HISTORY</td>
<td>1-3</td>
</tr>
<tr>
<td>1.3</td>
<td>PHASE 1 REPORT SUMMARY</td>
<td>1-5</td>
</tr>
<tr>
<td>1.4</td>
<td>PHASE 2 REPORT SUMMARY</td>
<td>1-6</td>
</tr>
<tr>
<td>2.</td>
<td>APPLICATION OF DIGITAL CHARGE COUPLED DEVICES</td>
<td>2-1</td>
</tr>
<tr>
<td>2.1</td>
<td>ADVANTAGES OF DCCL</td>
<td>2-1</td>
</tr>
<tr>
<td>2.2</td>
<td>COMPARISON OF DCCL WITH OTHER LSI TECHNOLOGIES</td>
<td>2-6</td>
</tr>
<tr>
<td>2.2.1</td>
<td>DCCL Full-Adder</td>
<td>2-7</td>
</tr>
<tr>
<td>2.2.2</td>
<td>CMOS Full-Adder</td>
<td>2-7</td>
</tr>
<tr>
<td>2.2.3</td>
<td>P-MOS and N-MOS Full-Adders</td>
<td>2-8</td>
</tr>
<tr>
<td>2.2.4</td>
<td>Integrated Injection Logic Full-Adder</td>
<td>2-8</td>
</tr>
<tr>
<td>2.2.5</td>
<td>Power Dissipation Comparisons in Arithmetic Arrays</td>
<td>2-9</td>
</tr>
<tr>
<td>2.2.6</td>
<td>Other Digital Technologies</td>
<td>2-10</td>
</tr>
<tr>
<td>2.2.7</td>
<td>Package Density Comparisons</td>
<td>2-11</td>
</tr>
<tr>
<td>2.2.8</td>
<td>CMOS, P-MOS and N-MOS Arrays</td>
<td>2-12</td>
</tr>
<tr>
<td>2.2.9</td>
<td>I<sup>2</sup>L Arrays</td>
<td>2-12</td>
</tr>
<tr>
<td>3.</td>
<td>SYNTHESIS OF DCCL DESIGN EQUATIONS</td>
<td>3-1</td>
</tr>
<tr>
<td>3.1</td>
<td>DIGITAL GATES</td>
<td>3-1</td>
</tr>
<tr>
<td>3.2</td>
<td>DCCL LOGIC CELL DESIGN</td>
<td>3-4</td>
</tr>
<tr>
<td>3.3</td>
<td>COMPARISON BETWEEN FULL-ADDER AND DUAL HALF-ADDER IMPLEMENTATIONS</td>
<td>3-6</td>
</tr>
<tr>
<td>3.3.1</td>
<td>Full-Adder Implementation</td>
<td>3-6</td>
</tr>
<tr>
<td>3.3.2</td>
<td>Clock Frequency</td>
<td>3-6</td>
</tr>
<tr>
<td>3.3.3</td>
<td>Power Dissipation</td>
<td>3-7</td>
</tr>
<tr>
<td>3.3.4</td>
<td>Signal-to-Noise Ratio</td>
<td>3-8</td>
</tr>
<tr>
<td>3.3.5</td>
<td>Transfer Efficiency</td>
<td>3-8</td>
</tr>
<tr>
<td>4.</td>
<td>IMPLEMENTATION OF PIPELINE ARITHMETIC ARRAYS</td>
<td>4-1</td>
</tr>
<tr>
<td>4.1</td>
<td>THE DP2, 4-BIT + 4-BIT ADDER ARRAY</td>
<td>4-1</td>
</tr>
<tr>
<td>4.2</td>
<td>THE DP2, 8-BIT + 8-BIT (DHA) ADDER ARRAY</td>
<td>4-1</td>
</tr>
<tr>
<td>4.3</td>
<td>THE DP2, 8-BIT + 8-BIT (FA) ADDER ARRAY</td>
<td>4-4</td>
</tr>
<tr>
<td>4.4</td>
<td>THE DP3, 16-BIT + 16-BIT ADDER ARRAY</td>
<td>4-4</td>
</tr>
<tr>
<td>4.5</td>
<td>THE DP2, 3-BIT X 3-BIT MULTIPLIER ARRAY</td>
<td>4-10</td>
</tr>
<tr>
<td>4.6</td>
<td>THE DP3, 8-BIT X 8-BIT MULTIPLIER ARRAY</td>
<td>4-10</td>
</tr>
<tr>
<td>5.</td>
<td>FUNCTIONAL TESTING OF ARITHMETIC ARRAYS</td>
<td>5-1</td>
</tr>
<tr>
<td>5.1</td>
<td>TESTING THE DP2, 4 X 4 ARRAYS</td>
<td>5-1</td>
</tr>
<tr>
<td>5.2</td>
<td>FUNCTIONAL TESTING OF THE DP2, 8 X 8 ARRAYS</td>
<td>5-1</td>
</tr>
<tr>
<td>5.3</td>
<td>TESTING OF THE DP3, 16 X 16 ARRAY</td>
<td>5-10</td>
</tr>
<tr>
<td>5.4</td>
<td>TESTING OF THE DP2, 3 X 3 ARRAY</td>
<td>5-10</td>
</tr>
<tr>
<td>5.5</td>
<td>FUNCTIONAL TESTING OF THE DP3, 8 X 8 ARRAY</td>
<td>5-14</td>
</tr>
<tr>
<td>6.</td>
<td>FUNCTIONAL TESTING OF DP3A ARITHMETIC CELLS</td>
<td>6-1</td>
</tr>
<tr>
<td>6.1</td>
<td>TESTING THE FLOATING-DIFFUSION HALF-ADDER</td>
<td>6-1</td>
</tr>
<tr>
<td>6.2</td>
<td>TESTING THE FLOATING-GATE CASCADED DUAL HALF-ADDERS</td>
<td>6-8</td>
</tr>
<tr>
<td>6.3</td>
<td>TESTING THE FLOATING-DIFFUSION CASCADED DUAL HALF-ADDER</td>
<td>6-12</td>
</tr>
<tr>
<td>6.4</td>
<td>TESTING THE FLOATING-GATE FULL-ADDER</td>
<td>6-14</td>
</tr>
<tr>
<td>6.5</td>
<td>TESTING THE FLOATING-DIFFUSION FULL-ADDER</td>
<td>6-17</td>
</tr>
<tr>
<td>6.6</td>
<td>TESTING DP3 BURIED CHANNEL DESIGNS</td>
<td>6-20</td>
</tr>
<tr>
<td>7.</td>
<td>SEMICONDUCTOR PROCESSING</td>
<td>7-1</td>
</tr>
<tr>
<td>7.1</td>
<td>INTRODUCTION</td>
<td>7-1</td>
</tr>
<tr>
<td>7.2</td>
<td>DCCD PROCESS EVOLUTION</td>
<td>7-1</td>
</tr>
<tr>
<td>7.3</td>
<td>MASK GENERATIONS</td>
<td>7-2</td>
</tr>
<tr>
<td>7.4</td>
<td>GATE TECHNOLOGY</td>
<td>7-2</td>
</tr>
<tr>
<td>7.5</td>
<td>PROCESS MODIFICATIONS</td>
<td>7-3</td>
</tr>
<tr>
<td>7.6</td>
<td>BORON PENETRATION</td>
<td>7-4</td>
</tr>
<tr>
<td>7.7</td>
<td>THERMAL OXIDE PROCESSES</td>
<td>7-5</td>
</tr>
<tr>
<td>7.8</td>
<td>METAL STEP COVERAGE</td>
<td>7-6</td>
</tr>
<tr>
<td>7.9</td>
<td>HIGH SHEET RESISTANCE DIFFUSIONS</td>
<td>7-9</td>
</tr>
<tr>
<td>7.10</td>
<td>ADDITIONAL PROCESS VARIATIONS</td>
<td>7-10</td>
</tr>
<tr>
<td>7.11</td>
<td>N-SURFACE CHANNEL DEVICES</td>
<td>7-11</td>
</tr>
<tr>
<td>7.12</td>
<td>CLEAN GATE OXIDE TECHNOLOGY</td>
<td>7-11</td>
</tr>
<tr>
<td>7.13</td>
<td>ION IMPLANTATION</td>
<td>7-12</td>
</tr>
<tr>
<td>7.14</td>
<td>POLYCRYSTALLINE SILICON TECHNOLOGY</td>
<td>7-13</td>
</tr>
<tr>
<td>7.15</td>
<td>TEOS AS A PROTECTIVE OXIDE FILM</td>
<td>7-13</td>
</tr>
<tr>
<td>7.16</td>
<td>THE SINGLE LEVEL POLYSILICON PROCESS</td>
<td>7-15</td>
</tr>
<tr>
<td>7.17</td>
<td>DOUBLE POLYSILICON PROCESS</td>
<td>7-15</td>
</tr>
<tr>
<td>7.18</td>
<td>BURIED CHANNEL VERSION</td>
<td>7-18</td>
</tr>
<tr>
<td>8.</td>
<td>RECOMMENDATIONS FOR FUTURE WORK</td>
<td>8-1</td>
</tr>
<tr>
<td>8.1</td>
<td>PROCESSING</td>
<td>8-1</td>
</tr>
<tr>
<td>8.2</td>
<td>DESIGN</td>
<td>8-2</td>
</tr>
</tbody>
</table>

# LIST OF FIGURES

<table>
<thead>
<tr>
<th>FIGURE</th>
<th>DESCRIPTION</th>
<th>PAGE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-1.</td>
<td>Chronology of Program</td>
<td>1-5</td>
</tr>
<tr>
<td>2-1.</td>
<td>Distribution of Charge Packets</td>
<td>2-2</td>
</tr>
<tr>
<td>2-2.</td>
<td>Power Dissipation versus Clock Frequency for Full-Adders Constructed from Various Semiconductor Technologies</td>
<td>2-7</td>
</tr>
<tr>
<td>3-1.</td>
<td>DCCL OR Gate</td>
<td>3-1</td>
</tr>
<tr>
<td>3-2.</td>
<td>DCCL OR Gate with Correction for 1 + 1 Logic</td>
<td>3-1</td>
</tr>
<tr>
<td>3-3.</td>
<td>DCCL AND Gate</td>
<td>3-2</td>
</tr>
<tr>
<td>3-4.</td>
<td>Exclusive-OR Gate</td>
<td>3-2</td>
</tr>
<tr>
<td>3-5.</td>
<td>DCCL Half-Adder</td>
<td>3-3</td>
</tr>
<tr>
<td>3-6.</td>
<td>DCCL Full-Adder and Truth Table</td>
<td>3-3</td>
</tr>
<tr>
<td>3-7.</td>
<td>Floating-Gate Slave Holding Well</td>
<td>3-4</td>
</tr>
<tr>
<td>3-8.</td>
<td>A Full-Adder Logic Cell Implemented with Dual Cascaded Half-Adders</td>
<td>3-6</td>
</tr>
<tr>
<td>4-1.</td>
<td>4-Bit + 4-Bit Adder Array Utilizing Dual Half-Adder Logic Cells</td>
<td>4-2</td>
</tr>
<tr>
<td>4-2.</td>
<td>DP2 4 + 4 Adder Array Utilizing Three Cascaded Half-Adders and a Single Half-Adder</td>
<td>4-3</td>
</tr>
<tr>
<td>4-3.</td>
<td>8-Bit + 8-Bit Adder Array Utilizing Cascaded Dual Half-Adder Logic Cells</td>
<td>4-5</td>
</tr>
<tr>
<td>4-4.</td>
<td>DP2 8 + 8 Adder Array Utilizing Seven Cascaded Half-Adders and a Single Half-Adder</td>
<td>4-6</td>
</tr>
<tr>
<td>4-5.</td>
<td>8-Bit + 8-Bit Adder Array Utilizing Full-Adder Logic Cells</td>
<td>4-7</td>
</tr>
<tr>
<td>4-6.</td>
<td>DP2 8 + 8 Adder Array Utilizing a Half-Adder and Seven Full-Adders</td>
<td>4-8</td>
</tr>
<tr>
<td>4-7.</td>
<td>Logic Diagram of a DP3 16 + 16 Adder Array</td>
<td>4-9</td>
</tr>
<tr>
<td>4-8.</td>
<td>The DP3 16 + 16 Adder Array</td>
<td>4-11</td>
</tr>
<tr>
<td>4-9.</td>
<td>A Block Diagram of a 3 X 3 Multiplier Array</td>
<td>4-12</td>
</tr>
<tr>
<td>4-10.</td>
<td>DP2 3 X 3 Multiplier Array</td>
<td>4-10</td>
</tr>
<tr>
<td>4-11.</td>
<td>A Block Diagram of the DP-3 4 X 4 Multiplier Array</td>
<td>4-15</td>
</tr>
<tr>
<td>4-12.</td>
<td>DP3 8 X 8 Multiplier Array</td>
<td>4-16</td>
</tr>
<tr>
<td>5-1.</td>
<td>Gate Voltage versus Surface Potential (V<sub>G</sub>/V<sub>S</sub>) Curves</td>
<td>5-2</td>
</tr>
<tr>
<td>5-2.</td>
<td>Detail Block Diagram of a Single Half-Adder</td>
<td>5-3</td>
</tr>
<tr>
<td>5-3.</td>
<td>Detail Schematic of a Half-Adder, Indicating Surface Potentials</td>
<td>5-4</td>
</tr>
<tr>
<td>5-4.</td>
<td>Detail Timing Waveforms and Gate Voltages of Half-Adders</td>
<td>5-5</td>
</tr>
<tr>
<td>5-5.</td>
<td>Input to 2-Word, 4-Bit Adder</td>
<td>5-6</td>
</tr>
<tr>
<td>5-6.</td>
<td>Input to 2-Word, 4-Bit Adder</td>
<td>5-6</td>
</tr>
<tr>
<td>5-7.</td>
<td>Input to 2-Word, 4-Bit Adder</td>
<td>5-7</td>
</tr>
<tr>
<td>5-8.</td>
<td>Input to 2-Word, 4-Bit Adder</td>
<td>5-7</td>
</tr>
<tr>
<td>5-9.</td>
<td>Input to 2-Word, 8-Bit Adder</td>
<td>5-8</td>
</tr>
<tr>
<td>5-10.</td>
<td>Input to 2-Word, 8-Bit Adder</td>
<td>5-8</td>
</tr>
<tr>
<td>5-11.</td>
<td>Input to 2-Word, 8-Bit Adder</td>
<td>5-9</td>
</tr>
<tr>
<td>5-12.</td>
<td>Input to 2-Word, 8-Bit Adder</td>
<td>5-9</td>
</tr>
<tr>
<td>5-13.</td>
<td>Input to 2-Word, 8-Bit Adder</td>
<td>5-10</td>
</tr>
<tr>
<td>5-14.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-11</td>
</tr>
<tr>
<td>5-15.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-11</td>
</tr>
<tr>
<td>5-16.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-12</td>
</tr>
<tr>
<td>5-17.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-12</td>
</tr>
<tr>
<td>5-18.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-13</td>
</tr>
<tr>
<td>5-19.</td>
<td>Input to the 2-Word, 3-Bit Multiplier</td>
<td>5-13</td>
</tr>
<tr>
<td>6-1.</td>
<td>Block Diagram of a DP3A Floating-Diffusion Half-Adder</td>
<td>6-2</td>
</tr>
<tr>
<td>6-2.</td>
<td>A gate-voltage versus surface-potential plot of a typical p-channel DP3A wafer, showing the difference in surface potentials under the first and second polysilicon gates</td>
<td>6-3</td>
</tr>
<tr>
<td>6-3.</td>
<td>Waveforms associated with the Floating-Diffusion Half-Adder Test Cell</td>
<td>6-4</td>
</tr>
<tr>
<td>6-4.</td>
<td>Functional Demonstration of the Floating-Diffusion Half-Adder with an input of A = 1, B = 1 at a Clock Rate of 100KHz</td>
<td>6-5</td>
</tr>
<tr>
<td>6-5.</td>
<td>Functional Demonstration of the Floating-Diffusion Half-Adder with an input of A = 1, B = 0 at a clock rate of 100KHz</td>
<td>6-6</td>
</tr>
<tr>
<td>6-6.</td>
<td>Functional Demonstration of the Floating-Diffusion Half-Adder with an input of A = 1010111 and B = 0101111 at a clock rate of 6.5MHz. The output is slew-rate limited by the final MOS circuit</td>
<td>6-6</td>
</tr>
<tr>
<td>6-7.</td>
<td>The Operational Frequency Range of a Packaged Floating-Diffusion Half-Adder as a Function of Temperature</td>
<td>6-7</td>
</tr>
<tr>
<td>6-8.</td>
<td>Block Diagram of a DP3 Floating-Gate, Dual Half-Adder DCCL Cell</td>
<td>6-9</td>
</tr>
<tr>
<td>6-9.</td>
<td>Gate structure and surface potentials of a DP3A half-adder, showing the conditions required for the charge packet under the slave side of the floating gate to transfer out of the carry port</td>
<td>6-11</td>
</tr>
<tr>
<td>6-10.</td>
<td>Block Diagram of a DP3 Floating-Diffusion Cascaded Dual Half-Adder DCCL Cell</td>
<td>6-12</td>
</tr>
<tr>
<td>6-11.</td>
<td>Waveforms associated with the floating-diffusion cascaded dual half-adder test cell</td>
<td>6-13</td>
</tr>
<tr>
<td>6-12.</td>
<td>Functional demonstration of the floating-diffusion cascaded dual half-adder at a clock frequency of 20KHz with inputs of (A = 01001011), (B = 00101101), and (G = 00010111).</td>
<td>6-14</td>
</tr>
<tr>
<td>6-13.</td>
<td>Block Diagram of a DP3 Floating-Gate, Full-Adder DCCL Cell</td>
<td>6-15</td>
</tr>
<tr>
<td>6-14.</td>
<td>Waveforms associated with the Floating-Gate Full-Adder Test Cell</td>
<td>6-16</td>
</tr>
<tr>
<td>6-15.</td>
<td>Functional demonstration of the Floating-Gate Full-Adder Test Cell at a clock frequency of 20KHz and inputs of (A = 11100111), (B = 11000011), and (G = 11000000).</td>
<td>6-17</td>
</tr>
<tr>
<td>6-16.</td>
<td>Block Diagram of a DP3 Floating-Diffusion, Full-Adder DCCL Cell</td>
<td>6-18</td>
</tr>
<tr>
<td>6-17.</td>
<td>Waveforms associated with the Floating-Diffusion Full-Adder Test Cell</td>
<td>6-19</td>
</tr>
<tr>
<td>6-18.</td>
<td>Functional demonstration of the floating-diffusion full-adder test cell at a clock frequency of 20KHz with inputs of (A = 11110011), (B = 11000011), and (G = 11000000).</td>
<td>6-20</td>
</tr>
<tr>
<td>6-19.</td>
<td>DP3 Buried Channel Operation Verification</td>
<td>6-22</td>
</tr>
<tr>
<td>6-20.</td>
<td>Potential Diagram of the Buried Channel Shift Register</td>
<td>6-23</td>
</tr>
<tr>
<td>7-1.</td>
<td>Polysilicon Protect Configuration</td>
<td>7-3</td>
</tr>
<tr>
<td>7-2.</td>
<td>Charges or States Associated with the Silicon Dioxide-Silicon System</td>
<td>7-5</td>
</tr>
<tr>
<td>7-3.</td>
<td>Breaks in the Al Metallization DP-0 Design</td>
<td>7-6</td>
</tr>
<tr>
<td>7-4.</td>
<td>&quot;Gulch&quot; Formed under the Polysilicon Film</td>
<td>7-7</td>
</tr>
<tr>
<td>7-5.</td>
<td>&quot;Gulch&quot; and Steep Polysilicon Step covered by a TEOS Film</td>
<td>7-7</td>
</tr>
<tr>
<td>7-6.</td>
<td>Metallization over polystep covered with TEOS (DP-1 Design)</td>
<td>7-8</td>
</tr>
<tr>
<td>7-7.</td>
<td>Metallization over polystep without a TEOS deposition (DP-1 Design)</td>
<td>7-8</td>
</tr>
<tr>
<td>7-8.</td>
<td>Source/Drain contact to Poly Contact Structure</td>
<td>7-9</td>
</tr>
<tr>
<td>7-9.</td>
<td>Fixed Charge as a Function of Quartz Tube Age</td>
<td>7-12</td>
</tr>
<tr>
<td>7-10.</td>
<td>Polysilicon Gate (X4000) Covered by 13,000Å TEOS Film</td>
<td>7-14</td>
</tr>
<tr>
<td>7-11.</td>
<td>Polysilicon Gate (X4000) Covered by 13,000Å Silox Film</td>
<td>7-14</td>
</tr>
<tr>
<td>7-12.</td>
<td>DP3 Fabrication Process</td>
<td>7-19</td>
</tr>
</tbody>
</table>

# LIST OF TABLES

<table>
<thead>
<tr>
<th>TABLE</th>
<th>Description</th>
<th>PAGE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Digital CCD and Analog CCD Comparison</td>
<td>2-5</td>
</tr>
<tr>
<td>2.</td>
<td>Cell Count for Various DCCL Arrays</td>
<td>2-10</td>
</tr>
<tr>
<td>3.</td>
<td>Total Power Dissipation in Watts of Various Size Arrays and Technologies at a Clock Frequency of 1MHz</td>
<td>2-11</td>
</tr>
<tr>
<td>4.</td>
<td>Total Power Dissipation in Watts of Various Size Arrays and Technologies at a Clock Frequency of 10MHz</td>
<td>2-11</td>
</tr>
<tr>
<td>5.</td>
<td>Estimates for the Active Area in $\text{mm}^2$ of various Arithmetic Arrays constructed from different semiconductor technologies</td>
<td>2-12</td>
</tr>
<tr>
<td>6.</td>
<td>16 X 16 Arithmetic</td>
<td>4-14</td>
</tr>
<tr>
<td>7.</td>
<td>Six Process Modifications: DP-0, DP-1, DP-2, DP-3</td>
<td>7-2</td>
</tr>
<tr>
<td>8.</td>
<td>The DP-3 Double Polysilicon Fabrication Process Flow Chart</td>
<td>7-16</td>
</tr>
</tbody>
</table>

# 1. INTRODUCTION AND SUMMARY

## 1.1 INTRODUCTION

The work reported on here has been exclusively concerned with the digital domain operation of charge coupled devices. One common example of the digital use of charge coupled devices is in the area of memory. But in the present context, we mean a great deal more than just memory. Generally speaking, any digital domain function can be accomplished with charge coupled devices; this means, in particular, digital charge coupled logic (DCCL) functions and digital arithmetic functions. Putting aside for the moment the question of how this is done, let us first ask why this would be done. After all, the charge coupled device technology produces a unique device in that it works as a sampled data analog system.

Given that the CCD is unique in this respect, what are the advantages of using the device in the digital domain? This can be answered by addressing an even more fundamental question, namely, why use any digital device? The reasons that digital devices have long been used can be summarized as follows:

- Freedom from parameter variations;
- Freedom from environmental effects and environmental changes;
- Flexibility in application;
- Ease of programming;
- Arbitrary accuracy in calculations;
- Well-known characteristics that are easily modeled and simulated;
- Low cost due to widespread use.

These are the traditional reasons for the acceptance and wide use of any digital device. Using a CCD in the digital domain adds two other highly desirable attributes to the list: low power requirements and high functional density.

This marriage of CCDs and digital technology extends the general list of digital attributes and produces a unique combination that permits the projection of devices and device characteristics that are otherwise unobtainable. The low power advantage is clearly desirable for space-borne and man-pack applications. The high functional density is exploited in any situation where a large amount of computation is required to perform an overall system function. DCCL allows the designer to place a large number of functions on a single chip, thereby eliminating interface and overhead circuitry and significantly reducing the overall chip count.

So far we have stated some of the advantages of using DCCL; we have yet to address how these devices are implemented. The basic DCCL technology has an obvious application in binary logic: each storage position either holds charge or it does not, and this represents a one or a zero, just as in a digital memory. Beyond this, however, CCDs can be used to perform arbitrary Boolean algebra. This concept is treated in detail below.

If our catalog of devices includes half-adders and full-adders along with logic functions such as ANDs and ORs, then we can implement any arbitrary logic or arithmetic function. There is, however, one additional consideration. Because charge in a CCD is shifted at each clock pulse, ripple-through logic is not available; all functions must instead be implemented in a pipelined manner.
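
To make Boolean algebra with charge packets concrete, the following is a minimal, idealized sketch, not the report's circuit. It assumes that a logical one is one normalized unit of charge (`UNIT`), that merging packets adds their charge, that a well of one-unit capacity spills any excess over a barrier into a second well, and that a threshold detector decides one or zero. The conditional dump used to form the sum (draining the held packet whenever a carry is present) is a hypothetical stand-in for floating-gate sensing; all names are illustrative.

```python
# Idealized charge-packet model of DCCL gates (hypothetical; not the report's layout).
UNIT = 1.0               # normalized charge representing a logical one (assumed)
THRESHOLD = 0.5 * UNIT   # assumed decision level of the sense circuit


def inject(bit: int) -> float:
    """Input: a one injects a full packet, a zero injects none."""
    return bit * UNIT


def merge(*packets: float) -> float:
    """Merging packets into one well simply adds their charge."""
    return sum(packets)


def spill(packet: float, capacity: float = UNIT) -> tuple[float, float]:
    """A barrier holds up to `capacity`; any excess overflows to a second well."""
    held = min(packet, capacity)
    return held, packet - held


def sense(packet: float) -> int:
    """Threshold detector: more charge than THRESHOLD reads as a one."""
    return 1 if packet > THRESHOLD else 0


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry). Overflow past a one-unit barrier is the AND (carry)."""
    held, overflow = spill(merge(inject(a), inject(b)))
    carry = sense(overflow)
    # Assumed conditional dump: if a carry exists, the held packet is drained,
    # so what remains represents OR AND NOT(AND), i.e. exclusive-OR.
    sum_packet = 0.0 if carry else held
    return sense(sum_packet), carry


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Two cascaded half-adders; their carries never coexist, so merging acts as OR."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c)
    carry = sense(merge(inject(c1), inject(c2)))
    return s, carry
```

For example, `half_adder(1, 1)` returns `(0, 1)` and `full_adder(1, 1, 1)` returns `(1, 1)`. The model re-senses and re-injects charge between cells for clarity; in a real device the packets would pass directly from one cell to the next.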

The need for pipelined calculation in arithmetic units is associated with the generation of the carry bit at each stage. For example, in the addition of two n-bit words, the two least significant bits can be added immediately to produce their sum and carry outputs. The carry is then available to be combined with the next more significant bits to produce a new sum and carry. Because the carry is delayed at each stage, the application of each successively more significant bit pair must be delayed by an equal amount. This requires a set of delays on the input lines. An analogous set of delays must be inserted in series with the output lines so that the entire output word is available at a single clock pulse some time later. Such delays are, of course, easy to obtain in a CCD structure, since delay is the device's most natural operation. They do, however, require additional area and in general lead to a larger active area for the function. This added area can be largely eliminated in large-scale functions where we can work with skewed arithmetic.
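
The skewed pipeline described above can be sketched as a clock-by-clock simulation. This is a minimal model under stated assumptions, not a description of the DP2 or DP3 arrays: stage i adds bit i of each operand to the carry latched from stage i - 1 on the previous clock, operand bit i passes through i input delay cells, and sum bit i passes through n - 1 - i output delay cells, so every bit of a result emerges on the same clock. The names `delay` and `pipelined_add` are hypothetical.

```python
from collections import deque


def delay(line: deque, bit: int) -> int:
    """Shift one bit into a delay line and return the bit that falls out."""
    line.append(bit)
    return line.popleft()


def pipelined_add(pairs: list[tuple[int, int]], n: int) -> list[int]:
    """Add a stream of n-bit word pairs, one pair entering per clock.

    Returns the (n + 1)-bit sums in input order; each sum emerges n - 1
    clocks after its operands enter, and one result follows per clock.
    """
    skew_a = [deque([0] * i) for i in range(n)]            # input delays
    skew_b = [deque([0] * i) for i in range(n)]
    deskew = [deque([0] * (n - 1 - i)) for i in range(n)]  # output delays
    carry_in = [0] * n   # carry latched into each stage on the previous clock
    latency = n - 1
    results = []
    for t in range(len(pairs) + latency):
        a, b = pairs[t] if t < len(pairs) else (0, 0)   # flush with zeros
        next_carry = [0] * n
        out_bits = []
        carry_out = 0
        for i in range(n):
            ai = delay(skew_a[i], (a >> i) & 1)
            bi = delay(skew_b[i], (b >> i) & 1)
            ci = carry_in[i]
            s = ai ^ bi ^ ci
            c = (ai & bi) | (ci & (ai ^ bi))
            if i + 1 < n:
                next_carry[i + 1] = c    # consumed by stage i + 1 on the next clock
            else:
                carry_out = c            # most significant carry needs no deskew
            out_bits.append(delay(deskew[i], s))
        carry_in = next_carry
        if t >= latency:
            word = sum(bit << i for i, bit in enumerate(out_bits))
            results.append(word | (carry_out << n))
    return results
```

For example, `pipelined_add([(3, 5), (15, 1), (9, 9)], 4)` returns `[8, 16, 18]`: a new word pair enters on every clock even though each individual sum takes n - 1 clocks to emerge.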

Working with skewed arithmetic simply means that the data enters the chip synchronously in time and passes through a set of delays that properly skews all of the bits. An arithmetic operation (such as addition or multiplication) is then performed, and the data is shifted on to another operation. This continues until all of the operations have been performed on the data. The data is then passed once more through a set of delays that resynchronizes all of the bits so that they are available at the output pins at one point in time.

All of this means that we can eliminate the majority of the delays associated with arithmetic operations performed internal to the chip. Only the initial skewing delay and the final deskewing delay are required; while the data is on the chip it can be manipulated in skewed form. Pipeline arithmetic has another implication. Since the data enters at one clock pulse and exits at a clock pulse some time later, pipelining is not efficient for isolated, random calculations. The technology is therefore best suited to signal processing functions that operate on blocks of data at a time, rather than to random calculations that occur only occasionally. A large number of algorithms either are already organized as pipelines or can be recast into one, so the applicability of DCCL is not seriously restricted.
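
The saving can be estimated with a simple count. Assume, as an illustrative model rather than a count taken from the report's layouts, that each bit position adds one clock of delay, so bit \(i\) (\(i = 0, \dots, n-1\)) of an operand needs \(i\) input delay cells and sum bit \(i\) needs \(n-1-i\) output delay cells; the extra carry-out bit and any stage-alignment cells are ignored:

\[
D_{\text{skew}} = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} \quad \text{(per operand)}, \qquad
D_{\text{deskew}} = \sum_{i=0}^{n-1} (n-1-i) = \frac{n(n-1)}{2}.
\]

Under this model, every intermediate word that is deskewed and then reskewed between two on-chip operations costs \(D_{\text{deskew}} + D_{\text{skew}} = n(n-1)\) delay cells, for example \(16 \cdot 15 = 240\) cells for 16-bit words. Skewed arithmetic removes that pair for every intermediate result, leaving only the entry skew and the exit deskew.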

One other point is worth noting here: the throughput rate of pipeline arithmetic calculations is very high. The data enters the device at the maximum clock rate, and the answer exits some time later, still at the maximum clock rate. The designer therefore need only account for the series delay (latency) that is an inherent part of pipelined operation.
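
As an illustration, assume a pipeline that is \(L\) clocks deep and clocked at frequency \(f\) (both hypothetical parameters). The first result appears after the series delay, and each later result follows one clock after the previous one, so a block of \(N\) words needs

\[
t_{\text{latency}} = \frac{L}{f}, \qquad t_{N} = \frac{L + N - 1}{f}, \qquad \text{throughput} = f \ \text{results per second}.
\]

With \(L = 16\) and \(f = 1\) MHz, for example, the latency is 16 µs, yet 1024 words are processed in \((16 + 1023)/10^{6}\) s, about 1.04 ms, rather than the \(1024 \times 16\) µs, about 16.4 ms, that a non-overlapped unit with the same latency would need.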

## 1.2 HISTORY

In 1973, the Naval Research Laboratory issued a request for quotation (RFQ) for a study program aimed at defining and analyzing the areas of application of charge coupled devices (CCDs) in signal processing systems. The broad objective of the RFQ was to initiate a study that would examine the impact of CCD technology on signal processing systems. Implicit in such a statement, of course, is the requirement to determine those areas of signal processing systems where the use of CCDs offers an economic advantage. The extent of that advantage, that is to say the impact, can then be projected. Naturally, the projection cannot be made in terms of dollars and cents, but is best made by direct comparison of identical functions realized with CCDs and with any other appropriate technology. Under these conditions, figures such as speed, power, and parts count can be tabulated and cross-correlated.

As a result of the proposal submitted to the Naval Research Laboratory, TRW embarked on a study of the impact on signal processing systems of the use of CCDs in the digital domain. The results of that study have been issued under the title "Charge Coupled Devices in Signal Processing Systems; Volume I: Digital Signal Processing".* Briefly stated, the study indicated that digital CCDs combine the inherent advantages of any digital technology (such as high noise immunity, freedom from device/parameter variations, stable operating conditions, and ease of simulation) with the advantages peculiar to CCDs (such as high density and low power). In addition, digital CCDs are best suited to signal processing applications where the signal flow can be carried out in a pipelined fashion requiring little or no feedback; this permits relatively high data throughput to be achieved with the relatively low CCD clock frequencies. Not surprisingly, the impact is most dramatic in situations where a large number of functions and/or high computational accuracy are demanded. Many such instances occur in existing and projected systems; these were identified and analyzed in some detail.

At the conclusion of the study, TRW recommended that an experimental verification be carried out that would go beyond the basic device work already accomplished and would demonstrate the real advantages of the approach. The realization of a digital CCD fast Fourier transform (FFT) on a chip was selected as a useful vehicle; in addition, this function, properly implemented, is quite flexible and suited to a number of diverse situations. Accordingly, a technology development program was begun. The objective of the first phase was the investigation and characterization of the fundamental building blocks that would be employed in a typical application. The results of this Phase I program include the further development of a full-adder circuit function; the design and test of a 4 x 4 adder array and a 3 x 3 multiplier array; and a study to determine a method of interconnecting a number of projected FFT chips into a single system. These results have been issued under the title "Charge

*Available from the National Technical Information Service; a companion report, "Charge Coupled Devices in Signal Processing Systems; Volume II: Analog Signal Processing", is also available.

The objective of this second phase, described here, was to develop large computational building blocks suitable for implementing an FFT or some other similar function. Near the end of Phase 2, a potential application in the area of voice processing arose that would ultimately require 16-bit arithmetic blocks, i.e., a \((16 \times 16)\) multiplier and a \((32 + 32)\) adder/subtractor. At the end of the thirteenth month of the Phase 2 effort, work was completed on 8-bit arithmetic block designs. Work on the larger blocks continued into Phase 3, beginning with the design of a \((32 + 32)\) adder/subtractor. The duration of the third phase depends on the final application selected. The chronology of events is summarized in Figure 1-1.

![Calendar Year Chart](image)

**Figure 1-1. Chronology of Program**

## 1.3 PHASE 1 REPORT SUMMARY

The Phase 1 report contains an overview of the entire program and a brief statement of goals and approaches. This is followed by a discussion of the development of the full-adder circuit function. The original concept is explained, subsequent alterations to the original layout are described, and both two- and three-input adders are treated (Section 2). The several computational algorithms that can be used have hardware implications; these are examined in Section 3. The primary test mask designed during Phase 1 is presented, along with a summary of the test results, in Section 4. The process sequences used to produce these devices are explained, and cross-sectional views of the devices are given, in Section 5. This is followed by the results of a study made to determine a method of interconnecting a number of the projected FFT chips into a single system. The report concludes in Section 7 with a recommendation for future work.

## 1.4 PHASE 2 REPORT SUMMARY

This report contains a commentary on the advantages of digital charge coupled logic (DCCL) and a comparison with other current high-density/low-power LSI technologies. A description of the basic equations necessary for designing DCCL logic gates is included, and the design of various logic cells and arithmetic functions is discussed in Sections 1, 2, and 3. The principles used in the design of pipelined multiplier and adder/subtractor arrays are discussed in Section 4. The clocking schemes and test results obtained on both arithmetic arrays and single arithmetic functions are described in Sections 5 and 6. In Section 7, we describe the metal/polysilicon and double polysilicon fabrication processes used. The report concludes with Section 8, in which recommendations for the direction of future work are made.

# 2. APPLICATION OF DIGITAL CHARGE COUPLED DEVICES

## 2.1 ADVANTAGES OF DCCL

Our previous comments in Section 1.1 regarding the low power requirements and high functional density of CCD technology point out the distinct advantages of digital charge coupled devices over any other digital technology. In the present context it is more informative to examine the advantages of digital charge coupled devices over analog charge coupled devices. A good starting point is the type of signal representation used in each implementation.

The analog device takes one sample of the data and attaches significance to the amplitude of that sample; the digital device takes one sample, quantizes it into n bits, and attaches significance to the magnitude of n. Clearly this means that DCCL requires an analog-to-digital converter; this is not much of a penalty, since in a great many of today's systems the data representation is already digital.

At first appearance it would seem that n bits per sample would require much more silicon area to perform a function in digital form than the single sample an analog device requires. This, however, is not necessarily the case: the need to maintain an acceptable signal-to-noise ratio requires quite large areas for each analog packet storage site. The digital device, on the other hand, can use extremely small storage elements for each of the n bits.

This is so because the digital device has inherently better noise performance. In analog operation any change in sample amplitude is a change in signal amplitude. In the digital device, the signal can change considerably before the change is detected by the thresholding output circuit. In fact, the properties of this output circuit provide the digital implementation with one of its largest noise margins. The circuit need only detect whether a charge packet is larger or smaller than a certain amount and make its decision on that basis; this is distinctly different from assigning significance to the exact amplitude of a charge packet, as analog operation requires.

It is worthwhile to examine the output circuit a little further. Figure 2-1 represents the statistical variation found in the output packets at the end of a digital operation. We note that the packet sizes follow a Gaussian distribution for both a one and a zero output. That is, there is one charge packet size intended to represent a one bit and another intended to represent a zero bit; because of the various noise sources inherent in the device, the exiting charge packets will not all have exactly the same amplitude. This produces the Gaussian distributions shown.

![Figure 2-1. Distribution of Charge Packets](image)

The output circuits need only distinguish between these two distributions: signals larger than the threshold point are interpreted as having originated from a one bit, and signals smaller than that as having originated from a zero bit. In a statistical sense, the output circuit inevitably misinterprets some signals; this represents the bit error rate of the device, which is generally quite small. This operation contrasts strongly with the analog output operation, which depends on a highly linear input-to-output function. In representing signals in the analog domain, it is extremely important to minimize all noise sources and to control environmental conditions as far as possible, because any loss (or gain) of carriers from the charge packet anywhere in the analog system amounts to a corresponding loss (or gain) in signal. The digital representation avoids these problems by simply assigning one of two values to the charge packet (one or zero) and placing the burden of distinction on the output circuit, where the distinction is easily made.
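
The bit error rate implied by Figure 2-1 can be estimated by integrating the tail of each Gaussian beyond the threshold. The sketch below is an assumption-laden illustration, not the report's analysis: it takes Gaussian zero and one distributions with means `mu0`, `mu1` and standard deviations `sigma0`, `sigma1` (in any consistent charge unit), a fixed decision threshold, and a prior probability `p_one` of a one bit. The function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_rate(mu0: float, sigma0: float, mu1: float, sigma1: float,
                   threshold: float, p_one: float = 0.5) -> float:
    """Probability that the threshold detector misreads a packet."""
    zero_read_as_one = q_function((threshold - mu0) / sigma0)  # zero above threshold
    one_read_as_zero = q_function((mu1 - threshold) / sigma1)  # one below threshold
    return (1.0 - p_one) * zero_read_as_one + p_one * one_read_as_zero
```

With normalized packets (zero at 0, one at 1, both with a spread of 0.1) and the threshold at 0.5, each tail lies five standard deviations from its mean and `bit_error_rate(0.0, 0.1, 1.0, 0.1, 0.5)` is about 2.9e-7; moving the threshold to 0.7 raises the rate to roughly 7e-4, which is why, for equal spreads, the threshold belongs midway between the two distributions.
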
In addition, the effects of transfer efficiency differ between the two types of devices. The device modulation transfer function (MTF) greatly influences the signal representation; therefore the transfer efficiency (through the MTF) is an important parameter for analog implementation considerations. In the digital domain, however, device operation is typically independent of the absolute transfer efficiency, and therefore also independent of device-to-device variation in transfer efficiency, within very wide limits.

When we consider the effects of temperature and the temperature range over which various devices will operate, we must at the same time consider the frequency of operation of these devices. This is a direct result of the physics involved in the charge coupled device technology. Whether the signal is in digital or analog form, the fact of the matter is that carriers are collected in any potential minimum that exists within the silicon; these carriers are generated by dark currents and their total quantity is also a function of how long that potential minimum exists within the silicon. This means that the temperature range is inexorably tied up with the frequency of operation since the dark current is a function of temperature and the time a potential minimum exists is a function of frequency.

While this effect is identical in digital and analog operation, its consequences are drastically different in the two cases. As we have explained, any accumulation of carriers is an apparent increase in signal for the analog representation. The digital device, however, can accumulate carriers up to the point at which the output circuit's threshold is exceeded and a previously designated zero bit is misinterpreted as a one bit. As a result, quite a large number of carriers can be accumulated before the threshold circuit provides an incorrect result.

This means that the digital device operation is not adversely affected by an increase in temperature up to the point that the threshold circuit ceases to operate properly. Beyond that point, the device consistently makes errors. Therefore the range of temperature over which a CCD can operate in a digital mode is quite large and very predictable.

This temperature range is not independent of the frequency range as we have just pointed out. This means that at any given temperature of operation there is a minimum frequency of operation for proper device characteristics. Or viewed another way, at any given frequency there is a maximum temperature at which the device will operate.
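The trade-off between temperature and minimum clock frequency can be sketched numerically. The model below is an assumption-laden illustration, not the report's analysis: it assumes a charge packet waits under `n_wells` potential minima for about one clock period each, that dark current doubles every `doubling_c` degrees (a hypothetical parameter), and that errors begin when the accumulated dark charge exceeds a chosen margin. All numerical values are placeholders.

```python
def dark_current_density(j_ref: float, t_ref_c: float, t_c: float,
                         doubling_c: float) -> float:
    """Dark current density (A/cm^2) scaled from a reference temperature,
    assuming it doubles every `doubling_c` degrees C (hypothetical model)."""
    return j_ref * 2.0 ** ((t_c - t_ref_c) / doubling_c)


def min_clock_frequency(j_dark: float, well_area_cm2: float, n_wells: int,
                        charge_margin_c: float) -> float:
    """Lowest clock frequency (Hz) keeping accumulated dark charge below a margin.

    Accumulated charge is j_dark * area * n_wells / f; requiring it to stay
    below `charge_margin_c` gives f >= j_dark * area * n_wells / margin.
    """
    return j_dark * well_area_cm2 * n_wells / charge_margin_c


if __name__ == "__main__":
    AREA = 1.29e-6    # cm^2, roughly a 0.2 mil^2 storage element
    MARGIN = 0.8e-13  # C, hypothetical: half of a ~1.6e-13 C "one" packet
    for temp in (25, 75, 125):
        j = dark_current_density(1e-9, 25, temp, doubling_c=8.0)  # placeholder values
        f_min = min_clock_frequency(j, AREA, n_wells=100, charge_margin_c=MARGIN)
        print(f"{temp:4d} C: minimum clock ~ {f_min:.3g} Hz")
```

Read the other way, fixing the clock frequency and solving for the temperature at which `min_clock_frequency` reaches it gives the maximum operating temperature, which is the second view described above.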
In connection with temperature considerations, it is important to comment on the placement of refresh circuits throughout the digital function. These circuits are required quite naturally at several points in most digital operations. Consider, for example, what occurs when the results are available from a multiplier; it is conceivable that the output of the multiplier may go directly to the output pins of the device for use somewhere else in the system, or the output of the multiplier may be inserted into some other on-chip function. Since branching is required to do this, it is a natural place to put a charge to voltage conversion circuit that will refresh the charge and be capable of distributing it to several different places. This natural occurrence of refresh circuits reduces the time that any potential minimum is required to exist in the silicon. This means that the temperature range over which the device will operate is generally quite large; operation up to +135°C is not uncommon.

Radiation effects are quite similar to temperature effects in certain respects. Radiation generally does two things to harm the CCD operation: it increases the general level of dark current; and it changes the device thresholds. The increase in dark current can be viewed in the same way as an increase in temperature. The change in threshold however is a different kind of effect. The CCD has some margin to threshold variations due to its inherent design. For large radiation doses the threshold shift is quite large and an adjustment of the bias voltages is generally required in order to maintain acceptable overall device characteristics. In general, the digital CCD operation is quite immune to most of these variations; within its range of noise margin, the device can accept changes in threshold and increase in dark current without changing any of its operational characteristics.

The types of functions achievable in the analog and digital domains also differ. The digital representation is capable of performing logic functions in addition to arithmetic functions, whereas the analog device provides multiplication and addition. There is also a distinct difference in the accuracy of the arithmetic functions provided by each type of device. The digital representation has a calculation accuracy that is limited only by the input signal quantization: if the input signal is quantized to 8 bits, the calculation will be accurate to 8 bits. In the analog domain the accuracy is affected by the device parameters, through the transfer efficiency, and by the linearity of the input-output circuitry. In addition, analog multiplication is affected by the weighting and tapping scheme used in the multiplier, whereas the digital multiplier is accurate to the number of bits in its representation. This tapping and weighting error problem has received a great deal of attention from various workers in the analog signal processing field.

One other item should be reiterated at this point. In the digital representation, arithmetic functions are performed with pipeline techniques. This means that the two n-bit words representing, for example, a multiplier and a multiplicand are both accepted into the arithmetic function on one clock pulse; they are then clocked through the function with succeeding pulses, and eventually the 2n product bits emerge together on one clock pulse. After the first multiplier and multiplicand have been clocked into the function, a second, completely independent set can be accepted on the next clock pulse, a third set on the third clock pulse, and so on. All of the intermediate products are shifted along and kept entirely distinct from each other. Finally, at the output, each product arrives at its own point in time, with all of its output bits appearing simultaneously. This introduces a throughput delay between the time the multiplier and multiplicand are first introduced and the time that their product emerges, but succeeding products come out on every clock pulse. This is distinctly different from operation in the analog domain, where such multiplications can occur within one clock period and there is no throughput delay.
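The timing described above can be illustrated with a toy model. The sketch treats the arithmetic function as a chain of `depth` one-clock delay slots and computes each product as its operands enter, which is enough to show one new operand pair accepted per clock and each product emerging a fixed throughput delay later. The depth and operand values are arbitrary, and the model says nothing about how a DCCL multiplier actually forms its partial products.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def run_pipeline(operands: Iterable[Tuple[int, int]], depth: int,
                 n_clocks: int) -> List[Tuple[int, Optional[int]]]:
    """Return (clock, product leaving the pipeline) for each clock pulse."""
    slots: deque = deque([None] * depth)  # None marks an empty slot
    source = iter(operands)
    outputs: List[Tuple[int, Optional[int]]] = []
    for clock in range(n_clocks):
        pair = next(source, None)  # accept at most one new pair per clock
        slots.appendleft(None if pair is None else pair[0] * pair[1])
        outputs.append((clock, slots.pop()))  # the oldest entry leaves
    return outputs


if __name__ == "__main__":
    pairs = [(3, 5), (7, 2), (4, 4), (6, 9)]
    for clock, product in run_pipeline(pairs, depth=3, n_clocks=7):
        print(clock, product)
```

With `depth=3`, the first three clock pulses produce nothing (the throughput delay), after which one product emerges on every clock pulse.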

In Table 1, we have summarized a number of the salient features of our comparison between digital charge coupled devices and analog charge coupled devices.

<table>
<thead>
<tr>
<th>Item</th>
<th>Digital</th>
<th>Analog</th>
</tr>
</thead>
<tbody>
<tr>
<td>Signal Representation</td>
<td>n-bits per sample of input; requires handling n charge packets, but each packet is very small; an n-bit A/D is required.</td>
<td>One sample per sample of input; requires only one charge packet, but that packet must be large to maintain an acceptable S/N; no A/D is required.</td>
</tr>
<tr>
<td>Input/Output Circuitry</td>
<td>Simple, two-state, thresholding circuits; signal accuracy unaffected within large noise margins.</td>
<td>Linear circuits needed; signal accuracy a direct function of I/O circuit performance.</td>
</tr>
<tr>
<td>Transfer Efficiency effects</td>
<td>Device operation typically independent of transfer efficiency (greater than 0.95 per transfer ensures proper operation).</td>
<td>The well-known device modulation transfer function (MTF) shows that any transfer loss is a signal degradation; both amplitude and phase effects are seen.</td>
</tr>
<tr>
<td>Temperature Range</td>
<td>Due to the simple thresholding output circuit, dark current can accumulate right up to the threshold without affecting the signal; a very large temperature range results over which operation is totally unaffected.</td>
<td>Any dark current accumulation is measured as signal and degrades the S/N; special design precautions are required to minimize this effect, but it cannot be eliminated.</td>
</tr>
<tr>
<td>Maximum Frequency</td>
<td>Limited by transfer efficiency only at the point where the thresholding circuit is affected; thus a higher frequency is achievable.</td>
<td>Limited by transfer efficiency effects.</td>
</tr>
<tr>
<td>Functions Achievable</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Logic</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Addition</td>
<td>Yes, with an accuracy limited only by the input signal quantization.</td>
<td>Yes, with an accuracy limited by device parameters (such as transfer efficiency) and the I/O linearity.</td>
</tr>
<tr>
<td>Multiplication</td>
<td>Yes, with an accuracy limited only by the input signal quantization.</td>
<td>Yes, with an accuracy limited by device parameters (transfer efficiency), I/O circuit linearity and more importantly, the tapping and weighting scheme used.</td>
</tr>
<tr>
<td>Subtraction</td>
<td>Yes, with an accuracy limited only by the input signal quantization.</td>
<td>No</td>
</tr>
</tbody>
</table>

**NOTE:** Digital: in performing arithmetic functions, pipeline techniques are used; thus after a throughput delay, results are available at each clock pulse. Analog: in performing arithmetic functions, the operations occur within a clock period and results are available at each clock pulse; no throughput delay is required.

### 2.2 COMPARISON OF DCCL WITH OTHER LSI TECHNOLOGIES*

Almost all previous power dissipation comparisons between different digital techniques have been made at a single gate level; this is meaningless in a DCCL application so we have chosen to make the comparisons first on a full-adder logic cell and then on large arithmetic arrays.

* R. A. Allen, R. J. Handy, J. E. Sandor, "Charge Coupled Devices in Digital LSI", 1976 International Electron Devices Meeting, December 1976. The circuits used for this comparison of technologies were described in this paper.
2.2.1 DCCL Full-Adder

DCCL does not require any dc current, but the digital functions currently used, such as full-adders and regeneration cells that use a floating gate, require three clock phases. The comparison here uses our current three-input full-adder implementation, designed with 5μm minimum geometry. The area under each clock line of the full-adder was measured and the capacitance calculated for each of the different silicon oxide thicknesses.

The difference in clock levels required for each clock phase was used with the calculated capacitance to determine the power dissipation, \( C V^2 f \). This resulted in a total power dissipation of 29.6μW at a clock frequency of 1MHz. Since the power dissipation is a linear function of frequency, the characteristic results in a straight line as shown in Figure 2-2.
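The linear dependence on frequency follows directly from the \( C V^2 f \) form. The sketch below sums \( C V^2 \) over clock lines and scales by frequency; since the per-phase capacitances are not listed here, it backs out the effective \( \sum C V^2 \) from the reported 29.6μW at 1MHz rather than inventing individual values. The three-phase `clock_line_power` example values are hypothetical.

```python
from typing import Iterable, Tuple


def clock_line_power(phases: Iterable[Tuple[float, float]], f_hz: float) -> float:
    """Power (W) to charge each clock line: sum over phases of C * V^2 * f.

    `phases` holds (capacitance in farads, clock swing in volts) pairs.
    """
    return f_hz * sum(c * v ** 2 for c, v in phases)


# Effective sum of C*V^2 implied by 29.6 uW at 1 MHz (joules per clock cycle).
FULL_ADDER_CV2 = 29.6e-6 / 1e6


def dccl_full_adder_power(f_hz: float) -> float:
    """DCCL full-adder power, scaled linearly from the 1 MHz figure."""
    return FULL_ADDER_CV2 * f_hz


if __name__ == "__main__":
    for f in (1e5, 1e6, 1e7):
        print(f"{f:9.0f} Hz -> {dccl_full_adder_power(f) * 1e6:7.2f} uW")
    # Hypothetical three-phase example: 1 pF at 10 V, 2 pF at 8 V, 1 pF at 12 V.
    print(clock_line_power([(1e-12, 10.0), (2e-12, 8.0), (1e-12, 12.0)], 1e6))
```

At 10MHz this linear scaling gives about 296μW, ten times the 1MHz value.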

![Figure 2-2. Power Dissipation versus Clock Frequency for Full-Adders constructed from various Semiconductor Technologies](image)

2.2.2 CMOS Full-Adder

The CMOS full-adder used in this comparison contained 28 devices and was designed with 5μm minimum geometry and a 5 volt supply. It dissipates 870μW at a clock frequency of 1MHz. The power dissipation characteristic for CMOS is linear up to 10MHz and then typically changes to a much steeper curve as shown in Figure 2-2.
2.2.3 P-MOS and N-MOS Full-Adders

The typical P-MOS and N-MOS full-adder schematics are identical, each containing 18 devices. A 5μm minimum geometry p-channel enhancement-mode full-adder with saturated loads will dissipate 12mW at a clock frequency of 1MHz and 100mW at 10MHz.

Because the minority carrier mobility of n-channel devices is twice that obtained with p-channel units, the gates can be made smaller and a 5μm minimum geometry n-channel full-adder will dissipate 4.8mW at 1MHz and 40mW at 10MHz. The power dissipation versus clock frequency characteristics for P-MOS and N-MOS full-adders are shown in Figure 2-2.

2.2.4 Integrated Injection Logic Full-Adder

A delay-power product of 0.5pJ per gate for experimental I²L devices has been reported by S. Bruederle and P. Smith*, with 0.8pJ a value more commonly achieved in production. These reported figures were for 5μm devices, and for this comparison we will assume a delay-power product of 0.8pJ per inverter at low clock frequencies.

An I²L full-adder requires 26 inverters. The total delay-power product of the full-adder is the delay-power product of a single inverter multiplied by the number of inverters, i.e., 26 × 0.8pJ = 20.8pJ.

There are a maximum of five stage-delays through the full-adder. If we assume that the five stage-delays, each of duration \( d_s \), can be contained within one half clock period, then the full clock period equals \( 10 d_s \); for a clock frequency of 1MHz, \( d_s = 0.1\mu s \) and the power dissipation is:

\[ P_s = 20.8pJ / 0.1\mu s = 208\mu W \]

The power dissipation versus clock frequency curve for an I²L full-adder is shown in Figure 2-2. Note that there are two break points on the curve, one at 50kHz and the other at 5MHz. Below 50kHz, the power dissipation is constant. The reason for this is that the common-emitter current gain of an npn transistor used in an I²L configuration falls below 4 at 1μA, so the operating current cannot usefully be reduced further.

*Designing with I2L", Stan Bruederle and Philip Smith, 19/2 Westcon Aug 1975
Assume that, in a combinational logic circuit, half the inverters are not conducting and half have a collector current of 1μA and a pnp emitter current of 0.5μA at a supply of 0.8 volts. The average power dissipation of such an inverter is then 0.6μW, which results in a low-frequency power dissipation of 15.6μW for the 26-inverter full-adder.
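Both I²L estimates above reduce to one-line calculations. The sketch below reproduces them from the quoted figures; the function and parameter names are illustrative, and the static model (a fixed fraction of inverters conducting, drawing collector plus pnp emitter current from the supply) is simply the assumption stated in the text.

```python
def i2l_dynamic_power(n_inverters: int, delay_power_j: float, f_hz: float,
                      stage_delays: int = 5) -> float:
    """Power (W) when `stage_delays` gate delays fill half a clock period.

    Each gate delay is then T / (2 * stage_delays), and power is the total
    delay-power product divided by that gate delay.
    """
    gate_delay_s = 1.0 / (2 * stage_delays * f_hz)
    return n_inverters * delay_power_j / gate_delay_s


def i2l_static_power(n_inverters: int, i_collector_a: float, i_injector_a: float,
                     v_supply: float, conducting_fraction: float = 0.5) -> float:
    """Low-frequency floor (W): only the conducting fraction draws current."""
    per_inverter_w = conducting_fraction * (i_collector_a + i_injector_a) * v_supply
    return n_inverters * per_inverter_w


if __name__ == "__main__":
    print(i2l_dynamic_power(26, 0.8e-12, 1e6))      # about 208 uW at 1 MHz
    print(i2l_static_power(26, 1e-6, 0.5e-6, 0.8))  # about 15.6 uW
```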

From 50kHz to 5MHz the speed of an I²L circuit is limited by the base-emitter and intercircuit capacitances, and the propagation delay is inversely proportional to supply current. At clock frequencies above 5MHz, the speed limitations are due primarily to constraints imposed by the storage of minority carrier charges in the npn emitter and in the pnp base. Additional limitations are due to stray parasitic capacitances, the base resistance of the npn transistor, and the logic function implemented. These speed limitations of conventional I²L result in a 15MHz maximum operating frequency.

2.2.5 Power Dissipation Comparisons in Arithmetic Arrays

DCCL Arrays

When a variety of systems is considered, certain functions appear repeatedly: the fast-Fourier transform, for example, requires multipliers and adders; serial correlators require shift-registers, multipliers and accumulators; frequency synthesizers require shift-registers and accumulators; digital differential analyzers use adders and shift-registers to perform integration; and some transforms require add and subtract functions.

Although the power dissipation of a full-adder is useful for comparing various digital technologies, a comparison of the power-dissipation of arithmetic arrays is more meaningful.

A DCCL array requires half-adders, AND gates, charge refresh cells and shift-register delays in addition to full-adders. Each of these logic cells was treated in the same way as the full-adder discussed above, by calculating the capacitance and power dissipation of each clock line.

In both DCCL adder and multiplier arrays in which a full-adder is implemented from two half-adders, the transfer time through a full-adder is two clock periods. Consistent with our current usage, we now consider the delays, cell count and packing densities when a full-adder configuration is used. Each time that three bits are added together, the carry to the next higher binary level is delayed one clock period, so the other inputs to the next-level adder must also be delayed one clock period by means of shift-register delays. The sum-bit outputs are likewise staggered as the higher binary values of the output number are generated, so delays must be inserted in the lower-order outputs in order that the output bits are not skewed.

A list of the number of cells required for the arithmetic arrays is given in Table 2.

Table 2. Cell Count for Various DCCL Arrays

<table>
<thead>
<tr>
<th>Cell Type</th>
<th>16 + 16</th>
<th>32 + 32</th>
<th>8 x 8</th>
<th>16 x 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>Regeneration cells</td>
<td>75</td>
<td>343</td>
<td>62</td>
<td>89</td>
</tr>
<tr>
<td>AND gates</td>
<td>-0-</td>
<td>-0-</td>
<td>64</td>
<td>256</td>
</tr>
<tr>
<td>Shift registers</td>
<td>360</td>
<td>1488</td>
<td>190</td>
<td>1328</td>
</tr>
<tr>
<td>Full-adders</td>
<td>15</td>
<td>31</td>
<td>47</td>
<td>214</td>
</tr>
<tr>
<td>Half-adders</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>16</td>
</tr>
</tbody>
</table>
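The shift-register delays described above can be counted directly for a pipelined ripple-carry adder. The sketch assumes, as a hypothesis rather than a statement of the report's layout, that the carry advances one bit position per clock, so bit \( i \) of each operand is delayed \( i \) clocks before entering its adder and the sum bit of position \( i \) is delayed a further \( n-1-i \) clocks so all outputs leave together; the least significant position uses a half-adder. Regeneration cells are not modelled. Under these assumptions the shift-register, full-adder and half-adder counts coincide with the 16 + 16 and 32 + 32 columns of Table 2.

```python
def pipelined_adder_cells(n_bits: int) -> dict:
    """Cell counts for an n-bit + n-bit pipelined ripple-carry adder (sketch)."""
    # Input skew: bit i of BOTH operands waits i clocks for the carry to arrive.
    input_skew = 2 * sum(i for i in range(n_bits))
    # Output deskew: sum bit i waits n-1-i clocks for the most significant bit.
    output_deskew = sum(n_bits - 1 - i for i in range(n_bits))
    return {
        "shift_registers": input_skew + output_deskew,
        "full_adders": n_bits - 1,  # every position except the LSB
        "half_adders": 1,           # the LSB has no carry in
    }


if __name__ == "__main__":
    for n in (16, 32):
        print(n, pipelined_adder_cells(n))
```

The delay count grows as \( 3n(n-1)/2 \), which is why shift-register delays dominate the adder columns of Table 2.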

2.2.6 Other Digital Technologies

All of the multiplier arrays described by A. Habibi and P. A. Wintz* were reviewed and the cell counts did not vary significantly from the schemes used for the DCCL arrays. Therefore in calculating the power dissipation for various technologies we have assumed the same mix of cells that is listed in Table 2.

For implementing the necessary delays in CMOS, P-MOS and N-MOS, we have assumed that shift-registers are used and that in the I²L array, D-type flip-flops are used.

The total power dissipation for various size arrays and technologies at clock frequencies of 1MHz and 10MHz are listed in Tables 3 and 4. These power dissipations are calculated for the specific cell mix described in Table 2, and will vary slightly according to which scheme is used for adding the summands.

Table 3. Total Power Dissipation in Watts of Various Size Arrays and Technologies at a Clock Frequency of 1MHz

<table>
<thead>
<tr>
<th>Technology</th>
<th>16 + 16</th>
<th>32 + 32</th>
<th>8 x 8</th>
<th>16 x 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCCL</td>
<td>0.009</td>
<td>0.024</td>
<td>0.008</td>
<td>0.044</td>
</tr>
<tr>
<td>CMOS</td>
<td>0.582</td>
<td>2.3</td>
<td>0.820</td>
<td>4.1</td>
</tr>
<tr>
<td>P-MOS</td>
<td>2.9</td>
<td>13.3</td>
<td>2.5</td>
<td>15.0</td>
</tr>
<tr>
<td>N-MOS</td>
<td>0.531</td>
<td>2.3</td>
<td>0.559</td>
<td>3.1</td>
</tr>
<tr>
<td>I²L</td>
<td>0.040</td>
<td>0.174</td>
<td>0.036</td>
<td>0.215</td>
</tr>
</tbody>
</table>

Table 4. Total Power Dissipation in Watts of Various Size Arrays and Technologies at a Clock Frequency of 10MHz

<table>
<thead>
<tr>
<th>Technology</th>
<th>16 + 16</th>
<th>32 + 32</th>
<th>8 x 8</th>
<th>16 x 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCCL</td>
<td>0.089</td>
<td>0.237</td>
<td>0.077</td>
<td>0.444</td>
</tr>
<tr>
<td>CMOS</td>
<td>1.8</td>
<td>6.8</td>
<td>2.8</td>
<td>13.8</td>
</tr>
<tr>
<td>P-MOS</td>
<td>4.6</td>
<td>19.9</td>
<td>4.9</td>
<td>27.0</td>
</tr>
<tr>
<td>N-MOS</td>
<td>1.02</td>
<td>4.3</td>
<td>1.05</td>
<td>4.9</td>
</tr>
<tr>
<td>I²L</td>
<td>0.596</td>
<td>2.7</td>
<td>0.544</td>
<td>3.2</td>
</tr>
</tbody>
</table>
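Tables 3 and 4 can be cross-checked against the \( C V^2 f \) argument: a technology whose dissipation is purely capacitive should dissipate ten times more at 10MHz than at 1MHz. The short script below, using only the tabulated values, computes the 10MHz/1MHz ratio for each entry.

```python
ARRAYS = ("16 + 16", "32 + 32", "8 x 8", "16 x 16")

# Total power in watts, copied from Table 3 (1 MHz) and Table 4 (10 MHz).
POWER_1MHZ = {
    "DCCL": (0.009, 0.024, 0.008, 0.044),
    "CMOS": (0.582, 2.3, 0.820, 4.1),
    "P-MOS": (2.9, 13.3, 2.5, 15.0),
    "N-MOS": (0.531, 2.3, 0.559, 3.1),
    "I2L": (0.040, 0.174, 0.036, 0.215),
}
POWER_10MHZ = {
    "DCCL": (0.089, 0.237, 0.077, 0.444),
    "CMOS": (1.8, 6.8, 2.8, 13.8),
    "P-MOS": (4.6, 19.9, 4.9, 27.0),
    "N-MOS": (1.02, 4.3, 1.05, 4.9),
    "I2L": (0.596, 2.7, 0.544, 3.2),
}


def frequency_scaling() -> dict:
    """Ratio of 10 MHz to 1 MHz dissipation for each technology and array."""
    return {
        tech: tuple(p10 / p1 for p1, p10 in zip(POWER_1MHZ[tech], POWER_10MHZ[tech]))
        for tech in POWER_1MHZ
    }


if __name__ == "__main__":
    for tech, ratios in frequency_scaling().items():
        cells = ", ".join(f"{a}: {r:5.2f}" for a, r in zip(ARRAYS, ratios))
        print(f"{tech:6s} {cells}")
```

The DCCL entries scale by a factor close to 10, as expected for pure \( C V^2 f \) dissipation, whereas the other technologies depart from tenfold scaling.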

2.2.7 Package Density Comparisons

Digital CCD technology is an inherently high-density technique because a DCCL logic function is implemented by processing an existing packet of charge, in contrast to other logic families in which a logic function is implemented by a digital circuit built from several components. In addition, DCCL layouts have four layers of interconnection, with the silicon substrate acting as a ground plane. The signal flow is along channels at or below the silicon surface and forms the first interconnect layer. The two levels of polysilicon, which are insulated from each other, form the second and third layers; they can be used as cross-overs and are useful for interconnecting electrodes within logic cells. The single metal layer forms the bus lines for the clock phases and is the fourth interconnection layer.
The areas of various DCCL arithmetic arrays, listed in Table 5, are obtained from designed and fabricated chips. They are for the active circuit areas and do not include room for input/output buffers, bonding pads or borders.

Table 5. Estimates for the Active Area in mm² of various arithmetic arrays constructed from different semiconductor technologies.

<table>
<thead>
<tr>
<th>Technology</th>
<th>16 + 16</th>
<th>32 + 32</th>
<th>8 x 8</th>
<th>16 x 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCCL</td>
<td>2.92</td>
<td>8.94</td>
<td>3.1</td>
<td>28.0</td>
</tr>
<tr>
<td>P-MOS</td>
<td>11.3</td>
<td>49.2</td>
<td>12.2</td>
<td>67.7</td>
</tr>
<tr>
<td>N-MOS</td>
<td>7.78</td>
<td>34.7</td>
<td>7.65</td>
<td>44.2</td>
</tr>
<tr>
<td>CMOS</td>
<td>16.5</td>
<td>70.2</td>
<td>19.5</td>
<td>104</td>
</tr>
<tr>
<td>I²L</td>
<td>14.9</td>
<td>64.9</td>
<td>26.2</td>
<td>137</td>
</tr>
</tbody>
</table>

2.2.8 CMOS, P-MOS and N-MOS Arrays

In calculating the areas of the various MOS arrays, a minimum geometry of 5μm is used. The MOS gate lengths were calculated for the required transconductance using the 5μm minimum geometry for the MOS gate width. Thus, knowing the gate lengths and widths and using an alignment tolerance of 2μm, the areas of the logic cells could be computed.

The estimated areas of various CMOS, P-MOS and N-MOS arithmetic arrays are listed in Table 5. An interconnection overhead of 100% is assumed.

2.2.9 I²L Arrays

If we assume a minimum geometry of 5μm and an alignment tolerance of 2.4μm, then the D-type flip-flop layout illustrated in S. Bruederle's paper* will occupy 0.021mm². A full-adder laid out with the inverters perpendicular to the pnp emitter, in the same way as the D-type flip-flop, will measure 0.062mm². A half-adder will measure 0.04mm² and an AND gate 0.003mm².

The area estimates for various I²L arithmetic arrays are listed in Table 5. In the calculations for area, an interconnection overhead of 100% is used.

*Stan Bruederle and Philip Smith, "Designing with I²L", 19/2 Wescon, Aug. 1975

3. SYNTHESIS OF DCCL DESIGN EQUATIONS

3.1 DIGITAL GATES

Digital logic can be implemented with a two-level gate process such as that used in standard analog CCDs. A logical one is defined as a charge quantity equal to the capacity of a minimum geometry storage electrode, and a logical zero is defined as an empty storage electrode.

The logical OR function is the easiest function to implement; it is shown in Figure 3-1.

![Figure 3-1. DCCL OR Gate](image1.png)

When a logical one is transferred from either the A or the B input under a common storage electrode the OR function occurs. In this simple OR gate, the common storage electrode will contain a charge quantity which is twice that of a logical one when both A and B are ones. This condition can be corrected by providing a potential barrier and charge sink for the excess charge as shown in Figure 3-2.

![Figure 3-2. DCCL OR Gate with Correction for 1 + 1 Logic](image2.png)

Realizing that the charge which is discarded is the AND function of A and B, it is a natural extension of the basic OR gate to form an AND gate. As shown in Figure 3-3, an AND function is implemented by saving the charge which spills over the barrier electrode and sinking the OR function on an alternate clock phase.
The AND gate may be altered to perform the exclusive-OR function. The exclusive-OR function is shown in Figure 3-4.

In the exclusive-OR implementation, the output is taken from the OR function output. However, the output is corrected for the \((1 + 1)\) state by detecting the AND output with either a floating gate or a floating diffusion which raises the surface potential of the transfer gate and blocks the OR output. Since the \((1 + 1)\) state will leave a logical one charge packet under the D electrode, a charge sink must be provided on an alternate clock phase. If the AND function is not used, it must also be purged with a charge sink.

The next logical extension is implemented by taking the charge from the D electrode of the exclusive-OR gate to an output instead of a charge sink. The result is a half-adder which is shown in Figure 3-5.

The illustrated half-adder is currently being used as one of the fundamental cells in large arithmetic arrays. It is of interest to note the outputs which result when the A input is always supplied with a one. Under this condition, the carry output will be logically equal to B and the sum output will be logically equal to the complement of B. However by the action of the circuit, the resulting outputs are refreshed to a full logical one level. Hence, the half-adder may be used to perform the refresh function which becomes necessary to prevent signal degradation in large arrays.
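The gate sequence of Figures 3-1 through 3-5 can be mimicked with a toy charge-counting model. In the sketch below, charge is counted in units of one logical-one packet, a storage well behind a barrier holds at most one packet and the excess spills over, and the floating-gate sensing of the AND charge is modelled simply as blocking the OR output. It reproduces the logic only; it treats every packet as ideal, so it cannot show the level restoration that makes the refresh function useful.

```python
from typing import Tuple

ONE = 1  # charge of one logical-one packet (normalized units)


def or_with_barrier(a: int, b: int) -> Tuple[int, int]:
    """Merge two packets under a common well holding at most ONE packet.

    Returns (charge kept, charge spilled over the barrier): the kept charge
    is A OR B and the spilled charge is A AND B.
    """
    total = a + b
    kept = min(total, ONE)
    return kept, total - kept


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """Return (sum, carry): the sensed AND charge blocks the OR output."""
    or_out, and_out = or_with_barrier(a, b)
    sum_out = 0 if and_out else or_out  # exclusive-OR
    return sum_out, and_out             # carry is the spilled AND charge


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))
    # Refresh use: tie A to a one; carry follows B and sum is its complement.
    for b in (0, 1):
        print("refresh B =", b, "->", half_adder(ONE, b))
```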
A full-adder can be implemented by adding a third input to the input AND gate, an additional barrier and storage location and an OR gate.

The DCCL full-adder implementation is shown in Figure 3-6 along with its Truth Table.

**Truth Table**

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>X</th>
<th>I</th>
<th>SUM</th>
<th>CARRY</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
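Whatever the internal charge routing, the SUM and CARRY outputs of a full-adder are fixed by how many of the three inputs carry a one packet: SUM is 1 for an odd count and CARRY is 1 for a count of two or more. The sketch below generates those two columns for all eight input combinations and checks them against \( A \oplus B \oplus C \) and the majority function. It does not model the intermediate X and I nodes, whose behaviour depends on the layout of Figure 3-6.

```python
from itertools import product
from typing import Tuple


def full_adder_from_count(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (SUM, CARRY) from the number of one-packets applied."""
    total = a + b + c
    return total % 2, int(total >= 2)


if __name__ == "__main__":
    print("A B C  SUM CARRY")
    for a, b, c in product((0, 1), repeat=3):
        s, cy = full_adder_from_count(a, b, c)
        assert s == a ^ b ^ c and cy == (a & b) | (b & c) | (a & c)
        print(a, b, c, "", s, "  ", cy)
```

For example, the input A = 0, B = 0, C = 1 gives SUM = 1 and CARRY = 0.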
3.2 DCCL LOGIC CELL DESIGN

DCCL design begins with the selection of the dimensions of a minimum geometry storage element. From these dimensions, the dimensions of the other storage elements are determined according to the number of digital charge packets they must store. Present DCCLs employ a 0.2mil$^2$ storage element.

However, the floating gate holding well, which is controlled by the floating gate slave, must be designed to hold and transfer a digital one charge packet. A schematic of the floating gate holding well with a corresponding surface potential diagram is shown in Figure 3-7.

\[ Q = A_S C_{ox} \Delta V_{fg} \]

where \( A_S \) is the area of the floating gate holding well, \( C_{ox} \) is the oxide capacitance per unit area, and \( \Delta V_{fg} \) is the floating gate swing. The floating gate swing is,

\[ \Delta V_{fg} = \frac{dV_{fg}}{dQ} \, \Delta Q \]

Since the holding well must hold and transfer a logical one charge packet, it follows that \( Q = \Delta Q \) and,

\[ A_S = \frac{1}{C_{ox} \, dV_{fg}/dQ} \]
This equation yields the minimum required holding well area. To provide for noise immunity it is reasonable to double the area calculated. The floating gate sensitivity can be calculated from,

\[
\frac{dV_{fg}}{dQ} = \frac{2V_0 (V_{fg} - V_{FB})(\beta - 1) - V_0^2}{A_{fg} C_{ox} \left[ 2V_0 (V_{fg} - V_{FB})(\beta - 1)^2 + V_0^2 \right]} + \frac{\left[ V_0^2 - 4V_0 (V_{fg} - V_{FB})(\beta - 1) - 2(V_{fg} - V_{FB}) \right]^{1/2}}{A_{fg} C_{ox} \left[ 2(V_{fg} - V_{FB})(\beta - 1)^2 + V_0^2 \right]}
\]

where,

\[
V_{FB} = \phi_{ms} - Q_{ss}/C_{ox}
\]

\[
V_0 = \frac{q N_d \varepsilon_s}{C_{ox}^2}
\]

\[
\beta = \frac{A_{fg} C_{ox} + C_{ext}}{A_{fg} C_{ox}}
\]

\( V_{fg} \) is the floating gate voltage,

\( A_{fg} \) is the area of the floating gate,

\( V_{FB} \) is the flat-band voltage,

\( q \) is the magnitude of the electronic charge,

\( N_d \) is the substrate donor impurity concentration,

\( \varepsilon_s \) is the permittivity of silicon,

\( C_{ox} \) is the oxide capacitance per unit area,

\( Q_{ss} \) is the surface charge density,

\( \phi_{ms} \) is the gate/silicon work function difference, and

\( C_{ext} \) is the capacitance on the floating gate.

Present DCCL designs using 5μm geometries on 5 ohm-cm n-type substrates yield typical values of \( \beta = 1.6 \), \( dV_{fg}/dQ = 7.5 \times 10^{12} \) volts/coulomb, and a floating gate swing of 1.2 volts. Calculating the area of the holding well yields \( A_S = 0.6\,\text{mil}^2 \), which dictates that the floating gate holding well be three times the area of a logical one storage element.
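The arithmetic behind the holding-well sizing can be sketched in a few lines. The packet charge follows from the quoted swing and sensitivity; the area calculation needs the oxide capacitance per unit area, which is not given in this section, so the oxide thickness `T_OX_CM` below is a hypothetical placeholder and the resulting areas should not be read as the report's values.

```python
Q_E = 1.602176634e-19      # elementary charge, C
EPS_OX = 3.9 * 8.854e-14   # SiO2 permittivity, F/cm
MIL2_CM2 = (2.54e-3) ** 2  # one square mil in cm^2


def packet_charge(swing_v: float, dvdq_v_per_c: float) -> float:
    """Charge (C) that swings the floating gate by `swing_v`."""
    return swing_v / dvdq_v_per_c


def holding_well_area(c_ox_f_per_cm2: float, dvdq_v_per_c: float,
                      margin: float = 2.0) -> float:
    """A_S = margin / (C_ox * dV_fg/dQ) in cm^2; margin=2 doubles the minimum."""
    return margin / (c_ox_f_per_cm2 * dvdq_v_per_c)


if __name__ == "__main__":
    q = packet_charge(1.2, 7.5e12)
    print(f"one packet ~ {q:.2e} C ~ {q / Q_E:.2e} carriers")
    T_OX_CM = 1000e-8  # hypothetical 1000 Angstrom oxide
    c_ox = EPS_OX / T_OX_CM
    for margin in (1.0, 2.0):
        a = holding_well_area(c_ox, 7.5e12, margin)
        print(f"margin {margin}: A_S ~ {a / MIL2_CM2:.2f} mil^2")
```

The first line confirms that a 1.2 volt swing at \( 7.5 \times 10^{12} \) volts/coulomb corresponds to about \( 1.6 \times 10^{-13} \) C, roughly a million carriers per logical one.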
3.3 COMPARISON BETWEEN FULL-ADDER AND DUAL HALF-ADDER IMPLEMENTATIONS

3.3.1 Full-Adder Implementation

A half-adder accepts two inputs \( a \) and \( b \) and produces a sum \( S = 1 \) if either input is 1, but not when both inputs are 1. The carry \( C = 1 \) if both inputs are 1. Hence \( S = a \oplus b \) and \( C = ab \). A full-adder accepts three inputs and produces a sum \( S = 1 \) when one or all three inputs are 1; thus, in logical terms, \( S = g \oplus (a \oplus b) \). A carry \( C = 1 \) is produced when two or three inputs are 1s; \( C = g(a \oplus b) + ab \). Hence, a full-adder can be realized using two half-adders plus an OR gate, as shown in Figure 3-8.

![Diagram of Full-Adder Logic Cell Implemented with Dual Cascaded Half-Adders](image)

Figure 3-8. A Full-Adder Logic Cell Implemented with Dual Cascaded Half-Adders
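The decomposition can be checked exhaustively. The sketch below, written with ordinary Boolean operators rather than charge packets, cascades two half-adders and ORs their carries, then confirms that the pair (S, C) always encodes the arithmetic sum \( a + b + g \).

```python
from itertools import product
from typing import Tuple


def half_adder(a: int, b: int) -> Tuple[int, int]:
    """S = a XOR b, C = a AND b."""
    return a ^ b, a & b


def full_adder_dual(a: int, b: int, g: int) -> Tuple[int, int]:
    """Full-adder built from two cascaded half-adders plus an OR gate."""
    s1, c1 = half_adder(a, b)  # first half-adder: a + b
    s, c2 = half_adder(s1, g)  # second half-adder adds the third input g
    return s, c1 | c2          # OR gate merges the two partial carries


if __name__ == "__main__":
    for a, b, g in product((0, 1), repeat=3):
        s, c = full_adder_dual(a, b, g)
        assert s + 2 * c == a + b + g
    print("two half-adders + OR reproduce the full-adder for all 8 inputs")
```

The two partial carries can never both be 1 (if \( a \) and \( b \) are both 1, the first sum is 0), so the OR gate never has to merge two charge packets.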

There are trade-offs in the choice of full-adders and dual half-adders that affect the maximum clock frequency, power dissipation, signal-to-noise ratio, transfer efficiency, propagation time and silicon area.

3.3.2 Clock Frequency

For the large charge packets used in DCCLs, the transfer of charge is dominated by self-induced drift. In a half-adder the \( \Phi_1 \) clock is applied to the "OR" gate, referred to as the D gate in Figure 3-6. The duration of the \( \Phi_1 \) clock is determined by the time necessary for an input of \( 2Q_0 \) charges to fill the input storage area \( D \), transfer over the barrier and fill the storage area \( X \) under the master side of the floating gate, as shown in Figure 3-6.

The time required for the initial \( 2Q_0 \) charge to fall until the surface potential is equal to the thermal voltage \( kT/q \) has been shown to be

\[
t_{HA} = \frac{\pi L^3 W C_{OX}}{4\mu Q_0} \tag{5}
\]
where $L$ is the total length of the electrodes over the input gate, the D and X storage gates and the intermediate transfer gate, $W$ is the channel width, $C_{OX}$ is the oxide capacitance per unit area, $\mu$ is the mobility of the carriers, and $(\Phi_1 - \Phi_2) = 2Q_0/(L W C_{OX})$ is the surface potential corresponding to the initial input charge $2Q_0$. The potential difference $\Phi_1 - \Phi_2$ is the difference in surface potentials of a full charge packet $Q_0$.

At the end of the self-induced drift period, the remaining input charge has a surface potential of 26mV at room temperature and is swept out by the fringing fields.

The full-adder has an additional transfer area and storage gate that has to fill when the initial input charge is $3Q_0$. The self-induced drift period for the full adder is

$$t_{FA} = \frac{\pi L_{FA}^3 W C_{OX}}{6 \mu Q_0}$$

The ratio in self-induced drift time between a full-adder and a half-adder is

$$\frac{t_{FA}}{t_{HA}} = \frac{2}{3} \left( \frac{L_{FA}}{L_{HA}} \right)^3$$

For the specific designs described here, $L_{HA} = 1.4$ mil and $L_{FA} = 2.6$ mil. The $\Phi_1$ period for the full-adder will be 2.1 times that required for the half-adder.

The clock period for full-adders and half-adders can be divided into two periods, the period that the charges are equalizing while the $\Phi_1$ clock is negative and the period when the $\Phi_1$ clock is positive and the other clocks are controlling the charges. In a half-adder, the first period is approximately 40% or $0.4t$ and the second period is 60% or $0.6t$. In a full-adder the first period is $2.1 \times 0.4t = 0.84t$ so that the total time for a full adder is $1.44t$, compared to $1t$ for a half-adder.
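The clock-period estimate is a weighted sum: only the \( \Phi_1 \) share of the half-adder period stretches by the drift-time ratio. The sketch below takes the 40/60 split and the 2.1 ratio from the text as inputs; the function name is illustrative.

```python
def full_adder_period(phi1_ratio: float, phi1_fraction: float = 0.4) -> float:
    """Full-adder clock period in units of the half-adder period t.

    The Phi1 (charge-equalizing) part of the period scales by `phi1_ratio`;
    the remaining part, controlled by the other clocks, is unchanged.
    """
    return phi1_ratio * phi1_fraction + (1.0 - phi1_fraction)


if __name__ == "__main__":
    t_fa = full_adder_period(2.1)  # 0.84t + 0.6t
    print(f"full-adder period ~ {t_fa:.2f} t, so its maximum clock rate is "
          f"~{1 / t_fa:.2f} of the half-adder's")
```

The dual half-adder cell keeps the shorter period \( t \), at the cost of the two-clock-period transfer time through the cascaded pair.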

3.3.3 Power Dissipation

The power consumed in a DCCL is only the power required to charge the gate capacitances to each clock voltage. The capacitance of the $\Phi_1$ clock line of the full-adder is approximately 1.8 times that of a half-adder; apart from that clock line, all the other capacitances are identical. This additional capacitance causes a full-adder to dissipate 20% more power than a half-adder. However, when dual half-adders are used to implement the full-adder function, the configuration also requires two one-bit shift-registers and an OR gate. These additional elements, added to the two half-adders, result in an overall power dissipation 2.5 times that of a full-adder.
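The power figures above also imply how much of a half-adder's switched capacitance sits on the Φ1 line, if one assumes (our assumption, not stated in the text) that every clock line swings through the same voltage at the same frequency, so that dissipation scales with total switched capacitance as in P = CV²f. Writing x for the Φ1-line fraction, (1 - x) + 1.8x = 1.2 gives x = 0.25. The sketch below evaluates that and restates the dual half-adder figure in half-adder units (2.5 × 1.2 = 3.0); the names are hypothetical.

```python
def phi1_capacitance_fraction(cap_ratio: float, power_ratio: float) -> float:
    """Fraction x of a half-adder's switched capacitance on its Phi1 line.

    Assumes P = C * V**2 * f with the same V and f on every clock line, so the
    full-adder/half-adder power ratio is (1 - x) + cap_ratio * x.
    """
    return (power_ratio - 1.0) / (cap_ratio - 1.0)


x = phi1_capacitance_fraction(cap_ratio=1.8, power_ratio=1.2)
full_adder_power = 1.2                          # in half-adder units
dual_half_adder_power = 2.5 * full_adder_power  # 2 HAs + two 1-bit delays + OR
print(f"Phi1 share of half-adder capacitance: {x:.2f}")
print(f"dual half-adder power: {dual_half_adder_power:.1f} half-adder units")
```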

3.3.4 Signal-to-Noise Ratio

A half-adder requires one input port and one output port to the storage area under the master side of the floating gate. The spacing between the polysilicon gates covering the two ports is 5 μm, which results in a minimum storage area of $A = 52\ \mu\text{m}^2$.

The additional channel to the intermediate storage area in a full-adder requires that a second output port be added to the storage area under the master side of the floating-gate. The polysilicon spacing required by this additional output channel doubles the storage area to $A = 104\ \mu\text{m}^2$.

Increasing the storage area increases the capacitive drive of the floating-gate:

$$\frac{1}{\beta} = \frac{C_{ox}A + C_{ext}}{C_{ox}A}$$

However, it also reduces the change in surface potential under the master side of the floating-gate, as shown in expression (4). The net result is a decrease in $\Delta V_g$ that is a nonlinear function of the area $A$. A smaller $\Delta V_g$ reduces the voltage difference between the slave side of the floating-gate acting as a transfer gate and acting as a charge barrier gate. This reduced voltage swing may allow some charge to spill over the floating-gate when it is in the barrier mode. Thus an increase in $A$ decreases the noise immunity.

3.3.5 Transfer Efficiency

It is not feasible to use a "fat-zero" in implementing arithmetic functions with DCCLs, so in our current units we typically obtain a transfer efficiency of only 0.998 per transfer. There are two transfers through a full-adder, giving an overall transfer efficiency of 0.996; in a dual half-adder configuration there are four transfers, giving 0.992. In the layout of a large pipeline arithmetic array it will therefore be necessary to insert a level of charge refresh cells twice as often when dual half-adder configurations are used.
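Because the per-transfer efficiency compounds multiplicatively, the figures above follow from η^n with η = 0.998. The Python sketch below reproduces them and, for a hypothetical design floor of 0.99 on overall efficiency (our assumption; the text gives no threshold), counts how many full-adder or dual half-adder stages fit between refresh levels, illustrating the factor of two. The function names are hypothetical.

```python
def chain_efficiency(eta: float, transfers: int) -> float:
    """Overall efficiency after `transfers` transfers, each with efficiency eta."""
    return eta ** transfers


def transfers_before_refresh(eta: float, floor: float) -> int:
    """Largest number of transfers whose overall efficiency stays at or above floor."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    if floor <= 0.0:
        raise ValueError("floor must be positive")
    n = 0
    while eta ** (n + 1) >= floor:
        n += 1
    return n


ETA = 0.998  # per-transfer efficiency quoted in the text
print(f"full-adder (2 transfers):      {chain_efficiency(ETA, 2):.3f}")
print(f"dual half-adder (4 transfers): {chain_efficiency(ETA, 4):.3f}")

# Hypothetical design floor: refresh before overall efficiency drops below 0.99.
n = transfers_before_refresh(ETA, 0.99)
print(f"transfers per refresh level: {n}")
print(f"full-adder stages per refresh: {n // 2}, dual half-adder stages: {n // 4}")
```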
4. IMPLEMENTATION OF PIPELINE ARITHMETIC ARRAYS

In this section we describe the design of the arithmetic arrays implemented on the DP2 and DP3 chips. Both chips utilize surface p-channel two-phase CCD technology. They differ in geometry and processing, the DP2 utilizing 7.6μm minimum geometry and a metal-polysilicon structure whereas in the DP3, the geometry is reduced to a minimum of 5.1μm and a double polysilicon gate structure is used.

4.1 THE DP2, 4-BIT + 4-BIT ADDER ARRAY

The DP2 4 + 4 adder array uses the dual cascaded half-adders to perform the arithmetic.

The addition of two 4-bit binary numbers \(a_0, a_1, a_2, a_3\) and \(b_0, b_1, b_2, b_3\) in which \(a_0\) and \(b_0\) are the least significant bits, is performed with DCCL in a straightforward manner.

\[
\begin{array}{lccccc}
\text{First Word} & & a_3 & a_2 & a_1 & a_0 \\
\text{Second Word} & & b_3 & b_2 & b_1 & b_0 \\
\text{Carry Bits} & c_4 & c_3 & c_2 & c_1 & \\
\hline
\text{SUM} & S_4 & S_3 & S_2 & S_1 & S_0 \\
\end{array}
\]

(Carry bit \(c_n\) is generated by column \(n-1\).)
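A logic-level Python sketch of the column arithmetic above: column 0 uses a single half-adder, and each higher column n uses a full-adder built from two cascaded half-adders and an OR gate (the dual half-adder used in these arrays), consuming the carry c_n from column n - 1. It models only the Boolean function, not the charge-domain pipeline, shift-register delays or clocking; the function names are hypothetical.

```python
from __future__ import annotations


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def dual_half_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Full-adder from two cascaded half-adders plus an OR gate on the carries."""
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, z)
    return s, c1 | c2   # c1 and c2 are never both 1


def add_words(a: list[int], b: list[int]) -> list[int]:
    """Add two equal-length words given LSB first (a[0] is a_0).

    Returns S_0 ... S_N, LSB first; the final carry becomes the top sum bit.
    """
    if len(a) != len(b) or not a:
        raise ValueError("words must be non-empty and of equal length")
    s0, carry = half_adder(a[0], b[0])        # column 0: single half-adder
    out = [s0]
    for a_n, b_n in zip(a[1:], b[1:]):        # column n consumes c_n from column n-1
        s, carry = dual_half_adder(a_n, b_n, carry)
        out.append(s)
    out.append(carry)
    return out


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


print(from_bits(add_words(to_bits(0b1111, 4), to_bits(0b0011, 4))))  # 15 + 3
```

The example prints 18 (binary 10010). A plain OR gate suffices for merging the two carries because the first half-adder's sum is 0 whenever its carry is 1, so the second half-adder cannot also produce a carry.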

A block diagram of the DP2, 4 + 4 adder array is shown in Figure 4-1 and a photograph of a processed array is shown in Figure 4-2. The 4 + 4 adder utilizes seven half-adders, three OR-gates, fifteen single-bit shift-registers and five output buffers. There is a propagation delay of four clock phases through the 4 + 4 array. The results of tests carried out on the 4 + 4 array are described in Section 5.1.

4.2 THE DP2, 8-BIT + 8-BIT (DHA) ADDER ARRAY

The DP2 contains two 8 + 8 bit adder arrays. In the first array the arithmetic is performed with cascaded dual half-adders (DHA) and in the other, the arithmetic is performed with full-adders (FA).

The 8 + 8 (DHA) adder array is an extension of the 4 + 4. The addition of the two 8-bit binary numbers \(a_0, \ldots, a_7\) and \(b_0, \ldots, b_7\) is performed with DCCL in the same manner as in the 4 + 4 array.
FIGURE 4-2. DP2 4 + 4 ADDER ARRAY UTILIZING THREE CASCADED HALF-ADDERS AND A SINGLE HALF-ADDER.
A block diagram of the DP2, 8 + 8 (DHA) adder array is shown in Figure 4-3. The array utilizes fifteen half-adders, seven OR gates, seventy-seven single-bit shift-registers and eight output buffers. There is a propagation delay of eight clock phases through the 8 + 8 (DHA) adder. A photograph of a processed array is shown in Figure 4-4, and the results of tests carried out on the 8 + 8 (DHA) array are described in Section 5.2.
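The half-adder and OR-gate counts quoted for these adders follow a simple pattern if each column above the least significant one is a dual half-adder (two half-adders plus one OR gate) and the least significant column is a single half-adder, or, with the carry-in option of the DP3 16 + 16 design (Section 4.4), also a dual half-adder. The hypothetical helper below encodes that rule and reproduces 7 and 3 for the 4 + 4, 15 and 7 for the 8 + 8 (DHA), and 32 and 16 for the 16 + 16; shift-register and buffer counts depend on the pipeline layout and are not modeled.

```python
from __future__ import annotations


def adder_cell_counts(n_bits: int, carry_in: bool = False) -> dict[str, int]:
    """Half-adder and OR-gate counts for an n-bit + n-bit dual-half-adder adder.

    Without carry-in, the least significant column is a single half-adder and
    each of the other n - 1 columns is a dual half-adder (2 half-adders + 1 OR).
    With carry-in, every column is a dual half-adder.
    """
    dual_cells = n_bits if carry_in else n_bits - 1
    half_adders = 2 * dual_cells + (0 if carry_in else 1)
    return {"half_adders": half_adders, "or_gates": dual_cells}


for n_bits, carry_in in [(4, False), (8, False), (16, True)]:
    print(n_bits, carry_in, adder_cell_counts(n_bits, carry_in))
```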

4.3 THE DP2, 8-BIT + 8-BIT (FA) ADDER ARRAY

The 8 + 8 (FA) adder array performs the addition of two 8-bit numbers in the same manner as the 8 + 8 (DHA) described in Section 4.2. A block diagram of the DP2, 8 + 8 (FA) adder array is shown in Figure 4-5. This array utilizes one half-adder, seven full-adders, eighty-four single-bit shift registers and eight output buffers. There is a propagation delay of eight clock phases through the 8 + 8 (FA) adder. A photograph of a processed array is shown in Figure 4-6.

4.4 THE DP3, 16-BIT + 16-BIT ADDER ARRAY

The 16 + 16 adder array performs the addition of two 16-bit binary numbers in the same pipeline manner as the 8 + 8 (DHA) adder array described in Section 4.2. A block diagram of the 16 + 16 adder array is shown in Figure 4-7. The full-adder cells used in the 16 + 16 array are composed of dual cascaded half-adders and an OR gate, as described in Section 5.1. One change was made to the basic design: a full-adder rather than a half-adder is used for the least significant sum. Its additional input serves as a "carry-in", which allows arrays to be cascaded to handle words of any length.

The packets of charge propagating through the array undergo seventeen transfers. Due to the low transfer efficiency obtained through the DP2 arithmetic arrays, we decided to insert two levels of charge refresh cells in the 16 + 16 array.

The first level of charge refresh was inserted after nine or ten transfers and the second level after the seventeenth transfer, immediately before the charge packet is converted to a voltage signal by the output buffer. The 16 + 16 array utilizes thirty-two half-adders, sixteen OR gates, forty charge refresh cells, three hundred and forty-one single-stage shift-register delays and seventeen output buffers. The design of the full-adders is discussed in Section 5.1, the charge refresh cell in Section 5.9 and the output buffer in Section 5.11. There is a propagation delay of seventeen clock phases through the 16 + 16 adder array. A photograph of the processed array is shown in Figure 4-8, and the results of tests carried out on the 16 + 16 array are described in Section 5.3.

Figure 4-3. 8-Bit Adder Array Utilizing Cascaded Dual Half-Adder Logic Cells

NOTE: EACH T = 1-BIT DELAY.
FIGURE 4-4. DP2 8 + 8 ADDER ARRAY UTILIZING SEVEN CASCADED HALF-ADDERS AND A SINGLE HALF-ADDER
Figure 4-5. 8-Bit + 8-Bit Adder Array Utilizing Full-Adder Logic Cells
FIGURE 4-6. DP2 8 + 8 ADDER ARRAY UTILIZING A HALF-ADDER AND SEVEN FULL-ADDERS
FIGURE 4-7. LOGIC DIAGRAM OF A DP3 16 + 16 ADDER ARRAY

The two large square areas shown in Figure 4-8, near the adder array, are two polysilicon/silicon dioxide capacitors that are used to evaluate the C-V characteristics during semiconductor processing evaluation.

4.5 THE DP2, 3-BIT X 3-BIT MULTIPLIER ARRAY

The operations required to multiply two 3-bit binary numbers are performed in the following manner.

\[
\begin{array}{cccccc}
 & & & a_3 & a_2 & a_1 \\
 & & & b_3 & b_2 & b_1 \\
\hline
 & & & a_3b_1 & a_2b_1 & a_1b_1 \\
 & & a_3b_2 & a_2b_2 & a_1b_2 & \\
 & a_3b_3 & a_2b_3 & a_1b_3 & & \\
\hline
p_6 & p_5 & p_4 & p_3 & p_2 & p_1
\end{array}
\]

The nine summands must be formed with AND gates and then added by columns (with carries) to form the product \( p_6 \ldots p_2, p_1 \). Note, for example, that if \( a_1 = a_2 = a_3 = b_1 = b_2 = 1 \), then one carry is produced in the generation of \( p_2 \) and two carries are produced in the generation of \( p_3 \).
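The column-by-column procedure above can be sketched in Python. The reduction scheme is our assumption: while three or more entries remain in a column, a dual half-adder (two half-adders plus an OR gate) turns three entries into one sum and one carry, and when two remain a single half-adder is used. For the 3 x 3 case this reproduces the counts quoted below (nine AND gates, nine half-adders, three OR gates, three half-adders each for p3 and p4), but it models only the logic, not the pipeline delays or shift-registers, and is not claimed to match the DP3 8 x 8 layout. Function names are hypothetical.

```python
from __future__ import annotations


def multiply_columns(a: list[int], b: list[int]) -> tuple[list[int], dict[str, int]]:
    """Multiply two words given LSB first (a[0] is a_1 in the text).

    Column k collects the AND-gate summands a[i] & b[j] with i + j == k plus
    the carries from column k - 1. Returns the product bits (LSB first) and
    the number of AND gates, half-adders and OR gates used.
    """
    counts = {"and": 0, "half_adder": 0, "or": 0}
    columns: list[list[int]] = [[] for _ in range(len(a) + len(b))]
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            columns[i + j].append(ai & bj)   # one AND gate per summand
            counts["and"] += 1

    product: list[int] = []
    carries: list[int] = []
    for summands in columns:
        entries = summands + carries
        carries = []
        while len(entries) > 1:
            if len(entries) >= 3:
                x, y, z = entries.pop(), entries.pop(), entries.pop()
                s1, c1 = x ^ y, x & y          # first half-adder
                s, c2 = s1 ^ z, s1 & z         # second half-adder
                carries.append(c1 | c2)        # c1 and c2 are never both 1
                counts["half_adder"] += 2
                counts["or"] += 1
            else:
                x, y = entries.pop(), entries.pop()
                s = x ^ y
                carries.append(x & y)
                counts["half_adder"] += 1
            entries.append(s)                  # the sum stays in this column
        product.append(entries[0] if entries else 0)
    return product, counts


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


bits, cells = multiply_columns(to_bits(0b111, 3), to_bits(0b100, 3))
print("".join(map(str, reversed(bits))), cells)
```

For the inputs 111 and 100 the sketch prints the product 011100, the same value recorded for that input pair in Section 5.4, together with the cell counts.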

The block diagram of the DP2 3 x 3 multiplier array is given in Figure 4-9, which shows that the generation of \( p_1 \) requires only an AND gate, the generation of \( p_2 \) requires a half-adder, and the generation of \( p_3 \) and \( p_4 \) each requires three half-adders. The 3 x 3 array utilizes nine AND gates, nine half-adders, three OR gates, seventeen 1-bit shift-registers and six output buffers.

There is a propagation delay of four clock phases through the 3 x 3 multiplier. A photograph of the DP2, 3 x 3 multiplier is shown in Figure 4-10 and the test results are discussed in Section 5.4.

4.6 THE DP3, 8-BIT X 8-BIT MULTIPLIER ARRAY

The 8 x 8 multiplier requires many more operations than the 3 x 3 as shown in Table 6.
FIGURE 4-8. THE DP3 16 + 16 ADDER ARRAY

Figure 4-10. DP2 3 x 3 Multiplier Array
**TABLE 6. 8 X 8 ARITHMETIC**

<table>
<thead>
<tr>
<th>a_8</th>
<th>a_7</th>
<th>a_6</th>
<th>a_5</th>
<th>a_4</th>
<th>a_3</th>
<th>a_2</th>
<th>a_1</th>
</tr>
</thead>
<tbody>
<tr>
<td>b_8</td>
<td>b_7</td>
<td>b_6</td>
<td>b_5</td>
<td>b_4</td>
<td>b_3</td>
<td>b_2</td>
<td>b_1</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>a_8 \cdot b_1</th>
<th>a_7 \cdot b_1</th>
<th>a_6 \cdot b_1</th>
<th>a_5 \cdot b_1</th>
<th>a_4 \cdot b_1</th>
<th>a_3 \cdot b_1</th>
<th>a_2 \cdot b_1</th>
<th>a_1 \cdot b_1</th>
</tr>
</thead>
<tbody>
<tr>
<td>a_8 \cdot b_2</td>
<td>a_7 \cdot b_2</td>
<td>a_6 \cdot b_2</td>
<td>a_5 \cdot b_2</td>
<td>a_4 \cdot b_2</td>
<td>a_3 \cdot b_2</td>
<td>a_2 \cdot b_2</td>
<td>a_1 \cdot b_2</td>
</tr>
<tr>
<td>a_8 \cdot b_3</td>
<td>a_7 \cdot b_3</td>
<td>a_6 \cdot b_3</td>
<td>a_5 \cdot b_3</td>
<td>a_4 \cdot b_3</td>
<td>a_3 \cdot b_3</td>
<td>a_2 \cdot b_3</td>
<td>a_1 \cdot b_3</td>
</tr>
<tr>
<td>a_8 \cdot b_4</td>
<td>a_7 \cdot b_4</td>
<td>a_6 \cdot b_4</td>
<td>a_5 \cdot b_4</td>
<td>a_4 \cdot b_4</td>
<td>a_3 \cdot b_4</td>
<td>a_2 \cdot b_4</td>
<td>a_1 \cdot b_4</td>
</tr>
<tr>
<td>a_8 \cdot b_5</td>
<td>a_7 \cdot b_5</td>
<td>a_6 \cdot b_5</td>
<td>a_5 \cdot b_5</td>
<td>a_4 \cdot b_5</td>
<td>a_3 \cdot b_5</td>
<td>a_2 \cdot b_5</td>
<td>a_1 \cdot b_5</td>
</tr>
<tr>
<td>a_8 \cdot b_6</td>
<td>a_7 \cdot b_6</td>
<td>a_6 \cdot b_6</td>
<td>a_5 \cdot b_6</td>
<td>a_4 \cdot b_6</td>
<td>a_3 \cdot b_6</td>
<td>a_2 \cdot b_6</td>
<td>a_1 \cdot b_6</td>
</tr>
<tr>
<td>a_8 \cdot b_7</td>
<td>a_7 \cdot b_7</td>
<td>a_6 \cdot b_7</td>
<td>a_5 \cdot b_7</td>
<td>a_4 \cdot b_7</td>
<td>a_3 \cdot b_7</td>
<td>a_2 \cdot b_7</td>
<td>a_1 \cdot b_7</td>
</tr>
<tr>
<td>a_8 \cdot b_8</td>
<td>a_7 \cdot b_8</td>
<td>a_6 \cdot b_8</td>
<td>a_5 \cdot b_8</td>
<td>a_4 \cdot b_8</td>
<td>a_3 \cdot b_8</td>
<td>a_2 \cdot b_8</td>
<td>a_1 \cdot b_8</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Column</th>
<th>P_{16}</th>
<th>P_{15}</th>
<th>P_{14}</th>
<th>P_{13}</th>
<th>P_{12}</th>
<th>P_{11}</th>
<th>P_{10}</th>
<th>P_{9}</th>
<th>P_{8}</th>
<th>P_{7}</th>
<th>P_{6}</th>
<th>P_{5}</th>
<th>P_{4}</th>
<th>P_{3}</th>
<th>P_{2}</th>
<th>P_{1}</th>
</tr>
</thead>
<tbody>
<tr>
<th>Summands</th>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>7</td>
<td>6</td>
<td>5</td>
<td>4</td>
<td>3</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<th>Carries in</th>
<td>1</td>
<td>2</td>
<td>2</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>3</td>
<td>3</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<th>Carries out</th>
<td>0</td>
<td>1</td>
<td>2</td>
<td>2</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>3</td>
<td>3</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
The 64 summands are formed by 64 AND gates and then added by columns (with carries) to form the product bits $p_{16} \ldots p_2, p_1$. Note, for example, that if all inputs are a binary "1", two carries are formed in the generation of $p_3$, so that a total of six entries must be added to generate $p_4$. This increase in entries continues until we reach $p_9$; this column has seven summands plus four carries, a total of eleven entries. Column $p_9$ therefore requires five cascaded dual half-adders, or ten half-adders.
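The summand row of Table 6 and the eleven-entry figure for p9 can be reproduced with a few lines of Python. Column p_k receives one AND-gate summand for every pair (i, j) with i + j - 1 = k; the four carries into p9 are taken from Table 6 rather than computed. The rule that a column of n entries needs n - 1 half-adders holds because each half-adder merges two entries of the column into one sum bit and passes its carry to the next column. The function names are hypothetical.

```python
from __future__ import annotations


def summands_per_column(n_bits: int) -> list[int]:
    """AND-gate summands feeding product columns p_1 ... p_2n (index 0 is p_1)."""
    counts = [0] * (2 * n_bits)
    for i in range(n_bits):
        for j in range(n_bits):
            counts[i + j] += 1
    return counts


def half_adders_for_column(entries: int) -> int:
    """Half-adders needed to reduce a column of `entries` bits to one sum bit."""
    return max(entries - 1, 0)


summands = summands_per_column(8)
print(summands)            # p_1 ... p_16
print(sum(summands))       # total number of AND gates
carries_into_p9 = 4        # taken from Table 6, not computed here
entries_p9 = summands[8] + carries_into_p9
print(entries_p9, half_adders_for_column(entries_p9))
```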

A block diagram of the DP3, 8 x 8 multiplier is shown in Figure 4-11. This array contains 64 AND gates, 111 half-adders, 48 OR gates, 154 charge refresh cells, 466 single-bit shift-registers and 16 output buffers.

There is a propagation delay of 32 clock-phases through the 8 x 8 multiplier. A photograph of the DP3, 8 x 8 multiplier is shown in Figure 4-12 and the test results are discussed in Section 5.5.

**FIGURE 4-11. A BLOCK DIAGRAM OF THE DP3 8 X 8 MULTIPLIER ARRAY**
5. FUNCTIONAL TESTING OF ARITHMETIC ARRAYS

5.1 TESTING THE DP2, 4 + 4 ARRAY

The 4 + 4 array processes the data in parallel: the two 4-bit numbers are applied synchronously, and the outputs are also available synchronously. The design of the DP2, 4 + 4 array is described in Section 4.1.

The test procedure described in this section was carried out on a wafer, using a probe station.

The gate voltage versus surface potential ($V_{G}/\phi_{S}$) plots were made of several chips on each wafer processed. A typical $V_{G}/\phi_{S}$ plot of a p-channel DP2 wafer is shown in Figure 5-1 and a block diagram of a single DP2 half-adder cell is shown in Figure 5-2. A surface potential diagram of the half-adder sum and carry channels was derived as shown in Figure 5-3. The gates referenced in Figure 5-3 correspond to the gates shown in Figure 5-2.

The clock waveforms shown in Figure 5-4, which produce the required surface potentials in the correct phase, were derived from the $V_{G}/\phi_{S}$ plots and applied to the logic cell under test.

The eight input lines were exercised through all possible sixteen combinations and the output from the array was observed on a CRT. It was seen that the five output lines produced the correct output data for each input combination.

CRT photographs of the 4 + 4 sum outputs for various input combinations are shown in Figures 5-5, 5-6, 5-7, and 5-8.

5.2 FUNCTIONAL TESTING OF THE DP2, 8 + 8 ARRAYS

There are two 8 + 8 adder arrays on the DP2 chip, one design utilizes dual half-adders and is described in Section 4.2. The other design is made up of full-adders and is described in Section 4.3.

Several mask errors in the full-adder version prevented it from functioning. Because we planned to use the dual half-adder concept on the DP3 layout, and the DP3 mask set was soon to be completed, we decided not to procure a corrected version of the DP2 mask set.

The dual half-adder version of the 8 + 8 adder array was operated with the same clock voltages derived for the 4 + 4 array, as shown in Figure 5-4.
FIGURE 5-1. GATE VOLTAGE VS SURFACE POTENTIAL ($V_{G}/\Phi_{s}$) CURVES. Surface potential $\Phi_s$ (0 to -10 V) versus gate voltage $V_G$ (0 to -30 V) for polysilicon gates over normal oxide and Al gates over field oxide; back bias = +9.45 V.
FIGURE 5-2. DETAIL BLOCK DIAGRAM OF A SINGLE HALF-ADDER
Figure 5-4. Detail Timing Waveforms and Gate Voltages of Half-Adder
Figure 5-5. Input to 2-Word, 4-Bit Adder

\[
\begin{aligned}
a_i &= 1111 \quad (\text{1 V inputs}) \\
b_i &= 0000 \\
s_i &= 01111 \quad (\text{leading bit is the carry})
\end{aligned}
\]

Figure 5-6. Input to 2-Word, 4-Bit Adder

\[
\begin{aligned}
a_i &= 1111 \quad (\text{1 V inputs}) \\
b_i &= 0001 \\
s_i &= 10000 \quad (\text{leading bit is the carry})
\end{aligned}
\]

Figure 5-7. Input to 2-Word, 4-Bit Adder

\[
\begin{aligned}
a_i &= 1111 \quad (\text{1 V inputs}) \\
b_i &= 0011 \\
s_i &= 10010 \quad (\text{carry trace not shown})
\end{aligned}
\]

Figure 5-8. Input to 2-Word, 4-Bit Adder

\[
\begin{aligned}
a_i &= 1111 \quad (\text{1 V inputs}) \\
b_i &= 0111 \\
s_i &= 10110 \quad (\text{carry trace not shown})
\end{aligned}
\]
All nine output channels responded correctly to the input signals as shown in the oscilloscope photographs in Figures 5-9, 5-10, 5-11, 5-12, and 5-13.

Figure 5-9. Input to 2-Word, 8-Bit Adder
\[ a_i = 11111111 \]
\[ b_i = 00000000 \]
\[ s_i = 011111111 \]

Figure 5-10. Input to 2-Word, 8-Bit Adder
\[ a_i = 11111111 \]
\[ b_i = 00000001 \]
\[ s_i = 100000000 \]

Figure 5-11. Input to 2-Word, 8-Bit Adder
\[ a_i = 00101011 \]
\[ b_i = 00101010 \]
\[ s_i = 001101010 \]

Figure 5-12. Input to 2-Word, 8-Bit Adder
\[ a_i = 11010101 \]
\[ b_i = 10000100 \]
\[ s_i = 101011001 \]

Figure 5-13. Input to 2-Word, 8-Bit Adder
\[ a_i = 11111111 \]
\[ b_i = 11111111 \]
\[ s_i = 111111110 \quad (\text{leading bit is the carry}) \]

In these figures, \( a_i \) and \( b_i \) are the input bits, with \( a_0 \) and \( b_0 \) the least significant bits, and \( s_i \) is the sum; each word is written most significant bit first.

5.3 TESTING OF THE DP3, 16 + 16 ARRAY

In an effort to reduce the number of individual clock lines required by the DP2 designs, we studied the problem and carried out empirical testing on single half-adder test cells. As a result, we determined that the dual half-adder would function correctly with only five clock lines plus the inject diode and the OR refresh clock.

The DP3, 16 + 16 adder array design was based on this concept. Unfortunately, we experienced a race condition that prevented us from testing the 16 + 16 array. This problem is described in some detail in Section 6.2.

5.4 TESTING OF THE DP2, 3 X 3 ARRAY

The 3-bit x 3-bit multiplier array was designed with the same dual half-adder and single half-adder cells used in the 4 + 4 and 8 + 8 arrays. The same clock waveforms shown in Figure 5-4 were applied to the 3 x 3 multiplier array, and it performed the correct operation for all fifteen input states.
Oscilloscope photographs of the output for several different input combinations are shown in Figures 5-14, 5-15, 5-16, 5-17, 5-18, and 5-19. In these figures, \( a_i \) and \( b_i \) represent the input bits and \( p_i \) represents the product output.

Figure 5-14. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 111 \\
b_i &= 100 \\
p_i &= 011100
\end{aligned}
\]

Figure 5-15. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 010 \\
b_i &= 010 \\
p_i &= 000100
\end{aligned}
\]

Figure 5-16. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 110 \\
b_i &= 010 \quad (1.5\ \text{V/cm}) \\
p_i &= 001100
\end{aligned}
\]

Figure 5-17. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 100 \\
b_i &= 100 \quad (1.5\ \text{V/cm}) \\
p_i &= 010000
\end{aligned}
\]

Figure 5-18. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 110 \\
b_i &= 100 \\
p_i &= 011000
\end{aligned}
\]

Figure 5-19. Input to the 2-Word, 3-Bit Multiplier

\[
\begin{aligned}
a_i &= 101 \\
b_i &= 100 \\
p_i &= 010100
\end{aligned}
\]
5.5 FUNCTIONAL TESTING OF THE DP3, 8 X 8 ARRAY

The 8 x 8 array uses the same logic cells as the 16 + 16 array described in Section 5.3. It therefore has the same race condition as the 16 + 16 array and has so far proven to be untestable.
6. FUNCTIONAL TESTING OF DP3A ARITHMETIC CELLS

Wafers were processed from the DP3 masks and various logic cells were tested. It was soon found that there were many layout errors, and also that some of the test cells were too complex to characterize and analyze easily.

It was therefore decided to correct the masks and simplify some of the test cells. The new mask set was designated DP3A.

Both the 16 + 16 and 8 x 8 arithmetic arrays on the DP3A were designed from interconnected full-adders formed from dual cascaded floating-gate half-adders. To verify the design of this basic array cell, a single full-adder identical to the one used in the arrays was placed on the DP3A mask set as a test cell.

In addition to the dual-cascaded floating-gate half-adder, layouts of design variations of half-adders and full-adders were also included as test cells on the DP3A layout.

6.1 TESTING THE FLOATING-DIFFUSION HALF-ADDER

A single half-adder with a floating-diffusion charge sensing switch was included in the DP3A design. A block diagram of this half-adder test cell is shown in Figure 6-1.

In Figure 6-1, the double lines ending in an arrow represent a channel path and charge transfer direction. A single line represents a clock line or metal signal path. The large areas at the end of a channel arrow represent charge storage gates and the narrow rectangles with a channel crossing through them represent transfer gates. The cross hatched areas represent diffused areas. This symbolic representation of a DCCL function is used throughout this section.

A typical $V_{g}/\phi_{S}$ plot of a DP3A wafer is shown in Figure 6-2; from these curves the potentials of the waveforms shown in Figure 6-3 were derived.

The clock and data waveforms shown in Figure 6-3 were used to test the half-adder, either on the wafer or as a packaged unit. Figures 6-4 and 6-5 show the correct operation of the half-adder tested on the wafer with a 100KHz clock frequency.
Figure 6-1. Block diagram of a DP3A floating-diffusion, half-adder DCCL cell.
FIGURE 6-2. A $V_g/\phi_s$ Plot of a Typical p-Channel DP3A Wafer Showing the Difference in Surface Potentials Under the First and Second Polysilicon Gates.
FIGURE 6-3. WAVEFORMS ASSOCIATED WITH THE FLOATING-DIFFUSION HALF-ADDER TEST CELL
Packaged half-adders were functionally tested over a wide temperature range in order to determine their minimum and maximum operating frequencies. During the testing it was determined that the half-adder performed correctly with a clock frequency of 6.5MHz at a temperature of 45.3°C as shown in Figure 6-6.

A curve of the operational range of the half-adder as a function of temperature is shown in Figure 6-7. It must be pointed out, however, that both the low frequency of 10 kHz and the high frequency of 6.5 MHz are limited by the design of the pulse generator used in the testing (and, in the 6.5 MHz case, by the slew rate of the MOS output circuit).

Figure 6-4. Functional demonstration of the floating-diffusion half-adder with an input of A = 1, B = 1 at a clock rate of 100 kHz. Traces: A in, B in, Sum out, Carry out.

Figure 6-5. Functional demonstration of the floating-diffusion half-adder with an input of $A = 1, B = 0$ at a clock rate of 100 kHz. Traces: Sum, Carry.

Figure 6-6. Functional demonstration of the floating-diffusion half-adder with an input of $A = 10101111$ and $B = 01011111$ at a clock rate of 6.5 MHz. The output is slew-rate limited by the final MOS circuit.
FIG 6-7. THE OPERATIONAL FREQUENCY RANGE OF A PACKAGED FLOATING-DIFFUSION HALF-ADDER AS A FUNCTION OF TEMPERATURE
6.2 TESTING THE FLOATING-GATE CASCADED DUAL HALF-ADDERS

Both the 16 + 16 and 8 x 8 arrays designed on the DP3A chip used the floating-gate cascaded dual half-adder to implement the full-adder arithmetic function. A block diagram of this arithmetic cell is shown in Figure 6-8 and an isolated sample of the cell is included on the DP3A for evaluation purposes.

The dual half-adder was made to perform the correct arithmetic function; however, the transfer efficiency was poor and the amplitude of the output charge packets was small. Consequently we felt that it would not be useful to pursue array operation, since an array uses several cascaded arithmetic cells. The major difficulty was traced to the metal interconnection pattern, which ties a sink-diode gate and the carry-out gate to the same Φ4 clock line (as shown in Figure 6-8). This design approach was taken to reduce the number of separate clocks required in an array. A test of this arrangement had previously been made on the DP2 mask set with satisfactory results.

The failure occurs when two or more inputs to the adder are each a binary one. In this case, the input storage area overflows and fills the area under the master-end of the floating-gate, correctly causing the surface potential on the slave-end of the floating-gate to switch from a transfer to a barrier level. This barrier prevents the output charge packet from transferring out of the sum port. During the next clock phase, when Φ4 switches to its negative level, the charge retained by the slave-end of the floating-gate should transfer out under the Φ4 carry-out gate. However, the charge under the master-side of the floating-gate is removed by a gate and diode also tied to the same Φ4 clock. This creates a race condition; worse, the gate and diode combination moves the surface potential under the floating-gate to a value that is a function of the Φ4 voltage. This modified surface potential switches the slave side of the floating-gate back to the transfer mode, and the output charge incorrectly transfers out of the sum port.

Two changes were made to the clock phases. First, the timing of the Φ3 clock was adjusted so that it switched to its less negative potential before Φ4 switched to its most negative level. This caused the sum output gate to act as a barrier, and the output charge was then retained under the slave-end of the floating-gate.
Second, since the $\Phi_4$ sink gate to the master-end of the floating-gate is a second-level polysilicon gate and the $\Phi_4$ carry-out gate is a first-level polysilicon gate, there is a 2 to 3 volt surface potential difference between them. A step voltage was applied to the $\Phi_2$ clock so that the storage gate located between the slave-end of the floating-gate (at the poly 2 surface potential) and the carry-out gate (at the poly 1 surface potential) would be at a surface potential midway between the two $\Phi_4$ surface potentials, as shown in Figure 6-9. This technique proved satisfactory: the arithmetic functions were correct. However, the amplitude of the outputs was greatly diminished.

**Figure 6-9.** Gate structure and surface potentials of a DP3A half-adder showing the conditions that have to exist in order that the charge packet under the slave side of the floating-gate will transfer out the carry port.
6.3 TESTING THE FLOATING-DIFFUSION CASCADED DUAL HALF-ADDER

Up until the DP3 mask set, we had always used a floating-gate sensing device. However, it appeared that a floating-diffusion sensor would be more sensitive, require less area and operate faster than the floating-gate sensor. The layout of a dual half-adder was therefore modified from floating-gate to floating-diffusion sensing and included on the DP3 chip as an evaluation device. Later, when the DP3 was corrected and became the DP3A mask set, some of the cells were modified; the floating-diffusion dual half-adder was changed to the single half-adder described above in Section 6.1.

A block diagram of the floating-diffusion cascaded dual half-adder is shown in Figure 6-10 and the associated waveforms used during testing are shown in Figure 6-11.

Figure 6-10. Block Diagram of a DP3 Floating-Diffusion Cascaded Dual Half-Adder DCCL Cell.
Figure 6-11. Waveforms associated with the floating-diffusion cascaded dual half-adder test cell.
The dual half-adder test cell operated correctly, as shown in Figure 6-12.

Figure 6-12. Functional demonstration of the floating-diffusion cascaded dual half-adders at a clock frequency of 20KHz with inputs of A = 01001011, B = 00101101 and G = 00010111.
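The expected outputs for test vectors such as those in Figure 6-12 can be generated with a short Python sketch of a single full-adder cell fed three bit streams: A, B, and the third input labeled G in the caption. It assumes each clock period is an independent addition with no carry fed back into the cell; the function name is hypothetical. Read position by position, the three Figure 6-12 words happen to cover all eight input combinations.

```python
from __future__ import annotations


def full_adder_streams(a: str, b: str, g: str) -> tuple[str, str]:
    """Expected sum and carry streams for one full-adder cell.

    Each position is treated as an independent addition of three input bits
    (assumption: no carry is fed back into the cell between clock periods).
    """
    if not (len(a) == len(b) == len(g)):
        raise ValueError("input streams must have equal length")
    sums, carries = [], []
    for x, y, z in zip(a, b, g):
        total = int(x) + int(y) + int(z)
        sums.append(str(total & 1))     # sum = parity of the three inputs
        carries.append(str(total >> 1)) # carry = majority of the three inputs
    return "".join(sums), "".join(carries)


print(full_adder_streams("01001011", "00101101", "00010111"))  # Figure 6-12 inputs
print(full_adder_streams("11110011", "11000011", "11000000"))  # Figure 6-15 inputs
```

For the Figure 6-12 inputs the sketch gives sum 01110001 and carry 00001111; for the inputs used in Figures 6-15 and 6-18 it gives sum 11110000 and carry 11000011.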

6.4 TESTING THE FLOATING-GATE FULL-ADDER

In a full-adder (three inputs), when all inputs are a binary "1", the three input charges have to fill three serial charge buckets, whereas in a half-adder (two inputs) the two input charges have to fill two serial charge buckets. Thus the full-adder will always be slower than the half-adder. However, the full-adder requires only one third of the area required by the cascaded dual half-adders and the two one-bit delay lines needed to perform the same full-adder function.

With this in mind for future large arrays, we incorporated a floating-gate full-adder test cell on the DP3A design.

The block diagram of the floating-gate full-adder is shown in Figure 6-13 and the waveforms associated with testing the cell are shown in Figure 6-14.
Figure 6-13. Block Diagram of a DP3 Floating-Gate, Full-Adder DCCL Cell
Figure 6-14. Waveforms associated with the Floating-Gate Full-Adder Test Cell.
The full-adder functioned correctly as shown in Figure 6-15.

Figure 6-15. Functional demonstration of the floating-gate full-adder test cell at a clock frequency of 20kHz and inputs of $A = 11110011$, $B = 11000011$, and $G = 11000000$.

6.5 TESTING THE FLOATING-DIFFUSION FULL-ADDER

A variation of the full-adder test cell was included on the DP3A design. This variation has a floating-diffusion charge sensing switch instead of the floating-gate. A block diagram of the floating-diffusion full-adder is shown in Figure 6-16 and the clock waveforms associated with testing it are shown in Figure 6-17. The full-adder operated satisfactorily as shown in Figure 6-18.
Figure 6-17. Waveforms associated with the Floating-Diffusion Full-Adder Test Cell
Figure 6-18. Functional demonstration of the floating-diffusion full-adder test cell at a clock frequency of 20KHz with inputs of $A = 11110011$, $B = 11000011$ and $G = 11000000$.

6.6 TESTING DP3 BURIED CHANNEL DESIGNS

Two test cells on the DP3 received the buried channel implant: one logic circuit and a 10-bit shift register. Unfortunately, the logic circuit cannot be tested because a contact mask error precludes its operation. However, the 10-bit shift register has been tested with favorable results. The length of the register makes transfer efficiency comparisons difficult.

In p-channel surface-channel operation, the surface potential is less negative than the gate potential, i.e.,

$$\Phi_s = V_g + \frac{Q_{ss}}{C_{ox}} - V_0 + \left(V_0^2 - \frac{2Q_{ss}V_0}{C_{ox}} - 2V_0V_g\right)^{1/2}$$

where

$$V_0 = \frac{qN_D\varepsilon_s}{C_{ox}^2}$$

and $\varepsilon_s$ is the permittivity of silicon. However, in p-channel buried-channel operation, the channel potential is more negative than the gate potential, i.e.,

$$\Phi_c = V_g + \frac{Q_{ss}}{C_{ox}} - \frac{q(N_A - N_D)t^2}{2\varepsilon_s} - \frac{q(N_A - N_D)t}{C_{ox}}$$

where $t$ is the thickness of the implanted channel.

In the preceding equation, the last term dominates, i.e.,

\[ \Phi_c = V_g - q(N_A - N_D) \frac{t}{C_{ox}} \]

In both surface and buried channel operation, the surface (or channel) potential is displaced from the gate voltage by a quantity that is inversely proportional to \( C_{ox} \). Because the poly 2 gates have the smaller \( C_{ox} \), in surface channel operation one must make a poly 2 gate more negative than a poly 1 gate to equalize surface potentials.

However, in buried channel operation, one must make a poly 1 gate more negative than a poly 2 gate to equalize channel potentials.
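The sign behavior described above can be illustrated by evaluating the two potential equations given earlier in this section. The Python sketch below uses the 1000 Å gate oxide quoted in Section 7 and sets Q_ss = 0; the substrate doping (N_D = 10^15 cm^-3), net implant doping (N_A - N_D = 10^16 cm^-3) and implant thickness (0.2 μm) are hypothetical illustrative values, not measured DP3 parameters. Units are per-centimeter: F/cm² for C_ox, cm^-3 for doping, cm for thickness.

```python
import math

Q = 1.602e-19           # electron charge (C)
EPS0 = 8.854e-14        # permittivity of free space (F/cm)
EPS_SI = 11.7 * EPS0    # silicon permittivity (F/cm)
EPS_OX = 3.9 * EPS0     # SiO2 permittivity (F/cm)
C_OX = EPS_OX / 1.0e-5  # 1000 angstrom gate oxide -> F/cm^2


def surface_potential(vg: float, c_ox: float, n_d: float, q_ss: float = 0.0) -> float:
    """Surface-channel (p-channel) potential from the text's equation,
    with V0 = q * N_D * eps_s / C_ox**2."""
    v0 = Q * n_d * EPS_SI / c_ox**2
    return vg + q_ss / c_ox - v0 + math.sqrt(v0**2 - 2 * q_ss * v0 / c_ox - 2 * v0 * vg)


def buried_channel_potential(vg: float, c_ox: float, n_net: float, t: float,
                             q_ss: float = 0.0) -> float:
    """Buried-channel potential: Vg + Qss/Cox - qNt^2/(2 eps_s) - qNt/Cox,
    where N = N_A - N_D is the net implant doping and t its thickness (cm)."""
    return (vg + q_ss / c_ox
            - Q * n_net * t**2 / (2 * EPS_SI)
            - Q * n_net * t / c_ox)


for vg in (-5.0, -10.0):
    phi_s = surface_potential(vg, C_OX, n_d=1e15)                     # hypothetical N_D
    phi_c = buried_channel_potential(vg, C_OX, n_net=1e16, t=0.2e-4)  # hypothetical implant
    print(f"Vg = {vg:6.1f} V   surface: {phi_s:7.2f} V   buried: {phi_c:7.2f} V")
```

With any such parameters the surface potential lands between V_g and 0, while the buried-channel potential lies below V_g, which is the sign difference the experiment below exploits.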

Based on this, one may devise an experiment to verify buried channel operation. In this experiment, which is summarized in Figures 6-19 and 6-20, a small-amplitude, four-phase clocking scheme is used. The poly 1 clock phases are offset such that they never go more positive than the most negative portion of the poly 2 clock phases, so surface channel operation is impossible. One may therefore conclude that the output shown in Figure 6-20 reflects buried channel operation.
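The offset condition stated above can be checked directly against the transfer-clock levels listed for Figure 6-19. The sketch below considers only the four transfer phases (the reset clock φ_R and the other bias levels are left out) and tests whether the most positive poly 1 level is no higher than the most negative poly 2 level; the helper name is hypothetical.

```python
# Transfer-clock levels in volts, read from Figure 6-19 (reset clock excluded).
POLY1 = {"phi2": (-6.0, -4.0), "phi4": (-4.0, -6.0)}
POLY2 = {"phi1": (-4.0, -2.0), "phi3": (-2.0, -4.0)}


def offsets_exclude_surface_channel(poly1: dict, poly2: dict) -> bool:
    """True if no poly 1 level is more positive than the most negative poly 2 level."""
    most_positive_poly1 = max(max(levels) for levels in poly1.values())
    most_negative_poly2 = min(min(levels) for levels in poly2.values())
    return most_positive_poly1 <= most_negative_poly2


print(offsets_exclude_surface_channel(POLY1, POLY2))  # -4 V <= -4 V
```

The condition holds with no margin: both limits sit at -4 V.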
| Signal | Levels | Gate |
|---|---|---|
| $\phi_1$ | -4 V / -2 V | Poly 2 |
| $\phi_2$ | -6 V / -4 V | Poly 1 |
| $\phi_3$ | -2 V / -4 V | Poly 2 |
| $\phi_4$ | -4 V / -6 V | Poly 1 |
| $\phi_R$ | -20 V / 0 V | Poly 1 |
| INJ | -5.5 V / -14.3 V | |
| $\phi_{out}$ | -2.65 V | Poly 2 |
| $C_2$ | -5.5 | Poly 1 |
| $V_g$ | -3.94 V | |
| Data | "1" = -2.8 V; real zero = -4 V | |
| $V_{dd}$ | -15 V | |
| $V_{ss}$ | 0 V | |

**FIGURE 6-19.** DP3 Buried Channel Operation Verification
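The clock-level condition used in this experiment can be checked mechanically against the transfer-phase levels listed in Figure 6-19. The sketch below considers only $\phi_1$ through $\phi_4$ (the reset gate $\phi_R$ is excluded) and confirms that the most positive poly 1 level does not exceed the most negative poly 2 level. The names `CLOCKS` and `clock_levels` are illustrative.

```python
# Transfer clock levels from Figure 6-19: phase -> (level A, level B, gate layer), volts
CLOCKS = {
    "phi1": (-4.0, -2.0, "poly 2"),
    "phi2": (-6.0, -4.0, "poly 1"),
    "phi3": (-2.0, -4.0, "poly 2"),
    "phi4": (-4.0, -6.0, "poly 1"),
}


def clock_levels(layer):
    """Every voltage level applied by the transfer clocks on one gate layer."""
    return [v for a, b, gate in CLOCKS.values() if gate == layer for v in (a, b)]


poly1_max = max(clock_levels("poly 1"))   # most positive poly 1 level
poly2_min = min(clock_levels("poly 2"))   # most negative poly 2 level

# Equal surface potentials would need a poly 2 gate more negative than a
# poly 1 gate; these levels never allow that.
print(poly1_max, poly2_min, poly1_max <= poly2_min)   # -4.0 -4.0 True
```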
Figure 6-20. Potential Diagram of the Buried Channel Shift Register

Diagram annotations: $\Phi_c \approx V_g - 2\ \text{V}$ and $\Phi_c \approx V_g - 4\ \text{V}$, where $\Phi_c \approx V_g + \frac{Q_{ss}}{C_{ox}} - \frac{q(N_A - N_D)t_{imp}^2}{2\epsilon_s} - \frac{q(N_A - N_D)t_{imp}}{C_{ox}}$.

7. SEMICONDUCTOR PROCESSING

7.1 INTRODUCTION

The CCD Processing Laboratory has placed great emphasis on process standardization. This covers all major processing steps: field oxide and gate oxide growth; polysilicon film deposition and gate pattern delineation; standardized phosphorus and boron implantation methods; TEOS/silox deposition and densification cycles; metallization methods; elimination of wafer material defects by mechanical abrasion and heavy phosphorus (N⁺) gettering of the underside of each wafer; and a variety of rapid in-process electrical checks (e.g., C-V measurements, B-T (bias-temperature) measurements, and $C_{MIN}/C_{MAX}$ plots for impurity concentration determination).
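The $C_{MIN}/C_{MAX}$ check mentioned above is commonly based on the max-min capacitance method: the ratio fixes the maximum depletion width, which in turn fixes the substrate doping. The sketch below is a generic version of that method, not necessarily the procedure used in this laboratory; it assumes uniform doping, a high-frequency C-V curve with $C_{MAX} = C_{ox}$, T ≈ 300 K, and $n_i = 1.45 \times 10^{10}$ cm⁻³. The function names are hypothetical.

```python
import math

Q = 1.602e-19              # electronic charge, C
EPS0 = 8.854e-12           # permittivity of free space, F/m
EPS_OX = 3.9 * EPS0        # SiO2 permittivity, F/m
EPS_SI = 11.7 * EPS0       # silicon permittivity, F/m
KT_Q = 0.02585             # thermal voltage near 300 K, V
N_I = 1.45e16              # intrinsic carrier density, m^-3 (assumed 1.45e10 cm^-3)


def cmin_over_cmax(n, t_ox):
    """Forward model: high-frequency C_MIN/C_MAX for uniform doping n (m^-3)."""
    c_ox = EPS_OX / t_ox
    phi_f = KT_Q * math.log(n / N_I)                     # Fermi potential magnitude
    w_max = math.sqrt(4.0 * EPS_SI * phi_f / (Q * n))    # maximum depletion width
    c_min = 1.0 / (1.0 / c_ox + w_max / EPS_SI)          # oxide and depletion in series
    return c_min / c_ox


def doping_from_ratio(ratio, t_ox, n_guess=1e21, iterations=50):
    """Estimate doping (m^-3) from a measured C_MIN/C_MAX by fixed-point iteration."""
    c_ox = EPS_OX / t_ox
    w_max = EPS_SI * (1.0 / (ratio * c_ox) - 1.0 / c_ox)  # from 1/C_MIN = 1/C_ox + W/eps_si
    n = n_guess
    for _ in range(iterations):
        # W_max = sqrt(4 eps_si phi_F / (q N)) solved for N; phi_F depends weakly on N
        n = 4.0 * EPS_SI * KT_Q * math.log(n / N_I) / (Q * w_max**2)
    return n


r = cmin_over_cmax(1e21, 1000e-10)           # 1e15 cm^-3 under a 1000 Å oxide
n_est = doping_from_ratio(r, 1000e-10)
print(f"C_MIN/C_MAX = {r:.3f}, recovered N = {n_est / 1e6:.3g} cm^-3")
```

Because $\phi_F$ varies only logarithmically with doping, the iteration converges quickly from any reasonable starting guess.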

Process standardization has also been achieved by automating critical processing steps, for example with automated field and gate oxide furnaces and automated wafer coating and developing equipment. In general, a considerable effort has been made to eliminate all unnecessary processing steps. This processing philosophy has significantly increased LSI device yields while shortening wafer lot turn-around time.

Current CCD circuit layouts and their resulting topologies are compatible with both in-house photolithographic capabilities and CCD LSI fabrication technology. Complex CCD LSIs are produced with eight or more mask levels, which permit 5 to 7 micron circuit geometries. Gate density is limited by the wet chemical and dry plasma etching processes currently used to define the polysilicon gate patterns, and these patterns in turn limit overall CCD circuit density. Digital CCDs are currently fabricated with two polysilicon gate patterns, 1000Å thermally grown silicon dioxide (SiO₂) gate dielectrics, and 10,000Å thick E-beam deposited aluminum interconnection patterns.

7.2 DCCD PROCESS EVOLUTION

Fabrication processes for the DP series of devices evolved through a number of mask layout iterations; these were related to circuit geometry changes that improved major operating parameters such as dynamic range and transfer efficiency. Each new generation of circuits necessitated new mask sets with corresponding process sequence changes, primarily connected with gate technology variations.
Table 7 lists the four major DP generations, including significant differences between each generation.

Table 7. Six Process Modifications: DP-O, DP-1, DP-2, DP-3

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Mask Type</th>
<th>Gate Technology</th>
<th>Design Rules (mils)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DP-O</td>
<td>Slip</td>
<td>Metal/Poly Standard</td>
<td>0.3</td>
</tr>
<tr>
<td>DP-1</td>
<td>Slip</td>
<td>Metal/Poly with Poly Protect Mask</td>
<td>0.45</td>
</tr>
<tr>
<td>DP-2</td>
<td>Slip and Conventional</td>
<td>Poly/Poly</td>
<td>0.3</td>
</tr>
<tr>
<td>DP-3</td>
<td>Conventional</td>
<td>Poly/Poly</td>
<td>0.3</td>
</tr>
</tbody>
</table>
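For comparison with the micron dimensions quoted elsewhere in this section, the design rules of Table 7 can be converted using the exact relation 1 mil = 0.001 inch = 25.4 µm. The sketch below simply applies that factor to the table entries.

```python
MICRONS_PER_MIL = 25.4     # 1 mil = 0.001 in = 25.4 µm (exact)

# Design rules from Table 7, in mils
DESIGN_RULES_MILS = {"DP-O": 0.3, "DP-1": 0.45, "DP-2": 0.3, "DP-3": 0.3}


def mils_to_microns(mils):
    """Convert a length in mils to microns."""
    return mils * MICRONS_PER_MIL


for circuit, mils in DESIGN_RULES_MILS.items():
    print(f"{circuit}: {mils} mil = {mils_to_microns(mils):.2f} µm")
```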

7.3 MASK GENERATIONS

"Slip" masks were selected for initial mask generations. This approach provided three or four mask levels per plate; during the alignment of successive mask levels, the "slip" mask was displaced in either the X or Y direction. Combining four mask levels onto one plate reduced mask fabrication costs and delivery schedules. The major problem encountered with slip masks involved alignment difficulties caused by considerable "run-out" and "rotation" between successive mask levels. Normal lens aberrations did not permit the mask manufacturer to compensate for dimensional tolerance variations between mask levels. These masks were not produced by TRW's in-house mask-making facility, which was not in existence during the initial phase of this program. While slip masks were used for DP-O, DP-1 and DP-2 mask generations, fabrication difficulties combined with low yields necessitated the abandonment of this approach in favor of conventional mask sets, which provided superior results. Conventional mask sets were therefore used for later versions of DP-2 and DP-3 series.

7.4 GATE TECHNOLOGY

Two competing approaches, differing in the material used for the Level II gates, were evaluated to determine the most reliable and reproducible method of fabricating CCD gate structures. Aluminum was used for the Level II gates of DP-O and DP-1. Because the aluminum gave poor step coverage, it was replaced by polysilicon, which covered steps significantly better; polysilicon Level II gates were used successfully for the DP-2 and DP-3 circuits.
7.5 PROCESS MODIFICATIONS

Significant oxide undercut occurred during fabrication of the initial DP-1 lots using the double polysilicon gate configuration, and several process changes were made to prevent it. A special photoresist masking step was used to cover all metal and polysilicon gates and interconnects before oxide cuts were made. This "poly-protect" masking step successfully prevented oxide undercut, as shown in Figure 7-1. The severe oxide undercut shown in Figure 7-1b resulted in broken metallization lines or poor step coverage, which is why the additional "poly-protect" masking step was included.

![Figure 7-1. Polysilicon Protect Configuration](image)

Additional process modifications were made to improve gate control: the first and second gate oxide thicknesses were changed from 1000Å/2000Å to 1000Å/3000Å. Both boron- and phosphorus-doped second polysilicon layers were fabricated, using gas-phase doping. Subsequent process improvements permitted in-situ doping of polysilicon with arsenic, thereby reducing the number of process steps.

The standard line widths and separations were 7.5 microns. The most advanced circuit produced (DP-3) was designed with 5 micron line separations. No major process modifications were required by this design rule change; the same positive photoresist was used for both the 7.5 micron and 5 micron geometries, although development and exposure times had to be adjusted.
7.6 BORON PENETRATION

A DP-1 wafer lot was processed by doping the source and drain regions and the polysilicon Level II film simultaneously, in an attempt to simplify processing by eliminating the nitride layer used as a diffusion mask. Very low threshold voltages were obtained and, in some instances, depletion type devices were produced. Boron penetration through the thin SiO$_2$ gate insulator into the Si substrate was suspected. In the worst case, boron penetration forms a heavily doped P-type region immediately beneath the gate insulator of the CCD devices, resulting in depletion type devices; subsequent tests verified that this had occurred. A common technique for studying impurity penetration through SiO$_2$ layers is to measure the depth of the junction formed by the penetrating impurity; from this, the time required for the impurity to penetrate the SiO$_2$ layer and reach a particular depth in the silicon can be determined.

However, significant electrical effects due to boron penetration occur in CCD devices long before a pn junction can be detected in the silicon substrate. At the onset of boron penetration into the silicon, ionized boron atoms act as a negative charge layer at the Si/SiO$_2$ interface, shifting the flat band voltage of the MOS structure. This shift can be detected as an equivalent change in MOS threshold voltage. C-V measurements carried out on test capacitors clearly showed boron penetration. The penetration was not uniform; it varied across a wafer, as shown by variations in test transistor threshold voltage measurements.
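The size of this flat band shift follows from treating the penetrated boron as a sheet of negative charge at the interface: $\Delta V_{FB} = -Q/C_{ox}$ with $Q = -qN_B$, where $N_B$ is the areal density of ionized boron. The sketch below evaluates this for a hypothetical $N_B = 10^{11}$ cm⁻² under the 1000Å gate oxide; the density is an illustrative assumption, not a measured value. A positive shift moves a p-channel threshold toward zero, consistent with the low threshold voltages observed.

```python
Q = 1.602e-19                 # electronic charge, C
EPS_OX = 3.9 * 8.854e-12      # SiO2 permittivity, F/m


def flat_band_shift(sheet_density_cm2, t_ox_angstrom, charge_sign=-1):
    """Flat band voltage shift (V) from a sheet of charge at the Si/SiO2 interface.

    sheet_density_cm2 : areal density of charged atoms (cm^-2)
    t_ox_angstrom     : gate oxide thickness (Å)
    charge_sign       : -1 for ionized acceptors such as boron, +1 for positive charge
    """
    c_ox = EPS_OX / (t_ox_angstrom * 1e-10)               # oxide capacitance, F/m^2
    charge = charge_sign * Q * sheet_density_cm2 * 1e4    # interface charge, C/m^2
    return -charge / c_ox


shift = flat_band_shift(1e11, 1000)   # hypothetical 1e11 cm^-2 of ionized boron
print(f"flat band shift = {shift:+.2f} V")
```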

Additional experimental work is required to characterize boron penetration effects. This includes:

- Using relatively low temperatures ($< 950^\circ$C) for all processes following the boron diffusion step.
- Using short diffusion cycles during boron doping of the polycrystalline film; this produces higher sheet resistance in the polysilicon gates and interconnections, so processing tradeoffs must be made between boron concentration in the polysilicon gate structures and boron penetration through the SiO$_2$ film.
- Performing boron diffusions in a dry atmosphere while minimizing low temperature wet cycles following boron diffusions.
- Increasing polysilicon film thickness to 5000Å (3000Å was used for the wafers reported here).

The variety of problems encountered with boron doping of polysilicon films resulted in a major processing change; polysilicon films were subsequently in-situ doped with arsenic, thereby bypassing the boron doping problem.

7.7 THERMAL OXIDE PROCESSES

Low temperature (920°C) gate oxides were produced for the DP device series; these were wet oxides grown by steam oxidation of silicon, with the steam produced by an in-situ reaction of H₂ and O₂ gases (the oxygen-hydrogen torch method). $Q_{SS}$ values were typically 5 × 10¹⁰ cm⁻², and $N_{SS}$ (fast surface state density) was in the 10⁹ cm⁻² range. Mobile ion concentration was below the sensitivity of the C-V measurement equipment. Radiation hardness of these devices will be evaluated during the next phase of this program.

Silicon dioxide prepared by thermal oxidation of silicon in oxygen or water ambients generally has positive charges associated with it. As a result, the underlying silicon will be depleted or inverted if it is P-type, or accumulated if it is N-type. These charges or states may be classified into at least four categories. The nature of these charges or states in relation to the SiO₂-Si interface structure is indicated in Figure 7-2. They are: a) fixed surface state or interface charges; b) mobile charges within the oxide; c) surface recombination-generation centers, or fast states; and d) traps within the oxide, which can be ionized by radiation.


![Figure 7-2. Charges or States Associated with the Silicon Dioxide-Silicon System](image-url)

The fixed charge apparently lies quite close to the SiO₂-Si interface, and its density can vary from 0 to at least $2 \times 10^{10}$ electronic charges/cm$^2$. The mobile charges are usually the result of processing contamination; deliberate contamination can produce values over $10^{13}$ electronic charges/cm$^2$. The third category does not represent a fixed charge, but may instead be associated with the often discussed "fast surface states". The density of such active surface states may range from less than $10^{10}$ cm$^{-2}$ upwards. The presence of these states depends on processing conditions, while the silicon surface potential determines whether or not they are charged. Positively charged traps in the oxide have been observed after exposure to X-ray, electron, or other ionizing radiation; their concentration is of the order of $10^{18}$ cm$^{-3}$.

Another type of charge observed in the study of oxidized silicon is that on the outer surface of the oxide. Usually this charge is a result of migration in the vicinity of a biased junction or field plate. Such charge migration requires a conductive surface which is usually caused by contamination.

7.8 METAL STEP COVERAGE

The metal step coverage problem was identified early in the program with the DP-O lots (see Figure 7-3).

Figure 7-3. Breaks in the Al Metallization DP-O Design
The "gulch", or undercut field oxide, of Figure 7-3 is shown again in Figure 7-4. It is formed during etching of the 1500Å oxide covering the "metal" channel region; combined with the steep edge of the polysilicon film, it produces a step that is very difficult to cover with Al metallization, resulting in open metal lines.

![Figure 7-4. "Gulch" Formed under the Polysilicon Film](image)

It was found at TRW that films formed by thermal decomposition of tetraethyl orthosilicate (TEOS) produce very smooth step coverage, as shown in Figure 7-5. To use this TEOS deposition, an extra mask level was inserted into the process sequence. Thus, the DP-1 circuit employed the added mask level to etch the oxide protecting the field oxide region where the step coverage problem occurs.

![Figure 7-5. "Gulch" and Steep Polysilicon Step covered by a TEOS Film](image)

The resulting process involves depositing TEOS over the polysilicon film before the "poly protect" mask operation. After the "poly protect" masking step, the oxide in the channel was etched out and fresh oxide regrown. Figure 7-6 shows a metallized poly step covered with TEOS, with good coverage, whereas Figure 7-7 shows a metallized poly step without TEOS. Metal discontinuities can be observed in this area; note also the steepness of the poly step, which is responsible for the metal coverage problem.

Figure 7-6. Metallization over poly step covered with TEOS (DP-1 Design)

Figure 7-7. Metallization over poly step without a TEOS deposition (DP-1 Design)