NAVAL POSTGRADUATE SCHOOL
Monterey, California

THESIS

DESIGN OF A SIXTEEN BIT PIPELINED ADDER USING CMOS BULK P-WELL TECHNOLOGY

by

William R. Reid

December 1984

Thesis Advisor: D. E. Kirk

Approved for public release; distribution unlimited

REPORT DOCUMENTATION PAGE

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

1. REPORT NUMBER
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle): Design of a Sixteen Bit Pipelined Adder Using CMOS Bulk P-Well Technology
5. TYPE OF REPORT & PERIOD COVERED: Master's Thesis; December 1984
6. PERFORMING ORG. REPORT NUMBER
7. AUTHOR(s): William R. Reid
8. CONTRACT OR GRANT NUMBER(s)
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School, Monterey, California 93943
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
11. CONTROLLING OFFICE NAME AND ADDRESS: Naval Postgraduate School, Monterey, California 93943
12. REPORT DATE: December 1984
13. NUMBER OF PAGES: 116
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number): VLSI Design, CMOS, CMOS-PW, Pipelined Adder, Carry Look Ahead Addition, CAD Tools
20. ABSTRACT (Continue on reverse side if necessary and identify by block number): The design of a sixteen-bit pipelined adder CMOS integrated circuit is presented. The adder is designed to maximize throughput and to provide for testability. Tutorial material on CMOS design is also presented.
Approved for public release; distribution is unlimited.

Design of a Sixteen Bit Pipelined Adder Using CMOS Bulk P-Well Technology

by

William R. Reid
Lieutenant Commander, United States Navy
B.S., Purdue University, 1975

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

from the

NAVAL POSTGRADUATE SCHOOL
December 1984

ABSTRACT

The design of a sixteen-bit pipelined adder CMOS integrated circuit is presented. The adder is designed to maximize throughput and to provide for testability. Tutorial material on CMOS design is also presented.

TABLE OF CONTENTS

I. INTRODUCTION
II. CMOS CIRCUITS
    A. COMPARISON WITH NMOS
        1. The Inverter
        2. The NOR Gate and Transmission Gate
    B. CMOS DESIGN METHODOLOGIES
    C. CMOS IMPLEMENTATION TECHNOLOGIES
        1. CMOS-SOS
        2. CMOS-Bulk
        3. Twin-tub CMOS
    D. CMOS TECHNOLOGY SELECTION
III. DESIGN TOOLS
    A. CAESAR
    B. LYRA
    C. SIMULATION
        1. SPICE
        2. RNL
IV. DESIGN OF THE ADDER
    A. LOGICAL DESIGN
        1. Zero Level CLA Logic
        2. First Level CLA Logic
        3. Second Level CLA Logic
    B. DESIGN FOR TESTABILITY
    C. LAYOUT DESIGN
V. TEST PLAN
    A. INPUTS AND OUTPUTS
    B. TESTING FOR CORRECT OPERATION
        1. Intermediate Results
    C. TESTING FOR SPEED OF OPERATION
VI. CONCLUSIONS
    A. THE CMOS TECHNOLOGIES
    B. CMOS CAD TOOLS
    C. DESIGN OF THE ADDER
APPENDIX A: SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES
APPENDIX B: UNIX MANUAL ENTRY FOR RULEC
APPENDIX C: PRESIM USER'S GUIDE
APPENDIX D: ADDER SIMULATION
APPENDIX E: LAYOUTS
APPENDIX F: TEST VECTORS
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST

LIST OF TABLES

1. Lyra Error Abbreviations
2. First Level CLA Logic for a 16-bit Sum
3. Register Serial Outputs
4. PLA Evaluation Sequences

LIST OF FIGURES

2.1 CMOS Transistor Symbols
2.2 (a) NMOS Inverter (b) CMOS Inverter
2.3 Minimum Dimension Inverters
2.4 2-input NOR Gate
2.5 CMOS Transmission Gate
2.6 NMOS-like CMOS Static Gate [Ref. 6]
2.7 Dynamic NAND Gates [Ref. 6]
2.8 Domino CMOS Structure [Ref. 6]
2.9 Circuit Difficult to Implement in Domino CMOS
2.10 P-Well Process, Top View [Ref. 6]
2.11 P-Well Process, Side View [Ref. 9]
2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6]
2.13 The Latchup Circuit [Ref. 6]
2.14 Grounding of the P-Well
3.1 CMOS Exclusive OR [Ref. 6]
3.2 CMOS Latch Design [Ref. 6]
4.1 CMOS Output Loading Model
4.2 Preliminary Chip Floorplan
4.3 Dual Mode Latch
4.4 AND Gate
4.5 OR Gate
4.6 Exclusive OR Gate
4.7 PLA Structure
4.8 Final Layout
5.1 Charge Sharing in a PLA

I. INTRODUCTION

For several years the ability of systems engineers to design custom digital integrated circuits has been growing. The Conway and Mead design methodology described in Introduction to VLSI Systems [Ref. 1] permits the systems engineer to be his own logic circuit designer. A proliferation of computer-aided design (CAD) systems such as the MacPitts silicon compiler [Ref. 2], the chip layout language (CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the hierarchical layout language Burlap [Ref. 5] makes it possible for the engineer to rapidly carry the Mead and Conway design methodology through to a final design. This includes iterative simulation and redesign to provide justifiable confidence in the final design submitted for fabrication.

Many of the techniques utilized in the Mead and Conway methodology and most of the CAD tools are based on having the final design implemented in a technology that uses only one type of doping for the semiconductor material in the active region of the transistors.
Because of their higher switching speed, negatively doped metal oxide semiconductor (NMOS) transistor technologies are generally used. Selection of an NMOS implementation technology does provide the systems engineer with a complete and proven methodology for the design of a very large scale integrated (VLSI) circuit and allows the use of many extensively tested CAD tools.

Like any other design decision, selection of NMOS implementation brings with it some limitations. There are two primary problems associated with NMOS digital circuits. The first is the ultimate switching speed limitation. Though many NMOS VLSI circuits operate at clock rates in the 8 to 10 MHz range, there are many applications requiring higher clock rates. The second problem is the dissipation of the relatively large amount of power consumed by NMOS digital circuits. State of the art, commercially available NMOS VLSI circuits commonly have power consumptions in the vicinity of 3 to 5 watts. Considerable design effort is required to insure that the dissipation of this much energy by a chip measuring approximately 5 millimeters on a side does not alter the performance of the micron sized features on the chip.

One group of technologies that offers both increased switching speed and greatly reduced power consumption is complementary metal oxide semiconductors (CMOS). CMOS circuits also offer the benefits of greater radiation hardening and increased noise margin.

In this thesis investigation, much of the Mead and Conway methodology was utilized in the design of a CMOS circuit. A general purpose color graphics CAD tool called Caesar that has been frequently used in the design of NMOS circuits was employed.

In carrying out the design of the adder in CMOS two separate goals were pursued. The first, of course, is speed and the second is verifiability. A high speed 16 bit pipelined adder implies not only a high clock rate of operation but also a small latency between input of operands and output of the sum.

A discussion of CMOS technologies and the implementation of logic circuits in those technologies follows in Chapter 2. Chapter 3 presents a description of the CAD tools used to construct and simulate the layout for the adder. The logic and layout design of the adder is covered in Chapter 4 and is followed by a test plan for the fabricated chip in Chapter 5.

II. CMOS CIRCUITS

Before the design of CMOS digital circuits can be attempted, an understanding of how to best implement logic functions in CMOS is necessary. It is also important to be aware of the advantages and disadvantages of the different CMOS implementation technologies. In this chapter the operation of CMOS digital circuits is explained using similar NMOS circuits as a benchmark for comparison. The different methodologies for assembling the CMOS pieces to produce the desired logical results are reviewed and the selection of the CMOS-Bulk p-well implementation technology is explained.

A. COMPARISON WITH NMOS

In NMOS digital circuits there is only one type of switching device, namely the n-channel enhancement mode metal oxide semiconductor (MOS) transistor. The other principal device utilized in NMOS circuits is the depletion mode n-channel MOS device which acts as a load resistor. In CMOS there are both n-channel and p-channel enhancement mode transistors available. As in NMOS, the n-channel device can be considered on when a logical 1, Vdd (typically +5 Volts DC), is present on its gate. The p-channel device can be considered on when a logical 0, ground (GND), is present on its gate. Figure 2.1 shows the symbols that will be used for the n-channel and p-channel transistors in this thesis. The basic differences between NMOS and CMOS technologies can be demonstrated by comparing their application to some basic digital circuits.
Figure 2.1 CMOS Transistor Symbols.

1. The Inverter

Figure 2.2 (a) shows an NMOS inverter. Whenever there is a logical 1 on the input, the voltage drop across the load resistor is approximately Vdd and the output is a logical 0. This results in steady state power consumption. When the input switches to a logical 0, before the output can assume a logical 1, the load capacitance (Cl) on the output must be charged to Vdd through the load resistor with a resistance of several kilohms. This results in a much longer transition from 0 to 1, where the load capacitance is charged through the load resistor, than from 1 to 0, where the load capacitance is discharged through the NMOS enhancement transistor. The reason for this asymmetry is that the pull-down transistor's on resistance is typically only one fourth or less that of the on resistance of the pull-up load depletion mode transistor. The technique of precharging circuits, where all outputs are set to 1 during one clock cycle and then selectively forced to 0 during the opposite (evaluation) clock cycle, has proven helpful in gaining control over the unsymmetric switching times. This longer switching time from 0 to 1 must still be accounted for, however, and represents the primary limitation on the speed of NMOS circuits.

Figure 2.2 (a) NMOS Inverter (b) CMOS Inverter.

In the CMOS inverter of Figure 2.2 (b) the input is applied to the gates of both devices. An input of logical 1 causes the n-channel device to switch on and the p-channel device to switch off, resulting in an output of logical 0. Similarly, an input of 0 results in an output of 1. In both cases, one device is fully off, representing a resistance on the order of gigaohms. Thus, the steady state power consumption is essentially zero. The only power consumption of consequence occurs during the transition when neither transistor is fully on or off.
Additionally, since the output load capacitance is both charged and discharged through a turned on transistor, the 1 to 0 and 0 to 1 switching delays are theoretically the same. In actual operation the switching delays depend on many parameters. The n-channel and p-channel device dimensions are frequently not the same. The mobility of the electrons in the n-channel is greater than the mobility of the holes in the p-channel. Also, the capacitive load seen by the p-channel device in CMOS p-well (CMOS-pw) is greater than the load seen by the n-channel device because of the highly doped p-well. Typically, the result in CMOS-pw is a slightly longer transition time for the 0 to 1 output transition. Some designers attempt to compensate for this by consistently making the p-channel transistors wider than the n-channel transistors.

Unlike NMOS, the output of a CMOS digital circuit makes a full excursion between Vdd and GND. This makes CMOS circuits less sensitive to noise than NMOS circuits. CMOS should also benefit more from future reductions in feature size. NMOS is more restricted in ultimate feature size because the power dissipation requirements of the depletion mode devices will create more problems as feature sizes shrink. In Figure 2.3 the relative sizes of minimum dimension inverters implemented in currently available 3 micron feature size CMOS-PW and NMOS technologies are shown.

2. The NOR Gate and Transmission Gate

Figure 2.4 shows the circuit diagrams and layouts of a two-input NOR gate implemented in both CMOS-PW and NMOS. From Figures 2.3 and 2.4 it is evident that static (footnote 1) CMOS gates are more complex and area consuming than their NMOS counterparts. In these fully complementary circuits a redundancy in the structures is evident. The pull-up only or pull-down only would be sufficient to implement the logic. In the CMOS circuits of Figures 2.3 and 2.4 the inputs must perform two tasks.
A logical 1 on an input causes both a connection between the output and ground and a disconnection between the output and Vdd. Logically these two actions are equivalent, therefore only one action should be necessary to implement the logic. Design methodologies to accomplish this are described in section B of this chapter.

Footnote 1: Static logic circuits continuously evaluate their inputs and produce their specified logic output. Dynamic circuits perform logical evaluation of the inputs only when directed to do so by control signals and/or clock signals.

Figure 2.3 Minimum Dimension Inverters (NMOS and CMOS).

The parallelism of the NMOS pass transistor and the CMOS transmission gate of Figure 2.5 is evident. The major difference lies in the bilateral nature of the transmission gate. It is made up of both n-channel and p-channel devices and requires both polarities of the control signal for operation. The reason for this bilateral requirement is that the p-channel device does not transmit low voltages well and the n-channel device does not transmit high voltages well. The resulting unpredictable voltage drops make it necessary to utilize both types of transistors. This increase in complexity over its NMOS counterpart is partially offset by the absence of the level restoring circuitry NMOS requires following a pass transistor (footnote 2).

Figure 2.4 2-input NOR Gate (NMOS and CMOS).

Footnote 2: In NMOS digital circuits the length to width ratio of the depletion mode transistor load is usually four times that of the pull down transistor.
This ratio is required to insure sufficient excursion of the output voltage. However, after a pass transistor is used, a ratio of 8:1 rather than 4:1 must be used to make up for the threshold voltage drop across the pass transistor.

Figure 2.5 CMOS Transmission Gate.

In general CMOS technologies are ratioless. The use of "improper" ratios will not affect the logical operation of most CMOS gates; it will only affect the speed of operation of the gates.

B. CMOS DESIGN METHODOLOGIES

Static CMOS circuits have three serious deficiencies when compared to static NMOS gates. First, they are more area consuming. Second, they can be slower. Though the individual gates can be faster in CMOS, the p-channel and n-channel gates are in parallel; thus, the fanout (footnote 3) and output load capacitance of each circuit are doubled. Third, a CMOS static gate is redundant, duplicating its functionality in both the pull-up and pull-down section.

One approach to remedy these deficiencies is to use a static NMOS-like style of design as in Figure 2.6. Here the p-channel device is always on and the pull-up to pull-down dimension ratio is relied upon to produce the proper output voltage. This introduces power consumption problems and takes away the full excursion on the output.

Footnote 3: Fanout represents the number of transistors that the output of a logic gate must drive.

Figure 2.6 NMOS-like CMOS Static Gate [Ref. 6].

Another approach is to make extensive use of transmission gates to build up logic functions. Using transmission gates means both polarities of all control signals are required. The resulting large number of wires required to route these control signals can become very area consuming, especially if only one metal layer is available.

A third and more effective solution is to use dynamic logic. Figure 2.7 contains three different implementations of a dynamic three-input NAND gate. In each, the output is meaningful (i.e.
represents the value of the complement of the boolean expression in1 · in2 · in3) only when clk is high and the complemented clock is low. The circuits of Figure 2.7 (a) and (b) depend on the pull-up to pull-down ratio to produce the proper output. As with the NMOS-like style of design, full excursion on the output is lost and there is steady state power consumption during the evaluation cycle. The circuit in Figure 2.7 (c) is precharged when clk is low and evaluation of the inputs takes place when clk is high. This configuration allows only one change of the output, from 1 to 0, so the inputs must be stable at the time clk goes high. A change of one of the inputs from 1 to 0 after clk has gone high cannot cause the output to return to 1. In general dynamic CMOS eliminates the redundancy of static CMOS by applying all inputs to one type of device and a control signal to the other type of device.

Figure 2.7 Dynamic NAND Gates [Ref. 6].

The most popular dynamic CMOS logic design technique is domino CMOS [Ref. 7], illustrated in Figure 2.8. Here the output is the logical AND of the boolean function to be implemented (in1 · in2 + in3) and a control (clock) signal. When the clock is low, the circuit is precharged, and when the clock is high evaluation occurs.

Figure 2.8 Domino CMOS Structure [Ref. 6].

With a common clock shared by all the domino gates on a chip, signals ripple through the logic during the evaluation cycle as though the logic were purely static. The inverter on the output of each gate insures that all the outputs are low when evaluation begins. This prevents the follow-on gates from changing unless driven by the inputs. Domino CMOS is not always the answer though. If the circuit of Figure 2.9 were implemented in domino CMOS it would be more area consuming than the same circuit implemented in static CMOS.
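The precharge and evaluate behavior of a domino stage described above can be sketched as a small behavioral model. The Python below is only an illustration and is not taken from the thesis: the class name DominoGate, the step method, and the use of in1 · in2 + in3 as the pull-down function (read from the Figure 2.8 labels) are assumptions. The model treats the internal node as charged while the clock is low, lets it discharge at most once while the clock is high, and takes the stage output from the inverter on that node.

```python
class DominoGate:
    """Behavioral sketch of one domino CMOS stage (hypothetical model)."""

    def __init__(self, pulldown):
        # pulldown: function of the inputs that is True when the
        # n-channel network conducts to ground.
        self.pulldown = pulldown
        self.node = 1  # internal node, starts precharged

    def step(self, clock, *inputs):
        if clock == 0:
            # Precharge phase: the internal node is pulled high.
            self.node = 1
        elif self.pulldown(*inputs):
            # Evaluate phase: the node can only discharge; it cannot
            # recharge until the next precharge phase.
            self.node = 0
        # The output inverter makes the stage output low after precharge.
        return 1 - self.node


def f(in1, in2, in3):
    # Assumed example function: (in1 AND in2) OR in3.
    return bool((in1 and in2) or in3)


gate = DominoGate(f)
trace = [
    gate.step(0, 1, 1, 0),  # precharge: output forced low
    gate.step(1, 1, 1, 0),  # evaluate: function true, output rises
    gate.step(1, 0, 0, 0),  # inputs fall during evaluate: output stays high
    gate.step(0, 0, 0, 0),  # precharge again: output low
    gate.step(1, 0, 0, 0),  # evaluate: function false, output stays low
]
print(trace)  # [0, 1, 1, 0, 0]
```

The third step shows why inputs must be stable when the clock goes high: once the node has discharged, a later input change cannot restore it within the same evaluation cycle.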
Dynamic CMOS is more area consuming in this case because these are simple gates with only a few inputs. Each NOR gate if implemented statically would need two n-channel devices and two p-channel devices. If implemented dynamically, each NOR gate requires three transistors of one type (one for each input and one for the control signal) and one transistor of the other type (for the control signal again). The number of transistors needed remains the same but the dynamic logic requires the designer to keep three inputs electrically isolated instead of just two. And if the dynamic design technique is domino, six additional inverters will be needed. As can be seen in Figure 2.4, in CMOS a NOR gate can be constructed from just one stage. Adding the follow-on inverter of the domino design results in an OR gate. Thus a second inverter is required to return the logic to that of a NOR gate.

Figure 2.9 Circuit Difficult to Implement in Domino CMOS.

C. CMOS IMPLEMENTATION TECHNOLOGIES

One of the principal issues in the design of a process to implement CMOS digital circuits in silicon is how to electrically isolate the two types of devices. This can be accomplished by using a completely insulating substrate or through a more complex fabrication process.

1. CMOS-SOS

The only process currently offered by the Metal-Oxide Semiconductor Implementation Service (MOSIS) which uses an insulating substrate is Silicon on Sapphire (SOS). In this technology the n-channel and p-channel transistors are formed on silicon islands left after etching an epitaxial layer of silicon on a sapphire (Al2O3) substrate.

2. CMOS-Bulk

The other CMOS processes offered by MOSIS all use CMOS-Bulk p-well technology. The p-well processes differ in the number of layers of metal interconnections (1 or 2) and the presence or absence of capacitors. In CMOS-Bulk p-well (n-well) the substrate is n-doped (p-doped) and the p-channel (n-channel) devices are in this substrate. To isolate the n-channel (p-channel) devices from the substrate a heavily doped p-well (n-well) is first placed to act as the back gate. The heavy doping of the p-well (n-well) degrades the performance of the n-channel (p-channel) device while the p-channel (n-channel) device is optimized. In p-well CMOS, though the mobility of the electrons in the n-channel device still exceeds that of the holes in the p-channel device, the performance difference of the transistors is minimized. The more uniform performance of the two transistor types makes the p-well process appropriate for CMOS random logic.

Figures 2.10 and 2.11 represent the top and side views of the steps of the CMOS-pw process for the production of an inverter. These steps are: (1) starting with an n-type substrate the p-well is patterned, (2) the active areas in the p-well and on the substrate are established, (3) the polysilicon is patterned, (4) the two ion implant masks are placed (the N+ mask is simply the photographic negative of the P+ mask), (5) contact cuts are made, and (6) the metal is placed.

a. Latchup in CMOS-pw

One of the main problems associated with CMOS-Bulk, both p-well and n-well, is latchup. Basically latchup involves generation of a short circuit between Vdd and GND, and can result in the complete destruction of a chip. Many researchers have tried to formally define the conditions that cause latchup to occur [Ref. 8]. This task is extremely complex because the phenomenon is so dependent on layout, which is unique to each chip design. Though a fully quantitative analysis of latchup is still not available, a qualitative analysis will show what happens on the chip when latchup occurs.

Looking at the side view of an inverter in Figure 2.12, parasitic bipolar transistors can be seen. The base of the npn transistor is the p-well and the base of the pnp transistor is the n-doped substrate.
These parasitic transistors are connected as shown in Figure 2.13. If the output of the gates goes below GND by a value equal to the threshold of the npn transistor, its emitter starts to inject current (electrons) into the base (p-well) and the resultant collector current flows to the Vdd node. If the resistance between the Vdd node and the source of the pull-up p-channel MOS transistor, R1, is large enough, the voltage drop across R1 will exceed the threshold of the pnp transistor. The collector current (holes) of the pnp device flows to the GND node. If the resistance between the GND node and the source of the pull-down n-channel MOS transistor, R2, is great enough, the resultant voltage drop across R2 will increase the base current in the npn transistor. As is evident, there is positive feedback. The only way to stop this destructive process once it has started is to disconnect Vdd or GND. Prevention of latchup must be designed in.

Figure 2.10 P-Well Process, Top View [Ref. 6].

Figure 2.11 P-Well Process, Side View [Ref. 9].

Figure 2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6].

Figure 2.13 The Latchup Circuit [Ref. 6].

The MOSIS CMOS-Bulk p-well design rules include features for the specific purpose of reducing the probability of latchup. The minimum separation rules for p-wells and P+ doped active areas exist for this purpose. Their aim is to reduce the gain of the parasitic bipolar transistors, thus requiring a larger noise spike of longer duration to start the latchup sequence. A frequently used technique is the grounding of the p-well as illustrated in Figure 2.14. Here the effect of the P+ doped area covering half of the contact cut for the ground bus is to reduce the resistance R2 in Figure 2.13. Another practice is to place a small capacitor across the Vdd and GND pins of CMOS-Bulk chips. To provide capacitive filtering of noise spikes on the chip, the Vdd and GND busses are frequently run close together. Also, Vdd input pads are designed to provide capacitance between Vdd and GND.

Figure 2.14 Grounding of the P-Well.

3. Twin-tub CMOS

This process, also called twin-well, uses both n-wells and p-wells on a high resistivity N- or P- substrate, or in an epitaxial layer of silicon on a P+ or N+ wafer. Since the well doping does not have to overcome the substrate doping, both the n-channel transistors in the p-well and the p-channel transistors in the n-well can be optimized. Domino CMOS is enhanced by the use of this process since the optimized n-channel devices can speed up the complex boolean expression evaluation and the optimized p-channel devices can speed up signal drive between the stages (thereby reducing the effect of a given fanout).

D. CMOS TECHNOLOGY SELECTION

The CMOS implementation technologies available from MOSIS are CMOS-Bulk p-well with one metal layer, CMOS-Bulk p-well with two metal layers, CMOS-Bulk p-well with two metal layers and capacitors (for analog circuits) and CMOS-SOS.

The advantages of CMOS-Bulk are: (1) faster than NMOS, (2) very good noise margin, and (3) a proven reliable fabrication process. Its disadvantages are: (1) latchup susceptibility, (2) use of p-well guard rings is needed if radiation hardening is desired, (3) lower circuit density than NMOS or CMOS-SOS, and (4) more complex design rules than either NMOS or CMOS-SOS.
The advantages of CMOS-SOS are: (1) faster than NMOS or CMOS-Bulk, (2) very good noise margin, (3) intrinsically radiation hardened, and (4) no latchup. Its disadvantages are: (1) expensive fabrication process due to the sapphire, (2) sapphire variability reduces the reliability of the fabrication process, (3) thermal mismatch between the sapphire and silicon limits the carrier mobility, and (4) it is not a viable technology for dynamic memory due to back channel leakage.

CMOS-Bulk p-well technology was selected as the implementation process for the adder for the following reasons. First, technology files for this process were available at the Naval Postgraduate School (NPS), enabling the use of extant computer aided design (CAD) tools. Second, since this would be the first CMOS VLSI design at NPS, utilizing the most reliable process is prudent to prevent design problems from being clouded by implementation process problems.

III. DESIGN TOOLS

To employ the Mead-Conway design methodology on a large scale design, three computer aided design (CAD) tools are needed. A layout editor for viewing the circuits as they are created is the first tool required. Next, a design rule checker is necessary to confirm that all design rules for the specified technology have been adhered to. Though not a complex task, the large number of checks that must be made for even a modest design makes manual design rule checking highly error prone. Finally, a circuit simulator is needed to verify that the circuit as designed provides the proper logical output. In the design of the pipelined sixteen-bit adder, the Caesar layout editor [Ref. 4], the Lyra design rule checker [Ref. 10], and C. Terman's RNL circuit simulator [Ref. 11] were employed.

A. CAESAR

Caesar is a generic layout editor. It is not designed for any particular VLSI implementation technology. It is not even limited to designing integrated circuits.
Caesar is a graphics layout editor for the creation and manipulation of rectangles where the user specifies the color, size, and placement. It is through the user specified technology file that the rectangles of color take on meaning. At the Naval Postgraduate School (NPS) there are two technology files available for use with Caesar. One is for N-doped metal oxide semiconductors (NMOS) and the other is for complementary metal oxide semiconductors utilizing a P-doped well (CMOS-pw).

Caesar works with files of its own special format. These files are indicated by an appended file type of ca (i.e. xxxx.ca). On command Caesar will generate a Caltech Intermediate Format (CIF) file of the same layout. Again it is the technology file which tells Caesar which CIF layer labels to attach to the colored rectangles.

At NPS, Caesar is set up to take commands from any terminal where the execution of the Caesar program is initiated (usually the ADM-3a console adjacent to the color graphics display unit) and from a four-button puck on a graphics tablet attached to the color display device. Caesar displays its graphics results on an AED 767 color monitor and displays its menus, messages, and prompts on the command console. Detailed information on the installation and operation of Caesar at NPS can be found in Reference 4 and Reference 2.

Caesar is an interactive CAD tool. The results of any command are rapidly displayed on the AED 767. The results of a command may be undone (u) or repeated (.) with a single stroke of the specified key on the command console. While running Caesar, a user may also call upon the design rule checker, Lyra, to check the area inside and within three Caesar units* of the current box for design rule violations. This interactive use of the layout graphics display and the design rule checker helps to insure that there will not be any design rule forced changes late in the design cycle when changes are much more time consuming.
With Caesar's level of interaction with the designer, the design loop consisting of (1) issuing commands to perturb the existing circuit, (2) visual inspection to verify that the commands generated the desired results, and (3) design rule checking of the new circuit, can be rapidly completed.

* A Caesar design is laid out on a grid of Caesar units. These units do not represent any specific length. When creating a CIF file from a Caesar file the desired length of a Caesar unit is specified.

Caesar is a hierarchical design tool. With Caesar, circuits can be created by piecing together cells (other files of type .ca) which in turn may be made up of sub-cells. Theoretically, there is no limit to the number of levels in the hierarchy. Not only can cells (sub-cells, etc.) be called upon to fill locations in a circuit; if they need to be modified to function properly, Caesar provides a subedit mode to facilitate editing of layouts one level below the current editing level. Care must be taken when this subedit feature is used since the changes made to the given cell are global. Everywhere the cell is used on the chip, the newly edited version will appear.

B. LYRA

Lyra, like Caesar, is generic: it is a generic design rule checker. When Lyra is invoked from within Caesar, the actual program executed to check for design rule errors depends on the technology file indicated in the header of the Caesar file being edited. After running, Lyra sends a message to the command console indicating the number of errors found. On the graphics display Lyra paints the exact location of each error and labels each error with the design rule violated. The error label consists of abbreviations for the layers involved, followed by an underscore, followed by an abbreviation for the type of violation detected. Table 1 lists the abbreviations used by Lyra for CMOS-pw.

The winter 1983 distribution of the University of California at Berkeley (UCB) CAD tools included two versions of Lyra.
One is for the Mead-Conway NMOS design rules and the other is for the Jet Propulsion Laboratory's (JPL) five-micron feature size CMOS-pw design rules. Since MOSIS no longer supports fabrication of the JPL CMOS-pw process, the design rules for the MOSIS supported three-micron CMOS-pw process were obtained. Professor Marco Annaratone at Carnegie-Mellon University (CMU) generated the listing of the three-micron CMOS-pw design rules compatible with Lyra and has provided NPS with a copy. To generate executable code from the prototype Lyra program and imbed the process specific design rules, the program rulec (see Appendix B) is run with the design rule list file as its argument. Now, when Lyra is invoked from Caesar while editing a CMOS-pw technology circuit, the three-micron minimum feature size CMOS-pw design rules are applied.

TABLE 1
Lyra Error Abbreviations

    Abbreviation    Layer
    p               polysilicon
    m               metal
    w               p-well
    d               n+ diffusion
    c               cut
    P               p+ diffusion

    Abbreviation    Error
    w               minimum width
    s               minimum separation
    X               malformed transistor

This version of Lyra does not check for exceeding any maximum dimensions. The only maximum size design rule in this technology is for contact cuts, which may not exceed 3 microns by 8 microns. Avoidance of improper contact cuts can be accomplished by utilizing Caesar's hierarchical nature. Contact cuts of all needed sizes and types are generated once and saved to be inserted as cells wherever needed.

C. SIMULATION

Once a circuit layout has completed this initial design loop, it matches the designer's conception of how it should appear and is free of design rule violations. The performance of the given circuit, though, remains uncertain. To simulate the performance of the design, programs such as SPICE [Ref. 11] and RNL [Ref. 11] are used.

1. SPICE

SPICE is an important simulation tool in the design of high speed CMOS digital and analog circuits. With its detailed device modeling, SPICE can provide accurate predictions of performance once the device parameters of the implementation technology are known. SPICE provides the logical output of a circuit based upon the inputs and describes the transient behavior of the circuit as it changes to the new logical output. Thus SPICE enables a designer to optimize transistor dimensions for speed.

Unfortunately, the version of SPICE currently available on both the Vax 11-780 and the IBM 3033 at NPS (version 2G6) fails when the parameters of the devices fabricated by the MOSIS three-micron CMOS-pw process are used. With these parameters the transient behavior solutions do not converge. Engineers at CMU, UCB, and the University of Washington (UW) are therefore currently employing an experimental version of SPICE (version 2X.x developed at UCB) which is successful in simulating with the three-micron CMOS-pw device parameters. This version, however, has other bugs and is not available for general distribution. The changes to SPICE 2G6 that enable SPICE 2X.x to simulate the three-micron CMOS-pw devices will be incorporated into the next distribution of SPICE (version 2G7). The Naval Postgraduate School is in the queue of institutions to receive SPICE 2G7 once it is ready.

In order to run a SPICE simulation of a CMOS circuit designed using Caesar, the following steps should be executed. First, the labeling feature of Caesar is used to place electrical labels on the nodes of interest in the circuit (Vdd, GND, input, output, etc.). Second, the Caesar command

    : cif 100 -p

is issued to generate the basename.cif file. The parameter 100 indicates a scale of 100 centimicrons per Caesar unit⁵ and must be specified unless the default value of 200 centimicrons per Caesar unit is desired. The -p option causes entries to be made in the basename.cif file for the labels assigned. Third, after exiting Caesar and returning to Unix, the circuit extractor Mextra [Ref. 10] is invoked using the command

    % mextra basename

to create the file basename.sim. To convert the basename.sim file to a SPICE file (basename.spice), the program sim2spice [Ref. 11] is used. The basename.spice file contains a list of transistors and capacitors in the circuit in a SPICE compatible format. The basename.spice file must be edited to add the model parameters for the transistors, to specify the waveforms of the input(s), to specify the type of analysis to be performed (usually transient analysis), and to specify the output to be produced (tables, graphs, etc.). The Spice User's Manual [Ref. 11] contains the formats of these additions to basename.spice. Best case and worst case device model parameters for the MOSIS three-micron CMOS-pw process as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser of MIT are found in Appendix A.

⁵ Since the minimum dimensions for the 3-micron CMOS-pw process are specified in microns instead of lambda, CMOS-pw circuits are usually designed on Caesar using one micron per Caesar unit.

2. RNL

RNL is a timing and logic simulator for digital MOS circuits. It is an event driven simulator which uses a resistance-capacitance model of a circuit to estimate node transition times and to estimate the effects of charge sharing.⁶
⁶ Charge sharing refers to the capacitive effects that happen when two or more previously unconnected nodes, each having some charge and capacitance, become connected by a resistor (transistor turning on).

After input values have been assigned by the user, RNL calculates the effects of those inputs by repeating the following operations until there are no further node value changes: (1) when a node is added to the network due to a transistor being turned on, the charge sharing implications of the new node's capacitance and logic state on each of its electrical neighbors are computed, (2) Vthev and Rthev (the parameters of the Thevenin equivalent circuit) are calculated for each node that might be affected, and the new logic state is determined from Vthev (0.0Vdd to 0.3Vdd = logic 0, 0.8Vdd to 1.0Vdd = logic 1, logic X otherwise), (3) if the node has changed state, the transition time is calculated using the node's capacitance, and (4) any changes are propagated to other nodes. Details of the computation methods used by RNL can be found in the RNL Version 4.2 (UW) User's Guide [Ref. 11].

More important to the RNL user is an understanding of what information RNL keeps, what it discards, and how it decides what to do next. Basic to the operation of RNL is the idea of an event. The three elements of an RNL event are: (1) a node in the network, (2) a new logic state for the node, and (3) the time when the node value changes to the new logic state. RNL maintains a list of events, sorted by time, that tells what processing remains to be done. When the user changes an input, an event is added to the list. RNL sequentially processes the next event on the list, stopping when (1) the list is empty, (2) a node the user is tracing changes value, or (3) the specified simulation time interval has elapsed. To process an event, RNL removes it from the list, changes the node's state to reflect its new value, and then calculates any new events resulting from the node's new value.

In calculating new events, first all nodes that might be affected by the change are found and marked. This includes the source and drain of all transistors for which the current node is the gate and all nodes connected through turned on transistors. The search through the network stops when a non-conducting transistor or an input is reached. For each marked node, two calculations are made. First, a charge sharing calculation is performed to model changes of state due to the charging and discharging of node capacitances. Second, a final value calculation is done to determine the node's ultimate logical state.

A given node can have only two events pending: (1) a charge sharing event describing an immediate change in the node's state due to charge redistribution among the nodes on the connection list, and (2) a final value event describing the final, driven state of the node. RNL observes the following rules for processing events: (1) when a new charge sharing event is scheduled, all previously pending events for the node are thrown away, and (2) when a new final value event is calculated, it will be ignored if (a) there is a pending final event for the same value which is scheduled to occur sooner, (b) there is a pending charge sharing event for the same value as the new final event, or (c) there is no charge sharing event and the new final value event is the same as the node's current value. These rules are based on the assumption that the event that was last calculated reflects the latest configuration of the network and therefore should override any events calculated earlier. Charge sharing events discard pending final value events because any charge sharing calculation is immediately followed by a new final value calculation.

These event rules, however, sometimes lead RNL to generate incorrect results. This is especially true of signal driven circuits (circuits where inputs are applied to the source and drain of a transistor as well as its gate) and circuits that depend on the analog properties of the devices to predict the behavior of the circuit. For example, consider the first exclusive OR gate design for the pipelined adder in Figure 3.1 [Ref. 6]. This design has proven to function correctly at CMU; however, the RNL simulation shows this circuit failing.

[Figure 3.1 CMOS Exclusive OR: inputs A and B, internal node Abar, transistors Q1 through Q6, output A ⊕ B.]

Starting in a state where A=0, B=1, and out=1, assume that the input A then transitions to 1. Initially Q1, Q3, Q4, and Q6 are on. When A goes high, Q3 is turned off (no events generated) and Q2 is turned on, generating a charge sharing event and a final value event for Abar, resulting in Abar going low. When Abar goes low, the still turned on Q6 is now trying to drive the output node low and the still turned on Q4 (RNL recognizes that it takes a finite amount of time for Q4 to turn off but does not recognize that n-channel transistors do not conduct high voltages well) is still trying to drive the output node high. The result is an output of X, the undefined state. Next, Q4 is turned off. Since turning off Q4 adds no new nodes to the network, the event list is empty and the output remains at X. The primary difficulty RNL has with this circuit centers around the fact that the output node is controlled by two nodes that can change at different times. As a result, a charge sharing event due to one input can eliminate a final value event of the other, with that final value event being the force which determines the circuit's actual behavior.

The circuit of Figure 3.2 is a proven latch design which also fails in RNL simulation. In Figure 3.2 the fractions next to the transistors represent the length to width ratios of the devices.
This circuit is dependent on these ratios for proper operation. These ratios insure that the gain of the input signal on the gates of Q5 and Q6 is greater than the gain of the feedback signal to the same gates. The difference in these gains must be sufficient to cause the logical 1 or logical 0 at the gates of Q5 and Q6 to be that of the input signal when the input signal is the opposite of the feedback signal. RNL does not recognize either gain. As a result, the circuit becomes locked up at X. Because of RNL's difficulty with these two circuits, other designs were employed in the final adder to facilitate testing of the overall design (see chapter 5).

[Figure 3.2 CMOS Latch Design [Ref. 6].]

To use RNL as installed at NPS, the following steps should be followed. First label the circuit and generate basename.cif as before. Again the program Mextra is used to extract the circuit, this time with the -o option (mextra basename -o). The -o option causes Mextra not to compute capacitances. A follow on program in this sequence, Presim, performs this computation with greater accuracy. It should be noted that there are three different circuit extraction programs, each named Mextra. There is the MIT version, the UCB version and the UW modified UCB version. The next tool to be used in the sequence, Presim, can accept the output format of the MIT version and the UW modified UCB version. At NPS, the UCB version is installed and was used. The UCB and UW modified UCB versions differ in the order of the parameters in a transistor specification. Professor Annaratone at CMU developed a program, cformat, to change a .sim file generated by the UCB version to the MIT format. However, cformat does not work if the -o option is used with Mextra. To avoid a loss of accuracy, the .sim file can be manually changed to the UW modified UCB format. The first step in this format change is to use the vi text editor to add "format: UCB" to the header line of basename.sim. The other change that needs to be made is to change the labels for the n-channel transistors from "n" to "e". Using the EX editor, the following steps accomplish this:

    % ex basename.sim    - invokes the editor
    : g/^n/s//e/g        - make global change for all n as first char in a line, change to e
    : w                  - write back edited file
    : q                  - exit editor

The next step is to create a binary file for RNL from basename.sim using Presim. This is done by issuing the command

    % presim basename.sim basename config

Basename.sim is the edited .sim file and basename is the file into which presim writes its binary output. Config is the calibration file used to select other than default values for the circuit element capacitance and resistance. A copy of the presim user's guide from the UW/NW VLSI Consortium release 2.0 and the calibration file used in simulating the adder are contained in Appendix C. The values used in the calibration file are taken from the MOSIS supplied electrical parameters.

The final step is to run RNL itself. This is done by entering one of the following two Unix commands:

    % rnl

or

    % rnl cmdfile

where cmdfile is the name of a file containing a sequence of RNL commands. Entering the first Unix command will cause RNL to take its commands directly from the console interactively. If the second Unix command is used, specifying a command file, RNL first executes all the commands in cmdfile and upon completion, starts taking commands from the console. In either case, RNL should be given the following commands:

    (load "uwstd.l")
    (load "uwsim.l")
    (read-network "basename")

where basename is the file generated by presim. The first two commands load RNL with several macros which simplify user interfacing with RNL.

The user interface with RNL is a LISP interpreter. The interpreter continuously executes the loop: (1) read a command, (2) evaluate the command and perform the specified actions, and (3) print the result. There are two formats for specifying commands to this loop.
The first is:

    (function argument argument ... argument)

Here the parentheses delimit the command and spaces separate the elements. The interpreter reads the entire command, up to the closing parenthesis, then the first element is interpreted as a function and all the others as arguments. The arguments may be of the same command form, (function arg arg ... arg). If the following command were issued to RNL,

    (* 12 (+ 2 2) (/ 14 7))

RNL would respond by typing 96 (12*4*2).

The other format for commands to RNL is

    (function '(argument argument ... argument))

where the "'" indicates the quote special form which keeps its argument from being evaluated. For example, (+ 2 3) evaluates to 5, but '(+ 2 3) is a string of three elements. When this second RNL command format is not used to represent an argument of another command (i.e. is not contained within the parentheses of another command), it may be written in the more natural form:

    function argument argument .... <newline>

Tutorials on RNL are contained in the University of Washington/Northwest VLSI Consortium's VLSI Design Tools Reference Manual [Ref. 11].

There are two points concerning the Mextra, Presim, RNL simulation cycle a user should be aware of that are not brought out in the documentation. The first concerns the use of vectors in RNL commands. As evidenced in the tutorials of Reference 11 and the adder simulation results in Appendix D, vectors can be used to make the input and output of RNL less cumbersome and verbose. After a vector has been defined, the user will then want to assign values to it. The documentation shows the format of the vector value assignment command to be:

    (invec '(vecname values))

However, the "values" field has its own specific format. The first character should be a 0 or a 1, indicating positive and negative numbers, respectively. The LISP interpreter will work with negative numbers but RNL will not accept negative numbers as logical inputs. The second character is a letter specifying the number base of the input vector (b for binary, h for hexadecimal). For example, to assign the binary value +101010 to the vector vectone, the RNL command would be:

    (invec '(vectone 0b101010))

The other point concerns the location of input labels on the input pads. When the entire chip is being simulated, the input labels are normally placed on the metal pads where the off chip leads are attached. Before an input signal from a bonding pad reaches the interior circuits of a chip it must pass through a resistor in an overvoltage protection circuit. In the extraction and simulation process this resistor is viewed as an open circuit. Therefore, on input pads, the input label must be placed after the resistor in the signal path.

With Caesar, Lyra, and RNL, the designer at NPS has the requisite CAD tools for the complete logical design loop. With these tools circuits can be designed that are free of design rule errors and produce the desired logical results. The lack of SPICE somewhat restricts the designer's ability to optimize circuit speed, but there are several design techniques that can be employed to design chips that run fast. These will be covered in the next chapter.

IV. DESIGN OF THE ADDER

As stated in the introduction, the primary goals of the adder design are to maximize throughput and to provide for testability. The adder is to be a pipelined adder. Every clock cycle it should accept as inputs two 16-bit addends (A1, the least significant bit, through A16 and B1, the least significant bit, through B16) and one carry-in (Cin) bit. It is desired to produce the 16-bit sum (S1, the least significant bit, through S16) and the carry-out (Cout) bit as quickly as possible. Both the number of clock cycles from input of the addends to the output of the sum and the duration of each clock cycle are to be minimized. A secondary consideration in the design is expandability.
An expandable design is one that can easily be extended to produce a 32-bit or 64-bit sum utilizing the same circuit structures. In this chapter the logical design and layout design of the 16-bit adder will be presented. The equations presented in this chapter are taken or derived from equations found in chapters three through six of The Logic of Computer Arithmetic by Flores [Ref. 12]. In these equations concatenation implies the logical AND, the symbol + implies the logical OR, and the symbol ⊕ implies the logical XOR.

A. LOGICAL DESIGN

In considering the speed spectrum of adders from a logical standpoint, at the fast end there is the table look-up. With 33 binary inputs and 17 outputs, this would require an address space of 2^33 17-bit words. With current technology this is not feasible. At the other end of the spectrum is the serial adder. On clock cycle 1 it uses A1, B1, and Cin to produce S1 and C1out (carry out of bit one into bit 2). On clock cycle 2 it uses A2, B2, and C1out to generate S2 and C2out. Here 16 clock cycles elapse before the sum is available. An adder can also be implemented as a ripple carry adder where the duration of each clock pulse is sufficient to allow a carry into the sum to propagate all the way through to a carry out. In the case of the 16-bit adder, this would require a clock duration at least sixteen times the length of the gate delay of the one bit adder [Ref. 3]. The carry look-ahead (CLA) adder belongs to the middle ground. In carry look-ahead addition the carry into each bit position, C(i), is generated from the propagate, P(i), and generate, G(i), primitives.

$$P_{(i)} = A_{(i)} \oplus B_{(i)} \qquad \text{(eqn 4.1)}$$

$$G_{(i)} = A_{(i)}B_{(i)} \qquad \text{(eqn 4.2)}$$

P(i)=1 implies that a carry into bit(i) will be propagated through to bit(i+1). G(i)=1 implies that A(i) and B(i) will provide a carry into bit(i+1) regardless of the contents of the less significant bits of the sum.

$$C_{(i)} = G_{(i-1)} + G_{(i-2)}P_{(i-1)} + \cdots + C_{in}P_{(i-1)} \cdots P_{(2)}P_{(1)} \qquad \text{(eqn 4.3)}$$

$$S_{(i)} = C_{(i)} \oplus P_{(i)} \qquad \text{(eqn 4.4)}$$

The algorithm for the CLA sum generation is as follows. The first event is the evaluation of equations 4.1 and 4.2 to generate the P(i) and G(i) primitives. The second event uses the P(i) and G(i) primitives as inputs to equation 4.3 to generate the C(i)'s. The final event is the computation of the S(i)'s from equation 4.4.

As pointed out by Flores [Ref. 12] and by Conradi and Hauenstein [Ref. 3], there are several logical implementations of carry look-ahead addition. A principal task of this thesis investigation was to select a fast logical design. Without the circuit simulator SPICE, the analysis of each design considered was more qualitative than quantitative. In this qualitative analysis, a turned on transistor is considered as a resistor with its resistance proportional to its length and inversely proportional to its width. All gates driven by such a turned on transistor are considered to be capacitive loads with capacitance proportional to the area of the gate. The interconnect wiring is considered to add both parallel capacitive loading and series resistance as shown in Figure 4.1.

[Figure 4.1 CMOS Output Loading Model: switches S1 and S2, driver resistances Rtrans, wiring elements Rwire and Cwire, gate loads Cgate 1 through Cgate n.]

From this model it is obvious that the amount of interconnect wiring and the number of gates driven (fanout) should be minimized to minimize the output transition time when the positions of switches S1 and S2 of Figure 4.1 are reversed. This led to the following guidelines in the design of the adder:

1) The internal logic of each stage should be accomplished with minimum dimension transistors, 3 microns x 4 microns (length x width). This leads to more compact circuits with shorter interconnections and reduces the capacitive load on the preceding stage.

2) Significantly wider transistors (3-micron x 9-micron) should be used at the output of each stage where the fanout and interconnect loading is greater.

3) The fanout of any transistor should be kept to less than five. This requires a more complete definition of fanout because the capacitive loading of a gate depends on its area. A 3-micron x 4-micron transistor driving six other 3-micron x 4-micron transistors has a fanout of six. A 3-micron x 8-micron transistor driving the same load is considered to have a fanout of three.

Though this implies that a high fanout problem can be solved by merely increasing the width of the driving transistor, it neglects the effects of the interconnect wiring. As gates are added to the load of a transistor, each subsequent addition must be more remote from the driving transistor. Since the resistance of the wiring is proportional to its length and inversely proportional to its width, the resistance of the wiring will increase unless the width is also increased. However, since the capacitance of the wiring is proportional to its area, most of the gain achieved by widening the wire to reduce resistance is offset by the increase in capacitance. As a result, in the design of the adder, increasing the width of the driving transistor was not viewed as a complete fix for a fanout problem.

For the comparison of the different approaches to CLA addition, the term logical event needs to be defined. The most basic definition is a combinational logic circuit accepting a set of inputs, performing its specified operations on those inputs and generating a set of outputs. Therefore, the input of the addends, followed by the computation and output of the sum, can be considered as a logical event. However, a primary design consideration for the adder is to provide for testability and a key element of this provision is the availability of intermediate results (see section B of this chapter). This implies breaking up the sum generation into several separate events. The first event takes the addends as inputs, performs some logic operation on them and stores the results in a register. The next event(s) takes its inputs from that register and stores its results in another register. This chain continues until the last event deposits the sum on the output pads of the chip. To provide the tester with easily interpreted intermediate results, the equations presented in this chapter were taken as boundaries for each logical event. The terms on the right side of the equation determine the inputs and the left side terms determine the output of a logical event. Once all inputs for an equation are generated by the logic of the previous events, the equation becomes part of the current event.

1. Zero Level CLA Logic

This logic requires three events to generate the sum. First, equations 4.1 and 4.2 are used to generate the P(i)'s and G(i)'s. Second, the C(i)'s are derived from equation 4.3. Finally, the sum is generated from equation 4.4. The principal problem with this approach for a sixteen-bit adder lies in the application of equation 4.3. Here, the input P(1) has a fanout of 15, which makes this approach unsatisfactory.

2. First Level CLA Logic

Noting that a four-bit sum generated using zero level CLA logic is within the design guidelines suggests cascading 4-bit slices of the same logic as indicated in Table 2.

TABLE 2
First Level CLA Logic for a 16-bit Sum

    Event No.   Bits 1-4             Bits 5-8             Bits 9-12            Bits 13-16
    1           Compute P(i),G(i)    Compute P(i),G(i)    Compute P(i),G(i)    Compute P(i),G(i)
    2           Compute C(i)         Delay P(i),G(i)      Delay P(i),G(i)      Delay P(i),G(i)
    3           Compute S(i)         Compute C(i)         Delay P(i),G(i)      Delay P(i),G(i)
    4           Delay S(i)           Compute S(i)         Compute C(i)         Delay P(i),G(i)
    5           Delay S(i)           Delay S(i)           Compute S(i)         Compute C(i)
    6           Delay S(i)           Delay S(i)           Delay S(i)           Compute S(i)

Here the sum is available after six events and the fanout is reduced by a factor of four.
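The zero level logic of equations 4.1 through 4.4, and the cascading of 4-bit slices of that logic, can be checked with a small behavioral model. The sketch below is only a software illustration under the stated equations, not the circuit described in this thesis; the function names `cla_add` and `first_level_add` are hypothetical.

```python
def cla_add(a, b, cin=0, width=16):
    """Zero level CLA sketch: return (sum, carry_out) for width-bit addends.

    bit(1) is the least significant bit, as in the text.
    """
    p = [0] * (width + 1)
    g = [0] * (width + 1)
    for i in range(1, width + 1):
        ai = (a >> (i - 1)) & 1
        bi = (b >> (i - 1)) & 1
        p[i] = ai ^ bi  # propagate primitive (equations 4.1 and 4.2)
        g[i] = ai & bi  # generate primitive

    def carry_into(i):
        # Equation 4.3: G(i-1) + G(i-2)P(i-1) + ... + Cin P(i-1)...P(1)
        carry = 0
        product = 1  # running AND of P(i-1) down to P(k+1)
        for k in range(i - 1, 0, -1):
            carry |= g[k] & product
            product &= p[k]
        return carry | (cin & product)

    total = 0
    for i in range(1, width + 1):
        total |= (carry_into(i) ^ p[i]) << (i - 1)  # equation 4.4
    return total, carry_into(width + 1)


def first_level_add(a, b, cin=0, slices=4):
    """Cascade 4-bit zero level slices; each carry out feeds the next slice."""
    total, carry = 0, cin
    for k in range(slices):
        nibble_a = (a >> (4 * k)) & 0xF
        nibble_b = (b >> (4 * k)) & 0xF
        s, carry = cla_add(nibble_a, nibble_b, carry, width=4)
        total |= s << (4 * k)
    return total, carry


print(cla_add(0x1234, 0x4321, 1))
print(first_level_add(0xFFFF, 0x0001))
```

Both functions return the same result for any pair of 16-bit addends; they differ only in how far a single carry expression has to reach, which is the fanout concern raised above.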
Compu te Delay Delay P(i) rGli) Deiav ?(i) rGli) Compute C(i) Compute S(i) The event cycle time reduction would more than make up for the event count increase since cycle time grows faster than linearly with fanout. The only drawback with this design lies in the cost of extending it to generate 32-bit or 64-bit sums. every 4-bit slice added, another event is required. 64-bit add would require 3- blocks, Thus, a events. Second Level CLA Logic Again the data blocks. 12 For is divided into 4-bit slices called But rather than let the carries ripple through the two new primitive functions are introduced. 49 They are the block propagate, 3P(i) functions. 3P(i) = 1 implies that a carry into block be propagated through to block will generate (i) and block generate, , block (i+1) carry into block (i+1). a For is the least significant bit, block where bit(1) BG primitives are generated by (i) , will implies that BG(i)=1 . 3G(i) a 4-bit The BP and equations 4.5 and 4.6 respec- tively, with the P(i)'s and G(i)'s computed as before. P {i )P (i)P (i)P (i) BP[i) = BG [i) Next, ~ G (<) +G WP («) + G W P (*) P (»)"*" G (i) H) P i*) PWP (egn 4.6) (2) , which represents the carry into block (i+1), is computed using equation the block carry, from block (egn 4.5) 3C (i) 4.7 which represents the same lcgic as equation 4.3 *<?<o- £ * same method (i) 'ji+1 Bp u s, (egn 4.7) the ? (i) 's, G(i) 's, and BC(i)»s have been generated. If the of generating level CIA were to be used, the final used in sum as two additional zero events would be The first again applies the logic of equation 4.3 required. to each ' } after three events, So far, BP(i)'s, BG BG i» = to generate the 4-bit block Here the Cin for block (i) carry into is given by BC(i-1). each bit. The second from the C (i) 's and One of these events can be eliminated if, while the P (i) s. BC(i) 's and their predecessors are being computed, an estimated sum of the 4-bit block is also computed. 
One method cycle is used to generate the sum f is to compute two estimated sums for each block, one assuming an carry into the block of and the other assuming a carry in of 1. When the correct carry in for block (i) is generated, it is used to multiplex block to the output. the correct sum for the This assumed carry method was rejected 50 because of the large amount of area consumed ters needed to hold is to compute the by the regis- two possible answers. The second method estimated sum block assuming of the a and then correcting the estimated sum once the actual carry-in to each block is known. carry-in of Since the estimated sum, ES after the (i) third event and computing , is not needed until event again it as one leads to fanout problems, the computation of £5(4), the most significant bit, through ES 1) is computed in two events as ( First, an intermediate estimated sum, follows. computed using two-bit (see equations 4.8 from bit 4.12 (2) each assuming slices, On the next event, ES (i) a carry P . (eqn 4.8) /»(,, IESp) = P{2)QG{i) IES {i) = (eqn 4.9) (eqn 4. 10) {i) (eqn 4. 11) IC2Z = G( 2 )+G( > 1 )/ ( 2) 4. 12) (eqn 4. 13) £5( 2 = IES ^) (eqn 4. 14) ) ) ES {i) = (eqn £5 (I = IES (i) ES {S) = !C2ZQlES {i) [lES {i) IC2z]QlES i4) 51 is is computed from the IES(i)'s and IC23 using equations 4.13 through 4.16 IES {1) = , is computed using equation (IC23) (3) (i) carry in a At the same time, through 4.11). into bit IES (eqn 4. 15) (eqn 4. 16) estimated sums three events, after Now, 4-bit block and the actual carry into each block tions 4.17 through 4.20 SW = S {1) = S H) - can easily be extended [c^ES^QESp \c, ni ES (i) ES BP and primitives, third level ^QES { (egn 4. 18) (egn 4.19) {i) (egn c ,nh ES {l) ES [2) ES (S) (~)ES {i logic, a cd primitives represent the carry is this design 64-bit sums. 4.6 which produced BG can be 4.20) 16-bit sum generation of to the B3P the Additionally, events. The logic of equations 4.5 and level primitives (egn 4. 
17) C<niQ ES {l) level CIA generated in only four are . s [i) = Using second (Cinb) can be computed using equa- From these the sum available. for each the second used again to generate These third level 33G. propagate and carry generate properties of 16-bit slices. The carry into each 16-bit block is provided by implementing equation 4.7 . Thus, adding one event will provide the 16-bit blocks of a 6 4-bit sum. carry into each of four The logic of equation 4.3 is then used to generate the carry into each 4-bit block of the sum and the final sum is computed as before. result is that by adding two events, for using the same logic as before be designed), (i.e. the 16-bit adder can adder. 52 a The final total of six, and no new circuits need to be extended to a 64-bit B. DESIGN FOR TESTABILITY Another primary that is, provide for testability, design was the adder objective cf to the ability to logically fabrication errors or circuit malfunctions rather than visually searching for faults with a microscope. the As the complexity of integrated circuits has grown, detect ability to logically the number complexity increases, tested for and isolate a of a number of input testing is As faults to be vectors required to of likely input number of markedly. Unless grow rapidly. a design allows the tester to examine the chip , the order of magnitude of the used which interior logic desired, the specific fault technique is has decreased and outputs available inputs the normally detect faults using only vectors required to perform useful logical if logical testability is prohibitive. Thus, design technique that a provides for it must be used. One such design technique is level sensitive scan design (LSSD) £Ref. 13]. level sensitive implies that the output of any logic element is dependent inputs. only on the levels of its No logic elements are allowed to depend on a tran- sition such as in an edge triggered flip flop. 
Scan design implies that all memory elements in the design are to have an auxiliary function where their contents are serially fed to an output pad for examination. This gives a tester the ability to examine intermediate results.

In applying the LSSD technique to the adder design, the following steps were taken. First, all circuits were designed to respond to the level of their inputs and not to require a transition to trigger their operation. Second, to insure that each logic event worked only with stable, non-fluctuating input levels, the inputs to each event were gated. The input gates were opened only after the outputs of the previous event were stable (i.e. the inputs were stable) and closed before the input gates of the previous event were opened. Third, a dual mode latch was used to store the output of each logic event. In the normal mode of operation, the register latches the outputs of one logic event in parallel and stores them to be used as inputs for the next logic event. In its secondary mode of operation, the register stops taking its parallel inputs and starts to run as a shift register, shifting its contents onto an output pad.

One of the consequences of using the LSSD technique is the large amount of area consumed by the dual mode registers. In high speed operation, an inverter pair would be sufficient to store inter-event results. But to permit low speed testing where the capacitance of a gate may discharge during one clock phase, and provide the dual mode feature, a pair of clocked latches with control circuits is required.

C. LAYOUT DESIGN

With the logic decided upon, the next step was to create the layout of the adder. The logic consisted of four events to produce the sum. Another event was needed to latch the input data onto the chip. A two-phase clock was needed to insure that two adjacent events did not run simultaneously (insuring stable inputs to each event). To make the output compatible with the input to another adder, a one event delay was added. This insures that the output of one adder does not change while a second adder is using the sum from the first as an input. With two 16-bit addend inputs, one carry-in input, one power supply (Vdd) input, one reference (GND) input, a 16-bit sum output, one carry-out output, and two clock inputs, ten pads were left from a standard 64-pin chip for register mode control input and for register (shift mode) output. Since the design of the adder called for five registers, one for latching the input data and one for each logic event, five pads were used for input of the register mode control signals and five were used for the registers to serially output their contents. With the required inputs and outputs identified, the preliminary floor plan shown in Figure 4.2 was created.

Figure 4.2 Preliminary Chip Floorplan. (Event 2, phi2: compute P, G. Event 3, phi1: compute BP, BG, IES, IC23. Event 4, phi2: compute BC, ES. Event 5, phi1 and phi2: compute sum and Cout and delay until phi2.)

The first circuit designed was the dual mode latch of Figure 4.3. Here the circuit is designed to latch the IN level when Control is low ($\overline{\mathit{Control}}$ is high) and phi1 is high ($\overline{\mathit{phi1}}$ is low). When phi1 goes low, a copy of the input is also stored in the second latch and becomes available at shift-out which is connected to shift-in of the next latch. When control goes high, the IN signal is blocked and the latch takes its input from the register to the left. The shift-in of the leftmost latch in a register is tied to ground. Versatec plots of the actual layouts of this dual mode latch and the other circuits described in this section are given in Appendix E.

Figure 4.3 Dual Mode Latch.

The AND gate used was constructed from a NAND gate followed by an inverter as shown in Figure 4.4. Similarly, the OR gate was constructed from a NOR gate followed by an inverter (see Figure 4.5). Although logic implemented using these AND and OR gates is more area consuming than the same logic implemented in NAND and NOR gates only, the penalty is not severe because they were used infrequently in the final design.

Figure 4.4 AND Gate.

Figure 4.5 OR Gate.

The exclusive OR (XOR) gate was constructed from two inverters and three NAND gates as shown in Figure 4.6. Though this design is considerably more area consuming than the XOR gate of Figure 3.1, it was selected because the RNL circuit simulator could correctly model its operation.

Figure 4.6 Exclusive OR Gate.

More complex logic functions were implemented using programmed logic arrays (PLA) where the outputs are the logical sum (OR) of the products (AND) of inputs. A single phase design was needed. A PLA designed to compute when phi1 is high had to produce the proper sum-of-products results between the time the preceding event produced stable outputs (phi2 going low) and the time phi1 goes low. To hold down fanout, a dynamic structure was needed so that inputs could be applied to a single type of transistor. To prevent steady state power consumption a precharged dynamic structure was needed. Because of charge sharing, the precharging must take place while the inputs are present on the transistor gates of the PLA (see chapter 5, section C, for a complete explanation of the charge sharing problem in this PLA structure). Thus, two distinct events must occur during this time period. First, the inputs must be applied and precharging must take place. Then evaluation must occur.
To cause these two events to occur during a single phase of the clock, the inter-phase time when both phi1 and phi2 are low must be utilized for precharging. The basic structure of the resulting PLA is shown in Figure 4.7.

Figure 4.7 PLA Structure.

Referring back to the floorplan in Figure 4.2, the layouts of the circuits which perform the logic of each event are presented in Appendix E. The names assigned to the layouts are given below. Event 1 consists of a 33-bit dual mode latch. Event 2, which computes the P and G primitives for each bit, is made up of 16 AND gates, 16 XOR gates, and another 33-bit latch. Event 3, which computes the BP and BG primitives, the IES(i)'s and the IC23 for each 4-bit block, is made up of four instances of PLA82 and a 29-bit latch. The circuit PLA82 uses an 8-input, 5-product, 2-output PLA, two XOR gates, one AND gate, and one OR gate. Event 4, which computes the ES(i)'s and BC for each 4-bit block, is made up of four instances of PLA84 to compute the ES(i)'s and one instance of PLA915 to compute the BC(i)'s and a 21-bit latch. The circuit PLA915 is a 9-input, 15-product, 5-output PLA and the circuit PLA84 is an 8-input, 7-product, 4-output PLA. Event 5 uses four instances of PLA104 to compute the S(i)'s and a 17-bit latch to store results and provide the added delay (by taking the output from the shift out position, the extra clock cycle of delay is generated). The circuit PLA104 is a 10-input, 14-product, 4-output PLA.

With this design, the input to output latency is three full cycles of a two-phase non-overlapping clock; three cycles of the clock elapse between the time the addends are presented to the chip and the time the sum becomes available at the output. In the first three registers the odd number of bits is due to the need to store the carry-in value until event 4. In the last two registers the odd number of bits is due to the need to store the computed value of carry-out.
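The event-by-event logic described above can be sketched in software as a check on the equations. The following is only an illustrative bit-level model, not the chip's PLA personality: it applies equations 4.8 through 4.20 directly, while the block propagate/generate and block carry expressions are assumed to take the standard carry look ahead form (the corresponding equations 4.3 and 4.5 through 4.7 are not reproduced in this section). The function name `add16_by_events` is hypothetical, and latching and clocking are omitted.

```python
# Illustrative bit-level model of the adder's logic events (no latches or clocks).
# Bits are numbered 1..16 as in the text; index 0 of each list is unused.

def add16_by_events(a, b, cin):
    """Return (sum, cout) for 16-bit a, b and 1-bit cin using the event logic."""
    A = [0] + [(a >> i) & 1 for i in range(16)]
    B = [0] + [(b >> i) & 1 for i in range(16)]

    # Event 2: propagate and generate primitives for each bit.
    P = [0] + [A[i] ^ B[i] for i in range(1, 17)]
    G = [0] + [A[i] & B[i] for i in range(1, 17)]

    # Event 3: per 4-bit block, BP and BG (ASSUMED standard CLA form),
    # intermediate estimated sums (eqns 4.8-4.11) and IC23 (eqn 4.12).
    blocks = []
    for k in range(4):
        p1, p2, p3, p4 = P[4 * k + 1: 4 * k + 5]
        g1, g2, g3, g4 = G[4 * k + 1: 4 * k + 5]
        bp = p1 & p2 & p3 & p4
        bg = g4 | (p4 & g3) | (p4 & p3 & g2) | (p4 & p3 & p2 & g1)
        ies = (p1, p2 ^ g1, p3, p4 ^ g3)
        ic23 = g2 | (g1 & p2)
        blocks.append((bp, bg, ies, ic23))

    # Event 4: estimated sums (eqns 4.13-4.16) and the carry into each block.
    # The block carry is written recursively here (ASSUMED form); a PLA would
    # implement the equivalent expanded sum of products.
    es_blocks, bc = [], [cin]
    for bp, bg, ies, ic23 in blocks:
        es_blocks.append((ies[0], ies[1], ic23 ^ ies[2], (ies[2] & ic23) ^ ies[3]))
        bc.append(bg | (bp & bc[-1]))
    cout = bc[4]

    # Event 5: final sum bits (eqns 4.17-4.20).
    total = 0
    for k, (e1, e2, e3, e4) in enumerate(es_blocks):
        c = bc[k]
        s = (c ^ e1, (c & e1) ^ e2, (c & e1 & e2) ^ e3, (c & e1 & e2 & e3) ^ e4)
        for j, bit in enumerate(s):
            total |= bit << (4 * k + j)
    return total, cout
```

Comparing `add16_by_events(a, b, cin)` against ordinary integer addition over many operands is a quick way to confirm that the estimated-sum correction of equations 4.17 through 4.20 reproduces the true sum.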
The resulting final layout of Figure 4.8 shows the actual on-chip layout locations of each event's logic. In addition to the logic circuits for each event, the circuits AMP and AMP5 are also seen. These are driver circuits for the high fanout control and clock signals. Each takes as its input a control signal and produces as outputs the control signal and its inverse, both driven by 3-micron x 160-micron transistors. This amplifier is the same design used by the output pads to drive off chip loads.

This final layout represents one implementation of a pipelined CLA adder designed for testability. The relative merits of this design and others that may have been implemented can, as yet, only be qualitatively discussed. The addition of SPICE 2G7 to the CAD toolbag will provide future CMOS designers with the quantitative analysis necessary to make decisions involving tradeoffs among primary design objectives.

Figure 4.8 Final Layout.

This final design, when simulated using RNL, functioned properly at clock speeds up to 14 megahertz. Testing of the actual chips produced by MOSIS should give an indication of the accuracy of RNL's predictions. The following chapter presents a test plan to check for proper operation of the adder at low clock rates and to determine the maximum operating speed.

V. TEST PLAN

After several iterations of the design-simulate-redesign loop, a final layout was achieved for the 16-bit pipelined adder. These iterations provide considerable confidence in the logical correctness of the layout. Appendix D contains RNL simulation results for the full adder. In reading these results it should be kept in mind that the adder requires three cycles of the two-phase clock to produce the sum. In the first part of the simulation, the inputs were kept constant for three clock cycles to facilitate easier reading of the results. With these steady inputs, simulations were
run to verify the generation of correct sums, concentrating on those addends that would produce carry generates and carry propagates across the boundaries of the 4-bit blocks. The last part of the simulation utilized different inputs each clock cycle. This was done to test the pipelining feature of the design, insuring no dependence on repeated inputs of the addends to produce the proper sum. After fabrication of the chip, application of similar inputs to make the same determinations for the actual circuits will form the initial portion of the test plan. In this chapter a test plan for the verification of computational correctness and speed will be presented.

A. INPUTS AND OUTPUTS

The first step in testing the chip will be to connect it to the required input and output circuitry. To accomplish this, the identity of the inputs and outputs on each pin must be determined. Microscopic examination of the chip will reveal the logo "16-bit Add", located between the GND and Vdd buses for the pads in the northeast corner (see Figure 4.8 which is repeated below for convenience). Using this landmark, the signals on the pads can be labeled as follows.

Figure 4.8 (repeated) Final Layout.

The western edge has sixteen input pads for the addend A, with the least significant bit, A(1), located at the northern end. The northern edge of the chip also has sixteen input pads for the addend B, with the least significant bit, B(1), located at the eastern end. The southern edge has fourteen output pads and two input pads. At its western end is the GND input pad followed by fourteen output pads for S(16), the most significant bit of the sum, through S(3). Following S(3) at the eastern end is the input pad for Vdd. The eastern edge of the chip has eight input pads and eight output pads. Starting at the northern end, there are input pads for phi1, phi2, Cin, CON1 (control signal for the dual mode register of event 1), CON2, CON3, CON4, and CON5.
They are followed by output pads for SREG1 (serial output from dual mode register of event 1), SREG2, SREG3, SREG4, SREG5, Cout, S(2), and S(1) at the southern end.

To supply power to the chip, +5 volts DC should be applied to the Vdd pad and 0 volts to the GND pad. All logical inputs including clocks and control signals should be either GND for a logical 0 or Vdd for a logical 1. Simulation with RNL revealed some restrictions on the clock. For proper operation, each clock should remain high for a minimum of 20 nanoseconds and the clock interphase time, when both phi1 and phi2 are low, must be at least 10 nanoseconds in duration. For initial testing, to insure that charge sharing problems caused by too short an interphase time, and fanout problems caused by too short a clock phase duration, are not interpreted as fabrication errors, the clock speed should be adjusted so that both above clock parameters are exceeded by one order of magnitude.

The outputs, like the inputs, are at Vdd to represent a logical 1 and at GND to represent a logical 0. The circuits used to measure the outputs should have a high input impedance, on the order of one megohm. The output pads of the adder are not designed to handle the current source and sink requirements of transistor-transistor logic integrated circuits. The output measurement circuits should be constructed using NMOS or CMOS devices that are designed to operate between +5 volts DC and ground.

B. TESTING FOR CORRECT OPERATION

After connecting the adder to a test harness, the next step is to verify the generation of correct sums by the adder. There are several inputs that should be included in the testing to verify the correct operation of individual circuits. These are contained in Appendix F. In addition to the test vectors of Appendix F, several randomly selected input vectors should be tested.
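Randomly selected vectors are only useful if the expected sum and carry-out accompany each one. A minimal sketch of such a generator follows; the name `random_vectors`, the fixed seed, and the most-significant-bit-first print format are illustrative assumptions and are not taken from Appendix F.

```python
import random

def random_vectors(count, seed=0):
    """Yield (A, B, Cin, expected_sum, expected_cout) for a 16-bit adder.

    A fixed seed (an arbitrary choice here) makes the vector set repeatable,
    so a failing vector can be applied again.
    """
    rng = random.Random(seed)
    for _ in range(count):
        a = rng.getrandbits(16)
        b = rng.getrandbits(16)
        cin = rng.getrandbits(1)
        total = a + b + cin
        yield a, b, cin, total & 0xFFFF, total >> 16

# Print a few vectors, most significant bit first (assumed convention).
for a, b, cin, s, cout in random_vectors(4):
    print(f"A={a:016b} B={b:016b} Cin={cin} -> S={s:016b} Cout={cout}")
```

Because the adder is pipelined, the expected sum for a given vector would be compared with the outputs observed three clock cycles after that vector is applied.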
If the adder should fail to generate correct sums, the LSSD features can be employed to examine intermediate results.

1. Intermediate results

With the LSSD design, a tester can leave input levels constant for a long period of time and use the shift mode of the internal registers to examine the internal state of the chip. The rightmost bit of each register is always available at the output pad for that register. To obtain the contents of the other bits, the control signal for the given register is set to logical 1 and held high while the clock continues to run. For registers 1, 3, and 5 the serial output will be meaningful and stable while phi2 is high. The serial output of registers 2 and 4 will be stable when phi1 is high. Table 3 lists in order the intermediate values available at the SREG(n) output pad when the input CONn is high.

TABLE 3 Register Serial Outputs

Clock Cycle SREG1 SREG2 SREG3 SREG4 SREG5 B1 P1 B2 2 3 4 5 6 B3 B4 B5 B6 B7 B8 B9 BP1 IES3 IES4 BG2 IES5 IES6 IC67 BP3 IES11 IES12 BG4 IES13 IES14 IC1415 BG1 IES1 IES2 IC23 BP2 IES7 IES8 BG3 IES9 IES10 IC1011 BP4 IES15 IES16 Cin Cin BC2 Cout ES2 ES4 S.1 1 P2 P3 P4 P5 P6 P7 8 9 B10 10 B1 11 B12 313 314 315 B16 P8 P9 P10 P12 P12 P13 P14 P15 P16 A1 G1 A2 A3 A4 A5 A6 A7 A8 A9 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 G15 G16 Cin 7 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 A10 A11 A12 A13 A14 A15 A16 31 Cin 32 S3 S5 S7 S9 ES6 ES8 S11 S13 S15 ES10 ES12 ES14 ES16 Cout S2 S4 S6 S8 S10 S12 S14 S16 BC1 BC3 ES1 ES3 ES5 ES7 ES9 ES11 ES13 ES15 33 34

C. TESTING FOR SPEED OF OPERATION

Once the chips containing fabrication errors have been culled from the chip set returned by MOSIS, the remaining task is to determine just how fast the adder can run. Rather than simply increasing the clock rate until the adder fails, the duration of the time phi1 and phi2 are each high, and the interphase time should be reduced separately. RNL simulation indicates that
the circuit which generates S4 within PLA104 is the limiting circuit for clock phase duration (i.e. it requires the longest time to correctly evaluate its inputs). RNL simulation also indicates that the circuits in PLA104 which generate S1 and S4 are the limiting circuits for the clock interphase duration.

Since the PLA is constructed of precharged dynamic circuits, the evaluation clock phase must be long enough to allow the inputs to drive the outputs to their proper values, even if the inputs are the same as those of the previous evaluation cycle. This allows the tester to use a constant input as the duration of each clock phase is reduced until the adder produces incorrect results.

Determination of the clock interphase duration limit is more difficult. This is because the inputs to a PLA must be changing to cause charge sharing problems to occur.

Figure 5.1 Charge Sharing in a PLA.

For example, in Figure 5.1 assume that in1=1, in2=0, and that this is the first set of inputs correctly evaluated to produce out=0 when phi1 is high. Now assume that the next input is in1=0 and in2=1, which should also evaluate to out=0. However, if the precharge time (when the inputs are present on the gates of Q2 and Q3 and phi1 is still low) is insufficient, C2 will not be charged to Vdd when precharging ends (C2 was discharged to zero volts during the previous evaluation when in1 was high and phi1 was high). Now, when evaluation begins (phi1 going high) the low voltage across C2 causes Q5 and Q6 to interpret their input as a logical 0. As a result the output of the Q5-Q6 inverter pair goes high, causing Q8 to turn on, discharging C4 and resulting in an output of logical 1, which is incorrect. Table 4 lists the proper evaluation sequence when precharge time is sufficient and the improper sequence due to insufficient precharge time.
In this table, for the inputs, output, and capacitor voltages a 1 indicates Vdd, a 0 indicates GND, and X indicates somewhere in between. For the transistors, 1 indicates on, a 0 indicates off, and an X indicates neither fully on nor fully off.

TABLE 4 PLA Evaluation Sequences

Proper evaluation sequence: phi in C Q 2 1 1 12 1234 10 0011 10 01 01 00 11 01 1 0011 0111 0111 1 '234 T; out 1 1000 10 C0 010101 1C01 010101 1C01 001101 1C01 1010010C01

Improper evaluation sequence: phi in C 1 Q 1 2 12 1234 1234567890 10 1 10 1 1 01 01 01 0011 0011 0011 0X11 oxxo 1100010C01 010101 1C01 010101 1C01 0011011C01 1010XX0X10 1

Subsequent inputs of in1=0 and in2=1 may produce correct results since with constant inputs, each precharge time will add more charge to C2 until there is sufficient charge to allow the output of the Q5-Q6 inverter to remain low. Thus, to check for charge sharing problems in the circuit of Figure 5.1, the inputs must alternate.

Likewise, in PLA104 to check for charge sharing errors in output S1, its inputs must alternate between BC=0, ES1=0 and BC=1, ES1=1 as the interphase time is reduced. This can be accomplished for all four instances of PLA104 simultaneously by alternating inputs of

A = 0001 1001 1001 1001
B = 0000 1000 1000 1000
Cin = 1

and

A = 0000 0000 0000 0000
B = 0000 0000 0000 0000
Cin = 0

To check for charge sharing errors in S4, the inputs to PLA104 must cycle between BC=1, ES4=0, ES3=ES2=1, ES1=0 and BC=0, ES4=0, ES3=ES2=ES1=1. This may be accomplished for all four instances of PLA104 simultaneously by alternating inputs of

A = 0110 1110 1110 1110
B = 0000 1000 1000 1000
Cin = 1

and

A = 0111 0111 0111 0111
B = 0000 0000 0000 0000
Cin = 0

This maximum speed testing assumes that RNL has correctly identified the slowest circuits on the chip. RNL simulations have indicated that the next slowest circuit (PLA915) is at least 20% faster than PLA104 (16.0 nsec for PLA915 vs. 20.1 nsec for PLA104).
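As a cross-check on the alternating test inputs listed above, the block carry and estimated sum presented to each instance of PLA104 can be computed directly from the addends. The sketch below is illustrative only: `block_inputs` is a hypothetical helper, the vectors are assumed to be written most significant bit first, the second vector of each pair is assumed to use Cin = 0, and the estimated sum is simply the block sum with a carry-in of 0, as defined in the design chapter.

```python
def block_inputs(a_bits, b_bits, cin):
    """Return [(BC, (ES1, ES2, ES3, ES4)), ...] for blocks 1 (least significant) to 4.

    a_bits and b_bits are strings written most significant bit first
    (ASSUMED convention), e.g. "0001 1001 1001 1001".
    """
    a = int(a_bits.replace(" ", ""), 2)
    b = int(b_bits.replace(" ", ""), 2)
    result, carry = [], cin
    for k in range(4):
        an = (a >> (4 * k)) & 0xF
        bn = (b >> (4 * k)) & 0xF
        est = (an + bn) & 0xF                  # block sum assuming a carry-in of 0
        es = tuple((est >> j) & 1 for j in range(4))
        result.append((carry, es))             # carry is the actual carry into this block
        carry = (an + bn + carry) >> 4         # actual carry into the next block
    return result

s1_pair = [("0001 1001 1001 1001", "0000 1000 1000 1000", 1),
           ("0000 0000 0000 0000", "0000 0000 0000 0000", 0)]
s4_pair = [("0110 1110 1110 1110", "0000 1000 1000 1000", 1),
           ("0111 0111 0111 0111", "0000 0000 0000 0000", 0)]

for a, b, cin in s1_pair + s4_pair:
    print(a, "|", b, "|", cin, "->", block_inputs(a, b, cin))
```

Under these assumptions every block sees the same (BC, ES) combination within a given vector, which is what allows all four instances of PLA104 to be exercised simultaneously.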
All other PLA's also functioned properly with a 5 nsec interphase time. Should PLA104 prove to be the speed limiting circuit for the chip, the actual failure speeds of the chip can serve as an indication of the accuracy of the RNL simulation for future designs.

VI. CONCLUSIONS

The experience gained in the design of the adder coupled with the clarity of hindsight leads to the following conclusions and recommendations.

A. THE CMOS TECHNOLOGIES

The CMOS technologies will play a role of steadily increasing importance in the VLSI designs of the future. MOSIS is already offering, on an experimental basis, CMOS Bulk p-well fabrication with a one-micron minimum feature size. A scalable set of design rules, to allow initial fabrication in 3-micron CMOS for design verification before the far more expensive 1-micron process is used, is being developed. In the private sector there is considerable research aimed at finding an insulating substrate material that does not have the variability and thermal problems of sapphire. Progress in this area will remove the drawback caused by latchup tendencies in CMOS Bulk.

B. CMOS CAD TOOLS

Though the design tools currently available at NPS constitute a complete set for the design of CMOS Bulk p-well circuits, the recent CAD tool set released by the University of Washington/Northwest VLSI Consortium, Release 2.0 [Ref. 11], coupled with University of California at Berkeley Winter 1983 CAD tools, represents a more complete and cohesive set for CMOS design. When sufficient disk space on the Vax 11-780 becomes available to load Release 2.0, implementation of the Release 2.0 CAD package is highly recommended. An added benefit of installing the Release 2.0 package is the cell library provided. The library contains several basic standard cells with known performance characteristics. The library also contains the standard pad frames used by MOSIS. Though MOSIS does not
require the use of standard pad frames on designs submitted, their use does speed up fabrication. As mentioned earlier, as soon as SPICE 2G7 is available, its addition to the CAD toolbag would be most advantageous to a CMOS designer.

C. DESIGN OF THE ADDER

If the design of the adder were to be undertaken again, a different approach to generating the sum would probably be used, especially if the new CAD tools mentioned above were available. The logic approach to the computation would still involve CLA addition, but it would be accomplished using combinational logic and library cells rather than PLA's. Testability would probably suffer greatly, but effort would be made to reduce the sum generation to two logical events. Though the level of testability provided by the current design should provide considerable insight into CMOS Bulk p-well performance and CAD tool accuracy, there would be no need to repeat the investigation.

APPENDIX A

SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES

CMO* models for MOSIS 3-micron CMOS Bulk p-well devices: Fast Models .model n +xj=1.1e-6 .model p + xj=1.1e-6 nmos gamma=.3 pmos uo=500 vto=-.4 gamma=.3 lambda=1e-7 tox=0-7e-7 vto=0.4 cbs=5e-4 cbd=5e-4 Iambda=1e-7 tox=0. 7e-7 ld=1e-6 ld=1e-6 cbd=3.5e-4 cbs=3.5e-4 uo=300 Slow Models .model n + xj = 0.6e-6 .model p xj=0.6e-6 nmos lambda=1e-7 tox=Q.8e-7 vto=1.0 gamma=1.3 uo=400 gamma=-9 cbs=6e-4 cbd=6e-4 vto=-1.0 tox=0.8e-7 pmos lambda=1e-7 ld=.5e-6 cbs=4.1e-4 cbd=4.1e-4 uo=200 ld=.5e-6 MIT Models for MOSIS 3-micron CMOS Bulk p-well devices: Slow - Slow .model nss nmos +xj=.35e-6 cjsw=4e-1C cj=6e-4 +cgso= 1.3e-10 cgdo=1.3e-10 + vmax=5e4 pb=.7 +neff=2.5 ucrit=8e4 .model pss pmos +xj=.35e-6 rsh=20 level=2 mj=.5 tox=650e-10 wo=475 mjsw=. 5 uexp=.25 rsh=80 level=2 cgdo=1.3e-10 +vmax=5e4 pb=.7 +neff=2-5 ucrit=8e4 vto=1.2 nsub=1.5e16 tox=650e-10 cj=4.1e-4 cjsw=2.5e-10 +cgso= 1.3e-10 ld=.25e-6 mj=.5 uo=190 nsub=5e15 mjsw=.5 aexp=.
74 15 ld=.25e-6 vto=-1.2 tpg=-1 1 Slov n-type Fast p-type rsh=30 level=2 .model nfs nmos tox=600e-10 cj=6.0e-4 cjsw=4. Oe-10 uo=475 +cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16 +xj=.35e-6 vmax=5e4 pb=.7 +neff=2.5 ucrit=8e4 vto=1.2 mjsw=.5 mj = .5 uexp=. 25 rsh=20 level=2 .model pfs pmos ld=.25e-6 tox=600e-10 ld=.40e-6 xj=.60e-6 cj=2.0e-4 cjsw=1-0€-10 uo=270 vto=-0.6 + cgso=2.0e-10 cgdo=2.0e-10 nsub=0.3e15 tpg=+vmax=5e4 pb=.7 +neff=2.0 ucrit=8e4 mjsw=. m j=. 5 uexp=. 5 15 Fast n-type Past p-type rsh=10 level=2 .model Lff nmos tox=550e-10 cj=3.0e-4 cjsw=2. Oe-10 +xj=.60e-6 +cgso=2.5e-10 cgdo=2.5e-10 vmax=5e4 pb=.7 +nef f=2.5 ucrit=8e4 .model pff pmos mj=.5 uo=675 ld=.40e-6 vto=0-6 nsub=0.5e16 mjsw=. 5 uexp=. 25 rsh=20 level=2 tox=550e-10 +xj=.60e-6 ld=.40e-6 cj=2.0e-4 cjsv=1.0€-10 uo=270 vto=-0.6 +cgso=2.5e-10 cgdo=2.5e-10 nsub=0.3e15 tpg=-1 vmax=5e4 +neff=2.0 pb=.7 ' mj=.5 ucrit=8e4 mjsw=.5 uexp=. 75 15 5 Fast n-type Slow p-type .model nsf naos xj=.60e-6 cgdo=2.0e-10 +vmax=5e4 ph=-7 +neff=2.5 ucrit=8e4 .model psf pmos tox=600e-10 uexp=.25 rsh=80 cgdo=1.2e-10 mj=.5 ucrit=8e4 vto=0.6 mjsy=.5 aij=.5 level=2 pb=.7 uo=675 ld=.40e-6 D=ub=0.5e16 cj=4. 1e-4 cjsw=2.5e-10 +cgso=1.2e-10 vmax=5e4 neff=2.0 10 cj=3.0a-4 cjsw=2.0€-10 +cgso=2.0e-10 +xj=-35e-6 rsh= level=2 tox=600e-10 uo=190 nsub=5.0e15 rajsw=. uexp=. 76 15 ld=..25-6 vto=-1.2 tpg=-1 o L APPENDIX B DNII MAHUA1 ENTET FOB EOLEC RULEC (CAD) CAD Toolbox User's Manual RULEC (CAD) NAME — Compile design rulec rules for Lyra SYNOPSIS rulec [—lo] rules DESCRIPTION Rulec i) . is a shell script with the following processing steps: The actual Lyra rule compiler is invoked to translate the symbolic rule description, rules. r, to lisp code, rules. ii) iii) iv) The compiler, Liszt, lisp rules. o rules. is is invoked to compile rules.l to -rules. loaded into Lyra.proto to generate an executable The intermediate files lisp Lyra, rulesX and rules. a are deleted. The following options are supported: —1 (load 011I7) No compilation is done. 
Previously compiled rules, rules.o, are loaded into Lyra.proto to generate an executable Lyra, rules. This option is useful mainly at Berkeley, where Lyra.proto changes frequently.

-o (save object) Name.o is not removed. Enables "rulec -l rules" in the future.

FILES
~cad/bin/rulec - rulec shell script.
~cad/lib/lyra/Rulec.l - lisp rule compiler.
~cad/lib/lyra/Lyra.proto - Lyra sans compiled rules code.
~cad/lib/lyra/*.r - standard rulesets.
~cad/lib/lyra/DEFAULTS - gives default rulesets for Caesar technologies.

SEE ALSO
Lyra (CAD), Liszt (1)

AUTHOR
Michael Arnold.

APPENDIX C

PRESIM USER'S GUIDE

Config file used to calibrate RNL:

capm2a .00000
capm2p .00000
capma .00006
capmp .00000
cappa .00006
cappp .00000
capda .00010
capdp .00060
cappda .00010
cappdp .00060
capga .00057
lambda 1.0

PRESIM User's Guide

UW/NW VLSI Consortium
Department of Computer Science
University of Washington
Seattle, WA 98195

(This document is based on portions of the document "User's Guide to NET, PRESIM and RNL/NL," by Christopher J. Terman, Laboratory for Computer Science, M.I.T., Cambridge, MA 02139.)

One must first convert the sim file to a network file suitable for use by RNL or NL - to do this we run PRESIM:

presim foo.sim foo [config] options...

which converts the file foo.sim into a binary file for RNL/NL called foo.

The -f option: Suppresses the sum-of-products formation. This may be desired if you think sum-of-products is formed wrong; otherwise the advantages of the transistor and node reduction make this option unattractive.

The -c option: -cfile,minvalue writes a list of node names and capacitances to the specified file. Only capacitances larger than minvalue will be included.

The -t option: -tfile,minvalue writes a list of transistors and RC values to the specified file - there are two entries for each transistor. The R's come from the size of the transistor, C values from the source/drain capacitance. Only RC values larger than minvalue will be included.
The -p option: -presist,voltage provides a worst-case estimate of the circuit power consumption by assuming that all the pullups (DEP or LOWP devices with drain-VDD) are all on simultaneously. "Voltage" specifies the supply voltage, for example "-pi" specifies a VDD of 5 volts. The result is printed after PRESIM completes its other processing. When figuring the resistance of a pullup device the "power" characteristic resistance as set in the config file is used.

The optional third file (config) specifies various electrical parameters. The internal values (the defaults) are a generic set. They do not reflect any particular fabrication process. (UW/NW VLSI NOTE: A configuration file is provided in the source code that duplicates the internal settings as an example of how this file could be used. In addition we note that the resistor values are stored first sorted by width, then by length, not by the ratio. Values not explicitly provided in the configuration file are estimated by linear interpolation.) The format of this file is lines of the form

parameter value comments...

Lines beginning with ';' are treated as all comment.
The parameter names and their default values are:

; configuration file for "standard" process
capm2a .00000 ; 2nd metal capacitance - area, pf/sq-micron
capm2p .00000 ; 2nd metal capacitance - perimeter, pf/micron
capma .00003 ; 1st metal capacitance - area, pf/sq-micron
capmp .00000 ; 1st metal capacitance - perimeter, pf/micron
cappa .00004 ; poly capacitance - area, pf/sq-micron
cappp .00000 ; poly capacitance - perimeter, pf/micron
capda .00010 ; n-diffusion capacitance - area, pf/sq-micron
capdp .00060 ; n-diffusion capacitance - perimeter, pf/micron
cappda .00010 ; p-diffusion capacitance - area, pf/sq-micron
cappdp .00060 ; p-diffusion capacitance - perimeter, pf/micron
capga .00040 ; gate capacitance - area, pf/sq-micron
lambda 2.5 ; microns/lambda (conversion from .sim file units to units used in cap parameters)
lowthresh 0.3 ; logic low threshold as a normalized voltage
highthresh 0.8 ; logic high threshold as a normalized voltage
cntpullup ; <>0 means that the capacitor formed by gate of pullup should be included in capacitance of output node
diffperim ; <>0 means do not include diffusion perimeters that border on transistor gates when figuring sidewall capacitance (*)
subparea ; <>0 means that poly over transistor region will not be counted as part of the poly-bulk capacitor (*)
diffext ; diffusion extension for each transistor, i.e., each transistor is assumed to have a rectangular source and drain diffusion extending diffext units wide and transistor-width units high. The effect of the diffusion extension is to add some capacitance to the source and drain node of each transistor - useful when processing the output of NET to improve the capacitive loading approximations without adding explicit load capacitors. diffext is specified in lambda (it will be converted using the lambda factor above).
resistance channel context width length resist

This command specifies the equivalent resistance for a transistor of type channel with the specified width and length. Transistors matching this entry will have the specified resistance; linear interpolation is done if the width and/or length is not matched exactly.

channel is one of "enh", "dep", "intrinsic", "low-power", "pullup", or "p-chan"
context is one of "static", "dynamic-high", "dynamic-low", or "power"
width is given in lambda
length is given in lambda
resist is given in ohms

(*) These parameters should be 1 only when processing the output of the node extractor. They cause various corrections to be made to the interconnect component of a node's capacitance - usually only extracted sim files have information regarding interconnect capacitance.

PRESIM uses these parameters in calculating the capacitance for each electrical node and the resistance for each transistor channel.

APPENDIX D

ADDER SIMULATION

The following two listings are (1) the RNL command file for the entire chip and (2) the results of running that command file. In addition to this overall testing, all the layouts of Appendix E were simulated individually. A nice feature of RNL is the indication of when a watched node changes state. Thus, by making all the outputs of a circuit watched nodes, RNL will provide the minimum time duration for a clock cycle to produce the outputs (the longest time indicated by the simulation). This can be confirmed by running the simulation with a faster clock, resulting in outputs of X (neither 1 nor 0) where insufficient time has been allowed.

RNL simulation to determine the minimum time for precharging the PLA circuits is only slightly more involved. For each product term in the PLA, alternating inputs are selected that will result in the maximum amount of N+ diffusion needing to be charged from 0 volts to Vdd.
Then, for each product term in the PLA, as these inputs are alternated the PLA precharge time is reduced until the circuit fails to produce correct results. For the PLAs in the adder, visual inspection for the product term with the longest precharge requirement was done by looking for the longest N+ diffusion line which must be charged through the maximum number of transistors. The visual inspection results were confirmed by RNL simulations.

) acr 2 13:59 <- 4 ) cr IV*** i r.- . c rr r< ) Face 4 1 1 Cloo -ill e "ch It . loc " Cloa d "u wsti;. 1") Cloa d "u WSiR . 1") (rea o-re t w o r * "crir " ) 65 ae a 7 aR a9 alC all al2 al3 (set c no aes ' (al s2 8? a 3 c 4 b S be b 7 c f b S o l bl2 bl3 b b 1 al4 al5 al o c 1 tv s 9 S 1 s7 s S 1 1 S 1 2 Sl3 S 1 4 S 1 5 s tl6 si s 2 S3 S4 S5 S cir cout rhll oni2 conl con2 con3 con4 crn5)) (cnf laa noo«S a7 a 3 a« a^ a6 a7 ao a c a 10 all al2 dU alt al5 al6 1 al t2 d3 n« b5 be b 7 r'e D-j bid bl 1 bl2 bl3 si 4 bl5 b!6 1 bl c on2 c or-,? co n4 con5 1 CO nl 1 'i 1 r> <• li bi5 1 1 6 > 1 cl n r-^ (Hef vec Cdef vec 1 11 DJ: 12 '(Mr ClCCK S rhll Ch i 2 alb al5 ) ) aW al3 al2 all all) as* aia7 a6 a5 a- a3 a2 al)) (defvec '(fcir Dbbfc n 6 b 1 5 n 1 bl3 bl2 bl fclO b 9 PS o 7 be b5 bi b3 t2 nil) (detvec '(fclr sur c out S16 sib sH S13 sl2 si S10 sfsis* s3 s^ s Sh s/ s7 sts5 s<< sb s2 si;; Si)) ' (def-renort ("stat clr. co vit ne*llne is now!" (vec c l o c < s vec b*a* ne* line (vec brbb) n e » 1 1 n e (vec sur) ) '(tin aaoa ) 1 1 ) ! ) ( ) ( de fun rr v ) Inc T ) ) ss(oi;rr (step (defun cvcles (al (repeat 1 1 a (setc incr 1001 (ss ' (x)) seto lncr Ch '( phil )) ( (ss ' (setc (1 (ss '( (x J incr onl J ) 1 ) ) ' (seto incr (n is (l 2bi- '( '( '( cM?] 25(.) ) x)) obl2) ) ) ) (cycles 5) (lnvec '(aaaa (lnvec '(btbt (cycles 3) (lnvec '(bbbr (cycles 3) h cln (cycles 3) 1 cln ' (.aaa? ( lnvec (lnvec '(btbt (cycles 3) n cln (cycles 31 Ub (• ou b 1 1 1 1 1 1 1 1 1 1 '"> 1 p1 C 1 1 1 Hi l) ) ) ) 1 ) ) 10 (J OMllllMUlllllin) oo r. oo r.
oocououooo l- ] t Get 2° 13:^° cin Cinvec 'Ctbbt (cycles 3) (lnvec '(&rrc (cycles l) (lnvec '(assa (cycles (lnvec 'CHbtt (cycles 1) (lnvec '(aaaa (cycles (lnvec 'fbbee (cycles ) 19*<< eric.CiTc Face 2 1 J h ObC'OOCOOC'OG'.iOOoOO I Or<t"OUOt'OOOOQOOOflO) QbtffiOOUOGGOOCGOOOO) ! i ) l ) uel 11 1 ill 11 1111111) I'^f'OOClOO'iOOuuOO^O) Oell 1101 1 1 1 111 1111) cir (cycles l) (lnvec '(?33S 0b(iQOO0P0OO<J0anO00) (lnvec ' ( d b t OfcGOOOOCOOOGAOOOUG) (cycles 1 1 cin (cycles 4) ) 84 . Dec 15:23 6 cMp.loa Paoe 1984 1 Loe^lna uwslir.l Done loadino uwsl-n.l 3086 nodes, transistors: enn=l494 intrinslc=0 p-chan=il4l dep=n lo»-oower=0 pullupsO reslstc ; Ste phi phi cin cor con con con con hl6 bib C14 bl3 bl2 hll blO n9 = oP = beoins = = a a = = o e a = bC = :0 :0 :0 :C :0 a rO 8 :C a ? a b7 = 06 = b5 = a t4s a p b3 = h2 = e c bis al6 al5 a!4 al3 al2 a P a :0 :0 :0 :0 !C all :0 alO :0 a9 = a = P a7 = a6 = e5 = a4 = a3 = a2 = a al = a a<3 ns e a B P a a a Ster beoins phllrl a o a in ns. Step nealns phllso a a 35 ns, beoins a A* ns. pnl2=l a sl6=0 a 14.2 s°=n n 16.4 Ster- 85 , nee sll sl3 sl5 s7s S5 = 53 = S14 S12 » 15.4 16,4 a ifi.4 e chic. log Paoe 2 16.4 a 16.4 a 16.4 a 16.5 a 16.5 P 16.5 a 16.5 a 16.5 P 16.5 6 16.7 6 SlO s« = s6 = 5 4= S? = Sis ste Cur clo aaa bbb sum 198« 15:7"? 6 a e 20 is now: ent times 70 *ss0b01 cln = coutsx sObOoooooooooncnooo sObOOCOOOOOOOOOOOOO o X Step beoins phi2s0 9 a 70 ns. Step becins ohilsi o a so ns. Step beoins rhllsO a o e 105 ns. s- Ster beoins a i j 5 ns phl2=1 8 cout=0 a 72.9 state is now; Current rimes 140 cloc<s=0b01 cln=0 coutsO aaaasObOOCOOOOOOOOOOOOO bthh=0t0O0O00000000O0O0 sumsObO^OOOOOOOOOOOOOOO 4n Sten beoins chi2=0 p a 1 Ster becins ohilsi a a 150 ns, Step becins ohilso a ? 
175 ns, ns , Ster begins a 185 ns, ohl2si a state is now: Current times 210 clockssOtoi clnsc coutso aaaasOtOOOOOOOOOOOOOf'00 bhbbsObOOOOOOOOOOOOOOOO 86 Dec 19R4 15:23 6 cnlp.loo Pace 3 SUm=0b0COOOO00O00O0OO00 Step beqlns phi2=0 a Step becins phil=i a n Ster becins Dhll=0 a n «• 210 ns. a 220 ns. 9 2*5 ns. Step becins a 25? ns. ohi2=l P state Is now: Current tirres 280 clocKs=0b01 cln=o cout=n aaaa=0b0 0oo0O0O0O0O00OO bbbbsObOOOOOCOOOOOOOooo Sum=Ot0OOO00OO0COC0000C Ster beains onl2=0 B * 2«0 ns, Stec becins phllr] a 290 ns, Step bedns chil=0 a n a 315 ns. t> Step becins a 3?5 ns, ohi?=l a o state Is no«»: Current tin>p* 350 clocks=0b01 cln=C cout=0 aaea=ObOOOOOCOOOOnooOOO bbbb=0b00000000C0000C00 gumsObO 00 000 00 00000000 Step beains bl6=l a bl5=J a c bl4=l a bl3=l e bR=l a b7 = l a b6=l a b5=l a a o a o 10=1 a 350 ns o a o a o a o a o a o a rhl2=0 i o al2=l all = l a9=l a4=l a3=l a2=l al=l ! o 87 P F Pec 6 15:23 1984 Step beolns onil=l a Sten beolns a o pnii = chip. loo Pace 6 350 ns. P 395 ns. 4 Step bealns a 3^5 ns. phl?=l » state Is now: Current timer 420 clocl<s = 0b01 cln = cout=0 aaas=nbOonoiiiiooooilil bbbhxObl 11100001 1 1 10000 SUirrObOOOOOOCPOOOOOOOOO Step beolns oni?=o a o a ^20 ns. Ster bealns phll=l a n a 430 ns. Step bealns e 455 ns. DhH=0 a o 5tec beclns a 465 ns. obl2=l a o state Is now: Current tlm*= 490 cloc*s=0fc0l cln=o cout=0 aafla = Cb0000llHOOO0iiii bbbbactlinooooillioooo SUn- = Ot00OO00C00OOOOOOO0 Ster beolns pni2=0 a o a 49C ns. Ster beolns phll=i a e 5^0 ns, Step bealns phll=P e o £ 525 ns. Step beolns P 535 ns. phl2=l e n slbsl e 14,6 s9=l a 16.7 sll=l a I6.7 sl3=l b 16.7 Sl5=l a 16.7 S7=l B 16.7 s5=l 16.7 s3=l e 16.7 sl4=l P 16.8 812=1 e 16. SlO=l e 16. fi 88 6 S8 = l a 16 ." s6=l S4=l s2=i e lfc .8 p .<? a 16 17 Sl = 3 19, .1 l 1994 15:23 oec chic.loa Paoe 5 state Is now Current ti""e = 560 clocKs = 0b01 c 1 n = o c o t = aaaa=0t00001 HOOOOi i n bbDhrCbl 11 100001 11 innoo sumsOfcOHlll 11 111 1 ill i; 1 ? Step tealns b9=l e l ? 
560 ns Ster beclns Dhil=l p a 570 ns. SteD beolns nhil=0 a ? 59? ns. bl = l e bl6 = C tl5 = C bl4 = bl3 = a o a o a o o a e« = 9 b7r0 a b6 = n C5 = C a C rhi? Stec beolns a 605 ns, nhl?=l e state Is now: Current tiire* 630 clocks=Cb01 c 1 p = n c u t = aaaa=ObOOOC11110000llll bbbbsObOOOOOOOlOO'iOOOOi SumsObOl 1 11 1 1 1 1 1 1 1 1 11 11 Step beains nni2=0 a a 630 ns. Step beolns phll=l a a 6^0 ns. Stec beclns ohil=0 a a 665 ns. Step beclns a 675 ns. phi2=i a state Is now: Current tin>e = 700 clocks=0b0l cin=o coutsO a«aa = OtCO0Oi 111 00001 in pbbb=ObOooooon 100000001 89 Pec 198* 15:23 6 cnlc.loe Paae 6 sum = ot«oi 111111111111111 Step beolns Dhi2=0 a n a 700 ns. Stec bealns Dhllsi e c s 710 ns. Stec beoins ? 735 ns. Dhil=r> a n Step beolns e 745 ns. nnl2=l e n sl6=0 e 14.2 s9srt e 16.4 sll=0 e 16.4 Sl5=0 a 16.4 S"J = ? 16.4 s3=0 P 16,4 Sl«=0 » 16.5 Sl2=0 g 16.5 Sl0=0 16.5 Sfl=n a 16.5 16,5 S6 = S4=0 s 16.5 S2=0 C lb.7 Sl=0 « 20 state Is now: Current times 770 clocXssObOl c n = cout=o aaaa = 0r0OOOHlJ00COllu bbtbsObOCOOOOOl 00000001 <? (? 1 SUfsObOOOOlOOOOOOOlOCOO Step becins cln=l a o Dhl?=0 a 6 770 ns. Step beolns phllsi e o a 7q 9 805 ns, Stec bealns phllso e o r. ns. Step bealns I 815 ns. Dhl2=l a o state Is now: Current times 84u cloctcssObOl cln = cout = aaaa = Ofc0nooilll0O(i0llll ] thbhsObOOOCOCOlOOOOiOOl SUmsObOOOOlOOOOO 010000 Step beolns chi2=0 a o a 840 ns, 90 : Dec ]9»4 55:23 6 cMc.loc Faoe Step beoins phll=l » P P50 Step beoins phll=0 e P 875 ns. 7 r.s. Ster beoins £ 895 ns. phl2=l a o state Is no*: Current tiroes 910 cout = clock-s = ObUl cln = aaea = 0b000ruillooocun l bbbb=0b00O0CG01O0O00O01 sum=Ob0000100P000030000 Step begins phi2 = a 9 Step beoins phi 1 = 9 i « 920 ns. Ster beoins pnll=0 P a 9 ns. 1 s> 1 <j 5 ns. Step beoins a 955 r>s. phi2=l e o sl=l a 19.3 state Is now Current tlme= 980 clocks=0b01 cln=l cout=0 aaaa=0b0000111100ncilll bbbb=0b00nn00010O000001 SUirsObOOOClOOOOO^OlOOOl Stec beclns a 9R0 ns. Ster beoins Dhll=i p o a 990 ns. 
Step beoins phll=0 a o P 1015 ns. Steo beoins ohl?=l a n e 1025 ns. a 1 6= a 1 al5=i a al4=i a al3 = l e a6=l a a7=l a a6=l a a5=l b<J = o a a c bl=0 a cln=C a ohl2=0 a o 91 P: Dec 6 19P4 15:23 chip.loq Faoe P state Is now: Current tlires 1^50 clccKssOt-Ol cln=o eout=0 *aea=0blllli l 1111111111 bbhbcotoooonoooooooooco SUmrObCOOOlOOCOOOOlOOOl Step beqlns phl2=0 a o 9 1050 ns, Stec beclns Dhl1=l a e 10*0 ns. Sten beolns phll=0 a 1085 ns. Sten becins a 1095 ns. phl2=l 9 state Is now Current times 112C clcctcs = Ob01 cln = couts" aaaasOfcll 1111111111 11 11 bbbb = Cc000 00OOOoo0000() (* SUmsObOCOOlOOOOOOOlOOOl Ster beolns Dhi2 = C a e 1120 ns, Ster bealns chllsl a a 1130 ns. Step beolns Dhll=0 a 9 1155 ns. r» Step bealns 9 11*5 ns. phi2=l » sl6=l e 14.6 s9=l e 16.7 e 16.7 s 1 1 = 1 Sl5=l f 16.7 s7=l a 16.7 s3=l P 1*.7 Sl4=l a 16. sl2 = l a 16. sl0=l a 16.8 «;8=1 a 16.8 s6cl a 16.8 s4=i a i*.s s2=l a 17 state Is now: Current tlrre = 1190 cloc<s=0b0l cln=0 cout=o aaea = Ohlllll 1 l'l 11 11111 1 bbbb=0b0o00000000000000 sui" = 0b01 111111111111111 92 H nee 6 15:23 1994 chip. loo F?oe Sten bealns cin=l B Dhl2=0 e o B 1190 ns. Ster bealns onil=l e B 1200 ns. fl 1225 ns. SteD bealns nnll=o a n ° St en bealns B 1235 ns, phi2=l b o state Is now: Current tlme = 1260 cloc* s=Oh01 cln=l cout=0 aaap = otllllllllllllll 11 bbbb=0bOOOo0000O000C000 sumsObOllllllllllUltll Stec beolns oni2=c e e 1260 ns. Ster bealns ohllsi e » 127" ns. SteD bealns B 12Q5 ns, pMl = o a o Ster beolns a 1305 ns. nhi2=l B o state Is now: Current tlme= 1330 clocKs=0b01 cln=l cout = aaae = 0fcl 111111 1111 111 bbbnsObOOOOOOOOOOOOOOno suf=0b01 111111111111111 Stec bealns nni2=0 B & 1330 ns. Ster berins onil=l ' B 1340 ns. fl 1365 ns. Stec bealns phll=0 s o Ster beolns B 1375 ns. ohl2=l P o SlftsO B 14.2 s9=0 B 16.4 Sll=0 B 16.4 Sl3=0 B 16.4 16.4 S15=0 S7r0 6 16.4 S5=0 o 16.4 S3=0 B 16.4 fl 93 Pec 15:23 b Sl4 = 9 b Sl2=0 610=0 a sflsrt a sfi=0 S4=0 « a s2 = o a 19%4 chic. 
leg Pane 1 <J 16.5 16.5 16.5 16.5 16.5 16.5 16.7 sl=0 9 20 cout=l a 21.1 state is no*: Current times 1400 clocKs=0b01 cln=l couts] aaeasotllll 1111111111!) hbbbsOcOOnooOOOOOOOOOOO suir = OM n oooonooccooooco Ster beclns hl=l a C cin=0 9 ohl2=0 a ic 14 ns Ster beolns pnii=i a n ° M10 ns, Step healns pnll=C a B 1^35 ns. . Ster bealns ? 1445 ns. phi2=l b state Is now: current times 1470 clocks=0b01 ein = coutsi aaaa=Orll 1111111111111) brbb=0bO0O000000000O001 sumsOfclOOnooOOOOG 0000^0 Stec bealns phl?=0 » 9 1470 ns. Step peclns pnll=l a * 1^90 ns. Step bealns phll=0 e £ 1505 ns. Step beolns P 1515 ns. phl2=l a state Is nowi Current time= 1540 cloclcssf'bOl cln = C cout=l aaaa = Obllllllllll 111111 bbbh=0bO00000O0OO000O01 SUirsOblOOOOOOOOOOOOOOOO Step bealns nhi?=0 a n a 1540 ns. 94 Dec 15:23 6 Step beains onilsl a o Step beains onil=0 e 19«fl chlcloc paae P. 1550 ns. a 1575 ns. 11 Step beolns 9 1585 ns. phi?=l e o state Is new: Current time? 1610 coutsl cloclcs = Ob01 cln = aaaa*Oblllll lllllllll 11 bbchsotooonnooooooooooi SUm=0blO0OO00000O0O0O00 Sten beolns * 1610 ns. Step beolns Dhll=l a o P 1620 ns, Step begins DHl1=0 6 a 1645 ns. bl = e ohi7=0 a Sten reains * 1655 ns. phl?=l a i state Is now: Current tirre = 1680 clocks=0b n l cin=o coutsl aaae=0bll 11111111111111 bbbbsObOOOOOOOOOOOOOOOO SUmsOblOOO 000000000000 Step beolns al6=0 a o al5=0 P al4=n e o al3=o p o el2 = C a all=0 e o aioro a o a<*=0 afl=0 a7 = a6=0 a5=0 a a a a 1690 ns, o o c a2=o e aleO a pni2=0 c 3=0 P o a a 16P0 ns. o o a a a* = 9 a o Ster beolns philxi a n 95 7a B Dec 19T4 15:23 6 Steo beolns Dhll=0 P P cMc.loo Faqe 12 1715 ns. Ster bealns P 1"?25 ns. chJ2=l * state Is no«: Current tlrre= 1750 cloocssOfOl cln = o coutal aaae = 0fc00C. 000000 000000 bbbb=0bO00O000C)oocoC00 SUn-sOfclOuOOOCOOOOOOOOOO Stec beolns f 1750 ns. Ster bealns rMll=l p a 1760 ns. Steo bealns onil=0 a o a 17H5 ns. <? 
17Q5 ns 6=1 a b15=1 bl4=l e bl 3=1 B bl2=l bllsl blOsi 8 b 1 b9 = l P p s9 = sll sl3 sl5 S7 = s5 = 53 = s!4 sl2 slO o P e b8=l s b7=l a h6=l e b5=l B ^4=1 ? b3=l a b2=t P hl=l P ohl2=o Ste chl sib C s o ^ecins = a 1 a 1 o I4.fi 16.7 a 16.7 a 16.7 a 16.7 a 16.7 a 1 1 1 a lb. a 16.7 16.8 16.8 1 P 1 P a 1 16. = a s« = 54 = s? = a 16.8 16.6 a 16. a a 19.1 sfi si* 17 96 1 :: Dec 6 1964 15:23 enip.loc Pace 13 cout = P 7.2,9 state i s now Current times l a 20 clockssObOl cln=P cout=0 aaaa = 0b00OOC0OOC0OO0<^0O bbbbaOfcllll] 111 1 11 1111 SUffsObOllllll 1111111113 Step beolns al? = l 8 Dni2=o e o P 182^ ns. Ster beolns nnii=i 9 o ? 1R30 ns. Ster beolns Phll=0 8 8 1*55 ns. Stec beains 9 1965 n s, Dhi?=l 8 Sl6rO E 14.2 8 16.4 s9 = SllsO 8 16.4 Sl3=0 B 16.4 Sl5r0 8 16.4 S7=0 8 36.4 S5s0 P 16.4 S3=0 « 16." Sl4=0 e 16.5 Sl2=0 P 16.5 SlO=0 e 16.5 s8=n e H,,5 S6=C 8 16.5 S4=0 6 16.5 S2=0 8 18.7 Sl=C a 20 state Is now Current tirres 1890 clocks=PbOl cir = o cout=o aaaa=0bCOO0l00OOOOO0O0O bbbb=Otl 111111111111111 SUirrObOOOOOOOOOOCOCCOOO Ster beolns bl2=0 a Dhl2=0 e a 189n ns. Step beolns ohll=l a e 1900 ns. Ster beolns chll=0 e 9 1925 ns. Ster beolns B unl?=l e sl6=l a 14.6 1935 rs . 97 9 Dec » 16.7 16.7 16.7 36.7 16.7 16.7 16.7 1 3 16. 1 a 16.8 1 B s9 = a 1 513 sl5 1 9 1 s7 = s5 = s3 = a e B sl4 s!2 slO 16. = S6 = 54 = S a 16.9 16.8 16.8 S? = P 17 sl = e sfl 1994 15:23 6 sU : a ehlr.loo p<?ne 14 P 19.1 sta e Is now Cur ert times I960 clo Ks=0b01 clnsc cour=o aaa sobooooiooonoooonoo hhb sOM 11101 11 11 ill 111 Ohoim j liiuiiiii Sttr beains cln=l e pni2so b o B i960 ns, Stec renins P 197 8 1095 ns SUff dn1 1 = P 1 i ns o Ster beoins nnilso e 200? ns. Stec peclns Dnl2si e p SlbsO B 14.2 Sl3=0 B 16,4 Sl5sP B 16.4 Sl4sC B 16.5 Sl2s0 a 16.5 COUtsl 6 21.1 state is no*: Current timer 2030 cloc*s=0b01 clnsi coutsl fc naaesObOOOOlOOOOOOOOOOO bbbbsnbl 111011111111111 sumsObloOOOOl 1111111111 Ster beains bl6=0 a bl5=0 hl4=0 B bl3s0 P bllsO P M0 = b9sn hB=n 2030 ns. 
P e » C 98 nee 6 b7=n b6so 9 b5 = a b4=0 a b3 = b2 = a blatO a 12 = a 19M 15:23 chic. log Paae 15 e e pni2=o " n a a Stec hecins Dhilsi a o a 2040 ns, ten benins rhil=n a o 6 2nb 5 S ns. Ster beairs a 2075 ns. phi2=l a o slb = l a ii.fi sn=i e 16.7 sl5 = l a 16.7 si4=i a 16. 9 sl2 = l e 16." cout=P a 22.9 state Is now: Current times 210n clocics = 0fc01 cln = l cout = aaaa=0h0000000000000000 nbbb=ObCOOOOOOOnnoCOOOO SUWaObOllllll 11U11 1111 Ster beolns cln=0 a o chl2=0 a o a 2100 ns. Sted beains phll=l a o a 2110 ns. Stec beains phll=0 P a 2135 ns. Step beains 6 2145 ns. phl?=l P B 14.2 sl6 = S9 = 9 16.4 Sll=0 B 16.4 Sl3=0 B 16.4 Sl5=0 e 16.4 S7=0 P 16.4 S5=0 B 16.4 s3=0 e 16.4 Sl4=0 a 16.5 sl?=0 e 16.5 Sl0=n e 16.5 sfl=P P 16.5 s6=0 a 16,5 99 n*?c 6 15:23 1984 S4=0 a a 16.5 16.7 s2 = chic.loa Paae 16 S1=0 e 20 cout=l » 21.1 state is now: Current times 2170 cloci«cs = ObOi cln = cout=l aaaa=ob0onoocooooooooco bfcthsOt^O^OOOOOOOOOOOno sum=0blO000000000OCTO00 Stec beains ohi2=0 a a 2170 ns, Stec beclns a 21«n ns. a 2705 ns, dM1 = 1 a Stec becins rhil=0 c Ster beains f 2215 ns. Dhi?=l 9 cout=0 B 22.9 state is now; Current timer 2240 clocKs=0b0l cjn=0 cout=0 aaaasoboooooooooooononn cobb=ObOOOOioOOOCOOonoo suirsotooooonooooonooooo Ster beains nhl2=0 e fl 2240 ns. Sten *ealns Dhll=l P a 2250 ns. Ster beclns onil=0 « e 2275 ns. Stec beains a 22P5 ns. chi2=l » sl=0 a 20 state is now: Current tlire= 2310 cloc»cs = Ob03 cin = cout = aaaa=0b0OO0OOOOOO00C00O bbbb=Ob0COO000OO0OOO0OO SUffcObOOOOOOOOOOOOOOOOO Stec beclns ohl2=0 a o a 2310 ns. Ster becins ohll=l a o a 2320 ns. Ster beolns a 2345 ns. 100 nee fe ohil=0 15:23 1964 cMn.loc i-aoe 17 f Step beclns £ 7355 ns. phl2=t a o state Is now: Current ti^e* 7380 clcc*s=0b01 cln = c cout = n aaea=0fc00O0O0000C0C0000 bbbb=0b0O0O1CO000OOOOC0 S'.im = 0b000O0^000n0O000O0 exit 101 APPEND1I I LAX0U1S LEGEND >;-.•-.' Contact Cut p-well P+ doping polysilicon Diffusion Metal 102 AND Gate 103 m . . r,*v.5 - - ..... ' . -. 
[Figure: OR Gate layout]
[Figure: XOR Gate layout]
[Figure: layout with labeled signals CON(n), out, shift out]
[Figure: PLA84 layout; labeled signals GND, ES4, ES3, ES2, ES1]
[Figure: PLA104 layout; labeled signals Vdd, GND, S4, S3, S2, S1]
[Figure: PLA915 layout; labeled signals BC4, BC3, BC2, BC1, BC0]

APPENDIX F
TEST VECTORS

Addend A          Addend B          Cin  Sum
msb---------lsb   msb---------lsb        msb----------lsb

initialize all internal nodes
0000000000000000  0000000000000000       xxxxxxxxxxxxxxxxx
0000000000000000  0000000000000000       xxxxxxxxxxxxxxxxx
0000000000000000  0000000000000000       00000000000000000

test for proper P and G primitives
0000000000000000  1111111111111111       01111111111111111
1111111111111111  0000000000000000       01111111111111111
0101010101010101  1010101010101010       01111111111111111
1010101010101010  0101010101010101       01111111111111111
0001000100010001  0000000000000000       00001000100010001
0001000100010001  0001000100010001       00010001000100010
0101010101010101  0001000100010001       00110011001100110
0101010101010101  0101010101010101       01010101010101010
0101010101010101  0011001100110011       01000100010001000
0010001000100010  0011001100110011       00101010101010101

test for proper IES

test for proper IC23

test for carry from block to block
0000000000001111  0000000000000001       00000000000010000
0000000000001111  0000000000000000  1    00000000000010000
0000000011111111  0000000000000000  1    00000000100000000
0000000011111111  0000000000000001       00000000100000000
0000000011111111  0000000000010000       00000000100001111
0000111111111111  0000000000000000  1    00001000000000000
0000111111111111  0000000000000001       00001000000000000
0000111111111111  0000000000010000       00001000000001111
0000111111111111  0000000100000000       00001000011111111
1111111111111111  0000000000000000  1    10000000000000000
1111111111111111  0000000000000001       10000000000000000
1111111111111111  0000000000010000       10000000000001111
1111111111111111  0000000100000000       10000000011111111
1111111111111111  0001000000000000       10000111111111111
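The expected sums in the test vectors of Appendix F are consistent with Sum = A + B + Cin written as a 17-bit value (carry out followed by the 16 sum bits). A minimal sketch for checking a vector, assuming that reading of the table: the function name add16_pg is hypothetical, and the code is a behavioral model that evaluates the generate/propagate carry recurrence bit by bit, not a model of the look-ahead hardware or of the pipeline latency.

```python
def add16_pg(a: int, b: int, cin: int = 0) -> str:
    """Return carry out plus 16 sum bits as a 17-character bit string."""
    carry = cin
    total = 0
    for i in range(16):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi                    # generate: this bit creates a carry
        p = ai ^ bi                    # propagate: this bit passes a carry on
        total |= (p ^ carry) << i      # sum bit i
        carry = g | (p & carry)        # carry into bit i + 1
    total |= carry << 16               # carry out becomes the 17th bit
    return format(total, "017b")


# Three vectors taken from the table: (A, B, Cin) and the listed sum.
vectors = [
    (0b0101010101010101, 0b0001000100010001, 0, "00110011001100110"),
    (0b0000000000001111, 0b0000000000000000, 1, "00000000000010000"),
    (0b1111111111111111, 0b0000000000010000, 0, "10000000000001111"),
]
for a, b, cin, expected in vectors:
    assert add16_pg(a, b, cin) == expected
```

The same function can be compared against plain integer addition over any set of inputs, which gives a quick way to generate expected sums for additional vectors.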