NAVAL POSTGRADUATE SCHOOL
Monterey, California
THESIS
DESIGN OF A
SIXTEEN BIT PIPELINED ADDER
USING CMOS BULK P-WELL TECHNOLOGY
by
William R. Reid
December 1984
Thesis Advisor: D. E. Kirk
Approved for public release; distribution unlimited
REPORT DOCUMENTATION PAGE
4. TITLE (and Subtitle): Design of a Sixteen Bit Pipelined Adder Using CMOS Bulk P-Well Technology
5. TYPE OF REPORT AND PERIOD COVERED: Master's Thesis; December 1984
7. AUTHOR(s): William R. Reid
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School, Monterey, California 93943
11. CONTROLLING OFFICE NAME AND ADDRESS: Naval Postgraduate School, Monterey, California 93943
12. REPORT DATE: December 1984
13. NUMBER OF PAGES: 116
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited
19. KEY WORDS: VLSI Design, CMOS, CMOS-PW, Pipelined Adder, Carry Look Ahead Addition, CAD Tools
20. ABSTRACT: The design of a sixteen-bit pipelined adder CMOS integrated circuit is presented. The adder is designed to maximize throughput and to provide for testability. Tutorial material on CMOS design is also presented.
Approved for public release; distribution is unlimited.
Design of a
Sixteen Bit Pipelined Adder
Using CMOS Bulk P-Well Technology
by
William R. Reid
Lieutenant Commander, United States Navy
B.S., Purdue University, 1975
Submitted in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
from the
NAVAL POSTGRADUATE SCHOOL
December 1984
ABSTRACT
The design of a sixteen-bit pipelined adder CMOS integrated circuit is presented. The adder is designed to maximize throughput and to provide for testability. Tutorial material on CMOS design is also presented.
TABLE OF CONTENTS
I. INTRODUCTION
II. CMOS CIRCUITS
    A. COMPARISON WITH NMOS
        1. The Inverter
        2. The NOR Gate and Transmission Gate
    B. CMOS DESIGN METHODOLOGIES
    C. CMOS IMPLEMENTATION TECHNOLOGIES
        1. CMOS-SOS
        2. CMOS-Bulk
        3. Twin-tub CMOS
    D. CMOS TECHNOLOGY SELECTION
III. DESIGN TOOLS
    A. CAESAR
    B. LYRA
    C. SIMULATION
        1. SPICE
        2. RNL
IV. DESIGN OF THE ADDER
    A. LOGICAL DESIGN
        1. Zero Level CLA Logic
        2. First Level CLA Logic
        3. Second Level CLA Logic
    B. DESIGN FOR TESTABILITY
    C. LAYOUT DESIGN
V. TEST PLAN
    A. INPUTS AND OUTPUTS
    B. TESTING FOR CORRECT OPERATION
        1. Intermediate Results
    C. TESTING FOR SPEED OF OPERATION
VI. CONCLUSIONS
    A. THE CMOS TECHNOLOGIES
    B. CMOS CAD TOOLS
    C. DESIGN OF THE ADDER
APPENDIX A: SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES
APPENDIX B: UNIX MANUAL ENTRY FOR RULEC
APPENDIX C: PRESIM USER'S GUIDE
APPENDIX D: ADDER SIMULATION
APPENDIX E: LAYOUTS
APPENDIX F: TEST VECTORS
LIST OF REFERENCES
BIBLIOGRAPHY
INITIAL DISTRIBUTION LIST
LIST OF TABLES
1. Lyra Error Abbreviations
2. First Level CLA Logic for a 16-bit Sum
3. Register Serial Outputs
4. PLA Evaluation Sequences
LIST OF FIGURES
2.1 CMOS Transistor Symbols
2.2 (a) NMOS Inverter (b) CMOS Inverter
2.3 Minimum Dimension Inverters
2.4 2-input NOR Gate
2.5 CMOS Transmission Gate
2.6 NMOS-like CMOS Static Gate [Ref. 6]
2.7 Dynamic NAND Gates [Ref. 6]
2.8 Domino CMOS Structure [Ref. 6]
2.9 Circuit Difficult to Implement in Domino CMOS
2.10 P-Well Process, Top View [Ref. 6]
2.11 P-Well Process, Side View [Ref. 9]
2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6]
2.13 The Latchup Circuit [Ref. 6]
2.14 Grounding of the P-Well
3.1 CMOS Exclusive OR [Ref. 6]
3.2 CMOS Latch Design [Ref. 6]
4.1 CMOS Output Loading Model
4.2 Preliminary Chip Floorplan
4.3 Dual Mode Latch
4.4 AND Gate
4.5 OR Gate
4.6 Exclusive OR Gate
4.7 PLA Structure
4.8 Final Layout
5.1 Charge Sharing in a PLA
I. INTRODUCTION

For several years the ability of systems engineers to design custom digital integrated circuits has been growing. The Mead and Conway design methodology described in Introduction to VLSI Systems [Ref. 1] permits the systems engineer to be his own logic circuit designer. A proliferation of computer-aided design (CAD) systems such as the MacPitts silicon compiler [Ref. 2], the chip layout language (CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the hierarchical layout language Burlap [Ref. 5] make it possible for the engineer to rapidly carry the Mead and Conway design methodology through to a final design. This includes iterative simulation and redesign to provide justifiable confidence in the final design submitted for fabrication.

Many of the techniques utilized in the Mead and Conway methodology and most of the CAD tools are based on having the final design implemented in a technology that uses only one type of doping for the semiconductor material in the active region of the transistors. Because of their higher switching speed, negatively doped metal oxide semiconductor (NMOS) transistor technologies are generally used. Selection of an NMOS implementation technology does provide the systems engineer with a complete and proven methodology for the design of a very large scale integrated (VLSI) circuit and allows the use of many extensively tested CAD tools.

Like any other design decision, selection of NMOS implementation brings with it some limitations. There are two primary problems associated with NMOS digital circuits. The first is the ultimate switching speed limitation. Though many NMOS VLSI circuits operate at clock rates in the 8 to 10 MHz range, there are many applications requiring higher clock rates. The second problem is the dissipation of the relatively large amount of power consumed by NMOS digital circuits. State of the art, commercially available NMOS VLSI circuits commonly have power consumptions in the vicinity of 3 to 5 watts. Considerable design effort is required to ensure that the dissipation of this much energy by a chip measuring approximately 5 millimeters on a side does not alter the performance of the micron sized features on the chip.

One group of technologies that offers both increased switching speed and greatly reduced power consumption is complementary metal oxide semiconductors (CMOS). CMOS circuits also offer the benefits of greater radiation hardening and increased noise margin. In this thesis investigation, much of the Mead and Conway methodology was utilized in the design of a CMOS circuit. A general purpose color graphics CAD tool called Caesar that has been frequently used in the design of NMOS circuits was employed.

In carrying out the design of the pipelined high speed 16 bit adder in CMOS two separate goals were pursued. The first, of course, is speed and the second is verifiability. A high speed adder implies not only a high clock rate of operation but also a small latency between input of operands and output of the sum.

A discussion of CMOS technologies and the implementation of logic circuits in those technologies follows in Chapter 2. Chapter 3 presents a description of the CAD tools used to construct and simulate the layout for the adder. The logic and layout design of the adder is covered in Chapter 4 and is followed by a test plan for the fabricated chip in Chapter 5.
II. CMOS CIRCUITS

Before the design of CMOS digital circuits can be attempted, an understanding of how to best implement logic functions in CMOS is necessary. It is also important to be aware of the advantages and disadvantages of the different CMOS implementation technologies. In this chapter the operation of CMOS digital circuits is explained using similar NMOS circuits as a benchmark for comparison. The different methodologies for assembling the CMOS pieces to produce the desired logical results are reviewed and the selection of the CMOS-Bulk p-well implementation technology is explained.

A. COMPARISON WITH NMOS
In NMOS digital circuits there is only one type of switching device, namely the n-channel enhancement mode metal oxide semiconductor (MOS) transistor. The other principal device utilized in NMOS circuits is the depletion mode n-channel MOS device which acts as a load resistor. In CMOS there are both n-channel and p-channel enhancement mode transistors available. As in NMOS, the n-channel device can be considered on when Vdd (typically +5 Volts DC), a logical 1, is present on its gate. The p-channel device can be considered on when ground (GND), a logical 0, is present on its gate. In Figure 2.1 are the symbols that will be used for the n-channel and p-channel transistors in this thesis.

The basic differences between NMOS and CMOS technologies can be demonstrated by comparing their application to some basic digital circuits.

Figure 2.1 CMOS Transistor Symbols.

1. The Inverter
Figure 2.2 (a) shows an NMOS inverter. Whenever there is a logical 1 on the input, the voltage drop across the load resistor is approximately Vdd and the output is a logical 0. This results in steady state power consumption. When the input switches to a logical 0, before the output can assume a logical 1, the load capacitance (Cl) on the output must be charged to Vdd through the load resistor with a resistance of several kilohms. This results in a much longer transition from 0 to 1, where the load capacitance is charged through the load resistor, than from 1 to 0, where the load capacitance is discharged through the on resistance of the switched on NMOS enhancement transistor. The reason for this asymmetry is that the pull-down transistor's on resistance is typically only one fourth or less that of the pull-up load depletion mode transistor. The technique of precharging circuits, where all outputs are set to logical 1 during one clock cycle and then selectively forced to 0 on the opposite (evaluation) clock cycle, has proven helpful in gaining control over the unsymmetric switching times. This longer switching time from 0 to 1 must still be accounted for, however, and represents the primary limitation to the speed of NMOS circuits.
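The asymmetry described above can be illustrated with a first-order RC estimate. This is only a rough sketch under stated assumptions, not a calculation taken from the thesis: the 8 kilohm load resistance and the 0.1 pF load capacitance are assumed values chosen to be consistent with the "several kilohms" and "one fourth or less" figures in the text, and real devices are not linear resistors.

```python
# First-order RC estimate of NMOS inverter switching asymmetry.
# All component values are illustrative assumptions, not thesis data.

R_PULL_UP = 8e3              # ohms, depletion-mode load ("several kilohms", assumed)
R_PULL_DOWN = R_PULL_UP / 4  # ohms, pull-down on resistance at one fourth of the load
C_LOAD = 0.1e-12             # farads, output load capacitance Cl (assumed)


def time_constant(resistance, capacitance):
    """Return the RC time constant in seconds."""
    return resistance * capacitance


tau_rise = time_constant(R_PULL_UP, C_LOAD)    # 0 -> 1: Cl charges through the load
tau_fall = time_constant(R_PULL_DOWN, C_LOAD)  # 1 -> 0: Cl discharges through pull-down

print(f"rise time constant: {tau_rise * 1e9:.2f} ns")
print(f"fall time constant: {tau_fall * 1e9:.2f} ns")
print(f"rise/fall ratio:    {tau_rise / tau_fall:.1f}")
```

Under these assumptions the ratio of the two time constants is simply the ratio of the two resistances, which is why the 0 to 1 transition dominates the cycle time regardless of the particular capacitance chosen.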
Figure 2.2 (a) NMOS Inverter (b) CMOS Inverter.

In the CMOS inverter of Figure 2.2 (b) the input is applied to the gates of both devices. An input of logical 1 causes the n-channel device to switch on and the p-channel device to switch off, resulting in an output of logical 0. Similarly, an input of 0 results in an output of 1. In both cases, one device is fully off, representing a resistance on the order of gigaohms. Thus, the steady state power consumption is essentially zero. In operation the only power consumption of consequence occurs during the transition when neither transistor is fully on or off. Additionally, since the output load capacitance is both charged and discharged through a turned on transistor, the 1 to 0 and 0 to 1 switching delays are theoretically the same.

Actually the switching delays depend on many parameters. The n-channel and p-channel device dimensions are frequently not the same, and the mobility of the electrons in the n-channel is greater than the mobility of the holes in the p-channel. Also, the capacitive load seen by the p-channel device in CMOS p-well (CMOS-pw) is greater than the load seen by the n-channel device because of the highly doped p-well. Typically, the result in CMOS-pw is a slightly longer transition time of the 0 to 1 output transition. Some designers attempt to compensate for this by consistently making the p-channel transistors wider than the n-channel transistors.

Unlike NMOS, the output of a CMOS digital circuit makes a full excursion between Vdd and GND. This makes CMOS circuits less sensitive to noise than NMOS circuits. CMOS should also benefit more from future reductions in feature size. NMOS is more restricted in ultimate feature size because the power dissipation requirements of the depletion mode devices will create more problems as feature sizes shrink. In Figure 2.3 the relative sizes of minimum dimension inverters implemented in currently available 3 micron feature size CMOS-PW and NMOS technologies are shown.
2. The NOR Gate and Transmission Gate

Figure 2.4 shows the circuit diagrams and layouts of a two-input NOR gate implemented in both CMOS-PW and NMOS. From Figures 2.3 and 2.4 it is evident that static¹ CMOS gates are more complex and area consuming than their NMOS counterparts. In these fully complementary circuits a redundancy in the structures is evident. The pull-up only or pull-down only would be sufficient to implement the logic. In the CMOS circuits of Figures 2.3 and 2.4 the inputs must perform two tasks. A logical 1 on an input causes both a connection between the output and ground and a disconnection between the output and Vdd. Logically these two actions are equivalent, therefore only one action should be necessary to implement the logic. Design methodologies to accomplish this are described in section B of this chapter.

¹ Static logic circuits continuously evaluate their inputs and produce their specified logic output. Dynamic circuits perform logical evaluation of the inputs only when directed to do so by control signals and/or clock signals.

Figure 2.3 Minimum Dimension Inverters.

Figure 2.4 2-input NOR Gate.

The parallelism of the CMOS transmission gate of Figure 2.5 and the NMOS pass transistor is evident. The major difference lies in the bilateral nature of the CMOS transmission gate. It is made up of both n-channel and p-channel devices and requires both polarities of the control signal for operation. The reason for this bilateral requirement is that the p-channel device does not transmit low voltages well and the n-channel device does not transmit high voltages well. The resulting unpredictable voltage drops make it necessary to utilize both types of transistors. This increase in complexity over its NMOS counterpart is partially offset by the absence of the level restoring circuitry NMOS requires following a pass transistor.²

² In NMOS digital circuits the length to width ratio of the depletion mode load transistor is usually four times that of the pull-down transistor. This ratio is required to ensure sufficient excursion of the output voltage. However, after a pass transistor is used, a ratio of 8:1 rather than 4:1 must be used to restore the MOS threshold voltage drop across the pass transistor.
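The redundancy of the fully complementary NOR gate can be made concrete with a small switch-level model. The sketch below is an illustration only; the function names are hypothetical and each network is reduced to a boolean "conducts or not" abstraction. It shows that the pull-up and pull-down networks are always in opposite states, so either one alone determines the logic value.

```python
from itertools import product


def nor_pull_down(a, b):
    """n-channel network: parallel devices connect the output to GND if a or b is 1."""
    return bool(a or b)


def nor_pull_up(a, b):
    """p-channel network: series devices connect the output to Vdd only if a and b are 0."""
    return (not a) and (not b)


def static_cmos_nor(a, b):
    """Return the output of a two-input static CMOS NOR gate."""
    up, down = nor_pull_up(a, b), nor_pull_down(a, b)
    # In a fully complementary gate exactly one of the two networks conducts.
    assert up != down
    return 1 if up else 0


for a, b in product((0, 1), repeat=2):
    print(a, b, static_cmos_nor(a, b))
```

Because the two networks carry the same information, a design style that keeps only one of them (as in the dynamic approaches of section B) loses no logical capability.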
Figure 2.5 CMOS Transmission Gate.

In general CMOS technologies are ratioless. The use of "improper" ratios will not affect the logical operation of most CMOS gates; it will only affect the speed of operation of the gates.
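The reason a transmission gate needs both device types can be sketched with a simple threshold-drop model. This is a first-order illustration under assumptions: the 1 volt thresholds are assumed values, the function names are hypothetical, and body effect and timing are ignored. Only the +5 volt supply comes from the text.

```python
VDD = 5.0   # volts, supply ("typically +5 Volts DC" in the text)
VT_N = 1.0  # volts, n-channel threshold (assumed)
VT_P = 1.0  # volts, magnitude of the p-channel threshold (assumed)


def n_pass(v_in):
    """n-channel device with its gate at Vdd: the output cannot rise above Vdd - Vt."""
    return min(v_in, VDD - VT_N)


def p_pass(v_in):
    """p-channel device with its gate at GND: the output cannot fall below |Vt|."""
    return max(v_in, VT_P)


def transmission_gate(v_in):
    """Parallel pair: the device that passes this level without loss sets the output."""
    return min((n_pass(v_in), p_pass(v_in)), key=lambda v_out: abs(v_out - v_in))


for v in (0.0, 2.5, 5.0):
    print(v, n_pass(v), p_pass(v), transmission_gate(v))
```

In this model each single device degrades one logic level by a threshold drop, while the parallel pair passes both levels intact, which is the behaviour the bilateral structure is meant to provide.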
B. CMOS DESIGN METHODOLOGIES

Static gate CMOS circuits have three serious deficiencies when compared to static NMOS gates. First, they are more area consuming. Second, they can be slower. Though the individual gates can be faster in CMOS, the p-channel and n-channel gates are in parallel; thus, the fanout³ and output load capacitance of each circuit are doubled. Third, a CMOS static gate is redundant, duplicating its functionality in both the pull-up and pull-down section.

³ Fanout represents the number of transistors that the output of a logic gate must drive.

One approach to remedy these deficiencies is to use a static NMOS-like style of design as in Figure 2.6. Here the p-channel device is always on and the pull-up to pull-down dimension ratio is relied upon to produce the proper output voltage. This introduces power consumption problems and takes away the full excursion on the output.

Figure 2.6 NMOS-like CMOS Static Gate [Ref. 6].

Another approach is to make extensive use of transmission gates to build up logic functions. Using transmission gates means both polarities of all control signals are required. The resulting large number of wires required to route these control signals can become very area consuming, especially if only one metal layer is available.

A third and more effective solution is to use dynamic logic. Figure 2.7 contains three different implementations of a dynamic three-input NAND gate. In each, the output is meaningful (i.e. represents the value of the boolean expression NOT(in1·in2·in3)) only when clk is high and its complement is low. The circuits of Figure 2.7 (a) and (b) depend on the pull-up to pull-down ratio to produce the proper output. As with the NMOS-like style of design, full excursion on the output is lost and there is steady state power consumption during the evaluation cycle. The circuit in Figure 2.7 (c) is precharged when clk is low and evaluation of the inputs takes place when clk is high. This configuration allows only one change of the output from 1 to 0, so the inputs must be stable at the time clk goes high. A change of one of the inputs from 1 to 0 after clk has gone high cannot cause the output to return to 1. In general dynamic CMOS eliminates the redundancy of static CMOS by applying all inputs to one type of device and a control signal to the other type of device.

Figure 2.7 Dynamic NAND Gates [Ref. 6].
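The precharge and evaluate behaviour described for the circuit of Figure 2.7 (c) can be sketched as a small behavioural model. This is an illustration only, not a circuit simulation: the class name is hypothetical and the output node is treated as an ideal charge store with no leakage.

```python
class PrechargedNand3:
    """Behavioural model of a precharge/evaluate three-input NAND gate."""

    def __init__(self):
        self.out = 1  # output node, assumed to start precharged

    def step(self, clk, in1, in2, in3):
        if clk == 0:
            self.out = 1  # precharge phase: output node charged to logical 1
        elif in1 and in2 and in3:
            self.out = 0  # evaluation phase: the series pull-down chain conducts
        # Otherwise the output node simply holds its charge.
        return self.out


gate = PrechargedNand3()
trace = [
    gate.step(0, 1, 1, 1),  # precharge
    gate.step(1, 1, 1, 1),  # evaluate: the output discharges
    gate.step(1, 0, 1, 1),  # an input falls during evaluation: the output stays low
    gate.step(0, 0, 1, 1),  # next precharge
    gate.step(1, 0, 1, 1),  # evaluate with a stable input: the output stays high
]
print(trace)
```

The third step is the case the text warns about: once the node has discharged, a late input change cannot restore it, so the inputs have to be stable before clk goes high.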
The most popular dynamic CMOS logic design technique is domino CMOS [Ref. 7], illustrated in Figure 2.8. Here the output is the logical AND of the boolean function to be implemented (in1·in2 + in3) and a control (clock) signal. When the clock is low, the circuit is precharged, and when the clock is high evaluation occurs. With a common clock shared by all the domino gates on a chip, during the evaluation cycle the signals ripple through the chip as though the logic were purely static. The follow on inverter ensures that the output of each gate is low when evaluation begins. This prevents the outputs of all gates from changing unless driven low by the inputs.

Figure 2.8 Domino CMOS Structure [Ref. 6].

Domino CMOS is not always the answer though. If the logic of Figure 2.9 were implemented in domino CMOS it would be more area consuming than the same circuit implemented in static CMOS.
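The statement that a domino gate's output is the AND of its boolean function and the clock can be sketched in a few lines. This is a purely logical illustration under assumptions: the second stage and its input in4 are hypothetical, and charge storage and timing are not modelled.

```python
def domino_gate(clk, function_value):
    """One domino stage: output is the AND of the clock and the boolean function."""
    return int(bool(clk) and bool(function_value))


def f(in1, in2, in3):
    """The function of Figure 2.8: in1·in2 + in3."""
    return (in1 and in2) or in3


def two_stage(clk, in1, in2, in3, in4):
    """Two stages on a common clock; the second stage (hypothetical) ANDs stage1 with in4."""
    stage1 = domino_gate(clk, f(in1, in2, in3))
    stage2 = domino_gate(clk, stage1 and in4)
    return stage1, stage2


print(two_stage(0, 1, 1, 0, 1))  # clock low: every stage output is held low
print(two_stage(1, 1, 1, 0, 1))  # clock high: the result ripples through both stages
```

Since every stage output is low while the clock is low, no downstream gate can discharge until an upstream gate has actually evaluated, which is what lets a single clock serve the whole chain.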
Dynamic CMOS is more area consuming in this case because these are simple gates with only a few inputs. Each NOR gate if implemented statically would need two n-channel devices and two p-channel devices. If implemented dynamically, each NOR gate requires three transistors of one type (one for each input and one for the control signal) and one transistor of the other type (for the control signal again). The number of transistors needed remains the same but the dynamic logic requires the designer to keep three inputs electrically isolated instead of just two. And if the dynamic design technique is domino, six additional inverters will be needed. As can be seen in Figure 2.4, in CMOS a NOR gate can be constructed from just one stage. Adding the follow-on inverter of the domino design results in an OR gate. Thus a second inverter is required to return the logic to that of a NOR gate.
Figure 2.9 Circuit Difficult to Implement in Domino CMOS.

C. CMOS IMPLEMENTATION TECHNOLOGIES
One of the principal issues in the design of a process to implement CMOS digital circuits in silicon is how to isolate the two types of devices. This can be accomplished by using a completely insulating substrate or through a more complex fabrication process.
1. CMOS-SOS

The only process currently offered by the Metal-Oxide Semiconductor Implementation Service (MOSIS) which uses an electrically insulating substrate is Silicon on Sapphire (SOS). In this technology the n-channel and p-channel transistors are formed on silicon islands left after etching an epitaxial layer of silicon on a sapphire (Al2O3) substrate.

2. CMOS-Bulk

The other CMOS processes offered by MOSIS all use the CMOS-Bulk p-well technology. The p-well processes differ in (1) the number of layers of metal interconnections and (2) the presence or absence of capacitors. In CMOS-Bulk p-well (n-well) the substrate is n-doped (p-doped) and the p-channel (n-channel) devices are in this substrate. To isolate the n-channel (p-channel) devices from the substrate and to act as the back gate, a heavily doped p-well (n-well) is first placed in the substrate. The heavy doping of the p-well (n-well) degrades the performance of the n-channel (p-channel) device while the p-channel (n-channel) device is optimized. In p-well CMOS, though the mobility of the electrons in the n-channel device still exceeds that of the holes in the p-channel device, the performance difference of the transistors is minimized. The more uniform performance of the two transistor types makes the p-well process appropriate for CMOS random logic.
Figures 2.10 and 2.11 represent the top and side views of the steps of the CMOS-pw process for the production of an inverter. These steps are: (1) starting with an n-type substrate the p-well is patterned, (2) the active areas in the p-well and on the substrate are established, (3) the polysilicon is patterned, (4) the two ion implant masks are placed (the N+ mask is simply the photographic negative of the P+ mask), (5) contact cuts are made, and (6) the metal is placed.
a. Latchup in CMOS-pw

One of the main problems associated with CMOS-Bulk, both p-well and n-well, is latchup. Basically latchup involves generation of a short circuit between Vdd and GND, and can result in the complete destruction of a chip. Many researchers have tried to formally define the conditions [Ref. 8] that cause latchup to occur. This task is extremely complex because the phenomenon is so dependent on layout, which is unique to each chip design. Though a fully quantitative analysis of latchup is still not available, a qualitative analysis will show what happens on the chip when latchup occurs.

Looking at the side view of an inverter in Figure 2.12, parasitic bipolar transistors can be seen. The base of the npn transistor is the p-well and the base of the pnp transistor is the n-doped substrate. These parasitic transistors are connected as shown in Figure 2.13. If the output of the gates goes below GND by a value equal to the threshold of the npn transistor, its emitter starts to inject current (electrons) into the base (p-well) and the resultant collector current flows to the Vdd node. If the resistance between the Vdd node and the source of the pull-up p-channel MOS transistor, R1, is large enough, the voltage drop across R1 will exceed the threshold of the pnp transistor. The collector current (holes) of the pnp device flows to the GND node. If the resistance between the GND node and the source of the pull-down n-channel MOS transistor, R2, is great enough, the resultant voltage drop across R2 will increase the base current in the npn transistor. As is evident, there is positive feedback.
Figure 2.10 P-Well Process, Top View [Ref. 6].

Figure 2.11 P-Well Process, Side View [Ref. 9].
The only way to stop this destructive process once it has started is to disconnect Vdd or GND. Prevention of latchup must be designed in.
Figure 2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6].

Figure 2.13 The Latchup Circuit [Ref. 6].

The MOSIS CMOS-Bulk p-well design rules include features for the specific purpose of reducing the probability of latchup. The minimum separation rules for p-wells and P+ doped active areas exist for this purpose. Their aim is to reduce the gain of the parasitic bipolar transistors, thus requiring a larger noise spike of longer duration to start the latchup sequence. A frequently used technique is the grounding of the p-well as illustrated in Figure 2.14. Here the effect of the P+ doped area covering half of the contact cut for the ground bus is to reduce the resistance R2 in Figure 2.13. Another practice is to place a small capacitor across the Vdd and GND pins of CMOS-Bulk chips. To provide capacitive filtering of noise spikes on the chip, the Vdd and GND busses are frequently run close together. Also, Vdd input pads are designed to provide capacitance between Vdd and GND.

Figure 2.14 Grounding of the P-Well.

3. Twin-tub CMOS

This process, also called twin-well, uses both n-wells and p-wells on a high resistivity N- or P- substrate, or in an epitaxial layer of silicon on a P+ or N+ wafer. Since the well doping does not have to overcome the substrate doping, both the n-channel transistors in the p-well and the p-channel transistors in the n-well can be optimized. Domino CMOS is enhanced by the use of this process since the optimized n-channel devices can speed up the complex boolean expression evaluation and the optimized p-channel devices can speed up the signal drive between stages (thereby reducing the effect of a given fanout).
D. CMOS TECHNOLOGY SELECTION

The CMOS implementation technologies available from MOSIS are CMOS-Bulk p-well with one metal layer, CMOS-Bulk p-well with two metal layers, CMOS-Bulk p-well with two metal layers and capacitors (for analog circuits), and CMOS-SOS.

The advantages of CMOS-Bulk are: (1) faster than NMOS, (2) very good noise margin, and (3) a proven reliable fabrication process. Its disadvantages are: (1) latchup susceptibility, (2) use of p-well guard rings is needed if radiation hardening is desired, (3) lower circuit density than NMOS or CMOS-SOS, and (4) more complex design rules than either NMOS or CMOS-SOS.

The advantages of CMOS-SOS are: (1) faster than NMOS or CMOS-Bulk, (2) very good noise margin, (3) intrinsically radiation hardened, and (4) no latchup. Its disadvantages are: (1) expensive fabrication process due to the sapphire, (2) sapphire variability reduces the reliability of the fabrication process, (3) thermal mismatch between the sapphire and silicon limits the carrier mobility, and (4) it is not a viable technology for dynamic memory due to back channel leakage.

CMOS-Bulk p-well was selected as the implementation process for the adder for the following reasons. First, technology files for this process were available at the Naval Postgraduate School (NPS), enabling the use of extant computer aided design (CAD) tools. Second, since this would be the first CMOS VLSI design at NPS, utilizing the most reliable process is prudent to prevent design problems from being clouded by implementation process problems.
III. DESIGN TOOLS

To employ the Mead-Conway design methodology on a large scale design, three computer aided design (CAD) tools are needed. A layout design editor for viewing the circuits as they are created is the first tool required. Next, a design rule checker is necessary to verify that all the design rules for the specified technology have been adhered to. Though not a complex task, the large number of checks that must be made for even a modest design makes manual design rule checking highly error prone. Finally, a circuit simulator is needed to confirm that the circuit as designed provides the proper logical output. In the design of the sixteen-bit pipelined adder, the Caesar layout editor [Ref. 4], the Lyra design rule checker [Ref. 10], and C. Terman's RNL circuit simulator [Ref. 11] were employed.

A. CAESAR
Caesar is a generic layout editor. It is not designed for any particular VLSI implementation technology. It is not even limited to designing integrated circuits. Caesar is a graphics layout editor for the creation and manipulation of rectangles where the user specifies the color, size, and placement. It is through the user specified technology file that the rectangles of color take on meaning. At the Naval Postgraduate School (NPS) there are two technology files available for use with Caesar. One is for N-doped metal oxide semiconductors (NMOS) and the other is for complementary metal oxide semiconductors utilizing a P-doped well (CMOS-pw).

Caesar works with files of its own special format. These files are indicated by an appended file type of ca (i.e. xxxx.ca). On command Caesar will generate a Caltech Intermediate Format (CIF) file of the same layout. Again it is the technology file which tells Caesar which CIF layer labels to attach to the colored rectangles.

At NPS, Caesar is set up to take commands from any terminal where the execution of the Caesar program is initiated (usually the ADM-3a console adjacent to the color graphics display unit) and from a four-button puck on a graphics tablet attached to the color display device. Caesar displays its graphics results on an AED 767 color monitor and displays its menus, messages, and prompts on the command console. Detailed information on the installation and operation of Caesar at NPS can be found in Reference 4 and Reference 2.

Caesar is an interactive CAD tool. The results of any command are rapidly displayed on the AED 767. The results of a command may be undone (u) or repeated (.) with a single stroke of the specified key on the command console. While running Caesar, a user may also call upon the design rule checker, Lyra, to check the area inside and within three Caesar units⁴ of the current box for design rule violations. This interactive use of the layout graphics display and the design rule checker helps to ensure that there will not be any design rule forced changes late in the design cycle when changes are much more time consuming. With Caesar's level of interaction with the designer, the design loop consisting of (1) issue commands to perturb existing circuit, (2) visual inspection to verify command's generation of desired results, and (3) design rule checking of new circuit, can be rapidly completed.

⁴ A Caesar design is laid out on a grid of Caesar units. These units do not represent any specific length. When creating a CIF file from a Caesar file the desired length of a Caesar unit is specified.

Caesar is a hierarchical design tool. With Caesar, circuits can be created by piecing together cells (other files of type .ca) which in turn may be made up of other sub-cells. Theoretically, there is no limit to the number of levels in the hierarchy. Not only can cells (sub-cells, etc.) be called upon to fill locations in a circuit; if they need to be modified to function properly, Caesar provides a subedit mode to facilitate editing of layouts one level below the current editing level. Care must be taken when this subedit feature is used since the changes made to the cell are global. Everywhere the given cell is used on the chip, the newly edited version will appear.
B. LYRA
Lyra, like Caesar, is generic. It is a design rule checker, and when Lyra is invoked from within Caesar, the actual program executed to check for design rule errors depends on the technology file indicated in the header of the Caesar file being edited. After running, Lyra sends a message to the command console indicating the number of errors found. On the graphics display Lyra paints the exact location of each error and labels each error with the design rule violated. The error label consists of abbreviations for the layers involved, followed by an underscore, followed by an abbreviation for the type of violation detected. Table 1 lists the abbreviations used by Lyra for CMOS-pw.
The winter 1983 distribution of the University of California at Berkeley (UCB) CAD tools included two versions of Lyra: one for the Mead-Conway NMOS design rules and the other for the Jet Propulsion Laboratory's (JPL) five-micron feature size CMOS-pw design rules. Since MOSIS no longer supports fabrication of the JPL CMOS-pw process, design rules for the MOSIS supported three-micron CMOS-pw process were obtained. Professor Marco Annaratone at Carnegie-Mellon University (CMU) generated the listing of the three-micron CMOS-pw design rules compatible with Lyra and has provided NPS with a copy. To generate executable code from the prototype Lyra program and imbed the specific process design rules, the program rulec (see Appendix B) is run with the design rule list file as its argument. Now, when Lyra is invoked from Caesar while editing a CMOS-pw technology circuit, the three-micron minimum feature size CMOS-pw design rules are applied. This version of Lyra does not check for exceeding any maximum dimensions. The only maximum size design rule in this technology is for contact cuts, which may not exceed 3 microns by 8 microns. Avoidance of improper contact cuts can be accomplished by utilizing Caesar's hierarchical nature. Contact cuts of all needed sizes and types are generated once and saved to be inserted as cells wherever needed.

TABLE 1
Lyra Error Abbreviations
Layer: polysilicon, metal, p-well, n+ diffusion, cut, p+ diffusion
Error: minimum width, minimum separation, malformed transistor
Abbreviation: *, s, w, d, x, m, c, P
C. SIMULATION
Once a circuit layout has completed this initial design loop, it matches the designer's conception of how it should appear and is free of design rule violations. The performance of the given circuit, though, remains uncertain. To simulate the performance of the design, programs such as SPICE [Ref. 11] and RNL [Ref. 11] are used.
1. SPICE
SPICE is an important simulation tool in the design of high speed CMOS digital and analog circuits. With its detailed device modeling, SPICE can provide accurate predictions of performance once the device parameters of the implementation technology are known. SPICE provides the logical output of a circuit based upon the inputs and describes the transient behavior of the circuit as it changes to the new logical output. Thus SPICE enables a designer to optimize transistor dimensions for speed.

Unfortunately, the version of SPICE currently available on both the Vax 11-780 and the IBM 3033 at NPS (version 2G6) fails when the parameters of the devices fabricated by the MOSIS three-micron CMOS-pw process are used. With these parameters the transient behavior solutions do not converge. Engineers at CMU, UCB, and the University of Washington (UW) are currently employing an experimental version of SPICE (version 2X.x developed at UCB) which is successful in simulating with the three-micron CMOS-pw device parameters. This version, however, has other bugs and is therefore not available for general distribution. The changes to SPICE 2G6 that enable SPICE 2X.x to simulate the three-micron CMOS-pw devices will be incorporated into the next distribution of SPICE (version 2G7). The Naval Postgraduate School is in the queue of institutions to receive SPICE 2G7 once it is ready.
In order to run a SPICE simulation of a CMOS circuit designed using Caesar, the following steps should be executed. First, the labeling feature of Caesar is used to place labels on the electrical nodes of interest in the circuit (Vdd, GND, input, output, etc.). Second, the Caesar command
:cif 100 -p
is issued to generate the basename.cif file. The parameter 100 indicates a scale of 100 centimicrons per Caesar unit (see footnote 5) and must be specified unless the default value of 200 centimicrons per Caesar unit is desired. The -p option causes entries to be made in the basename.cif file for the labels assigned. Third, after exiting Caesar and returning to Unix, the circuit extractor Mextra [Ref. 10] is invoked using the command
% mextra basename
to create the file basename.sim. To modify the basename.sim file to a SPICE file (basename.spice), the program sim2spice [Ref. 11] is used. The basename.spice file contains a list of transistors and capacitors in the circuit in a SPICE compatible format. The basename.spice file must be edited to add the model parameters for the transistors, to specify the waveforms of the input(s), to specify the type of analysis to be performed (usually transient analysis), and to specify the output to be produced (tables, graphs, etc.). The Spice User's Manual [Ref. 11] contains the formats of these additions to basename.spice. Best case and worst case device model parameters for the MOSIS three-micron CMOS-pw process as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser of MIT are found in Appendix A.
2. RNL
RNL is a timing and logic simulator for digital MOS circuits. It is an event driven simulator which uses a resistance-capacitance model of a circuit to estimate node transition times and to estimate the effects of charge sharing (see footnote 6). After input values have been assigned by the user, RNL calculates the effects of those inputs by repeating the following operations until there are no further node value changes: (1) when a node is added to the network due to a transistor being turned on, the charge sharing implications of the new node's capacitance and logic state on each of its electrical neighbors are computed, (2) for each node that might be affected, Vthev and Rthev (the parameters of the Thevenin equivalent circuit) are calculated and the new logic state is determined from Vthev (0.0Vdd to 0.3Vdd = logic 0, 0.8Vdd to 1.0Vdd = logic 1, logic X otherwise), (3) if the node has changed state, the transition time is calculated using the node's capacitance, and (4) any changes are propagated to other nodes. Details of the computation methods used by RNL can be found in the RNL Version 4.2 (UW) User's Guide [Ref. 11].

5 Since the minimum dimensions for the 3-micron CMOS-pw process are specified in microns instead of lambda, CMOS-pw circuits are usually designed on Caesar using one micron per Caesar unit.
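The logic state thresholds quoted in step (2) can be sketched as a small function. This is a minimal illustration, not RNL's code: the function name `logic_state` and the default supply of 5.0 volts are assumptions made for the example; only the 0.3 and 0.8 fractions of Vdd come from the text.

```python
def logic_state(vthev: float, vdd: float = 5.0) -> str:
    """Map a Thevenin voltage to '0', '1', or 'X'.

    The 0.3*Vdd and 0.8*Vdd limits are the ones quoted in the text;
    the 5.0 V default supply is an assumption for illustration only.
    """
    ratio = vthev / vdd
    if 0.0 <= ratio <= 0.3:
        return "0"
    if 0.8 <= ratio <= 1.0:
        return "1"
    return "X"  # anything between the two bands is undefined
```

A node sitting at half the supply voltage therefore reports X, which is the state the exclusive OR and latch examples later in this section become stuck in.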
More important to the RNL user is an understanding of what information RNL keeps, what it discards, and how it decides what to do next. Basic to the operation of RNL is the idea of an event. The three elements of an RNL event are: (1) a node in the network, (2) a new logic state for the node, and (3) the time when the node value changes to the new logic state. RNL maintains a list of events, sorted by time, that tells what processing remains to be done. When the user changes an input, an event is added to the list. RNL sequentially processes the next event on the list, stopping when (1) the list is empty, (2) a node the user is tracing changes value, or (3) the specified simulation time interval has elapsed. To process an event, RNL removes it from the list, changes the node's state to reflect its new value, and then calculates any new events resulting from the node's new value.

6 Charge sharing refers to the capacitive effects that happen when two or more previously unconnected nodes, each having some charge and capacitance, become connected by a resistor (transistor turning on).
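The event list and its three stopping conditions can be sketched as a time-ordered queue. This is a simplified illustration under stated assumptions: the function `run_events`, its arguments, and the returned reason strings are hypothetical names, and the step where RNL would calculate new events from a node's new value is left as a comment.

```python
import heapq


def run_events(events, traced=(), t_stop=float("inf")):
    """Process (time, node, new_state) events in time order.

    Returns (states, reason), where reason names which of the three
    stopping conditions from the text ended the run.
    """
    queue = list(events)
    heapq.heapify(queue)          # the event list, sorted by time
    states = {}
    while queue:
        if queue[0][0] > t_stop:  # (3) simulation interval has elapsed
            return states, "interval elapsed"
        time, node, new_state = heapq.heappop(queue)
        changed = states.get(node) != new_state
        states[node] = new_state
        # A real simulator would calculate the events caused by this
        # change here and push them onto the queue.
        if changed and node in traced:  # (2) a traced node changed value
            return states, "traced node changed"
    return states, "list empty"         # (1) nothing left to process
```

Calling `run_events([(5, "out", "1"), (1, "a", "1"), (3, "abar", "0")], traced=("abar",))` stops after the second event in time order, leaving the event for `out` unprocessed.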
In calculating new events, first all nodes that might be affected by the change are found and marked. This includes the source and drain of all transistors for which the current node is the gate and all nodes connected to these nodes through turned on transistors. The search through the network stops when a non-conducting transistor or an input is reached. For each marked node, two calculations are made. First, a charge sharing calculation is performed to model changes of state due to the charging and discharging of node capacitances. Second, a final value calculation is done to determine the node's ultimate logical state.
A given node can have only two events pending: (1) a charge sharing event describing an immediate change in the node's state due to charge redistribution among the nodes on the connection list, and (2) a final value event describing the final, driven state of the node. RNL observes the following rules for processing events: (1) when a new charge sharing event is scheduled, throw away all previously pending events for the node, and (2) when a new final value event is calculated, it will be ignored if (a) there is a pending final event for the same value which is scheduled to occur sooner, (b) there is a pending charge sharing event for the same value as the new final event, or (c) there is no charge sharing event and the new final value event is the same as the node's current value. These rules are based on the assumption that the event that was last calculated reflects the latest configuration of the network and therefore should override any events calculated earlier. Charge sharing events discard any pending final value events because a charge sharing calculation is immediately followed by a new final value calculation.
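The two scheduling rules can be written out as a short sketch that keeps at most one charge sharing event and one final value event per node. The dictionary layout and the function names `schedule_charge_sharing` and `schedule_final` are assumptions made for this illustration; RNL's internal data structures are not described in the text.

```python
def schedule_charge_sharing(pending, value, time):
    """Rule (1): a new charge sharing event discards all pending events."""
    pending.clear()
    pending["charge"] = (value, time)


def schedule_final(pending, current_value, value, time):
    """Rule (2): return True if the new final value event is kept."""
    final = pending.get("final")
    charge = pending.get("charge")
    if final is not None and final[0] == value and final[1] < time:
        return False  # (a) same value is already scheduled to occur sooner
    if charge is not None and charge[0] == value:
        return False  # (b) pending charge sharing event gives this value
    if charge is None and value == current_value:
        return False  # (c) no charge sharing event and nothing would change
    pending["final"] = (value, time)
    return True
```

Rule (1) is the one that causes trouble in the circuits discussed next: a charge sharing event produced by one input wipes out a final value event that another input had already scheduled.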
These event rules, however, sometimes lead RNL to generate incorrect results. This is especially true of signal driven circuits (circuits where inputs are applied to the source and drain of a transistor as well as its gate) and circuits that depend on the analog properties of the devices to predict the behavior of the circuit. For example, consider the first exclusive OR gate design for the pipelined adder in Figure 3.1 [Ref. 6]. This design has proven to function correctly at CMU; however, the RNL simulation shows this circuit failing. Starting in a state where A=0, B=1, and out=1, assume that the input A then transitions to 1. Initially Q1, Q3, Q4, and Q6 are on. When input A goes high, Q3 is turned off (no events generated) and Q2 is turned on, generating a charge sharing event and a final value event for Abar resulting in Abar going low. When Abar goes low, the still turned on Q6 is now trying to drive the output node low and the still turned on Q4 (RNL recognizes that it takes a finite amount of time for Q4 to turn off but does not recognize that n-channel transistors do not conduct high voltages well) is still trying to drive the output node high. The result is an output of X, the undefined state. Next, Q4 is turned off. Since turning off Q4 adds no new nodes to the network, the event list is empty and the output remains at X.

Figure 3.1 CMOS Exclusive OR

The primary difficulty RNL has with this circuit centers around the fact that the output node is controlled by two nodes that can change at different times. As a result, a charge sharing event due to one input can eliminate a final value event of the other, with that final value event being the force which determines the circuit's actual behavior.
The circuit of Figure 3.2 is a proven latch design which also fails in RNL simulation. In Figure 3.2 the fractions next to the transistors represent the length to width ratios of the devices. This circuit is dependent on these ratios for proper operation. These ratios insure that the gain of the input signal on the gates of Q5 and Q6 is sufficiently greater than the gain of the feedback signal to the same gates to cause the gates of Q5 and Q6 to be at either logical 1 or 0 when the input signal is the opposite of the feedback signal. RNL does not recognize the difference in these gains. As a result, the circuit becomes locked up at X. Because of RNL's difficulty with these two circuits, other designs (see chapter 5) were employed in the final adder to facilitate testing of the overall design.
To use RNL as installed at NPS, the following steps should be followed. First label the circuit and generate basename.cif as before. Again the program Mextra is used to extract the circuit, this time with the -o option (mextra basename -o). The -o option causes Mextra not to compute capacitances. A follow on program in this sequence, Presim, performs this computation with greater accuracy. It should be noted that there are three different circuit extraction programs, each named Mextra. There is the MIT version, the UCB version and the UW modified UCB version. The next tool to be used in the sequence, Presim, can accept the output format of the MIT version and the UW modified UCB version. At NPS, the UCB version is installed and was used. The MIT and UW modified UCB versions differ in the order of the parameters in a transistor specification. Professor Annaratone at CMU developed a program, cformat, to change a sim file generated by the UCB version to the MIT format. However, cformat does not work if the -o option is used with Mextra. To avoid a loss of accuracy, the .sim file can manually be changed to the UW modified UCB format.

Figure 3.2 CMOS Latch Design [Ref. 6].
The first step in this format change is to use the vi text editor to add "format: UCB" to the header line of basename.sim. The other change that needs to be made is to change the labels for the n-channel transistors from "n" to "e". Using the EX editor, the following steps accomplish this:
% ex basename.sim    - invokes the editor
:g/^n/s//e/g         - make global change: for all n as first char in a line, change to e
:w                   - write back edited file
:q                   - exit editor
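The same two edits can be expressed as a short script. This is a sketch under stated assumptions, not a documented tool: the function name `to_uw_format` is hypothetical, and appending "format: UCB" to the end of the header line is a guess at where the text belongs, since the thesis only says it is added to that line.

```python
def to_uw_format(lines):
    """Return a copy of the .sim lines with the two manual edits applied."""
    if not lines:
        return []
    # Assumption: the format tag goes at the end of the header line.
    out = [lines[0].rstrip("\n") + " format: UCB"]
    for line in lines[1:]:
        line = line.rstrip("\n")
        # Same effect as the ex command g/^n/s//e/g: a leading "n"
        # (n-channel transistor label) becomes "e".
        out.append("e" + line[1:] if line.startswith("n") else line)
    return out
```

The transistor lines used to try this out are made up for illustration; only the leading letter matters to the conversion.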
The next step is to create a binary file for RNL from basename.sim using Presim. This is done by issuing the command
% presim basename.sim basename config
Basename.sim is the edited .sim file and basename is the file into which presim writes its binary output. Config is the calibration file used to select other than default values for the circuit element capacitance and resistance. A copy of the presim user's guide from the UW/NW VLSI Consortium release 2.0 and the calibration file used in simulating the adder are contained in Appendix C. The values used in the calibration file are taken from the MOSIS supplied electrical parameters.
The final step is to run RNL itself. This is done by entering one of the following two Unix commands:
% rnl
or
% rnl cmdfile
where cmdfile is the name of a file containing a sequence of RNL commands. Entering the first Unix command will cause RNL to take its commands directly from the console interactively. If the second Unix command is used, specifying a command file, RNL first executes all the commands in cmdfile and upon completion, starts taking commands from the console. In either case, RNL should be given the following commands:
(load "uwstd.l")
(load "uwsim.l")
(read-network "basename")
where basename is the file generated by presim. The first two commands load RNL with several macros which simplify user interfacing with RNL.
The user interface with RNL is a LISP interpreter. The interpreter continuously executes the loop: (1) read a command, (2) evaluate the command and perform the specified actions, and (3) print the result. There are two formats for specifying commands to this loop. The first is:
(function argument argument ... argument)
Here the parentheses delimit the command and spaces separate the elements. The interpreter reads the entire command, up to the closing parenthesis, then the first element is interpreted as a function and all the others as arguments. The arguments may be of the same command form, (function arg arg ... arg). If the following command were issued to RNL,
(* 12 (+ 2 2) (/ 14 7))
RNL would respond by typing 96 (12*4*2). The other format for commands to RNL is
(function '(argument argument ... argument))
where the "'" indicates the quote special form which keeps its argument from being evaluated. For example, (+ 2 3) evaluates to 5, but '(+ 2 3) is a string of three elements. When this second RNL command format is not used to represent an argument of another command (i.e. is not contained within the parentheses of another command), it may be written in the more natural form:
function argument argument .... <newline>
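The evaluate step of the loop described above can be illustrated with a few lines of Python operating on nested tuples in place of parenthesized text. This is only a sketch of the evaluation rule: the names `FUNCTIONS` and `evaluate`, the tuple encoding, and the use of integer division for "/" are assumptions for the example and say nothing about how RNL's interpreter is built.

```python
from functools import reduce
import operator

# Assumed mapping from command names to two-argument Python operations.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}


def evaluate(form):
    """Evaluate a command written as a nested tuple."""
    if not isinstance(form, tuple):
        return form            # a plain value evaluates to itself
    head, *args = form
    if head == "quote":
        return args[0]         # quoted argument is returned unevaluated
    values = [evaluate(arg) for arg in args]  # arguments may be commands
    return reduce(FUNCTIONS[head], values)
```

Here `evaluate(("*", 12, ("+", 2, 2), ("/", 14, 7)))` gives 96, while `evaluate(("quote", ("+", 2, 3)))` returns the three elements untouched.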
Tutorials on RNL are contained in the University of Washington/Northwest VLSI Consortium's VLSI Design Tools Reference Manual [Ref. 11].

There are two points concerning the Mextra, Presim, RNL simulation cycle a user should be aware of that are not brought out in the documentation. The first concerns the use of vectors in RNL commands. As evidenced in the tutorials of Reference 11 and the adder simulation results in Appendix D, vectors can be used to make the input and output of RNL less cumbersome and verbose. After the vector has been defined, a user will then want to assign values to it. The documentation shows the format of the vector value assignment command to be:
(invec '(vecname values))
However, the "values" field has its own specific format. The first character should be a 0 or a 1 indicating positive and negative numbers, respectively. The LISP interpreter will work with negative numbers but RNL will not accept negative numbers as logical inputs. The second character is a letter specifying the number base of the input vector (b for binary, h for hexadecimal). For example, to assign the binary value +101010 to the vector vectone, the RNL command would be:
(invec '(vectone 0b101010))

The other point concerns the location of input labels on the input pads. When the entire chip is being simulated, the input labels are normally placed on the metal pads where the off chip leads are attached. Before an input signal from a bonding pad reaches the interior circuits of a chip it must pass through a resistor in an overvoltage protection circuit. In the extraction and simulation process this resistor is viewed as an open circuit. Therefore, on input pads, the input label must be placed after the resistor in the signal path.
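Returning to the first point, the "values" field format (sign character, base letter, digits) is easy to get wrong by hand, so a small helper that builds the command text may be useful. The helper names `vector_value` and `invec_command` are hypothetical, and the use of lower case hexadecimal digits is an assumption, since the text gives only a binary example.

```python
def vector_value(value, base="b"):
    """Build the values field: sign character, base letter, digits."""
    if value < 0:
        raise ValueError("RNL does not accept negative logical inputs")
    if base not in ("b", "h"):
        raise ValueError("base must be 'b' (binary) or 'h' (hexadecimal)")
    digits = format(value, "b" if base == "b" else "x")
    return "0" + base + digits  # leading 0 marks a positive number


def invec_command(vecname, value, base="b"):
    """Return the text of an invec command for the given vector."""
    return "(invec '(" + vecname + " " + vector_value(value, base) + "))"
```

For the example in the text, `invec_command("vectone", 0b101010)` produces `(invec '(vectone 0b101010))`.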
With Caesar, Lyra, and RNL, a designer at NPS has the requisite CAD tools for the complete logical circuit design loop. With these tools circuits that are free of design rule errors and produce the desired logical results can be designed. The lack of SPICE somewhat restricts the designer's ability to optimize speed, but there are several design techniques that can be employed to design chips that run fast. These will be covered in the next chapter.
IV. DESIGN OF THE ADDER
As stated in the introduction, the primary goals of the adder design are to maximize throughput and to provide for testability. The adder is to be a pipelined adder. Every clock cycle it should accept as inputs two 16-bit addends (A1, the least significant bit, through A16 and B1, the least significant bit, through B16) and one carry-in (Cin) bit. It is desired to produce the 16-bit sum (S1, the least significant bit, through S16) and the carry-out (Cout) bit as quickly as possible. Both the number of clock cycles from input of the addends to the output of the sum and the duration of each clock cycle are to be minimized. A secondary consideration in the design is expandability. An expandable design is one that can easily be extended to produce a 32-bit or 64-bit sum utilizing the same circuit structures.

In this chapter the logical design and layout design of the 16-bit adder will be presented. The equations presented in this chapter are taken or derived from equations found in chapters three through six of The Logic of Computer Arithmetic by Flores [Ref. 12]. In these equations concatenation implies the logical AND, the symbol + implies the logical OR, and the symbol ⊕ implies the logical XOR.
A. LOGICAL DESIGN
In considering the speed spectrum of adders from a logical standpoint, at the fast end there is the table look-up. With 33 binary inputs and 17 outputs, this would require an address space of $2^{33}$ 17-bit words. With current technology this is not feasible. At the other end of the spectrum is the serial adder. On clock cycle 1 it uses A1, B1, and Cin to produce S1 and C1out (carry out of bit one into bit 2). On clock cycle 2 it uses A2, B2, and C1out to generate S2 and C2out. Here 16 clock cycles elapse before the sum is available. An adder can also be implemented as a ripple carry adder where the duration of each clock pulse is sufficient to allow a carry into the sum to propagate all the way through to a carry out. In the case of the 16-bit adder, this would require a clock duration at least sixteen times the length of the gate delay of the one bit adder [Ref. 3]. The middle ground belongs to the carry look-ahead (CLA) adder. In carry look-ahead addition the carry into each bit position, C(i), is generated from the propagate, P(i), and generate, G(i), primitives.

$P(i) = A(i) \oplus B(i)$    (eqn 4.1)
$G(i) = A(i)\,B(i)$    (eqn 4.2)

P(i)=1 implies that a carry into bit(i) will be propagated through to bit (i+1). G(i)=1 implies that A(i) and B(i) will provide a carry into bit (i+1) regardless of the contents of the less significant bits of the sum.

$C(i) = G(i-1) + G(i-2)\,P(i-1) + \cdots + C_{in}\,P(i-1) \cdots P(2)\,P(1)$    (eqn 4.3)
$S(i) = C(i) \oplus P(i)$    (eqn 4.4)

The algorithm for the CLA sum generation is as follows. The first event is the evaluation of equations 4.1 and 4.2 to generate the P(i) and G(i) primitives. The second event uses the P(i) and G(i) primitives as inputs to equation 4.3 to generate the C(i)'s. The final event is the computation of the S(i)'s from equation 4.4.
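The three events of the CLA algorithm can be checked with a short bit-level model. This is a sketch for verifying the logic of equations 4.1 through 4.4, not a description of the circuit: the function name `cla_add` and the list-of-bits representation are assumptions, and list index 0 stands for bit 1 of the text.

```python
def cla_add(a, b, cin=0, width=16):
    """Add two unsigned integers with carry look-ahead logic.

    Returns (sum, carry_out). Index 0 of each list is bit 1 in the text.
    """
    A = [(a >> i) & 1 for i in range(width)]
    B = [(b >> i) & 1 for i in range(width)]

    # First event: propagate and generate primitives (eqns 4.1 and 4.2).
    P = [x ^ y for x, y in zip(A, B)]
    G = [x & y for x, y in zip(A, B)]

    # Second event: every carry comes straight from the primitives
    # (eqn 4.3), never from a previously computed carry.
    C = []
    for i in range(width + 1):
        carry = 0
        for j in range(i):
            term = G[j]                 # G term ANDed with higher P's
            for k in range(j + 1, i):
                term &= P[k]
            carry |= term
        term = cin                      # Cin ANDed with every lower P
        for k in range(i):
            term &= P[k]
        C.append(carry | term)

    # Final event: sum bits (eqn 4.4).
    S = [C[i] ^ P[i] for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(S))
    return total, C[width]
```

In this model the lowest propagate bit takes part in every carry above it, which is the fanout concern raised for the zero level design later in the chapter.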
As pointed out by Flores [Ref. 12] and by Conradi and Hauenstein [Ref. 3], there are several logical implementations of carry look ahead addition. A principal task of this thesis investigation was to select a fast logical design. Without the circuit simulator Spice, the analysis of each design considered was more qualitative than quantitative. In this qualitative analysis, a turned on transistor is considered as a resistor with its resistance proportional to its length and inversely proportional to its width. All gates driven by such a turned on transistor are considered to be capacitive loads with capacitance proportional to the area of the gate. The interconnect wiring is considered to add both parallel capacitive loading and series resistance as shown in Figure 4.1.

Figure 4.1 CMOS Output Loading Model.
From this model it is obvious that the amount of interconnect wiring and the number of gates driven (fanout) should be minimized to minimize the output transition time when the positions of switches S1 and S2 of Figure 4.1 are reversed. This led to the following guidelines in the design of the adder:
1) The internal logic of each stage should be accomplished with minimum dimension transistors, 3 microns x 4 microns (length x width). This leads to more compact circuits with shorter interconnections and reduces the capacitive load on the preceding stage.
2) Significantly wider transistors (3-micron x 9-micron) should be used at the output of each stage where the fanout and interconnect loading is greater.
3) The fanout of any transistor should be kept to less than five. This requires a more complete definition of fanout because the capacitive loading of a gate depends on its area. A 3-micron x 4-micron transistor driving six other 3-micron x 4-micron transistors has a fanout of six. A 3-micron x 8-micron transistor driving the same load is considered to have a fanout of three. Though this implies that a high fanout problem can be solved by merely increasing the width of the driving transistor, it neglects the effects of the interconnect wiring. As gates are added to the load of a transistor, each subsequent addition must be more remote from the driving transistor. Since the resistance of the wiring is proportional to its length and inversely proportional to its width, the resistance of the wiring will increase unless the width is also increased. However, since the capacitance of the wiring is proportional to its area, most of the gain achieved by widening the wire to reduce resistance is offset by the increase in capacitance. As a result, in the design of the adder, increasing the width of the driving transistor was not viewed as a complete fix for a fanout problem.
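The area-based definition of fanout in guideline 3 can be turned into a small calculation. This is one possible reading, offered as an assumption rather than the author's formula: the load is counted in units of a minimum 3 x 4 micron gate, and the driver is credited according to its length to width ratio, following the resistor model used in the qualitative analysis. The name `effective_fanout` is hypothetical, and wiring effects are ignored.

```python
MIN_LENGTH = 3.0  # microns, minimum transistor length in the guidelines
MIN_WIDTH = 4.0   # microns, minimum transistor width in the guidelines


def effective_fanout(driver, loads):
    """Estimate fanout from (length, width) pairs given in microns.

    Load capacitance is taken as proportional to total gate area and
    driver resistance as proportional to length / width.
    """
    load_area = sum(length * width for length, width in loads)
    unit_area = MIN_LENGTH * MIN_WIDTH
    d_length, d_width = driver
    relative_resistance = (d_length / d_width) / (MIN_LENGTH / MIN_WIDTH)
    return (load_area / unit_area) * relative_resistance
```

With six minimum-size gates as the load, a 3 x 4 driver gives 6 and a 3 x 8 driver gives 3, matching the two cases quoted in guideline 3.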
For the comparison of the different approaches to CLA addition, the term logical event needs to be defined. The most basic definition is a combinational logic circuit accepting a set of inputs, performing its specified operations on those inputs and generating a set of outputs. Therefore, the input of the addends, followed by the computation and output of the sum can be considered as a logical event. However, a primary design consideration for the adder is to provide for testability and a key element of this provision is the availability of intermediate results (see section B of this chapter). This implies breaking up the sum generation into several separate events. The first event takes the addends as inputs, performs some logic operation(s) on them and stores the results in a register. The next event takes its inputs from that register and stores its results in another register. This chain continues until the last event deposits the sum on the output pads of the chip. To provide the tester with easily interpreted intermediate results, the equations presented in this chapter were taken as boundaries for each logical event. The terms on the right side of the equation determine the inputs and the left side terms determine the output of a logical event. Once all the inputs for an equation are generated by previous events, the logic of the equation becomes part of the current event.
1. Zero Level CLA Logic
This logic requires three events to generate the sum. First, equations 4.1 and 4.2 are used to generate the P(i)'s and G(i)'s. Second, from equation 4.3 the C(i)'s are generated. Finally, the sum is derived from equation 4.4. The principal problem with this approach for a sixteen-bit adder lies in the application of equation 4.3. Here, the input P(1) has a fanout of 15, which makes this approach unsatisfactory.
2. First Level CLA Logic
Noting that a four-bit sum generated using zero level CLA logic is within the design guidelines suggests cascading 4-bit slices of the same logic as indicated in Table 2. Here the sum is available after six events and the fanout is reduced by a factor of four.

TABLE 2
First Level CLA Logic for a 16-bit Sum

Event No.   Bits 1-4             Bits 5-8             Bits 9-12            Bits 13-16
1           Compute P(i),G(i)    Compute P(i),G(i)    Compute P(i),G(i)    Compute P(i),G(i)
2           Compute C(i)         Delay P(i),G(i)      Delay P(i),G(i)      Delay P(i),G(i)
3           Compute S(i)         Compute C(i)         Delay P(i),G(i)      Delay P(i),G(i)
4           Delay S(i)           Compute S(i)         Compute C(i)         Delay P(i),G(i)
5           Delay S(i)           Delay S(i)           Compute S(i)         Compute C(i)
6           Delay S(i)           Delay S(i)           Delay S(i)           Compute S(i)

The event cycle time reduction would more than make up for the event count increase since cycle time grows faster than linearly with fanout. The only drawback with this design lies in the cost of extending it to generate 32-bit or 64-bit sums. For every 4-bit slice added, another event is required. Thus, a 64-bit add would require 18 events.
3. Second Level CLA Logic
Again the data is divided into 4-bit slices called blocks. But rather than let the carries ripple through the blocks, two new primitive functions are introduced.
They are the block propagate, BP(i), and block generate, BG(i), functions. BP(i)=1 implies that a carry into block (i) will be propagated through to block (i+1). BG(i)=1 implies that block (i) will generate a carry into block (i+1). For a 4-bit block where bit(1) is the least significant bit, the BP and BG primitives are generated by equations 4.5 and 4.6 respectively, with the P(i)'s and G(i)'s computed as before.

$BP(i) = P(1)\,P(2)\,P(3)\,P(4)$    (eqn 4.5)
$BG(i) = G(4) + G(3)\,P(4) + G(2)\,P(3)\,P(4) + G(1)\,P(2)\,P(3)\,P(4)$    (eqn 4.6)

Next, the block carry, BC(i), which represents the carry from block (i) into block (i+1), is computed using equation 4.7 which represents the same logic as equation 4.3.

$BC(i) = BG(i) + BG(i-1)\,BP(i) + \cdots + C_{in}\,BP(1) \cdots BP(i)$    (eqn 4.7)

So far, after three events, the P(i)'s, G(i)'s, BP(i)'s, BG(i)'s, and BC(i)'s have been generated. If the same method of generating the final sum as used in zero level CLA were to be used, two additional events would be required. The first again applies the logic of equation 4.3 to each 4-bit block to generate the carry into each bit. Here the Cin for block (i) is given by BC(i-1). The second cycle is used to generate the sum from the C(i)'s and P(i)'s. One of these events can be eliminated if, while the BC(i)'s and their predecessors are being computed, an estimated sum of the 4-bit block is also computed. One method is to compute two estimated sums for each block, one assuming a carry into the block of 0 and the other assuming a carry in of 1. When the correct carry in for block (i) is generated, it is used to multiplex the correct sum for the block to the output. This assumed carry method was rejected because of the large amount of area consumed by the registers needed to hold two possible answers. The second method is to compute the estimated sum of the block assuming a carry-in of 0 and then correcting the estimated sum once the actual carry-in to each block is known.
Since the estimated sum, ES(i), is not needed until after the third event and computing it as one event again leads to fanout problems, the computation of ES(4), the most significant bit, through ES(1) is computed in two events as follows. First, an intermediate estimated sum, IES(i), is computed using two-bit slices, each assuming a carry in of 0 (see equations 4.8 through 4.11). At the same time, a carry from bit (2) into bit (3) (IC23) is computed using equation 4.12. On the next event, ES(i) is computed from the IES(i)'s and IC23 using equations 4.13 through 4.16.

$IES(1) = P(1)$    (eqn 4.8)
$IES(2) = P(2) \oplus G(1)$    (eqn 4.9)
$IES(3) = P(3)$    (eqn 4.10)
$IES(4) = P(4) \oplus G(3)$    (eqn 4.11)
$IC23 = G(2) + G(1)\,P(2)$    (eqn 4.12)
$ES(1) = IES(1)$    (eqn 4.13)
$ES(2) = IES(2)$    (eqn 4.14)
$ES(3) = IC23 \oplus IES(3)$    (eqn 4.15)
$ES(4) = [IES(3)\,IC23] \oplus IES(4)$    (eqn 4.16)
estimated sums
three events,
after
Now,
4-bit block and the actual carry into each block
tions 4.17 through 4.20
SW =
S {1) =
S H) -
can easily
be extended
[c^ES^QESp
\c, ni
ES (i) ES
BP and
primitives,
third level
^QES
{
(egn 4. 18)
(egn 4.19)
{i)
(egn
c ,nh ES {l) ES [2) ES (S) (~)ES {i
logic,
a cd
primitives represent the carry
is
this design
64-bit sums.
4.6 which produced
BG can be
4.20)
16-bit sum
generation of
to the
B3P
the
Additionally,
events.
The logic of equations 4.5 and
level primitives
(egn 4. 17)
C<niQ ES {l)
level CIA
generated in only four
are
.
s [i) =
Using second
(Cinb)
can be computed using equa-
From these the sum
available.
for each
the second
used again
to generate
These
third level
33G.
propagate and carry generate
properties of
16-bit slices.
The carry into each 16-bit
block is provided by implementing equation 4.7 .
Thus,
adding one
event will provide the
16-bit blocks of
a
6
4-bit sum.
carry into each
of four
The logic of equation 4.3 is
then used to generate the carry into each 4-bit block of the
sum and
the final
sum is computed
as before.
result is that by adding two events, for
using the same logic as before
be
designed),
(i.e.
the 16-bit adder can
adder.
52
a
The final
total of six, and
no new circuits need to
be extended to a 64-bit
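The estimated-sum scheme of equations 4.8 through 4.20 can be checked numerically. The sketch below is not taken from the thesis; it assumes the usual bit-level definitions P(i) = A(i) XOR B(i) and G(i) = A(i) AND B(i), and the function name block_sum is hypothetical. It evaluates one 4-bit block exactly as the equations are written, with bit 1 as the least significant bit.

```python
def block_sum(a: int, b: int, cinb: int) -> int:
    """Return the 4-bit sum of one block using eqns 4.8-4.20.

    a, b are 4-bit addend slices, cinb is the actual carry into the block.
    Assumption: P(i) = A(i) xor B(i), G(i) = A(i) and B(i).
    """
    bit = lambda word, i: (word >> (i - 1)) & 1  # bit 1 is the LSB
    p = {i: bit(a, i) ^ bit(b, i) for i in range(1, 5)}
    g = {i: bit(a, i) & bit(b, i) for i in range(1, 5)}

    # Intermediate estimated sums from two-bit slices (eqns 4.8-4.11)
    ies = {1: p[1], 2: p[2] ^ g[1], 3: p[3], 4: p[4] ^ g[3]}
    # Carry from bit 2 into bit 3 (eqn 4.12)
    ic23 = g[2] | (g[1] & p[2])

    # Estimated sum assuming a block carry-in of 0 (eqns 4.13-4.16)
    es = {1: ies[1], 2: ies[2], 3: ic23 ^ ies[3],
          4: (ies[3] & ic23) ^ ies[4]}

    # Correct the estimate with the actual block carry-in (eqns 4.17-4.20)
    s = {1: cinb ^ es[1],
         2: (cinb & es[1]) ^ es[2],
         3: (cinb & es[1] & es[2]) ^ es[3],
         4: (cinb & es[1] & es[2] & es[3]) ^ es[4]}
    return sum(s[i] << (i - 1) for i in range(1, 5))


if __name__ == "__main__":
    # Exhaustive comparison against ordinary integer addition modulo 16
    ok = all(block_sum(a, b, c) == (a + b + c) & 0xF
             for a in range(16) for b in range(16) for c in (0, 1))
    print("equations agree with integer addition:", ok)
```

The correction step behaves as an incrementer: when the block carry-in is 1, each estimated sum bit is inverted only if all lower estimated bits are 1.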
B.  DESIGN FOR TESTABILITY

Another primary objective of the adder design was to provide for testability, that is, the ability to logically detect fabrication errors or circuit malfunctions rather than visually searching for faults with a microscope. As the complexity of integrated circuits has grown, the ability to logically detect faults using only the normally available inputs and outputs of a chip has decreased markedly. As complexity increases, the number of likely faults to be tested for and the number of input vectors required to isolate a specific fault grow rapidly. Unless a design technique is used which allows the tester to examine the interior logic, the order of magnitude of the number of input vectors required to perform useful logical testing is prohibitive. Thus, if logical testability is desired, a design technique that provides for it must be used.
One such design technique is level sensitive scan design (LSSD) [Ref. 13]. Level sensitive implies that the output of any logic element is dependent only on the levels of its inputs. No logic elements are allowed to depend on a transition such as in an edge triggered flip flop. Scan design implies that all memory elements in the design are to have an auxiliary function where their contents are serially fed to an output pad for examination. This gives a tester the ability to examine intermediate results.

In applying the LSSD technique to the adder design, the following steps were taken. First, all circuits were designed to respond to the level of their inputs and not to require a transition to trigger their operation. Second, to ensure that each logic event worked only with stable, non-fluctuating input levels, the inputs to each event were gated. The input gates were opened only after the inputs were stable (i.e. the outputs of the previous event were stable) and closed before the input gates of the previous event were opened. Third, a dual mode latch was used to store the output of each logic event. In the normal mode of operation, the register latches the outputs of one logic event in parallel and stores them to be used as inputs for the next logic event. In its secondary mode of operation, the register stops taking its parallel inputs and starts to run as a shift register, shifting its contents onto an output pad.

One of the consequences of using the LSSD technique is the large amount of area consumed by the dual mode registers. In high speed operation, an inverter pair would be sufficient to store inter-event results. But to permit low speed testing where the capacitance of a gate may discharge during one clock phase, and provide the dual mode feature, a pair of clocked latches with control circuits is required.
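The two operating modes of such a register can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name DualModeRegister and the helper scan_out are hypothetical, one call to clock() stands for one full clock cycle, the serial output is taken from the rightmost latch, and the shift-in of the leftmost latch is held at 0. It does not model the two clock phases or any circuit-level behavior.

```python
class DualModeRegister:
    """Behavioral sketch of a dual mode (parallel load / scan) register.

    Index 0 of self.bits is the leftmost latch; the serial output is the
    rightmost latch. Timing within a clock cycle is not modeled.
    """

    def __init__(self, width: int) -> None:
        self.bits = [0] * width

    def clock(self, control: int, parallel_in=None) -> None:
        if control == 0:
            # Normal mode: latch the outputs of the preceding logic event
            if parallel_in is None or len(parallel_in) != len(self.bits):
                raise ValueError("parallel_in must match the register width")
            self.bits = list(parallel_in)
        else:
            # Shift mode: each latch takes the value of its left neighbor;
            # the leftmost shift-in is assumed to be tied low
            self.bits = [0] + self.bits[:-1]

    @property
    def shift_out(self) -> int:
        return self.bits[-1]


def scan_out(reg: DualModeRegister) -> list:
    """Hold control high and read the register contents serially."""
    observed = []
    for _ in range(len(reg.bits)):
        observed.append(reg.shift_out)
        reg.clock(control=1)
    return observed


if __name__ == "__main__":
    reg = DualModeRegister(4)
    reg.clock(control=0, parallel_in=[1, 0, 1, 1])
    print(scan_out(reg))  # rightmost bit appears first
```

Reading the contents this way is destructive in the model: after a full scan the register holds only the zeros shifted in from the left.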
C.  LAYOUT DESIGN

With the logic decided upon, the next step was to create the layout of the adder. The logic consisted of four events to produce the sum. Another event was needed to latch the input data onto the chip. A two-phase clock was needed to ensure that two adjacent events did not run simultaneously (ensuring stable inputs to each event). To make the output compatible with the input to another adder, a one event delay was added. This ensures that the output of one adder does not change while a second adder is using the sum from the first as an input. With two 16-bit addend inputs, one carry-in input, one power supply (Vdd) input, one reference (GND) input, a 16-bit sum output, one carry-out output, and two clock inputs, ten pads were left from a standard 64-pin chip for register mode control input and register (shift mode) output. Since the design of the adder called for five registers, one for latching the input data and one for each logic event, five pads were used for input of the register mode control signals and five were used for the registers to serially output their contents. With the required inputs and output identified, the preliminary floor plan shown in Figure 4.2 was created.
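The pad budget described above is simple arithmetic, reproduced here as a short check. The dictionary keys are descriptive labels chosen for this sketch, not signal names from the layout.

```python
PACKAGE_PINS = 64  # standard 64-pin chip

# Pads committed to the adder function itself
functional_pads = {
    "addend A": 16,
    "addend B": 16,
    "carry-in": 1,
    "Vdd": 1,
    "GND": 1,
    "sum": 16,
    "carry-out": 1,
    "clock phases": 2,
}

remaining = PACKAGE_PINS - sum(functional_pads.values())

# One input register plus one register per logic event
registers = 1 + 4
scan_pads = {"mode control inputs": registers, "serial outputs": registers}

if __name__ == "__main__":
    print("pads left for testability:", remaining)
    print("pads used by scan:", sum(scan_pads.values()))
```

The ten leftover pads are exactly consumed by one control input and one serial output per register.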
[Figure 4.2: floorplan diagram. Labels recoverable from the figure: Input B1 - B16, phi1 in, phi2 in, Cin; Event 2: compute P, G (phi2); Event 3: compute BP, BG, IES, IC23 (phi1); Event 4: compute BC, ES (phi2); Event 5: compute sum (phi1 & 2) and delay until phi2; Cout, S1 out, S2 out, Output S3 - S16.]

Figure 4.2  Preliminary Chip Floorplan.
The first circuit designed was the dual mode latch of Figure 4.3. Here the circuit is designed to latch the IN level when Control is low (its complement is high) and phi1 is high (its complement is low). When phi1 goes low, a copy of the input is also stored in the second latch and becomes available at shift-out, which is connected to shift-in of the next latch. When Control goes high, the IN signal is blocked and the latch takes its input from the register to the left. The shift-in of the leftmost latch in a register is tied to ground. Versatec plots of the actual layouts of this dual mode latch and the other circuits described in this section are given in Appendix E.

Figure 4.3  Dual Mode Latch.
The AND gate used was constructed from a NAND gate followed by an inverter as shown in Figure 4.4. Similarly, the OR gate was constructed from a NOR gate followed by an inverter (see Figure 4.5). Although logic implemented using these AND and OR gates is more area consuming than the same logic implemented in NAND and NOR gates only, the penalty is not severe because they were used infrequently in the final design.

Figure 4.4  AND Gate.

Figure 4.5  OR Gate.

The exclusive OR gate (XOR) was constructed from two inverters and three NAND gates as shown in Figure 4.6. Though this design is considerably more area consuming than the XOR gate of Figure 3.1, it was selected because the RNL circuit simulator could correctly model its operation.

Figure 4.6  Exclusive OR Gate.
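One arrangement of two inverters and three NAND gates that yields the exclusive OR function is sketched below. The figure itself is not reproduced in this transcript, so the wiring shown is an assumption; it is simply a common gate-level form with the stated gate count, and the function names are hypothetical.

```python
def inv(a: int) -> int:
    return 1 - a


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def xor_gate(a: int, b: int) -> int:
    """XOR from two inverters and three NAND gates (assumed wiring)."""
    not_a = inv(a)               # inverter 1
    not_b = inv(b)               # inverter 2
    upper = nand(a, not_b)       # NAND 1: low only when a=1, b=0
    lower = nand(not_a, b)       # NAND 2: low only when a=0, b=1
    return nand(upper, lower)    # NAND 3: high when either NAND is low


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_gate(a, b))
```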
More complex logic functions were implemented using programmed logic arrays (PLA) where the outputs are the logical sum (OR) of the products (AND) of inputs. A single phase design was needed. A PLA designed to compute when phi1 is high, between the time the preceding event had produced stable outputs (phi2 going low) and the time phi1 goes low, had to produce the proper sum-of-products results. To hold down fanout, a dynamic structure was needed so that inputs could be applied to a single type of transistor. To prevent steady state power consumption a precharged dynamic structure was needed. Because of charge sharing, the precharging must take place while the inputs are present on the transistor gates of the PLA (see chapter 5, section C, for a complete explanation of the charge sharing problem in this PLA structure). Thus, two distinct events must occur during this time period. First, the inputs must be applied and precharging must take place. Then evaluation must occur. To cause these two events to occur during a single phase of the clock, the inter-phase time when both phi1 and phi2 are low must be utilized for precharging. The basic structure of the resulting PLA is shown in Figure 4.7.

Figure 4.7  PLA Structure.

Referring back to the floorplan in Figure 4.2, the layouts of the circuits which perform the logic of each event are presented in Appendix E. The names assigned to the layouts are given below. Event 1 consists of a 33-bit dual-mode latch. Event 2, which computes the P and G primitives for each bit, is made up of 16 AND gates, 16 XOR gates, and another 33-bit latch. Event 3, which computes the BP and BG primitives, the IES(i)'s, and the IC23 for each 4-bit block, is made up of four instances of PLA82 and a 29-bit latch. The circuit PLA82 is an 8-input, 5-product, 2-output PLA. The IES(i)'s and IC23 for each 4-bit block are computed using two XOR gates, one AND gate, and one OR gate. Event 4, which computes the ES(i)'s and BC for each 4-bit block, uses four instances of PLA84 to compute the ES(i)'s, one instance of PLA915 to compute the BC(i)'s, and a 21-bit latch. The circuit PLA915 is a 9-input, 15-product, 5-output PLA and the circuit PLA84 is an 8-input, 7-product, 4-output PLA. Event 5 uses four instances of PLA104 to compute the S(i)'s and a 17-bit latch to store results and provide the added delay (by taking the output from the shift out position, the extra clock cycle of delay is generated). The circuit PLA104 is a 10-input, 14-product, 4-output PLA.

With this design, the input to output latency is three full cycles of a two-phase non-overlapping clock; three cycles of the clock elapse between the time the addends are presented to the chip and the time the sum becomes available at the output. In the first three registers the odd number of bits is due to the need to store the carry-in value until event 4. In the last two registers the odd number of bits is due to the need to store the computed value of carry-out.
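A sum-of-products array of the PLA82 size can be sketched in a few lines. The product terms below are an assumption: equations 4.5 and 4.6 are not part of this section, so the standard carry look-ahead block propagate and block generate expressions are used, which happen to need exactly five product terms over eight inputs. The names PRODUCTS, OUTPUTS, eval_pla and pla82 are hypothetical, and the model is purely logical (no precharge or timing).

```python
# AND plane: each product term lists the inputs that must be 1.
# Assumed terms: BP = P1.P2.P3.P4
#                BG = G4 + P4.G3 + P4.P3.G2 + P4.P3.P2.G1
PRODUCTS = [
    ("P1", "P2", "P3", "P4"),
    ("G4",),
    ("P4", "G3"),
    ("P4", "P3", "G2"),
    ("P4", "P3", "P2", "G1"),
]

# OR plane: each output lists the product terms it sums.
OUTPUTS = {"BP": (0,), "BG": (1, 2, 3, 4)}


def eval_pla(inputs: dict) -> dict:
    """Evaluate the AND plane, then the OR plane."""
    terms = [all(inputs[name] for name in product) for product in PRODUCTS]
    return {out: int(any(terms[i] for i in idx)) for out, idx in OUTPUTS.items()}


def pla82(a: int, b: int) -> dict:
    """Drive the array from a 4-bit slice, assuming P = A xor B, G = A and B."""
    inputs = {}
    for i in range(1, 5):
        abit, bbit = (a >> (i - 1)) & 1, (b >> (i - 1)) & 1
        inputs[f"P{i}"] = abit ^ bbit
        inputs[f"G{i}"] = abit & bbit
    return eval_pla(inputs)


if __name__ == "__main__":
    print(pla82(0b1001, 0b1000))  # this slice generates a carry
    print(pla82(0b1010, 0b0101))  # this slice propagates a carry
```

With these definitions the carry out of a block is BG OR (BP AND carry-in), which is what the exhaustive check in the test below confirms.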
The resulting final layout of Figure 4.8 shows the actual on-chip layout locations of each event's logic. In addition to the logic circuits for each event, the circuits AMP and AMP5 are also seen. These are driver circuits for the high fanout control and clock signals. Each takes as its input a control signal and produces as outputs the control signal and its inverse, both driven by 3-micron x 160-micron transistors. This amplifier is the same design used by the output pads to drive off chip loads.

This final layout represents one implementation of a pipelined CLA adder designed for testability. The relative merits of this design and others that may have been implemented can, as yet, only be qualitatively discussed. The addition of SPICE 2G7 to the CAD toolbag will provide future CMOS designers with the quantitative analysis necessary to make decisions involving tradeoffs among primary design objectives.

Figure 4.8  Final Layout.

This final design, when simulated using RNL, functioned properly at clock speeds up to 14 megahertz. Testing of the actual chips produced by MOSIS should give an indication of the accuracy of RNL's predictions. The following chapter presents a test plan to check for proper operation of the adder at low clock rates and to determine the maximum operating speed.
V.  TEST PLAN

After several iterations of the design-simulate-redesign loop, a final layout was achieved for the 16-bit pipelined adder. These iterations provide considerable confidence in the logical correctness of the layout. Appendix D contains RNL simulation results for the full adder. In reading these results it should be kept in mind that the adder requires three cycles of the two-phase clock to produce the sum. In the first part of the simulation, the inputs were kept constant for three clock cycles to facilitate easier reading of the results. With these steady inputs, simulations were run to verify the generation of correct sums, concentrating on those addends that would produce carry generates and carry propagates across the boundaries of the 4-bit blocks. The last part of the simulation utilized different inputs each clock cycle. This was done to test the pipelining feature of the design, ensuring no dependence on repeated inputs of the addends to produce the proper sum.

After fabrication of the chip, application of similar inputs to make the same determinations for the actual circuits will form the initial portion of the test plan. In this chapter a test plan for the verification of computational correctness and speed will be presented.
A.  INPUTS AND OUTPUTS

The first step in testing the chip will be to connect it to the required input and output circuitry. To accomplish this, the identity of the inputs and outputs on each pin must be determined. Microscopic examination of the chip will reveal the logo "16-bit Add", located between the GND and Vdd buses for the pads in the northeast corner (see Figure 4.8 which is repeated below for convenience). Using this landmark, the signals on the pads can be labeled as follows.

Figure 4.8 (repeated)  Final Layout

The western edge has sixteen input pads for the addend A, with the least significant bit, A(1), located at the northern end. The northern edge of the chip also has sixteen input pads for the addend B, with the least significant bit, B(1), located at the eastern end. The southern edge has fourteen output pads and two input pads. At its western end is the GND input pad followed by fourteen output pads for S(16), the most significant bit of the sum, through S(3). Following S(3) at the eastern end is the input pad for Vdd. The eastern edge of the chip has eight input pads and eight output pads. Starting at the northern end, there are input pads for phi1, phi2, Cin, CON1 (control signal for the dual mode register of event 1), CON2, CON3, CON4, and CON5. They are followed by output pads for SREG1 (serial output from dual mode register of event 1), SREG2, SREG3, SREG4, SREG5, Cout, S(2), and S(1) at the southern end.

To supply power to the chip, +5 volts DC should be applied to the Vdd pad and 0 volts to the GND pad. All logical inputs, including clocks and control signals, should be either GND for a logical 0 or Vdd for a logical 1. Simulation with RNL revealed some restrictions on the clock signals. For proper operation, each clock should remain high for a minimum of 20 nanoseconds and the clock interphase time, when both phi1 and phi2 are low, must be at least 10 nanoseconds in duration. For initial testing, to ensure that charge sharing problems caused by too short an interphase time, and fanout problems caused by too short a clock phase duration, are not interpreted as fabrication errors, the clock speed should be adjusted so that both above clock parameters are exceeded by one order of magnitude.

The outputs, like the inputs, are at Vdd to represent a logical 1 and at GND to represent a logical 0. The circuits used to measure the outputs should have a high input impedance, on the order of one megohm. The output pads of the adder are not designed to handle the current source and sink requirements of transistor-transistor logic integrated circuits. The output measurement circuits should be constructed using NMOS or CMOS devices that are designed to operate between +5 volts DC and ground.
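The initial clock settings follow directly from the two limits quoted above. The sketch below assumes a symmetric two-phase cycle made up of one phi1 high time, one interphase gap, one phi2 high time, and a second interphase gap; that cycle composition and the function name are assumptions for illustration, not taken from the thesis.

```python
MIN_PHASE_NS = 20.0       # minimum time each clock phase must stay high
MIN_INTERPHASE_NS = 10.0  # minimum time with both phases low
DERATING = 10.0           # "one order of magnitude" for initial testing


def clock_period_ns(phase_ns: float, interphase_ns: float) -> float:
    """Period of one full two-phase cycle (assumed symmetric)."""
    return 2.0 * (phase_ns + interphase_ns)


initial_phase_ns = DERATING * MIN_PHASE_NS
initial_interphase_ns = DERATING * MIN_INTERPHASE_NS
initial_period_ns = clock_period_ns(initial_phase_ns, initial_interphase_ns)
initial_rate_mhz = 1000.0 / initial_period_ns

if __name__ == "__main__":
    print(f"phase high time : {initial_phase_ns:.0f} ns")
    print(f"interphase time : {initial_interphase_ns:.0f} ns")
    print(f"clock period    : {initial_period_ns:.0f} ns")
    print(f"clock rate      : {initial_rate_mhz:.2f} MHz")
```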
B.  TESTING FOR CORRECT OPERATION

After connecting the adder to a test harness, the next step is to verify the generation of correct sums by the adder. There are several inputs that should be included in the testing to verify the correct operation of individual circuits. These are contained in Appendix F. In addition to the test vectors of Appendix F, several randomly selected input vectors should be tested. If the adder should fail to generate correct sums, the LSSD features can be employed to examine intermediate results.
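Expected results for randomly selected vectors can be generated with a short reference model. This is a sketch only: the function names are hypothetical, and the three-cycle latency is applied as a simple offset, so that the result for the vector applied in cycle k is expected in cycle k + 3.

```python
import random


def expected_sum(a: int, b: int, cin: int) -> tuple:
    """Reference 16-bit sum and carry-out for one input vector."""
    total = a + b + cin
    return total & 0xFFFF, total >> 16


def random_vectors(count: int, seed: int = 0) -> list:
    """Reproducible random (A, B, Cin) vectors."""
    rng = random.Random(seed)
    return [(rng.getrandbits(16), rng.getrandbits(16), rng.getrandbits(1))
            for _ in range(count)]


def expected_stream(vectors: list, latency: int = 3) -> list:
    """Expected (sum, cout) per clock cycle when one vector is applied per cycle.

    The first `latency` cycles have no valid output and are marked None.
    """
    return [None] * latency + [expected_sum(a, b, c) for a, b, c in vectors]


if __name__ == "__main__":
    vectors = random_vectors(4, seed=1984)
    for cycle, result in enumerate(expected_stream(vectors), start=1):
        print(cycle, result)
```

Fixing the seed keeps a failing vector reproducible between test runs.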
1.  Intermediate results

With the LSSD design, a tester can leave input levels constant for a long period of time and use the shift mode of the internal registers to examine the internal state of the chip. The rightmost bit of each register is always available at the output pad for that register. To obtain the contents of the other bits, the control signal for the given register is set to logical 1 and held high while the clock continues to run. For registers 1, 3, and 5 the serial output will be meaningful and stable while phi2 is high. The serial output of registers 2 and 4 will be stable when phi1 is high. Table 3 lists in order the intermediate values available at the SREG(n) output pad when the input CONn is high.
TABLE 3
Register Serial Outputs

Clock Cycle    SREG1    SREG2    SREG3    SREG4    SREG5
B1
P1
B2
2
3
4
5
6
B3
B4
B5
B6
B7
B8
B9
BP1
IES3
IES4
BG2
IES5
IES6
IC67
BP3
IES11
IES12
BG4
IES13
IES14
IC1415
BG1
IES1
IES2
IC23
BP2
IES7
IES8
BG3
IES9
IES10
IC1011
BP4
IES15
IES16
Cin
Cin
BC2
Cout
ES2
ES4
S.1
1
P2
P3
P4
P5
P6
P7
8
9
B10
10
B1
11
B12
313
314
315
B16
P8
P9
P10
P12
P12
P13
P14
P15
P16
A1
G1
A2
A3
A4
A5
A6
A7
A8
A9
G2
G3
G4
G5
G6
G7
G8
G9
G10
G11
G12
G13
G14
G15
G16
Cin
7
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
1
A10
A11
A12
A13
A14
A15
A16
31
Cin
32
S3
S5
S7
S9
ES6
ES8
S11
S13
S15
ES10
ES12
ES14
ES16
Cout
S2
S4
S6
S8
S10
S12
S14
S16
BC1
BC3
ES1
ES3
ES5
ES7
ES9
ES11
ES13
ES15
33
34
C.  TESTING FOR SPEED OF OPERATION

Once the chips containing fabrication errors have been culled from the chip set returned by MOSIS, the task remaining is to determine just how fast the adder can run. Rather than simply increasing the clock rate until the adder fails, the duration for which phi1 and phi2 are each high and the interphase time should be reduced separately. RNL simulation indicates that the circuit which generates S4 within PLA104 is the limiting circuit for clock phase duration (i.e. it requires the longest time to correctly evaluate its inputs). RNL simulation also indicates that the circuits in PLA104 which generate S1 and S4 are the limiting circuits for the clock interphase duration.

Since the PLA is constructed of precharged dynamic circuits, the evaluation clock phase must be long enough to allow the inputs to drive the outputs to their proper values, even if the inputs are the same as those of the previous evaluation cycle. This allows the tester to use a constant input as the duration of each clock phase is reduced until the adder produces incorrect results.
Determination of the clock interphase duration limit is more difficult. This is because the inputs to a PLA must be changing to cause charge sharing problems to occur.

Figure 5.1  Charge Sharing in a PLA.

For example, in Figure 5.1 assume that the first set of inputs is in1=1, in2=0, and that this is correctly evaluated to produce out=0 when phi1 is high. Now assume that the next input is in1=0 and in2=1, which should also evaluate to out=0. However, if the precharge time (when the inputs are present on the gates of Q2 and Q3 and phi1 is still low) is insufficient, C2 will not be charged to Vdd when precharging ends (C2 was discharged to zero volts during the previous evaluation when in1 was high and phi1 was high). Now, when evaluation begins (phi1 going high) the low voltage across C2 causes Q5 and Q6 to interpret their input as a logical 0. As a result the output of the Q5-Q6 inverter pair goes high, causing Q8 to turn on, discharging C4 and resulting in an output of logical 1, which is incorrect.
Table 4 lists the proper evaluation sequence when precharge time is sufficient and the improper sequence due to insufficient precharge time. In this table, for the inputs, output, and capacitor voltages, a 1 indicates Vdd, a 0 indicates GND, and an X indicates somewhere in between. For the transistors, a 1 indicates on, a 0 indicates off, and an X indicates neither fully on nor fully off.

TABLE 4
PLA Evaluation Sequences

Proper evaluation sequence:
  in    C       Q
  12    1234    1234567890
  10    0011    1100010001
  10    0011    0101011001
  01    0011    0101011001
  01    0111    0011011001
  01    0111    1010010001

Improper evaluation sequence:
  in    C       Q
  12    1234    1234567890
  10    0011    1100010001
  10    0011    0101011001
  01    0011    0101011001
  01    0X11    0011011001
  01    0XX0    1010XX0X10

Subsequent inputs of in1=0 and in2=1 may produce correct results since, with constant inputs, each precharge time will add more charge to C2 until there is sufficient charge to allow the output of the Q5-Q6 inverter to remain low. Thus, to check for charge sharing problems in the circuit of Figure 5.1, the inputs must alternate.
Likewise, in PLA104 to check for charge sharing errors in output S1, its inputs must alternate between BC=0, ES1=0 and BC=1, ES1=1 as the interphase time is reduced. This can be accomplished for all four instances of PLA104 simultaneously by alternating inputs of

A   = 0001 1001 1001 1001
B   = 0000 1000 1000 1000
Cin = 1

and

A   = 0000 0000 0000 0000
B   = 0000 0000 0000 0000
Cin = 0

To check for charge sharing errors in S4, the inputs to PLA104 must cycle between BC=1, ES4=0, ES3=ES2=1, ES1=0 and BC=0, ES4=0, ES3=ES2=ES1=1. This may be accomplished for all four instances of PLA104 simultaneously by alternating inputs of

A   = 0110 1110 1110 1110
B   = 0000 1000 1000 1000
Cin = 1

and

A   = 0111 0111 0111 0111
B   = 0000 0000 0000 0000
Cin = 0
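The effect of these vector pairs on the PLA104 inputs can be checked with a short model. This is a sketch under assumptions: each 4-bit block's estimated sum ES is taken as the block sum with a carry-in of 0, BC is the actual carry into the block, the chip carry-in is assumed to be 1 for the first vector of each pair and 0 for the second, and the function name pla104_inputs is hypothetical.

```python
def pla104_inputs(a: int, b: int, cin: int) -> list:
    """Return (BC, ES) for the four 4-bit blocks, least significant first."""
    blocks = []
    carry = cin
    for k in range(4):
        na = (a >> (4 * k)) & 0xF
        nb = (b >> (4 * k)) & 0xF
        es = (na + nb) & 0xF            # estimated sum, block carry-in of 0
        blocks.append((carry, es))
        carry = (na + nb + carry) >> 4  # actual carry into the next block
    return blocks


# Vector pair for checking output S1 (assumed Cin values: 1, then 0)
S1_FIRST = (0b0001_1001_1001_1001, 0b0000_1000_1000_1000, 1)
S1_SECOND = (0, 0, 0)

# Vector pair for checking output S4 (assumed Cin values: 1, then 0)
S4_FIRST = (0b0110_1110_1110_1110, 0b0000_1000_1000_1000, 1)
S4_SECOND = (0b0111_0111_0111_0111, 0, 0)

if __name__ == "__main__":
    for name, vec in [("S1 first", S1_FIRST), ("S1 second", S1_SECOND),
                      ("S4 first", S4_FIRST), ("S4 second", S4_SECOND)]:
        print(name, [(bc, format(es, "04b")) for bc, es in pla104_inputs(*vec)])
```

Under these assumptions every block sees the same (BC, ES) pattern for a given vector, so all four instances of PLA104 are exercised together.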
This maximum speed testing assumes that RNL has correctly identified the slowest circuits on the chip. RNL simulations have indicated that the next slowest circuit (PLA915) is at least 20% faster than PLA104 (16.0 nsec for PLA915 vs. 20.1 nsec for PLA104). Also, all other PLA's functioned properly with a 5 nsec interphase time. Should PLA104 prove to be the speed limiting circuit for the chip, the actual failure speeds of the chip can serve as an indication of the accuracy of the RNL simulation for future designs.
VI.  CONCLUSIONS

The experience gained in the design of the adder coupled with the clarity of hindsight leads to the following conclusions and recommendations.

A.  THE CMOS TECHNOLOGIES

The CMOS technologies will play a role of steadily increasing importance in the VLSI designs of the future. MOSIS is already offering, on an experimental basis, CMOS Bulk p-well fabrication with a one-micron minimum feature size. A scalable set of design rules, to allow initial fabrication in 3-micron CMOS for design verification before the far more expensive 1-micron process is used, is being developed. In the private sector there is considerable research aimed at finding an insulating substrate material that does not have the variability and thermal problems of sapphire. Progress in this area will remove the drawback caused by latchup tendencies in CMOS Bulk.
B.  CMOS CAD TOOLS

Though the design tools currently available at NPS constitute a complete set for the design of CMOS Bulk p-well circuits, the recent CAD tool set released by the University of Washington/Northwest VLSI Consortium, Release 2.0 [Ref. 11], coupled with University of California at Berkeley Winter 1983 CAD tools, represents a more complete and cohesive set for CMOS design. When sufficient disk space on the VAX 11-780 becomes available to load the Release 2.0 CAD package, implementation of Release 2.0 is highly recommended. An added benefit of installing the Release 2.0 package is the cell library provided. The library contains several basic standard cells with known performance characteristics. The library also contains the standard pad frames used by MOSIS. Though MOSIS does not require the use of standard pad frames on designs submitted, their use does speed up fabrication.

As mentioned earlier, as soon as SPICE 2G7 is available, its addition to the CAD toolbag would be most advantageous to a CMOS designer.
C.  DESIGN OF THE ADDER

If the design of the adder were to be undertaken again, a different approach to generating the sum would probably have been used, especially if the new CAD tools mentioned above were available. The logic approach to the computation would still involve CLA addition, but it would be accomplished using combinational logic and library cells rather than PLA's. Testability would probably suffer greatly, but effort would be made to reduce the sum generation to two logical events. Though the level of testability provided by the current design should provide considerable insight into CMOS Bulk p-well performance and CAD tool accuracy, there would be no need to repeat the investigation.
APPENDIX A
SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES
CMO*
models for MOSIS 3-micron CMOS Bulk p-well devices:
Fast Models
.model
n
+xj=1.1e-6
.model
p
+ xj=1.1e-6
nmos
gamma=.3
pmos
uo=500
vto=-.4
gamma=.3
lambda=1e-7
tox=0-7e-7
vto=0.4
cbs=5e-4
cbd=5e-4
Iambda=1e-7
tox=0. 7e-7
ld=1e-6
ld=1e-6
cbd=3.5e-4 cbs=3.5e-4
uo=300
Slow Models
.model
n
+ xj = 0.6e-6
.model
p
xj=0.6e-6
nmos
lambda=1e-7
tox=Q.8e-7
vto=1.0
gamma=1.3 uo=400
gamma=-9
cbs=6e-4
cbd=6e-4
vto=-1.0 tox=0.8e-7
pmos
lambda=1e-7
ld=.5e-6
cbs=4.1e-4
cbd=4.1e-4
uo=200
ld=.5e-6
MIT Models for MOSIS 3-micron CMOS Bulk p-well devices:
Slow - Slow
.model nss nmos
+xj=.35e-6
cjsw=4e-1C
cj=6e-4
+cgso= 1.3e-10
cgdo=1.3e-10
+ vmax=5e4
pb=.7
+neff=2.5
ucrit=8e4
.model pss pmos
+xj=.35e-6
rsh=20
level=2
mj=.5
tox=650e-10
wo=475
mjsw=.
5
uexp=.25
rsh=80
level=2
cgdo=1.3e-10
+vmax=5e4
pb=.7
+neff=2-5
ucrit=8e4
vto=1.2
nsub=1.5e16
tox=650e-10
cj=4.1e-4 cjsw=2.5e-10
+cgso= 1.3e-10
ld=.25e-6
mj=.5
uo=190
nsub=5e15
mjsw=.5
aexp=.
15
ld=.25e-6
vto=-1.2
tpg=-1
1
Slow n-type
Fast p-type
rsh=30
level=2
.model nfs nmos
tox=600e-10
cj=6.0e-4 cjsw=4. Oe-10 uo=475
+cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16
+xj=.35e-6
vmax=5e4
pb=.7
+neff=2.5
ucrit=8e4
vto=1.2
mjsw=.5
mj = .5
uexp=.
25
rsh=20
level=2
.model pfs pmos
ld=.25e-6
tox=600e-10
ld=.40e-6
xj=.60e-6
cj=2.0e-4 cjsw=1-0€-10
uo=270 vto=-0.6
+ cgso=2.0e-10
cgdo=2.0e-10 nsub=0.3e15
tpg=+vmax=5e4
pb=.7
+neff=2.0
ucrit=8e4
mjsw=.
m j=. 5
uexp=.
5
15
Fast n-type
Fast p-type
rsh=10
level=2
.model nff nmos
tox=550e-10
cj=3.0e-4 cjsw=2. Oe-10
+xj=.60e-6
+cgso=2.5e-10
cgdo=2.5e-10
vmax=5e4
pb=.7
+nef f=2.5
ucrit=8e4
.model pff pmos
mj=.5
uo=675
ld=.40e-6
vto=0-6
nsub=0.5e16
mjsw=.
5
uexp=. 25
rsh=20
level=2
tox=550e-10
+xj=.60e-6
ld=.40e-6
cj=2.0e-4 cjsv=1.0€-10 uo=270 vto=-0.6
+cgso=2.5e-10 cgdo=2.5e-10
nsub=0.3e15
tpg=-1
vmax=5e4
+neff=2.0
pb=.7
'
mj=.5
ucrit=8e4
mjsw=.5
uexp=.
15
5
Fast n-type
Slow p-type
.model nsf nmos
xj=.60e-6
cgdo=2.0e-10
+vmax=5e4
ph=-7
+neff=2.5
ucrit=8e4
.model psf pmos
tox=600e-10
uexp=.25
rsh=80
cgdo=1.2e-10
mj=.5
ucrit=8e4
vto=0.6
mjsy=.5
aij=.5
level=2
pb=.7
uo=675
ld=.40e-6
D=ub=0.5e16
cj=4. 1e-4 cjsw=2.5e-10
+cgso=1.2e-10
vmax=5e4
neff=2.0
10
cj=3.0a-4 cjsw=2.0€-10
+cgso=2.0e-10
+xj=-35e-6
rsh=
level=2
tox=600e-10
uo=190
nsub=5.0e15
rajsw=.
uexp=.
15
ld=..25-6
vto=-1.2
tpg=-1
o
L
APPENDIX B
UNIX MANUAL ENTRY FOR RULEC

RULEC (CAD)          CAD Toolbox User's Manual          RULEC (CAD)

NAME
    rulec - Compile design rules for Lyra

SYNOPSIS
    rulec [-lo] rules

DESCRIPTION
    Rulec is a shell script with the following processing steps:

    i)   The actual Lyra rule compiler is invoked to translate the symbolic rule description, rules.r, to lisp code, rules.l.
    ii)  The lisp compiler, Liszt, is invoked to compile rules.l to rules.o.
    iii) rules.o is loaded into Lyra.proto to generate an executable Lyra, rules.
    iv)  The intermediate files rules.l and rules.o are deleted.

    The following options are supported:

    -l   (load only) No compilation is done. Previously compiled rules, rules.o, are loaded into Lyra.proto to generate an executable Lyra rules. This option is useful mainly at Berkeley, where Lyra.proto changes frequently.
    -o   (save object) Name.o is not removed. Enables "rulec -l rules" in the future.

FILES
    ~cad/bin/rulec - rulec shell script.
    ~cad/lib/lyra/Rulec 1 - lisp rule compiler
    ~cad/lib/lyra/Lyra.proto - Lyra sans compiled rules code.
    ~cad/lib/lyra/*.r - standard rulesets.
    ~cad/lib/lyra/DEFAULTS - gives default rulesets for Caesar technologies.

SEE ALSO
    Lyra (CAD)
    Liszt (1)

AUTHOR
    Michael Arnold.
APPENDIX C
PRESIM USER'S GUIDE

Config file: used to calibrate RNL

capm2a   .00000
capm2p   .00000
capma    .00006
capmp    .00000
cappa    .00006
cappp    .00000
capda    .00010
capdp    .00060
cappda   .00010
cappdp   .00060
capga    .00057
lambda   1.0
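A configuration file in this "parameter value" layout is straightforward to read programmatically. The sketch below is not part of PRESIM; the function name parse_config is hypothetical, lines starting with ';' are assumed to be comments, anything after the value is ignored as a comment, and entries whose second field is not numeric (for example resistance entries) are skipped.

```python
CALIBRATION_CONFIG = """\
capm2a   .00000
capm2p   .00000
capma    .00006
capmp    .00000
cappa    .00006
cappp    .00000
capda    .00010
capdp    .00060
cappda   .00010
cappdp   .00060
capga    .00057
lambda   1.0
"""


def parse_config(text: str) -> dict:
    """Parse 'parameter value comments...' lines into a name -> float map."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment line
        fields = line.split()
        if len(fields) < 2:
            continue  # no value present
        try:
            params[fields[0]] = float(fields[1])
        except ValueError:
            continue  # non-numeric entry, not handled in this sketch
    return params


if __name__ == "__main__":
    for name, value in parse_config(CALIBRATION_CONFIG).items():
        print(f"{name:8s} {value:.5f}")
```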
PRESIM User's Guide

UW/NW VLSI Consortium
Department of Computer Science
University of Washington
Seattle, WA 98195

(This document is based on portions of the document "User's Guide to NET, PRESIM and RNL/NL," by Christopher J. Terman, Laboratory for Computer Science, M.I.T., Cambridge, MA 02139.)

One must first convert the sim file to a network file suitable for use by RNL or NL - to do this we run PRESIM:

    presim foo.sim foo [config] options...

which converts the file foo.sim into a binary file for RNL/NL called foo.

The -f option:
Suppresses the sum-of-products formation. This may be desired if you think sum-of-products is formed wrong; otherwise the advantages of the transistor and node reduction make this option unattractive.
The
-« option:
•cfile^n in value
list of node aames and capacitances to the specified
value will be included.
writes a
The
-t
file.
Only capacitances larger than min-
option:
•tfllejnin value
writes a
tor.
list
The
of transistors
R's
come from
and
RC
file - there are two entries for each transisfrom the source/drain capacitance. Only RC
values to the specified
the size of the transistor,
Ct
values larger than minvalue will be included.
The
-p option:
-presist .voltage
provides a worse-case estimate of the circuit power consumption by assuming that all the pullups
devices with drain-VDD) are all on simultaneously. "Voltage* specifics the supply
(DEP or
LOWP
VDD
or 5 volts. The result is printed liter PRESEM completes its
other processing. When figuring the resistance of a pullup device the 'power* characteristic resistance
as set in the coring file is used.
voltage, (or example *-pi* specifies a
The optional third file (config) specifies various electrical parameters. The internal values (the defaults) are a generic set. They do not reflect any particular fabrication process. (UW/NW VLSI NOTE: A configuration file is provided in the source code that duplicates the internal settings as an example of how this file could be used. In addition we note that the resistor values are stored first sorted by width, then by length, not by the ratio. Values not explicitly provided in the configuration file are estimated by linear interpolation.) The format of this file is lines of the form

    parameter value comments...

Lines beginning with ';' are treated as comment. The parameter names and their default values are:
;
configuration 51e for "standard"
cappa
cappp
capda
capdp
cappda
cappdp
capga
lambda
process
2nd metal capacitance - area, pf/sq-microu
2nd metal capacitance - perimeter, p£/micron
capm2a .00000
eaptnlp JXWOO
capma .00003
captnp
MFC
metal capacitance - area, pf/sq-micron
metal capacitance - perimeter, pf/micron
poly capacitance - area, pf/sq-micron
poly capacitance - perimeter, pf/micron
n-diffusion capacitance — area, pf/sq-micron
1st
.00000
1st
.00004
DOOOO
.00010
00060
n-diffusion capacitance
.00010
p-diffusion capacitance
-
D0060
p-diffusion capacitance
.00040
gate capacitance
2.5
microns/lambda (conversion from
to units used in cap parameters)
-
;
logic
highthresh 0.8
;
logic high threshold as a
cntpuilup
;
diffperim
subparea
perimeter, pf/micron
area, pf/sq-micron
lowthresh OJ
.sim
file
units
low threshold as a normalized voltage
normalized voltage
;
< > means that the capacitor formed by gate of
pullup should be included in capacitance of output
;
node
;
< >0 means do
not include diffusion perimeters
on transistor gates when figuring
;
that border
;
sidewall capacitance (*)
;
;
LTW/NW VLSI
perimeter, pf/micron
area, pf/sq-micron
< >0 means that poly over transistor region will not
be counted as part of the poly-bulk capacitor {')
Release 2
10/1/83
80
UW/NW
PRESIM
VLSI Consortium
Uier'i Guide
diffusion extension for etch transistor,
ije., each
have a rectangular source
and drain diffusion extending diffext units wide and
diffext
transistor
is
assumed
to
The
transistor-width units nigh.
effect of the
add some capacitance to
the source and drain node of each transistor —
useful when processing the output of NET to improve
the capacitive loading approximations without adding
diffusion extension
is
to
explicit load capacitors,
lambda
(it
will
diffext
is
specified in
be converted using the lambda factor
above).
resistance channel context width length resist
this command specifies the equivalent resistance for a transistor of
type channel with the specified width and length. Transistors matching
this entry will have the specified resistance; linear interpolation is
done if the width and/or length is not matched exactly.
channel is one of "enh", "dep", "intrinsic", "low-power", "pullup", or "p-chan"
context is one of "static", "dynamic-high", "dynamic-low", or "power"
width is given in lambda
length is given in lambda
resist is given in ohms
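The diffusion extension described above can be illustrated with a short calculation. This is only a sketch: the function name, its arguments, and the choice to count the full rectangle perimeter are assumptions made for illustration and are not taken from PRESIM itself. The capacitance values in the example are placeholders.

```python
def diffext_capacitance_pf(width_lambda, diffext_lambda, microns_per_lambda,
                           area_cap_pf_per_um2, perim_cap_pf_per_um=0.0):
    """Capacitance (pF) added to one source or drain node by the assumed rectangle.

    The rectangle is diffext wide and transistor-width high, both given in
    lambda and converted to microns with the lambda factor.
    Assumption: the whole perimeter is counted when a perimeter term is given.
    """
    width_um = width_lambda * microns_per_lambda
    ext_um = diffext_lambda * microns_per_lambda
    area_um2 = width_um * ext_um
    perimeter_um = 2.0 * (width_um + ext_um)
    return area_um2 * area_cap_pf_per_um2 + perimeter_um * perim_cap_pf_per_um


# Hypothetical transistor: 4 lambda wide, diffext of 2 lambda, 2.5 microns/lambda.
area_only = diffext_capacitance_pf(4, 2, 2.5, 0.0001)
with_perimeter = diffext_capacitance_pf(4, 2, 2.5, 0.0001, 0.0006)
print(area_only, with_perimeter)
```

The area-only call shows the effect of the extension alone; the second call adds a sidewall term to show how quickly perimeter capacitance can dominate for small rectangles.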
(*) These parameters should be used only when processing the output of
the node extractor. They cause various corrections to be made to the
interconnect component of a node's capacitance - usually only extracted
sim files have information regarding interconnect capacitance.
PRESIM uses these parameters in calculating the capacitance for each
electrical node and the resistance for each transistor channel.
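The linear interpolation mentioned for the resistance command can be sketched as follows. The guide does not say how PRESIM interpolates when both width and length differ, so this sketch handles only the simpler case of interpolating over width at a tabulated length. The function name, the table layout, and the resistance values are hypothetical.

```python
def interpolate_resistance(entries, width, length):
    """Return ohms for a transistor of the given width at a tabulated length.

    entries: list of (width, length, ohms) tuples for one channel type and
    context, with width and length in lambda.
    """
    # Keep only the entries at the requested length, ordered by width.
    same_length = sorted((w, r) for w, l, r in entries if l == length)
    if not same_length:
        raise ValueError("no entry with this length")
    for w, r in same_length:
        if w == width:
            return float(r)  # exact match: use the tabulated value
    # Otherwise interpolate linearly between the two bracketing widths.
    for (w0, r0), (w1, r1) in zip(same_length, same_length[1:]):
        if w0 < width < w1:
            fraction = (width - w0) / (w1 - w0)
            return r0 + fraction * (r1 - r0)
    raise ValueError("width lies outside the tabulated range")


# Hypothetical entries for one channel type and context.
table = [(2, 2, 20000.0), (4, 2, 10000.0), (8, 2, 5000.0)]
print(interpolate_resistance(table, 3, 2))
print(interpolate_resistance(table, 6, 2))
```

Refusing to extrapolate outside the table is a design choice of this sketch, not documented PRESIM behavior.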
APPENDIX D
ADDER SIMULATION

The following two listings are: (1) the RNL command file for the entire chip and (2) the results of running that command file. In addition to this overall testing, all the layouts of Appendix E were simulated individually.

A nice feature of RNL is the indication of when a watched node changes state. Thus, by making all the outputs of a circuit watched nodes, RNL will provide the minimum time duration for a clock cycle to produce the outputs (the longest time indicated by the simulation). This can be confirmed by running the simulation with a faster clock, resulting in outputs of X (neither 1 nor 0) where insufficient time has been allowed.

RNL simulation to determine the minimum time for precharging the PLA circuits is only slightly more involved. For each product term in the PLA, alternating inputs are selected that will result in the maximum amount of N+ diffusion needing to be charged from 0 volts to Vdd. Then, as these inputs are alternated, the PLA precharge time is reduced until the circuit fails to produce correct results. For the PLAs in the adder, visual inspection for the product term with the longest precharge requirement was done by looking for the longest N+ diffusion line which must be charged through the maximum number of transistors. The visual inspection results were confirmed by RNL simulations.
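The watched-node technique described in this appendix amounts to taking the latest reported transition within a clock phase. The sketch below assumes log lines of the form "name=value @ delay", with the delay read as nanoseconds since the start of the step; the function name and the regular expression are illustrative and are not part of RNL. The sample delays are taken from the simulation listing in this appendix.

```python
import re

# One watched-node event, e.g. "s16=0 @ 14.2" (assumed format).
EVENT = re.compile(r"^(\w+)=([01X])\s+@\s+([\d.]+)$")


def settling_time(step_lines):
    """Return the largest transition delay reported within one step."""
    delays = []
    for line in step_lines:
        match = EVENT.match(line.strip())
        if match:
            delays.append(float(match.group(3)))
    return max(delays) if delays else 0.0


step = [
    "phi2=1 @ 0",
    "s16=0 @ 14.2",
    "s2=0 @ 16.7",
    "s1=0 @ 20",
    "cout=1 @ 21.1",
]
print(settling_time(step))
```

Under these assumptions the phase must last at least as long as the returned delay; running the simulation with a shorter phase should then produce X outputs, as the text notes.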
)
acr
2
13:59
<-
4
)
cr
IV***
i
r.-
.
c rr
r<
)
Face
4
1
1
Cloo -ill e "ch It . loc "
Cloa d "u wsti;. 1")
Cloa d "u WSiR . 1")
(rea o-re t w o r * "crir " )
65 ae a 7 aR a9 alC all al2 al3
(set c no aes ' (al s2 8? a
3 c 4 b S be b 7 c f b S o l
bl2 bl3 b
b 1
al4 al5 al o c 1 tv
s 9 S 1
s7 s
S 1 1 S 1 2 Sl3 S 1 4 S 1 5 s
tl6 si s 2 S3 S4 S5 S
cir cout rhll oni2 conl con2 con3 con4 crn5))
(cnf laa noo«S
a7 a 3 a« a^ a6 a7 ao a c a 10 all al2 dU alt al5 al6
1 al
t2 d3 n« b5 be b 7 r'e D-j bid bl 1 bl2 bl3 si 4 bl5 b!6
1 bl
c on2 c or-,? co n4 con5
1 CO nl
1
'i
1
r>
<•
li
bi5
1
1
6
>
1
cl n
r-^
(Hef vec
Cdef vec
1
11
DJ:
12
'(Mr ClCCK S rhll
Ch i 2
alb al5
)
)
aW
al3 al2 all all)
as* aia7 a6 a5 a- a3 a2 al))
(defvec '(fcir Dbbfc n 6 b 1 5 n 1
bl3 bl2 bl
fclO
b 9 PS o 7 be b5 bi b3 t2 nil)
(detvec '(fclr sur c out S16 sib sH S13 sl2 si
S10
sfsis* s3 s^
s
Sh s/
s7 sts5 s<<
sb
s2 si;;
Si))
'
(def-renort
("stat
clr. co vit ne*llne
is now!" (vec c l o c < s
vec b*a*
ne* line
(vec brbb) n e » 1 1 n e
(vec sur) )
'(tin
aaoa
)
1
1
)
!
)
(
)
(
de fun
rr
v
)
Inc T
)
)
ss(oi;rr
(step
(defun cvcles (al
(repeat 1 1 a
(setc incr 1001
(ss
'
(x))
seto lncr
Ch '( phil ))
(
(ss
'
(setc
(1
(ss
'(
(x
J
incr
onl J )
1
)
)
'
(seto incr
(n
is
(l
2bi-
'(
'(
'(
cM?]
25(.)
)
x))
obl2)
)
)
)
(cycles 5)
(lnvec '(aaaa
(lnvec '(btbt
(cycles 3)
(lnvec '(bbbr
(cycles 3)
h cln
(cycles 3)
1 cln
' (.aaa?
( lnvec
(lnvec '(btbt
(cycles 3)
n cln
(cycles 31
Ub
(•
ou
b
1
1 1
1 1 1
1 1 1 1
'">
1
p1
C
1 1 1
Hi
l)
)
)
)
1 )
)
10
(J
OMllllMUlllllin)
oo
r.
oo
r.
oocououooo
l-
]
t
Get 2°
13:^°
cin
Cinvec 'Ctbbt
(cycles 3)
(lnvec '(&rrc
(cycles l)
(lnvec '(assa
(cycles
(lnvec 'CHbtt
(cycles 1)
(lnvec '(aaaa
(cycles
(lnvec 'fbbee
(cycles
)
19*<<
eric.CiTc Face
2
1
J
h
ObC'OOCOOC'OG'.iOOoOO
I
Or<t"OUOt'OOOOQOOOflO)
QbtffiOOUOGGOOCGOOOO)
!
i
)
l
)
uel 11
1
ill 11 1111111)
I'^f'OOClOO'iOOuuOO^O)
Oell 1101
1 1 1
111 1111)
cir
(cycles l)
(lnvec '(?33S 0b(iQOO0P0OO<J0anO00)
(lnvec ' ( d b t OfcGOOOOCOOOGAOOOUG)
(cycles 1
1 cin
(cycles 4)
)
Dec 6 15:23 1984  chip.log  Page 1

Loading uwsim.l
Done loading uwsim.l
3086 nodes, transistors: enh=1494 intrinsic=0 p-chan=1141 dep=0 low-power=0 pullup=0 resistor
;
Ste
phi
phi
cin
cor
con
con
con
con
hl6
bib
C14
bl3
bl2
hll
blO
n9 =
oP =
beoins
=
=
a
a
=
=
o
e
a
=
bC
=
:0
:0
:0
:C
:0
a
rO
8
:C
a
?
a
b7 =
06 =
b5 =
a
t4s
a
p
b3 =
h2 =
e
c
bis
al6
al5
a!4
al3
al2
a
P
a
:0
:0
:0
:0
!C
all
:0
alO
:0
a9 =
a
=
P
a7 =
a6 =
e5 =
a4 =
a3 =
a2 =
a
al =
a
a<3
ns
e
a
B
P
a
a
a
Ster beoins
phllrl a o
a
in
ns.
Step nealns
phllso a
a
35
ns,
beoins a A* ns.
pnl2=l a
sl6=0 a 14.2
s°=n n 16.4
Ster-
85
,
nee
sll
sl3
sl5
s7s
S5 =
53 =
S14
S12
»
15.4
16,4
a
ifi.4
e
chic. log Paoe
2
16.4
a 16.4
a 16.4
a 16.5
a 16.5
P 16.5
a 16.5
a 16.5
P 16.5
6 16.7
6
SlO
s« =
s6 =
5 4=
S? =
Sis
ste
Cur
clo
aaa
bbb
sum
198«
15:7"?
6
a
e
20
is now:
ent times 70
*ss0b01 cln =
coutsx
sObOoooooooooncnooo
sObOOCOOOOOOOOOOOOO
o
X
Step begins @ 70 ns.
phi2=0 @ 0
Step begins @ 80 ns.
phi1=1 @ 0
Step begins @ 105 ns.
phi1=0 @ 0
Step begins @ 115 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time= 140
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000
Step begins @ 140 ns.
phi2=0 @ 0
Step begins @ 150 ns.
phi1=1 @ 0
Step begins @ 175 ns.
phi1=0 @ 0
Step begins @ 185 ns.
phi2=1 @ 0
state is now:
Current time= 210
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000
Step begins @ 210 ns.
phi2=0 @ 0
Step begins @ 220 ns.
phi1=1 @ 0
Step begins @ 245 ns.
phi1=0 @ 0
Step begins @ 255 ns.
phi2=1 @ 0
state is now:
Current time= 280
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000
Step begins @ 280 ns.
phi2=0 @ 0
Step begins @ 290 ns.
phi1=1 @ 0
Step begins @ 315 ns.
phi1=0 @ 0
Step begins @ 325 ns.
phi2=1 @ 0
state is now:
Current time= 350
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000
Step begins @ 350 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
a12=1 @ 0
a11=1 @ 0
a10=1 @ 0
a9=1 @ 0
a4=1 @ 0
a3=1 @ 0
a2=1 @ 0
a1=1 @ 0
phi2=0 @ 0
Step begins @ 360 ns.
phi1=1 @ 0
Step begins @ 385 ns.
phi1=0 @ 0
Step begins @ 395 ns.
phi2=1 @ 0
state is now:
Current time= 420
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000
Step begins @ 420 ns.
phi2=0 @ 0
Step begins @ 430 ns.
phi1=1 @ 0
Step begins @ 455 ns.
phi1=0 @ 0
Step begins @ 465 ns.
phi2=1 @ 0
state is now:
Current time= 490
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000
Step begins @ 490 ns.
phi2=0 @ 0
Step begins @ 500 ns.
phi1=1 @ 0
Step begins @ 525 ns.
phi1=0 @ 0
Step begins @ 535 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time= 560
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b01111111111111111
i;
1
?
Step tealns
b9=l e
l
?
560 ns
Ster
beclns
Dhil=l p
a
570 ns.
SteD beolns
nhil=0 a
?
59? ns.
bl =
l
e
bl6 = C
tl5 = C
bl4 =
bl3 =
a
o
a
o
a
o
o
a
e« =
9
b7r0
a
b6 = n
C5 = C
a
C
rhi?
Stec beolns a 605 ns,
nhl?=l e
state Is now:
Current tiire* 630
clocks=Cb01 c 1 p = n c u t =
aaaa=ObOOOC11110000llll
bbbbsObOOOOOOOlOO'iOOOOi
SumsObOl
1
11
1
1 1 1 1 1
1 1
11 11
Step beains
nni2=0 a
a
630 ns.
Step beolns
phll=l a
a
6^0 ns.
Stec beclns
ohil=0 a
a
665 ns.
Step beclns a 675 ns.
phi2=i a
state Is now:
Current tin>e = 700
clocks=0b0l cin=o coutsO
a«aa = OtCO0Oi 111 00001 in
pbbb=ObOooooon 100000001
89
Pec
198*
15:23
6
cnlc.loe Paae
6
sum = ot«oi 111111111111111
Step beolns
Dhi2=0 a n
a
700 ns.
Stec bealns
Dhllsi e c
s
710 ns.
Stec beoins
?
735 ns.
Dhil=r>
a
n
Step beolns e 745 ns.
nnl2=l e n
sl6=0 e 14.2
s9srt e 16.4
sll=0 e 16.4
Sl5=0 a 16.4
S"J =
? 16.4
s3=0 P 16,4
Sl«=0 » 16.5
Sl2=0 g 16.5
Sl0=0
16.5
Sfl=n a 16.5
16,5
S6 =
S4=0 s 16.5
S2=0 C lb.7
Sl=0 « 20
state Is now:
Current times 770
clocXssObOl c n = cout=o
aaaa = 0r0OOOHlJ00COllu
bbtbsObOCOOOOOl 00000001
<?
(?
1
SUfsObOOOOlOOOOOOOlOCOO
Step becins
cln=l a o
Dhl?=0 a
6
770 ns.
Step beolns
phllsi e o
a
7q
9
805 ns,
Stec bealns
phllso e o
r.
ns.
Step bealns I 815 ns.
Dhl2=l a o
state Is now:
Current times 84u
cloctcssObOl cln =
cout =
aaaa = Ofc0nooilll0O(i0llll
]
thbhsObOOOCOCOlOOOOiOOl
SUmsObOOOOlOOOOO 010000
Step beolns
chi2=0 a o
a
840 ns,
90
:
Dec
]9»4
55:23
6
cMc.loc Faoe
Step beoins
phll=l »
P
P50
Step beoins
phll=0 e
P
875 ns.
7
r.s.
Ster beoins £ 895 ns.
phl2=l a o
state Is no*:
Current tiroes 910
cout =
clock-s = ObUl cln =
aaea = 0b000ruillooocun
l
bbbb=0b00O0CG01O0O00O01
sum=Ob0000100P000030000
Step begins
phi2 =
a
9
Step beoins
phi 1 =
9 i
«
920 ns.
Ster beoins
pnll=0 P
a
9
ns.
1
s>
1
<j
5
ns.
Step beoins a 955 r>s.
phi2=l e o
sl=l a 19.3
state Is now
Current tlme= 980
clocks=0b01 cln=l cout=0
aaaa=0b0000111100ncilll
bbbb=0b00nn00010O000001
SUirsObOOOClOOOOO^OlOOOl
Stec beclns
a
9R0 ns.
Ster beoins
Dhll=i p o
a
990 ns.
Step beoins
phll=0 a o
P
1015 ns.
Steo beoins
ohl?=l a n
e
1025 ns.
a
1
6=
a
1
al5=i a
al4=i a
al3 = l e
a6=l a
a7=l a
a6=l a
a5=l
b<J
=
o
a
a
c
bl=0 a
cln=C a
ohl2=0 a
o
91
P:
Dec
6
19P4
15:23
chip.loq Faoe
P
state Is now:
Current tlires 1^50
clccKssOt-Ol cln=o eout=0
*aea=0blllli l 1111111111
bbhbcotoooonoooooooooco
SUmrObCOOOlOOCOOOOlOOOl
Step beqlns
phl2=0 a o
9
1050 ns,
Stec beclns
Dhl1=l a
e
10*0 ns.
Sten beolns
phll=0 a
1085 ns.
Sten becins a 1095 ns.
phl2=l 9
state Is now
Current times 112C
clcctcs = Ob01 cln =
couts"
aaaasOfcll 1111111111 11 11
bbbb = Cc000 00OOOoo0000()
(*
SUmsObOCOOlOOOOOOOlOOOl
Ster beolns
Dhi2 = C a
e
1120 ns,
Ster bealns
chllsl a
a
1130 ns.
Step beolns
Dhll=0 a
9
1155 ns.
r»
Step bealns 9 11*5 ns.
phi2=l »
sl6=l e 14.6
s9=l e 16.7
e 16.7
s 1 1 = 1
Sl5=l f 16.7
s7=l a 16.7
s3=l P 1*.7
Sl4=l a 16.
sl2 = l a 16.
sl0=l a 16.8
«;8=1 a 16.8
s6cl a 16.8
s4=i a i*.s
s2=l
a
17
state Is now:
Current tlrre = 1190
cloc<s=0b0l cln=0 cout=o
aaea = Ohlllll 1 l'l 11 11111 1
bbbb=0b0o00000000000000
sui" = 0b01 111111111111111
92
H
nee
6
15:23
1994
chip. loo F?oe
Sten
bealns
cin=l B
Dhl2=0 e o
B
1190 ns.
Ster bealns
onil=l e
B
1200 ns.
fl
1225 ns.
SteD bealns
nnll=o a n
°
St en bealns
B 1235 ns,
phi2=l b o
state Is now:
Current tlme = 1260
cloc* s=Oh01 cln=l cout=0
aaap = otllllllllllllll 11
bbbb=0bOOOo0000O000C000
sumsObOllllllllllUltll
Stec beolns
oni2=c e
e
1260 ns.
Ster bealns
ohllsi e
»
127" ns.
SteD bealns
B
12Q5 ns,
pMl = o
a
o
Ster beolns a 1305 ns.
nhi2=l B o
state Is now:
Current tlme= 1330
clocKs=0b01 cln=l cout =
aaae = 0fcl
111111 1111 111
bbbnsObOOOOOOOOOOOOOOno
suf=0b01 111111111111111
Stec bealns
nni2=0 B
&
1330 ns.
Ster berins
onil=l '
B
1340 ns.
fl
1365 ns.
Stec bealns
phll=0 s o
Ster beolns B 1375 ns.
ohl2=l P o
SlftsO B 14.2
s9=0 B 16.4
Sll=0 B 16.4
Sl3=0 B 16.4
16.4
S15=0
S7r0 6 16.4
S5=0 o 16.4
S3=0 B 16.4
fl
93
Pec
15:23
b
Sl4 =
9
b
Sl2=0
610=0
a
sflsrt
a
sfi=0
S4=0
«
a
s2 = o
a
19%4
chic. leg Pane
1
<J
16.5
16.5
16.5
16.5
16.5
16.5
16.7
sl=0 9 20
cout=l a 21.1
state is no*:
Current times 1400
clocKs=0b01 cln=l couts]
aaeasotllll 1111111111!)
hbbbsOcOOnooOOOOOOOOOOO
suir = OM n oooonooccooooco
Ster beclns
hl=l a C
cin=0 9
ohl2=0 a
ic
14
ns
Ster beolns
pnii=i a n
°
M10
ns,
Step healns
pnll=C a
B
1^35 ns.
.
Ster bealns ? 1445 ns.
phi2=l b
state Is now:
current times 1470
clocks=0b01 ein = coutsi
aaaa=Orll 1111111111111)
brbb=0bO0O000000000O001
sumsOfclOOnooOOOOG 0000^0
Stec bealns
phl?=0 »
9
1470 ns.
Step peclns
pnll=l a
*
1^90 ns.
Step bealns
phll=0 e
£
1505 ns.
Step beolns P 1515 ns.
phl2=l a
state Is nowi
Current time= 1540
cloclcssf'bOl cln = C cout=l
aaaa = Obllllllllll 111111
bbbh=0bO00000O0OO000O01
SUirsOblOOOOOOOOOOOOOOOO
Step bealns
nhi?=0 a n
a
1540 ns.
94
Dec
15:23
6
Step beains
onilsl a o
Step beains
onil=0 e
19«fl
chlcloc paae
P.
1550 ns.
a
1575 ns.
11
Step beolns 9 1585 ns.
phi?=l e o
state Is new:
Current time? 1610
coutsl
cloclcs = Ob01 cln =
aaaa*Oblllll lllllllll 11
bbchsotooonnooooooooooi
SUm=0blO0OO00000O0O0O00
Sten beolns
*
1610 ns.
Step beolns
Dhll=l a o
P
1620 ns,
Step begins
DHl1=0 6
a
1645 ns.
bl =
e
ohi7=0
a
Sten reains * 1655 ns.
phl?=l a i
state Is now:
Current tirre = 1680
clocks=0b n l cin=o coutsl
aaae=0bll 11111111111111
bbbbsObOOOOOOOOOOOOOOOO
SUmsOblOOO 000000000000
Step beolns
al6=0 a o
al5=0 P
al4=n e o
al3=o p o
el2 = C a
all=0 e o
aioro a o
a<*=0
afl=0
a7 =
a6=0
a5=0
a
a
a
a
1690 ns,
o
o
c
a2=o e
aleO a
pni2=0
c
3=0
P
o
a
a
16P0 ns.
o
o
a
a
a* =
9
a
o
Ster beolns
philxi a n
95
7a B
Dec
19T4
15:23
6
Steo beolns
Dhll=0 P
P
cMc.loo Faqe
12
1715 ns.
Ster bealns P 1"?25 ns.
chJ2=l *
state Is no«:
Current tlrre= 1750
cloocssOfOl cln = o coutal
aaae = 0fc00C. 000000 000000
bbbb=0bO00O000C)oocoC00
SUn-sOfclOuOOOCOOOOOOOOOO
Stec beolns
f
1750 ns.
Ster bealns
rMll=l p
a
1760 ns.
Steo bealns
onil=0 a o
a
17H5 ns.
<?
17Q5 ns
6=1
a
b15=1
bl4=l
e
bl 3=1
B
bl2=l
bllsl
blOsi
8
b
1
b9 = l
P
p
s9 =
sll
sl3
sl5
S7 =
s5 =
53 =
s!4
sl2
slO
o
P
e
b8=l s
b7=l a
h6=l e
b5=l B
^4=1 ?
b3=l a
b2=t P
hl=l P
ohl2=o
Ste
chl
sib
C
s
o
^ecins
=
a
1
a
1
o
I4.fi
16.7
a
16.7
a 16.7
a
16.7
a 16.7
a
1
1
1
a
lb.
a
16.7
16.8
16.8
1
P
1
P
a
1
16.
=
a
s« =
54 =
s? =
a
16.8
16.6
a
16.
a
a
19.1
sfi
si*
17
96
1
::
Dec
6
1964
15:23
enip.loc Pace
13
cout = P 7.2,9
state i s now
Current times l a 20
clockssObOl cln=P cout=0
aaaa = 0b00OOC0OOC0OO0<^0O
bbbbaOfcllll] 111 1 11 1111
SUffsObOllllll 1111111113
Step beolns
al? = l 8
Dni2=o e o
P
182^ ns.
Ster beolns
nnii=i 9 o
?
1R30 ns.
Ster beolns
Phll=0 8
8
1*55 ns.
Stec beains 9 1965 n s,
Dhi?=l 8
Sl6rO E 14.2
8 16.4
s9 =
SllsO 8 16.4
Sl3=0 B 16.4
Sl5r0 8 16.4
S7=0 8 36.4
S5s0 P 16.4
S3=0 « 16."
Sl4=0 e 16.5
Sl2=0 P 16.5
SlO=0 e 16.5
s8=n e H,,5
S6=C 8 16.5
S4=0 6 16.5
S2=0 8 18.7
Sl=C a 20
state Is now
Current tirres 1890
clocks=PbOl cir = o cout=o
aaaa=0bCOO0l00OOOOO0O0O
bbbb=Otl 111111111111111
SUirrObOOOOOOOOOOCOCCOOO
Ster beolns
bl2=0 a
Dhl2=0 e
a
189n ns.
Step beolns
ohll=l a
e
1900 ns.
Ster beolns
chll=0 e
9
1925 ns.
Ster beolns B
unl?=l e
sl6=l a 14.6
1935 rs
.
97
9
Dec
»
16.7
16.7
16.7
36.7
16.7
16.7
16.7
1
3
16.
1
a
16.8
1
B
s9 =
a
1
513
sl5
1
9
1
s7 =
s5 =
s3 =
a
e
B
sl4
s!2
slO
16.
=
S6 =
54 =
S
a
16.9
16.8
16.8
S? =
P
17
sl =
e
sfl
1994
15:23
6
sU
:
a
ehlr.loo
p<?ne
14
P
19.1
sta e Is now
Cur ert times I960
clo Ks=0b01 clnsc cour=o
aaa sobooooiooonoooonoo
hhb sOM 11101 11 11 ill 111
Ohoim
j
liiuiiiii
Sttr beains
cln=l e
pni2so b o
B
i960 ns,
Stec renins
P
197
8
1095 ns
SUff
dn1
1
=
P
1
i
ns
o
Ster beoins
nnilso e
200? ns.
Stec peclns
Dnl2si e p
SlbsO B 14.2
Sl3=0 B 16,4
Sl5sP B 16.4
Sl4sC B 16.5
Sl2s0 a 16.5
COUtsl 6 21.1
state is no*:
Current timer 2030
cloc*s=0b01 clnsi coutsl
fc
naaesObOOOOlOOOOOOOOOOO
bbbbsnbl 111011111111111
sumsObloOOOOl 1111111111
Ster beains
bl6=0 a
bl5=0
hl4=0 B
bl3s0 P
bllsO P
M0 =
b9sn
hB=n
2030 ns.
P
e
»
C
98
nee
6
b7=n
b6so
9
b5 =
a
b4=0
a
b3 =
b2 =
a
blatO
a 12 =
a
19M
15:23
chic. log Paae
15
e
e
pni2=o
"
n
a
a
Stec hecins
Dhilsi a o
a
2040 ns,
ten benins
rhil=n a o
6
2nb 5
S
ns.
Ster beairs a 2075 ns.
phi2=l a o
slb = l a ii.fi
sn=i e 16.7
sl5 = l a 16.7
si4=i a 16. 9
sl2 = l e 16."
cout=P a 22.9
state Is now:
Current times 210n
clocics = 0fc01 cln = l cout =
aaaa=0h0000000000000000
nbbb=ObCOOOOOOOnnoCOOOO
SUWaObOllllll 11U11 1111
Ster beolns
cln=0 a o
chl2=0 a o
a
2100 ns.
Sted
beains
phll=l a o
a
2110 ns.
Stec beains
phll=0 P
a
2135 ns.
Step beains 6 2145 ns.
phl?=l P
B 14.2
sl6 =
S9 =
9 16.4
Sll=0 B 16.4
Sl3=0 B 16.4
Sl5=0 e 16.4
S7=0 P 16.4
S5=0 B 16.4
s3=0 e 16.4
Sl4=0 a 16.5
sl?=0 e 16.5
Sl0=n e 16.5
sfl=P P 16.5
s6=0 a 16,5
99
n*?c
6
15:23 1984
S4=0
a
a
16.5
16.7
s2 =
chic.loa Paae 16
S1=0 e 20
cout=l » 21.1
state is now:
Current times 2170
cloci«cs = ObOi
cln =
cout=l
aaaa=ob0onoocooooooooco
bfcthsOt^O^OOOOOOOOOOOno
sum=0blO000000000OCTO00
Stec beains
ohi2=0 a
a
2170 ns,
Stec beclns
a
21«n ns.
a
2705 ns,
dM1 =
1
a
Stec becins
rhil=0 c
Ster beains f 2215 ns.
Dhi?=l 9
cout=0 B 22.9
state is now;
Current timer 2240
clocKs=0b0l cjn=0 cout=0
aaaasoboooooooooooononn
cobb=ObOOOOioOOOCOOonoo
suirsotooooonooooonooooo
Ster beains
nhl2=0 e
fl
2240 ns.
Sten *ealns
Dhll=l P
a
2250 ns.
Ster beclns
onil=0 «
e
2275 ns.
Stec beains a 22P5 ns.
chi2=l »
sl=0 a 20
state is now:
Current tlire= 2310
cloc»cs = Ob03 cin =
cout =
aaaa=0b0OO0OOOOOO00C00O
bbbb=Ob0COO000OO0OOO0OO
SUffcObOOOOOOOOOOOOOOOOO
Stec beclns
ohl2=0 a o
a
2310 ns.
Ster becins
ohll=l a o
a
2320 ns.
Ster beolns
a
2345 ns.
100
nee
fe
ohil=0
15:23
1964
cMn.loc
i-aoe
17
f
Step beclns £ 7355 ns.
phl2=t a o
state Is now:
Current ti^e* 7380
clcc*s=0b01 cln = c cout = n
aaea=0fc00O0O0000C0C0000
bbbb=0b0O0O1CO000OOOOC0
S'.im = 0b000O0^000n0O000O0
exit
101
APPENDIX E
LAYOUTS

LEGEND: Contact Cut, p-well, P+ doping, polysilicon, Diffusion, Metal

[Figure: AND Gate layout]
[Figure: OR Gate layout]
[Figure: XOR Gate layout, output labeled "A + B"]
[Figure: layout with CON(n) inputs and outputs labeled "out" and "shift out"]
[Figure: layout whose labels are not recoverable from the scan]
[Figure: PLA84 layout, with GND rail and outputs ES4, ES3, ES2, ES1]
[Figure: PLA104 layout, with Vdd and GND rails and outputs S4, S3, S2, S1]
[Figure: PLA915 layout, with Vdd rail and outputs BC4, BC3, BC2, BC1, BC0]
APPENDIX F
TEST VECTORS

Addend A            Addend B            Cin   Sum
msb --------- lsb   msb --------- lsb         msb ---------- lsb
initialize all internal nodes
0000000000000000
0000000000000000
xxxxxxxxxxxxxxxxx
0000000000000000
0000000000000000
xxxxxxxxxxxxxxxxx
0000000000000000
ooooooooooocoooo
00000000000000000
test for proper P and G primitives
0000000000000000
11111111
1111 1111111111 11
0000000000000000
01111111111111111
0101010101010101    1010101010101010    0     01111111111111111
1010101010101010    0101010101010101    0     01111111111111111
0001000100010001    0000000000000000    0     00001000100010001
0001000100010001    0001000100010001    0     00010001000100010
0101010101010101    0001000100010001    0     00110011001100110
0101010101010101    0101010101010101    0     01010101010101010
0101010101010101    0011001100110011    0     01000100010001000
0010001000100010    0011001100110011    0     00101010101010101
11
1
01111
11 11
1
1
1111 111
11
1
test for proper IES
test for proper IC23
test for carry from block to block
00000000000011
11
0000000000000001
00000000000011
11
0000000000000000
1
00000000000010000
00000000111111
11
cooooooooooooooo
1
00000000100000000
11
00000000000010000
1
000000001 1111111
0000000000000001
00000000100000000
0000000011 111111
0000 111111111111
0000000000010000
0000000000000000
0000000010000111
0001000000000000
000011 1111111111
0000000000000001
00001000000000000
0000 111111111111
0000000000010000
00001000000001
000011 1111111111
00000001000COOOO
00001000011 111111
1111111111111111
0000000000000000
1111111111111111
1111111111111111
0000000000000001
0000000000010000
10000000000000000
10000000000001 111
1111111111111111
0000000100000000
10000000011 111111
1111111111111
00O1000O000C0000
1000011 1111111111
111
112
1
1
11
10000000000000 000
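Each test vector can be checked against ordinary binary addition: the 17-bit Sum column should equal Addend A plus Addend B plus Cin, with the carry out as the most significant bit. The sketch below is not part of the thesis; the function name is hypothetical, and the three vectors are rows that remain legible in the table above.

```python
def expected_sum(a_bits, b_bits, cin):
    """Return the 17-bit sum string (carry out first) for two 16-bit addends."""
    total = int(a_bits, 2) + int(b_bits, 2) + cin
    return format(total, "017b")


# (Addend A, Addend B, Cin, Sum) rows copied from the table.
vectors = [
    ("0101010101010101", "1010101010101010", 0, "01111111111111111"),
    ("0001000100010001", "0001000100010001", 0, "00010001000100010"),
    ("0000000000001111", "0000000000000000", 1, "00000000000010000"),
]

for a, b, cin, listed in vectors:
    status = "ok" if expected_sum(a, b, cin) == listed else "MISMATCH"
    print(a, b, cin, listed, status)
```

The same check can be applied to the remaining rows once their digits are recovered from the original pages.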
INITIAL DISTRIBUTION LIST

                                                  No. Copies
1. Superintendent                                      2
   Attn: Library, Code 0142
   Naval Postgraduate School
   Monterey, California 93943
2. Dr. Donald Kirk                                     6
   Code 62KI
   Naval Postgraduate School
   Monterey, California 93943
3. Dr. H. H. Loomis                                    1
   Code 62LM
   Naval Postgraduate School
   Monterey, California 93943
4. Dr. a. L. Cotton                                    1
   Code 62CC
   Naval Postgraduate School
   Monterey, California 93943
5. Defense Technical Information Center                2
   Cameron Station
   Alexandria, Virginia 22314
6. LCDR William R. Reid                                1
   11224 Edgemoor Court
   Woodbridge, Virginia 22192