Download STATIC DETERMINATION OF DYNAMIC PROPERTIES OF

Transcript
STATIC DETERMINATION OF DYNAMIC PROPERTIES
OF GENERALIZED TYPE UNIONS
Patrick Couso~and Radhia Couso~*
Laboratoire d'Informatique, U.S.M.G., BP. 53
38041 Grenoble Cedex, France
Abstract. The classical programming languaqes such as PASCAL or ALGOL 88 do not provide {ull data
type security. Run-time errors are not precluded on basic operations.
Type safety necessitates a
refinement of the data type notion which allows subtypes. The compiler must also be able to ensure
that basic operations are applicable. This verification consists in determining a local subtype of
g l o b a l l y declared variables or constants. This may be achieved by improved compiler capabilities
to analyze the program properties or by language constructs which permit the expression of these
properties. Both approaches are discussed and illustrated by the problems of access to records via
pointers,
records,
access to variants of record structures, d e t e r m i n a t i o n of d i s j o i n t collections of linked
and d e t e r m i n a t i o n of integer subrenge. Both approaches are complementary and a balance
must be found between what must be specified by the programmer and what must be discovered by the
compiler.
Key words and phrases
:
Type safety,
type unions,
subtype, data type, system of equations,
verification/discovery, error detection capabilities, abstract interpretation of programs,
use of o o i n t e r s / v a r i a n t s of record;structures, domains/collections,
88, EUCLIO,
type
secure
integer subrange type, ALGOL
PASCAL.
CR categories
:
4.12, 4.13, 4.2,
5.4.
1. Introduction
reference ignores the fact that a reference may be
dummy,
The type of an object defines how that object
relates to other objects and which actions may be
In all these languages the p r o b l e m of subscript ran-
systems of ALGOL 801187S], PASCAL[1874], ALGOL 68
~e is not safely treated by the type concept, LiKe-
[1975] ... do not convey enough information to de-
wise,
termine staticly whether a given action applied
For example, in AL-
CAL,
Attache de Recherche au C.N.R.S., L a b o r a t o i r e
Associ~ N ° 7.
**
This work was supported by IRIA-SESORI under
grants 75-035 and 78-180.
in PAS-
a pointer to a record must De considered as
potentially designating any record of a given type.
in ALGOL 88 the type
*
the classical type systems define only loose
relationships between objects. For example,
GOL 80 the type procedure does not include the type of acceptable parameters,
[variants of record
of erring on the current a l t e r n a t i v e of the union.
applied to it. U n f o r t u n a t e l y the classical tyoe
to a value will be meaningful.
in PASCAL type unions
structures) are unsafe because of the possibility
One cannot express the fact that two linked linear
lists of the same type do not intermix. Finally, the
rules of the language
or the programming discipline
accepted by the programmer are not statiely enfor-
77
cad by the compilers,
so that run-time
checks are
the widely used remedy.
However these expensive
time checks are usually
turned off before
programming
security
run-
linguistic
the "least"
In the interest of increased
flexible
reliability
framework
type properties),
error has been discovered.
ware products,
offered by full typing
[within a suitable
to properly
propagate
and the simplicity
Ibut incomplete)
classical
strong
offered by the
type systems.
of soft-
the language designer may reply upon :
2. Nil and Non-nil Pointers
- The design of a refined
necessitates
and safe type system,
linguistic
constructs
te strong type oropertles.
which
Among the objections
which propaga-
ters ere the faotsthat
The rules of the lan-
pe violations
guage must then be checkable by a mere textual
scan of programs
[1976] provide
language
a secure
design
and baroque
le.g. ALGOL
68[1975]
approach may degenerate
programming
ranteeing
This
the type of the object pointed
[1974] except for.variant
to large
that pointers
languages.
heap cells
The design of a refined compiler wkich performs
a static
proved
treatment of programs
error-detection
then remains simple
and provides
capabilities.
im-
and flexible,
but security
offered by compiler verifications
[e.g. EUCLIO
legality assertions which the compiler
for the verifier].
may degenerate
is
generates
and mysterious
The comelier
comparable
the two approaches
re-
is not used).
However
have the nil value which points
to no element at all
; this is a source of frequent
summary of the meaningful
operations
as a static
the operations
prescribed
on that value.
by a syntactically
techniques
techniques,
It is shown that the language
value which happens
Cousot
since both
need a refinement
pe notion.
They differ by the fact that one needs
to be nil.
- the subtype of nil pointers
type
(which happens
covery may be equivalent
only to pointers
type enforcement or dis(e.g.
the language
of non-nil
is not the case for infinite
pointers).
type systems
by the programming
checks
is really
ve-
[e.g.
protection facilities.
However
time checkable.
needed
syntactic
languages genera-
and
ting code in master-mode
by appropriate
this hardware
detec-
constion is not always utilizable.
Moreover,
for
a means by which languamore complicated
can establish
system.
:
Ithese checks are usually
for system implementation
ge designers
this
ry cheap for pointers when using the hardware memory
are not compile
Finally we propose
Since
nil references,type
of non intermixing
type discovery
can be applied
subtype.
designer has three solutions
Run-time
can be facilitated
to that record
to have only one value)
The rule is that dereferencing
between
rule must be enforceable
In such a case
to that re-
the other uses a type dis-
We show that strong
which
pointers
type
cord type
type checking and discovery.
integer ranges),
to a record
- the subtype of non-nil
of the ty-
but we show the close connexion
collections
so
:
- the type of pointers
are strong-
meaning-
a pointer
type notion must then be refined
that one can distinguish
ra-
ly related
a type checker whereas
The pointer
design ap-
approach
are not always dynamically
ful. This is the case when dereferencing
by means of examples;
to program optimization
proach and the compiler design
tructs.
which
au-
e degree of sophistication
ther than program verification
This
cells)
they are no longer accessible
The type of a value may be viewed
techniques we propose for the static ana-
lysis of programshave
unions,
allocated
program verifiers.
We illustrate
coverer,
from variable
when "dispose"
valid construct
[1976].
and ensuring
a pointer may always
However
tomatic
until
by gua-
at (PASCAL
errors.
This compiler design approach
into futurustic
(disjoint
IPASCALE1874]
The language
of records),
point only to explicitly
main allocated
ty-
and that they may be left dang-
ling. One can take care of these objections,
end EUCLIO
use of type unions).
(PL/I)
against the use of poin-
they can lead to serious
a balance
between the
scripting
78
these
examples
run-time
such as array subchecks are very
expensive.
using a simole
Safe
of line (3). This reasoning is easily mechanized as
follows
: associate
invarian~P1,
P2, P3, P4 and P5
language
a type
tion
design,
system
which
prescribed
truct
strong
that
typing
valid
be dynamically
pointer types,
to
ooints
cons-
(i.e. forbid
rations
(i.e.
or non-nil
pointer9
and
check the correct use of opeauthorize
dereferencing
for non-
nil pointers only).
-
Compile
time checks,
to recognize
the safe
language
PASCAL ( H o a r e
variants
are
of
(1)
P1
[2)
P2 = (P1
= [at
(11)
respectively.
of
the
orogramming
and W i r t h [ 1 9 7 3 ] ) ,
as d e f i n e d
by
these
the
in-
subsequent
:
= L)
or
(3)
P3 = [P2 and
P4 = P2 and
(5)
P5 = P3 o r
--
[5)
and
P5)
(4)
(b
= true)
and
((pt
<> n i l )
(pt+.value
[ot+.value
(~ p t '
= n))
and
(b
= false)
<> n)
I s°t'(p4)
pt
and p t = o t ' + . n e x t )
has b e e n d e l i b e r a t e l y
see D e m b i n s k i
and b)
oversimplified,
and S c h w a r t z [ 1 9 7 6 ] ) .
now this last strategy.
Since
2.1
related
equations
(Equation
use o£ a type scheme which is too tolerant.
We illustrate
and
semantics
the type o£ an object
the type"non-nil
syntactically
(9)
the
disallow tyoe vio-
to be changed from the tyoe"nil
pointer",to
to
between nil
system
lations
[21,[4},(71,
According
algorithm from the test
meaningful.
This type scheme must distinguish
and non-nil
i.e.
any opera-
by a syntactically
always
will
with
ensures
propagation
in
general
systems
it
solution
to
consider
simplifications
is
such
undecidable
as t h e
to
one a b o v e ,
find
a
we m u s t
Static Correctness Check of Access to Records
[to
the
prejudice
of
the
via Pointers
precision of our results),
Consider
the simple
record with value
ignore
problem of searching for a
"n" in a linked
the existence of the boolean variable b, of"
the fields "value"
linear list L :
For that purpose we will
and thus focusing
in records o£ the linear list,
on pointers.
Moreover,
consider only the pointer variable
value
next
lowing mredicates
L
we will
pt, and the fol-
,F-Fq__Tq?
....
respectively
solution
as follows
:
:
pt = nil, pt <> nil,
dicates
The PASCAL
on st
is given by PASCALF1974]
(p. 64)
gram
denoted by nil,
form a complete
.is
(at = nil) or (pt<> nil)
non-nil,T
. These pre-
lattice whose HASSE's die-
:
T
[1)
pt := L; b := true;
(2)
{P1}
[3)
while
L
(4)
[pt
<> n i l )
=
nil /
~non-nil
and b do
({P2}
Where i is used to denote
(5]
i~
pt+.value
Known about the variable
(6)
b
(7)
{P3}
(8)
the fact that nothing
is
= n then
pt.
:= f a l s e
Since we are only considering
else
subset Of the set of predicates,
(9)
{P4}
(10)
pt
(11)
{P5});
tions can be simplified
an oversimplified
our system of equa-
accordingly
:
:= p t + . n e x t ;
(1')
P1
= T
(2')
P2 = (PI or P5) and non-nil
(3')
P3
= P2
(4')
P4
= P2
[5')
P5 = P3 o r
The above piece of program is correct with regard
to accesses
to records via pointers,
pt is not nil when dereferenced
at lines
since
T
(5) and
(In
(10). This fact is established by the programmer
79
equation
(1)
we c o n s i d e r
(pt
= L)
since
L may
be an e m o t y or n o n - e m p t y
nil) or
[pt <> nil]
only consider
Our
F is
complete
has
a
PI,
lattice
[5
theorem
least
complete
T, in e q u a t i o n
(pt =
Kleene's
[5] we
tions.
'next'
a (nil or non-nil]
oointer
PS> = F [ < P 1 ,
P2,
PI,
application
in
Therefore,
states
that
the
:
P4,
[Tarski[1955]).
from
the
L s in itself, this least
= T,
P2
= [P1
=
Knas-
P3
F
fixpoint
P4
oreviously established)
~ [2,
can
5]]
and
non-nil
and
non-nil
non-nil
P5 = P3 o r T
Kleene
non-nil
±
l
±
l
or T
= [PI o r P5]
or T ]
[T
or ±]
end
non-nil,
l
,[±
end non-nil
>
i
±
l
i
T
l
i
,(l or T]>
I
±
T
and
>
Kildall[1973]
= <T,
non-nil,
non-nil
the K l e e n e ' s
>
call
~3 = F[~2]
this
programs.
or T ]
<T,
and
non-nil,non-nil,non-ni],(l
non-nil
or
,non-nil,non
nil,
T]>
T
since
technique
Abstract
nil,non-nil,[non-ni]
T]>
or
= <T,
non-nil
,non-nil,non-nil,
T
program
>
the a b s t r a c t
interpretation
of
since
= h3
cutions
ber o{ steps,
sequence converges
which
is o b v i o u s
The s o l u t i o n
us that P2
= P3
[7] and
since
means
[10] are s t a t i c l y
which
according
which
imolies
t h r o u g h ot at lines
s h o w n to be c o r r e c t .
to the v a l u e of PI and PS,
2.2
to
the test on pt at line
be i d e n t i c a l l y
true.
is g a t h e r e d
about
the pro-
which executes
to this new m e a n i n g .
of some f a c e t s
We
the
then get
of the p o s s i b l e
exe-
A theoretic
framework
interpretation
o£ p r o g r a m s
together with
examples
are g i v e n
o£
in C o u s o t [ 1 g 7 6 ] .
A Safe Linguistic Framework to Handle Nil Poin-
that
A comolete
and s a t i s f a c t o r y
p r o b l e m of d e r e f e r e n c i n g
(as in ref real
is
[nil]
tens[1976] within
[11],
pointer
[3] m a y not
e reoresentation
any name r e f e r s
idea of z e n e r a l i z i n g
to r e p r e s e n t
f r o m the above
non-nil
80
:= 3,14]
the f r a m e w o r k
of n i l - n a m e s
to a nil name
is p r o p o s e d by M e e r of A L G O L
to n o n - n i l
68.
The
values
by
(this is a c h i e v e d by not profor the nil symbol),
to a v a l u e .
nil-names.
oointers
s o l u t i o n of the
or a s s i g n i n g
t y p e s are r e s t r i c t e d
exclusion
viding
p r o p a g a t i o n may be d e r i v e d
to the p r o g r a m
ters
With regard
[2] and
in p a r t i c u l a r ,
The s i m p l e p r o g r a m m e r ' s
and i n t e r p r e t a t i o n
is g i v e n
tells
[5] and
its i n t e r p r e t a t i o n
that pt m a y be nil at p r o g r a m p o i n t s
constant
various
the data
L s is a f i n i t e
that pt is not nil at lines
[g] of o u r p r o g r a m ,
of r e c o r d s
num-
system of e q u a t i o n s
= P4 = n o n - n i l ,
our i n t e r p r e t a t i o n
the a c c e s s e s
to o u r
in a f i n i t e
about
of the p r o g r a m .
abstract
Kleene's
some d e t a i l s
o£ an f n t e r p r e t o r
summary
of
we
are f o r g o t t e n ,
according
a static
versions
Sintzoff[1972]
both a new m e a n i n g
g r a m by m e a n s
have
Following
text and the i n f o r m a t i o n
= < T , ( T O F T] and n o n - n i l , n o n
algorithms
they are " e f f i c i e n t "
sequence.
o£ the o r o g r a m
>
stun,
and W e g b r e i t [ 1 9 7 5 ]
been recognized,
= < T , ( T or T] a n d
non-nil
same as above,
non-nil,
OF T ] >
12 = F[Ii ]
[4],
gives
:
= T
-<T,
lattice.
sequence
= P2
I i = F[~ 0 ]
Thus,
to the
non-nil
P2
:
P5]
$]
(where
according
= P2
the
sequence,
or
or
i
computa-
of the p r o g r a m
are i n t e r p r e t e d
computation
= ±,
useless
= non-nil
since
from
(P,
l
[T
the
Moreover,
morphism
P1
PS>]
application
as the limit of K l e e n e ' s
~O = < l ,
= <T,(T
actions
simolified equations
:
= <T,[±
A symbolic execution
the f o l l o w i n g
is of the f o r m
itself.
sequence when eliminating
elementary
to pt].
order-preserving
lattice
he d e £ i n e d
we get
preserving
fixpoint
a complete
[1952]
P4,
an order
ter-Tarski
F is
delivers
is a s s i g n e d
P2,
list,
that the { u n c t i o n
system of e q u a t i o n s
<P1,
where
denoted
the fact
(when d e f i n e d ]
value which
linear
The type void
Finally
the
so that
is used
type o£ nil end
is the u n i o n of the p r e v i o u s
ones.
For example we can write a c o n s t r u c t i o n like
mode
list = union
mode cell
adequate syntactic constructs]
is correct. Since in
this examole the type system is finite,
[ref cell, void]
: struct [integer value,
both ap-
proaches are equivalent as far as type verifications
list next]
are concerned.
to represent linked linear lists. An emety list is
represented by the value empty,
the only void value.
Our routine would have to be rewritten
:
3.
list pt :: L;
Variants of Record Structures
while case pt in
3.1
Ire# cell pt']
~>
else
In ALGOL BeEIg75] a variable may assume values
(ot := next of ot'~
out
~>
Unsafe Type Unions in PASCAL
if value of ot'=n then false
true]
aT different types. The type of this variable is
fi,
then said to be the union of the types of these va-
false
lues. In PASCAL[1974] the concept of type unions
is embodied in the form of variants of record struc-
eSaO
do skip od;
tures : a record type may be specified as consisting of several variants, optionally discriminated
This program is safe, since in ALGOL 88 the
by a tag field.
non-safe coercion of pt from mode union [ref cell,
void]
Example
to mode ref cell has to be made explicit by
a conformity case construct.
type mode
The idea is therefore
= [int, char];
type charint =
to force the p r o g r a m m e r to explicitly perform the
record
run-time tests, which in this example is dictated
anyway by the logic of the problem
:
case tag : mode of
{the rewritten
version admittedly looks a bit cumbersome, but more
int : [i : integer];
convenient ways of expressing such a flow of con-
char : (c : character]
trol may be exhibited
end;
[Oijkstra[1875]]].
vat digit,
2.3
Remarks
letter, a l p h a n u m : charint
In a program containing these declarations
the
occurrence of a variable designator alohanum.c is
It is remarkable that both approaches necessionly valid,
if at this point that variable is of
tate the same secure type system, yet they differ
type character.
It is so,
[if and] only if alpha-
in the choices of making it available or not to the
num.tag = char. H o w e v e r this is not staticly veriprogrammer.
fied by the PASCAL compilers for the following reaThe refined type system considers the pointer
type as the onion of two sybtypes
: pure
sons
:
[non-nil)
The tag field of a variant record definition
pointers and dummy (nil) pointers.
Type safety is
is optional, and may exist only in the pro-
guaranteed by requiring strong typing : the type
grammer's mind.
of a value determines which operations may be mea-
ningfully applied to it.
When present,
the tag field may be assigned,
thus allowing to realize implicit type transIn both cases the type correctness has to be
fer functions. For instance,
a variable of
verified or established by the compiler. For that
type character :
purpose an [often implicit) system of equations is
used.
equations has to be found by the compiler,
other
alphanum.tag
In one case the solution to that system of
alphanum.c
in the
case the c o m p i l e r simply verifies that the
~= char;
:= 'H';
may be interpreted as being of type integer
solution supplied by the programmer [by means of
for the purpose of printing the internal
8]
representation
- No a s s i g n m e n t s
:
alphanum.tag
:= int~
once
they
to the
have
been
tag f i e l d s
are
authorized
initialized.
writeln[alohanum.i];
- U n i t i n Z is a l l o w e d
(Note
that
the
tag
is a p p r o p r i a t e l y
about
its
value
set,
and
alphanum
care
one
can w r i t e
as well
::
:
:= letter~
:
is
alphanum.c
safe
but without
legal,
because
the
type
of
the
right
hand
side
'H';
value
eharint[char]
may
be c o e r c e d
to the
type
writeln[alphanum.i]:]
of
the
left
hand
side
t~b e c h a r i n t [ a n y ]
3.2
Safe Type Unions in ALGOL 68/EUCLID
variable
permits
charint[any]
a l o h a n u m to h o l d
v a l u e of t y p e c h a r i n t [ c h a r ]
(the
either a
o r a v a l u e of t y p e
charint
[int]].
Suggestions
structures
i.e.
have
which
compile-time
assignments
determine
been
ensure
Such
ta Z f i e l d s
the c u r r e n t
similar
to p r o v i d e
type-unions
checkable.
to the
a statement
made
that
to the
features
and
tag v a l u e
syntactic
are
let
from
"inspect
the
-
forbid
were
using
when", of S I M U L A
68 ~ 9 7 5 ]
we w o u l d
charint
= union
mode
integer
digit
charint
alohanum~
tag
field
which
in
original
its
is
:
(integer,
~ character
character]~
letter
safe
case
from
the
usin Z conformity
and
clauses.
The
antagonism
type-safe,
lows
with
which
ALGOL
PASCAL
handles
68-1ike
manner.
parameterized-tyoes,
a formal
parameter
of
: [int,
char]
is m o r e
variant
This
the
the
type
obvious
records
Since
al-
usually
declaration
mode
type
charint
[ta Z
: mode]
char
end
end
When
~>
vat c
: character
the
that
the
:: x ~ end
letter
run-time
:= x
check
type
transfer
tested
on
it
case
:
tag of
int
; end
char
statement
o£ w h i c h
out
carried
by
ensures
variant
a cam-
of a record
the
to the
compiler
checks
for
which
all
can
non-union
Static Treatment of Type Unions
char
has
been
vide
flexible
rity
[Wirth [1975]]
be
able
type
deliberately
unions
to d i s c e r n
following
abstract
Record
of
the r e c o r d
actual
tag
type
parameter
"charint"
may
is
be a c o n s -
designed
at the
expense
: however,
a wise
the
programs
secure
interpretation
vat digit
which
vat
: charint
letter
lint]
: charint
allows
alphanum
type
(char]
are
to p r o -
of
secu-
compiler
by
of t h e s e
a single
record
with
since
tide
[the
numerous
record
tag
is of e n u m e r a t e d
should
using
the
programs:
which
represents
[any]
at
moments
This
the
of p r o g r a m
a program
identified
to n e s t e d
T which
set
program
represented
is a f i n i t e
is a u g m e n t e d
non-initialized
point,
execution,
but
at
with
by
by
The
set
a null
value.
two d i f f e -
two d i f f e r e n t
dealin Z with
:
82
may
be
assumed
by
a tag
field
a
variants
is s t r a i g h t f o r w a r d ] .
type
value
same
variants
types
Since
the
consider
generalization
and
values
type unions
ta Z,
:
when
abstractly
We will
values.
type-safe
be
tag f i e l d s .
rent
68 or E U C L I D
will
of d i s c r e t e
unions
: charint
values
by t h e i r
single
:
vat
ALGOL
to r e t r i e v e
case
a variable
or any,
digit
an
:
int
~ end
and
ta Z is e x p l i c i t l y
d i s c r i m a t i n g case
PASCAL
~ end
ensures
corresponding
=
i : integer
united
is in use,
charint
declared,
tant
var
to r e t r i e v e
tyoes~
tag of
int ~ >
type-checking
be
record
case
of
way
is by a d i s c r i m i n a t i n g
be
3.3
type
been
only
in
in a
EUCLIO
tag w i l l
The
the
~>
if
case
olete
EUCLID[1978]
principle
type
This
int = >
programmer,
since
discriminatin Z x : alphanum
;
end
has
since
char
is h i d d e n
checked
write
the
object
coercion,
:: a l p h a n u m
be v i o l a t e d .
statement.
In A L G O L
m a y be
allowed,
would
[I 974 ].
The
is no d e - u n i t i n g
letter
compiler
context
There
safe,
of a r e c o r d
variable,
a s t a t i c s u m m a r y of the p o t e n t i a l
gram executions must consider
tag fields.
variables
(More g e n e r a l l y ,
o{ e n u m e r a t e d
a set of values
type].
over,
is a f i n l t e
Thus the a b s t r a c t
in 2 T, the p o w e r -
complete
if the p r o g r a m c o n t a i n s
enumerated
lattice.
to take
c o u n t of t h e m in the p r o g r a m a b s t r a c t
m simple
Finally,
(2 T x ... x 2 T] m times.
programs
{null}
{null}
{null}
the a s s i g n m e n t
{male}
5)
Since
this space
can be p e r f o r m e d
at c o m p i l e
this g i v e s
rise
un-
to two e x e c u t i o n
paths
[6] and [8] :
I
[6)
{male}
I {female}
{null}
i
is
[7]
of
time.
{male}
[ {female}
[8)
{male}
{female}
9]
{male}
{female}
10]
The two e x e c u t i o n
Example :
= {mole}
=
{male}
{null}
{female}
paths
are j o i n e d
together:
i female}u{female}i{male}
'
{male}u{male}
type o e r s o n
I{null}
the value of the test is s t a t l c l y
Known,
is a com-
the a b s t r a c t e x e c u t i o n
is i g n o r e d
to m a r y . a g e is ignored.
the a s s i g n m e n t
interpreta-
space
to p a u l . a g e
I {female}
Since
if the p r o g r a m c o n t a i n s
with tag of type T, o u r a b s t r a c t d a t a
lattice,
{null}
{male}
ac-
v a r i a b l e s of type T or record v a r i a b l e s
plete f i n i t e
{null}
[2)
[4)
of
senior
mary
[I)
[3)
More-
simple v a r i a b l e s
type T, it is c o n v e n i e n t
tion process.
i
line i paul
for
this is the case for
v a l u e s o{ the tag will be c h o s e n
set of T, w h i c h
pro-
= {female}
u {Female}
= {mole, female}
record
case
sex
: (male,
female]
of
Note that at line
may have
end ;
vat paul,
mary,
senior
don't appreciate
: person;
[10]
tag v a l u e s
senior.sex
it is c l e a r that "senior"
"male"
or "female".
the qact that
= if p a u l . a g e
paul,sex
else
:= male;
but n e i t h e r do A L G O L 68 n o r EUCLIO.
paul.age
:=
...;
mary.sex
:= female;
mary.age
:= ...;
guages
(3}
(4)
union
(S)
if p a u l . a g e
£ mary.age
then
(6)
senior
it is e v i d e n t
that in some
m e r k n o w s p e r f e c t l y well
:= paul;
type is used,
{7)
female {i
With
cases
these
but is u n a b l e
of a
to e x p l o i t
this
statement.
This same
limitation
t r e a t m e n t of p r o g r a m s ,
lan-
the p r o g r a m -
which alternative
since he m u s t use a d i s c r i m i n a t i n g
exist
else
then male
Knowledge,
static
we
:
~ mary.age
(1)
(2}
However,
case
a r i s e s w i t h our
more powerful
schemes
[Sintzoff[1975]].
(6}
senior
:=
mary;
Finally,
(9)
useful
end;
[10]
in the s t a t i c
information
tements, and if s t a t e m e n t s ,
mity
The symbolic execution
of
that piece of
used
f r o m case sta-
as A L G O L 68 c o n f o r -
tests.
Example
program would be :
t r e a t m e n t of p r o g r a m s
will be g a t h e r e d
{Paul
:
= {male}
; Mary
= {Female}
; Senior
= {Male,
Female}}
if S e n i o r . S e x
...
[I)
...
[2)
...
else
...
fi
83
= Paul.sex
then
The abstract i n t e r p r e t a t i o n of a test (A = B) in a
1.1
Static variants to describe classes of data
context where A and B are variables which may as-
which are different but yet closely related.
sume set of values S A and S B delivers a context
For examole, Men and W o m e n may be described
w h e r e A and B may assume the set of values S A n S B
as Persons d e p e n d i n g on their sex, thus
on the true path.
EUCLIO authorizes
(Thus in {I) we get Paul = Senior
= (Male} n {Male, Female}
= {Male}).The context De-
livered for the false path is
type Person
:
(Sex = {Male, Female))
= ...
type Man = Person(Male)
type Woman
A = i__f (ISA nSBI = 1) and not (SAc S8) then SA - SB
= Person(Female)
In PASCAL however, variables of abstract
else S A f i
(Thus in (2] we get Paul
:
type Man and Woman may be staticly recogni-
: {male} and Senior = {Fe-
zed when their tag values never change.
male}).
1.2
Dynamic variants,
to describe objects whose
When this abstract interpretation of programs
components depend on a possibly
is terminated we can recognize secure programs by
the following facts
:
stopped,
There are no assignments to tag fields, other
than
changing
state. For example a car may be m o v i n g or
thus EUCLID authorizes
type Car
for i n i t i a l i z a t i o n ( w h i c h is recognized
{State
: {moving, stopped, des-
troyed))
by the fact that the tag value is changed from
:
= ...
vat mycar : Car{an__~)
null to some value). We can also authorize useless tag assignments,
Since the actual oarameter supolied for the
i.e. those which assign
tag is any,
to a tag w i t h o u t c h a n g i n g its value.
the variable can be changed from
one variant to another d u r i n g execution, by
The unsafe de-uniting coercions must be checked.
assigning values of d i f f e r e n t variants to
This cannot occur when a record variable is assigned to another,
the variable. However,
since all record variables
are considered to be of union types.
no r e f i n e m e n t is al-
lowed, and no proper subset of the possible
(Note that
tag values can be used
such an assignment may indirectly modify a
:
vat m y c a r : Car({moving,
stopped})
tag value, but this is safe). Oe-uniting coerThis fact may be discovered by a static ana-
cions only occur when accessing a field in a
lysis of the program, and might be useful
record. This is safe only if the tag has been
in memory allocation.
staticly established to be of correct value.
Otherwise,
a w a r n i n g is reported to the ueer,
2. Storage S h a r i n g (Overlays). This implies the use
and a run-time check inserted in the program.
of the same storage area (expressed in the language as "the same actual variable") for diffe-
3.4
Flexibility Versus Security
rent purposes i.e. for r e p r e s e n t i n g different
abstract variables whose lifetimes are d i s j o i n t
This compiler aoproach has the advantage of fle-
{block structure is not incorporated in PASCAL).
xibility over the secure language approach. It
This is a typical case of free union, where no
is clear that all EUCLIO orograms translated in-
tag will be carried along to indicate the cur-
to PASCAL will be recognized to be safe by this
rently valid variant. This tag may be staticly
technique.
simulated,
Following Wirth[1975] there appear to be three
fields of the variant. Unsafe a s s i g n m e n t s will
d i f f e r e n t motivations behind the desire for variants
of record structures
provided that one ensures an appro-
priate setting of the tag upon assignment to
be identified and therefore the m u t a t i o n from
:
one abstract variable to another may be reported
I. The need for heterogeneous structures, in two
main cases
to the user.
:
84
3. R e a l i z a t i o n
EUCLID
of i m p l i c i t
in r e c o g n i t i o n
led b r e a c h e s
necessary,
by m e a n s
of the Tact that c o n t r o l -
provides
unchecked
oT type c o n v e r t e r s
the o r o z r a m
elements
never refer
:
seen how a P A S C A L
it is c l e a r
: d o m a i n of elem;
comoiler might
0S2
: d o m a i n of elem;
that P A S C A L
provides
fle-
soecify
that DS1
dynamic
variables,
into d i f f e r e n t
We h a v e shown
: + DSI~
$2
: + OS2;
The r e s u l t s
of this s t a t i c
z r a m s m i g h t also be useful
we get a s o p h i s t i c a t e d
It is o b v i o u s
constructs
can be simoly
cessitate
language
the p r o g r a m m e r
lan-
to
scan
intentions
is the case
examole concerning dynamic
two p o i n t e r s
which
may ne-
be sets of d i s j o i n t
i£ $I and $2 are p o i n t e r s
records
of the same
between
of the linked s t r u c t u r e ,
to d i f f e r e n t
type.
a pointer
only in the p r o g r a m m e r ' s
$I and $2 o o i n t
of the program-
c a u g h t by c o m o i l e r s
This
that
; the d e c l a r a t i o n s
:
the c o n f u s i o n
the f i r s t e l e m e n t
list is v a l i d
has no
Now,
to d i f f e r e n t
Unfortunately
for e x p r e s s i n g his i n t e n -
rich and not n e c e s s a ~ l y e a s y
constructs.
they o e i n t
Thus
that the p r o g r a m s will
since
H o w e v e r some simple
mer which
of pro-
c o m p i l e r for a s i m p l e
then,
not be v e r y readable,
treatment
in c o d e g e n e r a t i o n .
and OS2 will
domains
$1
tions.
now n e c e s s a r y
o n e can s p e c i f y
0SI
tructs h a v e been used in e i t h e r s e c u r e or i n s e c u r e
preestablished
is
to the same record
that a c o m p i l e r m a y r e p o r t to the user w h i c h c o n s -
guage.
it
code of the c h a r a c t e r
x i b i l i t y at the e x p e n s e of security.
ways.
and
to s t a t e the c o n t r a r y .
In L I S [ 1 9 7 4 ]
type c o n v e r s i o n s ,
this fact to the user.
Finally,
share
all
<<= c h a r a c t e r [ ' H ' )
to i the i n t e r n a l
'H'. We have
reoort
may
of the t y p e s y s t e m are s o m e t i m e s
i := u n s i g n e d - i n t
assigns
type t r a n s f e r f u n c t i o n s .
records
to
and the
intellect.
of type elem,
t h e m s e l v e s m a y p o i n t to the same record.
the idea of d o m a i n s
has to be r e c u r s i v e l y
in o r d e r
that e l e m e n t s
to s o e c i f y
Thus
applied
of d o m a i n 0SI
to u n d e r s t a n d
o o i n t only
in o u r next
a l l o c a t i o n of records.
0SI
to e l e m e n t s
: domain
tyoe eleml
of 0S1
:
of eleml;
= record
next
: + 0SI;
val
: R;
4. Disjoint Collections of Linked Records
end;
4.1 Collections in EUCLID
and
that e l e m e n t s
of D S 2
Suppose
in PASCAL we h a v e to r e o r e s e n t
[of t y p e R], we can u s e two a r r a y s
:
: d o m a i n of e l e m 2
With
such a d e c l a r a t i o n ,
next
comoiler
that the sets $1 and $2 are disjoint,
say any m o d i f i c a t i o n
and v i c e - v e r s a .
namically
linked
tyDe list
elem
knows
linear
lists
we will
S i n c e we w a n t
cardina-
different
use dy-
:
different
eleml
= record
val
: list;
to g u a r a n t e e
domains
riable we h a v e
= + elem;
next
: R
end;
that is to
that n, the m a x i m a l
lity of the two sets is not known,
: t OS2;
val
of $1 has no s i d e e f f e c t on $2
Suppose
= record
of R;
the P A S C A L
;
:
tyoe e l e m 2
v a t $I, $2 = a r r a y E 1 . . n ]
to e l e m e n t s
two sets
0S2
of r e c o r d s
of 0S2 can o o i n t only
that two p o i n t e r s
into
can n e v e r r e f e r to the same va-
to c o n s i d e r
that +DSI
tyoes of Dointers.
and elem2 are d i f f e r e n t
to w r i t e
twice
deletion
...) which
the a l z o r i t h m s
handle
and
The t r o u b l e
types,
+DS2 are
is now that
so that we have
(insertion,
search,
the two s i m i l a r
lists $I
and $2.
: R;
end;
EUCLIDE1978]
vat $I,
$2
tyoes
This tffme, the r e a d e r s
compilers]
is m o r e
flexible
and a u t h o r i z e s
: list;
of the o r o g r a m
have to s u p o o s e
(e.g. P A S C A L
to be o a r a m e t e r i z e d .
Thus we will d e s c r i b e
the tyoes of lists $I and $2 once,
that the sets $I and $2
the d o m a i n
85
(called c o l l e c t i o n
as d e o e n d i n g
in EUCLIO]
on
to w h i c h
they
The
collection
Delong.
type
elem
collection
is p a r a m e t e r i z e d
to w h i c h
This
collection
type
elem]
type
elements
by
the
of t y p e
C is a c o l l e c t i o n
pointing
elem(C
to C
name
elem
Since
C of the
of
point.
of r e c o r d s
be p r o v i d e d
[o ~
ve
:
: collection
of e l e m ( C ) ]
we h a v e
entered
listsuoport
it by
a reeursive
in the d e f i n i t i o n
by an a c t u a l
some
type
=
lanzuaze
parameter]
listsuooort(OS
next
v a t val
record
0SI
vat
$1
: collection
Now
: collection
$2
cribed
once,
a parameter
on
0SI
been
insert
later.
Since
we
lists
$I
suffices
or OS2
$1,
record
and
$2 c a n
to pass
to w h i c h
of
parameter
actual
Now w e
have
the
they
type,
be d e s name
refer
$I w h i c h
to
of
indicate
insert[OS
actual
but
To m a k e
have
0SI
and
0S2
: collection
the
0SI,
different]
conspicuous
same
:
are
as
that
Since
the
type
linear
pends
on
listsupport
type
its
tion m u s t
be
o£
type
the
type
lists
just
of
records
0SI,
the
listsupoort[OS
elem
of
vat
0SI
val
type
: R]
formal
[or
the s a m e
type.
that
to 0S1.
as
port
0SI
is i n c o m p l e t e
"listsupport"
type
elem
We g e t
of
that
type
point
elem,
to c o l -
:
(OS : l i s t s u p p o r t ( p a r a m e t e r ] ]
next
the
: +C
record
= collection
: listsupport(OS1];
$I
: + 0SI
: listsupoort[OS2];
$2
: + 0S2
is p r e c i s e
with
but
somewhat
the P A S C A L
type
sum-
list
=
type
of
by t h a t
: ?]
the
elem
= record
val
var
:
= collection
Apart
o_~
:
above
since
of
elements
OS
of
: list;
:
R
and $2
from
the
the
and
are
disjoint
the E U C L I O
precision.
$2 are
;
difficulty
tage
of
: list
disjoint
especially
for
linked
of c o p e i n g
approach
Since
the
lists,
register
linear
with
has
it can
a new
the
comniler
lists}.
lin-
advan-
Knows
produce
that
better
allocation.
pointing
is a c o l l e c t i o n
Moreover
of
type
restricted
memory
: listsupport[?]]
$2
{$I
notion,
listsup-
:
listsupport(OS
$I,
guistic
$I
declaration
when
:
end;
de-
collec-
name
overcomplicated
declarations
+ elem;
of e l e m ( ? ]
as 0SI
of e l e m ( O S ]
name
collections
such
=
: R
listsuooort
:
is a c o l l e c t i o n
the
of
of
var 0S2
compared
= listsupport[OS1]
However
name
v a t 0S1
which
will
code
means
type
type
to r e p l a c e
CO : l i s t s u p p o r t ( o a r a m e t e r ] ]
vat
elem(OS]
such
is the
now w a n t
by the
listsuoport.
vat val
collections
= collection
parameterized
A declaration
be p r o -
record
type
of e l e m [ O S ] ,
to g i v e
the
of a c o l l e c t i o n
name
a shorthand
will
whet
we
type
next
type
that
in the d e £ i n i t i o n
listsuooort
:
OS,
different
we now w a n t
to the
linked
0S2
S
collections
that
type,
"listsupport"
porting
OS,
=
of e l e m ( O S ]
parameter
lists
of this
of t y p e
OS c o r r e s o o n d i n Z to the
parameters
that
indicates
in d e f i n i n g
in p a r t i c u l a r
lections
belongs
to d e c l a r e
vat
is c l e a r
to sol-
: listsuoport[perameter])
actual
supporting
end
procedure
have
= forward
r in list
0SI.
It
we
:
r)
to c o l l e c t i o n
the f o r m a l
the
succeeded
collection
elem(OS2]
type
the
parameter
used,
vided
:
insert[OS1,
possible
keyword
has
the d e f i n i t i o n s
it just
the c o l l e c t i o n
will
of
: + 0S2
the o p e r a t i o n s
The
use
must
of e l e m [ O S 1 ]
: + 0SI
var OS2
vat
: +C
: R
(each
listsuoport
collection
vat
vat
question
of
convention
record
end
of e l e m ( O S )
cord
=
86
the
variants
allocation
type
combination
R with
in r e c o r d s
strategies.
two v a r i a n t s
of c o l l e c t i o n s
may
yield
Suppose
Ra,
and
efficient
we have
a re-
Rb of d i f f e r e n t
memory
sizes
Type
say 1 and 3 w o r d s
Rtyoe
Type
= (Re,
R (tag
:
Although
tion
Rb]
: Rtype]
= record
case t a g i n
Ra =
...
end Ra
Rb ~
,..
end Rb
We have
cation
the
following
of R
of c o l l e c t i o n
compilers
seems
for memory
We w i l l
pointer
01:
I I I I I I I I I
v a t C2 : c o l l e c t i o n
of R[Rb]
I
v e t C3 : c o l l e c t i o n
of R{unknown}ll
[the
collection
type of records
(it m a y
will
vat
be R[Ra]
not c h a n g e
C4
(The
once
: collection
records
variant
values
during
Thus
defect
we c a n n o t
lists
and we may
to
in P A S C A L
let
collections.
represent
variables
a collection
which
point
by the
within
that
set of
collec-
Examole
:
X
Y
Z
C3 i s unknown
of a r e c o r d
I I ---I
C4 can
change
execution,
variants
ber of c o l l e c t i o n s
somewhat
its u s e f u l n e s s
t
to the
I I---1
from
C1 ~
of
collections
is d e t e r m i n e d
declare
an array
C2
one
by a s s i g n i n g
Collection
records).
will
The m a i n
no-
tion.
III
type
R[any]
of c o l l e c t i o n
of d i f f e r e n t
the
the
allocated).
of
to a n o t h e r
1
of collection
or R [ R b ] ] . T h e
appear
However
undeniable
discover
may
cower
allo-
:
o f R[Ra]
vat
in E U C L I O
to u n d e r s t a n d .
compiler
expressive
4.2 Compiler Discovery of Disjoint Collections
alternatives
of c o l l e c t i o n s
limited
difficult
the
end e a s e ; end r e c o r d
of quite
is that
at c o m p i l e
o~ d i s j o i n t
the
the
num-
01 will
be d e n o t e d
pointer
will
linear
not
ones•
:
mute
C1
try
Thus
the
Z].
(V;W],
We will
in o o p o s i t i o n
to rind
global
local
program
form
collection
try
of a p r o g r a m
However
the
at e a c h
be of
Y,
variables
collections.
time.
be d e n o t e d
[×,
into
disjoint
to EUCLID,
collections
we
but
local
try
to com-
invariants
we will
9oint
be r e s t r i c t e d
will
C2
to p a r t i t i o n
to
:
[V, W are
pointers
to the same
collection]
and
TIll
[X,
Y,
Z are
oointers
to the
same
collection]
T[2]
which
we w i l l
denote
:
{V, W / X , Y , Z }
We n o w have
TEn]
to d e f i n e
dicates
(i.e.
example
:
the
the
union
conjunction
of sets
~ of such
{ A , B , C / D,E} ~ { F , A , G / H} = { A , B , C , F , G
I £ on one hand A may p o i n t
It has
not b e e n
collection
tions
[which
be u s e d
tition
is in f a c t
which
would
criticism
recognized
the
are v a l i d
union
to the
the c o n c e p t
recursively,
a collection
a globally
declared
of
collec-
at v a r i o u s
be u s e f u l
is that
that
that
into
smaller
program
compiler).
disjoint
record
referenced
A similar
by F and G, i t
par-
sub-collections.
87
referenced
clear
that
to a
A, B,
on the same r e c o r d .
of
formation•
the i n s t r u c t i o n s
After
is
for
/ D,E / H}
hand A may p o i n t
The i n s t r u c t i o n s
cannot
cannot
to a record
on the o t h e r
C, F and G may p o i n t
ooints
of c o l l e c t i o n
is to say one
by B and C, o r ,
pre-
of c o l l e c t i o n s )
the p r o g r a m p r o v i d e
:
useful
in-
X
: = nil ;
X
:= Y; { w h e r e
if X
= nil
new
it is
Y is
then
known
to be
nil}
...
The
following
the
job
Known
var
that
or will
be
record.
Thus
X will
the o n l y
pointer
we h a v e
or c o n s i s t i n g
of
point
to no r e c o r d
to
isolated
a single
the
newly
at all,
With
copy
CI,
C1
X 2 .....
X .....
Xn /
Yi . . . . .
C2,
:: $1~
while
= {X i,
($1
L
: list;
v a t $2
: list];
: list;
$2
:: nil;
L
:= nil:
~P1}
input
predicate
P1
to do
{P0}
(empty
an
is s u p p o s e d
bezin
allocated
a collection
record).
procedure
:
procedure
[X]
PASCAL
Cl
<> n i l
do
begin
Yn }
{P2}
the
above
P2
instructions
= extract(X,
{X /
lead
to an o u t p u t
predicate:
new(C2)~
X1 .....
C2+.val
:= C l + . v a l ~
C2+.next
:= nil;
{P3}
PI)
X
/
n
Y
More
generally,
with
an
input
ter
assignement
such
as
:
Y }
"''"
n
i"
predicate
if
L = nil'then
{P4}
P,I, a p o i n -
$2
:= C2
{PS}
X+.next
...
+,next
:=
Y+.next
optional
may c a u s e
record.
The
output
+.next
else
optional
X and Y t o
Hence
...
they
indirectly
are
predicate
put
will
point
in
the
be P2
{P8}
to
L+.next
a common
{P7};
{P6}
same c o l l e c t i o n .
= P1
:= C2
L
u {X,Y}.
:= C2;
C1
:= C l . n e x t ;
{P9}
A sensible
the
remark
right-hand
in w h i c h
broken
plicity,
obvious
side
case
into
of
we
this m a y
ignore
the v a l u e
delivered
the a s s i g n m e n t
cause
two d i s j o i n t
case
{P1}
is that
this
may
a collection
other
be
e nd
nil,
{PlO}
to be
sub-collections.
fact,
by
For
than
end~
sim-
in the
According
:
the
X :=
Y+.next
...
+.next
{P2}
basic
establish
;
[I]
optional
P1
to o u r
constructs
the
abstract
of
the
following
interpretation
lenzuage
system
we
can
o£
now
of e q u a t i o n s
:
= extract[L,extract[S2,extract(C1,PO)
U {C1,$1})]
which
will
cause
X to be d i s c o n n e c t e d
collection
and
collection
of Y. W h e n
variable,
ted
be c o n n e c t e d
the o u t p u t
to the
inout
X end
P2 = e x t r a c t ( X ,
to a r e c o r d
Y are
assertion
assertion
P1)
from
[2]
of the
the
P2 will
PI by
~ {X,
not
its
be
same
rela-
(3)
:
P2 = P1 u P9
(Since
the
[C1
<> n i l )
mation
on c o l l e c t i o n s
when
assignment
of
deed modification
Now,
sen
the
we w i l l
copying
give
an e x a m p l e .
of a l i n k e d
linear
We h a v e
list
b y C2 a r e
cho-
:
us
no i n f o r -
non-oointer
in
the
values
structure
and
pointed
ignored)
[4]
P4 = e x t r a e t ( L , P 3 ]
(5]
P5 = e x t r a c t [ S 2 ,
(6]
P6 = P3
(since
88
gives
true]
P3 = e x t r a c t [ C 2 , P 2 ]
(The
Y}
test
we i g n o r e
[7)
P7 = P6 ~ { L , C 2 }
(8)
P8 = P5 ~ P7
P4) ~ { $ 2 ,
the
£act
C2}
that
L <> n i l )
a
to
[9]
P9
[I0]
= extract(L,PB]
[The
statement
same
collection)
PIO
=> P6 = P3 = { $ 1 , C I / $ 2 , L / 0 2 }
*[8]
u {L,C2}
Cl
:: C l . n e x t
leaves
C1
in
the
(7]
=> P7
(8]
=> P8
*
= extract[CI,P1
= P6 ~
{L,C2}
= {$1,C1/$2,L,02}
P7
= P5
u P7
~ P9]
{$1,CI/$2,C2/L}
Since
that
the
the
above
are v e r i f i e d
theoretical
system
conditions
of e q u a t i o n s
[Cousot[1076])
fixpoint
using
We
with
a #inite
has
we can
Kleene's
which
*
ensure
a solution
compute
the
Same v a l u e
least
only
dicate
PO,
($1,$2]
where
and
[CI,C2,L]
*
=> P1
on
on the
are
PO
[I)
the m o s t
other
supposed
= {$I,S2
hand
hand
the
the
initial
stop
on t h a t
o£ the
loop
path.
It
remains
:
Pi
PIO
ore-
= extract[el,
P1 ~ P9)
: extract(C1,
{$1,CI/$2/C2/L}
:
{$1,C1/$2,C2,L}]
parameters
local
to be in the
/ CI,C2,L}
out
sequence.
disadvantageous
the one
as a b o v e ,
the p a t h
(10]=>
start
~ {$1,C1/$2,L,02}
P8 = { $ 1 , C 1 / $ 2 , L , C 2 }
{$1,Cl/$2,C2,L}]
variables
same
= ±,
collection:
¥i
PIO
*
{ [1,10]
The
= extract[L,extract(S2,extract[O1,PO]
£inal
main
{cl,sl})l
= {01/$1/$2,C2,L}
results
result
records
extract[el,
are m a r k e d
is that
on e n t r y
although
of the
by
a star
$I
and
procedure
[*].
$2 may
"copy"
The
share
:
: extract[L,extract(S2,{SI,S2/C1/C2,L}
PO
u {Cl,Sl}))
*
P1
= {$1,$2/C1,C2,L}
= extract[L,extract(S2,{S1,S2,Cl/C2,L}]]
it is g u a r a n t e e d
that
= extract[L,{Sl,Cl/S2/C.2,L}]
of
:
=
{S1,C1/$2/02/L}
[2]
=> P2 = P1 u P9 = P1
[3]
=> P3 = e x t r a c t ( C 2 , P 2 ]
(4]
=> P4 = e x t r a c t ( L , P 3 ]
(5)
=> P5
the
orocedure
PIO
this
is not
the
case
on e x i t
= {C1/$1/$2,C2,L}.
u ± = P1
{SI,G1/S2/G2/L}
=
{$1,C1/$2/C2/L}
=
= extract[S2,P4]
4.3 Remarks
u {$2,C2}
a. This
{$I,CI/$2/C2/L}
abstract
interpretation
of p r o g r a m s
may
be
u {$2,C2}
refined
as in E U C L I O
one
associate
: when
records
have
variants
P5 = { $ I , C 1 / $ 2 , C 2 / L }
*
(6
=> P8 = P3 = { $ 1 , 0 1 / $ 2 / C 2 / L }
(7
=> P7 = P6 u { L . C 2 }
{$1,C1/$2/C2/L}
u {L,C2}
can
of tags
of
in f a c t
will
all
velopments
with
each
records
be
xib]e
=> P8 = P5 u P7
= {$1,C1/$2,C2/L}
than
authorize
u {$1,C1/$2/C2,L}
{A,B}
application
of o a r a g r a p h
3,
We will
the
or
"any"
"one
oT"
collections
amon Z three
the
in the c o l l e c t i o n .
the m a i n
{ $ I , C 1 / $ 2 / C 2 , L}
(8
collection
with
say
possibilities
set
This
o£ o u r d e -
be m o r e
of E U C L I D ,
#leand
two v a r i a n t s
{A,8,C}.
Other-
= {$1,C1/$2,C2,L}
[g]
=> P9 = e x t r a c t ( L , P S ]
*
P9 :
in
variant
PIO
PO . . . . .
=> P2
= PI
:
*
P2 =
the
while-loop
have
stabilized
{S1,Cl/$2/C2/L}
the
reason
on
{A,B,E}
in-
the
following
= T
:
~
{A,B}
{A,C}
{A}
{B}
{B,C}
{$1,Cl/S2,C2,L}
{$1,01/$2,C2,L}
=> P3 = e x t r a c t ( C 2 , P 2 ]
*[4]
=> P4 = e x t r a c t [ L , P 3 )
pass,
until
we
:
u P9
*(3)
We come b a c k
stated
rarchy
{$1,01/$2,C2,L}
We go on c y c l i n g
[2]
wise
u {L,C2}
= {$1,C1/$2,L/C2}
{}=±
{$1,CI/$2/L/C2}
=
for
P4 w i t h
the
so we s t o p
on t h a t
path.
value
of
the
previous
89
{C}.
type
hie-
whereas EUCLID uses a s i m p l i f i e d type inclusion
fuse the automatic discovery of collections. The
scheme
advantage however must be c o u n t e r b a l a n c e d by the
:
{A,B,C}
/
(A}
fact that p a r a m e t e r i z e d collections
\
(B}
(which are
= T
necessary with recursive data structures) may
become inflexible and d i f f i c u l t to use.
{C}
\l/
We now come t o
{}:a
between
tely
b. Besides and in opposition with EUCLID the collections are defined as local invariants, Very pree
the
programmer
necessary
unions,
an e x a m p l e
that
for
is
and
the
secure
to
say
where
a cooperation
compiler
and c h e a p
a case
has
definite
disadvantages
5.
Integer 8ubrange Type
when
over
use
the
the
is
absolu-
of
type
compiler
programmer.
cise and detailed information can beJgathered
whereas the EUCLID p r o g r a m m e r would have to
g l o b a l l y specify the union of such information.
This localization of c o l l e c t i o n s may have imporA subrange type such as
tant c o n s e q u e n c e s
type index = 0..9
- An optimizing c o m p i l e r will be able to limit
the number of objects which are supmosed to
is used to specify that variables of type index
have been m o d i f i e d by side-effects when assigning to objects designated by pointers,
will possess the properties of variables of the
[use-
base integer type, u n d e r the restriction that its
ful in register allocation],
value remains within the specified range.
[19751). In Cousot[19751, we developed
- R u n - t i m e tests may be inserted before a statement
i n t e g e r variables. Let us take an obvious example :
to verify that no variable in the c o l l e c t i o n
i i= I;
of X may access the record which X points to,
{P1}
while
- The g a r b a g e collector may be called when all
variables in a c o l l e c t i o n are "dead"
The s i m p l e
abstract
we i l l u s t r a t e d
stylized
may u s e
the
that
ways.
fair
the
interpretation
here
recognize
is
_< 1 0 0 0 do
i
:=
i+1
{P3};
{ P4}
- etc...
It
i
{P2}
[i.e.
are not used before being assigned to),
c.
[Wirth
a techni-
que to have the c o m p i l e r d i s c o v e r the subrange of
:
dispose[X}~
to
:
:
may be f u r t h e r
data
structures
Boom[1974],
however
to
say
provided
a d v a n t a g e of EUCLID
Let
programs
investigated
are
used
that
to
by t h e
ple
in
EUCLID c o m p i l e r s
locally
refine
programmer.
by [a,b]
the
equations
predicate
a _< i
corresponding
to
_< b .
our
exam-
:
(1)
P1
[2]
P2 = [P1
[3]
P3 = P2 + [ 1 ,
(4]
P4
where
and
(or better part of
= [1,11
= (P1
+ is
u and
vals.
intentions since the expressive power of collections is limited),
is
of
u P3)
n [-oo,
1000]
1]
u P3]
n [1001,
+oo1
The
i~ then t h a t w h e n the p r o g r a m m e r
has declared his intentions
us d e n o t e
The s y s t e m
Kerr[1975].
same t e c h n i q u e s
collections
of
defined
n are
Suppose
by [a,
union
b]
and
+ [c,
d]
= [a+c,b+d],
intersection
we knew t h e
solution
of
to
inter-
that
system,
iie.
he is forced to c o n f o r m
to his declarations. For example he will not
P1
be able to use the same pointer variable to
P4 = [ 1 0 0 1 ,
traverse two lists which are built in diffe-
It
rent collections. On the contrary this may con-
this
90
is
= [1,1],
obgious
solution
P2 = [ 1 , 10001
, P3 = [ 2 ,
10011,
10011
to
let
is
a -Fixpoint
the
compiler
o-F t h e
verify
system
that
:
[1]
-~-> P1 = [ 1 ,
1]
(2]
~>
u P3]
P2 = (P1
([1,
1]
[[1,
1001]
= [1,
[3]
~>
~>
u [2,
P3 = P2 + [ 1 ,
10001
1001])
n [-oo,
clear,
n [-oo
I000]
[1,
:
[2, 1001]
To make it
instead of P1, Pi, PI, P4 the programmer is
only able to declare vat i :
1000]]
1 ..1001
that is to
say that PJ u P2 o P3 u P4 E [I, 10011 which adds
an inequation to tiTe system o£ equations but does
1]
:
P4 = (P1
[-oo,
1000]
1000]
[1 + 1,
(4]
compiler to discover local subranges.
n
not provide its solution. We then consider integer
+ [1,
1]
suorange types as union types since the global de ~
1000 + 1]
claration must be the union of all local subranges.
Thus, if
U P3]
n [1001,
we declare
:
+~1
var i : 0..2;
[[1,
1]
u [2,
[I, 1 0 0 1 ]
:
[1001,
1001]]
n [1001,
n [1001,
+~1
we really want to say that the type of i at each
+~]
program point is one of the following a l t e r n a t i v e s :
1001]
If on the contrary we want the c o m p i l e r to dis0..2
cover this fixpoint, we may try to solve the equations by algebraic m a n i p u l a t i o n s
[Cheatham and
TownleyE1g761] which may be quite inextricable.
other way is to use Kleene's sequence,
The
but the trou-
ble is that our abstract data space is an infinite
lattice, and we may have infinite sequences. Since
0 . .0
2,
.2
compilers must work even for programs which may
turn out to loop,
the only way to cope with the
undecidable problem is to accept a p p r o x i m a t i v e ans-
]t
wers. For example in the program :
for i := I to 100 do
We t h e n
begin
understand
a criticism b y H a b e r m a n n [ 1 9 7 3 1
that sLbranges ere not types,
n := i;
while
if
write
since a global sub-
range type definition does not determine the set
n <> 1 do
even(n]
of operators that are applicable to variablesof
then n := n/2
that type.
else n := 3 * n + I;
For example,
(i]
let f be a {unction with one formal
parameter o# type 2..I0 and i a variable globally
end;
declared of type d.,5. The variable i may be used
at program point p in the expression f[i] provided
C o u s o t [ 1 9 7 5 ] will d i s c o v e r an approximate range
that i may be united to the subrange 2..10. Oynafor n which will be [1,
+~].However,
if the actual
mically the local type of i at program point p is
range of n were known by the programmer and if the
i..i, wnich is simply derived from the value T of
programmer could tell this to the compiler,
then
the variable i. in the expression f[i],
a v e r i f i c a t i o n would be simpler
i must be
[in most cases but
coerced from the type i..i to the type 2..10. This
not on this difficult example].
is
fies that the subrange of i at program point p must
We can now state our main objection against subrange types in PASCAL
safe when 2 ~ i and i s 10. Staticly this signi-
be a subrange of 2..5. This subrange of 2..5 cannot
: the fact that range asser-
tions must be given globally in the d e c l a r a t i o n pre-
be locally specified in PASCAL.
vent the programmer from giving the solution of the
This u n d e r s t a n d i n g of subranges leads us to the con-
system of equations
clusion that integer subranges should be specified
to the compiler.
The programmer can only give an approximation of
locally. Moreover,
the solution,
vious examples we cannot exbeet the compiler to be
which is usually insufficient for the
9]
and in opposition with our pre-
able to discover these local subrange properties.
results will be obtained by type checking or type
It is then essential that orogrammers provide them,
discovery as long as finite type systems are consi-
by means of assertions or as previously by means of
dered. The main difference b e t w e e n these approaches
conformity clauses so that we would write in the
is the one between security (at the expense of fle-
spirit of ALGOL 88 (MeertensL1975])
xibility) or simplicity
:
[at the expense of preci-
sion, and of the possibility that c o m p i l e r warnings
i := 1;
be ignored).
while case i in
However when the type union system is infinite
[I..I000)o:(i := O + I; true),
out
teger subrenge type),
:false
[in-
it has been shown that static
type checking necessitates language constructs which
coPe
eilow
do
subtypes
to be locally derived.
skip od;
The argument was based upon the o b s e r v a t i o n
These constructs give the solution of the system of
equations which the compiler has to solve for
that type v e r i f i c a t i o n consists in
strong type checking.
a solution to e system of type equations.
The redundancies
[equations
establishing
Global
identically verified) can be eliminated. Moreov!~r
tyae declarations give an a p p r o x i m a t i o n of the so-
the PASCAL r e s t r i c t i o n that the bounds of ranges
lutions to that system.
must be m a n i f e s t e o n s t o n ~ i s
lar solution from that a p p r o x i m a t i o n may involve
o definite advantoge
The discovery of o particu-
since this v e r i f i c a t i o n will involve no symbolic
infinite computations. On the contrary,
formula manipulations.
guage is d e s i g n e d to directly provide a solution
Run-time tests will remain
necessary in difficult cases, but their number will
to the compiler,
be decreased.
straightforward verification.
if the lan-
type checking consists in a
This reasoning might turn out to be useful
to language designers who until now could not lo-
6.
Conclusion
gically prove the validity of their design of language constructs. M o r e o v e r this reasoning may ser-
We illustrated the Tact that unsecure data
types
ve as a basis to define type safety in languages
[which do not guarantee all operations on
values of that type to be meaningful]
sidered as the union of secure
oSes of these were pointers,
records in collections,
and prove particular languages to be type reliable.
can be con-
(sub) types. Exam-
varionts in records,
7.
integer subranges.
Acknowledgements
P. Cousot benefitted from d i s c u s s i o n s on this topic
A type-saTe programming system must staticly determine which sate subtyDe of the union is used when
with colleagues in IFIP Working Group 2.q
checking correct use of operations on union typed
oriented hi,her-level languages). The o r g a n i z a t i o n
objects.
with J. Horning and K. C o r r e l l o f a debate on EUCLIO
The language d e s i g n e r may achieve this
goal by one of the followin Z alternatives:
was especially helpful.
- Incorporate rules and constructs in the
We w e r e
language so that any operation of the langua-
for
ge can be staticly shown to be o p e r a t i n Z on
correctly typed arguments.
- D e s i g n a compiler in order to verify that
the security rules have not been transgressed,
It
compiling
although not enforced by the language.
was a r g u e d
techniques
that
in
must
both
cases,
be u s e d ,
the
[Machine-
same
and c o m p a r a b l e
92
us.
very
lucky
to
hove
F.
Blanc
do t h e
typing
8.
Bibliography
MOL-Bulletin
[P. Cousot
ed.],
Nb 5, Sept.
1976,
DO. 91-172.
ALGOL 6 0 1 1 9 6 3 ] .
Naur P.
[Ed.]
" R e v i s e d R e o o r t on t h e A l g o r i t h m i c
GOL 6 0 " ,
CACM, 6,
[Jan.
1983],
Language AL-
HabermannL 1973 ].
Do. 1 - 1 7 .
"Critical
Comments
Pascal",
ALGOL
6811875].
Van Wijngaarden
" R e v i s e d R e p o r t on t h e A l g o r i t h m i c
GOL 68", Aeta
Informatica
(Eds.].
A. et al.
Language
[1973),
Sprin-
Language AL-
5 [1975],
pp. 1-236.
Hoare
and Wirth[1973].
"An Axiomatic
Definition
guage PASCAL",
Analysis
3, 47-57
ger-Verlag .
90om[1974].
"Optimization
on the Programming
Acta Informa~ica
of Programs
in Languages
of the Programming
Acta Informatiea,
Lan-
2, 1973,
op. 3 3 5 - 3 5 5 .
with Pointer Variables".
Ph.D.
Thesis,
Dept.
ce. University
Appl.
Anal.
oT Waterloo,
and Camp.
Ontario
Karr[1975].
Scien-
[June].
"Gathering
Come.
Cheatham
and
Associates
About Programs",
Inc. CA-7507-1411,
Mass.
July 1974.
Townley[1976],
"Symbolic
Evaluation
DO analysis".
of Programs
Proceedings
posium on Symbolic
August
Information
Kildall[1973].
: a look at lo-
of the 1876 ACM Sym-
and Algebraic
Comeutation.
1976.
"A Unified Aooroach
to Global
tion",
of ACM Symposium
Conf.
Record
ples of Programming
pp. 194-2U8,
Program Ootimiza-
Languages,
October
on Princi-
Boston,
Mass.,
1973.
Cousot[1975J.
"Static Determination
Programs".
U,S.FI.G.,
in the proceedings
of Dynamic
RR 25,
Nov. 1975,
of "Colloque
sot la orogrammation",
Properties
of
Kleene[1952].
to apoear
"introduction
International
Parls 13-15 April
Publishing
LIS[1974].
Cousot
Couset[1976]
model for static
truction
Fourth
: a unified
analysis
ming Languages.
lattice
of programs
or approximation
ACM Symposium
Manual
by cons-
on Principles
Los Angeles,
of "Colloque
tion",
to appear
13-15 April
languages
Pascal[1974].
in the proceedings
"PASCAL,
sur la Programma1976,
18,
Mitchell
J.G.,
"Report
Homing
"SINULA
and formal
[1875),
O.J.,
J.C.,
LIS"Reference
Oecem-
1976.
London R.L.,
1875"
[S. Sehuman,
in Algo-
ed.]
Paris.
K., Wirth N.
Science,
and Report",
Nb.18,
8irtwistle,
Dahl,
Lecture
Notes
Springer-Verla Z, 1974.
Myhrhaug,
Begin",Student]itteratur,
on Soecific
Jan.
Language
in "New Oirections
Lund,
Nygaard.
1874.
Sintzoff[1972]
Popek G.J.
on the Programming
IRIA,
Jensen
"Calculating
B.W.,
Language
Report 4549 EI/EN,
January
User Manual
in Comouter
Ounod.
non determinacy
of programs", CACM,
Lamoson
Heliard
:
pp. 453-457.
EUCLID[1976].
Languages
SIMULA[1974].
commands,
derivation
Revised
"~Iode and Meaning",
1977.
Oijkstra[1975].
"Guarded
Implemen<ation
pc. 125-138,
International
Paris,
J.P.,
Meertens[1975].
of Program-
January
type in programming
a new approach",
Rissen
, CII Technical
ber 1974,
and Schwartz[1878].
"(he pointer
J.O.,
of fixooints".
rithmic
Oembinski
Ichbiah
P.
"The System
Interpretation
North-Holland
1978,
Ounod.
"Abstract
to Metamathematics",
Co, Amsterdam.
EUCLID",
93
1972,
Properties
Models",
of Programs
SIGPLAN
DO. 203-207.
by Valuations
Notices,
Vol.
~ Nb.1,
Sintzoff[1975].
"Verifications
utilisables
d'assertions
comme valeurs
riables ext6rieures",
programs",
oour les fomc~ions
et affectant des va-
in "Proving
and Improving
ed. G. Huet and G. Kahn,
IRiA.
Tarski[1955].
"A lattice-theoretical
its Apo]ications",
matics,
5 11955),
F±xooint
Theorem and
Pacific Journal'of Mathepp. 235-30B.
We~breit[1975].
"Property
Sets",
Vol.
Extraction
i.E.E.E.
SE-I,
in Well-FoundeO
Trans.
on Software
Property
Engineering,
nb. 3, pp. 270-285.
WirthE1975].
"An Assessment
CAL",
SIGPLAN
of the Programming
Notices,
Vol.
10,
Language PAS-
nb. 6, June 1975,
pp. 23-30.
94