Download UniCC User Manual - Phorward Software Technologies

Transcript
UniCC Parser Generator
2.6.3. The virtual machine
The topic of intermediate code generation performed by a compiler to build a binary program or directly run it
is that huge and complex, that it can't be covered in this manual or even in huger compiler textbooks. You
only have to know, that every compiler creates some kind of intermediate representation of its input. In most
cases some kind of a more or less abstract tree structure (abstract syntax tree) representing semantics and
nesting, but there are also compilers that compile into an intermediate language that can be used for further
optimization or platform-dependent code generation. Both methods are used in many of todays existing
compiler implementations.
An abstract syntax tree (AST) is the representation of a parse tree where atomic syntactical elements are
hidden, and only the semantic information (e.g. a literal from the code, a variable address or a node defining a
conditional statement) and its nested structure is clearly defined. Such an AST-representation could already be
used for a direct interpretation of the input. In C, for example, such an abstact syntax tree could be expressed
as a huge union data structure, providing several structures for every language element. There could be one
structure describing a value, one structure describing an assignment, one structure describing an IF-construct,
and so on. The result is a tree structure that is made-up of many of such nodes, links and leafs that can be
easily interpretered by a recursively executed virtual machine. Althought this method is very simple to
implement, it still requires a lot of coding efforts, and saving this tree to a file is not possible without the use
of some nested structuring. The benefit of this method is, that no backpatching is required, because the
structure is a logical tree.
The generation of intermediate code could also be done from an abstract syntax tree, but it can also directly be
done out of the parser (in simple languages), where the parser serves as the immediate representation of the
abstract syntax tree during input recognition. Intermediate code, in turn, can be established on various
paradigms. 3-address-code is very useful to generate optimized code for register-machines. 1-address-code
can be used for stack-based machines, like the Java Virtual Machine.
This is also the approach we want to implement in XPL. We define a virtual execution machine for a
stack-based virtual machine, interpreting 1-address-code. This simple virtual machine remains entirely
independent from the overlying language (here XPL). It can also be seen as a target-platform for any other
kind of language, e.g. a simple BASIC-dialect.
2.6.3.1. xpl_value - a dynamic value object
The stack-elements of the simple virtual machine are described in a structure called xpl_value, defined in
xpl.h.
/* Virtual machine values */
typedef enum
{
XPL_NULLVAL,
XPL_INTEGERVAL,
XPL_FLOATVAL,
XPL_STRINGVAL
} xpl_datatype;
typedef struct
{
xpl_datatype
type;
/*
/*
/*
/*
Nothing/Undefined */
Integer type */
Float type */
String type */
/* Value type */
union
{
32
2.6.3. The virtual machine