Download - Theses
Transcript
on the automatic construction read/write -------- of storage ------- eragram translators with minimal reouirement. ----------- by Majid A thesis submitted accordance with degree Doctor of Azarakhsh the University to for the regulations of Philosophy. of the Glasgow award of in the I ACKNOWLEDGEMENTS I like would to for University Pahlavi for study I a higher would appreciation for workp of were Gilles me the giving apportunity and to. degree. like to express to Mr. D. G. his D. C. Professor thank continued great Jenkins value in who supervised interest and support thanks sincere ny bringing and this ' which this thesis to the staff of University of completion. I the also thank Science Computing Glasgow who all have helped Majid members of Department me in of so many the ways. Azarakhsh ý.ýaýý CONTENTS CHAPTER 1: INTRODUCTION Motivation Philosophy 2: APPROACH Principles the of Grammar and Formal KHAR Definitions System Notational Lexical in Scanning in KHAR KHAR Conventional Semantics Process Compilation Conventional Syntax of approach Semantic in Processing KHAR Code Generation 3: DESCRIPTION OF KHAR The recogniser Preparation and the of Code Name File Internal input to graph the encoder KHAR system and Code Name File Form of the Source Index Program Encoding Fixed Codes and Variable A practical 4: Codes example SYNTAX CHECKING & ERROR RECOVERY Syntax Graph Transition Valid f,latrix elements Checking of technique a Transition used in Matrix KHAR i ý Formal algorithm An example Error recovery Specific Actions An example of their Error States Action General Error Summary of 5: use Actions SEMANTIC PROCESSING & CODE GENERATION 1, The Semantic Mechanisms of KHAR Semantic Actions Semantic Checking Pass 1: syntactic Pass 2: dealing Pass 3: checking variables Pass 4: checking procedures Attriobute in PL/O processing with manifest in Propagation constants Expressions Code Generation 6: THE INTERFACE TO KHAR Syntax Languages The General Syntax Graph of the Language Symbols Separation of Use of pointers Special Actons Syntax Graph Transition Action Syntax General of SL SL and Language in Transition SAl to for Table Table SL Languages of for SA12 SLO for SLO SLO Keywords Table Coding SL1 in of SLO Symbol The End-State Description Statements the of Actions 7: SL4 Syntax of Syntax Graphs Coding of SL4 SL4 in SLO IMPLEMENTATION Outline of prototype (TT) Table Transition and Action of Structure of AT Files and Programs Used Files used in system used Processes Table (AT) TT Structure Programs 8: of used the in in the system the system DISCUSSION & CONCLUSIONS Size Simplicity and Extensibility Portability Clear and flexible Definition of Potential for Applicibility Interface language semantics development to Programming Languages for Microprocessors Conclusions ADDITIONAL MATERIAL: LISTING OF THE KHAR TRANSLATOR SYSTEM ; I I ABSTRACT We present a translator amount minimum of -the syntax handling of syntax AML/1, and where a use is this a L.L.M. are which to illustrate similarly, of Wirth's of the semantics to to the checking application recovery KHAR's handling designed languages language, error the checking to illustrate PL/0, and is environments for may be used oriented a machine applicationto in storage the system and use its We describe of I read/write The system resource. scarce KHAR, which system, KHAR's use its mini-language, We show, of semantics. of too, how for the not found in PL/O can be handled. features interface The to the KHAR of a language designer/implementer is a set of semantic Cordy, to which may be added error These graphs are encoded in a development Languages. The linearized two of sets tables, actions integer simplicity the as numbers of this We report of minimizing its with the actions, to encoded a point read-only form of simple on the mechanism with degree work storage to which requirements. actions. Syntax is translated into to recognise the second in the syntax. stack, language mechanism is due to the multipass after here, of BNF, called a push-down automaton be taken at that and graphs, and code emitting cross-linkage with on registers operate We compare this to action recovery one to drive language, the CFG of the defines graph, provided system which which These handle symbols. The of KHAR. nature those used by Cordy. KHAR meets its design objective We also design, because portability of compilers with the note of for a compiler of the applicability its clear KHAR system microcomputers. writing system. of KHAR to and flexible and its research interface. implications We also compare in language We discuss for the the the production features of KHAR CHAPTER 1: INTRODUCTION MOTIVATION At the start to compilers critical, candidate adopted possible, portable. for to PL/1 or and PASCAL are inappropriate Any language seeking to microprocessors and code Such is under development of the production consists so and engineers factors on. to generate are highly to language moving from logic urgent. High level languages unless severely feedback important, features called direct typical of for conditions A Microprocessor CJENKINS]. The development process languages, AML/1, of the refined the use of each language by designer. Since the language design be able to respond to user feedback, providing good recovery such as of access to variables implement, error subsystems must allow necessary requires to research if sub-setted. length restriction of a series Refinement language, real-time necessity language, a As no and, the checking, structures. Language, AML/2 type tire on-Line, independent multiple imposing while say, reliability, and multipass [PIERCE]. exist to be of use to engineers byte the within language engineers is small, a standard be the subsystems for partitions for to Secondly use by electrical need was known to exists, have would microcomputer access background systems existed, the work, small control any method designs in work process clear this of and reporting while subjective must be quick to at the same time from the outset, for, the language would be unusable otherwise, Good languages, [WIRTHa] machines are is large for one for 8080 which to a system for a small A reasonable runs in 48k byte. kit starter was low on work on tables held Languages could in yet pageable be made available thought byte 56k a the satisfy for need development on a typical microprocessors. I say, 8k of RAM, to contain, a translator space, ROM or would available might be expected but upto 32k byte; of ROM. If which Neither is It CORAL 66 compiler the with present at in company investigating electronics PASCAL still of itself large on run a 16k word machine. compiling language to implementations may be contrasted level high of obtained. known implementation in runs capable This the-INTEL of The smallest is compiler make majority PDP-11 which [PUG]. machine as PASCAL, were designed and the machines. the this that such and no feedback system were made available used relatively into RAM as little code, a required, relying range of economically. PHILOSOPHY This, thesis presents to construction methods the actions the of interpretation which each by laboriously the of are pass a single translates consuming as. little KHAR, the Farsi work for space of results translator a complete encoded as "translator source entries engine", code into This as possible. table-driven re-applying in which tables for scheme, into a "donkey" object engine engine, code while has been named "donkey". t The desire advanced as to be able to develop one reason for adopting stand-alone "good" a multipass-approach. compilers was A second issue is must be that specific The PASCAL compiler by a machine of just These approaches use this cannot require of design is to language, fact, a context free of the art" language since it is in implementing is to This PASCAL. Study is use the used of the as prudence the AML series, AfL/1. different languages of by the by the terms of AML/1 is, basic type, in for calls the Ile and security in specification first programming manipulated presenting of hardware. underlying the use manipulated first of as a step onto KHAR system for range the of the ideal a EPASKO]. machine its is no features PL/Q, which CWIRTHb]. of CAM ANJ. actual of which map the machine one University across AML sequence the machine a language considering has been described The cost present Free PASCAL of virtual actual of Thus checking. KHAR before to machine the first the by machines of discussed use pseudo-code whose virtual the We thus permit. a of as close to examples same as the Language such semantic the a high-level programmer r be must programmer since approach by Poole are The implementation hardware. actual abstract is be portable to approach, virtual and accessible environment designed outlined or abstract and 'CTANENBAUMb], code was presented machine portability by the produced code A similar production intermediate EM-1, of machine, optimised microprocessors. the "abstract" language engineering as new microprocessors issue general the - generates interpreted in it, to EBAUERed]. of The market. electrical and transportable adaptable approach Amsterdam a "good" portability: rapidly on the appear in of in a new language error both the presentation of high, technique recovery Vrije is and the the since proposed Celfast PL/O compiler by Amman compilers in "state for Ck4IRTNb] the reveals of amount on an interpreted language minimal to required effort code the abstract for compiler for designed machine this the language. An alternative to a use microprocessor, CBROWNb], to development. in driven by the paid to methods. basis of introduced be the to has been KHAR the number of be free included semantic approach in of the primitive is, not actions but and reporting are major CWIRTHb), of new has attention table- when considering compilation phase or that throughout: recovery error and CGLENNIEJ. as a means of becomes recent a not driven work error compiler. are this at it result, a Conventionally, of AML/1 of and any attempt As be CBROWNa] design and code generation semantic by a semantic As discussed table may have to possible the checking techniques pioneering guaranteed in a conventional phases graphs I1L/1 macros enormously. is CTANENBAUMa7. One language the appears discusses Wirth in is compiler a conventional of the information. Glennie's multi-pass discussed Notably, syntax approach the as syntax however, syntax semantic is stream task recovery. taken use of any type of checking, the the writing compiler syntactic error of writing KHAR, driven to of driven Table in that the which the complicates comparable been task The addition recovery also that influence an [JENKINS]. only is make the to altered approach feature undesirable to defining syntactic errors. system described operations semantic given that This here, the was CCCRDYJ has Cordy the a compiler, this actions the table input driven although can be reduced in due to adopted. by Bauer in IBAUERed], and by Wirth in CW'IRTNb], a method can be used to derive systematic the grammar of its from machine. abstract made and of the syntax of the language, defined a suitably are methods in so that required, to be most likely of the errors an understanding requiring necessarily hoc ad translator a cannot be treated recovery error that and way systematic a such that of coding the code for into language Both state the can recovery sensible be attempted. handling Error attacked table-driven using has which the of addition explicit to written procedure to approach information in as discussed CGRIES] The system is constrained grammar. LL(1) Griffiths in they ECAUERed]. make table recovery provides sufficient automatic [FOSTER] d) LL(1) Here, on the to language. over the of entry designer arise KHAR since to a The define semantic separates rigidly to languages it since is application enough improving techniques to discussed read degree of "look by ahead" to enable good recovery, (e. g. grammetrs so that easier are LL(1) an say that even a small information are to with useable, redundant improve Languages accept their methods is simpler syntax exist driven to and grammars b) error c) the problems adopted, programming of allow for and semantics. syntax a) to No productions. recovery error KHAR is in been Currently, set" a non-terminal has [AMMAN] is the "follow the recognise recovery error, Amman of and involves to symbols manner efficient by James [JArES]. method limitations definite an methods PASCAL compilers, production in and recovery they and use Foster's are SID) LL(1), [HOARE]. last This is point of its KHAR, objective of remarked by watt [WATTb], designers and implementors Context of Free the a wide of variety facilitated such of A CFG is the capable syntax-directed construction available to large a and the deterministic efficient As has been the techniques parsing of tools defining of language, programming considered. languages programming design is design, useful most final the when language the one of a typical of in use Grammar(CFG). syntax importance great part existence of CGRIES] has from parsers definitions. syntax S further He remarks Firstly, they are CFGs that incapable Secondly, they semantics to One approach in adopted syntax. translators, provided, and production rules, of no provide that names defining of features. to is, ' in two this respects. context-sensitive "semantic conventionally been is routines" inserted are routines linking has frequently problem of syntax of means expl4cit set a semantic an approach deficient are in associated with itself being RHS of the top-down parsing. The KHAR system driven recogniser points of syntax graph, the of uses KHAR approach, may have verbs a CFG, which Our work grammar. as in this heavily relies CWIRTHb] and the placed on semantic graph, at the table a appropriate of the as introduced in use ECORDY]. we have minimised However, and KHAR by dealing time. This internal the simplified with semantics simplification the number structure (context arises of of verbs the semantic sensitivities) from the (actions) required mechanism one aspect multi-pass at philosophy of a adopted throughout sensitivities necessary advantage sensitivities are actions that the work. defined added the at The meaning and implications in the designer terms of appropriate is a semantic graph with the points. This has the constrained to one at a time and thus the user can find way in which a sensitivity is handled. context of describe out the these exact CHAPTER 2: APPROACH The objective this work system for of portable translator languages to be implemented intend We also to it use both object for Regarding compile compiles the efficiency of the concerned about rather than its run-time and programming and to be highly and able configurable terms of the language. any particular intended in be translator system we are of any compiler produced demand. in this implemented. any source program level system in such a way to the space requirement Some of the programs language of high time environment which it more each this flexible a construct computers. time applications real machine for to a variety on small to generate in terms of its is system are executed Their can be run. task is to create These are permanent only once a few files files as for before long as no change is made to the language. The basic PLauger[KERNIGHANI; processes predecessor, to approach each if a of any. this system which is consumes work is constructed as input that of of the Kernighan a sequence output of of and subits i PRINCIPLES OF THE APPROACH OF KHAR In our many to separate be carried We to aim to compiler little to out by a process reduce the a minimum and make more input figure. model the with compiling Ideally, one input of it possible in storage. streams the work in use A process might a prelude or can be illustrated output the of computers setting-up is stream. any pass itself as task separable and one to into process every maximum memory requirement one general break as possible. read-write the following processes available Ir read Thus we aim to approach with be set up phase. as shown in the I I -source program 1 _,__1 -T ----ý1 I1 ----ýI I III process 1II IE-----I ý"T ý ý I II 1F---- intermediate Me I II I ii table -ýý process 2 Ii iE-----i of ý I 1 intermediate Me inforI'mation F---.ý ý ý ii ----ý 0 0 intermediate i i E---' file I I j-----ý I ---)-I process nIIcode <--------ýýý This shows a number of subprocesses symbols (which can be understood process consumed by one process and is the concrete linear a stream by the programmer as a program) stream of symbols which is a program for The input which transforms the actual is the representation of to a machine. output of a previous of a data structure. I is transformed This data structure by the process, instructions, the the figure should to only be noted that distinct several or by a different on associated with in changes state processes as may need access table this with modes. (Modes 1,2 used graphs to set of the GRAMMARAND and is access in encoded one of of system more fully in elements "language or matrices four by translating ) in the figure determining structures of its one and modes- 1 programs, SL languages. graphs of of control in each of three and action the execution the KHAR machine each of and 3 are used to translate encode transition the or under the graph, since the processes. they a are We show this 3. chapter FORMAL DEFINITIONS Language is based on a vocabulary. All be either successively We do not show the syntax separate can executions syntax KHAR operating graph, rules of information The table The individual numbered 1-n programs controlled the some set of accordingly. These processes aspect of base of information this information of the a subset controlled syntax execution above is a set of files of the process. result 4 the way S It and in some specified output procedure. given processes. the through a program or a of right is that into of keywords". formulas this vocabulary are For each of these which define the For programming called languages "language is defined set of well-formed languages symbols" a set or of sentences. These rules formally are G(Z) defined part Those of least at symbols nonterrninals the of the and 'V' We only language of left a the 1) The initial those languages that terminals 2) For every nonterminal set of its symbols that Therefore its of called are terminals '<I and and '>' for grammars the satisfy may follow by discussed can parts right (The initial in appear the of productions (the symbol of A is the first position of from A. ) derived the rules brackets whose symbols). must be disjoint. sentences as them from terminals. symbols of alternative set of all appear : rules director the union angular is symbol. terminals. called is a class distinguished of part are use underlined Z must the called this of productions. and is a language of consider set symbols to distinguish nonterminals is as other We will nonterminals. following one rule appearing vocabulary 'e', finite how determine they is produced. of the language any programming be a nonempty to because productions sentence correct A grammar left called symbol A, which generates initial symbols must be disjoint generated any sequence Griffiths the empty in from sequence from the set of A. (This point 2. b of CBAUERed] and by Wirth ch in EWIRTNb]. ) NOTATIONAL SYSTEM To describe other another words a language. these productions metalanguageThe notation we need a notational a language we use is in called system, or in which we can describe codified Backus-Maur Form (tBNF). It construction. These notations are as follows: 1) underlined braces } multiples from an presents { and a which <label part> <const part> language the of used to enclose are choice may be presented Expressions explanation exact be must made. like vertically this 0 a . like or horizontally { <Label this part>l part>l<const } ... the meta symbol 'vertical ( note stroke' between expressions) 2) underlined optional 3) C square brackets statements " indicate dots "... item, one or more times. enclose such as C label or expresions three ] and of repetition :] preceding CCPIVEPlTIONALCOMPILATION PROCESS A compiler its input, code for program". source a source 60 as Algol or, or a translator, program written FORTRAtd,*and specific This program better, in and then is This traditionally synthesis analysis part, of the the a program which some high-level as its produces computer. process In the simpler is language, the output, output is called divided into object code. compiler such appropriate the analysis accepts as accepts the "object of the source program, discards compiled (such those as parts having characters The compiler for part identifiers >. synthesis phases. has These After the been converted semantic syntax in the and addresses are representation language generator approach discussion of check of are of the is the target the presented in string in to used computer by the in order is is may be found input to ( for for scanned in literature variables to produce code that syntax and using the of the complete Runtime storage internal the assembly the of a or machine This generator. a compiler, being CGRIES] or and and program make sentences performed. the example analysis source ready is during simultaneously, program the the tokens of program task both and is tokens allocated hardest of information tasks, usually source then tokens. the symbols, during lexical grouped and, the into be to of the language such as tokens used basic the which language corresponding semantic its tokens the source of the new tokens, are these of into analysis, rules, tables execution not of tables several definitions the are etc. builds also which string into grouped identifiers, language symbols, analysis linear a been source and transforms comments) The source program is now the of code most systematic [WILCOX]. A fuller CEAUERed]. LEXICAL SCANNING IN KHAR Source text of file the language in an internal is read as a. string is recognised form, that of characters, and this will each basic symbol be passed to the output is to say, as an integer number. Language identifiers, the are etc. symbols fixed-length I characters. with for the language design, development) most measurable language, (including rather index by an (2000,3000, .... into ) which do we requires space use the save not for and between separating of be of it is has other used is which (to readability aid system programs this storage based on known or in written further here. at most 3 and 5 characters determine to undertaken the about consider two bytes encoding Programming since text of space). could encoding shortest this of requirements (to strings encountered. intermediate information statistical storage of to to operate length variable information of information of the processes no account research form in transmitting than of the further effective the special User-defined codes. an offset the rest takes and compactness but names, represented practice bits balancing representation read/write symbols properly, In principle the allows In KHAR a form after chosen as it frequent concerns. wider are the overhead good communications most predefined plus minimises processes, is standard class. This encoding between by dictionary, their words, and integers constants identifies reserved represented appropriate It i. e. symbols for each on backing each Integer code in storage, integers). SYNTAX IN KHAR As remarked in CWATTa], a CFG grammar serves as a means of communicating between and that a between the language designed well but requirements are both strictly not Language formal CFG. Watt defy for the behaviour with of only with the algorithm contexts, are as sets as the we by a CFG "context semantic in approach and code linearized the which are to define "compiler" syntax the which seem to us to be that to handle semantics in CWATTb] which requires graphically Again, of ECORDYJ, of all encompasses with graphs, actions, by a can sensitivity. of integers presented described language. of this with can be determined can be defined required a pass of cover "syntax" set semantic produce context stack expressed KHAR a limited using to KHAR we deal where between language- which that In typical a compatibility extended that a language semantic KHAR so as to a particular one in of correspondence program parser. and then syntax The advantages use of the cannot be defined be contary, states languages programming programming a a of can be placed which all and the should is-the by including augmented graphs "syntax" features verbs, satisfy even type compatibility by a context-free sensitivities" of In parameters. Our view and checked Watt the identifiers, programmer, implementor. features of by CFCs are of the and typical Examples well-formedness those precisely and the unfortunately, that algorithmically. designer similtaneously can description factual argues criteria deal that, data types, I generalised form CFG and applications and emitting designer context-free. which declarations language the and,. further, and shall that individually see, this KHAR need as contrasted the stack to handle specific sensitivities to user the approach produces to of the a much semantic simpler This level graphical grammars bear to unlikely in separated, in be encoded, translated process semantic existing levels by for the information being no semantic production designer to and to performance of KHAR. the are the to used graphs carry can out a which does of correction of can often a few error in the problem, although error productions if the he range as by using This at this checking KHAR but productions. definition in occur be achieved on language error not So a wide separation been developed has not involve they since concentrate by adding advantages of has been handled. because or by adding ignore the definition techniques recovery content This error above, However, implementor and system, discarded be applied could the the and designer since in results is which shows the writer is course, language. his and semantics. remarked the KHAR system, correctness two- of of CWATTa]. which that some error methods syntax here of latter a compiler discounts since approach of the proof presentation we observe same person Gries I CEOCHMAN] of but, grammars, the of graphical Further, the formalisms the with affix of property presentation compiling burden CCORDY]. of contrasts extended the distinguishing the approach and those than mechanism the of was an allows separate can improve comes between he wishes. SEMANTICS Semantic analysis is a part of 2.10 compilation - which checking syntax for To check attributes part of the of tables internal source such a later identifiers, In semantic created in the information like have a unique table information. information this to these that, entry tables to check in declaration the we use information the encoding some semantic process for and so semantic in the table errors. and the amount appears element, For table. may be nested, Algol- the same identifier and each such declaration with will in the table must it. in the input information on be referenced will which in symbols identifiers, about in the entries associated we have already found about depends on the type of that procedures This information dimensionalities addresses, in which procedures Each time an identifier some In length in different may be declared of we have one entry we need for languages the checking need information same process, we have variable therefore of processing contain and added the are process strings. For each identifier of which as attributes, in time we program. character be collected will correctness These tables andi information at we form. integers semantic all which the purpose of the source program. structure semantic for and code generation be stream, it checked against by our semantic carries the operations. SEMANTICS IN KHAR Four kinds of semantic operation charts of ECORDY]. These operations, actions are : are presented which provide in the different semantic kinds'of 1) l ormal Operation, 2) Parametrized 3) Emitting Semantic 4) Semantic Choice Tables, stacks, together data as a "semantic to The semantic complete for set and Operation. and operations a semantic refered Operation queues The structures". Operation, Semantic so are and "semantic its data "semantic called associated operations" and operations are mechanism". operations, accessing named above, and managing on any of the data four This restriction are meant data a semantic structures are makes possible to a provide and structure, these to restricted automatic a generalized interpreter. chart Semantic semantic of are called structure the operations classes. on are commonly needed in producing mechanisms which for charts a programming language a are discussed. set These are : to 1) symbol table 2) type 3) count stack mechanism and 4) address fix mechanism. The multipass structure one, are fewer attributes stack plus mechanism, mechanism, a group of of the KHAR system The operations registers. in number and are distinguished of identifiers. This reduces is by having discussed fully the mechanisms on the single stack no knowlege of the in chapter 5. CODEGENERATION In KHAR we are dealing desires to control map one for by constrained transfer of of the language. code for one into our control any orders, At the other suitable assembler is this in simple typing hidden and say, intermediate at present, and PL/O CWIRTHb). a design programmer objective as the high-level by the extreme, the which being the machine order semantic CTANENBAUFib]. We have, (AML/1) a language the code exactly, Thus code generation the system. must with generated This some orders, of PASCAL, language mapping may be code. flow especially control we structures may pseudo-machine code for of a generate CPASKO] high-level CHAPTER 3: DESCRIPTION OF KHAR We describe the and briefly system before and or graph to program example practical coupled, Errors are checking similar to example, at identifiers. is via not acceptable Tree is complementary PL/O the semantic does Semantic syntax phases for graphs by to of is structurally dealt with semantic valid information successive check the - -3.1- of in is the on the is Error expressions. For about information do situations defining by achieved passes since pass. information up and down the semantics error one combinations alternating tasks, simpler in of pass. is deals transducer or environmental checking KHAR emphasis tables syntax production is by a previous emitted be defined. achieved of other The pushdown KHAR access transmission tokens where tasks. task the a transducer. Since an existing CCORDY] but Cordy Transmission operators. for a to the a pushdown and the recovery by using of time no semantic have to of All error done to verbs, through traverses which of AML/1. productions. and code emission that one aspect of error achieved is This semantic by syntactic can be often language. obeying with with recovery only the stuctures representation for automaton, and semantics, this interface this use of external the of information carrying one, structure syntax of the the converting a pushdown via dealt separately the of processing internal an of KHAR consists graph, in the interface, designer's the involved process its presenting presenting by presenting KHAR system of types Abstract through The and Parse two only information placed internal the considerable code tables This gives tokens. requires pork interface on representation the revised These diagrams This the handling the store form of of requirements. KHAR system. that the for language designer require a translator one, as discussed B. on the next page. the KHAR. This use, and undesirable general to an internal diagram, constructs he has to machine-code KHAR. We clearly follow is in enables modest read/write by hand. Effectively is an unacceptable external of the transducer A shows a primitive This structure development for programs within Diagram driving on the stack in for the from an CW'IRTHbJ. Diagrams A&ß driving tables I' ing encod process KHARr--->c de o I dict ionaries Diagram A language description Irans l alorE I driving Fables program encoding KHAR process A dictionaries code We now have to supply of a compiler coding data structure the KHAR system, pair tables, of required of smallest primitive implement a zeroth SLO, and add table syntax possible error which recovery, SLI. This define present successive interface to Languages are more fully following translate page, now the described presents AML/1 CJENKINS]. for tables codes. (or the Language encoding SL1 was used to were added to KHAR. The is by in chapter 6. of the a picture a Larger minimised 140 integer KHAR system Language to the minimum features access of In to !CHAR. SLO has actions SL as facilities of versions building the construct a hand coded a Syntax of version PASCAL the a Language. supplying successfully we had to do by hand to about for analyser version, and allows the definition KNAR needed to allow with a syntax we use the to SmaLL Language), the to drive gives ENF to a modified recognises which Wirth translator. the using Diagram SL4. These SL C, KHAR system on set the up to oÜ .ý ýpp O 0 Diagram C -.Nü0 0 (" ... ... 0 ý ,0Q '0 ri p 0 A LL i0 Q.. .w R ... Ei H 0 -4 H 7 r+- Q, 0ý ._ý 0 7 ý 0ýý ý H tn ý Z ý A ý i D ýýý .« --ý ... ýý 0)ö ö0 0ý .... ý .ý ._ - 3.5 - deal KHAR can to parse is program concerned only remarked Languages do not inspecting by it and find in the the true syntax of also is his to Free in used its in this the of As sense. narrowest CWATTa], practical easily CJENSEN] or We see immediately PL/O CWIRTHb]. needed to up syntax We see this PASCAL as given of the thesis Grammars. is knowledge that free context The system establishing introduction have Context a no semantic one pass. here and syntax correct, by Watt with have which which in successfully, sentences is point in as A1L/1, such grammar(CFG), languages with enough CWIRTHb], productions like <typeidentifier>:: <callclause>:: demonstrate syntax check the the syntax To deal focus our a syntax then that the attention diagram of a feature in with semantics of actions placed so as to drive of it the of version language the to system to used KHAR system the intrusion semantic PL/0, will and any aspect on one particular KHAR system aspect semantics remove this This approach allows one PL/0, in this produce the so far outlined to a PL/O program. of with of syntax of We use the translated the to a CFG. of the modify treatment 5) (chapter thesis >. =<CALL><procedureidentifier><; have to We thus and =<identifier> of to deal KHAR to with form other the aspect. of the become for PL/O. a designer of a language to focus at a time, and also of the language specification allows we and draw semantics that a pass language, This is "compiler" attention a user to see the effect by inspection of a diagram. / internal Where the semantic , as in of an expression, checking abstract tree parse A KHAR. previous then presentation of encountered so that discussed in more detail include in the KHAR never requires to in achieved by seeking the with or a current one "REAL" is included in REAL the The valid. Further, since language, which KHAR. It syntax properties each semantic should assist We note also included in the That is of attributes no second as group of errors user in passing of in the a that' error and the majority will deals with same type program these recogniser passes, "REAL", That are is not, branch We need only and *-INTEGER-REAL. KHAR of stream represent the is above consequence. by including the of input the may both fact, In implied as from these the of unnecessary "type". of types read it expressed or is The approach is KHAR so that language designer of symbol of to graph ANSI FORTRAN and (REAL, INTEGER, +), of pass, syntax in graph *-REAL-INTEGER omitting a in triple operator) any concept "INTEGER", other valid the the amending makes representation token. expected (REAL, REAL, +) not to match and the approach Comparision program. type. 5. itself the first operand, through its with combinations. multi-pass handle up the passes can use a simple pass in chapter to objects (operand, KHAR machine two the full permit attributes identifier every (operand, type, type) this leaf define we to simple of alternatively, second that too propagation first the the, valid We emphasise run the check for the has appended pass These two passes KHAR is of structure *-REAL- two passes state and what do the is rest. one aspect of reported together, actions are debugging. error recovery recovery is under the control of syntax errors are also of the reported in We discuss one pass. Our picture is now as in Diagram The syntax course, all I 4.1 the checking KHAR system D, on the using syntax in chapter as set following graphs and actions defined We discuss chapter of error for 4. up to form a PL/O compiler passes are, of technique in . page. each of these SL4. graphs and our basic checking -u i. "*-% .a0 E 82. afl. ianram t0 (0 N ý ýi2. ý 0 UY ý ý CD ý ... Q 0 Y ci. . ai L 0 t2. ® L i iC (1 0 t2. i L i ý .. M a a s 0 0. ý ý L. 0 C 0 .... ý ý. .V .... ý N (} N 0 ý ý 0 L L 0 L 0 4- Ü0 üý a _ 14 0 .. ý 0 ý CY. tQ $C v a 0 a ý Ir C -o 0U U ý to to 0 i2. 0 U C ® ý ý J üÖ ý. . '0 t1. +' e+-0 I PREPARATION OF INFORMATION TABLE TO BE USED CODING IN THE SOURCE LANGUAGE SYht?OLS language For each two sets the information. of "language the TTD, is the Table terminal transition special symbols in data syntax Symbol Language The set of of matrices language", of matrices or string fixed of on the elements is language a "language construct one blank number and call or it each group symbol, latter, is that from them have their own separately. we explain encountered in of in columns "the called words, "language into valid transition all elements however, the is triple ... that of groups, following the from to the each group. data the with the the to delimiters), This starting sepecrated of according character data". symbol same applies beginning terminals we group and one group and the newline top language divided characters or made symbols". (reserved code for which LSD, is and the itself, of and Data classification, above, Eoth and the (LSD) Data syntax syntax language. the preparation, For each programming given the prepare keywords" The former, language from derived to need Symbol Language the we "language respectively. in used system, the are them the (TTD) Data information the by our These We call syntax". Transition from be used to associated a associated other. next is by at group Each least code The layout of this ------)1 I a is as follows: data fixed Each item of the language the of subsets is in this It comment and file could and can symbol, until the first it code 17). finds the symbols which may start Algol code, 17. W we may write ; or in Pascal 17 (* These two symbols are separated be copies not fixed. comment any character This means in process and by their 17 *) all of identifiers characters symbol after "comment-end" The length string. of encoding fixed it a string in the language. finds code 17, then symbol (the by of this of the source text, until only These code. They come under the fixed comment. be up to the maximum length reads is- parts For example in the symbols of the language blank and as long as they are preceded we introduce that terminate each has a unique that so The number of different codes. ýE-- keywords is to appear in one can appear in any order subsets fixed above I---------- subset of the items of the language keywords to the associated proceeding fixed code rE----I fi end efixed code II one code the system the "comment-start" reads and second symbol after ignores fixed 'is also It string "start ends, symbols part are considered internal code. a of inside the string" string it const must For string. write(START be a write to statement data, codes, is means treated belongs separator free. the fixed "comment". as comment. to the between code, A typical is these and will string" symbol be typed two consecutive if these two be coded string example is a itself a times for symbols string to a END); string" '0', So any text as shown in the table appearing example language programming symbols between string" output "EtJD is the symbol to end a const. In this constant constant appearing as one constant "end "end and characters the we introduce that language are START and END then i iENDEND is the symbol to end a const. programming could of If constant constant symbol data" constant So any number symbols. single in "language at least of this AFL/1. one blank in the data Please and of fixed data after follows note the '0' which that the format is from this line until the first semicolon encountered text is treated as comment and is ignored by the program which uses this data. The following are the keywords of the AML Language. Numeric fixed codes are : 0 1 2 5 17 18 100 for for for for for for reserved words. standard functions. delimiters. single character comment ends. constant string ends. end of file. 1 abdgx at do if in end out ref rep body byte call code data else bytes while define endcall endcode endprogram endproc here main proc then program 2 cc cs ge gt hi le Ls lt mi ne pl ra sr vc vs 5 (CC>;,: )]) =I 17 (* *) 18 of go 100 There is a program to read this "code name file" are permanent files data and make (CNF) and the "code name index for their corresponding two file"- language. files, the (CRIF), which CODE MAME FILE & CODE NAME INDEX FILE CNF is between and the symbols, information. The group of the created to access the this table each of of group from column starting two code complete in symbols encountered Similarly to applies To following 1) the find in Code Name File. corresponding table while the with first the simultaneously The locates to of this the the third column table take into second start of group code contains the in the codes transition in determined tables what code should is the regarding then used to various code the programs. located symbols CNIF, via languages, source These codes are the "base-code". code, on CNF accesed consideration information. the grouped within programming the by the replaced is according starting information found in located the separator along CMIF which the a each group. symbol symbols located which the based on an arbitrary This not pointer The codes in column three the are as array the contains one blank separately are information of for codes group codes information one has only dimensional column column it so that constructed from of be the this the transition matrices information are and the-same SL languages. associated with any symbol, the steps are to be considered a check is made to see-if Two possibilities may occur : the first character is alphabetic. a) the first the symbol may be in any of the first idea that b) The first 6. other if , four required, that the case fixed symbol code 5 and groups may be searched one after such times until the symbol is treated otherwise in CNF. groups i. e. group with two groups happens all case (a) If four in is not alphabetic character in one of the other falls in which case we form a tentative is alphabetic character the symbol is matched, that identifier. specified as a user's the I Althougý by above, first, a similar procedure and one beginning. the of While two from case (b) time is completed In this the way digital automatically and base code will be can be for by the code is also determined. the is symbols be altered. can very easily the the relevant so that updated the appropriate of in right skipped is continuously (b) case of the source symbol is continued, of symbols coding accounted in undertaken in characters groups the matching code information the search would the length considering be accomplished in the Any alteration in automatically determining the codes. subsequent INTERNAL FORM OF THE SOURCE PROGRAM in previous As we mentioned symbols source places that will text are put in the dictionary index plus code. standard integers. be coded as 3-digit the appropriate is their sections, Integers names and Integers of integers special apearing in and are coded from in the their 1000 to 1999 , 1000. I Users' defined identifiers are coded from 2000 to 2999 and from strings constant 3999. 3000 to ENCODING is done in six This encoding text source the source step as its the previous of , order (other Each step, to step one is would be the internal from step six and the output text. The input steps. than the first), own input. takes the form of the output The steps are in the following : 1) delete commentary and code the newline all 2) code all strings constant 3) code standard and integers; names and user's 4) code triple special-characters; 5) code double special-characters; 6) code single special-characters. The output from step six is character; identifiers; defined a sequence of integers. FIXED CODES AND VARIABLE CODES The vocabulary be categorized T of terminal as follows symbols of languages : a) Reserved Words (RW) b) Standard-Procedure-Identifiers (SP) c) Standard-Function-Identifiers (SF) d) Standard-Constant-Identifiers (SC) e) Single-CHaracter-Delimiters (SCHD) f) Double g) Triple-CHaracter-Delimiters CHaracter-Delimiters (DCHD) (TCHD) we consider can in the ( We assume that class These can be assumed to to each in order The assigned invariant for of coding 1) Fixed choice of constructs, all arise languages of more than three symbol consisting asigned of to identify codes all special be classes to the the programming there we use, exists no ) characters. or groups and a code can be them. above classes languages of objects remain Three types we consider. as undernoted. codes as shown in the table these codes we assume that the programming is for languages on the next page. Although arbitrarily our purpose associated they to be considered remain the to the language invariant in the work. for FIXED CODE (in code files) undefined (in Language-Synbol comment delimiter reserved word standard-procedure-identifier standard-function-identifier SL-reserved-words single-character-delimeter double-character-delimeter triple-character-delimeter identifier in general constant identifier identifier function procedure identifier type identifier identifier variable identifier field comment delimiters delimiters constant string unsigned integer constant string transition matrix state number valid element action number nil symbol any symbol end file end line 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 18 19 20 21 22 23 24 25 26 100 999 2) codes which change from one Semi-fixed are fixed 3) MEANING within Variable defined one language, We reserve for e. g. codes for to another but language symbols. e. g. program, user- identifiers. "codefite-maker", all all 3-digit variable codes, codes, language codes which change from program to The code association called Data) is accomplished starting one-digit integers automatically by a program, from a base-code. and 2-digit for and have defined integers semi fixed for codes, use as integers 4-digit 100 as the base-code. fixed It should be noted that is this and can be readily arbitrary increase changed to space efficiency. A PRACTICAL EXAMPLE When we want the system for a particular language L: 1) Switch the system to SL language; 2) Write a program 3) Compile and execute builds (this 4) Replace In this language. using by these language L. part we explain how details system for AML and can be used as a pattern we do some changes and necessary the prepare changes in the in the syntax for of AMYLand system. Language AML, Programming AML is a high from level some assembler of the to use the addressing manufacturer's and recognition new ones and the system to included are Finally see the consequent the L); the the system. required control for ALL programmer in tables previous ready program, which statements Language L; the this up tables of parsing is in SL for assembler. designedCJENKINS7, tedium modes of of the knowing to the exact microprocessor the relieve syntax via the The, following <program>:: brick>:: <location>:: <at =<digit>{<digit>}* body>ENDPROGRAM =112131415161718191OtAIBICIDjEIF <declarations>:: <size>:: =BYTES<size>I{BYTE<identifier>}*I{REF<identifier>}* number>j<string> =<integer <string>:: <block>:: <simple character>{<printing ="<printing <code brick>:: statement>:: assume construction). a good possible DCODE assembler>Et: =WFIILE<cond>DO<block>REP test> statement>}<conditional construct the a set of syntax reader is practice to reduce them together. in the syntax from the graphs familiar For each nonterminal by merging once clause> =IF<cond>THEN<block>{ELSE<block>}END ={<simple that machine statement> =CALL<identifier>ENDCALL statement>:: We first statement>I<if =<code brick>l<call =CODE<target clause>:: <cond>:: statement>I<while =<simple statement>:: <while character>)*" ={<statement>}* <statement>:: used BNF. =AT<number><suffix> <number>:: is in Af'L/1 clause> =HERE<at =BIQIDIH <if of =(DEFINE<identifier>=<constant>)* CDATA<data brick>Ef1DDATA}* (PROC<identifier>BODY<block>Ef1DPROC)-* MAIN<block>ENDMAIN <suffix>:: <call grammar =<identifier><location><declarations> clause>:: <digit>:: the gives =PROGRAM<identifier><location><program body>:: <program <data table with we may have the number That is graphs of other if the rules a of one syntax of graphs graph graph. It to as few as nonterminal nonterminals, (we grammar, is we replace only it by its graph. graphs if times. also Also it is better they have a small This minimizes syntax the nonterminals graph even if the amount of information reduces the compilation Figures to replace by they are used a few we need to time. 3a to 3c show the syntax their graphs of this language. keep and FIGURE 3a Matrix 99 for AFL/1 ý I HERE PROGRAM denL -->i --ý-ýAT-->1971, DEFINE"i denE--->=------>cons EonE DATA--), 1987ýENDDATA PROC--->i denE-->C96]->ENDPROC L->MAIN--->C967 ENDPROGRAM ----> FIGURE 3b 98,93. Matrices and 97 for, AML/1 ->HERE 98 a denE AT-aC97]--'V>BYTE---> i ident k->BYTES ---3si ze---ýiREFmai f denE 93 Ii dent str ing 97 0 number [-> D h-> H ý6--J N FIGURE 3c ýc Matrices 96,95 and 94 for AML/1 r-->[95] ý >[94]-ADO ->NHILE CIF f967 REP 1947->THET, I-4C967 END--J , ELSE-->C967 95 26 1-->ENDCODE F CODE 'ý' 'CALL-->i dent->ENDCALL -J 91 CE CS VS I The terminal symbols by placing non-terminals 99 shows the number by represented are their of production the brckets, square name within denotations their distinguished and (C]). Graph symbol the of language. To every downwards. graph we assign Code 99 is devoted graph codes to the other The table graphs below shows the a code number from starting to the main graph and the allocation is otherwise codes 99 of free. with associated these seven nonterminals: 1) 2) 3) 4) 5) 6) -7) Graph code 99 98 97 96 95 94 93 Nonterminal Program Declaration Location Statement Simple statement Conditions Size -. In each square box of the appropriate table is also graph useful code next above to their also graphs nonterminal which shows the number of is written names. occurrences the The next of each nonterminal Syntax Graph in the other graphs. Number of Occurrences 99 - 98 once in 99 97 once in 99 and once in 98 96 twice three 95 once in 96 once in 94 94 93 twice in 96 once in 98 in 99 and times in 96 and J CHAPTER 4: SYNTAX & ERROR RECOVERY In this we discuss chapter the use intermediate step the algorithm used to check syntax before discussing in their encoding, recovery error of syntax graphs, the transition matrix. graphs, and its by supported the and We give an example encoding. SYNTAX GRAPH We always represent the graphs, for "syntax so called a label We assign the given syntax of a language as as in graph", to each graph, a set of [WIRTHb]. from 99 down to 50. This allows 50 graphs. Label 99 is devoted the other is graphs to the main graph. otherwise free, Allocation but of labels conventionally to from 98 downwards. It set of is useful 99 occürrence, of for the graphs of in nonterminals of language the AML/1 in chapter. This table graph has been done This graphs. the previous to make a table and shows that it is thus and have one graph fewer. the graph 9E is only more efficient called to include once it in that in the graph The set of elements, in a syntax is graph terminals "valid the elements" of these the union of all consequently, and nonterminal symbols, that of is the set sets appearing graph and "language of symbols". For Valid elements each point of a syntax of these Two points having points". A lines there the same destinations piece line of at least exists are called consisting one destination. "the same condition is such points of lines. by directed graph are connected called a "state-line". While in a state, next terminal state or nonterminal are valid the state function elements number and "exit these stream, appear may valid the as to cause a change of We have a function state. return to We will elements. take see this number "0" state" line one "initial line" is the line which comes to the at one or more alternative these elements We allocate the "initial of state lines and elements symbols" to the "initial and EXIT to exit The allocation state". of that The initial lines". (Ile call "initial the input graph there exists graph and terminates graph. it which later. For each syntax more the set of elements Line" within of that and one the graph. ) this call or the each of them an "exit and call numbers to state lines is otherwise free. Syntax graphs have do this in the to be translated KHAR system, we use into a data the transition structure. To matrix as an intermediate In order translate to best through explained AP1L/1, syntax interpret we first structures from input the preparing to the translator. Matrix Transition is in step each their to graph graph 98, to directed lines as to data appropriate a transition Therefore an example. syntax say into graphs we translate its This matrix. one graph transition corresponding matrix. We refer numbered the on we draw and state ý(matrix "BYTE", A line is So, for corresponding code. each valid we have, say m. graph, say n. elements to the columns this particular example we vis: and "ENDDATA", of each line". or a "state line" "size" since This state. In case of a state to the state we in, are line is tine, it we call state". each to "BYTE" 3 and 5 respectively. i, state each In our example "BYTES", already valid elements, element number and, according the "next of the are or "STRING". "IDENT" an "exit In "STRING" "IDENT", "REF", leaves has a state this Assign matrix. 0 to 6, and 6 valid 93) either elements numbers 0 to m-1 to rows. have 7 states, "BYTES", an mXn these so we know how many states graphs, Also we can count the number of valid First and states we valid the valid and "REF" have some valid is a next element, there elements of state and the corresponding elements state and, or exit 0 are next states are 0 To fill corresponding next to these state. The same for all other elements rows and we have a transition the graph number with associate The corresponding number". would be the following matrix. BYTES I BYTE REF I IDENT its elements for matrix transition matrix transition STRING that of that We do the would be empty. row the put Then the set columns. columns are valid of this we matrix under the appropriate states corresponding of elements "matrix of row i of a transition the elements graph. tie it the graph 98 and call of matrix ENDDATA --I-------I-------I-----i-------I---------I---------I 01 135 --I-------I-------I-----I-------i---------I---------I ý221 --i-------I-------I-----I-------I---------I---------I 21II --i-------I-------I-----I-------I---------i---------I 3114 --I-------I-------I-----I-------I---------I---------I 43 --I-------I-------I-----I-------I---------I---------I si61 --I-------I-------I-----I-------I---------I---------i IsI 61 EXIT EXIT EXIT ----------------------------------------------------- (fig 4, transition is not the final This "syntax-error recovery there matrix and also for associated with would be some semantic These actions have discussed form of our transition part" action will number 98 ) each to state each valid be element and code generation be added to the. above There is a matrices. used for of each error state actions. example matrix after these topics. However, the linearised form of the graph can now be shown as: we I 98 C0 (EYTES>1; BYTE>2; REF>5); 1 (IDENT>2; STP.ING>2); 2 (ENDDATA>EXIT); 3 (IDENT>4); 4 (BYTE>3; ENDDATA>EXIT); 5 (IDENT>6); 6 (REF>5; ENDDATA>EXIW This syntax ] to corresponds from as (scanning code such DATA string I BYTES "A CHARACTER STRING" ENDDATA DATA space I BYTES size DATA workl ý BYTE a BYTE b BYTE c BYTE d ENDDATA DATA refs ý REF index We also emitting it just 5 while stops is size REF top-of-stack actions and an error action possible, chapter show where ENDDATA (* the recovery scan. Error we consider constant a manifest *) ENDDATA be can "I"): by placed This action. Semantic actions in Recovery the showing latter is a code- the worst discussed are second part of in this chapter. We show part added as well of this linear as code emitting form with error recovery in the following actions diagram. 98 E 0(BYTES>1; BYTE>2; REF>5 ? STOP); 1(IDENT>2lemit(" STRING>21emit(" 0 ... 00 rmb ", css. ); fcc /", css., "/", nl, " fcb 0" ? STOP); actions is "STOP" end the of the is action in given but chapter it is is of in to is a "? " one of the after )" S, "STOP" chapter a durm y action, needs placed we have access actions As we show in Language. action, "emit(......... The verb state. and code generating only recovery error semantic K14ARthrough in correct no use in SL the 0, state 1. state the at which The correct S. VALID ELEMENTS OF A TRANSITION MATRIX The valid in one of the following fall matrices Language keywords: 1) be coded would on the top of columns in appearing elements these automatically groups elements transition : as they are and appear exactly the language by a program using keyword file; 2) nonterminals or transition numbers 99 to 50 as mentioned matrix 3) identifier: 4) constant 5) integer: 6) nil 7) anything these matrices: string: symbol: symbol. of fixed If matrix, so that to coded before; - For groups 3 to 7 above we put the fixed table are already codes, we have this it (chapter fixed is placed is encountered code 3), 26 into after the the matrix. (anything-symbol) in the rightmost from code obtained in a transition of the used entries any specific symbols expected in a row, in that state. CHECKING TECHNIQUE USED IN KHAR the main the syntax we have one or more matrices: language For each matrix, any of 99. matrix The matrices source program in the is one used by KHAR to are written these of check corresponding language. programming we have one entry For each matrix that matrix that matrix. but may be an exit there Associated from any point any is a pointer state of in any state of terminal or element, valid there any state within nonterminal, with first the point, to another state of the same matrix. begins The program execution element first the of from this same matrix. The other indirectly, from this main matrix. We have two types symbols that, it the source case this entering the initial does current a which elements; match not state any checking Upon the continues exit -4.7- be checked-with from symbol will of that the current by pointing this latter In the symbols) new or terminal are of them the next valid be checked against will occurred, the new matrix. with just symbol will director directly they either the source new matrix, symbols (the match may be called, (a matrix-number). or they are nonterminals before with in of valid matrices valid and stops by exiting of the main matrix, state first the to access with case be checked matrix and if element of the source symbol but if to the beginning of access is matrix, that to regained This transfered. The process the valid Another source is symbol state, current and As soon as an error longer by placing accessed direct the the in verbs to scanner If state. the stack on that after valid element. to the new the "end of file" until any of the valid one of the next a match occurs and access is transfered continues occurs or of the elements occurs. scanning occurs by controlled the onto symbol to match with the source read an error for a stack using was access on exit. symbol does not match with the source which pushed by the entry and the same process state off of the current indicated by is Information elements from matrix achieved always expects is the state state is and popped a matrix entering previous recovery information. accessing the of point syntax the take graph error the error the input text alone. The error of action entry action chosen in is no actions the table language by the designer. The errors A symbols. are good caused compiler and correct as many of submissions of ajob the which how to process compiler continue the In this actions term can not "Error should find as possible them before the by unexpected, it is it analysis when Recovery" is system we have one which we use as the error all errors to finally correct, or missing in wrongly program a source the reduce number- For other completed. should be these errors able spelt to occur. of errors determine For this used. general action action part-of and states some special in transition matrices. The error all the first serves the "end Ialid as states of can be added to each actions is the recovery we of state" the than quicker action the valid all which also we check the source and if of the state the next valid we reach the error been checked with That is, marker. with part, action error from the start it check on until it does in the table element and Then we know the symbol part. of the current elements state and has been detected. an error As we mentioned the number the checks an with elements match carry here the different all error case usually terminates symbol with that for case. Each state has different or matrices In the latter state. not can be general actions program current and matrix that syntax with enters if definitely match can not have an error matrix is dummy action We give to with valid if and only a new matrix the the valid one of so that an error to exception satisfy the the on the following this. syntax the a match occurs. current is not algorithm. of obvious symbol words, The we use "STOP" SL. page the formal state is It In other needed. matrix new matrix, first source elements. a this of Conventionally, of to entry element part is symbol expected before checker, symbol when we enter an when the earlier, will we main as a FORMALALGORITHM push to matrix no. 99 onto stack; point symbol; end of program: =FALSE; read WHILE matrix on stack AND NOT end of program BEGIN entry first DO WHILE NOT exit AND NOT end of program DO BEGIN indicates data under pointer end of state THEN IF entry in transition BEGIN IF NOT looking for director symbol in first state of matrix THEN END BEGIN report error and call error recovery action ELSE BEGIN in data departure transition (via to of point stack) return from which entry was made to this matrix and access next entry; pop stack END END ELSE BEGIN into transition data ( the expected symbol ) IF symbol under pointer is BEGI a matrix code THEN IF this stack is not already on the stack as unsatisfied goal THEN BEGIN onto stack; {used to recover this point of push value of pointer into matrix} departure access new goal matrix via index table to as an unsatisfied another matrix goal on the stack) matrices; {record level: =level +1 END ELSE access next expected symbol in this state END ELSE {not matrix code, could match current symbol} BEGIN IF current BEGIN IF next symbol matches symbol under pointer, state NOT EXIT THEN access next the expected symbol TbE ý'"ý state, finding ELSE exit: =TRUE; matrices on stack as satisfied level: =O; {sets all read next symbol END ELSE access next expected END END END; IF NOT end of BEGIN program recover point IF next state ELSE BEGIN access next END END END. symbol THEN (exit in current from matrix goals} state has occured) from stack; of departure pop stack; is EXIT THEN exit: =TRUE state to find an expected symbol; exit: =FALSE new symbol AN EXAMPLE We describe described syntax the above. the action Consider of the the following of APIL/1, but has no practical transition encoding matrix of graph 99 for PROGRAMexample HERE DATA work space HERE BYTE work 1. ENDDATA MAIN CODE ENDCODE ENDPROGRAP i given At1L/1. algorithm for the small matrix the program which satisfies meaning, and make reference on the next page, which is part to of the IIIIIIIIIIII ATINEREI 971DEFINEIDATAIPROCI 981MAIN IPRGGRAPIIIDENTI I---I ---I -I01 1ýIýIII ---ý-------ý-----ý---ý----ý---ý------ý----ý----I---I----ý-ý ... 121ýýýIýIýýý 11 i i i i i i i i i i i-... 11 ---i 21 ------------ --- ---- --- ------ ---- ---- --- ---- I- ... -ý -i-I- 3 ---ý-------ý-----ý---ý----ý---ý------ý----ý----ý---ý----ý48I 13 -ý ... -ý 23 ---I-------1-----ý---ý----ý---I------ý----I----ý---ý----ý56III 1-------I-----1---I----I---1------i----I----i---1----I6(ý ---I---1----I---1------1----1----I---1----1---I-------I7j14 ---1-------1-----I---1----I---1------I----I----I---I----19II 8 ---I-------1-----I---I----I---1------I----i----I---I----I1101351 I 9 ---I-------1-----1---i----1---1------1----1----1---1----I111 10 ---1-------I-----1---I----I---I------I----1----1---1----IIIII1 11 ---I-------1-----I---I----I---I------1----I----I---I----1IIIII 121 8131 ---I-------1-----1---1----I---I------I----1----I---I----113 1I ... ... -1 ... -1 ... -I ... -I ... -I 121 ... -1 I ... -I ... -I ---I-1 96 1 ENDPRCGRAh'I I----I (1-----------I 24 -----I 23 ----ý----I -----------I24 1 EXIT .1 ---I ----I -----------I- ---ý24 11 ---I -33 ---ý-------ý-----ý---ý----ý---ý------ý----ý----ý---ý----ý-121 35 1IIIIIII1 ---ý-------ý-----ý---ý----ý---ý------ý----ý----ý---ý----ý36 The index to the matrix since ... . ---ý- the -I ... II -I -- first neither symbol number 99 is pushed onto "PROGRAM"is read. condition is The outer FALSE, note that 4.12 the stack and WHILE loop is entered "end of program" is set TRUE by encountering language another is loop As the symbol PROGRAMso the in entry 1. the ELSE part next state transition the The THEN clause entry "IF follows This the when both is is and as just is part current state is occurs, state 4 is state symbol matches ELSE the to access symbol taken. IF the of read- matches is the same as the next state taken the is so that this next inner WHILE loop 8, and space" Looping, symbol does match and the next symbol, "DATA", is which takes is the 9 is stae so state 35 is "AT", is accessed, ELSE part the a match occurring "work next "IF of the ..... a the match read. access to and "HERE" is read; expected and on next symbol first "IF" is matrix in repeated, and the the reached, is first accessed, "IF loop space"; accessed the symbol now state of expected but current part is repeated the the next IDENT. until occurs is, match in the and control matches statement pointer" of loop not On Looping, THEN same flow with IDENT, HERE does 2 and the symbol next matches and "example" under accessed, the described The accessed. the TRUE. The flow still are taken The loop is repeated, state read conditions same path moves current "example" symbol since repeated the NOT EXIT" is symbol PROGRA1 so that under state This examined. WHILE accessed ELSE part The first taken. first IF statement the of PROGRAM, the is The inner data transition expecting I The next next is while is accessed. state "work is matrix of the code condition. ELSE part pointer the sentinel an error in The entry under in statement is PROGRAMM so the to corresponds file" of it symbol: entered. also "end the code" symbol looping, BYTE is taken again statement is on so the this read. but obeyed. i The point 98 matrix increased first the as accessed. is BYTE. indicate of The 98 has been matrix is The loop of repeated, with next out symbol t1AIN is recovered associated is made to entry THEN from point the this first the to outer WHILE loop made with symbol read, the is P1AIN in match, symbol symbol IDENT with read. the and now ENDDATA. The loop The next symbol ENDDATA so that on the is in next becomes EXIT so exit to used is l", is state is entry obeyed, is which entry point state so then 98 is in stack the 12. the popped. to symbol on the repeated; on leaving encountered clause is is a this as "work expected. is in the TRUE read. expected 99 is symbol goal a director that match state is expected next symbol, a which next program stack with in level, so that result a match so the next of potential repeated next showing BYTE is since occurs. its has and the accessed, The IF NOT end of loop zero The next therefore a match and the 4. no match, is 3 to resulting repeated in exit not set found, state accessing loop is level 3; is state a not BYTES so the and results repeated next stab state is is does This is is which The loop followed. state The register, table. stack. first the stackand a matrix symbol expected The loop index that is the onto the on the placed ELSE route ELSE... usual the indicate by one to been pushed via accessed just has is departure of this point The departure of state next 12. state, DATA. Since 23 is program is entered and accessed is entry and is WHILE is The state WHILE examined and end of inner inner the Access the FALSE, the a match and the next CODE. 11 On looping, the expected symbol is found to be a matrix number, 96; departure the first the again, accessed; no match is CALL as the expected found, the 95 is state index IF symbol as the accessed is point 95 accessed, of and the via departure matrix first by one, expected and matrix and the giving symbol. CALL is On looailg, the and match, increased 96 accessed matrix looping, On increased level stacked, no symbol. expected next is there level stacked, WHILE, of symbol. expected On looping, table. 98 is to point not as the matched, current is symbol CODE, I in 3 becoming looping, code loop next the show that for entry same flow being FALSE. complete is is loop expected The to into and the state, on looping, executed be EXIT; but entry again, symbol the state. the and been 0 to set is, that goals, from number" EP:DPROGRAPIis the symbol a ... is first current time) the on is exit stack, 96 so the expected symbol As this statement "IF is there symbol, is EXIT, CODE becomes until and the the equal matrix the accessed. state" state 96 after does not however, repeated the end of 24, entry "On looping, is next TRUE. On looping, state is the and 99 left be found, to above "IF next only with expected (for that TRUE, having becomes exit popped, the level recovered the a matrix next 95 is so that Since symbol. match, clause to described be becomes now satisfied occurs, obeyed process found found stack in symbol has been matched. of departure the ENDCODE, and on read, expected WHILE loop, 96 and 95 are control is symbol only loop and results on the next EXIT so exit inner the set 95 is of accessed, and the from matrices The point the ENDCODE is after a director member of is this since exits The next state. next matches The state to made to CODE. A match occurs access and is has 0, the no end of its NOT looking THEN for director symbol" symbol, the from that is statement the that decremented, so, becomes a part the of is departure of empty. The looping the execution match occurs. innert: termination of "IF NOT end of next WHILE Loop is outer concludes left the EUDPROGRAF1 so exit "exit the is popped TRUE. On empty, is finished. could be analysis The becomes and is stack from obeyed. becomes so exit the since that reporting with EXIT and level EXIT, statement is next Left. loop, the stack is state is FIILE Loop is program" the recovered, accessed state next is under the popped The NOT" lies and symbol next "IF 0, marker, expected the decremented this being into and the recovered stack The accessed is goal) state 96 is the accessed, the caused end of into point on Looping, TRUE, and, matrix" the departure looping, On director a of point and level popped symbol. since 0. becoming As exit point state looping, on expected again seeking are as an unsatisfied remains The departure in is which we and the executed, stack, as the obeyed pointer. entry is one matrix only accessed entry the Since obeyed. ELSE clause 95 recovered (so statement and ERROR RECOVERY A possible the writing in an error source read. is fails may implementation appropriate symbols one For example found. to This report cause is error table recovery error recovery them until the until called "panic failures in mode" that For example state process a specific reads In this recovery. if an error any, occurs is symbol end-of-statement statement, if made- by an indicated and accessing the symbols errors. recovery ignoring by one, often many other message where ignoring further of error symbol case it and also in "VAR" in statement that end of they appear new errors. single several in the PASCAL program identifiers the of misspelling it different languages flexible undefined, features of a programing Because of the via the SL languages, system the Either called. initially. be words be corrected have we these because of a be should which it at the in manner. general action actions that they for the for a and some each of the in These tools flexible a and there are, are in all state right error KHAR and the one can simply that may be used actions if be checked could valid one determine to usually misspelt, in These try For example may be employed. which of cause effect, are like flexible. very interface provided add his own new features to the and use them. These error at reserved language, simplicity will that should whereas we implement. tools and occurs compiler word VAR Each of actions. very intended recovery, error specific some had been would and wherever undefined it the at process. a good the one of semicolon next be generated will an error be chance In our recovery reserved will are many compilers messages the some identifiers they program in error symbol correct a good of error detecting After is rest Sometimes error what the to ignored we have statement, in suppressed and we ignore PASCAL program point actions in the one of the specific Their action are accessed by calling formal algorithm actions is described or an action interpreter where an error'is the general in the following action sections. detected. may. be SPECIFIC the ACTIONS Reserved words, or "verbs", error of following actions in the SL languages call They are as shown in the KHAR machine. the are used to table. Action treaning GO 96 to error state, see below STOP stop checking succ point to next source symbol LOOP go to source that state, another symbols state appears in that stay until in one of input the state, the read and valid elements then stream, ignore of go to state. appropriate AN EXAMPLE OF THEIR USE The usage of these In our previous no. 98 of Ah1L/1. program syntax State matrix. 0: actions will become clear example we made a transition This matrix checker be able to continue. that error would stop. Therefore We discuss is not complete It from syntax graph and in case of error the needs some more information to to each state of we add error the action matrix by an example. added for actions each state seperately. There be will transition not have errors to in error For matrices. action State no the satisfy of states our of do which STOP as a dummy action syntax state error SL. of 1: In this If state we expect any error elements valid no. 4 happens we check for after of this So we have a set them there is symbols then until we in that one of the 2 of means exit next this from this this column for state "nil" 7 matrix. the "nil" in the symbol for The "nil" once only MAIM to and ignore We read elements this of In the state. corresponding the set case of ENDDATA is each is source appears, ENDDATA we a valid state of element EXIT which matrix. In the case of the other from table the follow we see that and corresponding matrix. and its state is called INTERRUPT symbols state. our As matrix. matrix to that PROC a next know state access of "string". are matrix DATA or ENDDATAor one of the from this exiting the graph of this shows, elements "identifier" either in graph 99 and by refering ý kinds these the we place first the To of the set we force elements this matrix. Its which the next 25) only state symbol causes a refinemement - 4.19 - we introduce effect symbol (code exit and valid place element a new a new is this is EXIT. in the behaviour of matches any symbol but no new symbol is KHAR. "Mil" as the in the difference keywords "nil" on treatment which is satisfy the syntax the still the other is made. Since a match with has read one of these, the scanner symbol makes a key ENDDATA and of recovery after occurs Thus the use of "nil" symbol. next read this symbol and may be used to current graph from of the outer 98 was and the which called. State 2: In this state we corresponding symbol of in did this the just not state match we will add an exit so that if leave and from matrix is next matrix which is occur. This state recovery part of the we were the match will never is the in this if the current much if we go 86t loose error no that symbol EXIT. not code under there "Et'DDATA" the expect to recovery sent here. column of take the error, produce place To do this "nil" ENDDATA, with out symbol exit will so the '0', same as state we error that is STOP. State 3: If go State happened any error to 4, EtJDDATA and follow symbols go out of this (as the matrix. 4: The same as state 3. state go to error state action for we search 4, part or for of state BYTE and one of 2) the and State 5: Very State similar 3 but state the same action is the "nil" To put these language 5. in this state the error next state action part "ENDDATA>2, and, after putting in the table actions and SL verbs actions as its only element symbol. as shown briefly error their error keywords the matrix, with as state 7: There would be no error these BYTE by REF. we replace 6: We take State to in chapter 3. Eut at this we use symbol '>' and ', ' to separate the alternative 1 can be written actions into form of stage to to link show symbols symbols. So as DATA>7, PROC>7, INTERRUPT>7, the error appropriate and symbols in the linearized in the matrix of state the we place the matrix NAIN>7" 98, it would be as follows: IBYTESIBYTEIREFIIDENTISTRINGIENDDATAI 25 I I I I I I I -I ----- ---- --- ----- ------ ------- ---- ------------nl 11315 STOP -I-----I----I---I-----I------I-------I----I------------22 1I ENDDATA>2, II DATA>7, PRCC>7, INTERRUPT>7, II(I III MAIN>7 -i-----I----I---I-----I------I-------I----I------------I EXIT EXITI STOP 2I -i-----I----I---I-----I------I-------I----I------------I4 31 II BYTE>4, ENDDATA>4, DATA>7, PROC>7, INTERRUPT>7, II wAIN>7 -I-----I----I---I-----I------I-------I----I------------41 (BYTE>4, 13Ii ENDDATA>4, DATA>7, PROC>7, II INTERRUPT>7, I MAIN>7 -i-----I----I---I-----I------I-------I----I------------I6I( REF>6, EtlDDATA>6, 5I 'I DATA>7, PROC>7, I III INTERRUPT>7, I r'AIN>7 -I-----I----I---I-----I------i-------I----I------------I15 IREF>6, 61 Et!DDATA>6, DATA>7, III PRCC>7, III INTERRUPT>7, III r°AI N>7 -I-----I----I---I-----I------I-------I----I------------11111 IEXITI STOP 71 ERROR STATES An alternative transition the set matrices form, matrices of follow are implementation is to symbols an intermediate we used a compressed of add "error to the adding states" head of representation representation error to the the matrix, parts to and to add Since matrix. between which action graph shows these these and encoded additional together states and symbols, matrix used in error the indicated these error direct states which recovery error will in an "error recovery state when an error the scanning diagram 'matrix actionI in complete BYTE and the in entries back to a normal process the of Access is made to matrix". occurs, symbols state in occur. In the following BYTES any of the ordinary with we show 98 matrix together with its to force form. ý REF ý IDENT STRING ENDDATA ý------ý------ý----ý--------ý---------ý-------I ------013ý5 STOP --ý-------ý-------ý-----ý-------ý---------ý---------ý------ 122 GO>7 --ý-------ý-------ý-----ý-------ý---------ý---------ý------ 2 EXIT GO>8 --ý-------ý-------ý-----ý-------ý---------ý---------ý-----34 GO>9 --ý-------ý-------ý-----ý-------ý---------ý---------ý-----EXIT 43 GO>9 --ý-------ý-------ý-----ý-------ý---------ý---------ý-----56 GO>10 --ý-------ý-------ý-----ý-------ý---------ý---------ý-----EXIT GO>10 6ýý5ý ----------------------------------------------------------lEPyDDATAIBYTEIREFIDATAIPROCIINTEPRUPT114AIPJI 26 1 25 --ý-------ý----ý---ý----ý----ý---------ý----ý----ý----ý JEXITJEXITJ 21 EXIT 71 1 JEXITJ 71 --I -------I ----I ---I ----I ----I ---------I ----I ----I ----1 I11111 JEXITJ 81 ----I --I -------I ----I ---I ----I ---------I ----I ----I ----1 JEXITJEXIT) JEXIT) 4(41 EXIT 91 91 -I ------I ----I ---I ----I ----I ---------I ----I ----I ----I JEXITJEXITJ JEXITJ 1 616 10 EXIT 101 ------------------------------------------------------- Note that Looping the "anything" in an error state To send the pointer until symbol, 26, is always used a symbol is read and matched. to an error state after error detection we "GO". have action 8 for "go to state means which Separating into for introduce is the state an error the when the useful in from parts we have "GO>8" putting them recovery". error We can adopt it leave we action states. several once error states error 2 For example at the end of state and same action it if but and use the GO verb. and the GO verb is are patrs If policy. a mixed state states repeated is an action used used times several we GENERAL ACTION The use of introduce of kind the "panic of on within a a set which in used recovery of symbols the allow user of KHAR to CAPirAUJ, an improved is in kept form on existence may occur. is such of CAP'P1AP1] The technique place error in action" recovery which states error from non-terminal sections substantial the within symbol a of that current text can one Thus called. was be can take only or on a symbol non-terminal this which valid recovery skipped in certain circumstances. r The general as an error matrix distinction rigid to access exactly end recovery state. between syntax state for as in the normal of state new next a uses an appropriate action use in symbol and repeat this way. scan except for taking state is possible and semantics. syntactic is not the signal This normal error within the because of KHAR's The verb LOOP is used The use of the state that encountering action, the scan from the beginning is the but to read a of the state. Since this set state may contain on which recovery director may occur symbols of these and so on, the follow non-terminals, is augmented automatically by all the nonterminals. SUMMARYOF ERROR ACTIONS USE n "n" the state "n" is any normal should be an error state; LOOP n state within the matrix, not an error state; GO n this forces action access to another state without recovery; EXIT forces with exit from the present recovery left matrix to the higher to the calling level; STOP stops the process of syntax checking; SUCC read the next symbol and resume scanning as indicated. point, SEMANTIC PROCESSING & CODE GENERATION CHAPTER 5: We have already In this we discuss chapter (oricontext semantics This covers most of the structured language tree abstract parse integer variables. semantics of We then use in the production semantic but of we take in expressions of expression, illustrate We this a language are for PL/O. a block in up the attributes has only by considering the PL/O as problem such these of a compiler propagation an arithmetic how show encountered problems the not in the KHAR system. mechanism used in KHAR to handle the actual sensitivities). by shöwing their applied the approach outlined as ANSI FORTRAN. THE SEMANTIC MECHANISMS OF KHAR "semantic The expression a semantic example of data mechanism" structure such a mechanism and is the the was used operations symbol table in CCORDY] to it. upon mechanism. cover His He first asserts that 1) it is universally used, 2) contains name of object, 3) contains its 4) indicates 5) contains 6) contains parameters. its data type, structure, addressing auxiliary variable, information, information, array, procedure, etc., and dimensionality or number of Typical 1) enter name, 2) enter address, KHAR and are symbol that of integer the into of is a popped off dictionaries the identifiers, of handled objects dictionaries. these KHAR within KHAR has Thus no CCORDY]. of stack, the as shall we the stack; element would be accessible. a is a register from the read-only input see, of push-down which are integer items may be pushed onto or is a read-only the structure than in store, which only stack, the top (CSS) Source Symbol register A Register elements rather this whose content is the last stream by the recogniser, symbol it read is used in a manner; (RG) this " but approach In KHAR we do not compilation. have the KHAR are: variables; 3) do between stack this 2) Current difference the denotations sense The mechanisms 1) A symbol We indices in table us to as are given each attribute. conventional table. constant and the brings a symbol construct : strings and so on, one for immediately This of upon the symbol table operations the is a working SET incremented verb register from a or decremented; whose contents range of may be set sources, or using may be 4) A Label (LABEL) register this is used to provide used to generate 5) A Level 6) An Index values are which labels; (LEVEL) register this unique integer of a set is incremented within level, address level a provide language, structured for pairs to used to generate PL/O machine; the (INDEX) register this or a block count decremented register is used to index into the stack, and is set' is not SCOPEor SEARCHverbs; by the use of the 7) Top of stack register this is used to access available explicitly the top of the to the it stack, user of KHAR. actions or verbs SEMANTIC ACTIONS We may now describe upon the semantic the structures semantic of which operate KHAR. MARK This action increases LEVEL by one and puts "F'ARK" on the stack. FLUSH A This action removes all MARK from the, stack, entries down to and including and LEVEL is decremented by one. the If MARKwas not found on the stack, to 0, and LEVEL becomes the stack is pointer set 0. SCOPE, SEARCH & CHECK These three have actions similar syntax Each one has one argument and two sets these will the two sets result first down is used to Both these values for the pushed behaves the stack onto for just on the the change stack, is the and obeyed, be done. This declarations.. stack items declared actions duplicated it on the would on "SCOPE" one of actions actions of with put of set complete Otherwise, locate set depending out matched SL4. One of actions. CHECK action. "mark" first the check "SEARCH" searches argument.. last the second to used is argument found the otherwise the to if accordingly, verb if of be carried SCOPE, SEARCH or the of searches elements actions of in as shown like its with "SCOPE", and is stack. value next a match of INDEX so that to the matched symbol may be accessed. "CHECK" just first set checks of the argument; actions is carried if it is out, not zero otherwise, the the second. POP This action removes items removes one item, from the stack. "POP, O" removes all POP by items, itself &' "POP, and items. removes "n" PUSH This places action RG on the of contents value following integer, PUSH alone and an. integer symbol, index value of to source current PUSH LEVEL, LABEL, the which value the or as written, of value the PUSH RG puts stack. PUSH CSS, the current LEVEL of on the stack, PUSH LABEL the symbol, the one item is the of value an either item. an encoded SET the This action SET RG pops a value the current the alters RG and SET -n "n" subtracts the PUSH, places value of while from the RG, SET CSS places RG, SET LABEL and SET LEVEL, LABEL and LEVEL, of RG. register working into stack into symbol sorce values the off the SET +n RG. SET following adds "n" as alone, symbol in to for RG. SEMANTIC CHECKING IN PL/O Semantic checking and in given PASS 1: the in third the syntactic pass PLO is done "CONST" part, we concentrate'on pass is in on procedure supporting material, in in three the calls. In the passes. second on "VAR" pass The Linearized LISTING first part SL encoding OF THE KHAR SYSTEM. processing This pass checks the syntax of PL/O and outputs expressions in they repeated are following are We do not show the syntax notation. postfix error addition The only sections. the the with correcting actions of diagrams seperately semantic actions in placed in linearized the as the coding and may be seen by inspecting actions the appendix. PASS 2: dealing with we "flush" and upon exit to and including if index, For are zero, If is SCOPE to use push its stack, a identifier, the identifier code we push the code and "0", semantic we not is we emit symbols in a constant the to in used that scope. its and as a "don't to unless for identifier as a value Remember that denotations. source copied stack has zero their EMIT current are SEARCH the identifier matching it EMIT so that "factor", by indices represented then in the it then otherwise If the above its constants is value not replace the identifier code, symbol. Note that RG to otherwise we set required by the on the action. We report stack down entries For each constant the stack we use ERRORto output we stack. onto we the marker. entries. code, the onto is, level, this at it If identifiers, other When we are I For each identifier, we push two entries encountered is to remove all that already. not declared its actually care" the "mark". it. is declared message. If value. the stack, we "mark" [block], 98 each time on entering In matrix check constants manifest appears an error if any of the constant identifiers in "VAR" part, as a procedure at left the in matrix hand side ": _" of in matrix "CALL" after ident 98, in matrix 97, or 97. i We tabulate carried the pass and then present index number of the action the with for the actions placed the to show where it graphs would be out. Semantic Actions: Manifest Constants 0. set 0 1. mark 2. set css 3. scope rg(error(Ln, 4. scope css(error(ln, 5. flush ": ", rg., ": " declared"); ", css., " declared"); push rg push push css push 6. search css(;; ) check %index+l(epror(ln, ": ", css., " const 7. search css(;; ) check %index+l(error(ln, ": ", css., " not a procedure");; in LHS");; ) S. set emit 9. search(;; ) check %index(check 10. emit(css) 11. emit(css) set emit %index+l(emit(%index+l);; css; ) );; ) ) 0; ) I (0 .a E ", le N d' 64-3 t"1 C ý S -0 ... ^^ . iý4 ý C m ..U ý .4 ýýý "`E . Pl "` .ý a a z 0 U T 0? a7 00 0) 5.8 - - W IL I IH X W ý Z W 1 i's Z W z TT K- c-I ft&j ý0 n c Gý 0 .... -a C -ýt3 11 00 (C) 0 -+' NU u. u T . dJ c ... Z H m 5.9 - O O - n G 0 ... N 11 ^ r-ý 11 v L CL X >n u >v ýý 413 If 0 f S. 1s - 95 *Clerm7 Ltermi< F-->'E' L. _1J. -.. _J 94 ---ýCfact or7 Cfacýor7 \ 344- liE- 93 I 0 ident -> -number-- 9 10 ýC[expression] --ý0)- I 8 -> I PASS 3: checking variables As in pass 1, in matrix MARK the and on leaving, stack is already "variable identifier If on the stack. identifier redeclared", on ýthe stack I Report an error if with on Eblock], entering we we FLUSH. identifier For each variable it 98, each time "1" encountered we use SCOPEto check if found on the stack, but if it is not found, to mark it above it, identifiers any of the variable is an error, there we push that as of interest. on the stack appears in "CONST" part as a procedure after "CALL" in matrix name in 98, or in 97. Report an error i. e. it 98, if is not declared have been marked with "ident" before and marked with "0" as in the first ": _" in 97 is not on the stack, "1". All pass. other identifers will i Semantic actions: Variable Identifiers 1. mark 2. scope css(error(ln, ": ", css., " declared", nl); push css push 0; ) 3. scope css(error(Ln, ": ", css., " declared", nl); push css push 1; ) 4. Rush 5. search css(;; ) check %index( check %index+1(; error(Ln, ": ", css., "cannot error( q, ': ", css., " undeclared"); I) I appear in LHS"); ); 6. search(;; ) check %index( ": ", css., " not a procedure");; check %index+l(error(ln, ": ", css., " undeclared"); error(ln, 7. search css(; error(ln, ": ", css., " is not defined", nl); ) ); -I n 34 .ü ® II .E 0 ß a N m ý G m ý ý R1 . ýa C @i -o ... ý S .0 1 LGý, "` .ý z 0 0 ' a Ll -Ak 00 0) -S. 14 -- ý H >C [tJ Ih L] z w i 'A:. Z W ZO ra ý n C 0 (C) .r.. "i ..C a 0 ü ."... u iý Lf)T ftAj CJ Q1 "'d "- J Q U H T - 5.15 T - Cterm] A CLerm7 ý-ý---+t-ý I _ý CfacLor] [foe Lor] ýý Lden i -ý 7 number ýC----: -)Cexpressi on] -=-ý)ý A I ý ý/ýý f il u v n v -> tw * (C) (3) S. 17 - 11 PASS 4: checking procedures As in previous and entering the upon exit and "flush" the is encountered time a [block] each before stack in 98. matrix identifier Push procedure there, it if and is, Report if an error appears "CONST" part in "VAR" before ": =" in matrix in 98, or interest. i. e. that if is "procedure not already identifier is as of no interest. identifiers the it of interest on the on the 98, in 97. Report an error stack, an error identifiers any of stack - in part on the report We mark other redeclared". stack "mark" passes, is if not the ident after declared or CALL in is declared 97 is not but marked as not of Semantic actions: Procedure Identifiers 1. Mark 2. scope css(error(Ln, ": ", css., " declared", nl); push css push 0; ) 3. scope css(error(Ln, ": ", css., " declared", nl); push css push 1; ) 4. flush 5. search css(;; ) check %index( check %index+l(error(ln, ": ", css., " is a procedure name");; );; ) 6. search(;; ) check %index( ": ", css., " is not a-procedure"); check %index+1(; error(ln, ": ", css., " undeclared"); error(ln, ); Matrices 99 & 98 e n L m .Q E .xU 0 .ý tl . C Li ý .ý NI M ý ý .a ofi º .ý "1 "% ý "% ä a 'ý'' 'o ý H X W A Platrix 97 0 Z w iy Z L1J z ý-- T r, 0) 5.21 O Q T Matrix 96 i 11. vý ýi i -Me výý u ý Ii Matrices 95,94 & 93 95 [term] nL. ol -`ýýý A Cterm] >+ -.. L-->..._, . -<J ýý 94 ýr nC.G'- `ý --ý[factor] [factor] ý*<---ý 93 > idenL= number C--ý, Caxpressi on7 - 5.23 - ;NI c( Q_ (4') ý ATTRIBUTE PROPAGATION IN EXPRESSIONS in As outlined that require introduction the be propagated attributes discuss We therefore to this the a language does PL/O chapter, within for problem this Abstract with Parse not Tree. to similarities ANSI FORTRAN. We assume that or polish, in in bracketed the tree repeatedly language surveys very the KHAR has overheads. corresponding semantic reverse tokens to the expressions way. (identifiers) checking linearized so this emitted type appended have to can take operand, be from left triple operator) propagated This place. expression have shown CTANENBAUMb], the simple, Consider semantic of CCORDY7, we assume that leaves (operand, one in heavy result As in the the scanning with are of has and some convenient so that dealing pass code expressions. The attributes up earlier post-fix, identifiers are an at of majority Labourious approach will We present a simple example is done by to right, a time. As expressions only occasionally and then give graphs. the expression -W+x*Y+Z. We assume that might be externally it has been translated represented +++ W REAL -- Note the use of -- into an internal form which as X REAL Y REAL *+Z to represent unary REAL + ---. minus and the start-of- I , end-of-expression expression, We give while the semantic for the two passes below, of the passes here. pass transforms our example to W REAL >X +++ < -- +++ and ---. and actions graphs the action explaining The first markers, REAL Y REAL *+Z REAL + ---. I The action of the semantic their types until their type further monadic it and than two deep as it ">". Then by its followed In either the is to case this but emitting does so. If operand, else prefix fragment it with operands and operator is is followed by the is bracketed is copied of the expression rest identifiers stack is encountered an operator is emitted two operands. graph by "<" "---" until is reached and emitted. The action of the second pass is to match its operator and emitting) actions. operator and types. graph gives type operand(s) needed"for of this the two scans prefix a language (code combinations can be ignored. of The second which requires explicit on our example is to produce +++ W REAL X REAL Y REAL *+Z Repeating bracketed graph and semantic the valid identifiers The actual The effect a syntax graph defines The syntax the checking changing. with the REAL + ---. gives, first, +++ W REAL <*Y REAL"X REAL >+Z REAL + --V, and, then, W REAL TEMP REAL +Z 5.25 REAL + ---. A third pair of scans gives +++ <+W REAL TEMP REAL >Z REAL + --- and +++ TEMP REAL Z REAL + ---, in which, turn, gives, REAL TEMP REAL > --- +++ <+Z and +++ TEMP REAL ---, which becomes on the second scan TEMP REAL and is no longer Note that report an expression, we rely part processing of the if matrix it syntactic place to is not bracketed code emission repair to processing the error actions to as such. detect and in the error allow further we wish. We tabulate graphs. the We can also errors. action on since the actions for the two passes and then present the Semantic : first Actions pass of attribute propagation 1. push css 2. emit(% 2, Z1) pop 3. emit(``/.3) css, %2, Zi, 4. emit("<", 5. emit(Z4, 6. emit("<",, 7. emit(%3, "<", '3, css, '4, css, ">") %2, %1, ">") %3, '2, X1, ">") %4) push css 8. emit(css) Semantic Actions : second pass of 1. push css 2. emit(%1, "real") 3. emit(%l, "int") 4. emit(%1, css) 5. emit("temp", 6. emit(css) css) att rim propagation i First Pass : Matrices 89 & 88 Wt "I Il 1.1 ii ýý r G ... "' ý .ý4 ý ft. L 0 .0 0 ý 0 (0 (0 ý. Lý ® n 00 0? ý. 0 u ..,,. -a 0 c 0 ý It 4 lý L. Q ! "' .00 00 oa ýa -T c t 0 0 0 ~ w" -a ai 0C Ca E ý' T M 00 00 -3.28 v 0. ý ý ý ý v -0 -i 0 4.63 0 ý 0. 0 U w ý ý iý (0 Second Pass : Matrices 89 & 88 I I I ý A G .«. .G «N ý to C 0 0 (Ii ai It it ýýý ýJ.. I- z H1G J äQ ti wz tL l IH ti W" 1 .. . 1J ý ý a c ,.. s -u ... .0 11th if___ T __ - 5.29 - a c 0 co c0 rn 00 I CODE GENERATION We now discuss simple assembler for process which will generate be The placling by inspection The use of the compiler from of The method of what uses identifier SEARCH when <statement>. the LABEL to register on load derived again, information the the the separation graphs the in the or stack required VAR part. store generate labels and LEVEL and RG to (CSS, LEVEL, RG) is encountered is syntax build simplifies instructions, A triple the of the and happening. the control information references. is to mechanism emission of CWIRTHb]. CCORDYJ but, echoes the in code could assembler stage. in actions emitting a we take the discussion the work at this KHAR semantic the out understanding address code code emission semantics transfer the of for the of code for we assume a two-pass A simple' but KHAR passes for is, that code addresses. code as sufficient assembly needed the PL/O machine, two as constructed in terms of generating code generation order pushed to onto The triple for construct generate the stack is located is generated use while in the storage for each using scanning CHAPTER 6: THE INTERFACE TO KHAR As was introduced language designer linearized form fora needed syntax discussed SLO. We then using KHAR set show the Syntax in tables Actions for KHAR use to recovery actions, SLO to then general, the a encode etc., to are all for graph the more SL1 and coclude the chapter interface to SLO, having in may be used that SL1 which produce in SLO present implement needed which linearized the up for current to he will which and error graph hand coded present of Language Languages Special the syntax interface the pass. the giving the the the 3, chapter a Syntax particular first SL4, is of We discuss detail, in useable is translated SL1. We by presenting then that of KHAR. SYNTAX LANGUAGES In this part we introduce family a designed for any other language we may implement. creating the transition One of the key ideas possible and action manually SLO. the automatic table and of at in the creation least we do this for Using SLO we may define of table design of of these one of SL tables' SLO, and the action SL tables. our smallest the SL1 languages series SL1,... table of to make is The transition languages language which must table be made in the SL series, may be created I by SLO and so on. automatically interface user's KHAR. to improved to produce language to be implemented. in A program and according to of of matrices the syntax language the the to form emitting SL, a linearized form can be of a new syntax information, for a language of the L transition L. the following with of the as language features actions A program in SL has one or more blocks and "}" suitable SL(i) the with linearized code is SL(i) say that deal to any SL is information is That SL(i+1), semantic Eventually structure bounded in curly brackets : { main block blocks other } "lain transition matrices, Each block statements and ")". labels blocks" has one and or more Blocks layout part, and compound statements block. bounded by square has one or more bounded by are main from the to the main compound statements action the the information contain Each compound statement one syntax-error from constructed each of which has a similar "C" and "]". brackets information the and "other matrix other "(" is block" all simple parenthesis labeled. The are numeric. ,, - 6.2 - THE GENERAL SYNTAX OF SL LAIGUAGES :: ={<main block>C; <block>... <program> block> <main :: =<block> <block> :: =<matrix <state> :: =<state label>C<state>C; <statement> error [, <semantic label> "integers Label> <valid element> <syntax error [semantic [code actions> actions] emitting [special actions] <syntax actions> [semantic :: _ " depends on which SL(i) :: ="as defined [code emitting vary syntax only in the of the SL :: = STOP :: =[special " 5"IERRCRIEMITI actions>, :: = empty actions] defined. actions] ISA111SA12 are as follows. action] action] chapter is being actions]l[semantic <semantic actions>, used in the overall syntax in :: =[special SL Languages nonterminals [error between 0 to number of states" :: =SA1ISA21 ........... error ] Language symbols" actions] Different three :: _ "all actions> between 99 and 50 " :: ="integers <state Label> I <code emitting actions>]C ] <statement>]... ) actions> element> > <state :: =<valid ] <state>... label>(C<statement>C; ? <syntax <matrix } actions] three and series. nonterminals <code emitting For SLO these GRAPH OF THE GENERAL SYNTAX -- df . 41 CN OD c .... "N N -ý-ý .ýý 0 .N T p .o "- -4-3 -4.) -0 .... 9) .. _. ý I. aýU Nö Ua 1 0 U Oý ý-" A ,0 "% W M "I v. XN T .ýa ý .... ý. Q j 6. ..öE.. .ý'NG . i. ý "ND Li T xý. 1 .... 0 EC 1 - 6.4 ", . LANGUAGE SYMBOLS OF SL language symbols of SL are: The present 4 In sp rg css set pop push emit tab ni sa0 sal sa3 sa4 say say sal sat use find go or check search stop-get index flush sa8 sa9 salO loop unget level label error sail sa12 exit mark scope 5 w I "[-@ ( / +*]}<>?,. SEPERATIOM OF SL KEYWORDS& LANGUAGE KEYWORDS In our small As syntax. language we have a mentioned earlier and it So be able to compile system sets of keywords. when intend to set "keywords". codes for contains These two sets of add a feature The other set). it we to keywords in used a program in SL is the linearized of a language syntax the of set all set is the keywords of up our of that an SL program should keywords to the language system. keywords are unchanged its form language. know both (except SL in which case we update the the language We concatanate SL keywords always come first. SL keywords are always,. the same. for which we these two and call This ensures that the USE OF POINTERS IN TRANSITION TABLE table In our transition pointers: by three i 1) pointer to the same table 2) pointer to semantic 3) pointer to code-emitting no such actions However if different indicate to the next state; actions; If any of the last litte element is followed each valid actions. two for that pointers is zero, it be means there will element. the valid is number the case element was a matrix a as follows 1) the same as above 2) pointer to, either, type 3) pointer to either type of action before of action after entering the matrix entering the matrix. SPECIAL ACTIONS SAO TO SA12 These are the only ones available I be used in all other in SLO. They may, SLs and in any programming language of if course, necessary. sao This is the null action, used to satisfy syntax of SLO. SAl Changes the state beginning of the numbers appropriate at the end of each block in to state an SL their actual in transition Language addresses table. at the This time of the is done of code emitting. SA2 is normally This is action to called remember after each state starting of point is number the The read. current state in of the table. transition SA3 This puts the current array symbol on the first item avalable table. transition SA4 This changes the sign of the "next the transition table by action and puts symbol" at the end of current so that (as they are negative) are recognized address array, state and changed to matrix their it in they actual SA1. SA5 This initializes number. action all remembers elements the beginning of "state of addresses" each array matrix into and a negative I SA6 This action counts the number of matrices, of the beginning length of appropriate records matrix, matrix codes with matrix, transition writes the action and writes in an SL block of each of them in the transition transition file, last comes at the end of the table on its program, the address records table on file. SA7 Puts the current symbol on the action table. SA8 Puts the "next state" on the action table. SA9 Puts a zero on transition table. SA10 Puts a zero on action table. SA11 Puts the current pointer of action 6.8 table on transition table., the the SA12 Put "rw-stop" error Syntax action error In this in on the transiton table. This is the only syntax SLO. action part for causes the compilation SLO we have no syntax to stop at that point. error recovery. Any error SYNTAX GRAPH FOR SLýr 4) . i1 x c N T T A 041 ý ... I -'U) ac -0 0 e. m "1 CL U W0 (^, J ow >-.. I ý4, 0% 'ý ý T t.. mm si QE "+ý ' 4J . yC "% u T XL. 0 ... _ Lý Slý . ý. 0' EC :f e6l ý - 6.10 - TRANSITION TABLE FOR SLO 1 99 0 102 0{ 16 20 3 40 51 6 7 80 93 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 0 34 21 12 0 1 C 18 0 0 0 1 22 24 0 6 0 1 ( 30 0 0 0 1 23 36 0 8 0 1 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 66 69 70 71 72 73 74 75 > 42 0 0 0 1 22 48 0 10 0 1 1 62 0 12 ; 30 0 15 ) 82 0 29 0 1 27 68 0 18 0 1 27 68 0 18 ; 30 0 20 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 ) 82 0 22 0 1 ; 18 0 0 ] 92 0 0 0 1 ; 6 0 0 } 998 0 26 0 1 I i ACTION TABLE FOR SL/O 37 00 1 20 3 4 50 6 70 8 90 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 stop sal sat sa2 sa3 sa4 0 sa9 sal 0 sa9 sa9 0 sa7 0 salO 0 sa9 sa0 sa10 0 sal, sa6 0 sa9 sa9 sa9 sa0 0 salt sa10 0 SYNTAX GRAPHS FO -r0. ý On P1atrix* 99 -'j tQ T CD ý 0- ^ ý «ý o -T ý ._ 0.41 -4.3 -o. oö _.. OýU o >-- a 1 @J Na XN Oý O . «.. T L dý S1 -+1 >ý No oE ýc n "% u T 1 '1 XL '-0 Lý .ýE EC . "% C161 6.13 - I tlatrix 93 Qý 4-t ý ý ý ý .... Ö ý 0 C9 > ý 0 Z H LL- -t - 6.14 -- 0 Matrix 97 co ~....... ýý. ý 6.15 CODItJG OF SL1 IN SLO The following small present is a complete language, program written in SLO, for SL1. { 99 98 97 C 0((>1 ); 1(21>2Isa1 sat ); 2(C>3 ); 3(22>41sa2 ); 4((>5 ); 23>61sa3 ); 5(@>12(sa9 sail; 6(»7 ); 7(22>8Isa4 ); 8(I>9Isa9 sail; ; >1l sa9 sa9; a>12Isa9 sa9 sa9 sail); ); 9(97>l0IsalO 10(; >11; a>12Isa9 sail); 11(23>6Isa3 ); 12(98>13IsalO ); 13()>14 ); 14(; >3; 3>15 ); 15(? >exit(sal sa6; ; >1 ) 3; C O(exit>exitlsa7; stop>exitlsa7; go>1Isa7; find>21sa7 ); ); 1(22>exitlsa8 25>3 ); "2(>3Isa7; 3( 23>4Isa7 ); 4(go>5Isa7; >31sa7; 23>41sa7 ); 5(22>61sa8 ); 6(or>21sa7; 25>exit ) C 0(saO>OIsa7; sal>OIsa7; sa2>OIsa7; sa3>Olsa7; sa5>Olsa7; sa6>Olsa7; sa7>OIsa7; sa3>OIsa7; 25>exit sal2>OIsa7; saiO>QIsa7; salt>Olsa7; ] .} sa4>Olsa7; sa9>OIsa7; ) our THE END-STATE SYMBOL We choose languages a special and it assign implementation In this it If language arises as "end-state" if'we 5(end>3; to another valid programming '--, -SL languages. symbol. of elements valid if character, as the first have "? " in used a we like. valid element after "; " SL statement ?>1 ? stop) "? " is recognized the first the the For example in the following of a state. preceded in of one change that only rarely as our end-state "? " was in use, we may is which we use "? " happened that The ambiguity expect character element element because as a valid but the second one is end-state But in the following by a "; ". because it we is not SL statement 5(? >4; end>5 ? stop) after "(" we can have either So syntax-error-actions. if ambiguity state) or meaning, valid this a valid element or having read the is end-state valid element. (end of state element of a state symbol). it if is (that Using "\" So if should this before followed end-state first "? " state is "? " takes there an its "? " is to be used as the be-preceded by a "\". by is an errorspecial first DESCRIPTION OF THE STATEMENTS All the valid of that is the address expected next The error if recovery the elements element valid the we expect state particular of in an SL program statements have the current source same layout. symbol and the "next state of another Being to match with state" statement where we can find symbol. part action none of at the end of the in statement element valid each statement elements matched is for error the current source can be a reserved word an integer an identifier string a constant a matrix label a statement label any special action null-symbol I any-symbol A "next one that after symbol. A valid in a state" in statements is an existing statement label: I , ACTIONS are listed The actions been explained here but the majority elsewhere. The chapter Described in chapter 4. Described in chapter 4. is of them have already indicated. USE succ STOP The whole process would stop, 4. see chapter LOOP n See chapter 4: "n" is any state number in the same transition 4; "n" is ariy state number in the same transition matrix. GO n See chapter matrix. i POP, PUSH & SET These three are actions semantic in described actions chapter 5. EMIT and ERROR "EMIT" have They can and "ERROR" outputs on "output-file" outputs any number Arguments parantheses The difference have the same arguments. These two actions and meanings as follows are on "error-file". by seperated arguments of is that commas, in : CSS outputs the current source outputs the content of register pointer to "constant symbol RG "constant string" the outputs string" %n 0 outputs the n'th. the element down the element stack. XINDEX outputs value of in the stack accessed by the current INDEX. %INDEX+n %INDEX-n outputs If the the argument element +/- is followed be symbol would output. , n from that by a full-stop accessed by INDEX. then the actual, For example EMIT CSS the outputs internal code of current source symbol but EMIT CSS. outputs the actual symbol. NL outputs a new line outputs a space outputs the Line number outputs "tab" SP LN TAB SYNTAX OF SL4 SL4 is set of graphs the current and give interface its encoding to KHAR. We present in SL1. its syntax as a SYNTAX GRAPHS Matrix 99 rr f 1 1.. .. W. CD C e. ^üý .ýý OU "- . O ý. ' ý.. ü Ü1a `t7ý 4) .... ýO I1 .a O -E U ow >- 0 ý 0 06- ý. ýß xy 0C cý O ''' *om L. No ýý Nc ", XL E . ij ß3 Ec ý. LVJ T . ý. ý. ý.. "1 ý 22 1 Matrix 98 i. 0 .. c 1 m 0 N O C' r4 . 41 n (0 C1) u C 4) E .... 4) ÖHÜWW c~nwcýnä°cD TTTT Ii "O .. Öý H iy. IY ý .. r 0 wc TýN TT LJ.. ý Matrix 97 I N qp. F"`cýn 0 0 mmý 1 . iý L.i E OU tl.l >G -+JMü3 tý C6LWU ý--_1 C ý Li ý L AQ *1R U ýU .._t U! ri 0 0) m ... U) 0 I ýý ý -o .. ý K--1 (Y. U 1 CY. to H LY 0 IT, ý.. W i Y. W W LiJ U3 ý.. 0. J 0-i aý - 6.24-- ý ý 0 ftý r;*Iýw ii a zA mJ TT Matrices 96 & 95 -ýx-----;. ýý )1C957 "_. ý -ýCSS 3RG t ng r-ýºconsi-sir NL SP LN TAB Integer 0 3Lýý T INDEX T I 6.25 - CODING OF SL4 IN SL1 { 99 98 97 96 } 1; C C 0((>1 a go 18 ); 1(21>21sa1 sa2 a go 16 ); 2(C>3 a go 18 ); 3(22>41sa2 0 go 18 ); 4((>5 O go 18 ); 5(? >17; 23>61sa3 a go 18 ); 6(»7 ägo18); 7(22>81sa4 a go 18 ); 8(, >91sa11; I>11lsa9 sail; ; >131sa9 sa9; ?>171sa9 sa9 sa9 sail a go 18 ); 9(97>101sa10 a go 18 ); 10(1>11lsa11; a go 18 ); ; >131sa9; ?>171sa9 sa9 sail 11(97>121sa1O O go 18 ); 12(; >13; ?>171sa9 sa11 a go 18 ); 13(23>61sa3 a go 18 ); 140>15 a go 18 ); 15(; >3; 3>16 @ go 18 ); 16()>exitisa1 sa6; ; >l 0 go 18 ); 17(98>141sa1O a go 18 ); ); 22( go 4 or 23>22 go 7 or )J go 16 ) 18( 0 find J; loop>1lsa7; C 0(stop>exitjsa7; get>Olsa7; unget>Olsa7; find>21sa7 a exit ); go>1lsa7; use>1lsa7; 1(22>exitisa8 a exit ); 2(23>31sa7 @ exit ); 3(>2; go>41sa7 @ exit ); 4(22>5Jsa8 a exit ); 25>exit 0 exit ) 5(or>21sa7; ]; 1 O(eMit>1Isa7; error>1lsa7; push>21sa7; set>21sa7; pop>O sa7; 27>OIsa7; flush>OIsa7; search>41sa7; check>41sa7; 25>exit a exit ); 1(96>0 a exit ); 2(%>31sa7; 26>Olsa7 a exit); 3(26>Olsa7 @ exit); 4( %>51sa7; 26>61sa7 @ exit ); 5( 26>61sa7 a exit); 6( (>7 a exit); 7( 97>8 a exit); 8( ; >91 sa7 a exit);. 9( 97>10 exit); a 10( ; >11Isa7 a exit); 11( )>0 a exit) 0( (>lisa7 a exit); l( %>21sa7; css>5jsa7; rg>51sa7; 20>51sa7; 26>61sa7 a exit); 2( index>31sa7; 26>51sa7 a exit); 3(+>41sa7; ->41sa7; 25>5 0 exit ); 4( 26>51sa7 a exit ); 5(. >6lsa7; 25>6 0 exit); 6(, >llsa7; )>exitlsa7 @ exit) CHAPTER 7: IMPLEMENTATION Two versions an in ICL1904S C language, of a derivative and the system of number the involved write in use in designed for The main aim of the the intended working version in system, relies particular, This was on in the written on many the file the enables KHAR, see below, use of large be handled to user. the present and to research, implementation is therefore It the that development and the second command macros. commands for-the must be emphasised It This UNIX operating to The first on a DEC PDP11/40 second BCPL. of ability files by providing passes of were developed. KHAR system PASCAL and the features the the of of different is the the enable a required. from the final form prototype UNIX environment. study primitives in several environment the explo4t to is version of required the for KHAR its ways. OUTLINE OF PROTOTYPE First, all language and the into a simulated would be floppy discs. Second, etc., the information all on possible are available in the file is read from the serial indexed retained held random file backing KHAR actions, in each pass. error information This in a stand-alone The discussion - 7.1 - about a program f4Les of the UNIX filestore organisation. store store system using recovery, semantic, concluding chapter 5 showed that a few passes, only access these files Third, the statements, the set a full to give enabled be of to trace of trace great of in value errors. The implementation Only possible. made of local require that that full a has proved This machine. to needed errors. contains can be selectively the of code generation, when reporting KHAR machine present which behaviour tracing except principally little has one recursive procedure Thus the basic variables. language a procedural is overhead to attempted imposed with be call simple-minded as is made, and no use is does code of the machine recursion the and as be used. machine not This means is easily transportable. The main components of the KHAR machine are the algorithm of which has been described, which uses the case statement when an action is called segment of code. Actions the user via for. Each action are easily the SL languages. and the action of C to parse the the recogniser, linear is implemented interpreter, action code as a separate added and can be made available to I (TT) and Action Table Transition These tables code for emit the system of the and emitting revolves smallest language section as atone it. in the the figure an important we are code our create These two tables In elements. of these and to able the of part are made for tables languages. of checking and are following tables. of TT and consists by hand, array Suppose we have m matrices parts these succeeding dimensional is semantic Once SL series by KHAR to parse This checking, the structure we give Structure language. syntax about interpreted are appropriate for them automatically arranged information all (AT) Table we have in the pointers of a number of states, on I`t`page'-,, -7.7 number of "valid end of part in AT. a element state beginning all with is part"s a language symbol, valid 2) a pointer at this state element matched the current followed part TT consists then of each part. In one state, by a "0" to to a syntax in TT has four of m Each part the same structure. by a pointer element 1) to another to we have enlarged and followed Each valid language, in which a indicate error the recovery elements point. of the same table, the r for use if this source symbol. 3) a pointer to AT for semantic actions. 4) a pointer to AT for code emitting Although the system includes actions. these in the table - 7.3 - whether they are in a "sub pass" or not the structure required further space reduction be subdivided recovery part of a few states in a transition is the same. In that case instead of repeating that we put it in one state at the of end of each state calling it "Error state" elements, the "error has no valid a "0", if were needed. Sometimes the error matrix could state" and refer element and in which can not match anything, TT at the the matrix, from the other states. end to it part it consists and a pointer of to AT. two l A State of "TT" "TT" matrix numb4r ix matr, number state 0 state I 9 matr ix number state n a 0 0 stat® 0 state 1 of "AT" Structure The structure number of "action of the end of TT to the terminates shows this. that start on encountering is simpler than "TT". Each has some actions parts". indicate are "AT" action of the "0". parts. The to Execution diagram of 'a by a "0" to consists followed The pointers part. these It of on this table an action the next from part page II The Structure one of "AT" state "TTu of valid element 11AT« 4. CICAion SI n©xE sLata i 0 0 2 0 0 valid eIemenI next sLote ioný acL 0 0 FILES & PROGRAMSUSED In this part the programs and the files we explain we use in the system. A listing complete additional FILES used LISTINGS material, in In this the the of source programs system we use 26 files 1) *codename-file (code name file) 2) *cnindex-file (code name index 3) *define-file 4) *cndefine-file 5) nocomment ('all 6) const-file (constant 7) nostring 8) cs-file 9) csindex-file as follows file) file) (code name define (all file) comment removed) file) strings (constant encoded) file) string (constant index string file) 10) intgr-file 11) noidentifier 12) id-file 13) notriplechar (all triple characters encoded) 14) nodoublechar (all double characters encoded) 15) code-file 16) action-rw-file 17) *define-rw (integer file) (all identifiers ( identifier encoded) file) (code file) (action ( define given OF THE KHAR TRANSLATOR SYSTEM. system (define is reserved reserved word file) word) in the (transition file) 18) *ttable-file 19) *ttdefine-file (transition table define file) 20) *ttaction-file (transition table action file) 21) *ttcsindex-file 22) *ttcs-file output-file 24) error-file 25) lang-sym 26) any (text) this file its basic We refer files those (transition (transition 23) (language in constant for text to the above files index string file) file) string which we wish to convert corresponding by their are temporary 16, action-rw-file, semantic these'are 1 numbers, 26. to Only are permanent to the files. consists the reserved of all and code-emitting reserved all codes. by an '*' names are preceded whose That is, constant symbols) symbols to their syntax-error, hand. table table is any input system and the others File table used and is made by actions, words used in our words language, small SL. File made by hand and it 25, is also used in the language we wish to implement our small others explained language. are Apart made by in the next from these executing section. consists plus two programs all files in of all the symbols the symbols used in (16 the and system, 25), the which are in PROGRAMSused There programs in used files the tasks, these to refer system. 14 programs are their explain the the they as AS, files use and the ... this they section we create. We N. , A program This program reads the "language file symbols", 25, and makes files three 1) codename-file (1) 2) cnindex-file (2) (these 3) In system. two files third base-code. 3) (3) define-filet On this in chapter are explained We append of the first the lengths one we write more two files information to the terminal symbols of a this and the in the other file programs. B program 6 we saw that In chapter into be categorized different classes. language This program reads files can 1 and 2 then appends some information to file 3, such as the number of items in of that class each those class, classes the position of objects are negative numbers Also this program creates "base-code" are which for placed file 4 not in file present in the the number of items on which language, in that "code-name-table and "number of code names" are written. is needed in program N. 2 and so on. For This class. length", information C program files This program using of language. the It then reads source the remaining outputs 1 and 2 finds on file text the -same files. delimiters of the language, and replace Their them by their and 3000 3999) have to code integers But strings. data" there integers simultaneously along are files Also it like integers This along "language write string input the text is put in string codes (which integers are numbers used in the text with coding we constant or "transition symbols" used are codes themselves, and we do is why we have two similar programs. with as they are and only constant strings codes the constant while program strings. F This program reads file defined constant and the actual and the constant in strings The actual in which the integers Program D codes all to find between these not wish to code them again. program constant codes. between is task find To distinguish They both read similiar. main a dictionary. E leaves comment and all 5. These two programs are very table deletes text, D and E programs also out the comment delimiters identifiers puts all by their user defined 7, replaces all standard codes and outputs identifiers on file the result 12. and user on file 11. names G program check files This program will special character characters), outputs all in the language " triple-character on file the result 2 (symbols symbols exist replacing 1 and to from made if and special if see three it so any special 11, file reads by their symbols triple- codes and 13. r program H This program codes double-character way as program program for G did creates program j file 15. There is a file syntax-error arrange for This this to be included action file these in looking those semantic file file we have "action-rw-file" reads is to two files M. text and as follows: using an code used file, suppose file in in our used C through some "constant example, are and numbered programs appropriate ' actions "action-rw-file" and writes As which words and code emitting be coded the integers. have we only reserved called and its program file On this in left single-characters of all recovery, language. program same I finally We the triples. This program codes all small symbols in special I. 15. 16. Now This definitions" we have our nl and its sp code file appropriate 132 the output of this #define nl 132 #define sp 135 udetine rg 139 which are constant rg as 139 135 would be as follows program definitions in the programming language "C". valid program The transition by program automatically series which transition program actual is of all N, except by made for table this code file, reads this structure program tables usable hand. the for languages the Once language is done (SLO) in table it on we in file file 18 are made in the SL have the 15. This with the N. L This program is used to make the action Language language smallest this and outputs by program we implement in SL series as in program K. table of our smallest M program This program reads file file 6 and amends some constant definitions on 19. program N This program has 4 modes ýhecking 1) syntax 2) semantic 3) code emitting 4) transition checking table making. In mode 1 the program reads source into table, integer numbers), checks for syntax from are neither syntax "code-file", (which is and using now converted the transition validity. We use mode 2 when there there text are no syntax nor semantic errors. errors and mode 3 when PROCESSES USED IN THE SYSTEM system we have In this sequence of the following each processes using a time a programs. Process Executing programs A and Q. This change is made to "language Process 2, 1) code all and all is each used symbols". Coding As we explained I; process symbols: earlier, in this C reads the input there case we execute text a file are two ways of coding programs C, D, and I terminates having with F, G, file H 15, codes. 2) Code all program symbols but E instead of integers. We do the same as above but we use D. Process 3 Code "action-rw-file" then This Small execute process program is Language. Process 4 using J. used after File process 17 is 2 (do not made at any change. in the "language code end of integers), this symbols" and process. or our Code transition terminates process exactly the SLO and then table of with having transition as those same structure program execute table of K. SLO made with by program made automatically This N. Process 5 'Code creates process usable "action in program table the N. data" "action of SLO and then table" for execute SLO, with the program final L. This structure DISCUSSION & CONCLUSIONS CHAPTER 8: We first discuss the research size of the system. as for how the multipass KHAR, which language designer intended of this in actions on a simple We reconsider might approach of semantic as operations turn gives be extended it to a simple both extensibility interface and flexible for the to limited of its together adopted, within the syntax the definition range of flexibility languages such the with of a language, mechanism leads stack approach light the leads and for the is discussed. The multipass potential in of and the overall storage, approach The need to have a clear portability. location the use of working regards We then consider structure to which KHAR meets the objectives the extent of us to graphical defined consider the language semantics. languages for and suggest as PASCAL, or which that into the KHAR was approach fields of KNAR to the other application. t:e conclude compilation of by considering languages for the appl4cation microprocessors. of SIZE KHAR implementation The present code, bytes 2000 with the KHAR machine itself. for be kept on secondary code and for data which can including PL/0, and may be regarded bytes, of as fixed the tables 6000 bytes about less KHAR uses PL/0 and code of for free its half than be could program work space & EXTENSIBILITY mechanisms PL/1, stack other consist of four stack take account of type within which based on knowing within handled in the pass. one of which and over 50 semantic higher Further, the also as a used The mechanisms actions introduces which path to take. type or "do not care" severe defined than KHAR. Cordy has to and KHAR the a to behave modified his mechanisms, choose to "care" table, SP/6, in the symbol table. This is an order structures. CCORDY] for mechanisms, structures the the choice in a symbol as those for actions, proposed of consist of the same class decisions for PL/0. and three choice ROM for of since entries being is used for compiler working leave RAM would of The semantic subset a bytes dictionaries, compiling SIMPLICITY that 24k using KHAR. 8k bytes reduce 7000 about are required storage memory. The encoded graphs we estimate implemented stack, working of colnstants. Thus, text of this of 16000 bytes requires and The remainder require generation, read-only in constants, Only 4000 bytes storage. working of bytes 12000 about occupies semantic We avoid being semantic handled. We about the type of object outcome only affects the taken, action semantic form of those are a reduced The effect into successfully Thus of features, say, introduction act propagation within or actions could this extension. The most introduction the required for of block of six the stack, of the were added addition SL recompiled and the is, to of was in about so lines redefined within on the and its this and time. use eight- of code to the capable The change stack, SL and of of attribute of a required and the working hours, program careful operations PL/O of (or the are These changes code generating generating actions. translator KHAR. basic other the the graphs and a set of SEARCH and in new reouire of type-checking. capable MARK, FLUSH, SCOPE, KHAR 60 or for a machine defined INDEX register AML/1 a machine language. structured that need KHAR from a CFG to with new operations with of change a language a typed, scope syntax between changes with However, the existing using adopted. the introduction stack. showed that be defined significant the on semantic consideration can complexity not into actions the graphs a language does we considered operate of the problem to handle semantic in stage reconsideration for COMPLEX, expressions, to primitive graphs the approach multipass a translator as into KHAR. This of the of type, at one Thus our graph. complexity structure new mechanisms For example, new introduce to an additional of the in CCORDY]. because KHAR to through path internal the be handled use is this of than rather the not the code for addition index ability to CHECK, together These features requiring the KHAR. The syntax table builder) PORTABILITY We have implemented stack mechanism return only. coding of KHAR as it capibility a The only secondary in other storage, The final KHAR are machine simple be able should to hardware capable requirement would be redefined subroutine entry and eight. Thus the a minimal all support from the integer one-dimensional architecture support with limited indexing KHAR machine. the requirement would be inexpensive of random access from KHAR. for to generate would be a version portability in a language supported written the below demands only stands structures Thus arrays. is used that so variables, of subroutines. nesting The data global C Language is used for the of The depth of nesting for hardware KHAR using by itself. code for of The code generation KHAP. passes the new machine and the system recompiled. CLEAR & FLEXICLE INTERFACE The interface only simple a knowledge of the syntax requires graphical and its of SL and the semantics encoding of the KHAR mechanisms. The interface used to KHAR is essentially to itself implement implementations. is KHAR, completely and Independent thus remains the language invariant across of I The flexibility semantic the the mastered, semantic means of a ready its on the system Cordy and the language the Once these are requires produced, as claimed the exact semantics claims the in of use of CCORDY], become language the semantic to to graphs be this. achieving means of in skill designer. the of nature minimal designer, the the of communicating to other superior part graphs Indeed, users. to available mechanisms use of interface, the of I OF LANGUAGE SEMANTICS DEFINITION The reliance defined on PASCAL to an LL(1) using definition which semantics and code the latter details of "generate" writing affix grammar in if has be to is is it that We consider that the the hard to Also, although of a the procedure of the compiler see what the semantic user the semantics imposed by the multipass designer effects of implementation relatively and user to select approach, an aspect make it of the refinement possible of for and isolate of semantics precisely. Further, set semantics nature graphic in KHAR, and the individual of a use by a textual PASCAL. together, the essentially presentation its language a is. meaning both understands by supplied of [BOCHf1AN] produces considered concealed are semantics one are generation The result system. the be read can only which express KI4AR implements SL "programs" is immediately inefficient but a language once it and translated available. it has been expressed so that The translator is available as a standard a as definitive may well be by which to judge for compilers other language. the POTENTIAL FOR DEVELOPflENT We feel that is there considerable based on KHAR. Our experience a wide range with of PASCAL, but than rather the We suggest operations more abstract 0 that in change KHAR makes was intended approximately the on to deal subset-s of architecture, machine PASCAL. KHAR can be used of operations be adapted to tackle KHAR could of user defined checking in PASCAL, and to handle types, scalar the evaluation the of which has been avoided constant of problem expressions at time. compile A possible building) derived KHAR, which of at compile from is to use the former the to approach actions construct well Languages, development such languages. to translate strict low-level with a minor The system and small simple relatively shows that possible. application for potential time an additional at all type checking pass in the program. the type declarations beyond the original available are times, or to passes is This extension but of the work objectives (table- special the possibility has been noted. The evaluation simple extension stack mechanism to operations similar to that of of constant KHAR if would only need KHAR and the suggested expressions for to integer be at arithmetic extended corresponding checking compile were by operators the type time of adding to would be a The allowed. arithmetic SL. A technique expressions shold be sufficient. APPLICABILITY TO PROGRAMMINGLANGUAGES FOR MICROPROCESSORS language The problem of briefly the provide design goals user with him with provide high-level, the doing him by for The attained. the features access to all language two must of the machine yet which can be given more than of the user which language by defining lost by a modern, architectures designer for We to an appropriate If of the machine, check semantic the the semantics which static given semantics that passes, useful he succeeds clues the about think as separate that language the make the assembler programming. code generation the the definitions to one microprocessor. have will of he is is designer the peculiarities machine the program aim of separate different allow then the introduce with been In summary, we may say that be to the protection all programming this, machine have has microprocessors language. Yet another for for in the introduction. outlined conflicting design to the ability to appropriate to associated exists of in in closely KHAR, would semantics of the pass. I 0 I CONCLUSIONS This technology resulted both will work to achieve a began highly in a system which, although in research be Languages for provided KHAR which on for such languages I useful low-level the reconsidering not fully into extended which has a low read/write provide system in this design the because of and will compiler translator multipass programming, designer, by the work, of high-level clear a translator storage has requirement. interface system for I , REFERENCES taMMAra] "The Amman, U., to the Method Development of 1973, Symposium I A. 1974), Norti-Holland, Structured of Programming a Compiler", International Gunther at, et Applied Computing (Amsterdam: eds., pp93-99. CBAUERed] F. L., & Eickel, J., Advanced Course, 2nd Sauer, An Compiler eds., ed., (Berlin: Construction, Springer-Verlag, 1976). CßOCHNIANN] Boch.rnann, G. V., & Ward, P., "Compiler Grammers", Comp J vol Attribute Writing System for 21 no 2 pp144-148 CQROWNa7 Brown, P. A. "The N.L/1 Macro Comm ACM vol Processor", 10 Oct 1977, pp618-623. [©ROUNb] Brown, "Macro P. J., Implementation", Processors Comp. J. vol Software and 12, Nov 1969, pp327-331. CCORDY] Cordy, J. R., Language of Toronto, "A Semantics", Computer Diagrammatic Technical Systems - references. 1 - Approach Report Research Programming to CSRG-67, Group, (University 1976). CGLENNIE] "On the Glennie, A., of Universal a (AD-24o512), Syntax Machine Compiler", (Carnegie-Mellon the and Construction 2 No Report Technical 1960). University, CGRIESJ Gries, Compiler D., (New York: for Construction Digital Computers, 1971). Wiley [HOkRE] Hoare, "Hints C. A. R., Symposium Language on Programming on principles of Design", languages, programming ACM (Boston 1973). CJAIIESJ . James, "A Syntax L. R., Technical Report CSRG-13, Systems Research Jenkins, D. G., private communication, Group, Recovery Error Directed (University of Toronto, Method", Computer 1976). CJENlCINS] of Glasgow, "A Microprocessor (Dept. Language: of Computing version Science, 1", Univ. 1976). CJENSEN] Jensen, Lecture & Wirth, K., Notes Springer-Verlag in N., PASCAL User Manual and Computer Science, vol 18, Report, (Berlin: 1974). CKERNIGHAN7 13.14., & Plauger, P. J., (Reading, tMlass.: Addison-Wiley, 1976). Kernighan, - references. 2 - Software Tools, CLECARPIE] " Lecarme, & 0., Bochmann, User's System, de Montreal, Univ. "A G. V., Manual", (Departement Dec 1974, revised Compiler Writing d'Informatique, 1975). July CPASKOJ for "Pseudo-Machine Pasko, H. J., Report CSRG-30, Univ. of Code Generation", Tech. 1973. Toronto, [PIERCE] Pierce, "Source R. H., Comp J vol computer", language debugging on a small 17 no 4 pp313-317 1974 CPUGJ Pascal Users Notes, Group, DEC Pascal News, nos 9& PDP-11 (ESI), Implementation 10, p83, Sept 1977 CTANEN©AUMa] Tanenbaum, Poor Man's Software "A General A. S., Purpose Compiler-Compiler", Engineering, SE-2, vol PSacroprocessor IEEE 2, June as a Transactions 1976, on ppl2l-125. ETANENBAU"b] Tanenbaum, for A. S., Machine "Implications Archtecture", Structured of CACM vol Programming 21 no 3 pp237-246,1978 Ck'ATTa] Watt, Thesis, D. A., Univ. "Analysis of Orientated Glasgow, Two Level 1974 Grammars", PhD - CwATTb] Watt, D. A., Informatica, "The Parsing Problem for vol. 8 no 1 ppl-20 Affix Grammars", Acta 1977 I - references. 3 - [WILCOX] Wilcox, T. R., Programming "Generating Languages", Machine PhD for Code Thesis, Nigh-Level Cornell University, 1971. C6lIRTHa] Wirth, N., "The Prr9ramming Informatica, vol Wirth, Algorithms Language Pascal", Acta 1 pp35-63,1971. [w'IRTH] N., (Englewood Cliffs, 14. J.: - references. Data + Structures Prentice-Hall, 4 - = 1976). Programs,