Protected Data consists of all "global" (limited to the scope of the compiler object) data used by the compiler during any phase of compilation.
ObjectPtr myCurrentClass;
myCurrentClass is an ObjectPtr that maintains the current class while parsing. This is set when the compiler parses the line "<ClassName> methodsFor: <String>". myCurrentClass is used to attach successfully compiled methods to their respective methodDictionary.
Protected Services are methods that are used throughout compilation for various tasks. Most of these are accessor methods to get or set any "global" data.
Boolean tokenIs(const String & string, const unsigned int whichToken);
This is a utility method that compares the current token (if whichToken is zero) or the Nth token (based on whichToken) with string and returns a boolean which is true if they are equal.
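The comparison described above can be sketched as follows. This is only an illustration: TokenWindow and its lookahead vector are hypothetical stand-ins for the parser's real token buffer, and the sketch assumes that whichToken is an offset from the current token (0 is the current token, 1 is the next one, and so on).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's lookahead buffer.
struct TokenWindow {
    std::vector<std::string> lookahead;  // lookahead[0] is the current token

    // Assumption: whichToken is an offset from the current token.
    // Asking for a token beyond the buffer simply compares unequal.
    bool tokenIs(const std::string& text, unsigned int whichToken = 0) const {
        if (whichToken >= lookahead.size()) return false;
        return lookahead[whichToken] == text;
    }
};
```

With the tokens "Object", "subclass:" and "#Foo" in the window, tokenIs("subclass:", 1) is true and tokenIs("comment:", 1) is false, which is the kind of check staticStatement needs to pick between its two alternatives.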
ClassPtr currentClass();
This is the "get" accessor to retrieve the currently set class.
void currentClass(const ClassPtr & newClass);
This is the "set" accessor to store the currently set class. This method takes a ClassPtr as its parameter and sets the current class reference directly.
void currentClass(const String & newClassName);
This is the "set" accessor to store the currently set class. This method takes a String as its parameter, looks for the class in the global Smalltalk dictionary, and sets the current class reference to the result of the lookup.
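A minimal sketch of the three currentClass overloads is shown here. Class, ClassPtr, SmalltalkDictionary and CompilerState are simplified stand-ins invented for the example, not the compiler's real types, and leaving the reference null when the name is unknown is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; the real ClassPtr and Smalltalk dictionary differ.
struct Class { std::string name; };
using ClassPtr = std::shared_ptr<Class>;
using SmalltalkDictionary = std::map<std::string, ClassPtr>;

class CompilerState {
public:
    explicit CompilerState(const SmalltalkDictionary& globals) : myGlobals(globals) {}

    // "get" accessor
    ClassPtr currentClass() const { return myCurrentClass; }

    // "set" accessor taking the class reference directly
    void currentClass(const ClassPtr& newClass) { myCurrentClass = newClass; }

    // "set" accessor taking a name that is looked up in the global dictionary
    void currentClass(const std::string& newClassName) {
        auto found = myGlobals.find(newClassName);
        // Assumption: an unknown name leaves a null reference behind.
        myCurrentClass = (found != myGlobals.end()) ? found->second : ClassPtr();
    }

private:
    const SmalltalkDictionary& myGlobals;
    ClassPtr myCurrentClass;
};
```

Overloading on the parameter type keeps one name for the "get" and both "set" forms, which matches the accessor style used by the declarations above.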
void currentCategory(const ObjectPtr & categoryString);
This sets the current class category to the string represented by categoryString.
Public services are methods used by the compiler directly or indirectly. These are mainly used for object construction and destruction.
virtual void init();
This method is defined by PCCTS and is used to customize the initialization of a newly constructed compiler object.
virtual ~ST80Compiler();
This is an override of the destructor. It is used to customize the destruction of a deleted compiler object.
These grammar rules (which will later be transformed automatically by PCCTS into C++ methods) are the high-level grammatical constructs as defined by Smalltalk-80. The first rule, parse, is the entry point for the parser itself.
parse
This is the entry point into the grammar. It consists of one or more "classDefinition" rules followed by a bang ("!"). Nothing is defined to happen in this rule (other than parsing) because everything is handled below.
classDefinition
A class definition is defined as starting with a class header. It is followed by zero or more methods terminated with a bang or followed by one or more statically parsed statements (such as subclassing).
staticStatement
This parses two different static statements. Both statements start with a className, which is parsed and stored in the string variable super. If super == "nil", we simply set superclass to gNil; otherwise, we look super up in the Smalltalk dictionary. If it is not found, we warn the user, but still create a new class with that name.
If the next word following the className is one of the three types of subclassing messages, then we know that we will be parsing a subclassing static statement. The subclass token will be followed by a symbol representing the name of the new subclass to be created. We parse the symbol and store it in a string variable called sub. We check the Smalltalk dictionary for an entry named sub. If class sub already exists, then we simply set the current class to it; otherwise we create the class and then set the current class to it.
Next, we parse the keyword "instanceVariables:" and its respective argument, which is captured in a string. This string is sent to the method builder's method called addInstanceVars(). Next, we parse the keyword "classVariables:" and its respective argument, which is captured in a string. This string is sent to the method builder's method called addClassVars(). Then, we parse the keyword "poolDictionaries:" and its respective argument, which is captured in a string. This string is sent to the method builder's method called addPools(). Now, we parse the keyword "category:" and its respective argument, which is either the word "nil" or an actual string token. If the word is "nil", then we set an objectPointer to gNil; otherwise we set it to an object containing the string parsed. Finally, we set the class's category to the objectPointer.
Otherwise, the next token is "comment:", and we know that the static statement sets the comment of the receiver (a class). We parse the keyword "comment:" and its respective argument, which is captured in an objectPointer. This object is used to set the current class's comment variable directly.
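The class lookups performed by the subclassing statement can be sketched as below. All names here (Class, findOrCreate, resolveSuperclass, defineSubclass) are hypothetical, a null pointer stands in for gNil, and linking the subclass to its superclass inside defineSubclass is an assumption about what the real code does with the two results.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real class objects and Smalltalk dictionary.
struct Class {
    std::string name;
    std::shared_ptr<Class> superclass;  // null plays the role of gNil
};
using ClassPtr = std::shared_ptr<Class>;
using SmalltalkDictionary = std::map<std::string, ClassPtr>;

// Look a class up by name, creating and registering it when it is absent.
ClassPtr findOrCreate(SmalltalkDictionary& globals, const std::string& name,
                      bool warnIfMissing) {
    auto found = globals.find(name);
    if (found != globals.end()) return found->second;
    if (warnIfMissing)
        std::cerr << "warning: class " << name << " not found; creating it\n";
    ClassPtr created = std::make_shared<Class>();
    created->name = name;
    globals[name] = created;
    return created;
}

// "nil" maps to gNil; any other name is looked up (and created with a warning).
ClassPtr resolveSuperclass(SmalltalkDictionary& globals, const std::string& superName) {
    if (superName == "nil") return ClassPtr();
    return findOrCreate(globals, superName, true);
}

// Returns the class that becomes the compiler's current class.
ClassPtr defineSubclass(SmalltalkDictionary& globals, const std::string& superName,
                        const std::string& subName) {
    ClassPtr superclass = resolveSuperclass(globals, superName);
    ClassPtr sub = findOrCreate(globals, subName, false);
    sub->superclass = superclass;  // assumption: the link is (re)established here
    return sub;
}
```

Parsing the same subclassing statement twice returns the existing class the second time instead of creating a duplicate, which is the behavior the text describes for an already defined sub.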
classHeader
Class headers start with a bang (no pun intended), followed by a className, and the optional token "class". Upon parsing the classname, we set the compiler's current class reference to this class. If we happen to also parse the token "class", then we set the current class to the current class' class (i.e. the metaclass). Next, we parse the token "methodsFor:", and a string. Upon parsing the string, we set the compiler's current category to this string. Finally we parse a terminating bang.
method
Methods are the majority of the compiler's parsing efforts. We start by creating a new MethodBuilder object. Next we parse the subrule messagePattern, which returns a string with the method's selector. We use this string to set the method builder's selector, and to notify the user what method we are currently compiling. Next, we optionally parse temporaries. This is followed by parsing statements. Once we are done successfully parsing statements, we tell the method builder to create a new CompiledMethod object, and we add it to the current class' method dictionary.
messagePattern [MethodBuilder & m, String & s]
A message pattern is the method's signature, or declaration. It is either a unary selector, a binary selector, or a keyword selector. In the case of a unary selector, we simply parse it via the "unarySelector" rule. Binary selectors are followed by a variableName, which we parse and then tell the methodBuilder to create an argument by that name. Keyword selectors are only slightly more difficult in that there can be one or more pairs of keywords followed by arguments. While parsing keywords, we must add each keyword to a string to build the actual keyword selector's message pattern (i.e. "a: var1 b: var2" translates to "a:b:").
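The keyword case can be illustrated with a small sketch. MessagePattern and parseKeywordPattern are hypothetical names, and the input is assumed to be already split into whitespace-separated keyword/argument pairs rather than real lexer tokens.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the selector plus the argument names in order.
struct MessagePattern {
    std::string selector;
    std::vector<std::string> arguments;
};

// Builds a keyword selector from whitespace-separated "keyword: argument" pairs.
MessagePattern parseKeywordPattern(const std::string& source) {
    MessagePattern pattern;
    std::istringstream words(source);
    std::string keyword, argument;
    while (words >> keyword >> argument) {
        assert(!keyword.empty() && keyword.back() == ':');  // keywords end in a colon
        pattern.selector += keyword;            // "a:" then "b:" gives "a:b:"
        pattern.arguments.push_back(argument);  // stands in for creating an argument
    }
    return pattern;
}
```

For the input "a: var1 b: var2" this yields the selector "a:b:" and the arguments var1 and var2, in that order.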
temporaries [MethodBuilder & m]
Temporaries are delimited by vertical bars ("|"), so we first parse the token verticalBar. Next, we parse zero or more variable names. Upon parsing a variable name, we tell the method builder to create a temporary by that name.
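As a rough sketch of this rule, the function below collects the names between two vertical bars. parseTemporaries is a hypothetical helper working on whitespace-separated text; pushing a name onto the result vector stands in for telling the method builder to create a temporary.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Collects temporary names from text such as "| a b |".
std::vector<std::string> parseTemporaries(const std::string& source) {
    std::istringstream tokens(source);
    std::string token;
    std::vector<std::string> names;
    if (!(tokens >> token) || token != "|")
        throw std::runtime_error("expected opening vertical bar");
    while (tokens >> token) {
        if (token == "|") return names;  // closing bar ends the declaration
        names.push_back(token);          // one temporary per variable name
    }
    throw std::runtime_error("expected closing vertical bar");
}
```

An empty declaration ("| |") is accepted and produces no temporaries, matching the "zero or more variable names" wording above.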
statements [MethodBuilder & m]
Statements are simply defined as an optional nonEmptyStatements.
nonEmptyStatements
Non-empty statements are either return statements or regular statements. A return statement parses an uparrow followed by an expression and an optional dot (that we ignore). Upon parsing the return statement's expression we tell the method builder to generate a return bytecode. Otherwise, we parse a regular statement. A regular statement parses an expression, followed by an optional dot-statement pair (notice the indirect recursion). All behavior is handled in the subrules.
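The instruction order this rule produces can be sketched as follows. Statement and compileStatements are hypothetical, "eval ..." is a placeholder for whatever bytecodes the expression subrule generates, and the instruction names are illustrative only. The sketch takes an already separated list of statements, so a trailing dot after the last regular statement is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-parsed statement: an expression, optionally preceded by "^".
struct Statement {
    bool isReturn;
    std::string expression;
};

using Bytecodes = std::vector<std::string>;

Bytecodes compileStatements(const std::vector<Statement>& statements) {
    Bytecodes code;
    for (std::size_t i = 0; i < statements.size(); ++i) {
        code.push_back("eval " + statements[i].expression);
        if (statements[i].isReturn) {
            code.push_back("return");  // generated after the return expression
            break;                     // nothing follows a return statement
        }
        // The dot separating two statements discards the previous result.
        if (i + 1 < statements.size()) code.push_back("popStackTop");
    }
    return code;
}
```

For the body "x foo. ^y" this gives: eval x foo, popStackTop, eval y, return.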
expression
An expression is either an assignment expression or a simpleExpression. An assignment expression parses a variableName, assign, and then an expression (notice direct recursion). Upon parsing (and returning from recursion), we tell the method builder to generate the appropriate variable store instructions.
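The effect of generating the store only after returning from the recursion can be seen in a small sketch. parseExpression here is a hypothetical free function over pre-split tokens, ":=" is assumed as the spelling of the assign token, and "push"/"store" are illustrative instruction names; a single token stands in for a full simpleExpression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// tokens: for example {"x", ":=", "y", ":=", "3"}
void parseExpression(const std::vector<std::string>& tokens, std::size_t pos,
                     std::vector<std::string>& code) {
    if (pos + 1 < tokens.size() && tokens[pos + 1] == ":=") {
        parseExpression(tokens, pos + 2, code);  // compile the right-hand side first
        code.push_back("store " + tokens[pos]);  // store on the way back out
    } else {
        assert(pos < tokens.size());
        code.push_back("push " + tokens[pos]);   // stand-in for simpleExpression
    }
}
```

For "x := y := 3" the generated order is push 3, store y, store x: the innermost value is computed first and each enclosing assignment stores it as the recursion unwinds.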
simpleExpression
This is the meat of all statements. A simple expression is defined as starting with a primary, which returns an objectPointer referring to the data parsed. Next, the primary may be optionally followed by a messageExpression and zero or more semicolon-messageElt pairs. All behavior is handled by the subrules.
messageElt
A messageElt (still not entirely sure what an "Elt" is, but we think it means "element") is the statement part of a cascaded message. It is, like all messages, either a unary, binary, or keyword message.
A unary message just parses a unarySelector, and is done.
A binary message parses a binarySelector, followed by a unaryObjectDescription.
A keyword message must do some extra work. It parses one or more keyword-binaryObjectDescription pairs, and for each pair must append the current keyword to a string representing the selector, as well as count the number of arguments (as data for the message send instruction).
Finally, once a message is properly parsed, the method builder is told to generate an appropriate message send.
messageExpression
A messageExpression simply delegates parsing to one of unaryExpression, binaryExpression, or keywordExpression.
unaryExpression
A unaryExpression is one or more unarySelectors. Upon parsing a unarySelector, the methodBuilder is told to send the appropriate send instruction. An optional binary or keyword expression may follow the unarySelectors.
binaryExpression
A binaryExpression is composed of one or more pairs of binarySelectors and unaryObjectDescriptions. Upon parsing a pair, the method builder is told to generate the appropriate send instruction. An optional keywordExpression may follow the group of pairs.
keywordExpression
A keywordExpression is composed of one or more pairs of keywords and binaryObjectDescriptions. Upon parsing each keyword, the current keyword is appended to the string representing the current selector and the argument count is incremented. Upon parsing each pair, the methodbuilder is told to generate the appropriate send instruction.
unaryObjectDescription
A unaryObjectDescription is basically the meat of an argument in a message send. It is defined by parsing a primary, followed by zero or more unary selectors (a message within a message argument). Upon parsing a unary selector the method builder is told to generate the appropriate send instructions.
binaryObjectDescription
A binaryObjectDescription is defined as a primary, followed by zero or more unarySelectors, followed by zero or more pairs of binarySelectors and unaryObjectDescriptions. Upon parsing both groups, methodbuilder is told to generate the appropriate send bytecodes.
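Taken together, the expression and object-description rules above encode Smalltalk's message precedence: unary sends bind tightest, then binary, then keyword. The sketch below mirrors that layering in a hand-written recursive descent over pre-split tokens. SendOrderSketch, its token classification, and the "push"/"send" instruction names are all assumptions made for the example; only identifiers and numbers are accepted as primaries, and the sketch emits a single send for a whole keyword message once all of its pairs have been parsed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class SendOrderSketch {
public:
    explicit SendOrderSketch(std::vector<std::string> tokens)
        : myTokens(std::move(tokens)) {}

    // simpleExpression: primary, then unary, binary and keyword parts in order
    std::vector<std::string> compile() {
        primary();
        unarySends();
        binarySends();
        keywordSend();
        return myCode;
    }

private:
    bool atEnd() const { return myPos >= myTokens.size(); }
    bool atKeyword() const { return !atEnd() && myTokens[myPos].back() == ':'; }
    bool atBinary() const {
        return !atEnd() &&
               std::string("+-*/<>=~,@%&?|").find(myTokens[myPos][0]) != std::string::npos;
    }
    bool atUnary() const {
        return !atEnd() && !atKeyword() &&
               std::isalpha(static_cast<unsigned char>(myTokens[myPos][0]));
    }

    void primary() {
        assert(!atEnd());
        myCode.push_back("push " + myTokens[myPos++]);
    }
    void unarySends() {
        while (atUnary()) myCode.push_back("send " + myTokens[myPos++]);
    }
    void unaryObjectDescription() { primary(); unarySends(); }
    void binarySends() {
        while (atBinary()) {
            std::string selector = myTokens[myPos++];
            unaryObjectDescription();            // argument may carry unary sends
            myCode.push_back("send " + selector);
        }
    }
    void binaryObjectDescription() { primary(); unarySends(); binarySends(); }
    void keywordSend() {
        std::string selector;
        while (atKeyword()) {
            selector += myTokens[myPos++];       // accumulate "a:" + "b:" ...
            binaryObjectDescription();           // argument may carry unary and binary sends
        }
        if (!selector.empty()) myCode.push_back("send " + selector);
    }

    std::vector<std::string> myTokens;
    std::size_t myPos = 0;
    std::vector<std::string> myCode;
};
```

For the tokens of "3 factorial + 4 sqrt max: 10" the emitted order is push 3, send factorial, push 4, send sqrt, send +, push 10, send max:. Each argument is fully evaluated, including its own tighter-binding sends, before the send that consumes it.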
primary
Primary holds the place of the receiver in a simple expression. It is parsed as either literal, variableName, block, or "(" expression ")". Upon parsing variableName, the methodbuilder is told to push. Upon parsing literal, the methodbuilder is told to pushLiteral. All behavior for parsing the remaining rules is handled by the subrules.
literal
Literal is a constant in a method. It is parsed as either numberConstant, characterConstant, stringConstant, or sharp followed by symbol or array. All behavior is handled by the subrules.
block
Block is parsed as "[", optionally one or more ":" variableName pairs followed by "|", statements and "]". Upon parsing a variableName the methodbuilder is told to createArgument; this is repeated until a verticalBar is parsed. Statements are parsed and handled by the subrule. Upon parsing a closeBracket the methodbuilder is told to appendBlock.
array
An array is a sequence of literals enclosed in parentheses. One is parsed as "(", arrayConstantElt, and ")" (notice the indirect recursion). Here the actual array object is created. Upon parsing an arrayConstantElt, the array is told to add the element. After adding all elements, the object is returned.
arrayConstantElt
An arrayConstantElt is one of numberConstant, characterConstant, stringConstant, symbol, or array (notice the indirect recursion). All behavior is handled by subrules. The object received from the subrule is returned.
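The indirect recursion between array and arrayConstantElt can be sketched with two mutually recursive functions. Literal, parseArray and parseArrayConstantElt are hypothetical, the tokens are assumed to be pre-split, every non-array element is kept as plain text rather than a real object, and the sketch assumes an array holds zero or more elements.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical literal node: either a plain element or a nested array.
struct Literal {
    std::string text;               // used when isArray is false
    std::vector<Literal> elements;  // used when isArray is true
    bool isArray = false;
};

Literal parseArray(const std::vector<std::string>& tokens, std::size_t& pos);

// arrayConstantElt: a nested array, or any other single-token constant
Literal parseArrayConstantElt(const std::vector<std::string>& tokens, std::size_t& pos) {
    if (tokens.at(pos) == "(") return parseArray(tokens, pos);  // back into array
    Literal atom;
    atom.text = tokens.at(pos++);
    return atom;
}

// array: "(" elements ")"; at() throws std::out_of_range if ")" is missing
Literal parseArray(const std::vector<std::string>& tokens, std::size_t& pos) {
    if (tokens.at(pos) != "(") throw std::runtime_error("expected (");
    ++pos;
    Literal array;
    array.isArray = true;
    while (tokens.at(pos) != ")")
        array.elements.push_back(parseArrayConstantElt(tokens, pos));  // add element
    ++pos;  // consume ")"
    return array;
}
```

Parsing the tokens of "(1 $a (2 3) foo)" yields an array of four elements whose third element is itself an array of two elements.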
symbol
A symbol is parsed as either identifier, binarySelector, or one or more keywords. Upon parsing, an actual symbol object is created and returned.
unarySelector
A unarySelector is the selector for a message without arguments. It is parsed as identifier. After parsing, it returns the string containing the identifier.
binarySelector
A binarySelector is the selector for a non-keyword message with one argument. It is parsed as either binaryOperator or verticalBar. After parsing, it returns the string containing the binary selector.
type
Type is a classname of an object for specifying type. It is parsed as "(", className, and ")". It is ignored for now.
className
A classname is parsed as identifier. The string containing the identifier is returned.
variableName
A variableName is parsed as identifier. The string containing the identifier is returned.
dot
Dot parses the low level literal ".", and tells the method builder to generate the popStackTop instruction.
semicolon
Semicolon parses the low level literal ";", tells the method builder to generate the popStackTop instructions, and then regenerates the bytecodes to handle the primary again. This last instruction is mainly needed due to a deficiency translating from LR to LL parsing (in LR, the parser knows exactly how many semicolons have been parsed by the time it gets to the primary and simply duplicates the stacktop that many times).
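The cascade handling described above can be sketched as follows. compileCascade is a hypothetical helper, and the instruction strings are illustrative rather than real bytecodes. Note that re-running the primary's instructions only matches a true cascade when evaluating the primary has no side effects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Bytecodes = std::vector<std::string>;

// primaryCode: instructions that evaluate the receiver.
// messages: one instruction list per cascaded message (arguments plus send).
Bytecodes compileCascade(const Bytecodes& primaryCode,
                         const std::vector<Bytecodes>& messages) {
    Bytecodes code = primaryCode;
    for (std::size_t i = 0; i < messages.size(); ++i) {
        if (i > 0) {
            code.push_back("popStackTop");  // semicolon: drop the previous result
            // Regenerate the primary so the next message has a receiver.
            code.insert(code.end(), primaryCode.begin(), primaryCode.end());
        }
        code.insert(code.end(), messages[i].begin(), messages[i].end());
    }
    return code;
}
```

For "Transcript show: 'a'; cr" this produces push Transcript, push 'a', send show:, popStackTop, push Transcript, send cr. An LR-style compiler could instead duplicate the receiver on the stack once per semicolon, as the text notes.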
binaryOperator
binaryOperator parses any of the low level tokens for a binary selector (see Goldberg for exact combination), and returns that string.
keyword
keyword parses the low level token for a keyword and returns that string.
identifier
identifier parses the low level token for an identifier and returns that string.
characterConstant
characterConstant parses the low level token for a character ("$"<char>) and returns that string and a character object pointer.
stringConstant
stringConstant parses the low level token for a string and returns that string and a string object pointer.
numberConstant
numberConstant parses the low level token for a number and returns that string and an integer object pointer. (Floats are not handled at this time.)
primitive[SInteger & number]
primitive parses the low level token for a primitive number (a signed integer) and returns that SInteger.