There are two grammars here so far. The first is a lexer segment that lexes perl regular expressions, for Antlr. The second is a Smalltalk-80 grammar (originally written for PCCTS, not Antlr).
Perl Regex
This works as part of an ANTLR lexer. IDENTIFIER, since it starts w/ an ALPHA includes SUBSTITUTION and MATCH in order to resolve lexical ambiguities. This doesn’t deal w/ perl’s /flags as we didn’t allow them in our use of this grammar, but they are easily added without problem.
IDENTIFIER : (SUBSTITUTION)=>SUBSTITUTION #{$setType(SUBSTITUTION);} | (MATCH)=>MATCH #{$setType(MATCH);} | (ALPHA|UNDERSCORE)(ALPHA|DIGIT|UNDERSCORE)+ ;
protected MATCH { char c = ‘\00’; } : // Start the match with the normal ‘m’ ‘m’ #{((c=LA(1)) != ‘\00’)}? REGEX_DELIM INSIDE_REGEX[c] #{(LA(1) == c)}? REGEX_DELIM | ‘/’ INSIDE_REGEX[’/’] ‘/’ ;
protected SUBSTITUTION { char c = ‘\00’; } : ‘s’ #{((c=LA(1)) != ‘\00’)}? REGEX_DELIM INSIDE_REGEX[c] #{(LA(1) == c)}? REGEX_DELIM INSIDE_REGEX[c] #{(LA(1) == c)}? REGEX_DELIM ;
protected INSIDE_REGEX[char m] : (ESC | .) (#{ ( LA(1) != m )}? INSIDE_REGEX[m] )? ;
protected REGEX_DELIM : ~(‘a’ .. ‘z’ | ‘A’ .. ‘Z’ | ‘0’ .. ‘9’ | ‘\00’) ;
Smalltalk-80
This is a quick grammar-only output from a PCCTS version of Smalltalk-80. It should be trivial to convert to Antlr (and I will–sooner or later).
parse : ( classDefinition bang )+ “@” ;
classDefinition : classHeader ( method bang )* | ( staticStatement )+ ;
staticStatement : className ( keyword sharp className keyword stringConstant keyword stringConstant keyword stringConstant keyword ( identifier | stringConstant ) | keyword stringConstant ) #{ “.” } ;
classHeader : bang className #{ identifier } keyword stringConstant bang ;
method : messagePattern #{ temporaries } #{ primitive } statements ;
messagePattern : unarySelector | binarySelector variableName | ( keyword variableName )+ ;
temporaries : verticalBar ( variableName )* verticalBar ;
statements : #{ nonEmptyStatements } ;
nonEmptyStatements : uparrow expression #{ “.” } | expression #{ dot statements } ;
expression : ( variableName assign )? variableName assign expression | simpleExpression ;
simpleExpression : primary #{ messageExpression ( semicolon messageElt )* } ;
messageElt : ( unarySelector | binarySelector unaryObjectDescription | ( keyword binaryObjectDescription )+ ) ;
messageExpression : unaryExpression | binaryExpression | keywordExpression ;
unaryExpression : ( unarySelector )+ #{ binaryExpression | keywordExpression } ;
binaryExpression : ( binarySelector unaryObjectDescription )+ #{ keywordExpression } ;
keywordExpression : ( keyword binaryObjectDescription )+ ;
unaryObjectDescription : primary ( unarySelector )* ;
binaryObjectDescription : primary ( unarySelector )* ( binarySelector unaryObjectDescription )* ;
primary : literal | variableName | block | openParen expression closeParen ;
literal : numberConstant | characterConstant | stringConstant | sharp ( symbol | array ) ;
block : openBracket #{ ( colon variableName )+ verticalBar } statements closeBracket ;
array : openParen ( arrayConstantElt )* closeParen ;
arrayConstantElt : numberConstant | characterConstant | stringConstant | symbol | array ;
symbol : ( identifier | binarySelector | keyword ) ;
unarySelector : identifier ;
binarySelector : binaryOperator | verticalBar ;
type : openParen className closeParen ;
className : identifier ;
variableName : identifier ;
bang : “!” ;
uparrow : “^” ;
dot : “.” ;
assign : “:=|_” ;
semicolon : “;” ;
sharp : “#” ;
colon : “:” ;
openBracket : “[” ;
closeBracket : “]” ;
openParen : “(” ;
closeParen : “)” ;
verticalBar : “|” ;
binaryOperator : “([/<>%\&?,+=\@-\*\~])#{[/<>%\&?,!+=\@|-\*\~]}” ;
keyword : KEYWORD ;
identifier : “[a-zA-Z][a-zA-Z0-9]*” ;
characterConstant : “$~[@\n\r\t\ ]” ;
stringConstant : STRING_LITERAL ;
numberConstant : “[0-9]+” ;
primitive : PRIMITIVE ;