BYACC/J

Java extension

v 1.15

Project  Download  Changes


What is it?

BYACC/J is an extension of the Berkeley v 1.8 YACC-compatible parser generator. Standard YACC takes a YACC source file, and generates one or more C files from it, which if compiled properly, will produce a LALR-grammar parser. This is useful for expression parsing, interactive command parsing, and file reading. Many megabytes of YACC code have been written over the years.

 This is the standard YACC tool that is in use every day to produce C/C++ parsers. I have added a "-J" flag which will cause BYACC to generate Java source code, instead. So there finally is a YACC for Java now!


How does BYACC/J compare to other parser generators?

Of course, the original YACC design is about twenty years old now, and newer and better technologies are currently available. I think Jacc is great, and so is Java Cup. Both of these provide more thorough parsing of LALR and LL grammars than the venerable YACC. Yet the idea of a YACC for Java is, in my opinion, extremely valuable.
Several benefits are derived from a Java parser-generator of this sort:


How do I use it?

First of all, read a YACC book. Since this is actual Berkeley YACC, all of the usual procedures using YACC apply here. Some good descriptions and tutorials of YACC grammar and procedures are available in book form and on the Net. Here is the standard YACC manual page. Since Java's syntax is different from C, certain format differences must be followed. A typical YACC source file consists of the following:

  1. The first part of the file is the DECLARATIONS area, where you define tokens, precedences, etc.

  2. The second part is the ACTIONS area, where the grammar and the user's C actions are parsed.

  3. The third part is the CODE area, where user functions are added.

They are separated by '%%' at the start of a line. Visually:
DECLARATIONS
%%
ACTIONS
%%
CODE

Portions of the file can be set off and ignored by YACC by surrounding them with %{ and %} . This ability is typically used in YACC C files to insert definitions and #include statements. BYACC/J uses this area for the Java package and import statements. Everything after this will be wrapped in a Java class called Parser. Thus all functions you write will become methods belonging to Parser.
All of the user actions you write will be Java code inserted into a method called yyparse(). This means that only curly braces '{,}' are allowed. No classes or methods can be defined in the ACTIONS area. Of course, your code can instantiate classes and call methods.

Here is our example Java implementation of the classic YACC calculator demo:

%{
import java.lang.Math;
import java.io.*;
import java.util.StringTokenizer;
%}

/* YACC Declarations */
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG /* negation--unary minus */
%right '^' /* exponentiation */

/* Grammar follows */
%%
input: /* empty string */
 | input line
 ;

line: '\n'
 | exp '\n' { System.out.println(" " + $1.dval + " "); }
 ;

exp: NUM { $$ = $1; }
 | exp '+' exp { $$ = new ParserVal($1.dval + $3.dval); }
 | exp '-' exp { $$ = new ParserVal($1.dval - $3.dval); }
 | exp '*' exp { $$ = new ParserVal($1.dval * $3.dval); }
 | exp '/' exp { $$ = new ParserVal($1.dval / $3.dval); }
 | '-' exp %prec NEG { $$ = new ParserVal(-$2.dval); }
 | exp '^' exp { $$ = new ParserVal(Math.pow($1.dval, $3.dval)); }
 | '(' exp ')' { $$ = $2; }
 ;
%%

String ins;
StringTokenizer st;

void yyerror(String s)
{
 System.out.println("par:"+s);
}

boolean newline;
int yylex()
{
String s;
int tok;
Double d;
 //System.out.print("yylex ");
 if (!st.hasMoreTokens())
 if (!newline)
 {
 newline=true;
 return '\n'; //So we look like classic YACC example
 }
 else
 return 0;
 s = st.nextToken();
 //System.out.println("tok:"+s);
 try
 {
 d = Double.valueOf(s);/*this may fail*/
 yylval = new ParserVal(d.doubleValue()); //SEE BELOW
 tok = NUM;
 }
 catch (Exception e)
 {
 tok = s.charAt(0);/*if not float, return char*/
 }
 return tok;
}

void dotest()
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
 System.out.println("BYACC/J Calculator Demo");
 System.out.println("Note: Since this example uses the StringTokenizer");
 System.out.println("for simplicity, you will need to separate the items");
 System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'");
 while (true)
 {
 System.out.print("expression:");
 try
 {
 ins = in.readLine();
 }
 catch (Exception e)
 {
 }
 st = new StringTokenizer(ins);
 newline=false;
 yyparse();
 }
}

public static void main(String args[])
{
 Parser par = new Parser(false);
 par.dotest();
}

If this code were in a file called calc.y, you would yacc-process it with the command:
yacc -J calc.y
This will generate the file Parser.java, which can then be compiled by:
javac Parser.java
to create the file Parser.class which can be run with:
java Parser

The same file, but using the semantic_type option, as in
yacc -Jsemantic=double tf.y
is available here.


Command Line Options

In addition to the normal yacc command line switches, I have supplied these:
 


User-Supplied Methods

In order for javac to compile the code properly, the user must supply two methods in the YACC source:


About the Generated Java File

I suggest heartily that the user peruse the file Parser.java to see how YACC's parsing algorithm works. I have done an immense amount of analysis and reverse engineering of the original BYACC sources. The Java code that is generated is heavily commented, and is amenable to debugging, and can provide a nice education in the workings of a YACC parser.

Optionally, the class generated is made an extension of Thread, as a convience, so that parsing may be performed as a background thread, allowing the current execution to continue unimpeded.  A run() method and a constructor are also inserted into the code.

However, it may occur that the programmer needs to extend a different class.  In this case, the -x<classname> option is provided, which will create an alternate extension.  Since it is impossible to predict the needs of the other class, the run() and constructor will be omitted.


About the ParserVal class

Previously, BYACC/J gave the programmer a choice of either double or int semantic (the value of a number or string) values.  This worked for very simple parsing, but was extremely limiting.  It would have been very difficult to mix value types within a file, thus making things like interpreters and compilers impossible.

The semantic value is stored in a public class called ParserVal, which is defined thusly:
 

public class ParserVal
{
  public int ival;
  public double dval;
  public String sval;
  public Object obj;
  public ParserVal(int val)
  {
    ival=val;
  }
  public ParserVal(double val)
  {
    dval=val;
  }
  public ParserVal(String val)
  {
    sval=val;
  }
  public ParserVal(Object val)
  {
    obj=val;
  }
}//end class

So now a semantic value can be an int, a double, a String, or an Object. In your scanner (or something that yylex() calls), you may use this like:
 

yylval = new ParserVal(doubleval);
yylval = new ParserVal(integerval);
...or even something like...
yylval = new ParserVal(new myTypeOfObject());

And on the Left Hand Side (the YACC side) you can use the values of the $ and the $$ just as easily:
 

$$.ival = $1.ival + $2.ival;
$$.dval = $1.dval - $2.dval;

A side effect of using this inner class is that the default parser no longer fits into one .class file, however, the resulting ParserVal.class is extremely small.


Examples

As time goes on, we will provide some examples and templates to speed you on your way.


Why?

Because someone said YACC couldn't be done in Java. Silly person!


Credits

Of course, thanks go to Tom Corbett for BYACC, a fine implementation of YACC. And thanks to his altruistic nature for putting it in the Public Domain. I just added the Java switch. Check the ACKNOWLEDGEMENTS file for more contributors. 



Changes

Changes from 0.92 to 1.0a

This version has one minor change from 0.91.
Instead of "extends Thread" being the default description of the Parser class, it must be stated on the command line.
The default declaration is:
public class Parser
This can be changed with yacc -Jclass=<classname> to something like:
public class MyParserClass
This can be modified with  -Jextends=<classname> to something like:
public class MyParserClass extends MyBaseClass
Finally, it can be further modified with  -Jimplements=<interfaceName> to something like:
public class MyParserClass extends MyBaseClass implements MyInterface
Another nice change, is that the package name can be more easily set with -Jpackage=<packageName> to read:
package org.fluffy.poodle;

New in 1.0a

We have performed considerable debug, reorganization, and output cleaning. The output code is now Javadoc capable.

New in 1.1

Added the option -Jstack=NNN to allow users to set the semantic stack size.

New in 1.11

-d option is now supported with -J. It creates interface with token constants.
Workaround for problem when static class initializer of the generated Parser is larger than 64Kb.
Improved performance by removing some unused code and making use of exceptions.
Added -Jnodebug option to omit debugging code for further better performance
Added -Jfinal option to make generated class final
Added -Jthrows option to declare thrown exceptions for yyparse() method
Added the ability to specify options within the grammar file

New in 1.12

bug #1406129 [ -Jclass=X12345678 -> Segmentation fault ] fixed

New in 1.13

bug #1506924 - ArrayIndexOutOfBoundException in yyparse() fixed

New in 1.14

Generics can be now used in semantic type.
fixed problem with the name of verbose file
bug #1598776 - '-Jnodebug' option is broken fixed
bug #1600683 - error recovery throws ArrayIndexOutOfBoundsException: -1 fixed

New in 1.15

bug #1638577 - stack corruption when generating a java parser fixed
bug #1630851 - Targets with empty right-hand side problematic in Java mode fixed

BYACC/J Project Page

We are now hosted at SourceForge!
SourceForge Logo

Download

The latest modified/cleaned up/updated Berkeley Yacc source files, GNU makefile, and Borland C++ 5 Makefile can be obtained bellow, in addition to several binary builds. These are native console applications, so they do not require any class libraries to work. Of course, you will need a Java development environment to process the generated source files.
And remember, this version of YACC also parses "standard" C/C++ YACC source files!
 
 

Binaries for Solaris/Sparc, Win32, Mac OS X, and Linux/Intel. Source files, GNU and Borland Makefiles in a GZIP TAR file. (WinZip can unpack these!)


Older releases are available here.


Questions

YACC has already been described many times, and in great detail, so I would appreciate that BYACC/J users' questions about YACC and LALR parsers be directed to the many good sources available on the Net and in print. In other words, I will not do your homework for you! ;-) However, I would be happy to help with the Java file generation, as that is the portion that I have implemented. 


For more information, please write to Tomas Hurka
Last updated: Nov 28 2008