JavaでAST作るのは大変

コンパイラの実装言語にあると良い機能でJavaでAST(抽象構文木)作るのが大変だという話に触れたのだけれども、実は自分の知らない良いやり方があるのではないかと思ってClojureなどJavaで実装された処理系のソースを覗いてみました。
結論:やっぱり大変。
Clojureの場合

% ls clojure-1.2.0-RC2/src/jvm/clojure/lang/
AFn.java                  ILookupSite.java                  Namespace.java
AFunction.java            ILookupThunk.java                 Numbers.java
AMapEntry.java            IMapEntry.java                    Obj.java
APersistentMap.java       IMeta.java                        PersistentArrayMap.java
APersistentSet.java       IObj.java                         PersistentHashMap.java
APersistentVector.java    IPersistentCollection.java        PersistentHashSet.java
ARef.java                 IPersistentList.java              PersistentList.java
AReference.java           IPersistentMap.java               PersistentQueue.java
ASeq.java                 IPersistentSet.java               PersistentStructMap.java
ATransientMap.java        IPersistentStack.java             PersistentTreeMap.java
ATransientSet.java        IPersistentVector.java            PersistentTreeSet.java
Agent.java                IProxy.java                       PersistentVector.java
ArrayChunk.java           IReduce.java                      ProxyHandler.java
ArraySeq.java             IRef.java                         RT.java
Associative.java          IReference.java                   Range.java
Atom.java                 ISeq.java                         Ratio.java
Binding.java              ITransientAssociative.java        Ref.java
Box.java                  ITransientCollection.java         Reflector.java
ChunkBuffer.java          ITransientMap.java                Repl.java
ChunkedCons.java          ITransientSet.java                RestFn.java
Compile.java              ITransientVector.java             Reversible.java
Compiler.java             Indexed.java                      Script.java
Cons.java                 IndexedSeq.java                   SeqEnumeration.java
Counted.java              IteratorSeq.java                  SeqIterator.java
Delay.java                Keyword.java                      Seqable.java
DynamicClassLoader.java   KeywordLookupSite.java            Sequential.java
EnumerationSeq.java       LazilyPersistentVector.java       Settable.java
Fn.java                   LazySeq.java                      Sorted.java
IChunk.java               LineNumberingPushbackReader.java  StringSeq.java
IChunkedSeq.java          LispReader.java                   Symbol.java
IDeref.java               LockingTransaction.java           TransactionalHashMap.java
IEditableCollection.java  MapEntry.java                     Util.java
IFn.java                  MapEquivalence.java               Var.java
IKeywordLookup.java       MethodImplCache.java              XMLHandler.java
ILookup.java              MultiFn.java
ILookupHost.java          Named.java

JRubyの場合

% ls jruby-1.5.1/src/org/jruby/ast/
AliasNode.java                    DStrNode.java                      NilNode.java
AndNode.java                      DSymbolNode.java                   Node.java
ArgAuxillaryNode.java             DVarNode.java                      NodeType.java
ArgsCatNode.java                  DXStrNode.java                     NonLocalControlFlowNode.java
ArgsNoArgNode.java                DefinedNode.java                   NotNode.java
ArgsNode.arities.erb              DefnNode.java                      NthRefNode.java
ArgsNode.erb                      DefsNode.java                      OpAsgnAndNode.java
ArgsNode.java                     DotNode.java                       OpAsgnNode.java
ArgsPreOneArgNode.java            EncodingNode.java                  OpAsgnOrNode.java
ArgsPreTwoArgNode.java            EnsureNode.java                    OpElementAsgnNode.java
ArgsPushNode.java                 EvStrNode.java                     OpElementOneArgAndAsgnNode.java
ArgumentNode.java                 FCallManyArgsBlockNode.java        OpElementOneArgAsgnNode.java
ArrayNode.java                    FCallManyArgsBlockPassNode.java    OpElementOneArgOrAsgnNode.java
AssignableNode.java               FCallManyArgsNode.java             OptArgNode.java
AttrAssignNode.java               FCallNoArgBlockNode.java           OrNode.java
AttrAssignOneArgNode.java         FCallNoArgBlockPassNode.java       PostExeNode.java
AttrAssignThreeArgNode.java       FCallNoArgNode.java                PreExeNode.java
AttrAssignTwoArgNode.java         FCallNode.java                     RedoNode.java
BackRefNode.java                  FCallOneArgBlockNode.java          RegexpNode.java
BeginNode.java                    FCallOneArgBlockPassNode.java      RescueBodyNode.java
BignumNode.java                   FCallOneArgNode.java               RescueNode.java
BinaryOperatorNode.java           FCallSpecialArgBlockNode.java      RestArgNode.java
BlockAcceptingNode.java           FCallSpecialArgBlockPassNode.java  RetryNode.java
BlockArg18Node.java               FCallSpecialArgNode.java           ReturnNode.java
BlockArgNode.java                 FCallThreeArgBlockNode.java        RootNode.java
BlockNode.java                    FCallThreeArgBlockPassNode.java    SClassNode.java
BlockPassNode.java                FCallThreeArgNode.java             SValue19Node.java
BreakNode.java                    FCallTwoArgBlockNode.java          SValueNode.java
CallManyArgsBlockNode.java        FCallTwoArgBlockPassNode.java      SelfNode.java
CallManyArgsBlockPassNode.java    FCallTwoArgNode.java               Splat19Node.java
CallManyArgsNode.java             FalseNode.java                     SplatNode.java
CallNoArgBlockNode.java           FileNode.java                      StarNode.java
CallNoArgBlockPassNode.java       FixnumNode.java                    StrNode.java
CallNoArgNode.java                FlipNode.java                      SuperNode.java
CallNode.java                     FloatNode.java                     SymbolNode.java
CallOneArgBlockNode.java          ForNode.java                       ToAryNode.java
CallOneArgBlockPassNode.java      GlobalAsgnNode.java                TrueNode.java
CallOneArgFixnumNode.java         GlobalVarNode.java                 TypedArgumentNode.java
CallOneArgNode.java               Hash19Node.java                    UndefNode.java
CallSpecialArgBlockNode.java      HashNode.java                      UnnamedRestArgNode.java
CallSpecialArgBlockPassNode.java  IArgumentNode.java                 UntilNode.java
CallSpecialArgNode.java           IScopingNode.java                  VAliasNode.java
CallThreeArgBlockNode.java        IfNode.java                        VCallNode.java
CallThreeArgBlockPassNode.java    InstAsgnNode.java                  WhenNode.java
CallThreeArgNode.java             InstVarNode.java                   WhenOneArgNode.java
CallTwoArgBlockNode.java          InvisibleNode.java                 WhileNode.java
CallTwoArgBlockPassNode.java      IterNode.java                      XStrNode.java
CallTwoArgNode.java               LambdaNode.java                    YieldNode.java
CaseNode.java                     ListNode.java                      YieldOneNode.java
ClassNode.java                    LiteralNode.java                   YieldThreeNode.java
ClassVarAsgnNode.java             LocalAsgnNode.java                 YieldTwoNode.java
ClassVarDeclNode.java             LocalVarNode.java                  ZArrayNode.java
ClassVarNode.java                 Match2Node.java                    ZSuperNode.java
Colon2ConstNode.java              Match3Node.java                    ZYieldNode.java
Colon2ImplicitNode.java           MatchNode.java                     ZeroArgNode.java
Colon2MethodNode.java             MethodDefNode.java                 executable
Colon2Node.java                   ModuleNode.java                    java_signature
Colon3Node.java                   MultipleAsgn19Node.java            types
ConstDeclNode.java                MultipleAsgnNode.java              util
ConstNode.java                    NewlineNode.java                   visitor
DAsgnNode.java                    NextNode.java
DRegexpNode.java                  NilImplicitNode.java

Jythonの場合

% ls src/org/python/core/
AbstractArray.java               PyClass.java                PyReflectedConstructor.java
AnnotationReader.java            PyClassMethod.java          PyReflectedField.java
ArgParser.java                   PyClassMethodDerived.java   PyReflectedFunction.java
AstList.java                     PyClassMethodDescr.java     PyReversedIterator.java
BaseSet.java                     PyCode.java                 PyRunnable.java
BuiltinDocs.java                 PyComplex.java              PyRunnableBootstrap.java
BytecodeLoader.java              PyComplexDerived.java       PySequence.java
ClassDictInit.java               PyCompoundCallable.java     PySequenceIter.java
ClasspathPyImporter.java         PyDataDescr.java            PySequenceList.java
ClasspathPyImporterDerived.java  PyDescriptor.java           PySet.java
CodeBootstrap.java               PyDictProxy.java            PySetDerived.java
CodeFlag.java                    PyDictionary.java           PySingleton.java
CodeLoader.java                  PyDictionaryDerived.java    PySlice.java
CompileMode.java                 PyEllipsis.java             PySlot.java
CompilerFacade.java              PyEnumerate.java            PyStaticMethod.java
CompilerFlags.java               PyEnumerateDerived.java     PyString.java
ContextGuard.java                PyException.java            PyStringDerived.java
ContextManager.java              PyFastSequenceIter.java     PyStringMap.java
Deriveds.java                    PyFile.java                 PySuper.java
FilelikeInputStream.java         PyFileDerived.java          PySuperDerived.java
FunctionThread.java              PyFileReader.java           PySyntaxError.java
FutureFeature.java               PyFileWriter.java           PySystemState.java
IdImpl.java                      PyFinalizableInstance.java  PyTableCode.java
InitModule.java                  PyFloat.java                PyTraceback.java
JavaImportHelper.java            PyFloatDerived.java         PyTuple.java
JavaImporter.java                PyFrame.java                PyTupleDerived.java
JythonInitializer.java           PyFrozenSet.java            PyType.java
MakeProxies.java                 PyFrozenSetDerived.java     PyTypeDerived.java
NewCompilerResources.java        PyFunction.java             PyUnicode.java
Opcode.java                      PyFunctionTable.java        PyUnicodeDerived.java
Options.java                     PyGenerator.java            PyXRange.java
ParserFacade.java                PyIgnoreMethodTag.java      PythonCodeBundle.java
Pragma.java                      PyIndentationError.java     PythonCompiler.java
PragmaReceiver.java              PyInstance.java             PythonTraceFunction.java
Py.java                          PyInteger.java              ReflectedArgs.java
PyArray.java                     PyIntegerDerived.java       ReflectedCallData.java
PyArrayDerived.java              PyIterator.java             SequenceIndexDelegate.java
PyBaseCode.java                  PyJavaPackage.java          Slotted.java
PyBaseException.java             PyJavaType.java             StderrWrapper.java
PyBaseExceptionDerived.java      PyList.java                 StdoutWrapper.java
PyBaseString.java                PyListDerived.java          SyspathArchive.java
PyBeanEvent.java                 PyLong.java                 SyspathJavaLoader.java
PyBeanEventProperty.java         PyLongDerived.java          ThreadState.java
PyBeanProperty.java              PyMethod.java               ThreadStateMapping.java
PyBoolean.java                   PyMethodDescr.java          TraceFunction.java
PyBuiltinCallable.java           PyModule.java               WrappedIterIterator.java
PyBuiltinClassMethodNarrow.java  PyModuleDerived.java        __builtin__.java
PyBuiltinFunction.java           PyNewWrapper.java           adapter
PyBuiltinFunctionNarrow.java     PyNone.java                 codecs.java
PyBuiltinFunctionSet.java        PyNotImplemented.java       exceptions.java
PyBuiltinMethod.java             PyObject.java               imp.java
PyBuiltinMethodNarrow.java       PyObjectDerived.java        io
PyBuiltinMethodSet.java          PyOverridableNew.java       packagecache
PyBytecode.java                  PyProperty.java             ucnhashAPI.java
PyCallIter.java                  PyPropertyDerived.java      util
PyCell.java                      PyProxy.java

ファイル名を見てもらえれば分かると思いますが、1ノードタイプ⇔1ファイルです。
何故こんなことになるかというと原因は二つあって

  1. Javaでは1クラスにつき1ファイル必要
  2. オブジェクト指向言語ではCompositeパターンで木構造を実装する

です。Compositeパターンでは葉の種類毎に1クラスを作るので、抽象構文木のノードタイプ数だけファイルができる事になります。
ソースの自動生成とかの工夫はしているのかもしれないですが未確認です。

ちなみにVariantのある言語だと、例えばOCamlならファイルは一つで良くて(parsetree.mli)

% ls ocaml-3.11.2/parsing/
asttypes.mli  lexer.mll    linenum.mll  location.mli  longident.mli  parse.mli   parsetree.mli  printast.mli  syntaxerr.mli
lexer.mli     linenum.mli  location.ml  longident.ml  parse.ml       parser.mly  printast.ml    syntaxerr.ml

以下の様に書けます。

type expression =
  { pexp_desc: expression_desc;
    pexp_loc: Location.t }

and expression_desc =
    Pexp_ident of Longident.t
  | Pexp_constant of constant
  | Pexp_let of rec_flag * (pattern * expression) list * expression
  | Pexp_function of label * expression option * (pattern * expression) list
  | Pexp_apply of expression * (label * expression) list
  | Pexp_match of expression * (pattern * expression) list
  | Pexp_try of expression * (pattern * expression) list
  | Pexp_tuple of expression list
  | Pexp_construct of Longident.t * expression option * bool
  | Pexp_variant of label * expression option
  | Pexp_record of (Longident.t * expression) list * expression option
  | Pexp_field of expression * Longident.t
  | Pexp_setfield of expression * Longident.t * expression
  | Pexp_array of expression list
  | Pexp_ifthenelse of expression * expression * expression option
  | Pexp_sequence of expression * expression
  | Pexp_while of expression * expression
  | Pexp_for of string * expression * expression * direction_flag * expression
  | Pexp_constraint of expression * core_type option * core_type option
  | Pexp_when of expression * expression
  | Pexp_send of expression * string
  | Pexp_new of Longident.t
  | Pexp_setinstvar of string * expression
  | Pexp_override of (string * expression) list
  | Pexp_letmodule of string * module_expr * expression
  | Pexp_assert of expression
  | Pexp_assertfalse
  | Pexp_lazy of expression
  | Pexp_poly of expression * core_type option
  | Pexp_object of class_structure