JEP 301: Enhanced Enums

Owner	Maurizio Cimadamore
Type	Feature
Scope	SE
Status	Closed / Withdrawn
Component	tools / javac
Discussion	platform dash jep dash discuss at openjdk dot java dot net
Effort	M
Duration	M
Relates to	JEP 286: Local-Variable Type Inference
Created	2016/11/25 11:27
Updated	2020/09/29 20:27
Issue	8170351

Summary

Enhance the expressiveness of the enum construct in the Java Language by allowing type-variables in enums (generic enums), and performing sharper type-checking for enum constants.

Goals

These two enhancements work together to enable enum constants to carry constant-specific type information as well as constant-specific state and behavior. There are many situations where developers have to refactor enums into classes in order to achieve the desired result; these enhancements should reduce this need.

The following example shows how the two enhancements work together:

enum Argument<X> { // declares generic enum
   STRING<String>(String.class), 
   INTEGER<Integer>(Integer.class), ... ;

   Class<X> clazz;

   Argument(Class<X> clazz) { this.clazz = clazz; }

   Class<X> getClazz() { return clazz; }
}

Class<String> cs = Argument.STRING.getClazz(); //uses sharper typing of enum constant

Non-Goals

This JEP targets specific enhancements to how enum constants are type-checked. As such, other enum-related features such as:

allow enum subclassing
allow enum in non-static contexts

are outside the scope of this JEP.

Motivation

Java enums are a powerful construct. They allow grouping of constants - where each constant is a singleton object. Each constant can optionally declare a body, which can be used to override the behavior of the base enum declaration. In the following we will try to model the set of Java primitive types using an enum. Here's a start:

enum Primitive {
    BYTE,
    SHORT,
    INT,
    FLOAT,
    LONG,
    DOUBLE,
    CHAR,
    BOOLEAN;
}

As stated above, an enum declaration is like a class, and can have constructors; we can use this feature to keep track of the boxed class and the default value of each primitive:

enum Primitive {
    BYTE(Byte.class, 0),
    SHORT(Short.class, 0),
    INT(Integer.class, 0),
    FLOAT(Float.class, 0f),
    LONG(Long.class, 0L),
    DOUBLE(Double.class, 0d),
    CHAR(Character.class, 0),
    BOOLEAN(Boolean.class, false);

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
       this.boxClass = boxClass;
       this.defaultValue = defaultValue;
    }

}

While this is rather nice, there are some limitations. The field boxClass is loosely typed as Class<?>, because the field type needs to be compatible with all the sharper types used by the enum constants. As a result, any attempt to do something like this:

Class<Short> cs = SHORT.boxClass; //error

Will fail with a compile-time error. Even worse, the field defaultValue has a type of Object. This is unavoidable since the field needs to be shared across multiple constants modelling different primitive types. Hence, static safety is lost, as the compiler allows code like the following:

String s = (String) INT.defaultValue; //ok
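The loss of static safety can be observed with today's enums. The following is a minimal sketch, assuming a cut-down version of the Primitive enum above; the class name LooseTypingDemo and the helper castFailsAtRunTime are hypothetical names introduced only for this illustration.

```java
class LooseTypingDemo {
    // Cut-down version of the Primitive enum from the text.
    enum Primitive {
        INT(Integer.class, 0),
        BOOLEAN(Boolean.class, false);

        final Class<?> boxClass;
        final Object defaultValue;

        Primitive(Class<?> boxClass, Object defaultValue) {
            this.boxClass = boxClass;
            this.defaultValue = defaultValue;
        }
    }

    // Hypothetical helper: reports whether the cast below fails at run time.
    static boolean castFailsAtRunTime() {
        try {
            // Accepted by the compiler, because defaultValue is typed as Object.
            String s = (String) Primitive.INT.defaultValue;
            return false;
        } catch (ClassCastException e) {
            // The value is really an Integer, so the cast fails here.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailsAtRunTime());
    }
}
```

The mistake is only detected when the cast executes, which is the kind of error a sharper static type would surface at compile time.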

Let's now try to extend the enum and add some operations to the constants modelling primitive types (for the sake of brevity, in the remainder we will only show a subset of the constants):

enum Primitive {
    INT(Integer.class, 0) {
       int mod(int x, int y) { return x % y; }
       int add(int x, int y) { return x + y; }
    },
    FLOAT(Float.class, 0f)  {
       float add(float x, float y) { return x + y; }
    }, ... ;

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
       this.boxClass = boxClass;
       this.defaultValue = defaultValue;
    }

}

Again, this results in problems, as there's no way to do something like this:

int seven = INT.add(3, 4); //error

That's because the static type of INT is simply Primitive and Primitive has no member named add. So, in order to add operations to our enum, we need to add the members to the enum declaration itself, as follows:

enum Primitive {
    INT(Integer.class, 0),
    FLOAT(Float.class, 0f), ... ;

    final Class<?> boxClass;
    final Object defaultValue;

    Primitive(Class<?> boxClass, Object defaultValue) {
       this.boxClass = boxClass;
       this.defaultValue = defaultValue;
    }

    int mod(int x, int y) {
       if (this == INT) {
          return x % y;
       } else {
          throw new IllegalStateException();
       }
    }

    int add(int x, int y) {
       if (this == INT) {
          return x + y;
       } else {
          throw new IllegalStateException();
       }
    }

    float add(float x, float y) {
       if (this == FLOAT) {
          return x + y;
       } else {
          throw new IllegalStateException();
       }
    }
    ...

}

But the code above has, again, several problems. First, this breaks encapsulation: suddenly, Primitive acquires a bunch of members, none of which make sense for all the constants. As a result, the implementation of each method becomes more convoluted, as the methods must check whether they have been called on the right enum constant. Type-safety is also lost, as the compiler will not detect bad usages such as:

int zero = FLOAT.mod(50, 2); //ok
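A minimal sketch of this failure mode, assuming a reduced Primitive enum with only the mod method; RuntimeCheckDemo and modFailsOnFloat are hypothetical names used for illustration.

```java
class RuntimeCheckDemo {
    enum Primitive {
        INT, FLOAT;

        // The member lives on the enum itself, so every constant exposes it
        // and the check has to be done by hand at run time.
        int mod(int x, int y) {
            if (this == INT) {
                return x % y;
            }
            throw new IllegalStateException(this + " does not support mod");
        }
    }

    // Hypothetical helper: reports whether calling mod on FLOAT throws.
    static boolean modFailsOnFloat() {
        try {
            Primitive.FLOAT.mod(50, 2); // compiles without complaint
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Primitive.INT.mod(50, 2));
        System.out.println(modFailsOnFloat());
    }
}
```

The call on FLOAT is accepted by the compiler and only rejected by the hand-written check when it runs.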

All the problems described above can be addressed by removing specific asymmetries between enums and classes, and by refining the way in which enum constants are type-checked. More precisely:

allow type parameters in enum declarations
do not prematurely erase sharp type-information associated with enum constants

With these enhancements, the Primitive enum can be rewritten as follows:

enum Primitive<X> {
    INT<Integer>(Integer.class, 0) {
       int mod(int x, int y) { return x % y; }
       int add(int x, int y) { return x + y; }
    },
    FLOAT<Float>(Float.class, 0f)  {
       float add(float x, float y) { return x + y; }
    }, ... ;

    final Class<X> boxClass;
    final X defaultValue;

    Primitive(Class<X> boxClass, X defaultValue) {
       this.boxClass = boxClass;
       this.defaultValue = defaultValue;
    }
}

This generic declaration is clearly more expressive than the previous one - now the enum constant Primitive.INT has a sharper parameterized type Primitive<Integer> which means that its members are also sharply typed:

Class<Short> cs = SHORT.boxClass; //ok!

Also, since type information on enum constants is not prematurely erased, the compiler can reason about membership of constants - as demonstrated below:

int zero_int = INT.mod(50, 2); //ok
int zero_float = FLOAT.mod(50, 2); //error

The compiler is now able to reject the second statement as there's no member mod in the enum constant FLOAT - which guarantees extra type-safety.

Description

Generic enums

As discussed in JDK-6408723, an important requirement for allowing generics in enums is that type-parameters are fully bound in the enum constant declaration. This allows for a straightforward translation scheme which can augment the one we have today - for instance, given an enum declaration like the following:

enum Foo<X> {
   ONE<String>,
   TWO<Integer>;
}

The corresponding desugared code will look as follows:

/* enum */ class Foo<X> {
   static Foo<String> ONE = ...
   static Foo<Integer> TWO = ...

   ...
}

That is, it is still possible to map each constant to a static field declaration, as type bindings are all statically known.
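The desugared shape can be approximated in today's Java with an ordinary generic class. This is only a sketch of the idea, not the translation javac would emit: the type field, the name and ordinal bookkeeping, and the values() method are assumptions added so that the sharper typing is observable.

```java
final class Foo<X> {
    // Each "constant" is a static field whose type arguments are fully bound.
    static final Foo<String> ONE = new Foo<>("ONE", 0, String.class);
    static final Foo<Integer> TWO = new Foo<>("TWO", 1, Integer.class);

    private final String name;
    private final int ordinal;
    private final Class<X> type; // hypothetical payload, not in the original example

    // Private constructor: no instances other than the constants above.
    private Foo(String name, int ordinal, Class<X> type) {
        this.name = name;
        this.ordinal = ordinal;
        this.type = type;
    }

    String name() { return name; }
    int ordinal() { return ordinal; }
    Class<X> type() { return type; }

    // Mirrors the values() method of a real enum.
    static Foo<?>[] values() {
        return new Foo<?>[] { ONE, TWO };
    }

    public static void main(String[] args) {
        // No cast needed: the static type of ONE is Foo<String>.
        Class<String> cs = Foo.ONE.type();
        System.out.println(cs.getSimpleName() + " " + Foo.TWO.name());
    }
}
```

Unlike a real enum, this class gets no support from switch, EnumSet, or enum serialization, which is part of why refactoring enums into classes is unattractive.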

It might be desirable to allow diamond on enum constant initialization - for instance:

enum Bar<X> {
   ONE<>(Integer.class),
   TWO<>(String.class);

   Bar(X x) { ... }
}

If the diamond syntax is used, special care is required if the enum constant has a body (i.e. it is translated into an anonymous class) and the inferred type is non-denotable. As in the case for diamond with anonymous inner classes, the compiler will have to reject that case.
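For comparison, this is how diamond already behaves with anonymous inner classes in Java 9 and later. The sketch below shows only the accepted case, where the inferred type is denotable; Box and DiamondAnonymousDemo are hypothetical names introduced for the example.

```java
class DiamondAnonymousDemo {
    // Hypothetical generic base class, used only for this illustration.
    static abstract class Box<T> {
        final T value;

        Box(T value) { this.value = value; }

        abstract String describe();
    }

    static String demo() {
        // Diamond with an anonymous class body (Java 9+): T is inferred as
        // Integer, a denotable type, so the declaration is accepted.
        Box<Integer> box = new Box<>(42) {
            @Override
            String describe() { return "box of " + value; }
        };
        return box.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An enum constant written with diamond and a body would presumably be subject to the same kind of check.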

Sharper typing of enum constants

Under current rules, the static type of an enum constant is the enum type itself. Under such rules, the constants Foo.ONE and Foo.TWO above will both have the same type, namely Foo. This is undesirable for at least two reasons:

in case of a generic enum (as Foo), the static type of a constant is not sharp enough to capture the full type info carried by that constant
even in the absence of generic enum, the constant type is not sharp enough to let a client access a member that is only defined on that enum constant (see the example at the beginning of this page)

To overcome this limitation, typing of enum constants should be redefined so that a given enum constant gets its own type. Let E be an enum declaration, and C be a (possibly generic) enum constant declaration in E. The constant C is associated with a sharper type if either of the following conditions is satisfied:

C is of the kind C<T1, T2 ... Tn> but declares no body; the constant sharper type is E<T1, T2 ... Tn>
C has a body; the constant sharper type is an anonymous type (written E.C) whose supertype is either
- E<T1, T2, ... Tn> if C is of the kind C<T1, T2, ... Tn> and E is a generic enum
- E, if E is non-generic

These enhanced typing rules allow the static types of Foo.ONE and Foo.TWO to be different.

Additional Considerations

Binary compatibility

Let's assume we have the following enum:

enum Test {
   A { void a() { } },
   B { void b() { } }
}

As we have seen, this would be translated as follows:

/* enum */ class Test {
   static Test A = new Test() { void a() { } };
   static Test B = new Test() { void b() { } };
}

If we allow sharper type for enum constants, a naive approach would translate the code as follows:

/* enum */ class Test {
   static Test$1 A = new Test() { void a() { } };
   static Test$2 B = new Test() { void b() { } };
}

Here, the binary incompatibility is manifest: the type of the enum constant A just changed from Test to Test$1 upon recompilation. This change is going to break non-recompiled clients using Test.

To overcome this problem, it is better to take an erasure-based approach: while the static type of A might be the sharper type Test.A, any reference to the type of the constant gets erased to the base enum type Test. This leads to code that is binary compatible with respect to what we had before. However, if everything gets erased to Test, how is access to members of a specific enum constant implemented?

Test.A.a();

It is easy to see that, if in the code above, symbolic references to A are erased to Test, the method call will not be well-typed (as Test does not have a member named a). To overcome this problem, the compiler has to insert a synthetic cast:

checkcast Test$1
invokevirtual Test$1::a

This is not dissimilar to what happens when accessing members of an intersection type through erasure.
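The intersection-type precedent can be seen in existing Java. In the sketch below (ErasureCastDemo and max are hypothetical names), the type variable T is erased to its first bound, Number, so javac compiles the call to compareTo with a synthetic cast to Comparable, much like the checkcast shown above.

```java
class ErasureCastDemo {
    // T erases to Number (its leftmost bound). compareTo belongs to the
    // second bound, so the compiled call goes through a cast to Comparable.
    static <T extends Number & Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7));
    }
}
```

Running javap -c on the compiled class is one way to inspect the inserted cast.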

Another orthogonal observation is that the current naming scheme for enum constant classes is too fragile: the names Test$1 and Test$2 shown above are essentially order-dependent, which means that changing the order in which enum constants are declared could lead to binary compatibility issues. More specifically, if in the code above A is swapped with B and the enum is recompiled, the client bytecode above would fail to link, as Test$1 would no longer have a member method named a. This is in stark contrast with what the JLS has to say about binary compatible evolution of enums:

Adding or reordering constants in an enum will not break compatibility with pre-existing binaries.

One way to preserve binary compatible evolution would be to emit order insensitive class names, such as Test$A and Test$B instead of Test$1 and Test$2. The impact of such a change with respect to reflection and serialization is discussed below.
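The current naming can be observed directly. This is a small sketch in which BinaryNameDemo and classNameOf are hypothetical names; because the enum is nested here, the reported names carry an extra BinaryNameDemo$ prefix, and the numeric suffix is whatever the compiler assigned.

```java
class BinaryNameDemo {
    enum Test {
        A { void a() { } },
        B { void b() { } }
    }

    // Returns the binary name of the class that implements the constant.
    static String classNameOf(Test t) {
        return t.getClass().getName();
    }

    public static void main(String[] args) {
        // With javac, the suffixes follow declaration order.
        System.out.println(classNameOf(Test.A));
        System.out.println(classNameOf(Test.B));
        // The declaring class is still the enum itself.
        System.out.println(Test.A.getDeclaringClass().getName());
    }
}
```

Swapping the declarations of A and B and recompiling is a simple way to see the names change places.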

Serialization

In Java, all enums are implicitly serializable, as Enum implements Serializable. We would like the changes proposed here to be serialization-compatible; they should not change the serialized form. The serialization specification:

http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469

provides special treatment for enums; the serialized form of an enum constant is its name only, and it is not possible to customize serialization/deserialization of an enum constant. (Note that all enum constants are initialized during the <clinit>, and the Enum.valueOf method that is used by deserialization calls the enum's static values() method, which implicitly forces initialization of the base enum class (and of all the constants)).

In other words, no compatibility problem with respect to the serialized form exists, as the serialized form already does not depend on the class name generated by the compiler.
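A quick round trip illustrates the point for a constant that has a body. This is a sketch using only standard java.io classes; EnumSerializationDemo, roundTrip and survivesRoundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class EnumSerializationDemo {
    enum Test {
        A { void a() { } },
        B { void b() { } }
    }

    // Serializes the object to a byte array and reads it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // True if deserialization yields the very same constant instance.
    static boolean survivesRoundTrip(Test t) {
        try {
            return roundTrip(t) == t;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesRoundTrip(Test.A));
        System.out.println(survivesRoundTrip(Test.B));
    }
}
```

The constant is resolved by name against the enum type on the way back in, so the class generated for the constant body plays no part.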

Reflection

Another place where binary names come up is reflection. The following is perfectly legal reflective code:

Class<?> c = Class.forName("Test$1");
System.err.println(c.getName()); //prints Test$1

While reflection has restrictions that prevent an enum constant from being instantiated reflectively, there is no restriction on inspecting the members of an enum constant class. Therefore, existing code using the idiom above would cease to work should we change the binary form of enum constants.
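The kind of inspection that is possible today can be sketched as follows. ReflectionDemo and declaredMethodNames are hypothetical names; the binary name is obtained from the constant at run time rather than hard-coded, since the exact name depends on the compiler.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReflectionDemo {
    enum Test {
        A { void a() { } },
        B { void b() { } }
    }

    // Lists the non-synthetic methods declared directly by the given class.
    static List<String> declaredMethodNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Look the constant's class up again through its binary name.
        String binaryName = Test.A.getClass().getName();
        Class<?> c = Class.forName(binaryName);
        System.out.println(c == Test.A.getClass());
        // The member a() is invisible statically but visible reflectively.
        System.out.println(declaredMethodNames(c));
    }
}
```

Code that hard-codes the binary name instead of deriving it is what a renaming of these classes would break.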

Denotability

Currently, an enum constant is a value, not a type. So, a legitimate question is as to whether enum constants should also be denotable types.

The usual arguments apply here. On the one hand, having a denotable type for an enum constant makes it less magic, and allows programmers to declare variables with that type. But there are also disadvantages:

could make the code less readable (e.g. A a = A), as the same identifier could mean both value and type
not clear as to whether all enum constants get their own type; what about an enum constant that does not declare any additional member? Is its type just an alias for the base enum type?

On the other hand, if the enum constant type is a non-denotable type, it becomes an opaque thing that programmers can only interact with indirectly (e.g. through type inference). To mitigate some of the drawbacks of a non-denotable type, it is important to note that the proposal to add local variable type inference could technically allow programmers to declare variables with the sharper enum type, even though it is non-denotable (e.g. var a = A).
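Local variable type inference, as it exists in Java 10 and later, already behaves this way for anonymous classes, which gives a feel for how a non-denotable constant type could be used. The sketch below is an analogy only; VarAnonymousDemo and its members are hypothetical names.

```java
class VarAnonymousDemo {
    static int demo() {
        // The type of 'counter' is the anonymous class type. It cannot be
        // written in source, yet 'var' keeps it, so the extra members
        // declared in the body remain accessible.
        var counter = new Object() {
            int count = 0;

            int increment() { return ++count; }
        };
        counter.increment();
        return counter.increment();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Declaring the variable as Object instead of var would make increment() inaccessible, which mirrors the situation of enum constants typed as the base enum.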

Accessibility

There is one corner case with respect to accessibility of members through the enum sharper type. Consider the following case:

package a;

public enum Foo {
  A() { 
    public String s = "Hello!";
  };
}

package b;

class Client {
   public static void main(String[] args) {
      String s = Foo.A.s; //IllegalAccessError
   }
}

When executing this code, the VM will issue an IllegalAccessError; the problem is that the anonymous class for the enum constant Foo$A is package-private; as a result, an attempt to access a public field in a package-private class from another package will result in an access error. To overcome this problem, the enum constant class should have the same access modifier as the enum class in which it is defined.

Source compatibility

From a source compatibility perspective, there are cases in which sharper typing could leak out as a result of an interaction between this feature and type inference - consider the following code:

EnumSet<Test> e = EnumSet.of(Test.A);

The code above used to behave in a relatively straightforward fashion: the static type of Test.A is simply Test, meaning that inferring the type-variable of EnumSet.of was simple, as both constraints named the type Test. But if we change the way in which Test.A is type-checked, the behavior gets more interesting: the type-variable of EnumSet.of will get two competing constraints: it must be equal to Test (from the target-type) and it must be a supertype of Test.A. Luckily, in such a scenario, type inference is smart enough to prefer the stricter equality constraint, and ends up inferring Test. All things considered, the source compatibility impact of this change is not too different from the one in JDK-8075793, where the change caused capture variables to appear in more places instead of their upper bounds.
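The same interplay between an equality constraint from the target type and a supertype constraint from the argument can be seen with ordinary classes. The sketch below is an analogy, not the enum case itself; Base, Derived and the two helper methods are hypothetical names, and List.of requires Java 9 or later.

```java
import java.util.List;

class InferenceDemo {
    static class Base { }
    static class Derived extends Base { }

    // With a target type: E must equal Base and be a supertype of Derived,
    // so inference settles on Base.
    static List<Base> withTargetType() {
        List<Base> list = List.of(new Derived());
        return list;
    }

    // Without a target type: only the supertype constraint remains, so the
    // sharper type Derived is inferred and becomes visible in the result.
    static List<Derived> withoutTargetType() {
        var list = List.of(new Derived());
        return list;
    }

    public static void main(String[] args) {
        System.out.println(withTargetType().size());
        System.out.println(withoutTargetType().size());
    }
}
```

The second method is where a sharper type can leak out, which is why the absence of a target type is the interesting case for source compatibility.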

Risks and Assumptions

This proposal has two main risks outlined in the sections above:

change in binary names of enum constants could lead to issues with core reflection
change in typing of enum constants could result in subtle changes in method type inference, especially in the absence of a target-type

The first problem is probably nothing to be concerned about; as has been shown, binary names of enum constants are currently very fragile and prone to re-ordering issues. As a result, any code that is relying on the binary name of an enum constant is inherently fragile, as it is essentially relying on the output of a specific compiler.

The second problem is more worrisome, as it could cause source incompatibilities. In order to detect how frequent the source incompatibility scenario described above could be, we have measured how many times the EnumSet.of method was called with various arities; for each call we kept track of whether the call occurred in a context where a target type was available. Below are the results (the measurements have been taken against the full OpenJDK forest).

Total calls to EnumSet.of: 150
- calls with arity = 1 : 69
  - of which, without target-type: 0

In other words, the source incompatibility scenario described above does not seem to pose any serious threat.

Dependencies

The sharper types used for enum constants are not necessarily denotable; these would constitute another category of non-denotable types. This may interact with the treatment of non-denotable types in JEP-286 (Local Variable Type Inference). Depending on decisions made in JEP-286 regarding non-denotable types, one might be able to say:

var a = Argument.STRING;

and have the type of a be the sharper type Argument<String> rather than the coarser type Argument.