JEP draft: Type operator expressions in the JVM

OwnerJohn Rose
TypeFeature
ScopeJDK
StatusDraft
Componenthotspot / runtime
Created2018/06/13 06:48
Updated2024/09/25 17:27
Issue8204937

DRAFT DRAFT DRAFT

Summary

Extend the space of JVM type descriptors to include type operators, which are symbolic references to factory-made types. This is a separable component of template classes.

Goals

Allow JVM type descriptors (for methods, fields, and constants) to make new distinctions between types not already present in the system of classes, primitives, and arrays. Support future translation strategies which must make distinctions between different usages of the same basic JVM type, or which must provide a way to specify factory input to a class factory or template species factory.

Non-Goals

This work is a low-level VM hook, like invokedynamic, not a language feature like lambdas. As such, it will not propose any specific mechanism for representing parameterized types; it will only provide a necessary "hook" to name such types. It will not provide a new way to define classes; it will only provide a way to associate such classes with a public symbolic descriptor. It will not define any language features, nor translation strategies. It will not attempt to extend, conflict, or rationalize the current syntax for static generic signatures (JVMS 4.7.9.1).

Success Metrics

Experimental translation strategies can be created which distinguish List<Integer> from List<String> in classfiles. Experimental class templating mechanisms will be able to create species that are denotable from JVM type descriptors. Designers of language features and translation strategies will be able to vary the encodings of new source-level types by changing a bootstrap method, rather than changing the JVM's core logic. Security proofs will be easier to construct, given the black-box nature of type operators, decoupled from the complex details of templates and other advanced language features. Experimental migration strategies can be tested without fully instantiating new language features, since new place-holder types easily be posited by simple changes in javac.

Motivation

Descriptors which can denote complex type instances, such as List<int> or List<ComplexDouble> are a necessary component of "reified generics", which in turn are a goal of Project Valhalla. If a value type is to "code like a class, work like an int", then it seems necessary to be able to denote container types which are customized to that value type, rather than being erased to Object like a reference type.

Description

We will extend the JVM's fundamental syntax for field descriptors, once for all future type schemes (we hope!). The syntax will allow any single type descriptor to be modified by an optional suffix, which has the effect of constraining the original type descriptor in an ad hoc, programmable manner. The combination of the original type descriptor and the suffix is called a type operator expression.

The resolvable semantic elements of this expression are:

All of the above semantic elements are optional; any may be omitted. If the type operator name is omitted, it will be derived from the carrier type, as in the case of a template class whose top type is the unspecialized class itself. If the carrier type is omitted, it is defined to be Object, the customary carrier for untyped values in the JVM.

For example, here are some potential use cases for type operator expressions:

The concrete grammar for such descriptors, including new productions, will be something like the following:

MethodType: '(' (FieldType)* ')' (FieldType | 'V')
FieldType: PrimitiveType | ArrayType | ObjectType | *TypeExpr
PrimitiveType: 'B' | 'C' | 'D' | 'F' | 'I' | 'J' | 'S' | 'Z'
ArrayType: '[' (PrimitiveType | ArrayType | ObjectType)
ObjectType: 'L' ClassName ';'
*TypeExpr: TypeCarrier '/' (TypeOpName)? (';' | '[' (TypeArg)+ ']' )
*TypeCarrier: FieldType | `L`
*TypeOpName: '$' Identifier | ('L' ClassName) (';' '$' Identifier)?
*TypeArg: FieldType | MethodType | NameArg | NumberArg
*NumberArg: ('-')? DigitNotZero (Digit)* ';' | '0' ';'
*NameArg: '$' Identifier ';'
Identifier: (any character except '.' ';' '[' '/' '<' '>' ':')*

This grammar is built on slightly edited form of the one in JVMS 4.3. The new productions which support type operators are TypeExpr, TypeCarrier, TypeOpName, TypeArg, NumberArg, and NameArg. (They are starred.) The production for Identifier is taken from JVMS 4.7.9.1.

A TypeExpr denotes a fresh type which is treated by the JVM as distinct from any other type with a different descriptor string, including primitives, arrays, classes, and other TypeExprs.

The syntactic components of a TypeExpr are a TypeCarrier, a TypeOpName, and a sequence of zero or more TypeArgs. These denote the resolvable semantic components of a resolved type operator expression, which are respectively the carrier type, the type oeprator name, and the type arguments.

Two TypeExprs with exactly the same spelling denote the same type. Any FieldType which is a proper prefix of another FieldType is a proper supertype of the longer FieldType. Other than those relations, the JVM does not recognize any equivalences or relations between types with differently spelled TypeExprs.

In particular, the verifier treats every distinct type operator expression as a generic "black box" type, which starts with the carrier type and constrains it in some way, unknowable to the verifier.

Thus, the verifier will allow values of the type operator type to implicitly convert to its carrier type, or any supertypes of its carrier type, but it will not allow such values to be converted to any other type. Also, the verifier will not convert implicitly from a carrier type to a type operator type built on top of that carrier type; such conversions must be performed by explicit bytecode execution.

Here are some syntax examples of descriptors containing type operator expressions (along with some hypothetical meanings):

The last four examples show that type operator expressions can nest. For example, L/LFoo[LBar;/$N;] denotes a type which is derived first from Bar by modifying it with N, then passing the modified type to the parameterized type constructor Foo. (The carrier type of the result is Object, not Foo.) The last two examples show that type expressions can nest by piling up several TypeOp suffixes. The order of these suffixes is significant purely because the descriptor strings are different: I/$J;/$K; is a different verifier type from I/$K;/$J; even if the computational effects of the J and K type modifiers happen to commute.

The JVM will accept type operator expressions, structured as TypeExpr strings, in the following contexts:

Normally, descriptor syntaxes are disjoint from the syntax of class names that appear with CONSTANT_Class constants. For example, the descriptor I is very different from the class name I. However, in some cases the syntaxes can overlap; the class name of an array is the same as its descriptor, including the trailing semicolon. We use this trick with type operator expressions also, so that the same type operator expression can be inserted directly into a descriptor, and also used as a class name.

A class name string can be unambiguously distinguished as a type operator expression in three steps:

If the first step and any of the remaining steps pass, then the class name string is proven not to be a plain class name or an array class name, and may be assumed to be a type operator expression (or else an erroneous input). Otherwise it can be assumed to be a plain class name (or array class name). Another simpler technique (though perhaps a slower one) is simply to parse the class name string as a simple class or array name, and see if the end is reached, or else the next remaining character is slash / introducing a type operator suffix; in that case the second step must be executed first.

The second and third steps are expensive but necessary, but can be deferred until after the first step, which is cheap. Note that the JVMS specifies that a class name may not contain an open bracket [ unless it is an array type name, and in that case the bracket will not follow a package separator /. Therefore the class name grammar is not ambiguous, even after type operator expressions are added.

Some operations on a type expression require access to the inside of the black box. These include loading a reflective constant for a type expression, making a type test (checkcast), making an array type whose component is the type expression, calling a method on an instance whose verified type is a type expression, etc.

The built-in resolution mechanism for type operator expressions will perform the following jobs:

The details of these steps and the associated APIs are defined elsewhere, and may be extended over time. See below for a sketch of resolved type descriptors and their behavior. Type operators are named by an optional class and optional identifier. If the class is present, it will help determine the bootstrap method; for example, if it is a template, the template will be specialized to the given arguments. If the identifier only is present, the BSM will be a centralized one which assigns fixed standard meanings to a small number of names.

When value types become available, type operator expressions will also be allowed to interoperate with value types. A given type operator expression will always be unambiguously assigned a kind, as a value or a reference. If other kinds are invented, type operator expressions will be "kinded" in the same way. For example, the '$' could be followed by a kind character, or additional characters besides '$' could be assigned to introduce type operator expressions of distinct various kinds.

The descriptor will not be a Class but will have its own reflective type and API. The descriptor will report a concrete carrier Class which is compatible with all values described by the original type operator expression. The BSM for a type operator may return a resolved type descriptor which reports only Object as its carrier class, or it may spin and load a new anonymous class, and use that. In either case, the JVM will be able to use the carrier class as a safe supertype for the type operator expression. The JVM will not freely convert from the carrier class to the type operator type, except via a checkcast bytecode, whose behavior is under the control of the resolved type descriptor selected by the BSM.

Note that the type operator expression language is self-contained and pre-normalized. It does not make references into any constant pool, nor is there any "calculus" for proving that two distinct type expressions denote the same type.

The API for resolved type descriptors will be something like this:

interface ResolvedTypeDescriptor<T extends C, C> {
  Class<T> resolvedType();
  Class<C> carrierType();
  static <T> ResolvedTypeDescriptor<T,?> of(Class<T> clazz);

  // These defaults may be wired into the JVM bytecodes if desired.
  default boolean isInstance(C x) {
    if (this != of(carrierClass()))  throw subclassResponsibility();
    return carrierClass().isInstance(x);
  }
  default boolean isAssignableFrom(ResolvedTypeDescriptor<?,?> subDesc) {
    if (this != of(carrierClass()))  throw subclassResponsibility();
    return carrierClass().isAssignableFrom(subDesc.resolvedType());
  }
  default T cast(C x) {
    if (this != of(carrierClass()))  throw subclassResponsibility();
    return carrierClass().cast(x);
  }
  default T newArray(int length) {
    if (this != of(carrierClass()))  throw subclassResponsibility();
    return java.lang.reflect.Array.newInstance(carrierClass().getComponentType(), length);
  }
  default MethodHandle findVirtual(Lookup lookup, String name, MethodTypeDescriptor type) {
    if (this != of(carrierClass()))  throw subclassResponsibility();
    return lookup.findVirtual(carrierClass(), name, type.asMethodType());
  }
  private static RuntimeException subclassResponsibility() {
    throw new IllegalArgumentException();
  }
  /**
   * Initial entry point called from the VM when a type operator
   * expression must be resolved.
   */
  static <C> ResolvedTypeDescriptor<?,C> initialMetafactory(
    Lookup lookup, TypeDescriptorBootstrapCallInfo<C> bci
  ) throws BootstrapMethodError {
    String descriptor = bci.invocationName();
    Class<C> carrierType = bci.invocationType();
    Class<?> typeOpClass = bci.typeOperatorClass();
    String typeOpName = bci.typeOperatorName();
    List<Object> typeOpArgs = bci.asList();
    ...
  }
}

It is an open question whether any of the ResolvedTypeDescriptor API should be merged into the Class API. That decision could create a set of secondary "crasses" (runtime type quasi-classes) which do not directly represent a classfile, but instead represent a type somehow derived from or related to one or more classfiles. There is some precedent for this, since the existing Class instances for primitives and void, and for arrays, may be viewed as "crasses". In that case, the carrierClass API would probably be named getPrimaryClass, and would map a "crass" to its nearest proper supertype (or Object or an interface), and there would be a new query isTypeExpression.

Keeping the ResolvedTypeDescriptor API disjoint from the legacy Class API would be cleaner, but would also require us to duplicate or extend many APIs, such as Lookup, in which Class is a proxy for a JVM type descriptor. An interface TypeDescriptor (proposed by the Constable project) may give us a hook to generify those APIs, rather than brutally duplicating them, and without introducing "crasses".

Alternatives

This design can be viewed as a refinement of an earlier experimental mechanism called "class-dynamic", which decoded a sub-language from class name strings and spun classfiles on the fly in response to resolution requests. That mechanism funneled the type operator expression through the class name, which is similar to the above design, but makes no distinction between a regular class reference and a type operator expression.

The integration of type operators into the JVM seems to be cleaner if the distinction between regular named classes and type expressions is explicit from the beginning. In addition, we do not want to commit to spinning classfiles in response to type operators; some use cases of type operators intentionally alias regular classes, but with some extra "annotation" payload injected. This cannot be done in a framework which confuses class names with type expressions.

When we design template classes, we could attempt to add a purpose-built descriptor syntax designed expressly for templates. However, a design like the one in this JEP would be needed anyway.

We could try to live without reified generics altogether, in which case the existing type descriptors would be serviceable.

Testing

// What kinds of test development and execution will be required in order // to validate this enhancement, beyond the usual mandatory unit tests?

Risks and Assumptions

// Describe any risks or assumptions that must be considered along with // this proposal.

Dependencies

// Describe all dependencies that this JEP has on other JEPs, JBS issues, // components, products, or anything else.

Design FAQ

DRAFT DRAFT DRAFT The following section will be part of the comments, not the JEP proper.