JEP draft: Type operator expressions in the JVM
Owner | John Rose |
Type | Feature |
Scope | JDK |
Status | Draft |
Component | hotspot / runtime |
Created | 2018/06/13 06:48 |
Updated | 2024/09/25 17:27 |
Issue | 8204937 |
DRAFT DRAFT DRAFT
Summary
Extend the space of JVM type descriptors to include type operators, which are symbolic references to factory-made types. This is a separable component of template classes.
Goals
Allow JVM type descriptors (for methods, fields, and constants) to make new distinctions between types not already present in the system of classes, primitives, and arrays. Support future translation strategies which must make distinctions between different usages of the same basic JVM type, or which must provide a way to specify factory input to a class factory or template species factory.
Non-Goals
This work is a low-level VM hook, like invokedynamic, not a language feature like lambdas. As such, it will not propose any specific mechanism for representing parameterized types; it will only provide a necessary "hook" to name such types. It will not provide a new way to define classes; it will only provide a way to associate such classes with a public symbolic descriptor. It will not define any language features, nor translation strategies. It will not attempt to extend, conflict with, or rationalize the current syntax for static generic signatures (JVMS 4.7.9.1).
Success Metrics
Experimental translation strategies can be created which distinguish List<Integer> from List<String> in classfiles. Experimental class templating mechanisms will be able to create species that are denotable from JVM type descriptors. Designers of language features and translation strategies will be able to vary the encodings of new source-level types by changing a bootstrap method, rather than changing the JVM's core logic. Security proofs will be easier to construct, given the black-box nature of type operators, decoupled from the complex details of templates and other advanced language features. Experimental migration strategies can be tested without fully instantiating new language features, since new place-holder types can easily be posited by simple changes in javac.
Motivation
Descriptors which can denote complex type instances, such as List<int> or List<ComplexDouble>, are a necessary component of "reified generics", which in turn are a goal of Project Valhalla. If a value type is to "code like a class, work like an int", then it seems necessary to be able to denote container types which are customized to that value type, rather than being erased to Object like a reference type.
Description
We will extend the JVM's fundamental syntax for field descriptors, once for all future type schemes (we hope!). The syntax will allow any single type descriptor to be modified by an optional suffix, which has the effect of constraining the original type descriptor in an ad hoc, programmable manner. The combination of the original type descriptor and the suffix is called a type operator expression.
The resolvable semantic elements of this expression are:
- carrier type: the original type descriptor (before the suffix)
- type operator name: a resolved class name and/or simple identifier
- type arguments: one or more type descriptors and/or other constants
All of the above semantic elements are optional; any may be omitted. If the type operator name is omitted, it will be derived from the carrier type, as in the case of a template class whose top type is the unspecialized class itself. If the carrier type is omitted, it is defined to be Object, the customary carrier for untyped values in the JVM.
For example, here are some potential use cases for type operator expressions:
- reified generics: The carrier type is Map, the type operator name is omitted, and the arguments are int and String. The whole expression denotes Map<int,String>.
- wildcards: The carrier type is List, the type operator name is omitted, and the argument is the symbol (not type) ?. The whole expression denotes List<?>, as distinct from raw List. Given that wildcards are a special case of a concept called "existential types", it is notable that type operator expressions provide a way to wrap any bounded type (a carrier type) inside a symbolically labeled existential type.
- non-nullable references: The carrier type is String and the type operator is ! (or java/lang/NotNull) with no arguments. The whole expression denotes String!, a non-nullable string reference.
- nullable values: The omitted carrier type defaults to Object and the type operator is ? (or java/lang/Nullable) with one argument int. The whole expression denotes int?, a nullable integer.
- reified intersections: The carrier type is some interface I and the type operator is & with a type argument J. The whole expression denotes the intersection type I&J.
- reified unions: The omitted carrier type defaults to Object and the type operator name is | with two or more type arguments I, J. The whole expression denotes the union type I|J.
- fixed-sized arrays: The carrier type is an array type double[] and the type operator name is Array.length with one argument 5. The whole expression denotes double[5], a length-constrained array.
- range constraints: The carrier type is a primitive type int and the type operator name is Integer.interval with arguments ge and 0. The whole expression denotes int constrained to non-negative values.
- null and notreached type tokens: The omitted carrier type defaults to Object and the type operator is java/lang/Null or java/lang/NotReached. The whole expression denotes a reference constrained to be null, or a reference that is never delivered to its consumer (i.e., the constraint always fails).
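The use cases above all share one shape: an optional carrier, an optional operator name, and a list of arguments. As a rough illustration only, that shape and its defaulting rules can be modeled as a small Java record. The record name, its field names, and the choice to model "derived from the carrier type" by simply reusing the carrier's name are assumptions made for this sketch, not part of the proposal.

```java
import java.util.List;

/** Hypothetical model of the three semantic elements; not an API proposed by this JEP. */
record TypeOperatorExpression(String carrier, String operatorName, List<String> arguments) {

    TypeOperatorExpression {
        // An omitted carrier type is defined to be Object.
        if (carrier == null) carrier = "Object";
        // An omitted operator name is derived from the carrier type.
        // Assumption: the derivation is modeled here as reusing the carrier name.
        if (operatorName == null) operatorName = carrier;
        // An omitted argument list is treated as empty.
        arguments = (arguments == null) ? List.of() : List.copyOf(arguments);
    }

    public static void main(String[] args) {
        // Map<int,String>: carrier Map, operator omitted, two type arguments.
        System.out.println(new TypeOperatorExpression("Map", null, List.of("int", "String")));
        // int?: carrier omitted, operator ?, one type argument.
        System.out.println(new TypeOperatorExpression(null, "?", List.of("int")));
        // String!: carrier String, operator !, no arguments.
        System.out.println(new TypeOperatorExpression("String", "!", null));
    }
}
```

The record only captures the semantic elements; it says nothing about how they are spelled in a descriptor string.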
The concrete grammar for such descriptors, including new productions, will be something like the following:
MethodType: '(' (FieldType)* ')' (FieldType | 'V')
FieldType: PrimitiveType | ArrayType | ObjectType | *TypeExpr
PrimitiveType: 'B' | 'C' | 'D' | 'F' | 'I' | 'J' | 'S' | 'Z'
ArrayType: '[' (PrimitiveType | ArrayType | ObjectType)
ObjectType: 'L' ClassName ';'
*TypeExpr: TypeCarrier '/' (TypeOpName)? (';' | '[' (TypeArg)+ ']' )
*TypeCarrier: FieldType | 'L'
*TypeOpName: '$' Identifier | ('L' ClassName) (';' '$' Identifier)?
*TypeArg: FieldType | MethodType | NameArg | NumberArg
*NumberArg: ('-')? DigitNotZero (Digit)* ';' | '0' ';'
*NameArg: '$' Identifier ';'
Identifier: (any character except '.' ';' '[' '/' '<' '>' ':')*
This grammar is built on a slightly edited form of the one in JVMS 4.3. The new productions which support type operators are TypeExpr, TypeCarrier, TypeOpName, TypeArg, NumberArg, and NameArg. (They are starred.) The production for Identifier is taken from JVMS 4.7.9.1.
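As a small illustration of the two simplest new productions, NumberArg and NameArg can each be checked with a regular expression transcribed directly from the grammar above. The class and method names are hypothetical, and the sketch covers only these two productions, not the full descriptor grammar.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; checks only the NumberArg and NameArg productions. */
final class ArgSyntax {
    // NumberArg: ('-')? DigitNotZero (Digit)* ';' | '0' ';'
    static final Pattern NUMBER_ARG = Pattern.compile("(-?[1-9][0-9]*|0);");

    // NameArg: '$' Identifier ';'
    // Identifier: any characters except '.' ';' '[' '/' '<' '>' ':'
    static final Pattern NAME_ARG = Pattern.compile("\\$[^.;\\[/<>:]*;");

    static boolean isNumberArg(String s) {
        return NUMBER_ARG.matcher(s).matches();
    }

    static boolean isNameArg(String s) {
        return NAME_ARG.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5;", "0;", "-12;", "007;", "$ge;", "$a/b;"}) {
            System.out.println(s + " number=" + isNumberArg(s) + " name=" + isNameArg(s));
        }
    }
}
```

Note that the grammar as written gives no spelling for leading zeros or for a negative zero, so strings such as 007; and -0; are rejected by this sketch.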
A TypeExpr denotes a fresh type which is treated by the JVM as distinct from any other type with a different descriptor string, including primitives, arrays, classes, and other TypeExprs.
The syntactic components of a TypeExpr are a TypeCarrier, a TypeOpName, and a sequence of zero or more TypeArgs. These denote the resolvable semantic components of a resolved type operator expression, which are respectively the carrier type, the type operator name, and the type arguments.
Two TypeExprs with exactly the same spelling denote the same type. Any FieldType which is a proper prefix of another FieldType is a proper supertype of the longer FieldType. Other than those relations, the JVM does not recognize any equivalences or relations between types with differently spelled TypeExprs.
In particular, the verifier treats every distinct type operator expression as a generic "black box" type, which starts with the carrier type and constrains it in some way, unknowable to the verifier.
Thus, the verifier will allow values of the type operator type to implicitly convert to its carrier type, or any supertypes of its carrier type, but it will not allow such values to be converted to any other type. Also, the verifier will not convert implicitly from a carrier type to a type operator type built on top of that carrier type; such conversions must be performed by explicit bytecode execution.
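The spelling-based rules above can be sketched as plain string comparisons. This is a minimal sketch under two assumptions: both inputs are already well-formed FieldType strings, and the ordinary class-hierarchy supertypes of the carrier are ignored. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the spelling-only relations between descriptors. */
final class SpellingRelations {

    /** Two descriptors with exactly the same spelling denote the same type. */
    static boolean sameType(String a, String b) {
        return a.equals(b);
    }

    /** A FieldType that is a proper prefix of another is its proper supertype. */
    static boolean isProperSupertype(String sup, String sub) {
        return sub.length() > sup.length() && sub.startsWith(sup);
    }

    /**
     * Implicit conversion is allowed from a type operator type to itself or to a
     * carrier it was built on, never from a carrier to the constrained type.
     */
    static boolean allowsImplicitConversion(String from, String to) {
        return sameType(from, to) || isProperSupertype(to, from);
    }

    public static void main(String[] args) {
        System.out.println(allowsImplicitConversion("I/$interval[$ge;0;]", "I"));
        System.out.println(allowsImplicitConversion("I", "I/$interval[$ge;0;]"));
        System.out.println(allowsImplicitConversion("I/$J;/$K;", "I/$K;/$J;"));
    }
}
```

The reverse direction, from carrier to constrained type, is deliberately absent here because the text requires it to go through explicit bytecode.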
Here are some syntax examples of descriptors containing type operator expressions (along with some hypothetical meanings):
- Ljava/util/Map;/[ID] (the type species Map<int,double>)
- Ljava/util/List;/[I] (the type species List<int>)
- Ljava/util/List;/[[I] (the type species List<int[]>)
- Ljava/util/List;/[$wild;] (the wildcard species of List)
- Ljava/util/List;/$Wild; (wildcard alternate spelling)
- Ljava/util/List;/$Wild[Ljava/lang/Object;] (wildcard alternate spelling)
- [D/$length[5;] (fixed-sized array double[5], not array of length-5 double)
- I/$interval[$ge;0;] (int whose value is non-negative)
- L/Ljava/util/TupleTemplate[Ljava/lang/String;I] (a pair of String and int)
- L/Ljava/util/TupleTemplate[FFF] (a triple of floats)
- Ljava/lang/String;/$N; (the N variant of String)
- (Ljava/lang/String;)Ljava/lang/String;/$N; (method wrapping an N-String)
- (Ljava/lang/String;/$N;)Ljava/lang/String; (method unwrapping an N-String)
- L/; (shortest possible expression, a trivially constrained Object)
- LFoo;/; (carrier type only, with trivial modification)
- L/$N; (type operator only, N-Object)
- L/[$Arg;] (lone argument with no type operator: no hypothetical meaning)
- L/LFoo[LBar;/$N;] (the type species Foo<N-Bar>)
- L/LFoo[LBar;]/$N; (the type species N-Foo<Bar>)
- [D/$length[5;]/$N; (N variant of fixed-sized array double[5])
- [D/$N/$length[5;]; (fixed-sized variant of N-variant of double[])
The last four examples show that type operator expressions can nest. For example, L/LFoo[LBar;/$N;] denotes a type which is derived first from Bar by modifying it with N, then passing the modified type to the parameterized type constructor Foo. (The carrier type of the result is Object, not Foo.) The last two examples show that type expressions can nest by piling up several TypeOp suffixes. The order of these suffixes is significant purely because the descriptor strings are different: I/$J;/$K; is a different verifier type from I/$K;/$J; even if the computational effects of the J and K type modifiers happen to commute.
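To make the two forms of nesting concrete, here is a minimal sketch of a string builder that assembles a TypeExpr from an already-spelled carrier, an operator name, and already-spelled arguments, following the TypeExpr production. The class and method names are hypothetical, and no validation of the pieces is attempted.

```java
/** Hypothetical helper that spells TypeExpr strings; performs no validation. */
final class TypeExprBuilder {

    /**
     * Spells TypeCarrier '/' (TypeOpName)? (';' | '[' (TypeArg)+ ']').
     * Pass "L" for an omitted carrier and "" for an omitted operator name.
     */
    static String typeExpr(String carrier, String opName, String... args) {
        StringBuilder sb = new StringBuilder(carrier).append('/').append(opName);
        if (args.length == 0) {
            sb.append(';');                 // no argument list: terminate with ';'
        } else {
            sb.append('[');
            for (String arg : args) {
                sb.append(arg);             // each TypeArg is already self-delimiting
            }
            sb.append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Nesting inside an argument: Foo<N-Bar>
        System.out.println(typeExpr("L", "LFoo", typeExpr("LBar;", "$N")));
        // Nesting by piling up suffixes: N-Foo<Bar>
        System.out.println(typeExpr(typeExpr("L", "LFoo", "LBar;"), "$N"));
        // Suffix order changes the spelling, hence the type.
        System.out.println(typeExpr(typeExpr("I", "$J"), "$K"));
        System.out.println(typeExpr(typeExpr("I", "$K"), "$J"));
    }
}
```

Passing a finished TypeExpr as an argument nests it inside the brackets, while passing it as the carrier appends another suffix after it.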
The JVM will accept type operator expressions, structured as TypeExpr strings, in the following contexts:
- class field types -- allocated as "black box" references
- class method types -- treated as "black box" arguments and returns
- CONSTANT_NameAndType types -- resolvable black box types (field or method)
- CONSTANT_Class names -- resolvable type expressions with programmed resolution
- instanceof operands (via CONSTANT_Class) -- resolvable types with programmed behavior
- checkcast, anewarray operands -- similar to instanceof
- invokevirtual receivers (via CONSTANT_Methodref) -- programmably resolved methods
- getstatic, putfield, etc. -- similar to invokevirtual
Normally, descriptor syntaxes are disjoint from the syntax of class names that appear with CONSTANT_Class constants. For example, the descriptor I is very different from the class name I. However, in some cases the syntaxes can overlap; the class name of an array is the same as its descriptor, including the trailing semicolon. We use this trick with type operator expressions also, so that the same type operator expression can be inserted directly into a descriptor, and also used as a class name.
A class name string can be unambiguously distinguished as a type operator expression in three steps:
- check if the last character is ] or ; (otherwise, fail)
- if the string begins with [, parse the array type name and look for a following /
- otherwise, scan the string to see if the character ; or [ occurs
If the first step and any of the remaining steps pass, then the class name string is proven not to be a plain class name or an array class name, and may be assumed to be a type operator expression (or else an erroneous input). Otherwise it can be assumed to be a plain class name (or array class name). Another, simpler technique (though perhaps a slower one) is simply to parse the class name string as a simple class or array name, and see if the end is reached, or else the next remaining character is slash / introducing a type operator suffix; in that case the second step must be executed first.
The second and third steps are expensive but necessary; they can be deferred until after the first step, which is cheap. Note that the JVMS specifies that a class name may not contain an open bracket [ unless it is an array type name, and in that case the bracket will not follow a package separator /. Therefore the class name grammar is not ambiguous, even after type operator expressions are added.
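The three-step check described above can be sketched as follows. This is an illustrative reading of the steps, not a specification: the class and method names are hypothetical, and a string that passes is only a candidate type operator expression, which may still turn out to be erroneous input.

```java
/** Hypothetical sketch of the three-step class-name check. */
final class ClassNameClassifier {

    static boolean looksLikeTypeExpr(String name) {
        if (name.isEmpty()) return false;
        // Step 1 (cheap): the last character must be ']' or ';'.
        char last = name.charAt(name.length() - 1);
        if (last != ']' && last != ';') return false;
        // Step 2: for an array name, look for a '/' right after the array type.
        if (name.charAt(0) == '[') {
            int end = endOfArrayName(name);
            return end >= 0 && end < name.length() && name.charAt(end) == '/';
        }
        // Step 3: otherwise, scan for ';' or '[' anywhere in the string.
        return name.indexOf(';') >= 0 || name.indexOf('[') >= 0;
    }

    /** Returns the index just past a leading array type name, or -1 if malformed. */
    static int endOfArrayName(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == '[') i++;
        if (i == 0 || i == s.length()) return -1;
        char c = s.charAt(i);
        if (c == 'L') {
            int semi = s.indexOf(';', i);
            return semi < 0 ? -1 : semi + 1;
        }
        return "BCDFIJSZ".indexOf(c) >= 0 ? i + 1 : -1;
    }

    public static void main(String[] args) {
        String[] names = {
            "java/lang/String", "[Ljava/lang/String;", "[[I",
            "Ljava/util/List;/[I]", "[D/$length[5;]", "L/;"
        };
        for (String n : names) {
            System.out.println(n + " -> " + looksLikeTypeExpr(n));
        }
    }
}
```

Step 3 matters for names that end in ] without containing ; or [, which the first step alone would not rule out.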
Some operations on a type expression require access to the inside of the black box. These include loading a reflective constant for a type expression, making a type test (checkcast), making an array type whose component is the type expression, calling a method on an instance whose verified type is a type expression, etc.
The built-in resolution mechanism for type operator expressions will perform the following jobs:
- Derive a bootstrap method ("BSM") from the TypeCarrier and TypeOpName.
- Call the BSM on the TypeArgs, suitably parsed and reified.
- (Also pass relevant context, such as the current class, the carrier, and the operator name.)
- Receive in reply from the BSM a resolved type descriptor for the type.
- Permanently and atomically record that descriptor for that exact type expression.
- Use the descriptor to derive the various behaviors required for that type.
The details of these steps and the associated APIs are defined elsewhere, and may be extended over time. See below for a sketch of resolved type descriptors and their behavior. Type operators are named by an optional class and optional identifier. If the class is present, it will help determine the bootstrap method; for example, if it is a template, the template will be specialized to the given arguments. If the identifier only is present, the BSM will be a centralized one which assigns fixed standard meanings to a small number of names.
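The "permanently and atomically record" step can be illustrated with a small cache keyed by the exact spelling of the type expression. This is a sketch under stated assumptions: the class name is hypothetical, the bootstrap method is stood in for by a plain java.util.function.Function, and nothing here reflects how the JVM would actually store resolution state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: resolve each distinct type expression at most once. */
final class TypeExprResolutionCache<D> {
    private final ConcurrentHashMap<String, D> resolved = new ConcurrentHashMap<>();
    private final Function<String, D> bootstrap;   // stand-in for the BSM call

    TypeExprResolutionCache(Function<String, D> bootstrap) {
        this.bootstrap = bootstrap;
    }

    /** Returns the recorded descriptor, invoking the stand-in BSM only on first use. */
    D resolve(String typeExpr) {
        // computeIfAbsent records the result atomically for this exact spelling.
        return resolved.computeIfAbsent(typeExpr, bootstrap);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        TypeExprResolutionCache<String> cache =
            new TypeExprResolutionCache<>(expr -> "descriptor#" + calls.incrementAndGet());
        System.out.println(cache.resolve("I/$N;"));
        System.out.println(cache.resolve("I/$N;"));   // same recorded result
        System.out.println(cache.resolve("I/$K;"));   // different spelling, new resolution
    }
}
```

Keying on the raw string matches the rule that only identical spellings denote the same type.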
When value types become available, type operator expressions will also be allowed to interoperate with value types. A given type operator expression will always be unambiguously assigned a kind, as a value or a reference. If other kinds are invented, type operator expressions will be "kinded" in the same way. For example, the '$' could be followed by a kind character, or additional characters besides '$' could be assigned to introduce type operator expressions of various distinct kinds.
The resolved type descriptor will not be a Class but will have its own reflective type and API. The descriptor will report a concrete carrier Class which is compatible with all values described by the original type operator expression. The BSM for a type operator may return a resolved type descriptor which reports only Object as its carrier class, or it may spin and load a new anonymous class, and use that. In either case, the JVM will be able to use the carrier class as a safe supertype for the type operator expression. The JVM will not freely convert from the carrier class to the type operator type, except via a checkcast bytecode, whose behavior is under the control of the resolved type descriptor selected by the BSM.
Note that the type operator expression language is self-contained and pre-normalized. It does not make references into any constant pool, nor is there any "calculus" for proving that two distinct type expressions denote the same type.
The API for resolved type descriptors will be something like this:
interface ResolvedTypeDescriptor<T extends C, C> {
    Class<T> resolvedType();
    Class<C> carrierClass();

    // Body not specified in this sketch.
    static <T> ResolvedTypeDescriptor<T,?> of(Class<T> clazz);

    // These defaults may be wired into the JVM bytecodes if desired.
    default boolean isInstance(C x) {
        if (this != of(carrierClass())) throw subclassResponsibility();
        return carrierClass().isInstance(x);
    }
    default boolean isAssignableFrom(ResolvedTypeDescriptor<?,?> subDesc) {
        if (this != of(carrierClass())) throw subclassResponsibility();
        return carrierClass().isAssignableFrom(subDesc.resolvedType());
    }
    default T cast(C x) {
        if (this != of(carrierClass())) throw subclassResponsibility();
        // Unchecked: T and C coincide on this default path.
        return (T) carrierClass().cast(x);
    }
    default T newArray(int length) {
        if (this != of(carrierClass())) throw subclassResponsibility();
        // Unchecked: Array.newInstance returns Object.
        return (T) java.lang.reflect.Array.newInstance(carrierClass().getComponentType(), length);
    }
    default MethodHandle findVirtual(Lookup lookup, String name, MethodTypeDescriptor type) {
        if (this != of(carrierClass())) throw subclassResponsibility();
        return lookup.findVirtual(carrierClass(), name, type.asMethodType());
    }
    // Returns (rather than throws) so that callers can write "throw subclassResponsibility()".
    private static RuntimeException subclassResponsibility() {
        return new IllegalArgumentException();
    }

    /**
     * Initial entry point called from the VM when a type operator
     * expression must be resolved.
     */
    static <C> ResolvedTypeDescriptor<?,C> initialMetafactory(
        Lookup lookup, TypeDescriptorBootstrapCallInfo<C> bci
    ) throws BootstrapMethodError {
        String descriptor = bci.invocationName();
        Class<C> carrierType = bci.invocationType();
        Class<?> typeOpClass = bci.typeOperatorClass();
        String typeOpName = bci.typeOperatorName();
        List<Object> typeOpArgs = bci.asList();
        ...
    }
}
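To illustrate how a BSM-supplied descriptor might take control of checkcast, here is a self-contained sketch using a cut-down stand-in for the interface above. MiniDescriptor and NonNegativeInt are hypothetical names invented for this example, Integer stands in for the int carrier, and the behavior shown is only one plausible reading of "range constraints".

```java
/** Hypothetical descriptor modeling I/$interval[$ge;0;], with Integer as carrier. */
final class NonNegativeInt implements MiniDescriptor<Integer> {
    @Override public Class<Integer> carrierClass() { return Integer.class; }

    /** The constraint: the value must be present and non-negative. */
    @Override public boolean isInstance(Integer x) { return x != null && x >= 0; }

    @Override public String toString() { return "I/$interval[$ge;0;]"; }

    public static void main(String[] args) {
        MiniDescriptor<Integer> nonNegative = new NonNegativeInt();
        System.out.println(nonNegative.cast(5));
        try {
            nonNegative.cast(-1);
        } catch (ClassCastException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

/** Hypothetical, cut-down stand-in for ResolvedTypeDescriptor. */
interface MiniDescriptor<C> {
    Class<C> carrierClass();

    boolean isInstance(C x);

    /** Models the explicit carrier-to-constrained conversion done by checkcast. */
    default C cast(C x) {
        if (!isInstance(x)) {
            throw new ClassCastException(x + " does not satisfy " + this);
        }
        return x;
    }
}
```

The carrier value passes through unchanged when the constraint holds, which is consistent with the carrier class being a safe supertype of the constrained type.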
It is an open question whether any of the ResolvedTypeDescriptor API should be merged into the Class API. That decision could create a set of secondary "crasses" (runtime type quasi-classes) which do not directly represent a classfile, but instead represent a type somehow derived from or related to one or more classfiles. There is some precedent for this, since the existing Class instances for primitives and void, and for arrays, may be viewed as "crasses". In that case, the carrierClass API would probably be named getPrimaryClass, and would map a "crass" to its nearest proper supertype (or Object or an interface), and there would be a new query isTypeExpression.
Keeping the ResolvedTypeDescriptor API disjoint from the legacy Class API would be cleaner, but would also require us to duplicate or extend many APIs, such as Lookup, in which Class is a proxy for a JVM type descriptor. An interface TypeDescriptor (proposed by the Constable project) may give us a hook to generify those APIs, rather than brutally duplicating them, and without introducing "crasses".
Alternatives
This design can be viewed as a refinement of an earlier experimental mechanism called "class-dynamic", which decoded a sub-language from class name strings and spun classfiles on the fly in response to resolution requests. That mechanism funneled the type operator expression through the class name, which is similar to the above design, but makes no distinction between a regular class reference and a type operator expression.
The integration of type operators into the JVM seems to be cleaner if the distinction between regular named classes and type expressions is explicit from the beginning. In addition, we do not want to commit to spinning classfiles in response to type operators; some use cases of type operators intentionally alias regular classes, but with some extra "annotation" payload injected. This cannot be done in a framework which confuses class names with type expressions.
When we design template classes, we could attempt to add a purpose-built descriptor syntax designed expressly for templates. However, a design like the one in this JEP would be needed anyway.
We could try to live without reified generics altogether, in which case the existing type descriptors would be serviceable.
Testing
// What kinds of test development and execution will be required in order // to validate this enhancement, beyond the usual mandatory unit tests?
Risks and Assumptions
// Describe any risks or assumptions that must be considered along with // this proposal.
Dependencies
// Describe all dependencies that this JEP has on other JEPs, JBS issues, // components, products, or anything else.
Design FAQ
DRAFT DRAFT DRAFT The following section will be part of the comments, not the JEP proper.
- You didn't use dot . for type operator syntax; why not? Because in some pathways, descriptors flow through class names, and slashes are converted to dots and vice versa. Any distinction between slash and dot would be lost at that point, without complicated context-sensitive rules for dot-preservation or dot-recovery.
- That grammar is complicated: Everything seems optional. Why not get rid of some optionality? Briefly, each optionality is motivated as follows. The TypeCarrier could be removed in favor of making it always Object, but many use cases for type operators work within a static bound type, and it is wasteful not to allow that static bound to appear as a true verifier type. Given a TypeCarrier, it makes sense that the actual type operator should sometimes be derived directly from the carrier and at other times be a separately specified parameter, hence the optionality of the TypeOpName. But if the TypeOpName is unrelated to the carrier type, the carrier is often Object, hence a special abbreviation for that common case that makes the TypeCarrier optional. So the carrier can be either identical with the type operator, or completely separate. The argument list is optional since some type operators inherently require arguments while some are "just the mode" (as with "not null"). The trailing semicolon ; for a missing TypeArg list is a judgment call; it could be denoted instead by [], but that seems egregiously noisy for a simple modifier like "not null", and requiring a non-empty TypeArg list in the brackets adds trivial complexity.
- That grammar is complicated: Why are there different ways to denote a type operator? Dropping the TypeOpName allows the carrier and the type operator to come from the same class, as noted above, while allowing the type operator to be a fully resolved class name gives obvious modularity benefits. In the latter case, allowing an additional name to select a class member gives a way for one class to expose a library of type operators. The final case, of a simple identifier, allows either the carrier type to select a class member (or "mode" argument such as "wildcarded"), or the system to globally define a handful of type operators outside of the package scoping system: ! (for "not null") and ? (for "maybe null") are two such likely global operators.
- That grammar is complicated: You allow too many kinds of type operator arguments. Why not just have types as parameters? Type arguments are clearly all you need to upgrade today's generics in place, to reify their types inside of descriptors. But this is short-sighted, since C++ templates allow many other kinds of arguments. The grammar chosen above allows specification of a reasonable array of non-type arguments corresponding to common use cases of template arguments in C++ and other languages. After types, strings are the obvious next candidate, and indeed strings can denote anything else we need, and are agreeably fundamental in the JVM. We threw in MethodType because that is a fundamental construct in the JVM, and shouldn't be passed through a stringy encoding channel. We threw in NumberArg because small integral numbers are fundamental in various use cases, such as definite arrays. All of the above correspond to natively encoded constant pool entries (except integers which are larger than a long).
- You forgot MethodHandle and Double arguments; aren't those fundamental also? Yes, they are, but they can be readily encoded to bootstrap methods using combinations of the other argument types, and designing a hardwired stringy encoding for them would be needlessly complex. For a method handle, just pass several arguments denoting its class, name, and type, with maybe a ref-kind also. For a floating point number, consider using a string containing its hex-float representation, to avoid problems with rounding and ambiguity.
- Those identifier strings are useless without a way to quote the illegal characters; why not have strings with proper quoting? The limitations on TypeArg strings are the same as those on class names, and there are standard systems (such as the "Symbolic Freedom" encoding) for representing the handful of illegal characters using escape sequences. Bootstrap methods which need general strings should use such a scheme. This is much easier than somehow telling the JVM it must start allowing hitherto "dangerous characters" in small parts of the descriptor grammar.
- Constrained primitive types, seriously? An earlier version of the grammar assumed that the only carrier type was Object, allowing the "head" of the type operator expression to be a type operator name (such as a template class). This had two major downsides: First, it didn't capture the fact that a template might well be the supertype of all its instances; this is certainly true for containers like List<int>; throwing away that type bound means more checkcast bytecodes to restore it in method code, which seems a sorry waste. Second, the L descriptor letter might be augmented (at some point) by additional classy descriptors (such as the Q descriptor of the "minimal value type" prototype). Allowing carrier types to be any pre-existing verifier types seems prudent. Given that, the primitive types and arrays come in pretty much "for free", although it would be reasonable to disallow constrained primitives if that turns out to be hard to implement, and add them in later when primitives are unified more fully with other types.
- Why doesn't the ArrayType production mention FieldType any more? The array type syntax is our sole legacy syntax that is similar to a type operator. From a prior component type it creates a complex new array object type. We don't want to pretend that there is a way to customize that array object type by adding arbitrary "tweaks" to its component type -- it is hard enough to manage constrained scalar types without cutting them into the "guts" of the JVM's built-in array object mechanism. We take the simpler choice of allowing array instances to be constrained without asking questions about what is inside them. When arrays are virtualized (made instances of interfaces) then we can fully nest constraints within array component types, but not until then.
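The FAQ item on MethodHandle and Double arguments suggests encoding both through the other argument kinds. The following sketch shows what a bootstrap method might do on the receiving end, using only existing java.lang.invoke and java.lang.Double calls. The class and method names are hypothetical, and only the virtual-method case is shown; a ref-kind argument would be needed to select other lookup modes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Hypothetical decoding helpers a bootstrap method might use. */
final class EncodedArgs {

    /** Rebuilds a virtual method handle from its class, name, and method descriptor. */
    static MethodHandle virtualHandle(Class<?> owner, String name, String methodDescriptor)
            throws ReflectiveOperationException {
        MethodType type =
            MethodType.fromMethodDescriptorString(methodDescriptor, owner.getClassLoader());
        return MethodHandles.publicLookup().findVirtual(owner, name, type);
    }

    /** Encodes a double as a hex-float string, which round-trips without rounding. */
    static String encodeDouble(double value) {
        return Double.toHexString(value);
    }

    /** Decodes a hex-float string produced by encodeDouble. */
    static double decodeDouble(String hexFloat) {
        return Double.parseDouble(hexFloat);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle length = virtualHandle(String.class, "length", "()I");
        System.out.println((int) length.invoke("abc"));
        String encoded = encodeDouble(0.1);
        System.out.println(encoded + " -> " + decodeDouble(encoded));
    }
}
```

A hex-float string contains a dot, which the Identifier production excludes, so in practice it would also need an escaping scheme of the kind the FAQ item on quoting describes.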