Room 101

Gilad Bracha's blog. A place to be (re)educated in Newspeak

Originally posted on Blogger on October 6th, 2018

Reified Generics: The Search for the Cure

Many have argued that run time access to generic type information is very important. A very bitter debate about this ensued when we added generics to Java. The topic recurs whenever one designs a statically typed object oriented language. Should one reify generic types, or erase them? Java chose erasure, .Net and Dart chose reification, and all three solutions are in my mind unsatisfactory for various reasons, including but not limited to the handling of erasure or its presumed alter ego, reification.

Pedantic note 1: Throughout this post, I will use the terms erasure and reification as shorthand for erasure and reification of generic type information.

In a well designed object-oriented language, erasure and reification are not contradictory at all. This statement might bear some explanation, so here we go ...

A while back, I discussed the problem of shadow language constructs. I gave examples of shadow constructs such as Standard ML modules, traditional imports etc. Here is another: reified generics.

Generics introduce a form a shadow parameterization. Programming languages all have a perfectly good mechanism for declaring parameterized constructs and invoking them. You may have heard of it - it is widely known by the name function, and it goes back to the 17th century.

Pedantic note 2: Yes, programming language functions are usually not mathematical functions. The parameterization mechanism is however, essentially the same.

Generics introduce a different form of formal and actual parameters. There is a purpose to that: static analysis. However, when languages try to provide run time access to these parameters (i.e., reification of generics), we are creating a lobotomized twin of the existing runtime parameter passing system. A new, redundant, confusing and costly set of mechanisms is added to the run time in order to declare, pass, store and access these parameters.

The first guiding principle of any solution is to avoid shadow constructs. We already have parameterization support, let's use it.

Generics are functions from types to types, typically classes to classes.

Pedantic note 3: If your language is prototype based, generics might be considered functions between prototypes. If your language has primitive types - well, you're up the creek without a paddle anyway. There is no justification for primitive types in an object oriented language.

If classes are expressions, we can write reified generics as ordinary functions. Here's some sample pseudo-code. It's given in a quasi-standard syntax, so I don't waste time explaining Newspeak syntax.

public var List = (function(T) {

return class {

var hd, tl;

class Link {

public datum;

public next;

}

public elementType() { return T}

public add(e) {

var tmp := Link new.

tmp.datum := e;

tmp.next := hd;

tl := hd;

hd := tmp;

return e;

}

}).memoize();

Here's a summary of what the above means:

We declare a variable named List, initialized to a closure.
The closure takes a class T as a parameter and returns a class as its result.
The result class is specified via a class expression, which implements a linked list.
The class expression includes a nested class declaration, Link.
The method memoize() is called on the closure to, well, memoize it. Memoize() returns a memoized version of its receiver.

Each call to List() returns a list class specialized to the actual parameter of List(). We can create a list of integers by saying

var lst := List(Integer).new();

and we can dynamically check what type of elements lst holds

lst.elementType(); // returns Integer, the class object reifying the integer type.

The reified element type is shared among all instances of a given list class, because it is stored in the closure surrounding the class. We avoid duplicating classes with the same parameters - this is just function memoization (and I assume a memoize() method on closures for this purpose). All this works independent of any static types. We are just using standard runtime mechanisms like closures, memoized functions, objects representing classes and yes, class expressions. Smalltalk had these, in essence, over 40 years ago.

What if I don't have class expressions? Well, don't you know that everything should be an expression? Besides, this works fine if you have the ability to define classes reflectively like Smalltalk, or have properly defined nested classes like Newspeak, though it may be a bit more verbose and require more library support to be palatable.

Now let's add types. In the code below, type annotations are completely optional, and have absolutely no runtime effect. They are shown in blue.

public var List = (function(T : Class) {

return class {

var hd, tl : Link;

class Link {

public datum : T;

public next : Link;

}

public elementType() : Class { return T}

public add:(e : T) : T {

var tmp: Link := Link new.

tmp.datum := e;

tmp.next := hd;

tl := hd;

hd := tmp;

return e;

}

}).memoize();

You may notice one odd thing - we use the name of the formal parameter to the closure, T, as a type. This is justified by the following rule.

Rule 1: In any method or closure m, a formal parameter T of type Class (or any subtype thereof) implicitly defines a type variable T which is in scope in type expressions from the point T is declared to the end of the method/closure.

Next, we need to be able to use the information given by the declaration of List when we write types like List[Integer]. We use the following rule.

Rule 2: If e is an expression of function type with parameter(s) of type Class and return type Class, e's name can be used inside a type expression as a type function; an invocation of the type function e[T1, ..., Tn] denotes the type of the instances of the class returned by the value expression

e(T1, ..., Tn).

We can then write, and typecheck

var lst : List[Integer] := List(Integer).new();

var i : Integer := lst.add(3) + 4;

Oh, and we can still do this:

lst.elementType(); // returns Integer, the class object reifying the integer type.

Similarly, if you wish to create new instances of the incoming type parameters, you should be able to do that in the above regime - though you will have to confront the fact that different subtypes may have different constructors and plan around that explicitly - say, by defining a common construction interface for these types.

The beauty of this scheme is that no runtimes were harmed in the making of this reified generic type system. The type system is completely optional. And this is my point: reification was there all along. The typechecker simply needs to understand this fact and leverage it. The basic approach would work with any language with types reified as values, regardless of whether it has generics.

Interestingly, we now have reification of generics, and erasure, at the same time. The two are not in conflict. Reification is supported by the normal runtime mechanisms, independent of types, which are optional and always erased, carrying no runtime cost or semantics.

Reification of generics is now a choice for library implementers. If they think it is worthwhile to pay the costs, so that, for example, someone can cheaply test if a collection is a collection of integers or a collection of strings, they are free to do so.

If they don't want to pay a price for reification but still want to typecheck generics, they can do that too. Nothing prevents one from explicitly declaring type parameters (as opposed to the implicit ones derived from the class-valued value parameters used for reification).

Tangent, TL; DR: Now is the time to mention that traditional reification of generics - that is, runtime support for a shadow parameterization mechanism - is a disaster. It hurts performance in both space and time; Just ask the brave VM engineers who struggled with these issues on the Dart VM. Mitigating that introduces enormous complexity into the runtime and requires a huge effort, which would be better spent doing something good and useful instead.

In systems designed to support multiple programming languages, reification brings a different problem. All languages must deal with the complexity of reification; worse they must conform to the expectations of the reified generic type system of the "master language" (C# or Java, for example).

Consider .Net, the poster child of generic reification. Originally, .Net was intended to be a multi-language system, but dynamic language support there has suffered, in no small part due to reification. Visual Basic was a huge success until .Net came along and made it conform to C#. And what Iron Ruby/Python programmer ever enjoyed being forced to feed type arguments (whatever those might be) into a collection they are creating?

In contrast, the JVM was conceived as a monolingual system. Sun management deluded themselves that Java was the ultimate programming language (though yours truly did try to hint, ever so gently, that further progress in PL was at least conceivable). And yet, the JVM has become home to a wide variety of languages. This is due to multiple factors, invokedynamic not the least among them. But erasure plays a crucial and underappreciated role here as well. If not for erasure, the JVM would have the above-mentioned problems wrt dynamic languages, just like .Net.

Of course, generics have many issues that are independent of reification. The great difficulties with generics come up when they interact with subtyping. All the problems of variance, as well as inference, are rooted in that interaction. If you are happy with any existing approach, you should be able to incorporate it into the above reification strategy - but I am not aware of any pre-existing generic design that I would consider satisfactory.

I think I may now have a plausible approach to the typing issues, but the margins of this blog are too narrow to contain it. A follow up post will either make it all clear, or confess that it hasn't worked out. The above comments on reification stand on their own in any case.