Room 101


Gilad Bracha's blog. A place to be (re)educated in Newspeak

May 25th, 2024

Visiting Expressions and Expressing Visitors

Recently, the expression problem and the visitor pattern came up again in some discussions in social media. I pointed out that class hierarchy inheritance (aka virtual classes) makes visitors unnecessary. It turns out this is not widely understood. I've spoken about this on more than one occasion, but never bothered to write it up. This post is here to correct that omission.

Consider a very simple expression language:

Expression -> Number | Expression + Expression

We can represent programs in this language using the module below, which defines an AST with two kinds of node, Number and Sum:


Newspeak3
'Root'
class ExpressionsAST new = Object new (
) (
	public class Number withValue: v = Object new (
		|
			public val = v.
		|
	) (
	) : ()
	public class Sum adding: a to: b = Object new (
		|
			public e1 = a.
			public e2 = b.
		|
	) (
	) : ()
) : ()



Note that the entire AST library is defined as a set of nested classes inside a single outer class, ExpressionsAST.   Since the library is a class, the library as a whole can be extended or modified via inheritance. This is called class hierarchy inheritance, and is key to what follows.

We might want to extend our language with a new operation, such as multiplication. It would be nice to be able to specify that extension separately from the original program.

Why would you want to specify the extension separately? There are many potential reasons.  You might be trying to extend a closed-source compiler.  Or the maintainers of an open-source compiler don't want you messing with the source, and you don't want to fork it, with all that entails. There are surely additional circumstances of this nature.

We may also want to do something useful with this language, such as run programs or pretty print them. Again, we would like to be able to specify these programs separately from the original AST, and of each other.

The expression problem, broadly interpreted, is the problem of specifying these different components independently of each other rather than monolithically. The problem was so named by Phil Wadler. 

AFAICR, this came out of discussions on the sidelines of OOPSLA 98, though I won't swear to that. The basic issue was known much earlier, as Phil discusses in his note, cited at the start of the post.  

Now, this situation is not unique to ASTs and compilers; it represents a much more general problem of program extensibility. A small academic literature exists on the topic. In academia, the focus is often on typechecking such code, because that makes things harder. 

Academics love hard problems. If a problem is hard, you have to be really clever to solve it, and in academia, cleverness and novelty are key.

The most obvious arrangement in an object-oriented language is to define the functions as methods of the classes, but your AST becomes large and unwieldy - and that's assuming you are free to modify the AST source. 

Instead, best practice when implementing a compiler in an object-oriented language has been to keep the AST minimal (as above), and add all functionality as visitors. The thing is, visitors are not totally obvious, and are arguably clunky. 

The visitor pattern was first published in the Gang-of-Four book, but people had been doing it for some time earlier. For example, if you go look at the Strongtalk sources, you'll find a notion of 'Tool', which is a visitor by another name.  

We can visualize the problem in table form:

Class/Function eval prettyPrint typecheck
Number
Sum
Product

Each row represents a particular node type in the AST. Each column represents a particular function on the AST (e.g., example, eval represents an interpreter). The expression problem can be thought of as the problem of how to specify each table entry in a separate module.

The code shown above has no functionality, only two types, yielding this table:

Class/Function
Number
Sum

In most languages, addressing the expression problem is awkward at best. Object oriented languages make it easy to add rows via inheritance. However adding functions non-invasively is a problem: we would need to modify the classes by adding methods corresponding to the functions. In other words, adding new rows to the table is easy, but adding columns less so. Visitors were invented to address this problem, as were object algebras.

Conversely, in functional or procedural languages, new functions are easy to add. However, each new type needs to be handled inside each of the existing functions. In terms of our table format, adding columns is straightforward, but adding rows is problematic. There's been research done on  extensible pattern matching (which hasn't caught on), and in Haskell you can use type classes.

There's more to say about how Newspeak relates to object algebras, dualities and so forth, but this post is pretty long already. I'll probably write a follow-up post to address some of that.

We will now show how to address these issues in Newspeak, using virtual classes.

You can in fact, get around this in various other ways. Mixins are sufficient, for example, and I showed how to do this in my book on Dart (sections 2.15.1, 5.4, 5.5).  Newspeak's virtual classes imply mixins, and provide a more powerful and elegant mechanism.

We start with the code above, which defined our original AST. 

Adding Classes/Types

At some point, you discover multiplication. You extend your language accordingly:

Expression -> Number | Expression + Expression | Expression * Expression


The Newspeak code is given below:
 
public class MultiplyAST new = ExpressionsAST new (
	) (
public class Product multiplying: a by: b
		(* :exemplar: Product multiplying: (Number withValue: 10) by: (Number withValue: 20) *)
		 = Object new (
			|
				public e1 = a.
				public e2 = b.
			|
		) (
) : (
)
) : (
)
 

Class MultiplyAST  subclasses the original AST,  and adds class Product to represent multiplication. In effect, we have added a row to the original table.

Class/Function
Number
Sum
Product

So far this looks a lot like what you'd do in a conventional object-oriented language. However,
we are not adding Product to some global namespace, but extending the AST library with a new member.

Details: Because Newspeak has no global scope, top level classes cannot reference a superclass explicitly (so they always inherit from the default, Object). Therefore we define MultiplyAST as a nested class of

Newspeak3
'Root'
class ExpressionsWithMultiplyAST astClass: A
(* :exemplar: ExpressionsWithMultiplyAST astClass: ExpressionsAST *)
 = Object new (
	|
		ExpressionsAST = A.
	|
) (
	public class MultiplyAST new = ExpressionsAST new (
	) (
		public class Product multiplying: a by: b
		(* :exemplar: Product multiplying: (Number withValue: 10) by: (Number withValue: 20) *)
		 = Object new (
			|
				public e1 = a.
				public e2 = b.
			|
		) (
		) : ()
	) : ()
) : ()
 
The latter takes ExpressionsAST as a parameter to its factory, and MultiplyAST inherits from that. We'll see this same pattern repeated below.
Incidentally, this allows us use this code to extend different AST libraries with Product. 

Adding Methods/Functions


Now lets look at adding functions. We'll define a class,

public class Evaluator new = ExpressionsAST new (
	) (
public class Number withValue: v
		(* :exemplar: Number withValue: 2 *)
		 = super Number withValue: v (
		) (
public eval = (
				^val
			)
) : (
)
public class Sum adding: a to: b
		(* :exemplar: Sum adding: (Number withValue: 10) to: (Number withValue: 6) *)
		 = super Sum adding: a to: b (
		) (
public eval = (
				^e1 eval + e2 eval
			)
) : (
)
) : (
)

that extends the AST, much as we did before with MultiplyAST.  

And much like before, Evaluator is nested in an enclosing module,  

Newspeak3
'Root'
class ExpressionsEvaluator astClass: A
(* :exemplar: ExpressionsEvaluator astClass: ExpressionsAST *)
 = Object new (
	|
		private ExpressionsAST = A.
	|
) (
	public class Evaluator new = ExpressionsAST new (
	) (
		public class Number withValue: v
		(* :exemplar: Number withValue: 2 *)
		 = super Number withValue: v (
		) (
			public eval = (
				^val
			)
		) : ()
		public class Sum adding: a to: b
		(* :exemplar: Sum adding: (Number withValue: 10) to: (Number withValue: 6) *)
		 = super Sum adding: a to: b (
		) (
			public eval = (
				^e1 eval + e2 eval
			)
		) : ()
	) : ()
) : ()


Evaluator defines classes Number and Sum that inherit from the corresponding classes in the Evaluator's superclass (the original AST) and extend them with eval methods.

You can look at Number to see the mechanics. 

public class Number withValue: v
		(* :exemplar: Number withValue: 2 *)
		 = super Number withValue: v (
		) (
public eval = (
				^val
			)
) : (
)

We see that it inherits from super Number. This works exactly like calling a super method.  The effect is that our Number class has all the features defined in the original AST, but extended with the eval method it defines.

public class Sum adding: a to: b
		(* :exemplar: Sum adding: (Number withValue: 10) to: (Number withValue: 6) *)
		 = super Sum adding: a to: b (
		) (
public eval = (
				^e1 eval + e2 eval
			)
) : (
)
 
 works exactly the same way.

This amounts to adding a column to the original table.

Class/Function eval
Number
Sum

 

Adding Both Rows and Columns


Now we may want to take the extended AST and existing interpreter, and add interpretation of multiplication. Here, we define an Evaluator class with a nested class Product. Product overrides super Product and adds an eval method.  The complete code is shown below.





Looking at the table below, we see that the top and middle lefthand entries (Number & Sum) were defined in one module; the bottom left (Product) was defined in another; the top and center right rubrics (the eval implementations for Number and Sum) in a third, and the bottom right entry (Product eval) added in the fourth and final module given directly above. This demonstrates the freedom we have to define types and functions either together or separately, as requirements evolve.

Class/Function eval
Number
Sum
Product

It should be obvious that we could add additional operations like pretty printing or typechecking in the same way. If you want to see how this works on a larger scale, you can take a look at the ShapeRank prototype. You'll see a similar pattern. The AST is defined in ShapeRankASTs, and extended in ShapeRankScopes, ShapeRankInterpreter and ShapeRankTypechecker.

What's attractive about this approach is that you can be naive without loss of generality. You can put methods on the AST nodes in the most obvious way.  You don't need a non-obvious design pattern (e.g., visitors or object algebras) which requires advance planning to put in place. You don't even need to be aware of the issue when you start. In short, it's simple, and simple is good.