Room 101

Gilad Bracha's blog. A place to be (re)educated in Newspeak

Originally posted on Blogger on October 6th, 2009

An Image Problem

Imagine a library, say in Java, that provided a universal facility for saving the exact runtime state and configuration of any application. It would essentially capture the heap, all threads and their stacks, and save them to a file.

You could then load this file from disk and reconstitute your application just as it was when it was saved: For example, windows would re-open just where they were (adjusting their location if the screen size differed). The system would cope with open file handles and sockets in a reasonable fashion, re-opening them if they were available. All this would work irrespective of the underlying platform, and it would be fast as well.

The facility would be transparent to the programmer - you wouldn’t need to change your program in any way, beyond calling the library function. Pretty neat.

I don’t know of such a facility in Java or .Net. However, Smalltalk has had such a beast for over 30 years. In Smalltalk parlance, it’s called an image. Most Smalltalks are inseparably coupled to the notion of such an image.

Tangent: There are exceptions of course, notably Strongtalk.

The idea hasn’t caught on. Why? Should it catch on, and if so, how?

To be sure, the image is a very powerful mechanism, and has significant advantages. For example, in the context of the IDE, it gives the ability to save and share debugging sessions.

One advantage of images is rapid start-up. If your program builds up a lot of state as it starts, it can be intolerably slow coming up. This is one the problems that killed Java on the client. One should avoid computing that state at startup; one easy way to do this is to precompute most of it and store it in an image. It is typically much faster to read in an image than to try and compute its contents from scratch.

The problem begins when one relies exclusively on the image mechanism. It becomes very difficult to disentangle one’s program from the state of the ongoing computation.

If your program misbehaves (and they always do) and corrupts the state of the process, things get tedious. Say you have a test that somehow failed to clean up properly. In a conventional setting, you fix the problem with your code, and run the test again. With an image, you have the added burden of cleaning up the mess your program made.

In general, traditional environments tend to force you to start over with each run of the program: the edit-compile-link-load-throw-the-application-off-the-cliff-let-it-crash-and-start-all-over-again cycle, in the words of the original Java whitepaper (on page 52, if you're looking).

Tangent: That whitepaper is a masterpiece of technical rhetoric, and was visionary in its day. Alas, Java never fully realized that vision.

The thing is, sometimes it’s good to be able to start afresh. It may be easier to start from scratch than to mutate your existing process.

Mega-tangent: incidentally, this is an argument for sexual reproduction as well.

Of course sometimes starting anew isn’t so nice: think about fix-and-continue debugging.

In some cases it is even more critical to separate your code from the computation. You often save your image just to save your program. It may take you a while to find out that your image has been corrupted. Now you need to go back to a correct image, and yet you need to extract your code safely from the corrupt image. To be sure, Smalltalk IDEs provide a variety of tools that can help you with that, but I have never been really happy with them.

Tangent: this is where irate Smalltalkers berate me about change sets and logs and Envy and Monticello etc. etc. Sorry, I don’t think it’s good enough.

In general, Smalltalk makes it hard to modularize one's code, and especially to separate the application from the IDE. The exclusive reliance on the image model greatly aggravates these difficulties.

Traditional development tools, primitive as they often are, naturally provide a persistent, stateless representation of the program. In fact they provide two: the source code, in a text file, and a binary.

Semi-tangent: source code seems the most obvious thing in the world; but traditional Smalltalk’s have no real syntax above the method level! Classes are defined via the evaluation of reflective expressions, which rely on the reflective API. This is very problematic: the API often varies from one implementation to another. By the way, this is one of the ways Newspeak differs from almost every Smalltalk (the late, great Resilient being the only exception I can recall). Newspeak has a true syntax. Furthermore, because Newspeak module declarations are fully parametric in all their external dependencies, they can be compiled at any time in any order - unlike code in most languages (say Java packages) where there are numerous constraints on compilation order (e.g., imports must be defined).

A binary is a stateless representation of the program code, but one that does not require a compiler to decode it. Smalltalk doesn’t usually have a binary form. Code is embedded in the image. There are exceptions, and some Smalltalk flavors have ways of producing executables, but the classic approach ties code and computation together in the image and makes it very hard to pry them apart.

None of this means you can’t have an image as well as a binary format. What is important is that you do not have just an image. Ideally, you have images and a binary format. This is one of my goals with Newspeak, and we are pretty close.

In Newspeak, serialized top level classes can serve as a binary format. I will expand on how serialization can serve as a binary format in an upcoming post. At the same time, we continue to use images, though I hope they will become much less central to our practice as time goes by.