Room 101


Gilad Bracha's blog. A place to be (re)educated in Newspeak

Originally posted on Blogger on December 5th, 2008

Unidentified Foreign Objects (UFOs)

I recently found out that Newspeak’s basic foreign function interface (FFI), called Aliens, is being made available in Squeak (though that will require new VMs with the required primitives). Thanks to John McIntosh for doing this.

I should also thank Eliot Miranda for most of the original work on aliens, and Vassili Bykov, Peter Ahe and Bill Maddox for the rest. Also thanks to Lars Bak, whose work on the Strongtalk FFI inspired the VM level view of aliens; and to Dave Ungar, who was the first to understand that objects were all you needed on the language side of an FFI. Lastly, this post benefited immensely from conversations with Vassili.

So I figured I’d write a little bit about Aliens from a high level perspective. As usual, the ideas apply to programming languages in general.

In Smalltalk, there isn’t a standard FFI. Various dialects provide different solutions, with varying degrees of functionality, performance and ease of use. To be honest, they are usually a poor fit with the surrounding language and fairly awkward to use. This inhibits Smalltalk’s interoperability with the rest of the world. I’d argue that the absence of a good, standard FFI has cost the Smalltalk community dearly.

In Java, by contrast, native methods and JNI provide a standardized FFI. This mechanism is far from perfect, but at least there is a more or less standard solution.

What these and other systems have in common is support for a special construct (such as the native modifier for methods, or declarations like extern C, or the truly ugly ad hoc FFI syntax extensions used in various Smalltalks) for foreign functions.

Newspeak’s FFI was strongly influenced by the Strongtalk FFI; but unlike Strongtalk, Newspeak doesn’t have a special syntax for foreign calls. As Self showed many years ago, one doesn’t really need a special syntax for the FFI. The foreign functions, APIs, DLLs etc. can all be represented as objects. They just happen to be foreign objects.

The idea of a foreign object, which we call an alien, is at the foundation of the Newspeak FFI.

For starters, any decent language should be able to represent functions as values; and in an object-oriented language, these values are objects, accessed via a standard interface. Foreign functions are just a different implementation of that interface.

Another natural way to model a foreign function is as a method defined on a foreign object. For example, one can view an entire DLL as an object with a set of methods corresponding to the functions defined by the DLL. Better yet, we could represent an entire API as an object, independently of what DLLs actually defined it.

Aliens can be defined for different foreign languages; for example, while Alien is used to interface with C, we also have a class called ObjectiveCAlien that can be used to interface with ObjectiveC, which is the native language on MacOS X. C Aliens and ObjectiveC Aliens do not interfere with each other, and when/if we need to add Java Aliens or CLR Aliens we can do that as well.

The alien approach is also a good fit with security: one need not be concerned that code may bypass high level language safety guarantees by calling out to C; untrusted code can be prevented from doing that, simply by not providing any Alien library objects to it.

Newspeak’s C Alien implementation is fast, but also dangerous. An alien is basically a blob of memory. The user of an Alien is responsible for interpreting and accessing that data correctly. There is no checking being done for you.

Tangent: It's worth noting that the basic Alien layer may evolve further; for example, we aren't thrilled with the practice of subclassing Alien. It's not clear if the Alien class really needs to change, or just the pattern of using it.

On top of this foundation, safer and/or more convenient abstractions can be built. We have built objects that support not just methods corresponding to the functions of an API, but also methods that provide factories for the various datatypes used in the function’s signatures, including those defined by macros. These objects wrap the basic alien API, and help with error prone book keeping - converting between Newspeak types (e.g., Strings) and foreign types, freeing aliens after use etc.

At the moment, both the declarations of low level aliens and higher level APIs are constructed manually, which is tedious and error prone. We’ve been planning on a higher level tool called CSlick, which would allow you to specify a set of .h files and the requisite DLLs, and obtain an object that supports the desired functions automatically.

As a first approximation, you could think of CSlick as a function:

CSlick: List -> List -> ForeignAPI

The signature above is deliberately curried, because you may actually want to be able to specify just the header files, and later bind different DLLs to provide the actual functionality, just as a .h file can be associated with different .c files.

When this will happen is anyone’s guess right now; but Vassili has done this before (in the context of Lisp) and I’m sure he can do it again.

The resulting foreign API should incorporate the low level alien API, and, as much as possible, a higher level API as well.

The CSlick implementation will need to know how to parse C header files, and how to reflectively manufacture the low level code that actually invokes the C functions. Fortunately we have a strong parsing infrastructure, so that isn’t as daunting as it sounds.

When I’ve told people about CSlick, they often mention SWIG. However, I believe CSlick can be made substantially easier to use than SWIG. SWIG has to cope with multiple languages, each with a pre-existing story on how to do foreign calls. In contrast, we can integrate CSlick more tightly with the language. Ultimately, that should translate to a simpler model for the user.

The key take away is that objects are all you really need to interact with foreign programming languages. They are better than built in language constructs in terms of ease of use, security, and multiple language support. As usual, less is more.