Language Translations (was: DeRailed DSL)

Sat Feb 10 10:26:14 PST 2007

Since there seems to be no escaping it, let's return to the realm of 
theory for a moment.  The ultimate goal of all tools and approaches 
being discussed is to automate the process of representing one language, 
A, in another language, B.  From here I feel the problem space can be 
broken into three general categories.  The first is any case where a 
strict A->B mapping is desired and little to no modification of the 
output will occur.  This may be because A is a superset of B and 
therefore the output is likely to be very close to the desired result 
(as long as the domain remains in or near the boundaries of B), or 
simply because the output can be used as reference material of sorts, 
with the embellishment handled elsewhere.  A very limited example of 
where A is a superset of B might be translating the Greek word for 
'love' into English.  In Greek, there are at least four separate words 
to describe different kinds of affection, but all of these words can be 
adequately described as short phrases in English.

A more technical example where embellishment of the output, B, is often 
unnecessary is representing a database model in a language intended to 
access the database.  Typically, it is sufficient to translate A into a 
set of definition modules (header files) and do the heavy lifting 
separately in language B.  The output of the translation is inspectable, 
and any use of the output is verifiable as well.  Compilers are the 
preferred tool for such translations, and the problem is well 
understood.  Let's call this case A.
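
To make case A a bit more concrete, here is a minimal C++ sketch of 
such a translator.  The table description format and the names Column, 
Table, cppTypeFor, and emitHeader are all hypothetical, and the 
SQL-to-C++ type mapping is an assumption chosen for illustration; a 
real tool would read the schema from the database itself.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, minimal description of a database table (language A).
struct Column {
    std::string name;
    std::string sqlType;  // e.g. "INTEGER", "TEXT", "REAL"
};

struct Table {
    std::string name;
    std::vector<Column> columns;
};

// Map a SQL column type onto a C++ type (language B).
// Unknown types fall back to std::string so the output still compiles.
std::string cppTypeFor(const std::string& sqlType) {
    static const std::map<std::string, std::string> kTypes = {
        {"INTEGER", "long long"},
        {"REAL", "double"},
        {"TEXT", "std::string"},
    };
    auto it = kTypes.find(sqlType);
    return it != kTypes.end() ? it->second : "std::string";
}

// Strict A->B translation: emit a definition module (header) and nothing
// else.  The result is plain text, so it can be read, diffed, and checked
// into version control like any hand-written header.
std::string emitHeader(const Table& table) {
    std::ostringstream out;
    out << "#pragma once\n#include <string>\n\n";
    out << "struct " << table.name << " {\n";
    for (const Column& c : table.columns) {
        out << "    " << cppTypeFor(c.sqlType) << ' ' << c.name << ";\n";
    }
    out << "};\n";
    return out.str();
}
```

Given a Customer table with INTEGER, TEXT, and REAL columns, emitHeader 
produces a plain struct definition, and all of the real work against 
that struct is then written separately in B.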

The second case is where a loose A->B mapping is desired or where a 
great deal of modification of B will occur.  To return to the Greek 
example for a moment, someone translating English into Greek may need to 
embellish the result to ensure that it communicates the proper intent. 
And since the original intent is contextual, an intelligent analysis of 
A is typically required.

Another situation that has been mentioned in this thread is the desire 
to perform matrix operations in a language that does not support them 
directly.  In this case we would like to do the bulk of our work in B 
but represent multiplication, addition, etc., in a manner that is 
relatively efficient.  The salient point here is that B already supports 
mathematical expressions, and this extension is simply intended to 
specialize B for additional type-driven semantics.  Meta-language tools 
tend to be fairly good at this, and several popular examples of this 
particular solution exist, expression templates being one such.  Let's 
call this case B.
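
For anyone who hasn't run into expression templates, a bare-bones C++ 
sketch of the idea follows.  Vec, AddExpr, and the operator overloads 
are hypothetical names, only element-wise addition is shown, and size 
checking is omitted.  The point is that a + b + c builds a nested type 
which is evaluated in a single loop on assignment, rather than 
producing a temporary vector for each addition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expression node for element-wise addition.  It stores references to its
// operands and computes an element only when indexed, so no intermediate
// vector is ever allocated.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

    // Evaluate any expression in one loop.  Assumes expr.size() == size().
    template <typename E>
    Vec& operator=(const E& expr) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Vec + Vec yields an expression node, not a Vec.
inline AddExpr<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

// (expression) + Vec nests the expression one level deeper.
template <typename L, typename R>
AddExpr<AddExpr<L, R>, Vec> operator+(const AddExpr<L, R>& l, const Vec& r) {
    return {l, r};
}
```

Because AddExpr holds references, an expression should be assigned to a 
Vec within the same statement (d = a + b + c;); storing it in an auto 
variable would leave it referring to temporaries that no longer exist.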

The third case is where A and B are of comparable complexity and 
the domains of each do not sufficiently overlap.  In such a situation, 
embellishment of the result of A->B is necessary to sufficiently express 
the desired behavior.  Let's call this case AB since the division of 
work or complexity is roughly balanced.

From experience, it is evident that attempts to map solutions for case 
A and case B onto this problem have distinct but recognizable issues. 
Solutions for case A (i.e. compilers) are excellent at a static A->B 
translation, but if B is modified into B' and then A is changed, the new 
A->B translation must again be manually converted to B', which is 
generally quite complex.  From a business perspective, I have seen 
cases where language A was thrown away entirely and all work done in 
language B simply to avoid this process, and even then the vestiges of A 
can have a long-lasting impact on work in B--often it's simply too 
expensive to rewrite B' from scratch, but the existing B' is awkwardly 
expressed because of the inexact mapping that took place.

Solutions for case B, on the other hand, have the opposite problem. 
They allow for a great deal of flexibility in language B, but the way 
they perform A->B tends to be impenetrable for any reasonably complex A, 
and the process is typically not inspectable.  The C macro language is 
one example here, as are C++ and even D templates.  In fact, since they 
live in B, I believe that the new mixin/import features belong to this 
category as well.  I do suspect that great improvements can be made 
here, but I am skeptical that any such tool will ever be ideal for AB.

With this in mind, it seems clear that a third approach is required for 
AB, but to discover such an approach let's first distill the previous 
two approaches: solutions for A seem to exist as external agents which 
perform the translation, while solutions for B seem to exist as 
in-language compile-time languages.  Solutions for A are insufficient 
because they do not allow for ongoing manipulation of both A and B, and 
solutions for B are insufficient because expressing a means of 
performing A->B within B is often awkward and occurs in a way that 
cannot be independently monitored.

My feeling is that the proper solution for case AB is a dynamic 
composition of pre-defined units of B to express the meaning of A.  Each 
unit is individually inspectable and its meaning is well understood, so 
any composition of such units should be comprehensible as well.  I have 
only limited experience here, but my impression is that fully reflected 
dynamic languages are well-suited for this situation.  Ruby on Rails is 
one example of such a solution, and I suspect that similar examples 
could be found for Lisp, etc.
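
As a rough illustration of what I mean by composing pre-defined units, 
here is a C++ sketch under some loudly stated assumptions: the units, 
the registry, and the "description" format (a whitespace-separated list 
of unit names standing in for A) are all hypothetical, and C++ is used 
only for consistency.  A fully reflected dynamic language would let the 
units be discovered rather than registered by hand.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A "unit" is a small, pre-defined piece of B with a name and known behavior.
using Unit = std::function<std::string(const std::string&)>;

// Hypothetical registry of units.  In C++ these must be registered by hand.
const std::map<std::string, Unit>& units() {
    static const std::map<std::string, Unit> registry = {
        {"upper", [](const std::string& s) {
             std::string r = s;
             for (char& ch : r)
                 ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
             return r;
         }},
        {"reverse", [](const std::string& s) { return std::string(s.rbegin(), s.rend()); }},
        {"exclaim", [](const std::string& s) { return s + "!"; }},
    };
    return registry;
}

// A composition of units built at run time.  Keeping the names alongside
// the functions means the composition can explain itself.
struct Pipeline {
    std::vector<std::string> names;
    std::vector<Unit> steps;

    std::string run(std::string value) const {
        for (const Unit& step : steps) value = step(value);
        return value;
    }

    std::string describe() const {
        std::string out;
        for (const std::string& n : names) out += (out.empty() ? "" : " -> ") + n;
        return out;
    }
};

// Translate a description (standing in for A) into a composition of units.
Pipeline compose(const std::string& description) {
    Pipeline p;
    std::istringstream in(description);
    std::string name;
    while (in >> name) {
        auto it = units().find(name);
        if (it == units().end()) throw std::invalid_argument("unknown unit: " + name);
        p.names.push_back(name);
        p.steps.push_back(it->second);
    }
    return p;
}
```

Here compose("reverse upper exclaim") turns "abc" into "CBA!", and 
describe() reports the composition as "reverse -> upper -> exclaim", so 
both the individual pieces and the whole remain open to inspection.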

Does this sound reasonable?  And can anyone provide supporting or 
conflicting examples?  My goal here is simply to establish some general 
parameters for the problem domain in an attempt to determine whether the 
new and planned macro features for D will ever be suitable for AB 
problems, and whether another solution for D might exist that is better 
suited.

Sean