Language Translations (was: DeRailed DSL)
Sean Kelly
sean at f4.ca
Sat Feb 10 10:26:14 PST 2007
Since there seems to be no escaping it, let's return to the realm of
theory for a moment. The ultimate goal of all tools and approaches
being discussed is to automate the process of representing one language,
A, in another language, B. From here I feel the problem space can be
broken into three general categories, the first being any case where a
strict A->B mapping is desired and little to no modification of the
output will occur. This may be because A is a superset of B and
therefore the output is likely to be very close to the desired result
(as long as the domain remains in or near the boundaries of B), or
simply because the output can be used as reference material of sorts
with the embellishment handled elsewhere. A very limited example of
where A is a superset of B might be translating a Greek word for
'love' into English. In Greek, there are at least four separate words
to describe different kinds of affection, but all of these words can be
adequately described as short phrases in English.
A more technical example where embellishment of the output, B, is often
unnecessary is representing a database model in a language intended to
access the database. Typically, it is sufficient to perform A->B into a
set of definition modules (header files) and do the heavy lifting
separately in language B. The output of the translation is inspectable,
and any use of the output is verifiable as well. Compilers are the
preferred tool for such translations, and the problem is well
understood. Let's call this case A.
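To make case A concrete, here is a minimal sketch of a strict A->B translation, assuming A is a tiny table description and B is C++. The names Column, Table, cpp_type and emit_struct are hypothetical and exist only for illustration, and the type mapping covers only a small assumed subset of SQL types.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one database column (the "A" side).
struct Column {
    std::string name;
    std::string sql_type;
};

// Hypothetical description of one table.
struct Table {
    std::string name;
    std::vector<Column> columns;
};

// Map a small, assumed subset of SQL types onto C++ types.
std::string cpp_type(const std::string& sql_type) {
    if (sql_type == "INTEGER") return "int";
    if (sql_type == "REAL") return "double";
    return "std::string";  // fallback for TEXT and anything unrecognized
}

// Emit a plain struct definition (the "B" side) for the table.
// The result is ordinary text, so it can be read and checked directly.
std::string emit_struct(const Table& t) {
    assert(!t.name.empty());
    std::ostringstream out;
    out << "struct " << t.name << " {\n";
    for (const Column& c : t.columns)
        out << "    " << cpp_type(c.sql_type) << " " << c.name << ";\n";
    out << "};\n";
    return out.str();
}
```

Calling emit_struct on a table named User with columns id (INTEGER), name (TEXT) and balance (REAL) yields a struct with an int, a std::string and a double member. The generated text would be written to a header and left unedited, with the heavy lifting done in hand-written code that includes it.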
The second case is where a loose A->B mapping is desired or where a
great deal of modification of B will occur. To return to the Greek
example for a moment, someone translating English into Greek may need to
embellish the result to ensure that it communicates the proper intent.
And since the original intent is contextual, an intelligent analysis of
A is typically required.
Another situation that has been mentioned in this thread is the desire
to perform matrix operations in a language that does not support them
directly. In this case we would like to do the bulk of our work in B
but represent multiplication, addition, etc., in a manner that is
relatively efficient. The salient point here is that B already supports
mathematical expressions, and this extension is simply intended to
specialize B for additional type-driven semantics. Meta-language tools
tend to be fairly good at this, and several popular examples of this
particular solution exist, expression templates being one such. Let's
call this case B.
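For reference, a stripped-down sketch of the expression template idea in C++ follows. It handles only element-wise addition of vectors rather than full matrix arithmetic, and the names Vec and Sum are hypothetical. The point is that a + b + c builds a lightweight expression object, and the actual work happens in a single loop at assignment time with no temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Node representing the not-yet-evaluated sum of two operands.
// It stores references only, so it must not outlive the full expression.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression evaluates it element by element in one loop.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Vec + Vec produces an expression node instead of a new Vec.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    return {a, b};
}

// (expression) + Vec nests the existing node inside a new one.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    assert(a.size() == b.size());
    return {a, b};
}
```

With three Vec objects a, b and c of equal length, the statement r = a + b + c; has the type Sum<Sum<Vec, Vec>, Vec> on its right-hand side. That nested type is the A->B translation, and it is carried out entirely by the compiler's template machinery.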
The third case is where the complexities of A and B are roughly equal and
the domains of each do not sufficiently overlap. In such a situation,
embellishment of the result of A->B is necessary to sufficiently express
the desired behavior. Let's call this case AB since the division of
work or complexity is roughly balanced.
From experience, it is evident that attempts to map solutions for case
A and case B onto this problem have distinct but recognizable issues.
Solutions for case A (i.e., compilers) are excellent at a static A->B
translation, but if B is modified into B' and then A is changed, the new
A->B translation must again be manually converted to B', which tends to
be quite complex. From a business perspective, I have seen
cases where language A was thrown away entirely and all work done in
language B simply to avoid this process, and even then the vestiges of A
can have a long-lasting impact on work in B--often it's simply too
expensive to rewrite B' from scratch, but the existing B' is awkwardly
expressed because of the inexact mapping that took place.
Solutions for case B, on the other hand, have the opposite problem.
They allow for a great deal of flexibility in language B, but the way
they perform A->B tends to be impenetrable for any reasonably complex A,
and the process is typically not inspectable. The C macro language is
one example here, as are C++ and even D templates. In fact, since they
live in B I believe that the new mixin/import features belong to this
category as well. I do suspect that great improvements can be made
here, but I am skeptical that any such tool will ever be ideal for AB.
With this in mind, it seems clear that a third approach is required for
AB, but to discover such an approach let's first distill the previous
two approaches: solutions for A seem to exist as external agents which
perform the translation, while solutions for B seem to exist as
in-language compile-time languages. Solutions for A are insufficient
because they do not allow for ongoing manipulation of both A and B, and
solutions for B are insufficient because expressing a means of
performing A->B within B is often awkward and occurs in a way that cannot
be independently monitored.
My feeling is that the proper solution for case AB is a dynamic
composition of pre-defined units of B to express the meaning of A. Each
unit is individually inspectable and its meaning is well understood, so
any composition of such units should be comprehensible as well. I have
only limited experience here, but my impression is that fully reflected
dynamic languages are well-suited for this situation. Ruby on Rails is
one example of such a solution, and I suspect that similar examples
could be found for Lisp, etc.
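One possible reading of "dynamic composition of pre-defined units" can be sketched as follows. This is only an illustration under assumptions: the description of A is reduced to a list of unit names, the units are trivial integer functions, and the names Unit, units and compose are hypothetical. C++ is used here for familiarity, though a fully reflected dynamic language would express the same thing with far less ceremony.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Unit = std::function<int(int)>;

// Pre-defined units of B, each small enough to inspect on its own.
const std::map<std::string, Unit>& units() {
    static const std::map<std::string, Unit> table = {
        {"double", [](int x) { return x * 2; }},
        {"increment", [](int x) { return x + 1; }},
        {"negate", [](int x) { return -x; }},
    };
    return table;
}

// Compose, at run time, the units named by the description (standing in for A).
// The units are applied in the order they are listed.
Unit compose(const std::vector<std::string>& description) {
    std::vector<Unit> steps;
    for (const std::string& name : description) {
        auto it = units().find(name);
        if (it == units().end())
            throw std::invalid_argument("unknown unit: " + name);
        steps.push_back(it->second);
    }
    return [steps](int x) {
        for (const Unit& step : steps) x = step(x);
        return x;
    };
}
```

Because the description is plain data and each unit is an ordinary function, both sides can keep changing independently: editing the list changes the composition, and editing a unit changes its behavior everywhere it is used.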
Does this sound reasonable? And can anyone provide supporting or
conflicting examples? My goal here is simply to establish some general
parameters for the problem domain in an attempt to determine whether the
new and planned macro features for D will ever be suitable for AB
problems, and whether another solution for D might exist that is more
fitting or more optimal.
Sean