Guy Steele on language design

Bane branimir.milosavljevic at gmail.com
Wed Jan 20 01:52:01 PST 2010


bearophile Wrote:

> The "Coders at Work" book by Peter Seibel is a good collection of interviews with famous programmers (though some parts of the book are too long and boring). In one chapter Guy Steele shares some of his ideas about designing a language. He is a good programmer, and he has contributed significantly to the development of the Java and Fortress languages.
> 
> The purpose of Fortress is to innovate, to be a future very high-performance Fortran designed for highly parallel computers, so its goals partly overlap with those of D (despite it being quite a different language).
> 
> Here are three quotations from his interview that I think are relevant to the design of D too, each followed by a few comments of mine.
> 
> ===============================
> 
> From page 355:
> 
> Steele: But I don't want to be seen as a detractor of Bjarne Stroustrup's effort. He set himself up a particular goal, which was to make an object-oriented language that would be fully backwards-compatible with C. That was a difficult task to set himself. And given that constraint, I think he came up with an admirable design and it has held up well. But given the kinds of goals that I have in programming, I think the decision to be backwards-compatible with C is a fatal flaw. It's just a set of difficulties that can't be overcome. C fundamentally has a corrupt type system. It's good enough to help you avoid some difficulties but it's not airtight and you can't count on it.
> 
> -----------------
> 
> For Fortress a C-like type system can't be enough. I don't know what Steele thinks about the D type system and the way it is used.
> 
> ===============================
> 
> From page 361:
> 
> Seibel: So are there language features that make programmers--folks who have mastered this unnatural act--more productive? You're designing a language right now so you've obviously got some opinions about this.
> 
> Steele: I said earlier that I think you can't afford to neglect correctness. On the other hand, I think we can design tools to make it easier to achieve that. We can't make it trivial, but I think we can make it easier to avoid mistakes of various kinds. A good example is overflow detection on arithmetic, or providing bignums instead of just letting 32-bit integers wrap around. Now, implementing those is more expensive but I believe that providing full-blown bignums is a little less error-prone for some kinds of programming. A trap that I find systems programmers and designers of operating-systems algorithms constantly falling into is they say, "Well, we need to synchronize some phases here so we're going to use a take-a-number strategy. Every time we enter a new phase of the computation we'll increment some variable and that'll be the new number and then the different participants will make sure they're all working on the same phase number before a certain operation happens." And that works pretty well in practice, but if you use a 32-bit integer it doesn't take that long to count to four billion anymore. What happens if that number wraps around? Will you still be OK or not? It turns out that a lot of such algorithms in the literature have that lurking bug. What if some thread stalls for 2 to the 32nd iterations? That's highly unlikely in practice, but it's a possibility. And one should either mitigate that correctness problem or else do the calculation to show that, yeah, it's sufficiently unlikely that I don't want to worry about it. Or maybe you're willing to accept one glitch every day. But the point is you should do the analysis rather than simply ignoring the issue. And the fact that counters can wrap around is a lurking pitfall that doesn't hurt most programmers but for a very few it lays traps in their algorithms.
> 
> -----------------
> 
> From what he says, it seems that multiprecision integers (like the ones you can use in Python, CLisp, etc.) should be the default integer type in a language, because (as I've seen when using languages that are able to catch integer overflows) they are safer.
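
The wrap-around Steele describes is easy to reproduce. What follows is only a minimal sketch in Python, not anything from the book or from D: the 32-bit counter is simulated with a mask, and the names `next_phase_u32` and `next_phase_big` are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, as an unsigned 32-bit counter would


def next_phase_u32(phase):
    """Increment a phase number the way a wrapping unsigned 32-bit integer does."""
    return (phase + 1) & MASK32


def next_phase_big(phase):
    """Increment a phase number using Python's arbitrary-precision int."""
    return phase + 1


last = 2**32 - 1                 # largest value a 32-bit counter can hold
wrapped = next_phase_u32(last)   # wraps around to 0
exact = next_phase_big(last)     # keeps counting: 4294967296

# With the wrapping counter the "next" phase compares as older than the
# current one, which is the lurking bug in the take-a-number strategy.
print(wrapped < last, exact > last)
```

With the multiprecision counter the ordering of phase numbers never breaks, at the cost of a more expensive increment.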
> 
> A smart compiler can replace multiprecision numbers with fixed-size ones wherever it can prove they can't overflow. The programmer can then profile the code and replace the remaining multiprecision numbers with fixnums in the spots where (and if) he or she sees experimentally that performance is not good enough. Even in those spots, the code can still test for overflows at run time in non-release mode. If well implemented (meaning the numbers are fully stack-allocated unless they become quite large, as in Python 2.x) this can be a way to avoid some bugs.
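
A run-time overflow test on a fixed-size integer could look roughly like the sketch below. It assumes a signed 32-bit fixnum and a hypothetical helper called `checked_add_i32`; a real compiler would emit such a check itself (and drop it in release mode) rather than ask the programmer to call a function.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add_i32(a, b):
    """Add two values meant to fit in a signed 32-bit integer.

    The sum is computed exactly (Python ints never overflow) and then
    checked against the 32-bit range, so an overflow is reported
    instead of silently wrapping around.
    """
    result = a + b
    if not (INT32_MIN <= result <= INT32_MAX):
        raise OverflowError("32-bit overflow in %d + %d" % (a, b))
    return result


print(checked_add_i32(1, 2))
try:
    checked_add_i32(INT32_MAX, 1)
except OverflowError as error:
    print("caught:", error)
```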
> 
> ===============================
> 
> From pages 356-357:
> 
> Steele: I think it's important that a language be able to capture what the programmer wants to tell the computer, to be recorded and taken into account. Now different programmers have different styles and different ideas about what they want recorded. As I've progressed through my understanding of what ought to be recorded I think we want to say a lot more about data structures, we want to say a lot more about their invariants. The kinds of things we capture in Javadoc are the kinds of things that ought to be told to a compiler. If it's worth telling another programmer, it's worth telling the compiler, I think.
> 
> Seibel: Isn't most of the stuff in Javadoc, other than the human-readable prose, actually derived from the code?
> 
> Steele: Some of it is. But some of it isn't. Relationships between parameters are not well captured by Java code. For instance, here's an array and here's an integer and this integer ought to be a valid index into the array. That's something you can't easily say in Java. That's an important concept and in Fortress you are able to say such things.
> 
> Seibel: And they're compiled into runtime asserts or statically checked?
> 
> Steele: Whatever is appropriate. Both. In the case of Fortress we are trying to be able to capture those kinds of relationships. We talked about algebraic relationships earlier, the idea that some operation is associative. We want to be able to talk about that very explicitly in Fortress. And I don't expect that every applications programmer is going to stop and think, "You know, this subroutine I just invented is associative." But library programmers really care about that a lot. Partly because if they're going to use sophisticated implementation algorithms, the correctness of the algorithm hinges crucially on these properties. And so where it does depend crucially on those properties, we want a way to talk about them in a way the compiler can understand. I conjecture that that is an important approach to finding our way forward, to capture in the language important properties of programming.
> 
> Seibel: What about the role of the language in making it impossible to make mistakes? Some people say, "If we just lock this language down enough it'll be impossible to write bad code." Then other people say, "Forget it; that's a doomed enterprise, so we might as well just leave everything wide open and leave it to the programmers to be smart." How do you find that balance?
> 
> Steele: The important thing is just to realize that it is a trade-off that you make. And you can't hope to eradicate all bad code. You can hope to discourage certain kinds of likely errors by requiring "Mother, may I?" code; in order to do something difficult, you have to write something a little more elaborate to say, "Yes, I really meant this." Or you can purposely make it difficult or impossible to say a certain thing, such as, for example, to corrupt the type system. Which has its pluses and minuses--it's really hard to write device drivers for bare metal in a completely type-safe language just because the levels of abstraction are wrong for talking to the bare metal. Or you can try to add in stuff that lets you say, "This variable really is this device register at absolute address XXXX." That in itself is a kind of unsafe feature.
> 
> -----------------
> 
> An integer guaranteed to be a valid index into an array: in D2 explicit indexes are less common than in C (thanks to foreach, higher-level functions, etc.), but it still seems like a good idea.
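
Lacking a type system that can express "this integer is a valid index into that array", the relationship can at least be written down as an explicit precondition. The sketch below is a run-time check only, in Python, with a made-up function name `element_at`; it does not show how Fortress or D would state or verify the property statically.

```python
def element_at(items, index):
    """Return items[index], requiring index to be a valid position in items.

    Precondition: 0 <= index < len(items). Plain Python indexing would
    silently accept a negative index and count from the end, so the
    relationship between the two parameters is checked explicitly.
    """
    if not (0 <= index < len(items)):
        raise IndexError("index %d is not valid for a sequence of length %d"
                         % (index, len(items)))
    return items[index]


values = [10, 20, 30]
print(element_at(values, 1))
```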
> 
> A way to tell the compiler that a function (that takes two arguments) is commutative or associative: this sounds like a nice idea. It looks similar to the pure/@pure attribute of D2, so the syntax is easy to invent here, @associative and @commutative :-) But I guess it's less easy to teach the compiler how to use such annotations.
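
To give an idea of what a library could do with such an annotation, here is a rough Python sketch. The `associative` decorator and `tree_reduce` are invented for this example and are not D or Fortress features; the decorator only records a promise made by the programmer and verifies nothing.

```python
from functools import reduce


def associative(func):
    """Hypothetical marker: the programmer promises that
    func(func(a, b), c) == func(a, func(b, c)) for all inputs."""
    func.is_associative = True
    return func


def tree_reduce(func, items):
    """Combine items with func, regrouping the calls only when that is safe.

    Functions marked associative are combined in adjacent pairs, level by
    level; each level's calls are independent, so they could run in
    parallel. Unmarked functions fall back to strict left-to-right order.
    """
    items = list(items)
    if not items:
        raise TypeError("tree_reduce() of an empty sequence")
    if not getattr(func, "is_associative", False):
        return reduce(func, items)
    while len(items) > 1:
        paired = [func(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:           # odd element out is carried to the next level
            paired.append(items[-1])
        items = paired
    return items[0]


@associative
def concat(a, b):
    return a + b


def subtract(a, b):                  # not associative, so left unmarked
    return a - b


print(tree_reduce(concat, ["a", "b", "c", "d", "e"]))
print(tree_reduce(subtract, [10, 3, 2]))
```

Only the grouping of the calls changes, never the order of the operands, so commutativity is not needed here; that would be a separate annotation.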
> 
> In C++0x one of the advanced purposes of Axioms (http://en.wikipedia.org/wiki/Concepts_%28C%2B%2B%29#Axioms ) was to let the programmer tell the compiler's optimization stages about invariants like that one. In the meantime the idea of adding Axioms to C++0x has been delayed or refused. In practice I think you can get most of the benefits of Axioms with a less general approach, because most of the time you don't need that full generality, and implementing it in the compiler looks hard. This means that having just a few built-in annotations like @associative can be enough for many purposes.
> 
> Giving the language a way to move more semantics from comments (or from the programmer's head) into the code is generally a good idea, and I think it's something that will be done more and more in future languages. But you have to be careful not to turn the language into something too complex and fussy to use (as with Cyclone), finding a good balance between requiring those extra semantic annotations everywhere and making them useless. Common Lisp shows one possible strategy: when you compile a program or just a function at the maximum optimization level, the compiler gives you a list of the spots where it is missing optimizations because it lacks the necessary semantics. The programmer can then add type annotations and other annotations in those spots and increase the code's performance.
> 
> Bye,
> bearophile

Hm, Guy Steele sounds like a porn actor name. Interesting article, though.


