Notes IV

Tue Jan 22 15:07:04 PST 2008

With a bit more experience of D, here is my 4th list of notes about D v.1.x (maybe this is the last one, because I have already said most of the things I can think of). A few bits are repeated from the previous notes because I have new arguments to support them.
Probably some of the following things are silly, and you can ignore them, but I believe some of them are meaningful enough.

1) In D I fairly often make mistakes like this:
foreach (i, a, myobj)
But usually not this one:
foreach (i; a; myobj)
To my eyes it's not always easy to spot the difference between "," and ";", so I think an "in" instead of ";" (as used in Python and C# too) would stand out better:
foreach (i, a in myobj)
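
For comparison, this is roughly how the same index/value loop reads in Python, where "in" separates the loop variables from the sequence. A minimal Python 3 sketch; the list contents and the indexed_items name are only placeholders, not something from D or Phobos:

```python
# Hypothetical stand-in for "myobj" in the foreach examples above.
myobj = ["a", "b", "c"]

def indexed_items(seq):
    """Collect (index, item) pairs the way a two-variable foreach would."""
    pairs = []
    # "in" separates the loop variables from the sequence being iterated.
    for i, a in enumerate(seq):
        pairs.append((i, a))
    return pairs

print(indexed_items(myobj))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```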

2) I think in some cases it may be possible to unify the functions of std.conv, like:
toFloat(x)
with the casting, like:
cast(float)x
So to convert an int x to float you can use:
float(x)
To convert a string to float you can use the same syntax, with no need of the std lib:
float(" -12.6  ")
(Note the spaces, they are ignored).
I don't know how badly this can interact with other bits of D syntax.
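
The proposed float(x) behaves like Python's built-in float(), which accepts both numbers and strings. A small Python 3 sketch of that behaviour; to_float_or_none is a hypothetical helper added only to show what happens with text that is not a number:

```python
# Python's built-in float() accepts both numbers and strings.
from_int = float(3)            # 3.0
from_str = float(" -12.6  ")   # -12.6, surrounding whitespace is ignored

def to_float_or_none(text):
    """Return the parsed float, or None when the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

print(from_int, from_str, to_float_or_none("12,6"))  # 3.0 -12.6 None
```

In Python the failure case is a runtime exception; what a D version should do with a bad string (exception, compile-time error for literals, etc.) is left open here.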

3) In my D code I keep writing "length" all the time (currently I can find 458 "length" words inside my d libs). It's a long word; $ inside the scope of [] helps reduce the typing, but I fairly often write "lenght", so I still think a default attribute named "len" (as in Python) may be better than "length". The attribute "dup" too is an abbreviation, probably of "duplicate", so abbreviations seem acceptable in this context.

4) Regarding the array/string concatenation the D docs say:
>Many languages overload the + operator to mean concatenation. This confusingly leads to, does:
"10" + 3
produce the number 13 or the string "103" as the result? It isn't obvious, and the language designers wind up carefully writing rules to disambiguate it - rules that get incorrectly implemented, overlooked, forgotten, and ignored. It's much better to have + mean addition, and a separate operator to be array concatenation.<

I don't agree with that. Maybe that comment is true for Perl, where the wild casting is automatic, or for Java, which automatically converts values to strings if you add them to a string, but other languages are quite strict, like Python, which only allows you to "sum" two strings or to sum two numbers:
>>> "10" + "3"
'103'
>>> 10 + 3
13
>>> "10" + 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> 10 + "3"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Thanks to such strong (but dynamic) typing in Python I've never had problems using "+" to join strings, and if you look in the Python newsgroup you will find surprisingly few complaints from people making errors because of this overloading of "+". So I think "+" and "+=" are fine for joining strings/arrays if you use them in a strongly typed way, besides being more standard, because various languages use "+" for that purpose. (And string joining is a very common operation, and if you don't use an English keyboard you are forced to use the numeric keypad to type "~").

5) AST macros: I think they add power to the language, and I think I'll enjoy using them. They will increase the appeal & sexiness of the language for some people. But they have downsides too, and not just ones derived from how hygienic they are or aren't. In Lisp macros are very useful, but a common complaint is that "with macros every programmer reinvents his/her language, making it difficult to understand and modify the code written by others". So I suggest being careful in adding macros to D... Unfortunately I don't have a better suggestion to give. Note that macros that live inside the std lib avoid that problem, because they are standard, everyone uses them, and most people don't need to understand how the insides of the std lib work (as with the source of the C++ STL, or of Blitz++, etc).

6a) Often most of the time needed to write a program is spent debugging it. So I encourage D to try to adopt syntax and other constructs that help avoid bugs in the first place. Many bugs can be avoided by adding certain runtime checks that the compiler can remove when the code is compiled in release mode.
6b) It may be useful to create an online repository of bugs found in D code written by all kinds of people (ranked by their experience?). That may let us know which parts of D syntax lead to more bugs in people's code, so we can fix the language to avoid some of them :-) For example I'd like to know if the foreach() error caused by "," and ";" in point (1) is common among other D programmers, or if (unlikely) it's just a problem of mine.
6c) Some of the features of MemCheck seem to be among those useful things that can be active by default and disabled in release mode:
http://hald.dnsalias.net/projects/memcheck/
(If they are built-in they are more useful, because then everyone uses them by default, like array bounds checks).
6d) (This was present in one of my previous lists of notes, but in a less defined way) To reduce some kinds of bugs, "*" can be used for GC pointers and the "@" symbol for normal pointers, so the programmer can better specify his/her intentions and the compiler can catch, as compile-time errors, the operations that aren't allowed on GC pointers (but are allowed on normal pointers). A cast operation can then be defined to convert between the two classes of pointers.

7) The D syntax of is() is powerful, but I think some of its variants aren't very readable, so there may be a better syntax (even if it requires is() to be split into more than one syntax).

8) I think the string functions of Phobos are quite usable and powerful enough, but there are a bit too many of them and they are a bit too complex. So I think it's better to reduce their number/complexity a bit. It's very positive when about 90% of the string functions fit in your brain and can be used from memory, leaving the need to read the Phobos docs only for the few cases where you need some subtler/less common string function. Such a high recall rate is common among Python programmers (while Delphi has tons of very fast string functions (often written in assembly) and I have never succeeded in learning a large percentage of them), and it speeds up your programming a lot. If you put lots of string functions with complex usage in a lib, you get a more powerful string library, but then you have to consult the manual often, and coding becomes slow. That's why I suggested that the "chomp"/"chop" names are too similar and too easy to confuse.

9) AA literals need a lot of improvement:
void main() {
  int[][int] aa1;
  aa1[10] = [1, 2, 3];
  aa1[20] = [10, 20];

  //auto aa2 = [10: [1, 2, 3],
  //            20: [10, 20]];
  //test.d(9): Error: cannot infer type from this array initializer
  //test.d(9): Error: array initializers as expressions are not allowed
  //test.d(9): Error: cannot use array to initialize int
  //test.d(9): Error: array initializers as expressions are not allowed

  //auto aa2 = [10: [1, 2, 3].dup,
  //            20: [10, 20]];
  // test.d(16): comma expected separating array initializers, not .
  // test.d(16): semicolon expected following auto declaration, not 'dup'
  // test.d(17): found ':' when expecting ';' following 'statement'
  // test.d(17): found ']' when expecting ';' following 'statement'

  //int[][int] aa2 = [10: [1, 2, 3].dup,
  //                  20: [10, 20]];
  // test.d(23): comma expected separating array initializers, not .
  // test.d(23): semicolon expected, not 'dup'
  // test.d(24): found ':' when expecting ';' following 'statement'
  // test.d(24): found ']' when expecting ';' following 'statement'

  //auto aa2 = [10: ([1, 2, 3].dup),
  //            20: [10, 20]];
  //test.d(30): Error: cannot infer type from this array initializer
  //test.d(30): Error: array initializers as expressions are not allowed
  //test.d(30): Error: cannot use array to initialize int
  //test.d(30): Error: array initializers as expressions are not allowed
  //test.d(31): variable test.main.aa2 is not a static and cannot have static initializer

  int[][int] aa2 = [10: ([1, 2, 3].dup),
                    20: [10, 20]];
  // OK
}

10a) The new syntax for properties in C# seems nice; instead of this code:
private int myval;
public int Myval { get { return myval; } private set { myval = value; } }
You just need:
public int Myval { get; private set; }
10b) In C# "yield" too seems to have a nice syntax:
http://en.wikipedia.org/wiki/Comparison_of_C_Sharp_and_Java

11) Regarding the way to reference variables in the global scope, D uses the syntax:
  .varname
(The Python community has discussed similar topics, but they have different problems,
because D variables are explicitly declared:
http://www.python.org/dev/peps/pep-3104/ ).
But ".varname" may be too error-prone because the dot isn't very visible. So something longer and more explicit may be easier to see and harder to miss, like:
  global(x)
Note that we may think about a notation to specify how many scopes to ascend:
  ...x
That can be written as:
  outer(outer(outer(x)))
But this capability of ascending many scopes turns the code into a tangled mess, so it's an anti-feature.
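
For reference, this is how Python 3 spells explicit access to outer scopes: "global" for the module scope and "nonlocal" (the keyword introduced by PEP 3104) for an enclosing function. A small sketch; the function and variable names are made up for the example:

```python
counter = 0  # module-level (global) name

def bump_global():
    global counter      # explicitly reach the module scope
    counter += 1
    return counter

def make_accumulator():
    total = 0
    def add(n):
        # nonlocal binds to the nearest enclosing function scope
        # that defines "total" (PEP 3104).
        nonlocal total
        total += n
        return total
    return add

acc = make_accumulator()
print(acc(5), acc(2))  # 5 7
```

Note that Python has no way to say "go up exactly N scopes" either; the keyword always picks the nearest match.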

12) In Python functions are objects, so you can add attributes to them:
def foo(inc):
  foo.tot += inc
  return foo.tot
foo.tot = 10
foo(10)
print foo.tot # prints 20

So is it a silly idea to allow public static variables in D functions, to do something similar?

int foo(int inc) {
  public static int tot;
  foo.tot += inc;
  return foo.tot;
}
void main() {
  foo.tot = 10;
  foo(10);
  printf("%d\n", foo.tot);
}

13a) Random functions in Phobos of DMD 2.x: a random function has conflicting requirements, because you need it in very different situations. I think such requirements can be satisfied by having two different random functions:
- One very fast RND function, with very simple and short syntax, useful for little programs or where you need to compute lots of random numbers, like in a little game. It may use the Kiss algorithm used by Tango.
- One slower RND generator that has to be very good and thread safe. So this can use the Mersenne Twister, be a class and use a longer dotted syntax.
13b) In my d libs I have added some very useful functions like choice(sequence), randInt(a,b), randRange, shuffle(sequence), etc. I think they are almost necessary, and very easy to implement.
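
The helper names in 13b match functions in Python's standard random module; whether the D versions take the same arguments is an assumption here. A short Python 3 sketch of what those helpers do on the Python side (the seed and the sample list are arbitrary):

```python
import random

rng = random.Random(42)  # seeded only to make the sketch repeatable

seq = [10, 20, 30, 40]
picked = rng.choice(seq)        # one element of seq
n = rng.randint(1, 6)           # integer in [1, 6], both ends included
m = rng.randrange(0, 10, 2)     # one of 0, 2, 4, 6, 8
shuffled = seq[:]               # copy, because shuffle works in place
rng.shuffle(shuffled)

print(picked, n, m, shuffled)
```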

14) D makes the good choice of fixing the size of all types except real. I can accept that some compilers and CPUs support 80-bit floating point values while others don't, but I don't like using "real"s and leaving the compiler the choice of implementing them with 64 or 80 bits. So "real" could be renamed "real80" and have a fixed size. If the compiler/CPU doesn't support 80-bit floating point numbers, then fine, real80 isn't defined, and if you use it you get a compile-time error (or you use a static if + an alias to rename float as real, so you can fake them by yourself. I don't like the compiler to fake them silently for me).

15) I'd like to import modules only inside unittests, or just when I use the -unittest flag. With the help of a static if something simple like this may suffice:
static if (unittest) {
  import std.stdio;
  unittest foo1 { ... }
  unittest bar1 { ... }
}

16) Contrary to the C rules, in some situations I think it may be better if some results are upcast to ulong/long:
import std.stdio;
void main() {
  uint a = 3_000_000_000;
  uint b = 3_000_000_000;
  writefln(a + b); // 1705032704

  int c = -1_600_000_000;
  int d = -1_600_000_000;
  writefln(c + d); // 1094967296
}
(The Ada language uses a different solution to avoid this class of bugs, but it may be too far from the style of C-like languages. Delphi looks like a compromise).
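
The two surprising values in the D snippet are just the exact sums reduced modulo 2^32. A small Python 3 sketch (Python integers don't overflow, so the 32-bit wraparound is simulated by hand; wrap_u32 and wrap_i32 are hypothetical helper names):

```python
def wrap_u32(value):
    """Reduce an integer to the range of a 32-bit unsigned int."""
    return value % 2**32

def wrap_i32(value):
    """Reduce an integer to the range of a 32-bit two's-complement int."""
    wrapped = value % 2**32
    return wrapped - 2**32 if wrapped >= 2**31 else wrapped

exact_unsigned = 3_000_000_000 + 3_000_000_000    # what a ulong would hold
exact_signed = -1_600_000_000 + -1_600_000_000    # what a long would hold

print(exact_unsigned, wrap_u32(exact_unsigned))   # 6000000000 1705032704
print(exact_signed, wrap_i32(exact_signed))       # -3200000000 1094967296
```

Both exact sums fit comfortably in 64 bits, which is why widening the result to ulong/long would give the expected answer.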

17) After using D for some time I still think that "and", "or" (and maybe "not") as in Python are more readable than "&&" and "||" (and maybe "!", but this is less important).
The only advantage of keeping "&&" and "||" in D is that the compiler digests C-like code more easily.
In C++ "and", "or", "not", etc. are standard alternative tokens; GCC accepts them by default and has the -fno-operator-names option to turn them off.

--------------------------

The following things are sources of bugs. I don't know how frequent such bugs are in real code, and I have no idea how to modify the syntax/grammar/compiler to avoid/reduce their occurrence:

18a) In some situations side-effects may be bad for the health of the programmer:
import std.stdio;
void main() {
  int[5] x = 10, y = 20;
  writefln (x, " ", y); // [10,10,10,10,10] [20,20,20,20,20]

  int i = 0;
  while (i < x.length)
    y[i] = x[i++];
    //y[i++] = x[i]; // OK
  writefln (x, " ", y); // [10,10,10,10,10] [20,10,10,10,10]
}
D doesn't like warnings, but maybe some way can be invented to avoid this kind of error (Python doesn't have ++ and -- to avoid that kind of bug; you use += and -= or two separate statements. I like compact code, but not when it leads to more bugs, so I tend to avoid putting ++ and -- in subexpressions: I use them on their own or in statements separated by commas, as in for()).

18b) Here indentation doesn't follow the program meaning:
if (x == 0)
    if (y == 0)
        foo();
else
    z = x + y;

In Python this possible source of bugs is absent, because the indentation is the only thing that matters, so this:

if x == 0:
    if y == 0:
        foo()
else:
    z = x + y

is different from this other one, and you can see the difference very well:

if x == 0:
    if y == 0:
        foo()
    else:
        z = x + y

The net result is that in Python (once you have an editor that helps you avoid mixing leading tabs and spaces) those bugs can be avoided. And I like that a lot. To avoid that kind of bug the guidelines of good C/C++ (and probably Java/C# too) coding often tell you to always use braces:

if (x == 0) {
    if (y == 0) {
        foo();
    }
} else {
    z = x + y;
}

That way the code becomes longer, but I think you can avoid some bugs.
D already disallows:
  if (a > b);
  for (...);
But making D *require* {} after for, while, if/else, etc. looks like a "draconian" way to avoid that kind of bug. I don't have better solutions, but maybe we/you can think about this some more.

18c) This is another silly bug, but I presume it's not common enough to justify compiler changes:
void main() {
  void foo(int x) { printf("%d\n", x); }
  foo((1, 2)); // prints 2
}

Bye,
bearophile