Few things II

Tue Aug 7 07:03:06 PDT 2007

Here are some more notes that I think are less important than the first ones. Some of them are probably silly or useless; there are various things I don't know about D yet.

1) Ddoc has some bugs. For example, I'd like to write in the docs that a function takes an input in [0,1), but that produces wrong HTML output.

----------------------

2) Regarding the printing functions:

2a) I think that the names writef and writefln are a bit too long and not easy to remember, so better names can be found. For example show/shownl, show/showr, show/showl, put/putr, write/writel, print/printnl, print/printl, etc. I have seen that Tango uses print.

2b) I like the idea of the writefl/write functions that don't perform formatting. I was printing a string that happened to contain a % and that produced a bug. So I've created a better printing function that avoids that bug too.
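
The same trap exists in C's printf, which as far as I can tell behaves like writef in this respect. A minimal sketch of the safe pattern in C (format_literal is a hypothetical name I made up for this example): the text is passed as an argument to a fixed "%s" format, never as the format string itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes text into buf literally, so a '%' inside
   text is never parsed as a conversion specifier.
   Returns snprintf's result: the length of text, even when the copy in
   buf had to be truncated. */
int format_literal(char *buf, size_t size, const char *text) {
    assert(buf != NULL && text != NULL && size > 0);
    return snprintf(buf, size, "%s", text);
}
```

With this, format_literal(buf, sizeof buf, "100% done") leaves the text unchanged in buf.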

2c) writefln could print structs too, in some default way. Here is my basic (non-recursive!) implementation, which represents structs as <field1, field2, ...>:

char[] sstr(TyStruct)(TyStruct s) {
  auto result = new char[][s.tupleof.length];
  foreach (i, field; s.tupleof)
    result[i] = str(field);
  return "<" ~ result.join(", ") ~ ">";
}

2d) Regarding the printing I think it's not nice that:
writefln(["1", "2"])
prints the same thing as:
writefln([1, 2])
So maybe writefln can use "" when printing strings inside arrays/AAs, to differentiate them.
(To sort out the ideas you may also take a look at the difference between repr() and str() in Python).

2e) I also think that writefln, when printing arrays/AAs, could put a space after each comma.
I think [el1, el2, el3] and [1:"a", 2:"b"] are more readable than [el1,el2,el3] and [1:a,2:b].

2f) I think this is a writefln inconsistency:
import std.stdio;
void main() {
  char[3] a = "abc";
  writefln([a, a]); // Prints: [abc,abc]
  writefln([a: 1]); // Prints: [[a,b,c]:1]
}

----------------------

3) Given an array x1, like a char[], it seems to me that x1.dup is a bit slower compared to:
char[] x2 = new char[x1.length];
x2[] = x1;
I hope this can be fixed (or maybe someone can tell me why).

----------------------

4) __builtin_expect is a built-in function that gcc (versions >= 2.96) offers to let programmers pass branch prediction information to the compiler. Its return value is its first argument (which can only be an integer). This may be useful for dmd too, maybe in the far future, when dmd has solved most of its bugs and the D syntax is more settled.
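
To show what I mean, this is roughly how it is used in C with gcc. It is only a sketch: LIKELY, UNLIKELY and sum_non_negative are names I made up, and the fallback branch assumes a compiler without the builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convenience macros (not from gcc itself); they fall back
   to a plain test on compilers that lack the builtin. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative items; negative items are assumed to be rare,
   so the branch that skips them is marked as unlikely. */
long sum_non_negative(const int *items, size_t n) {
    assert(items != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(items[i] < 0))
            continue;          /* cold path: skip the rare negative value */
        total += items[i];
    }
    return total;
}
```

The hint changes only how the compiler lays out the code, not the result: the function returns the same sum with or without the macros.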

----------------------

5) I think built-in sets can be quite useful (I use them often in Python; I can show some examples if you want). A few set operations on the keys of AAs may be quite useful too (Python doesn't have them). (A first version could live in Phobos, implemented using AAs with empty values.)

----------------------

6) I suggest adding a third, optional variable to the foreach over AAs: a progressive numerical index. It seems easy to add, since foreach already supports more than two variables:
foreach(index, key, value; someAA) {...}

But I don't much like the fact that foreach with one parameter scans the values:
foreach(value; someAA) {...}
because most of the time you need to iterate over the keys (in Python too, the default iteration on a dict is over its keys). With the key you can find the value, while I believe that with the value you can't find the key.

I'd like a way to find the key if you have a pointer to the actual value contained inside the AA. (I have seen that you can cast the pointer to the value to a struct and start walking back from there, but I have not yet succeeded in finding the key in a reliable way; maybe you can suggest how. Maybe a function to do this can be added to Phobos.) I could use it to create an "ordered associative array" data structure (using the built-in AAs) that keeps a linked list of the AA pairs in their original insertion order. To build that linked list of values you may need to find the key of a given value pointer (or of a given key-value pair).

----------------------

7) A "len" property instead of "length" may be useful (even if it's a bit less clear), because it's used so often that it's tedious to write such a long word over and over. (And because 'length' isn't an easy word to write for non-native English speakers.)

----------------------

8) From the docs:
Overlapping copies are an error:
s[0..2] = s[1..3];	// error, overlapping copy
s[1..3] = s[0..2];	// error, overlapping copy
Disallowing overlapping makes it possible for more aggressive parallel code optimizations than possible with the serial semantics of C.

But the programmer may need such operations anyway now and then, so maybe a different syntax can be added... I don't know. Or maybe other solutions can be found.
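
For comparison, C already splits the two cases: memcpy requires non-overlapping ranges, while memmove allows overlap. A small sketch of the overlapping case in C (copy_within is a hypothetical helper of mine, not something from the D docs); D could offer something similar as a library function or with a separate syntax.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies len chars from s[src .. src+len) to
   s[dst .. dst+len) inside one buffer of n chars. memmove is defined
   for overlapping ranges, memcpy is not. */
void copy_within(char *s, size_t n, size_t dst, size_t src, size_t len) {
    assert(s != NULL);
    assert(len <= n && dst <= n - len && src <= n - len);
    memmove(s + dst, s + src, len);
}
```

With s = "abcd", the equivalent of s[0..2] = s[1..3] gives "bccd", and the equivalent of s[1..3] = s[0..2] gives "aabd".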

----------------------

9) Bit fields of structs: D is a practical language, so I think it can support them too, to allow people to translate C => D code more simply (D has already chosen similar practical compromises regarding C for other things).
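
This is the kind of C declaration I mean. It is only a small sketch: PacketFlags and make_flags are made-up names, and the exact bit layout is implementation-defined in C.

```c
#include <assert.h>

/* Hypothetical flags record, only for illustration. The three fields
   need 8 bits in total; how they are packed is up to the compiler. */
struct PacketFlags {
    unsigned int version  : 3;  /* values 0..7  */
    unsigned int urgent   : 1;  /* values 0..1  */
    unsigned int priority : 4;  /* values 0..15 */
};

/* Builds a flags record, checking that each value fits in its field. */
struct PacketFlags make_flags(unsigned version, unsigned urgent,
                              unsigned priority) {
    assert(version < 8 && urgent < 2 && priority < 16);
    struct PacketFlags f;
    f.version = version;
    f.urgent = urgent;
    f.priority = priority;
    return f;
}
```

Code like this has to be rewritten by hand with shifts and masks when it is ported to D today.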

----------------------

10) I think the compiler should accept a line like this too, converting the strings as necessary:
char[][int] a = [1:"abba", 2:"hello"];

It may even accept this too (probably the values have to be converted to dynamic arrays):
auto a = [1:"abba", 2:"hello"];
auto aa1 = [1:2, 3:4];
auto sa = ["0x","1x"];
auto a2 = [[1, 2], [3, 4]];

While this works already:
auto a = [1, 2, 3, 4];
auto ca = ['0','1'];
int[int] aa2 = [1:2, 3:4];
int[][] a3 = [[1, 2], [3, 4]];

----------------------

11) I have seen that D AAs are quite efficient, but there are probably very good hash implementations around. So "superfasthash", or a modified version of the highly refined and optimized Python/Perl hashes, could be useful to replace the one currently used by DMD. (Python's source code is fully open source, but code licenses may be a problem anyway; I don't know.)

----------------------

12) I have seen that D allows passing anonymous functions like:
(int x){return x*x;}

C# 3.0 uses a more concise syntax:
x => x*x

That syntax isn't bad, but I don't know how the compiler automatically infers the types in such situations; it may be a bit complex.

----------------------

13) The Haskell community is quite organized: they have a large page on the main Haskell site that compares the best ways to write the entries for the Shootout site. They have even improved the language because one of those tests showed a bad (slow) side of Haskell. I have recently posted a few D entries to the Shootout site, with mixed results (one entry is at the top).
On the subject of shootout code, the DMD developers could use the following program to improve the compiler, because it shows a bad side of DMD compared to GCC:

C code:

#include <stdio.h>
#include <stdlib.h>
unsigned long fib(unsigned long n) {
    return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
}
int main(int argc, char *argv[]) {
    int N = ((argc == 2) ? atoi(argv[1]) : 1);
    printf("%lu\n", fib(N));
    return(0);
}

The D code:

import std.stdio, std.conv;
ulong fib(ulong n) {
    return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
}
void main(string[] argv) {
    ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
    writefln(fib(n));
}

Used:
gcc version 3.4.2 (mingw-special)
DMD Digital Mars D Compiler v1.020 (Windows)

Compiled with:
gcc -O2 fiboC.c -o fiboC.exe
dmd -O -release -inline fiboD.d

Timings with n = 38:
C: 4.38 s
D: 5.28 s

I think such a difference is big enough to justify an improvement.
(One caveat: D's ulong is always 64 bits, while unsigned long is 32 bits with this gcc on Windows, so the two programs don't do exactly the same arithmetic and part of the gap may come from that.)

----------------------

14) This code gives:
problem.d(5): Error: cannot evaluate StrType() at compile time
But I'd like to use typeid().toString at compile time (it's useful inside compile time functions, etc):

import std.stdio;
char[] StrType(T)() {
    return typeid(T).toString;
}
const char[] r = StrType!(typeof(1))();
void main() {
    writefln(r);
}

Later I solved similar problems with mixins, compile-time demangling, etc., but I think the toString of typeid() should just work at compile time.

----------------------

15) On the newsgroup I have seen a list of "wishes", among them I like:
- support struct&array in switch
- Multi-Dimensional Allocation (of dynamic arrays too)
- Named keyword arguments
- Iterators and Generators (yield is a good starting point)
- Templates in classes
- !in (probably easy to implement)
- Multiple return values (tuples) (Quite useful, but I have found other solutions)
- Statically check for == null (maybe not easy to implement)
- Implicit New or short syntax for New

Besides those, I like Python's list generators, but their syntax isn't very compatible with D syntax, so I don't know how they could be written.

----------------------

16) Besides C++ and Python, I think D can copy a bit from the Erlang and Fortress languages too. Erlang is nice for its easy parallel capabilities, and the Fortress language made by Sun has lots of collection literals (bags, sets, lists, etc.) and easy management of parallel processing (I think it looks somewhat similar to the ParallelPascal one).

----------------------

17) I have been following this newsgroup for some weeks, and now and then I have seen DMD problems related to type inference pop up (I too have found a 'bug' related to that; I have shown it on the learn newsgroup). I am helping with the development of "ShedSkin", a compiler from an implicitly static subset of Python to C++. This program contains a really powerful type inferencer, able to find entirely by itself the types of all the variables used inside a 3000-line Python program (that contains no explicit type annotations).
ShedSkin is rather slow and I think D will never need all its type inference capabilities, but I think its main point is to show that you can produce very fast code even if you start with code written in a very easy high-level syntax. This 'discovery' may be useful to D too in the future.

----------------------

18) I still don't like that string literals are immutable while the other literals are not. I suggest making string literals mutable like the others (automatic dup during program creation). This would avoid some of my bugs (and it can probably avoid problems in porting code that uses strings between Windows and Linux) and spare me some tricks to de-constant strings when they are constant. In a few situations you may need to write a program that avoids heap activity, so there you may need the normal string literals anyway; in such cases you could add a certain letter after the string literal (like c?) that tells the compiler not to perform the automatic dup.

----------------------

19) I have created many very general function templates like map, imap, zip, sum, max, min, filter, ifilter, all, any, etc., that accept arrays, AAs and iterable classes. It may be useful to add a bit of functional-style programming to D, inside a Phobos module. Such functions may fit the 90% of the code where maximum running speed isn't necessary. They shorten the code, reduce bugs, speed up programming, make the program simpler to read, etc., and being high-level they may even end up helping the compiler produce efficient code (as foreach sometimes does).

----------------------

20) After a few tests of mine, I think the default way dmd computes struct hashes should be improved (or removed), so that the hash also looks inside the dynamic arrays contained in the struct. Otherwise it silently gives "wrong" results.

----------------------

21) In this newsgroup I've seen that people really like being able to write:
txt.split()
But I don't like it much, because it's syntactic sugar that masks the fact that split is a function (which pollutes the namespace) and not a true method (object function). That namespace pollution may cause problems if you have already defined elsewhere a function with the same name and a similar signature.

I think that's all. My next posts will probably be much shorter and mostly comments on specific posts :-)

Bear hugs,
bearophile