Few things II

Tue Aug 7 08:49:37 PDT 2007

bearophile wrote:
> 
> 6) I suggest adding a third, optional foreach variable for AAs: the progressive numerical index. It seems easy to add, since foreach already supports more than two variables:
> foreach(index, key, value; someAA) {...}
> 
> But I don't much like the fact that foreach with one parameter scans the values:
> foreach(value; someAA) {...}
> because most of the time you need to iterate over the keys (in Python, too, the default iteration over a dict is over its keys). With the key you can find the value, while I believe with the value you can't find the key.

I disagree.  Since the key in an AA is equivalent to the index in an
array, the current behavior of foreach is consistent for all types
passed to it.  One parameter gives you the value, two parameters give
you the 'key' and the value.  And since the key for an AA is equivalent
to the index value, I'm not sure I like the three-parameter idea either.
It's just as easy to use a separate variable for that when counting
entries is important anyway.
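
A minimal sketch of the separate-counter approach, written for current D (D2) rather than the 2007 compiler.  The helper name `describe` and the sample data are made up for illustration.  AA iteration order is unspecified, so the counter only numbers entries in whatever order foreach happens to visit them.

```d
import std.format : format;

// Hypothetical helper: numbers the entries of an AA in iteration order.
string[] describe(int[string] aa)
{
    string[] lines;
    size_t index = 0;            // the separate counter variable
    foreach (key, value; aa)     // two parameters: key and value
    {
        lines ~= format("%s: %s => %s", index, key, value);
        ++index;
    }
    return lines;
}

unittest
{
    // A single entry avoids depending on AA iteration order.
    assert(describe(["a": 1]) == ["0: a => 1"]);
    assert(describe(["a": 1, "b": 2]).length == 2);
}
```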

> I'd like a way to find the key if you have a pointer to the actual value contained inside the AA. (I have seen that you can cast the pointer to the value to a struct and start to walk back from there, but I have not yet succeeded in finding the key in a reliable way; maybe you can suggest one. Maybe a function to do this can be added to Phobos.) I could use it to create an "ordered associative array" data structure (using the built-in AAs) that keeps a linked list of the AA pairs, according to the original insertion order inside the AA. To build that linked list of values you may need to find the key of a given value pointer (or a given key-value pair).

It's hardly efficient, but:

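// Linear search: returns the first key whose value equals val, or
// Key.init if nothing matches (ambiguous when Key.init is a valid key).
Key getKeyOf(Value, Key)( Value[Key] aa, Value val )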
{
     foreach( k, v; aa )
     {
         if( v == val )
             return k;
     }
     return Key.init;
}
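
The search above compares values, so two entries with equal values cannot be told apart.  If what you hold is a pointer to the stored value, a variant can compare addresses instead.  This is only a sketch for current D (D2); `keyOfPointer` is a made-up name, it is still a linear scan, and it assumes the AA has not been modified since the pointer was taken.

```d
// Hypothetical helper: finds the key whose stored value lives at `ptr`.
// Returns false if `ptr` does not point at a value inside `aa`.
bool keyOfPointer(Value, Key)(Value[Key] aa, Value* ptr, out Key key)
{
    foreach (k, ref v; aa)       // `ref` so that &v is the slot inside the AA
    {
        if (&v is ptr)
        {
            key = k;
            return true;
        }
    }
    return false;
}

unittest
{
    int[string] aa = ["x": 1, "y": 1];   // equal values, distinct slots
    string key;
    assert(keyOfPointer(aa, "y" in aa, key) && key == "y");

    int elsewhere = 1;
    assert(!keyOfPointer(aa, &elsewhere, key));
}
```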

> ----------------------
> 
> 7) A "len" property instead of "length" may be useful (even if it's a bit less clear) because it's used so often that it's tedious to write such a long word over and over. (And because 'length' isn't an easy word to write for non-English-speaking people.)

It would be nice if there were a way to support this within the current
syntax.  Though I suppose a property method could be created:

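import std.c.string;  // memmove (core.stdc.string in D2)

void copyTo(T)( T[] src, T[] dst )
in
{
     assert( src.length == dst.length );
}
body
{
     // memmove takes a byte count and tolerates overlapping slices.
     memmove( dst.ptr, src.ptr, src.length * T.sizeof );
}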

myArray[1 .. 3].copyTo( myArray[2 .. 4] );
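
Applied to the `len` request itself, the same trick might look like the sketch below.  It assumes current D (D2), where a free function can be called with property syntax on its first argument; `len` is a made-up name, not something provided by the language or Phobos.

```d
// Hypothetical shorthand for .length on any array or slice.
size_t len(T)(T[] arr)
{
    return arr.length;
}

unittest
{
    int[] numbers = [1, 2, 3];
    assert(numbers.len == 3);          // called like a property
    assert(numbers[1 .. 3].len == 2);  // works on slices too
    assert("abc".len == 3);
}
```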

> ----------------------
> 
> 11) I have seen that D AAs are quite efficient, but there are probably very good hash implementations around. So "superfasthash" or a modified version of the highly refined and optimized Python/Perl hashes could be useful to replace the one currently used by DMD. (Python source code is fully open source but code licenses may be a problem anyway, I don't know.)

Not hard to do.  I've even considered using a "power of two" number of
buckets for the D AA, but because the degenerate case is so much worse
than with a prime number of buckets, I think the current implementation
is more generally useful.  A library hashtable is probably the best
option for more specialized needs.
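
To make the degenerate case concrete, here is a small sketch in current D (D2).  It is not the DMD implementation: it just reduces made-up hash values with a plain modulo and counts how many buckets end up in use.  The name `bucketsUsed` and the choice of 16 versus 17 buckets are assumptions for illustration.

```d
// Counts how many buckets receive at least one key when hash values are
// reduced with a plain modulo, standing in for a real hash table.
size_t bucketsUsed(const size_t[] hashes, size_t bucketCount)
{
    auto used = new bool[bucketCount];
    foreach (h; hashes)
        used[h % bucketCount] = true;

    size_t count = 0;
    foreach (u; used)
    {
        if (u)
            ++count;
    }
    return count;
}

unittest
{
    // Made-up hash values that are all multiples of 16, such as the
    // addresses of 16-byte-aligned objects.
    size_t[] hashes;
    foreach (size_t i; 0 .. 64)
        hashes ~= i * 16;

    assert(bucketsUsed(hashes, 16) == 1);    // everything lands in bucket 0
    assert(bucketsUsed(hashes, 17) == 17);   // spread over every bucket
}
```

With a power-of-two bucket count, any pattern in the low bits of the hash goes straight into the bucket index; a prime count breaks up that pattern at the cost of a slower modulo.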

> ----------------------
> 
> 12) I have seen that D allows passing anonymous functions like:
> (int x){return x*x;}
> 
> C# 3.0 uses a more concise syntax:
> x => x*x

Crazy.  Who knew C# had functional influences?

> ----------------------
> 
> 13) The Haskell language community is quite organized: they have a large page on the main Haskell site that compares the best ways to create the entries for the Shootout site. They have even improved the language because one such test exposed a bad (slow) side of Haskell. I have recently posted a few D entries to the Shootout site, with mixed results (one entry is at the top).
> Regarding the shootout code, the following code may be used by DMD developers to improve the compiler; it shows a bad side of DMD compared to GCC:
> 
> C code:
> 
> #include <stdio.h>
> #include <stdlib.h>
> unsigned long fib(unsigned long n) {
>     return( (n < 2) ? 1 : (fib(n-2) + fib(n-1)) );
> }
> int main(int argc, char *argv[]) {
>     int N = ((argc == 2) ? atoi(argv[1]) : 1);
>     printf("%lu\n", fib(N));
>     return(0);
> }
> 
> 
> The D code:
> 
> import std.stdio, std.conv;
> ulong fib(ulong n) {
>     return (n < 2) ? 1 : (fib(n-2) + fib(n-1));
> }
> void main(string[] argv) {
>     ulong n = ((argv.length == 2) ? toUlong(argv[1]) : 1);
>     writefln(fib(n));
> }
> 
> 
> Used:
> gcc version 3.4.2 (mingw-special)
> DMD Digital Mars D Compiler v1.020 (Windows)
> 
> Compiled with:
> gcc -O2 fiboC.c -o fiboC.exe
> dmd -O -release -inline fiboD.d
> 
> Timings with n = 38:
> C: 4.38 s
> D: 5.28 s
> 
> I think such a difference is big enough to justify an improvement.

Unless this was run on a 64-bit Unix machine, the major difference in
performance is that 'unsigned long' in the C code is a 32-bit value,
while 'ulong' in the D code is a 64-bit value.  Someone should verify
that the sizes match.
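
One way to check, sketched for current D (D2) rather than DMD 1.020: `core.stdc.config` provides `c_ulong`, an alias sized like the C compiler's `unsigned long` on the target, and a width-generic `fib` lets the same code be timed with 32-bit and 64-bit arithmetic.  The templated `fib` is my own variation on the benchmark, not the posted code.

```d
import core.stdc.config : c_ulong;   // sized like C's unsigned long

// D's integer widths are fixed by the language.
static assert(uint.sizeof == 4 && ulong.sizeof == 8);

// Width-generic version of the benchmark function.
T fib(T)(T n)
{
    return (n < 2) ? 1 : fib!T(n - 2) + fib!T(n - 1);
}

unittest
{
    assert(fib!uint(10) == 89);
    assert(fib!ulong(10) == 89);

    // 4 bytes on Windows and 32-bit targets, 8 on 64-bit Unix.
    assert(c_ulong.sizeof == 4 || c_ulong.sizeof == 8);
}
```

Timing `fib!uint(38)` against `fib!ulong(38)` would show how much of the gap comes from the operand width alone.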

> ----------------------
> 
> 14) This code gives:
> problem.d(5): Error: cannot evaluate StrType() at compile time
> But I'd like to use typeid().toString at compile time (it's useful inside compile time functions, etc):
> 
> import std.stdio;
> char[] StrType(T)() {
>     return typeid(T).toString;
> }
> const char[] r = StrType!(typeof(1))();
> void main() {
>     writefln(r);
> }

What about .stringof?
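
A minimal sketch of what that could look like, using current D (D2) syntax rather than what DMD 1.020 accepts.  `StrType` is kept from the quoted code, but here it is an enum template instead of a function.

```d
// StrType!T is the name of T as a compile-time string.
enum string StrType(T) = T.stringof;

// Evaluated entirely at compile time.
enum r = StrType!(typeof(1));
static assert(r == "int");

unittest
{
    assert(StrType!double == "double");
}
```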

Sean