Few things

Wed Aug 1 03:18:17 PDT 2007

Hello, this is my second post on digitalmars newsgroups (the first one is on digitalmars.D.learn). Here are some ideas, suggestions (and some problems too) I have found. I am currently using dmd 1.020 on Windows. Probably some of them are useless (maybe because already present, in the same or different form), or already discussed/refused, or even plain stupid, but you may find something interesting too among them.

1) This is quite important for me: dmd could look for the needed modules by itself, starting from the current directory and the package directories (the -I parameter is still useful for more complex situations), and then exploring the directories in the "path" variable. So instead of:

dmd importer.d amodule1.d amodule2.d ...

To compile/run you just need:

dmd importer.d
dmd -run importer.d

So the programmer can avoid giving the module names twice, once inside the code and once again to the compiler. Later I found that the good "bud" utility does that and quite a bit more, but I believe the compiler can have that basic capability of looking for modules by itself.
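
To make the idea concrete, here is a rough Python sketch of such a search. It is only an illustration under strong assumptions: the names find_modules and IMPORT_RE are made up for this example, the regular expression recognizes only plain "import a.b.c;" lines, and modules that are not found in the given directories (for example library modules) are silently skipped. It says nothing about how dmd or bud actually work.

```python
import os
import re

# Assumption: only simple "import a.b.c;" lines are recognized.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)

def find_modules(main_path, search_dirs):
    """Return the source files reachable from main_path through imports."""
    found = []
    pending = [main_path]
    while pending:
        path = pending.pop()
        if path in found:
            continue  # already visited, also protects against import cycles
        found.append(path)
        with open(path) as f:
            source = f.read()
        for name in IMPORT_RE.findall(source):
            # The module a.b.c is expected in the file a/b/c.d
            relative = name.replace(".", os.sep) + ".d"
            for directory in search_dirs:
                candidate = os.path.join(directory, relative)
                if os.path.isfile(candidate):
                    pending.append(candidate)
                    break
    return found
```

Calling find_modules("importer.d", ["."]) would then give the list of files to hand to the compiler, so the module names are written only once, inside the code.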

-------------------------

2) dmd can start looking for the files read by the "import expressions" from the current directory (where the main module is), so in most situations the -J flag isn't necessary anymore.

-------------------------

3) I think it may be better to import "statically" by default (as in python, in that language people suggest to avoid the "from amodule import *" that means import all names from amodule).

-------------------------

4) Python defines < <= == != >= > among dicts (AAs) too:

>>> {1:2, 2:3} == {1:2, 2:3}
True
>>> {1:3, 2:3} == {1:2, 2:3}
False
>>> {1:2, 2:3} < {1:2, 2:3}
False
>>> {1:2, 2:3} > {1:2, 2:3}
False
>>> {1:2, 2:3, 3:1} > {1:2, 2:3}
True
>>> {1:2, 2:3, 3:1} > {1:2, 2:3, 4:1}
False
>>> {1:2, 2:3, 3:1} < {1:2, 2:3, 4:1}
True

It seems not even the quite useful opEquals among AAs is defined yet in dmd V1.015:
assert(['a':2, 'b':3] == ['a':2, 'b':3]);

How I have implemented it:

bool equalAA(TyK1, TyV1, TyK2, TyV2)(TyV1[TyK1] aa1, TyV2[TyK2] aa2) {
  static if( !is(TyK1 == TyK2) || !is(TyV1 == TyV2) )
    return false;
  else {
    if (aa1.length != aa2.length)
      return false;
    foreach(k, v; aa1) {
      auto k_in_aa2 = k in aa2;
      if (!k_in_aa2 || (*k_in_aa2 != v))
        return false;
    }
    return true;
  }
}

Use case: I use it inside unittests to test the correctness of functions that return an AA, comparing their result with the known correct results.

-------------------------

5) Inside the unit tests I'd like to use something else besides assert(), like a fails():

fails(something, someException1)
fails(something, Excep1, Excep2, ...)
(Or "raises" instead of "fails").

So instead of code like:

bool okay = false;
try
    foo(-10);
catch(NegativeException e)
    okay = true;
assert(okay);

I can use something like:

fails(foo(-10), NegativeException);

So far I've managed to create the following, but its usage is far from nice (maybe you can improve it). It has to be called like this:
assert(Raises!(...)(...))

bool Raises(TyExceptions...)(void delegate() deleg) {
  try
    deleg();
  catch(Exception e) {
    foreach(TyExc; TyExceptions)
      if (cast(TyExc)e !is null)
        return true;
    return false;
  }
  return (!TyExceptions.length);
}
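
For comparison, the same logic fits in a few lines of Python. This is only a sketch: raises, NegativeError and foo are names invented for the example. Like the D template, it returns True when one of the listed exceptions is raised, and with an empty list of exceptions it returns True only when nothing is raised.

```python
def raises(func, *exception_types):
    """Return True if calling func() raises one of exception_types."""
    try:
        func()
    except Exception as e:
        # isinstance with an empty tuple is always False
        return isinstance(e, exception_types)
    # Nothing was raised: that is a success only if no exception was expected
    return not exception_types


# Hypothetical function under test
class NegativeError(Exception):
    pass

def foo(n):
    if n < 0:
        raise NegativeError(n)
    return n

assert raises(lambda: foo(-10), NegativeError)
assert not raises(lambda: foo(10), NegativeError)
```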

-------------------------

6) A way to create an empty associative array can be useful, with something like:
new int[char[]]();

I have succeeded in creating something similar, but I think a standard built-in way is better:

template AA(KeyType, ValueType) {
    const ValueType[KeyType] AA = AA_impl!(KeyType, ValueType).res;
}
template AA_impl(KeyType, ValueType) {
    ValueType[KeyType] result;
    const ValueType[KeyType] res = result;
}

Usage: AA!(char[], int)

-------------------------

7) From the FAQ: >Many people have asked for a requirement that there be a break between cases in a switch statement, that C's behavior of silently falling through is the cause of many bugs. The reason D doesn't change this is for the same reason that integral promotion rules and operator precedence rules were kept the same - to make code that looks the same as in C operate the same. If it had subtly different semantics, it will cause frustratingly subtle bugs.<

I agree with both points of view. My idea: give this statement a different name (like caseof, as in Pascal) instead of "switch", so you can change its semantics too, removing the fall-through (you may use the Pascal semantics too).

-------------------------

8) From the docs: >If the KeyType is a struct type, a default mechanism is used to compute the hash and comparisons of it based on the binary data within the struct value. A custom mechanism can be used by providing the following functions as struct members:<
I think structs can have a default way to sort them too, lexicographically, that can be replaced by a custom mechanism when needed. How I have implemented it:

int scmp(TyStruct1, TyStruct2)(TyStruct1 s1, TyStruct2 s2) {
  static if (s1.tupleof.length < s2.tupleof.length) {
    foreach(i, field; s1.tupleof) {
      if (field < s2.tupleof[i])
        return -1;
      else if (field > s2.tupleof[i])
        return 1;
    }
    return -1;
  } else {
    foreach(i, field; s2.tupleof) {
      if (field < s1.tupleof[i])
        return 1;
      else if (field > s1.tupleof[i])
        return -1;
    }
    static if (s1.tupleof.length == s2.tupleof.length)
      return 0;
    else
      return 1;
  }
}

How I use it, to shorten the code I use to sort structs:
struct P1 {
  char[] k;
  int v;
  int opCmp(P1 other) {
    return scmp(this, other);
  }
}

I think dmd can do something like that by default.
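
The ordering that scmp computes can be sketched in Python on plain tuples of fields (the function below is written only for this illustration): fields are compared in order, the first difference decides, and when one struct is a prefix of the other the shorter one comes first.

```python
def scmp(fields1, fields2):
    """Lexicographic three-way comparison: returns -1, 0 or 1."""
    for a, b in zip(fields1, fields2):
        if a < b:
            return -1
        if a > b:
            return 1
    # All the common fields are equal: the shorter sequence sorts first
    return (len(fields1) > len(fields2)) - (len(fields1) < len(fields2))


assert scmp(("abc", 1), ("abc", 2)) == -1
assert scmp(("abd", 0), ("abc", 9)) == 1
assert scmp(("abc", 1), ("abc", 1)) == 0
```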

-------------------------

9) Often 90% of the lines of a program don't need to run at max speed or to use as little memory as possible, for such lines a higher-level kind of programming is the best thing. For the other 10% of lines, D allows a lower-level programming style, allowing assembly too.
For that 90% of code it may be useful to add an *optional* parameter to the sort property/methods of arrays. You may call it 'key' (see the same parameter of sort/sorted in Python 2.4+): it's a function that takes a value of the array and returns another value, and those returned values are compared to sort the original ones.
(Note: CPython has a *really* optimized C sort routine, called Timsort (inside listobject), D may just copy it if some tests show it's faster & better). (There is an almost-free-standing version of Timsort too that can be found online.)
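
One possible way to implement such a 'key' parameter is the old decorate-sort-undecorate idiom, sketched here in Python. The name sort_by_key is made up for the example, and this is not a description of how CPython's own sort is written: each item is paired with its key, the pairs are sorted, then the keys are thrown away.

```python
def sort_by_key(items, key):
    """Return a sorted copy of items, ordered by key(item)."""
    # The index breaks ties, so equal keys keep their original order
    # and the items themselves are never compared.
    decorated = [(key(item), i, item) for i, item in enumerate(items)]
    decorated.sort()
    return [item for _, _, item in decorated]


words = ["pear", "fig", "banana", "kiwi"]
assert sort_by_key(words, len) == ["fig", "pear", "kiwi", "banana"]
# Same result as the built-in key parameter
assert sort_by_key(words, len) == sorted(words, key=len)
```

The key function is called once per item, not once per comparison, which matters when computing the key is slow.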

-------------------------

10) I think a few useful properties/methods can be added to the built-in AAs:
- aa.clear() to remove all key/val from the AA.
- aa.dup, to create a copy of the AA.
- aa.sort, that requires a 'key' function (that takes two arguments, key and value) and returns the sorted array of the keys. Here is a simple example of such sorting:

TyKey[] sortedAA(TyKey, TyVal, TyFun)(TyVal[TyKey] aa, TyFun key) {
  struct Pair {
    TyKey k;
    ReturnType!(TyFun) key_kv;

    int opCmp(Pair otherPair) {
      if (key_kv == otherPair.key_kv) return 0;
      return (key_kv < otherPair.key_kv) ? -1 : 1;
    }
  }

  Pair[] pairs;
  pairs.length = aa.length;
  uint i = 0;
  foreach(k, v; aa) {
    pairs[i] = Pair(k, key(k, v));
    i++;
  }

  TyKey[] result;
  result.length = aa.length;
  foreach(ii, p; pairs.sort)
    result[ii] = p.k;

  return result;
}

You can use it with any key, like:
TyV Vgetter(TyK, TyV)(TyK k, TyV v) { return v; }
For the 10% of the lines of the program that need max speed or low memory usage, you can write sorting code the usual lower-level way.
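
For reference, the behaviour I have in mind for sortedAA is what this small Python sketch does on a dict (sorted_keys is just a name chosen for the example):

```python
def sorted_keys(aa, key):
    """Return the keys of aa, sorted by key(k, v)."""
    return sorted(aa, key=lambda k: key(k, aa[k]))


freqs = {"a": 3, "b": 1, "c": 2}
# Sort the keys by their value, like Vgetter does
assert sorted_keys(freqs, lambda k, v: v) == ["b", "c", "a"]
```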

-------------------------

11) I think std.string can have an iterable split too, similar to this one. For big input strings it's *much* faster and requires much less memory than the usual split (Python as a whole is shifting from functions that return arrays to lazy iterators, which are often faster and need less memory):

Xsplitarray!(TyElem) xsplitarray(TyElem)(TyElem[] items, TyElem delimiter) {
  return new Xsplitarray!(TyElem)(items, delimiter);
}

class Xsplitarray(TyElem) {
  TyElem[] items, part;
  TyElem delimiter;

  this(TyElem[] initems, TyElem indelimiter) {
    items = initems;
    delimiter = indelimiter;
  }

  int opApply(int delegate(ref TyElem[]) dg) {
    size_t i, j;
    int result;

    while (j < items.length)
      for (; j<items.length; j++)
        if (items[j] == delimiter) {
          part = items[i .. j];
          result = dg(part);
          if (result)
            goto END; // ugly
          i = j = j + 1;
          break;
        }

    if (i <= items.length) {
      part = items[i .. length];
      result = dg(part);
    }

    END:
    return result; // nonzero if the foreach body has exited early
  }
}
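
The same lazy split is quite short as a Python generator. This is a sketch for comparison only, and xsplit is a name made up here. It yields one part at a time and never builds the full list, and it gives the same parts as the class above, including the empty ones.

```python
def xsplit(items, delimiter):
    """Lazily yield the parts of items separated by delimiter."""
    start = 0
    for j, item in enumerate(items):
        if item == delimiter:
            yield items[start:j]
            start = j + 1
    # Last part (empty if the input ends with a delimiter)
    yield items[start:]


for part in xsplit("one two  three", " "):
    print(repr(part))
```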

-------------------------

12) From the docs:>Pointers in D can be broadly divided into two categories: those that point to garbage collected memory, and those that do not.<
GC pointers don't support some operations, so why not define two kinds of pointers (gcpointer?), so the compiler can raise an error when the code tries to do one of the many illegal operations (that is, ones that produce undefined behavior) on GC pointers? (If needed, a cast() may be used to convert a GC pointer to a normal pointer and vice versa.)

-------------------------

13) Coverage processing: numbers can become misaligned in very long loops that execute some lines lots of times. So I suggest three changes: ============ instead of "0000000" (among the other numbers my eyes spot a sequence of equal signs better than a sequence of zeros), automatic management of the indentation, and the ability to divide numbers by 1000 or 1'000'000. I use this little (a bit cryptic) Python script to do such processing (but I'd like dmd to do something similar by itself):

filename = "fannkuch3"
#remove_last = None  # then numbers shown are the original ones
#remove_last = -3    # then numbers shown are in thousands
remove_last = -6     # then numbers shown are in millions

lines = file(filename+ ".lst").readlines()[:-2]
parts = [line.split("|", 1) for line in lines]
n = max(len(p1) for p1, p2 in parts) # len of the maximum number
for p1, p2 in parts:
    p1s = p1.strip()
    p2b = "|" + p2.rstrip()
    if p1s:
        if p1s == "0000000":
            print ("=" * (n + remove_last)) + p2b
        else:
            print p1s.rjust(n)[:remove_last] + p2b
    else:
        print (" " * (n + remove_last)) + p2b
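
A less cryptic variant of the same processing, sketched as a Python 3 function that works on a list of lines, so it can be tried without a .lst file. The name format_coverage and the sample lines are invented for this example. It also accepts remove_last = None, which the arithmetic n + remove_last of the script does not. Like the script, it assumes that every line contains a "|", so the summary lines at the end of the listing have to be dropped by the caller.

```python
def format_coverage(lines, remove_last=-6):
    """Reformat coverage lines of the form 'count|source code'."""
    parts = [line.split("|", 1) for line in lines]
    width = max(len(count) for count, code in parts)
    # Width of the count column after dropping the last digits
    kept = width + remove_last if remove_last else width
    out = []
    for count, code in parts:
        count = count.strip()
        code = "|" + code.rstrip()
        if not count:
            out.append(" " * kept + code)   # line without a counter
        elif count == "0000000":
            out.append("=" * kept + code)   # line never executed
        else:
            out.append(count.rjust(width)[:remove_last or None] + code)
    return out


sample = ["       |int x;\n", "0000000|dead();\n", "1234567|loop();\n"]
for line in format_coverage(sample, remove_last=-3):
    print(line)
```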

-------------------------

14) Maybe it can be useful to optionally associate a unittest to the name of the function/class/method tested. Possible syntax:
unittest (name) { }
Or:
unittest name { }
This may be used by IDEs to manage test-driven development in a smarter way. I don't know how the compiler itself can use that information.

-------------------------

15) On Windows, especially for newbies, I think it would be good to have a single zip with everything necessary to use D (with dmd, dmc, bud and maybe dfl too).

-------------------------

16) I think std.string.chop and std.string.chomp have names that are too similar, so one of these functions can be renamed (I think the function that removes the last char can be removed, it's not very useful, while the other function, which removes the optionally present newline, is quite useful).

-------------------------

17) It may be good if in dmd the -run parameter can be positioned anywhere in the command line.

-------------------------

18) In Python I find the built-in ** infix operator rather useful (it works on integers and floating point numbers):

>>> 2 ** 3
8
>>> 2 ** 0.5
1.4142135623730951
>>> -2 ** -2.5
-0.17677669529663689

std.math.pow is less useful when both arguments are integral.
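
To show what an integer version of the operator buys, here is a Python sketch of exponentiation by squaring that stays in integer arithmetic (ipow is a name invented for the example; I am not claiming that this is how any existing ** is implemented). A floating point pow cannot give the exact result once the numbers no longer fit in the mantissa.

```python
import math

def ipow(base, exponent):
    """Integer power by repeated squaring, for exponent >= 0."""
    if exponent < 0:
        raise ValueError("negative exponent needs a floating point result")
    result = 1
    while exponent:
        if exponent & 1:       # this bit of the exponent is set
            result *= base
        base *= base
        exponent >>= 1
    return result


assert ipow(2, 3) == 8
assert ipow(3, 40) == 3 ** 40
# The floating point pow loses the low digits of the same value
assert int(math.pow(3, 40)) != 3 ** 40
```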

-------------------------

19) The dmd compiler could tell which variables aren't used (Delphi Pascal does this), or which ones aren't modified after their initialization (that is, ones that are essentially unused).

Hugs,
bearophile