Programming Windows D Examples are now Online!

Tue Jun 21 12:35:13 PDT 2011

On 6/21/11, bearophile <bearophileHUGS at lycos.com> wrote:
> After translating so much code have you seen some recurring patterns in your
> code conversion bugs?

A lot of C code is missing default labels. I've used the github 2.054
clone to compile all examples in order to catch missing labels at
compile time. I've missed many of them before this feature became
available.

Float/Doubles are initialized to NaN in D, however I don't know what
they're initialized to in C. But it seems none of the examples had
floating point variables initialized. This causes halts/exceptions in
D, and without ddbg it's really hard to track down what went wrong. I
don't think the stack traces helped me in this regard. Generally
forgetting to initialize floats seems to be common in other C code as
well.

C struct initializer syntax is used for static arrays of structs in
the C code. I think this could easily be a source of bugs if you add
or remove fields from a struct. I'm not sure if that syntax works in D
but I've used explicit ctor calls instead, such as:

struct Foo { int x, y; }
Foo[5] foo = [Foo(1, 2), Foo(1, 2), Foo(1, 2), Foo(1, 2), Foo(1, 2)];

It's verbose but it's safer.

In some code a variable would be defined as an int, and returned as a
short. There are other cases where it's defined as an int, and then
casted to a ubyte (BYTE is an alias for ubyte there). I have to fix
some of these.

I did notice an annoying thing with D, for example:
ubyte val = min(0, 128);
Error: cannot implicitly convert expression (min(0,128)) of type int to ubyte

The problem with this is I have to add a cast or a safer to!() call.
to!() can only do a runtime check:
ubyte val = to!ubyte(min(1024, 2048));  // overflow exception at runtime

And a cast is completely unsafe:
ubyte val = cast(ubyte)(min(1024, 2048));  // no error, bugs introduced!

Of course the compiler doesn't know what min does, only that it
returns an int, so I don't know whether it could ever figure out that
a function would never return a value that overflows the assignment
variable (maybe this feature could be implemented for pure
functions..).

C Globals become an instant source of bugs as soon as you fire up a
new thread. This is due to D's thread-local storage by default
feature. I like D's approach, and C globals should be replaced with
messaging or some other technique. I've used __gshared in various
modules to make the examples compile and work properly, but this
should be replaced.
This isn't unique to these examples. I've seen globals used everywhere
for messaging purposes in C code, e.g. some audio processing examples
in a different library.

A weird thing I've seen in the examples was a pointer to a static
variable was passed to a newly spawned thread's function. And via this
static variable the original thread and the work thread communicated
together. How this can work is beyond me, but it can't work in D.
Luckily D's spawn() function does offer protection for this kind of
code.

These examples used implicit fallbacks a lot, with only a comment
explicitly mentioning that there is a fallback in play. I think it's
good that we've abandoned implicit fallbacks, they could easily cause
bugs.
But, on the other hand someone might get an error about an implicit
fallback and think they forgot to add a "break" when porting C code.
So they add that, but now they've introduced a bug because the
fallback was actually necessary. I had this happen to me in one
example because the fallback wasn't explicitly mentioned.

There are issues with calling format() and writef(). This is not just
for this translation project. Most C code uses some form of
printf/wsprintf to either store a string to a variable or print it
out. But the format specifiers often do not translate into D. I think
what is probably causing this is that C types are not the same as D
types, so when you see this in C code:
printf("Note Count %ld, (long) notescount);

This might not translate into D and you could easily get an exception
at runtime (a pity that this doesn't get caught at compile-time). The
root of the problem is that the format specifiers explicitly name the
type, while you inadvertently change the actual type in D (maybe you
were fixing a conversion bug), and then you randomly get an exception
thrown at runtime.

The #1 problem with these exceptions is that you get no error line
whatsoever. You just get back "Invalid format specifier" or something
like that. This is of no help, and I always have to fire up ddbg or
manually hunt down each format specifier and replace it with D's safe
"%s", which works great.
The exceptions with no error line problem is common for most Phobos
functions, I absolutely hate getting back an error message such as:
std.exception.ErrnoException at std\stdio.d(286): Cannot open file
`blablabla' in mode `r' (No such file or directory)

No stack trace, no error line except the error line in Phobos. I
*don't care* about error lines in Phobos, the error is in my code not
in Phobos, and I really think this needs to improve.

Many functions assume ASCII or USC2, both of which are wrong to assume
these days. I have some examples I still have to fix, because I've
used code like:
    string buffer;
    TextOut(buffer.toUTF16z, buffer.length);
This is incorrect code. TextOut takes the number of characters, not
code units when issuing the call. The code will compile and run fine
for the majority of cases because most characters will fit into
wchar's code point, and you will have a 1to1 code-point to code-unit
ratio. But add a Japanese character and you will see garbage being
printed out to the screen. The correct workaround for this is to
either:
1) Store everything as a dchar[]:
    dstring buffer;
    TextOut(buffer.toUTF16z, buffer.length);  // correct as long as
UTF never has more than 4 code units
2) Use some Phobos code-point counting function
    string buffer;
    TextOut(buffer.toUTF16z, std.utf.count(buffer));

The second option might cause performance issues, while using UTF32
dchars might cause excessive use of memory. Other solutions might be
to wrap strings and cache the count of code points. It's a balancing
act between memory and performance.

Some API functions take char* as a parameter when expecting some
pointer to a data buffer which can but doesn't necessarily store text.
I don't know why char* is used instead of void*, perhaps char* are
more often used with these functions so they've decided to type it
like that.

There are many cases of implicit 0 to void* casts, but D catches this
at compile time.

A big issue with porting is that D's character arrays are always
initialized to invalid values. This wreaks havoc in code, in
particular with static arrays. For example:

    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);  // <- buggy code

The problem is twofold. First, the API call might not work because it
could easily expect a null terminator to be present as a first
character:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);  // Might not work!

So I've had to do this:
    char[100] buffer;
    buffer[] = '\0';  // or buffer[0] = '\0', depending on the function
    APIWriteToBuffer(buffer.ptr);

Second, after the call you will have a buffer that has some valid text
stored into it, and then either null terminators to the length of the
array (if you did buffer[] = '\0'), OR you will have a null terminator
followed by a series of char.init invalid UTF values.

So when you finally call a D function with such a static array, it
could print out garbage or throw an exception:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);  // what will this function do when
stumbles on '\0' or char.init?

Behind the scense that D function might in turn call a C function, so
continued from the last example:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);

    void myDStringFunction(char[] buffer)
    {
        // huge mess, the API function might write nonsense like ['f',
'o', 'o', '\0', char.init, char.init..]
        APIPrintToScreen(buffer.ptr, buffer.length);
    }

So you have to do conversion and duplication all the time. You have to do:

    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);  // buffer might now be ['f', 'o',
'o', '\0', char.init, char.init..];
    myDStringFunction(to!string(buffer.ptr.toStringz));

How it works: First, you need to treat the buffer as a C
null-terminated char array, hence using .ptr. Then, you have to call a
Phobos function so it can retrieve a string until the first null
terminator.

But it gets worse when UTF is in play because in v2.043 conv.to
doesn't know how to convert wchar*'s. I've made a request on github
and I think this feature will be incorporated, but for now you can't
do this:

    wchar[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(to!string(buffer.ptr));  // error, to!() can't
convert wchar*

So I've had to create my own function which converts a wchar* to
wchar[], which can then be used for whatever purpose:

    wchar[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(to!string(fromWStringz(buffer.ptr)));

So generally static arrays of characters and string handling between C
and D can be painful.