Text in D article

Sat Nov 18 08:31:27 PST 2006

Daniel Keep wrote:

> Very true.  I suppose I *should* say that literals are NUL-terminated,
> but I want to make it perfectly clear that relying on this is a bad
> idea; is it accepted practice to simply treat all strings as if they
> were possibly non NUL-terminated?

I'm not sure whether the text primarily wants to discuss Unicode
encodings, or strings and text in D in general, but...

The main problem with printf is that you see a line like printf("foo")
and think that all D strings can be passed the same way. If literals
didn't work either, it wouldn't be as tempting to try it. But your
conclusion/practice is OK: you shouldn't use printf with D strings
without having a *good* reason (chances are that the C library will
choke on the UTF-8 data anyway).

Even the good old "%.*s" hack is not portable to all possible platforms
(it depends on how parameters are passed; I think it breaks on Solaris...).
toStringz is the safest, even if you probably need to couple it with a
call to an encoding conversion if the local platform isn't using UTF-8.
But then you are on your own; the D library doesn't do such conversions.
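
To make the difference concrete, here is a minimal sketch of the two ways
of handing a D slice to printf. It is written for a current D compiler
(where string is immutable(char)[]), which is an assumption on my part,
and cLength is just a hypothetical helper for illustration:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length of s as the C library would see it
// after going through toStringz.
size_t cLength(const(char)[] s)
{
    // toStringz hands back a pointer to a NUL-terminated string,
    // copying the data if it has to.
    return strlen(toStringz(s));
}

void main()
{
    string text = "hello, world";
    string slice = text[0 .. 5];   // "hello", with no NUL after it

    // Passing slice.ptr to "%s" would run on into ", world".
    printf("%s\n", toStringz(slice));

    // The explicit form of the "%.*s" trick: pass the length (as an int)
    // and the pointer as two separate arguments instead of relying on
    // how the array itself gets pushed.
    printf("%.*s\n", cast(int) slice.length, slice.ptr);
}
```

Both calls print "hello". Neither one does anything about the encoding,
so non-ASCII text still depends on what the C library expects.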

Even simple D programs such as:
import std.stdio;
void main(char[][] args)
{
   foreach(char[] arg; args)
     writefln("%s", arg);
}

will break down if you run them on a platform without UTF-8 support,
since you will get illegal strings in "args" (exceptions on writefln).
As a workaround you can cast them over to ubyte[], translate to UTF-8
from the local encoding, and cast them back into (now legal) char[]...
But I would hardly characterize that as language "support" for the
legacy platforms; it's better to say D *requires* Unicode support.
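
A rough sketch of that workaround, assuming the local encoding is
ISO-8859-1 (Latin-1), where every byte maps to the code point with the
same value. The function name latin1ToUtf8 is made up for this example,
the syntax is for a current D compiler, and any other legacy encoding
would need a real conversion table instead:

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: re-encode bytes assumed to be Latin-1 as UTF-8.
string latin1ToUtf8(const(ubyte)[] raw)
{
    char[] result;
    foreach (b; raw)
    {
        if (b < 0x80)
        {
            // ASCII is identical in both encodings.
            result ~= cast(char) b;
        }
        else
        {
            // U+0080..U+00FF take two bytes in UTF-8.
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result.idup;
}

void main(string[] args)
{
    foreach (arg; args)
    {
        string text = arg;
        try
        {
            validate(text);   // throws if arg is not legal UTF-8
        }
        catch (UTFException e)
        {
            // Reinterpret the raw bytes and translate them.
            text = latin1ToUtf8(cast(const(ubyte)[]) arg);
        }
        writeln(text);
    }
}
```

Note that this only guesses: a byte sequence that happens to be valid
UTF-8 is passed through untouched, even if it was meant as Latin-1.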

You might also want to touch briefly on COW (copy-on-write) and
mutability, and how you might get segfaults writing to string literals.
Or not... :)
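
For what it's worth, the COW convention boils down to something like the
sketch below: duplicate the data before writing to it, so that a literal
(which may live in read-only memory) is never modified in place. The
function name shout is hypothetical, it only handles ASCII letters, and
the syntax assumes a current D compiler:

```d
import std.stdio : writeln;

// Hypothetical example: upper-case the ASCII letters in s without
// touching the caller's data.
char[] shout(const(char)[] s)
{
    char[] copy = s.dup;   // copy first, then write to the copy
    foreach (ref c; copy)
    {
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
    }
    return copy;
}

void main()
{
    string literal = "abc";
    char[] loud = shout(literal);
    writeln(loud);      // the modified copy
    writeln(literal);   // the literal itself is unchanged
}
```

This copies every time for simplicity; a stricter COW version would only
duplicate once it finds a character that actually needs changing.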

--anders