might be a bug in the DMD FrontEnd

Fri Mar 30 04:56:15 PDT 2007

Deewiant wrote:
> Daniel Keep wrote:
>> That said, I personally think that if you need to use printf because
>> writefln is barfing on your string, then that's a bug in your program.
>> char[] is UTF-8: if you're not storing UTF-8, you should be using
>> ubyte[], not char[].
> 
> I agree. However, both Phobos and Tango use char[] for all their
> string-processing functions _which also work on non-UTF-8_. This means that to
> call such a function you need to do, for instance,
> "std.string.strip(*cast(char[]*)&iso_8859_1_string);" which gets ugly very
> quickly.
> 
> I hoped that Tango would use ubyte[] in the C standard library, at least, but
> no. I understand why not (standard; most people use only char[] and don't want
> to do the cast from char[]* to ubyte[]* as above; good for ASCII anyway), and so
> I don't complain, but it's still something I'd like.
> 
> Perhaps D needs a way to allow implicit conversion:
> 
> finally(ubyte[] is char[]) {
> 	char[] foo(ubyte[] myString) {
> 		return std.string.strip(myString.dup);
> 	}
> }
> 
> <g>

> foreach( dchar c ; some_string )
> {
>     // ...
> }

That loop would *not* work correctly on such a string if it contains
anything outside of the ASCII range, because iterating with a dchar
decodes the char[] as UTF-8.  Yes, the functions might work with
non-UTF-8 codepages, but that's more a side effect of how they are
implemented than something you can rely on.
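
To make the failure concrete, here is a rough sketch of what happens when
ISO-8859-1 bytes are pushed through that kind of loop.  It assumes a
current D compiler; decodesAsUtf8 is a made-up helper name, and since the
exact exception type has changed between compiler versions, the sketch
just catches the general Exception.

```d
// Hypothetical helper: walks `s` the same way the foreach above does.
// Returns true if every sequence decoded as UTF-8, false if decoding failed.
// `codePoints` receives the number of code points decoded before stopping.
bool decodesAsUtf8(const(char)[] s, out size_t codePoints)
{
    try
    {
        // Iterating a char[] with a dchar variable decodes UTF-8 on the fly.
        foreach (dchar c; s)
        {
            ++codePoints;
        }
        return true;
    }
    catch (Exception e)
    {
        // Invalid UTF-8 (e.g. a lone ISO-8859-1 byte >= 0x80) ends up here.
        return false;
    }
}
```

Feeding it the bytes 0x63 0x61 0x66 0xE9 ("café" in ISO-8859-1) cast to
char[] should give false, while the same word stored as UTF-8 decodes fine.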

I think what Phobos really needs is a character encoding conversion
library, even if it's just a paper-thin binding to iconv or something.
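
For ISO-8859-1 in particular you don't even need tables, since every byte
value is the Unicode code point with the same number.  A minimal sketch
(latin1ToUtf8 is a hypothetical name, not something in Phobos; other
codepages would need real mapping tables or a binding like the iconv one
suggested above):

```d
// Hypothetical helper: converts ISO-8859-1 bytes to a UTF-8 char[].
// Assumes the input really is ISO-8859-1; nothing is validated.
char[] latin1ToUtf8(const(ubyte)[] input)
{
    char[] result;
    foreach (ubyte b; input)
    {
        if (b < 0x80)
        {
            // ASCII range: identical single byte in UTF-8.
            result ~= cast(char) b;
        }
        else
        {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx.
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result;
}
```

This keeps the non-UTF-8 data in ubyte[] right up until it has been
converted, so everything typed as char[] really is UTF-8.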

	-- Daniel

[1] I hope I've got the right term; I'm liable to get my head chewed off
if I'm wrong :P

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/