ubyte vs. char for non-UTF-8 (was Re: toString vs. toUtf8)

Wed Nov 21 01:59:19 PST 2007

Julio César Carrascal Urquijo wrote:
> Even Unicode has UCS which is the not-quite-UTF encoding used in Windows 
> NT4 (yes, there are still lots of machines using NT4).

FYI: You probably already know this, but I wanted to be sure, and 
others might find it of interest.

http://en.wikipedia.org/wiki/UTF-16

UCS-2 is not quite UTF-16, but it is a subset of UTF-16 ("upwards 
compatibility from UCS-2 to UTF-16"): it's essentially UTF-16 without 
the surrogate pairs.

So, in D you can generally* say:
wchar[] data = cast(wchar[]) std.file.read("filename");

and later decoding or validating the data should not throw any invalid 
UTF errors (the cast itself only reinterprets the bytes, it does not 
check them).

* This may depend on whether the file is UCS-2, UCS-2BE, or UCS-2LE.  I'm 
not sure which byte order D's UTF-16 uses.
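
To make the byte-order caveat concrete, here is a rough sketch of how the 
read could be guarded. It assumes a current D2 Phobos and that wchar holds 
UTF-16 code units in the machine's native byte order (which is my 
understanding, not something stated above). The helpers isSurrogate, 
swapBytes and toWchars are names I made up for illustration; only 
std.file.read and std.utf.validate come from Phobos. The file name is taken 
from the command line.

```d
import std.file : read;
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: true if the code unit is a surrogate,
/// i.e. something plain UCS-2 text should never contain.
bool isSurrogate(wchar c) pure nothrow @nogc @safe
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/// Hypothetical helper: swap the two bytes of one 16-bit code unit.
wchar swapBytes(wchar c) pure nothrow @nogc @safe
{
    return cast(wchar)(((c & 0xFF) << 8) | (c >> 8));
}

/// Hypothetical helper: reinterpret raw bytes as UTF-16 code units,
/// using a leading byte order mark (if present) to fix the byte order.
wchar[] toWchars(const(ubyte)[] raw)
{
    // An odd byte count cannot be a whole number of 16-bit code units.
    if (raw.length % 2 != 0)
        throw new Exception("odd byte count: not UCS-2/UTF-16 data");

    // Copy first so the caller's buffer is left untouched.
    auto text = cast(wchar[]) raw.dup;

    // 0xFFFE is what the BOM (0xFEFF) looks like when the file's byte
    // order is the opposite of this machine's, so swap every code unit.
    if (text.length > 0 && text[0] == 0xFFFE)
        foreach (ref c; text)
            c = swapBytes(c);

    // Drop the BOM itself once the byte order is right.
    if (text.length > 0 && text[0] == 0xFEFF)
        text = text[1 .. $];

    return text;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: prog <file>");
        return;
    }

    auto text = toWchars(cast(const(ubyte)[]) read(args[1]));

    // Throws std.utf.UTFException if the data is not well-formed UTF-16.
    validate(text);

    // Plain UCS-2 data should contain no surrogates at all.
    bool pureUcs2 = true;
    foreach (wchar c; text)
        if (isSurrogate(c))
            pureUcs2 = false;
    writeln(pureUcs2 ? "no surrogates (fits UCS-2)" : "contains surrogates");
}
```

Without a BOM the sketch simply trusts that the file is already in native 
byte order, which is the same assumption the one-line cast makes.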

Regan