ubyte vs. char for non-UTF-8 (was Re: toString vs. toUtf8)
Regan Heath
regan at netmail.co.nz
Wed Nov 21 01:59:19 PST 2007
Julio César Carrascal Urquijo wrote:
> Even Unicode has UCS which is the not-quite-UTF encoding used in Windows
> NT4 (yes, there are still lots of machines using NT4).
FYI: you probably already know this, but I wanted to be sure, and
others might find it of interest:
http://en.wikipedia.org/wiki/UTF-16
UCS-2 is not quite UTF-16, but it is a subset of UTF-16 ("upwards
compatibility from UCS-2 to UTF-16"): it is essentially UTF-16 without
the surrogate pairs.
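To make the surrogate pair point concrete, here is a minimal sketch in present-day D (an assumption; the details may differ in older compilers). It only shows that a character inside the Basic Multilingual Plane is one 16-bit code unit, while one outside it needs two, which is the part UCS-2 cannot express:

```d
void main()
{
    // U+00E9 is inside the BMP: one code unit, identical in UCS-2 and UTF-16.
    wstring bmp = "\u00E9"w;
    assert(bmp.length == 1);
    assert(bmp[0] == 0x00E9);

    // U+1D11E is outside the BMP: UTF-16 encodes it as a surrogate pair,
    // and UCS-2 has no way to represent it at all.
    wstring astral = "\U0001D11E"w;
    assert(astral.length == 2);
    assert(astral[0] == 0xD834); // high surrogate
    assert(astral[1] == 0xDD1E); // low surrogate
}
```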
So, in D you can generally* say:
wchar[] data = cast(wchar[]) std.file.read("filename");
and it should work without throwing any invalid UTF errors.
* this may depend on whether it's UCS-2, UCS-2BE, or UCS-2LE. I'm not
sure which format D's UTF-16 is in.
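On the byte-order question, my understanding is that a wchar is just a 16-bit code unit held in the host's native byte order, so the plain cast is only right when the file's byte order happens to match the machine. Below is a rough sketch of a safer route, written against present-day Phobos (an assumption). The helper name decodeUcs2 and its bigEndian parameter are made up for illustration and are not part of any library; the only library call is std.utf.validate.

```d
import std.utf : validate;

/// Hypothetical helper: turn raw UCS-2/UTF-16 bytes into wchar[],
/// independent of the host's byte order.
wchar[] decodeUcs2(const(ubyte)[] raw, bool bigEndian = false)
{
    // An odd byte count cannot be a sequence of 16-bit code units.
    if (raw.length % 2 != 0)
        throw new Exception("odd byte count: not 16-bit code units");

    // A leading byte order mark (U+FEFF) overrides the caller's guess.
    if (raw.length >= 2)
    {
        if (raw[0] == 0xFE && raw[1] == 0xFF)
        {
            bigEndian = true;
            raw = raw[2 .. $];
        }
        else if (raw[0] == 0xFF && raw[1] == 0xFE)
        {
            bigEndian = false;
            raw = raw[2 .. $];
        }
    }

    auto result = new wchar[raw.length / 2];
    foreach (i, ref c; result)
    {
        // Assemble each code unit by hand from its two bytes.
        uint b0 = raw[2 * i];
        uint b1 = raw[2 * i + 1];
        c = cast(wchar)(bigEndian ? ((b0 << 8) | b1) : ((b1 << 8) | b0));
    }

    // Throws a UTFException if the data is not valid UTF-16,
    // for example an unpaired surrogate.
    validate(result);
    return result;
}

void main()
{
    ubyte[] le = [0xFF, 0xFE, 0x41, 0x00]; // BOM + "A", little-endian
    ubyte[] be = [0x00, 0x41];             // "A", big-endian, no BOM
    assert(decodeUcs2(le) == "A"w);
    assert(decodeUcs2(be, true) == "A"w);
}
```

With a real file you would pass in something like cast(const(ubyte)[]) std.file.read("filename"). Genuine UCS-2 data contains no surrogates, so the validate call should never fire on it; it is there to catch files that turn out not to be what they claim.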
Regan