ubyte vs. char for non-UTF-8 (was Re: toString vs. toUtf8)
Matti Niemenmaa
see_signature at for.real.address
Tue Nov 20 09:12:01 PST 2007
Regan Heath wrote:
> I think we should be encouraging people to convert this data to UTF-8
> before calling any D string handling functions on it (those that accept
> w/d/char[]). Which implies all D string handling functions should only
> operate on UTF-8/16/32.
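This is impossible in general. Given a plaintext file, you cannot reliably know
what encoding it is in. If you assume an encoding, convert the data to UTF-8 for
internal use, and then recode it back to that encoding for output, you may lose
information: a wrong guess can hit byte sequences that are invalid in the
assumed encoding, and not every conversion round-trips exactly.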
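
To make the round-trip problem concrete, here is a minimal sketch in Python
(not D, simply because its codec functions make the effect easy to show). The
helper name round_trip and the sample data are made up for illustration, and
Python's internal str stands in for the "UTF-8 for internal use" step:

```python
def round_trip(raw: bytes, assumed: str) -> bytes:
    """Decode raw bytes under an assumed encoding, then re-encode them.

    Undecodable bytes are replaced rather than rejected, which is what a
    program that insists on converting everything has to do.
    """
    text = raw.decode(assumed, errors="replace")
    return text.encode(assumed, errors="replace")


# "café" as saved by a Latin-1 editor: the final byte 0xE9 is 'é'.
latin1_bytes = "café".encode("latin-1")

# Correct guess: the bytes survive unchanged.
same = round_trip(latin1_bytes, "latin-1")

# Wrong guess: a lone 0xE9 is not valid UTF-8, so it is replaced with
# U+FFFD and the original byte cannot be recovered on output.
damaged = round_trip(latin1_bytes, "utf-8")

print(same == latin1_bytes)     # True
print(damaged == latin1_bytes)  # False
```

Keeping the data as raw bytes (ubyte[] in D terms) avoids the problem, because
nothing is decoded until the encoding is actually known.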
> w/d/char[] arrays are implicitly convertible to void[] (and void*?) so
> perhaps C functions should accept void* instead? I mean, void* means
> "pointer to something/anything"...
void* means "pointer to anything", as you say. ubyte* means "pointer to unsigned
byte(s)", which is a different thing entirely.
To me, ubyte[] means either integers in the range 0-255 or "arbitrary data".
void[] is more like "arbitrary memory": used for hacking around language
restrictions or for extremely low-level stuff such as memory management.
Would you consider malloc to return the same type of data that mbstrlen accepts?
--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi