Relaxing the definition of isSomeString and isNarrowString

via Digitalmars-d digitalmars-d at puremagic.com
Sun Aug 24 11:28:51 PDT 2014


On Sunday, 24 August 2014 at 18:19:45 UTC, Andrew Godfrey wrote:
> The OP and the question of auto-decoding share the same root 
> problem: Even though D does a lot better with UTF than other 
> languages I've used, it still confuses characters with code 
> points somewhat. "Element type is some character" is an example 
> from OP. So clarify for me:
> If a programmer makes an array of either 'char' or 'wchar', 
> does that always, unambiguously, mean a UTF8 or UTF16 code 
> point?

It has to, because it is required by the specification. But ...

> E.g. If interoperating with C code, they will never make the 
> mistake of using these types for a non-string byte/word array?

... of course this cannot be guaranteed. In fact, even druntime
currently just assumes that program arguments and environment
variables are UTF-8 encoded on Unix, AFAIK. This is true in most
cases, but it is not guaranteed. The functions taking filenames
are potentially problematic too. On Unix, filenames are just
opaque arrays of bytes, but those functions take `string`
parameters, i.e. they assume UTF-8 encoding. This can force the
user to place non-UTF-8 sequences into strings.
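
To illustrate that last point, here is a minimal sketch of how a
`string` can end up holding bytes that are not UTF-8, and how that
can be detected afterwards. It assumes Phobos' `std.utf.validate`;
the helper name `isValidUtf8` and the byte values are made up for
the example and do not come from druntime or any real filename.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical helper: true if `s` holds well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "caf" followed by the single byte 0xE9 (Latin-1 'é'),
    // which is not a complete UTF-8 sequence on its own.
    immutable(ubyte)[] raw = [0x63, 0x61, 0x66, 0xE9];

    // The cast only reinterprets the bytes; nothing checks the encoding.
    string name = cast(string) raw;

    writeln(isValidUtf8("café")); // literal encoded as UTF-8 by the compiler
    writeln(isValidUtf8(name));   // the reinterpreted Latin-1 bytes
}
```

The type system accepts `name` as a `string` either way, so any
check has to be done explicitly, as in the helper above.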

