Relaxing the definition of isSomeString and isNarrowString
via Digitalmars-d
digitalmars-d at puremagic.com
Sun Aug 24 11:28:51 PDT 2014
On Sunday, 24 August 2014 at 18:19:45 UTC, Andrew Godfrey wrote:
> The OP and the question of auto-decoding share the same root
> problem: Even though D does a lot better with UTF than other
> languages I've used, it still confuses characters with code
> points somewhat. "Element type is some character" is an example
> from OP. So clarify for me:
> If a programmer makes an array of either 'char' or 'wchar',
> does that always, unambiguously, mean a UTF8 or UTF16 code
> point?
It has to, because the specification requires it (strictly
speaking, they are UTF8 and UTF16 code units, not code points).
But ...
> E.g. If interoperating with C code, they will never make the
> mistake of using these types for a non-string byte/word array?
... of course this cannot be guaranteed. In fact, even druntime
currently just assumes that program arguments and environment
variables are UTF8 encoded on Unix, AFAIK. This is true in most
cases, but it is not guaranteed. The functions taking filenames
are potentially problematic, too. On Unix, filenames are just
opaque arrays of bytes, but those functions take `string`
parameters, i.e. they assume UTF8 encoding. This can force the
user to place non-UTF8 sequences into strings.
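
To illustrate that last point, here is a minimal sketch (not taken
from druntime or Phobos; the byte values and the helper name
`isValidUtf8` are made up for the example). It shows that a cast from
raw bytes to `string` is not checked, so ill-formed UTF8 only shows up
once someone validates explicitly with `std.utf.validate`:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Made-up bytes such as a Unix filename could contain;
    // 0xFF never occurs in well-formed UTF-8.
    immutable(ubyte)[] raw = [0x66, 0x6F, 0x6F, 0xFF];

    // The cast is not checked: the result has type `string`
    // although its content is not UTF-8.
    string name = cast(string) raw;

    assert(isValidUtf8("foo"));
    assert(!isValidUtf8(name));
}
```

Keeping such data as `immutable(ubyte)[]` until it has been validated
would avoid the problem, but that only helps if the receiving API does
not insist on `string`.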