Relaxing the definition of isSomeString and isNarrowString
Jonathan M Davis via Digitalmars-d
digitalmars-d at puremagic.com
Sun Aug 24 19:57:07 PDT 2014
On Monday, 25 August 2014 at 02:40:20 UTC, Vladimir Panteleev
wrote:
> On Monday, 25 August 2014 at 01:31:35 UTC, H. S. Teoh via
> Digitalmars-d wrote:
>> In D, an array of char, wchar, or dchar always means a Unicode
>> encoding.
>> Non-Unicode encodings should be represented as ubyte[] (resp.
>> ushort[]
>> or ulong[], if such exist) instead.
>
> This doesn't get you far in practice if you want to actually
> operate on the text.
Well, all of the non-string-specific stuff (like find) will work
just fine, but since all of the string-specific functions assume
UTF-8, UTF-16, or UTF-32, you'll have to convert it. We can't
really do otherwise, because you have to know what encoding
you're dealing with to operate on it as a string, and that means
that you need to either call specific functions which expect the
encoding that you're using, or you need types specific to those
encodings (in which case, you wouldn't use ubyte[] and the like
directly).
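
To illustrate the distinction, here is a rough sketch (not a
recommendation of a particular API). It assumes the text is ISO-8859-1
and uses std.encoding's Latin1String and transcode for the conversion;
the byte values and variable names are made up for the example.

```d
import std.algorithm : find;
import std.encoding : Latin1String, transcode;

void main()
{
    // "café" encoded as Latin-1. The byte 0xE9 is 'é' in Latin-1 and is
    // not valid UTF-8 on its own, so this data is held as ubytes, not chars.
    immutable(ubyte)[] latin1 = [0x63, 0x61, 0x66, 0xE9];

    // Generic range algorithms do not care about the encoding: find
    // simply compares elements, so it works on the raw bytes.
    auto tail = latin1.find(cast(ubyte) 0x66);
    assert(tail.length == 2);

    // String-specific functions assume UTF, so the data has to be
    // converted explicitly, stating which encoding it is coming from.
    string utf8;
    transcode(cast(Latin1String) latin1, utf8);
    assert(utf8 == "café");
}
```

The cast to Latin1String is where the encoding gets named; after the
transcode call, the result is an ordinary UTF-8 string that the
string-specific functions can operate on.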
We do need better support for other encodings, but I don't think
that it really costs us anything to treat char as UTF-8, wchar as
UTF-16, and dchar as UTF-32 and require that other encodings use
different representations.
- Jonathan M Davis