Relaxing the definition of isSomeString and isNarrowString

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Sun Aug 24 19:57:07 PDT 2014


On Monday, 25 August 2014 at 02:40:20 UTC, Vladimir Panteleev 
wrote:
> On Monday, 25 August 2014 at 01:31:35 UTC, H. S. Teoh via 
> Digitalmars-d wrote:
>> In D, an array of char, wchar, or dchar always means a Unicode 
>> encoding.
>> Non-Unicode encodings should be represented as ubyte[] (resp. 
>> ushort[]
>> or ulong[], if such exist) instead.
>
> This doesn't get you far in practice if you want to actually 
> operate on the text.

Well, all of the non-string-specific stuff (like find) will work 
just fine, but since all of the string-specific functions assume 
UTF-8, UTF-16, or UTF-32, you'll have to convert it. We can't 
really do otherwise, because you have to know what encoding 
you're dealing with to operate on it as a string, and that means 
that you need to either call specific functions which expect the 
encoding that you're using, or you need types specific to those 
encodings (in which case, you wouldn't use ubyte[] and the like 
directly).
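
As a rough sketch of what that looks like in practice (assuming the 
data is Latin-1 and that std.encoding is an acceptable way to convert 
it; latin1ToUtf8 is just a hypothetical helper name for illustration, 
not something in Phobos):

```d
import std.algorithm.searching : find;
import std.encoding : Latin1String, transcode;
import std.traits : isSomeString;
import std.uni : toUpper;

// Hypothetical helper: reinterpret raw bytes as Latin-1 and transcode to UTF-8.
string latin1ToUtf8(immutable(ubyte)[] bytes)
{
    string result;
    transcode(cast(Latin1String) bytes, result);
    return result;
}

void main()
{
    // "café" in Latin-1: the é is the single byte 0xE9, which is not valid UTF-8.
    immutable(ubyte)[] latin1 = [0x63, 0x61, 0x66, 0xE9];

    // An array of ubyte is not a string type as far as the traits are concerned.
    static assert(!isSomeString!(typeof(latin1)));
    static assert(isSomeString!string);

    // Generic algorithms such as find work directly on the raw bytes.
    auto rest = latin1.find(cast(ubyte) 0x66);
    assert(rest.length == 2);
    assert(rest[0] == 0x66);

    // String-specific functions expect UTF, so convert first.
    string utf8 = latin1ToUtf8(latin1);
    assert(utf8 == "café");
    assert(utf8.toUpper == "CAFÉ");
}
```

The cast to Latin1String is where you tell the library which encoding 
the bytes are in; nothing can infer that from a ubyte[] on its own.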

We do need better support for other encodings, but I don't think 
that it really costs us anything to treat char as UTF-8, wchar as 
UTF-16, and dchar as UTF-32 and require that other encodings use 
different representations.

- Jonathan M Davis

