Relaxing the definition of isSomeString and isNarrowString

Andrew Godfrey via Digitalmars-d digitalmars-d at puremagic.com
Sun Aug 24 17:14:07 PDT 2014


On Sunday, 24 August 2014 at 18:43:36 UTC, Dmitry Olshansky wrote:
> 24-Aug-2014 22:19, Andrew Godfrey wrote:
>> The OP and the question of auto-decoding share the same root problem:
>> Even though D does a lot better with UTF than other languages I've used,
>> it still confuses characters with code points somewhat. "Element type is
>> some character" is an example from OP. So clarify for me:
>> If a programmer makes an array of either 'char' or 'wchar', does that
>> always, unambiguously, mean a UTF-8 or UTF-16 code point?
>
> Yes, pedantically - UTF-8 and UTF-16 code _units_. dchar is a code point.
>
>> E.g. If interoperating with C code, they will never make the mistake of
>> using these types for a non-string byte/word array?
>>
>
> char != byte, and the compiler will reject pointer and array assignments
> of byte* to char*, ubyte[] to char[], etc. The values themselves are
> convertible, so they would work with implicit conversion.
>
>> If and only if this is true, then D has done well and I'm unafraid of
>> duck-typing here.
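
For the record, here is roughly what that boils down to in code (my own
sketch, not checked against a particular compiler release):

void main()
{
    byte*   bp;
    ubyte[] ub;

    // Rejected: pointer and array types do not implicitly convert.
    //char*  cp = bp;   // Error: cannot implicitly convert byte* to char*
    //char[] ca = ub;   // Error: cannot implicitly convert ubyte[] to char[]

    // Accepted (per the reply above): individual 8-bit values convert
    // implicitly, valid UTF-8 or not.
    ubyte u = 0xC3;
    char  c = u;
}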

Both your answers are at the level of the compiler/language spec.
Relevant, yes, but not complete. E.g. how often will people manually
converting a .h file translate a C "const char *" correctly into
something char-based versus something ubyte-based, depending on whether
the data really is UTF-8 text (code units, not code points)?
How often will they even know?
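
To make that concrete, here is a hypothetical pair of declarations from
such a conversion (the names are invented for illustration):

// Hypothetical C declarations being converted (names invented):
//     void set_label(const char *utf8_text);
//     void send_packet(const char *payload, size_t len);

// A careful translation distinguishes text from raw bytes:
extern(C) void set_label(const(char)* utf8_text);              // genuinely UTF-8 text
extern(C) void send_packet(const(ubyte)* payload, size_t len); // arbitrary bytes

// A mechanical one declares both parameters const(char)*, and from then
// on arbitrary payload bytes silently type-check as D string data.
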
With wchar it's probably even worse, because of APIs where the
parameter has one type but, depending on other parameters, the string
elements can be UTF-16 code units or glyph indices.
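
A sketch of that wchar ambiguity, with invented names (the pattern
mirrors real APIs like Win32's ExtTextOutW with ETO_GLYPH_INDEX):

// Hypothetical text-output binding; the names and flag value are invented.
enum uint DRAW_GLYPH_INDEX = 0x10;   // invented flag

extern(Windows) int drawTextW(void* dc, uint options,
                              const(wchar)* text, uint count);

// With options == 0, 'text' holds UTF-16 code units; with DRAW_GLYPH_INDEX
// set, the very same const(wchar)* holds font-specific glyph indices.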

