Inconsistency

Chris wendlec at tcd.ie
Wed Oct 16 02:11:51 PDT 2013


On Wednesday, 16 October 2013 at 09:00:01 UTC, monarch_dodra 
wrote:
> On Wednesday, 16 October 2013 at 08:48:30 UTC, Chris wrote:
>> On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
>>> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>>>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>>>> Also, I understand that there is the std.utf.count() 
>>>>> function which returns the length that I was searching for. 
>>>>> However, why - if D is so UTF-8-centric - isn't this 
>>>>> function implemented in the core like ".length"?
>>>>
>>>> Most code doesn't need to count graphemes and lives happily 
>>>> with just strings, that's why it's not in the core.
>>>
>>> Most code might be buggy then.
>>>
>>> An issue that often comes up is file names. A file called 
>>> "bär" will be normalized differently depending on the 
>>> operating system. In both cases it is one grapheme. However, 
>>> on Linux it is one code point, but on OS X it is two code 
>>> points.
>>
>> Now that you mention it, I had a program that would send 
>> strings to a socket written in D. Before I could process the 
>> strings on OS X, I had to normalize the decomposed OS X 
>> version of the strings to the composed form that D could 
>> handle, else it wouldn't work. I used libutf8proc for it (only 
>> one tiny little function). It was no problem to interface to 
>> the C library; however, I thought it would have been nice if 
>> D could've handled this on its own without depending on 
>> third-party libraries.
>
> I'm not sure this is a "D" issue though: It's a fact of Unicode
> that there are two different ways to write ä.

My point was that it would have been nice to have a native D 
function that can convert between the two forms (composed and 
decomposed), especially because this is a well-known issue. 
NSString (Cocoa / Objective-C), for example, provides 
precomposedStringWithCompatibilityMapping and similar.
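
To make the two forms concrete, here is a small sketch of the kind of 
conversion I mean. It is in Python rather than D, purely for 
illustration, and uses the standard unicodedata module; the helper 
names to_composed and to_decomposed are made up for this example. 
As far as I know, precomposedStringWithCompatibilityMapping 
corresponds to Unicode normalization form NFKC, while plain 
composition is NFC.

```python
import unicodedata

# Two spellings of "bär" that render identically.
composed = "b\u00e4r"      # precomposed U+00E4 (what I got on Linux)
decomposed = "ba\u0308r"   # "a" + combining diaeresis U+0308 (OS X style)

def to_composed(s):
    """Hypothetical helper: normalize to the composed form (NFC)."""
    return unicodedata.normalize("NFC", s)

def to_decomposed(s):
    """Hypothetical helper: normalize to the decomposed form (NFD)."""
    return unicodedata.normalize("NFD", s)

# One grapheme either way, but a different number of code points,
# so a naive comparison of the raw strings fails.
print(len(composed), len(decomposed))
print(composed == decomposed)

# After normalizing both sides to the same form they compare equal.
print(to_composed(decomposed) == composed)
print(to_decomposed(composed) == decomposed)
```

Normalizing whatever comes in over the socket to one form before 
processing it is all my program needed.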

