Inconsistency

monarch_dodra monarchdodra at gmail.com
Wed Oct 16 02:00:00 PDT 2013


On Wednesday, 16 October 2013 at 08:48:30 UTC, Chris wrote:
> On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
>> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>>> Also, I understand, that there is the std.utf.count() 
>>>> function which returns the length that I was searching for. 
>>>> However, why - if D is so UTF-8-centric - isn't this 
>>>> function implemented in the core like ".length"?
>>>
>>> Most code doesn't need to count graphemes and lives happily 
>>> with just strings, that's why it's not in the core.
>>
>> Most code might be buggy then.
>>
>> An issue that often comes up is file names. A file called "bär" 
>> will be normalized differently depending on the operating 
>> system. In both cases it is one grapheme. However, on Linux it 
>> is one code point, but on OS X it is two code points.
>
> Now that you mention it, I had a program that would send 
> strings to a socket written in D. Before I could process the 
> strings on OS X, I had to normalize the decomposed OS X version 
> of the strings to the composed form that D could handle, else 
> it wouldn't work. I used libutf8proc for it (only one tiny 
> little function). It was no problem to interface to the C 
> library, however, I thought it would have been nice, if D 
> could've handled this on its own without depending on third 
> party libraries.

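I'm not sure this is a "D" issue, though. It's a fact of Unicode
that there are two different ways to write ä: as the single
precomposed code point U+00E4, or as "a" followed by the combining
diaeresis U+0308.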
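
To make the two spellings concrete, here is a minimal sketch. It is in Python rather than D, purely for illustration, and uses only the standard unicodedata module. The variable names are made up for the example. D's std.uni has a normalize function for the same job, though I haven't checked how it compares with libutf8proc.

```python
import unicodedata

# Two ways to write "bär". Both render the same, but they are different strings.
composed = "b\u00e4r"     # 'ä' as one precomposed code point (U+00E4)
decomposed = "ba\u0308r"  # 'a' followed by combining diaeresis (U+0308)

# A plain comparison sees different code point sequences.
print(composed == decomposed)

# Code point counts and UTF-8 byte counts differ as well.
print(len(composed), len(decomposed))
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# Normalizing both sides to the same form makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)  # compose
nfd = unicodedata.normalize("NFD", composed)    # decompose
print(nfc == composed, nfd == decomposed)
```

So whichever language you use, strings coming from different sources need to be brought to one normalization form before they are compared or counted.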