Inconsistency

Chris wendlec at tcd.ie
Wed Oct 16 01:48:29 PDT 2013


On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>> Also, I understand, that there is the std.utf.count() 
>>> function which returns the length that I was searching for. 
>>> However, why - if D is so UTF-8-centric - isn't this function 
>>> implemented in the core like ".length"?
>>
>> Most code doesn't need to count graphemes and lives happily 
>> with just strings, that's why it's not in the core.
>
> Most code might be buggy then.
>
> An issue that often comes up is file names. A file called "bär" 
> will be normalized differently depending on the operating 
> system. In both cases the "ä" is one grapheme. However, on Linux 
> it is one code point, but on OS X it is two code points.

Now that you mention it, I had a program that would send strings 
to a socket written in D. Before I could process the strings on 
OS X, I had to normalize the decomposed OS X version of the 
strings to the composed form that D could handle, otherwise it 
wouldn't work. I used libutf8proc for it (only one tiny little 
function). Interfacing to the C library was no problem. However, 
I thought it would have been nice if D could have handled this 
on its own, without depending on third-party libraries.
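
To make the composed vs. decomposed difference concrete, here is a 
rough sketch of the "bär" case. It is in Python rather than D, purely 
as an illustration, and it is not the libutf8proc code I actually 
used. The helper name to_composed is made up for this example; 
unicodedata.normalize is the standard library call doing the work.

```python
import unicodedata

# "bär" as typically seen on Linux: 'ä' is the single code point U+00E4.
composed = "b\u00e4r"
# "bär" in decomposed form, as OS X file names use: 'a' + combining U+0308.
decomposed = "ba\u0308r"


def to_composed(text):
    """Hypothetical helper: bring text into composed form (NFC)."""
    return unicodedata.normalize("NFC", text)


# Same graphemes on screen, different number of code points.
print(len(composed), len(decomposed))      # 3 4
# A plain comparison treats the two spellings as different strings.
print(composed == decomposed)              # False
# After normalizing to the composed form they compare equal.
print(to_composed(decomposed) == composed) # True
```

The point is only that the comparison has to happen after both sides 
are brought to the same normalization form; which form you pick 
matters less than picking one consistently.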

