Inconsistency
Chris
wendlec at tcd.ie
Wed Oct 16 01:48:29 PDT 2013
On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>> Also, I understand, that there is the std.utf.count()
>>> function which returns the length that I was searching for.
>>> However, why - if D is so UTF-8-centric - isn't this function
>>> implemented in the core like ".length"?
>>
>> Most code doesn't need to count graphemes and lives happily
>> with just strings, that's why it's not in the core.
>
> Most code might be buggy then.
>
> An issue that often comes up is file names. A file called "bär"
> will be normalized differently depending on the operating
> system. In both cases it is one grapheme. However, on Linux it
> is one code point, but on OS X it is two code points.
Now that you mention it, I had a program that would send strings
to a socket written in D. Before I could process the strings on
OS X, I had to normalize the decomposed OS X form of the strings
to the composed form that D could handle; otherwise it wouldn't
work. I used libutf8proc for that (just one tiny function).
Interfacing to the C library was no problem, but it would have
been nice if D could have handled this on its own, without
depending on third-party libraries.
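
A minimal sketch of the normalization step described above, in
Python rather than D, using the standard unicodedata module
instead of libutf8proc. The helper name to_composed is
hypothetical, and the two literals only stand in for the file name
"bär" as the thread describes it (decomposed on OS X, composed on
Linux).

```python
import unicodedata


def to_composed(text: str) -> str:
    """Return text in composed form (Unicode NFC)."""
    return unicodedata.normalize("NFC", text)


# Decomposed form: 'a' followed by U+0308 COMBINING DIAERESIS.
decomposed = "ba\u0308r"
# Composed form: the single code point U+00E4.
composed = "b\u00e4r"

# Same text to a reader, but the strings differ code point by code point.
assert decomposed != composed
assert len(decomposed) == 4  # b, a, U+0308, r
assert len(composed) == 3    # b, U+00E4, r

# After normalization the two compare equal.
assert to_composed(decomposed) == composed
```

Normalizing once at the point where the strings arrive means the
rest of the program can compare them directly.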