Inconsistency
Chris
wendlec at tcd.ie
Wed Oct 16 02:11:51 PDT 2013
On Wednesday, 16 October 2013 at 09:00:01 UTC, monarch_dodra
wrote:
> On Wednesday, 16 October 2013 at 08:48:30 UTC, Chris wrote:
>> On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
>>> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>>>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>>>> Also, I understand, that there is the std.utf.count()
>>>>> function which returns the length that I was searching for.
>>>>> However, why - if D is so UTF-8-centric - isn't this
>>>>> function implemented in the core like ".length"?
>>>>
>>>> Most code doesn't need to count graphemes and lives happily
>>>> with just strings, that's why it's not in the core.
>>>
>>> Most code might be buggy then.
>>>
>>> An issue that often comes up is file names. A file called
>>> "bär" will be normalized differently depending on the
>>> operating system. In both cases it is one grapheme. However,
>>> on Linux it is one code point, but on OS X it is two code
>>> points.
>>
>> Now that you mention it, I had a program that would send
>> strings to a socket written in D. Before I could process the
>> strings on OS X, I had to normalize the decomposed OS X
>> version of the strings to the composed form that D could
>> handle, else it wouldn't work. I used libutf8proc for it (only
>> one tiny little function). It was no problem to interface to
>> the C library, however, I thought it would have been nice, if
>> D could've handled this on its own without depending on third
>> party libraries.
>
> I'm not sure this is a "D" issue though: it's a fact of Unicode
> that there are two different ways to write ä.
My point was that it would have been nice to have a native D
function that can convert between the two forms (composed and
decomposed), especially because this is a well-known issue.
NSString (Cocoa / Objective-C), for example, has things like
precomposedStringWithCompatibilityMapping etc.
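
To make the composed/decomposed difference concrete, here is a minimal
sketch in D. It assumes a Phobos version whose std.uni provides
normalize and byGrapheme; the helper names graphemeCount and toComposed
are made up for illustration and are not part of any library.

```d
import std.range : walkLength;
import std.uni;              // normalize, NFC, NFD, byGrapheme (assumed available)
import std.utf : count;

/// Hypothetical helper: number of user-perceived characters (graphemes).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

/// Hypothetical helper: convert to the composed form (NFC).
string toComposed(string s)
{
    return normalize!NFC(s);
}

void main()
{
    string composed   = "b\u00E4r";   // ä as a single code point (U+00E4)
    string decomposed = "ba\u0308r";  // a followed by combining diaeresis (U+0308)

    // .length counts UTF-8 code units.
    assert(composed.length == 4);
    assert(decomposed.length == 5);

    // std.utf.count counts code points.
    assert(count(composed) == 3);
    assert(count(decomposed) == 4);

    // Both forms are three graphemes.
    assert(graphemeCount(composed) == 3);
    assert(graphemeCount(decomposed) == 3);

    // The raw strings differ until they are normalized to the same form.
    assert(composed != decomposed);
    assert(toComposed(decomposed) == composed);
    assert(normalize!NFD(composed) == decomposed);
}
```

Normalizing at the boundary, for example right after reading a file
name or receiving a string from a socket, would let the rest of the
program compare strings directly.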