Dicebot on leaving D: It is anarchy driven development in all its glory.
Joakim
dlang at joakim.fea.st
Thu Sep 6 17:19:01 UTC 2018
On Thursday, 6 September 2018 at 16:44:11 UTC, H. S. Teoh wrote:
> On Thu, Sep 06, 2018 at 02:42:58PM +0000, Dukc via
> Digitalmars-d wrote:
>> On Thursday, 6 September 2018 at 14:17:28 UTC, aliak wrote:
>> > // D
>> > auto a = "á";
>> > auto b = "á";
>> > auto c = "\u200B";
>> > auto x = a ~ c ~ a;
>> > auto y = b ~ c ~ b;
>> >
>> > writeln(a.length); // 2 wtf
>> > writeln(b.length); // 3 wtf
>> > writeln(x.length); // 7 wtf
>> > writeln(y.length); // 9 wtf
> [...]
>
> This is an unfair comparison. In the Swift version you used
> .count, but here you used .length, which is the length of the
> array, NOT the number of characters or whatever you expect it
> to be. You should rather use .count and specify exactly what
> you want to count, e.g., byCodePoint or byGrapheme.
>
> I suspect the Swift version will give you unexpected results if
> you did something like compare "á" to "a\u0301", for example
> (which, in case it isn't obvious, are visually identical to
> each other, and as far as an end user is concerned, should only
> count as 1 grapheme).
>
> Not even normalization will help you if you have a string like
> "a\u0301\u0302": in that case, the *only* correct way to count
> the number of visual characters is byGrapheme, and I highly
> doubt Swift's .count will give you the correct answer in that
> case. (I expect that Swift's .count will count code points, as
> is the usual default in many languages, which is unfortunately
> wrong when you're thinking about visual characters, which are
> called graphemes in Unicode parlance.)
No, Swift's .count counts grapheme clusters by default, so it gives 1
for that string. I suggest you read the Swift chapter linked above. I
think that default is the wrong choice for performance, but they chose
to emphasize intuitiveness for the common case.
I agree with most of the rest of what you wrote: programmers have no
silver bullet for avoiding the complexity of Unicode and of the
languages it encodes.
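
For reference, the three counts being argued about in this thread can be put side by side in D. This is only a minimal sketch: the helper names countCodeUnits, countCodePoints and countGraphemes are made up for illustration, and the sample strings use escapes (with the full four-digit form \u0301) so that the precomposed and decomposed forms cannot be confused in transit.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;
import std.utf : byDchar;

/// Hypothetical helper: UTF-8 code units (bytes), which is what .length reports.
size_t countCodeUnits(string s)
{
    return s.length;
}

/// Hypothetical helper: Unicode code points, decoded explicitly with byDchar.
size_t countCodePoints(string s)
{
    return s.byDchar.walkLength;
}

/// Hypothetical helper: grapheme clusters, i.e. user-perceived characters.
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // Each sample is displayed as a single visual character.
    auto samples = [
        "\u00E1",        // precomposed a-acute
        "a\u0301",       // 'a' + combining acute
        "a\u0301\u0302", // 'a' + combining acute + combining circumflex
    ];

    foreach (s; samples)
        writefln("%s code units, %s code points, %s graphemes",
                 countCodeUnits(s), countCodePoints(s), countGraphemes(s));
}
```

Under these assumptions, only the grapheme count is the same (1) for all three strings; the code unit counts are 2, 3 and 5, and the code point counts are 1, 2 and 3. Swift's default corresponds to the last helper, which is also the most expensive of the three to compute.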