[Issue 4483] foreach over string or wstring, where element type not specified, does not support unicode

d-bugmail at puremagic.com d-bugmail at puremagic.com
Sat Jan 25 20:53:19 PST 2014


https://d.puremagic.com/issues/show_bug.cgi?id=4483



--- Comment #10 from Martin Nowak <code at dawg.eu> 2014-01-25 20:50:26 PST ---
(In reply to comment #8)
> I'm very much opposed to this. The way it works now has been that way from the
> beginning, and an unknowably vast amount of code depends on it.
> 
You fail to recognize that it's been broken from the beginning.
The knowably vast number of people who stumble over this should
show you how surprising it is. It's the only place in the language
where Unicode handling is opt-in, and choosing an incorrect default
goes against basic D principles.
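
To make the default concrete, here is a minimal sketch of the behavior
under discussion. The function names countUnits and countPoints are
made up for illustration; the only thing relied on is that foreach over
a string infers char when no element type is given, and decodes when
dchar is requested explicitly.

```d
// Counts what foreach yields when the element type is left out.
size_t countUnits(string s)
{
    size_t n;
    foreach (c; s)          // c is inferred as char: UTF-8 code units
        ++n;
    return n;
}

// Counts what foreach yields when dchar is requested explicitly.
size_t countPoints(string s)
{
    size_t n;
    foreach (dchar c; s)    // opt-in decoding: one iteration per code point
        ++n;
    return n;
}

void main()
{
    // U+00E9 (e with acute accent) takes two code units in UTF-8.
    string s = "h\u00E9llo";
    assert(countUnits(s) == 6);   // the "obvious" loop sees six chars
    assert(countPoints(s) == 5);  // only the explicit form sees five
}
```

The "obvious" loop is the one that splits the non-ASCII character in
two, which is the surprise being described.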

D is designed such that most "obvious" code is fast and safe. On occasion a
function might need to escape the confines of type safety for ultimate speed
and control. For such rare cases D offers native pointers, type casts...

> Furthermore, I don't like the inherent slowdowns it causes. Only rarely does
> one need decoded chars, the rest of the time working with bytes is fast and
> correct.

That's not the best argument. Handling UTF-8 only adds a single comparison
    if (str[i] < 0x80)
        return str[i];
    else
        return decodeUTF8(str, i);
that the branch predictor will always get right.
What's true is that we had several codegen issues with unicode decoding,
e.g. the comparison wasn't inlined.
Currently foreach decoding goes through a delegate, which is painfully slow.
https://github.com/rejectedsoftware/vibe.d/pull/327
So handling UTF-8 requires care in library design and a good optimizer but I
don't see how it is inherently slower.
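
A rough sketch of the fast-path idea as a complete program, assuming
std.utf.decode for the multi-byte case. The helper names decodeNext and
countCodePoints are hypothetical, and this is not how the compiler
lowers foreach; it only illustrates that ASCII input costs one
comparison per element.

```d
import std.utf : decode;

// Hypothetical helper: returns the code point starting at str[i]
// and advances i past it.
dchar decodeNext(string str, ref size_t i)
{
    if (str[i] < 0x80)
        return str[i++];    // ASCII fast path: one comparison, no call
    return decode(str, i);  // multi-byte sequence: decode advances i
}

// Walks the string with the helper above and counts code points.
size_t countCodePoints(string str)
{
    size_t n = 0;
    for (size_t i = 0; i < str.length; ++n)
        decodeNext(str, i);
    return n;
}

void main()
{
    assert(countCodePoints("abc") == 3);
    assert(countCodePoints("h\u00E9llo") == 5);  // U+00E9 is two bytes
    assert(countCodePoints("\u20AC") == 1);      // U+20AC is three bytes
}
```

Whether the fast path actually gets inlined depends on the compiler and
its flags, which is the codegen concern raised above.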
