Major performance problem with std.array.front()

Dmitry Olshansky dmitry.olsh at gmail.com
Sun Mar 9 11:34:58 PDT 2014


09-Mar-2014 07:53, Vladimir Panteleev writes:
> On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote:
> I don't understand this argument. Iterating by code unit is not
> meaningless if you don't want to extract meaning from each unit
> iteration. For example, if you're parsing JSON or XML, you only care
> about the syntax characters, which are all ASCII. And there is no
> confusion of "what exactly are we counting here".
>
>>> This was debated... people should not be looking at individual code
>>> points, unless they really know what they're doing.
>>
>> Should they be looking at code units instead?
>
> No. They should only be looking at substrings.

This. Anyhow, searching for a dchar makes sense for _some_ languages; 
the problem is that it shouldn't decode the whole string, but rather 
encode the needle properly and search for that.
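The needle-encoding idea can be sketched independently of Phobos. A minimal C++ illustration, with made-up function names (not any library API): encode the needle's code point to UTF-8 once, then do a plain byte-level substring search over the undecoded haystack. UTF-8's self-synchronizing design guarantees the encoded needle can only match at a real code-point boundary.

```cpp
#include <cstddef>
#include <string>

// Encode a single Unicode code point to its UTF-8 byte sequence.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Find a code point in a UTF-8 haystack without decoding the haystack:
// byte-level search on the pre-encoded needle.
std::size_t find_codepoint(const std::string& haystack, char32_t needle) {
    return haystack.find(encode_utf8(needle));
}
```

The point is that the per-element decode cost moves from O(haystack) to O(1): only the needle is ever encoded/decoded.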

Basically the whole thread is about:
how do I work efficiently (no decoding) with UTF-8/UTF-16 in the cases 
where it obviously can be done?

The current situation is bad in that it undermines writing decode-less 
generic code. One easily falls into the auto-decode trap on the first 
.front, especially when called from some standard algorithm. The 
algorithm sees char[]/wchar[] and switches into decode mode via some 
special case. If it did that with _all_ char/wchar random-access ranges 
it would at least be consistent.

That, and wrapping your head around 2 sets of constraints. The amount of 
code special-casing the 2 types wchar[]/char[] is way too much, that 
much is clear.

-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list