Major performance problem with std.array.front()

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Mar 9 11:41:53 PDT 2014


On 3/9/14, 11:34 AM, Dmitry Olshansky wrote:
> 09-Mar-2014 07:53, Vladimir Panteleev wrote:
>> On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote:
>> I don't understand this argument. Iterating by code unit is not
>> meaningless if you don't want to extract meaning from each unit
>> iteration. For example, if you're parsing JSON or XML, you only care
>> about the syntax characters, which are all ASCII. And there is no
>> confusion of "what exactly are we counting here".
>>
>>>> This was debated... people should not be looking at individual code
>>>> points, unless they really know what they're doing.
>>>
>>> Should they be looking at code units instead?
>>
>> No. They should only be looking at substrings.
>
> This. Anyhow searching dchar makes sense for _some_ languages, the
> problem is that it shouldn't decode the whole string but rather encode
> the needle properly and search that.

That's just an optimization. Conceptually what happens is we're looking 
for a code point in a sequence of code points.
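
A minimal sketch of the "encode the needle" optimization discussed
above, assuming a current Phobos. The helper name indexOfEncoded is
hypothetical and not part of the library; encode, representation, and
find are existing Phobos functions. The needle is encoded to UTF-8
once, and the haystack is then searched as raw code units without
decoding. The result matches a code point search only because UTF-8 is
self-synchronizing and the haystack is assumed to be valid UTF-8.

```d
import std.algorithm.searching : find;
import std.string : representation;
import std.utf : encode;

/// Hypothetical helper: returns the code-unit index of the first
/// occurrence of `needle` in `haystack`, or -1 if it is absent.
/// The haystack is never decoded; it is assumed to be valid UTF-8.
ptrdiff_t indexOfEncoded(string haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle);   // 1 to 4 UTF-8 code units

    auto hay = haystack.representation;    // immutable(ubyte)[]
    auto rest = hay.find(buf[0 .. len].representation);

    // find returns the tail starting at the match, or an empty range.
    return rest.length ? cast(ptrdiff_t)(hay.length - rest.length) : -1;
}

unittest
{
    // 'é' occupies two code units, so 'ö' starts at code unit 8.
    assert(indexOfEncoded("héllo wörld", 'ö') == 8);
    assert(indexOfEncoded("héllo wörld", 'x') == -1);
}
```

The returned index counts code units, not code points, so it can be
used directly to slice the original string.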

> Basically the whole thread is about:
> how do I work efficiently (no-decoding) with UTF-8/UTF-16 in cases where
> it obviously can be done?
>
> The current situation is bad in that it undermines writing decode-less
> generic code.

s/undermines writing/makes writing explicit/

> One easily falls into auto-decode trap on first .front,
> especially when called from some standard algorithm. The algo sees
> char[]/wchar[] and gets into decode mode via some special case. If it
> would do that with _all_ char/wchar random access ranges it'd be at
> least consistent.
>
> That and wrapping your head around 2 sets of constraints. The amount of
> code around 2 types - wchar[]/char[] is way too much, that much is clear.

We're engineers so we should quantify. Ideally that would be as simple 
as "git grep isNarrowString|wc -l" which currently prints 42 of all 
numbers :o).

Overall I suspect there are a few good simplifications we can make by 
using isNarrowString and .representation.
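
As a rough illustration of that kind of simplification, assuming a
current Phobos: a generic function can test isNarrowString at compile
time and drop to code units via .representation, which makes the
no-decoding path explicit. The function name countAscii is hypothetical;
isNarrowString, representation, and count are existing Phobos symbols.
The shortcut is only valid for ASCII needles, since an ASCII code unit
in UTF-8 or UTF-16 never appears inside a multi-unit sequence.

```d
import std.algorithm.searching : count;
import std.range.primitives : isInputRange;
import std.string : representation;
import std.traits : isNarrowString;

/// Hypothetical helper: counts occurrences of the ASCII character `c`.
/// Narrow strings are scanned by code unit, without decoding.
size_t countAscii(R)(R r, char c)
if (isInputRange!R)
{
    assert(c < 0x80, "only ASCII needles are safe to match by code unit");

    static if (isNarrowString!R)
        return r.representation.count(c);  // ubyte[] or ushort[] view
    else
        return r.count(c);                 // dstring and other ranges
}

unittest
{
    // Eight double quotes; the non-ASCII 'é' is skipped over as bytes.
    assert(countAscii(`{"a":1,"b":{"c":"é"}}`, '"') == 8);
    assert(countAscii("a,b,c"w, ',') == 2);  // UTF-16 path
    assert(countAscii("a,b"d, ',') == 1);    // UTF-32, no special case
}
```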


Andrei