Major performance problem with std.array.front()

Dmitry Olshansky dmitry.olsh at gmail.com
Sun Mar 9 12:25:25 PDT 2014


09-Mar-2014 22:41, Andrei Alexandrescu wrote:
> On 3/9/14, 11:34 AM, Dmitry Olshansky wrote:
>> This. Anyhow searching dchar makes sense for _some_ languages, the
>> problem is that it shouldn't decode the whole string but rather encode
>> the needle properly and search that.
>
> That's just an optimization. Conceptually what happens is we're looking
> for a code point in a sequence of code points.

Yup. It's still not a good idea to introduce this in std.algorithm in a 
non-generic way.
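
To make the "encode the needle and search that" idea concrete, here is a 
minimal sketch, assuming UTF-8 input. The helper name indexOfEncoded is 
made up for illustration and is not a Phobos function; encode, 
representation and find are the existing library calls. Because UTF-8 is 
self-synchronizing, matching the encoded bytes cannot produce a false hit 
in the middle of another code point.

```d
import std.algorithm : find;
import std.string : representation;
import std.utf : encode;

// Hypothetical helper (not part of Phobos): returns the code-unit index of
// the first occurrence of needle in haystack, or -1 if it is absent.
ptrdiff_t indexOfEncoded(string haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle);    // encode the needle once as UTF-8
    auto units = haystack.representation;   // immutable(ubyte)[], nothing is decoded
    auto rest = units.find(buf[0 .. len].representation);
    return rest.length ? cast(ptrdiff_t)(units.length - rest.length) : -1;
}
```

The haystack is never decoded; only the needle is converted, once, before 
the search starts.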

>> That and wrapping your head around 2 sets of constraints. The amount of
>> code around 2 types - wchar[]/char[] is way too much, that much is clear.
>
> We're engineers so we should quantify. Ideally that would be as simple
> as "git grep isNarrowString|wc -l" which currently prints 42 of all
> numbers :o).

Add to that some uses of isSomeString and ElementEncodingType: 
138 and 80 respectively.

And in most cases it means that nice generic code was hacked to care 
about 2 types in particular. That is what bothers me.
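
As an illustration of the pattern I mean (a sketch, not code copied from 
Phobos; countItems is a made-up name), a function that would otherwise be 
two generic branches grows a third one just for char[]/wchar[]:

```d
import std.range : isInputRange, hasLength, empty, popFront;
import std.traits : isNarrowString;
import std.utf : stride;

// Hypothetical function, for illustration only: counts the elements a
// range yields when iterated.
size_t countItems(R)(R r) if (isInputRange!R)
{
    static if (isNarrowString!R)
    {
        // Special case for char[]/wchar[]: .length counts code units, so
        // step over whole code points instead.
        size_t n = 0;
        for (size_t i = 0; i < r.length; i += stride(r, i))
            ++n;
        return n;
    }
    else static if (hasLength!R)
    {
        return r.length;    // generic fast path
    }
    else
    {
        size_t n = 0;       // generic slow path
        for (; !r.empty; r.popFront())
            ++n;
        return n;
    }
}
```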

> Overall I suspect there are a few good simplifications we can make by
> using isNarrowString and .representation.

Okay, putting potential breakage aside, let me sketch an additive way of 
improving the current situation.

1. Say we recognize as a "narrow string" any indexable entity of 
char/wchar/dchar whose .front nevertheless returns a dchar. Nothing fancy, 
it's just a generalization of isNarrowString. At least a range over 
Array!char will then work as a string.

2. Likewise, representation should become something more explicit, say 
byCodeUnit, and work on any isNarrowString per the above. The opposite of 
that is byCodePoint.

3. ElementEncodingType is too verbose and misleading. Something more 
explicit would be useful. ItemType/UnitType maybe?

4. We lack lots of good stuff from the Unicode standard. Some recently 
landed in std.uni. We need much more, and we should deprecate the crappy 
bits in std.string (text wrapping is one example).

5. Most algorithms conceptually decode, but may be enhanced to work 
directly on UTF-8/UTF-16. That, together with 1, should IMHO solve most 
of our problems.

6. Take into account ASCII and maybe other alphabets? Should be as 
trivial as .assumeASCII and then on you march with all of std.algo/etc.
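
A rough sketch of points 1, 2 and 6, under stated assumptions: 
isNarrowStringLike, Utf8View and assumeASCII are names invented here for 
illustration, and the code assumes a Phobos release that ships 
std.utf.byCodeUnit. It is one possible shape, not a finished design.

```d
import std.algorithm : all;
import std.range : ElementType, isInputRange, walkLength;
import std.traits : isSomeChar, Unqual;
import std.utf : byCodeUnit, decode, stride;

// Point 1. Hypothetical trait: indexing yields char/wchar/dchar code units
// while .front yields a decoded dchar.
template isNarrowStringLike(R)
{
    static if (isInputRange!R && is(typeof(R.init[0]) Unit))
        enum isNarrowStringLike =
            isSomeChar!Unit && is(Unqual!(ElementType!R) == dchar);
    else
        enum isNarrowStringLike = false;
}

// Hypothetical stand-in for a range over Array!char: stores UTF-8 code
// units, is indexable by code unit, and decodes on .front.
struct Utf8View
{
    const(char)[] data;

    @property bool empty() { return data.length == 0; }
    @property dchar front() { size_t i = 0; return decode(data, i); }
    void popFront() { data = data[stride(data, 0) .. $]; }
    char opIndex(size_t i) { return data[i]; }
}

static assert(isNarrowStringLike!string);
static assert(isNarrowStringLike!Utf8View);
static assert(!isNarrowStringLike!(int[]));

// Point 2. The same string seen as code units and as code points.
size_t unitCount(string s) { return s.byCodeUnit.walkLength; }
size_t pointCount(string s) { return s.walkLength; } // default decoding view

// Point 6. Hypothetical assumeASCII: check once, then hand the algorithms a
// range of plain code units so nothing downstream needs to decode.
auto assumeASCII(string s)
{
    assert(s.byCodeUnit.all!(c => c < 0x80), "input is not pure ASCII");
    return s.byCodeUnit;
}
```

With something like this, generic code can constrain on the trait instead 
of on char[]/wchar[] specifically, and the caller picks the view 
(code units, code points, or ASCII) explicitly.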

-- 
Dmitry Olshansky

