Major performance problem with std.array.front()
Dmitry Olshansky
dmitry.olsh at gmail.com
Sun Mar 9 12:25:25 PDT 2014
09-Mar-2014 22:41, Andrei Alexandrescu wrote:
> On 3/9/14, 11:34 AM, Dmitry Olshansky wrote:
>> This. Anyhow searching dchar makes sense for _some_ languages, the
>> problem is that it shouldn't decode the whole string but rather encode
>> the needle properly and search that.
>
> That's just an optimization. Conceptually what happens is we're looking
> for a code point in a sequence of code points.
Yup. It's still not a good idea to introduce this in std.algorithm in a
non-generic way.
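
To make the "encode the needle and search that" idea concrete, here is a
minimal sketch for the UTF-8 case only. The helper name findCodePoint is
hypothetical (it is not a Phobos function); it relies on std.utf.encode,
std.string.representation and std.algorithm.find, and it assumes the
haystack is valid UTF-8 so that a byte-level match always falls on a code
point boundary.

```d
import std.algorithm : find;
import std.string : representation;
import std.utf : encode;

// Hypothetical helper, not part of Phobos: locates `needle` in a UTF-8
// string without decoding the haystack.
string findCodePoint(string haystack, dchar needle)
{
    char[4] buf;
    // Encode the needle once: 1 to 4 UTF-8 code units.
    // (encode throws a UTFException for an invalid code point.)
    immutable len = encode(buf, needle);

    // Search raw code units; no decoding of the haystack happens here.
    auto rest = haystack.representation.find(buf[0 .. len].representation);

    // Map the match back onto the original string (empty if not found).
    return haystack[$ - rest.length .. $];
}
```

The result is the tail of the haystack starting at the first match, which
mirrors what find returns for other ranges.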
>> That and wrapping your head around 2 sets of constraints. The amount of
>> code around 2 types - wchar[]/char[] is way too much, that much is clear.
>
> We're engineers so we should quantify. Ideally that would be as simple
> as "git grep isNarrowString|wc -l" which currently prints 42 of all
> numbers :o).
Add to that the uses of isSomeString and ElementEncodingType:
138 and 80 respectively.
And in most cases it means that nice generic code was hacked to care
about 2 types in particular. That is what bothers me.
> Overall I suspect there are a few good simplifications we can make by
> using isNarrowString and .representation.
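
For illustration, this is roughly the shape such a simplification takes: a
generic function with one narrow-string branch that drops down to code units
via .representation. The function countSpaces is a made-up example, not
something taken from Phobos, and the shortcut is only valid because an ASCII
space is a single code unit that never occurs inside a multi-unit sequence.

```d
import std.algorithm : count;
import std.range : isInputRange;
import std.string : representation;
import std.traits : isNarrowString;

// Made-up example function: counts ASCII spaces in any character range.
size_t countSpaces(R)(R r)
    if (isInputRange!R)
{
    static if (isNarrowString!R)
        // char[]/wchar[]: compare code units directly, skipping decoding.
        return r.representation.count(' ');
    else
        // Everything else goes through the generic path.
        return r.count(' ');
}
```

Both branches give the same answer; the narrow-string one simply avoids
decoding every code point just to compare it with a space.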
Okay, putting potential breakage aside,
let me sketch an additive way of improving the current situation.
1. Say we recognize as a "narrow string" any indexable entity of
char/wchar/dchar that nevertheless has .front returning a dchar. Nothing
fancy - it's just a generalization of isNarrowString. At the very least, a
range over Array!char would then work as a string.
2. Likewise, representation should become something more explicit, say
byCodeUnit, and work on any isNarrowString as defined above. The opposite
of that is byCodePoint.
3. ElementEncodingType is too verbose and misleading. Something more
explicit would be useful. ItemType/UnitType maybe?
4. We lack lots of good stuff from the Unicode standard. Some recently
landed in std.uni. We need much more, and we should deprecate the crappy
ones in std.string (text wrapping is one example).
5. Most algorithms conceptually decode, but may be enhanced to work
directly on UTF-8/UTF-16. That together with 1, should IMHO solve most
of our problems.
6. Take into account ASCII and maybe other alphabets? It should be as
trivial as .assumeASCII, and from then on you march with all of std.algo/etc.
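
A rough sketch of what point 1 could look like in code. Both names below,
isNarrowStringLike and Utf8View, are hypothetical and exist only for this
example; the trait is also restricted to char/wchar code units, which is my
assumption about what "narrow" should mean here. Utf8View stands in for
something like a range over Array!char: it indexes code units but its .front
decodes a code point.

```d
import std.range : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : decode, stride;

// Hypothetical trait: indexable by char/wchar code units, yet .front
// yields a decoded dchar.
template isNarrowStringLike(R)
{
    static if (isInputRange!R && is(typeof(R.init[0])))
    {
        alias U = Unqual!(typeof(R.init[0]));
        enum isNarrowStringLike = is(Unqual!(ElementType!R) == dchar)
            && (is(U == char) || is(U == wchar));
    }
    else
        enum isNarrowStringLike = false;
}

// Hypothetical non-array "string": code units on indexing, code points
// on iteration.
struct Utf8View
{
    const(char)[] data;

    @property bool empty() const { return data.length == 0; }

    // Decode the first code point without consuming it.
    @property dchar front() const
    {
        size_t i = 0;
        return decode(data, i);
    }

    // Skip exactly one encoded code point.
    void popFront() { data = data[stride(data, 0) .. $]; }

    // Indexing exposes raw code units.
    char opIndex(size_t i) const { return data[i]; }
}

static assert(isNarrowStringLike!string);
static assert(isNarrowStringLike!wstring);
static assert(isNarrowStringLike!Utf8View);
static assert(!isNarrowStringLike!dstring);
static assert(!isNarrowStringLike!(int[]));
```

With a trait of this kind, code that today is constrained on isNarrowString
could accept Utf8View-style ranges as well, without a second set of
overloads.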
--
Dmitry Olshansky