Speed kills

Jon D via Digitalmars-d digitalmars-d at puremagic.com
Wed Mar 9 12:30:10 PST 2016


On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
> Since I posted this thread I've learned std.algorithm.sum is 4 
> times slower than a naive loop sum. Even if this is for reasons 
> of accuracy this is exactly what I am talking about- this is a 
> hidden iceberg of terrible performance that will reflect poorly 
> on D. That's so slow the function needs a health warning.

I seen a few cases while exploring D. Not all fully researched 
(apologies for that), but since there appears to be interest in 
identification I'll list them.

* Lower-casing strings (likely upper-casing), and some character 
type checks.

Routines like to toLower and asLowerCase call functions that work 
for all unicode characters. I was able to create much faster 
versions by checking if the character was ascii, then calling 
either the ascii version or the more general version. Same is 
true for a few routines like isNumber. Some have the ascii check 
optimization built in, but not all.

If this optimization is added, it might also be useful to add a 
few common combinations (or a generic solution, if that's 
feasible). For example, to check if a character is alpha-numeric, 
one currently ORs two tests from the standard library, isAlpha 
and isNumber. Putting in an ascii optimization check requires 
putting it before doing the OR, rather than inside the tests 
being ORed.

* Large associative arrays

When associative arrays get beyond about 10 million entries 
performance starts to decline. I believe this is due to resizing 
the arrays. It's worse with strings as keys than integers as 
keys. Having a way to reserve capacity may help under some 
circumstances.

* Associative arrays - Converting keys to immutable versions for 
lookup

Associative arrays want immutable values as keys. Far as I can 
tell, immutable values are also required when performing a 
lookup, even if a new entry won't be stored. A couple apps I've 
written walk through large lists of text values, naturally 
available as char[] because they are read from input streams. To 
test presence in an associative array, it's necessary to copy 
them to immutable strings first. I haven't fully researched this 
one, but my experience is that copying from char[] to string 
becomes a meaningful cost.

On the surface, this appears to be an optimization opportunity, 
to create the immutable strings only when actually storing a new 
value.

--Jon


More information about the Digitalmars-d mailing list