Speed kills
Jon D via Digitalmars-d
digitalmars-d at puremagic.com
Wed Mar 9 12:30:10 PST 2016
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
> Since I posted this thread I've learned std.algorithm.sum is 4
> times slower than a naive loop sum. Even if this is for reasons
> of accuracy this is exactly what I am talking about- this is a
> hidden iceberg of terrible performance that will reflect poorly
> on D. That's so slow the function needs a health warning.
I seen a few cases while exploring D. Not all fully researched
(apologies for that), but since there appears to be interest in
identification I'll list them.
* Lower-casing strings (likely upper-casing), and some character
type checks.
Routines like to toLower and asLowerCase call functions that work
for all unicode characters. I was able to create much faster
versions by checking if the character was ascii, then calling
either the ascii version or the more general version. Same is
true for a few routines like isNumber. Some have the ascii check
optimization built in, but not all.
If this optimization is added, it might also be useful to add a
few common combinations (or a generic solution, if that's
feasible). For example, to check if a character is alpha-numeric,
one currently ORs two tests from the standard library, isAlpha
and isNumber. Putting in an ascii optimization check requires
putting it before doing the OR, rather than inside the tests
being ORed.
* Large associative arrays
When associative arrays get beyond about 10 million entries
performance starts to decline. I believe this is due to resizing
the arrays. It's worse with strings as keys than integers as
keys. Having a way to reserve capacity may help under some
circumstances.
* Associative arrays - Converting keys to immutable versions for
lookup
Associative arrays want immutable values as keys. Far as I can
tell, immutable values are also required when performing a
lookup, even if a new entry won't be stored. A couple apps I've
written walk through large lists of text values, naturally
available as char[] because they are read from input streams. To
test presence in an associative array, it's necessary to copy
them to immutable strings first. I haven't fully researched this
one, but my experience is that copying from char[] to string
becomes a meaningful cost.
On the surface, this appears to be an optimization opportunity,
to create the immutable strings only when actually storing a new
value.
--Jon
More information about the Digitalmars-d
mailing list