Speed kills

Wed Mar 9 13:01:13 PST 2016

On Wed, Mar 09, 2016 at 08:30:10PM +0000, Jon D via Digitalmars-d wrote:
> On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
> >Since I posted this thread I've learned std.algorithm.sum is 4 times
> >slower than a naive loop sum. Even if this is for reasons of accuracy
> >this is exactly what I am talking about- this is a hidden iceberg of
> >terrible performance that will reflect poorly on D. That's so slow
> >the function needs a health warning.

In the case of std.algorithm.sum, the focus is on accuracy rather than
performance.  It does some extra work to ensure maximum accuracy in the
result, so it shouldn't be expected to have top performance.  Granted,
though, the docs could be clearer about this accuracy vs. performance
tradeoff.  Please file a bug on this (or better yet, submit a PR for
it).

In any case, 4 times slower sounds a bit excessive... it would be good
to investigate why this is happening and fix it.

> I seen a few cases while exploring D. Not all fully researched
> (apologies for that), but since there appears to be interest in
> identification I'll list them.

One of the big performance killers that people are likely to run into is
autodecoding for strings in all range-based algorithms.  Walter has
repeatedly complained about this, but so far Andrei (and some others)
have resisted getting rid of autodecoding.  Fortunately, you can
(mostly) suppress this by wrapping your strings in
std.encoding.codeUnits before operating on them. Nevertheless, it's one
of those hidden gotchas that could result in poorer performance than
what you could get otherwise.

Another big performance killer that I repeatedly run into is the
overly-eager GC collection schedule.  Some may argue against the use of
the GC, period, but I have found that even with the GC, it's possible to
get 30-40% speedups just by calling GC.disable() and manually triggering
GC.collect() at a lower frequency than the default. This is especially
true when your program performs many allocations of small objects, like
(small) strings. I have observed this in at least 2-3 different
memory-intensive programs of mine, including a recent experiment in
writing a faster CSV parser than std.csv.  Arguably, we should fix this
in druntime itself so that collection cycles are run less often, though
I'm not sure what implications this may have on low-memory systems.
Sometimes, you don't even need to do this for the entire program; you
can get big performance boosts just by wrapping GC.disable() /
GC.enable() around strategic sections of allocation-intensive code.

> * Lower-casing strings (likely upper-casing), and some character type
> checks.
> 
> Routines like to toLower and asLowerCase call functions that work for
> all unicode characters. I was able to create much faster versions by
> checking if the character was ascii, then calling either the ascii
> version or the more general version. Same is true for a few routines
> like isNumber. Some have the ascii check optimization built in, but
> not all.
> 
> If this optimization is added, it might also be useful to add a few
> common combinations (or a generic solution, if that's feasible). For
> example, to check if a character is alpha-numeric, one currently ORs
> two tests from the standard library, isAlpha and isNumber. Putting in
> an ascii optimization check requires putting it before doing the OR,
> rather than inside the tests being ORed.

These sound like valuable additions to Phobos. Submit a PR, please?

[...]
> * Associative arrays - Converting keys to immutable versions for lookup
> 
> Associative arrays want immutable values as keys. Far as I can tell,
> immutable values are also required when performing a lookup, even if a
> new entry won't be stored. A couple apps I've written walk through
> large lists of text values, naturally available as char[] because they
> are read from input streams. To test presence in an associative array,
> it's necessary to copy them to immutable strings first. I haven't
> fully researched this one, but my experience is that copying from
> char[] to string becomes a meaningful cost.
[...]

This is arguably a bug.  If you're only looking up a key, you should be
able to pass in const(char)[] rather than immutable(char)[] (aka
string).  The GC performance problem I mentioned above is especially
noticeable when there are large numbers of small allocations, as would
be the case if you constantly have to call .idup just because AA lookups
demand immutable keys.  Please file an issue for this, if there isn't
one already filed.

T

-- 
Windows 95 was a joke, and Windows 98 was the punchline.