Optimization fun

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu Nov 6 17:11:18 PST 2014


On Thu, Nov 06, 2014 at 02:58:16PM -0800, H. S. Teoh via Digitalmars-d wrote:
> So today, I was playing around with profiling and optimizing my sliding
> block puzzle solver, and found some interesting things:
[...]

Oh, I forgot to mention something else that might be of interest to
people: in the course of optimizing the performance of canFind(), one of
the techniques I used was to reduce the frequency with which it's
called. One such situation was the duplication between the canMove()
method, which checks whether a prospective move is valid, and move(),
which performs the actual move (updating the puzzle board).

In more general terms, the same pattern applies to data structures that
have a check method for evaluating whether a particular operation is
legal or not, and another method for actually performing the operation.
For example, when updating a hash table or binary tree you might do a
"find" to see if an entry already exists, before you insert a new entry.
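To make the duplication concrete, here's a minimal sketch using a
hypothetical binary search tree (nothing to do with the solver's actual
code): contains() and insert() each walk the same path down from the
root, so a check-then-insert caller pays for the descent twice.

```d
struct Node { int key; Node* left, right; }

struct Tree {
	Node* root;

	bool contains(int key) {
		for (auto n = root; n !is null;
		     n = (key < n.key) ? n.left : n.right)
		{
			if (n.key == key) return true;
		}
		return false;
	}

	void insert(int key) {
		// Repeats exactly the walk contains() just did.
		auto link = &root;
		while (*link !is null) {
			if (key == (*link).key) return; // already present
			link = (key < (*link).key) ? &(*link).left
			                           : &(*link).right;
		}
		*link = new Node(key);
	}
}
```

A caller doing `if (!t.contains(k)) t.insert(k);` traverses the tree
twice for a single logical operation.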
While check() followed by insert() makes sense logically (first you
check, then you proceed if the check is OK), it performs poorly in code,
because quite often there is a fair amount of overlap in the work done
by either method. The check, for example, may have found the node in
which the insert will take place, but since it has to return to the
caller first, this information is lost and insert() has to recompute
something that it should already have known.

There are a few common solutions to this problem:

1) Have a checkAndInsert() method that both checks and performs the
insertion if the check passes. While this solves the problem, it's ugly
because it conflates two logically distinct operations into one, thus
breaking the principle of single responsibility.

A variation on (1) is to pass a flag (ugh) to insert() to do a check
before inserting. It's essentially the same solution with the same
drawbacks.

2) check() returns a node (or iterator, or range, whatever) instead of a
bool, which is then passed to insert() so it doesn't have to traverse
the structure again. This is a slightly better solution, but the
disadvantage is that you expose the implementation details of the
structure, and besides, why should a method named "check" return an
iterator? It's also unable to capture other repeated computations
effectively; e.g., check() may have computed things like the balance
factor of the tree at that point, which insert() can profitably reuse,
but an iterator/range doesn't (and shouldn't!) capture that kind of
information.
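A self-contained sketch of what option (2) might look like on a
hypothetical binary search tree (the names check() and insertAt() are
invented for illustration): the returned Node** is exactly the kind of
implementation detail the caller shouldn't have to see.

```d
struct Node { int key; Node* left, right; }

// Returns the link slot where key would be inserted, or null if the
// key already exists -- an internal detail the caller now has to
// carry around and thread back into insertAt().
Node** check(ref Node* root, int key) {
	auto link = &root;
	while (*link !is null) {
		if (key == (*link).key) return null;
		link = (key < (*link).key) ? &(*link).left
		                           : &(*link).right;
	}
	return link;
}

void insertAt(Node** link, int key) {
	*link = new Node(key);
}
```

The call site becomes `if (auto slot = check(root, k)) insertAt(slot, k);`
-- efficient, but the tree's node representation has leaked into user
code.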

3) check() takes an additional ref parameter that can be used to store
extra information to be passed to insert(), then the caller passes this
extra information when it calls insert(). This is also ugly, because it
pollutes user code with implementation details. This can be restricted
to only private use by other methods, so that we don't break
encapsulation, but then user code won't be able to benefit from it.
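Option (3) might be sketched like this, again on a hypothetical tree
(FindState and the function names are made up for illustration): the
caller must declare and thread a FindState through, even though it
means nothing to user code.

```d
struct Node { int key; Node* left, right; }

struct FindState { Node** link; } // implementation detail, exposed

bool check(ref Node* root, int key, ref FindState st) {
	auto link = &root;
	while (*link !is null) {
		if (key == (*link).key) return false;
		link = (key < (*link).key) ? &(*link).left
		                           : &(*link).right;
	}
	st.link = link;
	return true;
}

// Only valid after a successful check() with the same FindState.
void insert(int key, ref FindState st) {
	*st.link = new Node(key);
}
```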

In the end, the idea occurred to me that the reason insert() can reuse
certain things check() has already computed, is because those things
were computed as part of the process of *verifying* the validity of a
certain operation. Much like verifying the validity of a piece of data
with a checksum, the checksum computed by the verification function is a
proof, or certificate, that the data is valid. So it occurred to me that
check() should return not a bool, but an opaque *certificate* object
that captures the information computed by check() in the process of
validating the user-proposed operation. This certificate object can then
be inspected by the caller to see if the check passed (a default,
"invalid" certificate is returned if check() fails), and it can also
capture additional information that insert() can use afterwards. Then
insert() itself will have two overloads: one that takes the normal
parameters, which will (re)compute whatever information is necessary to
perform the operation, and another that takes a certificate that
"certifies" that the given operation has been previously validated, and
therefore insert() can skip recomputation of things already computed by
check() -- which are captured in the certificate object. Furthermore,
being a certificate, it also contains information about the original
request (otherwise you could illegally pair an invalid request with the
certificate of a different, valid request and cause insert() to
malfunction).
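Here's one way the certificate idea might look when applied to the
same hypothetical binary search tree (invented names, not the solver's
code). Note the usual caveat: the certificate is only valid as long as
the structure isn't modified between check() and insert().

```d
struct Node { int key; Node* left, right; }

// Opaque certificate: captures both the original request (key) and
// the work done while validating it (the link slot that was found).
struct InsertCert {
	private int key;
	private Node** link;
	bool isValid() { return link !is null; }
	alias isValid this; // usable in boolean contexts
}

struct Tree {
	Node* root;

	InsertCert check(int key) {
		auto link = &root;
		while (*link !is null) {
			if (key == (*link).key)
				return InsertCert(key, null); // invalid
			link = (key < (*link).key) ? &(*link).left
			                           : &(*link).right;
		}
		return InsertCert(key, link);
	}

	// Unvalidated overload: performs the check itself, then
	// delegates to the validated overload.
	void insert(int key) { insert(check(key)); }

	// Validated overload: reuses the traversal check() already did.
	void insert(InsertCert cert) {
		if (!cert) return;
		*cert.link = new Node(cert.key);
	}
}
```

The caller can still write `if (t.check(k)) ...` as if check() returned
a bool, or hang on to the certificate and pass it to insert() to skip
the second traversal.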

So my eventual solution was to change the canMove() method's signature
from:

	bool canMove(Move m);

to:

	MoveCert canMove(Move m);

And then the overloads of move() are:

	void move(Move m) {
		// Unvalidated overload: computes the values needed
		// by the second overload of move(), since they aren't
		// already available.
		MoveCert cert = ... ;

		// The actual implementation is second overload; now
		// that we have computed the necessary info to proceed,
		// we share the rest of the implementation with it. Thus
		// there is no needless code duplication.
		move(cert);
	}

	void move(MoveCert cert) {
		// Here, we can use the info stored in cert that was
		// precomputed by canMove(), thereby avoiding the
		// unnecessary overhead of computing it all over
		// again.
	}

In the specific case of my solver, MoveCert looks like this:

	private struct MoveCert {
		private Move m;			// original "request"
		private ptrdiff_t offset;	// cached result of countUntil()

		// This makes this change transparent to existing
		// callers that expect a bool:
		bool isValid() { ... }
		alias isValid this;
	}

MoveCert.offset caches the value of countUntil() that canMove() has
already computed, so move() can simply look it up instead of recomputing
it.

This particular optimization significantly reduced the number of calls
to countUntil(), and resulted in about 20-30% performance improvement.
But the best part is, it did so while making the code arguably *prettier*
and without breaking encapsulation -- quite the opposite effect of typical
optimization techniques!

(Having said that, this technique does have to be used with care...
while working on this, I noticed that another value was being recomputed
by move() that canMove() already computes. However, adding that value to
MoveCert actually *degraded* performance -- because it made MoveCert too
large, and the overhead of passing it back to the caller from canMove()
and then passing it to move() afterwards outweighed the cost of just
recomputing the value in move(). So the things worth caching in MoveCert
are those that are expensive to recompute, not those that are trivially
recomputed. In this case, the value in question was calculated by
multiplying a few integers; evidently, redoing that multiplication is
cheaper than enlarging MoveCert and paying the extra cost of passing it
around. Profiling is a must in deciding just how much to put in the
certificate object.)


T

-- 
If I were two-faced, would I be wearing this one? -- Abraham Lincoln

