The Case Against Autodecode
H. S. Teoh via Digitalmars-d
digitalmars-d at puremagic.com
Sun May 15 18:35:24 PDT 2016
On Mon, May 16, 2016 at 12:31:04AM +0000, Jack Stouffer via Digitalmars-d wrote:
> On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote:
> >Given the importance of performance in the auto-decoding topic, it
> >seems reasonable to quantify it. I took a stab at this. It would of
> >course be prudent to have others conduct similar analysis rather than
> >rely on my numbers alone.
>
> Here is another benchmark (see the above comment for the code to apply
> the patch to) that measures the iteration time difference:
> http://forum.dlang.org/post/ndj6dm$a6c$1@digitalmars.com
>
> The result is a 756% slowdown
I decided to do my own benchmarking too. Here's the code:
/**
 * Simple-minded benchmark for measuring performance degradation caused by
 * autodecoding.
 */
import std.typecons : Flag, Yes, No;

size_t countNewlines(Flag!"autodecode" autodecode)(const(char)[] input)
{
    size_t count = 0;
    static if (autodecode)
    {
        import std.array;
        foreach (dchar ch; input)
        {
            if (ch == '\n') count++;
        }
    }
    else // !autodecode
    {
        import std.utf : byCodeUnit;
        foreach (char ch; input.byCodeUnit)
        {
            if (ch == '\n') count++;
        }
    }
    return count;
}

void main(string[] args)
{
    import std.datetime : benchmark;
    import std.file : read;
    import std.stdio : writeln, writefln;

    string input = (args.length >= 2) ? args[1]
                                      : "/usr/src/d/phobos/std/datetime.d";
    uint n = 50;
    auto data = cast(char[]) read(input);
    writefln("Input: %s (%d bytes)", input, data.length);

    size_t count;
    writeln("With autodecoding:");
    auto result = benchmark!({
        count = countNewlines!(Yes.autodecode)(data);
    })(n);
    writefln("Newlines: %d Time: %s msecs", count, result[0].msecs);

    writeln("Without autodecoding:");
    result = benchmark!({
        count = countNewlines!(No.autodecode)(data);
    })(n);
    writefln("Newlines: %d Time: %s msecs", count, result[0].msecs);
}
// vim:set sw=4 ts=4 et:
Just for fun, I decided to use std/datetime.d, one of the largest
modules in Phobos, as a test case.
For comparison, I compiled with dmd (latest git head) and gdc 5.3.1. The
compile commands were:
dmd -O -inline bench.d -ofbench.dmd
gdc -O3 bench.d -o bench.gdc
Here are the results from bench.dmd:
Input: /usr/src/d/phobos/std/datetime.d (1464089 bytes)
With autodecoding:
Newlines: 35398 Time: 331 msecs
Without autodecoding:
Newlines: 35398 Time: 254 msecs
And the results from bench.gdc:
Input: /usr/src/d/phobos/std/datetime.d (1464089 bytes)
With autodecoding:
Newlines: 35398 Time: 253 msecs
Without autodecoding:
Newlines: 35398 Time: 25 msecs
These results are pretty typical across multiple runs. There is a
variance of about 20 msecs or so between bench.dmd runs, but the
bench.gdc runs vary only by about 1-2 msecs.
So for bench.dmd, autodecoding adds about a 30% overhead to running
time, whereas for bench.gdc, autodecoding costs an order of magnitude
increase in running time.
As an interesting aside, compiling with dmd without -O -inline makes the
non-autodecoding case consistently *slower* than the autodecoding case.
Apparently the running time is then dominated by the cost of calling
non-inlined range primitives on byCodeUnit, whereas a manual for-loop
over the array of chars produces results similar to the -O -inline case.
I find this interesting, because it shows that the cost of autodecoding
is relatively small compared to the cost of unoptimized range
primitives. Nevertheless, autodecoding does make a big difference once
the range primitives are properly optimized. It is especially striking
in the case of gdc that, given its superior optimizer, the
non-autodecoding case can be made an order of magnitude faster, whereas
the autodecoding case is presumably complex enough to defeat the
optimizer.
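For reference, the manual for-loop mentioned above can be sketched as
follows. This is a hypothetical variant, not part of the posted
benchmark: iterating over the char array with an inferred loop variable
walks code units directly, so neither autodecoding nor any range
primitives are involved.

```d
import std.stdio : writeln;

// Hypothetical manual-loop variant (not in the posted benchmark).
// foreach over a char[] without declaring the loop variable as dchar
// iterates one code unit per step: no decoding, no range primitives.
size_t countNewlinesManual(const(char)[] input)
{
    size_t count = 0;
    foreach (ch; input) // ch is inferred as const(char)
    {
        if (ch == '\n') count++;
    }
    return count;
}

void main()
{
    writeln(countNewlinesManual("one\ntwo\nthree\n")); // prints 3
}
```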
T
--
Democracy: The triumph of popularity over principle. -- C.Bond