Wed Oct 17 - Avoiding Code Smells by Walter Bright

Tue Oct 30 22:22:33 UTC 2018

On Mon, Oct 22, 2018 at 07:27:51AM +0000, Basile B. via Digitalmars-d-announce wrote:
> On Friday, 19 October 2018 at 03:53:12 UTC, Walter Bright wrote:
> > On 10/15/2018 2:23 PM, Walter Bright wrote:
> > > I'm giving a presentation at:
> > > 
> > > http://nwcpp.org/
> > > 
> > > See you there!
> > 
> > Had a nice crowd there last night. Apparently lots of people were
> > interested in this topic!
> > 
> > Video: https://www.youtube.com/watch?v=lbp6vwdnE0k&feature=youtu.be
> > 
> > Slides: http://nwcpp.org/talks/2018/code_smells.pdf
> 
> https://www.reddit.com/r/programming/comments/9qbhw2/nwcpp_walter_bright_talks_about_the_code_smells/

Haha, I liked how Walter caught the audience's attention by using
#include <windows.h> as the first example of a code smell. :-D

Honestly, though, IMO this is also a code smell:

	import std.stdio;

That is, if it appears at the top of every source file.  Walter sortof
said the same thing already, but basically, this usually also implies
writeln's (or printf's) all over code that probably don't really have
anything to do with I/O.  Which in itself is already bad enough -- code
that isn't responsible for I/O shouldn't be doing I/O -- but it also
binds the code to writing to stdout, so if later on you wish to write to
a logfile instead, you're up the creek without a paddle.

I would go even further, though.  All too often I've seen (and wrote)
code that does this:

	import std.stdio;

	auto myFunc(Args...)(Args args) {
		... // do lots of stuff
		auto f = File(myFilename, "w");
		f.writefln(... lots of stuff... );
		...
	}

Then I decide that the code is too complex for me to have any confidence
that it is correct, and so I need to write a unittest for it.  But ...
since it writes to an external file, and possibly decides on its own
output filename, the unittest has to worry about how to test it without
leaving detritus lying around in the filesystem.

My current go-to solution is to abstract away the filesystem itself,
e.g.:

	// N.B.: *no* import of std.stdio!
	auto myFunc(File = std.stdio.File, Args...)(Args args) {
		... // do lots of stuff
		auto f = File(myFilename, "w");
		f.writefln(... lots of stuff... );
		...
	}

	unittest
	{
		struct FilesystemProxy {
			string[string] contents;
			string curFile;
			this(string filename) { curFile = filename; }
			writefln(Args...)(string fmt, Args args) {
				contents[curFile] ~= format(fmt, args);
			}
		}
		FilesystemProxy fs;
		...
		auto ret = myFunc!FilesystemProxy(blah, blah, blah);

		assert(fs.contents["file1"] == expectedOutput);
		...
	}

This lets me inject a proxy object in place of std.stdio.File, so that
the unittest can check code logic without touching the real filesystem.
But by default, when real user code calls the function, it writes to the
real filesystem.

You may think this is excessive, but recently I've had a success story
with this style of coding: I have some D code cross-compiled to run on
Android, and it was crashing at runtime in a very hard-to-debug way. In
order to narrow things down, I had to write unittests to make sure the
code actually did what I thought it did. But I couldn't just write it
straight, because it assumes a filesystem structure that doesn't exist
on my host PC, and running unittests on Android was out of the question
since it would just run into the same bug and crash without yielding
useful information.

Abstracting away the filesystem allowed me to compile and test that bit
of code on the host PC separately from the other code, and without
requiring that I make a copy of the Android filesystem structure on the
PC just so I could test the code.  This enabled me to locate the bug and
write a unittest that prevents future regression, AND this test can be
run locally before being deployed to Android.

Now of course, Android has debugging tools and such that I could have
used instead -- but imagine if this code were deployed on a bare-metal
embedded system with little or no debugging support.  Being able to test
the code logic without deploying the code on the real target hardware is
a big plus.  This style of coding makes your code compilable without any
platform-specific libraries that may not even run on your host PC.  If
compiling for the target machine requires libxyz, and libxyz simply
doesn't run on a PC, then you're out of luck. But once you abstract away
that dependency, you can compile your business logic without libxyz (by
substituting proxy objects like above) and run tests locally until
you're sure the logic is sound, before deploying it to the actual
hardware.

//

As for encapsulating globals in a struct, I don't like it.  It's better
than literally littering globals everywhere, making it next to
impossible to figure out what the code is doing -- it's like action at a
distance, something physicists are uncomfortable with because it's hard
to reason about (well, almost impossible when applied to code).  At the
very least, it makes globals greppable, and makes any access plainly
obvious. But it still suffers from all the drawbacks of globals:

- It's very hard to tell where the global is initialized -- or more
  importantly, *when* it's initialized.  Not initializing it before it's
  needed is a hard-to-trace bug.

- It exhibits action-at-a-distance: an arbitrary piece of code can
  affect an arbitrary other piece of code. Very hard to reason about the
  code as a whole.

- It's still a side-channel, the same problem discussed a few slides
  prior.  It's still a global, just in a better-looking dress.  It's
  papering over the problem instead of solving it.

- It makes the code hard to test, or just plain non-testable. (Writing a
  unittest that has to setup global variables beforehand is a bad, bad,
  bad thing. If such a test can even be written in the first place...
  usually such code is untestable except in a whole-program way, because
  its pieces are too tightly bound to each other. And we all know what
  happens to code quality when it's not tested enough.)

I would take it one step further: now that all the globals are collected
in one place, PASS IT AS A FUNCTION ARGUMENT to code that needs it.
That way, you know exactly who it gets passed to, and you know code that
you didn't pass it to would not have a side-channel to its members.  But
since it's cumbersome to pass it around everywhere (if it's truly
global), I'd just move functions that depend on it into the struct as
member functions, that way the globals get passed implicitly.

Doing this will naturally lead to the next step: a good programmer will
inevitably realize how ugly the struct is, because it's a haphazard
collection of random stuff thrown together, and will inevitably feel the
urge to split up the struct into logically self-contained pieces, EACH
WITH ITS OWN MEMBERS unrelated to the other pieces. In other words, the
code will "want" to be in different modules. This logically leads to
proper modularization of the original, unfocused, global-dependent code.

//

Now, on the slide about no I/O in low-level functions, I don't think
passing a delegate as a stand-in for I/O is a good idea. Especially with
the stated example:

	int sum(void delegate(char*) put, int i, int j) {
		if (i > 100)
			put("i is out of range");
		return i+j;
	}

The problem is that this code shouldn't be doing I/O at all, indirectly
or otherwise. Writing out a string via a delegate is not in the charter
of this function.  Again, it's just papering over the problem rather
than solving it.  Doing so uglifies the API, and makes it needlessly
cumbersome to use.  Plus, it invites lazy coders to write:

	auto s = sum((s) {} /* ignore errors */, x, y);

just to get rid of annoying error messages, thus leading to code that
hides errors rather than deal with them.  Having seen too many cases of
this in real-life code, this kind of API always makes me cringe.

Instead, I'd say just throw a darned Exception already.  It's what
exceptions are meant for: an out-of-band bailout when something went
wrong, that the immediate caller either has to handle, or let somebody
further up the call stack handle. Usually, the latter is a much better
idea, because the lower-level code may not have the necessary context to
make the right decision about this.

T

-- 
Leather is waterproof.  Ever see a cow with an umbrella?