Fantastic exchange from DConf

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue May 9 17:30:42 PDT 2017


On Tue, May 09, 2017 at 11:09:27PM +0000, Guillaume Boucher via Digitalmars-d wrote:
> On Tuesday, 9 May 2017 at 16:55:54 UTC, H. S. Teoh wrote:
> > Ouch.  Haha, even I forgot about this particularly lovely aspect of
> > C.  Hooray, freely call functions without declaring them, and
> > "obviously" they return int! Why not?
> 
> To be fair, most of your complaints can be fixed by enabling compiler
> warnings and by avoiding the use of de-facto-deprecated functions
> (strnlen).

The problem is that warnings don't work, because people ignore them.
Everybody knows warnings shouldn't be ignored, but let's face it, when
you make a 1-line code change and run make, and the output is 250 pages
long (large project, y'know), any warnings that are buried somewhere in
there won't even be noticed, much less acted on.

In this sense I agree with Walter that warnings are basically useless,
because they're not enforced. Either something is correct and compiles,
or it should be an error that stops compilation. Anything else, and you
start having people ignore warnings.

Yes, I know there's gcc -Werror and the analogous flags for the other
compilers, but in a sufficiently large project, -Werror is basically
impractical: too much of the legacy code will simply refuse to compile,
and it's not feasible to rewrite it or spend the time fixing it all.
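(One partial mitigation that does scale: gcc and clang can promote
*individual* warnings to errors, so at least the worst offenders stop
the build even where a blanket -Werror is impractical:

	gcc -Wall -Werror=implicit-function-declaration -c foo.c

That makes the undeclared-function-returning-int blunder from earlier
in this thread a hard error, without touching the rest of the legacy
warnings.)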

As for avoiding de-facto-deprecated functions, I've already said it:
*everybody* knows strcat is bad, and strcpy is bad, and so on and so
forth.  So how come I still see new C code being written almost every
day that continues to use these functions?  It's not that the coders
refuse to cooperate... I've seen a lot of code in my project where
people meticulously use strncpy instead of strcat / strcpy -- I presume
out of the awareness that they are "bad".  But when push comes to shove
and there's a looming deadline, all scruples are thrown to the winds and
people just take the path of least resistance.  The mere fact that
strcat and strcpy exist means that somebody, sometime, will use them,
usually with disastrous consequences.

And *that's* the fundamental problem with C (and, by the same principle,
C++): the correct way to write code is also a very onerous, fragile,
error-prone, and verbose way of writing code. The "obvious" and "easy"
way to write C code is almost always the wrong way.  The incentives are
all wrong, and so there's a big temptation for people to cut corners and
take the easy way out.

It's much easier to write this:

	int myfunc(context_t *ctx) {
		data_desc_t *desc = ctx->data;		/* ctx never checked */
		FILE *fp = fopen(desc->filename, "w");	/* fopen never checked */
		char *tmp = malloc(1000);		/* malloc never checked */
		strcpy(tmp, desc->data1);		/* may overflow 1000 bytes */
		fwrite(tmp, strlen(tmp), 1, fp);	/* result ignored */
		strcpy(tmp, desc->data2);		/* may overflow again */
		fwrite(tmp, strlen(tmp), 1, fp);
		strcpy(desc->cache, tmp);		/* cache size unknown */
		fclose(fp);				/* result ignored */
		free(tmp);
		return 0;
	}

rather than this:

	int myfunc(context_t *ctx) {
		data_desc_t *desc;
		FILE *fp;
		char *tmp;
		size_t bufsz;

		if (!ctx || !ctx->data) return INVALID_CONTEXT;
		desc = ctx->data;

		if (!desc->data1 || !desc->data2) return INVALID_ARGS;

		fp = fopen(desc->filename, "w");
		if (!fp) return CANT_OPEN_FILE;

		bufsz = desc->data1_len + desc->data2_len + 1;
		tmp = malloc(bufsz);
		if (!tmp) return OUT_OF_MEMORY;

		strncpy(tmp, desc->data1, bufsz);
		if (fwrite(tmp, strlen(tmp), 1, fp) != 1)
		{
			fclose(fp);
			unlink(desc->filename);
			free(tmp);	/* mustn't leak on the error path */
			return IO_ERROR;
		}

		strncpy(tmp, desc->data2, bufsz);
		if (fwrite(tmp, strlen(tmp), 1, fp) != 1)
		{
			fclose(fp);
			unlink(desc->filename);
			free(tmp);
			return IO_ERROR;
		}

		if (desc->cache)	/* cache is a pointer, so sizeof(desc->cache)
					   would only give the pointer size; assume a
					   (hypothetical) cache_len field holds its size */
			strncpy(desc->cache, tmp, desc->cache_len);

		if (fclose(fp) != 0)
		{
			WARN("I/O error");
			free(tmp);
			return IO_ERROR;
		}
		free(tmp);
		return OK;
	}

Most people would probably write something in between, which is neither
completely wrong, nor completely right. But it works for 90% of the
cases, and since it meets the deadline, it's "good enough".

Notice how much longer and more onerous it is to write the "correct"
version of the code than the easy way. A properly-designed language
ought to reverse the incentives: the default, "easy" way to write code
should be the "correct", safe, non-leaking way.  Potentially unsafe,
potentially resource-leaking behaviour should require work on the part
of the coder, so that he'd only do it when there's a good reason for it
(optimization, or writing @system code that needs to go outside the
confines of the default safe environment, etc.).

In this respect, D scores much better than C/C++.  Very often, the
"easy" way to write something in D is also the correct way. It may not
be the fastest way for the performance-obsessed premature-optimizing C
hacker crowd (and I include myself among them), but it won't leak
memory, overrun buffers, act on random stack values from uninitialized
local variables, etc. Your program is correct to begin with, which then
gives you a stable footing to start working on improving its
performance.  In C/C++, your program is most likely wrong to begin with,
so imagine what happens when you try to optimize that wrong code in
typical C/C++ hacker premature optimization fashion.
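To make that concrete, here's a rough D sketch of the same routine as
above (DataDesc is a made-up stand-in for data_desc_t; errors propagate
as exceptions, and the File struct closes itself):

	import std.stdio : File;

	struct DataDesc {
		string filename;
		string data1, data2;
		string cache;
	}

	void myfunc(DataDesc* desc)
	{
		auto fp = File(desc.filename, "w"); // throws if it can't open
		auto tmp = desc.data1 ~ desc.data2; // GC-managed; cannot overflow
		fp.write(tmp);                      // throws on I/O error
		desc.cache = tmp;
	} // fp is closed automatically when it goes out of scope

The "easy" version *is* the safe version: no buffer sizes to compute,
no cleanup to forget on each error path.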

(Nevermind the elephant in the room that 80-90% of the "optimizations"
C/C++ coders -- including myself -- have programmed into their finger
reflexes are actually irrelevant at best, because either compilers
already do those optimizations for you, or the hot spot simply isn't
where we'd like to believe it is; or outright de-optimizing at worst,
because we've successfully defeated the compiler's optimizer by writing
inscrutable code.)

Null dereference is one area where D does no better than C/C++, though
even in that case, language features like closures help alleviate much
of the kind of code that would otherwise need to deal with pointers
directly. (Yes, I'm aware C++ now has closures... but most of the C++
code out in the industry -- and C++ coders themselves -- have a *long*
way to go before they can catch up with the latest C++ standards. Until
then, it's lots of manual pointer manipulations that are ready to
explode in your face anytime.)
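(A tiny example of what I mean: where a C API takes a callback as a
function pointer plus a void* context -- two raw pointers to get wrong
-- a D delegate carries its context with it.  onTimeout here is a
made-up API, not anything in Phobos:

	void onTimeout(void delegate() cb) { cb(); }

	void example()
	{
		int retries = 0;
		onTimeout(() { ++retries; }); // the closure captures 'retries';
					      // no manual pointer juggling
	}

No stray void* to cast, nothing to dangle.)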


> The remaining problems theoretically shouldn't occur by disciplined
> use of commonly accepted C99 guidelines.  But I agree that even then
> and with the use of sanitizers writing correct C code remains very
> hard.

That's another fundamental problem with the C/C++ world: coding by
convention.  We all know all too well that *if* we'd only abide by
such-and-such coding guidelines and recommendations, our code would
actually stand a chance of being correct, safe, non-leaking, etc.
However, the problem with conventions is that they are just that:
conventions. They get broken all the time, with disastrous consequences.
I used to believe in convention -- after all, who wouldn't want to be a
goodie-two-shoes coder who abides by all the rules and takes pride in
his shiny, perfect code?  Unfortunately, after almost 20 years working
in the industry and seeing "enterprise" code that makes my eyes bleed,
I've lost all confidence that conventions are of any help.  I've seen
code written by supposedly "renowned" or "expert" C coders that
represents some of the most repulsive, stomach-turning examples of
antipatterns I've ever had the dubious pleasure of needing to debug.

D's stance of static verifiability and compile-time guarantees is an
often under-appreciated but big step in the right direction.  In the
long run, conventions will not solve anything; you need *enforcement*.
The compiler has to be able to prove, at compile time, that function X
is actually pure, or nothrow, or @safe, or whatever, for those things
to have any value whatsoever. And for this to be possible, the language
itself needs to have these notions built in, rather than have them
tacked on by an external tool (one that people will be reluctant to
use, or outright ignore, or that doesn't work with their strange build
system, target arch, or whatever).
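For instance, these attributes are checked, not taken on faith -- the
commented-out function below simply does not compile:

	@safe pure nothrow int sum(const(int)[] a)
	{
		int s = 0;
		foreach (x; a) s += x;
		return s;
	}

	/*
	@safe int bad(int* p)
	{
		return *(p + 1); // Error: pointer arithmetic not allowed in @safe
	}
	*/

If sum secretly did I/O, or threw, or scribbled through a raw pointer,
the compiler would reject it then and there.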

Sure, there are currently implementation bugs that make @safe not quite
so safe in some cases, and too much of Phobos is still
@safe-incompatible. But these are implementation-quality issues; the
concept itself is a sound and powerful one.  A compiler-verified
attribute is far more effective than any blind-faith trust in
convention ever will be: compare D's immutable with C++'s
easy-to-cast-away const, where we simply *trust* that people won't
attempt the cast. Yes, I'm aware of bugs in the current implementation
that allow you to bypass immutable, but again, that's a QoI issue.  And
yes, there are areas in the spec that have holes, etc.  But assuming
these QoI issues and spec holes / inconsistencies are
fixed, what we have is a powerful system that will actually deliver
compile-time guarantees about memory safety, rather than a system of
conventions that you can never be too sure that somebody somewhere
didn't break, and therefore you can only *hope* that it is memory-safe.
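(Concretely: C++'s const_cast compiles without complaint and leaves you
with undefined behaviour if you write through it; in D, @safe code
can't even attempt the equivalent cast:

	immutable int[] table = [1, 2, 3];

	@safe void tryMutate()
	{
		// table[0] = 42;             // Error: cannot modify immutable
		// auto p = cast(int[])table; // Error: casting away immutable
					      // is not allowed in @safe code
	}

The guarantee holds because the compiler checks it, not because we all
agreed to behave.)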


T

-- 
Life is too short to run proprietary software. -- Bdale Garbee

