Fantastic exchange from DConf

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue May 9 09:55:54 PDT 2017


On Tue, May 09, 2017 at 08:18:09AM +0000, Patrick Schluter via Digitalmars-d wrote:
> On Tuesday, 9 May 2017 at 06:15:12 UTC, H. S. Teoh wrote:
> > 
> > 	int my_func(mytype_t *input, outbuf_t *output_buf,
> > 	            char *buffer, int size)
> > 	{
> > 		/* Typical lazy way of null-checking (that will blow up
> > 		 * later) */
> > 		myhandle_t *h = input ? input->handle : 0;
> > 		writer_t *w = output_buf ? output_buf->writer : 0;
> > 		char *block = (char *)malloc(size);
> 
> Hey, you've been outed as a C++ programmer. A real C programmer never
> casts a void *. In that specific case, casting away the malloc()
> return can mask a nasty bug. If you have forgotten to include the
> header declaring the function, the compiler would assume an int
> returning function and the cast would suppress the righteous warning
> message of the compiler. On 64 bit machines the returned pointer would
> be truncated to the lower half.  Unfortunately on Linux, as the heap
> starts in the lower 4 GiB of address space, the code would run for a
> long time before it crashed. On Solaris-SPARC it would crash directly
> as binaries are loaded address 0x1_0000_0000 of the address space.

Ouch.  Haha, even I forgot about this particularly lovely aspect of C.
Hooray, freely call functions without declaring them, and "obviously"
they return int! Why not?

There's an even more pernicious version of this, in that the compiler
blindly believes whatever you declare a symbol to be, and the
declaration doesn't even have to be in a .h file or anything even
remotely related to the real definition. Here's a (greatly) reduced
example (paraphrased from an actual bug I discovered):

	module.c:
	-------
	int get_passwd(char *buf, int size);
	int func() {
		char passwd[100];
		if (!get_passwd(buf, 100)) return -1;
		do_something(passwd);
	}

	passwd.c:
	---------
	void get_passwd(struct user_db *db, struct login_record *rec) {
		... // stuff
	}

	old_passwd.c:
	-------------
	/* Please don't use this code anymore, it's deprecated. */
	/* ^^^^ gratuitous useless comment */
	int get_passwd(char *buf, int size) { ... /* old code */ }

Originally, in the makefile, module.o is linked with libutil.so, which
in turn is built from old_passwd.o and a bunch of other stuff. Later on,
passwd.o was added to libotherutil.so, which was listed after libutil.so
in the linker command, so the symbol conflict was masked because the
linker found the libutil.so version of get_passwd first.

Then one day, somebody changed the order of libraries in the makefile,
and suddenly func() mysteriously starts malfunctioning because
get_passwd now links to the wrong version of the function!

Worse yet, the makefile was written to be "smart", as in, it uses
wildcards to pick up .so files (y'know, us lazy programmers don't wanna
have to manually type out the name of every library). So when somebody
tried to fix this bug by removing old_passwd.o from libotherutil.so
altogether, the bug was still happening in other developers' machines,
because a stale copy of the old version of libotherutil.so was still
left in their source tree, so when *they* built the executable, it
contains the bug, but the bug vanishes when built from a fresh checkout.
Who knows how many hours were wasted chasing after this heisenbug.


[...]
> > 		if (w->buffered) { /* oops, possible null deref */
> > 			strncpy(buffer, input->data, size); /* oops, unterminated string */
> > 			if (w->write(buffer, size) != 0)
> > 				/* hmm, is 0 the error status, or is it -1? */
> > 				/* also, what if w->write == null? */
> 
> Or is it inspired by fwrite, which returns the number of written
> records? In that case 0 return might be an error or not, depends on
> size.

Yep, fwrite has an utterly lovely interface. The epitome of API design.
:-D


[...]
> > 		if (fp) fclose(fp);	/* oops, uninitialized ptr deref */
> 
> Worse, you didn't check the return of fclose() on writing FILE.
> fclose() can fail if the disk was full. As the FILE is buffered, the
> last fwrite might not have flushed it yet. So it is the fclose() that
> will try to write the last block and that can fail, but the app
> wouldn't be able to even report it.
[...]

Haha, you're right.  NONE of the code I've ever dealt with even
considers this case. None at all.  In fact, I don't even remember the
last time I've seen C code that bothers checking the return value of
fclose().  Maybe I've written it *once* in my lifetime when I was young
and naïve, and actually bothered to notice the documentation that
fclose() may sometimes fail. Even the static analysis tool we're using
doesn't report it!!

So again Walter was spot on: fill up the disk to 99% full, and 99% of C
programs would start malfunctioning and showing all kinds of odd
behaviours, because they never check the return code of printf, fprintf,
or fclose, or any of a whole bunch of other syscalls that are regularly
*assumed* to just work, when in reality they *can* fail.

The worst part of all this is, this kind of C code is prevalent
everywhere in C projects, including those intended for supposedly
security-aware software.  Basically, the language itself is just so
unfriendly to safe coding practices that it's nigh impossible to write
safe code in it.  It's *theoretically* possible, certainly, but in
practice nobody writes C code that way.  It is a scary thought indeeed,
how much of our current infrastructure relies on software running this
kind of code.  Something's gotta give, eventually. And it ain't gonna be
pretty when it all starts crumbling down.


T

-- 
Caffeine underflow. Brain dumped.


More information about the Digitalmars-d mailing list