D mentioned on Rust discussions site

H. S. Teoh hsteoh at quickfur.ath.cx
Sun May 24 06:08:42 UTC 2020


On Sat, May 23, 2020 at 09:05:47PM -0700, Walter Bright via Digitalmars-d wrote:
> On 5/23/2020 6:18 AM, Dibyendu Majumdar wrote:
> > > https://www.digitalmars.com/articles/C-biggest-mistake.html
> > 
> > I do not think it was a mistake at all. Treating pointers and arrays
> > uniformly is one of the great innovations in C.
> 
> The amount of buffer overflows enabled by it is legendary.

On Sat, May 23, 2020 at 09:15:18PM -0700, Walter Bright via Digitalmars-d wrote:
> On 5/23/2020 4:50 PM, Andrei Alexandrescu wrote:
> > This "no fat pointers" decision cascaded into another disastrous
> > choice - zero terminated strings. Worse than even autodecoding :o).
> 
> Ironically, a lot of C's speed advantage is lost in endlessly scanning
> strings to find their length.

Hear, hear.

Having worked for the past 25 years in the industry primarily with C
code, I have to say that the amount of C code with hidden potential
buffer overflows is staggering. In spite of a big push in the past 10 or
so years towards safer C coding practices, I still regularly come across
code that has potential buffer overflows hidden deep within a large
codebase, and freshly written code that *still* depends on
unfounded assumptions about array length -- because manually passing the
length is just too cumbersome, and people either don't bother with it,
or make mistakes while doing it (nothing like passing the wrong length
to screw up your day -- esp. when it gets missed by QA and proceeds to
explode in the customer's production environment -- or worse, it
*doesn't* get noticed by the customer until a hacker decides to exploit
it).

Not to mention the gigantic pain of writing said code in C in the first
place -- you constantly have to burn mental calories trying to keep
track of which pointer goes with which length, making sure to manually
check bounds, thinking thrice about every loop to prevent overruns, and
making endless calls to strlen(). I'm serious: I work with a huge
codebase of almost 2 million LOC, and strlen calls are everywhere,
including inside macros that nobody really takes the time to understand
but everybody sprinkles all over their code.  The complexity of managing
all of this is just beyond any normal person's capacity given the tight
deadlines we have to meet, so people just call strlen willy-nilly
without even thinking about the performance consequences.  And of
course, the huge bulk of code that expects null-terminated strings means
it's just not worth trying to encapsulate the length in any sort of way
-- you make yourself incompatible with just about everything else, and
end up spending endless effort converting to/from char* instead, which
is not any better.  So the problem perpetuates itself, and becomes an
accumulated cost that nobody can feasibly reduce without rewriting the
entire darned thing from the ground up. (And even then, all it takes is for
*someone* to start using char* again, and suddenly you're back to square
one. No sane C coder uses anything *but* char* for strings in "normal"
C code. Esp. code that needs to talk to other C libraries not written by
your team. It's inextricably ingrained in C culture.)
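
To make the strlen cost concrete, here is a minimal C sketch.  The
names counted_strlen, bytes_scanned, and the two count_spaces_*
functions are made up for illustration; they are not from any real
codebase.  The wrapper merely counts how many bytes strlen has to walk,
so the usual "strlen in the loop condition" pattern can be compared
against computing the length once.  An optimizing compiler can sometimes
hoist a plain strlen() call out of a loop by itself, so treat the
numbers as the worst case rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instrumentation: total bytes examined by counted_strlen(). */
static size_t bytes_scanned = 0;

/* strlen() wrapper that records how many bytes had to be walked
   (the string body plus the terminating NUL). */
static size_t counted_strlen(const char *s)
{
    size_t n = strlen(s);
    bytes_scanned += n + 1;
    return n;
}

/* Common pattern: the length is re-queried in the loop condition. */
static size_t count_spaces_rescan(const char *s)
{
    size_t spaces = 0;
    for (size_t i = 0; i < counted_strlen(s); i++)
        if (s[i] == ' ')
            spaces++;
    return spaces;
}

/* Same loop with the length computed once and kept next to the pointer. */
static size_t count_spaces_cached(const char *s)
{
    size_t spaces = 0;
    size_t len = counted_strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            spaces++;
    return spaces;
}

/* Usage: compare the scanning cost of the two variants. */
static void demo_scan_cost(void)
{
    const char *s = "a b c";            /* 5 characters plus the NUL */

    bytes_scanned = 0;
    assert(count_spaces_rescan(s) == 2);
    size_t rescan_cost = bytes_scanned; /* 6 strlen calls x 6 bytes = 36 */

    bytes_scanned = 0;
    assert(count_spaces_cached(s) == 2);
    size_t cached_cost = bytes_scanned; /* 1 strlen call x 6 bytes = 6 */

    assert(rescan_cost == 36 && cached_cost == 6);
}
```

For a string of n characters the first variant walks (n+1)^2 bytes and
the second walks n+1, which is the difference a stored length removes.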

And strlen isn't even the end of it; because of manual memory
management, everyone defensively strdup's every darned string that they
intend to keep, because there's just no other sane way of ensuring the
pointer won't get invalidated later (and no guarantee, thanks to default
mutability, that someone won't change the contents of the string down
the line and screw up the assumptions in your code). So in addition to
strlen, typical C code is plagued with endless calls to strdup or
equivalent, even for simple things like taking substrings.
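
As a rough illustration of the substring case, here is a C sketch that
contrasts the copy-based approach with a pointer-plus-length view.  The
struct str_view type and both helper functions are hypothetical, written
only for this example.  Note the catch: the view is safe only as long as
the underlying bytes stay alive and unmodified, which plain C gives you
no way to guarantee; that gap is exactly why people fall back on
copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer-plus-length view; it does not own the bytes. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Conventional approach: allocate and copy so the result is
   NUL-terminated.  Returns NULL on a bad range or allocation failure;
   the caller must free() the result. */
static char *substr_copy(const char *s, size_t start, size_t end)
{
    size_t len = strlen(s);            /* full scan just to validate */
    if (start > end || end > len)
        return NULL;
    char *out = malloc(end - start + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, end - start);
    out[end - start] = '\0';
    return out;
}

/* View approach: no scan, no allocation, only pointer arithmetic.
   An out-of-range request yields an empty view. */
static struct str_view substr_view(struct str_view v, size_t start, size_t end)
{
    struct str_view out = { v.ptr, 0 };
    if (start <= end && end <= v.len) {
        out.ptr = v.ptr + start;
        out.len = end - start;
    }
    return out;
}

/* Usage: take "world" out of "hello world" both ways. */
static void demo_substring(void)
{
    static const char text[] = "hello world";
    struct str_view whole = { text, sizeof text - 1 };

    char *copy = substr_copy(text, 6, 11);
    struct str_view view = substr_view(whole, 6, 11);

    assert(copy != NULL && strcmp(copy, "world") == 0);
    assert(view.len == 5 && memcmp(view.ptr, "world", 5) == 0);
    assert(view.ptr == text + 6);      /* shares storage with the original */
    free(copy);
}
```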

Some people complain about D strings being immutable(char)[], but I tell
you, encapsulated in that seemingly-trivial construct is a ton of
experience-backed insight about string management in C-like programming
paradigms that's worth careful study.  People like bashing D because of
the GC, but nobody talks about how *not* needing to call strlen() or
strdup() endlessly is such a performance booster, not to mention it
makes your code a LOT simpler and your APIs a lot cleaner to work with.

Consider just how much work has been poured into mitigating the
horrible consequences of "no fat pointers" and null-terminated strings.
Think about how much effort has gone into dealing with buffer overflow
bugs over the past 10 years: entire cottage industries have grown up
around developing tools for detecting and fixing these sorts of things,
and who knows how much money has been poured into cleaning up after the
countless security exploits enabled by said buffer overflows.  Against
all that, the laughable "benefits" of saving a couple of bytes by using
only lean pointers add up to nothing less than a colossal design mistake
in retrospect.  If you can even call them "benefits": think of all the
costly workarounds people have had to put up with.  Everyone invents
their own way of passing array length instead of having a standardized
API, and inevitably does it poorly or with costly slip-ups; and then
there's the memory cost of inventing, storing, and managing the data
structures needed to handle all of this.  I surmise that this in itself
already counteracts any meager savings one may have gained by avoiding
fat pointers. (Just think: in any persistent struct that carries
pointers to arrays, you need to store the length somehow anyway, so
you're essentially already paying for fat pointers, but with none of the
benefits of a standardized fat pointer type that's been tested to do it
correctly: you run the risk of human error at every turn.)
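
The point about persistent structs can be sketched in C as follows.  All
type and function names here (int_slice, record_adhoc, record_slice,
slice_get) are invented for the example.  One record keeps a lean
pointer plus a separately named count, the other embeds a shared
pointer-plus-length type; on typical ABIs they occupy the same space,
though the C standard does not strictly promise that, so take the size
check as an observation rather than a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared slice type: pointer and length always together. */
struct int_slice {
    int *ptr;
    size_t len;
};

/* Typical hand-rolled record: lean pointer plus a separate count. */
struct record_adhoc {
    int *samples;
    size_t sample_count;
};

/* The same record built on the shared slice type. */
struct record_slice {
    struct int_slice samples;
};

/* Bounds-checked read, written once for every user of the slice type.
   Returns false and leaves *out untouched if i is out of range. */
static bool slice_get(struct int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}

/* Usage: same storage cost, but the bounds check comes for free. */
static void demo_slice(void)
{
    int data[3] = { 10, 20, 30 };
    struct record_slice r = { { data, 3 } };
    int value = 0;

    /* Holds on common platforms; not guaranteed by the C standard. */
    assert(sizeof(struct record_adhoc) == sizeof(struct record_slice));

    assert(slice_get(r.samples, 2, &value) && value == 30);
    assert(!slice_get(r.samples, 3, &value)); /* one past the end rejected */
}
```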

D arrays being fat pointers is a HUGE step toward getting rid of the
nonsensical churn C's array-pointers lead to. And immutable(char)[] is a
huge saver in terms of performance.  The combination of the two is one
of D's strengths.


T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet. " -- swr

