Why static analysis is the way to go
H. S. Teoh
hsteoh at qfbox.info
Thu May 28 22:58:33 UTC 2026
On Thu, May 28, 2026 at 09:30:32PM +0000, monkyyy via Digitalmars-d wrote:
> On Thursday, 28 May 2026 at 19:41:27 UTC, H. S. Teoh wrote:
[...]
> > https://www.zdnet.com/article/rust-will-save-linux-from-ai-says-greg-kroah-hartman/
[...]
> > D made a lot of right choices in this area: statically-verifiable
> > const, compiler-enforced nothrow, pure, etc., arrays that always
> > carry length and out-of-bounds dereference causing a runtime exception
> > instead of overwriting arbitrary memory, GC eliminating an entire
> > class of pointer bugs, etc. These make D a huge pleasure to work
> > with, as opposed to the constant stream of pointer bugs, memory
> > leaks, and programming-by-convention that has been proven to be
> > ineffective decades ago, that you have to put up with when working
> > in C.
[...]
> 99.99% of the effect is API and type theory: slices being built in and
> foreach being overloadable with ranges, ranges being there

Slices/arrays carrying length wherever they go is a HUGE benefit in
retrospect. I remember the early days of arguments about "wasting"
precious bytes by passing array lengths around all the time where "a
simple pointer would do the job". And also arguments about "wasting"
precious CPU cycles performing bounds checks. That was so shortsighted.
:-D Moore's Law ensured such arguments would become irrelevant.
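
To make the point about length-carrying slices concrete, here is a minimal sketch. The names total and tryGet are made up for illustration, and it assumes bounds checks are enabled (as in a default, non-release build). Catching RangeError is done here only to demonstrate the check; real code would normally let it terminate the program.

```d
import core.exception : RangeError;
import std.stdio : writeln;

// The slice parameter carries its own length; no separate count argument.
int total(const(int)[] xs)
{
    int s = 0;
    foreach (x; xs)
        s += x;
    return s;
}

// Returns true and stores xs[i] in value if i is in bounds. An out-of-bounds
// index trips the runtime check, which throws RangeError (assuming bounds
// checks are enabled) instead of reading arbitrary memory.
bool tryGet(const(int)[] xs, size_t i, out int value)
{
    try
    {
        value = xs[i];
        return true;
    }
    catch (RangeError)
    {
        return false;
    }
}

void main()
{
    int[] data = [10, 20, 30];
    writeln(total(data));         // whole array
    writeln(total(data[1 .. $])); // a sub-slice knows its own length too

    int v;
    writeln(tryGet(data, 1, v));  // in bounds
    writeln(tryGet(data, 3, v));  // one past the end: caught by the check
}
```
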
The whole concept behind ranges, while seemingly simple in retrospect,
neatly abstracts the idea of iteration in a far-reaching scope that
eliminates 90% of the for-loops I write. Loop conditions are
notoriously hard to get right, and humans are bad at repetitive tasks
like writing iteration from 0 to N. Or was it 1 to N-1? Or maybe 1 to
N? Or 0 to N-1? Abstracting this away eliminates an entire subclass of
off-by-1 errors, in addition to opening up new avenues of expressivity.
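
As a small illustration of a loop disappearing, here is a sketch using Phobos's iota, filter, map, and sum. The function name sumOfEvenSquares is hypothetical; the point is only that the loop bounds are stated once, in iota, and nowhere else.

```d
import std.algorithm : filter, map, sum;
import std.range : iota;
import std.stdio : writeln;

// Sum of the squares of the even numbers in [0, n).
// No loop counter, no loop condition to get wrong.
int sumOfEvenSquares(int n)
{
    return iota(0, n)
        .filter!(i => i % 2 == 0)
        .map!(i => i * i)
        .sum;
}

void main()
{
    writeln(sumOfEvenSquares(10)); // 0 + 4 + 16 + 36 + 64
}
```
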
Every time I write C I miss being able to write a UFCS chain to massage
my data into a printable form -- in C it's a painful diversion from what
you really want to be focusing on -- finding the bug -- and writing a
separate function just to dump data in a human-readable way. In D, it's
2 seconds writing a UFCS chain and you go back to debugging immediately.
Huge difference.
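
Here is roughly what such a throwaway debugging chain can look like. The Point type and the dump function are invented for this sketch; only map, join, and format are actual Phobos functions.

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.stdio : writeln;

// Hypothetical data type standing in for whatever is being debugged.
struct Point
{
    int x, y;
}

// Massage the data into a printable form with a single UFCS chain.
string dump(const(Point)[] pts)
{
    return pts.map!(p => format("(%d,%d)", p.x, p.y)).join(" ");
}

void main()
{
    auto pts = [Point(1, 2), Point(3, 4)];
    writeln(dump(pts));
}
```
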
And then there's the decision to use GC: the haters will hate, but
having a GC at my disposal simplified like 99% of my APIs. Clean APIs
lead to fewer bugs, more composability, and more reusability. Recently I
had to implement a new API in C, and it was painful. At every turn you
have to worry about whether/how buffers would be passed, who would be
responsible for allocation, who for deallocation, what to do in case of
errors, ad nauseam. Should I return a pointer to a static string
buffer? Should I allocate and hope my caller doesn't forget to free?
Should I ask instead for a buffer from the caller to fill? So much
mental effort is expended on such peripheral yet important questions.
And then afterwards you discover to your chagrin that you've actually
managed to FORGET about one important detail, and your program just
crashed because of a dangling pointer. Again. In D, I can just
allocate and return, or return a pointer to a static buffer, or whatever
-- IT DOESN'T CHANGE THE API. That's huge. I can totally retrofit my
function with a new allocation mechanism and the callers don't even have
to know -- it will JUST WORK. The haters can hate, but I still love my
GC.
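
A minimal sketch of that last point, with a made-up function describe: one branch returns a string literal that lives in static storage, the other returns a freshly GC-allocated string, and the signature is the same either way. The caller never has to know which one it got, and never has to free anything.

```d
import std.conv : to;
import std.stdio : writeln;

// Same signature regardless of where the returned memory comes from.
string describe(int code)
{
    if (code == 0)
        return "ok";                  // string literal, static storage
    return "error " ~ code.to!string; // concatenation allocates on the GC heap
}

void main()
{
    writeln(describe(0));
    writeln(describe(42));
}
```
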
> This is not static analysis, I don't use any static analysis keywords
> that would cause any of it to be inside my code, yet I'm not running
> into C-like segfaults every time I write string code

You *do* realize that every time you write string code in D, you're
using static analysis, right? ;-)

Remember, string = immutable(char)[]. That "tiny" decision to stick
"immutable" on it is what makes D strings so straightforward to use.
The fact that immutable is compile-time enforced means nobody can
sneakily mutate stuff behind your back and ruin your algorithms; your
code can count on the string staying the way it is and not suddenly
mutating into something else, breaking your assumptions, and causing
your code to crash. Without that, your string APIs would be full of
problems with aliasing and functions trampling over each other's
strings, leading to a massive mess like in C, where, when reading string
manipulation code, the back of your mind is constantly wondering, is
this going to overrun the buffer? Is the buffer being aliased? Why is
this code modifying what's supposed to be an immutable identifier? Will
this affect some code out there that assumes strings don't change? Why
does this function take non-const? Now I have to write yet another cast.
Cross your fingers that some idiot in the future (namely, yours truly 2
months later) doesn't start mutating it and causing a crash in an
unrelated module. And then out of paranoia you start strcpy'ing your
strings everywhere, and Schlemiel starts showing up in your performance
benchmarks.
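
A small sketch of what the compiler is doing for you here. The static assert checks that writing through a string does not even compile; the shout function (a name invented for this example) shows that mutation requires an explicit private copy via .dup.

```d
import std.stdio : writeln;

// string is an alias for immutable(char)[], so writing through it
// is rejected at compile time.
static assert(!__traits(compiles, { string s = "abc"; s[0] = 'x'; }));

// To mutate, you have to ask for a private copy explicitly.
char[] shout(string s)
{
    char[] copy = s.dup;
    foreach (ref c; copy)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
    return copy;
}

void main()
{
    string name = "hello";
    string head = name[0 .. 4]; // shares memory with name; safe, nobody can write to it
    writeln(shout(name));       // the uppercased copy
    writeln(name);              // the original is untouched
    writeln(head);
}
```
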
Remember back in the day when somebody had the bright idea to make AA's
more "user-friendly" by allowing const or even mutable arrays as keys?
Yeah, the aftermath was NOT pretty. Took us at least a couple of years
to clean up the mess amid user complaints like "why did my key suddenly
disappear from my AA?" and "this key is obviously in the AA why isn't it
finding it?!".

> Hot take: 70% of my avoidance of segfaults comes from foreach
> being able to use ranges alone; if Phobos wasn't there but the way I
> made data structures was front, pop, empty, I would not make segfaults.

Hindsight is always 20/20 as they say. Before Columbus made the egg
stand upright nobody thought it was possible; afterwards they dismissed
it with "well of course that's obviously how you do it!". Now that
we're used to range-based UFCS chains, it seems to be the most obvious
way of thinking about your algorithms. Whereas before, the big picture
was drowned out by the nitty-gritty of initializing and bumping your
loop variable and writing the right loop conditions so that you exit at
the right time without off-by-1 errors. The *idea* of ranges elevated
our thinking to a whole 'nother level, so much so that it doesn't even
make any sense to go back to thinking about loop counter bumping anymore.
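
For reference, the whole range idea boils down to three primitives. Here is a minimal sketch of a hand-rolled input range (the Countdown type is made up for illustration) that works with foreach and with Phobos algorithms without any loop-counter code at the call site.

```d
import std.algorithm : map, sum;
import std.array : array;
import std.stdio : writeln;

// A hypothetical input range that counts down from a start value to 1.
struct Countdown
{
    int current;

    @property bool empty() const { return current <= 0; }
    @property int front() const { return current; }
    void popFront() { --current; }
}

void main()
{
    // foreach understands empty/front/popFront directly.
    foreach (n; Countdown(3))
        writeln(n);

    // The same type plugs into range algorithms.
    writeln(Countdown(4).map!(n => n * n).sum); // 16 + 9 + 4 + 1
    writeln(Countdown(3).array);
}
```
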
Besides, segfaults are only a tiny part of what D eliminates by design.
D's GC, complain about it as anyone might, has nevertheless saved me
hours, no, days and weeks, of chasing down pointer bugs and
use-after-free errors. It eliminated an entire class of bugs from my
code, not to mention made my APIs cleaner so callers are less likely to
use them wrongly. And it removed the huge mental load of having to worry
about memory management issues all the danged time, which you have to do
every time you write non-trivial C code, and freed up mental resources
to actually, y'know, focus on the problem you're supposed to be solving
with your code instead of micromanaging memory management nitty-gritty.
T
--
Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy