D is nice whats really wrong with gc??

H. S. Teoh hsteoh at qfbox.info
Fri Dec 22 22:33:35 UTC 2023


On Fri, Dec 22, 2023 at 09:40:03PM +0000, bomat via Digitalmars-d-learn wrote:
> On Friday, 22 December 2023 at 16:51:11 UTC, bachmeier wrote:
> > Given how fast computers are today, the folks that focus on memory
> > and optimizing for performance might want to apply for jobs as
> > flooring inspectors, because they're often solving problems from the
> > 1990s.
> 
> *Generally* speaking, I disagree. Think of the case of GTA V where
> several *minutes* of loading time were burned just because they
> botched the implementation of a JSON parser.

IMNSHO, if I had very large data files to load, I wouldn't use JSON.
Precompile the data into a more compact binary form that's already ready
to use, and just mmap() it at runtime.
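
A minimal sketch of that idea in Python, assuming a made-up record layout (a uint32 id plus two float32 fields) and a hypothetical 4-byte count header; neither comes from any real game or file format. The build step packs the records once, and the runtime step maps the file and decodes only the record it needs, with no text parsing at all:

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: uint32 id, float32 x, float32 y (little-endian).
RECORD = struct.Struct("<Iff")

def precompile(records, path):
    """Build step: write the records as packed binary, once, ahead of time."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(records)))  # header: record count
        for rec in records:
            f.write(RECORD.pack(*rec))

def load_record(path, index):
    """Runtime: map the file and decode only the record that is asked for."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (count,) = struct.unpack_from("<I", m, 0)
            if not 0 <= index < count:
                raise IndexError(index)
            # Records start right after the 4-byte header.
            return RECORD.unpack_from(m, 4 + index * RECORD.size)

path = os.path.join(tempfile.mkdtemp(), "data.bin")
precompile([(1, 0.5, 1.5), (2, 2.0, -3.0)], path)
second = load_record(path, 1)
```

Because every record has the same size, finding record i is plain offset arithmetic, and the OS pages in only the parts of the file that are actually touched.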


> Of course, this was unrelated to memory management. But it goes to
> show that today's hardware being super fast doesn't absolve you from
> knowing what you're doing... or at least question your implementation
> once you notice that it's slow.

My favorite example in this area is the poor selection of algorithms, a
very common mistake being choosing an O(n²) algorithm because it's
easier to implement than the equivalent O(n) algorithm, and not very
noticeable on small inputs. But on large inputs it slows to an unusable
crawl. "But I wrote it in C, why isn't it fast?!" Because O(n²) is
O(n²), and that's independent of language. Given large enough input, an
O(n) Java program will beat the heck out of an O(n²) C program.
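
To make that concrete, here is an illustrative pair (my own toy example, not taken from any real codebase): two ways to check a list for duplicates. The pairwise version is the one that is "easier to write"; the set-based one is O(n) in expected time, assuming the items are hashable.

```python
def has_duplicate_quadratic(items):
    """O(n^2): compare every pair of elements."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    """O(n) expected time: a single pass with a hash set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

On 100,000 distinct items the pairwise version performs n(n-1)/2, about 5 billion, comparisons, while the set version performs 100,000 lookups. No choice of language closes a gap of that size.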


> But that is true for any language, obviously.
>
> I think there is a big danger of people programming in C/C++ and
> thinking that it *must* be performing well just because it's C/C++.
> The C++ codebase I have to maintain in my day job is a really bad
> example for that as well.

"Elegant or ugly code as well as fine or rude sentences have something
in common: they don't depend on the language." -- Luca De Vitis

:-)


> > I say this as I'm in the midst of porting C code to D. The biggest
> > change by far is deleting line after line of manual memory
> > management.  Changing anything in that codebase would be miserable.
> 
> I actually hate C with a passion.

Me too. :-D


> I have to be fair though: What you describe doesn't sound like a
> problem of the codebase being C, but the codebase being crap. :)

Yeah, I've seen my fair share of crap C and C++ codebases. C code that
makes you do a double take and stare real hard at the screen to
ascertain whether it's actually C and not some jokelang or exolang
purposely designed to be unreadable/unmaintainable. (Or maybe it would
qualify as an IOCCC entry. :-D)  And C++ code that looks like ... I
dunno what.  When business logic is being executed inside of a dtor, you
*know* that your codebase has Problems(tm), real big ones at that.



> If you have to delete "line after line" of manual memory management, I
> assume you're dealing with micro-allocations on the heap - which are
> performance poison in any language.

Depends on what you're dealing with.  Some micro-allocations are totally
avoidable, but if you're manipulating a complex object graph composed of
nodes of diverse types, it's hard to avoid. At least, not without
uglifying your APIs significantly and introducing long-term
maintainability issues.  One of my favorite GC "lightbulb" moments is
when I realized that having a GC allowed me to simplify my internal APIs
significantly, resulting in much cleaner code that's easy to debug and
easy to maintain. Whereas the equivalent bit of code in the original C++
codebase would have required disproportionate amounts of effort just to
navigate the complex allocation requirements.
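
A small sketch of what that looks like, with a hypothetical `Node` class and `build_graph` helper invented for illustration. Note that CPython's collector (reference counting plus a cycle detector) works differently from D's GC; the point is only the shape of the API. A node can be shared by several parents and take part in a cycle, and nobody has to decide who "owns" it or when to free it:

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.links = []   # may point at any other node, cycles included

def build_graph():
    a, b, c = Node("a"), Node("b"), Node("c")
    a.links += [b, c]
    b.links.append(c)     # c is shared by a and b: no single owner
    c.links.append(a)     # cycle back to a
    return a              # callers just get a node; no ownership contract

root = build_graph()
probe = weakref.ref(root.links[1])   # observe node c without keeping it alive
root = None                          # drop the only external reference
gc.collect()                         # the collector reclaims the whole cycle
collected = probe() is None
```

With manual management, the shared node and the back-edge would each need an explicit ownership rule, and that rule would leak into every API that hands out nodes.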

These days my motto is: use the GC by default; when it becomes a
problem, switch to a more manual memory management scheme, but *only
where the bottleneck is* (as proven by an actual profiler, not where you
"know" (i.e., imagine) it is).  A lot of C/C++ folk (and I speak from my
own experience as one of them) spend far too much time and energy
optimizing things that don't need to be optimized, because they are
nowhere near the bottleneck, resulting in lots of sunk cost and added
maintenance burden with no meaningful benefit.
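
As a rough illustration of "proven by an actual profiler", here is a sketch using Python's standard `cProfile` and `pstats` modules. The workload (`parse`, `count_inversions`, `run`) is made up for the example; what matters is that the profiler, not intuition, names the hot spot.

```python
import cProfile
import pstats

def parse(lines):
    # Cheap: linear in the input size.
    return [int(s) for s in lines]

def count_inversions(values):
    # Deliberately quadratic: this is where the time actually goes.
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] > values[j]:
                count += 1
    return count

def run():
    values = parse([str(n) for n in range(600, 0, -1)])
    return count_inversions(values)

profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function name) to
# (primitive calls, total calls, tottime, cumtime, callers).
hottest = max(stats.stats.items(), key=lambda kv: kv[1][2])[0][2]
```

Calling `stats.sort_stats("tottime").print_stats(5)` afterwards prints the usual table of the top five entries. Only the function that tops that list is worth hand-optimizing; the rest can stay simple.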


[...]
> Of course, this directly leads to the favorite argument of C
> defenders, which I absolutely hate: "Why, it's not a problem if you're
> doing it *right*."
> 
> By this logic, you have to do all these terrible mistakes while
> learning your terrible language, and then you'll be a good programmer
> and can actually be trusted with writing production software - after
> like, what, 20 years of shooting yourself in the foot and learning
> everything the hard way?  :) And even then, the slightest slipup will
> give you dramatic vulnerabilities.  Such a great concept.

Year after year I see reports of security vulnerabilities, the most
common of which are buffer overflows, use-after-free, and double-free.
All of which are caused directly by using a language that forces you to
manage memory manually.  If C were only 10 years old, I might concede
that C coders are just inexperienced; give them enough time to learn
from field experience and the situation will improve. But after 50
years, the stream of memory-related security vulnerabilities still
hasn't ebbed.  I think it's beyond dispute that even the best C coders
make mistakes -- because memory management is HARD, and using a language
that gives you no help whatsoever in this department is just inviting
trouble. I've personally seen the best C coders commit blunders, and in
C, all it takes is *one* blunder among millions of lines of code that
manage memory, and you have a glaring security hole.

It's high time people stepped back to think hard about why this is
happening, and why 50 years of industry experience and hard-earned best
practices have not improved things.

And also think hard about why you'd eschew the GC when it could
single-handedly remove this entire category of bugs from your program in
one fell swoop.

(Now, just below memory-related security bugs are data sanitization bugs.
Unfortunately the choice of language isn't going to help you very much
there...)


T

-- 
In theory, software is implemented according to the design that has been carefully worked out beforehand. In practice, design documents are written after the fact to describe the sorry mess that has gone on before.

