More radical ideas about gc and reference counting

Manu via Digitalmars-d digitalmars-d at puremagic.com
Fri May 9 23:27:22 PDT 2014


On 10 May 2014 07:05, Wyatt via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On Friday, 9 May 2014 at 16:12:00 UTC, Manu via Digitalmars-d wrote:
>
> I've been digging into research on the subject while I wait for test scripts
> to run, and my gut feeling is it's definitely possible to get GC at least
> into striking distance, but I'm not nearly an expert on this area.
>
> (Some of these are dead clever, though! I just read this one today:
> https://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/local-gc.pdf)

Well, that's a nice paper. But I've seen heaps of people paste heaps
of papers, and nobody EVER comes along and says "this, we could do
this in D".
I'm happy to be proven wrong; in fact, there's nothing I'd like more.
I'm not an expert on GC (and I don't really want to be), but I've been
trawling this forum for years and the conversation emerges regularly.
As far as I can tell, there is a de facto agreement that none of those
potentially awesome GCs are actually feasible in the context of D.

This is perhaps my biggest mistake in my commitment to D; I came along
and identified it as a red flag on day 1, but I saw there was lots of
discussion and activity on the matter, so I assumed I shouldn't worry
about it, that it would sort itself out...
Years later, still, nobody seems to have any idea what to do, at
least not such that it would be addressed in a manner acceptable to my
work.
The only option I know that works is Obj-C's solution, as demonstrated
by a very successful embedded RTOS, which, compared to the
competition, runs silky smooth. Indeed, iOS makes it a specific design
goal that the system should always feel silky smooth, never stuttery;
they consider it a point of quality, and I completely agree. I don't
know what other horse to back.

>> I don't know how to reconcile the problem with the existing GC,
>> and I am not happy to sacrifice large parts of the language for
>> it.  I've made the argument before that sacrificing large parts
>> of the language as a 'work-around' is, in essence, sacrificing
>> practically all libraries. That is a truly absurd notion; to
>> suggest that anybody should take advice to sacrifice access to
>> libraries is being unrealistic.
>>
> This is important, and simply throwing up our collective hands and saying to
> just not use major language features (I believe I recall slices were in that
> list?) really doesn't sit well with me either.
>
> But conversely, Manu, something has been bothering me: aren't you restricted
> from using most libraries anyway, even in C++?

No, and this is where everyone seems to completely lose the point.
Just because high performance/realtime code has time-critical parts,
that doesn't mean those parts are much of your code. They are small by
volume, and receive disproportionate attention from coders. That's
fine; forget about that bit, except that it needs to be able to run
uninterrupted.
_Most_ of your code is ancillary logic and glue, which typically runs
in response to events, and even though its execution frequency is
super-low, it's still often triggered in the realtime thread (just not
very often).
There are also many background threads you employ to do low priority
tasks where the results aren't an immediate requirement.
Some of these tasks include: resource management, loading and
preparing data, communications/networking, processing low-frequency
work; almost all of these tasks make heavy use of 3rd party libraries,
and allocate.

You can't have an allocation stop the world, because it stops the
realtime threads too, at least under any mythical GC scheme I'm aware
of that's been proposed as a potential option for D.

> "Decent" or "acceptable"
> performance isn't anywhere near "maximum", so shouldn't any library code
> that allocates in any language be equally suspect?  So from that standpoint,
> isn't any library you use in any language going to _also_ be tuned for
> performance in the hot path?  Maybe I'm barking up the wrong tree, but I
> don't recall seeing this point addressed.

A library which is a foundation of a realtime system will employ
realtime practices. Those are not the libraries I'm worried about.
Most libraries that are useful aren't those libraries. They are tool
libs, and they are typically written to be simple and maintainable,
and usually by a PC developer, with no real consideration towards
specific target applications.

> More generally, I feel like we're collectively missing some important
> context:  What are you _doing_ in your 16.6ms timeslice?  I know _I'd_
> appreciate a real example of what you're dealing with without any hyperbole.

It doesn't matter what I'm doing in my 16ms timeslice most of the
time. I'm running background threads, and also triggering occasional
low frequency events in the realtime thread.
Again, most code by volume is logic and glue; it is not typically
serviced as intensively as the core realtime systems, and is most
often written by the junior programmers...
I appreciate that I haven't successfully articulated the function of
this code, but that is because to describe "what I'm doing" would be
to give you a million lines of code to nit-pick through. Almost
anything you can imagine is the answer, as long as it's reasonably
short such that it's not worth the synchronisation cost of queueing
it with a parallel job manager or whatever.
This logic and glue needs access to all the conveniences of the
language for productivity and maintainability reasons, and typically,
if you execute only one or two of these bits of code per frame, it
will have no meaningful impact on performance... unless it allocates,
triggers a collect, and freezes the system. I repeat: the juniors...
D has lots of safety features to save programmers from themselves, and
I don't consider it a workable option, or a goal for the language, to
suggest we should abandon them.

ARC overhead would have no meaningful impact on performance; GC may
potentially freeze execution. I am certain I would never notice ARC
overhead in a profiler, and if I did, there are very simple ways to
shift it elsewhere in the few specific circumstances where it emerges.

> What actually _must_ be done in that timeframe?  Why must collection run
> inside that window?  What must be collected when it runs in that situation?
> (Serious questions.)

Anything can and does happen in low-frequency event logic. Collection
'must' run in that window in the event an allocation occurs and there
is no free memory, which is the likely scenario.
Strings? Closures? Array initialisations were a problem (I'm not sure
if that was considered a bug and fixed though?). Even some should-be
stack allocations are heap-allocated when the compiler thinks it's a
requirement for safety.
String interaction with C libs is a good source of allocations, but
there are many.
Or even small transient allocations: temps used to do small amounts of
work which would otherwise be released on scope exit. Banning that
sort of practice throws a massive spanner into conventional software
engineering practice that everyone is familiar with.

> See, in the final-by-default discussions, you clearly explained the issues
> and related them well to concerns that are felt broadly, but this... yeah, I
> don't really have any context for this, when D would already be much faster
> than the thirty years of C navel lint (K&R flavour!) that I grapple in my
> day job.

I appreciate your argument. I realise it's why I've had so little
effect... I just can't easily articulate specific instances, because
they are basically unknowable until they happen, and they're hidden by
literally millions of lines of code. Implicit allocations appear all
the time, and if you deliberately ban them, you quickly find all those
language features don't work and your code gets a lot more difficult
to write and maintain. It's a practicality, risk-aversion, and
maintenance issue. If you had to tag main() with @nogc, and then write
a millions-of-loc program in a team of 60, D becomes a much less
compelling argument than otherwise. D has two major advantages over
C++ as I see it: 1, meta, and that's great, but you don't sit and
write meta all the time; 2, productivity and correctness, which I find
to be the more compelling case for adopting D. It affects all
programmers all day, every day, and we lose many aspects of the
language's offering if we tag @nogc, which includes libs.

