Either I'm confused or the gc is

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Oct 23 19:14:15 UTC 2020


On Fri, Oct 23, 2020 at 06:31:41PM +0000, donallen via Digitalmars-d wrote:
[...]
> Some food for thought:
> 
> If I insert a writeln at the very beginning of walk_account_tree,
> printing, say, the name and guid of the account, the problem goes away
> -- the program behaves correctly.

Curious.  IME, this is usually the sign of memory corruption elsewhere
in the code, and it's just pure happenstance that changing the
allocation pattern causes the corruption to overwrite something else
that doesn't lead to observable symptoms.

I wonder what might happen if instead of inserting a writeln, you
inserted a dummy allocation? Like:

	import core.stdc.stdio : printf;
	int[] dummy = [ 1, 2, 3 ];
	printf("%zu\n", dummy.length); // size_t needs %zu; don't let optimizer elide it
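
If the dummy allocation does change the behavior, a related experiment
is to take GC timing out of the picture by forcing or suppressing
collections around the suspect section.  Here's a minimal sketch that
assumes nothing about your actual code: suspectSection is a hypothetical
stand-in for walk_account_tree, and only GC.collect, GC.disable and
GC.enable (from core.memory) are real API.  Note that GC.disable only
suppresses automatic collections; the runtime may still collect if it
runs out of memory.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

// Hypothetical stand-in for the suspect function; the real
// walk_account_tree is not shown in this thread.
size_t suspectSection(bool forceCollect)
{
    if (forceCollect)
        GC.collect();              // run a full collection at a known point

    int[] dummy = [ 1, 2, 3 ];     // GC allocation, as in the snippet above
    printf("%zu\n", dummy.length); // don't let optimizer elide it
    return dummy.length;
}

void main()
{
    suspectSection(true);          // collection forced just before the allocation

    GC.disable();                  // suppress automatic collections...
    scope (exit) GC.enable();
    suspectSection(false);         // ...while this section runs
}
```

If the symptoms were to vanish with collections disabled and come back
with a forced collection, that would suggest the GC is freeing something
it can't see a reference to, rather than the allocation pattern itself
being at fault.  That's only a guess until there's a test case, though.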


> If I change
> children[i].path = format("%s:%s", account.path, children[i].name);
> to
> children[i].path = format("%s:%s", account.path, children[i].name).dup;
> 
> in the loop that fills in the child Account structs in children, same
> thing -- correct behavior. The dup of the format should not be
> necessary, since format returns a new GC-allocated string.

Indeed.  So again it looks like some kind of memory corruption.  Only,
without a reproducible test case we're just shooting in the dark here as
to what exactly is causing it.  Have you looked into DustMite?  Once you
set it up, it's fully automated, and you can leave it running in the
background until it has reduced the code to a minimal(-ish) test case,
so it won't consume too much of your time.
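
On the point that the .dup should be unnecessary: here's a small sketch
of why, assuming nothing beyond std.format.format itself.  makePath is a
hypothetical helper that mirrors the path construction you quoted, and
the account names are made up for illustration.

```d
import std.format : format;

// Hypothetical helper mirroring the path construction quoted above.
string makePath(string parent, string name)
{
    return format("%s:%s", parent, name);
}

void main()
{
    string a = makePath("Assets", "Bank");
    string b = makePath("Assets", "Bank");

    assert(a == "Assets:Bank");
    assert(a == b);           // same contents...
    assert(a.ptr !is b.ptr);  // ...but each call made its own allocation

    char[] c = a.dup;         // .dup just makes one more copy
    assert(c == a && c.ptr !is a.ptr);
}
```

So the extra .dup only adds another allocation; it doesn't make the
string any more "owned" than it already was.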

Another random shot in the dark is, what compiler flag(s) are you using
to compile the program?  If you're using dmd with -O or -inline (or
both), there's a small chance you might be seeing a codegen bug.  (I've
had bad experiences with dmd's backend before, both in terms of codegen
bugs and poor optimization quality, and nowadays only use dmd for quick
prototyping or fast turnaround during development; for production I
avoid dmd and use LDC's much better and more reliable backend.)


> I suspect all of these odd "cures" have the effect they do because
> they change the allocation pattern of this section of the code.
> Without any of them, it appears that an allocation is triggering a GC
> at exactly the wrong moment. What is not understood is what is wrong
> with the moment.
[...]

Indeed.  If you could get DustMite to produce a minimal testcase that we
can use to reproduce the problem locally, we'd be able to help you
pinpoint exactly what is wrong with your code.  Or determine that it's a
GC bug, should it come to that.


T

-- 
Give a man a fish, and he eats once. Teach a man to fish, and he will sit forever.