toStringz note about keeping references

Jonathan M Davis jmdavisProg at gmx.com
Sun Oct 14 16:56:23 PDT 2012


On Monday, October 15, 2012 01:36:27 Andrej Mitrovic wrote:
> On 10/15/12, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > I'd have to see exactly what TDPL says to comment on that accurately
> 
> Maybe I've misread it. On Page 288 it says:
> 
> "An immutable value is cast in stone: as soon as it's been
> initialized, you may as well
> consider it has been burned forever into the memory storing it. It
> will never change
> throughout the execution of the program."
> 
> Perhaps what was missing is: "as long as there is a reference to that data".

That says _nothing_ about collection. It's only saying that the value won't 
ever change. It's trying to highlight the difference between const and 
immutable. It would make _no_ sense for immutable data to not be collected 
when all references to it were gone.

> I'd really like to know for sure if the GC implementation actually
> collects immutable data or not.

I guarantee that if it doesn't, it's a bug. There are exceptions (e.g. string 
literals on Linux, which live in read-only memory rather than on the GC heap), 
and the GC isn't exactly enthusiastic about reclaiming memory, which means that 
stuff can hang around for quite a while, but normally, immutability should have 
no effect on the lifetime of an object.

> I've always used toStringz in direct
> calls to C without caring about keeping a reference to the source
> string in D code.

It's perfectly safe as long as the C function doesn't hold on to the pointer. 
If it does, then you could get screwed later on when that pointer gets used, 
and whether it works or not then becomes non-deterministic (since it depends 
on whether the GC has collected the memory or not and whether that memory has 
been reused) which could cause some really nasty bugs. That's why the note on 
toStringz is there in the first place. It would not surprise me at all if it's 
a common bug when interfacing with C that references are not kept around when 
they should be. I suspect that the main reason that it doesn't cause more 
issues is because most C functions don't keep pointers around.
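
As a rough sketch of the safe case (cLength is a name I made up for 
illustration; strlen is just a convenient example of a C function that reads 
the pointer during the call and does not keep it):

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: passes a D string to a C function that only
// uses the pointer for the duration of the call.
size_t cLength(string s)
{
    // The temporary from toStringz only needs to stay valid while
    // strlen runs, so no D-side reference has to be kept afterwards.
    return strlen(s.toStringz());
}

void main()
{
    assert(cLength("hello") == 5);
    assert(cLength("") == 0);
}
```

If strlen were replaced by a function that stashed the pointer somewhere, the 
same call would be the buggy case described above.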

> Anyway this stuff is important for OOP wrappers of C/C++ libraries. If
> the string reference must be kept on the D side then this makes writing
> wrappers harder.

That's true, but to some extent, that's just life when interfacing with code 
outside of the GC's reach.

However, I believe that another option is to explicitly tell the GC not to 
collect a chunk of memory (glancing at core.memory, GC.addRoot looks like the 
function to use for that, but I've never done it before, so I'm not well 
acquainted with the details). But if you want it to ever be collected, you'd 
need to make sure that it was removed from the GC's roots again later (with 
GC.removeRoot), which could also complicate wrappers. It _is_ another option 
though if keeping a reference around in the D code is problematic.
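
A minimal sketch of that approach, assuming GC.addRoot and GC.removeRoot from 
core.memory behave as documented (pinnedStringz and unpin are hypothetical 
helper names, not library functions, and I haven't used this in real code):

```d
import core.memory : GC;
import std.string : toStringz;

// Hypothetical helper: returns a zero-terminated copy of s that the GC
// will not collect until unpin is called on it.
immutable(char)* pinnedStringz(string s)
{
    auto p = s.toStringz();
    GC.addRoot(p); // the GC now treats the block p points into as reachable
    return p;
}

// Hypothetical helper: undoes pinnedStringz so that the memory can be
// collected once nothing else refers to it.
void unpin(immutable(char)* p)
{
    GC.removeRoot(p);
}

void main()
{
    string name = "wid" ~ "get";
    auto p = pinnedStringz(name);
    // ... hand p to a C function that keeps it ...
    assert(p[0] == 'w' && p[6] == '\0');
    unpin(p); // only once the C side no longer uses the pointer
}
```

The wrapper still has to know when the C library is done with the pointer, so 
this moves the bookkeeping rather than removing it.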

> And what about multiple calls? What if on each call to c_Foo_test()
> the C library stores each 'input' pointer internally? That would mean
> we have to keep an array of these pointers on the D side.

Potentially, yes.
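
Something along these lines, as a sketch only. The c_Foo_test defined here is 
a made-up stand-in so that the example compiles on its own, and the Foo class 
and its kept array are hypothetical:

```d
import std.string : toStringz;

// Stand-in for the C library's internal storage. In a real C library
// this would be memory that the GC never scans.
__gshared const(char)* lastStored;

// Stand-in for a C function that keeps the pointer it is given.
extern (C) void c_Foo_test(const(char)* input)
{
    lastStored = input;
}

class Foo
{
    // Every pointer handed to the C side is also stored here, so the GC
    // can still reach the memory for as long as this object is alive.
    private immutable(char)*[] kept;

    void test(string input)
    {
        auto p = input.toStringz();
        kept ~= p;
        c_Foo_test(p);
    }
}

void main()
{
    auto foo = new Foo;
    foo.test("first");
    foo.test("second");
    assert(foo.kept.length == 2);
    assert(lastStored is foo.kept[1]);
}
```

Note that this only helps while the Foo object itself stays referenced from D 
code, and the array grows with every call unless the wrapper knows when the C 
library drops a pointer.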

> It's not known what the C library does without inspecting the source of
> the C library. So it becomes very difficult to write wrappers which
> are GC-safe.

Unfortunately, that's true. However, remember that in C, you normally have to 
manage your own memory, so if C functions aren't appropriately clear about who 
owns what memory or which pointers get kept, then they'll run into serious 
problems in pure C. So, in general, I would expect a C function to be fairly 
clear when it keeps a pointer around or gives you a pointer to memory that it 
allocated or controls. But it's fairly rare that C functions keep pointers 
around (that would mean stashing them in global variables or some other 
long-lived state, which is generally rare), so in most cases, it's a non-issue.

> There are wrappers out there that seem to expect the source won't be
> collected. For example GtkD also uses toStringz in calls to C without
> ever storing a reference to the input string.

As long as the function doesn't keep any of the pointers that it's given, then 
it's fine. If it _does_ keep a pointer around, then it's a bug for the D code 
not to keep a reference around. But as I said, it's fairly rare for C code to 
do that, which is probably why this doesn't cause more issues. But the note on 
toStringz is there precisely because most people aren't going to think of that 
problem, and they need to be aware of it when using toStringz.

- Jonathan M Davis

