toStringz note about keeping references
Jonathan M Davis
jmdavisProg at gmx.com
Sun Oct 14 16:56:23 PDT 2012
On Monday, October 15, 2012 01:36:27 Andrej Mitrovic wrote:
> On 10/15/12, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > I'd have to see exactly what TDPL says to comment on that accurately
>
> Maybe I've misread it. On Page 288 it says:
>
> "An immutable value is cast in stone: as soon as it's been
> initialized, you may as well consider it has been burned forever
> into the memory storing it. It will never change throughout the
> execution of the program."
>
> Perhaps what was missing is: "as long as there is a reference to that data".
That says _nothing_ about collection. It's only saying that the value won't
ever change. It's trying to highlight the difference between const and
immutable. It would make _no_ sense for immutable data to not be collected
when all references to it were gone.
> I'd really like to know for sure if the GC implementation actually
> collects immutable data or not.
I guarantee that if it doesn't, it's a bug. There are exceptions (e.g. string
literals on Linux, because they go in read-only memory), and the GC isn't
exactly enthusiastic about reclaiming memory, which means that stuff can hang
around for quite a while, but normally, immutability should have no effect on
the lifetime of an object.
> I've always used toStringz in direct
> calls to C without caring about keeping a reference to the source
> string in D code.
It's perfectly safe as long as the C function doesn't hold on to the pointer.
If it does, then you could get screwed later on when that pointer gets used,
and whether it works or not then becomes non-deterministic (since it depends
on whether the GC has collected the memory or not and whether that memory has
been reused) which could cause some really nasty bugs. That's why the note on
toStringz is there in the first place. It would not surprise me at all if it's
a common bug when interfacing with C that references are not kept around when
they should be. I suspect that the main reason that it doesn't cause more
issues is because most C functions don't keep pointers around.
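To make the two cases concrete, here is a minimal sketch. c_set_label,
storedLabel, cLength and Label are hypothetical names made up for illustration,
and the "C function" is written in D so that the example compiles on its own.
In a real C library the stored pointer would sit in memory that the GC never
scans, which is exactly why the D side has to hold a reference.

```d
import core.stdc.string : strlen;
import std.string : fromStringz, toStringz;

// Hypothetical stand-in for a C function that stores the pointer it receives.
// In a real library the storage would live in C memory the GC never scans.
__gshared const(char)* storedLabel;

extern (C) void c_set_label(const(char)* label) nothrow @nogc
{
    storedLabel = label;
}

// Safe without bookkeeping: strlen only uses the pointer during the call.
size_t cLength(string s)
{
    return strlen(toStringz(s));
}

// The C side keeps the pointer, so the wrapper keeps a reference as well.
class Label
{
    private immutable(char)* labelz; // keeps the copy reachable for the GC

    void set(string text)
    {
        labelz = toStringz(text);
        c_set_label(labelz);
    }
}

void main()
{
    assert(cLength("hello") == 5);

    auto label = new Label;
    label.set("title");
    assert(fromStringz(storedLabel) == "title");
}
```

Note that the Label object itself has to stay alive for as long as the C side
might use the pointer, since the reference lives inside it.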
> Anyway this stuff is important for OOP wrappers of C/C++ libraries. If
> the string reference must be kept on the D side then this makes writing
> wrappers harder.
That's true, but to some extent, that's just life when interfacing with code
outside of the GC's reach.
However, I believe that another option is to explicitly tell the GC not to
collect a chunk of memory (glancing at core.memory, GC.addRoot looks like the
function to use for that, but I've never done it before, so I'm not well
acquainted with the details). But if you want it to ever be collected, you'd
need to call GC.removeRoot on it again later, which could also complicate
wrappers. It _is_ another option though if keeping a reference around in the D
code is problematic.
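As a rough sketch of that approach: in core.memory, GC.addRoot is the call that
keeps a block alive, and GC.removeRoot undoes it. The helper names
pinnedStringz and unpinStringz below are made up for illustration, and the
sketch assumes that the C side tells you somehow when it is done with the
pointer.

```d
import core.memory : GC;
import std.string : toStringz;

// Hypothetical helper: make a zero-terminated copy of `s` and register it as
// a GC root, so it is not collected even if no D reference to it remains.
immutable(char)* pinnedStringz(string s)
{
    auto p = toStringz(s);
    GC.addRoot(p);
    return p;
}

// Hypothetical helper: undo the pin once the C side no longer uses the
// pointer. After this the memory is collectable again.
void unpinStringz(const(char)* p)
{
    GC.removeRoot(p);
}

void main()
{
    auto p = pinnedStringz("pinned".idup);
    GC.collect();                  // the pinned block survives collection
    assert(p[0 .. 6] == "pinned");
    assert(p[6] == '\0');
    unpinStringz(p);
}
```

The trade-off is the one described above: forget the removeRoot call and the
memory is never reclaimed, so the wrapper needs a clear point at which it
unpins.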
> And what about multiple calls? What if on each call to c_Foo_test()
> the C library stores each 'input' pointer internally? That would mean
> we have to keep an array of these pointers on the D side.
Potentially, yes.
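A minimal sketch of what that bookkeeping could look like. The c_Foo_test here
is a hypothetical stand-in written in D (it only counts calls), and Foo, kept
and keptCount are made-up names. A real binding would be an extern(C)
prototype for the library function.

```d
import std.string : toStringz;

// Hypothetical stand-in for the c_Foo_test from the question; imagine that
// the real one stores every pointer it is given.
__gshared size_t callsSeen;

extern (C) void c_Foo_test(const(char)* input) nothrow @nogc
{
    ++callsSeen;
}

class Foo
{
    private immutable(char)*[] kept; // one entry per pointer handed to C

    void test(string input)
    {
        auto z = toStringz(input);
        kept ~= z;      // make it reachable from D before C sees it
        c_Foo_test(z);
    }

    size_t keptCount() const { return kept.length; }
}

void main()
{
    auto foo = new Foo;
    foo.test("first");
    foo.test("second");
    assert(foo.keptCount == 2);
    assert(callsSeen == 2);
}
```

The array only grows in this sketch. If the C library has a way of releasing
an input, the wrapper would drop the matching entry at that point.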
> It's not known what the C library does without inspecting the source of
> the C library. So it becomes very difficult to write wrappers which
> are GC-safe.
Unfortunately, that's true. However, remember that in C, you normally have to
manage your own memory, so if C functions aren't appropriately clear about who
owns what memory or which pointers get kept, then they'll run into serious
problems in pure C. So, in general, I would expect a C function to be fairly
clear when it keeps a pointer around or gives you a pointer to memory that it
allocated or controls. But it's fairly rare that C functions keep pointers
around (that would usually mean storing them in global variables or other
long-lived state, which is generally rare), so in most cases, it's a non-issue.
> There are wrappers out there that seem to expect the source won't be
> collected. For example GtkD also uses toStringz in calls to C without
> ever storing a reference to the input string.
As long as the function doesn't keep any of the pointers that it's given, then
it's fine. If it _does_ keep a pointer around, then it's a bug for the D code
not to keep a reference around. But as I said, it's fairly rare for C code to
do that, which is probably why this doesn't cause more issues. But the note on
toStringz is there precisely because most people aren't going to think of that
problem, and they need to be aware of it when using toStringz.
- Jonathan M Davis