toStringz note about keeping references
Charles Hixson
charleshixsn at earthlink.net
Tue Oct 16 10:47:11 PDT 2012
On 10/14/2012 04:54 PM, Ali Çehreli wrote:
> On 10/14/2012 04:36 PM, Andrej Mitrovic wrote:
> > On 10/15/12, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> >> I'd have to see exactly what TDPL says to comment on that accurately
> >
> > Maybe I've misread it. On Page 288 it says:
> >
> > "An immutable value is cast in stone: as soon as it's been
> > initialized, you may as well
> > consider it has been burned forever into the memory storing it. It
> > will never change
> > throughout the execution of the program."
> >
> > Perhaps what was missing is: "as long as there is a reference to that
> > data".
>
> Andrei must have written that with only D in mind, without any C interaction.
> When we consider only D, the statement is correct: if there are no
> more references, how can the application tell whether the data is gone?
>
> > I'd really like to know for sure if the GC implementation actually
> > collects immutable data or not.
>
> It does. Should be easy to test with an infinite loop that generates
> immutable data.
>
> > I've always used toStringz in direct
> > calls to C without caring about keeping a reference to the source
> > string in D code. I'm sure others have used it like this as well.
>
> It depends on whether the C-side keeps a copy of that pointer.
>
> > Maybe the only reason my apps which use C don't crash is because a GC
> > cycle doesn't often run, and when it does run it doesn't collect the
> > source string data (either on purpose or because of buggy behavior, or
> > because the GC is imprecise).
> >
> > Anyway this stuff is important for OOP wrappers of C/C++ libraries. If
> > the string reference must be kept on the D side then this makes writing
> > wrappers harder. For example, let's say you've had this type of
> > wrapper:
> >
> > import std.string : toStringz;
> >
> > extern(C) void* get_Foo_obj();
> > extern(C) void* c_Foo_test(void* c_obj, const(char)* input);
> >
> > class Foo
> > {
> >     // init C object by calling a C function
> >     this() { c_Foo_obj = get_Foo_obj(); }
> >
> >     void test(string input)
> >     {
> >         c_Foo_test(c_Foo_obj, toStringz(input));
> >     }
> >
> >     void* c_Foo_obj; // reference to C object
> > }
> >
> > Should we always store a reference to 'input' to avoid GC collection?
>
> If the C function copies the pointer, yes.
>
> > E.g.:
> >
> > class Foo
> > {
> >     // init C object by calling a C function
> >     this() { c_Foo_obj = get_Foo_obj(); }
> >
> >     void test(string input)
> >     {
> >         input_ref = input;
> >         c_Foo_test(c_Foo_obj, toStringz(input));
> >     }
> >
> >     string input_ref; // keep it alive, C might use it after test() returns
>
> That's exactly what I do in a C++ library that wraps C types.
>
> >     void* c_Foo_obj; // reference to C object
> > }
> >
> > And what about multiple calls? What if on each call to c_Foo_test()
> > the C library stores each 'input' pointer internally? That would mean
> > we have to keep an array of these pointers on the D side.
>
> Again, that's exactly what I do in C++. :) There is a global container
> that keeps the objects alive.
>
> > It's not known what the C library does without inspecting the source of
> > the C library. So it becomes very difficult to write wrappers which
> > are GC-safe.
>
> Most functions document what they do with the input parameters. If not,
> it is usually obvious.
>
> > There are wrappers out there that seem to expect the source won't be
> > collected. For example GtkD also uses toStringz in calls to C without
> > ever storing a reference to the input string.
>
> Must be verified case-by-case.
>
> Ali
>
There's a problem with this kind of ad hoc solution: if the library
version changes, it can break things without changing the interface.

I think the real answer is that there needs to be a C layer between the
D program and the library that copies any immutable data that may need
to be kept, and passes only pointers to the copy to the C library.

OTOH, a nicer solution would be a way of marking in D that
"this item should never be garbage collected". There are ways to do
approximately that, but unfortunately the ones I'm recalling only work
on class instances. That's a clumsy way to do things.

What seems better is a wrapper around an item that holds a reference,
so that the data is marked as held and can later be released. This is
really just "keeping a reference", with a bit of syntactic sugar to make
it easy. Call the pair of functions "hold" and "release" or some such.
(Actually, hold would need to do a bit more than just keep a reference.
It would also need to ensure that the data wasn't on the stack, which
might mean working with a duplicate of the data, but since it's
immutable, that shouldn't matter.)
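
A minimal sketch of what such a "hold"/"release" pair might look like,
assuming druntime's GC.addRoot and GC.removeRoot from core.memory are
acceptable for this purpose. The names hold and release are hypothetical
and are not part of Phobos. Note that it roots the zero-terminated copy
itself rather than the source string, since toStringz may return a
freshly allocated copy that a reference to the source would not keep
alive.

```d
import core.memory : GC;

// Hypothetical helper, not part of Phobos: copy `s` into a fresh,
// zero-terminated GC allocation and register that allocation as a root,
// so the collector will not free it while C code still holds the pointer.
immutable(char)* hold(string s)
{
    // Concatenation always allocates a new GC-heap array, so the result
    // never points into the stack or into static data.
    immutable(char)* p = (s ~ '\0').ptr;
    GC.addRoot(p);
    return p;
}

// Hypothetical counterpart: unregister the root so the copy becomes
// collectable again once nothing else references it.
void release(immutable(char)* p)
{
    GC.removeRoot(p);
}
```

With something like this, the wrapper would call
c_Foo_test(c_Foo_obj, hold(input)) and call release on the returned
pointer once the C library is known to be done with it. It still has to
remember each pointer somewhere in order to release it, so this only
tidies up the bookkeeping, it doesn't remove it.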