extern (C) linkage problem

Tue Jul 20 02:40:58 PDT 2010

On Tue, 20 Jul 2010 05:10:47 -0400, bearophile wrote:

> In that code, for further safety, I'd like to make it impossible
> (without a cast) to write code like this, where toStringz never gets
> called:
> strcmp(Cstring(s1.ptr), Cstring(s2.ptr));
> 
> So I think this code is a bit better:
> 
> import std.string: toStringz;
> 
> struct Cstring {
>     const(char)* ptr; // const(ubyte)* ?
>     static Cstring opCall(string s) {
>         Cstring cs;
>         cs.ptr = toStringz(s);
>         return cs;
>     }
> }
> 
> extern(C) int strcmp(Cstring s1, Cstring s2); // C's strcmp returns int
> 
> void main() {
>     auto s1 = "abba";
>     auto s2 = "red";
>     auto r2 = strcmp(Cstring(s1), Cstring(s2));
> }
> 
> Lars T. Kyllingstad:
> 
>> but I think it should wrap a ubyte*, not a char*.  The reason for this
>> is that D's char is supposed to be a UTF-8 code unit, whereas C's char
>> can be anything.
> 
> Right. But toStringz() returns a const(char)*, so do you want to change
> toStringz() first?

Yes.  I think we should stop using char* when interfacing with C code 
altogether.  The "right" thing to do, if you can call it that, would be 
to use char* only if you KNOW the C function expects text input encoded 
as UTF-8 (or just plain ASCII), and ubyte* for other encodings and non-
textual data.  But this rule requires knowledge of what each function 
does with its input and must hence be applied on a case-by-case basis, 
which makes automated translation of C headers to D difficult.
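
A minimal sketch of that case-by-case rule, using two real C library
functions. The parameter types chosen here are this sketch's own
assumption about how such declarations could look, not how any existing
binding declares them:

```d
import std.string : toStringz;

// strlen only looks for the terminating NUL byte, so UTF-8 text is safe
// to pass and const(char)* is an accurate type for its parameter.
extern(C) size_t strlen(const(char)* s);

// memcmp works on arbitrary bytes. In C it takes const void*; declaring it
// with const(ubyte)* is this sketch's choice and links to the same symbol,
// because extern(C) names carry no parameter type information.
extern(C) int memcmp(const(ubyte)* a, const(ubyte)* b, size_t n);

void main()
{
    // Text known to be UTF-8: pass it as char*.
    string text = "abba";
    assert(strlen(toStringz(text)) == 4);

    // "é" in Latin-1 is the single byte 0xE9, which is not valid UTF-8,
    // so the data is kept as ubyte rather than char.
    immutable(ubyte)[] latin1 = [0xE9, 0x00];
    immutable(ubyte)[] copy = [0xE9, 0x00];
    assert(memcmp(latin1.ptr, copy.ptr, latin1.length) == 0);
}
```

Each declaration had to be chosen by knowing what the function does with
its input, which is exactly the part a header translator cannot see.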

So I say keep it simple: don't assume that your C functions handle UTF-8, 
and use ubyte* everywhere.  (Actually, it's not that simple either.  I 
just remembered that C's char is sometimes signed, sometimes unsigned...)
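
A rough sketch of what "ubyte* everywhere" could look like for strcmp.
The helper toCBytes is hypothetical (it is not in Phobos), and declaring
strcmp with const(ubyte)* is an assumption of this sketch. It links
because a pointer to char and a pointer to ubyte are passed the same way,
whatever signedness the C compiler gives plain char:

```d
import std.string : toStringz;

// Assumption: strcmp is declared with const(ubyte)* rather than
// const(char)*, so the D side makes no claim that the data is UTF-8.
extern(C) int strcmp(const(ubyte)* s1, const(ubyte)* s2);

// Hypothetical helper: NUL-terminate a D string and view it as raw bytes.
const(ubyte)* toCBytes(string s)
{
    return cast(const(ubyte)*) toStringz(s);
}

void main()
{
    // Equal strings compare as 0.
    assert(strcmp(toCBytes("abba"), toCBytes("abba")) == 0);
    // 'a' sorts before 'r', so the result is negative.
    assert(strcmp(toCBytes("abba"), toCBytes("red")) < 0);
}
```

The cost is the explicit cast in the helper; the signedness question only
matters for C code that does arithmetic on individual char values.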

Maybe this should be discussed on the main NG.  It's been bothering me 
for a while.  I think I'll start a topic on it later.

-Lars