extern (C) linkage problem
Lars T. Kyllingstad
public at kyllingen.NOSPAMnet
Tue Jul 20 02:40:58 PDT 2010
On Tue, 20 Jul 2010 05:10:47 -0400, bearophile wrote:
> In that code, for further safety, I'd like to make code like this
> impossible without a cast (here toStringz doesn't get called):
> strcmp(Cstring(s1.ptr), Cstring(s2.ptr));
>
> So I think this code is a bit better:
>
> import std.string: toStringz;
>
> struct Cstring {
>     const(char)* ptr; // const(ubyte)* ?
>     static Cstring opCall(string s) {
>         Cstring cs;
>         cs.ptr = toStringz(s);
>         return cs;
>     }
> }
>
> extern(C) int strcmp(Cstring s1, Cstring s2); // C's strcmp returns int
>
> void main() {
>     auto s1 = "abba";
>     auto s2 = "red";
>     auto r2 = strcmp(Cstring(s1), Cstring(s2));
> }
>
> Lars T. Kyllingstad:
>
>> but I think it should wrap a ubyte*, not a char*. The reason for this
>> is that D's char is supposed to be a UTF-8 code unit, whereas C's char
>> can be anything.
>
> Right. But toStringz() returns a const(char)*, so do you want to change
> toStringz() first?
Yes. I think we should stop using char* when interfacing with C code
altogether. The "right" thing to do, if you can call it that, would be
to use char* only if you KNOW the C function expects text input encoded
as UTF-8 (or just plain ASCII), and ubyte* for other encodings and non-
textual data. But this rule requires knowledge of what each function
does with its input and must hence be applied on a case-by-case basis,
which makes automated translation of C headers to D difficult.
So I say make it simple: don't assume that your C functions handle UTF-8,
and use ubyte* everywhere. (Actually, it's not that simple, either. I
just remembered that C's char is sometimes signed, sometimes unsigned...)
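
A rough sketch of what the "ubyte* everywhere" convention might look like,
reusing the Cstring wrapper idea from the quoted code. The const(ubyte)*
field, the cast applied to the result of toStringz(), and the hand-written
strcmp binding taking const(ubyte)* are all assumptions made for
illustration, not an existing Phobos or druntime API:

```d
import std.string : toStringz;

// Hypothetical variant of the Cstring wrapper: it stores const(ubyte)*
// so that nothing claims the bytes are UTF-8 code units.
struct Cstring {
    const(ubyte)* ptr;

    static Cstring opCall(string s) {
        Cstring cs;
        // toStringz() returns a pointer to char; the cast only
        // reinterprets the same zero-terminated bytes as ubyte.
        cs.ptr = cast(const(ubyte)*) toStringz(s);
        return cs;
    }
}

// Assumed binding: C's strcmp declared by hand with ubyte pointers.
// extern(C) names carry no type information, so this links against
// the ordinary libc symbol.
extern(C) int strcmp(const(ubyte)* s1, const(ubyte)* s2);

void main() {
    auto s1 = "abba";
    auto s2 = "red";
    // Passing s1.ptr directly would not compile without a cast,
    // because immutable(char)* does not convert to const(ubyte)*.
    auto r = strcmp(Cstring(s1).ptr, Cstring(s2).ptr);
    assert(r < 0); // 'a' sorts before 'r'
}
```

This only changes how the D side types the pointer. It does not settle the
signed-versus-unsigned question for C's char mentioned above.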
Maybe this should be discussed on the main NG. It's been bothering me
for a while. I think I'll start a topic on it later.
-Lars