converting D's string to use with C API with unicode
tsbockman
thomas.bockman at gmail.com
Sat Dec 5 20:45:40 UTC 2020
On Saturday, 5 December 2020 at 19:51:14 UTC, Jack wrote:
>>version(Windows) extern(C) export
>>struct C_ProcessResult
>>{
>> wchar*[] output;
In D, `T[]` (where T is some element type, `wchar*` in this case)
is a slice structure that bundles a length and a pointer
together. It is NOT layout-compatible with `T[]` or `T*` in C.
You will get memory corruption if you try to use a D `T[]`
directly when interfacing with C.
Instead, you must use a bare pointer, plus a separate length/size
if the C API accepts one. I'm guessing that
`C_ProcessResult.output` should have type `wchar**`, but I can't
say for sure without seeing the Windows API documentation or C
header file in which the C structure is detailed.
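As a rough sketch of what that might look like: the `outputLength`
field below is purely my assumption (the real C header may pass the
count some other way, or not at all), and the two `static assert`s
just demonstrate that a slice is twice as large as a bare pointer.

```d
// Hypothetical layout. Only `output` and `ok` come from the original
// post; `outputLength` is an assumed extra field for the element count.
extern(C) struct C_ProcessResult
{
    wchar** output;      // pointer to the first of several C wide strings
    size_t outputLength; // assumed: number of entries in `output`
    bool ok;
}

// A D slice is a (length, pointer) pair, so it cannot stand in for
// a single C pointer.
static assert((wchar*[]).sizeof == 2 * size_t.sizeof);
static assert((wchar**).sizeof == size_t.sizeof);
```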
>> bool ok;
>>}
>>struct ProcessResult
>>{
>> string[] output;
>> bool ok;
>>
>> C_ProcessResult toCResult()
>> {
>> auto r = C_ProcessResult();
>> r.ok = this.ok; // just copy, no conversion needed
>> foreach(s; this.output)
>> r.output ~= cast(wchar*)s.ptr;
This is incorrect, and will corrupt memory. `cast(wchar*)` is a
reinterpret cast, and an invalid one at that. It says, "just take
my word for it, the data at the address stored in `s.ptr` is
UTF-16 encoded." But that's not true: the data is UTF-8 encoded,
because `s` is a `string`. The C side will read garbled text, and
because a D `string` is not guaranteed to be zero-terminated, it
will likely run past the end of the buffer as well.
What you need to do instead is allocate a separate `wchar[]`
buffer for each string, and then use the UTF-8 to UTF-16
conversion algorithm to fill it based on the `char` elements in
`s`. The conversion algorithm is non-trivial, but the
`std.encoding` module can do it for you.
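Here is a minimal sketch of that conversion. Note that it uses
`std.utf.toUTF16z` rather than `std.encoding`, since that function
also appends the zero terminator that C wide-string APIs usually
expect. The helper name `toWideCStrings` is made up for this
example.

```d
import std.utf : toUTF16z;

// Hypothetical helper: transcode each UTF-8 D string into its own
// freshly GC-allocated, zero-terminated UTF-16 buffer.
const(wchar)*[] toWideCStrings(string[] lines)
{
    const(wchar)*[] result;
    result.length = lines.length;
    foreach (i, s; lines)
        result[i] = toUTF16z(s); // converts and adds the trailing 0
    return result;
}
```

The `.ptr` of the returned slice can then be handed to C as a
`wchar**`, with `.length` passed alongside it if the API takes a
count.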
>> return r;
>> }
>>}
>
Note also that when exchanging heap-allocated data (such as most
strings or arrays) with a C API, you must figure out who is
responsible for de-allocating the memory at the proper time - and
NOT BEFORE. If you allocate memory with D's GC (using `new` or
the slice concatenation operators `~` and `~=`), watch out that
you keep a reference to it alive on the D side until after the C
API is completely done with it. Otherwise, D's GC may not realize
it's still in use, and may de-allocate it early, causing memory
corruption in a way that is very difficult to debug.
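One possible way to do that, sketched under the assumption that
the C side holds on to the pointer after your D function returns,
is to register the array as a GC root with `core.memory.GC.addRoot`
and remove it once C is finished. The names `pinForC` and
`unpinForC` are hypothetical.

```d
import core.memory : GC;
import std.utf : toUTF16z;

// Hypothetical helper: build the array of wide C strings and register
// it as a GC root, so neither the array nor the buffers it points to
// are collected while only C holds the pointer.
const(wchar)** pinForC(string[] lines)
{
    const(wchar)*[] ptrs;
    foreach (s; lines)
        ptrs ~= toUTF16z(s);
    GC.addRoot(ptrs.ptr); // no-op if ptrs.ptr is null
    return ptrs.ptr;
}

// Call this only after the C API is completely done with the data.
void unpinForC(const(wchar)** p)
{
    GC.removeRoot(p);
}
```

If the C API only uses the data for the duration of a single call,
simply keeping the slice in a local variable on the D side until
the call returns should be enough.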