Biggest problems w/ D - strings

Sat Aug 11 15:46:16 PDT 2007

C. Dunn wrote:
> BCS Wrote:
> 
>> Reply to Sean,
>>
>>> C. Dunn wrote:
>>>
>>>> I have a field of n chars stored on disk.  It holds a null-terminated
>>>> string, padded with zeroes.  It is amazingly difficult to compare
>>>> such a char[n] with some other char[] (which, by the dictates of D,
>>>> may or may not be null-terminated).
>>>>
>>> I'm not sure I understand.  Why bother computing string length in the
>>> C fashion when D provides a .length property which holds this
>>> information?
>>>
>>> Sean
>>>
>> He might be using a D char[] as an oversized buffer for a c style string.
> 
> Exactly.  This is very common in the database world.  The disk record has a fixed size, so I have a struct which looks like this:
> 
> struct Data{
>   int id;
>   char[32] name;
>   // ...
> };
> 
> A C function produces this data.  D can accept the C struct with no problems.  'name' is just a static array.  But processing the name field in D is awkward.  'name.length' is 32, but 'strlen(name)' could be less (or infinity if the string is a full 32 characters sans zeroes, which is why I need strnlen()).

I use PostgreSQL and MySQL for lots of things. With PostgreSQL it is much easier to get the string length, because it comes back with the tuple. If I remember right, the schema is stored internally, and then each string looks like this:

for varchar <= 255
struct Firstname {
	ubyte length;
	char[size] data;
}

for varchar <= 65535
struct Firstname {
	ushort length;
	char[size] data;
}
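
A minimal sketch of reading such a length-prefixed field in D, assuming a one-byte prefix followed directly by the data. The function name readShortString and the exact byte layout are my own illustration, not PostgreSQL's real on-disk format:

```d
import std.exception : enforce;

// Hypothetical layout: one length byte, then that many bytes of data.
// Returns a slice into `record`; nothing is copied and nothing is scanned.
const(char)[] readShortString(const(ubyte)[] record)
{
    enforce(record.length >= 1, "record too short for a length prefix");
    immutable size_t len = record[0];
    enforce(record.length >= 1 + len, "record shorter than its stated length");
    return cast(const(char)[]) record[1 .. 1 + len];
}

unittest
{
    // Length byte 5, the text, then two bytes of padding that are ignored.
    auto record = cast(const(ubyte)[]) "\x05Kenny\0\0";
    assert(readShortString(record) == "Kenny");
}
```

The length is known after reading one byte, so the cost does not grow with the size of the string.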

--------------------------

Why would you want to zero-terminate your strings in a database format? It doesn't make any sense. You trade 1 byte of savings for up to 255 loop iterations to find the zero, or 2 bytes of savings for up to 65535 iterations. Consider that you have two options: you can always null-terminate it, which means that for strings shorter than 256 chars you don't save anything (the terminator costs the same byte a length prefix would), or you can do this to keep the last char:
uint i;
for(i = 0; i < 256; i++) {
	if(str[i] == 0) {
		break;
	}
}

return i;

But then every iteration has two checks instead of one (i < 256 && str[i] != 0).
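
For a fixed-size, zero-padded field like the char[32] name in the struct quoted above, that bounded scan can be wrapped up once and the result handed around as an ordinary D slice. This is only a sketch; the name fieldText is made up for the example:

```d
// Return the text portion of a fixed-size, zero-padded field as a slice.
// The scan is bounded by field.length, so a field with no zero byte at all
// simply yields the whole field (the same idea as strnlen()).
inout(char)[] fieldText(inout(char)[] field)
{
    size_t i = 0;
    while (i < field.length && field[i] != 0)
        i++;
    return field[0 .. i];
}

unittest
{
    char[32] name = '\0';       // zero-filled, like the on-disk record
    name[0 .. 5] = "Kenny";
    assert(fieldText(name[]) == "Kenny");
    assert(fieldText(name[]).length == 5);
}
```

After that one scan the slice carries its own .length, so comparing it with any other char[] is a plain == and the zero padding never gets in the way.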

With PostgreSQL, using libpq, something like what you're describing is very easy:

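int len = PQgetlength(res, row, offset);	// length of the field value in bytes
if(len >= 0) {
	char* r = PQgetvalue(res, row, offset);	// pointer to the field value
	char[] rr;
	rr.length = len;
	rr[0 .. len] = r[0 .. len];	// copy into a D array that knows its own length
}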
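
The same pointer-plus-length copy can be pulled out into a small helper. This is a sketch that does not depend on libpq; toDString is a made-up name, and the pointer in the example just stands in for whatever a C API hands back:

```d
// Copy `len` bytes starting at `p` into a freshly allocated D array.
// The caller supplies the length; no terminator is searched for.
char[] toDString(const(char)* p, size_t len)
{
    return p[0 .. len].dup;
}

unittest
{
    const(char)* fromC = "hello world".ptr;   // stand-in for a C result
    char[] s = toDString(fromC, 5);
    assert(s == "hello");
    assert(s.length == 5);
}
```

Slicing the pointer and calling .dup does the allocation and the copy in one step, and the result is independent of the C library's buffer.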

I really suggest storing string lengths. It will save you tons of processing power (especially if your strings are longer than 65535 chars). By storing the length you also gain the ability to store binary data, because a zero byte in the string won't terminate it. You may also find that, with null termination, people can end strings early by passing malformed UTF-8 sequences and the like.
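
A small illustration of the binary data point, using strlen from the C runtime only to show what a terminator-based scan would see:

```d
import core.stdc.string : strlen;

unittest
{
    // Five bytes of payload with a zero in the middle.
    char[] payload = "ab\0cd".dup;
    assert(payload.length == 5);        // the D array keeps all five bytes
    assert(strlen(payload.ptr) == 2);   // a C-style scan stops at the embedded zero
}
```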

Whenever I use a C library that relies on null-terminated strings, I quickly convert its strings to the dark side for the above reasons. Walter was very smart to make strings that way, for slicing purposes too :) For example, imagine a RIGHT(str, 5) function with null-terminated strings, then think of it in D: (str.length > 5 ? str[$-5 .. $] : str);
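
Written out as a function, that RIGHT() might look like the sketch below. The name right and its signature are just for illustration:

```d
// Last `n` chars of `str`, or all of it when it is shorter than `n`.
// Returns a slice of the original data; nothing is copied or scanned.
inout(char)[] right(inout(char)[] str, size_t n)
{
    return str.length > n ? str[$ - n .. $] : str;
}

unittest
{
    assert(right("hello world", 5) == "world");
    assert(right("abc", 5) == "abc");
}
```

One caveat: the slice counts chars (UTF-8 code units), so on non-ASCII text it can cut a multi-byte character in half.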

OK, enough rambling... I LOVE strings in D :)

Kenny