To Walter, about char[] initialization by FF

Derek Parnell derek at nomail.afraid.org
Tue Aug 1 21:55:11 PDT 2006


On Wed, 02 Aug 2006 16:22:54 +1200, Regan Heath wrote:

>>>  char  ==> An unsigned 8-bit byte. An alias for ubyte.
>>>  schar ==> A UTF-8 code unit.
>>>  wchar ==> A UTF-16 code unit.
>>>  dchar ==> A UTF-32 code unit.
>>>
>>>  char[] ==> A 'C' string
>>>  schar[] ==> A UTF-8 string
>>>  wchar[] ==> A UTF-16 string
>>>  dchar[] ==> A UTF-32 string
>>>
>>> And then have built-in conversions between the UTF encodings. So if people
>>> want to continue to use code from C/C++ that uses code-pages or similar
>>> they can stick with char[].
>>>
>>>
>>
>> Yes, Derek, this will be probably near the ideal.
> 
> Yet, I don't find it at all difficult to think of them like so:
> 
>    ubyte ==> An unsigned 8-bit byte.
>    char  ==> A UTF-8 code unit.
>    wchar ==> A UTF-16 code unit.
>    dchar ==> A UTF-32 code unit.
> 
>    ubyte[] ==> A 'C' string
>    char[]  ==> A UTF-8 string
>    wchar[] ==> A UTF-16 string
>    dchar[] ==> A UTF-32 string

Me too, but that's probably because I've not been immersed in C/C++ for the
last 20-odd years ;-) 

I "think in D" now and char[] is a UTF-8 string in my mind. 
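
A minimal sketch of that view, assuming a present-day D compiler with
Phobos (the `string` alias, `std.utf`, and the helper name
`codeUnitCounts` are my assumptions, not something from this thread).
It shows that the same text has a different length in each encoding,
because `.length` counts code units rather than characters:

```d
import std.stdio : writeln;
import std.utf : toUTF16, toUTF32;

// Hypothetical helper: number of code units the same text occupies
// as UTF-8, UTF-16 and UTF-32.
size_t[3] codeUnitCounts(string s)
{
    wstring w = toUTF16(s); // re-encode as UTF-16 code units
    dstring d = toUTF32(s); // re-encode as UTF-32, one unit per code point
    return [s.length, w.length, d.length];
}

void main()
{
    // "\u00EF" (i with diaeresis) needs two UTF-8 code units,
    // but only one UTF-16 or UTF-32 code unit.
    writeln(codeUnitCounts("na\u00EFve"));
}
```

So treating char[] as "a UTF-8 string" means remembering that its
length is a byte count, which only matches the character count for
plain ASCII text.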
 
> If you want to program in D you _will_ have to readjust your thinking in  
> some areas, this is one of them.
> All you have to realise is that 'char' in D is not the same as 'char' in C.

True, but Walter seems hell-bent on easing the transition to D for C/C++
refugees.
 
> In quick and dirty ASCII only applications I can adjust my thinking  
> further:
> 
>    char   ==> An ASCII character
>    char[] ==> An ASCII string
> 
> I do however agree that C functions used in D should be declared like:
>    int strlen(ubyte* s);
> 
> and not like (as they currently are):
>    int strlen(char* s);
> 
> The problem with this is that the code:
>    char[] s = "test";
>    strlen(s)
> 
> would produce a compile error, and require a cast or a conversion function  
> (toMBSz perhaps, which in many cases will not need to do anything).
> 
> Of course the purists would say "That's perfectly correct, strlen cannot  
> tell you the length of a UTF-8 string, only its byte count", but at the  
> same time it would be nice (for quick and dirty ASCII only programs) if it  
> worked.

And I'm a wannabe purist <G>
 
> Is it possible to declare them like this?
>    int strlen(void* s);
> 
> and for char[] to be implicitly 'paintable' as void* as char[] is already  
> implicitly 'paintable' as void[]?
> 
> It seems like it would nicely solve the problem of people seeing:
>    int strlen(char* s);
> 
> and thinking D's char is the same as C's char without introducing a  
> painful need for cast or conversion in simple ASCII only situations.

It's the zero-terminator for C strings that will get in the way. We need a
nice way of getting the compiler to ensure C-strings are always terminated
correctly.
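
To illustrate the terminator problem, here is a rough sketch, assuming
a present-day D compiler where Phobos provides `std.string.toStringz`
and druntime provides `core.stdc.string.strlen`. The wrapper name
`cByteLength` is hypothetical. A slice of a D string is not followed by
a zero byte, so it has to be terminated explicitly before C sees it:

```d
import core.stdc.string : strlen;
import std.stdio : writeln;
import std.string : toStringz;

// Hypothetical wrapper: the byte count C's strlen reports for a D string.
size_t cByteLength(const(char)[] s)
{
    // toStringz returns a pointer to a zero-terminated copy of s,
    // so strlen stops at the end of the slice and not somewhere past it.
    return strlen(toStringz(s));
}

void main()
{
    string whole = "hello world";
    const(char)[] slice = whole[0 .. 5]; // "hello"; the next byte is ' ', not '\0'

    writeln(cByteLength(slice)); // counts only the bytes of the slice

    // By contrast, strlen(slice.ptr) would keep reading beyond the
    // slice until it happened to find a zero byte.
}
```

This is still a manual step the programmer has to remember, which is
exactly the part it would be nice for the compiler to enforce.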

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
2/08/2006 2:48:43 PM


