To Walter, about char[] initialization by FF
Derek Parnell
derek at nomail.afraid.org
Tue Aug 1 21:55:11 PDT 2006
On Wed, 02 Aug 2006 16:22:54 +1200, Regan Heath wrote:
>>> char ==> An unsigned 8-bit byte. An alias for ubyte.
>>> schar ==> A UTF-8 code unit.
>>> wchar ==> A UTF-16 code unit.
>>> dchar ==> A UTF-32 code unit.
>>>
>>> char[] ==> A 'C' string
>>> schar[] ==> A UTF-8 string
>>> wchar[] ==> A UTF-16 string
>>> dchar[] ==> A UTF-32 string
>>>
>>> And then have built-in conversions between the UTF encodings. So if
>>> people want to continue to use code from C/C++ that uses code-pages or
>>> similar, they can stick with char[].
>>>
>>>
>>
>> Yes, Derek, this would probably be near the ideal.
>
> Yet, I don't find it at all difficult to think of them like so:
>
> ubyte ==> An unsigned 8-bit byte.
> char ==> A UTF-8 code unit.
> wchar ==> A UTF-16 code unit.
> dchar ==> A UTF-32 code unit.
>
> ubyte[] ==> A 'C' string
> char[] ==> A UTF-8 string
> wchar[] ==> A UTF-16 string
> dchar[] ==> A UTF-32 string
Me too, but that's probably because I've not been immersed in C/C++ for the
last 20-odd years ;-)
I "think in D" now and char[] is a UTF-8 string in my mind.
> If you want to program in D you _will_ have to readjust your thinking in
> some areas, this is one of them.
> All you have to realise is that 'char' in D is not the same as 'char' in C.
True, but Walter seems hell-bent on easing the transition to D for C/C++
refugees.
> In quick and dirty ASCII only applications I can adjust my thinking
> further:
>
> char ==> An ASCII character
> char[] ==> An ASCII string
>
> I do, however, agree that C functions used in D should be declared like:
> int strlen(ubyte* s);
>
> and not like (as they currently are):
> int strlen(char* s);
>
> The problem with this is that the code:
> char[] s = "test";
> strlen(s);
>
> would produce a compile error, and require a cast or a conversion function
> (toMBSz perhaps, which in many cases will not need to do anything).
>
> Of course, the purists would say "That's perfectly correct, strlen cannot
> tell you the length of a UTF-8 string, only its byte count", but at the
> same time it would be nice (for quick and dirty ASCII only programs) if it
> worked.
And I'm a wannabe purist <G>
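The purist's objection can be illustrated with a minimal C sketch, assuming the input is valid UTF-8: strlen counts bytes up to the terminator, while counting code points means skipping the continuation bytes (bit pattern 10xxxxxx). The helper name utf8_codepoints is hypothetical, not a library function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points in a NUL-terminated
   string by counting every byte that is NOT a continuation byte
   (continuation bytes have the bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8; it does not validate it. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or plain ASCII byte */
            ++n;
    }
    return n;
}
```

For "héllo" encoded as UTF-8 ("h\xC3\xA9llo"), strlen returns 6 while utf8_codepoints returns 5, because 'é' (U+00E9) occupies two bytes. For pure ASCII text the two counts agree, which is why the quick and dirty case works.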
> Is it possible to declare them like this?
> int strlen(void* s);
>
> and for char[] to be implicitly 'paintable' as void*, just as char[] is
> already implicitly 'paintable' as void[]?
>
> It seems like it would nicely solve the problem of people seeing:
> int strlen(char* s);
>
> and thinking D's char is the same as C's char without introducing a
> painful need for cast or conversion in simple ASCII only situations.
It's the zero-terminator for C strings that will get in the way. We need a
nice way of getting the compiler to ensure C-strings are always terminated
correctly.
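The terminator problem can be sketched in C terms: a D char[] is a pointer plus a length and need not end in '\0', so passing it to strlen is only safe after copying it into a terminated buffer. The str_slice type and slice_to_cstring function below are hypothetical names used for illustration only; they are not D's or Phobos's API (Phobos has its own conversion, std.string.toStringz).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type modelling a D array: a pointer plus an explicit
   length, with no guarantee of a trailing '\0'. */
struct str_slice {
    const char *ptr;
    size_t len;
};

/* Hypothetical helper: copies the slice into a freshly allocated buffer
   and appends the '\0' terminator that C functions such as strlen expect.
   Returns NULL if allocation fails; the caller must free() the result. */
char *slice_to_cstring(struct str_slice s)
{
    assert(s.ptr != NULL || s.len == 0);
    char *buf = malloc(s.len + 1);
    if (buf == NULL)
        return NULL;
    if (s.len > 0)
        memcpy(buf, s.ptr, s.len);   /* copy only the slice's bytes */
    buf[s.len] = '\0';               /* add the terminator C needs */
    return buf;
}
```

A slice of the first 4 bytes of "testing" becomes the C string "test"; calling strlen on the slice's raw pointer instead would report 7, because strlen keeps reading until it finds a '\0' beyond the slice. The copy costs an allocation, which is why a compiler-guaranteed terminator would be attractive.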
--
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
2/08/2006 2:48:43 PM
More information about the Digitalmars-d mailing list