Casting between char[]/wchar[]/dchar[]

Sat Aug 5 14:20:13 PDT 2006

Hasan Aljudy wrote:
> 
> 
> Walter Bright wrote:
> 
>> kris wrote:
>>
>>> Hasan Aljudy wrote:
>>>
>>>> What are the rules for implicit/explicit casting between char[] and 
>>>> wchar[] and dchar[] ?
>>>>
>>>> When one casts (explicitly or implicitly) does the compiler 
>>>> automatically invoke std.utf.toUTF*()?
>>>>
>>>> Here's an idea that should simplify much of string handling in D:
>>>> allow char[] and wchar[] and dchar[] to be castable implicitly to 
>>>> each other, provided that the compiler invokes the appropriate 
>>>> std.utf.toUTF* method.
>>>> I think this is perfectly safe; no data is lost, and string handling 
>>>> can become much more flexible.
>>>>
>>>> Instead of writing three versions of the same function for each of 
>>>> char[] wchar[] and dchar[], one can just write a wchar[] version 
>>>> (for example) and the compiler will handle the conversion from/to 
>>>> char[] and dchar[].
>>>>
>>>> This also relieves developers from writing templatized 
>>>> functions/classes when they deal with strings.
>>>>
>>>> Thoughts?
>>>
>>>
>>>
>>> This one was beaten soundly around the head & shoulders in the past :)
>>>
>>> In a systems language like D, one could argue that hidden conversions 
>>> and/or translations (a) can mask what would otherwise be unintended 
>>> compile-time errors (b) can be terribly detrimental to performance 
>>> where multiple conversions are implicitly applied. Such an 
>>> environment could potentially put CoW to shame in terms of heap abuse 
>>> -- recall some of the recent CoW examples, and sprinkle in a few 
>>> unintended conversions for good measure :)
>>>
>>> IIRC, the last time this came up there was a pretty strong feeling 
>>> that such things should be explicit (partly because it can be an 
>>> expensive operation ~ likely sucking on the heap also).
>>
>>
>>
>> Yes. It's hard to judge where the line is, but too many implicit 
>> conversions lead to very hard to understand/debug programs.
> 
> 
> Can I ask you at least to simplify the conversion by adding properties 
> utf* to char/wchar/dchar arrays?
> 
> so, if I have:
> ----
> char[] process( char[] str ) { ... }
> 
> ...
> 
> dchar[] my32str = .....;
> 
> //I can write
> my32str = process( my32str.utf8 ).utf32;
> 
> //instead of
> //my32str = toUTF32( process( toUTF8( my32str ) ) );
> ----
> 
> 

er, you can do that yourself, Hasan?

char[] utf8 (dchar[] s)
{
   ...
}

dchar[] utf32 (char[] s)
{
   ...
}

etc, followed by:

 > char[] process( char[] str ) { ... }
 >
 > ...
 >
 > dchar[] my32str = .....;
 >
 > //I can write
 > my32str = process( my32str.utf8 ).utf32;
 >
 > //instead of
 > //my32str = toUTF32( process( toUTF8( my32str ) ) );
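
For reference, here is a minimal sketch of what those two helpers might look like with the bodies filled in. It assumes present-day D (D2) and Phobos, where string literals are immutable, so the signatures use string/dstring rather than char[]/dchar[]. The body of process() is a made-up stand-in, since the original elides it.

```d
import std.utf : toUTF8, toUTF32;

// Thin wrappers around std.utf, named as in the post. Because they are
// module-level functions taking an array as first argument, they can be
// called with property syntax on that array.
string utf8(const(dchar)[] s) { return toUTF8(s); }
dstring utf32(const(char)[] s) { return toUTF32(s); }

// Hypothetical stand-in for the elided process() body: appends a '!'.
string process(string str) { return str ~ "!"; }

unittest
{
    dstring my32str = "héllo"d;

    // Property-style call chain, equivalent to
    // toUTF32(process(toUTF8(my32str)))
    my32str = process(my32str.utf8).utf32;
    assert(my32str == "héllo!"d);
}
```

Both conversions allocate a fresh array on every call, which matters for the cost discussion that follows.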

However, this is sucking on the heap, since you're not providing 
anywhere for the conversion to occur. Hence it is expensive (heap 
allocation is several times slower than a 'typical' utf conversion, and 
there's potential lock-contention to deal with also). This is partly why 
there was some pushback against such properties in the past; especially 
when you can add them yourself using the funky array-prop syntax 
(demonstrated above).
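
To illustrate what "providing somewhere for the conversion to occur" could mean, here is a rough sketch of a converter that writes into a caller-supplied buffer instead of allocating. The helper name toUTF8Into is hypothetical and not part of Phobos; the only library call used is std.utf.encode, and the code assumes present-day D (D2).

```d
import std.utf : encode;

/// Hypothetical helper: encodes `src` as UTF-8 into the caller-supplied
/// `dst` and returns the slice of `dst` that was written. Nothing is
/// allocated on the heap; `dst` must be large enough (4 bytes per code
/// point is always sufficient).
char[] toUTF8Into(const(dchar)[] src, char[] dst)
{
    size_t used = 0;
    foreach (dchar c; src)
    {
        char[4] tmp;                       // scratch space for one code point
        immutable n = encode(tmp, c);      // number of UTF-8 code units written
        assert(used + n <= dst.length, "destination buffer too small");
        dst[used .. used + n] = tmp[0 .. n];
        used += n;
    }
    return dst[0 .. used];
}

unittest
{
    char[64] stackBuf;                     // lives on the stack
    dstring text = "héllo"d;

    char[] converted = toUTF8Into(text, stackBuf[]);
    assert(converted == "héllo");
    assert(converted.ptr is stackBuf.ptr); // result is a slice of our buffer
}
```

The trade-off is that the caller now owns the sizing and lifetime of the buffer, which is exactly the kind of detail a built-in .utf8 property would have to hide behind an allocation.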

There's nothing wrong with convenience props and so on, but if the ones 
built into the compiler are expensive to use, D will inevitably get a 
reputation for being slow and/or heap-bound; just like Java did ~ 
deserved or otherwise. D currently offers a number of alternatives anyway.

Again, why not use a String aggregate instead? To hide/abstract the 
distinction between Unicode types? I suspect that would be both more 
efficient and more convenient? Having written just such a class, I can 
attest to these attributes.
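
As a very rough idea of the shape such an aggregate could take (this is an illustrative sketch, not the class mentioned above; the name Text and its members are invented here, and it assumes present-day D2 with Phobos):

```d
import std.utf : toUTF8, toUTF16, toUTF32;

/// Hypothetical aggregate: stores the text once as UTF-8 and converts to
/// the other encodings only on request, caching each result so that
/// repeated requests do not convert (or allocate) again.
struct Text
{
    private string  narrow;   // UTF-8, always present
    private wstring wide;     // UTF-16 cache, filled on first use
    private dstring full;     // UTF-32 cache, filled on first use

    this(string s)  { narrow = s; }
    this(wstring s) { narrow = toUTF8(s); wide = s; }
    this(dstring s) { narrow = toUTF8(s); full = s; }

    string utf8() { return narrow; }

    wstring utf16()
    {
        if (wide is null) wide = toUTF16(narrow);
        return wide;
    }

    dstring utf32()
    {
        if (full is null) full = toUTF32(narrow);
        return full;
    }
}

unittest
{
    auto t = Text("héllo"d);
    assert(t.utf8 == "héllo");
    assert(t.utf16 == "héllo"w);
    assert(t.utf16 is t.utf16);   // second request reuses the cached array
}
```

Callers then pass a Text around and ask for whichever encoding an API needs, so the conversion cost is paid at most once per encoding rather than at every call boundary.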