Casting between char[]/wchar[]/dchar[]

Hasan Aljudy hasan.aljudy at gmail.com
Sat Aug 5 14:43:36 PDT 2006



kris wrote:
> Hasan Aljudy wrote:
> 
>>
>>
>> Walter Bright wrote:
>>
>>> kris wrote:
>>>
>>>> Hasan Aljudy wrote:
>>>>
>>>>> What are the rules for implicit/explicit casting between char[] and 
>>>>> wchar[] and dchar[] ?
>>>>>
>>>>> When one casts (explicitly or implicitly) does the compiler 
>>>>> automatically invoke std.utf.toUTF*()?
>>>>>
>>>>> Here's an idea that should simplify much of string handling in D:
>>>>> allow char[] and wchar[] and dchar[] to be castable implicitly to 
>>>>> each other, provided that the compiler invokes the appropriate 
>>>>> std.utf.toUTF* method.
>>>>> I think this is perfectly safe; no data is lost, and string 
>>>>> handling can become much more flexible.
>>>>>
>>>>> Instead of writing three versions of the same function for each of 
>>>>> char[], wchar[], and dchar[], one can just write a wchar[] version 
>>>>> (for example) and the compiler will handle the conversion from/to 
>>>>> char[] and dchar[].
>>>>>
>>>>> This also relieves developers from writing templatized 
>>>>> functions/classes when they deal with strings.
>>>>>
>>>>> Thoughts?
>>>>
>>>>
>>>> This one was beaten soundly around the head & shoulders in the past :)
>>>>
>>>> In a systems language like D, one could argue that hidden 
>>>> conversions and/or translations (a) can mask what would otherwise be 
>>>> unintended compile-time errors (b) can be terribly detrimental to 
>>>> performance where multiple conversions are implicitly applied. Such 
>>>> an environment could potentially put CoW to shame in terms of heap 
>>>> abuse -- recall some of the recent CoW examples, and sprinkle in a 
>>>> few unintended conversions for good measure :)
>>>>
>>>> IIRC, the last time this came up there was a pretty strong feeling 
>>>> that such things should be explicit (partly because it can be an 
>>>> expensive operation ~ likely sucking on the heap also).
>>>
>>>
>>> Yes. It's hard to judge where the line is, but too many implicit 
>>> conversions lead to programs that are very hard to understand and debug.
>>
>>
>>
>> Can I ask you at least to simplify the conversion by adding utf* 
>> properties to char/wchar/dchar arrays?
>>
>> so, if I have:
>> ----
>> char[] process( char[] str ) { ... }
>>
>> ...
>>
>> dchar[] my32str = .....;
>>
>> //I can write
>> my32str = process( my32str.utf8 ).utf32;
>>
>> //instead of
>> //my32str = toUTF32( process( toUTF8( my32str ) ) );
>> ----
>>
>>
> 
> 
> er, you can do that yourself, Hasan?
> 
> char[] utf8 (dchar[] s)
> {
>   ...
> }
> 
> dchar[] utf32 (char[] s)
> {
>   ...
> }
> 
> etc, followed by:
> 
>  > char[] process( char[] str ) { ... }
>  >
>  > ...
>  >
>  > dchar[] my32str = .....;
>  >
>  > //I can write
>  > my32str = process( my32str.utf8 ).utf32;
>  >
>  > //instead of
>  > //my32str = toUTF32( process( toUTF8( my32str ) ) );
> 

I know, but:
1: The syntax is still not documented.
2: I'm talking about making these properties part of the standard.

actually, I think:

alias toUTF8 utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;

would do the trick.
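
For instance, something like this (a sketch only ~ it relies on D's 
array-property call syntax, where a free function whose first parameter 
is an array can be invoked as if it were a property of that array; 
"process" here is just a placeholder, not a real library function):
----
import std.utf;

// the aliases let the std.utf conversions read as properties
alias toUTF8  utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;

// placeholder for any routine that only accepts char[]
char[] process( char[] str ) { return str; }

void main()
{
    dchar[] my32str = "hello";
    // property-style conversion in and out of the char[] routine
    my32str = process( my32str.utf8 ).utf32;
}
----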


> 
> However, this is sucking on the heap, since you're not providing 
> anywhere for the conversion to occur. Hence it is expensive (heap 
> allocation is several times slower than a 'typical' utf conversion, and 
> there's potential lock-contention to deal with also). This is partly why 
> there was some pushback against such properties in the past; especially 
> when you can add them yourself using the funky array-prop syntax 
> (demonstrated above).
> 
> There's nothing wrong with convenience props and so on, but if the ones 
> built-in to the compiler are expensive to use, D will inevitably get a 
> reputation for being slow and/or heap-bound; just like Java did ~ 
> deserved or otherwise. D currently offers a number of alternatives anyway.

Doesn't COW suck on the heap? object allocation? array concatenation? 
increasing the length property?

I suppose one could write custom allocators for these "temporary" 
conversions. For example, pre-allocate a chunk of heap for temporary UTF 
conversions (10 K would suffice, I think) and use it like a stack to 
make allocation faster?

Honestly, I don't know how that would work, but I bet someone else does, 
and I bet that person can write such an allocator.
Then, integrating that allocator into std.utf would make it faster to 
use the standard utf conversion properties. No?
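
Something along these lines, maybe (a rough sketch of the stack-like 
idea only; the names, the 10 K figure, and the fallback behaviour are 
all illustrative ~ nothing here is part of std.utf, and thread safety 
is ignored entirely):
----
const uint REGION_SIZE = 10 * 1024;

char[REGION_SIZE] region;   // pre-allocated chunk, untouched by the GC
uint top = 0;               // "stack pointer" into the region

// hand out a slice of the region; bump allocation is just an add
char[] tempAlloc( uint n )
{
    if ( top + n > region.length )
        return new char[n];         // region full: fall back to the heap
    char[] slice = region[top .. top + n];
    top += n;
    return slice;
}

// release every temporary buffer at once
void tempReset() { top = 0; }
----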

> 
> Again, why not use a String aggregate instead? To hide/abstract the 
> distinction between Unicode types? I suspect that would be both more 
> efficient and more convenient? Having written just such a class, I can 
> attest to these attributes.

Because the standard library functions always expect a char[].
What you did with mango was write a whole library, not just a String class.

BTW, are there tutorials for using mango Strings?



More information about the Digitalmars-d mailing list