Casting between char[]/wchar[]/dchar[]
Hasan Aljudy
hasan.aljudy at gmail.com
Sat Aug 5 14:43:36 PDT 2006
kris wrote:
> Hasan Aljudy wrote:
>
>>
>>
>> Walter Bright wrote:
>>
>>> kris wrote:
>>>
>>>> Hasan Aljudy wrote:
>>>>
>>>>> What are the rules for implicit/explicit casting between char[] and
>>>>> wchar[] and dchar[] ?
>>>>>
>>>>> When one casts (explicitly or implicitly) does the compiler
>>>>> automatically invoke std.utf.toUTF*()?
>>>>>
>>>>> Here's an idea that should simplify much of string handling in D:
>>>>> allow char[] and wchar[] and dchar[] to be castable implicitly to
>>>>> each other, provided that the compiler invokes the appropriate
>>>>> std.utf.toUTF* method.
>>>>> I think this is perfectly safe; no data is lost, and string
>>>>> handling can become much more flexible.
>>>>>
>>>>> Instead of writing three versions of the same function for each of
>>>>> char[], wchar[], and dchar[], one could just write a wchar[] version
>>>>> (for example) and the compiler would handle the conversion from/to
>>>>> char[] and dchar[].
>>>>>
>>>>> This also relieves developers from writing templatized
>>>>> functions/classes when they deal with strings.
>>>>>
>>>>> Thoughts?
>>>>
>>>> This one was beaten soundly around the head & shoulders in the past :)
>>>>
>>>> In a systems language like D, one could argue that hidden
>>>> conversions and/or translations (a) can mask what would otherwise be
>>>> unintended compile-time errors (b) can be terribly detrimental to
>>>> performance where multiple conversions are implicitly applied. Such
>>>> an environment could potentially put CoW to shame in terms of heap
>>>> abuse -- recall some of the recent CoW examples, and sprinkle in a
>>>> few unintended conversions for good measure :)
>>>>
>>>> IIRC, the last time this came up there was a pretty strong feeling
>>>> that such things should be explicit (partly because it can be an
>>>> expensive operation ~ likely sucking on the heap also).
>>>
>>> Yes. It's hard to judge where the line is, but too many implicit
>>> conversions leads to very hard to understand/debug programs.
>>
>> Can I ask you at least to simplify the conversion by adding utf*
>> properties to char/wchar/dchar arrays?
>>
>> so, if I have:
>> ----
>> char[] process( char[] str ) { ... }
>>
>> ...
>>
>> dchar[] my32str = .....;
>>
>> //I can write
>> my32str = process( my32str.utf8 ).utf32;
>>
>> //instead of
>> //my32str = toUTF32( process( toUTF8( my32str ) ) );
>> ----
>>
>>
>
>
> er, you can do that yourself, Hasan?
>
> char[] utf8 (dchar[] s)
> {
> ...
> }
>
> dchar[] utf32 (char[] s)
> {
> ...
> }
>
> etc, followed by:
>
> > char[] process( char[] str ) { ... }
> >
> > ...
> >
> > dchar[] my32str = .....;
> >
> > //I can write
> > my32str = process( my32str.utf8 ).utf32;
> >
> > //instead of
> > //my32str = toUTF32( process( toUTF8( my32str ) ) );
>
I know, but
1: The syntax is still not documented.
2: I'm talking about making these properties part of the standard.
actually, I think:
alias toUTF8 utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;
would do the trick.
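To make that concrete, here's a minimal sketch (D1-era syntax, matching std.utf of the time) of how those aliases combine with D's array-property call rewriting (`arr.fn` becomes `fn(arr)` when `fn`'s first parameter is an array). The `process` body is a placeholder for illustration:

```d
import std.utf;

// Aliases give std.utf's converters the short property-style names.
alias toUTF8  utf8;
alias toUTF16 utf16;
alias toUTF32 utf32;

// Placeholder: stands in for any function that only accepts char[].
char[] process(char[] str)
{
    return str;
}

void main()
{
    dchar[] my32str = "hello"d.dup;

    // Property-style call, rewritten by the compiler to
    // toUTF32(process(toUTF8(my32str))).
    my32str = process(my32str.utf8).utf32;

    assert(my32str == "hello"d);
}
```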
>
> However, this is sucking on the heap, since you're not providing
> anywhere for the conversion to occur. Hence it is expensive (heap
> allocation is several times slower than a 'typical' utf conversion, and
> there's potential lock-contention to deal with also). This is partly why
> there was some pushback against such properties in the past; especially
> when you can add them yourself using the funky array-prop syntax
> (demonstrated above).
>
> There's nothing wrong with convenience props and so on, but if the ones
> built-in to the compiler are expensive to use, D will inevitably get a
> reputation for being slow and/or heap-bound; just like Java did ~
> deserved or otherwise. D currently offers a number of alternatives anyway.
Doesn't COW suck on the heap? object allocation? array concatenation?
increasing the length property?
I suppose one could write custom allocators for these "temporary"
conversions. For example, pre-allocate a chunk of heap for temporary utf
conversions (10 K would suffice, I think) and use it like a stack to
make the allocation faster?
Honestly, I don't know how that would work, but I bet someone else does,
and I bet that person can write such an allocator.
Then, integrating that allocator into std.utf would make it faster to
use the standard utf conversion properties. No?
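The stack-like scratch region described above might look something like this hypothetical sketch: a fixed pre-allocated buffer with a bump pointer, falling back to the heap only on overflow. All names here are invented; this is not part of std.utf:

```d
// Pre-allocated scratch region for short-lived UTF conversion results.
const size_t ScratchSize = 10 * 1024;   // "10 K would suffice"

char[ScratchSize] scratch;   // the pre-allocated chunk of memory
size_t scratchTop = 0;       // bump pointer: top of the "stack"

// Carve a temporary slice out of the scratch region.
char[] scratchAlloc(size_t n)
{
    if (scratchTop + n > scratch.length)
        return new char[n];   // region exhausted: fall back to the heap
    char[] slice = scratch[scratchTop .. scratchTop + n];
    scratchTop += n;
    return slice;
}

// Release everything allocated since 'mark' in one step, stack-style.
void scratchRelease(size_t mark)
{
    scratchTop = mark;
}
```

A converter variant that writes into a caller-supplied buffer could then fill a `scratchAlloc`'d slice, and the caller would `scratchRelease` after the temporary is no longer needed, so no per-conversion heap allocation occurs on the fast path.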
>
> Again, why not use a String aggregate instead? To hide/abstract the
> distinction between Unicode types? I suspect that would be both more
> efficient and more convenient? Having written just such a class, I can
> attest to these attributes.
Because the standard library functions always expect a char[].
What you did with mango was write a whole library, not just a String class.
BTW, are there tutorials for using mango Strings?
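For readers unfamiliar with the idea, a String aggregate of the kind kris describes could be sketched roughly like this: one canonical encoding stored internally, with other encodings produced lazily and cached. This is not mango's actual String API; the names and layout here are invented for illustration:

```d
import std.utf;

// Hypothetical String aggregate hiding the Unicode encoding choice.
class String
{
    private dchar[] data;     // canonical UTF-32 storage
    private char[]  cached8;  // lazily filled UTF-8 view

    this(dchar[] s)
    {
        data = s;
    }

    // Convert once on first access, then reuse the cached result.
    char[] utf8()
    {
        if (cached8 is null)
            cached8 = toUTF8(data);
        return cached8;
    }

    dchar[] utf32()
    {
        return data;
    }
}
```

Callers pass the String around and ask for whichever view they need, so a value converted once can cross several char[]-expecting APIs without repeated conversions.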
More information about the Digitalmars-d mailing list