string is rarely useful as a function argument

Gor Gyolchanyan gor.f.gyolchanyan at gmail.com
Thu Dec 29 06:58:36 PST 2011


What if the string converted itself back and forth between utf-8 and
utf-32 as necessary (utf-8 for storing and utf-32 for processing):

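import std.conv : to;

struct String
{
public:
    // True while the text is held as utf-8, false while it is utf-32.
    bool encoded() @property const
    {
        return _encoded;
    }

    // Convert the stored text to utf-8 (true) or utf-32 (false) if needed.
    bool encoded(bool should) @property
    {
        if(should)
        {
            if(!_encoded)
            {
                _utf8 = to!string(_utf32);
                _encoded = true; // set the field; assigning the property would recurse
            }
        }
        else
        {
            if(_encoded)
            {
                _utf32 = to!dstring(_utf8);
                _encoded = false;
            }
        }
        return _encoded;
    }

    // Here goes the part where you get to use the string

private:
    bool _encoded; // tells which member of the union is currently valid
    union
    {
        string _utf8;
        dstring _utf32;
    }
}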

This has a lot of drawbacks and is purely a curiosity. The idea is to
express the encoding of a string as a property of the string, rather
than as a difference between separate types of strings.
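
To make the trade-off between the two encodings concrete, here is a
minimal sketch, assuming only std.conv from Phobos. The sample text and
the variable names are made up for illustration; it only shows how the
sizes and indexing differ, not how the struct above would be used.

```d
import std.conv : to;

void main()
{
    // Hypothetical sample: 7 ASCII characters followed by 3 Cyrillic ones.
    string  utf8  = "hello, мир";      // utf-8: 1 to 4 bytes per code point
    dstring utf32 = to!dstring(utf8);  // utf-32: always 4 bytes per code point

    // .length counts code units, so the two forms disagree.
    assert(utf8.length == 13);                  // 7 * 1 + 3 * 2 bytes
    assert(utf32.length == 10);                 // one unit per code point
    assert(utf32.length * dchar.sizeof == 40);  // bytes used by the utf-32 form

    // Indexing the utf-32 form yields a whole code point;
    // indexing the utf-8 form yields a single byte of the encoding.
    assert(utf32[7] == 'м');

    // Converting back to utf-8 is lossless.
    assert(to!string(utf32) == utf8);
}
```

For mostly-ASCII text the utf-32 form approaches four times the size of
the utf-8 form, which is the cost the conversion on demand tries to
confine to the processing phase.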

On Thu, Dec 29, 2011 at 1:02 PM, Walter Bright
<newshound2 at digitalmars.com> wrote:
> On 12/29/2011 12:12 AM, Gor Gyolchanyan wrote:
>>
>> This is a great idea! In this case the default string will be a
>> random-access range, not a bidirectional range. Also, processing
>> dstring is faster than string, because no decoding needs to be done.
>> Processing power is more expensive than memory. utf-8 is valuable
>> only to pass it as an ASCII string (which is not too common) and to
>> store large chunks of it. Both these cases are much less common than
>> all the rest of string processing.
>
>
> dstring consumes 4x the memory, and this can easily cause perf degradations
> due to thrashing and poor cache locality.



-- 
Bye,
Gor Gyolchanyan.

