Const, invariant, strings and "constant data need never be copied"

Stewart Gordon smjg_1998 at yahoo.com
Fri Nov 2 19:05:04 PDT 2007


"Janice Caron" <caron800 at googlemail.com> wrote in message 
news:mailman.521.1194042761.16939.digitalmars-d at puremagic.com...
> On 11/1/07, Stewart Gordon <smjg_1998 at yahoo.com> wrote:
>> In DMD 2.006, the definition of string was changed from const(char)[] to
>> invariant(char)[] (and similarly wstring and dstring).  This change has
>> no doubt broken a fair amount of D 2.x code.
>
> All of my D2 code compiled without change.

Because your D2 code doesn't manipulate strings?

>> Declaring these with invariant parameters therefore means that it
>> is often necessary to .idup a string just to pass it to one of these
>> functions.
>
> That's not true. If /all/ strings are invariant, throughout, then
> everything works.

If /all/ strings are invariant, then you're very limited in what 
manipulations you can perform.

>> Moreover, if a piece of code manipulates strings with a mixture
>> of direct modification and calls to std.string functions, it necessitates
>> quite a bit of copying of strings.
>
> No it doesn't, it merely means ensuring that the reference is unique
> and then calling assumeUnique().

Only in the cases where ensuring that the reference is unique is possible.
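
To make the distinction concrete, here is a minimal sketch, assuming a current D2 compiler in which invariant is spelled immutable and assumeUnique lives in std.exception. The helper name makeGreeting and the variable names are hypothetical. It shows one case where uniqueness is easy to guarantee (a buffer built locally) and one where it is not (a second reference to the same memory exists), so a copy via .idup remains the only honest option.

```d
import std.exception : assumeUnique;

// Hypothetical helper: builds a string in a buffer that no other code can see.
string makeGreeting(string name)
{
    char[] buf = "Hello, ".dup;   // fresh, mutable, locally owned
    buf ~= name;
    buf ~= '!';
    // Safe: buf is the only reference to this memory, so no copy is needed.
    return assumeUnique(buf);
}

void main()
{
    assert(makeGreeting("D") == "Hello, D!");

    // Here uniqueness cannot be promised: `other` refers to the same memory.
    char[] buffer = "abc".dup;
    char[] other = buffer;
    string frozen = buffer.idup;  // a real copy, unaffected by later writes
    other[0] = 'x';
    assert(frozen == "abc");
    assert(buffer == "xbc");      // the shared buffer did change
}
```

Had assumeUnique been used on buffer in the second half, the write through other would have modified data that the type system claims is immutable.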

>> void main(string[] a) {
>>     char[] text = cast(char[]) read(a[1]);
>
> Well that line's wrong for a start. It should be
>    string text = cast(string)read(a[1]);
>
> There's your problem right there.

Firstly, that was D1 code.  In D1, string is simply an alias of char[].

Secondly, if it were string, the rest of my code wouldn't work under D2, 
because there the string type denotes immutable data.

>>     foreach (ref char c; text) {
>>         if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
>>             c += 13;
>>         } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
>>             c -= 13;
>>         }
>>     }
>
> I believe that should be
>
>    bool willChange = false;
>    foreach (char c; text) if (inPattern(c, "A-Za-z")) { willChange = true; break; }
>    if (willChange)
>    {
>        char[] s = text.dup;
>        foreach (ref char c; s) {
>            if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm')) {
>                c += 13;
>            } else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z')) {
>                c -= 13;
>            }
>        }
>        text = assumeUnique(s);
>    }

There's the problem.  You've made the code more complicated to make the 
final copy conditional on something actually changing.  In an ideal world, 
it would be unnecessary to make that final copy at all (as far as the way my 
example uses it is concerned).

Moreover, your code loops twice, first to see if there's anything to change 
and then to perform the conversion.  That in itself incurs a performance 
hit.

> (Before this release, I would have written
>    text = cast(string)s;
> That still compiles without complaint, but assumeUnique() is better).
>
> The test to see if the string will change is good copy-on-write
> behavior. The rest is your code, adapted to how you're supposed to do
> things in D2.006. First you dup text, because that string /might/ be
> in ROM. Then you make your changes. When you've got what you want, you
> use assumeUnique() to turn it back into a string. This does /not/ make
> a copy.

You miss the point.  My example is of ad-hoc code to perform the conversion 
in place, because it is the most efficient mechanism with the constraints 
under which the application will ever perform it: the data is always loaded 
into RAM immediately before the conversion, and there is no desire to keep 
the 'before' data once the conversion has happened.
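
As an illustration of the in-place approach, here is a minimal sketch, assuming a current D2 compiler. The function name rot13InPlace is hypothetical, and a literal stands in for the file contents that the original example obtained from read(). The buffer is a plain mutable char[], so there is a single pass, no preliminary scan and no copy.

```d
// Hypothetical helper: applies ROT13 directly to a mutable buffer.
void rot13InPlace(char[] text)
{
    foreach (ref char c; text)
    {
        if ((c >= 'A' && c <= 'M') || (c >= 'a' && c <= 'm'))
            c += 13;   // first half of the alphabet moves forward
        else if ((c >= 'N' && c <= 'Z') || (c >= 'n' && c <= 'z'))
            c -= 13;   // second half moves back
    }
}

void main()
{
    // Stand-in for data freshly loaded into RAM; .dup yields a mutable buffer.
    char[] text = "Hello, World!".dup;
    rot13InPlace(text);
    assert(text == "Uryyb, Jbeyq!");

    // ROT13 is its own inverse, so a second pass restores the input.
    rot13InPlace(text);
    assert(text == "Hello, World!");
}
```

The cost of this choice is that the result is a char[] rather than a string, so passing it to a function that takes invariant parameters still needs either a copy or an assumeUnique call.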

<snip>
>> There are probably plenty
>> of more involved examples in which there's more difference than this
>> between the 1.x and 2.x code.
>
> If every string function you write obeys the copy-(only)-on-write
> protocol, then I don't see that.
<snip>

Well, I wasn't writing a string function there, so that's beside the point. 
If you're implementing a complicated string-manipulating algorithm, you're 
not necessarily going to separate every little step of the algorithm into a 
separate function.

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies 
on the 'group where everybody may benefit. 



