[Issue 1654] New: Array concatenation should result in mutable or invariant depending on usage

d-bugmail at puremagic.com d-bugmail at puremagic.com
Fri Nov 9 11:45:37 PST 2007


http://d.puremagic.com/issues/show_bug.cgi?id=1654

           Summary: Array concatenation should result in mutable or
                    invariant depending on usage
           Product: D
           Version: 2.007
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla at digitalmars.com
        ReportedBy: schveiguy at yahoo.com


One of the great aspects of D is the array concatenation operator.  However,
the current compiler sets the type of the result according to the operands. 
This prevents using invariant or const arrays to build a mutable array without
doing extra work.  For example, the toStringz function:

char *toStringz(const(char)[] s)
{
  // allocate room for the string plus the terminating zero
  char[] copy = new char[s.length + 1];
  copy[0..s.length] = s;
  copy[s.length] = 0;
  return copy.ptr;
}

This could easily be written as:
return (s ~ "\0").ptr;

Except that the result of the concatenation is a const(char)[].

But why does it need to be const?  I'm guessing the reason is that, when
dealing with functions that take invariant strings, it would be ugly to have
to cast to invariant after every concatenation.  And of course, functions
that use invariant strings can be optimized differently than functions that
use mutable or even const strings.

The issue I see is that the true result *IS* mutable, because it is freshly
allocated on the heap and nothing else refers to it.  It's artificially typed
as invariant (or const) to keep the type the same as the operands.

So here is a proposal that would allow efficiency, and utilization of the fact
that concatenation always results in newly allocated data:

First, there should be two internal functions implementing the array
concatenation operator: one that takes two invariant arrays and one that takes
two const arrays.  I'll call them icat and ccat respectively.  Both will return
mutable arrays.  The icat function can be pure once pure functions are
supported.

When the compiler encounters an array concatenation in code, if at least one
argument is not invariant, then ccat will be used.  If both arguments are
invariant, then icat will be used.
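
To make the two helpers concrete, here is a minimal library-level sketch.  It
is only an illustration under assumptions: the real icat and ccat would be
compiler internals and generic over the element type, whereas these are
ordinary functions fixed to char.  The code uses the immutable keyword, which
is how current D spells invariant.

```d
// Illustrative stand-ins for the proposed internal functions.  The names
// icat and ccat come from the proposal; the bodies are assumptions.

// Both operands are immutable, so this version can be marked pure.
char[] icat(immutable(char)[] a, immutable(char)[] b) pure
{
    auto result = new char[a.length + b.length]; // fresh heap allocation
    result[0 .. a.length] = a[];
    result[a.length .. $] = b[];
    return result; // mutable: nothing else refers to this memory
}

// At least one operand is not immutable, so accept const views of both.
char[] ccat(const(char)[] a, const(char)[] b)
{
    auto result = new char[a.length + b.length];
    result[0 .. a.length] = a[];
    result[a.length .. $] = b[];
    return result;
}
```

With helpers like these, the body of toStringz would reduce to
return ccat(s, "\0").ptr;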

Regardless of which function is used, if the resulting rvalue is expected to
be invariant, then an implicit conversion to invariant is allowed.  This means
that the following code does not need invariant casts:

char[] blah = "hello".dup;
string blah2 = blah ~ "world";

If the result is assigned to an lvalue that is either const or mutable, then no
cast is needed (a mutable array is implicitly convertible to const).
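
The conversion to invariant is safe here because no other reference to the
newly allocated array exists.  A hand-written approximation of that step,
assuming current Phobos, where std.exception.assumeUnique converts a mutable
array to an immutable one without copying, might look like this.  The function
name concatAsString is hypothetical.

```d
import std.exception : assumeUnique;

// Hypothetical helper: concatenate two const views and hand the result
// out as an immutable string without making a second copy.
string concatAsString(const(char)[] a, const(char)[] b)
{
    char[] fresh = new char[a.length + b.length];
    fresh[0 .. a.length] = a[];
    fresh[a.length .. $] = b[];
    // Valid only because `fresh` is the sole reference to this memory.
    return assumeUnique(fresh);
}
```

The proposal amounts to letting the compiler make this uniqueness argument
itself, so that neither the helper nor a cast has to be written.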

This would allow us to use the full potential of array concatenation without
having to worry about casting away invariant or writing several lines of code
to get around this problem.

I'll further point out that the workarounds for the current behavior do not
allow efficient concatenation.  For example:

const(char)[] a1, a2;

// method 1, just dup it.
// the problem is that we make a needless temporary copy of the data
char[] cat1 = (a1 ~ a2).dup;

// method 2, build a temporary array
// the problem is that the initialization of the array is needless.
char[] cat2 = new char[a1.length + a2.length]; // needless init of memory
cat2[0..a1.length] = a1;
cat2[a1.length..$] = a2;

In addition, under this proposal idup would not be needed to create an
invariant array from the concatenation of two mutable or const arrays.
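
For comparison, the needless initialization in method 2 can be avoided by hand,
which hints at what an internal concatenation function is free to do.  This is
a sketch that assumes std.array.uninitializedArray from current Phobos (not
something available in 2.007); catNoInit is a hypothetical name.

```d
import std.array : uninitializedArray;

// Hypothetical helper: build the mutable result without first filling
// the new memory with char.init.
char[] catNoInit(const(char)[] a1, const(char)[] a2)
{
    auto result = uninitializedArray!(char[])(a1.length + a2.length);
    result[0 .. a1.length] = a1[];  // every element is overwritten here...
    result[a1.length .. $] = a2[];  // ...and here, so none stays uninitialized
    return result;
}
```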

