.NET on a string

Mon Mar 23 19:24:30 PDT 2009

> I thought one of the benefits of having immutable strings is that 
> substrings were just pointers to slices of the original data in Java and 
> .NET.  So every time I do a substring in Java and .NET, it creates a copy 
> of the data?  That seems very wasteful, especially when the data is 
> immutable...
>

Fair enough, I meant substrings in more of a C++ way. Perhaps discussing 
slices / arrays by comparison with substrings / strings is a bad idea, 
because as you point out strings are immutable whilst arrays are not.

My main problem with slices is that when you a) append to them, or b) resize 
them (by assigning to the length property), and the new size goes past the 
bounds of the "original" array, then the slice gets "divorced" from the 
array, and from a light-weight "view" it gets promoted to a "first class" 
array (at least this is the behavior in 2.025).

This means that for arrays of value types, you can modify the original array 
indirectly via the outstanding slices, but only until either a) or b) occurs, 
and that happens at run time, where it is not visible in the source.
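
To make the "divorce" concrete, here is a minimal sketch of what I mean. The 
function names are made up for illustration, and the asserts reflect how a 
recent compiler and runtime behave; the exact rule for when the copy happens 
in 2.025 may differ.

```d
// Hypothetical helper: returns true if appending detached the slice.
bool detachedAfterAppend()
{
    int[] a = [1, 2, 3, 4];
    int[] s = a[0 .. 2];   // light-weight view of the first two elements

    s[0] = 10;             // writes through to the original array
    assert(a[0] == 10);

    s ~= 99;               // a) append: the runtime reallocates the slice
    s[0] = 20;             // from here on only the copy is modified

    return a[0] == 10 && a[2] == 3 && s.ptr !is a.ptr;
}

// Hypothetical helper: same experiment, but growing via the length property.
bool detachedAfterResize()
{
    int[] a = [1, 2, 3, 4];
    int[] s = a[0 .. 2];

    s.length = 3;          // b) resize: again a fresh allocation
    s[0] = 20;             // does not reach the original array

    return a[0] == 1 && a[2] == 3 && s[2] == 0 && s.ptr !is a.ptr;
}
```

Nothing in the declaration of s says whether it is still a view or already 
an array of its own; that is decided by what happened to it earlier.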

I would prefer to be able to see explicitly in the source code what is a 
"full array", and what is just a "view".

On the other hand, this is not a deal breaker, because I can always roll my 
own slice like this:

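struct ArraySlice(T) {
private:
     T[] a;        // reference to the underlying array
     size_t begin; // index where the slice begins
     size_t end;   // one past the index where the slice ends

public:
     // indices are relative to the start of the slice
     T opIndex(size_t i) { return a[begin + i]; }

     // D passes the assigned value first, then the index
     void opIndexAssign(T val, size_t i) { a[begin + i] = val; }

     size_t length() { return end - begin; }

     // comment this function out to prevent resizing
     void length(size_t newLength) {
           end = begin + newLength;
           // never grow past the underlying array
           if (end > a.length) { end = a.length; }
     }

     // support foreach
     int opApply(int delegate(ref T) dg) {
           foreach (i; begin .. end) {
                // a non-zero result means the loop body did a break / return
                if (auto r = dg(a[i])) return r;
           }
           return 0;
     }
}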
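
As a rough usage sketch (a trimmed copy of the struct is repeated so the 
snippet stands on its own; the demo function and its data are made up for 
illustration, and it relies on everything living in one module so that the 
private fields can be initialized directly):

```d
// Trimmed-down copy of the slice type: a fixed view, no resizing.
struct ArraySlice(T)
{
    private T[] a;             // reference to the underlying array
    private size_t begin, end; // half-open range [begin, end)

    T opIndex(size_t i) { return a[begin + i]; }
    void opIndexAssign(T val, size_t i) { a[begin + i] = val; }
    size_t length() const { return end - begin; }
}

// Hypothetical demo: writes through the view and returns the original array.
int[] demo()
{
    int[] data = [1, 2, 3, 4, 5];
    auto view = ArraySlice!int(data, 1, 4); // views the elements 2, 3, 4

    view[0] = 20;              // lands in data[1]
    view[2] = 40;              // lands in data[3]

    assert(view.length == 3);
    assert(view[1] == 3);
    return data;               // [1, 20, 3, 40, 5]
}
```

Because the type has no append and no way to grow, it stays a view for its 
whole lifetime, and that is visible in the source wherever it is declared.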

> That is an interesting idea.  But I have a couple problems with it:
>
> First, when I see ref int[] s, I think reference to an array, not this 
> array references data from another array.
> Second, your proposed default (not using ref) is to copy data everywhere, 
> which is not good for performance.  Most of the time, arrays are passed 
> without needing to "own" the data, so making copies everywhere you forgot 
> to put ref would be hard to deal with.  It's also completely incompatible 
> with existing code, which expects reference semantics without using ref.
>

That's exactly right, but for D / .NET I do not expect existing code to 
compile as-is. For example, I do not intend to port any of the Phobos / 
Tango code. I envision all I/O and system stuff going through 
[mscorlib]System.

Cristian