.NET on a string

Tue Mar 24 08:58:56 PDT 2009

On Mon, 23 Mar 2009 22:24:30 -0400, Cristian Vlasceanu  
<cristian at zerobugs.org> wrote:

>> I thought one of the benefits of having immutable strings is that
>> substrings were just pointers to slices of the original data in Java and
>> .NET.  So every time I do a substring in Java and .NET, it creates a copy
>> of the data?  That seems very wasteful, especially when the data is
>> immutable...
>>
>
> Fair enough, I meant substrings in more of a C++ way. Perhaps discussing
> slices / arrays by comparison with substrings / strings is a bad idea,
> because as you point out strings are immutable whilst arrays are not.
>

OK, I didn't think of that implementation.

> My main problem with slices is that when you a) append to them, or b)
> resize them (by assigning to the length property), and the new size goes
> past the bounds of the "original" array, then the slice gets "divorced"
> from the array, and from a light-weight "view" it gets promoted to a
> "first class" array (at least this is the behavior in 2.025).

It's even more bizarre than this :)  If you append to a slice that happens  
to point to the first bytes of the original array, it appends in place (no  
divorce!), possibly overwriting the (possibly immutable!) data still in  
the original array.  This is the bug I'm trying to fix with my proposals.
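
To make the two cases concrete, here is a rough sketch in D.  The helper pointsInto is made up for this example and is not a library function.  The first half should behave the same on any compiler; the last append is the version-dependent case, so the sketch only prints the value instead of asserting it.

```d
import std.stdio;

// Hypothetical helper: true if slice s still points into the memory of a.
bool pointsInto(const(int)[] s, const(int)[] a)
{
    return s.ptr >= a.ptr && s.ptr < a.ptr + a.length;
}

void main()
{
    int[] a = [1, 2, 3, 4, 5];

    // A slice is a view: writes through it show up in the original array.
    int[] mid = a[1 .. 3];
    mid[0] = 42;
    assert(a[1] == 42);
    assert(pointsInto(mid, a));

    // Appending to a slice taken from the middle cannot happen in place,
    // so the slice is copied and "divorced" from a.
    mid ~= 99;
    assert(!pointsInto(mid, a));
    assert(a[3] == 4);      // the original is untouched
    mid[0] = 7;
    assert(a[1] == 42);     // writes through mid no longer reach a

    // The troublesome case: a slice that starts at the head of the array.
    int[] head = a[0 .. 2];
    head ~= 99;
    // Assumption: with the 2.025 behavior described above, this append
    // happens in place and overwrites a[2].  A runtime that reallocates
    // here leaves a[2] == 3, so nothing is asserted.
    writeln("a[2] after appending to a[0 .. 2]: ", a[2]);
}
```

Nothing in the type of mid or head tells you which of these situations you are in.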

But I see your point.  I was aware of this aspect, but didn't really feel
like it was a huge problem.  After all, if the behavior is deterministic,
then you can know whether your slice still points to the original array.
But looking at it from your point of view, the scheme definitely has valid
issues.  Most other parts of the D language allow you to simply look at the
type of something and know what it means.  With arrays, you have to examine
the code that created/used the array to figure out whether it's an alias or
unique data.  That is a problem.

So maybe it is a worthwhile exercise to figure out if there is a way to  
embed the attributes of the array into the type.  I'll have to think about  
how this could be implemented in a way that makes sense, is realistic, and  
does not hinder performance or syntax.  There might still be a way to make  
this work.

>> That is an interesting idea.  But I have a couple of problems with it:
>>
>> First, when I see ref int[] s, I think "reference to an array", not "this
>> array references data from another array".
>> Second, your proposed default (not using ref) is to copy data everywhere,
>> which is not good for performance.  Most of the time, arrays are passed
>> without needing to "own" the data, so making copies everywhere you forgot
>> to put ref would be hard to deal with.  It's also completely incompatible
>> with existing code, which expects reference semantics without using ref.
>>
>>
>
> That's exactly right, but for D / .NET I do not expect existing code to
> compile as-is. For example, I do not intend to port any of the phobos /
> tango code. I envision all I/O and system stuff to go through
> [mscorlib]System.

That is what I would expect also, but part of the benefit of having a
language implemented on .NET is being able to port existing code from that
language to the .NET runtime.  Any code that was to be ported might suffer.
There are already instances of ref char[] or ref string in many
applications/libs that would cause strange bugs when ported to .NET.
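
For reference, this is roughly what ref on an array parameter means in D today, and what such existing code relies on.  The function names are made up for the example.

```d
// Without ref: the callee gets its own copy of the slice (pointer and
// length), but both copies still refer to the same elements.
void setFirst(int[] s)
{
    s[0] = 100;
}

// Without ref: appending only changes the callee's copy of the slice.
void appendLocal(int[] s)
{
    s ~= 4;
}

// With ref: the callee works on the caller's slice variable itself.
void appendShared(ref int[] s)
{
    s ~= 4;
}

void main()
{
    int[] a = [1, 2, 3];

    setFirst(a);
    assert(a[0] == 100);    // element data is shared even without ref

    appendLocal(a);
    assert(a.length == 3);  // the caller's length is unchanged

    appendShared(a);
    assert(a.length == 4 && a[3] == 4);
}
```

So ref here is about the slice variable, not about whether the data is aliased.  Code written against these semantics would change meaning if ref were redefined.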

Oh, and BTW, if I couldn't use Tango, I'd most certainly not use D.NET ;)   
I sort of loathe the .NET runtime libs, except for certain parts.

-Steve