.NET on a string

Tue Mar 17 10:03:56 PDT 2009

On Tue, 17 Mar 2009 10:59:21 -0400, Georg Wrede <georg.wrede at iki.fi> wrote:

> Walter Bright wrote:
>> Cristi talks about adapting D strings to .NET.
>>   
>> http://www.reddit.com/r/programming/comments/84urf/net_on_a_string/?already_submitted=true
>
> Actually, meant to ask earlier, but
>
> "[...in .NET,] System.Array and System.ArraySegment are distinct,  
> unrelated types. In D arrays slices are indistinguishable from arrays  
> and this creates the problems that I mentioned in the interview."
>
> Is this discussed somewhere?
>
> The idea of slices and arrays being distinct types does seem to have  
> advantages. I've seen a couple of mentions of this lately, but has there  
> been a *rigorous* discussion?

There has been.  But there are very good reasons to keep arrays and slices  
the same type.  Even in C# and Java, a substring is the same type as a  
string.  It allows iterative patterns such as:

str = str[1..$];

The main problem comes form having a substring overwrite data in another  
string via appending.  From what I can tell, that is the *only* issue with  
arrays and slices being the same type.  This problem can be solved in  
other ways.  I've put forth 2 solutions that as far as I know were not  
proven to be infeasible, but which never received any attention in the NG  
 from Walter or Andrei.

There was mention of a separate T[new] type, before my time with D, but I  
think you still have the same aliasing problems there unless you zero out  
the source instance on assignment, or simply disallow assigning a T[new]  
 from another T[new], which can affect the design of code, and make certain  
iterative patterns difficult if not impossible.

The two solutions I put forth are:

1. Storing the requested length of an allocated block in the GC.  This not  
only allows for more intuitive appending, but could also help the GC when  
scanning for pointers (you might not have to zero out the block first).   
This proposal received zero responses.

2. Storing the requested length of an allocated array at the front of the  
array.  I had a hackish scheme to use the most significant bit in the  
length field to identify if a slice is just beyond the allocated length.   
Therefore, an append operation would first check if it is the first  
element in a block, and if so check the allocated length before deciding  
whether to append in place or allocate a new block.  There were some  
questions on the proposal, but I don't believe anyone found any killer  
problems with it.  It has advantages and disadvantages over the first  
proposal and current implementation.  The main advantage is that the  
append code does not have to query the GC to get the requested length.

Aside from the append problem, I see no other drawbacks to having strings  
the way they are.

References:

Proposal 1:  
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146

Proposal 2:  
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437

-Steve