.NET on a string

Mon Mar 23 10:41:45 PDT 2009

On Sun, 22 Mar 2009 01:31:28 -0400, Cristian Vlasceanu wrote:

>>>
>>> The idea of slices and arrays being distinct types does seem to have
>>> advantages. I've seen a couple of mentions of this lately, but has
>>> there been a *rigorous* discussion?
>>
>> There has been.  But there are very good reasons to keep arrays and
>> slices the same type.  Even in C# and Java, a substring is the same
>> type as a string.  It allows iterative patterns such as:
>>
>> str = str[1..$];
>>
>
> I am afraid that there's a fallacy in your argument: substrings ARE
> strings: 1) they are full copies of the characters in the given range,
> and 2) once the substring is created it goes its own merry way (i.e. it
> does not keep track of any relationship to the "original" string).
> Slices ARE NOT arrays. Slices are more like "views" into the original
> array. It is like the difference between the icon and the saint / deity
> that it represents.

I thought one of the benefits of having immutable strings is that  
substrings were just pointers to slices of the original data in Java and  
.NET.  So every time I do a substring in Java and .NET, it creates a copy  
of the data?  That seems very wasteful, especially when the data is  
immutable...

Note I have no intimate knowledge of the inner workings of Java and .NET;  
I just went on what logically makes sense.

Your statement that slices are not arrays depends on your definition of  
slice and array.  To me, slices and arrays are identical.  A slice is  
simply a subset of the array.  It's like saying subsets are a  
different type than sets.  I suppose it depends on which language you  
learned about slices in (for me it was D).
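
To make that concrete, here is a minimal sketch in D of what I mean (the
helper name sumByPeeling is made up for illustration).  A slice has the
same static type as the array it came from and points into the same
storage, so re-slicing in a loop copies nothing:

```d
// Hypothetical helper: sums an array using the str = str[1..$] pattern.
int sumByPeeling(int[] rest)
{
    int sum = 0;
    while (rest.length > 0)
    {
        sum += rest[0];
        rest = rest[1 .. $];   // re-slice; no elements are copied
    }
    return sum;
}

void main()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] s = a[1 .. 3];       // a slice of a

    // The slice and the array have the same type.
    static assert(is(typeof(s) == typeof(a)));

    // The slice is a view: it points into a's storage.
    assert(s.ptr == a.ptr + 1);
    s[0] = 20;
    assert(a[1] == 20);

    assert(sumByPeeling(a) == 33);   // 1 + 20 + 3 + 4 + 5
}
```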

The only issue I see is the builtin append operation.  Everything else has  
a clean solution already.
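
As a sketch of the kind of ambiguity I mean with append (my own
illustration; the helper sameStart is made up): after appending to a
slice, whether the result still shares storage with the original array is
decided by the runtime, not by the type.  The assertions below describe
what I would expect from a current D runtime, which reallocates rather
than overwrite elements the original array still uses; older
implementations could, as I understand it, append in place instead.

```d
// Hypothetical helper: true if two slices start at the same address.
bool sameStart(const(int)[] x, const(int)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] s = a[0 .. 2];        // view of the first two elements
    assert(sameStart(a, s));    // before the append, s aliases a

    s ~= 99;                    // append to the slice

    // Assumption: a current D runtime sees that a[2] is still in use and
    // moves s to fresh storage instead of overwriting it.
    assert(a[2] == 3);
    assert(!sameStart(a, s));
    assert(s == [1, 2, 99]);
}
```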

> Another point that I have a hard time getting across (even to the
> language heavy-weights) is that just because it is easy to represent
> arrays and slices seamlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS
> BACKEND it does not mean it is going to work as smoothly and seamlessly
> in other systems. The .NET backend that I am working on is the case in
> point. If instead of using .NET built-in arrays I craft my own
> representation (to stay compatible with DMD's way of doing arrays and
> slices) then I give up interoperability with other languages -- and
> that would defeat the point of doing D on .NET to begin with.

Fair enough, but I think restricting the design of D to cater to other  
back ends is probably not a huge driver for Walter.  It's not without  
precedent that people adapting languages to .NET have to introduce syntax  
changes to the language, no?  I'm thinking of C++.NET.

> Your proposed solution is interesting but implementation-specific. I am
> afraid that I cannot use it with .NET (I just generate IL code, which
> is more high-level than "ordinary" assembly code).
>
> I passed a proposal of my own to Walter and Andrei, and that is to have
> D coders explicitly state the intent of using a slice with the "ref"
> keyword; "ref" is already a legal token in D (at least in 2.0), although
> it is only valid in the context of a parameter list or foreach argument
> list. It is not legal to say "ref int j = i;" in a declaration, for
> example. But it is a trivial change in the parser (I have implemented
> this change as a proof of concept / language extension research) to
> allow ref (just for slices), as in "ref int[] s = a[1..2];", outside
> parameter and foreach argument lists.
>
> I think that "ref" makes sense, because slices, like I said, are
> conceptually views (or references) into the "true" arrays. This simple
> change would a) make D code more self-documenting, and b) give a very
> powerful hint to the compiler. Also, the "ref" semantics is backwards
> compatible with the existing cases where "ref" is allowed.

That is an interesting idea.  But I have a couple of problems with it:

First, when I see ref int[] s, I think "reference to an array," not "this  
array references data from another array."
Second, your proposed default (not using ref) is to copy data everywhere,  
which is not good for performance.  Most of the time, arrays are passed  
without needing to "own" the data, so making copies everywhere you forgot  
to put ref would be hard to deal with.  It's also completely incompatible  
with existing code, which expects reference semantics without using ref.
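
A small sketch of the existing behavior I am referring to (zeroFirst is a
made-up function): passing a slice hands over a view with no ref involved,
and a copy happens only where the code asks for one with .dup.

```d
// Hypothetical function: existing code like this relies on arr sharing
// storage with the caller's array, without any ref on the parameter.
void zeroFirst(int[] arr)
{
    arr[0] = 0;
}

void main()
{
    int[] a = [1, 2, 3];

    zeroFirst(a[1 .. $]);         // passes a view; no elements are copied
    assert(a == [1, 0, 3]);       // the caller sees the write

    int[] c = a.dup;              // an explicit copy, only when asked for
    zeroFirst(c);
    assert(c == [0, 0, 3]);
    assert(a == [1, 0, 3]);       // a is unaffected by writes to the copy
}
```

Under the quoted proposal, as I read it, the first call would stop
writing through to a unless ref were added.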

-Steve