.NET on a string

Mon Mar 23 14:02:52 PDT 2009

On Mon, 23 Mar 2009 20:41:45 +0300, Steven Schveighoffer <schveiguy at yahoo.com> wrote:

> On Sun, 22 Mar 2009 01:31:28 -0400, Cristian Vlasceanu wrote:
>
>>>>
>>>> The idea of slices and arrays being distinct types does seem to have
>>>> advantages. I've seen a couple of mentions of this lately, but has there
>>>> been a *rigorous* discussion?
>>>
>>> There has been.  But there are very good reasons to keep arrays and slices
>>> the same type.  Even in C# and Java, a substring is the same type as a
>>> string.  It allows iterative patterns such as:
>>>
>>> str = str[1..$];
>>>
>>
>> I am afraid that there's a fallacy in your argument: substrings ARE
>> strings: 1) they are full copies of the characters in the given range,
>> and 2) once the substring is created it goes its own merry way (i.e. it
>> does not keep track of any relationship to the "original" string).
>> Slices ARE NOT arrays. Slices are more like "views" into the original
>> array. It is like the difference between the icon and the saint / deity
>> that it represents.
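A minimal D2 sketch of the distinction being argued about, using standard D semantics rather than the proposed .NET backend: a slice shares storage with the array it was taken from, while `.dup` makes an independent copy (closer to what a .NET substring gives you). It also runs the `str = str[1..$]` pattern quoted above. The function names are hypothetical, chosen only for illustration.

```d
import std.stdio : writeln;

/// Takes a view of a[1 .. 3] and writes through it. No elements are
/// copied, so the write shows up in the returned original array.
int[] mutateViaSlice(int[] a)
{
    int[] view = a[1 .. 3];     // (pointer, length) pair into a's storage
    view[0] = 42;
    return a;
}

/// Same, but through a .dup copy; a itself is left untouched.
int[] mutateViaCopy(int[] a)
{
    int[] copy = a[1 .. 3].dup; // fresh GC allocation
    copy[0] = 42;
    return a;
}

/// The iterative pattern from the thread: repeatedly drop the first
/// element by re-slicing (no copying), counting the steps taken.
/// Note: string slicing works on UTF-8 code units, not characters.
size_t dropAll(string str)
{
    size_t steps;
    while (str.length > 0)
    {
        str = str[1 .. $];
        ++steps;
    }
    return steps;
}

void main()
{
    writeln(mutateViaSlice([1, 2, 3, 4, 5])); // [1, 42, 3, 4, 5]
    writeln(mutateViaCopy([1, 2, 3, 4, 5]));  // [1, 2, 3, 4, 5]
    writeln(dropAll("hello"));                // 5
}
```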
>
> I thought one of the benefits of having immutable strings is that  
> substrings were just pointers to slices of the original data in Java and  
> .NET.  So every time I do a substring in Java and .NET, it creates a  
> copy of the data?  That seems very wasteful, especially when the data is  
> immutable...
>
> Note that I have no intimate knowledge of the inner workings of Java and
> .NET; I just went on what logically makes sense.
>
> Your statement that slices are not arrays depends on your definition of
> slice and array.  To me, slices and arrays are identical.  Slices are
> simply a smaller set of the array.  It's like saying subsets are a
> different type than sets.  I suppose it depends on in which language you
> learned about slices (for me it was D).
>
> The only issue I see is the builtin append operation.  Everything else  
> has a clean solution already.
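The append problem can be seen in a short sketch. It reflects the behaviour of current D2 runtimes, which postdate this 2009 thread: appending to a slice that does not end where its memory block's used data ends forces a reallocation, so the slice silently stops being a view of the original array. `AppendResult` and `appendToPrefix` are hypothetical names used only for this illustration.

```d
import std.stdio : writeln;

/// What happened when a prefix slice of an array was appended to.
struct AppendResult
{
    int[] original;     // the array the prefix was taken from
    int[] prefix;       // the prefix after appending
    bool aliasedBefore; // did prefix share a's storage before the append?
    bool aliasedAfter;  // ...and after it?
}

AppendResult appendToPrefix()
{
    int[] a = [1, 2, 3, 4, 5];
    int[] prefix = a[0 .. 2];
    immutable before = prefix.ptr == a.ptr;

    // prefix.capacity is 0 here: elements of a lie past the end of prefix,
    // so growing in place would overwrite a[2]. The runtime reallocates
    // and copies instead, and prefix no longer refers to a's storage.
    prefix ~= 99;

    return AppendResult(a, prefix, before, prefix.ptr == a.ptr);
}

void main()
{
    auto r = appendToPrefix();
    writeln(r.original);                           // [1, 2, 3, 4, 5]
    writeln(r.prefix);                             // [1, 2, 99]
    writeln(r.aliasedBefore, " ", r.aliasedAfter); // true false
}
```

Whether a given append reallocates therefore depends on runtime state rather than on anything visible in the types, which is the ambiguity being pointed at here.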
>
>> Another point that I have a hard time getting across (even to the
>> language heavyweights) is that just because it is easy to represent
>> arrays and slices seamlessly IN THE PARTICULAR CASE OF THE DIGITAL MARS
>> BACKEND, it does not mean it is going to work as smoothly and seamlessly
>> in other systems. The .NET backend that I am working on is a case in
>> point. If, instead of using .NET built-in arrays, I craft my own
>> representation (to stay compatible with DMD's way of doing arrays and
>> slices), then I give up interoperability with other languages, and that
>> would defeat the point of doing D on .NET to begin with.
>
> Fair enough, but I think restricting the design of D to cater to other  
> back ends is probably not a huge driver for Walter.  It's not without  
> precedent that people adapting languages to .NET have to introduce  
> syntax changes to the language, no?  I'm thinking of C++.NET.
>
>> Your proposed solution is interesting but implementation-specific. I am
>> afraid that I cannot use it with .NET (I just generate IL code, which is
>> more high-level than "ordinary" assembly code).
>>
>> I passed a proposal of my own to Walter and Andrei, and that is to have D
>> coders explicitly state the intent of using a slice with the "ref"
>> keyword; "ref" is already a legal token in D (at least in 2.0), albeit it
>> is only valid in the context of a parameter list or foreach argument list.
>> It is not legal to say "ref int j = i;" in a declaration, for example. But
>> it is a trivial change in the parser (I have implemented this change as a
>> proof of concept / language extension research) to allow ref (just for
>> slices) in places other than parameter and foreach argument lists:
>> "ref int[] s = a[1..2];".
>>
>> I think that "ref" makes sense, because slices, like I said, are
>> conceptually views (or references) into the "true" arrays. This simple
>> change would a) make D code more self-documenting, and b) give a very
>> powerful hint to the compiler. Also, the "ref" semantics is backwards
>> compatible with the existing cases where "ref" is allowed.
>
> That is an interesting idea.  But I have a couple of problems with it:
>
> First, when I see ref int[] s, I think "reference to an array", not "this
> array references data from another array".
> Second, your proposed default (not using ref) is to copy data everywhere,
> which is not good for performance.  Most of the time, arrays are passed
> without needing to "own" the data, so making copies everywhere you forgot
> to put ref would be hard to deal with.  It's also completely incompatible
> with existing code, which expects reference semantics without using ref.
>
> -Steve
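For comparison, this is what `ref int[]` already means for parameters in D2, which matches the reading described above: a reference to the caller's slice variable, not a marker that the data is borrowed. A plain `int[]` parameter already shares the caller's elements without copying them. `byValue` and `byRef` are hypothetical names for this sketch.

```d
import std.stdio : writeln;

/// Plain slice parameter: the (pointer, length) pair is copied, the
/// elements are not. Element writes are visible to the caller;
/// re-slicing only changes the local copy of the slice.
void byValue(int[] s)
{
    s[0] = 100;        // caller sees this
    s = s[1 .. $];     // caller does not see this
}

/// ref slice parameter: s *is* the caller's slice variable, so
/// re-slicing (or appending) changes what the caller's variable refers to.
void byRef(ref int[] s)
{
    s[0] = 100;
    s = s[1 .. $];
}

void main()
{
    int[] a = [1, 2, 3];
    byValue(a);
    writeln(a);        // [100, 2, 3]

    int[] b = [1, 2, 3];
    byRef(b);
    writeln(b);        // [2, 3]
}
```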

I recall a discussion on a Java forum where a user was getting a huge "memory leak" because he was processing large XML files and storing some (rather small) String values that were substrings of those XML files. Since then I have been under the impression that a substring of a String is just a slice into the original string. But I agree that this is an implementation detail.
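D slices have the same trade-off by design: keeping a short slice of a large buffer keeps the whole buffer reachable by the GC. A minimal sketch, assuming the whole document has been read into one `string`; `extractTag` and `pointsInto` are hypothetical helpers (not a real XML API), and `.idup` is the usual way to detach the small piece so the large buffer can be collected.

```d
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper: returns the text between the first "<name>" and
/// the following "</name>" in doc, or null if either tag is missing.
/// With copy == false the result is a slice of doc (no allocation, but it
/// keeps all of doc reachable); with copy == true it is an .idup copy.
string extractTag(string doc, string name, bool copy)
{
    string open = "<" ~ name ~ ">";
    string close = "</" ~ name ~ ">";

    auto start = doc.indexOf(open);
    if (start < 0)
        return null;
    string rest = doc[cast(size_t) start + open.length .. $];

    auto len = rest.indexOf(close);
    if (len < 0)
        return null;
    string piece = rest[0 .. cast(size_t) len];
    return copy ? piece.idup : piece;
}

/// True if small lies entirely inside big's memory, i.e. small is a slice of big.
bool pointsInto(string small, string big)
{
    return small.ptr >= big.ptr && small.ptr + small.length <= big.ptr + big.length;
}

void main()
{
    string doc = "<root><id>42</id><body>...very large payload...</body></root>";
    string view = extractTag(doc, "id", false);
    string owned = extractTag(doc, "id", true);
    writeln(view, " ", pointsInto(view, doc));   // 42 true
    writeln(owned, " ", pointsInto(owned, doc)); // 42 false
}
```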