[review] new string type

Steven Schveighoffer schveiguy at yahoo.com
Tue Nov 30 10:52:20 PST 2010


On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis <jmdavisProg at gmx.com> wrote:

>
> 1. At least until universal function syntax is in the language, you lose the
> ability to do str.func().

Yes, but first, this is not a problem with the string type; it's a problem
with the language.  Second, any string-specific functions can be added; it
is a struct after all, not a builtin.
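Because the type is a library struct, string-specific operations can be members, so `str.func()` works even without universal function call syntax. A minimal sketch, assuming a hypothetical `DString` wrapper around `immutable(char)[]`; the name and the member shown are illustrative, not the proposed type's actual API:

```d
static import std.string;

// Hypothetical wrapper type, for illustration only.
struct DString
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // A member function gives str.func() call syntax without UFCS.
    // Forwards to std.string.indexOf, which returns the code-unit
    // offset of the first match, or -1 if there is none.
    ptrdiff_t indexOf(dchar c)
    {
        return std.string.indexOf(data, c);
    }
}

unittest
{
    auto s = DString("h\u00E9llo");
    assert(s.indexOf('l') == 3); // U+00E9 takes two UTF-8 code units
}
```

Any Phobos function that should keep working on the wrapper can be forwarded the same way.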

> 2. Functions that would work just fine treating strings as arrays of code
> units (presumably because they don't care about what the actual data is)
> lose efficiency, because now a string isn't an array.

Which functions are those?  They can be allowed via wrappers.

> 3. You have no access to the underlying array unless you're dealing with an
> actual array of dchar.

I thought of adding some kind of access.  I wasn't sure of the best way.

I was thinking of allowing direct access via opCast, because I think  
casting might be a sufficient red flag to let you know you are crossing  
into dangerous waters.

But it could be as simple as making the array itself public.
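A minimal sketch of the opCast route, assuming a hypothetical `DString` wrapper that keeps its UTF-8 code units in a private `immutable(char)[]`. Raw access then requires an explicit `cast(string)`, which stands out in review; making the field public would drop that requirement entirely.

```d
// Hypothetical wrapper type, for illustration only.
struct DString
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // Raw access requires an explicit cast(string), so any code that
    // works on UTF-8 code units directly is visible at the call site.
    T opCast(T)() if (is(T == string))
    {
        return data;
    }
}

unittest
{
    auto s = DString("h\u00E9llo");
    string raw = cast(string) s;
    assert(raw.length == 6); // six code units, five code points
}
```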

> 4. Indexing is no longer O(1), which violates the guarantees of the index
> operator.

Indexing is still O(1).
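One way indexing can stay O(1), assuming (hypothetically) that an index is a code-unit offset and `opIndex` returns the `dchar` that starts there: decoding a single code point reads at most four UTF-8 code units, no matter how long the string is. This is a sketch with a hypothetical `DString` name, not the proposal's code; it relies on `std.utf.decode`, which throws a `UTFException` when the offset is not the start of a valid sequence.

```d
import std.utf : decode;

// Hypothetical wrapper; assumes an index is a code-unit offset.
struct DString
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // O(1): decodes only the code point starting at offset i, reading
    // at most four UTF-8 code units. std.utf.decode throws a
    // UTFException if i is not the start of a valid sequence.
    dchar opIndex(size_t i)
    {
        size_t idx = i; // decode advances idx past the code point
        return decode(data, idx);
    }
}

unittest
{
    auto s = DString("h\u00E9llo");
    assert(s[0] == 'h');
    assert(s[1] == '\u00E9'); // offset 1 starts the two-unit sequence
    assert(s[3] == 'l');
}
```

The trade-off is that valid indices are not consecutive integers: offset 2 here is in the middle of U+00E9.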

> 5. Slicing (other than a full slice) is no longer O(1), which violates the
> guarantees of the slicing operator.

Slicing is still O(1).
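Likewise for slicing, assuming slice bounds are code-unit offsets: slicing the wrapped array only adjusts a pointer and a length, with no copying or decoding. A hypothetical sketch; a real type would also want to check that both bounds fall on code-point boundaries, which this one does not do.

```d
// Hypothetical wrapper; assumes slice bounds are code-unit offsets.
struct DString
{
    private immutable(char)[] data;

    this(immutable(char)[] s) { data = s; }

    // O(1): shares the underlying array; nothing is copied or decoded.
    // Note: this sketch does not check for code-point boundaries.
    DString opSlice(size_t lo, size_t hi)
    {
        return DString(data[lo .. hi]);
    }

    // Supports the $ symbol inside slices.
    size_t opDollar() { return data.length; }

    // Length in code units, not code points.
    @property size_t length() { return data.length; }
}

unittest
{
    auto s = DString("h\u00E9llo");
    assert(s[1 .. 3].length == 2); // the two code units of U+00E9
    assert(s[3 .. $].length == 3); // "llo"
}
```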

> What you're doing here is forcing the view of a string as a range of dchar
> in all cases. Granted, that's what you want in most cases, but it can
> degrade efficiency, and the fact that some operations (in particular
> indexing and slicing) are not O(1) like they're supposed to be means that
> algorithms which rely on O(1) behavior from them could increase their cost
> by an order of magnitude. All the cases where treating a string as an
> actual array is currently valid are hung out to dry.

You can still use char[] and wchar[].

> The inherent problem with strings is that we _want_ to be able to view them
> as both arrays of code units and as ranges of dchar. Trying to view them
> only as ranges of dchar is inefficient in a number of cases. Even something
> as simple as getting a substring is less efficient with this code. Only in
> cases where you don't actually know the length of the substring that you
> want is this essentially as efficient as what we have now. You're ignoring
> all cases where viewing a string as an array of code units is correct and
> desirable. You're only solving half of the problem (albeit the more
> prevalent half).

I'm not planning on eliminating char and wchar based arrays.

In other words, you should be able to get access to the array, but it  
should not be the default, and it should be considered unsafe.

> Now, making it possible to have a wrapper struct like this which works in
> many cases where you'd use strings could reduce errors in code in many
> situations, so giving the programmer the ability to do that could be
> interesting. But it seems to me that this solution is too limiting to be a
> viable replacement for strings as they are.

Hopefully you can see that I'm not eliminating the functionality you are  
looking for, just making it not the default.

-Steve


More information about the Digitalmars-d mailing list