[review] new string type

Steven Schveighoffer schveiguy at yahoo.com
Fri Dec 3 12:29:12 PST 2010


On Fri, 03 Dec 2010 14:40:30 -0500, Jerry Quinn <jlquinn at optonline.net>  
wrote:

> I tend to do a lot of transforming strings, but I need to track offsets  
> back to the original text to maintain alignment between the results and  
> the input.  For that, indexes are necessary and we use them a lot.

In my daily usage of strings, I generally use a string as a whole, not  
individual characters.  But I do occasionally index into one.

Let's also understand that indexing is still present; what is disabled  
is the ability to index to arbitrary code units.  It sounds to me like  
this new type would not affect your ability to store offsets (you can  
store an index and use it later when referring to the string, just like  
you can now).
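
As a rough illustration of that point (written in Python over raw UTF-8 
bytes, not D, and not the proposed string type itself), the sketch below 
stores a code-unit offset and uses it later, while rejecting offsets that 
land inside a multi-byte sequence.  The helper names is_boundary and 
code_point_at are hypothetical, and the input is assumed to be valid UTF-8.

```python
def is_boundary(data: bytes, i: int) -> bool:
    """Return True if offset i starts a code point or is the end of data."""
    if i == len(data):
        return True
    # UTF-8 continuation bytes always have the bit pattern 0b10xxxxxx.
    return 0 <= i < len(data) and (data[i] & 0xC0) != 0x80


def code_point_at(data: bytes, i: int) -> str:
    """Decode the single code point that starts at code-unit offset i."""
    if i >= len(data) or not is_boundary(data, i):
        raise IndexError(f"offset {i} is not the start of a code point")
    first = data[i]
    # The lead byte tells us how many code units the sequence occupies
    # (assumes the data is valid UTF-8).
    if first < 0x80:
        n = 1
    elif first < 0xE0:
        n = 2
    elif first < 0xF0:
        n = 3
    else:
        n = 4
    return data[i:i + n].decode("utf-8")


text = "naïve café".encode("utf-8")
# A code-unit offset stored now and reused later, as in alignment tracking.
offset = text.index("café".encode("utf-8"))
```

Here offset is a byte position, so it stays usable as a stored index; only 
positions in the middle of an encoded character, such as the second byte of 
"ï", are refused.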

My string type does not allow for writeable strings.  My plan was to allow  
you access to the underlying char[] and let you edit that way.  Letting  
someone write a dchar into the middle of a utf-8 string could cause lots of  
problems, so I just disabled it by default.
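
To make the problem concrete, here is a small Python sketch (again over raw 
UTF-8 bytes, not the D type under review).  The helpers seq_len, 
replace_code_point, and is_valid_utf8 are hypothetical names, and the buffer 
is assumed to hold valid UTF-8 to begin with.  It shows that a code point may 
need more code units than the one it replaces, so a write either shifts every 
later offset or, if done naively in place, leaves an invalid sequence behind.

```python
def seq_len(first_byte: int) -> int:
    """Length in code units of the UTF-8 sequence that starts with first_byte."""
    if first_byte < 0x80:
        return 1
    if first_byte < 0xE0:
        return 2
    if first_byte < 0xF0:
        return 3
    return 4


def replace_code_point(buf: bytearray, i: int, new_char: str) -> int:
    """Replace the code point starting at offset i.

    Returns the change in length, in code units, so that callers can adjust
    any offsets they stored past position i.
    """
    old_len = seq_len(buf[i])
    encoded = new_char.encode("utf-8")
    buf[i:i + old_len] = encoded
    return len(encoded) - old_len


def is_valid_utf8(data) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        bytes(data).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


buf = bytearray("cafe au lait".encode("utf-8"))
au_offset = buf.index(b"au")             # stored before the edit
shift = replace_code_point(buf, 3, "é")  # 'e' is 1 code unit, 'é' is 2
au_offset += shift                       # stored offsets must be shifted

# Naive in-place write: only the lead byte of 'é' fits in the single slot.
broken = bytearray(b"cafe")
broken[3] = "é".encode("utf-8")[0]
```

After the careful replacement, buf is still valid UTF-8 but one code unit 
longer; the naive write leaves broken ending in a lone lead byte, which no 
longer decodes.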

Not sure how that affects your 'transforming' work.  Are you actually  
changing the data or just lazily transforming?  I'm interested to hear  
whether you think my string type would be a viable alternative.

> Probably the right thing to do in this case is just pay for the cost of  
> using dchar everywhere, but if you're working with large enough  
> quantities of data, storage efficiency matters.

The huge advantage of using utf-8 is backwards compatibility with ASCII  
for C functions.
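
A quick Python check of that property (an illustration only; the variable 
names are made up for the example): pure-ASCII text has the same bytes in 
ASCII and UTF-8, and every byte of a multi-byte UTF-8 sequence is 0x80 or 
above, so byte-oriented code scanning for NUL or an ASCII delimiter never 
matches in the middle of an encoded character.

```python
# Pure-ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = "offsets".encode("utf-8") == "offsets".encode("ascii")

# Two-, three-, and four-byte sequences: no byte falls in the ASCII range.
multi = "é€𝄞".encode("utf-8")
no_ascii_inside = all(b >= 0x80 for b in multi)
```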

-Steve

