Higher level built-in strings

Tue Jul 20 08:07:29 PDT 2010

Walter Bright Wrote:

> bearophile wrote:
> > Walter Bright:
> >> 1. most string operations, such as copying and searching, even regular 
> >> expressions, work just fine using regular indices.
> >> 
> >> 2. doing the operations in (1) using code points and having to continually
> >> decode the strings would result in disastrously slow code.
> > 
> > In my original post I have forgotten another difference over arrays: 5b) a
> > method like ".unit()" that allows to index code units. So "foo".unit(1) is
> > always O(1). Lower level code can use this method as [] is used for arrays.
> 
> This is backwards. The [i] should behave as expected for arrays. As it turns 
> out, indexing by byte is *far* more common than indexing by code unit, in fact, 
> I've never ever needed to index by code unit.
> 
> (Though it is sometimes necessary to step through by code unit, that's different 
> from indexing by code unit.)

I've had the same experience.  The proposed changes would make string useless to me, even for Unicode work.  I'd end up using ubyte[] instead.
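
For readers following the thread, here is a minimal sketch of the distinction being argued over, assuming standard D2 string semantics (string is an array of immutable UTF-8 code units). The helper name countCodePoints is made up for illustration; std.utf.decode is the Phobos function that decodes one code point and advances the index. Indexing with [i] returns a single code unit in constant time, while anything counted in code points has to decode from the start of the string.

```d
import std.stdio : writeln;
import std.utf : decode;

/// Hypothetical helper: counts code points by decoding sequentially, O(n).
size_t countCodePoints(string s)
{
    size_t n = 0;
    size_t i = 0;
    while (i < s.length)
    {
        dchar c = decode(s, i); // decodes one code point and advances i past it
        ++n;
    }
    return n;
}

void main()
{
    string s = "héllo";      // 'é' occupies two UTF-8 code units (0xC3 0xA9)

    // Array-style indexing is by code unit (one byte for UTF-8) and is O(1).
    assert(s.length == 6);   // length counts code units, not characters
    assert(s[0] == 'h');
    assert(s[1] == 0xC3);    // first byte of 'é', not the whole character

    // Stepping through by code point: foreach with dchar decodes on the fly.
    size_t steps = 0;
    foreach (dchar c; s)
        ++steps;
    assert(steps == 5);
    assert(countCodePoints(s) == 5);

    writeln("code units: ", s.length, ", code points: ", steps);
}
```

Searching and slicing keep working on plain indices because a valid UTF-8 substring can only match at code point boundaries, which is why the sequential decode is needed only when the code points themselves matter.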