VLERange: a range in between BidirectionalRange and RandomAccessRange

Wed Jan 12 11:57:58 PST 2011

On 01/12/2011 08:28 PM, Don wrote:
> I think the only problem that we really have, is that "char[]",
> "dchar[]" implies that code points is always the appropriate level of
> abstraction.

I'd like to know when the code point ever is the appropriate level
of abstraction.
* If pieces of text are not manipulated, meaning they are only passed
through the application as is (from file / input / literal to any kind
of output), then any kind of encoding just works. One can even
concatenate, provided all pieces use the same encoding.
--> _lower_ level than code point is OK.
* But any kind of manipulation (indexing, slicing, comparing, searching,
counting, replacing, not to mention regex/parsing) requires operating at
the _higher_ level of characters (in the common sense of the word), just
as with historic character sets, in which one code represented one
character (not lower-level thingies as in UCS). Otherwise, one reads,
compares, and changes meaningless bits of text.
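
To make the second point concrete, here is a minimal sketch in Python
(chosen only for illustration, since its strings are sequences of code
points). The sample strings and variable names are assumptions made up
for this example; they are not from any library discussed in the thread.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'é' as 'e' + COMBINING ACUTE ACCENT

# The same character to a reader, but different code point sequences.
same_codepoints = composed == decomposed
lengths = (len(composed), len(decomposed))

# Slicing by code point cuts a character in half.
word = "cafe\u0301"      # "café" in decomposed form
truncated = word[:4]     # keeps "cafe" and drops the accent

# Searching for the precomposed form misses the decomposed one.
found = composed in word

# Bringing both sides to one normal form makes comparison meaningful.
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))
```

Here the comparison and the search both fail, and the slice silently
changes the text, even though every operation is "correct" at the
code point level.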

As I see it now, we need 2 types:
* One plain string similar to the good old ones (a bytestring would do
the job, since most Unicode text is UTF-8 encoded) for the first kind of
use above, with an optional validity check when it's supposed to be
Unicode text.
* One higher-level type abstracting from code point (not code unit)
issues, restoring the necessary properties: (1) each character is one
element in the sequence (2) each character is always represented the
same way.
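
The two types could be sketched roughly as follows, again in Python for
illustration only. The function names `is_valid_utf8` and
`to_characters` are hypothetical. The grouping rule is a deliberate
simplification (a base code point plus any following combining marks);
full grapheme segmentation as specified in Unicode UAX #29 has more
rules than this.

```python
import unicodedata

def is_valid_utf8(raw):
    """Optional validity check for the plain bytestring type."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def to_characters(text):
    """Split text into characters in the common sense (approximation).

    Property (2): normalize to NFD so that each character always has
    the same representation.
    Property (1): attach combining marks to their base code point so
    that each character is one element of the resulting list.
    """
    normalized = unicodedata.normalize("NFD", text)
    characters = []
    for cp in normalized:
        if characters and unicodedata.combining(cp) != 0:
            characters[-1] += cp   # combining mark joins previous base
        else:
            characters.append(cp)  # start of a new character
    return characters

a = to_characters("caf\u00e9")     # precomposed input
b = to_characters("cafe\u0301")    # decomposed input
```

With this, `a` and `b` are equal element by element and both have four
elements, so indexing, slicing, and comparing operate on whole
characters regardless of how the input was encoded.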

Denis
_________________
vita es estrany
spir.wikidot.com