std.algorithm.remove and principle of least astonishment

Mon Nov 22 06:09:41 PST 2010

On Mon, 22 Nov 2010 08:24:33 -0500
Michel Fortin <michel.fortin at michelf.com> wrote:

> I agree there might be a use case for a special data type allowing fast 
> random access to graphemes and able to retain the precise count of 
> graphemes. But if what you do only requires iterating over all 
> graphemes, a wrapper range that converts to graphemes on the fly might 
> be less overhead than building a separate data structure.

That's true as long as you can guarantee each string is iterated at most once. But the work of constructing an instance of "UText" (say, a grapheme string) should be exactly the same as what each iteration has to do on the fly. Or am I missing a point?
Also, it's not only about indexing or iterating. Simply finding, counting or replacing given characters (I mean in the sense of graphemes) or slices requires the string to be not only grouped, but also normalised (else how is the routine supposed to recognise the same char in another form?). That is a heavy job as well, one you don't want to do twice. Grouping makes normalising easier: you only cope with a mini-array of codes at once, already known to represent a whole char, and sorting the codes within each stack is easier as well.
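
To illustrate the point about recognising the same char in another form, here is a minimal sketch in Python (chosen only for illustration; it is not the D code under discussion). It uses the standard unicodedata module, and the sample text is made up for the example.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The same word twice, once in each form (sample data, assumed for the demo).
text = "caf" + decomposed + " caf" + composed

# A code-point-level search only sees the form it was given.
naive_count = text.count(composed)
print(naive_count)        # 1

# After normalising the text to one form (NFC here), both occurrences match.
normalised = unicodedata.normalize("NFC", text)
normalised_count = normalised.count(composed)
print(normalised_count)   # 2
```

The search pattern has to be normalised to the same form as the text, which is why doing this once up front is attractive.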
Finally, to avoid reprocessing already processed text, I had the idea of "utf33" ;-) This is utf32 plus the guarantee that character forms are already normalised and sorted.
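
A rough sketch of the "group and normalise once, then reuse" idea, again in Python for illustration only. The function name to_clusters is hypothetical, and the grouping rule (attach every code with a non-zero combining class to the preceding base) is a deliberate simplification, not the full Unicode grapheme segmentation algorithm: it does not handle Hangul jamo, ZWJ sequences and similar cases. NFD is used because it also puts combining marks into canonical order, which plays the role of the "sorted" guarantee.

```python
import unicodedata

def to_clusters(text):
    """Return a list of normalised clusters (hypothetical helper, simplified)."""
    # NFD decomposes characters and sorts combining marks canonically.
    decomposed = unicodedata.normalize("NFD", text)
    clusters = []
    for code in decomposed:
        # Non-zero combining class: the code belongs to the previous base.
        if clusters and unicodedata.combining(code):
            clusters[-1] += code
        else:
            clusters.append(code)
    return clusters

# The same three chars, with the two marks on 'a' typed in different orders.
a = to_clusters("a\u0301\u0323bc")
b = to_clusters("a\u0323\u0301bc")

print(a == b)                      # True
print(len(a))                      # 3
print(a.count("a\u0323\u0301"))    # 1
```

Once the list is built, indexing, counting and comparing work per cluster with no further grouping or normalising, which is the cost a wrapper range would pay again on every pass.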

denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com