standard ranges

Roman D. Boiko rb at d-coding.com
Thu Jun 28 06:08:23 PDT 2012


Timings should not be very different from random access in any
UTF-32 string implementation, because of the design of these
algorithms:

* only operations on 64-bit aligned words are performed 
(addition, multiplication, bitwise and shift operations)

* there is no branching except at the very top level for very 
large array sizes

* data is stored in a way that makes algorithms cache-oblivious
IIRC. The authors claim that very few cache misses are necessary
(1-2 per random access).

* after determining the code unit index for some code point index,
further access is performed as usual inside an array, so in
order to perform slicing, only the code unit indices of its start
and end need to be calculated.

* original data arrays are not modified (unlike for compact 
representations of dstring, for example).
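
To illustrate the last two points, here is a minimal sketch in
Python. It is not the succinct structure discussed above: it swaps
in a much simpler sampled index that stores the code unit offset of
every step-th code point and scans forward from there. The names
CodePointIndex, unit_index and slice, and the step parameter, are
hypothetical, and valid UTF-8 input is assumed. It only shows that
slicing by code point indices needs two index lookups and leaves
the original data untouched.

```python
class CodePointIndex:
    """Hypothetical sampled index over UTF-8 data (assumed valid).

    Stores the code unit (byte) offset of every `step`-th code point.
    The original byte array is only read, never modified.
    """

    def __init__(self, data: bytes, step: int = 64):
        self.data = data
        self.step = step
        self.samples = []
        count = 0
        for offset, byte in enumerate(data):
            # Any byte that is not a continuation byte (10xxxxxx)
            # starts a new code point.
            if byte & 0xC0 != 0x80:
                if count % step == 0:
                    self.samples.append(offset)
                count += 1
        self.length = count  # number of code points

    def unit_index(self, cp_index: int) -> int:
        """Map a code point index to a code unit index."""
        if cp_index == self.length:
            return len(self.data)  # one-past-the-end, useful for slicing
        if not 0 <= cp_index < self.length:
            raise IndexError(cp_index)
        # Jump to the nearest sample at or before cp_index ...
        offset = self.samples[cp_index // self.step]
        # ... then walk forward over at most step - 1 code points.
        for _ in range(cp_index % self.step):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1
        return offset

    def slice(self, start: int, end: int) -> bytes:
        """Slice by code point indices: two lookups, then a plain array slice."""
        return self.data[self.unit_index(start):self.unit_index(end)]


text = "a\u00e9\u2615\U0001F600z"  # code points of 1, 2, 3, 4 and 1 bytes
index = CodePointIndex(text.encode("utf-8"), step=2)
middle = index.slice(1, 4).decode("utf-8")
```

With a small step the forward scan is short; the structure
discussed above avoids the scan (and the branching) altogether,
which this sketch does not attempt to reproduce.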

