Why ranges don't return vectors?

Thu Feb 28 17:02:16 PST 2013

Seriously, nobody will ever get performance from single-element 
iterator/range pattern - this makes CPU stall!

Anytime you read one byte from typical hard disk, system reads full 
sector - typically 512 bytes, no more, no less. Anytime you read one 
byte from memory, CPU loads entire cache line from RAM into the cache 
(64 bytes on all modern Intel CPU's). Why not exploit that with ranges?

Ranges could potentially return arrays in front() (or in 
frontVector/whatever), so basically they will become ranges of ranges 
where the deepest range is always RandomAccessRange. This has obvious 
performance benefits. Everyone knows that traversing memory is faster 
than iteraring by popFront(). On the other hand memory lacks the 
flexibility of ranges - so lets make a hybrid range!

Advantages:
* performance (!)
* limited lookahead/backtracking

How?

1. instruction level parallelism: CPU's can execute few instructions in 
parallel given that they operate on different data (different vector 
elements)
2. SIMD: load entire vector to SSE/AVX register then run the operation 
on all elements at once
3. use of L1 cache: traversing vector in memory is way faster that 
calling popFront() for each vector element - likely if it's inlined

Disadvantages:
* potential inconvenience

I know it can't be easy drop-in addition to the current range 
implementation, but who knows...