4x faster strlen with 4 char sentinel

Jay Norwood via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Mon Jun 27 14:41:57 PDT 2016


On Monday, 27 June 2016 at 20:43:40 UTC, Ola Fosheim Grøstad 
wrote:
> Just keep in mind that the major bottleneck now is loading 64 
> bytes from memory into cache. So if you test performance you 
> have to make sure to invalidate the caches before you test and 
> test with spurious reads over a very large memory area to get 
> realistic results.
>
> But essentially, the operation is not heavy, so to speed it up 
> you need to predict and prefetch from memory in time, meaning 
> no library solution is sufficient. (you need to prefetch memory 
> way before your library function is called)

I doubt external memory accesses are involved in these 
measurements. I'm using a 100KB char array terminated by four 
zeros, and calling strlen on a pointer into it that advances by 
one char per call, for 100K calls. Each group of three timings 
below has strlen2 in the middle and strlen as the two outer 
timings, all taken during the same program execution.
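
For reference, the setup described above can be sketched roughly as follows. This is C rather than the D code that was actually measured, and strlen_sentinel4 is only a hypothetical stand-in for strlen2, which is not shown in this message. The names fill_buffer and sum_lengths are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { N = 100 * 1024 };            /* 100KB payload, as described above */

static char buf[N + 4];             /* payload plus four terminating zeros */

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical stand-in for strlen2 (an assumption, not the measured code).
   It loads 4 bytes at a time, so it may read up to 3 bytes past the first
   zero; that is only safe when the string ends in four zero bytes. */
size_t strlen_sentinel4(const char *s)
{
    const char *p = s;
    for (;;) {
        uint32_t v;
        memcpy(&v, p, sizeof v);    /* unaligned-safe 4-byte load */
        if ((v - 0x01010101u) & ~v & 0x80808080u)
            break;                  /* some byte in this word is zero */
        p += 4;
    }
    while (*p)                      /* locate the first zero in the word */
        p++;
    return (size_t)(p - s);
}

/* Fill the first n bytes with a nonzero char and append four zeros. */
void fill_buffer(size_t n)
{
    assert(n <= N);
    memset(buf, 'a', n);
    memset(buf + n, 0, 4);
}

/* Call f on every start offset 0..n-1 and sum the results. Returning the
   sum keeps the calls from being optimized away and lets two variants be
   cross-checked. For n = N the sum needs a 64-bit size_t. */
size_t sum_lengths(strlen_fn f, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += f(buf + i);
    return total;
}
```

To mirror the three timings, one would call fill_buffer(N) and then time sum_lengths(strlen, N), sum_lengths(strlen_sentinel4, N), and sum_lengths(strlen, N) in that order, for example with clock() from <time.h>. Both variants should return the same sum, N*(N+1)/2.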

I initialize the 100KB array immediately before the measurement, 
so it should all be in L1 or L2 cache by the time I take even the 
first of the three timings.

The prefetcher shouldn't have a problem predicting this 
sequential access pattern.

strlen   2749
strlen2   688
strlen   2783

strlen   2741
strlen2   683
strlen   2738



