Why exceptions for error handling are so important

via Digitalmars-d <digitalmars-d at puremagic.com>
Tue Jan 13 02:53:15 PST 2015


On Tuesday, 13 January 2015 at 09:58:56 UTC, bearophile wrote:
> Take a look at the ideas of "C++ Seasoning" 
> (http://channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning), 
> where they suggest doing roughly the opposite of what you do: 
> throwing out loops and other things and replacing them with 
> standard algorithms.

Yes... you can do that, but for little gain, since C++'s support 
for high-level programming is bloat inducing. Just take a look at 
all the symbols you need to define for a conforming iterator or 
allocator... An allocator should be a simple five-line snippet, 
yet it is bloatsome in the STL:

https://gist.github.com/donny-dont/1471329
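
For comparison, here is a minimal sketch of what a C++11-era 
allocator can look like once std::allocator_traits fills in the 
defaults; this is not from the gist above, and the name 
"Mallocator" is purely illustrative:

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Minimal C++11-style allocator sketch: std::allocator_traits supplies
// the rest of the interface, so only value_type, allocate, deallocate
// and the equality operators are strictly required.
template <class T>
struct Mallocator {
    using value_type = T;

    Mallocator() = default;
    template <class U>
    Mallocator(const Mallocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <class T, class U>
bool operator==(const Mallocator<T>&, const Mallocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const Mallocator<T>&, const Mallocator<U>&) noexcept { return false; }

int main() {
    std::vector<int, Mallocator<int>> v = {1, 2, 3};  // works with standard containers
    return static_cast<int>(v.size()) - 3;            // 0 on success
}

Still more ceremony than five lines, but much less than the 
pre-C++11 boilerplate the gist shows.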

Then I needed a circular buffer. STL didn't have one. So I 
downloaded the Boost one. It was terribly inefficient, because it 
was generic and STLish.

I ended up writing my own using a fixed-size power-of-two array 
with a start and an end index: clean, conditional-free, efficient 
code, because the power-of-two size lets the modular index 
arithmetic reduce to a simple bit mask.
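
A minimal sketch of that idea (not the author's actual code; the 
RingBuffer name and layout are illustrative): with a power-of-two 
capacity, index wrap-around is a single AND with capacity - 1, so 
the hot path has no branches.

#include <cstddef>

// Sketch of a power-of-two ring buffer: indices run freely and are
// masked on access, so push/pop contain no conditionals.
template <class T, std::size_t CapacityLog2>
class RingBuffer {
    static constexpr std::size_t capacity = std::size_t(1) << CapacityLog2;
    static constexpr std::size_t mask = capacity - 1;

    T data_[capacity];
    std::size_t head_ = 0;  // next slot to read
    std::size_t tail_ = 0;  // next slot to write

public:
    bool empty() const { return head_ == tail_; }
    bool full()  const { return tail_ - head_ == capacity; }
    std::size_t size() const { return tail_ - head_; }

    // Callers are expected to check full()/empty() first; the hot
    // path itself is branch-free.
    void push(const T& v) { data_[tail_++ & mask] = v; }
    T    pop()            { return data_[head_++ & mask]; }
};

int main() {
    RingBuffer<int, 4> rb;                    // 16 slots
    for (int i = 0; i < 10; ++i) rb.push(i);
    int sum = 0;
    while (!rb.empty()) sum += rb.pop();
    return sum == 45 ? 0 : 1;
}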

So, in the end I got something faster that produces more readable 
code, compiles faster, is easier to debug, and took less time to 
implement than finding and figuring out the Boost one...

I have no problem using array<T> and vector<T> where they fit, but 
in the end templated libraries work against transparency. If you 
want speed you need to understand the memory layout, and concrete 
implementations make that easier.

>> A solution like list comprehensions is a lot easier on the 
>> programmer, if convenience is the goal.
>
> There's still time to add lazy and eager sequence 
> comprehensions (or even better the computational thingies of 
> F#) to D, but past suggestions were not welcomed. D has a lot 
> of features, and adding more and more has costs.
>
>
>> Phobos "ranges" need a next_simd() to be efficient. Right?
>
> Perhaps, but first std.simd needs to be finished.

Right, but you need masked SIMD support if you want to do 
filtering. Maybe autovectorization is the only path.
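
As an illustration of what masked filtering can look like (my own 
sketch, not something from this thread, and it assumes AVX-512F 
hardware): a compare produces a lane mask and a compress-store 
packs only the selected lanes into the output.

#include <immintrin.h>
#include <cstddef>

// Illustrative only: keep values greater than a threshold using an
// AVX-512 lane mask plus a compress-store. Compile with e.g.
// -mavx512f on GCC/Clang; __builtin_popcount is a GCC/Clang builtin.
std::size_t filter_gt(const float* in, std::size_t n,
                      float threshold, float* out) {
    const __m512 t = _mm512_set1_ps(threshold);
    std::size_t written = 0, i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(in + i);
        __mmask16 keep = _mm512_cmp_ps_mask(v, t, _CMP_GT_OQ);
        _mm512_mask_compressstoreu_ps(out + written, keep, v);  // packed store
        written += (std::size_t)__builtin_popcount((unsigned)keep);
    }
    for (; i < n; ++i)                       // scalar tail for the remainder
        if (in[i] > threshold) out[written++] = in[i];
    return written;
}

Without compress/mask support you are back to a branchy scalar 
loop, which is why plain Phobos-style ranges have a hard time 
vectorizing filters.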

Still, you also need to keep your loops tiny if you want to 
benefit from the x86 loop buffer: the CPU can effectively unroll 
tight loops in hardware before they hit the execution pipeline, 
which keeps the loop conditionals out of the pipeline.

Then you have cache locality. You need to break up long loops so 
you don't push things out of the caches.
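
One way to read that (an assumption on my part, not spelled out 
in the post) is strip-mining / cache blocking: run all the 
processing stages over one cache-sized chunk at a time instead of 
sweeping each stage over the whole array and evicting it between 
passes.

#include <algorithm>
#include <cstddef>
#include <vector>

// Two illustrative processing stages over a float array.
void stage_a(float* p, std::size_t n) { for (std::size_t i = 0; i < n; ++i) p[i] *= 2.0f; }
void stage_b(float* p, std::size_t n) { for (std::size_t i = 0; i < n; ++i) p[i] += 1.0f; }

void process_blocked(std::vector<float>& data) {
    const std::size_t block = 8192;          // ~32 KiB of floats, roughly L1-sized
    for (std::size_t i = 0; i < data.size(); i += block) {
        std::size_t n = std::min(block, data.size() - i);
        stage_a(data.data() + i, n);         // chunk is still hot in cache...
        stage_b(data.data() + i, n);         // ...when the next stage touches it
    }
}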

So, if you can gain 2x from good cache locality/prefetching and 
4x from using AVX over scalar code, then you gain 8x over a naive 
implementation, since the two factors multiply. That hurts.

