Scientific computing and parallel computing C++23/C++26

Bruce Carneal bcarneal at
Thu Jan 20 22:26:23 UTC 2022

On Thursday, 20 January 2022 at 19:57:54 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 20 January 2022 at 17:43:22 UTC, Bruce Carneal 
> wrote:
>> It's possible, for instance, that you can *know*, from first 
>> principles, that you'll never meet objective X if forced to 
>> use platform Y.  In general, though, you'll just have a sense 
>> of the order in which things should be evaluated.
> This doesn't change the desire to do performance testing at 
> install or bootup IMO. Even a "narrow" platform like Mac is 
> quite broad at this point. PCs are even broader.

Never meant to say that it did.  Just pointed out that you can 
factor some of the work.

>> Yes, SIMD can be the better performance choice sometimes.  I 
>> think that many people will choose to do a SIMD implementation 
>> as a performance, correctness testing and portability baseline 
>> regardless of the accelerator possibilities.
> My understanding is that the presentation Bryce made suggested 
> that you would just write "fairly normal" C++ code and let the 
> compiler generate CPU or GPU instructions transparently, so you 
> should not have to write SIMD code. SIMD would be the fallback 
> option.

The dream, for decades, has been that "the compiler" will just 
"do the right thing" when provided dead simple code, that it will 
achieve near-or-better-than-human-tuned levels of performance in 
all scenarios that matter.  It is a dream worth pursuing.
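
For concreteness, here is a minimal sketch of the kind of "dead 
simple" code in question. The saxpy name and signature are my own 
illustrative assumptions, not something taken from the 
presentation. There are no intrinsics and no explicit threading; 
whether this becomes SIMD or anything faster is left entirely to 
the compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: y[i] = a * x[i] + y[i].
// Plain scalar loop; any vectorization is the compiler's decision.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y)
{
    // Only touch the range both vectors cover.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In C++17 the same operation can be written as a std::transform 
call with an execution policy such as std::execution::par_unseq, 
which is the form that nvc++ can offload with -stdpar.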

> I think that the point of having parallel support built into 
> the language is not to get the absolute maximum performance, 
> but to make writing more performant code more accessible and 
> cheaper.

If accessibility requires less performance then you, as a 
language designer, have a choice.  I think it's a false choice 
but if forced to choose my choice would bias toward performance, 
"system language" and all that.  Others, if forced to choose, 
would pick accessibility.

> If you end up having to handwrite SIMD to get decent 
> performance then that pretty much makes parallel support a 
> fringe feature. E.g. it won't be of much use outside HPC with 
> expensive equipment.

I disagree but can't see how pursuing it further would be useful.  
We can just leave it to the market.

> So in my mind this feature does require hardware vendors to 
> focus on CPU/GPU integration, and it also requires a rather 
> "intelligent" compiler and runtime setup in order to pay for 
> the debts of the "abstraction overhead".

I put more faith in efforts that cleanly reveal low level 
capabilities to the community, that are composable, than I do in 
future hardware vendor efforts.

> I don't think just translating a language AST to an existing 
> shared backend will be sufficient. If that was sufficient 
> Nvidia wouldn't need to invest in nvc++?

Well, at least for current dcompute users, it already is 
sufficient.  The Julia efforts in this area also appear to be 
successful.  Sean Baxter's "circle" offshoot of C++ is another. I 
imagine there are or will be other instances where relatively 
small manpower inputs successfully co-opt backends to provide 
nice access and great performance for their respective language 
communities.

> But, it remains to be seen who will pull this off, besides 
> Nvidia.

I don't think there is much that remains to be seen here.  The 
rate and scope of adoption are still interesting questions but 
the "can we provide something very useful to our language 
community?" question has been answered in the affirmative.

People choose dcompute, circle, Julia-GPU over or in addition to 
CUDA/OpenCL today.  Others await more progress from the C++/SYCL 
movement.  Meaningful choice is good.

More information about the Digitalmars-d mailing list