Scientific computing and parallel computing C++23/C++26
bcarneal at gmail.com
Thu Jan 20 22:26:23 UTC 2022
On Thursday, 20 January 2022 at 19:57:54 UTC, Ola Fosheim Grøstad wrote:
> On Thursday, 20 January 2022 at 17:43:22 UTC, Bruce Carneal wrote:
>> It's possible, for instance, that you can *know*, from first
>> principles, that you'll never meet objective X if forced to
>> use platform Y. In general, though, you'll just have a sense
>> of the order in which things should be evaluated.
> This doesn't change the desire to do performance testing at
> install or bootup IMO. Even a "narrow" platform like Mac is
> quite broad at this point. PCs are even broader.
Never meant to say that it did. Just pointed out that you can
factor some of the work.
>> Yes, SIMD can be the better performance choice sometimes. I
>> think that many people will choose to do a SIMD implementation
>> as a performance, correctness testing and portability baseline
>> regardless of the accelerator possibilities.
> My understanding is that the presentation Bryce made suggested
> that you would just write "fairly normal" C++ code and let the
> compiler generate CPU or GPU instructions transparently, so you
> should not have to write SIMD code. SIMD would be the fallback
The dream, for decades, has been that "the compiler" will just
"do the right thing" when provided dead simple code, that it will
achieve near-or-better-than-human-tuned levels of performance in
all scenarios that matter. It is a dream worth pursuing.
> I think that the point of having parallel support built into
> the language is not to get the absolute maximum performance,
> but to make writing more performant code more accessible and
If accessibility requires sacrificing performance then you, as a
language designer, have a choice. I think it's a false choice,
but if forced to choose I would lean toward performance:
"system language" and all that. Others, if forced to choose,
would pick accessibility.
> If you end up having to handwrite SIMD to get decent
> performance then that pretty much makes parallel support a
> fringe feature. E.g. it won't be of much use outside HPC with
> expensive equipment.
I disagree but can't see how pursuing it further would be useful.
We can just leave it to the market.
> So in my mind this feature does require hardware vendors to
> focus on CPU/GPU integration, and it also requires a rather
> "intelligent" compiler and runtime setup in order to pay for
> the debts of the "abstraction overhead".
I put more faith in efforts that cleanly reveal low-level
capabilities to the community, and that are composable, than I do
in future hardware vendor efforts.
> I don't think just translating a language AST to an existing
> shared backend will be sufficient. If that was sufficient
> Nvidia wouldn't need to invest in nvc++?
Well, at least for current dcompute users, it already is
sufficient. The Julia efforts in this area also appear to be
successful. Sean Baxter's "circle" offshoot of C++ is another. I
imagine there are or will be other instances where relatively
small manpower inputs successfully co-opt backends to provide
nice access and great performance for their respective language
communities.
> But, it remains to be seen who will pull this off, besides
I don't think there is much that remains to be seen here. The
rate and scope of adoption are still interesting questions but
the "can we provide something very useful to our language
community?" question has been answered in the affirmative.
People choose dcompute, circle, Julia-GPU over or in addition to
CUDA/OpenCL today. Others await more progress from the C++/SYCL
movement. Meaningful choice is good.
More information about the Digitalmars-d mailing list