CT Intel's new language

janderson askme at me.com
Sat Jun 14 13:03:57 PDT 2008


Nick Sabalausky wrote:
> "Nick Sabalausky" <a at a.a> wrote in message 
> news:g314r3$m01$1 at digitalmars.com...
>> "janderson" <askme at me.com> wrote in message 
>> news:g311fe$fbl$1 at digitalmars.com...
>>> The main enhancement seems to be a vector class that can perform in 
>>> Parallel.  That was on Walters to-do list for a long time, although it 
>>> was taken out at some point.
>>>
>> Hmm, yea, sounds like something D2 could already implement just by using 
>> some inline assembly (with proper use of "version()", of course) inside of 
>> the map/reduce/etc functions that already exist in a couple libraries.
>>
> 
> Of course, couldn't the same thing be done in C++0x? (I dunno, it's been 
> awhile since I've been up-to-date on the world of C++) So why make a new 
> language out of it?
> 
> I just looked it up some more info. Looks like Ct is some sort of 
> dynamically-compiled language/VM (hmm, maybe/maybe not a VM) that's 
> callable/launchable through C++. The automatic adjustment of granularity 
> sounds interesting. Although still don't think it's anything that couldn't 
> be done in D (even if it meant writing a non-trivial new library).
> 
> From: http://techresearch.intel.com/articles/Tera-Scale/1514.htm
> "How does it work?
> In principal, Ct works with any standard C++ compiler because it is a 
> standards-compliant C++ library (with a lot of runtime behind the scenes). 
> When one initializes the Ct library, one loads a runtime that includes the 
> compiler, threading runtime, memory manager — essentially all the components 
> one needs for threaded and/or vectorized code generation. Ct code is 
> dynamically compiled, so the runtime tries to aggregate as many smaller 
> tasks or data parallel work quanta so that it can minimize threading 
> overhead and control the granularity according to run-time conditions. With 
> the Ct dynamic engine, one gets precise inter-procedural traces to compile, 
> which is extremely useful in the highly modularized and indirect world of 
> object-oriented programming."
> 
> 


Personally I'd like to simply write:

float[] A = new float[100];
float[] B = new float[100];

A += B;
A *= B;

and have the compiler pick a good optimisation using SIMD, parallel 
processing, or whatever else is available.  It should be part of the 
standard or the compiler, otherwise you can't say that D is any better 
than C++ in that regard.

I imagine there would be cases where the compiler could optimise better 
than a lib could, because it knows what these constructs mean.

For instance, in its simplest form:

A = B - B; // a lib can't fold this to zero, but the compiler could

Note that something like this may be difficult for a user to optimise by 
hand if it were spread across function boundaries, but D could do it as 
part of its inlining process.

Also, SIMD instruction sets offer combined multiply-and-add operations. 
A compiler could easily pick these up automatically.  With a lib, the 
user would have to hand-optimise each case, which makes the code harder 
to read.

i.e. with a lib the user might write:

A.MulAdd(B, C);

whereas if the compiler knew about such patterns, it could just as 
easily optimise:

A *= B;
A += C;


And for something like:

A *= B;
A *= C;

the work in each statement could be split across several threads (each 
thread handling a slice of the arrays), or the two passes fused into one.

The compiler would have to have some heuristic to decide which 
optimisation route to take.  It may even be able to hot-swap the code 
when the application begins.  Maybe eventually the heuristic could be 
exposed to the programmer so they could tune it, and in time a 
reasonably optimal heuristic would emerge.

-Joel


