ldc 0.9.1 released

Wed May 27 15:19:32 PDT 2009

bearophile wrote:
> Frits van Bommel:
> 
> Thank you for your answers.
> 
>> This one is only done for certain GC allocations by the way, not all of them. 
>> The ones currently implemented are:
>>   * new Struct/int/float/etc.,
>>   * uninitialized arrays (used for arr1 ~ arr2, for instance),
>>   * zero-initialized arrays (e.g. new int[N])
>>   * new Class, unless
>>     a) it has a destructor,
>>     b) it has a custom allocator (overloads new), or
>>     c) it has a custom deallocator (overloads delete).
> 
> I'm trying to find situations where that's true, but in two small programs that use both structs and classes (that don't escape the scope and follow your unless list) I see:
> 
> 	call	_d_allocmemoryT
> 	call	_d_allocclass
> 
> Are those calls to variants of alloca()?

No, those are GC allocations.

This small program contains no GC allocations when compiled with ldc -O3:
-----
struct Struct {
     int i, j = 4;
}

class Class {
     int i, j = 6;
}

int frob(T)(T t) {
     t.i = 4;
     return t.j;
}

int withStruct() {
     return frob(new Struct);
}

int withClass() {
     return frob(new Class);
}
-----

It does still contain them when inlining is disabled, as it is by default with 
-O2 (aka -O). This seems to be because the LLVM pass that adds parameter 
attributes (like nocapture, better known as 'scope' in these newsgroups) is 
missing from the default list of optimizations :(. I'll fix this in the 
repository soon.
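
To illustrate what a nocapture/'scope' parameter attribute would express, here 
is a minimal sketch. The names Node, saved, readsOnly and captures are made up 
for this example; which of these calls the pass actually promotes to the stack 
depends on the optimizer settings discussed above.

```d
struct Node {
    int i, j = 4;
}

// Hypothetical module-level variable used to let a pointer escape.
Node* saved;

// Only reads through the pointer: the argument does not outlive the call,
// so an optimizer could mark the parameter nocapture.
int readsOnly(Node* n) {
    return n.i + n.j;
}

// Stores the pointer where it outlives the call: the argument escapes,
// so an allocation passed here has to stay on the GC heap.
int captures(Node* n) {
    saved = n;
    return n.j;
}
```

Without that attribute on readsOnly, a caller that is not inlined has to assume 
the worst case, i.e. that the pointer may be captured.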

Another constraint I forgot to mention: it doesn't work for allocations in 
loops, because it's tricky to figure out whether the allocation is still 
reachable when the loop reaches the same position again.
(For this reason, the pass by default runs before each inliner run and once 
more after all inlining is done: the inliner can inline code into loops, yet 
it also allows for simplifications that make escape analysis more accurate.)
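
A minimal sketch of the loop constraint, assuming a made-up struct Pair and 
made-up function names. Neither allocation escapes, but only the first one is 
outside a loop and so only that one is a candidate for the optimization as 
described:

```d
struct Pair {
    int i, j = 4;
}

// Allocation outside any loop: a candidate for being moved off the GC heap.
int allocOnce() {
    auto p = new Pair;
    p.i = 1;
    return p.i + p.j;
}

// Allocation inside a loop: per the constraint above this stays a GC
// allocation, even though p never escapes an iteration.
int allocInLoop(int n) {
    int total = 0;
    for (int k = 0; k < n; k++) {
        auto p = new Pair;
        p.i = k;
        total += p.i + p.j;
    }
    return total;
}
```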

> While looking for those alloca I have also tested code that has the following two lines one after the other:
>     auto a = new int[1000];
>     a[] = 2;
> 
> That code is very common, because you currently can't write:
>     auto a = new int[1000] = 2;
> 
> The latest LDC compiles that as:
> 
> 	pushl	%esi
> 	subl	$4016, %esp
> 	leal	16(%esp), %esi
> 	movl	%esi, (%esp)
> 	movl	$4000, 8(%esp)
> 	movl	$0, 4(%esp)
> 	call	memset
> 	movl	%esi, (%esp)
> 	movl	$2, 8(%esp)
> 	movl	$1000, 4(%esp)
> 	call	_d_array_init_i32
> 
> I think the memset may be avoided.

That's trickier to get right, because the optimizer would have to look ahead to 
see that the memset call emitted for the 'new' is always followed by the 
initialization, with no reads in between.
The 1-byte element case can probably be handled by LLVM if _d_array_init_i8 is 
replaced by another memset, though. Similarly, _d_array_init_i16 could be 
turned into memset for values like 0xFFFF, where both bytes are equal, but not 
for values like 0x1234.
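
A small sketch of why the 16-bit case depends on the value. The helpers 
sameBytes and fillBytes are hypothetical and not part of the runtime; fillBytes 
just stands in for what a memset would do, writing one byte value over the 
whole array:

```d
// True when both bytes of the 16-bit value are identical, i.e. when filling
// the array byte by byte gives the same result as filling it element-wise.
bool sameBytes(ushort value) {
    return (value & 0xFF) == (value >> 8);
}

// Byte-wise fill, standing in for memset: reinterpret the array as bytes
// and set every byte to b.
void fillBytes(ushort[] arr, ubyte b) {
    auto bytes = cast(ubyte[]) arr;
    bytes[] = b;
}
```

With this, sameBytes(0xFFFF) holds and a byte-wise fill with 0xFF leaves every 
element equal to 0xFFFF, while sameBytes(0x1234) does not hold, so no single 
byte value can reproduce that pattern.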