Memory safety, C#, D and more

Tue May 5 18:22:10 PDT 2009

Here I have collected a few more bits that may be interesting for D development/design.

-------------------

In C# the "fixed" statement prevents the garbage collector from relocating a movable variable. The fixed statement is only permitted in an unsafe context:

http://msdn.microsoft.com/en-us/library/f58wzh21.aspx
http://msdn.microsoft.com/en-us/library/aa664784(VS.71).aspx

So it "pins" a variable: the GC can't move it in memory anymore, and you can then take and use its address safely. It looks a bit messy, but it lets C# keep its moving GC instead of falling back to a conservative one.

You can use it for example like this:

int[,,] a = new int[2, 3, 4];
unsafe {
   fixed (int* p = a) {
      for (int i = 0; i < a.Length; ++i) // treat as linear
         p[i] = i;
   }
}

Here int[,,] is a built-in multi-dimensional array made of a single block of memory.
C# has both built-in arrays of arrays, like D, and such multi-dimensional arrays, which save some memory and improve cache locality a bit (but on modern CPUs I have sometimes seen them end up a bit slower, because they may require integer multiplications to find items when a bitshift can't be used).
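
To make the multiplication cost concrete, here is a rough sketch of the index computation for a single row-major block. LinearIndex and RectIndexDemo are names I made up for illustration, and the code the runtime actually generates may well differ:

```csharp
using System;

public static class RectIndexDemo {
    // Hypothetical helper: offset of [i, j, k] inside one row-major block
    // whose last two dimensions are d1 and d2.
    public static int LinearIndex(int i, int j, int k, int d1, int d2) {
        return (i * d1 + j) * d2 + k;   // two integer multiplications per access
    }

    public static void Main() {
        int[,,] a = new int[2, 3, 4];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 4; k++)
                    a[i, j, k] = LinearIndex(i, j, k, 3, 4);

        // foreach walks a multi-dimensional array with the rightmost index
        // changing fastest, so the stored offsets come out as 0, 1, 2, ...
        int expected = 0;
        bool inOrder = true;
        foreach (int v in a)
            inOrder &= (v == expected++);
        Console.WriteLine(inOrder);                       // prints "True"
        Console.WriteLine(LinearIndex(1, 2, 3, 3, 4));    // prints "23"
    }
}
```

When d1 and d2 are powers of two the multiplications can become shifts, which is the case where I would expect the single-block layout to win.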

fixed can also be nested if you want to pin two or more pointers:

fixed (...) fixed (...) { ... }

The variable is pinned only inside the scope of the fixed statement.
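
A minimal sketch of two nested fixed statements, assuming you compile with /unsafe. The Copy helper and the class name are hypothetical, invented just for this example:

```csharp
using System;

public static class NestedFixedDemo {
    // Hypothetical helper: copies src into dst through two pinned pointers.
    public static unsafe void Copy(int[] src, int[] dst) {
        int count = Math.Min(src.Length, dst.Length);
        // Both arrays stay pinned until the end of the braces below.
        fixed (int* ps = src) fixed (int* pd = dst) {
            for (int i = 0; i < count; i++)
                pd[i] = ps[i];
        }
        // Here ps and pd are out of scope and the GC may move the arrays again.
    }

    public static void Main() {
        int[] a = { 1, 2, 3 };
        int[] b = new int[3];
        Copy(a, b);
        Console.WriteLine("{0} {1} {2}", b[0], b[1], b[2]);   // prints "1 2 3"
    }
}
```

Both pointers go out of scope at the closing brace, so neither address can leak out of the pinned region by accident.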

When you use "fixed" to take the char* of a string, you get a pointer to null-terminated characters, so there is no need for an explicit call like D's toStringz.

You can also use fixed to call another function with a pointer:

class Test {
   unsafe static void Fill(int* p, int count, int value) {
      for (; count != 0; count--)
         *p++ = value;
   }
   static void Main() {
      int[] a = new int[100];
      unsafe {
         fixed (int* p = a) Fill(p, 100, -1);
      }
   }
}

As far as I understand, the "a" array stays pinned for the whole fixed statement, so the GC can never relocate it while Fill() is running.

So C# follows the opposite principle from D: it starts from a safe situation and then allows whatever it can to increase flexibility, while D starts from an unsafe situation and adds features to give some safety.

This explains a bit how "fixed" interacts with the generational GC:
http://www.codeproject.com/KB/dotnet/pointers.aspx

>Pinning has a HUGE cost to the garbage collector. I assume that you are familiar with the generational algorithm of the garbage collection. Let us say we allocated enough memory to fill Gen 0 Heap (the youngest), and that an additional allocation will trigger a collection. If that very last allocation at the end of the heap was pinned, the pinned object moves to generation 1. (Call GC.GetGeneration(obj) and see). Gen 1 is guaranteed to grow to include the pinned memory at the very end of the Gen 0 Heap. Even if all other memory in Gen 0 was freed, that would still leave a huge unreclaimed space of memory and Gen 0 will begin allocating starting from its previous limit. That is how bad "pinning" is. [...] when you use fixed, do whatever you have do quickly and avoid any memory allocation in the process, which can potentially trigger a garbage collection. If a garbage collection did occur inside a fixed block, most likely the pinned memory was close to the end of Gen 0 heap.<

In practice the C# runtime retains most of its safety even if you use pointers. For example if you run the following code (not in debug mode):

int* a = stackalloc int[n];
for (int i = 0; i < 3 * n; i++) {
    a[i] = i;
    Console.WriteLine("a[i] = {0}", a[i]);
}

With n=10 it stops running just after i=10 (one past the length). So the runtime catches the write outside the allowed memory anyway, and the docs say it terminates the process as soon as possible, to reduce the chance that malicious code gets executed.

"stackalloc" is the C# way to have the stack-based dynamic arrays of C99 (I would like to have them in D2 too. C# is surely a kitchen-sink language too). So that's stack safety, not heap safety.

This kind of unsafe code that uses pointers is faster than normal C# code (often the compiler/runtime isn't able to remove array bounds checks, even though this is a supported optimization) and slower than equivalent "release mode" D code. I don't know how the C# runtime is able to catch that trespassing; maybe it uses a canary, or marks the memory after the array as not writable.

After a small test with the following code that performs reads only:

int* a = stackalloc int[n];
for (int i = 0; i < 30 * n; i++) {
    Console.WriteLine("a[{0}] = {1}", i, a[i]);
}

Now the program doesn't stop early: with n=10 it keeps printing until i = 299, the end of the loop. So there's write safety only.

I have tried a stack-based "array" with dmd:

import std.conv: toInt;
import std.c.stdlib: alloca;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    int* a = cast(int*)alloca(n * int.sizeof);
    for (int i = 0; i < 30 * n; i++) {
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}

It stops printing after i = 12 (3 items past the last one). If I keep only the printf inside the loop, it prints up to 300 and more, so there is no read safety.

While the following code with a heap-based array:

import std.conv: toInt;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    auto aa = new int[n];
    auto a = aa.ptr;
    for (int i = 0; i < 3000 * n; i++) {
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}

generates an Access Violation after i=15391, so there's not much write safety.

In C# the following heap-based array program:

using System;
unsafe sealed class test {
    static unsafe void Main(string[] args) {
        int n = args.Length > 0 ? Int32.Parse(args[0]) : 10;
        int[] a = new int[n];
        unsafe {
            fixed (int* p = a) {
                for (int i = 0; i < 1000 * n; ++i) {
                    p[i] = i;
                    Console.WriteLine("p[{0}] = {1}", i, p[i]);
                }
            }
        }
    }
}

prints items up to i=20 and then throws an exception:
System.IO.IOException, "The handle is invalid"

(in a debug build it stops when i is about 25). So even with heap memory and in unsafe mode C# is safe enough, and because the program stops very soon, very close to where the bug is, it helps find bugs faster.

Having such safety when working with pointer-based arrays is a very good thing; I'd like to have it in D too when I am not compiling in release mode. Is this doable?

-----------------------------

C# enums can optionally have the "Flags" attribute. It doesn't change the default 0,1,2,3... values of the items, so to combine them bitwise you have to give them power-of-two values yourself; the attribute mainly tells the runtime (for example ToString) to treat the value as a set of bits:
http://weblogs.asp.net/wim/archive/2004/04/07/109095.aspx

[Flags]
public enum ClientStates {
  Ordinary      = 0,
  HasDiscount   = 1,
  IsSupplier    = 2,
  IsBlackListed = 4,
  IsOverdrawn   = 8
}

ClientStates c = ClientStates.HasDiscount | ClientStates.IsSupplier;

C# enum values can also be printed (and they show their name); this would be useful for D2 too.
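
A small sketch of what the attribute does and doesn't do. The two enum names, Sequential and PowersOfTwo, are made up for this comparison; the first keeps the default values, the second assigns powers of two by hand:

```csharp
using System;

[Flags]
public enum Sequential { Ordinary, HasDiscount, IsSupplier, IsBlackListed }   // 0, 1, 2, 3

[Flags]
public enum PowersOfTwo { Ordinary = 0, HasDiscount = 1, IsSupplier = 2, IsBlackListed = 4 }

public static class FlagsDemo {
    public static void Main() {
        // With default values 1 | 2 is 3, which collides with IsBlackListed.
        Sequential s = Sequential.HasDiscount | Sequential.IsSupplier;
        Console.WriteLine(s == Sequential.IsBlackListed);           // prints "True"

        // With explicit powers of two each item owns one bit.
        PowersOfTwo p = PowersOfTwo.HasDiscount | PowersOfTwo.IsSupplier;
        Console.WriteLine(p);                                       // prints "HasDiscount, IsSupplier"
        Console.WriteLine((p & PowersOfTwo.IsBlackListed) != 0);    // prints "False"
    }
}
```

So the attribute alone is not enough: the bitwise combination is only meaningful when the values are chosen as distinct bits.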

-----------------------------

Unrelated. (Java) 'new' considered harmful:
http://www.ddj.com/java/184405016

Bye,
bearophile