guidelines for parameter types

Tue Dec 18 09:22:56 PST 2012

On 12/18/2012 04:51 AM, Dan wrote:
 > On Tuesday, 18 December 2012 at 06:34:55 UTC, Ali Çehreli wrote:
 >> I don't think this is well known at all. :) I have thought about these
 >> myself and came up with some guidelines at http://ddili.org/ders/d.en
 >
 > Thanks - I will study it. I see that you have covered also in, out,
 > inout, lazy, scope, and shared, so that should keep me busy for a while.

For convenience, here are the chapters and guidelines that are relevant:

1) Immutability:

   http://ddili.org/ders/d.en/const_and_immutable.html

Quoting:

* As a general rule, prefer immutable variables over mutable
   ones.

* Define constant values as enum if their values can be
   calculated at compile time. For example, the constant value of
   seconds per minute can be an enum:

         enum int secondsPerMinute = 60;

* There is no need to specify the type explicitly if it can be
   inferred from the right hand side:

         enum secondsPerMinute = 60;

* Consider the hidden cost of enum arrays and enum associative
   arrays. Define them as immutable variables if the arrays are
   large and they are used more than once in the program.

* Specify variables as immutable if their values will never
   change but cannot be known at compile time. Again, the type can
   be inferred:

         immutable guess = read_int("What is your guess");

* If a function does not modify a parameter, specify that
   parameter as const. This would allow both mutable and immutable
   variables to be passed as arguments:

     void foo(const char[] s)
     {
         // ...
     }

     void main()
     {
         char[] mutableString;
         string immutableString;

         foo(mutableString);      // ← compiles
         foo(immutableString);    // ← compiles
     }

* Following from the previous guideline, consider that const
   parameters cannot be passed to functions taking immutable. See
   the section titled "Should a parameter be const or immutable?"
   above.

* If the function modifies a parameter, leave that parameter as
   mutable (const or immutable would not allow modifications
   anyway):

     import std.stdio;

     void reverse(dchar[] s)
     {
         foreach (i; 0 .. s.length / 2) {
             immutable temp = s[i];
             s[i] = s[$ - 1 - i];
             s[$ - 1 - i] = temp;
         }
     }

     void main()
     {
         dchar[] salutation = "hello"d.dup;
         reverse(salutation);
         writeln(salutation);
     }

     The output:

     olleh
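
To make the "const cannot be passed to immutable" guideline concrete, 
here is a minimal sketch. The function names needsImmutable and 
takesConst are hypothetical, made up only for this illustration:

```d
import std.stdio;

// Hypothetical function that demands immutable data.
void needsImmutable(immutable(char)[] s)
{
    writeln("immutable: ", s);
}

// Hypothetical function that accepts both mutable and immutable data.
void takesConst(const(char)[] s)
{
    // needsImmutable(s);      // ← compilation ERROR: s may refer to mutable data
    needsImmutable(s.idup);    // ← compiles: an immutable copy is made first
}

void main()
{
    char[] m = "hello".dup;

    takesConst(m);          // mutable argument
    takesConst("world");    // immutable argument
}
```

Note that the copy is made even when the caller's data was immutable to 
begin with, because that information is lost once the parameter is const.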

2) const ref Parameters and const Member Functions:

   http://ddili.org/ders/d.en/const_member_functions.html

Quoting:

* To give the guarantee that a parameter is not modified by the
   function, mark that parameter as in, const, or const ref.

* Mark member functions that do not modify the object as const:

     struct TimeOfDay
     {
     // ...
         string toString() const
         {
             return format("%02s:%02s", hour, minute);
         }
     }

  This would make the struct (or class) more useful by removing an
  unnecessary limitation. The examples in the rest of the book
  will observe this guideline.
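
As a small sketch of the limitation that const removes: the hour and 
minute members below are my assumption about what the elided part of the 
struct contains. Calling toString() on an immutable object compiles only 
because the member function is marked as const:

```d
import std.stdio;
import std.format : format;

struct TimeOfDay
{
    int hour;      // assumed members; the original elides them
    int minute;

    string toString() const
    {
        return format("%02s:%02s", hour, minute);
    }
}

void main()
{
    immutable lunch = TimeOfDay(12, 30);

    // Without 'const' on toString(), this line would not compile.
    writeln(lunch.toString());    // prints 12:30
}
```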

3) Constructor and Other Special Functions:

   http://ddili.org/ders/d.en/special_functions.html

Quoting:

Immutability of constructor parameters

   In the Immutability chapter we have seen that it is not easy to
   decide whether parameters of reference types should be defined
   as const or immutable. Although the same considerations apply
   for constructor parameters as well, immutable is usually a
   better choice for constructor parameters.

   The reason is, it is common to assign the parameters to members
   to be used at a later time. When a parameter is not immutable,
   there is no guarantee that the original variable will not
   change by the time the member gets used.

 >> I don't know how practical it is but it would be nice if the price of
 >> copying an object could be considered by the compiler, not by the
 >> programmer.
 >
 > I agree - would be nice if compiler could do it but if it tried some
 > would just not be happy about the choices, no matter what.
 >
 >>
 >> According to D's philosophy structs don't have identities. If I pass a
 >> struct by-value, the compiler should pick the fastest method.
 >>
 >
 > Even if there is a postblit? Maybe that would work, but say your object
 > were a reference counting type. If the compiler decided to pass by ref
 > sneakily for performance gain when you think it is by value that might
 > be a problem. Maybe not, though, as long as you know how it works. I
 > have seen that literal structs passed to a function will not call the
 > postblit - but Jonathan says this was a bug in the way the compiler
 > classifies literals.

I am also keeping in mind that struct objects are supposed to be treated 
as simple values without identities:

   http://dlang.org/struct.html

Quoting:

   A struct is defined to not have an identity; that is, the
   implementation is free to make bit copies of the struct as
   convenient.

 >> That's sensible. (In practice though, it is rarely done in C++. For
 >> example, if V is int and v is not intended to be modified, it is still
 >> passed in as 'V v'.)
 >>
 >
 > Absolutely. I read somewhere it was pedantic to do such things. Then I
 > read some other articles that touted the benefit, even on an int,
 > because the reader of (void foo(const int x) {...} ) knows x will/should
 > not change, so it has clearer intentions for future maintainers.

Yeah. In C++, it is funny that I make my local variables const as 
much as possible, but leave all of the by-value parameters non-const. 
I think part of the reason is that top-level const is seen as leaking 
an implementation detail into the signature. It can also confuse 
newer users.

 >> That makes a difference whether V is a value type or not. (It is not
 >> clear whether you mean V is a value type.) Otherwise, e.g.
 >> immutable(char[]) v has a legitimate meaning: The function requires
 >> that the caller provides immutable data.
 >
 > When is 'immutable(char[]) v' preferable to 'const(char[]) v'? If you
 > select 'const(char[]) v' instead, your function will not mutate v and if
 > it is generally a useful function it will even accept 'char[]' that *is*
 > mutable. I agree with the meaning you suggest, but under what
 > circumstances is it important to a function to know that v is immutable
 > as opposed to simply const?

Yes, const(char)[] is more welcoming as you state. On the other hand, 
immutable is a requirement on the user: The function demands immutable 
data. This may be so if that string should be used later unchanged. 
Imagine a constructor takes the file name as 'string' (i.e. 
immutable(char)[]). Then the object is assured that the file name can be 
used later and it will be the same as when the object has been constructed.
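
Here is a rough sketch of that constructor case. The Logger type is 
hypothetical, invented only for this example:

```d
import std.stdio;

// Hypothetical type that keeps a file name for later use.
struct Logger
{
    string fileName;    // immutable(char)[]: the characters can never change

    this(string fileName)
    {
        this.fileName = fileName;    // no defensive copy is needed
    }
}

void main()
{
    char[] buffer = "app.log".dup;

    // auto bad = Logger(buffer);        // ← compilation ERROR: char[] is not a string
    auto logger = Logger(buffer.idup);   // the caller makes the immutable copy

    buffer[0] = 'X';                     // does not affect the logger
    writeln(logger.fileName);            // prints app.log
}
```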

Assuming that the object (or a function) needs the string to not change 
ever, let's enumerate the cases:

If the function signature is const(char)[], the function must make an 
.idup of it because it cannot rely on the user not changing it.

If the function signature is immutable(char)[], then the function is 
leaking out an implementation detail: It is communicating the fact to 
the user, saying "I need an immutable string, if you have one, great; if 
not, *you* make an immutable copy to give me." By that analysis, I see 
'string' parameters as an optimization: Yes, immutable data is 
needed, but if the user already has it, the immutable copy is elided.

A solution that I have for the above is to make the function a template, 
and use a 'static if' to decide whether the object was mutable, and make 
an immutable copy if needed:

import std.stdio;

// Returns the slice itself if its elements are already immutable;
// otherwise returns an immutable copy.
immutable(E)[] immutableOf(E)(E[] param)
{
     static if (is(E == immutable)) {
         return param;

     } else {
         writeln("Duplicating mutable " ~ typeof(param).stringof);
         return param.idup;
     }
}

void foo(T)(T s)
{
     immutable imm_s = immutableOf(s);
     writefln("s.ptr: %s, imm_s.ptr: %s", s.ptr, imm_s.ptr);
}

void main()
{
     char[] m = "hello".dup;
     immutable(char)[] s = "world";

     foo(m);
     foo(s);
}

The output shows that an immutable copy is made only when the user's 
data was mutable to begin with:

Duplicating mutable char[]
s.ptr: 7F6E216E8FD0, imm_s.ptr: 7F6E216E8FC0
s.ptr: 482240, imm_s.ptr: 482240

The above works but obviously is very cumbersome.

There is a similar analysis for return value types: Why should I ever 
return a string from a function that produces one? Why restrict my 
users? I should return char[] so that they can further modify it if 
they want to.

Later I learned that mutable return values of pure functions can be 
converted to immutable automatically; so yes, it makes more sense to 
return char[].

char[] foo() pure     // <-- returns mutable
{
     char[] result;
     return result;
}

void main()
{
     char[] m = foo();  // <-- works
     string s = foo();  // <-- works
}

 >> | ref immutable(V) v | No need - restrictive with no benefit|
 >> | | over 'ref const(V) v' |
 >>
 >> It still has a different meaning: You must have an immutable V and I
 >> need a reference to it. It may be that the identity of the object is
 >> important and that the function would store a reference to it.
 >>
 >
 > This may be a use-case for it. You want to store a reference to v and
 > save it for later - so immutable is preferred over const. I may be
 > mistaken but I thought the thread on 'rvalue references' talks about
 > taking away the rights to take the address of any ref parameter:
 > http://forum.dlang.org/post/4F863629.6000407@erdani.com

I am behind with my reading. I remember that thread but I must study it 
again. :)

 >> Again, if the function demands immutable(V), which may be null, then
 >> it actually has some use.
 >
 > I agree - I just don't know yet when a function would demand
 > 'immutable(V)' over 'const(V)'.

It makes sense only for by-reference I think. At the risk of repeating 
myself, the function wants to store a file name to be used later.

 >> | T t | T is primitive, dynamic array, or assoc |
 >> | | array (i.e. cheap/shallow copies). For |
 >> | | generic code no knowledge of COW or |
 >> | | cheapness so prefer 'ref T t' |
 >>
 >> I am not sure about that last guideline. I think we should simply type
 >> T and the compiler does its magic. I don't know how practical my
 >> hope is.
 >>
 >> Besides, we don't know whether T is primitive or not. It can be
 >> anything. If T is int, 'ref T t' could actually be slower due to the
 >> pointer indirection due to ref.
 >
 > Agreed. In a separate thread
 > http://forum.dlang.org/thread/opufykfxwkkjchqcwgrg@forum.dlang.org I
 > included some timings of passing a struct as 'in S', 'in ref S', and
 > 'const ref S'. The very small sizes, matching up to sizes of primitives,
 > showed little if any benefit of by value over ref. Maybe the
 > test/benchmark was flawed?

I must read that too. :)

I wonder whether the compiler applied optimizations and was able to keep 
lots of stuff in registers. If the code is complex enough perhaps then 
by-value may be faster. (?)

 > But for big sizes, the by reference clearly
 > won by a large margin. The problem with template code is you don't have
 > any knowledge and the cost of 'by value' is unbounded, whereas
 > difference between 'int t' and 'ref const(int) t' might be small.

Right. I hope others bring their experiences. We must understand these 
details. :)
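
For anyone who wants to experiment, here is a rough sketch of such a 
measurement. It is not the benchmark from the other thread; the struct 
S, its size, and the function names are my assumptions, and it relies on 
std.datetime.stopwatch from a recent Phobos. An optimizing compiler may 
inline both calls and erase the difference, so the numbers should be 
taken with a grain of salt:

```d
import std.datetime.stopwatch : benchmark;
import std.stdio;

// Hypothetical payload; change the array length to try other sizes.
struct S
{
    ubyte[256] data;
}

size_t byValue(const S s)
{
    return s.data[0] + s.data[$ - 1];
}

size_t byRef(ref const S s)
{
    return s.data[0] + s.data[$ - 1];
}

void main()
{
    S s;
    size_t sink;    // accumulates results so that the calls are not trivially dead

    void callByValue() { sink += byValue(s); }
    void callByRef()   { sink += byRef(s); }

    // Runs each function one million times and reports the total durations.
    auto results = benchmark!(callByValue, callByRef)(1_000_000);

    writefln("by value: %s", results[0]);
    writefln("by ref  : %s", results[1]);
    writeln("sink: ", sink);
}
```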

I was fortunate enough to meet with deadalnix and Denis Koroskin last 
week. I told deadalnix about this very topic and how important it is to 
have a talk on this at DConf. He said he might be willing to give that 
talk. (Unless of course you make your submission for DConf 2013 first. ;) )

Ali