Confusion/trying to understand CTFE keywords

Mon Jun 4 03:18:05 UTC 2018

On Sunday, June 03, 2018 21:32:06 gdelazzari via Digitalmars-d-learn wrote:
> Hello everyone, I'm new here on the forum but I've been exploring
> D for quite a while. I'm not an expert programmer by any means,
> so this one may be a really silly question and, in that case,
> please forgive me.
>
> With the premise that I've still not looked a lot into "complex"
> compile time "stuff" (whether it's templates, CTFE, compile-time
> constants, etc...), so that may be the reason I may be missing
> the point... I'm trying to understand why keywords such as
> "static" or "enum" are used to denote compile time "things". What
> I mean is that those keywords are also used for other purposes,
> so I find it a bit confusing. Couldn't a keyword like "ctfe"
> (just making it up right now) exist? So that, when seeing
> something like
>
> ctfe myNumber = 5;
>
> ctfe if (myNumber + 2 == 7)
> {
>    // ...
> }
>
> one could immediately understand that the code is
> executed/evaluated at compile time. True, after someone knows
> that "static" and "enum" mean (in the above example) that some
> compile-time things are happening, it's fine. I just find it a
> bit confusing not having a dedicated keyword but re-using
> existing ones that also serve other purposes...
>
> Note that this is not an attack on the language or anything (I
> actually really love it), I'm just trying to understand the
> reasoning behind this choice.
>
> Thank you very much in advance.

I think that part of your problem here comes from the fact that you think of
enum or static as "CTFE keywords." That's not what they are at all. Yes,
they can trigger CTFE, but they're not the only way.

Given how CTFE works in D, it really wouldn't make sense to have a keyword
for it. CTFE is simply what kicks in when you have an expression that _must_
be evaluated at compile time. e.g.

enum a = 42;

enum Foo
{
    a = 42,
    b = 92,
    c = 12
}

struct S
{
    int i = 9;
}

all are cases where CTFE is used, because all of them require that the value
be known at compile time. In the example, they're just integers, so no
functions are called, but any one of them could be initialized with an
expression that involved calling a function. e.g.

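// hypothetical helper, defined here only so that the example compiles
int foo() { return 35; }

struct S
{
    int i = foo() + 7; // foo() is evaluated at compile time via CTFE
}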

There is no special keyword here, and there is no need for one. By the
language's design, if i is directly initialized, its value must be known at
compile time, and so any expression that's used to directly initialize it
must be evaluated at compile time. The same goes for enums or static
variables. How would you expect something like this to even work with a
special keyword for CTFE?
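Here is a minimal sketch of that point (the function square and the variable names are hypothetical, chosen only for illustration). The same ordinary function is evaluated via CTFE in each context where the value must be known at compile time, and then called normally at runtime, with no keyword marking either use:

```d
import std.stdio;

// An ordinary function; nothing marks it as "compile-time only".
int square(int x)
{
    return x * x;
}

// Manifest constant: the value must be known at compile time,
// so square(6) is evaluated via CTFE.
enum a = square(6);

// Module-level variable: its initializer must also be known at
// compile time, so CTFE is used here too.
int moduleVar = square(3);

struct S
{
    // Directly initialized member: S.init is fixed at compile time,
    // so this initializer is evaluated via CTFE as well.
    int i = square(2) + 7;
}

void main()
{
    static assert(a == 36);
    static assert(S.init.i == 11);

    // Static local variable: initialized once, at compile time.
    static int s = square(4);
    writeln(s); // prints 16

    writeln(moduleVar); // prints 9

    // The very same function, called at runtime.
    int n = 5;
    writeln(square(n)); // prints 25
}
```

Whether square runs at compile time or at runtime is decided entirely by the context in which it is called.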

Now, the fact that enum was used for manifest constants such as

enum foo = "hello";

in addition to actual enums such as

enum Color
{
    red,
    green,
    blue,
    orange,
    yellow
}

is arguably unnecessary and confusing (and as some of the other posts in
this thread mention, this was done to avoid adding a new keyword). So,
maybe, we should have had something like

manifest foo = "hello";

and made

enum foo = "hello";

illegal, but even if we had done something like that, it would not have had
any effect on how CTFE works. It would have just clarified the difference
between manifest constants and proper enums. As for why enum was reused for
manifest constants, it was not only to save a keyword but also because they
basically act like anonymous enums (and in fact, that's what the spec calls
them): they behave exactly like enums except that they don't declare a new
type. So, whether it would have been better to use a new keyword is a matter
of debate.
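A small sketch of the distinction (foo and Color come from the text above; width and height are hypothetical). A manifest constant introduces only a named value of an existing type, while a proper enum introduces a new type:

```d
import std.stdio;

// Manifest constant: a named value, no new type.
enum foo = "hello";

// A proper enum: declares the new type Color.
enum Color
{
    red,
    green,
    blue,
    orange,
    yellow
}

// The braced anonymous enum form: these members declare no new type either.
enum
{
    width = 640,
    height = 480
}

// foo is just a string; no new type was introduced.
static assert(is(typeof(foo) == string));
static assert(is(typeof(width) == int));

// Color is a distinct enum type, not merely int.
static assert(is(Color == enum));
static assert(!is(Color == int));

void main()
{
    writeln(foo);            // prints hello
    writeln(Color.blue);     // prints blue (enum members print by name)
    writeln(width * height); // prints 307200
}
```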

As for static, most of what it does is inherited from C/C++ and Java, and
while it does get used in several contexts, it's actually used quite
consistently, much as it might not seem that way at first.

When talking about a static variable, the key difference between a static
variable and an enum (be it a manifest constant or an actual enum) is that a
static variable is an actual variable with a location in memory, whereas an
enum is just a value that you can refer to by name. The value of the enum is
basically copy-pasted wherever it is used, which is also why it's not legal
to take the address of an enum. In fact, this means that if you do something like

enum arr = [1, 2, 3];

then every time you use arr, you're potentially allocating a new dynamic
array, because foo(arr) is the same as foo([1, 2, 3]) except for two things:

1. If you change the value of arr in its declaration, the change applies
everywhere that arr is used, whereas if you use [1, 2, 3] directly, you'd have
to change every place that you used it if you wanted to change it.

2. The value of arr is copied, not the expression used to initialize it. So,
if you had

enum arr = [1, 2, bar()];

and bar() resulted in 42, then foo(arr) would be the same as foo([1, 2, 42])
and not foo([1, 2, bar()]).
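The following sketch (bar's body and the name sarr are hypothetical) shows both points: bar() runs once at compile time, and every use of arr behaves like a fresh copy of the literal [1, 2, 42]. For contrast, it also declares an immutable array, which is a single real variable with one location in memory:

```d
import std.stdio;

int bar()
{
    return 42;
}

// bar() runs once, via CTFE. The resulting value [1, 2, 42] is what
// gets pasted in wherever arr is used, not the expression [1, 2, bar()].
enum arr = [1, 2, bar()];

// By contrast, this is a single real array with a location in memory.
immutable int[] sarr = [1, 2, bar()];

void main()
{
    static assert(arr == [1, 2, 42]);

    // Each use of arr is like writing the literal [1, 2, 42] again,
    // so a and b refer to two separately allocated arrays.
    int[] a = arr;
    int[] b = arr;
    a[0] = 100;
    writeln(b[0]);   // prints 1: b is unaffected by the write to a
    writeln(arr[0]); // prints 1: arr is still the original value

    // Slicing the immutable array twice refers to the same memory.
    immutable(int)[] x = sarr;
    immutable(int)[] y = sarr;
    writeln(x.ptr is y.ptr); // prints true
}
```

If you want a named array without a fresh allocation at each use, a static immutable (or module-level immutable) variable like sarr is the usual alternative to an enum.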

However, static variables are actually variables. The key difference between
them and other variables at the same scope is that they're not associated
with that particular instance of whatever the non-static variables are
associated with. So, in the case of a class or struct - e.g.

struct S
{
    int i;
    static int s;
}

the static variable is associated with that class or struct, and that
variable is shared across all instances of that class or struct within the
same thread (for it to also be shared across threads, it would have to be
marked as shared).

If the static variable is a local variable, then it's shared across all
calls to that function rather than being unique to each call of that
function. It's scoped to the function, but otherwise, it's basically the
same as a module-level variable or static variable in a class or struct.
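A sketch of both cases (the names S, i, s, and nextId are hypothetical): a static member shared by all instances of a struct, and a static local variable that keeps its value across calls:

```d
import std.stdio;

struct S
{
    int i;        // one per instance
    static int s; // one per thread, shared by every instance of S
}

int nextId()
{
    // Static local: initialized once (at compile time) and
    // keeps its value across calls to nextId.
    static int counter = 0;
    return ++counter;
}

void main()
{
    S x, y;
    x.i = 1;
    y.i = 2;
    S.s = 10;               // accessed through the type...
    writeln(x.s);           // ...or through any instance: prints 10
    writeln(y.s);           // prints 10: the same variable
    writeln(x.i, " ", y.i); // prints 1 2: separate variables

    writeln(nextId()); // prints 1
    writeln(nextId()); // prints 2
    writeln(nextId()); // prints 3
}
```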

static on module-level variables is a no-op, since there's only one
"instance" of the module. So, they're arguably implied to be static.

Basically, whenever the keyword static is used, it means that that symbol
has no context. In the case of variables, that means that it's not
associated with any particular instance of a class or struct or any
particular call to a function. In the case of types or functions, it means
that it has no context pointer. The place where this is easiest to
understand is with static member functions. e.g. in

struct S
{
    int foo() {...}
    static int bar() {...}
    ...
}

foo has the implied this pointer/ref to the object that foo is being called
on, whereas bar has no implied this pointer/ref. It's scoped to the struct,
but it really isn't much different from a free function in the same module.
The primary difference is that it must be called using either the type it's
in - e.g. S.bar() - or it must be called on an instance of S - e.g.
S.init.bar(). But even if it's called on an instance of S, it still doesn't
have access to that instance. It has no context pointer (which in this case
would have been the this pointer/ref if it had been a non-static member
function).
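To illustrate (the names are hypothetical), the lack of a context pointer shows up in the type you get when taking a function's address: a static member function becomes a plain function pointer, whereas a non-static one taken from an instance becomes a delegate that carries that instance as its context:

```d
import std.stdio;

struct S
{
    int value;

    // Non-static member function: has an implicit this reference.
    int foo()
    {
        return value * 2;
    }

    // Static member function: no this, so it cannot touch value.
    static int bar()
    {
        return 7;
    }
}

void main()
{
    auto s = S(21);
    writeln(s.foo());      // prints 42
    writeln(S.bar());      // prints 7: called through the type
    writeln(S.init.bar()); // prints 7: called on an instance, which it ignores

    // A static member function converts to a plain function pointer...
    int function() fp = &S.bar;
    writeln(fp()); // prints 7

    // ...while foo taken from an instance yields a delegate whose
    // context is that instance.
    int delegate() dg = &s.foo;
    writeln(dg()); // prints 42
}
```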

For a nested function such as in

auto foo(string s)
{
    static int bar(int i)
    {
        ...
    }
    ...
}

the static means that it doesn't have a context pointer to the stack of the
function that it's in. It can't access any variables that are inside of its
outer function, because that would require a context pointer to the stack of
the outer function. On the other hand, if it's non-static, e.g.

auto foo(string s)
{
    int bar(int i)
    {
        ...
    }
    ...
}

then the nested function does have a context pointer to the stack of the
outer function, and it can access those variables and manipulate them.
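A sketch contrasting the two kinds of nested function (foo, add, and twice are hypothetical). Taking their addresses makes the difference visible: the non-static one yields a delegate, which carries a context pointer, and the static one yields a plain function pointer. The checks use isDelegate and isFunctionPointer from std.traits:

```d
import std.stdio;
import std.traits : isDelegate, isFunctionPointer;

int foo(string s)
{
    int total = 0;

    // Non-static nested function: has a context pointer to foo's
    // stack frame, so it can read s and modify total.
    void add(int i)
    {
        total += i + cast(int) s.length;
    }

    // Static nested function: no context pointer, so it can only use
    // its own parameters (referring to total here would not compile).
    static int twice(int i)
    {
        return i * 2;
    }

    add(twice(1)); // total = 0 + (2 + s.length)
    add(twice(2)); // total += 4 + s.length

    static assert(isDelegate!(typeof(&add)));
    static assert(isFunctionPointer!(typeof(&twice)));

    return total;
}

void main()
{
    writeln(foo("hello")); // prints 16
}
```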
Similarly, nested structs and classes have a context pointer to their
containing function. e.g. in

auto foo(string s)
{
    struct S
    {
        int bar(int i)
        {
            ...
        }
        ...
    }
    ...
}

the struct S has access to the stack of the function that it's in, and bar
could access the function parameters of foo. However, if the struct is static:

auto foo(string s)
{
    static struct S
    {
        int bar(int i)
        {
            ...
        }
        ...
    }
    ...
}

then it doesn't get a context pointer to foo's stack, and if you try to
access it, you'll get an error about bar not being able to access the frame
of function foo. It should be noted that this context pointer is why you
usually need to mark Voldemort types as static when the function returning
them takes an alias template parameter (such as a predicate): otherwise, the
compiler ends up with two context pointers for the struct - the function
that it's declared in and the context for the predicate that was passed via
the alias template parameter - and it unfortunately can't handle having
multiple context pointers like that. So, if your nested struct or class
doesn't actually need access to its outer scope, it really should be marked
as static.
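A sketch of the nested struct case (makeGreeter, Greeter, and Counter are hypothetical), using __traits(isNested, ...), which reports whether a type carries a hidden context pointer. Returning a Greeter is fine: because its member function refers to makeGreeter's parameter, the compiler allocates that frame on the GC heap so that it outlives the call:

```d
import std.stdio;

auto makeGreeter(string name)
{
    // Non-static nested struct: it gets a hidden context pointer to
    // makeGreeter's frame, so its member function can read name.
    struct Greeter
    {
        string greet()
        {
            return "Hello, " ~ name;
        }
    }

    // Static nested struct: no context pointer. Referring to name in
    // here would be a compile error about not being able to access
    // the frame of makeGreeter.
    static struct Counter
    {
        int n;

        int next()
        {
            return ++n;
        }
    }

    static assert(__traits(isNested, Greeter));
    static assert(!__traits(isNested, Counter));

    return Greeter();
}

void main()
{
    auto g = makeGreeter("D");
    writeln(g.greet()); // prints Hello, D
}
```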

Nested structs and classes inside of structs and classes follow similar
principles but aren't as consistent about it. Essentially, all classes and
structs nested inside of structs are implicitly static. They have no context
pointer to the struct that contains them and aren't associated with any
particular instance of the struct. Structs nested inside classes are the
same. However, non-static classes nested inside classes _do_ have access to
the class that contains them. They have an implied variable called outer
that is the this reference for the class instance that they're associated
with (this is something that comes from Java). But if a class nested inside
a class is marked with static, then it is not associated with any class
instance and does not have an additional context pointer/reference such as
outer.
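A sketch of nested classes (Outer, Inner, and Helper are hypothetical). Note the o.new Inner() syntax, which D uses to create a nested class instance tied to a particular outer instance; the static nested class needs no outer instance at all:

```d
import std.stdio;

class Outer
{
    int x = 5;

    // Non-static nested class: each instance is tied to an Outer
    // instance, reachable through the implicit outer reference.
    class Inner
    {
        int get()
        {
            return this.outer.x * 2;
        }
    }

    // Static nested class: no outer reference; it is merely
    // scoped inside Outer.
    static class Helper
    {
        int get()
        {
            return 3;
        }
    }
}

void main()
{
    auto o = new Outer();
    o.x = 21;

    // An Inner must be created from an Outer instance.
    auto i = o.new Inner();
    writeln(i.get()); // prints 42

    // A Helper is created like any other class.
    auto h = new Outer.Helper();
    writeln(h.get()); // prints 3
}
```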

Structs and classes at the module level are basically implied to be static,
since they have no additional context pointer and are just scoped by the
module.

Static constructors then are pretty straightforward. They are used to
initialize static variables at runtime instead of compile time and have no
context pointers of any kind. Module-level static constructors are used to
initialize module-level variables (which are implicitly static), and static
class/struct constructors initialize static variables inside
classes/structs.
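A sketch of both kinds of static constructor (squares and Config are hypothetical). A plain static this() runs at runtime when each thread starts (for the main thread, before main); D also has shared static this(), which runs only once per program:

```d
import std.stdio;

// Module-level variable (implicitly static, and thread-local).
int[] squares;

// Module-level static constructor: runs at runtime, so it can do
// work that isn't done at compile time, such as this loop.
static this()
{
    foreach (i; 0 .. 5)
        squares ~= i * i;
}

struct Config
{
    static string greeting;

    // Static struct constructor: initializes Config's static variables.
    static this()
    {
        greeting = "hello";
    }
}

void main()
{
    writeln(squares);         // prints [0, 1, 4, 9, 16]
    writeln(Config.greeting); // prints hello
}
```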

So, while static _seems_ somewhat inconsistent at first, the way it's used
is pretty consistent overall. The main inconsistency is the places where
static is essentially implicit rather than explicit (such as module-level
variables or structs nested in other structs or classes).

Regardless, none of the extra complexity with regard to the differences
between enum and static or the different ways that they're used really has
anything to do with CTFE. CTFE is the general mechanism that evaluates the
initializing expression of any construct whose value must be known at compile
time. It just so happens that enum and static variables are two of the
various cases where a value must be known at compile time and thus are two of
the various cases where CTFE is used.
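As a sketch of some of those other cases (countVowels and Buffer are hypothetical), here is one ordinary function evaluated via CTFE for a static assert condition, a static array length, a template argument, and a static if condition, and then called normally at runtime:

```d
import std.stdio;

// An ordinary function; nothing marks it as usable at compile time.
size_t countVowels(string s)
{
    size_t n = 0;
    foreach (c; s)
    {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
            ++n;
    }
    return n;
}

// The template value parameter N must be known at compile time.
struct Buffer(size_t N)
{
    int[N] data;
}

void main()
{
    // static assert condition: evaluated via CTFE.
    static assert(countVowels("compile time") == 5);

    // Static array length: evaluated via CTFE ("queue" has 4 vowels).
    int[countVowels("queue")] arr;
    writeln(arr.length); // prints 4

    // Template argument: evaluated via CTFE ("education" has 5 vowels).
    Buffer!(countVowels("education")) buf;
    writeln(buf.data.length); // prints 5

    // static if condition: evaluated via CTFE.
    static if (countVowels("abc") == 1)
        writeln("one vowel"); // prints one vowel

    // The same function, called at runtime.
    string input = "runtime string";
    writeln(countVowels(input)); // prints 4
}
```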

So, sorry for the wall of text, but hopefully that helps clarify things.

If you haven't yet, I'd suggest reading

http://ddili.org/ders/d.en/index.html

Even if you understand most of the content already, it will probably help
fill in some of the holes you have in your understanding, and depending on
how much you've done with D, it may contain quite a bit of new content that
will help you out.

- Jonathan M Davis