Inferring static array size

Thu May 16 20:48:18 UTC 2024

On Friday, 3 May 2024 at 09:15:02 UTC, rkompass wrote:
> Could DIP 1039 be restarted?

I don’t know why it couldn’t.

One area where DIP1039 would shine over `staticArray`:
String literals (typed `immutable(Char)[]`) are zero-terminated.
The zero isn’t part of the slice, but it’s there, so one can pass
`"blah".ptr` to a C API and be fine, because `"blah"` isn’t
`['b', 'l', 'a', 'h']`, but rather
`['b', 'l', 'a', 'h', '\0'][0..4]`.
If you write:
```d
auto blah = "blah".staticArray;
```
As far as I can tell, `blah` is an array of 4 chars. It has no 
zero terminator, and I don’t see how it could have one.
However,
```d
immutable(char)[$] blah = "blah";
```
could absolutely be implemented to reserve space for the 4
characters plus a zero terminator. `blah.length` would still be 4,
and otherwise it would be a normal static array, but unlike the
`staticArray` solution, you could pass `blah.ptr` to a C API. What
you can’t do is pass a *copy* of `blah` to a C API, since a copy
only copies 4 bytes.
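For comparison, here is what current D does with the spellings that already exist. This is a small test module, assuming a recent DMD or LDC (e.g. `dmd -unittest -main -run file.d`); it only checks the properties described above and does not use the proposed syntax:

```d
import std.array : staticArray;

unittest
{
    string lit = "blah";
    // 4 elements in the slice; the '\0' sits just past its end.
    assert(lit.length == 4 && lit.ptr[4] == '\0');

    // staticArray copies exactly the 4 characters: no terminator.
    auto blah = "blah".staticArray;
    static assert(is(typeof(blah) == immutable(char)[4]));
    static assert(blah.sizeof == 4);

    // Copying a static array copies .sizeof bytes, nothing beyond.
    auto copy = blah;
    static assert(copy.sizeof == 4);
}
```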

To be precise: let `Char` denote any character type; if `arr` is
a static array declared as `Char[$]` and initialized by a string
literal, then `arr.ptr[arr.length]` is `Char(0)`. That rule does
not apply to non-character element types (e.g. `int`), and it
does not apply to an array spelled out as a list, e.g. `['x']`,
or obtained from a function call. This is already how `Char[]`s
initialized by string literals behave.
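The same split already exists for slices in current D, which is why `std.string.toStringz` is the usual way to hand a non-literal to C. A sketch (the variable names are made up for illustration):

```d
import core.stdc.string : strlen;
import std.string : toStringz;

unittest
{
    // Spelled out as a list: no terminator is guaranteed after the data,
    // so fromList.ptr must not be handed to C as a C string.
    immutable(char)[] fromList = ['b', 'l', 'a', 'h'];

    // Produced at run time (here by idup): no guarantee either.
    string computed = "blah".idup;

    // The library route: toStringz yields a pointer to zero-terminated data.
    assert(strlen(fromList.toStringz) == 4);
    assert(strlen(computed.toStringz) == 4);
}
```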

One might object that the zero terminator not being part of the
array is incorrect, since in C it is part of the array. My
solution would be to add another construct:
```d
immutable(char)[$+1] str = "Hello";
```
The `$+1` is core syntax. There is no `$+2` or anything.
It expresses that the array is (at least) 1 character longer than
the initializer suggests.
This is *exactly* what C does. It’s the maximally faithful 
translation of:
```c
const char str[] = "Hello";
```
It only exists for character arrays and they must be initialized 
by a string literal or by a compile-time known `const(char)[]` 
ending with the zero character.
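The `$+1` layout can be written out by hand today by computing the length from a manifest constant. A sketch, where `hello` and `helloZ` are illustrative names and the explicit `'\0'` stands in for what the proposed syntax would add implicitly:

```d
import core.stdc.string : strlen;

enum hello = "Hello";

// Hand-written stand-in for the proposed
//     immutable(char)[$+1] helloZ = "Hello";
// i.e. the C declaration `const char helloZ[] = "Hello";`
immutable(char)[hello.length + 1] helloZ = hello ~ '\0';

static assert(helloZ.length == 6); // same as sizeof(helloZ) in C

unittest
{
    assert(helloZ[$ - 1] == '\0');
    // The zero belongs to the array, so a copy carries it along.
    auto copy = helloZ;
    assert(strlen(copy.ptr) == 5);
}
```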

Whoever revives DIP1039 should take care to answer the questions
posed in the
[reviews](https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1039.md#reviews). Here they are:
1. The DIP should provide a rationale as to why the feature is 
not allowed in function declarations.
2. The DIP does not provide enough examples; it should clearly 
demonstrate behavior for every situation in which a type can be 
used. The DIP author agrees.
3. The DIP should explain what the problem was with the first
attempt and how this proposal addresses that problem. The DIP
author disagrees.
4. The DIP should specify whether arrays of character types
include a terminating `\0` and, if so, whether it is part of the
length.
5. The DIP fails to provide a rationale as to why
`std.array.staticArray` is insufficient.
6. Better examples are needed. The DIP author agreed.
7. The DIP should use ease of reading/writing code as an argument 
and provide examples to that effect. The DIP author agreed.
8. The benefit gained is very minor, so the DIP should address 
this in relation to the difficulty of the implementation and 
maintenance. The DIP author agreed.

My takes:
1. Conceptually, it makes no sense, as the length is inferred from
the initializer, and function parameters have none. One could
argue that it does make sense for parameters with default
arguments, though. For template value parameters, static arrays
and slices are almost equivalent anyway. It should, however, be
allowed for function return types: if entire return types can be
inferred, inferring just the length should be possible, too.
Essentially, `int[$] f()` is `auto f()` where the compiler errors
if the return type isn’t convertible to a static array.
2. (Grunt work.)
3. I have no idea what the complaint even means. There is no 
“problem,” just a nuisance. Something that’s trivial in C is hard 
in D, which makes no sense.
4. I stated precisely that the zero terminator should be present,
but not part of the static array proper.
5. It’s insufficient for interfacing with C APIs. Something that’s
trivial (and safe) in C requires a library in D.
6. (Grunt work.)
7. (Grunt work.)
8. It is minor, no doubt about that. For quite a few features of
D, the technical benefit is small or even nil, but their
implementation isn’t gigantic either. The prime example is `=>`
function definitions: they *only* have ease of writing on their
side and added nothing that couldn’t easily be done without
them. The `T[$]` declarations at least have the additional
argument that C code can be translated to D in the core language
and that literals could be stack-allocated and passed to C APIs.
Of course, Phobos could add `staticCharArrayZ` to add the zero
terminator. Oh wait, it can’t, because the zero is either lost on
copying or part of the length.
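To illustrate the last point, here is what such a helper might look like. `staticCharArrayZ` is hypothetical (Phobos has no such function) and this is only one possible sketch of it; it shows that a library version can keep the terminator only by counting it in `.length`:

```d
import core.stdc.string : strlen;

// Hypothetical helper (not part of Phobos): returns the characters of `s`
// followed by a zero, as a static array. The zero has to live inside the
// array, so it counts toward .length.
immutable(char)[s.length + 1] staticCharArrayZ(string s)()
{
    immutable(char)[s.length + 1] result = s ~ '\0';
    return result;
}

unittest
{
    auto z = staticCharArrayZ!"blah"();
    static assert(z.length == 5); // the terminator is part of the length
    assert(strlen(z.ptr) == 4);   // C sees "blah"

    // Keeping the length at 4 means slicing the zero off; a static array
    // made from that slice has no room to carry the terminator along.
    immutable(char)[4] trimmed = z[0 .. 4];
    static assert(trimmed.sizeof == 4);
}
```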

---

Other thoughts:

The DIP mentions `T[$]` only in declarations and casts. It
neither states that `T[$]` is a type in its own right nor
denies it.

If `T[$]` were a type, it would be a weird type, on par with
`void`, probably even worse. (Hint: An `int[$]` has no values of
its own, and no size. It must decay into an `int[n]` wherever
it’s used. There are probably many more issues.) My sense is that
making `T[$]` a type requires a lot of work, both to specify and
to implement it. That sets the DIP up for failure, as it becomes
convoluted and full of corner cases. The DIP mentioned it as a
type suffix, which would, IMO, boil down to making `int[$]` a
fully formed type. I can only advise anyone who thinks of
rebooting the DIP: Don’t do that.
The only reason I can think of for `T[$]` to be a type is so
that `object.d` could provide `sstring` as an alias for
`immutable(char)[$]`. My best guess is that people who’d
otherwise write `immutable(char)[$]` over and over in their code
will just define `alias ichar = immutable(char);` and go with
`ichar[$]`.

That doesn’t keep us from allowing `[$]` suffixes on function 
return types (roughly equivalent to `auto` return types), as well 
as on parameter and variable declarations. For that, bake it into 
the syntax there:
```diff
     VarDeclarations:
-       StorageClasses? BasicType TypeSuffixes? IdentifierInitializers
+       StorageClasses? BasicType TypeSuffixes? StaticArraySuffixes? IdentifierInitializers

     Declarator:
-       TypeSuffixes? Identifier
+       TypeSuffixes? StaticArraySuffixes? Identifier

     FuncDeclarator:
-       TypeSuffixes? Identifier FuncDeclaratorSuffix
+       TypeSuffixes? StaticArraySuffixes? Identifier FuncDeclaratorSuffix
+
+   StaticArraySuffixes:
+       [ $ + 1 ] ArraySuffixes?
+       [ $ ] ArraySuffixes?
+
+   ArraySuffixes:
+       [ $ ] ArraySuffixes?
+       [ AssignExpression ] ArraySuffixes?
+       [] ArraySuffixes?
```
That allows, essentially, `char[$+1][][$][4]`, but not 
`char[$]*`. It does not bake into the grammar that `int[$+1]` 
isn’t possible, but it does acknowledge that `[$+1]` cannot 
possibly appear after another `ArraySuffix`.

The reason to allow nested arrays of various kinds, but not, e.g.,
pointers to them, is that nested arrays can be expressed in one
literal:
```d
char[$+1][] strings = [ "abc", "cd" ];
// as if:
char[4][] strings = [ "abc", "cd" ];
// 4 == [ "abc", "cd" ].map!(x => x.length).reduce!max + 1

int[$][$] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
// as if
int[2][3] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
```
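In current D, the inferred dimensions can be computed from a manifest constant and spelled out by hand. A sketch, assuming the initializers are named `stringsInit` and `matrixInit` (hypothetical names) so their shape can be queried at compile time:

```d
import std.algorithm.comparison : max;
import std.algorithm.iteration : map, reduce;

// Hypothetical names for the initializers from the examples above.
enum stringsInit = [ "abc", "cd" ];
enum matrixInit = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];

// What `char[$+1][]` would infer: the longest string plus one.
enum width = stringsInit.map!(x => x.length).reduce!max + 1;
static assert(width == 4);

// Spelled-out equivalent of `int[$][$] matrix = [ ... ];`
int[matrixInit[0].length][matrixInit.length] matrix = matrixInit;
static assert(is(typeof(matrix) == int[2][3]));
```

The point of `[$]` is exactly to avoid the named constants and the repeated length expressions.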

The DIP went into detail about how it’s supposed to work with
variable declarations.

Function parameters need default values for this; then they’re
basically identical to variable declarations and infer the size
from the default argument.
For function return types, say the return type is specified as
`T[$]`. To infer the size, any `return expression;` is treated as
if it were `return cast(T[(expression).length]) expression;`. For
a programmer, that would be annoying to write, but the compiler
can do it. Note that the type inside the `cast` is not evaluated
at run time, so the expression is evaluated exactly once, and its
length must be known at compile time.
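Both cases can be approximated in current D by naming the initializer so its length is available at compile time. The functions below (`total`, `firstPrimes`) are hypothetical examples; `firstPrimes` applies the `cast` rewrite described above literally, and the `static assert`s play the role of the check that an `int[$]` return type would perform:

```d
// Proposed: int total(int[$] xs = [ 1, 2, 3 ])
// Today the default has to be named so its length can be spelled out.
enum defaultXs = [ 1, 2, 3 ];

int total(int[defaultXs.length] xs = defaultXs)
{
    int sum = 0;
    foreach (x; xs)
        sum += x;
    return sum;
}

// Proposed: int[$] firstPrimes() { return [ 2, 3, 5 ]; }
// Rewritten as: return cast(T[(expression).length]) expression;
enum primes = [ 2, 3, 5 ];

auto firstPrimes()
{
    return cast(int[(primes).length]) primes;
}

// "auto, but it must be a static array"
static assert(__traits(isStaticArray, typeof(firstPrimes())));
static assert(is(typeof(firstPrimes()) == int[3]));
```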