Inferring static array size
Quirin Schroll
qs.il.paperinik at gmail.com
Thu May 16 20:48:18 UTC 2024
On Friday, 3 May 2024 at 09:15:02 UTC, rkompass wrote:
> Could DIP 1039 be restarted?
I don’t know why it couldn’t.
One thing where DIP1039 would shine over `staticArray`:
String literals (typed `immutable(Char)[]`) are zero-terminated.
The zero isn’t part of the slice, but it’s there, so one can pass
`"blah".ptr` to a C API and be good, because `"blah"` isn’t
`['b', 'l', 'a', 'h']`, but rather
`['b', 'l', 'a', 'h', '\0'][0..4]`.
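Here is a minimal check of this in today's D. It assumes a compiler that follows the spec's guarantee that string literals carry a trailing zero. Indexing `ptr[4]` reads one element past the slice, so this is `@system` code and is only valid for literals:

```d
import core.stdc.string : strlen;

// A string literal: typed immutable(char)[], length 4.
immutable string blah = "blah";

void main()
{
    assert(blah.length == 4);      // the zero is not part of the slice...
    assert(blah.ptr[4] == '\0');   // ...but it sits right after it in memory
    assert(strlen(blah.ptr) == 4); // so C code can consume blah.ptr directly
}
```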
If you write:
```d
auto blah = "blah".staticArray;
```
As far as I can tell, `blah` is an array of 4 chars. It has no
zero terminator, and I don’t see how it could have one.
However,
```d
immutable(char)[$] blah = "blah";
```
could absolutely be implemented by reserving space for the 4
characters plus a zero terminator. `blah.length` would still be 4,
and `blah` would otherwise be a normal static array, but, contrary
to the `staticArray` solution, you could pass `blah.ptr` to a C
API. However, you can’t pass a *copy* of `blah` to a C API, as a
copy only copies 4 bytes.
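For comparison, here is a sketch of what today's D offers without any new syntax. Only druntime's `core.stdc.string.strlen` is used. A static array initialized from a literal holds exactly the literal's characters. The only way to get a terminator into it is to write the zero out, and then it counts toward `length`:

```d
import core.stdc.string : strlen;

// A static array initialized from a literal holds exactly its 4 characters.
immutable(char)[4] blah = "blah";

// Workaround: spell out the zero; it is then counted by `length`.
immutable(char)[5] blahZ = "blah\0";

void main()
{
    assert(blah.sizeof == 4);       // no room for a terminator
    assert(blahZ.length == 5);      // the zero is part of the array
    assert(strlen(blahZ.ptr) == 4); // C sees "blah"

    // A copy carries the terminator along, precisely because it is counted.
    auto copy = blahZ;
    assert(strlen(copy.ptr) == 4);
}
```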
To be precise: Let `Char` denote any character type; if `arr` is
a static array declared as `Char[$]` and initialized by a string
literal, `arr.ptr[arr.length]` is `Char(0)`. That rule does not
apply to non-character element types (e.g. `int`), and it does not
apply to an array object spelled out as a list, e.g. `['x']`, or
obtained from a function call. This is already the behavior of
`Char[]`s.
One might object that the zero terminator not being part of the
array is incorrect, as in C, it is part of the array. My solution
would be to add another construct:
```d
immutable(char)[$+1] str = "Hello";
```
The `$+1` is core syntax; there is no `$+2` or anything like it.
It expresses that the array is (at least) 1 character longer than
the initializer appears to be.
This is *exactly* what C does. It’s the maximally faithful
translation of:
```c
const char str[] = "Hello";
```
It only exists for character arrays, and those must be initialized
by a string literal or by a compile-time known `const(char)[]`
ending with the zero character.
Whoever revives DIP1039 should take care to answer the questions
posed in the
[reviews](https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1039.md#reviews).
These are:
1. The DIP should provide a rationale as to why the feature is
not allowed in function declarations.
2. The DIP does not provide enough examples; it should clearly
demonstrate behavior for every situation in which a type can be
used. The DIP author agrees.
3. The DIP should explain what the problem was with the first
attempt and how this proposal addresses that problem. The DIP
author disagrees.
4. The DIP should specify if arrays of character types include a
terminating `\0` and, if so, if it is part of the length.
5. The DIP fails to provide a rationale as to why
`std.array.staticArray` is insufficient.
6. Better examples are needed. The DIP author agreed.
7. The DIP should use ease of reading/writing code as an argument
and provide examples to that effect. The DIP author agreed.
8. The benefit gained is very minor, so the DIP should address
this in relation to the difficulty of the implementation and
maintenance. The DIP author agreed.
My takes:
1. It makes no sense conceptually: the length is inferred from
the initializer, and function parameters have none. One could
argue that it does make sense for defaulted parameters, though.
For template value parameters, static arrays and slices are
almost equivalent anyway. It should be allowed for function
return types, though: if entire return types can be inferred,
inferring just the length should be possible, too. Essentially,
`int[$] f()` is `auto f()` where the compiler errors if the
return type isn’t convertible to a static array.
2. (Grunt work.)
3. I have no idea what the complaint even means. There is no
“problem,” just a nuisance. Something that’s trivial in C is hard
in D, which makes no sense.
4. As I stated precisely above, the zero terminator should be
present, but not part of the static array proper.
5. It’s insufficient for interfacing with C APIs. Something that’s
trivial (and safe) in C requires a library in D.
6. (Grunt work.)
7. (Grunt work.)
8. It is minor, no doubt about that. For quite a few features of
D, the benefit is small or, technically, none, but the
implementation isn’t gigantic either. The prime example is `=>`
function definitions: They *only* have ease of writing on their
side and add nothing that couldn’t easily be done without them.
The `T[$]` declarations at least have the additional argument
that C code can be translated to D in the core language and that
literals could be stack-allocated and passed to C APIs.
Of course, Phobos could add `staticCharArrayZ` to add the zero
terminator. Oh wait, it can’t, because either the zero is lost on
copying or it is part of the length.
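Here are two sketches in today's D for takes 1 and 8. The first shows that an `auto` return type already infers a static array type, which is roughly what `int[$] f()` would constrain. The second is a hypothetical `staticCharArrayZ`. The name comes from take 8, and nothing like it exists in Phobos. It is written so that the terminator survives copies, and that forces the zero into the length:

```d
import core.stdc.string : strlen;

// Today's closest analog of the proposed `int[$] f()`: an `auto` function
// whose return expression happens to be a static array.
auto f()
{
    int[3] a = [1, 2, 3];
    return a;
}
static assert(is(typeof(f()) == int[3]));

// Hypothetical helper (NOT in Phobos): to survive copies, the terminator has
// to be stored in the array itself, so it is counted by `length`.
immutable(char)[s.length + 1] staticCharArrayZ(string s)()
{
    immutable(char)[s.length + 1] result = s ~ '\0';
    return result;
}

void main()
{
    assert(f() == [1, 2, 3]);

    auto hello = staticCharArrayZ!"Hello"();
    assert(hello.length == 6);      // the zero counts toward the length
    assert(hello[$ - 1] == '\0');
    assert(strlen(hello.ptr) == 5); // C still sees "Hello"
}
```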
---
Other thoughts:
The DIP mentions `T[$]` only in declarations and casts. It
neither states that `T[$]` is a type in its own right nor does it
deny it.
If `T[$]` were a type, it would be a weird type, on par with
`void`, probably even worse. (Hint: An `int[$]` has no values of
its own, and no size. It must decay into an `int[n]` wherever
it’s used. There are probably many more issues.) My sense is that
adding `T[$]` as a type requires a lot of work, both specifying it
and implementing it.
That sets the DIP up for failure as it becomes convoluted and
full of corner cases. The DIP mentioned it as a type suffix,
which would, IMO, boil down to making `int[$]` a fully formed
type. I can only advise anyone who thinks of rebooting the DIP:
Don’t do that.
The only reason I can think of for `T[$]` to be a type is so that
object.d could provide `sstring` as an alias to
`immutable(char)[$]`. My best guess is that people who’d
otherwise write `immutable(char)[$]` over and over in their code
will just define `alias ichar = immutable(char);` and go with
`ichar[$]`.
That doesn’t keep us from allowing `[$]` suffixes on function
return types (roughly equivalent to `auto` return types), as well
as on parameter and variable declarations. For that, bake it into
the syntax there:
```diff
 VarDeclarations:
-    StorageClasses? BasicType TypeSuffixes? IdentifierInitializers
+    StorageClasses? BasicType TypeSuffixes? StaticArraySuffixes? IdentifierInitializers
 Declarator:
-    TypeSuffixes? Identifier
+    TypeSuffixes? StaticArraySuffixes? Identifier
 FuncDeclarator:
-    TypeSuffixes? Identifier FuncDeclaratorSuffix
+    TypeSuffixes? StaticArraySuffixes? Identifier FuncDeclaratorSuffix
+
+StaticArraySuffixes:
+    [ $ + 1 ] ArraySuffixes?
+    [ $ ] ArraySuffixes?
+
+ArraySuffixes:
+    [ $ ] ArraySuffixes?
+    [ AssignExpression ] ArraySuffixes?
+    [] ArraySuffixes?
```
That allows, essentially, `char[$+1][][$][4]`, but not
`char[$]*`. It does not bake into the grammar that `int[$+1]`
isn’t possible, but it does acknowledge that `[$+1]` cannot
possibly appear after another `ArraySuffix`.
The reason to allow nested arrays of various kinds, but not e.g.
pointers to them, is that those can be expressed in one literal:
```d
char[$+1][] strings = [ "abc", "cd" ];
// as if:
char[4][] strings = [ "abc", "cd" ];
// 4 == [ "abc", "cd" ].map!(x => x.length).reduce!max + 1
int[$][$] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
// as if
int[2][3] matrix = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
```
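You can check the lengths in the comments with today's CTFE. The sketch below reuses the `map`/`reduce` expression from the comment. The enum names are illustrative only:

```d
import std.algorithm.comparison : max;
import std.algorithm.iteration : map, reduce;

// Length that `char[$+1][]` would infer for [ "abc", "cd" ]: longest + 1.
enum stringsLength = [ "abc", "cd" ].map!(x => x.length).reduce!max + 1;
static assert(stringsLength == 4);

// Lengths that `int[$][$]` would infer for the matrix literal.
enum matrixLiteral = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ];
enum rows = matrixLiteral.length;    // outer `$`: 3
enum cols = matrixLiteral[0].length; // inner `$`: 2

void main()
{
    // The explicit spelling that `int[$][$]` would stand for, i.e. int[2][3].
    int[cols][rows] matrix = matrixLiteral;
    static assert(is(typeof(matrix) == int[2][3]));
    assert(matrix[2][1] == 6);
}
```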
The DIP went into detail about how it’s supposed to work with
variable declarations.
Function parameters need default values; with those, they’re
basically identical to variable declarations and infer the size
from the default argument.
For function return types, say the return type is specified as
`T[$]`. To infer the size, any `return expression;` is treated as
if it were `return cast(T[(expression).length]) expression;`. For
a programmer, that would be annoying to write, but the compiler
can do it. Note that the type inside a `cast` is not evaluated at
run time, so the expression is evaluated exactly once, and its
length must be known at compile time.
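The following sketch writes that rewrite out by hand in today's D. `values` is a hypothetical stand-in for a return expression whose length is known at compile time. If the expression's length were only known at run time, the `int[expr.length]` type would be rejected at compile time:

```d
// Hypothetical return expression whose length is known at compile time.
enum values = [1, 2, 3];

// Proposed:  int[$] g() { return values; }
// Rewritten by hand: the length comes from the type inside the cast.
int[values.length] g()
{
    return cast(int[values.length]) values;
}

static assert(is(typeof(g()) == int[3]));

void main()
{
    assert(g() == [1, 2, 3]);
}
```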
More information about the dip.ideas
mailing list