D Language Foundation August 2024 Monthly Meeting Summary
Mike Parker
aldacron at gmail.com
Wed Dec 18 04:34:25 UTC 2024
The D Language Foundation's monthly meeting for August 2024 took
place on Friday the 9th. It lasted about an hour and forty
minutes.
## The Attendees
The following people attended:
* Walter Bright
* Iain Buclaw
* Rikki Cattermole
* Jonathan M. Davis
* Timon Gehr
* Martin Kinkelin
* Dennis Korpel
* Mathias Lang
* Razvan Nitu
* Mike Parker
* Robert Schadek
* Quirin Schroll
* Adam Wilson
## The Summary
### Replacing D's escape analysis
Rikki said he'd spoken with Dennis a month ago about trying to
simplify D's escape analysis, but nothing had come of it. At
BeerConf, he'd brought it up again and Dennis said he'd been
thinking about it. Rikki had also spoken to Walter about it, and
Walter had said that DIP 1000 wasn't quite doing what we wanted
it to do and was a bit too complex.
As such, Rikki wanted to discuss the possibility of replacing DIP
1000 as D's escape analysis solution. He thought the first step
before making any solid decisions was to make sure it was fully
under a preview switch. Dennis confirmed that it currently was.
Rikki said the next step was to think about replacing it and what
that might look like. He asked for suggestions.
Dennis said that before deciding on how to replace it, we should
first state what was wrong with the current design and the goals
of a replacement. He said he had some issues with it. One was the
lack of transitive scope. Another was that structs only had one
lifetime even if they had multiple members. He'd been thinking of
allowing struct fields to be annotated as `scope`, but he didn't
have a concrete proposal yet.
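To illustrate the single-lifetime problem Dennis described, here is a minimal sketch (the struct and field names are invented for the example):

```
// Under DIP 1000, a struct value carries a single lifetime, so
// `scope` applies to the whole of `Buffers`; there is currently no
// way to mark only `shortLived` as `scope`.
struct Buffers
{
    int* shortLived; // might point into a caller's stack frame
    int* longLived;  // might point into GC memory
}

void use(scope Buffers b) @safe
{
    // Both fields are treated as `scope` here, even though only
    // `shortLived` actually needs the restriction.
}
```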
Walter said the difficulty he'd encountered wasn't that DIP 1000
was complicated, but that the language was complicated. You had
to fit it into each of the language's various constructs. How did
reference types work? Or implicit class types? Or lazy arguments?
Constructors? That was where the complexity came from.
He gave the example of implicit `this` arguments to member
functions. He'd explained over and over again that if anyone
wanted to understand how they worked with DIP 1000 in
constructors or member functions, the thing to do was to write it
out as if `this` were an explicit argument. Then you'd be able to
see how it was supposed to work. But people found that endlessly
confusing.
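A minimal sketch of the rewriting trick Walter described (the names here are invented for illustration):

```
struct S
{
    int x;

    // Member function: the `return` attribute says the result may
    // point into `this`.
    int* addrOfX() return { return &x; }
}

// To see what DIP 1000 is doing, write `this` out as an explicit
// parameter; the same rules then read like any other function:
int* addrOfXExplicit(return ref S self) { return &self.x; }
```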
Any proposal to simplify it would also have to justify how it
could possibly be simpler than DIP 1000 was now, as DIP 1000 was
complicated because the language was complicated. It had to
support every language construct.
Rikki said there were three different levels of escape analysis.
The most basic level was "this is an output, and this contributes
to the outputs". Then you had what the language was able to
infer. Then you had what the programmer could add that could be
proven. We didn't really have that scale, so there was no escape
hatch when the analysis was just too broad.
He said he would also like the `this` reference and the nested
encapsulation context to be explicit arguments that you could
annotate when you needed to, e.g., to declare it couldn't be
`null`.
Walter noted that Herb Sutter had put out a proposal for his
revamped C++ language requiring `this` to be explicit. That would
resolve confusion regarding implicit arguments. But there was
also the case of implicit arguments when you had a nested
function. Those were hidden arguments and had the same issue. He
didn't see any straightforward solution because it was a
complicated problem.
He reiterated that the complexity of DIP 1000 was due to the
complexity of the language and not to the concept itself, which
was very simple. If you wrote it out using pointers, everything
was clear and simple. It was when you added things like `auto
ref` that it started getting complex. He'd never liked `auto ref`
and never used it because it was just confusing.
He said if Rikki could think of a better way to do it, he was all
for it. DIP 1000 was his best shot at it.
Timon said that DIP 1000 arguably already incurred some of the
complexity cost of being able to annotate different levels of
indirection, but it didn't allow you to do that in general. There
was probably a better trade-off there.
Walter said there were two kinds of indirection: pointers and
references. That doubled the complexity of DIP 1000 right there.
Timon agreed but said it meant that DIP 1000 was not what you got
when you translated everything to pointers. DIP 1000 was a step
up from that because it actually had two levels of indirection
per pointer, but it was restricted in a way that wasn't
particularly orthogonal. How you annotated either level of
indirection depended on the construct.
Walter agreed and said that was because references had an
implicit indirection and pointers did not. He asked what could be
done about that.
Rikki asked everyone to let him know if they had any ideas.
Quirin said that he understood the aim of DIP 1000 to be that you
could take the address of a local variable, like a static array
or something, and the pointer would be scoped and unable to
escape. So it might be the case that in a future version of the
language where DIP 1000 was the default, there could be a
compiler switch to disable it so that taking the address of a
local variable would then be an error.
He said the issue he'd run into was that if you had a system
function but didn't actually annotate it as `@system`, and then
you had a `scope` annotation, the compiler would assume that you
were doing the scope thing correctly. But if you weren't, you
were screwed. It was very easy to do accidentally.
This was an issue with DIP 1000. You could shoot yourself in the
foot in system code. Not in safe code if you were doing it
correctly, but if you were a beginner and didn't annotate
something `@safe` and then used, for example, `-preview=in`,
which was implicitly scope, you could get into trouble.
So he thought having the option to disable that stuff but enable
all the checks of `scope` and things like that in `@safe` code
would be good.
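A sketch of the kind of trap Quirin described, assuming `-preview=in` makes `in` imply `const scope` (the names are invented):

```
// Compiled with -preview=in but WITHOUT @safe: the compiler trusts
// unannotated (@system) code, so escaping a `scope` parameter is
// not diagnosed.
const(int)* leaked;

void keep(in int x) // with -preview=in, `in` implies `const scope`
{
    leaked = &x; // compiles, but `leaked` dangles once `keep` returns
}
```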
Walter said `@system` turned off all the checks because sometimes
you needed to do nasty things. And beginners shouldn't be writing
`@system` code. Quirin said if you didn't use `@safe`, DIP 1000
made the language more dangerous. He thought this might be why
some people had a problem with it.
Walter thought the biggest problem was that people didn't like to
write annotations. The only reason they were necessary was for
the case where you didn't have a function body. If you just took
the address of a local, the compiler would say, "Okay, that's a
scope pointer now". It would do that automatically. You didn't
have to do anything extra for that. The difficulty was in the two
places where you needed to add annotations: when there was no
function body and in a virtual function. The compiler couldn't do
it automatically in those cases.
Jonathan said part of the problem DIP 1000 was trying to solve
was something he didn't care about. He was totally fine that
taking the address of a local meant you had to avoid escaping it.
It was nice to have extra checks for it, but all those
annotations got very complicated very fast, and a lot of it was
because of how complicated the language was.
For the most part, he wouldn't want DIP 1000 on at all except in
very specific circumstances where he wanted some extra safety.
Not having it was actually simpler. If you were only taking the
addresses of locals in a small number of places, and therefore
those functions were `@system`, then most of your code was safe
and you were fine. But once you turned on DIP 1000, you ended up
with `scope` inferred all over the place, and then figuring out
what was going on became far, far more complicated.
He said it seemed like a lot of complication to try to make
something safe, which most code shouldn't need to worry about
anyway. If you were using the GC for everything, then you
typically only had to take the address of things in a small
number of places. All the complications around `scope` didn't
really buy you anything. It just made it harder to figure out
what was going on.
If it were a problem that needed a solution, he would love it if
we could solve it in a simpler way. He had no clue how we might
go about that, but if it were up to him he'd rather not have it
at all because of the complexity that it brought.
Walter said the language had recently been changed to allow `ref`
for local variables. That allowed for more safety without needing
annotations. He thought it was a good thing. It improved the
language by reducing the need for raw pointers. The next step
would be to allow `ref` on struct fields. The semantics of that
would have to be worked out, but the more you could improve the
language to reduce the need for raw pointers, the more inherently
safe it would become, and there would be fewer problems.
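A minimal sketch of the `ref` local feature Walter mentioned, assuming the syntax that was being rolled out at the time:

```
void main() @safe
{
    int x = 1;
    ref int r = x; // `r` aliases `x`; no raw pointer involved
    r = 2;
    assert(x == 2);
}
```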
Adam said someone in Discord had suggested that we not build
Phobos v3 with DIP 1000 turned on. He kind of agreed with that
view. He'd told Walter before that he thought DIP 1000 had been a
huge waste of time for minimal gain.
Rikki wanted to point out that without reference counting, there
was basically no way we could do 100,000 requests per second.
That was gated by Walter's work on owner escape analysis, and
that in turn was gated on escape analysis. So he was blocked on
this, and that was why he wanted to get escape analysis sorted.
Walter said the reason the ROI was so low was that it was rather
rare for people to have bugs in their programs caused by errant
pointers into the stack. Mathias asked why we were
spending so much time on it in that case. Walter likened it to
airplane crashes: they were rare, but they were disastrous when
they happened. You couldn't be a memory-safe language and have
that problem.
Mathias said that DIP 1000 made him want to use D less, not more,
because of the sea of deprecations he got when he enabled it
with vibe.d. It was just terrible. He was hoping it would never
be turned on by default.
When it came to the DIP itself, he said that composition just
didn't work. Any design that required him to annotate his class
or struct with `scope` in the type definition was dead on
arrival. He said a lot of people compared it to `const`, which
was the wrong comparison. `const` was outside in, but `scope` was
inside out. So if your outer layer was `const` and you composed a
type with multiple layers, then all your layers were `const`.
With `scope` it was the other way around. We had no way to
represent the depth of scopeness in the language. It wasn't
possible grammatically. It was just unworkable and unusable.
I suggested we put a pin in the discussion here and schedule a
meeting just to focus on DIP 1000. Everyone agreed.
(__UPDATE__: We had the meeting later and decided we needed to do
two things to move forward: compile a list of failing DIP 1000
cases to see if they are resolvable or not; and consider how to
do inference by default. I have no further updates at this time.)
### Improve error messages as a SAOC project
Razvan said that Max Haughton had proposed improving compilation
error messages a while back as a potential SAOC project. The goal
was to implement an error-handling mechanism that was more
sophisticated than the current approach of just printing errors
as they happened. The details had yet to be hashed out, but the
main idea was to implement an error message queue.
One of the problems with the current approach was that errors
were sometimes gagged during template instantiation. What we
wanted to do was to save them somewhere so that they could be
printed when returning to the call site. This would be quite
useful also for users of DMD-as-a-library.
With SAOC on the horizon, Razvan wanted to avoid the situation
where the judges accepted an application for this project, and we
later decided we didn't want to go this route for some reason.
Rikki suggested the queue should be thread-safe, as he needed it
for Semantic 4. It had been on his TODO list to write exactly
that, so the project had his support.
Dennis asked what wasn't thread-safe about the current mechanism
with its global error count. Rikki said he hadn't looked into it,
but in a multi-threaded scenario, any thread that threw would
need to write the error out on the main thread. He didn't think
the functionality was there.
Walter said he'd refactored error handling as an abstract class,
so it could be overridden to do whatever we wanted. We could make
it multi-threaded or whatever. The transition to using it was
incomplete because gags were still in there, but one of the
reasons he'd done it was to get rid of gags, and that would
eliminate the global state. He told Razvan that anything like the
proposed project should be built around instantiations of that
class.
Razvan asked if that meant he had Walter's approval for the
project. Walter said he didn't know what it was trying to
accomplish so he couldn't say just yet.
Razvan gave the real-world example of calling `opDispatch` on a
struct. Maybe the body had some errors and failed to instantiate.
You had no way of knowing that at the call site. It would just
look like `opDispatch` didn't exist on that struct. Right now,
without knowing why it failed, there was no way to output a
decent error message. The error was going to say that there was
no field or member for that struct.
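A hedged reconstruction of Razvan's `opDispatch` scenario (the symbol names are invented):

```
struct S
{
    void opDispatch(string name)()
    {
        undefinedHelper(); // error inside the template body
    }
}

void main()
{
    S s;
    // The instantiation failure above is gagged, so the error the
    // user actually sees is along the lines of:
    //   no property `foo` for type `S`
    s.foo();
}
```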
He said there were other examples. The project aimed to save the
error messages instead of just tossing them in the dumpster so
that an accurate error message could be output to the user back
at the call site. When fixing some bugs in the past, he had
needed to resort to all kinds of hacks to decide why something
was failing.
Walter said he thought that was worth pursuing. But it would
involve getting rid of the gagging entirely and replacing it with
another abstract function or another error handler instantiation.
Razvan said that wasn't necessarily true. When errors were
gagged, you could save the state instead of printing them out.
Mathias thought it was a good idea and should go forward.
Regarding instantiation errors, he said he saw them most often
when there was an inference issue. For example, he'd do a `map`,
but somewhere his delegate did an unsafe operation and he ended
up with an error saying the overload couldn't be found. He
wondered if there was a way it could print the error about the
safety problem instead.
Razvan said that this project would save everything that had
failed so that a decision could be made at the call site by
searching through the queue. He didn't know if this could be
solved in other ways.
Walter asked how you would know at the call site which error
mattered. Razvan said it depended on the use case. Walter said if
you printed them all out, then you'd end up with the C++ problem
of hundreds of pages of error messages.
Razvan said the project would give you a tool to put out better
error messages than we had now. It wasn't intended to just save
all the error messages and print them all out. That wouldn't make
sense. Maybe in time--and he suspected Walter wouldn't like
this--we might have priority error messages.
Walter said no, normally it was the first error that mattered. If
you just logged the first error message, you'd be most of the way
to where you were trying to go. Razvan agreed that would be one
strategy.
Jonathan said that once we had the list, there were different
things we could do with it. There might be a flag that put out
five error messages instead of one, or maybe an algorithm that
let it choose errors more intelligently. If we decided it wasn't doing
anything for us we could always get rid of it later. But just
having the list of error messages would enable us to do more than
we currently could without it, though it might be hard to figure
out how to use it in some circumstances.
Walter said as an initial implementation, he'd suggest just
logging the first error message and see how far that got us.
Martin said it was okay as just another straightforward
implementation of the abstract error sink. What worried him was
if any extra context was needed, like different error categories
or warning categories, or instantiation context, that kind of
stuff. If we needed to extend the interface to accommodate that
sort of thing, it might get hairy. Interface changes might come
with a performance cost for compilers that weren't interested in
the feature. That was something to be wary of.
He said another thing was that we already had a compiler switch
to show gagged errors.
Third, there were circumstances in which some code only worked
when a template instantiation was semantically analyzed a second
time due to forward references or something. If we just went with
a simple approach, an error on the first analysis of an
instantiation could be invalidated on the second analysis. But
even in that case, it might be nice to have the error to let you
know about the forward reference.
Razvan agreed there could be some problems with this approach,
but he didn't see any definite blockers. No one objected to
moving forward with the project.
(__UPDATE__: Royal Simpson Pinto was accepted into SAOC 2024 to
work on this project.)
### Moving std.math to core.math
Martin said he'd been wanting to move `std.math` to `core.math`
for years. It had come up in discussions with Walter quite a
while ago in GitHub PRs, and he recalled Walter had agreed with
it. It had come up again more recently in attempts to make the
compiler test suite independent of Phobos. With DMD and the
runtime in the same repository now, it would be nice for all of
the make targets to be standalone with no dependency on Phobos
just to run the compiler tests.
He'd experimented and found that most of the Phobos imports in
the test cases were `std.math`. One common reason was the
exponentiation operator, `^^`. There were also some tests that
tested the math builtins.
Calls to the standard math functions were detected by the
compiler at CTFE using the mangled function names. That was
already a problem because when we changed an attribute in
`std.math`, we needed to update the compiler as well due to the
new mangled name. So we tested that all of that worked and that
the CTFE math results matched what we expected. Hence there
was an implicit dependency on Phobos.
Martin said he wanted approval before going ahead because it
wouldn't be worth it to get going and then be shut down. He
wanted to make sure everyone was on board with it and that there
weren't any blockers to be aware of. Phobos would import and
forward everything to `core.math`, which already existed in the
runtime. It had something like five functions currently.
LDC already did some forwarding of math functions. `std.math` was
one of the few Phobos modules in which LDC and GDC had some
modifications, and that was just to be able to use intrinsics.
Moving it into the runtime would be nicer as it would minimize or
eliminate the need for their Phobos forks.
Walter said that `std.math` was kind of a grab bag of a lot of
things. He suggested just moving things into DRuntime that should
be `core.math` and forwarding to those, then changing the test
suite to use `core.math`. He wanted to keep `std.math`. There was
still a lot of room for math functions that didn't need to be in
the compiler test suite, and they could remain there.
Jonathan said that in the past when we decided we really wanted
something in DRuntime that had been in Phobos, but we really
wanted people importing Phobos, we moved the thing to
`core.internal`. For example, `std.traits` imported
`core.internal.traits` to avoid duplicating traits used in
DRuntime, and users could still get at it through `std.traits`.
In the general case, it was just a question of whether we wanted
`core.internal` or something more public. He'd prefer going with
`core.internal` where possible, but either way, he saw no problem
with the basic idea.
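A sketch of the forwarding pattern Jonathan described (not the actual Phobos source; the module contents are invented):

```
// In DRuntime, the shared implementation:
module core.internal.traits;
enum isPointerType(T) = is(T == U*, U);

// In Phobos, users keep importing std.traits, which forwards:
module std.traits;
public import core.internal.traits : isPointerType;
```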
Rikki said if we were talking about primitives that the compilers
recognized and that were currently living in Phobos, then yeah,
move them. Full stop, no questions asked. He asked if anyone had
an objection to that. When no one did, he said that was the
answer to Martin's question.
Martin said the thing he didn't like about that was that we were
drawing a line. Where should it be drawn? It wasn't just about
the builtins. The list of CTFE builtins might not be complete.
There might be some functions that should be in there but
weren't. But really, most of the functions in `std.math` were
detected by the compiler.
As far as he knew, `std.math` was quite nicely isolated and
didn't depend on anything else in Phobos. He would double-check,
but he was certain it was good in that respect so that it could
just be moved over. He really didn't want to split it up. If it
was in the runtime, it was logical to include it directly from
there, starting from some specific compiler version and keeping
it in the Phobos API for a while for backward compatibility. So
the final location would be in `core.math`.
He said we did the same thing for the lifetime helpers. `move`
used to be in Phobos. That was a totally bollocks decision. How
could such a primitive function be in the standard library
instead of the runtime? But now it was in the runtime,
unfortunately with slightly different semantics, and he'd been
using it from there for ages.
Walter said the dividing line was simple: if you wanted to put it
in the compiler test suite, it needed to go in the runtime.
Martin said he would need to check, but he thought it would be
most of the functions anyway.
Mathias thought we should get rid of the exponentiation operator,
though that wouldn't solve Martin's problem. Martin said moving
it to the runtime would get rid of the special case where you got
the error trying to use it when you didn't import `std.math`. At
least we'd have that. Walter agreed with Mathias that it should
go. He thought it was an ugly wart in the language.
Adam said that Phobos 3 was a great opportunity for the change.
It was a natural dividing line. We could keep Phobos 2 as it was
and support it for a long time, but Martin could do whatever he
wanted in Phobos 3. Adam had already been looking at `std.math`
and thinking how much he dreaded porting it over. So if Martin
came up with something else and told him how to make it work,
he'd make it work.
### Primary Type Syntax DIP
Quirin had joined us to discuss [the current draft of his Primary
Type Syntax
DIP](https://forum.dlang.org/thread/zymqcnpjcpuphpeulhev@forum.dlang.org) (that was the second draft; his most recent as I write [is the fourth draft](https://forum.dlang.org/thread/cekqyahwnumvesppxsfs@forum.dlang.org)).
He assumed most of us had not read through the entire thing, as
it was a long text. He thought most DIPs were really, really
short and missed a lot of detail. He felt that anything related
to a proposal that crossed your mind while reading it should be
covered in the DIP.
The basic idea of the proposal was that we modify the grammar
without any change to the semantics or anything like that. It
aimed to ensure that any type that could appear in an error
message, for example, could also be written in code without
causing parsing errors. You might get a visibility error
because something was private, but that was a semantic error, not
a parsing error.
He said the easiest example was a function pointer that returned
by reference. This could not be expressed in the current state of
D. The DIP suggested we add a clause to the type grammar allowing
`ref` in front of some basic types and some type suffixes. What
had to follow obviously was a function or delegate type suffix,
and this formed a type but not a basic type. The difference was
meaningful because, for a declaration, you needed a basic type
and not a type.
It also suggested that you could form a basic type from a type by
putting parentheses around it. This was essentially the same as a
primary expression, where if you had, e.g., an addition
expression, you could put parentheses around it and then multiply
it with something else. But you had to put the parentheses around
it because it would otherwise have a different meaning.
So to declare a variable of a `ref`-returning function pointer
type, you had to use parentheses:
```
(ref int function() @safe) fp = null;
```
Rikki said that based on his knowledge of parsers, this could be
difficult to recognize. The best way forward would be to
implement it and see what happens. If it could be implemented
without failing the test suite, it shouldn't be an issue and
could go in.
Quirin said he had started implementing it for that reason. So
far, it hadn't been a problem. He'd needed to modify something to
do a further look ahead, but that was a niche case, and he had no
idea why anyone would write such code. But he hadn't found any
issues because the language usually tried to parse stuff as
declarations first. When it didn't work, then it parsed as an
expression. If it succeeded in parsing as a declaration, it just
worked.
Walter said that there was a presentation at CppCon in 2017
titled ['Curiously Recurring C++
Bugs'](https://youtu.be/lkgszkPnV8g?si=2dNk7AVdI75_80Eo). One of
the problems they went into was things like this. Was it a
function call or a declaration? C++ apparently had all sorts of
weird errors around things like this. So when you were talking
about adding more parentheses, there was a large risk of creating
ambiguities that led to unexpected compiler behavior.
In adding more meaning to parentheses in the type constructor,
we'd need to be very sure that it didn't lead to ambiguities in
the grammar, where users could write code that looked like one
thing, but it was actually another completely unintended thing.
He didn't know if the proposal suffered from this problem, but he
suggested caution in adding more grammar productions like this.
Quirin said there were two grammar productions. One was the
primary type stuff, and the other was just allowing `ref` in
front of some part so that you could declare a function pointer
or delegate that returned by reference. He thought the latter one
should be uncontentious. The only problem was that you could just
put `ref` in front of something because it was a `ref` variable,
or a parameter that was passed by reference, and it didn't apply
to the function pointer type.
Walter said that with the function pointer type, you had two
possibilities. One was that the function returned by reference,
and the other was that it was a reference to a function.
Quirin said that was exactly like his second example where you
had a function that returned a reference to a function pointer
that returned its result by reference:
```
ref (ref int function() @safe) returnsFP() @safe => fp;
```
You needed the parentheses here to disambiguate.
Walter said D already had a syntax where you could add `ref` on
the right after the parameter list, and that meant the function
returned by reference. But D allowed ref in both places to mean
the same thing, which was an ambiguity in the language.
Quirin said the problem was that each time someone asked about
this on the forums, the answer was "you can't return a function
pointer by reference". People complained about putting `ref`
after the parameter list because it felt unnatural. His DIP was
trying to make it work with `ref` in front. And if you needed
parentheses to disambiguate, then you needed parentheses.
Walter wasn't saying Quirin was wrong. He just wanted to put up a
warning flag that `ref` was currently allowed in both places.
Changing that could break existing code and result in ambiguity
errors in the grammar. That was his concern.
Quirin said he had an implementation for the proposal, and the
implementation for `ref` worked as intended. He'd played around
with it for quite a while and really tried to push some limits.
He'd found no issues with it.
He said the same issue applied to linkage. Like a function
pointer with `extern(C)` linkage. The issue there in his
implementation was that it didn't apply the linkage to the type.
He could parse it, but he couldn't apply it, and he didn't know
why. But everything else worked perfectly. The example
code he was showing wasn't fantasy code. It was compilable with
his local compiler.
Walter asked Quirin to watch the video he'd mentioned. He said
that maybe Quirin had solved the problem, but asked that he
please review it for grammar and parentheses problems and make
sure the proposal didn't suffer from them.
There were some questions about the details of the DIP that
Quirin addressed, and Rikki suggested an alternative to consider
if it didn't work out. He said it appeared that there was no real
blocker here.
Walter said it was a laudable goal and he liked it. He just
wanted to make sure we didn't get into that C++ problem of an
ambiguous grammar that could be an expression or could be a type,
then the compiler guessed wrong and caused hidden bugs.
Quirin said he had initially thought this would cause some weird
niche problem somewhere and that he'd probably find one if he
implemented it. Miraculously, it just worked. The implementation
was there and anyone could play around with it. It was so much
easier than reading a proposal and trying to work it out in your
head.
Walter said it would be a pretty good thing to try it on the
compiler test suite. Quirin agreed.
### The 'making printf safe' DIP
Dennis was wondering about [the DIP to make `printf`
safe](https://forum.dlang.org/post/v7740t$1q51$1@digitalmars.com). It was mostly meant for DMD, which wanted to become safe. But DMD had the bootstrap compiler situation. Was the plan to wait five years until the bootstrap compiler was up to date, or could we have some shorter-term solutions to make DMD's error interface `@safe` compatible?
Walter asked why we needed such an old version for the bootstrap.
In the old days, his bootstrap compiler was always the previous
release. Why were we going back so far?
Martin said it was because we had the C++ platforms. If we newly
conquered a platform using D, the most practical thing to do
currently was to use GDC. The 2.076 version had the CXX front end
with those backported packages. That was what he recommended to
every LDC package maintainer. They were all concerned about the
bootstrapping process. So he always pointed them to GDC for
bootstrapping the first version. Then they were free to compile
more recent versions.
He said the ideal situation was that we could still use that
specific GDC release to compile the latest version. As far as he
knew, that was the status quo. So we didn't have to do multiple
jumps. Just compile that GCC version, which was still completely
C++, and then you could compile all the existing D compilers
using that GDC.
So whenever we had a new requirement for new features, then it
was going to become a multi-step process. That wasn't a problem
for us but would be for the package maintainers. If we did this
right, we wouldn't put too much pressure on them. Most of
them did it in their spare time, making sure they had D compilers
for their platforms. If we made the bootstrapping process more
complicated for them, they wouldn't appreciate it.
Iain said that it was 2024 and people were still inventing new
CPUs. He'd seen [Chinese developers who had invented their own
MIPS-derived CPU](https://en.wikipedia.org/wiki/Loongson) having
to drag out the old GDC version and port it to their CPU just to
get LDC and DMD working on it. That was another modern,
up-and-coming chip. Having a modern version of the D compiler
rather than the C++ version kept them happy, since from there
they could jump to the latest. So that older bootstrap version
was completely invaluable.
Walter said okay. It wasn't critical that the D compiler source
code be made safe. It was just something he would like to do. But
if it was going to cause a lot of downstream problems, then of
course, what else could we do?
Iain said we'd have to make the documentation very loud and very
explicit. GDC did pretty well at this, explaining what you had to
do if you were starting from a given version of the compiler
because certain versions of GDC were written with a specific C++
standard. To get to the latest, you had to go through these
versions from whatever your starting point was. We should agree
to do the same for DMD as well.
Rikki noted that Elias had made a new Docker image of LDC
that did the bootstrap from the LTS version of it up to the
latest. He said we should be able to dump the compiler code base
as C++, and then use that to bootstrap the same compiler version.
He'd been thinking about that for a long time. It wasn't a
problem today, but it would become a problem down the road.
Dennis asked if he meant exporting the compiler source as C++,
and Rikki said yes. Martin said he very much disagreed. It wasn't
like clang was transformable to C code so it could be
bootstrapped with a C compiler.
Regarding the LTS version of LDC, he had dropped it because he
didn't want to backport platform support in the compiler, in the
runtime, in Phobos, into a very old version with many, many, many
changes in between, just to get a bootstrap. That was stuff that
Iain had already taken care of. That was extremely important work.
He said at some point we'd end up in a situation where we
wouldn't be able to compile the latest with a very old compiler.
There would be some steps needed in between. But any changes we
made should be simple stuff. We could add `@safe` here or there,
or use native bitfields, or whatever. We just had to make a very
conscious decision to introduce new steps only when we really
needed to.
Iain added that whenever we introduced a new feature to the
compiler implementation, it shouldn't be anything fringe. It
should be a well-established feature that was stable and that we
knew was working, and happily working for at least five years.
Martin suggested using cross-compilation when experimenting on
new platforms, and the discussion veered off onto that for a
while. Then Dennis brought us back to the original point.
He thought we all agreed that the bootstrap situation made it
kind of complex to add new `printf` features. He wondered if
there could be an alternative to, e.g., `error("%s",
expr.toChars())`, where we used the `printf` format that included
the length and had a function that could return a tuple of the
length and pointer that was compatible with C varargs, e.g.,
`error("%.*s", expr.toPrintfTuple().expand)`. This would be
compatible with the old compiler. The new compiler could do its
safety checks, but the old compiler would still work without
them. This would allow us to make a `printf`-based error
interface safe with new compilers while not breaking anything.
We'd just have to ditch the magic format string rewriting in the
DIP.
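Dennis's idea might be sketched like this in D. This is only an illustration: `toPrintfTuple` is a hypothetical helper named in the discussion, and `std.typecons.Tuple` is used here for convenience even though DMD itself avoids depending on Phobos.

```d
import core.stdc.stdio : printf;
import std.typecons : tuple;

// Hypothetical helper: turn a D string into the (int length, pointer)
// pair that printf's "%.*s" specifier expects.
auto toPrintfTuple(const(char)[] s)
{
    return tuple(cast(int) s.length, s.ptr);
}

void main()
{
    const(char)[] msg = "out of bounds";
    // .expand splats the tuple into printf's C varargs, so an old
    // bootstrap compiler needs no new language features to build this.
    printf("Error: %.*s\n", msg.toPrintfTuple.expand);
}
```

A newer compiler could layer safety checks on top of such a call, while an older bootstrap compiler would compile it unchanged.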
Martin said that sounded valuable. All we needed was to make sure
it compiled with the older compilers, and because our test suite
was using newer compilers as well, then this would ensure we had
test coverage for the implementation of the new thing.
Walter added that the goal of fixing `printf` here wasn't just to
fix `printf`, but to get rid of the incentive to use C strings in
the front end. Right now, half of the data structures used C
strings and the other half used D strings. Fixing the `printf`
issue would enable us to tilt the source code toward using D
strings everywhere.
Dennis noted that `toPrintfTuple` could just convert a D string
to a `printf` tuple. Walter thought it was a good idea. Mathias
agreed and asked why we were still using `printf` strings in
2024. We had type information. Why were we even passing `%s` in
`std.format`? Tango had a better format for it. C#, Java... they
had all solved this problem differently. Why were we using it?
Walter said it was because `writeln` sucked. Mathias asked why we
couldn't fix it. Walter said that right now that was on Adam.
They had discussed it. The problem was that `writeln` was
absurdly complicated. If you put it or `writefln` in a piece of
code, you'd get a blizzard of template instantiations. That made
it really difficult when you were looking at code dumps to try to
isolate a problem. With `printf` it was really simple. It was
just a function call: push a couple of arguments on the stack,
call a function, done.
Another issue was that `writeln` itself was a bunch of templates.
The error sink was an abstract interface. He thought that was an
ideal use case for an abstract interface and it worked great.
`writeln` was not an abstract interface. It was an overly
complicated system.
Dennis asked if a viable alternative to `printf`-based errors
could be that we created a minimal template version of `writeln`
for DMD, since DMD mostly only concatenated strings and
occasionally formatted an integer. Walter said we could write our
own `printf`, but the one in the C standard library was the most
battle-tested, debugged, and optimized. Dennis emphasized that we
only needed to concatenate strings. We didn't need things like
battle-tested float conversion for that.
Jonathan suggested we just wrap it. Dennis said that was also
okay.
Martin said that DMD at the moment didn't depend on Phobos
because doing so was a big can of worms. We could write our own
stripped down version of `writeln` that we needed. But then there
were similar things in other parts of the code base, like path
manipulations and stuff. All of that was stuff we already had in
Phobos, yet had to implement from scratch using some dirty
`malloc` stuff. That would be one of the first problems.
The second problem was using C varargs for error strings and
such. This was one of those ABI issues that were hard to get
right. They were a very platform-specific, special-case, complex
part of the ABI. This introduced difficulty when conquering a new
platform in trying to get the compiler to compile itself. If we
could ditch C varargs and use proper D stuff, that would make it
all easier.
Adam said that he had talked about simplifying `writeln` and the
`std.conv` stuff, but he'd found that people protested when
anyone suggested getting rid of any templates they liked. He was
on board with what Walter said about `writeln` being problematic
because it was a blizzard of templates. But he kept hearing from
people that we shouldn't remove these templates.
Jonathan said that we couldn't be removing the templates for
range-based stuff. For things like the `write` and `std.conv`
families, the problem was that they were using templates to take
your arbitrary type and convert it to a string. The alternative
was to hand them a string, which meant you had to do the work
upfront yourself. That might work internally in DMD, but not in
Phobos.
Regardless, the implementation we had could be improved. It was
quite slow from what he'd seen. So even if we opted to keep the
blizzard of templates, we needed to redo it.
Walter reiterated that `printf` was much maligned, but it was the
most debugged, optimized function in history. Maybe a `writeln`
could be implemented that just forwarded calls safely to
`printf`. It had its problems, which was why he had put forward
the safe `printf` proposal.
He said Jonathan was absolutely correct that templates gave a lot
of advantages to `writeln`. He wasn't arguing with that. But when
trying to debug the compiler, dealing with `writeln` was a giant
pain. That was why he always went back to `printf`. And he didn't
want the compiler dependent on `writeln`, because then we'd be
unable to bootstrap the compiler.
Jonathan agreed that we didn't want DMD dependent on Phobos. In
that case, maybe just wrapping `printf` with something that took
a `string` and converted it to a C string was the way to go.
Walter said that was what the safe `printf` proposal did; it just
had the compiler rewrite the `printf` expression to make it
memory-safe. Jonathan said we could avoid calling `printf`
directly with a wrapper function instead. Either way, the
compiler's situation was different from the general case.
He said we definitely needed to rewrite `writeln` to make it more
efficient. It wasn't appropriate for the compiler, though, since
it was doing all kinds of stuff the compiler didn't need.
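One way such a wrapper might look, as a minimal sketch: `errorMsg` is a hypothetical name, and the `"%.*s"` specifier avoids any need to allocate a NUL-terminated copy of the D string.

```d
import core.stdc.stdio : printf;

// Hypothetical wrapper: callers pass an ordinary D string and never
// touch C varargs themselves. "%.*s" takes the length explicitly, so
// no NUL terminator or allocation is needed.
void errorMsg(scope const(char)[] msg) @trusted
{
    printf("Error: %.*s\n", cast(int) msg.length, msg.ptr);
}

void main()
{
    errorMsg("undefined identifier");
}
```

Confining the `printf` call to one small `@trusted` function would keep the rest of the caller code `@safe`.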
We left the topic there and moved on to the next one.
### Void-initializing a ref variable
Dennis asked if everyone agreed that `void` initializing a `ref`
variable should be an error. [The DIP didn't specify
it](https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1046.md), and he didn't think there was any use case for it. Walter said that was an error. No one objected.
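For illustration, assuming a compiler with DIP 1046's `ref` variables enabled, the case in question looks like this:

```d
void main()
{
    int x;
    ref int r = x;        // DIP 1046: a ref variable bound at declaration
    r = 42;               // writes through to x

    // ref int bad = void; // no referent to bind to; per the meeting,
    //                     // this should be rejected as an error
}
```

A `void` initializer leaves the `ref` with nothing to refer to, which is why no one saw a use case for allowing it.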
### Scopes and auto ref
Dennis asked if everyone agreed that the keywords `auto ref` on
variables must appear together, not applied from different
scopes, e.g., `auto { ref int x = 3; }`. Walter said
yes, kill that with fire.
Quirin said he'd noticed that when looking at the grammar, `auto`
and `ref` didn't always need to be next to each other. It was
possible, for example, to write `ref const auto foo` in a
parameter list. He suggested we should ban that. Walter said it
should be deprecated.
## Conclusion
Given that some of us would be traveling on the second Friday in
September, just before DConf, we agreed to schedule our next
monthly meeting on the first Friday, September 6th, at 15:00 UTC.
If you have something you'd like to discuss with us in one of our
monthly meetings, feel free to contact me and let me know.