My Long Term Vision for the D programming language

Robert Schadek rburners at gmail.com
Tue Nov 16 21:00:48 UTC 2021


# A Long Term Vision for the D programming language

D -- The best programming language!

I imagine a DConf where you guys yell at me, not because we 
disagree,
but because I'm old and forgot my hearing aid.

This is what I think needs to be done to get us there.

## GitHub / Organisation

GitHub has >10^7 accounts; D's bugzilla has what, 10^3?
No matter what feature GitHub is missing, there is no reason not to
migrate to GitHub.
The casual D user who finds a bug will never report it when they have to
create a special account on our bugzilla.

GitHub has an okay API; I bet we can replicate 99% of the missing
features with very little code executed by some bots.

Additionally, we are terrible at longer-term planning and management.
In pretty much all software projects, you can find milestones, epics,
and roadmaps.
GitHub has those features; GitHub is where our code lives, so why does
our planning not live there as well?

I fully understand that D is a community project and that we cannot tell
the bulk of the contributors to work on issue X or milestone Y, but we
could ask them nicely.
And if we follow our own project planning, they might just follow along
as well.

Currently, I don't know where D is heading.
And if I don't know, how should the average JS developer know?
Not by reading a few hundred forum posts, but by looking at the project
planning tools they are used to.

D does need more people; removing the unnecessary barrier to entry that
is our bugzilla should be a no-brainer.

I see the role of the language/std leadership as keeping on top of the
issues, PRs, milestones, etc.; setting priorities; and motivating people
with good libraries on code.dlang.org to get them into phobos.
And of course, laying out new directions and goals for the
language and library.
Not short term but long term, e.g. ~5 years.
Only after that work is done comes the developing.
Having more development time left would be the measure of success for the
leadership side.


## The D Compiler

### Long term goal

My desktop computer has 64GB of RAM, my laptop has 16GB; why is it that
all D compilers work like it's 1975, when lexer, parser, etc. were
separate programs?
Having played a bit with web languages like Svelte and Elm, I'm
disappointed when going back to D.
An incremental compile, with reggae, for my work project takes about
seven seconds.
Elm basically had the unittests running by the time the keyUp event
reached my editor.
Svelte was equally fast, but instead of re-running the tests, the updated
webpage was already loaded.
I know that D and those two languages aim for different platforms, but
the premise should be clear.
Why redo work if I have enough memory to store it all many times over?
For example, if I have a function
For example, if I have a function

```D
T addTwo(T)(T a, T b) {
	return a + b;
}
```

and a test

```D
unittest {
	auto rslt = addTwo(1, 2);
	assert(rslt == 3);
}
```

and change `a + b` to `a * b`, only the unittest calling it should be
re-compiled and executed.
Additionally, in most modern languages, editors/IDEs can show me what the
type of `rslt` is, plus many more convenience features.
The compiler at some point knew what the type of `rslt` was, but it
forgets it as soon as the compilation is done.
No editor can benefit from this information that the compiler had.
The worst thing though: when I compile next time and no dependency
leading to `rslt` has changed, the compiler computes it all over again.
What a waste of time.
Enough talking about how bad the current state is; let's see how much
greener the grass could be.

Imagine a compiler daemon that you start once per project/program and
that keeps track of all files related to this project.
When a file changes, it lexes and parses the file and stores the
information it can reuse.
As it has the parse tree of the previous version of the file, it should
be able to figure out which declarations have changed.
At some point even dmd must know what the dependencies between
declarations in different modules are, or what types template parameters
have been evaluated to.
If that information is stored, building an LSP (language server protocol)
interface that each LSP client can talk to in order to get this
information is the easy part.
When all the dependencies are tracked, the above example of the minimal
unittest re-run should be possible.
With a well-defined dependency graph, effective multi-threading should be
possible as well.
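To make the declaration-diffing idea concrete, here is a minimal sketch; `Declaration` and the hashing scheme are invented for illustration and are not dmd's actual data structures:

```D
import std.algorithm : filter, map;
import std.array : array, assocArray;
import std.typecons : tuple;

// A made-up, stripped-down record the daemon could keep per declaration.
struct Declaration {
    string name;
    string hash;        // hash of the declaration's source text
    string[] dependsOn; // names of declarations this one uses
}

// Compare the old and new parse of a file and return the declarations
// whose source text changed; dependents can then be found via dependsOn.
string[] changedDeclarations(Declaration[] oldDecls, Declaration[] newDecls) {
    auto oldHashes = oldDecls.map!(d => tuple(d.name, d.hash)).assocArray;
    return newDecls
        .filter!(d => d.name !in oldHashes || oldHashes[d.name] != d.hash)
        .map!(d => d.name)
        .array;
}

unittest {
    auto before = [Declaration("addTwo", "h1"),
                   Declaration("test1", "h2", ["addTwo"])];
    auto after_ = [Declaration("addTwo", "h9"),
                   Declaration("test1", "h2", ["addTwo"])];
    // only `addTwo` changed, so only its dependents need re-running
    assert(changedDeclarations(before, after_) == ["addTwo"]);
}
```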

Why do I have to choose which backend I want to use before I start
the compiler?
I would imagine that, if the compiler daemon didn't find any errors in my
code, I should be able to tell it: use LLVM to build me an x86
executable.
When I next ask for an executable built with the GCC backend, only the
parts that change because of version blocks should be rebuilt.
There is no reason to re-lex, re-parse, or re-anything any already opened
file.
Even better, when working on the unittests, why create any executable at
all?
Why not create whatever bytecode any reasonable VM requires and pass it
along?
Companies run on Lua; why can't my unittests?
There are embedded devices that run a Python VM as their execution
environment.
Compiling unittests to machine code shouldn't be a thing.

WASM needs to be a first-class citizen.
I want to compile D to JavaScript, nice-looking JavaScript.

Now for the really big guns.
When the compiler daemon is basically the glue that holds compiler
library functions together, we could create, basically, database
migrations for breaking changes.
As an example, let's say autodecoding should be removed.
We would write a program that would, as one part of it, find all
instances of a foreach over a string `s` and replace that with
`s.byDchar`.
For all breaking changes between versions, we supply code migrations.
If we are really eager to please, we write a script that applies those to
all packages on code.dlang.org and creates PRs where possible on GitHub.
No more sed scripts, no more 2to3.py scripts; proper compiler library
support for code migrations.
Just imagine the productivity gains for your private code bases when you
have to do refactoring.
Refactoring your D program by writing a D program against the D
compiler library.
To add one more level of meta, this could be leveraged to do refactoring
on the compiler library source itself.
The member `id` of the class TemplateInstance should be called
`identifier`?
No problem, let's write a small migration script.
When phobos's `canFind` becomes `isFindable`, just write a small D
program and run it on the compiler codebase.
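A migration would then be an ordinary D program written against the compiler library. Everything below (`SourceFile`, `allNodes`, `ForeachStatement`, `rewrite`) is a hypothetical API, sketched only to show the shape such a migration might take, not anything that exists today:

```D
// Hypothetical migration script against an imagined compiler-library
// API. None of these types or methods exist; this is pseudocode in D
// syntax.
void removeAutodecoding(SourceFile file) {
    // walk every foreach statement in the parsed file
    foreach (node; file.allNodes!ForeachStatement) {
        // find `foreach (c; s)` where `s` is a string
        if (node.aggregate.type.isSomeString) {
            // rewrite the aggregate to `s.byDchar`
            node.aggregate.rewrite(node.aggregate.source ~ ".byDchar");
        }
    }
}
```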

The documentation/spec of the language leaves things to be desired; we
can spend huge amounts of manpower on it, but keeping the spec correct
and up to date is a tedious, thankless task.
And to be frank, we don't have the numbers; just take a look at a photo
of a C++ standards committee meeting, and one of the last physical DConf.
But why work hard when we can work smart?
Why can't we use `__traits(allMembers, ...)` to iterate the AST classes
and generate the grammar from that?
You changed the grammar? Fair enough, just re-run the
AST-classes-to-ddoc tool, done.
I know the current AST classes do not correctly reflect the language
grammar, but maybe that is something worth fixing.
Also, there are hundreds of small D files that are used as test cases for
the compiler; why aren't they part of the spec?
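As a toy illustration of the reflection idea (using `std.traits.FieldNameTuple`, a convenience wrapper in the same spirit as `__traits(allMembers, ...)`), with made-up stand-in AST classes rather than dmd's real ones:

```D
import std.traits : FieldNameTuple;

// Stand-in AST classes, not dmd's actual hierarchy.
class Expression {}
class AddExp : Expression { Expression left; Expression right; }
class CastExp : Expression { Expression operand; string targetType; }

// Render one grammar-ish production line per AST class by reflecting
// over its fields at compile time.
string production(T)() {
    string s = T.stringof ~ " ->";
    foreach (field; FieldNameTuple!T)
        s ~= " " ~ field;
    return s;
}

unittest {
    assert(production!AddExp() == "AddExp -> left right");
    assert(production!CastExp() == "CastExp -> operand targetType");
}
```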

Just to state the obvious, this would require the compiler library to
understand dub.(json|sdl) files, but some of that work is already
underway ;-)

### Error messages

We need really good error messages.
After playing around with Elm, coming back to D is really hard.
In comparison, D might as well just use the system speaker to emit a beep
every time it finds an error.


## phobos

Batteries included, all of them, even the small flat strange ones.

### Serialization

That means that phobos needs to have support for json-schema, 
yaml, ini, toml,
sdl, jsonnet.
Given a json-schema file named `schema.json`, we need to be able to write
`mixin(generateJsonSchemaASTandParser(import("schema.json")))` and get a
parser and AST hierarchy based on the `schema.json`.
json-schema is also sometimes used for yaml; that should be supported as
well.
Some of the other formats support similar schema specifications as well.
Given a hierarchy of classes/structs, phobos also needs a method to build
parsers for those file formats.
Yes, that means serialization should be a part of phobos.
Ideally, we find an abstract DSL / set of UDAs that can be reused for all
of the formats, but the more important step is to have them in phobos.
Perfection being the enemy of the good and all.
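A minimal sketch of what a shared UDA vocabulary could look like; the `Rename` UDA and the flat key=value output below are made up, standing in for real toml/ini/yaml backends that would all consume the same annotations:

```D
import std.conv : to;
import std.traits : FieldNameTuple, getUDAs, hasUDA;

// A made-up UDA that any format backend could honor.
struct Rename { string name; }

struct Config {
    @Rename("max-retries") int maxRetries;
    string host;
}

// Serialize any plain struct to a flat key=value format, a stand-in for
// the real format backends sharing one UDA vocabulary.
string serialize(T)(T value) {
    string s;
    foreach (field; FieldNameTuple!T) {
        static if (hasUDA!(__traits(getMember, T, field), Rename))
            enum key = getUDAs!(__traits(getMember, T, field), Rename)[0].name;
        else
            enum key = field;
        s ~= key ~ "=" ~ __traits(getMember, value, field).to!string ~ "\n";
    }
    return s;
}

unittest {
    assert(serialize(Config(3, "dlang.org"))
            == "max-retries=3\nhost=dlang.org\n");
}
```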

### Event loop

phobos needs to have support for an event loop.
The compiler daemon library thing needs that, and that thing should be a
heavy user of phobos; dogfooding, right?
io_uring seems to be the fast, modern mechanism on Linux > 5.2;
obviously Windows and macOS need to be supported as well.
But again, if the Windows event loop is 5x slower than the Linux one, so
be it.
It is much more important that there is no friction to get started.
The average JavaScript dev looking for a statically typed language will
likely be blown away by the performance nonetheless.
I'm not saying merge vibe-core, but I am saying take a really close look
at vibe-core, and grill Sönke for a couple of hours.
At least with io_uring, this event loop should scale mostly linearly in
performance with the number of threads, given enough CPU cores.
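A hypothetical, platform-neutral surface for such an event loop might look like the interface below; every name is invented, and a real design would also need cancellation, timers, and error reporting:

```D
// Hypothetical platform-neutral event-loop API; io_uring, IOCP, and
// kqueue backends would each implement it. All names are made up.
interface EventLoop {
    // Submit an asynchronous read; `done` fires with the byte count
    // (negative for an error, mirroring io_uring's convention).
    void readAsync(int fd, ubyte[] buf, void delegate(long nRead) done);

    // Submit an asynchronous write, same completion convention.
    void writeAsync(int fd, const(ubyte)[] buf,
            void delegate(long nWritten) done);

    // Run a task on the loop's thread, e.g. from another thread.
    void schedule(void delegate() task);

    // Process completions until stopped.
    void run();
}
```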

### HTTP

Yes, 1, 2, and 3.

### Interop

I'm not sure if this is the right place to talk about this, but I 
didn't find
any better place, so here I go.
autowrap ^1 already allows trivial interaction with Python and Excel.
This, and support for C#, WASM, Haskell, Go, and Rust, should be part of
phobos/D.
If a project demands getting some toml output out of a Go call, passing
it to Haskell because there is an algorithm you want to reuse, followed
by a call to scikit-learn, and finally passing it to C#, D should be the
obvious choice.

### Error Messages

The error messages in phobos are sometimes not great.
That is not good.
When you come from another language that is not C++ and try to get
started with ranges, good error messages in phobos are important.

One obvious example is how we constrain template functions similar to
this:

```D
auto someFun(R)(R r) if (isInputRange!R) {
	...
}
```

you get stuff like

```
a.d(8): Error: template `a.someFun` cannot deduce function from 
argument types `!()(int)`
a.d(3):        Candidate is: `someFun(R)(R r)`
   with `R = int`
   must satisfy the following constraint:
`       isInputRange!R`
```

looks helpful but it is not as good as it could be.
If you don't know what an InputRange is, this does not help you.
You have to go to the documentation.
This could be made a lot easier by a small refactor.

```D
auto someFun(R)(R r) {
	static assert(isInputRange!R, inputRangeErrorFormatter!R);
	...
}
```

The function `inputRangeErrorFormatter` would create a string 
that shows
which of the required features of an InputRange are not fulfilled 
by `R`.
Especially when there is overload resolution done by template
constraints, the error messages get difficult to understand fast.
Just look at:

```D
a.d(3):        Candidates are: `someFun(R)(R r)`
   with `R = int`
   must satisfy the following constraint:
`       isInputRange!R`
a.d(7):                        `someFun(R)(R r)`
   with `R = int`
   must satisfy the following constraint:
`       isRandomAccessRange!R`
```

This can be fixed quite easily as well:

```D
private auto someFunIR(R)(R r) { ... }

private auto someFunRAR(R)(R r) { ...  }

auto someFun(R)(R r) {
	static if (isInputRange!R) {
		return someFunIR(r);
	} else static if (isRandomAccessRange!R) {
		return someFunRAR(r);
	} else {
		static assert(false, "R should either be an "
				~ "InputRange, but " ~ inputRangeErrorFormatter!R
				~ "\nor R should be a RandomAccessRange, but "
				~ randomAccessRangeErrorFormatter!R
				~ "\ntherefore you cannot call " ~ __FUNCTION__);
	}
}
```
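The `inputRangeErrorFormatter` used in these examples does not exist in phobos; a minimal sketch of what it could do, checking each input-range primitive separately and reporting exactly which ones are missing:

```D
// Sketch of the hypothetical `inputRangeErrorFormatter`: list exactly
// which input-range primitives the type `R` lacks.
string inputRangeErrorFormatter(R)() {
    string msg;
    static if (!is(typeof(R.init.empty) : bool))
        msg ~= R.stringof ~ " is missing `bool empty`\n";
    static if (!is(typeof(R.init.front)))
        msg ~= R.stringof ~ " is missing `front`\n";
    static if (!is(typeof(R.init.popFront())))
        msg ~= R.stringof ~ " is missing `void popFront()`\n";
    return msg;
}

unittest {
    // `int` has none of the three primitives, so all three are reported.
    assert(inputRangeErrorFormatter!int().length > 0);
}
```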

### Synchronization

This section needs to be read together with the section about *shared* in
*The Language* part of this text.
When we have an event loop that also works with threads, communication
has to happen somehow.
Mutexes do not scale, because getting them right is just too hard.
As an exercise, name the three necessary requirements for a deadlock.
Wrong, there are four.

* Mutual exclusion
* Hold and wait
* No preemption
* Circular wait

phobos must have message passing that works with threads and the event
loop.
Two kinds of mailboxes are to be supported, 1-to-1 and 1-to-N, where N is
a defined number of receivers, such that the next sender is blocked until
all N have read.
Both types support multiple senders, and predefined mailbox queue sizes.
Making this @safe, and not just @trusted, will likely require some
copying.
That is fine; when copying is eating your multi-threading gains,
multi-threading was not the solution to your problem, IMO.
Message passing and `SumType` are likely a nice way to emulate the Ada
rendezvous concept.
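To make the mailbox idea concrete, here is a minimal 1-to-1 bounded mailbox sketch built on druntime primitives; it ignores the 1-to-N case, @safe-ty, and event-loop/fiber integration:

```D
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;

// Minimal bounded mailbox sketch: senders block when the queue is full,
// receivers block when it is empty. Not production code.
class Mailbox(T) {
    private T[] queue;
    private size_t capacity;
    private Mutex mtx;
    private Condition cond;

    this(size_t capacity) {
        this.capacity = capacity;
        this.mtx = new Mutex();
        this.cond = new Condition(mtx);
    }

    void put(T value) {
        synchronized (mtx) {
            while (queue.length >= capacity)
                cond.wait(); // wait for a receiver to drain the queue
            queue ~= value;
            cond.notifyAll();
        }
    }

    T take() {
        synchronized (mtx) {
            while (queue.length == 0)
                cond.wait(); // wait for a sender
            T head = queue[0];
            queue = queue[1 .. $];
            cond.notifyAll();
            return head;
        }
    }
}

unittest {
    auto mb = new Mailbox!int(4);
    mb.put(1);
    mb.put(2);
    assert(mb.take() == 1);
    assert(mb.take() == 2);
}
```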

## The Language

Get your tomatoes and eggs ready.

### GC

The GC is here to stay; you don't do manual memory management (MMM) in a
compiler daemon that tracks dependencies.
I don't care how smart you are, you are not that smart.
D is not going to run the ECU of the next Boeing airplane; Rust will
succeed C there.
Rust will succeed C and C++ everywhere, but who cares, JS runs the rest.
How many OS kernels have you written, and how many data transformations
have you written?
So fight a war that is over and lost, for a niche field anyway, or
actually have some wins and run the world.

Mixing MMM, RC, and GC is also too complicated IMO.
The whole lifetime-tracking requirement makes my head spin.
That being said, I think there is a place to reuse the gained knowledge.
In my day job I have a lot of code that results in a call to
std.array.array allocating an array of some T, which by the end of the
function gets transformed into something else that is then returned.
The array never leaves the scope of the function.
Given lifetime analysis, the compiler could insert GC.free calls.
Think of an automatic `T t` to `scope T t` transformation.
At least for the code I have been writing for the last two years, this
should release quite a bit of memory back to the GC, without the GC ever
having to mark and sweep.
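The pattern looks like this; the `GC.free` call is written by hand here, but it is exactly what lifetime analysis could insert automatically once it proves the temporary array never escapes the function:

```D
import core.memory : GC;
import std.algorithm : map, sum;
import std.array : array;
import std.range : iota;

int sumOfSquares(int n) {
    // GC allocation that never leaves this scope
    auto tmp = iota(n).map!(i => i * i).array;
    // what the compiler could insert for us, given lifetime analysis
    scope (exit) GC.free(tmp.ptr);
    return tmp.sum;
}

unittest {
    assert(sumOfSquares(4) == 0 + 1 + 4 + 9); // 14
}
```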

We want the JS developer; if we have to teach them to use MMM and/or RC,
we might as well not try.
I don't even want to think about memory; I want to get some work done.
I don't want to get more work by thinking about memory.
I want to get my project running and iterate on that.

To summarize, GC and GC only.

### shared

As said in the phobos section about synchronization, this is an important
building block.
As shared is basically broken, maybe painting a holistic picture of where
we want D's multi-threading/fiber programming to go is better than taking
a look at shared on its own.
For me, this would mean sharing data between threads and/or fibers should
be as easy and error-free as letting the GC handle memory.
That means race conditions need to be very difficult to produce, the same
as deadlocks.
This, to me, implies message passing or Ada rendezvous, and not taking
locks to work on shared data.

### betterC

betterC is, at best, a waste by-product; if we have to use betterC to
write something for WASM, or anything significant, we might as well start
learning Rust right now.

### autodecoding

Having been saved by it a couple of times, and using a non-US keyboard
every day, I still think it is not a terrible idea, but I think this
battle is lost and I'm already full of tomatoes by this point.
Meaning, autodecoding will have to go.
At the same time we have to update std.uni and std.utf.
The majority of developers and users of software speak languages that do
not fit into ASCII.
When a project requires text processing, your first thought must be D,
not Perl.
std.uni and std.utf have to be a superset of their equivalents in the top
20 languages.

### properties

Let's keep it simple, and consistent.
You add parentheses to call a function.
You cannot call a property function with parentheses.
You cannot take the address of a property function.
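Illustrated with current syntax; under the rules above, `t.fahrenheit()` and `&t.fahrenheit` would both be compile errors:

```D
// A property is read without parentheses, a normal function is called
// with them.
struct Temperature {
    double celsius;

    @property double fahrenheit() { return celsius * 9.0 / 5.0 + 32.0; }
    double asKelvin() { return celsius + 273.15; }
}

unittest {
    auto t = Temperature(0.0);
    assert(t.fahrenheit == 32.0);   // property: no parentheses
    assert(t.asKelvin() == 273.15); // function: parentheses
    // Proposed rules: `t.fahrenheit()` and `&t.fahrenheit` are errors.
}
```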

### @safe pure @nogc UDA

Consistency is king:

```
@safe    -> safe
@trusted -> trusted
@system  -> system
@nogc    -> nogc
```

Long story short: language attributes do not start with an @,
user-defined attributes (UDAs) do.

### string interpolation

I had this in the phobos section at the start of writing this.
String interpolation is not what you want; I know it is what you want
right now, because you think it fixes your problem, but it does not.
String interpolation is like shoelaces: you want them, but you are
walking on lava, so open shoes are not actually your problem.
For work, I have D code that generates about 10k lines of TypeScript, and
the places where string interpolation would have helped were trivial to
do with std.format.
IMO, the better solution would be something like vibe's diet, mustache,
or handlebars, but without requiring a build step the way diet does.
Whitespace control and Nullable are a big part of this too.
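For the trivial cases, compile-time-checked `std.format` already covers it; `tsField` below is a made-up example in the spirit of the TypeScript generator mentioned above:

```D
import std.format : format;

// Emit one TypeScript interface field; the format string is checked at
// compile time via the template overload of `format`.
string tsField(string name, string type, bool nullable) {
    return format!"%s%s: %s;"(name, nullable ? "?" : "", type);
}

unittest {
    assert(tsField("id", "number", false) == "id: number;");
    assert(tsField("tag", "string", true) == "tag?: string;");
}
```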

### ImportC

ImportC must have a preprocessor, or it is DOA.
Shelling out to gcc or clang to preprocess makes the build system
horrible, which in turn will make the compiler library daemon thing
difficult to build.
This is also important for the language interop, as I imagine that most
interop will go through a layer of C.
When ImportC can handle openssl 1.0.2s or so, it is good enough.
Having used openssl a bit recently, my eyes cannot un-see the
terribleness that is openssl's usage of C.

### Specification

This was already partially discussed in the long term goals: the language
needs better documentation, or better yet a spec.
The cool thing is, it doesn't need to be an ISO spec, a.k.a. a PDF.
It could very well be a long .d file with lots of comments and unittests.
Frankly, I think that would be much more useful anyway.
Instead of giving a few select/unmaintained examples of a language
feature, show the tests the compiler runs.
Actually, having looked at some of the tests to figure out how stuff
should work, I would imagine other people would benefit as well.
When the compiler fails to execute the spec, either the spec is wrong or
the compiler has a bug.
Two birds with one stone, right? Right!

### Android/iOS

Obviously, D needs to run on those platforms.
Both platforms have APIs; using them must be as easy as
`dub add android@12.0.1`.
The gtkd people basically wrote a small program to create a D interface
to gtk from the gtk documentation.
I bet a round of drinks at the next physical DConf that this is possible
for Android and iOS as well.
The Dart language people shall come to fear our binding-generation
capabilities.


## On Versioning

D3 will never happen; it sounds too much like what we got when we moved
from D1 to D2.
The D2 version number 2.098.X does not make sense.
D 2.099 plus std v2 would also be terrible.
By the time I have explained to somebody new why D is at version 2.099,
with phobos having parts in version v2 in addition to std.experimental,
which was pretty much DOA, the person has installed, compiled, and run
"hello world" in Rust.
I talked to Andrei about this, as it seemed that we were firmly set in
our corners of the argument.
Andrei mentioned the C++ approach, which has been really successful.
Good ideas are there to steal, so let's do what C++ does.

Let's call the next D 23, the one after that maybe D 25.
Backwards compatibility is not a given.
But we ship the latest releases of, let's say, the three most recent D
versions together.
D X is implemented in D X-1.
This would mean that the three old D versions would still need to be able
to create working binaries ~10 years down the road.
I would say the older versions should only get patches for regressions
that stop them from doing so.
If they come with a bug, and we have moved on to a new D version, that
bug will exist forever in that D version.


## Leadership

I'm writing this section as one of the last.
This is maybe one of the most important parts, but also the 
hardest
to validate.
When reading the forum or the GitHub PRs, I get the feeling that people
think that D is a consensus-driven meritocracy.
That is not the case, and that is okay.
The impression that it is, however, is very dangerous, as it sets people
up to be continuously disappointed.
Just look at all the posts where people complain that Walter does not
change his mind.
To me these posts show this disconnect: people expect Walter to change
his mind because, at least in their mind, their idea is better than what
Walter thinks.
But he doesn't have to agree, because he is the *benevolent dictator for
life*.
Who is right or wrong is irrelevant; the impression of the level of
influence is not.
Being a bit dramatic: giving people false hope that then gets
disappointed will drive them away from D.
A simple solution, IMO, is to take a clear stance on issues.
Direct, simple language.
A leadership person saying yes xor no to thing X.
When new information comes up that warrants a reversal of such a
statement, leadership would lay out how the decision (yes|no) on X was
changed by new information Y.

I see the DIP process as troublesome, as it gives the impression of
having a say in what D will become.
Maybe renaming *D Improvement Proposals* into
*D Improvement Suggestions* would be an option, while simultaneously
increasing the amount of work that should go into writing a *DIS*.
I find that especially the *Rationales* of most existing DIPs are way too
short to weigh the pros and cons of an improvement.
Just have a look at the quality of the C++ proposals.
The DISs should aim for that.
Or at least have a matrix of how the improvement interacts with each of
the D features, and an analysis of how it actually makes D better in
real-world terms (code.dlang.org).
This would be another nice use for the compiler library daemon thing.
Always asking: just because we could, should we?

But I believe the formal steps of the DIP process can be avoided if the
direction the language should develop in is clearly marked by leadership.
There is no need to discuss the shared atomics DIP if leadership dictates
that message passing is the selected mechanism for thread communication,
and only that.
Sure, you can still argue for shared atomics, but you have no reason to
be disappointed when nobody takes you seriously, as you already knew
where the journey was going.


## The practical way forward

This year (2021), move from bugzilla to github.
A nice Christmas present to show that we mean business.

D 23:

* remove auto-decoding
* safe by default
* attribute consistency
* ImportC preprocessor
* remove std.experimental

D 25:

* All but the compiler daemon library thing

D 27:

* Compiler daemon thing.

The work on the compiler daemon thing will have to start before 2025.


## The motto

I'm serious about the motto at the top.
When people start complaining that their language is better, it's free
marketing for D.


## Closing

If D continues the way it does, it will soon be irrelevant.
And I don't want that; I want to be yelled at, at DConf 2071.

D's powerful templates, CTFE, and ranges made heads turn, but the other
languages have caught up.
Let us really innovate, so that D not only becomes the Voldemort 
language for
C++, but for all other languages as well, because D is the best 
language.


^1 https://code.dlang.org/packages/autowrap


