My Long Term Vision for the D programming language
Robert Schadek
rburners at gmail.com
Tue Nov 16 21:00:48 UTC 2021
# A Long Term Vision for the D programming language
D -- The best programming language!
I imagine a DConf where you guys yell at me, not because we
disagree,
but because I'm old and forgot my hearing aid.
This is what I think needs to be done to get us there.
## GitHub / Organisation
GitHub has >10^7 accounts, D's bugzilla has what, 10^3?
No matter what feature GitHub is missing, there is no reason not to migrate to
GitHub.
The casual D user, when he finds a bug, will never report it if he has to
create a special account on our bugzilla.
GitHub has an okay API; I bet we can replicate 99% of the features that are
missing with very little code executed by some bots.
Additionally, we are terrible at longer-term planning and management.
In pretty much all software projects, you can find milestones, epics, roadmaps.
GitHub has those features, GitHub is where our code lives, so why does our
planning not live there as well?
I fully understand that D is a community project and that we can
not tell the
bulk of the contributors to work on issue X or milestone Y, but
we could ask
them nicely.
And if we follow our own project planning, they might just follow along as well.
Currently, I don't know where D is heading.
And if I don't know, how should the average JS developer know?
Not by reading a few hundred forum posts, but by looking at the project
planning tools he/she is used to.
D does need more people; removing the unnecessary barrier to entry that our
bugzilla is should be a no-brainer.
I see the role of the language/std leadership as keeping on top of the issues,
PRs, milestones, etc.
Setting priorities, motivating people with good libraries on code.dlang.org
to get them into phobos.
And, of course, laying out new directions and goals for the language and library.
Not short term but long term e.g. ~5 years.
Only after that work is done comes the developing.
Having more development time left would be the measure of success
for the
leadership side.
## The D Compiler
### Long term goal
My desktop computer has 64GB of RAM and my laptop has 16GB, so why do all D
compilers work like it's 1975, when lexer, parser, ... were different programs?
Having played a bit with web languages like svelte and elm, I'm
disappointed
when going back to D.
An incremental compile, with reggae, for my work project takes
about seven
seconds.
Elm basically had the unittests running by the time the keyUp
event reached my
editor.
Svelte was equally fast, but instead of re-running the tests the
updated
webpage was already loaded.
I know that D and those two languages aim for different platforms, but the
premise should be clear.
Why redo work if I have enough memory to store it all many times over?
For example, if I have a function
```D
T addTwo(T)(T a, T b) {
    return a + b;
}
```
and a test
```D
unittest {
    auto rslt = addTwo(1, 2);
    assert(rslt == 3);
}
```
and change `a + b` to `a * b`, only the unittest calling it should be
re-compiled and executed.
Additionally, in most modern languages, most editors/IDEs can show me what
the type of `rslt` is, plus many more convenience features.
The compiler at some point knew what the type of `rslt` was, but
it forgets it
as soon as the compilation is done.
No editor can benefit from this information that the compiler had.
The worst thing, though: when I compile next time and no dependency leading to
`rslt` has changed, the compiler computes it all over again.
What a waste of time.
Enough talking about how bad the current state is, let's see how
much greener
the grass could be.
Imagine, a compiler daemon that you start once per
project/program that keeps
track of all files related to this project.
When a file changes, it lexes and parses the file, and stores the information
it can reuse.
As it has the old parse tree of the previous version of the file,
it should be
able to figure out which declarations have changed.
At some point even dmd must know what the dependencies between declarations in
different modules are, or which concrete types template type parameters have
been resolved to.
If that information is stored, building an lsp (language server
protocol)
interface that each lsp client can talk to, to get this
information is the
easy part.
When all the dependencies are tracked, the minimal unittest re-run from the
example above should be possible.
With a well-defined dependency graph, effective multi-threading should be
possible as well.
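
As a rough sketch of the minimal re-run idea, assume the daemon keeps a map
from each declaration to the declarations that use it.
The names `dependents` and `affected` are made up for this illustration; nothing
like them exists in dmd today.

```D
/// Returns every declaration that directly or transitively depends on
/// `changed`, given a map from a declaration to its direct users.
string[] affected(string[][string] dependents, string changed) {
    string[] result;
    bool[string] seen = [changed: true];
    string[] queue = [changed];
    while (queue.length > 0) {
        // Breadth-first walk over the "is used by" edges.
        string current = queue[0];
        queue = queue[1 .. $];
        if (auto users = current in dependents) {
            foreach (user; *users) {
                if (user !in seen) {
                    seen[user] = true;
                    result ~= user;
                    queue ~= user;
                }
            }
        }
    }
    return result;
}
```

If `addTwo` maps to the one unittest that calls it, a change to `addTwo` yields
exactly that unittest as the work list, and independent entries in the list
could be handed to different threads.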
Why do I have to choose which backend I want to use before I start the compiler?
I would imagine that, if the compiler daemon didn't find any errors in my code, I
should be able to tell it: use llvm to build me an x86 executable.
When I next ask for an executable built with the gcc backend, only the parts
that change because of version blocks should be rebuilt.
There is no reason to re-lex, or re-parse, re-anything any
already opened file.
Even better, when working on the unittests, why create any executable at all?
Why not create whatever bytecode any reasonable VM requires and pass it along?
Companies run on lua, why can't my unittests?
There are embedded devices that run a python VM as their execution environment.
Compiling unittests to machine code shouldn't be a thing.
WASM needs to be a first-class citizen.
I want to compile D to JavaScript, nice-looking JavaScript.
Now for the really big guns.
When the compiler daemon is basically the glue that holds compiler library
functions together, we could create, basically, database-style migrations for
breaking changes.
As an example, let's say autodecoding should be removed.
We would write a program that would, as one part of it, find all instances
of a foreach over a string `s` and replace `s` with `s.byDchar`.
For all breaking changes between versions, we supply code migrations.
If we are really eager to please, we write a script that applies those to all
packages on code.dlang.org and creates PRs where possible on GitHub.
No more sed scripts, no more 2to3.py scripts, proper compiler
library support
for code migrations.
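
To make the autodecoding migration concrete, here is a small sketch of the
"before" and "after" forms of such a loop.
The function names are invented for the illustration; `std.utf.byDchar` is the
existing phobos function.
Note that a `foreach` over a string only decodes when the loop variable is
declared as `dchar`, so that is the form shown as the "before" case.

```D
import std.utf : byDchar;

/// Before: decoding happens implicitly because the loop variable is `dchar`.
dchar[] decodeImplicit(string s) {
    dchar[] result;
    foreach (dchar c; s)
        result ~= c;
    return result;
}

/// After: the migration makes the decoding explicit via `byDchar`.
dchar[] decodeExplicit(string s) {
    dchar[] result;
    foreach (c; s.byDchar)
        result ~= c;
    return result;
}
```

For valid UTF-8 both functions return the same code points.
They differ on invalid input (the implicit form throws, `byDchar` substitutes
the replacement character), which is exactly the kind of detail a migration
tool would have to report.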
Just imagine the productivity gains for your private code bases
when you have
to do refactoring.
Refactoring your D program by writing a D program against the D compiler
library.
To add one more level of meta, this could be leveraged to do refactoring on the
compiler library source itself.
The member `id` of the class TemplateInstance should be called `identifier`?
No problem, let's write a small migration script.
When phobos canFind becomes isFindable, just write a small D program and run
it on the compiler codebase.
The documentation/spec of the language leaves things to be desired. We can
spend a huge amount of manpower on it, but keeping the spec correct and up to
date is a tedious, thankless task.
And to be frank, we don't have the numbers; just take a look at the photo of
the C++ standard committee meeting, and of the last physical DConf.
But why work hard when we can work smart?
Why can't we use `__traits(allMembers, ` to iterate the AST
classes and
generate the grammar from that?
You changed the grammar? Fair enough, just re-run the AST-classes-to-ddoc tool,
done.
I know the current AST classes are not a correct reflection of the language
grammar, but maybe that is something worth fixing.
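
A minimal sketch of what iterating an AST class with `__traits(allMembers, ...)`
could look like.
`IfStatement` and `grammarRule` are toy names invented here; dmd's real AST
classes look different, and a real tool would emit ddoc rather than a string.

```D
// Toy AST node for illustration only.
class IfStatement {
    string condition;
    string thenBody;
    string elseBody;
}

/// Builds a grammar-like rule from the string fields of `T` at compile time.
string grammarRule(T)() {
    string rule = T.stringof ~ ":";
    static foreach (member; __traits(allMembers, T)) {
        // allMembers also lists inherited members such as toString;
        // keep only members that are plain string fields.
        static if (is(typeof(__traits(getMember, T, member)) == string))
            rule ~= " " ~ member;
    }
    return rule;
}
```

Because the function runs during CTFE, its result can be used in a
`static assert` or written out by a documentation generator at build time.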
Also, there are hundreds of small D files that are used as test
cases for the
compiler, why aren't they part of the spec?
Just to state the obvious, this would require the compiler
library to
understand dub.(json|sdl) files, but some of that work is already
being worked
on ;-)
### Error messages
We need really good error messages.
After playing around with elm, coming back to D is really hard.
In comparison, D might as well just use the system speaker to send a beep
every time it finds an error.
## phobos
Batteries included, all of them, even the small flat strange ones.
### Serialization
That means that phobos needs to have support for json-schema,
yaml, ini, toml,
sdl, jsonnet.
Given a json-schema file named `schema.json` we need to be able
to write
`mixin(generateJsonSchemaASTandParser(import("schema.json")))`
and get a
parser and AST hierarchy based on the `schema.json`.
json-schema is also sometimes used for yaml; that should be supported as well.
Some of the other formats support similar schema specifications as well.
Given a hierarchy of classes/structs, phobos also needs a method to build
parsers for those file formats.
Yes, that means serialization should be a part of phobos.
Ideally, we find an abstract, DSL-like set of UDAs that can be reused for all of
the formats, but the more important step is to have them in phobos.
Perfection being the enemy of the good and all.
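
As a sketch of the UDA idea, assume a single attribute that renames a field in
the output.
`jsonKey`, `keyOf`, `toJson`, and `Config` are all hypothetical names for this
example, not an existing or proposed phobos API, and the writer handles only
flat structs of integers and strings without any escaping.

```D
import std.conv : to;

/// Hypothetical UDA: overrides the key a field is written under.
struct jsonKey { string value; }

struct Config {
    @jsonKey("max-jobs") int maxJobs;
    string compiler;
}

/// Key for field `i` of `T`: the @jsonKey value if present, else the field name.
string keyOf(T, size_t i)() {
    string key = __traits(identifier, T.tupleof[i]);
    static foreach (attr; __traits(getAttributes, T.tupleof[i])) {
        static if (is(typeof(attr) == jsonKey))
            key = attr.value;
    }
    return key;
}

/// Minimal JSON writer for flat structs of integers and strings.
string toJson(T)(T value) {
    string result = "{";
    static foreach (i; 0 .. T.tupleof.length) {
        static if (i > 0)
            result ~= ",";
        result ~= `"` ~ keyOf!(T, i)() ~ `":`;
        static if (is(typeof(T.tupleof[i]) == string))
            result ~= `"` ~ value.tupleof[i] ~ `"`;
        else
            result ~= value.tupleof[i].to!string;
    }
    return result ~ "}";
}
```

The same `keyOf` lookup could feed a yaml or toml writer, which is the point of
keeping the attribute set independent of any one format.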
### Event loop
phobos needs to have support for an event loop.
The compiler daemon library thing needs that, and that thing should be a heavy
user of phobos; dogfooding, right?
io_uring seems to be the fast modern system on linux > 5.2; obviously Windows
and macOS need to be supported as well.
But again, if the Windows event loop is 5x slower than linux, so be it.
It is much more important, that there is no friction to get
started.
The average JavaScript dev looking for a statically typed language will
likely be blown away by the performance nonetheless.
I'm not saying, merge vibe-core, but I'm saying take a really
close look at
vibe-core, and grill Sönke for a couple of hours.
At least with io_uring, this event loop should scale mostly linearly in
performance with the number of threads, given enough CPU cores.
### HTTP
Yes, 1, 2, and 3.
### Interop
I'm not sure if this is the right place to talk about this, but I
didn't find
any better place, so here I go.
autowrap ^1 already allows trivial interaction with python and
excel.
This and support for C#, WASM, haskell, golang, and rust should
be part of
phobos/D.
If a project demands getting some toml output out of a golang call, passing it
to haskell because there is an algorithm you want to reuse, followed by
a call to scikit-learn, and finally passing it to C#, D should be the obvious
choice.
### Error Messages
The error messages in phobos are sometimes not great.
That is not good.
When you come from another language that is not C++ and try to get started
with ranges, good error messages in phobos are important.
One obvious example is how we constrain template functions similar to this:
```D
auto someFun(R)(R r) if(isInputRange!R) {
    ...
}
```
you get stuff like
```
a.d(8): Error: template `a.someFun` cannot deduce function from
argument types `!()(int)`
a.d(3): Candidate is: `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isInputRange!R`
```
This looks helpful, but it is not as good as it could be.
If you don't know what an InputRange is, this does not help you.
You have to go to the documentation.
This could be made a lot easier by a small refactor.
```D
auto someFun(R)(R r) {
    static assert(isInputRange!R, inputRangeErrorFormatter!R);
    ...
}
```
The function `inputRangeErrorFormatter` would create a string
that shows
which of the required features of an InputRange are not fulfilled
by `R`.
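
One possible shape for such a formatter, as a sketch only: it probes the three
InputRange primitives one by one and lists the ones that do not compile.
The helper name `missingInputRangeParts` and the message wording are
assumptions of this example; phobos has no such function.

```D
import std.range.primitives : empty, front, popFront;

/// Collects one line per InputRange primitive that `R` is missing.
string missingInputRangeParts(R)() {
    string msg = "`" ~ R.stringof ~ "` is not an InputRange:";
    static if (!is(typeof((R r) { if (r.empty) {} })))
        msg ~= "\n  - `r.empty` is not usable as a condition";
    static if (!is(typeof((R r) { auto e = r.front; })))
        msg ~= "\n  - `r.front` does not yield a value";
    static if (!is(typeof((R r) { r.popFront(); })))
        msg ~= "\n  - `r.popFront()` is not callable";
    return msg;
}

/// Compile-time message, usable as the second argument of `static assert`.
enum string inputRangeErrorFormatter(R) = missingInputRangeParts!R();
```

For `int` all three probes fail, so the message names every missing primitive
instead of only stating that `isInputRange!R` is false.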
Especially when overload resolution is done by template constraints, the
error messages get difficult to understand fast.
Just look at:
```
a.d(3): Candidates are: `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isInputRange!R`
a.d(7): `someFun(R)(R r)`
with `R = int`
must satisfy the following constraint:
` isRandomAccessRange!R`
```
This can be fixed quite easily as well:
```D
private auto someFunIR(R)(R r) { ... }
private auto someFunRAR(R)(R r) { ... }
auto someFun(R)(R r) {
    // Test the more specific range kind first: every RandomAccessRange
    // is also an InputRange.
    static if(isRandomAccessRange!R) {
        return someFunRAR(r);
    } else static if(isInputRange!R) {
        return someFunIR(r);
    } else {
        static assert(false, "R should either be an "
            ~ "InputRange but " ~ inputRangeErrorFormatter!R
            ~ "\n or R should be a RandomAccessRange but "
            ~ randomAccessRangeErrorFormatter!R
            ~ "\n therefore you cannot call " ~ __FUNCTION__);
    }
}
```
### Synchronization
This section needs to be read together with the section about *shared* in
*The Language* part of this text.
When we have an event loop that also works with threads, communication has to
happen somehow.
Mutexes do not scale, because using them correctly is just too hard.
As an exercise, name the three necessary requirements for a
deadlock.
Wrong, there are four.
* Mutual exclusion
* Hold and wait
* No preemption
* Circular wait
phobos must have message passing that works with threads and the
event-loop.
Two kinds of mailboxes are to be supported, 1-to-1 and 1-to-N, where N is a
defined number of receivers, such that the next sender is blocked until all N
have read.
Both types support multiple senders and predefined mailbox queue sizes.
Making this @safe, and not just @trusted, will likely require
some copying.
That is fine, when copying is eating your multi-threading gains,
multi-threading was not the solution to your problem, IMO.
Message passing and the SumType are likely a nice way to emulate
the Ada
rendezvous concept.
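
For reference, the 1-to-1 case with a bounded queue can already be approximated
with today's `std.concurrency`; the sketch below uses only existing phobos
calls, while the function names `doubler` and `roundTrip` are made up.
It does not cover the 1-to-N mailbox or event-loop integration described above.

```D
import std.concurrency : OnCrowding, Tid, ownerTid, receiveOnly, send,
    setMaxMailboxSize, spawn;

/// Worker thread: waits for one int and replies with twice its value.
void doubler() {
    int n = receiveOnly!int();
    ownerTid.send(n * 2);
}

/// Sends `n` to a fresh worker and blocks until the reply arrives.
int roundTrip(int n) {
    Tid worker = spawn(&doubler);
    // Bounded mailbox: senders block once 8 messages are queued.
    setMaxMailboxSize(worker, 8, OnCrowding.block);
    worker.send(n);
    return receiveOnly!int();
}
```

No mutex appears in user code; the only shared state is the mailbox owned by
the library.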
## The Language
Get your tomatoes and eggs ready.
### GC
The GC is here to stay; you don't do manual memory management (MMM) in a
compiler daemon that tracks dependencies.
I don't care how smart you are, you are not that smart.
D is not going to run the ECU of the next Boeing airplane, rust
will succeed C
there.
Rust will succeed C and C++ everywhere, but who cares JS runs the
rest.
How many OS kernels have you written, and how many data transformations?
So do we fight a war that is over and lost, for a niche field anyway, or do we
actually have some wins and run the world?
Mixing MMM, RC, and GC, is also too complicated IMO.
The whole lifetime tracking requirements make my head spin.
That being said, I think there is a place to reuse the gained
knowledge.
In my day job I have a lot of code that results in a call to
std.array.array
allocating an array of some T which by the end of the function
gets
transformed into something else that is then returned.
The array never leaves the scope of the function.
Given lifetime analysis the compiler could insert GC.free calls.
Think automatic `T t` to `scope T t` transformation.
At least for the code I have been writing for the last two years, this should
release quite a bit of memory back to the GC, without the GC ever having to
mark and sweep.
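
Written out by hand, the transformation might look like the sketch below.
`sumOfSquares` is an invented example; the `scope (exit) GC.free(...)` line is
what the compiler would insert after proving that `tmp` never escapes, and it
is only safe under that assumption.

```D
import core.memory : GC;
import std.algorithm.iteration : map, sum;
import std.array : array;
import std.range : iota;

/// Builds a temporary array, reduces it, and frees it eagerly.
int sumOfSquares(int n) {
    int[] tmp = iota(n).map!(i => i * i).array;
    // Safe only because `tmp` never leaves this function.
    scope (exit) GC.free(tmp.ptr);
    return tmp.sum;
}
```

The user-facing code stays pure GC style; the early free is an optimisation
the programmer never has to write or think about.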
We want the JS developer; if we have to teach them to use MMM and/or RC, we
might as well not try.
I don't even want to think about memory I want to get some work
done.
I don't want to get more work by thinking about memory.
I want to get my project running and iterate on that.
To summarize, GC and GC only.
### shared
As said in the phobos section about synchronisation, this is an
important
building block.
As shared is basically broken, maybe painting a holistic picture
of where we
want D's multi-threading/fiber programming to go is better than
to take a look
at shared on its own.
For me, this would mean sharing data between threads and/or fibers should be
as easy and error-free as letting the GC handle memory.
That means race conditions need to be very difficult to produce, the same as
deadlocks.
This, to me, implies message passing or Ada rendezvous and not
trading locks
to work on shared data.
### betterC
betterC is, at best, a waste by-product. If we have to use betterC to write
something for WASM, or anything significant, we might as well start learning
rust right now.
### autodecoding
Having been saved by it a couple of times, and using a non-US keyboard
every day, I still think it is not a terrible idea, but I think this battle is
lost and I'm already full of tomatoes by this point.
Meaning, autodecoding will have to go.
At the same time we have to update std.uni and std.utf.
The majority of developers and users of software speak languages
that do not
fit into ASCII.
When a project requires text processing, your first thought must
be D, not
perl.
std.uni and std.utf have to be a superset of the std.uni and
std.utf of the top
20 languages.
### properties
Let's keep it simple, and consistent.
You add parentheses to call a function.
You cannot call a property function with parentheses.
You cannot take the address of a property function.
### @safe pure @nogc UDA
Consistency is king:
* @safe -> safe
* @trusted -> trusted
* @system -> system
* @nogc -> nogc
Long story short, language attributes do not start with a @, user
defined
attributes (UDAs) do.
### string interpolation
I had this in the phobos section at the start of writing this.
String interpolation is not what you want, I know it is what you
want right
now, because you think it fixes your problem, but it does not.
String interpolation is like shoelaces: you want them, but you are walking on
lava; open shoes are not actually your problem.
For work, I have D that generates about 10k lines of typescript,
and the
places where string interpolation would have helped were trivial
to do in
std.format.
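
For illustration, generating a small piece of TypeScript with `std.format`
could look like this.
`Field` and `tsInterface` are invented for the example and are not taken from
the work code mentioned above.

```D
import std.format : format;

/// One member of the generated TypeScript interface.
struct Field {
    string name;
    string type;
}

/// Renders a TypeScript interface declaration for the given fields.
string tsInterface(string name, Field[] fields) {
    string members;
    foreach (f; fields)
        members ~= format("  %s: %s;\n", f.name, f.type);
    return format("export interface %s {\n%s}\n", name, members);
}
```

Each placeholder is a plain `%s`, so interpolation syntax would save very
little here; the harder parts are whitespace control and optional members.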
IMO, the better solution would be something like vibe's diet, mustache, or
handlebars that doesn't require a build step like diet does.
Whitespace control and Nullable are a big part of this, too.
### ImportC
ImportC must have a preprocessor, or it is DOA.
Shelling out to gcc or clang to preprocess, makes the build
system horrible
which in turn will make the compiler library daemon thing
difficult to build.
This is also important for the language interop, as I imagine
that most
interop will go through a layer of C.
When ImportC can use openssl 1.0.2s or so it is good enough.
Having done some usage of openssl recently, my eyes can not
un-see the
terribleness that is the openssl usage of C.
### Specification
This was already partially discussed in the long term goals, but D needs better
documentation or, better yet, a spec.
The cool thing is, we don't need to be an ISO spec, a.k.a. a PDF.
We could very well be a long .d file with lots of comments and unittests.
Frankly, I think that would be much more useful anyway.
Instead of giving a few select/unmaintained examples of a language feature, show
the tests the compiler runs.
Actually, having looked at some of the tests myself to figure out how stuff
should work, I would imagine other people would benefit as well.
When the compiler fails to execute the spec, either the spec is
wrong or the
compiler has a bug.
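
A fragment of such an executable spec might read like this; the two rules are
existing D behaviour, the layout is just one possible convention.

```D
/// Spec: integer division truncates toward zero.
unittest {
    static assert(7 / 2 == 3);
    static assert(-7 / 2 == -3);
}

/// Spec: slicing an array does not copy; the slice aliases the original.
unittest {
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2];
    b[0] = 42;
    assert(a[0] == 42);
}
```

Since documented unittests are rendered by ddoc as examples, the same file
could serve as both the test suite and the published text.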
Two birds with one stone, right? right!
### Android/iOS
Obviously, D needs to run on those platforms.
Both platforms have APIs; using them must be as easy as
`dub add android@12.0.1`.
The gtkd people basically wrote a small program to create a D
interface to gtk
from the gtk documentation.
I bet a round of drinks at the next physical dconf that this is
possible for
android and ios as well.
The dart language people shall come to fear our binding generation
capabilities.
## On Versioning
D3 will never happen; it sounds too much like what we got when we moved from D1
to D2.
The D2 version number 2.098.X does not make sense.
D 2.099 plus std v2 would also be terrible.
By the time I have explained to somebody new why D is in version 2.099 with
phobos having parts in version v2 in addition to std.experimental, which
was pretty much DOA, the person has installed, compiled, and run "hello world"
in rust.
I talked to Andrei about this, as it seemed that we were firmly set in our
corners of the argument.
Andrei mentioned the C++ approach, which has been really
successful.
Good ideas are there to steal, so let's do what C++ does.
Let's call the next D 23, the one after that maybe D 25.
Backwards compatibility is not a given.
But we ship the latest version of, let's say, three D versions with the
current release.
D X is implemented in D X-1.
This would mean that the three old D versions would still need to be able to
create working binaries ~10 years down the road.
I would say the older versions should only get patches for problems that stop
them from doing so.
If they come with a bug, and we have moved on to a new D version,
this bug
will exist forever in that D version.
## Leadership
I'm writing this section as one of the last.
This is maybe one of the most important parts, but also the
hardest
to validate.
When reading the forum or the GitHub PRs, I get the feeling that people think
that D is a consensus-driven meritocracy.
That is not the case, and that is okay.
The impression of it is very dangerous as it sets people up to be
continuously
disappointed.
Just look for all the posts where people complain that Walter does not change
his mind.
To me, these posts show this disconnect: people expect Walter to change his
mind because, at least to their mind, their idea is better than what Walter
thinks.
But he doesn't have to agree, because he is the *benevolent
dictator for life*.
Who is right or wrong is irrelevant; the impression of the level of influence is
not.
Being a bit dramatic: giving people false hope that then gets disappointed will
drive them away from D.
A simple solution, IMO, is to take a clear stance on issues.
Direct, simple language.
A leadership person saying yes xor no to thing X.
When new information comes up that warrants a reversal of such a statement,
leadership would lay out how the decision (yes|no) on X was changed by new
information Y.
I see the DIP process as troublesome, as it gives the impression of having a say
in what D will become.
Maybe renaming *D Improvement Proposals* into *D Improvement Suggestions* would
be an option, while simultaneously increasing the amount of work that should go
into the writing of a *DIS*.
I find that, in most existing DIPs, especially the given *Rationales* are way
too short to lay out the pros and cons of an improvement.
Just have a look at the quality of the C++ proposals.
The DIS' should aim for that.
Or at least have a matrix of how the improvement interacts with each of the D
features and an analysis of how this actually makes D better in real-world terms
(code.dlang.org).
This would be another nice usage for the compiler library daemon thing.
Always asking: just because we could, should we?
But taking formal steps for the DIP can be avoided, I believe, if the direction
the language should develop in is clearly marked by leadership.
There is no need to discuss the shared atomics DIP, if leadership
dictates
that message passing is the selected mechanism for thread
communication and
only that.
Sure, you can still argue for shared atomics, but you have no reason to be
disappointed when nobody takes you seriously, as you already knew where the
journey is going.
## The practical way forward
This year (2021), move from bugzilla to github.
A nice Christmas present to show that we mean business.
D 23:
* remove auto-decoding
* safe by default
* attribute consistency
* ImportC preprocessor
* remove std.experimental
D 25:
* Everything but the compiler daemon library thing
D 27:
* Compiler daemon thing.
The work on the compiler daemon thing will have to start before 2025.
## The motto
I'm serious about the motto at the top.
When people start complaining that their language is better, it's free
marketing for D.
## Closing
If D continues the way it does, it will soon be irrelevant.
And I don't want that; I want to be yelled at at DConf 2071.
D's powerful templates, ctfe, and ranges made heads turn, but the other
languages have caught up.
Let us really innovate, so that D not only becomes the Voldemort
language for
C++, but for all other languages as well, because D is the best
language.
^1 https://code.dlang.org/packages/autowrap