Crow programming language

andy andy-hanson at protonmail.com
Thu Feb 15 04:32:27 UTC 2024


For the past few years I've been writing a programming language 
entirely in D.
The website https://crow-lang.org/ explains the language itself, 
so here I thought I'd include some comments on my experience 
writing a medium-sized project in D.

## Pros

* Debug builds with DMD take under 5 seconds.
* LDC produces very fast optimized code (at the cost of long 
compile times). Compiling to WASM lets Crow code run on the 
website.
* Metaprogramming was useful in the interpreter to generate 
specialized code for various operations, e.g. operations for 
reading N bytes from a pointer for various values of N.
* I like how you generally get a compile error instead of the 
code doing something surprising. I've added new features and had 
them work correctly the first time thanks to purity and strong 
typing.
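
As a rough illustration of the metaprogramming point, here is a 
minimal sketch of generating one specialized reader per byte 
count. The names `readBytes` and `readBytesDynamic` are 
hypothetical and not taken from Crow's source.

```d
// Hypothetical helper, not Crow's actual code.
// Reads N bytes starting at `ptr` and zero-extends them to a ulong.
// N is a compile-time argument, so every N gets its own specialized function.
ulong readBytes(size_t N)(const(ubyte)* ptr) @system pure nothrow @nogc
if (N == 1 || N == 2 || N == 4 || N == 8) {
	static if (N == 1) alias T = ubyte;
	else static if (N == 2) alias T = ushort;
	else static if (N == 4) alias T = uint;
	else alias T = ulong;

	T value;
	// Copy byte by byte so the read is also valid for unaligned pointers.
	(cast(ubyte*) &value)[0 .. N] = ptr[0 .. N];
	return value;
}

// Picks the specialized reader at run time.
// `static foreach` generates one `case` per supported size.
ulong readBytesDynamic(const(ubyte)* ptr, size_t n) @system pure nothrow @nogc {
	switch (n) {
		static foreach (size; [1, 2, 4, 8]) {
			case size:
				return readBytes!size(ptr);
		}
		default:
			assert(false, "unsupported read size");
	}
}
```

An interpreter would more likely store a pointer to the right 
`readBytes!N` instance in each operation than switch on `n` every 
time; the switch just keeps the sketch short.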


## Cons

* I run into https://issues.dlang.org/show_bug.cgi?id=22944 a 
lot. This is annoying when calling a function that takes many 
delegates. A single error in one delegate causes spurious `@nogc` 
errors in every one.
* Having to write `@safe @nogc pure nothrow` all the time. D 
needs a way to make those the default and mark specific things as 
not-safe or not-pure.


## Unions

I used a `TaggedUnion` mixin. It looks like:

```
immutable struct ParamsAst {
	immutable struct Varargs {
		DestructureAst param;
	}
	mixin TaggedUnion!(SmallArray!DestructureAst, Varargs*);
}
```

This is like a `DestructureAst[] | Varargs*`.
Normally that would be 192 bits: 64 for the array length, 64 for 
the pointer, 1 for the tag, and 63 for alignment.
But this uses a `SmallArray`, which packs the pointer and length 
together, and also has some room for the tag. So `ParamsAst` only 
takes up 64 bits.

I implemented pattern matching through a generated `match` 
function that takes a delegate for each type. A pattern matching 
syntax for D could make this prettier.
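
To show the shape of a delegate-based `match`, here is a 
hand-written sketch for a two-member union. The real `TaggedUnion` 
mixin generates this kind of code and also does the bit packing 
described above; the `Shape`, `Circle`, and `Rect` types here are 
made up for the example.

```d
struct Circle { double radius; }
struct Rect { double width; double height; }

// Hand-written stand-in for what a tagged-union mixin could generate.
struct Shape {
	private enum Kind : ubyte { circle, rect }
	private Kind kind;
	private union {
		Circle circle;
		Rect rect;
	}

	this(Circle c) { kind = Kind.circle; circle = c; }
	this(Rect r) { kind = Kind.rect; rect = r; }

	// Takes one delegate per member type and calls exactly one of them.
	// `final switch` makes it a compile error to forget a member.
	T match(T)(
		scope T delegate(Circle) onCircle,
		scope T delegate(Rect) onRect,
	) const {
		final switch (kind) {
			case Kind.circle:
				return onCircle(circle);
			case Kind.rect:
				return onRect(rect);
		}
	}
}

double area(in Shape shape) {
	return shape.match!double(
		(Circle c) => 3.0 * c.radius * c.radius, // crude pi, keeps the arithmetic exact
		(Rect r) => r.width * r.height);
}
```

Each call site has to spell out a delegate for every member, which 
is where a dedicated pattern matching syntax would read better.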


## Tail calls

Using tail calls makes a big difference to interpreter 
performance. Unfortunately D has no way to specify that a call 
must be a tail call. Tail-call elimination only happens in 
optimized builds, so I pass `--d-version=TailRecursionAvailable` 
in those builds only, and other builds use a less efficient 
method to call the next operation.
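
A minimal sketch of how such a version switch might look. Only the 
`TailRecursionAvailable` identifier comes from my build flags; the 
`Interpreter`, `Operation`, and `continueWith` names are invented 
for this example and the real interpreter is more involved.

```d
struct Interpreter {
	const(Operation)* pc; // next operation to run
	long acc;             // accumulator
	bool halted;
}

alias OpFn = void function(ref Interpreter interp, long arg);

struct Operation {
	OpFn fn;
	long arg;
}

void opAdd(ref Interpreter interp, long arg) {
	interp.acc += arg;
	continueWith(interp);
}

void opHalt(ref Interpreter interp, long arg) {
	interp.halted = true;
}

// Every operation ends by calling this.
void continueWith(ref Interpreter interp) {
	version (TailRecursionAvailable) {
		// Optimized builds: the call is in tail position, so the
		// optimizer can turn it into a jump to the next operation.
		const Operation* op = interp.pc++;
		op.fn(interp, op.arg);
	}
	// Other builds: just return to the loop in `run`.
}

long run(const(Operation)[] program) {
	Interpreter interp = Interpreter(program.ptr);
	version (TailRecursionAvailable) {
		continueWith(interp);
	} else {
		// Slower fallback: a loop that dispatches each operation.
		while (!interp.halted) {
			const Operation* op = interp.pc++;
			op.fn(interp, op.arg);
		}
	}
	return interp.acc;
}
```

If the version were set in a build where the calls are not 
actually eliminated, a long program would overflow the stack, 
which is why the flag is tied to optimized builds.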


## Immutability

Almost everything in the compiler is immutable.
The AST is immutable, so instead of updating it with semantic 
information, the type checker returns a "model".
This has the advantage of allowing several different AST types to 
compile to the same model type; a lot of different-looking things 
are just function calls.
In the IDE, when a file changes, it updates the AST of only the 
affected code, and updates the model for the module and any 
modules that depend on it.


## Late (logical variables)

Sometimes a field of an immutable entity can't be written 
immediately.
For example, the type checker first builds a model for the 
signature of every function, and only then checks function bodies 
(since that involves looking at the signatures of other 
functions).
To accomplish this I have a `Late` type. This starts off 
uninitialized. Attempting to read it while it's uninitialized is 
an assertion error. Once it's initialized, it can't be written 
again. Thus it's logically immutable from the reader's 
perspective since it will never read two different values.
This requires using unsafe code to write the late value (since 
you can't normally write to an immutable value). This apparently 
works, though I wonder if some day a compiler will optimize away 
`lateSet` since it's pure, takes `immutable` inputs, and returns 
nothing.
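
A simplified sketch of the idea, not the actual implementation. It 
assumes `T` has no mutable indirections (for example `int` or 
`string`), and it relies on writing through a cast-away 
`immutable`, which the language does not promise will keep 
working.

```d
// A value that is written once and read only after that.
struct Late(T) {
	private bool isSet;
	private T value;
}

// Reading before initialization is an assertion failure.
immutable(T) lateGet(T)(ref immutable Late!T late) @safe pure nothrow @nogc {
	assert(late.isSet, "Late value read before it was set");
	return late.value;
}

// Writing twice is an assertion failure.
// @system because it casts away `immutable` to do the one allowed write.
void lateSet(T)(ref immutable Late!T late, immutable T value) @system pure nothrow @nogc {
	assert(!late.isSet, "Late value written twice");
	Late!T* writable = cast(Late!T*) &late;
	writable.value = value;
	writable.isSet = true;
}
```

Usage would look something like `lateSet(*late, 42)` followed 
later by `lateGet(*late)`, where `late` came from 
`new immutable(Late!int)`.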


## Purity

The compiler part of the code (everything but the interpreter) is 
completely pure. It essentially implements the LSP (Language 
Server Protocol) and the LSP client is the one doing all the I/O. 
Thus the I/O implementation can be different for desktop, IDE, 
and web.

One annoyance with pure code is having to pass `AllSymbols`, the 
symbol (interned string) table, to any function that needs to 
create a symbol or un-intern it. I think accessing it through a 
global variable could be considered pure, since a caller of 
`symbolOfString` can't tell whether the symbol had already been 
added, and the result of `stringOfSymbol` for a given symbol 
never changes. But I'm not sure if that's actually safe or how to 
tell D to allow a global variable in pure code.
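
For reference, the explicit-parameter style looks roughly like 
this. Only the names `AllSymbols`, `symbolOfString`, and 
`stringOfSymbol` are real; the fields and the `Symbol` layout are 
simplified guesses for the sake of the example.

```d
// Simplified: a symbol is just an index into the table.
struct Symbol {
	uint index;
}

struct AllSymbols {
	private string[] strings;      // index -> string
	private Symbol[string] lookup; // string -> symbol
}

// Weakly pure: the only state it touches is reachable from its parameters.
// Interning the same string twice returns the same symbol.
Symbol symbolOfString(ref AllSymbols all, string s) @safe pure {
	if (auto found = s in all.lookup)
		return *found;
	Symbol res = Symbol(cast(uint) all.strings.length);
	all.strings ~= s;
	all.lookup[s] = res;
	return res;
}

// The string for a given symbol never changes once it exists.
string stringOfSymbol(ref const AllSymbols all, Symbol a) @safe pure nothrow @nogc {
	return all.strings[a.index];
}
```

Every function that creates or prints a symbol has to take the 
`ref AllSymbols` parameter, and so does everything that calls it.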

## Scope

I've used `scope` and `in` wherever possible with 
`-preview=dip1000 -preview=in`. I often need to cast away `scope` 
using a function `castNonScope`. This feels like it needs a 
language intrinsic or at least a standard library function.


More information about the Digitalmars-d-announce mailing list