C is Brittle D is Plastic
Quirin Schroll
qs.il.paperinik at gmail.com
Fri Apr 3 12:20:19 UTC 2026
On Sunday, 22 March 2026 at 04:47:41 UTC, Walter Bright wrote:
> It's true that writing code in C doesn't automatically make it
> faster.
>
> For example, string manipulation. 0-terminated strings (the
> default in C) are, frankly, an abomination. String processing
> code is a tangle of strlen, strcpy, strncpy, strcat, all of
> which require repeated passes over the string looking for the
> 0. (Even worse, reloading the string into the cache just to
> find its length makes things even slower.)
>
> Worse is the problem that, in order to slice a string, a malloc
> is needed to copy the slice to. And then carefully manage the
> lifetime of that slice.
>
> The fix is simple - use length-delimited strings.

Working with C++ and having implemented my own `span` and
`string_view` types, I think D’s strings are under-appreciated.
They’re by far the best strings I’ve seen in a programming
language.

C++ has: `char*` and `const char*`, `char[]`, `const char[]`,
`std::string`, `std::string_view`, `std::span<char>`, and
`std::span<const char>` (those times 7 because there’s `wchar_t`,
`char8_t`, `char16_t` and `char32_t` and `signed`/`unsigned
char`) (those times 2 for `volatile` to be pedantic). At least.
Knowing when to use which is not straightforward. Comparing
substrings efficiently is difficult; I always have to look up the
arguments for `compare`. Most people just allocate substrings and
compare those naïvely. Returning a length-delimited string with
mutable content and a default value is impossible before C++20’s
`span`: You can return `data()` or `nullptr`, erasing the length,
or return a `string_view`, making the characters `const`.
`std::string_view` and `span`s easily dangle, so using them e.g.
for map keys is an issue: If *one* key would be dangling, you
*have* to use `std::string` and copy all the keys, simply because
C++ doesn’t have a GC; and then you need `std::less<>` on your
ordered map type, or `std::equal_to<>` and a custom transparent
hash on your unordered map type, because you still want to look
up using `string_view` keys without copying them. Appending is
done with
`+` and only ever returns `std::string`. C++ has no `switch`ing
on strings; if they’re short, you can write a `constexpr` utility
function that maps them to numbers and switch on those. To top it
all off, (`unsigned`) `char*` is also used for arbitrary bytes
(instead of `void*`), and (`un`)`signed char` doubles as the
smallest integer type.
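The "map short strings to numbers and switch on those" workaround could look roughly like the following C++17 sketch. The helper `pack`, the `Command` enum, `parse_command` and `demo_switch` are hypothetical names chosen for this example; the 7-byte limit is an assumption of this particular packing scheme, not something C++ prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical helper: packs a string of at most 7 bytes, together with
// its length, into one 64-bit integer. Distinct short strings yield
// distinct values, so the result is usable as a case label.
constexpr std::uint64_t pack(std::string_view s) {
    if (s.size() > 7) return ~std::uint64_t{0};  // sentinel: too long to pack
    std::uint64_t value = s.size();
    for (char c : s) {
        value = (value << 8) | static_cast<unsigned char>(c);
    }
    return value;
}

enum class Command { start, stop, status, unknown };

// "Switching on a string" by switching on its packed value.
constexpr Command parse_command(std::string_view word) {
    switch (pack(word)) {
        case pack("start"):  return Command::start;
        case pack("stop"):   return Command::stop;
        case pack("status"): return Command::status;
        default:             return Command::unknown;
    }
}

static_assert(parse_command("stop") == Command::stop, "usable at compile time");

void demo_switch() {
    assert(parse_command("status") == Command::status);
    assert(parse_command("restart") == Command::unknown);   // packs, but no case matches
    assert(parse_command("shutdown") == Command::unknown);  // too long, hits the sentinel
}
```

It works, but every case label has to go through the helper and anything longer than the packing limit needs a different scheme (for example a hash plus an equality check), which is the kind of ceremony a built-in `switch` on strings avoids.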
C has a small subset of this, which makes it arguably worse.

D has: `immutable(char)[]`, `const(char)[]`, and `char[]` (times
3 for `wchar` and `dchar`). Their use-cases are straightforward;
you never need to decide vanilla/signed/unsigned or `wchar_t` vs
`char16_t`/`char32_t`. You can just return a `char[]` you have
and an empty one as the default. They can’t easily dangle.
Appending has its own operator `~` and you can “just append” things.
Comparing substrings is straightforward. You can just use
`string` as map key type and just perform lookup with a `char[]`.
You can just `switch` on strings. (Honestly, D should add
`switch` for all slice types over switchable types: You should be
able to switch over `int[]` and `int[][]`.) Sure, there’s also
`shared` and `inout`, which are basically in the same camp as
C++’s `volatile`: You rarely encounter them.

Built-in length-delimited strings (or slices/spans) would be a
win for C, but C++ shows they’re not a panacea. The GC enables
D’s length-delimited strings to be great instead of just good;
when you disable the GC, they’re still good and still profit from
the GC existing at compile time. You can build a static string at
compile-time in D. That’s impossible in C++ before C++26; I’m not
sure if C++26’s reflection can do it. A lot of C++ code is C++03,
lacking even basics, and lots more is C++14 (still many Linux
distros’ default compiler’s default), which has no `string_view`
(that came with C++17); `span` and transparent lookup in unordered
containers only arrived with C++20. Before actually working with
those, I couldn’t have
imagined how terrible it was.
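For reference, the transparent lookup mentioned above can be sketched for the ordered case, which has worked since C++14 via `std::less<>`. The names `Inventory`, `count_of` and `demo_lookup` are invented for this example. The unordered equivalent additionally needs C++20 and a hash type that declares `is_transparent`, which is not shown here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> (note the empty angle brackets) is a transparent comparator,
// so find() accepts anything that can be compared with std::string.
using Inventory = std::map<std::string, int, std::less<>>;

// Looks up `key` without constructing a temporary std::string.
int count_of(const Inventory& inventory, std::string_view key) {
    const auto it = inventory.find(key);
    return it == inventory.end() ? 0 : it->second;
}

void demo_lookup() {
    const Inventory inventory{{"apple", 3}, {"pear", 5}};
    const std::string_view line = "pear,apple";  // e.g. a line being parsed
    assert(count_of(inventory, line.substr(0, 4)) == 5);  // "pear"
    assert(count_of(inventory, line.substr(5)) == 3);     // "apple"
    assert(count_of(inventory, "plum") == 0);
}
```

The keys are owned `std::string`s, so nothing can dangle, while lookups take slices of an existing buffer. With a plain `std::map<std::string, int>`, each of those `find` calls would first have to build a `std::string` from the view.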
The worst parts of D’s strings are auto-decoding and that
literals include a secret zero past the end without you asking
for it.

About the latter: maybe in the next Edition, we can have `""z`
strings that explicitly request the secret zero, and otherwise
only add the zero if a non-`z` literal is used to initialize an
`immutable(char)*` or `const(char)*` (or make it an error to omit
the `z` in that case). That would allow for some compression to
be done by the compiler: If you have strings in your code like
`"BC"` and `"ABCD"`, it could just re-use a segment of the longer
one. With the secret zero, it can only re-use suffixes.