[Dlang-study] [rcstring] Defining rcstring

Tue Feb 2 15:57:51 PST 2016

On Tuesday, February 02, 2016 16:40:26 Andrei Alexandrescu wrote:
> * call it rcstring or RCString? The first makes it closer to "string",
> the other is politically correct.

Since it's not built-in, I really see no reason to break the naming
conventions - especially if it's not a drop-in replacement for string.

> * Characters are small so no need to return them by reference. Because
> of this, making RCString @safe should be possible in current D. However,
> this also makes RCString not a plug-in replacement for string (which may
> after all be a good thing)

I take it that you mean that any functions that RCString might have (like
opIndex or front or whatever) always return individual characters by value?
That could be a bit annoying on occasion, but I don't think that it's all
that big a deal - particularly if it means that it can be @safe.
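Here is a minimal sketch of what "returned by value" looks like. It assumes a hypothetical code-unit view called CodeUnits (not an existing Phobos type), and a plain slice stands in for the reference-counted buffer. Because front and opIndex return Char rather than ref Char, a caller can't take the address of an element and hold onto it past the buffer's lifetime, which is what makes @safe plausible.

```d
/// Hypothetical code-unit view over a string buffer. The slice below
/// stands in for reference-counted storage; no counting is done here.
struct CodeUnits(Char)
{
    private const(Char)[] data;

@safe pure nothrow:

    this(const(Char)[] s) { data = s; }

    // Elements are returned by value, never by ref, so no pointer
    // into the underlying buffer can escape to the caller.
    Char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    bool empty() const { return data.length == 0; }
    size_t length() const { return data.length; }
    Char opIndex(size_t i) const { return data[i]; }
}

@safe unittest
{
    auto r = CodeUnits!char("abc");
    char c = r[1];          // a copy of the code unit
    assert(c == 'b');
    // r[1] is an rvalue, so its address cannot be taken.
    static assert(!__traits(compiles, &r[1]));
}
```

The cost is the one mentioned above: code that wants to modify a character in place has to go through some explicit setter instead of assigning through a reference.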

> * Since string-compatibility is off the table, how about we fix string's
> issues with autodecoding? RCString should offer no indexed access and no
> length. Instead it offers the ranges byCodeUnit, byChar, byWChar, and
> byDChar. The first one does not do any decoding and offers length and
> random access. (What should be its element type?) The other ones are
> bidirectional ranges that do the appropriate decoding.

I think that it's a great idea to make it so that the programmer has to ask
for byCodeUnit, byChar, or whatever they want in terms of decoding. That
will make dealing with RCStrings less error-prone (though it would be nice
if we could do something similar for string).
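A rough sketch of that API shape follows. It assumes a hypothetical RangeOnlyString that wraps a plain string instead of a refcounted buffer and simply forwards to the existing std.utf adapters (byCodeUnit, byChar, byWchar, byDchar). The wrapper itself has no length and no opIndex, so every access has to say which view it wants. Phobos spells the adapters byWchar and byDchar. The method names below follow the proposal.

```d
static import std.utf;

/// Hypothetical wrapper: no length, no opIndex, only explicit views.
/// A GC-allocated string stands in for reference-counted UTF-8 storage.
struct RangeOnlyString
{
    private string data;

    this(string s) { data = s; }

    // Raw UTF-8 code units: no decoding, has length and random access.
    auto byCodeUnit() { return std.utf.byCodeUnit(data); }

    // Views that encode or decode as needed for the requested type.
    auto byChar() { return std.utf.byChar(data); }
    auto byWChar() { return std.utf.byWchar(data); }
    auto byDChar() { return std.utf.byDchar(data); }
}

unittest
{
    import std.range.primitives : walkLength;

    auto s = RangeOnlyString("h\u00E9llo");  // "héllo"
    static assert(!__traits(compiles, s.length));
    static assert(!__traits(compiles, s[0]));
    assert(s.byCodeUnit.length == 6);   // é takes two UTF-8 code units
    assert(s.byDChar.walkLength == 5);  // five code points
}
```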

As for the element type, I assume that you're talking about how it stores
the characters internally rather than the external element type? After all, if
you have to use byChar or byCodeUnit or whatever to access the elements, then
the element type does (on some level at least) become an implementation detail.
I would have assumed that we'd either make it char (in which case the
documentation should probably make that clear for efficiency purposes, but
it wouldn't really matter to the API), or we would templatize RCString on
the character type, in which case, byCodeUnit would be over that character
type, and the other ranges would decode as necessary.

And between those two options, I'd favor templatizing it. I definitely think
that most code should use char as the code unit, but some of the
Windows-centric folks are going to want wchar to avoid converting strings
when talking to Windows system calls, and if you're doing anything where you do want to
operate at the code point level quite a bit, then having dchar as the code
unit (and thus avoiding decoding altogether) would be desirable for
efficiency. And all it really costs us is that using RCString is a bit more
verbose, because you have to write RCString!char instead of RCString, but maybe
we can come up with good aliases if that's a problem.
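To make the templatized option concrete, here is a sketch of RCString parameterized on its code unit type. It again assumes a GC slice in place of the real reference-counted buffer and forwards to std.utf. The aliases at the bottom (RCStr, RCWStr, RCDStr) are hypothetical names for illustration only. The point of interest is that byCodeUnit always yields the stored type, while byDChar on RCString!dchar needs no decoding and so stays random-access.

```d
static import std.utf;
import std.traits : isSomeChar;

/// Hypothetical RCString templatized on its code unit type.
/// Reference counting is omitted; a GC slice stands in for the buffer.
struct RCString(Char) if (isSomeChar!Char)
{
    private immutable(Char)[] data;

    this(immutable(Char)[] s) { data = s; }

    // Iterates the stored code units of type Char, never decoding.
    auto byCodeUnit() { return std.utf.byCodeUnit(data); }

    // Transcode or decode only when the target type differs from Char.
    auto byChar() { return std.utf.byChar(data); }
    auto byWChar() { return std.utf.byWchar(data); }
    auto byDChar() { return std.utf.byDchar(data); }
}

// Hypothetical short names for the common instantiations.
alias RCStr  = RCString!char;
alias RCWStr = RCString!wchar;
alias RCDStr = RCString!dchar;

unittest
{
    import std.range.primitives : isRandomAccessRange, walkLength;

    auto a = RCStr("h\u00E9llo");
    auto d = RCDStr("h\u00E9llo"d);
    assert(a.byCodeUnit.length == 6);  // UTF-8 code units
    assert(d.byCodeUnit.length == 5);  // UTF-32 code units
    assert(a.byDChar.walkLength == 5);
    // With dchar storage, byDChar is a plain code unit view.
    static assert(isRandomAccessRange!(typeof(d.byDChar())));
}
```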

> * Immutable does not play well with reference counting. I'm of a mind to
> reject immutable rcstring for now and figure out later how to go about
> it. Then const rcstring is okay because we always consider const a view
> on mutable strings (even though they're gone). We'll cast const away
> when manipulating the refcount.

I would point out that it's undefined behavior to cast away const from a
variable and mutate it, and the spec is very explicit about that:

http://dlang.org/spec/const3.html

So, unless you get Walter to agree that that should no longer be undefined
behavior, and then we update the spec and make sure that the compiler
doesn't do anything anymore that treats it as undefined behavior, then
casting away const to mutate is not something that we should ever be doing.
And Walter has generally been very adamant that const needs to not have
any backdoors, so I don't know how easy it's going to be to convince him.

As nice as it is to be able to rely on a const reference never being used to
mutate the underlying object, there are an annoying number of places where we
can't use const as long as casting away const and mutating is undefined
behavior. So, such a change
may very well be for the better, but it _is_ a change, and we certainly
shouldn't be introducing anything into Phobos that does it unless we change
the spec and make sure that the compiler is in line with the change.
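To see why the refcount can't simply be bumped through a const rcstring, here's a small illustration using hypothetical Payload and Handle types. Because const in D is transitive, it reaches through the pointer and makes the count itself const. Getting at the count would require exactly the kind of cast whose subsequent mutation the spec calls undefined behavior, so the sketch deliberately stops short of doing it.

```d
/// Hypothetical layout: the handle points at a shared payload
/// that holds the reference count.
struct Payload
{
    size_t refs;
}

struct Handle
{
    Payload* p;
}

unittest
{
    auto h = Handle(new Payload(1));
    h.p.refs++;                 // fine through a mutable handle
    assert(h.p.refs == 2);

    const ch = h;               // a const view of the same payload
    // Transitive const: ch.p is const(Payload*), so ch.p.refs is const too.
    static assert(is(typeof(ch.p.refs) == const(size_t)));
    static assert(!__traits(compiles, ch.p.refs++));
    assert(ch.p.refs == 2);     // reading through const is fine
}
```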

On another note, how does this relate to discussions on adding reference
counting into the language? I would assume that this can be done with or
without that, but does it affect the API in any way that someone using it would
care about if we introduce it with library-based reference counting and then
later change it to use a new language construct if/when we add a reference
counting mechanism to the language?

- Jonathan M Davis