[Dlang-study] [rcstring] Defining rcstring

Михаил Страшун public at dicebot.lv
Sat Feb 6 13:51:08 PST 2016


I am sorry that I keep arguing about things that may look unimportant
to you, but so far it looks like a major effort will go into designing
something that will only be helpful in a few situations and won't change
the overall situation much.

>> Not sure what you mean about "no need to return them by reference"
>> though. Does that apply only to byX ranges or you want to make the whole
>> string effectively unmodifiable? In other words, how the idiom of
>> mutable reusable buffer will look like?
>
> Characters in a string may be modifiable by means of opAssign,
> opOpAssign etc.

Makes sense. And just to be extra sure - opSliceAssign is planned to be
allowed too, right? If yes, that should be enough for most scenarios.
Interaction with C bindings may become complicated though; do you have
any vision for it?
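
To make the "mutable reusable buffer" idiom concrete, here is a minimal
sketch of what I have in mind. `Buf` and `view` are hypothetical names
used only for illustration; this is not the proposed rcstring design,
just an assumption of how assignment-only mutation could look:

```d
// Hypothetical stand-in for a string type that never hands out
// mutable references to its characters.
struct Buf
{
    private char[] data;

    this(size_t n)
    {
        data = new char[n];
        data[] = ' ';
    }

    // Enables: buf[i] = c
    void opIndexAssign(char c, size_t i)
    {
        data[i] = c;
    }

    // Enables: buf[lo .. hi] = src (lengths must match)
    void opSliceAssign(const(char)[] src, size_t lo, size_t hi)
    {
        data[lo .. hi] = src[];
    }

    // Read-only view; no mutable reference escapes.
    const(char)[] view() const
    {
        return data;
    }
}

void main()
{
    auto b = Buf(5);
    b[0 .. 5] = "hello";   // refill the same storage
    b[0] = 'j';            // modify a single character
    assert(b.view == "jello");
}
```

The buffer is reused in place and only assignment operators touch its
contents, which is the property I care about.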

>> When it comes to encoding, there is also issue of how lacking is current
>> support of non-UTF encodings in Phobos.
>
> D uses UTF for strings. Vivid anecdotes aside, we really can't be
> everything to everyone. Your friend could have written a translator to
> UTF in a few lines. The DNA optimization points at performance bugs in
> phobos that far as I know have been fixed or are fixable by rote. I
> think this non-UTF requirement would just stretch things too far and
> smacks of solving the wrong problem.

From a purely technical point of view you are perfectly right. But does
that make it any better that potential users leave disappointed? UTF
isn't a silver bullet, and a good standard library should either support
other encodings naturally or have good documentation that shows idioms
for transcoding input to Unicode without sacrificing performance.

Remember: not every string is text, but right now Phobos only lets you
choose between raw bytes and UTF-8 when it comes to things like
File.byLineCopy. It is hardly a surprise that library authors prefer the
convenience of the latter and ignore other options. A standard library
doesn't only provide bits of functionality - it influences the minds
and habits of developers of third-party libraries.

Considering that each new incompatible change in the same feature domain
meets exponentially growing user resistance, this is pretty much the
last chance to get string semantics as future-proof as possible.

>> * What are cases for const if one wants to prohibit immutable for a
>> given a type?
>
> Const is a non-modifiable view on data that may otherwise be mutable.

Again, I'd like to read confirmation from Walter on this because I
recall different statements from him in the past on this topic. Also, my
own experience of trying to use const in such a manner (== effectively
logical const, like in C++) is rather bad, and if that was the
intention, it feels like a major design PITA that is much too intrusive
for the declared goals. Physical immutability guarantees add at least
some justification for it being that demanding.
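
For reference, the "non-modifiable view" reading of const quoted above
can be shown in a few lines. This is only a sketch of that reading with
illustrative variable names; it says nothing about what Walter intended:

```d
void main()
{
    int[] data = [1, 2, 3];
    const(int)[] view = data;   // const view of data that stays mutable

    // Writing through the view is rejected at compile time.
    static assert(!__traits(compiles, view[0] = 42));

    // The owner can still mutate, and the view observes the change.
    data[0] = 42;
    assert(view[0] == 42);
}
```

So const alone promises nothing about the data staying the same, only
about what can be done through that particular reference.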

>> Everything else is just making compiler happy when it forces
>> const on you (like `this` pointer within in/out contracts).
>> * As a consequence, how will compiler ensure in/out contracts won't
>> affect refcounting state for `this` if it becomes legal to cast const
>> away and mutate?
>
> I am sorry but I don't understand this question. To the extent I do I do
> not have an answer for the time being.

You may call me paranoid but I was thinking about this :)

void foo ( )
in
{
    // The compiler currently qualifies "this" inside a contract with
    // const, which guarantees that enabling/disabling contracts has no
    // (accidental) effect on class/struct semantics. If passing a
    // const "this" to a function may actually change a refcount, the
    // contract may affect program behavior in a subtle way.
    bar(this);
}
body
{
    // ...
}

>> * How do you envision efficient cross-thread sharing of rcstring if
>> immutability is out of the question?
>
> Initially no sharing will be allowed. Following the initial
> implementation we may add implementation for the "shared" qualifier for
> rcstring.

That is one of the main decision topics for any string replacement. If
sharing support is not even supposed to be discussed, what is the point
of the case study?

I have seen plenty of successful thread-local implementations of
reference counted strings. It is multi-threading that makes things
complicated - and committing to a new standard design which does not
plan for sharing from the very beginning is a good way to ensure it will
not be usable in that way.

>> * If one can't support immutability for something relatively simple and
>> specialized like char array, doesn't it effectively kill the concept or
>> immutable containers important for multi-threading?
>
> Back to basics: immutability in D has always been in intent "real"
> immutability. That means the bytes of an immutable object are
> effectively read-only after initialization. Composition implies that all
> bytes of all members of an immutable object are immutable.
>
> The advantage of this is because we can share with minimal barriers
> (only need to make sure data is not shared before initialization has
> finished). The disadvantage is that reference counting is not compatible
> with immutable objects.
>
> We can't deliver two contradictory guarantees at the same time.

I know what immutability is in D, but that doesn't really answer my
question :) Right now I am aware of two truly scaling approaches to
sharing in D:

- `@safe Unique!T` which allows multi-threaded ownership transfer (not
actually supported by Phobos yet, but all prerequisites seem to be there)
- immutable (both directly and by making immutable copy from mutable
data) + atomics
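
A minimal sketch of the second approach, using std.concurrency message
passing. The `worker` function is a hypothetical name; the point is only
that an immutable payload (here a plain `string`) can cross threads
without any synchronization on the data itself. The mailbox used to
deliver the message is a separate matter:

```d
import std.concurrency;

// Receives an immutable string and reports its length to the owner.
void worker()
{
    auto s = receiveOnly!string();
    ownerTid.send(s.length);
}

void main()
{
    // .idup makes an immutable copy of mutable data.
    char[] mutableData = "hello".dup;
    string msg = mutableData.idup;

    auto tid = spawn(&worker);
    tid.send(msg);   // allowed because the payload is immutable

    assert(receiveOnly!size_t() == 5);
}
```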

Anything that involves locking a mutex on method calls (as tends to
happen with all straightforward shared RC implementations) hurts
performance so badly that it is hardly even considered an option these
days.

So considering you are willing to abandon immutability, and uniqueness
support still has a long way to go, what remains? Will the new
"standard" string type be incapable of lock-free sharing?

On a related topic:

Why do you completely discard the external reference counting approach
(i.e. storing the refcount in GC/allocator internal data structures
bound to allocated memory blocks)? Is there any paper explaining the
pitfalls of such a concept?
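
To clarify what I mean by "external", here is a toy sketch. All names
(`counts`, `rcAlloc`, `retain`, `release`) are hypothetical, and the
associative array merely stands in for whatever metadata a real GC or
allocator would keep per block. The counted object carries no refcount
field, so the payload itself could stay physically immutable:

```d
import core.stdc.stdlib : malloc, free;

// Side table owned by the "allocator": block address -> reference count.
// Module-level variables are thread-local by default in D.
size_t[void*] counts;

void* rcAlloc(size_t n)
{
    auto p = malloc(n);
    assert(p !is null);
    counts[p] = 1;
    return p;
}

void retain(void* p)
{
    ++counts[p];
}

void release(void* p)
{
    if (--counts[p] == 0)
    {
        counts.remove(p);
        free(p);
    }
}

void main()
{
    auto p = rcAlloc(16);
    retain(p);               // count: 2
    release(p);              // count: 1, block still alive
    assert(p in counts);
    release(p);              // count: 0, block freed
    assert(p !in counts);
}
```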

BR,
Dicebot