Our Sister

ZombineDev via Digitalmars-d digitalmars-d at puremagic.com
Sat May 28 02:43:41 PDT 2016


On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
> I've been working on RCStr (endearingly pronounced "Our 
> Sister"), D's up-and-coming reference counted string type. The 
> goals are:

<Slightly off-topic>

RCStr may be an easier first step, but I think generic dynamic 
arrays are more interesting, because they are more generally 
applicable, and user types like move-only resources make them a 
more challenging problem to solve.

BTW, what happened to scope? Generally speaking, I'm not a fan of 
Rust, and I know that you think that D needs to differentiate, 
but I like their borrowing model for several reasons:
a) while not 100% safe and quite verbose, it offers enough 
improvements over @safe D to make it a worthwhile upgrade, if you 
don't care about any other language features
b) it's not that hard to grasp / almost natural for people 
familiar with C++11's copy (shared_ptr) and move (unique_ptr) 
semantics.
c) it's general enough that it can be applied to areas like 
iterator invalidation, thread synchronization and other logic 
bugs, as some third-party Rust packages demonstrate.

I think that improving escape analysis with the scope attribute 
can go a long way toward narrowing the gap between Rust and D in 
that area.

The other elephant(s) in the room are nested contexts like 
delegates, nested structs and some alias template parameter 
arguments. These are especially bad because the user has zero 
control over those GC allocations, which makes some of D's key 
features unusable in @nogc contexts.
<End off-topic>

>
> * Reference counted, shouldn't leak if all instances destroyed; 
> even if not, use the GC as a last-resort reclamation mechanism.
>
> * Entirely @safe.
>
> * Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but 
> also raw manipulation and custom encodings via RCStr!ubyte, 
> RCStr!ushort etc.
>
> * Support several views of the same string, e.g. given s of 
> type RCStr!char, it can be iterated byte-wise, code point-wise, 
> code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar 
> etc.
>
> * Support const and immutable qualifiers for the character type.
>
> * Work well with const and immutable when they qualify the 
> entire RCStr type.
>
> * Fast: use the small string optimization and various other 
> layout and algorithms to make it a good choice for high 
> performance strings
>
> RFC: what primitives should RCStr have?
>
>
> Thanks,
>
> Andrei

0) (Prerequisite) Composition/interaction with language 
features/user types - RCStr in nested contexts (alias template 
parameters, delegates, nested structs/classes), array of RCStr-s, 
RCStr as a struct/class member, RCStr passed as (const) ref 
parameter, etc. should correctly increase/decrease ref count. 
This is also a prerequisite for safe RefCounted!T.
Action item: related compiler bugs should be prioritized, e.g. 
the RAII bug from Shachar Shemesh's lightning talk:
http://forum.dlang.org/post/n8algm$qra$1@digitalmars.com.
See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some 
nasty ones, like bad RVO codegen).

1) Safe slicing

2) shared overloads of member functions (e.g. for stuff like 
atomic incRef/decRef)

3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)

4) (Optional) Reserving (pre-allocating capacity) / shrinking. I 
labeled this feature request as optional, as it's not clear if 
RCStr is more like a container, or more like a slice/range.

5) Some sort of optimization for zero-terminated strings. Quite 
often one needs to interact with C APIs, which requires calling 
toStringz / toUTFz, which causes unnecessary allocations. It 
would be great if RCStr could efficiently handle this scenario.

6) !!! Not really a primitive, but we need to make sure that 
applying a chain of range transformations won't break ownership 
(e.g. leak or free prematurely).

7) Should be able to replace GC usage in transient ranges like 
File.byLine

8) Cheap initialization/assignment from string literals - should 
be roughly the same as either initializing a static character 
array (if the small string optimization is used) or just making 
it point to read-only memory in the data segment of the 
executable. It shouldn't try to write or free such memory. When 
initialized from a string literal, RCStr should also offer a 
null-terminating byte, provided that it points to the whole
If one wants to assign a string literal by overwriting parts of 
the already allocated storage, std.algorithm.mutation.copy should 
be used instead.
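
To make points 5) and 8) a bit more concrete, here is a small sketch of 
where the extra copy comes from today. It uses only std.string.toStringz 
and C's strlen; the helper name cLength is made up for illustration and 
is not part of any proposed RCStr API.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: hands a D slice to a C API that expects a
// zero-terminated string and returns the length that C sees.
size_t cLength(const(char)[] s)
{
    // A slice carries no terminator of its own, so toStringz may have
    // to allocate a zero-terminated copy before the C call.
    return strlen(s.toStringz);
}

void main()
{
    string s = "hello";

    // Slice of a larger string: the byte after "hell" is 'o', not '\0',
    // so a terminated copy is needed.
    assert(cLength(s[0 .. 4]) == 4);

    // String literals are zero-terminated by the language, so their
    // pointer can be passed to C directly, without a copy.
    assert(strlen("hello".ptr) == 5);
}
```

An RCStr that remembers whether its storage already ends in a zero byte 
could skip the copy in the second case, which is the scenario point 5) 
asks for.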

There may be other important primitives which I haven't thought 
of, but generally we should try to leverage std.algorithm, 
std.range, std.string and std.uni for them, via UFCS.
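
As a rough illustration of leaning on Phobos via UFCS, here is a toy 
reference-counted string. ToyRCStr, Header, refCount and borrow are 
hypothetical names, and this is not the proposed RCStr: it is neither 
@safe nor thread-safe, has no small string optimization, and borrow 
returns a plain slice that is valid only while the instance is alive.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;
import std.algorithm.searching : canFind, count, startsWith;

// Toy type for illustration only (assumption, not the real RCStr).
struct ToyRCStr
{
    private static struct Header { size_t refs; size_t len; }
    private Header* hdr; // the characters are stored right after the header

    this(const(char)[] s)
    {
        hdr = cast(Header*) malloc(Header.sizeof + s.length);
        assert(hdr !is null, "out of memory");
        hdr.refs = 1;
        hdr.len = s.length;
        if (s.length) memcpy(hdr + 1, s.ptr, s.length);
    }

    // Postblit: a copy shares the block and bumps the count.
    this(this) { if (hdr) ++hdr.refs; }

    // Destructor: the last owner frees the block.
    ~this()
    {
        if (hdr && --hdr.refs == 0) free(hdr);
        hdr = null;
    }

    size_t refCount() const { return hdr ? hdr.refs : 0; }

    // Borrowed view of the characters; must not outlive this instance.
    const(char)[] borrow() const
    {
        if (hdr is null) return null;
        return (cast(const(char)*) (hdr + 1))[0 .. hdr.len];
    }
}

void main()
{
    auto a = ToyRCStr("hello world");
    assert(a.refCount == 1);
    {
        auto b = a; // copy: postblit runs
        assert(a.refCount == 2);

        // No string-specific members needed: Phobos algorithms via UFCS.
        assert(b.borrow.startsWith("hello"));
        assert(b.borrow.canFind("o w"));
        assert(b.borrow.count('o') == 2);
    } // b is destroyed here
    assert(a.refCount == 1);
}
```

The unchecked borrow is exactly where safe slicing (point 1) and scope 
would have to come in; the sketch only shows that the algorithm side can 
stay in std.algorithm.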

----------

On a related note, I know that you want to use AffixAllocator for 
reference counting, and I think it's a great idea. I have one 
question, which wasn't answered during that discussion:

// Use a nightly build to compile
import core.thread : Thread, thread_joinAll;
import std.range : iota;
import std.experimental.allocator : makeArray;
import std.experimental.allocator.building_blocks.region : InSituRegion;
import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

AffixAllocator!(InSituRegion!(4096), uint) tlsAllocator;

static assert (tlsAllocator.sizeof >= 4096);

import std.stdio;
void main()
{
    shared(int)[] myArray;

    foreach (i; 0 .. 100)
    {
        new Thread(
        {
            if (i != 0) return;

            myArray = tlsAllocator.makeArray!(shared int)(100.iota);
            static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
            writefln("At %x: %s", myArray.ptr, myArray);

        }).start();

        thread_joinAll();
    }

    writeln(myArray); // prints garbage!!!
}

So my question is: should it be possible to share thread-local 
data like this?
IMO, the current allocator design opens a serious hole in the 
type system, because it allows using data allocated from another 
thread's thread-local storage. After the other thread exits, 
accessing memory allocated from its TLS should not be possible, 
but https://github.com/dlang/phobos/pull/3991 clearly allows that.

One should be able to allocate shared memory only from shared 
allocators. And shared allocators must be backed by shared parent 
allocators or shared underlying storage. In this case the Region 
allocator should be shared, and must be backed by shared memory, 
Mallocator, or something in that vein.

