Our Sister

ZombineDev via Digitalmars-d digitalmars-d at puremagic.com
Sat May 28 03:35:49 PDT 2016


On Saturday, 28 May 2016 at 09:43:41 UTC, ZombineDev wrote:
> On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
> wrote:
>> I've been working on RCStr (endearingly pronounced "Our 
>> Sister"), D's up-and-coming reference counted string type. The 
>> goals are:
>
> <Slightly off-topic>
>
> RCStr may be an easier first step, but I think generic dynamic 
> arrays are more interesting, because they are more generally 
> applicable and user types like move-only resources make them a 
> more challenging problem to solve.
>
> BTW, what happened to scope? Generally speaking, I'm not a fan 
> of Rust, and I know that you think that D needs to 
> differentiate, but I like their borrowing model for several 
> reasons:
> a) while not 100% safe and quite verbose, it offers enough 
> improvements over @safe D to make it a worthwhile upgrade, if 
> you don't care about any other language features
> b) it's not that hard to grasp / almost natural for people 
> familiar with C++11's copy (shared_ptr) and move (unique_ptr) 
> semantics.
> c) it's general enough that it can be applied to areas like 
> iterator invalidation, thread synchronization and other logic 
> bugs, like some third-party Rust packages demonstrate.
>
> I think that improving escape analysis with the scope attribute 
> can go a long way toward narrowing the gap between Rust and D in 
> that area.
>
> The other elephant(s) in the room are nested contexts like 
> delegates, nested structs and some alias template parameter 
> arguments. These are especially bad because the user has zero 
> control over those GC allocations, which makes some of D's key 
> features unusable in @nogc contexts.
> <End off-topic>
>
>>
>> * Reference counted, shouldn't leak if all instances 
>> destroyed; even if not, use the GC as a last-resort 
>> reclamation mechanism.
>>
>> * Entirely @safe.
>>
>> * Support UTF 100% by means of RCStr!char, RCStr!wchar etc. 
>> but also raw manipulation and custom encodings via 
>> RCStr!ubyte, RCStr!ushort etc.
>>
>> * Support several views of the same string, e.g. given s of 
>> type RCStr!char, it can be iterated byte-wise, code 
>> point-wise, code unit-wise etc. by using s.by!ubyte, 
>> s.by!char, s.by!dchar etc.
>>
>> * Support const and immutable qualifiers for the character 
>> type.
>>
>> * Work well with const and immutable when they qualify the 
>> entire RCStr type.
>>
>> * Fast: use the small string optimization and various other 
>> layout and algorithms to make it a good choice for high 
>> performance strings
>>
>> RFC: what primitives should RCStr have?
>>
>>
>> Thanks,
>>
>> Andrei
>
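
For comparison, Phobos already offers similar views over a built-in
string, so here is a rough sketch of what "several views of the same
string" looks like today. It uses only existing functions
(std.string.representation, std.utf.byCodeUnit, std.utf.byDchar). The
proposed s.by!T does not exist yet, and viewLengths is a made-up helper
name used only for this illustration.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: lengths of one string seen through three views.
size_t[3] viewLengths(string s)
{
    immutable bytes = s.representation.length;   // raw ubyte view
    immutable units = s.byCodeUnit.walkLength;   // UTF-8 code units
    immutable points = s.byDchar.walkLength;     // decoded code points
    return [bytes, units, points];
}

void main()
{
    // "h\u00E9llo" holds one two-byte UTF-8 sequence and four ASCII characters.
    auto v = viewLengths("h\u00E9llo");
    assert(v[0] == 6);
    assert(v[1] == 6);
    assert(v[2] == 5);
}
```

The byte and code unit views agree for a char string, while the code
point view is shorter whenever a multi-byte sequence is present.
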
> 0) (Prerequisite) Composition/interaction with language 
> features/user types - RCStr in nested contexts (alias template 
> parameters, delegates, nested structs/classes), array of 
> RCStr-s, RCStr as a struct/class member, RCStr passed as 
> (const) ref parameter, etc. should correctly increase/decrease 
> ref count. This is also a prerequisite for safe RefCounted!T.
> Action item: related compiler bugs should be prioritized. E.g. 
> the RAII bug from
> Shachar Shemesh's lightning talk - 
> http://forum.dlang.org/post/n8algm$qra$1@digitalmars.com.
> See also:
> https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
> https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
> (not everything in those lists is related but there are some 
> nasty ones, like bad RVO codegen).
>
> 1) Safe slicing
>
> 2) shared overloads of member functions (e.g. for stuff like 
> atomic incRef/decRef)
>
> 3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)
>
> 4) (Optional) Reserving (pre-allocating capacity) / shrinking. 
> I labeled this feature request as optional, as it's not clear 
> if RCStr is more like a container, or more like a slice/range.
>
> 5) Some sort of optimization for zero-terminated strings. Quite 
> often one needs to interact with C APIs, which requires calling 
> toStringz / toUTFz, which causes unnecessary allocations. It 
> would be great if RCStr could efficiently handle this scenario.
>
> 6) !!! Not really a primitive, but we need to make sure that 
> applying a chain of range transformations won't break ownership 
> (e.g. leak or free prematurely).
>
> 7) Should be able to replace GC usage in transient ranges like 
> e.g. File.byLine
>
> 8) Cheap initialization/assignment from string literals - 
> should be roughly the same as either initializing a static 
> character array (if the small string optimization is used) or 
> just making it point to read-only memory in the data segment of 
> the executable. It shouldn't try to write or free such memory. 
> When initialized from a string literal, RCStr should also offer 
> a null-terminating byte, provided that it points to the whole
> If one wants to assign a string literal by overwriting parts of 
> the already allocated storage, std.algorithm.mutation.copy 
> should be used instead.
>
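
To make item 5 concrete, here is a small sketch of the allocation I
mean, using today's std.string.toStringz on a slice of a built-in
string. cLength is a made-up helper name. How RCStr would avoid the
copy is left open here.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length a C API would see for a D string.
size_t cLength(const(char)[] s)
{
    return strlen(s.toStringz);
}

void main()
{
    string s = "hello world";
    auto slice = s[0 .. 5];

    // The byte after the slice is ' ', not '\0', so toStringz cannot
    // hand out slice.ptr and has to build a terminated copy.
    auto p = slice.toStringz;
    assert(strlen(p) == 5);
    assert(p !is slice.ptr);

    assert(cLength(slice) == 5);
}
```

Every such call on a non-terminated slice pays for a fresh copy, which
is the cost an RCStr that tracks a trailing zero could avoid.
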
> There may be other important primitives which I haven't thought 
> of, but generally we should try to leverage std.algorithm, 
> std.range, std.string and std.uni for them, via UFCS.
>
> ----------
>
> On a related note, I know that you want to use AffixAllocator 
> for reference counting, and I think it's a great idea. I have 
> one question, which wasn't answered during that discussion:
>
> // Use a nightly build to compile
> import core.thread : Thread, thread_joinAll;
> import std.range : iota;
> import std.stdio : writefln, writeln;
> import std.experimental.allocator : makeArray;
> import std.experimental.allocator.building_blocks.region : InSituRegion;
> import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;
>
> // Module-level variable, so every thread gets its own thread-local copy.
> AffixAllocator!(InSituRegion!(4096), uint) tlsAllocator;
>
> static assert (tlsAllocator.sizeof >= 4096);
>
> void main()
> {
>     shared(int)[] myArray;
>
>     foreach (i; 0 .. 100)
>     {
>         new Thread(
>         {
>             if (i != 0) return;
>
>             myArray = tlsAllocator.makeArray!(shared int)(100.iota);
>             static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
>             writefln("At %x: %s", myArray.ptr, myArray);
>
>         }).start();
>
>         thread_joinAll();
>     }
>
>     writeln(myArray); // prints garbage!!!
> }
>
> So my question is: should it be possible to share thread-local 
> data like this?
> IMO, the current allocator design opens a serious hole in the 
> type system, because it allows using data allocated from 
> another thread's thread-local storage. After the other thread 
> exits, accessing memory allocated from its TLS should not be 
> possible, but https://github.com/dlang/phobos/pull/3991 clearly 
> allows that.
>
> One should be able to allocate shared memory only from shared 
> allocators. And shared allocators must be backed by shared parent 
> allocators or shared underlying storage. In this case the 
> Region allocator should be shared, and must be backed by shared 
> memory, Mallocator, or something in that vein.

Here's another case where the last change to AffixAllocator is 
really dangerous:
// Same imports and tlsAllocator declaration as in the quoted example.
void main()
{
     immutable(int)[] myArray;

     foreach (i; 0 .. 100)
     {
         new Thread(
         {
             if (i != 0) return;

             myArray = tlsAllocator.makeArray!(immutable int)(100.iota);
             writeln(myArray); // prints [0, ..., 99]

         }).start();

         thread_joinAll();
     }

     writeln(myArray); // prints garbage
}

In this case it severely violates the promise of immutable: the 
contents of an immutable(int)[] change after the allocating thread 
has exited.
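
For contrast, here is a minimal sketch of the "shared memory only from
shared allocators" rule I argued for earlier, using Mallocator (whose
instance is shared and backed by the C heap) instead of a thread-local
region. makeOnWorker is a made-up helper name, and I use the raw
allocate/deallocate primitives plus core.atomic so as not to assume
anything about how makeArray treats shared element types.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: a short-lived thread allocates and fills a buffer
// that the caller keeps using after that thread has exited.
shared(int)[] makeOnWorker(size_t n)
{
    shared(int)[] result;

    auto worker = new Thread(
    {
        // The memory comes from the process-wide C heap, not from this
        // thread's TLS, so it stays valid after the thread is gone.
        void[] raw = Mallocator.instance.allocate(n * int.sizeof);
        if (raw is null) return;

        result = cast(shared(int)[]) raw;
        foreach (i; 0 .. n)
            atomicStore(result[i], cast(int) i);
    });
    worker.start();
    worker.join();

    return result;
}

void main()
{
    auto arr = makeOnWorker(100);
    assert(arr.length == 100);
    assert(atomicLoad(arr[0]) == 0);
    assert(atomicLoad(arr[99]) == 99);

    // The caller owns the buffer and releases it explicitly.
    Mallocator.instance.deallocate(cast(void[]) arr);
}
```

Here the lifetime of the buffer is independent of the thread that
created it, which is the property the thread-local AffixAllocator
cannot provide for shared or immutable data.
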

