[phobos] I think we need to make an emergency release

Andrei Alexandrescu andrei at erdani.com
Wed Aug 25 14:36:24 PDT 2010


I needed Appender to be a reference type for std.format and std.stdio. 
The point is to be able to compose a complex stream read into a sequence 
of simpler reads. This is possible if you use one Appender throughout, 
but I don't see why preclude people from appending output to their own 
strings. It's not a difficult to implement feature (my convoluted code 
notwithstanding), and it's useful. Why decide to not provide it?

Andrei

On 8/25/10 14:13 PDT, Steve Schveighoffer wrote:
>
>
>
>
> ----- Original Message ----
>> From: Andrei Alexandrescu<andrei at erdani.com>
>> To: Discuss the phobos library for D<phobos at puremagic.com>
>> Sent: Wed, August 25, 2010 4:47:35 PM
>> Subject: Re: [phobos] I think we need to make an emergency release
>>
>> On 8/25/10 12:48 PDT, Steve Schveighoffer wrote:
>>> One very very notable  difference from the original appender is it does not
>> take
>>> a pointer as  the parameter.  This means it does not update the original
>> array
>>>   you pass to it (well, it does, but will not modify the length).
>>
>> I don't  think this is good. When you format stuff you do want to append
>> the  formatted result incrementally to the string.
>
> All it does, and I've made these changes in a couple places, is change this:
>
> auto app = appender(&buffer);
> ...
> return buffer;
>
> to this:
>
> auto app = appender(buffer);
> ...
> return app.data;
>
> You say it's needed, but in reality I didn't find a single compelling use case
> in phobos, and Appender is used all over phobos.  Most cases looked like this:
>
> string buffer;
> auto app = appender(&buffer);
>
> and no reference to buffer later.  Those were trivially replaced with this:
>
> auto app = appender!string();
>
> There was one case in stdio.d where appender is used to convert a utf-16 file to
> utf-8, but appender already has code to take care of that (not added by me).
> The new code looks less confusing to me, you be the judge:
>
> [steves at steveslaptop phobos]$ svn diff std/stdio.d
> Index: std/stdio.d
> ===================================================================
> --- std/stdio.d    (revision 1925)
> +++ std/stdio.d    (working copy)
> @@ -2099,8 +2099,8 @@
>            * Read them and convert to chars.
>            */
>           static assert(wchar_t.sizeof == 2);
> -        auto app = appender(&buf);
> -        buf.length = 0;
> +        auto app = appender(buf);
> +        app.clear();
>           for (int c = void; (c = FGETWC(fp)) != -1; )
>           {
>               if ((c&  ~0x7F) == 0)
> @@ -2120,11 +2120,13 @@
>                       }
>                       c = ((c - 0xD7C0)<<  10) + (c2 - 0xDC00);
>                   }
> -                std.utf.encode(buf, c);
> +                //std.utf.encode(buf, c);
> +                app.put(cast(dchar)c);
>               }
>           }
>           if (ferror(fps))
>               StdioException();
> +        buf = app.data;
>           return buf.length;
>       }
>
> @@ -2138,20 +2140,16 @@
>            * cases.
>            */
>         L1:
> -        if(buf.ptr is null)
> -        {
> -            sz = 128;
> -            auto p = cast(char*) GC.malloc(sz, GC.BlkAttr.NO_SCAN);
> -            buf = p[0 .. 0];
> -        } else {
> -            buf.length = 0;
> -        }
> +        auto app = appender(buf);
> +        app.clear();
> +        if(app.capacity == 0)
> +            app.reserve(128); // get at least 128 bytes available
>
> -        auto app = appender(&buf);
>           int c;
>           while((c = FGETC(fp)) != -1) {
>               app.put(cast(char) c);
> -            if(buf[$ - 1] == terminator) {
> +            if(c == terminator) {
> +                buf = app.data;
>                   return buf.length;
>               }
>
> ===========================
>
> I also suspect that the way this function's return value is typed is based on
> the 'take-an-array-pointer' mode of Appender.  I think it'd be better typed this
> way:
>
> private char[] readlnImpl(FILE* fps, char[] buf = null, dchar terminator = '\n')
>
> That would get rid of the "set buf before returning" part of the code, you just
> return app.data.
>
> There were several cases where someone did something like this:
>
> buffer.length = 0;
> auto app = appender(&buffer);
>
> or
>
> auto app = appender(&buffer);
> buffer.length = 0;
>
> This is trivially replaced with this:
>
> auto app = appender(buffer);
> app.clear();
>
> This tells appender, you can own and replace the data in buffer.  It will not
> append outside of buffer, rather it will realloc as necessary.
>
>>
>>> Other than that, here  is the updated API (with private data shown for an
>> idea of
>>> how the  implementation works):
>>>
>>> struct Appender(A : T[], T)
>>>   {
>>>       private struct Data
>>>        {
>>>           size_t capacity;
>>>            Unqual!(T)[] arr;
>>>        }
>>>
>>>       private Data* _data;
>>
>> I think this  looks good - why not make arr of type pointer to array and
>> support append to  existing arrays?
>
> Safety concerns.  You are most likely passing a reference to a stack-based array
> reference, so if you escape the Appender, you escape the stack-based reference.
> This isn't strictly necessary because the most important thing is the data.
> Now, you could escape stack data by passing a reference to a stack-allocated
> buffer, but I think there's no way around this.
>
> The thing is, when requiring passing a pointer, you are *guaranteeing* the
> chance for escaping.  Even when you don't care about the original reference.
> The fact that about half of phobos usages of appender looked like this:
>
> string buffer;
> auto app = appender(&buffer); // appender contains a pointer to stack data now!
>
> tells me that people will make this mistake.
>
>>> /**
>>> Construct an appender with a given  array.  Note that this does not copy the
>>> data, but appending to the  array will copy the data, since Appender does
> not
>>> know if this data has  valid data residing after it.  The initial capacity
>> will
>>> be
>>>   arr.length.
>>> */
>>>       this(T[]  arr);
>>
>> Appender was partly created to avoid the issue the comment warns  about.
>
> All it means is, don't expect what you do to the appender to be reflected in the
> original array reference.  I think designing to support this is a huge safety
> problem, and provides almost no useful benefit.  If you have the appender, you
> can get the appender's data,  why do you have to use the original reference?.
> Also note any aliases to the original array reference won't be updated.
>
> -Steve
>
>
>
>
> _______________________________________________
> phobos mailing list
> phobos at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/phobos


More information about the phobos mailing list