redefining "put" and "OutputRange"

monarch_dodra monarchdodra at gmail.com
Fri Aug 30 07:38:47 PDT 2013


On Friday, 30 August 2013 at 14:03:36 UTC, Dmitry Olshansky wrote:
> 30-Aug-2013 14:53, monarch_dodra пишет:
>> I'm starting this thread, first, as a shamless plug to a brand 
>> new pull
>> I just did, which (amongst others) allows "put" to transcode 
>> on the fly
>> any string/char of any length, to any string/char of any 
>> length.
>>
>> This fixes issues mostly in std.format, and also allows things 
>> like
>> formattedWrite to work with pure delegates (it didn't before), 
>> as well
>> as output directly in wstring or dstring format (awesome for 
>> writing to
>> a UTF-16 file, for example).
>>
>
> Which is all good and well, but seeing this:
>
> static
> if (is (R == T function(const( char)[]), T) || is (R == T 
> delegate(const( char)[]), T))
>         enum isSink = 1;
>     else static if (is (R == T function(const(wchar)[]), T) || 
> is (R == T delegate(const(wchar)[]), T))
>         enum isSink = 2;
>     else static if (is (R == T function(const(dchar)[]), T) || 
> is (R == T delegate(const(dchar)[]), T))
>         enum isSink = 4;
>     else
>         enum isSink = 0;
>
> Doesn't inspire confidence - it's special casing on (w|d)char 
> arrays, again... Let's hopefully stop spreading this plague 
> throughout especially under banners of generality.

It's a special case for sinks, yes. I'm not a fan of this, but I 
think it is the *single* cases we can trust. (More on this bellow)

>> --------
>>
>> The real reason I'm starting this thread is I believe the 
>> current way
>> "put" leads to a *MASSIVE*, *HORRIFYING* issue. I dare not say 
>> it:
>> Escaping references to local stack variables (!!!).
>
> It is a dangerous primitive. It's not a good idea to wrap 
> everything with safe bags and specialize a single case - arrays 
> and not even appender of (w|d)char[].
>
> Instead it's once again a case where primitive needs better 
> high-level contract inexpressible in simply terms such as 
> @safe-ty provides.
>
> The rule is: OutputRange must not hold references to any slices 
> given.
> And is trivially true for many of current ranges.

OutputRange really just means that put(r, e) resolves one way or 
another. And it also fundamentally depends on what you consider 
the "element type".

For example, int[][] is an output range for the element int[]. It 
makes a copy of said element (int[]), but it certainly *won't* 
copy the contents of that slice.

>> Basically, if R accepts an "E[]", than put will accept a 
>> single E
>> element as input, and convert it to an "E[]" on the fly, using 
>> "put(r,
>> (&e)[0 .. 1]);". I'm sure you can see the problem. It allows 
>> things such
>> as:
>>
>> //----
>> void main()
>> {
>>     Appender!(int[][]) app; //A type that accumulates slices
>>     put(app, 1);
>>     put(app, 2);
>>     put(app, 3);
>>     writeln(app.data); //prints: [[3], [3], [3]]
>> }
>> //----
>
> Bad code regardless. The bug is in Appender of slices aliasing 
> them instead of copying the data thus breaking the above rule. 
> BTW it will break in similar fashion with any alias-able type 
> I've no idea how one would help it.

The bug most certainly isn't in Appender. Appender's job is to 
accumulate *slices*, and is exactly what it is doing. The caller 
code might be incorrect, but std.range.put *is* accepting it, and 
is doing a terrible job at it.

>> I'd like to make a proposition: "put" needs to be changed to 
>> *not*
>> accept putting an E into something that accepts E[]. There is 
>> simply *no
>> way* to do this safely, and without allocating (both of which, 
>> IMO, are
>> non-negotiable).
>
> Just relax and step back for a moment. The bug in question is 
> painfully easy to blowup so chances for it being HORRIBLE are 
> quite low (it's a loud bug). Safety is cool but I expect that 
> output ranges are designed with idea of copying something 
> somewhere or absorbing or accumulating.

I'd agree, if output ranges were actually "designed". Right now, 
the basic definition is that an "OutputRange" collects 
"Elements". "put" extends the supported "Elements".

The truth is that format sinks "(const(char)[]){}" is the *only* 
OutputRange that collects "Elements", but whose' signture is one 
that accepts a slice. This "flaws" the slice/element notion.

If format sinks were defined as "(char){}" to begin with, then 
everything would work fine (and *does*), but this is not the case 
today, and that is the *only* reason I made an exception for them.

>> For objects that define put/opCall, then it is not very 
>> complicated to
>> have two different signatures for "put(E[])"/"opCall(E[])" 
>> *and*
>> "put(E)"/"opCall(E)". This makes it explicit what is and isn't 
>> accepted.
>
> And that will subtly break some genuinely fine code...

It would "explicitly" break code that may (or may *not*) be fine.

>> Lucky enough, the problem never existed with input ranges: 
>> "int[][]"
>> never accepted "int", so there is no problem there.
>
> This is it - a confusion between output range of int[]'s 
> accepting them one by one and of int and accepting them in 
> chunks.

I think the problem is "put" overstepping its boundaries. If 
"r.put(someSlice)" compiles, "put" has no reason to think that R 
actually owns the elements in the slice.


More information about the Digitalmars-d mailing list