redefining "put" and "OutputRange"

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Aug 30 07:03:35 PDT 2013


30-Aug-2013 14:53, monarch_dodra пишет:
> I'm starting this thread, first, as a shamless plug to a brand new pull
> I just did, which (amongst others) allows "put" to transcode on the fly
> any string/char of any length, to any string/char of any length.
>
> This fixes issues mostly in std.format, and also allows things like
> formattedWrite to work with pure delegates (it didn't before), as well
> as output directly in wstring or dstring format (awesome for writing to
> a UTF-16 file, for example).
>

Which is all good and well, but seeing this:

static
if (is (R == T function(const( char)[]), T) || is (R == T 
delegate(const( char)[]), T))
         enum isSink = 1;
     else static if (is (R == T function(const(wchar)[]), T) || is (R == 
T delegate(const(wchar)[]), T))
         enum isSink = 2;
     else static if (is (R == T function(const(dchar)[]), T) || is (R == 
T delegate(const(dchar)[]), T))
         enum isSink = 4;
     else
         enum isSink = 0;

Doesn't inspire confidence - it's special casing on (w|d)char arrays, 
again... Let's hopefully stop spreading this plague throughout 
especially under banners of generality.

> --------
>
> The real reason I'm starting this thread is I believe the current way
> "put" leads to a *MASSIVE*, *HORRIFYING* issue. I dare not say it:
> Escaping references to local stack variables (!!!).

It is a dangerous primitive. It's not a good idea to wrap everything 
with safe bags and specialize a single case - arrays and not even 
appender of (w|d)char[].

Instead it's once again a case where primitive needs better high-level 
contract inexpressible in simply terms such as @safe-ty provides.

The rule is: OutputRange must not hold references to any slices given.
And is trivially true for many of current ranges.

> Basically, if R accepts an "E[]", than put will accept a single E
> element as input, and convert it to an "E[]" on the fly, using "put(r,
> (&e)[0 .. 1]);". I'm sure you can see the problem. It allows things such
> as:
>
> //----
> void main()
> {
>      Appender!(int[][]) app; //A type that accumulates slices
>      put(app, 1);
>      put(app, 2);
>      put(app, 3);
>      writeln(app.data); //prints: [[3], [3], [3]]
> }
> //----

Bad code regardless. The bug is in Appender of slices aliasing them 
instead of copying the data thus breaking the above rule. BTW it will 
break in similar fashion with any alias-able type I've no idea how one 
would help it.


> I'd like to make a proposition: "put" needs to be changed to *not*
> accept putting an E into something that accepts E[]. There is simply *no
> way* to do this safely, and without allocating (both of which, IMO, are
> non-negotiable).

Just relax and step back for a moment. The bug in question is painfully 
easy to blowup so chances for it being HORRIBLE are quite low (it's a 
loud bug). Safety is cool but I expect that output ranges are designed 
with idea of copying something somewhere or absorbing or accumulating.

>
> For objects that define put/opCall, then it is not very complicated to
> have two different signatures for "put(E[])"/"opCall(E[])" *and*
> "put(E)"/"opCall(E)". This makes it explicit what is and isn't accepted.

And that will subtly break some genuinely fine code...

> Lucky enough, the problem never existed with input ranges: "int[][]"
> never accepted "int", so there is no problem there.

This is it - a confusion between output range of int[]'s accepting them 
one by one and of int and accepting them in chunks.

-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list