redefining "put" and "OutputRange"

Fri Aug 30 08:16:13 PDT 2013

30-Aug-2013 18:38, monarch_dodra wrote:
>>
>> Which is all good and well, but seeing this:
>>
>> static if (is(R == T function(const( char)[]), T) ||
>>            is(R == T delegate(const( char)[]), T))
>>     enum isSink = 1;
>> else static if (is(R == T function(const(wchar)[]), T) ||
>>                 is(R == T delegate(const(wchar)[]), T))
>>     enum isSink = 2;
>> else static if (is(R == T function(const(dchar)[]), T) ||
>>                 is(R == T delegate(const(dchar)[]), T))
>>     enum isSink = 4;
>> else
>>     enum isSink = 0;
>>
>> Doesn't inspire confidence - it's special casing on (w|d)char arrays,
>> again... Let's hopefully stop spreading this plague, especially
>> under the banner of generality.
>
> It's a special case for sinks, yes. I'm not a fan of this, but I think
> it is the *single* case we can trust. (More on this below)

No thanks. Full functionality outweighs trusted but crippled.

>>> The real reason I'm starting this thread is I believe the current way
>>> "put" leads to a *MASSIVE*, *HORRIFYING* issue. I dare not say it:
>>> Escaping references to local stack variables (!!!).
>>
>> It is a dangerous primitive. It's not a good idea to wrap everything
>> in safety bags and specialize a single case - arrays, and not even
>> an appender of (w|d)char[].
>>
>> Instead it's once again a case where a primitive needs a better
>> high-level contract, one that cannot be expressed in simple terms
>> such as those @safe provides.
>>
>> The rule is: OutputRange must not hold references to any slices given.
>> And it is trivially true for many of the current ranges.
>
> OutputRange really just means that put(r, e) resolves one way or
> another. And it also fundamentally depends on what you consider the
> "element type".
>

You put too much faith in the source code alone. Not every assumption is 
written in the source (though it probably should be).
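
To make the rule quoted above concrete (an output range must not hold references to the slices it is given), here is a minimal sketch of a well-behaved format sink. It assumes the usual delegate sink shape `void delegate(const(char)[])`; the helper name `formatViaSink` is hypothetical and only there for illustration.

```d
import std.format : formattedWrite;

// Hypothetical helper: formats through a delegate sink that copies every
// chunk it receives and keeps no alias to it.
string formatViaSink()
{
    char[] collected;

    void sink(const(char)[] chunk)
    {
        collected ~= chunk; // appending copies the characters out of `chunk`
    }

    auto dg = &sink;
    formattedWrite(dg, "%s-%s", 1, "two");
    return collected.idup;
}

void main()
{
    assert(formatViaSink() == "1-two");
}
```

Because the sink only appends, it does not matter whether a chunk it receives is a long-lived string or a short-lived temporary.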

> For example, int[][] is an output range for the element int[]. It makes
> a copy of said element (int[]), but it certainly *won't* copy the
> contents of that slice.

The main purpose of an output range is to absorb data one by one or in 
chunks (= slices). In that sense int[][] is a bad output range.
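
A small sketch of the int[][] case, assuming std.range's `put` assigns to `front` and advances for plain arrays; the helper name `storedSliceAliasesSource` is hypothetical. It shows that the stored element is the slice itself, so the ints stay shared with the caller.

```d
import std.range : isOutputRange, put;

// int[][] qualifies as an output range whose element is int[].
static assert(isOutputRange!(int[][], int[]));

// Hypothetical helper: reports whether the stored slice still aliases
// the caller's data after the put.
bool storedSliceAliasesSource()
{
    int[][] store = new int[][](1);
    auto sink = store;        // put advances `sink`; `store` keeps the full view
    int[] chunk = [1, 2, 3];

    put(sink, chunk);         // copies pointer and length, not the ints
    chunk[0] = 42;            // mutate the source after the put

    return store[0] is chunk && store[0][0] == 42;
}

void main()
{
    assert(storedSliceAliasesSource());
}
```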

I do not really care for a formalism that defines what the element type 
is here.

>>> I'd like to make a proposition: "put" needs to be changed to *not*
>>> accept putting an E into something that accepts E[]. There is simply *no
>>> way* to do this safely, and without allocating (both of which, IMO, are
>>> non-negotiable).
>>
>> Just relax and step back for a moment. The bug in question is
>> painfully easy to trigger, so the chances of it being HORRIBLE are
>> quite low (it's a loud bug). Safety is cool, but I expect that output
>> ranges are designed with the idea of copying something somewhere, or
>> absorbing, or accumulating.
>
> I'd agree, if output ranges were actually "designed".

And they were.

> Right now, the
> basic definition is that an "OutputRange" collects "Elements". "put"
> extends the supported "Elements".
>
> The truth is that the format sink "(const(char)[]){}" is the *only*
> OutputRange that collects "Elements", but whose signature is one that
> accepts a slice. This "flaws" the slice/element notion.

Because it was lacking in performance the most.

> If format sinks were defined as "(char){}" to begin with, then
> everything would work fine (and *does*),

And it would slowly crawl into oblivion. That said, std.stdio is slow 
even without put-ing char by char (plus a single char is not a complete 
character, so it would require buffering on the other side of the fence).

> but this is not the case today,
> and that is the *only* reason I made an exception for them.

Chances are you missed ubyte/ubyte[] of std.digest.
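
For reference, a minimal sketch of the std.digest case, using MD5 as the example digest; the helper name `elementAndSlicePutsAgree` is hypothetical. Digests are output ranges for both `ubyte` and `const(ubyte)[]`, and feeding the same bytes either way gives the same hash.

```d
import std.digest.md : MD5;
import std.range : isOutputRange;

// A digest accepts both single elements and slices.
static assert(isOutputRange!(MD5, ubyte));
static assert(isOutputRange!(MD5, const(ubyte)[]));

// Hypothetical helper: hashes the same bytes as one slice and as
// individual elements, then compares the results.
bool elementAndSlicePutsAgree()
{
    ubyte[] data = [1, 2, 3, 4];

    MD5 bySlice, byElement;
    bySlice.start();
    byElement.start();

    bySlice.put(data);          // one chunk
    foreach (b; data)
        byElement.put(b);       // one ubyte at a time

    return bySlice.finish() == byElement.finish();
}

void main()
{
    assert(elementAndSlicePutsAgree());
}
```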

>>> For objects that define put/opCall, then it is not very complicated to
>>> have two different signatures for "put(E[])"/"opCall(E[])" *and*
>>> "put(E)"/"opCall(E)". This makes it explicit what is and isn't accepted.
>>
>> And that will subtly break some genuinely fine code...
>
> It would "explicitly" break code

... and that is bad ...

> that may (or may *not*) be fine.

The point is that if it wasn't fine, it wouldn't survive a day in the wild.

>>> Lucky enough, the problem never existed with input ranges: "int[][]"
>>> never accepted "int", so there is no problem there.
>>
>> This is it - a confusion between an output range of int[]'s accepting
>> them one by one and an output range of int accepting them in chunks.
>
> I think the problem is "put" overstepping its boundaries. If
> "r.put(someSlice)" compiles, "put" has no reason to think that R
> actually owns the elements in the slice.

It should, and this is where we differ, I guess. I can't think of a useful 
output range that stores away aliases to the slices it takes.
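
As a sketch of why storing such aliases would be a problem: handing a single element to a slice-taking sink means building a one-element slice over a local variable. The snippet below builds that slice by hand (an assumption about the mechanism, following the description in this thread); `sliceAliasesLocal` is a hypothetical name.

```d
// Hypothetical helper: shows that a one-element slice over a local
// variable aliases that variable rather than copying it.
bool sliceAliasesLocal()
{
    char c = 'x';
    const(char)[] view = (&c)[0 .. 1]; // one-element slice over the local
    c = 'y';                           // change the local after slicing
    return view[0] == 'y';             // the slice observes the change
}

void main()
{
    assert(sliceAliasesLocal());
}
```

A sink that kept `view` beyond the lifetime of `c` would be left with a dangling slice, while a sink that copies the data out is unaffected.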

-- 
Dmitry Olshansky