Why is there no lazy `format`?

Tue Oct 20 18:03:32 UTC 2020

On Tue, Oct 20, 2020 at 01:10:12PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
[...]
> std.format is not designed around tracking an in-progress conversion,
> so you would have to convert whole things at once. It might not be
> that desirable.
> 
> For example:
> 
> formatRange("%s", someLargeArrayOrStruct);
> 
> this is going to have to buffer the *whole thing*, and then give you
> lazy access to the buffer.

Yeah, I think std.format's design isn't really conducive to lazy access.
Also, the way the OP wrote the example code isn't really consistent,
because it appears to be returning segments of the formatted string
rather than characters in the string, i.e., it behaves like `string[]`
rather than `string`, which isn't how std.format is designed to work.

If anything, perhaps what's closer to what the OP wants is a lazy
version of text(), because there you can actually individually format
arguments lazily.  But nonetheless, as Steven said, you still need a
buffer of arbitrary size because the .toString of an arbitrary
user-defined type can return an arbitrary amount of formatted data.  You
also cannot impose @nogc, because .toString methods can potentially be
allocating (complex ones almost certainly will).
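
To make the "formats each argument individually" idea concrete, here is a
minimal sketch. It is not a lazy text(): it assumes a homogeneous array as a
stand-in for the argument list, and each element is only converted when the
range is advanced to it. Each converted piece is still an allocated string, so
the buffering concern above still applies.

```d
import std.algorithm : equal, joiner, map;
import std.conv : to;

void main()
{
    // Each element is converted only when it is actually requested.
    auto pieces = [1, 22, 333].map!(x => x.to!string);

    // The range yields whole segments, i.e. it behaves like string[].
    assert(pieces.front == "1");

    // Flattening the segments gives a character range, like a string.
    assert(pieces.joiner(", ").equal("1, 22, 333"));
}
```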

In such scenarios, output ranges are a much better way to control
allocations -- the caller specifies the allocation scheme (by passing in
an output range that implements the desired allocation scheme).
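
As a rough illustration of the caller choosing the allocation scheme, the
sketch below formats into a sink supplied by the caller. `describe` is a
hypothetical helper invented for this example; `appender` and `formattedWrite`
are the usual Phobos functions. Any other output range of characters could be
passed in place of the appender.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical helper: writes its output to whatever sink the caller passes.
void describe(Sink)(ref Sink sink, string name, int count)
{
    sink.formattedWrite("%s has %d items", name, count);
}

void main()
{
    // The caller decides how memory is obtained: here, a GC-backed
    // appender with some capacity reserved up front.
    auto buf = appender!string();
    buf.reserve(64);

    describe(buf, "queue", 3);
    assert(buf.data == "queue has 3 items");
}
```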

What *would* be nice is a standard library construct for inverting an
output range into an input range. Fibers are one way of doing this.
Basically, the pipeline up to the output range runs in its own fiber,
which is initially suspended. When data is requested from the input
range end of the interface, it context-switches to the output range
fiber, which generates data that gets saved into a buffer. At some
point that fiber calls Fiber.yield(), and the input range end starts
spooling the buffered data to the caller.  Once the buffered data is
exhausted, it context-switches to the output range fiber again, and so
on.

Note that this does not alleviate the need for buffering, and it's not
100% lazy; what it primarily does is to give a nice input range
interface for stuff written into an output range.  I don't expect it
will do very well performance-wise either, unless the data generators
are designed to cooperate with the inverter -- but in that case, they
would have been written to return an input range instead of requiring an
output range in the first place. So this construct is really more for
convenience than anything.
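
A minimal sketch of such a fiber-based inverter follows. `Inverter` and `Sink`
are hypothetical names made up for this example and are not part of Phobos;
only `core.thread.Fiber` and `formattedWrite` are real. The sketch yields
after every write and exposes the output as a range of chunks rather than
characters, which keeps the code short but is neither efficient nor the only
possible design. The 64 KiB fiber stack size is an arbitrary assumption.

```d
import core.thread : Fiber;
import std.format : formattedWrite;

alias Sink = void delegate(const(char)[]);

/// Hypothetical adapter: runs a producer that writes to a sink inside a
/// fiber and exposes what it wrote as an input range of string chunks.
final class Inverter
{
    private Fiber fiber;
    private void delegate(Sink) producer;
    private string chunk; // buffered data; null once the producer is done

    this(void delegate(Sink) producer)
    {
        this.producer = producer;
        fiber = new Fiber(&run, 64 * 1024); // assumed stack size
        advance();                          // fetch the first chunk
    }

    // Body of the fiber: drive the producer with our sink.
    private void run() { producer(&emit); }

    // Called by the producer in the fiber: buffer the data, then hand
    // control back to the consumer side.
    private void emit(const(char)[] s)
    {
        if (s.length == 0) return;
        chunk = s.idup; // copy, since the sink's argument may be transient
        Fiber.yield();
    }

    // Resume the producer until it yields another chunk or terminates.
    private void advance()
    {
        chunk = null;
        if (fiber.state != Fiber.State.TERM)
            fiber.call();
    }

    bool empty() { return chunk is null; }
    string front() { return chunk; }
    void popFront() { advance(); }
}

void main()
{
    auto r = new Inverter(delegate void(Sink sink) {
        sink.formattedWrite("%s + %s = %s", 1, 2, 3);
    });

    string result;
    foreach (piece; r)
        result ~= piece;
    assert(result == "1 + 2 = 3");
}
```

As noted above, every chunk is still copied into a buffer, so this only
changes the interface, not the amount of buffering.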

T

-- 
If you like to go sledding, you must also like hauling the sled.