One way to deal with transient ranges & char[] buffers

Fri Aug 2 00:50:26 PDT 2013

On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:
> Recently, I discovered an interesting idiom for dealing with transient
> ranges, esp. w.r.t. strings / char[]. Many places in D require string,
> but sometimes what you have is char[] which can't be converted to string
> except by .idup. But you don't want to .idup in generic code, because if
> the input's already a string, then it's wasteful duplication.

Places in D that require `string` do so either because they need 
the immutable guarantee or out of error (e.g. they should have 
asked for `const(char)[]` instead). The latter can of course be 
worked around, but the only *solution* involves fixing the 
upstream code, so I'll assume we're discussing the former case.

We don't have any generic mechanism for deep copying ranges. The 
`save` primitive is often implemented by means of copying, but 
conceptually it does something very different: it captures the 
iteration state, not the elements, so it cannot be applied here. 
So, I don't see how your idea translates to ranges in general 
(not completely sure if it was intended to).
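To illustrate the difference, here is a minimal sketch using a plain `char[]` slice, which is the simplest forward range. The variable names are made up for the example; the point is only that `save` gives an independent iteration position over the same elements, while `.idup` actually copies them.

```d
import std.range.primitives : popFront, save;

void main()
{
    char[] buf = "hello".dup;

    // `save` yields an independent iteration state...
    auto saved = buf.save;
    saved.popFront();
    assert(buf.length == 5);   // advancing `saved` did not advance `buf`
    assert(saved.length == 4);

    // ...but the elements are still shared with the original buffer.
    buf[1] = 'a';
    assert(saved == "allo");

    // A real copy has to be requested explicitly, e.g. with .idup.
    string copy = buf.idup;
    buf[0] = 'c';
    assert(copy == "hallo");   // unaffected by later writes to `buf`
}
```

For a transient range that reuses an internal buffer, the same reasoning applies: saving the range does not protect the elements it hands out.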

Thus, let's tackle the case of arrays/slices in particular, of 
which strings are the most common example. There is a precedent 
in D for pushing the decision to copy an array upwards in the 
code. When the operations at hand require the immutable 
guarantee, state it in the interface of the code, such as by 
asking for `string` on a function's parameter. That's why so many 
functions that need the immutable guarantee take `string`, as 
opposed to taking `const(char)[]` or a template parameter and 
then making a GC copy internally. This way, copies are not only 
minimized, but centralized in user code where they are more 
visible, and the method of making the copy is also pushed up 
(remember, not all client code is fine with rampant GC use). 
Copies are one thing, but what if the caller had a string in a 
different encoding? Then not only does an allocation have to be 
made, but decoding and encoding are also necessary; the details 
of how to handle this are likewise pushed up, with the same 
benefits. It's a pretty mainstream idiom and is often reiterated 
by members of the community, such as in Ali's talk at DConf.
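A small sketch of the idiom described above, for concreteness. `Registry` and its `add` method are hypothetical names invented for this example; the only point is that the callee asks for `string` and each caller decides whether, and how, to produce one.

```d
import std.exception : assumeUnique;

// Hypothetical type that keeps names around, so it needs the
// immutable guarantee.
struct Registry
{
    string[] names;

    // Asking for `string` states the requirement in the signature.
    // No copy is made here.
    void add(string name)
    {
        names ~= name;
    }
}

void main()
{
    Registry r;

    // Caller already has a string: stored as-is, no allocation.
    string s = "alpha";
    r.add(s);
    assert(r.names[0].ptr is s.ptr);

    // Caller has a buffer it will reuse: it chooses to copy (here via the GC).
    char[] buf = "beta".dup;
    r.add(buf.idup);
    buf[0] = 'z';                  // does not affect the stored name
    assert(r.names[1] == "beta");

    // Caller owns a fresh buffer with no other references: no copy needed.
    char[] fresh = new char[](5);
    fresh[] = "gamma";
    r.add(assumeUnique(fresh));
    assert(r.names[2] == "gamma");
}
```

The second and third calls show why pushing the decision up matters: only the caller knows whether the buffer is about to be overwritten or is uniquely owned.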

Your proposed solution shares only one benefit with the solution 
described above: if the direct caller already had a `string` (or 
a range of `string`s), nothing has to be done. It forfeits all 
the other benefits for convenience. It also has problems with 
template bloat, which can be fixed but at a syntactical cost.
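To make the bloat point concrete, here is a rough sketch. The proposal is not quoted in full above, so this assumes it boils down to "use the argument directly if it is already a `string`, otherwise `.idup` it"; `remember`, `rememberImpl` and `stored` are hypothetical names. The fix is to keep the template as a thin wrapper and put the real work in a non-template function, which is the extra syntax referred to above.

```d
string[] stored;

// Non-template implementation: compiled once, requires immutable data.
void rememberImpl(string s)
{
    stored ~= s;
}

// Thin wrapper: instantiated per argument type, but each instance
// contains nothing more than an optional copy and a forwarding call.
void remember(S)(S s)
    if (is(S : const(char)[]))
{
    static if (is(S : string))
        rememberImpl(s);        // already immutable: no copy
    else
        rememberImpl(s.idup);   // hidden GC copy, made on the caller's behalf
}

void main()
{
    char[] buf = "one".dup;
    remember(buf);
    buf[] = "two";              // reuse the buffer, as a transient range would
    remember(buf);
    remember("three");
    assert(stored == ["one", "two", "three"]);
}
```

Note that the `.idup` branch is exactly the copy the caller can no longer see or replace with a non-GC strategy.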

Overall I think it reduces the genericity of algorithms by making 
them try to handle input types they don't actually support, which 
can be a big problem for performance-critical code.